Skip navigation

Greetings,

this will be about the DSL I implemented to solve the issue about wirting those coroutines described in the first post .

As mentioned in the presentation roadmap, I will not use the implemented internal DSL, because this DSL requires a number of additional concepts and issues to be solved. I would like to avoid these. I will overall use a Java or C-like-language with a little quirk. I will throw in numbers in pointy parenthesis which denotes “Buffer this many bits here and execute the following statement, providing the buffered bits in a 0-padded byte-buffer called buffer”.

We will now examine three to four examples, in order to see regular statements, conditionals and loops. These constructs are sufficient as a complete base for a programming language, and I did not implement more than these intially. We discovered further constructs of this language via refactoring later on.

For example, assume we are supposed to read an 8-bit integer, assuming that the canonical byte form is defined to be 8 continuous integers in 2 complement form. Think about this intuitively for a second. I arrive at something like “Well.. buffer 8 bits, convert them somehow and write that into some output variable”. In the DSL implemented, we can write this as:

<8> output = convertToInt(buffer[0]);

Looking at this, this means: Buffer 8 bits (“<8>”) and execute the statement after it, with the buffered bits in the byte-array buffer. In other words, we arrive at: Buffer 8 bits, convert these 8 bits into an integer and store the result in the output variable. At first, I think it is nice how close this is to my intuitive idea how to do this. Second, note how this code is virtually oblivious about buffering issues. Do the bitsarrive in a block of 8? In two blocks of 4? In single bits? I don’t know. I don’t need to know.

You are unable to see this here, because I am using pseudo code, but the DSL in itself does not define basic statements, such as primitive types, integer operations and so on. This decision was made, because the problem in itself is a control flow problem. In other words: The primitive operations C integers provide are perfectly fine for processing the contents of a buffer, once it is buffered. The primary issue to solve is actually getting this buffer filled and then executing these C-statements in order to process the contents of this buffer.

Taking it one step further, we get conditionals. Assume that we have two possible types, which both serialize into 16 bits and are converted by two conversion function, call these convertA and convertB. Which type is transmitted is transmitted with a single byte preceding the 16 bits of the actual data. If this bit is 1, it is the type which requires convertA, otherwise (if it is 0), the type requires convertB. Again, think about this intuitively. I arrive at something along the lines: “Well, I need to read a single byte, check this to be 1 or 0, buffer 16 more bytes and call the correct conversion function”. In the DSL, this translates into:

<1> if (buffer[0] == 1) 
        <16> output = convertA(buffer[0], buffer[1])
else
        <16> output = convertB(buffer[0], buffer[1])

Let’s walk through this precisely. We have an if, which is annotated with a buffer size. This means that a single byte is buffered, the rest of the buffer is padded with 0 and “the if is executed”. That means, the condition of the if is evaluated and the right path is taken. The statements in the branches should be familiar: The translate to “Buffer 16 bit, call convertA (or convertB) and store the result in the output”. So overall, this translates into: “Depending if the first buffered bit is 0 or 1, buffer 16 more bits and convert them in the right fashion”.

Note that padding in this case means that all bits in the buffer are in a well defined state (buffered or 0). This padding is independent of padding in the serialized byte sequence, which is used for example to byte-align the various data fields in order to enhance the readability of the serialized byte sequence with a hex editor.

Finally, there is the loop. Assume we need to read a sequence of 8 bit chunks. The first bit of a chunk is 1 if there is another chunk and it is 0 if the current chunk is the last chunk in the sequence of chunks. Of each chunk, the remaining 7 bits are some data we store and collect. Think about this intuitively again. I arrive at something along the lines: “Well, read a bit, check if it is 1 or 0, and if it is 1, read data and continue, and otherwise stop.” In the DSL, this translates into:

 while(buffer[0] == 1)  storeData(buffer[0]);

Examining this, we get: Read a bit, and if this is 1, read 7 more bits, store these and continue, otherwise, if the read bit was 0, stop. Again, no need to care about buffering and everything. I just need to focus on the description of the byte sequence and I can translate these into clean code, which should do the right thing in a very clear way. Considering that serialization an deserialization is a pretty tricky topic, I think this is a very strong property, because now, it is very easy to verify if this code does what it wants. I am doing it all the time for you.

Finally, there is a last concept about buffering: not buffering. Confusion aside, consider the following definition: A sequence of bytes, terminated by a 0 byte. This should be an issue given your current knowledge, bcause the condition of a loop appears to consume the byte it examines. However, in this case, we just need to inspect the byte for the condition and hand it to the body of the loop in order to process it further. Are we lost? Well. No. In this situation, we decided to define that a buffer size of 0 requires that no buffering occurs, and the buffer of the last buffering is reused. So, we can implement the issue from above as:

<8>while(buffer[0] != 0) <0> storeData(buffer[0]);

This reads as: Buffer 8 bits, and if these are not 0, store these 8 buffered bits somehow. Obviously, if there is no buffering before a 0-buffering, the buffer will be undefined. This has not posed a problem for us until now. I am overall not entirely sure about this decision, even though it is working pretty well. At the time we decided this, it was a pretty pragmatic decision and the more I think about it, the more I like it, but somehow, I don’t get into the state of loving it. I mean, it obviously does make sense that no buffering reuses the previous buffer, but on the other hand, it still feels like a little hack. On the other hand, it works. I guess, a clean solution would be to decouple the buffering from individual statements and expressions and allow separate buffer statements. However, this will result in trickery in the loops again.

So, this is the overall DSL, on a high level. You just pick C, add some buffering directives here and there and you have this DSL, with very little surprises.

Greetings,

in this post, I will explain the problem I encountered and which I solved with the reading coroutines. This DSL is a part of the code generator in a compiler. This compiler reads a certain type description and produces serialization and deserialization code for values described by the type description. This reading coroutine simplifies generating the deserialization code. Let’s go through this step by step.

At first, there is the type description. The current type description is a condensed form of a parsed XSD. Basically it contains a number of elements with a name and a type, and the type defines what kind of values are valid. You can think of this as a variable declaration with a certain type in Java. In Java, you see something along the lines “int numberOfCats;” and you know that numberOfCats holds a numeric value from Integer.INT_MIN to Integer.INT_MAX without any fractions. The only difference is that the type description is more flexible than this. For example, if you define a numeric type, you can (optionally) specify the number of fraction digits, the total number of digits, the minimum value and the maximum value. Overall, this type description is (because it is derived from XSDs) very general and contains fun things such as:

  • lists with an optional maximum length
  • unions of types
  • numbers with optional limits and precision
  • strings with optinal maximum and minimum length

In short, it would be easier to list things this type description cannot represent.

Given this description, we need to generate a serialization and deserialization function. We use the term serialization in order to describe the transformation of a value of a given type into a well defined bit sequence, while deserialization describes the transformation of a well defined, well formed bit sequence into a value again. In other words, the serialization of a boolean is implemented by either writing a 1 if the boolean is true or writing a 0 if the boolean is false, or the serialization of an integer is implemented by writing the two complement bit representation of this number. For a sequence, for example, we absuse the fact that we know a lot about the value, because we know the type of this value and thus, we can implement the serialization of a value of a sequence type by applying the serialization for each member value of the sequence one after each other.

The problem in this situation occurs once you consider the context where the generated code is supposed to be used. The generated code is supposed to be part of a communication stack on an embedded system (well, also on an enterprise server, such that you have a consistent communication stack on all parts of the system implemented). Given this, you cannot assume that you receive the entire bit sequence for a value in a single go, deserialize the value and go back to sleep after this. Such an embedded system usually recieves data in chunks, whereas the size of the chunks is determined by the receive buffer size. Furthermore, we cannot precompute how long the byte sequence for a type will be, so we cannot require the client to prebuffer enough data (consider the union of a list without a length resctriction and a boolean, for example). Thus, we overall needed a way to handle the deserialization of such a segmented byte sequence.

The first way we considered was to try to deserialize the current buffer and either produce a value or reject the buffer. If the buffer is rejected, the user has to buffer more data and retry the deserialization with more buffered data. If a value is produced, the user has to process it. This approach has the problem that overall, a lot of redundant deserialization attempts will happen and the client has to store a lot of buffered data he actually does not want to store.

Given this, I figured we could use a coroutine with a statemachine in order to solve these problems. The coroutine could read the current segment of the bit sequence and remember all necessary information and potentially produce a value. If no value is produced, well, no value is produced and the buffer is consumed and can be reused. If a value is produced, the value can be processed and the remainder of the buffer is fed into the coroutine again. This approach has the advantage that the user of the generated code can buffer a natural amount of bits (pretty much the receive buffer size of his system) and feed this into the coroutine. This means that there will be no redundant deserialization attempts, because every bit of the input bit sequence will be considered exactly once and no information will be stored redundantly, because the coroutine stores just the information it needs and nothing more. The buffer can be discarded after the coroutine consumed it’s contents.

However, writing these coroutines is a fairly tricky part. Overall, you need to keep a static variable to contain the current state, you need to remember how many bits to buffer in order to execute the next deserialization step, you need to remember a lot of technical overhead (updating the buffer size, the remaining unconsumed bits) and overall, you need to write a lot of C-Code in order to do this. Thus, I wrote a prototype for such a coroutine, got a headache from that and decided that I cannot have unexperienced C programmers write such nontrivial code. Thus, I decided to solve this problem with a DSL.

Greetings,

it has been requested that I talk about the generation of the reading coroutines in C a bit more. This first post will operate as a roadmap and also as a collection of links to the other posts about this matter, easening the navigation in addition to the categories.

I will overall present the code in the current state and try to elaborate why I implemented things in this way and not in another way. I might sometimes wonder and ramble about some design decisions and my missing knowledge about software design, and I might make up theories about design which might make sense or not. If you disagree, I will be happy, because then we can discuss. :)

The overall presentation will consist of the following steps:

At first, I will present the overall problem we will try to solve. This post should be read by everyone who wants to continue further, I think, because otherwise, terms will become pretty hard to understand.

Second, I will present my take on a DSL to solve this problem. I will present a pretty pragmatic view on this which I presented to the users of the DSL. I am going to keep this view abstract from the actual code really used to implement this, because those classes contain the solution of one or two more problems I encountered which overall add a bit of understanding trouble which I would like to avoid.

Third, I will present one half of the foundation of everything: the datastructures. The datastructures are overall a big composite pattern. I will explain why I implemented this as datastructures with vistors, opposed to, say, regular objects with methods. I will also present some implementation details for immutable classes we discovered as beneficial.

Fourth, I will present the other half of the foundation of this overall layer: The vistor part of the datastructure+visitor-implementation. I will present the general way we chose to implement the visitors, the way we removed a number of redundancy and the way we tested the visitors. The third and the fourth part would overall benefit a reader who sometimes ends up with a lot of operations on few datastructures, because I think that we overall reached a point where our datastructures are strong and usable and implementing visitors is quite easy and painless.

Fifth, I will explain how we solved the issue with variable names. It took around three iterations which I mostly forgot again, but my git log will probably help me on that. Overall, we reached a strong solution, which is a bit arcane, but once you get it, the light is bright (as a teaser: you basically annotate run time values with objects in order to identify value flow in order to add variables). I will also show you a nice way to add extensibility in this DSL.

Sixth, I will explain the abstract, high level plan for the generation, including the structure of the C method we want to generate. For those theoretically inclined, I will also discuss the overall abstract model of this language. This is the point where the theoretical description from part two will become relevant and this is the point where I had some headaches to solve for a day or two, but I managed to solve it :)
Overall, to have some terms handy to explain part seven and eight, we transform the code tree into a code graph and convert this graph into some while-switch-statement in order to implement the state machine, which is the model of the domain specific language.

Seventh, I will explain how the transformation from the code tree into the code graph. This was overall pretty tricky, but manageable. I will also elaborate why I think I need this. This should be a short, but interesting post.

Eigth, I will demonstrate how the code emitting is implemented. This is not exactly difficult, but kind of tricky, because there are all so little nitpicks that can go wrong.

Ninth, I will discuss how I managed to get this entire beast tested. Yes, it is fully unit tested, which is very, very nice. :)

Finally, I will wrap this up and consider if implementing this DSL was worth it, what could have gone better and so on.

Wow, so we got a lot of fun in front of us. Let’s begin. If someone would like to recieve the entire code from me, I will be happy to provide it.

Greetings,
let us go and write down a number of fuzzy numbers which are supposed to resemble development time, risk and value, and be happy with them, until we notice how horribly wrong they are. Once we arrive there, we can decide to either run around screaming or we can re-estimate and make some manager run around screaming. Both possibilities will be fun somehow.

In case you are wondering why I am posting at this rapid speed: Currently, I have to share a computer with several other people, so I have a certain backlog of content piled up at the moment, and if I say “I will estimate things now”, I pretty much meant “Yesterday evening, I formulated the post in my head, and yesterday evening, I figured I will estimate now.”, and thus, the estimation is about done right now.

At first, I had to estimate the stories in 1, 2 or 3 weeks of Ideal Development time. This was the regular estimation problem, so I basically went through the stories, assigned a single week to them (and for some of the stories, a single week feels like an awful lot of time, but I think the project velocity should be able to fix this) and assigned 2 or 3 weeks of time to just a very small number of the stories (In numbers: of 77 stories, 62 have been estimated 1 week, 12 have been estimated 2 weeks and 3 have been estimated 3 weeks). However, I think this distribution makes sense, as my application is mostly a simple web frontend for a database which pushes data around in the database, so most of the tasks are simple forms feeding queries to the database and displaying the results. Those 2 week stories are mostly the version control stories and the 3 week stories are stories involving technology I am unsure about (recovering passwords, as this involves the use of emails, and I am not sure how I will test this) and one of the version control stories (recovering a deleted hint-through).

Second, I had to assign buissness value to my stories. I was not entirely satisfied with just assigning low, medium and high to the stories, because for example, the community part felt more important than the version control for hint throughs, but less important than actually developing the hint-throughs, but the version control part felt more important than the rating mechanism for comments. Thus, I tricked myself by allowing myself to assign buissness values on a scale from 5 to 0. 5 means “Critical: If we do not implement this, the system is just worthless”, while 0 means “Well, if we are eventually done, we can implement this, but if this is missing, it is not so bad.”. In the planning game, I can then merge values 5 and 4 into high buisness value, 3 and 2 into medium buisness value and 1 and 1 into low buisness value. If I am unsure which high-value stories I want to pick, I can still pick those with a value of 5 over those with a value of 4, of course. Most of the stories are rated 4 to 3, which again makes sense to me. In numbers, of 77 stories 17 stories are rated 5 (Basic hint-through development stories and basic account management stories), 22 are rated 4 (Administrative stories, basic forum functionalities, searching for hint-throughs, basic messages), 31 are rated 3 (Flagging, Markup, version control, additional features for the high-value components), 6 are rated 2 (comment versions, additional content for user pages, recovering deleted hint-throughs (hey, accidentially deleting a hint-through really requires a lot of work, so it won’t happen often, I hope)) and the remaining 2 stories are rated value 1 (rating comments). I think this heavy bias to high- and very high buisness value stories makes sense, because I wrote down user stories for the core system I envisioned, and thus, most of the functionality I want to develop is pretty critical for the system.

After that, I had to assess risk. I planned to estimate risk in three categories and combine them into an overall risk value. The first category is completeness of the description, or “Do I know everything?”. If I am confident that I know everything I need to know, I rate the completeness-risk low, if the story is very vague, the completeness-risk is high, otherwise, it is somewhere in the middle. The second category is implementation complexity. If I look at a story, I check if I can easily think of a way to implement this, or if I already see some tricky problems popping up (e.g. the version control will be pretty tricky to implement, while adding a comment is fairly simple). If the complexity then directly translates into the risk of this category. The third category I looked at was volatility, as in “How often will this change?”. If it changes often, the risk is high, if it changes like a mountain, the risk is low. Given these three thoughts, one can easily combine the risk into an overall risk. If everything is low for example, the overall risk is probably low. If the risk is medium in two of the categories, the overall risk probably is in the upper area of middle, too. If I was unsure about a certain risk, I usually picked the higher category. Given this, of 77 stories, 1 story was rated high risk (Recovering delete hint-throughs, because this felt pretty complicated to implement and I was not entirely sure about how long a hint-through is kept), 10 were rated medium (posts, version control, searching, recovering passwords) and the remaining 66 stories are rated low risk. I think this makes sense, because I have a pretty clear vision of the application I want to create and thus, the risk of incompleteness is pretty low. Second, most of these mechanics are pretty basic and thus, most of the mechanisms cannot change much (at most, they can move from one page to another), so the volatility risk is pretty low, too. There are just a few stories in there which feel tricky to implement (recovering delete hint-throughs, version control, polls), so they are rated with medium or even high risk. But again, the bulk of the stories (adding comments, adding notes to hint-throughs, account management) are mostly database operations with form in front of them, and thus, pretty easy.

Thus, overall, I think the estimation, value and risk estimation resulted in a sort of sensible result and I can walk into a scope based release planning now (where I probably will pick all the value 5 stories and call it a release).

Regards,
Tetha.

Greetings readers,
let us embrace the most dropped part of extreme programming and be stubborn enough about it to run into it, get frustrated, be tempted to drop it and haggle it to the ground using a large amount of rubber bands, coffee and persistence: The system metaphor.

My plan right now is to walk through the proven system metaphors, and brainstorm some more, think about the metaphor and pick one that feels good, whereas good stands for “Does not sound completely akward.” I will call the system from now on “a development community for hint-throughs”, as this is a consistent, concise description of the system which can be easily put into the sentence “SYSTEM is an METAPHOR”.

Brainstormed metaphors

Proven Metaphor: Spreadsheet
A development community for hint-throughs is like a spreadsheet. Uh. the hint-throughs are like cells, while the users operate on those cells. Uh. What?!
Proven Metaphor; Script
A development community for hint-throughs is like a theatre script. I don’t think this works, as a script is mostly a sequence of actions, while the community pretty much a bunch of people working on a certain set of artifacts, developing, improving and consuming them.

Spontanetous idea: Library
The development system for a development community for hint-throughs is a library. Inside this library, there are librarians performing certain tasks, like “Fetching an entire hint-through” or “Fetching a certain page of a hint-through”. They do this by doing this and asking other librarians to do things. The hint-throughs are books, with a cover (containing author information, hint-through information, …), and pages containing actual content (and pages refer to each other). Futhermore, there are meeting places, like the cafeteria inside the library where the users can meet if they wish to dos o. Besides that, users ask the librarians to do whatever they want them to do. I like this.

I think I will stick with the idea of a library, involving shelves containing the hint-throughs and librarians who require a certain amount of information in order to go and retrieve books according to this.

I am not entirely convinced if this metaphor is strong enough or useful in any way, however, several of the papers and pages I looked at basically state that getting the metaphor right is really, really hard. Thus, I will pick this one, accept the risk to run into problems and be ready to refine the metaphor. I just want to have one and get experience at choosing a metaphor.

I think this is a good short post with a nice amount of progress. I am going to work on the estimations and the release plan now, I think,
Tetha.

Greetings,
after about two days of casually noting down user stories, I think I have a pretty decent coverage of user stories for my vision. Eventually, I ended up at about 77 user stories. I am unsure if I was struck by a certain case of creeping featuritis, but a lot of stories result from doing things consistently (say, I want that people are able to comment on a hint-through, but in that case, it makes no sense to forbid comments on comments, so the story “Add comment to hint-through page” pretty much implies the story “Add comment to other comment”).

Besides that, I was eventually forced to categorize my stories a bit in order to keep at least some sort of clear view, because remembering everything or just skimming through that pretty large stack of user stories grows really hard in the end.

I eventually converged against the following pattern of a user story: “As a X, if I do Y, then I can do Z. Thus, A”. X is an identification of a user status (any user vs logged-in user vs original author vs moderator vs several others), Y usually presents a certain page the user views or some setup actions, Z identifies an action the user does and A identifies the benefit the user has from this. I have to admit that pretty often, the benefit of the user is identical among several stories, because the stories describe a similar mechanic.

I will not present all user stories in full text here, unless someone explicitly requests them. However, here are some examples of user stories I have right now:

Polls, Participation: As a logged-in user, if I view a poll I did not participate in yet, I see a matrix with an option group for all votes I have and an option for all possible answers. I set those options and submit. After this, the number of votes on the items is displayed whenever I view the poll. Thus, I can take part in votes and voice my opinion there.

Flagging, Flag hint through pages: As any user, if I view a node of a hint-through, I can see a link “Flag as inappropiate”. If I click this link, the node is scheduled into the flagged item moderation queue so the moderators and administrators can review and potentially change the inappropiate content. Thus, I can report abuse and keep the site civilzed

I think, those stories can be translated into acceptance tests pretty easily, because they are structured and detailed enough, and they provide a lot of information what is supposed to happen.

Overall, I have the following user stories planned out by now (the elements of the ordered lists are the titles of the actual user story):

Account management (11)
  1. Logging in
  2. Registering
  3. Find another user by username
  4. Set password to new password
  5. Set my email
  6. Recover lost password
  7. “About me” field
  8. Moderators can set email addresses, passwords and about me
  9. Moderators can ban users.
  10. Administrators designate moderators
  11. Administrators designate administrators
Statistics (3)
  1. Statistics are displayed on user page
  2. Authors statistics page is linked to from a hint-through cover
  3. Own statistics page is directly reachable from main page
Feedback for authors (4)
  1. view comment of hint-through page
  2. add comment to a hint-through page
  3. rate a hint-through
  4. Display the ratings of a hint-through
Consuming hint-throughs (3)
  1. Read a hint-through
  2. Search for a hint-through
  3. Alphabetical hint-through index
Developing hint-throughs (10)
  1. Create a new hint-through
  2. Publish a hint-through
  3. Add a new page to a hint-through
  4. Edit an existing hint-through page
  5. Delete a hint-through page
  6. Move a hint-through pabe
  7. Delete a hint-through
  8. Moderators can change hint-through pages
  9. Moderators can delete hint-through pages
  10. Moderators can delete hint-throughs
Markup (6)
  1. Links
  2. bold
  3. italics
  4. underline
  5. ordered lists
  6. unordered lists
Bookmarking hint-throughs (3)
  1. Add a bookmark
  2. Delete a bookmark
  3. Bookmarks are displayed on the main page
Hint-through version control (3)
  1. View changelog of the hint-through
  2. Revert a page to an old version
  3. Administrators can recover deleted hint-throughs
Collaboration on hint-throughs (3)
  1. Add a collaborator
  2. Remove a collaborator
  3. The list of collaborators is displayed for a collaborator on the cover of the hint-through
Messages (6)
  1. Notiifcation about unread messages on main page
  2. Viewing the inbox
  3. Reading a message
  4. Sending a message
  5. Replzying to a message
  6. Deleting a message
Comments (7)
  1. Add a comment ot a comment
  2. Edit a comment
  3. View edit history of a comment
  4. revert a comment to an old version (Moderators or original author)
  5. comment rating is displayed
  6. comments dcan be rated
  7. Moderators can edit all comments
Flagging (9)
  1. Posts can be flagged
  2. Threads can be flagged
  3. Messages can be flagged
  4. “About me”-Section can be flagged
  5. Comments can be flagged
  6. Hint-Through pages can be flagged
  7. Flagged items queue for Moderators
  8. Moderators can accept flagged items from the queue
  9. Moderators can resolve an accepted flagged item
Forum (7)
  1. Forum can be browsed
  2. Reading a thread
  3. Starting a thread
  4. Replying to a thread
  5. Editing a post (moderators or own post)
  6. post version history
  7. reverting a post to an old version
Polls (4)
  1. Creating a poll
  2. Editing a poll (mod or original author
  3. Participate in a poll
  4. Close a poll

Overall, I think it is a nice set of user stories for a page which is built around the idea of cooperating to develop hint-throughs and building up a certain community to do so.

The next steps I will have to do now is to commit to a system metaphor, estimate the stories in Ideal Developer Time and annotate them with value and risk. After that, I can plan the first release and check how everything works out.

Until then, until then,
Tetha.

Greetings,
I have decided to go on a journey in structured software development. I was finally annoyed enough that there is no good site implementing the hint-through-system a very old walkthrough system (back in the days of Dos 6.22) used, so I decided to do this myself. I think, this is a good thing, because it is going to get me into web development and database development a bit (an area my skills are lacking in, yet) and I figured I can try to do some approximation of Extreme Programming in order to implement this vision. In particular, I am going to do as much of extreme programming as possible. For example, I am a solo developer for this project, so pair programming is out of the question, and I do not really have a customer and a user besides me yet, but I can still gather and estimate user stories, I can schedule them in a release plan, I can plan iterations and so on. However, I decided to create a series of blog posts in order to build up a certain amount of positive pressure and also to use this as a diary in order to have some written results in order to be able to learn lessons from this.

So, at first, let me describe my vision of the web application and what a hint-through is, so everyone can follow me and my ideas. I assume we all know walkthroughs of games. A walkthrough presents a clear sequence of steps the player has to take in order to win the game. Certainly, the walkthrough contains all information you will ever need to win the game, however, the presentation is unsatisfactory for me, because whenever I need just a hint, I have to skim the walkthrough, get far too many hints, learn far too many spoilers and generally lose a large amount of challenge, because the solution is slapped into my face.

A hint-through solves this problem by changing the presentation of the steps required to do in order to progress. In contrast to the sequential presentation of a walkthrough, a hint-through is a tree. In each node of the tree, there is a certain information stored and further sub-trees are associated with te current tree node, each associated with a queston that is answered in the sub-tree. Of course, if you look at a traversion of the leafs of a hint-through, you will have the linear sequence of steps similar to a walkthrough again.

For example, if you are supposed to find a key in a sealed trashcan (which has to be opened with a spoon) in order to open a door, a walkthrough simply presents you “Use spoon with trashcan. Take key. Use key with door.”, while a hint-through rather presents you with the information “You need to open the door. (1) How am I suposed to open the door?”. Investigating 1 further results in the information “There is a key hidden in this room. (1) Uh, key? Where?”. Again, (1) refines into “The trashcan contains the key. (1) And.. how do I open it?”, whereas (1) refines into “Use the spoon”. Putting it all together results in “You need to pen the door. There is a key hidden in this rum. The trashcan contains the key. Use the spoon.”, which is a very detailed soution to the room, however, a lot of people can stop reading the hint-through after one or two refinements, because the hint “The key is in the trash can” finally transforms the problem “Open door” into “Open trashcan. Then, get key, open door, progress”. I consider this experience superior to the boring walk-through, because the steps in a hint-through merely nudge me into the right direction.

Given the Extreme Programming workflow, I have to gather user stories at the moment. A certain time of inspiration yesterday evening resulted in a batch of 9 stories, but these storie describe a suprisingly small fraction of the system. I have 2 stories for registering a user and logging users in, 2 stories for browsing the hint-throughs (a title search and an alphabetical index) and 5 stories for developing hint-throughs (creating a new hint-through, adding nodes, moving nodes, deleting nodes, editing nodes). I am still missing stories about the collaboration of people on a hint-through, a version history of hint-throughs, actually reading a hint-through, requesting hint-throughs, probably translating and requesting the translation of hint-throughs, administration and moderation tasks, commenting on nodes and hint-throughs, managing ones own account, rating hint-throughs and gathering statistics about hint-throughs. (so there is still a lot of stories to be found).

At first, I started with a relatively rigid structure similar to the UserStoryTemplate, however, this was too restricting, and thus, I eventually kept the general idea of the template, but went into a more free form. (Most of the cards now read like “As an …, if I .., then I …. Thus, I can…”. One story is for example: “as a registered, logged-in user, I can click a button “create new hint-through. Then I enter the name of the game and an initial description of the game. After this, a hint-through node is presented to me under the entered title of the game, containing the entered description of the game. Thus, I can add new content to the system.”) This nicely wraps up the preconditions of a story, the actions performed during the story and the actual value the story adds. Thus, it should be possible to write acceptance tests for this stories.

Besides this, I am currently pondering about the system metaphor, but mostly, I will keep on collecting user stories in order to get a nice backlog of them in order to be able to plan the release plans.

Recently, I have been wondering about variable naming schemes, again, and finally, the urge to write a post about this is big enough to actually do it. I think, a simple “use descriptive names” is way, way to general to be a good guideline. I think that developing a good sense for variable names involves getting familiar with the forces and the context which involves the variable name and listening to these forces. For example, if there is a statement:


FileWriter of = get outputFileWriter()

in, say, a 5 – 10 line function, what use will there be to call of outputFile or outputFileWriter versus of? By now, I think that of might be a better name, as it is shorter and requires the reader to read less physical characters, easing reading the code, unless other forces and facts apply (for example, if there are 8 other variables in scope, or if of will be used for the next 200 lines)

Let me elaborate a bit on this. I think that the most important thing about a variable is the semantic meaning of the value contained by this variable. For example, the 5 in the variable ‘leftOffset’ has a different semantic meaning than an 8 in the variable ‘elementsFound’. If you know the semantic meaning of the value contained by a variable, you can evaluate statements and expressions involving this variable and thus the modifications made to this value, and you can check if they make sense. Furthermore, you can modify them in a sensible way. For example, if you understood that the semantic meaning of a 0 in the variable elementsFound is the number of elements found, fitting a certain criterion, then it makes sense to see a statement like


if(criterion(element)) elementsFound += 1;

This makes sense, because elementsFound contains the number of elements found so far and you just found one more element. Thus, a crucial point of naming variables involves communicating this semantic meaning to the reader of code.

Taking this, it certainly makes sense to use descriptive names for a variable, as the semantic meaning of a variable can be deduced by interpreting the name of a variable (which should be unique enough, given the context). However, I think that this is too narrow. A variable is never read in isolation. There are declarations, initializations around, providing further information about the variable, aside from the name.

Given this observation, I think we rather need to talk about forces which increase the name of a variable and forces which decrease the name of a variable, resulting in a feeling if "of" might be a sufficient name, or if "outfile" might be more appropriate.

I have been able to identify the following forces by now:

Size of Scope
The length of the scope of a variable correlates with the ideal length of the variable name. A large scope requires a long, readable name, while a small scope can cope with a short acronym. This implies that functions, parameters, class attributes and class names must be long and readable, as their scope is the entire program (usually) or (for class attributes) includes every function inside the class. However, it allows a short variable like of for a simple 3-line loop.
Number of variables alive at the same time
If there are more variables alive at the same time, the names should eventually be expanded into long, descriptive names. If there are only very few variables in scope at the same time, variables can remain short.
Readable Initialization / Declaration
If a variable has a highly descriptive initialization or declaration, it’s name can remain short, as the semantic of the value can be inferred from the declaration and/or initialization. Consider for example

void checkAllTypes(TypeEnvironment typeEnvironment) {
    foreach(Type t : typeEnvironment.types) {
        checkTypeInvariants(t);
    }
}

In this case, I do not see the advantage of calling t "checkedType", especially as the scope is small and there are only 2 variables in scope.

An interesting conclusion from this is that a statically typed language (say, Haskell or Java) tends to allow shorter names, if there are good type names involved, while a dynamic language (say, Python or Lisp) requires longer and more descriptive due to the lack of such type annotations.

As a conclusion, I think there are forces which can justify very short variable names, and even more, some forces might even be statically checked. Counting the number of variables in scope at a certain point is simple and possible, and checking the size of the scope could be possible in the number of lines or the number of blocks. This would allow the implementation of more clever variable name code metrics, which is something I should start thinking about. If you find more forces, I will be happy if you let me now!

Greetings,

I just solved the meta loopless sort problem on the Online Judge. I had problems 107, 108 and 109 solved earlier during a university course already. If I recall right, 107 and 108 are math problems, so if you do not see the trick to solve them very soon, you can pretty much go and look the solution up or give up (at least that is the case for me usually). 109 on the other hand was pretty much enjoyable, because all one has to do is to check if a polygon contains a point or not.

But let us move to solve the problem at hand, 110. The problem is again very nice, because it pretty much boils down to a simple recursion which then solves everything nicely. Overall, the goal is: Given an integer, which is the number of elements in an array, output an optimal, loopless sorter which simply selects the right permutation of input variables via a tree of if-statements and outputs the sorted sequence of values.

In order to solve this, assume you are supposed to sort n values, and k values are sorted already. Assume that the sequence of sorted values is <v1, v2, …, vk>. If you are suppposed to enter a new value, w into this permutation, you have k+1 possible cases:

  • w < v1
  • v1 < w < v2
  • v2 < w < v3
  • vk < w

Each of these cases creates a new permutation with w somewhere inside the k values in the existing permutation and each of these cases can be checked with an if (the last one requires an else without an if). Since this results in a new permutation, it can be easily iterated until all variables are inserted into the permutation. Once all variables are in this permutation, you can simply output writeln(current_permutation), and you are done.

Overall, this is a very nice, recursive solution. My implementation even solved the ‘trippy special case’ 1, which is simply a writeln(a) and worked just nicely besides that. :)

Regards, Tetha.

Greetings,

I just solved the Skyline problem on the online judge. I have to say, this is a very enjoyable problem after that Arbitrage problem. It is very, very straightforward and the only problem is actually getting how the output works. Once that is understood, though, things are easy

In the skyline problem, you are supposed to compute the skyline of a city, given the starts, ends and heights of buildings. The skyline is segmented into 10.000 discrete segments, which all have a certain integer height. Your input consists of triples (L, H, R). Inside such a tripel, L is the lower inclusive coordinate of the building described, R is the exclusive upper coordinate of the building described and H is the height of the building. Common sense applies, as if there are two buildings overlapping in certain segments of the skyline, you will see the higher building.

Given this, you are supposed to compute a description vector for the resulting skyline. I found the ACM-description of the output to be pretty unclear and thus, it took some time to actually see what they wanted. Basically, the output O = <o1, o2, o3, o4, …, o2n-1, o2n> has to be read as: (1) walk to the right until cooridinate o1, and walk to height o2, walk to the right until coordinate o3, walk to height o4, … and so on. Thus, one needs to count the amount of steps to the right one has to take until the height changes, output this amount and output the new height (and repeat this for the entire distance).

Being armed with this understandings, the implementation is straightforward. It is not even necessary to store the buildings, because in order to compute the output, we just need an array which contains the maximum heights at each coordinate of the skyline. Given this, the output can be easily computed via:

function print_result(array height of int with n elements) {
    for i = 1 ... n {
        if (height[i] != height[i-1]) {
            print(i + " " + height[i]);
        }
    }

Given this, it suffices to compute this height array, given the input triples mentioned above. This can be done initializiing the array heights with 0′s and whenever a triple (L, H, R), you compute max(H, heights[i]) for all L <= i < R.

Regards, tetha.

Follow

Get every new post delivered to your Inbox.