Greetings,
this post is about the DSL I implemented to solve the problem of writing the coroutines described in the first post.
As mentioned in the presentation roadmap, I will not use the implemented internal DSL here, because it requires a number of additional concepts and issues to be solved, and I would like to avoid these. Instead, I will use a Java- or C-like language with one little quirk: I will throw in numbers in angle brackets, which denote “Buffer this many bits here and execute the following statement, providing the buffered bits in a 0-padded byte buffer called buffer”.
We will now examine three to four examples in order to see regular statements, conditionals and loops. These constructs are sufficient as a complete base for a programming language, and I did not implement more than these initially. We discovered further constructs of this language via refactoring later on.
For example, assume we are supposed to read an 8-bit integer, where the canonical serialized form is defined to be 8 contiguous bits in two’s complement form. Think about this intuitively for a second. I arrive at something like “Well... buffer 8 bits, convert them somehow and write that into some output variable”. In the DSL implemented, we can write this as:
<8> output = convertToInt(buffer[0]);
Reading this, it means: Buffer 8 bits (“<8>”) and execute the statement after it, with the buffered bits in the byte array buffer. In other words, we arrive at: Buffer 8 bits, convert these 8 bits into an integer and store the result in the output variable. First, I think it is nice how close this is to my intuitive idea of how to do this. Second, note how this code is virtually oblivious to buffering issues. Do the bits arrive in a block of 8? In two blocks of 4? In single bits? I don’t know. I don’t need to know.
You are unable to see this here, because I am using pseudo code, but the DSL itself does not define basic statements, such as primitive types, integer operations and so on. This decision was made because the problem itself is a control flow problem. In other words: The primitive operations C integers provide are perfectly fine for processing the contents of a buffer once it is filled. The primary issue to solve is actually getting this buffer filled and then executing these C statements in order to process its contents.
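To make the split between "fill the buffer" and "run a plain statement on it" more tangible, here is a rough sketch using a Python generator. It is only an illustration under my own assumptions, not the DSL implementation: `pack`, `run` and `read_int8` are hypothetical names, a `yield n` stands in for the directive `<n>`, and I assume the buffered bits are right-aligned in the 0-padded buffer.

```python
def pack(bits):
    """Pack 0/1 values into a zero-padded byte buffer (assumed: right-aligned)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value.to_bytes((len(bits) + 7) // 8, "big")


def run(parser, chunks):
    """Feed bit chunks of any size to a parser coroutine and return its result."""
    pending = []
    try:
        need = next(parser)                  # first "<n>" directive
        for chunk in chunks:
            pending.extend(chunk)
            while len(pending) >= need:      # enough bits buffered?
                bits, pending = pending[:need], pending[need:]
                need = parser.send(pack(bits))
    except StopIteration as done:
        return done.value
    raise EOFError("input ended before the parser finished")


def read_int8():
    buffer = yield 8                         # <8>
    # output = convertToInt(buffer[0]), read here as two's complement
    return int.from_bytes(buffer[:1], "big", signed=True)


bits = [1, 1, 1, 1, 1, 1, 1, 0]              # 0xFE
print(run(read_int8(), [bits]))                   # one block of 8
print(run(read_int8(), [bits[:4], bits[4:]]))     # two blocks of 4
print(run(read_int8(), [[b] for b in bits]))      # single bits
```

All three calls print -2. The parser body never learns how the bits arrived, and only the driver deals with chunk boundaries.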
Taking it one step further, we get conditionals. Assume that we have two possible types, which both serialize into 16 bits and are converted by two conversion functions; call these convertA and convertB. Which type is transmitted is indicated by a single bit preceding the 16 bits of the actual data. If this bit is 1, it is the type which requires convertA; otherwise (if it is 0), the type requires convertB. Again, think about this intuitively. I arrive at something along the lines of: “Well, I need to read a single bit, check whether it is 1 or 0, buffer 16 more bits and call the correct conversion function”. In the DSL, this translates into:
<1> if (buffer[0] == 1)
    <16> output = convertA(buffer[0], buffer[1]);
else
    <16> output = convertB(buffer[0], buffer[1]);
Let’s walk through this precisely. We have an if, which is annotated with a buffer size. This means that a single bit is buffered, the rest of the buffer is padded with 0 and “the if is executed”. That means the condition of the if is evaluated and the right path is taken. The statements in the branches should be familiar: They translate to “Buffer 16 bits, call convertA (or convertB) and store the result in the output”. So overall, this translates into: “Depending on whether the first buffered bit is 0 or 1, buffer 16 more bits and convert them in the right fashion”.
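The walk-through above can be mimicked with a Python generator. Again, this is a sketch under my own assumptions and not the DSL itself: `pack`, `run`, `to_bits` and `read_tagged` are hypothetical names, `yield n` plays the role of `<n>`, buffered bits are assumed to be right-aligned in the 0-padded buffer, and the two conversion functions are made-up stand-ins for convertA and convertB.

```python
def pack(bits):
    """Pack 0/1 values into a zero-padded byte buffer (assumed: right-aligned)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value.to_bytes((len(bits) + 7) // 8, "big")


def run(parser, chunks):
    """Feed bit chunks of any size to a parser coroutine and return its result."""
    pending = []
    try:
        need = next(parser)
        for chunk in chunks:
            pending.extend(chunk)
            while len(pending) >= need:
                bits, pending = pending[:need], pending[need:]
                need = parser.send(pack(bits))
    except StopIteration as done:
        return done.value
    raise EOFError("input ended before the parser finished")


def to_bits(value, width):
    """Most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def convert_a(hi, lo):                       # hypothetical stand-in for convertA
    return ("A", (hi << 8) | lo)


def convert_b(hi, lo):                       # hypothetical stand-in for convertB
    return ("B", (hi << 8) | lo)


def read_tagged():
    buffer = yield 1                         # <1> if (buffer[0] == 1)
    if buffer[0] == 1:
        buffer = yield 16                    # <16> output = convertA(...)
        return convert_a(buffer[0], buffer[1])
    else:
        buffer = yield 16                    # <16> output = convertB(...)
        return convert_b(buffer[0], buffer[1])


stream = [1] + to_bits(0x0102, 16)           # tag bit, then 16 data bits
chunks = [stream[:3], stream[3:13], stream[13:]]   # deliberately uneven
print(run(read_tagged(), chunks))            # prints ('A', 258)
```

The chunk boundaries cut right through the tag bit and the data, and the parser code does not have to care.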
Note that padding in this case means that all bits in the buffer are in a well-defined state (buffered or 0). This padding is independent of any padding in the serialized byte sequence, which is used, for example, to byte-align the various data fields so that the serialized byte sequence is easier to read in a hex editor.
Finally, there is the loop. Assume we need to read a sequence of 8 bit chunks. The first bit of a chunk is 1 if there is another chunk and it is 0 if the current chunk is the last chunk in the sequence of chunks. Of each chunk, the remaining 7 bits are some data we store and collect. Think about this intuitively again. I arrive at something along the lines: “Well, read a bit, check if it is 1 or 0, and if it is 1, read data and continue, and otherwise stop.” In the DSL, this translates into:
<1> while (buffer[0] == 1) <7> storeData(buffer[0]);
Examining this, we get: Read a bit, and if this is 1, read 7 more bits, store these and continue; otherwise, if the read bit was 0, stop. Again, no need to care about buffering at all. I just need to focus on the description of the byte sequence and I can translate it into clean code, which should do the right thing in a very clear way. Considering that serialization and deserialization is a pretty tricky topic, I think this is a very strong property, because now it is very easy to verify whether this code does what it is supposed to do. I have been doing exactly that for you all along.
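As a sketch, the same loop can be modelled with a Python generator, where the condition buffers 1 bit on every evaluation and the body buffers 7 more. This is my own illustration, not the DSL implementation: `pack`, `run`, `to_bits` and `read_chunks` are hypothetical names, `yield n` stands in for `<n>`, and buffered bits are assumed to be right-aligned in the 0-padded buffer. It follows the reading above literally, so it stops at the first 0 flag bit without reading the rest of that chunk.

```python
def pack(bits):
    """Pack 0/1 values into a zero-padded byte buffer (assumed: right-aligned)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value.to_bytes((len(bits) + 7) // 8, "big")


def run(parser, chunks):
    """Feed bit chunks of any size to a parser coroutine and return its result."""
    pending = []
    try:
        need = next(parser)
        for chunk in chunks:
            pending.extend(chunk)
            while len(pending) >= need:
                bits, pending = pending[:need], pending[need:]
                need = parser.send(pack(bits))
    except StopIteration as done:
        return done.value
    raise EOFError("input ended before the parser finished")


def to_bits(value, width):
    """Most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def read_chunks():
    data = []
    while True:
        buffer = yield 1                     # <1> while (buffer[0] == 1)
        if buffer[0] != 1:
            return data                      # flag bit 0: stop
        buffer = yield 7                     # <7> storeData(buffer[0])
        data.append(buffer[0])


# Two chunks with flag 1, then a final chunk with flag 0.
stream = [1] + to_bits(5, 7) + [1] + to_bits(100, 7) + [0] + to_bits(3, 7)
print(run(read_chunks(), [[b] for b in stream]))   # prints [5, 100]
```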
Finally, there is one last concept about buffering: not buffering. Confusion aside, consider the following definition: A sequence of bytes, terminated by a 0 byte. This should be an issue given your current knowledge, because the condition of a loop appears to consume the byte it examines. However, in this case, we just need to inspect the byte for the condition and hand it to the body of the loop in order to process it further. Are we lost? Well. No. In this situation, we decided to define that a buffer size of 0 means that no buffering occurs, and the buffer of the last buffering is reused. So, we can implement the definition from above as:
<8> while (buffer[0] != 0) <0> storeData(buffer[0]);
This reads as: Buffer 8 bits, and if these are not 0, store these 8 buffered bits somehow. Obviously, if there is no buffering before a 0-buffering, the buffer will be undefined. This has not posed a problem for us so far. Overall, I am not entirely sure about this decision, even though it is working pretty well. At the time we made it, it was a pretty pragmatic decision, and the more I think about it, the more I like it, but somehow I never quite get to the point of loving it. I mean, it obviously does make sense that no buffering reuses the previous buffer, but it still feels like a little hack. Then again, it works. I guess a clean solution would be to decouple the buffering from individual statements and expressions and allow separate buffer statements. However, this will result in trickery in the loops again.
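The reuse rule for a buffer size of 0 can be sketched with a Python generator as well. As before, this is only an illustration under my own assumptions and not the DSL implementation: `pack`, `run`, `to_bits` and `read_until_zero` are hypothetical names, `yield n` stands in for `<n>`, and the driver simply hands back the previously filled buffer when 0 bits are requested.

```python
def pack(bits):
    """Pack 0/1 values into a zero-padded byte buffer (assumed: right-aligned)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value.to_bytes((len(bits) + 7) // 8, "big")


def run(parser, chunks):
    """Feed bit chunks to a parser coroutine; a request for 0 bits reuses the last buffer."""
    pending, last = [], None                 # last stays None until something is buffered
    try:
        need = next(parser)
        for chunk in chunks:
            pending.extend(chunk)
            while len(pending) >= need:
                if need > 0:                 # real buffering: consume bits
                    bits, pending = pending[:need], pending[need:]
                    last = pack(bits)
                need = parser.send(last)     # <0>: previous buffer is sent again
    except StopIteration as done:
        return done.value
    raise EOFError("input ended before the parser finished")


def to_bits(value, width):
    """Most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def read_until_zero():
    data = []
    while True:
        buffer = yield 8                     # <8> while (buffer[0] != 0)
        if buffer[0] == 0:
            return data                      # terminating 0 byte
        buffer = yield 0                     # <0>: no buffering, reuse the buffer
        data.append(buffer[0])               # storeData(buffer[0])


stream = [bit for byte in b"hi\x00" for bit in to_bits(byte, 8)]
chunks = [stream[i:i + 5] for i in range(0, len(stream), 5)]   # 5 bits at a time
print(run(read_until_zero(), chunks))        # prints [104, 105]
```

In this sketch the "undefined buffer" case shows up as `None`: a parser that asks for 0 bits before any real buffering simply receives nothing useful.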
So, this is the overall DSL on a high level. You just pick C, add some buffering directives here and there, and you have this DSL, with very few surprises.