
Greetings,

in this post, I will explain a problem I encountered and solved with a DSL for reading coroutines. This DSL is part of the code generator in a compiler. The compiler reads a type description and produces serialization and deserialization code for the values that the type description covers. The reading coroutines simplify generating the deserialization code. Let’s go through this step by step.

First, there is the type description. The current type description is a condensed form of a parsed XSD. Basically, it contains a number of elements, each with a name and a type, and the type defines which values are valid. You can think of this as a variable declaration with a certain type in Java. In Java, you see something along the lines of “int numberOfCats;” and you know that numberOfCats holds a whole number from Integer.MIN_VALUE to Integer.MAX_VALUE, without any fraction. The only difference is that the type description is more flexible than this. For example, if you define a numeric type, you can (optionally) specify the number of fraction digits, the total number of digits, the minimum value and the maximum value. Overall, because it is derived from XSDs, this type description is very general and contains fun things such as:

  • lists with an optional maximum length
  • unions of types
  • numbers with optional limits and precision
  • strings with optional maximum and minimum length

In short, it would be easier to list things this type description cannot represent.
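To make the idea of optional limits a bit more concrete, here is a minimal sketch of how a numeric type without fraction digits could be represented and checked in C. The names integer_type, count_digits and integer_type_accepts are made up for this illustration, and the layout is an assumption; the real type description in the compiler is richer and probably looks different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a whole-number type. Every limit is
 * optional, so each one carries a flag that says whether it is set. */
typedef struct {
    bool has_min;
    int64_t min;
    bool has_max;
    int64_t max;
    bool has_total_digits;
    unsigned total_digits;
} integer_type;

/* Counts the decimal digits of value, ignoring the sign. */
static unsigned count_digits(int64_t value) {
    /* Unsigned arithmetic avoids overflow when negating INT64_MIN. */
    uint64_t magnitude = value < 0 ? (uint64_t)0 - (uint64_t)value
                                   : (uint64_t)value;
    unsigned digits = 1;
    while (magnitude >= 10) {
        magnitude /= 10;
        digits++;
    }
    return digits;
}

/* Returns true if value satisfies every limit that is set on the type. */
static bool integer_type_accepts(const integer_type *t, int64_t value) {
    assert(t != NULL);
    if (t->has_min && value < t->min) {
        return false;
    }
    if (t->has_max && value > t->max) {
        return false;
    }
    if (t->has_total_digits && count_digits(value) > t->total_digits) {
        return false;
    }
    return true;
}
```

With a minimum of 0, a maximum of 999 and at most two total digits, this sketch accepts 42 but rejects 100 (three digits) and -1 (below the minimum). A type with no limits set accepts every value.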

Given this description, we need to generate a serialization and a deserialization function. We use the term serialization to describe the transformation of a value of a given type into a well-defined bit sequence, while deserialization describes the transformation of a well-defined, well-formed bit sequence back into a value. For example, the serialization of a boolean is implemented by writing a 1 if the boolean is true and a 0 if it is false, and the serialization of an integer is implemented by writing the two's complement bit representation of the number. For a sequence, we exploit the fact that we know a lot about the value, because we know its type: we can implement the serialization of a value of a sequence type by applying the serialization of each member value, one after the other.
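The three cases above can be sketched in C roughly like this. Everything here is an assumption made for the illustration: the bit_writer helper, the most-significant-bit-first order, the fixed 16-bit integer width and the cat_record sequence type are hypothetical and not taken from the actual generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit writer: appends bits, most significant bit first,
 * to a caller-provided byte buffer. */
typedef struct {
    uint8_t *bytes;
    size_t capacity_bits;
    size_t bit_count; /* number of bits written so far */
} bit_writer;

static void bit_writer_init(bit_writer *w, uint8_t *storage, size_t size_bytes) {
    memset(storage, 0, size_bytes); /* unwritten bits stay 0 */
    w->bytes = storage;
    w->capacity_bits = size_bytes * 8;
    w->bit_count = 0;
}

static void write_bit(bit_writer *w, int bit) {
    assert(w->bit_count < w->capacity_bits);
    if (bit) {
        w->bytes[w->bit_count / 8] |= (uint8_t)(0x80u >> (w->bit_count % 8));
    }
    w->bit_count++;
}

/* Boolean: a single 1 for true, a single 0 for false. */
static void serialize_bool(bit_writer *w, int value) {
    write_bit(w, value != 0);
}

/* Integer: the two's complement representation, here assumed 16 bits wide. */
static void serialize_int16(bit_writer *w, int16_t value) {
    /* Converting to unsigned yields the two's complement bit pattern. */
    uint16_t raw = (uint16_t)value;
    for (int i = 15; i >= 0; i--) {
        write_bit(w, (raw >> i) & 1u);
    }
}

/* Hypothetical sequence type: a boolean member followed by an integer member. */
typedef struct {
    int has_tail;
    int16_t weight;
} cat_record;

/* Sequence: serialize each member, one after the other. */
static void serialize_cat_record(bit_writer *w, const cat_record *c) {
    serialize_bool(w, c->has_tail);
    serialize_int16(w, c->weight);
}
```

Serializing a cat_record with has_tail = 1 and weight = 5 into a three-byte buffer writes 17 bits and leaves the bytes 0x80, 0x02, 0x80 in the buffer. Note that the value does not end on a byte boundary, which is one reason why the length of a serialized value is awkward to deal with on the receiving side.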

The problem appears once you consider the context in which the generated code is supposed to be used. The generated code is supposed to be part of a communication stack on an embedded system (well, also on an enterprise server, so that you have a consistent communication stack on all parts of the system). Given this, you cannot assume that you receive the entire bit sequence for a value in a single go, deserialize the value and go back to sleep. Such an embedded system usually receives data in chunks, and the size of the chunks is determined by the receive buffer size. Furthermore, we cannot precompute how long the bit sequence for a type will be, so we cannot require the client to prebuffer enough data (consider the union of a list without a length restriction and a boolean, for example). Thus, we needed a way to handle the deserialization of such a segmented bit sequence.

The first approach we considered was to try to deserialize the current buffer and either produce a value or reject the buffer. If the buffer is rejected, the user has to buffer more data and retry the deserialization with the larger buffer. If a value is produced, the user has to process it. The problem with this approach is that many redundant deserialization attempts will happen, because every retry starts again from the first bit, and the client has to keep a lot of buffered data around that they do not actually want to store.

Given this, I figured we could use a coroutine with a state machine to solve these problems. The coroutine reads the current segment of the bit sequence, remembers all necessary information and potentially produces a value. If no value is produced, the buffer has been consumed completely and can be reused. If a value is produced, the value can be processed and the remainder of the buffer is fed into the coroutine again. This approach has the advantage that the user of the generated code can buffer a natural amount of bits (pretty much the receive buffer size of their system) and feed this into the coroutine. There are no redundant deserialization attempts, because every bit of the input bit sequence is considered exactly once, and no information is stored redundantly, because the coroutine stores just the information it needs and nothing more. The buffer can be discarded after the coroutine has consumed its contents.
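Here is a minimal hand-written sketch of such a reading coroutine in C, for a hypothetical sequence type consisting of one boolean followed by a 16-bit two's complement integer, read most significant bit first. All names (cat_record, cat_reader, cat_reader_feed) are made up for this illustration. The sketch also keeps its state in a struct passed by the caller rather than in a static variable, which is a simplification of mine and not necessarily what the generated code does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value type: a boolean followed by a 16-bit integer. */
typedef struct {
    int has_tail;
    int16_t weight;
} cat_record;

typedef enum { READ_HAS_TAIL, READ_WEIGHT } reader_state;

/* Everything the coroutine has to remember between two chunks. */
typedef struct {
    reader_state state;
    uint16_t partial;     /* integer bits collected so far */
    unsigned bits_needed; /* bits still missing for the current field */
    cat_record value;     /* value under construction */
} cat_reader;

static void cat_reader_init(cat_reader *r) {
    r->state = READ_HAS_TAIL;
    r->partial = 0;
    r->bits_needed = 1;
}

/* Interprets 16 raw bits as two's complement without relying on
 * implementation-defined conversions. */
static int16_t from_twos_complement(uint16_t raw) {
    return raw >= 0x8000u ? (int16_t)((int32_t)raw - 65536) : (int16_t)raw;
}

/* Consumes bits [*pos, bit_len) of buf and advances *pos past every
 * consumed bit. Returns 1 and fills *out as soon as a value is complete;
 * returns 0 when the chunk is used up without completing a value. */
static int cat_reader_feed(cat_reader *r, const uint8_t *buf, size_t bit_len,
                           size_t *pos, cat_record *out) {
    assert(*pos <= bit_len);
    while (*pos < bit_len) {
        unsigned bit = (buf[*pos / 8] >> (7 - *pos % 8)) & 1u;
        (*pos)++;
        switch (r->state) {
        case READ_HAS_TAIL:
            r->value.has_tail = (int)bit;
            r->state = READ_WEIGHT;
            r->partial = 0;
            r->bits_needed = 16;
            break;
        case READ_WEIGHT:
            r->partial = (uint16_t)((r->partial << 1) | bit);
            if (--r->bits_needed == 0) {
                r->value.weight = from_twos_complement(r->partial);
                *out = r->value;
                cat_reader_init(r); /* ready for the next value */
                return 1;
            }
            break;
        }
    }
    return 0;
}
```

Feeding the bytes 0x80, 0x02 and 0x80 as three separate one-byte chunks returns 0 for the first two chunks and 1 for the third, with has_tail = 1, weight = 5 and *pos == 1. The caller can discard each chunk once the call returns 0, and after a 1 it can call cat_reader_feed again with the same buffer and the updated *pos to hand the remaining bits to the next value.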

However, writing these coroutines is fairly tricky. You need to keep a static variable that contains the current state, you need to remember how many bits to buffer in order to execute the next deserialization step, you need to keep track of a lot of technical bookkeeping (updating the buffer size, the remaining unconsumed bits), and overall you need to write a lot of C code to do all this. I wrote a prototype of such a coroutine, got a headache from it and decided that I cannot have inexperienced C programmers write such nontrivial code. Thus, I decided to solve this problem with a DSL.
