Pearls: Jil, serialization of primitives

The last pearl of design that I covered was an implementation of the discriminated union in the protobuf-net library. Now, it’s time to move to an area that is less esoteric in terms of the format, but still intriguing in terms of performance. Time to take a look at the fastest JSON serializer available for .NET, Jil.

Is it fast?

Jil is crazy fast. As always, the best way to do something fast is to skip some steps. You can think about skipping some computations, to effectively lower the load on the CPU. When speaking about programming in .NET, one additional thing to skip is allocations. And when speaking about allocations and JSON, the ones that should come to mind are strings. I don’t mean strings per se, but objects that are transformed into strings before being written to the actual output.

A unique case of GUID

GUID in .NET (or Guid) stands for Globally Unique Identifier. It’s a structure, 16 bytes long, providing a unique identity generator (let’s not deal with comb guids and other types for now). It’s frequently used whenever it’s the client’s responsibility to generate an id. It’s also cheap, as you don’t need to call a third party to get a globally unique id.

Let’s take a look at how Guid is serialized by another library for .NET, probably the most popular one, JSON.NET.

As you can see in its source on GitHub, JSON.NET first calls .ToString, allocating a 36-character string (32 hex digits plus four hyphens), just to pass it to the TextWriter instance. Yes, this string will end its life shortly after being written to the writer, but still, it will be allocated. What could be done better?

Jil to the rescue

We’ve got a text writer. If we had an additional char[] buffer, we could write the characters of the Guid to the buffer and then pass it to the text writer. Unfortunately, Guid does not provide a method to write its output to a buffer. Jil provides one instead. It has an additional Guid structure that gives access to the values of Guid’s internal fields. With this and a buffer, we can write to a text writer without allocating an actual string. The code is much too big to paste here. Probably it must be big to be that fast ;-) Just follow this link to see the selected lines.
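The idea can be sketched in a few lines. This is a minimal illustration, not Jil’s actual code: the names GuidOverlay and GuidWriter are made up for this example, and the overlay assumes the in-memory field order that Guid uses in current .NET runtimes (an int, two shorts and eight bytes), which is an implementation detail. Jil’s real version is considerably longer and more heavily optimized.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical overlay: the named fields share memory with the Guid,
// so its internal values can be read without calling ToString().
// Assumes Guid's layout of int, short, short, 8 x byte.
[StructLayout(LayoutKind.Explicit)]
public struct GuidOverlay
{
    [FieldOffset(0)] public Guid Value;
    [FieldOffset(0)] public uint A;
    [FieldOffset(4)] public ushort B;
    [FieldOffset(6)] public ushort C;
    [FieldOffset(8)] public byte D;
    [FieldOffset(9)] public byte E;
    [FieldOffset(10)] public byte F;
    [FieldOffset(11)] public byte G;
    [FieldOffset(12)] public byte H;
    [FieldOffset(13)] public byte I;
    [FieldOffset(14)] public byte J;
    [FieldOffset(15)] public byte K;
}

public static class GuidWriter
{
    private const string Hex = "0123456789abcdef";

    // The allocating way: a temporary 36-character string per call.
    public static void WriteAllocating(TextWriter writer, Guid guid)
        => writer.Write(guid.ToString());

    // The buffer way: formats into a caller-owned buffer (at least 36 chars)
    // and hands that buffer to the writer, with no intermediate string.
    public static void Write(TextWriter writer, Guid guid, char[] buffer)
    {
        var o = new GuidOverlay { Value = guid };
        int pos = 0;
        pos = WriteHex(buffer, pos, o.A, 8);
        buffer[pos++] = '-';
        pos = WriteHex(buffer, pos, o.B, 4);
        buffer[pos++] = '-';
        pos = WriteHex(buffer, pos, o.C, 4);
        buffer[pos++] = '-';
        pos = WriteHex(buffer, pos, o.D, 2);
        pos = WriteHex(buffer, pos, o.E, 2);
        buffer[pos++] = '-';
        pos = WriteHex(buffer, pos, o.F, 2);
        pos = WriteHex(buffer, pos, o.G, 2);
        pos = WriteHex(buffer, pos, o.H, 2);
        pos = WriteHex(buffer, pos, o.I, 2);
        pos = WriteHex(buffer, pos, o.J, 2);
        pos = WriteHex(buffer, pos, o.K, 2);
        writer.Write(buffer, 0, pos);
    }

    // Writes the lowest `digits` hex digits of value, most significant first.
    private static int WriteHex(char[] buffer, int pos, uint value, int digits)
    {
        for (int i = digits - 1; i >= 0; i--)
        {
            buffer[pos + i] = Hex[(int)(value & 0xF)];
            value >>= 4;
        }
        return pos + digits;
    }
}
```

The buffer can be allocated once and reused for every Guid written, which is where the saving comes from.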

Summary

The optimization for the Guid type described above is just one of many that you can find in the awesome Jil library. As always, “do not allocate” is one of the top commandments when talking about performance.

Pearls: the protobuf’s discriminated union

Google Protocol Buffers is a proven protocol for serializing data efficiently. It has wide adoption, enabling serialization for almost every platform and making the data easy to exchange between platforms. To store its schema, you can use .proto files, which describe messages in a platform-agnostic format. You can see an example below:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

One of

Sometimes you want to define a message that will have only one of its fields initialized. This can be useful when sending a message that wraps one of multiple types, or in any other case that requires it. Take a look at the example, providing a wrapper around three messages of the following types: MessageA, MessageB and MessageC.

message Wrapper {
  oneof OnlyOneOf {
    MessageA a = 1;
    MessageB b = 2;
    MessageC c = 3;  
  }
}

Protocol buffers use a tag value to store any field. First, the tag (1, 2, 3 above) and the type of the field are stored; next, the value is written. In the case above, where only one field is assigned, only that field’s tag, type and value would be written. Do we need to create a class that will contain all the fields? Do we need to waste space on storing fields that are never set?
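To make the “tag, type, value” order concrete, here is a small sketch of how a single embedded message field could be laid out in bytes. It follows the documented protobuf encoding, where the key is (field number << 3) | wire type and embedded messages use the length-delimited wire type 2. The WireSketch class and its method names are hypothetical, written only for this illustration; this is not code from protobuf-net.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the wire layout of one field.
public static class WireSketch
{
    // Wire type used for strings, bytes and embedded messages.
    public const int LengthDelimited = 2;

    // The key combines the field's tag with its wire type.
    public static uint MakeKey(int fieldNumber, int wireType)
        => ((uint)fieldNumber << 3) | (uint)wireType;

    // Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    public static void WriteVarint(List<byte> output, uint value)
    {
        while (value >= 0x80)
        {
            output.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.Add((byte)value);
    }

    // Key first, then the payload length, then the payload itself.
    public static byte[] WriteEmbedded(int fieldNumber, byte[] payload)
    {
        var output = new List<byte>();
        WriteVarint(output, MakeKey(fieldNumber, LengthDelimited));
        WriteVarint(output, (uint)payload.Length);
        output.AddRange(payload);
        return output.ToArray();
    }
}
```

For the Wrapper message, setting only b (tag 2) would produce a key byte of 0x12 followed by the length and the bytes of MessageB; nothing is written for a and c.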

Protobuf-net Discriminated Union

To fix this wasteful generation, protobuf-net, a library built by Marc Gravell, added Discriminated Union types that are capable of addressing it. Let’s just take a look at the first of them.

public struct DiscriminatedUnionObject
{
  private readonly int _discriminator;

  // The value typed as Object
  public readonly object Object;

  // Indicates whether the specified discriminator is assigned
  public bool Is(int discriminator) => _discriminator == ~discriminator;

  // Create a new discriminated union value
  public DiscriminatedUnionObject(int discriminator, object value)
  {
    _discriminator = ~discriminator; // avoids issues with default value / 0
    Object = value;
  }
}

Let’s walk through all the design choices that have been made here:

  1. DiscriminatedUnionObject is a struct. It means that if a class has a field of this type, it will be stored in the object itself, without additional allocations (you can think of it as inlining the structure, creating a “fat object”).
  2. It has only one field, Object, for storing the value (no matter which type it is).
  3. It has only one field, called _discriminator, to store the tag of the field.

If you generated the Wrapper class, it’d have only one field, of the DiscriminatedUnionObject type. Once a message of a specified type is set, the discriminator and the value are stored in the union. Simple and efficient.
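A hand-written sketch of such a Wrapper class could look as follows. It is only an approximation of what a generator might emit: the empty message classes, the _onlyOneOf field name and the property shapes are assumptions made for this example, the serialization attributes are left out, and the struct is copied from the listing above so that the snippet compiles on its own.

```csharp
using System;

// Placeholder message types, assumed for the example.
public class MessageA { }
public class MessageB { }
public class MessageC { }

// Copied from the listing above to keep the sketch self-contained.
public struct DiscriminatedUnionObject
{
    private readonly int _discriminator;
    public readonly object Object;

    public bool Is(int discriminator) => _discriminator == ~discriminator;

    public DiscriminatedUnionObject(int discriminator, object value)
    {
        _discriminator = ~discriminator; // avoids issues with default value / 0
        Object = value;
    }
}

public class Wrapper
{
    // The only storage: one int and one reference, whatever the message type.
    private DiscriminatedUnionObject _onlyOneOf;

    public MessageA A
    {
        get => _onlyOneOf.Is(1) ? (MessageA)_onlyOneOf.Object : null;
        set => _onlyOneOf = new DiscriminatedUnionObject(1, value);
    }

    public MessageB B
    {
        get => _onlyOneOf.Is(2) ? (MessageB)_onlyOneOf.Object : null;
        set => _onlyOneOf = new DiscriminatedUnionObject(2, value);
    }

    public MessageC C
    {
        get => _onlyOneOf.Is(3) ? (MessageC)_onlyOneOf.Object : null;
        set => _onlyOneOf = new DiscriminatedUnionObject(3, value);
    }
}
```

Setting B after A simply overwrites the union, so A reads as null again. The bitwise complement in the constructor is what lets a default, never-assigned union answer false to every Is call, including Is(0).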

Summing up

Mapping a generic idea, like a discriminated union, onto a platform or a language isn’t simple. Again, once it’s done in an elegant and efficient way, I truly believe that it deserves to be called a pearl.

Protobuf-linq

I had an idea about querying and projecting over big streams of messages serialized with Google Protocol Buffers. If one needs only a few fields for a projection, why not make it implicit and prepare an optimal way of deserializing only these fields? That’s how Protobuf-linq was born. It’s simple, fast and eager to help you iterate over big streams of data.
Check it out!

Protobuf-net: inheritance of messages

The last post was an introduction to a simple project called Protopedia, located here. The project aims to present, in a simple manner and probably with one test per case, solutions for complex scenarios like versioning, derivation of messages, etc. As versioning was described in the previous entry, it’s the right time to deal with derivation.

Inheritance
It’s a well-known fact that one should favor composition over inheritance. Dealing with derivation trees with plenty of nodes can bring any programmer to his/her knees. How about messaging? Does this rule also apply in this area? It’s common for messages to share a common denominator, containing fields common to all messages (headers, correlation identifiers and so on), especially if they’re meant to be sent/saved as a stream of messages of the base type (example: Event Sourcing with events of a given aggregate). Using a set of messages with a distilled root greatly simplifies the concerns mentioned earlier. Consider the following scenario: serialization of a collection of A messages (or its derivatives), given the following structure:

Message inheritance tree for example

How would Protobuf-net serialize such a collection? First, take a look at the folder from Protopedia. You can notice that all the classes, A, B and C, have been mapped as different types. It’s worth noticing the ProtoInclude attributes, which carry the tag values of the types located one level deeper in the derivation tree. The second important thing is the values of the derived type tags, which do not collide with the tags of the class fields. In the example, you can find a constant value of 10, used to leave room for future versions of the root, the A class. As one can see in the test of the derivation, the child classes of a given class are serialized as fields with tags equal to the tag passed in the ProtoInclude attribute. To see the fields composed the way Protobuf-net serializes inherited messages, take a look at the following message contracts. There’s no magic and the whole idea is rather straightforward: serialize derivatives as fields, turning inheritance into composition. This approach of Protobuf-net should be sufficient and effective for your efforts to serialize inheritance. Nice serializing!