Pearls: Jil, serialization of primitives

The last pearl of design that I covered was an implementation for the discriminated union in the probuf-net library. Now, it’s time to move to an area that is less esoteric in terms of the format, but still intriguing in terms of performance. Time to take a look at the fastest JSON serializer available for .NET, Jil.

Is it fast?

Jil is crazy fast. As always, the best way to do something fast, is to skip some steps. You can think about skipping doing some computations, to effectively lower the load for the CPU. When speaking about programming in .NET, one additional thing to skip are allocations. And when speaking about allocations and JSON, the one that should come to our minds are strings. I don’t mean strings per se, but object that are transformed into them before writing to the actual input.

A unique case of GUID

GUID in .NET (or Guid) stands for Globally Unique Identifier. It’s a structure, 16 bytes long, providing a unique identity generator (let’s not deal with comb guids and other types for now). It’s frequently used whenever its the client responsibility to generate an id. It’s also cheap, as you don’t need to call the third party to get a globally unique id.

Let’s take a look how Guid is serialized by another library for .NET, probably, the most popular one, JSON.NET

As you can see above (and you can see it on GitHub), JSON.NET first calls .ToString, allocating  32 characters in a form of a string, just to pass it to the TextWriter instance. Yes, this instance will end its life shortly after writing it to the writer, but still, it will be allocated. What could be done better?

Jil to the rescue

We’ve got a text writer. If we had an additional buffer of char[] we could write characters from Guid to the buffer and then pass it to the text writer. Unfortunately, Guid does not provide a method to write its output to the buffer. Jil, provides one instead. It has an additional Guid structure that enables to access internal fields’ values of Guid. With this and a buffer, we can write to a text writer without allocating an actual string. The code is much to big to paste it in here. Probably it must be big to be that fast;-) Just follow this link to see the selected lines.

Summary

The mentioned optimization for Guid type is just one of many, that you can find in the awesome Jil library. As always, do not allocate, is one of the top commandments when talking about performance.

Pearls: the protobuf’s discriminated union

Google Protocol Buffers is a proven protocol for serializing data efficiently. It has a wide adoption, enabling serialization for almost every platform, making the data easy to exchange between platforms. To store its schema, you can use .proto files, that enable describing messages in a platform agnostic format. You can see an example below:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

One of

Sometimes you want to define a message that will have only one of its fields initialized. This can be useful when sending a message providing a wrap around multiple types or in any other case that requires it. Take a look at the example, providing a wrap around three messages of the following types: MessageA, MessageB and MessageC.

message Wrapper {
  oneof OnlyOneOf {
    MessageA a = 1;
    MessageB b = 2;
    MessageC c = 3;  
  }
}

Protocol buffers use a tag value to store any field. First, the tag (1, 2, 3 above) and the type of the field is stored, next, the value is written. In the following case, where only one field is assigned, it would write its tag, type and value. Do we need to create a class that will contain all the fields? Do we need to waste this space for storing fields?

Protobuf-net Discriminated Union

To fix this wasteful generation, protobuf-net, a library build by Marc Gravell, added Discriminated Union types that is capable of addressing it. Let’s just take a look at the first of them.

public struct DiscriminatedUnionObject
{
  private readonly int _discriminator;

 // The value typed as Object
 public readonly object Object;

 // Indicates whether the specified discriminator is assigned
 public bool Is(int discriminator) => _discriminator == ~discriminator;

 // Create a new discriminated union value
 public DiscriminatedUnionObject(int discriminator, object value)
 {
  _discriminator = ~discriminator; // avoids issues with default value / 0
  Object = value;
 }
}

Let’s walk through all the design choices that have been made here:

  1. DiscriminatedUnionObject is a struct. It means, that if a class have a field of this type, it will be stored in the object, without additional allocations (you can think of it as inlining the structure, creating a “fat object”)
  2. It has only one field for storing the value Object. (no matter which type is it).
  3. It has only one field, called _discriminator to store the tag of the field.

If you generated the Wrapper class, it’d have only one field, of the DiscriminatedUnionObject type. Once a message of a specified type is set, the discriminator and the value would be written in the union. Simple, and efficient.

Summing up

Mapping a generic idea, like a discriminated union, into a platform or a language isn’t simple. Again, once it’s made in an elegant and an efficient way, I truly believe that it’s worth to be named as a pearl.

Pearls: EventStore transaction log

I thought for a while about presenting a few projects which are in my opinion real pearls. Let’s start with the EventStore and one in one of its aspects: the transaction log.
If you’re not familiar with this project, EventStore is a stream database providing complex event processing. It’s oriented around streams of events, which can be easily aggregated or repartitioned with projections. Based on ever appended streams and projections chasing the streams one can build a truly powerful logic around processing events.
One of the interesting aspects of EventStore is its storage engine. You can find a bit of description in here. ES does not abstract a storage away, the storage is a built-in part of the database itself. Let’s take a look at its parts before discussing its further:

Appending to the log
One the building blocks of ES is SEDA architecture – the communication within db is based on publishing and consuming messages, which one can notice reviewing StorageWriterService. The service subscribes to multiple messages, mentioned in implementations of the IHandle interface. The arising question is how often does the service flushed it’s messages to disk. One can notice, that method EnqueueMessage beside enqueuing incoming messages counts ones marked by interface IFlushableMessage. What is it for?
Each Handle method call Flush at its very end. Additionally, as the EnqueueMessage increases the counter of messages requiring flush, each Handle method decreases the counter when it handles a flushable message. This brings us to the conclusion that the mentioned counter is equal 0 iff there are no more flushable messages in the queue.

Flushing the log
Once the Flush is called a condition is checked whether:

  • the call was made with force=true (this never happens) or
  • there are no more flush messages in the queue or
  • the given time from the last time has passed

This provides a very powerful batching behavior. Under stress, the flush-to-be counter will be constantly greater than 0, providing flushing every given period of time. Under less stress, with no more flushables in the queue, ES will flush every message which needs to flush the log file.

Acking the client
The final part of the processing is the acknowledgement part. The client should be informed about persisting a transaction to disk. I spent a bit of time (with help of Greg Young and James Nugent) of chasing the place where the ack is generated. It does not happen in the StorageWriterService. What’s responsible for considering the message written then? Here comes the second part of the solution, the StorageChaser. In a dedicated thread, in an infinite loop, a method ChaserIteration is called. The method tries to read a next record from a chunk of unmanaged memory, that was ensured to be flushed by the StorageWriterService. Once the chaser finds CommitRecord, written when a transaction is commited, it acks the client by publishing the StorageMessage.CommitAck in ProcessCommitRecord method. The message will be translated to a client message, confirming the commit and sent back to the client.

Sum up
One cannot deny the beauty and simplicity of this solution. One component tries to flush as fast as possible, or batches a few messages if it cannot endure the pressure. Another one waits for the position to which a file is flushed to be increased. Once it changes, it reads the record (from the in-memory chunk matched with the file on disk) processes it and sends acks. Simple and powerful.