Pearls: the protocol for monitoring NServiceBus

It’s time for another pearl of design, speed and beauty at the same time. Today, I’m bringing you a protocol used by NServiceBus to efficiently report its measurements to a monitoring endpoint. It’s really cool. Take a look! Not that I co-authored it or something… 😉

Measure everything

One of the assumptions behind monitoring NServiceBus was the ability to measure everything. By everything, I mean a few values like Processing Time, Critical Time, etc. This, multiplied by the number of messages an endpoint processes, can easily add up. Of course, if your endpoint processes 1 or 2 messages per second, the way you serialize data won’t make a difference. Now, imagine processing 1000 messages a sec. How would you record and report 1000 messages every single second? Do you think that “just use JSON” would work in this case? Nope, it would not.

How to report

NServiceBus is all about messages and, given that the messaging infrastructure is already in place, using messages for reporting messaging performance was the easiest choice. Yes, it gets a bit meta (sending messages about messages), but this was also the easiest option for clients to use. As I mentioned, everything was already in place.


As you can imagine, a custom protocol for custom needs like this could help. Several values needed to be sent for every measurement being reported:

  1. the reporting time
  2. the value of a metric (depending on a metrics type it can have a different meaning)
  3. the message type

This triple enables a lot of aggregations and makes it possible to deal with out-of-order messages (temporal ordering) if needed. How to report these three values? Let’s first consider a naive approach and estimate the needed sizes:

  1. the reporting time – DateTime (8 bytes)
  2. the value of a metric – long (8 bytes)
  3. the message type (N bytes using UTF8 encoding)

You can see that besides the 16 bytes, we’re paying a huge tax for sending the message type over and over again. Sending it 1000 times a second does not make sense, does it? What we could do is send every message type once per message and assign it an identifier that is reused within that single message. This would prefix every message with a dictionary of the message types used in that specific message, a Dictionary<string,int>, and leave the tuple in the following shape:

  1. the reporting time – DateTime (8 bytes)
  2. the value of a metric – long (8 bytes)
  3. the message type id – int (4 bytes)

20 bytes for a single measurement is not a big number. Can we do better? You bet!

As measurements are taken in temporal proximity, the difference between reporting times won’t be that big. If we extracted the minimal date to the header, we could just send the difference between that starting date and the date of the entry. This would make the tuple look like:

  1. the reporting time difference – int (4 bytes)
  2. the value of a metric – long (8 bytes)
  3. the message type – int (4 bytes)

16 bytes per measurement? Even if we’re recording 1000 messages a sec, this gives just 16,000 bytes, roughly 16 KB per second (plus the header). It’s not that big.
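To put numbers on the three layouts discussed above, here is a small Python sketch. The 40-byte type name is an illustrative assumption (real type names vary), and the per-message header is left out of the calculation.

```python
# Bytes per measurement for the three layouts discussed above.
TYPE_NAME_BYTES = 40        # assumed UTF-8 length of a message type name (illustrative)
MESSAGES_PER_SECOND = 1000

naive = 8 + 8 + TYPE_NAME_BYTES   # DateTime + long + type name sent every time
with_dictionary = 8 + 8 + 4       # DateTime + long + type id
with_time_delta = 4 + 8 + 4       # time difference + long + type id

layouts = [
    ("naive", naive),
    ("dictionary", with_dictionary),
    ("time delta", with_time_delta),
]
for name, size in layouts:
    print(f"{name}: {size} B per entry, {size * MESSAGES_PER_SECOND} B per second")
```

Under that assumption, the naive layout costs more than three times as much as the final one, and the gap grows with the length of the type names.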

Final protocol

The final protocol consists of:

  1. the prefix
    1. the minimum date for all the entries in a message (8 bytes)
    2. the dictionary of message types mapped to ints (variable length)
  2. the array of
    1. tuples each having
      1. the reporting time difference – int (4 bytes)
      2. the value of a metric – long (8 bytes)
      3. the message type – int (4 bytes)

With this schema written in binary, we can measure everything.
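A minimal sketch of such a layout, written in Python rather than .NET, could look as follows. It is not the actual NServiceBus code. The byte order (little-endian), the way the dictionary is laid out (an entry count, then id, name length and UTF-8 name per entry) and the function names are all assumptions made for illustration; the text above only fixes the field sizes.

```python
import struct

def encode(measurements):
    """measurements: list of (ticks, value, message_type); ticks is an integer timestamp."""
    min_ticks = min(ticks for ticks, _, _ in measurements)
    type_ids = {}
    for _, _, message_type in measurements:
        type_ids.setdefault(message_type, len(type_ids))

    out = bytearray()
    out += struct.pack("<q", min_ticks)        # prefix: minimum date (8 bytes)
    out += struct.pack("<i", len(type_ids))    # assumed: number of dictionary entries
    for message_type, type_id in type_ids.items():
        name = message_type.encode("utf-8")
        out += struct.pack("<ii", type_id, len(name)) + name
    for ticks, value, message_type in measurements:
        # time difference (4 bytes) + value (8 bytes) + type id (4 bytes) = 16 bytes
        out += struct.pack("<iqi", ticks - min_ticks, value, type_ids[message_type])
    return bytes(out)

def decode(payload):
    (min_ticks,) = struct.unpack_from("<q", payload, 0)
    (count,) = struct.unpack_from("<i", payload, 8)
    offset = 12
    types = {}
    for _ in range(count):
        type_id, length = struct.unpack_from("<ii", payload, offset)
        offset += 8
        types[type_id] = payload[offset:offset + length].decode("utf-8")
        offset += length
    result = []
    while offset < len(payload):
        delta, value, type_id = struct.unpack_from("<iqi", payload, offset)
        offset += 16
        result.append((min_ticks + delta, value, types[type_id]))
    return result
```

The sketch assumes that the difference between any entry and the minimum date fits in a 4-byte signed integer, which is exactly the temporal proximity argument made earlier.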


Writing measurements is fun. One part, not mentioned here, is doing it in a thread/task friendly manner. The other, even better, is to design protocols that can deal with the flood of values and won’t break because someone put the pedal to the metal.




On structures that last

Last year I read over 20 books. One of them was Antifragile, by Nassim Nicholas Taleb. One of the ideas that I found intriguing was the following statement (I’m quoting from memory): things that have been here for some time are much more likely to stay than all the new and shiny ones.

Herds & ownership

This is my herd. You and me, we’re in the same herd. This person is from another herd. In herd we trust. We, the herd, share secrets, stories and fun. The herd lasts, building its strength over time. Support, knowing each other, help: you get it all for free. No matter what you call this herd, a team or a group, people have not changed that much. We need herds.

This is mine, that is yours. We own things. Collectively (we, the herd) or individually (“don’t you dare to touch MY phone”). We care about things we own. We care less about things we don’t. We need ownership.

Say Conway’s law, one more time

If you haven’t heard about this law, here it is:

organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations

This one sentence was a reason for never-ending debates, tooooo many presentations and many people nodding and murmuring “Yeah, this is because of Conway’s law”.

Now, think again about structures, Conway’s law and things that last. Is it a valid approach to organize things in a different way? If it is, what things are included in the new order and what are excluded? Is there any chance that by designing a new approach, the proven approaches are being thrown away? Whatever you do, don’t throw the baby out with the bathwater.

Pearls: putting EventStore in reverse

Pearls of design, beautiful patterns, efficient approaches. After covering Jil and its extremely efficient serialization of primitives it’s time to put things in reverse. By things, in this case, I mean EventStore, the event centric database that I already presented once. Now, it’s time to visit its ability to move back in time and traverse its log in the opposite direction.

Log all the things

EventStore uses a traditional log mechanism for storing its data in a transactional way. The mechanism can be described as a never-ending log file, that has new data appended at the end of itself. Of course different databases implement the “never ending file” in different ways, still, the logical approach is the same: whenever data are updated/inserted we log these operations to a file.

Appending operations has an interesting property. Whenever other components break, whenever hardware restarts, the log is the single source of truth that can be traversed and reapplied to all the components that didn’t catch up before the failure. How to make sure that a specific piece of the log was written properly though? That the data we stored are the data that survived the crash?

Make it twice

The relevant piece of code comes from EventStore’s writer. In it, we can see the following:

  1. there’s a buffer, which is a regular MemoryStream
  2. its position is set to 4, leaving 4 initial bytes empty
  3. the whole record is written to the buffer
  4. its length is calculated
  5. the length is written at the beginning and at the end
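The steps above can be sketched in a few lines of Python. This is only an illustration of the technique, not EventStore’s actual code: io.BytesIO stands in for the MemoryStream, and the function name and the little-endian 4-byte length are assumptions.

```python
import io
import struct

def frame_record(record: bytes) -> bytes:
    buffer = io.BytesIO()                      # the buffer
    buffer.seek(4)                             # leave the 4 initial bytes empty
    buffer.write(record)                       # write the whole record
    length = buffer.tell() - 4                 # calculate its length
    buffer.write(struct.pack("<i", length))    # write the length at the end
    buffer.seek(0)
    buffer.write(struct.pack("<i", length))    # and at the beginning
    return buffer.getvalue()
```

For a 5-byte record this produces 13 bytes: the length, the record and the length once more.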

This looks like storing 4 bytes too many. On the other hand, it’s a simple check mechanism, used internally by EventStore to check the consistency of the log. If these two values do not match, something is seriously wrong in chunk (an original comment from the code). Could the same length written twice be used for something more?

Put it in reverse Terry!

EventStore has additional indexing capabilities, that allow it to move back and forth pretty fast. What if we had none and still wanted to travel through the log?

Consider the following scenario. You use a log approach for your service. Then, for any reason, you need to read the 10th entry from the end. Having a regular log, you’d need to do the following:

  1. Read the log from the beginning, counting items till it’s done. Sorry, there’s no other way, and it’s a painfully blunt method of doing it.

If we have the length written twice though, what you can do is read the last 4 bytes of the log, which will always be a length, and move backward to the previous entry. The number of bytes to move backward?

var moveBackBy = length + 2 * sizeof(int)

as each of the two lengths is written as a single integer.

With this, you don’t need any additional index. The log itself is sufficient to move forward and backward.
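Here is a hedged Python sketch of such a backward traversal over an in-memory log. The framing helper, the function names and the little-endian 4-byte length are assumptions for illustration; a real implementation would seek within a file instead of slicing a byte string.

```python
import struct

INT_SIZE = 4  # size of a single length field

def frame(record: bytes) -> bytes:
    """Length, record, length again: the layout described above."""
    length = struct.pack("<i", len(record))
    return length + record + length

def read_backward(log: bytes):
    """Yields records from the last one to the first, using only the lengths."""
    position = len(log)
    while position > 0:
        # The last 4 bytes before `position` are always a length.
        (length,) = struct.unpack_from("<i", log, position - INT_SIZE)
        start = position - (length + 2 * INT_SIZE)   # the moveBackBy from the text
        (prefix,) = struct.unpack_from("<i", log, start)
        if prefix != length:
            raise ValueError("lengths do not match, the log is corrupted")
        yield log[start + INT_SIZE:start + INT_SIZE + length]
        position = start
```

Reading the 10th entry from the end then means taking the 10th item from this generator, for example with itertools.islice, without touching the beginning of the log.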


Data structures and data format nowadays are not popular topics. As you saw above, adding just one integer added a lot of capabilities to a simple append only file. Next time, before “just using” another data store, think about your data and the format you’re gonna use. It can make a real difference.


On sharing your opinion

Two short stories, one topic.


Imagine the following scenario. An interesting discussion that gets people excited. There’s this one new person that doesn’t get involved too much. Actually, if you were counting all the words they said, the number would be 0. Nothing, none, null. The reason for this is simple. They did not go through the same things as the rest. Their experience in this topic is close to none. Their reasoning is as simple as that: no exp, no talk. Is it a valid approach?


Imagine the following scenario. An interesting discussion that gets people excited. There’s this one person that gets involved. They share all the experiences they had with the topic. The reasoning they share, the experience, is so big that all the rest just follow their words. The speaker’s reasoning is as simple as that: big exp, big talk. Is it a valid approach?

What to do?

Recently, I came to the conclusion that there’s no simple answer to these questions. Being an expert sometimes requires you to be quiet for a little while. To listen to others’ experiences and to not flood them with your opinions. They may just follow you.
Being a novice sometimes requires you to share what you don’t know. Just to show a new point of view.
Next time, when debating, think about your opinions again. Maybe it’s time to do it the other way around.

Never ending Append Blobs

In this article I’ll describe an easy and fast way to use Azure Storage Append Blobs to create a never ending Append Blob. Yes, a regular Append Blob has its limitations, including the maximum number of blocks and the size, but with a proper design we can overcome them.

Limits we want to overcome

According to Azure Subscription Storage Limits, an Append Blob is limited in the following way:

  1. Max number of blocks in an append blob: 50,000 – this means that we can append to a single blob only 50,000 times, no matter how much data we add at the same time
  2. Max size of a block in an append blob: 4 MiB – this means that a single operation adding one chunk of data, cannot contain more than 4 MiB.

If we could address the first, it’s highly unlikely that we’d need to address the second. 4 MiB is just enough for a single append operation.

Overcoming the limited number of blocks

Let’s consider a single writer case. Now, let’s agree that we’ll use natural numbers (1, 2, 3, …) to name blobs. Then, whenever an append operation is about to happen, the writer could check the number of already appended blocks by fetching the blob’s properties and create another blob if the number is equal to the max number of blocks. We could also try to append and catch the StorageException, checking for the BlockCountExceedsLimit error code (see Blob Error Codes for more). Then, we’d follow with creating another blob and appending to the newly created one. This case is easy. What about multiple processes, writers trying to append at the same time?

Multiple writers

Multiple writers could use a similar approach. Checking the limit up front is risky though: between fetching the attributes and appending, another writer could have already appended its block, making the fetched number stale. We could stick with the exception handling way of doing it:

  1. get the latest blob name (1, 2, 3, … – natural numbers)
  2. append the block
  3. if (2) throws:
    1. try to create the next one or retrieve existing one
    2. append the block
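The steps above can be sketched with an in-memory stand-in for the storage. Nothing here is the Azure SDK: InMemoryStore, BlockCountExceeded and append are hypothetical names, and the limit is lowered to 3 blocks so the rollover is easy to observe. The loop also generalizes step 3 slightly by retrying in case the next blob has been filled by other writers in the meantime.

```python
import threading

MAX_BLOCKS = 3  # tiny limit for the demo; the real limit quoted above is 50,000

class BlockCountExceeded(Exception):
    """Stands in for a StorageException with the BlockCountExceedsLimit code."""

class InMemoryStore:
    """Hypothetical stand-in for a blob container, not the Azure SDK."""

    def __init__(self):
        self._blobs = {1: []}
        self._lock = threading.Lock()

    def latest_name(self):
        with self._lock:
            return max(self._blobs)

    def append_block(self, name, block):
        with self._lock:
            blocks = self._blobs[name]
            if len(blocks) >= MAX_BLOCKS:
                raise BlockCountExceeded(name)
            blocks.append(block)

    def create_if_not_exists(self, name):
        with self._lock:
            self._blobs.setdefault(name, [])

    def read_all(self):
        with self._lock:
            return [b for name in sorted(self._blobs) for b in self._blobs[name]]

def append(store, block):
    name = store.latest_name()                 # 1. get the latest blob name
    while True:
        try:
            store.append_block(name, block)    # 2. append the block
            return name
        except BlockCountExceeded:             # 3. the blob is full
            name += 1
            store.create_if_not_exists(name)   # create the next one or reuse it
```

A writer never needs to know the block count in advance. It simply reacts to the failure, and writers that race to create the next blob end up appending to the same one.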

This allows multiple writers to write to the same logical chunked blob that is split across multiple physical Append Blobs. Wait a minute, what about ordering?

Ordering multiple writers

With multiple writers A, B, C, … appending blocks

  1. A1, A2 – for A,
  2. B1, B2 – for B,
  3. C1, C2 – for C

the following sequences could be a result of applying this approach:

  1. A1, A2, B1, B2, C1, C2
  2. A1, B1, B2, A2, C1, C2
  3. A1, B1, B2, C1, C2, A2

You can see that this creates a partial order (A1 will always be before A2, B1 before B2, C1 before C2) but different total orders are possible, depending on the speed of the writers. Usually, that’s just fine, as the writers append their own results, their own operations, to the blob, not caring about the others’ results.
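To make the partial order tangible, the following Python sketch enumerates every total order that keeps each writer’s own blocks in sequence. The helper name is made up for this illustration.

```python
from itertools import permutations

writers = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}

def keeps_writer_order(sequence):
    """True if every writer's blocks appear in the order that writer appended them."""
    return all(
        [block for block in sequence if block in blocks] == blocks
        for blocks in writers.values()
    )

all_blocks = [block for blocks in writers.values() for block in blocks]
valid_orders = [p for p in permutations(all_blocks) if keeps_writer_order(p)]
print(len(valid_orders))  # 6! / (2! * 2! * 2!) = 90
```

The three sequences listed above are just three of the 90 total orders that are compatible with the partial order.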


We’ve seen how easy it is to implement a never ending append blob for multiple writers. This is a great enabler in cases where you need a single logical, log-like blob that provides an ordered list of blocks.

Pearls: Jil, serialization of primitives

The last pearl of design that I covered was an implementation of the discriminated union in the protobuf-net library. Now, it’s time to move to an area that is less esoteric in terms of the format, but still intriguing in terms of performance. Time to take a look at the fastest JSON serializer available for .NET, Jil.

Is it fast?

Jil is crazy fast. As always, the best way to do something fast is to skip some steps. You can think about skipping some computations to effectively lower the load on the CPU. When speaking about programming in .NET, one additional thing to skip is allocations. And when speaking about allocations and JSON, the ones that should come to our minds are strings. I don’t mean strings per se, but objects that are transformed into strings before being written to the actual output.

A unique case of GUID

GUID in .NET (or Guid) stands for Globally Unique Identifier. It’s a structure, 16 bytes long, that serves as a unique identity (let’s not deal with comb guids and other types for now). It’s frequently used whenever it’s the client’s responsibility to generate an id. It’s also cheap, as you don’t need to call a third party to get a globally unique id.

Let’s take a look at how Guid is serialized by another library for .NET, probably the most popular one, JSON.NET.

As the code on GitHub shows, JSON.NET first calls .ToString, allocating a string of 36 characters (32 hex digits and 4 hyphens), just to pass it to the TextWriter instance. Yes, this string will end its life shortly after being written to the writer, but still, it will be allocated. What could be done better?

Jil to the rescue

We’ve got a text writer. If we had an additional buffer of char[], we could write characters from Guid to the buffer and then pass it to the text writer. Unfortunately, Guid does not provide a method to write its output to a buffer. Jil provides one instead. It has an additional Guid structure that gives access to the values of Guid’s internal fields. With this and a buffer, we can write to a text writer without allocating an actual string. The code is much too big to paste in here. Probably it must be big to be that fast ;-) Just follow this link to see the selected lines.
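The shape of the technique can be sketched in Python, with a big caveat: Python allocates objects behind the scenes anyway, so this shows only the idea of formatting into a reusable, caller-supplied buffer instead of building an intermediate string. It is not Jil’s code. The function name is made up, and the bytes are taken in the order used by Python’s uuid module, which differs from the in-memory layout of a .NET Guid.

```python
import io
import uuid

HEX = b"0123456789abcdef"
# Positions of the four hyphens in the 36-character textual form.
HYPHENS = (8, 13, 18, 23)

def write_guid(stream, value: uuid.UUID, buffer: bytearray) -> None:
    """Formats value into the caller-supplied 36-byte buffer and writes it out."""
    position = 0
    for byte in value.bytes:
        if position in HYPHENS:
            buffer[position] = ord("-")
            position += 1
        buffer[position] = HEX[byte >> 4]          # high nibble
        buffer[position + 1] = HEX[byte & 0x0F]    # low nibble
        position += 2
    stream.write(buffer)
```

The buffer is created once, for example with bytearray(36), and reused for every Guid written, which is the part that pays off in .NET.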


The optimization for the Guid type mentioned above is just one of many that you can find in the awesome Jil library. As always, “do not allocate” is one of the top commandments when talking about performance.