It’s time for another pearl of design, speed and beauty at the same time. Today, I’m bringing you a protocol used by NServiceBus to efficiently report its measurements to a monitoring endpoint. It’s really cool. Take a look! Not that I co-authored it or something… ;-)
One of the assumptions behind monitoring NServiceBus was ability to measure everything. By everything, I mean a few values like Processing Time, Criticial Time, etc. This, multiplied by a number of messages an endpoint processes can easily add up. Of course, if your endpoints processes 1 or 2 messages per second, the way you serialize data won’t make a difference. Now, imagine processing 1000 messages a sec. How would you record and report 1000 messages every single second? Do you think that “just use JSON” would work in this case? Nope, it would not.
NServiceBus is all about messages, and being given, that the messaging infrastructure is already in place, using messages for reporting messaging performance was the easiest choice. Yes, it gets a bit meta (sending messages about messages) but this was also the easiest to use for clients. As I mentioned, everything was already in place.
As you can imagine, a custom protocol for custom needs like this could help. There were several items that needed to be sent for every item being reported:
This triple, enables a lot of aggregations and enables dealing with out of order messages (temporal ordering) if needed. How to report these three values. Let’s consider first a dummy approach and estimate needed sizes:
You can see that beside 16 bytes, we’re paying a huge tax for sending the message type over and over again. Sending it 1000 times a second does not make sense, does it? What we could do is to send every message type once per message and assign an identifier to reuse it in a single message. This would prefix every message with a dictionary of message types used in the specific message Dictionary<string,int> and leave the tuple in the following shape:
20 bytes for a single measurement is not a big number. Can we do better? You bet!
As measurements are done in a temporal proximity, the difference between reporting times won’t be that big. If we extracted the minimal date to the header, we could just send difference between the starting date and a date for the entry. This would make the tuple look like:
the reporting time difference - int (4 bytes)
16 bytes per measurement? Even if we’re recording 1000 messages a sec this gives just 16kb. It’s not that big.
The final protocol consists of:
With these schema being written binary, we can measure everything.
Writing measurements is fun. One part, not mentioned here, is doing it in a thread/task friendly manner. The other, even better, is to design protocols that can deal with the flood of values, and won’t break because someone pushed a pedal to the metal.