Bounded design

If you wake up a Domain Driven Design fan and scream the word “Bounded” the answer will be immediate and always the same: “Context”. It’s funny that this word is having so much trouble in leaving DDD context. I’d like to encourage you for broadening it a little bit, to a design, architecture space.

Reality strikes back

It’s quite interesting that often, when we hit a limitation of a service of any kind, the first reaction is negative. Probably you heard statements similar to these:

  • Why did they set the value at this level?
  • I want to be able to put a transaction on top of anything
  • I don’t need to change my design. It’s their fault that the created a service that sucks

I’m not writing about malicious providers claiming, that they do something when they don’t. I’m talking about a well described limitations that you can find in a documentation of a tool that you’re using. You can try to kick the wall, but still, the limitation will be  there. The better question to ask is why?

Limit to profit

Every limitation that you put in your service provides more space for designing it. Let’s take into consideration a supported type of data for a custom database. If all values and keys could be only of type int (4 bytes), then I would not need to care about variable lengths of buffers used for storing them! Every pair of a key and a value would be written on 8 bytes! And this is memory aligned as well! And this and that. Let’s see another example.

Often, when using cloud databases there’s a notion of a partition or a shard. You are promised to have some kind of transactionality within one partition, but you can’t use one transaction across partitions. Again, you may raise all the blaming questions, or think about partition as a unit of a scalability. Imagine that you’d have to store all the data of a single database on one machine. That’s at least highly inefficient. Now think. You create partitions from the very beginning. They can be moved to other machines, as all the data of one partition resides on one machine (this is implication, so multiple partitions can reside on the same machine). This could be a game changer, especially, if you’re writing a cloud service like Azure Storage Tables.


You can see the pattern now. Behind majority of these limitations, are some design decisions. They were made to make a service work. Of course there are probably some sloppy ones, but still, whenever you see a limitation like this, think about it. Then, design your solution accordingly.

Semantic logging unleashed

Who didn’t use printf or Console.WriteLine to just something logged. Possibly, you were a bit more advanced and used a custom library that does this line printing to a separate file, or event a rolling file. What’s printed is just a text. If you’re aware enough, you’d put probably some ‘around printed values’. In a moment of doubt, like a downtime or a serious client claiming that he lost money, you’d try to use these and the state of your app to find what’s going on.

You may be on the other spectrum, using approaches like event sourcing where every single business decision is captured in a well-named, explicitly modeled object called an event. Your storage, providing a continuous log of all the events could be treated as the source of the truth. Is there anything in the middle of the spectrum?

Semantic logging is an approach where you model your log entries to be more explicit. You separate the template, from the actual values passed into the specific entry. See the following example:

log.Error("Failed to log on user {UserId}", userId);

The logging template, is constant. It informs about an error that occurred. In this case, it’s a failed log. Is there a schema? Of course, there is. It’s not strict, as it may change whenever a developer augments the statement. Nevertheless, the first part:

"Failed to log on user {UserId}"

provides a schema. The value, passed as the second parameter, is the value for the occurrence of the event. Depending on the storage system, it can be indexed and searched for. Same for templates. If you have this part specifically separated (and we do have it in the semantic approach), it’s easy to index against this part and search for all entries of failed user logons.

I’m not a fan of implicitness, I’d say I’m totally on the opposite side of the spectrum, still, I find the semantic logging approach, explicit enough to capture important information, without thinking about all the schema celebration. Eventually, the only consumer of the schema and the payload is the logging tool. If it’s smart enough to make sense out of it, without spinning tens of VMs or using tens of GB of RAM, I’m fine with this semi-strict approach.

Pearls: the protocol for monitoring NServiceBus

It’s time for another pearl of design, speed and beauty at the same time. Today, I’m bringing you a protocol used by NServiceBus to efficiently report its measurements to a monitoring endpoint. It’s really cool. Take a look! Not that I co-authored it or something… 😉

Measure everything

One of the assumptions behind monitoring NServiceBus was ability to measure everything. By everything, I mean a few values like Processing Time, Criticial Time, etc. This, multiplied by a number of messages an endpoint processes can easily add up. Of course, if your endpoints processes 1 or 2 messages per second, the way you serialize data won’t make a difference. Now, imagine processing 1000 messages a sec. How would you record and report 1000 messages every single second? Do you think that “just use JSON” would work in this case? Nope, it would not.

How to report

NServiceBus is all about messages, and being given, that the messaging infrastructure is already in place, using messages for reporting messaging performance was the easiest choice. Yes, it gets a bit meta (sending messages about messages) but this was also the easiest to use for clients. As I mentioned, everything was already in place.


As you can imagine, a custom protocol for custom needs like this could help. There were several items that needed to be sent for every item being reported:

  1. the reporting time
  2. the value of a metric (depending on a metrics type it can have a different meaning)
  3. the message type

This triple, enables a lot of aggregations and enables dealing with out of order messages (temporal ordering) if needed. How to report these three values. Let’s consider first a dummy approach and estimate needed sizes:

  1. the reporting time – DateTime (8 bytes)
  2. the value of a metric – long (8 bytes)
  3. the message type (N bytes using UTF8 encoding)

You can see that beside 16 bytes, we’re paying a huge tax for sending the message type over and over again. Sending it 1000 times a second does not make sense, does it? What we could do is to send every message type once per message and assign an identifier to reuse it in a single message. This would prefix every message with a dictionary of message types used in the specific message Dictionary<string,int> and leave the tuple in the following shape:

  1. the reporting time – DateTime (8 bytes)
  2. the value of a metric – long (8 bytes)
  3. the message type id – int (4 bytes)

20 bytes for a single measurement is not a big number. Can we do better? You bet!

As measurements are done in a temporal proximity, the difference between reporting times won’t be that big. If we extracted the minimal date to the header, we could just send difference between the starting date and a date for the entry. This would make the tuple look like:

  1. the reporting time difference – int (4 bytes)
  2. the value of a metric – long (8 bytes)
  3. the message type – int (4 bytes)

16 bytes per measurement? Even if we’re recording 1000 messages a sec this gives just 16kb. It’s not that big.

Final protocol

The final protocol consists of:

  1. the prefix
    1. the minimum date for all the entries in a message (8 bytes)
    2. the dictionary of message types mapped to ints (variable length)
  2. the array of
    1. tuples each having
      1. the reporting time difference – int (4 bytes)
      2. the value of a metric – long (8 bytes)
      3. the message type – int (4 bytes)

With these schema being written binary, we can measure everything.


Writing measurements is fun. One part, not mentioned here, is doing it in a thread/task friendly manner. The other, even better, is to design protocols that can deal with the flood of values, and won’t break because someone pushed a pedal to the metal.




Pearls: putting EventStore in reverse

Pearls of design, beautiful patterns, efficient approaches. After covering Jil and its extremely efficient serialization of primitives it’s time to put things in reverse. By things, in this case, I mean EventStore, the event centric database that I already presented once. Now, it’s time to visit its ability to move back in time and traverse its log in the opposite direction.

Log all the things

EventStore uses a traditional log mechanism for storing its data in a transactional way. The mechanism can be described as a never-ending log file, that has new data appended at the end of itself. Of course different databases implement the “never ending file” in different ways, still, the logical approach is the same: whenever data are updated/inserted we log these operations to a file.

Appending operations has an interesting property. Whenever other components break, whenever hardware restarts, the log is the single source of truth that can be traversed and reapplied to all the components that didn’t catch up before failure. How to make sure that a specific piece of log was written properly though? That the data what we stored, are the data that survived the crush?

Make it twice

Below you can see a piece of code that is extracted from the writer:

We can see the following:

  1. there’s a buffer, which is a regular MemoryStream
  2. its position is set to 4, leaving 4 initial bytes empty
  3. the whole record is written to the buffer
  4. its length is calculated
  5. the length is written at the beginning and and at the end

This looks like storing 4 bytes too much. On the other hand, it’s a simple check mechanism, used internally by EventStore to check the consistency of the log. If these two values do not match, something is seriously wrong in chunk (an original comment from the code). Could the same length written twice be used for something more?

Put it in reverse Terry!

EventStore has additional indexing capabilities, that allow it to move back and forth pretty fast. What if we had none and still wanted to travel through the log?

Consider the following scenario. You use a log approach for you service. Then, for any reason you need to read 10th entry from the end. Having a regular log, you’d need to do the following:

  1. Read the log from the beginning counting items till it’s done. Sorry, there’s no other point and it’s painfully blunt method for doing it.

If we have the length written twice though, what you can do is to read 4 last bytes of the log, it will always be length and move backward to the previous entry. The number of bytes to move backward?

var moveBackBy = length + 2 * sizeof(int)

as two lengths are written on a single integer.

With this, you don’t need any additional index. The log itself is sufficient enough to move forward and backward.


Data structures and data format nowadays are not popular topics. As you saw above, adding just one integer added a lot of capabilities to a simple append only file. Next time, before “just using” another data store, think about your data and the format you’re gonna use. It can make a real difference.


Never ending Append Blobs

In this article I’ll describe an easy and fast way to use Azure Storage Append Blobs to create a never ending Append Blob. Yes, a regular Append Blob has its limitations, including the maximum number of blocks and the size, but with a proper design we can overcome them.

Limits we want to overcome

According to Azure Subscription Storage Limits, an Append Blob is limited in the following way:

  1. Max number of blocks in an append blob: 50,000 – this means that we can append to a single blob only 50,000 times, no matter how much data we add at the same time
  2. Max size of a block in an append blob: 4 MiB – this means that a single operation adding one chunk of data, cannot contain more than 4 MiB.

If we could address the first, it’s highly unlikely that we’d need to address the second. 4 MiB is just enough for a single append operation.

Overcoming the limited number of blocks

Let’s consider a single writer case. Now, agree that we’ll use natural number (1, 2, 3, …) to name blobs. Then, whenever an append operation is about to happen, the writer could check the number of already appended blocks by fetching blob’s properties and create another one, if the number is equal to the max number of blocks. We could also try to append and catch the StorageException, checking for BlockCountExceedsLimit error code (see Blob Error Codes for more). Then, we’d follow with creating another blob and appending to the newly created one. This case is easy. What about multiple processes, writers trying to append at the same time?

Multiple writers

Multiple writers could use a similar approach. There’s also a risk of not being able to check for the limit. When you fetch attributes, another writer could already append their block making the number invalid. We could stick with the exception handling way of doing it:

  1. get the latest blob name (1, 2, 3, … – natural numbers)
  2. append the block
  3. if (2) this throws:
    1. try to create the next one or retrieve existing one
    2. append the block

This allows multiple writers to write to the logically same chunked blob, that is split across multiple physical Append Blobs. Wait a minute, what about ordering?

Ordering multiple writers

With multiple writers A, B, C, … appending blocks

  1. A1, A2 – for A,
  2. B1, B2 – for B,
  3. C1, C3 – for C3

the following sequences could be a result of applying this approach:

  1. A1, A2, B1, B2, C1, C2
  2. A1, B1, B2, A2, C1, C2,
  3. A1, B1, B2, C1, C2, A2

You can see that this creates a partial order (A1 will always be before A2, B1 before B2, C1 before C2) but different total orders are possible, depending on the speed of writers. Usually, it’s just ok as the writers were appending to a blob their results, their operations, not carrying about the others’ results.


We’ve seen how easy it’s to implement a never ending append blob for multiple writers. This is a great enabler in case, where you need a single logical, log-like, blob, that provides an ordered list of blocks.

Pearls: Jil, serialization of primitives

The last pearl of design that I covered was an implementation for the discriminated union in the probuf-net library. Now, it’s time to move to an area that is less esoteric in terms of the format, but still intriguing in terms of performance. Time to take a look at the fastest JSON serializer available for .NET, Jil.

Is it fast?

Jil is crazy fast. As always, the best way to do something fast, is to skip some steps. You can think about skipping doing some computations, to effectively lower the load for the CPU. When speaking about programming in .NET, one additional thing to skip are allocations. And when speaking about allocations and JSON, the one that should come to our minds are strings. I don’t mean strings per se, but object that are transformed into them before writing to the actual input.

A unique case of GUID

GUID in .NET (or Guid) stands for Globally Unique Identifier. It’s a structure, 16 bytes long, providing a unique identity generator (let’s not deal with comb guids and other types for now). It’s frequently used whenever its the client responsibility to generate an id. It’s also cheap, as you don’t need to call the third party to get a globally unique id.

Let’s take a look how Guid is serialized by another library for .NET, probably, the most popular one, JSON.NET

As you can see above (and you can see it on GitHub), JSON.NET first calls .ToString, allocating  32 characters in a form of a string, just to pass it to the TextWriter instance. Yes, this instance will end its life shortly after writing it to the writer, but still, it will be allocated. What could be done better?

Jil to the rescue

We’ve got a text writer. If we had an additional buffer of char[] we could write characters from Guid to the buffer and then pass it to the text writer. Unfortunately, Guid does not provide a method to write its output to the buffer. Jil, provides one instead. It has an additional Guid structure that enables to access internal fields’ values of Guid. With this and a buffer, we can write to a text writer without allocating an actual string. The code is much to big to paste it in here. Probably it must be big to be that fast;-) Just follow this link to see the selected lines.


The mentioned optimization for Guid type is just one of many, that you can find in the awesome Jil library. As always, do not allocate, is one of the top commandments when talking about performance.

Pearls: the protobuf’s discriminated union

Google Protocol Buffers is a proven protocol for serializing data efficiently. It has a wide adoption, enabling serialization for almost every platform, making the data easy to exchange between platforms. To store its schema, you can use .proto files, that enable describing messages in a platform agnostic format. You can see an example below:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;

One of

Sometimes you want to define a message that will have only one of its fields initialized. This can be useful when sending a message providing a wrap around multiple types or in any other case that requires it. Take a look at the example, providing a wrap around three messages of the following types: MessageA, MessageB and MessageC.

message Wrapper {
  oneof OnlyOneOf {
    MessageA a = 1;
    MessageB b = 2;
    MessageC c = 3;  

Protocol buffers use a tag value to store any field. First, the tag (1, 2, 3 above) and the type of the field is stored, next, the value is written. In the following case, where only one field is assigned, it would write its tag, type and value. Do we need to create a class that will contain all the fields? Do we need to waste this space for storing fields?

Protobuf-net Discriminated Union

To fix this wasteful generation, protobuf-net, a library build by Marc Gravell, added Discriminated Union types that is capable of addressing it. Let’s just take a look at the first of them.

public struct DiscriminatedUnionObject
  private readonly int _discriminator;

 // The value typed as Object
 public readonly object Object;

 // Indicates whether the specified discriminator is assigned
 public bool Is(int discriminator) => _discriminator == ~discriminator;

 // Create a new discriminated union value
 public DiscriminatedUnionObject(int discriminator, object value)
  _discriminator = ~discriminator; // avoids issues with default value / 0
  Object = value;

Let’s walk through all the design choices that have been made here:

  1. DiscriminatedUnionObject is a struct. It means, that if a class have a field of this type, it will be stored in the object, without additional allocations (you can think of it as inlining the structure, creating a “fat object”)
  2. It has only one field for storing the value Object. (no matter which type is it).
  3. It has only one field, called _discriminator to store the tag of the field.

If you generated the Wrapper class, it’d have only one field, of the DiscriminatedUnionObject type. Once a message of a specified type is set, the discriminator and the value would be written in the union. Simple, and efficient.

Summing up

Mapping a generic idea, like a discriminated union, into a platform or a language isn’t simple. Again, once it’s made in an elegant and an efficient way, I truly believe that it’s worth to be named as a pearl.