In this article I’ll describe an easy and fast way to use Azure Storage Append Blobs to create a never-ending Append Blob. Yes, a regular Append Blob has its limitations, including the maximum number of blocks and the maximum size, but with a proper design we can overcome them.
Limits we want to overcome
According to Azure Subscription Storage Limits, an Append Blob is limited in the following way:
- Max number of blocks in an append blob: 50,000 – this means that we can append to a single blob only 50,000 times, no matter how small each appended chunk is
- Max size of a block in an append blob: 4 MiB – this means that a single operation, adding one chunk of data, cannot contain more than 4 MiB.
If we can address the first, it’s highly unlikely that we’d need to address the second. 4 MiB is usually plenty for a single append operation.
Overcoming the limited number of blocks
Let’s consider the single-writer case first. Let’s agree that we’ll use natural numbers (1, 2, 3, …) to name blobs. Then, whenever an append operation is about to happen, the writer can check the number of already appended blocks by fetching the blob’s properties, and create the next blob if that number equals the maximum number of blocks. Alternatively, we can simply try to append and catch the StorageException, checking for the BlockCountExceedsLimit error code (see Blob Error Codes for more), and then create the next blob and append to it. This case is easy. What about multiple processes, with several writers trying to append at the same time?
Multiple writers can use a similar approach, but checking the limit up front is no longer reliable. Between fetching the attributes and appending, another writer may already have appended its block, making the fetched number stale. So we stick with the exception-handling way of doing it:
- get the latest blob name (1, 2, 3, … – natural numbers)
- append the block
- if the append throws:
  - try to create the next blob, or retrieve it if it already exists
  - append the block to it
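The steps above can be sketched against an in-memory stand-in for the storage service. Everything here is an assumption made for illustration: `InMemoryAppendBlob`, `InMemoryContainer` and `append` are hypothetical names, not the Azure SDK, and the exception only mimics the BlockCountExceedsLimit error code. The loop also keeps rolling forward if the next blob has already been filled by other writers, which goes slightly beyond the single retry listed above.

```python
import threading


class BlockCountExceedsLimit(Exception):
    """Stand-in for the storage error raised when a blob is full."""


class InMemoryAppendBlob:
    """Hypothetical model of one physical Append Blob."""

    def __init__(self, max_blocks):
        self.max_blocks = max_blocks
        self.blocks = []
        self._lock = threading.Lock()  # models the service's atomic append

    def append_block(self, data):
        with self._lock:
            if len(self.blocks) >= self.max_blocks:
                raise BlockCountExceedsLimit()
            self.blocks.append(data)


class InMemoryContainer:
    """Hypothetical container holding blobs named 1, 2, 3, ..."""

    def __init__(self, max_blocks=50_000):
        self.max_blocks = max_blocks
        self.blobs = {}
        self._lock = threading.Lock()

    def latest_number(self):
        with self._lock:
            return max(self.blobs, default=1)

    def get_or_create(self, number):
        # "try to create the next one or retrieve existing one"
        with self._lock:
            if number not in self.blobs:
                self.blobs[number] = InMemoryAppendBlob(self.max_blocks)
            return self.blobs[number]


def append(container, data):
    """Append data to the logical blob; return the physical blob number used."""
    number = container.latest_number()
    while True:
        blob = container.get_or_create(number)
        try:
            blob.append_block(data)
            return number
        except BlockCountExceedsLimit:
            number += 1  # this blob is full, move on to the next one
```

With a tiny limit, such as `InMemoryContainer(max_blocks=3)`, a handful of calls to `append` is enough to watch the data spill from blob 1 into blobs 2 and 3.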
This allows multiple writers to write to the same logical chunked blob, which is split across multiple physical Append Blobs. Wait a minute, what about ordering?
Ordering multiple writers
With multiple writers A, B, C, … appending blocks
- A1, A2 – for A,
- B1, B2 – for B,
- C1, C2 – for C
the following sequences could be a result of applying this approach:
- A1, A2, B1, B2, C1, C2
- A1, B1, B2, A2, C1, C2
- A1, B1, B2, C1, C2, A2
You can see that this creates a partial order (A1 will always be before A2, B1 before B2, C1 before C2) but different total orders are possible, depending on the speed of the writers. Usually that’s just fine, as each writer appends its own results, its own operations, to the blob, without caring about the others’ results.
We’ve seen how easy it is to implement a never-ending append blob for multiple writers. This is a great enabler in cases where you need a single logical, log-like blob that provides an ordered list of blocks.
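To make the partial order concrete, here is a small sketch that checks whether a sequence of appended blocks keeps every writer’s own blocks in order, and counts how many total orders are possible for three writers with two blocks each. The representation of a block as a `(writer, number)` pair and the function name `preserves_writer_order` are assumptions made for this example.

```python
from itertools import permutations


def preserves_writer_order(sequence):
    """True if each writer's blocks appear in increasing order in sequence."""
    last_seen = {}
    for writer, number in sequence:
        if number <= last_seen.get(writer, 0):
            return False
        last_seen[writer] = number
    return True


blocks = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]

# Every ordering of the six blocks that respects A1 < A2, B1 < B2, C1 < C2.
valid_orders = [p for p in permutations(blocks) if preserves_writer_order(p)]
```

Out of the 720 permutations of six blocks, 720 / (2 · 2 · 2) = 90 respect the per-writer order, so `len(valid_orders)` is 90. The three sequences listed above are just a few of them.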
The last pearl of design that I covered was an implementation of the discriminated union in the protobuf-net library. Now, it’s time to move to an area that is less esoteric in terms of the format, but still intriguing in terms of performance. Time to take a look at the fastest JSON serializer available for .NET, Jil.
Is it fast?
Jil is crazy fast. As always, the best way to do something fast is to skip some steps. You can think about skipping some computations to effectively lower the load on the CPU. When speaking about programming in .NET, one additional thing to skip is allocations. And when speaking about allocations and JSON, the ones that should come to mind are strings. I don’t mean strings per se, but objects that are transformed into strings before being written to the actual output.
A unique case of GUID
GUID in .NET (or Guid) stands for Globally Unique Identifier. It’s a structure, 16 bytes long, that serves as a unique identifier (let’s not deal with comb guids and other types for now). It’s frequently used whenever it’s the client’s responsibility to generate an id. It’s also cheap, as you don’t need to call a third party to get a globally unique id.
Let’s take a look at how Guid is serialized by another library for .NET, probably the most popular one, JSON.NET.
As its source on GitHub shows, JSON.NET first calls .ToString, allocating a 36-character string (32 hex digits plus 4 hyphens), just to pass it to the TextWriter instance. Yes, this string will end its life shortly after being written to the writer, but still, it will be allocated. What could be done better?
Jil to the rescue
We’ve got a text writer. If we had an additional buffer of chars, we could write the characters of the Guid to the buffer and then pass it to the text writer. Unfortunately, Guid does not provide a method to write its output to a buffer. Jil provides one instead. It has an additional Guid structure that gives access to the values of Guid’s internal fields. With this and a buffer, we can write to a text writer without allocating an actual string. The code is much too big to paste here. Probably it must be big to be that fast ;-) Just follow this link to see the selected lines.
The mentioned optimization for the Guid type is just one of many that you can find in the awesome Jil library. As always, “do not allocate” is one of the top commandments when talking about performance.
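The buffer idea itself is simple enough to sketch. The snippet below is not Jil’s code, and it is written in Python rather than C#, so it only illustrates the shape of the technique: the 16 bytes of the identifier are turned into hex digits directly inside a reusable buffer, and no intermediate string is built. The name `write_guid` is hypothetical, and Python’s `uuid.UUID.bytes` layout is simpler than the field layout of the .NET Guid.

```python
import io
import uuid

HEX = b"0123456789abcdef"
# Positions of the hyphens in the 36-character hyphenated layout.
HYPHENS = (8, 13, 18, 23)


def write_guid(value, buffer):
    """Fill a reusable 36-byte buffer with the hyphenated hex form of value."""
    pos = 0
    for byte in value.bytes:
        if pos in HYPHENS:
            buffer[pos] = 0x2D  # '-'
            pos += 1
        buffer[pos] = HEX[byte >> 4]        # high nibble
        buffer[pos + 1] = HEX[byte & 0x0F]  # low nibble
        pos += 2
    return pos  # number of bytes written


# Usage: one buffer is allocated up front and reused for every value written.
shared_buffer = bytearray(36)
writer = io.BytesIO()
for _ in range(3):
    written = write_guid(uuid.uuid4(), shared_buffer)
    writer.write(memoryview(shared_buffer)[:written])
```

The design choice is the same one the article describes: the cost of the buffer is paid once, instead of paying for a fresh string on every serialized value.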