On sharing your opinion

Two short stories, one topic.

1

Imagine the following scenario: an interesting discussion that gets people excited. There’s this one new person who doesn’t get involved at all. Actually, if you counted all the words they said, the number would be 0. Nothing, none, null. The reason for this is simple: they did not go through the same experiences as the rest. Their experience in this topic, if any, is none. Their reasoning is as simple as that: no exp, no talk. Is it a valid approach?

2

Imagine the following scenario: an interesting discussion that gets people excited. There’s this one person who gets deeply involved. They share all the experiences they have had with the topic. The reasoning they share, the experience, is so extensive that all the rest just follow their words. The speaker’s reasoning is as simple as that: big exp, big talk. Is it a valid approach?

What to do?

Recently, I came to the conclusion that there’s no simple answer to these questions. Being an expert sometimes requires you to be quiet for a while: to listen to others’ experiences and not to flood them with your opinions. They may just follow you.
Being a novice sometimes requires you to share what you don’t know, just to show a new point of view.
 
Next time, when debating, think about your opinions again. Maybe it’s time to do it the other way around.

Never ending Append Blobs

In this article I’ll describe an easy and fast way to use Azure Storage Append Blobs to create a never ending Append Blob. Yes, a regular Append Blob has its limitations, including the maximum number of blocks and the size, but with a proper design we can overcome them.

Limits we want to overcome

According to Azure Subscription Storage Limits, an Append Blob is limited in the following way:

  1. Max number of blocks in an append blob: 50,000 – this means that we can append to a single blob only 50,000 times, no matter how much data we add at a time.
  2. Max size of a block in an append blob: 4 MiB – this means that a single operation appending one chunk of data cannot contain more than 4 MiB.

If we could address the first, it’s highly unlikely that we’d need to address the second. 4 MiB is just enough for a single append operation.

Overcoming the limited number of blocks

Let’s consider the single-writer case first. Agree that we’ll use natural numbers (1, 2, 3, …) to name blobs. Then, whenever an append operation is about to happen, the writer could check the number of already appended blocks by fetching the blob’s properties, and create another blob if the number is equal to the maximum number of blocks. We could also simply try to append and catch the StorageException, checking for the BlockCountExceedsLimit error code (see Blob Error Codes for more). Then we’d follow up by creating another blob and appending to the newly created one (see the sketch below). This case is easy. What about multiple processes, multiple writers, trying to append at the same time?
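
A minimal sketch of the property-checking flow, assuming the classic Microsoft.WindowsAzure.Storage SDK (the one that throws StorageException); the class and method names below are mine, for illustration only:

    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage.Blob;

    static class SingleWriter
    {
        // an Append Blob accepts at most 50,000 committed blocks
        const int MaxBlocks = 50000;

        // Picks the blob to append to: stays on the current one until it is
        // full, then creates the next natural-number-named blob.
        public static async Task<CloudAppendBlob> GetWritableBlobAsync(
            CloudBlobContainer container, int currentNumber)
        {
            var blob = container.GetAppendBlobReference(currentNumber.ToString());
            await blob.FetchAttributesAsync();

            if (blob.Properties.AppendBlobCommittedBlockCount >= MaxBlocks)
            {
                // the current blob is full, move on to the next name
                blob = container.GetAppendBlobReference((currentNumber + 1).ToString());
                await blob.CreateOrReplaceAsync();
            }

            return blob;
        }
    }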

Multiple writers

Multiple writers could use a similar approach, but there’s a risk of not being able to check the limit reliably: by the time you have fetched the attributes, another writer may have already appended their block, making the number invalid. We could stick with the exception-handling way of doing it (a sketch follows the list):

  1. get the latest blob name (1, 2, 3, … – natural numbers)
  2. append the block
  3. if (2) throws:
    1. try to create the next blob, or retrieve it if it already exists
    2. append the block to it
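
A hedged sketch of this loop, again against the classic SDK. The access-condition trick ensures that only one writer actually creates the next blob; the others get a 409 Conflict and simply retry their append:

    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    static class MultiWriter
    {
        // latestKnownNumber is this writer's last known blob name; the method
        // returns the number it actually appended to, so the caller can cache it.
        public static async Task<int> AppendAsync(
            CloudBlobContainer container, int latestKnownNumber, byte[] block)
        {
            var number = latestKnownNumber;
            while (true)
            {
                var blob = container.GetAppendBlobReference(number.ToString());
                try
                {
                    using (var stream = new MemoryStream(block, writable: false))
                    {
                        await blob.AppendBlockAsync(stream); // step (2)
                    }
                    return number;
                }
                catch (StorageException ex) when (
                    ex.RequestInformation?.ExtendedErrorInformation?.ErrorCode ==
                        "BlockCountExceedsLimit")
                {
                    // step (3.1): the current blob is full, move to the next name
                    number++;
                    var next = container.GetAppendBlobReference(number.ToString());
                    try
                    {
                        // IfNotExists lets exactly one writer create the blob
                        await next.CreateOrReplaceAsync(
                            AccessCondition.GenerateIfNotExistsCondition(), null, null);
                    }
                    catch (StorageException conflict) when (
                        conflict.RequestInformation.HttpStatusCode == 409)
                    {
                        // another writer created it first – that's fine, retry the append
                    }
                }
            }
        }
    }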

This allows multiple writers to write to the same logical, chunked blob, split across multiple physical Append Blobs. Wait a minute, what about ordering?

Ordering multiple writers

With multiple writers A, B, C, … appending blocks

  1. A1, A2 – for A,
  2. B1, B2 – for B,
  3. C1, C2 – for C,

the following sequences could be a result of applying this approach:

  1. A1, A2, B1, B2, C1, C2
  2. A1, B1, B2, A2, C1, C2
  3. A1, B1, B2, C1, C2, A2

You can see that this creates a partial order (A1 will always be before A2, B1 before B2, C1 before C2), but different total orders are possible depending on the speed of the writers. Usually this is just fine, as each writer appends its own results, its own operations, not caring about the others’.

Summary

We’ve seen how easy it is to implement a never ending Append Blob for multiple writers. This is a great enabler in cases where you need a single logical, log-like blob that provides an ordered list of blocks.

Pearls: Jil, serialization of primitives

The last pearl of design that I covered was the implementation of the discriminated union in the protobuf-net library. Now it’s time to move to an area that is less esoteric in terms of the format, but still intriguing in terms of performance. Time to take a look at the fastest JSON serializer available for .NET, Jil.

Is it fast?

Jil is crazy fast. As always, the best way to do something fast is to skip some steps. You can think about skipping some computations to effectively lower the load on the CPU. When speaking about programming in .NET, one additional thing to skip is allocations. And when speaking about allocations and JSON, the ones that should come to our minds are strings. I don’t mean strings per se, but objects that are transformed into them before being written to the actual output.

A unique case of GUID

GUID in .NET (or Guid) stands for Globally Unique Identifier. It’s a structure, 16 bytes long, that provides a globally unique identity (let’s not deal with comb guids and other variants for now). It’s frequently used whenever it’s the client’s responsibility to generate an id. It’s also cheap, as you don’t need to call a third party to get a globally unique id.

Let’s take a look at how Guid is serialized by another library for .NET, probably the most popular one, JSON.NET.
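
Roughly, the allocating path boils down to something like this (a simplified illustration, not JSON.NET’s exact code):

    using System;
    using System.IO;

    static class AllocatingWriter
    {
        // The Guid is first materialized as a brand-new string, and only
        // then handed over to the TextWriter.
        public static void WriteGuid(TextWriter writer, Guid value)
        {
            string text = value.ToString(); // allocates a fresh string every single time
            writer.Write('"');
            writer.Write(text);
            writer.Write('"');
        }
    }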

As you can see above, JSON.NET first calls .ToString(), allocating a 36-character string (32 hex digits plus hyphens), just to pass it to the TextWriter instance (you can see the actual code on GitHub). Yes, this string will end its life shortly after being written to the writer, but still, it will have been allocated. What could be done better?

Jil to the rescue

We’ve got a text writer. If we had an additional char[] buffer, we could write the characters of the Guid into the buffer and then pass it to the text writer. Unfortunately, Guid does not provide a method to write its output to a buffer. Jil provides one instead. It has an additional Guid-mirroring structure that enables access to the internal field values of Guid. With this and a buffer, we can write to a text writer without allocating an actual string. The code is much too big to paste here. Probably it must be big to be that fast ;-) Just follow this link to see the selected lines.
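
A heavily simplified sketch of the idea (the names and layout below are illustrative, not Jil’s actual code; it assumes a little-endian runtime, which is what .NET runs on in practice):

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    // Overlays a Guid with its raw bytes, so they can be read without
    // calling ToString() or ToByteArray() (both of which allocate).
    [StructLayout(LayoutKind.Explicit)]
    struct GuidStruct
    {
        [FieldOffset(0)] public Guid Value;
        [FieldOffset(0)] public byte B00;
        [FieldOffset(1)] public byte B01;
        [FieldOffset(2)] public byte B02;
        [FieldOffset(3)] public byte B03;
        [FieldOffset(4)] public byte B04;
        [FieldOffset(5)] public byte B05;
        [FieldOffset(6)] public byte B06;
        [FieldOffset(7)] public byte B07;
        [FieldOffset(8)] public byte B08;
        [FieldOffset(9)] public byte B09;
        [FieldOffset(10)] public byte B10;
        [FieldOffset(11)] public byte B11;
        [FieldOffset(12)] public byte B12;
        [FieldOffset(13)] public byte B13;
        [FieldOffset(14)] public byte B14;
        [FieldOffset(15)] public byte B15;
    }

    static class NonAllocatingWriter
    {
        static readonly char[] Hex = "0123456789abcdef".ToCharArray();

        static void WriteByte(char[] buffer, ref int i, byte b)
        {
            buffer[i++] = Hex[b >> 4];
            buffer[i++] = Hex[b & 0xF];
        }

        // Writes the 36-char "dddddddd-dddd-dddd-dddd-dddddddddddd" form into a
        // reusable buffer and hands it to the writer in a single call.
        public static void WriteGuid(TextWriter writer, Guid guid, char[] buffer)
        {
            var b = new GuidStruct { Value = guid };
            var i = 0;
            // the first three groups are int/short fields stored little-endian,
            // so their bytes are emitted in reverse memory order
            WriteByte(buffer, ref i, b.B03); WriteByte(buffer, ref i, b.B02);
            WriteByte(buffer, ref i, b.B01); WriteByte(buffer, ref i, b.B00);
            buffer[i++] = '-';
            WriteByte(buffer, ref i, b.B05); WriteByte(buffer, ref i, b.B04);
            buffer[i++] = '-';
            WriteByte(buffer, ref i, b.B07); WriteByte(buffer, ref i, b.B06);
            buffer[i++] = '-';
            WriteByte(buffer, ref i, b.B08); WriteByte(buffer, ref i, b.B09);
            buffer[i++] = '-';
            WriteByte(buffer, ref i, b.B10); WriteByte(buffer, ref i, b.B11);
            WriteByte(buffer, ref i, b.B12); WriteByte(buffer, ref i, b.B13);
            WriteByte(buffer, ref i, b.B14); WriteByte(buffer, ref i, b.B15);
            writer.Write(buffer, 0, 36); // no string allocated
        }
    }

The char[36] buffer can be pooled and reused across calls, so in steady state serializing a Guid allocates nothing at all.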

Summary

The mentioned optimization for the Guid type is just one of many that you can find in the awesome Jil library. As always, “do not allocate” is one of the top commandments when talking about performance.