Never ending Append Blobs

In this article I’ll describe an easy and fast way to use Azure Storage Append Blobs to create a never ending Append Blob. Yes, a regular Append Blob has its limitations, including the maximum number of blocks and the size, but with a proper design we can overcome them.

Limits we want to overcome

According to Azure Subscription Storage Limits, an Append Blob is limited in the following way:

  1. Max number of blocks in an append blob: 50,000 – this means that we can append to a single blob only 50,000 times, no matter how much data we add at the same time
  2. Max size of a block in an append blob: 4 MiB – this means that a single operation adding one chunk of data, cannot contain more than 4 MiB.

If we could address the first, it’s highly unlikely that we’d need to address the second. 4 MiB is just enough for a single append operation.

Overcoming the limited number of blocks

Let’s consider a single writer case. Now, agree that we’ll use natural number (1, 2, 3, …) to name blobs. Then, whenever an append operation is about to happen, the writer could check the number of already appended blocks by fetching blob’s properties and create another one, if the number is equal to the max number of blocks. We could also try to append and catch the StorageException, checking for BlockCountExceedsLimit error code (see Blob Error Codes for more). Then, we’d follow with creating another blob and appending to the newly created one. This case is easy. What about multiple processes, writers trying to append at the same time?

Multiple writers

Multiple writers could use a similar approach. There’s also a risk of not being able to check for the limit. When you fetch attributes, another writer could already append their block making the number invalid. We could stick with the exception handling way of doing it:

  1. get the latest blob name (1, 2, 3, … – natural numbers)
  2. append the block
  3. if (2) this throws:
    1. try to create the next one or retrieve existing one
    2. append the block

This allows multiple writers to write to the logically same chunked blob, that is split across multiple physical Append Blobs. Wait a minute, what about ordering?

Ordering multiple writers

With multiple writers A, B, C, … appending blocks

  1. A1, A2 – for A,
  2. B1, B2 – for B,
  3. C1, C3 – for C3

the following sequences could be a result of applying this approach:

  1. A1, A2, B1, B2, C1, C2
  2. A1, B1, B2, A2, C1, C2,
  3. A1, B1, B2, C1, C2, A2

You can see that this creates a partial order (A1 will always be before A2, B1 before B2, C1 before C2) but different total orders are possible, depending on the speed of writers. Usually, it’s just ok as the writers were appending to a blob their results, their operations, not carrying about the others’ results.

Summary

We’ve seen how easy it’s to implement a never ending append blob for multiple writers. This is a great enabler in case, where you need a single logical, log-like, blob, that provides an ordered list of blocks.

Dependency rejection

TL;DR

This values must be updates synchronously or we need referential integrity or we need to make all these service calls together are sentences that unfortunately can be heard in discussions about systems. This post provides a pattern for addressing this remarks

The reality

As Jonas Boner says in his presentation Knock, knock. Who’s there? Reality. We really can’t afford to have one database, one model to fit it all. It’s impossible. We process too much, too fast, with too many components to make it all in one fast call. Not to mention transactions. This is reality and if you want to deny reality good luck with that.

The pattern

The pattern to address this remarks is simple. You need to say no to this absurd. NoNoNo.  This no can be supported with many arguments:

  • independent components should be independent, right? Like in-de-pen-dent. They can’t stop working when others stop. This dependency just can’t be taken.
  • does your business truly ensures that everything is prepared up front? What if somebody buys a computer from you? Would you rather say I’ll get it delivered in one week or first double check with all the components if it’s there, or maybe somebody returned it or maybe call the producer with some speech synthesizer? For sure it’s not an option. This dependency just can’t be taken.

You could come up with many more arguments, which could be summarized simply as a new pattern in the town, The Dependency Rejection.

Next time when your DBA/architect/dev friend/tester dreams about this shiny world of a total consistency and always available services, remind them of this and simply reject dependency on this unrealistic idea.

 

Producer – consumer relationship

In the last post about the RampUp library I covered on of the foundations: IRingBuffer. Now I’d like to describe the contract it fulfills.

Producer consumer

If you take a look at the IRingBuffer you’ll see Write/Read methods. These two are responsible for producing/consuming or writing/reading messages to the buffer in FIFO way. What are the guarantees behind such an interface? What about concurrent accessing this structure?

Multimulti

The easiest approach for distinguishing accessing patterns is considering whether the structure could be accessed by one or more threads. If you consider producer/consumer you’ll see that there are four options:

  1. SPSC – Single Producer Single Consumer – only one thread produces items, and another consumes them
  2. MPSC – Multi Producer Single Consumer – multiple threads may produce items in a safe manner, again there’s a single consuming thread
  3. SPMC – Single Producer Multi Consumer – this could be treated as a distributor of work
  4. MPMC – Multi Producer Multi Consumer – multi/multi, ConcurrentQueue is a good example of it

Unfortunately nothing comes for free. If you want to get multi, you’ll pay the price of handling friction on that end. On the other hand, if you want to design a system where queues provide transport between different parts of the system, you’ll need to enable multiple producers for sure, as there’s going to be more than one system element. If you want to process items in an order they appeared and leave the locking issues, just write a fast single threaded code, a single consumer with a single worker thread would be the way to go. Of course this worker thread may access other queues and produce items for them (hence, multi producer is needed).

The ring buffer implementation in RampUp provides exactly MPSC behavior, as it’s prepared to handle items in order, by a single thread.

Feature oriented design got wrong

The fourth link in my google search for ‘feature toggle’ is a link to this Building Real Software post. It’s about not about feature toggles described by Martin Folwer. It’s about feature toggles got wrong.

If you consider toggling features with flags and apply it literally, what you get is a lot of branching. That’s all. Some tests should be written twice to handle a positive and a negative scenario for the branch. The reason for this is a design not prepared to handle toggling properly. In the majority of cases, it’s a design which is not feature-based on its own.

The featured based design is created on the basis of closed components, which handle the given domain aspect. Some of them may be big like ‘basket’, some may be much smaller, like ‘notifications’ reacting to various changes and displaying needed information. The important thing is to design the features as closed components. Once you have it done this way, it’s easier to think about the page without notifications or ads. Again, disabling the feature is not a mere flag thrown in different pieces of code. It’s disabling or replacing the whole feature.

One of my favorite architecture styles, event driven architecture helps in a great manner to build this kind of toggles. It’s quite easy to simply… not handle the event at all. If you consider the notifications, if they are disabled, they simply do not react to various events like ‘order-processed’, etc. The separate story is to not create cycles of dependencies, but still, if you consider the reactive nature of connections between features, that’s a great enabler for introducing toggling with all of advantages one can derive from it with A/B tests, canary releases in mind.

I’m not a fan boy of feature toggling, I consider it as an important tool in architects arsenal though.

 

Pearls: EventStore transaction log

I thought for a while about presenting a few projects which are in my opinion real pearls. Let’s start with the EventStore and one in one of its aspects: the transaction log.
If you’re not familiar with this project, EventStore is a stream database providing complex event processing. It’s oriented around streams of events, which can be easily aggregated or repartitioned with projections. Based on ever appended streams and projections chasing the streams one can build a truly powerful logic around processing events.
One of the interesting aspects of EventStore is its storage engine. You can find a bit of description in here. ES does not abstract a storage away, the storage is a built-in part of the database itself. Let’s take a look at its parts before discussing its further:

Appending to the log
One the building blocks of ES is SEDA architecture – the communication within db is based on publishing and consuming messages, which one can notice reviewing StorageWriterService. The service subscribes to multiple messages, mentioned in implementations of the IHandle interface. The arising question is how often does the service flushed it’s messages to disk. One can notice, that method EnqueueMessage beside enqueuing incoming messages counts ones marked by interface IFlushableMessage. What is it for?
Each Handle method call Flush at its very end. Additionally, as the EnqueueMessage increases the counter of messages requiring flush, each Handle method decreases the counter when it handles a flushable message. This brings us to the conclusion that the mentioned counter is equal 0 iff there are no more flushable messages in the queue.

Flushing the log
Once the Flush is called a condition is checked whether:

  • the call was made with force=true (this never happens) or
  • there are no more flush messages in the queue or
  • the given time from the last time has passed

This provides a very powerful batching behavior. Under stress, the flush-to-be counter will be constantly greater than 0, providing flushing every given period of time. Under less stress, with no more flushables in the queue, ES will flush every message which needs to flush the log file.

Acking the client
The final part of the processing is the acknowledgement part. The client should be informed about persisting a transaction to disk. I spent a bit of time (with help of Greg Young and James Nugent) of chasing the place where the ack is generated. It does not happen in the StorageWriterService. What’s responsible for considering the message written then? Here comes the second part of the solution, the StorageChaser. In a dedicated thread, in an infinite loop, a method ChaserIteration is called. The method tries to read a next record from a chunk of unmanaged memory, that was ensured to be flushed by the StorageWriterService. Once the chaser finds CommitRecord, written when a transaction is commited, it acks the client by publishing the StorageMessage.CommitAck in ProcessCommitRecord method. The message will be translated to a client message, confirming the commit and sent back to the client.

Sum up
One cannot deny the beauty and simplicity of this solution. One component tries to flush as fast as possible, or batches a few messages if it cannot endure the pressure. Another one waits for the position to which a file is flushed to be increased. Once it changes, it reads the record (from the in-memory chunk matched with the file on disk) processes it and sends acks. Simple and powerful.

Composition vs derivation

Assume you’re writing some reports in your application. You’ve just created the third controller covering some kind of reporting and it seems to be, that all the three controllers have a very similar code, modulo type passed as the entity type to your NH ISession.QueryOver() or another data source. It seems that, if the method creating report base was generic, it could be used in all the cases. You want to extract it, make it clearer and to stop repeating yourself. What would you do?

Derivation
The very first thing is to extract method in each of the controllers. Now they seem almost the same. A place is needed where the method can be easily moved. How about a super type? Let’s create a controller, call it in a fashionable way, for instance: ReportControllerBase, and move the method in there. Now you can easily remove the methods in the deriving controllers. Yeah! It’s reusable, everyone writing his/her report can derive from the ReportControllerBase and use its methods to speed up his/her task.

Composition
The very first step is exactly the same: the extract method must be done, to see the common code. Once it’s done, you notice that the whole method has only a few dependencies which can be easily pushed to parameters, for instance: isession, the entity type passed to query, the value used in some complex where clause, etc. You change all the field and properties usages to parameters which allows this method to be static. You create a static method and turn it into an extension method of a session. The refactorization is done, you can easily call this method in all the controllers, by simply calling an extension onto a session.

What’s the difference and why composition should be preferred
If you use C# or Java you should always be aware of one limitation: you can derive from only one type. Spending this ‘once in a lifetime’ for a simple functionality extraction, for me – that’s the wrong way. Using composition, and deriving only when you truly see that one type is another type, that’s the right way to go. In the next post I’ll write about ASP MVC Detached Actions, a simple mechanism you can use, to derive your controllers only, when it is needed and delegating common actions, without it.

ASP MVC Model binders

Recently I had to create a model for ASP MVC, which would be bound by the model binder and had its properties set according to the request. It’s the most common case you can imagine, but there was one ‘but’. The model, because of its nature had to have non-default constructor. It was only a few parameters but it broke the DefaultModelBinder, which simply couldn’t find all the needed values. My response was quite fast, specially having the controller factory implemented in the following way:

/// <summary>
/// The controller factory building up the controllers with the unity container.
/// </summary>
public class ControllerFactory : DefaultControllerFactory
{
    private readonly IUnityContainer _container;

    public ControllerFactory(IUnityContainer container)
    {
        _container = container;
    }

    protected override IController GetControllerInstance(RequestContext requestContext, 
        Type controllerType)
    {
        return (IController)_container.Resolve(controllerType, null);
    }
}

The very next step was to replace the default binder stored in ModelBinders.Binders.DefaultBinder with another, controller-based implementation:

/// <summary>
/// The model binder building up the model objects with the unity container.
/// </summary>
public class ModelBinder : DefaultModelBinder
{
    private readonly IUnityContainer _container;

    public ModelBinder(IUnityContainer container)
    {
        _container = container;
    }

    protected override object CreateModel(ControllerContext controllerContext, 
                                          ModelBindingContext bindingContext,
                                          Type modelType)
    {
        var type = modelType;
        if (modelType.IsGenericType)
        {
            var genericTypeDefinition = modelType.GetGenericTypeDefinition();
            if (genericTypeDefinition == typeof (IDictionary<,>))
            {
                type = typeof (Dictionary<,>).MakeGenericType(modelType.GetGenericArguments());
            }
            else if (((genericTypeDefinition == typeof (IEnumerable<>)) ||
                      (genericTypeDefinition == typeof (ICollection<>))) ||
                     (genericTypeDefinition == typeof (IList<>)))
            {
                type = typeof (List<>).MakeGenericType(modelType.GetGenericArguments());
            }
        }

        // in the base method: return Activator.CreateInstance(type);
        return _container.Resolve(type, null); 
    }
}

As you can compare, the method simply delegates creating model to the container. Having this binder set allowed passing models with non-default constructors, so the problem was solved:] After a while, I considered following case: if the type is constructed with a container, the controller can also be passed an object implementing an interface registered in container. Hence, if a method needs a NHibernate ISession, this can be expressed not only as a parameter in Controller constructor, but also as a method parameter. This paradigm creates controllers with no parameters needed in constructor (or only common parameters, used in all actions) and all the others passed as action parameters. The result is a method, which can be perfectly tested in an environment where the only smallest, needed set of dependencies is passed (no more constructors with not needed in the current method parameters). What do you think about it?

An example of a controller:

public class CarController : Controller
{
    private readonly ISession _session;
    private const int PageSize = 10;

    public HomeController(ISession session)
    {
        _session = session;
    }

    public ActionResult Rent(IUserService userService, Guid carId)
    {
        using (var tx = _session.BeginTransaction())
        {
            var car = _session.Load<Car>(carId);
            var user = userService.GetCurrentUser();

            car.RentBy(user);
            
            tx.Commit();
        }

        return View();
    }

    public ActionResult Index( int? pageNo)
    {
        var page = pageNo ?? 0;

        var cars = _session.CreateQuery("from Car")
            .SetFirstResult(page*PageSize)
            .SetMaxResults(PageSize)
            .Future<Car>();

        return View(cars);
    }
}