Task.WhenAll tests

In the last post I showed some optimizations that can be applied to reduce the overhead of creating asynchronous state machines. Let’s dive into the async world again and consider the helper methods provided by the Task class, especially Task.WhenAll.

public static Task WhenAll(params Task[] tasks)

The method works in the following way: it accepts an array of tasks and returns a task that finishes as soon as all of the underlying tasks are finished. Applying this method in some scenarios may provide a performance gain, as a few tasks can run in parallel. It has a drawback though.

Let’s consider the following code:

public async Task<int> A()
{
    await B1();
    await B2();
    return C();
}

If B1 and B2 could be executed in parallel (for instance, they access Azure Table Storage), this method could be rewritten in the following way:

public async Task<int> A()
{
    await Task.WhenAll(B1(), B2());
    return C();
}

What, besides the mentioned performance improvement, has changed? Method A is no longer one method. There are now two possible executions, chosen nondeterministically: one runs the operations in the order B1, B2, C, the other in the order B2, B1, C. This effectively means that your previous test coverage no longer holds. If you want to truly test it, you need to provide suites that order these B* calls deliberately and ensure that all permutations are exercised. Sometimes this matters, sometimes it doesn’t. Let’s consider the following scenario:

  • Two callers are calling A at the same time
  • Every B method removes a specific file, failing if it does not exist
  • At least one of the callers should succeed

In the first version, it was a pure race to be first. The first caller to get through B1, B2, C would execute properly. Now consider the second version of A with two callers executing the following operations in the specified order:

  • Caller 1: B1, B2, C
  • Caller 2: B2, B1, C

As you can see, the interleaving is just like the classic deadlock scenario of acquiring locks in opposite orders, and both callers would fail.

As always, there’s no silver bullet. If you want to use Task.WhenAll to speed up your application by running operations in parallel, you must embrace the fact that the execution order is no longer deterministic.
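
If you want such a test to be deterministic, you can take control of when each B completes. Below is a minimal sketch of the idea using xUnit; TestSubject and the value 42 standing in for C() are hypothetical stand-ins, not code from this post.

using System;
using System.Threading.Tasks;
using Xunit;

// Hypothetical wrapper around the A/B1/B2 methods discussed above.
public class TestSubject
{
    private readonly Func<Task> _b1;
    private readonly Func<Task> _b2;

    public TestSubject(Func<Task> b1, Func<Task> b2)
    {
        _b1 = b1;
        _b2 = b2;
    }

    public async Task<int> A()
    {
        await Task.WhenAll(_b1(), _b2());
        return 42; // stand-in for C()
    }
}

public class WhenAllOrderingTests
{
    [Fact]
    public async Task A_succeeds_when_B2_completes_before_B1()
    {
        // the test, not the scheduler, decides when each B finishes
        var b1 = new TaskCompletionSource<bool>();
        var b2 = new TaskCompletionSource<bool>();

        var pending = new TestSubject(() => b1.Task, () => b2.Task).A();

        b2.SetResult(true); // force the "B2 first" interleaving;
        b1.SetResult(true); // a sibling test flips the order

        Assert.Equal(42, await pending);
    }
}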

Happy awaiting.

Rise of the IAsyncStateMachines

Whenever you use the async/await pair, the compiler performs a lot of work, creating a class that coordinates the execution of the code. The created (and instantiated) class implements an interface called IAsyncStateMachine and captures all the context needed to carry on with the work. Effectively, any async method using await will generate such an object. You may say that creating objects is cheap; then again, I’d say that not creating them at all is even cheaper. Could we skip the creation of such an object while still providing asynchronous execution?
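
To get a feel for what the compiler does, here is a heavily simplified sketch of the machinery behind a method like public async Task A() { await B(); }. The real generated code differs (hoisted locals, ExecutionContext handling, a struct instead of a class in Release builds), so treat this only as an illustration of the moving parts:

using System.Runtime.CompilerServices;
using System.Threading.Tasks;

class AStateMachine : IAsyncStateMachine
{
    public int State = -1;                 // -1 = not started, 0 = awaiting B
    public AsyncTaskMethodBuilder Builder; // builds the Task returned by A()
    private TaskAwaiter _awaiter;

    private static Task B() => Task.Delay(1); // stand-in for the awaited call

    public void MoveNext()
    {
        if (State == -1)
        {
            _awaiter = B().GetAwaiter();
            if (!_awaiter.IsCompleted)
            {
                State = 0;                 // remember where to resume
                var self = this;
                Builder.AwaitUnsafeOnCompleted(ref _awaiter, ref self);
                return;                    // MoveNext is re-entered on completion
            }
        }
        _awaiter.GetResult();              // propagates exceptions, if any
        Builder.SetResult();               // completes the returned Task
    }

    public void SetStateMachine(IAsyncStateMachine machine) { }
}

// The stub left in place of the original method body:
static class A_Stub
{
    public static Task A()
    {
        var sm = new AStateMachine { Builder = AsyncTaskMethodBuilder.Create() };
        sm.Builder.Start(ref sm); // runs synchronously until the first suspension
        return sm.Builder.Task;
    }
}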

The costs of the async state machine

The first, already mentioned, cost is the allocation of the async state machine. If you take into consideration that an object is allocated for every async call, the overhead can get heavy.

The second part is the influence on the stack frame. If you use async/await you will find that stack traces get much bigger, as the calls to the async state machine’s methods appear in them as well.

The demise of the IAsyncStateMachines

Let’s consider the following example:

public async Task A()
{
    await B();
}

Or the slightly more complex example below:

public async Task A()
{
    if (_hasFlag)
        await B1();
    else
        await B2();
}

What can you tell about these A methods? They do not use the results of the Bs. And if they do not use them, maybe the awaiting could be done at a higher level? Yes, it can. Take a look at the following example:

public Task A()
{
    if (_hasFlag)
        return B1();
    return B2();
}

This method is still asynchronous, as being asynchronous isn’t about using async/await but about returning a Task. Additionally, it does not generate a state machine, which lowers all the costs mentioned above.

Happy asynchronous execution.

Build Tour 2016 in Warsaw

Probably every .NET developer has watched at least a few presentations from the Build conference. Happily for Polish .NET developers, Warsaw was selected as one of the cities where Build speakers came to share their exciting news about Microsoft, Windows and .NET.

The conference took place in Expo XXI. The very first thing to see were shipping containers with the Build logo on them. I must admit that this was a bold way of showing where the conference took place, and much better than just a few stands. The venue and the stage were amazing: the stage was a copy of the Build stage, so you could feel the atmosphere. Every organizational aspect was well thought through.

The key topics were Universal Windows Platform apps, Web Apps and IoT. Even though I don’t work with these on a daily basis, the level of the talks, as well as the way they were presented, was balanced well enough to actually share some meaningful information. The top three for me were:

  1. The packaging of Universal Windows Platform apps: running an installer in a recording container to be able to apply all the operations on the system later on.
  2. Manifoldjs for building hosted web apps.
  3. Enhanced Reality provided with the Unity engine.

It was quite sad that people started leaving quite early. Fortunately, a solid group (including me :P) stayed till the very end, when the networking took place. I’m looking forward to seeing Build 2017 again. Of course, in Warsaw.

Views’ warm up for Event Sourcing

When using Event Sourcing as a foundation for your solution, the command part is a solved problem: take an aggregate’s version and a command, apply the command onto the state, and try to append the created events to the store, checking the version again. There is a read part as well, called views, which are nothing more than aggregations of a subset of events from the system. A view works like a live query, consuming events from the log and applying them onto the projection on and on. Considering that the number of events is constantly growing, how would you deploy a new version of the application containing a new view, which needs to be built from the beginning, from the very first event? Even with a well-performing database, applying a few million events can take a while.
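
For context, here is a hedged sketch of that “solved” command path with optimistic concurrency on the stream version. The interfaces (Load, TryAppend, the decide delegate) are illustrative stand-ins, not any particular client’s API:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative stand-ins, not a specific event store client.
public interface IEventStream
{
    Task<(long Version, IReadOnlyList<object> Events)> Load(string id);

    // returns false when the stream has moved past expectedVersion
    Task<bool> TryAppend(string id, long expectedVersion, IReadOnlyList<object> newEvents);
}

public static class CommandHandler
{
    public static async Task<bool> Execute(
        IEventStream store,
        string aggregateId,
        Func<IReadOnlyList<object>, IReadOnlyList<object>> decide)
    {
        var (version, history) = await store.Load(aggregateId);        // state + version
        var newEvents = decide(history);                               // apply the command
        return await store.TryAppend(aggregateId, version, newEvents); // version re-checked
    }
}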

Warm up routine

Let’s consider the following routine. Instead of calling views by their names alone, a system version is appended. Take the view ‘users’ as an example:

  1. for version 1.0.0 it’s “users-1_0_0”
  2. for version 1.2.0 it’s “users-1_2_0”

Before publishing the new version and moving all users to it, predeploy the application and run its view builder. The views will be rebuilt in the background, taking the time they need. Once the builder starts having trouble fetching the last events because there is no more data, the views are prebuilt and the new version can be deployed.
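
A minimal sketch of this routine, assuming hypothetical IEventStore/IView stand-ins; the naming helper matches the users-1_2_0 scheme above:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public interface IEventStore
{
    Task<IReadOnlyList<object>> ReadBatch(long fromPosition, int batchSize);
}

public interface IView
{
    void Apply(object @event);
}

public static class ViewWarmUp
{
    // "users" + 1.2.0 -> "users-1_2_0"
    public static string VersionedViewName(string view, Version v) =>
        $"{view}-{v.Major}_{v.Minor}_{v.Build}";

    // replays the whole log into the freshly deployed view until it catches up
    public static async Task Run(IEventStore store, IView view)
    {
        long position = 0;
        while (true)
        {
            var batch = await store.ReadBatch(position, batchSize: 512);
            if (batch.Count == 0)
                break; // no more data: the view is prebuilt

            foreach (var e in batch)
                view.Apply(e);

            position += batch.Count;
        }
    }
}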

Cost and optimization

Of course, these view rebuilds can be tedious and long. They can increase the costs of your app and put additional pressure on the event store. If a cost/performance optimization is needed, you can consider detecting whether a view has changed at all, something very similar to what Rinat did a few years back. You may come up with something more explicit as well. Whatever the mechanism, the rule would be: if a view’s definition is the same, your app keeps using the last existing version of the view.
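
One possible, hedged take on such detection: fingerprint the projection’s definition (source text, IL, config; whatever you treat as the definition) and rebuild only when the fingerprint changes:

using System;
using System.Security.Cryptography;
using System.Text;

public static class ViewChangeDetection
{
    // hash whatever you treat as the view's definition
    public static string Fingerprint(string projectionDefinition)
    {
        using var sha = SHA256.Create();
        var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(projectionDefinition));
        return Convert.ToHexString(hash);
    }

    // rebuild only when the stored fingerprint no longer matches
    public static bool NeedsRebuild(string storedFingerprint, string currentDefinition) =>
        !string.Equals(storedFingerprint, Fingerprint(currentDefinition),
            StringComparison.OrdinalIgnoreCase);
}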

Providing this warm-up routine, especially when using blue-green deployments, can improve not only the startup of a new version of your application but also provide an environment for testing the deployment before switching to the new version.

Events on the Outside versus Events on the Inside

Recently I’ve been revisiting some of my Domain Driven Design, CQRS & Event Sourcing knowledge and techniques. I’ve supported the creation of systems built with these approaches, hence I could revisit my experiences as well. If you are not familiar with these topics, a good starter could be my Feed Your Head list.


So you model your domain with aggregates in mind, distilling contexts and domains. The separation between services may be clear or a bit blurry, but it looks OK and, more importantly, maps the business well. Inside a single context bubble you can use your aggregates’ events to create views, and use the views when you need data for a command execution. It doesn’t matter which database you use for storing events. It’s simple: restore the state of an aggregate, gather some data from views, execute a command. If any events are emitted, just store them. A background worker will pick them up and dispatch them to a Process Manager.


What about exposing your events to other modules? If and how should another module react to an event? Should it be able to build its own view from the data held in the event? All of these could be summed up in one question: should external events match the internal events of a specific module? My answer would be: it’s not easy to tell.

In some systems this may be fine. By the system I mean not only a product, but also a team. Sometimes having a feed of events can be liberating, enabling faster growth by speeding up the initial shaping. You could agree to actually separate services from the very start and verify during design whether the logical complexity stays low, i.e. whether there aren’t too many events shared between services, and what they contain.

This approach brings some problems as well. All the events become your API. They are public, so they now have to be taken into consideration when versioning your schemas. Probably a migration guide will be needed as well. The bigger the public API, the bigger the friction of maintaining it for its consumers.

Having said that, you could consider a smaller, totally separate set of events to share with external systems. This draws a visible line between the Inside & the Outside of your service, enabling you to evolve rapidly on the Inside. Maintaining a stable API is much easier then, and the system itself gains a separation. This addresses the question about views as well: where should they be stored originally? The answer would be to store properly versioned, immutable views Inside the service, using identifiers to pass references to another service. When needed, the consumer can copy & transform the data locally. A separate set of events also gives you the option of not using Event Sourcing where it’s not needed. Options like that (you may use it, but don’t have to) are always good.
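
To make the line visible in code, here is a hedged illustration with made-up event names: a rich internal event, and the slimmer, versioned contract that crosses the boundary:

using System;

// Inside: rich, free to evolve with the module.
public sealed class OrderLineAdded
{
    public Guid OrderId;
    public Guid ProductId;
    public int Quantity;
    public decimal UnitPrice;
    public string PricingPolicyVersion; // internal detail, never exported
}

// Outside: slim, stable, explicitly versioned; this is the public API.
public sealed class OrderChanged
{
    public Guid OrderId;
    public int SchemaVersion = 1; // consumers depend on this contract
}

// Mapping happens at the boundary, so the Inside can change freely.
public static class Boundary
{
    public static OrderChanged ToPublic(OrderLineAdded e) =>
        new OrderChanged { OrderId = e.OrderId };
}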

For some time I was an advocate of sharing events between services easily, but now I’d say: make the proper choice for your scenario. Consider the pros and cons, especially in terms of the schema maintenance tax & the option of not sticking to Event Sourcing everywhere.


The process of revisiting my assumptions was started by a few materials. One of them is a presentation by Andreas Ohlund, ‘Putting your events on a diet’, sharing a story about deconstructing an online shop into services. The second is some bits from A Decade of DDD, CQRS, Event Sourcing by Greg Young. Last but not least, Pat Helland’s Data on the Outside versus Data on the Inside.

Single producer single consumer optimizations

The producer-consumer relationship is one of the most fundamental cooperation patterns: some components produce values or issue requests, and some consume/handle them. Depending on the number of components on each end of this relationship, it’s called a ‘single/multi producer, single/multi consumer’ relationship. It’s important to make this choice explicit, because as with every explicit choice, it enables some optimizations. I’d like to share some thoughts on the optimizations applied in the single-producer single-consumer scenario in the RampUp library, provided by the OneToOneRingBuffer.

The behavior of the ring buffers in RampUp is ported from Java’s Agrona. They provide a queue that enables sequential reads on the consumer side. The reasoning behind it is that sequential reads are CPU friendly, so the consumer can process messages much more quickly. For the ManyToOneRingBuffer, the producing part is quite complex. It proceeds as follows:

  1. check against the consumer position whether there is enough space
  2. allocate a slot in the ring (done with Interlocked operations in a loop; this may take a while)
  3. write the header with an ordered (volatile) write
  4. put the data
  5. write the header again, marking the message as published

This brings a lot of unneeded work for a single producer. When there is only one producer, there’s nothing to compete with; the only check needed is that the producer does not overlap the consumer. So the algorithm looks as follows:


  1. check against the consumer position whether there is enough space
  2. put the data
  3. write the header with an ordered write, marking the message as published
  4. write the tail value for future writes

Removing the Interlocked operations and lowering the number of volatile operations improves the producer’s performance greatly (less synchronization).
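
To make the difference concrete, here is a hedged sketch of a single-producer single-consumer ring. It is not RampUp’s actual code: it stores fixed-size, non-zero int payloads and omits the wrapping details a real ring buffer must handle, but it shows why no compare-and-swap loop is needed:

using System.Threading;

public sealed class OneToOneRingSketch
{
    private readonly int[] _slots; // slot == 0 means "not published yet"
    private long _tail;            // written only by the single producer
    private long _head;            // written only by the single consumer

    public OneToOneRingSketch(int capacity) => _slots = new int[capacity];

    // producer side: no Interlocked loop, a single ordered publish write
    public bool TryWrite(int message) // message must be non-zero in this sketch
    {
        // 1. check against the consumer position: is there enough space?
        if (_tail - Volatile.Read(ref _head) >= _slots.Length)
            return false;

        var index = (int)(_tail % _slots.Length);

        // 2./3. put the data and publish it in one ordered write;
        // nothing competes with us, so no compare-and-swap is needed
        Volatile.Write(ref _slots[index], message);

        // 4. advance the tail for future writes (a plain write is enough:
        // only this thread reads and writes _tail)
        _tail++;
        return true;
    }

    // consumer side: sequential, cache-friendly reads
    public bool TryRead(out int message)
    {
        var index = (int)(_head % _slots.Length);
        message = Volatile.Read(ref _slots[index]);
        if (message == 0)
            return false; // nothing published yet

        _slots[index] = 0;                    // hand the slot back...
        Volatile.Write(ref _head, _head + 1); // ...and release it to the producer
        return true;
    }
}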


If you want to compare the two on your own, here you are: ManyToOne and OneToOne.

Happy producing (and consuming).

Data has no format

I need to be able to store 1GB of JSON

I’d like to push XML 100 MB/s to this Azure blob

I need to log this data as CSV

Statements like these are sometimes true, but in the majority of cases the format is not a given; it is part of designing your architecture/application, or redesigning it if needed. Selecting a proper format can lower the size of your data, increasing the throughput of your system if a medium like a disk or a network is saturated. That’s why systems like Apache Arrow or Google’s Dremel use their own formats. That’s why you may consider using protobuf-net serialization for EventStore, disabling its built-in V8 projections and lowering the size of events at the same time. For low-latency systems you can choose the new Simple Binary Encoding library. That’s why storing data in another format is sometimes simply better. I’ve written a blog post, Do we really need all these data transformations, and it doesn’t claim the opposite. It’s all about making a rational, proper choice of the storage format, taking into consideration its different aspects and its influence on your system. With this one decision you might improve your system’s performance greatly.
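
As a small illustration of the mechanics (not a benchmark), here is a protobuf-net contract and a size check; the OrderPlaced shape is made up for the example:

using System.IO;
using ProtoBuf; // the protobuf-net package

[ProtoContract]
public class OrderPlaced
{
    [ProtoMember(1)] public long OrderId { get; set; }
    [ProtoMember(2)] public decimal Total { get; set; }
}

public static class SizeCheck
{
    // serialized size in bytes; typically a fraction of the JSON equivalent
    public static long ProtoSize(OrderPlaced e)
    {
        using var ms = new MemoryStream();
        Serializer.Serialize(ms, e);
        return ms.Length;
    }
}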