Orchestrating processes with full recoverability


Do you call a few services in a row as a part of a bigger process? What if one of the calls fails? What if your hosting application fails? Do you provide a reliable way for successfully finishing your process? If not, I might have a solution for you.

Anatomy of a process

A process can be defined as at least two calls to different services. When using a client library of some sort and C# async-await feature one could write a following process

var id = await invoiceService.IssueInvoice(invoiceData);
await notificationService.NotifyAboutInvoice(id);

It’s easy and straightforward. First, we want to issue an invoice. Once it’s done, a notification should be sent. Both calls although they are async should be executed step by step. Now. What if the process is halted after issuing the invoice? When we rerun it, there’s no notion of something stopped in the middle. One could hope for good logging, but what if this fails as well.

Store and forward

Here comes the solution provided by DurableTask library provided by the Azure team. The library provides a capability of recording all the responses and replaying them without execution. All you need is to create proxies to the services using a special orchestration context.

With a process like the above when executing following state is captured:

  1. Initial params to the instance of the process
  2. invoiceData are stored when first call is done
  3. invoiceService returns and the response is recorded as well
  4. invoiceNumber is stored as a parameter to the second call
  5. notificationService returns and it’s marked in the state as well

As you can see, every execution is stored and is followed by storing it’s result. OK. But what does it mean if my process fails?

When failure occurs

What happens when failure occurs. Let’s consider some of the possibilities.

If an error occurs between 1 and 2, process can be restarted with the same parameters. Nothing really happened.

If an error occurs between 2 and 3, process is restarted. The parameters to the call were stored but there’s no notion of the call to the first service. It’s called again (yes, the delivery guarantee is at-least-once).

If an error occurs between 3 and 4, process is restarted. The response to the call to the invoice service is restored from the state (there’s no real call made). The parameters are established on the basis of previous values.

And so on and so forth.

Deterministic process

Because the whole process is based either on the input data or already received calls’ results it’s fully deterministic. It can be safely replayed when needed. What are not deterministic calls that you might need? DateTime.Now comes immediately to one’s mind. You can address it by using deterministic time provided by the context.CurrentUtcDateTime.

What’s next

You can build a truly powerful and reliable processes on top of it. Currently, implementation that is provides is based on Azure Storage and Azure Service Bus. In a branch you can find an implementation for Service Fabric, which enables you to use it in your cluster run on your development machine, on premises or in the cloud.


Ensuring that a process can be run till a successful end isn’t an easy task. It’s good to see a library that uses a well known and stable language construct of async-await and lifts it to the next level, making it an important tool for writing resilient orchestrations.


Async programming model


This is a follow up post to Async pump for better throughput in Azure. Please read the first before moving forward.


I’ve been given a lot of feedback about my Async pump post. In a few cases this blog post from Ayende was quoted as it describes exactly the same approach. You can read the post, but more meaningful are comments provided by Kelly Sommers and Clemens Vasters.

The model

The await statement has simple semantics. It breaks your code and schedules the following part as a task continuation. This heavy lifting is done on the C# compiler level, so you don’t have to worry about. The model of this extension is simple: define a task and a continuation. Nothing more, nothing less. With my approach, that was a “trick”

That’s the premise of the “trick” that is allegedly achieving parallel execution of I/O and compute work here. That is, however, not the purpose of the asynchronous programming model and of the Windows IO completion port (IOCP) model. The point of IOCP is to efficiently offload IO work from user code to kernel and driver and hardware and not to bother the user code until the IO work is done

by Clemens Vasters

What it basically says is once you await on IO operation, your code that is run after, is scheduled on the IO thread

As IO typically takes very long and compute work is comparatively cheap, the goal of the IO system is to keep the thread count low (ideally one per core) and schedule all callbacks (and thus execution of interleaved user code) on that one thread. That means that, ideally, all work gets serialized and there minimal context switching as the OS scheduler owns the thread. The point of this model is ALREADY to keep the CPU nicely busy with parallel work while there is pending IO.

by Clemens Vasters

So with your code awaiting some IO, you’ll call IO, next the code after await is executed on the IO thread, as it’s assumed to be lightweight, another IO occurs, again the same thread dispatched the callback. With the async pump I proposed, comes a danger, as when we follow this partially awaitless approach the continuation code is

not on a .NET poll thread, but an IO pool thread. The strategy above sits on that IO pool thread past the point where you are supposed to return it (which is the next IO call) and thus you might force the IO pool to grow

by Clemens Vasters


Measure first. Think. Then think again. The “trick” worked in my case, improving performance for initialization of one app (I needed it in the beginning). Does it work in every case and should be used in general, I’d say no. It’s good to follow the programming model. If you don’t want to, you must have strong reasons for it.

Async pump for better throughput in Azure

This post is followed up by 


Introducing async-await has changed a lot. Now, with some compiler’s help we’re able to squeeze out more throughput from our machines, which may lower costs and increase throughput. In this blog post we’ll push the boundaries even further by questioning the need of immediate awaiting on a task.


The story behind this pattern is simple. I’m using a part of the Azure Storage Services, the page blob. It provides storage API targeted at random IO & page aligned reads and writes. This is a perfect solution for emulating disk IO (it’s used for VMs’ disks), but could be used in cases where you want to have ability to write to a file in Azure, under specific index. You can just read a page, modify it and write it back. If you’re interested in this topic, take a look at my talks, maybe we’ll meet during my presentation Keep Its Storage Simple Stupid.


It’s obvious that for reading/writing from Storage Services you want to use asyncified code. Using blocking calls like the one below freezes a thread, which considering the money you already pay, is not the best option. Remember, it’s the cloud and regular Storage Services are backed up by HDD disks. It might take a while. Still, let’s take a look at the sync version first. We’ll operate on a stream that has been opened from the blob.


The method above reads the buffer from a stream. It dispatches a read after a read to fill the buffer.


The async-ified version of this reader is not that different. It uses just async Task in its signature and awaits one the Read. We’ll have no blocking calls, leaving some spare CPU cycles for other operations.


Asynchronous pump

In the last attempt, we need to ask what do we read this buffer for? In my case, that’s for scanning over its content. I need to read a page blob from a given position and scan/deserialize it in a C# code. I do not want to preserve the buffer. As soon as I read it, I can move on. It’s just about reading a log, nothing more. The second property of it is: entries in this log are aligned to pages, so are well aligned for reading. Can we modify the reading part then?

We could think of using two buffers/streams. Schedule a read in the first, then in a loop:

  1. schedule a read in the second
  2. await on the first
  3. swap firstsecond
  4. go to 1

If we used this algorithm, we’d have a higher probability of one operation being already ended and ready to be dispatched. This means that our algorithm, possibly, could work on prefetched data without any interruptions, having the data ready when it needs it. For sake of simplicity, the buffer array, ReadBuffer were closed in a simple helper class called Buffer.



Having something ready to be awaited does not mean that you should await it immediately. Using this two buffer approach can increase the throughput of your algorithm by ensuring that data are fetched before they are needed. Now it’s time for you to search for some pumping potential in your code!

Task.WhenAll tests

In the last post I’ve shown some optimizations one can apply to reduce the overhead on creating asynchronous state machines. Let’s dive into the async world again and consider helper methods provided by the Task class, especially Task.WhenAll.

public static Task WhenAll(params Task[] tasks)

The method works in a following way. It accepts an array of tasks and returns a task that will finish as soon as all of the underlying tasks are finished. Applying this method in some scenarios may provide an improvement gain, as one can run a few tasks in parallel. It has a drawback though.

Let’s consider following code

public Task<int> A()
  await B1();
  await B2();
  return C ();

If b1 and b2 could be executed in parallel (for instance, they access Azure Table Storage), this method could be rewritten in a following way.

public Task<int> A()
  await Task.WhenAll(B1(), B2());
  return C ();

What, beside the mentioned performance improvement, changed? Now, method A is no longer a method A. There are two methods which can be randomly executed. One running operations in the following order: B1, B2, C, and the other: B2, B1, C. This effectively means, that your previous test coverage is no longer true. If you want to truly test it, you need to provide suites that will order these B* calls properly and ensure that all permutations will be emitted and tested. Sometimes it has no meaning, sometimes it has. Let’s consider a following scenario:

  • Two callers are calling A in the same time
  • Every B method removes a specific file, failing if it does not exist
  • At least one of the callers should succeed

In the first version, it was a pure race for being the first. The first that goes through B1, B2, C would execute properly. Now consider the second version of A with two callers executing following operations in the specified order:

  • Caller 1: B1, B2, C
  • Caller 2: B2, B1, C

As you can see, it’s a typical deadlock scenario and both callers would fail.

As always, there’s no silver bullet and if you want to use Task.WhenAll to speed up your application by running operations in parallel, you must embrace the fact of a possibly non linear execution.

Happy awaiting.

Rise of the IAsyncStateMachines

Whenever you use async/await pair, the compiler performs a lot of work creating a class that handles the coordination of code execution. The created (and instantiated) class implements an interface called IAsyncStateMachine and captures all the needed context to move on with the work. Effectively, any async method using await will generate such an object. You may say that creating objects is cheap, then again I’d say that not creating them at all is even cheaper. Could we skip creation of such an object still providing an asynchronous execution?

The costs of the async state machine

The first, already mentioned, cost is the allocation of the async state machine. If you take into consideration, that for every async call an object is allocated, then it can get heavy.

The second part is the influence on the stack frame. If you use async/await you will find that stack traces are much bigger now. The calls to methods of the async state machine are in there as well.

The demise of the IAsyncStateMachines

Let’s consider a following example:

public async Task A()
    await B ();

Or even more complex example below:

public async Task A()
    if (_hasFlag)
        await B1 ();
        await B2 ();

What can you tell about these A methods? They do not use the result of  the Bs. If they do not use it, maybe awaiting could be done on a higher level? Yes it can. Please take a look at the following example:

public Task A()
    if (_hasFlag)
        return B1 (); 
        return B2 ();

This method is still asynchronous, as asynchronous isn’t about using async await but about returning a Task. Additionally, it does not generate a state machine, which lowers all the costs mentioned above.

Happy asynchronous execution.