Implementing a scheduler for your orchestrations

TL;DR

We’ve already seen here and here that with async-await one could easily sketch an orchestration/saga for any process that should be both, robust and resilient. It’s time to take a look how a scheduler for such a process could be implemented.

Delay with no Task

Usually, when we want to delay an action in an asynchronous flow, we use Task.Delay. This method schedules a continuation with the rest of our code, to be executed after the specified delay. The usage is as simple as:


await Task.Delay(TimeSpan.FromSeconds(1.5));

This is fine, when we want to postpone an action for a few seconds, but what in case of processes that want to be frozen for days? How could you implement it?

First, let us rephrase the delay to a method that is provided by the base orchestration class (you can always have a base, can’t you?).


await this.Delay(TimeSpan.FromSeconds(1.5));

With this assumption, we can move forward and take a look at a possible Delay implementation.

Delay for Orchestrations

The whole idea of this orchestration is based on snapshoting its changes as events and make them replayable. In other words, if a failure occurs, the orchestration process should be resurrected on another node with no changes in the flow. This makes implementation a bit trickier, but is needed for providing strong foundations for our processes. Let’s take a look at Delay possible implementation.


protected Task Delay(TimeSpan delay)
{
  var date = GetDateTimeUtcNow();
  var scheduleAt = date + delay;

  ScheduledAt existingDelay;
  if (TryPop(out existingDelay))
  {
    if (existingDelay.Value > context.DateTimeUtcNow())
    {
      EndCurrentExecution();
      return new TaskCompletionSource<object>().Task;
    }

    return Task.CompletedTask;
  }

  if (scheduleAt <= date)
  {
    return Task.CompletedTask;
  }

  Append(new ScheduledAt(scheduleAt));
  EndCurrentExecution();
  return new TaskCompletionSource<object>().Task;
}

The first line of this methods calls to GetDateTimeUtcNow. As you can imagine, this gets the UTC current date. It has one additional property though. Do you remember that we need to make this method possible to be executed multiple times with the same effect? This means, that the result of GetDateTimeUtcNow will be recorded and when we, for any reason like the process kill, enter the orchestration again, it will provide the same value. Effectively, it will be now from the first execution of it.

The next step is to calculate the date when the delay should end, where the next execution is ScheduledAt.

We TryPop a prerecorded event. If the orchestration was already active, it left a trace, an event in the history, that we can pop. If there’s an entry, we compare it with the current UtcDateTimeNow. If the orchestrations should wait more we just mark it as one that requires ending execution. Next we return  new TaskCompletionSource<object>().Task
which effectively is a never ending task. This means that any continuation attached by the caller of this method, either explicit or implicit using await won’t be run!

If there was no event and the date that Delay is scheduled for some reason lower then current date, a completed task is returned. Otherwise, an event is added and the current execution is ended with the same pattern: by setting a notification about execution not proceeding any longer and returning a never completing task.

Execution status

The caller responsibility is to gather information whether the orchestration ended or was scheduled for later execution. This is done by awaiting one of the tasks. Either the task of the orchestration itself or a task that is EndCurrentExecution sets the result.


await Task.WhenAny(
orchestration.Execute(),
currentExecutionEndingTask);

Summary

We saw how powerful can be asynchronous flows, especially when connected with optional calling of scheduled continuations. With a simple recording of events we were able to create an orchestration tooling that is easy to use by the end user (programmer), but still provides an interesting and powerful semantics of a time dependent process.

Converging processes

TL;DR

Completing an order is not an easy process. A payment gateway may not accept payments for some time. A coffee can be spilled all over the book you ordered and a new one needs to be taken from a storage. Failures may occur in different moments of the pipeline, but still your ordered is received. Is it a one process or multiple? If one, does anyone takes every single piece into consideration? How to ensure that eventually a client will get their product shipped?

Process managers and sagas

There are two terms used for these processors/processes. They are process managers and sagas. I don’t want to go through the differences and marketing behind using one or another. I want you to focus on a process that handles an order and that reacts to three events:

  • PaymentTimedOut – occurring when the response for Payment was not delivered before the specific timeout
  • PaymentReceived – the payment was received in time
  • OrderCancelled – the order for which we requested this payment was cancelled

What messages will this process receive and in which order? Before answering, take into consideration:

  • the scheduling system that is used to dispatch timeouts,
  • the external gateway system (that has its own SLA),
  • messaging infrastructure

What’s the order then? Is there a predefined one? For sure there isn’t.

Convergence

Your aim is to provide a convergent process. You should review all the permutations of the process inputs. Being given 3 types of input, you should consider 3! = 6 possibilities. That’s why building a few processes instead of one is easier. You can think of them as sub-state machines that later, can be composed into a bigger whole. This isn’t only a code complexity, but a real mental overhead that you need to design against.

Summary

Designing multiple small processes as sub-state machines is easier. When dealing with many events to react to, try to extract smaller state machines and aggregate them on a higher level.

 

Big Ball of Fun

TL; DR
How to deal better with legacy code.

Mental

There’s a very useful tool, that everybody uses but only a few recognize in their behavior. This tool is called a mental model. It’s a way of thinking, that could help or harm your ability to deal with a specific situation. For instance, when somebody overtakes your car, you might think that he/she is an idiot, or leave some space for an interpretation saying ‘it must be something urgent’. Depending on your choice, your significant other and your child may learn a new combination of swearwords that never existed before. Just to be clear: I’m not saying that inventing these combinations is useful, but rather the opposite. The most important thing is to be more aware of your reactions and consciously build models that fit you and help you.

Fun fun fun fun

What’s better, mud or fun? If you were Peppa Pig, then you’d put an equality sign between them. But you aren’t. The mud is dirty and stinky and fun is just … fun. What would you like to approach: fun or mud? I bet the answer is fun. Next time when you need to work on a piece of a legacy, whether it’s a COBOLac or Spaghetti Visualo Basico, try to use this Big Ball of Fun and share this term in your team. This might remove one of the obstacles and turn it into something more approachable. You still might have some others, but one will be gone.

Let’s have some fun!