Relaxed Optimistic Concurrency

TL;DR

When using the optimistic concurrency approach for entities that are updated frequently, some of the actions may fail because of the conflicting version numbers. A proper modelling technique distilling if business requirements can be loosened may greatly increase the chances of succeeding with commands issued against these entities improving overall performance of an application and a lowering a probability of errors.

Optimistic Concurrency

The optimistic concurrency is an approach for ensuring non overlapping updates over a given entity. It’s supported by the majority of heavy ORMs and applied simply by adding a conditional where at the end of the update. For example

UPDATE Orders
-- more updated columns
SET version = @version + 1
WHERE id = @id AND version = @version

This approach ensures, that if any other operation updated the entity in the meantime, this update will fail. Additionally, if an ORM is capable of counting rows that should have been updated, like NHibernate does, it can abort a transaction and throw an exception informing that some of the operations that were planned to be executed failed.

The optimistic concurrency approach is not a unique SQL technique. It’s popular in many NoSql databases like Azure Table Storage for example. When updating an entity, its ETag is added as the If-Match header, ensuring, that if the entity was modified after retrieval and updated, the operation, again, will fail. See Update operation documentation here.

Finally, when applying Domain Driven Design and operating on an Aggregate Root, this technique is the easiest one to ensure, that the aggregate root is truly a transaction boundary. If the root has its version updated with every change of the aggregate, then two concurrent operations cannot be executed and one will fail, still, preserving the root as a transaction boundary. This applies to aggregate roots, no matter if you immerse them into Event Sourcing or a regular ORM mapped graph of entities. Just update the root with every operation and your aggregate will be just fine.

As it’s been shown above, optimistic concurrency is a simple and powerful tool that in a world of NoSql and transactional-boundaries-got-right may be the only one to ensure atomicity of operations.

Limitations

When using optimistic concurrency, the flow of applying a change is a bit different. Instead of just updating a property, or a value, the following approach is taken

  1. An aggregate is retrieved with its version
  2. If the state allows it, a command is executed
  3. The aggregates’ state is updated conditionally (if the version is unchanged)

Again, this ensures that the updated is applied on the version that a business logic operated onto, but limits the concurrent access.

For services using Event Sourcing, instead of retrieving entity all of the events are retrieved and a state of an aggregate is rebuilt. If snapshots are used, only events with versions bigger than a snapshot must be retrieved. If the snapshot is preserved in a in memory cache, then possibly, no events will be retrieved if the snapshot’s version is equal to the number of aggregate’s events so far. Events that are a result of a command are appended to the store conditionally. Depending on the storage it can be the stream version when using EventStore AppendToStreamAsync or update of a root markup entity when using a custom relational store.

An example

Let’s consider an example of a GitHub-like issue. Every issue has an option of locking it. It can be used for instance to lock an issue created by a troll (you don’t feed the troll) and disallow adding more comments. For sake of argument:

  1. let’s model all comments as a part of the issue aggregate (as always, there are many models that can be applied)
  2. optimistic concurrency is used for all commands.

A business requirement for locking an issue could look like:

when an issue is locked no user should be able to add more comments

It’s quite common, that when seeing a requirement like this, developers don’t ask questions. It’s even more unfortunate, that some companies require to just follow the analysis. Let’s try to relax this requirement a little bit by asking some questions:

  1. Is it required to lock the issue immediately?
  2. Could an issue be considered locked after some short period of time (less than 1s) after locking it?
  3. Could we allow adding some comments during this period?

If the answers point towards no need of an immediate lock, there’s a space to handle locking in a relaxed manner

Relaxed Optimistic Concurrency

If an operation can have its preconditions relaxed and can be performed after achieving some state it can be executed with much less friction. In the previous example, the state when a user can add a comment is a created issue. The precondition is a non-locked issue, but it’s ok to add a comment to a locked issue within some time boundaries. Consider the following flow

  1. An aggregate is retrieved with its version
  2. If the state allows it, a command is executed
  3. The aggregates’ state is updated conditionally (if the version is unchanged) appending the change unconditionally

Depending on the storage and the applied design in can be done in many ways.

When using Event Sourcing with EventStore a special version can be passed to the appending method which represents any version. This appends events unconditionally. This means that a locking operation and adding a comment can be done in parallel without conflicts!

When using a relational database, an issue entity can be retrieved to check it’s state. Next, a comment entity can be added separately, without updating the version of the issue itself. Again, because adding a comment does not change the version, the friction on the aggregate is lowered.

Summing up

Don’t take requirements for granted, but rather ask for the reasoning behind them. Try to relax requirements for areas which may suffer from the high contention. The model is just a model. There are no true or false models but these which help you or make your work harder. Choose wisely:)

Views’ warm up for Event Sourcing

When using Event Sourcing as a foundation for your solution, the command part is a solved problem. Just take an aggregate version, a command, apply onto a state and try append created events to a store, checking the version again. There is a read part of this as well, called views, which is nothing more than an aggregation of a subset of events from the system. This works like a live query, which consumes events from a log and applies them on the projection on and on. Considering that the number of events is constantly growing, how would you deploy a new version of application containing a new view which needs to be build from the beginning, from the very first event? Even with a well performing database applying a few millions of events can take a while.

Warm up routine

Let’s consider a following routine. Instead of calling views by their names, a system version is appended. Take a view ‘users’ as an example

  1. for version 1.0.0 it’s “users-1_0_0”
  2. for version 1.2.0 it’s “users-1_2_0”

Before publishing the new version and moving all users to use it, predeploy the application and run its view builder. The views will be rebuild in the background, taking needed time. Once the builder starts to have problems with getting last events as there’s no more data, the views are prebuilt and a new version can be deployed.

Cost and optimization

Of course these views rebuild can be tedious and long. These could increase costs of your app as well as put an additional pressure on the event store. If a cost/performance optimization is needed, you can consider detection of a view change, something very similar to what Rinat did a few years back. You may come up with something more explicit as well. For any mechanism, the rule would be that if a view is the same, your app uses the last existing version of the view.

Providing this warm up routine, especially when using blue green deployments can improve not only the performance of your application start (the new version scenario) but also can provide an environment for testing the deployment before switching to the new version.

Data has no format

I need to be able to store 1GB of JSON

I’d like to push XML 100 MB/s to this Azure blob

I need to log this data as CSV

Statements like this are sometimes true, but in the majority of cases the format is not given and is a part of designing your architecture/application. Or redesigning if needed. Selecting a proper format can lower the size of your data, increasing the throughput of your system, if a medium like a disk or a network is saturated. That’s why systems like Apache Arrow or Google’s Dremel use their own formats. That’s why you may consider using the protobuf-net serialization for EventStore, disabling it build in v8 projections and lowering size of events at the same time. For low latency systems you can choose the new library Simple Binary Encoding. That’s why sometimes storing data in another format is simply better. I’ve written a blog post Do we really need all these data tranformations and it doesn’t state something opposite. It’s all about making a rational and proper choices of the storage format and taking into consideration different aspects of it and its influence on your system. With this one decision you might improve your system performance greatly.

IS vs HAS relationship for your API

There is an urge of making things automagically. For instance, when you have a DDD Aggregate, one could consider automatic publishing all the commands as the service API. As The Aggregate is a part of a model & a language which you agreed to use, that seems to be a perfect match to be your API. Is it?

IS vs HAS

There is a rule of Composition over inheritance. It says that instead of deriving from different components, one should compose bigger parts from already existing by using them, but no deriving. A good example might be a user and an employee. As an employee, you are given a user in the system. You might try to model it with the derivation in mind. Is an user an employee as well or the other way around? There’s no good answer to this.

You could model it in another way. There an employee, that a user has access to. When a user logs in he/she can access data of an employee is attached to him/her. You can see where is it going. Keep things minimal, use other elements but do not introduce a relation of being something.

A bit abusive allegory

Now ask yourself a question. Is your API using the model or is it the model? In the majority of cases, the interfaces of your API & your model may be aligned, but they are not the same! Even if you publish operations named after a model that you established, you’d like your API to use the model just in case of remodeling the domain. It’s good to automate and do not write much code. On the other hand, it’s good to have proper abstractions separating concerns of two different worlds.

Producer – consumer relationship

In the last post about the RampUp library I covered on of the foundations: IRingBuffer. Now I’d like to describe the contract it fulfills.

Producer consumer

If you take a look at the IRingBuffer you’ll see Write/Read methods. These two are responsible for producing/consuming or writing/reading messages to the buffer in FIFO way. What are the guarantees behind such an interface? What about concurrent accessing this structure?

Multimulti

The easiest approach for distinguishing accessing patterns is considering whether the structure could be accessed by one or more threads. If you consider producer/consumer you’ll see that there are four options:

  1. SPSC – Single Producer Single Consumer – only one thread produces items, and another consumes them
  2. MPSC – Multi Producer Single Consumer – multiple threads may produce items in a safe manner, again there’s a single consuming thread
  3. SPMC – Single Producer Multi Consumer – this could be treated as a distributor of work
  4. MPMC – Multi Producer Multi Consumer – multi/multi, ConcurrentQueue is a good example of it

Unfortunately nothing comes for free. If you want to get multi, you’ll pay the price of handling friction on that end. On the other hand, if you want to design a system where queues provide transport between different parts of the system, you’ll need to enable multiple producers for sure, as there’s going to be more than one system element. If you want to process items in an order they appeared and leave the locking issues, just write a fast single threaded code, a single consumer with a single worker thread would be the way to go. Of course this worker thread may access other queues and produce items for them (hence, multi producer is needed).

The ring buffer implementation in RampUp provides exactly MPSC behavior, as it’s prepared to handle items in order, by a single thread.

Nautral identifiers as subresources in RESTful services

There’s a project I’m working on, which provides a great study of legacy. Beside the code, there’s a database which frequently uses complex natural keys, consisting of various values. As modelling using natural complex keys may be natural, when it comes to put a layer of REST services on top of that structure, a question may be raised: how to model the identifiers in the API, should this natural keys be leaked into the API?

REST provides various ways of modelling API. One of them are REST subresources, which are represented as URIs with additional identifiers at the end. The subresource is nothing more than an identified subpart of the resource itself. Having that said, and taking as an example a simple row with complex natural key consisting of two values <Country, City> how could one model accessing cities (for sake of this example I assume that, there are cities all around the world having the same name but being in different countries and all the cities in the given country have distinct names). How one could provide a URI for that? Is the following the right one?

/api/country/Poland/city/Warsaw 

The API shows Warsaw as the Polish city. That’s true. This API has that nice notion of being easy to consume, navigate. Consider following example:

/api/city/Poland,Warsaw

Now it’s a big uglier, both the country and the city name are at the end. This is a bit different for sure and tells nothing about country accessible under /api/country/Poland. The question is which is better?

Let me abuse a bit the DDD Aggregate term and follow its definition. Are there any operations that can be performed against the city resource/subresource that does not change the state of the country? If yes, then in my opinion modelling your API with resources shows something totally different, saying: hey, this is a city, a part of this country; it’s a subresource and should be treated as a part of the country ALWAYS. Consider the second take. This one, presents a city as a standalone resource. Yes, it is identified by a complex natural key consisting of two dimentions, but this is a mere implementation detail. Once a usual identifiers like int, Guid are introduced the API won’t change that much, or even better, API could accept both of them, using the older combined id for consumers that don’t want to change their usage (easier versioninig).

To sum up: do not leak your internal design either it’s a database design or an application design. Present your user a consistent view grouping resources under wings of transactional consistency.

Feature oriented design got wrong

The fourth link in my google search for ‘feature toggle’ is a link to this Building Real Software post. It’s about not about feature toggles described by Martin Folwer. It’s about feature toggles got wrong.

If you consider toggling features with flags and apply it literally, what you get is a lot of branching. That’s all. Some tests should be written twice to handle a positive and a negative scenario for the branch. The reason for this is a design not prepared to handle toggling properly. In the majority of cases, it’s a design which is not feature-based on its own.

The featured based design is created on the basis of closed components, which handle the given domain aspect. Some of them may be big like ‘basket’, some may be much smaller, like ‘notifications’ reacting to various changes and displaying needed information. The important thing is to design the features as closed components. Once you have it done this way, it’s easier to think about the page without notifications or ads. Again, disabling the feature is not a mere flag thrown in different pieces of code. It’s disabling or replacing the whole feature.

One of my favorite architecture styles, event driven architecture helps in a great manner to build this kind of toggles. It’s quite easy to simply… not handle the event at all. If you consider the notifications, if they are disabled, they simply do not react to various events like ‘order-processed’, etc. The separate story is to not create cycles of dependencies, but still, if you consider the reactive nature of connections between features, that’s a great enabler for introducing toggling with all of advantages one can derive from it with A/B tests, canary releases in mind.

I’m not a fan boy of feature toggling, I consider it as an important tool in architects arsenal though.