Snowy identifiers

TL;DR

When using the snowflake entities pattern, it’s quite easy to forget about using external identifiers that we need to communicate with external systems. This post provides an easy way to address this concern.

Identity revisited

The identifier of a snowflake entity was presented as a guid. We use an artificial non-colliding client-generated identifier to ensure, that any part of the system can generate one without validating that a specific value hasn’t been used before. This enables storing different pieces of data, belonging to different contexts in different services of our system. No system leaves in vacuum though, and sometimes it requires communication with the rest of the world.

Gate away!

A common aspect that is handled by an external system are payments. When you consider credit cards, native bank applications, PayPay, BitCoin and all the rest, providing that kind of a service on your own is not a reasonable option. That’s why external services are used – the price of using one is much cheaper than delivering one. Let’s stick to the payments example. How would you approach this? Would you call the external payment service from each of your services? I hope you’d not. A better approach is to create a gateway, that will act as a translator between your system and the external one.

How many ids do I need?

Using a gateway provides a really interesting property. As the payment gateway is a part of your system, it can use the snowflake identifier. In other words, if there’s an order, it’s ok (under given circumstances) to use its identifier as identifier of the payment as well. Of course if you want to model these two as a part of a snowflake entity spanning across services. It’d be the payment gateway responsibility to correlate the system snowflake identifier with the external system id (integer, some string, whatever). This would create a coherent view of an entity within your system boundaries, closing the mapping in a small dedicated area of the payment gateway.

An integration with an external system closed in a small component leaving your system agnostic to this? Do we need more?

Summary

As you can see, closing the external dependency as a gateway provides value not only by separating the interface of the external provider from your system components, but also preserves a coherent (but distributed) view of your entities.

Dependency rejection

TL;DR

This values must be updates synchronously or we need referential integrity or we need to make all these service calls together are sentences that unfortunately can be heard in discussions about systems. This post provides a pattern for addressing this remarks

The reality

As Jonas Boner says in his presentation Knock, knock. Who’s there? Reality. We really can’t afford to have one database, one model to fit it all. It’s impossible. We process too much, too fast, with too many components to make it all in one fast call. Not to mention transactions. This is reality and if you want to deny reality good luck with that.

The pattern

The pattern to address this remarks is simple. You need to say no to this absurd. NoNoNo.  This no can be supported with many arguments:

  • independent components should be independent, right? Like in-de-pen-dent. They can’t stop working when others stop. This dependency just can’t be taken.
  • does your business truly ensures that everything is prepared up front? What if somebody buys a computer from you? Would you rather say I’ll get it delivered in one week or first double check with all the components if it’s there, or maybe somebody returned it or maybe call the producer with some speech synthesizer? For sure it’s not an option. This dependency just can’t be taken.

You could come up with many more arguments, which could be summarized simply as a new pattern in the town, The Dependency Rejection.

Next time when your DBA/architect/dev friend/tester dreams about this shiny world of a total consistency and always available services, remind them of this and simply reject dependency on this unrealistic idea.

 

Data has no format

I need to be able to store 1GB of JSON

I’d like to push XML 100 MB/s to this Azure blob

I need to log this data as CSV

Statements like this are sometimes true, but in the majority of cases the format is not given and is a part of designing your architecture/application. Or redesigning if needed. Selecting a proper format can lower the size of your data, increasing the throughput of your system, if a medium like a disk or a network is saturated. That’s why systems like Apache Arrow or Google’s Dremel use their own formats. That’s why you may consider using the protobuf-net serialization for EventStore, disabling it build in v8 projections and lowering size of events at the same time. For low latency systems you can choose the new library Simple Binary Encoding. That’s why sometimes storing data in another format is simply better. I’ve written a blog post Do we really need all these data tranformations and it doesn’t state something opposite. It’s all about making a rational and proper choices of the storage format and taking into consideration different aspects of it and its influence on your system. With this one decision you might improve your system performance greatly.

Feature oriented design got wrong

The fourth link in my google search for ‘feature toggle’ is a link to this Building Real Software post. It’s about not about feature toggles described by Martin Folwer. It’s about feature toggles got wrong.

If you consider toggling features with flags and apply it literally, what you get is a lot of branching. That’s all. Some tests should be written twice to handle a positive and a negative scenario for the branch. The reason for this is a design not prepared to handle toggling properly. In the majority of cases, it’s a design which is not feature-based on its own.

The featured based design is created on the basis of closed components, which handle the given domain aspect. Some of them may be big like ‘basket’, some may be much smaller, like ‘notifications’ reacting to various changes and displaying needed information. The important thing is to design the features as closed components. Once you have it done this way, it’s easier to think about the page without notifications or ads. Again, disabling the feature is not a mere flag thrown in different pieces of code. It’s disabling or replacing the whole feature.

One of my favorite architecture styles, event driven architecture helps in a great manner to build this kind of toggles. It’s quite easy to simply… not handle the event at all. If you consider the notifications, if they are disabled, they simply do not react to various events like ‘order-processed’, etc. The separate story is to not create cycles of dependencies, but still, if you consider the reactive nature of connections between features, that’s a great enabler for introducing toggling with all of advantages one can derive from it with A/B tests, canary releases in mind.

I’m not a fan boy of feature toggling, I consider it as an important tool in architects arsenal though.

 

Lokad.CQRS Retrospective

In the recent post Rinat Abdullin provides a retrospective for Lokad.CQRS framework which was/is a starting point for many CQRS journeys. It’s worth to mention that Rinat is the author of this library. The whole article may sound a bit harsh, it provides a great retrospection from the author’s and user’s point of view though.

I agree with the majority points of this post. The library provided abstractions allowing to change the storage engine, but the directions taken were very limiting. The tooling for messages, ddd console, was the thing at the beginning, but after spending a few days with it, I didn’t use it anyway. The library encouraged to use one-way messaging all the way down, to separate every piece. Today, when CQRS mailing lists are filled with messages like ‘you don’t have to use queues all the time’ and CQRS people are much more aware of the ability to handle the requests synchronously it’d be easier to give some directions.

The author finishes with

So, Lokad.CQRS was a big mistake of mine. I’m really sorry if you were affected by it in a bad way.

Hopefully, this recollection of my mistakes either provided you with some insights or simply entertained.

which I totally disagree with! Lokad.CQRS was the tool that shaped thinking of many people, when nothing like that was available on the market. Personally, it helped me to build a event-driven project (you can see the presentation about this here) based on somehow on Lokad.CQRS but with other abstractions and targeted at very good performance, not to mention living documentation built with Mono.Cecil.

Summary

Lokad.CQRS was a ground breaking library providing a bit too much tooling and abstracting too many things. I’m really glad if it helped you to learn about CQRS as it helped me.  Without this, I wouldn’t ask all the questions and wouldn’t learn so much.

The provided retrospective is invaluable and brings a lot of insights. I’m wishing you all to make that kind of ground breaking mistakes someday.

Pearls: EventStore transaction log

I thought for a while about presenting a few projects which are in my opinion real pearls. Let’s start with the EventStore and one in one of its aspects: the transaction log.
If you’re not familiar with this project, EventStore is a stream database providing complex event processing. It’s oriented around streams of events, which can be easily aggregated or repartitioned with projections. Based on ever appended streams and projections chasing the streams one can build a truly powerful logic around processing events.
One of the interesting aspects of EventStore is its storage engine. You can find a bit of description in here. ES does not abstract a storage away, the storage is a built-in part of the database itself. Let’s take a look at its parts before discussing its further:

Appending to the log
One the building blocks of ES is SEDA architecture – the communication within db is based on publishing and consuming messages, which one can notice reviewing StorageWriterService. The service subscribes to multiple messages, mentioned in implementations of the IHandle interface. The arising question is how often does the service flushed it’s messages to disk. One can notice, that method EnqueueMessage beside enqueuing incoming messages counts ones marked by interface IFlushableMessage. What is it for?
Each Handle method call Flush at its very end. Additionally, as the EnqueueMessage increases the counter of messages requiring flush, each Handle method decreases the counter when it handles a flushable message. This brings us to the conclusion that the mentioned counter is equal 0 iff there are no more flushable messages in the queue.

Flushing the log
Once the Flush is called a condition is checked whether:

  • the call was made with force=true (this never happens) or
  • there are no more flush messages in the queue or
  • the given time from the last time has passed

This provides a very powerful batching behavior. Under stress, the flush-to-be counter will be constantly greater than 0, providing flushing every given period of time. Under less stress, with no more flushables in the queue, ES will flush every message which needs to flush the log file.

Acking the client
The final part of the processing is the acknowledgement part. The client should be informed about persisting a transaction to disk. I spent a bit of time (with help of Greg Young and James Nugent) of chasing the place where the ack is generated. It does not happen in the StorageWriterService. What’s responsible for considering the message written then? Here comes the second part of the solution, the StorageChaser. In a dedicated thread, in an infinite loop, a method ChaserIteration is called. The method tries to read a next record from a chunk of unmanaged memory, that was ensured to be flushed by the StorageWriterService. Once the chaser finds CommitRecord, written when a transaction is commited, it acks the client by publishing the StorageMessage.CommitAck in ProcessCommitRecord method. The message will be translated to a client message, confirming the commit and sent back to the client.

Sum up
One cannot deny the beauty and simplicity of this solution. One component tries to flush as fast as possible, or batches a few messages if it cannot endure the pressure. Another one waits for the position to which a file is flushed to be increased. Once it changes, it reads the record (from the in-memory chunk matched with the file on disk) processes it and sends acks. Simple and powerful.

Out of order commands

In the previous posts a simple mechanism of storing information needed for operation idempotence was introduced. A simple hash table, which state is transactionally saved with the state of object onto which the send operation was applied. How about receiving operations out of order? What if infrastructure (for instance, messaging system) will pass one operation earlier than the second, which in reality occurred earlier?

It’s time to make it explicit and start calling elements in the DDD manner. So for sake of reference, the object considered as the subject of an operation is an aggregate root. The operation is of course a message. The modeling assumes using the event sourcing as a storage for aggregates’ states.

Assume, that the aggregate, which the command is sent to, has a property called Version, incremented with each event applied on. Assume then, each command contains a version number, which is supposed to be equal to the aggregate’s version. If, during dispatch, these two values are different, an exception is thrown and command do not change the state of the aggregate. It’s a simple optimistic concurrency implementation, allowing discarding out of order commands sent to an object.

To make it more interesting, consider a sharded system, where specific aggregates are stored by different nodes (but for each aggregate there is one node where it is stored). An aggregate’s events (state changes) have to be propagated across all the nodes/shards in the same idempotent manner as commands are sent to aggregates. It’s easy to apply hashtable for each node and with using the very same key: aggregateId with version but it would mean storing all the pairs of aggregate identifiers with their versions, which could possibly bring down each of your nodes (or make you use GBs of memory). Can the trivial fact, that version is increased with every event on the aggregate, could be used for some optimization? You’ll see in the next entry.