Events on the Outside versus Events on the Inside

Recently I’ve been revisiting some of my Domain Driven Design, CQRS & Event Sourcing knowledge and techniques. I’ve supported the creation of systems with these approaches, so I could revisit my experiences as well. If you are not familiar with these topics, a good starting point could be my Feed Your Head list.

Inside

So you model your domain with aggregates in mind, distilling contexts and domains. The separation between services may be clear or a bit blurry, but it looks OK and, more importantly, maps the business well. Inside a single context bubble, you can use your aggregates’ events to create views, and use those views when you need data to execute a command. It doesn’t matter which database you use for storing events. It’s simple: restore the state of an aggregate, gather some data from views, execute a command. If any events are emitted, just store them. A background worker will pick them up and dispatch them to a Process Manager.
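The Inside loop described above can be sketched in a few lines. This is only an illustration under my own assumptions: the `Cart` aggregate, the `ItemAdded` event, the price view and the dictionary used as an event store are all hypothetical names, not taken from any real system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemAdded:  # hypothetical internal event
    sku: str
    price: int


@dataclass
class Cart:  # hypothetical aggregate
    items: dict = field(default_factory=dict)

    def apply(self, event):
        # State changes only by applying events.
        if isinstance(event, ItemAdded):
            self.items[event.sku] = event.price

    @classmethod
    def restore(cls, events):
        # Rebuild the aggregate state from its stored events.
        cart = cls()
        for event in events:
            cart.apply(event)
        return cart

    def add_item(self, sku, price_view):
        # The command uses data gathered from a view (here: a price lookup).
        if sku in self.items:
            return []
        return [ItemAdded(sku, price_view[sku])]


def handle_add_item(store, stream_id, sku, price_view):
    # Restore state, execute the command, store whatever was emitted.
    history = store.setdefault(stream_id, [])
    cart = Cart.restore(history)
    new_events = cart.add_item(sku, price_view)
    history.extend(new_events)
    return new_events
```

The store here is a plain in-memory dictionary of lists, which is enough to show the shape of the flow; dispatching the stored events to a Process Manager is left out.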

Outside

What about exposing your events to other modules? Can another module react to an event, and if so, how? Should it be able to build its own view from the data held in the event? All of these can be summed up in one question: should external events match the internal events of a specific module? My answer would be: it’s not easy to tell.

In some systems, this may work well. By a system I mean not only a product, but also a team. Sometimes having a feed of events can be liberating and enable faster growth by speeding up the initial shaping. You could agree to actually separate services from the very start and verify during design whether the logical complexity is still low, i.e. whether there are not that many events shared between services and how much they contain.

This approach brings some problems as well. All the events become your API. They are public, so now they have to be taken into consideration when versioning your schemas. Some migration guide will probably be needed as well. The bigger the public API, the bigger the friction of maintaining it for its consumers.

Having said this, you could consider having a smaller and totally separate set of events that you want to share with external systems. This draws a visible line between the Inside & the Outside of your service, enabling you to evolve rapidly on the Inside. Maintaining a stable API is then much easier, and the system itself gains a clear separation. This also addresses the question about views: where should they be stored originally? The answer would be to store properly versioned, immutable views Inside the service, using identifiers to pass the reference to another service. When needed, the consumer can copy & transform the data locally. A separate set of events also makes it possible not to use Event Sourcing where it isn’t needed. That kind of option (you may, but don’t have to, use it) is always good.
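One way to picture the separate, smaller set of external events is a translation step at the service boundary. The sketch below is an assumption of mine, not a prescribed design: the event names (`OrderLineAdded`, `OrderPaid`, `OrderCompletedV1`) and the `to_public` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Internal events: free to change, never leave the service.
@dataclass(frozen=True)
class OrderLineAdded:
    order_id: str
    sku: str


@dataclass(frozen=True)
class OrderPaid:
    order_id: str
    payment_ref: str


# Public event: small, versioned, carries only an identifier.
@dataclass(frozen=True)
class OrderCompletedV1:
    order_id: str
    schema_version: int = 1


def to_public(event) -> Optional[OrderCompletedV1]:
    # Only a chosen subset of internal events crosses the boundary.
    if isinstance(event, OrderPaid):
        return OrderCompletedV1(order_id=event.order_id)
    return None


def publish(internal_events):
    # Translate and drop everything that has no public counterpart.
    translated = (to_public(event) for event in internal_events)
    return [event for event in translated if event is not None]
```

Consumers see only `OrderCompletedV1` and use the identifier to ask for more data if they need it, so the internal events can be renamed or reshaped without a migration guide.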

For some time I was an advocate of sharing events between services freely, but now I’d say: make the choice that fits your scenario. Consider the pros and cons, especially in terms of the schema maintenance tax & the option of not sticking to Event Sourcing.

Inspirations

The process of revisiting my assumptions was started by a few materials. One of them is a presentation by Andreas Ohlund, ‘Putting your events on a diet’, sharing a story about deconstructing an online shop into services. The second is some bits from A Decade of DDD, CQRS, Event Sourcing by Greg Young. Last but not least, there is Pat Helland’s Data on the Outside versus Data on the Inside.

IS vs HAS relationship for your API

There is an urge to make things happen automagically. For instance, when you have a DDD Aggregate, you could consider automatically publishing all its commands as the service API. As the Aggregate is a part of a model & a language which you agreed to use, it seems to be a perfect match for your API. Is it?

IS vs HAS

There is a rule of Composition over inheritance. It says that instead of deriving from different components, one should compose bigger parts from already existing ones by using them, not by deriving from them. A good example might be a user and an employee. As an employee, you are given a user in the system. You might try to model it with derivation in mind. Is a user an employee as well, or the other way around? There’s no good answer to this.

You could model it in another way. There is an employee that a user has access to. When a user logs in, he/she can access the data of the employee attached to him/her. You can see where this is going. Keep things minimal, use other elements, but do not introduce a relation of being something.
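The HAS variant of the user and employee example could look like the sketch below. The class and field names are hypothetical and chosen only to illustrate composition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employee:
    employee_id: str
    name: str


@dataclass(frozen=True)
class User:
    login: str
    # HAS: the user holds a reference to an employee, it is not derived from one.
    employee: Optional[Employee] = None

    def employee_name(self) -> Optional[str]:
        # Access the attached employee's data only if there is one.
        return self.employee.name if self.employee else None
```

A user without an attached employee (a contractor or a service account, say) needs no special subclass; the reference is simply empty.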

A bit abusive allegory

Now ask yourself a question. Is your API using the model or is it the model? In the majority of cases, the interfaces of your API & your model may be aligned, but they are not the same! Even if you publish operations named after a model that you established, you’d like your API to use the model, just in case you need to remodel the domain. It’s good to automate and to avoid writing much code. On the other hand, it’s good to have proper abstractions separating the concerns of two different worlds.
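A tiny sketch of an API that uses the model rather than being the model. Everything here is hypothetical: the `RenameCustomer` command, the `CustomerApi` class and the request shape are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameCustomer:  # a command that belongs to the model
    customer_id: str
    new_name: str


class CustomerApi:
    """Public API: it HAS a way to reach the model, it IS NOT the model."""

    def __init__(self, dispatch):
        # dispatch is any callable accepting a model command.
        self._dispatch = dispatch

    def rename(self, request: dict) -> dict:
        # Translate the public request into the model's language.
        command = RenameCustomer(
            customer_id=request["id"],
            new_name=request["name"].strip(),
        )
        self._dispatch(command)
        return {"status": "accepted"}
```

If the domain is remodeled and `RenameCustomer` is split or renamed, only the body of `rename` changes; the public request and response stay as they were.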

Enriching your events with important metadata

When applying event sourcing, it’s quite common to allow a part that is shared by all the events: the metadata. Various stores handle it in separate but similar ways. EventStore lets you append the metadata together with the events. You can do the same with NEventStore, using headers. But what information is useful to store in the metadata? Which information is worth storing even though it was not captured when the model was created?

Audit data

The most common case considered by various lists and blog posts is audit data. This set of data can be described as:

  1. who? – simply store the user id of the action invoker
  2. when? – the timestamp of the action and the event(s)
  3. why? – the serialized intent/action of the actor

That’s an obvious choice, and you can easily find examples that repartition events by the who, or gather events in a time frame or window, as is done in complex event processing. But is there something more one could store? Is there any particular set of additional dimensions that is worth remembering?
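The who/when/why triple can be captured with a small helper like the one below. It is a sketch under my own assumptions: the key names and the `audit_metadata` function are hypothetical, and how the dictionary is attached to an event depends on the store you use.

```python
import json
from datetime import datetime, timezone


def audit_metadata(user_id, command, now=None):
    # now can be injected to keep the function deterministic in tests.
    now = now or datetime.now(timezone.utc)
    return {
        "who": user_id,                              # the action invoker
        "when": now.isoformat(),                     # timestamp of the action
        "why": json.dumps(command, sort_keys=True),  # the serialized intent
    }
```

Sorting the keys when serializing the command keeps the stored intent stable, which helps when comparing or deduplicating entries later.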

Important metadata

Event sourcing deals with the effects of actions. An action executed on a state results in an event, according to the current implementation. Wait. The current implementation? Yes, the implementation of your aggregate can change, and it will, either because of bug fixing or because of new features. Wouldn’t it be nice if the version, like a commit id (SHA1 for gitters) or a semantic version, could be stored with the event as well? Imagine that you published a broken version and your business sold 100 tickets before the bug was fixed. It’d be nice to be able to tell which events were created on the basis of the broken implementation. Having this knowledge, you can easily compensate the transactions performed by the broken implementation.
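With the version stored in the metadata, finding the events produced by a broken release becomes a simple filter. The event shape (a dictionary with a `metadata` key holding `version`) is an assumption made for this sketch.

```python
def events_from_version(events, broken_version):
    # Select events whose metadata says they were produced by the given version.
    return [
        event for event in events
        if event.get("metadata", {}).get("version") == broken_version
    ]


# Hypothetical sample: two tickets sold by the broken release 1.4.0.
sample = [
    {"type": "TicketSold", "ticket": 1, "metadata": {"version": "1.3.9"}},
    {"type": "TicketSold", "ticket": 2, "metadata": {"version": "1.4.0"}},
    {"type": "TicketSold", "ticket": 3, "metadata": {"version": "1.4.0"}},
    {"type": "TicketSold", "ticket": 4, "metadata": {"version": "1.4.1"}},
]
to_compensate = events_from_version(sample, "1.4.0")
```

The resulting list is the input for whatever compensation you decide on, such as refunds, corrections or just a report.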

It’s quite common to introduce canary releases, feature toggles and A/B tests for users. With automated deployment and small code enhancements, all of these approaches are feasible to have on a project board. If you consider toggles, or different implementations coexisting at the very same moment, storing only the version may not be enough. How about adding information about which features were applied for the action? Just create a simple set of enabled features, or a feature-to-status map, and add it to the event as well. Having this and the command, it’s easy to repeat the process. Additionally, it’s easy to get the results of your A/B experiments. Just run one scan for events with A enabled and another for the B ones.
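The A/B scan mentioned above can be as simple as the sketch below. It assumes that each event carries a feature-to-status map under `metadata["features"]`; that layout and the feature names are hypothetical.

```python
def with_feature(events, feature):
    # Keep events produced while the given feature was enabled.
    return [
        event for event in events
        if event.get("metadata", {}).get("features", {}).get(feature) is True
    ]


# Hypothetical sample of events from an A/B experiment on the checkout.
experiment = [
    {"type": "OrderPlaced", "total": 10, "metadata": {"features": {"checkout_a": True, "checkout_b": False}}},
    {"type": "OrderPlaced", "total": 25, "metadata": {"features": {"checkout_a": False, "checkout_b": True}}},
    {"type": "OrderPlaced", "total": 40, "metadata": {"features": {"checkout_a": False, "checkout_b": True}}},
]
total_a = sum(event["total"] for event in with_feature(experiment, "checkout_a"))
total_b = sum(event["total"] for event in with_feature(experiment, "checkout_b"))
```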

Optimization (when needed)

If you think that this is too much, create a lookup for the sets of versions x features. It’s not that big and it repeats across many users, hence you can easily optimize by storing the set elsewhere, under a reference key. You can serialize this map and calculate its SHA1, put the values in a map (a table will do as well) and use the identifiers in the events. There are plenty of options to shift the load either to the query (lookups) or to the storage (store everything as named metadata).
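A minimal sketch of such a lookup, assuming the configuration is a version string plus a feature-to-status map. The function names and the in-memory dictionary standing in for a table are hypothetical.

```python
import hashlib
import json


def config_key(version, features):
    # Canonical serialization (sorted keys) so equal configs hash equally.
    payload = json.dumps(
        {"version": version, "features": features},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def intern_config(lookup, version, features):
    # Store the full configuration once; events keep only the returned key.
    key = config_key(version, features)
    lookup.setdefault(key, {"version": version, "features": dict(features)})
    return key
```

An event's metadata then holds just the 40-character key, and a reader resolves it through the lookup when the full configuration is needed.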

Summing up

If you create an event sourced architecture, consider adding the temporal dimension (version) and a bit of configuration to the metadata. Once you have it, it’s much easier to reason about the sources of your events and to introduce tooling like compensation. There’s no such thing as too much data, is there?

Lokad.CQRS Retrospective

In a recent post, Rinat Abdullin provides a retrospective of the Lokad.CQRS framework, which was/is a starting point for many CQRS journeys. It’s worth mentioning that Rinat is the author of this library. The whole article may sound a bit harsh, but it provides a great retrospective from both the author’s and the user’s point of view.

I agree with the majority of the points in this post. The library provided abstractions that allowed changing the storage engine, but the directions taken were very limiting. The tooling for messages, the ddd console, was the thing at the beginning, but after spending a few days with it, I didn’t end up using it anyway. The library encouraged using one-way messaging all the way down, to separate every piece. Today, when CQRS mailing lists are filled with messages like ‘you don’t have to use queues all the time’ and CQRS people are much more aware of the ability to handle requests synchronously, it’d be easier to give some directions.

The author finishes with

So, Lokad.CQRS was a big mistake of mine. I’m really sorry if you were affected by it in a bad way.

Hopefully, this recollection of my mistakes either provided you with some insights or simply entertained.

which I totally disagree with! Lokad.CQRS was the tool that shaped the thinking of many people when nothing like it was available on the market. Personally, it helped me to build an event-driven project (you can see the presentation about this here) based somewhat on Lokad.CQRS, but with other abstractions and targeted at very good performance, not to mention living documentation built with Mono.Cecil.

Summary

Lokad.CQRS was a groundbreaking library that provided a bit too much tooling and abstracted too many things. I’m really glad if it helped you to learn about CQRS as it helped me. Without it, I wouldn’t have asked all the questions and wouldn’t have learned so much.

The provided retrospective is invaluable and brings a lot of insights. I wish you all get to make that kind of groundbreaking mistake someday.

Event Driven Architecture – feed your head

It’s been a few days since the last Warsaw .NET User Group meeting. The main presentation was given by me & Tomasz Frydrychewicz. The title was: “Event Driven Architecture in practice”. Given the high number of answers to the poll and the very positive overall response, I may call it one of my best presentations ever. Anyway, I have been asked many questions over these days; the main one is what/who one should read/watch to immerse oneself in this event-based approach. The list below tries to answer that, grouped by author:

  1. Martin Fowler
    1. http://martinfowler.com/eaaDev/EventSourcing.html – the top 1 Google search result. Martin provides a good intro, mixing a bit the concepts of storing commands and storing events. Anyway, this is a must-read if you are starting with this topic
    2. http://martinfowler.com/eaaDev/RetroactiveEvent.html – the article which one should become familiar with after spending some time with event modelling. Some domains are less prone to special cases in handling this kind of event, others may be very fragile and one should start with this
  2. Lokad, CQRS, Rinat Abdullin
    1. http://lokad.github.io/lokad-cqrs/ – a must-read if you want to choose the event way. Plenty of materials and tooling. To me some parts are a bit frameworkish, but still, it’s one of the best implementations I’ve seen. Understanding this might be your game changer.
      Additionally, it provides an Azure storage implementation.
  3. Rinat Abdullin & Kerry Street
    1. Being the worst – how to become a master? Immerse yourself in a new field as the worst. That’s how winning is done! An amazing journey through learning about DDD, Event Sourcing and many paradigms.
  4. Microsoft Patterns and Practices:
    1. CQRS Journey – a free book about a group of developers using an event-driven approach, with DDD in mind, to build a new system. I love the personas they use to drive dialogues between different opinions/minds/approaches. It’s not a guide. I’d rather consider it a diary of all the different cases you can meet when implementing solutions using these approaches.
  5. Event Store
    1. The whole Event Store database is an actual event store for storing events from event sourced systems. I encourage you to spend a week or more on reading its code. It’s a good codebase.
    2. Event sourcing documentation is a short introduction to the ES world. After all these years, it still uses the Word generated pictures:) but this doesn’t diminish its value.
  6. NEventStore
    1. NEventStore is an open source library for storing and querying your events. It’s opinionated, for instance it stores all the events as one commit object. I’ve read it carefully, although I still don’t like its approach. One should read it though, it’s always worth knowing what’s already provided.

It’s a fairly long list, but nobody said that you can learn a new paradigm over one weekend. So read, learn and apply it successfully:)

From domain to code

Currently I’m helping to model one domain with Event Sourcing. A few days ago we spent ~15 minutes on modelling some cases on the whiteboard. The result of the first phase was distilling a few aggregates with events. Later on, we described some processes, as we needed to react to some events. At first too much behavior was put in a process manager, only to be finally moved to a separate aggregate, a process itself. The process manager was left as a simple router. Because of the strong foundations (a library) providing us with at-least-once semantics, idempotent receivers and handling of process managers, the whole discussion was on a much higher level. No imperative programming, just declarative pieces of the domain itself!
A few hours later the implementation was finished. The most important thing was that there was no mapping! The code was a simple representation of the model being discussed and agreed on. No ORM magic, no enterprise onion with tons of layers and mappings. It was just a simple model of this piece of the domain. And it’s been working like a charm.

Aggregate, an idempotent receiver

In the previous post I covered the process manager subscribing to and consuming events from multiple sources. Additionally, it was shown that saving the position of the read logs after performing an action is sufficient to get at-least-once delivery (retry in case of errors).
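The at-least-once behavior follows directly from the order of the two steps: act first, save the position second. A minimal sketch, with a list as the log and a dictionary as the checkpoint (both hypothetical stand-ins for real storage):

```python
def run_once(log, checkpoint, handle):
    # Resume from the last saved position (0 if nothing was saved yet).
    position = checkpoint.get("position", 0)
    for index in range(position, len(log)):
        handle(log[index])                    # 1. perform the action
        checkpoint["position"] = index + 1    # 2. only then save the position
```

If `handle` raises, or the process dies between the two steps, the position is not advanced, so the next run delivers the same event again. That is why the receiver has to be idempotent.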

Let me consider an aggregate on which an action is invoked. As the only transactional boundary that can be used is the aggregate itself, to each call from the process manager we’ll add additional data:

  1. hash (unique, SHA1 probably) of the process manager identifier and the name of the origin module where the handled event was taken from
  2. the order number of the handled event

These two values, combined in an event, allow checking in one transaction whether the action has already been applied, and skipping it if needed. Everything in one transaction.
As order numbers for the given hash can only increase, the state of this idempotent receiver can be modeled as a dictionary with the Sha1 value as its key and the order number as its value.
The only disadvantage is the additional event added to the aggregate for each action performed within a process manager. Fortunately, a scavenging process, similar to the one in EventStore, can reduce this cost: when events are dumped to a file from a store of your choice, only the last value for the given Sha1 hash needs to be stored.
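The idempotent receiver described above can be sketched as follows. The `Account` aggregate, the event shape and the way the hash is built from the process manager identifier and the origin module are my assumptions for illustration only.

```python
import hashlib


def source_hash(process_manager_id, origin_module):
    # SHA1 over the process manager identifier and the origin module name.
    raw = f"{process_manager_id}|{origin_module}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


class Account:  # hypothetical aggregate acting as an idempotent receiver
    def __init__(self):
        self.balance = 0
        self.last_handled = {}  # Sha1 hash -> highest order number applied
        self.events = []

    def deposit(self, amount, source, order_number):
        # Order numbers per hash only increase, so one comparison is enough.
        if self.last_handled.get(source, -1) >= order_number:
            return False  # already applied, skip the redelivered call
        self._apply({"type": "Deposited", "amount": amount,
                     "source": source, "order": order_number})
        return True

    def _apply(self, event):
        # The business change and the marker live in the same event,
        # so they are stored within one transaction.
        self.balance += event["amount"]
        self.last_handled[event["source"]] = event["order"]
        self.events.append(event)
```

A redelivered call with the same hash and order number is simply skipped, so the at-least-once delivery of the process manager does not change the aggregate twice.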