Top Domain Model: behaviors, processes and reactions

TL;DR

After pivoting two nights straight (here and here), it's time to settle down and ask: is it possible to stick to aggregates alone when modelling your domain? Can the model be made any better with services, or is yet another tool needed?

Actions and state

Aggregates define the boundaries of transactions. That's their primary purpose. They not only narrow down our thinking (in a positive sense), but also give us the ability to scale out our model. If every aggregate sits in its own bucket, nothing is easier than saving it as a document, or as a stream of events within a single partition. Their boundaries will never be crossed. That last sentence is, of course, wishful thinking. Who would like to transfer money without it ever being received on the other side?

Actions and reactions

Think in processes. One account is debited, the other is credited. The second does not have to happen in the same transaction. Moreover, the second is, by its nature, a reaction to the first account being debited. In the natural world reactions often happen at the same time. Modelling them this way (as transactions spanning all the aggregates involved) might be more realistic, but it is certainly not more useful. The more elements in a transaction, the higher the probability of failure… and so on and so forth.
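To make the reaction concrete, here is a minimal sketch of such a process in C#. All the names (IBus, the event, the command) are illustrative assumptions, not any specific framework's API:

using System;

// Illustrative types only: a hedged sketch, not a specific framework's API.
public record AccountDebited(Guid SourceAccountId, Guid TargetAccountId, decimal Amount);
public record CreditAccount(Guid AccountId, decimal Amount);

public interface IBus { void Send(object command); }

// The money-transfer process: the debit commits in its own transaction;
// crediting the second account is a *reaction*, dispatched as a separate
// command and handled in a separate transaction.
public class MoneyTransferProcess
{
    private readonly IBus bus;
    public MoneyTransferProcess(IBus bus) => this.bus = bus;

    public void When(AccountDebited e) =>
        bus.Send(new CreditAccount(e.TargetAccountId, e.Amount));
}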

Aggregates + processes

The above is a description of a perfect couple. Aggregates and processes are two tools that you can use to model any domain. Actually, if you dared to say that you don't need processes, I'd say that you might be wasting your time modelling a really boring and simple domain with DDD tooling. From the tooling perspective, whether you want to event-source everything all the way or use a good old-fashioned ORM with a message bus is up to you. Just remember that the true power of your model comes from combining aggregates and processes.

Summary

Modelling aggregates that just work (TM) on their own is impossible. Any even moderately complex domain contains a lot of reactions and processes that need to be captured in your model to make it usable. Next time you think about an aggregate reacting to something, ask yourself: what kind of process is this, how should I name it, and how should I model it?

Top Domain Model: I’ve been pivoting all night long. Again

TL;DR

In the last entry we noticed that we can model away some problems by selecting a different aggregate for an event to belong to. In this episode we will revisit this idea by looking at good old Twitter and its timelines. How are timelines created? Is everyone treated equally? Is it only aggregates' identities that should be pivoted, or maybe our thinking as well?

Fans in, fans out

The model behind Twitter timelines is quite simple. It's also different for people with 10 followers than for those with 10 million. The difference is the number of timelines that need to be updated when Madonna writes a tweet, or when @JustAnotherAccountLikingMadonna2019 retweets one of their favorite performer's thoughts. Notifying 10 million people is not that easy and takes a bit longer. Writing to 10 timelines is easy and fast.
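To make the difference tangible, here is a minimal sketch of the fan-out decision, assuming a follower-count threshold. The names and the threshold value are illustrative, not Twitter's actual design:

using System;
using System.Collections.Generic;

public class TimelineFanOut
{
    // An assumed cut-off between "regular" and "Madonna-scale" accounts.
    private const int CelebrityThreshold = 10_000;

    public void OnTweetPublished(Guid authorId, Guid tweetId, IReadOnlyList<Guid> followers)
    {
        if (followers.Count < CelebrityThreshold)
        {
            // Few followers: push the tweet into every timeline right away.
            foreach (var followerId in followers)
                AppendToTimeline(followerId, tweetId);
        }
        else
        {
            // Notifying 10 million timelines on write is too slow; instead,
            // followers' timelines pull celebrity tweets at read time.
            MarkForPullOnRead(authorId, tweetId);
        }
    }

    private void AppendToTimeline(Guid followerId, Guid tweetId) { /* write-side update */ }
    private void MarkForPullOnRead(Guid authorId, Guid tweetId) { /* read-side marker */ }
}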

Golden customer

Similar reasoning might be applied to a golden-customer profile, or to any kind of dimension that distinguishes an aggregate in your model. Sometimes a behavior will be slightly altered; sometimes it will be totally rewired. The most important idea is not to stick to the initial model. Allow yourself to ask important questions to distill and clarify it: to discover these golden customers that require different handling, to discover the Madonnas that require cargo shipping to deliver tons of their tweets, and so on and so forth.
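As a minimal sketch, assuming a hypothetical Tier dimension on a customer aggregate (the names and the 10% discount rule are purely illustrative):

public enum CustomerTier { Regular, Golden }
public record OrderPlaced(decimal Amount);

public class Customer
{
    public CustomerTier Tier { get; private set; }

    public void PlaceOrder(decimal amount)
    {
        // Sometimes the behavior is only slightly altered by the dimension...
        var discount = Tier == CustomerTier.Golden ? 0.10m : 0m;
        Apply(new OrderPlaced(amount * (1 - discount)));
        // ...and sometimes it is totally rewired: a golden customer might
        // instead kick off a dedicated process, or live in its own aggregate type.
    }

    private void Apply(OrderPlaced e) { /* record and apply the event */ }
}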

A new type or not

Sometimes this reasoning will bring you new types of aggregates; sometimes it will be a process keyed on the dimension that distinguishes a Madonna-like aggregate instance. As always, every model will be wrong, but some of them will be more useful than others.

Summary

Don't stick to the initial aggregate type. Ask more modelling questions; they can bring you new types of aggregates, or processes that depend on a specific dimension value of an already existing aggregate. Pivot not only identities, but also your thinking.

Top Domain Model: I’ve been pivoting all night long

TL;DR

This is the next entry in the Top Domain Model series of loosely coupled posts describing different modelling approaches. In the last one we covered the temporal dimension; this time we'll consider what an aggregate is and where an event belongs.

Endorsements

We all love receiving endorsements on LinkedIn, don't we? How would you model such a feature? How would you draw the boundary of the aggregate? Would one aggregate be sufficient? Let's first reason about a single aggregate per user.

Endorsements profile per user

You could easily imagine providing an Endorsements aggregate per user. It could be visualized as a REST subresource of a user, like

/user/1231423/endorsements

This could be implemented in a simple way and would properly handle situations where endorsing somebody is not that frequent; in other words, where the probability of two users endorsing a third at the same time is quite low. But what if it were a famous CTO who just created their profile? What if the number of endorsements climbed above 100/s? This model would not hold. What would, then?
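To see why the model breaks, here's a minimal sketch with an illustrative event-store API (not any real library): every endorsement targets the same stream, so concurrent writers keep colliding:

using System;
using System.Collections.Generic;

public record SkillEndorsed(Guid EndorserId, string Skill);

public interface IEventStore
{
    (IReadOnlyList<object> Events, int Version) Load(string streamId);
    // Assumed to throw a concurrency exception when the stream has already
    // moved past expectedVersion.
    void Append(string streamId, int expectedVersion, object @event);
}

public class EndorsementsPerUser
{
    private readonly IEventStore store;
    public EndorsementsPerUser(IEventStore store) => this.store = store;

    public void Endorse(Guid endorsedUserId, Guid endorserId, string skill)
    {
        // Every writer targets the SAME stream of the endorsed user, so at
        // 100 endorsements/s most appends lose the optimistic-concurrency
        // race and have to be reloaded and retried.
        var streamId = $"user-{endorsedUserId}-endorsements";
        var (_, version) = store.Load(streamId);
        store.Append(streamId, version, new SkillEndorsed(endorserId, skill));
    }
}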

Pivot to the rescue

When modelling, deciding what is the subject and what is the action is not that straightforward. You may say that:

A user is endorsed by another

which means that the user's endorsements are modified by another user. But you could also rephrase this as

A user endorses another

which could be modelled as storing the tags you give to other users in YOUR aggregate. This model, even with a lot of people endorsing, is easily scalable: everyone applies endorsements for other users within their own aggregate. You won't endorse more than 10 skills a second, will you?

This approach requires creating a separate, reactive view that applies all the events to the read side, so that the famous CTO still gets their skills displayed. But this view, since it can be eventually consistent, is much easier to build than overcoming a high-throughput write scenario.
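A minimal sketch of the pivoted model, with illustrative names: the endorser's aggregate owns the write, and a single-threaded projection folds the events into per-user read views:

using System;
using System.Collections.Generic;

public record UserEndorsed(Guid EndorserId, Guid EndorsedUserId, string Skill);

// One aggregate per endorser: writers never compete for the same stream.
public class EndorsementsGiven
{
    public Guid EndorserId { get; }
    private readonly List<UserEndorsed> pendingEvents = new();

    public EndorsementsGiven(Guid endorserId) => EndorserId = endorserId;

    public void Endorse(Guid endorsedUserId, string skill) =>
        pendingEvents.Add(new UserEndorsed(EndorserId, endorsedUserId, skill));
}

// The eventually consistent read side: folds events from ALL endorsers into
// per-user views, so the famous CTO's profile still shows every skill.
public class EndorsementsReceivedProjection
{
    private readonly Dictionary<Guid, List<string>> views = new();

    public void When(UserEndorsed e)
    {
        if (!views.TryGetValue(e.EndorsedUserId, out var skills))
            views[e.EndorsedUserId] = skills = new List<string>();
        skills.Add(e.Skill);
    }

    public IReadOnlyList<string> SkillsOf(Guid userId) =>
        views.TryGetValue(userId, out var skills) ? skills : Array.Empty<string>();
}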

Summary

When dealing with high volumes of operations, ask if the model can be pivoted. Maybe selecting another owner of the business operation, or even modelling a new type of aggregate, can help you deal with the requirements you need to satisfy.

 

Join me for an open Event Sourcing training: Warszawa, 28-09-2017

I'd like to invite you to my open training on Event Sourcing in .NET!

I have been working with events for a long time, and the time has come to share this knowledge with you in the second edition of my training. Registration is handled by the Evenea system on this page, where you will also find all the details.

The training is split into modules. In each of them we do one to several exercises. They cover domain modelling and testing your models, as well as topics such as building views and reacting to events. We will express all of this in code, so that you can easily apply your freshly acquired skills in your own projects.

Training page

See you there!

Top Domain Model: I’m temporal

TL;DR

This is the first of the entries written under the wings of Top Domain Model. Nope, it's not a new TV show for domain-driven modellers. It's a series about domain-driven modelling itself. Today, we'll take a look at models and their temporal dimension.

Reservation

Imagine that you model a system for a major hospital. Unfortunately, registration opens only once a year, and multiple people want to register at the same time. It's even worse: they want to register to get their procedures scheduled in a specific room that has all the needed tools. The first idea could be to model the room as an aggregate. You can imagine hundreds of people trying to update the same aggregate. It won't work.

Temporal dimension

To address this you could introduce a temporal dimension to the room. Instead of treating it as one room, you could model it as a RoomDay that represents the schedule for a specific day. This would:

  • spread the contention over multiple aggregates (possibly with a more or less natural distribution)
  • make the aggregate time-bound (once the day passes, the whole aggregate can be archived or deleted)

You could chop it up even further if needed; nevertheless, adding a temporal dimension to an aggregate can easily help you simplify challenges related to scale and to the long lifespan of aggregates.
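A minimal sketch of such a time-bound aggregate (the names are illustrative): the date becomes part of the identity, so every day gets its own small aggregate:

using System;
using System.Collections.Generic;

public class RoomDay
{
    private readonly HashSet<TimeSpan> bookedSlots = new();

    public string Id { get; }

    public RoomDay(string roomId, DateTime day) =>
        Id = $"room-{roomId}-{day:yyyy-MM-dd}"; // e.g. "room-12-2017-09-28"

    public void Reserve(TimeSpan slot)
    {
        // Contention is limited to one room on one day instead of piling up
        // on a single Room aggregate; once the day passes, the whole RoomDay
        // can be archived or deleted.
        if (!bookedSlots.Add(slot))
            throw new InvalidOperationException($"Slot {slot} is already taken.");
    }
}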

Summary

When modelling an aggregate that is going to be highly contended, ask yourself: is it naturally temporal? Possibly it can be replaced by multiple aggregates, each assigned to one year/month/day.

 

Top Domain Model

It looks like a new series is going to be published on this blog!

I received some feedback related to the recent series on making your event sourcing functional. Posts in this series will be loosely coupled and more or less self-contained. The topic connecting them is modelling your domain, its aggregates and processes, towards models that are useful, not necessarily correct. We'll go through different domains and different requirements, and we'll try to focus not on the resulting model (it's just an exercise) but rather on the modelling approaches and practices.

Put on your modeller’s gloves and safety glasses. It’s about time to crunch a few domains!

 

Event stores and event sourcing: some not so practical disadvantages and problems

TL;DR

This post is a kind of answer to the article mentioned in a tweet by Greg Young. The author's blog has no comment section. Also, this post contains a lot of information, which is why I'm publishing it here instead of sending it as an email or a DM.

Commits

Typically, an event store models commits rather than the underlying event data.

I don't know what a typical event store is. I do know, though, that:

  1. EventStore, the standalone event database built by Greg Young's company, fully separates these two, providing granularity at the event level
  2. StreamStone, which provides event-sourcing support on top of Azure Table Storage, works at the event level as well
  3. Marten, a PostgreSQL-based document & event database, also works at the level of a single event

For my statistical sample, the quoted statement does not hold true.

Scaling with snapshots

One problem with event sourcing is handling entities with long and complex lifespans.

and later

Event store implementations typically address this by creating snapshots that summarize state up to a particular point in time.

and later

The question here is when and how should snapshots be created? This is not straightforward as it typically requires an asynchronous process to creates snapshots in advance of any expected query load. In the real world this can be difficult to predict.

First and foremost, if you have aggregates with long and complex lifespans, that's on you: you chose a model with aggregates like that. Remember that there are no right or wrong models, only useful and crappy ones.

Second, let me provide an algorithm for snapshotting: if you retrieved 1000 events to build up an aggregate, snapshot it (serialize it + put it into an in-memory cache + possibly store it in a database). Easy and simple; I see no need for fancy algorithms.
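A minimal sketch of that rule; the store, cache and aggregate APIs are illustrative assumptions, only the 1000-events heuristic comes from the paragraph above:

using System;
using System.Collections.Generic;

public record Snapshot(int Version, byte[] State);

public interface IEventStore { IReadOnlyList<object> ReadStream(string id, int fromVersion); }
public interface ISnapshotCache { Snapshot TryGet(string id); void Put(string id, Snapshot snapshot); }

public class Aggregate
{
    public int Version { get; private set; } = -1;

    public static Aggregate Rehydrate(Snapshot snapshot, IReadOnlyList<object> events)
    {
        var aggregate = new Aggregate { Version = snapshot?.Version ?? -1 };
        foreach (var e in events) aggregate.Version++; // apply each event (state elided)
        return aggregate;
    }

    public Snapshot ToSnapshot() => new(Version, Array.Empty<byte>()); // serialization elided
}

public class SnapshottingRepository
{
    private const int SnapshotThreshold = 1000;
    private readonly IEventStore store;
    private readonly ISnapshotCache cache;

    public SnapshottingRepository(IEventStore store, ISnapshotCache cache) =>
        (this.store, this.cache) = (store, cache);

    public Aggregate Load(string streamId)
    {
        var snapshot = cache.TryGet(streamId);           // start from the snapshot...
        var from = snapshot is null ? 0 : snapshot.Version + 1;
        var events = store.ReadStream(streamId, from);   // ...and replay only the tail

        var aggregate = Aggregate.Rehydrate(snapshot, events);

        // The whole algorithm: replayed 1000+ events? Snapshot now.
        if (events.Count >= SnapshotThreshold)
            cache.Put(streamId, aggregate.ToSnapshot()); // cache in memory, possibly store in a db

        return aggregate;
    }
}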

Visibility of data

In a generic event store payloads tend to be stored as agnostic payloads in JSON or some other agnostic format. This can obscure data and make it difficult to diagnose data-related issues.

If you, as an architect or a developer, know your domain and know that you need a strong schema, because you want to use it as a published interface, but you still persist data in JSON instead of some schema-aware serialization like protobuf (a binary, schema-aware serialization format from Google), it's not the event store's fault. Additionally,

  1. EventStore
  2. StreamStone

both handle binary payloads just fine (yes, you can't write JS projections over binary payloads in EventStore, but you can still subscribe).

Handling schema change

If you want to preserve the immutability of events, you will be forced to maintain processing logic that can handle every version of the event schema. Over time this can give rise to some extremely complicated programming logic.

It has been shown that instead of cluttering your model with different versions (which, still, is sometimes easier), one can provide a mapping that is applied to the event stream before the events are returned to the model. This way you handle the versioning in one place and move forward with schema changes (again, provided it's not your published interface). This is not always possible, but this pattern can be used to reduce the clutter.
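A minimal sketch of such a mapping (often called upcasting); the event shapes here are illustrative:

using System.Collections.Generic;
using System.Linq;

public record UserRegisteredV1(string FullName);                 // old schema
public record UserRegistered(string FirstName, string LastName); // current schema

public static class EventUpcaster
{
    // Applied to the stream BEFORE events reach the model, so the aggregate
    // only ever handles the current schema; V1 lives in this one place.
    public static IEnumerable<object> Upcast(IEnumerable<object> stream) =>
        stream.Select(e => e switch
        {
            UserRegisteredV1 v1 => Map(v1),
            _ => e
        });

    private static object Map(UserRegisteredV1 v1)
    {
        var parts = v1.FullName.Split(' ', 2);
        return new UserRegistered(parts[0], parts.Length > 1 ? parts[1] : "");
    }
}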

Dealing with complex, real world domains

Given how quickly processing complexity can escalate once you are handling millions of streams, it’s easy to wonder whether any domain is really suitable for an event store.

EventStore and StreamStone are designed to handle these millions.

The problem of explanation fatigue

Event stores are an abstract idea that some people really struggle with. They come with a high level of “explanation tax” that has to be paid every time somebody new joins a project.

You could say the same about messaging and delivery guarantees, fast serializers like protobuf, or dependency injection. Is there a project where a newbie joins and just knows what to do and how to do it? Nope.

Summary

It's your decision whether to use event sourcing or not; it's not a silver bullet. Nothing is. I wanted to clarify some of the misunderstandings that I found in the article. Hopefully this will help my readers choose their tooling (and opinions) wisely.