Kramer vs Kramer

The reality of project management is simple. A project manager has a project he/she’s responsible for. The project can span multiple teams, areas, resources, whatever you want to call them (just pick a name that sounds managerial enough). The responsibility of the manager is to deliver the project, in the majority of cases within the given budget (man-hours). The mythical man-month rule can always be applied to deliver it faster by simply throwing in more resources…

On the other hand, one may take a look at teams and products. There’s one thing about a team and a product – they are bound together, and it’s quite interesting to see a cross-functional team delivering the whole product experience and simply being happy about the product it created. It’s a natural thing to be happy about your own creation. We all know it: build a house, etc…

And here come the Kramers! Pulling the team from one side to the other, arguing about its behavior only from his/her project’s point of view, trying to match the high-level goals and estimates, and discussing who paid for the ice cream – I mean, who is responsible for providing time to learn a new technology or to fix this or that.

Kramers may think that they are just fulfilling some orders, or that they’ve been given some kind of power over others, but the question everyone should answer is: how is Billy doing?

Enriching your events with important metadata

When applying event sourcing it’s quite common to allow a common part shared by all events: the metadata. Various stores handle it in their own, yet similar, ways. EventStore lets you append metadata along with events; you can do the same with NEventStore using headers. But what information is useful to store in the metadata – which pieces are worth keeping even though they were not captured when the model was created?

Audit data

The most common case considered by various lists and blog posts is audit data. This set of data can be described as follows (a minimal sketch comes after the list):

  1. who? – simply store the user id of the action invoker
  2. when? – the timestamp of the action and the event(s)
  3. why? – the serialized intent/action of the actor
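
As a minimal sketch (the field names and helper are mine, not a prescribed schema), such audit metadata could be attached to every appended event like this:

  import datetime
  import json

  def audit_metadata(user_id, command):
      """Build the audit part of the event metadata: who, when and why."""
      return {
          "who": user_id,                                        # id of the action invoker
          "when": datetime.datetime.utcnow().isoformat() + "Z",  # timestamp of the action
          "why": json.dumps(command),                            # serialized intent/command of the actor
      }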

That’s an obvious choice, and you can easily find examples of repartitioning by the who or gathering events within a time frame or window, as it’s done in complex event processing. But is there something more one could store? Is there any particular set of additional dimensions worth remembering?

Important metadata

Event sourcing deals with the effects of actions. An action executed on a state results in events, according to the current implementation. Wait. The current implementation? Yes, the implementation of your aggregate can change, and it will, either because of bug fixing or because of new features. Wouldn’t it be nice if the version, like a commit id (SHA1 for gitters) or a semantic version, could be stored with the event as well? Imagine that you published a broken version and your business sold 100 tickets before the bug got fixed. It’d be nice to be able to tell which events were created on the basis of the broken implementation. Having this knowledge, you can easily compensate the transactions performed by the broken implementation.
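
A sketch of this idea, assuming events are available as (data, metadata) pairs and that the metadata carries a "version" entry holding the commit id of the deployed implementation (the commit id and event type below are illustrative):

  BROKEN_VERSION = "9f2c1ab"   # commit id of the faulty deployment

  def events_to_compensate(events):
      """Yield ticket sales produced by the broken implementation, e.g. to issue refunds."""
      for data, metadata in events:
          if metadata.get("version") == BROKEN_VERSION and data.get("type") == "ticket-sold":
              yield data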

It’s quite common to introduce canary releases, feature toggling and A/B tests for users. With automated deployments and small code enhancements, all of the mentioned approaches are feasible to have on a project board. If you consider toggles or different implementations coexisting at the very same moment, storing the version alone may not be enough. How about adding information about which features were applied to the action? Just create a simple set of enabled features, or a feature–status map, and add it to the event as well. Having this and the command, it’s easy to repeat the process. Additionally, it’s easy to read the results of your A/B experiments: just run a scan for events with A enabled and another one for the B ones.
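
Continuing the sketch above, the enabled features can travel in the metadata as a simple map and be used to split events between the two experiment groups (the feature name is illustrative):

  def split_ab(events, feature="new-checkout"):
      """Partition events by whether the given feature toggle was on when they were produced."""
      a_group, b_group = [], []
      for data, metadata in events:
          enabled = metadata.get("features", {})
          (a_group if enabled.get(feature) else b_group).append(data)
      return a_group, b_group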

Optimization (when needed)

If you think that this is too much, create a lookup for the sets of versions × features. Such a set is not that big and it repeats across many users, hence you can easily optimize by storing it elsewhere, under a reference key. You can serialize the map and calculate its SHA1, put the values in a map (a table will do as well) and use the identifiers in the events. There are plenty of options to shift the load either to the query side (lookups) or to the storage side (store everything as named metadata).
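
A sketch of that optimization, under the same assumptions as above: the version × features description is serialized deterministically, hashed, stored once in a lookup, and only the short key travels with each event.

  import hashlib
  import json

  lookup = {}   # key -> full version/features description; a table will do as well

  def configuration_key(version, features):
      """Store the configuration once and return a short reference key for the event metadata."""
      serialized = json.dumps({"version": version, "features": features}, sort_keys=True)
      key = hashlib.sha1(serialized.encode("utf-8")).hexdigest()
      lookup.setdefault(key, serialized)
      return key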

Summing up

If you create an event-sourced architecture, consider adding the temporal dimension (version) and a bit of configuration to the metadata. Once you have it, it’s much easier to reason about the sources of your events and to introduce tooling like compensation. There’s no such thing as too much data, is there?

Feature oriented design got wrong

The fourth link in my Google search for ‘feature toggle’ is a link to this Building Real Software post. It’s not about feature toggles as described by Martin Fowler. It’s about feature toggles gone wrong.

If you consider toggling features with flags and apply it literally, what you get is a lot of branching. That’s all. Some tests have to be written twice to handle both the positive and the negative scenario for the branch. The reason for this is a design not prepared to handle toggling properly. In the majority of cases, it’s a design which is not feature-based in the first place.

Feature-based design is built from closed components, each handling a given domain aspect. Some of them may be big, like ‘basket’; some may be much smaller, like ‘notifications’ reacting to various changes and displaying the needed information. The important thing is to design the features as closed components. Once you have it done this way, it’s easier to think about the page without notifications or ads. Again, disabling a feature is not a mere flag thrown into different pieces of code. It’s disabling or replacing the whole feature.

One of my favorite architectural styles, event-driven architecture, helps greatly in building this kind of toggle. It’s quite easy to simply… not handle the event at all. Consider the notifications: if they are disabled, they simply do not react to events like ‘order-processed’, etc. Not creating cycles of dependencies is a separate story, but still, if you consider the reactive nature of the connections between features, it’s a great enabler for introducing toggling with all the advantages one can derive from it, with A/B tests and canary releases in mind.
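
A sketch of such a toggle (the bus and handler names are illustrative): disabling the notifications feature simply means its handler is never subscribed, so the ‘order-processed’ event flows past it without any branching in the core code.

  class EventBus:
      def __init__(self):
          self.handlers = {}

      def subscribe(self, event_type, handler):
          self.handlers.setdefault(event_type, []).append(handler)

      def publish(self, event_type, payload):
          for handler in self.handlers.get(event_type, []):
              handler(payload)

  def notifications_feature(bus, enabled):
      """The whole feature is wired in (or not) as one closed component."""
      if not enabled:
          return   # toggled off: nothing subscribed, no flags scattered elsewhere
      bus.subscribe("order-processed", lambda order: print("notify:", order["id"]))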

I’m not a fanboy of feature toggling; I consider it an important tool in an architect’s arsenal though.


The cost of scan queries in Azure Table Storage

There are multiple articles describing the performance of Azure Table Storage. You have probably read Troy Hunt’s entry, Working with 154 million records on Azure Table Storage…. You may have invested your time in reading How to get most out of Windows Azure Tables as well. My question is: have you really considered the limitations of queries, specifically scan queries, and how they can consume a major part of the Azure performance targets?

The PartitionKey and RowKey create the primary and the only index in ATS (Azure Table Storage). Depending on the query, the following kinds can be distinguished (example filter expressions follow the list):

  1. Point Queries, which retrieve a single entity by specifying a single PartitionKey and RowKey using equality as the predicate
  2. Row Range Queries, which get a set of entities sharing the same PartitionKey and a range of RowKeys
  3. Partition Range Queries, which are run with a range of PartitionKeys
  4. Full table scans, which have no predicate on the PartitionKey
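
For reference, this is roughly how the four kinds translate into OData $filter expressions against the table’s REST endpoint (the table, key and property values below are illustrative):

  1. Point Query:            PartitionKey eq 'orders-2014-05' and RowKey eq '00042'
  2. Row Range Query:        PartitionKey eq 'orders-2014-05' and RowKey ge '00001' and RowKey lt '00100'
  3. Partition Range Query:  PartitionKey ge 'orders-2014-01' and PartitionKey lt 'orders-2014-06'
  4. Full table scan:        Amount gt 10   (no PartitionKey predicate at all)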

What are the costs and limitations of these queries? Unfortunately, every row accessed while performing a scan is counted as a table operation; there ain’t no such thing as a free lunch. This means that if you scan your entire table (the 4th scenario), you’ll be able to process no more than 20,000 entities per second. This limits the usefulness of scans over large data sets. If you have to model queries across different keys, then you may consider storing the same value twice: once under the natural PartitionKey/RowKey pair and a second time to match the other index, creating an inverted index. If in any case you’d still have to scan through the entire data set, then ATS is not the way to go, and you should consider other ways of modelling your data, like asynchronously copying the data to blobs, etc.
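
A sketch of the ‘store it twice’ idea (the entity shape and key choices are mine, not from any SDK): the same order is written once under its natural key and once under an inverted one, so that both lookups remain point or range queries instead of scans.

  def order_entities(order):
      """Produce two copies of the same order: the natural index and the inverted one."""
      by_customer = {
          "PartitionKey": "customer-" + order["customer_id"],
          "RowKey": order["order_id"],
          "ProductId": order["product_id"],
      }
      by_product = {                                   # the inverted index
          "PartitionKey": "product-" + order["product_id"],
          "RowKey": order["order_id"],
          "CustomerId": order["customer_id"],
      }
      return by_customer, by_product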

Lokad.CQRS Retrospective

In a recent post Rinat Abdullin provides a retrospective of the Lokad.CQRS framework, which was/is a starting point for many CQRS journeys. It’s worth mentioning that Rinat is the author of this library. The whole article may sound a bit harsh, yet it provides a great retrospection from both the author’s and a user’s point of view.

I agree with the majority of the points in this post. The library provided abstractions allowing you to change the storage engine, but the directions taken were very limiting. The tooling for messages, the ddd console, was the thing at the beginning, but after spending a few days with it, I didn’t use it anyway. The library encouraged one-way messaging all the way down, to separate every piece. Today, when CQRS mailing lists are filled with messages like ‘you don’t have to use queues all the time’ and CQRS people are much more aware of the ability to handle requests synchronously, it would be easier to give such directions.

The author finishes with

So, Lokad.CQRS was a big mistake of mine. I’m really sorry if you were affected by it in a bad way.

Hopefully, this recollection of my mistakes either provided you with some insights or simply entertained.

which I totally disagree with! Lokad.CQRS was a tool that shaped the thinking of many people, at a time when nothing like it was available on the market. Personally, it helped me to build an event-driven project (you can see the presentation about it here), based somewhat on Lokad.CQRS but with different abstractions and targeted at very good performance, not to mention the living documentation built with Mono.Cecil.


Lokad.CQRS was a ground-breaking library providing a bit too much tooling and abstracting away too many things. I’m really glad if it helped you to learn about CQRS as it helped me. Without it, I wouldn’t have asked all those questions and wouldn’t have learned so much.

The provided retrospective is invaluable and brings a lot of insight. I wish you all to make that kind of ground-breaking mistake someday.

GitFlow and Continuous Build

In a recent post I described an idea for ensuring that your feature-to-develop GitFlow merge commits are reviewed before being introduced into the develop branch. This preserves the quality of the develop branch, ensuring that it is truly deployable at any time. How would one like to build his/her repository and provide artifacts? Which commits and which branches should be built? These questions are answered below.

Let’s start with the following observation: whichever branch points at a given commit, if a proper modern build approach is used, like a PSake build script, the result is the same. The repository contains all the scripts needed to run the build, so the output will be the same no matter which branch is selected as the source of the build (if two or more point at the same commit). After all, the same commit is the same tree, which results in the same build. This gives us a very powerful tool for ensuring even better quality of develop. One can easily set up TeamCity with a branch selector to run the same build for all features:
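
The branch specification of the TeamCity VCS root could look more or less like this (the exact pattern is my assumption, standing in for the original screenshot):

  +:refs/heads/develop
  +:refs/heads/feature/*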


The build script creates artifacts, in my case NuGet packages, using the following versioning: [major].[minor].[build_number]. The first two numbers are stored in the repository. This requires that features are not long-running (you don’t want a feature started at 1.1.1 to be finished only after 2.1.2). The build number is the same across all the features. For now, I’m not considering whether the artifacts should be published to some gallery or not.

The question is: should we build the develop commits? For now, considering only the feature branches, the answer is no! All the commits in develop are merge commits which have already been reviewed and built in the feature branches! That’s why creating the merge commit in the feature branch is so powerful. You get the code reviewed, built and tested before it goes to develop! Again, the idea of postponing the finishing of a feature until its value is acknowledged brings profit to the quality of the develop branch.


GitFlow and code reviews

Many companies that use Git as their code repository use git-flow as the branching model for their projects. The question that comes up is where and when the code review should take place. Should it be before:

git flow feature finish MYFEATURE

If yes, then the reviewer looks through some version of the code, but the person responsible for closing the feature still creates a new merge commit afterwards, which can change a lot. On the other hand, if the reviewer creates the merge commit, he/she may not know all the aspects needed for a successful merge. There are a few pages trying to answer this, but I haven’t found anything satisfying. Read further for my proposal of a clean and proper code review process with git-flow.

The problem with git-flow is the fact that finishing a feature is an atomic action. Around this one action plenty of things happen:

  1. Author: pushes the commit to the remote
  2. Reviewer: reviews the code
  3. A: creates a merge commit (new commit created!)
  4. A: moves the develop cursor to the new commit
  5. A: deletes the feature branch
  6. A: pushes to the remote

As always, splitting one action into multiple steps and then grouping them properly might help. Consider the following flow involving the author and the reviewer (a command-line sketch follows the list).

  1. Author: creates a merge commit
  2. A: moves the feature/a to the merge commit
  3. A: pushes the merge commit to the remote
  4. Reviewer: reviews the merge commit
  5. R: if the review is successful, fast-forwards develop to the merge commit (some automation can be introduced).
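
A minimal command-line sketch of this flow, assuming a feature branch named feature/a and that develop has moved forward since the branch was created:

  # Author, on the feature branch:
  git checkout feature/a
  git merge develop                # creates the merge commit on feature/a and moves it there
  git push origin feature/a        # publishes the merge commit for review

  # Reviewer, once the review is successful:
  git checkout develop
  git merge --ff-only feature/a    # fast-forwards develop to the reviewed merge commit
  git push origin develop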

According to this flow, the reviewer always reviews the exact commit which will land in develop. Additionally, this makes the author of the feature branch aware of merge problems before he/she pushes the work out for review. Effectively, a feature can be completed and merged and simply wait for acceptance, which then is a simple “go/no go” without any merge conflicts to consider later on.

This isn’t git-flow anymore, but it is still a solid flow that lowers the context switching of the author. Once he/she finishes, it’s a real end, not an entry point for an incoming merge.