Optimisation of queries against event-sourced systems

I hope you’re familiar with the event sourcing pattern. The switch from update-this-row to a more behavioural way of designing systems is surely one of the most influential trends. But how can a system that stores only business deltas (events) be queried? Can it be queried at all?

Project your views

To meet this need you can use projections. A projection is nothing more than a function applied to all events, or to a selected subset of them (filtered by stream id, event category, or any other dimension you can derive from the processed event). A similar solution can be found in Lokad CQRS, which is described in this post. So yes, there is a way: apply all the needed events to a view (a projection), which can then be queried with the given parameters. Is there a way to optimise the query responses?
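A minimal sketch of the idea, assuming events are plain records with a stream id, a category, a type and a payload (the `Event` shape and the `project` helper are hypothetical, not taken from Lokad CQRS or any other library): a projection is a fold of the selected events into a view state.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape used only for this sketch.
    stream_id: str   # e.g. "user-1"
    category: str    # e.g. "user"
    event_type: str  # e.g. "UserRenamed"
    data: dict


def project(
    events: Iterable[Event],
    initial: S,
    apply: Callable[[S, Event], S],
    select: Callable[[Event], bool] = lambda e: True,
) -> S:
    """Fold the selected events, in order, into a view state."""
    state = initial
    for event in events:
        if select(event):  # filter by stream id, category or any other dimension
            state = apply(state, event)
    return state


events = [
    Event("user-1", "user", "UserRegistered", {"name": "ann"}),
    Event("order-7", "order", "OrderPlaced", {"total": 10}),
    Event("user-1", "user", "UserRenamed", {"name": "anna"}),
]


def count_per_stream(state: dict, e: Event) -> dict:
    # Returns a new dict so the fold stays free of side effects.
    return {**state, e.stream_id: state.get(e.stream_id, 0) + 1}


counts = project(events, {}, count_per_stream, select=lambda e: e.category == "user")
print(counts)  # {'user-1': 2}
```

The `select` predicate is where the "all, or selected events" choice lives; the resulting state is what gets stored and queried.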

Fast response

Sure there is! Consider a projection that replays all events in which users change their personal data, but applies only those that modify the user name. Renaming is probably a very infrequent operation. The projection stores an id->name map, which various services use to display a user-friendly name. What can be done to improve the performance of a service that stores this kind of mapping?

Consider storing two event sequence numbers:

  1. the position up to which the events have been scanned
  2. the position of the last user-related event that actually changed the mapping
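The two sequence numbers above can be kept next to the map itself. This is a minimal sketch, assuming each event arrives with a monotonically increasing sequence number; the class name and the event types `UserRegistered` and `UserRenamed` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserNameProjection:
    """id -> name view tracking two positions in the event log.

    scanned_position: sequence number of the last event read from the log
    changed_position: sequence number of the last event that changed the map
    """
    names: dict = field(default_factory=dict)
    scanned_position: int = 0
    changed_position: int = 0

    def apply(self, seq: int, event_type: str, data: dict) -> None:
        # Every event advances the scan position, relevant or not.
        self.scanned_position = seq
        # Hypothetical event types; only these can touch the id -> name map.
        if event_type in ("UserRegistered", "UserRenamed"):
            user_id, name = data["user_id"], data["name"]
            if self.names.get(user_id) != name:
                self.names[user_id] = name
                self.changed_position = seq


view = UserNameProjection()
log = [
    (1, "UserRegistered", {"user_id": "u1", "name": "ann"}),
    (2, "UserEmailChanged", {"user_id": "u1", "email": "ann@example.com"}),
    (3, "UserAddressChanged", {"user_id": "u1", "address": "Main St 1"}),
]
for seq, event_type, data in log:
    view.apply(seq, event_type, data)
print(view.scanned_position, view.changed_position)  # 3 1
```

Only the scan position moves while users change their e-mail or address; the change position stays put until a name really changes.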

The second one can easily be used as the ETag for all responses. If renames are infrequent, requests can be answered with 304 Not Modified for long periods of time. This ETag-based optimisation can always be applied: the sparser the changes to a projection's state, the better the chance of reusing the response cached by the client.
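A rough sketch of such a conditional GET, assuming the handler receives the map, the position of the last changing event and the raw `If-None-Match` header value (the function name and the `"<position>"` ETag format are assumptions, not taken from any framework):

```python
import json
from typing import Optional


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so W/"1" matches "1".
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def get_user_names(names: dict, changed_position: int, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET of the id -> name map."""
    # The ETag changes only when an event actually changed the map.
    etag = f'"{changed_position}"'
    if if_none_match is not None:
        # Simplified parsing: assumes the tags contain no commas.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, {"ETag": etag}, b""
    body = json.dumps(names, sort_keys=True).encode("utf-8")
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body


status, headers, _ = get_user_names({"u1": "ann"}, 1)
print(status, headers["ETag"])  # 200 "1"
status, _, body = get_user_names({"u1": "ann"}, 1, '"1"')
print(status, body)  # 304 b''
```

Scanning thousands of unrelated events leaves `changed_position`, and therefore the ETag, untouched, so clients keep getting empty 304 responses.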

Web API caching gone wrong

I read and watch a lot of material published on InfoQ. In most cases I enjoy it and find it sound. Recently I read an article about Web APIs and the Select N+1 problem, and it lacks the most basic information one should provide when writing about the web and HTTP performance.
The post discusses structuring your Web API so that it provides links or identifiers of other resources a client must query to get the full information. It’s easy to imagine that returning a collection of identifiers, for example the ids of the books belonging to a given category, will bring more requests to your server. A client iterating over the books will hit your app one request at a time, with effects ranging from a load test to a full-blown DoS. The article answers this problem with the following points:

  • Denormalize and build read models
  • Parallelising calls
  • Using Async patterns
  • Optimising threading model and network throttles

What is missing is the basic HTTP mechanism provided by the specification: cache headers and ETags. There’s no mention of properly tagging your responses so that the server can return 304 when the client asks for data that didn’t change. HTTP caching and expiration are not mentioned either. Recently Greg Young posted a great article about leveraging HTTP caching. The quote that best sums up his take on it is:

This is often a hard lesson to learn for developers. More often than not you should not try to scale your own software but instead prefer to scale commoditized things. Building performant and scalable things is hard, the smaller the surface area the better. Which is a more complex problem a basic reverse proxy or your business domain?

Before getting into fancy caching systems, understand your responses: cache forever what doesn’t change, and ETag with a version whatever may change. Then, if you still have a performance issue, turn to more complex solutions.
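As a rough illustration of that rule, here is a sketch of a helper that picks response headers, assuming the caller knows whether a representation can ever change (the helper and its parameters are hypothetical; `max-age`, `immutable` and `no-cache` are standard `Cache-Control` directives):

```python
from typing import Optional

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def cache_headers(immutable: bool, version: Optional[int] = None) -> dict:
    """Pick response cache headers for a resource.

    immutable: the representation never changes (for example a single stored
               event, since events are never modified), so cache it "forever".
    version:   for resources that may change, a version used as the ETag.
    """
    if immutable:
        return {"Cache-Control": f"public, max-age={ONE_YEAR}, immutable"}
    if version is None:
        raise ValueError("a mutable resource needs a version to build an ETag")
    # no-cache: caches may store the response but must revalidate it
    # (e.g. with If-None-Match) before reusing it.
    return {"Cache-Control": "no-cache", "ETag": f'"{version}"'}


print(cache_headers(True))
print(cache_headers(False, 7))
```

The first case is exactly the "commoditized" scaling Greg refers to: a reverse proxy or the browser answers those requests without touching your app.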

UPDATE:
For the sake of reference, the author of the InfoQ post responded to my tweet here.

A poor cookie

The HTTP cookie mechanism is leaky. Better get used to it. You can read the RFCs about it, but you will learn more from one meaningful question posted on Security Stack Exchange. If your site is hosted on a subdomain alongside other apps, and a malicious user controls any of those other apps, a cookie can be set for the top domain. This means the cookie will be sent with every request to the top domain as well as to yours (the "domain-match" rule in the RFCs). This can cause a lot of trouble when an attacker sets a cookie whose name matters to your app, such as the session cookie. According to the specification, both values will be sent under the same name, with no information about which domain each value was set for.
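To see why the server cannot tell the two values apart, here is a minimal sketch of reading a `Cookie` request header (the parser is a simplified illustration, not a replacement for a real one): the header carries only `name=value` pairs, because browsers never send the `Domain` or `Path` attributes back.

```python
from typing import List, Tuple


def parse_cookie_header(header: str) -> List[Tuple[str, str]]:
    """Split a Cookie request header into (name, value) pairs, keeping duplicates.

    Simplified: no unquoting or validation of names and values.
    """
    pairs = []
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            pairs.append((name, value))
    return pairs


def values_for(header: str, name: str) -> List[str]:
    return [v for n, v in parse_cookie_header(header) if n == name]


# Hypothetical header: one "session" cookie injected by another subdomain for
# the top domain, one set by your own app. Their relative order should not be
# relied upon.
header = "session=attacker-value; session=your-value"
print(values_for(header, "session"))      # ['attacker-value', 'your-value']
print(dict(parse_cookie_header(header)))  # {'session': 'your-value'}
```

A dict-style lookup silently keeps whichever value happens to come last, and nothing in the header tells you which one is genuine.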

HTML5 to the rescue

If you are designing a new Single Page Application, you can be saved. Imagine that the POST sending the login data (user name and password) returns, in its JSON result, the value that would previously have been stored in a cookie. The client can easily save it in localStorage and later add it to the headers of requests that need authentication. This simple change brings another advantage: requests that don’t need authentication, such as GETs (nobody sends sensitive data with a verb that is vulnerable to JSON Hijacking), can be sent to the same domain without the extra header overhead. The standard way to stop sending cookies with GETs is to serve all your static files from another domain. With this approach that is no longer needed.
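A server-side sketch of that flow, under loudly stated assumptions: the header name `X-Auth-Token`, the in-memory token store and the `verify` callback are all hypothetical, and header names are assumed to be already normalised. The login handler returns the token in JSON instead of setting a cookie, and only requests carrying the header are treated as authenticated.

```python
import json
import secrets
from typing import Callable, Dict, Optional

AUTH_HEADER = "X-Auth-Token"  # hypothetical header name

# In-memory token store for the sketch; a real app would persist and expire tokens.
_sessions: Dict[str, str] = {}


def login(body: bytes, verify: Callable[[str, str], bool]) -> bytes:
    """POST /login: check credentials and return the token in the JSON result."""
    credentials = json.loads(body)
    user, password = credentials["user"], credentials["password"]
    if not verify(user, password):
        raise PermissionError("invalid credentials")
    token = secrets.token_urlsafe(32)
    _sessions[token] = user
    return json.dumps({"token": token}).encode("utf-8")


def authenticate(headers: Dict[str, str]) -> Optional[str]:
    """Return the user for a request carrying the token header, else None."""
    token = headers.get(AUTH_HEADER)
    return _sessions.get(token) if token else None


def check(user: str, password: str) -> bool:
    return (user, password) == ("ann", "correct horse")  # demo credentials only


token = json.loads(login(b'{"user": "ann", "password": "correct horse"}', check))["token"]
print(authenticate({AUTH_HEADER: token}))  # ann
print(authenticate({}))                    # None, e.g. a plain GET for public data
```

On the client, the SPA would keep the token with `localStorage.setItem` and attach it only to requests that need authentication; because the browser never adds this header on its own, a value planted by another subdomain is simply never sent.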