Build Tour 2016 in Warsaw

Probably every .NET developer has watched at least a few presentations from the Build conference. Happily for Polish .NET developers, Warsaw was selected as one of the cities where Build speakers came to share their exciting news about Microsoft, Windows and .NET.

The conference took place in Expo XXI. The very first thing I noticed were containers with the Build logo on them. I must admit that this was a bold way of showing where the conference was taking place, and much better than just a few stands. The venue and the stage were amazing. The stage was a copy of the one from Build, so you could feel the atmosphere. Every organizational aspect was well thought through.

The key topics were Universal Windows Platform apps, Web Apps and IoT. Even though I don’t work with these on a daily basis, the level of the talks, as well as the way they were presented, was balanced enough to share some meaningful information. The top three for me were:

  1. Packaging of Universal Windows Platform apps: running an installer in a recording container so that all of its operations can be applied to the system later on.
  2. ManifoldJS for building hosted web apps.
  3. Enhanced Reality provided with the Unity engine.

It was quite sad that people started leaving quite early. Fortunately, a solid group (including me :P) stayed till the very end, when the networking took place. I’m looking forward to seeing Build Tour again in 2017. Of course, in Warsaw.

Views’ warm up for Event Sourcing

When using Event Sourcing as a foundation for your solution, the command part is a solved problem. Just take an aggregate version and a command, apply the command to the state and try to append the created events to the store, checking the version again. There is a read part as well, called views, which is nothing more than an aggregation of a subset of events from the system. A view works like a live query, which consumes events from a log and applies them to the projection over and over. Considering that the number of events is constantly growing, how would you deploy a new version of an application containing a new view which needs to be built from the beginning, from the very first event? Even with a well-performing database, applying a few million events can take a while.
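The command side described above can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryEventStore`, `TryAppend`, `Deposited`, `Withdrawn` and `AccountCommands` are hypothetical names made up for this example, not the API of any event store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store holding a single stream, for illustration only.
public sealed class InMemoryEventStore
{
    private readonly List<object> _events = new List<object>();
    private readonly object _lock = new object();

    public int Version { get { lock (_lock) return _events.Count; } }

    public IReadOnlyList<object> Read()
    {
        lock (_lock) return _events.ToArray();
    }

    // Appends only if nobody has written since expectedVersion was read.
    public bool TryAppend(int expectedVersion, IEnumerable<object> newEvents)
    {
        lock (_lock)
        {
            if (_events.Count != expectedVersion) return false; // concurrent write detected
            _events.AddRange(newEvents);
            return true;
        }
    }
}

public sealed class Deposited { public decimal Amount; }
public sealed class Withdrawn { public decimal Amount; }

public static class AccountCommands
{
    // Restores the state from events, executes the command,
    // then appends the result while checking the version again.
    public static bool Withdraw(InMemoryEventStore store, decimal amount)
    {
        IReadOnlyList<object> history = store.Read();
        int version = history.Count;
        decimal balance = 0;
        foreach (object e in history)
        {
            if (e is Deposited d) balance += d.Amount;
            else if (e is Withdrawn w) balance -= w.Amount;
        }
        if (amount > balance) throw new InvalidOperationException("Insufficient funds");
        return store.TryAppend(version, new object[] { new Withdrawn { Amount = amount } });
    }
}
```

When `TryAppend` returns false, the caller would typically reload the aggregate and retry the command.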

Warm up routine

Let’s consider the following routine. Instead of referring to views by their bare names, append the system version to each name. Take a view ‘users’ as an example:

  1. for version 1.0.0 it’s “users-1_0_0”
  2. for version 1.2.0 it’s “users-1_2_0”

Before publishing the new version and moving all users to it, predeploy the application and run its view builder. The views will be rebuilt in the background, taking as much time as needed. Once the builder catches up with the log and there are no more events to fetch, the views are prebuilt and the new version can be deployed.
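A minimal sketch of the naming scheme and of a builder that reports when it has caught up. `ViewNames`, `ViewBuilder` and the batch-reading delegate are assumptions introduced for this example; a real builder would read from your event store and write to your view storage.

```csharp
using System;
using System.Collections.Generic;

public static class ViewNames
{
    // "users" and 1.2.0 give "users-1_2_0", matching the scheme above.
    public static string For(string view, Version appVersion)
    {
        return view + "-" + appVersion.Major + "_" + appVersion.Minor + "_" + appVersion.Build;
    }
}

// Hypothetical builder: applies events to a view until the log has nothing new.
public sealed class ViewBuilder
{
    // (fromPosition, maxCount) -> next events from the log
    private readonly Func<long, int, IReadOnlyList<string>> _readBatch;
    private readonly Action<string> _apply;

    public long Position { get; private set; }

    public ViewBuilder(Func<long, int, IReadOnlyList<string>> readBatch, Action<string> apply)
    {
        _readBatch = readBatch;
        _apply = apply;
    }

    // Returns true when the batch was not full, i.e. the builder reached the end of the log.
    public bool PumpOnce(int batchSize)
    {
        IReadOnlyList<string> batch = _readBatch(Position, batchSize);
        foreach (string e in batch) _apply(e);
        Position += batch.Count;
        return batch.Count < batchSize;
    }
}
```

Once `PumpOnce` starts returning true, the versioned view is warm and the switch to the new version can be made; the builder keeps pumping afterwards to stay up to date.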

Cost and optimization

Of course, these view rebuilds can be tedious and long. They can increase the costs of your app as well as put additional pressure on the event store. If a cost/performance optimization is needed, you can consider detecting view changes, something very similar to what Rinat did a few years back. You may come up with something more explicit as well. For any mechanism, the rule would be that if a view is the same, your app uses the last existing version of the view.
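One possible way to make the “same view, reuse the last version” rule explicit is to fingerprint the view definition. This is a sketch only: `ViewRegistry` is a hypothetical helper, and how a view definition is turned into a string (projection code, schema, or a hand-maintained tag) is an assumption left to you.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public sealed class ViewRegistry
{
    // fingerprint of a view definition -> name of the view already built for it
    private readonly Dictionary<string, string> _built = new Dictionary<string, string>();

    public static string Fingerprint(string viewDefinition)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(viewDefinition));
            return BitConverter.ToString(hash);
        }
    }

    // Returns the existing view name when the definition is unchanged;
    // otherwise registers candidateName, which then has to be built.
    public string Resolve(string viewDefinition, string candidateName, out bool needsRebuild)
    {
        string key = Fingerprint(viewDefinition);
        string existing;
        if (_built.TryGetValue(key, out existing))
        {
            needsRebuild = false;
            return existing;
        }
        _built[key] = candidateName;
        needsRebuild = true;
        return candidateName;
    }
}
```

With this in place, deploying version 1.2.0 with an unchanged ‘users’ view would keep reading “users-1_0_0” and skip the rebuild.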

Providing this warm up routine, especially when using blue-green deployments, can not only improve the performance of your application start (the new version scenario) but also provide an environment for testing the deployment before switching to the new version.

Events on the Outside versus Events on the Inside

Recently I’ve been revisiting some of my Domain Driven Design, CQRS & Event Sourcing knowledge and techniques. I’ve supported the creation of systems with these approaches, hence I could revisit the experiences I had as well. If you are not familiar with these topics, a good starting point could be my Feed Your Head list.

Inside

So you model your domain with aggregates in mind, distilling contexts and domains. The separation between services may be clear or a bit blurry, but it looks ok and, more importantly, maps the business well. Inside a single context bubble, you can use your aggregates’ events to create views and use the views when in need of data for a command execution. It doesn’t matter which database you use for storing events. It’s simple. Restore the state of an aggregate, gather some data from views, execute a command. If any events are emitted, just store them. A background worker will pick them up to dispatch to a Process Manager.

Outside

What about exposing your events to other modules? If and how can another module react to an event? Should it be able to build its own view from the data held in the event? All of these could be summed up in one question: do external events match the internal ones of a specific module? My answer would be: it’s not easy to tell.

In some systems, these may be good. By the system I mean not only a product, but also a team. Sometimes having a feed of events can be liberating and enable faster growth by speeding up the initial shaping. You could agree to actually separate services from the very start and verify during design whether the logical complexity is still low, i.e., whether there are not that many events shared between services and what they contain.

This approach brings some problems as well. All the events become your API. They are public, so now they have to be taken into consideration when versioning your schemas. Probably some migration guide will be needed as well. The bigger the public API, the bigger the friction of maintaining it for its consumers.

Having said this, you could consider having a smaller and totally separate set of events you want to share with external systems. This draws a visible line between the Inside & the Outside of your service, enabling you to evolve rapidly on the Inside. Maintaining a stable API is much easier then, and the system itself has a clear separation. This addresses the question about views as well: where should they be stored originally? The answer would be to store properly versioned, immutable views Inside the service, using identifiers to pass a reference to another service. When needed, the consumer can copy & transform the data locally. A separate set of events also makes it possible not to use Event Sourcing where it isn’t needed. Options like that (you may, but don’t have to use it) are always good.
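The line between the Inside and the Outside could look roughly like this. All the types here (`OrderLineAdded`, `OrderSubmitted`, `OrderPlacedV1`, `PublicEvents`) are hypothetical and serve only to show a small public contract that carries references instead of data.

```csharp
using System;

// Inside: fine-grained events, free to change together with the domain model.
public sealed class OrderLineAdded { public Guid OrderId; public string Sku; public int Quantity; }
public sealed class OrderSubmitted { public Guid OrderId; public int ViewVersion; }

// Outside: a small, explicitly versioned contract.
// It points at an immutable, versioned view instead of copying the order data.
public sealed class OrderPlacedV1 { public Guid OrderId; public int OrderViewVersion; }

public static class PublicEvents
{
    // Returns the public event for an internal one, or null when the event stays Inside.
    public static object Translate(object internalEvent)
    {
        var submitted = internalEvent as OrderSubmitted;
        if (submitted == null) return null;
        return new OrderPlacedV1
        {
            OrderId = submitted.OrderId,
            OrderViewVersion = submitted.ViewVersion
        };
    }
}
```

Only the translated events need schema versioning and migration guides; `OrderLineAdded` can be renamed or split without any consumer noticing.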

For some time I was an advocate of sharing events between services easily, but now I’d say: make a proper choice for your scenario. Consider the pros and cons, especially in terms of the schema maintenance tax & the option of not sticking to Event Sourcing.

Inspirations

The process of revisiting my assumptions was started by a few materials. One of them is a presentation by Andreas Ohlund, ‘Putting your events on a diet’, sharing a story about deconstructing an online shop into services. The second is some bits from A Decade of DDD, CQRS, Event Sourcing by Greg Young. Last but not least, there is Pat Helland’s Data on the Outside versus Data on the Inside.

Single producer single consumer optimizations

The producer-consumer relationship is one of the most fundamental cooperation patterns. Some components produce values or issue requests, and some consume or handle them. Depending on the number of components at each end of this relationship, it’s called a ‘single/multi producer single/multi consumer’ relationship. It’s important to make this choice explicit because, as with every explicit choice, it enables some optimizations. I’d like to share some thoughts on the optimizations for the single producer single consumer scenario, provided in the RampUp library by OneToOneRingBuffer.

The behavior of ring buffers in RampUp is ported from Java’s Agrona. They provide a queue that enables reading sequentially on the consumer side. The reasoning behind it is that sequential reads are CPU friendly, so the consumer can process messages much quicker. For ManyToOneRingBuffer the production part is quite complex. It proceeds as follows:

  1. check against the consumer position: is there enough space?
  2. allocate a slot in the ring (this is done with Interlocked operations, in a loop, may take a while)
  3. write a header in an ordered way (using volatile)
  4. put data
  5. write the header again marking the message as published

This brings a lot of unneeded work for a single producer. When considering a single producer, there’s nothing to compete with. The only check that needs to be made is that the producer does not overlap with the consumer. So the algorithm looks as follows:

  1. check against the consumer position: is there enough space?
  2. put data
  3. write the header, marking the message as published
  4. write the tail value for future writes
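To make the single-producer steps concrete, here is a heavily simplified sketch. It is not the RampUp or Agrona implementation: `SpscRing` is a hypothetical class that stores fixed-size `long` values, so there is no per-message header and publishing is done by moving the tail alone.

```csharp
using System.Threading;

// Simplified single-producer single-consumer ring of longs, for illustration only.
public sealed class SpscRing
{
    private readonly long[] _slots;
    private readonly int _mask;
    private long _head; // next position to read, written only by the consumer
    private long _tail; // next position to write, written only by the producer

    // Assumption: the capacity is a power of two, so masking replaces the modulo.
    public SpscRing(int capacityPowerOfTwo)
    {
        _slots = new long[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1;
    }

    public bool TryWrite(long value)
    {
        long tail = _tail; // plain read: only the producer thread writes this field
        // check against the consumer position: is there enough space?
        if (tail - Volatile.Read(ref _head) == _slots.Length) return false;
        // put data
        _slots[tail & _mask] = value;
        // publish and store the tail for future writes; no Interlocked loop is needed
        Volatile.Write(ref _tail, tail + 1);
        return true;
    }

    public bool TryRead(out long value)
    {
        long head = _head; // plain read: only the consumer thread writes this field
        if (head == Volatile.Read(ref _tail)) { value = 0; return false; }
        value = _slots[head & _mask];
        Volatile.Write(ref _head, head + 1);
        return true;
    }
}
```

The producer performs one volatile read and one volatile write per message, with no compare-and-swap loop, which is exactly where a single producer saves work compared to many producers.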

Removing Interlocked and lowering the number of Volatile operations can improve the producer performance greatly (less synchronization).

If you want to compare these two on your own, here you are: ManyToOne and OneToOne.

Happy producing (and consuming).

Data has no format

I need to be able to store 1GB of JSON

I’d like to push XML 100 MB/s to this Azure blob

I need to log this data as CSV

Statements like these are sometimes true, but in the majority of cases the format is not given and is a part of designing your architecture/application. Or redesigning it, if needed. Selecting a proper format can lower the size of your data, increasing the throughput of your system if a medium like a disk or a network is saturated. That’s why systems like Apache Arrow or Google’s Dremel use their own formats. That’s why you may consider using the protobuf-net serialization for EventStore, disabling its built-in V8 projections and lowering the size of events at the same time. For low latency systems you can choose the new library Simple Binary Encoding. That’s why sometimes storing data in another format is simply better. I’ve written a blog post, Do we really need all these data transformations, and it doesn’t say the opposite. It’s all about making a rational and proper choice of the storage format, taking into consideration its different aspects and its influence on your system. With this one decision you might improve your system performance greatly.
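A tiny, self-contained illustration of how much the format alone can matter. `FormatSizes` is a hypothetical helper; the “binary format” here is just fixed-width integers written with `BinaryWriter`, not protobuf or Simple Binary Encoding.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

public static class FormatSizes
{
    // UTF-8 size of the readings written as a JSON array, e.g. [1000000,1000001]
    public static int JsonSize(int[] readings)
    {
        string json = "[" + string.Join(",",
            readings.Select(r => r.ToString(CultureInfo.InvariantCulture))) + "]";
        return Encoding.UTF8.GetByteCount(json);
    }

    // Size of the same readings written as 4-byte integers.
    public static int BinarySize(int[] readings)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (int r in readings) writer.Write(r);
            writer.Flush();
            return (int)stream.Length;
        }
    }
}
```

For 1000 seven-digit readings (`Enumerable.Range(1000000, 1000).ToArray()`), the JSON array takes 8001 bytes while the fixed-width form takes 4000 bytes, so the same data needs about half the disk or network bandwidth.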

Shared Resources in TeamCity

It’s a common requirement that a set of your tests depends on some resources. It might be a database or an Azure Storage account. It’s possible that instead of providing TeamCity with an administrator account (giving it access to an Azure subscription) you’d prefer to have a limited, preexisting set of resources, like databases or Azure Storage accounts, that are leased for the duration of a build by a particular agent. As soon as the build is finished, the resource goes back to the pool to be leased for another build.

Fortunately, TeamCity has a built-in feature for this purpose called Shared Resources. It can be defined at any project level and used as a parameter of any build configuration below it. The Shared Resources feature provides you with all the capabilities mentioned before, removing all the burden of managing a resource pool. In the same way a build leases an agent, an agent leases a shared resource. Nice, simple, easy.

A pointer to a generic method argument

Let’s consider the following method signature taken from a RampUp interface.


bool Write<TMessage>(ref Envelope envelope, 
    ref TMessage message, IRingBuffer bufferToWrite) 
    where TMessage : struct;

It’s a fairly simple signature that lets you pass a struct of any size using just a reference to it, without copying it. Now let’s consider the need to obtain a pointer to this message. Taking a pointer could be needed for various reasons. One could be accessing fields by offset, another could be using memcpy to copy the value to any given address. Is it possible to get this pointer in C# code?

No pointers for generic parameters

Unfortunately, you can’t do it in C#. If you try to obtain a pointer to a generic parameter, you’ll get a compiler error. If you can’t do it in C#, is there any other .NET language one could use to get it? Yes, there is. It’s the foundation of .NET programs, MSIL itself, and using MSIL means emitting code dynamically.

Ref looks like a pointer

What is a reference to a struct? It looks like a pointer to me. What if we could load it and just assume that it is a pointer? Would the CLR accept this program? It turns out that it would. I won’t cover the whole implementation, which can be found here, but I want to highlight some points.

  • The CLR uses the argument with index 0 to pass this. If you want to load a field you need to use the following sequence of operations:
    • Ldarg_0; // load this on the stack
    • Ldfld, “Field1” // pops this and pushes the value of the field named “Field1” on the stack
  • For the Write method, getting a pointer to a message is nothing more than emitting a single opcode: Ldarg_2. As the struct is passed by reference, it can be treated as a pointer by the CLR, and it will be.
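The “ref is a pointer” trick can be tried in isolation with a few lines of emitted IL. This is a sketch, not the RampUp implementation: it uses a non-generic static `DynamicMethod` over `ref long` to stay short, and `RefAsPointer` and `AddressOf` are names made up for this example. Because the method is static there is no this, so the ref parameter is argument 0 rather than 2.

```csharp
using System;
using System.Reflection.Emit;

public static class RefAsPointer
{
    public delegate IntPtr AddressOf(ref long value);

    // Emits: ldarg.0, conv.u, ret. The managed reference is reinterpreted as a native pointer.
    public static AddressOf Build()
    {
        var method = new DynamicMethod(
            "AddressOf",
            typeof(IntPtr),
            new[] { typeof(long).MakeByRefType() },
            typeof(RefAsPointer).Module,
            true);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // static method: argument 0 is the ref parameter
        il.Emit(OpCodes.Conv_U);  // treat the managed pointer as a native unsigned int
        il.Emit(OpCodes.Ret);
        return (AddressOf)method.CreateDelegate(typeof(AddressOf));
    }
}
```

The returned address stays valid only as long as the value cannot move, for example when it is a local on the stack or lives in pinned or unmanaged memory; a reference into an unpinned heap object can be relocated by the GC.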

I encourage you to download the RampUp codebase and play a little bit with the emitted implementation of the IMessageWriter. Maybe you’ll never need to take a pointer to a generic method parameter (I did), but it’s a good starting point for learning a little about emitting code.