Goal: MVP




This will be a short entry. After looking at my activities in 2016, I decided to nominate myself for the MVP award in the Visual Studio and Development Technologies category. I know it's quite early to summarize a whole year, but considering the number of talks, contributions to Open Source projects, meetings of the Warsaw .NET User Group and posts on this blog, I can tell that it's the right time to do it. You can use the following link to nominate a person: https://mvp.microsoft.com/en-us/Nomination/NominateAnMvp and you can use it to nominate me once again. My email is scooletz@gmail.com.

Sometimes we program in pairs, sometimes we nominate in pairs. That's why, beside myself, I also nominated Konrad Kokosa. You know him well from his talks and his blog on performance and memory. You can read more about his candidacy here.

Motivation is nothing

Grit is everything

You've surely seen it before: a person gets motivated to do something, whether because they attended a conference or saw a video on how to do 100 push-ups in under a minute. The thing is that the source of this motivation is external and is just an impulse. It may encourage one to get involved, but it won't help to maintain the habit. I'd say that you GET motivated sometimes, but you AREN'T motivated all the time. It's so important to make the next step and see that to really make something happen you need perseverance, you need persistence, you need a plan. Exercising EVERY work day is totally different from exercising A LOT. The first is a plan, the other is a try-to-follow-a-plan.

Not all-in-1
Remember one thing: if you want to change your life, you can't do it all at once. Rather, you should pick a thing and stick to it. Focus on one thing, make it a habit and move on. With this approach, you can build up a big mental muscle that will help you with your next aims. To give you an example, I stopped eating sweets and sugar over a month ago (something that had seemed impossible to me).

Mastering one thing at a time isn’t my idea and it has been proven in various cases.

Grit is becoming a very popular topic. Don't just get motivated to learn about it some day 😉

It’s time for your first PR


If you've never issued a PR, now is the best time to do it.


If you think that issuing a PR is not for you, please read the following whys and then issue your PR 🙂


The weakest argument goes first. How cool is it to be able to say "I'm a contributor to XYZ"? Open Source Software has a strong position in developers' minds, so if you'd like to impress someone for any reason, this can be the way.

Fear of being ridiculed

When you issue a PR, it gets reviewed. You may hesitate to send a PR for fear of being ridiculed (in Linus style). The thing is that laughing at your contributors doesn't pay off. From a maintainer's perspective, if people drop off, you have more work to do, as no one else is willing to handle registered issues or make meaningful changes, and you're left to do it on your own. Maintaining a good relationship with contributors is what works best.

Learn from the best

Sometimes you'd like to ask a famous developer about a possible improvement to their product or just verify some technical thought. How about getting their full attention and learning from their expertise? A PR is a great way to get both at the same time. Just write your idea in code instead of a vague human language and share it as a proposal. You'll get it reviewed for free, and your code will surely be the beginning of a meaningful discussion.


If none of the above work for you, think about your CV and the fact that asking for OSS contribution references is getting more popular these days. Your changes to publicly accessible projects are the best way of showing your potential.

If you've never issued a PR, it's the best time to do it. Just book an evening or two and make a positive impact on the OSS environment.

Wire improvements


This post sums up my recent involvement in the Wire serializer being built by the Akka.NET team. It was an interesting OSS experience. I'm glad that I could improve the performance of this serializer.

Box me not

The initial design of constructing a serializer for a complex object was based on emitting a pair of delegates (serialize, deserialize). Let's consider deserialization:

  1. Create object
  2. For each field:
    1. Use the following method of a ValueSerializer:

      public abstract object ReadValue(
         Stream stream, DeserializerSession session)
    2. Cast to the field type, or unbox if the field is a value type.
    3. Load the deserialized object and set the field.

The problem with this approach manifested itself for classes with many primitive properties/fields: the cost of boxing/unboxing might be quite high. On the other hand, one can't call a method returning a common type without boxing the value types. How would you approach this? My answer was to move emitting points 1 & 2 to the value serializer and provide a generic implementation of this emitting in the base class. That allowed me to customize the emitting for primitive value types like int, long or Guid, still preserving the possibly slightly less performant generic approach.
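To picture the boxing issue, here is a minimal sketch (illustrative code, not Wire's actual implementation) contrasting an object-returning read with a typed one:

using System;
using System.IO;

static class BoxingSketch
{
    // the generic path: the common return type is object,
    // so every value type read gets boxed on return
    public static object ReadValueBoxed(Stream stream, byte[] buffer)
    {
        stream.Read(buffer, 0, sizeof(int));
        return BitConverter.ToInt32(buffer, 0); // the int is boxed here
    }

    // a path specialized for a known primitive returns a typed value:
    // no box on return, no unbox at the call site
    public static int ReadValueTyped(Stream stream, byte[] buffer)
    {
        stream.Read(buffer, 0, sizeof(int));
        return BitConverter.ToInt32(buffer, 0);
    }
}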

After the change, with the emitting delegated to the ValueSerializer class, it was given two new virtual methods with a default implementation using boxing etc., but leaving much more space for special cases:

public virtual int EmitReadValue(Compiler<ObjectReader> c, 
   int stream, int session, FieldInfo field) 
public virtual void EmitWriteValue(Compiler<ObjectWriter> c, 
   int stream, int fieldValue, int session) 

Call me once

Wire uses the BCL's Stream abstraction to work with a stream of bytes. When using a stream, a conversion to primitive value types sometimes requires a helper byte array to get all the data in one Read call. For instance, a long value is stored as 8 bytes, hence it requires an 8-byte array. To remove allocations during serialization and deserialization, a byte chunk can be obtained by calling the GetBuffer method on the session object. What if your object contains 4 longs? Should every serializer call this method, or should it rather be called once and the result stored in a variable?
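The second option might look roughly like the sketch below (a hypothetical shape; the names only loosely follow Wire's session API):

using System;
using System.IO;

class Session
{
    byte[] _buffer;

    // returns a reusable scratch buffer, growing it only when needed
    public byte[] GetBuffer(int size)
    {
        if (_buffer == null || _buffer.Length < size)
            _buffer = new byte[size];
        return _buffer;
    }
}

static class Reader
{
    static long ReadInt64(Stream stream, byte[] buffer)
    {
        stream.Read(buffer, 0, 8);
        return BitConverter.ToInt64(buffer, 0);
    }

    // deserializing an object with four longs: GetBuffer is called once
    // and the same array is passed down, instead of one call per field
    public static void ReadFourLongs(Stream stream, Session session)
    {
        var buffer = session.GetBuffer(8);
        var a = ReadInt64(stream, buffer);
        var b = ReadInt64(stream, buffer);
        var c = ReadInt64(stream, buffer);
        var d = ReadInt64(stream, buffer);
    }
}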

I took the second approach, and by removing these additional calls to GetBuffer I was able to squeeze a bit more out of Wire again.

Summing up

Working on these features was a very pleasant experience. It took me just a few evenings and I was able to make a positive impact. And this is really nice.

.NET volatile write performance degradation in x86


This is a summary of my investigation into writing a fast and well-designed concurrent queue for akka.net, whose performance turned out to be drastically low in 32-bit applications. Take a look at the PR here. If you're interested in writing well-performing, no-alloc applications with mechanical sympathy in mind, or you're simply interested in good .NET concurrency, this post is for you.


Akka.NET is an actor system implementation for the .NET platform, ported from Java. Recently I spent some time playing with it and reading through the codebase. One of the classes that I took a look into was UnboundedMailboxQueue, which uses the general-purpose concurrent queue from the .NET BCL. It looked strange to me: knowing the Envelope structure that is passed through this queue, one could implement a better queue. I did it in this PR, lowering the number of allocations by 10% and speeding up the queue by ~8%. Taking into consideration that queues are the foundation of akka actors, this result was quite promising. I used the benchmark tests provided with the platform and it looked good. Fortunately, Jeff Cyr ran some tests on x86, providing results that were disturbing. On x86 the new queue was underperforming. Finally, I closed the PR without pushing this change through.

The queue design

The custom queue I provided used a design similar to the original concurrent queue. The difference was using the Envelope fields (there are two: message & sender) to mark a message as published, without using the concurrent queue's state array. Again, knowing the structure you want to pass to the other side via a concurrent queue was vital for this design. You can't make a universal general collection. Note 'general', not 'generic'.


To make the change finally visible to a queue's consumer, Volatile.Write was used. The only difference was the type being written. In the BCL's concurrent queue it was a bool in an array. In my case it was an object. Both used different overloads of Volatile.Write(ref ….). For the sake of reference, Volatile.Write ensures a release barrier, so if a queue's consumer reads the status with Volatile.Read (an acquire barrier), it will eventually see the written value.

Some kind of reproduction

To see how .NET performs these operations, I used two types and ran a sample application as both x64 and x86. Let's take a look at the code first.

using System.Threading;

struct VolatileInt
{
    int _value;

    public void Write(int value)
    {
        _value = value;
    }

    public void WriteVolatile(int value)
    {
        Volatile.Write(ref _value, value);
    }
}

struct VolatileObject
{
    object _value;

    public void Write(object value)
    {
        _value = value;
    }

    public void WriteVolatile(object value)
    {
        Volatile.Write(ref _value, value);
    }
}

It's really nothing fancy. These two either write the value ensuring a release fence, or just write the value.
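A rough sketch of what the sample application might look like (the loop counts and the structure are illustrative, not the original test code; it uses the two structs above):

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

class Program
{
    static void Main()
    {
        // force JIT compilation up front so WinDbg can show the final native code
        foreach (var type in new[] { typeof(VolatileInt), typeof(VolatileObject) })
            foreach (var method in type.GetMethods(
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly))
                RuntimeHelpers.PrepareMethod(method.MethodHandle);

        const int count = 100000000;

        var vi = new VolatileInt();
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; i++)
            vi.WriteVolatile(i);
        Console.WriteLine("int volatile writes:    " + sw.Elapsed);

        var vo = new VolatileObject();
        var value = new object();
        sw.Restart();
        for (var i = 0; i < count; i++)
            vo.WriteVolatile(value);
        Console.WriteLine("object volatile writes: " + sw.Elapsed);
    }
}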

Windbg for x86

The methods had been prepared using RuntimeHelpers.PrepareMethod(). A WinDbg instance was attached to the process. I loaded SOS and took a look at the method tables of these two types. Because the methods had been prepared, they were JITted, so I could easily take a look at the JITted assembler. Because x64 was performing well, let's take a look at x86. At the beginning let's check the non-object method, VolatileInt.WriteVolatile:

cmp     byte ptr [ecx],al   ; implicit null/access check
mov     dword ptr [ecx],edx ; the volatile store: on x86 a plain mov
                            ; already provides release semantics

Nothing heavy here. Effectively, just a single memory move and a return. Let's take a look at writing the object with VolatileObject.WriteVolatile:

cmp     byte ptr [ecx],al              ; implicit null/access check
lea     edx,[ecx]                      ; destination address of the object field
call    clr!JIT_CheckedWriteBarrierEAX ; the GC write barrier performs the store

Wow! Besides moving some data, an additional method is called. The method name is JIT_CheckedWriteBarrierEAX (you can probably guess by now that there is a whole group of JIT_CheckedWriteBarrier methods). What is it and why does it appear only in x86?

CoreCLR to the rescue

Take a look at the following snippet and compare the blocks for x86 and non-x86. What can you see? For x86 there are additional fragments, including the JIT_CheckedWriteBarrierEAX mentioned before. What does it do? Let's take a look at another piece of CoreCLR here. Let's not dive into this implementation right now and what exactly is checked during this call; just by looking at the first instructions of this method one can tell that it'll cost more than the simple int write:

cmp     edx,dword ptr [clr!g_lowest_address]  ; is the destination inside
jb      clr!JIT_CheckedWriteBarrierEAX+0x35   ; the GC heap at all?
cmp     edx,dword ptr [clr!g_highest_address]
jae     clr!JIT_CheckedWriteBarrierEAX+0x35
mov     dword ptr [edx],eax                   ; the actual reference store
cmp     eax,dword ptr [clr!g_ephemeral_low]   ; does the written reference point
jb      clr!JIT_CheckedWriteBarrierEAX+0x37   ; into the ephemeral generations?
cmp     eax,dword ptr [clr!g_ephemeral_high]
jae     clr!JIT_CheckedWriteBarrierEAX+0x37
shr     edx,0Ah                               ; compute the card table offset

Summing up
If you want to write well-performing code and truly want to support AnyCPU, proper benchmark tests run against different architectures should be provided.
Sometimes, a gain in one architecture will cost you a lot in another. Even though this PR didn't make it, it was an interesting journey and an extreme learning experience. There's nothing better than answering a childish 'why?' on your own.

Relaxed Optimistic Concurrency


When using the optimistic concurrency approach for entities that are updated frequently, some of the actions may fail because of conflicting version numbers. A proper modelling technique, distilling whether business requirements can be loosened, may greatly increase the chances of commands issued against these entities succeeding, improving the overall performance of an application and lowering the probability of errors.

Optimistic Concurrency

Optimistic concurrency is an approach for ensuring non-overlapping updates over a given entity. It's supported by the majority of heavyweight ORMs and applied simply by adding a conditional WHERE at the end of the UPDATE. For example:

UPDATE entities -- table name for illustration
SET -- more updated columns
    version = @version + 1
WHERE id = @id AND version = @version

This approach ensures that if any other operation updated the entity in the meantime, this update will fail. Additionally, if an ORM is capable of counting the rows that should have been updated, like NHibernate does, it can abort the transaction and throw an exception informing that some of the operations that were planned to be executed have failed.

The optimistic concurrency approach is not a SQL-only technique. It's popular in many NoSQL databases, Azure Table Storage for example. When updating an entity, its ETag is added as the If-Match header, ensuring that if the entity was modified after retrieval, the update operation, again, will fail. See the Update operation documentation here.
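As a sketch only (using the Azure Storage client library of that era; the entity and table names are made up for illustration), the ETag round-trip looks roughly like this:

using Microsoft.WindowsAzure.Storage.Table;

public class IssueEntity : TableEntity
{
    public string Title { get; set; }
}

public static class EtagSample
{
    public static void Update(CloudTable table)
    {
        // retrieval brings back the current ETag along with the entity
        var result = table.Execute(
            TableOperation.Retrieve<IssueEntity>("issues", "issue-1"));
        var entity = (IssueEntity)result.Result;

        entity.Title = "updated title";

        // Replace sends the entity's ETag as If-Match; if the row changed
        // in the meantime, the service answers 412 Precondition Failed
        table.Execute(TableOperation.Replace(entity));
    }
}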

Finally, when applying Domain-Driven Design and operating on an aggregate root, this technique is the easiest way to ensure that the aggregate root is truly a transaction boundary. If the root has its version updated with every change of the aggregate, then two concurrent operations cannot both be executed; one will fail, still preserving the root as a transaction boundary. This applies to aggregate roots no matter if you immerse them in Event Sourcing or in a regular ORM-mapped graph of entities. Just update the root with every operation and your aggregate will be just fine.

As shown above, optimistic concurrency is a simple and powerful tool that, in a world of NoSQL and transactional-boundaries-got-right, may be the only one to ensure atomicity of operations.


When using optimistic concurrency, the flow of applying a change is a bit different. Instead of just updating a property or a value, the following approach is taken:

  1. An aggregate is retrieved with its version
  2. If the state allows it, a command is executed
  3. The aggregate's state is updated conditionally (only if the version is unchanged)

Again, this ensures that the update is applied to the version that the business logic operated on, but it limits concurrent access.
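In code, the flow could be sketched as follows (the repository and its API are purely illustrative, not from any specific library):

using System;

public class Issue
{
    public int Version { get; set; }
    public bool Locked { get; set; }
    public void AddComment(string text) { /* mutate the aggregate's state */ }
}

public interface IIssueRepository
{
    Issue Load(Guid id);
    // fails when the stored version differs from expectedVersion
    void Save(Issue issue, int expectedVersion);
}

public static class OptimisticFlow
{
    public static void AddComment(IIssueRepository repository, Guid id, string text)
    {
        var issue = repository.Load(id);   // 1. retrieve; the version comes along
        var version = issue.Version;

        if (!issue.Locked)                 // 2. if the state allows it,
            issue.AddComment(text);        //    execute the command

        repository.Save(issue, version);   // 3. conditional update
    }
}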

For services using Event Sourcing, instead of retrieving the entity, all of its events are retrieved and the state of the aggregate is rebuilt. If snapshots are used, only events with versions bigger than the snapshot's must be retrieved. If the snapshot is preserved in an in-memory cache, then possibly no events will be retrieved at all, if the snapshot's version is equal to the number of the aggregate's events so far. Events that are a result of a command are appended to the store conditionally. Depending on the storage, it can be the stream version when using EventStore's AppendToStreamAsync, or an update of a root markup entity when using a custom relational store.
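With the EventStore .NET client (EventStore.ClientAPI), the conditional append boils down to the expectedVersion argument. A sketch, with an illustrative stream name and raw byte payloads:

using System;
using System.Threading.Tasks;
using EventStore.ClientAPI;

public static class ConditionalAppend
{
    public static Task Append(IEventStoreConnection connection,
        int expectedVersion, byte[] data, byte[] metadata)
    {
        // fails if the stream moved past expectedVersion in the meantime
        return connection.AppendToStreamAsync(
            "issue-1",
            expectedVersion,
            new EventData(Guid.NewGuid(), "CommentAdded", true, data, metadata));
    }
}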

An example

Let's consider an example of a GitHub-like issue. Every issue has an option of locking it. It can be used, for instance, to lock an issue created by a troll (you don't feed the troll) and disallow adding more comments. For the sake of argument:

  1. let's model all comments as part of the issue aggregate (as always, there are many models that can be applied)
  2. optimistic concurrency is used for all commands.

A business requirement for locking an issue could look like:

when an issue is locked, no user should be able to add more comments

It's quite common that when seeing a requirement like this, developers don't ask questions. It's even more unfortunate that some companies require developers to just follow the analysis. Let's try to relax this requirement a little bit by asking some questions:

  1. Is it required to lock the issue immediately?
  2. Could an issue be considered locked after some short period of time (less than 1s) after locking it?
  3. Could we allow adding some comments during this period?

If the answers point towards no need for an immediate lock, there's space to handle locking in a relaxed manner.

Relaxed Optimistic Concurrency

If an operation can have its preconditions relaxed and can be performed after reaching some state, it can be executed with much less friction. In the previous example, the state when a user can add a comment is a created issue. The precondition is a non-locked issue, but it's OK to add a comment to a locked issue within some time boundaries. Consider the following flow:

  1. An aggregate is retrieved with its version
  2. If the state allows it, a command is executed
  3. The change is appended unconditionally (no version check)

Depending on the storage and the applied design, it can be done in many ways.

When using Event Sourcing with EventStore, a special version representing any version can be passed to the appending method. This appends the events unconditionally. It means that a locking operation and adding a comment can be done in parallel without conflicts!
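In the EventStore client this special version is ExpectedVersion.Any; sketched with the same illustrative names as before:

using System;
using System.Threading.Tasks;
using EventStore.ClientAPI;

public static class RelaxedAppend
{
    public static Task Append(IEventStoreConnection connection,
        byte[] data, byte[] metadata)
    {
        // no optimistic check: the comment is appended even if the issue
        // was locked (or otherwise changed) a moment ago
        return connection.AppendToStreamAsync(
            "issue-1",
            ExpectedVersion.Any,
            new EventData(Guid.NewGuid(), "CommentAdded", true, data, metadata));
    }
}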

When using a relational database, the issue entity can be retrieved to check its state. Next, a comment entity can be added separately, without updating the version of the issue itself. Again, because adding a comment does not change the version, the friction on the aggregate is lowered.

Summing up

Don't take requirements for granted; ask for the reasoning behind them instead. Try to relax requirements for areas which may suffer from high contention. The model is just a model. There are no true or false models, only ones which help you or make your work harder. Choose wisely 🙂

RampUp journey ends

My RampUp project has been an important journey for me. I verified my knowledge about modern hardware and dealt with interesting cases while writing really low-level code in .NET/C#. Finally, writing code that does not allocate is demanding but doable. This journey ends today.

RampUp has been greatly influenced by Aeron, a tool for publishing your messages really fast. You can easily saturate the bandwidth and get very low service times. Fortunately for the .NET ecosystem, Adaptive Consulting has just open-sourced an Aeron client port here. This is unfortunate for RampUp, as this port covers ~70-80% of RampUp's implementation. Going on with RampUp would mean that I'd need to chase Aeron and somehow compete with Aeron.NET for the .NET environment's attention and, as a result, participation (PRs, issues, actual use cases). This is something I don't want to do.

The RampUp source code won't be removed, as I still find it interesting and it shows some really interesting low-level cases of the CLR. I hope you enjoyed the ride. Time to move on 🙂