ProtobufRaw vs protobuf-net


I’m working currently on SewingMachine, an OSS project of mine, that is aimed at unleashing the ultimate performance for your stateful services written in/for Service Fabric (more posts: here). In this post I’m testing further (previous test is here) whether it would be beneficial to write a custom unmanaged writer for protobuf-net using stackallocked memory.

SewingMachine is raw, very raw

SewingMachine works with pointers. When storing data, you pass an IntPtr with a length as a value. Effectively, it means that if you use a managed structure to serialize your data, finally you’ll need to either pin it (pinning is notification for GC to do not move object around when it’s pinned) or have it pinned from the very beginning (this approach could be beneficial if an object is large and has a long lifetime). If you don’t want to use managed memory, you could always use stackalloc to allocate a small amount of memory on stack, serialize to it, and then pass it as IntPtr. This is approach I’m testing now.

Small, fixed sized payloads

If a payload, whether it’s an event or a message is small and contains no fields of variable length (strings, arrays) you could estimate the maximum size it will take to get serialized. Next, instead of using Protobuf-net regular serializer, you could write (or emit during a post-compilation) a custom function to serialize a given type, like I did in this spike. Then it’s time to test it.

Performance tests and over 10x improvement

Again, as in the previous post about memory, the unsafe stackallock version shows that it could be beneficial to invest some more time as the performance benefit is just amazing . The raw version is 10x faster. Wow!



Sometimes using raw unsafe code improves performance. It’s worth to try it, especially in situations where the interface you communicate with is already unsafe and requiring to use unsafe structures.

ThreadStatic vs stackalloc


I’m working currently on SewingMachine, an OSS project of mine, that is aimed at unleashing the ultimate performance for your stateful services written in/for Service Fabric (more posts: here). In this post I’m testing whether it would be beneficial to write a custom unmanaged writer for protobuf-net, instead of using some kind of object pooling with ThreadLocal.

ThreadStatic and stackalloc

ThreadStatic is the old black. It was good to use before async-await has been introduced. Now, when you don’t know on which thread your continuation will be run, it’s not that useful. Still, if you’re on a good old-fashioned synchronous path, it might be used for object pooling and keeping one object per thread. That’s how protobuf-net caches ProtoReader objects.

One could use it to cache locally a chunk of memory for serialization. This could be a managed or unmanaged chunk, but eventually, it would be used to pass data to some storage (in my case, SewingSession from SewingMachine). If the interface accepted unmanaged chunks, I could also use stackalloc for small objects, that I know how much memory will be occupied by. stackalloc provides a way to allocate some number of bytes from the stackframe. Yes, it’s unsafe so keep your belts fastened.

ThreadStatic vs stackalloc

I gave it a try and wrote a simple (if it’s dummy, I encourage you to share your thoughts in comments) test that either uses a ThreadStatic-pooled object with an array or a stackallocated and writes. You can find it in this gist.

How to test it? As always, to the rescue comes BenchmarkDotNet, the best benchmarking tool for any .NET dev. Let’s take a look at the summary now.


Stackalloc wins.

There are several things that should be taken into consideration. Finally block, the real overhead of writing an object and so on and so forth. Still, it looks that for heavily optimized code and small objects, one could this to write them a bit faster.


Using stackallocated buffers is fun and can bring some performance benefits. If I find anything unusual or worth noticing with this approach, I’ll share my findings. As always, when working on performance, measure first, measure in the middle and at the end.

Much Better Stateful Service Fabric


In the last post we found the COM+ interface that is a foundation for KeyValueStoreReplica persistence. It’s time to describe the way, how the underlying performance can be unleashed for the managed .NET world.

Internal or not, I don’t care

The sad part of this COM+ layer of ServiceFabric is that’s internal. The good part of .NET is that when running in Full Trust, you can easily overcome this obstacle with Reflection.Emit namespace and emitting helper methods wrapping internal types. After all, there is a public surface that you can start with. The more you know about MSIL and internals of CLR and the more you love it, the less tears and pain will be caused by the following snippet code. Sorry, it’s time for some IL madness.

var nonNullResult = il.DefineLabel();

// if (result == null)
//    return null;
il.Emit(OpCodes.Brtrue_S, nonNullResult);


// GC.KeepAlive(result);
il.EmitCall(OpCodes.Call, typeof(GC).GetMethod("KeepAlive"), null);

// nativeItemResult.get_Item()
il.EmitCall(OpCodes.Callvirt, InternalFabric.KeyValueStoreItemResultType.GetMethod("get_Item"), null);

il.EmitCall(OpCodes.Call, ReflectionHelpers.CastIntPtrToVoidPtr, null);

// empty stack, processing metadata
il.Emit(OpCodes.Ldloc_1);   // NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemType.GetField("Metadata")); // IntPtr
il.EmitCall(OpCodes.Call, ReflectionHelpers.CastIntPtrToVoidPtr, null); // void*

il.Emit(OpCodes.Ldloc_2); // NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM_METADATA*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemMetadataType.GetField("Key")); // IntPtr

il.Emit(OpCodes.Ldloc_2); // IntPtr, NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM_METADATA*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemMetadataType.GetField("ValueSizeInBytes")); // IntPtr, int

il.Emit(OpCodes.Ldloc_2); // IntPtr, int, NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM_METADATA*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemMetadataType.GetField("SequenceNumber")); // IntPtr, int, long

il.Emit(OpCodes.Ldloc_1); // IntPtr, int, long, NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemType.GetField("Value")); // IntPtr (char*), int, long, IntPtr (byte*)

The part above is just a small snippet responsible partially for reading the value. If you’re interested in more, here are over 200 lines of emit that brings the COM+ to the public surface. You don’t need to read it though. SewingMachine delivers a much nicer interface for it.


RawAccessorToKeyValueStoreReplica is a new high level API provided by SewingMachine. It’s not as high level as the original KeyValueStoreReplica as it accepts IntPtr parameters, but still, it removes a lot of layers leaving the performance, serialization, memory management decisions to the end user of the library. You can use your own serializer, you can use stackalloc to allocate on the stack (if values are small) and much much more. This accessor is a foundation for another feature provided by SewingMachine, called KeyValueStatefulService, a new base class for your stateful services.


We saw how KeyValueStoreReplica is implemented. We took a look at the COM+ interface call sites. Finally, we observed how, by emitting IL, one can expose an internal interface, wrap it in a better abstraction and expose it to the caller. It’s time to take a look at the new stateful service.


Better Stateful Service Fabric


This is a follow up entry to Stateful Service Fabric that introduced all the kinds of statefulness you can out of the box from the Fabric SDK. It’s time to move on with providing a better stateful service. First, we need to define what better means.

KeyValueStoreReplica implementation details

The underlying actors’ persistence is based on KeyValueStoreReplica. This class provides a high level API for low level interop interfaces provided by Service Fabric. Yep, you heard it. The runtime of the Fabric is unmanaged and is exposed via COM+ interfaces. Then they are wrapped in a nice .NET classes just to be delivered to you. The question is how nice are these classes? Let’s take a look at the decompiled part of a method.

private void UpdateHelper(TransactionBase transactionBase, string key, byte[] value, long checkSequenceNumber)
using (PinCollection pin = new PinCollection())
IntPtr key1 = pin.AddBlittable((object) key);
Tuple<uint, IntPtr> nativeBytes = NativeTypes.ToNativeBytes(pin, value);
this.nativeStore.Update(transactionBase.NativeTransactionBase, key1, (int) nativeBytes.Item1, nativeBytes.Item2, checkSequenceNumber);


pinning + Interop + pointers = fun

As you can see above, when you pass a string key and a byte[] value a lot must be done to execute the update:

  1. value needs to be allocated every time

    as there’s no interface accepting Stream or ArraySegment<byte> that would allow you to reuse bytes used for allocations, you always need to ToArray the payload

  2. key and value are pinned for the time of executing update.

    Pinning, is nothing more or less than prohibiting object from being moved by GC. The same effect that you get when using fixed keyword or using GCHandle.Alloc. When handles not that many requests it’s ok to do it. When GC kicks in frequently, this might be a problem.

The interesting part is the nativeStore field that provides the COM+ seam to the Fabric internal interface. This is the interface that is closest to the Fabric surface and that allows to squeeze performance out of it.


You can probably see, when this leads us. Seeing that underneath the .NET wrapper there is a COM+ that has much more low level interface and allows to use raw memory, we can try to access it directly, skipping KeyValueStoreReplica altogether and write a custom implementation that will enable to maximize the performance.


Stateful Service Fabric


This is a very next entry in a series covering my journey based on Service Fabric and my Open Source project called SewingMachine. After hitting the internal wall of Service Fabric I pivoted the approach of as I knew that I can’t change the Actors’ part as I wanted (too much internals) and need to work on a different level.

Underneath it’s all stateful

It doesn’t matter if you use StatefulService or the actor’s model of Service Fabric. In the first case you explicitly work with a state using reliable collections: a distributed, transactional dictionary and a queue. You can access them using StateManager of your service derived from the mentioned base class.

In case of actors, you just register an actor in the Actors’ Runtime. Underneath, this results in creating a stateful ActorService. Both, StatefulService and ActorService derive from StatefulServiceBase. The way they provide and operate on the state is quite different.

Reliable collections

Reliable collections provide a dictionary and a queue. Both, are transactional so you can wrap any operations with a regular transaction. They provide different semantics, where the dictionary is a regular dictionary, a lookup if you prefer or a hashmap. The queue is a regular FIFO structure. The transactions have a bit different semantics from regular db transactions, but for now, you may treat them as regular ones.

If the stateful services use reliable collections, is it the same for actors? It looks like it’s not.


The storage for actors is provided by KeyValueStoreReplica already covered in here and here. If you asked is it that much different from the reliable collections, I’d say yes. It has different semantics and provides a simple replicable key-value store. If its is so different, could we build something else than an actor system on top of it?

The answer will be revealed in the very next post.

Hitting internal wall in Service Fabric


In this post I share my experience in trying extending Service Fabric for Sewing Machine purposes.

Sewing Machine aim

The aim of Sewing Machine is to extend the Service Fabric actor model to use better, faster, less-allocating foundations. The only part I was working on so far was persisted actors. This is the one stored in the KeyValueReplica, I’ve been writing about for last few weeks.

Not-so public seam

I started my work by discovering how the persisted actors are implemented. The class responsible for it is named KvsActorStateProvider. It uses following components:

  • KeyValueStoreWrapper – private
  • VolatileLogicalTimeManager.ISnapshotHandler – internal
  • VolatileLogicalTimeManager – internal
  • IActorStateProviderInternal – internal
  • ActorStateProviderHelper – internal, responsible for shared logic among providers
  • IActorStateProvider – public interface to implement

As you can see, the only part that is given is a seam of the state provider. Every single helper that one could use to implement their own, is internal. Additionally, the provider interface is filled with other interfaces that one needs to implement. I know, sharing data structures isn’t the best option, but as the ServiceFabric share them internally, why wouldn’t you give it to the user.

All of the above means that extending actors’ runtime is hard if not impossible. It provides no real extension points and has its public seam not prepared for it. What does it mean for SewingMachine?

Can’t win, change battle

Rewriting the runtime would be time-consuming. I can’t spend half a year on writing it and don’t want to. At least, not now. I’ve implemented a faster unsafe wrap around KeyValueReplicaStore that still can be useful. To make an impact with SewingMachine, I’ll introduce the event driven actor first, even when clean up will be run by not efficient regular Actor disposal. Using a custom serializer and adhering to the currently used prefixes, later on can be changed to use a custom runtime. The only problem would be reminders, but this can be handled as well by a better versioning of them.


SewingMachine was meant to extend the actors’ runtime. Seeing difficulties like the ones described above, I could either kill it or repurpose it to provide a real value first, leaving performance for later. That’s how we’ll do it.


Service Fabric – KeyValueStoreReplica, ReplicaRole


After taking a look at how actors’ state is persisted with KeyValueStoreReplica to follo the prefix query guideline, it’s time to see how this state is replicated.


When defining replication for a partition, one defines on how many nodes the data will reside. Every copy of a partition’s data is called Replica. It’s important to know, that for a given partition only one replica at a time is active. This kind of replica is called Primary. Let’s take a look at the ReplicaRole values and decipher their meanings:

  1. Primary – the currently active replica. All operations are handled by the primary, ensuring that any write will be replicated and acknowledged by a quorum of ActiveSecondary replicas. As in The Highlander, there can be only one Primary replica at the same time.
  2. IdleSecondary – a replica that accepts and applies a state send by the Primary to catch up with all the changes and eventually become ActiveSecondary as soon as it catch up.
  3. ActiveSecondary – a replica that is a part of the write quorum. It stores updates from the Primary and acknowledge them to enable Primary to successfully end a write operation.

Active, passive and not-that-passive

As you can see above, there’s only one replica at any time that is truly active, it’s Primary. What happens with the secondaries? Can they do anything meaningful or maybe they’re just there for copying state?

First and foremost, secondary replicas receive notifications about the state being replicated. This means, that if you derive from KeyValueStoreReplica class, you can be notified about the copied key-value pairs. That’s how you can react to these changes. But how would this be helpful?

You could index the data somehow, you could notify other services, endpoints calling their methods or sending a request (in a safe manner, not failing on the notification) and much more. For instance, Actors’ Runtime uses it to capture the last timestamp for a component called VolatileLogicalTimeManager.


The role of a replica can be easily summarized as: primary – the current active replica accepting reads&writes, secondary – replicas just copying the state.