Much Better Stateful Service Fabric

TL;DR

In the last post we found the COM+ interface that is a foundation for KeyValueStoreReplica persistence. It’s time to describe the way, how the underlying performance can be unleashed for the managed .NET world.

Internal or not, I don’t care

The sad part of this COM+ layer of ServiceFabric is that’s internal. The good part of .NET is that when running in Full Trust, you can easily overcome this obstacle with Reflection.Emit namespace and emitting helper methods wrapping internal types. After all, there is a public surface that you can start with. The more you know about MSIL and internals of CLR and the more you love it, the less tears and pain will be caused by the following snippet code. Sorry, it’s time for some IL madness.


var nonNullResult = il.DefineLabel();

// if (result == null)
//{
//    return null;
//}
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Brtrue_S, nonNullResult);
il.Emit(OpCodes.Ldnull);
il.Emit(OpCodes.Ret);

il.MarkLabel(nonNullResult);

// GC.KeepAlive(result);
il.Emit(OpCodes.Ldloc_0);
il.EmitCall(OpCodes.Call, typeof(GC).GetMethod("KeepAlive"), null);

// nativeItemResult.get_Item()
il.Emit(OpCodes.Ldloc_0);
il.EmitCall(OpCodes.Callvirt, InternalFabric.KeyValueStoreItemResultType.GetMethod("get_Item"), null);

il.EmitCall(OpCodes.Call, ReflectionHelpers.CastIntPtrToVoidPtr, null);
il.Emit(OpCodes.Stloc_1);

// empty stack, processing metadata
il.Emit(OpCodes.Ldloc_1);   // NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemType.GetField("Metadata")); // IntPtr
il.EmitCall(OpCodes.Call, ReflectionHelpers.CastIntPtrToVoidPtr, null); // void*
il.Emit(OpCodes.Stloc_2);

il.Emit(OpCodes.Ldloc_2); // NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM_METADATA*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemMetadataType.GetField("Key")); // IntPtr

il.Emit(OpCodes.Ldloc_2); // IntPtr, NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM_METADATA*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemMetadataType.GetField("ValueSizeInBytes")); // IntPtr, int

il.Emit(OpCodes.Ldloc_2); // IntPtr, int, NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM_METADATA*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemMetadataType.GetField("SequenceNumber")); // IntPtr, int, long

il.Emit(OpCodes.Ldloc_1); // IntPtr, int, long, NativeTypes.FABRIC_KEY_VALUE_STORE_ITEM*
il.Emit(OpCodes.Ldfld, InternalFabric.KeyValueStoreItemType.GetField("Value")); // IntPtr (char*), int, long, IntPtr (byte*)

The part above is just a small snippet responsible partially for reading the value. If you’re interested in more, here are over 200 lines of emit that brings the COM+ to the public surface. You don’t need to read it though. SewingMachine delivers a much nicer interface for it.

RawAccessorToKeyValueStoreReplica

RawAccessorToKeyValueStoreReplica is a new high level API provided by SewingMachine. It’s not as high level as the original KeyValueStoreReplica as it accepts IntPtr parameters, but still, it removes a lot of layers leaving the performance, serialization, memory management decisions to the end user of the library. You can use your own serializer, you can use stackalloc to allocate on the stack (if values are small) and much much more. This accessor is a foundation for another feature provided by SewingMachine, called KeyValueStatefulService, a new base class for your stateful services.

Summary

We saw how KeyValueStoreReplica is implemented. We took a look at the COM+ interface call sites. Finally, we observed how, by emitting IL, one can expose an internal interface, wrap it in a better abstraction and expose it to the caller. It’s time to take a look at the new stateful service.

 

Better Stateful Service Fabric

TL;DR

This is a follow up entry to Stateful Service Fabric that introduced all the kinds of statefulness you can out of the box from the Fabric SDK. It’s time to move on with providing a better stateful service. First, we need to define what better means.

KeyValueStoreReplica implementation details

The underlying actors’ persistence is based on KeyValueStoreReplica. This class provides a high level API for low level interop interfaces provided by Service Fabric. Yep, you heard it. The runtime of the Fabric is unmanaged and is exposed via COM+ interfaces. Then they are wrapped in a nice .NET classes just to be delivered to you. The question is how nice are these classes? Let’s take a look at the decompiled part of a method.


private void UpdateHelper(TransactionBase transactionBase, string key, byte[] value, long checkSequenceNumber)
{
using (PinCollection pin = new PinCollection())
{
IntPtr key1 = pin.AddBlittable((object) key);
Tuple<uint, IntPtr> nativeBytes = NativeTypes.ToNativeBytes(pin, value);
this.nativeStore.Update(transactionBase.NativeTransactionBase, key1, (int) nativeBytes.Item1, nativeBytes.Item2, checkSequenceNumber);
}
}

 

pinning + Interop + pointers = fun

As you can see above, when you pass a string key and a byte[] value a lot must be done to execute the update:

  1. value needs to be allocated every time

    as there’s no interface accepting Stream or ArraySegment<byte> that would allow you to reuse bytes used for allocations, you always need to ToArray the payload

  2. key and value are pinned for the time of executing update.

    Pinning, is nothing more or less than prohibiting object from being moved by GC. The same effect that you get when using fixed keyword or using GCHandle.Alloc. When handles not that many requests it’s ok to do it. When GC kicks in frequently, this might be a problem.

The interesting part is the nativeStore field that provides the COM+ seam to the Fabric internal interface. This is the interface that is closest to the Fabric surface and that allows to squeeze performance out of it.

Summary

You can probably see, when this leads us. Seeing that underneath the .NET wrapper there is a COM+ that has much more low level interface and allows to use raw memory, we can try to access it directly, skipping KeyValueStoreReplica altogether and write a custom implementation that will enable to maximize the performance.

 

Stateful Service Fabric

TL;DR

This is a very next entry in a series covering my journey based on Service Fabric and my Open Source project called SewingMachine. After hitting the internal wall of Service Fabric I pivoted the approach of as I knew that I can’t change the Actors’ part as I wanted (too much internals) and need to work on a different level.

Underneath it’s all stateful

It doesn’t matter if you use StatefulService or the actor’s model of Service Fabric. In the first case you explicitly work with a state using reliable collections: a distributed, transactional dictionary and a queue. You can access them using StateManager of your service derived from the mentioned base class.

In case of actors, you just register an actor in the Actors’ Runtime. Underneath, this results in creating a stateful ActorService. Both, StatefulService and ActorService derive from StatefulServiceBase. The way they provide and operate on the state is quite different.

Reliable collections

Reliable collections provide a dictionary and a queue. Both, are transactional so you can wrap any operations with a regular transaction. They provide different semantics, where the dictionary is a regular dictionary, a lookup if you prefer or a hashmap. The queue is a regular FIFO structure. The transactions have a bit different semantics from regular db transactions, but for now, you may treat them as regular ones.

If the stateful services use reliable collections, is it the same for actors? It looks like it’s not.

KeyValueStoreReplica

The storage for actors is provided by KeyValueStoreReplica already covered in here and here. If you asked is it that much different from the reliable collections, I’d say yes. It has different semantics and provides a simple replicable key-value store. If its is so different, could we build something else than an actor system on top of it?

The answer will be revealed in the very next post.

Hitting internal wall in Service Fabric

TL;DR

In this post I share my experience in trying extending Service Fabric for Sewing Machine purposes.

Sewing Machine aim

The aim of Sewing Machine is to extend the Service Fabric actor model to use better, faster, less-allocating foundations. The only part I was working on so far was persisted actors. This is the one stored in the KeyValueReplica, I’ve been writing about for last few weeks.

Not-so public seam

I started my work by discovering how the persisted actors are implemented. The class responsible for it is named KvsActorStateProvider. It uses following components:

  • KeyValueStoreWrapper – private
  • VolatileLogicalTimeManager.ISnapshotHandler – internal
  • VolatileLogicalTimeManager – internal
  • IActorStateProviderInternal – internal
  • ActorStateProviderHelper – internal, responsible for shared logic among providers
  • IActorStateProvider – public interface to implement

As you can see, the only part that is given is a seam of the state provider. Every single helper that one could use to implement their own, is internal. Additionally, the provider interface is filled with other interfaces that one needs to implement. I know, sharing data structures isn’t the best option, but as the ServiceFabric share them internally, why wouldn’t you give it to the user.

All of the above means that extending actors’ runtime is hard if not impossible. It provides no real extension points and has its public seam not prepared for it. What does it mean for SewingMachine?

Can’t win, change battle

Rewriting the runtime would be time-consuming. I can’t spend half a year on writing it and don’t want to. At least, not now. I’ve implemented a faster unsafe wrap around KeyValueReplicaStore that still can be useful. To make an impact with SewingMachine, I’ll introduce the event driven actor first, even when clean up will be run by not efficient regular Actor disposal. Using a custom serializer and adhering to the currently used prefixes, later on can be changed to use a custom runtime. The only problem would be reminders, but this can be handled as well by a better versioning of them.

Summary

SewingMachine was meant to extend the actors’ runtime. Seeing difficulties like the ones described above, I could either kill it or repurpose it to provide a real value first, leaving performance for later. That’s how we’ll do it.

 

Service Fabric – KeyValueStoreReplica, ReplicaRole

TL;DR

After taking a look at how actors’ state is persisted with KeyValueStoreReplica to follo the prefix query guideline, it’s time to see how this state is replicated.

ReplicaRole

When defining replication for a partition, one defines on how many nodes the data will reside. Every copy of a partition’s data is called Replica. It’s important to know, that for a given partition only one replica at a time is active. This kind of replica is called Primary. Let’s take a look at the ReplicaRole values and decipher their meanings:

  1. Primary – the currently active replica. All operations are handled by the primary, ensuring that any write will be replicated and acknowledged by a quorum of ActiveSecondary replicas. As in The Highlander, there can be only one Primary replica at the same time.
  2. IdleSecondary – a replica that accepts and applies a state send by the Primary to catch up with all the changes and eventually become ActiveSecondary as soon as it catch up.
  3. ActiveSecondary – a replica that is a part of the write quorum. It stores updates from the Primary and acknowledge them to enable Primary to successfully end a write operation.

Active, passive and not-that-passive

As you can see above, there’s only one replica at any time that is truly active, it’s Primary. What happens with the secondaries? Can they do anything meaningful or maybe they’re just there for copying state?

First and foremost, secondary replicas receive notifications about the state being replicated. This means, that if you derive from KeyValueStoreReplica class, you can be notified about the copied key-value pairs. That’s how you can react to these changes. But how would this be helpful?

You could index the data somehow, you could notify other services, endpoints calling their methods or sending a request (in a safe manner, not failing on the notification) and much more. For instance, Actors’ Runtime uses it to capture the last timestamp for a component called VolatileLogicalTimeManager.

Summary

The role of a replica can be easily summarized as: primary – the current active replica accepting reads&writes, secondary – replicas just copying the state.

 

Service Fabric – KeyValueStoreReplica, Prefix query and actors

TL;DR

The last post presented a deep dive into capabilities of internal Service Fabric storage. We saw, that in spite of being “just a key value store”, KeyValueStoreReplica enables transactions, optimistic concurrency and an interesting approach for handling queries. It’s time to go back to Service Fabric actor model and understand how Actors have their state persisted.

Actor state manager revisited

To access its state Service Fabric actors use IActorStateManager, which enables to setting and getting states by their names. The manager provides one more method, that is called by the actor class when handling a call is ended this method is:

Task SaveStateAsync(CancellationToken token)

This is the place where the actor state is persisted. Let’s take a look at the implementation details behind the default actor state manager.

For persisted actors, the call eventually lands in the KvsActorStateProvider class which is nothing more than just a wrapper around KeyValueStoreReplica. During SaveStateAsync call:

  1. KeyValueStoreReplica transaction is opened
  2. All the states that were changed/added/removed have appropriate methods invoked (Update, Add, Remove)
  3. Transaction is committed.

It’s all good, but how to ensure that one can easily access all the states of a specific actor? How to make it easy and accessible with one call?

The key is… the key

Because KeyValueStoreReplica provides no indexes and accepts just a string as the key, you could think that the key formatting is crucial to ensure the ability to query I mentioned before. That’s true.

To make it happen the following algorithm is used to encode the key

var key = $"Actor_{actorId}_{stateName}"

As you can see every single actor state starts with the “Actor_” prefix. Then, the actorId is used, then the state name is appended. This means that every state of the actor has the same name, which means that we could use Prefix query with the following prefix to make it happen:

var prefix = $"Actor_{actorId}"

Summary

After reading this post you should understand how actors use the underlying Service Fabric store and how they ensure that all the values of a specific actor instance are collocated to make them easy to access together.

Service Fabric – KeyValueStoreReplica (2)

TL;DR

In the previous post we started the top bottom journey into the Service Fabric actor model. The very last step was encountering KeyValueStoreReplica that provides persistence capabilities for persisted actors. Today, we’ll review the capabilities of this distributed, associative data storage.

Transactional

The first and quite interesting attribute of KeyValueStoreReplica is its transactionality. You can open a transaction by calling a regular CreateTransaction method and later on commit it with CommitAsync like in the following example.

using (var tx = replica.CreateTransaction ())
{
  // do something
  var seqNumber = await tx.CommitAsync()
                          .ConfigureAwait(false);
}

This is the way actors ensure that all the states are stored atomically. No partially stored state, it’s all or nothing.

As you probably noticed, there’s a sequence number that is returned when a transaction is committed. We’ll get back to it shortly.

Key Value store

KeyValueStoreReplica is a key value store. Although it’s transactional as we saw above, it can store only binary payloads accessed by string keys. You can easily:

  • Add / TryAdd
  • Update / TryUpdate
  • Remove / TryRemove
  • Get / TryGet

a value by passing the ongoing transaction (you can’t do anything without an active transaction), the key and the value when needed. See the following example:

using (var tx = r.CreateTransaction ())
{
  r.Add (tx, "key1", new byte[] {1,2});
  r.Update (tx, "key2", new byte[] {3,4});
  var seqNumber = await tx.CommitAsync()
                          .ConfigureAwait(false); 
}

There’s an additional parameter that you can pass to all the methods changing data. It’s a sequence number.

Sequence number

The sequence number is simply an optimistic concurrency marker. Once you retrieve it either by getting it with data when Getting value or obtain it from a committed transaction, you can use it to Update or Delete values conditionally. If between your first call that resulted in obtaining the sequence number and the second used for Update or Delete something changed any value accessed in the second transaction it will fail.

This pattern may lower the need of rereading data every time and simply use the marker returned from the last committed transaction.

Prefix query

Although KeyValueStoreReplica is a key-value store it provides one additional way of querying data instead of getting the values one by one. This feature is called a prefix query. Consider the following example where two values are added with keys that have a common prefix. They can be retrieved in one call and returned as an enumerator

using (var tx = r.CreateTransaction ())
{
   r.Add (tx, "key1", new byte[] {1,2});
   r.Add (tx, "key2", new byte[] {3,4});
   var enumerator = r.Enumerate(tx, "key");
   // enumerator has: "key1", "key2"
}

Summary

In this blog post we saw that the underlying storage for stateful part of the Service Fabric cluster is much more than a dummy key value store. It is a transactional db, it enables optimistic concurrency and enables a prefix query that with a proper key design can be leveraged to do a lot. But this is a topic for another blog post.