We accept pull requests

There are many protocols of collaboration. There’s the famous C4 from 0MQ, and there are others as well. Sometimes a project lives for years without explicitly stated collaboration rules and can still be successful. Sometimes it suffers from bad assumptions collaborators made when they decided to invest their time in it. To lower the risk of unhealthy friction, I’ve decided to write down at least some bullet points for RampUp and created a simple Collaboration.md (I’ve just been sent the very first PR for RampUp :D).

The main issue I wanted to address was the ‘I accept pull requests’ attitude. This topic is discussed over and over again and can be described in the following way: .NET devs are not willing to collaborate and, rather than providing PRs, they just blame, complain, etc. The landscape is changing now and everyone involved in .NET Open Source can surely feel it. More PRs are being issued, and the engagement that is so strongly needed is shifting towards actual participation rather than poking. It doesn’t matter whether you provide a new feature and, rather than spending days on drawing UML, you sketch it and issue a PR, or whether you spend one additional hour of your debugging time distilling a PR that reproduces a bug you’ve encountered. That’s what real involvement is. It’s not about talking, it’s about making things. Even if a PR is rejected for whatever reason, you learn a lot, you really participate and you get real feedback about your work, not your cheap talk.

I really hope that the new wave of OSS is coming. In that case, let’s swim with the tide.

False sharing is dead, long live the Padded

False sharing is a common problem of multithreaded applications in .NET. If you allocate objects in/for different threads, they may land on the same cache line, hurting performance and limiting the gains from scaling your app on a single machine. Unfortunately, because of its multithreaded nature, the RampUp library has been suffering from the same condition. I’ve decided to address it by providing tooling rather than going through the whole codebase and applying LayoutKind.Explicit with plenty of FieldOffsets.

Padded is born

The easiest and best way of addressing cross-cutting concerns in .NET apps that I’ve found so far is Fody. It’s a post compiler/weaver based on the mighty Mono.Cecil library. The tool has decent documentation, allowing one to create even quite a complex plugin in a few hours. Because of these advantages I’ve already used it in RampUp, but I wanted to have something that can live on its own. That’s how Padded was born.

Pad me please

Padded uses a very simple technique of adding a dozen additional fields. According to the test cases provided, they give enough space to prevent another object from overlapping in the same cache line. All you need to do is:

  1. install Padded (available as a NuGet package) in the project that requires padding
  2. declare one attribute in your project:
    using System;

    namespace Padded.Fody
    {
        public sealed class PaddedAttribute : Attribute { }
    }

  3. mark the classes that need padding with this attribute.
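
Put together, steps 2 and 3 could look like the sketch below. `PaddedAttribute` is the declaration from step 2; `Counter` and `PaddedUsage` are hypothetical names used only for illustration. The padding fields themselves are not visible here, as they would only be added when Fody processes the compiled assembly.

```csharp
using System;

namespace Padded.Fody
{
    // Step 2: the marker attribute the weaver looks for.
    public sealed class PaddedAttribute : Attribute { }
}

// Step 3: a hypothetical class that is written to by a single thread.
[Padded.Fody.PaddedAttribute]
public class Counter
{
    public long Value;
}

// Hypothetical helper, only to show that the marker is ordinary metadata.
public static class PaddedUsage
{
    // True when the given type carries the marker attribute.
    public static bool IsMarked(Type type)
    {
        return Attribute.IsDefined(type, typeof(Padded.Fody.PaddedAttribute));
    }
}
```

Calling `PaddedUsage.IsMarked(typeof(Counter))` returns true, which is all a weaver needs to know to pick the class up.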

Summary

Marking a class/struct with one attribute is much easier than dealing with its layout using .NET attributes, especially as they were not created for this purpose. Using a small, custom tool to get the needed result is the way to go. That’s how & why Padded was provided.

Ping pong Bruce Lee test

There is a famous Bruce Lee clip showing him as a very good ping pong player using unusual tooling to get the job done. I thought that this ping pong match would be a great story for writing a test for my RampUp library, especially as I had just provided the first, most likely not final, version of the actor system.

To have more fun I split Bruce Lee into Bruce & Lee. Each part of Bruce Lee either pings or pongs.


public class Bruce : IHandle<Ping>
{
    public IBus Bus;

    public void Handle(ref Envelope envelope, ref Ping msg)
    {
        var p = new Pong();
        Bus.Publish(ref p);
    }
}

public class Lee : IHandle<Pong>
{
    public IBus Bus;

    public void Handle(ref Envelope envelope, ref Pong msg)
    {
        var p = new Ping();
        Bus.Publish(ref p);
    }
}

The ping/pong messages are only markers:


public struct Pong : IMessage {}

public struct Ping : IMessage {}

And the final execution of this setup can be summarized in:


public class Program
{
    public static void Main()
    {
        var system = new ActorSystem();
        IBus bus = null;

        system.Add(new Bruce(), ctx => { bus = ctx.Actor.Bus = ctx.Bus; });
        system.Add(new Lee(), ctx => { ctx.Actor.Bus = ctx.Bus; });

        system.Start();

        var p = new Pong();
        bus.Publish(ref p); // pong as Bruce
        // ... later
        system.Stop();
    }
}

I hope you like the example. I’m aware that the ActorSystem API isn’t the best possible API ever, but even in this shape it enables me to push RampUp forward.

LayoutKind.Sequential not

If you want to write a performant multithreaded application, which actually is an aim of RampUp, you have to deal with padding. The gains can be pretty big, considering that working with threads means giving each of them its own space to work in.

False sharing

False sharing is nothing more than two or more threads trying to use memory that’s mapped to a single cache line. The best case for any thread is to have its own, separated memory space. By separation I mean having enough padding on the right and on the left to keep the spaces of two threads from overlapping. The easiest way is to add an additional 64 bytes (the size of a cache line on most current x86 processors) at the beginning and at the end of the struct/class to ensure that no other thread will be able to allocate memory close enough. This mechanism is called padding.

Padding

The easiest way to apply padding is to use StructLayoutAttribute. If LayoutKind.Sequential is used, then adding 4 Guid fields at the beginning and 4 Guid fields at the end should work just fine. The size of Guid is 16 bytes, so four of them give us the needed 64 bytes. A harder way of doing it is to use LayoutKind.Explicit, as it requires adding FieldOffsetAttribute to every field of the structure/class, explicitly stating its offset in memory. With this approach, it’s easy to start at offset 64 and leave some space at the end of the class.
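
A minimal sketch of the sequential variant, assuming a struct with a single `long` payload. `PaddedCounter` and `PaddedCounterLayout` are hypothetical names, not part of RampUp. The helper uses `Marshal.OffsetOf` and `Marshal.SizeOf`, which report the marshaled layout; for a struct made only of blittable fields like this one, that should match what the runtime uses.

```csharp
using System;
using System.Runtime.InteropServices;

#pragma warning disable CS0169 // the padding fields are intentionally never used

[StructLayout(LayoutKind.Sequential)]
public struct PaddedCounter
{
    // 4 x 16 bytes = 64 bytes of padding in front of the payload.
    private Guid _front0, _front1, _front2, _front3;

    // The only field the owning thread actually works with.
    public long Value;

    // 4 x 16 bytes = 64 bytes of padding behind the payload.
    private Guid _back0, _back1, _back2, _back3;
}

#pragma warning restore CS0169

// Hypothetical helper for inspecting the resulting layout.
public static class PaddedCounterLayout
{
    // Offset of the payload from the start of the struct, in bytes.
    public static int ValueOffset()
    {
        return Marshal.OffsetOf<PaddedCounter>("Value").ToInt32();
    }

    // Total size of the struct, in bytes: 64 + 8 + 64.
    public static int TotalSize()
    {
        return Marshal.SizeOf<PaddedCounter>();
    }
}
```

The payload sits at offset 64 and the whole struct takes 136 bytes, so 128 bytes per counter are spent purely on keeping neighbours away.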

Problem

LayoutKind.Sequential works perfectly. Almost. Unfortunately, if any field has a type that is not itself Sequential or Explicit, the CLR will simply ignore the sequential requirement and silently apply automatic layout, ruining the padding. This is the regular case, as all classes use Auto by default. It leaves the developer with the need to apply the field offsets manually.
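
To give an idea of what applying the offsets manually involves, here is a sketch of a class holding a reference-type field next to its payload. `PaddedSlot` and its fields are hypothetical; the offsets assume a 64-byte cache line and are counted from the start of the object’s field area, not from the object header.

```csharp
using System;
using System.Runtime.InteropServices;

#pragma warning disable CS0169 // the padding field is intentionally never used

[StructLayout(LayoutKind.Explicit)]
public sealed class PaddedSlot
{
    // Offsets 0..63 are deliberately left empty: the leading padding.

    // Reference field, placed on a pointer-aligned offset.
    [FieldOffset(64)] public object Owner;

    // The payload, right after the reference (8 bytes are reserved for it).
    [FieldOffset(72)] public long Sequence;

    // Trailing padding: the payload ends at offset 80, and a dummy field
    // ending at offset 144 keeps 64 more bytes inside this object.
    [FieldOffset(136)] private long _tail;

    public PaddedSlot(object owner, long sequence)
    {
        Owner = owner;
        Sequence = sequence;
    }
}

#pragma warning restore CS0169
```

Every new field means recalculating the offsets that follow it by hand, which is exactly the kind of bookkeeping that is easy to get wrong.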

Solution

As I need this padding behavior for RampUp, I’m creating a small Fody weaver plugin called Padded, which will automatically calculate offsets (possibly with some memory overhead) for any class/struct marked with a proper attribute. Hopefully, it will be useful not only for RampUp but for other performance-oriented projects as well.

Using Fody to provide common parts for structs

The RampUp library is meant to provide a low-latency, low/no-alloc environment for building fast systems in .NET. As it’s based on messaging in an actor-like/SEDA fashion, messages are first-class citizens in its environment. Because of these requirements, unlike in other frameworks/systems, they’ve been built on structs. Yes, good old-fashioned value types that have no virtual method tables and no object overhead. They’re just pure data. But even in the world of pure data you sometimes need a common denominator that provides some basic information. Let me share my RampUp approach to this problem.

Envelope

In the case of RampUp and its messages, the part that should be attachable to every message is an envelope. You probably want to know the sender of the message and maybe a few more facts. We can’t use inheritance, as structure types cannot derive from one another. So how can this be done? How can we introduce at least one common field in all the messages? Having one field of type Envelope would be sufficient, as we could use this field to store all the needed information.

Fody

There’s a tool created by Simon Cropp called Fody. It’s an AOP tool, a weaver, a post compiler. With it you can create ModuleWeavers that are reusable (there are a lot of them) and/or applied only in the solution in which they were created. Using this tool I’ve been able to deliver a weaver that scans for messages in a project and adds a specific envelope field. For each message, metadata is created describing the offset to the Envelope field. Additionally, on the basis of the metadata, a message reader and a writer are emitted, so that the final user of RampUp does not need to access this field manually.
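
To make the result of the weaving more tangible, here is a sketch of how a message might look afterwards and how its metadata could be described. Everything in it is an assumption made for illustration: `DemoEnvelope`, `Tick`, `MessageMetadata` and `MetadataSketch` are hypothetical names, and the real weaver works on IL at build time rather than calling `Marshal` at run time.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical envelope; the fields of the real RampUp Envelope are not shown here.
public struct DemoEnvelope
{
    public int Sender;
}

// A message as it might look after weaving.
public struct Tick
{
    public long Time;             // written by the developer
    public DemoEnvelope Envelope; // imagined to be added by the weaver
}

// Hypothetical per-message metadata: the size and the envelope offset.
public struct MessageMetadata
{
    public int Size;
    public int EnvelopeOffset;
}

public static class MetadataSketch
{
    // Describes a message type that has an envelope field with the given name.
    public static MessageMetadata Describe<TMessage>(string envelopeField)
        where TMessage : struct
    {
        return new MessageMetadata
        {
            Size = Marshal.SizeOf<TMessage>(),
            EnvelopeOffset = Marshal.OffsetOf<TMessage>(envelopeField).ToInt32()
        };
    }
}
```

For `Tick`, `MetadataSketch.Describe<Tick>("Envelope")` reports a size of 16 bytes and an envelope offset of 8, which is enough for a generic reader or writer to find the envelope without knowing the message type.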

Summary

Using a post compiler is often seen as overkill. On the other hand, introducing a common denominator for a set of value types is impossible without either manual copy-paste techniques or weaving it in during a post compilation process. I prefer the latter.

OpCodes.Ldtoken

You’ve probably used the typeof operator a few times. It’s quite funny that an operator like this actually has no single MSIL counterpart. Using it emits TWO OpCodes.

The first emitted opcode is OpCodes.Ldtoken with the type. It consumes the token of the type, pushing the RuntimeTypeHandle structure onto the stack as the result of its operation. The second emitted opcode is a call to Type.GetTypeFromHandle(RuntimeTypeHandle), which consumes the structure pushed by the previous opcode and returns the runtime type. The interesting thing is that you can’t use just OpCodes.Ldtoken from C#. You need to load the runtime type first and only then can you access the handle through its TypeHandle property. You can, however, emit IL with just OpCodes.Ldtoken to remove the overhead of calling a method, and use the structure as a key for a lookup. It should be a bit faster.
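
A small sketch of both routes, using a DynamicMethod to emit the bare OpCodes.Ldtoken. `LdtokenSketch` and its members are hypothetical names and this is not the RampUp writer code, just the same idea in isolation.

```csharp
using System;
using System.Reflection.Emit;

public static class LdtokenSketch
{
    // The C# route: typeof (ldtoken + GetTypeFromHandle), then the TypeHandle property.
    public static RuntimeTypeHandle ViaTypeof<T>()
    {
        return typeof(T).TypeHandle;
    }

    // The emitted route: a method whose whole body is "ldtoken <type>; ret".
    public static Func<RuntimeTypeHandle> EmitLdtoken(Type type)
    {
        var method = new DynamicMethod(
            "GetHandle", typeof(RuntimeTypeHandle), Type.EmptyTypes);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldtoken, type); // pushes the RuntimeTypeHandle
        il.Emit(OpCodes.Ret);           // returns it without touching System.Type

        return (Func<RuntimeTypeHandle>)method.CreateDelegate(
            typeof(Func<RuntimeTypeHandle>));
    }
}
```

Both routes yield the same handle, so `LdtokenSketch.EmitLdtoken(typeof(string))()` equals `LdtokenSketch.ViaTypeof<string>()`, and `Type.GetTypeFromHandle` turns either of them back into `typeof(string)`.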

You can see the example of emitting this in RampUp code of the message writer.

Replacing a generic dictionary

There is a moment when you profile your high-throughput system and you hit a wall. And it’s not your code but some BCL elements. That’s what happened in RampUp when I was profiling the Write part of the buffer.

The writer is emitted, but as the very foundation it uses a dictionary of metadata stored per message type. The metadata are simple:

  • the message size
  • the offset of the message envelope

Before the optimization it was using the generic Dictionary keyed by the message type, with the message metadata as the value. It was horribly slow. As the set of messages does not change, you can easily provide the set of message types up-front. For each message type one can obtain a RuntimeTypeHandle, which can easily be converted to a long. With a set of longs, you can select the minimal one and subtract it from all the values. As long as the handles lie close enough to each other, this reduces the values to ints. So here you are. You have a way of turning a type into an int, and ints are much easier to compare & to handle. One can even use a simple hash map just to map between ints and metadata. This was the way to reduce the overhead of obtaining metadata with IntLookup<TValue>. After applying the change, write performance increased by 10%.
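
The idea can be sketched as follows. This is not the actual IntLookup<TValue> from RampUp but a hypothetical `HandleLookup<TValue>` written for illustration, assuming that all registered handles lie within the int range of the smallest one (the constructor throws an OverflowException otherwise) and using a plain open-addressing table with linear probing.

```csharp
using System;
using System.Collections.Generic;

public sealed class HandleLookup<TValue>
{
    private readonly long _min;      // the smallest handle value, subtracted from all keys
    private readonly int _mask;      // capacity - 1, capacity being a power of two
    private readonly int[] _keys;
    private readonly bool[] _used;
    private readonly TValue[] _values;

    public HandleLookup(Dictionary<Type, TValue> source)
    {
        if (source.Count == 0)
            throw new ArgumentException("At least one type is required.", "source");

        // Select the minimal handle value among the known types.
        _min = long.MaxValue;
        foreach (Type type in source.Keys)
            _min = Math.Min(_min, type.TypeHandle.Value.ToInt64());

        // Keep the table at most half full so that probing always terminates.
        int capacity = 4;
        while (capacity < source.Count * 2) capacity *= 2;
        _mask = capacity - 1;
        _keys = new int[capacity];
        _used = new bool[capacity];
        _values = new TValue[capacity];

        foreach (KeyValuePair<Type, TValue> pair in source)
        {
            // Subtracting the minimum is what turns the long into an int.
            int key = checked((int)(pair.Key.TypeHandle.Value.ToInt64() - _min));
            int slot = SlotOf(key);
            while (_used[slot]) slot = (slot + 1) & _mask;
            _keys[slot] = key;
            _used[slot] = true;
            _values[slot] = pair.Value;
        }
    }

    // Handles are pointers, so the lowest bits carry little information.
    private int SlotOf(int key)
    {
        return (key >> 3) & _mask;
    }

    public bool TryGet(RuntimeTypeHandle handle, out TValue value)
    {
        long offset = handle.Value.ToInt64() - _min;
        if (offset >= 0 && offset <= int.MaxValue)
        {
            int key = (int)offset;
            for (int slot = SlotOf(key); _used[slot]; slot = (slot + 1) & _mask)
            {
                if (_keys[slot] == key)
                {
                    value = _values[slot];
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

A lookup is then a subtraction, a shift and usually a single array probe, for example `lookup.TryGet(typeof(long).TypeHandle, out size)` for a lookup built from a dictionary of types and their sizes.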