ThreadStatic vs stackalloc

Posted on April 20, 2017 · 2 mins read · tagged with: #BenchmarkDotNet #performance

TL;DR

I’m currently working on SewingMachine, an OSS project of mine that aims to unleash the ultimate performance for your stateful services written in/for Service Fabric (more posts: here). In this post I’m testing whether it would be beneficial to write a custom unmanaged writer for protobuf-net, instead of using some kind of object pooling with ThreadStatic.

ThreadStatic and stackalloc

ThreadStatic is the old black. It was good to use before async-await was introduced. Now that you don’t know which thread your continuation will run on, it’s not that useful. Still, if you’re on a good old-fashioned synchronous path, it can be used for object pooling, keeping one object per thread. That’s how protobuf-net caches ProtoReader objects.
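A minimal sketch of this kind of per-thread pooling, assuming a synchronous call path. The names PooledBuffer, Rent, Return and ThreadStaticExample are hypothetical and are not taken from protobuf-net or SewingMachine:

```csharp
using System;

public sealed class PooledBuffer
{
    // One cached instance per thread. A [ThreadStatic] field must be static,
    // and every thread starts with its own null value.
    [ThreadStatic]
    private static PooledBuffer cached;

    public readonly byte[] Bytes = new byte[64];

    public static PooledBuffer Rent()
    {
        var buffer = cached;
        if (buffer == null)
        {
            return new PooledBuffer();
        }

        // Clear the slot so that a nested Rent on this thread gets a fresh object.
        cached = null;
        return buffer;
    }

    public static void Return(PooledBuffer buffer)
    {
        cached = buffer;
    }
}

public static class ThreadStaticExample
{
    // Writes a 32-bit value into the pooled array and returns the bytes used.
    public static int Write(int value)
    {
        var buffer = PooledBuffer.Rent();
        try
        {
            var bytes = buffer.Bytes;
            bytes[0] = (byte)value;
            bytes[1] = (byte)(value >> 8);
            bytes[2] = (byte)(value >> 16);
            bytes[3] = (byte)(value >> 24);
            // A real writer would hand `bytes` over to the storage here.
            return 4;
        }
        finally
        {
            // Give the object back even if writing throws.
            PooledBuffer.Return(buffer);
        }
    }
}
```

The try/finally is what keeps the pool usable after an exception, and it is part of the cost that any comparison with other approaches has to account for.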

One could use it to cache a chunk of memory for serialization locally. This could be a managed or an unmanaged chunk, but eventually it would be used to pass data to some storage (in my case, SewingSession from SewingMachine). If the interface accepted unmanaged chunks, I could also use stackalloc for small objects whose serialized size I know up front. stackalloc allocates a given number of bytes on the current stack frame, so the memory is gone as soon as the method returns. Yes, it’s unsafe, so keep your belts fastened.
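A minimal sketch of the stackalloc variant, assuming the project is compiled with unsafe code allowed. The Store method is a hypothetical stand-in for a storage call that accepts an unmanaged chunk; it is not the actual SewingSession API:

```csharp
using System;

public static unsafe class StackallocExample
{
    // Hypothetical stand-in for a storage call taking a pointer and a length.
    // It only sums the bytes so that the result can be observed.
    private static int Store(byte* data, int length)
    {
        int sum = 0;
        for (int i = 0; i < length; i++)
        {
            sum += data[i];
        }
        return sum;
    }

    public static int Write(int value)
    {
        const int Size = sizeof(int);

        // The buffer lives on the current stack frame: no heap allocation,
        // nothing to return to a pool, and it must not outlive this method.
        byte* buffer = stackalloc byte[Size];
        *(int*)buffer = value;

        return Store(buffer, Size);
    }
}
```

Because the size is a compile-time constant and small, there is little risk of exhausting the stack; for sizes that depend on input, a pooled or heap buffer is the safer choice.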

ThreadStatic vs stackalloc

I gave it a try and wrote a simple test (if it’s too naive, I encourage you to share your thoughts in the comments) that writes either to an array held by a ThreadStatic-pooled object or to a stackalloc’ed buffer. You can find it in this gist.

How to test it? As always, to the rescue comes BenchmarkDotNet, the best benchmarking tool for any .NET dev. Let’s take a look at the summary now.

[Figure: local_vs_threadstatic.png, the BenchmarkDotNet summary]

Stackalloc wins.
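For anyone who wants to reproduce a comparison like this, a BenchmarkDotNet harness could be sketched as below. This is not the code from the gist: the class name, the 16-byte payload and the trivial write loop are assumptions made for illustration, and the sketch needs the BenchmarkDotNet NuGet package plus unsafe code allowed.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class ThreadStaticVsStackalloc
{
    // Assumed payload size, small enough to be safe on the stack.
    private const int Size = 16;

    [ThreadStatic]
    private static byte[] pooled;

    [Benchmark(Baseline = true)]
    public int ThreadStaticArray()
    {
        // Lazily create the per-thread array, then reuse it on every call.
        var bytes = pooled ?? (pooled = new byte[Size]);
        for (int i = 0; i < Size; i++)
        {
            bytes[i] = (byte)i;
        }
        return bytes[Size - 1];
    }

    [Benchmark]
    public unsafe int Stackalloc()
    {
        // A fresh stack buffer on every call, released when the method returns.
        byte* bytes = stackalloc byte[Size];
        for (int i = 0; i < Size; i++)
        {
            bytes[i] = (byte)i;
        }
        return bytes[Size - 1];
    }
}

public static class Program
{
    public static void Main()
    {
        BenchmarkRunner.Run<ThreadStaticVsStackalloc>();
    }
}
```

Both methods return a value so that the JIT cannot discard the writes. The sketch leaves out renting and returning a pooled object in a try/finally, so it says nothing about that part of the cost.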

There are several things that should be taken into consideration: the finally block, the real overhead of writing an object, and so on. Still, it looks like for heavily optimized code and small objects, one could use this to write them a bit faster.

Summary

Using stackallocated buffers is fun and can bring some performance benefits. If I find anything unusual or worth noticing with this approach, I’ll share my findings. As always, when working on performance, measure first, measure in the middle and at the end.