.NET 5 is around the corner and it will bring a lot of goodness to the land of performance. One of the interesting aspects that I visited when working on Async Expert online course is the aspect of pooling boxes hidden behind ValueTask
returning operations. The reasoning behind it can be found in the awesome article Async ValueTask Pooling in .NET 5 authored by Stephen Toub. This post focuses on diving deep in the codebase and seeing how it’s implemented behind it.
To enable pooling, the DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS
environment variable needs to be set either to true
or 1
. Then the state machine box objects that implement the IValueTaskSource
and IValueTaskSource<TResult>
interfaces will be pooled and reused for async methods with ValueTask
or async ValueTask<TResult>
.
Another setting that you can use to tweak the behavior is DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKSLIMIT
that sets the size of the pool. By default it’s equal to Environment.ProcessorCount * 4
.
Both can be found in AsyncTaskCache class.
Setting the flag mentioned above is responsible for setting the field AsyncTaskCache.s_valueTaskPoolingEnabled
. This is the value that is used to determine whether the pooling should be used on not. You can navigate through AsyncValueTaskMethodBuilderT.cs and AsyncValueTaskMethodBuilder.cs to see all the occurrences. They are all quite similar and follow this pattern:
public void AwaitOnCompleted<TAwaiter, TStateMachine>(ref TAwaiter awaiter,
ref TStateMachine stateMachine)
where TAwaiter : INotifyCompletion
where TStateMachine : IAsyncStateMachine
{
if (AsyncTaskCache.s_valueTaskPoolingEnabled)
{
// the value task pool way
AwaitOnCompleted(ref awaiter, ref stateMachine,
ref Unsafe.As<object?, StateMachineBox?>(ref m_task));
}
else
{
// the regular task way
AsyncTaskMethodBuilder<TResult>.AwaitOnCompleted(ref awaiter, ref stateMachine,
ref Unsafe.As<object?, Task<TResult>?>(ref m_task));
}
}
internal static void AwaitOnCompleted<TAwaiter, TStateMachine>(
ref TAwaiter awaiter, ref TStateMachine stateMachine, ref StateMachineBox? box)
where TAwaiter : INotifyCompletion
where TStateMachine : IAsyncStateMachine
{
try
{
// GetStateMachineBox - it looks interesting :)
awaiter.OnCompleted(GetStateMachineBox(ref stateMachine, ref box).MoveNextAction);
}
catch (Exception e)
{
System.Threading.Tasks.Task.ThrowAsync(e, targetContext: null);
}
}
Now we’re really close to see how pool works. Can you notice the GetStateMachineBox
call? This is the place where pooling happens.
Interested in async await, concurrency and you want to expand your knowledge?
Check out Async Expert! Together with Dotnetos I prepared something special for you!
The method GetStateMachineBox is 70 lines long and it’s pretty well documented. It tries to obtain the object responsible for keeping the state a.k.a box
. If it fails, it means that the box needs to be obtained or created as it’s shown in its last few lines.
var box = StateMachineBox<TStateMachine>.GetOrCreateBox();
boxFieldRef = box;
box.StateMachine = stateMachine;
box.Context = currentContext;
What’s behind the GetOrCreateBox
and can we finally move to the pooling behavior?
The implementation of GetOrCreateBox
is simple and uses fast primitives (read it as: no lock
) to either obtain the existing box or fallback to the allocation path. Let’s take a look at the code with comments that I augmented to explain some primitives used below.
internal static StateMachineBox<TStateMachine> GetOrCreateBox()
{
// Try to acquire the lock to access the cache. If there's any contention, don't use the cache.
// This operation uses an atomic compare and exchange.
// The operation replaces s_cacheLock with 1 only and only if the previous value was 0.
// It returns the previous value, so it can be checked whether the "fast lock" was taken.
if (Interlocked.CompareExchange(ref s_cacheLock, 1, 0) == 0)
{
// The exchange succeeded. We can check the cache.
// If there are any instances cached, take one from the cache stack and use it.
StateMachineBox<TStateMachine>? box = s_cache;
// check if there's something in the cache
if (!(box is null))
{
// The cache is a simple list. Get item and move it to the _next.
s_cache = box._next;
box._next = null;
s_cacheSize--;
Debug.Assert(s_cacheSize >= 0, "Expected the cache size to be non-negative.");
// Release the lock. It uses Volatile.Write that will be eventually seen by other threads.
// We can do it, as we know that we are holding the lock.
// The other threads will finally see the s_cacheLock == 0.
// One of them will be able to CompareExchange it with 1.
// Volatile.Write ensures happened-before semantics. If another thread observes s_cacheLock == 0,
// it also will observe a proper s_cache value.
Volatile.Write(ref s_cacheLock, 0);
return box;
}
// No objects were cached. We'll just create a new instance.
Debug.Assert(s_cacheSize == 0, "Expected cache size to be 0.");
// There's no box, but again, we need to release the lock here. See comments above
Volatile.Write(ref s_cacheLock, 0);
}
// Couldn't quickly get a cached instance, so create a new instance.
return new StateMachineBox<TStateMachine>();
}
I hope this posts helps you in diving a bit deeper in what’s hidden behind pooling/caching for ValueTask
in .NET5. As you can see, the behavior is optional and enabling it should be followed by measurements and benchmarking proving, that in your case it does help and reduces allocations greatly.