All the optimizations that you can’t use

It’s true. This blog post title is a bit sad. Still, the post tries to show some cases that you need to take into consideration when writing or designing a serializer. The article, as usual, assumes a .NET environment.

Dynamic for everybody!

The first obstacle is data passed as object. Don’t get me wrong, the top level of an API call can be an object. What I mean by dynamic is having fields of the object type inside other objects.


class MyData
{
    public object MoreData { get; set; }
}

This can happen when a serializer is a general-purpose one, and it’s totally fine to expect such a serializer to work on data like this. It is a bit limiting when it comes to optimizations, though.

You cannot easily serialize the data directly or inline the method for the field type, as you need to check the runtime type and dispatch dynamically. Again, for a general-purpose serializer it’s a good property to have. After all, in JSON everything is an object, right?
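To make the cost concrete, here is a minimal sketch of the two cases. The type tags, the TypedData class and the Sketch helper are made up for illustration and do not come from any particular serializer.

```csharp
using System;
using System.IO;

// Hypothetical strongly typed counterpart of MyData.
sealed class TypedData
{
    public int Count { get; set; }
}

sealed class MyData
{
    public object MoreData { get; set; }
}

static class Sketch
{
    // The field type is known at compile time: one direct write, easy to inline.
    public static void Write(BinaryWriter writer, TypedData data)
    {
        writer.Write(data.Count);
    }

    // The field is object: the runtime type must be inspected on every call.
    public static void Write(BinaryWriter writer, MyData data)
    {
        switch (data.MoreData)
        {
            case null:
                writer.Write((byte)0);   // made-up tag: null
                break;
            case int i:
                writer.Write((byte)1);   // made-up tag: Int32
                writer.Write(i);
                break;
            case string s:
                writer.Write((byte)2);   // made-up tag: String
                writer.Write(s);         // length-prefixed by BinaryWriter
                break;
            default:
                throw new NotSupportedException(data.MoreData.GetType().FullName);
        }
    }
}
```

The dynamic version also has to put a type tag on the wire, so the reader knows what follows. The typed version needs neither the branch nor the tag.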

How big is it?

Knowing the size of the serialized payload up front is a real game changer. What can you do with this information? You can allocate the buffer in advance. That’s the easiest one. What else?

You can rent a chunk of memory from a pool. In .NET, we’ve got MemoryPool<byte> now. Its Rent method returns an IMemoryOwner<byte>, which exposes the rented block as Memory<byte>.
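A minimal sketch of renting from the shared pool, assuming the payload is a single long so the size is known up front. PooledExample and WriteInt64 are hypothetical names used only here.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;

static class PooledExample
{
    // Hypothetical helper: serializes one long through rented memory.
    public static void WriteInt64(Stream destination, long value)
    {
        const int size = sizeof(long);   // payload size known up front

        // Rent returns an IMemoryOwner<byte>; disposing it gives the block back to the pool.
        using (IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(size))
        {
            // The pool may hand out more than requested, so slice to what is needed.
            Span<byte> span = owner.Memory.Span.Slice(0, size);
            BinaryPrimitives.WriteInt64LittleEndian(span, value);
            destination.Write(span);
        }
    }
}
```

The rented block is only valid until the owner is disposed, so the bytes are pushed to the destination inside the using block.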

If the size of the needed memory is small enough and we’re on the synchronous path (no async, no thread switching), we can also use Span<byte> with stackalloc.


Span<byte> memory = stackalloc byte[maxEstimatedSize];

That’s how Enzyme, my experimental serializer, works.
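A common way to combine both options is to use the stack for small payloads and fall back to a pool for larger ones. The sketch below is not Enzyme’s code. The StackLimit threshold and the WriteUtf8 helper are assumptions made for illustration, and the upper bound on the size comes from Encoding.UTF8.GetMaxByteCount.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;

static class StackOrPool
{
    // Assumed threshold; how much is safe to put on the stack depends on the application.
    const int StackLimit = 256;

    public static void WriteUtf8(Stream destination, string text)
    {
        // Upper bound on the encoded size, known before any byte is written.
        int maxEstimatedSize = Encoding.UTF8.GetMaxByteCount(text.Length);

        byte[] rented = null;
        Span<byte> buffer = maxEstimatedSize <= StackLimit
            ? stackalloc byte[StackLimit]
            : (rented = ArrayPool<byte>.Shared.Rent(maxEstimatedSize));
        try
        {
            int written = Encoding.UTF8.GetBytes(text.AsSpan(), buffer);
            destination.Write(buffer.Slice(0, written));
        }
        finally
        {
            // Only the pooled path has something to give back.
            if (rented != null) ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

This only works on a synchronous path, because a Span<byte> cannot live across an await.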

Again, if we have some really complex objects and going through the object tree is costly, computing the size up front might be expensive, as it requires an additional pass over the data. If we add a not so strongly typed schema, we also need to check the types of objects and then visit their payload on that basis, which makes it even more costly.

If you know the payload size, you can optimize for memory usage. Unfortunately, for a general-purpose serializer, this might not be the case when a weakly typed value is passed.

General purpose vs system messages

That’s the final question, I presume. What do you design for? Is it a general-purpose serializer, one that should and will accept anything that is thrown at it, or are you designing a message for a specific system protocol? Or maybe there’s some middle ground?

An example of a system-specific message with a rigid protocol is the NServiceBus metrics message https://github.com/Particular/ServiceControl.Monitoring.Data/blob/d1b15192315e041590e7a8d297b7b9b92afbd470/src/ServiceControl.Monitoring.Data/TaggedLongValueWriterV1.cs#L35-L47. Another is Hazelcast, which provides a wrapper around the message payload https://github.com/hazelcast/hazelcast-csharp-client/blob/c7546896caa6061c7f10500b190a51354551b47d/Hazelcast.Net/Hazelcast.Client.Protocol/ClientMessage.cs#L38-L54 and so combines bits of general purpose (the payload) and system message (the wrapper).
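To show what a rigid protocol buys you, here is a hypothetical fixed-layout message. It is not the NServiceBus or Hazelcast wire format; the MetricEntry type, its fields and their offsets are invented for this sketch.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical fixed-layout message, invented for illustration.
readonly struct MetricEntry
{
    // The size is a compile-time constant: 8 + 4 + 8 = 20 bytes, always.
    public const int Size = sizeof(long) + sizeof(int) + sizeof(long);

    public readonly long Ticks;
    public readonly int TagId;
    public readonly long Value;

    public MetricEntry(long ticks, int tagId, long value)
    {
        Ticks = ticks;
        TagId = tagId;
        Value = value;
    }

    // No type checks and no size discovery: every field has a fixed offset.
    public void WriteTo(Span<byte> destination)
    {
        BinaryPrimitives.WriteInt64LittleEndian(destination, Ticks);
        BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(8), TagId);
        BinaryPrimitives.WriteInt64LittleEndian(destination.Slice(12), Value);
    }
}
```

Because MetricEntry.Size is constant, a caller can stackalloc or rent exactly that many bytes per entry without inspecting the data first.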

Summary

Designing a serializer requires you to make some decisions and choices. Even if you’re writing a new JSON serializer, you’ll need to choose what kinds of types will be accepted, what kind of memory pooling (if any) can be used, and how specific you can afford to be (general purpose vs system). I hope this post shares some of my experience in the serialization area.

4 thoughts on “All the optimizations that you can’t use”

  1. The Object field may be an array of arrays of whatever. And even if a field of this class had a specific type, the serializer needs to check whether it’s a sealed type; otherwise the value might be of a subclass type.

    My general-purpose (Java) serializer caches information about types. It also saves instructions for deserialization, because it deserializes into a tree structure without having the original types. Everything is sent over a TCP connection. It looks like a debugger 🙂

    • This comment is so right 🙂 I’ve been working through these cases (including subclasses) recently.

      The design is strongly connected to needs. Passing the full graph with all the info is costly, but it allows a lot. Sometimes you need a bit less, and some information can be dropped if the other side does not need to deserialize the value, but only visit it. Again, it’s case-by-case work 🙂
