All the optimizations that you can't use

Posted by scooletz on July 22, 2019 · 4 mins read

It’s true. This blog post title is a bit sad. Still, it tries to show some cases that one needs to take into consideration when writing or designing a serializer. The article, as usual, assumes .NET environment.

Dynamic for everybody

The first obstacle are data passed in a form of object. Don’t get me wrong, the top level of API call can be an object. What I mean by dynamic is having fields of object type inside of other objects.

class MyData
{
    public object MoreData {get; set;}
}

This can happen, when a serializer is a generic purpose serializer and it’s totally fine to expect a serializer to work on data like this. This is a bit limiting when comes to the optimizations though.

You cannot easily serialize data or inline the method for a field type as you need to check the type and dispatch it dynamically. Again, for a general purpose serializer it’s a good property to have. Eventually, in JSON everything is an object, right?

How big is it

Knowing the size of the serialized payload up-front is a really huge game changer. What you can do with this information? You can allocate the buffer before. That’s the easiest one. What else?

You can ask a chunk of memory from a pool. In .NET, we’ve got the MemoryPool now, which can be asked to return Memory{byte}.

If the size of the needed memory is small enough and we’re on the synchronous path (no async, no thread switching), we can also use Span with stackalloc.

Span<byte>
        ; memory = stackalloc byte[maxEstimatedSize];

That’s how Enzyme, my experimental serializer works.

Again, if we have some really complex objects and going through the object tree is costly, it might be hard to go through this. If we add a not so strongly typed schema, and the need of checking types of objects and then visiting their payload on this basis. This is getting even more costly.

If you know the payload size, you can optimize for the memory usage. Unfortunately, for general purpose serializer, this might not be the case when a weakly-typed value is passed.

General purpose vs system messages

That’s the final question I presume. What do you design for? Is it a general purpose serializer, the one that should and will accept anything that it thrown into it, or are you designing a system protocol specific message? Or maybe there’s some middle ground?

An example of a system specific message, with a rigid protocol could be:

  1. NServiceBus metrics message
  2. Hazelcast with a wrapper around message payload which combines bits of general purpose (payload) and system message (wrapper)

Summary

Designing a serializer requires you to make some decisions and choices. Even if you’re writing a new JSON serializer, you’ll need to choose what kind of types will be accepted, what kind of memory pooling (if any) can be used and how deep can we go to be specific (general purpose vs system). I hope this post provides you some of my experience in the serialization area.


Comments

The Object field may be an array of arrays of whatever. And, even if field of this class was a specific type, serializer needs to check whether it's a sealed type, otherwise it might be subclass type.

My general purpose (Java) serializer caches information about types. Also, it saves instruction for deserialization, because it deserializes into a tree structure without having original types. Everything is sent over a TCP connection. It looks like a debugger :)

by Namek at 2019-07-22 08:21:39 +0000

This comment is so right :-) I've been working through this cases (including subclasses) recently.

The design is strongly connected to needs. Passing the full graph with all the info is costly, but it allows a lot. Sometimes, you need a bit less, and some information can be dropped, if the other side does not need to deserialize value, but only visit it. Again, it's case by case work :-)

by Szymon Kulec 'Scooletz' at 2019-07-30 20:26:40 +0000

What you mean by visiting a value without deserializing it?

by Namek at 2019-07-30 23:04:46 +0000

I meant a visitor walking over the object, never deerializing object as a whole.

by Szymon Kulec 'Scooletz' at 2019-08-27 16:43:43 +0000

My tools



[PL] Master of Aggregates
[PL] Master of Aggregates
[ENG] My free e-book
[ENG] My free e-book