Local translation table

It’s quite common to use GUIDs as unique identifiers across systems, as they are… unique :) The GUID generation algorithm has a very important property: it’s local. Your code doesn’t have to contact any service or database. It just works. There’s one problem with GUIDs though: they are pretty lengthy, taking 16 bytes to store.
Consider the following example: you want to provide a descriptor for a finite set of distinguishable items, like events in event sourcing. You can:

  • use an event type: a string description of the event. That’s the idea behind event storage within EventStore.
  • use GUIDs and generate them on developers’ machines. They will be unique, but still lengthy when it comes to storing them.
  • assign integers, but then you need to divide the integer ranges between modules and be very careful not to step into another module’s area.

There’s one additional thing you can do: combine the second and third options. Use GUIDs on contracts, providing uniqueness, but when it comes to storage provide a translation table, persistent and ensured to exist during start-up, mapping GUIDs to ints (4 bytes) or even shorts (2 bytes). You can easily create this kind of table locally, one for each module/service, embracing all the contracts used by that module. This lowers the storage cost and still lets you use the nice properties of GUIDs.
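To make it concrete, here’s a minimal sketch of such a table, assuming a simple key-value persistence abstraction; all the names (ContractTypeTable, IStore) are mine, made up for illustration:

    using System;
    using System.Collections.Generic;

    // The persistence abstraction is an assumption for this sketch.
    public interface IStore
    {
        IEnumerable<KeyValuePair<Guid, short>> LoadAll();
        void Save(Guid guid, short id);
    }

    public class ContractTypeTable
    {
        private readonly Dictionary<Guid, short> _guidToId = new Dictionary<Guid, short>();
        private readonly Dictionary<short, Guid> _idToGuid = new Dictionary<short, Guid>();

        // Called during start-up: ensures every contract used by this module
        // has a stable, persisted short identifier.
        public void EnsureRegistered(IEnumerable<Guid> contractIds, IStore store)
        {
            foreach (var pair in store.LoadAll())        // previously persisted mappings
            {
                _guidToId[pair.Key] = pair.Value;
                _idToGuid[pair.Value] = pair.Key;
            }

            foreach (var guid in contractIds)
            {
                if (_guidToId.ContainsKey(guid))
                    continue;
                // ids are assigned densely, so Count + 1 is the next free one
                var next = (short)(_guidToId.Count + 1);
                _guidToId[guid] = next;
                _idToGuid[next] = guid;
                store.Save(guid, next);                  // persist, so restarts keep ids stable
            }
        }

        public short ToLocal(Guid guid) { return _guidToId[guid]; }  // 2 bytes instead of 16
        public Guid ToGlobal(short id)  { return _idToGuid[id]; }
    }

On the wire and in contracts you still use the GUID; the short id never leaves the module.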

Simple, easy and powerful.

Processes and The Cone of Uncertainty

The Cone of Uncertainty is a management term describing the fact that we cannot foresee the future of a project with constant probability. The longer the period we want to plan for, the less exact the plan is going to be. That’s the reasoning behind sprints in Agile: keeping them short enough to stay in the narrow part of the cone. But it isn’t only about planning! The shape of the cone can be modified by some good practices and by reducing the manual labor. By modified I mean greatly narrowed.
There are plenty of processes and elements that can be introduced:

  • continuous builds
  • proper db deployments
  • tests
  • continuous deployments
  • promotion of application versions between environments

Each of them improves some part of the development process, making it less manual and more repeatable. Additionally, as you introduce tooling for the majority of these cases, the tools run at a similar speed every time, so you greatly lower the uncertainty of those aspects of development. Once those aspects are constant, only the real development affects the cone, and your team, along with your manager, gets what you wanted: a more predictable process and a smaller cone of uncertainty.

I think therefore I haven’t written a test

It seems to me that verifying one’s thesis is one of the most important aspects of a software engineer’s work. Quite often phrases starting with ‘it seems to me…’ or ‘I think that…’ are said in a way that makes the listener take the truthfulness of the sentence for granted. Meanwhile, it hasn’t been proven or covered by a replicating test. I see it as part of some kind of social contract that ‘if I say maybe and you partially confirm this, then it’s ok’. It isn’t.

I’m not a TDD bigot; I use TDD, or tests by themselves, as a tool. But when a bisection is needed, when I have to hunt down a bug or simply verify my thesis, a test provides a verifiable and repeatable way of answering a question with yes/no, without all this hesitation. Just take a look at the protobuf-net test cases named after StackOverflow questions.
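For illustration, here’s what such a repro test might look like with protobuf-net and NUnit; the question number and the Order type are made up:

    using System.IO;
    using NUnit.Framework;
    using ProtoBuf;

    [TestFixture]
    public class SO12345678 // named after a (made-up) StackOverflow question
    {
        [ProtoContract]
        public class Order
        {
            [ProtoMember(1)] public int Id { get; set; }
            [ProtoMember(2)] public string Name { get; set; }
        }

        [Test]
        public void Roundtrip_preserves_values()
        {
            var before = new Order { Id = 1, Name = "test" };
            using (var ms = new MemoryStream())
            {
                Serializer.Serialize(ms, before);
                ms.Position = 0;
                var after = Serializer.Deserialize<Order>(ms);

                // a truly binary answer: it either passes or it doesn't
                Assert.AreEqual(before.Id, after.Id);
                Assert.AreEqual(before.Name, after.Name);
            }
        }
    }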

I’m wishing you and myself truly binary answers with a high test coverage.

From domain to code

Currently I’m helping to model a domain with Event Sourcing. A few days ago we spent ~15 minutes modelling some cases on a whiteboard. The result of the first phase was distilling a few aggregates with their events. Later on, we described some processes, as we needed to react to certain events. At first, too much behavior was put in a process manager; it finally got moved to a separate aggregate – the process itself. The process manager was left as a simple router. Because of the strong foundations (a library) providing us with at-least-once delivery semantics, idempotent receivers and process manager handling, the whole discussion was held on a much higher level. No imperative programming, just declarative pieces of the domain itself!
A few hours later the implementation was finished. The most important thing was that there was no mapping! The code was a simple representation of the model that had been discussed and agreed on. No ORM magic, no enterprise onion with tons of layers and mappings. It was just a simple model of this piece of the domain. And it’s been working like a charm.
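To give a feel for what ‘no mapping’ means, here’s a sketch of the shape such code might take; the events and types are mine, not the actual project’s:

    using System;
    using System.Collections.Generic;

    // Illustrative events of a made-up ordering domain.
    public class OrderPlaced     { public Guid OrderId; }
    public class PaymentReceived { public Guid OrderId; }

    // The process itself, modelled as an aggregate holding the behavior.
    public class OrderProcess
    {
        private bool _paid;
        public readonly List<object> Emitted = new List<object>();

        public void When(OrderPlaced e) { /* start awaiting the payment */ }

        public void When(PaymentReceived e)
        {
            if (_paid) return;  // idempotent receiver: a redelivery changes nothing
            _paid = true;
            Emitted.Add("ShipOrder: " + e.OrderId); // declare the next step
        }
    }

    // The process manager left as a simple router: no behavior, just dispatch.
    public class OrderProcessManager
    {
        private readonly Dictionary<Guid, OrderProcess> _processes =
            new Dictionary<Guid, OrderProcess>();

        public void Handle(OrderPlaced e)     { Get(e.OrderId).When(e); }
        public void Handle(PaymentReceived e) { Get(e.OrderId).When(e); }

        private OrderProcess Get(Guid id)
        {
            OrderProcess process;
            if (!_processes.TryGetValue(id, out process))
                _processes[id] = process = new OrderProcess();
            return process;
        }
    }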

Values, principles, then processes

Pawel Brodzinski’s presentation published on InfoQ here is a must-see. Whether you’re into Kanban or not, it shows a few simple cases and discusses the three important aspects of a company:

  1. values
  2. principles
  3. processes

and the interactions between them. Additionally, some problems with visualization are shown. One can show you all of their processes, but is it possible to steal this know-how?

Zone, where are you?

I love working in the zone. I find this flow extremely productive and exhausting at the same time, but if someone asks whether it’s worth going in there, my answer is always yes.
Over the years I’ve learned a few things about myself and how I respond to the different measures I use to get into it.

Music
The very first thing I use to help me get into the zone is a specific kind of music that I don’t listen to outside of work. The genre isn’t important at all. What is important is to tell myself: ‘this is the moment I want to work hard and be extremely productive’. Wearing headphones always helps, as it separates you from the background noise, but it’s the music that gets me there.

Coffee
A cup of black coffee. No sugar. If it were my fourth one, I’d go with tea.

Standard tooling
I need to have all my favorite tools on board. That means VisualStudio, Resharper and an SSD-based machine with a proper CPU and a few gigs of RAM.

And here we go… :-)

Do we really need all these data transformations?

Applications have layers. It’s still pretty common to see an enterprise application being built with layers like DAL, Business Logic (or Domain), Services, etc. Let’s not discuss this abomination itself. Let us rather consider the flow of the data within the application.

SELECT * FROM
That’s where the data are stored. Let us consider a good old-fashioned SQL Server. To get the data from the database you may use ADO (oh no!) or any of the newer ORMs, including micro ORMs like Dapper or something similar. What you end up with is probably some kind of object, or a collection of objects. Here’s where you start playing with the data.
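With Dapper, for instance, the retrieval might look like this; the User type and table are made up for illustration:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using Dapper;

    public class User
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class Users
    {
        public static IEnumerable<User> LoadAll(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                // Dapper materializes every row into a fresh object:
                // the first copy of your data has already been made.
                return connection.Query<User>("SELECT Id, Name FROM Users");
            }
        }
    }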

Mappings
It doesn’t matter whether you’re using Automapper or mapping the data on your own. For encapsulation purposes, or to get an immutable version of an object, it’s common to copy its values to a new representation. I know that strings are immutable and will be copied by reference, but you copy them nonetheless.
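A typical hand-written mapping, reusing the User type from the Dapper sketch above (the DTO is illustrative):

    // Every value is copied into a new representation, object by object.
    public class UserDto
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class UserMappings
    {
        public static UserDto ToDto(User user)
        {
            return new UserDto
            {
                Id = user.Id,
                Name = user.Name // the string is shared, but its reference
                                 // is copied into yet another object
            };
        }
    }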

Services
So you’ve got your data mapped to the right model. Now you can return it from your service. Oops, it’s a fancy REST service, so you translate the very same data once again. Now, because it’s a browser asking and you use content negotiation, the data are transformed to JSON.

In onion architectures you can meet even more transformations between layers; mappings from DTOs to DTOs are quite common. The question, not only from the architectural point of view but also from the performance-oriented angle, is the same: what are you doing? Why do you want to spend so much time writing all these mappings? Why do you want to melt the CPU in never-ending mappings? Couldn’t you skip all of them? Why not store JSON in the database, or use a database that supports JSON blobs as a first-class citizen (RavenDB, MongoDB), and simply push the content retrieved from the database right to the output stream?

All the thoughts above have been provoked by the services I’m creating now. Long story short, they store objects serialized with Google Protocol Buffers. When an external system accesses an object, the service just copies the blob, without deserialization, right to the output stream. No deserialization, no allocations, no overhead. Simple and brutally fast.
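The gist of it, sketched below; the IBlobStore abstraction and its method names are my assumptions, not the actual service’s API:

    using System.IO;

    // The store hands back the payload exactly as it was persisted:
    // already serialized with Google Protocol Buffers.
    public interface IBlobStore
    {
        byte[] GetSerialized(string key);
    }

    public static class PassThrough
    {
        public static void WriteTo(IBlobStore store, string key, Stream output)
        {
            var payload = store.GetSerialized(key);

            // no Deserialize, no object graph, no re-Serialize:
            // just bytes from the store straight to the wire
            output.Write(payload, 0, payload.Length);
        }
    }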

Next time you come up with an onion design or layers of transformations, ask yourself whether it’s worth it and whether you can pay the price of doing all these mappings.