Polyglot programming

Have you ever been in a situation where you didn't have enough tooling in your preferred environment? What did you do then? Did you stick to the tools you already knew, or search for something better?
Recently I've been involved in a project which truly pushed me out of my comfort zone. I like .NET, but it doesn't have enough tooling to provide a highly scalable, performant, failure-resistant environment. We moved to Java, Storm, Cassandra, Zookeeper – JVM all over the place. It wasn't easy, but it was interesting and eye-opening. Having so many libraries focused on solving specific concerns, instead of the framework-oriented paradigm of .NET, was very refreshing.
Was it worth it? Yes. Was it good for self-development? Yes. Will I reach for every new library or language? Certainly not. The most important thing I've learned so far is to be adaptable and aware of the tools that are out there. Mistakes were made, that's for sure, but the overall solution is growing in the right direction.
After all, it’s survival of the fittest, isn’t it?

Disruptor with MultiProducer

I hope you're aware of the LMAX tool for fast in-memory processing called the Disruptor. If not, it's a must-see for today's architects. It's nice to see your process eating messages at speeds of around 10 million per second.
One of the problems addressed in the latest release was a fast multi-producer, allowing one to instantiate multiple tasks publishing data for their consumers. I must admit that the simplicity of this robust part is astonishing. How could one handle claiming and publishing one or a few items from the ring buffer? It's easy: claim them in the standard way, using a CAS operation to let other threads know about the claimed value, then publish. But how do you publish this kind of information? Here comes the beauty of this solution (sketched in code below):

  1. allocate an int array with the length of the ring buffer
  2. when items are published, calculate their positions in the ring (sequence % ring.length)
  3. set the values in the helper int array to the sequence numbers (or values derived from them)

This, with the overhead of a single int array, allows:

  1. waiting for a producer by simply checking whether the value in the int array matches the current buffer iteration number
  2. publishing in the same order the items were claimed
  3. publishing with no additional CAS operations
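
Below is a minimal C# sketch of this publication scheme, assuming a power-of-two ring size. All names are illustrative; the real implementation is the Java MultiProducerSequencer linked below, and this sketch only mirrors the idea, not the actual code.

    using System.Threading;

    // A sketch only: mirrors the idea of the multi-producer publication scheme,
    // not the real MultiProducerSequencer. Assumes bufferSize is a power of two.
    public class MultiProducerSketch
    {
        private readonly int[] _available;  // the helper int array, one slot per ring entry
        private readonly int _indexMask;    // bufferSize - 1, used for sequence % ring.length
        private readonly int _indexShift;   // log2(bufferSize), used to get the ring iteration
        private long _cursor = -1;          // last claimed sequence, shared by all producers

        public MultiProducerSketch(int bufferSize)
        {
            _available = new int[bufferSize];
            for (var i = 0; i < bufferSize; i++) _available[i] = -1;
            _indexMask = bufferSize - 1;
            for (var size = bufferSize; size > 1; size >>= 1) _indexShift++;
        }

        // Claiming: the standard CAS loop over the shared cursor.
        public long Claim()
        {
            while (true)
            {
                var current = Volatile.Read(ref _cursor);
                var next = current + 1;
                if (Interlocked.CompareExchange(ref _cursor, next, current) == current)
                    return next;
            }
        }

        // Publishing: no further CAS. Mark the slot with the ring iteration number
        // (sequence / ring.length), so the same slot gets a new value on every lap.
        public void Publish(long sequence)
        {
            var index = (int)(sequence & _indexMask);          // sequence % ring.length
            var flag = (int)((ulong)sequence >> _indexShift);  // which lap of the ring
            Volatile.Write(ref _available[index], flag);
        }

        // Consumers wait by checking whether the slot holds the expected lap number.
        public bool IsAvailable(long sequence)
        {
            var index = (int)(sequence & _indexMask);
            var flag = (int)((ulong)sequence >> _indexShift);
            return Volatile.Read(ref _available[index]) == flag;
        }
    }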

Simple, powerful and fast.
Come, take a look at it: MultiProducerSequencer

Out of order commands

In the previous posts a simple mechanism of storing the information needed for operation idempotence was introduced: a simple hash table, whose state is saved transactionally with the state of the object to which the sent operation was applied. But what about receiving operations out of order? What if the infrastructure (for instance, a messaging system) delivers one operation before another that in reality occurred earlier?

It's time to make things explicit and start naming elements in the DDD manner. For the sake of reference: the object considered the subject of an operation is an aggregate root, and the operation is of course a message. The modeling assumes event sourcing as the storage for aggregates' states.

Assume that the aggregate to which the command is sent has a property called Version, incremented with each event applied to it. Assume further that each command carries a version number which is supposed to be equal to the aggregate's version. If, during dispatch, these two values differ, an exception is thrown and the command does not change the state of the aggregate. It's a simple optimistic concurrency implementation, which allows discarding out-of-order commands sent to an object.
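
A minimal sketch of that dispatch-time check might look as follows; the Aggregate and Command shapes here are hypothetical, kept just large enough to show the version comparison.

    using System;

    // Hypothetical shapes, just enough to show the optimistic concurrency check.
    public abstract class Aggregate
    {
        public int Version { get; private set; }

        // Applying an event is the only way the state changes, and it bumps Version.
        protected void Apply(object @event)
        {
            // ... mutate state based on the event ...
            Version++;
        }
    }

    public abstract class Command
    {
        // The aggregate version the sender observed when issuing the command.
        public int ExpectedVersion { get; set; }
    }

    public static class Dispatcher
    {
        public static void Dispatch(Aggregate aggregate, Command command)
        {
            // An out-of-order (or stale) command is rejected before it touches the state.
            if (command.ExpectedVersion != aggregate.Version)
                throw new InvalidOperationException(string.Format(
                    "Command expected version {0}, but the aggregate is at version {1}.",
                    command.ExpectedVersion, aggregate.Version));

            // ... handle the command; the events it applies will increment Version ...
        }
    }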

To make it more interesting, consider a sharded system, where specific aggregates are stored by different nodes (but each aggregate is stored on exactly one node). An aggregate's events (state changes) have to be propagated across all the nodes/shards in the same idempotent manner as commands are sent to aggregates. It's easy to apply a hashtable on each node, using the very same key of aggregateId plus version, but it would mean storing all the pairs of aggregate identifiers and their versions, which could possibly bring down each of your nodes (or make you use GBs of memory). Can the trivial fact that the version is increased with every event on the aggregate be used for some optimization? You'll see in the next entry.

What am I missing here?

If you're in a startup and have a full-time job at the same time, as I do, this is a post for you.

The initial startup pressure and tempo are huge. Focused on features, you bring more and more of them to life. How often do you load your project, collapse its whole structure and ask questions like:

  • What am I doing now?
  • How does it influence the rest of the system?
  • Is everything I need expressible in the current infrastructure and/or design?
  • Is something I know from other projects missing here?

It might seem that these open questions are unneeded, too silly to ask, but since the last time I asked them, they have become a weekly routine. To show you why, I'll give an example.

I write tests. As you already know, not always unit tests, but… During one of my write-test/run/fix-error cycles I noticed that it was quite hard to get all the information I needed. An assert was failing, and without debugging, just by viewing the logs, I had no idea what might have gone wrong. I reopened the project and asked 'what am I missing here?' After a global review of the whole solution I found the thing. During all the feature-based design I had made a silly mistake: not providing any logging in the application. You know, the _if (log.IsDebugEnabled)_ stuff (take a look at the NHibernate code). It took me no more than 10 minutes to spike it with a console appender, and I reran my tests. Ha! Some components did not log one or two operations, and that was it.
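
For reference, that guarded-logging pattern, sketched here with log4net (the class name is made up; the pattern itself is what NHibernate's code is full of):

    using log4net;

    // OrderProcessor is a made-up example class; the pattern is the point.
    public class OrderProcessor
    {
        private static readonly ILog Log = LogManager.GetLogger(typeof(OrderProcessor));

        public void Process(int orderId)
        {
            // The guard avoids paying for message formatting when DEBUG is off.
            if (Log.IsDebugEnabled)
                Log.Debug("Processing order " + orderId);

            // ... actual processing ...
        }
    }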

It's worth not losing the (overused phrase) big picture and, from time to time, stopping the feature work to ask these silly, ordinary questions.

Idempotence, pt. 2

In the previous post a few operations were considered as to whether they are (or are not) idempotent. For the sake of reference, here they are:

  • Marked as default
  • Money transfer ‘500$’ ordered to ‘x’ account
  • Label ‘leave sth for the future month’ added

If we consider an operation 'idempotent' when it can be applied multiple times in a row, then all the operations overriding previous values of some properties are idempotent. Having some entity marked as default 5 times does not change the fact that it is default. That's for sure. What about provisioning the 'x' account with $500? Can this type of operation be reapplied multiple times? Of course not, because it does not override any property; it changes the state by interacting with the previous one. The same goes for 'labeling', of course, if no compensation is introduced (selecting only unique labels before saving, which would allow reapplying).
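
The difference can be shown with a hypothetical Account type (all names made up):

    using System.Collections.Generic;

    // A hypothetical Account type illustrating the distinction discussed above.
    public class Account
    {
        public bool IsDefault { get; private set; }
        public decimal Balance { get; private set; }
        private readonly HashSet<string> _labels = new HashSet<string>();

        // Idempotent: overrides a property, so applying it twice changes nothing.
        public void MarkAsDefault()
        {
            IsDefault = true;
        }

        // Not idempotent: interacts with the previous state, so applying it
        // twice transfers the money twice.
        public void Provision(decimal amount)
        {
            Balance += amount;
        }

        // Idempotent only thanks to compensation: the set keeps labels unique.
        public void AddLabel(string label)
        {
            _labels.Add(label);
        }
    }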

What if you want your system to be resistant to operations resent multiple times? The simplest solution is to add a unique identifier to each operation and store the identifiers in a lookup (a hashtable). Each time an operation arrives, the lookup is checked to see whether this operation was already processed. If so, it is skipped.

There is one additional condition: the lookup must be transactional with the storage where you save the states. This condition is a simple 'all-or-none' rule for storing the result of an operation together with the fact that this specific operation was already applied. Otherwise, if the lookup were updated first and storing the state after the operation failed, the state change would be lost although the operation would be marked as applied. The same applies to a situation where the lookup is updated at the very end: the operation result is saved, adding the info about the operation to the lookup fails, and the next time the same operation arrives it is applied one more time. Having said that, the lookup must be transactional with the medium where the state is saved.
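
Here is a sketch of that rule over a SQL store. The applier class and the ProcessedOperations table are hypothetical; the point is only that the duplicate check, the state change and the 'mark as processed' insert share one transaction:

    using System;
    using System.Data.SqlClient;

    // Hypothetical applier; the ProcessedOperations table name is made up too.
    public class IdempotentApplier
    {
        private readonly string _connectionString;

        public IdempotentApplier(string connectionString)
        {
            _connectionString = connectionString;
        }

        // Applies the state change and records the operation id in one transaction,
        // so either both are saved or neither is.
        public void Apply(Guid operationId, string updateStateSql)
        {
            using (var connection = new SqlConnection(_connectionString))
            {
                connection.Open();
                using (var tx = connection.BeginTransaction())
                {
                    // 1. Skip if the lookup already contains this operation.
                    using (var check = new SqlCommand(
                        "SELECT COUNT(*) FROM ProcessedOperations WHERE Id = @id",
                        connection, tx))
                    {
                        check.Parameters.AddWithValue("@id", operationId);
                        if ((int)check.ExecuteScalar() > 0)
                            return; // already applied; disposing rolls the transaction back
                    }

                    // 2. Save the result of the operation...
                    using (var update = new SqlCommand(updateStateSql, connection, tx))
                        update.ExecuteNonQuery();

                    // 3. ...together with the fact that it was applied.
                    using (var mark = new SqlCommand(
                        "INSERT INTO ProcessedOperations (Id) VALUES (@id)",
                        connection, tx))
                    {
                        mark.Parameters.AddWithValue("@id", operationId);
                        mark.ExecuteNonQuery();
                    }

                    tx.Commit();
                }
            }
        }
    }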

Composition vs derivation

Assume you're writing some reports in your application. You've just created the third controller covering some kind of reporting, and it seems that all three controllers contain very similar code, modulo the type passed as the entity type to your NHibernate ISession.QueryOver() or another data source. It seems that if the method creating the report base were generic, it could be used in all the cases. You want to extract it, make it clearer and stop repeating yourself. What would you do?

The first option: extract the method in each of the controllers. Now they seem almost the same. A place is needed to which the method can easily be moved. How about a supertype? Let's create a controller, name it in a fashionable way, for instance ReportControllerBase, and move the method in there. Now you can easily remove the methods from the deriving controllers. Yeah! It's reusable; everyone writing a report can derive from ReportControllerBase and use its methods to speed up the task.
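
In code, the derivation route might look roughly like this (the entity and the derived controller are made up, and the query is reduced to the bare minimum):

    using System.Collections.Generic;
    using System.Web.Mvc;
    using NHibernate;

    // A made-up entity, here only so the sample compiles.
    public class Sale { }

    // The derivation route: a common base controller holding the shared method.
    public abstract class ReportControllerBase : Controller
    {
        protected readonly ISession Session;

        protected ReportControllerBase(ISession session)
        {
            Session = session;
        }

        // The extracted, generic report query every derived controller reuses.
        protected IList<T> GetReport<T>() where T : class
        {
            return Session.QueryOver<T>().List();
        }
    }

    // A made-up report controller deriving from the base.
    public class SalesReportController : ReportControllerBase
    {
        public SalesReportController(ISession session) : base(session) { }

        public ActionResult Index()
        {
            return View(GetReport<Sale>());
        }
    }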

The second option starts with exactly the same step: the method must be extracted to see the common code. Once that's done, you notice that the whole method has only a few dependencies which can easily be pushed to parameters, for instance the ISession, the entity type passed to the query, the value used in some complex where clause, etc. You change all the field and property usages to parameters, which allows the method to be static. You create a static method and turn it into an extension method on the session. The refactoring is done; you can easily call this method in all the controllers, simply by invoking the extension on a session.
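
The same sketch, rewritten along the composition route (again with made-up names): the shared code becomes an extension method on ISession, and no controller has to derive from anything special:

    using System.Collections.Generic;
    using System.Web.Mvc;
    using NHibernate;

    // The composition route: the shared code as an extension method on ISession.
    public static class ReportSessionExtensions
    {
        public static IList<T> GetReport<T>(this ISession session) where T : class
        {
            return session.QueryOver<T>().List();
        }
    }

    // Any controller can use it; no base class is spent on the extraction.
    // Sale is the made-up entity from the previous sketch.
    public class SalesReportController : Controller
    {
        private readonly ISession _session;

        public SalesReportController(ISession session)
        {
            _session = session;
        }

        public ActionResult Index()
        {
            return View(_session.GetReport<Sale>());
        }
    }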

What’s the difference and why composition should be preferred
If you use C# or Java you should always be aware of one limitation: you can derive from only one type. Spending this 'once in a lifetime' on a simple functionality extraction is, for me, the wrong way. Using composition, and deriving only when you truly see that one type is another type, is the right way to go. In the next post I'll write about ASP MVC Detached Actions, a simple mechanism you can use to derive your controllers only when it is needed, delegating common actions without derivation.

Simplicity of your architecture

A few days ago I watched a nice presentation published by InfoQ called Simplicity Architect, which made me think one more time about the design decisions I've been making over the last year. Dan North, who was the speaker, reminded me of the major problem every architect, I'd say every designer, has. The problem is complexity lurking in the dark corners of each solution. I can remember times when from-zero-to-working-example took my team more than one month because of the database creation, the design, and other stuff which you can qualify as things your business doesn't understand. During the very last startup, it took me only 2 hours to provide the very basic set of functionalities: viewing and filtering a list of entities, editing them and so on. I did use a few tools (NHibernate, log4net, Unity and my infrastructure part injecting it all into MVC in my way), but it was only 2 hours to have a working example, and it was _simple_ in terms of design. I liked it, and so did the business owner.

Hope you don’t get too complex either.