The batch is dead, long live the smart batch

It lurks in the night. It consumes all the energy. It lasts much too long. If you have experienced it, you know it’s unforgettable. If you’re lucky and never met it, you’ve probably heard the stories from your friends. Yes, I’m talking about the batch job.

The old batch

This ultimate tool of terror has been haunting us for much too long. Statements like “let’s wait till tomorrow” or “I think that the job didn’t run” were quite popular a few years back. Fortunately for us, with the new waves of reactive programming, serverless and good old-fashioned queues, it’s becoming a thing of the past. We’re in the happy position of being able to process events, messages and items as soon as they enter our system. Sometimes a temporary spike can be amortized by a queue. And it works. Until it doesn’t.

When working on processing 2 billion events per day with Azure Functions, I deliberately started with the assumption of a 1-1 mapping, where one event was mapped to one message. This didn’t go well (just as planned). Processing 2 billion items can cost you a lot, even if you run this processing on-premises, where resources are frequently treated as a “free lunch”. The solution was easy and required going back to the original meaning of the batch: a group, a pack. It’s the very same solution that can be seen in so many modern approaches. It was smart batching.

The smart batch

If you think about regular batching, it’s unbounded. There’s no limit on the size of the batch. It must be processed as a whole. The next day, another one will arrive. Smart batching is different. It’s meant to batch just a few items into a pack, to amortize the various costs of:

  1. storage (accessing store once)
  2. transport (sending one package rather than 10; I’m aware of Nagle’s algorithm)
  3. serialization (serializing an array once)

To use it you need:

  1. a buffer (potentially reusable)
  2. a timer
  3. potentially, an external trigger

It works in the following way. The buffer is a concurrency-friendly structure that allows new items to be appended by, potentially, multiple writers. Once

  1. the buffer is full or
  2. the timer fires or
  3. the external trigger fires

all the items that are in the buffer will be sent to the other party. This ensures that the flushing has:

  1. a bounded size (the size of the buffer)
  2. a bounded time of waiting for ack (the timer’s timeout)

With this approach, actually used by many libraries and frameworks, one can easily overcome many performance-related issues. It amortizes all the costs mentioned above, paying a slightly higher tax, but only once. Not for every single item.
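
The mechanics fit in surprisingly little code. Below is a minimal sketch, assuming a hypothetical SmartBatcher<T> type; the names, the sizes and the flush delegate are illustrative rather than taken from any particular library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public sealed class SmartBatcher<T> : IDisposable
{
    readonly ConcurrentQueue<T> buffer = new ConcurrentQueue<T>();
    readonly int maxBatchSize;
    readonly Action<IReadOnlyList<T>> flush;
    readonly Timer timer;

    public SmartBatcher(int maxBatchSize, TimeSpan maxDelay, Action<IReadOnlyList<T>> flush)
    {
        this.maxBatchSize = maxBatchSize;
        this.flush = flush;
        // the timer bounds the time any item waits for its ack
        this.timer = new Timer(_ => Flush(), null, maxDelay, maxDelay);
    }

    public void Add(T item)
    {
        buffer.Enqueue(item);              // multiple writers may append concurrently
        if (buffer.Count >= maxBatchSize)  // the buffer is full -> flush now
            Flush();
    }

    public void Flush()                    // may also be called by an external trigger
    {
        var batch = new List<T>(maxBatchSize);
        while (batch.Count < maxBatchSize && buffer.TryDequeue(out var item))
            batch.Add(item);

        if (batch.Count > 0)
            flush(batch);                  // pay the storage/transport/serialization tax once per batch
    }

    public void Dispose() => timer.Dispose();
}
```

A usage could look like new SmartBatcher<Event>(100, TimeSpan.FromMilliseconds(50), SendToStore), where SendToStore writes the whole batch in a single call.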

Outsmart your costs

The smart batch pattern enables you to cut a lot of costs. For cloud deployments, making fewer IO requests means less money spent on storage. It works miracles for throughput as well; no wonder some cloud vendors allow you to receive messages in batches. It’s just better.
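
As an illustration of the batched receive mentioned above, here is a hedged example assuming the Azure.Storage.Queues SDK; the connection string and the queue name are placeholders.

```csharp
using Azure.Storage.Queues;

var queue = new QueueClient("<connection-string>", "orders");

// one round-trip brings back up to 32 messages instead of 32 separate requests
foreach (var message in queue.ReceiveMessages(maxMessages: 32).Value)
{
    // process the message...
    queue.DeleteMessage(message.MessageId, message.PopReceipt);
}
```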

Next time you think about a batch, please do it, but in a smart way.

Different forms of drag

Have you heard about this new library called ABC? If not, you don’t know what you’re missing! It enables your app to do all these things! I’ll send you the links to the tutorial so that you can become a fan as well. Have I tested it thoroughly? Yeah, I clicked through the demo. And got it working on my dev machine. What? What do you mean by handling moderate or high traffic? I don’t get it. I’m telling you, I was able to spin up an app within a few minutes! It was so easy!

Drag (physics) is a very interesting phenomenon. It’s the resistance of a fluid, and it behaves much differently from regular, dry friction. Instead of being a constant force, the faster an object moves, the stronger the drag gets. Let’s take a look at what kinds of drag we can find in the modern IT world.


Performance drag

The library you chose works on your dev machine. Will it work for 10 concurrent users? Will it work for another 100 or 1000? Or, let me rephrase the question: how much RAM and CPU will it consume? 10% of your application’s resources, or maybe 50%? A simple choice of a library is not that simple at all. Sometimes your business has the money to just spin up 10 more VMs in the cloud, or to pay 10x more because you prefer JSON over everything else; sometimes it does not. Choose wisely and use resources properly.

Technical drag

You have probably heard about technical debt. With every shortcut you take, just to deliver this week rather than the next, there’s a non-zero chance of introducing parts of your solution that aren’t a perfect fit. Moreover, in a month or two they can slow you down, because the postponed issues will need to be solved eventually. Recently it was proposed to use the word drag instead of debt. You can keep moving with a debt, but moving with a drag will surely make you slower.

Environment drag

So you chose your library wisely. You know that it will consume a specific amount of resources. But you also know that it has a configuration parameter that allows you to cut down on data processing, RAM usage or data storage costs. One example that automatically comes to my mind is logging libraries. You can use the logging level as a threshold for storing data or not. How many times are these levels changed only to store less data on those poor production servers? When this happens, the scenario for a failure is simple:

  1. cut down the data
  2. an error happens
  3. no traces beside the final catch clause
  4. changing the logging level for one hour
  5. begging users to trust us again and click one more time

I have heard this and similar stories way too many times.
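
For reference, the threshold in question is usually a single line of configuration. A minimal sketch, assuming Microsoft.Extensions.Logging with the console provider:

```csharp
using Microsoft.Extensions.Logging;

using var loggerFactory = LoggerFactory.Create(builder =>
    builder
        .SetMinimumLevel(LogLevel.Warning) // the single knob that "cuts down the data"
        .AddConsole());

var logger = loggerFactory.CreateLogger("Orders");

logger.LogInformation("Gone once the threshold is raised to Warning.");
logger.LogError("Only this survives, usually coming from the final catch clause.");
```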

Summary

There are different forms of drag. None of them is pleasant. When choosing approaches, libraries and tools, choose wisely. Don’t let them drag you.

Shallow and deep foundations of your architecture

TL;DR

This entry addresses some ideas related to the various types of foundations one can use to create a powerful architecture. I see this post as somewhat resonating with Gregor Hohpe’s approach of selling architecture options.

Deep foundations

The deep/shallow foundations allegory came to me after running my workshop about event sourcing in .NET. One of the main properties of an event store was whether it could provide a linearized view of all of its events. Having this property or not was vital for providing a simple marker for projections. After all, if all the events have a position, one can easily track just this one number to know whether an event has been processed or not.

This property laid a strong foundation for simplicity and process management. Having it or not was vital for choosing one design or another. This was a deep foundation that did a lot of the heavy lifting in the design of processes and views. Opting out later on, away from a store that provides this property, wouldn’t be that easy.
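
To make the “one number” idea concrete, here is a hedged sketch of a projection tracking its position; the IEventStore interface and its members are illustrative assumptions, not the API of any particular store.

```csharp
using System.Collections.Generic;

public interface IEventStore
{
    // returns events with Position > fromPosition, in the linearized (global) order
    IEnumerable<(long Position, object Event)> ReadAllForward(long fromPosition, int maxCount);
}

public sealed class Projection
{
    long checkpoint; // the only marker the projection needs to persist

    public void CatchUp(IEventStore store)
    {
        foreach (var (position, @event) in store.ReadAllForward(checkpoint, maxCount: 512))
        {
            Apply(@event);         // update the read model / view
            checkpoint = position; // this one number says how far we got
        }
    }

    void Apply(object @event) { /* project the event onto the view */ }
}
```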

You could rephrase it as having strong requirements for a component. On the other hand, I like to think about it as a component providing deep foundations for the emerging design of a system.

Shallow foundations

The other part of the solution was based on something I called a Dummy Database: a store that has only two operations, PUT and GET, with no transactions, no optimistic versioning, etc. With a good design of the process, one that can record its progress just by storing a single marker, you can easily serialize it together with a partition of a view and store it in a database. What kind of database would it be? Probably any. Any SQL database, Cassandra or Azure Storage Tables is sufficient to make it happen.
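
A hedged sketch of that contract, with illustrative names, could be as small as this:

```csharp
// the whole "Dummy Database": no transactions, no optimistic versioning
public interface IDummyDatabase
{
    void Put(string key, byte[] value); // overwrite whatever was stored under the key
    byte[] Get(string key);             // return the last stored value, or null if nothing is there
}
```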

Moving up and down

With these two types of foundations you have some potential for naturally moving your structure. The deep foundations impose a lot of constraints that can’t be changed easily, but the rest, founded on the shallow ones, can be changed with little effort. Potentially, one could swap a whole component or adjust it to an already existing infrastructure. It’s no longer a long list of requirements to satisfy for a brand new shiny architecture of your system. The list is much shorter and it ends with the question: “the rest? we’ll take whatever you have already installed”.

Summary

When designing, split the parts that need to be rock solid and can’t be changed easily from the ones that can. Keep this in mind when moving forward, and do not pour too much concrete under parts where a regular stone would do just fine.

Service kata with Business Rules

TL;DR

In the previous post we started working on a code kata and discovered that, instead of creating a new monolithic giant, we could tackle the complexity of a process by modelling it right in its natural boundaries: contexts. In this post we continue this journey.

Requesting payment

Let’s spend some time on modelling the process of ordering a membership. It’s been said that it requires a payment and that, as soon as the payment is done, the membership is activated. We introduced the PaymentReceived event as an asynchronous response to the payment request.

Consider a membership order with the following identifier

11112222-3333-4444-5555-666677778888

When accepting the request for a membership, Membership sends a request to the Payments service with the following information:

payment_request

It is important to see that the caller generates the identifier, which has the following properties:

  • In this case it reuses its own identifier in a different context, following the snowy identifiers approach to create snowflake entities
  • As the caller generates the id and stores it, in case of a failure when requesting a payment the request can be POSTed again, as it’s idempotent (any HTTP status indicating that the payment already exists means that a previous call was accepted and processed).

Using this approach in a service-oriented architecture enables idempotence (everyone knows the id upfront).
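
To make it concrete, here is a hedged sketch of such a request; the endpoint, the property names and the amount are illustrative assumptions, not part of the kata.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;

var orderId = Guid.Parse("11112222-3333-4444-5555-666677778888");

// the membership order id is reused as the payment id (the snowy identifier)
var paymentRequest = new
{
    paymentId = orderId,
    amount = 100.00m,
    currency = "USD"
};

// safe to retry: a response saying the payment already exists means a previous call was accepted
using var client = new HttpClient();
var response = await client.PostAsJsonAsync("https://payments.example/payments", paymentRequest);
```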

Events strike back

The result of the payment, once the money is received, is a PaymentReceived event, which is published to all the interested parties. One of them is Membership, which simply takes the paymentId and checks whether there’s a membership order with the same identifier. If there is, it can be marked as paid. Simple and easy. The same applies to other rules in other contexts.
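
A minimal sketch of that Membership-side check, with hypothetical event and repository types (none of these names come from the kata):

```csharp
using System;

public class PaymentReceived
{
    public Guid PaymentId { get; set; }
}

public interface IMembershipOrders
{
    MembershipOrder TryGet(Guid id); // returns null when no order matches
}

public class MembershipOrder
{
    public void MarkAsPaid() { /* flip the paid flag, activate the membership, ... */ }
}

public class MembershipPaymentHandler
{
    readonly IMembershipOrders orders;

    public MembershipPaymentHandler(IMembershipOrders orders) => this.orders = orders;

    public void Handle(PaymentReceived @event)
    {
        // the payment id is the snowy identifier, so it is also the membership order id
        var order = orders.TryGet(@event.PaymentId);
        order?.MarkAsPaid();
    }
}
```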

There’s really no point in making this ONE BIG APP TO RULE THEM ALL. You can separate services according to business units and design towards integrating them.

Again, depending on the tools used, you can have events delivered by a bus to all the subscribers, or use an ATOM feed to publish the events and let other services consume them by polling.

Summary

These two posts show that raising modelling questions is important and that it can help reuse existing structures and applications when creating new, robust systems. They do not cover transactions, retries and more. You can use tools that solve these for you, like a messaging bus, or you’ll need to handle them on your own. Whichever path you choose, the modelling techniques will be generally the same, and you can use them to bring real value into the existing ecosystem instead of creating the single new shiny application that will rule them all.


Code kata with Business Rules

TL;DR

How many times were you given the implicit requirement that you’d create one application, or two services? How many times were the architecture and design predetermined before any modelling with the business stakeholders? Let’s take a dive into a code kata that will reveal much more than code.

Kata

The kata we’ll be working on is presented here. It covers writing a tool for a set of business rules gathered across the whole company. The business rules depend on the payment (the fact that it has been made) and some other conditions, for example:

  • If the payment is for a physical product, generate a packing slip for shipping.
  • If the payment is for a book, create a duplicate packing slip for the royalty department.
  • If the payment is for a membership, activate that membership.

The starting point for every rule is a payment that has been accepted. Another observation is that these rules are scattered across the whole company (as the author mentions, Carol on the second floor handles that kind of order). Do you think that having a new single application that gathers all the rules is the way to go?

Contexts

If a mythical Carol is responsible for some part of the rules, maybe another department/team is responsible for membership? What about the payments? Is the bookstore part of your organization really interested in whether a payment was made with a credit card or a transfer? Is a membership rule really valid outside of the membership context? Should anyone without membership-specific knowledge be able to say when a membership is activated?

I hope you see where these questions lead. There are multiple contexts that are somehow dependent, but that are not truly one:

  • payments – the part responsible for accepting money and transferring it to the company’s account
  • membership – taking care of (possibly) accounts, monitoring activity, activating/deactivating accounts
  • bookstore/videostore or simply store – the sales part
  • shipping – for physical products

Are these areas connected? Of course they are!

PaymentReceived

The first visible connector is the payment. To be precise, the fact of receiving it, which can be described in the past tense as PaymentReceived. You can imagine that, when requesting a membership, a payment is required. This can be perceived as one whole process, but it can be split into the following phases:

  • gathering membership data
  • requesting a payment
  • receiving a payment
  • completing the membership order

This is the Membership point of view. As you can see, it requests a payment but does not handle it. We will see in the next post how this can be solved.
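
If you wanted to write this point of view down, the phases above map naturally onto explicit states. A hedged sketch with illustrative names:

```csharp
// the Membership process expressed as explicit states; names are illustrative
public enum MembershipOrderPhase
{
    GatheringData,    // gathering membership data
    PaymentRequested, // requesting a payment
    PaymentReceived,  // receiving a payment (the PaymentReceived event)
    Completed         // completing the membership order
}
```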


Snowy identifiers

TL;DR

When using the snowflake entities pattern, it’s quite easy to forget about the external identifiers that we need when communicating with external systems. This post provides an easy way to address this concern.

Identity revisited

The identifier of a snowflake entity was presented as a GUID. We use an artificial, non-colliding, client-generated identifier to ensure that any part of the system can generate one without validating that a specific value hasn’t been used before. This enables storing different pieces of data, belonging to different contexts, in different services of our system. No system lives in a vacuum though, and sometimes it needs to communicate with the rest of the world.

Gate away!

A common aspect that is handled by an external system is payments. When you consider credit cards, native bank applications, PayPal, Bitcoin and all the rest, providing that kind of service on your own is not a reasonable option. That’s why external services are used – the price of using one is much lower than the cost of delivering your own. Let’s stick to the payments example. How would you approach it? Would you call the external payment service from each of your services? I hope you wouldn’t. A better approach is to create a gateway that acts as a translator between your system and the external one.

How many ids do I need?

Using a gateway provides a really interesting property. As the payment gateway is a part of your system, it can use the snowflake identifier. In other words, if there’s an order, it’s OK (under given circumstances) to use its identifier as the identifier of the payment as well – provided, of course, that you want to model these two as parts of a snowflake entity spanning multiple services. It would be the payment gateway’s responsibility to correlate the system’s snowflake identifier with the external system’s id (an integer, some string, whatever). This creates a coherent view of an entity within your system boundaries, confining the mapping to a small, dedicated area of the payment gateway.
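
A hedged sketch of that correlation, kept entirely inside the gateway; the provider interface and every name here are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

public interface IExternalPaymentProvider
{
    // the external system returns its own identifier (an integer, some string, whatever)
    string CreatePayment(decimal amount);
}

public sealed class PaymentGateway
{
    readonly IExternalPaymentProvider provider;
    readonly Dictionary<Guid, string> externalIds = new Dictionary<Guid, string>();

    public PaymentGateway(IExternalPaymentProvider provider) => this.provider = provider;

    public void RequestPayment(Guid snowflakeId, decimal amount)
    {
        // only the gateway ever sees the external id; the rest of the system sticks to the snowflake one
        externalIds[snowflakeId] = provider.CreatePayment(amount);
    }
}
```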

An integration with an external system, closed in a small component, leaving your system agnostic to it? Do we need more?

Summary

As you can see, closing the external dependency behind a gateway provides value not only by separating the interface of the external provider from your system components, but also by preserving a coherent (though distributed) view of your entities.

Snapshot time!

It’s snapshot time! There’s been a lot of event sourcing content so far. Let’s do a recap!

Below you will find a short summary of the event sourcing related articles that I have published here so far. Treat it as a table of contents, a lookup, or a pattern collection. It’s ordered by date, from the latest to the oldest. Enjoy!

  1. Why did it happen – how to make your event sourced system even easier to reason about
  2. Event sourcing and interim stream – how to embrace new modelling techniques with short living streams
  3. Multitenant Event Sourcing with Azure – how to design a multitenant event sourced system using Azure Storage Services
  4. Rediscover your domain with Event Sourcing – how to use your events and astonish your business with meaningful insights
  5. Event Sourcing for DBAs – a short introduction for any relational person into the amazing world of event sourcing. Can be used as an argument during a conversation.
  6. Enriching your events – what is event metadata, why should we care, and how to select the most important pieces
  7. Aggregate, an idempotent receiver – how to receive a command or dispatch an event exactly once?
  8. Process managers – what is a process manager, how can you simplify it?
  9. Optimizing queries – how to make queries efficient, especially when dealing with multiple versions of the same application running in parallel
  10. Event sourcing and failure handling – an exception is thrown. Is it an event or not? How to deal and model it?
  11. Embracing domain leads towards event oriented design – how event oriented design emerges from understanding of a domain