Shallow and deep foundations of your architecture

TL;DR

This entry explores ideas about the different types of foundations you can use to build a powerful architecture. I see this post as somewhat resonating with Gregor Hohpe's approach of treating architecture as selling options.

Deep foundations

The deep/shallow foundations allegory came to me after running my workshop on event sourcing in .NET. One of the key properties of an event store is whether it can provide a linearized view of all of its events. Having this property or not is vital for providing a simple marker for projections. After all, if every event has a global position, a projection can track just this one number to know whether an event has been processed or not.

This property laid a strong foundation for simplicity and process management. Having it or not dictated one design or another. It was a deep foundation, doing a lot of the heavy lifting in the design of processes and views. Opting out later from a store that provides this property wouldn't be easy.
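
To make the "track just one number" idea concrete, here is a minimal sketch, assuming a store that exposes a global, linearized position for every event. All the type and member names below are illustrative, not taken from any specific event store API.

```csharp
using System.Collections.Generic;

// Illustrative types only: the point is that the projection's whole progress
// is a single number - the global position of the last processed event.
public sealed class StoredEvent
{
    public long Position { get; set; }   // global, linearized position
    public object Payload { get; set; }
}

public interface IEventStore
{
    // reads events with a position strictly greater than 'afterPosition'
    IEnumerable<StoredEvent> ReadAll(long afterPosition, int maxCount);
}

public sealed class Projection
{
    long _checkpoint; // the single marker; persist it together with the view

    public void CatchUp(IEventStore store)
    {
        foreach (var e in store.ReadAll(_checkpoint, maxCount: 100))
        {
            Apply(e.Payload);
            _checkpoint = e.Position;
        }
    }

    void Apply(object payload) { /* update the read model here */ }
}
```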

You could rephrase this as having strong requirements for a component. I prefer to think of it as a component providing deep foundations for the emerging design of a system.

Shallow foundations

The other part of the solution was based on something I called the Dummy Database: a store that has only two operations, PUT and GET, with no transactions, optimistic versioning, etc. With a well-designed process that can record its progress as a single marker, one can easily serialize that marker together with a partition of a view and store it in a database. What kind of database would that be? Probably any. Any SQL database, Cassandra, or Azure Storage Tables is sufficient to make it happen.
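
A rough sketch of what I mean by the Dummy Database, assuming nothing more than PUT and GET. The interface and type names are made up for illustration; the key point is that the checkpoint lives in the same blob as the view partition, so a single PUT keeps them consistent.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Only two operations, no transactions, no optimistic concurrency.
public interface IDummyDatabase
{
    void Put(string key, byte[] value);
    byte[] Get(string key);
}

// A partition of the view serialized together with its progress marker.
public sealed class ViewPartition
{
    public long Checkpoint { get; set; }                           // last processed position
    public Dictionary<string, int> Counters { get; set; } = new(); // the view data itself
}

public static class ViewStorage
{
    public static void Save(IDummyDatabase db, string partitionKey, ViewPartition partition) =>
        db.Put(partitionKey, JsonSerializer.SerializeToUtf8Bytes(partition));

    public static ViewPartition Load(IDummyDatabase db, string partitionKey)
    {
        var bytes = db.Get(partitionKey);
        return bytes == null ? new ViewPartition() : JsonSerializer.Deserialize<ViewPartition>(bytes);
    }
}
```

Any store with those two operations can host it, which is exactly why the choice of the concrete database matters so little here.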

Moving up and down

With these two types of foundations you gain some freedom to naturally move your structure around. The deep foundations impose constraints that can't be changed easily, but the rest, resting on the shallow ones, can. Potentially, one could swap a whole component or adapt it to already existing infrastructure. It's no longer a long list of requirements to satisfy for a brand new shiny architecture of your system. The list is much shorter and it ends with: "The rest? We'll take whatever you already have installed."

Summary

When designing, separate the parts that need to be rock solid and can't be changed easily from the ones that can. Keep this in mind as you move forward, and do not pour too much concrete under parts where a regular stone would do.

See you at GET.NET Łódź

GET.NET in Łódź is approaching fast. I hope you have reserved Saturday, April 22nd as time devoted to deepening your passion and knowledge. Together with many other speakers, I will have the pleasure of sharing my experience and knowledge with you during my talk The Only Thing That Matters, as well as in discussions during the breaks. To make it easier for you to get to the conference, I have prepared a small contest. All you have to do is briefly describe:

"The most interesting architectural problem you have encountered recently"

How to understand architecture and what exactly to describe is up to you.

Post your answers in the comments under this post. Don't forget to include your email addresses! The organizers have prepared 2 tickets for the winners (selected pseudorandomly, using Random).

Deadline for submitting answers: March 29th (inclusive).

How to steal customers from your competitors

TL;DR

I've seen this pattern more than a few times over the last few years. The approach I describe below seems quite handy when trying to steal clients from your competitors in the IT landscape.

I look the same but I’m better

Have you heard that you can use the MongoDB driver to connect to Azure DocumentDB? Just like that, an application can swap its database and use DocumentDB without changing a line of code. If your app isn't cloud native, and one of the strategies you were considering was running MongoDB on your own in the cloud, you no longer have to. You can use this Database-as-a-Service offering and make your app cloud-ready in a matter of seconds (just change the connection string).
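
A hedged sketch of the swap, using the official MongoDB .NET driver. The account name, key and port below are placeholders; the exact DocumentDB connection string for your account comes from the Azure portal.

```csharp
using MongoDB.Driver;

public static class Clients
{
    // before: a self-hosted MongoDB instance
    public static IMongoClient SelfHosted() =>
        new MongoClient("mongodb://localhost:27017");

    // after: the very same driver, pointed at the DocumentDB MongoDB endpoint
    // (placeholder account name and key - copy the real string from the portal)
    public static IMongoClient DocumentDb() =>
        new MongoClient("mongodb://myaccount:mykey@myaccount.documents.azure.com:10255/?ssl=true");
}
```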

Have you heard about ScyllaDB? It's a functional port of the Cassandra database, written in C++ with a custom SEDA-like architecture, user-level network drivers, and a lot of mechanical sympathy. How does it work? It supports the same network protocol and the same file structure on disk. Does it look similar? Yes, and you can use it as a drop-in replacement. No migrations, not a single line of your code rewritten. Isn't it great?

API parity, feature parity

It's often said that feature parity can hurt your business. If you can be compared, you will be compared to others, and eventually the better/cheaper option will win. What about API parity? What about ScyllaDB, which can support the same workload on 10x fewer servers? What about DocumentDB, which is offered as a service and additionally has an amazing indexing algorithm? They strive for this comparison, especially when they guarantee a no-op switch based not on feature parity but on API parity.

Summary

Mimicking an API and freeing customers from vendor lock-in looks like an interesting and valuable vector of attack for products that offer something more under the same API layer. I think that, especially in the public cloud sector, we'll see it more and more.

Event stores and event sourcing: some not so practical disadvantages and problems

TL;DR

This post is a kind of answer to the article mentioned in a tweet by Greg Young. The author's blog has no comment section, and this reply contains a lot of information, which is why I'm posting it here instead of sending it as an email or a DM.

Commits

Typically, an event store models commits rather than the underlying event data.

I don't know what a typical event store is. I do know, though, that:

  1. EventStore, built by Greg Young's company, a standalone event database, fully separates these two, providing granularity at the event level
  2. StreamStone, which provides support for Azure Table Storage, works at the event level as well
  3. Marten, a PostgreSQL-based document & event database, also works at the level of a single event

For my statistical sample, the quoted statement does not hold true.

Scaling with snapshots

One problem with event sourcing is handling entities with long and complex lifespans.

and later

Event store implementations typically address this by creating snapshots that summarize state up to a particular point in time.

and later

The question here is when and how should snapshots be created? This is not straightforward as it typically requires an asynchronous process to creates snapshots in advance of any expected query load. In the real world this can be difficult to predict.

First and foremost, if you have aggregates with long and complex lifespans, that's on you, because you chose a model with aggregates like that. Remember, there are no right or wrong models, only useful or crappy ones.

Second, let me provide an algorithm for snapshotting. If you retrieved 1000 events to build up an aggregate, snapshot it (serialize it, put it into an in-memory cache, and possibly store it in a database). Easy and simple; I see no need for fancy algorithms.
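
A minimal sketch of that rule. All the store and aggregate types below are invented for illustration, not a concrete event store API; the only thing that matters is the threshold check at the end.

```csharp
using System.Collections.Generic;

public interface IEventReader
{
    IEnumerable<object> ReadFrom(string id, long afterVersion);
}

public interface ISnapshotStore
{
    Snapshot TryLoad(string id);             // null when no snapshot exists yet
    void Save(string id, Snapshot snapshot);
}

public sealed class Aggregate
{
    public Aggregate(string id, long version = 0) { Id = id; Version = version; }
    public string Id { get; }
    public long Version { get; private set; }
    public void Apply(object e) { Version++; /* mutate state here */ }
}

public sealed class Snapshot
{
    public long Version { get; set; }
    public static Snapshot From(Aggregate a) => new Snapshot { Version = a.Version };
    public Aggregate ToAggregate(string id) => new Aggregate(id, Version); // plus restored state
}

public sealed class AggregateLoader
{
    const int SnapshotThreshold = 1000;
    readonly IEventReader _events;
    readonly ISnapshotStore _snapshots;

    public AggregateLoader(IEventReader events, ISnapshotStore snapshots)
    {
        _events = events;
        _snapshots = snapshots;
    }

    public Aggregate Load(string id)
    {
        var snapshot = _snapshots.TryLoad(id);
        var aggregate = snapshot != null ? snapshot.ToAggregate(id) : new Aggregate(id);

        var replayed = 0;
        foreach (var e in _events.ReadFrom(id, aggregate.Version))
        {
            aggregate.Apply(e);
            replayed++;
        }

        // the whole "algorithm": replayed too many events? snapshot the aggregate
        if (replayed >= SnapshotThreshold)
            _snapshots.Save(id, Snapshot.From(aggregate));

        return aggregate;
    }
}
```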

Visibility of data

In a generic event store payloads tend to be stored as agnostic payloads in JSON or some other agnostic format. This can obscure data and make it difficult to diagnose data-related issues.

If you, as an architect or developer, know your domain and know that you need a strong schema, because you want to use it as a published interface, yet you still persist data in JSON instead of some schema-aware serialization like protobuf (a binary, schema-aware serialization format from Google), it's not the event store's fault. Additionally,

  1. EventStore
  2. StreamStone

both handle binary payloads just fine (yes, you can't write JS projections over binary events in EventStore, but you can still subscribe).

Handling schema change

If you want to preserve the immutability of events, you will be forced to maintain processing logic that can handle every version of the event schema. Over time this can give rise to some extremely complicated programming logic.

It has been shown that instead of cluttering your model with different versions (which, admittedly, is sometimes the easier option), one can provide a mapping that is applied to the event stream before the events are returned to the model. This way you handle versioning in one place and can move forward with schema changes (again, as long as it's not your published interface). It doesn't cover every case, but this pattern can be used to reduce the clutter.
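
A small sketch of such a mapping (often called upcasting), with made-up event types: old versions are translated to the newest shape right after being read, so the model only ever sees the current one.

```csharp
using System.Collections.Generic;
using System.Linq;

public class OrderPlacedV1 { public string OrderId; }
public class OrderPlacedV2 { public string OrderId; public string Currency; }

public static class Upcaster
{
    // applied to the stream before returning events to the model
    public static IEnumerable<object> Upcast(IEnumerable<object> stream) =>
        stream.Select(e => e is OrderPlacedV1 v1
            ? new OrderPlacedV2 { OrderId = v1.OrderId, Currency = "USD" } // default for old events
            : e);
}
```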

Dealing with complex, real world domains

Given how quickly processing complexity can escalate once you are handling millions of streams, it’s easy to wonder whether any domain is really suitable for an event store.

EventStore and StreamStone are designed to handle these millions of streams.

The problem of explanation fatigue

Event stores are an abstract idea that some people really struggle with. They come with a high level of “explanation tax” that has to be paid every time somebody new joins a project.

You could say the same about messaging and delivery guarantees, fast serializers like protobuf, or dependency injection. Is there a project where a newbie joins and just knows what to do and how? Nope.

Summary

It's your decision whether to use event sourcing or not; it's not a silver bullet. Nothing is. I wanted to clarify some of the misunderstandings I found in the article. Hopefully, this will help my readers choose their tooling (and opinions) wisely.

Why you should, and eventually will, invest your time in learning about the public cloud

TL;DR

Within 2-5 years the majority of applications will be moved to the public cloud. You'd better be prepared for it.

Economies of scale

You might have heard that economies of scale do not work for software. That is not the case in the public cloud sector. It's cheaper per unit to buy 1,000,000 processors than to buy one. It's cheaper per unit to buy 1,000,000 disks than to buy one. And it's even better to resell them as a service to end customers. That's exactly what public cloud vendors do.

Average app

The majority of applications do not require fancy processing or 1 ms service times. They require handling peaks, being mostly available, and costing no money when nobody uses them. I'd say that within 2-5 years we will see the majority of them moving to the cloud. If there is a margin, where the service proves its value and is worth more than the cost of executing it in the cloud, it will eventually be migrated; otherwise it will die, with a big IT department running through the datacenter trying to optimize costs and make ends meet.

Pure execution

Pure execution has arrived and it's called Azure Functions (or Lambda if you use the other cloud :P). You pay for memory and execution time multiplied together. This means that when there's nothing to work on, you pay nothing (or almost nothing, depending on the triggering mechanism). This is the moment when you pay for your application actually performing actions. If an app user can pay more than the cost of the execution, you'll be profitable. If not, maybe it's about time to rethink your business.
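
A back-of-the-envelope sketch of that billing model (memory multiplied by execution time). The rate below is purely illustrative, not the actual price list of Azure Functions or Lambda; check the current pricing page for real numbers.

```csharp
public static class FunctionCostEstimate
{
    // consumption-style billing: GB-seconds times a per-GB-second rate
    public static double MonthlyCost(
        long executionsPerMonth,
        double avgDurationSeconds,
        double memoryGb,
        double pricePerGbSecond = 0.000016) // illustrative rate, not an official price
    {
        var gbSeconds = executionsPerMonth * avgDurationSeconds * memoryGb;
        return gbSeconds * pricePerGbSecond;
    }
}

// e.g. 1,000,000 executions x 0.2 s x 0.25 GB = 50,000 GB-s, about 0.80 at the illustrative rate
```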

Performance matters

With this approach and detailed enough measurements, you can actually see where you spend the most money. It's no longer about profiling an app to see where it is slow or where it consumes the most memory. It's about your business burning money in different places. Whether to optimize a part or not becomes a decision based on money and on how much it costs to fix. With a highly profitable business you could even flood the less performant parts with money. Just like that.

Environments and versioning

How do you version a function? Preserve its signature and rewrite it. Then deploy. Nothing less, nothing more. I can almost see a new wave of development approaches where Continuous Delivery looks like a grandpa trying to keep up with Usain Bolt. You can't compete with this. It's a brand new league.

Summary

If you're thinking about areas to invest your time in, the public cloud and functions are the way to go. In the majority of cases, this will be vital to surviving in a market competing and betting on the lowest costs of infrastructure, IT, and DevOps.

Hitting internal wall in Service Fabric

TL;DR

In this post I share my experience of trying to extend Service Fabric for SewingMachine purposes.

SewingMachine's aim

The aim of SewingMachine is to extend the Service Fabric actor model with better, faster, less-allocating foundations. The only part I have worked on so far is persisted actors, the kind stored in the KeyValueStoreReplica I've been writing about for the last few weeks.

Not-so-public seam

I started my work by discovering how persisted actors are implemented. The class responsible for it is named KvsActorStateProvider. It uses the following components:

  • KeyValueStoreWrapper – private
  • VolatileLogicalTimeManager.ISnapshotHandler – internal
  • VolatileLogicalTimeManager – internal
  • IActorStateProviderInternal – internal
  • ActorStateProviderHelper – internal, responsible for shared logic among providers
  • IActorStateProvider – public interface to implement

As you can see, the only part that is public is the seam of the state provider. Every single helper one could use to implement their own is internal. Additionally, the provider interface is full of other interfaces that need to be implemented. I know that sharing data structures isn't the best option, but since Service Fabric shares them internally, why not give them to the user?

All of the above means that extending the actors' runtime is hard, if not impossible. It provides no real extension points, and its public seam isn't prepared for it. What does this mean for SewingMachine?

Can't win? Change the battle

Rewriting the runtime would be time-consuming. I can't spend half a year writing it and don't want to, at least not now. I've implemented a faster, unsafe wrapper around KeyValueStoreReplica that can still be useful. To make an impact with SewingMachine, I'll introduce the event-driven actor first, even if clean-up is handled by the inefficient regular actor disposal. By using a custom serializer and adhering to the currently used key prefixes, it can later be switched to a custom runtime. The only problem would be reminders, but this can be handled as well by versioning them better.

Summary

SewingMachine was meant to extend the actors' runtime. Given difficulties like the ones described above, I could either kill it or repurpose it to provide real value first, leaving performance for later. That's how we'll do it.


Service Fabric – KeyValueStoreReplica, ReplicaRole

TL;DR

After taking a look at how actors' state is persisted with KeyValueStoreReplica, following the prefix query guideline, it's time to see how this state is replicated.

ReplicaRole

When defining replication for a partition, one defines how many nodes the data will reside on. Every copy of a partition's data is called a replica. It's important to know that for a given partition only one replica is active at a time. This kind of replica is called the Primary. Let's take a look at the ReplicaRole values and decipher their meanings:

  1. Primary – the currently active replica. All operations are handled by the Primary, which ensures that any write is replicated to and acknowledged by a quorum of ActiveSecondary replicas. As in Highlander, there can be only one Primary replica at any given time.
  2. IdleSecondary – a replica that accepts and applies state sent by the Primary to catch up with all the changes, becoming an ActiveSecondary as soon as it has caught up.
  3. ActiveSecondary – a replica that is part of the write quorum. It stores updates from the Primary and acknowledges them, enabling the Primary to successfully complete a write operation (a small sketch of the quorum arithmetic follows this list).
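
Here is the promised sketch of the write-quorum arithmetic. It is a generic majority-quorum illustration, not code taken from Service Fabric itself.

```csharp
public static class Quorum
{
    // e.g. a target replica set size of 5 means a write needs 3 acknowledgements
    public static int WriteQuorum(int targetReplicaSetSize) =>
        targetReplicaSetSize / 2 + 1;

    public static bool IsCommitted(int acknowledgements, int targetReplicaSetSize) =>
        acknowledgements >= WriteQuorum(targetReplicaSetSize);
}
```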

Active, passive and not-that-passive

As you can see above, at any time there's only one replica that is truly active: the Primary. What happens with the secondaries? Can they do anything meaningful, or are they just there to copy state?

First and foremost, secondary replicas receive notifications about the state being replicated. This means that if you derive from the KeyValueStoreReplica class, you can be notified about the copied key-value pairs. That's how you can react to these changes. But how is this helpful?

You could index the data somehow, notify other services or endpoints by calling their methods or sending a request (in a safe manner, without failing inside the notification), and much more. For instance, the actors' runtime uses it to capture the last timestamp for a component called VolatileLogicalTimeManager.
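
A hedged sketch of reacting to such notifications by deriving from KeyValueStoreReplica. The overridden member name and signature below are written from memory of the System.Fabric API (and secondary notifications also have to be enabled in the replica settings); verify them against the official docs before relying on this.

```csharp
using System.Collections.Generic;
using System.Fabric;

public class IndexingStoreReplica : KeyValueStoreReplica
{
    public IndexingStoreReplica(string storeName) : base(storeName) { }

    // invoked on a secondary when replicated operations are applied locally
    protected override void OnReplicationOperation(IEnumerator<KeyValueStoreNotification> operation)
    {
        while (operation.MoveNext())
        {
            var item = operation.Current;
            // react here: update a local index, notify another service, capture a
            // timestamp - and never let an exception escape the notification
        }
    }
}
```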

Summary

The role of a replica can be summarized easily: the Primary is the currently active replica accepting reads & writes; the secondaries are replicas that just copy the state.