Should companies orient their IT toward service/product

The question from the subject of this entry has arisen in me after a discussion with a colleague of mine. The very initial question on this subject was whether a library with contracts should be named:

  1. after the consumer, embedding its name in the contract, marking it as ‘it’s for this app’
  2. after the provider’s functionality, marking it as ‘this service part provides this functionality’

The answer may obvious from the development perspective. The second answer provides and abstraction over given functionality hence I prefer it over 1st. What does it mean for your organization? Well, you just have defined a contract of a service. If you have a system, with up to a few abstractions like this and a team behind it, you’ve got a real service, a real product owned by the team. This way of thinking in a long term may result in:

  1. team’s ownership feeling
  2. proper contracts versioning (as it is based on)
  3. top-bottom understanding of service boundaries and responsibilities
  4. distilling a single API for given set of functionalities

The other way may be good as:

  1. other teams may influence or dictate interfaces they need
  2. versioning may be much simpler (one consumer for the contract)

Which one should choose? There’s one point I didn’t mention before, which shows the winner. It’s the entanglement factor. The first solution introduces one functionality-one API rule and makes consumers obey the rules team wants to be obeyed, for example max number of items returned per request etc. The second is very similar to sharing the most internal parts of the system, almost like… db integration. A service team have to maintain and version multiple interfaces. Let us count it, being given

  • n services
  • each service depends on 3 others

when the first solution is chosen (one service-one API) the total number of published contracts is n. In the second case, it’s 3n and the responsibility for maintenance and fixes is blury. The second version reminds me of an entangled web with no owners. Everything is nobody’s nothing.
In my opinion, the bigger shift toward service/product paradigm, with team’s ownership of the product, not the code ownership, the healthier IT teams and product they make.
How about your company? Is it oriented around services?

Reconcilation between systems

From time to time a system is replaced with another system being capable of doing more, or doing the thing better. It’s quite to common to ask whether no data is lost or does the system preserve needed behaviors of the old one. Sometimes it’s human-application comparison, when a procedure followed by people is replaced with an application, sometimes it’s a question of an old system vs a new system

Cassandra integrity check
Let’s presume one migrates some old-fashioned SQL system to a Cassandra based solution given following:

  1. total payload of daily data is quite high
  2. data are written to the Cassandra cluster (more than one node) with ConsistencyLevel Local Quorum
  3. there should be a possibility to check whether all the data stored in the previous systems are written to the new one

After a bit of consideration one can propose that as the data are written with LocalQuorum, they should be queried with the same level and match in the old solution. This would ensure that data which has been written are being read (famous R + W > N). This could cost a lot as querying hits [N+1]/2 nodes of your cluster, streaming a daily payload through network twice: once to the coordinator, second – to the client. Can we do this better?

Possibly faster integrity check
How about using Consistency Level of One? How can this be done to ensure that the given node consists of all the needed data? By running repair in your local data center on each node, one can ensure that each node consist of all the data it’s responsible for. Then, querying with One is ok. What’s important about nodetool repair is that it does not stream data if it’s not needed. The information sent to match if the given node contains all the data is a Merkle tree, a tree made by hash of hashes of hashes of… Sending this structure is cheap and doesn’t your network so much.
If you consider (know that) running repairs daily is a heavy task for your cluster, you’ll be happy to read about Cassandra 2.1 repair improvements, including incremental repairs.

So stop complaining about your good old fashioned RMDB and get yourself a new shiny cluster of Cassandra nodes :)

Web API caching get wrong

I read/watch a lot of stuff published on the infoq site. I enjoy it in the majority of cases and find it valid. Recently I read an article about Web APIs and Select N+1 problem and it lacks the very basic information one should provide when writing about the web and http performance.
The post discusses structuring your Web API and providing links/identifiers to other resources one should query to get the full information. It’s easy to imagine that returning a collection of identifiers, for example ids of the books belonging to the given category can bring more requests to your server. A client querying over books will hit your app one by one performing from a load test to a fully developed DOS. The answer to this question is given in following points:

  • Denormalize and build read models
  • Parallelising calls
  • Using Async patterns
  • Optimising threading model and network throttles

What is missing is
the basic http mechanism provided by the specification: cache headers and ETags. There’s no mention about properly tagging your responses to allow return 304 if the client asks for data that didn’t change. The http caching, its expiration are not mentioned as well. Recently Greg Young posted a great article about leveraging http caching. The best quote summing the whole take on it from Greg’s article would be:

This is often a hard lesson to learn for developers. More often than not you should not try to scale your own software but instead prefer to scale commoditized things. Building performant and scalable things is hard, the smaller the surface area the better. Which is a more complex problem a basic reverse proxy or your business domain?

Before getting into fancy caching systems, understand your responses, cache forever what isn’t changing and ETag with version things that may change. Then, when you have a performance issue turn into more complex solutions.

For sake of reference, the author of the Infoq post reponded to my tweet in here.

The expert real drama

You’re familiar with The expert for sure. It’s popular across all developers. Many of my colleagues find it funny to watch the misunderstanding between an illogical business and a logical developer. There is a real drama hiding in the dark corners of this short. The expert’s drama being sold by its own manager/leader as a universal toolish expert who will deliver. If a manager/leader needs a yes-man, it’s easy to get one. They are cheap. But if one insists that a person skilled in a technical direction, with some intellect on the board will ok or yes everything which is on the table, that’s a serious misunderstanding. The expert have the following choices:

  1. say no which results in being persecuted
  2. say yes and live with this dissonance
  3. resign

If you ever be in the expert position from the movie, I advise the third option. I made the wrong choice a few years ago and this short brought back some memories.

I should’ve used EventStore

One of the most important features of EventStore is an ability to ask the questions like they were asked in the past. You don’t have to rerun manually all the stored information to repartition them, or to aggregate them. All you’ve got to do is to write a new projection which will run from the very beginning (almost always: take a loot at scavenging) till now.
There’s no question you should’ve asked, which cannot be added later, with no mental overhead of manually rerouting data through the pipeline once again. Nice:)

When business owner does not own

One of the most worrying scenarios for having a business owner is a business owner who does not care. This may manifest in various ways:

  • a delay in communication – you receive responses for your emails after days, weeks, when the context is already gone
  • a pressure (even positive) to deliver varying over time
  • a knowledge about project has to be refreshed, the basic use cases are being forgotten

The result is a semi-finished, almost-released product. Even if a team delivering the product cares, having no business verification of their ideas may ruin the project. This state may be a result of personal reasons, like not caring at all. More possible this situation is rooted at organizational issues, which result in switching long-/midterms without pushing down the reasons behind the decisions, keeping people in the zone of unknown. Despite the reason, doing a job with value deprecating in time, without final DONE isn’t good for your team morale. That’s for sure.

Event sourcing and failure handling

Currently I’m workingwith a project using event sourcing as its primary source of truth and the log in the same time (a standard advantage). There are some commands, which may throw an exception if the given condition is not satisfied. The exception propagates to the service and after transformation is displayed to the user. The fact of throwing the exception is not marked as an event. From the point of consistency it’s good: an event isn’t appended, the state does not change, when an exception occurs. What is lost is a notion of failure.
A simple proposal is to think a bit more before throwing an exception and ending a command with nothing changed. One may append a ThisCriticalCommandFailedEvent with nothing but the standard event headers (like time, user performing command, etc.) or something with a better name and return a result equal to the exception thrown. The event can be used later, when you want to analyze failures of executing commands.