Merge request policy

If you work in a company with multiple teams in the IT department, you may have considered using GitHub Enterprise or its free alternative, GitLab. Besides providing a Git hosting experience, both can support you with one practice that may be new to your organization: pull/merge requests.

A pull request in GitHub and a merge request in GitLab provide the same functionality. They let other users propose changes in a way that makes them easy to apply to the original repository. The advantages of this kind of change management over emailing diffs or other ways of applying fixes to other codebases are:

  1. Traceability – a request has its own URL, is linkable and is public
  2. Permission granularity – everyone can be given permission to read and fork a repo, but not to write to it. This lets the owners stay owners and deal with issued requests rather than with a codebase broken by others’ mistakes (for particular flows, read here)
  3. True ownership – the owner is relieved of digging through the dirt and moves into the role of an acceptor
  4. No more emails – email requests are no longer sent. The basic way of asking for a given change is… doing the change
  5. Learning over abusing – the requester is given an opportunity to work in another codebase. It can cost a bit more, that’s for sure, but it spreads practices and knowledge. The most important thing for the owner is to help create pull requests, not to apply them onto repositories.

This kind of change can be painful for lazy people, who want to delegate, or rather push away, all their work with no insight into the requested changes. It can be a big learning opportunity as well. Adopting OSS rules, like a merge request policy, in your day-to-day job can increase your awareness and make you a happier developer.

It’s time to issue some pull requests!

Multi-datacenter Cassandra cluster with a slow cross-DC connection

I’d like to discuss a particular failure scenario for a multi-datacenter Cassandra cluster.
The setup to reproduce it is the following (a sketch of creating it follows the list):

  • Two Cassandra data centers
    • DC1: n nodes
    • DC2: m nodes
  • TestKeyspace
  • NetworkTopologyStrategy with replication factors:
    • DC1: n (each key on each node)
    • DC2: m (each key on each node)
  • Tables in TestKeyspace are created with default settings
  • hinted handoff enabled
  • read repair enabled

Writes and reads go to DC1. What can go wrong when the whole DC2 goes down (or you get a network split)?
It turns out that read repair is defined not by one but by two probabilities: read_repair_chance and dclocal_read_repair_chance, both set per table.

What’s the difference between them? The first one is the probability of a read repair across the whole cluster, the second – of a read repair within the same DC. The distinction matters because a read that triggers the global read repair contacts replicas in the remote DC, so if you have an occasionally failing, or simply slow, cross-DC connection, using the first one can bring you trouble. If you plan for a multi-DC cluster and you can live with periodic runs of nodetool repair instead of failing some of your LOCAL_QUORUM reads from time to time, switch to the dc-local read repair and disable the global one.
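
Here is a minimal sketch of that switch with the Python driver; the node address, the table name and the 0.1 chance are assumptions for illustration (note that these per-table options were removed entirely in Cassandra 4.0).

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['10.0.1.1'])            # a hypothetical DC1 node
session = cluster.connect('testkeyspace')

# Disable the cluster-wide read repair and keep a dc-local one;
# 'mytable' and the 0.1 chance are assumptions for illustration.
session.execute("""
    ALTER TABLE mytable
    WITH read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.1
""")

# With the global read repair off, a LOCAL_QUORUM read no longer
# has to wait on replicas in DC2.
read = SimpleStatement(
    "SELECT * FROM mytable WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
session.execute(read, ['some-key'])
```
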

For curious readers: the class responsible for performing reads, read repairs included, is AbstractReadExecutor.

Bounded context in deployment tools

Recently I’ve been circling around the topic of deployment. Imagine a situation where you’re given a set of scripts, or script-like objects, used to deploy a set of applications. The so-called scripts range from the very basic, like create-directory, to complex ones rooted in an organization’s infrastructure and tooling. Additionally, some of them are defined as groups of other scripts. For example, installing an application service starts with the creation of a directory, then binaries are copied and finally, the service is registered.
The scripts are not covered by tests, but they are hardened by years of successful usage. One could consider rewriting them completely and providing a full-blown set of tests. This may be hard, as you would throw away all the knowledge hidden behind the scripts. Remember that there were big companies that are no longer here – the Netscape rewrite is a classic example.
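
To illustrate the kind of composition described above, here is a sketch of how such script grouping might look; all the names (create_directory, copy_binaries, register_service, group) are hypothetical, chosen only to mirror the example.

```python
from typing import Callable, List

# A "script" is anything callable with no arguments; complex scripts
# are just ordered groups of simpler ones.
Script = Callable[[], None]

def create_directory() -> None:
    print('create-directory')   # stands in for the battle-tested original

def copy_binaries() -> None:
    print('copy-binaries')

def register_service() -> None:
    print('register-service')

def group(steps: List[Script]) -> Script:
    """Compose simple scripts into a bigger one, preserving their order."""
    def run() -> None:
        for step in steps:
            step()
    return run

# Installing an application service: directory, then binaries,
# then service registration.
install_application_service = group([
    create_directory,
    copy_binaries,
    register_service,
])
install_application_service()
```
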

I’ve spent quite a while considering Chef, PowerShell, Puppet, even msbuild with its tasks. What helped me make up my mind was the famous Blue Book. Why not consider a set of scripts as a bounded context? Just take a look at the picture provided by Martin Fowler here. Wrap all the older scripts in a context bubble providing a mapping, mostly an intellectual one, for all the terms that need to be known outside. It’s more than wrapping all the old scripts with an interface. There is a need for a real mapping with a glossary, to let people who do not want to leave the bubble right now exist in it for a while. What tool will be used for the new bounded context to communicate with the old one? That’s an open question. I’ll try to choose the best tool, with good enough test support. The only real requirement is the ability to provide the mapping to the old deployment tools’ context.
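
As a rough sketch of such a bubble, in the spirit of an anti-corruption layer, a thin wrapper could translate the new glossary into calls to the old scripts; the script names and the provision_service term below are assumptions, not the real tooling.

```python
import subprocess

class LegacyDeploymentContext:
    """A bubble around the old scripts: callers speak the new glossary
    (provision_service), while inside the bubble the old, battle-tested
    scripts keep running. All names here are hypothetical."""

    def provision_service(self, name: str, binaries_path: str) -> None:
        # The mapping: one term from the new glossary translates into
        # the old scripts' vocabulary and ordering.
        target = f'/services/{name}'
        self._run('create-directory', target)
        self._run('copy-binaries', binaries_path, target)
        self._run('register-service', name)

    def _run(self, script: str, *args: str) -> None:
        # The old scripts stay untouched; we only invoke them.
        subprocess.run([script, *args], check=True)
```

The point is not the wrapper itself, but that provision_service belongs to the new context’s glossary, while create-directory and friends stay in the old one.
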

If you want to learn more, just take a look at the great Eric Evans’ videos under this link: http://dddcommunity.org/?s=four+strategies