Multidatacenter Cassandra cluster with slow cross DC connection

I’d like to discuss a particular failure scenario for a multi datacenter Cassandra cluster.
The setup to reproduce is following:

  • Two Cassandra data centers
    • DC1: n nodes
    • DC2: m nodes
  • TestKeyspace
  • NetworkTopologyStrategy with replication factors:
    • DC1: n (each key on each node)
    • DC2: m (each key on each node)
  • Tables in TestKeyspace are created with default settings
  • hinted hand-off enabled
  • read repair enabled

The writes and reads goes to the DC1. What can go wrong when whole DC2 goes down (or you get a network split)?
It occurs that read_repair is defined not by one but two probabilities:

What’s the difference between them? The first one shows probability of read repair across whole cluster, the second – rr across the same DC. If you have an occasionally failing connection, or a slow one using the first can bring you some troubles. If you plan for multi DC cluster and you can live with periodical runs nodetool repair instead of failing some of your LOCAL_QUORUM reads from time to time, switch to dc read repair and disable the global one.

For curious readers the class responsible for performing reads with read-repairs as well is AbstractReadExecutor

Bounded context in deployment tools

Recently I’ve been moving around the topic of a deployment. Imagine a situation you’re being given a set of scripts, or script like objects used to deploy a set of applications. The so-called scripts are from the very basic like create-directory to complex, rooted in an organization infrastructure and tooling. Additionally, some of them are defined as groups of other scrips. For example, installing an application service starts with creation of a directory, then binaries are copied and finally, the service is registered.
The scripts are not covered with tests, but are hardened by years of successful usage. One could consider to rewrite them totally, and provide a full blown set of tests. This may be hard, as you throw away all the knowledge hidden behind scripts. Remember, that there were big companies that are no longer here, take Netscape as example.

I’ve spent quite a while about considering chef, PowerShell, Pupper even the msbuild with its tasks. What helped me to make up my mind was the famous Blue Book. Why not to consider a set of scrips as a bounded context? Just take a look at the picture provided by Martin Fowler here. Wrap all the older scripts in a context bubble providing mapping, mostly intellectual, to all the terms that are needed to be known outside. It’s more than wrapping all old scrips with an interface. There is a need of a real mapping with a glossary to let people which do not want to leave the bubble now exist in it for a while. What tool will be used for the new bounded context communicating with the other? That’s an open question. I’ll try to choose the best tool, with good enough test support. The only real requirement is the ability to provide the mapping to the old deployment tools’ context.

If you want to learn more, just take a loot at the great Eric Evan’s videos under this link:

AzureDirectory – code review

The project AzureDirectory provides an Azure implementation of the Lucene.NET abstraction – Directory. It targets in providing ability to store Lucene index in the Azure storage services. The code can be found in here AzureDirectory . The packages can be found on nuget here. There aren’t marked as prereleases.

The solutions consists of two projects:

  1. AzureDirectory
  2. TestApp

The first provides implementation of the Lucene abstractions. The are a few classes, only needed for the feature implementation (implementations of Lucene abstractions). Additionally some utils class are introduced.
The code is structured with regions, which I personally dislike. Names of regions like: CTORS, internal methods, DIRECTORY METHODS shows the way the code is molded, with no classes holding common wrapped in region functionality. The lengthy methods and ctors are another disadvantage of this code base.
The spacing, using directives, fields that may be readonly are messy. Something which may be cleared with a ReSharper Clean Code is left for a reader to deal with.
You can find in there usages of Lucene obsolete API (like IndexInput.Close in disposal), as well as informative comments like:

// sometimes we get access denied on the 2nd stream…but not always. I haven’t tracked it down yet
// but this covers our tail until I do

It’s good and informative for author but leaving the project in an immature state.

The second project is not a test project but a sample app using the lib. No tests at all.

Summing up, after consideration, I wouldn’t use this implementation for my production Azure app. The code is badly composed, with no tests and left with comments pointing at situations where authors are aware of the unsolved-yet problem.

The missing 20% of configuration with Octopus Deploy

Recently I’ve been evaluating Octopus Deploy. I wanted to learn more about a platform which is quite unique for .NET environment. After getting through the features I wrote following tweet:

What’s about the missing 20%?
The missing part is the configuration considered as an artifact. There are many projects when changing you code makes you change your configuration. Adding some values is ok, much more important are operations like swapping whole sections, etc. This kind of changes are frequently connected with the given code change, hence they are changed in the very same commit and pushed to VCS. Then, your configuration is no longer a simple table with selection of dimensions where the given value should be applied. The additional skipped dimension is time measured in the only unit VCS is aware of – commits. It’s like “from now on I need this values”.
What you can do is to use Octopus values to point to the right file for the given environment, that’s for sure. The thing becomes a bit more tricky when for instance production config should not be leaked into the development repo, etc.
This leads to the fact, that your configuration is an artifact. In many cases it can be easily replaced by a table with ‘environment’ dimension but still, it is an artifact, now, unfortunately not stored in your repository.

The reasoning above is not meant to lead you astray. Octopus is a great deployment tool. As with every tool, use it wisely.

Agile team analogy

Yesterday I’ve had a discussion about ability to introduce a Scrum-based development in a big organization. There was a question raised whether it’s possible to construct a cross-functional team and deliver shippable product with no busy waits, no “I need this from that guy” syndrome.
I thought about each team member as a team part, consuming some input (mails, requests, discussions) and producing some output. Nowadays it’s much more possible to make it asynchronous, to remove all the meetings, long-running discussions, to review code of your colleagues in a post-commit manner. Having that said, you can look at each person in producer-consumer manner and the graph of the dependencies is the bus shifting the artifacts you create. What it takes to create a good team is to couple all the people which can be waiting for the output of others and collocate them in one team. The obvious producers (with much less input) would be product owners, hence, they’d be feed by testers for sure. If you have some kind of core system in your company, a guy working with it would be a perfect fit as well. Just try to collocate all the people with transitive producer-consumer dependencies. Choose the most critical and time consuming ones. Just make a bubble with a minimum input, possibly asynchronous. What about output? That would be, according to your definition of done, your shippable product. Nothing more nothing less. That’s what all Agile is all about, isn’t it?

Your local user groups

Recently I’ve been deeply involved in Warsaw .NET User Group. What makes you a deeply involved you ask? I’d say that resolving current problems would be the answer that fits the most. We covered a few important points like getting some sponsorship, being given a few tickets for Build Stuff conference (thanks!) and running snacks sessions (short presentations, for those who want to start with their presentations). It looks like people are a bit more energized and active. That’s for sure the right direction for any user group.
I want to encourage all of you, just make a small move, do sth for the community you’re chosen “to be involved with”, for instance ask for a problem to be resolved. It’s a win-win “by people, for people”, nothing more, nothing less.

Code reviews

Recently, I’ve been involved in a discussion about code reviews. It made me remember a code review board that I introduced in one of my previous projects. Beside that I had to clarify positive aspects of code reviews. Verified by experience the most important point of doing code reviews with a tool is total asynchrony of the process. The reviewer can easily go through the diff, marking lines, changes in a given time selected by himself (or scheduled via team rules). The second point would be the artifact left after this process. It’s not a discussion over a coffee or an email in a private inbox. Once it’s published in a tool, it’s visible part of the project, a manifestation of the change in the code. Isn’t it great?

The list of possible tools to use:

  • with it’s possibility to comment over pull requests and commits.
  • I used this tool, it’s simple and easy to go with. It’s free!
  • Crucible an Atlasian tool. I haven’t used it so far, but hope to do it in a while

Getting asynchronous, public, strongly connected with code discussion over the project. This will make your code thrive and the knowledge spread across the team. Trust me.