Multidatacenter Cassandra cluster with slow cross DC connection

Posted on April 07, 2014 · 1 min read · tagged with: #Cassandra #network partition #split brain #splitbrain

This post has been imported from my previous blog. I did my best to parse XML properly, but it might have some errors.

If you find one, send a Pull Request.

I’d like to discuss a particular failure scenario for a multi datacenter Cassandra cluster. The setup to reproduce is following:

Two Cassandra data centers
- DC1: n nodes
- DC2: m nodes
TestKeyspace
NetworkTopologyStrategy with replication factors:
- DC1: n (each key on each node)
- DC2: m (each key on each node)
Tables in TestKeyspace are created with default settings
hinted hand-off enabled
read repair enabled

The writes and reads goes to the DC1. What can go wrong when whole DC2 goes down (or you get a network split)?

It occurs that read_repair is defined not by one but two probabilities:

read repair chance with default value of 0.1
dclocal read repair chance with default value of 0.0

What’s the difference between them? The first one shows probability of read repair across whole cluster, the second - rr across the same DC. If you have an occasionally failing connection, or a slow one using the first can bring you some troubles. If you plan for multi DC cluster and you can live with periodical runs nodetool repair instead of failing some of your LOCAL_QUORUM reads from time to time, switch to dc read repair and disable the global one.

For curious readers the class responsible for performing reads with read-repairs as well is AbstractReadExecutor

← Previous Post Next Post →