This post has been imported from my previous blog. I did my best to parse XML properly, but it might have some errors.
If you find one, send a Pull Request.
I’d like to discuss a particular failure scenario for a multi-datacenter Cassandra cluster. The setup to reproduce it is as follows:
Nodes:
DC1: n nodes
DC2: m nodes

Replication factor:
DC1: n (each key on each node)
DC2: m (each key on each node)
Writes and reads go to DC1. What can go wrong when the whole of DC2 goes down (or you get a network split)?
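To make the setup concrete, here is a minimal sketch of such a keyspace, assuming n = m = 3 (the keyspace name is mine):

```sql
-- Hypothetical keyspace for the setup above: the replication factor in each
-- DC equals its node count, so every node holds a copy of every key.
CREATE KEYSPACE multi_dc_demo
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,  -- n = 3 nodes, RF = 3
    'DC2': 3   -- m = 3 nodes, RF = 3
  };
```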
It turns out that read repair is controlled not by one but by two probabilities:

read_repair_chance
dclocal_read_repair_chance
What’s the difference between them? The first is the probability of a read repair across the whole cluster; the second, of a read repair within the same DC. When the global one fires, the read also contacts replicas in the remote DC, so an occasionally failing or slow cross-DC connection can bring you trouble. If you plan a multi-DC cluster and can live with periodic runs of nodetool repair instead of having some of your LOCAL_QUORUM reads fail from time to time, switch to the DC-local read repair and disable the global one.
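On pre-4.0 Cassandra, where both probabilities are per-table options, that switch looks like this (the table name is hypothetical, and 0.1 is just an example value):

```sql
-- Disable the cluster-wide read repair and rely on the DC-local one
-- (plus periodic `nodetool repair` runs) instead.
ALTER TABLE multi_dc_demo.events
  WITH read_repair_chance = 0.0
  AND dclocal_read_repair_chance = 0.1;
```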
For curious readers: the class responsible for performing reads, read repairs included, is AbstractReadExecutor (in org.apache.cassandra.service).