I’d like to discuss a particular failure scenario for a multi datacenter Cassandra cluster.
The setup to reproduce is following:
- Two Cassandra data centers
- DC1: n nodes
- DC2: m nodes
- NetworkTopologyStrategy with replication factors:
- DC1: n (each key on each node)
- DC2: m (each key on each node)
- Tables in TestKeyspace are created with default settings
- hinted hand-off enabled
- read repair enabled
The writes and reads goes to the DC1. What can go wrong when whole DC2 goes down (or you get a network split)?
It occurs that read_repair is defined not by one but two probabilities:
What’s the difference between them? The first one shows probability of read repair across whole cluster, the second – rr across the same DC. If you have an occasionally failing connection, or a slow one using the first can bring you some troubles. If you plan for multi DC cluster and you can live with periodical runs nodetool repair instead of failing some of your LOCAL_QUORUM reads from time to time, switch to dc read repair and disable the global one.
For curious readers the class responsible for performing reads with read-repairs as well is AbstractReadExecutor
Cassandra is one of the most interesting NoSQL databases which resolves plenty of complex problems with extremely simple solutions. They are not the easiest options, but can be deduced from this db foundations.
Cassandra uses Sorted String Tables as its store for rows values. When queried, it simply finds the value offset with the index file and searched the data file for this offset. New files are flushed once in a while to disc and a new memory representation of SST is started again. The files, once stored on disc are no longer modified (the compactation is another scenario). How would you backup them? Here comes the simplicity and elegance of this solution. Cassandra stores hard links to each SST flushed from memory in a special directory. Hard links preserves removing of a file system inodes, allowing to backup your data to another media. Once once backup them, they can be removed and it’d be the file system responsibility to count whether it was the last hard link and all the inodes can be set free. Having your data written once into not modified files gives you this power and provides great simplicity. That’s one of the reasons I like Cassandra’s design so much.
The docs for Cassandra backups are available here.
If you’re a user of NHibernate, I hope you enjoy using polymorphic queries. It’s one of the most powerful features of this ORM allowing you to write queries spanning against several hierarchies. Finally, you can even query for the derivations of object to get all the objects from db (don’t try this at home:P). The most important fact is, that having a nicely implemented dialect NH can do it in one DB call, separating specific queries by semicolon (in case of MS SQL)
var allObjects = session.QueryOver<object>().List();
Although the feature is powerful, you can find a small problem. How to count entities returned by the query without getting’em all to the memory. Setting a simple projection Projections.RowCount() will not work. Why? ‘Cause it’s gonna query each table with COUNT demanding at the same time that IDataReader should contain only one, unique result. It won’t happen, and all you’ll be left with it’ll be a nice exception. So, is it possible to count entities in a polymorphic way? Yes! You can define in a extension method and use it every time you need a polymorphic count.
private static int GetPolymorphicCount(ISession s, Type countedType)
var factory = s.GetSessionImplementation().Factory;
var implementors = factory.GetImplementors(countedType.FullName);
.Select(i => s.CreateCriteria(i)
.ToArray() // to eagerly create all the future values for counting
.Aggregate(0, (count, v) => count + v.Value); // sum up counts