Simple Cassandra backups

Cassandra is one of the most interesting NoSQL databases, resolving plenty of complex problems with extremely simple solutions. Those solutions are not always the most obvious ones, but they can be deduced from the foundations of the database.
Cassandra uses Sorted String Tables (SSTables) as its on-disk storage for row values. When queried, it simply finds the value's offset using the index file and seeks to that offset in the data file. From time to time the in-memory representation is flushed to disk as new files and a fresh one is started. Once stored on disk, the files are never modified again (compaction is a separate scenario).

How would you back them up? Here comes the simplicity and elegance of this solution. Cassandra creates hard links to each flushed SSTable in a special snapshot directory. A hard link keeps the underlying inodes alive even when the original file is removed, giving you time to copy the data to another medium. Once backed up, the links can be deleted, and it's the file system's responsibility to work out whether the last hard link is gone and the inodes can be freed. Having your data written once into immutable files gives you this power and great simplicity. That's one of the reasons I like Cassandra's design so much.
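To make it concrete, here's a minimal sketch of the hard-link trick in C#, assuming a Windows host (Cassandra itself is Java and nodetool snapshot is the real entry point; on Linux the equivalent call is link(2)). The paths and the *.db filter are purely illustrative:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class SstableSnapshot
{
    // real Win32 API: creates a second directory entry for an existing file
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool CreateHardLink(string newLink, string existingFile, IntPtr security);

    public static void Snapshot(string dataDir, string snapshotDir)
    {
        Directory.CreateDirectory(snapshotDir);
        foreach (var file in Directory.GetFiles(dataDir, "*.db"))
        {
            // a hard link shares the file's data blocks: creating it is instant
            // and costs no extra disk space
            var link = Path.Combine(snapshotDir, Path.GetFileName(file));
            if (!CreateHardLink(link, file, IntPtr.Zero))
                throw new IOException("link failed: " + Marshal.GetLastWin32Error());
        }
        // copy snapshotDir to your backup media at leisure, then delete it;
        // the file system frees the blocks only when the last link is gone
    }
}
```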

The docs for Cassandra backups are available here.

Polyglot programming

Have you ever been in a situation where you didn't have enough tooling in your preferred environment? What did you do back then? Did you stick with the tools you already knew, or search for something better?
Recently I've been involved in a project which truly pushed me out of my comfort zone. I like .NET, but it doesn't have enough tooling to provide a highly scalable, performant, failure-resistant environment. We moved to Java, Storm, Cassandra, ZooKeeper: JVM all over the place. It wasn't easy, but it was interesting and eye-opening. Having so many libraries focused on solving specific concerns, instead of the framework-oriented paradigm of .NET, was very refreshing.
Was it worth it? Yes. Was it good for self-development? Yes. Will I reach for every new library or language? Certainly not. The most important lesson I've learned so far is that being adaptable and aware of the available tools matters most. Mistakes were made, that's for sure, but the overall solution is growing in the right direction.
After all, it’s survival of the fittest, isn’t it?

Disruptor with MultiProducer

I hope you're aware of the LMAX tool for fast in-memory processing called the Disruptor. If not, it's a must-see for today's architects. It's nice to see your process eating messages at a rate of around 10 million per second.
One of the problems addressed in the latest release is a fast multi-producer, allowing multiple tasks to publish data for their consumers. I must admit that the simplicity of this robust part is astonishing. How would one handle claiming and publishing one or a few items from the ring buffer? Claiming is easy: do it in the standard way, using a CAS operation to let other threads know about the claimed value, then publish it. But how do you publish that information? Here comes the beauty of this solution:

  1. allocate an int array of the ring buffer's length
  2. when items are published, calculate their positions in the ring (sequence % ring.length)
  3. set those slots in the helper int array to the sequence numbers (or values derived from them, such as the ring iteration number)

This, at the cost of one extra int array, allows:

  1. waiting for a producer by simply checking whether the value in the int array matches the current ring iteration number
  2. publishing items in the same order they were claimed
  3. publishing with no additional CASes

Simple, powerful and fast.
Come, take a look at it: MultiProducerSequencer
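Here's a minimal C# sketch of that availability-buffer idea, with illustrative names (the real implementation is the Java class linked above, which uses masks and shifts instead of % and / on a power-of-two ring size):

```csharp
using System.Threading;

class MultiProducerSketch
{
    private readonly int[] available; // one "round number" per ring slot
    private readonly int size;
    private long cursor = -1;         // last claimed sequence

    public MultiProducerSketch(int size)
    {
        this.size = size;
        available = new int[size];
        for (var i = 0; i < size; i++) available[i] = -1; // nothing published yet
    }

    // claiming: a standard CAS loop, so every producer gets a unique sequence
    public long Claim()
    {
        while (true)
        {
            var current = Interlocked.Read(ref cursor);
            var next = current + 1;
            if (Interlocked.CompareExchange(ref cursor, next, current) == current)
                return next;
        }
    }

    // publishing: no CAS at all; just record which iteration of the ring
    // this slot now belongs to
    public void Publish(long sequence)
    {
        var index = (int)(sequence % size);
        var round = (int)(sequence / size);
        Volatile.Write(ref available[index], round);
    }

    // consumers spin on this until the slot's round matches the one they expect,
    // which makes items visible in exactly the order they were claimed
    public bool IsAvailable(long sequence)
    {
        var index = (int)(sequence % size);
        var round = (int)(sequence / size);
        return Volatile.Read(ref available[index]) == round;
    }
}
```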

ProtoDescriptor

Recently I've created (with some porting from another project) a simple library which allows parsing .proto files and storing them in a model. The library offers serialization/deserialization of the mentioned model. I hope to ship dynamic generation of protobuf-net classes as well. That would allow the creation of self-descriptive streams (the contract added at the very beginning of a file), discoverable via reflection (you get a class, and an IEnumerable of that class's objects) and queryable. It has some potential in it.

https://github.com/Scooletz/ProtoDescriptor

Goodbye Voron

Today I read about the next project from Ayende, whose manifesto you can find here. The project is meant to deliver managed, memory-mapped-file-based storage for RavenDB. What caught my attention was a sentence asking for contributors.
I thought that it would be a great idea to write a storage engine, or to be part of a team creating one, so I asked about licensing. The answer I got was that it will have a RavenDB-compatible license, which means nothing less than that the product will be dual-licensed: open for OSS projects and closed for commercial ones. There's an exception of course: Raven itself.
As Ayende stated in the comments: "Scooletz, My code, my rules, pretty much. You are free to do the same and publish it under any license you want.". It's true, but just as he has the right to make that kind of choice, I'm allowed to dislike it. The most interesting part is asking for contributors to a project which will be non-free for non-OSS solutions.
Looking through various OSS projects, Event Store looks much better. It's BSD-licensed: one can contribute to it, or take it and turn it into anything they can think of. That's the style I prefer.

NUnit and time measurements

It's common that some of your NUnit tests should log their execution time. One way of providing such behavior is to write custom SetUp and TearDown methods, whether in your fixture or in a base test fixture. I find that disturbing, as a simple SetUp can easily get bloated with plenty of concerns.
Another way is to use the not-so-well-known ITestAction interface. It allows you to execute code before and after a test in an AOP way. Of course one can argue that a simple method accepting an action whose execution will be measured is a better option, but I prefer coding in a declarative way, and a simple attribute visible in the signature of your method seems much better suited to this kind of behavior.
Take a look at the sketch below and use it in your tests!
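A sketch of such an attribute, written against the NUnit 3 flavor of ITestAction (the 2.6-era interface took a TestDetails argument instead of ITest):

```csharp
using System;
using System.Diagnostics;
using NUnit.Framework;
using NUnit.Framework.Interfaces;

// Declarative timing for tests: [Test, MeasureTime] on a method (or the
// attribute on a whole fixture) is all it takes. Note: one attribute
// instance holds one Stopwatch, so this sketch is not safe for tests
// running in parallel under the same instance.
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class)]
public class MeasureTimeAttribute : Attribute, ITestAction
{
    private Stopwatch watch;

    // run around each test rather than once per fixture
    public ActionTargets Targets => ActionTargets.Test;

    public void BeforeTest(ITest test) => watch = Stopwatch.StartNew();

    public void AfterTest(ITest test)
    {
        watch.Stop();
        TestContext.WriteLine($"{test.Name} took {watch.ElapsedMilliseconds} ms");
    }
}
```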

A poor cookie

The design of the HTTP cookie is leaky. Better get used to it. You can read the RFCs about it, but it's better to read one more meaningful question posted on the Security Stack Exchange. If your site is hosted as a subdomain alongside other apps and a malicious user can access any of them, a cookie scoped to the top domain can be set. What this means is that the cookie will be sent with every request to the top domain as well as to yours (the domain-match rule in the RFCs). This can bring a lot of trouble when an attacker sets a cookie with a name important for your app, like the session cookie. According to the specification, both values will be sent under the same name, with no additional information about the basis on which a given value was sent.
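To make the attack concrete, here's what a malicious sibling app could do in classic ASP.NET (the host names and the cookie name are illustrative):

```csharp
// code running on evil.example.com, a sibling of your app at app.example.com
var cookie = new System.Web.HttpCookie("SESSION_ID", "attacker-chosen-value")
{
    // domain-match: the browser will now send this cookie to *.example.com,
    // including your app, next to your legitimate SESSION_ID cookie
    Domain = "example.com"
};
Response.Cookies.Add(cookie); // inside any page or controller of the evil app
```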

HTML5 to the rescue
If you are designing a new Single Page Application, you can be saved. Imagine that the POST sending the login data (user & password) returns, in the resulting JSON, the value that would previously have been stored in a cookie. One can easily save it in localStorage and add it later to the headers of requests needing authentication. This simple change brings another advantage: requests not needing authentication, like GETs (as nobody should send sensitive data with a verb vulnerable to JSON hijacking), can be sent to the same domain with no header overhead. The standard solution of shipping all your static files to another domain just to stop sending cookies with GETs isn't needed anymore.
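A sketch of the server side in C#, ASP.NET Web API style; LoginDto, Users and SessionTokens are made-up stand-ins for your own auth plumbing:

```csharp
using System;
using System.Web.Http;

public class LoginDto
{
    public string User { get; set; }
    public string Password { get; set; }
}

// purely illustrative stand-ins for real credential and session handling
public static class Users
{
    public static bool Validate(string user, string password) =>
        user == "demo" && password == "demo"; // replace with a real check
}

public static class SessionTokens
{
    public static string Issue(string user) => Guid.NewGuid().ToString("N");
}

public class AuthController : ApiController
{
    [HttpPost]
    public IHttpActionResult Login(LoginDto credentials)
    {
        if (!Users.Validate(credentials.User, credentials.Password))
            return Unauthorized();

        // the value that would previously have gone into a Set-Cookie header
        // is returned in the JSON body instead
        var token = SessionTokens.Issue(credentials.User);
        return Ok(new { token });
    }
}
```

The SPA stores the token (e.g. localStorage.setItem("token", token)) and sends it back only on requests that need it, say in an X-Session-Token header, so plain GETs and static files stay free of both cookies and extra headers.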