This post has been imported from my previous blog. I did my best to parse XML properly, but it might have some errors.
If you find one, send a Pull Request.
Writing your own db is getting more and more popular, isn’t it? :] Especially if you want to create your own custom NoSQL solution. It’s worth mentioning that even Ayende, one of the core NHibernate contributors, created one. It’s called RavenDB and it is a document database based on a managed storage, with Lucene included for querying purposes. One of its interesting attributes is that the Lucene index is eventually consistent with the managed storage (you may have to wait a dozen or so milliseconds before your updated document is indexed and searchable). If you want to learn something more about NoSQL solutions and have plenty of time for watching interesting interviews, you should visit NoSQL Tapes. They’re definitely worth watching.
Speaking of ‘your own db’, it’s worth mentioning that NoSQL dbs sometimes have dramatically different paradigms. I cannot imagine easily switching between Cassandra and MongoDB, or between Redis and RavenDB. Their capabilities do not map onto each other simply. Choosing one should be done with a deep look into your system requirements (for instance, Twitter perfectly fits the Cassandra approach).
After this short introduction, I want to present the results of different strategies for aggregating (summing) one column of 100,000,000 rows with a spike code I wrote recently. The code itself was written after rereading Google’s article about Dremel, as well as What Every Programmer Should Know About Memory and plenty of Joe Duffy’s stuff. The result is quite nice in my opinion and is represented with
It’s quite good, isn’t it?
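To give a feel for what "different strategies of summing one column" can mean, here is a minimal sketch in Python. It is not the spike code from this post. The three strategies (row-oriented, column-oriented, and column-oriented with per-chunk partial sums) and all the names in it (`sum_rows`, `sum_column`, `sum_column_chunked`, `ROW_COUNT`) are my assumptions, picked only to illustrate the idea, and the row count is kept tiny so it runs instantly.

```python
from array import array


def sum_rows(rows, column_index):
    """Row-oriented: every row object is visited just to read one field."""
    total = 0
    for row in rows:
        total += row[column_index]
    return total


def sum_column(column):
    """Column-oriented: the values of one column sit next to each other."""
    return sum(column)


def sum_column_chunked(column, chunk_size):
    """Column-oriented with partial sums per chunk.

    Each chunk is independent, so the partial sums could be computed
    on separate cores; here they are computed one after another.
    """
    partials = [
        sum(column[start:start + chunk_size])
        for start in range(0, len(column), chunk_size)
    ]
    return sum(partials)


# Hypothetical data set: the post sums 100,000,000 rows, this uses 1,000.
ROW_COUNT = 1_000
rows = [(i, i * 2, i % 7) for i in range(ROW_COUNT)]

# Pull the second field of every row into one contiguous array of 64-bit ints.
column = array("q", (row[1] for row in rows))

row_total = sum_rows(rows, 1)
column_total = sum_column(column)
chunked_total = sum_column_chunked(column, 128)
```

All three functions return the same total. What differs is how the data is laid out and walked through, and with a large enough number of rows that is what decides how friendly the aggregation is to CPU caches.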
The spike I wrote is a journey, not a way of accomplishing something. With Themis there was an aim and a reason; with this, at least for now, it’s only play.