I want my own db

Writing your own db is getting more and more popular, isn’t it? :] Especially if you want to create your own custom NoSQL solution. It’s worth mentioning that even Ayende, one of the core NHibernate contributors, committed one. It’s called RavenDB and is a document database based on a managed storage engine, with Lucene included for querying purposes. One of its interesting attributes is that Lucene’s index is eventually consistent with the managed storage (you may wait a dozen or so milliseconds before your updated document is indexed and searchable). If you want to learn something more about NoSQL solutions and have plenty of time for watching interesting interviews, you should visit NoSQL Tapes. They’re definitely worth watching.

Speaking about ‘your own db’, it’s worth mentioning that NoSQL dbs sometimes have dramatically different paradigms. I cannot imagine easily switching between Cassandra and MongoDB, or between Redis and RavenDB. Their abilities do not map onto each other simply. Choosing one should be done with a deep look into your system requirements (for instance, Twitter perfectly fits the Cassandra approach).

After this short introduction I wanted to present the results of different strategies for aggregating (summing) one column of 100,000,000 rows, based on some spike code written recently. The code itself was written after rereading Google’s article about Dremel, as well as What every programmer should know about memory and plenty of Joe Duffy’s stuff. The result is quite nice in my opinion and comes down to:

0,098273895 seconds

It’s quite good, isn’t it?
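
For the curious, below is a minimal sketch of the kind of columnar summing the spike measured, not the spike itself: it assumes the column is kept as one contiguous int array (the way a Dremel-like, column-oriented layout would store it) and sums it in parallel, chunk by chunk, with per-worker accumulators. The class name, the data and the seeding are made up purely for illustration.

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public class ColumnSumSpike
{
    public static void Main()
    {
        // A hypothetical column of 100,000,000 values kept contiguously in memory,
        // the way a column-oriented store lays out a single column.
        const int rows = 100000000;
        var column = new int[rows];
        var random = new Random(42);
        for (var i = 0; i < rows; i++)
        {
            column[i] = random.Next(10);
        }

        var watch = Stopwatch.StartNew();

        long total = 0;
        var workers = Environment.ProcessorCount;
        var chunk = rows / workers;

        // Each worker sums its own contiguous chunk (cache friendly, no shared state)
        // and publishes its local result exactly once at the end.
        Parallel.For(0, workers, worker =>
        {
            var from = worker * chunk;
            var to = worker == workers - 1 ? rows : from + chunk;
            long local = 0;
            for (var i = from; i < to; i++)
            {
                local += column[i];
            }
            Interlocked.Add(ref total, local);
        });

        watch.Stop();
        Console.WriteLine("Sum: {0} in {1} seconds", total, watch.Elapsed.TotalSeconds);
    }
}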

The spike I wrote is a journey, not a way of accomplishing something. With Themis there was an aim and a reason; with this, at least for now, it’s only play.

Themis, take a break

Sometimes it’s good to take a break and give some ideas a while to regenerate in your head. After a few weeks of not touching Themis at all, I wrote a lot of documentation yesterday (and a few paragraphs today). I also recompiled it with the newest build of NHibernate, to let you use it with the newest Fluent NHibernate. One more piece of news is that Themis will be used in a production project. So far it brings profits, but I’m ready to immerse myself in it if any optimization/extension/refactoring is needed.

Happy Themissing!

Open your project’s sources

In Ryan King’s presentation you can find plenty of good/best practices from the Twitter teams. You can find tips on how they use Cassandra and what an awesome id generator they provide (and it is open source: Snowflake). I do read about open sourcing as a way of making your apps better, but I do not know where the boundary lies between showing what you’ve got and handing everyone the omnipotent ‘How to steal my users and dethrone me’ book. Can everything become public? What are the root secrets you should not reveal? Any thoughts?

BTW the whole presentation can empower you in using NoSQL. It’s a must-see.

How to guide users of your API

After winter holidays it was time to get back to reality and catch up on the plenty of unread posts in my reader. One of them was Christopher Bennage’s entry about the RavenDB API. The main problem with the API was that it allowed calling every LINQ extension method on the Raven session, even the unwanted one: ToList. Christopher’s proposal is to provide an obsolete ToList method which accepts the Raven query interface and, by being obsolete, informs about the best way of getting a list of objects.
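
A minimal sketch of the idea (not Christopher’s actual code), assuming a Raven-style IRavenQueryable<T> interface: an [Obsolete] ToList extension that is more specific than the LINQ one, so every call site gets a compiler warning nudging the caller towards the preferred API.

using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the Raven query interface; the real one lives in the Raven client.
public interface IRavenQueryable<T> : IOrderedQueryable<T>
{
}

public static class RavenQueryableExtensions
{
    [Obsolete("Do not materialize the whole query with ToList; use the session's paging support instead.")]
    public static List<T> ToList<T>(this IRavenQueryable<T> query)
    {
        // Still works, but the obsolete warning guides the user to a better way.
        return Enumerable.ToList(query);
    }
}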

Simple, nice, powerful.

Continuous delivery

I’ve just finished the Continuous Delivery book. I must say that my expectations were much higher than the level at which the topic was covered. I do not say that this is a bad book, but having a few well-grounded articles, like Fowler’s, I simply do not find it good enough for a deep dive into the problem space. The last chapters present the way in which the whole book should have been written: simple cases with short explanations, dealing with the most common problems. On the other hand, the very first chapters repeat the same sentences like some kind of mantra, paraphrased over and over again (automate your build, do nothing manually, etc.).

If you’re a noob and have read nothing about this topic, it’s a very good choice. For the others: it is not a must-have.

Cache me this, cache me that

Recently I’ve been optimizing one application. After a few runs of profile_analyzeData_fixBottlenecks I stopped when the app, written with NHibernate and intensively using CacheManager from EntLib, was spending 5% of its time getting data from the cache. As the performance was to be increased by an order of magnitude, the time spent within the _GetData_ method was much too large. I reviewed the code and wrote down a few statements describing what the application requires from a cache:

  • no expiration and
  • it is infrequently erased, so
  • it has a big read:write ratio
  • the number of cached items can be easily estimated, hence
  • the number of cached items will not frequently exceed the max number of items
  • once scavenging occurs (current number > max number of items), it can take a while, since the upper limit of cached items is unlikely to be hit often

Following these points, the performance results of the GetData method and the need to tune the app (it was the most obvious point for optimization), I came up with the following CacheManager implementation. The code is simple, oriented towards frequent reads (a slim r/w lock), with all operations having a cost of O(1). Adding, when the maximum number of cache items is exceeded, takes O(n), because scavenging takes O(n) to clean up the cache (FIFO order is used for removing entries). The removal of a key is handled by marking it as removed in the key lookup list.

The whole implementation with tests took me 40 minutes and, with other small optimizations, improved performance by 10%. The main idea behind this is to give you some thoughts about using external all-in-one solutions versus writing a small, custom one for only your needs. Hope the code example helps as well :-)

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using Trader.Core.Exceptions;
using Trader.Infrastructure.IoC.Wcf;

namespace Trader.Infrastructure.Caching
{
    /// <summary>
    /// The implementation of <see cref="ICacheManager"/>,
    /// using a FIFO strategy for scavenging too many objects in the cache.
    /// </summary>
    /// <remarks>
    /// This implementation is thread safe and its instances can be shared
    /// by different threads.
    /// </remarks>
    public class CacheManager : ICacheManager, IDisposable
    {
        private readonly Dictionary<string, CacheItem> _cache;
        private readonly int _capacity;
        private readonly int _maxNumberOfItems;
        private readonly int _numberToRemove;
        private readonly ReaderWriterLockSlim _rwl = new ReaderWriterLockSlim();
        private List<CacheItem> _keys;

        public CacheManager(string name, int maximumElementsInCacheBeforeScavenging, int numberToRemove)
        {
            if (maximumElementsInCacheBeforeScavenging < 1)
            {
                throw new ArgumentException("Cannot be less than 1", "maximumElementsInCacheBeforeScavenging");
            }
            if (numberToRemove < 1)
            {
                throw new ArgumentException("Cannot be less than 1", "numberToRemove");
            }
            if (numberToRemove > maximumElementsInCacheBeforeScavenging)
            {
                throw new ArgumentException("Scavenging more than max number of elements in cache", "numberToRemove");
            }

            Name = name;
            _maxNumberOfItems = maximumElementsInCacheBeforeScavenging;
            _numberToRemove = numberToRemove;

            _capacity = maximumElementsInCacheBeforeScavenging + 1;
            _cache = new Dictionary<string, CacheItem>(_capacity);
            _keys = new List<CacheItem>(_capacity);
        }

        public string Name { get; private set; }

        public int Count
        {
            get
            {
                using (_rwl.ReadLock())
                {
                    return _cache.Count;
                }
            }
        }

        /// <summary>
        /// Adds the specified key and value to the cache.
        /// </summary>
        /// <param name="key">The key of the cache entry.</param>
        /// <param name="value">The value of the cache entry.</param>
        /// <remarks>
        /// This is an O(1) operation, but when the number of cache items is exceeded,
        /// it calls the <see cref="Scavange"/> method. See more in its remarks.
        /// </remarks>
        public void Add(string key, object value)
        {
            key.ThrowIfNull("key");
            using (_rwl.WriteLock())
            {
                CacheItem item;
                if (_cache.TryGetValue(key, out item))
                {
                    // if exists only update
                    item.Value = value;
                }
                else
                {
                    item = new CacheItem(value, _keys.Count, key);
                    _cache[key] = item;
                    _keys.Add(item);

                    // scavange if needed
                    if (_cache.Count > _maxNumberOfItems)
                    {
                        Scavange();
                    }
                }
            }
        }

        public bool Contains(string key)
        {
            using (_rwl.ReadLock())
            {
                return _cache.ContainsKey(key);
            }
        }

        public void Flush()
        {
            using (_rwl.WriteLock())
            {
                _cache.Clear();
                _keys.Clear();
            }
        }

        public object GetData(string key)
        {
            using (_rwl.ReadLock())
            {
                CacheItem result;
                _cache.TryGetValue(key, out result);

                if (result != null)
                {
                    return result.Value;
                }
                return null;
            }
        }

        /// <summary>
        /// Removes a specified key from the cache.
        /// </summary>
        /// <param name="key">The key to be removed.</param>
        /// <remarks>
        /// When a non-existing key is passed, it simply does nothing.
        /// This is an O(1) operation.
        /// </remarks>
        public void Remove(string key)
        {
            using (_rwl.WriteLock())
            {
                CacheItem result;
                _cache.TryGetValue(key, out result);

                if (result != null)
                {
                    // mark key as removed for scavenging optimization
                    _keys[result.KeyIndex] = null;
                    _cache.Remove(key);
                }
            }
        }

        public void Dispose()
        {
            _rwl.Dispose();
        }

        /// <summary>
        /// Scavanges elements, deleting at most n entries, where n is the numberToRemove
        /// ctor parameter.
        /// </summary>
        /// <returns>The number of scavanged items.</returns>
        /// <remarks>
        /// This method has O(n) complexity. It iterates over and recreates the index list.
        /// </remarks>
        public int Scavange()
        {
            var removedSoFar = 0;
            for (var i = 0; i < _keys.Count && removedSoFar < _numberToRemove; i++)
            {
                if (_keys[i] != null)
                {
                    _cache.Remove(_keys[i].Key);
                    _keys[i] = null;
                    ++removedSoFar;
                }
            }

            var newKeys = new List<CacheItem>(_capacity);
            for (var i = 0; i < _keys.Count; i++)
            {
                if (_keys[i] != null)
                {
                    newKeys.Add(_keys[i]);
                }
            }

            // fix indexes
            for (var i = 0; i < newKeys.Count; i++)
            {
                newKeys[i].KeyIndex = i;
            }

            _keys = newKeys;

            return removedSoFar;
        }

        #region Nested type: CacheItem

        [DebuggerDisplay("Cache item: Key {Key}, KeyIndex {KeyIndex}")]
        private class CacheItem
        {
            public readonly string Key;
            public object Value;
            public int KeyIndex;

            public CacheItem(object value, int keyIndex, string key)
            {
                Key = key;
                Value = value;
                KeyIndex = keyIndex;
            }
        }

        #endregion
    }
}
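
For completeness, a quick usage sketch of the cache above; the name, capacity and keys are made up just for illustration.

using Trader.Infrastructure.Caching;

public static class CacheUsageExample
{
    public static void Run()
    {
        // Keep at most 1000 items; once exceeded, scavenge the 100 oldest entries (FIFO).
        using (var cache = new CacheManager("queries", 1000, 100))
        {
            cache.Add("user:1", "John Doe");

            var hit = cache.GetData("user:1");    // "John Doe"
            var miss = cache.GetData("user:2");   // null, no exception thrown

            cache.Remove("user:1");               // O(1): marks the key slot as removed
        }
    }
}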