Back to relational, hibernating world

Recently, I’ve been using Fluent NHibernate extensively. As always, setting up the tests meant setting up SQLite to work with NHibernate. I found that FNH provides a simple class with a mouthful of a name: FluentNHibernate.Testing.SingleConnectionSessionSourceForSQLiteInMemoryTesting. All it does is accept a FluentConfiguration object and provide the created session for your tests. Nice :]

Deiphobus design, pt. 1

It’s the right time to write about the Deiphobus design. I’ll start with an example of usage, then move on to configuration, serialization and the model created from the config. The next post will explain the event design of the session implementation, the identity map and its usage, as well as the lifetime of a session and the query API.

The configuration provided with Deiphobus is a fluent, “just code” configuration. Each entity is described with a generic EntityClassMap.

/// <summary>
/// The mapping of the class, marking it as entity, stored under separate key.
/// </summary>
/// <typeparam name="T">The type to be mapped.</typeparam>
public abstract class EntityClassMap<T>
 where T : class
{
 protected void Id(Expression<Func<T, Guid>> id)
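 // The generic argument was lost in the original listing; Func<T, object> is an assumption.
 protected IndexPart IndexBy(Expression<Func<T, object>> memberExpression)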
 protected void SetSerializer<TSerializer>( )
 where TSerializer : ISerializer
}

The class interface was designed to be similar to Fluent NHibernate:

  • the type parameter T should be the mapped entity class
  • the Id method marks a specific property as the id. It’s worth noting that only Guid identifiers are available
  • the second method, IndexBy, marks a property to be indexed with an inverted index. Only properties marked with this method can be queried in Deiphobus queries. Running a query on a non-indexed property will throw an exception
  • the very last method allows setting a custom serializer type for the mapped entity type
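To make the list above a bit more concrete, here is a minimal sketch of what a mapping could look like. EntityClassMapSketch is a stand-in written only for this example (it just records property names); the User entity, its properties and the Func<T, object> parameter of IndexBy are assumptions too, so treat it as an illustration rather than the actual Deiphobus code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-in for the Deiphobus base class: it only records names.
public abstract class EntityClassMapSketch<T> where T : class
{
    public string IdName { get; private set; }
    public List<string> IndexedNames { get; } = new List<string>();

    protected void Id(Expression<Func<T, Guid>> id)
    {
        IdName = MemberName(id.Body);
    }

    protected void IndexBy(Expression<Func<T, object>> memberExpression)
    {
        IndexedNames.Add(MemberName(memberExpression.Body));
    }

    private static string MemberName(Expression body)
    {
        // Value-type members are wrapped in a Convert node when boxed to object.
        if (body is UnaryExpression unary && unary.NodeType == ExpressionType.Convert)
            body = unary.Operand;
        if (body is MemberExpression member)
            return member.Member.Name;
        throw new ArgumentException("Expected a property access expression.");
    }
}

// Hypothetical entity used only in this sketch.
public class User
{
    public Guid Id { get; set; }
    public string Login { get; set; }
    public int Age { get; set; }
}

// The mapping itself: one id, two properties with inverted indexes.
public class UserMap : EntityClassMapSketch<User>
{
    public UserMap()
    {
        Id(u => u.Id);
        IndexBy(u => u.Login);
        IndexBy(u => u.Age);
    }
}
```

With such a map, only Login and Age would be queryable; any other property of User would not be.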

All the mappings are consumed by a mapping container, which registers all entity class maps. The maps are translated into EntityClassModel objects, each describing the properties of a specific entity. This process takes place when the session factory is created. For each model, an object implementing the IEntityPersister interface is created. The persister implementation provides methods like GetPropertyValue or GetIndexedPropertyValues, with IL code emitted to avoid the reflection overhead. The persister will be described later; the EntityClassModel’s members can be seen below:

/// <summary>
/// The class representing a model of mapped entity.
/// </summary>
public class EntityClassModel
{
 public EntityClassModel(Type classType, PropertyInfo id, object idUnsavedValue, IEnumerable<IndexedProperty> indexedProperties)
 {
  ClassType = classType;
  Id = id;
  IdUnsavedValue = idUnsavedValue;
  IndexedProperties = indexedProperties.ToList().AsReadOnly();
 }
 public Type ClassType { get; private set; }
 public PropertyInfo Id { get; private set; }
 public object IdUnsavedValue { get; private set; }
 public IEnumerable<IndexedProperty> IndexedProperties { get; private set; }
 public Type SerializerType { get; set; }
}
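The persister itself is not shown in this post, but the general idea of emitting IL to read a property without reflection overhead can be sketched with DynamicMethod. GetterEmitter and CreateGetter are hypothetical names, and the sketch assumes a reference-type entity with a public instance getter; the real IEntityPersister implementation may work differently.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper: builds a delegate that reads one property via emitted IL.
public static class GetterEmitter
{
    public static Func<object, object> CreateGetter(PropertyInfo property)
    {
        MethodInfo getMethod = property.GetGetMethod();
        if (getMethod == null || getMethod.IsStatic)
            throw new ArgumentException("A public instance getter is required.");
        if (property.DeclaringType.IsValueType)
            throw new ArgumentException("Only reference-type entities are supported.");

        var method = new DynamicMethod(
            "Get_" + property.Name,
            typeof(object),
            new[] { typeof(object) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                            // load the entity
        il.Emit(OpCodes.Castclass, property.DeclaringType);  // object -> entity type
        il.Emit(OpCodes.Callvirt, getMethod);                // call the getter
        if (property.PropertyType.IsValueType)
            il.Emit(OpCodes.Box, property.PropertyType);     // box ints, Guids, ...
        il.Emit(OpCodes.Ret);

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }
}
```

The delegate is created once per property (for instance when the session factory is built) and then reused, so the cost of reflection is paid only at startup.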

The last part of this entry is about serialization in Deiphobus. Because Cassandra is used as the storage, each entity is stored under one key, in one column family, in one column. The entity is serialized at the moment of storing. The serialized entity is stored in Cassandra together with its inverted indexes, which are based on the values retrieved just before the entity is saved in the database. At the moment, serializers can be set up on two levels:

  • the default, used by all classes not having their own
  • entity class specific

All other types are always serialized using the default serializer. This behavior may be subject to change.
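The two levels can be pictured as a simple lookup with a fallback. The sketch below is only an illustration under assumptions: SerializerResolver, Register and Resolve are hypothetical names, and serializers are represented by their Type, in the same way as the SerializerType property of EntityClassModel.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: picks the entity-specific serializer type if one was
// registered, otherwise falls back to the default one.
public sealed class SerializerResolver
{
    private readonly Type _defaultSerializerType;
    private readonly Dictionary<Type, Type> _perEntity = new Dictionary<Type, Type>();

    public SerializerResolver(Type defaultSerializerType)
    {
        if (defaultSerializerType == null)
            throw new ArgumentNullException(nameof(defaultSerializerType));
        _defaultSerializerType = defaultSerializerType;
    }

    // Level two: a serializer set for one mapped entity class.
    public void Register(Type entityType, Type serializerType)
    {
        _perEntity[entityType] = serializerType;
    }

    // Level one is the fallback for every type without its own entry.
    public Type Resolve(Type type)
    {
        Type specific;
        return _perEntity.TryGetValue(type, out specific)
            ? specific
            : _defaultSerializerType;
    }
}
```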

Dremel

I’ve just finished reading the Dremel whitepaper. It seems that Google has once more brought to life something that may change the IT world. Imagine queries running against trillions of rows and returning results in a few seconds; imagine a fault-tolerant db that scales linearly and still allows you to query it in very advanced ways (for instance, using grouping). Yeah, I’m aware of VoltDB, but Dremel’s description was just astonishing.

On the other hand: have you ever had a chance to test your scalable app on 2900 servers? :)

Going to NoSQL

Cassandra is a NoSQL database, hence there is no easy way to map your SQL experience onto this brave new world of no relations. The most fundamental paper, which helped me get the idea of Cassandra, was Google’s white paper describing their Bigtable. The document is a must-read for all Cassandra (and Bigtable) users. It describes the data structures used and shows how Google uses them for storing page indexes. Once you read it, you’ll never look at the Google search engine the same way.

The second position on my list was Cassandra’s wiki, updated at the speed of light, with plenty of links on topics like eventual consistency. Believe me, you can spend at least a few days going deeper and deeper into a more cassandranic state of mind.

The nicest picture of an inverted index, describing the whole idea, can be found here. The post additionally describes the performance of inverted indexes in Cassandra databases. That was probably the moment when I asked myself whether there is any mapper that lets you save entities (for DDD fans: aggregate roots) with ease, like

 session.Save(user);

which will automatically disassemble the entity and store the properties marked as indexed in inverted indexes.

I found none.

Deiphobus was born.
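For readers new to the idea, an inverted index simply maps a property value back to the keys of the entities holding it. The sketch below keeps everything in memory and is only an illustration: InMemoryInvertedIndex, Add and Find are hypothetical names, values are simplified to strings, and a real mapper would write these entries to Cassandra instead of a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory inverted index: (property, value) -> entity ids.
public sealed class InMemoryInvertedIndex
{
    private readonly Dictionary<(string Property, string Value), HashSet<Guid>> _entries =
        new Dictionary<(string Property, string Value), HashSet<Guid>>();

    // Called for every indexed property while an entity is being saved.
    public void Add(Guid entityId, string property, string value)
    {
        var key = (property, value);
        if (!_entries.TryGetValue(key, out HashSet<Guid> ids))
        {
            ids = new HashSet<Guid>();
            _entries[key] = ids;
        }
        ids.Add(entityId);
    }

    // Answers "which entities have this value?" without scanning all entities.
    public IReadOnlyCollection<Guid> Find(string property, string value)
    {
        if (_entries.TryGetValue((property, value), out HashSet<Guid> ids))
            return ids;
        return Array.Empty<Guid>();
    }
}
```

Saving a user would then mean storing the serialized entity under its Guid key and calling Add once per indexed property, which is the bookkeeping a mapper can take off your hands.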

A story of Cassandra’s brother: Deiphobus

The Apache Cassandra Project develops, as it is written on the official page of the project, a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model (wow! :P). The mentioned properties of this architecture bring benefits, but also some limitations and challenges, such as:

  • no transaction support
  • no atomic operations (a data entry may already be stored on one of the replica nodes while the others still return stale, dirty-read data)
  • possibility of slow ‘eventual consistency’ (see http://wiki.apache.org/cassandra/HintedHandoff)
  • no obvious querying (there is no way to simply ask for a song titled ‘Octavarium’)

Next, the Thrift-based Cassandra client provides a very low-level API, which offers no connection pooling, no identity map and none of the many other features well known from other db access tools (ADO.NET, NHibernate). According to Cassandra’s list of .NET clients, there are three of them:

Although they provide a nice object wrapper around the Thrift interface (or LINQ in Fluent Cassandra), they lack a few features. These deficiencies brought to life the idea of Deiphobus, a more advanced access API for the Cassandra database. The core features of Deiphobus will be:

  • unit of work pattern
  • identity map
  • automatic inverted index creation (based on a fluent configuration)
  • compensation strategies (for the lack of transactions)
  • future queries (the very same as in NHibernate)

In the next post I’ll describe the influences and basics of Deiphobus design.

Take care