Deiphobus, no more SELECT n + 1

The previous post described lazy loading of groups of properties; let’s call them families, as Cassandra does. Now consider the following code. How many db hits would you expect by default?

using (var s = sessionFactory.Open())
{
	var user = s.Load<IUser>(5);
	foreach(var post in user.Posts)
	{
		Console.WriteLine(post.Title);
	}
}

I’ll tell you how many you’ll get: two. The first hit occurs when the collection of posts is accessed in the foreach loop, the second when the first title is printed to the console. During the second hit, the titles of all the posts loaded in the session are fetched, so printing the remaining titles does not hit the database again. In some cases this may cause a small overhead, but in the majority of cases it simplifies batching and working with your entities. Would anyone like to set a FetchMode, as is done in NHibernate? ;)
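The batching described above can be sketched in a few lines of C#. This is not Deiphobus’s code: `FakeStore`, `LazyPost` and `BatchingSession` are hypothetical names, and the store is an in-memory dictionary that only counts round trips. The idea it illustrates is that the first access to any title loads the titles of all posts tracked by the session in one hit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for Cassandra; it only counts round trips.
public sealed class FakeStore
{
    private readonly Dictionary<int, string> _titles;
    public int Hits { get; private set; }

    public FakeStore(Dictionary<int, string> titles) => _titles = titles;

    // A single round trip returns the titles of all requested keys.
    public Dictionary<int, string> LoadTitles(IEnumerable<int> keys)
    {
        Hits++;
        var result = new Dictionary<int, string>();
        foreach (var key in keys) result[key] = _titles[key];
        return result;
    }
}

// Hypothetical post proxy; its title is fetched through the owning session.
public sealed class LazyPost
{
    private readonly BatchingSession _session;
    internal string LoadedTitle; // null until a batch containing this post has run

    public int Id { get; }

    internal LazyPost(int id, BatchingSession session)
    {
        Id = id;
        _session = session;
    }

    public string Title => LoadedTitle ?? _session.LoadTitleBatch(this);
}

// Hypothetical session that batches title loads for every post it tracks.
public sealed class BatchingSession
{
    private readonly FakeStore _store;
    private readonly List<LazyPost> _tracked = new List<LazyPost>();

    public BatchingSession(FakeStore store) => _store = store;

    // Registers a post without touching the store.
    public LazyPost Track(int id)
    {
        var post = new LazyPost(id, this);
        _tracked.Add(post);
        return post;
    }

    // Loads the title of every tracked post that still lacks one, in one hit.
    internal string LoadTitleBatch(LazyPost requested)
    {
        List<LazyPost> pending = _tracked.FindAll(p => p.LoadedTitle == null);
        Dictionary<int, string> titles = _store.LoadTitles(pending.ConvertAll(p => p.Id));
        foreach (var post in pending) post.LoadedTitle = titles[post.Id];
        return requested.LoadedTitle;
    }
}
```

In this sketch, printing the titles of three tracked posts costs a single store hit; a real mapper would also have to decide which properties belong to such a batch.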

Deiphobus, mapping property groups to column families

As stated in the official documentation, a column family can be compared to a table in a relational database, and, as with a table, the main reason for creating one is to keep objects of the same type together, which gives a better model, allows querying, etc. In Cassandra there is one more advantage: all the columns of a row in a column family are stored together on the same server (rows are distributed by consistent hashing of their key), so you may think of a column family as a set of items frequently used together, for instance the surname and the name, or the user name and the password.
In NHibernate you can embed such values in a component to make them look like a whole, but they are still stored in the same table as the rest of the entity. Deiphobus, on the other hand, can easily be configured to group properties (simple properties as well as many-to-one or one-to-one references to other entities) by implementing a convention interface:

using System;
using System.Reflection;

public interface IPropertyFamilyConvention
{
    /// <summary>
    /// Gets the Cassandra family name.
    /// </summary>
    /// <param name="mappedType">The mapped type.</param>
    /// <param name="propertyInfo">The info of a mapped property.</param>
    /// <returns>The family name.</returns>
    FamilyName GetFamilyName(Type mappedType, PropertyInfo propertyInfo);
}

It’s worth mentioning that by default all properties are mapped to a single column family called ‘Entity’. You now know how this affects storage from Cassandra’s point of view, but what about Deiphobus? As implemented, every time you access a not yet loaded property of an entity mapped with Deiphobus, the whole column family containing that property is loaded from the database (a bit more data is actually retrieved, but for simplicity that detail can be omitted here). This means that once you start using a property that is strongly connected with others, all the related properties are loaded in one db hit. Simple and powerful, isn’t it?
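To make the family-per-hit behaviour concrete, here is a minimal sketch. It assumes a simplified, hypothetical `ISimpleFamilyConvention` that returns a plain string instead of Deiphobus’s `FamilyName`; `CredentialsConvention`, `User` and `FamilyLoader<T>` are likewise illustrative names, and the "database" is a dictionary of column values. It is not how Deiphobus is implemented, only an illustration of loading a whole family on the first read of any of its properties.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical, simplified cousin of IPropertyFamilyConvention that returns
// a plain string instead of Deiphobus's FamilyName.
public interface ISimpleFamilyConvention
{
    string GetFamilyName(Type mappedType, PropertyInfo propertyInfo);
}

// Keeps the user name and password together; everything else goes to 'Entity'.
public sealed class CredentialsConvention : ISimpleFamilyConvention
{
    private static readonly HashSet<string> CredentialProperties =
        new HashSet<string> { "UserName", "Password" };

    public string GetFamilyName(Type mappedType, PropertyInfo propertyInfo) =>
        CredentialProperties.Contains(propertyInfo.Name) ? "Credentials" : "Entity";
}

public class User
{
    public string UserName { get; set; }
    public string Password { get; set; }
    public string Email { get; set; }
}

// Loads every property of a family the first time any one of them is read.
public sealed class FamilyLoader<T>
{
    private readonly ISimpleFamilyConvention _convention;
    private readonly Dictionary<string, string> _row; // stored column values by property name
    private readonly Dictionary<string, string> _loaded = new Dictionary<string, string>();

    public int Hits { get; private set; }

    public FamilyLoader(ISimpleFamilyConvention convention, Dictionary<string, string> row)
    {
        _convention = convention;
        _row = row;
    }

    public string Get(string propertyName)
    {
        if (_loaded.TryGetValue(propertyName, out var cached)) return cached;

        PropertyInfo requested = typeof(T).GetProperty(propertyName)
            ?? throw new ArgumentException("Unknown property: " + propertyName);
        string family = _convention.GetFamilyName(typeof(T), requested);

        Hits++; // simulated db hit that fetches the whole family
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            if (_convention.GetFamilyName(typeof(T), property) == family)
                _loaded[property.Name] = _row[property.Name];
        }
        return _loaded[propertyName];
    }
}
```

Reading `UserName` and then `Password` costs one simulated hit, because both live in the same family; reading `Email` afterwards costs a second one, since it stays in the default ‘Entity’ family.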

Deiphobus, lazy load

Last time, when Deiphobus’s identity map was described, the same user entity was requested several times. We already know that the same object was returned each time, but what about hitting the Cassandra database? Consider the following code:

var user = session.Load<IUser>(5);
// some other loads and operations
var login = user.Login; // here the db is hit!

As you can see, the database is not hit until one of the properties is read. This is the default, and the only, mode of loading entities in Deiphobus. It allows a great reduction of db calls if your code is structured the right way (query for data first, then operate on it). The question is: what if a user holds a few properties that are massive in terms of bytes transported? Will all of them be loaded at once, even if they are not needed? The answer will be revealed in the very next post.
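The deferred loading described above can be sketched with the BCL’s `Lazy<T>`. This is an assumption-laden illustration, not Deiphobus’s proxy mechanism: `CountingStore`, `LazyUser` and `LazySession` are hypothetical names, and the store is an in-memory dictionary that counts fetches. `Load` only records the key; the fetch happens on the first read of `Login`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store that counts how often it is queried.
public sealed class CountingStore
{
    private readonly Dictionary<int, string> _logins;
    public int Hits { get; private set; }

    public CountingStore(Dictionary<int, string> logins) => _logins = logins;

    public string FetchLogin(int id)
    {
        Hits++;
        return _logins[id];
    }
}

// Hypothetical lazy user: construction records the key, the fetch is deferred.
public sealed class LazyUser
{
    private readonly Lazy<string> _login;
    public int Id { get; }

    public LazyUser(int id, CountingStore store)
    {
        Id = id;
        _login = new Lazy<string>(() => store.FetchLogin(id)); // runs at most once
    }

    public string Login => _login.Value;
}

public sealed class LazySession
{
    private readonly CountingStore _store;
    public LazySession(CountingStore store) => _store = store;

    // No db access here; the returned object fetches on demand.
    public LazyUser Load(int id) => new LazyUser(id, _store);
}
```

With this design, `Load` is free and repeated reads of `Login` reuse the first result, which is why "query for data first, then operate" pays off.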

Deiphobus, first level caching

Another feature of my mapper for the Cassandra database is the first level cache, commonly known as the identity map. Consider loading a user entity only to display its login in the page header. Next, you’d like to get the same property to display it in several other places across the page (for instance as the author of posts). I assume that you use DI and that your app creates only one session object per web request, injecting it into all the places that need it. Consider the following code:

var user1 = session.Load<IUser>(5);
var user2 = session.Load<IUser>(5);
var user3 = session.Load<IUser>(5);

Now compare all the returned instances with ReferenceEquals. Yes, it’s the same object! This means there will be no problems with dirty tracking (which instance should I persist?). Also, once a user is loaded by a session object, it can be retrieved again without any db hits. Isn’t it nice?
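An identity map boils down to a per-session dictionary from key to instance. The sketch below is a hypothetical illustration, not Deiphobus’s session: `UserEntity` and `IdentityMapSession` are made-up names, and `Materializations` stands in for db hits by counting how many instances were actually created.

```csharp
using System;
using System.Collections.Generic;

public sealed class UserEntity
{
    public int Id { get; }
    public UserEntity(int id) => Id = id;
}

// Hypothetical first level cache: one instance per key within a session.
public sealed class IdentityMapSession
{
    private readonly Dictionary<int, UserEntity> _identityMap = new Dictionary<int, UserEntity>();

    // Counts how many entities were really created (a stand-in for db hits).
    public int Materializations { get; private set; }

    public UserEntity Load(int id)
    {
        if (_identityMap.TryGetValue(id, out var cached)) return cached;

        Materializations++;
        var user = new UserEntity(id);
        _identityMap.Add(id, user);
        return user;
    }
}
```

Because the map lives in the session, the guarantee holds only within one session, which is why a single session per web request matters.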

Deiphobus, unit of work

You can find a very nice description of the Unit of Work pattern here. It describes a mythological artifact that understands what you are doing with objects retrieved from a database and lets you easily persist the whole state back to it. Typically, clients for NoSQL databases do not provide such a high-level persistence abstraction. Deiphobus makes a difference by providing the mighty unit of work, well known from other object mappers, called Session. Below you can find a simple example of session usage:

// sessionFactory created earlier from a configuration
using (var s = sessionFactory.Open())
{
	// entity creation makes it tracked, the state will be saved at s.Flush();
	var user = s.Create<IUser>(t =>
	{
		t.Email = "Cassandra@cassandra.org";
		t.AllowMarketingEmails = true;
	});

	var secret = s.Create<IPersonalInfo>(t =>
	{
		t.Type = PersonalInfoType.SecretFact;
		t.Body = "I dislike SQL paradigm";
	});

	var quote = s.Create<IPersonalInfo>(t =>
	{
		t.Type = PersonalInfoType.Quote;
		t.Body = "To be SQL or not to be";
	});
	
	user.Infos.Add(secret);
	user.Infos.Add(quote);
	
	// Flush to persist all changes at once! Dirty checks, etc. are handled without your coding!
	s.Flush();
}

It is simple, and you do not have to check what should be saved. Just open a session and call Flush when you want your changes to be persisted. Of course, it isn’t transactional, as Cassandra itself isn’t, but it still puts a nice layer over the standard Thrift protocol.
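The automatic dirty checking behind Flush can be illustrated with snapshots: remember the state of each tracked entity, and on Flush write only the entities whose current state differs. This is a minimal sketch under stated assumptions, not Deiphobus’s implementation: `Account` and `UnitOfWork` are hypothetical names, and "writes" are recorded as strings instead of Thrift calls.

```csharp
using System;
using System.Collections.Generic;

public sealed class Account
{
    public string Email { get; set; }
    public bool AllowMarketingEmails { get; set; }
}

// Hypothetical unit of work: tracks created entities and, on Flush,
// writes only new or changed ones, based on a snapshot comparison.
public sealed class UnitOfWork
{
    // State as of the last Flush; null means the entity was never persisted.
    private readonly Dictionary<Account, (string Email, bool AllowMarketing)?> _snapshots =
        new Dictionary<Account, (string Email, bool AllowMarketing)?>();

    // Stand-in for the operations that would be sent to the database.
    public List<string> Writes { get; } = new List<string>();

    public Account Create(Action<Account> init)
    {
        var account = new Account();
        init(account);
        _snapshots.Add(account, null); // creation makes the entity tracked
        return account;
    }

    public void Flush()
    {
        // Copy the keys so snapshots can be updated while iterating.
        foreach (var account in new List<Account>(_snapshots.Keys))
        {
            (string Email, bool AllowMarketing) current = (account.Email, account.AllowMarketingEmails);
            var snapshot = _snapshots[account];

            if (snapshot == null)
                Writes.Add("insert " + account.Email);
            else if (!snapshot.Value.Equals(current))
                Writes.Add("update " + account.Email);

            _snapshots[account] = current;
        }
    }
}
```

Flushing twice without changes writes nothing the second time; changing a property and flushing again produces exactly one update. Comparing snapshots keeps entities free of change-tracking code, at the cost of holding a copy of their state per session.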

Deiphobus, it simply works with your Cassandra

I’ve migrated Deiphobus to GitHub. Then I refactored it a lot, or, more accurately, rewrote it. Now it is a fully functional wrapper for Cassandra. The newest version, pushed recently, completes the main list of features I wanted to add. If you want to use Cassandra from your .NET code, I strongly encourage you to take a look at (at least) the readme file, which covers the majority of cases where you may use it. The list of features is:

  • unit of work (ISession)
  • first level caching (identity map), which does not hit the db when an entity is requested twice in one session
  • lazy loading: loading an entity does not hit the db until you read one of its properties
  • mapping property groups to column families, which allows you to load commonly used sets of properties in one db hit
  • prefetching the data of several entities at once to avoid the ‘SELECT N+1’ problem
  • entity references: one entity can reference another simply by declaring a property of the other type
  • entity collections: one entity can reference a collection of others; collections are lazy loaded, so adding entities does not load the whole collection from the db
  • automatic dirty checks performed when ISession.Flush is called
  • the latest feature: a second level cache for infrequently changing data (column families), which caches a part of your entity (for instance a user name together with its whole column family) in memory (the default implementation)

I’ll describe all of them in forthcoming entries, as a lot has changed since the last Deiphobus post.

Themis

I have been thinking about authorization for the past few days.

Deiphobus is on hold. At least for a while, I have stopped developing this small ‘inverted-index-and-allow-me-to-query-non-relational-Cassandra’ system. For fans of inverted indexes and the Cassandra world, I can point you to Lucandra, an automatic inverted indexer (the search indexes are created when an object is saved in the db) driven by the same idea as Deiphobus. The reason for stopping Deiphobus is a project I need right now. I looked through the projects providing authorization and found none that would let me define all the rules behind authorization in a simple but multidimensional way.

The project is named Themis and is located at http://themis.codeplex.com/

The main idea behind Themis is to allow several dimensions of configuration with no impact on your domain model (there is no interface for your domain classes to implement). The Themis model consists of:

  • demands
  • role definitions
  • your domain-specific roles and entity types, untouched by Themis

In future posts I’ll write more about each of these parts.

My next tasks in the project will be finishing the example and integrating with NHibernate (that will be tough) to allow filtering session results with your demands!