Marten: my Open Source experience


This post describes some of my Open Source experiences while working on Marten, a Polyglot Persistence for .Net Systems using the Rock Solid Postgresql Database (TM). This isn’t me boasting, but rather sharing a pleasant story about getting involved in some solid OSS.


The main person responsible for Marten and its main committer is Jeremy D. Miller. He’s a well-known figure in the .NET Open Source world as the author of StructureMap. I had the pleasure of working with him before in the OSS area, when we were trying to create a better NuGet client called Ripple. Before creating Marten, he spent some time working with RavenDB. He described it thoroughly in Would I use RavenDB again? and Moving from RavenDB to Marten. These entries are pure gold; I encourage you to read them.

What is Marten

Marten is a client library for .NET that enables you to use the Postgres database in two ways:

  1. as a document database
  2. as an event sourcing database

Both are included in the same library, as they share a lot. The very foundation of Marten is Postgres’ ability to process JSON and treat it as a first-class citizen. First class, you say, what does that even mean? You can use db parameters of the JSONB type, indexes (that are truly ACID) and a lot more. Given that Postgres is a mature database with a well-performing storage engine, you can build a lot on top of it.

Performance matters

I started to work on Marten after seeing the JSON serializer seam. It contained only a method for serializing objects that returned a string.


This means that for every operation that involved obtaining a serialized version of an object (both a document and an event), a new string would be allocated. For low traffic this could be acceptable, but when operating on many documents or appending a lot of events, it isn’t.
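The shape of that seam can be sketched as follows. This is a hypothetical reconstruction for illustration, not Marten’s actual interface; ISerializer and NaiveSerializer are made-up names here:

```csharp
using System;
using System.IO;

// Hypothetical reconstruction of the serializer seam, not Marten's exact interface.
public interface ISerializer
{
    // Original shape: every call allocates a brand new string.
    string ToJson(object document);

    // Allocation-friendly shape: the caller supplies the destination,
    // so the serialized chars can land in a reusable buffer.
    void ToJson(object document, TextWriter writer);
}

public class NaiveSerializer : ISerializer
{
    public string ToJson(object document)
    {
        // Each call builds and returns a brand new string instance.
        var sw = new StringWriter();
        ToJson(document, sw);
        return sw.ToString();
    }

    public void ToJson(object document, TextWriter writer)
    {
        // A toy "serializer"; real code would walk the object graph.
        writer.Write("{\"value\":\"");
        writer.Write(document);
        writer.Write("\"}");
    }
}
```

With only the string-returning overload on the interface, a caller has no way to avoid the allocation; the TextWriter overload is what opens the door to pooling.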

We had a short discussion in this issue, which was followed by a PR of mine where I introduced a buffer pool that provides TextWriters, and used it for upserting documents in the database. I chose to work with the TextWriter abstraction as all good .NET JSON serializers are capable of working with it. I encourage you to follow the PR and the issue; again, it shows how even a somewhat excessively described issue can help in communication and in making things happen.
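The pooling idea can be sketched like this. It is a minimal, illustrative pool (WriterPool is a made-up name), not the implementation from the PR:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text;

// Minimal sketch of the pooling idea, not the actual implementation from the PR:
// writers are rented for a single upsert and returned with their buffer kept alive,
// so the underlying char storage is reused instead of reallocated per document.
public static class WriterPool
{
    static readonly ConcurrentBag<StringWriter> Pool = new ConcurrentBag<StringWriter>();

    public static StringWriter Rent()
    {
        StringWriter writer;
        if (Pool.TryTake(out writer))
        {
            // Reset the underlying buffer but keep its capacity.
            writer.GetStringBuilder().Clear();
            return writer;
        }
        return new StringWriter(new StringBuilder(4096));
    }

    public static void Return(StringWriter writer)
    {
        Pool.Add(writer);
    }
}
```

A caller rents a writer, lets the serializer write into it, passes the result to the db command, and returns the writer, so steady-state traffic allocates (almost) nothing per document.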

Performance matters even more

I wanted to introduce the same pooling for appending events and for bulk inserts. This was blocked by npgsql, the Postgres .NET driver, though. Because of its internal structure, when using array parameters the driver could not extract the size of a passed text from the parameter. It didn’t support arrays of ArraySegment<char> as parameters either… I issued the related npgsql PR to fix it and pinged Shay to ask how fast we could get it shipped. It was released in under a week.

Meanwhile, I provided PRs for events and bulk inserts that were RED on TeamCity. Once the new client library, npgsql 3.2, was released, Jeremy merged these two PRs, making Marten allocate a lot less on the write side.

No company, lots of work done

Jeremy, Shay and I – we share no manager and no company, and we work in different time zones. Still, as a group, we were able to deliver meaningful performance improvements for a library that we care about. This shows the true power of Open Source.


I hope you enjoyed this positive journey of mine. This summary does not mean I have finished working on Marten. I’d say it’s quite the opposite.

Event sourcing for DBAs


Are you in a team trying to convince a database administrator to use event sourcing? Are you a DBA who is being convinced? Or maybe it’s you and a person who does not want to change the relational point of view. In any case, read on.

Drop. Every. Single. Table

That’s the argument you should start with. You can drop every single table and your database will be just fine. Why is that? There’s a recovery option. If you don’t run your database with some silly settings, you can just recover all the tables with all the data. Alright, but where do I recover from? What’s the source of truth if I don’t have tables?


The tables are just a cache. All the data is stored in the transaction log file. Yes, the file that is always too big. If all the data is in there, can it be queried somehow?

SELECT * FROM fn_dblog(null, null)

That’s a query using an undocumented SQL Server function. It simply reads the database log file and displays it as a nicely formatted table like this:


Take a look at the operation column now. What can you see?


What are these? That’s right: every single change that has been applied to this database so far. If you spent enough time with this data, you’d be able to decode the payloads or changes applied to the database in every single event. Having said this, it looks like the database itself is built on an append-only store that saves all the changes done to the db.

Event Sourcing

Event sourcing does exactly the same. Instead of using database terms and names of operations, it incorporates the business language, so you won’t find LOP_MODIFY_COLUMNS with a payload that needs to be parsed, but rather an event with an explicit business-related name appended to a store. Or to a log, if you want to call it that. Of course there’s a cost to making tables out of it once again, but it pays back by pushing the business understanding and meaning much deeper into the system and bringing the business closer to the system.

In the end these tables will be needed as well, to query something. But they won’t be treated as the source of truth; they are just a cache anyway, right? The truth is in the log.
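To make the analogy concrete, here is a toy append-only store with business-named events, where the “table” is just a projection rebuilt from the log. This is an illustrative sketch only, not Marten’s event store API; all type names are made up:

```csharp
using System;
using System.Collections.Generic;

// Events carry explicit business names instead of LOP_MODIFY_COLUMNS-style payloads.
public abstract class Event { }
public class AccountOpened : Event { public decimal InitialBalance; }
public class MoneyDeposited : Event { public decimal Amount; }

public class EventStore
{
    // The log is the source of truth: append-only, never updated in place.
    readonly List<Event> log = new List<Event>();

    public void Append(Event e) => log.Add(e);

    // The "table" is just a cache rebuilt from the log, exactly like tables
    // recovered from the transaction log file.
    public decimal ProjectBalance()
    {
        decimal balance = 0;
        foreach (var e in log)
        {
            switch (e)
            {
                case AccountOpened o: balance = o.InitialBalance; break;
                case MoneyDeposited d: balance += d.Amount; break;
            }
        }
        return balance;
    }
}
```

Dropping the projection loses nothing: replaying the log rebuilds it, just like SQL Server recovery rebuilds tables.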

Google’s Percolator

Another great whitepaper from Google: Percolator. It describes a new indexing engine which no longer uses the massive MapReduce algorithm to calculate web page indexes. As always, Google used a few already existing tools, like BigTable, and built an index updater which, unlike MapReduce, updates small chunks of the index repository, drastically reducing 🙂 the time between a page being crawled and being indexed. Worth reading, if only for the transaction scheme implemented on top of the non-transactional BigTable.

Solrnet NHibernate integration

Every developer who creates scalable applications with a high read/write ratio finally has to move to some query-oriented storage which allows almost instant, advanced querying. Solr is one of the frequently used solutions, providing nice performance with advanced indexing/querying. Talking to Solr requires an API, and for .NET you can use SolrNet. As described on the project page, it provides a nice and simple way to integrate with NHibernate, simply by calling the following code:

NHibernate.Cfg.Configuration cfg = SetupNHibernate();
var cfgHelper = new NHibernate.SolrNet.CfgHelper();
cfgHelper.Configure(cfg, true); // true -> autocommit Solr after every operation (not really recommended)

The integration with NH allows you to use an extended session to query the Solr server. Additionally, all your updates, inserts and deletes will be automatically replayed against Solr. This feature is provided by NH event listeners registered by the described configuration helper. Let’s take a look and audit a bit of code:

public Configuration Configure(Configuration config, bool autoCommit)
{
    foreach (Type type in this.mapper.GetRegisteredTypes())
    {
        Type type2 = typeof(SolrNetListener<>).MakeGenericType(new Type[] { type });
        Type serviceType = typeof(ISolrOperations<>).MakeGenericType(new Type[] { type });
        object service = this.provider.GetService(serviceType);
        ICommitSetting listener = (ICommitSetting) Activator.CreateInstance(type2, new object[] { service });
        listener.Commit = autoCommit;
        this.SetListener(config, listener);
    }
    return config;
}

For each of the mapped types a new listener is created. Is that OK? Why not use one listener (with no generics at all) handling all the mapped types?
Additionally, doesn’t the SetListener method clear all the previously registered listeners? So… what about the types handled previously?

A more interesting question can be raised when looking through the SolrNetListener<T> code. All the listeners in NH are singletons: they are initialized during the start-up phase and used till the application’s end. Hence, listeners should be stateless, or use some kind of current-context resolver passed in their constructor. The SolrNetListener uses WeakHashtable fields to store the entities which should be flushed to Solr. Effectively a static field (because the listener is a singleton)? What about race conditions, locking, etc.? Let’s take a look:

The Delete method, called when a Solr delete should be deferred (the session uses a transaction), shows that the listener GLOBALLY locks the execution of all threads:

private void Delete(ITransaction s, T entity)
{
    lock (this.entitiesToDelete.SyncRoot)
    {
        if (!this.entitiesToDelete.Contains(s))
        {
            this.entitiesToDelete[s] = new List<T>();
        }
        ((IList<T>) this.entitiesToDelete[s]).Add(entity);
    }
}

Furthermore, data can be committed to the Solr server but not committed to the SQL database! It can happen when flushing occurs: the override of the OnFlush method saves the data to Solr before flushing the db changes.
What if optimistic locking rolls back the whole transaction after the data was stored in Solr? I’d rather have no data in Solr and use some background worker to upsert data into Solr than have copies of nonexistent entities.

The right behavior for Solr, with no global locks and with changes flushed to Solr only after the SQL db transaction commits, can be achieved using an Interceptor and a rewritten event handler. I’ll describe it in the next post.
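The idea can be sketched as follows. The types here (PostCommitSolrFlusher, the pushToSolr delegate) are hypothetical stand-ins for illustration, not the actual SolrNet or NHibernate API:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the post-commit flush idea: pending Solr operations are queued
// per transaction and pushed only after the database transaction really
// committed, with no global lock shared between threads.
public class PostCommitSolrFlusher<T>
{
    // One queue per session/transaction, so threads never share state.
    readonly List<T> pendingAdds = new List<T>();
    readonly Action<IEnumerable<T>> pushToSolr;

    public PostCommitSolrFlusher(Action<IEnumerable<T>> pushToSolr)
    {
        this.pushToSolr = pushToSolr;
    }

    public void OnSave(T entity) => pendingAdds.Add(entity);

    // Called from the transaction completion callback.
    public void AfterTransactionCompletion(bool committed)
    {
        if (committed)
            pushToSolr(pendingAdds);
        // On rollback nothing reaches Solr, so no copies of
        // nonexistent entities are ever indexed.
        pendingAdds.Clear();
    }
}
```

Because the queue lives with the session rather than in a singleton listener, no `lock` over a global collection is needed, and a rolled-back transaction leaves Solr untouched.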


Deiphobus design, pt. 1

It’s the right time to write about Deiphobus’ design. I’ll start with an example of usage, then move to configuration, serialization and the model created from the config. In the next post, the event-based design of the session implementation, the identity map and its usage will be explained, as well as the lifetime of a session and the query API.

The configuration provided with Deiphobus is a fluent, “just code” configuration. Each entity is described with a generic EntityClassMap.

/// <summary>
/// The mapping of the class, marking it as an entity stored under a separate key.
/// </summary>
/// <typeparam name="T">The type to be mapped.</typeparam>
public abstract class EntityClassMap<T>
    where T : class
{
    protected void Id(Expression<Func<T, Guid>> id);
    protected IndexPart IndexBy(Expression<Func<T, object>> memberExpression);
    protected void SetSerializer<TSerializer>()
        where TSerializer : ISerializer;
}
The class interface was designed to be similar to Fluent NHibernate:

  • a class deriving from this type marks T as a mapped entity class
  • the Id method marks the specified property as the id. It’s worth noticing that only Guid identifiers are available
  • the second method is IndexBy, used to mark a property to be indexed with an inverted index. Only properties marked with this method can be queried in Deiphobus queries; running a query against a non-indexed property will throw an exception
  • the very last method allows setting a custom serializer type for the mapped entity type

All the mappings are consumed by a mapping container, which registers all entity class maps in itself. The maps are translated into EntityClassModel objects describing the properties of the specific entity. This process takes place when the session factory is created. On the basis of each model class, an object implementing the IEntityPersister interface is created. The implementation of the persister provides methods like GetPropertyValue or GetIndexedPropertyValues with IL code emitted, to overcome the reflection overhead. This class will be described later; the EntityClassModel’s method signatures can be seen below:

/// <summary>
/// The class representing a model of a mapped entity.
/// </summary>
public class EntityClassModel
{
    public EntityClassModel(Type classType, PropertyInfo id, object idUnsavedValue,
        IEnumerable<IndexedProperty> indexedProperties)
    {
        ClassType = classType;
        Id = id;
        IdUnsavedValue = idUnsavedValue;
        IndexedProperties = indexedProperties.ToList().AsReadOnly();
    }

    public Type ClassType { get; private set; }
    public PropertyInfo Id { get; private set; }
    public object IdUnsavedValue { get; private set; }
    public IEnumerable<IndexedProperty> IndexedProperties { get; private set; }
    public Type SerializerType { get; set; }
}
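The persister’s reflection-free accessors can be sketched with compiled expression trees. Deiphobus emits IL directly, so this is only a stand-in demonstrating the same effect of avoiding per-call reflection; PersisterSketch and Sample are made-up names:

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Sketch of a reflection-free property getter. Deiphobus emits IL directly;
// compiled expression trees shown here achieve a similar effect.
public static class PersisterSketch
{
    public static Func<object, object> BuildGetter(PropertyInfo property)
    {
        // Builds: (object o) => (object)((TEntity)o).Property
        var instance = Expression.Parameter(typeof(object), "o");
        var body = Expression.Convert(
            Expression.Property(
                Expression.Convert(instance, property.DeclaringType),
                property),
            typeof(object));
        // Compiling once up front replaces slow PropertyInfo.GetValue calls.
        return Expression.Lambda<Func<object, object>>(body, instance).Compile();
    }
}

// A sample entity used only to demonstrate the getter.
public class Sample
{
    public string Name { get; set; }
}
```

The factory pays the compilation cost once per mapped property when the session factory is created; every subsequent GetPropertyValue call is then an ordinary delegate invocation.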

The very last part of this entry is about serialization in Deiphobus. Because Cassandra is used, each entity is stored under one key, in one column family, in one column. The entity is serialized at the moment of storing. The serialized entity is stored in Cassandra along with its inverted indexes, based on values retrieved just before saving the entity in the database. At the moment, two levels of serializers can be set up:

  • the default, used by all classes not having their own
  • entity class specific

All other types are always serialized using the default serializer. This behavior may be subject to change.
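The two-level lookup can be sketched like this; SerializerRegistry is a hypothetical name and the sketch uses plain objects in place of Deiphobus’ serializer types:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the two-level serializer lookup described above:
// an entity-specific serializer wins, everything else falls back to the default.
public class SerializerRegistry
{
    readonly object defaultSerializer;
    readonly Dictionary<Type, object> perEntity = new Dictionary<Type, object>();

    public SerializerRegistry(object defaultSerializer)
    {
        this.defaultSerializer = defaultSerializer;
    }

    // Registered via SetSerializer<TSerializer>() in the entity class map.
    public void Register(Type entityType, object serializer)
        => perEntity[entityType] = serializer;

    public object Resolve(Type type)
    {
        object s;
        return perEntity.TryGetValue(type, out s) ? s : defaultSerializer;
    }
}
```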


I’ve just finished reading the Dremel whitepaper. It seems that Google has once more brought to life something which may change the IT world. Imagine queries running against trillions of rows and returning results in a few seconds; imagine a fault-tolerant db that scales linearly and still allows you to query it in very advanced ways (for instance, using grouping). Yeah, I’m aware of VoltDB, but Dremel’s description was just astonishing.

On the other hand: have you ever had the chance to test your scalable app on 2900 servers? 🙂