Sewing Machine for Service Fabric


This is an introductory post for my new Open Source journey project, Sewing Machine that aims at extending capabilities of Azure Service Fabric services in various ways. It’s focused on speed and performance, but also aims at delivering more capabilities build on top of the powerful Service Fabric foundations.

Services and actors

Service Fabric provides various capabilities. It can host processes and containers, enabling you to control resources usage on a much more granular level. You can host multiple processes on one VM, if they don’t require that much CPU/memory. Beside this, SF provides following models for writing your applications:

  • stateless services
  • stateful services
  • actor framework

These parts will be covered in the forthcoming posts.

Can we do more

I’ve spent some time reading official docs and decompiling Fabric sources and found that there are possibilities of building more on top of these strong foundations. Does more mean better? Possibly yes. I think that in Sewing Machine I’ll be able to allocate much less, pin no memory and use better serialization. This is low level. On a higher level I hope for:

  • event sourced actors
  • better use of secondary replicas (like using them for running processes etc)
  • multi actor projections

Journey, not path

Sewing Machine is in its initial phase. This is another journey project of mine as it’s a project based on uncovering what’s behind Service Fabric. I will share my findings in following blog posts and work towards reaching the aims above. This also means that version 1.0 is highly unlikely to be published within a month or two. For sure this journey will take a bit more and hopefully findings will be good enough to release it.

I hope you’re eager to do some sewing with this fabric. I am.


The cost of scan queries in Azure Table Storage

There are multiple articles describing the performance of Azure Table Storage. You probably read the entry of Troy Hunt, Working with 154 million records on Azure Table Storage…. You may have invested your time in reading How to get most out of Windows Azure Tables as well. My question is have you really considered the limitations of the queries, specifically scan queries and how they can consume the major part of Azure Performance Targets.

The PartitionKey and RowKey create the primary and the only index in ATS (Azure Table Storage). Depending on the query the following kinds can be distinguished:

  1. Point Queries, which are queries to retrieve a single entity by specifying a single PartitionKey and RowKey using equality as predicate
  2. Row Range Queries, which  are queries to get a set of entities defined with the same PartitionKey and a range of RowKeys
  3. Partition Range Queries, which are run with a range of ParitionKeys
  4. Full table scans, which have no predicate for ParitionKey

What are the costs and limitations of the following queries? Unfortunately, every row that is accessed by the query to perform scan over will be counted as the table operation, Tthere ain’t no such thing as a free lunch. This means, that if you scan your entire table (4th scenario), you’ll be able to process no more than 20,000 entities per second. This limits the usage of large data sets’ scans. If you have to model queries across different keys, then you may consider storing the same value twice: once under the natural Parition/RowKey pair and the second time to match the other index, to create an inverted index. If any case, you’ll have to scan through the entire data set, then using ATS is not the way to go, and you should consider some other ways of modelling your data, like asynchronous copy data to blob, etc.

AzureDirectory – code review

The project AzureDirectory provides an Azure implementation of the Lucene.NET abstraction – Directory. It targets in providing ability to store Lucene index in the Azure storage services. The code can be found in here AzureDirectory . The packages can be found on nuget here. There aren’t marked as prereleases.

The solutions consists of two projects:

  1. AzureDirectory
  2. TestApp

The first provides implementation of the Lucene abstractions. The are a few classes, only needed for the feature implementation (implementations of Lucene abstractions). Additionally some utils class are introduced.
The code is structured with regions, which I personally dislike. Names of regions like: CTORS, internal methods, DIRECTORY METHODS shows the way the code is molded, with no classes holding common wrapped in region functionality. The lengthy methods and ctors are another disadvantage of this code base.
The spacing, using directives, fields that may be readonly are messy. Something which may be cleared with a ReSharper Clean Code is left for a reader to deal with.
You can find in there usages of Lucene obsolete API (like IndexInput.Close in disposal), as well as informative comments like:

// sometimes we get access denied on the 2nd stream…but not always. I haven’t tracked it down yet
// but this covers our tail until I do

It’s good and informative for author but leaving the project in an immature state.

The second project is not a test project but a sample app using the lib. No tests at all.

Summing up, after consideration, I wouldn’t use this implementation for my production Azure app. The code is badly composed, with no tests and left with comments pointing at situations where authors are aware of the unsolved-yet problem.

It’s getting cloudy, isn’t it?

I’ve just finished the Azure workshop. In two days course you cannot get everything, but as far as I know, the discussed topics can show what Azure is all about. I won’t rewrite plenty of blog entries and articles. What I want is to write that the cloud is the future. By the cloud I do not mean Azure, I mean the paradigm allowing you to scale as hell, to manage you site performance on the very organic level (“too much sugar – more insulin”). There is only one danger I can imagine and it’s not the security of your data. Imagine a situation that having such a scaling environment one can improve performance of his application with scaling rather then finding a bug running a 100 additional queries in each request. I hope that programmers’ culture will evolve and will disallow such behavior.