Heavy cloud but no rain

Recently I’ve been playing with Azure Functions. Perhaps I should use a bigger word than “playing”, because I implemented a fully working app using only functions. $4, that was all I needed to pay for the whole month after running some synthetic load through the app. I then spent a few additional hours just to make it $3 the next month. You could ask what the point was. Read along.

Heavy cloud

Moving to the cloud is getting easier and easier. With the new backendless approach (let’s stop calling it serverless) you can actually chop your app into pieces and pay only when they run. More than that, you get everything monitored, so you can see exactly where you spend your money. If you’re crazy enough, you could even modify the workflow of your app to push the heavy work to the end of a chain, postponing it until a user really needs it. Still, this kind of optimization and thinking doesn’t seem to be popular these days (or at least I haven’t seen it popping up that frequently).
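
A minimal sketch of that last idea, in Python and with made-up names: persist the cheap part of the work immediately and run the expensive, billed step only when a user first asks for the result.

```python
# A sketch of postponing heavy work until a user actually asks for it.
# All names here are hypothetical; render_report stands in for any
# computation you are billed for per invocation.

_raw: dict[str, str] = {}       # cheap storage, written on ingest
_reports: dict[str, str] = {}   # expensive artifacts, produced lazily

def ingest(order_id: str, payload: str) -> None:
    """Cheap path: persist the raw data and return immediately."""
    _raw[order_id] = payload

def render_report(payload: str) -> str:
    """Stand-in for the heavy, billed computation."""
    return payload.upper()

def get_report(order_id: str) -> str:
    """Heavy path: run (and pay for) the computation only on first read."""
    if order_id not in _reports:
        _reports[order_id] = render_report(_raw[order_id])
    return _reports[order_id]

ingest("42", "cheap write, no heavy compute yet")
print(get_report("42"))  # the expensive step runs here, not at ingest time
```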

But no rain

The synthetic load I used to stress the app simulated a single, not particularly active user. Real usage would probably be much higher, with a correspondingly bigger bill. So instead of treating this optimization as a mere $1, I could say that I cut the cost by 25%. This was only an experiment, but think about it again. A quick and dirty implementation was cheap, but with some additional work I made it cheaper still. If the cheapest option had cost $5, these would be real gains. These are the differences that can make you either profitable or bankrupt.
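
For the curious, the back-of-the-envelope math; the scale-up factor and the linear extrapolation are purely assumptions of mine:

```python
# Cost math from the experiment above.
naive_cost = 4.0       # $ per month, first implementation
optimized_cost = 3.0   # $ per month, after a few hours of extra work

saving = (naive_cost - optimized_cost) / naive_cost
print(f"relative saving: {saving:.0%}")  # 25%

# Under an (assumed) linear scaling to 1,000 real users instead of one
# synthetic one, the same 25% stops being pocket money:
users = 1_000
print(f"monthly saving at scale: ${(naive_cost - optimized_cost) * users:,.0f}")
```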

Make it rain

In past years, developers didn’t deal with money. Servers were there, sometimes faster, sometimes slower. Databases were there, spending countless hours running our unoptimized queries. That time is ending. Our apps will be billed, and it will be our responsibility to earn money by making them leaner and faster. Welcome to the cost-aware era of software engineering.

Cloudy cost awareness

TL;DR

Our industry was forgiving, very forgiving. You could forget to put up an index, run a query for a whole minute, and some users of your app would merely be disappointed. If you were the only one on the market, or you delivered banking systems, that was just fine, as you’d lose no clients because of it. The public cloud changes this, and if you can’t embrace that, you will pay. You will pay a lot.

Pay as you crawl

If you issue a query that scans a table in Azure Table Storage, every entity the query accesses is counted as a storage transaction. Run millions of them and your bill will grow. Maybe not by much, but it will.
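
A short sketch of the difference using the azure-data-tables Python package; the connection string, table name, and property names are placeholders:

```python
# pip install azure-data-tables
from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    "<your-connection-string>", table_name="Orders")

# Point lookup: both keys given, billed as a single transaction.
entity = client.get_entity(partition_key="customer-1", row_key="order-42")

# No key in the filter: the service scans the table, and every entity it
# touches along the way is billed as a storage transaction, whether or
# not it matches the predicate.
expensive = list(client.query_entities("Amount gt 100.0"))
```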

If you deploy a set of services as Azure Cloud Services, each consuming just 100 MB of memory, your VMs will be undersaturated. You’ll pay for memory you don’t use and for CPU that just sits in the rack hosting your VM.

Design is money

Before the public cloud, all these inefficiencies could be more or less tolerated, and they were not that easy to spot. Nowadays, with a public cloud, it’s the provider, the host, that will notice them and charge you for them. If you don’t design your systems with awareness of their environment, you will pay more.

Mitigations

This is not a black-or-white situation. It never is. You’ll probably be able to dockerize some parts of your app and host them inside a Service Fabric cluster. You’ll probably be able to use CosmosDB and its automatic indexing to fix the performance of lookups that would otherwise scan your Azure Storage Tables. There are many ways to mitigate these effects, but still, I consider a good, appropriate design the most valuable tool for making your systems not only performant and effective but, eventually, cheap.

Summary

Don’t throw your app against the wall of clouds to check if it sticks. Design it properly. Otherwise, it may stick in a very painful and cost-ineffective way.

Hot or not? Data inside of Service Fabric

TL;DR

When calculating the space needed for your Service Fabric cluster, especially in Azure, you can hit machine limits. After all, a D2 instance has only 100 GiB of local disk, and that is the disk Service Fabric stores its data on. 100 GiB might not sound small, but if you use your cluster for more than one application, you can hit the wall.

Why local?

There’s a reason behind using a local, ephemeral disk for Service Fabric storage: locality. As Service Fabric replicates the data itself, you don’t need to put it in highly available storage; the cluster provides that on its own. Keeping multiple copies in Azure Storage Services is not needed. Additionally, a local SSD is much faster. It’s a truly local disk after all.

Saturation

Service Fabric is designed to run many applications with many partitions. After all, you want to keep your cluster (almost) saturated, as using a few VMs just to run an app needed once a month would be wasteful. But if you run many applications, you need to think about capacity. Yes, you might be running stateless services, which don’t require storage, but not using stateful services would be a waste: they come with an efficient, transactional, replicated database built into them. So what about the data? What if you saturate the cluster not in terms of CPU, but of storage?
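
A back-of-the-envelope saturation check makes the point; every number below is an assumption to be replaced with your own:

```python
# Rough storage footprint of stateful services on a small cluster.
partitions_per_app = 10
apps = 8
replicas = 3              # target replica set size per partition
avg_partition_gib = 2.0   # state kept by a single replica
nodes = 5
local_disk_gib = 100.0    # e.g. a D2 instance

total_gib = partitions_per_app * apps * replicas * avg_partition_gib
per_node_gib = total_gib / nodes  # assuming replicas spread evenly

print(f"cluster-wide state: {total_gib:.0f} GiB")
print(f"per node: {per_node_gib:.0f} GiB of {local_disk_gib:.0f} GiB local disk")
# 480 GiB across 5 nodes is 96 GiB per node: the disks fill up long
# before CPU becomes an issue.
```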

Hot or not

One approach you could use is a separation between hot and cold data. For instance, users who haven’t logged in for a month could have their data considered cold. That data could be offloaded from the cluster to Azure Storage Services, leaving more space for the data that is actually needed. When writing applications that use an append-only model (for instance, ones based on event sourcing), you could think about offloading events older than X days while still ensuring that they can be accessed. Yes, the access will be slower, but it’s unlikely that you’ll need them on a regular basis.
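
A minimal sketch of the idea, with two in-memory dictionaries standing in for cluster-local state and remote blob storage:

```python
from datetime import datetime, timedelta

hot: dict[str, tuple[datetime, bytes]] = {}   # cluster-local: fast, limited
cold: dict[str, tuple[datetime, bytes]] = {}  # stand-in for blobs: cheap, slower

COLD_AFTER = timedelta(days=30)

def offload(now: datetime) -> None:
    """Move anything not touched for COLD_AFTER out of the cluster."""
    for key, (last_access, data) in list(hot.items()):
        if now - last_access > COLD_AFTER:
            cold[key] = (last_access, data)
            del hot[key]

def read(key: str, now: datetime) -> bytes:
    """Serve from hot state if present; fall back to the slow, cold path."""
    if key in hot:
        _, data = hot[key]
    else:
        _, data = cold.pop(key)  # slower round-trip to remote storage
    hot[key] = (now, data)       # promote back to hot on access
    return data
```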

Summary

When designing your Service Fabric apps and planning your cluster capacity, think the hot/cold approach through as well. It could lower your storage-space requirements and let you use the same cluster for more applications, which, effectively, is what Service Fabric is for.

The cost of scan queries in Azure Table Storage

There are multiple articles describing the performance of Azure Table Storage. You’ve probably read Troy Hunt’s entry, Working with 154 million records on Azure Table Storage…. You may have invested your time in reading How to get the most out of Windows Azure Tables as well. My question is: have you really considered the limitations of queries, specifically scan queries, and how they can consume a major part of the Azure performance targets?

The PartitionKey and RowKey create the primary and only index in ATS (Azure Table Storage). Depending on the query, the following kinds can be distinguished (see the sketch after this list):

  1. Point Queries, which retrieve a single entity by specifying a single PartitionKey and RowKey, using equality as the predicate
  2. Row Range Queries, which get a set of entities sharing the same PartitionKey, with a range of RowKeys
  3. Partition Range Queries, which are run with a range of PartitionKeys
  4. Full table scans, which have no predicate for PartitionKey
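
Expressed as OData filter strings (with made-up keys and properties), the four kinds look like this; only the first two stay within a single partition:

```python
# The four query kinds as OData filters, cheapest first.
point_query = "PartitionKey eq 'customer-1' and RowKey eq 'order-42'"

row_range = ("PartitionKey eq 'customer-1' "
             "and RowKey ge 'order-100' and RowKey lt 'order-200'")

partition_range = "PartitionKey ge 'customer-1' and PartitionKey lt 'customer-5'"

full_scan = "Amount gt 100.0"  # no PartitionKey predicate at all
```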

What are the costs and limitations of these queries? Unfortunately, every row the query accesses while performing a scan is counted as a table operation; there ain’t no such thing as a free lunch. This means that if you scan your entire table (the 4th scenario), you’ll be able to process no more than 20,000 entities per second. This limits the usefulness of scans over large data sets. If you have to model queries across different keys, consider storing the same value twice: once under the natural Partition/RowKey pair and a second time under keys matching the other lookup, creating an inverted index (sketched below). If, no matter what, you’d have to scan through the entire data set, then ATS is not the way to go, and you should consider other ways of modelling your data, like asynchronously copying the data to blobs, etc.
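
A sketch of that “store it twice” layout, with a made-up schema:

```python
# Natural layout: orders keyed by customer, so per-customer reads are
# point or row-range queries.
order = {
    "PartitionKey": "customer-1",        # natural key
    "RowKey": "order-42",
    "Product": "widget",
}

# Inverted entry: the same fact keyed by product, so "who bought widget?"
# becomes a row-range query instead of a full table scan.
by_product = {
    "PartitionKey": "product-widget",
    "RowKey": "customer-1_order-42",
    # duplicate whatever columns the lookup needs to avoid a second read
}

# Both entities are written on every insert: you trade extra storage and
# a second write for never paying the scan price at read time.
```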

It’s getting cloudy, isn’t it?

I’ve just finished an Azure workshop. In a two-day course you cannot cover everything, but as far as I can tell, the discussed topics show what Azure is all about. I won’t rewrite the plenty of blog entries and articles out there. What I want to write is that the cloud is the future. By the cloud I don’t mean Azure; I mean the paradigm that lets you scale like hell and manage your site’s performance at a very organic level (“too much sugar – more insulin”). There is only one danger I can imagine, and it’s not the security of your data. Imagine that, given such a scaling environment, one can improve the performance of his application by scaling out rather than by finding the bug that runs 100 additional queries on each request. I hope that programmers’ culture will evolve and disallow such behavior.