Azure Functions: processing 2 billion items per day (4)

Here comes the last but not least entry in a series where I'm describing a few patterns that enabled me to process 2 billion items per day using Azure Functions. The goal was to do it in a cost-aware and cost-wise manner, enabling fast processing with a small amount of money spent on it.

  1. part 1
  2. part 2
  3. part 3
  4. part 4

The first part was all about batching on the sender side. The second part was all about batching on the receiver side. The third provided a way to use Azure services without paying for function execution. The last part is about costs and money.

How much do I get for free?

When running under the Consumption Plan, you get something for free. What you get is the following:

  • 400k GB-s – a GB-s means running with 1 GB of memory consumed for 1 s
  • 1 million executions

The first item is measured with 1 ms accuracy. Unfortunately for the runner:

The minimum execution time and memory for a single function execution is 100 ms and 128 MB respectively.

This means that even if your function could run in under 100 ms, you'd still pay for the full 100 ms. Fortunately for me, using all the batching techniques from the past entries, that's not the case. I was able to run each function for much longer, removing the tax of the minimal run time.

Now the second measure. On average, there are over 2 million seconds in a month. This means that if your function executes less frequently than about once every two seconds, the one million free executions should be enough.
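To make both free allowances concrete, here's the back-of-the-envelope arithmetic (assuming a 30-day month):

seconds_per_month = 30 * 24 * 3600 = 2,592,000 s
free_executions / seconds_per_month = 1,000,000 / 2,592,000 ≈ 1 execution per ~2.6 s

For the GB-s grant: a function using the minimal 128 MB consumes 0.125 GB-s per second, so 400k GB-s covers up to 3,200,000 s of such execution. A dispatcher like the one from part 2 (~15k executions per day, i.e. ~450k per month) fits comfortably within both.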

How much did I pay?

Not much at all. Below you can find a table from Cost Management. The table includes the writer used for the synthetic tests, so the real overall cost should be much lower.

[price table from Cost Management]

This means that I was able to process 60 billion items per month, using my optimized approach, for $3.

Is it a free lunch?

Nope, it’s not. There’s no such a thing like free lunch. You’d need to add all the ingredients, like Azure Storage Account operations (queue, table, blobs) and a few more (CosmosDB, anyone?). Still, you must admit, that the price for the computation itself is unbelievebly low.

Summary

In this series we saw that by using cloud-native approaches like SAS tokens, and by treating functions a bit differently (batch computation), we were able to run under a Consumption Plan and process loads of items. As always, entering a new environment and embracing its rules brought a lot of goodness. Next time, when writing "just a function that will be executed a few million times per month", we need to think, and think again. We may pay much less if our approach truly embraces the new backendless reality of Azure Functions.


Azure Functions: processing 2 billion items per day (3)

Here comes the third entry in a series in which I'm describing a few patterns that enabled me to process 2 billion items per day using Azure Functions. The goal was to do it in a cost-aware and cost-wise manner, enabling fast processing with a small amount of money spent on it.

  1. part 1
  2. part 2
  3. part 3

The first part was all about batching on the sender side. The second part was all about batching on the receiver side. In this part we'll move to truly backendless processing.

No backend no cry

I truly admire how solutions are migrated to the serverless world. The most interesting part is observing the 1:1 parity between the components that were there before and the functions that are created now, a.k.a. "Just make it a func!". If you see this one-to-one mapping, there's a chance that you're migrating code without changing the approach at all. Let me give you an example.

Imagine that you need to accept users' requests. These requests are extremely unlikely to fail (there are ways to model services towards that), and if they do, there's a natural compensating action. You could think that using a queue to store them is a perfect way of accepting a request that can be processed later on. OK, but we need a component that will accept these requests. We need something that will write to one of the Azure Storage Queues, right? Wrong.

Tokenzzzzz

Fortunately for FaaS, Azure Storage Queues have a very interesting capability: they can be accessed directly with a limited scope of rights. This functionality is provided by SAS tokens, which can grant access to Add, Update and/or Process messages, and more. You can give somebody access to only add messages, and you can limit this access to 5 minutes (revalidating whether the user may continue after this period of time). The options are limitless.

If we can limit access to a queue to just adding messages, why would we need a function to accept requests? Yes, we might need a function to issue a few tokens at the beginning, but there's no need to consume a regular request and move it to a queue. No need at all. Your user can use the storage service directly, with no code for putting data in there.

To put it even more bluntly: you don't need a user to call a func to put a message in a queue. A user can just put a message.
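Here is a minimal sketch of both sides, assuming the classic WindowsAzure.Storage SDK; the queue name and method names are illustrative, not taken from the original solution:

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Queue;

public static class SasIssuer
{
    // The only function we really need: it issues a short-lived, add-only token.
    public static string IssueAddOnlyToken(string connectionString, string queueName)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var queue = account.CreateCloudQueueClient().GetQueueReference(queueName);

        var policy = new SharedAccessQueuePolicy
        {
            // Add only: no Read, Update or ProcessMessages rights.
            Permissions = SharedAccessQueuePermissions.Add,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddMinutes(5)
        };

        return queue.GetSharedAccessSignature(policy);
    }
}

public static class Client
{
    // The user's side: with the queue URI and the token, messages go straight
    // to the storage service, with no function in the middle.
    public static void Send(Uri queueUri, string sasToken, string payload)
    {
        var queue = new CloudQueue(queueUri, new StorageCredentials(sasToken));
        queue.AddMessage(new CloudQueueMessage(payload));
    }
}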

Cloud native

This moves us towards being cloud native: fully embracing different services and understanding that using them no longer requires writing code for them. Your functions can easily move to a higher level, assigning permissions and returning tokens, shifting from being a regular app that "just got migrated to functions" to a set of "cloud-native functions", from "using services" to "orchestrating their usage".

Where’s the cherry

We’ve got the cake. We need a cherry. In the last part, I’ll briefly describe costs and numbers. See you soon.

Azure Functions: processing 2 billion items per day (2)

This is the second blog post in a series in which I'm describing a few patterns that enabled me to process 2 billion items per day using Azure Functions. The goal was to do it in a cost-aware and cost-wise manner, enabling fast processing with a small amount of money spent on it.

  1. part 1
  2. part 2

In the first part you saw that batching can greatly lower the number of messages you need to send, and that it can actually broaden the selection of tools you can use to deliver the same value. My choice was to stick to good old-fashioned Azure Storage Queues, as with the new estimated number of messages I could simply use a single queue.

Serverless side

The initial code responsible for dispatching messages was simple. It was a single function using a QueueTrigger, dispatching messages as fast as they arrived. Running under the Consumption Plan, all the scaling was done automatically. I could see a flood of log entries informing me that functions were executing properly.
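For reference, the initial shape was roughly the following (a sketch; the function and queue names are illustrative, not the original code):

using Microsoft.Azure.WebJobs;

public static class Dispatcher
{
    // One message per execution; scaling is left entirely to the runtime.
    [FunctionName("Dispatcher")]
    public static void Run([QueueTrigger("items")] string message)
    {
        // look up reference data, then process the message
    }
}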

The test was run for a week. I checked the amount of money being spent in the new Cost Management tool and refactored the code a little. I was paying too much for doing lookup after lookup, spending too much time on finding the data needed for message processing. The new version was a bit faster and a bit cheaper. But it made me think.

If a single Table Storage operation takes ~30-40 ms and I need to do a few for a single function run, what am I paying for? Also, I knew that the data were coupled temporally. In other words, if one entry from a table was used for this message, it was highly likely to be used again within a few seconds. Also, I did not care about latency; there was already a queue in front of the processing. I was fine whether the result was presented within 1 s or 5 s. I asked myself: how could I use all these constraints in my favor?

Processing batches in batches

The result of my search was as simple as this: why not process messages, which already contain batched entries, in batches as well? I could use a TimerTrigger to run this function every 5-10 s and grab all the messages using the batched GetMessages operation of Azure Storage Queues. Once they were fetched, I'd be able to either prefetch all the required data using parallel async operations with Task.WhenAll or use a local cache for the execution.

Any side effects of dispatching messages on my own? I had to provide my own poison-message handling and do some of the work that had been handled internally by the QueueTrigger.

The outcome? A single function running every x seconds, draining the queue till it’s empty and dispatching loads of messages.
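A minimal sketch of that loop, assuming the classic WindowsAzure.Storage SDK (the queue name, schedule and processing helper are illustrative):

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public static class BatchDispatcher
{
    // Runs every 10 s and drains the queue, instead of waking up per message.
    [FunctionName("BatchDispatcher")]
    public static async Task Run([TimerTrigger("*/10 * * * * *")] TimerInfo timer)
    {
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        var queue = account.CreateCloudQueueClient().GetQueueReference("items");

        while (true)
        {
            // GetMessagesAsync is capped at 32 messages per call.
            var messages = (await queue.GetMessagesAsync(32)).ToList();
            if (messages.Count == 0)
                break; // queue drained; wait for the next timer tick

            // Amortize lookups: process the whole batch in parallel,
            // sharing prefetched reference data or a local cache.
            await Task.WhenAll(messages.Select(ProcessAsync));

            // Deleting messages (and poison handling) is now on us;
            // the QueueTrigger used to do this internally.
            await Task.WhenAll(messages.Select(queue.DeleteMessageAsync));
        }
    }

    // Placeholder for deserializing the dense payload and doing the work.
    static Task ProcessAsync(CloudQueueMessage message) => Task.CompletedTask;
}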

Was it worth it? The total time spent previously by functions could have been estimated as

total_time = number_of_messages * single_message_processing_time

where single_message_processing_time would include all the lookups.

With the updated approach, the number of executions was stable (~15k per day) with varying processing times, depending on the number of messages in the queue. The most important factor was the amortized cost of lookups and storage operations. The final answer was: yes, it was definitely worth it, as it lowered the price greatly.
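Under the same notation, the updated cost could be sketched as

total_time = number_of_executions * (drain_overhead + messages_per_execution * amortized_message_time)

where amortized_message_time is far smaller than single_message_processing_time, because lookups and storage operations are shared across the whole batch.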

Moving on

In this part we saw that the batching idea leaked into the serverless side, beautifully lowering the time and money spent on function execution. In the next part we'll see the power of backendless.

Azure Functions: processing 2 billion items per day (1)

In this series I'll describe a few patterns that enabled me to process 2 billion items per day using Azure Functions. Yes, 2 billion items per day. The aim of this trial was not to check whether you can do it with Azure Functions; you can do it easily. The goal was to do it in a cost-aware and cost-wise manner, enabling fast processing with a small amount of money spent on it.

Initial phase

The starting point was simple: have a single queue (in my case an Azure Storage Queue), simply enqueue items to it, and run the processing on a Consumption Plan. This looked pretty nice. If you ever try Azure Functions, you'll see them scale up instances when needed, just to get your workload processed in a timely manner.

I must admit that I skipped that part. When you calculate the number of operations that a single queue can handle, it won't be enough to cope with 2 billion items per day: a single Azure Storage Queue is targeted at roughly 2,000 messages per second, which is only around 170 million per day. Yes, you could scale to multiple queues or use a different kind of queue, but this was not the case for my experiment.

It comes in batches

The important part that I intentionally didn't mention was the fact that the number of item producers was limited. Also, they were able to batch items and flush them once in a while. With this assumption I was able to use a dense serialization protocol (a big no-no for JSON) and fill every single message being sent with hundreds, sometimes thousands, of items to get them processed.
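As a minimal sketch of what such dense packing could look like (the 12-byte item layout is an illustrative assumption, not the original protocol; note that a storage queue message is capped at 64 KB, less after Base64 encoding of binary content):

using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure.Storage.Queue;

public readonly struct Item
{
    public readonly long Id;
    public readonly int Value;
    public Item(long id, int value) { Id = id; Value = value; }
}

public static class Batcher
{
    // Packs items at 12 bytes each, so a few thousand of them
    // fit into a single queue message.
    public static CloudQueueMessage Pack(IReadOnlyList<Item> items)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(items.Count);
            foreach (var item in items)
            {
                writer.Write(item.Id);    // 8 bytes
                writer.Write(item.Value); // 4 bytes
            }
            writer.Flush();
            return new CloudQueueMessage(stream.ToArray());
        }
    }
}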

In my case this lowered the number of messages greatly, by a factor of 1000, leaving the whole thing working as it was supposed to. Yes, the receiving part became a bit different, as it was required to deserialize the densely packed payload properly.

You may ask: why not Event Hubs? Being able to pack data on my own, having the option of a delayed write, and comparing prices at the scale I'm talking about, Azure Storage Queues with a properly selected serializer still won in my calculations.

Seeing opportunities

This was the first opportunity I used to make the processing faster and cheaper. We saw that batching (smart batching in this case) greatly lowered the number of moving pieces while still delivering the same value. In the following entry, we'll move a bit deeper into the solution I built.

Pricing SaaS in the clouds

Why is it so pricey? This is a question that might have popped into your head too many times, especially when looking at the pricing pages of SaaS applications. The second question that might have followed is: why? What's behind this pricing model? How did they come up with it? Of course, one answer could be "I don't care. They did it to get rich. With my money!", but this isn't very constructive, is it? What would be the minimum price you'd charge for a single user, or a single account, of your app? These are much better questions to ask.

Recently, I've been playing with Azure Functions. They provide a beautiful FaaS (Function as a Service) environment where you pay for what you use. In a Consumption Plan, you don't even pay for your app sitting in there as long as nobody uses it. Not paying for having no users is a good thing; having users and paying something is a much better situation though. Imagine now that you have your first account registered. Let's put aside the cost of staff/work/development. How much money do you need to handle this single account? How would you estimate the costs?

I think that the word estimation is really the wrong one here; you can simply measure. It's so easy to set up a single Function App with a single storage account and just run a synthetic workload: a single account for one month. Then, using Azure Cost Management, just take a look at your bill. See the numbers. No guessing, no estimation, but real costs, real money. Now, with these numbers, you can go back to the pricing model and put something on top of it, just to make it work for you. And for clouds' sake, remember to make it rain!

How does Service Fabric host your services?

TL;DR

Service Fabric provides amazing, fully automated hosting for any number of services, with any number of instances each (up to the physical limits of your cluster). But how are these hosted? What if you have more partitions than nodes?

Structure recap

When building an app that is meant to be hosted in Service Fabric, you build an… app. This application might consist of multiple stateful and stateless services. The application is packaged into a… package that, when uploaded to the cluster as an image, provides an application type. From this, you can instantiate multiple application instances. Let me give you an example.

Application “Bank” consists of two services:

  1. “Users”
  2. “Accounts”

When built with version "1.0.0" and packaged, it can be uploaded to the SF cluster, where it is registered as "Bank 1.0.0". From now on you can instantiate as many banks as you want within your cluster. Each will be composed of two sets of services: "Users" and "Accounts".

Services, stateful services

When defining stateful services, those that have a built-in kind-of database (reliable collections, or SewingSession provided by my project SewingMachine), you need to define how many partitions they will have. You can think of partitions as separate databases. Additionally, you define the number of replicas every partition will have; that's done to ensure high availability. Let me give you an example.

  1. “Users” have the number of partitions set to 100 and every partition is replicated to 5 replicas (let’s say P=100, R=5)
  2. “Accounts” are configured with P=1000, R=7

Imagine that it's hosted on a cluster that has only 100 nodes. This means that, on average, the system will place 5 replicas of "Users" (100 partitions × 5 replicas = 500, spread over 100 nodes) and 70 replicas of "Accounts" (1000 × 7 = 7000) on every node. It's a valid scenario. Once some nodes are added to the cluster, replicas will be automatically moved to the new nodes, lowering the saturation of the existing ones.

What if a node hosts more than one replica of one service? How are they hosted? Moreover, how do they communicate, as there's only one port assigned for this?

Cohosting to the rescue

Currently, all the replicas are hosted within the same process. Yes, although 5 "Users" replicas will be created, they will all be sitting in the same AppDomain of the same process. The same goes for the 70 "Accounts". You can check it on your own by obtaining the current process ID (PID) and AppDomain.CurrentDomain and comparing them across replicas. This reduces the hosting overhead, as static resources (loaded assemblies, static fields, types) are shared across replicas.
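A quick check you could drop into a replica (a plain sketch using standard .NET APIs):

using System;
using System.Diagnostics;

public static class HostInfo
{
    // Replicas cohosted on the same node report the same PID and AppDomain id.
    public static void Log(string replicaName)
    {
        Console.WriteLine(
            $"{replicaName}: PID={Process.GetCurrentProcess().Id}, " +
            $"AppDomain={AppDomain.CurrentDomain.Id}");
    }
}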

One port to rule them all

By default, when using the native Service Fabric communication listener, only one port is used by an endpoint. How is it possible that the infrastructure knows how to route messages to the right partition and replica? Under the hood, when opening a communication listener, the replica registers the identifier of the partition it belongs to and its replica number. That's how, when a message arrives, the Service Fabric infrastructure is capable of sending it to the right communication listener and, therefore, to the right service instance.

Summary

Now you know that all the replicas of partitions of the same service on one node are cohosted in the same process, and that the Service Fabric infrastructure dispatches messages according to the registered partition/replica pair.

Orchestrating processes with full recoverability

TL;DR

Do you call a few services in a row as part of a bigger process? What if one of the calls fails? What if your hosting application fails? Do you provide a reliable way of finishing your process successfully? If not, I might have a solution for you.

Anatomy of a process

A process can be defined as at least two calls to different services. When using a client library of some sort and the C# async-await feature, one could write the following process:


var id = await invoiceService.IssueInvoice(invoiceData);
await notificationService.NotifyAboutInvoice(id);

It's easy and straightforward. First, we want to issue an invoice. Once it's done, a notification should be sent. Both calls, although async, are executed one after another. Now, what if the process is halted after issuing the invoice? When we rerun it, there's no notion of something having stopped in the middle. One could hope for good logging, but what if that fails as well?

Store and forward

Here comes the solution: the DurableTask library provided by the Azure team. The library provides the capability of recording all the responses and replaying them without re-execution. All you need is to create proxies to the services using a special orchestration context.
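A minimal sketch of how the process above could look as an orchestration, assuming the DurableTask typed-proxy API (the service interfaces and class names are illustrative):

using System.Threading.Tasks;
using DurableTask.Core;

public class InvoiceData { /* invoice fields */ }

public interface IInvoiceService { Task<string> IssueInvoice(InvoiceData data); }
public interface INotificationService { Task NotifyAboutInvoice(string id); }

public class InvoiceProcess : TaskOrchestration<string, InvoiceData>
{
    public override async Task<string> RunTask(OrchestrationContext context, InvoiceData invoiceData)
    {
        // Proxies created from the context record every call and its result;
        // on replay, recorded responses are returned instead of calling again.
        var invoiceService = context.CreateClient<IInvoiceService>();
        var notificationService = context.CreateClient<INotificationService>();

        var id = await invoiceService.IssueInvoice(invoiceData);
        await notificationService.NotifyAboutInvoice(id);
        return id;
    }
}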

With a process like the one above, the following state is captured during execution:

  1. The initial parameters to the instance of the process
  2. invoiceData is stored when the first call is made
  3. invoiceService returns and the response is recorded as well
  4. id is stored as a parameter to the second call
  5. notificationService returns and this is marked in the state as well

As you can see, every call is stored, followed by its result. OK. But what does it mean if my process fails?

When failure occurs

What happens when a failure occurs? Let's consider some of the possibilities.

If an error occurs between 1 and 2, the process can be restarted with the same parameters. Nothing has really happened yet.

If an error occurs between 2 and 3, the process is restarted. The parameters to the call were stored, but there's no record of the call to the first service, so it's called again (yes, the delivery guarantee is at-least-once).

If an error occurs between 3 and 4, the process is restarted. The response of the call to the invoice service is restored from the state (no real call is made), and the parameters for the next call are established on the basis of the previous values.

And so on and so forth.

Deterministic process

Because the whole process is based either on the input data or on the already-recorded results of calls, it's fully deterministic and can be safely replayed when needed. What about the non-deterministic calls that you might need? DateTime.Now immediately comes to mind. You can address it by using the deterministic time provided by context.CurrentUtcDateTime.
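For example, inside RunTask (a sketch; the deadline is illustrative):

// CurrentUtcDateTime is recorded and replayed, so every replay sees the same
// value; DateTime.UtcNow would differ on each replay and break determinism.
var deadline = context.CurrentUtcDateTime.AddHours(24);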

What’s next

You can build truly powerful and reliable processes on top of it. Currently, the provided implementation is based on Azure Storage and Azure Service Bus. In a branch, you can find an implementation for Service Fabric, which enables you to use it in your cluster, whether run on your development machine, on premises, or in the cloud.

Summary

Ensuring that a process runs to a successful end isn't an easy task. It's good to see a library that takes the well-known and stable language construct of async-await and lifts it to the next level, making it an important tool for writing resilient orchestrations.