Heavy cloud but no rain

Recently I’ve been playing with Azure Functions. I should probably use a bigger word than “playing”, because I implemented a fully working app using only functions. $4: that was all I needed to pay for the whole month after running some synthetic load through the app. I then spent a few additional hours just to make it $3 the next month. You might ask why. Read along.

Heavy cloud

Moving to the cloud is getting easier and easier. With the new backendless approach (let’s stop calling it serverless) you can actually chop your app into pieces and pay only when they run. More than that: everything is monitored, so you can see exactly where you spend your money. If you’re crazy enough, you could even modify the workflow of your app to push the heavy work to the end of the chain, postponing it until a user really needs it. Still, these optimizations and this way of thinking don’t seem to be popular these days (or at least I haven’t seen them popping up that frequently).

But no rain

The synthetic load I used to stress the app simulated a single, not particularly active user. Real usage would probably be much higher, and the price much bigger. Effectively, instead of treating this optimization as saving only $1, I could say that I cut the cost by 25%. This was only an experiment, but think about it again. A dumb, fast implementation was cheap, but with some additional work I made it more profitable. If the price of the cheapest option were $5, these would be some real gains. These are the differences that can make you either profitable or bankrupt.

Make it rain

In past years, developers didn’t deal with money. Servers were there, sometimes faster, sometimes slower. Databases were there, spending countless hours running our unoptimized queries. That time is ending now. Our apps will be billed, and it’ll be our responsibility to earn money by making them leaner and faster. Welcome to the cost-aware era of software engineering.

Anomalies: Listening to your secondaries with Service Fabric

This is the second post in a series describing different anomalies you may run into when using modern databases and other storage systems.

Just turn this on

This story begins much like the last one. It starts when one of the developers working on a project built with Service Fabric finds the ListenOnSecondary property and enables the feature. After all, if every node in my cluster can now answer queries sent by other parts of the system, that should be good, right? I mean, it’s even better than good! We’re faster now!

Replication

To answer this, we need to dive a bit deeper and understand how Service Fabric’s internal storage works. Service Fabric provides clustered storage. To ensure that your data is properly copied, it uses a replication protocol. At any given moment there is only one active master: the copy accepting all the write and read operations and replicating its data to all the secondary replicas. For various reasons, the replicas the data is copied to are not always up to date. To give an example, imagine that we sent three commands to Service Fabric, each writing a different piece of data. Let’s take a look at the state:

  • master: cmd1, cmd2, cmd3
  • replica2: cmd1, cmd2
  • replica3: cmd1, cmd2, cmd3

Eventually, replica2 will receive the missing cmd3, but depending on your hardware (disks, network), there can be a constant small lag during which some operations have not been replicated yet.
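The lag above can be sketched with a toy model. The names here (Primary, Replica) are illustrative only, not the Service Fabric API:

```python
# Toy model of primary/secondary replication lag.
# Class names are made up for illustration; this is not Service Fabric code.

class Replica:
    def __init__(self):
        self.log = []

    def apply(self, cmd):
        self.log.append(cmd)

    def read(self):
        return list(self.log)


class Primary(Replica):
    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas

    def write(self, cmd, lagging=()):
        self.apply(cmd)
        # Replicate to every secondary, except the ones that lag behind.
        for r in self.replicas:
            if r not in lagging:
                r.apply(cmd)


replica2, replica3 = Replica(), Replica()
master = Primary([replica2, replica3])

master.write("cmd1")
master.write("cmd2")
master.write("cmd3", lagging=[replica2])  # replica2 hasn't caught up yet

print(master.read())    # ['cmd1', 'cmd2', 'cmd3']
print(replica2.read())  # ['cmd1', 'cmd2']         <- a stale read
print(replica3.read())  # ['cmd1', 'cmd2', 'cmd3']
```

A query routed to replica2 during that window simply doesn’t see cmd3, which is exactly the state pictured in the list above.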

Now, after seeing this example of how replication works and noticing that the state on replicas might occasionally be stale, can we turn on ListenOnSecondary that easily?

It depends (TM)

There is no straight answer to this. If your user first calls an action that results in a write and then, almost immediately, queries for that data, they might not see their own write, because it is replicated with some lag.

If your writes are not followed by reads, or you always cheat by updating the user’s view as if the data had been read from the store, then you might not run into a problem.
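That kind of “cheating” can be sketched as a session that overlays the user’s own writes on top of possibly stale reads. Again, all names here are hypothetical, not a Service Fabric API:

```python
# Client-side read-your-writes, layered over a store that replicates with lag.
# All class names are illustrative; this is not Service Fabric code.

class StaleStore:
    """Simulates a secondary replica that lags behind the primary."""
    def __init__(self):
        self.data = {}
        self.pending = {}

    def write(self, key, value):
        # The write lands on the primary; the secondary sees it only later.
        self.pending[key] = value

    def replicate(self):
        # The secondary eventually catches up.
        self.data.update(self.pending)
        self.pending.clear()

    def read(self, key):
        return self.data.get(key)


class Session:
    """Serves the user's own recent writes from a local overlay."""
    def __init__(self, store):
        self.store = store
        self.local = {}

    def write(self, key, value):
        self.store.write(key, value)
        self.local[key] = value  # remember our own write

    def read(self, key):
        if key in self.local:        # our own writes are visible immediately
            return self.local[key]
        return self.store.read(key)  # everything else may be stale


store = StaleStore()
session = Session(store)

session.write("name", "Alice")
# The secondary has not replicated yet, but the user still sees their write:
print(store.read("name"))    # None   <- stale
print(session.read("name"))  # Alice  <- view updated locally
```

The trade-off is that the overlay only helps the user who made the write; other users reading through a lagging secondary still see stale data until replication catches up.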

Either way, before switching on this small flag, you should think through the concerns raised above.

Wrapping up

Unfortunately for us, we’ve been given a very powerful option, configured with a single method call. We can enable reading potentially stale data to gain higher query throughput. It’s still up to us whether we want to do it, and whether we can do it, given the environment and the architecture our solution lives in.