The missing 20% of configuration with Octopus Deploy

Recently I’ve been evaluating Octopus Deploy. I wanted to learn more about a platform which is quite unique in the .NET ecosystem. After going through the features I wrote the following tweet:

What about the missing 20%?
The missing part is configuration considered as an artifact. There are many projects where changing your code makes you change your configuration. Adding some values is OK; much more important are operations like swapping whole sections. These kinds of changes are frequently tied to a given code change, hence they are made in the very same commit and pushed to the VCS. Then your configuration is no longer a simple table with a selection of dimensions saying where a given value should be applied. The additional, skipped dimension is time, measured in the only unit a VCS is aware of: commits. It’s like saying “from now on I need these values”.
What you can do is use Octopus variables to point to the right file for the given environment, that’s for sure. Things become a bit more tricky when, for instance, the production config should not be leaked into the development repo.
This leads to the conclusion that your configuration is an artifact. In many cases it can easily be replaced by a table with an ‘environment’ dimension, but it is still an artifact, now, unfortunately, not stored in your repository.
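To make the "time measured in commits" dimension more tangible, here is a minimal sketch. It assumes a linear history where commits are simply numbered 1, 2, 3 and so on, and the names HISTORY and resolve, as well as the sample keys and values, are made up for illustration. This is not how Octopus stores anything; it only shows what a lookup looks like once the commit becomes one of the dimensions.

```python
from bisect import bisect_right

# Hypothetical history: for each (environment, key) a list of
# (commit_number, value) pairs, sorted by commit_number.
# Each entry means "from this commit on, use this value".
HISTORY = {
    ("production", "queue"): [(1, "msmq"), (7, "rabbitmq")],
    ("development", "queue"): [(1, "in-memory")],
}


def resolve(environment, key, commit):
    """Return the value valid for `environment` at the given commit."""
    entries = HISTORY[(environment, key)]
    commits = [number for number, _ in entries]
    # Find the last entry introduced at or before `commit`.
    index = bisect_right(commits, commit) - 1
    if index < 0:
        raise KeyError(f"{key} is not defined yet at commit {commit}")
    return entries[index][1]
```

Deploying the code from commit 6 to production resolves "queue" to "msmq", while anything from commit 7 on resolves to "rabbitmq". A table keyed only by environment cannot express that difference.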

The reasoning above is not meant to lead you astray. Octopus is a great deployment tool. As with every tool, use it wisely.

Agile team analogy

Yesterday I had a discussion about the ability to introduce Scrum-based development in a big organization. The question was raised whether it’s possible to build a cross-functional team and deliver a shippable product with no busy waits and no “I need this from that guy” syndrome.
I thought about each team member as a part of the team, consuming some input (mails, requests, discussions) and producing some output. Nowadays it’s much easier to make this asynchronous: to remove all the meetings and long-running discussions, and to review your colleagues’ code in a post-commit manner. Having said that, you can look at each person in a producer-consumer manner, with the graph of dependencies being the bus that moves the artifacts you create. What it takes to create a good team is to find all the people who would be waiting for the output of others and co-locate them in one team. The obvious producers (with much less input) would be product owners; they’d be fed by testers for sure. If you have some kind of core system in your company, a person working with it would be a perfect fit as well. Just try to co-locate all the people with transitive producer-consumer dependencies. Choose the most critical and time-consuming ones. Make a bubble with minimal input, asynchronous if possible. What about output? That would be, according to your definition of done, your shippable product. Nothing more, nothing less. That’s what Agile is all about, isn’t it?

Your local user groups

Recently I’ve been deeply involved in the Warsaw .NET User Group. What makes you deeply involved, you ask? I’d say that resolving current problems is the answer that fits best. We covered a few important points, like getting some sponsorship, being given a few tickets for the Build Stuff conference (thanks!) and running snack sessions (short presentations for those who want to start presenting). It looks like people are a bit more energized and active. That’s certainly the right direction for any user group.
I want to encourage all of you: just make a small move and do something for the community you’ve chosen “to be involved with”, for instance ask for a problem to resolve. It’s a win-win, “by people, for people”, nothing more, nothing less.

Code reviews

Recently, I’ve been involved in a discussion about code reviews. It reminded me of a code review board that I introduced in one of my previous projects. Besides that, I had to spell out the positive aspects of code reviews. In my experience, the most important point of doing code reviews with a tool is the total asynchrony of the process. The reviewer can easily go through the diff, marking lines and changes, at a time of their own choosing (or one scheduled by team rules). The second point is the artifact left after this process. It’s not a discussion over a coffee or an email in a private inbox. Once it’s published in a tool, it’s a visible part of the project, a manifestation of the change in the code. Isn’t it great?

The list of possible tools to use:

  • https://github.com with its ability to comment on pull requests and commits.
  • http://www.reviewboard.org/ I used this tool; it’s simple and easy to get going with. It’s free!
  • Crucible, an Atlassian tool. I haven’t used it so far, but I hope to try it soon.

Make the discussion about the project asynchronous, public and strongly connected with the code. This will make your code thrive and spread the knowledge across the team. Trust me.

Simple Cassandra backups

Cassandra is one of the most interesting NoSQL databases, resolving plenty of complex problems with extremely simple solutions. They are not the most obvious options, but they can be deduced from the foundations of this database.
Cassandra uses Sorted String Tables (SSTs) as its store for row values. When queried, it simply finds the value’s offset using the index file and seeks to that offset in the data file. New files are flushed to disk once in a while and a new in-memory representation of an SST is started. Once stored on disk, the files are no longer modified (compaction is another scenario). How would you back them up? Here comes the simplicity and elegance of this solution. Cassandra stores hard links to each SST flushed from memory in a special directory. A hard link keeps the file’s inode from being freed even if the original file is removed, which gives you time to copy your data to other media. Once you have backed the files up, the links can be removed, and it is the file system’s responsibility to work out whether that was the last hard link and the inodes can be set free. Having your data written once into files that are never modified gives you this power and provides great simplicity. That’s one of the reasons I like Cassandra’s design so much.
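The hard link trick is easy to try outside of Cassandra. Below is a minimal sketch using nothing but the Python standard library; the snapshot and demo functions, the directory names and the file name are all made up for illustration and are not Cassandra’s actual layout or API. It assumes a file system that supports hard links and that both directories live on the same volume.

```python
import os
import tempfile


def snapshot(data_dir, backup_dir):
    """Hard-link every file from data_dir into backup_dir (no data is copied)."""
    os.makedirs(backup_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        source = os.path.join(data_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(backup_dir, name))
            linked.append(name)
    return linked


def demo():
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        backup_dir = os.path.join(root, "backups")
        os.makedirs(data_dir)

        # A stand-in for a flushed, immutable SST file (hypothetical name).
        table = os.path.join(data_dir, "sstable-1-Data.db")
        with open(table, "wb") as handle:
            handle.write(b"immutable rows")

        snapshot(data_dir, backup_dir)
        links_before = os.stat(table).st_nlink

        # The "database" drops its file; the inode survives through the link.
        os.remove(table)
        backup = os.path.join(backup_dir, "sstable-1-Data.db")
        with open(backup, "rb") as handle:
            content = handle.read()
        return links_before, os.stat(backup).st_nlink, content
```

The sketch only works because the file is never modified after it is written: a hard link is the same inode, so any in-place change would show up in the "backup" as well.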

The docs for Cassandra backups are available here.

Polyglot programming

Have you ever been in a situation where you didn’t have enough tooling in your preferred environment? What did you do back then? Did you use the tools you already knew or search for something better?
Recently I’ve been involved in a project which truly pushed me out of my comfort zone. I like .NET, but there’s not enough tooling to provide a highly scalable, performant, failure-resistant environment. We moved to Java, Storm, Cassandra, Zookeeper: JVM all over the place. It wasn’t easy, but it was interesting and eye-opening. Having so many libraries focused on resolving specific concerns, instead of the framework-oriented paradigm of .NET, was very refreshing.
Was it worth it? Yes. Was it good for self-development? Yes. Will I reach for every new library or language? Certainly not. The most important thing I’ve learned so far is that being adaptable and aware of the available tools matters most. Mistakes were made, that’s for sure, but the overall solution is growing in the right direction.
After all, it’s survival of the fittest, isn’t it?

Disruptor with MultiProducer

I hope you’re aware of the LMAX tool for fast in-memory processing called Disruptor. If not, it’s a must-see for today’s architects. It’s nice to see your process eating messages at speeds of ~10 million/s.
One of the problems addressed in the latest release was a fast multi-producer, allowing one to instantiate multiple tasks publishing data for their consumers. I must admit that the simplicity of this robust part is astonishing. How could one handle claiming and publishing one or a few items from the ring buffer? It’s easy: claim it in the standard way, using a CAS operation to let other threads know about the claimed value, and then publish it. But how do you publish this kind of information? Here comes the beauty of this solution:

  1. allocate an int array of the ring buffer’s length
  2. when items are published, calculate their positions in the ring (sequence % ring.length)
  3. set the values in the helper int array to the sequence numbers (or values derived from them)

This, at the cost of an extra int array, allows:

  1. waiting for a producer by simply checking whether the value in the int array matches the current buffer iteration number
  2. publishing in the same order in which items were claimed
  3. publishing with no additional CAS operations

Simple, powerful and fast.
Come, take a look at it: MultiProducerSequencer
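Before diving into the real source, the steps above can be sketched in a few lines. This is a simplified, single-threaded illustration in Python, not the Disruptor’s code: the class and method names are my own, the CAS-based claiming is left out entirely, and the helper array stores the lap number (sequence divided by the ring size) as the "value derived from the sequence".

```python
class AvailabilityBuffer:
    """Tracks which sequences of a ring buffer have been published."""

    def __init__(self, ring_size):
        self.ring_size = ring_size
        # -1 means "nothing has been published in this slot yet".
        self.available = [-1] * ring_size

    def publish(self, sequence):
        # The slot comes from the position in the ring, the stored value
        # is the lap (how many times the ring has wrapped around).
        slot = sequence % self.ring_size
        self.available[slot] = sequence // self.ring_size

    def is_available(self, sequence):
        slot = sequence % self.ring_size
        return self.available[slot] == sequence // self.ring_size

    def highest_published(self, lower_bound, upper_bound):
        """Last sequence in [lower_bound, upper_bound] with no gap before it."""
        for sequence in range(lower_bound, upper_bound + 1):
            if not self.is_available(sequence):
                return sequence - 1
        return upper_bound
```

With a ring of size 4, publishing sequences 0 and 2 leaves a gap at 1, so highest_published(0, 3) stops at 0; once 1 is published it moves on to 2. A consumer therefore never sees an item before all the earlier ones are in place, and publishing itself is a plain array write.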