This post has been imported from my previous blog. I did my best to parse XML properly, but it might have some errors.
If you find one, send a Pull Request.
Storm processor, recently moved to Apache foundation is a powerful stream processing library open sourced by Twitter. It provides needed resources to scale out the processing across multiple machines, providing at-least-once guarantees or exactly-once using Trident. The library is based on two basic elements:
which are combined in a topology, a mesh of elements emitting and consuming events in order of data processing. Streams, unlike EventStore are not cheap and represent a logical flow of data rather than an aggregate boundary. One stream can be emitted by more than one spout or bolt. The further discussion of streams is beyond scope of this article. The bolt declarer, used in topology builder implements plenty of interfaces allowing to define consumed tuples of different streams. What it allows one to do is assigning a given bolt instance to handle a given set of tuples from a given stream of data, for example:
The bolt isn’t limited to consume tuples only from one grouping. It can join multiple streams grouped in multiple ways. This can bring another opportunity for data repartitioning. For example, application emitting streams of data like exceptions on the production environment and user transactions can easily raise an alarm when a given user experience more than one exception every 10 transactions. A simple bolt using two fields groupings can deal with it easily. I hope this short introduction will encourage you to dive into Storm. It’s a very powerful tool, especially in Complex Event Processing, with scale out possibilities.