Real-time processing frameworks

S4, Storm – When, What and How to choose.

Real-time processing means processing, transforming and analyzing data on the fly, as it is generated or received. It differs from batch processing, where data is first stored as tables, files or blocks and then processed in chunks, in a distributed, parallel fashion. In real-time processing, data is processed as individual records or small groups of records, depending on how fast the data arrives or is generated. Numerous open source frameworks are available for performing real-time operations on streaming data, for example S4, Storm and Spark Streaming. In this blog post, I will focus on S4 and Storm.
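To make the contrast concrete, here is a minimal Python sketch, not tied to S4 or Storm. The function names batch_average and streaming_average and the sample readings are my own illustrative assumptions. The batch version needs the whole stored data set before it can answer, while the streaming version refreshes its answer after every record.

```python
from typing import Iterable, Iterator, List


def batch_average(stored_records: List[float]) -> float:
    """Batch style: every record is stored first, then processed together."""
    return sum(stored_records) / len(stored_records)


def streaming_average(records: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the result as each record arrives."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        yield total / count  # an up-to-date answer after every record


if __name__ == "__main__":
    readings = [10.0, 20.0, 30.0, 40.0]  # illustrative data only
    print(batch_average(readings))            # 25.0
    print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

The streaming version keeps only a count and a running total, so it never needs the stored data set that the batch version depends on.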

S4 is a general-purpose, scalable, distributed platform for processing event streams. It is developed in Java. In S4, a processing element (PE) is the smallest component responsible for performing operations on a subset of the data, or a partition of the entire data, depending on the design. Applications (called apps in the S4 world) are built as a graph of processing elements. S4 spawns a PE instance for each unique key value (or combination of key values) in the data, depending on the design of the app. Adapters are S4 applications that convert external streams into streams of S4 events. PEs communicate asynchronously by sending events on streams, and events are dispatched to nodes according to their key.
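The idea of one PE instance per key can be sketched in a few lines of Python. This is a single-process toy model, not the S4 API: the names WordCountPE, KeyedDispatcher and adapter are hypothetical, and the dispatch here is synchronous, whereas S4 sends events asynchronously across cluster nodes.

```python
from typing import Dict, Iterator, Tuple

Event = Tuple[str, int]  # (key, value); a simplification of an S4 event


class WordCountPE:
    """Hypothetical processing element: holds state for exactly one key."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.count = 0

    def on_event(self, event: Event) -> None:
        _, increment = event
        self.count += increment


class KeyedDispatcher:
    """Routes each event to the PE instance that owns the event's key."""

    def __init__(self) -> None:
        self.instances: Dict[str, WordCountPE] = {}

    def dispatch(self, event: Event) -> None:
        key = event[0]
        if key not in self.instances:
            # A new PE instance is spawned the first time a key is seen.
            self.instances[key] = WordCountPE(key)
        self.instances[key].on_event(event)


def adapter(line: str) -> Iterator[Event]:
    """Hypothetical adapter: turns an external text line into keyed events."""
    for word in line.lower().split():
        yield (word, 1)


def count_words(line: str) -> Dict[str, int]:
    dispatcher = KeyedDispatcher()
    for event in adapter(line):
        dispatcher.dispatch(event)
    return {key: pe.count for key, pe in dispatcher.instances.items()}


if __name__ == "__main__":
    print(count_words("storm s4 storm spark storm"))
    # {'storm': 3, 's4': 1, 'spark': 1}
```

Because each instance only ever sees events for its own key, its state needs no locking. That is what makes it possible to spread keys across nodes.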

Storm is a scalable, fault-tolerant platform for processing event streams. In my previous blog post, Comparing Apache Storm and Trident, I explained more about this framework. Storm is developed partly in Java and partly in Clojure. In Storm, a bolt is responsible for performing operations on a subset of the data. The user controls how the tuples in a stream are directed to the appropriate bolts, based on the requirement.
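In Storm, this routing is chosen through stream groupings such as shuffleGrouping and fieldsGrouping when a bolt is added to a topology. The Python sketch below only imitates those two ideas and is not Storm code: the functions shuffle_grouping, fields_grouping and route are hypothetical stand-ins, round-robin stands in for Storm's random shuffle, and the CRC32 hash is my assumption, not Storm's actual hashing.

```python
import itertools
import zlib
from typing import Callable, Dict, List

StormTuple = Dict[str, str]              # simplified stand-in for a Storm tuple
Grouping = Callable[[StormTuple], int]   # picks a bolt task index for a tuple


def shuffle_grouping(num_tasks: int) -> Grouping:
    """Spread tuples evenly over tasks (round-robin stands in for random)."""
    counter = itertools.count()
    return lambda _tup: next(counter) % num_tasks


def fields_grouping(field: str, num_tasks: int) -> Grouping:
    """Send tuples that share a field value to the same task."""
    def choose(tup: StormTuple) -> int:
        # A stable hash, so the same value always maps to the same task.
        return zlib.crc32(tup[field].encode("utf-8")) % num_tasks
    return choose


def route(tuples: List[StormTuple], grouping: Grouping,
          num_tasks: int) -> List[List[StormTuple]]:
    """Deliver each tuple to the task chosen by the grouping."""
    tasks: List[List[StormTuple]] = [[] for _ in range(num_tasks)]
    for tup in tuples:
        tasks[grouping(tup)].append(tup)
    return tasks


if __name__ == "__main__":
    users = ["ann", "bob", "ann", "cy", "bob", "ann"]  # illustrative data
    stream = [{"user": u, "action": "click"} for u in users]
    for name, grouping in [("shuffle", shuffle_grouping(3)),
                           ("fields", fields_grouping("user", 3))]:
        sizes = [len(task) for task in route(stream, grouping, 3)]
        print(name, sizes)
```

A fields-style grouping suits bolts that keep per-key state, such as counters. A shuffle-style grouping suits stateless bolts, where an even load matters more than which task sees which tuple.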

Figures 1 and 2 below show the architecture of S4 and Storm.


Fig 1. Architecture of S4


Fig 2. Architecture of Storm

Based on my experience with S4 and Storm, I have compared various features of these frameworks in Table 1 below.


Table 1. Comparing S4 and Storm

The pros and cons of S4 and Storm are listed in Table 2 below.


Table 2. Pros and Cons of S4 and Storm

Keeping in mind the comparisons above, I have compiled a list of scenarios, and the framework best suited to each, in Table 3 below.


Table 3. Scenarios and suitable frameworks

There are numerous other frameworks, such as Spark Streaming, Samza, Dempsy and P2G, which can also be considered when choosing a framework.

Manoj P, Ericsson Research

