Real-time processing frameworks
S4, Storm – When, What and How to choose.
Real-time processing means processing, transforming and analyzing data on the fly, as it is generated or received. It differs from batch processing, where data is first stored as tables, files or blocks and the stored data is then processed in chunks, in a distributed, parallel fashion. In real-time processing, data is processed as individual records or small groups of records, depending on the rate at which data arrives or is generated. Numerous open source frameworks are available for performing real-time operations on streaming data, for example S4, Storm and Spark Streaming. In this blog post, I will focus on S4 and Storm.
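To make the contrast concrete, here is a minimal sketch in plain Python (not tied to S4 or Storm). The function names and the sample readings are hypothetical and chosen only for illustration. The batch version needs the whole data set to be stored before it can answer, while the streaming version refines its answer as each record arrives.

```python
from typing import Iterable, Iterator, List


def batch_average(stored: List[float]) -> float:
    """Batch style: the data is already stored, so process it in one pass."""
    return sum(stored) / len(stored)


def streaming_average(records: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the running result for every incoming record."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        yield total / count  # an up-to-date answer after each record


# Hypothetical sample data.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_average(readings))            # 25.0
print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

Both approaches agree once all the data has been seen; the difference is that the streaming version has a usable result after every record.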
S4 is a general-purpose, scalable, distributed platform for processing event streams. S4 is developed in Java. In S4, a processing element (PE) is the smallest component responsible for performing operations on a subset of the data, or a partition of the entire data, depending on the design. Applications (called APP in the S4 world) are built as a graph of processing elements. S4 spawns a PE instance for each unique key value in the data, where the key is determined by the design of the APP. Adapters are S4 applications that convert external streams into streams of S4 events. PEs communicate asynchronously by sending events on streams. Events are dispatched to nodes according to their key.
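The key-based dispatch described above can be pictured with a small simulation in plain Python. This is a sketch under stated assumptions, not the S4 API: `CounterPE`, `Node` and `node_for` are hypothetical names, and hashing the key with CRC32 is just one simple way to pick a node.

```python
import zlib


class CounterPE:
    """Hypothetical processing element: counts the events seen for one key."""

    def __init__(self, key):
        self.key = key
        self.count = 0

    def on_event(self, event):
        self.count += 1


class Node:
    """Hypothetical node: creates one PE instance per unique key it receives."""

    def __init__(self):
        self.pes = {}

    def deliver(self, key, event):
        pe = self.pes.get(key)
        if pe is None:
            pe = self.pes[key] = CounterPE(key)  # spawn a PE on first sight of a key
        pe.on_event(event)


def node_for(key, num_nodes):
    """Assumption: pick the node from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


nodes = [Node() for _ in range(3)]
events = [{"word": w} for w in ["storm", "s4", "storm", "spark", "s4", "storm"]]

for event in events:
    key = event["word"]
    nodes[node_for(key, len(nodes))].deliver(key, event)

# Collect the per-key counts held by the PE instances on all nodes.
counts = {pe.key: pe.count for node in nodes for pe in node.pes.values()}
print(counts)
```

Because the node is derived from the key, all events with the same key reach the same PE instance, so each PE can keep its state locally without coordinating with other nodes.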
Storm is a scalable, fault-tolerant platform for processing event streams. In my previous blog post, Comparing Apache Storm and Trident, I explained more about this framework. Storm is developed partly in Java and partly in Clojure. In Storm, a bolt is responsible for performing operations on a subset of data. The user controls how streams of tuples are directed to the appropriate bolts, based on the requirement.
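The control a user has over routing can be illustrated with a small simulation in plain Python. This is not the Storm API: the helper names are hypothetical, the round-robin counter is only a stand-in for Storm's random shuffle grouping, and the CRC32 hash is an assumed way to mimic a fields grouping, where tuples with the same field value go to the same bolt task.

```python
import itertools
import zlib


def shuffle_grouping(num_tasks):
    """Spread tuples evenly over tasks (round-robin stand-in for a random shuffle)."""
    counter = itertools.count()
    return lambda tup: next(counter) % num_tasks


def fields_grouping(field, num_tasks):
    """Send tuples with equal values of `field` to the same task."""
    return lambda tup: zlib.crc32(str(tup[field]).encode("utf-8")) % num_tasks


def route(tuples, grouping, num_tasks):
    """Return, for each bolt task, the list of tuples it would receive."""
    tasks = [[] for _ in range(num_tasks)]
    for tup in tuples:
        tasks[grouping(tup)].append(tup)
    return tasks


# Hypothetical click stream.
tuples = [{"user": u, "clicks": 1} for u in ["ann", "bob", "ann", "cy", "bob", "ann"]]

shuffled = route(tuples, shuffle_grouping(2), 2)
by_user = route(tuples, fields_grouping("user", 2), 2)

print([len(task) for task in shuffled])  # [3, 3]
print([sorted({t["user"] for t in task}) for task in by_user])
```

An even spread suits stateless work such as parsing or filtering, while grouping on a field suits per-key aggregation, since every tuple for a given user lands on the same task.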
Fig. 1 and Fig. 2 below show the architectures of S4 and Storm.
Fig 1. Architecture of S4
Fig 2. Architecture of Storm
Based on my experience with S4 and Storm, I have compared various features of these frameworks in Table 1 below.
Table 1. Comparing S4 and Storm
The pros and cons of S4 and Storm are listed in Table 2 below.
Table 2. Pros and Cons of S4 and Storm
Keeping the above comparisons between S4 and Storm in mind, I have compiled a list of scenarios and the most suitable framework for each of them in Table 3 below.
Table 3. Scenarios and suitable frameworks
Numerous other frameworks, such as Spark Streaming, Samza, Dempsy and P2G, can also be considered when choosing a framework.
Manoj P Ericsson Research