Comparing Apache Storm and Trident
As promised in my earlier blog on Trident-benchmarking performance, here is a post comparing Apache Storm to Trident performance Benchmarks.
Apache Storm is a distributed, free and open source framework, which performs real time computations on unbounded streams of data. In my previous blog, I discussed Trident and its performance and detailed the architecture of Apache Storm and Trident. Trident is a layer of abstraction built on top of Apache Storm. Both Storm and Trident have the same architecture as shown in Fig.1.
Fig 1. Storm – Architecture
Usecase and hardware settings are the same as in my previous experiments on Trident. In short, the usecase is to aggregate the KPI values of Extended Session Records of every user. For further information on the usecase kindly refer to my previous blog.
- A Kestrel queue is used instead of reading directly from files. A kestrelThriftSpout is used to fetch ESR’s from the kestrel queue.
- ESR’s are sent in local-or-shuffle mode to the flattener bolts from the spout.
- Field grouping on dimension values send ESR’s from flattener to aggregator.
- Greenplum database server was used for storing the output aggregation tables. Each of these tables contains a predefined set of dimension and KPI columns. The aggregator bolts export whole rows into the database with JDBC’s CopyManager API.
Table 1. Performance of Storm
Table 2. Comparing Performance of Storm and Trident
It is to be noted that 1000 ESR’s/sec corresponds to roughly 1 million subscribers. Hence, in storm approximately 8 million subscribers could be handled and in trident approximately 7.4 million subscribers could be handled with the 3 node cluster used for the experiment.
To conclude, both Storm and Trident perform well for flattening when compared to aggregation. The frameworks do not scale linearly (as expected) with addition of more workers/nodes. Both frameworks scale comparatively better for flattening than for aggregation. Since data locality has to be ensured in aggregation (to avoid duplicates), field grouping was done on dimension values and all events with the same dimension value vector are routed to the same instance of the bolt where the corresponding aggregation table was built. Based on our experiences, shuffle grouping distributes the load evenly among all instances of bolts in both storm and trident. Fields grouping cannot guarantee uniform and even distribution of events among bolts. This is a major reason for performance skew in aggregator bolts. Another key reason is the blocking communication issue. Storm and Trident uses LMax Distruptor  for intra-worker communication (communication between multiple threads of a worker process). Events spend more time in the waitFor() method of lmax, waiting for a slot in the ring buffer. This adds to the communication time which also contributes majorly for the performance skew.
Further detailed learnings on Trident can be found here.
There are other frameworks like Spark Streaming, which can be used for real time usecases. Druid is a framework specialized for realtime aggregation. The performance of these frameworks can also be compared to the performance of Storm/Trident to arrive at a best recommendation for real time usecases with low and high memory usage.
Manoj P. Ericsson Research
laszlo Toka Ericsson Research