Apache Storm – Taking The Big Data World By Storm

Organizations with large volumes of accumulated data face a tough question: how to manage it and extract valuable information from it. One of the most reliable, high-performance frameworks recognized today is Apache Storm. It is a known name in the Big Data industry as a free, open-source, real-time, distributed framework capable of processing huge volumes of data. It offers efficient stream processing capabilities and has a niche clientele around the world.

The highlight of Storm is its real-time computation system. It works by streaming data in parallel over a cluster, which is what makes it fast. Storm came under the Apache umbrella a few years back and has since risen to become an Apache Top-Level Project (TLP). Drawn by its security, multi-tenancy support and enhanced scalability, prominent organizations such as Yahoo have adopted Storm and continue to extend their use of it.

Storm is also known for adding real-time data processing capabilities to Apache Hadoop 2.x, where it helps Hadoop take on new kinds of projects, such as low-latency dashboards and third-party integration with applications running in the Hadoop cluster.

Why is Storm Popular?


Speed

As quoted on its official site: "a benchmark clocked it at over a million 100 byte messages processed per second per node". The figure speaks for itself.
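To put the quoted benchmark in more familiar units, the arithmetic can be sketched as below. This is only a back-of-the-envelope conversion of the quoted figures; the function name and constants are illustrative and are not part of Storm.

```python
# Figures taken from the benchmark quoted above ("over a million" is
# treated as exactly one million, so the result is a lower bound).
MESSAGES_PER_SECOND = 1_000_000
MESSAGE_SIZE_BYTES = 100


def node_throughput_bytes(messages_per_second: int, message_size_bytes: int) -> int:
    """Return the data volume one node handles per second, in bytes."""
    return messages_per_second * message_size_bytes


bytes_per_second = node_throughput_bytes(MESSAGES_PER_SECOND, MESSAGE_SIZE_BYTES)
megabytes_per_second = bytes_per_second / 1_000_000  # decimal megabytes

print(f"{megabytes_per_second:.0f} MB/s per node")
```

In other words, the quoted rate corresponds to at least 100 MB of message data per second on each node.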

Scalability

Computations run in parallel across a cluster of machines, which makes Storm highly scalable. Separate sections of the topology can be scaled independently, and the parallelism of each can be adjusted through commands.
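The idea of scaling one section of a topology without touching the others can be illustrated with a toy model. This is a plain-Python sketch, not Storm's API: the `Component` class, the `distribute` function and the component names "reader" and "counter" are all hypothetical, and round-robin distribution is assumed purely for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A toy stand-in for one section of a topology (hypothetical)."""
    name: str
    parallelism: int  # number of parallel executors for this section


def distribute(tuples, component):
    """Spread tuples round-robin over the executors; return per-executor counts."""
    counts = [0] * component.parallelism
    for index, _ in enumerate(tuples):
        counts[index % component.parallelism] += 1
    return counts


topology = {
    "reader": Component("reader", parallelism=2),
    "counter": Component("counter", parallelism=4),
}
tuples = list(range(12))

before = distribute(tuples, topology["counter"])

# Scale only the "counter" section; "reader" keeps its own setting.
topology["counter"].parallelism = 6
after = distribute(tuples, topology["counter"])

print(before)
print(after)
```

With twelve tuples, each of the four executors first handles three of them; after raising the parallelism to six, each executor handles two, while the "reader" section is left unchanged.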

Fault Tolerance

Storm has a built-in recovery mechanism: when a worker dies, Storm restarts it automatically. When a whole node dies, its workers are restarted on another node.
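The node-failure case can be pictured with a small simulation. This is a simplified sketch under stated assumptions, not how Storm implements recovery: the `reassign_workers` function, the worker and node names, and the round-robin placement are all hypothetical.

```python
def reassign_workers(assignments, dead_node, live_nodes):
    """Return a new worker -> node mapping with workers moved off dead_node.

    Workers from the dead node are placed on the live nodes in round-robin
    order; workers on healthy nodes stay where they are.
    """
    if not live_nodes:
        raise ValueError("no live nodes left to host workers")
    new_assignments = {}
    moved = 0
    for worker, node in assignments.items():
        if node == dead_node:
            new_assignments[worker] = live_nodes[moved % len(live_nodes)]
            moved += 1
        else:
            new_assignments[worker] = node
    return new_assignments


assignments = {"worker-1": "node-a", "worker-2": "node-b", "worker-3": "node-b"}

# Simulate node-b dying: its two workers are restarted elsewhere.
recovered = reassign_workers(
    assignments, dead_node="node-b", live_nodes=["node-a", "node-c"]
)
print(recovered)
```

After the simulated failure, no worker remains assigned to the dead node, and the worker that was already on a healthy node is not disturbed.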

Reliability

Each unit of data, known as a tuple, is guaranteed to be processed, which makes the framework reliable and safe.
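One common way to guarantee that every tuple gets processed is to acknowledge each success and replay each failure. The sketch below illustrates that general idea in plain Python; it is not Storm's implementation or API, and `process_reliably`, `flaky_handler` and the `max_attempts` limit are assumptions made for the example.

```python
from collections import deque


def process_reliably(tuples, handler, max_attempts=3):
    """Process every tuple, replaying failures until acked or out of attempts."""
    pending = deque((tup, 1) for tup in tuples)
    acked, failed = [], []
    while pending:
        tup, attempt = pending.popleft()
        try:
            handler(tup)
            acked.append(tup)  # success: the tuple is acknowledged
        except Exception:
            if attempt < max_attempts:
                pending.append((tup, attempt + 1))  # failure: replay later
            else:
                failed.append(tup)  # give up after max_attempts tries
    return acked, failed


already_failed = set()


def flaky_handler(tup):
    """Fail once for tuple "b" to simulate a transient error, then succeed."""
    if tup == "b" and tup not in already_failed:
        already_failed.add(tup)
        raise RuntimeError("transient failure")


acked, failed = process_reliably(["a", "b", "c"], flaky_handler)
print(sorted(acked), failed)
```

Note that a replayed tuple may be handled more than once, so in a scheme like this the handler should tolerate repeated delivery.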

Operational Ease

Storm is easy to deploy, and its standardized configuration helps provide stability. Once installed, it only needs to be operated with those standard configurations.

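Workflow of Storm

There are three sets of nodes involved in the workflow:

Nimbus node: the master node, which distributes code across the cluster, assigns tasks to machines and monitors for failures.
ZooKeeper nodes: these coordinate the cluster and hold its state.
Supervisor nodes: these run on the worker machines and start and stop worker processes as directed by Nimbus.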

Apache Storm is continually compared with other frameworks, especially Apache Hadoop and Apache Spark. Each has its own strengths, so it is hard to say which is best; the right choice depends on the requirements and the available resources.

