What is Big Data and why is it so important? You read "Big Data Analytics" and think to yourself: am I going to be a statistician? Am I going to be coding? What am I analyzing? Why do I need this? What is Big Data?

Before we convince you to do a Diploma in Big Data Analytics, or tell you why you need a certification in Big Data or Big Data training, we need you to understand what Big Data is and how important it is to a future ruled by information.

According to 2013 numbers, Flipkart was averaging 25,000 or so transactions per day. Amazon was averaging about 26.5 million transactions per day, over a million transactions per hour, which translates into roughly 2.5 petabytes an hour. What's a petabyte? One petabyte is 1,000 terabytes. To bring some perspective, 1 terabyte (TB) can hold about 2,000 hours of CD-quality music. Google processes 20 PB of information per day. According to IDC, "In 2011, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes), growing by a factor of nine in just five years. That's nearly as many bits of information in the digital universe as stars in the physical universe."

Big Data essentially describes a situation where the volume, velocity and variety of data exceed an organization's ability to store and analyze it so that it can make better decisions. But volume is not the only place the problem lies; variety, variability, velocity and complexity matter just as much. While Amazon's transaction data might be numerical and easy to tabulate, how are we to analyze unstructured data like videos, text and music? We need to convert unstructured data into quantifiable numbers that will help companies make better decisions, like when and where to release that next music video. Data also flows at different speeds and is seasonal: think of what happens to transactions during Flipkart and Snapdeal sale days, or in the days leading up to Diwali. So it's quite clear that we are living in a world with mountains of data.
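The unit conversions above can be checked with some back-of-the-envelope arithmetic. This is a minimal sketch using only the figures quoted in the text (26.5 million Amazon transactions per day, 1 PB = 1,000 TB, 1 TB ≈ 2,000 hours of CD-quality music, Google's 20 PB per day); the variable names are our own.

```python
# Back-of-the-envelope arithmetic for the data-volume figures quoted above.
# Assumes decimal units: 1 PB = 1,000 TB.

AMAZON_TX_PER_DAY = 26_500_000        # Amazon transactions per day (2013 figure)
tx_per_hour = AMAZON_TX_PER_DAY / 24  # comes out to just over a million per hour

PB_IN_TB = 1_000                      # 1 petabyte = 1,000 terabytes
TB_IN_CD_HOURS = 2_000                # 1 TB holds ~2,000 hours of CD-quality music

google_pb_per_day = 20
google_cd_hours = google_pb_per_day * PB_IN_TB * TB_IN_CD_HOURS

print(f"Amazon: ~{tx_per_hour:,.0f} transactions per hour")
print(f"Google's 20 PB/day = {google_cd_hours:,} hours of CD-quality music")
```

Running this confirms the article's claim of over a million Amazon transactions per hour, and shows that Google's daily 20 PB would store some 40 million hours of music.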
But we don't have data for data's sake. This vast universe of information, and it is a universe, has to be mined and harvested to help make decisions that impact businesses. If the past is the best indicator of the future, imagine what analyzing such vast amounts of data can do in terms of selling for a retailer like Amazon.

Large amounts of data also mean large sample sizes, which improve analysis because samples can now reflect reality. Before, you might take ten bits of data to predict what would happen to an entire bucket of data with many variables, such as loans; today you have so much data that loans can be analyzed under multiple subsets of samples, which reflect reality and help create better solutions. "The general rule is that the larger the data sample, the more accurate are the statistics and other products of the analysis."

But the sheer volume of all this data is overwhelming company IT platforms, which means we need efficient technology that can break it down and analyze it. This brings us to software like SAS, Hadoop and R. What are the key differences between these tools? What are the career options in Data Analytics? And what are the best Big Data programs in India?
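The sampling rule quoted above ("the larger the data sample, the more accurate are the statistics") can be seen in a quick simulation. This is an illustrative sketch, not anything from the article: the loan scenario, the 10% default rate and the population size are all assumptions we invented for the demonstration.

```python
# Illustrative simulation: estimates from larger samples land closer to the truth.
# The 10% loan-default rate and 1,000,000-loan population are assumed for the demo.
import random

random.seed(42)  # fixed seed so the run is reproducible

TRUE_DEFAULT_RATE = 0.10
# Simulated population: 1 = loan defaulted, 0 = loan repaid
population = [1 if random.random() < TRUE_DEFAULT_RATE else 0
              for _ in range(1_000_000)]

for n in (10, 1_000, 100_000):
    sample = random.sample(population, n)
    estimate = sum(sample) / n                # default rate seen in the sample
    error = abs(estimate - TRUE_DEFAULT_RATE)  # distance from the true rate
    print(f"sample size {n:>7}: estimated default rate {estimate:.3f} "
          f"(error {error:.3f})")
```

A sample of ten loans can easily miss the true default rate by several percentage points, while a sample of a hundred thousand pins it down to a fraction of a percent, which is exactly why big samples let lenders model reality instead of guessing at it.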