BIG DATA-INTRODUCTION BY QUONTRA SOLUTIONS
What is BIG DATA?
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.
BIG DATA-Characteristics
Volume – The quantity of data that is generated. The size of the data determines its value and potential, and whether it can be considered Big Data at all. The name ‘Big Data’ itself refers to size, hence this characteristic.
Variety - The next aspect of Big Data is its variety: the category to which the data belongs. Analysts need to know this category in order to use the data effectively and to their advantage.
Velocity - The term ‘velocity’ refers to the speed at which data is generated and processed to meet the demands and challenges that lie ahead in the path of growth and development.
Variability - This factor can be a problem for those who analyze the data. It refers to the inconsistency the data can show at times, which hampers handling and managing the data effectively.
Veracity - The quality of the data being captured can vary greatly. Accuracy of analysis depends on the veracity of the source data.
Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the ‘complexity’ of Big Data.
BIG DATA HISTORY
The first major data project was created in 1937, ordered by Franklin D. Roosevelt’s administration in the USA. After the Social Security Act became law in 1935, the government had to keep track of contributions from 26 million Americans and more than 3 million employers. IBM got the contract to develop a punch card-reading machine for this massive bookkeeping project.
1937
The first data-processing machine appeared in 1943 and was developed by the British to decipher Nazi codes during World War II. This device, named Colossus, searched for patterns in intercepted messages at a rate of 5,000 characters per second, reducing the task from weeks to mere hours.
1943
In 1952 the National Security Agency (NSA) was created, and within 10 years it had contracted more than 12,000 cryptologists. They were confronted with information overload during the Cold War as they started collecting and processing intelligence signals automatically.
1952
In 1965 the United States government decided to build the first data center to store over 742 million tax returns and 175 million sets of fingerprints by transferring all those records onto magnetic computer tape that had to be stored in a single location. The project was later dropped out of fear of ‘Big Brother’, but it is generally accepted that it was the beginning of the electronic data storage era.
1965
In 1989 British computer scientist Tim Berners-Lee invented the World Wide Web. He wanted to facilitate the sharing of information via a ‘hypertext’ system. Little could he know at the time the impact his invention would have.
1989
From the ‘90s onward, the creation of data was spurred as more and more devices were connected to the internet. In 1995 the first super-computer was built, which was able to do as much work in a second as a calculator operated by a single person can do in 30,000 years.
1995
In 2005 Roger Magoulas from O’Reilly Media coined the term Big Data for the first time, only a year after they created the term Web 2.0. It refers to a large set of data that is almost impossible to manage and process using traditional business intelligence tools.
2005
2005 is also the year that Hadoop was created by Yahoo!, built on top of Google’s MapReduce. Its goal was to index the entire World Wide Web, and nowadays the open-source Hadoop is used by a lot of organizations to crunch through huge amounts of data.
2005
As more and more social networks started appearing and Web 2.0 took flight, more and more data was created on a daily basis. Innovative startups slowly started to dig into this massive amount of data, and governments also started working on Big Data projects. In 2009 the Indian government decided to take an iris scan, fingerprint and photograph of all of its 1.2 billion inhabitants. All this data is stored in the largest biometric database in the world.
2009
In 2010 Eric Schmidt spoke at the Techonomy conference in Lake Tahoe, California, and stated that "there were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days."
2010
In 2011 the McKinsey report “Big data: The next frontier for innovation, competition, and productivity” stated that by 2018 the USA alone would face a shortage of 140,000 to 190,000 data scientists as well as 1.5 million data managers.
2011
PRESENT
Thank You