Welcome to the World of Big Data & Hadoop
www.easylearning.guru
Agenda
• What is Big Data?
• Different Kinds of Big Data
• Big Data Global Market
• Hadoop Global Job Trends
• What is Hadoop?
What is Big Data?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Types of Big Data
Structured Data, Semi-Structured Data, Unstructured Data
A traditional RDBMS deals only with structured data.
There is a need for a technology that handles semi-structured and unstructured data as well as structured data.
The 3 V's of Big Data: Volume, Velocity, Variety
Sources of Data
Social Media & Networks (All of us are generating data)
Mobile Devices (Tracking all the objects all the time)
Sensor Technology & Networks (Measuring all kinds of data)
Scientific Instruments (Collecting all sorts of data)
Where is Big Data used?
Facebook Scenario
On average, Facebook generates 70 thousand MB of data in 1 minute.
1 minute = 70,000 MB
1 hour = 70,000 MB * 60 = 4.2 million MB
1 day = 4.2 million MB * 24 = 100.8 million MB ≈ 98,438 GB
1 week = 98,438 GB * 7 ≈ 689 thousand GB ≈ 689 TB
4 weeks = 689 TB * 4 = 2,756 TB ≈ 2.76 PB
52 weeks = 689 TB * 52 = 35,828 TB ≈ 35.8 PB
And that's a lot of data!
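The scaling arithmetic in this scenario can be checked with a short script. This is only a sketch under stated assumptions: the 70,000 MB per minute rate is taken from the slide, the names (`volume_gb`, `describe`) are hypothetical, and the unit steps (1,024 MB per GB, then 1,000 per step for TB and PB) are chosen to mirror the rounding the slide appears to use. Every period is scaled directly from the per-minute rate, so the 52-week figure is 52 times the weekly volume.

```python
MB_PER_MINUTE = 70_000   # rate quoted in the scenario
MB_PER_GB = 1024         # binary step (assumption: matches the ~98,438 GB per day figure)
GB_PER_TB = 1000         # decimal step (assumption: matches the slide's rounding)
TB_PER_PB = 1000         # decimal step (assumption)

def volume_gb(minutes: int) -> float:
    """Data volume in GB generated over the given number of minutes."""
    return MB_PER_MINUTE * minutes / MB_PER_GB

def describe(gb: float) -> str:
    """Format a GB figure in the largest unit that keeps the value >= 1."""
    tb = gb / GB_PER_TB
    if tb >= TB_PER_PB:
        return f"{tb / TB_PER_PB:.1f} PB"
    if tb >= 1:
        return f"{tb:,.1f} TB"
    return f"{gb:,.1f} GB"

MINUTES_PER_DAY = 60 * 24
periods_in_minutes = {
    "1 hour": 60,
    "1 day": MINUTES_PER_DAY,
    "1 week": MINUTES_PER_DAY * 7,
    "4 weeks": MINUTES_PER_DAY * 7 * 4,
    "52 weeks": MINUTES_PER_DAY * 7 * 52,
}

# Scale every period from the same per-minute rate.
volumes = {name: volume_gb(m) for name, m in periods_in_minutes.items()}

for name, gb in volumes.items():
    print(f"{name:>8}: {describe(gb)}")
```

Changing `GB_PER_TB` and `TB_PER_PB` to 1024 gives the strictly binary figures, which come out a few percent lower.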
Various Big Data Technologies
Big Data Global Market
[Figure: Big Data Growth (in USD Billions), 2012 to 2017]
[Figure: Big Data Implementation, Implemented Big Data vs. Yet to Implement Big Data]
[Figure: Filled/Vacancy (%) by role: Data Scientist, Big Data Visualizer, Big Data Research Analyst, Big Data Engineer, Big Data Architect, Big Data Analyst]
Sources: Dice, LinkedIn.
Hadoop Global Job Trends
Top Hadoop Technology Companies
More than 17,000 employees with Hadoop skills work across these companies.
Sources: Dice, LinkedIn.
Hadoop Global Job Trends
[Figure: Salary (USD per annum, in thousands)]
[Figure: Demand for Big Data in Cities (%)]
As of February 2014
Sources: Dice, LinkedIn.
What is Hadoop?
Hadoop was created by Doug Cutting and Mike Cafarella. It provides a reliable shared storage and analysis system. It is designed to scale from a single server to thousands of machines, with a high degree of fault tolerance.
Hadoop History
Hadoop Core Components
Core Hadoop has two main systems:
• Hadoop Distributed File System (HDFS): a distributed file system that holds large amounts of data across multiple nodes in a cluster.
• MapReduce: a distributed programming paradigm used to analyze the data stored in HDFS.
Hadoop Distributed File System (HDFS)
A given file is broken down into blocks (default = 64 MB in Hadoop 1.x; 128 MB from Hadoop 2.x), and the blocks are then replicated across the cluster (default replication factor = 3).
Optimized for throughput.
HDFS allows you to put/get/delete files.
Follows the philosophy "Write Once and Read Multiple Times".
Blocks are replicated for durability, high availability, and throughput.
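To make the block and replica idea concrete, here is a minimal sketch that cuts a file into 64 MB blocks and assigns each block to three nodes. The function names and node names are hypothetical, and the round-robin placement is an assumption made purely for illustration: real HDFS chooses replica locations with its own rack-aware policy, which this sketch does not model.

```python
BLOCK_SIZE_MB = 64   # default block size quoted on the slide
REPLICATION = 3      # default replication factor quoted on the slide

def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list:
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    # The last block only holds what is left over; it is not padded.
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])

def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin.

    Illustrative placement only (assumption); HDFS itself is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

# Hypothetical 200 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(200)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
raw_storage_mb = sum(blocks) * REPLICATION

print(blocks)
for block, holders in placement.items():
    print(f"block {block}: {holders}")
print(f"raw storage used: {raw_storage_mb} MB")
```

Because every block lives on three different nodes, losing any single node still leaves two copies of each block, which is where the durability and availability come from. The cost is that the cluster stores three times the logical file size.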
MapReduce Flow
MapReduce Framework
MapReduce works by breaking the processing into two phases: the Map phase and the Reduce phase.
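The two phases can be illustrated with a word count written in plain Python. This is a single-process sketch of the programming model, not Hadoop's Java API: the function names are hypothetical, and the `shuffle` step stands in for the grouping by key that the framework performs between the Map and Reduce phases.

```python
from collections import defaultdict

def map_phase(line: str) -> list:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs: list) -> dict:
    """Group all emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)

# Hypothetical input; on a cluster each line could be handled by a different mapper.
lines = ["big data hadoop", "hadoop stores big data", "hadoop"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, which is what lets a framework run them in parallel on many machines.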
What we offer
Syllabus
Introduction
a) Big Data
b) Hadoop
Hadoop
a) HDFS
b) MapReduce
Pig
a) Pig 1
b) Pig 2
Hive
a) Hive 1
b) Hive 2
HBase
ZooKeeper
Sqoop
YARN
Project Class
Thank you for watching the Live Demo for Hadoop. You can always contact us at:
Phone: +91 124 4763660 (India)
Email: contact@easylearning.guru
Skype ID: easylearning.guru
Website: www.easylearning.guru
Your queries are always welcome.