Welcome to the World of Big Data & Hadoop www.easylearning.guru
Jul 14, 2015
Agenda
What is Big Data?
Different Kinds of Big Data
Big Data Global Market
Hadoop Global Job Trends
What is Hadoop?
What is Big Data?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Types of Big Data
Traditional RDBMS deals only with structured data.
There is a need for a technology that handles semi-structured and unstructured data as well as structured data.
Sources of Data
Social Media & Networks
(All of us are generating data)
Mobile Devices
(Tracking all the objects all the time)
Sensor Technology & Networks
(Measuring all kinds of data)
Scientific Instruments
(Collecting all sorts of data)
Facebook Scenario
Facebook generates, on average, 70 thousand MB of data per minute.
1 hour = 70,000 MB * 60 = 4.2 million MB
1 day = 4.2 million MB * 24 = 100.8 million MB ≈ 98,438 GB (at 1,024 MB per GB)
1 week = 98,438 GB * 7 ≈ 689 thousand GB ≈ 673 TB
4 weeks = 673 TB * 4 ≈ 2,692 TB ≈ 2.63 PB
52 weeks = 2.63 PB * 13 ≈ 34.2 PB
And that's a lot of data!
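The scenario above is simple unit arithmetic, so it is easy to check with a few lines of Python. This is a minimal sketch, assuming the 70,000 MB per minute rate quoted on the slide and binary units (1,024 MB per GB, and so on); the names `volume_mb` and `humanize` are illustrative only. Note that 52 weeks is 13 four-week periods.

```python
# Rough data-volume estimate from a per-minute rate.
MB_PER_MINUTE = 70_000  # rate quoted on the slide
UNIT = 1024             # assumption: binary units (1 GB = 1,024 MB, etc.)


def volume_mb(minutes: int) -> int:
    """Total data generated over the given number of minutes, in MB."""
    return MB_PER_MINUTE * minutes


def humanize(mb: float) -> str:
    """Express a size in MB using the largest unit that keeps the value >= 1."""
    units = ["MB", "GB", "TB", "PB"]
    value = float(mb)
    index = 0
    while value >= UNIT and index < len(units) - 1:
        value /= UNIT
        index += 1
    return f"{value:,.1f} {units[index]}"


MINUTES_PER_DAY = 60 * 24
PERIODS = {
    "1 hour": 60,
    "1 day": MINUTES_PER_DAY,
    "1 week": MINUTES_PER_DAY * 7,
    "4 weeks": MINUTES_PER_DAY * 28,
    "52 weeks": MINUTES_PER_DAY * 364,
}

for label, minutes in PERIODS.items():
    print(f"{label:>8}: {humanize(volume_mb(minutes))}")
```

Changing `UNIT` to 1000 switches the whole calculation to decimal units, which shifts every figure by a few percent but not the order of magnitude.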
Big Data Global Market
Sources: Dice, LinkedIn.
[Chart: Big Data Implementation (Implemented Big Data vs. Yet to Implement Big Data)]
[Chart: Big Data Growth (in USD Billions), 2012 to 2017]

Filled/Vacancy (%)
Role                          Filled  Unfilled
Big Data Analyst                  50        50
Big Data Architect                43        57
Big Data Engineer                 44        56
Big Data Research Analyst         31        69
Big Data Visualizer               23        77
Data Scientist                    18        82
Hadoop Global Job Trends
Top Hadoop Technology Companies
Sources : Dice, LinkedIn.
More than 17,000 employees with Hadoop skills across these companies.
Hadoop Global Job Trends
[Chart: Demand for Big Data in Cities, as of February 2014. Shares: 2%, 2%, 3%, 4%, 8%, 8%, 10%, 11%, 14%, 38%]
[Chart: Salary (USD p.a., in thousands)]
Sources: Dice, LinkedIn.
What is Hadoop?
Hadoop was created by Doug Cutting and Mike Cafarella.
Hadoop provides a reliable shared storage and analysis system.
It is designed to scale from a single server to thousands of machines, with a high degree of fault tolerance.
Hadoop Core Components
Core Hadoop has two main systems:
• Hadoop Distributed File System (HDFS): a distributed file system that stores large amounts of data across multiple nodes in a cluster.
• MapReduce: a distributed programming paradigm used to analyze the data stored in HDFS.
Hadoop Distributed File System (HDFS)
A given file is broken down into blocks (default block size = 64 MB in Hadoop 1.x; 128 MB from Hadoop 2.x onward), and each block is replicated across the cluster (default replication factor = 3).
Optimized for throughput.
HDFS allows you to put/get/delete files.
Follows the philosophy "Write Once and Read Multiple Times".
Block Replication for:
- Durability, High Availability and Throughput.
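To make the block and replication defaults concrete, the following sketch estimates how many blocks a file occupies and how much raw cluster storage its replicas consume. It is plain arithmetic, not an HDFS API call; the function name `hdfs_footprint` and its parameters are illustrative assumptions, with defaults taken from the figures on this slide.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Return (blocks, block replicas, raw storage in MB) for one file.

    Defaults mirror the slide: 64 MB blocks, replication factor 3.
    """
    # The final block may be partially filled, so round up.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the cluster.
    replicas = blocks * replication
    # A partial last block only occupies its actual size on disk,
    # so raw storage is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


# A 1,000 MB file with the defaults above.
print(hdfs_footprint(1000))  # (16, 48, 3000)
```

Passing `block_size_mb=128` models the larger default block size used by newer Hadoop releases, which halves the number of blocks the cluster has to track for the same file.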
MapReduce Framework
MapReduce works by breaking the processing into two phases: the Map phase and the Reduce phase.
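The two phases can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use the Hadoop API; the function names are illustrative, and the `shuffle` step stands in for the grouping by key that the framework performs between the Map and Reduce phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data is big", "hadoop handles big data"]))
# {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'handles': 1}
```

On a real cluster, the map calls run in parallel on the nodes holding each block of input, and the reduce calls run in parallel per key, which is what lets the same two-function design scale out.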
Syllabus
Introduction
a) Big Data
b) Hadoop
Hadoop
a) HDFS
b) MapReduce
Pig
a) Pig 1
b) Pig 2
Hive
a) Hive 1
b) Hive 2
HBase
ZooKeeper
Sqoop
YARN
Project Class
Thank you for watching the Live Demo for Hadoop.
Your queries are always welcome.
You can always contact us at:
Phone : +91 124 4763660 (India)
Email : [email protected]
Skype Id : easylearning.guru
Website : www.easylearning.guru