HADOOP
DEVELOPED BY: JAYDEEP PATEL (13MCA63), KULDEEP PATEL (13MCA64)
WHAT IS BIG DATA?
THE TERM BIG DATA REFERS TO A COLLECTION OF DATA SETS SO LARGE AND
COMPLEX THAT THEY ARE DIFFICULT TO CAPTURE, STORE, SEARCH AND
ANALYZE USING TRADITIONAL DATA PROCESSING APPLICATIONS.
BIG DATA = STRUCTURED DATA + UNSTRUCTURED DATA
CHARACTERISTICS OF BIG DATA
THE 3 VS (VOLUME, VARIETY AND VELOCITY) ARE THE DEFINING PROPERTIES OR
DIMENSIONS OF BIG DATA.
VOLUME REFERS TO THE AMOUNT OF DATA.
VARIETY REFERS TO THE NUMBER OF TYPES OF DATA.
VELOCITY REFERS TO THE SPEED AT WHICH DATA IS GENERATED AND PROCESSED.
[FIGURE: CLUSTER OF SIX SERVERS, SERVER1 TO SERVER6]
SO HADOOP IS..
• A PRODUCT OF THE APACHE SOFTWARE FOUNDATION.
• A SOFTWARE FRAMEWORK WRITTEN IN JAVA.
• CROSS-PLATFORM.
• OPEN SOURCE.
THE HADOOP FRAMEWORK IS BUILT OF:
1. HADOOP COMMON
2. HDFS
3. HADOOP YARN
4. MAPREDUCE
HDFS
IT IS A SPECIALLY DESIGNED FILE SYSTEM FOR STORING HUGE DATA SETS ON A
CLUSTER OF COMMODITY HARDWARE WITH A STREAMING ACCESS PATTERN.
• CLUSTER
• COMMODITY HARDWARE
• STREAMING ACCESS PATTERN
• SPECIALLY DESIGNED FILE SYSTEM
5 HADOOP SERVICES (DAEMONS)
• NAME NODE (HDFS)
• SECONDARY NAME NODE (HDFS)
• JOB TRACKER (MAPREDUCE)
• DATA NODE (HDFS)
• TASK TRACKER (MAPREDUCE)
[FIGURE: THE CLIENT SENDS A REQUEST FOR FILE A.Text TO THE NAME NODE. THE NAME NODE LISTS FILES A.Text, B.Text AND C.Text OVER SIX DATA NODES (DN 1 TO 6) AND ANSWERS THAT A.Text IS AVAILABLE ON (1,2,6).]
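The request in the figure can be sketched as a simple lookup. This is only an illustration, assuming the Name Node keeps a table from file name to Data Node numbers; the table `BLOCK_LOCATIONS` and the functions `locate` and `read_plan` are hypothetical names and not part of any HDFS API. Only the entry for A.Text is taken from the figure.

```python
# Hypothetical in-memory stand-in for the Name Node's metadata table.
# The Data Node numbers for A.Text come from the figure.
BLOCK_LOCATIONS = {
    "A.Text": (1, 2, 6),
}

def locate(filename):
    """Return the Data Node numbers that hold the file, as the Name Node would report them."""
    if filename not in BLOCK_LOCATIONS:
        raise FileNotFoundError(filename)
    return BLOCK_LOCATIONS[filename]

def read_plan(filename):
    """Client side: ask for the locations, then list the Data Nodes to read from."""
    return [f"DN{n}" for n in locate(filename)]

print(read_plan("A.Text"))  # ['DN1', 'DN2', 'DN6']
```

The point of the sketch is that the Name Node only answers with locations; the file data itself is read from the Data Nodes.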
[FIGURE: THE CLIENT SENDS ITS MAP LOGIC TO THE JOB TRACKER. SIX NODES (1 TO 6) EACH RUN A TASK TRACKER (TT). FILE LIST: A.Text (1,2,6), B.Text, C.Text.]
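One way to read the figure is that the Job Tracker hands the map logic to a Task Tracker on a node that already holds the data. The sketch below assumes exactly that, with a deliberately simplified policy (first free local node wins); `assign_map_task` and the `busy` set are hypothetical and do not describe the real scheduler.

```python
def assign_map_task(data_nodes, busy):
    """Pick the first node that holds the data and whose Task Tracker is free."""
    for node in data_nodes:
        if node not in busy:
            return node
    return None  # no Task Tracker next to the data is free

# A.Text lives on nodes 1, 2 and 6; assume node 1 is already busy.
print(assign_map_task((1, 2, 6), busy={1}))  # 2
```

Running the logic where the data already sits avoids moving the file across the network.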
INSTALLATION
REQUIREMENTS FOR INSTALLATION
o JAVA 1.6.X, PREFERABLY FROM SUN, MUST BE INSTALLED
o SSH MUST BE INSTALLED AND SSHD MUST BE RUNNING TO USE THE HADOOP SCRIPTS THAT
MANAGE REMOTE HADOOP DAEMONS
o INSTALL HADOOP-2.3.0 AND HADOOP-2.3-CONFIG-MASTER
o WWW.HADOOP.APACHE.ORG
1. INSTALL JAVA.
2. SET THE JAVA PATH IN THE ENVIRONMENT VARIABLES.
3. REPLACE YARN.CMD IN THE BIN FOLDER OF THE EXTRACTED HADOOP-2.3.0.TAR.GZ.
4. REPLACE THE WHOLE HADOOP FOLDER IN THE EXTRACTED TAR.GZ FOLDER WITH THE ONE FROM CONFIG MASTER.
5. SET THE HADOOP PATH IN THE ENVIRONMENT VARIABLES.
6. OPEN CMD AND RUN HADOOP.
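Before running Hadoop it can help to confirm that the environment variables from the steps above are really set. The sketch below assumes the conventional variable names JAVA_HOME and HADOOP_HOME, which the slides do not name explicitly; `missing_vars` is a hypothetical helper and the sample path is made up.

```python
import os

# Assumed variable names; the slides only say "Java path" and "Hadoop path".
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME")

def missing_vars(env=None):
    """Return the required variables that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Hypothetical environment where only Java has been configured.
print(missing_vars({"JAVA_HOME": "C:/Java/jdk1.6"}))  # ['HADOOP_HOME']
```

Calling `missing_vars()` with no argument checks the environment of the current process.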
FLOW CHART OF WORD COUNT JOB
Input File (File.txt, 200 MB)
4 Input Splits: 64 MB, 64 MB, 64 MB, 8 MB
4 Mappers, one per Input Split
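The split sizes follow from plain division of the file size by the split size. A minimal sketch, assuming a fixed split size of 64 MB as shown on the slide; `split_sizes` is a hypothetical helper, not a Hadoop function.

```python
def split_sizes(file_mb, split_mb=64):
    """Sizes in MB of the input splits for a file cut into fixed-size pieces."""
    full, rest = divmod(file_mb, split_mb)
    return [split_mb] * full + ([rest] if rest else [])

print(split_sizes(200))  # [64, 64, 64, 8]
```

Three full splits cover 192 MB and the remaining 8 MB form the fourth split, which is why the job uses four mappers.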
4 Record Readers, one per Input Split, each producing (byte offset, entire line) pairs:
(0 , hi how are you?)
(17 , how is your job?)
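The byte offsets can be reproduced with a small line reader. This sketch assumes Windows line endings (CR LF), under which the first line occupies 17 bytes; with a bare LF the second offset would be 16. `read_records` is a hypothetical stand-in for a record reader.

```python
def read_records(data: bytes):
    """Yield (byte offset, line) pairs, the way a line-oriented record reader would."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the line ending counts towards the next offset

sample = b"hi how are you?\r\nhow is your job?\r\n"
print(list(read_records(sample)))
# [(0, 'hi how are you?'), (17, 'how is your job?')]
```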
Mapper output as (word, 1) pairs:
(how,1) (what,1)
(is,1) (your,1)
(how,1) (is,1)
(brother,1) (now,1)
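The mapper step can be sketched as a function from one (byte offset, line) record to a list of (word, 1) pairs. The tokenization rule here (lowercase letters only, punctuation dropped) is an assumption based on the slide showing (you,1) for "you?"; `map_line` is a hypothetical name.

```python
import re

def map_line(offset, line):
    """Emit a (word, 1) pair for every word in the line; the byte offset key is ignored."""
    return [(word, 1) for word in re.findall(r"[a-z]+", line.lower())]

print(map_line(0, "hi how are you?"))
# [('hi', 1), ('how', 1), ('are', 1), ('you', 1)]
```

The mapper does no counting itself; repeated words simply produce repeated pairs.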
INTERMEDIATE DATA
Output of the 4 Mappers:
(what,1)
(is,1) (your,1)
(how,1) (is,1)
(brother,1) (now,1)
(time,1) (is,1)
(the,1)
(how,1) (hi,1)
(is,1) (how,1)
(are,1) (your,1)
(you,1) (job,1)
(how,1) (what,1)
(is,1) (your,1)
(how,1) (is,1)
(sister,1) (family,1)
(what,1)
(is,1) (use,1)
(of,1) (hadoop,1)
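Intermediate Data: Shuffling and Sorting
Shuffling and sorting group the intermediate pairs by word, e.g. (how,1,1,1,1,1). The Reducer sums each group to give (how,5).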
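The grouping and summing can be sketched in a few lines. This is a single-process illustration, not how a cluster distributes the work; `shuffle` and `reduce_counts` are hypothetical names and the input pairs are a made-up sample chosen so that "how" appears five times.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group values by key and return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_counts(key, values):
    """Sum the 1s collected for one word."""
    return key, sum(values)

# Hypothetical mapper output.
pairs = [("how", 1), ("is", 1), ("how", 1), ("how", 1), ("hi", 1), ("how", 1), ("how", 1)]
grouped = shuffle(pairs)
print(grouped)  # [('hi', [1]), ('how', [1, 1, 1, 1, 1]), ('is', [1])]
print([reduce_counts(k, v) for k, v in grouped])  # [('hi', 1), ('how', 5), ('is', 1)]
```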
COMPLETE FLOW
1. Input File (File.txt)
2. Input Split
3. Record Reader
4. Mapper
5. Reducer
6. Record Writer
7. Output File
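The last step of the flow, the record writer, can be sketched as writing one line per reduced pair. The tab between word and count mirrors Hadoop's default text output; `write_records` is a hypothetical helper and the `StringIO` buffer stands in for the real output file.

```python
import io

def write_records(reduced, out):
    """Write one 'word<TAB>count' line per reduced pair."""
    for word, count in reduced:
        out.write(f"{word}\t{count}\n")

buffer = io.StringIO()  # stands in for the output file
write_records([("hi", 1), ("how", 5)], buffer)
print(repr(buffer.getvalue()))  # 'hi\t1\nhow\t5\n'
```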
OUTPUT
(are,1)
(brother,1)
(family,1)
(hadoop,1)
(hi,1)
(how,5)
(is,6)
(job,1)
(now,1)
(of,1)
(sister,1)
(the,2)
(time,1)
(use,1)
(what,2)
(you,1)
(your,4)
THANK YOU!!!