Page 1: Hadoop

HADOOP

DEVELOPED BY: JAYDEEP PATEL (13MCA63), KULDEEP PATEL (13MCA64)

Page 2: Hadoop

WHAT IS BIG DATA?

THE TERM BIG DATA REFERS TO A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT THEY ARE DIFFICULT TO CAPTURE, STORE, SEARCH AND ANALYZE USING TRADITIONAL DATA PROCESSING APPLICATIONS.

BIG DATA = STRUCTURED DATA + UNSTRUCTURED DATA

STRUCTURED DATA

UNSTRUCTURED DATA

Page 3: Hadoop

CHARACTERISTICS OF BIG DATA

THE 3 VS (VOLUME, VARIETY AND VELOCITY) ARE THE DEFINING PROPERTIES, OR DIMENSIONS, OF BIG DATA.

VOLUME REFERS TO THE AMOUNT OF DATA.

VARIETY REFERS TO THE NUMBER OF TYPES OF DATA.

VELOCITY REFERS TO THE SPEED AT WHICH DATA IS GENERATED AND PROCESSED.

Page 4: Hadoop

Continue…

Page 5: Hadoop

[DIAGRAM: CLUSTER OF SIX SERVERS, SERVER1 TO SERVER6]

Page 6: Hadoop

SO HADOOP IS...

• A PROJECT OF THE APACHE SOFTWARE FOUNDATION.

• A SOFTWARE FRAMEWORK WRITTEN IN JAVA.

• CROSS-PLATFORM.

• OPEN SOURCE.

THE HADOOP FRAMEWORK CONSISTS OF:

1. HADOOP COMMON

2. HDFS

3. HADOOP YARN

4. MAPREDUCE

Page 7: Hadoop

HDFS

HDFS IS A SPECIALLY DESIGNED FILE SYSTEM FOR STORING HUGE DATA SETS ON A CLUSTER OF COMMODITY HARDWARE WITH A STREAMING ACCESS PATTERN.

• CLUSTER

• COMMODITY HARDWARE

• STREAMING ACCESS PATTERN

• SPECIALLY DESIGNED FILE SYSTEM

Page 8: Hadoop

5 SERVICES (DAEMONS) OF HADOOP

• NAME NODE (HDFS)

• SECONDARY NAME NODE (HDFS)

• JOB TRACKER (MAPREDUCE)

• DATA NODE (HDFS)

• TASK TRACKER (MAPREDUCE)


Page 9: Hadoop

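[DIAGRAM: HDFS FILE REQUEST]

THE CLIENT ASKS THE NAME NODE FOR FILE A.TEXT. THE NAME NODE REPLIES THAT IT IS AVAILABLE ON DATA NODES (DN) 1, 2 AND 6.

THE CLUSTER HAS SIX DATA NODES (1 TO 6). FILES SHOWN: A.TEXT, B.TEXT, C.TEXT.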
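The request shown on this slide can be sketched as a lookup in the name node's metadata. This is a toy model, not the HDFS API: the class `ToyNameNode` and its methods are hypothetical names, and only the `A.Text` entry (data nodes 1, 2 and 6) comes from the slide.

```python
class ToyNameNode:
    """Toy stand-in for the name node: maps a file name to the data nodes holding it."""

    def __init__(self):
        self._locations = {}

    def register(self, filename, data_nodes):
        # Record which data nodes store the file (hypothetical helper).
        self._locations[filename] = sorted(data_nodes)

    def locate(self, filename):
        # Answer a client request: where is this file available?
        if filename not in self._locations:
            raise FileNotFoundError(filename)
        return list(self._locations[filename])


name_node = ToyNameNode()
name_node.register("A.Text", [1, 2, 6])  # values taken from the slide
print(name_node.locate("A.Text"))        # [1, 2, 6]
```

The client would then read the file from those data nodes directly; the name node only holds the metadata.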

Page 10: Hadoop

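[DIAGRAM: MAPREDUCE JOB SUBMISSION]

THE CLIENT SUBMITS ITS MAP LOGIC TO THE JOB TRACKER. A.TEXT IS STORED ON NODES 1, 2 AND 6.

EACH OF THE SIX NODES (1 TO 6) RUNS A TASK TRACKER (TT). FILES SHOWN: A.TEXT, B.TEXT, C.TEXT.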
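One reading of this diagram is that the job tracker sends the map logic to the task trackers on the nodes that already hold the file, so the data does not have to move. The slide does not state this explicitly, so the sketch below is an assumption, and `pick_task_trackers` is a hypothetical helper rather than anything in Hadoop itself.

```python
def pick_task_trackers(nodes_with_file, live_task_trackers):
    """Choose the task trackers that sit on nodes holding the file (assumed policy)."""
    chosen = [node for node in nodes_with_file if node in live_task_trackers]
    if not chosen:
        raise RuntimeError("no task tracker runs on a node that holds the data")
    return chosen


# A.Text lives on nodes 1, 2 and 6; all six nodes run a task tracker (from the slide).
print(pick_task_trackers([1, 2, 6], {1, 2, 3, 4, 5, 6}))  # [1, 2, 6]
```

Real scheduling also has to handle busy or failed nodes, which this sketch ignores.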

Page 11: Hadoop

INSTALLATION

Page 12: Hadoop

REQUIREMENTS FOR INSTALLATION

o JAVA 1.6.X, PREFERABLY FROM SUN, MUST BE INSTALLED.

o SSH MUST BE INSTALLED AND SSHD MUST BE RUNNING TO USE THE HADOOP SCRIPTS THAT MANAGE REMOTE HADOOP DAEMONS.

o INSTALL HADOOP-2.3.0 AND HADOOP-2.3-CONFIG-MASTER

o WWW.HADOOP.APACHE.ORG
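Before installing, the requirements above can be checked with a short script. This is a minimal sketch that only tests whether the `java`, `ssh` and `sshd` executables are on the PATH. It does not verify the Java version or whether sshd is actually running, and the function name `check_prerequisites` is hypothetical.

```python
import shutil


def check_prerequisites(which=shutil.which):
    """Report which required executables can be found on the PATH."""
    tools = ("java", "ssh", "sshd")
    return {tool: which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

The `which` parameter exists only so the lookup can be replaced in a test; normal use calls the function without arguments.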

Page 13: Hadoop

INSTALL JAVA

Page 14: Hadoop

SET THE JAVA PATH IN ENVIRONMENT VARIABLES

Page 15: Hadoop

REPLACE YARN.CMD IN THE BIN FOLDER OF THE EXTRACTED HADOOP-2.3.0.TAR.GZ

Page 16: Hadoop
Page 17: Hadoop

REPLACE THE WHOLE HADOOP FOLDER IN THE EXTRACTED TAR.GZ FOLDER WITH THE ONE FROM CONFIG MASTER

Page 18: Hadoop

SET THE HADOOP PATH IN ENVIRONMENT VARIABLES
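Once the Java and Hadoop paths are set, a quick check can confirm that the environment looks right. The slides do not name the variables, so `JAVA_HOME` and `HADOOP_HOME` are assumed here as the conventional names, and `missing_settings` is a hypothetical helper.

```python
import os


def _norm(path):
    # Normalise separators and (on Windows) letter case so paths compare reliably.
    return os.path.normcase(os.path.normpath(path))


def missing_settings(env=None):
    """List problems with JAVA_HOME / HADOOP_HOME (assumed names) and PATH."""
    env = os.environ if env is None else env
    path_entries = {_norm(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    problems = []
    for var in ("JAVA_HOME", "HADOOP_HOME"):
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
            continue
        bin_dir = os.path.join(home, "bin")
        if _norm(bin_dir) not in path_entries:
            problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_settings():
        print(problem)
```

An empty result means both variables are set and both `bin` folders are on the PATH.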

Page 19: Hadoop

OPEN CMD AND RUN HADOOP

Page 20: Hadoop
Page 21: Hadoop

FLOW CHART OF WORD COUNT JOB

Input File (File.txt, 200MB)

Input Splits: 64MB, 64MB, 64MB, 8MB

Each Input Split -> Record Reader -> Mapper

Record Reader output: (byteoffset , entireline)

(0 , hi how are you?)

(17 , how is your job?)

Mapper output:

(how,1)(what,1)

(is,1)(your,1)

(how,1)(is,1)

(brother,1)(now,1)
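The split sizes and the (byteoffset, entireline) records on this slide can be reproduced with a small sketch. It is plain Python, not the Hadoop classes: `split_sizes` and `read_records` are hypothetical names. The offset 17 for the second line matches the slide only if each line ends with a two-byte Windows line ending (`\r\n`), which is assumed here.

```python
def split_sizes(file_size_mb, split_size_mb=64):
    """Cut a file into full-size splits plus one smaller remainder split."""
    full, rest = divmod(file_size_mb, split_size_mb)
    return [split_size_mb] * full + ([rest] if rest else [])


def read_records(data):
    """Yield (byte offset, line) pairs, the way the slide's record reader does."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the line terminator counts towards the next offset


print(split_sizes(200))  # [64, 64, 64, 8]

sample = b"hi how are you?\r\nhow is your job?\r\n"  # assumed CRLF line endings
print(list(read_records(sample)))
# [(0, 'hi how are you?'), (17, 'how is your job?')]
```

With Unix line endings (`\n`) the second offset would be 16 instead of 17.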

Page 22: Hadoop

INTERMEDIATE DATA

Mapper Mapper Mapper Mapper

(what,1)

(is,1) (your,1)

(how,1) (is,1)

(brother,1) (now,1)

(time,1) (is,1)

(the,1)

(how,1)(hi,1)

(is,1)(how,1)

(are,1)(your,1)

(you,1)(job,1)

(how,1)(what,1)

(is,1)(your,1)

(how,1)(is,1)

(sister,1)(family,1)

(what,1)

(is,1) (use,1)

(of,1) (hadoop,1)

Intermediate Data -> Shuffling -> Sorting -> Reducer

(how,1,1,1,1,1) -> Reducer -> (how,5)
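The shuffle, sort and reduce steps can be sketched as grouping the mapper pairs by key, ordering the keys, and summing each group. The function names are hypothetical, and the input list is illustrative: it contains five `(how, 1)` pairs as on the slide, plus two `(is, 1)` pairs added only to show a second key.

```python
from collections import defaultdict


def shuffle_and_sort(pairs):
    """Group values by key and return the groups in key order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_counts(grouped):
    """Sum the values of each group, as the word count reducer does."""
    return [(key, sum(values)) for key, values in grouped]


# Illustrative intermediate data collected from several mappers.
pairs = [("how", 1), ("is", 1), ("how", 1), ("how", 1),
         ("is", 1), ("how", 1), ("how", 1)]

grouped = shuffle_and_sort(pairs)
print(grouped)                 # [('how', [1, 1, 1, 1, 1]), ('is', [1, 1])]
print(reduce_counts(grouped))  # [('how', 5), ('is', 2)]
```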

Page 23: Hadoop

COMPLETE FLOW

Input File (File.txt) -> Input Split -> Record Reader -> Mapper -> Reducer -> Record Writer -> Output File
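The whole flow can be imitated in a few lines on a single machine. This is only a sketch of the idea, not a Hadoop job: there is one split, the shuffle step is folded into a `Counter`, and `word_count` is a hypothetical name. The two input lines are the ones shown on the flow chart slide, and punctuation is dropped because the slides show `(you,1)` and `(job,1)` without the question mark.

```python
import io
import re
from collections import Counter


def word_count(input_file, output_file):
    counts = Counter()
    for line in input_file:                               # record reader: one line per record
        for word in re.findall(r"[a-z]+", line.lower()):  # mapper: one (word, 1) per word
            counts[word] += 1                             # reducer: sum the 1s per word
    for word in sorted(counts):                           # record writer: sorted by key
        output_file.write(f"({word},{counts[word]})\n")


source = io.StringIO("hi how are you?\nhow is your job?\n")
result = io.StringIO()
word_count(source, result)
print(result.getvalue())
```

In-memory `StringIO` objects stand in for the input and output files so the example runs without touching the disk.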

Page 24: Hadoop

OUTPUT

(are,1)

(brother,1)

(family,1)

(hadoop,1)

(hi,1)

(how,5)

(is,6)

(job,1)

(now,1)

(of,1)

(sister,1)

(the,2)

(time,1)

(use,1)

(what,2)

(you,1)

(your,4)

Page 25: Hadoop

THANK YOU!!!