Bd class 2 complete

Posted on 04-Nov-2014


DESCRIPTION

BigData Class 2

Transcript

BUMPER

Topic 1

HDFS – Hands On (Part – 1)

Class 2 – Hadoop Distributed File System

AGENDA

• What is Big Data?
• Hadoop Distributed File System
• MapReduce
• Understanding Hadoop Ecosystem
• Setting up a Hadoop Cluster
• HDFS – Hands On
• MapReduce – Hands On

Pre-requisites

HDFS – Hands On

Virtual Machine is up and running.

You are connected to your Virtual Machine via PuTTY as ‘hduser’.

Command Syntax

HDFS – Hands On

hadoop fs -ls / (lists the contents of the root directory)

hadoop fs -<command> <args>

hadoop: The Hadoop launcher in the bin/ directory.

fs: Invokes the Hadoop file system shell, which works on the configured default file system (HDFS in our case).

<command>: The operation to perform; always preceded by a hyphen (-).

<args>: The arguments that the command takes.
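If you later want to run these commands from a script, the syntax maps directly onto an argument list. This is a minimal Python sketch, assuming the `hadoop` launcher is on the PATH (the .bashrc step in this class adds it); the helper names `hadoop_fs_argv` and `hadoop_fs` are hypothetical, not part of Hadoop.

```python
import subprocess
from typing import List


def hadoop_fs_argv(command: str, *args: str) -> List[str]:
    """Build the argument list for `hadoop fs -<command> <args>`."""
    # The command name always gets exactly one plain ASCII hyphen in front.
    return ["hadoop", "fs", "-" + command.lstrip("-"), *args]


def hadoop_fs(command: str, *args: str) -> str:
    """Run a hadoop fs command and return its standard output.

    Assumes `hadoop` is on the PATH; raises subprocess.CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        hadoop_fs_argv(command, *args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(hadoop_fs_argv("ls", "/"))  # ['hadoop', 'fs', '-ls', '/']
```

Passing a list (instead of one shell string) avoids quoting problems with paths that contain spaces.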

Where do DataNodes store data?

HDFS – Hands On

hadoop.tmp.dir = /tmp/hadoop

dfs.data.dir = ${hadoop.tmp.dir}/dfs/data = /tmp/hadoop/dfs/data

VERSION >> Java properties file
blk_********* >> Raw data of a file
blk_******.meta >> Metadata of the block (its checksums)

How come there is a block when we have not loaded any file?

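jobtracker.info

HDFS – Hands On

The block belongs to jobtracker.info, a small file that the JobTracker writes into HDFS (under its system directory) when it starts, so it exists before we load any data.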

fsck

HDFS – Hands On

Generates a summary report that lists the overall health of the file system:

hadoop fsck /

fsck

HDFS – Hands On

Total size: Indicates the size of the directory (the root directory in our case). Does not account for replication.

Total dirs: Indicates the number of directories in HDFS

Total files: Indicates the number of files in HDFS

Total blocks: Indicates the number of blocks

Default replication factor:
Average replication factor:
Corrupt blocks:
Missing replicas:
Number of data nodes:
Number of racks:

Edit .bashrc

HDFS – Hands On

Navigate to the home directory.

cd

List hidden files.

ls -a

Edit the .bashrc file.

vi .bashrc

Add the Hadoop environment variables with the ‘export’ command.

export HADOOP_CONF_DIR=/home/hduser/hadoop/conf
export HADOOP_PREFIX=/home/hduser/hadoop

# Add Hadoop bin/ directory to path

export PATH=$PATH:$HADOOP_PREFIX/bin

Execute the updated contents of the .bashrc file.

source ~/.bashrc

copyFromLocal

HDFS – Hands On

Copies a file from the local file system to HDFS.

hadoop fs -copyFromLocal <Path to source file on Local File System> <Target path in HDFS>

hadoop fs -copyFromLocal NOTICE.txt noticehdfs.txt

copyFromLocal

HDFS – Hands On

Internally, the copyFromLocal command results in:

the file being split into blocks (one or more, depending on its size).

the client contacting the NameNode to find out where each block should be copied in the cluster.

replication of the blocks to the DataNodes assigned by the NameNode.

How many blocks were created?

HDFS – Hands On
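One way to answer this on the VM is to count the block files in the DataNode's data directory. A minimal Python sketch, assuming the Hadoop 1.x layout and naming shown earlier (blk_<id> for data, blk_<id>_<generation stamp>.meta for checksums) and the data directory /tmp/hadoop/dfs/data/current; `count_blocks` is a hypothetical helper.

```python
import os
import re

# Block data files are named blk_<id>; the id may be negative.
# Checksum files (blk_<id>_<stamp>.meta) deliberately do not match.
BLOCK_FILE = re.compile(r"^blk_-?\d+$")


def count_blocks(data_dir: str) -> int:
    """Count block data files under a DataNode data directory.

    Walks subdirectories too, because the DataNode may spread blocks
    over subdir* folders once a directory holds many of them.
    A missing directory simply yields 0.
    """
    total = 0
    for _root, _dirs, files in os.walk(data_dir):
        total += sum(1 for name in files if BLOCK_FILE.match(name))
    return total


if __name__ == "__main__":
    # Path from the dfs.data.dir setting above; adjust for your VM.
    print(count_blocks("/tmp/hadoop/dfs/data/current"))
```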

RECAP

• HDFS Commonly used commands
• HDFS Concepts

BUMPER

BUMPER

Topic 2

HDFS – Hands On (Part – 2)

Class 2 – Hadoop Distributed File System

Load a file larger than the block size

HDFS – Hands On

Load a 200 MB file and see how many blocks are created.

Command to generate a 200 MB dummy file:
dd if=/dev/zero of=file.txt count=1024 bs=204800

hadoop fs -copyFromLocal file.txt file.txt
cd /tmp/hadoop/dfs/data/current
ls -lrt

Load a file larger than the block size

HDFS – Hands On

Block 1 = 64 MB

Block 2 = 64 MB

Block 3 = 64 MB

Block 4 = 8 MB (the remainder: 200 MB − 3 × 64 MB)
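The block layout is simple arithmetic: HDFS cuts the file into full blocks of the configured size (64 MB here, the Hadoop 1.x default for dfs.block.size), and the last block holds only the remainder. A minimal Python sketch, assuming 1 MB = 1024 × 1024 bytes as in the dd command; `block_sizes` is a hypothetical helper.

```python
from typing import List

MB = 1024 * 1024


def block_sizes(file_size: int, block_size: int = 64 * MB) -> List[int]:
    """Return the size in bytes of each HDFS block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds (and occupies on disk) the leftover data.
        sizes.append(remainder)
    return sizes


# dd if=/dev/zero of=file.txt count=1024 bs=204800 writes 1024 * 204800 bytes.
dummy_file = 1024 * 204800
print([size // MB for size in block_sizes(dummy_file)])  # [64, 64, 64, 8]
```

A file smaller than the block size, such as NOTICE.txt, therefore occupies a single block.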

fsck

HDFS – Hands On

fsck after loading 2 additional files.

Total size has increased.
Total dirs: 7. Additions: the /user and /user/hduser directories.
Total files: 3. Additions: the 2 newly loaded files.
Total blocks: 6. Additions: 1 block of the 1st file and 4 blocks of the 2nd file.

cat

HDFS – Hands On

Displays the contents of a file on the console.

hadoop fs -cat <Path of file in HDFS>

hadoop fs -cat noticehdfs.txt

copyToLocal

HDFS – Hands On

Copies a file from HDFS to the local file system.

hadoop fs -copyToLocal <Path of file in HDFS> <Path of file in Local File System>

hadoop fs -copyToLocal noticehdfs.txt noticelocal.txt

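mkdir

HDFS – Hands On

Creates a directory inside HDFS. Relative HDFS paths are resolved against the current user’s HDFS home directory (/user/hduser here); absolute paths start with ‘/’.

Creates a directory in the current user’s home directory:
hadoop fs -mkdir newdir

Creates a new directory under the root:
hadoop fs -mkdir /newdir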
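The difference between the two mkdir examples comes down to path resolution. This is a simplified Python model, assuming the default HDFS home directory /user/<username>; it ignores full URIs such as hdfs://host:port/path, and `resolve_hdfs_path` is a hypothetical helper.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str = "hduser") -> str:
    """Resolve an HDFS path the way the fs shell treats relative paths.

    Absolute paths (starting with '/') are used as-is; relative paths
    are resolved against the user's home directory, /user/<user>.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/user", user, path))


print(resolve_hdfs_path("newdir"))   # /user/hduser/newdir
print(resolve_hdfs_path("/newdir"))  # /newdir
```

This is also why loading noticehdfs.txt with a relative target path created the /user and /user/hduser directories reported by fsck.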

rm

HDFS – Hands On

Removes files.

hadoop fs -rm <File Name>

Removes files and empty directories. Example:
hadoop fs -rm noticehdfs.txt

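Trash feature

HDFS – Hands On

Guards against accidental deletion: deleted files and directories are moved to a .Trash directory in the user’s HDFS home directory instead of being removed immediately.

Disabled by default (fs.trash.interval = 0).

To enable it, set the fs.trash.interval property in the core-site.xml file to the number of minutes deleted files should be kept in the trash.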

RECAP

• HDFS Commonly used commands
• HDFS Concepts

BUMPER

BUMPER

Topic 3

HDFS – Web UI

Class 2 – Hadoop Distributed File System

NameNode Web Interface

HDFS – Hands On

HDFS Web Interface URL.

http://<namenode_host>:50070/

From the Virtual Machine:

http://localhost:50070/

From outside the Virtual Machine:
http://<IP Address of VM or Hostname of VM>:50070/
Example: http://192.168.234.135:50070/

NameNode Web Interface

HDFS – Hands On

• Server name and port
• Last start time of the NameNode
• Hadoop version, followed by the Subversion revision it was built from
• Link to browse the files in HDFS
• Link to view the NameNode log files
• Number of files, directories and blocks; heap memory used and available
• Configured Capacity: storage capacity of the machines in the cluster
• DFS Used: space used by HDFS
• Non DFS Used: space used by the O/S, applications, etc.
• DFS Remaining: space still available to HDFS
• Number of Under-Replicated Blocks: blocks that have fewer replicas than the replication factor
• Live Nodes: nodes that are active and in contact with the NameNode
• Dead Nodes: nodes that are NOT in contact with the NameNode
• Decommissioning Nodes: nodes being administratively removed from the cluster

RECAP

HDFS Web UI

BUMPER

BUMPER

Topic 4

Class 2 – Hadoop Distributed File System

MapReduce – Hands On (Part – 1)

How does MapReduce work?

MapReduce

[Diagram: Mapping Phase: Map Input List → Mapper → Map Output List; Reducing Phase: Reduce Input List → Reducer → Reduce Output List]

Hadoop MapReduce

MapReduce

[Diagram: word count, stage by stage]

Input: King Queen King | Minister King Soldier | Queen Soldier King

Splitting: <1, King Queen King> | <2, Minister King Soldier> | <3, Queen Soldier King>

Map (Map Output): <King, 1><Queen, 1><King, 1> | <Minister, 1><King, 1><Soldier, 1> | <Queen, 1><Soldier, 1><King, 1>

Shuffling: <King, 1><King, 1><King, 1><King, 1> | <Minister, 1> | <Queen, 1><Queen, 1> | <Soldier, 1><Soldier, 1>

Reduce (input grouped by key): <King, (1,1,1,1)> | <Minister, 1> | <Queen, (1,1)> | <Soldier, (1,1)>

Result: <King, 4> <Minister, 1> <Queen, 2> <Soldier, 2>
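The same word count can be traced in a few lines of Python. This is a single-process sketch, not Hadoop code: `mapper`, `reducer` and `run_job` are hypothetical names, and the real framework runs many map and reduce tasks in parallel on different nodes. It does show the split in roles: you write the map and reduce logic, and the framework handles splitting, shuffling and grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# --- User-defined logic -------------------------------------------------
def mapper(_key: int, line: str) -> Iterator[Tuple[str, int]]:
    """Emit <word, 1> for every word in the input line."""
    for word in line.split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum the list of 1s collected for one word."""
    return word, sum(counts)


# --- Work the framework does for you ------------------------------------
def run_job(lines: List[str]) -> Dict[str, int]:
    # Splitting: one record per line, keyed by line number as in the slide.
    records = enumerate(lines, start=1)
    # Map: apply the user's mapper to every record.
    map_output = [pair for key, line in records for pair in mapper(key, line)]
    # Shuffling: group all values that share a key, e.g. <King, (1,1,1,1)>.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in map_output:
        grouped[word].append(count)
    # Reduce: call the user's reducer once per key, in sorted key order.
    return dict(reducer(word, grouped[word]) for word in sorted(grouped))


print(run_job(["King Queen King", "Minister King Soldier", "Queen Soldier King"]))
# {'King': 4, 'Minister': 1, 'Queen': 2, 'Soldier': 2}
```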

Hadoop MapReduce – Roles: User vs. Framework

MapReduce

[Diagram: the word-count flow, annotated with the responsibilities at each stage]

• Load data into HDFS
• Specify Path & Input Format
• Create ‘Input Splits’
• Create individual Records
• User Defined Logic (Map)
• User Defined Logic (Reduce)
• Specify Path & Output format
• Replication, Rack Awareness etc.
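The two framework steps "Create ‘Input Splits’" and "Create individual Records" can be sketched in Python. The split rule is modeled on Hadoop's FileInputFormat, which uses splitSize = max(minSize, min(maxSize, blockSize)) and lets the last split be up to 10% larger than the split size (its SPLIT_SLOP of 1.1). The record reader mimics TextInputFormat, whose key is the byte offset of each line rather than the line numbers 1, 2, 3 used in the diagram. Function names are hypothetical, and the real record reader also handles lines that cross split boundaries, which this sketch ignores.

```python
from typing import Iterator, List, Tuple

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_splits(file_size: int, block_size: int = 64 * MB,
                   min_size: int = 1, max_size: int = 2**63 - 1) -> List[Tuple[int, int]]:
    """Return (offset, length) input splits for one file."""
    split_size = max(min_size, min(max_size, block_size))
    splits = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_size - remaining, split_size))
        remaining -= split_size
    if remaining:
        splits.append((file_size - remaining, remaining))
    return splits


def read_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield <byte offset, line> records, like TextInputFormat's record reader."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


print(len(compute_splits(200 * MB)))  # 4 splits: 64 + 64 + 64 + 8 MB
print(len(compute_splits(70 * MB)))   # 1 split: 70 MB is within 10% of 64 MB
print(list(read_records(b"King Queen King\nMinister King Soldier\nQueen Soldier King\n")))
# [(0, 'King Queen King'), (16, 'Minister King Soldier'), (38, 'Queen Soldier King')]
```

The 70 MB case shows that splits and blocks are not always one-to-one: that file spans two HDFS blocks but becomes a single input split.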

MapReduce Execution FrameworkMapReduce

Reduce Process

Mapper Process

Input HDFS File - inputFile.txtBlock A Block B Block C

Driver

Mapper

Reducer

Record Reader

Input Split 1 Input Split 2 Input Split 3 Input Split 4

Writer

InputFormat

Output Data

Reduce Process

Mapper Process

Mapper

Reducer

Record Reader

Writer

Output Data

Reads

Passes <K,V> pairs

Writes

Reads

Passes <K,V> pairs

Writes

OutputFormat

Calculates

Defines

DefinesDefines

Passes <K,V> pairs

Passes <K,V> pairs

<K, V> pairs <K, V> pairs

Partition ShuffleSort

MapReduce Execution FrameworkMapReduce

Reduce Process

Mapper Process

Input HDFS File - inputFile.txtBlock A Block B Block C

Driver

Mapper

Reducer

Record Reader

Input Split 1 Input Split 2 Input Split 3 Input Split 4

Writer

InputFormat

Output Data

Reduce Process

Mapper Process

Mapper

Reducer

Record Reader

Writer

Output Data

Reads

Passes <K,V> pairs

Writes

Reads

Passes <K,V> pairs

Writes

OutputFormat

Defines

Calculates

Defines

DefinesDefines

Passes <K,V> pairs

Passes <K,V> pairs

<K, V> pairs <K, V> pairs

Partition ShuffleSort

MapReduce Execution FrameworkMapReduce

Reduce Process

Mapper Process

Input HDFS File - inputFile.txtBlock A Block B Block C

Driver

Mapper

Reducer

Record Reader

Input Split 1 Input Split 2 Input Split 3 Input Split 4

Writer

InputFormat

Output Data

Reduce Process

Mapper Process

Mapper

Reducer

Record Reader

Writer

Output Data

Reads

Passes <K,V> pairs

Writes

Reads

Passes <K,V> pairs

Writes

OutputFormat

Defines

Defines

Calculates

Defines

DefinesDefines

Passes <K,V> pairs

Passes <K,V> pairs

<K, V> pairs <K, V> pairs

Partition ShuffleSort

MapReduce Execution FrameworkMapReduce

Reduce Process

Mapper Process

Input HDFS File - inputFile.txtBlock A Block B Block C

Driver

Mapper

Reducer

Record Reader

Input Split 1 Input Split 2 Input Split 3 Input Split 4

Writer

InputFormat

Output Data

Reduce Process

Mapper Process

Mapper

Reducer

Record Reader

Writer

Output Data

Reads

Passes <K,V> pairs

Writes

Reads

Passes <K,V> pairs

Writes

OutputFormat

Defines

Defines

Calculates

Defines

Defines

DefinesDefines

Passes <K,V> pairs

Passes <K,V> pairs

<K, V> pairs <K, V> pairs

Partition ShuffleSort

MapReduce Execution FrameworkMapReduce

Reduce Process

Mapper Process

Input HDFS File - inputFile.txtBlock A Block B Block C

Driver

Mapper

Reducer

Record Reader

Input Split 1 Input Split 2 Input Split 3 Input Split 4

Writer

InputFormat

Output Data

Reduce Process

Mapper Process

Mapper

Reducer

Record Reader

Writer

Output Data

Reads

Passes <K,V> pairs

Writes

Reads

Passes <K,V> pairs

Writes

OutputFormat

Defines

Defines

Calculates

Defines

Defines

Defines

DefinesDefines

Passes <K,V> pairs

Passes <K,V> pairs

<K, V> pairs <K, V> pairs

Partition ShuffleSort
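In the MapReduce Execution Framework, the InputFormat "calculates" the input splits from the HDFS blocks of inputFile.txt. The plain-Java sketch below is loosely modelled on the split logic of Hadoop's FileInputFormat and does not use the Hadoop API. It assumes the split size is the block size clamped between a minimum and a maximum, and that a slop factor of 1.1 lets the last split run up to 10% over the split size instead of leaving a tiny extra split. The class name SplitCalculator and the block, file and maximum split sizes in main are hypothetical, chosen so that a file of three blocks yields four splits. A real InputFormat also handles multiple files, non-splittable files and block locations, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of input split calculation, loosely modelled on
// Hadoop's FileInputFormat; it does not use the Hadoop API.
class SplitCalculator {
    // Assumed slop factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // One input split: the byte range [start, start + length) of the input file.
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Split[start=" + start + ", length=" + length + "]";
        }
    }

    // Split size is the block size, clamped between minSize and maxSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<Split> getSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        // Cut full-size splits while more than SPLIT_SLOP splits' worth of data remains.
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(fileLength - bytesRemaining, splitSize));
            bytesRemaining -= splitSize;
        }
        // Whatever is left becomes the last split.
        if (bytesRemaining != 0) {
            splits.add(new Split(fileLength - bytesRemaining, bytesRemaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 64 * mb;        // hypothetical HDFS block size
        long fileLength = 3 * blockSize; // hypothetical file stored as Block A, B and C
        long maxSize = 48 * mb;          // hypothetical maximum split size
        long splitSize = computeSplitSize(blockSize, 1, maxSize);
        for (Split split : getSplits(fileLength, splitSize)) {
            System.out.println(split);
        }
    }
}
```

Because the hypothetical maximum split size is smaller than the block size, the three blocks are cut into four equal splits; with the default settings the split size usually equals the block size, giving one split per block.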

RECAP

MapReduce Execution Framework
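The Partition, Shuffle and Sort stage of the MapReduce Execution Framework, which sits between the Mapper and the Reducer, can be modelled in a few lines of plain Java. This single-process sketch is an illustration, not Hadoop code: the class name ShuffleSketch is hypothetical, and the partition function uses the same formula as Hadoop's default HashPartitioner but applies it to Java's String.hashCode instead of Hadoop's Text hash, so real reducer assignments on a cluster may differ. A TreeMap per reducer stands in for the sort, so each reducer sees its keys in sorted order with all values for a key grouped together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process model of the framework's Partition, Shuffle and Sort step.
class ShuffleSketch {
    // HashPartitioner-style formula: clear the sign bit, then take the
    // remainder modulo the number of reduce tasks.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Routes every <K,V> pair to one reducer, then groups values by key;
    // TreeMap keeps each reducer's keys sorted.
    static List<TreeMap<String, List<Integer>>> shuffleAndSort(
            List<Map.Entry<String, Integer>> mapOutput, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> pair : mapOutput) {
            int p = getPartition(pair.getKey(), numReduceTasks);
            partitions.get(p)
                    .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                    .add(pair.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Hypothetical Mapper output.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hadoop", 1), Map.entry("hdfs", 1),
                Map.entry("hadoop", 1), Map.entry("mapreduce", 1));
        List<TreeMap<String, List<Integer>>> partitions = shuffleAndSort(mapOutput, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("Reducer " + i + " receives " + partitions.get(i));
        }
    }
}
```

Hashing the key guarantees that every pair with the same key reaches the same reducer, which is what lets a single reduce call see all values for that key.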

BUMPER

BUMPER

Topic 5

Class 2 – Hadoop Distributed File System

MapReduce – Hands On (Part – 2)

Java MapReduce Programming

MapReduce

The "Hello World" of MapReduce: the Word Count program
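The logic of the Word Count program can be tried without a cluster. The plain-Java sketch below does not use the Hadoop API; the class and method names are hypothetical. It assumes TextInputFormat-style records, where the key is the byte offset of a line and the value is the line itself. The map step splits each line on whitespace with StringTokenizer and emits a <word, 1> pair per word, a TreeMap stands in for the framework's shuffle and sort, and the reduce step sums the 1s for each word. In a real Hadoop job, the map and reduce functions would live in Mapper and Reducer subclasses configured by the Driver.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process sketch of Word Count; not the Hadoop API.
class WordCountSketch {
    // Map: <byte offset, line> -> one <word, 1> pair per word. The offset is unused.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace
        while (tokens.hasMoreTokens()) {
            out.add(Map.entry(tokens.nextToken(), 1));
        }
        return out;
    }

    // Reduce: <word, [1, 1, ...]> -> total count for that word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs record reading, map, sort/group and reduce in one process.
    static TreeMap<String, Integer> run(String text) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>(); // sorted by word
        long offset = 0;
        for (String line : text.split("\n")) {
            for (Map.Entry<String, Integer> pair : map(offset, line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
            // The next record starts after this line's bytes plus the '\n' separator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        String input = "hello hadoop\nhello mapreduce\n"; // hypothetical input text
        // Writer step: one "word<TAB>count" line per key.
        run(input).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Emitting a 1 per occurrence keeps the map step stateless, so any number of Mappers can process input splits independently and the Reducer only has to add up the values grouped under each word.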

Eclipse – Integrated Development Environment (IDE)

https://www.eclipse.org/downloads/

RECAP

Part two of the Java MapReduce program

BUMPER
