Page 1: Hadoop

A Study of Hadoop in Map-Reduce

Poumita Das, Shubharthi Dasgupta, Priyanka Das

Page 2: Hadoop

What is Big Data?

Big data is an evolving term that describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information.

Page 3: Hadoop

The 3 V’s

Big data is commonly characterized by three V’s: Volume, Velocity, and Variety.

Page 4: Hadoop

Why DFS?

Data at this scale cannot be stored or processed on a single machine, so a distributed file system (DFS) spreads it across many commodity machines.

Page 5: Hadoop

An introduction to Map-Reduce

Map-Reduce programs are designed to process large volumes of data in a parallel fashion. There are 3 steps:

• Map

• Shuffle

• Reduce
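These three steps can be illustrated without a cluster. The following is a minimal word-count sketch in plain Python that simulates the Map, Shuffle, and Reduce phases locally; the function names are illustrative, not part of the Hadoop API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: sort and group pairs by key, so each reduce call
    # sees all values emitted for one word
    return groupby(sorted(pairs), key=itemgetter(0))

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(count for _, count in group)
            for word, group in grouped}

docs = ["big data is big", "hadoop handles big data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"])  # 3
```

On a real cluster the map and reduce calls run on many machines at once and the framework performs the shuffle over the network; the data flow, however, is the same.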

Page 6: Hadoop

Map-Reduce continued: Map → Shuffle → Reduce

Page 7: Hadoop

What is Hadoop?

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

Page 8: Hadoop

Hadoop core components

• Namenode

• Datanode

• Client

• User

• Job tracker

• Task tracker

Page 9: Hadoop

Namenode

The NameNode maintains the namespace tree and the mapping of blocks to DataNodes. In a cluster there may exist hundreds or even thousands of DataNodes.

The Secondary NameNode reads the metadata from RAM and writes it to secondary storage. However, it is NOT a substitute for the NameNode.

Page 10: Hadoop

Datanode

On startup, a DataNode connects to the NameNode, spinning until that service comes up. It then responds to requests from the NameNode for filesystem operations.

Client applications can talk directly to a DataNode once the NameNode has provided the location of the data.

Page 11: Hadoop

HDFS client

User applications access the filesystem using the HDFS client. A client mainly performs 3 operations:

• Creating a new file

• File read

• File write
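As a toy model of this division of labor (every class and method name below is invented for illustration; none of this is the real HDFS client API), the read path can be sketched as: the client asks the NameNode where each block lives, then fetches the bytes from a DataNode directly.

```python
# Toy model: the NameNode only maps blocks to DataNodes;
# block data is served by the DataNodes themselves.

class NameNode:
    def __init__(self):
        self.block_locations = {}   # block_id -> list of DataNode names

    def add_block(self, block_id, datanodes):
        self.block_locations[block_id] = datanodes

    def get_locations(self, block_id):
        return self.block_locations[block_id]

class DataNode:
    def __init__(self):
        self.blocks = {}            # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]

def client_read(namenode, datanodes, block_ids):
    # Ask the NameNode for each block's replicas, then read the
    # bytes from the first replica; the NameNode never serves data.
    out = b""
    for bid in block_ids:
        replica = namenode.get_locations(bid)[0]
        out += datanodes[replica].read(bid)
    return out

nn = NameNode()
dns = {"dn1": DataNode(), "dn2": DataNode()}
dns["dn1"].store("blk_1", b"hello ")
dns["dn2"].store("blk_2", b"hdfs")
nn.add_block("blk_1", ["dn1"])
nn.add_block("blk_2", ["dn2"])
print(client_read(nn, dns, ["blk_1", "blk_2"]))  # b'hello hdfs'
```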

Page 12: Hadoop

Creating a new file

Page 13: Hadoop

File read

HDFS implements a single-

writer, multiple-reader model.

That is reading is a parallel

operation in Hadoop

Page 14: Hadoop

File write

An HDFS file consists of blocks. When there is a need for a new block, the NameNode allocates a block with a unique block ID and determines a list of DataNodes to host replicas of the block.
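As a quick illustration, assuming the common HDFS default block size of 128 MB (the slide does not state a block size, so this figure is an assumption), the number of blocks the NameNode must allocate for a file follows from simple ceiling division:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, a common HDFS default

def blocks_needed(file_size_bytes):
    # Each HDFS file is split into fixed-size blocks;
    # the last block may be only partially filled.
    return math.ceil(file_size_bytes / BLOCK_SIZE)

# A 1 GB file is split into 8 blocks of 128 MB
print(blocks_needed(1024 * 1024 * 1024))  # 8
```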

Page 15: Hadoop

Job tracker and task tracker

The JobTracker schedules Map-Reduce jobs and assigns their tasks, while a TaskTracker on each worker node executes individual map and reduce tasks and reports progress back.

Page 16: Hadoop

Hadoop ecosystem

• Pig

• Hive

• Mahout

Page 17: Hadoop

A Sample Program
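The code from this slide is not preserved in the transcript. As a stand-in, here is a sketch of what an anagram-grouping program (the theme of the next slide) could look like in plain Python, simulating the Map and Reduce phases locally rather than using the real Hadoop API; all names are illustrative:

```python
from collections import defaultdict

def mapper(word):
    # Map: key every word by its letters in sorted order;
    # anagrams share the same sorted-letter signature.
    return ("".join(sorted(word)), word)

def anagram_groups(words):
    # Shuffle: collect words under their signature.
    groups = defaultdict(set)
    for key, word in map(mapper, words):
        groups[key].add(word)
    # Reduce: keep only signatures shared by more than one word.
    return [sorted(g) for g in groups.values() if len(g) > 1]

words = ["listen", "silent", "enlist", "hadoop", "google"]
print(anagram_groups(words))  # [['enlist', 'listen', 'silent']]
```

In a real Hadoop job the mapper would emit (signature, word) pairs, the framework's shuffle would group them by signature, and the reducer would output each group of anagrams.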

Page 18: Hadoop

The Output

Page 19: Hadoop

Why Anagrams?

• Started out as a simple relaxation game: finding anagrams in sentences

• Games and puzzles like Scrabble

• Ciphers, such as permutation and transposition ciphers

Page 20: Hadoop

Future scope

Keeping in mind the vast applications of Hadoop, we have certain graph-searching techniques in mind that would be much easier to solve with the help of the Map-Reduce engine.

Page 21: Hadoop

References

• Introduction to Hadoop: Welcome to Apache, https://hadoop.apache.org/

• Cloudera Documentation: Usage, http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_usage.html

• Edureka: Anatomy of a Map-Reduce Job, http://www.edureka.co/blog/anatomy-of-a-mapreduce-job-in-apache-hadoop/

• Stackoverflow: Explain Map-Reduce Simply, http://stackoverflow.com/questions/28982/please-explain-mapreduce-simply

Page 22: Hadoop

Thank you