SEMINAR ON Android App Development. Trained by- Hewlett-Packard Education Services, Mumbai. Presented to- Mr. R.K. Banyal, Mr. Hukum Chand Saini. By- Urvashi Kataria

Hadoop MapReduce

Apr 15, 2017

Urvashi Kataria
Page 1: Hadoop MapReduce

SEMINAR ON
Android App Development

Trained by- Hewlett-Packard Education Services, Mumbai

Presented to- Mr. R.K. Banyal, Mr. Hukum Chand Saini
By- Urvashi Kataria

Page 2: Hadoop MapReduce

About HPES:

• American global IT company headquartered in Palo Alto, California, US.
• Provider of products, software, technologies, solutions and services to individuals as well as small and medium-sized businesses.
• Major operations include HP Software, HP Financial Services and Corporate Investments.
• Provides practical training in fields such as Big Data, Android app development and embedded systems.

Page 3: Hadoop MapReduce

About Birthday Bash:

An Android application that lets you enjoy your own birthday as well as those of your dear ones.

Save the days, get reminded of them, capture moments on the day itself, get greeted by the app, and celebrate!

Page 4: Hadoop MapReduce

The home screen:

Page 5: Hadoop MapReduce

Calculating age and further:

Page 6: Hadoop MapReduce

Saving name for specified date:

Page 7: Hadoop MapReduce

Happy Birthday!

Page 8: Hadoop MapReduce

Presentation on:

Hadoop MapReduce

(Map + Reduce)

Page 9: Hadoop MapReduce

Why MapReduce?

• Large-scale data processing was difficult!
  o Managing hundreds or thousands of processors
  o Managing parallelization and distribution
  o Reliable execution with easy data access

MapReduce provides all of these, easily!

Page 10: Hadoop MapReduce

What is Hadoop MapReduce?

Page 11: Hadoop MapReduce

Hadoop Cluster: HDFS (Physical) Storage

Page 12: Hadoop MapReduce

MapReduce Objects

Page 13: Hadoop MapReduce

How Map and Reduce Work Together

Page 14: Hadoop MapReduce

Hadoop MapReduce: A Closer Look

[Diagram: the same pipeline runs on Node 1 and Node 2]

• Files loaded from local HDFS store
• InputFormat divides the files into Splits
• RecordReaders (RR) turn each Split into input (K, V) pairs
• Map produces intermediate (K, V) pairs
• Partitioner
• Shuffling process: intermediate (K, V) pairs exchanged by all nodes
• Sort
• Reduce produces final (K, V) pairs
• OutputFormat
• Writeback to local HDFS store
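The stages in the diagram can be imitated in a few lines of plain Python. This is only a single-process sketch, not Hadoop's implementation: the names `record_reader`, `partition` and `run_job` are hypothetical, each "split" is simply a list of lines, and the example job (counting words by length) is chosen purely for illustration.

```python
from collections import defaultdict
from zlib import crc32


def record_reader(split):
    """Turn one input split (a list of lines) into (offset, line) pairs."""
    offset = 0
    for line in split:
        yield offset, line
        offset += len(line) + 1  # +1 stands in for the line separator


def partition(key, num_reducers):
    """Choose a reducer for a key; equal keys always get the same reducer."""
    return crc32(repr(key).encode("utf-8")) % num_reducers


def run_job(splits, map_fn, reduce_fn, num_reducers=2):
    # Map phase: every split is processed independently of the others.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for offset, line in record_reader(split):
            for key, value in map_fn(offset, line):
                # Shuffle: route each intermediate pair to its reducer.
                partitions[partition(key, num_reducers)][key].append(value)

    # Sort and reduce phase: each reducer visits its keys in sorted order.
    output = []
    for part in partitions:
        for key in sorted(part):
            output.extend(reduce_fn(key, part[key]))
    return output


def length_mapper(_offset, line):
    """Illustrative mapper: emit (word length, 1) for every word."""
    for word in line.split():
        yield len(word), 1


def sum_reducer(key, values):
    """Illustrative reducer: add up the counts for one key."""
    yield key, sum(values)


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["the dog barks"]]
    for key, value in run_job(splits, length_mapper, sum_reducer):
        print(key, value)
```

The partitioner is the piece that makes the shuffle work: because it depends only on the key, all values for one key end up at the same reducer no matter which node produced them.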

Page 15: Hadoop MapReduce

Algorithm

map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)

reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)

The same algorithm with named arguments:

map(key=url, val=contents):
    for each word w in contents:
        emit(w, "1")

reduce(key=word, values=uniq_counts):
    // Sum all "1"s in values list
    emit result "(word, sum)"
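The pseudocode translates almost line for line into runnable Python. The sketch below is an in-memory illustration rather than Hadoop code: `map_fn` and `reduce_fn` mirror the slide, while the `word_count` driver that groups values by key is a hypothetical stand-in for the work the framework does between the two phases.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name; value: text of the document
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    # key: a word; values: an iterable of counts
    result = 0
    for v in values:
        result += v
    yield key, result


def word_count(documents):
    """Run map, group by key, then reduce, all in one process."""
    grouped = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            grouped[word].append(count)

    counts = {}
    for word in sorted(grouped):
        for key, total in reduce_fn(word, grouped[word]):
            counts[key] = total
    return counts


if __name__ == "__main__":
    docs = {"a.txt": "to be or not to be", "b.txt": "to do"}
    print(word_count(docs))
```

Note that the programmer writes only `map_fn` and `reduce_fn`; everything inside `word_count` is what a MapReduce framework would normally provide.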

Page 16: Hadoop MapReduce

The very famous: Word Count Example

Page 17: Hadoop MapReduce

Ways to MapReduce

Libraries Languages

Note: Java is most common, but other languages can be used
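As one illustration of using another language, Hadoop Streaming lets the mapper and reducer be ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. The sketch below shows the two stages as Python functions over lines of text; the function names are hypothetical, and the wiring into an actual streaming job is not shown. It assumes, as in Streaming, that the reducer receives its input already sorted by key.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input lines must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local stand-in for the framework: map, sort by key, then reduce.
    text = ["to be or not", "to be"]
    for out in reducer(sorted(mapper(text))):
        print(out)
```

The `sorted(...)` call plays the role of the framework's shuffle and sort, which is why `groupby` can assume that equal keys arrive next to each other.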

Page 18: Hadoop MapReduce

Common Data Sources for MapReduce Jobs

Page 19: Hadoop MapReduce

Service Providers

• Open Source
  o Apache

• Commercial
  o Cloudera
  o Hortonworks
  o MapR
  o AWS MapReduce
  o Microsoft HDInsight (Beta)

Page 20: Hadoop MapReduce

Advancements: MRv1 & MRv2

MRv2 (MapReduce version 2)

• Splits the existing JobTracker's roles:
  o Resource management
  o Job lifecycle management

• MapReduce 2.0 provides many benefits over the existing MapReduce framework:
  o Better scalability, through distributed job lifecycle management
  o Support for multiple Hadoop MapReduce API versions in a single cluster

Page 21: Hadoop MapReduce

Better MapReduce - Optimizations

Page 22: Hadoop MapReduce

Advantages of MapReduce

• Distributed data and computation.
• Tasks are independent. Entire nodes can fail and restart.
• Linear scaling in the ideal case. It is designed for cheap, commodity hardware.
• Simple programming model. The end-user programmer only writes the map and reduce tasks.

Page 23: Hadoop MapReduce

Disadvantages / Cases where MR isn't a suitable choice:

• Real-time processing.
• It is not always easy to express every problem as a MapReduce program.
• When your intermediate processes need to talk to each other.
• When your processing requires a lot of data to be shuffled over the network.
• When you need to handle streaming data. MR is best suited to batch processing huge amounts of data that you already have.

Page 24: Hadoop MapReduce

Limitations of MapReduce

Page 25: Hadoop MapReduce

RDBMS vs. Hadoop

                    | Traditional RDBMS        | Hadoop / MapReduce
--------------------+--------------------------+---------------------------------------
Data Size           | Gigabytes (Terabytes)    | Petabytes (Exabytes)
Access              | Interactive and Batch    | Batch (NOT Interactive)
Updates             | Read / Write many times  | Write once, Read many times
Structure           | Static Schema            | Dynamic Schema
Integrity           | High (ACID)              | Low
Scaling             | Nonlinear                | Linear
Query Response Time | Can be near immediate    | Has latency (due to batch processing)


Page 27: Hadoop MapReduce

Thank You!!