Running MapReduce in Non-Traditional Environments

Abhishek Chandra
Associate Professor
Department of Computer Science and Engineering
University of Minnesota
http://www.cs.umn.edu/~chandra

Dec 29, 2015
Talk Outline
- Big Data and MapReduce
- MapReduce Background
- MapReduce in Non-Traditional Environments
- Concluding Remarks
Big Data
- Data-rich enterprises and communities
  - Both user-facing services and batch data processing
  - Commercial, social, scientific: e.g., Google, Facebook, Yahoo!, LHC, ...
- Data analysis is key!
- Need massive scalability and parallelism
  - PBs of data, millions of files, 1000s of nodes, millions of users
- Need to do this cost-effectively and reliably
  - Use commodity hardware, where failure is the norm
  - Share resources among multiple projects
Big Data and MapReduce
- Simple data-parallel programming model and framework
  - Designed for scalability and fault tolerance
  - Can express several data analysis algorithms
- Widely used
  - Pioneered by Google: processes several petabytes of data per day
  - Popularized by the open-source Hadoop project: used at Yahoo!, Facebook, Amazon, ...
MapReduce Design Goals
- Scalability: 1000s of machines, 10,000s of disks, TBs-PBs of data
- Cost-efficiency
  - Hardware: commodity machines and network
  - Administration: automatic fault tolerance, easy setup
  - Programming: easy to use and to write applications

Image source: http://www.ibm.com
MapReduce Applications (Industry)
- Google: index construction for Google Search, article clustering for Google News
- Yahoo!: "Web map" powering Yahoo! Search, spam detection for Yahoo! Mail
- Facebook: ad optimization, spam detection
- ...
MapReduce Applications (Research)
Wide interest in academia/research:
- High Energy Physics (Indiana)
- Astronomical image analysis (Washington)
- Bioinformatics (Maryland)
- Analyzing Wikipedia conflicts (PARC)
- Natural language processing (CMU)
- Particle physics (Nebraska)
- Ocean climate simulation (Washington)
- ...
Talk Outline
- Big Data and MapReduce
- MapReduce Background
- MapReduce in Non-Traditional Environments
- Concluding Remarks
MapReduce Programming Model
- Data: a sequence of key-value records
- Map function: converts input key-value pairs to intermediate key-value pairs
  (K_in, V_in) -> list(K_inter, V_inter)
- Reduce function: converts intermediate key-value pairs to output key-value pairs
  (K_inter, list(V_inter)) -> list(K_out, V_out)
Example: Word Count

def mapper(file, text):
    for word in text.split():
        output(word, 1)

def reducer(word, counts):
    output(word, sum(counts))
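For concreteness, the same pair of functions can be exercised with a small in-memory driver that mimics the map, shuffle & sort, and reduce stages on the input shown on the next slide. This is only an illustrative sketch: yield stands in for the framework's output() call, and the grouping dictionary stands in for the real shuffle.

from collections import defaultdict

def mapper(file, text):
    for word in text.split():
        yield word, 1

def reducer(word, counts):
    yield word, sum(counts)

def run_wordcount(inputs):
    intermediate = defaultdict(list)
    for name, text in inputs:                 # map phase
        for key, value in mapper(name, text):
            intermediate[key].append(value)   # shuffle & sort: group values by key
    output = {}
    for key in sorted(intermediate):          # reduce phase
        for k, v in reducer(key, intermediate[key]):
            output[k] = v
    return output

print(run_wordcount([("split-0", "the quick brown fox"),
                     ("split-1", "the fox ate the mouse"),
                     ("split-2", "how now brown cow")]))
# {'ate': 1, 'brown': 2, 'cow': 1, 'fox': 2, 'how': 1, 'mouse': 1, 'now': 1, 'quick': 1, 'the': 3}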
Word Count Example (Input -> Map -> Shuffle & Sort -> Reduce -> Output)
- Input splits: "the quick brown fox", "the fox ate the mouse", "how now brown cow"
- Each mapper emits (word, 1) pairs; the shuffle & sort groups the pairs for each word across mappers
- Reducers sum the grouped counts, producing: ate 1, brown 2, cow 1, fox 2, how 1, mouse 1, now 1, quick 1, the 3
MapReduce Stages
- Push: input is split into large chunks and placed on the local disks of cluster nodes
- Map: chunks are served to "mapper" tasks
  - Prefer a mapper that has the data locally
  - Mappers save outputs to local disk before serving them to reducers
- Reduce: "reducers" execute reduce tasks once the map phase is complete
Partitioning/Shuffling
- Goal: divide the intermediate key space across reducers
  - k reduce tasks => k partitions (simple hash function)
  - E.g.: k=3, keys {1,...,6} => partitions {1,2}, {3,4}, {5,6}
- Shuffle: send intermediate key-value pairs to the relevant reducers
  - All-to-all communication, since all mappers typically have all intermediate keys
- Combine: local aggregation function for repeated keys produced by the same map
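A partitioner and a combiner can be sketched in a few lines (illustrative only; in Hadoop the default HashPartitioner computes (hashCode & Integer.MAX_VALUE) % numReduceTasks, and combiners are user-supplied classes):

from collections import Counter

def partition(key, k):
    # Assign an intermediate key to one of k reducers via a simple hash.
    # (Python's hash() is randomized per process, so indices differ across runs.)
    return hash(key) % k

def combine(map_output):
    # Local aggregation of repeated keys emitted by one map task,
    # so less data has to cross the network during the shuffle.
    sums = Counter()
    for word, count in map_output:
        sums[word] += count
    return sums.items()

print(dict(combine([("the", 1), ("fox", 1), ("the", 1)])))  # {'the': 2, 'fox': 1}
print(partition("the", 3))                                   # a reducer index in {0, 1, 2}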
Fault Tolerance
- Task re-execution: retry task(s) on another node
  - Triggered on task or node failure
  - OK for a map task because it has no dependencies
  - OK for a reduce task because map outputs are on disk
- Speculative execution: launch a copy of a task on another node
  - Handles stragglers (slow tasks)
  - Use the result from whichever copy finishes first
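The idea behind speculative execution can be illustrated outside Hadoop with a tiny sketch: run two copies of a straggler-prone task and take whichever result arrives first (the task function and timings below are made up for illustration):

import concurrent.futures
import random
import time

def straggler_prone_task(x):
    # Simulate a task that is occasionally very slow (a "straggler").
    time.sleep(random.choice([0.1, 2.0]))
    return x * x

def run_speculatively(fn, arg, copies=2):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=copies)
    futures = [pool.submit(fn, arg) for _ in range(copies)]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    pool.shutdown(wait=False)          # abandon (do not wait for) the slower copy
    return next(iter(done)).result()

print(run_speculatively(straggler_prone_task, 7))   # 49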
Hadoop
- Open-source Apache project
- Software framework for distributed data processing
  - Primary project: MapReduce implementation
  - Other projects built on top of MapReduce
- Implemented in Java
- Primary data analysis platform at Yahoo!
  - 40,000+ machines running Hadoop
Hadoop: Primary Components
- HDFS: distributed file system
  - Combines the cluster's local storage into a single namespace
  - All data is replicated to multiple machines
  - Provides locality information to clients
- MapReduce: batch computation framework
  - Tasks are re-executed on failure
  - Optimizes for data locality of input
Talk Outline
- Big Data and MapReduce
- MapReduce Background
- MapReduce in Non-Traditional Environments
- Concluding Remarks
Traditional MapReduce Environments
Assumptions:
- Tightly-coupled clusters
- Dedicated compute nodes
- Data is centrally available / pre-placed

Image source: http://www.ibm.com
But... Data May Be Distributed
- Data originates in a geographically distributed manner
  - Scientific instruments, sensors: e.g., oceanic, atmospheric data
  - Public/social data: e.g., user blogs, traffic data
  - Commercial data: e.g., warehouse, e-commerce data
  - Monitoring data: e.g., CDN user access logs
  - Mobile data: e.g., phone pics, sensors
- May want to combine multiple data sources
  - E.g.: CDC + Google Maps
Computation May Be Distributed
- Distributed data centers/clouds: e.g., Amazon EC2 regions, Akamai CDN servers
- Computational grids: e.g., FutureGrid
- Volunteer computing platforms: e.g., BOINC
Highly-Distributed Environments
Question: How to execute MapReduce in such non-traditional environments?
Research Overview
- Step 1: Understanding tradeoffs
  - Compare different deployment architectures for MapReduce execution
- Step 2: Optimizing MapReduce execution
  - Data placement/task scheduling based on system and application characteristics
Step 1: Understanding Tradeoffs
Goal: Understand which deployment architectures would work best
[Diagram: Input Data -> Data Push -> Map -> Reduce -> Output Data]
Architecture 1: Local MapReduce
[Diagram: both data sources push their data to a single data center (fast push from the US source, slow push from the EU source); one MapReduce job in the US data center produces the final result.]
Architecture 2: Global MapReduce
[Diagram: each data source pushes data to nodes in both data centers (fast push to the local one, slow push to the remote one); a single MapReduce job spanning both data centers produces the final result.]
Architecture 3: Distributed MapReduce
[Diagram: each data source pushes its data only to its local data center (fast pushes); separate MapReduce jobs run in the US and EU data centers, and their results are combined into the final result.]
Experimental Results: PlanetLab
[Charts: per-phase execution time in seconds (Push US, Push EU, Map, Reduce, Result Combine, Total) for Local MR, Global MR, and Distributed MR, on WordCount (Random) and WordCount (Text).]
- WordCount (Random): Result Combine cost is dominant
- WordCount (Text): Data Push cost is dominant
- Performance depends on network and application characteristics
- Setup: PlanetLab, 4/1 US and 4/1 EU compute/data nodes, Hadoop 0.20.1
Experimental Results: Amazon EC2
[Charts: per-phase execution time in seconds (Push US, Push EU, Map, Reduce, Result Combine, Total) for Local MR and Distributed MR, on WordCount (Random) and WordCount (Text).]
- Performance depends on network and application characteristics
- Setup: Amazon EC2, 6 US and 3 EU small instances, 1 data node each
Lessons Learnt
- Make MapReduce topology-aware
  - Data placement and task scheduling should consider network locality
- Application-specific data aggregation is critical
  - High aggregation => avoid the initial data push cost
  - Low aggregation => avoid the shuffle cost
- Make globally optimal decisions
  - "Good" local decisions can adversely impact end-to-end performance
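The aggregation tradeoff can be made concrete with a back-of-the-envelope comparison (toy numbers and a simplified model, not the talk's analysis): how many bytes must cross the slow wide-area link under a Local plan (push the raw remote input before processing) versus a Distributed plan (process remotely and move only the results)?

# Toy comparison of wide-area traffic for Local vs Distributed MapReduce.
# d_remote: input bytes at the remote source; alpha: output/input size ratio.
def wan_bytes_local(d_remote):
    return d_remote                  # raw input crosses the slow link before processing

def wan_bytes_distributed(d_remote, alpha):
    return alpha * d_remote          # only the (aggregated or expanded) results cross it

for alpha in (0.1, 10.0):
    local = wan_bytes_local(1e9)
    dist = wan_bytes_distributed(1e9, alpha)
    better = "Distributed" if dist < local else "Local"
    print(f"alpha={alpha}: Local moves {local:.0e} B, Distributed moves {dist:.0e} B -> {better} MR wins")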
Step 2: Optimizing MapReduce Execution
- Framework for modeling MapReduce execution
- Optimizer to determine an optimal execution plan (data placement and task scheduling)
  - Topology-aware: uses information about network and node characteristics
  - Application-aware: uses data aggregation characteristics
  - Global optimization: performs end-to-end, multi-phase optimization
- Implemented in Hadoop 1.0.1
MapReduce Execution Model
Parameters:
- D_i: size of the data supplied at data source i
- B_ij: link bandwidth from node i to node j
- C_i: mapper/reducer compute rate of node i
- α: ratio of intermediate data size to input data size
Execution plan:
- For each source: where to push its data
- For all mappers: where to shuffle their data
MapReduce Execution Model: Constraints
- x_ij: fraction of node i's data pushed (shuffled) to node j
- Each data source (mapper) must push (shuffle) all of its data
- One reducer per key: y_k denotes the fraction reduced at reducer k
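The constraint equations themselves did not survive in the slide text; from the definitions above they presumably take a form like the following (a reconstruction, not copied from the talk):

\sum_j x_{ij} = 1,  x_{ij} \ge 0   for every data source / mapper i
\sum_k y_k = 1,     y_k \ge 0      (the key space is fully partitioned across reducers)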
MapReduce Execution Optimization
- Objective: minimize the makespan subject to the model constraints
- Use the model parameters to compute execution time:
  - Push/shuffle time based on link bandwidths and the amount of data communicated over each link
  - Map/reduce time based on compute rates and the amount of data computed at each node
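As a rough illustration of how such phase times follow from the parameters (a sketch with made-up numbers and a simplified model, not the talk's actual optimizer):

# Evaluate push and map times for a candidate plan x[i][j]:
# x[i][j] = fraction of source i's data D[i] pushed to mapper node j.
def push_time(D, B, x):
    # Each link (i, j) carries x[i][j] * D[i] bytes at bandwidth B[i][j];
    # the push phase ends when the slowest transfer finishes.
    return max(x[i][j] * D[i] / B[i][j]
               for i in range(len(D)) for j in range(len(B[i])) if x[i][j] > 0)

def map_time(D, C, x):
    # Node j maps all the data pushed to it at compute rate C[j].
    data_at = [sum(x[i][j] * D[i] for i in range(len(D))) for j in range(len(C))]
    return max(d / C[j] for j, d in enumerate(data_at))

D = [100.0, 200.0]                 # input sizes at two sources
B = [[10.0, 1.0], [1.0, 10.0]]     # bandwidths: fast to the local node, slow across
C = [5.0, 5.0]                     # per-node compute rates
x = [[1.0, 0.0], [0.0, 1.0]]       # plan: push everything to the local node
print(push_time(D, B, x), map_time(D, C, x))   # 20.0 40.0

The shuffle and reduce times follow the same pattern with the intermediate data scaled by α; the optimizer then chooses x (and the reducer fractions y_k) to minimize the end-to-end makespan.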
Benefit of Optimization
[Charts: makespan in seconds, broken into Push, Map, Shuffle, and Reduce, for Uniform, Myopic, and end-to-end multi-phase (Optimized) execution plans, under α=0.1 (data aggregation) and α=10 (data expansion).]
- Setup: PlanetLab measurements; 4 US, 2 Europe, 2 Asia nodes; 1 data source each
- Model-driven optimization achieves the minimum makespan under different scenarios
Comparison to Hadoop
[Charts: makespan in seconds, broken into Push, Map, and Reduce, for Uniform, Hadoop, and Optimized execution plans, on Word Count, Sessionization, and Full Inverted Index workloads.]
- Setup: emulated PlanetLab, Hadoop 1.0.1 (modified to support model-based execution plans)
Concluding Remarks
- MapReduce: large-scale distributed data processing
  - Scalable: large numbers of machines and large volumes of data
  - Cheap: lower hardware, programming, and administration costs
  - Well-suited for several data analysis applications
- Rich area for research
  - Resource management, algorithms, programming models
  - Our focus: optimization in highly-distributed environments
- Acknowledgments: students, esp. Ben Heintz; Jon Weissman (UMN), Ramesh Sitaraman (UMass)