Map-Reduce and Its Children

Distributed File Systems
Map-Reduce and Hadoop
Dataflow Systems
Extensions for Recursion

Distributed File Systems

Chunking
Replication

Distribution on Racks

Distributed File Systems

Files are very large, read/append. They are divided into chunks, typically 64MB per chunk.
Chunks are replicated at several compute nodes.
A master (possibly replicated) keeps track of all locations of all chunks.

Compute Nodes

Organized into racks. Intra-rack connection typically gigabit speed. Inter-rack connection faster by a small factor.

Racks of Compute Nodes

[Figure: racks of compute nodes storing the chunks of a file.]

3-way replication of files, with copies on different racks.

Implementations

GFS (Google File System – proprietary).

HDFS (Hadoop Distributed File System – open source).

CloudStore (Kosmix File System, open source).

Above the DFS

Map-Reduce
Key-Value Stores

SQL Implementations

The New Stack

Distributed File System

Map-Reduce, e.g., Hadoop

Object Store (key-value store), e.g., BigTable, Hbase, Cassandra

SQL Implementations, e.g., PIG (relational algebra), HIVE

Map-Reduce Systems

Map-reduce (Google) and its open-source (Apache) equivalent, Hadoop.

Important specialized parallel computing tool.

Cope with compute-node failures. Avoid restart of the entire job.

Key-Value Stores

BigTable (Google), Hbase, Cassandra (Apache), Dynamo (Amazon).
Each row is a key plus values over a flexible set of columns.
Each column component can be a set of values.

SQL-Like Systems

PIG – Yahoo! implementation of relational algebra. Translates to a sequence of map-reduce operations, using Hadoop.
Hive – open-source (Apache) implementation of a restricted SQL, called QL, over Hadoop.

SQL-Like Systems – (2)

Sawzall – Google implementation of parallel select + aggregation.

Scope – Microsoft implementation of restricted SQL.

Map-Reduce

Formal Definition
Fault-Tolerance
Example: Join

Map-Reduce

You write two functions, Map and Reduce; each has a special form to be explained.
The system (e.g., Hadoop) creates a large number of tasks for each function.
Work is divided among the tasks in a precise way.

Map-Reduce Pattern

[Figure: the Map-Reduce pattern. Map tasks read input from the DFS and emit "key"-value pairs, which are routed to Reduce tasks; the Reduce tasks write output to the DFS.]

Map-Reduce Algorithms

Map tasks convert inputs to key-value pairs. “keys” are not necessarily unique.

Outputs of Map tasks are sorted by key, and each key is assigned to one Reduce task.

Reduce tasks combine values associated with a key.
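To make this concrete, here is a minimal single-process Python sketch of the pattern (no DFS, no parallelism; run_mapreduce, wc_map, and wc_reduce are illustrative names, not Hadoop's API): Map turns each input into key-value pairs, the pairs are grouped by key, and Reduce combines the values for each key.

    from collections import defaultdict

    def run_mapreduce(inputs, map_fn, reduce_fn):
        # Map phase: each input element becomes zero or more (key, value) pairs.
        pairs = []
        for item in inputs:
            pairs.extend(map_fn(item))
        # Shuffle: group values by key (keys need not be unique in Map output).
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        # Reduce phase: combine the values associated with each key.
        return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}

    # Word count, the classic illustration.
    def wc_map(line):
        return [(word, 1) for word in line.split()]

    def wc_reduce(word, counts):
        return sum(counts)

    print(run_mapreduce(["a b a", "b c"], wc_map, wc_reduce))
    # {'a': 2, 'b': 2, 'c': 1}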

Coping With Failures

Map-reduce is designed to deal with compute-nodes failing to execute a task.

Re-executes failed tasks, not whole jobs. Failure modes:

1. Compute-node failure (e.g., disk crash).
2. Rack communication failure.
3. Software failures, e.g., a task requires Java n; the node has Java n-1.

Things Map-Reduce is Good At

1. Matrix-matrix and matrix-vector multiplication.

One step of the PageRank iteration was the original application.

2. Relational algebra operations. We’ll do an example of the join.

3. Many other “embarrassingly parallel” operations.

Joining by Map-Reduce

Suppose we want to compute R(A,B) JOIN S(B,C), using k Reduce tasks; i.e., find tuples with matching B-values.
R and S are each stored in a chunked file.

Joining by Map-Reduce – (2)

Use a hash function h from B-values to k buckets. Bucket = Reduce task.
The Map tasks take chunks from R and S, and send:
Tuple R(a,b) to Reduce task h(b).
• Key = b; value = R(a,b).
Tuple S(b,c) to Reduce task h(b).
• Key = b; value = S(b,c).

Joining by Map-Reduce – (3)

[Figure: Map tasks send R(a,b) and S(b,c) to Reduce task i whenever h(b) = i; Reduce task i outputs all (a,b,c) such that h(b) = i, (a,b) is in R, and (b,c) is in S.]

Joining by Map-Reduce – (4)

Key point: If R(a,b) joins with S(b,c), then both tuples are sent to Reduce task h(b).

Thus, their join (a,b,c) will be produced there and shipped to the output file.
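A single-process Python sketch of this reduce-side join, with the shuffle simulated in memory; the relation tags "R" and "S", the hash h, and the count K of Reduce tasks are assumptions of the example, not part of Hadoop.

    from collections import defaultdict

    K = 4                               # number of Reduce tasks (illustrative)
    def h(b):
        return hash(b) % K              # hash function from B-values to buckets

    def map_R(a, b):
        return (b, ("R", a))            # key = b; value = R(a,b), tagged by relation

    def map_S(b, c):
        return (b, ("S", c))            # key = b; value = S(b,c), tagged by relation

    def reduce_join(b, values):
        # Pair every R-tuple with every S-tuple that shares this B-value.
        r_side = [a for tag, a in values if tag == "R"]
        s_side = [c for tag, c in values if tag == "S"]
        return [(a, b, c) for a in r_side for c in s_side]

    # Simulated shuffle: each key-value pair goes to Reduce task h(key).
    R = [(1, "x"), (2, "y")]
    S = [("x", 10), ("x", 20)]
    buckets = defaultdict(lambda: defaultdict(list))
    for key, val in [map_R(a, b) for a, b in R] + [map_S(b, c) for b, c in S]:
        buckets[h(key)][key].append(val)
    for i in sorted(buckets):
        for b, vals in buckets[i].items():
            print(i, reduce_join(b, vals))   # emits (1, 'x', 10) and (1, 'x', 20)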

Dataflow Systems

Arbitrary Acyclic Flow Among Tasks
Preserving Fault Tolerance

The Blocking Property

Generalization of Map-Reduce

Map-reduce uses only two functions (Map and Reduce). Each is implemented by a rank of tasks.
Data flows from Map tasks to Reduce tasks only.

Generalization – (2)

Natural generalization is to allow any number of functions, connected in an acyclic network.

Each function implemented by tasks that feed tasks of successor function(s).

Key fault-tolerance (blocking) property: tasks produce all their output at the end.

Many Implementations

1. Clustera – University of Wisconsin.
2. Hyracks – Univ. of California/Irvine.
3. Dryad/DryadLINQ – Microsoft.
4. Nephele/PACT – T. U. Berlin.
5. BOOM – Berkeley.
6. epiC – N. U. Singapore.

Example: Join + Aggregation

Relations D(emp, dept) and S(emp,sal).

Compute the sum of the salaries for each department.

D JOIN S is computed by map-reduce.
But each Reduce task can also group its emp-dept-sal tuples by dept and sum the salaries.

Example: Continued

A third function is needed to take the dept-SUM(sal) pairs from each Reduce task, organize them by dept, and compute the final sum for each department.

3-Layer Dataflow

[Figure: three-layer dataflow. Map tasks read D and S; their output is hashed by emp to the Join + Group tasks; those tasks' output is hashed by dept to the Final Group + Aggregate tasks.]
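A single-process Python sketch of the second and third layers (the hashing of tuples to particular tasks is omitted, and the function names are illustrative): a Join + Group task joins its D and S tuples on emp and pre-aggregates salaries by dept; a Final Group + Aggregate task adds up the partial sums it receives.

    from collections import defaultdict

    # Second layer: join D(emp,dept) with S(emp,sal), then pre-aggregate by dept.
    def join_and_group(d_tuples, s_tuples):
        sal = dict(s_tuples)                    # emp -> sal
        partial = defaultdict(int)
        for emp, dept in d_tuples:
            if emp in sal:
                partial[dept] += sal[emp]
        return partial                          # dept -> partial SUM(sal)

    # Third layer: combine the partial sums for the departments sent to this task.
    def final_aggregate(partials):
        total = defaultdict(int)
        for partial in partials:
            for dept, s in partial.items():
                total[dept] += s
        return dict(total)

    p1 = join_and_group([("ann", "toys")], [("ann", 10)])
    p2 = join_and_group([("bob", "toys")], [("bob", 20)])
    print(final_aggregate([p1, p2]))            # {'toys': 30}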

Recursion

Transitive-Closure Example
Fault-Tolerance Problem
Endgame Problem
Some Systems and Approaches
Recent research ideas contributed by F. Afrati, V. Borkar, M. Carey, N. Polyzotis

Applications Requiring Recursion

1. PageRank, the original map-reduce application, is really a recursion implemented by many rounds of map-reduce.
2. Analysis of Web structure.
3. Analysis of social networks.
4. PDE's.

Transitive Closure

Many recursive applications involving large data are similar to transitive closure:

Nonlinear:
Path(X,Y) :- Arc(X,Y)
Path(X,Y) :- Path(X,Z) & Path(Z,Y)

(Right) Linear:
Path(X,Y) :- Arc(X,Y)
Path(X,Y) :- Arc(X,Z) & Path(Z,Y)

Implementing TC on a Cluster

Use k tasks. Hash function h sends each node of the graph to one of the k tasks.
Task i receives and stores Path(a,b) if either h(a) = i or h(b) = i, or both.
Task i must join Path(a,c) with Path(c,b) if h(c) = i.

TC on a Cluster – Basis

Data is stored as relation Arc(a,b).
Map tasks read chunks of the Arc relation and send each tuple Arc(a,b) to recursive tasks h(a) and h(b), treated as if it were the tuple Path(a,b).
If h(a) = h(b), only one task receives it.

TC on a Cluster – Recursive Tasks

[Figure: recursive task i. When Path(a,b) is received: store Path(a,b) if new, otherwise ignore it; look up Path(b,c) and/or Path(d,a) for any c and d; send Path(a,c) to tasks h(a) and h(c), and send Path(d,b) to tasks h(d) and h(b).]
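A Python sketch of one recursive task, under the assumption that some runtime delivers each sent Path fact to the named tasks; make_task, send, and the local set store are illustrative names, not taken from any particular system.

    def make_task(i, h, send):
        # Recursive task i.  h maps graph nodes to task numbers; send(fact, tasks)
        # is assumed to deliver a Path fact to the given tasks (possibly this one).
        store = set()                       # the Path facts this task stores

        def receive(a, b):                  # a Path(a,b) fact arrives
            if (a, b) in store:
                return                      # not new: ignore it
            store.add((a, b))
            if h(b) == i:                   # join with stored Path(b,c)
                for (x, c) in list(store):
                    if x == b:
                        send((a, c), {h(a), h(c)})
            if h(a) == i:                   # join with stored Path(d,a)
                for (d, y) in list(store):
                    if y == a:
                        send((d, b), {h(d), h(b)})

        return receive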

Big Problem: Managing Failure

Map-reduce depends on the blocking property.

Only then can you restart a failed task without restarting the whole job.

But any recursive task has to deliver some output and later get more input.

HaLoop (U. Washington)

Iterates Hadoop, once for each round of the recursion. Like iterative PageRank.
Similar idea: Twister (U. Indiana).
The clever piece is the way HaLoop tries to run each task in round i at a compute node where it can find its needed output from round i – 1.

Pregel (Google)

Views all computation as a recursion on some graph.

Nodes send messages to one another. Messages bunched into supersteps.

Checkpoint all compute nodes after some fixed number of supersteps.

On failure, rolls all tasks back to previous checkpoint.

Example: Shortest Paths Via Pregel

[Figure: Node N, holding a table of shortest paths to N, receives the message "I found a path from node M to you of length L." It asks, "Is this the shortest path from M I know about?" If so, it sends over its out-edges of lengths 5, 3, and 6 the messages "I found a path from node M to you of length L+5 / L+3 / L+6."]
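A single-process Python sketch of shortest paths in the Pregel style, with supersteps simulated by a loop and checkpointing omitted; pregel_sssp and the graph encoding are assumptions of the example, not Pregel's actual API.

    import math

    def pregel_sssp(graph, source):
        # graph: {node: [(neighbor, edge_length), ...]}
        dist = {v: math.inf for v in graph}      # each node's best known distance
        messages = {source: [0]}                 # superstep 0: the source hears "0"
        while messages:                          # run until no node is active
            next_messages = {}
            for node, incoming in messages.items():
                best = min(incoming)
                if best < dist[node]:            # shorter than any path known so far?
                    dist[node] = best
                    # Tell each neighbor about the path through this node.
                    for nbr, length in graph[node]:
                        next_messages.setdefault(nbr, []).append(best + length)
            messages = next_messages             # barrier: next superstep
        return dist

    g = {"a": [("b", 5), ("c", 3)], "b": [("d", 6)], "c": [("b", 1)], "d": []}
    print(pregel_sssp(g, "a"))                   # {'a': 0, 'b': 4, 'c': 3, 'd': 10}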

Using Idempotence

Some recursive applications allow restart of tasks even if they have produced some output.

Example: TC is idempotent; you can send a task a duplicate Path fact without altering the result.
But if you were counting paths, the answer would be wrong.
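A tiny single-process Python illustration of the difference (the variable names are illustrative): re-delivering the same Path fact after a restart leaves a set unchanged but corrupts a count.

    seen_paths = set()
    path_count = 0

    def receive_path(fact):
        global path_count
        seen_paths.add(fact)        # set insertion: a duplicate changes nothing
        path_count += 1             # a counter: a duplicate makes the count wrong

    receive_path(("a", "b"))
    receive_path(("a", "b"))        # the same fact re-sent after a task restart
    print(len(seen_paths))          # 1 -- still correct
    print(path_count)               # 2 -- wrong if we were counting paths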

Big Problem: The Endgame

Some recursions, like TC, take a large number of rounds, but the number of new discoveries in later rounds drops.
T. Vassilakis (Google): searches forward on the Web graph can take hundreds of rounds.
Problem: in a cluster, transmitting small files carries much overhead.

Approach: Merge Tasks

Decide when to migrate tasks to fewer compute nodes.

Data for several tasks at the same node are combined into a single file and distributed at the receiving end.

Downside: old tasks have a lot of state to move.

Example: “paths seen so far.”

Approach: Modify Algorithms

Nonlinear recursions can terminate in many fewer steps than equivalent linear recursions.

Example: TC. O(n) rounds on an n-node graph for linear; O(log n) rounds for nonlinear.
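A set-level Python sketch of the two recursions, ignoring the distribution across tasks: the linear version adds one arc per round, so a length-n path needs about n rounds, while the nonlinear Path + Path version roughly doubles path lengths each round and converges in O(log n) rounds.

    def linear_tc(arc):
        # Path := Arc;  Path := Arc JOIN Path.  One new hop per round: O(n) rounds.
        path = set(arc)
        while True:
            new = path | {(a, c) for (a, b) in arc for (b2, c) in path if b == b2}
            if new == path:
                return path
            path = new

    def nonlinear_tc(arc):
        # Path := Arc;  Path := Path JOIN Path.  Lengths double: O(log n) rounds.
        path = set(arc)
        while True:
            new = path | {(a, c) for (a, b) in path for (b2, c) in path if b == b2}
            if new == path:
                return path
            path = new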

Advantage of Linear TC

The data-volume cost (= sum of input sizes of all tasks) for executing linear TC is generally lower than that for nonlinear TC.

Why? Each path is discovered only once.
Note: distinct paths between the same endpoints may each be discovered.

Example: Linear TC – Arc + Path = Path

Nonlinear TC Constructs Path + Path = Path in Many Ways

Smart TC

(Valduriez-Boral, Ioannidis) Construct a path from two paths:
1. The first has a length that is a power of 2.
2. The second is no longer than the first.
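A set-level Python sketch of this rule (an assumed formulation, not Valduriez-Boral's code): Q holds pairs joined by a path of length exactly 2^i, P holds pairs joined by a path of length at most 2^i, and each round glues a power-of-2 path onto a path that is no longer than it.

    def smart_tc(arc):
        Q = set(arc)       # pairs joined by a path of length exactly 2^i
        P = set(arc)       # pairs joined by a path of length at most 2^i
        while True:
            # New paths: a length-2^i path followed by one no longer than it.
            new_P = P | {(x, y) for (x, z) in Q for (z2, y) in P if z == z2}
            # Paths of length exactly 2^(i+1): two length-2^i halves.
            Q = {(x, y) for (x, z) in Q for (z2, y) in Q if z == z2}
            if new_P == P:
                return P
            P = new_P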

Example: Smart TC

Other Nonlinear TC Algorithms

You can have the unique-decomposition property with many variants of nonlinear TC.

Example: Balance constructs paths from two equal-length paths. Favor first path when length is odd.

Example: Balance

Incomparability of TC Algorithms

On different graphs, any of the unique-decomposition algorithms – left-linear, right-linear, smart, balanced – could have the lowest data-volume cost.

Other unique-decomposition algorithms are possible and also could win.

Extension Beyond TC

Can you convert any linear recursion into an equivalent nonlinear recursion that requires logarithmic rounds?

Answer: Not always, without increasing arity and data size.

Positive Points

1. (Agrawal, Jagadish, Ness) All linear Datalog recursions reduce to TC.

2. Right-linear chain-rule Datalog programs can be replaced by nonlinear recursions with the same arity, logarithmic rounds, and the unique-decomposition property.

Example: Alternating-Color Paths

P(X,Y) :- Blue(X,Y)
P(X,Y) :- Blue(X,Z) & Q(Z,Y)
Q(X,Y) :- Red(X,Z) & P(Z,Y)

The Case of Reachability

Reach(X) :- Source(X)
Reach(X) :- Reach(Y) & Arc(Y,X)

Takes linear rounds as stated.
Can compute nonlinear TC to get Reach in O(log n) rounds.
But then you compute O(n²) facts instead of O(n) facts on an n-node graph.
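A Python sketch of the linear recursion as stated, evaluated semi-naively in one process (reach, frontier, and rounds are illustrative names): each round follows one more arc, so a chain of n nodes needs about n rounds, but only O(n) Reach facts are ever stored.

    def reach(sources, arc):
        # Reach(X) :- Source(X);  Reach(X) :- Reach(Y) & Arc(Y,X)
        reached = set(sources)
        frontier = set(sources)
        rounds = 0
        while frontier:
            rounds += 1
            frontier = {x for (y, x) in arc if y in frontier} - reached
            reached |= frontier
        return reached, rounds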

Reachability – (2)

Theorem: If you compute Reach using only unary recursive predicates, then it must take Ω(n) rounds on a graph of n nodes.
The proof uses the ideas of Afrati, Cosmadakis, and Yannakakis from a generation ago.

Summary: Recursion

Key problems are the "endgame" and the nonblocking nature of recursive tasks.
In some applications, the endgame problem can be handled by using a nonlinear recursion that requires O(log n) rounds and has the unique-decomposition property.

Summary: Research Questions

1. How do you best support fault tolerance when tasks are nonblocking?

2. How do you manage tasks when the endgame problem cannot be avoided?

3. When can you replace linear recursion with nonlinear recursion requiring many fewer rounds and (roughly) the same data-volume cost?