Cluster Computing with DryadLINQ Mihai Budiu Microsoft Research Silicon Valley IEEE Cloud Computing Conference July 19, 2008

Cluster Computing with Dryad - IEEE (ewh.ieee.org/r6/scv/computer/nfic/2008/Microsoft...)

Page 1:

Cluster Computing with DryadLINQ

Mihai Budiu
Microsoft Research Silicon Valley

IEEE Cloud Computing Conference
July 19, 2008

Page 2:

Goal

Page 3:

Design Space

[Figure labels: Throughput, Latency; Internet, Private data center; Data-parallel, Shared memory]

Page 4:

Data Partitioning

[Figure labels: RAM, DATA]

Page 5:

Data-Parallel Computation

• Application: Sawzall, Pig; DryadLINQ, Scope, PSQL
• Execution: Parallel Databases; Map-Reduce; Dryad
• Storage: GFS, BigTable; Cosmos, NTFS

Page 6:

• Introduction
• Dryad
• DryadLINQ
• Applications

Page 7:

2-D Piping

• Unix Pipes: 1-D

grep | sed | sort | awk | perl

• Dryad: 2-D

grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50

Pages 8-12:

Virtualized 2-D Pipelines

• 2D DAG
• multi-machine
• virtualized

Page 13:

Architecture

[Figure labels: job manager, job schedule, control plane; cluster: NS, PD, V; data plane: Files, TCP, FIFO, Network]

Page 14:

Fault Tolerance

Page 15:

• Introduction
• Dryad
• DryadLINQ
• Applications

Page 16:

DryadLINQ

Dryad

Page 17:

LINQ = C# + Queries

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
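A minimal sketch of the same query shape running locally with LINQ to Objects. The slide only declares `IsLegal` and `Hash`, so the bodies below, the use of `KeyValuePair<string, int>` as the element type, and the `LinqQueryDemo` and `Run` names are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqQueryDemo
{
    // Hypothetical predicate and hash; the slide gives signatures only.
    static bool IsLegal(string key) => key.Length > 0 && char.IsLetter(key[0]);
    static string Hash(string key) => key.ToUpperInvariant();

    public static List<string> Run(IEnumerable<KeyValuePair<string, int>> collection)
    {
        // Filter with where, then project into an anonymous type with select.
        var results = from c in collection
                      where IsLegal(c.Key)
                      select new { hash = Hash(c.Key), value = c.Value };
        return results.Select(r => r.hash + "=" + r.value).ToList();
    }

    public static void Main()
    {
        var data = new[]
        {
            new KeyValuePair<string, int>("ab", 1),
            new KeyValuePair<string, int>("9z", 2),
            new KeyValuePair<string, int>("cd", 3),
        };
        foreach (var s in Run(data))
            Console.WriteLine(s);   // prints AB=1, then CD=3
    }
}
```

The anonymous type needs an explicit member name for the `Hash(...)` call, because C# can infer a member name from `c.Value` but not from a method invocation.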

Page 18:

DryadLINQ = LINQ + Dryad

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

[Figure labels: C#, collection, results; Vertex code (C#); Query plan (Dryad job); Data]

Page 19:

Data Model

Partition

Collection

C# objects

Page 20:

Language Summary

Where
Select
GroupBy
OrderBy
Aggregate
Join
Apply
Materialize

Page 21:

Demo

Page 22:

Example: Histogram

public static IQueryable<Pair> Histogram(IQueryable<LineRecord> input, int k)
{
    var words = input.SelectMany(x => x.line.Split(' '));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    return top;
}

“A line of words of wisdom”

[“A”, “line”, “of”, “words”, “of”, “wisdom”]

[[“A”], [“line”], [“of”, “of”], [“words”], [“wisdom”]]

[ {“A”, 1}, {“line”, 1}, {“of”, 2}, {“words”, 1}, {“wisdom”, 1}]

[{“of”, 2}, {“A”, 1}, {“line”, 1}, {“words”, 1}, {“wisdom”, 1}]

[{“of”, 2}, {“A”, 1}, {“line”, 1}]
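The trace above can be reproduced on a single machine. The sketch below runs the same operator chain through LINQ to Objects via `AsQueryable()`; it is not a DryadLINQ job. `LineRecord` and `Pair` are not defined on the slide, so the versions here are hypothetical, and `Split(new[] { ' ' })` replaces `Split(' ')` only because current .NET compilers reject optional-argument calls inside expression trees.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record types; the slide uses these names without defining them.
public class LineRecord
{
    public string line;
    public LineRecord(string line) { this.line = line; }
}

public class Pair
{
    public string word;
    public int count;
    public Pair(string word, int count) { this.word = word; this.count = count; }
}

public static class HistogramDemo
{
    public static IQueryable<Pair> Histogram(IQueryable<LineRecord> input, int k)
    {
        var words = input.SelectMany(x => x.line.Split(new[] { ' ' })); // line -> words
        var groups = words.GroupBy(x => x);                             // one group per word
        var counts = groups.Select(x => new Pair(x.Key, x.Count()));    // (word, count)
        var ordered = counts.OrderByDescending(x => x.count);           // most frequent first
        return ordered.Take(k);                                         // keep the top k
    }

    public static void Main()
    {
        var input = new[] { new LineRecord("A line of words of wisdom") }.AsQueryable();
        foreach (var p in Histogram(input, 3))
            Console.WriteLine($"{p.word}: {p.count}");   // of: 2, A: 1, line: 1
    }
}
```

In LINQ to Objects, `GroupBy` yields groups in first-occurrence order and `OrderByDescending` is a stable sort, which is why the tied words come out as A and then line.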

Page 23:

Histogram Plan

SelectMany, HashDistribute
Merge, GroupBy
Select
OrderByDescending, Take
MergeSort, Take
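The stages named in this plan can be mimicked on one machine to show why taking the top k twice is safe. This is a rough sketch with in-memory lists standing in for partitions, not the code DryadLINQ generates. `StableHash`, `TopByCount`, the `reducers` parameter, and the tie-break by word are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HistogramPlanSketch
{
    // Deterministic stand-in for the HashDistribute hash (an assumption:
    // string.GetHashCode() can differ between runs on modern .NET).
    static int StableHash(string s) =>
        s.Aggregate(17, (h, c) => unchecked(h * 31 + c));

    // Ties are broken by word so the result does not depend on partitioning.
    static IEnumerable<KeyValuePair<string, int>> TopByCount(
        IEnumerable<KeyValuePair<string, int>> counts, int k) =>
        counts.OrderByDescending(p => p.Value)
              .ThenBy(p => p.Key, StringComparer.Ordinal)
              .Take(k);

    public static List<KeyValuePair<string, int>> TopK(
        IEnumerable<string[]> inputPartitions, int k, int reducers)
    {
        // Stage 1, per input partition: SelectMany, then HashDistribute by word.
        var routed = inputPartitions
            .SelectMany(lines => lines.SelectMany(line => line.Split(' ')))
            .GroupBy(word => (StableHash(word) & int.MaxValue) % reducers);

        // Stage 2, per reducer partition: Merge, GroupBy, Select,
        // OrderByDescending, Take.
        var localTops = routed.Select(part => TopByCount(
            part.GroupBy(w => w)
                .Select(g => new KeyValuePair<string, int>(g.Key, g.Count())),
            k));

        // Stage 3, single vertex: MergeSort the local winners, then Take.
        return TopByCount(localTops.SelectMany(t => t), k).ToList();
    }

    public static void Main()
    {
        var partitions = new[]
        {
            new[] { "A line of words of wisdom" },
            new[] { "more words of wisdom" },
        };
        foreach (var p in TopK(partitions, 3, 2))
            Console.WriteLine($"{p.Key}: {p.Value}");   // of: 3, wisdom: 2, words: 2
    }
}
```

Hash distribution sends every occurrence of a word to the same reducer, so each local count is already the global count. A word in the global top k can therefore never fall outside its own reducer's top k, and the final stage only has to merge at most k entries per reducer.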

Page 24:

Map-Reduce in DryadLINQ

public static IQueryable<S> MapReduce<T,M,K,S>(
    this IQueryable<T> input,
    Expression<Func<T, IEnumerable<M>>> mapper,
    Expression<Func<M,K>> keySelector,
    Expression<Func<IGrouping<K,M>,S>> reducer)
{
    var map = input.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.Select(reducer);
    return result;
}
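As a usage illustration, word count fits this operator directly. The sketch below copies the `MapReduce` method from the slide and calls it over an in-memory `IQueryable`, so it runs locally rather than on a cluster. `WordCount` and the `MapReduceDemo` class are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceDemo
{
    // The operator from the slide: SelectMany (map), GroupBy, Select (reduce).
    public static IQueryable<S> MapReduce<T, M, K, S>(
        this IQueryable<T> input,
        Expression<Func<T, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<IGrouping<K, M>, S>> reducer)
    {
        var map = input.SelectMany(mapper);
        var group = map.GroupBy(keySelector);
        var result = group.Select(reducer);
        return result;
    }

    // Hypothetical helper: word count as mapper, key selector, and reducer.
    public static Dictionary<string, int> WordCount(IEnumerable<string> lines)
    {
        return lines.AsQueryable()
            .MapReduce(
                line => line.Split(new[] { ' ' }),                     // map: line -> words
                word => word,                                          // key: the word itself
                g => new KeyValuePair<string, int>(g.Key, g.Count()))  // reduce: count per key
            .ToDictionary(p => p.Key, p => p.Value);
    }

    public static void Main()
    {
        foreach (var kv in WordCount(new[] { "A line of words of wisdom" }))
            Console.WriteLine($"{kv.Key}: {kv.Value}");
    }
}
```

Because the three arguments are expression trees rather than compiled delegates, a query provider can inspect them and plan the computation, which is the property the slide's signature relies on.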

Page 25:

Map-Reduce Plan

[Figure: static and dynamic versions of the map-reduce execution plan. Vertex legend: M = map, Q = sort, G1 = groupby, R = reduce, D = distribute, MS = mergesort, G2 = groupby, R = reduce, X = consumer. Stage groupings: map, partial aggregation, reduce. Plan labels: static, dynamic.]

Page 26:

• Introduction
• Dryad
• DryadLINQ
• Applications

Page 27:

Applications

• Dryad
• DryadLINQ
• Distributed Data Structures
• Machine learning
• Graphs
• Log analysis
• Image processing
• Combinatorial optimization
• Raytracing
• Data analysis

Page 28:

E.g.: Linear Algebra

$T, U, V = \mathbb{R}^n, \mathbb{R}^m, \mathbb{R}^{m \times n}$

Page 29:

Expectation Maximization (Gaussians)

• 160 lines
• 3 iterations shown

Page 30:

Conclusions

• Dryad = distributed execution environment
• Supports rich software ecosystem
• DryadLINQ = compiles LINQ to Dryad
• C# objects and declarative programming
• .NET and Visual Studio for distributed programming

Page 31:

Dryad Job Structure

[Figure: job graph running from input files through vertices (processes) such as grep, sed, sort, awk, and perl to output files. Other labels: Channels, Stage.]

Page 32:

Linear Regression

• Data: $x_t \in \mathbb{R}^n,\ y_t \in \mathbb{R}^m,\ t \in \{1, \dots, n\}$
• Find: $A \in \mathbb{R}^{m \times n}$
• S.t.: $A x_t \approx y_t$

Page 33:

Analytic Solution

$A = \left(\sum_t y_t \times x_t^T\right)\left(\sum_t x_t \times x_t^T\right)^{-1}$

[Figure: the Map stage computes X×X^T and Y×X^T for each partition (X[0], X[1], X[2], Y[0], Y[1], Y[2]); the Reduce stage sums (Σ) each set; the [ ]^-1 and * operators then produce A.]
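One way to see where this closed form comes from, assuming the fit $A x_t \approx y_t$ is meant in the least-squares sense (the slides do not state the objective) and that $\sum_t x_t x_t^T$ is invertible:

```latex
\begin{aligned}
E(A) &= \sum_t \lVert A x_t - y_t \rVert^2 \\
\frac{\partial E}{\partial A} &= 2 \sum_t (A x_t - y_t)\, x_t^T = 0 \\
A \sum_t x_t x_t^T &= \sum_t y_t x_t^T \\
A &= \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}
\end{aligned}
```

Both sums are made of independent per-record outer products, so the map stage can compute them partition by partition and the reduce stage only has to add matrices.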

Page 34:

Linear Regression Code

Vectors x = input(0), y = input(1);

Matrices xx = x.Map(x, (a,b) => a.OuterProd(b));
OneMatrix xxs = xx.Sum();

Matrices yx = y.Map(x, (a,b) => a.OuterProd(b));
OneMatrix yxs = yx.Sum();

OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Mult(b));

$A = \left(\sum_t y_t \times x_t^T\right)\left(\sum_t x_t \times x_t^T\right)^{-1}$

Page 35:

Dryad = Execution Layer

Job (application) ≈ Pipeline
Dryad ≈ Shell
Cluster ≈ Machine

Page 36:

• Many similarities

Dryad                        Map-Reduce
• Execution layer            • Exe + app. model
• Job = arbitrary DAG        • Map+sort+reduce
• Plug-in policies           • Few policies
• Program=graph gen.         • Program=map+reduce
• Complex ( features)        • Simple
• New (< 2 years)            • Mature (> 4 years)
• Still growing              • Widely deployed
• Internal                   • Hadoop

Page 37:

PLINQ

public static IEnumerable<TSource> DryadSort<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer,
    bool isDescending)
{
    // Sort in parallel on the local machine, in the requested direction.
    return isDescending
        ? source.AsParallel().OrderByDescending(keySelector, comparer)
        : source.AsParallel().OrderBy(keySelector, comparer);
}

Page 38:

Operations on Large Vectors: Map 1

[Figure labels: T, f, U]

f preserves partitioning

Page 39:

Map 2 (Pairwise)

[Figure labels: T, U, f, V]

Page 40:

Map 3 (Vector-Scalar)

[Figure labels: T, U, f, V]

Page 41:

Reduce (Fold)

[Figure labels: U, f]
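The three map variants and the fold named on these slides can be sketched over an in-memory partitioned vector. `PartitionedVector<T>` and its method names are hypothetical stand-ins, not the deck's actual distributed data structure. The sketch assumes both operands of the pairwise map are partitioned identically, that `f` is associative for the fold, and that no partition is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a large vector: a list of partitions.
public class PartitionedVector<T>
{
    public readonly List<List<T>> Partitions;

    public PartitionedVector(IEnumerable<IEnumerable<T>> parts)
    {
        Partitions = parts.Select(p => p.ToList()).ToList();
    }

    // Map 1: apply f to every element; partition boundaries are unchanged.
    public PartitionedVector<U> Map<U>(Func<T, U> f) =>
        new PartitionedVector<U>(Partitions.Select(p => p.Select(f)));

    // Map 2 (pairwise): combine matching elements of two vectors.
    public PartitionedVector<V> MapPairwise<U, V>(
        PartitionedVector<U> other, Func<T, U, V> f) =>
        new PartitionedVector<V>(
            Partitions.Zip(other.Partitions, (a, b) => a.Zip(b, f)));

    // Map 3 (vector-scalar): combine every element with one shared value.
    public PartitionedVector<V> MapScalar<U, V>(U scalar, Func<T, U, V> f) =>
        new PartitionedVector<V>(
            Partitions.Select(p => p.Select(e => f(e, scalar))));

    // Reduce (fold): fold inside each partition, then fold the partial results.
    public T Reduce(Func<T, T, T> f) =>
        Partitions.Select(p => p.Aggregate(f)).Aggregate(f);
}

public static class VectorDemo
{
    public static void Main()
    {
        var x = new PartitionedVector<int>(new[] { new[] { 1, 2 }, new[] { 3, 4 } });
        Console.WriteLine(x.Map(v => v * v).Reduce((a, b) => a + b));               // 30
        Console.WriteLine(x.MapPairwise(x, (a, b) => a * b).Reduce((a, b) => a + b)); // 30
        Console.WriteLine(x.MapScalar(10, (a, s) => a * s).Reduce((a, b) => a + b));  // 100
    }
}
```

Each map touches one partition at a time, which is the sense in which f preserves partitioning. Only `Reduce` needs a second step that combines results across partitions.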

Page 42:

Software Stack

[Figure: software stack diagram. Labels:]
• Applications
• C# Data structures
• DryadLINQ, PSQL, Scope, SSIS, SQL server, Perl, C++, C#
• legacy code: sed, awk, grep, etc.
• Distributed Shell (Nebula)
• Dryad
• Job queueing, monitoring
• Distributed Filesystem (Cosmos), CIFS/NTFS
• Cluster Services
• Windows Server