GraphLab: A New Framework for Parallel Machine Learning
Yucheng Low, Aapo Kyrola, Carlos Guestrin, Joseph Gonzalez, Danny Bickson, Joe Hellerstein
Carnegie Mellon University
Transcript
Page 1:

Carnegie Mellon

GraphLab: A New Framework for Parallel Machine Learning

Yucheng Low, Aapo Kyrola, Carlos Guestrin, Joseph Gonzalez, Danny Bickson, Joe Hellerstein

Page 2:

2

[Figure: processor speed (GHz, log scale) vs. release date, 1988–2010. Sequential performance increased exponentially until roughly 2004, then flattened to constant sequential performance, while parallel performance keeps increasing exponentially: "Exponential Parallelism".]

13 Million Wikipedia Pages. 3.6 Billion photos on Flickr.

Page 3:

Parallel Programming is Hard

Designing efficient parallel algorithms is hard:
Race conditions and deadlocks
Parallel memory bottlenecks
Architecture-specific concurrency
Difficult to debug

ML experts repeatedly address the same parallel design challenges

3

Avoid these problems by using high-level abstractions.

Graduate students

Page 4:

MapReduce – Map Phase

4

Embarrassingly parallel, independent computation.

[Figure: CPU 1–CPU 4 each independently compute a value (12.9, 42.3, 21.3, 25.8).]

No communication needed.

Page 5:

MapReduce – Map Phase

5

Embarrassingly parallel, independent computation.

[Figure: CPU 1–CPU 4 continue with further values (24.1, 84.3, 18.4, 84.4) alongside the earlier ones.]

No communication needed.

Page 6:

MapReduce – Map Phase

6

Embarrassingly parallel, independent computation.

[Figure: CPU 1–CPU 4 compute yet more values (17.5, 67.5, 14.9, 34.3); still no coordination between CPUs.]

No communication needed.

Page 7:

MapReduce – Reduce Phase

7

[Figure: CPU 1 and CPU 2 fold the mapped values (12.9, 42.3, 21.3, 25.8, 24.1, 84.3, 18.4, 84.4, 17.5, 67.5, 14.9, 34.3) into aggregated results.]

Fold/Aggregation
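To make the two phases concrete, here is a minimal, self-contained C++ illustration (not Hadoop or GraphLab code; the squaring "map" is an arbitrary stand-in): the map step touches each element independently, so it parallelizes trivially, and the reduce step folds the mapped values into one result.

```cpp
// Minimal sketch of the map/reduce pattern from these slides.
// The "map" function here (squaring) is an arbitrary placeholder.
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  std::vector<double> inputs = {12.9, 42.3, 21.3, 25.8, 24.1, 84.3, 18.4, 84.4};

  // Map phase: each element is transformed independently -- no communication,
  // so each iteration could run on a different CPU.
  std::vector<double> mapped;
  for (double x : inputs) mapped.push_back(x * x);

  // Reduce phase: fold/aggregate the mapped values into a single result.
  double total = std::accumulate(mapped.begin(), mapped.end(), 0.0);
  std::printf("aggregated result = %f\n", total);
  return 0;
}
```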

Page 8:

Related Data

8

Interdependent Computation: Not MapReduceable

Page 9:

Parallel Computing and ML

Not all algorithms are efficiently data parallel.

9

Data-Parallel: Cross-Validation, Feature Extraction

Complex Parallel Structure: Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, Sampling, Lasso

Page 10:

Common Properties

10

1) Sparse Data Dependencies
   Examples: Sparse Primal SVM, Tensor/Matrix Factorization

2) Local Computations
   Examples: Expectation Maximization, Optimization

3) Iterative Updates
   Examples: Sampling, Belief Propagation

[Figure: two local computations, Operation A and Operation B, each reading and writing a small neighborhood of the data graph.]
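These three properties suggest a common computational skeleton. The following is a hypothetical sketch of that shape (not GraphLab code, and the neighbor-averaging update is an arbitrary stand-in): local updates over a sparse dependency graph, repeated until convergence.

```cpp
// Hypothetical skeleton shared by the algorithms above: repeat local updates,
// each depending only on a sparse set of neighbors, until values stop changing.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

void iterate_local_updates(std::vector<double>& value,
                           const std::vector<std::vector<std::size_t>>& neighbors,
                           double tolerance) {
  double max_change = tolerance + 1.0;
  while (max_change > tolerance) {                       // 3) iterative updates
    max_change = 0.0;
    for (std::size_t v = 0; v < value.size(); ++v) {
      if (neighbors[v].empty()) continue;
      // 2) local computation over 1) sparse data dependencies.
      double sum = 0.0;
      for (std::size_t u : neighbors[v]) sum += value[u];
      const double updated = sum / static_cast<double>(neighbors[v].size());
      max_change = std::max(max_change, std::abs(updated - value[v]));
      value[v] = updated;
    }
  }
}
```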

Page 11:

Gibbs Sampling

11

[Figure: a Markov random field over variables X1–X9; each variable depends only on its graph neighbors.]

1) Sparse Data Dependencies

2) Local Computations

3) Iterative Updates

Page 12:

GraphLab is the Solution
Designed specifically for ML needs:
Express data dependencies
Iterative

Simplifies the design of parallel programs:
Abstract away hardware issues
Automatic data synchronization
Addresses multiple hardware architectures

Implementation here is multi-core; a distributed implementation is in progress.

12

Page 13:

Carnegie Mellon

GraphLab: A New Framework for Parallel Machine Learning

Page 14:

GraphLab

14

The GraphLab Model: Data Graph, Shared Data Table, Scheduling, Update Functions and Scopes

Page 15:

Data Graph

15

A graph with data associated with every vertex and edge.

[Figure: a graph over variables X1–X11. Vertex data, e.g. for X3: the current sample value x3 and the sample counts C(X3). Edge data, e.g. for the edge (X6, X9): the binary potential Φ(X6, X9).]
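To make the data graph concrete, here is a minimal C++ sketch of the kind of vertex and edge data this Gibbs-sampling example attaches to the graph. The struct names and the adjacency-list layout are illustrative assumptions, not GraphLab's actual data structures.

```cpp
// Illustrative (not GraphLab's real) data-graph types for the Gibbs example:
// every vertex and every edge carries user-defined data.
#include <vector>

struct VertexData {
  int sample = 0;                  // current sample value, e.g. x3
  std::vector<int> sample_counts;  // C(X_i): how often each value was sampled
};

struct EdgeData {
  std::vector<double> binary_potential;  // flattened table Phi(X_i, X_j)
};

struct Edge {
  int source = 0, target = 0;
  EdgeData data;
};

struct DataGraph {
  std::vector<VertexData> vertices;              // data on every vertex
  std::vector<Edge> edges;                       // data on every edge
  std::vector<std::vector<int>> adjacent_edges;  // edge ids incident to each vertex
};
```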

Page 16:

Update Functions

16

Update functions are operations that are applied to a vertex and transform the data in the scope of that vertex.

Gibbs Update:
- Read samples on adjacent vertices
- Read edge potentials
- Compute a new sample for the current vertex
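As a concrete illustration, here is a hypothetical C++ sketch of such a Gibbs update, written against the illustrative DataGraph structs sketched on Page 15; the function signature and data layout are assumptions, not GraphLab's actual update-function API.

```cpp
// Hypothetical sketch of a Gibbs-sampling update function (not the real
// GraphLab interface).  It assumes the illustrative DataGraph/Edge structs
// from the Page 15 sketch and a row-major layout for the potential table.
#include <random>
#include <vector>

void gibbs_update(DataGraph& graph, int vertex_id, std::mt19937& rng) {
  VertexData& v = graph.vertices[vertex_id];
  const int num_values = static_cast<int>(v.sample_counts.size());

  // Unnormalized conditional P(X_v = k | samples on adjacent vertices).
  std::vector<double> conditional(num_values, 1.0);
  for (int eid : graph.adjacent_edges[vertex_id]) {
    const Edge& e = graph.edges[eid];
    const int neighbor = (e.source == vertex_id) ? e.target : e.source;
    const int neighbor_sample = graph.vertices[neighbor].sample;  // read neighbor
    for (int k = 0; k < num_values; ++k) {
      // Read the edge potential Phi(k, neighbor_sample).
      conditional[k] *= e.data.binary_potential[k * num_values + neighbor_sample];
    }
  }

  // Compute (draw) a new sample for the current vertex and record it.
  std::discrete_distribution<int> dist(conditional.begin(), conditional.end());
  v.sample = dist(rng);
  v.sample_counts[v.sample] += 1;
}
```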

Page 17:

Update Function Schedule

17

[Figure: a data graph over vertices a–k; the scheduler holds a queue of update tasks (a, h, a, i, b, d) that CPU 1 and CPU 2 pull from and execute.]

Page 18:

Update Function Schedule

18

[Figure: CPU 1 and CPU 2 have taken the first tasks (a and h); tasks a, i, b, d remain in the schedule.]

Page 19:

Static Schedule
The scheduler determines the order of update function evaluations.

19

Synchronous Schedule: Every vertex updated simultaneously

Round Robin Schedule: Every vertex updated sequentially

Page 20:

Need for Dynamic Scheduling

20

[Figure: part of the graph has converged while another part is slowly converging; effort should be focused on the slowly converging region.]

Page 21:

Dynamic Schedule

21

[Figure: the same data graph over vertices a–k; as CPU 1 and CPU 2 execute tasks (a, h, ...), the updates insert new tasks (b, i, ...) into the schedule.]

Page 22:

Dynamic Schedule
Update functions can insert new tasks into the schedule.

22

FIFO Queue: Wildfire BP [Selvatici et al.]
Priority Queue: Residual BP [Elidan et al.]
Splash Schedule: Splash BP [Gonzalez et al.]

Obtain different algorithms simply by changing a flag!

--scheduler=fifo --scheduler=priority --scheduler=splash
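As a sketch of why one flag is enough to switch algorithms (the class and function names below are hypothetical, not GraphLab's API), the flag can simply select which scheduler implementation the engine is constructed with:

```cpp
// Hypothetical sketch of mapping a --scheduler flag to a scheduling policy.
// The class names are illustrative; they are not GraphLab's real types.
#include <memory>
#include <stdexcept>
#include <string>

struct Scheduler { virtual ~Scheduler() = default; };
struct FifoScheduler     : Scheduler {};  // FIFO queue      -> Wildfire BP behavior
struct PriorityScheduler : Scheduler {};  // priority queue  -> Residual BP behavior
struct SplashScheduler   : Scheduler {};  // splash schedule -> Splash BP behavior

std::unique_ptr<Scheduler> make_scheduler(const std::string& flag) {
  if (flag == "fifo")     return std::make_unique<FifoScheduler>();
  if (flag == "priority") return std::make_unique<PriorityScheduler>();
  if (flag == "splash")   return std::make_unique<SplashScheduler>();
  throw std::invalid_argument("unknown --scheduler value: " + flag);
}
```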

Page 23:

Global Information

What if we need global information?

23

Sum of all the vertices?

Algorithm Parameters?

Sufficient Statistics?

Page 24:

Shared Data Table (SDT)
Global constant parameters

24

Constant: Total # Samples
Constant: Temperature

Page 25:

Sync Operation
Sync is a fold/reduce operation over the graph.

25

[Figure: Sync! A pass over the graph folds every vertex's value into an accumulator, then a final Apply step transforms the result.]

Accumulate Function: Add
Apply Function: Divide by |V|

Accumulate performs an aggregation over vertices.
Apply makes a final modification to the accumulated data.
Example: compute the average of all the vertices.
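A small C++ sketch of this average-of-all-vertices example follows; the accumulate/apply function signatures are illustrative assumptions rather than GraphLab's real sync API.

```cpp
// Sketch of a sync operation: fold (accumulate) over all vertices, then apply
// one final transformation.  Here it computes the average vertex value.
#include <cstddef>
#include <vector>

struct Accumulator {
  double sum = 0.0;
  std::size_t count = 0;
};

// Accumulate: called once per vertex during the fold.
void accumulate(Accumulator& acc, double vertex_value) {
  acc.sum += vertex_value;
  acc.count += 1;
}

// Apply: one final modification to the accumulated data (divide by |V|).
double apply(const Accumulator& acc) {
  return acc.count == 0 ? 0.0 : acc.sum / static_cast<double>(acc.count);
}

double average_vertex_value(const std::vector<double>& vertex_values) {
  Accumulator acc;
  for (double v : vertex_values) accumulate(acc, v);
  return apply(acc);
}
```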

Page 26:

Shared Data Table (SDT)
Global constant parameters
Global computation (Sync Operation)

26

Constant: Total # Samples
Constant: Temperature
Sync: Sample Statistics
Sync: Log-likelihood

Page 27:

Carnegie Mellon

Safety and Consistency

27

Page 28:

Write-Write Race

28

A write-write race occurs if adjacent update functions write simultaneously.

[Figure: the left update and the right update write to the same data; only one of the writes survives in the final value.]

Page 29:

Race Conditions + Deadlocks

Just one of the many possible races.
Race-free code is extremely difficult to write.

29

The GraphLab design ensures race-free operation.

Page 30:

Scope Rules

30

Guaranteed safety for all update functions

Page 31:

Full Consistency

31

Only update functions two vertices apart are allowed to run in parallel, which reduces the opportunities for parallelism.

Page 32:

Obtaining More Parallelism

32

Not all update functions will modify the entire scope!

Belief Propagation: only uses edge data.
Gibbs Sampling: only needs to read adjacent vertices.

Page 33:

Edge Consistency

33

Page 34:

Obtaining More Parallelism

34

“Map” operations. Feature extraction on vertex data

Page 35:

Vertex Consistency

35

Page 36:

Sequential Consistency
GraphLab guarantees sequential consistency.

36

For every parallel execution, there exists a sequential execution of update functions which will produce the same result.

[Figure: a parallel execution on CPU 1 and CPU 2 over time, and an equivalent sequential execution on a single CPU.]

Page 37:

GraphLab

37

The GraphLab Model: Data Graph, Shared Data Table, Scheduling, Update Functions and Scopes

Page 38:

Carnegie Mellon

Experiments

38

Page 39:

Experiments
Shared-memory implementation in C++ using Pthreads.
Tested on a 16-processor machine: 4x quad-core AMD Opteron 8384, 64 GB RAM.

39

Algorithms implemented:
Belief Propagation + Parameter Learning
Gibbs Sampling
CoEM
Lasso
Compressed Sensing
SVM
PageRank
Tensor Factorization

Page 40:

Graphical Model Learning

40

3D retinal image denoising

Data Graph: 256x64x64 vertices

Update Function: Belief Propagation

Sync: Edge-potential
Acc: compute inference statistics
Apply: take a gradient step
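For context, the gradient step the Apply stage takes is presumably the standard maximum-likelihood gradient for MRF parameters (this equation is an assumption based on the slide's description, not something stated on it), where the model expectation is estimated from the inference statistics gathered by the accumulate stage:

```latex
% Standard MRF parameter-learning gradient (assumed form, not from the slides):
% theta parameterizes the shared edge potential, f are its sufficient statistics,
% eta is a step size.
\[
\frac{\partial \ell(\theta)}{\partial \theta}
  = \mathbb{E}_{\text{data}}\!\left[f(x_i, x_j)\right]
  - \mathbb{E}_{\theta}\!\left[f(X_i, X_j)\right],
\qquad
\theta \leftarrow \theta + \eta\, \frac{\partial \ell(\theta)}{\partial \theta}.
\]
```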

Page 41:

Graphical Model Learning

41

[Figure: speedup vs. number of CPUs (1–16) against optimal, for the approximate priority schedule and the splash schedule.]

15.5x speedup on 16 CPUs

Page 42:

Graphical Model Learning

42

Standard parameter learning takes the gradient only after inference is computed (iterated inference and gradient steps).

With GraphLab: take the gradient step while inference is running (parallel inference + gradient step).

[Figure: runtime of the iterated approach (2100 sec) vs. the simultaneous approach (700 sec).]

3x faster!

Page 43:

Gibbs Sampling
Two methods for sequential consistency:

43

Scopes: Edge Scope
graphlab(gibbs, edge, sweep);

Scheduling: Graph Coloring
[Figure: CPU 1, CPU 2, and CPU 3 update vertices of one color per time step t0–t3; adjacent vertices never share a color.]
graphlab(gibbs, vertex, colored);
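A minimal sketch of the coloring idea follows (hypothetical code, not the graphlab(...) calls above, and it reuses the illustrative DataGraph and gibbs_update sketches from earlier pages): because adjacent vertices never share a color, all vertices of one color can be updated in parallel without locking while preserving sequential consistency.

```cpp
// Sketch of the colored schedule: process one color at a time; vertices of the
// same color are never adjacent, so their updates are independent.
#include <cstddef>
#include <random>
#include <vector>

void colored_sweep(DataGraph& graph,
                   const std::vector<int>& vertex_color,  // a valid graph coloring
                   int num_colors,
                   std::mt19937& rng) {
  for (int color = 0; color < num_colors; ++color) {
    // Every iteration of this inner loop is independent of the others, so the
    // vertices of this color could be split across CPUs (each CPU would then
    // need its own random-number generator).
    for (std::size_t v = 0; v < graph.vertices.size(); ++v) {
      if (vertex_color[v] == color) {
        gibbs_update(graph, static_cast<int>(v), rng);  // update sketched earlier
      }
    }
  }
}
```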

Page 44:

Gibbs Sampling
Protein-protein interaction networks [Elidan et al. 2006]

Pairwise MRF: 14K vertices, 100K edges

10x speedup. Scheduling reduces locking overhead.

44

[Figure: speedup vs. number of CPUs (1–16) against optimal, for the round-robin schedule and the colored schedule.]

Page 45:

CoEM (Rosie Jones, 2005)
Named Entity Recognition task

Is "Dog" an animal? Is "Catalina" a place?

[Figure: a bipartite graph connecting noun phrases ("the dog", "Australia", "Catalina Island") to the contexts they occur in ("<X> ran quickly", "travelled to <X>", "<X> is pleasant").]

Graph sizes:
Small: 0.2M vertices, 20M edges
Large: 2M vertices, 200M edges

Hadoop: 95 cores, 7.5 hrs

Page 46:

CoEM (Rosie Jones, 2005)

46

[Figure: speedup vs. number of CPUs (1–16) against optimal, for the small and large graphs.]

Hadoop: 95 cores, 7.5 hrs
GraphLab: 16 cores, 30 min

15x faster! 6x fewer CPUs!

Page 47:

Lasso

47

L1-regularized Linear Regression

Shooting Algorithm (coordinate descent). Due to the properties of the update, full consistency is needed.

Page 48:

Lasso

48

L1-regularized Linear Regression

Shooting Algorithm (coordinate descent). Due to the properties of the update, full consistency is needed.

Page 49:

Lasso

49

L1-regularized Linear Regression

Shooting Algorithm (coordinate descent). Due to the properties of the update, full consistency is needed.

Finance Dataset from Kogan et al [2009].
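For reference, the per-coordinate update the shooting algorithm performs is the standard Lasso coordinate-descent step below (this derivation is not on the slides); each update touches only the features and residuals connected to coordinate j, which is what the data-graph formulation exploits and why full consistency matters for correctness.

```latex
% Standard shooting / coordinate-descent update for the Lasso objective
%   min_w  (1/2) * ||y - X w||_2^2 + lambda * ||w||_1
% updating one coordinate j at a time while the others are held fixed.
\[
r^{(j)} = y - \sum_{k \neq j} x_k w_k, \qquad
w_j \leftarrow \frac{S\!\left(x_j^{\top} r^{(j)},\ \lambda\right)}{x_j^{\top} x_j},
\qquad
S(a, \lambda) = \operatorname{sign}(a)\,\max\!\left(|a| - \lambda,\ 0\right).
\]
```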

Page 50:

Full Consistency

50

[Figure: speedup vs. number of CPUs (1–16) against optimal under full consistency, for the dense and sparse datasets.]

Page 51:

Relaxing Consistency

51

Why does this work? (Open question)

[Figure: speedup vs. number of CPUs (1–16) against optimal for the dense and sparse datasets when consistency is relaxed.]

Page 52:

GraphLab
An abstraction tailored to machine learning. It provides a parallel framework which compactly expresses:
Data/computational dependencies
Iterative computation

Achieves state-of-the-art parallel performance on a variety of problems.
Easy to use.

52

Page 53:

Future Work

Distributed GraphLab:
Load balancing
Minimize communication
Latency hiding
Distributed data consistency
Distributed scalability

GPU GraphLab:
Memory bus bottleneck
Warp alignment

State-of-the-art performance for <Your Algorithm Here>.

53

Page 54:

Carnegie Mellon

Parallel GraphLab 1.0

Available Today

http://graphlab.ml.cmu.edu

54

Documentation… Code… Tutorials…