Transcript
Page 1: Graphlab under the hood


GraphLab under the hood

Zuhair Khayyat

Page 2: Graphlab under the hood


GraphLab overview: GraphLab 1.0

● GraphLab: A New Framework For Parallel Machine Learning

– High-level abstractions for machine learning problems

– Shared-memory multiprocessor

– Assumes no fault tolerance is needed

– Concurrent-access processing models with sequential-consistency guarantees

Page 3: Graphlab under the hood


GraphLab overview: GraphLab 1.0

● How does GraphLab 1.0 work?

– Represents the user's data as a directed graph

– Each block of data is represented by a vertex and a directed edge

– Shared data table

– User functions (see the sketch after this list):

● Update: modifies the state of a vertex and its edges; read-only access to the shared table

● Fold: sequential aggregation into a key entry in the shared table; may modify vertex data

● Merge: parallelizes the Fold function

● Apply: finalizes the key entry in the shared table
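
A minimal Python sketch of how these four user functions fit together, with PageRank-style values as the running example; the data layout and function signatures are illustrative assumptions, not GraphLab's actual C++ API:

    # Hypothetical sketch of GraphLab 1.0's user functions.
    def update(vertex, in_edges, out_edges, shared_table):
        # Recompute the vertex value from in-neighbor contributions;
        # the shared data table is read-only inside Update.
        total = sum(e["weight"] * e["src_rank"] for e in in_edges)
        vertex["rank"] = 0.15 + 0.85 * total
        for e in out_edges:            # expose the new value to neighbors
            e["src_rank"] = vertex["rank"]

    def fold(vertex, acc):
        # Sequential aggregation of vertex data toward one shared-table key.
        return acc + vertex["rank"]

    def merge(acc_a, acc_b):
        # Combines two partial Fold results so the Fold can run in parallel.
        return acc_a + acc_b

    def apply(acc, num_vertices):
        # Finalizes the value written back to the shared-table key.
        return acc / num_vertices

    # Tiny two-vertex example: v1 -> v2.
    v1, v2 = {"rank": 1.0}, {"rank": 1.0}
    edge = {"weight": 1.0, "src_rank": v1["rank"]}
    update(v2, in_edges=[edge], out_edges=[], shared_table={})
    avg_rank = apply(merge(fold(v1, 0.0), fold(v2, 0.0)), num_vertices=2)
    print(v2["rank"], avg_rank)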

Page 4: Graphlab under the hood


GraphLab overview: GraphLab 1.0

Page 5: Graphlab under the hood


GraphLab overview: Distributed GraphLab 1.0

● Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud

– Fault tolerance using a snapshot algorithm

– Improved distributed parallel processing

– Two-stage partitioning (see the sketch after this list):

● Atoms generated by ParMETIS

● Ghosts generated by the intersection of the atoms

– Finalize() function for vertex synchronization
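
A minimal sketch of the two-stage partitioning idea (the data layout and helper name are illustrative, not Distributed GraphLab's actual code): vertices are first assigned to atoms, e.g. by ParMETIS, and every vertex touched by an edge that crosses two atoms is then replicated as a ghost on the other atom:

    from collections import defaultdict

    def build_ghosts(edges, atom_of):
        """edges: iterable of (u, v); atom_of: dict vertex -> atom id."""
        ghosts = defaultdict(set)          # atom id -> ghost vertices
        for u, v in edges:
            if atom_of[u] != atom_of[v]:   # boundary edge between atoms
                ghosts[atom_of[u]].add(v)  # v is ghosted on u's atom
                ghosts[atom_of[v]].add(u)  # u is ghosted on v's atom
        return ghosts

    # Tiny example: two atoms with one boundary edge (b, c).
    edges = [("a", "b"), ("b", "c"), ("c", "d")]
    atom_of = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(dict(build_ghosts(edges, atom_of)))   # {0: {'c'}, 1: {'b'}}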

Page 6: Graphlab under the hood


GraphLab overview: Distributed GraphLab 1.0

Page 7: Graphlab under the hood


GraphLab overview: Distributed GraphLab 1.0

[Figure: ghost vertices replicated between Worker 1 and Worker 2]

Page 8: Graphlab under the hood


PowerGraph: Introduction

● GraphLab 2.1

● Problems of highly skewed power-law graphs:

– Workload imbalance ==> performance degradation

– Limited scalability

– Hard to partition if the graph is too large

– Storage

– Non-parallel computation

Page 9: Graphlab under the hood


PowerGraph: New Abstraction

● Original functions:

– Update

– Finalize

– Fold

– Merge

– Apply: the synchronization Apply

● Introduces the GAS model (see the sketch after this list):

– Gather: in, out, or all neighbors

– Apply: the GAS-model Apply

– Scatter
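
A minimal sketch of the Gather-Apply-Scatter model in plain Python, using PageRank as the example; the names and the synchronous driver below are illustrative assumptions, not PowerGraph's actual C++ vertex-program API:

    def gather(src, edge, dst):
        # Per in-edge; results are combined with a commutative, associative sum.
        return src["rank"] / src["out_degree"]

    def apply(vertex, gather_sum):
        # Once per vertex, with the combined gather result.
        vertex["rank"] = 0.15 + 0.85 * gather_sum

    def scatter(src, edge, dst):
        # Per out-edge; could activate the neighbor if the change is large.
        pass

    def run_superstep(vertices, in_edges_of, out_edges_of):
        # Synchronous sketch: gather everywhere, then apply, then scatter.
        sums = {vid: sum(gather(vertices[s], e, v) for s, e in in_edges_of[vid])
                for vid, v in vertices.items()}
        for vid, v in vertices.items():
            apply(v, sums[vid])
        for vid, v in vertices.items():
            for d, e in out_edges_of[vid]:
                scatter(v, e, vertices[d])

    # Tiny example: edge 1 -> 2.
    V = {1: {"rank": 1.0, "out_degree": 1}, 2: {"rank": 1.0, "out_degree": 0}}
    run_superstep(V, in_edges_of={1: [], 2: [(1, None)]},
                     out_edges_of={1: [(2, None)], 2: []})
    print(V[2]["rank"])   # 0.15 + 0.85 * 1.0 = 1.0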

Page 10: Graphlab under the hood


PowerGraph: Gather

[Figure: the Gather phase across Worker 1 and Worker 2]

Page 11: Graphlab under the hood


PowerGraph: Apply

[Figure: the Apply phase across Worker 1 and Worker 2]

Page 12: Graphlab under the hood


PowerGraph: Scatter

[Figure: the Scatter phase across Worker 1 and Worker 2]

Page 13: Graphlab under the hood


PowerGraph: Vertex Cut

[Figure: an example graph on vertices A-I and its edge list, used to illustrate vertex cuts]

Page 14: Graphlab under the hood


PowerGraph: Vertex Cut

[Figure: the edge list assigned to two machines by a vertex cut; vertices on cut edges are replicated on both machines]

Page 15: Graphlab under the hood


PowerGraph: Vertex Cut (Greedy)

[Figure: the same edge list placed by a greedy vertex cut, which reduces the number of replicated vertices; a sketch of the heuristic follows]
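
A minimal sketch of a greedy edge-placement heuristic in the spirit of PowerGraph's vertex cut (function and variable names are illustrative, and the edge list is only a small example): each edge goes to a machine that already holds replicas of both endpoints, else of either endpoint, else to the least-loaded machine:

    from collections import defaultdict

    def greedy_vertex_cut(edges, num_machines):
        placed_on = defaultdict(set)      # vertex -> machines holding a replica
        load = [0] * num_machines         # number of edges per machine
        assignment = {}                   # edge -> machine

        def least_loaded(candidates):
            return min(candidates, key=lambda m: load[m])

        for u, v in edges:
            both = placed_on[u] & placed_on[v]
            either = placed_on[u] | placed_on[v]
            if both:
                m = least_loaded(both)    # the endpoints already meet somewhere
            elif either:
                m = least_loaded(either)  # reuse an existing replica
            else:
                m = least_loaded(range(num_machines))
            assignment[(u, v)] = m
            placed_on[u].add(m)
            placed_on[v].add(m)
            load[m] += 1
        return assignment, placed_on

    edges = [("A", "B"), ("A", "H"), ("A", "G"), ("B", "H"), ("B", "C"),
             ("C", "H"), ("D", "E"), ("D", "I"), ("E", "F"), ("F", "H"), ("F", "G")]
    assignment, replicas = greedy_vertex_cut(edges, num_machines=2)
    print(sum(len(m) for m in replicas.values()) / len(replicas))  # replication factor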

Page 16: Graphlab under the hood


PowerGraph: Experiment

Page 17: Graphlab under the hood


PowerGraph: Experiment

Page 18: Graphlab under the hood


PowerGraph: Discussion

● Isn't it similar to Pregel's model?

– Partially process a vertex only if a message exists

● Gather, Apply, and Scatter are assumed to be commutative and associative operations. What if the computation is not commutative?

– Sum the message values in a specific order to get the same floating-point rounding error every time (a small demonstration follows).
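
A small Python demonstration (not from the slides) of why the summation order matters: floating-point addition is not associative, so combining the same gathered values in a different order can produce a different rounding error:

    # The grouping of the additions changes the result.
    a, b, c = 1e16, 1.0, -1e16

    left_to_right = (a + b) + c   # 1.0 is absorbed into 1e16, then cancelled -> 0.0
    reordered     = (a + c) + b   # the large terms cancel first              -> 1.0

    print(left_to_right, reordered)   # 0.0 1.0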

Page 19: Graphlab under the hood


PowerGraph and Mizan

● In Mizan we use partial replication:

[Figure: partial replication of vertices a-g across workers W0 and W1; vertex a is replicated as a' on W1; compute phase vs. communication phase]

Page 20: Graphlab under the hood


GraphChi: Introduction

● Asynchronous disk-based version of GraphLab

● Utilizes the Parallel Sliding Windows method

– Very small number of non-sequential disk accesses

● Support for graph updates

– Based on Kineograph, a distributed system for processing a continuous in-flow of graph updates while simultaneously running advanced graph-mining algorithms

Page 21: Graphlab under the hood


GraphChi: Graph Constrains

● The graph does not fit in memory

● A vertex, its edges, and their values fit in memory

Page 22: Graphlab under the hood


GraphChi: Disk storage

● Compressed Sparse Row (CSR):

– Compressed adjacency list with indexes of the edges

– Fast access to a vertex's out-edges

● Compressed Sparse Column (CSC):

– CSR of the transposed graph

– Fast access to a vertex's in-edges

● Shard: stores the edges' data (see the sketch after this list)
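
A minimal illustration of the CSR layout (a simplified in-memory version, not GraphChi's actual on-disk format): one offsets array indexed by vertex and one array of destination vertices, so a vertex's out-edges form a contiguous slice; CSC is the same structure built for the transposed graph and gives the symmetric in-edge lookup:

    # CSR for a 4-vertex graph with edges 0->1, 0->2, 1->2, 3->0.
    # offsets[v] .. offsets[v + 1] delimits v's out-edges in dst.
    offsets = [0, 2, 3, 3, 4]
    dst     = [1, 2, 2, 0]

    def out_neighbors(v):
        return dst[offsets[v]:offsets[v + 1]]

    print(out_neighbors(0))   # [1, 2]
    print(out_neighbors(2))   # []  (vertex 2 has no out-edges)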

Page 23: Graphlab under the hood


GraphChi: Loading the graph

● The input graph's vertices are split into P disjoint intervals, chosen to balance the number of edges; each interval is associated with a shard

● A shard contains the data of the edges whose destination vertices lie in its interval

● The subgraph of an interval is constructed while its shard is read (a rough sketch follows this list)
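
A rough sketch of this preprocessing step (illustrative code, not GraphChi's implementation): interval boundaries are chosen by accumulating in-degrees so that each shard holds roughly the same number of edges, and shard p receives every edge whose destination falls in interval p, sorted by source so it can be read sequentially:

    def build_shards(edges, num_vertices, P):
        per_shard = len(edges) // P + 1
        in_deg = [0] * num_vertices
        for _, d in edges:
            in_deg[d] += 1

        # Choose interval boundaries by accumulating in-degrees.
        boundaries, count = [], 0
        for v in range(num_vertices):
            count += in_deg[v]
            if count >= per_shard and len(boundaries) < P - 1:
                boundaries.append(v)
                count = 0
        boundaries.append(num_vertices - 1)

        def interval_of(dst):
            return next(i for i, b in enumerate(boundaries) if dst <= b)

        shards = [[] for _ in range(P)]
        for s, d in edges:
            shards[interval_of(d)].append((s, d))
        for shard in shards:
            shard.sort()                  # order edges by source vertex
        return boundaries, shards

    # Tiny example: 3 vertices, 4 edges, 2 shards.
    print(build_shards([(0, 1), (1, 2), (2, 0), (2, 1)], num_vertices=3, P=2))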

Page 24: Graphlab under the hood


GraphChi: Parallel Sliding Windows

● The vertices of each interval are processed in parallel

● P sequential disk accesses are required to process one interval

● The lengths of the intervals vary with the graph's degree distribution

● P * P sequential disk accesses are required for one superstep (see the sketch below)
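
A schematic, in-memory sketch of one Parallel Sliding Windows superstep (disk I/O is elided; in GraphChi every block read below would be one contiguous, sequential file read, and the helper names are illustrative):

    def psw_superstep(boundaries, shards, update):
        P = len(shards)
        for p in range(P):                       # execution interval p
            lo = 0 if p == 0 else boundaries[p - 1] + 1
            hi = boundaries[p]

            # "Memory shard": shard p holds all in-edges of interval p.
            in_edges = shards[p]                 # 1 sequential read

            # Sliding windows: from every other shard, take the block of
            # edges whose source lies in interval p (p's out-edges).
            out_edges = [e for q in range(P) if q != p
                           for e in shards[q] if lo <= e[0] <= hi]  # P - 1 reads

            for v in range(lo, hi + 1):          # updated in parallel in GraphChi
                update(v,
                       [e for e in in_edges if e[1] == v],
                       [e for e in out_edges if e[0] == v])

    # Tiny example: 3 vertices split into intervals {0, 1} and {2}.
    psw_superstep(boundaries=[1, 2],
                  shards=[[(0, 1), (2, 0), (2, 1)], [(1, 2)]],
                  update=lambda v, ins, outs: print(v, ins, outs))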

Page 25: Graphlab under the hood


GraphChi: Example

Intervals: (1,2), (3,4), (5,6)

Executing interval (1,2): [figure]

Page 26: Graphlab under the hood


GraphChi: Example

Intervals: (1,2), (3,4), (5,6)

Executing interval (3,4): [figure]

Page 27: Graphlab under the hood


GraphChi: Example

Page 28: Graphlab under the hood


GraphChi: Evolving Graphs

● An added edge is reflected in its interval and shard when they are next read

● Deleting an edge causes that edge to be ignored

● Edge additions and deletions are applied only after the current interval has been processed

Page 29: Graphlab under the hood


GraphChi: Preprocessing

Page 30: Graphlab under the hood


Thank you

Page 31: Graphlab under the hood


thegraphsblog.wordpress.com/

The Blog wants YOU