Differentiated Graph Computation and Partitioning on Skewed Graphs

Rong Chen, JiaXin Shi, Yanzhe Chen, and Haibo Chen

Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University

http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

PowerLyra, 2014

Page 2

Big Data Everywhere

100 hours of video every minute
1.11 billion users
6 billion photos
400 million tweets/day

How do we understand and use Big Data?

Page 3

Big Data, Big Learning

100 hours of video every minute; 1.11 billion users; 6 billion photos; 400 million tweets/day

NLP

Big Learning: machine learning and data mining on Big Data

Page 4

It’s all about the graphs …

Page 5

Example Algorithms

PageRank (Centrality Measures)

$R[i] = \alpha + (1 - \alpha) \sum_{j \to i} \frac{R[j]}{L[j]}$

where α is the random reset probability, L[j] is the number of links on page j, and the sum runs over the pages j that link to page i.

Example with α = 0.15 on the five-vertex graph in the figure (vertices 1 to 5):

$R[1] = 0.15 + 0.85\left(R[3] + \frac{1}{3}R[4] + \frac{1}{2}R[5]\right)$

Iterate until convergence.

Reference: http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
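A minimal Python sketch of the synchronous iteration, assuming α = 0.15 and a hypothetical edge list `EDGES` chosen so that vertex 1's in-neighbors 3, 4, and 5 have 1, 3, and 2 out-links, which reproduces the coefficients of the example (the figure's exact edges are not recoverable):

```python
from collections import defaultdict

# Hypothetical edges (src, dst); out-degrees: L[3]=1, L[4]=3, L[5]=2, L[1]=1, L[2]=1.
EDGES = [(3, 1), (4, 1), (4, 2), (4, 5), (5, 1), (5, 3), (1, 2), (2, 4)]

def pagerank(edges, alpha=0.15, tol=1e-10, max_iters=1000):
    """Synchronous PageRank: R[i] = alpha + (1 - alpha) * sum over j->i of R[j] / L[j]."""
    vertices = {v for e in edges for v in e}
    out_deg = defaultdict(int)    # L[j]: number of links on page j
    in_nbrs = defaultdict(list)   # pages that link to each page
    for src, dst in edges:
        out_deg[src] += 1
        in_nbrs[dst].append(src)
    rank = {v: 1.0 for v in vertices}
    for _ in range(max_iters):
        new = {v: alpha + (1 - alpha) * sum(rank[j] / out_deg[j] for j in in_nbrs[v])
               for v in vertices}
        delta = max(abs(new[v] - rank[v]) for v in vertices)
        rank = new
        if delta < tol:           # iterate until convergence
            break
    return rank

R = pagerank(EDGES)
print({v: round(r, 4) for v, r in sorted(R.items())})
```

At convergence, `R[1]` satisfies the example equation above up to the tolerance.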

Page 6

Background: Graph Algorithms

Iterative Computation
Dependent Data
Local Accesses

[Figure: the five-vertex example graph, shown once for each property]

Graph algorithms are coded as vertex-centric programs that process vertices in parallel and communicate along edges.

"Think as a Vertex" philosophy

Page 7

Think as a Vertex

Algorithm implementation: compute() for each vertex
1. aggregate the values of neighbors
2. update its own value
3. activate neighbors

Example: PageRank

compute(v):
    double sum = 0;
    double value, last = v.get();
    foreach (n in v.in_nbrs)          // 1. aggregate (n.nedges: out-degree of n)
        sum += n.value / n.nedges;
    value = 0.15 + 0.85 * sum;        // 2. update
    v.set(value);
    activate(v.out_nbrs);             // 3. activate
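The three steps map onto a small runnable engine. This Python sketch is an illustration under assumptions, not PowerLyra's engine: it keeps active vertices in a FIFO queue, and it uses the slide's `last` value together with a hypothetical tolerance `EPS` so that a vertex activates its out-neighbors only when its value changed noticeably (activating them unconditionally, as on the slide, would keep every vertex active forever).

```python
from collections import deque

EPS = 1e-9  # hypothetical activation tolerance (not from the slide)

class Vertex:
    def __init__(self, vid):
        self.vid, self.value = vid, 1.0
        self.in_nbrs, self.out_nbrs = [], []

    @property
    def nedges(self):             # out-degree, the slide's n.nedges
        return len(self.out_nbrs)

def build(edges):
    verts = {}
    for s, d in edges:
        src = verts.setdefault(s, Vertex(s))
        dst = verts.setdefault(d, Vertex(d))
        src.out_nbrs.append(dst)
        dst.in_nbrs.append(src)
    return verts

def compute(v, activate):
    last = v.value
    total = sum(n.value / n.nedges for n in v.in_nbrs)  # 1. aggregate neighbors
    v.value = 0.15 + 0.85 * total                       # 2. update itself
    if abs(v.value - last) > EPS:                       # 3. activate neighbors (only on change)
        activate(v.out_nbrs)

def run(verts):
    queue, queued = deque(verts.values()), set(verts)   # every vertex starts active
    def activate(nbrs):
        for n in nbrs:
            if n.vid not in queued:                     # no duplicates in the queue
                queued.add(n.vid)
                queue.append(n)
    while queue:
        v = queue.popleft()
        queued.discard(v.vid)
        compute(v, activate)
    return {vid: v.value for vid, v in verts.items()}

values = run(build([(3, 1), (4, 1), (4, 2), (4, 5), (5, 1), (5, 3), (1, 2), (2, 4)]))
print(values)
```

The edge list is the same hypothetical five-vertex graph used for the synchronous PageRank iteration; the asynchronous schedule reaches the same fixed point.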

Page 8

Graph in Real World

Hallmark property: skewed power-law degree distributions

"most vertices have relatively few neighbors while a few have many neighbors"

[Figure: vertex count vs. degree, marking low-degree and high-degree vertices; a high-degree vertex forms a star-like motif]

Twitter following graph: 1% of the vertices are adjacent to nearly half of the edges.
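To make the skew measurable, the sketch below computes the share of total degree (edge endpoints) held by the top 1% of vertices, a simple proxy for "adjacent to nearly half of the edges". It runs on synthetic data only: the truncated Zipf sampler, `max_deg`, and the seed are assumptions, and nothing here reproduces the Twitter figure.

```python
import random

def top_share(degrees, frac=0.01):
    """Share of the total degree held by the top `frac` of vertices (at least one vertex)."""
    ranked = sorted(degrees, reverse=True)
    k = max(1, int(len(ranked) * frac))
    return sum(ranked[:k]) / sum(ranked)

def zipf_degrees(n, alpha, max_deg=10_000, seed=0):
    """Sample n degrees with P(d) proportional to d**-alpha on 1..max_deg (truncated Zipf)."""
    rng = random.Random(seed)
    support = range(1, max_deg + 1)
    weights = [d ** -alpha for d in support]
    return rng.choices(support, weights=weights, k=n)

if __name__ == "__main__":
    degs = zipf_degrees(100_000, alpha=2.0)
    print(f"top 1% of vertices hold {top_share(degs):.1%} of edge endpoints")
```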

Page 9

Existing Graph Models

Computation model   Graph placement   Comp. pattern   Comm. cost       Dynamic comp.   Load balance
Pregel              edge-cuts         local           ≤ #edge-cuts     no              no
GraphLab            edge-cuts         local           ≤ 2 × #mirrors   yes             no
PowerGraph          vertex-cuts       distributed     ≤ 5 × #mirrors   yes             yes

[Figure: a sample graph with vertices A and B placed under Pregel, GraphLab, and PowerGraph, showing masters and mirrors; annotated ×2 and ×5, matching the communication-cost column]

Page 10

Existing Graph Cuts

[Figure: a six-vertex sample graph partitioned by edge-cut and by vertex-cut (random and greedy), marking masters, mirrors, duplicated edges, flying masters, and load imbalance]
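To make "replicated edges" and "imbalance" concrete, here is a minimal edge-cut sketch. It assumes GraphLab-style ghosting (a cut edge is stored on both endpoints' machines, and each endpoint gets a mirror on the other side) and a hypothetical placement function `owner(v) = v mod p`; neither detail is taken from the slide.

```python
from collections import defaultdict

def edge_cut(edges, p, owner=lambda v, p: v % p):
    """Edge-cut sketch: every vertex lives on owner(v); a cut edge is stored on
    both endpoints' machines and each endpoint gets a mirror on the other side."""
    load = defaultdict(int)                              # machine -> #edges stored
    replicas = {v: {owner(v, p)} for e in edges for v in e}
    dup_edges = 0
    for s, d in edges:
        ms, md = owner(s, p), owner(d, p)
        load[ms] += 1
        if ms != md:
            load[md] += 1                                # replicated edge
            dup_edges += 1
            replicas[s].add(md)                          # mirror of s beside d
            replicas[d].add(ms)                          # mirror of d beside s
    lam = sum(len(r) for r in replicas.values()) / len(replicas)
    return dict(load), dup_edges, lam

# A star (hub 0, leaves 1..6) on 3 machines.
load, dup, lam = edge_cut([(0, i) for i in range(1, 7)], p=3)
print(load, dup, lam)
```

On this star, the hub's machine stores all six edges while the other two store two each, and four of the six edges are duplicated: both problems named on the next slide.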

Page 11

Issues of Graph Partitioning

Twitter follower graph, 48 machines, |V| = 42M, |E| = 1.47B

partition     λ      ingress   runtime
Random        16.0   263       94.2
Coordinated   5.5    391       33.7
Oblivious     12.8   289       75.3
Grid          8.9    138       43.6

Edge-cut: imbalance & replicated edges

Vertex-cut: does not exploit locality
□ Random: high replication factor*
□ Greedy: long ingress time, unfair to low-degree vertices
□ Constrained: imbalance, poor placement of low-degree vertices

*replication factor: $\lambda = \frac{\#\text{replications}}{\#\text{vertices}}$
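The replication factor λ can be computed directly from an edge placement. This sketch assumes a vertex has one replica on every machine that stores at least one of its edges, and it stands in for "random" placement with a hypothetical integer hash `mix` (not PowerGraph's hash function).

```python
from collections import defaultdict

def replication_factor(placement):
    """lambda = #replicas / #vertices for a vertex-cut, where a vertex has one
    replica on every machine that stores at least one of its edges."""
    spans = defaultdict(set)
    for (s, d), m in placement.items():
        spans[s].add(m)
        spans[d].add(m)
    return sum(len(ms) for ms in spans.values()) / len(spans)

def mix(x):
    """Hypothetical 32-bit integer hash, for illustration only."""
    x = ((x ^ (x >> 16)) * 0x45D9F3B) & 0xFFFFFFFF
    return x ^ (x >> 16)

def random_vertex_cut(edges, p):
    """Random vertex-cut: each edge goes to a machine chosen by hashing the edge."""
    return {(s, d): mix(s * 1_000_003 + d) % p for s, d in edges}

if __name__ == "__main__":
    star = [(i, 0) for i in range(1, 1001)]     # one high-degree vertex
    print(replication_factor(random_vertex_cut(star, p=48)))
```

In the star, the hub ends up on (almost) all 48 machines; on a real power-law graph, random placement also scatters the edges of many low-degree vertices, which is what drives λ up.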

Page 12

Principle of PowerLyra

Differentiated Graph Computation and Partitioning

Two challenges are vital to the performance of a distributed computation system:
1. How to make resources locally accessible?
2. How to evenly parallelize workloads?

The two goals conflict, so one size does not fit all:
High-degree vertex → parallelism
Low-degree vertex → locality

Page 13

Computation Model

High-degree vertex
□ Goal: exploit parallelism
□ Follow the GAS ("Gather Apply Scatter") model [PowerGraph OSDI'12]

compute(v):
    double sum = 0;
    double value, last = v.get();
    foreach (n in v.in_nbrs)
        sum += n.value / n.nedges;
    value = 0.15 + 0.85 * sum;
    v.set(value);
    activate(v.out_nbrs);

is split into:

gather(n):
    return n.value / n.nedges;

apply(v, acc):
    value = 0.15 + 0.85 * acc;
    v.set(value);

scatter(v):
    activate(v.out_nbrs);
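A runnable single-machine sketch of the same split. It shows why the decomposition is safe: gather is the only step that touches neighbors, and its results are combined with a commutative, associative sum, so they can later be computed on different machines and added up. The `Vertex` class and `run_gas` driver are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Vertex:
    value: float = 1.0
    nedges: int = 1                       # out-degree, the slide's n.nedges
    in_nbrs: list = field(default_factory=list)
    out_nbrs: list = field(default_factory=list)

def gather(n):                            # once per in-edge; results are summed
    return n.value / n.nedges

def apply(v, acc):                        # once per vertex, on the master
    v.value = 0.15 + 0.85 * acc

def scatter(v, activate):                 # once per out-edge
    for u in v.out_nbrs:
        activate(u)

def run_gas(v, activate):
    """Executes gather -> apply -> scatter for one vertex on a single machine."""
    acc = sum(gather(n) for n in v.in_nbrs)
    apply(v, acc)
    scatter(v, activate)

a, b, c = Vertex(1.0, 1), Vertex(3.0, 3), Vertex(2.0, 2)
x, y = Vertex(), Vertex()
v = Vertex(in_nbrs=[a, b, c], out_nbrs=[x, y])
activated = []
run_gas(v, activated.append)
print(v.value)
```

Each in-neighbor contributes 1.0 here, so the result equals the monolithic compute(v): 0.15 + 0.85 × 3.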

Page 14

Computation Model

High-degree vertex
□ Goal: exploit parallelism
□ Follow the GAS model [PowerGraph OSDI'12]

[Figure: a high-degree vertex H with one master and several mirrors going through Gather (the master and mirrors call gather()), Apply (the master calls apply()), and Scatter (the master and mirrors call scatter()); numbered arrows 1 to 5 mark the messages exchanged between master and mirrors: 1 and 2 in Gather, 3 after Apply, 4 and 5 in Scatter]
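A sketch of the distributed gather for a high-degree vertex, assuming its in-edges are spread over machines as `(src, machine)` pairs. Each replica sums its local edges, mirrors ship one partial sum to the master, and the master applies and pushes the new value back. The message count covers only steps 1 to 3 of the figure, under our reading of the arrows; scatter (steps 4 and 5) is omitted.

```python
from collections import defaultdict

def distributed_gas(in_edges, master, value_of, nedges_of):
    """One gather/apply round for a high-degree vertex whose in-edges are spread
    over machines. in_edges: (src, machine) pairs; master: the master's machine.
    Returns (new_value, messages), where messages counts steps 1-3 only."""
    partial = defaultdict(float)
    for src, m in in_edges:                    # every replica gathers its local edges
        partial[m] += value_of[src] / nedges_of[src]
    mirrors = {m for _, m in in_edges} - {master}
    messages = len(mirrors)                    # (1) master tells each mirror to gather
    messages += len(mirrors)                   # (2) each mirror returns one partial sum
    acc = sum(partial.values())                # master combines the partial sums
    new_value = 0.15 + 0.85 * acc              # (3) apply() on the master ...
    messages += len(mirrors)                   # ... then the new value goes to each mirror
    return new_value, messages

val, msgs = distributed_gas([(1, 0), (2, 1), (3, 1), (4, 2)], master=0,
                            value_of={1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0},
                            nedges_of={1: 1, 2: 2, 3: 3, 4: 4})
print(val, msgs)
```

The result matches a centralized gather, while the cost scales with the number of mirrors rather than the number of in-edges.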

Page 15

Computation Model

Low-degree vertex
□ Goal: exploit locality
□ One-direction locality (avoids replicated edges)
□ Local gather + distributed scatter
□ Comm. cost: ≤ 1 × #mirrors

[Figure: a low-degree vertex L whose master holds all of its in-edges; gather() and apply() run on the master, and scatter() runs on the master and its mirrors]

Observation: most algorithms only gather or scatter in one direction (e.g., PageRank: Gather/IN & Scatter/OUT).
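The per-mirror bounds on these slides (≤ 5 × #mirrors for full GAS, ≤ 1 × #mirrors for a low-degree vertex) can be combined into a per-iteration upper bound. In this arithmetic sketch, the `(degree, #mirrors)` pairs, the threshold `theta`, and the strict `>` test are assumptions; it also keeps mirror counts fixed, whereas in practice the partitioning changes them too.

```python
def message_bound(vertices, theta):
    """Per-iteration message upper bound under differentiated computation.
    vertices: (degree, #mirrors) pairs. High-degree (degree > theta) vertices use
    full GAS (<= 5 messages per mirror); low-degree ones use local gather +
    distributed scatter (<= 1 message per mirror)."""
    return sum((5 if deg > theta else 1) * mirrors for deg, mirrors in vertices)

graph = [(100, 3), (2, 2), (1, 0)]       # hypothetical (degree, #mirrors) pairs
print(message_bound(graph, theta=10))    # differentiated
print(message_bound(graph, theta=-1))    # every vertex treated as high-degree
```

Because most vertices in a skewed graph are low-degree, most mirrors fall under the cheaper bound.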

Page 16

Computation Model

Generality
□ Algorithms that gather or scatter in two directions
□ Adaptive degradation for gathering or scattering
□ Easy to check at runtime without overhead (the user has explicitly defined the access direction in code)

[Figure: a low-degree vertex L under Gather/IN & Scatter/ALL]

Type    Gather     Scatter    Ex.
In      IN/NONE    OUT/NONE   PR
Out     OUT/NONE   IN/NONE    DIA
Other   ANY        ANY        LBP

(PR: PageRank; DIA: Approximate Diameter; LBP: Loopy Belief Propagation)
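Because the directions are declared in the vertex program, the runtime check is a simple table lookup. This sketch transcribes the table into a function; the string constants and the choice to classify NONE/NONE as "In" are assumptions.

```python
IN, OUT, NONE, ALL = "IN", "OUT", "NONE", "ALL"

def algorithm_type(gather, scatter):
    """Classify an algorithm by its declared gather/scatter directions."""
    if gather in (IN, NONE) and scatter in (OUT, NONE):
        return "In"      # in-edges can stay local (e.g., PageRank)
    if gather in (OUT, NONE) and scatter in (IN, NONE):
        return "Out"     # out-edges can stay local (e.g., Approximate Diameter)
    return "Other"       # e.g., LBP: computation degrades adaptively

print(algorithm_type(IN, OUT), algorithm_type(OUT, NONE), algorithm_type(ALL, ALL))
```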

Page 17

Graph Partitioning

1. Lower replication factor
2. One-direction locality
3. Efficiency (ingress/runtime)
4. Balance (#edges)
5. Fewer flying masters

Low-degree vertex
□ Place the one-direction edges (e.g., in-edges) of a vertex on its hash-based machine
□ Simple, but best!

Synthetic regular graph*, 48 machines, |V| = 10M, |E| = 93M

partition     λ     ingress   runtime
Random        11.7  35.5      14.71
Coordinated   4.8   32.0      6.85
Oblivious     8.3   36.4      11.70
Grid          8.4   21.4      7.30
Low-cut       3.9   15.0      2.26

*https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp
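A minimal low-cut sketch with in-edge locality: every edge goes to the machine that owns its target, so all in-edges of a vertex land next to its master. It assumes a hypothetical `owner(v) = v mod p` hash and counts the master's machine as a replica even when the vertex has no local edges there.

```python
from collections import defaultdict

def low_cut(edges, p, owner=lambda v, p: v % p):
    """Low-cut sketch (in-edge locality): each edge goes to the machine owning
    its target, so all in-edges of a vertex are co-located with its master."""
    return {(s, d): owner(d, p) for s, d in edges}

def replication_factor(placement, p, owner=lambda v, p: v % p):
    """Replicas = the master's machine owner(v) plus every machine holding an edge of v."""
    spans = defaultdict(set)
    for (s, d), m in placement.items():
        spans[s].add(m)
        spans[d].add(m)
    for v in spans:
        spans[v].add(owner(v, p))         # master lives on the hash-based machine
    return sum(len(ms) for ms in spans.values()) / len(spans)

edges = [(1, 0), (2, 0), (3, 0), (0, 1)]
placement = low_cut(edges, p=2)
print(placement, replication_factor(placement, p=2))
```

No edge is ever duplicated, and gathering along in-edges never leaves the master's machine.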

Page 18

Graph Partitioning

High-degree vertex
□ Distribute its edges (e.g., in-edges) according to the other endpoint vertex (e.g., the source)
□ Placing all edges of a high-degree vertex introduces at most #machines replicas of that vertex

[Figure: existing vertex-cut, with low-masters, low-mirrors, high-masters, and high-mirrors; the low-degree mirrors it creates are highlighted]

Page 19

Graph Partitioning

High-degree vertex
□ Distribute its edges (e.g., in-edges) according to the other endpoint vertex (e.g., the source)
□ Placing all edges of a high-degree vertex introduces at most #machines replicas of that vertex

[Figure: high-cut, with low-masters, low-mirrors, high-masters, and high-mirrors]
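A high-cut sketch: each in-edge `(src, hub)` of a high-degree vertex is placed on the machine of its source, assuming a hypothetical `owner(v) = v mod p` hash. The hub therefore appears on at most `p` machines, and the sources gain no extra mirrors from these edges.

```python
from collections import defaultdict

def high_cut(in_edges, p, owner=lambda v, p: v % p):
    """High-cut sketch: each in-edge (src, hub) of a high-degree vertex is placed
    on owner(src), the machine of its other endpoint."""
    return {(s, d): owner(s, p) for s, d in in_edges}

def replicas(placement, p, owner=lambda v, p: v % p):
    """Machines holding each vertex: its master owner(v) plus every machine with one of its edges."""
    spans = defaultdict(set)
    for (s, d), m in placement.items():
        spans[s].add(m)
        spans[d].add(m)
    return {v: ms | {owner(v, p)} for v, ms in spans.items()}

hub_edges = [(i, 0) for i in range(1, 21)]   # hub 0 with 20 in-edges
r = replicas(high_cut(hub_edges, p=4), p=4)
print(len(r[0]), max(len(r[i]) for i in range(1, 21)))
```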

Page 20

Graph Partitioning

Hybrid vertex-cut
□ User-defined threshold (θ) and direction of locality
□ Group edges on the hash-based machine of a vertex
□ Low-cut: done! / High-cut: re-assignment

[Figure: a six-vertex example with θ = 3 and IN locality, going through three steps: group, reassign, construct]
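The three steps of hybrid vertex-cut can be sketched for IN locality as follows. The hash `owner(v) = v mod p` and the strict `indeg > theta` test for "high-degree" are assumptions, and the sketch runs in one process rather than on distributed loaders.

```python
from collections import defaultdict

def hybrid_cut(edges, p, theta, owner=lambda v, p: v % p):
    """Hybrid vertex-cut sketch with IN locality.
    1) group: send every edge to the machine owning its target;
    2) count each target's in-degree there;
    3) reassign: edges of targets with in-degree > theta move to owner(source)."""
    groups = defaultdict(list)                  # machine -> edges grouped there
    for s, d in edges:
        groups[owner(d, p)].append((s, d))
    indeg = defaultdict(int)
    for es in groups.values():
        for _, d in es:
            indeg[d] += 1
    placement = {}
    for es in groups.values():
        for s, d in es:
            high = indeg[d] > theta             # assumption: strictly above theta is "high"
            placement[(s, d)] = owner(s, p) if high else owner(d, p)
    return placement

edges = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (2, 1), (3, 1)]
print(hybrid_cut(edges, p=3, theta=3))
```

Vertex 0 (in-degree 5) is high-degree, so its in-edges are spread by source; vertex 1 (in-degree 2) keeps both in-edges on its own machine.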

Page 21

Heuristic for Hybrid-cut

Inspired by heuristics for edge-cut
□ Choose the best master location of a vertex according to where its neighbors have already been located
□ Considering neighbors in one direction is enough
□ Only applied to low-degree vertices
□ Parallel ingress: periodically synchronize the private mapping tables (global vertex-id → machine)
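A streaming sketch of such a heuristic for low-degree vertices. The scoring function (in-neighbors already on a machine minus `gamma` times that machine's load) is a hypothetical Fennel-style choice, not PowerLyra's exact formula, and `location` plays the role of a single, unsynchronized mapping table.

```python
from collections import Counter

def choose_master(in_nbrs, location, load, gamma=0.5):
    """score(m) = #in-neighbors already placed on m - gamma * load[m]; ties -> lowest m.
    The scoring function is a hypothetical Fennel-style choice."""
    placed = Counter(location[n] for n in in_nbrs if n in location)
    return max(range(len(load)), key=lambda m: (placed[m] - gamma * load[m], -m))

def stream_place(order, in_nbrs_of, p, gamma=0.5):
    """Place low-degree vertices one by one; location is the vertex-id -> machine table."""
    location, load = {}, [0] * p
    for v in order:
        nbrs = in_nbrs_of.get(v, [])
        m = choose_master(nbrs, location, load, gamma)
        location[v] = m
        load[m] += 1 + len(nbrs)          # the vertex plus its in-edges
    return location

print(stream_place([1, 2, 3, 4], {4: [1, 3]}, p=3))
```

The load term keeps machines balanced when a vertex has no placed neighbors, while the neighbor count pulls a vertex toward the machine that already holds its in-neighbors.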

Page 22

Optimization

Challenge: graph computation usually exhibits poor data access (cache) locality*
□ irregular traversal of neighboring vertices along edges

How about (cache) locality in communication?
□ Problem: a mismatch of orders between sender & receiver

[Figure: a sender and a receiver holding the same vertices in different orders]

*Lumsdaine et al. Challenges in parallel graph processing. 2007
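One way to quantify the mismatch is to ask how often consecutive incoming updates land in increasing slots of the receiver's array. The metric below is a hypothetical proxy for cache-friendly access, not one defined in the slides.

```python
def sequential_fraction(sender_order, receiver_layout):
    """Fraction of consecutive messages that land in increasing receiver slots.
    sender_order: vertex ids in the order updates are sent;
    receiver_layout: vertex ids in the order the receiver stores them."""
    slot = {v: i for i, v in enumerate(receiver_layout)}
    pos = [slot[v] for v in sender_order]
    if len(pos) < 2:
        return 1.0
    forward = sum(1 for a, b in zip(pos, pos[1:]) if b > a)
    return forward / (len(pos) - 1)

print(sequential_fraction([4, 1, 7, 2, 5, 3], [1, 2, 3, 4, 5, 7]))
```

When sender and receiver agree on the order, every update is a forward step (fraction 1.0); a random order drops the fraction toward one half.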

Page 23

Locality-conscious Layout

General idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized matching

Zoning: the vertices on each machine (M1, M2, M3) are split into four zones: high-masters (Z1), low-masters (Z2), high-mirrors (Z3), and low-mirrors (Z4).

[Figure: per-machine vertex arrays, labelled by global vertex-id, before and after zoning]

Page 24

Locality-conscious Layout

General idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized algorithm

Grouping: within the mirror zones, mirrors are grouped by the machine that holds their master (e.g., on M1: H1 L1 h2 h3 l2 l3).

[Figure: per-machine vertex arrays, labelled by global vertex-id, before and after grouping]

Page 25

Locality-conscious Layout

General idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized algorithm

Sorting: the vertices within each zone and group are sorted by global vertex-id.

[Figure: per-machine vertex arrays before and after sorting]

Page 26

Locality-conscious Layout

General idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized algorithm

Rolling: the mirror groups on each machine are placed in a rolling order starting from the next machine (M1: H1 L1 h2 h3 l2 l3; M2: H2 L2 h3 h1 l3 l1; M3: H3 L3 h1 h2 l1 l2).

[Figure: per-machine vertex arrays before and after rolling]
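The four steps fit into one ordering function per machine. This is a sketch of our reading of the figures, with machines numbered from 0 (so M2 is machine 1) and replicas given as hypothetical `(vid, is_high, master_machine)` tuples. Because masters and their mirrors are both ordered "high first, then low, each sorted by vertex-id", a sender's updates arrive in the receiver's own layout order, which needs no coordination beyond the global vertex-ids.

```python
def layout(machine, replicas, p):
    """Order the replicas held by `machine` (machines numbered 0..p-1).
    replicas: list of (vid, is_high, master_machine).
    Zoning:   high-masters, low-masters, high-mirrors, low-mirrors.
    Grouping: mirrors are grouped by the machine holding their master.
    Sorting:  each run is sorted by global vertex-id.
    Rolling:  mirror groups start at machine + 1 and wrap around."""
    def run(is_high, master):
        return sorted(v for v, h, m in replicas if h == is_high and m == master)
    rolled = [(machine + k) % p for k in range(1, p)]
    order = run(True, machine) + run(False, machine)   # Z1, Z2
    for is_high in (True, False):                        # Z3, Z4
        for m in rolled:
            order += run(is_high, m)
    return order

m0 = [(6, True, 0), (4, False, 0), (2, False, 0), (9, False, 1)]
m1 = [(7, True, 1), (5, False, 1), (3, False, 1), (8, True, 2),
      (6, True, 0), (1, False, 2), (4, False, 0), (2, False, 0)]
print(layout(0, m0, 3), layout(1, m1, 3))
```

On machine 1 (M2), the mirror groups come from machine 2 before machine 0, matching the "h3 h1 l3 l1" pattern in the figure.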

Page 27

Evaluation

Experiment setup
□ 48-node EC2-like cluster (each node: 4 cores, 12GB RAM, 1GigE NIC)
□ Graph algorithms
  − PageRank
  − Approximate Diameter
  − Connected Components
□ Data sets
  − 5 real-world graphs
  − 5 synthetic power-law graphs*

*Varying α with a fixed 10 million vertices (smaller α produces denser graphs)
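Why a smaller α gives a denser graph follows from the mean of the degree distribution. This sketch assumes degrees follow a Zipf law P(d) ∝ d^(-α) truncated at a hypothetical `max_deg`; it is not the generator used in the evaluation, and the printed edge counts describe only this toy model.

```python
def expected_degree(alpha, max_deg=10_000):
    """Mean of a truncated Zipf degree distribution P(d) ~ d**-alpha, d = 1..max_deg."""
    weights = [d ** -alpha for d in range(1, max_deg + 1)]
    return sum(d * w for d, w in enumerate(weights, start=1)) / sum(weights)

if __name__ == "__main__":
    n = 10_000_000   # vertices, as in the synthetic graphs above
    for alpha in (1.8, 1.9, 2.0, 2.1, 2.2):
        print(f"alpha={alpha}: ~{n * expected_degree(alpha) / 1e6:.0f}M edges (toy model)")
```

A heavier tail (smaller α) puts more probability mass on large degrees, which raises the mean degree and hence the edge count for a fixed number of vertices.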

Page 28

Runtime Speedup

48 machines; baseline: PowerGraph + Grid (default)

PageRank (Gather: IN / Scatter: OUT)

Real-world graphs: Hybrid 1.40X ~ 2.05X; Ginger 1.97X ~ 5.53X
Power-law graphs: Hybrid 2.02X ~ 2.96X; Ginger 2.17X ~ 3.26X

Page 29

Runtime Speedup

48 machines; baseline: PowerGraph + Grid (default)

Connected Components (Gather: NONE / Scatter: ALL)
Approximate Diameter (Gather: OUT / Scatter: NONE)

Hybrid: 1.93X ~ 2.48X; Ginger: 1.97X ~ 3.15X
Hybrid: 1.44X ~ 1.88X; Ginger: 1.50X ~ 2.07X

Page 30

Communication Cost

[Figure: % communication data size for grid, coordinated, hybrid, and ginger partitioning. Left: power-law graphs vs. power-law constant α (1.8 to 2.2); right: real-world graphs (Twitter, UK, Wiki, LJ, GWeb). Annotations: 394MB, 170MB, 188MB; 79.4%]

Page 31

Effectiveness of Hybrid: Hybrid Graph Partitioning

[Figure panels: Power-law (48 machines), Real-world (48 machines), Ingress Time, Scalability (Twitter)]

Page 32

Effectiveness of Hybrid: Hybrid Graph Computation

[Figure: one-iteration communication (MB) and execution time (sec) vs. power-law constant α (1.8 to 2.2) for PG+Hybrid, PG+Ginger, and PL+Hybrid (PG: PowerGraph; PL: PowerLyra)]

Page 33

Scalability

[Figure: execution time (sec) for grid, oblivious, coordinated, hybrid, and ginger partitioning in two panels: increasing number of machines (8, 16, 24, 48) and increasing data size (#vertices, millions)]

Page 34

Conclusion

PowerLyra
□ a new hybrid graph analytics engine that embraces the best of both worlds of existing frameworks
□ an efficient hybrid graph partitioning algorithm that adopts different heuristics for different vertices
□ outperforms PowerGraph with its default partitioning by up to 5.53X and 3.26X for real-world and synthetic graphs, respectively

http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

Page 35

Questions

Thanks

PowerLyra
http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

Institute of Parallel and Distributed Systems
http://ipads.se.sjtu.edu.cn