Top Banner
Paragon: Parallel Architecture-Aware Graph Partition Refinement Algorithm Angen Zheng, Alexandros Labrinidis, Patrick Pisciuneri, Panos K. Chrysanthis, and Peyman Givi University of Pittsburgh 1
52

Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Apr 08, 2018

Download

Documents

buidang
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Parallel Architecture-Aware Graph Partition Refinement Algorithm

Angen Zheng, Alexandros Labrinidis, Patrick Pisciuneri, Panos K. Chrysanthis, and Peyman Givi

University of Pittsburgh

1

Page 2: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Importance of Graph Partitioning

Applications of Graph Partitioning

o Scientific Simulations

o Distributed Graph Computation

o Pregel, Hama, Giraph

o VLSI Design

o Task Scheduling

o Linear Programming

2

Page 3: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Target Workloads

3

★ Vertex○ a unique identifier

○ a modifiable, user-defined value

★ Edge○ a modifiable, user-defined value

○ a target vertex identifier

UD

F

UD

F

UD

F

UD

F

Balanced load distribution!!!

Minimizing the comm cost!!!

★ Vertex-Centric UDF

○ Change vertex/edge state

○ Send msg to neighbours

○ Receive msg from neighbors

○ Mutate the graph topology

○ Deactivate at end of the superstep

○ Reactivate by external msgs

Page 4: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

A Balanced Partitioning = Even Load Distribution

N3

N1

N2

4

Page 5: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Minimal Edge-Cut = Minimal Data Comm

5

N3

N1

N2

Minimal Data Comm ≠ Minimal Comm Cost

Page 6: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Roadmap

6

# of slides

Introduction ✔

Heterogeneity

State of the Art

PARAGON

Contention

Experiments

% of

audience

asleep

Page 7: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Nonuniform Inter-Node Network Comm Cost

Comm costs vary a lot as their locations change!7

Page 8: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Nonuniform Intra-Node Network Comm Cost

Cores sharing more cache levels communicate faster!

8

Page 9: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Inter-Node Comm Cost > Intra-Node Comm Cost

Network(Ethernet, IPoIB)

Node#1 Node#2

9

Page 10: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Minimal Edge-Cut = Minimal Data Comm ≠ Minimal Comm Cost

N1 N2 N3

N1 1 6

N2 1 1

N3 6 1

N3N1

N2

• 3 edge-cut• 3 unit data comm• 8 unit comm cost (8 = 1 * 6 + 2 * 1)

10

Page 11: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Minimal Edge-Cut = Minimal Data Comm ≠ Minimal Comm Cost

N1 N2 N3

N1 1 6

N2 1 1

N3 6 1

11

• 4 edge-cut• 4 unit data comm• 4 unit comm cost (4 = 1 * 1+ 3 * 1)

N3N1

N2

Group neighbouring vertices as close as possible!

Page 12: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Roadmap

12

# of slides

Introduction ✔

Heterogeneity✔

State of the Art

PARAGON

Contention

Experiments

% of

audience

asleep

Page 13: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Overview of the State-of-the-Art

13

Balanced Graph (Re)Partitioning

Partitioners

(static graphs)Repartitioners

(dynamic graphs)

Metis ICA3PP’08 SoCC’12 TKDE’15 BigData’15 DG/LDG

Fennel

Offline Methods

(High Quality)

(Poor Scalability)

Online Methods(Moderate Quality)

(High Scalability)

Parmetis Aragon

Offline Methods(High Quality)

(Poor Scalability)

Online Methods(Moderate~High Quality)

(High Scalability)

Heterogeneity-Aware

CatchWParagon xdgp Mizan

Heterogeneity-Aware

LogGPHermes

Page 14: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Our Prior Work: Aragon

A sequential architecture-aware graph partition refinement algorithm.o Input:

• A partitioned graph• The relative network comm cost matrix

o Output:• A partitioning with improved mapping of the comm

pattern to the underlying hardware topology.

14

[1]. Angen Zheng, Alexandors Labrinidis, and Panos K. Chrysanthis. Archiitecture-Aware Graph Repartitioning for Data-Intensive Scientific

Computing. BigGraphs, 2014

Page 15: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Our Prior Work: AragonN1

N2

N3

N4

N5

N6

N7

N8

N9

G

P1

P2

P3

P4

P5

P6

P7

P8

P9

P1

P2

P3

P4

P9

P8

P7

P6

Heterogeneity-Aware Refinement(More details in the paper)

Aragon N5 can hold entire graph

in memory

Prefer to work in offline

mode

15

Page 16: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Roadmap

16

# of slides

Introduction ✔

Heterogeneity✔

State of the Art ✔

PARAGON

Contention

Experiments

% of

audience

asleep

Page 17: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Overview:

o Parallel Architecture-Aware Graph Partition Refinement Algorithm

Goal:

o Group neighbouring vertices as close as possible

Paragon

17

Paragon vs Aragon○ lower overhead

○ scale to much larger graphs

Page 18: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Partition Grouping

P1P2P3

P4P6P9

P5P7P8

18

P1

P2

P3

P4

P5

P6

P7

P8

P9

N1

N2

N3

N4

N5

N6

N7

N8

N9

Page 19: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Group Server Selection

19

P1

P2

P3

P4

P5

P6

P7

P8

P9

N1

N2

N3

N4

N5

N6

N7

N8

N9

N9

N8

N2P1P2P3

P4P6P9

P5P7P8

Page 20: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Sending “Partition” to Group Servers

N1

N2

N3

N4

N5

N6

N7

N8

N9

P1

P2

P3

P4

P5

P6

P7

P8

P9

P1

P3

P4

P6

P5

P7

20

Only send boundary vertices

N9

N8

N2P1P2P3

P4P6P9

P5P7P8

Page 21: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Parallel Refinement

Aragon

P1

P2

P3

P4

P5

P6

P7

P8

P9

P1

P3

P4

P6

P5

P7

Aragon

Aragon

N2

N3

N4

N5

N6

N7

N8

N9

N1

21

N9

N8

N2P1P2P3

P4P6P9

P5P7P8

# of Groups○ Degree of Parallelism

Page 22: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Parallel Refinement

Aragon

P1

P2

P3

P4

P5

P6

P7

P8

P9

P1

P3

P4

P6

P5

P7

Aragon

Aragon

N2

N3

N4

N5

N6

N7

N8

N9

N1

22

N9

N8

N2P1P2P3

P4P6P9

P5P7P8

# of Groups○ Degree of Parallelism○ Parallelism vs Quality

36

16

96

Page 23: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Shuffle Refinement

N2:

N9:

N8:

P1P2P4

P5P7P9

P3P6P8

Swap

Aragon

Aragon

Aragon

Parallel

P1P2P3

P4P6P9

P5P7P8

23

Repeat k times

To increase the # of partition pairs being refined!

Page 24: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Roadmap

24

# of slides

Introduction ✔

Heterogeneity✔

State of the Art ✔

PARAGON ✔

Contention

Experiments

% of

audience

asleep

Page 25: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Inter-Node Comm Cost ? Intra-Node Comm Cost

Network

Node#1 Node#2

RDMA-enabled

25

Page 26: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Inter-Node Comm Cost≅ Intra-Node Comm Cost

[2]. C. Binnig, U. Çetintemel, A. Crotty, A. Galakatos, T. Kraska, E. Zamanian, and S. B. Zdonik. The End of Slow Networks: It’sTime for a

Redesign. CoRR, 2015

★ Dual-socket Xeon E5v2 server with

○ DDR3-1600

○ 2 FDR 4x NICs per socket

Revisit the Impact of Memory Subsystem Carefully!

★ Infiniband: 1.7GB/s~37.5GB/s

★ DDR3:

6.25GB/s~16.6GB/s

26

Page 27: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Intra-Node Shared Resource Contention

Send Buffer

Sending Core Receiving Core

Receive BufferShared Buffer

1. Load3. Load2b. Write

2a. Load 4a. Load

4b. Write

27

Page 28: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Cached Send/Shared/Receive Buffer

Intra-Node Shared Resource Contention

Multiple copies of the same data in LLC,

contending for LLC and MC

28

Page 29: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Intra-Node Shared Resource Contention

Cached Send/Shared Buffer Cached Receive/Shared Buffer

29

Multiple copies of the same data in LLC,

contending for LLC, MC, and QPI.

Page 30: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Avoiding Contention

Intra-Node Network Comm Cost

Maximal Inter-Node Network Comm Cost

Degree of Contention

30

(Small HPC Clusters)

(Cloud/Large Clusters)

Page 31: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon: Avoiding Contention

Send

Buffer

Sending Core

Node#1

IB

HCA

Receive

Buffer

Receiving Core

Node#2

IB

HCA

31

Page 32: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Roadmap

32

# of slides

Introduction ✔

Heterogeneity✔

State of the Art ✔

PARAGON ✔

Contention ✔

Experiments

% of

audience

asleep

Page 33: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Evaluation

MicroBenchmarks

o Degree of Refinement Parallelism

o Varying Shuffle Refinement Times

o Varying Initial Partitioners

Real-World Workloads

o Breadth First Search (BFS)

o Single Source Shortest Path (SSSP)

Billion-Edge Graph Scaling

33

Page 34: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Evaluation

MicroBenchmarks

o Degree of Refinement Parallelism

o Varying Shuffle Refinement Times

o Varying Initial Partitioners

Real-World Workloads

o Breadth First Search (BFS)

o Single Source Shortest Path (SSSP)

Billion-Edge Graph Scaling

34

Page 35: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Degree of Refinement Parallelism: Refinement Time

35

Aragon

★ com-lj: |V|=4M, |E| = 69M

★ 40 partitions: two 20-core machines

★ Initial Partitioner: DG (deterministic greedy)

★ # of Shuffle Times: 0

Page 36: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Degree of Refinement Parallelism: Partitioning Quality

36

★ com-lj: |V|=4M, |E| = 69M

★ 40 partitions: two 20-core machines

★ Initial Partitioner: DG (deterministic greedy)

★ # of Shuffle Times: 0

Page 37: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Evaluation

MicroBenchmarks

o Degree of Refinement Parallelism

o Varying Shuffle Refinement Times

o Varying Initial Partitioners

Real-World Workloads

o Breadth First Search (BFS)

o Single Source Shortest Path (SSSP)

Billion-Edge Graph Scaling

37

Page 38: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Varying Shuffle Refinement Times

38

★ com-lj: |V|=4M, |E| = 69M

★ 40 partitions: two 20-core machines

★ Initial Partitioner: DG (deterministic greedy)

★ Deg. of Parallelism: 8

# of shuffle refinement times > 10○ Paragon had lower refinement overhead

■ 8~10s vs 33s (Paragon vs Aragon)

○ Paragon produce better decompositions

■ 0~2.6% (Paragon vs Aragon)

Page 39: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Evaluation

MicroBenchmarks

o Degree of Refinement Parallelism

o Varying Shuffle Refinement Times

o Varying Initial Partitioners

Real-World Workloads

o Breadth First Search (BFS)

o Single Source Shortest Path (SSSP)

Billion-Edge Graph Scaling

39

Page 40: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Varying Initial Partitioners

40

Dataset 12 datasets from various areas

# of Parts 40 (two 20-core machines)

Initial Partitioner HP/DG/LDG

Deg. of Parallelism 8

# of Refinement Times 8

HP: Hashing Partitioning

DG: Deterministic Greedy Partitioning

LDG: Linear Deterministic Greedy Partitioning

Page 41: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Impact of Varying Initial Partitioners: Partitioning Quality

Improv. Max Avg.

HP 58% 43%

DG 29% 17%

LDG 53% 36%

41

Page 42: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Evaluation

MicroBenchmarks

o Degree of Refinement Parallelism

o Varying Shuffle Refinement Times

o Varying Initial Partitioners

Real-World Workloads

o Breadth First Search (BFS)

o Single Source Shortest Path (SSSP)

Billion-Edge Graph Scaling

42

Page 43: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

BFS: Platform

• PittMPICluster: 32 nodes connected via a single switch with 56Gbps FDR Infiniband

• Gordon Supercomputer: 4x4x4 3D torus of switches connected via QDR Infiniband with 16 compute nodes attached to each switch (with 8Gbps link bandwidth)

43Bottleneck Memory (𝛌=1) Network (𝛌=0)

Page 44: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Paragon xdgp Mizan

BFS: Competitors

44

uniParagon

Initial Partitioner: DG

Balanced Graph (Re)Partitioning

Partitioners

(static graphs)Repartitioners

(dynamic graphs)

Metis ICA3PP’08 SoCC’12 TKDE’15 BigData’15 DG/LDG

Fennel

Offline Methods

(High Quality)

(Poor Scalability)

Online Methods(Moderate Quality)

(High Scalability)

Parmetis Aragon

Offline Methods(High Quality)

(Poor Scalability)

Online Methods(Moderate~High Quality)

(High Scalability)

CatchW LogGPHermes

Page 45: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

BFS on PittMPICluser (𝛌=1): Exec. Time

★ as-skitter: |V|=1.6M, |E| = 22M

★ 60 partitions: three 20-core machines

★ deg. of parallelism: 8

★ # of shuffle refinement times: 8

5.9x

6.7x

5.9x

2.7x

45

Page 46: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

BFS on PittMPICluser (𝛌=1): Comm Vol

46

★ as-skitter: |V|=1.6M, |E| = 22M

★ 60 partitions: three 20-core machines

★ deg. of parallelism: 8

★ # of shuffle refinement times: 8

Reduction Intra-

Socket

Inter-

Socket

DG 62% 55%

METIS 53% 55%

PARMETIS 15% 17%

uniPARAGON 62% 39%

Page 47: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

2.5x

1.5x50%

38%

BFS on Gordon (𝛌=0)

47

★ as-skitter: |V|=1.6M, |E| = 22M

★ 48 partitions: three 16-core machines

★ deg. of parallelism: 8

★ # of shuffle refinement times: 8

Page 48: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Evaluation

MicroBenchmarks

o Degree of Refinement Parallelism

o Varying Shuffle Refinement Times

o Varying Initial Partitioners

Real-World Workloads

o Breadth First Search (BFS)

o Single Source Shortest Path (BFS)

Billion-Edge Graph Scaling

48

Page 49: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Billion-Edge Graph Scaling: BFS Exec. Time

49

★ friendster: |V|=124M, |E| = 3.6B

★ 60 partitions: three 20-core machines

★ deg. of parallelism: 10

★ # of shuffle refinement times: 10

★ 1.65x

★ 60 cores

Page 50: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Billion-Edge Graph Scaling: Refine Time

★ friendster: |V|=124M, |E| = 3.6B

★ 60 partitions: three 20-core machines

★ deg. of parallelism: 10

★ # of shuffle refinement times: 10

50

1.36x

Page 51: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Conclusions

Introduced PARAGONo Parallel Architecture-Aware

Graph Partition Refinement Algorithm

PARAGON addresses:

o Scalability

o Communication Heterogeneity

o Shared Resource Contention

Extensive experimental study:

o Achieved up to 6.7x speedups on real-world workloads

o Scaled up to 3.6 billion edges

51

Acknowledgments:

• Jack Lange

• Albert DeFusco

• Kim Wong

• Mark Silvis

Funding:

• NSF OIA-1028162

• NSF CBET-1250171

Page 52: Paragon: Parallel Architecture-Aware Graph Partition ...people.cs.pitt.edu/~anz28/papers/paragon.edbt16.slides.pdf · Paragon: Parallel Architecture-Aware Graph Partition Refinement

Thank You!

Email: [email protected] or [email protected]: http://people.cs.pitt.edu/~anz28/ADMT: http://db.cs.pitt.edu/group/

52