Chronos: A Graph Engine for Temporal Graph Analysis
Wentao Han 1,3, Youshan Miao 2,3, Kaiwei Li 1,3, Ming Wu 3, Fan Yang 3, Lidong Zhou 3, Vijayan Prabhakaran 3, Wenguang Chen 1, Enhong Chen 2
1 Tsinghua University, 2 University of Science and Technology of China, 3 Microsoft Research
• Temporal graph analysis
  • An emerging class of applications
• Chronos
  • Supports analysis of temporal graphs efficiently
• Joint design of data layout and scheduling
  • Leveraging the temporal similarity of graphs
  • Exploiting data locality, esp. in the time dimension
Thank You!
Questions?
Tsinghua University
University of Science and Technology of China
Microsoft Research
BACKUP
• Experiment Environment Details
• Real Graphs Similarities over Time
• Batch Size Discussion
• LABS Locking
• LABS with Incremental Computation
• LABS on Cluster
• Related Work
Experiment Setup
CPU: 2.4GHz Intel Xeon E5-2665, 16-core
RAM: 128GB
Disk: 1TB SSD (RAID 0 with 372GB × 3) [1]
Network: InfiniBand (DDR, 40Gb/s)
Cluster size: 4
[1] SSD model: TOSHIBA MK4001GRZB
Temporal Distributions of Graphs
• Edges increase gradually
[Figure: two panels, wiki and weibo. Each plots the number of edges (0% to 100%) against the ratio of the time range (6% to 100%).]
On-disk Temporal Graph
• Ci: checkpoint of vi (edges without time information)
• ai,j: j-th activity of vi (edge changes, e.g., <addE, (v0, v3, w), t2>)
[Figure: on-disk layout. A time index locates the snapshot groups (Snapshot Group 0, Snapshot Group 1, ...). Inside a snapshot group, a vertex index points to the edge data of each vertex: checkpoint C0 followed by the edge activities a0,1 ... a0,t of v0, then C1 followed by the edge activities a1,1 ... a1,t of v1, and so on.]
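The checkpoint-plus-activities idea can be sketched in a few lines of Python. This is only an illustration, not the Chronos file format: the names `SnapshotGroup` and `edges_at` are hypothetical, the `delE` activity type is an assumption (the slide shows only `addE`), and edge weights are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotGroup:
    start_time: int
    # checkpoint[v]: out-neighbors of v at start_time (no time information).
    checkpoint: dict = field(default_factory=dict)
    # activities[v]: time-ordered list of (op, (src, dst), time).
    activities: dict = field(default_factory=dict)

    def edges_at(self, v, t):
        """Out-neighbors of v at time t: checkpoint plus activities up to t."""
        edges = set(self.checkpoint.get(v, ()))
        for op, (_, dst), when in self.activities.get(v, ()):
            if when > t:
                break  # activities are time-ordered, nothing later applies
            if op == "addE":
                edges.add(dst)
            elif op == "delE":  # assumed counterpart of addE
                edges.discard(dst)
        return edges


group = SnapshotGroup(
    start_time=0,
    checkpoint={"v0": {"v1"}},
    activities={"v0": [("addE", ("v0", "v3"), 2), ("delE", ("v0", "v1"), 5)]},
)
```

Here `group.edges_at("v0", 2)` replays only the first activity on top of the checkpoint, so a query never has to read the log from the beginning of time.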
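LABS: In-memory Design
[Figure: in-memory layout. A vertex index points into the edge array, which holds the edges of each vertex. Each temporal edge, e.g., (v1)→v2 110 or (v1)→v3 111, carries bits that indicate which snapshots the edge exists in. A second vertex index points into the vertex data array, where the data of v1 for all snapshots (v1, v1', v1'') is followed by the data of v2 (v2, v2', v2''). One temporal edge is logically equal to the per-snapshot edges v1→v2, v1'→v2', v1''→v2''.]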
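A small Python sketch of how one pass over such temporal edges could update all snapshots of a batch together. The function name `propagate`, the push-style update, and the bit order (bit i stands for snapshot i, lowest bit first) are assumptions made for illustration; the slide does not specify them.

```python
def propagate(edges, vertex_data, batch_size):
    """One push-style pass over temporal edges.

    edges: list of (src, dst, bitmap); bit i set means the edge exists
           in snapshot i (assumed bit order).
    vertex_data: dict vertex -> list of per-snapshot values, kept together.
    Returns a dict with the same shape holding the accumulated values.
    """
    result = {v: [0.0] * batch_size for v in vertex_data}
    for src, dst, bitmap in edges:
        src_vals = vertex_data[src]  # all snapshots of src, read together
        dst_vals = result[dst]       # all snapshots of dst, written together
        for i in range(batch_size):
            if (bitmap >> i) & 1:
                dst_vals[i] += src_vals[i]
    return result


edges = [("v1", "v2", 0b110), ("v1", "v3", 0b111)]
data = {"v1": [1.0, 2.0, 3.0], "v2": [0.0] * 3, "v3": [0.0] * 3}
out = propagate(edges, data, batch_size=3)
```

Each edge is visited once for the whole batch, and the values of its endpoints for all snapshots sit next to each other, which is the locality the layout is after.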
Temporal Graph Re-construction
• User input time points: 0, 10, 20
• Scan the graph activity log [Type, Endpoints, Time]:
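A single scan of this kind might look like the following Python sketch. The log entries are made up for illustration, the `delE` type is an assumption, and the output format (one bitmap per edge, bit i set when the edge exists at the i-th requested time point) is one possible choice rather than the engine's documented behavior.

```python
def build_temporal_edges(log, time_points):
    """Scan a log of (type, (src, dst), time) once and return
    {(src, dst): bitmap}, with bit i set if the edge exists at time_points[i]."""
    bitmaps = {}
    live = set()
    entries = iter(sorted(log, key=lambda entry: entry[2]))
    pending = next(entries, None)
    for i, t in enumerate(sorted(time_points)):
        # Apply every activity that happened up to and including time t.
        while pending is not None and pending[2] <= t:
            op, edge, _ = pending
            if op == "addE":
                live.add(edge)
            elif op == "delE":  # assumed counterpart of addE
                live.discard(edge)
            pending = next(entries, None)
        # Record which edges are alive in snapshot i.
        for edge in live:
            bitmaps[edge] = bitmaps.get(edge, 0) | (1 << i)
    return bitmaps


log = [
    ("addE", ("v1", "v3"), 0),
    ("addE", ("v1", "v2"), 5),
    ("delE", ("v1", "v3"), 15),
]
bitmaps = build_temporal_edges(log, [0, 10, 20])
```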
• Possible to further reduce cache misses / inter-core comm.
• Cons
  • Bit-width limit of the instruction _BitScanForward64
  • Less snapshot similarity within a batch
  • No more cache misses / inter-core comm. to reduce
  • False sharing with locking
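`_BitScanForward64` is a compiler intrinsic that returns the index of the lowest set bit in a 64-bit mask, so a single word can describe at most 64 snapshots. The loop below imitates that scan in Python to show how set bits would be enumerated; the function name `snapshots_of` is hypothetical, and the explicit range check stands in for the hardware word size.

```python
def snapshots_of(bitmap):
    """Yield the indices of the set bits of a 64-bit mask, lowest first."""
    if not 0 <= bitmap < (1 << 64):
        raise ValueError("one 64-bit word holds at most 64 snapshots")
    while bitmap:
        lowest = bitmap & -bitmap      # isolate the lowest set bit
        yield lowest.bit_length() - 1  # index of that bit
        bitmap ^= lowest               # clear it and continue


present = list(snapshots_of(0b110))
```

A batch larger than 64 snapshots would need several words per edge, which is one reason the batch size cannot grow without cost.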
Compute Snapshot by Snapshot (another way)
• Snapshot-Parallelism
[Figure: three vertex data arrays, one per snapshot: Snapshot 1 (v1, v2, v3), Snapshot 2 (v1', v2', v3'), Snapshot 3 (v1'', v2'', v3''). The work is spread over Core 0, Core 1, and Core 2, with cache misses and inter-core communication marked.]
⇒ 3 cache misses
⇒ 3 inter-core comm.
Parallelization -- Summary

                             Snapshot by snapshot                           LABS
                             Partition-Parallelism   Snapshot-Parallelism   LABS-Parallelism
Cache Miss                   More                    More                   Less
Inter-core Communications    More                    No                     Less

Good partitioning: Num. of intra-partition edges > Num. of inter-partition edges
• Partition-Parallelism: computing partitions of the same snapshot in parallel
• Snapshot-Parallelism: computing snapshots in parallel
• LABS-Parallelism: computing LABS-batched partitions in parallel
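The "3 cache misses" count for snapshot-by-snapshot processing can be reproduced with a toy address model. Everything here is an assumption chosen for illustration: 64-byte cache lines, 8-byte vertex values, arrays laid out back to back from address 0, and a cold cache; the helper names are hypothetical.

```python
LINE = 64   # assumed cache-line size in bytes
VALUE = 8   # assumed size of one vertex value in bytes


def lines_touched(addresses):
    """Number of distinct cache lines covering the given byte addresses."""
    return len({addr // LINE for addr in addresses})


def separate_arrays(vertex, snapshots, num_vertices):
    # One vertex data array per snapshot, stored one after another.
    return [(s * num_vertices + vertex) * VALUE for s in range(snapshots)]


def labs_layout(vertex, snapshots):
    # All snapshots of a vertex stored next to each other.
    return [(vertex * snapshots + s) * VALUE for s in range(snapshots)]


per_snapshot = lines_touched(separate_arrays(vertex=1, snapshots=3, num_vertices=1000))
batched = lines_touched(labs_layout(vertex=1, snapshots=3))
```

In this model, reading one vertex across three snapshots touches three lines with separate arrays and one line with the batched layout. A real batch can still straddle a line boundary, so the saving is a tendency, not a guarantee.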
LABS Performance on Multi-Core
• LABS-Parallelism outperforms
[Figure: PageRank on Wiki. Speedup (0 to 90) against # of cores (0 to 16) for Partition-Parallelism, LABS-Parallelism, and Snapshot-Parallelism. Baseline: single core.]
LABS Performance on Cluster
• A small cluster with 4 machines
• Benefit less than in single machine test
  • The benefit of LABS is hidden by the high overhead of the network
• Up to 10x speedup
[Figure: running time in seconds (log scale, 10 to 10000), Baseline vs. LABS]

            PageRank   WCC    SSSP
Baseline    7318       6405   518
LABS        2002       1250   48
Reduced Lock Contentions
• LABS amortizes the lock cost across snapshots
• PageRank on the Wiki graph
• Reduced the time of locking by more than 95%
[Figure: lock time in seconds against number of cores, No LABS vs. LABS]

Number of cores    2       4       8       16
No LABS            28.85   34.25   47.54   96.73
LABS               1.32    1.34    1.85    4.02
Reduction          95%     96%     96%     96%
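Amortizing the lock cost can be pictured as taking a vertex lock once per batch instead of once per snapshot. The Python sketch below counts acquisitions in both styles; `CountingLock` and the two update functions are hypothetical names, and the code illustrates the idea only, not how Chronos implements its locking.

```python
import threading


class CountingLock:
    """A lock that records how many times it has been acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def update_per_snapshot(lock, values, deltas):
    # Snapshot by snapshot: one lock acquisition for every snapshot.
    for i, delta in enumerate(deltas):
        with lock:
            values[i] += delta


def update_batched(lock, values, deltas):
    # Batched: one lock acquisition covers all snapshots of the vertex.
    with lock:
        for i, delta in enumerate(deltas):
            values[i] += delta


lock_a, lock_b = CountingLock(), CountingLock()
values_a, values_b = [0.0] * 3, [0.0] * 3
update_per_snapshot(lock_a, values_a, [1.0, 2.0, 3.0])
update_batched(lock_b, values_b, [1.0, 2.0, 3.0])
```

Both styles produce the same values, but the batched update acquires the lock once where the per-snapshot update acquires it three times.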
LABS with Incremental Computation
• Traditional incremental computing
• Incremental computing with LABS
[Figure: Snapshot 0 through Snapshot 3 shown for each approach. In the LABS case, incremental computing is combined with LABS applied at BatchSize = 3.]
Gain of Incremental LABS
[Figure: improvement (%), 0% to 70%, against batch size (1 to 100) for WCC and SSSP. Baseline: traditional incremental.]
Related work
• Existing Graph Engines
  • Static graph engines