Distributed Machine Learning and Graph Processing with Sparse Matrices
Shivaram Venkataraman*, Erik Bodzsar#
Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+
*UC Berkeley, #U Chicago, +HP Labs
Big Data, Complex Algorithms
PageRank (dominant eigenvector)
Recommendations (matrix factorization)
Anomaly detection (top-K eigenvalues)
User importance (vertex centrality)
Machine learning + graph algorithms
Large-Scale Processing Frameworks
Data-parallel frameworks – MapReduce/Dryad (2004)
– Process each record in parallel
– Use case: computing sufficient statistics, analytics queries
Graph-centric frameworks – Pregel/GraphLab (2010)
– Process each vertex in parallel
– Use case: graphical models
Array-based frameworks – MadLINQ (2012)
– Process blocks of an array in parallel
– Use case: linear algebra operations
PageRank using Matrices
Power method: dominant eigenvector
M = web graph matrix, p = PageRank vector
Simplified algorithm: repeat { p = M*p }
Linear algebra operations on sparse matrices
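The repeated multiplication p = M*p is ordinary power iteration. A minimal SciPy sketch on a toy 4-page graph of my own (not the talk's dataset, and without a damping factor):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 4-page link graph: entry (i, j) = 1 if page j links to page i.
links = np.array([[0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 1, 0, 1],
                  [0, 1, 0, 0]], dtype=float)
M = csr_matrix(links / links.sum(axis=0))  # column-stochastic web-graph matrix

p = np.full(4, 0.25)            # start from a uniform rank vector
for _ in range(50):             # repeat { p = M*p }
    p = M @ p                   # p converges to the dominant eigenvector
```

Because M is column-stochastic, p stays a probability distribution, and the fixed point is the dominant eigenvector (eigenvalue 1), i.e. the PageRank vector.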
Presto
Large-scale machine learning and graph processing on sparse matrices
Extend R – make it scalable and distributed
Challenge 1 – Sparse Matrices
[Figure: normalized block density (log scale, 1–10000) vs. block ID for LiveJournal, Netflix, and ClueWeb-1B]
1000x more data in some blocks causes computation imbalance
Challenge 2 – Data Sharing
Sharing data through pipes/network is time-inefficient (sending copies) and space-inefficient (extra copies).
[Diagram: processes on Server 1 and Server 2 each working on a local copy of the data, exchanged via network copies]
Challenges: sparse matrices; communication overhead
Outline
• Motivation
• Programming model
• Design
• Applications and Results
darray – the distributed array primitive
foreach f(x) – run function f over array partitions in parallel
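A minimal mental model of these two primitives in Python (hypothetical local stand-ins; Presto's real darray is partitioned across worker memory and foreach ships the function to the cluster):

```python
# darray(dim, blocks): a distributed array modeled here as a list of
# row-block partitions, each a rows-per-block x cols nested list of zeros.
def darray(dim, blocks):
    rows, cols = dim
    s = blocks[0]  # rows per partition
    return [[[0.0] * cols for _ in range(min(s, rows - r))]
            for r in range(0, rows, s)]

# foreach(arr, f): apply f(index, split) to every partition "in parallel".
def foreach(arr, f):
    for i, split in enumerate(arr):
        f(i, split)

A = darray(dim=(6, 2), blocks=(2, 2))       # three 2x2 partitions
sizes = []
foreach(A, lambda i, split: sizes.append(len(split)))
```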
PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(..) {
  foreach(i, 1:len,
    calculate(m=splits(M,i),
              x=splits(P), p=splits(P,i)) {
      p <- m*x
    })
}

Create distributed arrays
[Diagram: M partitioned into row blocks P1 … PN/s alongside the vector p]
PageRank Using Presto (cont.)
foreach executes the function on the cluster; splits() passes the needed array partitions to it.
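The structure of the Presto loop, where each row block of M produces one split of p, can be mimicked in plain NumPy (a sequential sketch of the data-parallel pattern, not Presto's distributed runtime; the matrix and sizes are made up):

```python
import numpy as np

N, s = 8, 2                        # N x N matrix, s rows per block
rng = np.random.default_rng(0)
M = rng.random((N, N))
M /= M.sum(axis=0)                 # column-stochastic, like a web graph
p = np.full(N, 1.0 / N)

for _ in range(30):                # while(..) { ... }
    # foreach split i: p_i <- m * x, with m = rows [r, r+s) of M, x = all of p
    splits = [M[r:r + s, :] @ p for r in range(0, N, s)]
    p = np.concatenate(splits)
```

Each list element corresponds to one foreach task; in Presto those tasks run on different workers, with splits(P,i) naming the output partition each task owns.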
Presto Architecture
A Master coordinates multiple Workers; each Worker hosts several R instances that share the machine's DRAM.
Repartitioning Matrices
Profile execution, then repartition.
Repartition if max(t) / median(t) > δ, where t holds the per-partition execution times.
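The repartitioning trigger can be sketched directly (the default value of δ here is my assumption; the talk leaves it as a tunable threshold):

```python
import statistics

def needs_repartition(times, delta=2.0):
    """Repartition when the slowest split's execution time exceeds
    delta times the median split time: the max(t)/median(t) > delta test."""
    return max(times) / statistics.median(times) > delta

# Hypothetical per-split times (seconds) from profiling one iteration
needs_repartition([1.1, 0.9, 1.0, 5.7, 1.2])   # → True: one split dominates
```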
Maintaining Size Invariants
invariant(mat, vec, type=ROW) – declares that mat and vec must stay partitioned conformably by rows, so repartitioning one repartitions the other.
Sharing Distributed Arrays
Versioned distributed arrays
Goal: zero-copy sharing across cores
Immutable partitions → safe sharing
Data Sharing Challenges
1. Garbage collection
2. Header conflicts
[Diagram: an R object (header + data part) referenced from two R instances]
Overriding R's Allocator
Allocate process-local headers; map the data part in shared memory, aligned to page boundaries.
[Diagram: a local R object header followed by the shared R object data part, split at a page boundary]
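The header/data split maps naturally onto OS shared memory. A rough Python analogy (using multiprocessing.shared_memory rather than R's allocator): the numpy array object plays the role of the process-local header, while the bytes it describes live in a shared segment:

```python
import numpy as np
from multiprocessing import shared_memory

# "Writer": allocate the data part in shared memory
shm = shared_memory.SharedMemory(create=True, size=4 * 8)
a = np.ndarray((4,), dtype=np.float64, buffer=shm.buf)  # local header, shared data
a[:] = [1.0, 2.0, 3.0, 4.0]

# "Reader" (normally another process): attach by name, build its own header
shm2 = shared_memory.SharedMemory(name=shm.name)
b = np.ndarray((4,), dtype=np.float64, buffer=shm2.buf)
total = float(b.sum())          # reads the same bytes, no copy made

del a, b                        # drop the views before releasing the segment
shm2.close()
shm.close()
shm.unlink()
```

Each side gets its own header object (a, b) but zero copies of the data, which mirrors Presto mapping R objects' data parts into shared pages while keeping headers process-local.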
Outline
• Motivation
• Programming model
• Design
• Applications and Results
Demo: PageRank on 1.5B-edge Twitter data
5-node cluster, 8 cores per node
Applications Implemented in Presto

Application              Algorithm                 Presto LOC
PageRank                 Eigenvector calculation   41
Triangle counting        Top-K eigenvalues         121
Netflix recommendation   Matrix factorization      130
Centrality measure       Graph algorithm           132
k-path connectivity      Graph algorithm           30
k-means                  Clustering                71
Sequence alignment       Smith-Waterman            64

Fewer than 140 lines of code
Evaluation Overview
Setup: 25-machine cluster; each machine has 24 cores, 96GB RAM, 10Gbps network
Data-sharing benefits – 1.5B-edge Twitter graph
Repartitioning analysis – 6B-edge Web graph
Faster than Spark and Hadoop using in-memory data
Collaborative filtering using the Netflix dataset
Data Sharing Benefits
[Figure: per-iteration time at 10, 20, and 40 cores, broken into Compute and Transfer, with and without sharing. With sharing, compute shrinks (4.45s, 2.49s, 1.63s) while transfer stays flat (~0.7s); without sharing, compute also shrinks (4.38s, 2.21s, 1.22s) but transfer grows with core count (1.22s, 2.12s, 4.16s).]
Repartitioning Progress
[Figure: split size (GB) vs. iteration count (1–15)]
Repartitioning Benefits
[Figure: per-worker Transfer/Compute timelines over 0–160 seconds, comparing No Repartition vs. Repartition]
Related Work
Large scale data processing frameworks
– MapReduce, Dryad, Spark, GraphLab
Matrix Computations – Ricardo, MadLINQ
HPC systems – ARPACK, Combinatorial BLAS
Multi-core R packages – doMC, snow, Rmpi
Presto
Co-partitioning matrices
Locality-based scheduling
Caching partitions
Conclusion
Presto: a large-scale array-based framework that extends R
Addresses the challenges of sparse matrices through repartitioning and sharing of versioned arrays
Backup Slides
Netflix Collaborative Filtering
[Figure: runtime (seconds) vs. number of cores (8, 16, 24, 32, 40, 48), split into Load, t(R)×R, and R×t(R)×R phases; total time falls from 755s on 8 cores to 155s on 48 cores]
Repartitioning Benefits
[Figure: number of repartitions (0–20) vs. time to convergence (s) and cumulative time spent partitioning (s)]