Distributed Machine Learning and Graph Processing with Sparse Matrices
Speaker: LIN Qian (http://www.comp.nus.edu.sg/~linqian/)
Jan 16, 2015
Big Data, Complex Algorithms
– PageRank (dominant eigenvector)
– Recommendations (matrix factorization)
– Anomaly detection (top-K eigenvalues)
– User importance (vertex centrality)
Machine learning + graph algorithms
Large-Scale Processing Frameworks
– Data-parallel frameworks: MapReduce/Dryad (2004)
  Process each record in parallel
  Use case: computing sufficient statistics, analytics queries
– Graph-centric frameworks: Pregel/GraphLab (2010)
  Process each vertex in parallel
  Use case: graphical models
– Array-based frameworks: MadLINQ (2012)
  Process blocks of an array in parallel
  Use case: linear algebra operations
PageRank Using Matrices
Power method: repeated multiplication converges to the dominant eigenvector
  M = web graph matrix
  p = PageRank vector
Simplified algorithm: repeat { p = M*p }
Linear algebra operations on sparse matrices
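As a concrete reference, here is a minimal single-machine sketch of this power method in plain R, using the Matrix package for sparse storage; the damping factor d, tolerance, and iteration cap are illustrative assumptions, not values from the talk.

    library(Matrix)   # sparse matrix support

    # Power method: repeatedly apply p = d*M*p + (1-d)/N until p stops changing.
    # M is assumed to be a column-stochastic (sparse) web graph matrix.
    pagerank <- function(M, d = 0.85, tol = 1e-8, max_iter = 100) {
      N <- ncol(M)
      p <- rep(1 / N, N)                        # start from the uniform vector
      for (iter in 1:max_iter) {
        p_new <- d * as.vector(M %*% p) + (1 - d) / N
        if (sum(abs(p_new - p)) < tol) break    # L1 convergence test
        p <- p_new
      }
      p_new
    }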
Statistical software
– Handles moderately-sized datasets: single server, entirely in memory
– Work-arounds for massive datasets: vertical scalability, sampling
MapReduce
– Limited to aggregation processing
– Use case: data analytics
Deep vs. Scalable
Statistical software (R, MATLAB, SPSS, SAS) vs. MapReduce
Ways to improve
1. Statistical software += large-scale data management
2. MapReduce += statistical functionality
3. Combining both existing technologies
1. Statistical software extended for scale: Parallel MATLAB, pR
2. MapReduce extended with statistical functionality: HAMA, SciHadoop
3. Combinations and new platforms:
– MadLINQ [EuroSys'12]: linear algebra platform on Dryad, but not efficient for sparse matrix computation
– Ricardo [SIGMOD'10]: R sends aggregation-processing queries to Hadoop and receives aggregated data back, but ends up inheriting the inefficiencies of the MapReduce interface
– R alone remains array-based but single-threaded, with limited support for scaling
Challenge 1 – Sparse Matrices

[Figure: normalized block density vs. block ID for LiveJournal, Netflix, and ClueWeb-1B; density spans four orders of magnitude (1 to 10000)]

Some blocks hold ~1000x more data than others, causing computation imbalance.
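The skew is easy to see in a few lines of plain R (Matrix package). The sketch below builds a synthetic sparse matrix with power-law row popularity as a stand-in for a real web graph and counts non-zeros per row block; the dimensions and exponent are arbitrary assumptions.

    library(Matrix)

    N <- 10000; s <- 1000                       # matrix size and block height
    i <- sample(1:N, 50000, replace = TRUE,
                prob = (1:N)^-0.8)              # skewed (power-law) row choice
    j <- sample(1:N, 50000, replace = TRUE)
    M <- sparseMatrix(i = i, j = j, x = 1, dims = c(N, N))

    # Non-zeros per row block: the distribution is heavily skewed,
    # so equal-sized blocks receive very unequal amounts of work.
    block_nnz <- sapply(seq(1, N, by = s),
                        function(r) nnzero(M[r:(r + s - 1), ]))
    block_nnz / max(block_nnz)                  # normalized block density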
Challenge 2 – Data Sharing

Sharing data through pipes/network is:
– Time-inefficient (sending copies)
– Space-inefficient (extra copies)

[Figure: processes on Server 1 and Server 2 each keep a local copy of the data, plus network copies between servers]

For sparse matrices, this copying becomes a major communication overhead.
Goal: extend R to make it scalable and distributed
Large-scale machine learning and graph processing on sparse matrices
Presto Architecture

– A master coordinates multiple workers
– Each worker hosts several R instances that share the machine's DRAM

[Figure: master node connected to workers; each worker runs multiple R instances over shared DRAM]
Distributed array (darray)
– Partitioned
– Shared
– Dynamic

foreach
– Parallel execution of the loop body f(x)
– Ends with a barrier
– Call update to publish changes
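A minimal sketch of this pattern in the slides' own pseudocode style; it assumes Presto's darray/foreach/update API (not base R), and the function name fill and the partition count are made up for illustration.

    A <- darray(dim = c(100, 1), blocks = c(25, 1))   # 4 row partitions

    foreach(i, 1:4,
      fill(a = splits(A, i)) {            # body runs once per partition, in parallel
        a <- matrix(i, nrow(a), ncol(a))  # write partition i
        update(a)                         # publish the change; foreach ends with a barrier
      })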
PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))    # web graph, split into N/s row blocks
P <- darray(dim=c(N,1), blocks=c(s,1))    # PageRank vector, partitions P1..PN/s
while(..) {
  foreach(i, 1:len,                       # len = number of partitions (N/s)
    calculate(m=splits(M,i), x=splits(P), p=splits(P,i)) {
      p <- m %*% x
      update(p)                           # publish the new partition
    })
}

– darray creates the distributed arrays
– foreach executes the calculate function across the cluster
– splits passes the array partitions to each invocation
Dynamic Repartitioning
– To address load imbalance
– Must preserve correctness

Repartitioning Matrices
– Profile execution; repartition blocks whose processing time reveals imbalance
– Invariants: compatibility in array sizes must be maintained
Maintaining Size Invariants
invariant(mat, vec, type=ROW)
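Applied to the PageRank example, the declaration might look like the sketch below (it assumes the invariant call shown above and the darray arguments from the earlier slide): when M is dynamically repartitioned by rows, P must be repartitioned compatibly so that m %*% x stays well-defined.

    M <- darray(dim = c(N, N), blocks = c(s, N))
    P <- darray(dim = c(N, 1), blocks = c(s, 1))
    invariant(M, P, type = ROW)   # keep M's row partitioning aligned with P's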
Data Sharing for Multi-core
Goal: zero-copy sharing across cores

Data sharing challenges
1. Garbage collection
2. Header conflict
[Figure: two R instances read and write the same R object (header + data part); one instance garbage-collecting or writing the shared header breaks the other]

Overriding R's allocator
– Allocate process-local R object headers
– Map the object's data part in shared memory, aligned to page boundaries
[Figure: shared R object data part placed on its own pages; each process keeps a local R object header]

Immutable partitions make sharing safe: only read-only data is shared.
Versioning Arrays
– Ensures correctness when arrays are shared across machines

Fault Tolerance
– Master: primary-backup replication
– Worker: heartbeat-based failure detection
Presto Applications
Presto roughly doubles the lines of code compared with programming purely in R.
Evaluation
– Faster than Spark and Hadoop on in-memory data
– Multi-core support benefits
– Data sharing benefits

[Figure: compute vs. transfer time at 10, 20, and 40 cores. With zero-copy sharing, compute time drops (4.45, 2.49, 1.63) while transfer time stays flat (0.71, 0.70, 0.72). Without sharing, compute time also drops (4.38, 2.21, 1.22) but transfer time grows with the number of cores (1.22, 2.12, 4.16).]
Repartitioning Benefits

[Figure: per-worker transfer and compute time (0–160) with and without repartitioning]
Repartitioning Benefits

[Figure: time to convergence (s, 2000–8000) and cumulative partitioning time (s, 0–400) vs. number of repartitions (0–20)]
Limitations
1. In-memory computation
2. One writer per partition
3. Array-based programming
Conclusion
• Presto: a large-scale, array-based framework that extends R
• Tackles the challenges of sparse matrices through repartitioning and sharing of versioned arrays
Presto (the Pixar short)
– IMDb rating: 8.5
– Release date: 27 June 2008
– Director: Doug Sweetland
– Studio: Pixar
– Runtime: 5 min
Brief: A stage magician's rabbit gets into a magical onstage brawl against his neglectful guardian with two magic hats.