Motivation Gilbert Evaluation Conclusion
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems
Till Rohrmann 1 Sebastian Schelter 2 Tilmann Rabl 2
Volker Markl 2
1 Apache Software Foundation
2 Technische Universität Berlin
March 8, 2017
1 / 25
Motivation
Information Age
- Collected data grows exponentially
- Valuable information stored in data
- Need for scalable analytical methods
Distributed Computing and Data Analytics
- Writing parallel algorithms is tedious and error-prone
- Huge existing code base in form of libraries
- Need for parallelization tool
Requirements
- Linear algebra is the lingua franca of analytics
- Parallelize programs automatically to simplify development
- Sparse operations to support sparse problems efficiently

Goal

Development of a distributed sparse linear algebra system
A = rand(10, 2);
B = eye(10);
A'*B;
f = @(x) x.^2.0;
eps = 0.1;
c = @(p, c) norm(p - c, 2) < eps;
fixpoint(1/2, f, 10, c);
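The fixpoint call in the last line iterates f starting from 1/2 for at most 10 steps, stopping early once the convergence check c holds. A minimal Python sketch of these semantics (the signature mirrors the slide; this is an illustration, not Gilbert's implementation):

```python
def fixpoint(x, f, max_iter, converged):
    """Iterate x = f(x) at most max_iter times, returning early
    once converged(previous, current) is True."""
    for _ in range(max_iter):
        x_next = f(x)
        if converged(x, x_next):
            return x_next
        x = x_next
    return x

# Mirrors the slide: square the value, stop when |p - c| < eps.
eps = 0.1
f = lambda x: x ** 2.0
c = lambda p, cur: abs(p - cur) < eps
result = fixpoint(1 / 2, f, 10, c)  # 0.5 -> 0.25 -> 0.0625 -> 0.00390625
```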
Gilbert Typer
- Matlab is dynamically typed
- Dataflow systems require type knowledge at compile time
- Automatic type inference using the Hindley-Milner type inference algorithm
- Also infers matrix dimensions for optimizations
A = rand(10, 2) : Matrix(Double, 10, 2)
B = eye(10) : Matrix(Double, 10, 10)
A'*B : Matrix(Double, 2, 10)
f = @(x) x.^2.0 : N -> N
eps = 0.1 : Double
c = @(p, c) norm(p - c, 2) < eps : (N, N) -> Boolean
fixpoint(1/2, f, 10, c) : Double
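The dimension inference shown above can be illustrated with a tiny checker. `MatrixType`, `transpose_type`, and `multiply_type` are hypothetical names for this sketch, not Gilbert's actual typer API:

```python
from dataclasses import dataclass

@dataclass
class MatrixType:
    elem: str
    rows: int
    cols: int

def transpose_type(m: MatrixType) -> MatrixType:
    # A' swaps the dimensions.
    return MatrixType(m.elem, m.cols, m.rows)

def multiply_type(a: MatrixType, b: MatrixType) -> MatrixType:
    # A*B requires the inner dimensions to agree.
    if a.cols != b.rows:
        raise TypeError(f"cannot multiply {a.rows}x{a.cols} by {b.rows}x{b.cols}")
    return MatrixType(a.elem, a.rows, b.cols)

A = MatrixType("Double", 10, 2)    # rand(10, 2)
B = MatrixType("Double", 10, 10)   # eye(10)
result_type = multiply_type(transpose_type(A), B)  # Matrix(Double, 2, 10)
```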
Intermediate Representation & Gilbert Optimizer
- Language-independent representation of linear algebra programs
- Abstraction layer facilitates easy extension with new programming languages (such as R)
- Enables language-independent optimizations
Which partitioning is better suited for matrix multiplications?
io_cost_row = O(n³)    io_cost_block = O(n² √n)
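To see why block partitioning wins, one can plug the two cost formulas into a quick numeric comparison. This sketch only evaluates the asymptotic expressions from the slide with constant factors dropped; it is not Gilbert's cost model:

```python
import math

# Asymptotic I/O volume of an n x n matrix multiplication under the two
# partitionings from the slide, constants dropped.
def io_cost_row(n):
    return n ** 3                   # row partitioning: O(n^3)

def io_cost_block(n):
    return n ** 2 * math.sqrt(n)    # block partitioning: O(n^2 * sqrt(n))

# The advantage of blocking grows as sqrt(n):
ratio = io_cost_block(10 ** 4) / io_cost_row(10 ** 4)  # 1 / sqrt(10^4) = 0.01
```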
Distributed Operations: Addition
Apache Flink and Apache Spark offer a MapReduce-like API with additional operators: join, coGroup, cross
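As an illustration of how addition maps onto such an operator, here is a minimal Python sketch that treats each matrix as a dictionary of blocks and groups matching block indices, the way a coGroup would. The block layout and helper names are assumptions for this example, not Gilbert's actual operator code:

```python
from collections import defaultdict

def add_blocks(x, y):
    # Element-wise sum of two equally sized blocks (nested lists).
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def cogroup_add(blocks_a, blocks_b):
    """Matrix addition expressed as a coGroup on the block index:
    matching blocks are summed; blocks present on only one side
    (common for sparse matrices) pass through unchanged."""
    grouped = defaultdict(list)
    for key, block in list(blocks_a.items()) + list(blocks_b.items()):
        grouped[key].append(block)
    out = {}
    for key, blocks in grouped.items():
        result = blocks[0]
        for b in blocks[1:]:
            result = add_blocks(result, b)
        out[key] = result
    return out

A = {(0, 0): [[1.0, 1.0], [1.0, 1.0]], (0, 1): [[1.0, 0.0], [0.0, 1.0]]}
B = {(0, 0): [[2.0, 2.0], [2.0, 2.0]]}   # block (0, 1) absent: sparse input
C = cogroup_add(A, B)
```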
Evaluation
Gaussian Non-Negative Matrix Factorization
- Given V ∈ R^(d×w), find W ∈ R^(d×t) and H ∈ R^(t×w) such that V ≈ WH
- Used in many fields: computer vision, document clustering and topic modeling
- Efficient distributed implementation for MapReduce systems
Algorithm
H ← randomMatrix(t, w)
W ← randomMatrix(d, t)
while ‖V − WH‖₂ > eps do
    H ← H · (WᵀV / WᵀWH)
    W ← W · (VHᵀ / WHHᵀ)
end while
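The multiplicative updates above translate directly into a few lines of NumPy. This is a minimal single-machine sketch, not Gilbert's distributed implementation; the fixed iteration count, the RNG seeds, and the small division guard are assumptions made for the example:

```python
import numpy as np

def gnmf(V, t, iters=50):
    """Multiplicative-update NMF following the slide's algorithm:
    H <- H * (W'V / W'WH),  W <- W * (VH' / WHH')."""
    d, w = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((d, t))
    H = rng.random((t, w))
    guard = 1e-9                    # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + guard)
        W *= (V @ H.T) / (W @ H @ H.T + guard)
    return W, H

V = np.random.default_rng(1).random((20, 30))   # toy nonnegative input
W, H = gnmf(V, t=5)
err = np.linalg.norm(V - W @ H)                 # Frobenius reconstruction error
```

The updates only ever multiply nonnegative quantities, so W and H stay nonnegative throughout, which is the point of the multiplicative form.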
Testing Setup
- Set t = 10 and w = 100000
- V ∈ R^(d×100000) with sparsity 0.001
- Block size 500 × 500
- Number of cores: 64
- Flink 1.1.2 & Spark 2.0.0
- Gilbert implementation: 5 lines
- Distributed GNMF on Flink: 70 lines
Matlab
it = 10;
d = sum(A, 2);
M = (diag(1 ./ d) * A)';
r = ones(n, 1) / n;
e = ones(n, 1) / n;
for i = 1:it
    r = .85 * M * r + .15 * e
end
Gilbert
it = 10;
d = sum(A, 2);
M = (diag(1 ./ d) * A)';
r_0 = ones(n, 1) / n;
e = ones(n, 1) / n;
fixpoint(r_0,
         @(r) .85 * M * r + .15 * e,
         it)
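The Gilbert fixpoint version can be mimicked in NumPy to see the iteration at work. The 3-page link graph is an invented toy input; NumPy and the inline `fixpoint` helper are assumptions for this sketch:

```python
import numpy as np

def fixpoint(x, f, max_iter):
    # Bounded fixpoint iteration, as in the Gilbert program above.
    for _ in range(max_iter):
        x = f(x)
    return x

# Toy 3-page link graph (invented input): A[i, j] = 1 if page i links to j.
A = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [1., 0., 0.]])
n = A.shape[0]
d = A.sum(axis=1)                  # out-degrees: d = sum(A, 2)
M = (np.diag(1.0 / d) @ A).T       # M = (diag(1 ./ d) * A)'
e = np.ones(n) / n
r_0 = np.ones(n) / n
r = fixpoint(r_0, lambda r: 0.85 * M @ r + 0.15 * e, 10)
```

Because M is column-stochastic and e sums to 1, each iteration preserves the total rank mass, so r remains a probability distribution.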
PageRank: 10 Iterations
[Plot: execution time in s vs. number of vertices n, log-log scale, comparing Gilbert on Flink, Gilbert on Spark, and the specialized (SP) Flink and Spark implementations]
- Gilbert backends show similar performance
- Specialized implementation faster because it can fuse operations
Conclusion

- Scales to data sizes exceeding a single computer
- High-level linear algebra optimizations improve runtime
- Slower than specialized implementations due to abstraction overhead