
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

Jan 16, 2015

Qian Lin

Transcript
Page 1: Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

Distributed Machine Learning and Graph Processing with Sparse Matrices

Speaker: LIN Qian – http://www.comp.nus.edu.sg/~linqian/

Page 2:

Big Data, Complex Algorithms

PageRank (dominant eigenvector)

Recommendations (matrix factorization)

Anomaly detection (top-k eigenvalues)

User importance (vertex centrality)

Machine learning + Graph algorithms

Page 3:

Large-Scale Processing Frameworks

Data-parallel frameworks – MapReduce/Dryad (2004) – process each record in parallel – use case: computing sufficient statistics, analytics queries

Graph-centric frameworks – Pregel/GraphLab (2010) – process each vertex in parallel – use case: graphical models

Array-based frameworks – MadLINQ (2012) – process blocks of an array in parallel – use case: linear algebra operations

Page 4:

PageRank using Matrices

Power method → dominant eigenvector

M = web graph matrix; p = PageRank vector

Simplified algorithm: repeat { p = M*p }

Linear algebra operations on sparse matrices
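The "repeat { p = M*p }" loop is ordinary power iteration. A minimal sketch in plain Python (the triple-list sparse format and the helper name are illustrative, not Presto's API):

```python
def power_iteration(M, n, iters=50):
    """Repeatedly apply p = M*p, where the sparse matrix M is stored
    as (row, col, value) triples, renormalizing each iterate."""
    p = [1.0 / n] * n
    for _ in range(iters):
        q = [0.0] * n
        for r, c, v in M:
            q[r] += v * p[c]
        s = sum(abs(x) for x in q) or 1.0
        p = [x / s for x in q]        # keep the iterate bounded
    return p

# Tiny column-stochastic web graph: page 0 links to 1; page 1 links to both
M = [(1, 0, 1.0), (0, 1, 0.5), (1, 1, 0.5)]
ranks = power_iteration(M, 2)        # converges to [1/3, 2/3]
```

The dominant eigenvector emerges because every other component decays geometrically with each multiplication.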

Page 5:

Statistical software

Moderately sized datasets; single server, entirely in memory

Page 6:

Work-around for massive dataset

Vertical scalability; sampling

Page 7:

MapReduce

Limited to aggregation processing

Page 8:

Data analytics

Deep vs. Scalable

Statistical software (R, MATLAB, SPSS, SAS) vs. MapReduce

Page 9:

Improvement ways

1. Statistical software += large-scale data management
2. MapReduce += statistical functionality
3. Combining both existing technologies

Page 10:

Parallel MATLAB, pR

Page 11:

HAMA, SciHadoop

Page 12:

MadLINQ [EuroSys’12]

Linear algebra platform on Dryad
Not efficient for sparse matrix computation

Page 13:

Ricardo [SIGMOD’10]

Bridges R and Hadoop: R sends aggregation-processing queries; Hadoop returns aggregated data.

But it ends up inheriting the inefficiencies of the MapReduce interface.

Page 14:

Array-based; single-threaded

Limited support for scaling

Page 15:

Challenge 1: Sparse Matrices

Page 16:

Challenge 1 – Sparse Matrices

[Figure: block density (normalized, log scale) vs. block ID for the LiveJournal, Netflix, and ClueWeb-1B datasets]

1000x more data → computation imbalance
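The imbalance the figure measures can be reproduced at toy scale: split a skewed sparse matrix into equal row blocks and count nonzeros per block (the data here is synthetic; the datasets named above are real but are not reproduced by this sketch):

```python
import random

random.seed(0)
n, blocks = 1000, 4
# Synthetic skewed graph: low-numbered rows get far more nonzeros,
# mimicking the heavy-tailed degree distributions of real graphs.
nnz = [(i, random.randrange(n))
       for i in range(n)
       for _ in range(1 + 200 // (i + 1))]

counts = [0] * blocks
for r, c in nnz:
    counts[r * blocks // n] += 1   # assign each nonzero to its row block
# Equal-sized row blocks end up with wildly unequal work:
# the first block holds several times more nonzeros than the last.
```

With static, equal-sized partitions, whichever worker gets the dense block becomes the straggler, which motivates the dynamic repartitioning described later.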

Page 17:

Challenge 2 – Data Sharing

Sharing data through pipes/network

Time-inefficient (sending copies); space-inefficient (extra copies)

[Figure: processes on Server 1 and Server 2 each holding local or network copies of the same data]

Sparse matrices → communication overhead

Page 18:

Extend R – make it scalable and distributed
Large-scale machine learning and graph processing on sparse matrices

Page 19:

Presto architecture

Page 20:

Presto architecture

[Figure: Presto architecture – a master node coordinating worker nodes; each worker hosts several R instances that share the node's DRAM]

Page 21:

Distributed array (darray): partitioned, shared, dynamic

Page 22:

foreach

Parallel execution of the loop body f(x)

Barrier

Call update to publish changes
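These semantics – run every loop body in parallel, hit a barrier, then publish changes via update – can be mimicked with a thread pool (a conceptual sketch only; Presto's foreach is an R construct and the names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

shared = [0] * 8          # stands in for a distributed array
staged = [None] * 8       # per-iteration results, not yet published

def body(i):
    # Loop body: reads the shared state, writes only its private slot.
    staged[i] = shared[i] + i * i

with ThreadPoolExecutor() as pool:
    list(pool.map(body, range(8)))   # barrier: returns once all bodies finish

# "update": publish the staged changes so later iterations see them
shared = staged
```

Staging writes until after the barrier is what makes the parallel bodies safe: no body observes another body's partial update.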

Page 23:

PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(..) {
  foreach(i, 1:len,
    calculate(m = splits(M,i), x = splits(P), p = splits(P,i)) {
      p <- m*x
    })
}

Create distributed array

[Figure: M partitioned into row blocks, and the vector P split into P1, P2, …, PN/s]

Page 24:

PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(..) {
  foreach(i, 1:len,
    calculate(m = splits(M,i), x = splits(P), p = splits(P,i)) {
      p <- m*x
    })
}

Execute function in a cluster; pass array partitions

[Figure: the calculate function shipped to workers, each holding its row block of M and its split Pi of P]
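The partitioned multiply these two slides describe – task i reads all of P but writes only split i – can be sketched in plain Python (illustrative names and dense row blocks for clarity; Presto expresses this with darray/foreach over sparse splits in R):

```python
def blockwise_step(M_blocks, p, s):
    """One p = M*p step where M_blocks[i] holds rows [i*s, (i+1)*s) of M.
    Each block reads the whole input vector but writes only its own slice."""
    new_p = [0.0] * len(p)
    for i, block in enumerate(M_blocks):
        for r, row in enumerate(block):            # r is the local row index
            new_p[i * s + r] = sum(v * x for v, x in zip(row, p))
    return new_p

# Identity matrix split into two 2-row blocks leaves p unchanged
M_blocks = [[[1, 0, 0, 0], [0, 1, 0, 0]],
            [[0, 0, 1, 0], [0, 0, 0, 1]]]
p = [0.1, 0.2, 0.3, 0.4]
result = blockwise_step(M_blocks, p, 2)
```

Because every write lands in a disjoint slice of the output, the per-block loop bodies are independent and can run on different workers.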

Page 25:

Dynamic repartitioning

To address load imbalance, while preserving correctness

Page 26:

Repartitioning Matrices

Profile execution

Repartition if profiling shows imbalance across partitions
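A minimal version of this profile-then-repartition step (the threshold value and the even-split assumption are hypothetical simplifications; the paper's actual policy is richer):

```python
def repartition(part_times, threshold=2.0):
    """Split the slowest partition in two whenever its profiled time
    exceeds the average partition time by more than `threshold`."""
    avg = sum(part_times) / len(part_times)
    worst = max(range(len(part_times)), key=part_times.__getitem__)
    if part_times[worst] <= threshold * avg:
        return part_times                      # balanced enough, keep as-is
    half = part_times[worst] / 2               # assume the split halves the work
    return part_times[:worst] + [half, half] + part_times[worst + 1:]

times = repartition([1.0, 1.0, 10.0, 1.0])     # the 10s partition gets split
```

Applied repeatedly between iterations, this drives the straggler's share of the work down until no partition dominates.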

Page 27:

Invariants

compatibility in array sizes

Page 28:

Maintaining Size Invariants

invariant(mat, vec, type=ROW)
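The invariant call above declares that vec must stay row-compatible with mat: when mat is repartitioned along rows, vec has to be co-partitioned the same way or mat*vec becomes ill-formed. A toy registry showing the intended effect (everything except the invariant signature is an illustrative name):

```python
ROW = "row"
invariants = []                                  # (mat, vec, type) declarations

def invariant(mat, vec, type=ROW):
    """Record that `vec` must share `mat`'s partitioning along `type`."""
    invariants.append((mat, vec, type))

partitions = {"M": [500, 500], "P": [500, 500]}  # rows per split

def repartition_rows(name, new_sizes):
    """Repartition one array and propagate the identical row split to
    every array tied to it by a ROW invariant."""
    partitions[name] = new_sizes
    for mat, vec, t in invariants:
        if t == ROW and mat == name:
            partitions[vec] = list(new_sizes)

invariant("M", "P", type=ROW)
repartition_rows("M", [250, 250, 500])           # P follows M's new split
```

This is why dynamic repartitioning can stay correct: the runtime never moves one operand's boundaries without moving its declared partners.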

Page 29:

Data sharing for multi-core

Zero-copy sharing across cores

Page 30:

Data sharing challenges

1. Garbage collection
2. Header conflict

[Figure: two R instances sharing one R object; one instance garbage-collects or writes the object header while the other reads the data part]

Page 31:

Overriding R’s allocator

Allocate process-local headers; map data in shared memory

[Figure: local R object header and shared R object data part, aligned to page boundaries]
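The same separation – per-process metadata, shared data pages – has a rough analogue in Python's multiprocessing.shared_memory (a sketch of the zero-copy idea only, not Presto's R allocator):

```python
from multiprocessing import shared_memory

# One writer creates a shared segment and fills it with the "array" bytes.
shm = shared_memory.SharedMemory(create=True, size=8)
shm.buf[:8] = bytes(range(8))

# A second attach maps the same physical pages under a separate handle
# (the handle plays the role of the process-local header): reads see the
# data without any copy being made.
view = shared_memory.SharedMemory(name=shm.name)
data = bytes(view.buf[:8])

view.close()
shm.close()
shm.unlink()
```

Keeping headers private sidesteps both slide-30 hazards: each process garbage-collects its own header, and header writes never race across processes.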

Page 32:

Immutable partitions → safe sharing

Only share read-only data

Page 33:

Versioning arrays

To ensure correctness when arrays are shared across machines
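One way to picture versioned arrays: every update publishes a new immutable version, so a remote reader holding an old version is never disturbed by a concurrent write (a toy sketch, not Presto's protocol):

```python
class VersionedArray:
    """Copy-on-write versions: updates append a new immutable snapshot,
    so readers pinned to an earlier version stay consistent."""
    def __init__(self, data):
        self.versions = [tuple(data)]        # version 0

    def update(self, data):
        self.versions.append(tuple(data))
        return len(self.versions) - 1        # new version number

    def read(self, version=-1):
        return self.versions[version]        # default: latest version

arr = VersionedArray([1, 2, 3])
old = arr.read(0)        # a "remote machine" pins version 0
arr.update([4, 5, 6])    # a writer publishes version 1
```

The reader's snapshot (old) is unaffected by the later update, which is exactly the correctness property the slide asks for.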

Page 34:

Fault tolerance

Master: primary-backup replication
Worker: heartbeat-based failure detection

Page 35:

Presto applications

Presto roughly doubles the lines of code relative to programming purely in R.

Page 36:

Evaluation

Faster than Spark and Hadoop, even when they use in-memory data

Page 37:
Page 38:

Multi-core support benefits

Page 39:

Data sharing benefits

[Figure: per-iteration compute and transfer times at 10, 20, and 40 cores, with and without data sharing. Compute time drops as cores are added in both cases, but without sharing the transfer time grows with core count (up to ~4.2 s at 40 cores), while with zero-copy sharing it stays flat at ~0.7 s.]

Page 40:

Repartitioning benefits

[Figure: per-worker transfer and compute times (0–160 s) without and with repartitioning; repartitioning evens out the load across workers]

Page 41:

Repartitioning benefits

[Figure: time to convergence (s, left axis) and cumulative partitioning time (s, right axis) versus number of repartitions (0–20); convergence time falls as repartitions increase, while time spent partitioning accumulates]

Page 42:

Limitations

1. In-memory computation
2. One writer per partition
3. Array-based programming

Page 43:

Conclusion

• Presto: large-scale array-based framework that extends R
• Challenges with sparse matrices
• Repartitioning; sharing versioned arrays

Page 44:

IMDb rating: 8.5
Release date: 27 June 2008
Director: Doug Sweetland
Studio: Pixar
Runtime: 5 min

Brief: A stage magician's rabbit gets into a magical onstage brawl against his neglectful guardian with two magic hats.