
Direct tall-and-skinny QR factorizations in MapReduce architectures

Jan 15, 2015

David Gleich

An updated version of the TSQR talk I gave at Cor
Transcript
Page 1: Direct tall-and-skinny QR factorizations in MapReduce architectures

Tall-and-Skinny QR Factorizations in MapReduce

DAVID F. GLEICH, PURDUE UNIVERSITY COMPUTER SCIENCE DEPARTMENT

PAUL G. CONSTANTINE, AUSTIN BENSON, JOE NICHOLS (STANFORD UNIVERSITY), JAMES DEMMEL (UC BERKELEY), JOE RUTHRUFF, JEREMY TEMPLETON (SANDIA)

1 Cornell CS David Gleich · Purdue

Page 2: Direct tall-and-skinny QR factorizations in MapReduce architectures


Questions?

Most recent code at http://github.com/arbenson/mrtsqr

Page 3: Direct tall-and-skinny QR factorizations in MapReduce architectures

Quick review of QR

QR Factorization. Let A be a real m-by-n matrix with m ≥ n. Then A = QR, where Q is m-by-n orthogonal (QᵀQ = I) and R is n-by-n upper triangular.

Using QR for regression: the least-squares solution of min ‖Ax − b‖ is given by the solution of Rx = Qᵀb.

QR is block normalization: "normalizing" a vector usually generalizes to computing Q in the QR factorization.

David Gleich (Sandia) · MapReduce 2011
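A minimal numpy sketch of the review above, using synthetic data (the matrix, right-hand side, and sizes are made up for illustration):

```python
import numpy as np

# A small tall-and-skinny least-squares problem (synthetic data).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))   # m >> n
b = rng.standard_normal(100)

# Thin QR: Q is 100-by-5 with orthonormal columns, R is 5-by-5 upper triangular.
Q, R = np.linalg.qr(A)

# Least-squares solution of min ||Ax - b|| via R x = Q^T b.
x = np.linalg.solve(R, Q.T @ b)

# Agrees with numpy's own least-squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))
```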


Page 4: Direct tall-and-skinny QR factorizations in MapReduce architectures

Tall-and-Skinny matrices (m ≫ n)

(Image: a tall-and-skinny matrix A.)

Page 5: Direct tall-and-skinny QR factorizations in MapReduce architectures

Tall-and-Skinny matrices (m ≫ n) arise in: regression with many samples, block iterative methods, panel factorizations, model reduction problems, general linear models with many samples, and tall-and-skinny SVD/PCA.

All of these applications need a QR factorization of a tall-and-skinny matrix; some only need R.

(Image: a tall-and-skinny matrix A built from the tinyimages collection.)

Page 6: Direct tall-and-skinny QR factorizations in MapReduce architectures

Input parameters s feed a simulation whose time history f is ~100 GB.

The Database: s1 → f1, s2 → f2, …, sk → fk.

f(s) = [ q(x1, t1, s); …; q(xn, t1, s); q(x1, t2, s); …; q(xn, t2, s); …; q(xn, tk, s) ]   (one simulation as a vector; each block of n entries is a single simulation at one time step)

X = [ f(s1) f(s2) … f(sp) ]   (the database as a very tall-and-skinny matrix)

Page 7: Direct tall-and-skinny QR factorizations in MapReduce architectures

Dynamic Mode Decomposition. One simulation, ~10 TB of data; compute the SVD of a space-by-time matrix.

(DMD video)

Page 8: Direct tall-and-skinny QR factorizations in MapReduce architectures

MapReduce. It's a computational model and a framework.

Page 9: Direct tall-and-skinny QR factorizations in MapReduce architectures

MapReduce


Page 10: Direct tall-and-skinny QR factorizations in MapReduce architectures

The MapReduce Framework. Originated at Google for indexing web pages and computing PageRank.

Express algorithms in data-local operations. Implement one type of communication: shuffle. Shuffle moves all data with the same key to the same reducer.

(Diagram: maps read input splits stored in triplicate; map output is persisted to disk before the shuffle; reduce input/output is on disk.)

Data scalable. Fault-tolerance by design.

Page 11: Direct tall-and-skinny QR factorizations in MapReduce architectures

Computing variance in MapReduce

(Diagram: three runs, each with time steps T=1, T=2, T=3.)

Page 12: Direct tall-and-skinny QR factorizations in MapReduce architectures

Mesh point variance in MapReduce

1. Each mapper outputs the mesh points with the same key.
2. Shuffle moves all values from the same mesh point to the same reducer.
3. Reducers just compute a numerical variance.

(Diagram: the three runs, with time steps T=1..3, feed the mappers; reducers receive one mesh point each.)
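The three steps above can be sketched in plain Python; the record format and helper names here are invented for illustration, and an in-memory dict stands in for the shuffle:

```python
from collections import defaultdict
import statistics

# Synthetic input: ((run, time, mesh_point), value) records for 3 runs.
records = [((run, t, pt), float(run + t + pt))
           for run in range(3) for t in range(3) for pt in range(4)]

# 1. Map: re-key every value by its mesh point.
def mapper(key, value):
    run, t, pt = key
    yield pt, value

# 2. Shuffle: group all values with the same key (mesh point) together.
groups = defaultdict(list)
for key, value in records:
    for out_key, out_value in mapper(key, value):
        groups[out_key].append(out_value)

# 3. Reduce: one numerical variance per mesh point.
variances = {pt: statistics.pvariance(vals) for pt, vals in groups.items()}
print(variances)
```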

Page 13: Direct tall-and-skinny QR factorizations in MapReduce architectures

MapReduce vs. Hadoop

MapReduce: a computation model with Map (a local data transform), Shuffle (a grouping function), and Reduce (an aggregation).

Hadoop: an implementation of MapReduce using the HDFS parallel file system. Others: Phoenix++, Twister, Google MapReduce, Spark, …

Page 14: Direct tall-and-skinny QR factorizations in MapReduce architectures

Current state of the art for MapReduce QR

MapReduce is often used to compute the principal components of large datasets. These approaches all form the normal-equations matrix AᵀA and work with it.
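Working with the normal equations squares the condition number, since cond(AᵀA) = cond(A)². A small sketch with an illustrative ill-conditioned matrix (a Läuchli-style example, chosen for this demonstration):

```python
import numpy as np

eps = 1e-8
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

# A itself is ill-conditioned but still representable in double precision ...
print(np.linalg.cond(A))        # roughly 1.4e8

# ... but 1 + eps**2 rounds to 1, so A^T A is numerically singular:
# all information about eps is lost before any factorization starts.
print(np.linalg.cond(A.T @ A))
```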

Page 15: Direct tall-and-skinny QR factorizations in MapReduce architectures

MapReduce is great for TSQR! You don’t need AᵀA.

Data: a tall-and-skinny (TS) matrix, stored by rows.
Input: a 500,000,000-by-50 matrix; each record is one 1-by-50 row; HDFS size 183.6 GB.

Time to read A: 253 sec. Time to write A: 848 sec.
Time to compute R in qr(A): 526 sec.; with Q = AR⁻¹: 1618 sec.
Time to compute Q in qr(A) (numerically stable): 3090 sec.

Page 16: Direct tall-and-skinny QR factorizations in MapReduce architectures

Tall-and-Skinny QR


Page 17: Direct tall-and-skinny QR factorizations in MapReduce architectures

Communication-avoiding QR (Demmel et al. 2008). Communication-avoiding TSQR.

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.

First, do QR factorizations of each local matrix Ai.

Second, compute a QR factorization of the new "R" (the stacked local R factors).

Page 18: Direct tall-and-skinny QR factorizations in MapReduce architectures

Serial QR factorizations (Demmel et al. 2008). Fully serial TSQR.

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.

Compute the QR of A1; read A2 and update the QR; read A3 and update; …

Page 19: Direct tall-and-skinny QR factorizations in MapReduce architectures

Tall-and-skinny matrix storage in MapReduce

MapReduce matrix storage: the key is an arbitrary row-id; the value is the 1-by-n array for a row. Each submatrix Ai is an input split.

You can also store multiple rows together. It goes a little faster.

Page 20: Direct tall-and-skinny QR factorizations in MapReduce architectures

(Diagram: Mapper 1 runs serial TSQR on blocks A1–A4, repeatedly factoring the current R stacked on the next block, and emits R4; Mapper 2 does the same on A5–A8 and emits R8; Reducer 1 runs serial TSQR on R4 and R8 and emits the final R.)

Algorithm
Data: rows of a matrix
Map: QR factorization of rows
Reduce: QR factorization of rows
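The map/reduce dataflow above can be sketched in numpy within a single process; the block count and matrix sizes are arbitrary choices for the example:

```python
import numpy as np

def local_r(block):
    # Map step: keep only the R factor of a local QR factorization.
    return np.linalg.qr(block, mode='r')

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 10))
blocks = np.split(A, 4)                 # four "input splits"

# Map: one small R per block; Reduce: QR of the stacked R factors.
rs = [local_r(b) for b in blocks]
R = np.linalg.qr(np.vstack(rs), mode='r')

# Same R, up to the signs of its rows, as a direct QR of all of A.
R_direct = np.linalg.qr(A, mode='r')
print(np.allclose(np.abs(R), np.abs(R_direct)))
```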

Page 21: Direct tall-and-skinny QR factorizations in MapReduce architectures

Key limitation: computes only R and not Q.

Can get Q via Q = AR⁻¹ with another MapReduce iteration. Numerical stability is dubious (‖QᵀQ − I‖ is large), although iterative refinement helps.
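A small numpy experiment illustrating the stability issue; the test matrices are synthetic, built to have a prescribed condition number (in the MapReduce setting R would come from TSQR, but the arithmetic effect is the same):

```python
import numpy as np

rng = np.random.default_rng(2)

def ar_inv_orthogonality_loss(kappa, m=500, n=10):
    # Build a tall-and-skinny A with condition number ~kappa.
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = U @ np.diag(np.logspace(0, np.log10(kappa), n)) @ V.T
    # Form Q = A R^{-1} and measure how far it is from orthogonal.
    R = np.linalg.qr(A, mode='r')
    Q = A @ np.linalg.inv(R)
    return np.linalg.norm(Q.T @ Q - np.eye(n))

print(ar_inv_orthogonality_loss(1e2))    # small: well-conditioned A is fine
print(ar_inv_orthogonality_loss(1e12))   # large: orthogonality is lost
```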

Page 22: Direct tall-and-skinny QR factorizations in MapReduce architectures

Achieving numerical stability

(Plot: ‖QᵀQ − I‖ against condition numbers from 10⁵ to 10²⁰, for three methods: AR⁻¹, AR⁻¹ + iterative refinement, and Direct TSQR.)

Page 23: Direct tall-and-skinny QR factorizations in MapReduce architectures

Why MapReduce?


Page 24: Direct tall-and-skinny QR factorizations in MapReduce architectures

Full code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer:
            self.__call__ = self.reducer
        else:
            self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values:
            self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)

Page 25: Direct tall-and-skinny QR factorizations in MapReduce architectures

Performance results (simulated faults)

Fault injection

(Plot: time to completion (sec), from 100 to 200, against 1/Prob(failure), the mean number of successes per failure, from 10 to 1000; series: faults and no faults for 200M-by-200 and 800M-by-10 matrices.)

We can still run with a high fault probability with only a modest performance penalty; however, even when the fault probability is small, we still see a performance hit.

With 1/5 tasks failing, the job only takes twice as long.

Page 26: Direct tall-and-skinny QR factorizations in MapReduce architectures

How to get Q?


Page 27: Direct tall-and-skinny QR factorizations in MapReduce architectures

Idea 1 (unstable)

(Diagram: TSQR computes R from the blocks A1–A4; R is distributed back to the mappers, and each mapper forms Qi = Ai R⁻¹.)

Page 28: Direct tall-and-skinny QR factorizations in MapReduce architectures

Idea 2 (better)

(Diagram: as in Idea 1, TSQR computes R, R is distributed, and each mapper forms Qi = Ai R⁻¹; then the whole process is repeated on Q: TSQR of the Qi computes T, T is distributed, and each mapper forms Qi T⁻¹.)

There's a famous quote, attributed to Parlett, that "two iterations of iterative refinement are enough."
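Idea 2 in numpy on a synthetic ill-conditioned matrix (sizes and conditioning chosen for illustration); the second pass plays the role of the second MapReduce iteration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic tall-and-skinny A with condition number ~1e12.
U, _ = np.linalg.qr(rng.standard_normal((500, 10)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
A = U @ np.diag(np.logspace(0, 12, 10)) @ V.T

def loss(Q):
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))

# Pass 1: Q0 = A R^{-1}; orthogonality suffers because A is ill-conditioned.
R = np.linalg.qr(A, mode='r')
Q0 = A @ np.linalg.inv(R)

# Pass 2 (refinement): repeat on Q0, which is already nearly orthogonal,
# hence well conditioned, so Q = Q0 T^{-1} is orthogonal to working precision.
T = np.linalg.qr(Q0, mode='r')
Q = Q0 @ np.linalg.inv(T)

print(loss(Q0), loss(Q))
```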

Page 29: Direct tall-and-skinny QR factorizations in MapReduce architectures

Communication-avoiding QR (Demmel et al. 2008). Communication-avoiding TSQR.

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.

First, do QR factorizations of each local matrix Ai.

Second, compute a QR factorization of the new "R" (the stacked local R factors).

Page 30: Direct tall-and-skinny QR factorizations in MapReduce architectures

Idea 3 (best!)

(Diagram: each mapper computes a local factorization Ai = Qi Ri, writing Qi and Ri to separate files; a single task collects the Ri, computes the QR of the stacked R factors, and splits the resulting small orthogonal factor into pieces Qi1; a final map multiplies each stored Qi by its piece Qi1 to form the true Q.)

1. Output local Q and R in separate files.
2. Collect R on one node, compute Qs for each piece.
3. Distribute the pieces of Q*1 and form the true Q.
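Steps 1–3 above, sketched in numpy within one process (block count and sizes are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((1000, 10))
blocks = np.split(A, 4)

# Step 1: each "mapper" keeps its local Q_i and emits its local R_i.
local_qs, local_rs = zip(*[np.linalg.qr(b) for b in blocks])

# Step 2: one task stacks the R_i, computes their QR, and splits the
# small orthogonal factor into one piece Q_i1 per original block.
Q_small, R = np.linalg.qr(np.vstack(local_rs))
pieces = np.split(Q_small, 4)

# Step 3: each block of the true Q is the stored local Q_i times Q_i1.
Q = np.vstack([Qi @ Pi for Qi, Pi in zip(local_qs, pieces)])

print(np.linalg.norm(Q.T @ Q - np.eye(10)))  # orthogonal to machine precision
print(np.linalg.norm(Q @ R - A))             # and Q R reproduces A
```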

Page 31: Direct tall-and-skinny QR factorizations in MapReduce architectures

The price is right!

(Plot: running times in seconds, from 500 to 2500.)

Full TSQR is faster than refinement for few columns, and not any slower for many columns.

Page 32: Direct tall-and-skinny QR factorizations in MapReduce architectures

What can we do now?


Page 33: Direct tall-and-skinny QR factorizations in MapReduce architectures

PCA of 80,000,000 images

(Pipeline: A is an 80,000,000-images-by-1000-pixels matrix X. MapReduce stage: zero-mean the rows, then TSQR produces R. Post-processing: an SVD of R gives the singular values and V.)

First 16 columns of V as images. Top 100 singular values (principal components).
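The pipeline above at toy scale; the data here is random, standing in for the image matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((2000, 30))   # stand-in for the images-by-pixels matrix

# MapReduce stage: center the data, then keep R from a (TS)QR.
Xc = X - X.mean(axis=0)
R = np.linalg.qr(Xc, mode='r')        # in the talk this R comes from TSQR

# Post-processing: a small SVD of the 30-by-30 R gives the singular
# values and right singular vectors (principal components) of Xc.
_, svals, Vt = np.linalg.svd(R)

# They match an SVD of the full centered matrix.
svals_direct = np.linalg.svd(Xc, compute_uv=False)
print(np.allclose(svals, svals_direct))
```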

Page 34: Direct tall-and-skinny QR factorizations in MapReduce architectures

A Large Scale Example

Nonlinear heat transfer model: 80k nodes, 300 time-steps, 104 basis runs. SVD of a 24M-by-104 data matrix. 500x reduction in wall-clock time (100x including the SVD).

Page 35: Direct tall-and-skinny QR factorizations in MapReduce architectures

What’s next? Investigate randomized algorithms for computing SVDs for fatter matrices.


Algorithm: Randomized PCA

Given an m × n matrix A, the number k of principal components, and an exponent q, this procedure computes an approximate rank-2k factorization UΣV*. The columns of U estimate the first 2k principal components of A.

Stage A:
1. Generate an n × 2k Gaussian test matrix Ω.
2. Form Y = (AA*)^q AΩ by multiplying alternately with A and A*.
3. Construct a matrix Q whose columns form an orthonormal basis for the range of Y.

Stage B:
1. Form B = Q*A.
2. Compute an SVD of the small matrix: B = ŨΣV*.
3. Set U = QŨ.

The singular spectrum of the data matrix often decays quite slowly. To address this difficulty, we incorporate q steps of a power iteration, where q = 1, 2 is usually sufficient in practice. The complete scheme appears above as the Randomized PCA algorithm. For refinements, see the discussion in §§4–5 of the paper.

This procedure requires only 2(q + 1) passes over the matrix, so it is efficient even for matrices stored out-of-core. The flop count is

T_randPCA ≈ q k T_mult + k²(m + n),

where T_mult is the cost of a matrix–vector multiply with A or A*. We have the following theorem on the performance of this method, which is a consequence of Corollary 10.10.

Theorem 1.2. Suppose that A is a real m × n matrix. Select an exponent q and a target number k of principal components, where 2 ≤ k ≤ 0.5 min{m, n}. Execute the randomized PCA algorithm to obtain a rank-2k factorization UΣV*. Then

E ‖A − UΣV*‖ ≤ [1 + 4 √( 2 min{m, n} / (k − 1) )]^{1/(2q+1)} σ_{k+1},   (1.8)

where E denotes expectation with respect to the random test matrix and σ_{k+1} is the (k + 1)th singular value of A.

This result is new. Observe that the bracket in (1.8) is essentially the same as the bracket in the basic error bound (1.6). We find that the power iteration drives the leading constant to one exponentially fast as the power q increases. Since the rank-k approximation of A can never achieve an error smaller than σ_{k+1}, the randomized procedure estimates 2k principal components that carry essentially as much variance as the first k actual principal components. Truncating the approximation to the first k terms typically produces very accurate results.

Halko, Martinsson, Tropp. SIREV 2011
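A direct numpy transcription of the two-stage scheme above (matrix sizes and the random seed are arbitrary choices for the example):

```python
import numpy as np

def randomized_pca(A, k, q=1, seed=0):
    # Rank-2k approximate SVD of A, following the Randomized PCA scheme.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Stage A: Gaussian test matrix, q power-iteration steps, orthonormal basis.
    Omega = rng.standard_normal((n, 2 * k))
    Y = A @ Omega
    for _ in range(q):                   # Y = (A A^T)^q A Omega
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # Stage B: SVD of the small matrix B = Q^T A, then U = Q @ Uhat.
    B = Q.T @ A
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Uhat, s, Vt

rng = np.random.default_rng(6)
A = rng.standard_normal((500, 100)) @ rng.standard_normal((100, 80))
U, s, Vt = randomized_pca(A, k=10)
err = np.linalg.norm(A - (U * s) @ Vt, 2)
print(err / np.linalg.svd(A, compute_uv=False)[10])
```

Per Theorem 1.2, the printed ratio of the approximation error to σ_{k+1} should be a modest constant.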

Page 36: Direct tall-and-skinny QR factorizations in MapReduce architectures


Questions?

Most recent code at http://github.com/arbenson/mrtsqr