Stochastic SVD on Hadoop Shannon Quinn (with thanks to Gunnar Martinsson and Nathan Halko of UC Boulder, and Joel Tropp of CalTech)

Jan 04, 2016

Adelia Patrick
Transcript
Page 1

Stochastic SVD on Hadoop

Shannon Quinn (with thanks to Gunnar Martinsson and Nathan Halko of UC Boulder, and Joel Tropp of CalTech)

Page 2

Lecture breakdown

• Part I: Stochastic SVD

• Part II: Distributed stochastic SVD

Page 3

Part I: Stochastic SVD

Page 4

Basic goal

• Matrix A
– Find a low-rank approximation of A
– Basic dimensionality reduction

Preconditioning

Page 5

Basic algorithm

• INPUT: A, k, p
• OUTPUT: Q
1. Draw Gaussian n x (k + p) test matrix Ω
2. Form product Y = AΩ
3. Orthogonalize columns of Y to obtain Q
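The three steps above can be sketched in NumPy. This is a minimal in-memory illustration, not the Hadoop implementation; the function name `randomized_range_finder`, the default `p=10`, the seeds, and the rank-5 test matrix are all assumptions made for the example.

```python
import numpy as np

def randomized_range_finder(A, k, p=10, seed=0):
    """Return Q with k + p orthonormal columns that approximately span range(A)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    omega = rng.standard_normal((n, k + p))  # step 1: Gaussian n x (k + p) test matrix
    Y = A @ omega                            # step 2: Y = A * Omega
    Q, _ = np.linalg.qr(Y)                   # step 3: orthonormalize the columns of Y
    return Q

# Illustrative input (an assumption): a 200 x 100 matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))

Q = randomized_range_finder(A, k=5, p=5)
# One simple way to evaluate the basis: how much of A is lost by projecting onto Q.
residual = np.linalg.norm(A - Q @ (Q.T @ A))
```

Because the test matrix has rank 5 and Q has 10 columns, the residual should be at the level of rounding error; for a full-rank A it would instead reflect the singular values that were cut off.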

Page 6

Basic evaluation

Page 7

Approximating the SVD

• INPUT: Q
• OUTPUT: Singular vectors U
1. Form (k + p) x n matrix B = Q^T A
2. Compute SVD of B = Û Σ V^T
3. Compute singular vectors U = Q Û
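A small NumPy sketch of these steps follows. The helper name `svd_from_basis`, the truncation of the result to the leading k components, and the rank-8 test matrix are assumptions for illustration; Q is built here with a Gaussian test matrix and 5 extra columns of oversampling.

```python
import numpy as np

def svd_from_basis(A, Q, k):
    """Approximate the rank-k SVD of A from an orthonormal basis Q of its range."""
    B = Q.T @ A                                            # step 1: small matrix B = Q^T A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)   # step 2: SVD of B
    U = Q @ U_hat                                          # step 3: U = Q * U_hat
    # Keeping only the first k components is an assumption of this sketch.
    return U[:, :k], s[:k], Vt[:k, :]

# Illustrative input (an assumption): a 300 x 120 matrix of exact rank 8.
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 8)) @ rng.standard_normal((8, 120))
Q, _ = np.linalg.qr(A @ rng.standard_normal((120, 8 + 5)))

U, s, Vt = svd_from_basis(A, Q, k=8)
s_exact = np.linalg.svd(A, compute_uv=False)[:8]
```

The only SVD computed is that of B, which has k + p rows instead of the full row count of A; that is what makes the final factorization cheap enough to run in memory.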

Page 8

Demo

Page 9

Empirical Results

• 1000x1000 matrix

Page 10

Power iterations

• Affects decay of eigenvalues / singular values
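To make the effect on decay concrete: for an exponent q, the matrix (A A^T)^q A has the same singular vectors as A but singular values σ_j^(2q+1), so sampling it instead of A suppresses the tail of the spectrum. The sketch below is an in-memory illustration only. The function name, the re-orthonormalization between multiplications (a common numerical-stability choice), and the test matrix with σ_j = 1/j are assumptions.

```python
import numpy as np

def range_finder_power(A, k, p=5, q=0, seed=0):
    """Range finder that samples (A A^T)^q A instead of A."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], k + p))
    Q, _ = np.linalg.qr(Y)
    for _ in range(q):
        # One power iteration: multiply by A^T, then by A,
        # orthonormalizing after each product to limit rounding error.
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# Test matrix (an assumption) with slowly decaying singular values 1, 1/2, 1/3, ...
rng = np.random.default_rng(1)
m, n = 300, 200
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / np.arange(1, n + 1)
A = (U0 * sigma) @ V0.T

def spectral_error(q):
    Q = range_finder_power(A, k=10, q=q)
    return np.linalg.norm(A - Q @ (Q.T @ A), 2)

err_q0, err_q2 = spectral_error(0), spectral_error(2)
```

With slow decay like this, the q = 2 error should land much closer to the best achievable value for a 15-column basis (σ_16 = 1/16) than the q = 0 error does. Each extra iteration costs two more passes over A.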

Page 11

Empirical Results

Page 12

Empirical Results

Page 13

Part II: Distributed SSVD

Page 14

Algorithm Overview

QR factorization

Power iteration

QR factorization

In-core SVD

Page 15

SSVD Primitives

• Matrix-vector multiplication: y = Ax

• (midterm, anyone?)
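As a reminder of how y = Ax can fit the map/reduce pattern, here is a toy single-process simulation. It assumes A is stored by rows as sparse `{column: value}` dictionaries and that x is small enough to be shared with every mapper; the function names and the data are hypothetical.

```python
def map_row_matvec(row_index, sparse_row, x):
    """Mapper: one row of A yields one (index, value) entry of y = A x."""
    yield row_index, sum(value * x[j] for j, value in sparse_row.items())

def reduce_entries(pairs, num_rows):
    """Reducer: collect the emitted entries into the dense vector y."""
    y = [0.0] * num_rows
    for i, value in pairs:
        y[i] += value
    return y

# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 5, 6]]  stored row by row, zeros omitted.
A_rows = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 2: {0: 4.0, 1: 5.0, 2: 6.0}}
x = [1.0, 2.0, 3.0]

pairs = [kv for i, row in A_rows.items() for kv in map_row_matvec(i, row, x)]
y = reduce_entries(pairs, num_rows=3)
print(y)  # [7.0, 6.0, 32.0]
```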

Page 16

SSVD Primitives

• Matrix-matrix multiplication: y = A^T A x

Page 17

Matrix-matrix multiplication

• Very clever use of map/reduce
• Each Mapper outputs:
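The mapper output shown on the slide is not recoverable from this transcript. One common formulation, assumed here, uses the identity A^T A x = Σ_i (a_i · x) a_i over the rows a_i of A: each mapper sees one row, emits the row scaled by its dot product with x, and the reducer sums the partial vectors, so A^T A is never formed. The names and data below are hypothetical.

```python
def map_row_ata(sparse_row, x):
    """Mapper: for one row a_i of A, emit the partial vector (a_i . x) * a_i."""
    dot = sum(value * x[j] for j, value in sparse_row.items())
    return {j: dot * value for j, value in sparse_row.items()}

def reduce_partials(partials, n):
    """Reducer: sum the partial vectors element-wise to obtain y = A^T A x."""
    y = [0.0] * n
    for partial in partials:
        for j, value in partial.items():
            y[j] += value
    return y

# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 5, 6]]  stored row by row, zeros omitted.
A_rows = [{0: 1.0, 2: 2.0}, {1: 3.0}, {0: 4.0, 1: 5.0, 2: 6.0}]
x = [1.0, 2.0, 3.0]

y = reduce_partials((map_row_ata(row, x) for row in A_rows), n=3)
print(y)  # [135.0, 178.0, 206.0]
```

A single pass over the rows of A is enough, which matters when A is too large to transpose or to multiply by itself.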

Page 18

SSVD Primitives

• Distributed orthogonalization: Y = AΩ
– Givens rotation
– Streaming QR
  • Sliding window
– Merge factorizations
  1. Merge R
  2. Merge Q^T
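The merge idea can be illustrated in memory: factor each row block of Y on its own, stack the small R factors, factor the stack once more, and fold the result back into the per-block Q factors. This sketch swaps in NumPy's Householder-based `np.linalg.qr` for the Givens-rotation streaming QR named above, and the function name and `block_height` parameter are assumptions of the example.

```python
import numpy as np

def blockwise_qr(Y, block_height):
    """Factor Y = Q R by factoring row blocks independently and merging the results."""
    blocks = [Y[i:i + block_height] for i in range(0, Y.shape[0], block_height)]
    local = [np.linalg.qr(block) for block in blocks]  # per-block (Q_i, R_i)

    # Merge R: one more QR over the stacked per-block R factors.
    stacked_R = np.vstack([R_i for _, R_i in local])
    Q_merge, R = np.linalg.qr(stacked_R)

    # Merge Q: each block's Q_i is updated with its slice of the merge factor.
    Q_parts, offset = [], 0
    for Q_i, R_i in local:
        rows = R_i.shape[0]
        Q_parts.append(Q_i @ Q_merge[offset:offset + rows])
        offset += rows
    return np.vstack(Q_parts), R

# Illustrative input (an assumption): a tall 1000 x 13 matrix split into 4 blocks.
rng = np.random.default_rng(0)
Y = rng.standard_normal((1000, 13))
Q, R = blockwise_qr(Y, block_height=250)
```

Only the small R factors have to be brought together in one place; the tall Q blocks stay where they were computed, which is what makes the scheme suitable for row-partitioned data.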

Page 19

SSVD

[Diagram: SSVD job flow, stages numbered 1 to 5]

Page 20

1: Q-job

Page 21

2: BT-job

Page 22

3: ABT-job

Page 23

4: U-job

Page 24

5: V-job

Page 25

Mahout SSVD Parameters

Page 26

Block height

Page 27

Power iterations

Page 28

Comparison to Lanczos

Page 29

Comparison to Lanczos

Page 30

Comparison to Lanczos

Page 31

Comparison to Lanczos

Page 32

Datasets

• Wikipedia-all

• Wikipedia-MAX

Page 33

That’s SSVD!

Page 34

Resources

• Randomized methods for computing the SVD of very large matrices
– http://web.stanford.edu/group/mmds/slides2010/Martinsson.pdf

• Randomized methods for computing low-rank approximations of matrices
– https://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf

• SSVD on Mahout
– https://mahout.apache.org/users/algorithms/d-ssvd.html