YOU ARE DOWNLOADING DOCUMENT

Please tick the box to continue:

Transcript
Page 1: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Large-scale Matrix Factorization

Kijung Shin

Ph.D. Student, CSD

Page 2: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap•Matrix Factorization (review)

•Algorithms◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++

• Experiments

• Extension to Tensor Factorization

•Conclusions

Large-scale Matrix Factorization (by Kijung Shin) 2/99

Page 3: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap•Matrix Factorization (review) <<

•Algorithms◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++

• Experiments

• Extension to Tensor Factorization

•Conclusions

Large-scale Matrix Factorization (by Kijung Shin) 3/99

Page 4: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Matrix Factorization: Problem•Given:

◦ 𝑽: 𝑛 by 𝑚 matrix

▪possibly with missing values

◦ 𝑟: target rank (a scalar)

▪usually 𝑟 << 𝑛 and 𝑟 << 𝑚

Large-scale Matrix Factorization (by Kijung Shin) 4/99

𝑚

𝑛

?

?

?

?

𝑽

Page 5: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Matrix Factorization: Problem•Given:

◦ 𝑽: 𝑛 by 𝑚 matrix

▪possibly with missing values

◦ 𝑟: target rank (a scalar)

▪usually 𝑟 << 𝑛 and 𝑟 << 𝑚

• Find:◦ 𝑾: 𝑛 by 𝑟 matrix

◦ 𝑯: 𝑟 by 𝑚 matrix

▪without missing values

Large-scale Matrix Factorization (by Kijung Shin) 5/99

𝑛

𝑟

𝑚

𝑾

𝑯

𝑟

Page 6: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Matrix Factorization: Problem•Goal: 𝑾𝑯 ≈ 𝑽

Large-scale Matrix Factorization (by Kijung Shin) 6/99

𝑾

𝑯

𝑽

≈×?

?

?

?

Page 7: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Loss Function𝐿 𝑉,𝑊,𝐻

= σ 𝑖,𝑗 ∈𝑍 (𝑉𝑖𝑗 − 𝑊𝐻 𝑖𝑗)2 + …

Goal: to make 𝑾𝑯 similar to 𝑽

Large-scale Matrix Factorization (by Kijung Shin) 7/99

Indices ofnon-missing entries

i.e., 𝑖, 𝑗 ∈ 𝑍↔ 𝑉𝑖𝑗 is not missing

(𝑖, 𝑗)-th entry of 𝑊𝐻

(𝑖, 𝑗)-th entry of 𝑉

Page 8: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Loss Function (cont.)𝐿 𝑉,𝑊,𝐻

= σ 𝑖,𝑗 ∈𝑍 (𝑉𝑖𝑗 − 𝑊𝐻 𝑖𝑗)2

+ 𝜆( 𝑊 𝐹2 + 𝐻 𝐹

2)

Goal: to prevent overfitting

(by making the entries of 𝑊 and 𝐻 close to zero)

Large-scale Matrix Factorization (by Kijung Shin) 8/99

Regularization parameter

Frobenius Norm:

𝑊 𝐹2 =

𝑖=1

𝑛

𝑘=1

𝑟

𝑊𝑖𝑘2

Page 9: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Algorithms•How can we minimize this loss function 𝐿?

◦ Stochastic Gradient Descent: SGD (covered in the last lecture)

◦ Alternating Least Square: ALS (covered today)

◦ Cyclic Coordinate Descent: CCD++ (covered today)

•Are these algorithms parallelizable?

• Yes, all of them!

Large-scale Matrix Factorization (by Kijung Shin) 9/99

Page 10: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap•Matrix Factorization (review)

•Algorithms◦ Distributed SGD: DSGD <<

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++

• Experiments

• Extension to Tensor Factorization

•Conclusions

Large-scale Matrix Factorization (by Kijung Shin) 10/99

Page 11: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Distributed SGD (DSGD)

Large-scale matrix factorizationwith distributed

stochastic gradient descent(KDD 2011)

Large-scale Matrix Factorization (by Kijung Shin) 11/99

Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis

Page 12: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Stochastic GD for MF (review)

• Let 𝑊 =

− 𝑊1: −

:− 𝑊𝑛: −

, 𝐻 =

| |𝐻:1 … 𝐻:𝑚| |

•𝐿 𝑉,𝑊,𝐻 : sum of loss for each non-missing entry

𝐿 𝑉,𝑊,𝐻 =

𝑖,𝑗 ∈𝑍

𝐿′ 𝑉𝑖𝑗 ,𝑊𝑖:, 𝐻:𝑗

where loss at each non-missing entry 𝑉𝑖𝑗 is

𝐿′ 𝑉𝑖𝑗 ,𝑊𝑖:, 𝐻:𝑗 ≔ 𝑉𝑖𝑗 −𝑊𝑖:𝐻:𝑗2+ 𝜆

𝑊𝑖:2

𝑁𝑖:+

𝐻:𝑗2

𝑁:𝑗

Large-scale Matrix Factorization (by Kijung Shin) 12/99

= 𝑊𝐻 𝑖𝑗

Number of non-missing entries in the 𝑖-th row of 𝑉

Page 13: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Stochastic GD for MF (cont.)• Stochastic Gradient Descent (SGD) for MF

• repeat until convergence◦ randomly shuffle the non-missing entries of 𝑉

◦ for each non-missing entry:

▪perform an SGD step on it

Large-scale Matrix Factorization (by Kijung Shin) 13/99

Page 14: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Stochastic GD for MF (cont.)•An SGD step on each non-missing entry 𝑉𝑖𝑗:

◦ Read 𝑊𝑖: and 𝐻:𝑗

◦ Compute gradient of 𝐿′ 𝑉𝑖𝑗 ,𝑊𝑖:, 𝐻:𝑗◦ Update 𝑊𝑖: and 𝐻:𝑗▪Detailed update rules were

covered in the last lecture

Large-scale Matrix Factorization (by Kijung Shin) 14/99

𝑽𝑾

𝑯

𝑉𝑖𝑗𝑊𝑖:

𝐻:𝑗

Page 15: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Simple Parallel SGD for MF• Parameter Mixing: MSGD

◦ entries of 𝑉 are distributed across multiple machines

Large-scale Matrix Factorization (by Kijung Shin) 15/99

Machine 1 Machine 2 Machine 3

𝑽 𝑽 𝑽

Page 16: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Simple Parallel SGD for MF (cont.)• Parameter Mixing: MSGD

◦ entries of 𝑉 are distributed across multiple machines

◦ Map step: each machine runs SGD independently on the assigned entries until convergence

Large-scale Matrix Factorization (by Kijung Shin) 16/99

Machine 1 Machine 2 Machine 3

𝑽

𝑾

𝑯

𝑽

𝑾

𝑯

𝑽

𝑾

𝑯

Page 17: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

• Parameter Mixing: MSGD◦ Map step: each machine runs

▪an independent instance of SGD on subsets of the data

▪until convergence

◦ Reduce step: average results (i.e., 𝑾 and 𝑯)

• Problem: does not converge to correct solutions◦ no guarantee that “the average” reduces the loss function 𝐿

Large-scale Matrix Factorization (by Kijung Shin) 17/99

Simple Parallel SGD for MF (cont.)

𝑯𝑯 𝑯 𝑯+ +

Machine 1 Machine 2 Machine 3

3

Page 18: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Simple Parallel SGD for MF (cont.)• Iterative Parameter Mixing: ISGD

◦ entries of 𝑉 are distributed across multiple machines

◦ Repeat until convergence

▪Map step: each machine runs SGD independently on the assigned entries for some time

▪Reduce step:

◦ average results (i.e., 𝑾 and 𝑯)

◦ broadcast averaged results

• Problem: slow convergence◦ still has the averaging step

Large-scale Matrix Factorization (by Kijung Shin) 18/99

Page 19: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Interchangeability•How can we avoid the averaging step?

◦ let different machines update different entries of W and 𝐻

• Two entries 𝑉𝑖𝑗 and 𝑉𝑘𝑙 of 𝑽 are interchangeable if they share neither row nor column

Large-scale Matrix Factorization (by Kijung Shin) 19/99

𝑽𝑾

𝑯

𝑉𝑖𝑗𝑊𝑖:

𝐻:𝑗

𝑉𝑖𝑙

𝐻:𝑙Not interchangeable!

𝑽𝑾

𝑯

𝑉𝑖𝑗𝑊𝑖:

𝐻:𝑗

𝑉𝑘𝑙

𝐻:𝑙

𝑊𝑘:

Interchangeable!

Page 20: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Interchangeability (cont.)• If 𝑉𝑖𝑗 and 𝑉𝑘𝑙 are interchangeable,

• SGD steps on 𝑉𝑖𝑗 and 𝑉𝑘𝑙 can be parallelized safely

Large-scale Matrix Factorization (by Kijung Shin) 20/99

Read 𝑊𝑖: and 𝐻:𝑗Compute gradient of 𝐿′ 𝑊𝑖:, 𝐻:𝑗Update 𝑊𝑖: and 𝐻:𝑗

Read 𝑊𝑘: and 𝐻:𝑙Compute gradient of 𝐿′ 𝑊𝑙:, 𝐻:𝑙Update 𝑊𝑘: and 𝐻:𝑙

no conflicts!

𝑽𝑾

𝑯

𝑉𝑖𝑗𝑊𝑖:

𝐻:𝑗

𝑉𝑘𝑙

𝐻:𝑙

𝑊𝑘:

Page 21: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Distributed SGD •Block 𝑾, 𝑯, and 𝑽

Large-scale Matrix Factorization (by Kijung Shin) 21/99

𝑊(1)

𝑊(2)

𝑊(3)

𝐻(1) 𝐻(2) 𝐻(3)

𝑉(1,1) 𝑉(1,2) 𝑉(1,3)

𝑉(2,1) 𝑉(2,2) 𝑉(2,3)

𝑉(3,1) 𝑉(3,2) 𝑉(3,3)

𝑊

𝐻

𝑉

Page 22: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Distributed SGD (cont.) •Block 𝑾, 𝑯, and 𝑽

• repeat until convergence◦ for a set of interchangeable blocks of 𝑉

▪1. run SGD on the blocks in parallel

◦ no conflict between machines

▪2. merge results (i.e., 𝑾 and 𝑯)

◦ averaging is not needed

Large-scale Matrix Factorization (by Kijung Shin) 22/99

𝑊(1)

𝑊(2)

𝑊(3)

𝐻(1) 𝐻(2) 𝐻(3)

𝑉(1,1)

𝑉(2,2)

𝑉(3,3)Blocks on a diagonal are interchangeable!

Page 23: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Interchangeable Blocks• Example of interchangeable blocks

Large-scale Matrix Factorization (by Kijung Shin) 23/99

𝑊(1)

𝑊(2)

𝑊(3)

𝐻(1) 𝐻(2) 𝐻(3)

𝑉(1,1)

𝑉(2,2)

𝑉(3,3)

Machine 1 Machine 2

Machine 3Blocks are interchangeable!

𝑊(1)

𝐻(1)

𝑉(1,1) 𝑊(2)

𝐻(2)

𝑉(2,2)

𝑊(3)

𝐻(3)

𝑉(3,3)

Page 24: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Interchangeable Blocks (cont.)• Example of interchangeable blocks

Large-scale Matrix Factorization (by Kijung Shin) 24/99

𝑊(1)

𝑊(2)

𝑊(3)

𝐻(1) 𝐻(2) 𝐻(3)

𝑉(1,2)

𝑉(2,3)

𝑉(3,1)

𝑊(1)

𝐻(2)

𝑉(1,2) 𝑊(2)

𝐻(3)

𝑉(2,3)

𝑊(3)

𝐻(1)

𝑉(3,1)

Machine 1 Machine 2

Machine 3Blocks are interchangeable!

Page 25: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Interchangeable Blocks (cont.)• Example of interchangeable blocks

Large-scale Matrix Factorization (by Kijung Shin) 25/99

𝑊(1)

𝑊(2)

𝑊(3)

𝐻(1) 𝐻(2) 𝐻(3)

𝑉(1,3)

𝑉(2,1)

𝑉(3,2)

𝑊(1)

𝐻(3)

𝑉(1,3) 𝑊(2)

𝐻(1)

𝑉(2,1)

𝑊(3)

𝐻(2)

𝑉(3,2)

Machine 1 Machine 2

Machine 3Blocks are interchangeable!

Page 26: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Interchangeable Blocks (cont.)•Not interchangeable blocks

◦ Not used in DSGD

Large-scale Matrix Factorization (by Kijung Shin) 26/99

𝑊(1)

𝑊(2)

𝑊(3)

𝐻(1) 𝐻(2) 𝐻(3)

𝑉(1,3)

𝑉(2,3)

𝑉(3,2)

Blocks are not interchangeable!

Page 27: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

More details of DSGD•How should we choose sets of interchangeable blocks?

◦ In each iteration, we choose sets of interchangeable blocks so that every block of 𝑉 is used exactly once

◦ E.g.,

◦ But the order and grouping are random

• SGD within each block◦ every non-missing entry in the block is used exactly once ◦ but the order is random

Large-scale Matrix Factorization (by Kijung Shin) 27/99

Page 28: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

More details of DSGD (cont.)•Use “bold driver” to set step size (or learning rate)

◦ After each iteration,

▪ increase the step size if the loss decreases

▪decrease the step size if the loss increases

• Implemented Hadoop and R/Snowfall◦ Snowfall: package for parallel R programs

◦ https://cran.r-project.org/web/packages/snowfall/index.html

Large-scale Matrix Factorization (by Kijung Shin) 28/99

Page 29: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Pros & Cons of DSGD• Pros:

◦ Faster convergence than MSGD and ISGD▪no averaging step

◦ Memory efficiency: each machine needs to maintain a single block of W and a single block of H in memory

•Cons:◦ Many hyperparameters: step size (𝜖) in addition to

regularization parameter (𝜆) and rank (𝑟)

Large-scale Matrix Factorization (by Kijung Shin) 29/99

𝑊(1)

𝐻(1)

𝑉(1,1) 𝑊(2)

𝐻(2)

𝑉(2,2) 𝑊(3)

𝐻(3)

𝑉(3,3)

Machine 1 Machine 2 Machine 3

Page 30: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap•Matrix Factorization (review)

•Algorithms◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS <<

◦ Cyclic Coordinate Descent: CCD++

• Experiments

• Extension to Tensor Factorization

•Conclusions

Large-scale Matrix Factorization (by Kijung Shin) 30/99

Page 31: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Alternating Least Square

Large-scale Parallel Collaborative Filtering for the Netflix Prize

(AAIM 2008)

Large-scale Matrix Factorization (by Kijung Shin) 31/99

Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan

Page 32: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Main Idea behind ALS•How hard is matrix factorization?

◦ i.e., finding argmin𝑾,𝑯

𝐿(𝑉,𝑊,𝐻)

• Solving the entire problem is difficult◦ 𝐿 𝑉,𝑊,𝐻 is a non-convex function of 𝑾 and 𝑯

▪many local optima

◦ finding a global optimum is NP-hard

Large-scale Matrix Factorization (by Kijung Shin) 32/99

Page 33: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Main Idea behind ALS (cont.)•How hard is solving a smaller problem?

◦ Specifically, findingargmin

𝑯𝐿(𝑉,𝑊,𝐻)

while fixing 𝑾 to its current value

•Much easier!◦ 𝐿 𝑉,𝑊,𝐻 is a convex function of 𝑯 (once 𝑾 is fixed)

▪one local optimum, which is also globally optimal

◦ Moreover, there exists the closed-form solution

▪we can direct compute the global optimum

Large-scale Matrix Factorization (by Kijung Shin) 33/99

Page 34: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating H• Finding

argmin𝑯

𝐿(𝑉,𝑊,𝐻)

while fixing 𝑾 to its current value

• This problem is solved by the following update rule

◦ derived from 𝜕𝐿

𝜕𝐻𝑘𝑗= 0, ∀𝑘, ∀𝑗 (first-order condition)

Large-scale Matrix Factorization (by Kijung Shin) 34/99

For j = 1…𝑚

𝐻:𝑗 ←

𝑖∈𝑍:𝑗

𝑊𝑖: 𝑊𝑖:𝑇 + 𝜆𝐼𝑘

−1

𝑖∈𝑍:𝑗

𝑉𝑖𝑗𝑊𝑖:

row indices of non-missing entries in the 𝑗-th column of 𝑽

𝑘 by 𝑘 identity matrix

Page 35: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating W• Likewise, finding

argmin𝑾

𝐿(𝑉,𝑊,𝐻)

while fixing 𝑯 to its current value

• This problem is solved by the following update rule

◦ derived from 𝜕𝐿

𝜕𝑊𝑖𝑘= 0, ∀𝑖, ∀𝑘 (first-order condition)

Large-scale Matrix Factorization (by Kijung Shin) 35/99

For i = 1…𝑛

𝑊𝑖: ←

𝑖∈𝑍𝑖:

𝐻:𝑗𝑇𝐻:𝑗 + 𝜆𝐼𝑘

−1

𝑗∈𝑍𝑖:

𝑉𝑖𝑗𝐻:𝑗

column indices of non-missing entries in the 𝑖-th row of 𝑽

Page 36: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Alternating Least Square (ALS)randomly Initialize 𝑾 and 𝑯

repeat until convergenceupdate 𝑾 while fixing 𝑯 to its current value

update 𝑯 while fixing 𝑾 to its current value

Large-scale Matrix Factorization (by Kijung Shin) 36/99

• Each step never increases 𝐿 𝑉,𝑊,𝐻◦ 𝑾 is updated to the “best” 𝑾 that minimizes 𝐿 𝑉,𝑊,𝐻

for current 𝑯◦ 𝑯 is updated to the “best” 𝑯 that minimizes 𝐿 𝑉,𝑊,𝐻

for current 𝑾

•𝐿 𝑉,𝑊,𝐻 monotonically decreases until convergence

Page 37: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel ALS: Idea•Recall the update rule for each 𝑗-th column of 𝑯

Large-scale Matrix Factorization (by Kijung Shin) 37/99

𝐻:𝑗 ←

𝑖∈𝑍:𝑗

𝑊𝑖: 𝑊𝑖:𝑇 + 𝜆𝐼𝑘

−1

𝑖∈𝑍:𝑗

𝑉𝑖𝑗𝑊𝑖:

𝑽𝑾

𝑯

𝑉:𝑗

𝐻:𝑗Read Written

𝑽𝑾

𝑯

Page 38: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel ALS: Idea (cont.)• Potentially every entry of 𝑾 needs to be read

• Only the 𝑗-th column of 𝑽 needs to be read

• 𝑯 does not need to be read⇒ can update the columns of 𝑯 independently in parallel

Large-scale Matrix Factorization (by Kijung Shin) 38/99

𝑽𝑾

𝑯

𝑉:𝑗

𝐻:𝑗Read Written

𝑽𝑾

𝑯

Page 39: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating H in Parallel•A toy example with 3 machines

• 1. Divide 𝑽 column-wise into 3 pieces

Large-scale Matrix Factorization (by Kijung Shin) 39/99

𝑽(𝟏) 𝑽(𝟐) 𝑽(𝟑)𝑾

Page 40: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating H in Parallel (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 40/99

• 2. Distribute the pieces across the machines

• 3. 𝑾 is broadcast to every machine

𝑽(𝟏) 𝑽(𝟐) 𝑽(𝟑)𝑾 𝑾 𝑽(𝟏) 𝑾 𝑽(𝟐) 𝑾 𝑽(𝟑)

Machine 1 Machine 2 Machine 3

Page 41: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating H in Parallel (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 41/99

• 4. Compute the corresponding columns of 𝑯 in parallel

𝑽(𝟏) 𝑽(𝟐) 𝑽(𝟑)𝑾 𝑾 𝑽(𝟏) 𝑾 𝑽(𝟐) 𝑾 𝑽(𝟑)

Machine 1 Machine 2 Machine 3

𝑯(𝟏) 𝑯(𝟐) 𝑯(𝟑)New! New! New!

Page 42: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating H in Parallel (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 42/99

Machine 1 Machine 2 Machine 3

• 5. Broadcast the part of 𝑯 that each machine computes

𝑾 𝑽(𝟏) 𝑾 𝑽(𝟐) 𝑾 𝑽(𝟑)

𝑯(𝟏) 𝑯(𝟐) 𝑯(𝟑) 𝑯(𝟏) 𝑯(𝟐) 𝑯(𝟑) 𝑯(𝟏) 𝑯(𝟐) 𝑯(𝟑)

Page 43: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel ALS: Idea•Recall the update rule for each row of 𝑾

Large-scale Matrix Factorization (by Kijung Shin) 43/99

𝑊𝑖: ←

𝑖∈𝑍𝑖:

𝐻:𝑗𝑇𝐻:𝑗 + 𝜆𝐼𝑘

−1

𝑗∈𝑍𝑖:

𝑉𝑖𝑗𝐻:𝑗

𝑽𝑾

𝑯

𝑉𝑖:

Read Written

𝑽𝑾

𝑯

𝑊𝑖:

Page 44: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel ALS: Idea (cont.)• Potentially every entry of 𝑯 needs to be read

• Only the 𝑖-th row of 𝑽 needs to be read

•𝑾 does not need to be read⇒ can update the rows of 𝑾 independently in parallel

Large-scale Matrix Factorization (by Kijung Shin) 44/99

𝑽𝑾

𝑯

𝑉𝑖:

Read Written

𝑽𝑾

𝑯

𝑊𝑖:

Page 45: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating W in Parallel•A toy example with 3 machines

• 1. Divide 𝑽 row-wise into 3 pieces

Large-scale Matrix Factorization (by Kijung Shin) 45/99

𝑽(𝟏)

𝑯

𝑽(𝟐)

𝑽(𝟑)

Page 46: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating W in Parallel (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 46/99

Machine 1 Machine 2 Machine 3

• 2. Distribute the pieces across the machines

• 3. 𝑯 is broadcast to every machine

𝑯 𝑯 𝑯

𝑽(𝟏) 𝑽(𝟐) 𝑽(𝟑)

𝑽(𝟏)

𝑯

𝑽(𝟐)

𝑽(𝟑)

Page 47: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating W in Parallel (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 47/99

• 4. Compute the corresponding rows of 𝑾 in parallel

Machine 1 Machine 2 Machine 3

𝑯 𝑯 𝑯

𝑽(𝟏) 𝑽(𝟐) 𝑽(𝟑)𝑾(𝟐)𝑾(𝟏) 𝑽(𝟑)

New! New! New!

Page 48: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Updating W in Parallel (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 48/99

Machine 1 Machine 2 Machine 3

• 5. Broadcast the part of 𝑾 that each machine computes

𝑯 𝑯 𝑯

𝑽(𝟏) 𝑽(𝟐) 𝑽(𝟑)

𝑾(𝟏)

𝑾(𝟐)

𝑾(𝟑)

𝑾(𝟏)

𝑾(𝟐)

𝑾(𝟑)

𝑾(𝟏)

𝑾(𝟐)

𝑾(𝟑)

Page 49: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Pros & Cons of ALS• Pros:

◦ Less hyper-parameters: not requiring step size

•Cons:◦ High computational cost: e.g., matrix inversion takes O(𝑟3)

Large-scale Matrix Factorization (by Kijung Shin) 49/99

𝐻:𝑗 ←

𝑖∈𝑍:𝑗

𝑊𝑖: 𝑊𝑖:𝑇 + 𝜆𝐼𝑘

−1

𝑖∈𝑍:𝑗

𝑉𝑖𝑗𝑊𝑖:

Page 50: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Pros & Cons of ALS (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 50/99

𝑾 𝑽(𝟏) 𝑾 𝑽(𝟐) 𝑾 𝑽(𝟑)

Machine 1 Machine 2 Machine 3

•Cons (cont.)◦ High memory requirement:

▪while updating 𝑾(or 𝑯),

▪each machine maintains the entire 𝑯(or 𝑾) in memory

Page 51: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap•Matrix Factorization (review)

•Algorithms◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++ <<

• Experiments

• Extension to Tensor Factorization

•Conclusions

Large-scale Matrix Factorization (by Kijung Shin) 51/99

Page 52: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Cyclic Coordinate Descent

Scalable Coordinate Descent Approaches

to Parallel Matrix Factorization

for Recommender Systems

(ICDM 2012)

Large-scale Matrix Factorization (by Kijung Shin) 52/99

Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, and Inderjit Dhillon

Best Paper Award

Page 53: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Matrix Factorization: Revisit

Large-scale Matrix Factorization (by Kijung Shin) 53/99

•Goal: 𝑽 is approximated by the product of 𝑾 and 𝑯

𝑾

𝑯

𝑽

≈ ×?

?

?

?

𝑛

𝑟 𝑚

𝑟

𝑛

𝑚

Page 54: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Matrix Factorization: Revisit

Large-scale Matrix Factorization (by Kijung Shin) 54/99

•Goal: 𝑽 is approximated by the product of 𝑾 and 𝑯

𝑾

𝑯

𝑽

≈ ×?

?

?

?

𝑛

𝑟 𝑚

𝑟𝑊𝑖:

𝐻:𝑗

𝑉𝑖𝑗 ≈ 𝑊𝐻 𝑖𝑗

= σ𝑘=1𝑟 𝑊𝑖𝑘𝐻𝑘𝑗 = 𝑊𝑖1𝐻1𝑗 +⋯+𝑊𝑖𝑟𝐻𝑟𝑗

𝑛

𝑚

𝑉𝑖𝑗

Page 55: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Matrix Factorization: Revisit

Large-scale Matrix Factorization (by Kijung Shin) 55/99

• Equivalently, 𝑽 is approximated by the sum of 𝑟 matrices

𝑽

≈ ×?

?

?

?

𝑊:1

𝐻1:

×

𝑊:𝑟

𝐻𝑟:

+⋯+

𝑉𝑖𝑗 ≈ 𝑊:1𝐻1: 𝑖𝑗 +⋯+ 𝑊:𝑟𝐻𝑟: 𝑖𝑗 = 𝑊𝑖1𝐻1𝑗 +⋯+𝑊𝑖𝑟𝐻𝑟𝑗

𝑊:1𝐻1: 𝑖𝑗 = 𝑊𝑖1𝐻1𝑗

𝑊:𝑟𝐻𝑟:

𝑊:𝑟𝐻𝑟: 𝑖𝑗 = 𝑊𝑖𝑟𝐻𝑟𝑗

1st column of 𝑊

1st row of 𝐻

rth column of 𝑊

rth row of 𝐻

Page 56: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Cyclic Coordinate Descent: CCD++• repeat until convergence

◦ for 𝑘 = 1…𝑟

update the 𝑘-th column of 𝑾 and the 𝑘-th row of 𝑯

while fixing the other entries of 𝑾 and 𝑯

Large-scale Matrix Factorization (by Kijung Shin) 56/99

•Consider a toy example with rank 3

𝑽

≈ ×

𝑊:1

+ × + ×

𝐻1:

𝑊:2

𝐻2:

𝑊:3

𝐻3:

Step 1:

Page 57: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Large-scale Matrix Factorization (by Kijung Shin) 57/99

𝑽

≈ ×

𝑊:1

+ × + ×

𝐻1:

𝑊:2

𝐻2:

𝑊:3

𝐻3:

𝑽

≈ ×

𝑊:1

+ × + ×

𝐻1:

𝑊:2

𝐻2:

𝑊:3

𝐻3:

Step 2:

Step 3:

Order of Updates in CCD++

Page 58: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Performing one update• Let 𝑊:1 and 𝐻1: be the updated row and column

Large-scale Matrix Factorization (by Kijung Shin) 58/99

𝑽

≈ ×

𝑊:1

𝐻1:

− × − ×

𝑊:2

𝐻2:

𝑊:3

𝐻3:

residual matrix 𝑹

𝑽

≈ ×

𝑊:1

+ × + ×

𝐻1:

𝑊:2

𝐻2:

𝑊:3

𝐻3:

Page 59: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Performing one update (cont.)

• Updating 𝑊:1 and 𝐻1: equals to factorizing 𝑹 with rank 1

• 𝑹 = 𝑽 −𝑊:2𝐻2: −𝑊:3𝐻:3

= 𝑽 −𝑊:1𝐻1: −𝑊:2𝐻2: −𝑊:3𝐻:3 +𝑊:1𝐻1:

= 𝑽 −𝑾𝑯+𝑊:1𝐻1:

Large-scale Matrix Factorization (by Kijung Shin) 59/99

𝑽

≈ ×

𝑊:1

𝐻1:

− × − ×

𝑊:2

𝐻2:

𝑊:3

𝐻3:

residual matrix 𝑹

Page 60: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Performing one update (cont.)• Factorizing 𝑹 with rank 1 can be performed using ALS

•with the following rules:◦ The updates rules can be derived from those in ALS

Large-scale Matrix Factorization (by Kijung Shin) 60/99

For i = 1…𝑛

𝑊𝑖𝑘 ← ൙σ𝑗∈𝑍𝑖:

𝑅𝑖𝑗𝐻𝑘𝑗

σ𝑗∈𝑍𝑖:𝐻𝑘𝑗

2+ 𝜆

For j = 1…𝑚

𝐻𝑘𝑗 ← ൙σ𝑖∈𝑍:𝑗

𝑅𝑖𝑗𝑊𝑖𝑘

σ𝑖∈𝑍:𝑗𝑊𝑖𝑘

2 + 𝜆

Page 61: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Pseudocode of CCD++

Large-scale Matrix Factorization (by Kijung Shin) 61/99

randomly initialize 𝑾 and 𝑯

repeat until convergencefor 𝑘 = 1…𝑟:𝑹 ← 𝑽 −𝑾𝑯+𝑊:𝑘𝐻𝑘:update 𝑊:𝑘 and 𝐻𝑘:

(rank-1 factorization of 𝑹)

randomly initialize 𝑾 and 𝑯

𝑹 ← 𝑽 −𝑾𝑯

repeat until convergencefor 𝑘 = 1…𝑟:𝑹 ← 𝑹 +𝑊:𝑘𝐻𝑘:update 𝑊:𝑘 and 𝐻𝑘:(rank-1 factorization of 𝑹)

𝑹 ← 𝑹 +𝑊:𝑘𝐻𝑘:Repeated!!

Maintained!!

Pseudocode (Simple) Pseudocode (Detailed)

Page 62: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel CCD++ •Updating 𝐻𝑘: in parallel

◦ 1. divide 𝑹 column-wise

◦ 2. distribute the pieces across the machines

◦ 3. 𝑊:𝑘 is broadcast to every machine

Large-scale Matrix Factorization (by Kijung Shin) 62/99

𝑊:𝑘

𝑹(𝟏) 𝑹(𝟐) 𝑹(𝟑)

Machine 1 Machine 2 Machine 3

𝑊:𝑘 𝑊:𝑘

𝑹(𝟏) 𝑹(𝟐) 𝑹(𝟑)

𝑊𝐾:

𝐻𝑘:

Page 63: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel CCD++ (cont.) •Updating 𝐻𝑘: in parallel (cont.)

◦ 4. compute the corresponding entries of 𝐻𝑘: in parallel

Large-scale Matrix Factorization (by Kijung Shin) 63/99

𝑊:𝑘

𝑹(𝟏) 𝑹(𝟐) 𝑹(𝟑)

Machine 1 Machine 2 Machine 3

𝐻 1𝑘:

New!

𝑊:𝑘 𝑊:𝑘

𝐻 2𝑘:

New!

𝐻 3𝑘:

New!

Page 64: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel CCD++ (cont.) •Updating 𝐻𝑘: in parallel (cont.)

◦ 5. send the computed entries to the other machines.

Large-scale Matrix Factorization (by Kijung Shin) 64/99

𝑊:𝑘

𝑹(𝟏) 𝑹(𝟐) 𝑹(𝟑)

Machine 1 Machine 2 Machine 3

𝑊:𝑘 𝑊:𝑘

𝐻𝑘: 𝐻𝑘: 𝐻𝑘:

Page 65: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Pros & Cons of CCD++• Pros:

◦ Low computational cost

▪update rules do not need matrix inversion

Large-scale Matrix Factorization (by Kijung Shin) 65/99

𝐻𝑘𝑗 ← ൙σ𝑖∈𝑍:𝑗

𝑅𝑖𝑗𝑊𝑖𝑘

σ𝑖∈𝑍:𝑗𝑊𝑖𝑘

2 + 𝜆𝑊𝑖𝑘 ← ൙

σ𝑗∈𝑍𝑖:𝑅𝑖𝑗𝐻𝑘𝑗

σ𝑗∈𝑍𝑖:𝐻𝑘𝑗

2+ 𝜆

Page 66: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Pros & Cons of CCD++• Pros (cont.):

◦ Low memory requirements

▪each machine needs to maintain

▪one row of 𝑯 (or one column of 𝑾) in memory at a time

▪ instead of entire 𝑯 (or𝑾)

Large-scale Matrix Factorization (by Kijung Shin) 66/99

Machine 1 in ALS

𝑾 𝑽(𝟏)

Machine 1 in CCD++

𝑊:𝑘

𝑹(𝟏)

Page 67: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Pros & Cons of CCD++•Cons:

◦ Slow for dense matrices with many non-missing entries

▪To update 𝑹 , every non-missing entry

▪needs to be read and rewritten.

▪This is especially slow if 𝑹 is stored on disk.

Large-scale Matrix Factorization (by Kijung Shin) 67/99

Page 68: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap• Problem (review)

•Algorithms◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++

• Experiments <<

• Extension to Tensor Factorization

•Conclusions

Large-scale Matrix Factorization (by Kijung Shin) 68/99

Page 69: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Experimental Settings•Datasets:

Large-scale Matrix Factorization (by Kijung Shin) 69/99

5 ? 4 3

3 4 1 ?

2 ? 1 ?

5 3 ? 2

Use

rs

Movies (or songs)

Ratings

Page 70: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Experimental Settings (cont.)•Datasets:

Large-scale Matrix Factorization (by Kijung Shin) 70/99

(2.7M users, 18K movies, 100M ratings)

(1M users, 625K songs, 256M ratings)

(71K users, 65K movies, 10M ratings)

Page 71: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Experimental Settings (cont.)• Split of Datasets:

◦ Training set: about 80% of non-missing entries

◦ Test set: about 20% of non-missing entries

• Evaluation Metric:◦ Test RMSE (Root-mean square error)

Large-scale Matrix Factorization (by Kijung Shin) 71/99

𝑅𝑀𝑆𝐸 𝑉,𝑊,𝐻 =1

|𝑍𝑡𝑒𝑠𝑡|

(𝑖,𝑗)∈𝑍𝑡𝑒𝑠𝑡

𝑉𝑖𝑗 − 𝑊𝐻 𝑖𝑗2

index of test entriesEstimated

ratingTrue

rating

Page 72: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Experimental Settings (cont.)•Machines and implementations

◦ Shared-memory setting:

▪8 cores

▪OpenMP in C++

◦ Distributed setting:

▪up to 20 machines

▪MPI in C++

Large-scale Matrix Factorization (by Kijung Shin) 72/99

Page 73: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Convergence Speed

Large-scale Matrix Factorization (by Kijung Shin) 73/99

• Shared-memory setting with 8 cores

•CCD++ decreases Test RMSE fastest

Figure 3(b) of [YHS+12]

Page 74: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Convergence Speed (cont.)

Large-scale Matrix Factorization (by Kijung Shin) 74/99

• Shared-memory setting with 8 cores

•CCD++ converges Test RMSE fastest

Figure 3(c) of [YHS+12]

Page 75: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Large-scale Matrix Factorization (by Kijung Shin) 75/99

• Shared-memory setting with 8 cores

•CCD++ converges fastest

Figure 3(a) of [YHS+12]

Convergence Speed (cont.)

Page 76: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Speedup in Shared Memory•CCD++ and ALS show near-linear machine scalability

•DSGD suffers form high cache-miss rates due to its randomness◦ Cache-miss rate increases as more cores are used

Large-scale Matrix Factorization (by Kijung Shin) 76/99

Figure 4 of [YHS+12]

Page 77: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

•All algorithms show similar speedups

Large-scale Matrix Factorization (by Kijung Shin) 77/99

Speedup in Distributed Settings

Figure 6(b) of [YHS+12]

Page 78: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap•Matrix Factorization (review)

•Algorithms◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++

• Experiments

• Extension to Tensor Factorization <<

•Conclusions

Large-scale Matrix Factorization (by Kijung Shin) 78/99

Page 79: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Tensors

Large-scale Matrix Factorization (by Kijung Shin) 79/99

•An 𝑁-way tensor is an 𝑁-dimensional array

𝑡11 ⋯ 𝑡1𝐼2⋮ ⋱ ⋮𝑡𝐼11 ⋯ 𝑡𝐼1𝐼2

𝑖1=1,…

,𝐼1

𝑖2 = 1,… , 𝐼2

𝑡1⋮𝑡𝐼1

1-way tensor(= vector)

2-way tensor(= matrix)

3-way tensor

Page 80: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Tensor Factorization: Problem

Large-scale Matrix Factorization (by Kijung Shin) 80/99

•Given: ◦ 𝑿: 𝐼 by 𝐽 by 𝐾 tensor

▪possibly with missing values

◦ 𝑅: target rank (a scalar)

▪usually 𝑅 ≪ 𝐼, 𝐽, 𝐾

𝐼

𝐽

𝐾

?

?

?

?

𝑿

Page 81: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Tensor Factorization: Problem

Large-scale Matrix Factorization (by Kijung Shin) 81/99

•Given: ◦ 𝑿: 𝐼 by 𝐽 by 𝐾 tensor (i.e., 3-dim array)

▪possibly with missing values

◦ 𝑅: target rank (a scalar)

▪usually 𝑅 ≪ 𝐼, 𝐽, 𝐾

• Find:◦ 𝑼: 𝐼 by 𝑅 matrix

◦ 𝑯: 𝐽 by 𝑅 matrix

◦ 𝑾: 𝐾by 𝑅 matrix

▪without missing values

𝐼

𝑅

𝑼

𝐽

𝑯

𝐾

𝑾

𝑅

𝑅

Page 82: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Tensor Factorization: Problem•Goal: 𝑿 ≈ [𝑼𝑯𝑾]

◦ where 𝑼𝑯𝑾 𝑖𝑗𝑘 = σ𝑟=1𝑅 𝑈𝑖𝑟𝐻𝑗𝑟𝑊𝑘𝑟

Large-scale Matrix Factorization (by Kijung Shin) 82/99

?

?

?

?

𝑼

𝑯𝑻

𝑿

Page 83: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Loss Function𝐿 𝑋, 𝑈, 𝐻,𝑊

= σ 𝑖,𝑗,𝑘 ∈𝑍 (𝑋𝑖𝑗𝑘 − 𝑈𝐻𝑊 𝑖𝑗𝑘)2 + …

Goal: to make 𝑿 and 𝑼𝑯𝑾 similar

Large-scale Matrix Factorization (by Kijung Shin) 83/99

Indices ofnon-missing entries

i.e., 𝑖, 𝑗, 𝑘 ∈ 𝑍↔ 𝑋𝑖𝑗𝑘 is not missing

(𝑖, 𝑗, 𝑘)-th entry of [UHW]

(𝑖, 𝑗, 𝑘)-thentry of 𝑋

Page 84: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Loss Function (cont.)𝐿 𝑋, 𝑈, 𝐻,𝑊

= σ 𝑖,𝑗,𝑘 ∈𝑍 (𝑋𝑖𝑗𝑘 − 𝑈𝐻𝑊 𝑖𝑗𝑘)2

+ 𝜆( 𝑈 𝐹2 + 𝐻 𝐹

2 + 𝑊 𝐹2)

Goal: to prevent overfitting

(by making the entries of 𝑈, 𝐻 and 𝑉 close to zero)

Large-scale Matrix Factorization (by Kijung Shin) 84/99

Regularization parameter

Frobenius Norm:

𝑈 𝐹2 =

𝑖=1

𝐼

𝑟=1

𝑅

𝑈𝑖𝑟2

Page 85: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

SGD for TF•𝐿 𝑋, 𝑈, 𝐻,𝑊 : sum of loss for each non-missing entry

𝐿 𝑋, 𝑈, 𝐻,𝑊 = σ 𝑖,𝑗,𝑘 ∈𝑍 𝐿′(𝑋𝑖𝑗𝑘 , 𝑈𝑖:𝐻𝑗:𝑊𝑘:)

•An SGD step on each non-missing entry 𝑋𝑖𝑗𝑘:◦ Read 𝑈𝑖: , 𝐻𝑗:, and 𝑊𝑘:

◦ Compute gradient of 𝐿′ 𝑉𝑖𝑗 ,𝑊𝑖:, 𝐻:𝑗 ,𝑊𝑘:

◦ Update 𝑈𝑖: , 𝐻𝑗:, 𝑊𝑘:

Large-scale Matrix Factorization (by Kijung Shin) 85/99

𝑼

𝑯𝑻

𝑾

𝑋𝑖𝑗𝑘

𝑿

Page 86: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Parallel Algorithms for TF• Parallel algorithms for MF are extended to TF

◦ DSGD → FlexiFaCT [BKP+14]

◦ ALS [SPK16]

◦ CCD++ → CDTF [SK17]

Large-scale Matrix Factorization (by Kijung Shin) 86/99

FlexiFaCT [VKP+14]

Page 87: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Applications•Context-aware recommendation [KABO10]

Large-scale Matrix Factorization (by Kijung Shin) 87/99

Use

rs

Ratings

?

5

3

4

2

Movies

𝑋𝑖𝑗𝑘= 𝑈𝐻𝑊 𝑖𝑗𝑘

Missing rating

Page 88: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Applications (cont.)•Context-aware recommendation [KABO10]

Large-scale Matrix Factorization (by Kijung Shin) 88/99

Use

rs

Ratings

?

5

3

4

2

Restaurants

Missing rating𝑋𝑖𝑗𝑘= 𝑈𝐻𝑊 𝑖𝑗𝑘

Page 89: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Applications (cont.)•Video Restoration [LMWY13]

Large-scale Matrix Factorization (by Kijung Shin) 89/99

X-a

xis

Pixels

?

5

3

4

2

Y-Axis

Missing or corrupted pixel𝑋𝑖𝑗𝑘= 𝑈𝐻𝑊 𝑖𝑗𝑘

Page 90: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Applications (cont.)•Video Restoration [LMWY13]

Large-scale Matrix Factorization (by Kijung Shin) 90/99

Corrupted frame Restored frame

Page 91: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Applications (cont.)• Personalized Web Search [SZL+05]

Large-scale Matrix Factorization (by Kijung Shin) 91/99

Use

rs

Preference

?

5

3

4

2

Query keywords

Missing preference

𝑋𝑖𝑗𝑘= 𝑈𝐻𝑊 𝑖𝑗𝑘

Page 92: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Roadmap•Matrix Factorization (review)

•Algorithms◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++

• Experiments

• Extension to Tensor Factorization

•Conclusions <<

Large-scale Matrix Factorization (by Kijung Shin) 92/99

Page 93: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Conclusions•Matrix Factorization (MF)

• Parallel algorithms for MF◦ Distributed SGD: DSGD

◦ Alternating Least Square: ALS

◦ Cyclic Coordinate Descent: CCD++

• Tensor Factorization (TF)◦ Extension of MF to tensors

◦ Applications: context-aware recommendation, video restoration, personalized web search, etc.

Large-scale Matrix Factorization (by Kijung Shin) 93/99

Page 94: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Questions?• These slides are available at:

www.cs.cmu.edu/~kijungs/ETC/10-405.pdf

• Email: [email protected]

Large-scale Matrix Factorization (by Kijung Shin) 94/99

Page 95: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

References• [SZL+05] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. "Cubesvd: a novel approach to personalized web search." WWW 2005

• [ZWSP08] Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and RongPan. "Large-scale parallel collaborative filtering for the netflix prize." AAIM 2008

• [KABO10] Karatzoglou, Alexandros, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. "Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering." RecSys 2010.

• [GNHS11] Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and YannisSismanis. "Large-scale matrix factorization with distributed stochastic gradient descent." KDD 2011

• [DKA11] Daniel M. Dunlavy, Tamara G. Kolda, and Evrim Acar. "Temporal link prediction using matrix and tensor factorizations." TKDD 2011

Large-scale Matrix Factorization (by Kijung Shin) 95/99

Page 96: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

References (cont.)• [YHSD12] Yu, Hsiang-Fu, Cho-Jui Hsieh, Si Si, and Inderjit Dhillon. "Scalable coordinate descent approaches to parallel matrix factorization for recommender systems." ICDM 2012

• [LMWY13] Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. "Tensor completion for estimating missing values in visual data." TPAMI 2013.

• [BKP+14] Alex Beutel, Abhimanu Kumar, Evangelos E. Papalexakis, Partha Pratim Talukdar, Christos Faloutsos, and Eric P. Xing. "Flexifact: Scalable flexible factorization of coupled tensors on hadoop." SDM 2014

• [SPK16] Shaden Smith, Jongsoo Park, and George Karypis. "An exploration of optimization algorithms for high performance tensor completion." In High Performance Computing, Networking, Storage and Analysis, SC 2016

• [SK17] Kijung Shin and U Kang. “Distributed Methods for High-dimensional and Large-scale Tensor Factorization.” ICDM 2014

Large-scale Matrix Factorization (by Kijung Shin) 96/99

Page 97: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Closed-Form Solution of ALS

• Let 𝑊 =

− 𝑊1: −

:− 𝑊𝑛: −

, 𝐻 =| |𝐻:1 … 𝐻:𝑚| |

Then,

𝐿 𝑉,𝑊,𝐻

= σ 𝑖,𝑗 ∈𝑍 (𝑉𝑖𝑗 −𝑊𝑖:𝐻:𝑗)2 + 𝜆 σ𝑖=1

𝑛 𝑊𝑖: 2 + σ𝑗=1𝑚 𝐻:𝑗 2

= σ 𝑖,𝑗 ∈𝑍 (𝑉𝑖𝑗 −σ𝑘=1𝑟 𝑊𝑖𝑘𝐻𝑘𝑗)

2

+ 𝜆 σ𝑖=1𝑛 σ𝑘=1

𝑟 𝑊𝑖𝑘2 +σ𝑗=1

𝑚 σ𝑘=1𝑟 𝐻𝑘𝑗

2

Large-scale Matrix Factorization (by Kijung Shin) 97/99

Page 98: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

Closed-Form Solution of ALS (cont.)1

2

𝜕𝐿

𝜕𝐻𝑘𝑗= 0, ∀𝑘, ∀𝑗 (first-order conditions)

⇒ σ𝑖∈𝑍:𝑗𝑉𝑖𝑗 −𝑊𝑖:𝐻:𝑗 𝑊𝑖𝑘 + 𝜆𝐻𝑘𝑗 = 0, ∀𝑘, ∀𝑗

⇒ σ𝑖∈𝑍:𝑗𝑊𝑖:𝐻:𝑗𝑊𝑖𝑘 + 𝜆𝐻𝑘𝑗 = σ𝑖∈𝑍:𝑗

𝑉𝑖𝑗𝑊𝑖𝑘 , ∀𝑘, ∀𝑗

⇒ σ𝑖∈𝑍:𝑗𝑊𝑖:𝐻:𝑗𝑊𝑖𝑘 + 𝜆𝐻𝑘𝑗 = σ𝑖∈𝑍:𝑗

𝑉𝑖𝑗𝑊𝑖𝑘 , ∀𝑘, ∀𝑗

Large-scale Matrix Factorization (by Kijung Shin) 98/99

rows of non-missing entrys in the 𝑗-th column of 𝑽

Page 99: Large-scale Matrix Factorizationkijungs/etc/10-405.pdf · Large-scale Matrix Factorization ... Large-scale Matrix Factorization (by Kijung Shin) 16/99 Machine 1 Machine 2 Machine

(If we stack 𝑟 conditions on 𝐻1𝑗, …., 𝐻𝑟𝑗)

⇒ (σ𝑖∈𝑍:𝑗𝑊𝑖: 𝑊𝑖:

𝑇 + 𝜆𝐼𝑘)𝐻:𝑗 = σ𝑖∈𝑍:𝑗𝑉𝑖𝑗𝑊𝑖: , ∀𝑗

⇒ 𝐻:𝑗 = σ𝑖∈𝑍:𝑗𝑊𝑖: 𝑊𝑖:

𝑇 + 𝜆𝐼𝑘−1

σ𝑖∈𝑍:𝑗𝑉𝑖𝑗𝑊𝑖: , ∀𝑗

Large-scale Matrix Factorization (by Kijung Shin) 99/99

𝑘 by 𝑘 identity matrix

Closed-Form Solution of ALS (cont.)


Related Documents