Matrix Completion: Fundamental Limits and Efficient Algorithms

Sewoong Oh

PhD Defense, Stanford University
July 23, 2010
Matrix completion
Find the missing entries in a huge data matrix
Example 1. Recommendation systems
[Figure: a partially observed user-movie rating matrix; most entries are "?"]

5 · 10^5 users, 2 · 10^4 movies, 10^6 queries

Given less than 1% of the movie ratings
Goal: Predict the missing ratings
Example 2. Positioning
[Figure: sensors in the plane and the partially observed distance matrix D]

Only distances between close-by sensors are measured
Goal: Find the sensor positions up to a rigid motion
Matrix completion
More applications:
- Computer vision: structure from motion
- Molecular biology: microarray data
- Numerical linear algebra: fast low-rank approximations
- etc.
Outline
1. Background
2. Algorithm and main results
3. Applications in positioning
Background
The model

Rank-r matrix M = UΣV^T of dimensions αn × n, with U ∈ R^{αn×r}, Σ ∈ R^{r×r}, V ∈ R^{n×r}

Random uniform sample set E

Sample matrix M^E:

\[ M^E_{ij} = \begin{cases} M_{ij} & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases} \]
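As a concrete reference for this model, here is a minimal NumPy sketch (not part of the original talk; the helper name is hypothetical). It draws a rank-r matrix and reveals each entry independently with probability p, a Bernoulli proxy for the fixed-size uniform sample set E used in the analysis.

```python
import numpy as np

def sample_low_rank(n, alpha, r, p, seed=0):
    """Draw a rank-r matrix M of size (alpha*n) x n and reveal each entry
    independently with probability p, so |E| is about p * alpha * n^2.
    Bernoulli sampling is a stand-in for a uniform sample set E."""
    rng = np.random.default_rng(seed)
    m = int(alpha * n)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    M = U @ V.T                      # rank r: M = U Sigma V^T with Sigma folded into U
    mask = rng.random((m, n)) < p    # the random sample set E
    ME = np.where(mask, M, 0.0)      # M^E: observed entries, zero elsewhere
    return M, ME, mask
```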
Which matrices?

Pathological example:

\[ M = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \cdot [1] \cdot \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix} \]

[Candes, Recht '08] M = UΣV^T has coherence µ if

\[ \text{A0.} \quad \max_{1 \le i \le \alpha n} \sum_{k=1}^{r} U_{ik}^2 \le \frac{\mu r}{n}, \qquad \max_{1 \le j \le n} \sum_{k=1}^{r} V_{jk}^2 \le \frac{\mu r}{n} \]

\[ \text{A1.} \quad \max_{i,j} \Big| \sum_{k=1}^{r} U_{ik} V_{jk} \Big| \le \frac{\mu \sqrt{r}}{n} \]

Intuition:
- µ is small if the singular vectors are well balanced
- Low coherence is needed for matrix completion: the pathological M above cannot be recovered unless entry (1,1) happens to be sampled
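To make A0 and A1 concrete, a small sketch (hypothetical helper, not from the talk) that measures the coherence of a given matrix from its rank-r SVD; the pathological example above yields µ of order n, while well-balanced singular vectors yield a small constant.

```python
import numpy as np

def coherence(M, r):
    """Smallest mu satisfying A0 and A1 for the rank-r part of M,
    using the slides' conventions (M is alpha*n x n)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    n = M.shape[1]
    mu0 = n / r * max(np.max(np.sum(U**2, axis=1)),   # A0, left singular vectors
                      np.max(np.sum(V**2, axis=1)))   # A0, right singular vectors
    mu1 = n / np.sqrt(r) * np.max(np.abs(U @ V.T))    # A1
    return max(mu0, mu1)
```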
Previous work

Rank minimization (NP-hard):

\[ \text{minimize } \operatorname{rank}(X) \quad \text{subject to } X_{ij} = M_{ij}, \; (i,j) \in E \]

Convex relaxation, heuristic [Fazel '02]:

\[ \text{minimize } \|X\|_* \quad \text{subject to } X_{ij} = M_{ij}, \; (i,j) \in E \]

where the nuclear norm ‖X‖_* = Σ_{i=1}^n σ_i(X) is the sum of the singular values. This relaxation can be solved using semidefinite programming (SDP).
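For reference, the convex relaxation is a few lines in a modern modeling language. This sketch uses cvxpy (an off-the-shelf tool, not something the talk relies on) to solve the nuclear norm program over an observed mask:

```python
import numpy as np
import cvxpy as cp

def nuclear_norm_complete(ME, mask):
    """minimize ||X||_*  subject to  X_ij = M_ij for (i,j) in E.
    `mask` is a boolean array marking E; ME is zero off E."""
    X = cp.Variable(ME.shape)
    constraints = [cp.multiply(mask.astype(float), X) == ME]
    cp.Problem(cp.Minimize(cp.norm(X, "nuc")), constraints).solve()
    return X.value
```

This is practical only for small matrices; generic SDP solvers scale poorly, which is exactly the complexity concern raised below.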
Previous work

[Candes, Recht '08] Nuclear norm minimization reconstructs M exactly, with high probability, if

\[ |E| \ge C \mu r n^{6/5} \log n \]

- Surprise? The number of degrees of freedom is only ≈ (1 + α)rn
- Open questions:
  - Optimality: do we need n^{6/5} log n samples?
  - Complexity: SDP is computationally expensive
  - Noise: the guarantee cannot deal with noise

A new approach to matrix completion: OptSpace
Example: 2000 × 2000 rank-8 random matrix

[Figure: the low-rank matrix M, the sampled matrix M^E, the OptSpace output M̂, and the squared error (M − M̂)^2, shown at sampling rates 0.25%, 0.50%, 0.75%, 1.00%, 1.25%, 1.50%, and 1.75%]
Algorithm
Naive approach fails

Singular value decomposition (SVD):

\[ M^E = \sum_{i=1}^{n} \sigma_i x_i y_i^T \]

Compute the rescaled rank-r approximation M_SVD:

\[ M_{SVD} \triangleq \frac{\alpha n^2}{|E|} \sum_{i=1}^{r} \sigma_i x_i y_i^T \]

[Figure: M^E next to M_SVD; the naive estimate is visibly distorted by rows and columns with unusually many samples, which motivates the trimming step below]
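The rescaled projection itself is cheap for sparse data. A sketch using SciPy's partial SVD (helper name hypothetical):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def rank_r_projection(ME, mask, r, alpha=1.0):
    """M_SVD = (alpha n^2 / |E|) * sum of the top-r singular triplets of M^E.
    The rescaling compensates for the zeroed-out unobserved entries."""
    n = ME.shape[1]
    num_samples = int(mask.sum())                 # |E|
    X, s, Yt = svds(csr_matrix(ME), k=r)          # partial SVD of the sparse matrix
    return (alpha * n**2 / num_samples) * (X * s) @ Yt
```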
Trimming

Construct the trimmed matrix M̃^E:

\[ \widetilde{M}^E_{ij} = \begin{cases} 0 & \text{if } \deg(\mathrm{row}_i) > 2|E|/\alpha n \\ 0 & \text{if } \deg(\mathrm{col}_j) > 2|E|/n \\ M^E_{ij} & \text{otherwise} \end{cases} \]

where deg(·) is the number of samples in that row/column.
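A direct transcription of the trimming rule (sketch; helper name hypothetical):

```python
import numpy as np

def trim(ME, mask, alpha=1.0):
    """Zero out rows with more than 2|E|/(alpha n) samples and columns with
    more than 2|E|/n samples, so no row or column dominates the spectrum."""
    n = ME.shape[1]
    num_samples = mask.sum()                                   # |E|
    row_ok = mask.sum(axis=1) <= 2 * num_samples / (alpha * n)
    col_ok = mask.sum(axis=0) <= 2 * num_samples / n
    keep = row_ok[:, None] & col_ok[None, :]
    return np.where(keep, ME, 0.0), mask & keep
```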
Algorithm

OptSpace
Input: sample indices E, sample values M^E, rank r
Output: estimate M̂
1: Trimming
2: Compute M_SVD using SVD
3: Greedy minimization of the residual error

M_SVD can be computed efficiently for sparse matrices.
Main results

Theorem. For any |E|, M_SVD achieves, with high probability,

\[ \mathrm{RMSE} \le C M_{max} \sqrt{\frac{nr}{|E|}} \]

where

\[ \mathrm{RMSE} = \Big( \frac{1}{\alpha n^2} \sum_{i,j} (M - M_{SVD})_{ij}^2 \Big)^{1/2}, \qquad M_{max} \triangleq \max_{i,j} |M_{ij}| \]

Compare [Achlioptas, McSherry '07]: if |E| ≥ n(8 log n)^4, then with high probability RMSE ≤ 4 M_max √(nr/|E|). For n = 10^5, (8 log n)^4 ≈ 7.2 · 10^7, so that condition asks for about 7.2 · 10^12 samples, more than the 10^10 entries of a square matrix of that size.

Trimming is what removes the condition on |E|, and it matters on real data: in the Netflix dataset a single user rated 17,000 movies, and "Miss Congeniality" alone received 200,000 ratings.

Keshavan, Montanari, Oh, IEEE Trans. Information Theory, 2010
Can we do better?
Greedy minimization of residual error

Starting from (X_0, Y_0), where M_SVD = X_0 S_0 Y_0^T, use gradient descent to solve

\[ \text{minimize } F(X, Y) \quad \text{subject to } X^T X = I, \; Y^T Y = I \]

where

\[ F(X, Y) \triangleq \min_{S \in \mathbb{R}^{r \times r}} \sum_{(i,j) \in E} \big( M^E_{ij} - (X S Y^T)_{ij} \big)^2 \]

F(X, Y) can be computed efficiently for sparse matrices.
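The inner minimization over S is a linear least squares problem in the r² entries of S, since (XSY^T)_ij is linear in S. A sketch of evaluating F this way (the outer gradient descent over the orthonormal factors X, Y is omitted; names hypothetical):

```python
import numpy as np

def residual_F(X, Y, ME, mask):
    """F(X, Y) = min over S of the squared residual on the samples E.
    Each sample (i, j) contributes one linear equation in vec(S):
    (X S Y^T)_ij = sum_{p,q} X_ip Y_jq S_pq."""
    r = X.shape[1]
    I, J = np.nonzero(mask)
    A = np.einsum('kp,kq->kpq', X[I], Y[J]).reshape(len(I), r * r)
    b = ME[I, J]
    s, *_ = np.linalg.lstsq(A, b, rcond=None)     # optimal vec(S)
    return np.sum((A @ s - b) ** 2), s.reshape(r, r)
```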
Main results

Theorem (Trimming + SVD). M_SVD achieves, with high probability,

\[ \mathrm{RMSE} \le C M_{max} \sqrt{\frac{nr}{|E|}} \]

Theorem (Trimming + SVD + greedy minimization). OptSpace reconstructs M exactly, with high probability, if

\[ |E| \ge C \mu r n \max\{\mu r, \log n\} \]

Keshavan, Montanari, Oh, IEEE Trans. Information Theory, 2010
OptSpace is order-optimal

Theorem. If µ and r are bounded, OptSpace reconstructs M exactly, with high probability, if

\[ |E| \ge C n \log n \]

Lower bound (coupon collector's problem): if |E| ≤ C′ n log n, then exact reconstruction is impossible.

Nuclear norm minimization [Candes, Recht '08; Candes, Tao '09; Recht '09; Gross et al. '09]: if |E| ≥ C″ n (log n)^2, then exact reconstruction by SDP.
Comparison: 1000 × 1000 rank-10 matrix M

[Figure: probability of success vs. sampling rate (0 to 0.14) for OptSpace, FPCA, SVT, and ADMiRA, together with the fundamental limit; OptSpace's success threshold is the closest to the fundamental limit]

Fundamental limit [Singer, Cucuringu '09], FPCA [Ma, Goldfarb, Chen '09], SVT [Cai, Candes, Shen '08], ADMiRA [Lee, Bresler '09]
Story so far
OptSpace reconstructs M from a few sampled entries, when M is exactly low-rank and the samples are exact.

In reality:
- M is only approximately low-rank
- samples are corrupted by noise
The model with noise

Rank-r matrix M = UΣV^T of dimensions αn × n

Random sample set E

Sample noise Z^E

Sample matrix N^E = M^E + Z^E
Main results

Theorem. For |E| ≥ Cµrn max{µr, log n}, OptSpace achieves, with high probability,

\[ \mathrm{RMSE} \le C' \frac{n \sqrt{r}}{|E|} \, \|Z^E\|_2 , \]

provided that the right-hand side is smaller than σ_r(M)/n. Here ‖·‖_2 is the spectral norm.

Keshavan, Montanari, Oh, Journal of Machine Learning Research, 2010
OptSpace is order-optimal when the noise is i.i.d. Gaussian

Theorem. For |E| ≥ Cµrn max{µr, log n}, OptSpace achieves, with high probability,

\[ \mathrm{RMSE} \le C' \sigma_z \sqrt{\frac{r n}{|E|}} , \]

provided that the right-hand side is smaller than σ_r(M)/n.

Lower bound [Candes, Plan '09]:

\[ \mathrm{RMSE} \ge \sigma_z \sqrt{\frac{2 r n}{|E|}} \]

Trimming + SVD:

\[ \mathrm{RMSE} \le \underbrace{C M_{max} \sqrt{\frac{r n}{|E|}}}_{\text{missing entries}} + \underbrace{C' \sigma_z \sqrt{\frac{r n}{|E|}}}_{\text{sample noise}} \]
Comparison: 500 × 500 rank-4 matrix M, Gaussian noise with σ_z = 1 (example from [Candes, Plan '09])

[Figure: RMSE vs. sampling rate (0.2 to 1) for OptSpace, Trim+SVD, FPCA, and ADMiRA, together with the lower bound; OptSpace stays close to the lower bound across sampling rates]

FPCA [Ma, Goldfarb, Chen '09], ADMiRA [Lee, Bresler '09]
Positioning
The model

[Figure: n wireless devices with radio range R, and the partially observed distance matrix D]

n wireless devices uniformly distributed in a bounded convex region

Distance measurements between devices within radio range R

Goal: find the locations up to a rigid motion

How is it related to matrix completion?
- Need to find the missing entries
- rank(D) = 4 (the squared distances ‖x_i − x_j‖^2 in the plane decompose into four rank-one terms)

How is it different?
- Non-uniform sampling
- Rich information not used in matrix completion

MDS-MAP [Shang et al. '03]:
1. Fill in the missing entries with shortest paths
2. Compute the rank-4 approximation
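A sketch of MDS-MAP under one reading of step 2, using classical MDS (the eigendecomposition of the doubly centered squared distances) to go from the completed distance matrix to planar positions; the helper names and this particular reading are assumptions, not the talk's exact procedure:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def mds_map(dist, measured):
    """1) Fill missing pairwise distances with shortest paths over the
    measured graph.  2) Recover 2-D positions (up to a rigid motion)
    by classical MDS."""
    n = dist.shape[0]
    graph = np.where(measured, dist, 0.0)          # zero entries mean "no edge"
    D = shortest_path(graph, method='D', directed=False)
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                    # Gram matrix of centered points
    w, V = np.linalg.eigh(G)
    top = np.argsort(w)[::-1][:2]                  # top-2 eigenpairs span the plane
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```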
Main results

Theorem. For R > C √(log n / n), with high probability,

\[ \mathrm{RMSE} \le \frac{C}{R} \sqrt{\frac{\log n}{n}} + o(1) , \]

where

\[ \mathrm{RMSE} = \Big( \frac{1}{n^2} \sum_{i,j} (D - \widehat{D})_{ij}^2 \Big)^{1/2} \]

Lower bound: if R < √(log n / πn), then the graph is disconnected with high probability, and localization is impossible.

Generalized to quantized measurements and to distributed algorithms; a greedy minimization step can be added as in OptSpace.

Oh, Karbasi, Montanari, Information Theory Workshop, 2010
Karbasi, Oh, ACM SIGMETRICS, 2010
Numerical simulation

[Figure: two panels of RMSE (0 to 0.25) vs. normalized radio range, R/√(log n / πn) on the left and R/√(1/πn) on the right, for n = 500, 1,000, and 2,000]
Conclusion
- Matrix completion is an important problem with many practical applications
- OptSpace is an efficient algorithm for matrix completion
- OptSpace achieves performance close to the fundamental limit
Special thanks to:
Officemates: Morteza, Yash, Jose, Raghu, Satish

Friends: Mohsen, Adel, Farshid, Fernando, Arian, Haim, Sachin, Ivana, Ahn, Cha, Choi, Rhee, Kang, Kim, Lee, Na, Park, Ra, Seok, Song

My family and Kyung Eun
Thank you!