Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu¹, Alex Dimakis¹, Sujay Sanghavi¹, Felix Yu², Dan Holtmann-Rice², Dmitry Storcheus², Afshin Rostamizadeh², Sanjiv Kumar²
¹University of Texas at Austin, ²Google Research
Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189
Transcript
Page 1: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

Shanshan Wu¹, Alex Dimakis¹, Sujay Sanghavi¹, Felix Yu², Dan Holtmann-Rice², Dmitry Storcheus², Afshin Rostamizadeh², Sanjiv Kumar²
¹University of Texas at Austin, ²Google Research
Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189

Page 5: Motivation

• Goal: create good representations for sparse data
• Amazon employee dataset: d = 15k, nnz = 9
• RCV1 text dataset: d = 47k, nnz = 76
• Wiki multi-label dataset: d = 31k, nnz = 19
• eXtreme Multi-label Learning (XML): multiple labels per item, drawn from a very large set of labels
• Unlike image/video data, there is no notion of spatial/temporal locality, so CNNs do not apply
• Reduce the dimensionality via a linear sketch/embedding
• Want: beyond sparsity, learn additional structure
• Typical inputs: one-hot encoded categorical data + text features

Page 8: Representing vectors in low dimension

• A ∈ ℝ^{m×d}: measurement matrix
• Encode (linear): y = Ax ∈ ℝ^m, with x ∈ ℝ^d and m < d
• Recover (linear): x̂ ≈ x
• What are the best learned measurement/reconstruction matrices under the ℓ2 norm? PCA.
• But if x is sparse, we can do better.
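As a concrete illustration of the PCA baseline mentioned above, here is a minimal numpy sketch (not code from the paper; with mean-centering the maps are affine rather than strictly linear): the measurement matrix is formed from the top-m right singular vectors of the training data.

```python
import numpy as np

# Sketch of the PCA baseline: the best learned measurement/reconstruction pair
# under squared (l2) reconstruction error.
def fit_pca(X, m):
    """X: (n, d) training matrix. Returns A in R^{m x d} and the data mean mu."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return Vt[:m], mu                       # rows of A = top-m principal directions

def encode(A, mu, x):
    return A @ (x - mu)                     # y in R^m, m < d

def decode(A, mu, y):
    return A.T @ y + mu                     # x_hat in R^d

# Toy usage
X = np.random.randn(200, 50)
A, mu = fit_pca(X, m=10)
x_hat = decode(A, mu, encode(A, mu, X[0]))
```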

Page 9: Compressed Sensing (Donoho; Candès et al.; …)

• Encode (linear): y = Ax ∈ ℝ^m, with x ∈ ℝ^d and m < d
• A ∈ ℝ^{m×d}: measurement matrix
• Recover by convex optimization (ℓ1-min, Lasso, …): x̂ ≈ x
• ℓ1-min decoder: f(A, y) := argmin_{x'} ‖x'‖₁ s.t. A x' = y
• Near-perfect recovery for sparse vectors, provably for a Gaussian random A.
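The ℓ1-min decoder f(A, y) is a linear program; below is a minimal scipy sketch (for illustration only, not the solver used in the paper) that splits x into nonnegative parts x⁺ and x⁻.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_decode(A, y):
    """Basis pursuit: argmin ||x||_1 subject to A x = y (illustrative sketch)."""
    m, d = A.shape
    # Write x = x_pos - x_neg with x_pos, x_neg >= 0, so ||x||_1 = sum(x_pos + x_neg).
    c = np.ones(2 * d)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:d] - res.x[d:]

# Toy check: a Gaussian random A recovers a sparse x near-perfectly once m is large enough.
d, m, k = 100, 40, 5
A = np.random.randn(m, d) / np.sqrt(m)
x = np.zeros(d)
x[np.random.choice(d, k, replace=False)] = np.random.randn(k)
print(np.linalg.norm(x - l1_min_decode(A, A @ x)))
```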

Page 10: Compressed Sensing (continued)

Same setting as the previous slide (linear compression y = Ax, recovery by convex optimization), but now ask:

1. If our vectors are sparse plus additional unknown structure (e.g. one-hot encoded features, text + features, XML labels, etc.),
2. can we LEARN a measurement matrix A,
3. and make it work well with a convex-optimization decoder?

Page 11: Comparisons of the recovery performance

[Plot: fraction of exactly recovered points vs. number of measurements m, for three pipelines:]
• Learned measurements + ℓ1-min decoding [our method]
• Gaussian measurements + model-based CoSaMP (it has structure knowledge!)
• Gaussian measurements + ℓ1-min decoding

Exact recovery: ‖x − x̂‖₂ ≤ 10⁻¹⁰
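For concreteness, the quantity on the y-axis can be computed as below (a small sketch; the 10⁻¹⁰ threshold is the one stated on the slide):

```python
import numpy as np

def exact_recovery_fraction(X, X_hat, tol=1e-10):
    """Fraction of test points (rows) with ||x - x_hat||_2 <= tol."""
    errs = np.linalg.norm(X - X_hat, axis=1)
    return float(np.mean(errs <= tol))
```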

Page 15: Learning a measurement matrix

• Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d
• Encode: y = A x_i, with A ∈ ℝ^{m×d}
• Recover (ℓ1-min): x̂_i = f(A, A x_i) ∈ ℝ^d, where f(A, y) := argmin_{x'} ‖x'‖₁ s.t. A x' = y
• Objective function: min over A ∈ ℝ^{m×d} of Σ_{i=1}^n ‖x_i − f(A, A x_i)‖₂²
• Problem: how do we compute the gradient with respect to A?
• Key idea: replace f(A, y) by a few steps of projected subgradient (see the sketch below)
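A minimal numpy sketch of that key idea (an illustration built from the update rule shown on the next slide, not the authors' released code): approximate f(A, y) by T projected-subgradient steps x^{(t+1)} = x^{(t)} − α_t (I − AᵀA) sign(x^{(t)}), starting from x^{(1)} = Aᵀy. The fixed step sizes below stand in for the ones the paper learns.

```python
import numpy as np

def unrolled_l1_decoder(A, y, alphas):
    """Approximate argmin ||x||_1 s.t. Ax = y with len(alphas) projected-subgradient steps."""
    d = A.shape[1]
    x = A.T @ y                              # x^(1) = A^T y
    P = np.eye(d) - A.T @ A                  # step direction uses (I - A^T A)
    for alpha in alphas:                     # T unrolled steps
        x = x - alpha * (P @ np.sign(x))
    return x
```

Because this decoder is a fixed, differentiable computation graph in A, the objective above can now be minimized by ordinary (stochastic) gradient descent.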

Page 16: ℓ1-AE, a novel autoencoder architecture

[Architecture diagram:]
• Encoder: input x ↦ y = Ax
• Decoder: x^{(1)} = Aᵀy; for t = 1, …, T: z_t = sign(x^{(t)}), x^{(t+1)} = x^{(t)} − α_t (I − AᵀA) z_t, with batch normalization (BN) between steps; output x̂ = ReLU(x^{(T+1)})

One step of projected subgradient: x^{(t+1)} = x^{(t)} − α_t (I − AᵀA) sign(x^{(t)})
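A hedged PyTorch sketch of such an unrolled autoencoder (an illustrative re-implementation under simplifying assumptions, e.g. batch normalization omitted; not the authors' released code). Both A and the step sizes α_t are trainable, and the squared reconstruction loss from the previous slide is minimized with standard autodiff:

```python
import torch
import torch.nn as nn

class L1AE(nn.Module):
    """Sketch of an l1-min unrolled autoencoder (illustration, not the paper's code)."""
    def __init__(self, d, m, T=10):
        super().__init__()
        self.A = nn.Parameter(torch.randn(m, d) / m ** 0.5)  # learnable measurement matrix
        self.alpha = nn.Parameter(0.1 * torch.ones(T))       # learnable step sizes
        self.T = T

    def forward(self, x):                       # x: (batch, d)
        y = x @ self.A.t()                      # encoder: y = A x
        xt = y @ self.A                         # x^(1) = A^T y
        d = self.A.shape[1]
        P = torch.eye(d, device=x.device) - self.A.t() @ self.A
        for t in range(self.T):                 # unrolled projected-subgradient steps
            xt = xt - self.alpha[t] * (torch.sign(xt) @ P)   # P is symmetric
        return torch.relu(xt)                   # final ReLU, as in the diagram

# Training sketch: minimize sum_i ||x_i - x_hat_i||_2^2 over a batch of toy sparse vectors.
model = L1AE(d=100, m=20, T=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
X = (torch.rand(256, 100) < 0.05).float() * torch.randn(256, 100)
for _ in range(100):
    loss = ((X - model(X)) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Unrolling is what makes the decoder differentiable end to end, so the gradient with respect to A that was problematic on the previous slide becomes an ordinary backpropagation computation.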

Page 17: Real sparse datasets

[Plots: fraction of exactly recovered test points and test RMSE vs. number of measurements m, on three datasets (d = 15k, nnz = 9; d = 31k, nnz = 19; d = 47k, nnz = 76). Curves include our method and baselines such as a "2-layer" network.]

Our method performs the best!

Page 18: Summary

• Key idea: we learn a compressed sensing measurement matrix by unrolling the projected subgradient of the ℓ1-min decoder
• Implemented as an autoencoder, ℓ1-AE
• Compared 12 algorithms over 6 datasets (3 synthetic and 3 real)
• Our method achieves perfect reconstruction with 1.1–3x fewer measurements than previous state-of-the-art methods
• Applied to extreme multi-label classification, our method outperforms SLEEC (Bhatia et al., 2015)

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189