Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu¹, Alex Dimakis¹, Sujay Sanghavi¹, Felix Yu², Dan Holtmann-Rice², Dmitry Storcheus², Afshin Rostamizadeh², Sanjiv Kumar²
¹University of Texas at Austin, ²Google Research
Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189
Transcript
Page 1: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

Shanshan Wu¹, Alex Dimakis¹, Sujay Sanghavi¹, Felix Yu², Dan Holtmann-Rice², Dmitry Storcheus², Afshin Rostamizadeh², Sanjiv Kumar²
¹University of Texas at Austin, ²Google Research
Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189

Page 5: Motivation

• Goal: create good representations for sparse data
• Amazon employee dataset: d = 15k, nnz = 9
• RCV1 text dataset: d = 47k, nnz = 76
• Wiki multi-label dataset: d = 31k, nnz = 19
• eXtreme Multi-label Learning (XML): multiple labels per item, drawn from a very large set of labels
• Unlike image/video data, there is no notion of spatial/temporal locality, so CNNs do not apply
• Reduce the dimensionality via a linear sketch/embedding
• Want: beyond sparsity, learn additional structure
• Typical inputs: one-hot encoded categorical data + text features

Page 8: Representing vectors in low dimension

• A ∈ ℝ^{m×d}: measurement matrix
• Encode (linear): y = Ax ∈ ℝ^m, with x ∈ ℝ^d and m < d
• Recover (linear): x̂ ≈ x
• What are the best learned measurement/reconstruction matrices under the ℓ2 norm? PCA.
• But if x is sparse, we can do better.
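As a concrete illustration of the PCA baseline mentioned above, here is a minimal numpy sketch (not code from the paper; with mean-centering the maps are affine rather than strictly linear): the measurement matrix is formed from the top-m right singular vectors of the training data.

```python
import numpy as np

# Sketch of the PCA baseline: the best learned measurement/reconstruction pair
# under squared (l2) reconstruction error.
def fit_pca(X, m):
    """X: (n, d) training matrix. Returns A in R^{m x d} and the data mean mu."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return Vt[:m], mu                       # rows of A = top-m principal directions

def encode(A, mu, x):
    return A @ (x - mu)                     # y in R^m, m < d

def decode(A, mu, y):
    return A.T @ y + mu                     # x_hat in R^d

# Toy usage
X = np.random.randn(200, 50)
A, mu = fit_pca(X, m=10)
x_hat = decode(A, mu, encode(A, mu, X[0]))
```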

Page 9: Compressed Sensing (Donoho; Candès et al.; …)

• Encode (linear): y = Ax ∈ ℝ^m, with x ∈ ℝ^d and m < d
• A ∈ ℝ^{m×d}: measurement matrix
• Recover by convex optimization (ℓ1-min, Lasso, …): x̂ ≈ x
• ℓ1-min decoder: f(A, y) := argmin_{x'} ‖x'‖₁ s.t. A x' = y
• Near-perfect recovery for sparse vectors, provably for a Gaussian random A.
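The ℓ1-min decoder f(A, y) is a linear program; below is a minimal scipy sketch (for illustration only, not the solver used in the paper) that splits x into nonnegative parts x⁺ and x⁻.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_decode(A, y):
    """Basis pursuit: argmin ||x||_1 subject to A x = y (illustrative sketch)."""
    m, d = A.shape
    # Write x = x_pos - x_neg with x_pos, x_neg >= 0, so ||x||_1 = sum(x_pos + x_neg).
    c = np.ones(2 * d)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:d] - res.x[d:]

# Toy check: a Gaussian random A recovers a sparse x near-perfectly once m is large enough.
d, m, k = 100, 40, 5
A = np.random.randn(m, d) / np.sqrt(m)
x = np.zeros(d)
x[np.random.choice(d, k, replace=False)] = np.random.randn(k)
print(np.linalg.norm(x - l1_min_decode(A, A @ x)))
```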

Page 10: Compressed Sensing (continued)

Same setting as the previous slide (linear compression y = Ax, recovery by convex optimization), but now ask:

1. If our vectors are sparse plus additional unknown structure (e.g. one-hot encoded features, text + features, XML labels, etc.),
2. can we LEARN a measurement matrix A,
3. and make it work well with a convex-optimization decoder?

Page 11: Comparisons of the recovery performance

[Plot: fraction of exactly recovered points vs. number of measurements m, for three pipelines:]
• Learned measurements + ℓ1-min decoding [our method]
• Gaussian measurements + model-based CoSaMP (it has structure knowledge!)
• Gaussian measurements + ℓ1-min decoding

Exact recovery: ‖x − x̂‖₂ ≤ 10⁻¹⁰
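For concreteness, the quantity on the y-axis can be computed as below (a small sketch; the 10⁻¹⁰ threshold is the one stated on the slide):

```python
import numpy as np

def exact_recovery_fraction(X, X_hat, tol=1e-10):
    """Fraction of test points (rows) with ||x - x_hat||_2 <= tol."""
    errs = np.linalg.norm(X - X_hat, axis=1)
    return float(np.mean(errs <= tol))
```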

Page 15: Learning a measurement matrix

• Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d
• Encode: y = A x_i, with A ∈ ℝ^{m×d}
• Recover (ℓ1-min): x̂_i = f(A, A x_i) ∈ ℝ^d, where f(A, y) := argmin_{x'} ‖x'‖₁ s.t. A x' = y
• Objective function: min over A ∈ ℝ^{m×d} of Σ_{i=1}^n ‖x_i − f(A, A x_i)‖₂²
• Problem: how do we compute the gradient with respect to A?
• Key idea: replace f(A, y) by a few steps of projected subgradient (see the sketch below)
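A minimal numpy sketch of that key idea (an illustration built from the update rule shown on the next slide, not the authors' released code): approximate f(A, y) by T projected-subgradient steps x^{(t+1)} = x^{(t)} − α_t (I − AᵀA) sign(x^{(t)}), starting from x^{(1)} = Aᵀy. The fixed step sizes below stand in for the ones the paper learns.

```python
import numpy as np

def unrolled_l1_decoder(A, y, alphas):
    """Approximate argmin ||x||_1 s.t. Ax = y with len(alphas) projected-subgradient steps."""
    d = A.shape[1]
    x = A.T @ y                              # x^(1) = A^T y
    P = np.eye(d) - A.T @ A                  # step direction uses (I - A^T A)
    for alpha in alphas:                     # T unrolled steps
        x = x - alpha * (P @ np.sign(x))
    return x
```

Because this decoder is a fixed, differentiable computation graph in A, the objective above can now be minimized by ordinary (stochastic) gradient descent.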

Page 16: ℓ1-AE, a novel autoencoder architecture

[Architecture diagram:]
• Encoder: input x ↦ y = Ax
• Decoder: x^{(1)} = Aᵀy; for t = 1, …, T: z_t = sign(x^{(t)}), x^{(t+1)} = x^{(t)} − α_t (I − AᵀA) z_t, with batch normalization (BN) between steps; output x̂ = ReLU(x^{(T+1)})

One step of projected subgradient: x^{(t+1)} = x^{(t)} − α_t (I − AᵀA) sign(x^{(t)})
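A hedged PyTorch sketch of such an unrolled autoencoder (an illustrative re-implementation under simplifying assumptions, e.g. batch normalization omitted; not the authors' released code). Both A and the step sizes α_t are trainable, and the squared reconstruction loss from the previous slide is minimized with standard autodiff:

```python
import torch
import torch.nn as nn

class L1AE(nn.Module):
    """Sketch of an l1-min unrolled autoencoder (illustration, not the paper's code)."""
    def __init__(self, d, m, T=10):
        super().__init__()
        self.A = nn.Parameter(torch.randn(m, d) / m ** 0.5)  # learnable measurement matrix
        self.alpha = nn.Parameter(0.1 * torch.ones(T))       # learnable step sizes
        self.T = T

    def forward(self, x):                       # x: (batch, d)
        y = x @ self.A.t()                      # encoder: y = A x
        xt = y @ self.A                         # x^(1) = A^T y
        d = self.A.shape[1]
        P = torch.eye(d, device=x.device) - self.A.t() @ self.A
        for t in range(self.T):                 # unrolled projected-subgradient steps
            xt = xt - self.alpha[t] * (torch.sign(xt) @ P)   # P is symmetric
        return torch.relu(xt)                   # final ReLU, as in the diagram

# Training sketch: minimize sum_i ||x_i - x_hat_i||_2^2 over a batch of toy sparse vectors.
model = L1AE(d=100, m=20, T=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
X = (torch.rand(256, 100) < 0.05).float() * torch.randn(256, 100)
for _ in range(100):
    loss = ((X - model(X)) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Unrolling is what makes the decoder differentiable end to end, so the gradient with respect to A that was problematic on the previous slide becomes an ordinary backpropagation computation.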

Page 17: Real sparse datasets

[Plots: fraction of exactly recovered test points and test RMSE vs. number of measurements m, on three datasets (d = 15k, nnz = 9; d = 31k, nnz = 19; d = 47k, nnz = 76). Curves include our method and baselines such as a "2-layer" network.]

Our method performs the best!

Page 18: Summary

• Key idea: we learn a compressed sensing measurement matrix by unrolling the projected subgradient of the ℓ1-min decoder
• Implemented as an autoencoder, ℓ1-AE
• Compared 12 algorithms over 6 datasets (3 synthetic and 3 real)
• Our method achieves perfect reconstruction with 1.1–3x fewer measurements than previous state-of-the-art methods
• Applied to extreme multi-label classification, our method outperforms SLEEC (Bhatia et al., 2015)

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189