Transcript
Page 1:

Rémi Gribonval Inria Rennes - Bretagne Atlantique

[email protected]

Page 2:

R. GRIBONVAL - CSA 2015 - Berlin

Contributors & Collaborators

2

Anthony Bourrier Nicolas Keriven Yann Traonmilin

Tomer Peleg

Gilles Puy

Mike Davies, Patrick Pérez, Gilles Blanchard

Page 3:

R. GRIBONVAL - CSA 2015 - Berlin

Agenda

From Compressive Sensing to Compressive Learning ?
Information-preserving projections & sketches
Compressive Clustering / Compressive GMM
Conclusion

3

Page 4:

R. GRIBONVAL - CSA 2015 - Berlin

Machine Learning

Available data: training collection of feature vectors = point cloud

Goals: infer parameters to achieve a certain task; generalization to future samples with the same probability distribution

Examples

4


PCA: principal subspace
Dictionary learning: dictionary
Clustering: centroids
Classification: classifier parameters (e.g. support vectors)

X

Page 5:

R. GRIBONVAL - CSA 2015 - Berlin

Point cloud = large matrix of feature vectors

Challenging dimensions

5

X

Page 6:

R. GRIBONVAL - CSA 2015 - Berlin

Point cloud = large matrix of feature vectors

Challenging dimensions

5

x1X

Page 7:

R. GRIBONVAL - CSA 2015 - Berlin

Point cloud = large matrix of feature vectors

Challenging dimensions

5

x1 x2X

Page 8:

R. GRIBONVAL - CSA 2015 - Berlin

Point cloud = large matrix of feature vectors

Challenging dimensions

5

x1 x2 xN…X X

Page 9:

R. GRIBONVAL - CSA 2015 - Berlin

Point cloud = large matrix of feature vectors

High feature dimension n / Large collection size N

Challenging dimensions

5

x1 x2 xN…X X

Page 10:

R. GRIBONVAL - CSA 2015 - Berlin

Point cloud = large matrix of feature vectors

High feature dimension n / Large collection size N

Challenging dimensions

5

x1 x2 xN…X X

Challenge: compress before learning ?
X

Page 11:

R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning ?

Point cloud = large matrix of feature vectors

6

x1 x2 … xN = X
y1 y2 … yN = Y = MX

M

Page 12:

R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning ?

Point cloud = large matrix of feature vectors

Reduce feature dimension [Calderbank & al 2009, Reboredo & al 2013]

(Random) feature projection Exploits / needs low-dimensional feature model

6

x1 x2 … xN = X
y1 y2 … yN = Y = MX
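To make the "reduce feature dimension" step concrete, here is a minimal sketch (not from the talk; the dimensions and the Johnson-Lindenstrauss-style matrix are illustrative assumptions) of a random feature projection Y = MX: each feature vector is mapped from R^n to R^d with d ≪ n, while the collection size N is untouched. This is exactly why the next slides call its impact limited: the "Big Data" dimension N survives the projection.

```python
# Minimal illustration (not from the talk) of random feature projection Y = M X:
# each column of X (a feature vector in R^n) is mapped to R^d with d << n,
# while the collection size N is unchanged.
import numpy as np

rng = np.random.default_rng(0)
n, N, d = 1000, 5000, 50
X = rng.standard_normal((n, N))                  # point cloud, one feature vector per column
M = rng.standard_normal((d, n)) / np.sqrt(d)     # random (Johnson-Lindenstrauss style) projection
Y = M @ X                                        # compressed features, shape (d, N)
print(X.shape, "->", Y.shape)                    # the collection size N is untouched
```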

Page 13:

R. GRIBONVAL - CSA 2015 - Berlin

Challenges of large collections

Feature projection: limited impact

7

X

Y = MX

Page 14:

R. GRIBONVAL - CSA 2015 - Berlin

Challenges of large collections

Feature projection: limited impact

7

X

Y = MX

“Big Data” Challenge: compress collection size

Page 15:

R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning ?

Point cloud = … empirical probability distribution

8

X

Page 16:

R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning ?

Point cloud = … empirical probability distribution

Reduce collection dimension
coresets: see e.g. [Agarwal & al 2003, Feldman 2010]
sketching & hashing: see e.g. [Thaper & al 2002, Cormode & al 2005]

8

X

Page 17:

R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning ?

Point cloud = … empirical probability distribution

Reduce collection dimension
coresets: see e.g. [Agarwal & al 2003, Feldman 2010]
sketching & hashing: see e.g. [Thaper & al 2002, Cormode & al 2005]

8

X → M → z ∈ R^m

Sketching operator nonlinear in the feature vectors

linear in their probability distribution

Page 18:

R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning ?

Point cloud = … empirical probability distribution

Reduce collection dimension
coresets: see e.g. [Agarwal & al 2003, Feldman 2010]
sketching & hashing: see e.g. [Thaper & al 2002, Cormode & al 2005]

8

X → M → z ∈ R^m

Sketching operator nonlinear in the feature vectors

linear in their probability distribution

Page 19:

R. GRIBONVAL - CSA 2015 - Berlin

Example: Compressive Clustering

9

X

Compressive Gaussian Mixture Estimation
Anthony Bourrier (1,2), Rémi Gribonval (2), Patrick Pérez (1)
(1) Technicolor, 975 Avenue des Champs Blancs, 35576 Cesson Sevigne, France ([email protected])
(2) INRIA Rennes - Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes ([email protected])

Motivation
Goal: infer parameters θ from n-dimensional data X = {x_1, …, x_N}. This typically requires extensive access to the data. Proposed method: infer from a sketch of the data ⇒ memory and privacy savings.

Figure 1: Illustration of the proposed sketching framework: learning set {x_1, …, x_N} (size Nn) → Â → database sketch ẑ (size m) → L → learned parameters θ (size K). Â is a sketching operator, L is a learning method operating on the sketch.

Model and problem statement
Application to mixtures of isotropic Gaussians in R^n:
f_μ ∝ exp(−‖x − μ‖²_2 / (2σ²)).   (1)
Data X = {x_j}_{j=1}^N drawn i.i.d. from p = Σ_{s=1}^k α_s f_{μ_s} with:
• weights α_1, …, α_k (positive, summing to one)
• means μ_1, …, μ_k ∈ R^n.
Sketch = Fourier samplings at different frequencies: (A f)_ℓ = f̂(ω_ℓ). Empirical version: (Â(X))_ℓ = (1/N) Σ_{j=1}^N exp(−i⟨ω_ℓ, x_j⟩) ≈ (A p)_ℓ.
We want to infer the mixture parameters from ẑ = Â(X). The problem is cast as
p̂ = argmin_{q ∈ Σ_k} ‖ẑ − A q‖²_2,   (2)
where Σ_k = mixtures of k isotropic Gaussians with positive weights.

Standard CS vs. our problem:
• Signal: x ∈ R^n | f ∈ L^1(R^n)
• Dimension: n | infinite
• Sparsity: k | k
• Dictionary: {e_1, …, e_n} | F = {f_μ, μ ∈ R^n}
• Measurements: x ↦ ⟨a, x⟩ | f ↦ ∫_{R^n} f(x) e^{−i⟨ω, x⟩} dx

Algorithm
Current estimate p̂ with weights {α̂_s}_{s=1}^k and support Γ̂ = {μ̂_s}_{s=1}^k. Residual r̂ = ẑ − A p̂.
1. Searching new support functions: search for "good components to add" to the support ⇒ local minima of μ ↦ −⟨A f_μ, r̂⟩, added to the support Γ̂. New support Γ̂′.
2. k-term thresholding: projection of ẑ onto Γ̂′ with positivity constraints on the coefficients:
argmin_{β ∈ R^K_+} ‖ẑ − U β‖²_2,   (3)
with U = [A f_{μ̂_1}, …, A f_{μ̂_K}] (one column per candidate mean). The k highest coefficients and the corresponding support are kept → new support Γ̂ and coefficients α̂_1, …, α̂_k.
3. Final "shift": gradient descent on the objective function, initialized at the current support and coefficients.

Figure 2: Algorithm illustration in dimension n = 1 for k = 3 Gaussians. Top: iteration 1. Bottom: iteration 2. Blue curve = true mixture, red curve = reconstructed mixture, green curve = gradient function. Green dots = candidate centroids, red dots = reconstructed centroids.

Experimental results
Data setup: σ = 1, (α_1, …, α_k) drawn uniformly on the simplex. Entries of μ_1, …, μ_k drawn i.i.d. from N(0, 1).
Algorithm heuristics:
• Frequencies drawn i.i.d. from N(0, Id).
• New support function search (step 1) initialized at r·u, with r drawn uniformly in [0, max_{x ∈ X} ‖x‖_2] and u drawn uniformly on B_2(0, 1).
Comparison between:
• Our method: the sketch is computed on the fly and the data is discarded.
• EM: the data is stored so that the standard optimization steps can be performed.
Quality measures: KL divergence and Hellinger distance.

Table 1: Comparison between our method and an EM algorithm (n = 20, k = 10, m = 1000).
N    | Compressed: KL div. | Hell.       | Mem. | EM: KL div. | Hell.       | Mem.
10^3 | 0.68 ± 0.28         | 0.06 ± 0.01 | 0.6  | 0.68 ± 0.44 | 0.07 ± 0.03 | 0.24
10^4 | 0.24 ± 0.31         | 0.02 ± 0.02 | 0.6  | 0.19 ± 0.21 | 0.01 ± 0.02 | 2.4
10^5 | 0.13 ± 0.15         | 0.01 ± 0.02 | 0.6  | 0.13 ± 0.21 | 0.01 ± 0.02 | 24

Figure 3: Left: example of data and sketch for n = 2. Right: reconstruction quality for n = 10 (Hellinger distance achieved in 80% of trials, as a function of the sketch size m and of k·n/m).

M : X ↦ z ∈ R^m
Recovery algorithm → estimated centroids vs. ground truth
N = 1000; n = 2; m = 60
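A minimal sketch of the sketching operator used in this example, assuming NumPy, σ = 1, and frequencies drawn i.i.d. from N(0, Id) as in the poster's heuristic: it computes the sampled empirical characteristic function z_ℓ = (1/N) Σ_j exp(−i⟨ω_ℓ, x_j⟩), after which the point cloud itself can be discarded. The greedy recovery ("decoder") step is not shown.

```python
# A minimal sketching operator for this example (assumptions: NumPy, sigma = 1,
# frequencies drawn i.i.d. from N(0, Id) as in the poster's heuristic).
# It computes the sampled empirical characteristic function
#   z_ell = (1/N) * sum_j exp(-i <w_ell, x_j>).
import numpy as np

def draw_frequencies(m, n, rng):
    return rng.standard_normal((m, n))             # w_ell ~ N(0, Id)

def sketch_characteristic_function(X, W):
    # X: (N, n) point cloud; W: (m, n) frequencies; returns z in C^m.
    return np.exp(-1j * X @ W.T).mean(axis=0)

rng = np.random.default_rng(0)
n, N, m, k = 2, 1000, 60, 4
centroids = 3.0 * rng.standard_normal((k, n))      # hypothetical ground-truth centroids
X = centroids[rng.integers(k, size=N)] + rng.standard_normal((N, n))

W = draw_frequencies(m, n, rng)
z = sketch_characteristic_function(X, W)           # size-m sketch of the whole collection
print(z.shape)                                     # (60,)
```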

Page 20:

R. GRIBONVAL - CSA 2015 - Berlin

Computational impact of sketching

10

Ph.D. A. Bourrier & N. Keriven

[Plots: computation time (s) and memory (bytes), each as a function of the collection size N]

Page 21:

R. GRIBONVAL - CSA 2015 - Berlin

Data distribution

Sketch

The Sketch Trick

11

X ∼ p(x)
z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i)

Page 22:

R. GRIBONVAL - CSA 2015 - Berlin

Data distribution

Sketch

The Sketch Trick

11

X ∼ p(x)
z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i) ≈ E h_ℓ(X)

Page 23:

R. GRIBONVAL - CSA 2015 - Berlin

Data distribution

Sketch

The Sketch Trick

11

X ∼ p(x)
z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i) ≈ E h_ℓ(X) = ∫ h_ℓ(x) p(x) dx

Page 24:

R. GRIBONVAL - CSA 2015 - Berlin

Data distribution

Sketch

The Sketch Trick

11

X ∼ p(x)
z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i) ≈ E h_ℓ(X) = ∫ h_ℓ(x) p(x) dx
nonlinear in the feature vectors, linear in the distribution p(x)
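A small illustration of the sketch trick (the feature functions h_ℓ below are hypothetical choices, not the ones used later in the talk): the sketch is an empirical average of nonlinear functions of the samples, so pooling two collections simply averages their sketches, which is the sense in which the sketch is nonlinear in the feature vectors but linear in the empirical distribution.

```python
# Small illustration of the sketch trick: the sketch is an empirical average of m
# nonlinear feature functions h_ell, so it is nonlinear in the samples but linear in
# the (empirical) distribution. The h_ell below are hypothetical choices for the demo.
import numpy as np

def sketch(X, hs):
    """z_ell = (1/N) * sum_i h_ell(x_i), for each feature function h_ell."""
    return np.array([h(X).mean() for h in hs])

rng = np.random.default_rng(0)
n, N = 2, 1000
ws = rng.standard_normal((5, n))
hs = [lambda X, w=w: (X @ w) ** 2 for w in ws]     # m = 5 random quadratic features

X1 = rng.standard_normal((N, n)) + 3.0             # samples from a distribution p1
X2 = rng.standard_normal((N, n)) - 3.0             # samples from a distribution p2
X_mix = np.vstack([X1, X2])                        # empirical mixture 0.5*p1 + 0.5*p2

# Linearity in the distribution: the sketch of the pooled collection equals the
# average of the two sketches (exactly, since both collections have size N).
print(np.allclose(sketch(X_mix, hs), 0.5 * (sketch(X1, hs) + sketch(X2, hs))))
```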

Page 25:

R. GRIBONVAL - CSA 2015 - Berlin

Data distribution

Sketch

The Sketch Trick

11

X ∼ p(x)
z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i) ≈ E h_ℓ(X) = ∫ h_ℓ(x) p(x) dx
Signal Processing (inverse problems, compressive sensing): signal space (x) → observation space (y), via a linear "projection" M
Machine Learning (method of moments, compressive learning): probability space (p) → sketch space (z), via a linear "projection" M
nonlinear in the feature vectors, linear in the distribution p(x)

Page 26:

R. GRIBONVAL - CSA 2015 - Berlin

Information preservation ?

Data distribution

Sketch

The Sketch Trick

11

X ∼ p(x)
z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i) ≈ E h_ℓ(X) = ∫ h_ℓ(x) p(x) dx
Signal Processing (inverse problems, compressive sensing): signal space (x) → observation space (y), via a linear "projection" M
Machine Learning (method of moments, compressive learning): probability space (p) → sketch space (z), via a linear "projection" M
nonlinear in the feature vectors, linear in the distribution p(x)

Page 27:

R. GRIBONVAL - CSA 2015 - Berlin

The Sketch Trick

Data distribution

Sketch

Dimension reduction ?

12

X ∼ p(x)
z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i) ≈ E h_ℓ(X) = ∫ h_ℓ(x) p(x) dx
Signal Processing (inverse problems, compressive sensing): signal space (x) → observation space (y), via a linear "projection" M
Machine Learning (method of moments, compressive learning): probability space (p) → sketch space (z), via a linear "projection" M
nonlinear in the feature vectors, linear in the distribution p(x)

Page 28:

Information preserving projections

Page 29:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Signal space

Observation space

Linear “projection”

x

Ex: set of k-sparse vectors

14

M

y

m ≪ n; observation space R^m, signal space R^n
Model set = signals of interest
Σ_k = {x ∈ R^n : ‖x‖_0 ≤ k}

Page 30:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Signal space

Observation space

Linear “projection”

x

Ex: set of k-sparse vectors

14

M

y

m ≪ n; observation space R^m, signal space R^n
Model set = signals of interest
Σ_k = {x ∈ R^n : ‖x‖_0 ≤ k}

Recovery algorithm

= “decoder”

Ideal goal: build decoder with the guarantee that

(instance optimality [Cohen & al 2009])

‖x − Δ(Mx + e)‖ ≤ C ‖e‖, ∀x ∈ Σ

Page 31:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Signal space

Observation space

Linear “projection”

x

Ex: set of k-sparse vectors

14

M

y

m ≪ n; observation space R^m, signal space R^n
Model set = signals of interest
Σ_k = {x ∈ R^n : ‖x‖_0 ≤ k}

Recovery algorithm

= “decoder”

Ideal goal: build decoder with the guarantee that

(instance optimality [Cohen & al 2009])

‖x − Δ(Mx + e)‖ ≤ C ‖e‖, ∀x ∈ Σ

Are there such decoders?

Page 32:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery of k-sparse vectors

Typical decoders
L1 minimization: LASSO [Tibshirani 1994], Basis Pursuit [Chen & al 1999]
Greedy algorithms: (Orthogonal) Matching Pursuit [Mallat & Zhang 1993], Iterative Hard Thresholding (IHT) [Blumensath & Davies 2009], …

Guarantees Assume Restricted isometry property

[Candès & al 2004]

Exact recovery Stability to noise Robustness to model error

15

Δ(y) := argmin_{x : Mx = y} ‖x‖_1
1 − δ ≤ ‖Mz‖²_2 / ‖z‖²_2 ≤ 1 + δ when ‖z‖_0 ≤ 2k
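Since IHT is listed among the typical decoders, here is a minimal, generic Iterative Hard Thresholding sketch (a textbook formulation under an assumed random Gaussian M, not the specific variant or parameters from the slide): iterate x ← H_k(x + μ Mᵀ(y − Mx)), where H_k keeps the k largest-magnitude entries.

```python
# Minimal Iterative Hard Thresholding (IHT) sketch: a generic textbook formulation,
# not the exact variant or parameters discussed on the slide.
#   x_{t+1} = H_k( x_t + mu * M^T (y - M x_t) ),  H_k = keep the k largest entries.
import numpy as np

def hard_threshold(x, k):
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]               # indices of the k largest magnitudes
    out[idx] = x[idx]
    return out

def iht(M, y, k, n_iter=500):
    mu = 1.0 / np.linalg.norm(M, 2) ** 2           # conservative step size
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + mu * M.T @ (y - M @ x), k)
    return x

rng = np.random.default_rng(0)
n, m, k = 200, 80, 5
M = rng.standard_normal((m, n)) / np.sqrt(m)       # random Gaussian measurement matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = iht(M, M @ x_true, k)                      # noiseless recovery attempt
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```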

Page 33:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Low-dimensional model Sparse

16

Signal space

Observation space

Linear “projection”

x

M

y

m ≪ n; observation space R^m, signal space R^n

Model set = signals of interest

Page 34:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Low-dimensional model Sparse Sparse in dictionary D

Signal space

Observation space

Linear “projection”

x

M

y

m ≪ n; observation space R^m, signal space R^n

Model set = signals of interest

17

Page 35:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Low-dimensional model Sparse Sparse in dictionary D Co-sparse in analysis operator A

total variation,

physics-driven sparse models ..

18

Signal space

Observation space

Linear “projection”

x

M

y

m ≪ n; observation space R^m, signal space R^n

Model set = signals of interest

Page 36:

R. GRIBONVAL - CSA 2015 - Berlin

Stable recovery

Low-dimensional model Sparse Sparse in dictionary D Co-sparse in analysis operator A

total variation,

physics-driven sparse models …

Low-rank matrix or tensor matrix completion,

phase-retrieval,

blind sensor calibration …

19

Signal space

Observation space

Linear “projection”

x

M

y

m ≪ n; observation space R^m, signal space R^n

Model set = signals of interest

Page 37:

R. GRIBONVAL - CSA 2015 - Berlin

Low-dimensional model Sparse Sparse in dictionary D Co-sparse in analysis operator A

total variation,

physics-driven sparse models …

Low-rank matrix or tensor matrix completion,

phase-retrieval,

blind sensor calibration …

Manifold / Union of manifolds detection, estimation,

localization, mapping …

Matrix with sparse inverse Gaussian graphical models

Given point cloud database indexing

Stable recovery

20

Signal space

Observation space

Linear “projection”

x

M

y

m ≪ n; observation space R^m, signal space R^n

Model set = signals of interest

Page 38:

R. GRIBONVAL - CSA 2015 - Berlin

Low-dimensional model Sparse Sparse in dictionary D Co-sparse in analysis operator A

total variation,

physics-driven sparse models …

Low-rank matrix or tensor matrix completion,

phase-retrieval,

blind sensor calibration …

Manifold / Union of manifolds detection, estimation,

localization, mapping …

Matrix with sparse inverse Gaussian graphical models

Given point cloud database indexing

Gaussian Mixture Model…

Stable recovery

21

Signal space

Observation space

Linear “projection”

x

M

y

m ≪ n; observation space R^m, signal space R^n

Model set = signals of interest

Vector space H

Page 39:

R. GRIBONVAL - CSA 2015 - Berlin

Low-dimensional model

arbitrary set

General stable recovery

22

Observation space

Linear “projection”

x

M

y

m ≪ n; observation space R^m
Model set = signals of interest
Σ ⊂ H
Signal space R^n, vector space H

Recovery algorithm

= “decoder”

Ideal goal: build decoder with the guarantee that

(instance optimality [Cohen & al 2009])

‖x − Δ(Mx + e)‖ ≤ C ‖e‖, ∀x ∈ Σ

Are there such decoders?

Page 40:

R. GRIBONVAL - CSA 2015 - Berlin

Theorem 1: RIP is necessary Definition: (general) Restricted Isometry Property (RIP) on secant set

RIP holds as soon as there exists an instance optimal decoder

Stable recovery from arbitrary model sets

23

α ≤ ‖Mz‖ / ‖z‖ ≤ β when z ∈ Σ − Σ := {x − x′ : x, x′ ∈ Σ}
up to renormalization: α = √(1 − δ), β = √(1 + δ)

Page 41:

R. GRIBONVAL - CSA 2015 - Berlin

Theorem 1: RIP is necessary Definition: (general) Restricted Isometry Property (RIP) on secant set

RIP holds as soon as there exists an instance optimal decoder

Theorem 2: RIP is sufficient RIP implies existence of decoder with performance guarantees:

Exact recovery

Stable to noise

Bonus: robust to model error

[Cohen & al 2009] for Σ_k
[Bourrier & al 2014] for arbitrary Σ
‖x − Δ(Mx + e)‖ ≤ C(δ) ‖e‖ + C′(δ) d(x, Σ)
Stable recovery from arbitrary model sets
23
α ≤ ‖Mz‖ / ‖z‖ ≤ β when z ∈ Σ − Σ := {x − x′ : x, x′ ∈ Σ}
up to renormalization: α = √(1 − δ), β = √(1 + δ)
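As a purely illustrative complement (not part of the talk), the restricted isometry ratio ‖Mz‖/‖z‖ can be probed numerically on randomly drawn elements of the secant set Σ_k − Σ_k for a random Gaussian M. Random sampling only gives an optimistic view of the true constants α and β, not a worst-case certificate.

```python
# Illustrative Monte Carlo probe (not part of the talk) of the restricted isometry
# ratio ||M z|| / ||z|| over randomly drawn elements of the secant set Sigma_k - Sigma_k
# (differences of k-sparse vectors). This only samples random secants, so it yields an
# optimistic estimate of the true constants alpha and beta.
import numpy as np

rng = np.random.default_rng(0)
n, m, k, trials = 200, 80, 5, 2000
M = rng.standard_normal((m, n)) / np.sqrt(m)       # random Gaussian "projection"

def random_k_sparse(n, k, rng):
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    return x

ratios = []
for _ in range(trials):
    z = random_k_sparse(n, k, rng) - random_k_sparse(n, k, rng)   # z in Sigma_k - Sigma_k
    ratios.append(np.linalg.norm(M @ z) / np.linalg.norm(z))
print(min(ratios), max(ratios))                    # empirical alpha, beta on sampled secants
```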

Page 42:

R. GRIBONVAL - CSA 2015 - Berlin

Theorem 1: RIP is necessary Definition: (general) Restricted Isometry Property (RIP) on secant set

RIP holds as soon as there exists an instance optimal decoder

Theorem 2: RIP is sufficient RIP implies existence of decoder with performance guarantees:

Exact recovery

Stable to noise

Bonus: robust to model error

[Cohen & al 2009] for Σ_k
[Bourrier & al 2014] for arbitrary Σ
‖x − Δ(Mx + e)‖ ≤ C(δ) ‖e‖ + C′(δ) d(x, Σ)
Stable recovery from arbitrary model sets
23
α ≤ ‖Mz‖ / ‖z‖ ≤ β when z ∈ Σ − Σ := {x − x′ : x, x′ ∈ Σ}
up to renormalization: α = √(1 − δ), β = √(1 + δ)

Page 43:

R. GRIBONVAL - CSA 2015 - Berlin

Theorem 1: RIP is necessary Definition: (general) Restricted Isometry Property (RIP) on secant set

RIP holds as soon as there exists an instance optimal decoder

Theorem 2: RIP is sufficient RIP implies existence of decoder with performance guarantees:

Exact recovery

Stable to noise

Bonus: robust to model error

[Cohen & al 2009] for Σ_k
[Bourrier & al 2014] for arbitrary Σ
‖x − Δ(Mx + e)‖ ≤ C(δ) ‖e‖ + C′(δ) d(x, Σ)
Stable recovery from arbitrary model sets
23
α ≤ ‖Mz‖ / ‖z‖ ≤ β when z ∈ Σ − Σ := {x − x′ : x, x′ ∈ Σ}
up to renormalization: α = √(1 − δ), β = √(1 + δ)
d(x, Σ): distance to the model set

Page 44:

Compressive Learning Examples

Page 45:

R. GRIBONVAL - CSA 2015 - Berlin

Compressive Machine Learning

Point cloud = empirical probability distribution

Reduce collection dimension = sketching

25

X

z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i), 1 ≤ ℓ ≤ m
M : X ↦ z ∈ R^m

Sketching operator

Choosing information preserving sketch ?

Page 46:

R. GRIBONVAL - CSA 2015 - Berlin

Goal: find k centroids

Standard approach = K-means

Sketching approach

p(x) is spatially localized

need “incoherent” sampling → choose Fourier sampling

sample characteristic function

choose sampling frequencies

Example: Compressive Clustering

26


z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i}, with frequencies ω_ℓ ∈ R^n

Page 47:

R. GRIBONVAL - CSA 2015 - Berlin

Goal: find k centroids

Standard approach = K-means

Sketching approach

p(x) is spatially localized

need “incoherent” sampling → choose Fourier sampling

sample characteristic function

choose sampling frequencies

Example: Compressive Clustering

26


z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i}, with frequencies ω_ℓ ∈ R^n

Page 48:

R. GRIBONVAL - CSA 2015 - Berlin

Goal: find k centroids

Standard approach = K-means

Sketching approach

p(x) is spatially localized

need “incoherent” sampling → choose Fourier sampling

sample characteristic function

choose sampling frequencies

Example: Compressive Clustering

26


z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i}, with frequencies ω_ℓ ∈ R^n

Page 49:

R. GRIBONVAL - CSA 2015 - Berlin

Goal: find k centroids

Standard approach = K-means

Sketching approach

p(x) is spatially localized

need “incoherent” sampling → choose Fourier sampling

sample characteristic function

choose sampling frequencies

Example: Compressive Clustering

26


z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i}, with frequencies ω_ℓ ∈ R^n

How ? see poster N. Keriven

Page 50:

R. GRIBONVAL - CSA 2015 - Berlin

Goal: find k centroids

27

X → M → z ∈ R^m
N = 1000; n = 2; Sampled Characteristic Function; m = 60


Example: Compressive Clustering

Page 51:

R. GRIBONVAL - CSA 2015 - Berlin

Goal: find k centroids

27

ground truth

X → M → z ∈ R^m
N = 1000; n = 2; Sampled Characteristic Function; m = 60
z = Mp ≈ Σ_{k=1}^K α_k M p_{θ_k}
p ≈ Σ_{k=1}^K α_k p_{θ_k}
Density model = GMM with variance = identity
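A minimal numerical check of the decomposition z = Mp ≈ Σ_k α_k M p_{θ_k}, assuming NumPy, identity covariance as stated on the slide, and the e^{+jω^T x} sign convention: for an isotropic unit-variance Gaussian component, the sampled characteristic function has the closed form M p_θ(ω) = exp(j⟨ω, μ⟩ − ‖ω‖²/2), so the empirical sketch of GMM samples should match the α-weighted sum of component sketches up to O(1/√N) fluctuations.

```python
# Numerical check (assumptions: NumPy, identity covariance as on the slide, the
# e^{+j w^T x} sign convention) that the sketch of a GMM decomposes as the weighted
# sum of its component sketches: M p_theta(w) = exp(j <w, mu> - ||w||^2 / 2).
import numpy as np

rng = np.random.default_rng(0)
n, N, m, K = 2, 100_000, 60, 3
alphas = np.array([0.5, 0.3, 0.2])
mus = 3.0 * rng.standard_normal((K, n))
W = rng.standard_normal((m, n))                    # frequencies w_ell ~ N(0, Id)

labels = rng.choice(K, size=N, p=alphas)
X = mus[labels] + rng.standard_normal((N, n))      # N samples from the GMM

z_empirical = np.exp(1j * X @ W.T).mean(axis=0)    # M applied to the empirical distribution
z_model = sum(a * np.exp(1j * W @ mu - 0.5 * np.sum(W**2, axis=1))
              for a, mu in zip(alphas, mus))       # alpha-weighted component sketches
print(np.max(np.abs(z_empirical - z_model)))       # O(1/sqrt(N)), small for large N
```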


Example: Compressive Clustering

Page 52:

R. GRIBONVAL - CSA 2015 - Berlin

Goal: find k centroids

27

estimated centroids vs. ground truth
X → M → z ∈ R^m
N = 1000; n = 2; Sampled Characteristic Function; m = 60
z = Mp ≈ Σ_{k=1}^K α_k M p_{θ_k}
p ≈ Σ_{k=1}^K α_k p_{θ_k}
Density model = GMM with variance = identity


The recovery algorithm (the "decoder") is inspired by Iterative Hard Thresholding.


Compressive Hierarchical Splitting (CHS) extends this approach to general GMMs; it is similar to Orthogonal Matching Pursuit (OMP) with Replacement.

Application: Speaker Verification Results (DET-curves)

MFCC coefficients x_i ∈ R^12. Full collection: N = 300 000 000 vectors, about 50 GBytes and roughly 1000 hours of speech; after silence detection, N = 60 000 000. The maximum collection size manageable by EM is N = 300 000, whereas CHS sketches the whole collection.

DET curves compare EM and CHS for increasing sketch sizes m (the compression factors are checked in the arithmetic snippet below):
• m = 500: close to EM; 7 200 000-fold compression (the sketch fits in one 40-L QR code).
• m = 1000: same as EM; 3 600 000-fold compression (two 40-L QR codes).
• m = 5 000: better than EM, since the whole collection is exploited; 720 000-fold compression (80 such sketches fit on a 3½-inch floppy disk).
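The quoted compression factors match a simple count of stored scalars, N × n feature entries versus an m-dimensional sketch; a quick back-of-the-envelope check of my own, counting entries rather than bytes:

```python
N, n = 300_000_000, 12               # full MFCC collection
for m in (500, 1000, 5000):
    print(f"m = {m}: {N * n // m:,}-fold compression")
# m = 500: 7,200,000-fold;  m = 1000: 3,600,000-fold;  m = 5000: 720,000-fold
```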


Computational Efficiency

Computational Aspects

Sketching the empirical characteristic function:

z_ℓ = (1/N) Σ_{i=1}^{N} exp(j ω_ℓᵀ x_i)

In matrix form: stack the data into X and the frequencies into W, compute the products WX, apply the pointwise nonlinearity h(·) = e^{j(·)}, then average over the N samples to obtain the sketch z.
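A direct NumPy transcription of this pipeline; the array layout is my choice, the slide itself only gives the block diagram (W, h(WX), average):

```python
import numpy as np

def empirical_sketch(X, W):
    """z_l = (1/N) sum_i exp(j w_l^T x_i): apply h(.) = exp(j .) to the products WX, then average."""
    WX = X @ W.T                          # (N, m) matrix of inner products w_l^T x_i
    return np.exp(1j * WX).mean(axis=0)   # average over the N samples: sketch z in C^m
```

When X is drawn from the mixture, empirical_sketch(X, W) approaches the model sketch gmm_sketch(W, alphas, mus) of the earlier snippet as N grows.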

This computation is essentially a one-layer random neural network applied to the data; whether a deep network would correspond to some form of hierarchical sketching is an open question. See also [Bruna & al 2013, Giryes & al 2015].

Privacy-preserving: sketch and forget. Once the sketch is computed, the individual data points no longer need to be stored.

Streaming algorithms: the sketch can be computed in a single pass over the data stream, with an online update.
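One way to realize the one-pass, online update is a running average of the per-sample contributions; this is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def streaming_sketch(stream, W):
    """One pass over an iterable of samples x (each a length-n vector); online running-average update."""
    z = np.zeros(W.shape[0], dtype=complex)
    count = 0
    for x in stream:
        count += 1
        z += (np.exp(1j * (W @ x)) - z) / count   # running mean of exp(j W x)
    return z
```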

Distributed computing: the sketch can be computed in a decentralized way (e.g., HADOOP) or in parallel (e.g., on GPU).
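Because the sketch is an empirical average, sketches computed independently on disjoint chunks of the collection can be merged by a weighted average, which is what makes the decentralized / parallel computation above straightforward. Illustrative code, not tied to any specific framework:

```python
import numpy as np

def merge_sketches(chunk_sketches, chunk_sizes):
    """Combine per-chunk sketches z_c (computed on N_c samples each) into the global sketch."""
    sizes = np.asarray(chunk_sizes, dtype=float)
    Z = np.stack(chunk_sketches)                    # (num_chunks, m)
    return (sizes[:, None] * Z).sum(axis=0) / sizes.sum()
```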


Conclusion

Projections & Learning

[Diagram: in signal processing (compressive sensing), a signal in the signal space is mapped by a linear "projection" M to an observation in the observation space; in machine learning (compressive learning), a probability distribution p in the probability space is mapped by M to a sketch z in the sketch space.]

Compressive sensing: random projections of individual data items; reduces the dimension of each data item.
Compressive learning with sketches: random projections of whole collections; nonlinear in the feature vectors, but linear in their probability distribution; reduces the size of the collection.

Summary

Compressive clustering & compressive GMM:
• Bourrier, Gribonval & Pérez, Compressive Gaussian Mixture Estimation, ICASSP 2013.
• Keriven & Gribonval, Compressive Gaussian Mixture Estimation by Orthogonal Matching Pursuit with Replacement, SPARS 2015, Cambridge, United Kingdom.
• Keriven et al., Sketching for Large-Scale Learning of Mixture Models (draft).

Unified framework covering projections & sketches: instance-optimal decoders and the Restricted Isometry Property.
• Bourrier et al., Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems, IEEE Transactions on Information Theory, 2014.

Challenge: can we compress the collection X before learning? Is information preserved?

Details: poster by N. Keriven.

Recent / ongoing work / challenges

Sufficient dimension m = O(d_B(Σ - Σ)) for the RIP: how much dimension reduction is possible, and with which decoders?
• Puy, Davies & Gribonval, Recipes for stable linear embeddings from Hilbert spaces to R^m, hal-01203614; see also EUSIPCO 2015 and [Dirksen 2014].

RIP for sketches in an RKHS, applied to compressive GMM.
• Upcoming: Keriven, Bourrier, Pérez & Gribonval.

Compressive statistical learning: intrinsic dimension of PCA and other related learning tasks.
• Work in progress: Blanchard & Gribonval.

RIP-based guarantees for general (convex & nonconvex) regularizers, extending the sharp RIP constant 1/sqrt(2) of [Cai & Zhang 2014] beyond sparsity (low rank, block/structured sparsity, ...).
• Traonmilin & Gribonval, Stable recovery of low-dimensional cones in Hilbert spaces - One RIP to rule them all, arXiv:1510.00504.

Details: poster by G. Puy.

Postdoc / R&D engineer positions @ IRISA
✓ theoretical and algorithmic foundations of large-scale machine learning & signal processing
✓ funded by the ERC project PLEASE

Interested? Join the team.


THANKS