Sketched Learning from Random Features Moments
Nicolas Keriven
Ecole Normale Supérieure (Paris)
CFM-ENS chair in Data Science
(thesis with Rémi Gribonval at Inria Rennes)
Imaging in Paris, Apr. 5th 2018
Context: machine learning

Database → Learning → Task
- Clustering
- Classification (e.g. "= cat")
- etc.

1/21
Context: machine learning

Large database / distributed database / data stream … → Learning (slow, costly) → Task
- Clustering
- Classification (e.g. "= cat")
- etc.

Idea: compute a small intermediate representation (1: compression, then 2: learning).

Desired properties:
- Fast to compute (distributed, streaming, GPU…)
- Preserves the desired information
- Preserves data privacy

2/21
Three compression schemes

Data = collection of vectors (database → feature extraction → vectors)

- Dimensionality reduction: random projection, feature selection; see e.g. [Calderbank 2009, Boutsidis 2010]
- Subsampling / coresets: uniform sampling (naive), adaptive sampling…; see e.g. [Feldman 2010]
- Linear sketch: hash tables, histograms — sketching for learning? Suited to distributed and streaming data; see [Thaper 2002, Cormode 2011]

3/21
How-to: build a sketch

What is a sketch?
Any linear sketch = empirical moments: z = (1/n) Σᵢ Φ(xᵢ) for some feature map Φ.

What is contained in a sketch?
- Φ(x) = x: mean
- Φ(x) = x xᵀ (or higher-order tensors): moments
- Φ(x) = bin indicators: histogram
- Proposed: kernel random features [Rahimi 2007] (random projections + a non-linearity)

Questions:
- What information is preserved by the sketching?
- How can this information be retrieved?
- What is a sufficient number of features?

Intuition: sketching as a linear embedding
- Assumption: the data are i.i.d. samples from a distribution π
- Linear operator: Aπ = E_{x∼π} Φ(x), which is linear in π
- "Noisy" linear measurement: z = Aπ + e, with small noise e

A dimensionality-reducing, random, linear embedding: Compressive Sensing?

4/21
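The sketch above (an empirical average of random features) can be written in a few lines. A minimal sketch in code, assuming complex random Fourier features Φ(x) = exp(−i W x); the sizes and toy dataset are illustrative, not from the talk:

```python
import numpy as np

# Minimal sketch computation: average of complex random Fourier features
# exp(-i W x) over the data. Sizes and the toy dataset are illustrative.
rng = np.random.default_rng(0)
d, m, n = 2, 64, 10_000
W = rng.normal(size=(m, d))                         # one random frequency per feature
X = rng.normal(size=(n, d)) + np.array([3.0, 0.0])  # toy dataset, shifted Gaussian

def sketch(X, W):
    """Empirical sketch z = (1/n) sum_i Phi(x_i)."""
    return np.exp(-1j * X @ W.T).mean(axis=0)

z = sketch(X, W)   # m numbers summarize n points, whatever n is

# Linearity is what makes sketches streaming- and distributed-friendly:
# sketches of two chunks merge exactly by a weighted average.
z1, z2 = sketch(X[:4000], W), sketch(X[4000:], W)
z_merged = (4000 * z1 + 6000 * z2) / n
assert np.allclose(z, z_merged)
```

The merge step is why the sketch suits distributed databases and data streams: each machine (or each time window) sketches its own chunk, and the chunks combine without revisiting the data.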
Sketched learning in this talk

Compressive Sensing analogy:
- Dimensionality reduction with a random operator (classical CS: a random matrix; here: averaged random features)
- (Ill-posed) inverse problem: density estimation
- Sparsity: "simple" densities (mixture models)

5/21
Result: Compressive k-means [Keriven et al 2017]

Mixture of Diracs = k-means.

Application: spectral clustering for MNIST classification [Uw 2001] (classification performance):
- Twice as fast as k-means
- 4 orders of magnitude more memory-efficient

6/21
Gaussian mixture models (GMM)

(Figure: estimation error vs. size of the database, d = 10, k = 20 — faster than EM (VLFeat's gmm).)

Application: speaker verification [Reynolds 2000] (d = 12, k = 64)
- EM on 300 000 vectors: 29.53
- 20kB sketch computed on a 50GB database: 28.96

7/21
In this talk

Q: Theoretical guarantees?
Inspired by Compressive Sensing:
1: with the Restricted Isometry Property (RIP)
2: with dual certificates

8/21
Outline

Information-preservation guarantees: a RIP analysis
(Joint work with R. Gribonval, G. Blanchard, Y. Traonmilin)

Total variation regularization: a dual certificate analysis

Conclusion, outlooks
Recall: linear inverse problem

True distribution: π. Sketch: z = Aπ + noise.

- The estimation problem is a linear inverse problem on measures
- Extremely ill-posed!
- Feasibility? (information preservation: can the best possible algorithm recover π?)

9/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven
New goal: find/construct models and operators that satisfy the LRIP (w.h.p.)
Non-convex generalized moment matching
GoalProve the existence of a decoder robustto noise and stable to modeling error.
Lower Restricted Isometry Property
« Instance-optimal » decoder
10/21
Appropriate metric

Goal: LRIP — with which metric?

Reproducing kernel: κ(x, x′) = E_ω[φ_ω(x) φ_ω(x′)*], approximated by random features [Rahimi 2007].
Kernel mean embedding: each distribution π is mapped to E_{x∼π} κ(x, ·); the induced metric (MMD) is the basis for the LRIP.

11/21
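A small numerical check of this connection (my own illustration, not from the talk): for frequencies ω ∼ N(0, I), the per-feature squared distance between two sketches approximates the Gaussian-kernel MMD between the two empirical distributions.

```python
import numpy as np

# Sketch distance vs. exact Gaussian-kernel MMD between two sample sets.
# Sizes and bandwidth are illustrative choices.
rng = np.random.default_rng(1)
d, m, n = 2, 5000, 400
X = rng.normal(size=(n, d))             # samples from pi
Y = rng.normal(size=(n, d)) + 1.0       # samples from pi', shifted mean

W = rng.normal(size=(m, d))             # w ~ N(0, I)  <->  k(x, y) = exp(-||x - y||^2 / 2)
zX = np.exp(-1j * X @ W.T).mean(axis=0)
zY = np.exp(-1j * Y @ W.T).mean(axis=0)
mmd2_rf = np.sum(np.abs(zX - zY) ** 2) / m   # sketch distance, averaged per feature

def gram_mean(A, B):
    """Mean of the exact Gaussian-kernel Gram matrix between A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / 2).mean()

mmd2_exact = gram_mean(X, X) + gram_mean(Y, Y) - 2 * gram_mean(X, Y)
# mmd2_rf is a Monte-Carlo estimate of mmd2_exact over the m frequencies
```

The exact Gram computation costs O(n²); the sketch version costs O(nm) and never needs the two sample sets on the same machine.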
Proof strategy (1)

Goal: LRIP. Reformulation: lower-bound the operator on the normalized secant set.

Definition (normalized secant set): the set of differences of model elements, normalized in the kernel metric.

New goal: with high probability on the random features, for all μ in the normalized secant set, ‖Aμ‖ is bounded away from 0.

12/21
Proof strategy (2)

Goal: LRIP.
- Pointwise LRIP: concentration inequality (for a fixed element of the secant set)
- Extension to the full LRIP: covering numbers of the secant set

13/21
Main result

Main hypothesis: the normalized secant set has finite covering numbers.

Result: for a sketch size m ≳ (quality of the pointwise LRIP) × (dimensionality of the model, measured by covering numbers), w.h.p. the LRIP holds, and the decoder's error is controlled by the modeling error plus the empirical noise.

- Classic Compressive Sensing (finite dimension): known
- Here (infinite dimension): technical

14/21
Application

k-means (mixtures of Diracs):
- Hypotheses: sufficiently separated centroids, bounded domain for the centroids (no assumption on the data)
- Sketch: adjusted random Fourier features (for technical reasons)
- Result: guarantees w.r.t. the usual k-means cost (SSE), with a bounded sketch size

GMM with known covariance:
- Hypotheses: sufficiently separated means, bounded domain for the means
- Sketch: Fourier features
- Result: guarantees with respect to the log-likelihood, with a bounded sketch size

15/21
Summary

With the RIP analysis:
- Moment matching: best possible decoder (instance-optimal)
- Information-preservation guarantees
- Fine control of modeling error, noise, and metrics
- Can incorporate the k-means cost or the log-likelihood

Compressive Sensing analogy:
- Random, dimensionality-reducing operator
- Sparsity
- The information is preserved
- Convex relaxation?
Outline

Information-preservation guarantees: a RIP analysis

Total variation regularization: a dual certificate analysis
(Joint work with C. Poon, G. Peyré)

Conclusion, outlooks
Total Variation regularization

Previously (RIP analysis): minimization by moment matching.
- Must know the number of components
- Non-convex!

Convex relaxation ("super-resolution"): optimize over Radon measures μ, minimizing a data-fit term ‖Aμ − z‖² plus the total variation norm of μ (the "L1 norm" for measures).

Convex: can be handled e.g. by the Frank-Wolfe algorithm [Boyd 2015], or in some cases as an SDP.

Questions:
- Is the recovered measure sparse?
- Does it have the right number of components?
- Does it recover the true components?

16/21
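On a grid, the TV norm of a nonnegative measure reduces to an L1 norm, so the relaxation becomes a nonnegative Lasso. A minimal discretized illustration (my own choices of grid, frequencies, λ, and solver — plain ISTA — not the talk's algorithm):

```python
import numpy as np

# Discretized TV relaxation: spikes restricted to a grid, solved as a
# nonnegative Lasso with ISTA (proximal gradient). Illustrative sketch.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)               # candidate locations, step 0.01
t_true = np.array([0.25, 0.70])                 # well-separated, on-grid spikes
a_true = np.array([0.6, 0.4])                   # nonnegative weights

m = 100
w = rng.normal(scale=30.0, size=m)              # random Fourier frequencies
A = np.exp(-1j * np.outer(w, grid))             # sketching operator on the grid
z = np.exp(-1j * np.outer(w, t_true)) @ a_true  # noiseless sketch of the spikes

lam = 0.1
L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
x = np.zeros(grid.size)
for _ in range(20_000):                         # ISTA: gradient step + prox
    grad = (A.conj().T @ (A @ x - z)).real      # gradient of 0.5 * ||A x - z||^2
    x = np.maximum(x - (grad + lam) / L, 0.0)   # nonnegative soft-thresholding

# mass should concentrate near the true spike locations
near = np.abs(grid[:, None] - t_true).min(axis=1) <= 0.03
```

The true off-the-grid problem replaces the grid with a continuum of locations; Frank-Wolfe-type methods add one candidate location at a time instead of keeping the full grid.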
A bit of convex analysis

Intuition: first-order optimality conditions characterize the solution.

Definition: dual certificate (= Lagrange multiplier in the noiseless case). A function η in the range of the adjoint operator such that:
- η = 1 at the true components
- |η| < 1 otherwise

Its existence ensures uniqueness and robustness.

17/21
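Such a certificate can be computed numerically. Below, a minimal sketch of the standard minimal interpolation ("pre-certificate") for two spikes under a full 1D Gaussian kernel: interpolate η(tₗ) = 1 with η′(tₗ) = 0 in the span of the kernel and its derivative, then check η < 1 away from the spikes. Kernel width and spike positions are my illustrative choices.

```python
import numpy as np

sigma = 0.1
spikes = np.array([-0.25, 0.25])   # separation 5*sigma: "sufficiently separated"

def K(x, s):                       # Gaussian kernel k(x, s)
    return np.exp(-(x - s) ** 2 / (2 * sigma ** 2))
def K10(x, s):                     # d/dx k
    return -(x - s) / sigma ** 2 * K(x, s)
def K01(x, s):                     # d/ds k
    return (x - s) / sigma ** 2 * K(x, s)
def K11(x, s):                     # d^2/(dx ds) k
    return (1.0 / sigma ** 2 - (x - s) ** 2 / sigma ** 4) * K(x, s)

# eta(x) = sum_l a_l k(x, t_l) + b_l (d/ds k)(x, t_l), with the constraints
# eta(t_l) = 1 and eta'(t_l) = 0.
Xg, Sg = np.meshgrid(spikes, spikes, indexing="ij")
M = np.block([[K(Xg, Sg), K01(Xg, Sg)], [K10(Xg, Sg), K11(Xg, Sg)]])
coef = np.linalg.solve(M, np.concatenate([np.ones(2), np.zeros(2)]))
a, b = coef[:2], coef[2:]

def eta(x):
    x = np.atleast_1d(x).astype(float)[:, None]
    return K(x, spikes) @ a + K01(x, spikes) @ b

grid = np.linspace(-1.0, 1.0, 2001)
vals = eta(grid)
off = np.abs(grid[:, None] - spikes).min(axis=1) > 0.02  # away from the spikes
```

When the spikes are well separated relative to the kernel width, this interpolant stays strictly below 1 off the support — the nondegeneracy that makes the recovery unique and robust.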
Strategy: going back to random features

Step 1: study the full (expected) kernel.
Assumptions: kernel "well-behaved"; components sufficiently separated.

Step 2: bound the deviations of the random features from the full kernel.
- Pointwise deviation (concentration inequalities)
- Covering numbers

(Figure: certificates for m = 10, 20, 50 random features.)

18/21
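The deviation in Step 2 is easy to see numerically (cf. the m = 10 / 20 / 50 figures). A small illustration with real cosine features and a 1D Gaussian kernel; the sup-deviation over a grid shrinks roughly as 1/√m:

```python
import numpy as np

# Sup-deviation between the random-feature kernel and the full kernel as the
# number of features m grows. Grid and bandwidth are illustrative.
rng = np.random.default_rng(0)
t = np.linspace(-3.0, 3.0, 201)
k_full = np.exp(-t ** 2 / 2)        # k(t, 0) = E_w cos(w t) for w ~ N(0, 1)

errs = {}
for m in (10, 100, 1000):
    w = rng.normal(size=m)
    k_m = np.cos(np.outer(t, w)).mean(axis=1)   # empirical random-feature kernel
    errs[m] = float(np.max(np.abs(k_m - k_full)))
```

Extending such pointwise concentration to a uniform bound over all locations is exactly where the covering-number argument comes in.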
Results for separated GMM

Assumption: the data are actually drawn from a GMM.

1: Ideal scaling in sparsity (in progress)
- The recovered measure is not necessarily sparse, but
- its mass concentrates around the true components
- Proof: infinite-dimensional golfing scheme (new)

2: Minimal-norm certificate [Duval, Peyré 2015] (in progress)
- When n is high enough: the solution is sparse, with the right number of components
- Proof: adaptation of [Tang, Recht 2013] (constructive!)

19/21
Outline
Information-preservation guarantees: a RIP analysis
Total variation regularization:a dual certificate analysis
Conclusion, outlooks
Sketch learning

- Sketching enables streaming and distributed learning
- An original view of data compression through generalized moments
- Combines random features and kernel mean embeddings with infinite-dimensional Compressive Sensing

20/21
Summary, outlooks

Dual certificate analysis:
- Convex minimization
- Does not handle modeling error
- In some cases, automatically finds the right number of components

RIP analysis:
- Information-preservation guarantees
- Fine control of noise, modeling error (instance-optimal decoder), and recovery metrics
- Necessary and sufficient conditions
- But: non-convex minimization

Outlooks:
- Algorithms for TV minimization
- Other features (not necessarily random…)
- Other "sketched" learning tasks
- Multilayer sketches?

21/21
Thank you!

- Keriven, Bourrier, Gribonval, Pérez. Sketching for Large-Scale Learning of Mixture Models. Information & Inference: a Journal of the IMA, 2017. <arXiv:1606.02838>
- Keriven, Tremblay, Traonmilin, Gribonval. Compressive k-means. ICASSP, 2017.
- Gribonval, Blanchard, Keriven, Traonmilin. Compressive Statistical Learning with Random Feature Moments. Preprint, 2017. <arXiv:1706.07180>
- Keriven. Sketching for Large-Scale Learning of Mixture Models. PhD thesis. <tel-01620815>
- Poon, Keriven, Peyré. A Dual Certificates Analysis of Compressive Off-the-Grid Recovery. Submitted.
- Code: sketchml.gforge.inria.fr; GitHub: nkeriven