Sketched Learning from Random Features Moments
Nicolas Keriven
Ecole Normale Supérieure (Paris)
CFM-ENS chair in Data Science
(thesis with Rémi Gribonval at Inria Rennes)
Imaging in Paris, Apr. 5th 2018
Context: machine learning

Database → Learning → Task
- Clustering
- Classification (e.g. "= cat")
- etc.

1/21
Context: machine learning

Large database / distributed database / data stream … → Learning (slow, costly) → Task
- Clustering
- Classification (e.g. "= cat")
- etc.

Idea: compute a small intermediate representation (1: compression, then 2: learning).

Desired properties:
- Fast to compute (distributed, streaming, GPU…)
- Preserves the desired information
- Preserves data privacy

2/21
Three compression schemes

Data = collection of vectors (database → feature extraction → vectors)

- Dimensionality reduction: random projection, feature selection; see e.g. [Calderbank 2009, Boutsidis 2010]
- Subsampling / coresets: uniform sampling (naive), adaptive sampling…; see e.g. [Feldman 2010]
- Linear sketch: hash tables, histograms — sketching for learning? Suited to distributed and streaming data; see [Thaper 2002, Cormode 2011]

3/21
How-to: build a sketch

What is a sketch?
Any linear sketch = empirical moments: z = (1/n) Σᵢ Φ(xᵢ) for some feature map Φ.

What is contained in a sketch?
- Φ(x) = x: mean
- Φ(x) = x xᵀ (or higher-order tensors): moments
- Φ(x) = bin indicators: histogram
- Proposed: kernel random features [Rahimi 2007] (random projections + a non-linearity)

Questions:
- What information is preserved by the sketching?
- How can this information be retrieved?
- What is a sufficient number of features?

Intuition: sketching as a linear embedding
- Assumption: the data are i.i.d. samples from a distribution π
- Linear operator: Aπ = E_{x∼π} Φ(x), which is linear in π
- "Noisy" linear measurement: z = Aπ + e, with small noise e

A dimensionality-reducing, random, linear embedding: Compressive Sensing?

4/21
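The sketch above (an empirical average of random features) can be written in a few lines. A minimal sketch in code, assuming complex random Fourier features Φ(x) = exp(−i W x); the sizes and toy dataset are illustrative, not from the talk:

```python
import numpy as np

# Minimal sketch computation: average of complex random Fourier features
# exp(-i W x) over the data. Sizes and the toy dataset are illustrative.
rng = np.random.default_rng(0)
d, m, n = 2, 64, 10_000
W = rng.normal(size=(m, d))                         # one random frequency per feature
X = rng.normal(size=(n, d)) + np.array([3.0, 0.0])  # toy dataset, shifted Gaussian

def sketch(X, W):
    """Empirical sketch z = (1/n) sum_i Phi(x_i)."""
    return np.exp(-1j * X @ W.T).mean(axis=0)

z = sketch(X, W)   # m numbers summarize n points, whatever n is

# Linearity is what makes sketches streaming- and distributed-friendly:
# sketches of two chunks merge exactly by a weighted average.
z1, z2 = sketch(X[:4000], W), sketch(X[4000:], W)
z_merged = (4000 * z1 + 6000 * z2) / n
assert np.allclose(z, z_merged)
```

The merge step is why the sketch suits distributed databases and data streams: each machine (or each time window) sketches its own chunk, and the chunks combine without revisiting the data.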
Sketched learning in this talk

Compressive Sensing analogy:
- Dimensionality reduction with a random operator (classical CS: a random matrix; here: averaged random features)
- (Ill-posed) inverse problem: density estimation
- Sparsity: "simple" densities (mixture models)

5/21
Result: Compressive k-means [Keriven et al 2017]

Mixture of Diracs = k-means.

Application: spectral clustering for MNIST classification [Uw 2001] (classification performance):
- Twice as fast as k-means
- 4 orders of magnitude more memory-efficient

6/21
Gaussian mixture models (GMM)

(Figure: estimation error vs. size of the database, d = 10, k = 20 — faster than EM (VLFeat's gmm).)

Application: speaker verification [Reynolds 2000] (d = 12, k = 64)
- EM on 300 000 vectors: 29.53
- 20kB sketch computed on a 50GB database: 28.96

7/21
In this talk

Q: Theoretical guarantees?
Inspired by Compressive Sensing:
1: with the Restricted Isometry Property (RIP)
2: with dual certificates

8/21
Outline

Information-preservation guarantees: a RIP analysis
(Joint work with R. Gribonval, G. Blanchard, Y. Traonmilin)

Total variation regularization: a dual certificate analysis

Conclusion, outlooks
Recall: linear inverse problem

True distribution: π. Sketch: z = Aπ + noise.

- The estimation problem is a linear inverse problem on measures
- Extremely ill-posed!
- Feasibility? (information preservation: can the best possible algorithm recover π?)

9/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven
New goal: find/construct models and operators that satisfy the LRIP (w.h.p.)
Non-convex generalized moment matching
GoalProve the existence of a decoder robustto noise and stable to modeling error.
Lower Restricted Isometry Property
« Instance-optimal » decoder
10/21
Appropriate metric

Goal: LRIP — with which metric?

Reproducing kernel: κ(x, x′) = E_ω[φ_ω(x) φ_ω(x′)*], approximated by random features [Rahimi 2007].
Kernel mean embedding: each distribution π is mapped to E_{x∼π} κ(x, ·); the induced metric (MMD) is the basis for the LRIP.

11/21
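A small numerical check of this connection (my own illustration, not from the talk): for frequencies ω ∼ N(0, I), the per-feature squared distance between two sketches approximates the Gaussian-kernel MMD between the two empirical distributions.

```python
import numpy as np

# Sketch distance vs. exact Gaussian-kernel MMD between two sample sets.
# Sizes and bandwidth are illustrative choices.
rng = np.random.default_rng(1)
d, m, n = 2, 5000, 400
X = rng.normal(size=(n, d))             # samples from pi
Y = rng.normal(size=(n, d)) + 1.0       # samples from pi', shifted mean

W = rng.normal(size=(m, d))             # w ~ N(0, I)  <->  k(x, y) = exp(-||x - y||^2 / 2)
zX = np.exp(-1j * X @ W.T).mean(axis=0)
zY = np.exp(-1j * Y @ W.T).mean(axis=0)
mmd2_rf = np.sum(np.abs(zX - zY) ** 2) / m   # sketch distance, averaged per feature

def gram_mean(A, B):
    """Mean of the exact Gaussian-kernel Gram matrix between A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / 2).mean()

mmd2_exact = gram_mean(X, X) + gram_mean(Y, Y) - 2 * gram_mean(X, Y)
# mmd2_rf is a Monte-Carlo estimate of mmd2_exact over the m frequencies
```

The exact Gram computation costs O(n²); the sketch version costs O(nm) and never needs the two sample sets on the same machine.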
Proof strategy (1)

Goal: LRIP. Reformulation: lower-bound the operator on the normalized secant set.

Definition (normalized secant set): the set of differences of model elements, normalized in the kernel metric.

New goal: with high probability on the random features, for all μ in the normalized secant set, ‖Aμ‖ is bounded away from 0.

12/21
Proof strategy (2)

Goal: LRIP.
- Pointwise LRIP: concentration inequality (for a fixed element of the secant set)
- Extension to the full LRIP: covering numbers of the secant set

13/21
Main result

Main hypothesis: the normalized secant set has finite covering numbers.

Result: for a sketch size m ≳ (quality of the pointwise LRIP) × (dimensionality of the model, measured by covering numbers), w.h.p. the LRIP holds, and the decoder's error is controlled by the modeling error plus the empirical noise.

- Classic Compressive Sensing (finite dimension): known
- Here (infinite dimension): technical

14/21
Application

k-means (mixtures of Diracs):
- Hypotheses: sufficiently separated centroids, bounded domain for the centroids (no assumption on the data)
- Sketch: adjusted random Fourier features (for technical reasons)
- Result: guarantees w.r.t. the usual k-means cost (SSE), with a bounded sketch size

GMM with known covariance:
- Hypotheses: sufficiently separated means, bounded domain for the means
- Sketch: Fourier features
- Result: guarantees with respect to the log-likelihood, with a bounded sketch size

15/21
Summary

With the RIP analysis:
- Moment matching: best possible decoder (instance-optimal)
- Information-preservation guarantees
- Fine control of modeling error, noise, and metrics
- Can incorporate the k-means cost or the log-likelihood

Compressive Sensing analogy:
- Random, dimensionality-reducing operator
- Sparsity
- The information is preserved
- Convex relaxation?
Outline

Information-preservation guarantees: a RIP analysis

Total variation regularization: a dual certificate analysis
(Joint work with C. Poon, G. Peyré)

Conclusion, outlooks
Total Variation regularization

Previously (RIP analysis): minimization by moment matching.
- Must know the number of components
- Non-convex!

Convex relaxation ("super-resolution"): optimize over Radon measures μ, minimizing a data-fit term ‖Aμ − z‖² plus the total variation norm of μ (the "L1 norm" for measures).

Convex: can be handled e.g. by the Frank-Wolfe algorithm [Boyd 2015], or in some cases as an SDP.

Questions:
- Is the recovered measure sparse?
- Does it have the right number of components?
- Does it recover the true components?

16/21
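On a grid, the TV norm of a nonnegative measure reduces to an L1 norm, so the relaxation becomes a nonnegative Lasso. A minimal discretized illustration (my own choices of grid, frequencies, λ, and solver — plain ISTA — not the talk's algorithm):

```python
import numpy as np

# Discretized TV relaxation: spikes restricted to a grid, solved as a
# nonnegative Lasso with ISTA (proximal gradient). Illustrative sketch.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)               # candidate locations, step 0.01
t_true = np.array([0.25, 0.70])                 # well-separated, on-grid spikes
a_true = np.array([0.6, 0.4])                   # nonnegative weights

m = 100
w = rng.normal(scale=30.0, size=m)              # random Fourier frequencies
A = np.exp(-1j * np.outer(w, grid))             # sketching operator on the grid
z = np.exp(-1j * np.outer(w, t_true)) @ a_true  # noiseless sketch of the spikes

lam = 0.1
L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
x = np.zeros(grid.size)
for _ in range(20_000):                         # ISTA: gradient step + prox
    grad = (A.conj().T @ (A @ x - z)).real      # gradient of 0.5 * ||A x - z||^2
    x = np.maximum(x - (grad + lam) / L, 0.0)   # nonnegative soft-thresholding

# mass should concentrate near the true spike locations
near = np.abs(grid[:, None] - t_true).min(axis=1) <= 0.03
```

The true off-the-grid problem replaces the grid with a continuum of locations; Frank-Wolfe-type methods add one candidate location at a time instead of keeping the full grid.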
A bit of convex analysis

Intuition: first-order optimality conditions characterize the solution.

Definition: dual certificate (= Lagrange multiplier in the noiseless case). A function η in the range of the adjoint operator such that:
- η = 1 at the true components
- |η| < 1 otherwise

Its existence ensures uniqueness and robustness.

17/21
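Such a certificate can be computed numerically. Below, a minimal sketch of the standard minimal interpolation ("pre-certificate") for two spikes under a full 1D Gaussian kernel: interpolate η(tₗ) = 1 with η′(tₗ) = 0 in the span of the kernel and its derivative, then check η < 1 away from the spikes. Kernel width and spike positions are my illustrative choices.

```python
import numpy as np

sigma = 0.1
spikes = np.array([-0.25, 0.25])   # separation 5*sigma: "sufficiently separated"

def K(x, s):                       # Gaussian kernel k(x, s)
    return np.exp(-(x - s) ** 2 / (2 * sigma ** 2))
def K10(x, s):                     # d/dx k
    return -(x - s) / sigma ** 2 * K(x, s)
def K01(x, s):                     # d/ds k
    return (x - s) / sigma ** 2 * K(x, s)
def K11(x, s):                     # d^2/(dx ds) k
    return (1.0 / sigma ** 2 - (x - s) ** 2 / sigma ** 4) * K(x, s)

# eta(x) = sum_l a_l k(x, t_l) + b_l (d/ds k)(x, t_l), with the constraints
# eta(t_l) = 1 and eta'(t_l) = 0.
Xg, Sg = np.meshgrid(spikes, spikes, indexing="ij")
M = np.block([[K(Xg, Sg), K01(Xg, Sg)], [K10(Xg, Sg), K11(Xg, Sg)]])
coef = np.linalg.solve(M, np.concatenate([np.ones(2), np.zeros(2)]))
a, b = coef[:2], coef[2:]

def eta(x):
    x = np.atleast_1d(x).astype(float)[:, None]
    return K(x, spikes) @ a + K01(x, spikes) @ b

grid = np.linspace(-1.0, 1.0, 2001)
vals = eta(grid)
off = np.abs(grid[:, None] - spikes).min(axis=1) > 0.02  # away from the spikes
```

When the spikes are well separated relative to the kernel width, this interpolant stays strictly below 1 off the support — the nondegeneracy that makes the recovery unique and robust.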
Strategy: going back to random features

Step 1: study the full (expected) kernel.
Assumptions: kernel "well-behaved"; components sufficiently separated.

Step 2: bound the deviations of the random features from the full kernel.
- Pointwise deviation (concentration inequalities)
- Covering numbers

(Figure: certificates for m = 10, 20, 50 random features.)

18/21
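The deviation in Step 2 is easy to see numerically (cf. the m = 10 / 20 / 50 figures). A small illustration with real cosine features and a 1D Gaussian kernel; the sup-deviation over a grid shrinks roughly as 1/√m:

```python
import numpy as np

# Sup-deviation between the random-feature kernel and the full kernel as the
# number of features m grows. Grid and bandwidth are illustrative.
rng = np.random.default_rng(0)
t = np.linspace(-3.0, 3.0, 201)
k_full = np.exp(-t ** 2 / 2)        # k(t, 0) = E_w cos(w t) for w ~ N(0, 1)

errs = {}
for m in (10, 100, 1000):
    w = rng.normal(size=m)
    k_m = np.cos(np.outer(t, w)).mean(axis=1)   # empirical random-feature kernel
    errs[m] = float(np.max(np.abs(k_m - k_full)))
```

Extending such pointwise concentration to a uniform bound over all locations is exactly where the covering-number argument comes in.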
Results for separated GMM

Assumption: the data are actually drawn from a GMM.

1: Ideal scaling in sparsity (in progress)
- The recovered measure is not necessarily sparse, but
- its mass concentrates around the true components
- Proof: infinite-dimensional golfing scheme (new)

2: Minimal-norm certificate [Duval, Peyré 2015] (in progress)
- When n is high enough: the solution is sparse, with the right number of components
- Proof: adaptation of [Tang, Recht 2013] (constructive!)

19/21
Outline
Information-preservation guarantees: a RIP analysis
Total variation regularization:a dual certificate analysis
Conclusion, outlooks
Sketch learning

- Sketching enables streaming and distributed learning
- An original view of data compression through generalized moments
- Combines random features and kernel mean embeddings with infinite-dimensional Compressive Sensing

20/21
Summary, outlooks

Dual certificate analysis:
- Convex minimization
- Does not handle modeling error
- In some cases, automatically finds the right number of components

RIP analysis:
- Information-preservation guarantees
- Fine control of noise, modeling error (instance-optimal decoder), and recovery metrics
- Necessary and sufficient conditions
- But: non-convex minimization

Outlooks:
- Algorithms for TV minimization
- Other features (not necessarily random…)
- Other "sketched" learning tasks
- Multilayer sketches?

21/21
Thank you!

- Keriven, Bourrier, Gribonval, Pérez. Sketching for Large-Scale Learning of Mixture Models. Information & Inference: a Journal of the IMA, 2017. <arXiv:1606.02838>
- Keriven, Tremblay, Traonmilin, Gribonval. Compressive k-means. ICASSP, 2017.
- Gribonval, Blanchard, Keriven, Traonmilin. Compressive Statistical Learning with Random Feature Moments. Preprint, 2017. <arXiv:1706.07180>
- Keriven. Sketching for Large-Scale Learning of Mixture Models. PhD thesis. <tel-01620815>
- Poon, Keriven, Peyré. A Dual Certificates Analysis of Compressive Off-the-Grid Recovery. Submitted.
- Code: sketchml.gforge.inria.fr; GitHub: nkeriven