
A Random Matrix Framework for Large Dimensional Machine Learning and Neural Networks

Ph.D. defense

Zhenyu LIAO, supervised by Romain COUILLET and Yacine CHITOUR

CentraleSupélec, Université Paris-Saclay, France.

September 30, 2019


Understanding the mechanism of large dimensional machine learning

large dimensional data $x_1, \ldots, x_n \in \mathbb{R}^p$ → learning algorithm

- big data era: exploit large $n, p$;
- counterintuitive phenomena, e.g., the "curse of dimensionality";
- complete change of understanding of many algorithms;
- RMT provides the tools.


Outline

1. Motivation
   - Sample covariance matrix for large dimensional data
   - A random matrix perspective of the "curse of dimensionality"

2. Main results: statistical behavior of large dimensional random feature maps
   - Random feature maps for large dimensional data
   - Application to random feature-based ridge regression
   - Random feature maps for classifying Gaussian mixtures
   - Application to random feature-based spectral clustering

3. Conclusion
   - From toy to more realistic learning schemes
   - From toy to more realistic data models


Sample covariance matrix in the large n, p regime

- For $x_i \sim \mathcal{N}(0, C)$, estimate the population covariance $C$ from $n$ data samples $X = [x_1, \ldots, x_n] \in \mathbb{R}^{p \times n}$.

- Maximum likelihood sample covariance matrix:

  $$\hat{C} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T = \frac{1}{n} X X^T \in \mathbb{R}^{p \times p}$$

  of rank at most $n$: optimal for $n \gg p$ (or, for $p$ "small").

- In the regime $n \sim p$, conventional wisdom breaks down: for $C = I_p$ with $n < p$, $\hat{C}$ has at least $p - n$ zero eigenvalues, and

  $$\|\hat{C} - C\| \not\to 0, \quad n, p \to \infty$$

  ⇒ eigenvalue mismatch: $\hat{C}$ is not a consistent estimator of $C$!
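A quick numerical check of this breakdown, as a minimal numpy sketch (the sizes $p = 512$, $n = 256$ are arbitrary illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 512, 256                    # more dimensions than samples: n < p
X = rng.standard_normal((p, n))    # columns x_i ~ N(0, I_p), i.e. C = I_p
C_hat = X @ X.T / n                # sample covariance, rank at most n

eigs = np.linalg.eigvalsh(C_hat)
print(int((eigs < 1e-10).sum()))             # p - n = 256 numerically zero eigenvalues
print(float(eigs.max()))                     # largest eigenvalue far above 1
print(np.linalg.norm(C_hat - np.eye(p), 2))  # operator-norm error stays O(1)
```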


When is one under the random matrix regime? Almost always!

What about $n = 100p$? For $C = I_p$, as $n, p \to \infty$ with $p/n \to c \in (0, \infty)$: the Marcenko-Pastur law

$$\mu(dx) = (1 - c^{-1})^+ \delta(x) + \frac{1}{2\pi c x}\sqrt{(x - a)^+ (b - x)^+}\, dx$$

where $a = (1 - \sqrt{c})^2$, $b = (1 + \sqrt{c})^2$ and $(x)^+ \equiv \max(x, 0)$. Close match!

Figure: Eigenvalue distribution of $\hat{C}$ versus the Marcenko-Pastur law, $p = 500$, $n = 50\,000$ (histogram of empirical eigenvalues, the Marcenko-Pastur density, and the population eigenvalue 1).

- the eigenvalues spread over $[a, b] = [(1 - \sqrt{c})^2, (1 + \sqrt{c})^2]$;
- for $n = 100p$, over a range of $\pm 2\sqrt{c} = \pm 0.2$ around the population eigenvalue 1.
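This is easy to verify numerically; a sketch at the same ratio $c = 0.01$ but with smaller sizes than the figure, to keep the computation light:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 200, 20_000                 # same ratio c = p/n = 0.01 as in the figure
c = p / n
X = rng.standard_normal((p, n))    # C = I_p
eigs = np.linalg.eigvalsh(X @ X.T / n)

a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
print(eigs.min(), a)               # both ~ 0.81
print(eigs.max(), b)               # both ~ 1.21

# Marcenko-Pastur density on [a, b] (no atom at 0 since c < 1), for a histogram overlay
x = np.linspace(a, b, 200)
density = np.sqrt((x - a) * (b - x)) / (2 * np.pi * c * x)
```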


“Curse of dimensionality”: loss of relevance of Euclidean distance

- Binary Gaussian mixture classification:

  $$\mathcal{C}_1:\ x \sim \mathcal{N}(\mu, I_p),\quad x = \mu + z;$$
  $$\mathcal{C}_2:\ x \sim \mathcal{N}(-\mu, I_p + E),\quad x = -\mu + (I_p + E)^{\frac12} z,$$

  for $z \sim \mathcal{N}(0, I_p)$.

- Neyman-Pearson test: classification is possible only when

  $$\|\mu\| \geq O(1), \quad \|E\| \geq O(p^{-1/2}), \quad |\operatorname{tr} E| \geq O(\sqrt{p}), \quad \|E\|_F^2 \geq O(1).$$

- In this non-trivial setting, for $x_i \in \mathcal{C}_a$, $x_j \in \mathcal{C}_b$,

  $$\frac{1}{p}\|x_i - x_j\|^2 = \frac{1}{p}\|z_i - z_j\|^2 + O(p^{-1/2})$$

  regardless of the classes $\mathcal{C}_a, \mathcal{C}_b$!

- Indeed,

  $$\max_{1 \leq i \neq j \leq n}\left\{\frac{1}{p}\|x_i - x_j\|^2 - 2\right\} \to 0$$

  almost surely as $n, p \to \infty$ (for $n \sim p$ and even $n = p^m$).
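A sketch of this distance concentration, taking $E = 0$ and $\|\mu\| = 2$ for simplicity (both are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n = 500
for p in (5, 250, 5000):
    mu = np.zeros(p); mu[0] = 2.0                        # ||mu|| = O(1)
    y = np.sign(rng.standard_normal(n))                  # class of each point: +-1
    X = y[:, None] * mu + rng.standard_normal((n, p))    # x = +-mu + z (E = 0 here)
    d2 = pdist(X, 'sqeuclidean') / p                     # all pairwise (1/p)||x_i - x_j||^2
    print(p, np.abs(d2 - 2).max())                       # max deviation from 2 shrinks with p
```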


Visualization of kernel matrices for large dimensional data

Objective: "cluster" data $x_1, \ldots, x_n \in \mathbb{R}^p$ into $\mathcal{C}_1$ or $\mathcal{C}_2$. Consider the kernel matrix $K_{ij} = \exp\left(-\frac{1}{2p}\|x_i - x_j\|^2\right)$ and its second top eigenvector $v_2$ for small (left) and large (right) dimensional data.

Figure: Kernel matrix $K$ and second top eigenvector $v_2$ for (a) $p = 5$, $n = 500$ and (b) $p = 250$, $n = 500$.


A spectral viewpoint of large kernel matrices

Accumulated effect of the small "hidden" statistical information (in $\mu$, $E$):

$$K = \exp\left(-\frac{2}{2}\right)\left(1_n 1_n^T + \frac{1}{p} Z^T Z\right) + g(\mu, E)\,\frac{1}{p} j j^T + * + o_{\|\cdot\|}(1)$$

with $Z = [z_1, \ldots, z_n] \in \mathbb{R}^{p \times n}$ and $j = [1_{n/2}; -1_{n/2}]$, the class-information vector.

Therefore:

- entry-wise: for $K_{ij} = \exp\left(-\frac{1}{2}\frac{1}{p}\|x_i - x_j\|^2\right)$,

  $$K_{ij} = \exp(-1)\Big(1 + \underbrace{\frac{1}{p} z_i^T z_j}_{O(p^{-1/2})}\Big) \pm \underbrace{\frac{1}{p} g(\mu, E)}_{O(p^{-1})} + *$$

  so that $\frac{1}{p} g(\mu, E) \ll \frac{1}{p} z_i^T z_j$;

- spectrum-wise: $\left\|\frac{1}{p} Z^T Z\right\| = O(1)$ and $\left\|g(\mu, E)\frac{1}{p} j j^T\right\| = O(1)$ as well!

⇒ With RMT, we understand kernel spectral clustering for large dimensional data! (A numerical sketch follows below.)
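The announced alignment of $v_2$ with the class-information vector $j$ can be checked in a few lines; a sketch with an assumed mean separation $\|\mu\| = 2.5$ and $E = 0$ (illustrative choices only):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
p, n = 250, 500
mu = np.zeros(p); mu[0] = 2.5                            # ||mu|| = O(1)
j = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])  # class-information vector
X = j[:, None] * mu + rng.standard_normal((n, p))        # rows x_i = +-mu + z_i

K = np.exp(-squareform(pdist(X, 'sqeuclidean')) / (2 * p))
vals, vecs = np.linalg.eigh(K)
v2 = vecs[:, -2]                                         # second top eigenvector
print(np.abs(v2 @ j) / np.sqrt(n))                       # |v2^T j| / ||j||: well above the
                                                         # O(n^{-1/2}) level of a random direction
```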


Reminder: random feature maps

Figure: Illustration of random feature maps: $X \in \mathbb{R}^{p \times n}$ is passed through random $W \in \mathbb{R}^{N \times p}$ and the entry-wise nonlinearity $\sigma$ to give the random features $\Sigma \equiv \sigma(WX) \in \mathbb{R}^{N \times n}$.

- Key object: $\frac{1}{N}\Sigma^T\Sigma$, the correlation in the random feature space.
- Setting: $W_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$ and $n, p, N$ large.
- $\frac{1}{N}\Sigma^T\Sigma = \frac{1}{N}\sum_{i=1}^N \sigma(X^T w_i)\,\sigma(w_i^T X)$ for independent $w_i \sim \mathcal{N}(0, I_p)$.
- Performance guarantee: if $N \to \infty$ alone, it converges to the expected kernel matrix $K(X) \equiv \mathbb{E}_{w \sim \mathcal{N}(0, I_p)}[\sigma(X^T w)\sigma(w^T X)] \in \mathbb{R}^{n \times n}$;
- of practical (computational and storage) interest only for $N < p$.


Random feature maps for large dimensional data

For $n, p, N \to \infty$ with $n \sim p \sim N$, the matrix $\frac{1}{N}\Sigma^T\Sigma$ is (again) closely related to $K \equiv \mathbb{E}_w[\sigma(X^T w)\sigma(w^T X)]$.

Eigenspectrum of $\frac{1}{N}\Sigma^T\Sigma$ [Louart, Liao, Couillet'18]: for every Lipschitz function $\sigma$, the spectrum of $\frac{1}{N}\Sigma^T\Sigma$ is asymptotically determined by $\bar{Q}$ via the fixed-point equation

$$Q(z) \equiv \left(\frac{1}{N}\Sigma^T\Sigma - z I_n\right)^{-1} \leftrightarrow \bar{Q}(z) = \left(\frac{K}{1 + \delta(z)} - z I_n\right)^{-1}, \quad \delta(z) = \frac{1}{N}\operatorname{tr} K\bar{Q}(z),$$

for $z \in \mathbb{C}$ not an eigenvalue of $\frac{1}{N}\Sigma^T\Sigma$.

- for $X = I_p$ and $\sigma(t) = t$ ⇒ the Marcenko-Pastur law;
- gives access to the asymptotic performance of, e.g., random feature-based ridge regression (a numerical sketch of the fixed point follows below).
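A numerical sketch of this deterministic equivalent at a real point $z = -\gamma < 0$, assuming ReLU $\sigma$ and pure-noise data (both are illustrative assumptions; the closed form of $K$ for ReLU is the one tabulated later in this deck):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, N = 256, 512, 1024
X = rng.standard_normal((p, n)) / np.sqrt(p)       # pure-noise data, ||x_i|| ~ 1

# expected kernel K for sigma = ReLU (closed form from the table further below)
G = X.T @ X
nrm = np.sqrt(np.diag(G))
ang = np.clip(G / np.outer(nrm, nrm), -1.0, 1.0)
K = np.outer(nrm, nrm) * (ang * np.arccos(-ang) + np.sqrt(1 - ang**2)) / (2 * np.pi)

# empirical resolvent Q(z) at z = -gamma < 0
gamma = 0.1
Sigma = np.maximum(rng.standard_normal((N, p)) @ X, 0)
Q = np.linalg.inv(Sigma.T @ Sigma / N + gamma * np.eye(n))

# deterministic equivalent: iterate delta = (1/N) tr K Qbar(-gamma)
delta = 0.0
for _ in range(50):
    Qbar = np.linalg.inv(K / (1 + delta) + gamma * np.eye(n))
    delta = np.trace(K @ Qbar) / N

print(np.trace(Q) / n, np.trace(Qbar) / n)         # normalized traces nearly coincide
```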

Roadmap

$$X \to \Sigma(X) \equiv \sigma(WX),\quad \frac{1}{N}\Sigma^T\Sigma \xrightarrow[N \to \infty]{W \sim \mathcal{N}} K(X) = \mathbb{E}_w[\sigma(X^T w)\sigma(w^T X)].$$


Application: large random feature-based ridge regression

Figure: Illustration of random feature-based ridge regression: $X \in \mathbb{R}^{p \times n}$ is passed through random $W \in \mathbb{R}^{N \times p}$ and $\sigma$ to give the random features $\Sigma \equiv \sigma(WX) \in \mathbb{R}^{N \times n}$, followed by the output layer $\beta^T\Sigma$ with $\beta \in \mathbb{R}^{N \times d}$.

- for a training set $(X, Y) \in \mathbb{R}^{p \times n} \times \mathbb{R}^{d \times n}$: $\beta = \frac{1}{n}\Sigma\left(\frac{1}{n}\Sigma^T\Sigma + \gamma I_n\right)^{-1} Y^T$, with regularization factor $\gamma > 0$;
- training mean squared error (MSE): $E_{\text{train}} = \frac{1}{n}\left\|Y - \beta^T\Sigma\right\|_F^2$;
- test error: $E_{\text{test}} = \frac{1}{\hat{n}}\left\|\hat{Y} - \beta^T\sigma(W\hat{X})\right\|_F^2$ on a test set $(\hat{X}, \hat{Y})$ of size $\hat{n}$;
- can be seen as a single-hidden-layer neural network model with random weights (a minimal implementation follows below).


Large random feature-based ridge regression: performance mismatch

- if $N \to \infty$ alone ($N \gg p$), $\frac{1}{N}\Sigma^T\Sigma \to K$;
- this no longer holds for large dimensional data ($p \sim N$) [Louart, Liao, Couillet'18];
- ⇒ mismatch in performance prediction on MNIST data!

Figure: Example of MNIST images.

Figure: Training MSE $E_{\text{train}}$ on MNIST data with ReLU activation $\sigma(t) = \max(t, 0)$, as a function of the hyperparameter $\gamma$, for $N = 512$, $1\,024$, $2\,048$; $n = \hat{n} = 1024$, $p = 784$: the RMT prediction matches the simulation, while the kernel prediction does not.


Asymptotic performance of random feature-based ridge regression

Figure: Example of MNIST images.

Figure: Training and test MSE ($E_{\text{train}}$, $E_{\text{test}}$; theory versus simulation) on MNIST data, as a function of the hyperparameter $\gamma$, for $\sigma(t) = \max(t, 0)$, $\sigma(t) = \operatorname{erf}(t)$ and $\sigma(t) = t$; $N = 512$, $n = \hat{n} = 1024$, $p = 784$.

⇒ Theoretical understanding and fast tuning of the hyperparameter $\gamma$!


From random feature maps to kernel matrices

Figure: Illustration of random feature maps ($X \in \mathbb{R}^{p \times n}$, random $W \in \mathbb{R}^{N \times p}$, $\Sigma \equiv \sigma(WX) \in \mathbb{R}^{N \times n}$).

- for $W_{ij} \sim \mathcal{N}(0, 1)$ and $n, p, N$ large, $\frac{1}{N}\Sigma^T\Sigma$ is closely related to the kernel matrix $K(X) \equiv \mathbb{E}_{w \sim \mathcal{N}(0, I_p)}[\sigma(X^T w)\sigma(w^T X)]$;
- explicit $K$ for commonly used $\sigma(\cdot)$, e.g., $\operatorname{ReLU}(t) \equiv \max(t, 0)$, sigmoid, quadratic, and exponential $\sigma(t) = \exp(-t^2/2)$:

$$K_{ij} = \mathbb{E}_w[\sigma(w^T x_i)\sigma(w^T x_j)] = (2\pi)^{-\frac{p}{2}}\int_{\mathbb{R}^p}\sigma(w^T x_i)\,\sigma(w^T x_j)\, e^{-\frac{\|w\|^2}{2}}\, dw \equiv f(x_i, x_j).$$


Nonlinearity in simple random neural networks

Table: $K_{ij}$ for commonly used $\sigma(\cdot)$, with $\angle \equiv \frac{x_i^T x_j}{\|x_i\|\|x_j\|}$.

- $\sigma(t) = t$: $K_{ij} = x_i^T x_j$
- $\sigma(t) = \max(t, 0)$: $K_{ij} = \frac{1}{2\pi}\|x_i\|\|x_j\|\left(\angle\arccos(-\angle) + \sqrt{1 - \angle^2}\right)$
- $\sigma(t) = |t|$: $K_{ij} = \frac{2}{\pi}\|x_i\|\|x_j\|\left(\angle\arcsin(\angle) + \sqrt{1 - \angle^2}\right)$
- $\sigma(t) = \operatorname{sign}(t)$: $K_{ij} = \frac{2}{\pi}\arcsin(\angle)$
- $\sigma(t) = \varsigma_2 t^2 + \varsigma_1 t + \varsigma_0$: $K_{ij} = \varsigma_2^2\left(2(x_i^T x_j)^2 + \|x_i\|^2\|x_j\|^2\right) + \varsigma_1^2\, x_i^T x_j + \varsigma_2\varsigma_0\left(\|x_i\|^2 + \|x_j\|^2\right) + \varsigma_0^2$
- $\sigma(t) = \cos(t)$: $K_{ij} = \exp\left(-\frac{1}{2}\left(\|x_i\|^2 + \|x_j\|^2\right)\right)\cosh(x_i^T x_j)$
- $\sigma(t) = \sin(t)$: $K_{ij} = \exp\left(-\frac{1}{2}\left(\|x_i\|^2 + \|x_j\|^2\right)\right)\sinh(x_i^T x_j)$
- $\sigma(t) = \operatorname{erf}(t)$: $K_{ij} = \frac{2}{\pi}\arcsin\left(\frac{2 x_i^T x_j}{\sqrt{(1 + 2\|x_i\|^2)(1 + 2\|x_j\|^2)}}\right)$
- $\sigma(t) = \exp(-t^2/2)$: $K_{ij} = \frac{1}{\sqrt{(1 + \|x_i\|^2)(1 + \|x_j\|^2) - (x_i^T x_j)^2}}$

⇒ (still) highly nonlinear functions of the data $x$!
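Any row of the table can be checked by Monte Carlo; a sketch for the $\max(t, 0)$ row (dimension and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 20
xi, xj = rng.standard_normal(p), rng.standard_normal(p)

# Monte Carlo estimate of K_ij = E_w[max(w^T x_i, 0) max(w^T x_j, 0)]
W = rng.standard_normal((500_000, p))
mc = np.mean(np.maximum(W @ xi, 0) * np.maximum(W @ xj, 0))

# closed form from the table (row sigma(t) = max(t, 0))
ni, nj = np.linalg.norm(xi), np.linalg.norm(xj)
ang = xi @ xj / (ni * nj)
cf = ni * nj * (ang * np.arccos(-ang) + np.sqrt(1 - ang**2)) / (2 * np.pi)
print(mc, cf)    # agree to roughly two or three digits
```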

Roadmap

$$X \to \Sigma(X) \equiv \sigma(WX),\quad \frac{1}{N}\Sigma^T\Sigma \xrightarrow[N \to \infty]{W \sim \mathcal{N}} K(X) = \{f(x_i, x_j)\}_{i,j=1}^n :\quad \sigma \to f.$$


Dig Deeper into K

Objective: a simpler and better interpretation of $\sigma$ (thus $f$) in $\frac{1}{N}\Sigma^T\Sigma$ (and $K$).

Data: $K$-class Gaussian mixture model (GMM)

$$x_i \in \mathcal{C}_a \Leftrightarrow \sqrt{p}\, x_i \sim \mathcal{N}(\mu_a, C_a), \quad x_i = \mu_a/\sqrt{p} + z_i$$

with $z_i \sim \mathcal{N}(0, C_a/p)$, $a = 1, \ldots, K$, of statistical mean $\mu_a$ and covariance $C_a$.

Non-trivial classification (again):

$$\|\mu_a - \mu_b\| = O(1), \quad \|C_a\| = O(1), \quad |\operatorname{tr}(C_a - C_b)| = O(\sqrt{p}), \quad \|C_a - C_b\|_F^2 = O(p).$$

$$\|x_i\|^2 = \underbrace{\|z_i\|^2}_{O(1)} + \underbrace{\frac{1}{p}\|\mu_a\|^2 + \frac{2}{\sqrt{p}}\mu_a^T z_i}_{O(p^{-1})} = \underbrace{\frac{1}{p}\operatorname{tr} C_a}_{O(1)} + \underbrace{\|z_i\|^2 - \frac{1}{p}\operatorname{tr} C_a}_{O(p^{-1/2})} + \underbrace{\frac{1}{p}\|\mu_a\|^2 + \frac{2}{\sqrt{p}}\mu_a^T z_i}_{O(p^{-1})}$$

Then for $C^\circ = \sum_{a=1}^K \frac{n_a}{n} C_a$ and $C_a = C_a^\circ + C^\circ$, $a = 1, \ldots, K$:

⇒ $\|x_i\|^2 = \tau + O(p^{-1/2})$ with $\tau \equiv \frac{1}{p}\operatorname{tr} C^\circ$, and $\|x_i - x_j\|^2 \approx 2\tau$ again! (A numerical sketch follows below.)
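A sketch of this norm concentration for a single class, with an assumed diagonal $C_a$ (illustrative choices only, so that here $C^\circ = C_a$ and $\tau = 1.5$):

```python
import numpy as np

rng = np.random.default_rng(0)
for p in (100, 1600, 25600):
    c = 1 + np.linspace(0, 1, p)                  # spectrum of a diagonal C_a, ||C_a|| = O(1)
    tau = c.mean()                                # tau = (1/p) tr C_a = 1.5
    mu = np.zeros(p); mu[0] = 2.0                 # ||mu_a|| = O(1)
    Z = rng.standard_normal((200, p)) * np.sqrt(c / p)   # rows z_i ~ N(0, C_a / p)
    X = mu / np.sqrt(p) + Z                       # rows x_i = mu_a / sqrt(p) + z_i
    print(p, np.abs((X**2).sum(axis=1) - tau).max())     # shrinks like p^(-1/2)
```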


Understand random feature nonlinearity in classifying GMM

Asymptotic behavior of $K$ [Liao, Couillet'18]: for all $\sigma$ (and $f$) listed, as $n \sim p \to \infty$,

$$\|K - \bar{K}\| \to 0, \quad \bar{K} = d_1(\sigma)\left(Z + M\frac{J^T}{\sqrt{p}}\right)^T\left(Z + M\frac{J^T}{\sqrt{p}}\right) + d_2(\sigma)\, U B U^T + d_0 I_n$$

almost surely, with $U \equiv \left[\frac{J}{\sqrt{p}},\ \varphi\right]$ and $B \equiv \begin{bmatrix} t t^T + 2S & t \\ t^T & 1 \end{bmatrix}$, where:

- data structure: $J \equiv [j_1, \ldots, j_K]$, with $j_a$ the canonical vector of class $\mathcal{C}_a$;
- randomness of the data: $Z$ and $\varphi = \{\|z_i\|^2 - \mathbb{E}[\|z_i\|^2]\}_{i=1}^n$;
- statistical information: $M \equiv [\mu_1, \ldots, \mu_K]$, $t \equiv \{\operatorname{tr} C_a^\circ/\sqrt{p}\}_{a=1}^K$, $S \equiv \{\operatorname{tr}(C_a C_b)/p\}_{a,b=1}^K$.

In short [Liao, Couillet'18]:

$$\|K - \bar{K}\| \to 0, \quad \bar{K} = d_1(\sigma)\, A_1(\mu_a - \mu_b, Z) + d_2(\sigma)\, A_2(C_a - C_b, \varphi) + *$$

Roadmap

$$\Sigma = \sigma(WX),\quad \frac{1}{N}\Sigma^T\Sigma \xrightarrow[N \to \infty]{W \sim \mathcal{N}} K(X) = \{f(x_i, x_j)\} \xrightarrow[n, p \to \infty]{X \sim \text{GMM}} \bar{K}(d_1, d_2) :\quad \sigma \to f \to (d_1, d_2).$$


Consequence

$$\bar{K} = d_1(\sigma)\, A_1(\mu_a - \mu_b, Z) + d_2(\sigma)\, A_2(C_a - C_b, \varphi) + *$$

Table: Coefficients $(d_1, d_2)$ in $\bar{K}$ for different $\sigma(\cdot)$.

- $\sigma(t) = t$: $d_1 = 1$, $d_2 = 0$
- $\sigma(t) = \sin(t)$: $d_1 = e^{-\tau}$, $d_2 = 0$
- $\sigma(t) = \operatorname{erf}(t)$: $d_1 = \frac{4}{\pi}\frac{1}{2\tau + 1}$, $d_2 = 0$
- $\sigma(t) = \operatorname{sign}(t)$: $d_1 = \frac{2}{\pi\tau}$, $d_2 = 0$
- $\sigma(t) = |t|$: $d_1 = 0$, $d_2 = \frac{1}{2\pi\tau}$
- $\sigma(t) = \cos(t)$: $d_1 = 0$, $d_2 = \frac{e^{-\tau}}{4}$
- $\sigma(t) = \exp(-t^2/2)$: $d_1 = 0$, $d_2 = \frac{1}{4(\tau + 1)^3}$
- $\sigma(t) = \varsigma_2 t^2 + \varsigma_1 t + \varsigma_0$: $d_1 = \varsigma_1^2$, $d_2 = \varsigma_2^2$
- $\sigma(t) = \max(t, 0)$: $d_1 = \frac{1}{4}$, $d_2 = \frac{1}{8\pi\tau}$

A natural classification of $\sigma(\cdot)$:

- mean-oriented ($d_1 \neq 0$, $d_2 = 0$): $t$, $1_{t>0}$, $\operatorname{sign}(t)$, $\sin(t)$ and $\operatorname{erf}(t)$ ⇒ separate with the difference in means $M$;
- cov-oriented ($d_1 = 0$, $d_2 \neq 0$): $|t|$, $\cos(t)$ and $\exp(-t^2/2)$ ⇒ exploit the differences in covariances via $t$, $S$;
- "balanced" (both $d_1, d_2 \neq 0$): ReLU $\max(t, 0)$ and the quadratic ⇒ make use of both statistics!


Random-feature based spectral clustering: Gaussian data

Setting: spectral clustering using $\frac{1}{n}\Sigma^T\Sigma$ on Gaussian mixture data of four classes, $\mathcal{C}_1: \mathcal{N}(\mu_1, C_1)$, $\mathcal{C}_2: \mathcal{N}(\mu_1, C_2)$, $\mathcal{C}_3: \mathcal{N}(\mu_2, C_1)$ and $\mathcal{C}_4: \mathcal{N}(\mu_2, C_2)$, with different $\sigma(\cdot)$.

Mean-oriented: linear map $\sigma(t) = t$.

Figure: Top two eigenvectors of $\frac{1}{n}\Sigma^T\Sigma$ on $\mathcal{C}_1, \ldots, \mathcal{C}_4$ for $\sigma(t) = t$: only the difference in means ($\mu_1$ versus $\mu_2$) is discriminated.

Cov-oriented: $\sigma(t) = |t|$.

Figure: Top two eigenvectors for $\sigma(t) = |t|$: only the difference in covariances ($C_1$ versus $C_2$) is discriminated.


Random-feature based spectral clustering: Gaussian data

"Balanced": the ReLU function $\sigma(t) = \max(t, 0)$.

Figure: Top two eigenvectors of $\frac{1}{n}\Sigma^T\Sigma$ for $\sigma(t) = \max(t, 0)$, and the two-dimensional plot of (eigenvector 1, eigenvector 2): both means and covariances are discriminated, so all four classes separate. (A numerical sketch follows below.)
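A sketch reproducing this contrast on synthetic four-class data (all parameter choices are illustrative): measuring how well the top eigenvectors of the Gram matrix align with the mean-label and covariance-label vectors, the alignment should be high for $\sigma(t) = t$ only on the means, for $|t|$ only on the covariances, and for ReLU on both.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, N = 400, 800, 1600
mu = np.zeros(p); mu[0] = 3.0                       # mu_1 = -mu_2 = mu
s2 = np.ones(p); s2[: p // 2] += 6 / np.sqrt(p)     # C_2 diagonal: tr(C_2 - C_1) = O(sqrt p)

X, j_mean, j_cov = [], [], []
for a in range(4):                                  # (mu1,C1), (mu1,C2), (mu2,C1), (mu2,C2)
    m = mu if a < 2 else -mu
    s = np.ones(p) if a % 2 == 0 else s2
    X.append((m + rng.standard_normal((n // 4, p)) * np.sqrt(s)) / np.sqrt(p))
    j_mean += [1.0 if a < 2 else -1.0] * (n // 4)
    j_cov += [1.0 if a % 2 == 0 else -1.0] * (n // 4)
X = np.vstack(X).T                                  # p x n, so that sqrt(p) x_i ~ N(mu_a, C_a)
j_mean, j_cov = np.array(j_mean), np.array(j_cov)

W = rng.standard_normal((N, p))
for name, sig in (('t', lambda t: t), ('|t|', np.abs), ('ReLU', lambda t: np.maximum(t, 0))):
    S = sig(W @ X)
    _, V = np.linalg.eigh(S.T @ S / n)
    top = V[:, -4:]                                 # four top eigenvectors
    for jv, lab in ((j_mean, 'means'), (j_cov, 'covs')):
        print(name, lab, np.abs(top.T @ jv).max() / np.sqrt(n))
```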


Random-feature based spectral clustering: real datasets

Figure: The MNIST image database.

Figure: The epileptic EEG (time series) datasets.¹

¹ http://www.meb.unibonn.de/epileptologie/science/physik/eegdata.html


Random-feature based spectral clustering: real datasets

Table: Empirical estimates of the statistical information of the MNIST and EEG datasets.

- MNIST data: $\|\mu_1 - \mu_2\|^2 = 391.1$, $\|C_1 - C_2\| = 83.8$
- EEG data: $\|\mu_1 - \mu_2\|^2 = 2.4$, $\|C_1 - C_2\| = 14.5$

Table: Clustering accuracies on MNIST ($n = 64$ / $n = 128$).

- mean-oriented: $t$: 88.94% / 87.30%; $1_{t>0}$: 82.94% / 85.56%; $\operatorname{sign}(t)$: 83.34% / 85.22%; $\sin(t)$: 87.81% / 87.50%
- cov-oriented: $|t|$: 60.41% / 57.81%; $\cos(t)$: 59.56% / 57.72%; $\exp(-t^2/2)$: 60.44% / 58.67%
- balanced: $\operatorname{ReLU}(t)$: 85.72% / 82.27%

Table: Clustering accuracies on EEG ($n = 64$ / $n = 128$).

- mean-oriented: $t$: 70.31% / 69.58%; $1_{t>0}$: 65.87% / 63.47%; $\operatorname{sign}(t)$: 64.63% / 63.03%; $\sin(t)$: 70.34% / 68.22%
- cov-oriented: $|t|$: 99.69% / 99.50%; $\cos(t)$: 99.38% / 99.36%; $\exp(-t^2/2)$: 99.81% / 99.77%
- balanced: $\operatorname{ReLU}(t)$: 87.91% / 90.97%

As the $(d_1, d_2)$ classification predicts: MNIST classes differ mostly in their means, so the mean-oriented $\sigma$ perform best; EEG classes differ mostly in their covariances, so the cov-oriented $\sigma$ reach near-perfect accuracy; the balanced ReLU is competitive on both.


Conclusion and limitations

Conclusion on large dimensional random feature maps:

Roadmap

$$\underbrace{\frac{1}{N}\Sigma^T\Sigma}_{\sigma} \xrightarrow[N \to \infty]{W \sim \mathcal{N}} \underbrace{K(X)}_{f} \xrightarrow[n, p \to \infty]{X \sim \text{GMM}} \underbrace{\bar{K}}_{(d_1, d_2)}, \quad \text{and directly } W \sim \mathcal{N},\ n \sim p \sim N,$$

with applications to RF-based ridge regression and RF-based spectral clustering.

Limitations:

? optimization-based problems with implicit solutions

? limited to Gaussian data


A random matrix framework for optimization-based learning problems

Problem of empirical risk minimization: for $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^p$, $y_i \in \{-1, +1\}$, find a classifier $\beta$ such that

$$\min_{\beta \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \ell(y_i \beta^T x_i)$$

for some nonnegative convex loss $\ell$, a convex surrogate of the 0-1 loss:

- logistic regression: $\ell(t) = \log(1 + e^{-t})$;
- least squares: $\ell(t) = (t - 1)^2$;
- boosting algorithm: $\ell(t) = e^{-t}$;
- SVM: $\ell(t) = \max(1 - t, 0)$.

There is no closed-form solution, but RMT provides the tools to assess the performance [Mai, Liao'19]; a minimal sketch follows below.
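A sketch of such an ERM classifier trained by plain gradient descent, here with the logistic loss on a synthetic two-class mixture (step size and data sizes are illustrative choices):

```python
import numpy as np

def erm_gd(X, y, loss_grad, lr=0.5, steps=1000):
    """Gradient descent on (1/n) sum_i loss(y_i beta^T x_i); X: n x p, y in {-1,+1}."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(steps):
        t = y * (X @ beta)                       # margins y_i beta^T x_i
        beta -= lr * X.T @ (y * loss_grad(t)) / n
    return beta

logistic_grad = lambda t: -1 / (1 + np.exp(t))   # derivative of log(1 + e^{-t})

rng = np.random.default_rng(0)
p, n = 50, 400
mu = np.ones(p) / np.sqrt(p)                     # class means +-mu, ||mu|| = 1
y = np.sign(rng.standard_normal(n))
X = y[:, None] * mu + rng.standard_normal((n, p))  # two-class Gaussian mixture
beta = erm_gd(X, y, logistic_grad)
print((np.sign(X @ beta) == y).mean())           # training accuracy of the learned beta
```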

Limitations:

✓ optimization-based problems with implicit solutions: yes, if convex!

? limited to Gaussian data


From theory to practice: concentrated random vectors

RMT often assumes that the data $x$ are affine maps $Az + b$ of some $z \in \mathbb{R}^p$ with i.i.d. entries.

Concentrated random vectors: for a certain family of functions $f: \mathbb{R}^p \to \mathbb{R}$, there exists a deterministic $m_f \in \mathbb{R}$ such that

$$\mathbb{P}\left(|f(x) - m_f| > \varepsilon\right) \leq e^{-g(\varepsilon)}, \quad \text{for some strictly increasing function } g.$$

Figure: A concentrated random vector: the distribution of $x$ lives on the sphere $\sqrt{p}\,\mathcal{S}^{p-1} \subset \mathbb{R}^p$ of radius $O(\sqrt{p})$, while the observations $f_1(x), f_2(x) \in \mathbb{R}$ fluctuate at $O(1)$.

⇒ The theory remains valid for concentrated random vectors, and for almost real images [Seddik, Tamaazousti, Couillet'19]!


From concentrated random vectors to GANs

Figure: Illustration of a generative adversarial network (GAN): the generator maps $\mathcal{N}(0, I_p)$ noise to generated examples, which are concentrated random vectors; the discriminator tells real from fake examples.

Figure: Image samples generated by BigGAN [Brock et al.'18].

Limitations:

✓ optimization-based problems with implicit solutions: yes, if convex!

✓ limited to Gaussian data: extended to concentrated random vectors and almost real images!


Some clues, and much more can be done!

RMT as a tool to analyze, understand and improve large dimensional machine learning methods:

- a powerful and flexible tool to assess matrix-based machine learning systems;
- study (convex) optimization-based learning methods, e.g., logistic regression;
- understand the impact of optimization methods, e.g., the dynamics of gradient descent;
- non-convex problems (e.g., deep neural nets) are more difficult, but accessible in some cases, e.g., low rank matrix recovery, phase retrieval, etc.;
- even more to be done: transfer learning, active learning, generative models, graph-based methods, robust statistics, etc.


Contributions during Ph.D.

Publications:

J1. C. Louart, Z. Liao, and R. Couillet, "A Random Matrix Approach to Neural Networks", The Annals of Applied Probability, 28(2):1190-1248, 2018.

J2. Z. Liao and R. Couillet, "A Large Dimensional Analysis of Least Squares Support Vector Machines", IEEE Transactions on Signal Processing, 67(4):1065-1074, 2019.

J3. X. Mai and Z. Liao, "High Dimensional Classification via Empirical Risk Minimization: Improvements and Optimality", submitted to IEEE Transactions on Signal Processing, 2019.

J4. Y. Chitour, Z. Liao, and R. Couillet, "A Geometric Approach of Gradient Descent Algorithms in Neural Networks", submitted to Journal of Differential Equations, 2019.

C1. Z. Liao and R. Couillet, "Random Matrices Meet Machine Learning: a Large Dimensional Analysis of LS-SVM", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17), New Orleans, USA, 2017.

C2. Z. Liao and R. Couillet, "On the Spectrum of Random Features Maps of High Dimensional Data", International Conference on Machine Learning (ICML'18), Stockholm, Sweden, 2018.

C3. Z. Liao and R. Couillet, "The Dynamics of Learning: A Random Matrix Approach", International Conference on Machine Learning (ICML'18), Stockholm, Sweden, 2018.

C4. X. Mai, Z. Liao, and R. Couillet, "A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'19), Brighton, UK, 2019.

C5. Z. Liao and R. Couillet, "On Inner-Product Kernels of High Dimensional Data", IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP'19), Guadeloupe, France, 2019.


Contributions during Ph.D.

Invited talks and tutorials:

- Invited talks at: the DIMACS center, Rutgers University, USA; the Matrix series conference, Krakow, Poland; the iCODE institute, Paris-Saclay, France; Shanghai Jiao Tong University, China; HUAWEI.
- Tutorial on "Random Matrix Advances in Machine Learning and Neural Nets" (with R. Couillet and X. Mai), the 26th European Signal Processing Conference (EUSIPCO'18), Roma, Italy, 2018.

Reviewing activities: ICML, NeurIPS, AAAI, IEEE-TSP.


Thank you!


For more information, visit https://zhenyu-liao.github.io!
