Cross Validation and WAIC in Layered Neural Networks. Sumio Watanabe, Tokyo Institute of Technology. Deep learning: Theory, Algorithms, and Applications, 2018 March 19th-22nd, Tokyo, Riken AIP.


Apr 25, 2018

Page 1

Cross Validation and WAIC in Layered Neural Networks

Sumio Watanabe Tokyo Institute of Technology

Deep learning : Theory, Algorithms, and Applications

2018 March 19th-22nd, Tokyo, Riken AIP.

Page 2

CONTENTS

1 Posterior of NN is highly singular

2 Bayesian Learning

3 Learning Curve is Given by Birational Invariants

4 Generalization Loss can be Estimated by CV and WAIC.

Page 3

1 Posterior of NN is highly singular

Let’s see the true posterior.

Page 4

Layered Neural Network is Nonidentifiable

Input x

Parameter w

Output f(x,w)

w → f( · ,w) is not injective.

{ ∂wj f(x,w) } is linearly dependent.

Mathematical method was not established.
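The two properties on this slide can be checked numerically on a very small example. The sketch below assumes the one-hidden-unit model f(x,w) = a tanh(bx) with w = (a, b), which the talk uses on the following slides; the input grid and the parameter values are arbitrary choices made for illustration.

```python
import numpy as np

def f(x, a, b):
    """One-hidden-unit network f(x, w) = a * tanh(b * x), with w = (a, b)."""
    return a * np.tanh(b * x)

x = np.linspace(-3.0, 3.0, 7)  # arbitrary inputs, chosen only for illustration

# (1) w -> f(., w) is not injective: different parameters give the same function.
same_when_a_zero = np.allclose(f(x, 0.0, 1.0), f(x, 0.0, -5.0))        # a = 0, any b
same_under_sign_flip = np.allclose(f(x, 0.7, 1.3), f(x, -0.7, -1.3))   # tanh is odd

# (2) The derivatives {df/da, df/db} become linearly dependent at a = 0,
#     because df/db = a * x * (1 - tanh(b x)^2) vanishes identically there.
def jacobian(x, a, b):
    t = np.tanh(b * x)
    return np.stack([t, a * x * (1.0 - t ** 2)], axis=1)  # shape (len(x), 2)

rank_generic = np.linalg.matrix_rank(jacobian(x, 0.7, 1.3))
rank_singular = np.linalg.matrix_rank(jacobian(x, 0.0, 1.3))
print(same_when_a_zero, same_under_sign_flip, rank_generic, rank_singular)
```

The rank drop at a = 0 is the linear dependence named on the slide: along that line the parameter b has no effect on the output.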

Page 5

Posterior of $(y - a\tanh(bx))^2$ for n=100

[Figure: the posterior distribution, with the true parameter marked "True".]

Even if the true distribution is regular, the posterior is singular.

Page 6

Posterior of $(y - a\tanh(bx))^2$ for n=10000

[Figure: the posterior distribution, with the true parameter marked "True".]

Even for n=10000, the posterior is singular.
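A posterior of this kind can be reproduced approximately on a grid. The following is a minimal sketch, not the setup behind the slides: the input distribution, the noise level, the true parameter (a0, b0), the uniform prior on the grid box, and the grid itself are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup (not from the slides): x ~ N(0, 1), noise std 1,
# true parameter (a0, b0) = (0.5, 1.0), uniform prior on the grid box.
n, a0, b0, sigma = 100, 0.5, 1.0, 1.0
x = rng.normal(size=n)
y = a0 * np.tanh(b0 * x) + sigma * rng.normal(size=n)

a_grid = np.linspace(-2.0, 2.0, 81)
b_grid = np.linspace(-2.0, 2.0, 81)
A, B = np.meshgrid(a_grid, b_grid, indexing="ij")

# H(a, b) = sum_i (y_i - a tanh(b x_i))^2 / (2 sigma^2), up to a constant.
pred = A[..., None] * np.tanh(B[..., None] * x)          # shape (81, 81, n)
H = ((y - pred) ** 2).sum(axis=-1) / (2.0 * sigma ** 2)

# Normalize exp(-H) on the grid; subtract the minimum for numerical stability.
post = np.exp(-(H - H.min()))
post /= post.sum()
```

Plotting `post` over (a, b), for example with a contour plot, shows mass spread along curved ridges rather than a single ellipse, and the symmetry (a, b) → (-a, -b) is visible directly.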

Page 7

2 Bayesian Learning

For singular learning machines, Bayes makes the generalization loss smaller.

Page 8

Bayesian learning

(1) {Xi,Yi ; i=1,2,…n} ~ q(x)q(y|x)

(2) Learning machine p(y|x,w)

(3) Prior ϕ(w)

In a regression case, $p(y|x,w) \propto \exp(-C\,(y - f(x,w))^2)$

Minus log likelihood: $H(w) = -\sum_{i=1}^{n} \log p(Y_i|X_i,w)$

Page 9

Posterior and Predictive

Posterior average:

$$E_w[\,\cdot\,] = \frac{\int (\,\cdot\,)\, \exp(-H(w))\, \phi(w)\, dw}{\int \exp(-H(w))\, \phi(w)\, dw}$$

Predictive: p*(y|x) = Ew[ p(y|x,w) ]

The predictive p*(y|x) estimates the true q(y|x).
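In practice the posterior average is usually replaced by an average over posterior samples. The sketch below is one possible way to do that, assuming a random-walk Metropolis sampler, the toy model f(x,w) = a tanh(bx), C = 1/2, and a standard normal prior; none of these choices come from the slides, and `metropolis` and `predictive` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup (not from the slides): f(x, w) = a tanh(b x), C = 1/2,
# standard normal prior on w = (a, b), data generated from (a, b) = (0.5, 1.0).
C = 0.5
x = rng.normal(size=50)
y = 0.5 * np.tanh(1.0 * x) + rng.normal(size=50)

def log_post(w):
    """log of exp(-H(w)) * prior(w), up to an additive constant."""
    a, b = w
    H = C * np.sum((y - a * np.tanh(b * x)) ** 2)
    return -H - 0.5 * np.dot(w, w)

def metropolis(n_samples, step=0.3, burn_in=1000):
    """Random-walk Metropolis; returns an array of shape (n_samples, 2)."""
    w = np.zeros(2)
    lp = log_post(w)
    out = []
    for t in range(n_samples + burn_in):
        prop = w + step * rng.normal(size=2)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept or reject
            w, lp = prop, lp_prop
        if t >= burn_in:
            out.append(w.copy())
    return np.array(out)

def predictive(y_new, x_new, samples):
    """p*(y|x) = E_w[p(y|x,w)], approximated by an average over samples."""
    mean = samples[:, 0] * np.tanh(samples[:, 1] * x_new)
    dens = np.sqrt(C / np.pi) * np.exp(-C * (y_new - mean) ** 2)
    return dens.mean()

samples = metropolis(2000)
p_star = predictive(0.0, 1.0, samples)
```

The factor sqrt(C/π) is the normalizing constant of exp(-C(y - f)^2), so `predictive` returns a proper density value.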

Page 10

Training and Generalization Losses

Generalization loss: $G = -E_{(X,Y)}[\log p^*(Y|X)]$

Training loss: $T = -\frac{1}{n}\sum_{i=1}^{n} \log p^*(Y_i|X_i)$

If q(y|x) is realizable by p(y|x,w), then G and T converge to S, the entropy of the true distribution q(y|x).
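Both losses can be computed from the same ingredient, a matrix of pointwise log likelihoods evaluated at posterior samples. The sketch below assumes such a matrix `loglik[s, i] = log p(Y_i|X_i, w_s)` is already available; the function names are illustrative.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def predictive_loss(loglik):
    """Return -(1/n) sum_i log p*(Y_i|X_i), where p* is the sample average of p.

    loglik[s, i] = log p(Y_i | X_i, w_s) for posterior sample s and data point i.
    """
    return -np.mean(log_mean_exp(loglik, axis=0))

# Tiny example: two posterior samples, two data points.
loglik = np.log(np.array([[0.2, 0.5],
                          [0.4, 0.5]]))
loss = predictive_loss(loglik)
```

Evaluated on the training sample this gives T; evaluated on an independent test sample it gives a Monte Carlo estimate of G.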

Page 11

3 Learning Curve is Given by Birational Invariants

To study singular learning machines, algebraic geometry is necessary.

Page 12

Learning Curves are given by Algebraic Geometry

[Figure: learning curves plotted against n; the level S is marked.]

E[ T ]=S+(λ-2ν)/n+o(1/n)

E[ G ]=S+λ/n+o(1/n)

S = entropy of q(y|x).

Page 13

Birational Invariants

λ and ν are birational invariants.

λ is the real log canonical threshold.

ν is the singular fluctuation.

Cf. If { ∂wj f(x,w) } is linearly independent, then λ = ν = d/2, where d is the dimension of w.
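As a short worked step, the two learning curves from the previous slide can be subtracted, and the regular values substituted. This uses only the formulas already stated in the talk.

```latex
\begin{align*}
E[G] - E[T] &= \frac{\lambda}{n} - \frac{\lambda - 2\nu}{n} + o(1/n)
             = \frac{2\nu}{n} + o(1/n), \\
\text{regular case } (\lambda = \nu = d/2):\quad
E[G] &= S + \frac{d}{2n} + o(1/n), \qquad
E[T] = S - \frac{d}{2n} + o(1/n), \\
E[G] &= E[T] + \frac{d}{n} + o(1/n).
\end{align*}
```

The last line is the AIC-type relation that appears on the Information Criterion slide; in the singular case the gap is governed by ν instead of d/2.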

Page 14

Cross Validation

Theorem (Gelfand 1998). Importance sampling CV.

$C = \frac{1}{n}\sum_{i=1}^{n} \log E_w[\,1/p(Y_i|X_i,w)\,]$

$E[G] = E[C] + O(1/n^2)$

Epifani (2008) proved that, if the sample contains a leverage sample point, then Ew[ 1/p ] does not exist.

Leverage sample point: a sample point that strongly affects the statistical estimation result.

Vehtari and Gelman (2015) proposed approximating the importance weights by a Pareto distribution.
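The importance sampling form of cross validation can be evaluated from posterior samples without refitting. The sketch below assumes a matrix `loglik[s, i] = log p(Y_i|X_i, w_s)` of pointwise log likelihoods at posterior samples; it implements only the plain estimator C, not the Pareto-based correction.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def importance_sampling_cv(loglik):
    """C = (1/n) sum_i log E_w[1 / p(Y_i|X_i,w)], with E_w as a sample average.

    loglik[s, i] = log p(Y_i | X_i, w_s) for posterior sample s and data point i.
    """
    return np.mean(log_mean_exp(-loglik, axis=0))

# Tiny example: two posterior samples, two data points.
loglik = np.log(np.array([[0.2, 0.5],
                          [0.4, 0.5]]))
C_cv = importance_sampling_cv(loglik)
```

The sample average of 1/p inside the logarithm is the quantity that may fail to exist for a leverage point, so for such points this estimate can be dominated by a few samples.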

Page 15

Information Criterion

Theorem. Widely Applicable Information Criterion

$W = T + \frac{1}{n}\sum_{i=1}^{n} V_w[\log p(Y_i|X_i,w)]$

$E[G] = E[W] + O(1/n^2)$

Cf. This is a generalized version of AIC. If { ∂wj f(x,w) } are linearly independent, then

$E[G] = E[T] + d/n + o(1/n)$

In this case CV and WAIC are equivalent in higher order $(1/n^2)$ (2015).
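WAIC needs only the pointwise log likelihoods at posterior samples. The following sketch assumes a matrix `loglik[s, i] = log p(Y_i|X_i, w_s)` and estimates the posterior variance Vw by the sample variance over s; the function names are illustrative.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def waic(loglik):
    """W = T + (1/n) sum_i V_w[log p(Y_i|X_i,w)].

    loglik[s, i] = log p(Y_i | X_i, w_s) for posterior sample s and data point i.
    """
    T = -np.mean(log_mean_exp(loglik, axis=0))   # training loss
    V = np.var(loglik, axis=0)                   # posterior variance per data point
    return T + np.mean(V)

# Tiny example: two posterior samples, two data points.
loglik = np.log(np.array([[0.2, 0.5],
                          [0.4, 0.5]]))
W = waic(loglik)
```

Unlike the importance sampling form of cross validation, no average of 1/p appears here, only the mean and variance of log p over the posterior.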

Page 16

Cross Validation and Information Criteria

Cross validation requires that {(Xi, Yi)} are independent.

AIC and WAIC require that {Yi|Xi} are independent.

Page 17

4 Generalization Loss can be Estimated by CV and WAIC.

Page 18

Estimation of Generalization Loss

[Figure: neural network with inputs x1, x2 and output f(x,w).]

True: x = (x1, x2)

$g(x) = \exp(-x_1^2 - x_2^2 - x_1 x_2)$

$q(y|x) = g(x)^y\,(1-g(x))^{1-y}$

Learner: f(x,w) is a neural network.

$p(y|x,w) = f(x,w)^y\,(1-f(x,w))^{1-y}$
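The true distribution and the learner on this slide can be written down directly. In the sketch below, g(x) and the Bernoulli likelihood follow the slide; the input distribution q(x), the sample size, the network architecture (2 inputs, tanh hidden units, sigmoid output), and the parameter layout are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    """True success probability g(x) = exp(-x1^2 - x2^2 - x1 x2)."""
    return np.exp(-x[:, 0] ** 2 - x[:, 1] ** 2 - x[:, 0] * x[:, 1])

def f(x, w, n_hidden):
    """Hypothetical learner: 2 -> n_hidden -> 1 network with a sigmoid output."""
    W1 = w[: 2 * n_hidden].reshape(2, n_hidden)
    b1 = w[2 * n_hidden : 3 * n_hidden]
    W2 = w[3 * n_hidden : 4 * n_hidden]
    b2 = w[4 * n_hidden]
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def log_p(y, x, w, n_hidden):
    """log p(y|x,w) = y log f + (1 - y) log(1 - f), elementwise over the sample."""
    p = np.clip(f(x, w, n_hidden), 1e-12, 1.0 - 1e-12)
    return y * np.log(p) + (1 - y) * np.log(1.0 - p)

n, n_hidden = 200, 3                             # assumed sizes
x = rng.normal(size=(n, 2))                      # assumed input distribution q(x)
y = (rng.uniform(size=n) < g(x)).astype(float)   # Y ~ Bernoulli(g(X))
w = 0.1 * rng.normal(size=4 * n_hidden + 1)      # an arbitrary parameter value
ll = log_p(y, x, w, n_hidden)
```

Stacking `log_p` over posterior samples of w gives the log likelihood matrix from which cross validation and WAIC are computed.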

Page 19

Model Selection

[Figure: network with inputs x1, x2, …, x10 and outputs y1, y2, …, y10.]

True: 10 → 5 → 10

Candidates: 10 → (1, 3, 5, 7, 9) → 10

n = 200, n_test = 1000

Posterior was approximated by Langevin equation.
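One common way to approximate a posterior with the Langevin equation is to discretize it with a small step size. The sketch below shows that idea on a two-parameter toy model rather than the 10-input network of the experiment; the model, prior, step size, and burn-in length are assumptions, and no Metropolis correction is applied, so the samples are only approximately from the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem (not the network from the slide):
# f(x, w) = a tanh(b x), C = 1/2, standard normal prior on w = (a, b).
C = 0.5
x = rng.normal(size=100)
y = 0.5 * np.tanh(x) + rng.normal(size=100)

def grad_U(w):
    """Gradient of U(w) = H(w) - log prior(w), with a standard normal prior."""
    a, b = w
    t = np.tanh(b * x)
    r = y - a * t                                   # residuals
    dH_da = -2.0 * C * np.sum(r * t)
    dH_db = -2.0 * C * np.sum(r * a * x * (1.0 - t ** 2))
    return np.array([dH_da, dH_db]) + w             # "+ w" is the prior term

def langevin(n_steps, eps=1e-3, burn_in=2000):
    """Euler discretization of dw = -grad U(w) dt + sqrt(2) dB."""
    w = np.zeros(2)
    samples = np.empty((n_steps, 2))
    for t in range(n_steps + burn_in):
        w = w - eps * grad_U(w) + np.sqrt(2.0 * eps) * rng.normal(size=2)
        if t >= burn_in:
            samples[t - burn_in] = w
    return samples

samples = langevin(5000)
```

A smaller `eps` reduces the discretization bias at the cost of slower mixing.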

Page 20

An experiment: 10 random trials

[Figure: four panels titled Generalization, WAIC, Cross Validation, and AIC, each with horizontal axis Hidden Units.]

Page 21

Difference between CV and WAIC in Regression.

[Figure: sample points X1, …, X9 and a leverage point X10; axis label: place of the leverage point X10.]

A leverage sample point was controlled. WAIC and CV were compared with the generalization loss.

Page 22

Conclusion

(1) Posterior of NN is singular. Learning curves are given by birational invariants.

(2) Generalization losses are estimated by cross validation and WAIC.

Future Study

To construct MCMC for large networks.