Auto-Encoding Variational Bayes

Paweł F. P. Budzianowski, Thomas F. W. Nicholson, William C. Tebbutt

Problem Definition

Perform approximate inference in a model with local latent variables $z_i$, whilst learning a MAP point estimate of the global parameters $\theta$, having observed $x_i$.

[Graphical model: plate over $i = 1, \dots, N$, with global parameter $\theta$, latent $z_i$ and observed $x_i$.]

SGVB

Stochastic Gradient Variational Bayes provides a method to find a deterministic approximation to an intractable posterior distribution, by finding parameters $\phi$ such that $D_{\mathrm{KL}}(q_\phi(z_i \mid x_i) \,\|\, p_\theta(z_i \mid x_i))$ is minimised for all $i$. This is achieved by maximising, for each observation, the lower bound
\[
\mathcal{L}(\theta, \phi; x_i) = \mathbb{E}_{q_\phi(z_i \mid x_i)}\left[\log p_\theta(x_i \mid z_i)\right] - D_{\mathrm{KL}}(q_\phi(z_i \mid x_i) \,\|\, p_\theta(z_i)).
\]
The expectation term in this lower bound cannot typically be computed exactly. Reparameterising $z = g_\phi(x, \epsilon)$ with $\epsilon \sim p(\epsilon)$ yields the differentiable Monte Carlo approximation
\[
\tilde{\mathcal{L}}^{B}(\theta, \phi; x_i) = \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x_i \mid z_{i,l}) - D_{\mathrm{KL}}(q_\phi(z_i \mid x_i) \,\|\, p_\theta(z_i)), \qquad z_{i,l} = g_\phi(x_i, \epsilon_l), \quad \epsilon_l \sim p(\epsilon).
\]
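For a diagonal-Gaussian $q_\phi$ and a standard-normal prior, the KL term has a closed form and the estimator above reduces to a few lines of code. Below is a minimal PyTorch sketch (our choice of framework; `encode` and `decode` are hypothetical stand-ins for the recognition and generative networks, and a Bernoulli decoder is assumed):

```python
import torch

def elbo_estimate(x, encode, decode, L=1):
    """Monte Carlo estimate of the lower bound L^B for a minibatch x."""
    mu, logvar = encode(x)  # parameters of q_phi(z | x), a diagonal Gaussian
    # Closed-form KL(q_phi(z | x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    rec = 0.0
    for _ in range(L):
        eps = torch.randn_like(mu)              # eps ~ p(eps) = N(0, I)
        z = mu + torch.exp(0.5 * logvar) * eps  # z = g_phi(x, eps)
        # Bernoulli decoder: log p_theta(x | z), summed over dimensions of x.
        rec = rec + torch.distributions.Bernoulli(logits=decode(z)).log_prob(x).sum(dim=1)
    return (rec / L - kl).mean()  # average the bound over the minibatch
```

Because the estimate is a deterministic, differentiable function of $\phi$ and $\theta$ given the sampled $\epsilon_l$, the bound can be maximised with any stochastic gradient method.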
Variational Autoencoder

The Variational Autoencoder is a generative latent variable model for data, in which $z_i \sim \mathcal{N}(0, I)$ and $x_i \sim p_\theta(x_i \mid z_i)$, where this conditional is parameterised by a multi-layer perceptron (MLP). An MLP recognition model $q_\phi(z_i \mid x_i)$ is used to provide fast approximate posterior inference in $z_i \mid x_i$. The MLPs used in the recognition model $q_\phi$ and the conditional distribution $p_\theta(x_i \mid z_i)$ are often compared to the encoder and decoder networks of traditional autoencoders, respectively.

Noisy KL-divergence estimate

For non-Gaussian distributions it is often impossible to obtain a closed-form expression for the KL-divergence term, which must then also be estimated by sampling. This yields the more generic estimator
\[
\tilde{\mathcal{L}}^{A}(\theta, \phi; x_i) = \frac{1}{L} \sum_{l=1}^{L} \left( \log p_\theta(x_i, z_{i,l}) - \log q_\phi(z_{i,l} \mid x_i) \right).
\]

Visualisation of learned manifolds

A linearly spaced grid of coordinates over the unit square is mapped through the inverse CDF of the Gaussian to obtain values of $z$, which can then be used to sample from $p_\theta(x \mid z)$ with the estimated parameters $\theta$ (see the sketch at the end of this poster).

Bayesian: is it really all that?

Comparing reconstruction error with that of a vanilla autoencoder, we see stronger performance from the VAEB.

[Figure: MSE reconstruction error for the VAE and AE at latent space sizes 2, 10 and 20, together with reconstructions of an original image at each latent dimension.]

Full Variational Bayes

It is possible to perform full VB on the parameters:
\[
\mathcal{L}(\phi; X) = \int q_\phi(\theta) \left( \log p_\theta(X) + \log p_\alpha(\theta) - \log q_\phi(\theta) \right) \mathrm{d}\theta.
\]
A differentiable Monte Carlo estimate of this bound allows SGVB to be performed, yielding a distribution over the parameters. Our implementation showed a decrease in the variational lower bound, but no evidence of learning, possibly due to the strict Gaussian assumptions on the variational approximate posteriors.

Architecture experiments

We examined various changes to the original architecture of the autoencoder to test the robustness and flexibility of the model, which led to improvements in optimising the lower bound and in computational efficiency:

• different activation functions;
• increasing the depth of the encoder.

Future work

I. Scheduled training of the VAEB [2].
II. Direct parameterisation of a differentiable transform [3].
III. Different priors over the latent space.

References

1. Kingma, D. P., and Welling, M. "Auto-Encoding Variational Bayes." arXiv preprint arXiv:1312.6114 (2013).
2. Geras, K. J., and Sutton, C. "Scheduled Denoising Autoencoders." arXiv preprint arXiv:1406.3269 (2014).
3. Tran, D., Ranganath, R., and Blei, D. M. "The Variational Gaussian Process." arXiv preprint arXiv:1511.06499 (2015).
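The manifold visualisation described above amounts to only a few lines of code. A minimal NumPy/SciPy sketch for a two-dimensional latent space (`decode` is again a hypothetical stand-in for the trained generative MLP):

```python
import numpy as np
from scipy.stats import norm

# Linearly spaced grid over the unit square, avoiding the endpoints,
# where the inverse CDF diverges.
grid = np.linspace(0.05, 0.95, 20)
# Map through the inverse CDF (ppf) of the standard Gaussian to obtain
# latent codes z covering the bulk of the prior N(0, I).
zs = np.array([[norm.ppf(u), norm.ppf(v)] for u in grid for v in grid])
# Each z would then be pushed through the trained decoder to give the
# parameters of p_theta(x | z), e.g. images = decode(zs).
```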