Page 1
Improving Variational Inference with Inverse Autoregressive Flow
Jan. 19, 2017
Tatsuya Shirakawa ([email protected])
Diederik P. Kingma (OpenAI), Tim Salimans (OpenAI), Rafal Jozefowicz (OpenAI), Xi Chen (OpenAI), Ilya Sutskever (OpenAI), Max Welling (University of Amsterdam)
Page 2
Variational Autoencoder (VAE)

\log p(x) \ge \mathbb{E}_{q(z|x)}[\log p(x, z) - \log q(z|x)]
            = \log p(x) - D_{KL}(q(z|x) \| p(z|x))
            = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))
            =: \mathcal{L}(x; \eta, \nu)
Model: z ~ p(z; \eta), x ~ p(x|z; \eta)
Optimization: maximize over \eta the average log-likelihood
  (1/N) \sum_{n=1}^{N} \log p(x_n; \eta)

Inference model: z ~ q(z|x; \nu)
Optimization: maximize over (\eta, \nu) the average ELBO
  (1/N) \sum_{n=1}^{N} \mathcal{L}(x_n; \eta, \nu)
[Diagram: the ELBO. Maximizing \mathcal{L} over \nu pulls q(z|x; \nu) toward the true posterior p(z|x), shrinking the gap D_{KL}(q \| p).]
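In practice the ELBO above is estimated with a single reparameterized sample of z. A minimal sketch, assuming a diagonal-Gaussian q(z|x) and a standard-normal prior p(z) so the KL term is analytic (the `encode` and `decode_logp` callables are illustrative placeholders, not from the slides):

```python
import numpy as np

def elbo_estimate(x, encode, decode_logp, rng):
    """Single-sample Monte Carlo estimate of L(x; eta, nu):
    E_q[log p(x|z)] - D_KL(q(z|x) || p(z)),
    with q(z|x) = N(mu, diag(sigma^2)) and p(z) = N(0, I)."""
    mu, log_sigma = encode(x)                # parameters of q(z|x; nu)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(log_sigma) * eps         # reparameterized sample
    # Analytic KL between diagonal Gaussian and standard normal.
    kl = 0.5 * np.sum(mu**2 + np.exp(2 * log_sigma) - 1.0 - 2 * log_sigma)
    return decode_logp(x, z) - kl
```

Averaging this estimator over a minibatch gives the objective maximized on the next slide.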
Page 3
Requirements for the inference model q(z|x)

Computational tractability
1. Computationally cheap to compute and differentiate
2. Computationally cheap to sample from
3. Parallel computation

Accuracy
4. Sufficiently flexible to match the true posterior p(z|x)
Page 4
Previous Designs of q(z|x)

Basic designs
- Diagonal Gaussian distribution
- Full-covariance Gaussian distribution

Designs based on change of variables
- NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
- Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015

Designs based on adding auxiliary variables
- Hamiltonian Flow / Hamiltonian Variational Inference: T. Salimans et al., "Markov Chain Monte Carlo and Variational Inference: Bridging the Gap", 2014
Page 5
Diagonal / Full-Covariance Gaussian Distribution

Diagonal: efficient but not flexible
  q(z|x) = \prod_i N(z_i \mid \mu_i(x), \sigma_i^2(x))

Full covariance: not efficient and still not flexible (unimodal)
  q(z|x) = N(z \mid \mu(x), \Sigma(x))

(diagonal / full covariance)
1. Computationally cheap to compute and differentiate: ✓ / ✗
2. Computationally cheap to sample from: ✓ / ✗
3. Parallel computation: ✓ / ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗
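The cost gap between the two designs shows up directly in sampling; a rough sketch (variable names and the toy covariance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
mu = rng.standard_normal(D)

# Diagonal Gaussian: O(D) to sample and to evaluate log q(z|x).
sigma = np.exp(rng.standard_normal(D))
z_diag = mu + sigma * rng.standard_normal(D)

# Full-covariance Gaussian: needs a Cholesky factor -- O(D^3) once,
# O(D^2) per sample. Captures correlations but is still unimodal.
A = rng.standard_normal((D, D))
Sigma = A @ A.T + D * np.eye(D)          # toy SPD covariance
L = np.linalg.cholesky(Sigma)
z_full = mu + L @ rng.standard_normal(D)
```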
Page 6
Change-of-Variables-Based Methods

Transform q(z_0|x) into a more powerful distribution q(z_T|x) via sequential application of change of variables:

  z_t = f_t(z_{t-1})
  q(z_t|x) = q(z_{t-1}|x) \left| \det \frac{\partial z_t}{\partial z_{t-1}} \right|^{-1}
  \Rightarrow \log q(z_T|x) = \log q(z_0|x) - \sum_{t=1}^{T} \log \left| \det \frac{\partial z_t}{\partial z_{t-1}} \right|

- NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
- Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
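The log-density recursion above can be sketched as a loop that accumulates log-determinants (the `steps` interface, where each step returns the transformed value and its log|det|, is an assumption for illustration):

```python
import numpy as np

def flow_log_density(z0, log_q0, steps):
    """Apply f_1, ..., f_T sequentially. Each step maps z to
    (f(z), log|det df/dz|), and
    log q(z_T|x) = log q(z_0|x) - sum_t log|det dz_t/dz_{t-1}|."""
    z, log_q = z0, log_q0
    for f in steps:
        z, log_det = f(z)
        log_q -= log_det
    return z, log_q
```

For example, a step that doubles a 3-dimensional z contributes log|det| = 3 log 2 per application.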
Page 7
Normalizing Flow

Transformation:
  z_t = z_{t-1} + u\, h(w^\top z_{t-1} + b)

Key features
- Determinants are computable:
  \det \frac{\partial z_t}{\partial z_{t-1}} = 1 + h'(w^\top z_{t-1} + b)\, u^\top w

Drawbacks
- Information goes through a single bottleneck (the scalar w^\top z_{t-1} + b)

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✓
4. Sufficiently flexible to match the true posterior p(z|x): ✗

[Diagram: z_{t-1} feeds into the single scalar w^\top z_{t-1} + b (the bottleneck), which produces u\,h(w^\top z_{t-1} + b) and finally z_t.]
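One such step and its O(D) log-determinant can be sketched as follows (the function name is illustrative, and h = tanh is assumed as a common choice of smooth nonlinearity):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One normalizing-flow step z' = z + u * tanh(w.z + b).
    det(dz'/dz) = 1 + h'(w.z + b) * (u.w), so the log-det costs O(D);
    note that z enters only through the single scalar w.z + b."""
    a = w @ z + b
    z_new = z + u * np.tanh(a)
    log_det = np.log(np.abs(1.0 + (1.0 - np.tanh(a) ** 2) * (u @ w)))
    return z_new, log_det
```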
Page 8
Hamiltonian Flow / Hamiltonian Variational Inference

ELBO with auxiliary variables y:
  \log p(x) \ge \log p(x) - D_{KL}(q(z|x) \| p(z|x)) - \mathbb{E}\left[ D_{KL}(q(y|x, z) \| r(y|x, z)) \right] =: \mathcal{L}(x)

Drawing (y, z) via HMC:
  y_t, z_t \sim HMC(y_t, z_t \mid y_{t-1}, z_{t-1})

Key features
- Capability to sample from the exact posterior

Drawbacks
- Long mixing time and lower ELBO

1. Computationally cheap to compute and differentiate: ✗
2. Computationally cheap to sample from: ✗
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓
Page 9
NICE

Transform only half of z at each step:
  z_t = (z_t^A, z_t^B) = \left( z_{t-1}^A,\; z_{t-1}^B + f(t, z_{t-1}^A) \right)

Key features
- The determinant of the Jacobian \det(\partial z_t / \partial z_{t-1}) is always 1

Drawbacks
- Limited form of transformation
- Less powerful than Normalizing Flow

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✓
4. Sufficiently flexible to match the true posterior p(z|x): ✗
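The additive coupling step, with its trivially invertible unit-determinant Jacobian, might look like this (function names are illustrative):

```python
import numpy as np

def additive_coupling(z, shift_fn):
    """One NICE step: split z into halves and transform only the second:
    z_A' = z_A;  z_B' = z_B + f(z_A).
    The Jacobian is triangular with unit diagonal, so det = 1 exactly."""
    d = z.shape[0] // 2
    z_A, z_B = z[:d], z[d:]
    return np.concatenate([z_A, z_B + shift_fn(z_A)])

def additive_coupling_inverse(z, shift_fn):
    """Exact inverse: subtract the same shift."""
    d = z.shape[0] // 2
    z_A, z_B = z[:d], z[d:]
    return np.concatenate([z_A, z_B - shift_fn(z_A)])
```

Because only half the dimensions move per step, steps are alternated (swapping which half is transformed) to mix all dimensions.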
Page 10
Autoregressive Flow (proposed)

Autoregressive flow: \mu_t and \sigma_t are autoregressive,
  \partial \mu_{t,i} / \partial z_{t,j} = \partial \sigma_{t,i} / \partial z_{t,j} = 0 \quad \text{if } i \le j
and the transformation is
  z_{t,i} = \mu_{t,i}(z_{t,1:i-1}) + \sigma_{t,i}(z_{t,1:i-1}) \cdot z_{t-1,i}

Key features
- Powerful
- Easy to compute the determinant: \det(\partial z_t / \partial z_{t-1}) = \prod_i \sigma_{t,i}(z_{t,1:i-1})

Drawbacks
- Difficult to parallelize

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✗
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓
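Because \mu_{t,i} and \sigma_{t,i} depend on the already-transformed dimensions z_{t,1:i-1}, the AF transform must be computed one dimension at a time; a sketch with illustrative names (`mu_fn` and `sigma_fn` stand in for the learned autoregressive networks):

```python
import numpy as np

def af_transform(z_prev, mu_fn, sigma_fn):
    """Autoregressive flow: z_{t,i} = mu_i(z_{t,1:i-1})
    + sigma_i(z_{t,1:i-1}) * z_{t-1,i}.
    The loop is inherently sequential, which is the AF drawback."""
    D = z_prev.shape[0]
    z = np.zeros(D)
    for i in range(D):
        mu_i = mu_fn(z[:i])        # depends on the NEW dims so far
        sigma_i = sigma_fn(z[:i])
        z[i] = mu_i + sigma_i * z_prev[i]
    return z
```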
Page 11
Inverse Autoregressive Flow (proposed)

Inverting the AF (\mu_t, \sigma_t is also autoregressive):
  z_t = \frac{z_{t-1} - \mu_t(z_{t-1})}{\sigma_t(z_{t-1})}

Key features
- Equally powerful as AF
- Easy to compute the determinant: \det(\partial z_t / \partial z_{t-1}) = 1 / \prod_i \sigma_{t,i}(z_{t-1})
- Parallelizable

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✓
4. Sufficiently flexible to match the true posterior p(z|x): ✓
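In contrast to the AF loop, the IAF step reads \mu and \sigma off z_{t-1} alone, so every dimension is updated in one vectorized pass; a sketch (illustrative names, as before):

```python
import numpy as np

def iaf_transform(z_prev, mu_fn, sigma_fn):
    """Inverse autoregressive flow:
    z_t = (z_{t-1} - mu(z_{t-1})) / sigma(z_{t-1}),
    log|det dz_t/dz_{t-1}| = -sum_i log sigma_i(z_{t-1}).
    mu and sigma depend only on z_{t-1}, so this is fully parallel."""
    mu = mu_fn(z_prev)
    sigma = sigma_fn(z_prev)
    z = (z_prev - mu) / sigma
    log_det = -np.sum(np.log(sigma))
    return z, log_det
```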
Page 12
IAF through a Masked Autoencoder (MADE)

Modeling the autoregressive \mu_t and \sigma_t with MADE:
- Paths from "future" inputs are removed from the autoencoder by introducing masks
- MADE is a probabilistic model: p(x) = \prod_i p(x_i \mid x_{1:i-1})
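A minimal sketch of MADE-style masks for a single hidden layer (the degree-assignment scheme here is simplified to random sampling, and all names are illustrative):

```python
import numpy as np

def made_masks(D, H, rng):
    """Binary masks for a one-hidden-layer MADE over D inputs.
    Each hidden unit k gets a degree m_k in {1, ..., D-1}; hidden unit k
    may see inputs j <= m_k, and output i may see hidden units with
    m_k < i -- so output i never depends on inputs j >= i."""
    m = rng.integers(1, D, size=H)                                     # degrees
    mask_in = (m[:, None] >= np.arange(1, D + 1)[None, :]).astype(float)  # H x D
    mask_out = (np.arange(1, D + 1)[:, None] > m[None, :]).astype(float)  # D x H
    return mask_in, mask_out
```

Multiplying the masks elementwise into the weight matrices makes the output-to-input connectivity strictly lower triangular, which is exactly the autoregressive structure IAF needs for \mu_t and \sigma_t.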
Page 13
Experiments

IAF is evaluated on image-generating models.

Models for MNIST
- Convolutional VAE with ResNet blocks
- IAF = 2-layer MADE
- IAF transformations are stacked with the ordering reversed alternately

Models for CIFAR-10 (much more complicated)
Page 16
IAF in 1 Slide

[Diagram: the Autoregressive Flow and the Inverse Autoregressive Flow transform q(z_0|x; \mu_0, \sigma_0) through q(z_t|x; \mu_t, \sigma_t), in opposite directions of the same mapping, shrinking the gap D_{KL}(q \| p) to the true posterior p(z|x).]

IAF is
✓ Easy to compute and differentiate
✓ Easy to sample from
✓ Parallelizable
✓ Flexible
Page 17
We are hiring!
http://www.abeja.asia/
https://www.wantedly.com/companies/abeja