Improving Variational Inference with Inverse Autoregressive Flow

Transcript
Page 1: Improving Variational Inference with Inverse Autoregressive Flow

Improving Variational Inference with Inverse Autoregressive Flow

Jan. 19, 2017

Tatsuya Shirakawa ([email protected])

Paper authors: Diederik P. Kingma (OpenAI), Tim Salimans (OpenAI), Rafal Jozefowicz (OpenAI), Xi Chen (OpenAI), Ilya Sutskever (OpenAI), Max Welling (University of Amsterdam)

Page 2: Improving Variational Inference with Inverse Autoregressive Flow


Variational Autoencoder (VAE)

$\log p(x) \ge \mathbb{E}_{q(z|x)}\left[\log p(x, z) - \log q(z|x)\right]$

$\quad = \log p(x) - D_{KL}\!\left(q(z|x) \,\|\, p(z|x)\right)$

$\quad = \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - D_{KL}\!\left(q(z|x) \,\|\, p(z)\right)$

$\quad =: \mathcal{L}(x; \theta)$

Model: $z \sim p(z; \eta)$, $x \sim p(x|z; \eta)$

Optimization: $\max_{\eta}\ \frac{1}{N}\sum_{n=1}^{N} \log p(x_n; \eta)$

Inference model: $z \sim q(z|x; \nu)$

Optimization

maximize๐œฝF(๐œผ,๐‚)

1๐‘Bโ„’ ๐’™๐’; ๐œฝ

D

EFG

ELBO

๐’˜๐’Š๐’•๐’‰๐œฝ = ๐, ๐‚

[Diagram: the gap $D_{KL}(q \,\|\, p)$ between the true posterior $p(z|x; \eta)$ and the inference model $q(z|x; \nu)$, at current and optimal parameters.]
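As a concrete illustration of the ELBO above, here is a minimal numpy sketch of the single-sample reparameterized estimator for a diagonal-Gaussian q(z|x); `log_p_xz`, `mu`, and `log_sigma` are hypothetical placeholders for the model's joint log-density and the encoder's outputs, not code from the slides.

```python
import numpy as np

def elbo_single_sample(x, mu, log_sigma, log_p_xz, rng=None):
    """One-sample Monte Carlo estimate of L(x) = E_q[log p(x, z) - log q(z|x)]
    for a diagonal-Gaussian q(z|x) = N(mu(x), diag(sigma(x)^2)), using the
    reparameterization z = mu + sigma * eps with eps ~ N(0, I)."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(log_sigma) * eps                          # sample z ~ q(z|x)
    log_q = -0.5 * np.sum(np.log(2 * np.pi) + 2 * log_sigma + eps ** 2)
    return log_p_xz(x, z) - log_q                             # unbiased ELBO estimate
```

Averaging this estimator over a minibatch and maximizing it with respect to both the generative and inference parameters corresponds to the optimization written above.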

Page 3: Improving Variational Inference with Inverse Autoregressive Flow

Requirements for the inference model q(z|x)

Computational tractability:
1. Computationally cheap to compute and differentiate
2. Computationally cheap to sample from
3. Parallel computation

Accuracy:
4. Sufficiently flexible to match the true posterior p(z|x)

[Diagram repeated from the previous slide: $D_{KL}(q \,\|\, p)$ between the true posterior and the inference model.]

Page 4: Improving Variational Inference with Inverse Autoregressive Flow

Previous Designs of q(z|x)

Basic designs:
- Diagonal Gaussian distribution
- Full-covariance Gaussian distribution

Designs based on change of variables:
- NICE (L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014)
- Normalizing Flow (D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015)

Designs based on adding auxiliary variables:
- Hamiltonian Flow / Hamiltonian Variational Inference (T. Salimans et al., "Markov Chain Monte Carlo and Variational Inference: Bridging the Gap", 2014)

Page 5: Improving Variational Inference with Inverse Autoregressive Flow

Diagonal / Full-Covariance Gaussian Distribution

Diagonal: efficient but not flexible
$q(z|x) = \prod_i N\!\left(z_i \mid \mu_i(x), \sigma_i(x)\right)$

Full covariance: not efficient and not flexible (unimodal)
$q(z|x) = N\!\left(z \mid \mu(x), \Sigma(x)\right)$

Checklist (diagonal / full covariance):
1. Computationally cheap to compute and differentiate ✓ / ✗
2. Computationally cheap to sample from ✓ / ✗
3. Parallel computation ✓ / ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗
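To make the efficiency contrast concrete, a small numpy sketch (illustrative only, not from the slides) of sampling from each design; the covariance matrix below is an arbitrary stand-in for an encoder output.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
mu = rng.standard_normal(D)
sigma = np.exp(rng.standard_normal(D))                  # per-dimension std-devs
Sigma = np.cov(rng.standard_normal((D, 100)))           # stand-in full covariance

# Diagonal Gaussian: O(D) work per sample, factorized log-density,
# but no correlations between latent dimensions.
z_diag = mu + sigma * rng.standard_normal(D)

# Full-covariance Gaussian: O(D^3) Cholesky once, O(D^2) per sample,
# correlations captured, but the approximation is still unimodal.
L = np.linalg.cholesky(Sigma)
z_full = mu + L @ rng.standard_normal(D)
```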

Page 6: Improving Variational Inference with Inverse Autoregressive Flow

Change-of-Variables-Based Methods

Transform $q(z_0|x)$ into a more powerful distribution $q(z_T|x)$ via sequential application of change of variables:

$z_t = f_t(z_{t-1})$

$q(z_t|x) = q(z_{t-1}|x)\,\left|\det \frac{d f_t(z_{t-1})}{d z_{t-1}}\right|^{-1}$

$\Rightarrow\ \log q(z_T|x) = \log q(z_0|x) - \sum_{t=1}^{T} \log \left|\det \frac{d f_t(z_{t-1})}{d z_{t-1}}\right|$

• NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
• Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
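The log-density bookkeeping above can be written in a few lines. This is a generic sketch (not from the slides), assuming each step returns the transformed sample together with its log |det Jacobian|:

```python
import numpy as np

def flow_log_density(z0, log_q0, steps):
    """Push z0 through a sequence of invertible maps and track log q(z_T | x)
    with the change-of-variables formula:
        log q(z_T | x) = log q(z_0 | x) - sum_t log |det d f_t / d z_{t-1}|."""
    z, log_q = z0, log_q0
    for f in steps:                      # each f returns (z_t, log|det Jacobian|)
        z, log_det = f(z)
        log_q = log_q - log_det
    return z, log_q
```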

Page 7: Improving Variational Inference with Inverse Autoregressive Flow

Normalizing Flow

Transformation via $z_t = z_{t-1} + u_t\, f\!\left(w_t^\top z_{t-1} + b_t\right)$

Key features:
- Determinants are computable

Drawbacks:
- Information goes through a single bottleneck

1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗
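A minimal numpy sketch of one such planar step and its log-determinant (via the matrix determinant lemma); this is an illustration under the usual choice f = tanh, not code from the paper.

```python
import numpy as np

def planar_flow_step(z, u, w, b):
    """One planar normalizing-flow step z' = z + u * tanh(w^T z + b).
    The scalar a = w^T z + b is the single bottleneck all information
    passes through; the matrix determinant lemma gives the log-determinant."""
    a = np.tanh(w @ z + b)
    z_new = z + u * a
    psi = (1.0 - a ** 2) * w                    # derivative of tanh(w^T z + b) w.r.t. z
    log_det = np.log(np.abs(1.0 + u @ psi))     # |det(I + u psi^T)| = |1 + u^T psi|
    return z_new, log_det
```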

[Diagram: $z_{t-1}$ passes through the single scalar bottleneck $w_t^\top z_{t-1} + b_t$, is mapped to $u_t f(w_t^\top z_{t-1} + b_t)$, and is added back to $z_{t-1}$ to give $z_t$.]

Page 8: Improving Variational Inference with Inverse Autoregressive Flow

Hamiltonian Flow / Hamiltonian Variational Inference

ELBO with auxiliary variables y:
$\log p(x) \ge \log p(x) - D_{KL}\!\left(q(z|x) \,\|\, p(z|x)\right) - D_{KL}\!\left(q(y|x,z) \,\|\, r(y|x,z)\right) =: \mathcal{L}(x)$

Drawing (y, z) via HMC:
$(y_t, z_t) \sim \mathrm{HMC}\!\left(y_t, z_t \mid y_{t-1}, z_{t-1}\right)$

Key features:
- Capability to sample from the exact posterior

Drawbacks:
- Long mixing time and lower ELBO

1. Computationally cheap to compute and differentiate ✗
2. Computationally cheap to sample from ✗
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✓
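For reference, a bare-bones sketch of a single HMC transition over z (a leapfrog integrator plus a Metropolis correction), assuming hypothetical callables `log_p` and `grad_log_p` for log p(x, z) and its gradient; it is only meant to show why each transition is sequential and gradient-heavy.

```python
import numpy as np

def hmc_step(z, log_p, grad_log_p, eps=0.1, n_leapfrog=10, rng=None):
    """One HMC transition targeting p(z|x) (up to a constant, via log p(x, z)):
    simulate Hamiltonian dynamics with a leapfrog integrator, then apply a
    Metropolis accept/reject step."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = rng.standard_normal(z.shape)                 # auxiliary momentum variable
    z_new, y_new = z.copy(), y + 0.5 * eps * grad_log_p(z)
    for i in range(n_leapfrog):
        z_new = z_new + eps * y_new
        if i < n_leapfrog - 1:
            y_new = y_new + eps * grad_log_p(z_new)
    y_new = y_new + 0.5 * eps * grad_log_p(z_new)
    log_accept = (log_p(z_new) - 0.5 * y_new @ y_new) - (log_p(z) - 0.5 * y @ y)
    return z_new if np.log(rng.uniform()) < log_accept else z
```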

Page 9: Improving Variational Inference with Inverse Autoregressive Flow

NICE

Transform only half of z at each step:
$z_t = \left(z_t^{\alpha}, z_t^{\beta}\right) = \left(z_{t-1}^{\alpha},\ z_{t-1}^{\beta} + f\!\left(x, z_{t-1}^{\alpha}\right)\right)$

Key features:
- The determinant of the Jacobian, $\det \frac{\partial z_t}{\partial z_{t-1}}$, is always 1

Drawbacks:
- Limited form of transformation
- Less powerful than Normalizing Flow

1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗
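A toy numpy sketch of one additive coupling step (illustrative; the coupling function `m` stands in for an arbitrary neural network): only half of z is updated, and the log-determinant contribution is exactly zero.

```python
import numpy as np

def additive_coupling(z, m):
    """One NICE-style additive coupling step: keep one half of z fixed and
    shift the other half by a function of the fixed half. The Jacobian is
    triangular with unit diagonal, so log|det| = 0."""
    d = z.shape[-1] // 2
    z_a, z_b = z[..., :d], z[..., d:]
    z_b = z_b + m(z_a)                   # m: arbitrary coupling function (e.g. a neural net)
    return np.concatenate([z_a, z_b], axis=-1), 0.0
```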

Page 10: Improving Variational Inference with Inverse Autoregressive Flow

Autoregressive Flow (proposed)

Autoregressive flow ($\partial \mu_{t,i}/\partial z_{t,j} = \partial \sigma_{t,i}/\partial z_{t,j} = 0$ for $i \le j$):
$z_{t,i} = \mu_{t,i}\!\left(z_{t,0:i-1}\right) + \sigma_{t,i}\!\left(z_{t,0:i-1}\right) \odot z_{t-1,i}$

Key features:
- Powerful
- Easy to compute $\det\left(\partial z_t / \partial z_{t-1}\right) = \prod_i \sigma_{t,i}$

Drawbacks:
- Difficult to parallelize

1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✓
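A minimal numpy sketch of one AF step as written above (illustrative; `mu_sigma` is a hypothetical autoregressive network returning μ and σ for dimension i from the already-generated dimensions): the loop over dimensions is exactly what makes the transformation hard to parallelize.

```python
import numpy as np

def af_step(z_prev, mu_sigma):
    """One autoregressive-flow step:
        z_t[i] = mu_i(z_t[:i]) + sigma_i(z_t[:i]) * z_{t-1}[i].
    Each output dimension depends on the already generated output dimensions,
    so the transformation must run sequentially over i."""
    D = z_prev.shape[0]
    z = np.zeros(D)
    log_det = 0.0
    for i in range(D):
        mu_i, sigma_i = mu_sigma(z[:i], i)      # hypothetical autoregressive nets
        z[i] = mu_i + sigma_i * z_prev[i]
        log_det += np.log(sigma_i)              # log det(dz_t/dz_{t-1}) = sum_i log sigma_i
    return z, log_det
```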

Page 11: Improving Variational Inference with Inverse Autoregressive Flow

Inverse Autoregressive Flow (proposed)

Inverting AF ($\mu_t$, $\sigma_t$ are also autoregressive):
$z_t = \dfrac{z_{t-1} - \mu_t\!\left(z_{t-1}\right)}{\sigma_t\!\left(z_{t-1}\right)}$

Key features:
- Equally powerful as AF
- Easy to compute $\det\left(\partial z_t / \partial z_{t-1}\right) = 1 / \prod_i \sigma_{t,i}\!\left(z_{t-1}\right)$
- Parallelizable

1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✓
4. Sufficiently flexible to match the true posterior p(z|x) ✓
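The corresponding IAF step in numpy (illustrative; `made` is a hypothetical autoregressive network over z_{t-1}, e.g. a MADE, returning μ_t and σ_t): because all inputs are known in advance, every dimension is transformed in one vectorized pass.

```python
import numpy as np

def iaf_step(z_prev, made):
    """One inverse-autoregressive-flow step: mu_t and sigma_t depend only on
    z_{t-1} (through an autoregressive network such as MADE), so every
    dimension is transformed in a single parallel, vectorized pass."""
    mu, sigma = made(z_prev)                    # both arrays have the shape of z_prev
    z = (z_prev - mu) / sigma
    log_det = -np.sum(np.log(sigma))            # log det(dz_t/dz_{t-1}) = -sum_i log sigma_i
    return z, log_det
```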

Page 12: Improving Variational Inference with Inverse Autoregressive Flow

IAF through Masked Autoencoder (MADE)

Modeling the autoregressive $\mu_t$ and $\sigma_t$ with MADE:

• Paths from future inputs are removed from the autoencoder by introducing masks
• MADE is a probabilistic model: $p(x) = \prod_i p\!\left(x_i \mid x_{0:i-1}\right)$
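A small numpy sketch of how such masks can be built for a single hidden layer (following the degree construction of Germain et al., 2015; the layout here is illustrative): each output unit i ends up connected, through the hidden layer, only to inputs 0..i-1.

```python
import numpy as np

def made_masks(D, H, rng=None):
    """Connectivity masks for a one-hidden-layer MADE over D inputs and
    H hidden units, so that output i only sees inputs 0..i-1."""
    if rng is None:
        rng = np.random.default_rng(0)
    deg_in = np.arange(1, D + 1)                  # input degrees 1..D
    deg_hid = rng.integers(1, D, size=H)          # hidden degrees in {1, ..., D-1}
    mask_in = (deg_hid[:, None] >= deg_in[None, :]).astype(float)   # (H, D)
    mask_out = (deg_in[:, None] > deg_hid[None, :]).astype(float)   # (D, H)
    return mask_in, mask_out   # multiply element-wise with the two weight matrices
```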

Page 13: Improving Variational Inference with Inverse Autoregressive Flow

Experiments

IAF is evaluated on image-generation models.

Models for MNIST:
- Convolutional VAE with ResNet blocks
- IAF = 2-layer MADE
- IAF transformations are stacked with the ordering reversed alternately (see the sketch below)

Models for CIFAR-10: very complicated
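A sketch of the stacking-with-reversed-ordering idea (illustrative numpy, assuming each `step` returns the transformed z and its log-determinant as in the earlier sketches):

```python
import numpy as np

def stacked_iaf(z0, log_q0, iaf_steps):
    """Stack several IAF transformations, reversing the variable ordering
    between steps so that every latent dimension can eventually influence
    every other one. The reversal itself is a permutation with |det| = 1."""
    z, log_q = z0, log_q0
    for step in iaf_steps:
        z, log_det = step(z)
        log_q = log_q - log_det
        z = z[::-1]                    # reverse the ordering before the next step
    return z, log_q
```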

Page 14: Improving Variational Inference with Inverse Autoregressive Flow


MNIST

Page 15: Improving Variational Inference with Inverse Autoregressive Flow


CIFAR-10

Page 16: Improving Variational Inference with Inverse Autoregressive Flow

IAF in 1 Slide

[Diagram: a chain of distributions $q(z_0|x; \nu_0) \to \dots \to q(z_t|x; \nu_t) \to \dots \to q(z_T|x; \nu_T)$, built by autoregressive flow / inverse autoregressive flow steps, closing the gap $D_{KL}(q \,\|\, p)$ to the true posterior $p(z|x; \eta)$.]

IAF is:
✓ Easy to compute and differentiate
✓ Easy to sample from
✓ Parallelizable
✓ Flexible

Page 17: Improving Variational Inference with Inverse Autoregressive Flow

We are hiring! http://www.abeja.asia/

https://www.wantedly.com/companies/abeja