Pixel-level Generative Model. Generative Image Modeling Using Spatial LSTMs (NIPS 2015), L. Theis and M. Bethge, University of Tübingen, Germany. Pixel Recurrent Neural Networks (ICML 2016), A. van den Oord, N. Kalchbrenner and K. Kavukcuoglu, Google DeepMind, UK. Conditional Image Generation with PixelCNN Decoders (NIPS 2016), A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves and K. Kavukcuoglu, Google DeepMind, UK. Presented by Qi WEI, June 23rd, 2017.

Pixel-level Generative Modellcarin/Qi6.23.2017.pdf · Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tubingen, Germany Pixel Recurrent

Jul 21, 2020


Pixel-level Generative Model

Generative Image Modeling Using Spatial LSTMs (NIPS 2015)
L. Theis and M. Bethge

University of Tübingen, Germany

Pixel Recurrent Neural Networks (ICML 2016)
A. van den Oord, N. Kalchbrenner and K. Kavukcuoglu

Google DeepMind, UK

Conditional Image Generation with PixelCNN Decoders (NIPS 2016)
A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves and K. Kavukcuoglu

Google DeepMind, UK

Presented by Qi WEI
June 23rd, 2017


Open problem

How to generate a good image?

Target: model the distribution of natural images

Expectation: expressive, tractable and scalable

Difficulty: strong statistical dependencies over hundreds of pixels

Strategy of pixel-level generation:

Explicit density model

Use chain rule to decompose likelihood of an image x into product of 1-d distributions

$$\underbrace{p(x;\theta)}_{\text{likelihood of image } x} = \prod_{i,j} \underbrace{p(x_{i,j} \mid x_{<ij};\theta)}_{\text{conditional on all previous pixels}} \tag{1}$$

maximize likelihood of training data
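A minimal sketch of the chain-rule factorization in Eq. (1), assuming a toy conditional on binary pixels. The function `cond_prob_one` is a hypothetical stand-in (it looks only at the left and upper neighbours) and is not a model from any of the three papers. Because every factor is a normalized 1-d distribution, the product is a normalized distribution over whole images, which the last lines check by enumerating all 2 x 2 binary images.

```python
import itertools
import numpy as np

def cond_prob_one(image, i, j):
    """Toy p(x_ij = 1 | x_<ij); an assumption for illustration only."""
    left = image[i, j - 1] if j > 0 else 0
    up = image[i - 1, j] if i > 0 else 0
    return 0.2 + 0.3 * left + 0.3 * up  # stays inside [0.2, 0.8]

def log_likelihood(image):
    """log p(x) as a sum of conditional log-probabilities in raster order."""
    height, width = image.shape
    total = 0.0
    for i in range(height):
        for j in range(width):
            p_one = cond_prob_one(image, i, j)
            total += np.log(p_one if image[i, j] == 1 else 1.0 - p_one)
    return total

# Enumerate every 2 x 2 binary image and add up the probabilities.
all_images = [np.array(bits).reshape(2, 2)
              for bits in itertools.product([0, 1], repeat=4)]
total_prob = sum(np.exp(log_likelihood(img)) for img in all_images)
print(total_prob)  # should be 1 up to floating-point error
```

Maximum-likelihood training would adjust the parameters of the conditional to raise `log_likelihood` on the training images; here the conditional has no trainable parameters.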


PixelRNN

Generate image pixels starting from corner

Dependency on previous pixels modeled using an RNN (LSTM)


PixelCNN

Also generate image pixels starting from corner

Dependency on previous pixels modeled using a CNN

Comparison with PixelRNN:

Training PixelCNN is faster than PixelRNN

Generation still proceeds sequentially, so sampling remains slow
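The sequential cost of generation can be sketched as follows. This is an illustration under stated assumptions: `toy_conditional` is a hypothetical placeholder for the network (RNN or CNN), and pixels are binary. During training the whole image is known, so a CNN can evaluate all conditionals at once; during sampling each pixel needs the previously sampled pixels, so one model evaluation per pixel is unavoidable.

```python
import numpy as np

def toy_conditional(image, i, j):
    """Hypothetical stand-in for the network: p(x_ij = 1 | previous pixels)."""
    left = image[i, j - 1] if j > 0 else 0
    up = image[i - 1, j] if i > 0 else 0
    return 0.2 + 0.3 * left + 0.3 * up

def sample_image(height, width, rng):
    """Ancestral sampling in raster order, starting from the top-left corner."""
    image = np.zeros((height, width), dtype=int)
    evaluations = 0
    for i in range(height):
        for j in range(width):
            p_one = toy_conditional(image, i, j)  # needs pixels sampled so far
            evaluations += 1
            image[i, j] = int(rng.random() < p_one)
    return image, evaluations

rng = np.random.default_rng(0)
sample, n_evals = sample_image(8, 8, rng)
print(n_evals)  # one model evaluation per pixel
```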


1 Spatial LSTMs

2 PixelRNN

3 PixelCNN


Spatial LSTMs


A recurrent image model based on multi-dimensional LSTM

Target: handle long-range dependencies that are central to object and scene understanding

Advantage:

scale to images of arbitrary size

likelihood is computationally tractable

Model Structure:

MCGSM: mixtures of conditional Gaussian scale mixtures

Spatial LSTM: a special case of the multidimensional LSTM (Graves & Schmidhuber, 2009)

RIDE: Recurrent image density estimator, i.e., MCGSM + SLSTM


Spatial LSTMs

Recurrent model of natural images

MCGSM+RIDE

Figure: A: pixel dependency. B: causal neighborhood. C: a visualization of the proposed recurrent image model with two layers of spatial LSTMs.


Spatial LSTMs

MCGSMs: mixtures of conditional Gaussian scale mixtures

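Distribution of any parametric model with parameters θ:

$$p(x;\theta) = \prod_{i,j} p(x_{i,j} \mid x_{<ij};\theta) \tag{2}$$

Improve the representational power:

$$p(x;\{\theta_{ij}\}) = \prod_{i,j} p(x_{i,j} \mid x_{<ij};\theta_{ij}) \tag{3}$$

The conditional distribution in an MCGSM:

$$p(x_{ij} \mid x_{<ij},\theta_{ij}) = \sum_{c,s} \underbrace{p(c,s \mid x_{<ij},\theta_{ij})}_{\text{gate}}\; \underbrace{p(x_{ij} \mid x_{<ij},c,s,\theta_{ij})}_{\text{expert}} \tag{4}$$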


Spatial LSTMs

MCGSMs: mixtures of conditional Gaussian scale mixtures

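The conditional distribution in an MCGSM:

$$p(x_{ij} \mid x_{<ij},\theta_{ij}) = \sum_{c,s} \underbrace{p(c,s \mid x_{<ij},\theta_{ij})}_{\text{gate}}\; \underbrace{p(x_{ij} \mid x_{<ij},c,s,\theta_{ij})}_{\text{expert}} \tag{5}$$

$$\begin{aligned}
p(c,s \mid x_{<ij},\theta_{ij}) &\propto \exp\!\left(\eta_{cs} - \tfrac{1}{2} e^{\alpha_{cs}}\, x_{<ij}^{T} K_c\, x_{<ij}\right) \\
p(x_{ij} \mid x_{<ij},c,s) &= \mathcal{N}\!\left(x_{ij};\, a_c^{T} x_{<ij},\, e^{-\alpha_{cs}}\right)
\end{aligned} \tag{6}$$

where $K_c$ is positive definite.

To reduce the number of parameters:

$$K_c \approx \sum_{n} \beta_{cn}^{2}\, b_n b_n^{T} \tag{7}$$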
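Equations (5) to (7) can be evaluated directly. The sketch below assumes random parameters and hypothetical array shapes (C components, S scales, a D-pixel causal neighbourhood, N basis vectors); the function name `mcgsm_density` is illustrative and not taken from the authors' code. It computes the gate as a softmax over (c, s), the Gaussian experts, and their mixture, then checks numerically that the result integrates to roughly 1 over the pixel value.

```python
import numpy as np

def mcgsm_density(x_ij, ctx, eta, alpha, a, beta, b):
    """p(x_ij | ctx) following Eqs. (5)-(7).

    Assumed shapes: eta, alpha (C, S); a (C, D); beta (C, N); b (N, D); ctx (D,).
    """
    proj = b @ ctx                          # (N,) values b_n^T ctx
    quad = (beta ** 2) @ (proj ** 2)        # (C,) ctx^T K_c ctx using Eq. (7)
    logits = eta - 0.5 * np.exp(alpha) * quad[:, None]
    gates = np.exp(logits - logits.max())   # softmax over all (c, s) pairs
    gates /= gates.sum()
    mean = a @ ctx                          # (C,) expert means a_c^T ctx
    var = np.exp(-alpha)                    # (C, S) expert variances
    experts = np.exp(-0.5 * (x_ij - mean[:, None]) ** 2 / var)
    experts /= np.sqrt(2.0 * np.pi * var)
    return float((gates * experts).sum())

rng = np.random.default_rng(0)
C, S, D, N = 3, 2, 4, 5
params = dict(
    eta=rng.normal(size=(C, S)),
    alpha=0.5 * rng.normal(size=(C, S)),
    a=0.5 * rng.normal(size=(C, D)),
    beta=rng.normal(size=(C, N)),
    b=rng.normal(size=(N, D)),
)
ctx = rng.normal(size=D)

# Riemann-sum check that the conditional density is normalized.
grid = np.linspace(-40.0, 40.0, 8001)
dx = grid[1] - grid[0]
total = sum(mcgsm_density(x, ctx, **params) for x in grid) * dx
print(total)  # should be close to 1
```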


Spatial LSTMs

Spatial LSTM

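Operations:

$$\begin{aligned}
c_{ij} &= g_{ij} \odot i_{ij} + c_{i,j-1} \odot f^{c}_{ij} + c_{i-1,j} \odot f^{r}_{ij} \\
h_{ij} &= o_{ij} \odot \tanh(c_{ij})
\end{aligned} \tag{8}$$

where

$$\begin{pmatrix} g_{ij} \\ o_{ij} \\ i_{ij} \\ f^{r}_{ij} \\ f^{c}_{ij} \end{pmatrix}
= \begin{pmatrix} \tanh \\ \sigma \\ \sigma \\ \sigma \\ \sigma \end{pmatrix}
T_{A,b} \begin{pmatrix} x_{<ij} \\ h_{i,j-1} \\ h_{i-1,j} \end{pmatrix} \tag{9}$$

where

$\sigma$ is the logistic sigmoid function

$\odot$ indicates a pointwise product

$T_{A,b}$ is an affine transformation which depends on the only parameters of the network, $A$ and $b$

$i_{ij}$, $o_{ij}$, $f^{c}_{ij}$, $f^{r}_{ij}$ are gating units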
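A single spatial LSTM update, Eqs. (8) and (9), can be written in a few lines. This is a sketch under assumptions: the function name `slstm_step`, the 4-pixel neighbourhood and the hidden size of 3 are illustrative choices, and the order in which the five blocks are split out of the affine output follows the order listed in Eq. (9).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slstm_step(x_ctx, h_left, h_up, c_left, c_up, A, b):
    """One spatial LSTM update at position (i, j).

    x_ctx is the causal neighbourhood x_<ij; h_left/c_left come from (i, j-1)
    and h_up/c_up from (i-1, j). A and b define the affine map T_{A,b}.
    """
    z = A @ np.concatenate([x_ctx, h_left, h_up]) + b
    g, o, i_gate, f_r, f_c = np.split(z, 5)          # order as in Eq. (9)
    g = np.tanh(g)
    o, i_gate, f_r, f_c = (sigmoid(v) for v in (o, i_gate, f_r, f_c))
    c = g * i_gate + c_left * f_c + c_up * f_r       # Eq. (8), memory
    h = o * np.tanh(c)                               # Eq. (8), hidden state
    return h, c

rng = np.random.default_rng(0)
hidden, ctx_dim = 3, 4
A = rng.normal(size=(5 * hidden, ctx_dim + 2 * hidden))
b = rng.normal(size=5 * hidden)
h, c = slstm_step(rng.normal(size=ctx_dim),
                  np.zeros(hidden), np.zeros(hidden),
                  np.zeros(hidden), np.zeros(hidden), A, b)
print(h.shape, c.shape)
```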


Spatial LSTMs

RIDE: recurrent image density estimator

Factorized MCGSM:

$$p(x_{ij} \mid x_{<ij}) = p(x_{ij} \mid h_{ij}) \tag{10}$$

the state of the hidden vector only depends on pixels in $x_{<ij}$ and does not violate the factorization law

allows this RIDE to use pixels of a much larger region for prediction

nonlinearly transform the pixels before applying the MCGSM

increase the representational power of the model by stacking spatial LSTMs
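The claim that the hidden state depends only on earlier pixels can be checked numerically. The sketch below scans a spatial LSTM over a small image and then perturbs one pixel together with every pixel after it in raster order. It assumes a hypothetical 4-pixel causal neighbourhood (left, upper-left, up, upper-right) with zero padding; the neighbourhood in the paper may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def causal_context(image, i, j):
    """Assumed neighbourhood: left, upper-left, up, upper-right (zero padded)."""
    p = np.pad(image, 1)
    r, c = i + 1, j + 1
    return np.array([p[r, c - 1], p[r - 1, c - 1], p[r - 1, c], p[r - 1, c + 1]])

def slstm_scan(image, A, b, hidden):
    """Hidden states h_ij for every pixel, computed in raster order."""
    rows, cols = image.shape
    h = np.zeros((rows + 1, cols + 1, hidden))  # extra zero row/column as border
    c = np.zeros_like(h)
    for i in range(rows):
        for j in range(cols):
            inp = np.concatenate([causal_context(image, i, j),
                                  h[i + 1, j],   # state of pixel (i, j-1)
                                  h[i, j + 1]])  # state of pixel (i-1, j)
            g, o, i_gate, f_r, f_c = np.split(A @ inp + b, 5)
            c[i + 1, j + 1] = (np.tanh(g) * sigmoid(i_gate)
                               + c[i + 1, j] * sigmoid(f_c)
                               + c[i, j + 1] * sigmoid(f_r))
            h[i + 1, j + 1] = sigmoid(o) * np.tanh(c[i + 1, j + 1])
    return h[1:, 1:]

rng = np.random.default_rng(0)
hidden = 3
A = rng.normal(size=(5 * hidden, 4 + 2 * hidden))
b = rng.normal(size=5 * hidden)
image = rng.normal(size=(5, 5))

changed = image.copy()
changed[2, 3:] += 1.0   # pixel (2, 3) and the rest of its row
changed[3:] += 1.0      # all later rows

h_a = slstm_scan(image, A, b, hidden)
h_b = slstm_scan(changed, A, b, hidden)
unchanged = np.allclose(h_a[:2], h_b[:2]) and np.allclose(h_a[2, :4], h_b[2, :4])
print(unchanged)  # states up to and including (2, 3) do not see the change
```

Because h at (2, 3) is unaffected by the value of pixel (2, 3) itself, feeding it to the MCGSM as in Eq. (10) keeps the chain-rule factorization valid.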


Spatial LSTMs

Experimental Results

Natural Images

Figure: Average log-likelihoods and log-likelihood rates for image patches (w.o./w. DC) and large images extracted from BSDS300 [25].

Figure: Average log-likelihood rates for image patches and large images extracted from van Hateren's dataset [48].


Spatial LSTMs

Experimental Results (cont.)

Dead leaves

Figure: From top to bottom: A 256 by 256 pixel crop of a texture [2], a sample generated by an MCGSM trained on the full texture [7], and a sample generated by RIDE (highlight D104, D34).


Spatial LSTMs

Experimental Results (cont.)

Texture synthesis and inpainting

Figure: The center portion of a texture (left and center) was reconstructed by sampling from the posterior distribution of RIDE (right).


Spatial LSTMs

Summary

Pros

RIDE: a deep but tractable recurrent image model based on spatial LSTMs

Superior performance in quantitative comparisons

Able to capture many different statistical patterns

Cons

Only works on grayscale images


PixelRNN


PixelRNN: a DNN that sequentially predicts the pixels in an image along the two spatial dimensions, similarly to RIDE

Novelty:

fast two dimensional recurrent layers

residual connections in RNN

handle RGB images

Advantages:

scale to images of arbitrary size

get better log-likelihood scores on natural images

generate more crisp, varied and globally coherent samples


PixelRNN

Model structure

For a single-band image:

$$p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1,\cdots,x_{i-1}) \tag{11}$$

For RGB images:

$$p(x_i \mid x_{<i}) = p(x_{i,R} \mid x_{<i})\, p(x_{i,G} \mid x_{<i}, x_{i,R})\, p(x_{i,B} \mid x_{<i}, x_{i,R}, x_{i,G}) \tag{12}$$

Pixels as Discrete Variables:

Each channel variable $x_{i,*}$ simply takes one of 256 distinct values.
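The per-pixel factorization in Eq. (12) with discrete channel values can be sketched as three chained softmax distributions. The function `toy_channel_logits` below is a hypothetical stand-in for the network output: it ignores the earlier pixels and conditions only on the channels of the current pixel that have already been observed. The normalization check uses 4 intensity levels instead of 256 so that all colour triples can be enumerated.

```python
import itertools
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def toy_channel_logits(earlier_channels, levels):
    """Hypothetical network output: logits over `levels` intensity values."""
    values = np.arange(levels, dtype=float)
    if len(earlier_channels):
        centre = np.mean(earlier_channels)   # G sees R; B sees R and G
    else:
        centre = (levels - 1) / 2.0          # R sees no channel of this pixel
    return -0.5 * ((values - centre) / (0.1 * levels)) ** 2

def pixel_log_prob(rgb, levels=256):
    """log p(x_i | x_<i) as the sum of the R, G and B terms of Eq. (12)."""
    total = 0.0
    for k, value in enumerate(rgb):
        logits = toy_channel_logits(rgb[:k], levels)
        total += log_softmax(logits)[value]
    return total

levels = 4
total_prob = sum(np.exp(pixel_log_prob(rgb, levels))
                 for rgb in itertools.product(range(levels), repeat=3))
example = pixel_log_prob((200, 180, 90))     # 256-way softmax per channel
print(total_prob, example)
```

Treating each channel as a 256-way categorical variable lets the model represent arbitrary, including multimodal, distributions over intensities without assuming a particular shape.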


PixelRNN

Model structure (cont.)

Row LSTM & Diagonal BiLSTM


PixelRNN

Model structure (cont.)

Residual Connections

Figure: Left: PixelCNN Right: PixelRNN


PixelRNN

Experiment Results

Residual Connections

MNIST


PixelRNN

Experiment Results

MNIST


PixelRNN

Experiment Results

Generate Images (32 × 32)

(a) CIFAR-10 (b) ImageNet


PixelRNN

Experiment Results

ImageNet (64 × 64)

(c) normal model (d) multi-scale model


PixelRNN

Experiment Results

Image Completions (32 × 32)


PixelCNN


Target: generating images conditional on any vector, e.g., labels/tags/embeddings

Advantages:

generate diverse, realistic scenes representing distinct animals,objects, landscapes and structures

serve as a powerful decoder

improve the log-likelihood of PixelRNN on ImageNet

much faster to train than PixelRNN


PixelCNN

Visualization of PixelCNN

Figure: Left: visualization of PixelCNN. Middle: masked convolution filter. Right: blind spot in the receptive field.
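The masked filter and the blind spot named in the caption can be reproduced with a small sketch. It assumes a single-channel image, 3 x 3 kernels, and the common convention that the first layer uses a mask that hides the centre pixel (type "A") while later layers keep it (type "B"); the helper names are illustrative. `receptive_field` walks back through the stacked layers and marks which input pixels can influence the centre output.

```python
import numpy as np

def causal_mask(kernel_size, mask_type):
    """True where a kernel weight is kept: rows above, and left of the centre."""
    mask = np.zeros((kernel_size, kernel_size), dtype=bool)
    centre = kernel_size // 2
    mask[:centre, :] = True
    mask[centre, :centre] = True
    if mask_type == "B":            # later layers may also see the centre
        mask[centre, centre] = True
    return mask

def receptive_field(size, kernel_size, layers):
    """Input pixels that can reach the centre output through stacked masks."""
    centre, half = size // 2, kernel_size // 2
    field = np.zeros((size, size), dtype=bool)
    field[centre, centre] = True
    for layer in range(layers):
        mask = causal_mask(kernel_size, "A" if layer == 0 else "B")
        grown = np.zeros_like(field)
        for r, c in zip(*np.nonzero(field)):
            for dr, dc in zip(*np.nonzero(mask)):
                rr, cc = r + dr - half, c + dc - half
                if 0 <= rr < size and 0 <= cc < size:
                    grown[rr, cc] = True
        field = grown
    return field

field = receptive_field(size=9, kernel_size=3, layers=3)
print(field.astype(int))
```

In the printed grid the pixel one row up and two columns to the right of the centre stays 0 even though it precedes the centre in raster order. Pixels like this form the blind spot, which the Gated PixelCNN removes by combining a vertical and a horizontal stack.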


PixelCNN

Gated Convolutional Layers

Joint distribution of pixels over an image:

$$p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1,\cdots,x_{i-1}) \tag{13}$$

Replace ReLUs between masked convolutions with the gated activation unit:

$$y = \tanh(W_{k,f} \ast x) \odot \sigma(W_{k,g} \ast x) \tag{14}$$
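A minimal sketch of the gated activation unit in Eq. (14), assuming a single-channel feature map, 3 x 3 kernels and a type "A" mask; `masked_conv2d` is a naive loop-based cross-correlation written for clarity rather than speed, and none of the names come from the paper's code. The final lines perturb a pixel and everything after it in raster order to show that earlier outputs are unaffected.

```python
import numpy as np

def masked_conv2d(x, weight, mask):
    """'Same'-size cross-correlation of x with a masked k x k kernel."""
    k = weight.shape[0]
    padded = np.pad(x, k // 2)
    w = weight * mask
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(w * padded[i:i + k, j:j + k])
    return out

def gated_unit(x, w_f, w_g, mask):
    """y = tanh(W_f * x) * sigmoid(W_g * x), both convolutions masked."""
    f = masked_conv2d(x, w_f, mask)
    g = masked_conv2d(x, w_g, mask)
    return np.tanh(f) / (1.0 + np.exp(-g))

# Type "A" mask: keep the rows above and the pixels left of the centre.
mask = np.ones((3, 3))
mask[1, 1:] = 0.0
mask[2, :] = 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))
w_f, w_g = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
y = gated_unit(x, w_f, w_g, mask)

x_changed = x.copy()
x_changed[2, 2:] += 1.0   # pixel (2, 2) and the rest of its row
x_changed[3:] += 1.0      # all later rows
y_changed = gated_unit(x_changed, w_f, w_g, mask)
causal = np.allclose(y[:2], y_changed[:2]) and np.allclose(y[2, :3], y_changed[2, :3])
print(causal)
```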


PixelCNN

Conditional PixelCNN

Distribution of conditional PixelCNN:

$$p(x \mid h) = \prod_{i=1}^{n^2} p(x_i \mid x_1,\cdots,x_{i-1},h) \tag{15}$$

• h is location-independent

$$y = \tanh(W_{k,f} \ast x + V_{k,f}^{T} h) \odot \sigma(W_{k,g} \ast x + V_{k,g}^{T} h) \tag{16}$$

• h is location-dependent

$$y = \tanh(W_{k,f} \ast x + V_{k,f} \ast s) \odot \sigma(W_{k,g} \ast x + V_{k,g} \ast s) \tag{17}$$

where s = m(h) is a spatial representation with a deconvolutional neural network m().
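The location-independent case, Eq. (16), amounts to adding one bias per feature map that is computed from h. The sketch below assumes the masked convolution outputs are already available as arrays of shape (P, H, W) and that h is a D-dimensional vector; `conditional_gate` and the shapes are illustrative assumptions. With a one-hot h the bias is simply one row of V, i.e. a class-dependent bias.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conditional_gate(conv_f, conv_g, h, v_f, v_g):
    """Eq. (16): the same bias V^T h is added at every spatial location.

    conv_f, conv_g: (P, H, W) outputs of the masked convolutions.
    h: (D,) conditioning vector; v_f, v_g: (D, P) projection matrices.
    """
    bias_f = (v_f.T @ h)[:, None, None]
    bias_g = (v_g.T @ h)[:, None, None]
    return np.tanh(conv_f + bias_f) * sigmoid(conv_g + bias_g)

rng = np.random.default_rng(0)
P, H, W, D = 2, 3, 3, 4
conv_f = rng.normal(size=(P, H, W))
conv_g = rng.normal(size=(P, H, W))
v_f = rng.normal(size=(D, P))
v_g = rng.normal(size=(D, P))

h = np.zeros(D)
h[2] = 1.0                                  # one-hot label for class 2
y = conditional_gate(conv_f, conv_g, h, v_f, v_g)
y_uncond = conditional_gate(conv_f, conv_g, np.zeros(D), v_f, v_g)
print(y.shape)
```

Setting h to zero recovers the unconditional gate of Eq. (14). The location-dependent case of Eq. (17) would replace the broadcast bias with a feature map computed from s.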


PixelCNN

Gated PixelCNN

Figure: A single layer in the Gated PixelCNN architecture.


PixelCNN

Experiment Results

Unconditional Modeling with Gated PixelCNN

(a) CIFAR-10 (b) ImageNet


PixelCNN

Experiment Results (Cont’)

Conditioning on ImageNet Classes


PixelCNN

Experiment Results (Cont’)

Conditioning on Portrait Embeddings


PixelCNN

Experiment Results (Cont’)

PixelCNN Auto Encoder


PixelCNN

Summary

Pros

provides a way to calculate likelihood

training more stable than GANs

works for both discrete and continuous data

faster than PixelRNN

Cons

assumes a fixed order of generation: top to bottom, left to right

slow in sampling as in PixelRNN

slow in training (though faster than PixelRNN): PixelCNN++ (from OpenAI) converges in 5 days on 8 Titan GPUs on the CIFAR dataset.


PixelCNN

Take-home Message

Both PixelRNN and PixelCNN can

Quantitative: explicitly compute the likelihood p(x), and achieve high likelihood values

Qualitative: generate visually good image samples

and

PixelRNN: slow to train and slow to generate

PixelCNN: fast to train and slow to generate

Other interesting works: PixelVAE, PixelCNN++, PixelGAN, ...
