Disentangling Content and Pose with an Adversarial loss Emily Denton CVPR GAN Tutorial June 2018
efrosgans.eecs.berkeley.edu/CVPR18_slides/Disentangling... · 2018-06-23

Disentangling Content and Pose with an Adversarial loss

Emily Denton

CVPR GAN Tutorial, June 2018

Generative adversarial network framework: [Figure: z → Generator → x → Discriminator (adversarial objective)]

Adversarial losses to shape representations: [Figure: x → Encoder network, feeding a Task network (task objective, e.g. classification, reconstruction, etc.) and a Discriminator (adversarial objective)]

Part I: Disentangling content and pose with an adversarial loss (Denton and Birodkar. Unsupervised Learning of Disentangled Representations from Video. NIPS, 2017)

Part II: Survey of adversarial losses in feature space

Disentangled Representation Net (DrNet)

Disentangling auto-encoder that factorizes image sequences into temporally constant (content) and temporally varying (pose) components

Time-invariant information: lighting, background, identity, clothing

Time-varying information: pose of body

DrNet: two separate encoders

● Content encoder: time-invariant information (lighting/background, identity/clothing)

● Pose encoder: time-varying information (pose of body)

DrNet: training

● Reconstruction loss drives training

● Similarity loss makes content vectors invariant across time

● Adversarial loss forces pose vectors to contain only information that changes across time

[Figure: content encoder and pose encoder outputs feed a frame decoder, trained with L_reconstruction]

Don’t want the pose vector to encode anything constant across time

Content vector should contain anything predictable from a past frame
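A minimal sketch of the reconstruction term, assuming (as the slide suggests) that the content vector is taken from a past frame t and the pose vector from the frame t+k being reconstructed. The encoders, decoder and list-based "frames" below are hypothetical toys, not DrNet's convolutional networks; they only show why anything that changes over time cannot hide in the content vector.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reconstruction_loss(content_encoder, pose_encoder, decoder, frame_t, frame_tk):
    """Rebuild frame t+k from the content of past frame t and the pose of frame t+k."""
    h_c = content_encoder(frame_t)   # content comes from the PAST frame
    h_p = pose_encoder(frame_tk)     # pose comes from the frame being reconstructed
    return squared_l2(decoder(h_c, h_p), frame_tk)


# Hypothetical toy model: a "frame" is a flat list whose first two entries are
# static content and whose last two entries are pose.
def toy_content_encoder(frame):
    return frame[:2]


def toy_pose_encoder(frame):
    return frame[2:]


def toy_decoder(h_c, h_p):
    return h_c + h_p  # list concatenation


# Content unchanged between the two frames: perfect reconstruction.
print(reconstruction_loss(toy_content_encoder, toy_pose_encoder, toy_decoder,
                          [1, 2, 0, 0], [1, 2, 5, 6]))  # 0
# Content changed: the past frame's content code cannot explain frame t+k.
print(reconstruction_loss(toy_content_encoder, toy_pose_encoder, toy_decoder,
                          [1, 2, 0, 0], [1, 3, 5, 6]))  # 1
```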

Content vectors should be invariant across time

L_similarity: L2 similarity loss on temporally nearby content vectors
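The similarity term is simple enough to write out directly. A sketch, assuming content vectors are plain lists of floats and that "temporally nearby" means a fixed frame offset k (a hypothetical parameter here):

```python
def similarity_loss(h_c_t, h_c_tk):
    """Squared L2 distance between the content vectors of two nearby frames."""
    return sum((a - b) ** 2 for a, b in zip(h_c_t, h_c_tk))


def clip_similarity_loss(content_vectors, k=1):
    """Average similarity loss over all frame pairs (t, t+k) in one clip."""
    pairs = [(content_vectors[t], content_vectors[t + k])
             for t in range(len(content_vectors) - k)]
    return sum(similarity_loss(a, b) for a, b in pairs) / len(pairs)


# Content drifts between the first two frames, then stays put.
print(clip_similarity_loss([[0.0], [1.0], [1.0]]))  # 0.5
```

On its own this term would be trivially satisfied by a constant content code, which is why it is paired with the reconstruction loss.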

Should not be able to distinguish which video clip a pose vector comes from

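[Figure: pose vectors from two frames are passed to a scene discriminator trained with L_BCE: target 1 (same scene) when both frames come from the same video, target 0 (different scene) when they come from different videos]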

[Figure: same pairs and targets; the pose encoder is held fixed while the scene discriminator is trained with L_BCE]
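The scene-discriminator step can be sketched as two binary cross-entropy terms. This assumes the discriminator outputs a probability that two pose vectors come from the same video; the function names are hypothetical, and in a real framework only the discriminator's parameters would be updated at this step while the pose encoder stays frozen.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy of predicted probability p against a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def scene_discriminator_loss(p_same, p_diff):
    """BCE loss for the scene discriminator while the pose encoder is held fixed.

    p_same: discriminator output for two pose vectors from the SAME video (target 1).
    p_diff: discriminator output for two pose vectors from DIFFERENT videos (target 0).
    """
    return bce(p_same, 1.0) + bce(p_diff, 0.0)


print(round(scene_discriminator_loss(0.9, 0.1), 4))  # 0.2107
print(round(scene_discriminator_loss(0.5, 0.5), 4))  # 1.3863
```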

[Figure: pose vectors from two frames of the same video are passed to the scene discriminator with target 1/2 (maximal uncertainty), giving L_adversarial]

Scene discriminator held fixed; it is only used to compute gradients for the pose encoder

Train the pose encoder to produce pose vectors that make the discriminator maximally uncertain about the content of the video
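The pose encoder's adversarial term is a cross-entropy against the target 1/2. A sketch, assuming p_same is the fixed discriminator's output for a same-video pair (the function name is hypothetical); the loss bottoms out at log 2 exactly when the discriminator is maximally uncertain.

```python
import math


def adversarial_loss(p_same, eps=1e-12):
    """Pose-encoder loss against a FIXED scene discriminator (target 1/2).

    p_same: discriminator output for two pose vectors from the same video.
    In a real framework the discriminator's weights are frozen here and only
    the pose encoder receives gradients.
    """
    p = min(max(p_same, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(0.5 * math.log(p) + 0.5 * math.log(1.0 - p))


for p in (0.1, 0.5, 0.9):
    print(p, round(adversarial_loss(p), 4))
# 0.1 1.204
# 0.5 0.6931
# 0.9 1.204
```

Because the target is 1/2 rather than 0, the pose encoder is pushed toward making the discriminator uncertain, not toward making it confidently wrong.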

[Figure: full DrNet training setup: content encoder and pose encoder feed the frame decoder (L_reconstruction); L_similarity on content vectors; scene discriminator on pose vectors with target 1/2, maximal uncertainty (L_adversarial)]
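The three losses in the diagram can be collected into one objective for the encoders and decoder. This is a sketch in our own notation, assuming E_c, E_p and D denote the content encoder, pose encoder and frame decoder, C the scene discriminator, x^t and x^{t+k} two frames of the same video, and α, β hypothetical weighting coefficients; the scene discriminator itself is trained separately with its own L_BCE.

```latex
\begin{aligned}
\mathcal{L}_{\text{reconstruction}} &= \big\lVert D\big(E_c(x^t), E_p(x^{t+k})\big) - x^{t+k} \big\rVert_2^2 \\
\mathcal{L}_{\text{similarity}} &= \big\lVert E_c(x^t) - E_c(x^{t+k}) \big\rVert_2^2 \\
\mathcal{L}_{\text{adversarial}} &= -\tfrac{1}{2}\log C\big(E_p(x^t), E_p(x^{t+k})\big) - \tfrac{1}{2}\log\Big(1 - C\big(E_p(x^t), E_p(x^{t+k})\big)\Big) \\
\mathcal{L} &= \mathcal{L}_{\text{reconstruction}} + \alpha\,\mathcal{L}_{\text{similarity}} + \beta\,\mathcal{L}_{\text{adversarial}}
\end{aligned}
```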

SUNCG dataset: rotating objects

S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene comp

● 280 chair models, 5 elevations, large variability

● Video sequence: camera rotates around chair

Image synthesis by analogy

Can transfer content from one image and pose from another to synthesize a new image

[Figure: content image → content encoder, pose image → pose encoder, both vectors → frame decoder]
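Analogy synthesis reuses the trained networks with no further training. A minimal sketch, with hypothetical dictionary "images" and toy encoders standing in for the real networks:

```python
def synthesize_by_analogy(content_image, pose_image, content_encoder, pose_encoder, decoder):
    """Combine the content of one image with the pose of another."""
    h_c = content_encoder(content_image)  # e.g. chair model, colour, identity
    h_p = pose_encoder(pose_image)        # e.g. viewing angle, body pose
    return decoder(h_c, h_p)


# Hypothetical toy "images": each is a dict holding its content and pose explicitly.
def toy_content_encoder(image):
    return image["content"]


def toy_pose_encoder(image):
    return image["pose"]


def toy_decoder(h_c, h_p):
    return {"content": h_c, "pose": h_p}


red_chair_front = {"content": "red chair", "pose": 0}
blue_chair_side = {"content": "blue chair", "pose": 90}
print(synthesize_by_analogy(red_chair_front, blue_chair_side,
                            toy_content_encoder, toy_pose_encoder, toy_decoder))
# {'content': 'red chair', 'pose': 90}
```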

Interpolation in pose space
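One simple way to traverse pose space is linear interpolation between two pose vectors, decoding each intermediate vector with a single fixed content vector. Linear interpolation is our assumption here; the slide does not say which scheme was used, and the function name is hypothetical.

```python
def interpolate_pose(h_p_start, h_p_end, steps):
    """Return `steps` evenly spaced pose vectors from h_p_start to h_p_end (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        alpha = i / (steps - 1)  # 0.0 at the start, 1.0 at the end
        path.append([(1 - alpha) * a + alpha * b for a, b in zip(h_p_start, h_p_end)])
    return path


print(interpolate_pose([0.0, 0.0], [1.0, 2.0], 3))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]
```

Each vector in the path would then be passed to the frame decoder together with the same content vector, so only the pose changes across the decoded frames.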

Video prediction

● A representation that factorizes into temporally constant and temporally varying components is particularly useful for video prediction

● Instead of modeling how the entire scene changes, only need to predict the temporally varying component

● Prediction done entirely in latent pose space

[Figure: each frame is mapped to a pose vector h_1, h_2, h_3, ..., h_{t-1}, h_t]

Train LSTM to predict future pose vectors

[Figure: unrolled LSTM: h_1 → h̃_2, h_2 → h̃_3, h_3 → h̃_4]

Don’t have to worry about content vectors: they are fixed across time by design
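The training signal for the predictor can be sketched as a one-step regression on ground-truth pose vectors; note that no content vector appears. This assumes the pose vectors come from an already-trained pose encoder, and it swaps the LSTM for a hypothetical constant-velocity predictor so the example stays self-contained.

```python
def next_pose_loss(predict_next, poses):
    """Mean squared error of one-step pose predictions on a ground-truth sequence.

    predict_next(history) returns a predicted next pose vector (stand-in for the LSTM).
    poses: ground-truth pose vectors h_1 ... h_T from the pose encoder.
    """
    total = 0.0
    for t in range(1, len(poses)):
        predicted = predict_next(poses[:t])  # input: ground-truth h_1 ... h_t
        total += sum((p - q) ** 2 for p, q in zip(predicted, poses[t]))
    return total / (len(poses) - 1)


def constant_velocity(history):
    """Hypothetical stand-in for the LSTM: repeat the last pose change."""
    if len(history) < 2:
        return list(history[-1])  # no velocity yet, so predict "no motion"
    return [2 * b - a for a, b in zip(history[-2], history[-1])]


# Pose moving at a steady speed: only the very first prediction is wrong.
print(next_pose_loss(constant_velocity, [[0.0], [1.0], [2.0], [3.0]]))  # 0.3333333333333333
```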

Test time: generating a video sequence

[Figure: LSTM(h_{t-1}) → h̃_t, LSTM(h̃_t) → h̃_{t+1}, LSTM(h̃_{t+1}) → h̃_{t+2}]

Content vector from any past frame

Feed predicted pose vectors back into model
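Test-time generation can be sketched as a loop in pose space. This assumes hypothetical toy encoders, a toy decoder and a hand-written next-pose predictor standing in for the trained LSTM; the point is the data flow: one fixed content vector, predicted pose vectors fed back, and decoded frames used only as output.

```python
def generate_video(frames, n_future, content_encoder, pose_encoder, decoder, predict_next):
    """Generate n_future frames after the conditioning frames, working in pose space."""
    h_c = content_encoder(frames[-1])          # content from a past frame, fixed for the rollout
    poses = [pose_encoder(f) for f in frames]  # pose vectors of the conditioning frames
    generated = []
    for _ in range(n_future):
        h_next = predict_next(poses)            # stand-in for one LSTM step
        poses.append(h_next)                    # feed the predicted pose vector back in
        generated.append(decoder(h_c, h_next))  # pixels are output only, never re-encoded
    return generated


# Hypothetical toys: a frame is [content, pose]; the predictor extrapolates linearly.
def toy_content_encoder(frame):
    return frame[:1]


def toy_pose_encoder(frame):
    return frame[1:]


def toy_decoder(h_c, h_p):
    return h_c + h_p


def linear_extrapolation(history):
    return [2 * b - a for a, b in zip(history[-2], history[-1])]


print(generate_video([[7, 0], [7, 1]], 3, toy_content_encoder, toy_pose_encoder,
                     toy_decoder, linear_extrapolation))
# [[7, 2], [7, 3], [7, 4]]
```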

Decoder maps back to pixels: [Figure: LSTM chain h_{t-1} → h̃_t → h̃_{t+1}, with each predicted pose vector decoded into a frame]

DrNet video prediction takeaways:

● Prediction done entirely in latent pose space

○ Generated images are never fed recursively back into the model

● Small errors in pixel predictions don’t propagate through time

Moving MNIST: generating forever...

● Trained model to condition on 5 frames and generate 10 frames into the future

● Can unroll model indefinitely

[Figure: green box: ground-truth input (t = 1, ..., 5); red box: generated frames (t = 6, ..., 500)]

● Content vector fixed across time, which helps deal with occlusions

● Digits colored differently so that a content/pose factorization exists

KTH dataset

● Simple dataset of real-world videos

● Six actions

● Fairly uniform backgrounds

C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 3, pages 32–36. IEEE, 2004.

Baseline: MCNet (Villegas et al. 2017)

Motion-content net separately models motion and content in video sequences

Trained with combined MSE + GAN loss

[Villegas et al. Decomposing motion and content for natural video sequence prediction. In ICLR, 2017.]

KTH video generation

[Figure: conditioning frames followed by generated frames, including results from [1]]

[1] Villegas et al. Decomposing motion and content for natural video sequence prediction. In ICLR, 2017.

KTH long-term video generation

KTH nearest neighbours

● This adversarial disentangling technique is very general

● Could apply to other datasets where weak labeling is available

○ Only need grouped data: temporal coherence of videos gives us ‘labels’ for free
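Grouped data is enough to build the scene discriminator's training pairs. A sketch, assuming a hypothetical dict that maps each video id to its frames; positives pair two frames of one video, negatives pair frames from two different videos. DrNet's actual sampling details (for example how far apart the paired frames are) may differ.

```python
import random


def sample_scene_pairs(videos, n_pairs, rng=None):
    """Sample (frame_a, frame_b, label) triples for the scene discriminator.

    videos: dict mapping a video id to its list of frames (at least two videos).
    label 1: both frames come from the same video; label 0: from two different videos.
    No annotation is needed beyond knowing which video each frame belongs to.
    """
    rng = rng or random.Random()
    ids = list(videos)
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:  # alternate positive and negative pairs
            vid = rng.choice(ids)
            pairs.append((rng.choice(videos[vid]), rng.choice(videos[vid]), 1))
        else:
            vid_a, vid_b = rng.sample(ids, 2)  # two distinct videos
            pairs.append((rng.choice(videos[vid_a]), rng.choice(videos[vid_b]), 0))
    return pairs


videos = {"a": ["a0", "a1", "a2"], "b": ["b0", "b1"], "c": ["c0", "c1"]}
for pair in sample_scene_pairs(videos, 4, random.Random(0)):
    print(pair)
```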

Part II: Survey of adversarial losses in feature space

Domain adaptation

Labelled examples from source domain, few or no labels from target domain

[Figure: source-domain and target-domain examples]

[Figure: source encoder → classifier, trained with a classification loss on labelled source-domain examples; target-domain examples have few or no labels]

[Figure: source and target encoders feed a domain discriminator (adversarial loss); the source encoder also feeds a classifier (classification loss)]

Adversarial loss can be used to learn domain-invariant features, allowing the source classifier to transfer to the target domain

Variants of the adversarial loss for the encoders:

● Gradient reversal [Ganin and Lempitsky, 2015]

● Label flip [Tzeng et al. 2017]

● Uniform target [Tzeng et al. 2015]
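The three variants differ only in how the encoder's loss is built from the domain discriminator's output. A per-example sketch for a target-domain feature, assuming d is the discriminator's probability that the feature came from the source domain; it ignores the source-domain terms and the λ scaling used with gradient reversal, and all function names are hypothetical.

```python
import math


def _log(p, eps=1e-12):
    """log(p) with p clamped away from 0 and 1."""
    return math.log(min(max(p, eps), 1.0 - eps))


# d: domain discriminator's probability that a TARGET-domain feature is from the SOURCE domain.

def discriminator_loss_on_target(d):
    """The discriminator's own BCE on a target example (true label: target)."""
    return -_log(1.0 - d)


def encoder_loss_gradient_reversal(d):
    """Gradient reversal: the encoder maximises the discriminator's loss (scale 1)."""
    return -discriminator_loss_on_target(d)


def encoder_loss_label_flip(d):
    """Label flip: train the encoder as if target features carried the source label."""
    return -_log(d)


def encoder_loss_uniform_target(d):
    """Uniform target: cross-entropy against a 50/50 distribution over the two domains."""
    return -(0.5 * _log(d) + 0.5 * _log(1.0 - d))


for d in (0.1, 0.5, 0.9):
    print(d, [round(f(d), 3) for f in (encoder_loss_gradient_reversal,
                                       encoder_loss_label_flip,
                                       encoder_loss_uniform_target)])
```

The gradient-reversal loss keeps decreasing as the discriminator becomes more wrong, whereas the label-flip loss is bounded below by 0 and the uniform-target loss by log 2, reached when the discriminator is maximally uncertain.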

Learning fair representations

[Figure: x → Encoder network, feeding a Task network (predict label) and a Discriminator (predict sensitive attribute)]

● Closely related to the problem of domain adaptation

○ source/target domains vs. demographic groups

● Different formulations of adversarial objectives achieve different notions of fairness

○ Edwards & Storkey, 2016

○ Beutel et al. 2017

○ Zhang et al. 2018

○ Madras et al. 2018

Independent components

● Discriminate the joint distribution vs. the product of marginals: q(z_1, ..., z_n) vs. ∏_i q(z_i)

● Earlier work in the discrete-code setting by Schmidhuber (1992)

Kim and Mnih. Disentangling by Factorising. ICML, 2018
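Samples from the product of marginals can be approximated by shuffling each latent dimension independently across a batch; to our understanding this is the permutation step Kim and Mnih (2018) use. A sketch, assuming latent codes are plain lists of numbers (the function name is ours):

```python
import random


def permute_dims(batch, rng=None):
    """Turn a batch from q(z) into an approximate batch from prod_i q(z_i).

    batch: list of latent vectors (rows), all of the same length.
    Each dimension (column) is shuffled independently across the batch: every
    marginal q(z_i) is preserved, but dependencies between dimensions are broken.
    """
    rng = rng or random.Random()
    columns = [[z[i] for z in batch] for i in range(len(batch[0]))]
    for column in columns:
        rng.shuffle(column)
    return [list(row) for row in zip(*columns)]


batch = [[1, 10], [2, 20], [3, 30], [4, 40]]  # dimensions perfectly correlated
print(permute_dims(batch, random.Random(0)))
```

A discriminator trained to separate original batches (joint) from permuted ones (product of marginals) then signals how far the code is from having independent components.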

Prior distributions of generative models

Adversarial autoencoders: Match aggregate approximate posterior q(z) [Makhzani et al. 2016]

Adversarial variational Bayes: Match approximate posterior q(z|x) [Mescheder et al. 2017]

Adversarial feature learning: GAN loss in image space and latent space [Dumoulin et al. 2017; Donahue et al. 2017]

References

Beutel et al. Data decisions and theoretical implications when adversarially learning fair representations. arXiv:1707.00075, 2017.

Denton and Birodkar. Unsupervised Learning of Disentangled Representations from Video. NIPS, 2017.

Donahue et al. Adversarial Feature Learning. ICLR, 2017.

Dumoulin et al. Adversarially Learned Inference. ICLR, 2017.

Edwards & Storkey. Censoring Representations with an Adversary. ICLR, 2016.

Ganin and Lempitsky. Unsupervised domain adaptation by backpropagation. ICML, 2015.

Kim and Mnih. Disentangling by Factorising. ICML, 2018.

Madras et al. Learning Adversarially Fair and Transferable Representations. ICML, 2018.

Makhzani et al. Adversarial Autoencoders. ICLR Workshop, 2016.

Mescheder et al. Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks. ICML, 2017.

Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 1992.

Tzeng et al. Simultaneous deep transfer across domains and tasks. ICCV, 2015.

Tzeng et al. Adversarial discriminative domain adaptation. CVPR, 2017.

Villegas et al. Decomposing motion and content for natural video sequence prediction. ICLR, 2017.

Zhang et al. Mitigating Unwanted Biases with Adversarial Learning. AIES, 2018.

Thanks!