Page 1: Variants of GANs - Jaejun Yoo

VARIANTS OF GANs

Jaejun Yoo, Ph.D. Candidate @KAIST

13th May, 2017

Understood from the perspective of a novice graduate student

Page 2: Variants of GANs - Jaejun Yoo

Hello, I am

Jaejun Yoo

- Ph.D. Candidate

- Medical Image Reconstruction, Topological Data Analysis, EEG

- http://jaejunyoo.blogspot.com/

- CVPR 2017 NTIRE Challenge: Ranked 3rd

Page 3: Variants of GANs - Jaejun Yoo

Goals of this lecture

1. A deeper understanding of GANs

2. Building a foundation for following the flow of later GAN research

• Understand the problems of the original GAN and the reasons behind them

• Introduce GAN variants, focusing on papers that either solve a major problem or propose a new direction in a broad sense

Page 4: Variants of GANs - Jaejun Yoo

BACKGROUND

Page 5: Variants of GANs - Jaejun Yoo

PREREQUISITES: Generative Models

“FACE IMAGES”

Page 6: Variants of GANs - Jaejun Yoo

PREREQUISITES: Generative Models

* Figure adopted from the BEGAN paper released on 31 Mar. 2017, David Berthelot et al., Google (link)

Generated Images by Neural Network

Page 7: Variants of GANs - Jaejun Yoo

PREREQUISITES: Generative Models

“What I cannot create, I do not understand”

Page 8: Variants of GANs - Jaejun Yoo

PREREQUISITES: Generative Models

“What I cannot create, I do not understand”

If the network can learn how to draw a cat and a dog separately, it must be able to classify them, i.e., feature learning follows naturally.

Page 9: Variants of GANs - Jaejun Yoo

PREREQUISITES: Taxonomy of Machine Learning

From Yann LeCun (NIPS 2016); from David Silver, Reinforcement Learning (UCL course on RL, 2015)

Page 10: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

y = f(x)

Page 11: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 12: Variants of GANs - Jaejun Yoo

PREREQUISITES: Taxonomy of Machine Learning

From Yann LeCun (NIPS 2016); from David Silver, Reinforcement Learning (UCL course on RL, 2015)

Page 13: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 14: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 15: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 16: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 17: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 18: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 19: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 20: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 21: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

Page 22: Variants of GANs - Jaejun Yoo

PREREQUISITES

Slide adopted from Namju Kim, Kakao brain (SlideShare, AI Forum, 2017)

* Figure adopted from NIPS 2016 Tutorial: GAN paper, Ian Goodfellow 2016

Page 23: Variants of GANs - Jaejun Yoo

GAN

Page 24: Variants of GANs - Jaejun Yoo

SCHEMATIC OVERVIEW

z

G

D

x

Real or Fake?

Diagram of Standard GAN

Gaussian noise as an input for G

Page 25: Variants of GANs - Jaejun Yoo

z

G

D

x

Real or Fake?

Diagram of Standard GAN

Counterfeiter

Police

SCHEMATIC OVERVIEW

Page 26: Variants of GANs - Jaejun Yoo

z

G

D

x

Real or Fake?

Diagram of Standard GAN

Counterfeiter

Police


SCHEMATIC OVERVIEW

Page 27: Variants of GANs - Jaejun Yoo

Diagram of Standard GAN

Data distribution / Model distribution / Discriminator

SCHEMATIC OVERVIEW

* Figure adopted from Generative Adversarial Nets, Ian Goodfellow et al. 2014

Page 28: Variants of GANs - Jaejun Yoo

Minimax problem of GAN

THEORETICAL RESULTS

Show that…

1. The minimax problem of GAN has a global optimum at $p_g = p_{data}$

2. The proposed algorithm can find that global optimum

TWO STEP APPROACH
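For reference, the minimax problem referred to here is the standard GAN objective of Goodfellow et al. (2014):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$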

Page 29: Variants of GANs - Jaejun Yoo

THEORETICAL RESULTS: Proposition 1.

Page 30: Variants of GANs - Jaejun Yoo

THEORETICAL RESULTS: Main Theorem

Page 31: Variants of GANs - Jaejun Yoo

THEORETICAL RESULTS: Convergence of the proposed algorithm

Page 32: Variants of GANs - Jaejun Yoo

SUMMARY

• Supervised / Unsupervised / Reinforcement Learning

• Generative Models

• Variational Inference Technique

• Adversarial Training

• Reduce the gap between $Q_{model}$ and $P_{data}$

TAKE HOME KEYPOINTS


Page 33: Variants of GANs - Jaejun Yoo

RELATED WORKS (APPLICATIONS)

* CycleGAN Jun-Yan Zhu et al. 2017

* SRGAN Christian Ledig et al. 2017

Super-resolution

Domain Adaptation

Img2Img Translation

Page 34: Variants of GANs - Jaejun Yoo

DIFFICULTIES

Page 35: Variants of GANs - Jaejun Yoo

DIFFICULTIES

Page 36: Variants of GANs - Jaejun Yoo

DIFFICULTIES CONVERGENCE OF THE MODEL

Page 37: Variants of GANs - Jaejun Yoo

DIFFICULTIES MODE COLLAPSE (SAMPLE DIVERSITY)

* Slide adopted from NIPS 2016 Tutorial, Ian Goodfellow

Page 38: Variants of GANs - Jaejun Yoo

DIFFICULTIES

Page 39: Variants of GANs - Jaejun Yoo

DIFFICULTIES

TEMPORARY SOLUTION

Page 40: Variants of GANs - Jaejun Yoo

DIFFICULTIES HOW TO EVALUATE THE QUALITY?

Page 41: Variants of GANs - Jaejun Yoo

DIFFICULTIES HOW TO EVALUATE THE QUALITY?

Page 42: Variants of GANs - Jaejun Yoo

DIFFICULTIES HOW TO EVALUATE THE QUALITY?

TEMPORARY SOLUTION

Page 43: Variants of GANs - Jaejun Yoo

SUMMARY

“TRAINING GAN IS HARD”

• Power balance (NO learning)

• Convergence (oscillation)

• Mode collapse

• Evaluation (GAN training loss is intractable)

Page 44: Variants of GANs - Jaejun Yoo

SUMMARY

“TRAINING GAN IS HARD”

• Power balance (NO learning)

• Convergence (oscillation)

• Mode collapse

• Evaluation (GAN training loss is intractable)

HOW TO SOLVE THESE PROBLEMS?

Page 45: Variants of GANs - Jaejun Yoo

DCGAN: LET’S CAREFULLY SELECT THE ARCHITECTURE!

Page 46: Variants of GANs - Jaejun Yoo

MOTIVATION

TRAINING IS TOOOO HARD…

(Ahhh…it just does not work…orz…)

Page 47: Variants of GANs - Jaejun Yoo

SCHEMATIC OVERVIEW

Guideline for stable learning

Page 48: Variants of GANs - Jaejun Yoo

SCHEMATIC OVERVIEW

Guideline for stable learning

“However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for higher resolution and deeper generative models.”

Page 49: Variants of GANs - Jaejun Yoo

SCHEMATIC OVERVIEW

Guideline for stable learning

“However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for higher resolution and deeper generative models.”

"Most GANs today are at least loosely based on the DCGAN architecture."

- NIPS 2016 Tutorial by Ian Goodfellow
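As a concrete illustration of this guideline (an all-convolutional generator built from transposed convolutions, batch normalization, ReLU activations and a Tanh output), here is a minimal PyTorch-style sketch of a 64×64 generator. The latent size, channel widths, and class name are assumptions for illustration, not the exact architecture from the DCGAN paper.

```python
import torch.nn as nn

# Minimal DCGAN-style generator sketch (64x64 RGB output).
# Assumptions: latent dimension nz=100, base channel width ngf=64.
class DCGANGenerator(nn.Module):
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),      # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # 8x8
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # 16x16
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # 32x32
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),           # 64x64
            nn.Tanh(),  # image values in [-1, 1]
        )

    def forward(self, z):  # z has shape (N, nz, 1, 1)
        return self.net(z)
```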

Page 50: Variants of GANs - Jaejun Yoo

Okay, learning is finished and the model converged. Then…

KEYPOINTS

“How to show that our network or generator learned A MEANINGFUL FUNCTION?”

Page 51: Variants of GANs - Jaejun Yoo

KEYPOINTS

• The generator DOES NOT MEMORIZE the images.

• There are NO SHARP TRANSITIONS while walking in the latent space.

• The generator UNDERSTANDS the features of the data.

Okay, learning is finished and the model converged. Then…

Show that

Page 52: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)

What can GAN do?

Page 53: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)

What can GAN do?

“Walking in the latent space”

z-space

Page 54: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)

What can GAN do?

Page 55: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)

What can GAN do?

“Forgetting the feature it learned”

Page 56: Variants of GANs - Jaejun Yoo

RESULTS

What can GAN do?

“Vector arithmetic” (e.g. word2vec)

Page 57: Variants of GANs - Jaejun Yoo

RESULTS

What can GAN do?

“Vector arithmetic” (e.g. word2vec)

Page 58: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)

What can GAN do?

“Vector arithmetic” (e.g. word2vec)
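A hedged sketch of what the vector arithmetic means here, assuming a trained generator G and latent vectors averaged over a few exemplars of each visual concept (the names and random placeholder values are illustrative only):

```python
import torch

# Placeholder averaged latent vectors for three visual concepts; in practice each
# would be the mean of the z's of several generated exemplars showing that concept.
z_smiling_woman = torch.randn(100)
z_neutral_woman = torch.randn(100)
z_neutral_man = torch.randn(100)

# "smiling woman" - "neutral woman" + "neutral man"  ~  "smiling man"
z_result = z_smiling_woman - z_neutral_woman + z_neutral_man
# image = G(z_result.view(1, 100, 1, 1))  # decode the new point with the trained generator
```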

Page 59: Variants of GANs - Jaejun Yoo

RESULTS

Neural network understanding “Rotation”

* Figure adopted from DCGAN, Alec Radford et al. 2016 (link)

What can GAN do? “Understand the meaning of the data”

(e.g. code: rotation, category, etc.)

Page 60: Variants of GANs - Jaejun Yoo

SUMMARY

1. Guideline for stable learning

2. Good analysis on the results

• Show that the generator DOES NOT MEMORIZE the images

• Show that there are NO SHARP TRANSITIONS while walking in the latent space

• Show that the generator UNDERSTANDS the features of the data

Page 61: Variants of GANs - Jaejun Yoo

Unrolled GAN: LET’S GIVE EXTRA INFORMATION TO THE NETWORK

(allow it to ‘see into the future’)

Page 62: Variants of GANs - Jaejun Yoo

Convergence of the proposed algorithm

MOTIVATION

Impossible to achieve in practice

Page 63: Variants of GANs - Jaejun Yoo

MOTIVATION

WHAT HAPPENS?

* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016

Page 64: Variants of GANs - Jaejun Yoo

MOTIVATION

WHAT HAPPENS?

* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016

“The GAN procedure normally does not cover the whole distribution, even when targeting a mode-covering divergence such as KL.”

Page 65: Variants of GANs - Jaejun Yoo

MOTIVATION

WHAT HAPPENS?

$G^* = \min_G \max_D V(G, D)$  VS  $G^* = \max_D \min_G V(G, D)$

Page 66: Variants of GANs - Jaejun Yoo

MOTIVATION

WHAT HAPPENS?

$G^* = \min_G \max_D V(G, D)$  VS  $G^* = \max_D \min_G V(G, D)$

* With $\max_D \min_G$, the generator can collapse to a single output which can fool the current discriminator most

Page 67: Variants of GANs - Jaejun Yoo

SCHEMATIC OVERVIEW

* https://www.youtube.com/watch?v=JmON4S0kl04

“Adversarial games are not guaranteed to converge using gradient descent, e.g. rock, paper, scissors.”

Page 68: Variants of GANs - Jaejun Yoo

UNROLLED GAN

* Figure adopted from Unrolled GAN, Luke Metz et al. 2016

Let’s “UNROLL” the discriminator to see several modes!

Page 69: Variants of GANs - Jaejun Yoo

UNROLLED GAN

* Figure adopted from Unrolled GAN, Luke Metz et al. 2016

Page 70: Variants of GANs - Jaejun Yoo

UNROLLED GAN

* Figure adopted from Unrolled GAN, Luke Metz et al. 2016

* Here, only the “G” part is unrolled (∵ in practice, “D” usually over-powers “G”)

Page 71: Variants of GANs - Jaejun Yoo

UNROLLED GAN

The Missing Gradient Term

“How the discriminator would react to a change in the generator.”

(Let’s think about both cases.) Trade-off between

Page 72: Variants of GANs - Jaejun Yoo

RESULTS

Increased stability in terms of power balance

* Figure adopted from Unrolled GAN, Luke Metz et al. 2016

Page 73: Variants of GANs - Jaejun Yoo

SUMMARY

1. Address the mode collapsing problem

2. Unrolling the optimization problem

• Make the discriminator as optimal as possible

Page 74: Variants of GANs - Jaejun Yoo

IMPLEMENTATION

* Codes from the jupyter notebook of Ben Poole (2nd Author): https://github.com/poolio/unrolled_gan
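A simplified PyTorch-style sketch of the core idea: the generator is scored against a discriminator that has been unrolled for k extra update steps. The network shapes, the SGD optimizer, and the choice to drop the second-order term that backpropagates through the unrolled updates are assumptions of this sketch, not Ben Poole's implementation.

```python
import copy
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def unrolled_generator_loss(G, D, z, real_batch, k=5, d_lr=1e-3):
    """Generator loss measured against a discriminator unrolled k steps ahead."""
    # 1. Copy D and take k discriminator updates on the copy only.
    D_k = copy.deepcopy(D)
    d_opt = torch.optim.SGD(D_k.parameters(), lr=d_lr)
    real_labels = torch.ones(real_batch.size(0), 1)
    fake_labels = torch.zeros(z.size(0), 1)
    for _ in range(k):
        d_opt.zero_grad()
        d_loss = bce(D_k(real_batch), real_labels) + bce(D_k(G(z).detach()), fake_labels)
        d_loss.backward()
        d_opt.step()
    # 2. Score the generator against the unrolled discriminator, so G "sees into
    #    the future" reaction of D.  The full method additionally differentiates
    #    through the k updates (the "missing gradient term"); this sketch drops
    #    that second-order term for brevity.
    return bce(D_k(G(z)), torch.ones(z.size(0), 1))
```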

Page 75: Variants of GANs - Jaejun Yoo

IMPLEMENTATION

* Codes from the jupyter notebook of Ben Poole (2nd Author): https://github.com/poolio/unrolled_gan

Page 76: Variants of GANs - Jaejun Yoo

IMPLEMENTATION

* Codes from the jupyter notebook of Ben Poole (2nd Author): https://github.com/poolio/unrolled_gan

Page 77: Variants of GANs - Jaejun Yoo

InfoGAN: LET’S USE ADDITIONAL CONSTRAINTS FOR THE GENERATOR

Page 78: Variants of GANs - Jaejun Yoo

MOTIVATION

“We want to get a disentangled representation space EXPLICITLY.”

Neural network understanding “Rotation”

* Figure adopted from DCGAN paper (link)

Page 79: Variants of GANs - Jaejun Yoo

MOTIVATION

“We want to get a disentangled representation space EXPLICITLY.”

Neural network understanding “Digit Type”

* Figure adopted from infoGAN paper (link)

Code

Page 81: Variants of GANs - Jaejun Yoo

• While the generator learns data representations, infoGAN imposes an extra constraint to make the network learn the feature space in a disentangled way.

• Unlike the standard GAN, the generator takes a pair of variables as input:

1. Gaussian noise z (source of incompressible noise)

2. latent code c (semantic feature of data distribution)

infoGAN

SCHEMATIC OVERVIEW

Page 82: Variants of GANs - Jaejun Yoo

z

G

D

x

Real or Fake?

Diagram of Standard GAN

SCHEMATIC OVERVIEW

Page 83: Variants of GANs - Jaejun Yoo

c

z

G

D

x

Real or Fake?

add an extra “code” variable

Diagram of infoGAN

1. Gaussian noise z (source of incompressible noise)

2. latent code c (semantic feature of data distribution)

SCHEMATIC OVERVIEW

Page 84: Variants of GANs - Jaejun Yoo

c

z

G

D

x

Real or Fake?

add an extra “code” variable

Diagram of infoGAN

1. Gaussian noise z (source of incompressible noise)

2. latent code c (semantic feature of data distribution)

$c \sim \mathrm{Cat}(K = 10,\ p = 0.1)$

(Figure: a uniform categorical code over the ten digits 0–9, each with probability 1/10)

SCHEMATIC OVERVIEW

Page 85: Variants of GANs - Jaejun Yoo

c

z

G

D

x

Real or Fake?

add an extra “code” variable

Diagram of infoGAN

1. Gaussian noise z (source of incompressible noise)

2. latent code c (semantic feature of data distribution)


SCHEMATIC OVERVIEW

Page 86: Variants of GANs - Jaejun Yoo

c

z

G

D

x

I

Real or Fake?

Mutual Info. infoGAN: maximize I(c,G(z,c))

Diagram of infoGAN: impose an extra constraint to learn a disentangled feature space

SCHEMATIC OVERVIEW

Page 87: Variants of GANs - Jaejun Yoo

“The information in the latent code c should not be lost in the generation process.”

c

z

G

D

x

I

Real or Fake?

Mutual Info. infoGAN: maximize I(c,G(z,c))

Diagram of infoGAN: impose an extra constraint to learn a disentangled feature space

SCHEMATIC OVERVIEW

Page 88: Variants of GANs - Jaejun Yoo

INFOGAN

* Figure adopted from Wikipedia “Mutual Information”

Changed Minimax problem:

Mutual Information:

∴ We need to minimize the conditional entropy $H(c \mid G(z, c))$.
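For reference, the changed minimax problem and the mutual-information identity it relies on (infoGAN, Chen et al., 2016) are

$$\min_G \max_D V_I(D, G) = V(D, G) - \lambda \, I\big(c \,;\, G(z, c)\big)$$

$$I\big(c \,;\, G(z, c)\big) = H(c) - H\big(c \mid G(z, c)\big)$$

Since $H(c)$ is fixed by the chosen code distribution, maximizing the mutual information amounts to minimizing $H(c \mid G(z, c))$.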

Page 89: Variants of GANs - Jaejun Yoo

INFOGAN

* Figure adopted from Wikipedia “Mutual Information”

Changed Minimax problem:

Mutual Information:

∴ We need to minimize the conditional entropy $H(c \mid G(z, c))$.

Uncertainty

Page 90: Variants of GANs - Jaejun Yoo

INFOGAN

* Figure adopted from Wikipedia “Mutual Information”

Changed Minimax problem:

Mutual Information:

∴ We need to minimize the conditional entropy $H(c \mid G(z, c))$.

Uncertainty

* intractable (the true posterior $P(c \mid x)$ is not available in closed form)

Page 91: Variants of GANs - Jaejun Yoo

VARIATIONAL INFORMATION MAXIMIZATION

Changed Minimax problem:

Let’s MAXIMIZE the LOWER BOUND, which is tractable!
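The tractable lower bound comes from Variational Information Maximization: with an auxiliary distribution $Q(c \mid x)$ approximating the true posterior $P(c \mid x)$,

$$I\big(c \,;\, G(z, c)\big) \;\ge\; L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c)$$

so the full infoGAN problem becomes $\min_{G, Q} \max_D \; V(D, G) - \lambda \, L_I(G, Q)$.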


Page 92: Variants of GANs - Jaejun Yoo

RESULTS

MNIST dataset

* Figure adopted from infoGAN paper (link)

Page 93: Variants of GANs - Jaejun Yoo

RESULTS

3D FACE dataset

* Figure adopted from infoGAN paper (link)

Page 94: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from infoGAN paper (link)

Page 95: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from infoGAN paper (link)

Page 96: Variants of GANs - Jaejun Yoo

Q

IMPLEMENTATION

c

z

G

D

x

I

Diagram of infoGAN

Train Q separately

Page 97: Variants of GANs - Jaejun Yoo

SUMMARY

1. Add an additional constraint to improve the performance

• Mutual information

• Adds negligible computational cost

2. Learn better feature space

3. Unsupervised way to learn implicit features in the dataset

4. Variational method

Page 98: Variants of GANs - Jaejun Yoo

IMPLEMENTATION
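A minimal sketch of how the extra Q network is usually realized in practice: Q shares the discriminator body and adds a small classification head, and the mutual-information term becomes a cross-entropy between Q's prediction and the sampled code. The layer sizes and names (shared_body, q_head) are assumptions for illustration, not the code from these slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CODES = 10  # categorical code c with 10 classes (e.g. MNIST digit type)

# Q shares most layers with the discriminator and adds a small classification head.
shared_body = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
d_head = nn.Linear(256, 1)          # real / fake logit
q_head = nn.Linear(256, NUM_CODES)  # logits for Q(c | x)

def sample_code(batch_size):
    """Sample c ~ Cat(K=10, p=0.1); the one-hot vector is concatenated with z for G."""
    idx = torch.randint(0, NUM_CODES, (batch_size,))
    return F.one_hot(idx, NUM_CODES).float(), idx

def mutual_info_loss(fake_images, code_idx):
    """Lower-bound surrogate: cross-entropy between Q's prediction and the sampled code."""
    features = shared_body(fake_images.view(fake_images.size(0), -1))
    logits = q_head(features)
    return F.cross_entropy(logits, code_idx)  # added (times lambda) to the generator loss
```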

Page 99: Variants of GANs - Jaejun Yoo

IMPLEMENTATION

Page 100: Variants of GANs - Jaejun Yoo

f-GAN: LET’S USE f-DIVERGENCES RATHER THAN FIXING A SINGLE ONE.

Here, I have heavily reused the slides from S. Nowozin’s (1st author) NIPS 2016 workshop talk on GANs. You can easily find the related information (slides) at: http://www.nowozin.net/sebastian/blog/nips-2016-generative-adversarial-training-workshop-talk.html

Page 101: Variants of GANs - Jaejun Yoo

(Figure: the true distribution Q, the model distribution P, and the model family 𝒫)

MOTIVATION

LEARNING PROBABILISTIC MODELS

* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.

Page 102: Variants of GANs - Jaejun Yoo

(Figure: the true distribution Q, the model distribution P, and the model family 𝒫)

Assumptions on P:
• tractable sampling
• tractable parameter gradient with respect to samples
• tractable likelihood function

MOTIVATION

LEARNING PROBABILISTIC MODELS

* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.

Page 103: Variants of GANs - Jaejun Yoo

[Goodfellow et al., 2014]

$z$ → Lin(100, 1200) → ReLU → Lin(1200, 1200) → ReLU → Lin(1200, 784) → Sigmoid

Random input Generator Output

$z \sim \mathrm{Uniform}_{100}$

MOTIVATION

Likelihood-free Model
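The likelihood-free generator listed above maps 100-dimensional uniform noise through two 1200-unit ReLU layers to a 784-dimensional sigmoid output. A minimal PyTorch sketch of exactly that stack (training code omitted):

```python
import torch
import torch.nn as nn

# Goodfellow et al. (2014) generator as listed on the slide:
# z -> Lin(100, 1200) -> ReLU -> Lin(1200, 1200) -> ReLU -> Lin(1200, 784) -> Sigmoid
generator = nn.Sequential(
    nn.Linear(100, 1200), nn.ReLU(),
    nn.Linear(1200, 1200), nn.ReLU(),
    nn.Linear(1200, 784), nn.Sigmoid(),
)

z = torch.rand(64, 100)      # a batch of 100-dimensional uniform noise
fake_images = generator(z)   # (64, 784), e.g. flattened 28x28 samples
```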

Page 104: Variants of GANs - Jaejun Yoo

Integral Probability Metrics [Müller, 1997], [Sriperumbudur et al., 2010]

$$\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left( \int f \, \mathrm{d}P - \int f \, \mathrm{d}Q \right)$$

• P: Expectation • Q: Expectation • Structure in ℱ
• Examples: Energy statistic [Szekely, 1997], Kernel MMD [Gretton et al., 2012], [Smola et al., 2007], Wasserstein distance [Cuturi, 2013], DISCO Nets [Bouchacourt et al., 2016]

Proper scoring rules [Gneiting and Raftery, 2007]

$$S(P, Q) = \int S(P, x) \, \mathrm{d}Q(x)$$

• P: Distribution • Q: Expectation
• Examples: Log-likelihood [Fisher, 1922], [Good, 1952], Quadratic score [Bernardo, 1979]

f-divergences [Ali and Silvey, 1966]

$$D_f(P \,\|\, Q) = \int q(x) \, f\!\left(\frac{p(x)}{q(x)}\right) \mathrm{d}x$$

• P: Distribution • Q: Distribution
• Examples: Kullback-Leibler divergence [Kullback and Leibler, 1952], Jensen-Shannon divergence, Total variation, Pearson χ²

LEARNING PROBABILISTIC MODELS

SCHEMATIC OVERVIEW

Page 105: Variants of GANs - Jaejun Yoo

• Proper scoring rules: P as a distribution, Q as an expectation

• Integral Probability Metrics: P as an expectation, Q as an expectation

• f-divergences: P as a distribution, Q as a distribution

[Nguyen et al., 2010], [Reid and Williamson, 2011], [Goodfellow et al., 2014]

Variational representation of divergences

LEARNING PROBABILISTIC MODELS

SCHEMATIC OVERVIEW

Page 106: Variants of GANs - Jaejun Yoo

Neural Sampler samples

Training samples

How do we measure the distance based only on empirical samples from $P_\theta(x)$ and $Q(x)$?

TRAINING NEURAL SAMPLERS

SCHEMATIC OVERVIEW

* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.

Page 107: Variants of GANs - Jaejun Yoo

“Show that the GAN approach is a special case of an existing more general variational divergence estimation approach.”

Let’s generalize the GAN objective to arbitrary f-divergences!

SCHEMATIC OVERVIEW

Page 108: Variants of GANs - Jaejun Yoo

Neural sampler distribution $P_\theta(x)$ and true distribution $Q(x)$

We can minimize some distance (divergence) between the distributions if we had $P_\theta(x)$ and $Q(x)$

TRAINING NEURAL SAMPLERS

SCHEMATIC OVERVIEW

* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.

Page 109: Variants of GANs - Jaejun Yoo

• Divergence between two distributions:

$$D_f(Q \,\|\, P) = \int_{\mathcal{X}} p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) \mathrm{d}x$$

• $f$: generator function, convex, $f(1) = 0$

[Ali and Silvey, 1966]

f-DIVERGENCE

Page 110: Variants of GANs - Jaejun Yoo

TRY WHAT YOU WANT! (pick the flavor you like)

f-DIVERGENCE

Page 111: Variants of GANs - Jaejun Yoo

• Divergence between two distributions:

$$D_f(Q \,\|\, P) = \int_{\mathcal{X}} p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) \mathrm{d}x$$

• Every convex function $f$ has a Fenchel conjugate $f^*$ so that

$$f(u) = \sup_{t \in \mathrm{dom} f^*} \{\, t u - f^*(t) \,\}$$

[Nguyen, Wainwright, Jordan, 2010]

“Any convex $f$ can be represented as a point-wise max of linear functions”

Estimating f-divergences from samples

f-DIVERGENCE

Page 112: Variants of GANs - Jaejun Yoo
Page 113: Variants of GANs - Jaejun Yoo

$$\begin{aligned} D_f(Q \,\|\, P) &= \int_{\mathcal{X}} p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) \mathrm{d}x \\ &= \int_{\mathcal{X}} p(x) \sup_{t \in \mathrm{dom} f^*} \left\{ t \, \frac{q(x)}{p(x)} - f^*(t) \right\} \mathrm{d}x \\ &\ge \sup_{T \in \mathcal{T}} \left( \int_{\mathcal{X}} q(x) \, T(x) \, \mathrm{d}x - \int_{\mathcal{X}} p(x) \, f^*(T(x)) \, \mathrm{d}x \right) \\ &= \sup_{T \in \mathcal{T}} \left( \mathbb{E}_{x \sim Q}[T(x)] - \mathbb{E}_{x \sim P}[f^*(T(x))] \right) \end{aligned}$$

Approximate using samples: the first expectation with samples from Q, the second with samples from P.

Estimating f-divergences from samples (cont.)

f-DIVERGENCE

* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.

Page 114: Variants of GANs - Jaejun Yoo

• GAN:

$$\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}[\log D_\omega(x)] + \mathbb{E}_{x \sim P_\theta}[\log(1 - D_\omega(x))]$$

• f-GAN:

$$\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}[T_\omega(x)] - \mathbb{E}_{x \sim P_\theta}[f^*(T_\omega(x))]$$

• GAN discriminator-variational function correspondence: $\log D_\omega(x) = T_\omega(x)$

• GAN minimizes the Jensen-Shannon divergence (which was also pointed out in Goodfellow et al., 2014)

f-GAN and GAN objectives

f-GAN

* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.

Page 115: Variants of GANs - Jaejun Yoo

$$\min_\theta \max_\omega \; \mathbb{E}_{x \sim Q}\big[g_f(V_\omega(x))\big] + \mathbb{E}_{x \sim P_\theta}\big[-f^*\big(g_f(V_\omega(x))\big)\big]$$

Comparison of the objectives

f-GAN

* Please note that in S. Nowozin’s slides, Q represents the real distribution and P stands for the parametric model we set.

Page 116: Variants of GANs - Jaejun Yoo

• Double-loop algorithm [Goodfellow et al., 2014]
  • Algorithm:
    • Inner loop: tighten the divergence lower bound
    • Outer loop: minimize the generator loss
  • In practice the inner loop is run for only one step (two backprops)
    • Missing justification for this practice

• Single-step algorithm (proposed)
  • Algorithm: simultaneously take (one backprop)
    • a positive gradient step w.r.t. the variational function $T_\omega(x)$
    • a negative gradient step w.r.t. the generator $P_\theta(x)$
  • Does this converge?

THEORETICAL RESULTS: Algorithm (Double-Loop versus Single-Step)

Page 117: Variants of GANs - Jaejun Yoo

GENERAL ALGORITHM

f-GAN

* Please note that in S. Nowozin’s paper, P represents the real distribution and $Q_\theta$ stands for the parametric model we set.
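A minimal sketch of the proposed single-step update for one particular choice of divergence, the KL, whose output activation and conjugate are $g_f(v) = v$ and $f^*(t) = \exp(t - 1)$; the function and optimizer names are assumptions of this sketch, not the paper's code.

```python
import torch

def f_gan_single_step(G, T, g_opt, t_opt, real_batch, z):
    """One simultaneous update for the KL choice of f: g_f(v) = v, f*(t) = exp(t - 1)."""
    g_opt.zero_grad()
    t_opt.zero_grad()
    V_real = T(real_batch)            # variational function on real samples (x ~ Q)
    V_fake = T(G(z))                  # variational function on generated samples (x ~ P_theta)
    # F(theta, omega) = E_Q[g_f(V)] - E_{P_theta}[f*(g_f(V))]
    F = V_real.mean() - torch.exp(V_fake - 1.0).mean()
    F.backward()                      # one backprop gives dF/dtheta and dF/domega
    for p in T.parameters():          # ascend in omega: flip T's gradients, because
        if p.grad is not None:        # optimizers always descend
            p.grad.neg_()
    t_opt.step()
    g_opt.step()                      # descend in theta (minimize F) with the gradients as-is
    return float(F)
```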

Page 118: Variants of GANs - Jaejun Yoo

THEORETICAL RESULTS: Local convergence of Algorithm 1

Page 119: Variants of GANs - Jaejun Yoo

• Assumptions
  • F is locally (strongly) convex with respect to θ
  • F is (strongly) concave with respect to ω

$$\nabla^2 F = \begin{pmatrix} \nabla_\theta^2 F & \nabla_\theta \nabla_\omega F \\ \nabla_\omega \nabla_\theta F & \nabla_\omega^2 F \end{pmatrix}, \qquad \nabla_\theta^2 F \succ 0, \quad \nabla_\omega^2 F \prec 0$$

• Local convergence: define $J(\theta, \omega) = \frac{1}{2}\|\nabla_\theta F\|^2 + \frac{1}{2}\|\nabla_\omega F\|^2$, then

$$J(\theta_t, \omega_t) \le \left(1 - \frac{\delta^2}{L}\right)^{t} J(\theta_0, \omega_0)$$

(δ: strong convexity parameter, L: smoothness parameter)

Geometric rate of convergence!

THEORETICAL RESULTS: Local convergence of Algorithm 1

Page 120: Variants of GANs - Jaejun Yoo

$V(x, y) = xy + \frac{\delta}{2}(x^2 - y^2)$ \qquad $V(x, y) = xy^2 + \frac{\delta}{2}(x^2 - y^2)$

VIDEOS

Page 121: Variants of GANs - Jaejun Yoo

RESULTS: Synthetic 1D Univariate

Approximate a mixture of Gaussians by a Gaussian to
• validate the approach
• demonstrate the properties of different divergences [Minka, 2005]

Compare the exact optimization of the divergence with the GAN approach

* Please note that in S. Nowozin’s paper, P represents the real distribution and $Q_\theta$ stands for the parametric model we set.

Page 122: Variants of GANs - Jaejun Yoo

RESULTS: Synthetic 1D Univariate

* Figure adopted from f-GAN paper (link)

* Please note that in S. Nowozin’s paper, P represents the real distribution and $Q_\theta$ stands for the parametric model we set.

Page 123: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from f-GAN paper (link)

Page 124: Variants of GANs - Jaejun Yoo
Page 125: Variants of GANs - Jaejun Yoo

SUMMARY

• Generalize the GAN objective to arbitrary f-divergences

• Simplify GAN algorithm + local convergence proof

• Demonstrate different divergences

Page 126: Variants of GANs - Jaejun Yoo

ETC.

WHY GAN GENERATES SHARPER IMAGES?

* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016

Page 127: Variants of GANs - Jaejun Yoo

ETC.

WHY GAN GENERATES SHARPER IMAGES?

* Figure adopted from NIPS 2016 Tutorial GAN, Ian Goodfellow 2016

Page 128: Variants of GANs - Jaejun Yoo

• LSUN experiment: No (visually)
  • Empirical contradiction to the intuition from [Theis et al., 2015], [Huszar, 2015]
• Why?
  • Intuition: strong inductive bias of the model class

Q

ETC.

DOES THE DIVERGENCE MATTER?

Page 129: Variants of GANs - Jaejun Yoo

EBGAN: LET’S USE AN ENERGY-BASED MODEL FOR THE DISCRIMINATOR

Page 130: Variants of GANs - Jaejun Yoo

MOTIVATION

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

We want to learn the data manifold!

Page 131: Variants of GANs - Jaejun Yoo

MOTIVATION

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

We want to learn the data manifold!

⟺ Do not want to give a penalty to both $y$ and $\hat{y}$.

Page 132: Variants of GANs - Jaejun Yoo

MOTIVATION

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

We want to learn the data manifold!

⟺ Do not want to give a penalty to both $y$ and $\hat{y}$.

⟺ The pen can fall either way.

(there is no wrong way for it to fall)

Page 133: Variants of GANs - Jaejun Yoo

MOTIVATION

ENERGY-BASED MODEL

Energy based models capture dependencies between variables by associating a scalar energy (a measure of compatibility) to each configuration of the variables.

Inference, i.e., making a prediction or decision, consists in setting the value of observed variables and finding values of the remaining variables that minimize the energy.

Learning consists in finding an energy function that associates low energies to correct values of the remaining variables, and higher energies to incorrect values.

A loss functional, minimized during learning, is used to measure the quality of the available energy functions.

* Quoted from “A Tutorial on Energy-Based Learning”, Yann Lecun et al., 2006

Page 134: Variants of GANs - Jaejun Yoo

Energy based models capture dependencies between variables by associating a scalar energy (a measure of compatibility) to each configuration of the variables.

Inference, i.e., making a prediction or decision, consists in setting the value of observed variables and finding values of the remaining variables that minimize the energy.

Learning consists in finding an energy function that associates low energies to correct values of the remaining variables, and higher energies to incorrect values.

A loss functional, minimized during learning, is used to measure the quality of the available energy functions.

MOTIVATION

ENERGY-BASED MODEL

WITHIN the COMMON inference/learning FRAMEWORK, the wide choice of energy functions and loss functionals allows for the design of many types of statistical models, both probabilistic and non-probabilistic.

* Quoted from “A Tutorial on Energy-Based Learning”, Yann Lecun et al., 2006

Page 135: Variants of GANs - Jaejun Yoo

MOTIVATION

ENERGY-BASED MODEL

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

“LET’S USE IT!”

Page 136: Variants of GANs - Jaejun Yoo

MOTIVATION

ENERGY-BASED MODEL

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

“BUT HOW do we choose where to push up?“

Page 137: Variants of GANs - Jaejun Yoo

MOTIVATION

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

Page 138: Variants of GANs - Jaejun Yoo

MOTIVATION

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

LIMITATIONS

→ Limit the space

→ Every interesting case is intractable

→ How to pick the point to push up?

→ Limit the model or space

Page 139: Variants of GANs - Jaejun Yoo

MOTIVATION

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

Page 140: Variants of GANs - Jaejun Yoo

MOTIVATION

* Figure adopted from Yann Lecun’s slides, NIPS 2016 (link)

Page 141: Variants of GANs - Jaejun Yoo

SCHEMATIC OVERVIEW

* Slides from Yann Lecun’s talk in NIPS 2016 (link)

Page 142: Variants of GANs - Jaejun Yoo

* Slides from Yann Lecun’s talk in NIPS 2016 (link)

SCHEMATIC OVERVIEW

Page 143: Variants of GANs - Jaejun Yoo

* Slides from Yann Lecun’s talk in NIPS 2016 (link)

SCHEMATIC OVERVIEW

Page 144: Variants of GANs - Jaejun Yoo

Architecture: discriminator is an auto-encoder

Loss functions:

EBGAN

* Slides from Yann Lecun’s talk in NIPS 2016 (link)

Page 145: Variants of GANs - Jaejun Yoo

Architecture: discriminator is an auto-encoder

Loss functions:

hinge loss

EBGAN

* Slides from Yann Lecun’s talk in NIPS 2016 (link)
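For reference, the EBGAN loss functions referred to above (Zhao et al., 2016), where the discriminator's energy $D(x)$ is the auto-encoder reconstruction error and $m$ is a positive margin:

$$D(x) = \big\| \mathrm{Dec}(\mathrm{Enc}(x)) - x \big\|, \qquad \mathcal{L}_D = D(x) + \big[\, m - D(G(z)) \,\big]^{+}, \qquad \mathcal{L}_G = D\big(G(z)\big)$$

with $[\cdot]^{+} = \max(0, \cdot)$ giving the hinge.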

Page 146: Variants of GANs - Jaejun Yoo

THEORETICAL RESULTS

* Slides from Yann Lecun’s talk in NIPS 2016 (link)

Page 147: Variants of GANs - Jaejun Yoo

RESULTS

* Slides from Yann Lecun’s talk in NIPS 2016 (link)

Page 148: Variants of GANs - Jaejun Yoo

RESULTS

* Slides from Yann Lecun’s talk in NIPS 2016 (link)

Page 149: Variants of GANs - Jaejun Yoo

SUMMARY

1. New framework using an energy-based model

• Discriminator as an energy function

• Low values on the data manifold

• Higher values everywhere else

• The generator produces contrastive samples

2. Stable learning

Page 150: Variants of GANs - Jaejun Yoo

WGAN

Page 151: Variants of GANs - Jaejun Yoo

MOTIVATION

What does it mean to learn a probability model? We learned this from f-GAN!

(Figure: the true distribution $Q$, the model distribution $P_\theta$, and the model family 𝒫)

LEARNING PROBABILISTIC MODELS

Assumptions on P: tractable sampling, tractable parameter gradient with respect to samples, tractable likelihood function

Page 152: Variants of GANs - Jaejun Yoo

MOTIVATION

What does it mean to learn a probability model? We learned this from f-GAN!

(Figure: the true distribution $Q$, the model distribution $P_\theta$, and the model family 𝒫)

LEARNING PROBABILISTIC MODELS

Assumptions on P: tractable sampling, tractable parameter gradient with respect to samples, tractable likelihood function

$$\ldots\; \min_\theta \, \mathrm{KL}\big[\, Q \,\|\, P_\theta \,\big] \;\Longleftrightarrow\; \max_\theta \, \sum_{i=1}^{N} \log P(x_i \mid \theta)$$

Page 153: Variants of GANs - Jaejun Yoo

MOTIVATION

What if the supports of the two distributions do not overlap?

Is $\mathrm{KL}\big[\, Q \,\|\, P_\theta \,\big]$ still defined?

Page 154: Variants of GANs - Jaejun Yoo

MOTIVATION

What if the supports of the two distributions do not overlap?

Add a noise term to the model distribution

* Note that this is a very rough explanation

Page 155: Variants of GANs - Jaejun Yoo

MOTIVATION

What if the supports of the two distributions do not overlap?

Add a noise term to the model distribution

TEMPORARY? OR UNSATISFYING SOLUTION

* Note that this is a very rough explanation

Page 156: Variants of GANs - Jaejun Yoo

Integral Probability Metrics [Müller, 1997], [Sriperumbudur et al., 2010]

$$\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left( \int f \, \mathrm{d}P - \int f \, \mathrm{d}Q \right)$$

• P: Expectation • Q: Expectation • Structure in ℱ
• Examples: Energy statistic [Szekely, 1997], Kernel MMD [Gretton et al., 2012], [Smola et al., 2007], Wasserstein distance [Cuturi, 2013], DISCO Nets [Bouchacourt et al., 2016]

Proper scoring rules [Gneiting and Raftery, 2007]

$$S(P, Q) = \int S(P, x) \, \mathrm{d}Q(x)$$

• P: Distribution • Q: Expectation
• Examples: Log-likelihood [Fisher, 1922], [Good, 1952], Quadratic score [Bernardo, 1979]

f-divergences [Ali and Silvey, 1966]

$$D_f(P \,\|\, Q) = \int q(x) \, f\!\left(\frac{p(x)}{q(x)}\right) \mathrm{d}x$$

• P: Distribution • Q: Distribution
• Examples: Kullback-Leibler divergence [Kullback and Leibler, 1952], Jensen-Shannon divergence, Total variation, Pearson χ²

LEARNING PROBABILISTIC MODELS

REVIEW!

Page 157: Variants of GANs - Jaejun Yoo

[Goodfellow et al., 2014]

$z$ → Lin(100, 1200) → ReLU → Lin(1200, 1200) → ReLU → Lin(1200, 784) → Sigmoid

Random input Generator Output

$z \sim \mathrm{Uniform}_{100}$

REVIEW!

Likelihood-free Model

Page 158: Variants of GANs - Jaejun Yoo

[Goodfellow et al., 2014]

$z$ → Lin(100, 1200) → ReLU → Lin(1200, 1200) → ReLU → Lin(1200, 784) → Sigmoid

Random input Generator Output

$z \sim \mathrm{Uniform}_{100}$

REVIEW!

WELL KNOWN FOR BEING DELICATE AND UNSTABLE TO TRAIN

Likelihood-free Model

Page 159: Variants of GANs - Jaejun Yoo

Integral Probability Metrics [Müller, 1997], [Sriperumbudur et al., 2010]

$$\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left( \int f \, \mathrm{d}P - \int f \, \mathrm{d}Q \right)$$

• P: Expectation • Q: Expectation • Structure in ℱ
• Examples: Energy statistic [Szekely, 1997], Kernel MMD [Gretton et al., 2012], [Smola et al., 2007], Wasserstein distance [Cuturi, 2013], DISCO Nets [Bouchacourt et al., 2016]

Proper scoring rules [Gneiting and Raftery, 2007]

$$S(P, Q) = \int S(P, x) \, \mathrm{d}Q(x)$$

• P: Distribution • Q: Expectation
• Examples: Log-likelihood [Fisher, 1922], [Good, 1952], Quadratic score [Bernardo, 1979]

f-divergences [Ali and Silvey, 1966]

$$D_f(P \,\|\, Q) = \int q(x) \, f\!\left(\frac{p(x)}{q(x)}\right) \mathrm{d}x$$

• P: Distribution • Q: Distribution
• Examples: Kullback-Leibler divergence [Kullback and Leibler, 1952], Jensen-Shannon divergence, Total variation, Pearson χ²

LEARNING PROBABILISTIC MODELS

REVIEW!

Page 160: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Page 161: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Different distances

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 162: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Different distances

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 163: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Different distances

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 164: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Different distances

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 165: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 166: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 167: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 168: Variants of GANs - Jaejun Yoo

Slide courtesy of Sungbin Lim, DeepBio, 2017

THEORETIC RESULTS

Page 169: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 170: Variants of GANs - Jaejun Yoo

WASSERSTEIN DISTANCE
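For reference, the Earth-Mover (Wasserstein-1) distance discussed on the following slides, as defined in the WGAN paper:

$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \; \mathbb{E}_{(x, y) \sim \gamma}\big[\, \|x - y\| \,\big]$$

where $\Pi(P_r, P_g)$ is the set of all joint distributions whose marginals are $P_r$ and $P_g$.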

Page 171: Variants of GANs - Jaejun Yoo

Slide courtesy of Sungbin Lim, DeepBio, 2017

WASSERSTEIN DISTANCE

Page 172: Variants of GANs - Jaejun Yoo

Slide courtesy of Sungbin Lim, DeepBio, 2017

WASSERSTEIN DISTANCE

Page 173: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

* Figure adopted from WGAN paper (link)

Page 174: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 175: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Slide courtesy of Sungbin Lim, DeepBio, 2017

Page 176: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Wasserstein distance is a continuous function of $\theta$ under mild assumptions!

Page 177: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Wasserstein distance is the weakest one

Page 178: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Highly intractable! (the infimum runs over all joint distributions…)

Page 179: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Replace the problem with its dual problem, which is (somewhat) tractable!
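The dual form meant here is the Kantorovich-Rubinstein duality, which replaces the intractable infimum over joint distributions with a supremum over 1-Lipschitz functions:

$$W(P_r, P_g) = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]$$

In WGAN, $f$ is parameterized by a neural network (the critic); clipping its weights to a small box keeps it K-Lipschitz for some K, which only rescales the distance by a constant.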

Page 180: Variants of GANs - Jaejun Yoo

THEORETIC RESULTS

Okay! We can use the neural network!(up to a constant)

Page 181: Variants of GANs - Jaejun Yoo

IMPLEMENTATION
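A minimal self-contained PyTorch sketch of the WGAN recipe (several critic updates per generator update, weight clipping, RMSProp, no log in the loss); the toy 2-D data, network sizes, and variable names are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

# Toy WGAN on 2-D data: tiny MLP generator and critic, following the WGAN recipe.
G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))    # generator
f_w = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # critic
n_critic, clip_value, lr, batch = 5, 0.01, 5e-5, 64
g_opt = torch.optim.RMSprop(G.parameters(), lr=lr)
c_opt = torch.optim.RMSprop(f_w.parameters(), lr=lr)

def real_data(n):  # stand-in data distribution (a shifted Gaussian)
    return torch.randn(n, 2) + torch.tensor([2.0, 0.0])

for step in range(1000):
    # 1. Critic: maximize E_real[f] - E_fake[f] (minimize the negative)
    for _ in range(n_critic):
        c_opt.zero_grad()
        loss_c = -(f_w(real_data(batch)).mean() - f_w(G(torch.randn(batch, 8)).detach()).mean())
        loss_c.backward()
        c_opt.step()
        for p in f_w.parameters():   # weight clipping keeps the critic roughly Lipschitz
            p.data.clamp_(-clip_value, clip_value)
    # 2. Generator: minimize -E_fake[f] (i.e. decrease the estimated EM distance)
    g_opt.zero_grad()
    loss_g = -f_w(G(torch.randn(batch, 8))).mean()
    loss_g.backward()
    g_opt.step()
```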

Page 182: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from WGAN paper (link)

Page 183: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from WGAN paper (link)

Page 184: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from WGAN paper (link)

Stable without Batch normalization!

Page 185: Variants of GANs - Jaejun Yoo

RESULTS

* Figure adopted from WGAN paper (link)

Stable without DCGAN structure (generator)!

Page 186: Variants of GANs - Jaejun Yoo

SUMMARY

1. Provide a comprehensive theoretical analysis of how the EM distance behaves in comparison to the other distances

2. Introduce Wasserstein-GAN, which minimizes a reasonable and efficient approximation of the EM distance.

3. Empirically show that WGANs cure the main training problems of GANs (e.g. stability, power balance, mode collapse)

4. Evaluation criteria (learning curve)

Page 187: Variants of GANs - Jaejun Yoo

IMPLEMENTATION

THAT SIMPLE!

(after all those mathematics…)

Page 188: Variants of GANs - Jaejun Yoo

• A repo with PyTorch and TensorFlow implementations of almost every GAN:

https://github.com/wiseodd/generative-models

IMPLEMENTATION

Page 189: Variants of GANs - Jaejun Yoo

THANK YOU [email protected]

Page 190: Variants of GANs - Jaejun Yoo

MLE & KL DIVERGENCE
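For reference, the standard argument this appendix refers to: maximizing the likelihood is (asymptotically) equivalent to minimizing the KL divergence from the data distribution $Q$ to the model $P_\theta$,

$$\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log P_\theta(x_i) \;\xrightarrow{\;N \to \infty\;}\; \max_\theta \, \mathbb{E}_{x \sim Q}\big[\log P_\theta(x)\big] = \max_\theta \Big( -H(Q) - \mathrm{KL}\big[\, Q \,\|\, P_\theta \,\big] \Big) \;\Longleftrightarrow\; \min_\theta \, \mathrm{KL}\big[\, Q \,\|\, P_\theta \,\big]$$

since the entropy $H(Q)$ does not depend on $\theta$.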

Page 191: Variants of GANs - Jaejun Yoo

What is convergence in distribution?