Disentangling Disentanglement in Variational Autoencoders
ICML 2019

Emile Mathieu*, Tom Rainforth*, N. Siddharth*, Yee Whye Teh
June 12, 2019

Departments of Statistics and Engineering Science, University of Oxford

Variational Autoencoders

[Diagram: a VAE relating observations x = (x1, ..., x5) to latents z1, ..., z4 through a generative model and an inference model. Example latent factors: zl (gender), zm (beard), zn (makeup).]

Disentanglement = Independence

[Diagram: the VAE with observed dimensions xi, xj and meaningful factors zl (gender), zm (beard), zn (makeup).]

Disentanglement = Independence

[Diagram: the VAE with independent factors zl (shape), zm (angle), zn (scale).]

Decomposition ∈ {Independence, Clustering, Sparsity, …}

[Diagram: the VAE with co-related factors zl (gender), zm (beard), zn (makeup).]

Decomposition: A Generalization of Disentanglement

We characterise decomposition as the fulfilment of two factors:

(a) the level of overlap between encodings in the latent space;
(b) the match between the marginal posterior qφ(z) and a structured prior p(z), which constrains the latents to the required decomposition.
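Both factors concern the marginal (aggregate) posterior qφ(z), which averages the encoder qφ(z|x) over the data distribution. A minimal sketch of drawing from it by ancestral sampling, assuming a diagonal Gaussian encoder; `toy_encode` and the two-point dataset are hypothetical stand-ins, not part of the talk:

```python
import random

def sample_aggregate_posterior(rng, dataset, encode, num_samples):
    """Draw z ~ q(z) = E_{p_D(x)}[q(z|x)] by ancestral sampling."""
    samples = []
    for _ in range(num_samples):
        x = rng.choice(dataset)      # x ~ p_D(x), the empirical data distribution
        mean, std = encode(x)        # parameters of the Gaussian q(z|x)
        samples.append([rng.gauss(m, s) for m, s in zip(mean, std)])
    return samples

def toy_encode(x):
    """Hypothetical 1-D encoder: mean follows the input, fixed small std."""
    return [float(x)], [0.1]

rng = random.Random(0)
zs = sample_aggregate_posterior(rng, [-2.0, 2.0], toy_encode, 200)
```

With such a small encoder standard deviation the samples form two separate clumps around -2 and 2, a toy picture of encodings with very little overlap.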

Decomposition: An Analysis

[Figure: the desired (target) structure, given by the prior p(z).]

Decomposition: An Analysis

Insufficient Overlap

[Figure: the case of insufficient overlap, showing pD(x), qφ(z|x), qφ(z), p(z), pθ(x|z), and pθ(x).]

Decomposition: An Analysis

Too Much Overlap

[Figure: the case of too much overlap, showing pD(x), qφ(z|x), qφ(z), p(z), pθ(x|z), and pθ(x).]

Decomposition: An Analysis

Appropriate Overlap

[Figure: the case of appropriate overlap, showing pD(x), qφ(z|x), qφ(z), p(z), pθ(x|z), and pθ(x).]

Overlap — Deconstructing the β-VAE

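$$
\begin{aligned}
\mathcal{L}_\beta(x) &= \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \beta \cdot \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z)) \\
&= \underbrace{\mathcal{L}(x; \pi_{\theta,\beta}, q_\phi)}_{\text{ELBO with } \beta\text{-annealed prior}} + \underbrace{(\beta - 1) \cdot H_{q_\phi}}_{\text{maxent}} + \underbrace{\log F_\beta}_{\text{constant}}
\end{aligned}
$$

Here H_{qφ} is the entropy of qφ(z|x), and F_β is a constant.

Implications

• The β-VAE disentangles largely by controlling the level of overlap.
• It places no direct pressure on the latents to be independent!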
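To make the role of β concrete, here is a minimal sketch of a single-sample estimate of the β-VAE objective. It assumes a diagonal Gaussian encoder and a standard normal prior, so the KL term has a closed form; the function names are illustrative, and `log_likelihood` stands for a value of log pθ(x|z) supplied by some decoder:

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def beta_vae_objective(log_likelihood, mu, log_var, beta):
    """Single-sample estimate of L_beta(x): reconstruction minus beta * KL."""
    return log_likelihood - beta * gaussian_kl_to_standard_normal(mu, log_var)

kl = gaussian_kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
objective = beta_vae_objective(-10.0, [1.0, 0.0], [0.0, 0.0], beta=4.0)
```

Raising `beta` only rescales the per-datapoint KL penalty; nothing in this expression compares the aggregate posterior with the prior, which is consistent with the implication stated on the slide.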

Decomposition: Objective

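$$
\begin{aligned}
\mathcal{L}_{\alpha,\beta}(x) = {} & \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] && \text{(reconstruct observations)} \\
& - \beta \cdot \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z)) && \text{(control level of overlap)} \\
& - \alpha \cdot \mathbb{D}(q_\phi(z), p(z)) && \text{(impose desired structure)}
\end{aligned}
$$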
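The slide leaves the divergence D unspecified. The sketch below assumes, purely for illustration, that D is a squared maximum mean discrepancy (MMD) with an RBF kernel, estimated from samples of qφ(z) and p(z). The function names, the bandwidth, and the batch-averaging convention are assumptions rather than the authors' implementation:

```python
import math

def rbf_kernel(a, b, bandwidth=1.0):
    """RBF kernel between two latent vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def decomposition_objective(recon_log_lik, kl_terms, z_posterior, z_prior,
                            alpha, beta):
    """Batch estimate: mean reconstruction - beta * mean KL - alpha * D."""
    n = len(recon_log_lik)
    mean_recon = sum(recon_log_lik) / n
    mean_kl = sum(kl_terms) / n
    divergence = mmd_squared(z_posterior, z_prior)
    return mean_recon - beta * mean_kl - alpha * divergence

near = [[0.0], [1.0]]
far = [[5.0], [6.0]]
matched = decomposition_objective([-1.0, -3.0], [0.5, 1.5], near, near,
                                  alpha=10.0, beta=2.0)
mismatched = decomposition_objective([-1.0, -3.0], [0.5, 1.5], near, far,
                                     alpha=10.0, beta=2.0)
```

When the posterior samples match the prior samples the α term vanishes, so `matched` reduces to the β-VAE value; the mismatched case is penalised in proportion to `alpha`.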

Decomposition: Generalising Disentanglement

Independence: $p(z) = \mathcal{N}(0, \sigma^{\star})$

Figure 1: β-VAE trained on 2D Shapes¹, computing disentanglement².

¹ Matthey et al., dSprites: Disentanglement testing Sprites dataset, p. 1.
² Kim and Mnih, “Disentangling by Factorising”, p. 2.

Decomposition: Generalising Disentanglement

Clustering: $p(z) = \sum_k \rho_k \cdot \mathcal{N}(\mu_k, \sigma_k)$
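A minimal sketch of such a clustering prior, assuming diagonal Gaussian components with σk read as per-dimension standard deviations (the slide does not say whether σk is a standard deviation or a variance). The weights, means, and function names below are hypothetical:

```python
import math
import random

def log_normal_diag(z, mean, std):
    """Log density of a diagonal Gaussian at z."""
    return sum(-0.5 * math.log(2.0 * math.pi) - math.log(s)
               - 0.5 * ((zi - m) / s) ** 2
               for zi, m, s in zip(z, mean, std))

def mixture_log_prob(z, weights, means, stds):
    """log p(z) for p(z) = sum_k rho_k N(mu_k, sigma_k), via log-sum-exp."""
    log_terms = [math.log(w) + log_normal_diag(z, m, s)
                 for w, m, s in zip(weights, means, stds)]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def sample_mixture(rng, weights, means, stds):
    """Pick a component k with probability rho_k, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]

rng = random.Random(0)
weights = [0.5, 0.5]
means = [[-3.0, 0.0], [3.0, 0.0]]
stds = [[0.1, 0.1], [0.1, 0.1]]
draws = [sample_mixture(rng, weights, means, stds) for _ in range(100)]
```

Samples from this prior can serve as the reference set when estimating a divergence between qφ(z) and p(z).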

[Figure 2 panels. Top row (α = 0): β = 0.01, 0.5, 1.0, 1.2. Bottom row (β = 0): α = 1, 3, 5, 8.]

Figure 2: Density of the aggregate posterior qφ(z) for different α, β on the pinwheel dataset.³

³ http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab

Decomposition: Generalising Disentanglement

Sparsity: $p(z) = \prod_d \big[ (1 - \gamma) \cdot \mathcal{N}(z_d; 0, 1) + \gamma \cdot \mathcal{N}(z_d; 0, \sigma_0^2) \big]$
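A minimal sketch of this sparse prior: each latent dimension is drawn independently from a two-component mixture, with weight 1 − γ on N(0, 1) and weight γ on the narrow component N(0, σ0²). The value used for `sigma0_sq` and the function names are illustrative assumptions:

```python
import math
import random

def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def sparse_prior_log_prob(z, gamma, sigma0_sq):
    """log p(z) under the per-dimension two-component mixture prior."""
    return sum(math.log((1.0 - gamma) * normal_pdf(zd, 1.0)
                        + gamma * normal_pdf(zd, sigma0_sq))
               for zd in z)

def sample_sparse_prior(rng, dim, gamma, sigma0_sq):
    """Each dimension uses the narrow component with probability gamma."""
    sample = []
    for _ in range(dim):
        narrow = rng.random() < gamma
        std = math.sqrt(sigma0_sq) if narrow else 1.0
        sample.append(rng.gauss(0.0, std))
    return sample

rng = random.Random(0)
dense_log_prob = sparse_prior_log_prob([0.0], gamma=0.0, sigma0_sq=0.05)
sparse_log_prob = sparse_prior_log_prob([0.0], gamma=0.8, sigma0_sq=0.05)
all_narrow = sample_sparse_prior(rng, dim=50, gamma=1.0, sigma0_sq=1e-4)
```

With γ = 0 the prior reduces to a standard normal; increasing γ moves mass towards zero, which is what encourages many latent dimensions to be switched off.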

[Plot: average latent magnitude against latent dimension, for the classes Trouser, Dress, and Shirt.]

Figure 3: Sparsity of learnt representations for the Fashion-MNIST⁴ dataset.

⁴ Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.

Decomposition: Generalising Disentanglement

[Panels: (a) d = 49, leg separation; (b) d = 30, dress width; (c) d = 19, shirt fit; (d) d = 40, sleeve style.]

Figure 3: Latent space traversals for “active” dimensions⁴.

Decomposition: Generalising Disentanglement

[Plot: average normalised sparsity against α (0 to 1000), for γ ∈ {0, 0.8} and β ∈ {0.1, 1, 5}.]

Figure 3: Sparsity vs regularisation strength α (higher is better)⁴.

Recap

We propose and develop:

• Decomposition: a generalisation of disentanglement involving:
  (a) overlap of latent encodings
  (b) match between qφ(z) and p(z)
• A theoretical analysis of the β-VAE objective showing that it primarily contributes to overlap.
• An objective that incorporates both factors (a) and (b).
• Experiments that showcase efficacy at different decompositions:
  • independence
  • clustering
  • sparsity

Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh

Code: iffsid/disentangling-disentanglement
Paper: arXiv:1812.02833

Come talk to us at our poster: #59
