2. Conditional GANs (cGANs)
Invertible Conditional GANs for image editing
Guim Perarnau, Joost van de Weijer*, Bogdan Raducanu*, Jose M. Álvarez†
* Computer Vision Center, Barcelona, Spain † Data61 @ CSIRO, Canberra, Australia
guimperarnau@gmail.com, {joost,bogdan}@cvc.uab.es, jose.alvarez@data61.csiro.au
Overview
Problem
Complex image editing often requires human supervision and professional image editing tools. How can we automate these complex operations?
Solution
We propose Invertible Conditional GANs (IcGANs), a model that combines a conditional GAN with an encoder.
How?
1. Generate realistic images via GANs.
2. Condition generated images with attributes.
3. Encode real images in order to reconstruct them with the desired changes.
3. Encoder
4. Invertible conditional GANs (IcGANs)
5. Results
6. Conclusions
Now we can combine the cGAN and the encoder to create an IcGAN. With the encoder, we can invert the generator and map an image into a latent representation z and a conditional vector y. In this space, we can arbitrarily change key aspects of the image and then reconstruct the modified image using the generator.
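This encode, edit, and decode loop can be sketched as a minimal illustration. Here `encode_z`, `encode_y`, and `generate` are hypothetical toy linear stand-ins for the trained networks, not the actual models; shapes and the flipped attribute are chosen only for demonstration.

```python
# Toy sketch of the IcGAN editing loop: encode an image to (z, y),
# edit y, then regenerate. All networks are hypothetical linear stand-ins.
import numpy as np

rng = np.random.default_rng(0)
Wz = rng.normal(size=(64, 100))   # toy "encoder" weights: image -> z
Wg = rng.normal(size=(105, 64))   # toy "generator" weights: [z, y] -> image

def encode_z(image):
    """E_z: map an image to its latent representation z."""
    return image @ Wz

def encode_y(image):
    """E_y: map an image to a binary attribute vector y (toy threshold)."""
    return (image[:, :5] > 0).astype(float)

def generate(z, y):
    """G: reconstruct an image from latent z and attribute vector y."""
    return np.concatenate([z, y], axis=1) @ Wg

image = rng.normal(size=(1, 64))
z, y = encode_z(image), encode_y(image)

y_edit = y.copy()
y_edit[0, 0] = 1.0 - y_edit[0, 0]   # flip one attribute, e.g. "blonde"
edited = generate(z, y_edit)        # reconstruction with the desired change
```

The key property is that editing happens purely in (z, y) space: z preserves the identity of the input while y carries the attributes being changed.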
GAN training scheme: the generator produces fake images, the discriminator classifies images as real or fake, and both networks are trained via backpropagation.
GANs are composed of two networks, a generator and a discriminator. The generator is trained to
fool the discriminator by creating realistic images, and the discriminator is trained not to be fooled
by the generator.
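The adversarial objective can be illustrated with a toy numerical sketch. The linear `D` and `G` below are stand-ins chosen only so the losses are computable; a real implementation would use deep convolutional networks.

```python
# Toy sketch of the GAN objective: the discriminator minimizes a binary
# classification loss, while the generator tries to make fakes look real.
import numpy as np

def D(x, w):
    """Toy discriminator: sigmoid of a linear score, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x @ w))

def G(z, v):
    """Toy generator: linear map from noise z to 'image' space."""
    return z @ v

rng = np.random.default_rng(0)
w = rng.normal(size=3)             # discriminator parameters
v = rng.normal(size=(2, 3))        # generator parameters
x_real = rng.normal(size=(4, 3))   # batch of "real images"
z = rng.normal(size=(4, 2))        # batch of noise vectors

# Discriminator loss: label real images 1 and generated images 0.
d_loss = -np.mean(np.log(D(x_real, w)) + np.log(1.0 - D(G(z, v), w)))

# Generator loss (non-saturating form): push D to call fakes real.
g_loss = -np.mean(np.log(D(G(z, v), w)))
```

In training, the two losses are minimized alternately with respect to their own network's parameters, which is what produces the adversarial dynamic described above.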
One way to evaluate these models is to directly inspect how visually appealing the generated samples are. Here we show some qualitative examples of what an IcGAN is capable of when playing with both the latent space z and the conditional information y.
We fix z for every row and modify y for each column to obtain variations of real images.
Interpolations between faces.
Swapping two face attributes.
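Both qualitative experiments are simple operations in the latent and conditional spaces. The sketch below assumes plain linear interpolation in z and a direct exchange of y vectors; the attribute labels in the comments are illustrative.

```python
# Toy sketch of the two qualitative experiments: interpolating between
# two faces in z, and swapping the attribute vectors y of two faces.
import numpy as np

rng = np.random.default_rng(2)
z1, z2 = rng.normal(size=100), rng.normal(size=100)

# Interpolation: walk linearly from z1 to z2; decoding each point with
# the generator yields a smooth morph between the two faces.
alphas = np.linspace(0.0, 1.0, 5)
z_path = [(1 - a) * z1 + a * z2 for a in alphas]

# Attribute swap: exchange the y vectors and regenerate each face with
# the other's attributes (labels below are illustrative).
y1 = np.array([1., 0., 0., 1., 0.])   # e.g. female, make-up
y2 = np.array([0., 1., 0., 0., 1.])   # e.g. black hair, sunglasses
y1_swapped, y2_swapped = y2.copy(), y1.copy()
```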
Acknowledgments
This work is funded by the project TIN2013-41751-P of the Spanish Ministry of Science and the CHIST-ERA project PCIN-2015-226.
Scheme of an IcGAN and how it is used.
Example of complex editing operations.
(Attribute columns: Original, Bangs, Blonde, Smile, Male.)
Code available! → https://github.com/Guim3/IcGAN
1. Generative Adversarial Networks (GANs)
With cGANs, we add conditional information y into the model that describes some aspect of the data. This lets us control certain aspects of the generated images, e.g. generate a blonde woman with sunglasses. We refine cGANs by testing the optimal position at which y is inserted into the generator and the discriminator.
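The conditioning itself can be sketched in a few lines. The concatenation of y with the noise vector at the generator input matches the position the poster finds optimal; replicating y spatially for a convolutional discriminator layer is a common technique and is an assumption here, as are the dimensions and attribute names.

```python
# Sketch of inserting conditional information y: concatenated with the
# noise z at the generator input, and spatially replicated as extra
# feature maps for a convolutional discriminator layer.
import numpy as np

n_z, n_y = 100, 5   # noise dimensions, number of binary attributes
batch = 8

z = np.random.default_rng(1).normal(size=(batch, n_z))
y = np.zeros((batch, n_y))
y[:, [0, 4]] = 1.0   # e.g. "female" + "sunglasses" (illustrative labels)

# Generator side: the input is simply [z, y].
gen_input = np.concatenate([z, y], axis=1)

# Discriminator side: replicate y over the spatial grid so it can be
# stacked with the feature maps of a convolutional layer.
h = w = 32
y_maps = np.broadcast_to(y[:, :, None, None], (batch, n_y, h, w))
```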
IcGAN architecture scheme: the encoder (conv 1–5) maps an image to a 100-dimensional latent vector z and an attribute vector y (female, black hair, brown hair, make-up, sunglasses); a change vector modifies y; the generator (full conv 1–5) then produces the edited reconstruction. Example columns shown: Input, Reconstruction, Swap y.
• We introduce IcGANs, which solve the problem that GANs cannot reconstruct real images, while also allowing explicit control over complex attributes of the generated samples.
• We refine the performance of cGANs by inserting the conditional information 𝑦 at the input level for the generator and at the first layer for the discriminator.
• We evaluate several approaches to training an encoder, finding that two independent encoders 𝐸𝑧 and 𝐸𝑦 (IND) are the best option.
Chart: cGAN F1-score as a function of the layer at which y is inserted (input, layers 1–4), evaluated separately for the generator and the discriminator.
Chart: reconstruction loss (approximately 0.35–0.5) for the three encoder types SNG, IND, and IND-COND.
Then, we train an encoder to reconstruct real images. It is trained after the cGAN and is composed of two sub-encoders: 𝐸𝑧, which encodes an image to 𝑧, and 𝐸𝑦, which encodes an image to 𝑦′.
We test different strategies to make them interact and improve the encoding process:
• SNG: 𝐸𝑧 and 𝐸𝑦 are embedded in a single encoder.
• IND: 𝐸𝑧 and 𝐸𝑦 are trained separately.
• IND-COND: two independent encoders, where 𝐸𝑧 is conditioned on the output of 𝐸𝑦.
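The three strategies differ only in how the two sub-encoders share information, which can be sketched with toy linear encoders (the weights and dimensions below are illustrative assumptions, not the paper's architecture):

```python
# Sketch of the three encoder strategies: SNG (shared trunk), IND
# (fully independent), IND-COND (E_z conditioned on E_y's output).
import numpy as np

rng = np.random.default_rng(3)
img = rng.normal(size=(1, 64))                 # toy flattened image
Wz = rng.normal(size=(64, 100))                # image -> z
Wy = rng.normal(size=(64, 5))                  # image -> y
Wzc = rng.normal(size=(64 + 5, 100))           # [image, y] -> z (conditioned)

# SNG: a single network with two output heads over the same input.
def sng(image):
    return image @ Wz, image @ Wy

z_sng, y_sng = sng(img)

# IND: E_z and E_y are trained and run independently (best per the poster).
z_ind, y_ind = img @ Wz, img @ Wy

# IND-COND: E_z additionally receives E_y's output as part of its input.
y_cond = img @ Wy
z_cond = np.concatenate([img, y_cond], axis=1) @ Wzc
```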
(F1-score: higher is better. Reconstruction loss: lower is better.)