Invertible Conditional GANs for image editing

Guim Perarnau, Joost van de Weijer*, Bogdan Raducanu*, Jose M. Álvarez†
* Computer Vision Center, Barcelona, Spain
† Data61 @ CSIRO, Canberra, Australia
[email protected], {joost,bogdan}@cvc.uab.es, [email protected]

Overview

Problem: Complex image editing often requires human supervision and professional image editing tools. How can we automate these complex operations?

Solution: We propose Invertible Conditional GANs (IcGANs), a model that combines a conditional GAN with an encoder.

How?
1. Generate realistic images via GANs.
2. Condition the generated images on attributes.
3. Encode real images so that they can be reconstructed with the desired changes.

4. Invertible Conditional GANs (IcGANs)

We can now combine the cGAN and the encoder to create an IcGAN. With the encoder, we can invert the generator and map an image into a latent representation (z, y). In this space we can arbitrarily change key aspects of the image and then reconstruct the modified image using the generator.

1. Generative Adversarial Networks (GANs)

[Figure: GAN scheme — the generator produces fake images, the discriminator classifies real vs. fake images, and both are updated via backpropagation.]

GANs are composed of two networks, a generator and a discriminator.
The generator is trained to fool the discriminator by creating realistic images, and the discriminator is trained not to be fooled by the generator.

2. Conditional GANs (cGANs)

With cGANs, we add to the model conditional information y that describes some aspect of the data. This allows us to control certain aspects of the generated images, e.g. generating a blonde woman with sunglasses. We refine cGANs by testing the optimal position at which y is inserted in the generator and the discriminator.

[Figure: network architectures — generator (Full conv 1–5, from a 100-dimensional input) and encoder (Conv 1–5).]

[Figure: scheme of an IcGAN and how it is used — the encoder maps a real input image to (z, y); editing the attribute change vector y (e.g. female, black hair, brown hair, make-up, sunglasses) and decoding with the generator produces the modified image. Columns: Input, Reconstruction, Swap y, for both cGAN and IcGAN.]

5. Results

One way to evaluate these models is to directly inspect how visually appealing the generated samples are. Here we show qualitative examples of what an IcGAN is capable of by playing with both the latent space z and the conditional information y. We fix z for every row and modify y for each column to obtain variations of real images.

[Figure: example of complex editing operations — Original, Blonde, Smile, Male, Bangs.]
[Figure: interpolations between faces.]
[Figure: swapping two face attributes.]

Code available! → https://github.com/Guim3/IcGAN

Acknowledgments

This work is funded by project TIN2013-41751-P of the Spanish Ministry of Science and the CHIST-ERA project PCIN-2015-226.

6. Conclusions

• We introduce IcGANs, which solve the problem of GANs lacking the ability to reconstruct real images, while also allowing explicit control over complex attributes of the generated samples.
• We refine the performance of cGANs by inserting the conditional information at the input level for the generator and at the first layer for the discriminator.
• We evaluate several approaches to training an encoder, with two independent encoders (IND) being the best option.
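As a concrete illustration of the conditioning strategy described in section 2, the following numpy sketch (not the authors' code; all shapes and attribute counts are hypothetical) shows one common way to inject y at the input of the generator and at the first layer of the discriminator:

```python
import numpy as np

def generator_input(z, y):
    """Concatenate the noise vector z and the attribute vector y at the
    generator input (the insertion point the poster finds best for G)."""
    return np.concatenate([z, y], axis=-1)           # (batch, nz + ny)

def discriminator_first_layer(feat, y):
    """Replicate y spatially and append it as extra channels to the
    discriminator's first-layer feature maps (the insertion point the
    poster finds best for D)."""
    b, c, h, w = feat.shape
    y_maps = np.broadcast_to(y[:, :, None, None], (b, y.shape[1], h, w))
    return np.concatenate([feat, y_maps], axis=1)    # (batch, c + ny, h, w)

z = np.random.randn(4, 100)          # noise vectors, 100-dim as in the figure
y = np.zeros((4, 18)); y[:, 3] = 1   # hypothetical 18 binary face attributes
feat = np.random.randn(4, 64, 32, 32)  # hypothetical first-layer feature maps

print(generator_input(z, y).shape)               # (4, 118)
print(discriminator_first_layer(feat, y).shape)  # (4, 82, 32, 32)
```

The broadcast-and-concatenate trick lets a spatially uniform attribute vector condition every location of a convolutional feature map without changing the rest of the architecture.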
[Chart: cGAN evaluation depending on the position at which y is inserted — F1-score (higher is better) for the discriminator and the generator, with y inserted at the input or at layers 1–4.]

3. Encoder

We then train an encoder to reconstruct real images. It is trained after the cGAN and is composed of two sub-encoders: Ez, which encodes an image to z, and Ey, which encodes an image to y. We test different strategies to make them interact and improve the encoding process:
• SNG: Ez and Ey are embedded in a single encoder.
• IND: Ez and Ey are trained separately.
• IND-COND: two independent encoders, where one is conditioned on the output of the other.

[Chart: encoder type comparison — reconstruction loss (lower is better) for SNG, IND and IND-COND.]
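The full IcGAN editing pipeline — encode a real image into (z, y), modify the attribute vector, and decode with the generator — can be sketched as below. This is a minimal illustration only: random linear maps stand in for the trained encoders Ez, Ey and generator G, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nz, ny, npix = 100, 18, 64 * 64 * 3      # latent, attribute, and image sizes

W_ez = rng.normal(size=(npix, nz)) * 0.01      # stand-in for encoder Ez
W_ey = rng.normal(size=(npix, ny)) * 0.01      # stand-in for encoder Ey
W_g  = rng.normal(size=(nz + ny, npix)) * 0.01 # stand-in for generator G

def encode(x):
    """Invert the generator: map an image to its latent representation (z, y)."""
    return x @ W_ez, x @ W_ey

def generate(z, y):
    """Decode a (z, y) pair back into an image."""
    return np.concatenate([z, y]) @ W_g

x = rng.normal(size=npix)     # a "real image", flattened
z, y = encode(x)              # 1. encode the real image

y_edit = y.copy()
y_edit[3] = 1.0               # 2. switch on one attribute (e.g. "blonde")

x_edit = generate(z, y_edit)  # 3. reconstruct with the desired change
print(x_edit.shape)           # (12288,)
```

Because z is kept fixed while y changes, the edited reconstruction preserves the identity-related content of the input and alters only the chosen attribute — the behaviour shown in the attribute-swap figures.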