Page 1
Semantic Image Synthesis with Spatially-Adaptive Normalization
Byeong-Sun Hong
2019-05-16
Computer Graphics @ Korea University
Copyright of figures and other materials in the paper belongs to original authors.
Taesung Park (UC Berkeley) et al. / CVPR 2019
Page 2
Byeong-sun Hong | 2019-05-16| # 2Computer Graphics @ Korea University
Title
• Semantic Image Synthesis
Semantic Segmentation Mask Image -> Photorealistic Image
• Spatially-Adaptive Normalization
SPatially Adaptive (DE)normalization
• SPADE
Page 3
Index
• Introduction
• Related Work
• Model
• Semantic Image Synthesis
• Experiments
• Conclusion
Page 5
Introduction
Champion Scene
Page 6
Introduction
• Techniques that use neural networks to generate photorealistic images have been appearing recently.
• Earlier methods give poor results because of the "wash away" effect, in which normalization layers erase the semantic input.
• Contribution
Applies Spatially-Adaptive Normalization to remove the wash-away effect and obtains better results than earlier methods.
The output can be controlled per segmentation and per style.
Page 8
Related Work
Deep Generative Model
• “Generative Adversarial Nets”
[Ian J. Goodfellow (Université de Montréal) et al. / NIPS 2014]
Page 9
Related Work
Conditional Image Synthesis
• labels
• Text to Image
• Image to Image
• Segmentation Mask to Image
Page 10
Related Work - Conditional Image Synthesis
labels
• “Conditional image synthesis with auxiliary classifier GANs”
[Augustus Odena (Google Brain) et al. / ICML 2017]
• “cGANs with projection discriminator”
[Takeru Miyato (Preferred Networks, Inc.) and Masanori Koyama / ICLR 2018]
Page 11
Related Work - Conditional Image Synthesis
Text to Image
• “Generative adversarial text to image synthesis.”
[Scott Reed(University of Michigan) et al. / ICML 2016]
• “Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks”
[Han Zhang (Rutgers Univ.) et al. / ICCV 2017]
Page 12
Related Work - Conditional Image Synthesis
Image to Image
• “Image to image translation with conditional adversarial networks”
[Phillip Isola(Berkeley AI Research) et al. / CVPR 2017]
• “Multimodal unsupervised image-to-image translation”
[Xun Huang(Cornell Univ.) et al. / ECCV 2018]
Page 13
Related Work - Conditional Image Synthesis
Segmentation Mask to Image(1/3)
• “Photographic Image Synthesis with Cascaded Refinement Networks”
[Qifeng Chen and Vladlen Koltun(Intel Labs) / ICCV 2017]
Page 14
Related Work - Conditional Image Synthesis
Segmentation Mask to Image(2/3)
• “Semi-parametric image synthesis”
[Xiaojuan Qi (CUHK) et al. / CVPR 2018]
SIMS cannot be applied to data with a large number of classes.
• It requires too much computation.
Page 15
Related Work - Conditional Image Synthesis
Segmentation Mask to Image(3/3)
• “High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs”
[Ting-Chun Wang (NVIDIA Corp.) et al. / CVPR 2018]
Page 16
Related Work
Normalization
• Unconditional Normalization Layers
Batch Normalization
Layer Normalization
Instance Normalization
Group Normalization
• Conditional Normalization Layers
Conditional Batch Normalization
Adaptive Instance Normalization
The methods are distinguished by how 𝛾 and 𝛽 are learned.
Page 17
Related Work - Normalization
Covariate Shift
• Even when the input is standardized to zero mean and unit variance, the distribution of each layer's input shifts as the network gets deeper, which makes training unstable.
Source: Google Images
Page 18
Related Work - Normalization
Problems of DNNs
• Vanishing and exploding gradients occur.
Source: Google Images
Page 19
Related Work - Normalization
Various Remedies
• Remedies used before normalization
Changing the activation function (ReLU, etc.)
Careful initialization
Small learning rate
• Instead of these indirect fixes, solve the problem by stabilizing the training process itself.
Normalization methods
Source: Google Images
Page 20
Related Work - Unconditional Normalization Layers
Batch Normalization(1/2)
• “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”
[Sergey Ioffe and Christian Szegedy (Google Inc.) / ICML 2015]
Page 21
Related Work - Unconditional Normalization Layers
Batch Normalization(2/2)
Why 𝛾 and 𝛽 (the scale and shift factors) exist: once normalization forces zero mean and unit variance, the activation function can end up operating only in its near-linear range, so its nonlinearity may be lost.
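The scale-and-shift step can be illustrated with a minimal NumPy sketch of training-mode batch normalization. The function name `batch_norm`, the `eps` constant, and the example values of `gamma` and `beta` are assumptions made for illustration; running statistics for inference are left out.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for an (N, C, H, W) array."""
    # Statistics are taken per channel, over the batch and spatial axes.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    # Per-channel scale (gamma) and shift (beta) let the layer move the
    # activations back out of the near-linear range when that helps.
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 3, 4, 4))
gamma = np.array([1.0, 2.0, 0.5])   # illustrative values, not learned
beta = np.array([0.0, 1.0, -1.0])
y = batch_norm(x, gamma, beta)
print(y.mean(axis=(0, 2, 3)))  # close to beta
print(y.std(axis=(0, 2, 3)))   # close to gamma
```

After the layer, each channel has mean close to `beta` and standard deviation close to `gamma`, so the network can undo the normalization if that is what training prefers.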
Page 22
Related Work - Unconditional Normalization Layers
Layer, Instance, Group Normalization
• “Layer Normalization”
[Jimmy Lei Ba et al. (Univ. of Toronto) / NIPS 2016]
• “Instance Normalization: The Missing Ingredient for Fast Stylization”
[Dmitry Ulyanov et al. (Computer Vision Group, Skoltech) / arXiv 2016]
• “Group Normalization”
[Yuxin Wu and Kaiming He (Facebook AI Research) / ECCV 2018]
N = number of samples in a batch
C = channels
(H, W) = height x width
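The four unconditional layers differ only in which of the N, C, H, W axes share one mean and variance. The sketch below makes that explicit; the function name `normalize`, the `kind` strings, and the default `groups=2` are hypothetical choices, and the learned scale and shift are omitted.

```python
import numpy as np

def normalize(x, kind, groups=2, eps=1e-5):
    """Normalize an (N, C, H, W) array; `kind` selects the shared axes."""
    n, c, h, w = x.shape
    if kind == "group":
        # Split channels into groups; statistics per sample and per group.
        g = x.reshape(n, groups, c // groups, h, w)
        mean = g.mean(axis=(2, 3, 4), keepdims=True)
        var = g.var(axis=(2, 3, 4), keepdims=True)
        return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    axes = {
        "batch": (0, 2, 3),     # per channel, shared across the batch
        "layer": (1, 2, 3),     # per sample, shared across channels
        "instance": (2, 3),     # per sample and per channel
    }[kind]
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 3, 3))
for kind in ("batch", "layer", "instance", "group"):
    print(kind, normalize(x, kind).shape)
```

With one group, group normalization reduces to layer normalization; with as many groups as channels, it reduces to instance normalization.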
Page 23
Related Work - Unconditional Normalization Layers
Unconditional Normalization
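Mini Batch
Batch size = 2
N = number of samples in a mini-batch = 4 / 2 = 2
C = number of RGB channels = 3
H, W = number of pixels = 2 x 2 = 4
Pixel colors in the figure: R = 255, G = 0, B = 0 and R = 0, G = 0, B = 0
Batch Norm: 2 images, 8 pixels in total; R, G, and B are each normalized separately.
Layer Norm: 1 image, 4 pixels in total; R, G, and B are normalized together.
Instance Norm: 1 image, 4 pixels in total; R, G, and B are each normalized separately.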
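The counts on this slide can be checked numerically. The sketch assumes that the mini-batch holds one all-red image and one all-black image; the figure's actual pixel layout is not recoverable from the text, so this arrangement is only an illustration.

```python
import numpy as np

# Mini-batch of N=2 images, C=3 channels (R, G, B), H x W = 2 x 2.
batch = np.zeros((2, 3, 2, 2))
batch[0, 0] = 255.0  # assumed: image 0 is all red, image 1 is all black

# Batch Norm: one mean per channel over N*H*W = 8 values.
batch_mean = batch.mean(axis=(0, 2, 3))   # per-channel means: 127.5, 0, 0
# Layer Norm: one mean per image over C*H*W = 12 values (4 pixels x RGB).
layer_mean = batch.mean(axis=(1, 2, 3))   # per-image means: 85, 0
# Instance Norm: one mean per image and channel over H*W = 4 values.
inst_mean = batch.mean(axis=(2, 3))       # image 0: 255, 0, 0; image 1: 0, 0, 0

print(batch_mean, layer_mean, inst_mean, sep="\n")
```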
Page 24
Related Work
Conditional Normalization Layers
• “A learned representation for artistic style”
[V. Dumoulin(Google Brain) et al. / ICLR 2017]
• “Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization”
[Xun Huang and Serge Belongie(Cornell Univ.) / ICCV 2017]
𝑥 = content input
𝑦 = style input
Proposes Conditional Batch Normalization
Proposes Adaptive Instance Normalization
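Adaptive Instance Normalization has no learned 𝛾 and 𝛽 of its own: it normalizes the content features 𝑥 and then rescales them with the per-channel statistics of the style features 𝑦. A minimal sketch, with the function name `adain` and the `eps` constant as assumptions:

```python
import numpy as np

def adain(x, y, eps=1e-5):
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y).

    x: content features (N, C, H, W); y: style features (N, C, H2, W2).
    Statistics are computed per sample and per channel.
    """
    mu_x = x.mean(axis=(2, 3), keepdims=True)
    sigma_x = x.std(axis=(2, 3), keepdims=True)
    mu_y = y.mean(axis=(2, 3), keepdims=True)
    sigma_y = y.std(axis=(2, 3), keepdims=True)
    return sigma_y * (x - mu_x) / (sigma_x + eps) + mu_y

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(1, 4, 8, 8))
style = rng.normal(5.0, 3.0, size=(1, 4, 6, 6))
out = adain(content, style)
print(out.mean(axis=(2, 3)))  # matches the style means
```

The output keeps the spatial layout of the content features but carries the channel-wise mean and standard deviation of the style features.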
Page 26
Model
Total Architecture
Page 27
Model
Generator, Discriminator, Encoder
Page 28
Model
SPADE ResBlk, SPADE
Page 29
Semantic Image Synthesis
Page 30
Semantic Image Synthesis
SPADE
m = semantic segmentation mask
𝐻, 𝑊, 𝐶 = height, width, channels
𝑁 = total data size / number of batches (samples per batch)
ℎ𝑖 = activation map output by the 𝑖-th layer
𝛾, 𝛽 = scaling and shift factor maps
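For reference, the SPADE operation that these symbols belong to can be written as below. The formula is not reproduced in the slide text, so this is a reconstruction from the definitions above; the per-channel mean \mu and standard deviation \sigma, and the indices n, c, y, x (sample, channel, row, column), are notation added here.

```latex
\gamma^{i}_{c,y,x}(m)\,
\frac{h^{i}_{n,c,y,x} - \mu^{i}_{c}}{\sigma^{i}_{c}}
+ \beta^{i}_{c,y,x}(m)

\mu^{i}_{c} = \frac{1}{N H^{i} W^{i}} \sum_{n,y,x} h^{i}_{n,c,y,x},
\qquad
\sigma^{i}_{c} = \sqrt{\frac{1}{N H^{i} W^{i}} \sum_{n,y,x}
  \left(h^{i}_{n,c,y,x}\right)^{2} - \left(\mu^{i}_{c}\right)^{2}}
```

The statistics are shared per channel, as in batch normalization, while 𝛾 and 𝛽 vary with the spatial position (y, x) because they are computed from the mask m.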
Page 31
Semantic Image Synthesis
SPADE Generator
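• With SPADE, the segmentation mask no longer has to be fed to the first layer of the generator.
Each SPADE ResBlk receives the mask information again.
Because no encoder part is needed, the number of parameters drops and time is saved.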
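How every block can receive the mask again is sketched below in NumPy. This is a simplified illustration, not the authors' implementation: 𝛾 and 𝛽 come from a single 1x1 linear map of the one-hot mask (the weights `w_gamma` and `w_beta` are hypothetical), and the helper names `one_hot` and `resize_nearest` are assumptions.

```python
import numpy as np

def one_hot(mask, num_classes):
    """(N, H, W) integer label map -> (N, num_classes, H, W) one-hot map."""
    return np.eye(num_classes)[mask].transpose(0, 3, 1, 2)

def resize_nearest(m, h, w):
    """Nearest-neighbor resize of an (N, K, H, W) map to (N, K, h, w)."""
    rows = np.arange(h) * m.shape[2] // h
    cols = np.arange(w) * m.shape[3] // w
    return m[:, :, rows][:, :, :, cols]

def spade(h, mask_1hot, w_gamma, w_beta, eps=1e-5):
    """h: (N, C, H, W) activations; w_gamma, w_beta: (C, K) 1x1 weights."""
    # Parameter-free normalization with per-channel batch statistics.
    mu = h.mean(axis=(0, 2, 3), keepdims=True)
    sigma = np.sqrt(h.var(axis=(0, 2, 3), keepdims=True) + eps)
    # The mask is resized to the current resolution, never normalized.
    m = resize_nearest(mask_1hot, h.shape[2], h.shape[3])
    gamma = np.einsum("ck,nkhw->nchw", w_gamma, m)  # spatially varying scale
    beta = np.einsum("ck,nkhw->nchw", w_beta, m)    # spatially varying shift
    return gamma * (h - mu) / sigma + beta

rng = np.random.default_rng(0)
mask = np.zeros((1, 4, 4), dtype=int)
mask[:, :, 2:] = 1                    # left half: class 0, right half: class 1
m = one_hot(mask, num_classes=2)
w_gamma = np.ones((3, 2))             # gamma = 1 for both classes
w_beta = np.array([[0.0, 10.0]] * 3)  # beta = 0 for class 0, 10 for class 1

for size in (2, 4):                   # the same mask feeds every scale
    h = rng.normal(size=(1, 3, size, size))
    print(size, spade(h, m, w_gamma, w_beta).shape)
```

Because the mask enters at every resolution through `gamma` and `beta`, the generator input itself can be something other than the mask, which is why no mask encoder is required.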
Page 32
Semantic Image Synthesis
Why does SPADE work better?
• Passing through normalization tends to wash away the semantic information.
If the input consists of a single uniform mask, normalization turns it into all zeros.
• SPADE does not normalize the segmentation mask, so the information in the data survives.
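The wash-away argument can be reproduced in a few lines. The sketch assumes that a convolution over a single-label mask yields a spatially uniform map whose value depends on the label (3.0 and 7.0 are made-up values for two labels, named `grass` and `sky` only for illustration), and it uses instance normalization for the demonstration.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) plane to zero mean, unit variance."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Uniform feature maps standing in for two different single-label masks.
grass = np.full((1, 1, 4, 4), 3.0)
sky = np.full((1, 1, 4, 4), 7.0)

# After normalization both are all zeros: the label is washed away.
print(instance_norm(grass).max(), instance_norm(sky).max())

def modulate(h, gamma, beta):
    """SPADE-style: normalize the activations, then modulate them with
    gamma and beta that come from the un-normalized mask."""
    return gamma * instance_norm(h) + beta

h = np.random.default_rng(0).normal(size=(1, 1, 4, 4))
out_grass = modulate(h, gamma=1.0, beta=3.0)
out_sky = modulate(h, gamma=1.0, beta=7.0)
print(np.abs(out_sky - out_grass).min())  # the labels remain distinguishable
```

Normalizing the mask features directly makes the two labels indistinguishable, while injecting the label through the modulation parameters keeps them apart.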
Page 33
Semantic Image Synthesis
Multi Modal Synthesis
• A random vector is used as the input to the generator.
• Changing the random vector slightly produces a different result.
Page 35
Experiments
Implementation Details
• Training time
Depends on the dataset, roughly 100 to 200 epochs
• Learning rate
Generator = 0.0001, Discriminator = 0.0004
• Hardware
NVIDIA DGX1
Source: NVIDIA website
Page 36
Experiments
Datasets
• COCO-Stuff: 118,000 training images, 5,000 validation images, 182 classes
• ADE20K: 20,210 training images, 2,000 validation images, 150 classes
• ADE20K-outdoor: subset of the ADE20K dataset that contains only outdoor scenes
• Cityscapes: street scene images from German cities; 3,000 training images, 500 validation images
• Flickr Landscapes: landscape photos collected from Flickr, with segmentation masks generated by DeepLab v2; 40,000 training images, 1,000 validation images, 182 classes
Page 37
Experiments
Performance Metrics
• Evaluation method
Each generated image is segmented again by a well-trained semantic segmentation model, and the prediction is compared with the ground truth.
Segmentation Model
• COCO-Stuff : DeepLabV2
• ADE20K : UperNet 101
• Cityscapes : DRN-D-105
• Metrics
mIoU - Mean Intersection-over-Union
accu - Pixel Accuracy
FID - Fréchet Inception Distance
• Baselines - CRN, SIMS, pix2pixHD
For a fair comparison, CRN and pix2pixHD were obtained directly from their authors.
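The two segmentation-based scores can be computed from a confusion matrix between the ground-truth label map and the label map predicted on the generated image. A minimal sketch with hypothetical helper names and a tiny 2 x 2 example:

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    idx = gt.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted label equals the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Average of intersection / union over the classes that appear."""
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0
    return (inter[valid] / union[valid]).mean()

gt = np.array([[0, 0], [1, 1]])     # ground-truth labels
pred = np.array([[0, 1], [1, 1]])   # labels predicted on the generated image
cm = confusion_matrix(gt, pred, num_classes=2)
print(pixel_accuracy(cm))  # 3 of 4 pixels agree
print(mean_iou(cm))        # mean of 1/2 (class 0) and 2/3 (class 1)
```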
Page 38
Experiments
Quantitative Comparisons
• mIoU - Mean Intersection-over-Union
• accu – Pixel Accuracy
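• FID – Fréchet Inception Distance
$$\mathrm{FID} = \lVert m - m_g \rVert_2^2 + \mathrm{Tr}\left(C + C_g - 2\,(C\,C_g)^{1/2}\right)$$
m = mean of the real data distribution
C = covariance of the real data distribution
m_g = mean of the generated data distribution
C_g = covariance of the generated data distribution
Tr = trace
Source: Google Images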
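The FID expression can be evaluated directly from two sets of feature vectors. The sketch below uses random vectors as stand-ins; in practice the features come from an Inception network, which is not included here. The function name `fid` and the eigenvalue route to the matrix square-root trace are choices made for this illustration.

```python
import numpy as np

def fid(real_feats, gen_feats):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each argument is a (num_samples, dim) array of feature vectors.
    """
    m, m_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    c = np.cov(real_feats, rowvar=False)
    c_g = np.cov(gen_feats, rowvar=False)
    # Tr((C C_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of C C_g, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(c @ c_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(np.sum((m - m_g) ** 2)
                 + np.trace(c) + np.trace(c_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 4))   # placeholder features
print(fid(feats, feats))            # identical sets: approximately 0
print(fid(feats, feats + 2.0))      # pure mean shift: approximately 16
```

A lower value means the generated feature distribution is closer to the real one; a pure mean shift contributes only the squared distance between the means.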
Page 39
Experiments
Qualitative Results(1/3)
Page 40
Experiments
Qualitative Results(2/3)
• Results after training on the Flickr Landscapes dataset
Page 41
Experiments
Qualitative Results(3/3)
• Results after training on the COCO dataset
Page 42
Experiments
Human Evaluation
• Amazon Mechanical Turk (AMT)
A crowdsourced survey platform provided by Amazon
• Each question shows two results together with the segmentation mask and is answered by 5 people.
• 500 results are presented to compare the methods.
Source: AMT
Page 43
Experiments
The Effectiveness of SPADE
• ++ (pix2pixHD++) - every available improvement except SPADE
• Concat - the segmentation mask is concatenated to the input channel-wise
• Compact - the number of filters is reduced
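For contrast with SPADE's modulation, the Concat variant can be pictured as appending the one-hot mask to a feature map along the channel axis. The shapes and class count below are made up for illustration.

```python
import numpy as np

features = np.zeros((1, 8, 4, 4))        # (N, C, H, W) layer input
mask = np.array([[[0, 0, 1, 1]] * 4])    # (N, H, W) label map, values 0..2
num_classes = 3

# One-hot encode the labels and move the class axis to the channel slot.
mask_1hot = np.eye(num_classes)[mask].transpose(0, 3, 1, 2)  # (1, 3, 4, 4)

# Channel-wise concatenation: C grows from 8 to 8 + num_classes.
concat = np.concatenate([features, mask_1hot], axis=1)
print(concat.shape)
```

The mask channels then pass through the same convolution and normalization as the other features, which is where the wash-away effect can occur.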
Page 44
Experiments
Variations of SPADE Generator
• Generator performance is measured under several variations.
Input to the Generator, Kernel Size, Number of Filters, Normalization Method
Page 45
Experiments
Multi-Modal Synthesis
• The same segmentation mask can produce different appearances when different random noise inputs are used.
Page 46
Experiments
Final Results
• The model can be used by editing both the style and the segmentation map.
Page 48
Conclusion
• SPADE produces good results across a wide variety of photos.
• The output can be changed through the segmentation mask and the style.
Page 49
Conclusion
Demo Video - GauGAN(1/3)
• The GauGAN software is named after Gauguin.
Page 50
Conclusion
Demo Video - GauGAN(2/3)
Page 51
Conclusion
Demo Video - GauGAN(3/3)
Page 52
Q & A
Page 53
Backpropagation
Page 54
Residual Block
Page 55
Experiments
Segmentation Model
• DeepLabV2
“DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs”
[Liang-Chieh Chen (Google Inc.) et al. / TPAMI 2018]
Page 56
Experiments
Segmentation Model
• UperNet101
“Unified Perceptual Parsing for Scene Understanding”
[Tete Xiao (Peking University) et al. / ECCV 2018]
Page 57
Experiments
Segmentation Model
• DRN-D-105
“Dilated Residual Networks”
[Fisher Yu (Princeton University) et al. / CVPR 2017]
Page 58
Additional Results
Page 59
Additional Results
Page 60
Additional Results