Sky is Not the Limit: Semantic-Aware Sky Replacement
Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Ming-Hsuan Yang, Kalyan Sunkavalli
ACM Transactions on Graphics (SIGGRAPH), 2016
Motivation
Goal: automatically segment the sky and replace it with skies of different styles
Example Results
Challenges
• Manually edit the sky using Photoshop
[Figure: input image and reference; manual edits taking 5 mins and 30 mins]
We need a good segmentation algorithm!
[Figure: input image and reference; a result whose colors are not matched vs. professional editing]
We need image harmonization!
System
Input Image → Sky Segmentation → Sky Search (Reference Images) → Sky Replacement → Results
Sky Segmentation
Input Image → Sky Segmentation
Literature
• Sky/non-sky classifier [Tao et al. SIGGRAPH’09]
• Scene parsing [Long et al. CVPR’15]
• Online refinement [Rother et al. SIGGRAPH’04]
Challenges
• Sky appearance varies widely
  • Skylines/landscapes, clouds, lighting conditions
• Need accurate sky boundaries
Sky Search
Input Image → Sky Search → Reference Images 1, 2, 3
Literature
• GIST [Hays and Efros SIGGRAPH’07, Liu et al. CGF’14]
  • Only consider global scene layout
  • Need a large database
Challenges
• Search compatible images
• Account for image content
Sky Replacement
Input Image → Sky Replacement
Literature
• Global transfer [Reinhard et al. 2001, Tao et al. SIGGRAPH’09]
  • Image contents are not considered
  • Less realistic results
• Local transfer [Wu et al. CGF’13, Laffont et al. SIGGRAPH’14]
  • Boundary artifacts
  • Rely on filters for smoothing
Challenges
• Transfer foreground appearance
• Account for image content
Semantic-Aware System
Input Image → Sky Segmentation → Sky Search (Reference Images) → Sky Replacement → Results
Fully Convolutional Networks
[Figure: scene parsing labels (Fg, Road, Building, Sky, Tree) and per-class semantic response maps (Sky, ..., Building, Road)]
Fully Convolutional Networks (FCN) [Long et al. CVPR’15]
• End-to-end model
• Pixel-wise segmentation
• Fine-tune with 11 scene labels
• Semantic response map
Sky Segmentation
Input Image → Scene Parsing (Fully Convolutional Networks) → Online Refinement
Conditional Random Field optimization
• Online models: color, texture
• Semantic response (sky/non-sky)
• Pairwise term: magnitude of gradient
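To make the refinement step concrete, here is a minimal sketch of the kind of energy such a CRF could minimize. It is an illustration under stated assumptions, not the formulation from the paper: the function name `crf_energy`, the parameters `lam` and `beta`, and the contrast-sensitive pairwise weight exp(-beta * diff^2) are all hypothetical choices, and only the semantic response is used as a unary term (the online color and texture models are left out).

```python
import numpy as np

def crf_energy(labels, sky_prob, gray, lam=1.0, beta=10.0, eps=1e-6):
    """Energy of a binary sky labeling (1 = sky) on a 4-connected grid.

    labels:   (H, W) array of 0/1 labels
    sky_prob: (H, W) semantic response for sky, in [0, 1]
    gray:     (H, W) grayscale image, used for the gradient-based pairwise term
    lam, beta, eps are hypothetical parameters chosen for illustration.
    """
    labels = labels.astype(bool)
    p = np.clip(sky_prob, eps, 1.0 - eps)
    # Unary term: negative log of the response for the chosen label.
    unary = np.where(labels, -np.log(p), -np.log(1.0 - p)).sum()
    pairwise = 0.0
    for axis in (0, 1):
        diff = np.diff(gray, axis=axis)                    # neighbor intensity difference
        cut = np.diff(labels.astype(int), axis=axis) != 0  # neighbors with different labels
        # A label change across a strong gradient is cheap; across a flat area it is costly.
        pairwise += (np.exp(-beta * diff ** 2) * cut).sum()
    return unary + lam * pairwise
```

A labeling whose boundary follows strong image gradients and agrees with the semantic response gets a low energy. A real system would hand these terms to a solver such as graph cuts rather than score labelings one by one.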
Sky Segmentation Results
[Figure: input image, FCN results, DeepLab results [Chen et al. ICLR’15], and our results]
Sky Search
Input Image → Sky Search over Sky Image Database (415 Images) → Reference Images
Semantic Layout Descriptor
• Account for local layouts
• Utilize semantic responses
Check Sky Properties
• Prevent large distortions
  • Aspect ratio
  • Resolution
• Ensure sky diversity
  • Color similarity
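The distortion check can be sketched as a simple filter on candidate skies. This is only an illustration: the function name `passes_sky_checks` and both thresholds are hypothetical, and the diversity check based on color similarity is not covered here.

```python
def passes_sky_checks(input_sky_wh, ref_sky_wh, max_aspect_change=2.0, max_upscale=2.0):
    """Reject a reference sky that would need heavy stretching or upsampling.

    input_sky_wh, ref_sky_wh: (width, height) of the sky regions in pixels.
    Both thresholds are hypothetical values chosen for illustration.
    """
    in_w, in_h = input_sky_wh
    ref_w, ref_h = ref_sky_wh
    # Ratio between the two aspect ratios, folded so it is always >= 1.
    aspect_change = (in_w / in_h) / (ref_w / ref_h)
    aspect_change = max(aspect_change, 1.0 / aspect_change)
    # Largest factor by which the reference sky must be enlarged along either axis.
    upscale = max(in_w / ref_w, in_h / ref_h)
    return aspect_change <= max_aspect_change and upscale <= max_upscale
```

A reference whose sky is much narrower or much smaller than the input sky is filtered out before alignment, since re-scaling it would distort or blur the clouds.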
Semantic Layout Descriptor
[Figure: input image and its semantic response maps (Sky, ..., Building, Road)]
Semantic Responses
• Pixel-wise responses
• Range from 0 to 1
Average pooling on spatial pyramids
• Global pooling
• Local contents (3x3 grids)
[Figure: pooled responses of each class concatenated into the descriptor]
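The pooling described above can be sketched in a few lines of NumPy. This is a minimal version assuming a two-level pyramid (one global cell plus a 3x3 grid) and a plain concatenation. The function name `semantic_layout_descriptor` and the array layout are assumptions made for illustration.

```python
import numpy as np

def semantic_layout_descriptor(responses, grid=3):
    """Average-pool semantic responses over a global cell and a grid x grid layout.

    responses: (num_classes, H, W) array of per-pixel responses in [0, 1],
               with H and W at least `grid` so that no cell is empty.
    Returns a vector of length num_classes * (1 + grid * grid).
    """
    num_classes, height, width = responses.shape
    # Level 0: one global average per class.
    features = [responses.mean(axis=(1, 2))]
    # Level 1: one average per class inside each grid cell.
    row_edges = np.linspace(0, height, grid + 1).astype(int)
    col_edges = np.linspace(0, width, grid + 1).astype(int)
    for r in range(grid):
        for c in range(grid):
            cell = responses[:, row_edges[r]:row_edges[r + 1],
                             col_edges[c]:col_edges[c + 1]]
            features.append(cell.mean(axis=(1, 2)))
    return np.concatenate(features)
```

The global cell records how much of each class the image contains, while the grid cells record where it sits, so two images whose sky covers a similar region end up with similar descriptors.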
Sky Replacement
Input Image + Reference Images → Sky Alignment → Semantic-aware Transfer
Sky Alignment
• Extract complete sky regions from reference images
• Re-scale and paste on the input image
Semantic-aware Transfer
• Adjust foreground appearance
• Account for semantic regions
Semantic-aware Transfer
Direct local transfer [Laffont et al. SIGGRAPH’14]
• Match corresponding semantic regions
• Boundary artifacts
• Hard assignment: Wn(x) = 1 or 0
Propose a soft mapping method
• Utilize semantic responses as weights Wn(x) for each category n
[Figure: input image, scene parsing, direct local transfer, and soft mapping, with transfer functions T1(x) and T2(x) on two regions]
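One way to read the soft mapping is as a per-pixel weighted blend of the category transfers, with the semantic responses as weights. The sketch below assumes exactly that form and normalizes the weights to sum to one at each pixel. The normalization, the function name `soft_mapping`, and the callable interface for each Tn are assumptions, not details taken from the slides.

```python
import numpy as np

def soft_mapping(image, responses, transfers, eps=1e-8):
    """Blend per-category transfers with per-pixel semantic weights.

    image:     (H, W, C) float array
    responses: (N, H, W) array of semantic responses W_n(x) in [0, 1]
    transfers: list of N callables, each mapping an (H, W, C) image to an
               (H, W, C) image (the transfer function T_n)
    """
    # Normalize so the weights of all categories sum to (almost) one per pixel.
    weights = responses / (responses.sum(axis=0, keepdims=True) + eps)
    out = np.zeros_like(image, dtype=float)
    for w_n, t_n in zip(weights, transfers):
        out += w_n[..., None] * t_n(image)
    return out
```

With responses that are exactly 1 or 0 this reduces to direct local transfer. With fractional responses near region borders, the output moves gradually from one transfer to the other, which is what suppresses boundary artifacts.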
Transfer Functions
Transfer functions Tn(x) for each category n
• Transfer luminance and color
Luminance
• Shift mean
Color
• Matched regions: chrominance
  • Histogram matching [Lee et al. CVPR’16]
• Non-matched regions: color temperature
  • Consider entire foreground
  • More conservative
Not all the semantic regions are matched!
[Figure: regions with transfer functions T1(x) and T2(x), and an unmatched region marked "?"]
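The two transfer operations for matched regions can be sketched as follows. This is textbook mean shifting and CDF-based histogram matching on a single channel, offered as an illustration and not necessarily the variant used in the cited work. The function names and the convention of passing the pixels of one region as an array are assumptions.

```python
import numpy as np

def shift_mean(source, reference):
    """Shift source values so that their mean equals the reference mean."""
    return source + (reference.mean() - source.mean())

def match_histogram(source, reference):
    """Remap source values so their empirical CDF follows the reference CDF."""
    src_values, src_inverse, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_inverse].reshape(source.shape)
```

In this reading, `shift_mean` would be applied to the luminance channel of a region and `match_histogram` to each chrominance channel, using the pixels of the corresponding semantic region in the reference image.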
Sky Replacement Results
[Figure: input images with their sky replacement results]
Sky Replacement with User Preference
[Figure: input images, the preferred sky, and the sky replacement results]
Comparisons to Other Methods
Comparisons of different search methods
Comparisons of different transfer methods
Limitation
Light reflections
Conclusions
• Automatic sky replacement results can be realistic
• New sky image database
• Semantics helps a lot
  • Sky segmentation
  • Sky image search
  • Appearance transfer
• Apply semantics to other tasks
  • Scene completion
  • Photo and video re-coloring
Summary of my Other Projects: Visual Object Recognition
Joint Object Classification and Segmentation [BMVC’13]
• How do segmentation and classification help each other?
Class-specific Object Segmentation Hypotheses [ICCV’13]
• How to utilize exemplars to gain more information during learning and inference?
Image Retrieval [ICIP’14]
• Compute label similarities to bridge semantic gaps
Exemplar-based Object Detection [CVPR’15]
• Discover representative exemplars to build models
• Region-based feature extraction and model learning
Image (Object) Recognition
• Classification
• Segmentation
• Retrieval
• Detection
Video Object Recognition
• Object (Co-)segmentation
• Scene (Co-)parsing
Video Segmentation via Object Flow [CVPR’16]
• How do segmentation and optical flow help each other?
• Segmentation: multi-scale, spatio-temporal graphical model
• Optical flow: use segmentation to refine boundaries
• Iteratively solve the joint model
Semantic Co-segmentation in Videos (submitted to ECCV’16)
• Temporal-consistent object tracklets
• Relations between objects from a collection of videos
Ongoing and future work
• Scene Parsing via Deep CNNs
  • Attention to small objects
  • Label co-occurrence
• Video Scene Co-parsing
  • Weakly-supervised: video tags
  • Use image-based classifier
Object Segmentation
[Figure: object segmentation examples with scores: 96.4 (MCL: 74.4), 93.3 (PMCut: 59.1), 94.4 (MCL: 53.0), 83.6 (PMCut: 47.3), 93.5 (PMCut: 26.6), 89.2 (MCL: 65.3), 73.8 (PMCut: 58.0), 86.9 (PMCut: 68.0)]
Object Detection
Video Object Segmentation
[Figure: segmentation, initial optical flow, and updated optical flow]
Joint Object Classification and Segmentation [BMVC’13]
Object Segmentation [ICCV’13]
Image Retrieval [ICIP’14]
Object Detection [CVPR’15]
Video Object Segmentation [CVPR’16]
Sky Replacement [SIGGRAPH’16]
Semantic Co-segmentation in Videos (submitted to ECCV’16)
Video Scene Co-parsing (ongoing)
Image (Object) Recognition via Exemplars
• Classification
• Segmentation
• Retrieval
• Detection
Video Object Recognition: Temporal + CNN
• Object (Co-)segmentation
• Scene (Co-)parsing
Image/Video Editing
• Background/Object Replacement
• Scene Completion
• Re-coloring
Semantic Information
My homepage: https://sites.google.com/site/yihsuantsai/
Thank you!