Semantic-Aware Sky Replacement (SIGGRAPH 2016)

Sky is Not the Limit: Semantic-Aware Sky Replacement

Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Ming-Hsuan Yang, Kalyan Sunkavalli

ACM Transactions on Graphics (SIGGRAPH), 2016

Motivation

Goal: automatically segment the sky and replace it with skies of different styles

Example Results


Challenges
• Manually editing the sky using Photoshop
[Figure: Input Image and Reference; panels labeled "5 mins" and "30 mins"]
We need a good segmentation algorithm!

Challenges
• Manually editing the sky using Photoshop
[Figure: Input Image and Reference; "Colors are not matched" vs. "Professional editing"]
We need image harmonization!

System
[Pipeline diagram: Input Image, Sky Segmentation, Sky Search (Reference Images), Sky Replacement, Results]

Sky Segmentation
[Diagram: Input Image, Sky Segmentation]

Literature
• Sky/non-sky classifier [Tao et al. SIGGRAPH’09]
• Scene parsing [Long et al. CVPR’15]
• Online refinement [Rother et al. SIGGRAPH’04]

Challenges
• Sky appearance varies widely
  • skylines/landscapes, clouds, lighting conditions
• Need accurate sky boundaries

Sky Search
[Diagram: Input Image, Sky Search, Reference Images]

Literature
• GIST [Hays and Efros SIGGRAPH’07, Liu et al. CGF’14]
  • Only considers global scene layout
  • Needs a large database

Challenges
• Search for compatible images
• Account for image content
[Figure: Reference Image 1, Reference Image 2, Reference Image 3]

Sky Replacement
[Diagram: Input Image, Sky Replacement]

Literature
• Global transfer [Reinhard et al. 2001, Tao et al. SIGGRAPH’09]
  • Image contents are not considered
  • Less realistic results
• Local transfer [Wu et al. CGF’13, Laffont et al. SIGGRAPH’14]
  • Boundary artifacts
  • Relies on filters for smoothing

Challenges
• Transfer foreground appearance
• Account for image content

Semantic-Aware System
[Pipeline diagram: Input Image, Sky Segmentation, Sky Search (Reference Images), Sky Replacement, Results]

Fully Convolutional Networks
[Figure: Scene Parsing output with labels Fg, Road, Building, Sky, Tree; Semantic Response maps for Sky, Building, Road, . . .]

Fully Convolutional Networks (FCN) [Long et al. CVPR’15]
• End-to-end model
• Pixel-wise segmentation
• Fine-tuned with 11 scene labels
• Semantic response map
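A minimal sketch of how a semantic response map could be derived from per-pixel class scores. It assumes the network emits one score per label per pixel and that responses come from a softmax over labels; the slides do not state this, and the function name and random scores are illustrative only.

```python
import numpy as np

def semantic_responses(scores):
    """Per-pixel softmax over class scores of shape (H, W, C).

    Assumption: responses are softmax probabilities; the talk only says
    they are pixel-wise maps, one per scene label.
    """
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 6, 11))   # stand-in for FCN output, 11 scene labels
responses = semantic_responses(scores)  # one response map per label
labels = responses.argmax(axis=-1)      # hard scene-parsing result
```

Taking the argmax gives the hard parsing result, while the per-label maps keep the soft information that later stages can reuse.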

Sky Segmentation
[Pipeline diagram: Input Image, Scene Parsing (Fully Convolutional Networks), Online Refinement]

Conditional Random Field optimization
• Online models: color, texture
• Semantic response (sky/non-sky)
• Pairwise term: magnitude of gradient
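The refinement step can be pictured as scoring a binary sky/non-sky labeling with an energy. The sketch below is an assumption-laden illustration, not the authors' formulation: the unary term uses only the semantic response (the online color and texture models are omitted), the pairwise term is a common contrast-sensitive penalty based on gradient magnitude, and `lam` and `beta` are hypothetical parameters. Minimizing the energy (for example with graph cuts) is not shown.

```python
import numpy as np

def crf_energy(labels, sky_prob, gray, lam=1.0, beta=10.0):
    """Energy of a binary labeling (1 = sky) on a 4-connected grid.

    labels:   (H, W) int array of 0/1
    sky_prob: (H, W) semantic response for sky, in [0, 1]
    gray:     (H, W) grayscale image, in [0, 1]
    """
    eps = 1e-6
    p = np.clip(sky_prob, eps, 1.0 - eps)
    # Unary: negative log-likelihood of the chosen label.
    unary = np.where(labels == 1, -np.log(p), -np.log(1.0 - p)).sum()
    # Pairwise: label changes are cheap across strong gradients, costly in flat areas.
    pairwise = 0.0
    for axis in (0, 1):
        grad = np.abs(np.diff(gray, axis=axis))
        differ = np.diff(labels, axis=axis) != 0
        pairwise += (np.exp(-beta * grad ** 2) * differ).sum()
    return unary + lam * pairwise

gray = np.zeros((4, 6))
gray[:, 3:] = 1.0                        # bright "sky" on the right half
sky_prob = np.where(gray == 1.0, 0.9, 0.1)
aligned = (gray == 1.0).astype(int)      # boundary sits on the image edge
shifted = aligned.copy()
shifted[:, 2] = 1                        # boundary pushed into a flat region
```

In this toy case the labeling whose boundary follows the image edge has the lower energy.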

Sky Segmentation Results
[Figure: Input Image, FCN Results, DeepLab Results [Chen et al. ICLR’15], Our Results]

Sky Search
[Diagram: Input Image, Sky Search, Sky Image Database (415 Images)]



Sky Search
[Diagram: Input Image, Sky Search, Reference Images]

Semantic Layout Descriptor
• Account for local layouts
• Utilize semantic responses

Check Sky Properties
• Prevent large distortions
  • Aspect ratio
  • Resolution
• Ensure sky diversity
  • Color similarity
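A rough sketch of the distortion check, assuming each sky region is summarized by the width and height of its bounding box. The thresholds `max_ratio_change` and `min_scale` are hypothetical; the slides name the criteria (aspect ratio, resolution) but give no values. The diversity check via color similarity is not covered here.

```python
def aspect_ratio(width, height):
    return width / height

def passes_property_check(input_sky, ref_sky, max_ratio_change=1.5, min_scale=0.5):
    """Decide whether a reference sky can be re-scaled onto the input sky region.

    input_sky, ref_sky: (width, height) of each sky bounding box, in pixels.
    Both thresholds are illustrative assumptions.
    """
    r_in = aspect_ratio(*input_sky)
    r_ref = aspect_ratio(*ref_sky)
    # Aspect ratio: reject references that would be stretched too much.
    ratio_change = max(r_in / r_ref, r_ref / r_in)
    # Resolution: reject references far smaller than the target region.
    scale = min(ref_sky[0] / input_sky[0], ref_sky[1] / input_sky[1])
    return ratio_change <= max_ratio_change and scale >= min_scale

ok = passes_property_check((400, 200), (800, 400))         # same shape, higher resolution
stretched = passes_property_check((400, 200), (200, 400))  # very different aspect ratio
tiny = passes_property_check((400, 200), (100, 50))        # too small to upscale
```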


Semantic Layout Descriptor
[Figure: Input Image and its semantic response maps (Sky, Building, Road, . . .)]

Semantic Responses
• Pixel-wise responses
• Range from 0 to 1

Average pooling on spatial pyramids
• Global pooling
• Local contents (3x3 grids)
[Figure: pooled values from each response map are collected into the Descriptor]
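The pooling described above can be sketched as follows. This assumes the descriptor is the concatenation of the global average and the 3x3 grid averages of every response map, and that references are ranked by Euclidean distance; the concatenation order, the distance metric, and both function names are assumptions rather than details taken from the slides.

```python
import numpy as np

def layout_descriptor(responses, grid=3):
    """Average-pool (H, W, C) semantic responses globally and on a grid x grid layout.

    Returns a vector of length C * (1 + grid * grid). Requires H, W >= grid.
    """
    h, w, _ = responses.shape
    parts = [responses.mean(axis=(0, 1))]          # global pooling
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    for r in rows:
        for c in cols:
            cell = responses[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
            parts.append(cell.mean(axis=(0, 1)))   # local pooling for one grid cell
    return np.concatenate(parts)

def rank_references(query_desc, database_descs):
    """Indices of database entries, nearest first (Euclidean distance assumed)."""
    dists = np.linalg.norm(database_descs - query_desc, axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(0)
maps = [rng.random((6, 9, 11)) for _ in range(3)]  # stand-ins for response maps
database = np.stack([layout_descriptor(m) for m in maps])
query = layout_descriptor(maps[1])
order = rank_references(query, database)
```

With 11 labels and a 3x3 grid the descriptor has 110 entries, so search over a few hundred images stays cheap.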

Sky Replacement
[Pipeline diagram: Input Image and Reference Images, Sky Alignment, Semantic-aware Transfer]

Sky Alignment
• Extract complete sky regions from reference images
• Re-scale and paste onto the input image

Semantic-aware Transfer
• Adjust foreground appearance
• Account for semantic regions
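A simplified sketch of the re-scale-and-paste step. It assumes the extracted reference sky is resized to the full input resolution with nearest-neighbor sampling and composited through the sky mask; the actual alignment in the paper may fit the sky to its region differently, and both helper names are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, 3) image."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

def paste_sky(input_img, sky_mask, ref_sky):
    """Replace masked pixels of input_img with the re-scaled reference sky.

    sky_mask may be binary or a soft matte in [0, 1].
    """
    h, w = input_img.shape[:2]
    sky = resize_nearest(ref_sky, h, w)
    alpha = sky_mask[..., None].astype(float)
    return alpha * sky + (1.0 - alpha) * input_img

input_img = np.zeros((4, 6, 3))
sky_mask = np.zeros((4, 6))
sky_mask[:2] = 1.0                       # top two rows are sky
ref_sky = np.full((2, 3, 3), 0.5)        # small reference sky patch
composite = paste_sky(input_img, sky_mask, ref_sky)
```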

Semantic-aware Transfer

Direct local transfer [Laffont et al. SIGGRAPH’14]
• Match corresponding semantic regions
• Boundary artifacts

Propose a soft mapping method
• Utilize semantic responses as weights for each category n
[Figure: Input image, Scene parsing, Direct local transfer (Wn (x) = 1 or 0), Soft mapping; regions labeled with transfer functions T1 (x), T2 (x)]
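One way to read the soft mapping idea is as a per-pixel weighted sum of the category transfer functions. The sketch below assumes the output at pixel x is the sum over n of Wn (x) times Tn applied to the input, with weights normalized to sum to one; the exact formula is not recoverable from the slides, so treat this form as an assumption.

```python
import numpy as np

def soft_mapping(image, weights, transfers):
    """Blend per-category transfer functions with per-pixel weights.

    image:     (H, W) channel to adjust
    weights:   (H, W, N) semantic responses used as weights Wn(x)
    transfers: list of N callables Tn, each mapping an array to an array
    """
    total = np.clip(weights.sum(axis=-1, keepdims=True), 1e-8, None)
    norm = weights / total
    out = np.zeros_like(image, dtype=float)
    for n, transfer in enumerate(transfers):
        out += norm[..., n] * transfer(image)
    return out

image = np.full((2, 4), 0.5)
transfers = [lambda x: x + 0.2, lambda x: x - 0.1]   # toy T1, T2

hard = np.zeros((2, 4, 2))
hard[:, :2, 0] = 1.0                                  # left half: category 1
hard[:, 2:, 1] = 1.0                                  # right half: category 2
direct = soft_mapping(image, hard, transfers)         # Wn(x) = 1 or 0

soft = np.full((2, 4, 2), 0.5)                        # equal responses everywhere
blended = soft_mapping(image, soft, transfers)
```

With weights restricted to 1 or 0 this reduces to direct local transfer, which is where the hard region boundaries come from; fractional weights blend the two transfers smoothly.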

Transfer Functions

Transfer functions Tn (x) for each category n
• Transfer luminance and color
[Figure: regions labeled T1 (x), T2 (x)]

Luminance
• Shift mean
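A minimal sketch of a mean-shift luminance transfer for one semantic category. It assumes the luminance channel is offset so that the category's mean in the input matches its mean in the reference; the masks and function name are illustrative, and no clipping to a valid range is applied here.

```python
import numpy as np

def shift_mean(lum_in, mask_in, lum_ref, mask_ref):
    """Offset input luminance so the masked mean matches the reference's masked mean."""
    offset = lum_ref[mask_ref].mean() - lum_in[mask_in].mean()
    return lum_in + offset

lum_in = np.array([[0.2, 0.4], [0.9, 0.9]])
lum_ref = np.array([[0.5, 0.7], [0.1, 0.1]])
mask_in = np.array([[True, True], [False, False]])    # category region in the input
mask_ref = np.array([[True, True], [False, False]])   # same category in the reference
shifted = shift_mean(lum_in, mask_in, lum_ref, mask_ref)
```

The function returns a full-image result so that it can serve as one Tn to be weighted per pixel.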

Color
• Matched regions: chrominance
  • Histogram matching [Lee et al. CVPR’16]
• Non-matched regions: color temperature
  • Consider entire foreground
  • More conservative

Not all the semantic regions are matched!
[Figure: regions labeled T1 (x), T2 (x), and an unmatched region marked "?"]
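For the matched-region case, the sketch below shows classic CDF-based histogram matching on a single chrominance channel. This is the textbook technique, swapped in for illustration; it is not claimed to be the specific method of the cited work, and the function name is hypothetical.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source values so their distribution follows the reference's."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at the same quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx.ravel()].reshape(source.shape)

source = np.array([[0.0, 0.1], [0.2, 0.3]])        # chrominance of an input region
reference = np.array([[0.5, 0.6], [0.7, 0.8]])     # chrominance of the matched region
matched = match_histogram(source, reference)
```

The mapping is monotonic, so the ordering of values inside the region is preserved while their range moves toward the reference.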

Sky Replacement Results
[Figure: Input Image, Sky Replacement Results]

Sky Replacement with User Preference
[Figure: Input Image, Preferred Sky, Sky Replacement Results]

Comparisons to Other Methods

Comparisons of different search methods

Comparisons of different transfer methods

Limitation

Light reflections

Conclusions
• Automatic sky replacement results can be realistic
• New sky image database
• Semantics helps a lot
  • Sky segmentation
  • Sky image search
  • Appearance transfer
• Apply semantics to other tasks
  • Scene completion
  • Photo and video re-coloring

Summary of my Other Projects: Visual Object Recognition

Joint Object Classification and Segmentation [BMVC’13]
• How do segmentation and classification help each other?

Class-specific Object Segmentation Hypotheses [ICCV’13]
• How to utilize exemplars to gain more information during learning and inference?

Image Retrieval [ICIP’14]
• Compute label similarities to bridge semantic gaps

Exemplar-based Object Detection [CVPR’15]
• Discover representative exemplars to build models
• Region-based feature extraction and model learning

Image (Object) Recognition
• Classification
• Segmentation
• Retrieval
• Detection

Video Object Recognition
• Object (Co-)segmentation
• Scene (Co-)parsing

Video Segmentation via Object Flow [CVPR’16]
• How do segmentation and optical flow help each other?
• Segmentation: multi-scale, spatio-temporal graphical model
• Optical flow: use segmentation to refine boundaries
• Iteratively solve the joint model

Semantic Co-segmentation in Videos (submitted to ECCV’16)
• Temporally consistent object tracklets
• Relations between objects from a collection of videos

Ongoing and future work
• Scene Parsing via Deep CNNs
  • Attention to small objects
  • Label co-occurrence
• Video Scene Co-parsing
  • Weakly-supervised: video tags
  • Use image-based classifier

[Figure: Object Segmentation examples; per-image labels: 96.4 / MCL, 74.4; 93.3 / PMCut, 59.1; 94.4 / MCL, 53.0; 83.6 / PMCut, 47.3; 93.5 / PMCut, 26.6; 89.2 / MCL, 65.3; 73.8 / PMCut, 58.0; 86.9 / PMCut, 68.0]

[Figure: Object Detection examples]

[Figure: Video Object Segmentation examples; panels: Segmentation, Updated Optical Flow, Initial Optical Flow]

Joint Object Classification and Segmentation [BMVC’13]

Object Segmentation [ICCV’13]

Image Retrieval [ICIP’14]

Object Detection [CVPR’15]

Video Object Segmentation [CVPR’16]

Sky Replacement [SIGGRAPH’16]

Semantic Co-segmentation in Videos (submitted to ECCV’16)

Video Scene Co-parsing (ongoing)

Image (Object) Recognition via Exemplars
• Classification
• Segmentation
• Retrieval
• Detection

Video Object Recognition: Temporal + CNN
• Object (Co-)segmentation
• Scene (Co-)parsing

Image/Video Editing
• Background/Object Replacement
• Scene Completion
• Re-coloring

Semantic Information

My homepage: https://sites.google.com/site/yihsuantsai/

Thank you!
