Lecture 7: Semantic Segmentation
Bohyung Han, Computer Vision Lab., POSTECH
CSED703R: Deep Learning for Visual Recognition (2016S)
Semantic Segmentation
• Segmenting an image into regions based on their semantic categories
Supervised Learning
Fully Convolutional Network
• Network architecture [Long15]
• End-to-end CNN architecture for semantic segmentation
• Interpret fully connected layers as convolutional layers
[Long15] J. Long, E. Shelhamer, and T. Darrell, Fully Convolutional Networks for Semantic Segmentation. CVPR 2015
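Reinterpreting a fully connected layer as a convolution can be sketched as follows. This is a minimal NumPy illustration, not the FCN implementation: the function name `fc_as_conv`, the 7x7 window, the 4 input channels, and the random weights are all assumptions made for the example. A layer trained on k x k x C inputs is slid over a larger feature map, which yields a spatial map of class scores instead of a single score vector.

```python
import numpy as np

def fc_as_conv(feature_map, fc_weights, k):
    """Apply a fully connected layer as a k x k convolution (stride 1, no padding).

    feature_map: (H, W, C) array.
    fc_weights:  (k*k*C, num_classes) weights of an FC layer defined on k x k x C inputs.
    Returns an (H-k+1, W-k+1, num_classes) score map.
    """
    H, W, C = feature_map.shape
    num_classes = fc_weights.shape[1]
    out = np.zeros((H - k + 1, W - k + 1, num_classes))
    for y in range(H - k + 1):
        for x in range(W - k + 1):
            # Each window is flattened exactly as the FC layer would see it.
            window = feature_map[y:y + k, x:x + k, :].reshape(-1)
            out[y, x] = window @ fc_weights
    return out

rng = np.random.default_rng(0)
weights = rng.normal(size=(7 * 7 * 4, 21))   # hypothetical FC layer, 21 classes
small = rng.normal(size=(7, 7, 4))           # input size the FC layer expects
large = rng.normal(size=(10, 10, 4))         # larger input
large[:7, :7] = small                        # top-left window equals `small`

scores_small = fc_as_conv(small, weights, k=7)   # one score vector
scores_large = fc_as_conv(large, weights, k=7)   # a grid of score vectors
```

On the 7x7 input the result is the ordinary FC output; on the 10x10 input the same weights produce a 4x4 grid of score vectors, one per window position.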
[Figure: FCN pipeline. A 500x500x3 input image is reduced to a 16x16x21 score map, which a deconvolution layer upsamples.]
Deconvolution Filter
• Bilinear interpolation filter
  - Same filter for every class
  - No filter learning!
• How does this deconvolution work?
  - The deconvolution layer is fixed.
  - The convolutional layers of the network are fine-tuned with segmentation ground truth.
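A fixed bilinear deconvolution filter can be sketched as below. This is an illustrative NumPy version under stated assumptions: the helper names `bilinear_kernel` and `deconv_upsample` are made up for the example, and a 4x4 kernel with stride 2 stands in for the 64x64 filter mentioned on the slide. The same kernel is applied to every class channel and nothing in it is learned.

```python
import numpy as np

def bilinear_kernel(size):
    """Return a (size, size) bilinear interpolation kernel."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - np.abs(rows - center) / factor) * (1 - np.abs(cols - center) / factor)

def deconv_upsample(score_map, kernel, stride):
    """Transposed convolution of a single class channel with a fixed kernel."""
    h, w = score_map.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for y in range(h):
        for x in range(w):
            # Each coarse score "stamps" a scaled copy of the kernel on the output.
            out[y * stride:y * stride + k, x * stride:x * stride + k] += score_map[y, x] * kernel
    return out

kernel = bilinear_kernel(4)        # toy size for readability
coarse = np.ones((3, 3, 2))        # toy score map with 2 class channels
upsampled = np.stack(
    [deconv_upsample(coarse[:, :, c], kernel, stride=2) for c in range(coarse.shape[2])],
    axis=-1,
)
```

With a constant input, the interior of the upsampled map stays constant, which is the behaviour expected of interpolation; only the border is attenuated because fewer kernel copies overlap there.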
[Figure: the convolutional layers, pretrained on ImageNet, are fine-tuned for segmentation; the 64x64 bilinear interpolation filter is fixed.]
DeconvNet
• Learning a deep deconvolution network [Noh15]
  - Conceptually more reasonable
  - Better at identifying fine structures of objects
  - Designed to generate outputs from a larger solution space
  - Capable of predicting dense output scores
  - Instance-wise training and prediction
  - Difficult to learn: memory intensive, large number of parameters
[Noh15] H. Noh, S. Hong, B. Han: Learning Deconvolution Network for Semantic Segmentation, ICCV 2015
Operations in Deconvolution Network
• Unpooling
  - Place activations at the pooled locations
  - Preserve the structure of activations
• Deconvolution
  - Densify sparse activations
  - Bases to reconstruct shape
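Unpooling can be sketched as follows, assuming the common formulation in which max pooling records the location of each maximum (often called switches) and unpooling writes each pooled activation back to that location. The function names and the 4x4 toy input are assumptions for illustration only.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that also records where each maximum came from."""
    H, W = x.shape
    pooled = np.zeros((H // k, W // k))
    switches = np.zeros((H // k, W // k), dtype=int)
    for i in range(H // k):
        for j in range(W // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            switches[i, j] = np.argmax(window)   # flat index inside the window
            pooled[i, j] = window.max()
    return pooled, switches

def unpool(pooled, switches, k=2):
    """Place each pooled activation back at its recorded location; zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            dy, dx = divmod(int(switches[i, j]), k)
            out[i * k + dy, j * k + dx] = pooled[i, j]
    return out

x = np.array([[1., 2., 0., 1.],
              [3., 0., 5., 0.],
              [0., 0., 1., 0.],
              [4., 1., 0., 2.]])
pooled, switches = max_pool_with_switches(x)
unpooled = unpool(pooled, switches)
```

The unpooled map is sparse: only one entry per pooling window is non-zero. That sparsity is what the subsequent deconvolution layers densify.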
• Instance-wise training
  - Data augmentation: object proposals, random cropping, flipping
  - Two-stage training
    - Binary segmentation with ground truth
    - Full segmentation with object proposals
  - Batch normalization
• Instance-wise prediction
  - Each class corresponds to one of the channels in the output layer.
  - The label of a pixel is given by a max operation over all channels.
  - Aggregation of 50 object proposals: max operation over all proposals
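The aggregation step can be sketched as below. This is a simplified illustration, not the DeconvNet code: it assumes each proposal's score map is non-negative and already resized to its box, that pixels covered by no proposal keep a score of zero, and that channel 0 is background. The function name and the toy boxes are hypothetical.

```python
import numpy as np

def aggregate_proposals(image_shape, proposals, num_classes):
    """Pixel-wise max over proposals, then max over class channels.

    proposals: list of ((y0, x0, y1, x1), scores), scores of shape (num_classes, y1-y0, x1-x0).
    Returns the aggregated (num_classes, H, W) score map and the (H, W) label map.
    """
    H, W = image_shape
    aggregated = np.zeros((num_classes, H, W))
    for (y0, x0, y1, x1), scores in proposals:
        region = aggregated[:, y0:y1, x0:x1]        # view into the full-size map
        np.maximum(region, scores, out=region)      # max over proposals, in place
    labels = aggregated.argmax(axis=0)              # max over class channels
    return aggregated, labels

scores_a = np.full((3, 2, 2), 0.1)
scores_a[1] = 0.6                                   # proposal A favours class 1
scores_b = np.full((3, 2, 2), 0.1)
scores_b[2] = 0.9                                   # proposal B favours class 2
proposals = [((0, 0, 2, 2), scores_a), ((1, 1, 3, 3), scores_b)]
aggregated, labels = aggregate_proposals((4, 4), proposals, num_classes=3)
```

Where the two toy proposals overlap, the higher score (class 2) wins; pixels outside every proposal fall back to channel 0.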
[Figure: instance-wise prediction pipeline. 1. Input image; 2. Object proposals; 3. Prediction by DeconvNet and aggregation; 4. Results.]
Results
Semi‐Supervised Learning
Motivation
• Challenges in existing supervised learning approaches
  - Heavy labeling effort in semantic segmentation
  - Pixel-wise segmentation labels are much more expensive to obtain than other kinds of labels
  - Difficult to extend to other classes and to handle more classes
Problem Definition
• Weakly supervised learning with hybrid annotations
  - Many weak annotations: image-level object class labels
  - Few strong annotations: full segmentation labels
• Construction
  - Adopting DeconvNet
  - Customized for binary segmentation
[Equation: training objective, a minimization in which the target is the binary ground-truth (GT) segmentation]
Bridging Layers
• Specification
  - Input: concatenation, in the channel direction, of the two feature maps listed under Construction
  - Output: class-specific activation map
• Construction
  - Fully connected layers
  - First input: pool5 feature maps
  - Second input: class-specific information backpropagated down to pool5
Class‐Specific Information
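• Class-specific saliency map [Simonyan12]
  - Given an image, the pixels related to a specific class can be identified by computing the gradient of the class score with respect to the image by backpropagation (chain rule):
    $$\frac{\partial S_l}{\partial x} = \frac{\partial S_l}{\partial F^{(M)}} \cdot \frac{\partial F^{(M)}}{\partial F^{(M-1)}} \cdots \frac{\partial F^{(1)}}{\partial x}$$
    where, in notation assumed here, $S_l$ is the score of class $l$, $x$ is the image, and $F^{(k)}$ is the output of the $k$-th of $M$ layers.
[Simonyan12] K. Simonyan, A. Vedaldi, A. Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. ICLR Workshop, 2014.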
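The gradient-based saliency idea can be sketched on a toy network. This is an assumption-laden illustration: a two-layer ReLU network with random weights stands in for the CNN, and the gradient is written out by hand rather than obtained from a deep learning framework. The names `class_score_and_gradient`, `W1`, and `W2` are hypothetical.

```python
import numpy as np

def class_score_and_gradient(x, W1, W2, c):
    """Score of class c for input x, and its gradient with respect to x.

    Toy network: score = W2 @ relu(W1 @ x).
    """
    pre = W1 @ x
    hidden = np.maximum(pre, 0.0)
    score = W2[c] @ hidden
    grad_hidden = W2[c]                      # d score / d hidden
    grad_pre = grad_hidden * (pre > 0)       # back through the ReLU
    grad_x = W1.T @ grad_pre                 # back through the first layer
    return score, grad_x

rng = np.random.default_rng(1)
x = rng.normal(size=6)                       # flattened toy "image"
W1 = rng.normal(size=(5, 6))
W2 = rng.normal(size=(3, 5))
c = 2                                        # class of interest
score, grad = class_score_and_gradient(x, W1, W2, c)
saliency = np.abs(grad)                      # large values mark influential pixels
```

Taking the absolute value of the gradient gives a per-pixel importance map for the chosen class; a different class index produces a different map from the same image.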
Segmentation Maps
Inference
• Needs iterations
  - Compute a segmentation map for each identified label
  - Use the same segmentation network with different class-specific information
[Equations: the per-label segmentation maps and a max operation over them]
Qualitative Results
Quantitative Results
[Table: Comparison to other algorithms on the PASCAL VOC 2012 validation set]
[Table: Per-class accuracy on the PASCAL VOC 2012 test set]
Weakly‐Supervised Learning
Problem Definition
• Semantic segmentation by weakly supervised learning
  - Image-level object class labels only
  - Bounding boxes (and corresponding labels) only
  - Scribbles (and corresponding labels) only
• Approaches
  - Constrained optimization
  - Iterative optimization
  - Transfer learning
[Figure: example images with image-level labels: (person, bike) and (boat, person, horse)]
Multiple Instance Learning
• Training
[Figure: input image and Overfeat features]
• Inference
P. O. Pinheiro, R. Collobert: From Image-level to Pixel-level Labeling with Convolutional Networks, CVPR 2015
Constrained Convolutional Neural Network
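• With image-level class labels only
  - Define the objective function with constraints
  - Estimate a latent probability distribution for optimization
$$\text{find } \theta \quad \text{subject to } A\vec{Q} \ge \vec{b}$$
$$\min_{\theta, P} D\big(P(X) \,\|\, Q(X \mid \theta)\big) \quad \text{subject to } A\vec{P} \ge \vec{b}, \ \sum_X P(X) = 1$$
where $Q(X \mid \theta)$ is the distribution produced by the network, $P(X)$ is the latent distribution, and $A$, $\vec{b}$ encode the linear constraints.
D. Pathak, P. Krähenbühl, T. Darrell: Constrained Convolutional Neural Networks for Weakly Supervised Segmentation, ICCV 2015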
Constraints
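• Suppression constraint: suppress all labels not in the image.
  $\sum_i p_i(l) \le 0 \quad \forall l \notin \mathcal{L}$
• Foreground constraint: make positive labels visible.
  $a_l \le \sum_i p_i(l) \quad \forall l \in \mathcal{L}$
• Background constraint: define lower and upper bounds on the background area.
  $a_0 \le \sum_i p_i(0) \le b_0$
• Size constraint: define an upper bound on a class.
  $\sum_i p_i(l) \le b_l$
Here $p_i(l)$ is the probability of label $l$ at pixel $i$, $\mathcal{L}$ is the set of labels present in the image, and label 0 is the background (bg).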
Optimization
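• Iterative method with slack variable
  - Find $P$ (with $\theta$ fixed)
  - Find $\theta$ (with $P$ fixed)
$$\min_{\theta, P} D\big(P(X) \,\|\, Q(X \mid \theta)\big) \quad \text{subject to } A\vec{P} \ge \vec{b}, \ \sum_X P(X) = 1$$
[Equations: the step for $P$ is a maximization involving exponentiated network outputs; the step for $\theta$ minimizes a log loss]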
Results
[Figure: original image, ground truth, result with labels, result with labels + tags]
BoxSup
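• With bounding box annotations only
$$\mathcal{E}_o = \frac{1}{N} \sum_{S} \big(1 - \mathrm{IoU}(B, S)\big)\, \delta(l_B, l_S)$$
where $B$ is a ground-truth box with label $l_B$, $S$ ranges over the $N$ candidate segments with labels $l_S$, and $\delta$ is 1 when the two labels match and 0 otherwise. The objective is minimized jointly over the network parameters and the segment labels.
J. Dai, K. He, J. Sun: BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. ICCV 2015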
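The box-to-segment overlap used when supervising with bounding boxes can be sketched as below. This is a hedged illustration: it assumes an overlap cost of the form (1/N) * sum of (1 - IoU(B, S)) over candidate segments whose label matches the box label, which is how the BoxSup overlap term is usually stated. The function names and the toy masks are hypothetical.

```python
import numpy as np

def box_segment_iou(box, mask):
    """IoU between a box (y0, x0, y1, x1) and a boolean segment mask."""
    y0, x0, y1, x1 = box
    box_mask = np.zeros(mask.shape, dtype=bool)
    box_mask[y0:y1, x0:x1] = True
    intersection = np.logical_and(box_mask, mask).sum()
    union = np.logical_or(box_mask, mask).sum()
    return intersection / union if union > 0 else 0.0

def overlap_cost(box, box_label, candidates):
    """Assumed overlap term: mean over candidates of (1 - IoU) when labels match, else 0."""
    total = 0.0
    for mask, label in candidates:
        if label == box_label:
            total += 1.0 - box_segment_iou(box, mask)
    return total / len(candidates)

segment_a = np.zeros((4, 4), dtype=bool)
segment_a[0:2, 0:2] = True                  # 4 pixels, all inside the box
segment_b = np.zeros((4, 4), dtype=bool)
segment_b[2:4, 2:4] = True                  # assigned a different label
box = (0, 0, 2, 4)                          # 2 x 4 box, 8 pixels

iou_a = box_segment_iou(box, segment_a)
cost = overlap_cost(box, box_label=1, candidates=[(segment_a, 1), (segment_b, 2)])
```

Candidates whose label differs from the box label contribute nothing to the cost, so only segments claimed for the box's class are pushed to overlap it.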
Semantic Segmentation by Transfer Learning
• Data
  - Source domain
    - Image-level class labels
    - Pixel-wise segmentation annotations
  - Target domain
    - Image-level class labels only
  - Source and target domains are composed of exclusive sets of categories.
• Goal
  - Semantic segmentation of target domain images by transferring segmentation knowledge from source domain data
• Impact
  - Scalability to the datasets with a large number of classes with
[Hong15] Seunghoon Hong, Junhyuk Oh, Bohyung Han, Honglak Lee: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, CVPR 2016
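[Equation: attention weights given by a softmax, $\exp(\cdot) / \sum \exp(\cdot)$, used with an element-wise product $\odot$]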
Network Architecture
• Training strategy
  - Using segmentation annotations: train both decoder and attention model
  - Using image-level class labels: train both classifier and attention model
  - Encoder is fixed: VGG 16-layer net
• Loss function
[Equation: a minimization over the model parameters of a sum of two loss terms, one of which runs over the union (∪) of two data sets]
DeepLab
• Same as the FCN approach but with higher classification resolution
• Hole algorithm
  - Make the feature map denser
  - Use an existing CNN architecture
  - Less stride but larger filter for both pooling and convolution!
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, ICLR 2015
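The hole algorithm can be sketched in one dimension as below. This is a toy NumPy illustration, not the DeepLab implementation; the function name `atrous_conv1d` and the `rate` parameter are assumptions. Holes are inserted between the filter taps so that the filter covers a larger span, and the filter is applied at every position instead of on a subsampled signal.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """1-D correlation with `rate - 1` holes between kernel taps (no padding, stride 1)."""
    k = len(kernel)
    span = (k - 1) * rate + 1                 # effective filter length
    out = np.zeros(len(signal) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * signal[i + j * rate] for j in range(k))
    return out

rng = np.random.default_rng(2)
signal = rng.normal(size=8)
kernel = np.array([1.0, 0.0, -1.0])

# Conventional route: subsample by 2, then filter. Output is coarse (2 values).
coarse = atrous_conv1d(signal[::2], kernel, rate=1)

# Hole algorithm: keep full resolution and dilate the filter by 2. Output is dense (4 values).
dense = atrous_conv1d(signal, kernel, rate=2)
```

Every second value of the dense output coincides with the coarse output, so the dense map contains the subsampled responses plus the positions in between, with the same filter weights.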
Fully Connected Conditional Random Field
Results by DeepLab‐CRF
WSSL
• Baseline: semantic segmentation with pixel-level annotations
• Goals: estimate pixel-level labels by learning a CNN based on
  - Image-level annotations
  - Bounding box annotations
  - Hybrid annotations: many image-level annotations and few segmentation annotations
G. Papandreou, L.-C. Chen, K. P. Murphy, A. L. Yuille: Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation. ICCV 2015
Baseline objective:
$$\max_\theta J(\theta), \quad J(\theta) = \log P(Y \mid X; \theta) = \sum_{m} \log P(y_m \mid X; \theta), \quad \text{where } P(y_m \mid X; \theta) \propto \exp\big(f_m(y_m \mid X; \theta)\big)$$
  - $f_m$: output of the DCNN at pixel $m$; $\theta$: model parameters; $y_m$: label of pixel $m$
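The fully supervised baseline objective, a sum of per-pixel log-probabilities with each probability proportional to the exponentiated DCNN output, can be sketched as below. The array layout (height, width, classes) and the function name are assumptions for illustration.

```python
import numpy as np

def pixelwise_log_likelihood(scores, labels):
    """Sum over pixels of log P(y_m | X), with P given by a softmax over DCNN scores.

    scores: (H, W, L) array of DCNN outputs, one score per pixel and label.
    labels: (H, W) integer array of pixel labels y_m.
    """
    # Subtract the per-pixel maximum for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    H, W = labels.shape
    picked = log_probs[np.arange(H)[:, None], np.arange(W)[None, :], labels]
    return picked.sum()

uniform_scores = np.zeros((2, 2, 3))            # every label equally likely
labels = np.zeros((2, 2), dtype=int)
ll_uniform = pixelwise_log_likelihood(uniform_scores, labels)

confident_scores = uniform_scores.copy()
confident_scores[:, :, 0] = 10.0                # strongly favour the correct label
ll_confident = pixelwise_log_likelihood(confident_scores, labels)
```

Raising the score of the correct label at every pixel increases the log-likelihood toward zero, which is what SGD on this objective drives the network to do.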
EM using Weak Annotations
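• E-step: estimate the latent segmentation
  $$\hat{Y} = \arg\max_Y P(X, Y; \theta')\, P(z \mid Y) = \arg\max_Y \big[\log P(Y \mid X; \theta') + \log P(z \mid Y)\big] = \arg\max_Y \Big[\sum_m f_m(y_m \mid X; \theta') + \log P(z \mid Y)\Big]$$
• M-step: maximize the log-likelihood by SGD
  $$\theta \leftarrow \arg\max_\theta \log P(\hat{Y} \mid X; \theta)$$
• EM-Fixed: $\log P(z \mid Y) = \sum_m \phi(y_m, z) + \text{const}$, with $\phi(y_m = l, z) = b_l$ if $z_l = 1$ and $0$ if $z_l = 0$
• EM-Adapt: $\log P(z \mid Y) = \phi(Y, z) + \text{const}$, where $\phi(Y, z)$ takes the form of a cardinality potential.
Here $z$ denotes the image-level labels and $\theta'$ the previous parameter estimate.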
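An E-step in the spirit of EM-Fixed can be sketched as below. This is an illustration under stated assumptions: a fixed bias is added to the DCNN scores of the classes present in the image before a per-pixel argmax, class 0 is treated as background and always present, and the bias values 5.0 and 3.0 are arbitrary choices for the example. The function name is hypothetical.

```python
import numpy as np

def em_fixed_e_step(scores, present_classes, b_fg=5.0, b_bg=3.0):
    """Estimate the latent segmentation from DCNN scores and image-level labels.

    scores: (H, W, L) DCNN outputs; present_classes: foreground class indices in the image.
    b_fg, b_bg: illustrative bias values (assumed, not taken from the slides).
    """
    num_labels = scores.shape[-1]
    bias = np.zeros(num_labels)
    for label in present_classes:
        bias[label] = b_fg          # boost classes named by the image-level labels
    bias[0] = b_bg                  # background gets a smaller boost
    return (scores + bias).argmax(axis=-1)

# One row of two pixels, three labels (0 = background).
scores = np.array([[[0.0, 1.0, 2.0],
                    [4.0, 0.0, 1.0]]])
estimated = em_fixed_e_step(scores, present_classes=[1])
```

Without the bias, the first pixel would be assigned class 2, which the image-level labels say is absent; the bias shifts it to class 1, while the second pixel stays background. Because the bias is additive, a sufficiently strong score for an absent class could still win.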
With image‐level class labels
With bounding box annotations
• Estimate pixel-level annotations from bounding boxes
  - Fully connected CRF