Feedforward semantic segmentation with zoom-out features
Mostajabi, Yadollahpour and Shakhnarovich, Toyota Technological Institute at Chicago
Posted on 21-Jan-2016
Slide 2: Main Ideas
Casting semantic segmentation as classifying a set of superpixels.
Extracting CNN features from different levels of spatial context around the superpixel at hand.
Using an MLP as the classifier.
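The first idea, segmentation as superpixel classification, can be sketched in a few lines. This is a minimal numpy sketch, not the authors' implementation: `classify_superpixels`, the feature matrix, and the softmax parameters are hypothetical stand-ins for a trained classifier and CNN features.

```python
import numpy as np

def classify_superpixels(superpixel_map, features, weights, bias):
    """Label an image by classifying each superpixel with a linear
    (softmax) classifier and broadcasting the label to its pixels.

    superpixel_map : (H, W) int array, entry = superpixel id
    features       : (S, D) array, one feature vector per superpixel
    weights, bias  : classifier parameters, shapes (D, C) and (C,)
    """
    scores = features @ weights + bias   # (S, C) class scores
    labels = scores.argmax(axis=1)       # predicted class per superpixel
    return labels[superpixel_map]        # (H, W) dense label map
```

Because every pixel inherits its superpixel's label, the dense prediction is produced by a single table lookup after one forward pass per superpixel.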
Photo credit: Mostajabi et al.
Slide 3: Zoom-out feature extraction
Photo credit: Mostajabi et al.
Slide 4: Zoom-out feature extraction
Subscene-level features:
- Take the bounding box of the superpixels within radius three of the superpixel at hand.
- Warp the bounding box to 256 x 256 pixels.
- Use the activations of the last fully connected layer.
Scene-level features:
- Warp the whole image to 256 x 256 pixels.
- Use the activations of the last fully connected layer.
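The crop-and-warp step shared by both levels can be sketched as follows. This is an assumption-laden stand-in: `warp_region` uses nearest-neighbor sampling for the warp, and in the real pipeline the warped crop would be fed to a CNN whose last fully connected layer supplies the features.

```python
import numpy as np

def warp_region(image, box, out_size=256):
    """Crop a bounding box and warp it to out_size x out_size with
    nearest-neighbor sampling (a simple stand-in for the warping step).

    image : (H, W, 3) array
    box   : (y0, x0, y1, x1), inclusive-exclusive pixel coordinates
    """
    y0, x0, y1, x1 = box
    crop = image[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    ys = np.arange(out_size) * h // out_size  # nearest source row per target row
    xs = np.arange(out_size) * w // out_size  # nearest source column per target column
    return crop[np.ix_(ys, xs)]
```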
Slide 5: Training
Extract features from the image and its mirror image, and take the element-wise max over the two resulting feature vectors.
This yields a 12,416-dimensional representation for each superpixel.
Two classifiers are trained:
- A linear classifier (softmax).
- An MLP: a hidden layer (1024 neurons) with ReLU, followed by a second hidden layer (1024 neurons) with dropout.
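The mirror-max step above can be sketched directly. Here `feature_fn` is a hypothetical placeholder for the CNN feature extractor; only the flip-and-max logic is from the slide.

```python
import numpy as np

def mirror_max_features(image, feature_fn):
    """Element-wise max of the features of an image and its horizontal
    mirror. `feature_fn` stands in for the CNN feature extractor."""
    f = feature_fn(image)
    f_mirror = feature_fn(image[:, ::-1])  # horizontal flip
    return np.maximum(f, f_mirror)
```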
Slide 6: Loss Function
The dataset is imbalanced, so a weighted loss function is used.
Loss function: let f_c be the frequency of class c in the training data; each training example of class c is weighted inversely to f_c, so that infrequent classes are not drowned out by frequent ones.
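A minimal numpy sketch of such a class-frequency-weighted softmax loss follows. The exact normalization of the 1/f_c weights is an assumption; only the inverse-frequency weighting itself comes from the slide.

```python
import numpy as np

def weighted_softmax_loss(scores, labels, class_freq):
    """Softmax cross-entropy where each example of class c is weighted
    by 1/f_c, so rare classes contribute more to the loss.

    scores     : (N, C) raw classifier outputs
    labels     : (N,) integer class labels
    class_freq : (C,) empirical class frequencies f_c
    """
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    weights = 1.0 / class_freq[labels]                    # 1/f_c per example
    return -(weights * log_probs[np.arange(len(labels)), labels]).mean()
```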
Slide 7: Effect of Zoom-out Levels
Photo and table credit: Mostajabi et al.
[Figure columns: Image, Ground Truth, G1:3, G1:5, G1:5+S1, G1:5+S1+S2]
Slide 8: Quantitative Results
Softmax results on VOC 2012.
Table credit: Mostajabi et al.
Slide 9: Quantitative Results
MLP results.
Table credit: Mostajabi et al.
Slide 10: Qualitative Results
Photo credit: Mostajabi et al.
Slide 11: Learning Deconvolution Network for Semantic Segmentation
Noh, Hong and Han, POSTECH, Korea
Slide 12: Motivations
Photo credit: Noh et al.
[Figure columns: Image, Ground Truth, FCN Prediction]
Slide 13: Motivations
Photo credit: Noh et al.
Slide 14: Deconvolution Network Architecture
Photo credit: Noh et al.
Slide 15: Unpooling
Photo credit: Noh et al.
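The unpooling operation on this slide can be illustrated with a small sketch: max pooling records "switch" locations (the argmax positions), and unpooling places each pooled value back at its switch, leaving the rest zero. This is a single-channel numpy illustration of the idea, not the paper's implementation.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that also records the argmax 'switch' locations."""
    H, W = x.shape
    pooled = np.zeros((H // k, W // k))
    switches = np.zeros((H // k, W // k, 2), dtype=int)
    for i in range(H // k):
        for j in range(W // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            di, dj = np.unravel_index(window.argmax(), window.shape)
            pooled[i, j] = window[di, dj]
            switches[i, j] = (i*k + di, j*k + dj)  # position of the max
    return pooled, switches

def unpool(pooled, switches, out_shape):
    """Place each pooled value back at its recorded switch location;
    all other entries stay zero, giving a sparse enlarged map."""
    out = np.zeros(out_shape)
    H, W = pooled.shape
    for i in range(H):
        for j in range(W):
            out[tuple(switches[i, j])] = pooled[i, j]
    return out
```

The sparse output is what the subsequent deconvolution layers densify.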
Slide 16: Deconvolution
Photo credit: Noh et al.
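Deconvolution here means transposed convolution: each input activation stamps a kernel-shaped contribution onto an enlarged output, and overlapping contributions add up. A minimal single-channel numpy sketch (the real layers are multi-channel with learned filters):

```python
import numpy as np

def deconv2d(x, kernel, stride=2):
    """Transposed convolution: every input activation stamps a copy of
    the kernel, scaled by its value, onto the output; overlaps add.

    x      : (H, W) input feature map
    kernel : (k, k) filter
    """
    H, W = x.shape
    k = kernel.shape[0]
    out = np.zeros((stride * (H - 1) + k, stride * (W - 1) + k))
    for i in range(H):
        for j in range(W):
            out[i*stride:i*stride+k, j*stride:j*stride+k] += x[i, j] * kernel
    return out
```

Applied after unpooling, this turns the sparse switch map into a dense, enlarged activation map.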
Slide 17: Unpooling and Deconvolution Effects
Photo credit: Noh et al.
Slide 18: Pipeline
Generate ~2K object proposals using EdgeBox and select the top 50 based on their objectness scores.
Aggregate the segmentation maps generated for each proposal using a pixel-wise maximum or average.
Construct the class-conditional probability map using softmax.
Apply a fully-connected CRF to the probability map.
Ensemble with FCN: compute the mean of the probability maps generated by DeconvNet and FCN, then apply the CRF.
Photo credit: Noh et al.
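The proposal-aggregation step can be sketched as follows. The function name and the assumption that each proposal's score map is pasted back at its box location before combining are mine; only the pixel-wise max/average rule is from the slide.

```python
import numpy as np

def aggregate_proposals(score_maps, boxes, image_shape, mode="max"):
    """Paste each proposal's class score map into the full image frame
    and combine overlapping proposals pixel-wise.

    score_maps  : list of (h_i, w_i, C) per-proposal score maps
    boxes       : list of (y0, x0) top-left corners in the full image
    image_shape : (H, W)
    """
    H, W = image_shape
    C = score_maps[0].shape[2]
    if mode == "max":
        out = np.full((H, W, C), -np.inf)
        for sm, (y0, x0) in zip(score_maps, boxes):
            h, w = sm.shape[:2]
            out[y0:y0+h, x0:x0+w] = np.maximum(out[y0:y0+h, x0:x0+w], sm)
        out[out == -np.inf] = 0.0  # pixels no proposal covered
        return out
    # "avg": accumulate sums and per-pixel coverage counts
    out = np.zeros((H, W, C))
    counts = np.zeros((H, W, 1))
    for sm, (y0, x0) in zip(score_maps, boxes):
        h, w = sm.shape[:2]
        out[y0:y0+h, x0:x0+w] += sm
        counts[y0:y0+h, x0:x0+w] += 1
    return out / np.maximum(counts, 1)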
Slide 19: Training the Deep Network
Add a batch normalization layer to the output of every convolutional and deconvolutional layer.
Two-stage training: train on easy examples first, then fine-tune with more challenging ones.
Easy examples are constructed by cropping object instances using the ground-truth annotations.
Limiting the variation in object location and size substantially reduces the search space for semantic segmentation.
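The easy-example construction step might be sketched like this; the margin parameter and helper name are hypothetical, and only the crop-around-ground-truth idea is from the slide.

```python
import numpy as np

def crop_instance(image, mask, instance_id, margin=16):
    """Build an 'easy' training example by cropping one ground-truth
    object instance (plus a margin), which centers the object and
    limits the variation in its location and scale.

    image       : (H, W) or (H, W, 3) array
    mask        : (H, W) int array of instance ids
    instance_id : which ground-truth instance to crop around
    """
    ys, xs = np.where(mask == instance_id)
    y0 = max(ys.min() - margin, 0)
    y1 = min(ys.max() + 1 + margin, mask.shape[0])
    x0 = max(xs.min() - margin, 0)
    x1 = min(xs.max() + 1 + margin, mask.shape[1])
    return image[y0:y1, x0:x1], (mask[y0:y1, x0:x1] == instance_id)
```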
Slide 20: Effect of Number of Proposals
Photo credit: Noh et al.
Slide 21: Quantitative Results
Table credit: Noh et al.
Slide 22: Qualitative Results
Photo credit: Noh et al.
Slide 23: Qualitative Results
Examples where FCN produces better results than DeconvNet.
Photo credit: Noh et al.
Slide 24: Qualitative Results
Examples where inaccurate predictions from DeconvNet and FCN are improved by the ensemble.
Photo credit: Noh et al.