Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

Post on 21-Jan-2016

214 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

Transcript

Feedforward semantic segmentation with zoom-out featuresMOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH

TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO

2Main Ideas

Casting semantic segmentation as classifying a set of superpixels.

Extracting CNN features from different levels of spatial context around the superpixel at hand.

Using MLP as the classifier

Photo credit: Mostajabi et al.

3Zoom-out feature extraction

Photo credit: Mostajabi et al.

4Zoom-out feature extraction

Subscene Level Features Bounding box of superpixels within radius three from the superpixel

at hand

Warp bounding box to 256 x 256 pixels

Activations of the last fully connected layer

Scene Level Features Warp image to 256 x 256 pixels

Activations of the last fully connected layer

5Training

Extracting the features from the mirror images and take element-wise max over the resulting two features vectors.

12416-dimensional representation for each superpixel.

Training 2 classifiers Linear classifier (Softmax)

MLP: Hidden layer (1024 neurons) + ReLU + Hidden layer (1024 neurons) with dropout

6Loss Function

Imbalanced dataset Wheighted loss function

Loss function: Let be frequency of class c in the training data and

7Effect of Zoom-out Levels

Photo and Table credit: Mostajabi et al.

Image Ground Truth

G1:3 G1:5 G1:5+S1 G1:5+S1+S2

8Quantitative Results

Softmax Results on VOC 2012

Table credit: Mostajabi et al.

9Quantitative Results MLP Results

Table credit: Mostajabi et al.

10Qualitative Results

Photo credit: Mostajabi et al.

11

Learning Deconvolution Network for Semantic SegmentationNOH, HONG AND HAN

POSTECH, KOREA

12Motivations

Photo credit: Noh et al.

Image Ground Truth FCN Prediction

13Motivations

Photo credit: Noh et al.

14Deconvolution Network Architecture

Photo credit: Noh et al.

15Unpooling

Photo credit: Noh et al.

16Deconvolution

Photo credit: Noh et al.

17Unpooling and Deconvolution Effects

Photo credit: Noh et al.

18Pipeline

Generating 2K object proposals using Edge-Box and selecting top 50 based on their objectness scores.

Aggregating the segmentation maps which are generated for each proposals using pixel-wise maximum or average.

Constructing the class conditional probability map using Softmax

Apply fully-conncected CRF to the probability map.

Ensemble with FCN Computing mean of probability map generated with DeconvNet and

FCN

applying CRF.

Photo credit: Noh et al.

19Training Deep Network

Adding a batch normalization layer to the output of every convolutional and deconvolutional layer.

Two-stage Training Train on easy examples first and then fine-tune with more

challenging ones.

Constructing easy examples: Crop object instances using ground-truth annotations

Limiting the variations in object location and size reduces the search space for semantic segmentation substantially

20Effect of Number of Proposals

Photo credit: Noh et al.

21Quantitative Results

Table credit: Noh et al.

22Qualitative Results

Photo credit: Noh et al.

23Qualitative Results

Examples that FCN produces better results than DeconvNet.

Photo credit: Noh et al.

24Qualitative Results

Examples that inaccurate predictions from our method and FCN are improved by ensemble.

Photo credit: Noh et al.

top related