Semantic Segmentation with Convolutional Neural Networks

R. Q. FEITOSA

Typical ConvNet for Image Classification

Figure: three stages of convolution, activation function, and pooling, followed by flatten and fully connected layers.

Figure: the same network summarized as three blocks: conv, flatten, and fully connected.

Main Application Groups

• Image Classification

• Detection/Localization

• Semantic Segmentation

1st idea for SS: sliding window

Figure: crop a patch from the input image and classify its center pixel with a CNN (example labels: garden, garden, roof).

Redundant operations due to overlap. Inefficient!
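The sliding-window idea can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the image is a single-channel 2-D array, borders are reflect-padded, and `toy_classifier` is a hypothetical stand-in for the CNN that labels the center pixel.

```python
import numpy as np

def sliding_window_segmentation(image, classify_patch, patch_size=5):
    """Label every pixel by classifying the patch centered on it."""
    half = patch_size // 2
    # Reflect-pad so that border pixels also get a full patch (an assumption).
    padded = np.pad(image, half, mode="reflect")
    labels = np.zeros(image.shape, dtype=int)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + patch_size, c:c + patch_size]
            # One classifier call per pixel; neighboring patches overlap heavily.
            labels[r, c] = classify_patch(patch)
    return labels

def toy_classifier(patch):
    """Hypothetical stand-in for a CNN: thresholds the mean of the patch."""
    return int(patch.mean() > 0.5)

image = np.zeros((6, 6))
image[:, 3:] = 1.0
labels = sliding_window_segmentation(image, toy_classifier, patch_size=3)
```

With a 3×3 patch, two of the three columns of each patch are shared with the patch of the next pixel, so most of the work is repeated from one pixel to the next.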

2nd idea for SS: Fully Convolutional (a)

A bunch of convolutional layers at input resolution to classify all pixels at once.

Convolution at the original resolution is expensive!

3rd idea for SS: Fully Convolutional (b)

A bunch of convolutional layers with downsampling and upsampling inside the network.

Figure: the network first contracts (downsampling) and then expands (upsampling).

Upsampling: Unpooling

Nearest Neighbor

input: 2×2

1 2
3 4

output: 4×4

1 1 2 2
1 1 2 2
3 3 4 4
3 3 4 4

Bed of Nails

input: 2×2

1 2
3 4

output: 4×4

1 0 2 0
0 0 0 0
3 0 4 0
0 0 0 0
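Both fixed unpooling schemes fit in one line of NumPy each. The function names and the `factor` parameter below are illustrative assumptions, not part of the slides.

```python
import numpy as np

def nearest_neighbor_upsample(x, factor=2):
    """Repeat every value over a factor x factor block."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def bed_of_nails_upsample(x, factor=2):
    """Put every value in the top-left corner of its block; zeros elsewhere."""
    out = np.zeros((x.shape[0] * factor, x.shape[1] * factor), dtype=x.dtype)
    out[::factor, ::factor] = x
    return out

x = np.array([[1, 2],
              [3, 4]])
nn_out = nearest_neighbor_upsample(x)
nails_out = bed_of_nails_upsample(x)
```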

Upsampling: Max Unpooling

Max pooling memorizes where the max came from.

input: 4×4

1 2 6 3
3 5 2 1
1 2 2 1
7 3 4 8

output: 2×2

5 6
7 8

…

Max unpooling puts each value at the location where the max came from.

input: 2×2

1 2
3 4

output: 4×4

0 0 2 0
0 1 0 0
0 0 0 0
3 0 0 4

Max pooling and max unpooling are used in corresponding pairs of down- and upsampling layers.
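A minimal sketch of the pooling/unpooling pair, assuming non-overlapping k×k windows on a single 2-D feature map. The function names are hypothetical, and the 4×4 example values are an assumption reconstructed from the slide figure.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also remembers where each maximum came from."""
    h, w = x.shape
    out = np.zeros((h // k, w // k), dtype=x.dtype)
    idx = np.zeros((h // k, w // k, 2), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            out[i, j] = window[r, c]
            idx[i, j] = (i * k + r, j * k + c)  # position in the full input
    return out, idx

def max_unpool(y, idx, out_shape):
    """Write each value back to its remembered position; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=y.dtype)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            r, c = idx[i, j]
            out[r, c] = y[i, j]
    return out

x = np.array([[1, 2, 6, 3],
              [3, 5, 2, 1],
              [1, 2, 2, 1],
              [7, 3, 4, 8]])
pooled, idx = max_pool_with_indices(x)
unpooled = max_unpool(np.array([[1, 2], [3, 4]]), idx, x.shape)
```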

Learnable Upsampling: Transpose Convolution

3 × 3 transpose convolution, stride 2, pad 1.

The filter moves 2 pixels in the output for every pixel in the input.

The stride gives the ratio between movement in the output and movement in the input.

input: 2×2

.2 .5
.3 .1

filter: 3×3

1 2 1
2 4 2
1 2 1

output: 4×4, top-left corner after the first input value (.2):

.8 .4
.4 .2


Output contains copies of the filter weighted by the input, summing up at overlaps in the output.


First two output rows after the second input value (.5); overlapping entries add up (.4 + 1 = 1.4 and .2 + .5 = .7):

.8 1.4 2 1
.4 .7 1 .5


Other names: deconvolution, upconvolution, fractionally strided convolution, backward strided convolution.


Figure: the remaining input values (.3 and .1) are placed the same way, two output rows further down, until the whole 4×4 output is filled.
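The construction can be checked with a small NumPy sketch: each input value stamps a scaled copy of the filter into the output, `stride` pixels apart, and overlaps are summed. The function name is hypothetical. The sketch assumes a square filter and an `output_padding` of 1, an assumption not stated on the slides but needed to get a 4×4 output from a 2×2 input with a 3×3 filter, stride 2, and pad 1. It also assumes `output_padding <= pad`.

```python
import numpy as np

def transpose_conv2d(x, w, stride=2, pad=1, output_padding=1):
    """Transpose convolution of a 2-D input with a square 2-D filter."""
    k = w.shape[0]
    h, wd = x.shape
    # Uncropped canvas that holds every scaled copy of the filter.
    full = np.zeros(((h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            # Copy of the filter weighted by the input value; overlaps add up.
            full[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    out_h = (h - 1) * stride - 2 * pad + k + output_padding
    out_w = (wd - 1) * stride - 2 * pad + k + output_padding
    # Padding crops the border of the canvas.
    return full[pad:pad + out_h, pad:pad + out_w]

x = np.array([[0.2, 0.5],
              [0.3, 0.1]])
w = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 1.0]])
out = transpose_conv2d(x, w)
# Under these assumptions the 4x4 result is approximately:
# [[0.8, 1.4, 2.0, 1.0],
#  [1.0, 1.1, 1.2, 0.6],
#  [1.2, 0.8, 0.4, 0.2],
#  [0.6, 0.4, 0.2, 0.1]]
```

In a network the filter weights are learned, which is what makes this form of upsampling learnable.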

FCN Example: U-net

To improve spatial accuracy, the output of the corresponding layer in the contractive stage is appended to the input of the expansive stage.

Picture from: Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham.

https://arxiv.org/abs/1505.04597
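A minimal sketch of such a skip connection, assuming feature maps stored as (channels, height, width) arrays. The center crop is an assumption for the case where the contractive feature map is larger than the expansive one. The function names are hypothetical.

```python
import numpy as np

def center_crop(features, target_hw):
    """Crop a (channels, height, width) array to the target height and width."""
    _, h, w = features.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return features[:, top:top + th, left:left + tw]

def skip_concat(contractive_features, expansive_features):
    """Append the contractive-stage features along the channel axis."""
    cropped = center_crop(contractive_features, expansive_features.shape[1:])
    return np.concatenate([cropped, expansive_features], axis=0)

contractive = np.ones((64, 12, 12))  # illustrative sizes only
expansive = np.zeros((64, 8, 8))
merged = skip_concat(contractive, expansive)  # shape (128, 8, 8)
```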

FCN Example: U-net

To circumvent the scarcity of training samples:

• Patch-wise training and inference.

• Final result is the mosaic of patches’ results.

• Trade-off between patch size and number of training samples.
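Patch-wise inference and the final mosaic can be sketched as follows. The sketch assumes non-overlapping tiles on a single-channel image. `toy_model` is a hypothetical stand-in for the trained network that returns one label per pixel of the patch.

```python
import numpy as np

def predict_by_patches(image, predict_patch, patch_size):
    """Run the model on each tile and assemble the results into a mosaic."""
    h, w = image.shape
    result = np.zeros((h, w), dtype=int)
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = image[top:top + patch_size, left:left + patch_size]
            # Tiles at the image border may be smaller than patch_size.
            result[top:top + patch.shape[0], left:left + patch.shape[1]] = predict_patch(patch)
    return result

def toy_model(patch):
    """Hypothetical stand-in for the network: thresholds every pixel."""
    return (patch > 0.5).astype(int)

image = np.linspace(0, 1, 36).reshape(6, 6)
mosaic = predict_by_patches(image, toy_model, patch_size=4)
```

Smaller patches yield more training samples from the same images, but each patch then carries less spatial context.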


FCN Example: U-net

The loss function must be weighted to compensate for unbalanced training data.

$$L = -\frac{1}{N} \sum_{i} \omega_{y_i} \log \frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}}$$

where

– $L$ is the weighted cross-entropy

– $\omega_{y_i}$ is the precomputed weight for the true class ($y_i$); larger for less abundant classes

– $\frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}}$ is the softmax of the true class ($y_i$)
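The weighted cross-entropy can be sketched in NumPy as below. The array layout (one row of class scores per sample) and the function name are assumptions. The max shift only improves numerical stability and does not change the softmax.

```python
import numpy as np

def weighted_cross_entropy(scores, labels, class_weights):
    """scores: (N, C) class scores; labels: (N,) true classes; class_weights: (C,)."""
    # Shift by the row maximum for numerical stability; the softmax is unchanged.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = scores.shape[0]
    log_p_true = log_softmax[np.arange(n), labels]  # log-softmax of the true class
    # Each sample is weighted by the precomputed weight of its true class.
    return -np.mean(class_weights[labels] * log_p_true)

scores = np.zeros((2, 2))            # uniform scores: softmax is 0.5 everywhere
labels = np.array([0, 1])
class_weights = np.array([1.0, 3.0])  # the rarer class gets the larger weight
loss = weighted_cross_entropy(scores, labels, class_weights)
```

With uniform scores every true-class probability is 0.5, so the loss is the mean weight times log 2.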

Recent relevant works on FCN

• Jégou, S., Drozdzal, M., Vazquez, D., Romero, A. and Bengio, Y., 2017. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, pp. 1175-1183.

• Maggiori, E., Tarabalka, Y., Charpiat, G. and Alliez, P., 2017. Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification, IEEE Transactions on Geoscience and Remote Sensing, Vol. 55(2), pp. 645-659.

• Volpi, M. and Tuia, D., 2017. Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks, IEEE Transactions on Geoscience and Remote Sensing, Vol. 55(2), pp. 881-893.

• Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham.

• Long, M., Cao, Y., Wang, J., and Jordan, M. I., 2015. Learning transferable features with deep adaptation networks. Proceedings of the 32nd International Conference on Machine Learning, pp. 97-105.


Thank you!
