Semantic Segmentation with Convolutional Neural Networks R. Q. FEITOSA
Typical ConvNet for Image Classification
[Figure: three stacked blocks of convolution → activation function → pooling, followed by flatten → fully connected]
Typical ConvNet for Image Classification
[Figure: compact view of the network: conv → flatten → fully connected]
Main Application Groups
• Image Classification
• Detection/Localization
• Semantic Segmentation
1st idea for SS (semantic segmentation): sliding window
[Figure: crop a patch from the input image and classify its center pixel with a CNN; example patch labels: garden, garden, roof]
Redundant operations due to overlap. Inefficient!
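The sliding-window idea can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: `toy_classifier` is a hypothetical stand-in for the CNN, the image is a single-channel array, and the 5×5 patch size is an assumption.

```python
import numpy as np

def toy_classifier(patch):
    """Hypothetical stand-in for a CNN: label 1 if the patch is mostly bright."""
    return int(patch.mean() > 0.5)

def sliding_window_segment(image, classify_patch, patch_size=5):
    """Label every pixel by classifying the patch centered on it."""
    half = patch_size // 2
    # Pad so that border pixels also get a full patch around them.
    padded = np.pad(image, half, mode="reflect")
    labels = np.zeros(image.shape, dtype=int)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + patch_size, c:c + patch_size]
            labels[r, c] = classify_patch(window)
    return labels

# Toy image: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
labels = sliding_window_segment(image, toy_classifier)

# One classifier call per pixel, each reading patch_size**2 pixels.
pixels_read = image.size * 5 * 5
```

Neighboring patches share most of their pixels, so the same inputs are processed again and again; this is the redundancy the slide points out.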
2nd idea for SS: Fully Convolutional (a)
A bunch of convolutional layers at input resolution to classify all pixels at once.
Convolution at the original resolution is expensive!
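A rough count of multiply-accumulate operations (MACs) shows how the cost scales with resolution. The layer sizes below (512×512 input, 64 input and output channels, 3×3 kernels) are illustrative assumptions, not values from the lecture.

```python
def conv_macs(height, width, c_in, c_out, k=3):
    """MACs of one k x k convolution layer with 'same' output size."""
    return height * width * c_in * c_out * k * k

# Assumed example layer, evaluated at full and at half resolution.
full_res = conv_macs(512, 512, 64, 64)
half_res = conv_macs(256, 256, 64, 64)
ratio = full_res / half_res
```

The cost grows with the number of pixels, so halving the resolution in both directions cuts the work of a layer by a factor of four.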
3rd idea for SS: Fully Convolutional (b)
A bunch of convolutional layers with downsampling and upsampling inside the network.
[Figure: network with a contracting (downsampling) stage followed by an expanding (upsampling) stage]
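Upsampling: Unpooling
Nearest Neighbor (input: 2×2, output: 4×4): each input value is copied into a 2×2 block.
input:
1 2
4 3
output:
1 1 2 2
1 1 2 2
4 4 3 3
4 4 3 3
Bed of Nails (input: 2×2, output: 4×4): each input value goes to the top-left corner of its 2×2 block; the rest is zero.
input:
1 2
4 3
output:
1 0 2 0
0 0 0 0
4 0 3 0
0 0 0 0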
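Both unpooling variants are short enough to write out. The sketch below assumes a single-channel 2D array and an upsampling factor of 2; the function names are hypothetical, and the 2×2 input is the one read from the slide figure.

```python
import numpy as np

def nearest_neighbor_unpool(x, factor=2):
    """Copy each value into a factor x factor block."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def bed_of_nails_unpool(x, factor=2):
    """Put each value in the top-left corner of its block, zeros elsewhere."""
    out = np.zeros((x.shape[0] * factor, x.shape[1] * factor), dtype=x.dtype)
    out[::factor, ::factor] = x
    return out

x = np.array([[1, 2],
              [4, 3]])
nn = nearest_neighbor_unpool(x)
bon = bed_of_nails_unpool(x)
```

Neither variant has learnable parameters; they only fix how the 2×2 values are spread over the 4×4 grid.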
Upsampling: Max Unpooling
Max Pooling memorizes where the max came from (input: 4×4, output: 2×2).
Max Unpooling puts each value at the location where the max came from (input: 2×2, output: 4×4).
Corresponding pairs of down- and upsampling layers.
[Figure: a 4×4 map is max-pooled to 2×2; later in the network a 2×2 map is unpooled to 4×4, with zeros everywhere except at the memorized max positions]
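The pairing of a pooling layer with its unpooling layer can be sketched as follows. This is a minimal single-channel version with hypothetical function names; the 4×4 input and the 2×2 values to unpool are made-up example data, not the numbers from the slide figure.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Max pooling that also returns the (row, col) of each maximum."""
    h, w = x.shape[0] // size, x.shape[1] // size
    pooled = np.zeros((h, w), dtype=x.dtype)
    argmax = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            argmax[i, j] = (i * size + r, j * size + c)
    return pooled, argmax

def max_unpool(values, argmax, out_shape):
    """Place each value at the position memorized by the paired pooling."""
    out = np.zeros(out_shape, dtype=values.dtype)
    for i in range(values.shape[0]):
        for j in range(values.shape[1]):
            r, c = argmax[i, j]
            out[r, c] = values[i, j]
    return out

# Example data (assumed): pooling in the contracting stage ...
x = np.array([[1, 2, 6, 3],
              [3, 5, 2, 1],
              [1, 2, 2, 1],
              [7, 3, 4, 8]])
pooled, argmax = max_pool_with_indices(x)

# ... and unpooling of a later 2x2 feature map in the expanding stage.
later = np.array([[1, 2],
                  [3, 4]])
unpooled = max_unpool(later, argmax, x.shape)
```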
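Learnable Upsampling: Transpose Convolution
3 × 3 transpose convolution, stride 2, pad 1
Filter moves 2 pixels in the output for every pixel in the input.
Stride gives the ratio between movement in the output and movement in the input.
[Figure: input 2×2 with rows (.2 .5) and (.3 .1); filter 3×3 with rows (1 2 1), (2 4 2), (1 2 1); output 4×4. The first input value .2 contributes .8 .4 / .4 .2 to the top-left corner of the output.]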
Learnable Upsampling: Transpose Convolution
3 × 3 transpose convolution, stride 2, pad 1
Output contains copies of the filter weighted by the input, summing up at overlaps in the output.
[Figure: same 2×2 input and 3×3 filter; after the second input value .5 is added, the first two output rows read .8 1.4 2 1 and .4 .7 1 .5]
Learnable Upsampling: Transpose Convolution
3 × 3 transpose convolution, stride 2, pad 1
Other names:
– deconvolution
– upconvolution
– fractionally strided convolution
– backward strided convolution
[Figure: same 2×2 input and 3×3 filter; the third input value .3 is added to the 4×4 output]
Learnable Upsampling: Transpose Convolution
3 × 3 transpose convolution, stride 2, pad 1
[Figure: same 2×2 input and 3×3 filter; completed 4×4 output after all four input values have been added]
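The slide example can be reproduced with a short NumPy sketch. It uses the 2×2 input and 3×3 filter from the figure; the function name is hypothetical. One assumption is needed to obtain the 4×4 output the slides show: the padding is cropped at the top and left only, which corresponds to what frameworks such as PyTorch call an output padding of 1.

```python
import numpy as np

def transpose_conv2d(x, kernel, stride=2, pad=1, out_size=4):
    """Stamp a copy of the kernel, weighted by each input value, onto the output."""
    h, w = x.shape
    kh, kw = kernel.shape
    full = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Moving one pixel in the input moves `stride` pixels in the output;
            # overlapping stamps are summed.
            full[i * stride:i * stride + kh,
                 j * stride:j * stride + kw] += x[i, j] * kernel
    # Assumption: crop `pad` rows/columns at the top-left and keep out_size.
    return full[pad:pad + out_size, pad:pad + out_size]

x = np.array([[0.2, 0.5],
              [0.3, 0.1]])
kernel = np.array([[1.0, 2.0, 1.0],
                   [2.0, 4.0, 2.0],
                   [1.0, 2.0, 1.0]])
out = transpose_conv2d(x, kernel)
# Rows of `out`, worked by hand:
# 0.8 1.4 2.0 1.0
# 1.0 1.1 1.2 0.6
# 1.2 0.8 0.4 0.2
# 0.6 0.4 0.2 0.1
```

In a network the kernel entries are learned, which is what makes this upsampling "learnable" in contrast to the fixed unpooling schemes.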
FCN Example: U-net
To improve spatial accuracy, the output of the corresponding layer in the contractive stage is appended to the inputs of the expansive stage.
Picture from: Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N.,
Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015.
Lecture Notes in Computer Science, vol 9351. Springer, Cham.
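The "appending" step amounts to a concatenation along the channel axis. The sketch below assumes feature maps stored as (channels, height, width) arrays and hypothetical function names; the center crop covers the case where the contractive feature map is larger than the expansive one, as happens with unpadded convolutions.

```python
import numpy as np

def center_crop(feat, height, width):
    """Crop a (C, H, W) feature map around its center."""
    _, fh, fw = feat.shape
    top, left = (fh - height) // 2, (fw - width) // 2
    return feat[:, top:top + height, left:left + width]

def skip_concat(encoder_feat, decoder_feat):
    """Append encoder features to decoder features along the channel axis."""
    _, h, w = decoder_feat.shape
    return np.concatenate([center_crop(encoder_feat, h, w), decoder_feat], axis=0)

# Assumed example shapes: 4 channels each, encoder map slightly larger.
encoder_feat = np.ones((4, 10, 10))
decoder_feat = np.zeros((4, 8, 8))
merged = skip_concat(encoder_feat, decoder_feat)
```

The merged tensor carries both the fine spatial detail from the contractive stage and the upsampled context from the expansive stage.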
FCN Example: U-net
To circumvent the scarcity of training samples:
• Patch-wise training and inference.
• Final result is the mosaic of patches’ results.
• Trade-off between patch size and number of training samples.
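Patch-wise inference with a mosaic of the results can be sketched as below. This is a simplified illustration with non-overlapping tiles; `threshold_patch` is a hypothetical stand-in for the trained network, and the 4×4 patch size is an assumption.

```python
import numpy as np

def threshold_patch(tile):
    """Hypothetical stand-in for the network: one label per pixel of the tile."""
    return (tile > 0.5).astype(int)

def predict_by_patches(image, predict_patch, patch_size=4):
    """Run the predictor tile by tile and assemble the mosaic."""
    h, w = image.shape
    mosaic = np.zeros((h, w), dtype=int)
    for r in range(0, h, patch_size):
        for c in range(0, w, patch_size):
            tile = image[r:r + patch_size, c:c + patch_size]
            mosaic[r:r + patch_size, c:c + patch_size] = predict_patch(tile)
    return mosaic

image = np.arange(64).reshape(8, 8) / 63.0
mosaic = predict_by_patches(image, threshold_patch)
```

Smaller patches give more training samples from the same image but less context per sample, which is the trade-off named in the last bullet.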
FCN Example: U-net
The loss function must be weighted to compensate for unbalanced training data:
$$L = -\frac{1}{N} \sum_{i} \omega_{y_i} \log \frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}}$$
where
– $L$ is the weighted cross-entropy
– $\omega_{y_i}$ is the precomputed weight for the true class ($y_i$); larger for less abundant classes
– $e^{s_{y_i}} / \sum_{j} e^{s_j}$ is the softmax of the true class ($y_i$)
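The weighted cross-entropy can be written directly in NumPy. This is a minimal sketch under stated assumptions: `scores` holds one row of class scores per sample, `labels` holds the true class indices, `class_weights` holds one precomputed weight per class, and the function name is hypothetical.

```python
import numpy as np

def weighted_cross_entropy(scores, labels, class_weights):
    """Mean over samples of -w[y_i] * log softmax(scores_i)[y_i]."""
    # Subtract the row maximum for numerical stability (softmax is unchanged).
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    idx = np.arange(len(labels))
    return -np.mean(class_weights[labels] * log_softmax[idx, labels])

# Assumed toy data: two samples, two classes, rare class 1 weighted 3x.
scores = np.zeros((2, 2))
labels = np.array([0, 1])
class_weights = np.array([1.0, 3.0])
loss = weighted_cross_entropy(scores, labels, class_weights)
```

With equal scores every softmax value is 0.5, so the sample of the rare class contributes three times as much to the loss as the other one.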
Recent relevant works on FCN
• Jégou, S., Drozdzal, M., Vazquez, D., Romero, A. and Bengio, Y., 2017. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, pp. 1175-1183.
• Maggiori, E., Tarabalka, Y., Charpiat, G. and Alliez, P., 2017. Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification, IEEE Transactions on Geoscience and Remote Sensing, Vol. 55(2), pp. 645-659.
• Volpi, M. and Tuia, D., 2017. Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks, IEEE Transactions on Geoscience and Remote Sensing, Vol. 55(2), pp. 881-893.
• Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham.
• Long, M., Cao, Y., Wang, J., and Jordan, M. I., 2015. Learning transferable features with deep adaptation networks. Proceedings of the 32nd International Conference on Machine Learning, pp. 97-105.