Convolutional Neural Networks (CNN) Prof. Seungchul Lee Industrial AI Lab.

May 21, 2020

Page 1:

Convolutional Neural Networks (CNN)

Prof. Seungchul Lee

Industrial AI Lab.

Page 2:

Convolution

Page 3:

Convolution

• Convolution: the integral (or sum) of the product of two signals after one of them is reversed and shifted

• Cross-correlation and convolution: cross-correlation is the same operation without the reversal
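The difference between the two operations can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `cross_correlate` and `convolve` and the sample values are assumptions, and only positions where the kernel fully overlaps the signal are computed.

```python
def cross_correlate(x, k):
    """Slide k over x as written (no reversal), fully overlapping positions only."""
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]


def convolve(x, k):
    """Convolution: reverse the kernel first, then cross-correlate."""
    return cross_correlate(x, k[::-1])


x = [1, 2, 3, 4]   # illustrative signal
k = [1, 0, -1]     # illustrative, non-symmetric kernel

print(cross_correlate(x, k))  # kernel applied as written
print(convolve(x, k))         # kernel reversed before sliding
```

For a symmetric kernel the two results coincide, which is one reason deep learning libraries can call cross-correlation "convolution" without practical consequences: the kernel is learned either way.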

Pages 4-12:

1D Convolution

• (actually cross-correlation)

• The kernel slides across the input one position at a time; each output value is the sum of the element-wise products over the current window

Input (length W = 10): 1 3 2 3 0 -1 1 2 2 1

Kernel (length w = 4): 1 3 0 -1

Output (length L = W - w + 1 = 7): 7 9 12 2 -5 0 6

Source: Dr. Francois Fleuret at EPFL
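The sliding-window computation on these slides can be reproduced with a short Python sketch. The function name `cross_correlate_1d` is an assumption made for illustration; the input and kernel values are the ones shown on the slide.

```python
def cross_correlate_1d(x, k):
    """Sum of element-wise products of k with each length-len(k) window of x."""
    out_len = len(x) - len(k) + 1  # L = W - w + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(out_len)]


signal = [1, 3, 2, 3, 0, -1, 1, 2, 2, 1]  # input from the slide, W = 10
kernel = [1, 3, 0, -1]                    # kernel from the slide, w = 4

output = cross_correlate_1d(signal, kernel)
print(len(output), output)
```

Evaluating the fifth window by hand gives 0·1 + (-1)·3 + 1·0 + 2·(-1) = -5, so the full output is 7 9 12 2 -5 0 6.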

Page 13:

Example: 1D Convolution

Page 14:

De-noising a Piecewise Smooth Signal

• Moving average (MA) filter

– A moving average is the unweighted mean of the previous 𝑚 data points

• Convolution with the kernel (1/𝑚, 1/𝑚, ⋯, 1/𝑚)

• Low-pass filter in time domain
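As a rough sketch of the moving average filter, the snippet below convolves a signal with the kernel (1/m, ..., 1/m). The function name `moving_average` and the noisy step signal are made up for illustration, and only fully overlapping windows are computed.

```python
def moving_average(x, m):
    """Unweighted mean of each window of m consecutive samples."""
    kernel = [1.0 / m] * m  # the (1/m, ..., 1/m) kernel
    n = len(x) - m + 1
    return [sum(x[i + j] * kernel[j] for j in range(m)) for i in range(n)]


# Hypothetical piecewise-constant signal (levels 0.5 and 4.5) with alternating noise.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 5.0, 4.0, 5.0, 4.0, 5.0]
smooth = moving_average(noisy, 2)
print(smooth)
```

The alternating noise is averaged out on each plateau, while the jump between the two levels is spread over the window length. That is the low-pass behavior: larger m smooths more but blurs abrupt changes more.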

Page 15:

De-noising a Piecewise Smooth Signal

Page 16:

Edge Detection

Page 17:

Smoothing and Detection of Abrupt Changes

Page 18:

Images

Pages 19-21:

Images Are Numbers

Source: 6.S191 Intro. to Deep Learning at MIT

Page 22:

Images

[Figure: original image, its R, G, and B channels, and the gray image]

Page 23:

2D Convolution

Page 24:

Convolution on Image (= Convolution in 2D)

• Filter (or Kernel)

– At each position, discrete convolution multiplies an image patch element-wise by the kernel matrix and sums the result

– Modify or enhance an image by filtering

– Filter images to emphasize certain features or remove other features

– Filtering includes smoothing, sharpening and edge enhancement

[Figure: image, kernel, output]

Page 25:

Convolution on Image (= Convolution in 2D)

Page 26:

Convolution on Image

Kernel:

1 0 1
1 0 1
1 0 1

[Figure: image, kernel, output]

Kernel:

1 1 1
0 0 0
1 1 1
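A minimal Python sketch of filtering an image with the two 3×3 kernels above. The 4×4 image is a made-up example, the function name `cross_correlate_2d` is an assumption, and the kernel is applied without flipping (cross-correlation), as in the 1D slides.

```python
def cross_correlate_2d(image, kernel):
    """Slide the kernel over every fully overlapping patch and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 image with a bright vertical stripe in column 1.
image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
kernel_a = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]  # first kernel on the slide
kernel_b = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]  # second kernel on the slide

print(cross_correlate_2d(image, kernel_a))
print(cross_correlate_2d(image, kernel_b))
```

The first kernel responds differently depending on where the stripe falls inside the window, while the second gives the same response everywhere on this image. Different kernels emphasize different structures.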

Page 27:

Convolution on Image

Page 28:

Gaussian Filter: Blurring

Page 29:

How to Find the Right Kernels

• We have learned about many different kernels, each producing a specific effect on images

• Let’s take the opposite approach

• We are not designing the kernel, but are learning the kernel from data

• We can learn a feature extractor from data using a deep learning framework

Page 30:

Learning Visual Features

Page 31:

Convolutional Neural Networks (CNN)

• Motivation

– The bird occupies a local area and looks the same in different parts of an image. We should construct neural networks which exploit these properties.

Page 32:

ANN Structure for Object Detection in Image

• Does not seem to be the best choice

• Does not make use of the fact that we are dealing with images

[Figure: ANN with output label “bird”]

Page 33:

Fully Connected Neural Network

• Input

– 2D image

– Vector of pixel values

• Fully connected

– Connect each neuron in the hidden layer to all neurons in the input layer

– No spatial information

– Spatial organization of the input is destroyed by flattening

– And many, many parameters!

• How can we use spatial structure in the input to inform the architecture of the network?

Source: 6.S191 Intro. to Deep Learning at MIT
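To put a number on "many, many parameters", the sketch below counts the weights and biases of one fully connected hidden layer on a flattened image. The image sizes and the 100 hidden units are illustrative assumptions, not values from the lecture.

```python
def dense_params(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return (n_inputs + 1) * n_units


# Assumed example sizes: a 28x28 grayscale image and a 224x224 RGB image,
# each flattened and fed to a hidden layer of 100 units.
small = dense_params(28 * 28, 100)
large = dense_params(224 * 224 * 3, 100)
print(small, large)
```

The count grows linearly with the number of pixels, so even a modest hidden layer on a moderately sized color image needs millions of parameters.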

Page 34:

Convolution Mask + Neural Network

Pages 35-36:

Locality

• Locality: objects tend to have a local spatial support

– fully-connected layer → locally-connected layer

• We are not designing the kernel, but are learning the kernel from data → learning a feature extractor from data

Kernel weights to learn:

𝜔1 𝜔2 𝜔3
𝜔4 𝜔5 𝜔6
𝜔7 𝜔8 𝜔9

Page 37:

Deep Artificial Neural Networks

• Universal function approximator

– Simple nonlinear neurons

– Linearly connected networks

• Hidden layers

– Autonomous feature learning

[Figure: network with outputs Class 1 and Class 2]

Page 38:

Convolutional Neural Networks

• Structure

– Weight sharing

– Local connectivity

– Typically have sparse interactions

• Optimization

– Smaller search space

[Figure: network with outputs Class 1 and Class 2]

Page 39:

Multiple Filters (or Kernels)

Page 40:

Channels

• Color image = tensor of shape (height, width, channels)

• Convolutions are usually computed for each channel and the results are summed over the channels

• Kernel size, also known as the receptive field (usually 1, 3, 5, 7, or 11)

Source: Dr. Francois Fleuret at EPFL

Pages 41-52:

Multi-channel 2D Convolution

Input: W × H × C

Kernel: w × h × c (the kernel depth c equals the number of input channels C)

Output: (W - w + 1) × (H - h + 1) × 1

Source: Dr. Francois Fleuret at EPFL

Page 53:

Multi-channel and Multi-kernel 2D Convolution

Input: W × H × C

Kernels: D kernels, each of size w × h × c

Output: (W - w + 1) × (H - h + 1) × D

Source: Dr. Francois Fleuret at EPFL
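The shape bookkeeping for multi-channel, multi-kernel convolution can be checked with a small pure-Python sketch. The function name `conv2d_multi`, the nested-list layout (rows first, then columns, then channels), and all sizes and values are assumptions for illustration; no bias, padding, or stride is applied.

```python
def conv2d_multi(x, kernels):
    """x: H x W x C nested lists; kernels: D kernels, each h x w x C.

    Each kernel is slid across rows and columns only; products are summed
    over the window and over all C channels, giving one output channel per kernel.
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    h, w = len(kernels[0]), len(kernels[0][0])
    out = []
    for i in range(H - h + 1):
        row = []
        for j in range(W - w + 1):
            row.append([
                sum(x[i + a][j + b][c] * k[a][b][c]
                    for a in range(h) for b in range(w) for c in range(C))
                for k in kernels
            ])
        out.append(row)
    return out


# Hypothetical sizes: H=5, W=4, C=3 input; D=2 kernels of size 3x3x3.
H, W, C, D, h, w = 5, 4, 3, 2, 3, 3
x = [[[1] * C for _ in range(W)] for _ in range(H)]          # all-ones input
kernels = [[[[d + 1] * C for _ in range(w)] for _ in range(h)]
           for d in range(D)]                                 # kernel d filled with d+1
y = conv2d_multi(x, kernels)
print(len(y), len(y[0]), len(y[0][0]))  # (H - h + 1), (W - w + 1), D
print(y[0][0])                          # one value per kernel at the first position
```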

Page 54:

Dealing with Shapes

• Activations or feature maps shape

– Input (𝑊𝑖, 𝐻𝑖, 𝐶)

– Output (𝑊𝑜, 𝐻𝑜, 𝐷)

• Kernel or filter shape (𝑤, ℎ, 𝐶, 𝐷)

– 𝑤 × ℎ Kernel size

– 𝐶 Input channels

– 𝐷 Output channels

• Number of parameters: (𝑤 × ℎ × 𝐶 + 1) × 𝐷, where the + 1 is the bias of each output channel

Source: Dr. Francois Fleuret at EPFL
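The parameter count formula can be written out as a tiny helper. The function name `conv_layer_params` and the example layer sizes are illustrative assumptions.

```python
def conv_layer_params(w, h, c_in, d_out):
    """(w * h * C + 1) * D: weights of D kernels plus one bias per kernel."""
    weights = w * h * c_in * d_out
    biases = d_out
    return weights + biases


# Assumed examples: 3x3 kernels on an RGB input with 64 output channels,
# and 5x5 kernels on a single-channel input with 32 output channels.
print(conv_layer_params(3, 3, 3, 64))
print(conv_layer_params(5, 5, 1, 32))
```

Note that the count does not depend on the input width or height, only on the kernel size and the channel counts. This is the effect of weight sharing.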

Page 55:

Multi-channel 2D Convolution

• The kernel is not swept across channels, just across rows and columns.

• Note that a convolution preserves the signal support structure.

– A 1D signal is converted into a 1D signal, a 2D signal into a 2D signal, and neighboring parts of the input signal influence neighboring parts of the output signal.

• We usually refer to one of the channels generated by a convolution layer as an activation map.

• We refer to the sub-area of an input map that influences a component of the output as the receptive field of the latter.

Page 56:

Padding and Stride

Page 57:

Strides

• Strides: increment step size for the convolution operator

• Reduces the size of the output map

Example with kernel size 3×3 and a stride of 2 (image in blue)

Source: https://github.com/vdumoulin/conv_arithmetic

Page 58:

Padding

• Padding: artificially fill the borders of the image

• Useful to keep the spatial dimensions constant across filters

• Useful with strides and large receptive fields

• Usually fill with 0s

Source: https://github.com/vdumoulin/conv_arithmetic

Pages 59-68:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1, 1), a stride of (2, 2), and a kernel of size 3 × 3 × 𝐶

Input: 5 × 5 × 𝐶 (7 × 7 × 𝐶 after padding 1 on each side)

Output: 3 × 3 × 1, since the kernel fits at 3 positions along each padded axis when moved in steps of 2

Source: Dr. Francois Fleuret at EPFL
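The output size along one axis follows the standard relation floor((n + 2p - k) / s) + 1 for input size n, kernel size k, padding p, and stride s. A minimal sketch, with the function name `conv_output_size` chosen for illustration:

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Number of kernel positions along one axis."""
    return (n + 2 * padding - k) // stride + 1


# Example from the slides: 5x5 input, 3x3 kernel, padding 1, stride 2.
print(conv_output_size(5, 3, padding=1, stride=2))

# Without padding and with stride 1 this reduces to L = W - w + 1.
print(conv_output_size(10, 4))
```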

Page 69:

Nonlinear Activation Function

Page 70:

Pooling

Page 71:

Pooling

• Compute a maximum value in a sliding window (max pooling)

• Reduce spatial resolution for faster computation

• Achieve invariance to local translation

• Max pooling introduces invariances

– Pooling size: 2×2

– No parameters: max or average of 2×2 units

Pages 72-78:

Pooling

• The most standard type of pooling is max pooling, which computes max values over non-overlapping blocks

• For instance in 1D with a window of size 2

Input (length r·w = 10, that is r = 5 blocks of width w = 2): 1 3 2 3 0 -1 1 2 2 1

Output (length r = 5): 3 3 0 2 2

• Such an operation aims at grouping several activations into a single “more meaningful” one.

• Average pooling computes average values per block instead of max values

Source: Dr. Francois Fleuret at EPFL
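Max and average pooling over non-overlapping blocks can be sketched as follows. The function names `pool_1d` and `mean` are assumptions for illustration; the input is the one shown on the slides.

```python
def pool_1d(x, w, op=max):
    """Apply op to each non-overlapping block of width w."""
    return [op(x[i:i + w]) for i in range(0, len(x) - w + 1, w)]


def mean(block):
    return sum(block) / len(block)


signal = [1, 3, 2, 3, 0, -1, 1, 2, 2, 1]  # input from the slide

max_pooled = pool_1d(signal, 2)         # max pooling, window 2
avg_pooled = pool_1d(signal, 2, mean)   # average pooling, window 2
print(max_pooled)
print(avg_pooled)
```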

Pages 79-80:

Pooling: Invariance

• Pooling provides invariance to any permutation inside one of the cells

• More practically, it provides a pseudo-invariance to deformations that result in local translations

Source: Dr. Francois Fleuret at EPFL
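Both properties can be checked on a small example. In this sketch (function name and inputs are illustrative assumptions), swapping the two values inside every cell leaves the max-pooled output unchanged, while shifting the whole signal by one sample changes only some of the outputs.

```python
def max_pool_1d(x, w):
    """Max over each non-overlapping block of width w."""
    return [max(x[i:i + w]) for i in range(0, len(x) - w + 1, w)]


original = [1, 3, 2, 3, 0, -1, 1, 2, 2, 1]
swapped = [3, 1, 3, 2, -1, 0, 2, 1, 1, 2]   # values permuted inside each cell of 2
shifted = original[1:] + [original[-1]]      # shifted left by one, last value repeated

pooled = max_pool_1d(original, 2)
pooled_swapped = max_pool_1d(swapped, 2)
pooled_shifted = max_pool_1d(shifted, 2)

print(pooled, pooled_swapped)  # identical: exact invariance to in-cell permutation
print(pooled_shifted)          # partly the same: only a pseudo-invariance to shifts
unchanged = sum(a == b for a, b in zip(pooled, pooled_shifted))
print(unchanged, "of", len(pooled), "outputs unchanged after the shift")
```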

Pages 81-95:

Multi-channel Pooling

Input: (r·w) × (s·h) × C

Output: r × s × C (pooling is applied to each channel independently, so the number of channels is unchanged)

Source: Dr. Francois Fleuret at EPFL

Page 96:

Inside the Convolution Layer Block

[Figure: conv blocks]

Page 97:

Classic ConvNet Architecture

• Input

• Conv blocks

– Convolution + activation (ReLU)

– Convolution + activation (ReLU)

– ...

– Max pooling

• Output

– Fully connected layers

– Softmax
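To see how a conv block changes the feature map shape, the sketch below traces shapes through two convolutions followed by max pooling. All layer sizes (28×28×1 input, 3×3 kernels, 32 filters, 2×2 pooling) are assumptions chosen for illustration, not the configuration used in the lecture, and the helper names are hypothetical.

```python
def conv_shape(shape, k, filters, padding=0, stride=1):
    """Shape after a k x k convolution with the given number of filters."""
    h, w, _ = shape
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    return (out_h, out_w, filters)


def pool_shape(shape, size=2):
    """Shape after non-overlapping size x size pooling (channels unchanged)."""
    h, w, c = shape
    return (h // size, w // size, c)


shape = (28, 28, 1)                # assumed MNIST-sized grayscale input
shape = conv_shape(shape, 3, 32)   # convolution + ReLU (ReLU keeps the shape)
shape = conv_shape(shape, 3, 32)   # convolution + ReLU
shape = pool_shape(shape)          # max pooling
flat = shape[0] * shape[1] * shape[2]  # length of the vector fed to the fully connected layers
print(shape, flat)
```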

Page 98: Convolutional Neural Networks (CNN) · Convolutional Neural Networks (CNN) •Motivation –The bird occupies a local area and looks the same in different parts of an image. We should

CNNs for Classification: Feature Learning

• Learn features in input image through convolution

• Introduce non-linearity through activation function (real-world data is non-linear!)

• Reduce dimensionality and preserve spatial invariance with pooling

Source: 6.S191 Intro. to Deep Learning at MIT

Page 99

CNNs for Classification: Class Probabilities

• CONV and POOL layers output high-level features of input

• Fully connected layer uses these features for classifying input image

• Express output as probability of image belonging to a particular class

Source: 6.S191 Intro. to Deep Learning at MIT

Page 100

CNNs: Training with Backpropagation

• Learn weights for convolutional filters and fully connected layers

• Backpropagation: cross-entropy loss

Source: 6.S191 Intro. to Deep Learning at MIT
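To make the role of the cross-entropy loss in backpropagation a little more concrete, here is a minimal sketch in plain Python. It assumes a softmax output layer, for which the gradient of the loss with respect to the logits is the predicted probabilities minus the one-hot label, and checks that against a finite-difference estimate. The example logits and the helper names are illustrative only.

```python
import math


def softmax(logits):
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss for a single example with an integer class label."""
    return -math.log(softmax(logits)[label])


def grad_logits(logits, label):
    """Analytic gradient of the loss w.r.t. the logits: softmax(z) - one_hot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


logits, label = [2.0, 1.0, 0.1], 0           # made-up values
analytic = grad_logits(logits, label)

# Finite-difference check of the analytic gradient.
eps = 1e-6
numeric = []
for k in range(len(logits)):
    bumped = list(logits)
    bumped[k] += eps
    numeric.append((cross_entropy(bumped, label) - cross_entropy(logits, label)) / eps)
```

This gradient at the output is what backpropagation then pushes back through the fully connected layers and the convolutional filters.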

Page 101

CNN in TensorFlow

Page 102

Lab: CNN with TensorFlow

• MNIST example

• To classify handwritten digits

Page 103

CNN Structure

Page 104

Weights, Biases and Placeholder

Page 105

Build a Model

• Convolution layers

1) The layer performs several convolutions to produce a set of linear activations

2) Each linear activation is run through a nonlinear activation function

3) Use pooling to modify the output of the layer further

• Fully connected layers

– Simple multi-layer perceptrons (MLP)

Page 106

Convolution

• First, the layer performs several convolutions to produce a set of linear activations

– Filter size : 3×3

– Stride : The step size of the sliding window along each dimension of the input

– Padding : Allows us to control the kernel width and the size of the output independently

• 'SAME' : zero padding

• 'VALID' : No padding
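The effect of the two padding modes can be sketched with a small single-channel convolution (really a cross-correlation) in plain Python. This is not the TensorFlow implementation; the function name `conv2d` here is a local helper, and the split of the zero padding (half before, the remainder after) is an assumption about the usual 'SAME' convention.

```python
def conv2d(image, kernel, stride=1, padding="VALID"):
    """Single-channel 2D cross-correlation on nested lists."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    if padding == "SAME":
        out_h, out_w = -(-h // stride), -(-w // stride)     # ceil(size / stride)
        pad_h = max((out_h - 1) * stride + kh - h, 0)
        pad_w = max((out_w - 1) * stride + kw - w, 0)
        top, left = pad_h // 2, pad_w // 2
        padded = [[0.0] * (w + pad_w) for _ in range(h + pad_h)]
        for i in range(h):
            for j in range(w):
                padded[top + i][left + j] = image[i][j]
        image = padded
    else:                                                   # 'VALID': no padding
        out_h = (h - kh) // stride + 1
        out_w = (w - kw) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1.0] * 5 for _ in range(5)]       # 5x5 input of ones
kernel = [[1.0] * 3 for _ in range(3)]      # 3x3 filter of ones

valid = conv2d(image, kernel, padding="VALID")   # 3x3 output
same = conv2d(image, kernel, padding="SAME")     # 5x5 output
```

With 'VALID' the output shrinks to 3×3, while 'SAME' keeps it at 5×5; the border values of the 'SAME' output are smaller because part of the window covers zeros.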

Page 107

Activation

• Second, each linear activation is run through a nonlinear activation function
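As a small illustration, assuming ReLU is the activation (as in the conv-block outline earlier in the deck), the function is applied element-wise to the feature map. The values below are made up.

```python
def relu(feature_map):
    """Element-wise ReLU on a 2D feature map: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


linear_activations = [[-1.5, 2.0],
                      [0.0, -0.3]]
activated = relu(linear_activations)
print(activated)   # [[0.0, 2.0], [0.0, 0.0]]
```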

Page 108

Pooling

• Third, use pooling to modify the output of the layer further

– Compute a maximum value in a sliding window (max pooling)

– Pooling size : 2×2
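A minimal sketch of 2×2 max pooling, applied independently to each channel, might look as follows. The stride of 2 is an assumption (the slide only gives the pooling size), and the input values and function names are illustrative.

```python
def max_pool2d(channel, size=2, stride=2):
    """Max pooling over one channel given as a nested list."""
    out_h = (len(channel) - size) // stride + 1
    out_w = (len(channel[0]) - size) // stride + 1
    return [[max(channel[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool_multichannel(tensor, size=2, stride=2):
    """Pool every channel separately; the number of channels is unchanged."""
    return [max_pool2d(channel, size, stride) for channel in tensor]


x = [   # 2 channels, each 4x4
    [[1, 3, 2, 0], [4, 2, 1, 1], [0, 1, 5, 6], [2, 2, 7, 8]],
    [[-1, -2, 0, 0], [-3, -4, 0, 1], [9, 0, 0, 0], [0, 0, 0, -5]],
]
pooled = max_pool_multichannel(x)
print(pooled)   # [[[4, 2], [2, 8]], [[-1, 1], [9, 0]]]
```

Each 4×4 channel is reduced to 2×2, so the spatial size halves while the channel count stays the same.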

Page 109

Second Convolution Layer

Page 110

Fully Connected Layer

• Fully connected layer

– Input is typically in the form of flattened features

– Then, apply softmax for multiclass classification problems

– The output of the softmax function is a categorical probability distribution: it gives the probability of each class
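The flatten, fully connected, and softmax steps can be sketched in plain Python as below. The tiny 1×2×2 feature tensor, the weights, and the three-class output are made-up values for illustration, not the lab's actual parameters.

```python
import math


def flatten(tensor):
    """Flatten a channels x height x width nested list into one feature vector."""
    return [v for channel in tensor for row in channel for v in row]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [[[1.0, 0.0], [0.0, 1.0]]]        # hypothetical pooled feature map
x = flatten(features)                        # [1.0, 0.0, 0.0, 1.0]
weights = [[0.5, 0.0, 0.0, 0.5],
           [0.0, 1.0, 1.0, 0.0],
           [-0.5, 0.0, 0.0, -0.5]]
biases = [0.0, 0.0, 0.0]

logits = dense(x, weights, biases)           # [1.0, 0.0, -1.0]
probs = softmax(logits)                      # non-negative, sums to 1
predicted_class = probs.index(max(probs))    # 0
```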

Page 111

Loss and Optimizer

• Loss

– Classification: Cross entropy

– Equivalent to applying (multinomial) logistic regression to the learned features

• Optimizer

– GradientDescentOptimizer

– AdamOptimizer: one of the most popular optimizers
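To show how the two update rules differ, the sketch below applies plain gradient descent and Adam to a toy one-dimensional loss, (w − 3)². These are hand-written update rules, not the TensorFlow optimizer classes named above; the loss, the learning rates, and the step counts are assumptions, and the Adam constants are the commonly quoted defaults.

```python
import math


def grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)                    # step against the gradient
    return w


def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0                              # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)         # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_gd = gradient_descent(0.0)
w_adam = adam(0.0)
```

Both runs start at w = 0 and should end close to the minimizer w = 3. Adam rescales each step by a running estimate of the gradient magnitude, which is one reason it often needs less learning-rate tuning.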

Page 112

Optimization

Page 113

Test or Evaluation

Page 114

Test or Evaluation

Page 115

CNN Implemented in an Embedded System
