Page 1: Case Study of Convolutional Neural Network

Case Study of CNN from LeNet to ResNet

NamHyuk Ahn @ Ajou Univ. 2016. 03. 09

Page 2: Case Study of Convolutional Neural Network

Convolutional Neural Network

Page 3: Case Study of Convolutional Neural Network
Page 4: Case Study of Convolutional Neural Network
Page 5: Case Study of Convolutional Neural Network

Convolution Layer

- Convolve the image with a filter (a 3-D dot product at each spatial location)

- Stack several filters in one layer (see the blue and green outputs; each output is called a channel)
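
A minimal numpy sketch of this idea (shapes and variable names are illustrative, not from the slides): each output value is the 3-D dot product between one filter and one local patch of the image, and stacking several filters gives several output channels.

```python
import numpy as np

# Illustrative shapes: a 7x7 RGB image and two 3x3x3 filters (stride 1, no padding).
image   = np.random.rand(7, 7, 3)
filters = np.random.rand(2, 3, 3, 3)            # 2 filters -> 2 output channels

out_h = image.shape[0] - filters.shape[1] + 1   # 5
out_w = image.shape[1] - filters.shape[2] + 1   # 5
output = np.zeros((out_h, out_w, filters.shape[0]))

for k in range(filters.shape[0]):               # one output channel per filter
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i+3, j:j+3, :]      # local 3x3x3 patch
            output[i, j, k] = np.sum(patch * filters[k])   # 3-D dot product

print(output.shape)   # (5, 5, 2)
```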

Page 6: Case Study of Convolutional Neural Network

Convolution Layer - Local Connectivity

• Instead of connecting every pixel to every neuron, connect each neuron only to a local region of the input (called its receptive field)

• This greatly reduces the number of parameters

- Parameter sharing

• To reduce parameters further, all neurons in the same output channel share one filter (# of filters == # of output channels)
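
A rough back-of-the-envelope comparison (hypothetical layer sizes, not from the slides) of why local connectivity plus parameter sharing matters:

```python
# Hypothetical layer: 32x32x3 input and 10 output units / channels.
fully_connected = 32 * 32 * 3 * 10    # every pixel connected to every output neuron
conv_shared     = 5 * 5 * 3 * 10      # one shared 5x5x3 filter per output channel
print(fully_connected, conv_shared)   # 30720 vs 750 weights (biases ignored)
```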

Page 7: Case Study of Convolutional Neural Network

Convolution Layer - Example: 1st conv layer in AlexNet

• Input: [224, 224, 3], filters: 96 of size [11x11x3] (stride 4), output: [55, 55, 96]

- Each filter extracts a different feature (e.g. horizontal edges, vertical edges, …)
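
The spatial size follows (W - F + 2P) / S + 1. A small sketch of that arithmetic for this layer; note that the stated 224 input only works out to 55 with the commonly used 227 input (or equivalent padding), a well-known quirk of the AlexNet paper.

```python
def conv_output_size(w, f, stride, pad=0):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * pad) // stride + 1

# AlexNet conv1: 11x11 filters, stride 4; a 227 (or padded 224) input gives 55.
print(conv_output_size(227, 11, stride=4))   # 55
```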

Page 8: Case Study of Convolutional Neural Network

Pooling Layer - Downsample the feature map to reduce parameters and computation

- Max pooling is the most common choice (take the maximum value in each region)
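
A minimal numpy sketch of 2x2 max pooling with stride 2 (illustrative input size):

```python
import numpy as np

x = np.random.rand(4, 4)                          # one input channel
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))   # maximum of each 2x2 block
print(pooled.shape)                               # (2, 2)
```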

Page 9: Case Study of Convolutional Neural Network

ReLU, FC Layer

- ReLU

• A kind of activation function (others include sigmoid, tanh, …)

- Fully-connected Layer

• Same as a layer in an ordinary neural network

Page 10: Case Study of Convolutional Neural Network

Convolutional Neural Network


Page 11: Case Study of Convolutional Neural Network

Training CNN

1. Compute the loss with forward propagation

2. Optimize the parameters w.r.t. the loss with back-propagation

• Use a gradient descent method (e.g. SGD)

• The gradient of each weight is computed with the chain rule of partial derivatives
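
A minimal sketch of this loop on a toy one-layer model with a squared loss (all names and sizes are illustrative, not the slides' network):

```python
import numpy as np

np.random.seed(0)
x, y = np.random.rand(10, 3), np.random.rand(10, 1)   # toy data
w = np.random.randn(3, 1) * 0.01                      # small random init
lr = 0.1

for step in range(100):
    pred = x @ w                            # 1. forward prop
    loss = np.mean((pred - y) ** 2)         #    loss function
    grad = 2 * x.T @ (pred - y) / len(x)    # 2. back-prop via the chain rule
    w -= lr * grad                          #    gradient descent (SGD) update
```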

Page 12: Case Study of Convolutional Neural Network
Page 13: Case Study of Convolutional Neural Network
Page 14: Case Study of Convolutional Neural Network
Page 15: Case Study of Convolutional Neural Network

ILSVRC trend

Page 16: Case Study of Convolutional Neural Network
Page 17: Case Study of Convolutional Neural Network

AlexNet (2012) (ILSVRC 2012 winner)

Page 18: Case Study of Convolutional Neural Network

AlexNet

- ReLU

- Data augmentation

- Dropout

- Ensemble of CNNs (1 CNN: 18.2%, 7 CNNs: 15.4% top-5 error)

Page 19: Case Study of Convolutional Neural Network

AlexNet

- Other techniques (not covered today)

• SGD + momentum (+ mini-batch)

• Multiple GPU

• Weight Decay

• Local Response Normalization

Page 20: Case Study of Convolutional Neural Network

Problems of sigmoid

- Gradient vanishing

• When the gradient passes through a sigmoid it can vanish, because the local gradient of the sigmoid can be almost zero

- Output is not zero-centered

• This hurts optimization: gradients on a neuron's weights all share the same sign, causing zig-zag updates
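
A quick numeric check of the vanishing-gradient point: the local gradient of the sigmoid, sigma(x) * (1 - sigma(x)), peaks at 0.25 and is nearly zero for large |x|.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, 0.0, 10.0])
local_grad = sigmoid(x) * (1 - sigmoid(x))
print(local_grad)   # approx [4.5e-05, 0.25, 4.5e-05] -> gradients can vanish
```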

Page 21: Case Study of Convolutional Neural Network

ReLU

- SGD converges faster than with sigmoid-like activations

- Computationally cheap

Page 22: Case Study of Convolutional Neural Network

Data augmentation - Randomly crop the [256, 256] images to [224, 224] during training

- At test time, take 5 crops and average their predictions
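
A minimal sketch of the training-time random crop (hypothetical helper, with numpy arrays standing in for images):

```python
import numpy as np

def random_crop(img, size=224):
    """Randomly crop a [256, 256, 3] image to [size, size, 3]."""
    h, w = img.shape[:2]
    top  = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]

img = np.random.rand(256, 256, 3)
print(random_crop(img).shape)   # (224, 224, 3)
```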

Page 23: Case Study of Convolutional Neural Network

Dropout - Similar to bagging (an approximation of bagging)

- Acts like a regularizer (reduces overfitting)

- Instead of using all neurons, randomly "drop out" some neurons (usually with probability 0.5)

Page 24: Case Study of Convolutional Neural Network

Dropout

• At test time, no neurons are dropped; instead the outputs are scaled (usually by 0.5)

• The scaled output is the expected value of each neuron under dropout
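
A minimal numpy sketch of that train/test behaviour (the classic formulation: drop at training time, scale by the keep probability at test time):

```python
import numpy as np

p = 0.5                                        # keep probability

def dropout_train(x):
    mask = np.random.rand(*x.shape) < p        # randomly "drop" neurons
    return x * mask

def dropout_test(x):
    return x * p                               # scale to the expected value instead

h = np.random.rand(4)
print(dropout_train(h), dropout_test(h))
```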

Page 25: Case Study of Convolutional Neural Network

Architecture

- conv - pool - … - fc - softmax (similar to LeNet)

- Uses large filters (e.g. 11x11)

Page 26: Case Study of Convolutional Neural Network

Architecture

- Weights must be initialized randomly

• Otherwise all neurons receive the same gradient (symmetry is never broken)

• Usually a Gaussian distribution with std = 0.01 is used

- Weights are updated with mini-batch SGD plus momentum
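
A minimal sketch of that initialization (layer sizes are illustrative):

```python
import numpy as np

fan_in, fan_out = 1024, 512
W = np.random.randn(fan_in, fan_out) * 0.01   # small random Gaussian, std = 0.01
b = np.zeros(fan_out)                         # biases can simply start at zero
```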

Page 27: Case Study of Convolutional Neural Network

VGGNet (2014) (ILSVRC 2014 2nd)

Page 28: Case Study of Convolutional Neural Network

VGGNet

- Uses small kernels (always 3x3)

• A stack of 3x3 convs gives multiple non-linearities (e.g. ReLU) over the same receptive field

• Fewer weights to train (see the comparison after this list)

- Heavier data augmentation (more than AlexNet)

- Ensemble of 7 models (ILSVRC submission: 7.3% top-5 error)
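
The weight comparison in numbers (C input and output channels is an assumption for the illustration): three stacked 3x3 convs cover the same receptive field as one 7x7 conv, with fewer weights and three ReLUs instead of one.

```python
C = 256                                   # illustrative channel count
three_3x3 = 3 * (3 * 3 * C * C)           # three stacked 3x3 conv layers
one_7x7   = 7 * 7 * C * C                 # one 7x7 conv layer, same receptive field
print(three_3x3, one_7x7)                 # 1769472 vs 3211264 weights
```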

Page 29: Case Study of Convolutional Neural Network

Architecture

- Most of the memory is used in the early conv layers; most of the parameters are in the fc layers

Page 30: Case Study of Convolutional Neural Network

GoogLeNet - Inception v1 (2014) (ILSVRC 2014 winner)

Page 31: Case Study of Convolutional Neural Network

GoogLeNet

Page 32: Case Study of Convolutional Neural Network

Inception module - Use 1x1, 3x3 and 5x5 convs simultaneously to capture a variety of structures

- Dense structure is captured by the 1x1 conv, more spread-out structure by the 3x3 and 5x5 convs

- Computationally expensive

• 1x1 conv layers are used first to reduce the channel dimension (more details later, in ResNet; see the sketch below)
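
A rough count of what the 1x1 reduction buys inside one inception branch (channel sizes are illustrative):

```python
C_in, C_out, C_red = 256, 128, 64          # illustrative channel counts

naive      = 5 * 5 * C_in * C_out                           # 5x5 conv directly on 256 channels
bottleneck = 1 * 1 * C_in * C_red + 5 * 5 * C_red * C_out   # 1x1 reduce, then 5x5
print(naive, bottleneck)                   # 819200 vs 221184 weights
```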

Page 33: Case Study of Convolutional Neural Network

Auxiliary Classifiers - Such a deep network raises concerns about how effectively gradients flow back through it

- The auxiliary losses are added to the total loss (weighted by 0.3) and removed at test time
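
A one-line sketch of how those losses combine during training (the loss values are hypothetical):

```python
main_loss, aux_loss_1, aux_loss_2 = 1.20, 1.50, 1.35        # hypothetical per-batch losses
total_loss = main_loss + 0.3 * (aux_loss_1 + aux_loss_2)    # auxiliary heads weighted by 0.3
print(total_loss)   # 2.055; at test time only the main classifier is kept
```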

Page 34: Case Study of Convolutional Neural Network

Average Pooling

- Proposed in Network in Network (also used in GoogLeNet)

- Problems of the fc layer

• Needs lots of parameters and overfits easily

- Replace the fc layer with average pooling

Page 35: Case Study of Convolutional Neural Network

Average Pooling - Make the number of channels in the last conv layer equal to the number of classes

- Average each channel and pass the result to softmax (see the sketch below)

- Reduces overfitting
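
A minimal numpy sketch of that global average pooling head (shapes are illustrative; 1000 classes assumed):

```python
import numpy as np

features = np.random.rand(7, 7, 1000)            # last conv output: one channel per class
logits = features.mean(axis=(0, 1))              # average each channel -> 1000 values
probs = np.exp(logits) / np.exp(logits).sum()    # softmax over the averaged channels
print(probs.shape)                               # (1000,)
```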

Page 36: Case Study of Convolutional Neural Network

MSRA ResNet (2015) (ILSVRC 2015 winner)

Page 37: Case Study of Convolutional Neural Network

Before ResNet…

- Things to know first:

• PReLU

• Xavier Initialization

• Batch Normalization

Page 38: Case Study of Convolutional Neural Network

PReLU - An adaptive version of ReLU

- The slope of the function for x < 0 is trained

- Slightly more parameters (# of layers x # of channels)
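
A minimal sketch of PReLU (the slope a for x < 0 is a learned parameter, typically one per channel):

```python
import numpy as np

def prelu(x, a):
    """PReLU: x when x > 0, a * x otherwise; `a` is learned during training."""
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(x, a=0.25))   # [-0.5  -0.125  0.  1.5]
```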

Page 39: Case Study of Convolutional Neural Network

Xavier Initialization - With a plain Gaussian init (small std), the outputs of neurons become nearly zero when the network is deep

- If the std is increased (e.g. 1.0), the outputs saturate to -1 or 1

- Xavier init chooses the initial scale from the number of input neurons

- Looks fine, but the derivation assumes a linear activation, so it cannot be used directly in ReLU-like networks (see the sketch below)
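
A minimal sketch of Xavier initialization and of the ReLU-corrected variant ("Xavier / 2", i.e. He initialization) that ResNet uses later; fan_in is the number of input neurons, and the layer sizes are illustrative:

```python
import numpy as np

fan_in, fan_out = 1024, 512
W_xavier = np.random.randn(fan_in, fan_out) * np.sqrt(1.0 / fan_in)   # assumes linear activations
W_he     = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)   # variance doubled for ReLU
```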

Page 40: Case Study of Convolutional Neural Network

[Figure: per-layer activation histograms. With a large std the outputs saturate at -1/1; with a small std the outputs vanish toward zero.]

Page 41: Case Study of Convolutional Neural Network

[Figure: per-layer activation histograms comparing Xavier Initialization with Xavier Initialization / 2.]

Page 42: Case Study of Convolutional Neural Network

Batch Normalization - Make each output roughly Gaussian, but full normalization (whitening) costs too much, so:

• Compute mean and variance per dimension (assuming the dimensions are uncorrelated)

• Compute mean and variance over the mini-batch (not the entire set)

- Plain normalization constrains the non-linearity and constrains the network through the uncorrelated-dimensions assumption, so:

• Linearly transform the output (the scale and shift factors are learned parameters)

Page 43: Case Study of Convolutional Neural Network

Batch Normalization - At test time, use mean and variance computed over the whole set (via moving averages)

- BN acts like a regularizer (Dropout is no longer needed)
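
A minimal numpy sketch of the training-time computation (per-dimension statistics over the mini-batch, then the learned linear transform gamma, beta; the test-time version would plug in the moving-average statistics instead):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x: [batch, dims]. Normalize each dimension over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)
    var  = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # per-dimension normalization
    return gamma * x_hat + beta               # learned linear transform

x = np.random.rand(32, 4)
print(batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4)).shape)   # (32, 4)
```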

Page 44: Case Study of Convolutional Neural Network

ResNet

Page 45: Case Study of Convolutional Neural Network

ResNet

Page 46: Case Study of Convolutional Neural Network

Problem of degradation - More depth gives more accuracy, but deep networks can suffer from vanishing/exploding gradients

• BN, Xavier init and Dropout can handle this (up to ~30 layers)

- Going even deeper, the degradation problem occurs

• It is not just overfitting: the training error also increases

Page 47: Case Study of Convolutional Neural Network

Deep Residual Learning

- Element-wise addition of F(x) and the shortcut connection, then a ReLU non-linearity

- When the dimensions of x and F(x) differ (the number of channels changes), x is linearly projected to match (done with a 1x1 conv); see the sketch below

- Similar in spirit to the shortcut paths in LSTM
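
A minimal numpy sketch of the residual connection (F is a stand-in for the stacked conv layers, and the projection is shown as a plain matrix multiply acting like a 1x1 conv on the channel dimension; all sizes are illustrative):

```python
import numpy as np

def residual_block(x, F, W_proj=None):
    """y = ReLU(F(x) + shortcut); project x when the dimensions differ."""
    shortcut = x if W_proj is None else x @ W_proj
    return np.maximum(F(x) + shortcut, 0)        # element-wise add, then ReLU

x = np.random.rand(64)
F = lambda v: np.random.rand(256, 64) @ v        # stand-in for the stacked conv layers
W_proj = np.random.rand(64, 256)                 # linear projection so dims match
print(residual_block(x, F, W_proj).shape)        # (256,)
```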

Page 48: Case Study of Convolutional Neural Network

Deeper Bottleneck

- To reduce training time, the block is modified into a bottleneck design (just for economical reasons); the weight counts are compared below

• (3x3x64)x64 + (3x3x64)x64 = 73,728 (left, plain block)

• (1x1x256)x64 + (3x3x64)x64 + (1x1x64)x256 = 69,632 (right, bottleneck)

• The right block is wider (more channels) but has a similar number of parameters

• A similar method is also used in GoogLeNet
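
The same count in code (channel sizes as in the comparison above; biases ignored):

```python
plain      = 2 * (3 * 3 * 64 * 64)                                    # two 3x3 convs on 64 channels
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256   # 1x1 down, 3x3, 1x1 up
print(plain, bottleneck)   # 73728 vs 69632 weights, but the bottleneck block sees 256 channels
```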

Page 49: Case Study of Convolutional Neural Network

ResNet

- Data augmentation as in AlexNet

- Batch Normalization (no Dropout)

- Xavier / 2 initialization

- Average pooling

- Structure follows VGGNet style

Page 50: Case Study of Convolutional Neural Network

Conclusion

Page 51: Case Study of Convolutional Neural Network

[Chart: ILSVRC top-5 error by model]

- AlexNet (2012): 15.31%

- VGGNet (2014): 7.32%

- Inception-V1 (2014): 6.66%

- Human: 5.1%

- PReLU-net (2015): 4.94%

- BN-Inception (2015): 4.82%

- ResNet-152 (2015): 3.57%

- Inception-ResNet (2016): 3.1%

Page 52: Case Study of Convolutional Neural Network

Conclusion

- Dropout, BN

- ReLU-like activations (e.g. PReLU, ELU, …)

- Xavier initialization

- Average pooling

- Use pre-trained model :)

Page 53: Case Study of Convolutional Neural Network

Reference

- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.

- Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

- Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).

- He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.

- He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).

- Szegedy, Christian, Sergey Ioffe, and Vincent Vanhoucke. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning." arXiv preprint arXiv:1602.07261 (2016).

- Gu, Jiuxiang, et al. "Recent Advances in Convolutional Neural Networks." arXiv preprint arXiv:1512.07108 (2015). (good as a tutorial)

- Thanks also to CS231n; some figures are taken from the CS231n lecture slides. See http://cs231n.stanford.edu/index.html