Convolutional neural networks for image classification — evidence from Kaggle National Data Science Bowl


Dmytro Mishkin, ducha.aiki at gmail com

March 25, 2015

Czech Technical University in Prague

kaggle national data science bowl overview

The image classification problem

• 130,400 test images
• 30,336 train images
• 1 channel (grayscale)
• 121 (biased) classes
• 90% of images ≤ 100x100 px

• logloss score:

$$\mathrm{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log p_{ij}$$

• No external data
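The logloss score can be sketched in a few lines of NumPy. This is a minimal illustration, not the official scoring code: the function name, the clipping constant `eps`, and the row renormalisation are assumptions made here to keep the logarithm finite.

```python
import numpy as np

def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class labels, shape (N,)
    probs:  predicted class probabilities, shape (N, M)
    """
    probs = np.clip(probs, eps, 1.0 - eps)            # avoid log(0); eps is an assumption
    probs = probs / probs.sum(axis=1, keepdims=True)  # renormalise rows after clipping
    n = y_true.shape[0]
    # y_ij is one-hot, so the double sum keeps only the true-class term per image
    return float(-np.log(probs[np.arange(n), y_true]).mean())

# Uniform guessing over M = 121 classes gives a score of log(121)
uniform = np.full((4, 121), 1.0 / 121)
labels = np.array([0, 5, 17, 120])
print(multiclass_logloss(labels, uniform))
```

Uniform guessing is a useful reference point: any model scoring above log(121) is worse than predicting nothing at all.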


classes diagram

[Figure: diagram of the classes] [1]

[1] http://npow.github.io/plankton/viewer/index.html

final leaderboard


Which approach to use?


lunch time chat at kth’s computer vision group

• A computer vision scientist: How long does it take to train these generic features on ImageNet?
• Hossein: 2 weeks
• Ali: almost 3 weeks depending on the hardware
• The computer vision scientist: hmmmm...
• Stefan: Well, you have to compare the three weeks to the last 40 years of computer vision [2]

[2] http://www.csc.kth.se/cvap/cvg/DL/ots/

convolutional networks

CNNs are state of the art in the following fields of image recognition [3]:

• Object Image Classification
• Scene Image Classification
• Action Image Classification
• Object Detection
• Semantic Segmentation
• Fine-grained Recognition
• Attribute Detection
• Metric Learning
• Instance Retrieval (almost)

[3] CNNs beat classic computer vision methods on 19 datasets out of 20: http://www.csc.kth.se/cvap/cvg/DL/ots/

contents

1. Basics of convolutional networks
2. Image preprocessing
3. Network architectures
4. Ensembling
5. What seems to work and what does not
6. Winner's solution highlights

basics of convolutional networks

what is convolution

[Figure: illustration of the convolution operation] [4]

[4] https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
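As a rough illustration of what a convolution computes, here is a naive NumPy sketch. The function name and the toy box-blur kernel are chosen for this example only. Note that most CNN libraries actually compute cross-correlation (no kernel flip); with learned kernels the distinction does not matter.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Naive 2-D convolution over positions where the kernel fits entirely."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel in both axes
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # weighted sum of the image patch under the kernel
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
box_blur = np.full((3, 3), 1.0 / 9.0)  # each output pixel is the mean of a 3x3 patch
result = convolve2d_valid(image, box_blur)
print(result.shape)
```

A 3x3 kernel on a 5x5 image without padding yields a 3x3 output, which is why unpadded convolution layers shrink the feature map.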

softmax classifier

Softmax (cross-entropy) loss:

$$L = -\log \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$$

SVM (hinge) loss:

$$L = \sum_{j \neq y_i} \max\left(0,\; f(x_i, W)_j - f(x_i, W)_{y_i} + \Delta\right)$$

[5] http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/
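Both losses can be computed for a single example as below. This is a sketch under assumptions: `scores` stands for the vector of class scores f(x_i, W), `y` for the true class index y_i, and the default margin `delta=1.0` is a common choice rather than something stated on the slide.

```python
import numpy as np

def softmax_loss(scores, y):
    """Cross-entropy loss: -log of the softmax probability of the true class."""
    shifted = scores - scores.max()  # subtracting the max does not change the result
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[y])

def hinge_loss(scores, y, delta=1.0):
    """Multiclass SVM loss: sum of margin violations over the wrong classes."""
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0  # the sum runs over j != y only
    return float(margins.sum())

scores = np.array([3.2, 5.1, -1.7])  # toy class scores, true class is 0
print(softmax_loss(scores, 0), hinge_loss(scores, 0))
```

The hinge loss becomes exactly zero once every wrong class is at least `delta` below the true class, while the softmax loss keeps pushing the true-class probability towards 1.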

lenet-5. no other layers are necessary

[Figure: LeNet-5 architecture] [6]

The idea was first proposed by LeCun [7] in 1989 and recently revived by Springenberg et al. in "All Convolutional Net" [8].

[6] http://eblearn.sourceforge.net/beginner_tutorial2_train.html
[7] https://www.facebook.com/yann.lecun/posts/10152766574417143
[8] J. T. Springenberg et al. "Striving for Simplicity: The All Convolutional Net". In: ArXiv e-prints (2014). arXiv: 1412.6806 [cs.LG]. http://arxiv.org/abs/1412.6806

non-linearities

[Figure: activation functions TanH, Sigmoid, ReLU, maxout (sort of) and LeakyReLU; x-axis: Input (−3 to 3), y-axis: Activation (−3 to 4)]
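The plotted non-linearities can be written down directly. This is a sketch: the LeakyReLU slope of 0.01 and the scalar-input form of maxout (maximum over a few linear pieces) are assumptions for illustration, since the slide does not give parameters.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # small non-zero slope for negative inputs (the slope value is an assumption)
    return np.where(x > 0, x, slope * x)

def maxout(x, weights, biases):
    # maximum over k linear pieces w_k * x + b_k, evaluated per input value
    return np.max(np.outer(x, weights) + biases, axis=1)

x = np.linspace(-3, 3, 7)
print(relu(x))
print(leaky_relu(x))
print(maxout(x, np.array([1.0, -1.0]), np.array([0.0, 0.0])))  # two pieces give |x|
```

With pieces (1, 0) and (0, 0) maxout reproduces ReLU, which is why it can be seen as a learned generalisation of the other piecewise-linear activations.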

regularization - dropout, weight decay

[Figure from Srivastava et al.] [9]

[9] Nitish Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". In: Journal of Machine Learning Research 15 (2014), pp. 1929–1958. url: http://jmlr.org/papers/v15/srivastava14a.html
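Both regularizers fit in a few lines. The sketch below uses "inverted" dropout, which rescales at training time; this is a common variant and an assumption here (the original paper instead scales weights at test time). Function names and hyperparameter values are illustrative only.

```python
import numpy as np

def dropout_forward(activations, drop_prob, rng, train=True):
    """Inverted dropout: zero units at random and rescale the survivors."""
    if not train or drop_prob == 0.0:
        return activations  # no-op at test time
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob  # keeps the expected activation unchanged

def sgd_step_with_weight_decay(weights, grad, lr, weight_decay):
    """Plain SGD step with an L2 penalty: the decay term shrinks weights towards zero."""
    return weights - lr * (grad + weight_decay * weights)

rng = np.random.default_rng(0)
hidden = np.ones(1000)
dropped = dropout_forward(hidden, drop_prob=0.5, rng=rng)
print(dropped.mean())  # close to 1.0 on average
```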

deep learning libraries

Table 1: Popular deep learning GPU libraries

Name           URL                                Languages       Notes
caffe          github.com/BVLC/caffe              C++/Python/no   largest community
cxxnet         github.com/dmlc/cxxnet             C++/no          good memory management
Theano         github.com/Theano/Theano           Python          huge flexibility
Torch          facebook/fbcunn                    Lua             LeCun Facebook library
cuda-convnet2  code.google.com/p/cuda-convnet2/   C++/Python
SparseConvNet  http://tinyurl.com/pu65cfp         C++/CUDA        differs from others


image preprocessing

basic network architecture

72x72x1 → Crop to 64x64 → 20C5 → MP2 → 50C5 → MP2 → 500IP → clf
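To see how the feature map shrinks through this network, the spatial sizes can be traced as follows. This is a sketch under assumptions the slide does not state: convolutions use stride 1 and no padding, and MP2 is non-overlapping 2x2 max pooling.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output side length of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window):
    """Output side length of non-overlapping pooling."""
    return (size - window) // window + 1

size, channels = 64, 1                   # after cropping 72x72 to 64x64
size, channels = conv_out(size, 5), 20   # 20C5 -> 60x60x20
size = pool_out(size, 2)                 # MP2  -> 30x30x20
size, channels = conv_out(size, 5), 50   # 50C5 -> 26x26x50
size = pool_out(size, 2)                 # MP2  -> 13x13x50
flattened = size * size * channels       # number of inputs to the 500IP layer
print(size, channels, flattened)
```

Under these assumptions the 500-unit inner-product layer sees 8450 inputs, so it holds most of the network's parameters.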


basic data preprocessing

Table 2: 5-layer network experiments, 48x48 input image, no non-linearities, mean pixel subtraction

Name, augmentation                   Val logloss   Train logloss
No mean subtraction, no scaling      –             –
mirror                               1.67          0.64
histeq, mirror                       1.74          0.64
mirror + ReLU                        1.61          0.44
mirror + scale                       1.42          0.937
mirror + scale LeakyReLU             1.34          0.83
mirror + rand rot                    1.53          1.31

basic data preprocessing

Table 3: 5-layer network experiments, 48x48 input image, LeakyReLU non-linearities, mean pixel subtraction

Name, augmentation                           Val logloss   Train logloss
mirror + scale                               1.34          0.83
invert, mirror + scale                       1.27          0.80
invert, norm, mirror + scale                 1.24          0.505
invert, norm, mirror + scale, salt-pepper    1.15          n/a
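The preprocessing steps named in the tables (invert, norm, mirror, salt-pepper) might look like this in NumPy. All function names and parameters are hypothetical, and "norm" is interpreted here as min-max scaling to [0, 1], which is a guess; the slides do not define it.

```python
import numpy as np

def invert(img):
    """Invert an 8-bit grayscale image (dark object on light background and vice versa)."""
    return 255 - img

def normalize(img):
    """Min-max scale to [0, 1]; one possible reading of 'norm'."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def mirror(img, rng):
    """Random horizontal flip with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def salt_pepper(img, amount, rng):
    """Set a random fraction `amount` of pixels to 0 (pepper) or 1 (salt)."""
    noisy = img.copy()
    r = rng.random(img.shape)
    noisy[r < amount / 2] = 0.0
    noisy[r > 1 - amount / 2] = 1.0
    return noisy

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(48, 48), dtype=np.uint8)  # stand-in for a plankton image
augmented = salt_pepper(mirror(normalize(invert(raw)), rng), amount=0.05, rng=rng)
print(augmented.shape, augmented.dtype)
```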

more geometric transformations

Table 4: 5-layer network experiments, 64x64 input image, LeakyReLU

Name, augmentation                         Val logloss
mirror                                     1.30
mirror + scale (resize modes)              1.12
h+v mirror, scale                          1.10
h+v mirror, scale + rot                    1.08
mirror, lower base lr                      1.04 :)
h+v mirror, scale + rot, depolar imgs      1.28

regularization methods

Table 5: 5-layer network experiments, 64x64 input image, LeakyReLU

Name, augmentation                                              Val logloss
h+v mirror, scale + rot, vanilla                                1.08
h+v mirror, scale + rot, PReLU (but slows down a lot) [10]      1.03
h+v mirror, scale + rot, BatchNorm [11]                         1.10
h+v mirror, scale + rot, StochPool [12]                         0.98

[10] K. He et al. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". In: ArXiv e-prints (2015). arXiv: 1502.01852 [cs.CV].
[11] S. Ioffe and C. Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". In: ArXiv e-prints (2015). arXiv: 1502.03167 [cs.LG].
[12] M. D. Zeiler and R. Fergus. "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks". In: ArXiv e-prints (2013). arXiv: 1301.3557 [cs.LG].





data augmentation - don't forget about it at test time

for each rotation in 0, 90, 180, 270 degrees:
    for each of 9 crops (N, NE, E, ...):
        get predictions for the mirrored and non-mirrored versions
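The test-time augmentation loop could be sketched as follows. `predict_fn` is a hypothetical callable mapping one crop to a vector of class probabilities, the nine crops are assumed to be the centre plus the eight compass positions, and plain averaging of the 4 x 9 x 2 = 72 predictions is an assumption about how they are combined.

```python
import numpy as np

def nine_crops(img, crop):
    """Corner, edge and centre crops (NW, N, NE, W, C, E, SW, S, SE)."""
    h, w = img.shape
    rows = [0, (h - crop) // 2, h - crop]
    cols = [0, (w - crop) // 2, w - crop]
    return [img[r:r + crop, c:c + crop] for r in rows for c in cols]

def predict_tta(img, predict_fn, crop):
    """Average predictions over rotations, crops and mirroring."""
    preds = []
    for k in range(4):                        # 0, 90, 180, 270 degrees
        rotated = np.rot90(img, k)
        for patch in nine_crops(rotated, crop):
            preds.append(predict_fn(patch))            # non-mirrored
            preds.append(predict_fn(patch[:, ::-1]))   # mirrored
    return np.mean(preds, axis=0)

# Usage with a dummy model that always predicts the uniform distribution
dummy_model = lambda patch: np.full(121, 1.0 / 121)
probs = predict_tta(np.zeros((72, 72)), dummy_model, crop=64)
print(probs.shape)
```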

network architectures

cifar/lenet for testing

Pros

• + Training time < 20 min
• + Can be done in parallel
• + Therefore lots of experiments

Cons

• - Not complex enough to test some things (e.g. BatchNorm)
• - This may lead to wrong conclusions about "bad" things (e.g. random rotations hurt CifarNets but help VGGNets)
• - Or about "good" things (e.g. stochastic pooling helps CifarNets but does nothing for VGGNets)

We need to go deeper


googlenet

[Figure: GoogLeNet architecture] [13]

[13] C. Szegedy et al. "Going Deeper with Convolutions". In: ArXiv e-prints (2014). arXiv: 1409.4842 [cs.CV].


googlenet

22 layers, but a simple building block: "Inception"

internal ensemble

Take the mean of all auxiliary classifiers instead of just throwing them away [14]

Table 6: GoogLeNet, validation loss

Name             Public LB
clf on inc3      0.722
clf on inc4a     0.754
clf on inc4b     0.757
clf on inc5b     0.855
average          0.693

Table 7: VGGNet, validation loss

Name             Public LB
clf on pool4     0.762
clf on pool5     0.657
clf on fc7       0.707
average          0.630

[14] J. Xie, B. Xu, and Z. Chuang. "Horizontal and Vertical Ensemble with Deep Representation for Classification". In: ArXiv e-prints (2013). arXiv: 1306.2759 [cs.LG].



googlenet-results

Table 8: GoogLeNet, 64x64 input image, Leaky ReLU (unless stated otherwise), AlexNet-oversample

Name                                                                    Public LB
No inv, scale, ReLU, last-clf                                           0.910
No inv, scale, ReLU                                                     0.859
No inv, scale                                                           0.816
No inv, scale, maxout-clf                                               0.785
Inv, scale, maxout-clf, retrain                                         0.703
96x96, inv, scale, maxout-clf, retrained, no-aug-ft [15]                0.684
112x112, inv, scale, maxout-clf, retrained, no-aug-ft                   0.716
48x48, inv, scale, maxout-clf, retrained, no-aug-ft + test rot          0.749
96x96, inv, scale, maxout-clf, retrained, no-aug-ft + test rot          0.679
48x48+96x96+112x112, inv, scale, maxout-clf, retrained, no-aug-ft       0.677

[15] Ben Graham's trick: fine-tune a converged model for 1-5 epochs without data augmentation and with a small learning rate. http://blog.kaggle.com/2015/01/02/cifar-10-competition-winners-interviews-with-dr-ben-graham-phil-culliton-zygmunt-zajac/

vggnet

[Figure: VGGNet architectures] [16]

Differences from the original: dropout in conv layers (0.3), SPP pooling for pool5, LeakyReLU, auxiliary classifiers.

[16] K. Simonyan and A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: ArXiv e-prints (Sept. 2014). arXiv: 1409.1556 [cs.CV].


spatial pyramid pooling

[Figure: spatial pyramid pooling] [17]

[17] K. He et al. "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition". In: ArXiv e-prints (2014). arXiv: 1406.4729 [cs.CV].
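The key property of spatial pyramid pooling is that it produces a fixed-length vector regardless of the input size. A minimal NumPy sketch, assuming max pooling and pyramid levels of 1x1, 2x2 and 4x4 bins (the levels actually used in these networks are not stated on the slides):

```python
import math
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, height, width) map into n x n bins per level and concatenate."""
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # bin edges use floor/ceil so bins cover the map and are never empty
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small = spatial_pyramid_pool(rng.random((8, 7, 9)))
large = spatial_pyramid_pool(rng.random((8, 13, 13)))
print(small.shape, large.shape)  # same length for both input sizes
```

Because the output length depends only on the number of channels and bins, the fully connected layers after pool5 can accept inputs of varying resolution.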


vggnet-results

Table 9: VGGNet, 64x64 input image, Leaky ReLU (unless stated otherwise), AlexNet-oversample, no SPP

Name                                   Public LB
No inv, scale, ReLU, fc-maxout         0.752
Inv, scale, single random crop         0.773
Inv, scale, 50 random crops            0.751
Inv, scale                             0.729
Inv, scale, retrained                  0.720
Inv, scale, fc-maxout                  0.662
Inv, scale, fc-maxout, SPP             0.654
All VGGNets mix                        0.650

sparseconvnet

• 0.79 LB score
• Unusual library
• C2 instead of C3 convolution
• Padding only for the input image
• Kaggle CIFAR-10 winning architecture

320C2 - 320C2 - MP2 -
640C2 - 10% dropout - 640C2 - 10% dropout - MP2 -
960C2 - 20% dropout - 960C2 - 20% dropout - MP2 -
1280C2 - 30% dropout - 1280C2 - 30% dropout - MP2 -
1600C2 - 40% dropout - 1600C2 - 40% dropout - MP2 -
1920C2 - 50% dropout - 1920C1 - 50% dropout - 121C1 - Softmax output

ensemble-results

Table 10: Different mixes of all models (3 GoogLeNets, 4 VGGNets, 1 SparseConvNet)

Name                                        Public LB   Private LB
4 VGG                                       0.650       0.651
3 VGG, 1 GLN                                0.625       0.629
4 VGG, 3 GLN                                0.617       0.618
4 VGG, 3 GLN, 1 Sparse                      0.611       0.616
4 VGG, 3 GLN, 1 Sparse, figure-skating      0.609       0.613

misc

batchnorm

Works for CIFAR

But it made no big difference for VGGNet in the Kaggle National Data Science Bowl for me. However, it works for other people, e.g. Jae Hyun Lim [18], 22nd place.

[18] https://github.com/lim0606/ndsb

what else seems to work here

• Retrain top layers with a different non-linearity (cheap diversity)
• Figure-skating average: throw away the max and min predictions (0.003 LB score gain)
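The figure-skating average is essentially a trimmed mean across models. The sketch below assumes the trimming is applied element-wise (per image and per class) and that the result is renormalised so each row sums to 1; both details are assumptions, as the slide only says to drop the max and min predictions.

```python
import numpy as np

def figure_skating_mean(model_probs):
    """Average K models' predictions after dropping the highest and lowest value.

    model_probs: array of shape (K, N, M) with K >= 3 models,
                 N images and M class probabilities.
    """
    preds = np.asarray(model_probs, dtype=float)
    if preds.shape[0] < 3:
        raise ValueError("need at least 3 models to drop the max and the min")
    trimmed_sum = preds.sum(axis=0) - preds.max(axis=0) - preds.min(axis=0)
    mean = trimmed_sum / (preds.shape[0] - 2)
    return mean / mean.sum(axis=-1, keepdims=True)  # renormalise to valid probabilities

# Four toy models, one image, two classes
toy = np.array([[[0.9, 0.1]], [[0.6, 0.4]], [[0.5, 0.5]], [[0.1, 0.9]]])
print(figure_skating_mean(toy))  # the outlying models 1 and 4 are ignored
```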

what does not seem to work here

• Dense SIFT + BOW / Fisher Vector: ~60% accuracy
• Random forest on CNN features: ~65% accuracy
• Mix of hinge and cross-entropy losses
• Averaging with a mean other than the arithmetic one
• Image enhancement or preprocessing (histogram equalization, etc.)

winner's solution highlights

team work

• Roll-pool
• Hand-engineered features
• RMS-Pool
• Knowledge distillation

[19] http://benanne.github.io/2015/03/17/plankton.html

Questions?


thanks

This nice presentation theme is taken from github.com/matze/mtheme

The theme itself is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).
