Convolutional neural networks for image classification: evidence from Kaggle National Data Science Bowl. Dmytro Mishkin, ducha.aiki at gmail com. March 25, 2015. Czech Technical University in Prague.

Convolutional neural networks for image classification — evidence from Kaggle National Data Science Bowl

Jul 15, 2015

kaggle national data science bowl overview

The image classification problem

- 130,400 test images
- 30,336 train images
- 1 channel (grayscale)
- 121 (biased) classes
- 90% of images ≤ 100x100 px
- Score: $\text{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log p_{ij}$
- No external data
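
The logloss score above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official scoring code: the function name `multiclass_logloss`, the clipping constant `eps`, and the row renormalization are assumptions made here so that the logarithm stays finite.

```python
import numpy as np

def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class labels, shape (N,)
    probs:  predicted class probabilities, shape (N, M)
    eps:    clipping constant (an assumption, keeps log() finite)
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    # Renormalize rows so they still sum to 1 after clipping.
    probs = probs / probs.sum(axis=1, keepdims=True)
    n = probs.shape[0]
    # y_ij is 1 only for the true class, so the inner sum over j
    # reduces to picking p_ij at the true label of each image.
    return -np.mean(np.log(probs[np.arange(n), y_true]))

# A uniform guess over 121 classes scores log(121).
uniform = np.full((4, 121), 1.0 / 121)
labels = np.array([0, 5, 17, 120])
print(multiclass_logloss(labels, uniform))
```

A uniform prediction is a useful baseline: any model scoring above log(121) is worse than guessing.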

classes diagram

[Figure: diagram of the classes [1]]

[1] http://npow.github.io/plankton/viewer/index.html

final leaderboard

Which approach to use?

lunch time chat at kth’s computer vision group

- A computer vision scientist: How long does it take to train these generic features on ImageNet?
- Hossein: 2 weeks
- Ali: almost 3 weeks depending on the hardware
- The computer vision scientist: hmmmm...
- Stefan: Well, you have to compare the three weeks to the last 40 years of computer vision [2]

[2] http://www.csc.kth.se/cvap/cvg/DL/ots/

convolutional networks

CNNs are state of the art in the following fields of image recognition [3]:

- Object image classification
- Scene image classification
- Action image classification
- Object detection
- Semantic segmentation
- Fine-grained recognition
- Attribute detection
- Metric learning
- Instance retrieval (almost)

[3] CNNs beat classic computer vision methods on 19 datasets out of 20: http://www.csc.kth.se/cvap/cvg/DL/ots/

contents

1. Basics of convolutional networks
2. Image preprocessing
3. Network architectures
4. Ensembling
5. What seems to work and what does not
6. Winner's solution highlights

basics of convolutional networks

softmax classifier

Softmax (cross-entropy) loss:

$L = -\log \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$

SVM (hinge) loss:

$L = \sum_{j \neq y_i} \max(0, f(x_i, W)_j - f(x_i, W)_{y_i} + \Delta)$ [5]

[5] http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/
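
Both losses can be evaluated for one example in NumPy. This is a minimal sketch: `scores` stands for the class scores f(x_i, W), `y` for the true label y_i, and the default `delta=1.0` is an assumed margin, not a value given in the slides.

```python
import numpy as np

def softmax_loss(scores, y):
    """Cross-entropy loss of one example given raw class scores."""
    # Subtracting the max score avoids overflow and leaves the loss unchanged.
    shifted = scores - np.max(scores)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y]

def hinge_loss(scores, y, delta=1.0):
    """Multiclass SVM loss of one example; delta is the margin."""
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0  # the sum runs over j != y only
    return np.sum(margins)

scores = np.array([3.2, 5.1, -1.7])  # illustrative scores for 3 classes
print(softmax_loss(scores, 0), hinge_loss(scores, 0))
```

The hinge loss is exactly zero once the true class wins by the margin, while the softmax loss keeps pushing the true-class probability towards 1.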

lenet-5. no other layers are necessary

[Figure: LeNet-5 architecture [6]]

The idea was first proposed by LeCun [7] in 1989 and recently revived by Springenberg et al. in "All Convolutional Net" [8].

[6] http://eblearn.sourceforge.net/beginner_tutorial2_train.html
[7] https://www.facebook.com/yann.lecun/posts/10152766574417143
[8] J. T. Springenberg et al. "Striving for Simplicity: The All Convolutional Net". In: ArXiv e-prints (2014). arXiv: 1412.6806 [cs.LG].

non-linearities

[Figure: activation functions plotted for inputs from -3 to 3 (x axis: Input, y axis: Activation). Curves: TanH, Sigmoid, ReLU, maxout (sort of), LeakyReLU]
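
For reference, the plotted non-linearities can be written down directly. This is a sketch under assumptions: the LeakyReLU slope of 0.01 and the two linear pieces used for maxout are illustrative choices, since the slides do not state the values used.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # A small negative slope keeps a non-zero gradient for x < 0.
    return np.where(x > 0, x, slope * x)

def maxout(x, weights, biases):
    # Maximum over k linear pieces w_k * x + b_k, applied elementwise to x.
    return np.max(np.outer(x, weights) + biases, axis=-1)

x = np.linspace(-3.0, 3.0, 7)
print(relu(x))
print(leaky_relu(x))
print(maxout(x, np.array([1.0, -0.5]), np.zeros(2)))
```

ReLU and LeakyReLU are both special cases of a two-piece maxout over a scalar input, which is presumably why the plot labels its maxout curve "sort of".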

regularization - dropout, weight decay

[Figure: dropout illustration [9]]

[9] Nitish Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". In: Journal of Machine Learning Research 15 (2014), pp. 1929-1958. URL: http://jmlr.org/papers/v15/srivastava14a.html
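
Both regularizers fit in a few lines. This sketch uses the common "inverted" dropout variant, which rescales at training time (the cited paper instead rescales weights at test time), and it writes weight decay as an L2 term added to the gradient in a plain SGD step. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def dropout_forward(x, drop_prob, rng, train=True):
    """Inverted dropout: zero units at random and rescale the survivors."""
    if not train or drop_prob == 0.0:
        return x  # at test time the layer is the identity
    keep_prob = 1.0 - drop_prob
    mask = rng.random(x.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return x * mask / keep_prob

def sgd_step_with_weight_decay(w, grad, lr, weight_decay):
    """SGD update where the L2 penalty adds weight_decay * w to the gradient."""
    return w - lr * (grad + weight_decay * w)

rng = np.random.default_rng(0)
activations = np.ones((2, 8))
print(dropout_forward(activations, 0.5, rng))
print(sgd_step_with_weight_decay(np.array([1.0, -2.0]), np.zeros(2), 0.1, 0.5))
```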

deep learning libraries

Table 1: Popular deep learning GPU libraries

| Name          | URL                              | Languages     | Notes                  |
|---------------|----------------------------------|---------------|------------------------|
| caffe         | github.com/BVLC/caffe            | C++/Python/no | largest community      |
| cxxnet        | github.com/dmlc/cxxnet           | C++/no        | good memory management |
| Theano        | github.com/Theano/Theano         | Python        | huge flexibility       |
| Torch         | facebook/fbcunn                  | Lua           | LeCun Facebook library |
| cuda-convnet2 | code.google.com/p/cuda-convnet2/ | C++/Python    |                        |
| SparseConvNet | http://tinyurl.com/pu65cfp       | C++/CUDA      | differs from others    |

image preprocessing

basic network architecture

72x72x1 → Crop to 64x64 → 20C5 → MP2 → 50C5 → MP2 → 500IP → clf
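
The feature map sizes in this pipeline can be traced with a little arithmetic. The sketch below assumes that 20C5 means 20 filters of size 5x5 with stride 1 and no padding, that MP2 is 2x2 max pooling with stride 2, and that 500IP is a 500-unit fully connected layer. These readings of the shorthand are assumptions, not statements from the slides.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width of a convolution or pooling window along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

def trace_shapes(input_size=64):
    """Return (layer, channels, height, width) after each conv/pool layer."""
    layers = [
        ("20C5", "conv", 20, 5),
        ("MP2", "pool", None, 2),
        ("50C5", "conv", 50, 5),
        ("MP2", "pool", None, 2),
    ]
    size, channels, shapes = input_size, 1, []
    for name, kind, n_out, k in layers:
        if kind == "conv":
            size = conv_out(size, k)            # stride 1, no padding
            channels = n_out
        else:
            size = conv_out(size, k, stride=k)  # non-overlapping pooling
        shapes.append((name, channels, size, size))
    return shapes

for shape in trace_shapes():
    print(shape)
```

Under these assumptions the 500IP layer sees 50 x 13 x 13 = 8450 inputs, so most of the parameters of this small network sit in the first fully connected layer.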

basic data preprocessing

Table 2: 5-layer network experiments, 48x48 input image, no non-linearities, mean pixel extraction

| Name, augmentation             | Val logloss | Train logloss |
|--------------------------------|-------------|---------------|
| No mean extraction, no scaling | -           | -             |
| mirror                         | 1.67        | 0.64          |
| histeq, mirror                 | 1.74        | 0.64          |
| mirror + ReLU                  | 1.61        | 0.44          |
| mirror + scale                 | 1.42        | 0.937         |
| mirror + scale LeakyReLU       | 1.34        | 0.83          |
| mirror + rand rot              | 1.53        | 1.31          |

basic data preprocessing

Table 3: 5-layer network experiments, 48x48 input image, LeakyReLU non-linearities, mean pixel extraction

| Name, augmentation                        | Val logloss | Train logloss |
|-------------------------------------------|-------------|---------------|
| mirror + scale                            | 1.34        | 0.83          |
| invert, mirror + scale                    | 1.27        | 0.80          |
| invert, norm, mirror + scale              | 1.24        | 0.505         |
| invert, norm, mirror + scale, salt-pepper | 1.15        | n/a           |
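
The preprocessing steps named in the table can be sketched as follows. The slides do not define them, so everything here is an assumption: "invert" is taken as 255 minus the 8-bit pixel value, "norm" as per-image min-max scaling to [0, 1], and "salt-pepper" as setting a random fraction of pixels to 0 or 1.

```python
import numpy as np

def invert(img):
    """Invert an 8-bit grayscale image (assumed meaning of 'invert')."""
    return 255 - img

def normalize(img):
    """Per-image min-max scaling to [0, 1] (assumed meaning of 'norm')."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)  # constant image, nothing to scale
    return (img - lo) / (hi - lo)

def salt_pepper(img, amount, rng):
    """Set about amount/2 of the pixels to 0 and amount/2 to 1."""
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < amount / 2] = 0.0
    noisy[u > 1 - amount / 2] = 1.0
    return noisy

rng = np.random.default_rng(0)
img = np.array([[0, 255], [100, 200]], dtype=np.uint8)
print(salt_pepper(normalize(invert(img)), 0.1, rng))
```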

more geometric transformations

Table 4: 5-layer network experiments, 64x64 input image, LeakyReLU

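| Name, augmentation                      | Val logloss |
|-----------------------------------------|-------------|
| mirror                                  | 1.30        |
| mirror + scale (resize modes)           | 1.12        |
| h+v mirror, scale                       | 1.10        |
| h+v mirror, scale + rot                 | 1.08        |
| mirror, lower base learning rate        | 1.04 :)     |
| h+v mirror, scale + rot, depolar imgs   | 1.28        |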

regularization methods

Table 5: 5-layer network experiments, 64x64 input image, LeakyReLU

| Name, augmentation                                           | Val logloss |
|--------------------------------------------------------------|-------------|
| h+v mirror, scale + rot, vanilla                             | 1.08        |
| h+v mirror, scale + rot, PReLU [10] (but slows down a lot)   | 1.03        |
| h+v mirror, scale + rot, BatchNorm [11]                      | 1.10        |
| h+v mirror, scale + rot, StochPool [12]                      | 0.98        |

[10] K. He et al. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". In: ArXiv e-prints (2015). arXiv: 1502.01852 [cs.CV].
[11] S. Ioffe and C. Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". In: ArXiv e-prints (2015). arXiv: 1502.03167 [cs.LG].
[12] M. D. Zeiler and R. Fergus. "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks". In: ArXiv e-prints (2013). arXiv: 1301.3557 [cs.LG].

data augmentation - don't forget about it during test time

for rotation in 0, 90, 180, 270 degrees:
    for each of 9 crops (N, NE, E, ...):
        get predictions for the mirrored and non-mirrored crop
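
The loop above could look like this in NumPy. This is a sketch under assumptions: `predict_fn` is a hypothetical callable that maps one crop to a vector of class probabilities, the 9 crops are taken on a 3x3 grid of corner, edge and centre positions, and the 72 predictions are combined with an arithmetic mean.

```python
import numpy as np

def nine_crops(img, crop):
    """Corner, edge and centre crops on a 3x3 grid of positions."""
    h, w = img.shape
    ys = [0, (h - crop) // 2, h - crop]
    xs = [0, (w - crop) // 2, w - crop]
    return [img[y:y + crop, x:x + crop] for y in ys for x in xs]

def predict_tta(img, predict_fn, crop):
    """Average predictions over 4 rotations x 9 crops x 2 mirror states."""
    preds = []
    for k in range(4):  # 0, 90, 180, 270 degrees
        rotated = np.rot90(img, k)
        for patch in nine_crops(rotated, crop):
            preds.append(predict_fn(patch))
            preds.append(predict_fn(np.fliplr(patch)))
    return np.mean(preds, axis=0)

# Usage with a stand-in model that always returns a uniform distribution.
dummy_model = lambda patch: np.full(121, 1.0 / 121)
print(predict_tta(np.zeros((72, 72)), dummy_model, crop=64).shape)
```

Each test image costs 72 forward passes this way, so the trade-off is test time against a smoother prediction.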

network architectures

cifar/lenet for testing

Pros

- Training time < 20 min
- Can be done in parallel
- Therefore, lots of experiments

Cons

- Not complex enough to test some things (e.g. BatchNorm)
- This can lead to wrong conclusions about "bad" things (e.g. random rotations hurt CifarNets but help VGGNets)
- Or about "good" things (e.g. stochastic pooling helps CifarNets but does nothing for VGGNets)

We need to go deeper

googlenet

[Figure: GoogLeNet architecture [13]]

[13] C. Szegedy et al. "Going Deeper with Convolutions". In: ArXiv e-prints (2014). arXiv: 1409.4842 [cs.CV].

googlenet

22 layers, but built from a simple base brick: "Inception"

internal ensemble

Take the mean of all auxiliary classifiers instead of just throwing them away [14].

Table 6: GoogLeNet, validation loss

| Name          | Public LB |
|---------------|-----------|
| clf on inc3   | 0.722     |
| clf on inc4a  | 0.754     |
| clf on inc4b  | 0.757     |
| clf on inc5b  | 0.855     |
| average       | 0.693     |

Table 7: VGGNet, validation loss

| Name          | Public LB |
|---------------|-----------|
| clf on pool4  | 0.762     |
| clf on pool5  | 0.657     |
| clf on fc7    | 0.707     |
| average       | 0.630     |

[14] J. Xie, B. Xu, and Z. Chuang. "Horizontal and Vertical Ensemble with Deep Representation for Classification". In: ArXiv e-prints (2013). arXiv: 1306.2759 [cs.LG].
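
Averaging the classifier heads of one network is a one-liner once each head outputs class probabilities. A minimal sketch, assuming every head (for example inc3, inc4a, inc4b, inc5b) yields an array of shape (images, classes) and that the heads get equal weight. The function name and the toy numbers are illustrative.

```python
import numpy as np

def internal_ensemble(head_probs):
    """Arithmetic mean of the class probabilities of all classifier heads."""
    stacked = np.stack(head_probs, axis=0)  # (n_heads, n_images, n_classes)
    return stacked.mean(axis=0)

# Two toy heads, one image, three classes.
head_a = np.array([[0.7, 0.2, 0.1]])
head_b = np.array([[0.5, 0.4, 0.1]])
print(internal_ensemble([head_a, head_b]))
```

Since each head's rows sum to 1, the averaged rows do too, so no renormalization is needed before computing logloss.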

googlenet-results

Table 8: GoogLeNet, 64x64 input image, LeakyReLU (unless stated otherwise), AlexNet-oversample

| Name                                                                 | Public LB |
|----------------------------------------------------------------------|-----------|
| No inv, scale, ReLU, last-clf                                        | 0.910     |
| No inv, scale, ReLU                                                  | 0.859     |
| No inv, scale                                                        | 0.816     |
| No inv, scale, maxout-clf                                            | 0.785     |
| Inv, scale, maxout-clf, retrain                                      | 0.703     |
| 96x96, inv, scale, maxout-clf, retrained, no-aug-ft [15]             | 0.684     |
| 112x112, inv, scale, maxout-clf, retrained, no-aug-ft                | 0.716     |
| 48x48, inv, scale, maxout-clf, retrained, no-aug-ft + test rot       | 0.749     |
| 96x96, inv, scale, maxout-clf, retrained, no-aug-ft + test rot       | 0.679     |
| 48x48+96x96+112x112, inv, scale, maxout-clf, retrained, no-aug-ft    | 0.677     |

[15] Ben Graham's trick: fine-tune a converged model for 1-5 epochs without data augmentation and with a small learning rate. http://blog.kaggle.com/2015/01/02/cifar-10-competition-winners-interviews-with-dr-ben-graham-phil-culliton-zygmunt-zajac/

vggnet

[Figure: VGGNet architectures [16]]

Differences: dropout in conv layers (0.3), SPP pooling for pool5, LeakyReLU, aux. clf.

[16] K. Simonyan and A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: ArXiv e-prints (Sept. 2014). arXiv: 1409.1556 [cs.CV].

spatial pyramid pooling

[Figure: spatial pyramid pooling layer [17]]

[17] K. He et al. "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition". In: ArXiv e-prints (2014). arXiv: 1406.4729 [cs.CV].
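
The idea of spatial pyramid pooling is to max-pool a feature map into a fixed number of bins at several grid resolutions, so the output length does not depend on the input size. A minimal NumPy sketch: the pyramid levels (1, 2, 4) and the way bin edges are rounded are assumptions, not the configuration used in the talk.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (channels, H, W) map into n x n bins per level and concatenate."""
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        # Bin edges that cover the whole map; bins may overlap by a pixel.
        ys = np.linspace(0, h, n + 1)
        xs = np.linspace(0, w, n + 1)
        for i in range(n):
            y0, y1 = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            for j in range(n):
                x0, x1 = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
                pooled.append(fmap[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
print(spatial_pyramid_pool(rng.random((8, 13, 13))).shape)
print(spatial_pyramid_pool(rng.random((8, 7, 9))).shape)
```

With levels (1, 2, 4) every map is reduced to 1 + 4 + 16 = 21 values per channel, which is what lets a fully connected layer follow inputs of different sizes.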

vggnet-results

Table 9: VGGNet, 64x64 input image, LeakyReLU (unless stated otherwise), AlexNet-oversample, no-SPP

| Name                            | Public LB |
|---------------------------------|-----------|
| No inv, scale, ReLU, fc-maxout  | 0.752     |
| Inv, scale, single random crop  | 0.773     |
| Inv, scale, 50 random crops     | 0.751     |
| Inv, scale                      | 0.729     |
| Inv, scale, retrained           | 0.720     |
| Inv, scale, fc-maxout           | 0.662     |
| Inv, scale, fc-maxout, SPP      | 0.654     |
| All VGGNets Mix                 | 0.650     |

sparseconvnet

- 0.79 LB score
- Unusual library
- C2 instead of C3 convolution
- Padding only for the input image
- Kaggle CIFAR-10 winning architecture:

320C2 - 320C2 - MP2 -
640C2 - 10% dropout - 640C2 - 10% dropout - MP2 -
960C2 - 20% dropout - 960C2 - 20% dropout - MP2 -
1280C2 - 30% dropout - 1280C2 - 30% dropout - MP2 -
1600C2 - 40% dropout - 1600C2 - 40% dropout - MP2 -
1920C2 - 50% dropout - 1920C1 - 50% dropout - 121C1 - Softmax output

ensemble-results

Table 10: Different mixes of all models (3 GoogLeNets, 4 VGGNets, 1 SparseConvNet)

| Name                                   | Public LB | Private LB |
|----------------------------------------|-----------|------------|
| 4 VGG                                  | 0.650     | 0.651      |
| 3 VGG, 1 GLN                           | 0.625     | 0.629      |
| 4 VGG, 3 GLN                           | 0.617     | 0.618      |
| 4 VGG, 3 GLN, 1 Sparse                 | 0.611     | 0.616      |
| 4 VGG, 3 GLN, 1 Sparse, figure-skating | 0.609     | 0.613      |

misc

batchnorm

Works for CIFAR

But it made no big difference for VGGNet in KNDB for me. However, it works for other people, e.g. Jae Hyun Lim [18], 22nd place.

[18] https://github.com/lim0606/ndsb

what else seems to work here

- Retrain the top layers with a different non-linearity (cheat diversity)
- Figure-skating average: throw away the max and min predictions (0.003 LB score gain)
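
The figure-skating average is a trimmed mean over models: for every image and class, drop the highest and lowest predicted probability and average the rest. A minimal sketch, assuming at least three models and a final renormalization of each row (the renormalization step is an assumption made here so the rows sum to 1 again).

```python
import numpy as np

def figure_skating_mean(model_probs):
    """Per-class mean over models after dropping the max and min prediction."""
    stacked = np.stack(model_probs, axis=0)  # (n_models, n_images, n_classes)
    n_models = stacked.shape[0]
    if n_models < 3:
        raise ValueError("need at least 3 models to drop the max and the min")
    total = stacked.sum(axis=0) - stacked.max(axis=0) - stacked.min(axis=0)
    trimmed = total / (n_models - 2)
    # Trimming breaks normalization, so rescale every row to sum to 1.
    return trimmed / trimmed.sum(axis=-1, keepdims=True)

preds = [
    np.array([[0.1, 0.9]]),
    np.array([[0.2, 0.8]]),
    np.array([[0.6, 0.4]]),
]
print(figure_skating_mean(preds))
```

With three models this reduces to the per-class median, which is why a single over-confident outlier has less influence than in a plain mean.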

what does not seem to work here

- Dense SIFT + BOW / Fisher Vector: ~60% accuracy
- Random forest on CNN features: ~65% accuracy
- Mix of hinge and cross-entropy losses
- Averaging with a mean other than the arithmetic one
- Image enhancement or preprocessing (histogram equalization, etc.)

winner's solution highlights

team work

- Roll-pool
- Hand-engineered features
- RMS-Pool
- Knowledge distillation

Source [19]: http://benanne.github.io/2015/03/17/plankton.html

Questions?

thanks

This nice presentation theme is taken from github.com/matze/mtheme

The theme itself is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.