FINE-TUNING A CONVOLUTIONAL NETWORK FOR CULTURAL EVENT RECOGNITION ADVISORS: Andrea Calafell Xavier Giró-i-Nieto Amaia Salvador 20/07/2015 AUTHOR: Matthias Zeppelzauer

Fine tuning a convolutional network for cultural event recognition

Aug 12, 2015


Technology

Xavier Giro
Transcript
Page 1: Fine tuning a convolutional network for cultural event recognition

FINE-TUNING A CONVOLUTIONAL NETWORK FOR CULTURAL EVENT RECOGNITION

ADVISORS:

Andrea Calafell

Xavier Giró-i-Nieto Amaia Salvador

20/07/2015

AUTHOR:

Matthias Zeppelzauer

Page 2: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

2

Page 3: Fine tuning a convolutional network for cultural event recognition

MOTIVATION: Cultural Heritage

3

Chinese New Year

Page 4: Fine tuning a convolutional network for cultural event recognition

MOTIVATION: Cultural Heritage

4

Carnival Rio

Page 5: Fine tuning a convolutional network for cultural event recognition

Classic onsite explorers

5

Page 6: Fine tuning a convolutional network for cultural event recognition

Onsite social media is big data...

6

Page 7: Fine tuning a convolutional network for cultural event recognition

...and online explorers need our help

7

Page 8: Fine tuning a convolutional network for cultural event recognition

CHALEARN: Looking at People

8

TRAINING SET: 5,875

VALIDATION SET: 2,332

TEST SET: 3,569

50 EVENTS

Page 9: Fine tuning a convolutional network for cultural event recognition

MOTIVATION: Goals

9

● Improve the results obtained in ChaLearn Challenge.

● Exploit the noisy data collected from Flickr

Page 11: Fine tuning a convolutional network for cultural event recognition

STATE OF THE ART: CaffeNet

11

CaffeNet

ARCHITECTURE [Krizhevsky’12]

SOFTWARE [Jia’14]

DATA [Deng’09]

Page 12: Fine tuning a convolutional network for cultural event recognition

STATE OF THE ART: CNN ARCHITECTURE

12

Convolutional Neural Network architecture

Babenko et al., Neural Codes for Image Retrieval. In Computer Vision - ECCV, 2014.

Page 13: Fine tuning a convolutional network for cultural event recognition

STATE OF THE ART: Object+Scene CNNs

13

Object-Scene Convolutional Neural Network for event recognition

Wang et al., Object-Scene Convolutional Neural Networks for Event Recognition in Images. In CVPRW, 2015.

Page 14: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

14

Page 15: Fine tuning a convolutional network for cultural event recognition

BASELINE: Fine-tuning a ConvNet

15

50

Page 16: Fine tuning a convolutional network for cultural event recognition

BASELINE: ChaLearn @ CVPRW 2015

16

Awarded the 2nd prize in the Cultural Event Recognition Challenge of the ChaLearn Workshop at CVPR 2015.

Salvador, A., Giró-i-Nieto, X., Calafell, A., et al., Cultural Event Recognition with Visual ConvNets and Temporal Models. In CVPRW, 2015.

Page 18: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

18

Page 19: Fine tuning a convolutional network for cultural event recognition

ConvNets need to be trained with...

19

a large amount of labeled images

Page 20: Fine tuning a convolutional network for cultural event recognition

but clean data is expensive...

20

and downloading noisy data in an unsupervised fashion is easier and cheaper.

Page 21: Fine tuning a convolutional network for cultural event recognition

NOISY DATA: Flickr Dataset

21

FLICKR DATASET

4,068

50 EVENTS

Page 22: Fine tuning a convolutional network for cultural event recognition

DATASET BIAS

22

Dataset bias when fine-tuning with the ChaLearn or the Flickr dataset:

Page 23: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

23

Page 24: Fine tuning a convolutional network for cultural event recognition

DENOISING THE FLICKR DATASET

24

Mosaic of Queens Day from ChaLearn

Mosaic of Queens Day from Flickr

Page 25: Fine tuning a convolutional network for cultural event recognition

DENOISING THE FLICKR DATASET

25

Example event: Annual Buffalo Roundup

Fine-tuned model with ChaLearn

New subset from
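The slide does not spell out the selection rule, so the following is only a minimal sketch of one plausible denoising filter: a noisy Flickr image is kept when the model fine-tuned on ChaLearn agrees with its Flickr tag with enough confidence. The function name `denoise`, the `min_confidence` threshold, and the toy scores are all assumptions made for illustration.

```python
def denoise(samples, min_confidence=0.5):
    """Keep images whose tagged event is also the model's top prediction.

    samples: list of (image_id, tagged_event, scores), where scores maps
    an event name to the probability given by the fine-tuned model.
    The agreement rule and the threshold are illustrative assumptions.
    """
    kept = []
    for image_id, tagged_event, scores in samples:
        predicted = max(scores, key=scores.get)
        if predicted == tagged_event and scores[predicted] >= min_confidence:
            kept.append(image_id)
    return kept


# Toy scores, not real model outputs.
flickr = [
    ("img_001", "Annual Buffalo Roundup",
     {"Annual Buffalo Roundup": 0.91, "Carnival Rio": 0.09}),
    ("img_002", "Annual Buffalo Roundup",
     {"Annual Buffalo Roundup": 0.20, "Carnival Rio": 0.80}),
    ("img_003", "Carnival Rio",
     {"Annual Buffalo Roundup": 0.45, "Carnival Rio": 0.55}),
]
clean_subset = denoise(flickr, min_confidence=0.6)
print(clean_subset)
```

A stricter threshold keeps fewer but cleaner images; a looser one keeps more data at the cost of more label noise.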

Page 26: Fine tuning a convolutional network for cultural event recognition

BASELINE: Dataset ordering during fine-tuning

26

CaffeNet

FINE-TUNING JOINT:

Page 27: Fine tuning a convolutional network for cultural event recognition

DENOISING THE FLICKR DATASET

27

Joint fine-tuning of the clean and noisy datasets:

0.6136

Page 28: Fine tuning a convolutional network for cultural event recognition

BASELINE: Dataset ordering during fine-tuning

28

CaffeNet

FINE-TUNING: FINE-TUNING:

Page 29: Fine tuning a convolutional network for cultural event recognition

DENOISING THE FLICKR DATASET

29

Sequential fine-tuning of the clean and noisy datasets:

0.6136

Page 30: Fine tuning a convolutional network for cultural event recognition

BASELINE: Dataset ordering during fine-tuning

30

CaffeNet

FINE-TUNING:FINE-TUNING:

Page 31: Fine tuning a convolutional network for cultural event recognition

DENOISING THE FLICKR DATASET

31

Sequential fine-tuning of the noisy and clean datasets:

0.6136

+1.3%
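The three orderings compared on these slides (joint, clean then noisy, noisy then clean) can be sketched as training schedules. This is a framework-free illustration, not the Caffe setup used in the work: `joint_schedule`, `sequential_schedule`, `fine_tune`, and the `record_stage` stand-in for a real training stage are hypothetical names.

```python
import random


def joint_schedule(clean, noisy, seed=0):
    """One fine-tuning stage over the shuffled union of both datasets."""
    merged = list(clean) + list(noisy)
    random.Random(seed).shuffle(merged)
    return [merged]


def sequential_schedule(first, second):
    """Two fine-tuning stages; the second starts from the first's weights."""
    return [list(first), list(second)]


def fine_tune(weights, stages, train_stage):
    """Run the stages in order, passing each stage the previous weights."""
    for stage in stages:
        weights = train_stage(weights, stage)
    return weights


chalearn = ["c1", "c2"]        # clean dataset (toy identifiers)
flickr = ["f1", "f2", "f3"]    # noisy dataset (toy identifiers)

log = []


def record_stage(weights, stage):
    """Stand-in for a real training stage: logs the data it was given."""
    log.append(sorted(stage))
    return weights + 1


# Noisy data first, clean data last: the ordering with the reported gain.
final = fine_tune(0, sequential_schedule(flickr, chalearn), record_stage)
print(log, final)
```

In the sequential case the last stage has the final say over the weights, which is one way to read why finishing on the clean dataset would help.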

Page 32: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

32

Page 33: Fine tuning a convolutional network for cultural event recognition

FRACKING MINING +/- SAMPLES

33

Page 34: Fine tuning a convolutional network for cultural event recognition

FRACKING THE TRAINING DATASET

34

Example event: Pingxi Lantern Festival

Fine-tuned model with ChaLearn

New subset from

hard negatives

hard positive
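As a rough sketch of mining hard samples for one event, the code below assumes that hard positives are true images of the event that the fine-tuned model scores lowest, and hard negatives are images of other events that it scores highest. The selection rule, the function name `frack`, the cut-off `k`, and the toy scores are assumptions; the slides only name the two groups.

```python
def frack(samples, event, k=2):
    """Return (hard_positives, hard_negatives) image ids for one event.

    samples: list of (image_id, true_event, score_for_event).
    Hard positives: images of `event` with the lowest scores.
    Hard negatives: images of other events with the highest scores.
    """
    positives = [s for s in samples if s[1] == event]
    negatives = [s for s in samples if s[1] != event]
    hard_pos = sorted(positives, key=lambda s: s[2])[:k]
    hard_neg = sorted(negatives, key=lambda s: s[2], reverse=True)[:k]
    return [s[0] for s in hard_pos], [s[0] for s in hard_neg]


# Toy scores for the event "Pingxi Lantern Festival", not real outputs.
samples = [
    ("a", "Pingxi Lantern Festival", 0.95),
    ("b", "Pingxi Lantern Festival", 0.30),
    ("c", "Chinese New Year", 0.70),
    ("d", "Carnival Rio", 0.05),
    ("e", "Chinese New Year", 0.40),
]
hard_pos, hard_neg = frack(samples, "Pingxi Lantern Festival", k=1)
print(hard_pos, hard_neg)
```

The mined images form the new subset on which the network is fine-tuned again.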

Page 35: Fine tuning a convolutional network for cultural event recognition

BASELINE: Dataset ordering during fine-tuning

35

CaffeNet

FINE-TUNING: Fine-tuning with fracking subset from:

Page 36: Fine tuning a convolutional network for cultural event recognition

FRACKING THE TRAINING DATASET

36

Results of fine-tuning with fracking on images from ChaLearn:

baseline: 0.61365

+0.9%

Page 37: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

37

Page 38: Fine tuning a convolutional network for cultural event recognition

FINE-TUNING DEEPER LAYERS ONLY

38

Layer 2 responds to corners and other edge/color conjunctions.

Page 39: Fine tuning a convolutional network for cultural event recognition

FINE-TUNING DEEPER LAYERS ONLY

39

Layer 3 has more complex invariances, capturing similar textures.

Zeiler et al., Visualizing and Understanding Convolutional Networks. In Computer Vision - ECCV, 2014.

Page 40: Fine tuning a convolutional network for cultural event recognition

FINE-TUNING DEEPER LAYERS ONLY

40

50

Andrej Karpathy. Convolutional neural networks for visual recognition. In Stanford CS class CS231n.

FC6 FC7

FC8
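One common way to fine-tune only the deeper layers is to give each layer its own learning-rate multiplier and set it to zero for the layers that should keep their pretrained weights, comparable to Caffe's per-layer `lr_mult` setting. The sketch below is a plain-Python illustration under the assumption that the fully connected layers FC6, FC7 and FC8 are the ones being updated; the layer names, weights and gradients are toy values.

```python
def sgd_step(weights, grads, base_lr, lr_mult):
    """Apply one SGD update with a per-layer learning-rate multiplier.

    A multiplier of 0 freezes the layer at its pretrained values;
    layers missing from lr_mult default to a multiplier of 1.
    """
    return {
        layer: [w - base_lr * lr_mult.get(layer, 1.0) * g
                for w, g in zip(weights[layer], grads[layer])]
        for layer in weights
    }


# Toy two-weight "layers"; real layers hold many more parameters.
weights = {
    "conv1": [0.5, -0.2],
    "fc6": [0.1, 0.3],
    "fc7": [0.0, 0.4],
    "fc8": [0.2, 0.2],
}
grads = {name: [1.0, 1.0] for name in weights}

# Assumption: freeze the early layer, update only FC6, FC7 and FC8.
lr_mult = {"conv1": 0.0, "fc6": 1.0, "fc7": 1.0, "fc8": 1.0}

updated = sgd_step(weights, grads, base_lr=0.1, lr_mult=lr_mult)
print(updated["conv1"], updated["fc8"])
```

The frozen layer comes out of the update unchanged, while the fully connected layers move along their gradients.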

Page 41: Fine tuning a convolutional network for cultural event recognition

FINE-TUNING DEEPER LAYERS ONLY

41

Results of fine-tuning only the deeper layers:

+3%

0.61365

Page 42: Fine tuning a convolutional network for cultural event recognition

FINE-TUNING DEEPER LAYERS ONLY

42

Results of fine-tuning only the deeper layers:

+4%

0.6136

Page 43: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

43

Page 44: Fine tuning a convolutional network for cultural event recognition

BASELINE: ChaLearn @ CVPRW 2015

44

Awarded the 2nd prize in the Cultural Event Recognition Challenge of the ChaLearn Workshop at CVPR 2015.

Salvador, A., Giró-i-Nieto, X., Calafell, A., et al., Cultural Event Recognition with Visual ConvNets and Temporal Models. In CVPRW, 2015.

Page 45: Fine tuning a convolutional network for cultural event recognition

ENSEMBLE OF EVENT DETECTORS

45

SINGLE CONVNET FOR THE 50 EVENTS:

Page 46: Fine tuning a convolutional network for cultural event recognition

ENSEMBLE OF EVENT DETECTORS

46

ONE CONVNET FOR EACH EVENT:
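A minimal sketch of how one detector per event could be combined at test time: each binary detector scores the image for its own event, and the scores are collected into one vector. Picking the highest-scoring event is an assumption about the combination rule, and the detectors here are toy functions standing in for fine-tuned ConvNets.

```python
def ensemble_scores(detectors, image):
    """Score an image with every binary detector: event -> confidence."""
    return {event: detect(image) for event, detect in detectors.items()}


def predict_event(detectors, image):
    """Return the event whose detector is the most confident."""
    scores = ensemble_scores(detectors, image)
    return max(scores, key=scores.get)


# Toy detectors reading a hand-made feature dictionary (hypothetical).
detectors = {
    "Carnival Rio": lambda img: img.get("feathers", 0.0),
    "Chinese New Year": lambda img: img.get("lanterns", 0.0),
}
image = {"feathers": 0.2, "lanterns": 0.7}

scores = ensemble_scores(detectors, image)
best = predict_event(detectors, image)
print(scores, best)
```

Because every detector is trained independently, the per-event scores are not guaranteed to be calibrated against each other, which a real ensemble may need to address.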

Page 47: Fine tuning a convolutional network for cultural event recognition

ENSEMBLE OF EVENT DETECTORS

47

Results of the ensemble of binary event detectors:

+6.6%

0.6136

Page 48: Fine tuning a convolutional network for cultural event recognition

OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

48

Page 49: Fine tuning a convolutional network for cultural event recognition

CONCLUSIONS

49

● The Flickr dataset improved the score once we swapped the order of the datasets: fine-tuning first on the noisy data and then on the clean data.

CaffeNet

FINE-TUNING: FINE-TUNING: +1.3%

Page 50: Fine tuning a convolutional network for cultural event recognition

CONCLUSIONS

50

● With fracking, the network improves its performance by learning from its own mistakes.

+0.9%

CaffeNet

FINE-TUNING: Fine-tuning with fracking subset from:

Page 51: Fine tuning a convolutional network for cultural event recognition

CONCLUSIONS

51

● Results improve when the earlier layers keep the weights learned from a very large dataset.

50

+4%

Page 52: Fine tuning a convolutional network for cultural event recognition

CONCLUSIONS

52

● Fine-tuning one convnet for each class increases the score.

+6.6%

Page 53: Fine tuning a convolutional network for cultural event recognition

FUTURE WORK

53

● Combine our solutions with a network fine-tuned from PLACES, and with other local solutions.

SCENE CNN (PLACES)

LOCAL

NOW

● Compete in (and try to win) ChaLearn @ ICCV 2015!

Page 54: Fine tuning a convolutional network for cultural event recognition

FINE-TUNING A CONVOLUTIONAL NETWORK FOR CULTURAL EVENT RECOGNITION

ADVISORS:

Andrea Calafell

Xavier Giró-i-Nieto Amaia Salvador

20/07/2015

AUTHOR:

Matthias Zeppelzauer