The impact of visual saliency prediction in image classification

Apr 11, 2017

Xavier Giro
Page 1: The impact of visual saliency prediction in image classification

The Impact of Visual Saliency Prediction in Image Classification

Eric Arazo Sánchez

Advisors: Kevin McGuinness, Eva Mohedano, Xavier Giró-i-Nieto

Page 2

Introduction - Computer vision

Classical computer vision: handcrafted descriptors → classifier (trainable) → “guitar”

Deep learning: learned descriptors (trainable) → classifier (trainable) → “guitar”

Page 3

Introduction - ImageNet

Russakovsky, Olga, et al. "ImageNet large scale visual recognition challenge." International Journal of Computer Vision (2015).

Page 4

ImageNet

Images:

● 1.2 M for training

● 50,000 for testing

● 1,000 categories

The evaluation set is not released before the competition.

Page 5

ImageNet

Metrics:

● Top-1 accuracy

● Top-5 accuracy
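A minimal sketch of how the two metrics are computed, in plain Python. The function name `top_k_accuracy` and the toy scores are illustrative only, not part of the original work. Top-1 counts an image as correct only if the highest-scoring class is the true one; top-5 accepts the true class anywhere among the five highest scores.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of images whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per image
    labels: integer ground-truth class index per image
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by decreasing score, truncated to the k best
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example: 3 images, 6 classes (ILSVRC uses 1,000 classes)
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 is ranked 1st
    [0.30, 0.25, 0.20, 0.15, 0.05, 0.05],  # true class 3 is ranked 4th
    [0.05, 0.40, 0.30, 0.10, 0.10, 0.05],  # true class 2 is ranked 2nd
]
labels = [1, 3, 2]
print(top_k_accuracy(scores, labels, k=1))  # only the first image counts
print(top_k_accuracy(scores, labels, k=5))  # all three images count
```

Top-5 is always at least as high as top-1, which is why both are reported.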

Page 7

Introduction - ImageNet

ILSVRC - Evolution since 2010

Slide credit: Kaiming He (FAIR)

Page 8

Introduction - ImageNet

ILSVRC - Evolution since 2010

Slide credit: Kaiming He (FAIR)

Some models have already reached human-level performance.

Is it still the Olympic Games of computer vision?

Page 9

Introduction - ImageNet

ILSVRC - Evolution since 2010

Slide credit: Kaiming He (FAIR)

2012: top-5 error drops by 9.4 points as AlexNet introduces Convolutional Neural Networks (CNNs) to the competition

Page 10

Introduction - AlexNet

Ref: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems (NIPS), 2012.

Page 11

Introduction - AlexNet

● 5 convolutional layers

● 3 fully connected layers

● 1000-way softmax output (object class)
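To make the layer counts concrete, the sketch below traces feature-map sizes through the five convolutional layers and the fully connected head for a 227 x 227 input. It assumes the kernel sizes, strides, padding and channel counts of the original AlexNet paper; the baseline in these slides (which adds batch normalization) may differ in detail.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size - kernel + 2 * pad) // stride + 1


# (name, kernel, stride, padding, output channels; None = pooling keeps channels)
# Assumed: hyperparameters of the original AlexNet (Krizhevsky et al., 2012).
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(size=227, channels=3):
    """Return (layer, channels, height, width) after each layer."""
    shapes = []
    for name, k, s, p, out_ch in LAYERS:
        size = conv_out(size, k, s, p)
        if out_ch is not None:
            channels = out_ch
        shapes.append((name, channels, size, size))
    return shapes


shapes = trace_shapes()
for name, c, h, w in shapes:
    print(f"{name}: {c} x {h} x {w}")

_, c, h, w = shapes[-1]
# Flattened conv features -> FC 1 -> FC 2 -> FC 3 (1000-way softmax)
fc_sizes = [c * h * w, 4096, 4096, 1000]
print("fully connected:", " -> ".join(map(str, fc_sizes)))
```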

Page 12

Introduction - CNN

LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

Page 13

Introduction - CNN

LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

CNNs are very useful in computer vision:

● Reduction of parameters (shared filters)

● Spatial coherence
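The parameter reduction from shared filters can be checked with simple arithmetic. The sketch compares a convolutional layer with a fully connected layer that maps the same input to an output of the same size. The layer sizes (227x227x3 input, 96 filters of 11x11, 55x55 output) are an assumed AlexNet-like example.

```python
def conv_params(k_h, k_w, in_ch, out_ch):
    """Weights + biases of a conv layer: one k_h x k_w x in_ch filter per output channel."""
    return k_h * k_w * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights + biases of a fully connected layer."""
    return in_units * out_units + out_units


# Assumed example: 227x227x3 input -> 55x55x96 output
conv = conv_params(11, 11, 3, 96)
dense = dense_params(227 * 227 * 3, 55 * 55 * 96)
print(f"shared 11x11 filters: {conv:,} parameters")
print(f"fully connected equivalent: {dense:,} parameters")
```

The convolution needs about a million times fewer parameters, because the same 96 filters are reused at every position; that reuse is also what gives the spatially coherent responses mentioned above.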

Page 14

Introduction - CNN

Image captioning, image segmentation

Page 15

Introduction - CNN

Saliency prediction

Page 16

Introduction - Saliency prediction

Images → CNN model → Saliency maps

Page 17

Introduction - Saliency prediction

CNN for image classification

Page 18

Objective

● Explore whether saliency maps can improve other computer vision tasks

Page 19

Objective

● Explore whether saliency maps can improve computer vision tasks

Page 21

Outline

● Introduction

● Objective

● State-of-the-art

● Methodology

● Conclusions

● Future work

Page 22

State-of-the-art - Saliency prediction

SalNet

Pan, Junting, Kevin McGuinness, Elisa Sayrol, Xavier Giro-i-Nieto, and Noel E. O'Connor. "Shallow and deep convolutional networks for saliency prediction." CVPR 2016.

Trained on SALICON

Page 23

Saliency prediction

Applications of saliency:

Page 24

Saliency prediction

Applications of saliency:

● Image retrieval

○ Finding the last appearance of an object.

Ref: Reyes, Cristian, et al. "Where is my phone? Personal object retrieval from egocentric images." (2016)

Page 25

Saliency prediction

Applications of saliency:

● Image retrieval

○ Finding the last appearance of an object.

● Object recognition

○ Health care

Ref: Reyes, Cristian, et al. "Where is my phone? Personal object retrieval from egocentric images." (2016)

Ref: Pérez de San Roman, Philippe, et al. "Saliency driven object recognition in egocentric videos with deep CNN." (2016)

Page 26

Saliency prediction - our approach

Page 27

Saliency prediction - our approach

AlexNet*SalNet

Page 28

Outline

● Introduction

● Objective

● State-of-the-art

● Methodology

● Conclusions

● Future work

Page 29

Methodology

RGB images

Page 30

RGB images

RGB - The Baseline

Page 31

RGB images

RGB - The Baseline

● 1.2 M images

● 227 x 227

Page 32

RGB images

RGB - The Baseline

● 1.2 M images

● 227 x 227

9 days to train on the computing cluster

Page 33

RGB - The Baseline

Page 34

RGB - The Baseline

9 days

5 days

Page 35

RGB - The Baseline

9 days

5 days

1.5 days

Page 36

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation
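The difference between the multiplication and concatenation options can be illustrated at the input level. The sketch below is a hypothetical illustration on tiny nested lists, not the code used in this work; it assumes the saliency map has the same resolution as the image and values in [0, 1]. The fan-in option, which adds a separate trainable branch, is not shown here.

```python
# Images are stored channel-first as nested lists: image[channel][row][col].

def multiply(rgb, saliency):
    """Weight every RGB channel pixel-wise by the saliency map (output: 3 channels)."""
    return [
        [[p * s for p, s in zip(img_row, sal_row)] for img_row, sal_row in zip(channel, saliency)]
        for channel in rgb
    ]


def concatenate(rgb, saliency):
    """Stack the saliency map as an extra channel (output: 4 channels, 'RGBS')."""
    return rgb + [saliency]


# Toy 2x2 image with saliency values in [0, 1]
rgb = [
    [[0.8, 0.2], [0.5, 0.9]],  # R
    [[0.1, 0.4], [0.7, 0.3]],  # G
    [[0.6, 0.6], [0.2, 0.0]],  # B
]
saliency = [[1.0, 0.0], [0.5, 1.0]]

weighted = multiply(rgb, saliency)
stacked = concatenate(rgb, saliency)
print(len(weighted), weighted[0])  # 3 [[0.8, 0.0], [0.25, 0.9]]
print(len(stacked))                # 4
```

Multiplication keeps three channels and suppresses non-salient pixels before the network sees them, while concatenation leaves the RGB values untouched and lets the network learn how to use the fourth channel.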

Page 37

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation

[Figure: two AlexNet blocks]

Page 39

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation

[Figure: three AlexNet blocks and a CNN]

Page 40

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation

Where?

[Figure: three AlexNet blocks and a CNN]

Page 42

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation

[Figure: three AlexNet blocks and a CNN]

It makes sense to use the baseline, which is already trained.

Page 43

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation

[Figure: three AlexNet blocks and a CNN]

It makes sense to use the baseline, which is already trained.

Pre-trained CNN

Page 44

Multiplication vs. Concatenation

Three strategies for each of them:

Page 45

Multiplication vs. Concatenation

Three strategies for each of them:

[Figure: RGBS. AlexNet (Conv 1 to Conv 5 with batch normalization and max-pooling, FC 1 to FC 3 with dropout, FC 3 as output) with RGB and saliency inputs]

Page 46

Multiplication vs. Concatenation

Three strategies for each of them:

[Figure: RGBS and RGB-1S-2S, both built on the same AlexNet with RGB and saliency inputs]

Page 47

Multiplication vs. Concatenation

Three strategies for each of them:

[Figure: RGBS, RGB-1S-2S and RGBS-1S-2S, all built on the same AlexNet with RGB and saliency inputs]

Page 48

Multiplication vs. Concatenation

RGBS

[Figure: RGBS highlighted among RGBS, RGB-1S-2S and RGBS-1S-2S; AlexNet with RGB and saliency inputs]

Page 50

Multiplication vs. Concatenation

RGB-1S-2S

[Figure: RGB-1S-2S highlighted among RGBS, RGB-1S-2S and RGBS-1S-2S; AlexNet with RGB and saliency inputs]

Page 52

Multiplication vs. Concatenation

RGBS-1S-2S

[Figure: RGBS-1S-2S highlighted among RGBS, RGB-1S-2S and RGBS-1S-2S; AlexNet with RGB and saliency inputs]

Page 54

Multiplication vs. Concatenation

The best option is concatenation:

● RGBS

● RGB-1S-2S
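Concatenating saliency at an intermediate layer requires resizing the map to the feature-map grid. The sketch below is a hypothetical illustration: it assumes that "S" in a strategy name marks where the saliency map is appended as one extra channel (at the input, after the first block, after the second block), and uses AlexNet widths (96 maps after conv1, 256 after conv2). The slides do not spell out these details, and the nearest-neighbour resize is a stand-in for whatever resizing was actually used.

```python
def downsample_nearest(saliency, out_h, out_w):
    """Nearest-neighbour resize of a 2-D map (list of rows) to out_h x out_w."""
    in_h, in_w = len(saliency), len(saliency[0])
    return [
        [saliency[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def concat_saliency(features, saliency):
    """Append the saliency map, resized to the feature-map grid, as one extra channel."""
    h, w = len(features[0]), len(features[0][0])
    return features + [downsample_nearest(saliency, h, w)]


# Hypothetical reading of the strategy names:
# (concatenate at input, after block 1, after block 2)
STRATEGIES = {
    "RGBS": (True, False, False),
    "RGB-1S-2S": (False, True, True),
    "RGBS-1S-2S": (True, True, True),
}


def input_channels(strategy):
    """Input channels seen by conv1-conv3 under each strategy (AlexNet widths assumed)."""
    at_input, after_1, after_2 = STRATEGIES[strategy]
    return {
        "conv1": 3 + int(at_input),
        "conv2": 96 + int(after_1),
        "conv3": 256 + int(after_2),
    }


# Toy example: an 8x8 saliency map injected into two 4x4 feature maps
saliency = [[r * 8 + c for c in range(8)] for r in range(8)]
features = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
merged = concat_saliency(features, saliency)
print(len(merged), merged[-1][1])
for name in STRATEGIES:
    print(name, input_channels(name))
```

Each concatenation adds only one input channel, so the extra cost in parameters is small compared with the baseline.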

Page 55

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation

Page 57

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation: RGBS, RGB-1S-2S

Page 59

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation: RGBS, RGB-1S-2S

[Figure: AlexNet and CNN blocks]

Page 60

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation: RGBS, RGB-1S-2S

Where?

[Figure: AlexNet and CNN blocks]

Page 61

Fan-in architecture

Three strategies:

Fan-in C1.1

[Figure: Fan-in C1.1. AlexNet (Conv 1 to Conv 5 with batch normalization and max-pooling, FC 1 to FC 3 with dropout) on the RGB input, plus a saliency branch (Conv 1, batch normalization, max-pooling)]

Page 62

Fan-in architecture

Three strategies:

Fan-in C1.1, Fan-in C2.1

[Figure: Fan-in C1.1 (one-stage saliency branch: Conv 1, batch normalization, max-pooling) and Fan-in C2.1 (two-stage saliency branch: Conv 1 and Conv 2, each with batch normalization and max-pooling), both alongside the AlexNet]

Page 63

Fan-in architecture

Three strategies:

Fan-in C1.1, Fan-in C2.1, Fan-in C2

[Figure: Fan-in C1.1 and Fan-in C2.1 as before; Fan-in C2 has a main path without Conv 2 (Conv 1, then Conv 3 to Conv 5 and FC 1 to FC 3) plus a one-stage saliency branch (Conv 1, batch normalization, max-pooling)]
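The fan-in idea (a separate trainable branch that processes the saliency map before it meets the RGB path) can be sketched in a few lines. This is a toy, hypothetical illustration: the branch is a single convolution, ReLU and max-pooling with a hand-picked kernel and no batch normalization. Merging by element-wise multiplication is an assumption suggested by the future-work item "Concatenating instead of multiplying". The real layer sizes and merge points (C1.1, C2.1, C2) are not given in the slides.

```python
def conv2d(x, kernel):
    """Valid 2-D cross-correlation of one map with one kernel (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(x[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(len(x[0]) - kw + 1)
        ]
        for i in range(len(x) - kh + 1)
    ]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x, size=2):
    """Non-overlapping size x size max-pooling."""
    return [
        [
            max(x[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(x[0]) - size + 1, size)
        ]
        for i in range(0, len(x) - size + 1, size)
    ]


def saliency_branch(saliency, kernels):
    """Hypothetical branch: conv -> ReLU -> max-pool, one output map per kernel."""
    return [max_pool(relu(conv2d(saliency, k))) for k in kernels]


def fan_in_multiply(rgb_features, sal_features):
    """Merge both branches element-wise (same channels and spatial size assumed)."""
    return [
        [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
        for c1, c2 in zip(rgb_features, sal_features)
    ]


# Toy example: the salient region is the top-left 3x3 corner of a 5x5 map
saliency_map = [[1 if r < 3 and c < 3 else 0 for c in range(5)] for r in range(5)]
avg_kernel = [[0.25, 0.25], [0.25, 0.25]]
sal_features = saliency_branch(saliency_map, [avg_kernel])  # 1 map of 2x2
rgb_features = [[[2.0, 2.0], [4.0, 4.0]]]  # stand-in for RGB-path features
fused = fan_in_multiply(rgb_features, sal_features)
print(sal_features)
print(fused)
```

In a trained network the branch kernels would be learned, which is what lets the fan-in design decide for itself how the saliency map should influence the RGB features.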

Page 64

Fan-in architecture

Fan-in C1.1

[Figure: Fan-in C1.1 highlighted among Fan-in C1.1, Fan-in C2.1 and Fan-in C2; one-stage saliency branch (Conv 1, batch normalization, max-pooling) alongside the AlexNet]

Page 66

Fan-in architecture

Fan-in C2.1

[Figure: Fan-in C2.1 highlighted among Fan-in C1.1, Fan-in C2.1 and Fan-in C2; two-stage saliency branch (Conv 1 and Conv 2, each with batch normalization and max-pooling) alongside the AlexNet]

Page 68

Fan-in architecture

Fan-in C2

[Figure: Fan-in C2 highlighted among Fan-in C1.1, Fan-in C2.1 and Fan-in C2; main path without Conv 2, plus a one-stage saliency branch (Conv 1, batch normalization, max-pooling)]

Page 70

Fan-in architecture

The best options are:

● Fan-in C2.1

● Fan-in C2

Page 71

Fan-in architecture

The best options are:

● Fan-in C2.1

● Fan-in C2

Surprising result for Fan-in C2, since it has fewer parameters than the baseline

More experiments

12.4%

Page 72

RGB-C2 (128x128)

Fan-in C2 (Fan-in Network)

Page 74

RGB-C2 (128x128)

RGB-C2

RGB (baseline)

Fan-in C2 (Fan-in Network)

Page 75

RGB-C2 (128x128)

RGB (baseline)

RGB-C2

Fan-in C2 (Fan-in Network)

Page 76

How to introduce saliency predictions?

Multiplication

Fan-in Network

Concatenation: RGBS, RGB-1S-2S

Page 77

How to introduce saliency predictions?

Multiplication

Fan-in Network: Fan-in C2.1, Fan-in C2

Concatenation: RGBS, RGB-1S-2S

Page 78

Analysis of per-class improvements

Multiplication

Fan-in Network: Fan-in C2.1, Fan-in C2

Concatenation: RGBS, RGB-1S-2S

Page 80

Analysis of per-class improvements

Class: change in accuracy

● Acoustic guitar: +25%

● Volleyball: +23%

Page 81

Analysis of per-class improvements

Class: change in accuracy

● Wrecker, tow car: -23%

● Entertainment center: -18%
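The per-class figures compare, class by class, the accuracy of a saliency-based model with that of the RGB baseline. A minimal sketch of that bookkeeping, assuming top-1 accuracy. The function names and the labels/predictions are made up for illustration and are not the data behind these slides.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Top-1 accuracy for each ground-truth class."""
    correct, total = defaultdict(int), defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def accuracy_change(labels, baseline_preds, saliency_preds):
    """Per-class accuracy of the saliency model minus the baseline's, largest gain first."""
    base = per_class_accuracy(labels, baseline_preds)
    sal = per_class_accuracy(labels, saliency_preds)
    return sorted(((sal[c] - base[c], c) for c in base), reverse=True)


# Toy data: two classes with four images each (made up for illustration)
labels = ["guitar"] * 4 + ["tow car"] * 4
baseline = ["guitar", "violin", "violin", "guitar", "tow car", "tow car", "tow car", "truck"]
with_sal = ["guitar", "guitar", "violin", "guitar", "tow car", "truck", "truck", "truck"]
for delta, cls in accuracy_change(labels, baseline, with_sal):
    print(f"{cls}: {delta:+.0%}")
```

Sorting by the difference puts the classes that benefit most at the top and those that suffer most at the bottom, matching the layout of the two slides.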

Page 82

Outline

● Introduction

● Objective

● State-of-the-art

● Methodology

● Conclusions

● Future work

Page 83

Conclusions

● CNNs trained to predict saliency maps can be used to improve other computer vision tasks, such as image classification

Page 84

Conclusions

● CNNs trained to predict saliency maps can be used to improve other computer vision tasks, such as image classification

Fan-in Network

Page 86

Conclusions

● Among the approaches tested, the best way to introduce saliency maps into a CNN is a Fan-in architecture, which gives the network the freedom to decide how to incorporate the saliency maps

Page 87

Conclusions

● Among the approaches tested, the best way to introduce saliency maps into a CNN is a Fan-in architecture, which gives the network the freedom to decide how to incorporate the saliency maps

[Figure: Fan-in C2.1 (Fan-in Network) next to RGBS (Concatenation); both AlexNet-based with RGB and saliency inputs, the fan-in variant adding a two-stage saliency branch]

Page 89

Conclusions

● Experiments on downsampled images give an accurate indication of how the CNN improves on larger images

227 x 227

128 x 128
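For scale, a back-of-the-envelope calculation (not a figure reported in the slides): a 128 x 128 input has roughly a third of the pixels of a 227 x 227 input, so each image is considerably cheaper to process.

```latex
\frac{227^2}{128^2} = \frac{51\,529}{16\,384} \approx 3.1
```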

Page 90

Outline

● Introduction

● Objective

● State-of-the-art

● Methodology

● Conclusions

● Future work

Page 91

Future work

● Several experiments:

○ Fan-in:

■ Fan-in C2 without saliency maps

■ Concatenating instead of multiplying

○ Concatenation only in the first convolutional layer

○ Multiplication and training from scratch

● Once we have a reasonable model, try other saliency models

Page 92

Thank you