Page 1: Region-oriented Convolutional Networks for Object Retrieval

REGION-ORIENTED CONVOLUTIONAL NETWORKS FOR OBJECT RETRIEVAL

AUTHOR: Eduard Fontdevila

ADVISORS: Amaia Salvador, Xavier Giró-i-Nieto

Page 2: Region-oriented Convolutional Networks for Object Retrieval

ACKNOWLEDGEMENTS

Financial Support

Technical Support: Albert Gil, Josep Pujal

Page 3: Region-oriented Convolutional Networks for Object Retrieval

OUTLINE

1. Motivation
2. State of Art
3. Local CNNs for Instance Search
4. Fine-tuning
5. Conclusions

Page 4: Region-oriented Convolutional Networks for Object Retrieval

visual Data is Big Data

motivation

Page 5: Region-oriented Convolutional Networks for Object Retrieval

libraries need librarians...

motivation

Page 6: Region-oriented Convolutional Networks for Object Retrieval

... and visual Data needs Computer Vision

COMPUTER VISION

motivation

Page 7: Region-oriented Convolutional Networks for Object Retrieval

applications

motivation

...

Page 8: Region-oriented Convolutional Networks for Object Retrieval

OUTLINE

1. Motivation
2. State of Art
3. Local CNNs for Instance Search
4. Fine-tuning
5. Conclusions

Page 9: Region-oriented Convolutional Networks for Object Retrieval

from shallow to deep learning

state of art

“hand crafted” features: Bag of Words, SIFT, Histograms of gradients

“learned” features: Convolutional Neural Networks (CNNs)

Page 10: Region-oriented Convolutional Networks for Object Retrieval

why deep learning now?

state of art

large datasets, powerful GPUs

...

Page 11: Region-oriented Convolutional Networks for Object Retrieval

AlexNet

state of art

Krizhevsky et al. (Toronto), ImageNet Classification with Deep Convolutional Neural Networks (2012)

Page 12: Region-oriented Convolutional Networks for Object Retrieval

CaffeNet

state of art

CaffeNet

architecture [Krizhevsky’12]

data [Deng’09]

framework [Jia’14]

Slide credit: Xavier Giró-i-Nieto

Page 13: Region-oriented Convolutional Networks for Object Retrieval

CaffeNet

state of art

input image

Babenko et al. (Moscow), Neural Codes for Image Retrieval (2014)

Page 14: Region-oriented Convolutional Networks for Object Retrieval

CaffeNet

state of art

convolutional layers

Babenko et al. (Moscow), Neural Codes for Image Retrieval (2014)

Page 15: Region-oriented Convolutional Networks for Object Retrieval

CaffeNet

state of art

fully connected layers

Babenko et al. (Moscow), Neural Codes for Image Retrieval (2014)

Page 16: Region-oriented Convolutional Networks for Object Retrieval

object candidates

state of art

Selective Search bounding boxes

Uijlings et al. (Trento), Selective Search for Object Recognition (2013)

MCG segments

Arbeláez et al. (Berkeley), Multiscale Combinatorial Grouping (2014)

Page 17: Region-oriented Convolutional Networks for Object Retrieval

R-CNN

state of art

Girshick et al. (Berkeley), Rich feature hierarchies for accurate object detection and semantic segmentation (2014)

Object Detection network

Page 19: Region-oriented Convolutional Networks for Object Retrieval

SDS

state of art

Hariharan et al. (Berkeley), Simultaneous Detection and Segmentation (2014)

Object Detection + Semantic Segmentation network

Page 20: Region-oriented Convolutional Networks for Object Retrieval

OUTLINE

1. Motivation
2. State of Art
3. Local CNNs for Instance Search
4. Fine-tuning
5. Conclusions

Page 21: Region-oriented Convolutional Networks for Object Retrieval

TRECVid Instance Search

local CNNs for instance search

large collection of videos: 464h

shots: ~470k

frames: 1/4 fps

Page 22: Region-oriented Convolutional Networks for Object Retrieval

TRECVid Instance Search

local CNNs for instance search

large collection of videos: 464h

shots: ~470k

frames: 1/4 fps

...in our case, subset of 13k shots (23k frames)

Page 23: Region-oriented Convolutional Networks for Object Retrieval

a Big Data scenario

local CNNs for instance search

Page 24: Region-oriented Convolutional Networks for Object Retrieval

query descriptors

local CNNs for instance search

[Diagram: query set → CaffeNet / Fast R-CNN / SDS → visual features → descriptors (image / bbox / region)]

Page 25: Region-oriented Convolutional Networks for Object Retrieval

query descriptors

local CNNs for instance search

query set

examples of TRECVid query images

Page 26: Region-oriented Convolutional Networks for Object Retrieval

query descriptors

local CNNs for instance search

[Diagram: query set → CaffeNet / Fast R-CNN / SDS → visual features → descriptors (image / bbox / region)]

Page 27: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

Page 28: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

global approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

Page 29: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

global approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

Page 30: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

global approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

Page 31: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

global approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

matching: Euclidean distance

Babenko et al. (Moscow), Neural Codes for Image Retrieval (2014)
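
The matching step named on this slide compares each frame descriptor with the query descriptor using the Euclidean distance. A minimal sketch of that comparison follows; the function name, the frame names and the 4-dimensional toy descriptors are assumptions made for illustration and do not come from the slides.

```python
import math

def euclidean_distance(query, frame):
    """Euclidean (L2) distance between two descriptors of equal length."""
    if len(query) != len(frame):
        raise ValueError("descriptors must have the same dimensionality")
    return math.sqrt(sum((q - f) ** 2 for q, f in zip(query, frame)))

# Toy 4-D descriptors (hypothetical values; real CNN descriptors are much longer).
query_descriptor = [0.0, 1.0, 0.0, 2.0]
frame_descriptors = {
    "frame_a": [0.0, 1.0, 0.0, 2.0],  # identical to the query
    "frame_b": [3.0, 1.0, 4.0, 2.0],  # differs from the query by (3, 0, 4, 0)
}

# One distance per frame; a smaller distance means a better match.
distances = {
    name: euclidean_distance(query_descriptor, descriptor)
    for name, descriptor in frame_descriptors.items()
}
```

A smaller distance means the frame looks more like the query, so later stages sort in ascending order.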

Page 32: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

global approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

pooling: distance shot - query = average distance (distance frame 1, distance frame 2, distance frame 3)

Zhu et al. (NII), Multi-image aggregation for better visual object retrieval (2014)
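
The pooling step on this slide turns the per-frame distances of a shot into a single shot-to-query distance by averaging them. The sketch below shows that average; the function name and the three example distances are hypothetical.

```python
def shot_distance(frame_distances):
    """Average the per-frame distances to get one shot-to-query distance."""
    if not frame_distances:
        raise ValueError("a shot needs at least one frame")
    return sum(frame_distances) / len(frame_distances)

# Hypothetical distances of the three frames of one shot to the query.
frame_distances = [0.9, 0.6, 0.3]
d_shot = shot_distance(frame_distances)  # close to 0.6
```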

Page 33: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

global approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

ranking: only top 1000 shots
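
The ranking step keeps only the best 1000 shots per query. A minimal sketch, assuming each shot already has one pooled distance to the query; the function name `rank_shots` and the shot identifiers are hypothetical.

```python
def rank_shots(shot_distances, top_k=1000):
    """Sort shots by ascending distance to the query and keep the first top_k."""
    ranked = sorted(shot_distances.items(), key=lambda item: item[1])
    return [shot_id for shot_id, _ in ranked[:top_k]]

# Hypothetical pooled distances for three shots.
example = {"shot_1": 0.7, "shot_2": 0.2, "shot_3": 0.5}
top_two = rank_shots(example, top_k=2)
```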

Page 34: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

Page 35: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

local approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]

Page 36: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

local approach

[Diagram: frames in 1 shot, object candidates, and the networks CaffeNet, Fast R-CNN and SDS]

Page 37: Region-oriented Convolutional Networks for Object Retrieval

main scheme

local CNNs for instance search

local approach

[Diagram: frames in 1 shot and object candidates feed three branches (CaffeNet, Fast R-CNN, SDS), each with the stages visual features, matching (against the query descriptors), pooling and ranking]
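
In the local approach each frame contributes several object candidates instead of a single global descriptor. The slides do not state how the candidate distances of a frame are combined, so the sketch below assumes the frame keeps the distance of its best-matching candidate; the function name, the `max_candidates` parameter and the toy descriptors are also assumptions.

```python
import math

def local_frame_distance(query_desc, candidate_descs, max_candidates=100):
    """Distance of one frame to the query, taken as the best match among
    its first max_candidates object candidates (assumed pooling rule)."""
    candidates = candidate_descs[:max_candidates]
    if not candidates:
        raise ValueError("frame has no object candidates")
    return min(math.dist(query_desc, c) for c in candidates)

# Hypothetical 2-D descriptors: the second candidate is the closest one.
query = [1.0, 1.0]
candidates = [[4.0, 5.0], [1.0, 2.0], [10.0, 10.0]]

best = local_frame_distance(query, candidates)
truncated = local_frame_distance(query, candidates, max_candidates=1)
```

Limiting the candidate list can hide the best match (`truncated` is worse than `best`), which is the trade-off behind keeping only a fixed number of candidates per frame.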

Page 38: Region-oriented Convolutional Networks for Object Retrieval

quantitative results: ranking

local CNNs for instance search

[Chart: mAP (%) for SDS and Fast R-CNN]
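
The results are reported as mean Average Precision (mAP). As a reminder of how that figure is obtained, here is a minimal sketch of the usual definition: precision is accumulated at every relevant hit, normalised by the number of relevant shots, and averaged over queries. The query and shot identifiers are hypothetical, and the exact TRECVid evaluation protocol may differ in details.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP of one ranked list: mean of the precision values at each relevant hit."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_ids, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(rankings, ground_truth):
    """Mean of the per-query APs."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps)

# Hypothetical rankings and ground truth for two queries.
rankings = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
ground_truth = {"q1": {"a", "c"}, "q2": {"y"}}
map_score = mean_average_precision(rankings, ground_truth)
```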

Page 39: Region-oriented Convolutional Networks for Object Retrieval

re-ranking

local CNNs for instance search

CaffeNet → SDS / Fast R-CNN re-ranking

global + local fusion
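
Re-ranking reorders the top of the global (CaffeNet) ranking with the local (SDS or Fast R-CNN) distances, either on their own or fused with the global ones. The slides do not give the fusion rule, so the weighted sum below is an assumption, as are the function name `rerank`, the `weight` parameter and the example values.

```python
def rerank(global_dist, local_dist, top_n, weight=None):
    """Re-rank the top_n shots of the global ranking with local distances.

    weight=None: order the top_n shots by local distance only.
    weight=w:    order them by w * global + (1 - w) * local (assumed fusion rule).
    Shots below top_n keep their global order.
    """
    global_ranking = sorted(global_dist, key=global_dist.get)
    head, tail = global_ranking[:top_n], global_ranking[top_n:]

    def score(shot_id):
        if weight is None:
            return local_dist[shot_id]
        return weight * global_dist[shot_id] + (1.0 - weight) * local_dist[shot_id]

    return sorted(head, key=score) + tail

# Hypothetical distances for four shots.
global_dist = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.9}
local_dist = {"s1": 0.6, "s2": 0.1, "s3": 0.5, "s4": 0.0}

local_only = rerank(global_dist, local_dist, top_n=3)
fused = rerank(global_dist, local_dist, top_n=3, weight=0.5)
```

In this toy case `s4` never moves up despite its perfect local distance, because it falls outside the re-ranked top 3.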

Page 40: Region-oriented Convolutional Networks for Object Retrieval

quantitative results: re-ranking

local CNNs for instance search

[Chart: mAP (%) for SDS, Fast R-CNN and CaffeNet]

Page 41: Region-oriented Convolutional Networks for Object Retrieval

quantitative results: re-ranking

local CNNs for instance search

[Chart: mAP (%) for SDS, Fast R-CNN and CaffeNet]

adding context: ~8%

Page 42: Region-oriented Convolutional Networks for Object Retrieval

qualitative results: re-ranking

query

SDS

Fast R-CNN

local CNNs for instance search

Page 43: Region-oriented Convolutional Networks for Object Retrieval

qualitative results: re-ranking

query

SDS

Fast R-CNN

local CNNs for instance search

Page 44: Region-oriented Convolutional Networks for Object Retrieval

as a reminder...

local CNNs for instance search

Selective Search bounding boxes

Uijlings et al. (Trento), Selective Search for Object Recognition (2013)

MCG segments

Arbeláez et al. (Berkeley), Multiscale Combinatorial Grouping (2014)

Fast R-CNN

SDS

Page 45: Region-oriented Convolutional Networks for Object Retrieval

OUTLINE

1. Motivation
2. State of Art
3. Local CNNs for Instance Search
4. Fine-tuning
5. Conclusions

Page 46: Region-oriented Convolutional Networks for Object Retrieval

training CNNs from scratch is costly...

fine-tuning

Page 47: Region-oriented Convolutional Networks for Object Retrieval

... instead: fine-tuning

fine-tuning

already trained network + new dataset (novel domain)

resume training
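
The idea of resuming training from an already trained network can be illustrated with a deliberately tiny model. The sketch below is not the Caffe setup used in this work: it is a one-parameter toy (`y = w * x`) with hypothetical data, showing only that a short training run on the new data ends closer to the target when it starts from pre-trained weights than when it starts from scratch.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps, lr=0.05):
    """Plain gradient descent on the mean squared error, starting from w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Hypothetical data: a source task (slope 2) and a small new domain (slope 2.5).
source_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
target_data = [(1.0, 2.5), (2.0, 5.0)]

pretrained_w = train(0.0, source_data, steps=200)         # long training from scratch
finetuned_w = train(pretrained_w, target_data, steps=5)   # resume training on new data
scratch_w = train(0.0, target_data, steps=5)              # same budget, no pre-training
```

With the same five update steps, `finetuned_w` has a lower error on the new data than `scratch_w`.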

Page 48: Region-oriented Convolutional Networks for Object Retrieval

a quick trial

fine-tuning

CaffeNet + Pascal dataset

Page 49: Region-oriented Convolutional Networks for Object Retrieval

results on Pascal (global scale)

fine-tuning

accuracy (%): 59.31% (validation subset), 4.14% (validation set)

[Figure: histogram of images per category; x-axis: categories, y-axis: % of images]

Page 50: Region-oriented Convolutional Networks for Object Retrieval

Microsoft COCO

fine-tuning

● Multiple objects per image

● 80 categories

● > 300k images (80k training)

● > 2M instances

Lin et al. (Cornell - Microsoft), Microsoft COCO: Common Objects in Context, http://vision.ucsd.edu/sites/default/files/coco_eccv.pdf (2015)

Page 51: Region-oriented Convolutional Networks for Object Retrieval

fine-tuning SDS on COCO

fine-tuning

SDS network + COCO dataset

resume training

Page 52: Region-oriented Convolutional Networks for Object Retrieval

fine-tuning SDS on COCO

fine-tuning

SDS network + COCO dataset

resume training

Page 53: Region-oriented Convolutional Networks for Object Retrieval

... but why?

fine-tuning

the more objects the network knows, the better

Page 54: Region-oriented Convolutional Networks for Object Retrieval

OUTLINE

1. Motivation
2. State of Art
3. Local CNNs for Instance Search
4. Fine-tuning
5. Conclusions

Page 55: Region-oriented Convolutional Networks for Object Retrieval

about the results

● Although it does not outperform CaffeNet, SDS is good for localization!

conclusions

maybe more suitable for the TRECVid localization task?

Page 56: Region-oriented Convolutional Networks for Object Retrieval

about fine-tuning

● Networks trained on objects, but not on the objects to retrieve

conclusions

fine-tuning on a larger dataset is clearly the next step

Page 57: Region-oriented Convolutional Networks for Object Retrieval

about object candidates

● Using only 100 candidates decreases the likelihood of success

... but using a higher number

conclusions

Fast SDS would be the key

Page 58: Region-oriented Convolutional Networks for Object Retrieval

thank you

Page 59: Region-oriented Convolutional Networks for Object Retrieval

visualizing CNNs’ features

more class-specific information

annex

Page 60: Region-oriented Convolutional Networks for Object Retrieval

SDS: Proposal Generation

input image

MCG object candidates

segments, not only bounding boxes

annex

Page 61: Region-oriented Convolutional Networks for Object Retrieval

SDS: Feature Extraction

annex

Page 62: Region-oriented Convolutional Networks for Object Retrieval

SDS: Feature Extraction

object candidate

penultimate fully connected layers

annex

Page 63: Region-oriented Convolutional Networks for Object Retrieval

SDS: Region Classification

Linear SVM

annex

Page 64: Region-oriented Convolutional Networks for Object Retrieval

SDS: Region Refinement

annex

Page 65: Region-oriented Convolutional Networks for Object Retrieval

basic pipeline for retrieval

annex

Page 66: Region-oriented Convolutional Networks for Object Retrieval

interactive: Multi-image aggregation

For each topic, the minimum distance between its query images and each shot was used.

The best option with SIFT-BoW is averaging, whether of features (Avg-Pooling) or of similarity scores (Sim-Avg)

annex

Zhu et al. (NII), Multi-image aggregation for better visual object retrieval (2014)
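
The aggregation strategies mentioned on this slide differ in where the several query images of a topic are combined. A minimal sketch of the three options follows, assuming Euclidean distances throughout; the function names and toy descriptors are hypothetical, and the last function averages distances as a stand-in for the similarity-score averaging (Sim-Avg) of Zhu et al.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_distance(query_descs, shot_desc):
    """Keep the distance of the best-matching query image."""
    return min(euclidean(q, shot_desc) for q in query_descs)

def avg_pooling_distance(query_descs, shot_desc):
    """Average the query features first, then compare once (Avg-Pooling)."""
    dim = len(shot_desc)
    mean_query = [sum(q[i] for q in query_descs) / len(query_descs) for i in range(dim)]
    return euclidean(mean_query, shot_desc)

def avg_score_distance(query_descs, shot_desc):
    """Compare each query image separately, then average the scores
    (analogue of Sim-Avg, computed on distances here)."""
    return sum(euclidean(q, shot_desc) for q in query_descs) / len(query_descs)

# Hypothetical 2-D descriptors: two query images of one topic and one shot.
queries = [[0.0, 0.0], [4.0, 0.0]]
shot = [0.0, 3.0]

d_min = min_distance(queries, shot)
d_avg_pool = avg_pooling_distance(queries, shot)
d_avg_score = avg_score_distance(queries, shot)
```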