Contextless Object Recognition with Shape-enriched SIFT and Bags of Features
Marcel Tella Amo
Directed by Dr. Matthias Zeppelzauer (TU Wien)
Codirected by Dr. Xavier Giró-i-Nieto (UPC)

Dec 07, 2014

Xavi Giró

Thesis report and full details: https://imatge.upc.edu/web/publications/contextless-object-recognition-shape-enriched-sift-and-bags-features

Author: Marcel Tella
Advisors: Xavier Giró-i-Nieto (UPC) and Matthias Zeppelzauer (TU Wien)

Degree: Telecommunications Engineering (5 years) at Telecom BCN-ETSETB (UPC)

Abstract:
Object recognition based on the aggregation of point-based features currently achieves highly competitive results. The aggregation process, typically average or max pooling of the features, generates a single vector that represents the image or region containing the object.

The aggregated point-based features typically describe the texture around the points with descriptors such as SIFT. These descriptors are limited for wiry and textureless objects. A possible solution is to add shape-based information. Shape descriptors have previously been used to encode shape information and thus recognise those types of objects. However, an alignment step is generally required to match every point of one shape to the points of the others, which makes the similarity assessment computationally expensive.

We propose to enrich location and texture-based features with shape-based ones. Two main architectures are explored. The first enriches the SIFT descriptors with shape information before they are aggregated. The second creates the standard Bag of Words histogram and concatenates a shape histogram to it, classifying the result as a single vector.

We evaluate the proposed techniques and the novel features on the Caltech-101 dataset.

Results show that shape features increase the final performance. Our extension of the Bag of Words with a shape-based histogram (BoW+S) gives the better performance of the two architectures. However, for a high number of shape features, the BoW+S and enriched SIFT architectures tend to converge.
Transcript
Page 1: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Marcel Tella Amo

Directed by Dr. Matthias Zeppelzauer (TU Wien)
Codirected by Dr. Xavier Giró-i-Nieto (UPC)

Page 2: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Motivation

Object Recognition and Classification

Categories:
• Ball
• Airplane
• Chair
• Beaver
• …

Ball Airplane Chair

Shape Information

Texture information

Page 3: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Index

• Requirements
• State of the Art
• Design
• Results

Page 4: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Requirements

Page 5: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Design shape features that can be used in an aggregated framework, like Bag of Words, with no need for matching or alignment.

Requirements State of the Art Design Results

Take a successful method (SIFT) and add Shape Information.

Page 6: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Analyse the effect of the vocabulary size relative to the size of the shape features.

SIFT

Shape

Page 7: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

The proposed features should be at least scale, rotation and translation invariant and, if possible, flip invariant as well.

Page 8: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Need for segmentation to codify the shape

Study the limitations of shape coding when using a state-of-the-art segmentation.

Manual annotations vs Automatic Segmentation

Page 9: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

State of the Art

Page 10: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Object Candidates algorithms

Multiscale Combinatorial Grouping (MCG)

Arbelaez, P., Pont-Tuset, J., Barron, J. T., Marques, F., Malik, J. (2014). Multiscale Combinatorial Grouping. CVPR.

[Figure: object candidates ranked by object plausibility, from high to low]

Page 11: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Shape Context

G. Mori, S. Belongie, and J. Malik. Efficient shape matching using shape contexts. PAMI, 27(11), 2005.

Page 12: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Interest point descriptors: SIFT descriptor

Typically 4x4 divisions * 8 bins/hist = 128 features

dense SIFT

sparse SIFT

David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110, 2004.

Simplified example
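The 4x4 divisions * 8 bins arithmetic can be illustrated with a deliberately simplified sketch. It assumes a 16x16 grayscale patch stored as a list of lists and leaves out the Gaussian weighting, interpolation and normalisation of Lowe's descriptor; the function name `toy_sift_descriptor` is hypothetical.

```python
import math

def toy_sift_descriptor(patch, grid=4, bins=8):
    """Toy SIFT-like descriptor: grid x grid cells, `bins` orientation bins each.

    `patch` is a square list of lists of gray values whose side is a
    multiple of `grid`. This is NOT the full SIFT descriptor: there is no
    Gaussian weighting, no interpolation and no normalisation.
    """
    size = len(patch)
    cell = size // grid
    descriptor = [0.0] * (grid * grid * bins)
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the patch border.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            orientation_bin = int(angle / (2 * math.pi) * bins) % bins
            cell_index = (y // cell) * grid + (x // cell)
            # Each pixel votes with its gradient magnitude.
            descriptor[cell_index * bins + orientation_bin] += magnitude
    return descriptor

# Horizontal intensity ramp: every gradient points along +x.
patch = [[float(x) for x in range(16)] for _ in range(16)]
desc = toy_sift_descriptor(patch)
print(len(desc))  # 4 * 4 cells * 8 bins
```

For the ramp patch all votes fall into the first orientation bin of each cell, which makes the cell-by-cell layout of the vector easy to inspect.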

Page 13: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Enrichment of SIFT

Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.

Extra features: Relative position + aspect ratio + scale ratio + Color Space

Extra features: Absolute spatial location (X,Y) or angle and distance

Rene Grzeszick, Leonard Rothacker, and Gernot A. Fink, "Bag-of-features representations using spatial visual vocabularies for object classification," in IEEE Intl. Conf. on Image Processing, Melbourne, Australia, 2013

128-dimensional SIFT descriptor + Extra features

Page 14: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Bag of Words

Page 15: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Bags of Words - Pipeline

Get Descriptors

Clustering(K-means)

Create histograms

Train Model(SVM)

Image

Create histogram

Evaluate(SVM)
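The clustering and histogram stages of this pipeline can be sketched in a few lines. This is a minimal pure-Python illustration, not the implementation used in the thesis: the helper names are hypothetical, the centroid initialisation is a simple deterministic choice made for reproducibility, the toy descriptors are 2-D instead of 128-D, and the SVM stage is left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, descriptor):
    """Index of the visual word (centroid) closest to the descriptor."""
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], descriptor))

def kmeans(descriptors, k, iterations=20):
    """Plain Lloyd's k-means; initial centroids are evenly spaced samples (assumption)."""
    centroids = [list(descriptors[i * len(descriptors) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centroids, d)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def bow_histogram(descriptors, vocabulary):
    """Normalised count of descriptors assigned to each visual word."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(vocabulary, d)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Toy 2-D "descriptors" forming two obvious clusters.
training = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
vocabulary = kmeans(training, k=2)
image_descriptors = [(0.05, 0.05), (4.9, 5.0), (5.2, 5.1)]
histogram = bow_histogram(image_descriptors, vocabulary)
print(histogram)
```

The resulting histogram is the fixed-length vector that would be passed to the classifier, both at training time and for a new image.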

Page 16: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Design

Page 17: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Why dense SIFT?

Page 18: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Main principle: Combination of dense SIFT and Object Candidates

Page 19: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Distance to the nearest border (DNB)

Logarithmic distance to the nearest border (LDNB)

Large distances have less influence.

Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
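A brute-force sketch of the two distance features follows. It assumes the region is given as a binary mask (list of lists), treats a region pixel as a border pixel when one of its 4-neighbours lies outside the region, and uses log(1 + d) as the logarithmic variant. The slide only says "logarithmic", so that exact formula and all function names are assumptions.

```python
import math

def border_pixels(mask):
    """Region pixels that touch the outside of the region (4-connectivity)."""
    height, width = len(mask), len(mask[0])
    border = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside_image = not (0 <= ny < height and 0 <= nx < width)
                if outside_image or not mask[ny][nx]:
                    border.append((x, y))
                    break
    return border

def dnb(point, border):
    """Distance to the nearest border pixel."""
    px, py = point
    return min(math.hypot(px - bx, py - by) for bx, by in border)

def ldnb(point, border):
    """Logarithmic distance: log(1 + d), an assumed formulation."""
    return math.log1p(dnb(point, border))

# 7x7 square region inside a 9x9 image.
mask = [[1 if 1 <= x <= 7 and 1 <= y <= 7 else 0 for x in range(9)] for y in range(9)]
border = border_pixels(mask)
print(dnb((4, 4), border), dnb((2, 4), border))
print(ldnb((4, 4), border), ldnb((2, 4), border))
```

In this toy region the central point is three times farther from the border than the point at (2, 4), but its logarithmic distance is only twice as large, which is the compression of large distances the slide refers to.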

Page 20: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Distance and Angle to the nearest border (DANB)

Problem: Really similar in 2D but very different values.

Solution: Codify them in two separate features.

Requirements State of the Art Design Results
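One possible reading of this problem is angle wrap-around: directions of 359° and 1° are almost identical in 2D, yet their raw values differ by 358. The slide does not say which two features are used. The sketch below assumes a cosine/sine pair, a common choice, purely for illustration; the function names are hypothetical.

```python
import math

def angle_to_two_features(angle_deg):
    """Encode an angle as (cos, sin). This pair is an assumed choice of the two features."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def feature_distance(f, g):
    """Euclidean distance between two encoded angles."""
    return math.hypot(f[0] - g[0], f[1] - g[1])

# Raw angle values: the two directions look very different.
raw_gap = abs(359.0 - 1.0)

# Encoded as two features: the two directions are close, as they are in 2D.
encoded_gap = feature_distance(angle_to_two_features(359.0), angle_to_two_features(1.0))
print(raw_gap, encoded_gap)
```

With this encoding, nearby directions always produce nearby feature values, which matters when the features are clustered with a Euclidean distance such as in K-means.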

Page 21: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Rotation Invariant Angle to the nearest border

Page 22: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Distance to the center (DC)

Page 23: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

η-Angular Scan (ηAS)

WINNER!
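A minimal sketch of how an η-Angular Scan could be computed is given here. It assumes that η rays leave the keypoint at equally spaced angles and that each ray stops at the first sample outside the region, so it never crosses the contour. The binary-mask representation, the one-pixel step and the function name are assumptions, and the rotation-invariance step is not covered.

```python
import math

def angular_scan(mask, point, eta=8, step=1.0):
    """Distances from `point` to the region border along `eta` directions.

    Rays are cast at equally spaced angles and stop at the first sample
    that falls outside the region (or outside the image).
    """
    height, width = len(mask), len(mask[0])
    px, py = point
    distances = []
    for i in range(eta):
        theta = 2 * math.pi * i / eta
        dx, dy = math.cos(theta), math.sin(theta)
        travelled = 0.0
        while True:
            # Next sample on the ray, snapped to the nearest pixel.
            x = int(math.floor(px + (travelled + step) * dx + 0.5))
            y = int(math.floor(py + (travelled + step) * dy + 0.5))
            inside = 0 <= x < width and 0 <= y < height and mask[y][x]
            if not inside:
                break
            travelled += step
        distances.append(travelled)
    return distances

# 7x7 square region inside a 9x9 image, scanned from its centre.
mask = [[1 if 1 <= x <= 7 and 1 <= y <= 7 else 0 for x in range(9)] for y in range(9)]
scan = angular_scan(mask, (4, 4), eta=8)
print(scan)
```

For the square region the axis-aligned rays are shorter than the diagonal ones, so the eight values already capture a coarse signature of the shape around the keypoint.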

Page 24: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Shape Context from a dense SIFT (DSC)

Note: It crosses the contour of the region, like Shape Context. ηAS does not!

Page 25: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Rotation Invariant Region Quantization (RIRQ)

Main idea: Get spatial information.

Easily extensible to a pyramid!

Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on (Vol. 2, pp. 2169-2178). IEEE.

Page 26: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Achieving flip invariance (RIRQ)

[Figure: region quadrants labelled 1 to 4 for a region and its flipped version, each followed by a SORT step]

Page 27: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Where do we integrate our features? Two main architectures

Enriched SIFT (eSIFT): SIFT + Shape features → Visual Vocabulary → Bag of eSIFT visual words

BoW+Shape: SIFT → Visual Vocabulary → Bag of Words + Shape histogram

Requirements State of the Art Design Results
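The difference between the two architectures is where the concatenation happens, which a toy sketch with plain lists can make concrete. The function names, the vocabulary size of 4 and the 8 shape values are illustrative assumptions, not values from the thesis.

```python
def enrich_descriptor(sift, shape_features):
    """eSIFT: append shape features to each descriptor BEFORE quantization."""
    return list(sift) + list(shape_features)

def bow_plus_shape(bow_histogram, shape_histogram):
    """BoW+S: append a shape histogram to the BoW histogram AFTER quantization."""
    return list(bow_histogram) + list(shape_histogram)

# eSIFT: one enriched vector per keypoint, later clustered into visual words.
sift = [0.0] * 128
shape_features = [3.0, 4.0, 3.0, 4.0, 3.0, 4.0, 3.0, 4.0]  # e.g. 8 scan distances
esift = enrich_descriptor(sift, shape_features)

# BoW+S: one vector per image or region, passed directly to the classifier.
bow = [0.25, 0.5, 0.25, 0.0]           # toy vocabulary of 4 visual words
shape_hist = [0.5, 0.5, 0.0, 0.0]      # toy shape histogram
image_vector = bow_plus_shape(bow, shape_hist)

print(len(esift), len(image_vector))
```

In eSIFT the shape features influence the visual vocabulary itself, whereas in BoW+S the vocabulary is built from SIFT alone and the shape information only reaches the classifier.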

Page 28: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

BoW+Shape: Creation of the shape histograms

[Diagram: SIFT → Visual Vocabulary → Bag of Words + Shape histogram]

1. Accumulate the same feature for all points.
2. Create a histogram of X bins for that feature.
3. Concatenate histograms to create the final one.

Example: 8-Angular Scan

8 distances (different angles)

[Figure axis label: # SIFT keypoints]

Accumulation of features
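The three steps above can be sketched as follows. The sketch assumes every keypoint carries the same number of shape features, that all features share a known value range, and that raw counts are kept without normalisation; those choices and the function names are assumptions.

```python
def feature_histogram(values, bins, lo, hi):
    """Step 2: histogram of `bins` equal-width bins over [lo, hi] for one feature."""
    hist = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        b = int((v - lo) / width)
        hist[min(max(b, 0), bins - 1)] += 1  # clamp out-of-range values
    return hist

def shape_histogram(per_point_features, bins, lo, hi):
    """Steps 1 to 3: accumulate each feature over all points, bin it, concatenate."""
    n_features = len(per_point_features[0])
    final = []
    for f in range(n_features):
        values = [point[f] for point in per_point_features]    # step 1
        final.extend(feature_histogram(values, bins, lo, hi))  # steps 2 and 3
    return final

# 3 keypoints with 2 shape features each, values in [0, 1].
points = [[0.1, 0.9], [0.2, 0.8], [0.6, 0.4]]
hist = shape_histogram(points, bins=2, lo=0.0, hi=1.0)
print(hist)
```

With an 8-Angular Scan and X bins per feature the final shape histogram would have 8 * X entries, independent of the number of keypoints in the region.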

Page 29: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Results and conclusions

Page 30: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

The dataset: Caltech-101

• Well-recognized dataset
• 101 different categories of images
• Ground truth annotations available
• From 40 to 800 images per category

Page 31: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Metrics: Accuracy (%)

Accuracy (%) = 100 × Correct Classifications / (Correct + Incorrect Classifications)

Page 32: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Experiments setup
• 30 images per category in train and 30-50 in test.
• 101 Categories + Background category.
• Different Vocabulary sizes in the X axis.
• Accuracy (%) in the Y axis.

Experiments and analysis:
• eSIFT
• BoW+S
• eSIFT vs BoW+S
• Performance achieved
• Comparison between adding features before or after quantization
• Number of bins per histogram
• Ground truth vs MCG Object Candidates
• Context vs Shape

Page 33: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Results enriched SIFT

Page 34: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Results BoW+S

Page 35: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Performance achieved

Conclusion

With Angular Scan, accuracy increases from 16% to around 41%.

Page 36: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Comparison between adding features after and before quantization

Conclusion

In Angular Scan, if the number of shape features is high, both architectures tend to converge.

Page 37: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Number of bins per histogram

Conclusion

In Angular Scan, 8 bins is the value that gives the best performance.

Page 38: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Ground truth vs MCG Object Candidates

Conclusion 1

Larger vocabularies make the approach more robust to segmentation errors.

Conclusion 2

Shape-based methods are more sensitive to segmentation errors than texture-based ones.

Page 39: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Context gain vs Shape gain

Conclusion

Codifying the shape gives better performance than codifying the context of the image.

Object

Context

Page 40: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Future Work

Comparison between our work and Second Order Pooling

PhD thesis of Carles Ventura

Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.

Page 41: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Future Work

Distance to the nearest border (DNB)

Page 42: Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Conclusions

1. Accuracy increases from 16% to around 41%.
2. In Angular Scan, if the number of shape features is high, both architectures tend to converge.
3. In Angular Scan, 8 bins is the value that gives the best performance.
4. Larger vocabularies make the approach more robust to segmentation errors.
5. Shape-based methods are more sensitive to segmentation errors than texture-based ones.
6. Codifying the shape gives better performance than codifying the context of the image.

Thank you! Questions?