Page 1: Bag-of-features  for category recognition

Bag-of-features for category recognition

Cordelia Schmid

Page 2: Bag-of-features  for category recognition

Bag-of-features for image classification

• Origin: texture recognition
• Texture is characterized by the repetition of basic elements or textons

Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Page 3: Bag-of-features  for category recognition

Texture recognition

[Figure: textured images represented as histograms over a universal texton dictionary]

Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Page 4: Bag-of-features  for category recognition

Bag-of-features for image classification

• Origin: bag-of-words
• Orderless document representation: frequencies of words from a dictionary
• Classification to determine document categories

Page 5: Bag-of-features  for category recognition

Bag-of-features for image classification

[Pipeline: extract regions → compute descriptors → find clusters and frequencies → compute distance matrix → SVM classification]

[Nowak, Jurie & Triggs, ECCV’06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV’07]

Page 6: Bag-of-features  for category recognition

Bag-of-features for image classification

[Pipeline: Step 1: extract regions, compute descriptors; Step 2: find clusters and frequencies; Step 3: compute distance matrix, SVM classification]

[Nowak, Jurie & Triggs, ECCV’06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV’07]

Page 7: Bag-of-features  for category recognition

Bag-of-features for image classification

• Excellent results in the presence of background clutter

[Example images: bikes, books, buildings, cars, people, phones, trees]

Page 8: Bag-of-features  for category recognition

Examples of misclassified images

• Books misclassified as faces, faces, buildings
• Buildings misclassified as faces, trees, trees
• Cars misclassified as buildings, phones, phones

Page 9: Bag-of-features  for category recognition

Bag-of-features for image classification

[Pipeline: Step 1: extract regions, compute descriptors; Step 2: find clusters and frequencies; Step 3: compute distance matrix, SVM classification]

[Nowak, Jurie & Triggs, ECCV’06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV’07]

Page 10: Bag-of-features  for category recognition

Step 1: feature extraction

• Scale-invariant image regions + SIFT
  – Selection of characteristic points

[Detectors: Harris-Laplace, Laplacian]

Page 11: Bag-of-features  for category recognition

Step 1: feature extraction

• Scale-invariant image regions + SIFT
  – Robust description of the extracted image regions

• SIFT [Lowe’99]
  – 8 orientations of the gradient
  – 4×4 spatial grid

[Figure: gradients of an image patch accumulated into a 3D histogram over x, y and orientation]
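
To make the two numbers above concrete, here is a toy NumPy sketch of a SIFT-style descriptor (8 orientation bins over a 4×4 grid, hence 128 dimensions). It is illustrative only: it omits real SIFT's Gaussian weighting and trilinear interpolation, and the function name is our own.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy SIFT-style descriptor: gradient magnitudes accumulated into
    8 orientation bins over a 4x4 spatial grid -> 128-dim vector.
    Omits real SIFT's Gaussian weighting and trilinear interpolation."""
    gy, gx = np.gradient(patch.astype(float))        # image gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)           # orientation in [0, 2*pi)
    obin = (ang * 8 / (2 * np.pi)).astype(int) % 8   # 8 orientation bins
    h, w = patch.shape                               # grayscale patch assumed
    desc = np.zeros((4, 4, 8))
    for r in range(h):
        for c in range(w):
            desc[min(4 * r // h, 3), min(4 * c // w, 3), obin[r, c]] += mag[r, c]
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-10)     # normalize
```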

Page 12: Bag-of-features  for category recognition

Step 1: feature extraction

• Scale-invariant image regions + SIFT
  – Selection of characteristic points
  – Robust description of these characteristic points
  – Affine invariant regions give “too” much invariance
  – Rotation invariance in many cases “too” much invariance

Page 13: Bag-of-features  for category recognition

Step 1: feature extraction

• Scale-invariant image regions + SIFT
  – Selection of characteristic points
  – Robust description of these characteristic points
  – Affine invariant regions give “too” much invariance
  – Rotation invariance in many cases “too” much invariance

• Dense descriptors
  – Improve results in the context of categories (for most categories)
  – Interest points do not necessarily capture “all” features

Page 14: Bag-of-features  for category recognition

Dense features

- Multi-scale dense grid: extraction of small overlapping patches at multiple scales

- Computation of the SIFT descriptor for each grid cell
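
A minimal sketch of the multi-scale dense grid; the patch sizes and step are illustrative assumptions, not values from the slides.

```python
import numpy as np

def dense_grid_patches(image, sizes=(16, 24, 32), step=8):
    """Yield small overlapping square patches on a regular grid at several
    scales; a descriptor (e.g. SIFT) is then computed for each patch."""
    h, w = image.shape                     # grayscale image assumed
    for size in sizes:
        for y in range(0, h - size + 1, step):
            for x in range(0, w - size + 1, step):
                yield image[y:y + size, x:x + size]
```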

Page 15: Bag-of-features  for category recognition

Step 1: feature extraction

• Scale-invariant image regions + SIFT (see lecture 2)

• Dense descriptors

• Color-based descriptors

• Shape-based descriptors

Page 16: Bag-of-features  for category recognition

Bag-of-features for image classification

[Pipeline: Step 1: extract regions, compute descriptors; Step 2: find clusters and frequencies; Step 3: compute distance matrix, SVM classification]

Page 17: Bag-of-features  for category recognition

Step 2: Quantization

Page 18: Bag-of-features  for category recognition

Step 2: Quantization

Clustering

Page 19: Bag-of-features  for category recognition

Step 2: Quantization

Clustering

Visual vocabulary

Page 20: Bag-of-features  for category recognition

Examples of visual words

Airplanes

Motorbikes

Faces

Wild Cats

Leaves

People

Bikes

Page 21: Bag-of-features  for category recognition

Step 2: Quantization

• Cluster descriptors
  – k-means
  – Gaussian mixture model

• Assign each descriptor to a cluster (visual word)
  – Hard or soft assignment

• Build the frequency histogram

Page 22: Bag-of-features  for category recognition

K-means clustering

• We want to minimize the sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers
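
Written out as an objective (a standard formulation of the sentence above, with cluster centers $m_k$ and $C_k$ the set of points assigned to center $k$):

$$\min_{m_1, \dots, m_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - m_k \rVert^2$$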

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
  – Assign each data point to the nearest center
  – Recompute each cluster center as the mean of all points assigned to it
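
A minimal NumPy sketch of this algorithm; the function name and defaults are our own, not from the slides.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """X: (n, d) array of descriptors. Returns (K, d) centers and the
    (n,) index of the nearest center for each point."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random init
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):       # converged
            break
        centers = new
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```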

Page 23: Bag-of-features  for category recognition

K-means clustering

• Local minimum: the solution depends on the initialization

• Initialization is important; run k-means several times and select the best solution (minimum cost)

Page 24: Bag-of-features  for category recognition

From clustering to vector quantization

• Clustering is a common method for learning a visual vocabulary or codebook
  – Unsupervised learning process
  – Each cluster center produced by k-means becomes a codevector
  – Provided the training set is sufficiently representative, the codebook will be “universal”

• The codebook is used for quantizing features
  – A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook
  – Codebook = visual vocabulary
  – Codevector = visual word
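
A short sketch of the quantization step just described, mapping each descriptor to its nearest codevector and building the normalized frequency histogram; the function name is our own.

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """Vector-quantize descriptors against the codebook and build the
    normalized frequency histogram over visual words."""
    # Index of the nearest codevector for each descriptor.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```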

Page 25: Bag-of-features  for category recognition

Visual vocabularies: Issues

• How to choose vocabulary size?
  – Too small: visual words not representative of all patches
  – Too large: quantization artifacts, overfitting

• Computational efficiency
  – Vocabulary trees (Nister & Stewenius, 2006)

• Soft quantization: Gaussian mixture instead of k-means

Page 26: Bag-of-features  for category recognition

Hard or soft assignment

• K-means: hard assignment
  – Assign each descriptor to the closest cluster center
  – Count the number of descriptors assigned to each center

• Gaussian mixture model: soft assignment
  – Estimate the distance to all centers
  – Sum the soft assignments over all descriptors

• Build the frequency histogram
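
A sketch of the soft-assignment variant using scikit-learn's GaussianMixture; the 100-component vocabulary size and the variable names in the usage comments are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def soft_bof(descriptors, gmm):
    """Soft-assignment histogram: every descriptor spreads its mass over
    all mixture components according to its posterior probabilities,
    and the contributions are summed over descriptors."""
    posteriors = gmm.predict_proba(descriptors)   # shape (n_descriptors, K)
    hist = posteriors.sum(axis=0)                 # sum over descriptors
    return hist / hist.sum()

# Hypothetical usage: fit a 100-component vocabulary on training descriptors.
# gmm = GaussianMixture(n_components=100).fit(training_descriptors)
# h = soft_bof(image_descriptors, gmm)
```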

Page 27: Bag-of-features  for category recognition

Image representation

[Figure: bag-of-features histogram; x-axis: codewords, y-axis: frequency]

Page 28: Bag-of-features  for category recognition

Bag-of-features for image classification

[Pipeline: Step 1: extract regions, compute descriptors; Step 2: find clusters and frequencies; Step 3: compute distance matrix, SVM classification]

Page 29: Bag-of-features  for category recognition

Step 3: Classification

• Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes

[Figure: bag-of-features vectors of zebra and non-zebra images separated by a decision boundary]

Page 30: Bag-of-features  for category recognition

Classification

• Assign an input vector to one of two or more classes
• Any decision rule divides the input space into decision regions separated by decision boundaries

Page 31: Bag-of-features  for category recognition

Nearest Neighbor Classifier

• Assign label of nearest training data point to each test data point

[Figure: Voronoi partitioning of feature space for 2-category 2-D and 3-D data, from Duda et al.]

Source: D. Lowe

Page 32: Bag-of-features  for category recognition

K-Nearest Neighbors

• For a new point, find the k closest points from the training data
• Labels of the k points “vote” to classify
• Works well provided there is lots of data and the distance function is good

[Figure: k-NN classification with k = 5]

Source: D. Lowe
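
A minimal k-NN classifier matching the description above; names are our own.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=5):
    """Classify x by a majority vote of its k nearest training points."""
    d2 = ((X_train - x) ** 2).sum(axis=1)   # squared Euclidean distances
    nearest = np.argsort(d2)[:k]            # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```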

Page 33: Bag-of-features  for category recognition

Linear classifiers

• Find a linear function (hyperplane) to separate positive and negative examples:

$\mathbf{x}_i$ positive: $\mathbf{w} \cdot \mathbf{x}_i + b \geq 0$
$\mathbf{x}_i$ negative: $\mathbf{w} \cdot \mathbf{x}_i + b < 0$

Which hyperplane is best? The SVM (support vector machine) picks the maximum-margin one.
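
As a concrete baseline, a linear SVM on bag-of-features histograms via scikit-learn; the variable names in the usage comments are placeholders, and later slides replace the linear kernel with histogram-based kernels.

```python
from sklearn.svm import LinearSVC

def train_linear_svm(X_train, y_train, C=1.0):
    """Fit a linear SVM on bag-of-features histograms; the learned
    decision rule is sign(w . x + b)."""
    return LinearSVC(C=C).fit(X_train, y_train)

# Hypothetical usage:
# clf = train_linear_svm(hists_train, labels_train)
# scores = clf.decision_function(hists_test)   # values of w . x + b
# predicted = clf.predict(hists_test)
```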

Page 34: Bag-of-features  for category recognition

Functions for comparing histograms

• L1 distance:

$D(h_1, h_2) = \sum_{i=1}^{N} \big|h_1(i) - h_2(i)\big|$

• χ² distance:

$D(h_1, h_2) = \sum_{i=1}^{N} \frac{\big(h_1(i) - h_2(i)\big)^2}{h_1(i) + h_2(i)}$

• Quadratic distance (cross-bin):

$D(h_1, h_2) = \sum_{i,j} A_{ij} \big(h_1(i) - h_2(j)\big)^2$
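
Direct NumPy translations of the three distances; the `eps` guard against empty bins is our addition.

```python
import numpy as np

def l1_dist(h1, h2):
    # D(h1, h2) = sum_i |h1(i) - h2(i)|
    return np.abs(h1 - h2).sum()

def chi2_dist(h1, h2, eps=1e-10):
    # D(h1, h2) = sum_i (h1(i) - h2(i))^2 / (h1(i) + h2(i))
    return ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum()

def quadratic_dist(h1, h2, A):
    # D(h1, h2) = sum_ij A_ij (h1(i) - h2(j))^2; A encodes cross-bin similarity
    diff = h1[:, None] - h2[None, :]
    return (A * diff ** 2).sum()
```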

Page 35: Bag-of-features  for category recognition

Kernels for bags of features

• Histogram intersection kernel:

$I(h_1, h_2) = \sum_{i=1}^{N} \min\big(h_1(i), h_2(i)\big)$

• Generalized Gaussian kernel:

$K(h_1, h_2) = \exp\Big(-\frac{1}{A} D(h_1, h_2)^2\Big)$

• $D$ can be the Euclidean distance, the χ² distance, the Earth Mover’s Distance, etc.
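
Sketches of the two kernels; the distance function and the scaling parameter A are passed in, and the function names are our own.

```python
import numpy as np

def hist_intersection(h1, h2):
    # I(h1, h2) = sum_i min(h1(i), h2(i))
    return np.minimum(h1, h2).sum()

def generalized_gaussian_kernel(h1, h2, dist, A):
    # K(h1, h2) = exp(-(1/A) * D(h1, h2)^2); dist can be the Euclidean
    # distance, the chi-square distance, the Earth Mover's Distance, ...
    return np.exp(-dist(h1, h2) ** 2 / A)
```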

Page 36: Bag-of-features  for category recognition

Chi-square kernel

• Multi-channel chi-square kernel:

$K(H_i, H_j) = \exp\Big(-\sum_{c} \frac{1}{A_c} D_c(H_i, H_j)\Big)$

  – Channel $c$ is a combination of detector and descriptor
  – $D_c(H_i, H_j)$ is the chi-square distance between histograms:

    $D_c(H_1, H_2) = \frac{1}{2} \sum_{i=1}^{m} \frac{\big[h_1(i) - h_2(i)\big]^2}{h_1(i) + h_2(i)}$

  – $A_c$ is the mean value of the distances between all training samples
  – Extension: learning of the weights, for example with MKL (multiple kernel learning)
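
A sketch of the multi-channel kernel as a precomputed Gram matrix for an SVM. Setting each A_c to the mean of the full distance matrix (including the zero diagonal) is a simplification, and the channel matrices in the usage comments are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def multichannel_chi2_kernel(channel_dists):
    """channel_dists: list of (n, n) chi-square distance matrices, one per
    channel (detector/descriptor combination). Each A_c is set to the mean
    of that channel's distance matrix."""
    s = np.zeros_like(channel_dists[0])
    for D in channel_dists:
        s += D / D.mean()    # (1 / A_c) * D_c
    return np.exp(-s)

# Hypothetical usage with two precomputed channels and labels y:
# K_train = multichannel_chi2_kernel([D_sift, D_color])
# clf = SVC(kernel="precomputed").fit(K_train, y)
```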

Page 37: Bag-of-features  for category recognition

Pyramid match kernel

• Weighted sum of histogram intersections at multiple resolutions (linear in the number of features instead of cubic)

• Approximates the optimal partial matching between sets of features

Page 38: Bag-of-features  for category recognition

Pyramid Match

[Figure: two feature sets matched via histogram intersection at one pyramid level]

Page 39: Bag-of-features  for category recognition

Pyramid Match

• The difference in histogram intersections across levels counts the number of new pairs matched:

$N_i = I\big(H_i(X), H_i(Y)\big) - I\big(H_{i-1}(X), H_{i-1}(Y)\big)$

(matches at this level minus matches at the previous level)

Page 40: Bag-of-features  for category recognition

Pyramid match kernel

• Weights inversely proportional to bin size, measuring the difficulty of a match at level $i$: $w_i = 1/2^i$

• Kernel over two histogram pyramids: weighted sum of the number of newly matched pairs at each level, $K_\Delta = \sum_i w_i N_i$

• Normalize kernel values to avoid favoring large sets

Page 41: Bag-of-features  for category recognition

Example pyramid match: Level 0

Page 42: Bag-of-features  for category recognition

Example pyramid match: Level 1

Page 43: Bag-of-features  for category recognition

Example pyramid match: Level 2

Page 44: Bag-of-features  for category recognition

Example pyramid match

[Figure: pyramid match score compared to the optimal match]

Page 45: Bag-of-features  for category recognition

Summary: Pyramid match kernel

• Approximates the optimal partial matching between sets of features

• $N_i$: number of new matches at level $i$; $w_i$: difficulty of a match at level $i$

$K_\Delta(X, Y) = \sum_i w_i N_i$
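
A 1-D toy sketch of the pyramid match kernel. The real kernel bins a multi-dimensional feature space and normalizes the result; the value range and level count here are assumptions.

```python
import numpy as np

def pyramid_match(X, Y, L=4, lo=0.0, hi=1.0):
    """Pyramid match kernel sketch for 1-D feature sets X, Y with values in
    [lo, hi]. Level 0 has the finest bins; bin size doubles at each level,
    and the N_i new matches at level i get weight w_i = 1/2**i.
    The set-size normalization is omitted for brevity."""
    k, prev = 0.0, 0
    n_bins = 2 ** L
    for i in range(L + 1):
        edges = np.linspace(lo, hi, n_bins + 1)
        hx, _ = np.histogram(X, bins=edges)
        hy, _ = np.histogram(Y, bins=edges)
        inter = np.minimum(hx, hy).sum()   # matches at this level
        k += (inter - prev) / 2 ** i       # newly matched pairs, weighted
        prev = inter
        n_bins //= 2                       # bins double in size
    return k
```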

Page 46: Bag-of-features  for category recognition

Spatial pyramid matching

• Add spatial information to the bag-of-features

• Perform matching in 2D image space

[Lazebnik, Schmid & Ponce, CVPR 2006]

Page 47: Bag-of-features  for category recognition

Related work

Similar approaches:

• Subblock description [Szummer & Picard, 1997]
• SIFT [Lowe, 1999]
• GIST [Torralba et al., 2003]

Page 48: Bag-of-features  for category recognition

Spatial pyramid representation

Locally orderless representation at several levels of spatial resolution

[Grid: level 0]

Page 49: Bag-of-features  for category recognition

Spatial pyramid representation

Locally orderless representation at several levels of spatial resolution

[Grids: level 0, level 1]

Page 50: Bag-of-features  for category recognition

Spatial pyramid representation

Locally orderless representation at several levels of spatial resolution

[Grids: level 0, level 1, level 2]

Page 51: Bag-of-features  for category recognition

Spatial pyramid matching

• Combination of spatial levels with the pyramid match kernel [Grauman & Darrell’05]
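
A sketch of the spatial pyramid representation: per-cell visual-word histograms concatenated over levels, with the level weights from Lazebnik et al. The function name and the normalization of positions to [0, 1) are our choices.

```python
import numpy as np

def spatial_pyramid(points, words, K, L=2):
    """Spatial pyramid sketch: concatenate visual-word histograms computed
    over 2^l x 2^l image grids for levels l = 0..L.
    points: (n, 2) feature positions normalized to [0, 1); words: (n,)
    visual-word indices; K: vocabulary size. Level weights follow
    Lazebnik et al.: 1/2^L at level 0, 1/2^(L-l+1) at level l >= 1."""
    feats = []
    for l in range(L + 1):
        cells = 2 ** l
        weight = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)
        cx = np.minimum((points[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((points[:, 1] * cells).astype(int), cells - 1)
        for ix in range(cells):
            for iy in range(cells):
                in_cell = (cx == ix) & (cy == iy)
                feats.append(weight * np.bincount(words[in_cell], minlength=K))
    return np.concatenate(feats)
```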

Page 52: Bag-of-features  for category recognition

Scene classification

Classification accuracy (%):

L          Single-level    Pyramid
0 (1×1)    72.2 ±0.6       -
1 (2×2)    77.9 ±0.6       79.0 ±0.5
2 (4×4)    79.4 ±0.3       81.1 ±0.3
3 (8×8)    77.2 ±0.4       80.7 ±0.3

Page 53: Bag-of-features  for category recognition

Retrieval examples

Page 54: Bag-of-features  for category recognition

Category classification – CalTech101

Classification accuracy (%):

L          Single-level    Pyramid
0 (1×1)    41.2 ±1.2       -
1 (2×2)    55.9 ±0.9       57.0 ±0.8
2 (4×4)    63.6 ±0.9       64.6 ±0.8
3 (8×8)    60.3 ±0.9       64.6 ±0.7

Bag-of-features approach by Zhang et al.’07: 54%

Page 55: Bag-of-features  for category recognition

CalTech101

Easiest and hardest classes

• Sources of difficulty:
  – Lack of texture
  – Camouflage
  – Thin, articulated limbs
  – Highly deformable shape

Page 56: Bag-of-features  for category recognition

Discussion

• Summary
  – Spatial pyramid representation: appearance of local image patches + coarse global position information
  – Substantial improvement over bag of features
  – Depends on the similarity of image layout

• Extensions
  – Integrating different types of features, learning weights, use of different grids [Zhang’07, Bosch & Zisserman’07, Varma et al.’07, Marszalek et al.’07]
  – Flexible, object-centered grid

Page 57: Bag-of-features  for category recognition

Evaluation of image classification

• Image classification task: PASCAL VOC 2007-2009

• Precision-recall curves for evaluation

• Mean average precision (mAP) as a summary measure
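
A simplified average-precision sketch: the mean of the precision values at the positive ranks. The official VOC protocol uses an interpolated variant, so treat this as illustrative.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean of the precision values measured at each
    positive example, with examples ranked by decreasing score.
    scores: (n,) classifier scores; labels: (n,) binary ground truth."""
    order = np.argsort(-scores)
    ranked = np.asarray(labels)[order]
    precision = np.cumsum(ranked) / np.arange(1, len(ranked) + 1)
    return precision[ranked.astype(bool)].mean()

# Mean average precision: the mean of the per-class APs, e.g.
# mAP = np.mean([average_precision(s, y) for s, y in per_class_results])
```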