Object Recognition from Local Scale-Invariant Features (people.cs.pitt.edu/~kovashka/cs3710_sp15/features_yan.pdf)

Department of Electrical & Computer Engineering

Object Recognition from Local Scale-Invariant Features

David G. Lowe

Presented by: Yan Fang


What are Local Features?

• An image pattern that differs from its immediate neighborhood

• Associated with a change in an image property

• Used to build image descriptors


Why Local Features?

• Specific semantic interpretation in the limited context of a certain application

- Edge -> road

- Blob -> impurities

• A limited set of well-localized and individually identifiable anchor points

- Motion tracking

- 3D reconstruction

- Image alignment or mosaicing

• A robust image representation that needs no segmentation

- Scene recognition

- Object recognition

- The features need not have a semantic meaning


Some Terms

• Detector – a tool that extracts features from images

• Descriptor – a representation of an extracted feature

• Invariant – a function is invariant under a certain family of transformations if its value does not change when a transformation from this family is applied to its argument

• Local feature – ideally a location in space with no spatial extent; it can be:

- Interest points

- Regions

- Edge segments
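In symbols, the definition of invariance given above can be written as follows. The names are chosen here for illustration only: $f$ is the function (for example a descriptor), $\mathcal{T}$ is the family of transformations, and $x$ is the argument (for example an image patch).

```latex
f\bigl(T(x)\bigr) = f(x) \qquad \text{for all } T \in \mathcal{T}
```

For instance, the mean intensity of a patch is invariant under rotations of the patch, but not under changes of brightness.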


Ideal Local Features

• Repeatability: the same features are found in images of the same object or scene taken under different conditions

- Invariance

- Robustness

• Distinctiveness/informativeness: the intensity patterns underlying the detected features should show a lot of variation

• Locality: features should be local, to reduce the probability of occlusion and to allow simple model approximations of the geometric and photometric deformations

• Quantity: a sufficiently large number of features, even on small objects

• Accuracy: the detected features should be accurately localized, both in image location and with respect to scale and possibly shape

• Efficiency: detection should be fast and computationally cheap


Discussion on Local Features

• Repeatability: depends on invariance, robustness, and quantity

• Distinctiveness vs. locality

- The more local a feature, the less information it carries and the harder it is to match

- In some cases (e.g. mosaicing), locality can be sacrificed

• Distinctiveness vs. invariance

- Depends on the degrees of freedom of the transformation

• Distinctiveness vs. robustness

- Robustness is gained at the cost of information loss

- Denoising vs. detail


Compare with other features

• Global features

- Describe image content with, e.g., a color histogram

- Usage: segmentation, object recognition

- Fail to distinguish foreground from background

- Image clutter and occlusion are problems

• Image segments

- Segmentation is difficult in itself and requires much information from the image

- Search for blobs based on texture/color

• Sampled features

- Exhaustively sample subparts of the image with a sliding window

- Solves the background problem, but not partial occlusion

- Fixed-grid sampling: invariance is difficult

- Random sampling: better localization but poor repeatability; not used alone

- Sampling from edges: good for wiry objects


Corner Detector – Harris Detector

• Distinguishes “flat”, “edge”, and “corner” regions

• Auto-correlation matrix: describes the gradient distribution of a local neighborhood

- Smoothed with a Gaussian kernel

- Its two eigenvalues indicate the image signal change in two directions

- Large eigenvalues in both directions indicate a potential corner

• Measure the cornerness
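The slide leaves the cornerness formula to a figure. A common choice is the Harris and Stephens response R = det(M) − k·trace(M)², which is large and positive only when both eigenvalues of the auto-correlation matrix M are large. The sketch below assumes that formula; the function name `harris_response` and the constant k = 0.05 are illustrative choices, not values taken from the slides.

```python
def harris_response(sxx, syy, sxy, k=0.05):
    """Cornerness R = det(M) - k * trace(M)^2 for M = [[sxx, sxy], [sxy, syy]].

    sxx, syy, sxy are the Gaussian-smoothed gradient products
    (Ix*Ix, Iy*Iy, Ix*Iy) at one pixel. k = 0.05 is an assumed, typical value.
    """
    det = sxx * syy - sxy * sxy   # product of the two eigenvalues
    trace = sxx + syy             # sum of the two eigenvalues
    return det - k * trace * trace


# Flat: both eigenvalues small. Edge: one large. Corner: both large.
flat = harris_response(0.01, 0.01, 0.0)
edge = harris_response(100.0, 0.01, 0.0)
corner = harris_response(100.0, 100.0, 0.0)
```

With these inputs the edge response is negative and the corner response is large and positive, which is what lets a single threshold separate the three cases.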


Corner Detector – Harris Detector

For interest point detection, extract the local maxima of the cornerness function using non-maximum suppression
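Non-maximum suppression can be sketched as a scan that keeps a pixel only if its cornerness beats all of its neighbors. The 3x3 window, the threshold parameter, and the name `local_maxima` are assumptions made for this example.

```python
def local_maxima(response, threshold=0.0):
    """Return (row, col) of interior pixels that exceed `threshold` and are
    strictly greater than all 8 neighbors (3x3 non-maximum suppression)."""
    rows, cols = len(response), len(response[0])
    peaks = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            value = response[r][c]
            if value <= threshold:
                continue
            neighbors = [
                response[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            if value > max(neighbors):
                peaks.append((r, c))
    return peaks


cornerness = [
    [0, 0, 0, 0, 0],
    [0, 5, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 0, 0, 0],
]
peaks = local_maxima(cornerness, threshold=2)  # [(1, 1), (3, 3)]
```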


Example of Harris Detector

Results on rotated example images

Note that T-junctions are also detected, in addition to true corners


Select Feature Detector

• Select feature detectors based on image content and category

• Do not use more invariance than needed; note the tradeoff between invariance and discriminative power

• Consider other properties depending on the application scenario

- Localization accuracy for camera calibration or 3D modeling

- Efficiency for large datasets


Introduction to SIFT

• Problem: object recognition in cluttered real-world scenes

• Challenge & difficulty: finding image features that are robust to object variation

• Proposed method: Scale Invariant Feature Transform (SIFT)


Invariance

• Illumination

• Scale

• Rotation

• Affine


Previous Work

• Candidate feature types

– line segments

– groupings of edges

– regions

• Zhang et al.

– Harris corner detection

– Detect peaks in local image variation

• Schmid and Mohr

– Harris corner detection for interest points

– Orientation-invariant vector of derivative-of-Gaussian image measurements


Motivation & Improvement

Limitations of related work:

• Examines the image at only a single scale

• Difficult to extend to other circumstances

• Focuses on feature detection and overlooks the descriptor

This work:

• Identifies key locations in scale space

• Selects feature vectors invariant to scaling, stretching, rotation, and other variations

• Improves the feature descriptor

• Efficient: less than 2 seconds, even with clutter and occlusion


Stages of SIFT Object Recognition

• Feature Detection

• Local Image Description

• Indexing and Matching

• Model Verification


Scale Space

The proper scale of objects in a new image is unknown.

Exploring features at different scales helps to recognize different objects.


Difference of Gaussian (DoG)

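• A = Convolve the image with vertical and horizontal 1D Gaussians, $\sigma = \sqrt{2}$

• B = Convolve A with vertical and horizontal 1D Gaussians, $\sigma = \sqrt{2}$ (so B has an effective smoothing of $\sigma = 2$)

• DoG (Difference of Gaussian) = A – B

• Downsample B using bilinear interpolation with a pixel spacing of 1.5 (each new pixel is a linear combination of 4 adjacent pixels)

$D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y), \quad k = \sqrt{2}$

(The sign of $D$ is opposite to that of A – B; this does not affect the detection of maxima and minima.)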
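One level of the procedure above can be sketched with a separable Gaussian: a horizontal 1D pass followed by a vertical 1D pass. This is a minimal pure-Python sketch; the kernel truncation at about 3σ, the clamped border handling, the per-pass σ of √2 (the value used in Lowe's 1999 paper), and all function names are assumptions made for the example. The downsampling step is left out.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve every row with a 1D kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
             for j in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def smooth(image, sigma):
    """Separable 2D Gaussian: a horizontal pass, then a vertical pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))


def dog_level(image, sigma=math.sqrt(2)):
    """Return (A, B, A - B) for one pyramid level."""
    a = smooth(image, sigma)
    b = smooth(a, sigma)
    dog = [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    return a, b, dog


# A single bright pixel on a dark background.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
a, b, dog = dog_level(image)
```

At the bright pixel, A is sharper than B, so the DoG value there is positive and falls off symmetrically around it.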


Image Pyramid of DoG

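[Figure: image pyramid with levels A1/B1, A2/B2, A3/B3. Each Gaussian smoothing step is labeled G, successive levels are connected by downsampling, and the DoG pyramid is formed from A1−B1, A2−B2, A3−B3.]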


Pyramid of DoG (Octave)

[Figure: octaves of the DoG pyramid, with scale labels σ, kσ, 2σ, 2kσ, 2k²σ.]

David G. Lowe, IJCV 2004


DoG Example

[Figure: example images A1, B1, DoG1; A2, B2, DoG2; A3, B3, DoG3.]

Ashley L. Kapron


Feature Detection

• Find the maxima and minima of scale space

• For each point on a DoG level:

– Compare it to its 26 neighbors: 8 at the same level and 9 at each of the two adjacent levels

• Repeat for each DoG level

• The points that remain are the key points
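The 26-neighbor test can be sketched as below. The sketch assumes the three DoG levels have the same resolution, so that a pixel's neighbors in the adjacent levels sit at the same coordinates; when adjacent levels are resampled, the closest pixel locations would have to be used instead. Function names are illustrative.

```python
def is_extremum(below, current, above, r, c):
    """True if current[r][c] is strictly greater or strictly smaller than all
    26 neighbors: 8 in its own DoG level and 9 in each adjacent level."""
    value = current[r][c]
    neighbors = []
    for offset, level in ((-1, below), (0, current), (1, above)):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if offset == 0 and dr == 0 and dc == 0:
                    continue  # skip the candidate itself
                neighbors.append(level[r + dr][c + dc])
    return value > max(neighbors) or value < min(neighbors)


def find_extrema(dog_levels):
    """Scan every interior pixel of every interior level of a DoG stack."""
    keys = []
    for s in range(1, len(dog_levels) - 1):
        below, current, above = dog_levels[s - 1], dog_levels[s], dog_levels[s + 1]
        for r in range(1, len(current) - 1):
            for c in range(1, len(current[0]) - 1):
                if is_extremum(below, current, above, r, c):
                    keys.append((s, r, c))
    return keys


# Three 3x3 levels; the middle pixel of the middle level is a peak.
flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
keys = find_extrema([flat, peak, flat])  # [(1, 1, 1)]
```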



SIFT key stability - Illumination

• For all levels, compute

– Gradient magnitude

– $M_{ij} = \sqrt{(A_{ij} - A_{i+1,j})^2 + (A_{ij} - A_{i,j+1})^2}$

• Threshold the gradient magnitudes:

– Remove all key points with $M_{ij}$ less than 0.1 times the maximum gradient value

• Motivation: low-contrast points are generally less reliable as feature points than high-contrast ones
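The magnitude computation and the 0.1 × maximum threshold can be sketched as follows, using pixel differences to the next row and the next column. The function names and the tiny test image are made up for illustration.

```python
import math


def gradient_magnitudes(a):
    """M[i][j] from pixel differences; the last row and column are dropped
    because they have no (i+1, j) or (i, j+1) neighbor."""
    rows, cols = len(a), len(a[0])
    return [
        [math.hypot(a[i][j] - a[i + 1][j], a[i][j] - a[i][j + 1])
         for j in range(cols - 1)]
        for i in range(rows - 1)
    ]


def filter_low_contrast(keys, magnitudes, ratio=0.1):
    """Keep key points whose magnitude is at least `ratio` times the maximum."""
    limit = ratio * max(max(row) for row in magnitudes)
    return [(i, j) for (i, j) in keys if magnitudes[i][j] >= limit]


smoothed = [
    [0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 0.5],
]
m = gradient_magnitudes(smoothed)
kept = filter_low_contrast([(0, 0), (1, 1)], m)  # [(1, 1)]
```

The key at (0, 0) sits in a flat region with zero gradient and is dropped; the key at (1, 1) has the largest magnitude and survives.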


SIFT key stability - Orientation

• For all levels, compute

– Gradient orientation

– $R_{ij} = \operatorname{atan2}(A_{ij} - A_{i+1,j},\ A_{i,j+1} - A_{ij})$

[Figure: Gaussian-smoothed image, gradient orientation, and gradient magnitude.]

Ashley L. Kapron



• Gradient magnitude weighted by a 2D Gaussian

[Figure: gradient magnitude * 2D Gaussian = weighted magnitude.]

Ashley L. Kapron



• Identify peak

• Assign orientation and sum of magnitude to key point

[Figure: histogram of the sum of weighted magnitudes against gradient orientation, with the peak marked.]

Ashley L. Kapron
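The orientation assignment can be sketched as a histogram of gradient orientations around the key point, with each sample weighted by its gradient magnitude times a 2D Gaussian, followed by a search for the peak bin. The window radius, the Gaussian σ, and the 36 bins of 10° are assumptions for this example (36 bins is the count used in Lowe's paper, not stated on the slides).

```python
import math


def dominant_orientation(a, key_r, key_c, radius=3, sigma=1.5, bins=36):
    """Return (peak orientation in degrees, summed weight in the peak bin)
    for the Gaussian-weighted orientation histogram around a key point."""
    histogram = [0.0] * bins
    for i in range(key_r - radius, key_r + radius + 1):
        for j in range(key_c - radius, key_c + radius + 1):
            if not (0 <= i < len(a) - 1 and 0 <= j < len(a[0]) - 1):
                continue  # pixel differences need (i+1, j) and (i, j+1)
            d_row = a[i][j] - a[i + 1][j]
            d_col = a[i][j + 1] - a[i][j]
            magnitude = math.hypot(d_row, d_col)
            angle = math.degrees(math.atan2(d_row, d_col)) % 360.0
            dist2 = (i - key_r) ** 2 + (j - key_c) ** 2
            weight = magnitude * math.exp(-dist2 / (2 * sigma * sigma))
            histogram[int(angle // (360.0 / bins)) % bins] += weight
    peak_bin = max(range(bins), key=lambda b: histogram[b])
    return peak_bin * (360.0 / bins), histogram[peak_bin]


# Intensity increases left to right, so every gradient points along +column.
ramp = [[float(j) for j in range(8)] for _ in range(8)]
orientation, strength = dominant_orientation(ramp, 4, 4)
```

For the ramp image every sample falls into the first bin, so the assigned orientation is 0 degrees.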


Example of Key Points

[Figure: key points at three stages: maxima/minima from the DoG pyramid; after the filter for illumination; after the filter for edge orientation.]

Ashley L. Kapron


Stability Test

78% of the keys survive rotation, scaling, stretching, changes in brightness and contrast, and the addition of pixel noise.








Local Image Description

• Each SIFT key is assigned:

– Location

– Scale (corresponding to the pyramid level at which it was detected)

– Orientation (assigned in the preceding canonical orientation step)

• Now: describe the local image region in a way that is invariant to the above transformations


SIFT Key Example



For each key point:

• Identify the 8x8 neighborhood (from the DoG level at which it was detected)

• Align the orientation to the x-axis (subtract the key point's orientation)



• Calculate the gradient magnitude and orientation maps, and weight the magnitudes by a Gaussian

• Sum the weighted gradient magnitudes into the nearest orientation bin: compute one histogram for each 4x4 region, with 8 bins for gradient orientation

• This array of histograms is the image descriptor.

Ashley L. Kapron
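The descriptor steps on this slide can be sketched for a single 8x8 patch: subtract the key orientation, weight by a Gaussian, and accumulate one 8-bin histogram per 4x4 region, giving 2 × 2 × 8 = 32 values. The Gaussian σ of 4.0, the final normalization to unit length, and the function name are assumptions added for the example; the sketch also rotates only the gradient directions, not the sampling grid.

```python
import math


def describe_patch(magnitude, orientation, key_orientation, sigma=4.0):
    """Build a 2x2 array of 8-bin orientation histograms from an 8x8 patch.

    magnitude and orientation are 8x8 lists (orientation in degrees).
    Returns a flat, unit-length list of 2 * 2 * 8 = 32 values.
    """
    size, cell, bins = 8, 4, 8
    center = (size - 1) / 2.0
    descriptor = [0.0] * ((size // cell) ** 2 * bins)
    for i in range(size):
        for j in range(size):
            # Express the gradient direction relative to the key orientation.
            relative = (orientation[i][j] - key_orientation) % 360.0
            bin_index = int(relative // (360.0 / bins)) % bins
            dist2 = (i - center) ** 2 + (j - center) ** 2
            weight = magnitude[i][j] * math.exp(-dist2 / (2 * sigma * sigma))
            cell_index = (i // cell) * (size // cell) + (j // cell)
            descriptor[cell_index * bins + bin_index] += weight
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor


# Uniform patch whose gradients all point at 90 degrees; key orientation is 90.
mags = [[1.0] * 8 for _ in range(8)]
oris = [[90.0] * 8 for _ in range(8)]
vector = describe_patch(mags, oris, key_orientation=90.0)
```

Because the key orientation is subtracted first, rotating both the patch gradients and the key by the same angle leaves the vector unchanged, which is the point of the alignment step.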


Orientations Numbers









Image Matching

[Figure: database image and input image.]


Image Matching

• Find all key points in the target image

– Each key point has a 2D location, scale, and orientation, as well as an invariant descriptor vector

• For each key point, search for similar descriptor vectors in the reference image database

– A descriptor vector may match more than one reference pose in the database

– The key point “votes” for the matching pose(s)

• Use the best-bin-first algorithm for the search
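The matching step can be sketched as a nearest-neighbor search over descriptor vectors. Note that the code below swaps in an exhaustive scan with Euclidean distance as a stand-in; best-bin-first instead searches a k-d tree approximately to avoid visiting every reference vector. The labels and vectors are made-up toy data.

```python
import math


def nearest_reference(descriptor, references):
    """Return (label, distance) of the closest reference descriptor.

    references is a list of (label, vector) pairs. This exhaustive scan is a
    stand-in for best-bin-first, not an implementation of it.
    """
    best_label, best_distance = None, math.inf
    for label, vector in references:
        distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(descriptor, vector)))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


references = [
    ("mug/pose0", [1.0, 0.0, 0.0]),
    ("mug/pose1", [0.0, 1.0, 0.0]),
    ("book/pose0", [0.0, 0.0, 1.0]),
]
label, distance = nearest_reference([0.1, 0.9, 0.0], references)  # "mug/pose1"
```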


Hough Transform Clustering

• Create a 4D Hough transform (HT) space for each reference pose

1. Orientation bin = 30°

2. Scale bin = factor of 2

3. X location bin = 0.25 × reference image width

4. Y location bin = 0.25 × reference image height

• When a key point “votes” for a reference pose, count the vote in the bin that matches its estimate of location and pose

• Keep a list of the key points that vote for each bin
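The voting scheme above can be sketched with a dictionary keyed by the 4D bin index, using the bin sizes listed on the slide. The vote tuple layout (orientation in degrees, scale, x, y), the key point ids, and the function names are assumptions for this example.

```python
import math
from collections import defaultdict


def hough_bin(orientation, scale, x, y, ref_width, ref_height):
    """Quantize one predicted pose into a 4D bin index."""
    return (
        int((orientation % 360.0) // 30.0),   # 30 degree orientation bins
        math.floor(math.log2(scale)),         # factor-of-2 scale bins
        math.floor(x / (0.25 * ref_width)),   # location bins
        math.floor(y / (0.25 * ref_height)),
    )


def cluster_votes(votes, ref_width, ref_height):
    """Map each bin to the list of key point ids that voted for it."""
    table = defaultdict(list)
    for key_id, (orientation, scale, x, y) in votes:
        index = hough_bin(orientation, scale, x, y, ref_width, ref_height)
        table[index].append(key_id)
    return table


votes = [
    (0, (10.0, 1.1, 30.0, 40.0)),
    (1, (20.0, 1.5, 35.0, 45.0)),
    (2, (25.0, 1.9, 45.0, 30.0)),
    (3, (200.0, 4.5, 300.0, 10.0)),
]
table = cluster_votes(votes, ref_width=100, ref_height=100)
best_bin = max(table, key=lambda b: len(table[b]))
```

Key points 0, 1, and 2 agree on a pose and land in the same bin; key point 3 votes alone. Keeping the id list per bin is what lets the verification stage recover which matches to fit.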








Verification

• Identify the bins with the most votes (each must have at least 3)

• Using the list of key points that voted for a bin, compute the affine transformation parameters (M, T)

• Use the corresponding coordinates of the reference model (x, y) and the target image (u, v)

• If there are more than three points, solve in the least-squares sense
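The affine fit can be sketched as follows, assuming the model (u, v) = M (x, y) + T with a 2x2 matrix M and a translation T. The u and v equations each have three unknowns and share the same normal equations, so the sketch solves two 3x3 systems by Gaussian elimination. The function names and the synthetic points are illustrative only.

```python
def solve3(m, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("points are collinear; affine fit is undetermined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_affine(model_points, image_points):
    """Least-squares M (2x2) and T (length 2) with (u, v) ~ M (x, y) + T."""
    rows = [(x, y, 1.0) for x, y in model_points]
    # Normal equations (A^T A) p = A^T b, shared by the u and v equations.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in (0, 1):
        atb = [sum(r[i] * p[axis] for r, p in zip(rows, image_points))
               for i in range(3)]
        params.append(solve3(ata, atb))
    (m11, m12, tx), (m21, m22, ty) = params
    return [[m11, m12], [m21, m22]], [tx, ty]


# Synthetic check: scale by 2, then translate by (5, -3).
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
image = [(2 * x + 5, 2 * y - 3) for x, y in model]
M, T = fit_affine(model, image)
```

Three non-collinear matches give six equations for the six unknowns, which is why at least 3 points are required; any further matches over-determine the system and are averaged by the least-squares solution.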


Remove Outliers

• After applying the affine transformation to the key points, compute the difference between each calculated location and the actual location in the target image

• Each candidate must meet:

– Orientation within 15°

– Scale change within a factor of $\sqrt{2}$

– X, Y location within 0.2 × model size

• Repeat the least-squares solution until no more points are removed

• If fewer than 3 points remain, the match is rejected
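The refit-and-prune loop can be sketched as below. To keep the example short it checks only the location residual against 0.2 × model size (the orientation and scale checks are omitted), and it uses a hypothetical translation-only model, `fit_translation`, as a stand-in for the affine least-squares fit. Any fit function that returns a predictor can be passed in instead.

```python
import math


def prune_outliers(matches, fit, max_residual, min_points=3):
    """Refit until every remaining match agrees with the model.

    matches: list of ((x, y), (u, v)) model/image pairs.
    fit: callable returning a function that maps (x, y) to a predicted (u, v).
    Returns the surviving matches, or None if fewer than min_points remain.
    """
    current = list(matches)
    while len(current) >= min_points:
        predict = fit(current)
        kept = [
            (p, q) for p, q in current
            if math.hypot(predict(p)[0] - q[0], predict(p)[1] - q[1]) <= max_residual
        ]
        if len(kept) == len(current):
            return current  # no points removed: the solution is stable
        current = kept
    return None  # too few consistent points: reject the hypothesis


def fit_translation(matches):
    """Hypothetical stand-in model: the mean offset from model to image."""
    n = len(matches)
    dx = sum(q[0] - p[0] for p, q in matches) / n
    dy = sum(q[1] - p[1] for p, q in matches) / n
    return lambda p: (p[0] + dx, p[1] + dy)


model_size = 50.0
matches = [
    ((0.0, 0.0), (5.0, 5.0)),
    ((10.0, 0.0), (15.0, 5.0)),
    ((0.0, 10.0), (5.0, 15.0)),
    ((10.0, 10.0), (15.0, 15.0)),
    ((5.0, 5.0), (55.0, 10.0)),  # outlier
]
survivors = prune_outliers(matches, fit_translation, max_residual=0.2 * model_size)
```

The first fit is pulled toward the outlier, but only the outlier exceeds the residual limit; after it is removed the refit is exact and the loop stops with four matches.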


Object Recognition Example


Object Recognition Example


Pros & Cons

• Numerous keys can be generated from scale space, even for small objects

• Partial occlusion/image clutter can be dealt with

• Object models can undergo limited affine projection.

• Individual features can be matched to a large database of objects

• Robust recognition can be performed fast

• Fully affine transformations require additional steps

• The method was not evaluated on a large data set covering varied cases.


Future Work

• Deeper exploration of scale space, with octaves of incremental Gaussian filtering

• Sub-pixel localization with 3D curve fitting

• Filter edge and low contrast points

• More?


Questions?
