2011.4.14 Reporter: Fei-Fei Chen. Wide-baseline matching Object recognition Texture recognition Scene classification Robot wandering Motion tracking.

Post on 30-Dec-2015

2011.4.14
Reporter: Fei-Fei Chen

SIFT (scale-invariant feature transform)

What is Computer Vision?

Local Invariant Feature

Wide-baseline matching
Object recognition
Texture recognition
Scene classification
Robot wandering
Motion tracking
Change in illumination
3D camera viewpoint
etc.

Applications

Object recognition

3D object recognition

Image retrieval (1/3)

…> 5000 images

change in viewing angle

Image retrieval (2/3)

22 correct matches

Image retrieval (3/3)

…> 5000 images

change in viewing angle + scale change

Automatic image stitching (1/2)

Automatic image stitching (2/2)

Motivation: Matching Problem

Find corresponding features across two or more views.

Elements to be matched are image patches of fixed size

Task: Find the best (most similar) patch in a second image.
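A minimal sketch of the patch-matching task, assuming sum of squared differences (SSD) as the similarity measure. The slides do not name a measure, so SSD and the helper names `ssd` and `best_match` are illustrative choices only.

```python
def ssd(patch_a, patch_b):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))


def best_match(patch, image):
    """Slide a fixed-size patch over the image; return ((row, col), cost)
    of the window with the lowest SSD (SSD is an assumed measure)."""
    ph, pw = len(patch), len(patch[0])
    best_pos, best_cost = None, float("inf")
    for r in range(len(image) - ph + 1):
        for c in range(len(image[0]) - pw + 1):
            window = [row[c:c + pw] for row in image[r:r + ph]]
            cost = ssd(patch, window)
            if cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


# Toy second image containing one distinctive 2x2 structure.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 1, 0, 0],
    [0, 2, 8, 0, 0],
    [0, 0, 0, 0, 0],
]
patch = [[9, 1], [2, 8]]
print(best_match(patch, image))
```

An all-zero patch would tie at many positions in this image, which is the sense in which a flat patch is not distinctive.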

Motivation: Patch Matching

Intuition: This would be a good patch for matching, since it is very distinctive.

Not all patches are created equal

Intuition: This would be a BAD patch for matching, since it is not very distinctive.

Not all patches are created equal

Intuitively, junctions of contours.
Generally more stable features over changes of viewpoint.
Intuitively, large variations in the neighborhood of the point in all directions.
They are good features to match!

What are corners?

SIFT

1. Detection of scale-space extrema (detector)
2. Accurate keypoint localization (detector)
3. Orientation assignment (detector)
4. Keypoint descriptor (descriptor)

For scale invariance, search for stable features across all possible scales using a continuous function of scale, known as scale space.

SIFT uses the DoG filter for scale space because it is efficient and about as stable as the scale-normalized Laplacian of Gaussian.

1. Detection of scale-space extrema

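DoG filtering

Convolution with a variable-scale Gaussian:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$$

Difference-of-Gaussian (DoG) filter:

$$G(x, y, k\sigma) - G(x, y, \sigma)$$

Convolution with the DoG filter:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$

The scale σ doubles for the next octave, with s intervals per octave:

$$k = 2^{1/s}$$

Dividing into octaves is for efficiency only.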
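A small sketch of the scale schedule implied by k = 2^(1/s), together with the value of the DoG filter at one point. The base scale of 1.6 and the choice of s + 3 blur levels per octave follow Lowe's paper rather than these slides, and the function names are hypothetical.

```python
import math


def octave_sigmas(sigma0, s):
    """Blur levels of one octave: sigma0 * k**i with k = 2**(1/s)."""
    k = 2.0 ** (1.0 / s)
    # s + 3 blurred images give s + 2 DoG images, so s of them have a
    # neighbor above and below (count taken from Lowe's paper, an assumption here).
    return [sigma0 * k ** i for i in range(s + 3)]


def gaussian(x, y, sigma):
    """2D Gaussian G(x, y, sigma)."""
    norm = 2.0 * math.pi * sigma * sigma
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / norm


def dog(x, y, sigma, k):
    """DoG filter value G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)


sigmas = octave_sigmas(1.6, 3)  # sigma0 = 1.6 is an assumed base scale
print([round(v, 3) for v in sigmas])
print(dog(0, 0, sigmas[0], 2.0 ** (1.0 / 3)))
```

After s steps the blur has doubled, which is where the next octave starts on a half-resolution image.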

Detection of scale-space extrema

Keypoint localization

X is selected if it is larger or smaller than all 26 neighbors (8 in its own DoG image and 9 in each of the two adjacent scales)
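The 26-neighbor test can be sketched as follows, assuming three adjacent DoG images stored as nested lists. The function name and argument layout are illustrative.

```python
def is_extremum(below, same, above, r, c):
    """True if same[r][c] is strictly larger or strictly smaller than all
    26 neighbors: 8 in its own DoG image and 9 in each adjacent scale."""
    center = same[r][c]
    neighbors = []
    for idx, level in enumerate((below, same, above)):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if idx == 1 and dr == 0 and dc == 0:
                    continue  # skip the center sample itself
                neighbors.append(level[r + dr][c + dc])
    assert len(neighbors) == 26
    return (all(center > n for n in neighbors)
            or all(center < n for n in neighbors))


flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
print(is_extremum(flat, peak, flat, 1, 1))
print(is_extremum(flat, flat, flat, 1, 1))
```

Ties are rejected here because the comparison is strict; that is a design choice of this sketch.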

2. Accurate keypoint localization

Reject (1) points with low contrast (flat)
(2) points poorly localized along an edge (edge)

Fit a 3D quadratic function for sub-pixel maxima

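Samples: f(-1) = 1, f(0) = 6, f(+1) = 5

$$f(x) = 6 + 2x + \frac{-6}{2}x^2 = 6 + 2x - 3x^2$$

$$f'(x) = 2 - 6x = 0 \quad\Rightarrow\quad \hat{x} = \frac{1}{3}$$

$$f(\hat{x}) = 6 + 2\cdot\frac{1}{3} - 3\cdot\left(\frac{1}{3}\right)^2 = 6\frac{1}{3}$$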
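The one-dimensional case of the quadratic fit can be checked numerically. This sketch assumes three samples at x = -1, 0, +1 with values 1, 6, 5, estimates the derivatives by finite differences, and solves for the sub-sample maximum. The function name is hypothetical.

```python
def refine_1d(f_minus, f_zero, f_plus):
    """Fit f(x) = f(0) + f'(0) x + 0.5 f''(0) x^2 through samples at
    x = -1, 0, +1 and return (offset, value) of its extremum."""
    d1 = (f_plus - f_minus) / 2.0           # first derivative at 0
    d2 = f_plus - 2.0 * f_zero + f_minus    # second derivative at 0
    offset = -d1 / d2                       # where f'(x) = 0
    value = f_zero + d1 * offset + 0.5 * d2 * offset ** 2
    return offset, value


offset, value = refine_1d(1.0, 6.0, 5.0)
print(offset, value)
```

With these samples the derivatives are f'(0) = 2 and f''(0) = -6, so the fitted maximum sits at x = 1/3 with value 6 1/3.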

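2. Accurate keypoint localization

Taylor series of several variables

Two variables:

$$f(x, y) \approx f(0, 0) + \left(\frac{\partial f}{\partial x}x + \frac{\partial f}{\partial y}y\right) + \frac{1}{2}\left(\frac{\partial^2 f}{\partial x\,\partial x}x^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}xy + \frac{\partial^2 f}{\partial y\,\partial y}y^2\right)$$

$$f\!\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) \approx f\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}\right) + \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} \frac{\partial^2 f}{\partial x\,\partial x} & \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial x\,\partial y} & \frac{\partial^2 f}{\partial y\,\partial y} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

$$f(\mathbf{x}) \approx f(\mathbf{0}) + \frac{\partial f}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2}\mathbf{x}^T \frac{\partial^2 f}{\partial \mathbf{x}^2}\mathbf{x}$$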

2. Accurate keypoint localization

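Taylor expansion in matrix form, where x is a vector and f maps x to a scalar

Gradient:

$$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

Hessian matrix (often symmetric):

$$\frac{\partial^2 f}{\partial \mathbf{x}^2} = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \frac{\partial^2 f}{\partial x_2\,\partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2\,\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\,\partial x_1} & \frac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$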

2D illustration

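Derivation of matrix form

$$f(\mathbf{x}) \approx f(\mathbf{0}) + \frac{\partial f}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2}\mathbf{x}^T \frac{\partial^2 f}{\partial \mathbf{x}^2}\mathbf{x}$$

$$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial f}{\partial \mathbf{x}} + \frac{1}{2}\left(\frac{\partial^2 f}{\partial \mathbf{x}^2} + \frac{\partial^2 f}{\partial \mathbf{x}^2}^T\right)\mathbf{x} = \frac{\partial f}{\partial \mathbf{x}} + \frac{\partial^2 f}{\partial \mathbf{x}^2}\mathbf{x}$$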

2. Accurate keypoint localization

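x is a 3-vector, $\mathbf{x} = (x, y, \sigma)^T$. Setting the derivative of the Taylor expansion of the DoG function D to zero gives the offset

$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}}$$

Remove the sample point if the offset is larger than 0.5 in any dimension (the extremum then lies closer to a different sample point)

Throw out low-contrast points ($|D(\hat{\mathbf{x}})| < 0.03$)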
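A sketch of the 3D refinement step under stated assumptions: the DoG values around a candidate are given as a 3x3x3 cube indexed `cube[scale][row][col]`, derivatives are taken by finite differences, and the offset is the solution of H x = -g. The interpolated value D + 0.5 g·x and the premise that 0.03 refers to pixel values scaled to [0, 1] come from Lowe's paper. All names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b by Cramer's rule; None if singular."""
    d = det3(a)
    if abs(d) < 1e-12:
        return None
    xs = []
    for col in range(3):
        m = [row[:] for row in a]
        for row in range(3):
            m[row][col] = b[row]
        xs.append(det3(m) / d)
    return xs


def refine_keypoint(cube, max_offset=0.5, min_contrast=0.03):
    """Return ((dx, dy, ds), value) for the center of a 3x3x3 DoG cube,
    or None if the candidate is rejected."""
    c = cube[1][1][1]
    # Gradient g by central differences, ordered (x, y, scale).
    g = [(cube[1][1][2] - cube[1][1][0]) / 2.0,
         (cube[1][2][1] - cube[1][0][1]) / 2.0,
         (cube[2][1][1] - cube[0][1][1]) / 2.0]
    # Hessian H by second differences.
    dxx = cube[1][1][2] - 2.0 * c + cube[1][1][0]
    dyy = cube[1][2][1] - 2.0 * c + cube[1][0][1]
    dss = cube[2][1][1] - 2.0 * c + cube[0][1][1]
    dxy = (cube[1][2][2] - cube[1][2][0] - cube[1][0][2] + cube[1][0][0]) / 4.0
    dxs = (cube[2][1][2] - cube[2][1][0] - cube[0][1][2] + cube[0][1][0]) / 4.0
    dys = (cube[2][2][1] - cube[2][0][1] - cube[0][2][1] + cube[0][0][1]) / 4.0
    h = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(h, [-gi for gi in g])   # x_hat = -H^{-1} g
    if offset is None or any(abs(o) > max_offset for o in offset):
        return None                         # degenerate, or nearer another sample
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    if abs(value) < min_contrast:
        return None                         # low contrast
    return offset, value


def quad(x, y, s):
    """Synthetic DoG with its maximum 0.5 at (0.2, -0.1, 0.3)."""
    return 0.5 - 0.1 * ((x - 0.2) ** 2 + (y + 0.1) ** 2 + (s - 0.3) ** 2)


cube = [[[quad(x, y, s) for x in (-1, 0, 1)]
         for y in (-1, 0, 1)] for s in (-1, 0, 1)]
print(refine_keypoint(cube))
```

Because the synthetic function is exactly quadratic, the finite differences are exact and the sketch recovers the planted maximum.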

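Eliminating edge responses

Hessian matrix at keypoint location:

$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$

$$\mathrm{Tr}(\mathbf{H}) = D_{xx} + D_{yy} = \alpha + \beta, \qquad \mathrm{Det}(\mathbf{H}) = D_{xx}D_{yy} - D_{xy}^2 = \alpha\beta$$

Let $\alpha = r\beta$, where α and β are the larger and smaller eigenvalues of H:

$$\frac{\mathrm{Tr}(\mathbf{H})^2}{\mathrm{Det}(\mathbf{H})} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r + 1)^2}{r}$$

Keep the points with

$$\frac{\mathrm{Tr}(\mathbf{H})^2}{\mathrm{Det}(\mathbf{H})} < \frac{(r + 1)^2}{r}, \qquad r = 10$$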
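The edge test with r = 10 is usually written as Tr(H)^2 / Det(H) < (r + 1)^2 / r, where H is the 2x2 Hessian of the DoG image at the keypoint (the form given in Lowe's paper). A minimal sketch, with a hypothetical function name and the second derivatives passed in directly:

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if the ratio of principal curvatures of the
    DoG image is below r, tested without computing eigenvalues."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: not a stable extremum
    return trace * trace / det < (r + 1.0) ** 2 / r


print(passes_edge_test(1.0, 1.0, 0.0))    # blob-like: equal curvatures
print(passes_edge_test(20.0, 1.0, 0.0))   # edge-like: one curvature dominates
```

For r = 10 the threshold is 12.1, so the blob-like case (ratio 4) is kept and the edge-like case (ratio 22.05) is rejected.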

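3. Orientation assignment

By assigning a consistent orientation, the keypoint descriptor can be orientation invariant.

For a keypoint, L is the Gaussian-smoothed image with the closest scale. The gradient (Lx, Ly) at each sample gives a magnitude m and an orientation θ:

$$L_x = L(x + 1, y) - L(x - 1, y), \qquad L_y = L(x, y + 1) - L(x, y - 1)$$

$$m(x, y) = \sqrt{L_x^2 + L_y^2}, \qquad \theta(x, y) = \tan^{-1}\!\left(L_y / L_x\right)$$

These are accumulated into an orientation histogram (36 bins).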
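A sketch of the per-pixel gradient, assuming pixel differences Lx = L(x+1, y) - L(x-1, y) and Ly = L(x, y+1) - L(x, y-1) as in Lowe's paper, with `atan2` used so the orientation covers the full circle. The image is a nested list indexed `L[y][x]`; the function name is illustrative.

```python
import math


def gradient_at(L, x, y):
    """Return (m, theta) at pixel (x, y) of the smoothed image L.
    theta is in radians in [0, 2*pi)."""
    lx = L[y][x + 1] - L[y][x - 1]
    ly = L[y + 1][x] - L[y - 1][x]
    m = math.hypot(lx, ly)                          # gradient magnitude
    theta = math.atan2(ly, lx) % (2.0 * math.pi)    # gradient orientation
    return m, theta


L = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 4.0, 0.0],
]
m, theta = gradient_at(L, 1, 1)
print(m, math.degrees(theta))
```

Here Lx = 3 and Ly = 4, so the magnitude is 5.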

Orientation assignment

36-bin orientation histogram over 360° (0 to 2π), with each sample weighted by its gradient magnitude m and by a Gaussian falloff with σ = 1.5*scale of the keypoint

The peak is the orientation of the keypoint

Any other local peak within 80% of the highest peak creates an additional keypoint with that orientation

About 15% of keypoints have multiple orientations, and they contribute a lot to matching stability

The accurate peak position is determined by fitting a parabola to the three histogram values closest to the peak
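A sketch of the histogram step, assuming each sample arrives as a pair (weight, angle in degrees) where the weight is already the magnitude m times the Gaussian falloff. Peak refinement by a parabola through three neighboring bins follows Lowe's paper; the bin-center convention and the function names are choices of this sketch.

```python
def orientation_histogram(samples, bins=36):
    """Accumulate (weight, theta_degrees) samples into a circular histogram."""
    hist = [0.0] * bins
    width = 360.0 / bins
    for weight, theta in samples:
        hist[int((theta % 360.0) // width) % bins] += weight
    return hist


def dominant_orientations(hist, ratio=0.8):
    """Return interpolated angles (degrees) of every local peak whose
    height is at least ratio * the highest peak."""
    bins = len(hist)
    width = 360.0 / bins
    top = max(hist)
    angles = []
    for i, h in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % bins]   # circular neighbors
        if h >= ratio * top and h > left and h > right:
            # Vertex of the parabola through the three bins around the peak.
            shift = 0.5 * (left - right) / (left - 2.0 * h + right)
            angles.append(((i + 0.5 + shift) * width) % 360.0)
    return angles


samples = [(1.0, 95.0), (1.0, 95.0), (0.5, 85.0), (0.5, 105.0), (1.8, 275.0)]
hist = orientation_histogram(samples)
print(dominant_orientations(hist))
```

The second peak (1.8) is within 80% of the highest (2.0), so this keypoint would be duplicated with two orientations.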

4. Local image descriptor

• Gaussian weighting with σ = 0.5*width of the descriptor window
• Thresholded image gradients are sampled over a 16x16 array of locations in scale space
• Create an array of orientation histograms (w.r.t. the key orientation)
• 8 orientations x 4x4 histogram array = 128 dimensions
• Normalize to unit length for invariance to intensity changes, clip values larger than 0.2, and renormalize
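The normalize, clip, renormalize step can be sketched as below, assuming the 128 histogram values are already collected in a flat list. The function name is hypothetical.

```python
import math


def normalize_descriptor(vec, clip=0.2):
    """Scale to unit length, clip entries above `clip`, then rescale to
    unit length again."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    v = unit(vec)                        # removes overall contrast changes
    v = [min(x, clip) for x in v]        # damps a few very large gradients
    return unit(v)


raw = [10.0] + [1.0] * 127               # one dominant bin among 128
desc = normalize_descriptor(raw)
print(len(desc), desc[0], desc[1])
```

After renormalization an entry can exceed 0.2 again; the clip only limits how much a single large gradient can dominate the vector.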

Conclusions for SIFT

Detection of scale-space extrema: for scale invariance
Accurate keypoint localization: remove unstable feature points
Orientation assignment: for rotation invariance
Keypoint descriptor: for illumination invariance

Image scale invariance.
Image rotation invariance.
Robust matching across a substantial range of (1) affine distortion, (2) change in 3D viewpoint, (3) addition of noise, (4) change in illumination.

For a feature x, find the closest feature x1 and the second-closest feature x2. If the distance ratio d(x, x1) / d(x, x2) is smaller than 0.8, x1 is accepted as a match.

Feature matching
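A sketch of the distance-ratio test, assuming Euclidean distance between descriptors and a brute-force search over the candidates. Function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_feature(query, candidates, ratio=0.8):
    """Return the index of the closest candidate if
    d(x, x1) / d(x, x2) < ratio, otherwise None."""
    dists = sorted((euclidean(query, c), i) for i, c in enumerate(candidates))
    if len(dists) < 2:
        return None                      # the ratio needs two neighbors
    (d1, i1), (d2, _) = dists[0], dists[1]
    # Written as d1 < ratio * d2 to avoid dividing by zero.
    return i1 if d1 < ratio * d2 else None


query = [0.0, 0.0]
print(match_feature(query, [[1.0, 0.0], [5.0, 0.0], [6.0, 0.0]]))  # distinctive
print(match_feature(query, [[1.0, 0.0], [0.0, 1.1]]))              # ambiguous
```

In the second call the two nearest candidates are almost equally close (ratio about 0.91), so the match is rejected as ambiguous.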

Maxima in DoG

Remove low contrast

Remove edges

SIFT descriptor

Image Matching

Thanks for your attention!

Q&A
