Dudek & Jugessur, IEEE ICRA 2000 (April 2000).

Robust Place and Object Recognition using Local Appearance based Methods

Gregory Dudek and Deeptiman Jugessur

Center for Intelligent Machines

McGill University




Outline

• Applications

• PCA: shortcomings

• Objectives

• Approach

• Background

• System Overview

• Results

• Conclusion



Two Applications

• Object recognition: what is that thing?

– Recognizing a known object from its visual appearance.

– Landmarks, grasping targets, etc.

• Place recognition (coarse localization): what room am I in?

– Recognizing the current waypoint on a trajectory, validating the current locale before applying a precise localization method, topological navigation.



PCA-based recognition.

• Has now become a well-established method for image recognition.

• PCA-based recognition: a global transform of the image, with N degrees of freedom, into an eigenspace with M ≪ N degrees of freedom.

– The M retained dimensions capture the “most important” characteristics of the set of images being memorized.

• Avoids having to segment the image into object and background by using the whole image.
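A minimal NumPy sketch of the global PCA step described above, assuming each training image is flattened into one row vector. The names build_eigenspace and project are hypothetical, and obtaining the eigenvectors through an SVD of the centred data is one standard choice, not necessarily the one used in this work.

```python
import numpy as np

def build_eigenspace(X, m):
    """X: (n_images, n_pixels) matrix, one flattened image per row.
    Returns the mean image and the top-m principal directions (m << n_pixels)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are eigenvectors of the covariance matrix of Xc,
    # ordered by decreasing singular value (i.e. decreasing variance explained).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:m]

def project(x, mean, basis):
    """Map one flattened image (N values) to its M eigenspace coordinates."""
    return basis @ (x - mean)

rng = np.random.default_rng(0)
train = rng.random((20, 15 * 15))      # 20 synthetic 15x15 images, N = 225
mean, basis = build_eigenspace(train, m=8)
coords = project(train[3], mean, basis)
print(coords.shape)  # (8,)
```

Recognition then compares M-dimensional coordinate vectors instead of N-pixel images, which is what makes eigenspace classification fast.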



Observations

• Using the whole image implies recognizing the combination of object AND background.

• Segmenting the object from the background would avoid dependence on the background, but it is too difficult.

• Using a small sub-region gives less precise recognition (i.e., the sub-window could come from more than one image), but it is efficient.

• Many subwindows together can “vote” for an unambiguous recognition.

• If the sub-windows are suitably chosen, they may totally ignore the background.
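A toy sketch of the voting idea, assuming each sub-window has already been matched to the set of training images it could have come from. The labels, the function name vote and the "tie means no decision" rule are illustrative assumptions.

```python
from collections import Counter

def vote(candidates_per_window):
    """Each sub-window casts one vote for every training image it could come from.
    Returns (winner, tally); winner is None when the top count is tied."""
    tally = Counter()
    for candidates in candidates_per_window:
        tally.update(set(candidates))
    ranked = tally.most_common(2)
    if not ranked:
        return None, tally
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None, tally  # still ambiguous
    return ranked[0][0], tally

# Individually ambiguous sub-windows, jointly unambiguous.
windows = [{"kitchen", "lab"}, {"lab"}, {"lab", "office"}, {"kitchen", "lab"}]
winner, tally = vote(windows)
print(winner, tally["lab"])  # lab 4
```

A single window such as {"kitchen", "lab"} cannot decide on its own; the tally over several windows can.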



Problem Statement

• Improving the performance of classic PCA based recognition by accounting for:

– Varying backgrounds

– Planar rotations

– Occlusions

• Also (discussed in less detail):

– Changes in object pose

– Non-rigid deformation



Our key idea(s).

• Use sub-windows: several of them together yield an unambiguous recognition.

• Sub-windows are selected by an attention operator (several kinds can be used).

• Each sub-window is sampled non-uniformly to weight it towards its center.

• Use only the amplitude spectrum (of each polar-sampled sub-window) to obtain rotational invariance.



Background

• Standard Appearance Based Recognition

– M. Turk and A. Pentland 1991

– S.K. Nayar, H. Murase, S.A. Nene 1994

– H. Murase, S.K. Nayar 1995

– Shortcomings (due to global approach):

• Background

• Scale

• Rotations

• Local changes of the image or object

• Occlusion



Background (part 2)

• “Enhanced” local sub-window methods

– D. Lowe 1999: scale invariance, simple features.

– C. Schmid 1999: probabilistic approach based on sub-windows extracted using the Harris operator.

– C. Schmid & R. Mohr 1997: numerous sub-windows extracted using the Harris operator for database image retrieval (a simpler problem).

– K. Ohba & K. Ikeuchi 1997: K.L.T. operator used to extract sub-windows for the creation of an eigenspace. Only handles occlusion.

• Interest operator of choice:

– D. Reisfeld, H. Wolfson, Y. Yeshurun 1995: local symmetry operator



Approach

• 2 phases:

– Training (off-line) for the entire database of recognizable images:

• Run an interest operator to obtain a saliency map for each image.

• Choose sub-windows around the salient points for each image.

• Select most informative sub-windows and use foveal sampling.

• Create the eigenspace with the processed sub-windows.

– Testing (on-line) for a candidate test image:

• Run the same interest operator to obtain the saliency map.

• Choose the sub-windows and process the information within them.

• Project the sub-windows onto the eigenspace

• Perform classification based on nearest neighbor rules.
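A sketch of the "choose sub-windows around the salient points" step, assuming a saliency map has already been produced by some interest operator (none is implemented here). The greedy non-maximum suppression, the name select_subwindows and the defaults of 10 windows of 15x15 pixels are illustrative assumptions; the "most informative" filtering is not shown.

```python
import numpy as np

def select_subwindows(image, saliency, k=10, size=15):
    """Return up to k (size x size) sub-windows centred on the most salient points,
    plus their centres. Assumes odd `size`. Greedy non-maximum suppression
    keeps the chosen windows from overlapping."""
    half = size // 2
    h, w = saliency.shape
    s = saliency.astype(float).copy()
    # Centres closer than `half` to the border cannot hold a full window.
    s[:half, :] = -np.inf
    s[h - half:, :] = -np.inf
    s[:, :half] = -np.inf
    s[:, w - half:] = -np.inf
    windows, centres = [], []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        if not np.isfinite(s[y, x]):
            break  # no admissible centre left
        windows.append(image[y - half:y + half + 1, x - half:x + half + 1])
        centres.append((int(y), int(x)))
        # Suppress every centre whose window would overlap the chosen one.
        s[max(0, y - size + 1):y + size, max(0, x - size + 1):x + size] = -np.inf
    return windows, centres

rng = np.random.default_rng(1)
img = rng.random((64, 64))
sal = np.zeros((64, 64))
sal[20, 30], sal[40, 10] = 5.0, 3.0   # two synthetic salient points
wins, cents = select_subwindows(img, sal, k=2)
print(cents)  # [(20, 30), (40, 10)]
```

The same routine would run off-line on every database image and on-line on the candidate test image, so both sides produce comparable sub-windows.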



Recognition Model

[Figure: recognition model. Off-line: the database of recognizable images is run through the interest operator; sub-windows are extracted based on interest-operator saliency values and information content; a 2D FFT gives their amplitude spectra; a low-dimensional eigenspace is created. On-line: the candidate test image is run through the same interest operator; its sub-windows are extracted and passed through the 2D FFT, and the amplitude spectra are projected onto the eigenspace for classification.]



Polar Samplings and 2D FFT

[Figure: a sub-window and its rotated copy are each polar-sampled and passed through a 2D FFT; in theory both give the same amplitude spectrum.]
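A sketch of why polar sampling plus the amplitude spectrum gives rotational invariance. With the same number of samples on every ring (which also concentrates samples near the center), rotating the patch about its center becomes a circular shift along the angle axis, and the 2D FFT amplitude ignores circular shifts. The grid (8 radii × 16 angles), bilinear interpolation and the function names are assumptions. Equality is exact only for rotations that map the pixel grid onto itself and are multiples of the angular step, such as the 90° used here; other angles hold only approximately, hence “in theory”.

```python
import numpy as np

def polar_sample(patch, n_r=8, n_theta=16):
    """Sample a square patch on a polar grid centred on the patch centre.
    Every ring gets n_theta samples, so sampling density grows towards the centre.
    Values between pixels come from bilinear interpolation."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    radii = np.linspace(r_max / n_r, r_max, n_r)
    thetas = 2 * np.pi * np.arange(n_theta) / n_theta
    ys = cy + radii[:, None] * np.sin(thetas)[None, :]
    xs = cx + radii[:, None] * np.cos(thetas)[None, :]
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    dy, dx = ys - y0, xs - x0
    return ((1 - dy) * (1 - dx) * patch[y0, x0] + (1 - dy) * dx * patch[y0, x0 + 1]
            + dy * (1 - dx) * patch[y0 + 1, x0] + dy * dx * patch[y0 + 1, x0 + 1])

def amplitude_spectrum(polar):
    """Magnitude of the 2D FFT; the phase (which encodes the shift) is discarded."""
    return np.abs(np.fft.fft2(polar))

rng = np.random.default_rng(0)
patch = rng.random((15, 15))
p0 = polar_sample(patch)
p90 = polar_sample(np.rot90(patch))
# A 90-degree rotation is a circular shift of n_theta / 4 = 4 angle columns.
print(np.allclose(p90, np.roll(p0, -4, axis=1)))                      # True
print(np.allclose(amplitude_spectrum(p0), amplitude_spectrum(p90)))  # True
```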



Shift Theorem

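$f(x,y) \rightarrow F(u,v)$, with $F(u,v) = \iint f(x,y)\, e^{-j2\pi(ux+vy)}\, dx\, dy$

The shift theorem states that:

$f(x-a,\, y-b) \rightarrow e^{-j2\pi(au+bv)} F(u,v)$

The amplitudes are therefore identical, since the phase factor has unit magnitude:

$\left| e^{-j2\pi(au+bv)} F(u,v) \right| = |F(u,v)|$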
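A quick numerical check of the discrete counterpart of the shift theorem, assuming circular (periodic) shifts, which is what NumPy's FFT implies. Array sizes and offsets are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, a, b = 8, 12, 3, 5
f = rng.random((M, N))
# Circular shift: g[x, y] = f[(x - a) mod M, (y - b) mod N]
g = np.roll(f, shift=(a, b), axis=(0, 1))
F, G = np.fft.fft2(f), np.fft.fft2(g)
u = np.arange(M)[:, None]   # frequency index along x
v = np.arange(N)[None, :]   # frequency index along y
phase = np.exp(-2j * np.pi * (a * u / M + b * v / N))
print(np.allclose(G, phase * F))          # True: shift theorem
print(np.allclose(np.abs(G), np.abs(F)))  # True: amplitudes unchanged
```

For real image sub-windows the shift is not circular, so border content breaks exact equality.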



Place Recognition

[Figure: test images and training images, with the best match marked for each test image.]



Place Recognition (2)

[Figure: test images and training images, with the best match marked for each test image.]



Object Recognition

[Figure: a test image and the training image it is recognized as.]



Object Recognition (2)

[Figure: a test image and its best-matching training images. Note the background variation and occlusion.]



Performance metrics

• On-line performance:

– 15x15-pixel sub-windows: 90% recognition with 10 sub-windows (10 interest points).

– 15x15-pixel sub-windows: 100% recognition using 15 more sub-windows.

– Interest operator can take from 1/30 s to 10 min (depending on the operator, image size, etc.).

– Classification in the eigenspace takes well under 1 s (can be performed in real time).



Performance vs Number of Interest Points

[Figure: recognition rate (up to 100%) versus number of features.]

Note: 10 windows of size 15x15 means using only 0.7% of the total image content.
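As a rough check of the 0.7% figure, assuming a 640×480 frame (the image size is not stated on the slide):

```latex
\[
\frac{10 \times 15^2}{640 \times 480} = \frac{2250}{307200} \approx 0.73\%
\]
```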



Conclusion & Extensions

• Approach to object and place recognition from single video images. Works despite planar rotation, occlusion or other deformations.

• Highly robust.

• Recognition rates of up to 100% with 20 test images.

• Improved robustness to background can be achieved using “masking” [Jugessur & Dudek CVPR 2000].

• Ongoing work seeks to exploit the geometry of interest points.

• Could filter in Eigenspace during training to select only “useful” features.



That’s all



Questions you could ask

• Have you considered the use of alternative interest/attention operators? Does the operator matter?

• What if the background is much more interesting (to the operator) than the object?

• How much does color information matter?

• What is the consequence of not using geometric information (and what does that really mean)?





Performance metrics

• Training time: roughly 24 hours for 64 windows of 15x15 pixels, 17 objects, 3 views per object.

– This is using MATLAB and highly non-optimized code.

• Using similar methods on global images, other groups have reported times on the order of minutes for similar tasks.

• On-line performance:

– Interest operator can take from 1/30 s to 10 min (depending on the operator, image size, etc.).

– Classification in the eigenspace takes well under 1 s (can be performed in real time).
