Computer Vision Set: Object Recognition
Slides by C.F. Olson




Object Recognition

• Object recognition is the process of determining whether an object appears in an image.

• Sometimes divided into two subproblems:
  – Identification: which objects are in the image?
  – Location: given that the object is present, where is it?

• Many methods solve both at the same time.



Appearance-Based Matching

• Appearance-based techniques use example images (templates or exemplars) of the objects to perform recognition (as opposed to extracted features).

• I include edge images in this definition, although this is considered feature-based by some.

Recognition by Finding Patterns

• If we know exactly what something looks like then it is easy to find.
  – Stereo matching

• Objects look different under varying conditions:
  – Changes in lighting or color
  – Changes in viewing direction
  – Changes in size / shape

• A single exemplar is unlikely to succeed reliably!
  – However, it is impossible to represent all appearances of an object.

Computer Vision - A Modern Approach
Set: Recognition as Template Matching

Slides by D.A. Forsyth, C.F. Olson



Example

Frontal faces are fairly easy to find (and sometimes classify).

However, changes to lighting and background cause problems.



Edge Matching

• Changes in lighting and color usually don’t have much effect on image edges.



Edge Matching

• Strategy:
  – Detect edges in template and image
  – Compare edge images to find the template
  – Must consider range of possible template positions



Edge Matching Measures

• What measure should we use to compare edge images?

• Can count number of overlapping edges.
  – Not robust to changes in shape

• Better: count number of template edge pixels within some distance of an edge in the search image.

• Best:
  – Determine probability distribution of distance to nearest edge in search image (if template at correct position)
  – Estimate likelihood of each template position generating image
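The "better" measure above can be sketched in a few lines. This is a minimal illustration, not the slides' implementation: the function name `match_fraction`, the tolerance `max_dist`, and the representation of edge images as lists of (x, y) pixel coordinates are all assumptions made for the example.

```python
import math

def match_fraction(template_edges, image_edges, offset, max_dist=1.5):
    """Fraction of template edge pixels that lie within max_dist of some
    image edge pixel when the template is placed at the given offset.
    With max_dist=0 this reduces to counting exactly overlapping edges."""
    ox, oy = offset
    hits = 0
    for x, y in template_edges:
        p = (x + ox, y + oy)
        # Distance from this template pixel to the nearest image edge pixel.
        nearest = min(math.dist(p, q) for q in image_edges)
        if nearest <= max_dist:
            hits += 1
    return hits / len(template_edges)

# Hypothetical data: the image holds the template shifted by (10, 5),
# with the last edge pixel displaced by one pixel (a small shape change).
template = [(0, 0), (1, 0), (2, 0), (2, 1)]
image = [(10, 5), (11, 5), (12, 5), (13, 6)]
print(match_fraction(template, image, (10, 5), max_dist=0.0))  # exact overlap
print(match_fraction(template, image, (10, 5), max_dist=1.5))  # tolerant count
```

The exact-overlap count drops when a single pixel moves, while the tolerant count does not, which is the robustness to shape change that the bullet refers to.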



Hausdorff Distance

• The Hausdorff distance has also been used to compare edge images.
  – This is a distance between two point sets

• If A and B are sets of points (for example edge pixel locations):

$$h(A,B) = \max_{a \in A} \min_{b \in B} \| a - b \|$$

$$H(A,B) = \max(h(A,B),\, h(B,A))$$

• The Hausdorff distance is not robust to outliers.
  – A single bad point can lead to a large distance.

• Common variation: take the median (or some other percentile):

$$h(A,B) = \operatorname{med}_{a \in A} \min_{b \in B} \| a - b \|$$
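The definitions above translate directly into code. The following is a brute-force sketch for small point sets; the function names and the `fraction` parameter (0.5 for the median) are assumptions chosen for illustration, and a practical system would typically use a distance transform instead of the nested loops.

```python
import math

def directed_hausdorff(A, B):
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def directed_percentile(A, B, fraction=0.5):
    """Robust variant: replace the max over A with a percentile
    (fraction=0.5 gives the median, fraction=1.0 gives the max)."""
    nearest = sorted(min(math.dist(a, b) for b in B) for a in A)
    k = max(0, math.ceil(fraction * len(nearest)) - 1)
    return nearest[k]

# Hypothetical point sets: B equals A plus one outlier.
A = [(0, 0), (1, 0)]
B = [(0, 0), (1, 0), (10, 0)]
print(hausdorff(A, B))                 # dominated by the outlier
print(directed_percentile(B, A, 0.5))  # the median ignores the outlier
```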



Searching for Matches

• Can we search for matches more efficiently than looking at every possible translation?

• What if the object in the image is a different size or rotated or skewed?

• Premise: Given a set of possible positions we can find a bound on the best possible match in the set without looking at all of the positions.

• Assume that the best score is small (like the Hausdorff distance).
  – Can change this easily.



Searching for Matches

• Divide-and-conquer strategy:
  – consider all positions as a set (a cell in the space of positions)
  – determine lower bound on score at best position in cell
  – if bound is too large, prune cell
  – if bound is not too large, divide cell into subcells and try each subcell recursively
  – process stops when cell is “small enough”

• Unlike multi-resolution search, this technique is guaranteed to find all matches that meet the criterion (assuming that the lower bound is accurate).



Divide-and-Conquer

At each level, cells are pruned when possible and divided into smaller cells when not possible.



Finding the Bound

• Given a cell of possible template positions, how can we find a lower bound on the best score?
  – Look at score for the template position represented by the center of the cell.
  – Subtract maximum change from the “center” position for any other position in cell (occurs at cell corners)

• This strategy can also be applied when the score is based on a count of the number of pixels that match well.
  – Must count maximum number of pixels that could match well at a position in the cell
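A sketch of the divide-and-conquer search with this bound, under simplifying assumptions: positions are integer translations only, the score is the directed Hausdorff distance from the shifted template to the image points, and a cell is "small enough" when it holds a single translation. Shifting the template by a vector t changes this score by at most the length of t, so the score at the cell center minus the distance to the farthest corner is a valid lower bound. The names `search`, `score_at`, and the small tolerance `eps` are illustrative, not from the slides.

```python
import math

def directed_hausdorff(A, B):
    return max(min(math.dist(a, b) for b in B) for a in A)

def score_at(template, image_pts, tx, ty):
    """Score of the template translated by (tx, ty); small is good."""
    shifted = [(x + tx, y + ty) for x, y in template]
    return directed_hausdorff(shifted, image_pts)

def search(template, image_pts, cell, threshold, eps=1e-9):
    """Return every integer translation in cell whose score is <= threshold.
    cell = (x0, x1, y0, y1) with inclusive integer bounds."""
    matches = []
    stack = [cell]
    while stack:
        x0, x1, y0, y1 = stack.pop()
        cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
        score = score_at(template, image_pts, cx, cy)
        # Largest displacement from the center to any position in the cell.
        radius = math.hypot(max(cx - x0, x1 - cx), max(cy - y0, y1 - cy))
        if score - radius > threshold + eps:
            continue  # lower bound too large: prune the whole cell
        if x0 == x1 and y0 == y1:
            matches.append((cx, cy))  # single position, bound equals score
            continue
        # Otherwise split into up to four subcells and examine each.
        xs = [(x0, cx), (cx + 1, x1)] if x1 > x0 else [(x0, x1)]
        ys = [(y0, cy), (cy + 1, y1)] if y1 > y0 else [(y0, y1)]
        stack.extend((a, b, c, d) for a, b in xs for c, d in ys)
    return matches

# Hypothetical data: the template appears at translation (5, 7), plus clutter.
template = [(0, 0), (2, 0), (0, 3)]
image_pts = [(5, 7), (7, 7), (5, 10), (20, 20)]
print(search(template, image_pts, (0, 15, 0, 15), threshold=0.0))
```

Because a cell is pruned only when its lower bound exceeds the threshold, the result matches an exhaustive scan of the same range while typically evaluating far fewer positions.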



Complex Transformations

• What if space of possible template positions is more complex?
  – Rotations
  – Scale
  – Shear

• Basic methodology is exactly the same!

• Complexity arises from determining bounds on distance.



Greyscale Matching

• Although edges are (mostly) robust to illumination changes, they throw away a lot of information.

• Can we apply similar techniques to greyscale matching?

• Yes. Must compute pixel distance as a function of both pixel position and pixel intensity.

• Can be applied to color also.



Matching Gradients

• One way to be robust to illumination changes without throwing away as much information is to compare image gradients.

• Matching is performed like matching greyscale images.

• Simple alternative: use (normalized) correlation.
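A minimal sketch of normalized correlation between an image patch and a template of the same size, both given as nested lists of grey values. The function name and the convention of returning 0.0 for a constant patch are assumptions for this example. Subtracting the means and dividing by the norms makes the score unchanged by a brightness offset or a positive contrast scaling.

```python
import math

def normalized_correlation(patch, template):
    """Correlation coefficient in [-1, 1] between two equally sized grey-level arrays."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    dp = [v - mean_p for v in p]
    dt = [v - mean_t for v in t]
    denom = math.sqrt(sum(v * v for v in dp) * sum(v * v for v in dt))
    if denom == 0.0:
        return 0.0  # a constant patch carries no pattern to correlate
    return sum(a * b for a, b in zip(dp, dt)) / denom

# Hypothetical 2x2 example: the same pattern under different lighting.
template = [[10, 20], [30, 40]]
brighter = [[2 * v + 5 for v in row] for row in template]
inverted = [[-v for v in row] for row in template]
print(normalized_correlation(brighter, template))  # close to 1
print(normalized_correlation(inverted, template))  # close to -1
```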



Different Viewpoints

• What if we don’t know the viewpoint?
  – Non-frontal faces
  – Objects in arbitrary orientation

• A partial solution: linear transformations model small changes in viewpoint.

• A better solution: use templates that model all possible view directions.
  – Computationally expensive



Large Modelbases

• If we have many potential templates that we are looking for, can we search efficiently?

• One approach is based on eigenvectors of the templates.
  – eigenfaces

http://www-white.media.mit.edu/vismod/demos/facerec/basic.html



Feature-Based Recognition

• Feature-based methods extract features of some sort from the objects to be recognized and the images to be searched:
  – Surface patches
  – Corners
  – Linear edges

• A search is used to find feasible matches between object features and image features.

• The primary constraint is that a single position of the object must account for all of the feasible matches.



Interpretation Trees

• One method for finding sets of feasible matches is to search a tree.

• Each node in the tree represents a set of matches.
  – Root node represents empty set.
  – Each other node is the union of the matches in the parent node and one additional match.
  – Wildcard is used for features with no match.

• Nodes are “pruned” when the set of matches is infeasible.
  – A pruned node has no children (all would have infeasible matches).

• Historically significant and still used, but less commonly.
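An interpretation tree search can be sketched as a depth-first recursion. The feasibility test used here is an assumption for illustration (point features, with pairwise distances required to agree within a tolerance `tol`); the slides do not specify the constraint. Note that pairwise distances alone also accept mirror images. Level i of the tree assigns model feature i to an image feature or to the wildcard `None`.

```python
import math

def interpretation_tree(model, image, tol=0.1, min_matches=3):
    """Return all complete interpretations with at least min_matches real matches.
    Each interpretation is a list of (model_index, image_index_or_None)."""
    results = []

    def consistent(matches, i, a):
        # The new match (i, a) must agree with every match already in the node.
        for j, b in matches:
            if b is None:
                continue
            if b == a:
                return False  # an image feature can be used only once
            d_model = math.dist(model[i], model[j])
            d_image = math.dist(image[a], image[b])
            if abs(d_model - d_image) > tol:
                return False
        return True

    def expand(matches):
        i = len(matches)
        if i == len(model):  # leaf: every model feature has been considered
            if sum(b is not None for _, b in matches) >= min_matches:
                results.append(list(matches))
            return
        for a in range(len(image)):
            if consistent(matches, i, a):   # infeasible children are pruned
                expand(matches + [(i, a)])
        expand(matches + [(i, None)])       # wildcard: feature i is unmatched

    expand([])  # the root node is the empty set of matches
    return results

# Hypothetical data: the model rotated by 90 degrees and shifted, plus clutter.
model = [(0, 0), (4, 0), (0, 3)]
image = [(30, 1), (10, 10), (7, 10), (10, 14)]
print(interpretation_tree(model, image))
```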



Recognition by Hypothesize and Test

• General idea
  – Hypothesize object identity and pose
  – Compare hypothesized appearance to image

• Issues
  – Where do the hypotheses come from?
  – How do we compare to image (verification)?

• Simplest approach
  – Construct a correspondence for small sets of object features to every correctly sized subset of image points

• These are the hypotheses
  – Expensive search, which is also redundant.

• Can be improved using randomization.



What are the features?

• They have to project to similar features in the image:
  – Points
  – Lines
  – Conics
  – Other fitted curves
  – Regions (particularly the center of a region, etc.)



Pose Consistency

• Correspondences between image features and model features are not independent.

• A small number of correspondences yields the object position; the others must be consistent with this.

• Strategy:
  – Generate hypotheses using small numbers of correspondences (e.g. triples of points for 3D recognition)
  – Project other model features into image and verify additional correspondences

• Use the smallest number of correspondences necessary to achieve discrete object poses.



Figure from “Object recognition using alignment,” D.P. Huttenlocher and S. Ullman, Proc. Int. Conf. Computer Vision, 1986, copyright IEEE, 1986

Example (2D)



Example (3D)



Randomization

• Improved efficiency can be gained using RANSAC (Random Sample Consensus)

• Examine small sets of image features until likelihood of missing object becomes small.

• For each set of image features must consider all possible matching sets of model features

• $(1 - w^c)^k = z$
  – w is fraction of image points that are “good” (w ~ m/n)
  – c is number of correspondences necessary
  – k is number of trials
  – z is probability of every trial using one (or more) incorrect correspondences
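Solving the relation above for the number of trials gives k = log z / log(1 - w^c). A small worked sketch follows; the function name `ransac_trials` and the example values are assumptions chosen for illustration, and the formula requires 0 < w < 1 and 0 < z < 1.

```python
import math

def ransac_trials(w, c, z):
    """Smallest whole number of trials k with (1 - w**c)**k <= z.
    w: fraction of good image points, c: correspondences per trial,
    z: acceptable probability that every trial is contaminated."""
    return math.ceil(math.log(z) / math.log(1.0 - w ** c))

# Hypothetical setting: half the points are good, 3 correspondences per trial,
# and at most a 1% chance of never drawing an all-good sample.
k = ransac_trials(w=0.5, c=3, z=0.01)
print(k, (1 - 0.5 ** 3) ** k)
```

The required number of trials grows quickly as w falls or c rises, which is why using the smallest sufficient number of correspondences matters.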



Pose Clustering

• Each object leads to many correct sets of correspondences, each of which has (roughly) the same pose
  – Vote on pose, in an accumulator array
  – This is (essentially) a Hough transform.

• Note that this method uses sets of correspondences, rather than individual correspondences
  – Implementation is easier, since each set yields a small number of possible object poses.



Figure from “The evolution and testing of a model-based object recognition system”, J.L. Mundy and A. Heller, Proc. Int. Conf. Computer Vision, 1990 copyright 1990 IEEE






Example

Figure panels: Detected craters; Estimated pose (green: matched craters, yellow: unmatched craters).



Alignment vs. Voting

Not all correct sets of matches will lead to a good pose!



Grouping

If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined.



Invariance

• There are geometric properties that are invariant to camera transformations

• One case: a planar object with a linear transformation

• Assume we have three basis points $P_i$ on the object, then any other point on the object can be written as:

$$P_k = P_1 + \alpha_k (P_2 - P_1) + \beta_k (P_3 - P_1)$$

• Image points are obtained by multiplying by a linear transformation, so:

$$q_k = A P_k = A\big(P_1 + \alpha_k (P_2 - P_1) + \beta_k (P_3 - P_1)\big) = q_1 + \alpha_k (q_2 - q_1) + \beta_k (q_3 - q_1)$$



Invariance

• This means that, if we know the basis points in the image, we can compute all of the α’s and β’s.
  – they’re the same in object and in image, i.e. invariant

• However, we don’t know the correspondences.

• Suggests another voting strategy:
  – form α’s and β’s in image and vote for model points with same values
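The invariance can be checked numerically. The sketch below solves the 2x2 system for α and β with Cramer's rule, then confirms that the values computed on the object agree with those computed after applying a linear map A. The function names, the matrix A, and the sample points are assumptions made for this example.

```python
def affine_coords(p, b1, b2, b3):
    """Solve p = b1 + alpha*(b2 - b1) + beta*(b3 - b1) for (alpha, beta)."""
    ux, uy = b2[0] - b1[0], b2[1] - b1[1]
    vx, vy = b3[0] - b1[0], b3[1] - b1[1]
    wx, wy = p[0] - b1[0], p[1] - b1[1]
    det = ux * vy - uy * vx
    if det == 0:
        raise ValueError("basis points are collinear")
    alpha = (wx * vy - wy * vx) / det
    beta = (ux * wy - uy * wx) / det
    return alpha, beta

def apply_linear(A, p):
    """q = A p for a 2x2 matrix A given as nested lists."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])

# Hypothetical object points and an arbitrary invertible linear map.
P1, P2, P3, Pk = (1, 1), (4, 2), (2, 5), (3, 7)
A = [[2.0, 1.0], [0.5, 3.0]]
q1, q2, q3, qk = (apply_linear(A, p) for p in (P1, P2, P3, Pk))

print(affine_coords(Pk, P1, P2, P3))  # computed on the object
print(affine_coords(qk, q1, q2, q3))  # computed in the image
```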



Geometric Hashing

1. Preprocess data by determining α’s and β’s for all sets of points in all objects in database. Store in “hash table”.
   – This step is offline.

2. Pick a possible basis set (3 points) in the image.

3. Use image to compute α’s and β’s for remaining points and use them to look up possible matches in hash table.

4. If any object basis set gets enough consistent votes, then it is likely to be present in the image. Otherwise, repeat from step 2.

5. Perform verification.
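Steps 1 to 3 can be sketched as follows. This is a simplified illustration under stated assumptions: the hash key is the (α, β) pair quantized with a fixed bin width `step`, the model name "L" and all point data are hypothetical, and the vote threshold of step 4 and the verification of step 5 are left out.

```python
from collections import defaultdict
from itertools import permutations

def affine_coords(p, b1, b2, b3):
    """(alpha, beta) of p in the basis (b1, b2, b3), or None if the basis is collinear."""
    ux, uy = b2[0] - b1[0], b2[1] - b1[1]
    vx, vy = b3[0] - b1[0], b3[1] - b1[1]
    wx, wy = p[0] - b1[0], p[1] - b1[1]
    det = ux * vy - uy * vx
    if det == 0:
        return None
    return (wx * vy - wy * vx) / det, (ux * wy - uy * wx) / det

def quantize(alpha, beta, step):
    return (round(alpha / step), round(beta / step))

def build_table(models, step=0.1):
    """Step 1 (offline): index every non-basis point under every ordered basis."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            b = [pts[i] for i in basis]
            for k, p in enumerate(pts):
                if k in basis:
                    continue
                ab = affine_coords(p, b[0], b[1], b[2])
                if ab is not None:
                    table[quantize(ab[0], ab[1], step)].append((name, basis))
    return table

def vote(table, image_pts, image_basis, step=0.1):
    """Steps 2-3: given a chosen image basis, vote for (model, basis) entries."""
    b = [image_pts[i] for i in image_basis]
    votes = defaultdict(int)
    for k, p in enumerate(image_pts):
        if k in image_basis:
            continue
        ab = affine_coords(p, b[0], b[1], b[2])
        if ab is None:
            continue
        for entry in table.get(quantize(ab[0], ab[1], step), []):
            votes[entry] += 1
    return votes

# Hypothetical model, seen under the linear map A = [[1, 0.5], [-0.5, 2]],
# with one clutter point appended to the image.
models = {"L": [(0, 0), (4, 0), (0, 3), (4, 3), (2, 1)]}
image_pts = [(0, 0), (4, -2), (1.5, 6), (5.5, 4), (2.5, 1), (9, 9)]
table = build_table(models)
votes = vote(table, image_pts, image_basis=(0, 1, 2))
print(votes[("L", (0, 1, 2))])
```

Fixed-width bins are the simplest choice here; values that fall near a bin boundary can be split by noise, so a practical table would usually also check neighboring bins.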



Indexing With Invariants

• It would be nice to have invariants for more general cases (nonplanar objects, nonlinear transformations).

• Store invariants in a lookup table and index objects quickly.

• Invariants exist for:
  – 4 planar points with a linear transformation
  – 5 planar points with a perspective projection
  – planar curves (lines and conics) with a perspective projection

• There is no (nontrivial) invariant for unrestricted sets of nonplanar points.



Figure from “Efficient model library access by projectively invariant indexing functions,” by C.A. Rothwell et al., Proc. Computer Vision and Pattern Recognition, 1992, copyright 1992, IEEE



Verification

• Edge score
  – Are there image edges near predicted object edges?
  – Can be unreliable; in textured areas there are many edges

• Oriented edge score
  – Are there image edges near predicted object edges with the right orientation?
  – Better, but still has false positives (see next slide)

• Could use texture, hue, etc.
  – Does the tool have the same texture as the wood?


