Eighth IEEE International Conference on Computer Vision (July 2001)
Matching Shapes
Serge Belongie, Jitendra Malik and Jan Puzicha
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley, CA 94720, USA
{sjb,malik,puzicha}@cs.berkeley.edu
Abstract
We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by (1) solving for correspondences between points on the two shapes and (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin-plate splines provide a flexible class of transformation maps for this purpose. Dissimilarity between two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework. Results are presented for silhouettes, trademarks, handwritten digits and the COIL dataset.
1 Introduction
Consider the two 5's in Figure 1. Regarded as vectors of pixel brightness values and compared using vector norms, they are very different. However, regarded as shapes they appear rather similar to a human observer. Our objective in this paper is to operationalize a notion of shape similarity, with the ultimate goal of using that as a basis for category-level recognition. We approach this as a three-stage process: (1) solve the correspondence problem between the two shapes, (2) use the correspondences to estimate an aligning transform, and (3) compute the distance between the two shapes as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transformation.
We wish to solve the problem in considerable generality. Shapes are arbitrary 2D figures, e.g. derived from edges extracted in images of 3D objects, not just silhouettes. The family of aligning transforms includes affine as well as non-rigid smooth transformations, parametrized using thin plate splines. Matching errors between corresponding points are computed using both shape and local appearance differences.
At the heart of our approach is a tradition of matching shapes by deformation that can be traced at least as far back as D'Arcy Thompson. In his classic work On Growth and Form [27], Thompson observed that related but not identical shapes can often be deformed into alignment using simple coordinate transformations. Fischler and Elschlager [9] operationalized this approach using energy minimization in a mass-spring model. Grenander et al. [13] developed these ideas in a probabilistic setting. Yuille's [31] version of the deformable template concept fitted hand-crafted parametrized models, e.g. for eyes, in the image domain using gradient descent. Von der Malsburg and collaborators [19] used elastic graph matching for aligning faces.
Our primary contribution is a simple and robust algorithm for finding correspondences between shapes. Shapes are represented by a set of points sampled from the shape contours (typically 100 or so pixel locations sampled from the output of an edge detector are used). There is nothing special about the points. They are not required to be landmarks or curvature extrema, etc.; as we use more samples we obtain ever better approximations to the underlying shape. We introduce a shape descriptor, the shape context, to describe the coarse distribution of the rest of the shape with respect to a point on the shape. Finding correspondences between two shapes is then equivalent to finding for each sample point on one shape the sample point on the other shape that has the most similar shape context. Maximizing similarities and enforcing uniqueness naturally leads to a setup as a bipartite graph matching (equivalently, optimal assignment) problem. As desired, we can incorporate other […]
Given the correspondences at sample points, we extend the correspondence to the complete shape by estimating an aligning transformation that maps one shape onto the other. The transformations can be picked from any of a number of families; we have used Euclidean, affine and regularized thin plate splines in various applications. Once the shapes are aligned, computing similarity scores and recognition by $k$-NN classification is relatively straightforward.
We demonstrate object recognition in a wide variety of settings. We deal with 2D objects, e.g. the MNIST dataset of handwritten digits (Fig. 5), silhouettes, and trademarks (Fig. 7), as well as 3D objects from the Columbia COIL dataset, modeled using multiple views (Fig. 6). These are widely used benchmarks and our approach turns out to be the leading performer on all the problems for which there is comparative data.
The structure of this paper is as follows. We discuss related work in Section 2. In Section 3 we then describe our shape matching method in detail. Our transformation model is discussed in Section 4. We then discuss the problem of measuring shape similarity in Section 5 and demonstrate our proposed measure on a variety of databases including handwritten digits and pictures of 3D objects. Finally, we conclude in Section 6.
2 Prior Work on Shape Matching
An extensive survey of shape matching in computer vision can be found in [28]. Broadly speaking, there are two approaches: (1) feature-based, and (2) brightness-based.
Feature-based approaches involve the use of spatial arrangements of extracted features such as edges or junctions. Silhouettes have been described (and compared) using Fourier descriptors, e.g. [32], skeletons derived using Blum's medial axis transform [26], or directly matched using dynamic programming, e.g. [11]. Since silhouettes are limited as shape descriptors for general objects (footnote 1), other approaches [14, 10] treat the shape as a set of points in the 2D image, extracted using, say, an edge detector. Amit and Geman [1] find key points or landmarks, and recognize objects using the spatial arrangements of point sets. However, not all objects have distinguished key points (think of a circle, for instance), and using key points alone sacrifices the shape information available in smooth portions of object contours. Most closely related to our approach is the work of Rangarajan and collaborators [12, 7], which is discussed in Section 3.2.
Footnote 1: They ignore internal contours and are difficult to extract from real images.
Brightness-based approaches make more direct use of pixel brightness values. Several approaches [19, 29, 8] first attempt to find correspondences between the two images before doing the comparison. This turns out to be quite a challenge, as differential optical flow techniques do not cope well with the large distortions that must be handled due to pose/illumination variations. Errors in finding correspondence will cause downstream processing errors in the recognition stage. As an alternative, there are a number of methods that build classifiers without explicitly finding correspondences. In such approaches, one relies on a learning algorithm having enough examples to acquire the appropriate invariances. Some examples include [21, 6] for handwritten digit recognition, [22] for face recognition, and [24] for isolated 3D object recognition.
3 Matching with Shape Contexts
In our approach, a shape is represented by a discrete set of points sampled from the internal or external contours on the shape. These can be obtained as locations of edge pixels as found by an edge detector, giving us a set $P = \{p_1, \ldots, p_n\}$, $p_i \in \mathbb{R}^2$, of $n$ points. They need not, and typically will not, correspond to key-points such as maxima of curvature or inflection points. We prefer to sample the shape with roughly uniform spacing, though this is also not critical. Fig. 2(a,b) shows sample points for two shapes. Assuming contours are piecewise smooth, we can obtain as good an approximation to the underlying continuous shapes as desired by picking $n$ to be sufficiently large.
For each point $p_i$ on the first shape, we want to find the "best" matching point $q_j$ on the second shape. This is a correspondence problem similar to that in stereopsis. Experience there suggests that matching is easier if one uses a rich local descriptor, e.g. a gray-scale window or a vector of filter outputs, instead of just the brightness at a single pixel or edge location. Rich descriptors reduce the ambiguity in matching.
[Figure 2, beginning of caption missing] […] bins for $\theta$. (d-f) Example shape contexts for the three reference samples marked in (a,b). Each shape context is a log-polar histogram of the coordinates of the rest of the point set measured using the reference point as the origin. (Dark = large value.) Note the visual similarity of the shape contexts for the first two marked samples, which were computed for relatively similar points on the two shapes. By contrast, the shape context for the third sample is quite different. (g) Correspondences found using bipartite matching, with costs defined by the $\chi^2$ distance between histograms.
We propose a descriptor, the shape context, that could play such a role in shape matching. Consider the set of vectors originating from a point to all other sample points on a shape. These vectors express the configuration of the entire shape relative to the reference point. Obviously, this set of $n - 1$ vectors is a rich description, since as $n$ gets large, the representation of the shape becomes exact.
The full set of vectors as a shape descriptor is much too detailed, since shapes and their sampled representation may vary from one instance to another in a category. We identify the distribution over relative positions as a more robust and compact, yet highly discriminative descriptor. For a point $p_i$ on the shape, we compute a coarse histogram $h_i$ of the relative coordinates of the remaining $n - 1$ points,
$$h_i(k) = \#\{\, q \neq p_i : (q - p_i) \in \mathrm{bin}(k) \,\} \qquad (1)$$
This histogram is defined to be the shape context of $p_i$. The descriptor should be more sensitive to differences in nearby pixels. We thus propose to use a log-polar coordinate system. An example is shown in Fig. 2(c).
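The histogram of Eq. (1) on a log-polar grid can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `shape_context`, the bin counts `n_r` and `n_theta`, and the radial limits `r_inner` and `r_outer` are all assumptions chosen for the example, and the radii are assumed to have been normalized beforehand.

```python
import math

def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of the points other than points[index], relative to it.

    n_r, n_theta, r_inner and r_outer are illustrative choices only.
    """
    px, py = points[index]
    hist = [0] * (n_r * n_theta)
    log_span = math.log(r_outer / r_inner)
    for j, (qx, qy) in enumerate(points):
        if j == index:
            continue
        dx, dy = qx - px, qy - py
        r = math.hypot(dx, dy)
        if r == 0.0 or r >= r_outer:
            continue  # coincident or out-of-range points are skipped in this sketch
        if r < r_inner:
            r_bin = 0  # everything closer than r_inner falls in the innermost bin
        else:
            # bin edges are uniformly spaced in log r between r_inner and r_outer
            r_bin = min(n_r - 1, int(n_r * math.log(r / r_inner) / log_span))
        theta = math.atan2(dy, dx) % (2.0 * math.pi)
        t_bin = min(n_theta - 1, int(n_theta * theta / (2.0 * math.pi)))
        hist[r_bin * n_theta + t_bin] += 1
    return hist
```

Because only the differences $q - p_i$ enter the computation, translating the whole point set leaves the histogram unchanged.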
Consider a point $p_i$ on the first shape and a point $q_j$ on the second shape. Let $C_{ij} = C(p_i, q_j)$ denote the cost of matching these two points. As shape contexts are distributions represented as histograms, it is natural (footnote 2) to use the $\chi^2$ test statistic:
$$C_{ij} = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$$
where $h_i(k)$ and $h_j(k)$ denote the $K$-bin normalized histogram at $p_i$ and $q_j$, respectively.
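As a small illustration of the $\chi^2$ cost above, the sketch below normalizes two histograms and sums the per-bin terms. The names `chi2_cost` and `cost_matrix` are hypothetical, and skipping bins that are empty in both histograms (to avoid a 0/0 term) is an assumption of this sketch.

```python
def chi2_cost(h_i, h_j):
    """Chi-squared cost between two non-empty histograms, each normalized to sum to 1."""
    s_i, s_j = float(sum(h_i)), float(sum(h_j))
    cost = 0.0
    for a, b in zip(h_i, h_j):
        a, b = a / s_i, b / s_j
        if a + b > 0.0:  # bins empty in both histograms contribute nothing
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost

def cost_matrix(hists_p, hists_q):
    """Matrix of costs C_ij between every histogram of one shape and every one of the other."""
    return [[chi2_cost(hp, hq) for hq in hists_q] for hp in hists_p]
```

With normalized histograms the cost lies between 0 (identical distributions) and 1 (disjoint support).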
Given the set of costs $C_{ij}$ between all pairs of points $i$ on the first shape and $j$ on the second shape, we want to minimize the total cost of matching subject to the constraint that the matching be one-to-one. This is an instance of the square assignment (or weighted bipartite matching) problem, which can be solved in $O(N^3)$ time using the Hungarian method. In our experiments, we use the more efficient algorithm of [17]. The input to the assignment problem is a square cost matrix with entries $C_{ij}$. The result is a permutation $\pi(i)$ such that the sum $\sum_i C_{i,\pi(i)}$ is minimized.
When the number of samples on two shapes is not equal, the cost matrix can be made square by adding "dummy" nodes to each point set with a constant matching cost of $\epsilon_d$. The same technique may also be used even when the sample numbers are equal to allow for robust handling of outliers. In this case, a point will be matched to a "dummy" whenever there is no real match available at smaller cost than $\epsilon_d$. Thus, $\epsilon_d$ can be regarded as a threshold parameter for outlier detection.
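The padding with dummy nodes and the one-to-one matching can be sketched as below. The solver is a textbook $O(N^3)$ Hungarian-style routine written for illustration; it is not the algorithm of [17] used in the experiments, and the names `pad_with_dummies`, `solve_assignment` and the `extra` parameter are hypothetical.

```python
def pad_with_dummies(cost, eps_d, extra=0):
    """Square a cost matrix by adding dummy rows/columns with constant cost eps_d."""
    n_rows, n_cols = len(cost), len(cost[0])
    size = max(n_rows, n_cols) + extra  # extra > 0 allows outliers on both shapes
    padded = [list(row) + [eps_d] * (size - n_cols) for row in cost]
    padded += [[eps_d] * size for _ in range(size - n_rows)]
    return padded

def solve_assignment(cost):
    """Return (assignment, total) for a square matrix; assignment[i] is the column of row i."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta, j1 = inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total
```

A real point whose assigned column index is at least the original number of columns has been matched to a dummy, which is how this sketch flags an outlier.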
The cost $C_{ij}$ for matching points can include an additional term based on the local appearance similarity at points $p_i$ and $q_j$. This is particularly useful when we are comparing shapes derived from gray-level images instead of line drawings. For example, one can add a cost based on color or texture similarity, SSD between small gray-scale patches, distance between vectors of filter outputs, similarity of tangent angles, and so on.
3.1 Invariance and Robustness
A matching approach should be (1) invariant under scaling and translation, and (2) robust under small affine transformations, occlusion and presence of outliers. In certain applications, one may want complete invariance under rotation, or perhaps even the full group of affine transformations. We now evaluate shape context matching by these criteria.
Footnote 2: Alternatives include Bickel's generalization of the Kolmogorov-Smirnov test for 2D distributions [4], which does not require binning.
Invariance to translation is intrinsic to the shape context definition since all measurements are taken with respect to points on the object. To achieve scale invariance we normalize all radial distances by the mean distance $\alpha$ between the $n^2$ point pairs in the shape.
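The scale normalization can be illustrated with a short sketch. The function names are hypothetical, and counting all $n^2$ ordered pairs (including the zero-length self-pairs) is one reading of the sentence above, stated here as an assumption.

```python
import math

def mean_pairwise_distance(points):
    """Mean distance over all n^2 ordered point pairs (self-pairs contribute 0)."""
    n = len(points)
    total = sum(math.hypot(ax - bx, ay - by)
                for ax, ay in points for bx, by in points)
    return total / (n * n)

def normalized_radii(points, index):
    """Distances from points[index] to every other point, divided by the mean distance."""
    alpha = mean_pairwise_distance(points)
    px, py = points[index]
    return [math.hypot(qx - px, qy - py) / alpha
            for j, (qx, qy) in enumerate(points) if j != index]
```

Scaling every coordinate by the same positive factor scales both the radii and the mean distance by that factor, so the normalized radii are unchanged.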
Since shape contexts are extremely rich descriptors, they are inherently insensitive to small perturbations of parts of the shape. While we have no theoretical guarantees here, robustness to small affine transformations, occlusions and presence of outliers is evaluated experimentally in Sect. 4.1.
In the shape context framework, we can provide for complete rotation invariance if this is desirable for an application. Instead of using the absolute frame for computing the shape context at each point, one can use the tangent vector at each point as the positive $x$-axis. In this way the reference frame turns with the tangent angle, and the result is a completely rotation invariant descriptor. In the extended version of this paper [3] we demonstrate this experimentally using the dataset from Kimia and collaborators [26].
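A minimal sketch of the tangent-relative frame: the angle of each neighbor is measured from the tangent direction at the reference point rather than from the absolute $x$-axis. The function name `relative_angle` is hypothetical, and the tangent angle is assumed to be supplied by the caller (how it is estimated is not specified above).

```python
import math

def relative_angle(p, q, tangent_angle):
    """Angle of the vector q - p measured from the tangent direction at p, in [0, 2*pi)."""
    theta = math.atan2(q[1] - p[1], q[0] - p[0])
    return (theta - tangent_angle) % (2.0 * math.pi)
```

Rotating the shape rotates both the vector $q - p$ and the tangent by the same angle, so the returned value, and hence the angular bin, stays the same.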
3.2 Related Work
The most comprehensive body of work on shape correspondence in this general setting is the work of Rangarajan and collaborators [12, 7]. They developed an iterative optimization algorithm to determine point correspondences and underlying image transformations jointly, where typically some generic transformation class is assumed, e.g. affine or thin plate splines. The cost function that is being minimized is the sum of Euclidean distances between a point on the transformed first shape and the second shape. This sets up a chicken-and-egg problem: the distances make sense only when there is at least a rough alignment of shape. Joint estimation of correspondences and shape transformation leads to a difficult, highly non-convex optimization problem, which is addressed using deterministic annealing [12]. The shape context is a very discriminative point descriptor, facilitating easy and robust correspondence recovery by incorporating global shape information into a local descriptor.
As far as we are aware, the shape context descriptor and its use for matching 2D shapes is novel. The most closely related idea in past work is that due to Johnson and Hebert [16] in their work on range images. They introduced a representation for matching dense clouds of oriented 3D points called the "spin image". A spin image is a 2D histogram formed by spinning a plane around a normal vector on the surface of the object and counting the points that fall inside bins in the plane.
4 Modeling Transformations
Given a set of correspondences between two shapes, one can proceed to estimate a transformation that maps the model into the target. For this purpose there are several options; perhaps most common is the affine model. In this work, we use the thin plate spline (TPS) model, which is commonly used for representing flexible coordinate transformations [30, 25]. Bookstein [5], for example, found it to be highly effective for modeling changes in biological forms. The thin plate spline is the 2D generalization of the cubic spline. In its regularized form, which is discussed below, the TPS model includes the affine model as a special case. We will now provide some background information on the TPS model.
Let $v_i$ denote the target function values at corresponding locations $p_i = (x_i, y_i)$ in the plane, with $i = 1, 2, \ldots, n$. In particular, we will set $v_i$ equal to $x'_i$ and $y'_i$ in turn to obtain one continuous transformation for each coordinate. We assume that the locations $(x_i, y_i)$ are all different and are not collinear. The TPS interpolant $f(x, y)$ minimizes the bending energy
$$I_f = \iint_{\mathbb{R}^2} \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx\, dy$$
and has the form:
$$f(x, y) = a_1 + a_x x + a_y y + \sum_{i=1}^{n} w_i\, U\!\left( \| (x_i, y_i) - (x, y) \| \right) \qquad (2)$$
where $U(r) = r^2 \log r$. In order for $f(x, y)$ to have square integrable second derivatives, we require that
$$\sum_{i=1}^{n} w_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i y_i = 0 \qquad (3)$$
Together with the interpolation conditions, $f(x_i, y_i) = v_i$, this yields a linear system for the TPS coefficients:
$$\begin{pmatrix} K & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} w \\ a \end{pmatrix} = \begin{pmatrix} v \\ 0 \end{pmatrix} \qquad (4)$$
where $K_{ij} = U(\| (x_i, y_i) - (x_j, y_j) \|)$, the $i$th row of $P$ is $(1, x_i, y_i)$, $w$ and $v$ are column vectors formed from $w_i$ and $v_i$, respectively, and $a$ is the column vector with elements $a_1, a_x, a_y$. We will denote the $(n + 3) \times (n + 3)$ matrix of this system by $L$. As discussed e.g. in [25], $L$ is nonsingular and we can find the solution by inverting $L$. In terms of the upper left $n \times n$ block of $L^{-1}$, it can be shown that […]
To address the dependence of $\lambda$ on the data scale, suppose $(x_i, y_i)$ and $(x'_i, y'_i)$ are replaced by $(\alpha x_i, \alpha y_i)$ and $(\alpha x'_i, \alpha y'_i)$, respectively, for some positive constant $\alpha$. Then it can be shown that the parameters $w$, $a$, $I_f$ of the optimal thin plate spline are unaffected if $\lambda$ is replaced by $\alpha^2 \lambda$. This simple scaling behavior suggests a normalized definition of the regularization parameter. Let $\alpha$ again represent the scale of the point set as estimated by the mean edge length between two points in the set. Then we can define $\lambda$ in terms of $\alpha$ and $\lambda_o$, a scale-independent regularization parameter, via the simple relation $\lambda = \alpha^2 \lambda_o$.
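The linear system (4) and the relation $\lambda = \alpha^2 \lambda_o$ can be sketched together as below, for one coordinate function at a time. This is an illustrative sketch only: all function names are hypothetical, the plain Gaussian elimination stands in for whatever solver one prefers, "mean edge length" is read as the mean distance over pairs of distinct points, and replacing $K$ by $K + \lambda I$ for $\lambda > 0$ is the usual regularized TPS formulation, assumed here because the passage defining the regularization is not available above.

```python
import math

def tps_kernel(r):
    """U(r) = r^2 log r, with the limit U(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def mean_edge_length(points):
    """Scale estimate alpha: mean distance over all pairs of distinct points."""
    n = len(points)
    total = sum(math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2.0)

def fit_tps(points, values, lambda_o=0.0):
    """Fit one coordinate function; returns (w, a) with a = (a_1, a_x, a_y).

    lambda_o = 0 gives the exact interpolant of system (4). For lambda_o > 0,
    K is replaced by K + lambda * I with lambda = alpha^2 * lambda_o (assumed form).
    """
    n = len(points)
    lam = mean_edge_length(points) ** 2 * lambda_o
    size = n + 3
    L = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            L[i][j] = tps_kernel(math.hypot(xi - xj, yi - yj))
        L[i][i] += lam
        for k, entry in enumerate((1.0, xi, yi)):
            L[i][n + k] = entry  # block P
            L[n + k][i] = entry  # block P^T
    solution = solve_linear(L, list(values) + [0.0, 0.0, 0.0])
    return solution[:n], solution[n:]

def eval_tps(points, w, a, x, y):
    """Evaluate f(x, y) as in Eq. (2)."""
    bend = sum(wi * tps_kernel(math.hypot(xi - x, yi - y))
               for wi, (xi, yi) in zip(w, points))
    return a[0] + a[1] * x + a[2] * y + bend
```

Calling `fit_tps` once with the target $x'$ coordinates and once with the target $y'$ coordinates gives the two coordinate functions of the warp. When the targets are an exact affine function of the source locations, the weights $w$ come out as zero, which reflects the statement that the TPS model contains the affine model as a special case.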
In order to study the robustness of our proposed method, we performed the synthetic point set matching experiments described in [7]. The experiments are broken into three parts designed to measure robustness to deformation, noise, and outliers. (The latter tests each include a "moderate" amount of deformation.) In each test, we subjected the model point set to one of the above distortions to create a "target" point set. We then ran our algorithm to find the best warping between the model and the target. Finally, the performance is quantified by computing the average distance between the coordinates of the warped model and those of the target. The results are shown in Fig. 4. More details of the experiments may be found in [2].
5 Shape Similarity and Recognition
We define the shape distance $D(P, Q)$ between shapes $P$ and $Q$ as a weighted sum of three terms: shape context distance, image appearance distance and bending energy. We will demonstrate the use of this distance for recognition in a nearest-neighbor classifier for a number of different object recognition problems.
We measure shape context distance between shapes $P$ and $Q$ as the symmetric sum of shape context matching costs over best matching points, i.e.
$$D_{sc}(P, Q) = \frac{1}{n} \sum_{p \in P} \min_{q \in Q} C(p, T(q)) + \frac{1}{m} \sum_{q \in Q} \min_{p \in P} C(p, T(q))$$
where $T(\cdot)$ denotes the estimated TPS shape transformation.
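Given a matrix of matching costs, the symmetric sum above reduces to row minima and column minima. The sketch below assumes `cost[i][j]` already holds $C(p_i, T(q_j))$ for the $n$ points of $P$ and the $m$ points of $Q$; the function name is hypothetical.

```python
def shape_context_distance(cost):
    """Symmetric sum of best-match costs; cost[i][j] = C(p_i, T(q_j))."""
    n, m = len(cost), len(cost[0])
    # average over P of the cheapest match in Q
    best_for_p = sum(min(row) for row in cost) / n
    # average over Q of the cheapest match in P
    best_for_q = sum(min(cost[i][j] for i in range(n)) for j in range(m)) / m
    return best_for_p + best_for_q
```

Unlike the one-to-one assignment used to find correspondences, each point here simply takes its cheapest partner, so several points may share the same best match.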
Often there is additional appearance information available that is not captured by our notion of shape, e.g. local image patches, textural information, color, etc. As a key benefit of the shape matching framework, the distorted image can be warped back into a normal form after recovery of the underlying 2D image transformation, thus correcting for distortions of the image appearance. We used a term $D_{ac}(P, Q)$ for appearance cost which is the sum of squared differences in Gaussian windows around corresponding points.
The third term corresponds to the "amount" of transformation necessary to align the shapes. In the TPS case the bending energy $I_f$ is a natural measure, and we use it as $D_{be}(P, Q)$.
5.1 Digit Recognition
We begin with results on the well-known MNIST dataset of handwritten digits, which consists of 60,000 training and 10,000 test digits [21]. Matching used 100 point samples selected from the Canny edges of each digit image. We employed a TPS transformation model and used 3 iterations of shape context matching and TPS re-estimation. We used a nearest neighbor classifier with $D(P, Q)$ as defined above.
Nearest neighbor classifiers have the property that as the number of examples $n$ in the training set $\to \infty$, the 1-NN error converges to a value $\leq 2E^*$, where $E^*$ is the Bayes Risk (for $k$-NN, by making $k \to \infty$ and $k/n \to 0$, the error $\to E^*$). However, what matters in practice is the performance for small $n$, and this gives us a way to compare different similarity/distance measures. In Fig. 5, our shape distance is compared to SSD (sum of squared differences between pixel brightness values of images regarded as vectors).
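A nearest-neighbor classifier only needs a distance function, which is what makes it a convenient way to compare distance measures. The sketch below is generic: `knn_classify` is a hypothetical name, and `distance` stands for any measure such as the shape distance or SSD.

```python
from collections import Counter

def knn_classify(query, training_set, distance, k=1):
    """Majority label among the k training examples closest to query.

    training_set is a list of (example, label) pairs; distance(a, b) returns a float.
    """
    ranked = sorted(training_set, key=lambda item: distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For instance, with scalar examples and `distance=lambda a, b: abs(a - b)`, the same query can receive different labels for `k=1` and `k=3`, depending on how the nearest examples are labeled.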
On the MNIST dataset nearly 30 algorithms have been compared (http://www.research.att.com/~yann/exdb/mnist/index.html). The lowest test set error rate published at this time is [value illegible] for a boosted LeNet-4 with a training set of size 60,000 [×, factor illegible] synthetic distortions per training digit. Our error rate using 20,000 training examples and $k$-NN [value of $k$ illegible] is [value illegible].
Figure 5. Handwritten digit recognition on the MNIST dataset. Left: Test set errors of a 1-NN classifier using SSD and Shape Distance (SD) measures. Right: Detail of performance curve for Shape Distance, including results with training set sizes of 15,000 and 20,000. Results are shown on a semilog-x scale for $K = 1, 3, 5$ nearest neighbors. (Both panels plot test set error rate against size of training set.)
5.2 MPEG-7 Shape Silhouette Database
Our next experiment involves the MPEG-7 shape silhouette database, specifically Core Experiment CE-Shape-1 part B, which measures performance of similarity-based retrieval [15]. The database consists of 1400 images: 70 shape categories, 20 images per category. The performance is measured using the so-called "bullseye test," in which each image is used as a query and one counts the number of correct images in the top 40 matches.
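The bullseye test can be sketched as follows. The function name is hypothetical, and two details are assumptions of this sketch rather than statements from the text: the query image itself is included among the candidates, and the count of correct retrievals is divided by the total number of same-category images to give a rate.

```python
def bullseye_score(dist, labels, top=40):
    """Fraction of same-category images retrieved within the top matches of each query.

    dist[i][j] is the distance from query i to image j; labels[i] is the category of image i.
    """
    n = len(labels)
    hits = 0
    possible = 0
    for i in range(n):
        ranked = sorted(range(n), key=lambda j: dist[i][j])[:top]
        hits += sum(1 for j in ranked if labels[j] == labels[i])
        possible += sum(1 for lab in labels if lab == labels[i])
    return hits / possible
```

With 20 images per category and `top=40`, a perfect score requires every query to place all 20 images of its category among its 40 nearest matches.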
As this experiment involves intricate shapes we increased the number of samples from 100 to 300. In some categories the shapes appear rotated and flipped, which we address using a modified distance function. The distance $\mathrm{dist}(R, Q)$ between a reference shape $R$ and a query shape $Q$ is defined as […]
With these changes in place but otherwise using the same approach as in the MNIST digit experiments, we obtain a retrieval rate of 76.51%. Currently the best published performance is achieved by Latecki et al. [20], with a retrieval rate of 76.45%, followed by Mokhtarian et al. [23] at 75.44%.
Figure 6. Left: 3D object recognition using the COIL-20 dataset. (Plot of test set error rate against average number of prototypes per object, with curves for SSD, SD and SD-proto.) […] of an object. In this example we observe that the Anacin box requires twice as many views as the baby powder bottle.
5.3 Columbia COIL-20 Database
Our next experiment involves the 20 common household objects from the COIL-20 database [24]. Each object was placed on a turntable and photographed every $5^\circ$ for a total of 72 views per object. We prepared our training sets by selecting a number of equally spaced views for each object and using the remaining views for testing. The matching algorithm and shape distance are exactly the same as for digits.
Fig. 6(a) shows the performance of a nearest-neighbor classifier using our shape distance as well as SSD (sum of squared differences). SSD performs very well on this easy database due to the lack of variation in lighting [14] (PCA just makes it faster).
In a companion paper [2] we recently developed a novel editing algorithm based on shape context similarity and $k$-medoid clustering. The editing algorithm is illustrated in Fig. 6(b). More views are chosen for visually complex categories. This idea is related to the "aspect" concept as discussed in [18]. The curve marked SD-proto in Fig. 6(a) shows the improved classification performance using this prototype selection strategy instead of equally spaced views. Note that we obtain a 2.4% error rate with an average of only 4 two-dimensional views for each three-dimensional object, thanks to the flexibility provided by the matching algorithm.
5.4 Trademark Retrieval
The automatic identification of trademark infringement is of commercial interest. Currently, trademarks are broadly classified according to the Vienna code, and infringements are detected by manually looking for close perceptual similarity in an appropriate category. Shape, together with text and texture, is key in defining perceptual similarity. Using our notion of shape distance, Fig. 7 depicts nearest neighbor retrieval results from a database of 300 trademarks. We experimented with eight different query trademarks, for each of which the database contained at least one potential infringement. It is clearly seen that the potential infringements are easily detected and appear as most similar on the top ranks despite substantial variation of the actual shapes. It has been manually verified that no visually similar trademark has been missed by the algorithm.
6 Conclusion
We have presented a new approach to the analysis of shape. A key characteristic of our approach is the estimation of shape similarity and correspondences based on a novel descriptor, the shape context. In our experiments we have demonstrated excellent performance on a wide variety of datasets, both of 2D and 3D objects.
Acknowledgments This research is supported by (ARO) DAAH04-96-1-0341, the Digital Library Grant IRI-9411334, an NSF graduate Fellowship for S.B. and the German Research Foundation by DFG grant PU-165/1. We wish to thank H. Chui and A. Rangarajan for providing the synthetic testing data used in Sect. 4.1.
References
[1] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers. IEEE Trans. PAMI, 19(11):1300–1305, November 1997.
[2] S. Belongie, J. Malik, and J. Puzicha. Shape context: A new descriptor for shape matching and object recognition. In NIPS, November 2000.
[Figure caption fragment, beginning missing] […] and a weighted combination of shape context similarity $D_{sc}$ and the sum over local tangent orientation differences.
[6] C. Burges and B. Schölkopf. Improving the accuracy and speed of support vector machines. In NIPS, pages 375–381, 1997.
[7] H. Chui and A. Rangarajan. A new algorithm for non-rigid point matching. In CVPR, volume 2, pages 44–51, June 2000.
[8] T. Cootes, D. Cooper, C. Taylor, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding (CVIU), 61(1):38–59, Jan. 1995.
[9] M. Fischler and R. Elschlager. The representation and matching of pictorial structures. IEEE Trans. Computers, C-22(1):67–92, 1973.
[12] S. Gold, A. Rangarajan, C.-P. Lu, S. Pappu, and E. Mjolsness. New algorithms for 2D and 3D point matching: pose estimation and correspondence. Pattern Recognition, 31(8), 1998.
[13] U. Grenander, Y. Chow, and D. Keenan. HANDS: A Pattern Theoretic Study Of Biological Shapes. Springer, 1991.
[14] D. Huttenlocher, R. Lilien, and C. Olson. View-based recognition using an eigenspace approximation to the Hausdorff measure. IEEE Trans. PAMI, 21(9):951–955, Sept. 1999.
[15] S. Jeannin and M. Bober. Description of core experiments for MPEG-7 motion/shape. Technical Report ISO/IEC JTC 1/SC29/WG11 MPEG99/N2690, MPEG-7, Seoul, March 1999.
[16] A. E. Johnson and M. Hebert. Recognizing objects by matching oriented points. In CVPR, pages 684–689, 1997.
[17] R. Jonker and A. Volgenant. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38:325–340, 1987.
[19] M. Lades, C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R. Würtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Computers, 42(3):300–311, March 1993.
[20] L. J. Latecki, R. Lakämper, and U. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In CVPR, pages 424–429, 2000.
[22] B. Moghaddam, T. Jebara, and A. Pentland. Bayesian face recognition. Pattern Recognition, 33(11):1771–1782, November 2000.
[23] F. Mokhtarian, S. Abbasi, and J. Kittler. Efficient and robust retrieval by shape content through curvature scale space. In A. W. M. Smeulders and R. Jain, editors, Image Databases and Multi-Media Search, pages 51–58. World Scientific, 1997.
[24] H. Murase and S. Nayar. Visual learning and recognition of 3-D objects from appearance. Int. Journal of Computer Vision, 14(1):5–24, Jan. 1995.
[25] M. J. D. Powell. A thin plate spline method for mapping curves into curves in two dimensions. In Computational Techniques and Applications (CTAC95), Melbourne, Australia, 1995.
[26] D. Sharvit, J. Chan, H. Tek, and B. Kimia. Symmetry-based indexing of image databases. J. Visual Communication and Image Representation, 1998.
[27] D. W. Thompson. On Growth and Form. Dover, 1917.
[28] R. C. Veltkamp and M. Hagedoorn. State of the art in shape matching. Technical Report UU-CS-1999-27, Utrecht, 1999.
[29] T. Vetter, M. J. Jones, and T. Poggio. A bootstrapping algorithm for learning linear models of object classes. In CVPR, pages 40–46, 1997.
[30] G. Wahba. Spline Models for Observational Data. SIAM, 1990.
[31] A. Yuille. Deformable templates for face recognition. J. Cognitive Neuroscience, 3(1):59–71, 1991.
[32] C. Zahn and R. Roskies. Fourier descriptors for plane closed curves. IEEE Trans. Computers, 21(3):269–281, March 1972.