IEEE TRANSACTIONS ON COMPUTERS, VOL. C-22, NO. 1, JANUARY 1973
[4] K. D. Senne and R. S. Bucy, "Digital realization of optimaldiscrete-time nonlinear estimators," in Proc. 4th Annu. PrincetonConf. Syst. Sci., 1970.
[5] R. S. Bucy and K. D. Senne, "Digital synthesis of non-linearfilters," Automatica, vol. 7, pp. 287-298, 1971.
[6] E. C. Tacker and T. D. Linton, "Digital and hybrid simulationof a discrete-time optimal nonlinear filter," in Proc. 4th HawaiiInt. Conf. Syst. Sci., 1971, pp. 465-467.
[7] -, "Digital and hybrid simulation of a Bayes-optimal non-linear filter," Studies in Digital Automata, Louisiana StateUniv., Baton Rouge, Air Force Office of Scientific ResearchContract F-44620-68-C-0021, Tech. Rep. LSU-T-TR-40, Sept.1970.
[8] R. E. Mortensen, "Mathematical problems of modeling sto-chastic nonlinear dynamic systems," J. Statist. Phys., vol. 1, no.2, 1969.
[9] C. W. Helstrom, "Markov processes and applications," in Com-munication Theory, A. V. Balakrishnan, Ed. New York:McGraw-Hill, 1968.
[10] R. S. Bucy, M. J. Merritt, and D. S. Miller, "Hybrid computersynthesis of optimal discrete nonlinear dynamic systems," inProc. 2nd Symp. Nonlinear Estimation Theory and Its Applica-tions (San Diego, Calif.), 1971.
[11] ,"Hybrid synthesis of the optimal discrete nonlinear filter,"Stochastics, vol. 1, Jan. 1973.
[12] D. S. Miller, Ph.D. dissertation, Dep. Elec. Eng., Univ. South-ern California, Los Angeles, 1972.
The Representation and Matching of Pictorial Structures

MARTIN A. FISCHLER AND ROBERT A. ELSCHLAGER
Abstract-The primary problem dealt with in this paper is the following. Given some description of a visual object, find that object in an actual photograph. Part of the solution to this problem is the specification of a descriptive scheme, and a metric on which to base the decision of "goodness" of matching or detection.

We offer a combined descriptive scheme and decision metric which is general, intuitively satisfying, and which has led to promising experimental results. We also present an algorithm which takes the above descriptions, together with a matrix representing the intensities of the actual photograph, and then finds the described object in the matrix. The algorithm uses a procedure similar to dynamic programming in order to cut down on the vast amount of computation otherwise necessary.

One desirable feature of the approach is its generality. A new programming system does not need to be written for every new description; instead, one just specifies descriptions in terms of a certain set of primitives and parameters.

There are many areas of application: scene analysis and description, map matching for navigation and guidance, optical tracking, stereo compilation, and image change detection. In fact, the ability to describe, match, and register scenes is basic for almost any image processing task.

Index Terms-Dynamic programming, heuristic optimization, picture description, picture matching, picture processing, representation.

Manuscript received November 30, 1971; revised May 22, 1972, and August 21, 1972.
The authors are with the Lockheed Palo Alto Research Laboratory, Lockheed Missiles & Space Company, Inc., Palo Alto, Calif. 94304.
INTRODUCTION

The primary problem dealt with in this paper is the following. Given some description of a visual object, find that object in an actual photograph. The object might be simple, such as a line, or complicated, such as an ocean wave, and the description can be linguistic, pictorial, procedural, etc. The actual photograph will be called the "sensed scene," a two-dimensional array of gray-level values, while the object being sought is called the "reference."

This ability to find a reference in a sensed scene, or, equivalently, to match or register the images of two scenes, is basic for almost any image processing task. Application to such areas as scene analysis and description, map matching for navigation and guidance, optical tracking, stereo compilation, and image change detection is direct and obvious.

There are two basic approaches to solving the image-matching problem as defined above.

If we possess a precise description of the noise and distortion process which defines the mapping between the reference and its image in the sensed scene, we can employ statistical decision theory techniques to derive an image-matching procedure which optimizes some objective criterion (e.g., minimum error, or minimum risk, in determining the best embedding of the reference in the sensed scene). A typical outcome of such an analysis is the use of a correlation-like matching procedure. However, in most practical problem situations, the required noise and distortion model is not available, nor is it feasible to attempt to construct one. For example, we might consider all human faces to be perturbed versions of some single ideal or reference face. However, to attempt to define completely a valid noise and distortion model for this situation would be a hopeless task.

A second and more general approach to the image-matching problem bypasses the need for a noise and distortion model by accepting an embedding metric without requiring its justification as being equivalent to a minimum error (or minimum risk) procedure. In fact, without a noise and distortion model, there is no theoretically valid way to derive or predict the error performance of a selected procedure prior to its actual application. Our primary concern in this paper is with this latter case.
We offer the following two sets of criteria for an embedding metric not theoretically derived on the basis of error performance. First, it must be successful in application (i.e., its observed error performance must be acceptable), intuitively satisfying (so we can have some confidence in its ability to deal with as yet untried applications), and general enough so that it can be employed over a wide range of problems without significant modification. Second, it must be possible to specify a computationally feasible decision algorithm which selects a suitable embedding based on the given embedding metric. (The combination of embedding metric and corresponding decision algorithm will be called an embedding model.) In the remainder of this paper, we will present an embedding metric and corresponding decision algorithm, present examples of the application of this embedding model to a number of distinct problem areas, and show how this embedding model is relevant to the problem of scene representation as well as to image matching.
AN EMBEDDING METRIC
To introduce the generic form of our embedding metric, and establish its intuitive validity, let us first consider the following process. Assume that the reference is an image on a transparent rubber sheet. We move this sheet over the sensed image and, at each possible placement, we pull or push on the rubber sheet to get the best possible alignment between the reference image on the sheet and the underlying sensed image. We evaluate each such embedding both by how good a correspondence we were able to obtain and by how much pushing and pulling we had to exert to obtain it.
Let us now consider a discrete version of the above process which is both more precise and more reasonable from an implementation standpoint. In a specific application, we might have some information on the range of permissible distortions that can occur between the reference and sensed images. For instance, some subset of the items appearing in the sensed image might always retain their internal shape even though their relative positions might be subject to change with respect to their locations in the reference scene. Further, where change of relative position is possible, we might be able to bound the extent of such change; and, finally, we might like to assign variable "costs" to the different types of change of relative position or relative change in some nongeometric attribute.

To achieve these capabilities, we replace the rubber sheet by a reference image which is composed of a number of rigid pieces (components) held together by "springs." A rigid piece of the reference image can be as small as a single resolution cell, or as large as the entire reference image, and corresponds to a single coherent entity in the reference image. The springs joining the rigid pieces serve both to constrain relative movement and to measure the "cost" of the movement by how much they are "stretched." (Typically, the springs will be highly nonlinear in their behavior.) In determining the cost of an embedding, we measure the "tension" on each spring (the tension can be a function of direction as well as stretch or even a relative change in some locally defined attribute), and also make a local evaluation of how well each coherent piece is embedded as an independent entity.

The above model permits two interesting dichotomies.
The first dichotomy is the separation of "syntactic" and "semantic" information. The semantic information, which is application dependent, is embodied in the specific partitioning of the reference into coherent pieces, the placement and cost functions assigned to the springs, and the cost functions associated with the independent embedding of the coherent pieces. The syntactic information, which is relatively independent of the particular application, defines the class of descriptions which the algorithm can process. These data are embodied in the limits set on reference decomposition (e.g., number and maximum size of pieces, etc.); in the formats which must be employed to specify the global constraints and costs; and in the form of the embedding metric which evaluates "global" fit. The separation of semantic and syntactic information is essential to permit application of the model to a broad range of problem areas without the necessity of making significant changes in the implementation.

The second dichotomy is the separation between the local and global evaluation functions. The global evaluation function, associated with the relative positioning of the coherent pieces as described previously, has strong syntactic controls on its form to permit its integration directly into the decision algorithm. This is important because the global evaluation produces the most severe combinatorial problems. A local evaluation function, associated with how well a given coherent piece is independently embedded, is easily changed from problem to problem (based on problem-dependent considerations) without requiring any change in the core algorithms. Thus, the form of a local evaluation function can be a (conventional) correlation function together with a pictorial reference component, or a procedure based on linguistic concepts together with a formal description of a reference component,¹ or even a series of guesses inserted interactively by a human evaluator. The decoupling of the local evaluation functions from the core algorithms provides a great deal of flexibility in making changes or improvements in the evaluation functions for a given problem, as well as when switching from problem to problem. Further, because of the above separation, the performance of the algorithms (both local and global) can be independently evaluated in a direct and intuitively obvious manner. Such an evaluation then permits iterative improvement in performance by selective alteration in the problem-dependent options.

We are now in a position to present formally the proposed embedding metric. Let the reference be composed of $p$ components (i.e., $p$ coherent, or primitive, pieces). For $1 \le i \le p$, let $x_i$ be a variable ranging over the set of all locations of the sensed scene. $x_i$ is defined to be the position of the $i$th component. Suppose there is a mechanism, either a computer program, or possibly a person, or some mechanical device, which, for location $x_i$ of the $i$th component, outputs a numerical value $l_i(x_i)$ that indicates how strongly the $i$th component fits at location $x_i$ of the sensed scene. The smaller $l_i(x_i)$, the better the fit.

While not formally required, the intent is that $l_i(x_i)$ measure the presence of the $i$th component at a location in the sensed scene independent of any knowledge of the locations of the other components. That is, $l_i(x_i)$ is a purely local and possibly imprecise measure of the presence of the $i$th component at location $x_i$.
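As one hedged illustration of such a local measure, the sketch below computes $l_i$ as a sum of squared gray-level differences between a small pictorial component and every placement in a sensed scene. The function name `local_cost_map`, the toy arrays, and the choice of squared differences are assumptions made for illustration only; the text requires nothing more than that smaller values indicate a better fit.

```python
def local_cost_map(scene, template):
    """Return a dict mapping each placement (row, col) of the template's
    top-left corner to a local cost l_i(x_i). Smaller means a better fit.

    Assumption: the cost is a sum of squared gray-level differences;
    any other real-valued local measure could be substituted.
    """
    rows, cols = len(scene), len(scene[0])
    t_rows, t_cols = len(template), len(template[0])
    cost = {}
    for r in range(rows - t_rows + 1):
        for c in range(cols - t_cols + 1):
            cost[(r, c)] = sum(
                (scene[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(t_rows)
                for dc in range(t_cols)
            )
    return cost


# Toy sensed scene (gray levels) containing one exact copy of the component.
scene = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
]
template = [[9, 8], [7, 9]]

l_1 = local_cost_map(scene, template)
best = min(l_1, key=l_1.get)   # location with the smallest local cost
print(best, l_1[best])
```

Because the map is computed without reference to any other component, it can be replaced (for instance by a hand-entered table of guesses) without touching whatever code later combines the components.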
In addition to the purely local measure $l_i$, $1 \le i \le p$, there are the following considerations: 1) how well the different components are situated in the required spatial relations to each other; and 2) how relative values of attributes of the components compare with the corresponding measured values in the sensed image (e.g., we might want to specify that the $i$th component be thicker and more greenish than the $j$th component). The extent to which the above specifications are not satisfied is reflected in the "stretching" of the springs between the corresponding components.

¹ Note that we are now further generalizing the concept of "component." It no longer has to be a rigid entity defined pictorially, but rather may be any information structure or decision procedure which can be used to define a real-valued function whose domain of definition is the set of all locations in the sensed image.

Each location in the sensed image can be associated with a two-dimensional vector (e.g., the components of the vector can be the row and column number of the location in the sensed scene). In that case, $x_i - x_j$ (usual vector subtraction) is a vector pointing from $x_j$ to $x_i$. We can now let $g_{ij}(x_i, x_j) = g_{ij}(x_i - x_j)$ be the cost associated with the spring joining the $i$th and $j$th components. If there is no spring between these components, then $g_{ij}$ is identically zero.
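A spring of this kind can be sketched as a function of the displacement $x_i - x_j$ alone. In the hedged example below, `make_spring`, `ideal_offset`, `max_stretch`, and `stiffness` are hypothetical names and parameters, and the quadratic-with-cutoff shape is only one possible nonlinear spring; the text does not prescribe a particular form.

```python
import math


def make_spring(ideal_offset, max_stretch, stiffness=1.0):
    """Build a spring cost g_ij(x_i, x_j) that depends only on x_i - x_j.

    Assumptions (not from the paper): the cost grows quadratically with
    the deviation from an ideal (row, col) offset, and becomes infinite
    once the deviation exceeds max_stretch in either coordinate.
    """
    def g(x_i, x_j):
        d_row = (x_i[0] - x_j[0]) - ideal_offset[0]
        d_col = (x_i[1] - x_j[1]) - ideal_offset[1]
        if max(abs(d_row), abs(d_col)) > max_stretch:
            return math.inf          # relative placement not permitted
        return stiffness * (d_row ** 2 + d_col ** 2)
    return g


# Component 2 is expected 5 columns to the right of component 1.
g_21 = make_spring(ideal_offset=(0, 5), max_stretch=2)

print(g_21((10, 15), (10, 10)))   # relaxed spring
print(g_21((11, 16), (10, 10)))   # slightly stretched
print(g_21((10, 20), (10, 10)))   # stretched beyond the permitted range
```

Returning an infinite cost is one simple way to bound the extent of relative movement, which corresponds to deleting the placement from consideration.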
If we set $g_{ij}(x_i, x_j) = l_i(x_i)$ when $i = j$, and let $X_i = \{x_1, x_2, \ldots, x_i\}$, then the total cost of embedding $p$ components at locations $X_p$ is $G(X_p)$:

$$G(X_p) = \sum_{i=1}^{p} \sum_{j=1}^{i} g_{ij}(x_i, x_j). \tag{1}$$

Expression (1) can also be written as

$$G(X_p) = \sum_{i=1}^{p} h_i(X_i) \tag{2}$$

where

$$h_i(X_i) = \sum_{j=1}^{i} g_{ij}(x_i, x_j).$$

$h_i(X_i)$ can be thought of as the cost of embedding the $i$th component at location $x_i$, given that the previous $i - 1$ components are at the locations specified by $X_{i-1}$.
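To make (1) and (2) concrete, the following sketch evaluates $G(X_p)$ for one fixed placement by summing the $h_i$. It is an illustration under stated assumptions: locations are integers on a line rather than row and column pairs, indices start at 0, and the names `total_cost`, `local`, and `spring` are hypothetical.

```python
def total_cost(X, local, spring):
    """Evaluate G(X_p) as the sum of h_i(X_i), following (2).

    X      : list of locations, X[i] is the location of component i
    local  : list of functions, local[i](x) plays the role of l_i = g_ii
    spring : dict mapping (i, j) with j < i to a function g_ij(x_i, x_j);
             a missing key means no spring, i.e. g_ij is identically zero
    """
    def h(i):
        cost = local[i](X[i])                  # the j = i term of (1)
        for j in range(i):                     # the j < i terms of (1)
            g = spring.get((i, j))
            if g is not None:
                cost += g(X[i], X[j])
        return cost

    return sum(h(i) for i in range(len(X)))


# Three components on a line; each prefers its own position, and
# neighbouring components prefer to sit 3 locations apart.
local = [lambda x: abs(x - 2), lambda x: abs(x - 5), lambda x: abs(x - 9)]
spring = {
    (1, 0): lambda a, b: (a - b - 3) ** 2,
    (2, 1): lambda a, b: (a - b - 3) ** 2,
}

print(total_cost([2, 5, 9], local, spring))   # perfect local fit, one stretched spring
print(total_cost([2, 5, 8], local, spring))   # relaxed springs, one imperfect local fit
```

Both placements cost 1 in this toy case, which shows how the metric trades local fit against spring tension.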
COMPUTATIONAL PROCEDURES

In this section of the paper, we will present computational procedures for locating a suitable embedding of one image in another, based on the embedding metric just presented. A discussion of dynamic programming (DP) is included to place our proposed algorithm [the "linear embedding algorithm" (LEA)] in proper perspective. In particular, a generic (but computationally impractical) approach to solving the embedding problem is some form of DP. The specific form of our embedding metric permits a simplification of the general DP formulation, and the LEA is offered as a computationally feasible approximation to this restricted DP formulation. A graph theoretic interpretation is included to provide a better intuitive appreciation of the LEA in relation to DP.
Let us assume that the sensed image, designated by the abbreviation SM, is composed of $M$ resolution elements; while the reference, designated by the abbreviation RM, is composed of $P$ pictorially defined components (coherent pieces) with a total of $N = \sum_{i=1}^{P} n_i$ resolution elements, $n_i$ being the number of resolution elements in the $i$th component.

The most direct procedure for locating a best embedding is to select combinationally $N$ resolution elements at a time from the SM, determine if each such selection satisfies the coherent (intracomponent) and global (intercomponent) constraints, and, if acceptable, then evaluate the embedding metric for the given selection. Obviously, such an approach is completely impractical even for small pictures. For example, a $50 \times 30$ SM ($M = 1500$) and a $5 \times 5$ RM ($N = 25$) would require more than $10^{54}$ selections and evaluations, a hopelessly large number. If we assume that the coherent and relational constraints provide us with $P = 6$ nonoverlapping components, each component sequentially constrained to stay within $w = 10$ locations referenced to the location of the previously placed component, then we would still be required to perform on the order of $1500 \times 10^5 = 1.5 \times 10^8$ evaluations. Assuming $10^{-3}$ s per evaluation, we would require $1.5 \times 10^5$ s or approximately two days of computation time. It is thus obvious that a more effective technique is required.
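The counts quoted above can be checked with a few lines of arithmetic. This sketch assumes that "selecting combinationally" means choosing $N$ of the $M$ resolution elements without regard to order, and that the constrained count is $M \cdot w^{P-1}$ (the first component anywhere, each later one within $w$ locations of its predecessor); the variable names are illustrative.

```python
from math import comb

M = 50 * 30          # resolution elements in the sensed image
N = 5 * 5            # resolution elements in the reference

# Unconstrained selection of N elements out of M.
exhaustive = comb(M, N)
print(len(str(exhaustive)) - 1)      # power of ten of the exhaustive count

# Constrained case: P components, each within w locations of the previous one.
P, w = 6, 10
constrained = M * w ** (P - 1)
seconds = constrained * 1e-3         # at 10^-3 s per evaluation
print(constrained, seconds / 86400)  # evaluations, and days of computing
```

The constrained figure works out to roughly 1.7 days, which the text rounds to "approximately two days."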
Dynamic Programming
Expressed in formal terms, the evaluation of the embedding metric for a typical picture results in a nonlinear, integer programming problem with local optima different from the global optimum (i.e., no particular regularity, such as unimodality). The only available class of computational procedures for finding the global optimum under the above conditions (other than the exhaustive search techniques discussed earlier) is usually designated by the generic name dynamic programming.

DP is a multistage or iterative optimization procedure which can be described in general terms as follows (see [6] and [7]). We wish to find

$$\min_{X} G(X) = \min_{X} \sum_{i \in I} h_i(X^i) \tag{3}$$

where $X = \{x_1, x_2, \ldots, x_p\}$, each $x_i$, $1 \le i \le p$, ranges over a set of vectors with discrete components, $I = \{1, 2, \ldots, p\}$, and $X^i$ is the set of those variables (among $X$) upon which $h_i$ depends. $\{h_i\}$, $1 \le i \le p$, are a given set of real-valued functions.²
Let $|X^i|$ be the number of variables in $X^i$, and let each $x_i$, $1 \le i \le p$, range over $M$ values. Then each component $h_i(X^i)$ of the cost function is specified by means of a table with $|X^i| + 1$ columns and $M^{|X^i|}$ rows.

The solution proceeds as follows. We select a variable $y_1 \in X$ and compute the following expressions (this gives us the minimization of $G$ with respect to $y_1$):

$$f_1(\Gamma(y_1)) = \min_{y_1} \sum_{i \in I_1} h_i(X^i) \tag{4}$$

$$y_1^*(\Gamma(y_1)) = \text{the value of } y_1 \text{ which minimizes expression (4)} \tag{5}$$

where $\Gamma(y_1)$ is the set of all those variables (except $y_1$) which occur in any one of those $X^i$ which contains $y_1$. In other words, $\Gamma(y_1)$ is the set of variables which interacts with $y_1$. $I_1$ is the set of those $i$ such that $X^i$ contains $y_1$. $y_1^*$ is the optimizing assignment for $y_1$ as a function of the variables of $\Gamma(y_1)$.

² The $\{h_i\}$ defined in (2) are independent of all $x_j$ for $j > i$; the $\{h_i\}$ defined in the general DP formulation can be a function of any $x_i$.

The operation described by (4) is called the elimination of the variable $y_1$ and results in the following transformation of (3):

$$\min_{X} G(X) = \min_{X - \{y_1\}} \Big[ f_1(\Gamma(y_1)) + \sum_{i \in I - I_1} h_i(X^i) \Big] \tag{6}$$

Expression (6) has the same form as (3), but does not contain $y_1$. Thus, we can find an optimal assignment for $X$ by sequentially "eliminating" all of the variables, and then tracing back through the stored tables of $y_i^*(\Gamma(y_i))$, where $\Gamma(y_i)$ can only contain $y_j$ such that $j > i$. That is, we must eventually reach a point where, for some $s$,
$$\bigcup_{j=1}^{s} I_j = I$$

and expression (6) has the form

$$\min_{X} G(X) = \min_{(y_{s+1}, \ldots, y_p)} \big[ f_s(y_{s+1}, \ldots, y_p) \big]. \tag{7}$$

From (7) we can directly determine the global minimum cost for $G$ and also the optimizing values for $Y = (y_s, y_{s+1}, \ldots, y_p)$. Given the value for $Y$, we can determine the value of $y_{s-1}^*$ from the stored table for $y_{s-1}^*(\Gamma(y_{s-1}))$, as indicated in expression (5). This "backward" recursive process is continued to provide us with the complete optimizing assignment for $X$.

The computational feasibility of the DP approach depends on storage and computing time requirements. For a given objective function (in our case, the embedding metric), storage and computing time requirements are a function of the order in which the variables are eliminated [3], [7] and the dimensionality of each of the eliminated variables. We will say that variable $y_i$ has dimensionality $|\Gamma(y_i)|$, where $|\Gamma(y_i)|$ is the number of variables in the set $\Gamma(y_i)$ as defined following (4) and (5). For many of the problems we shall be concerned with, the dimensionality will be essentially constant over all variables (i.e., a constant number of springs attached to each component and a symmetric interconnection topology) and relatively independent of order of elimination. Let us associate the dimensionality of the variables with the embedding function itself, employing the designation $D(G)$ to denote the maximum dimensionality of any of the variables to be eliminated for a given order of elimination.

Where the dimensionality of all variables is constant for a given order of elimination, the complete embedding procedure will involve the iterative application of (4) and (5) $[p - D(G)]$ times, to evaluate the $p$ arguments of $G$ corresponding to the embedding of the $p$ components. In the $k$th iteration,³ we compute and store a table for $f_k$ and $y_k^*$. The number of lookups required for the construction of this $k$th table will be proportional to the number⁴ of its rows $[M^{D(G)}]$ and (for each row) the constrained number of feasible locations of the variable being eliminated (denoted $W_k$, where $W_k \le M$). Thus we have a storage requirement of $2M^{D(G)}$ elements per table (the two entries stored in each row of the table are values of $f_k$ and $y_k^*$), and a computation requirement of up to $M^{D(G)+1}$ lookups (with one or more additions and comparisons per lookup).

³ The $k$th iteration evaluates the "cost" of embedding the $k$th component at $y_k^*$, given some specific embedding of the components associated with the variables in $\Gamma(y_k)$. This evaluation corresponds to the elimination of $y_k$.
If $W_k = W$ for all $k$, and all variables have dimensionality $D(G)$, the complete embedding procedure then has a computation requirement proportional to

$$[p - D(G)]\,[M^{D(G)}]\,W \ \text{lookups}$$

and a static (secondary store) storage requirement (for the "backward" selection of the $y_i$ after obtaining the minimum global cost) proportional to

$$[p - D(G)]\,[M^{D(G)}] \ \text{entries.} \tag{10}$$
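For the unit-dimensionality case, where the components form a chain and each $h_i$ involves only $x_{i-1}$ and $x_i$, the elimination and backward tracing described above can be sketched as follows. This is a minimal illustration, not the authors' program: locations are integers on a line, indices start at 0, no feasibility window is applied, and `dp_chain`, `chain_cost`, `local`, and `spring` are hypothetical names. A brute-force search is included only to confirm the result on the toy data.

```python
from itertools import product


def chain_cost(X, local, spring):
    """Cost of a full placement X on a chain: local fits plus neighbour springs."""
    cost = sum(local[i](X[i]) for i in range(len(X)))
    cost += sum(spring[i](X[i], X[i - 1]) for i in range(1, len(X)))
    return cost


def dp_chain(local, spring, locations):
    """Eliminate x_0, x_1, ... in order, storing f_k and the argmin table y_k*."""
    p = len(local)
    f = {x: local[0](x) for x in locations}   # cost so far as a function of x_k
    tables = []                               # tables[k][x_next] = optimal x_k
    for k in range(p - 1):                    # p - D(G) eliminations with D(G) = 1
        f_next, star = {}, {}
        for x_next in locations:
            best = min(locations,
                       key=lambda x: f[x] + spring[k + 1](x_next, x))
            star[x_next] = best
            f_next[x_next] = (f[best] + spring[k + 1](x_next, best)
                              + local[k + 1](x_next))
        tables.append(star)
        f = f_next
    x_last = min(locations, key=f.get)        # minimize over the last variable
    assignment = [x_last]
    for star in reversed(tables):             # "backward" recursive process
        assignment.append(star[assignment[-1]])
    assignment.reverse()
    return f[x_last], assignment


locations = list(range(12))
local = [lambda x: abs(x - 2), lambda x: abs(x - 5), lambda x: abs(x - 9)]
spring = [None,                               # spring[0] is unused
          lambda a, b: (a - b - 3) ** 2,
          lambda a, b: (a - b - 3) ** 2]

cost, assignment = dp_chain(local, spring, locations)
brute = min(chain_cost(list(X), local, spring)
            for X in product(locations, repeat=len(local)))
print(cost, assignment, brute)
```

Each stored table has one row per value of the next variable, which is the $M^{D(G)}$ row count with $D(G) = 1$; adding a spring between non-adjacent components would force the tables to be indexed by pairs of variables instead.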
For the earlier example where $M = 50 \times 30 = 1500$ locations, $W_k = W = 10$ locations (for $1 \le k \le p$), and $p = 6$ components, if these components are connected in a linear sequence where each interior component has only one spring attached to each of its two immediate neighbors and the end components only one spring each, the number of computations⁵ would involve $75 \times 10^3$ lookups and evaluations, or 75 s of computing time at $10^{-3}$ s per computation. This is certainly a much more reasonable requirement than the two days of computing time for a direct evaluation. We pay for this speedup⁶ by having a fast storage requirement of $3 \times 10^3$ entries (or words), and backup storage requirement of $7.5 \times 10^3$ entries, versus a storage requirement of only a few entries for the direct evaluation.

Now, however, note that, if we permit just one additional spring per component (or even a single spring linking the first and last components), $D(G) = 2$ and the number of computations increases by a factor of almost $M = 1500$ to $9 \times 10^7$ lookups and evaluations, or $9 \times 10^4$ s = 25 h. The fast storage requirement increases by a factor of $M = 1500$ to $4.5 \times 10^6$ entries, and the backup storage requirement increases to $9 \times 10^6$ entries.
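The figures in this example can be reproduced from the per-table counts given earlier. The sketch below assumes $p - D(G)$ tables, each with $M^{D(G)}$ rows, $W$ lookups per row, and two stored entries per row; the lookup formula is inferred from that description rather than quoted, and `dp_requirements` is a hypothetical helper name.

```python
def dp_requirements(p, M, W, D, seconds_per_lookup=1e-3):
    """Return (lookups, fast storage, backup storage, seconds) for DP.

    Assumed model: (p - D) tables, M**D rows per table, W lookups per row,
    two entries (f_k and y_k*) held in fast storage for the current table,
    and one y_k* entry per row kept in backup storage for the backward pass.
    """
    rows = M ** D
    lookups = (p - D) * rows * W
    fast_storage = 2 * rows
    backup_storage = (p - D) * rows
    return lookups, fast_storage, backup_storage, lookups * seconds_per_lookup


for D in (1, 2):
    lookups, fast, backup, seconds = dp_requirements(p=6, M=1500, W=10, D=D)
    print(D, lookups, fast, backup, seconds / 3600)   # last value is hours
```

Under these assumptions the $D(G) = 1$ case gives 75 000 lookups and the $D(G) = 2$ case gives $9 \times 10^7$, a ratio of 1200, consistent with "a factor of almost $M = 1500$."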
This exercise demonstrates that, for even a small increase over unit dimensionality, the utility of dynamic programming as a computational technique is questionable for the embedding task. The next subsection introduces a heuristic modification to the DP type of sequential optimization which eliminates the growth of dimensionality as more global constraints (springs) are permitted.

⁴ Relational constraints can reduce the number of feasible rows to a value considerably below the unconstrained case. However, the computational problems, in attempting to take "advantage" of this reduction, may be prohibitive.

⁵ Assuming the variables are eliminated in the order in which they appear in the linear sequence, then $D(G) = 1$.

⁶ Dynamic programming can be considered to be a way of trading storage for computation time.
Linear Embedding
The sequential embedding technique which we present in this section will be called the linear embedding algorithm (LEA). The essential property of this algorithm is its ability to locate a suitable embedding with a linear, rather than exponential, growth of storage and computing time requirements as a function of the number of components in the reference. The algorithm is formally described as follows.
Given a reference with $p$ components, for $1 \le i \le p$, $l_i(x_i)$ is the externally supplied local evaluation array for the $i$th component as a function of $x_i$, its embedded location. Given that the $i$th component is embedded in location $x_i$, $w_i(x_i)$ is the constrained set of feasible locations for $x_{i-1}$.
If $y$ is the sequence (1, 3, 2, 5), then $y * 8$ will be another way of writing (1, 3, 2, 5, 8). $s_i(x_1, \ldots, x_i) = \sum_{j=1}^{i-1} g_{ij}(x_i, x_j)$, where each $g_{ij}$ is an externally supplied spring array, specified as a function of the relative embeddings of the $i$th and $j$th components. This decomposition of $s_i$ into a sum of two-place functions (springs) is not required in the following LEA. Thus, if desired, we could extend the scope of the embedding metric to include more complex relational forms without increasing the computational complexity of the embedding algorithm (LEA).

The LEA is the computation, in order, of the following sequence of $2p + 2$ equations. (Note that $h_i = s_i + l_i$.)

$$g_1(x_1) = l_1(x_1) \tag{11}$$

$$y_1(x_1) = x_1 \tag{12}$$

$$g_2(x_2) = \min_{x_1 \in w_2(x_2)} \big[ s_2(y_1(x_1) * x_2) + l_2(x_2) + g_1(x_1) \big] \tag{13}$$

$$y_2(x_2) = y_1(x_1) * x_2, \ \text{where } x_1 \text{ is that value which minimizes the previous equation.} \tag{14}$$

$$\vdots$$

$$g_i(x_i) = \min_{x_{i-1} \in w_i(x_i)} \big[ s_i(y_{i-1}(x_{i-1}) * x_i) + l_i(x_i) + g_{i-1}(x_{i-1}) \big] \tag{15}$$

$$y_i(x_i) = y_{i-1}(x_{i-1}) * x_i, \ \text{where } x_{i-1} \text{ is that value which minimizes the previous equation.} \tag{16}$$

$$\vdots$$

$$g_p(x_p) = \cdots \tag{17}$$

$$y_p(x_p) = \cdots \tag{18}$$

$$G = \min_{x_p} g_p(x_p) \tag{19}$$

$$Y = y_p(x_p), \ \text{where } x_p \text{ is that value which minimizes the previous equation.} \tag{20}$$
As in the subsection on dynamic programming, G [see (19)] is the total embedding cost, and the corresponding locations of the embedded components are determined from Y [see (20)].7
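The recursion ending in (19) and (20) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the names `lea` and `displacement_spring`, the zero-based component indices, and the dictionary representation of the arrays are all hypothetical, and $w_i(x_i)$ is taken to be every location, with infeasible combinations excluded by infinite spring costs. The data are those of the example in Fig. 1; the signed spring displacements are an assumption inferred from the rows tabulated in Fig. 1(c).

```python
import math

INF = math.inf

def lea(local, springs):
    """Linear embedding sketch.

    local[i][x]     -- l_i(x), local cost of component i at location x
    springs[(i, j)] -- g_ij(x_i, x_j) for j < i; a missing pair means no spring
    Returns (G, Y): total cost and the tuple of embedded locations.
    """
    g = dict(local[0])                 # g_1(x_1) = l_1(x_1)
    y = {x: (x,) for x in local[0]}    # y_1(x_1) = (x_1)
    for i in range(1, len(local)):
        links = [(j, f) for (a, j), f in springs.items() if a == i]
        g_new, y_new = {}, {}
        for xi, li in local[i].items():
            best, best_path = INF, None
            for xprev, gprev in g.items():
                path = y[xprev] + (xi,)            # y_{i-1}(x_{i-1}) * x_i
                s = sum(f(xi, path[j]) for j, f in links)
                cost = s + li + gprev
                if cost < best:                    # first minimizer wins ties
                    best, best_path = cost, path
            if best_path is not None:              # one stored path per location
                g_new[xi], y_new[xi] = best, best_path
        g, y = g_new, y_new
    if not g:
        return INF, None
    x_last = min(g, key=g.get)                     # G = min over x_p of g_p(x_p)
    return g[x_last], y[x_last]

def displacement_spring(table):
    """Spring cost looked up from the signed displacement x_i - x_j."""
    return lambda xi, xj: table.get((xi[0] - xj[0], xi[1] - xj[1]), INF)

# Sensed image SM(z, y) of Fig. 1(a), rows listed from y = 4 down to y = 1.
rows = {4: (5, 2, 8, 8), 3: (7, 5, 1, 3), 2: (8, 1, 5, 7), 1: (4, 3, 2, 4)}
SM = {(z, yy): v for yy, row in rows.items() for z, v in enumerate(row, start=1)}

C = (6, 4, 5, 3)                                   # reference values C_1..C_4
local = [{x: abs(v - c) for x, v in SM.items()} for c in C]

right = {(1, 0): 0, (2, 0): 1}    # assumed: component 2 lies to the right of 1
down = {(0, -1): 0, (0, -2): 1}   # assumed: 3 lies below 2, and 4 below 1
left = {(-1, 0): 0, (-2, 0): 1}   # assumed: component 4 lies to the left of 3
springs = {
    (1, 0): displacement_spring(right),
    (2, 1): displacement_spring(down),
    (3, 2): displacement_spring(left),
    (3, 0): displacement_spring(down),
}

G, Y = lea(local, springs)
print(G, Y)
```

Under these assumptions the sketch returns G = 6 with Y = ((2, 3), (3, 3), (3, 2), (2, 2)), which is one of the two embeddings listed in Fig. 1. Storing the whole path in `y` keeps the code close to (16); storing only the predecessor location, as footnote 7 suggests, would reduce storage.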
Given the restriction that the $h_i$ in (3) are independent of all $x_j$ for $j > i$ [this is consistent with the embedding metric as presented in (2)], then dynamic programming can be viewed as a procedure for finding the shortest path through a graph. We define this graph in the following way. The nodes of the graph are arranged in $p$ columns, where each node in the $i$th column is labeled by the values of the variables corresponding to a unique set of embeddings for the first $i$ components; there are as many nodes in the $i$th column as there are unique embeddings of the first $i$ components. The length of a branch between a node in column $i-1$ (with label $X_{i-1}$) and a node in column $i$ (with label $X_i$) is $h_i(X_i)$. We delete branches with infinite length.

To determine the shortest path from the root node (a single node placed in column zero) to some node in column $p$, we can proceed as follows. In the $i$th column, for each node, sum all the branch lengths corresponding to the label of the node. We now determine that set of variables ($x_i$) which do not appear in any $h_j$ for $j > i$, and call this set of variables $Z_i$ (the variables in $Z_i$ correspond to components in columns $j \le i$ which do not have spring connections to components in columns $j > i$). We now place the nodes in the $i$th column into sets, such that, for each set, the nodes are identical in their labels except for the variables in $Z_i$. For each such set, we retain the node with the shortest path length from the root node, and delete all the other nodes in the set as well as those nodes in columns $j > i$ which branch from deleted nodes. It is this pruning process which gives DP its computational advantage over complete enumeration. After the above set of operations is carried out through the $p$th column, we select the node defining the shortest path through the tree, and thus the lowest cost embedding for the $p$ components.

The LEA differs from DP in that in the sequential determination of the shortest path, a maximum of only $m_i$ nodes will be retained in the $i$th column ($m_i$ is the number of permissible locations for embedding the $i$th component). In processing the $i$th column, the nodes are grouped into sets such that, for each set, the nodes are identical in their labels for $x_i$. For each such set we then proceed as in the DP case. Thus, at the $i$th iteration we save only the $m_i$ "best" current embeddings, such that every possible positioning of the $i$th component occurs in one of the embeddings. This approximation technique may fail to find the best embedding (shortest path) if the components with low indices (small $i$) incur a high embedding cost when placed in their optimal locations.

7 The $y_i$ defined in (16) clarify the presentation of the LEA. However, in a computer implementation one need not calculate these $y_i$, which are arrays, each element of which is a sequence of locations. Rather, it is only necessary to compute a more restricted function $z_i$ defined by

$$z_i(x_i) = x_{i-1}$$

where $x_{i-1}$ is that value which minimizes the previous equation. These $z_i$ are arrays, each element of which is a single location rather than a sequence of locations. Then in the computations involving the $s_i$, if $s_i$ depends on more than the two rightmost locations of $y_{i-1}(x_{i-1}) * x_i$, the additional locations may be retrieved from the $z_k$, $1 \le k \le i-1$.
If the components of the reference are linked by a single chain of springs (which are then called primary springs), DP and the LEA provide identical solutions. When additional springs are present, the LEA no longer assures the optimal embedding, and these additional springs are called heuristic springs (heuristic in the sense that while these additional springs provide more information and thus give an intuitively better match than in the single chain case, the best possible use of this additional information is not always assured). The operation of the LEA is illustrated by the example given in Fig. 1.
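The single-chain claim can be checked numerically. The sketch below is only an illustration: the one-dimensional locations, the random local costs, and the function names (`chain_recursion`, `exhaustive`, `primary_spring`) are assumptions introduced here. With primary springs only, the sequential recursion reduces to keeping one best cost per location of the current component, and its result can be compared with complete enumeration.

```python
import itertools
import random

def chain_recursion(local, spring):
    """Sequential minimization when only consecutive components are linked."""
    g = list(local[0])
    for i in range(1, len(local)):
        g = [local[i][x] + min(g[xp] + spring(x, xp) for xp in range(len(g)))
             for x in range(len(local[i]))]
    return min(g)

def exhaustive(local, spring):
    """Cost of the best embedding found by complete enumeration."""
    best = None
    for xs in itertools.product(*(range(len(l)) for l in local)):
        cost = sum(l[x] for l, x in zip(local, xs))
        cost += sum(spring(xs[i], xs[i - 1]) for i in range(1, len(xs)))
        if best is None or cost < best:
            best = cost
    return best

def primary_spring(x, xp):
    # Hypothetical spring: zero cost when x sits one cell to the right of xp.
    return abs(x - xp - 1)

rng = random.Random(0)
local = [[rng.randint(0, 9) for _ in range(5)] for _ in range(4)]
print(chain_recursion(local, primary_spring), exhaustive(local, primary_spring))
```

The two printed costs agree for a single chain. Adding a spring between nonadjacent components would break this guarantee, which is the situation the text describes for heuristic springs.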
Fig. 1. An example illustrating the operation of the linear embedding algorithm. The definitions of x, g_ij, and l_i are given in the text; z and y are the components of x; that is, x = (z, y). (a) The sensed image. (b) The reference description. (c) Linear embedding algorithm.

(a) Sensed image SM(z, y)

    y = 4 |  5  2  8  8
    y = 3 |  7  5  1  3
    y = 2 |  8  1  5  7
    y = 1 |  4  3  2  4
          +------------
      z =    1  2  3  4

(b) Reference description

    Components: C1 = 6, C2 = 4, C3 = 5, C4 = 3
    l_i(z, y) = |SM(z, y) - C_i| for 1 ≤ i ≤ 4

    Spring definition when (i, j) = (2, 1) or (i, j) = (4, 3)
    Relative displacement (z, y) of components i and j    g_ij
    1, 0                                                   0
    2, 0                                                   1
    otherwise                                              ∞

    Spring definition when (i, j) = (4, 1) or (i, j) = (3, 2)
    Relative displacement (z, y) of components i and j    g_ij
    0, 1                                                   0
    0, 2                                                   1
    otherwise                                              ∞

(c) Linear embedding algorithm (a blank entry was not printed in the figure)

    Evaluation of g2
    x2      x1      l1   l2   g21   g2
    (2,4)   (1,4)    1    2    0     3
    (3,4)   (1,4)    1    4    1     6
            (2,4)    4    4    0
    (4,4)   (2,4)    4    4    1
            (3,4)    2    4    0     6
    (2,3)   (1,3)    1    1    0     2
    (3,3)   (1,3)    1    3    1
            (2,3)    1    3    0     4
    (4,3)   (2,3)    1    1    1     3
            (3,3)    5    1    0
    (2,2)   (1,2)    2    3    0     5
    (3,2)   (1,2)    2    1    1     4
            (2,2)    5    1    0
    (4,2)   (2,2)    5    3    1
            (3,2)    1    3    0     4

    Evaluation of g3 (s3 = g32)
    x3      x2      x1      l3   g32   g2   g3
    (2,3)   (2,4)   (1,4)    0    0     3    3
    (3,3)   (3,4)   (1,4)    4    0     6   10
    (4,3)   (4,4)   (3,4)    2    0     6    8
    (2,2)   (2,3)   (1,3)    4    0     2    6
            (2,4)            4    1     3
    (3,2)   (3,3)   (2,3)    0    0     4    4
            (3,4)            0    1     6
    (4,2)   (4,3)   (2,3)    2    0     3    5
            (4,4)            2    1     6
    (2,1)   (2,2)            2    0     5
            (2,3)   (1,3)    2    1     2    5
    (3,1)   (3,2)   (1,2)    3    0     4    7
            (3,3)            3    1     4
    (4,1)   (4,2)   (3,2)    1    0     4    5
            (4,3)            1    1     3

    Evaluation of g4 = G
    x4      x3      x1      g43   g41   s4   l4   g3   g4
    (1,3)   (2,3)   (1,4)    0     0     0    4    3    7
            (3,3)   (1,4)    1     0     1    4   10   15
    (2,3)   (3,3)   (1,4)    0                2
            (4,3)   (3,4)    1                2
    (3,3)   (4,3)   (3,4)    0     0     0    2    8   10
    (1,2)   (2,2)   (1,3)    0     0     0    5    6   11
            (3,2)   (2,3)    1                5
    (2,2)   (3,2)   (2,3)    0     0     0    2    4    6
            (4,2)   (2,3)    1     0     1    2    5    8
    (3,2)   (4,2)   (2,3)    0                2
    (1,1)   (2,1)   (1,3)    0     1     1    1    5    7
            (3,1)   (1,2)    1     0     1    1    7    9
    (2,1)   (3,1)   (1,2)    0                0
            (4,1)   (3,2)    1                0
    (3,1)   (4,1)   (3,2)    0     0     0    1    5    6

    Y = [(2, 3), (3, 3), (3, 2), (2, 2)]
    Y = [(3, 2), (4, 2), (4, 1), (3, 1)]

Additional Computation Speedup and Storage Reduction Techniques

By slowing the growth of the computation and storage requirements to a linear function of the size (M) of the pictures, the LEA establishes itself as a feasible embedding procedure. However, because of the large proportionality constants, the practicality of employing the LEA will probably depend on the efficiency of the associated computer programming, and on the employment of additional (second-order) speedup and storage reduction techniques. Two of the more important speedup techniques, applicable to both DP and the LEA, are discussed below.

The dynamic programming and LEA formalisms presented earlier do not explicitly consider constraints on the variables ($x_i$). The simplest way of treating such constraints is to introduce into the objective function (embedding metric) cost terms which become infinite if the relational constraints are violated. This approach handles the problem by increasing the computation time needed to evaluate the additional cost terms. A much more desirable technique is to employ the explicit relational constraints (the $w_i$) to limit both the table size and search requirements by ignoring or eliminating infeasible variable combinations. This technique is illustrated in [1]-[3], and was considered in deriving expression (8). When the relational constraints are implicit (i.e., must be computed from the given data), it is not clear whether any advantage can be gained from first converting them to explicit form, and then applying the above technique.

Dynamic programming and the linear embedding algorithm previously described can both be speeded up by the following (type of branch-and-bound) technique. We examine a number of complete embeddings, and select the lowest cost corresponding to any one of these trial embeddings. (The embeddings themselves could have been obtained by random placements of the components, or perhaps by someone guessing at what a "good" embedding might actually look like.) Now, employing the LEA (or DP) to place and evaluate the cumulative cost of placement of the components sequentially, we can eliminate from further consideration any of those placements whose costs exceed the bound established by our best trial embedding. It should be noted that this technique is valid only if the cost associated with the embedding of each component is nonnegative. However, this can almost always be the case for the class of problems we are discussing in this paper.

A heuristic embellishment of the above branch-and-bound technique would be to use some fraction (say [k/N]') of the bound at the kth stage of an N-stage process as the threshold for eliminating a possible sequence of embeddings.
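The bounding idea can be sketched for a single chain of components. Everything named here is a hypothetical illustration: the function `bounded_chain`, the small cost tables, the guessed trial embedding, and in particular the stage fraction `k / n`, which is a stand-in and not the fraction stated in the text. With the fraction fixed at 1 and nonnegative costs, the pruning cannot remove the optimal embedding, since each of its partial costs is no larger than the bound.

```python
def bounded_chain(local, spring, bound, fraction=lambda k, n: 1.0):
    """Sequential placement that drops partial embeddings costing more than
    fraction(k, n) * bound at stage k. Returns the best cost found, or None."""
    n = len(local)
    limit = bound * fraction(1, n)
    g = {x: c for x, c in enumerate(local[0]) if c <= limit}
    for i in range(1, n):
        limit = bound * fraction(i + 1, n)
        g_new = {}
        for x, l in enumerate(local[i]):
            if g:
                c = l + min(gp + spring(x, xp) for xp, gp in g.items())
                if c <= limit:          # keep only placements within the bound
                    g_new[x] = c
        g = g_new
    return min(g.values()) if g else None

def embedding_cost(local, spring, xs):
    """Total cost of one complete embedding xs (one location per component)."""
    cost = sum(l[x] for l, x in zip(local, xs))
    return cost + sum(spring(xs[i], xs[i - 1]) for i in range(1, len(xs)))

def spring(x, xp):
    return abs(x - xp - 1)              # hypothetical nonnegative spring

local = [[3, 0, 4, 2], [5, 1, 0, 6], [2, 7, 3, 0]]   # hypothetical local costs
trial = (0, 1, 2)                                    # a guessed trial embedding
bound = embedding_cost(local, spring, trial)

exact = bounded_chain(local, spring, bound)
heuristic = bounded_chain(local, spring, bound, lambda k, n: k / n)
print(bound, exact, heuristic)
```

The stage-dependent threshold discards more candidates early, at the risk of returning a worse embedding or none at all, which is why the text calls it a heuristic embellishment.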
Scale and Rotation (S&R) Considerations

In attempting to match or register two images, we frequently are faced with the problem of unknown relative scale and orientation. While such variations are conceptually indistinct from any of a host of unwanted variations between the reference and the image, they (S&R) can serve as a vehicle for clarifying some important issues pertaining to the way the LEA is employed.

As noted in earlier sections, the embedding process is carried out at two levels. First, the components of the reference are searched for as independent entities. The particular processes by which these searches are executed are not a direct issue of concern here; the important point is that, regardless of the search mechanism, the outcome of the search for any individual component is presented to the LEA by a tabulation called the local evaluation array [L(EV)A]. Each entry in the L(EV)A corresponds to a possible embedding in the image of the associated reference component, indexed by the variables used to define the embedding. The entry consists of a number related to the probability that the component is actually present at the "location" specified by the indexing variables, and each entry can also contain the values of attributes of the component as measured at the indexed location. The LEA has no knowledge of the component beyond what is presented to it in the L(EV)A for that component. The purpose of the LEA is to integrate global or structural knowledge with the information provided in the L(EV)A's to find the best overall embedding (or embeddings) of the reference in the image. The acceptability of the final embedding selected by the LEA will thus be dependent on the quality of the information presented in the L(EV)A's, where the extent of this dependence is related to the relative importance of local (component definition) versus global (intercomponent) information for the particular problem. Thus, it is the responsibility of the local evaluation function, in attempting to gather evidence about the presence of some given reference component, to be able to deal with the various noise and distortion processes (such as S&R) which might be encountered. To the extent that these same noise and distortion processes affect the global or structural relationships between the reference components, the LEA provides the machinery necessary to deal with the resulting problems.

Ability to deal with variations at the global level is accomplished by defining "attributes" which measure (or estimate) these variations, and then making the "spring" parameters functions of these attributes.

Thus, in the case of S&R, if a component $P_i$ has scale and rotation attributes $S_i$ and $\theta_i$, the springs (vectors) attached to $P_i$ would be (conceptually) scaled as a function of $S_i$, and rotated as a function of $\theta_i$. The following example illustrates some of the above comments.

Fig. 2. Reference description of a square.
Problem
Given a two-dimensional region in which there are k randomly oriented and positioned line segments, find the four line segments which best approximate a square. Each line is specified by a four-tuple of the type (x, y, θ, l) where the x, y coordinates locate the center of the segment, θ specifies the orientation, and l the length of the segment. To simplify this example, we will ignore the detection problem and assume that the given values for each segment are known with probability one. We assign a cost C1 for each unit of positional disparity between the sides of a candidate square, and a cost C2 for each degree of rotational disparity between the sides (i.e., sides should meet at right angles).
Solution
Consider a single "local evaluation array" consisting of the given list of four-tuples (x, y, θ, l). We can consider x, y to be the "location" indexing variables, and θ, l to be attributes. All entries have unit probability; entries with zero probability are deleted. The "description" of a square is shown schematically in Fig. 2. Each (spring) vector is rigidly attached to its line segment at a fixed angle of 45°. Costs (C1) associated with (x, y) disparity between the end of a vector and the center of the next line segment are circularly symmetric (i.e., the "cost" function increases with distance from the tip of the vector) about the tip of the vector. We also assess a cost (C2) proportional to the difference in measured (attribute value) orientation of more or less than 90° for sequential line segments.

The general form of the LEA, with orientation adjusted springs, and spring costs augmented by attribute (S&R) differences, is adequate to deal with the problem as posed. A minor difficulty arises from the fact that the orientation of a line segment is actually two valued (i.e., θ and θ + 180°). If we list each line segment twice in the local evaluation array, once for each of its two orientations, then the LEA can be applied without modification.

We can handle squares of differing size by using the line-segment-length attribute in the same manner as the orientation attribute (except that double entries are not required here). Spring stretching is augmented by a cost proportional to the difference in length attribute for sequential line segments. That is, when the LEA examines a new line segment as a possible additional side for a square already partially formed, it compares the length of the new line segment with the length of the line segment in the partial square to which the new segment will be attached. The spring between the new and the old line segments then is stretched (over and above any stretching due to angular and positional disparity) by an amount proportional to the difference in these lengths.

In the above example note that, because each entry in the local evaluation array had either zero (∞ cost) or unit (0 cost) probability associated with its occurrence, the size of this array could be reduced to listing only those few coordinate combinations associated with the feasible (nonzero probability) occurrence of a component (line segment). This procedure can be used in other situations where it is reasonable to reduce all low probability entries in the local evaluation array to zero.
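The spring cost between sequential sides of a candidate square might be sketched as follows. This is an illustrative reading of the example, not the authors' formulation: the function names, the unit weights `c1`, `c2`, `c3` (with `c3` a hypothetical weight for the length difference), and the spring reach of l/√2 (the distance between the centers of adjacent sides of a square of side l) are assumptions made here.

```python
import math

def with_both_orientations(segments):
    """List each (x, y, theta, l) twice, once per direction of the line."""
    return [s for (x, y, t, l) in segments
            for s in ((x, y, t % 360.0, l), (x, y, (t + 180.0) % 360.0, l))]

def side_spring_cost(a, b, c1=1.0, c2=1.0, c3=1.0):
    """Cost of attaching side b after side a; angles are in degrees."""
    xa, ya, ta, la = a
    xb, yb, tb, lb = b
    # Spring vector at 45 degrees to side a; the reach la / sqrt(2) is an
    # assumption (distance between centers of adjacent sides of a square).
    ang = math.radians(ta + 45.0)
    reach = la / math.sqrt(2.0)
    tip_x = xa + reach * math.cos(ang)
    tip_y = ya + reach * math.sin(ang)
    positional = math.hypot(xb - tip_x, yb - tip_y)
    # Deviation of the turn from a right angle, wrapped into [-180, 180).
    rotational = abs((tb - ta - 90.0 + 180.0) % 360.0 - 180.0)
    return c1 * positional + c2 * rotational + c3 * abs(lb - la)

def square_cost(sides):
    """Sum of the four spring costs around a closed sequence of sides."""
    return sum(side_spring_cost(sides[k], sides[(k + 1) % 4]) for k in range(4))

unit_square = [(0.5, 0.0, 0.0, 1.0), (1.0, 0.5, 90.0, 1.0),
               (0.5, 1.0, 180.0, 1.0), (0.0, 0.5, 270.0, 1.0)]
print(square_cost(unit_square))
```

A perfect square costs zero up to rounding error, and any positional, rotational, or length disparity raises the cost. In an LEA run, `with_both_orientations` would supply the doubled local evaluation array, and `side_spring_cost` would play the role of the spring between sequential sides.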
PICTORIAL REPRESENTATION
A central problem in much of the work concerned with the computer processing of pictorial data is that of representation. Since we cannot manipulate the real world object (itself) within the computer, we attempt to construct a representation (or model) which can be used in place of the actual object and which has the following (somewhat overlapping) properties.
Complete: Any question of interest which could be resolved by reference to the actual object should also be capable of being resolved by reference to the representation.
Compact: The representation should be free of information redundant to the purposes for which it will be used. This is necessary to minimize computer storage requirements.
Transformable: Much of the information contained in a representation will be implicit rather than explicit in form. The ability to manipulate easily the representation to extract required information is essential. For example, if we represent a picture by an intensity matrix or raster, then a count of the number of isolated objects appearing in the picture would be implicit information which could be extracted from the representation after considerable processing. However, if the representation consisted of the contours of the object appearing in the picture, then the required count could be obtained rather simply.
Incrementally Changeable: If we observe a slight change in the real world object, it should be a relatively simple and straightforward task to alter the representation. Further, from the standpoint of image matching, a small change in the real world should require only a small change in the representation.
Accuracy and Simplicity of Translation: Given a real world object, it should be relatively simple to derive an accurate representation of the object.

Over the past ten years or so, much of the work concerned with pictorial representation has been restricted to the domain of line type drawings, and the use of formal linguistic methods (see [8]-[10]). Very little success has been achieved in attempts to extend this work to scenes of terrains, cloud covers, human faces, etc., which can only be described meaningfully in terms of picture components which are not line elements, but which are regions with colors, textures, shadings, etc.8
Perhaps the most serious failing of the linguistic (and similar) techniques occurs with respect to the "translation" property. These techniques build a representation by constructing a hierarchy based on picture primitives, assembled into linear expressions employing specified relational forms, and satisfying a set of syntax rules. The problem arises from the fact that (usually) the only direct correspondence between the actual object and its representation occurs at the level of the primitive elements (typically points, intensities, and lines), while in practice there are pieces of a picture that are too involved or complicated to describe in terms of these primitives. Theoretically, the description is possible since the matrix representing the picture is finite. However, such a description would be so complicated that, aside from the difficulty of composing it, there would be a considerable likelihood of error and inaccurate representation.
In the previous portions of this paper we have presented a representational scheme for pictures, and were primarily concerned with its application to image matching. We will now show that the representational scheme has wide general applicability, and avoids many of the problems of the linguistic approaches. First we note that it is a hybrid type of representation in that it invokes symbolic (numerical) elements as well as allowing actual picture segments to be part of the representation. The ability to intermix picture segments and symbolic data in the same representation greatly simplifies the translation problem. Where a pictorial concept is difficult to describe symbolically, we can use an actual piece of the picture as part of the description. A second aspect of our representation that simplifies the translation problem is the fact that the components (picture pieces, local evaluation arrays, etc.) and the relational forms (springs) are two-dimensional rather than one-dimensional entities. We thus avoid the problem of having to construct a one-dimensional model for a two-dimensional structure.

8 Specialized systems have been developed, highly tailored to specific problems, which are exceptions to this assertion; e.g., see Kelly [11]. A number of papers, including Bledsoe [15], [16] and Goldstein and Harmon [17], effectively consider the problem of face identification based on feature measurements obtained manually.

Let us now examine some of the other representational attributes. Incremental changeability follows directly from ease of translation (although the inverse relationship would not necessarily hold). Transformability with respect to image matching has certainly been established; the fact that the representation is already in two-dimensional form, with metric, geometrical, and topological relationships explicitly and quantitatively expressed, implies that transformability for many other applications is more than adequate.

In many respects, compactness and transformability are antithetical since data in explicit form are usually more extensive than the equivalent implicit information. This is the case with our representation. It requires considerably more storage than might be required for a linguistic representation.

The one area where the linguistic approach has an obvious advantage is with respect to completeness. A linguistic representation can treat nonpictorial information (e.g., relations between items in a picture and other items not visible, but perhaps implied or normally associated with the pictured items) in a way that would be extremely difficult in our representation. This additional capability could, of course, be achieved by appending the necessary linguistic machinery to our current scheme, although the final result might well be a mixture of two representations rather than one integrated representation.
EXPERIMENTAL RESULTS
In order to evaluate the practical implications of the techniques presented in this paper, we have initiated a program involving experiments on a variety of line type drawings and gray-level imagery. Over 400 experiments have already been performed with the following general results.

1) On well-defined imagery (i.e., relatively noise-free and unblurred pictures), the embeddings produced by the LEA were almost always in agreement with the best embedding as predetermined by human evaluators. Where the few deviations did occur, they were reasonable, and usually related to the crude component descriptions employed.

2) On noisy imagery (coarse resolution, additive random and coherent noise), the fall in performance paralleled the difficulty human evaluators had in locating suitable embeddings. Where the components were discernible in the image, the embeddings were usually correct; those components which were significantly altered by the noise were sometimes missed, but the substitution error was usually in close proximity to the correct embedding location, and, even in error, the correct embedding almost always had a score close to the best score.
Since the programmed version of the algorithm is still evolving, and some of the discussed features have not yet been implemented (e.g., the "attribute" feature is not operational as yet), most of the experiments were informal in nature. However, for the purposes of this paper, two sets of controlled experiments were run and are described below.
Image-Matching Experiments Using Faces
The majority of the experiments we have run to date had human faces as their subject. Reasons for this selection include the following.
1) The availability of a set of digitized gray-scale pictures containing faces.
2) A single reference (face) could be tested on all the faces in the data set. In the case of, say, terrain pictures, a separate reference (or, at best, a unique composition of standard reference components) is necessary for each picture.
3) Our familiarity with faces and their components (eyes, nose, mouth, etc.) facilitates evaluation of performance as noise and distortion are introduced.

The data set used in the face experiments consisted of 15 human faces,9 both men (some with beards) and women, digitized to approximately 16 true gray levels, and each face typically was contained in a picture field of from 2000 to 3000 resolution elements. Using a reference as shown schematically in Fig. 3(a), with components as described in Fig. 3(b) and (c), almost 300 formal and informal experiments were performed. In each experiment, additive (truncated) Gaussian random noise with zero mean and standard deviation of either 0, 10, or 15 units was added to each resolution element (relative to a pseudogray scale of 64 units for the noise-free pictures). In some of the informal experiments, coherent noise consisting of randomly placed lines was also inserted [see Fig. 4(a)].

With no more than two or three exceptions, when the reference was restricted to hair, eyes, and sides of face, correct embedding was achieved. These results are gratifying in view of the simple component descriptions employed, and the equivocation displayed by the resulting L(EV)A's. (See examples shown in Fig. 4.)

Two series of formal experiments were run on the (noisy) face pictures using two references which included, but differed in, the nose/mouth definitions. In the first series, consisting of 90 experiments, there were 83 completely correct embeddings, and 7 partially incorrect embeddings. The errors involved six experiments in which the nose/mouth complex was offset by three to four resolution cells from its ideal location, and one experiment in which both the eyes and the nose/mouth complex were improperly placed. In the second series, consisting of 45 experiments, the placement of the nose/mouth complex was judged incorrect in 3 experiments, while all the other components were always correctly embedded.

9 These data were obtained from the Stanford Artificial Intelligence Laboratory and are a subset of the data employed in the experiments described by Kelly [11].

Fig. 3. Reference description of a face. (a) Schematic representation of face reference, indicating components and their linkages. (b) Reference description for left edge of face. (c) Reference description for eye.

(a) [Schematic not reproduced. Component labels recoverable from it: LEFT EDGE, RIGHT EDGE, NOSE, MOUTH.]

(b) VALUE(X) = (E + F + G + H) - (A + B + C + D)
    Note: VALUE(X) is the value assigned to the L(EV)A corresponding to the location X as a function of the intensities of locations A through H in the sensed scene.

(c) K1, K2 = CONSTANTS
    α = (C + D + E + F)/4
    β = (A + B + G + H + I + J)/6
    γ = [expression involving (X + F); not fully legible]
    IF [X < (α - K1) OR α < β]
    THEN VALUE(X) = γ [operator illegible] K2
    ELSE VALUE(X) = γ
Analysis of the face experiments led to the following conclusions. In spite of almost perfect performance in embedding the hair, eyes, and sides of the face, precise placement of the nose/mouth complex based on strictly local evaluation was almost impossible in some of the noisy pictures due to loss of detail [e.g., see Fig. 4(b)].

With the attribute feature of the LEA not yet operational, and with the arbitrary decision to use binary (rather than multivalued) weights in the spring arrays for these experiments, the LEA restricted the feasible region over which an optimum value could be selected for embedding the nose/mouth complex, but did not bias the selection as would generally be the case. In the presence of heavy noise, the simple nose/mouth descriptions used in these experiments were not always adequate to produce a local optimum in the L(EV)A at or near the ideal embedding location. (A three-resolution cell deviation was considered an error.)
Image-Matching Experiments Using Terrain Scenes
Approximately 40 experiments have been performed using terrain scenes (including both aerial and ground scenes). The object in each case was to create a relatively simple description of some portion of the scene and then attempt to find the proper embedding of the description in the image (or some distorted or alternate view of the image).

The descriptions employed two basic types of components: 1) texture components, in which the "texture value" of a point was defined as a crude statistical function of the intensity values and gradients in some local region surrounding the point; and 2) shape components, which were defined by collections of "edge" points having specified gradients.

Fig. 5(a) shows an example of a terrain (reference) description. Fig. 5(b) shows its successful embedding relative to the computer-stored version of the photograph of the actual terrain segment as shown in Fig. 5(c). Each coherent piece in reference 5(a) is represented by several points enclosed by a dotted line. In this example, the points of each enclosure of the reference com-
11 4. I 3F41.+1 I -I 8==6fA 14. 4. 412 +-+ I)X3341=8 - - = *316,8SmZ- =+13 I Z4ASE"=U*- I -31361=X - +14 = ZMNA+ 1-se Z 338ij38X +*515 1 = ZZ71*AZAA 33 I - 363A3- 3 =16 =X 1+Z+XBAZAl46Ml+A6AX)IIA8F - 33 =17 = 4+ Z =*fHM361 ZA0§U2 1MYA8f oil +4118 = + Z M"A+XAX+X 33)iZ)ZA+U)A *(V=--=19 ++7 =)?M 1)61A1-Il A+AF3Z343 zZ AAA Z4+ XEX4 +++ 1-N§A+8Z.89jU3- + A -z-21 =AAZAA+.A7X +M) ) 3)-ZZ7O+ --= )-2Z -=ZZ AAAAA+ ='1XI 3)1 Of 3 - 4.)-2 3 Z 4 1t46A81 1)AA13M 31 z24 == Z 1WAIA1AAAAA+333) + C) 1 - 1 425 + Z 1 +I11733)X MZ)364AA -3) Z + 11Z6 zz Y1 AZ+XIg+ fg+=ZAAA -33 -27 Z 7 IfSv+%ZA1 1Z)AA+l1M +128 7Z + 1-F4)7ZZl+ A.I)XXZ 33=29 Z)) =#SA ) )M AX14-Al+X +- 3+1-34 LZ 1 X+AGARZZXMZX7XI 1 I - -31 1Z =+ 7X+XAAX)++IXA - --32 Z I XA64)+= XIFWM7Z=+4 +++4 311 +433 XXM+7)l+XlZ) AO=1Z= +H10111 + = +34 = 7MAIX=MX) +111)1 Zt - 1MIX=1+M'X35 314XKVVE33fI1ffI3U§3XX3389AXkA*1m3I7MXM)7I
12345678901 ?345678Q012345678901234567890
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (6, 18)L/EDGE WAS LOCATED AT (18, 10)R/EDGE WAS LOCATED AT (18, 25)L/EYE WAS LOCATED AT (17,13)R/EYE WAS LOCATED AT (17, 21)NOSE WAS LOCATtD AT (22,18)MOUTH WAS LOCATED AT (24,17)
123456 7°9'l23456 7890 12 34567 89 C12 3456 78q0
L(EV)A for eye. (Density at a point is proportional toprobability that an eye is present at that location.)
(a)
Fig. 4. Examples of image-matching experiments using faces. (a) Successful embedding under coherent nioise.
217 1-2+) 3M+X84 I2T3 TT1 A 8 =+28 +- = 1X AB3 Z -ZI + A - I =29 2 14 --ZAM+Zl ++Z 3 1 + -A -X13C = = + AAXA- A ++ -= =-1 - I +
31 I- 1+-361 )= 3 +- +X=1 + =X32 - 1 3 M 10A6X1388XX M4 A =M+ I +=33 J A - +WXEM3 X8JZ1 +A= 1+134 1 §ZA= ZZXAZ + ZA=) +4- 4 14+435 1 A I -+ MAU2 Z ABA 136 + == M6U)) -36AA=1 +1 1+) X37 I+) 1= 1 1xAZ1 =-4A++A)) 138 P1 -= =36M 1=XZ 1 3= 1 =
12345678901234567S93 1234567a9C1234567890
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (8,21)L/EDGE WAS LOCATED AT (17, 11)R/EDGE WAS LOCATED AT (17,25)L/EYE WAS LOCATED AT (17,14)R/EYE WAS LOCATED AT (17,20)NOSE WAS LOCATED AT (21,16)MOUTH WAS LOCATED AT (23,16)
(b)Fig. 4 (continued). (b) Incorrect embedding of nose under random noise.
79
IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973
1234567890123456789012345678901234561 IlltIllI IIZtzIItZZZzIZZZZLZZlZZ2ZLLZLLZZLZZZLZZZAAXLZZZLZZZZLLZL..3 ZZZZZZZZZZZZZZZA§SSMXZLXZZZZXXXXX4 Z ZZZL1ZZL ZZXM36SSSS6AXZXXX1LXXXX5 ZZtZZtZZZZMtSSASSSSSq9XZZZZZZZZZe 1111ZZZ I I1A5I0I?61555365111@2Z11Z2Z7 i1lZZ1itIIeIe8SSSSSUIsSSmLUZZzzzz
.. Z LIZ IZLZZI2"4{{||@@X Z Z Z7LZU_._9 11)11)11) 1SSISSIS6ISUMAXA%VA
10t. *ISS6SSE3GMAX29q4+11 *1AV"691MAXXXXZZK81M12 .-1gUmxxxxxX~x1lzmgS9v=13 IIAlXAAAXXAAAAZ XSSS1J . ___.tSx XMM§eAX8(Xt M A I *.__IS +8SXMSSSZZFMSEMZAfM)16 I AMZlIXAX I XMMAI IZOMX.17 .. ... MMII ZIIIXLL1Iz MIIS -A2111Z)IXZZlZX*A+19 xOzzzzz xxxxxzztze20 -ZA4GAX XXL.AMAXXX XAX.21 SBXXXZAMAXAXXBI+22 XIAXMEMM§OMXMSS
123456789012345678901234567890123456L(EV)A for nose. (Density at a point is proportional
to probability that nose is present at that loca-tion.)
[Overprinted character-density printout; not reproducible in text.]
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (7, 23)
L/EDGE WAS LOCATED AT (17, 13)
R/EDGE WAS LOCATED AT (17, 26)
L/EYE WAS LOCATED AT (14, 17)
R/EYE WAS LOCATED AT (14, 23)
NOSE WAS LOCATED AT (20, 20)
MOUTH WAS LOCATED AT (22, 20)
(c)
Fig. 4 (continued). (c) Successful embedding under random noise.
FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES
[Overprinted character-density printouts; not reproducible in text.]
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (11, 21)
L/EDGE WAS LOCATED AT (25, 11)
R/EDGE WAS LOCATED AT (25, 24)
L/EYE WAS LOCATED AT (21, 15)
R/EYE WAS LOCATED AT (21, 21)
NOSE WAS LOCATED AT (26, 18)
MOUTH WAS LOCATED AT (29, 17)
L(EV)A for hair. (Density at a point is proportional to probability that hair is present at that location.)
(d)
Fig. 4 (continued). (d) Successful embedding under random noise.
[Overprinted character-density printout; not reproducible in text.]
Original picture.
[Overprinted character-density printouts; not reproducible in text.]
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (8, 20)
L/EDGE WAS LOCATED AT (20, 11)
R/EDGE WAS LOCATED AT (20, 27)
L/EYE WAS LOCATED AT (18, 15)
R/EYE WAS LOCATED AT (18, 23)
NOSE WAS LOCATED AT (23, 18)
MOUTH WAS LOCATED AT (25, 18)
L(EV)A for L/EDGE. (Density at a point is proportional to probability that L/EDGE is present at that location.)
(e)
Fig. 4 (continued). (e) Successful embedding under random noise.
[Overprinted character-density printout; not reproducible in text.]
Original picture.
[Overprinted character-density printouts; not reproducible in text.]
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (11, 15)
L/EDGE WAS LOCATED AT (19, 8)
R/EDGE WAS LOCATED AT (19, 24)
L/EYE WAS LOCATED AT (19, 12)
R/EYE WAS LOCATED AT (19, 20)
NOSE WAS LOCATED AT (25, 15)
MOUTH WAS LOCATED AT (28, 15)
(f)
Fig. 4 (continued). (f) Successful embedding under random noise.
[Overprinted character-density printout; not reproducible in text.]
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (13, 23)
L/EDGE WAS LOCATED AT (25, 13)
R/EDGE WAS LOCATED AT (25, 28)
L/EYE WAS LOCATED AT (22, 16)
R/EYE WAS LOCATED AT (22, 23)
NOSE WAS LOCATED AT (27, 20)
MOUTH WAS LOCATED AT (29, 19)
L(EV)A for eye. (Density at a point is proportional to probability that eye is present at that location.)
(g)
Fig. 4 (continued). (g) Successful embedding under random noise.
[Overprinted character-density printout; not reproducible in text.]
L(EV)A for mouth. (Density at a point is proportional to probability that mouth is present at that location.)
[Overprinted character-density printout; not reproducible in text.]
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (8, 19)
L/EDGE WAS LOCATED AT (16, 9)
R/EDGE WAS LOCATED AT (16, 23)
L/EYE WAS LOCATED AT (14, 12)
R/EYE WAS LOCATED AT (14, 18)
NOSE WAS LOCATED AT (18, 16)
MOUTH WAS LOCATED AT (21, 16)
(h)
Fig. 4 (continued). (h) Successful embedding under random noise.
[Overprinted character-density printout; not reproducible in text.]
Original picture.
[Overprinted character-density printout; not reproducible in text.]
L(EV)A for nose. (Density at a point is proportional to probability that nose is present at that location.)
[Overprinted character-density printout; not reproducible in text.]
Noisy picture (sensed scene) as used in experiment.
HAIR WAS LOCATED AT (9, 20)
L/EDGE WAS LOCATED AT (20, 12)
R/EDGE WAS LOCATED AT (20, 27)
L/EYE WAS LOCATED AT (18, 15)
R/EYE WAS LOCATED AT (18, 22)
NOSE WAS LOCATED AT (24, 18)
MOUTH WAS LOCATED AT (27, 19)
(j)
Fig. 4 (continued). (j) Successful embedding under random noise.
[Terrain reference printout with outlined and labeled component regions (e.g., "Vegetation"); not reproducible in text.]
(a)
Fig. 5. Example of image-matching experiment using a terrain scene. (a) Reference for terrain.
[Sensed terrain image printout with labeled component regions; not reproducible in text.]
Fig. 5 (continued). (b) Embedding of reference in sensed image for terrain.
[Character printout residue from a terrain figure; not reproducible in text.]
posed a rigid template which is correlated at the different positions of the sensed scene in order to get the local evaluation array. The correlation is performed only at the indicated points of the template. Reproduction difficulties make it hard to see the intensities of these enclosed points. For "Vegetation 1," all the points are of intensity 40; for "Vegetation 2," 20. For "Urban," the points were randomly assigned intensities of 25 and 35, and so on for the other pieces.

The terrain-matching experiments represent a continuing area of investigation. Our current efforts are primarily directed toward obtaining a satisfactory set of primitives which can be used as the basis for reference component description. In low-noise experiments, with limited geometric distortion, the simple descriptions (i.e., ad hoc shape and texture components linked by springs) produced correct embeddings. Experiments under more severe conditions remain to be performed.
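As a rough illustration of how a rigid template evaluated only at a few indicated points can yield a local evaluation array, the following sketch may help. It is not the authors' program: the function name, the (row offset, column offset, intensity) point format, and the use of summed absolute differences as the mismatch measure are all assumptions made for this example, and lower values here mean a better local fit.

```python
def local_evaluation_array(scene, template_points):
    """Score every placement of a sparse rigid template on the scene.

    scene           -- list of rows of sensed intensities
    template_points -- hypothetical format: (d_row, d_col, intensity) triples,
                       offsets measured from the template origin (non-negative)
    Returns an array of the same shape as the scene; entry (r, c) is the
    summed absolute intensity difference with the template origin at (r, c),
    or infinity where the template would fall off the picture.
    """
    rows, cols = len(scene), len(scene[0])
    max_dr = max(dr for dr, _, _ in template_points)
    max_dc = max(dc for _, dc, _ in template_points)
    inf = float("inf")
    cost = [[inf] * cols for _ in range(rows)]
    for r in range(rows - max_dr):
        for c in range(cols - max_dc):
            # Compare only at the indicated template points (assumed measure).
            cost[r][c] = sum(abs(scene[r + dr][c + dc] - v)
                             for dr, dc, v in template_points)
    return cost


# Toy scene: an L-shaped patch of intensity 40 on a background of 0.
scene = [
    [0, 0, 0, 0],
    [0, 40, 40, 0],
    [0, 40, 0, 0],
    [0, 0, 0, 0],
]
template = [(0, 0, 40), (0, 1, 40), (1, 0, 40)]
lea_input = local_evaluation_array(scene, template)
```

In this toy case the only zero-cost placement is the one whose origin sits on the top-left cell of the patch.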
Implementation Details

In all of the face and terrain experiments presented in this paper, the springs were assigned the values

$$
g_{ij}(x_j - x_i) =
\begin{cases}
0, & \text{if } A < \operatorname{row}(x_j - x_i) < B \text{ and } C < \operatorname{column}(x_j - x_i) < D \\
\infty, & \text{otherwise.}
\end{cases}
$$
The values A, B, C, and D were typically set by taking a number of sample pictures and determining the smallest box encompassing the variation in the relative position between the components i and j. A subroutine was written to perform this task and automatically set the derived parameters into the LEA.
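The box-shaped spring and the way its limits are derived from sample pictures can be sketched as follows. This is only an illustration: the function names are hypothetical, integer (row, column) coordinates are assumed, and the limits are widened by one cell on each side so that every observed offset satisfies the strict inequalities in the spring definition. The sample positions are made-up numbers, not data from the experiments.

```python
import math


def fit_spring_box(samples_i, samples_j):
    """Return (A, B, C, D) from paired sample positions of components i and j.

    Each sample is a (row, column) pair. The result is the smallest box such
    that every observed offset x_j - x_i satisfies A < row < B and
    C < column < D (assumes integer coordinates).
    """
    d_rows = [rj - ri for (ri, _), (rj, _) in zip(samples_i, samples_j)]
    d_cols = [cj - ci for (_, ci), (_, cj) in zip(samples_i, samples_j)]
    return min(d_rows) - 1, max(d_rows) + 1, min(d_cols) - 1, max(d_cols) + 1


def spring_cost(x_i, x_j, box):
    """g_ij(x_j - x_i): zero inside the box, infinity otherwise."""
    a, b, c, d = box
    d_row, d_col = x_j[0] - x_i[0], x_j[1] - x_i[1]
    return 0.0 if (a < d_row < b and c < d_col < d) else math.inf


# Hypothetical sample positions for two components in three pictures.
eye_samples = [(14, 17), (18, 15), (14, 12)]
nose_samples = [(20, 20), (23, 18), (18, 16)]
box = fit_spring_box(eye_samples, nose_samples)
```

With these numbers the observed offsets are (6, 3), (5, 3), and (4, 4), so the fitted box is (3, 7, 2, 5); any candidate pair whose offset falls outside it is ruled out by an infinite spring cost.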
In our current implementation, for a typical 35 × 39 (face) picture, we require 13 s to compute all the L(EV)A's, and 35 s to execute the LEA (these times include picture input from a disk library, and storage of results on the disk). Most of the programming is in Fortran, and the computer is an IBM 360/40 (the addition instruction on the 360/40 executes in 11.88 µs). Total core and disk storage in bytes for all arrays is M(4f+2h) and NM(f+2h), respectively, where M, N, f, and h are, in order, the number of resolution cells in the sensed image, the number of L(EV)A's, the amount of core needed for a floating point number, and the amount of core required for a (small) integer. For a typical picture in the face experiments (M=1600, N=8, f=4, h=2), these requirements work out to approximately 32K bytes of core and 100K bytes of disk storage.
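The storage figures quoted above follow directly from the two formulas. A small check, with a hypothetical helper name, reproduces the arithmetic:

```python
def storage_bytes(m, n, f, h):
    """Evaluate the storage formulas from the text.

    m -- resolution cells in the sensed image
    n -- number of L(EV)A's
    f -- bytes per floating point number
    h -- bytes per (small) integer
    Returns (core_bytes, disk_bytes).
    """
    core = m * (4 * f + 2 * h)
    disk = n * m * (f + 2 * h)
    return core, disk


core, disk = storage_bytes(m=1600, n=8, f=4, h=2)
print(core, disk)  # 32000 102400
```

That is, 1600 × 20 = 32 000 bytes of core and 8 × 1600 × 8 = 102 400 bytes of disk, matching the rounded 32K and 100K figures.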
DISCUSSION
Many, though by no means all, visual objects can be described by breaking down the object into a number of more "primitive parts," and by specifying an allowable range of spatial relations which these "primitive parts" must satisfy for the object to be present.

As an example, suppose we want to describe a frontal view of a standing person. This visual object could be decomposed into six primitive pieces: a head, two arms, a torso, and two legs. For this visual object to be present in an actual picture, it is required that these six primitives occur (or at least that some significant subset of them occurs), and also that they occur within a certain spatial relationship one to the other; that is, the legs should be next to each other, and below the torso; the torso should be between and below the tops of the two arms; and the head should be on top of the torso.
It may be noticed that in the previous two paragraphs, we implicitly separated the local aspects from the global aspects of the description; the local aspects are the primitive parts of the picture, and the global aspects are the spatial relations between these parts. While at first glance it does not seem unnatural to make this separation, in practice there is a frequently encountered difficulty, that is, the feedback between the local and the global.

To illustrate this difficulty, let us go back to the example of the view of a standing person. Any method which detects torsos on a local level (that is, a method which detects torsos without using any knowledge of the positions of nearby arms and legs) might very well detect several torsos in a picture. In fact, the actual or "true" torso may be one of the weaker of the torsos detected by the method; it may even happen that the true torso is not detected at all. What does determine the position of the true torso is the position of the true arms, legs, and head. But, unfortunately, the reverse is also true. The positions of, say, the true arms depend on the position of the true torso. Thus, the possible position of each piece affects the possible position of each other piece, making for a circular type of dependency.
It seems that whatever the visual object is, whenever we try to separate the global and the local, the same circular dependency occurs. Many times attempts to recognize visual objects described in such a way involve alternating between local and global analysis, heuristics, backup procedures, etc.

One conceptual way of avoiding this circularity is to evaluate simultaneously a complete interpretation of the picture; e.g., in the example given above, we would look at, and evaluate, complete configurations of head, arms, legs, torso, etc. The best complete interpretation could then be chosen. This approach, however, requires that we make an infeasibly large number of evaluations. It was just this computational problem that in the first place led to the decomposition approach.

The implication of the above discussion is that, in general, we cannot hope to decompose the global evaluation problem into a number of smaller independent problems, but rather must use something akin to the simultaneous evaluation, taking advantage of any reduction in total variable interdependency to reduce the required number of such evaluations.
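To give a feel for the numbers involved, the following sketch compares the count of complete configurations with a count that exploits limited interdependency. The second formula is an assumption made for illustration only: it supposes the components are linked in a chain or tree and that each link admits a fixed number of relative positions; the picture size, component count, and box size are illustrative values.

```python
def exhaustive_evaluations(cells, components):
    """Every component may sit at any cell: cells ** components configurations."""
    return cells ** components


def reduced_evaluations(cells, components, box_cells):
    """Assumed count when each of the (components - 1) links is examined once
    per cell and per allowed relative position inside its box."""
    return (components - 1) * cells * box_cells


full = exhaustive_evaluations(cells=1600, components=7)
reduced = reduced_evaluations(cells=1600, components=7, box_cells=25)
print(f"{full:.2e} versus {reduced}")
```

The first count grows exponentially in the number of components, while the second grows linearly in both the picture size and the number of components.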
In this paper, we accomplish this through the following machinery. First, an embedding metric is presented which sets the framework for evaluating how well any
composition of primitive picture pieces (parts of the decomposed picture) matches the desired composite picture. Second, a sequential optimization (dynamic programming-type) algorithm is developed which takes advantage of the decomposition to reduce drastically the computational requirements (our computational requirements grow linearly with the size of the picture, rather than exponentially). The contribution of this paper is the simultaneous offering of the above two components and their suitability for application to a wide class of pictorial objects.

In addition to the image-matching application, which was the center of most of the development in this paper, we have also attempted to establish the utility of the representational aspects of the embedding metric for general picture description applications.

The work presented here is a continuation of the investigation described in [1] and [2], where Fischler uses sequential optimization for matching two-dimensional scenes, introduces the generic form of the embedding metric elaborated on here, and presents the concepts of coherent segmentation, arbitrary serialization, and sequential constraints. The relation of the heuristic embedding problem to formal decision theory is also discussed. The only other paper in which sequential optimization is applied to a broad class of problems involving two-dimensional scenes is that of Martelli and Montanari [4], who present a metric and matched algorithm for smoothing pictures. Kovalewsky [5] and Montanari [3] have applied dynamic programming to the detection of (one-dimensional) line-like pictures. Reference [3] is an outstanding paper, in which Montanari provides much insight into the characteristics of sequential optimization. Both Kovalewsky [5] and Montanari [3] comment on the representational aspects associated with their optimization procedures. An embedding metric conceptually very similar to the one given in [1], [2], and this paper is discussed in a broad and interesting work by Bremermann¹⁰ [12] with respect to its potential use in character recognition, speech recognition, and control of effectors (e.g., manipulators).
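The combination of an embedding metric with a sequential (dynamic-programming-type) optimization discussed in this section can be illustrated by a minimal sketch. It is not the authors' LEA: it assumes the components are linked in a simple chain, that each component has a local cost array (lower is better), and that each spring is a box on the relative position with strict limits (A, B, C, D). All names are hypothetical.

```python
import math


def embed_chain(local_costs, boxes):
    """Minimize the sum of local costs plus box-spring costs along a chain.

    local_costs -- one 2-D cost array (list of rows) per component
    boxes       -- boxes[k] = (A, B, C, D) limits x_{k+1} - x_k to
                   A < row < B and C < column < D
    Returns (total_cost, positions), or (inf, None) if no embedding exists.
    """
    rows, cols = len(local_costs[0]), len(local_costs[0][0])
    best = [row[:] for row in local_costs[0]]   # best cost of chain ending here
    back = []                                   # back-pointers, one array per link
    for k in range(1, len(local_costs)):
        a, b, c, d = boxes[k - 1]
        new_best = [[math.inf] * cols for _ in range(rows)]
        pointers = [[None] * cols for _ in range(rows)]
        for r in range(rows):
            for col in range(cols):
                # Only predecessors allowed by the spring box are examined.
                for dr in range(a + 1, b):
                    for dc in range(c + 1, d):
                        pr, pc = r - dr, col - dc
                        if (0 <= pr < rows and 0 <= pc < cols
                                and best[pr][pc] < new_best[r][col]):
                            new_best[r][col] = best[pr][pc]
                            pointers[r][col] = (pr, pc)
                new_best[r][col] += local_costs[k][r][col]
        best = new_best
        back.append(pointers)
    total, pos = min((best[r][c], (r, c))
                     for r in range(rows) for c in range(cols))
    if math.isinf(total):
        return math.inf, None
    path = [pos]
    for pointers in reversed(back):             # trace the optimum backwards
        pos = pointers[pos[0]][pos[1]]
        path.append(pos)
    path.reverse()
    return total, path


# Two components on a 3 x 3 grid; the spring forces an offset of exactly (1, 1).
first = [[0, 5, 5], [5, 5, 5], [5, 5, 5]]
second = [[5, 5, 5], [5, 1, 5], [5, 5, 0]]
total, positions = embed_chain([first, second], [(0, 2, 0, 2)])
```

In this toy case the second component's best purely local position is (2, 2), but the best complete embedding places it at (1, 1), next to the strong response of the first component at (0, 0). The work per link is proportional to the number of cells times the box size, which is the sense in which such a procedure scales linearly with picture size under these assumptions.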
ACKNOWLEDGMENT
The authors wish to thank O. Firschein and J. Tenenbaum for many constructive suggestions relative to the organization of this paper.
10 Other relevant publications by Bremermann include [13] and [14].

REFERENCES

[1] M. A. Fischler, "The detection of scene congruence," Lockheed
[2] M. A. Fischler, "Aspects of the detection of scene congruence," in Proc. 2nd Int. Joint Conf. Artificial Intelligence (Advance Paper), Sept. 1971, pp. 88-100.
[3] U. Montanari, "On the optimal detection of curves in noisy pictures," Commun. Ass. Comput. Mach., vol. 14, pp. 335-345, May 1971.
[4] A. Martelli and U. Montanari, "Optimal smoothing in picture processing," in Proc. IFIP Congr. Amsterdam, The Netherlands: North-Holland, 1971.
[5] V. A. Kovalewsky, "Sequential optimization in pattern recognition and pattern description," in Proc. IFIP Congr. Amsterdam, The Netherlands: North-Holland, 1968, pp. 146-151.
[6] R. Bellman and S. Dreyfus, Applied Dynamic Programming. Princeton, N. J.: Princeton Univ. Press, 1962.
[7] U. Bertele and F. Brioschi, "A new algorithm for the solution of the secondary optimization problem in nonserial dynamic programming," J. Math. Anal. Appl., vol. 27, no. 3, pp. 565-574, 1969.
[8] O. Firschein and M. A. Fischler, "Describing and abstracting pictorial structures," Pattern Recognition, vol. 3, pp. 421-444, Nov. 1971.
[9] A. Rosenfeld, Picture Processing by Computer. New York: Academic, 1969.
[10] W. F. Miller and A. S. Shaw, "Linguistic methods in picture processing-A survey," in 1968 Fall Joint Comput. Conf., AFIPS Conf. Proc., vol. 33. Washington, D. C.: Thompson, 1968, pp. 279-290.
[11] M. D. Kelly, "Edge detection in pictures by computer using planning," in Machine Intelligence 6, B. Meltzer and D. Michie, Ed. New York: Elsevier, 1971, pp. 397-410.
[12] H. J. Bremermann, "Cybernetic functionals and fuzzy sets," in Ann. Symp. Rec. 1971 IEEE Syst., Man Cybern. Group, Oct. 1971, pp. 248-253.
[13] -, "Pattern recognition, functionals, and entropy," IEEE Trans. Bio-Med. Eng., vol. BME-15, pp. 201-207, July 1968.
[14] -, "What mathematics can and cannot do for pattern recognition," in Pattern Recognition in Biological and Technical Systems, O. J. Grusser, Ed. Heidelberg, Germany: Springer, 1971, pp. 31-45.
[15] W. W. Bledsoe, "The model method in facial recognition," Panoramic Research, Inc., Palo Alto, Calif., Rep. PRI:15, Aug. 1966.
[17] A. J. Goldstein, L. D. Harmon, and A. B. Lesk, "Identification of human faces," Proc. IEEE, vol. 59, pp. 748-760, May 1971.
Martin A. Fischler (S'57-M'58) was born in New York, N. Y., on February 15, 1932. He received the B.E.E. degree from the City College of New York, New York, in 1954 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, Calif., in 1958 and 1962, respectively.

He served in the U. S. Army for two years and held positions at the National Bureau of Standards and at Hughes Aircraft Corporation during the period 1954 to 1958. In 1958 he joined the technical staff of the Lockheed Missiles & Space Company, Inc., at the Lockheed Palo Alto Research Laboratory, Palo Alto, Calif., and currently holds the title of Staff Scientist. He has conducted research and published in the areas of artificial intelligence, picture processing, switching theory, computer organization, and information theory.

Dr. Fischler is a member of the Association for Computing Machinery, the Pattern Recognition Society, the Mathematical Association of America, Tau Beta Pi, and Eta Kappa Nu. He is currently an Associate Editor of the journal Pattern Recognition and is a past Chairman of the San Francisco Chapter of the IEEE Society on Systems, Man, and Cybernetics.

Robert A. Elschlager was born in Chicago, Ill., on May 25, 1943. He received the B.S. degree in mathematics from the University of Illinois, Urbana, in 1964, and the M.S. degree in mathematics from the University of California, Berkeley, in 1969.

Since then he has been an Associate Scientist with the Lockheed Missiles & Space Company, Inc., at the Lockheed Palo Alto Research Center, Palo Alto, Calif. His current interests are picture processing, operating systems, computer languages, and computer understanding.

Mr. Elschlager is a member of the American Mathematical Society, the Mathematical Association of America, and the Association for Symbolic Logic.