IEEE TRANSACTIONS ON COMPUTERS, VOL. C-22, NO. 1, JANUARY 1973
[4] K. D. Senne and R. S. Bucy, "Digital realization of optimal
discrete-time nonlinear estimators," in Proc. 4th Annu. Princeton
Conf. Syst. Sci., 1970.
[5] R. S. Bucy and K. D. Senne, "Digital synthesis of non-linear
filters," Automatica, vol. 7, pp. 287-298, 1971.
[6] E. C. Tacker and T. D. Linton, "Digital and hybrid simulation
of a discrete-time optimal nonlinear filter," in Proc. 4th Hawaii
Int. Conf. Syst. Sci., 1971, pp. 465-467.
[7] -, "Digital and hybrid simulation of a Bayes-optimal non-linear
filter," Studies in Digital Automata, Louisiana State Univ., Baton
Rouge, Air Force Office of Scientific Research Contract
F-44620-68-C-0021, Tech. Rep. LSU-T-TR-40, Sept. 1970.
[8] R. E. Mortensen, "Mathematical problems of modeling stochastic
nonlinear dynamic systems," J. Statist. Phys., vol. 1, no. 2, 1969.
[9] C. W. Helstrom, "Markov processes and applications," in
Communication Theory, A. V. Balakrishnan, Ed. New York:
McGraw-Hill, 1968.
[10] R. S. Bucy, M. J. Merritt, and D. S. Miller, "Hybrid computer
synthesis of optimal discrete nonlinear dynamic systems," in Proc.
2nd Symp. Nonlinear Estimation Theory and Its Applications
(San Diego, Calif.), 1971.
[11] -, "Hybrid synthesis of the optimal discrete nonlinear
filter," Stochastics, vol. 1, Jan. 1973.
[12] D. S. Miller, Ph.D. dissertation, Dep. Elec. Eng., Univ.
Southern California, Los Angeles, 1972.
Edgar C. Tacker (S'59-M'64) was born in Savannah, Tenn., on
September 26, 1935. He received the B.S. degree (with distinction)
in electrical engineering from the University of Oklahoma, Norman,
in 1960, the M.S. degree in electrical engineering from New York
University, New York, N. Y., in 1962, and the Ph.D. degree from
the University of Florida, Gainesville, in 1964.
His industrial experience includes two years as a Systems Engineer
with Bell Telephone Laboratories, Inc. He is presently an Associate
Professor in the Departments of Electrical and Chemical
Engineering, Louisiana State University, Baton Rouge. He has
developed graduate courses in applied functional analysis, systems
science, and digital and hybrid computation at LSU and Michigan
State University, East Lansing. His current research interests are
in the areas of stochastic control theory and multilevel system
theory, with emphasis on computational aspects. He has applied
these theories to problems involving the control of chemical
processes and interconnected electrical energy systems.
Dr. Tacker is a member of Tau Beta Pi, Pi Mu Epsilon, and Eta
Kappa Nu.

Thomas D. Linton was born in Frost, Ohio, on November 9, 1944. He
received the B.S. degree in electrical engineering from the
University of Arkansas, Fayetteville, in 1967, and the M.S. degree
in systems science from Michigan State University, East Lansing,
in 1969. He is presently working toward the Ph.D. degree in the
Department of Electrical Engineering, Louisiana State University,
Baton Rouge.
Mr. Linton is a member of Eta Kappa Nu and Phi Kappa Phi.

The Representation and Matching of Pictorial Structures
MARTIN A. FISCHLER AND ROBERT A. ELSCHLAGER
Abstract-The primary problem dealt with in this paper is the
following. Given some description of a visual object, find that
object in an actual photograph. Part of the solution to this problem
is the specification of a descriptive scheme, and a metric on which
to base the decision of "goodness" of matching or detection.

We offer a combined descriptive scheme and decision metric which is
general, intuitively satisfying, and which has led to promising
experimental results. We also present an algorithm which takes the
above descriptions, together with a matrix representing the
intensities of the actual photograph, and then finds the described
object in the matrix. The algorithm uses a procedure similar to
dynamic programming in order to cut down on the vast amount of
computation otherwise necessary.

One desirable feature of the approach is its generality. A new
programming system does not need to be written for every new
description; instead, one just specifies descriptions in terms of a
certain set of primitives and parameters.

There are many areas of application: scene analysis and
description, map matching for navigation and guidance, optical
tracking, stereo compilation, and image change detection. In fact,
the ability to describe, match, and register scenes is basic for
almost any image processing task.

Index Terms-Dynamic programming, heuristic optimization, picture
description, picture matching, picture processing, representation.

Manuscript received November 30, 1971; revised May 22, 1972, and
August 21, 1972.
The authors are with the Lockheed Palo Alto Research Laboratory,
Lockheed Missiles & Space Company, Inc., Palo Alto, Calif. 94304.

INTRODUCTION
THE PRIMARY PROBLEM dealt with in this paper is the following.
Given some description of a visual object, find that object in an
actual photograph. The object might be simple, such as a line, or
complicated, such as an ocean wave, and the description can be
linguistic, pictorial, procedural, etc. The actual photograph will
be called the "sensed scene," a two-dimensional array of gray-level
values, while the object being sought is called the "reference."

This ability to find a reference in a sensed scene, or,
equivalently, to match or register the images of two scenes, is
basic for almost any image processing task. Application to such
areas as scene analysis and description, map matching for
navigation and guidance, optical tracking, stereo compilation, and
image change detection is direct and obvious.

There are two basic approaches to solving the image-matching
problem as defined above.

If we possess a precise description of the noise and distortion
process which defines the mapping between the reference and its
image in the sensed scene, we can employ statistical decision
theory techniques to derive an image-matching procedure which
optimizes some objective criterion (e.g., minimum error, or minimum
risk, in determining the best embedding of the reference in the
sensed scene). A typical outcome of such an analysis is the use of
a correlation-like matching procedure. However, in most practical
problem situations, the required noise and distortion model is not
available, nor is it feasible to attempt to construct one. For
example, we might consider all human faces to be perturbed versions
of some single ideal or reference face. However, to attempt to
define completely a valid noise and distortion model for this
situation would be a hopeless task.

A second and more general approach to the image-matching problem
bypasses the need for a noise and distortion model by accepting an
embedding metric without requiring its justification as being
equivalent to a minimum error (or minimum risk) procedure. In fact,
without a noise and distortion model, there is no theoretically
valid way to derive or predict the error performance of a selected
procedure prior to its actual application. Our primary concern in
this paper is with this latter case.

We offer the following two sets of criteria for an embedding metric
not theoretically derived on the basis of error performance. First,
it must be successful in application (i.e., its observed error
performance must be acceptable), intuitively satisfying (so we can
have some confidence in its ability to deal with as yet untried
applications), and general enough so that it can be employed over a
wide range of problems without significant modification. Second, it
must be possible to specify a computationally feasible decision
algorithm which selects a suitable embedding based on the given
embedding metric. (The combination of embedding metric and
corresponding decision algorithm will be called an embedding
model.) In the remainder of this paper, we will present an
embedding metric and corresponding decision algorithm, present
examples of the application of this embedding model to a number of
distinct problem areas, and show how this embedding model is
relevant to the problem of scene representation as well as to image
matching.

AN EMBEDDING METRIC
To introduce the generic form of our embedding metric, and
establish its intuitive validity, let us first consider the
following process. Assume that the reference is an image on a
transparent rubber sheet. We move this sheet over the sensed image
and, at each possible placement, we pull or push on the rubber
sheet to get the best possible alignment between the reference
image on the sheet and the underlying sensed image. We evaluate
each such embedding both by how good a correspondence we were able
to obtain and by how much pushing and pulling we had to exert to
obtain it.
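
The two-part evaluation just described (quality of correspondence plus the effort spent deforming the sheet) can be illustrated with a minimal one-dimensional sketch. Everything in it is an illustrative assumption rather than the authors' formulation: the name `embedding_cost`, the absolute-difference mismatch, the unit rest spacing between neighboring reference samples, and the weight `stretch_weight`.

```python
def embedding_cost(reference, sensed, positions, stretch_weight=1.0):
    """Score one placement of a 1-D reference onto a 1-D sensed signal.

    reference -- gray levels of the reference samples
    sensed    -- gray levels of the sensed signal
    positions -- positions[i] is the index in `sensed` where reference[i] lands
    """
    # Correspondence term: how badly the gray levels disagree.
    mismatch = sum(abs(sensed[p] - r) for r, p in zip(reference, positions))
    # Deformation term: neighbors sit one sample apart at rest, so any other
    # spacing counts as pushing or pulling on the sheet (hypothetical choice).
    stretch = sum(abs((b - a) - 1) for a, b in zip(positions, positions[1:]))
    return mismatch + stretch_weight * stretch


sensed = [0, 9, 9, 4, 0, 7, 0]
reference = [9, 4, 7]
rigid = embedding_cost(reference, sensed, [2, 3, 4])      # no deformation
stretched = embedding_cost(reference, sensed, [2, 3, 5])  # last sample pulled
print(rigid, stretched)
```

In this toy case the undeformed placement pays for a poor match on the last sample, while pulling that sample one position further trades a small deformation cost for a perfect match.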

Let us now consider a discrete version of the above process which
is both more precise and more reasonable from an implementation
standpoint. In a specific application, we might have some
information on the range of permissible distortions that can occur
between the reference and sensed images. For instance, some subset
of the items appearing in the sensed image might always retain
their internal shape even though their relative positions might be
subject to change with respect to their locations in the reference
scene. Further, where change of relative position is possible, we
might be able to bound the extent of such change; and, finally, we
might like to assign variable "costs" to the different types of
change of relative position or relative change in some nongeometric
attribute.

To achieve these capabilities, we replace the rubber sheet by a
reference image which is composed of a number of rigid pieces
(components) held together by "springs." A rigid piece of the
reference image can be as small as a single resolution cell, or as
large as the entire reference image, and corresponds to a single
coherent entity in the reference image. The springs joining the
rigid pieces serve both to constrain relative movement and to
measure the "cost" of the movement by how much they are
"stretched." (Typically, the springs will be highly nonlinear in
their behavior.) In determining the cost of an embedding, we
measure the "tension" on each spring (the tension can be a function
of direction as well as stretch or even a relative change in some
locally defined attribute), and also make a local evaluation of how
well each coherent piece is embedded as an independent entity.

The above model permits two interesting dichotomies.
The first dichotomy is the separation of "syntactic" and "semantic"
information. The semantic information, which is application
dependent, is embodied in the specific partitioning of the
reference into coherent pieces, the placement and cost functions
assigned to the springs, and the cost functions associated with the
independent embedding of the coherent pieces. The syntactic
information, which is relatively independent of the particular
application, defines the class of descriptions which the algorithm
can process. These data are embodied in the limits set on reference
decomposition (e.g., number and maximum size of pieces, etc.); in
the formats which must be employed to specify the global
constraints and costs; and in the form of the embedding metric
which evaluates "global" fit. The separation of semantic and
syntactic information is essential to permit application of the
model to a broad range of problem areas without the necessity of
making significant changes in the implementation.

The second dichotomy is the separation between the local and global
evaluation functions. The global evaluation function, associated
with the relative positioning of the coherent pieces as described
previously, has strong syntactic controls on its form to permit its
integration directly into the decision algorithm. This is important
because the global evaluation produces the most severe
combinatorial problems. A local evaluation function, associated
with how well a given coherent piece is independently embedded, is
easily changed from problem to problem (based on problem-dependent
considerations) without requiring any change in the core
algorithms. Thus, the form of a local evaluation function can be a
(conventional) correlation function together with a pictorial
reference component, or a procedure based on linguistic concepts
together with a formal description of a reference component, or
even a series of guesses inserted interactively by a human
evaluator. The decoupling of the local evaluation functions from
the core algorithms provides a great deal of flexibility in making
changes or improvements in the evaluation functions for a given
problem, as well as when switching from problem to problem.
Further, because of the above separation, the performance of the
algorithms (both local and global) can be independently evaluated
in a direct and intuitively obvious manner. Such an evaluation then
permits iterative improvement in performance by selective
alteration in the problem-dependent options.

We are now in a position to present formally the proposed embedding
metric. Let the reference be composed of p components (i.e., p
coherent, or primitive, pieces). For 1
global (intercomponent) constraints, and, if acceptable, then
evaluate the embedding metric for the given selection. Obviously,
such an approach is completely impractical even for small pictures.
For example, a $50 \times 30$ SM (M = 1500) and a $5 \times 5$ RM
(N = 25) would require more than $10^{54}$ selections and
evaluations, a hopelessly large number. If we assume that the
coherent and relational constraints provide us with P = 6
nonoverlapping components, each component sequentially constrained
to stay within w = 10 locations referenced to the location of the
previously placed component, then we would still be required to
perform on the order of $1500 \times 10^{5} = 1.5 \times 10^{8}$
evaluations. Assuming $10^{-3}$ s per evaluation, we would require
$1.5 \times 10^{5}$ s or approximately two days of computation
time. It is thus obvious that a more effective technique is
required.
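
The counts quoted above can be checked with a few lines of arithmetic. This is only a sanity check under stated assumptions: the unconstrained figure is interpreted here as the number of unordered choices of N cells out of M (a binomial coefficient), which is one reading that agrees with the quoted order of magnitude, and the constrained figure as M placements for the first component followed by w placements for each of the remaining P - 1 components.

```python
import math

M = 1500   # cells in the 50 x 30 sensed matrix (SM)
N = 25     # cells in the 5 x 5 reference matrix (RM)

# Assumption: a "selection" is an unordered choice of N of the M cells.
selections = math.comb(M, N)
print(f"unconstrained selections ~ 10^{math.log10(selections):.1f}")

P = 6      # nonoverlapping components
w = 10     # allowed locations relative to the previously placed component
evaluations = M * w ** (P - 1)    # first component anywhere, then w each
seconds = evaluations * 1e-3      # 10^-3 s per evaluation
days = seconds / 86400
print(evaluations, seconds, round(days, 2))
```

Under these assumptions the constrained search still needs 1.5 x 10^8 evaluations, which at a millisecond each is a little under two days.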

Dynamic Programming
Expressed in formal terms, the evaluation of the embedding metric
for a typical picture results in a nonlinear, integer programming
problem with local optima different from the global optimum (i.e.,
no particular regularity, such as unimodality). The only available
class of computational procedures for finding the global optimum
under the above conditions (other than the exhaustive search
techniques discussed earlier) is usually designated by the generic
name dynamic programming.

DP is a multistage or iterative optimization procedure which can be
described in general terms as follows (see [6] and [7]). We wish to
find

$$\min_{X} G(X) = \sum_{i \in I} h_i(X_i) \qquad (3)$$

where $X = \{x_1, x_2, \ldots, x_p\}$, each $x_i$, 1
table for $f_k$ and $y_k^*$. The number of lookups required for the
construction of this $k$th table will be proportional to the
number⁴ of its rows [AMD(G)j] and (for each row) the constrained
number of feasible locations of the variable being eliminated
(denoted $W_k$, where $W_k$
IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973
responding locations of the embedded components aredetermined
from Y [see (20)].7
Given the restriction that the hi in (3) are independentof all
xj forj > i [this is consistent with the embeddingmetric as
presented in (2) ], then dynamic programmingcan be viewed as a
procedure for finding the shortestpath through a graph. We define
this graph in the fol-lowing way. The nodes of the graph are
arranged in pcolumns, where each node in the ith column is
labeledby the values of the variables corresponding to a uniqueset
of embeddings for the first i components; there areas many nodes in
the ith column as there are uniqueembeddings of the first i
components. The length of abranch between a node in column i -1
(with label Xi1)and a node in column i (with label Xi) is hi(Xi).
Wedelete branches with infinite length.To determine the shortest
path from the root node
(a single node placed in column zero) to some node incolumn p,
we can proceed as follows. In the ith column,for each node, sum all
the branch lengths correspondingto the label of the node. We now
determine that set ofvariables (xi) which do not appear in any hj
for j> i,and call this set of variables Zi (the variables in Zi
cor-respond to components in columns ji). We now place the nodes in
the ith column intosets, such that, for each set, the nodes are
identical intheir labels except for the variables in Zi. For each
suchset, we retain the node with the shortest path lengthfrom the
root node, and delete all the other nodes in theset as well as
those nodes in columnsj>i which branchfrom deleted nodes. It is
this pruning process whichgives DP its computational advantage over
completeenumeration. After the above set of operations is
carriedout through the Pth column, we select the node definingthe
shortest path through the tree, and thus the lowestcost embedding
for the P components. The LEA differsfrom DP in that in the
sequential determination of theshortest path, a maximum of only mi
nodes will be re-tained in the ith column (mi is the number of
permissiblelocations for embedding the ith component). In
pro-cessing the ith column, the nodes are grouped into setssuch
that, for each set, the nodes are identical in theirlabels for xi.
For each such set we then proceed as in theDP case. Thus, at the
ith iteration we save only the mi"best" current embeddings, such
that every possible

⁷ The $y_i$ defined in (16) clarify the presentation of the LEA.
However, in a computer implementation one need not calculate these
$y_i$, which are arrays, each element of which is a sequence of
locations. Rather, it is only necessary to compute a more
restricted function $z_i$ defined by

$$z_i(x_i) = x_{i-1}$$

where $x_{i-1}$ is that value which minimizes the previous
equation. These $z_i$ are arrays, each element of which is a single
location rather than a sequence of locations. Then in the
computations involving the $g_i$, if $g_i$ depends on more than the
two rightmost locations of $y_{i-1}(x_{i-1}) * x_i$, the additional
locations may be retrieved from the $z_k$, 1

[Fig. 1, panels (a)-(c); only the content recoverable from the scan
is kept]

(a) The sensed image SM(z, y), with x = (z, y):

         z=1  z=2  z=3  z=4
   y=4    5    2    8    8
   y=3    7    5    1    3
   y=2    8    1    5    7
   y=1    4    3    2    4

    Embeddings listed with the sensed image:
    [(2, 3), (3, 3), (3, 2), (2, 2)]
    [(3, 2), (4, 2), (4, 1), (3, 1)]

(b) The reference description: components c1, c2, c3, c4, each of
    size (1, 1), with reference values C1 = 6, C2 = 4, C3 = 5,
    C4 = 3, and local costs

    $$l_i(z, y) = |SM(z, y) - C_i| \quad \text{for } 1 \le i \le 4$$

    Spring definition when (i, j) = (2, 1) or (i, j) = (4, 3):

      x_i - x_j = (z_i - z_j, y_i - y_j)     g_ij(x_i - x_j)
      (1, 0)                                 0
      (2, 0)                                 1
      otherwise                              [value not legible]

    Spring definition when (i, j) = (4, 1) or (i, j) = (3, 2):

      x_i - x_j = (z_i - z_j, y_i - y_j)     g_ij(x_i - x_j)
      (0, 1)                                 0
      (0, 2)                                 1
      otherwise                              [value not legible]

(c) Linear embedding algorithm: tables giving the evaluation of g2
    (over x2, x1), of g3 (over x3, x2, x1), and of g4 = G (over x4,
    x3, x1). The lowest value tabulated for G is 6.

Fig. 1. An example illustrating the operation of the linear
embedding algorithm. The definitions of x, g_ij, l_i are given on
pages [page numbers not legible]; z and y are the components of x;
that is, x = (z, y). (a) The sensed image. (b) The reference
description. (c) Linear embedding algorithm.
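
The example of Fig. 1 is small enough to replay in code. The sketch below is a hedged reconstruction, not the authors' program. The gray levels and reference values are taken as read from Fig. 1(a) and (b). The spring directions in `STEP` (component i relative to component j) and the infinite cost for every other displacement are assumptions, chosen so that the embeddings listed with the figure cost nothing in spring tension. The names `local`, `spring`, `total`, and `lea` are hypothetical. The function `lea` keeps one best partial embedding per location of the current component, as the text describes for the LEA, and the result is compared against complete enumeration.

```python
import itertools
import math

# Sensed image SM(z, y) as read from Fig. 1(a); rows keyed by y.
ROWS = {4: [5, 2, 8, 8], 3: [7, 5, 1, 3], 2: [8, 1, 5, 7], 1: [4, 3, 2, 4]}
SM = {(z, y): ROWS[y][z - 1] for y in ROWS for z in range(1, 5)}
C = {1: 6, 2: 4, 3: 5, 4: 3}          # reference values from Fig. 1(b)
LOCS = sorted(SM)

def local(i, x):
    """Local cost of placing component i at location x."""
    return abs(SM[x] - C[i])

# Assumed rest displacement of component i relative to component j.
STEP = {(2, 1): (1, 0), (3, 2): (0, -1), (4, 3): (-1, 0), (4, 1): (0, -1)}

def spring(i, j, xi, xj):
    """0 at rest, 1 when stretched to twice the rest step, else forbidden."""
    d = (xi[0] - xj[0], xi[1] - xj[1])
    s = STEP[(i, j)]
    if d == s:
        return 0
    if d == (2 * s[0], 2 * s[1]):
        return 1
    return math.inf

def total(x1, x2, x3, x4):
    """Full cost of one embedding: local costs plus all four springs."""
    return (local(1, x1) + local(2, x2) + local(3, x3) + local(4, x4)
            + spring(2, 1, x2, x1) + spring(3, 2, x3, x2)
            + spring(4, 3, x4, x3) + spring(4, 1, x4, x1))

def lea():
    # best[x] = (cost, path): cheapest retained partial embedding whose
    # current component sits at x; one candidate per location is kept.
    best = {x: (local(1, x), (x,)) for x in LOCS}
    for i in (2, 3, 4):
        nxt = {}
        for x in LOCS:
            cands = []
            for xp, (cost, path) in best.items():
                c = cost + local(i, x) + spring(i, i - 1, x, xp)
                if i == 4:                      # closing spring back to c1
                    c += spring(4, 1, x, path[0])
                if c < math.inf:
                    cands.append((c, path + (x,)))
            if cands:
                nxt[x] = min(cands)
        best = nxt
    return min(best.values())

lea_cost, lea_path = lea()
exhaustive_cost = min(total(*e) for e in itertools.product(LOCS, repeat=4))
print(lea_cost, lea_path, exhaustive_cost)
```

The sequential search retains at most 16 partial embeddings per stage, whereas the enumeration evaluates all 16^4 placements. Because the closing spring depends on the first component, which the retained candidates no longer index, agreement between the two is not guaranteed in general; it can only be checked case by case, as done here.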
ponents, or perhaps by someone guessing at what a "good" embedding
might actually look like.) Now, employing the LEA (or DP) to place
and evaluate the cumulative cost of placement of the components
sequentially, we can eliminate from further consideration any of
those placements whose costs exceed the bound established by our
best trial embedding. It should be noted that this technique is
valid only if the cost associated with the embedding of each
component is nonnegative. However, this can almost always be the
case for the class of problems we are discussing in this paper.

A heuristic embellishment of the above branch-and-bound technique
would be to use some fraction (say [k/N]') of the bound at the kth
stage of an N-stage process as the threshold for eliminating a
possible sequence of embeddings.

Scale and Rotation (S&R) Considerations
In attempting to match or register two images, we frequently are
faced with the problem of unknown relative scale and orientation.
While such variations are conceptually indistinct from any of a
host of unwanted variations between the reference and the image,
they (S&R) can serve as a vehicle for clarifying some important
issues pertaining to the way the LEA is employed.

As noted in earlier sections, the embedding process is carried out
at two levels. First, the components of the reference are searched
for as independent entities. The particular processes by which
these searches are executed are not a direct issue of concern here;
the important point is that, regardless of the search mechanism,
the outcome of the search for any individual component is presented
to the LEA by a tabulation called the local evaluation array
[L(EV)A]. Each entry in the L(EV)A corresponds to a possible
embedding in the image of the associated reference component,
indexed by the variables used to define the embedding. The entry
consists of a number related to the probability that the component
is actually present at the "location" specified by the indexing
variables, and each entry can also contain the values of attributes
of the component as measured at the indexed location. The LEA has
no knowledge of the component beyond what is presented to it in the
L(EV)A for that component. The purpose of the LEA is to integrate
global or structural knowledge with the information provided in the
L(EV)A's to find the best overall embedding (or embeddings) of the
reference in the image. The acceptability of the final embedding
selected by the LEA will thus be dependent on the quality of the
information presented in the L(EV)A's, where the extent of this
dependence is related to the relative importance of local
(component definition) versus global (intercomponent) information
for the particular problem. Thus, it is the responsibility of the
local evaluation function, in attempting to gather evidence about
the presence of some given reference component, to be able to deal
with the various noise and distortion processes (such as S&R) which
might be encountered. To the extent that these same noise and
distortion processes affect the global or structural relationships
between the reference components, the LEA provides the machinery
necessary to deal with the resulting problems.

Ability to deal with variations at the global level is accomplished
by defining "attributes" which measure (or estimate) these
variations, and then making the "spring" parameters functions of
these attributes. Thus, in the case of S&R, if a component $P_i$
has scale and rotation attributes $S_i$ and $\theta_i$, the springs
(vectors) attached to $P_i$ would be (conceptually) scaled as a
function of $S_i$, and rotated as a function of $\theta_i$. The
following example illustrates some of the above comments.

Fig. 2. Reference description of a square.

Problem
Given a two-dimensional region in which there are $k$ randomly
oriented and positioned line segments, find the four line segments
which best approximate a square. Each line is specified by a
four-tuple of the type $(x, y, \theta, l)$ where the $x, y$
coordinates locate the center of the segment, $\theta$ specifies
the orientation, and $l$ the length of the segment. To simplify
this example, we will ignore the detection problem and assume that
the given values for each segment are known with probability one.
We assign a cost $C_1$ for each unit of positional disparity
between the sides of a candidate square, and a cost $C_2$ for each
degree of rotational disparity between the sides (i.e., sides
should meet at right angles).

Solution
Consider a single "local evaluation array" consisting of the given
list of four-tuples $(x, y, \theta, l)$. We can consider $x, y$ to
be the "location" indexing variables, and $\theta, l$ to be
attributes. All entries have unit probability; entries with zero
probability are deleted. The "description" of a square is shown
schematically in Fig. 2. Each (spring) vector is rigidly attached
to its line segment at a fixed angle of 45°. Costs ($C_1$)
associated with $(x, y)$ disparity between the end of a vector and
the center of the next line segment are circularly symmetric (i.e.,
the "cost" function increases with distance from the tip of the
vector) about the tip of the vector. We also assess a cost ($C_2$)
proportional to the difference in measured (attribute value)
orientation of more or less than 90° for sequential line segments.

The general form of the LEA, with orientation adjusted springs, and
spring costs augmented by attribute (S&R) differences, is adequate
to deal with the problem as posed. A minor difficulty arises from
the fact that the orientation of a line segment is actually two
valued (i.e., $\theta$ and $\theta + 180°$). If we list each line
segment twice in the local evaluation array, once for each of its
two orientations, then the LEA can be applied without modification.

We can handle squares of differing size by using the
line-segment-length attribute in the same manner as the orientation
attribute (except that double entries are not required here).
Spring stretching is augmented by a cost proportional to the
difference in length attribute for sequential line segments. That
is, when the LEA examines a new line segment as a possible
additional side for a square already partially formed, it compares
the length of the new line segment with the length of the line
segment in the partial square to which the new segment will be
attached. The spring between the new and the old line segments then
is stretched (over and above any stretching due to angular and
positional disparity) by an amount proportional to the difference
in these lengths.

In the above example note that, because each entry in the local
evaluation array had either zero (∞ cost) or unit (0 cost)
probability associated with its occurrence, the size of this array
could be reduced to listing only those few coordinate combinations
associated with the feasible (nonzero probability) occurrence of a
component (line segment). This procedure can be used in other
situations where it is reasonable to reduce all low probability
entries in the local evaluation array to zero.
PICTORIAL REPRESENTATION
A central problem in much of the work concerned with the computer
processing of pictorial data is that of representation. Since we
cannot manipulate the real world object (itself) within the
computer, we attempt to construct a representation (or model) which
can be used in place of the actual object and which has the
following (somewhat overlapping) properties.

Complete: Any question of interest which could be resolved by
reference to the actual object should also be capable of being
resolved by reference to the representation.

Compact: The representation should be free of information redundant
to the purposes for which it will be used. This is necessary to
minimize computer storage requirements.

Transformable: Much of the information contained in a
representation will be implicit rather than explicit in form. The
ability to manipulate easily the representation to extract required
information is essential. For example, if we represent a picture by
an intensity matrix or raster, then a count of the number of
isolated objects appearing in the picture would be implicit
information which could be extracted from the representation after
considerable processing. However, if the representation consisted
of the contours of the object appearing in the picture, then the
required count could be obtained rather simply.

Incrementally Changeable: If we observe a slight change in the real
world object, it should be a relatively simple and straightforward
task to alter the representation. Further, from the standpoint of
image matching, a small change in the real world should require
only a small change in the representation.

Accuracy and Simplicity of Translation: Given a real world object,
it should be relatively simple to derive an accurate representation
of the object.

Over the past ten years or so, much of the work concerned with
pictorial representation has been restricted to the domain of line
type drawings, and the use of formal linguistic methods (see
[8]-[10]). Very little success has been achieved in attempts to
extend this work to scenes of terrains, cloud covers, human faces,
etc., which can only be described meaningfully in terms of picture
components which are not line elements, but which are regions with
colors, textures, shadings, etc.⁸

Perhaps the most serious failing of the linguistic (and similar)
techniques occurs with respect to the "translation" property. These
techniques build a representation by constructing a hierarchy based
on picture primitives, assembled into linear expressions employing
specified relational forms, and satisfying a set of syntax rules.
The problem arises from the fact that (usually) the only direct
correspondence between the actual object and its representation
occurs at the level of the primitive elements (typically points,
intensities, and lines), while in practice there are pieces of a
picture that are too involved or complicated to describe in terms
of these primitives. Theoretically, the description is possible
since the matrix representing the picture is finite. However, such
a description would be so complicated that, aside from the
difficulty of composing it, there would be a considerable
likelihood of error and inaccurate representation.

In the previous portions of this paper we have presented a
representational scheme for pictures, and were primarily concerned
with its application to image matching. We will now show that the
representational scheme has wide general applicability, and avoids
many of the problems of the linguistic approaches. First we note
that it is a hybrid type of representation in that it invokes
symbolic (numerical) elements as well as allowing actual picture
segments to be part of the representation. The ability to intermix
picture segments and symbolic data in the same representation
greatly simplifies the translation problem. Where a pictorial
concept is difficult to describe symbolically, we can use an actual
piece of the picture as part of the description. A second aspect of
our representation that simplifies the translation problem is the
fact that the components (picture pieces, local evaluation arrays,
etc.) and the relational forms (springs) are two-dimensional rather
than one-dimensional entities. We thus avoid the problem of having
to construct a one-dimensional model for a two-dimensional
structure.

⁸ Specialized systems have been developed, highly tailored to
specific problems, which are exceptions to this assertion; e.g.,
see Kelly [11]. A number of papers, including Bledsoe [15], [16]
and Goldstein and Harmon [17], effectively consider the problem of
face identification based on feature measurements obtained
manually.

Let us now examine some of the other representational attributes.
Incremental changeability follows directly from ease of translation
(although the inverse relationship would not necessarily hold).
Transformability with respect to image matching has certainly been
established; the fact that the representation is already in
two-dimensional form, with metric, geometrical, and topological
relationships explicitly and quantitatively expressed, implies that
transformability for many other applications is more than adequate.
In many respects, compactness and transformability are
antithetical since data in explicit form are usually more extensive
than the equivalent implicit information. This is the case with our
representation. It requires considerably more storage than might
be required for a linguistic representation.
The one area where the linguistic approach has an obvious advantage
is with respect to completeness. A linguistic representation can
treat nonpictorial information (e.g., relations between items in a
picture and other items not visible, but perhaps implied or normally
associated with the pictured items) in a way that would be extremely
difficult in our representation. This additional capability could, of
course, be achieved by appending the necessary linguistic machinery
to our current scheme, although the final result might well be a
mixture of two representations rather than one integrated
representation.
EXPERIMENTAL RESULTS
In order to evaluate the practical implications of the techniques
presented in this paper, we have initiated a program involving
experiments on a variety of line-type drawings and gray-level
imagery. Over 400 experiments have already been performed with the
following general results.
1) On well-defined imagery (i.e., relatively noise-free and
unblurred pictures), the embeddings produced by the LEA were almost
always in agreement with the best embedding as predetermined by
human evaluators. Where the few deviations did occur, they were
reasonable, and usually related to the crude component
descriptions employed.
2) On noisy imagery (coarse resolution, additive random and
coherent noise), the fall in performance paralleled the difficulty
human evaluators had in locating suitable embeddings. Where the
components were discernible in the image, the embeddings were usually
correct; those components which were significantly altered by the
noise were sometimes missed, but the substitution error was usually
in close proximity to the correct embedding location, and, even in
error, the correct embedding almost always had a score close to
the best score.
Since the programmed version of the algorithm is still evolving,
and some of the discussed features have not yet been implemented
(e.g., the "attribute" feature is not operational as yet), most of
the experiments were informal in nature. However, for the purposes
of this paper, two sets of controlled experiments were run and are
described below.
Image-Matching Experiments Using Faces
The majority of the experiments we have run to date had human faces
as their subject. Reasons for this selection include the following.
1) The availability of a set of digitized gray-scale pictures
containing faces.
2) A single reference (face) could be tested on all the faces in
the data set. In the case of, say, terrain pictures, a separate
reference (or, at best, a unique composition of standard reference
components) is necessary for each picture.
3) Our familiarity with faces and their components (eyes, nose,
mouth, etc.) facilitates evaluation of performance as noise and
distortion are introduced.
The data set used in the face experiments consisted of
15 human faces,⁹ both men (some with beards) and women, digitized
to approximately 16 true gray levels, and each face typically was
contained in a picture field of from 2000 to 3000 resolution
elements. Using a reference as shown schematically in Fig. 3(a),
with components as described in Fig. 3(b) and (c), almost 300
formal and informal experiments were performed. In each
experiment, additive (truncated) Gaussian random noise with
zero mean and standard deviation of either 0, 10, or 15 units was
added to each resolution element (relative to a pseudogray scale of
64 units for the noise-free pictures). In some of the informal
experiments, coherent noise consisting of randomly placed lines was
also inserted [see Fig. 4(a)].
With no more than two or three exceptions, when the reference was
restricted to hair, eyes, and sides of face, correct embedding was
achieved. These results are gratifying in view of the simple
component descriptions employed, and the equivocation displayed by
the resulting L(EV)A's. (See examples shown in Fig. 4.)
Two series of formal experiments were run on the
9 These data were obtained from the Stanford Artificial Intelligence
Laboratory and are a subset of the data employed in the experiments
described by Kelly [11].
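[Fig. 3(a): schematic of the face reference. Legible component
labels: LEFT EDGE, RIGHT EDGE, NOSE, MOUTH.]
(a)
VALUE(X) = (E + F + G + H) - (A + B + C + D)
Note: VALUE(X) is the value assigned to the L(EV)A corresponding
to the location X as a function of the intensities of locations A
through H in the sensed scene.
(b)
K1, K2 = CONSTANTS
a = (C + D + E + F)/4
p = (A + B + G + H + I + J)/6
[The remainder of this panel, beginning "p-(X+F)IF [X", is
truncated in the source.]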
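The expression VALUE(X) = (E + F + G + H) - (A + B + C + D) can be evaluated at every location X to fill an L(EV)A. The Python sketch below shows one way to do this. The positions of A through H relative to X are defined by the figure and are not recoverable here, so the offsets used (A to D in the row above X, E to H in the row below) are purely illustrative assumptions, as are the function name and the choice to treat locations outside the picture as intensity 0.

```python
# Hypothetical layout: (row offset, column offset) of each location relative to X.
MINUS_OFFSETS = [(-1, dc) for dc in (-1, 0, 1, 2)]  # stands in for A, B, C, D
PLUS_OFFSETS = [(1, dc) for dc in (-1, 0, 1, 2)]    # stands in for E, F, G, H


def local_evaluation_array(picture, plus_offsets, minus_offsets):
    """Return an array whose entry at X is sum(plus) - sum(minus) around X."""
    rows, cols = len(picture), len(picture[0])

    def intensity(r, c):
        # Assumption: locations outside the picture contribute 0.
        return picture[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [
            sum(intensity(r + dr, c + dc) for dr, dc in plus_offsets)
            - sum(intensity(r + dr, c + dc) for dr, dc in minus_offsets)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy sensed scene: two dark rows above two bright rows.
scene = [[0] * 6, [0] * 6, [10] * 6, [10] * 6]
leva = local_evaluation_array(scene, PLUS_OFFSETS, MINUS_OFFSETS)
```

With these illustrative offsets the array responds most strongly where a dark region lies directly above a bright one, which is the kind of simple intensity contrast a difference of two four-element sums can detect.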
[Fig. 4(a): line-printer pictures, not reproducible here: original
picture; noisy picture (sensed scene) as used in experiment; L(EV)A
for eye. (Density at a point is proportional to probability that an
eye is present at that location.)]
HAIR WAS LOCATED AT (6, 18)
L/EDGE WAS LOCATED AT (18, 10)
R/EDGE WAS LOCATED AT (18, 25)
L/EYE WAS LOCATED AT (17, 13)
R/EYE WAS LOCATED AT (17, 21)
NOSE WAS LOCATED AT (22, 18)
MOUTH WAS LOCATED AT (24, 17)
Fig. 4. Examples of image-matching experiments using faces. (a)
Successful embedding under coherent noise.
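The seven locations printed with Fig. 4(a) can be read into a small table and checked for the gross geometry expected of a face. The Python sketch below is an illustration only: it assumes the printed pairs are (row, column) with rows increasing downward, which is consistent with the hair having the smallest first coordinate, and the function names and the particular ordering tests are hypothetical rather than part of the reported experiments.

```python
import re

# Locations as printed with Fig. 4(a).
REPORT = """\
HAIR WAS LOCATED AT (6, 18)
L/EDGE WAS LOCATED AT (18, 10)
R/EDGE WAS LOCATED AT (18, 25)
L/EYE WAS LOCATED AT (17, 13)
R/EYE WAS LOCATED AT (17, 21)
NOSE WAS LOCATED AT (22, 18)
MOUTH WAS LOCATED AT (24, 17)
"""

LINE = re.compile(r"^(\S+) WAS LOCATED AT \((\d+),\s*(\d+)\)$")


def parse_report(text):
    """Map each component name to its (row, column) location."""
    placement = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            placement[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return placement


def plausible_face(p):
    # Assumed sanity checks: top-to-bottom order, and left-to-right order.
    rows_ok = p["HAIR"][0] < p["L/EYE"][0] < p["NOSE"][0] < p["MOUTH"][0]
    eyes_ok = p["L/EYE"][1] < p["NOSE"][1] < p["R/EYE"][1]
    edges_ok = p["L/EDGE"][1] < p["L/EYE"][1] and p["R/EYE"][1] < p["R/EDGE"][1]
    return rows_ok and eyes_ok and edges_ok


placement = parse_report(REPORT)
consistent = plausible_face(placement)
```

A check of this kind says nothing about whether each component sits on the right feature of the picture; it only confirms that the reported embedding has the overall arrangement of the reference.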
[Fig. 4 (continued), panels between (a) and (d): line-printer
pictures, not reproducible here. The only legible panel label is
"Original picture."; the panel captions are not recoverable from the
source.]
[Fig. 4(d): line-printer pictures, not reproducible here: original
picture; noisy picture (sensed scene) as used in experiment; L(EV)A
for hair. (Density at a point is proportional to probability that
hair is present at that location.)]
HAIR WAS LOCATED AT (11, 21)
L/EDGE WAS LOCATED AT (25, 11)
R/EDGE WAS LOCATED AT (25, 24)
L/EYE WAS LOCATED AT (21, 15)
R/EYE WAS LOCATED AT (21, 21)
NOSE WAS LOCATED AT (26, 18)
MOUTH WAS LOCATED AT (29, 17)
Fig. 4 (continued). (d) Successful embedding under random noise.
[Fig. 4(e): line-printer pictures, not reproducible here: original
picture; noisy picture (sensed scene) as used in experiment; L(EV)A
for L/EDGE. (Density at a point is proportional to probability that
L/EDGE is present at that location.)]
HAIR WAS LOCATED AT (8, 20)
L/EDGE WAS LOCATED AT (20, 11)
R/EDGE WAS LOCATED AT (20, 27)
L/EYE WAS LOCATED AT (18, 15)
R/EYE WAS LOCATED AT (18, 23)
NOSE WAS LOCATED AT (23, 18)
MOUTH WAS LOCATED AT (25, 18)
Fig. 4 (continued). (e) Successful embedding under random noise.
[Fig. 4 (continued), panel between (e) and (g): line-printer
pictures, not reproducible here. The only legible panel label is
"Original picture."; the panel caption is not recoverable from the
source.]
[Fig. 4(g): line-printer pictures, not reproducible here: original
picture; noisy picture (sensed scene) as used in experiment; L(EV)A
for eye. (Density at a point is proportional to probability that eye
is present at that location.)]
HAIR WAS LOCATED AT (13, 23)
L/EDGE WAS LOCATED AT (25, 13)
R/EDGE WAS LOCATED AT (25, 28)
L/EYE WAS LOCATED AT (22, 16)
R/EYE WAS LOCATED AT (22, 23)
NOSE WAS LOCATED AT (27, 20)
MOUTH WAS LOCATED AT (29, 19)
Fig. 4 (continued). (g) Successful embedding under random noise.
[Fig. 4(h): line-printer pictures, not reproducible here: original
picture; noisy picture (sensed scene) as used in experiment; L(EV)A
for mouth. (Density at a point is proportional to probability that
mouth is present at that location.)]
HAIR WAS LOCATED AT (8, 19)
L/EDGE WAS LOCATED AT (16, 9)
R/EDGE WAS LOCATED AT (16, 23)
L/EYE WAS LOCATED AT (14, 12)
R/EYE WAS LOCATED AT (14, 18)
NOSE WAS LOCATED AT (18, 16)
MOUTH WAS LOCATED AT (21, 16)
Fig. 4 (continued). (h) Successful embedding under random noise.
[Fig. 4 (continued): line-printer pictures, not reproducible here.
Legible panel labels: "Original picture."; "L(EV)A for nose. (Density
at a point is proportional to probability that nose is present at
that location.)" The panel caption is not recoverable from the
source.]
[Fig. 5(a): line-printer picture, not reproducible here. Legible
region label: "Vegetation"; a further label appears to read
"Seashore".]
Fig. 5. Example of image-matching experiment using a terrain scene.
(a) Reference for terrain.
[Fig. 5(b): line-printer picture, not reproducible here. Legible
region label: "vegetation".]
Fig. 5 (continued). (b) Embedding of reference in sensed image
for terrain.
[Figure: line-printer picture, not reproducible in text.]
Fig. 5 (continued). Sensed image for terrain.
posed a rigid template which is correlated at the different
positions of the sensed scene in order to get the local evaluation
array. The correlation is performed only at the indicated points of
the template. Reproduction difficulties make it hard to see the
intensities of these enclosed points. For "Vegetation 1," all the
points are of intensity 40; for "Vegetation 2," 20. For "Urban,"
the points were randomly assigned intensities of 25 and 35, and so
on for the other pieces.

The terrain-matching experiments represent a continuing area of
investigation. Our current efforts are primarily directed toward
obtaining a satisfactory set of primitives which can be used as the
basis for reference component description. In low-noise
experiments, with limited geometric distortion, the simple
descriptions (i.e., ad hoc shape and texture components linked by
springs) produced correct embeddings. Experiments under more
severe conditions remain to be performed.
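
The rigid-template step described above can be sketched as follows. This
is only an illustration. The text says the template is "correlated" at
selected points but does not give the measure, so the sum of absolute
intensity differences used here is an assumption, as are the function
name `local_evaluation_array` and the list-of-lists picture format.

```python
import math

def local_evaluation_array(sensed, offsets, intensities):
    """Score every placement of a sparse rigid template on a sensed picture.

    sensed      -- picture as a list of rows of intensities
    offsets     -- (row, column) positions of the template's indicated points,
                   relative to the template origin
    intensities -- reference intensity at each indicated point

    Returns an array of the same shape as `sensed`; entry [r][c] is the
    mismatch when the template origin is placed at (r, c). Lower is better.
    The mismatch measure (sum of absolute differences) is an assumption.
    """
    rows, cols = len(sensed), len(sensed[0])
    leva = [[math.inf] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            points = [(r + dr, c + dc) for dr, dc in offsets]
            # Placements that push any template point off the picture
            # keep an infinite cost.
            if all(0 <= rr < rows and 0 <= cc < cols for rr, cc in points):
                leva[r][c] = sum(
                    abs(sensed[rr][cc] - ref)
                    for (rr, cc), ref in zip(points, intensities)
                )
    return leva
```

Only the indicated template points are visited, so the cost per
placement depends on the number of such points rather than on the area
of the template.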
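Implementation Details

In all of the face and terrain experiments presented in this paper,
the springs were assigned the values

$$
g_{ij}(x_j - x_i) =
\begin{cases}
0, & \text{if } A < \operatorname{row}(x_j - x_i) < B \text{ and } C < \operatorname{column}(x_j - x_i) < D \\
\infty, & \text{otherwise.}
\end{cases}
$$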
The values A, B, C, and D were typically set by taking a number
of sample pictures and determining the smallest box encompassing
the variation in the relative position between the components i
and j. A subroutine was written to perform this task and
automatically set the derived parameters into the LEA.
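
A minimal sketch of the box-shaped spring and of the parameter-setting
step is given below. The names `fit_spring_box` and `spring_cost` are
hypothetical. The bounds are treated as strict inequalities, as printed
above, so the fitted box is widened by one cell on each side to keep
every sampled offset admissible. That widening is an assumption, not
something the text states.

```python
import math

def fit_spring_box(sample_offsets):
    """Derive (A, B, C, D) from observed displacements x_j - x_i.

    sample_offsets -- (row, column) displacements between components i and j,
                      one per sample picture.
    The bounds are widened by 1 so that the strict test A < row < B,
    C < column < D accepts every sample (assumed convention).
    """
    rows = [dr for dr, _ in sample_offsets]
    cols = [dc for _, dc in sample_offsets]
    return min(rows) - 1, max(rows) + 1, min(cols) - 1, max(cols) + 1

def spring_cost(xi, xj, box):
    """Return 0 if the displacement x_j - x_i lies inside the box, else infinity."""
    A, B, C, D = box
    dr, dc = xj[0] - xi[0], xj[1] - xi[1]
    return 0.0 if (A < dr < B and C < dc < D) else math.inf
```

For example, sample displacements (3, -1), (5, 0), and (4, 2) give the
box A = 2, B = 6, C = -2, D = 3. Any displacement inside it costs
nothing, and any displacement outside it rules the embedding out.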
In our current implementation, for a typical 35 X 39 (face)
picture, we require 13 s to compute all the L(EV)A's, and 35 s to
execute the LEA (these times include picture input from a disk
library, and storage of results on the disk). Most of the
programming is in Fortran, and the computer is an IBM 360/40 (the
addition instruction on the 360/40 executes in 11.88 μs). Total
core and disk storage in bytes for all arrays is M(4f+2h) and
NM(f+2h), respectively, where M, N, f, and h are, respectively, the
number of resolution cells in the sensed image, the number of
L(EV)A's, the amount of core needed for a floating point number,
and the amount of core required for a (small) integer. For a
typical picture in the face experiments (M=1600, N=8, f=4, h=2),
these requirements work out to approximately 32K bytes of core and
100K bytes of disk storage.
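
The storage figures can be checked directly from the two expressions,
using the face-experiment values quoted above (M = 1600, N = 8, f = 4
bytes per floating point number, h = 2 bytes per small integer). The
variable names below are chosen for illustration only.

```python
# Parameters quoted for a typical picture in the face experiments.
M = 1600  # resolution cells in the sensed image
N = 8     # number of L(EV)A's
f = 4     # bytes per floating point number
h = 2     # bytes per (small) integer

core_bytes = M * (4 * f + 2 * h)   # 1600 * 20
disk_bytes = N * M * (f + 2 * h)   # 8 * 1600 * 8

print(core_bytes)  # 32000, i.e., about 32K bytes of core
print(disk_bytes)  # 102400, i.e., about 100K bytes of disk
```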
DISCUSSION
Many, though by no means all, visual objects can be described by
breaking down the object into a number of more "primitive parts,"
and by specifying an allowable range of spatial relations which
these "primitive parts" must satisfy for the object to be present.

As an example, suppose we want to describe a frontal view of a
standing person. This visual object could be decomposed into six
primitive pieces: a head, two arms, a torso, and two legs. For this
visual object to be present in an actual picture, it is required
that these six primitives occur (or at least that some significant
subset of them occurs), and also that they occur within a certain
spatial relationship one to the other; that is, the legs should be
next to each other, and below the torso; the torso should be
between and below the tops of the two arms; and the head should be
on top of the torso.
It may be noticed that in the previous two paragraphs, we
implicitly separated the local aspects from the global aspects of
the description; the local aspects are the primitive parts of the
picture, and the global aspects are the spatial relations between
these parts. While at first glance it does not seem unnatural to
make this separation, in practice there is a frequently encountered
difficulty, that is, the feedback between the local and the global.

To illustrate this difficulty, let us go back to the example of
the view of a standing person. Any method which detects torsos on a
local level (that is, a method which detects torsos without using
any knowledge of the positions of nearby arms and legs) might very
well detect several torsos in a picture. In fact, the actual or
"true" torso may be one of the weaker of the torsos detected by the
method; it may even happen that the true torso is not detected at
all. What does determine the position of the true torso is the
position of the true arms, legs, and head. But, unfortunately, the
reverse is also true. The positions of, say, the true arms depend
on the position of the true torso. Thus, the possible position of
each piece affects the possible position of each other piece,
making for a circular type of dependency.
It seems that whatever the visual object is, whenever we try to
separate the global and the local, the same circular dependency
occurs. Many times attempts to recognize visual objects described
in such a way involve alternating between local and global
analysis, heuristics, backup procedures, etc.

One conceptual way of avoiding this circularity is to evaluate
simultaneously a complete interpretation of the picture; e.g., in
the example given above, we would look at, and evaluate, complete
configurations of head, arms, legs, torso, etc. The best complete
interpretation could then be chosen. This approach, however,
requires that we make an infeasibly large number of evaluations. It
was just this computational problem that in the first place led to
the decomposition approach.

The implication of the above discussion is that, in general, we
cannot hope to decompose the global evaluation problem into a
number of smaller independent problems, but rather must use
something akin to the simultaneous evaluation, taking advantage of
any reduction in total variable interdependency to reduce the
required number of such evaluations.
In this paper, we accomplish this through the following
machinery. First, an embedding metric is presented which sets the
framework for evaluating how well any composition of primitive
picture pieces (parts of the decomposed picture) matches the
desired composite picture. Second, a sequential optimization
(dynamic programming-type) algorithm is developed which takes
advantage of the decomposition to reduce drastically the
computational requirements (our computational requirements grow
linearly with the size of the picture, rather than exponentially).
The contribution of this paper is the simultaneous offering of the
above two components and their suitability for application to a
wide class of pictorial objects.
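
The saving from sequential optimization can be illustrated on a
deliberately simplified case. The sketch below is not the algorithm of
this paper. It assumes a star-shaped model in which every component is
tied by a spring to a single root component only, and the names
`best_star_embedding` and `brute_force` are hypothetical. Once the root
location is fixed, each remaining component can be optimized on its own,
so P components over L candidate locations need on the order of
(P-1)L^2 evaluations instead of the L^P complete configurations an
exhaustive search would examine.

```python
import itertools
import math

def best_star_embedding(locations, local_cost, spring):
    """Minimize the sum of local costs and spring costs for a star-shaped model.

    locations  -- candidate positions, e.g. (row, column) tuples
    local_cost -- local_cost[k][x] is the cost of putting component k at x;
                  component 0 is the root
    spring     -- spring(k, root_x, x) is the cost of the spring joining
                  the root at root_x to component k at x
    """
    best_total, best_assignment = math.inf, None
    for root in locations:
        total = local_cost[0][root]
        assignment = [root]
        for k in range(1, len(local_cost)):
            # With the root fixed, component k no longer interacts with
            # the other components and is eliminated by a single minimization.
            cost, where = min(
                ((local_cost[k][x] + spring(k, root, x), x) for x in locations),
                key=lambda pair: pair[0],
            )
            total += cost
            assignment.append(where)
        if total < best_total:
            best_total, best_assignment = total, assignment
    return best_total, best_assignment

def brute_force(locations, local_cost, spring):
    """Reference answer: evaluate every complete configuration (L**P of them)."""
    best = math.inf
    for combo in itertools.product(locations, repeat=len(local_cost)):
        total = sum(local_cost[k][combo[k]] for k in range(len(combo)))
        total += sum(spring(k, combo[0], combo[k]) for k in range(1, len(combo)))
        best = min(best, total)
    return best
```

If each spring admits only a bounded box of displacements, the inner
minimization needs to visit only that box rather than every location,
which would make the work grow linearly with the size of the picture.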
In addition to the image-matching application, which was the
center of most of the development in this paper, we have also
attempted to establish the utility of the representational aspects
of the embedding metric for general picture description
applications.

The work presented here is a continuation of the investigation
described in [1] and [2], where Fischler uses sequential
optimization for matching two-dimensional scenes, introduces the
generic form of the embedding metric elaborated on here, and
presents the concepts of coherent segmentation, arbitrary
serialization, and sequential constraints. The relation of the
heuristic embedding problem to formal decision theory is also
discussed. The only other paper in which sequential optimization is
applied to a broad class of problems involving two-dimensional
scenes is that of Martelli and Montanari [4], who present a metric
and matched algorithm for smoothing pictures. Kovalewsky [5] and
Montanari [3] have applied dynamic programming to the detection of
(one-dimensional) line-like pictures. Reference [3] is an
outstanding paper, in which Montanari provides much insight into
the characteristics of sequential optimization. Both Kovalewsky [5]
and Montanari [3] comment on the representational aspects
associated with their optimization procedures. An embedding metric
conceptually very similar to the one given in [1], [2], and this
paper is discussed in a broad and interesting work by Bremermann¹⁰
[12] with respect to its potential use in character recognition,
speech recognition, and control of effectors (e.g., manipulators).
ACKNOWLEDGMENT
The authors wish to thank O. Firschein and J. Tenenbaum for
many constructive suggestions relative to the organization of this
paper.
10 Other relevant publications by Bremermann include [13] and [14].

REFERENCES
[1] M. A. Fischler, "The detection of scene congruence," Lockheed
Missiles & Space Company, Inc., Palo Alto, Calif., Rep.
6-83-71-2, Jan. 1971.
[2] M. A. Fischler, "Aspects of the detection of scene congruence,"
in Proc. 2nd Int. Joint Conf. Artificial Intelligence (Advance
Paper), Sept. 1971, pp. 88-100.
[3] U. Montanari, "On the optimal detection of curves in noisy
pictures," Commun. Ass. Comput. Mach., vol. 14, pp. 335-345,
May 1971.
[4] A. Martelli and U. Montanari, "Optimal smoothing in picture
processing," in Proc. IFIP Congr. Amsterdam, The Netherlands:
North-Holland, 1971.
[5] V. A. Kovalewsky, "Sequential optimization in pattern
recognition and pattern description," in Proc. IFIP Congr.
Amsterdam, The Netherlands: North-Holland, 1968, pp. 146-151.
[6] R. Bellman and S. Dreyfus, Applied Dynamic Programming.
Princeton, N. J.: Princeton Univ. Press, 1962.
[7] U. Bertele and F. Brioschi, "A new algorithm for the solution
of the secondary optimization problem in nonserial dynamic
programming," J. Math. Anal. Appl., vol. 27, no. 3, pp. 565-574,
1969.
[8] O. Firschein and M. A. Fischler, "Describing and abstracting
pictorial structures," Pattern Recognition, vol. 3, pp. 421-444,
Nov. 1971.
[9] A. Rosenfeld, Picture Processing by Computer. New York:
Academic, 1969.
[10] W. F. Miller and A. S. Shaw, "Linguistic methods in picture
processing - A survey," in 1968 Fall Joint Comput. Conf., AFIPS
Conf. Proc., vol. 33. Washington, D. C.: Thompson, 1968, pp.
279-290.
[11] M. D. Kelly, "Edge detection in pictures by computer using
planning," in Machine Intelligence 6, B. Meltzer and D. Michie,
Ed. New York: Elsevier, 1971, pp. 397-410.
[12] H. J. Bremermann, "Cybernetic functionals and fuzzy sets," in
Ann. Symp. Rec. 1971 IEEE Syst., Man Cybern. Group, Oct. 1971,
pp. 248-253.
[13] -, "Pattern recognition, functionals, and entropy," IEEE
Trans. Bio-Med. Eng., vol. BME-15, pp. 201-207, July 1968.
[14] -, "What mathematics can and cannot do for pattern
recognition," in Pattern Recognition in Biological and Technical
Systems, O. J. Grusser, Ed. Heidelberg, Germany: Springer, 1971,
pp. 31-45.
[15] W. W. Bledsoe, "The model method in facial recognition,"
Panoramic Research, Inc., Palo Alto, Calif., Rep. PRI:15,
Aug. 1966.
[16] -, "Man-machine facial recognition," Panoramic Research,
Inc., Palo Alto, Calif., Rep. PRI:22, Aug. 1966.
[17] A. J. Goldstein, L. D. Harmon, and A. B. Lesk, "Identification
of human faces," Proc. IEEE, vol. 59, pp. 748-760, May 1971.
Martin A. Fischler (S'57-M'58) was born in New York, N. Y., on
February 15, 1932. He received the B.E.E. degree from the City
College of New York, New York, in 1954 and the M.S. and Ph.D.
degrees in electrical engineering from Stanford University,
Stanford, Calif., in 1958 and 1962, respectively.

He served in the U. S. Army for two years and held positions at
the National Bureau of Standards and at Hughes Aircraft Corporation
during the period 1954 to 1958. In 1958 he joined the technical
staff of the Lockheed Missiles & Space Company, Inc., at the
Lockheed Palo Alto Research Laboratory, Palo Alto, Calif., and
currently holds the title of Staff Scientist. He has conducted
research and published in the areas of artificial intelligence,
picture processing, switching theory, computer organization, and
information theory.

Dr. Fischler is a member of the Association for Computing
Machinery, the Pattern Recognition Society, the Mathematical
Association of America, Tau Beta Pi, and Eta Kappa Nu. He is
currently an Associate Editor of the journal Pattern Recognition
and is a past Chairman of the San Francisco Chapter of the IEEE
Society on Systems, Man, and Cybernetics.

Robert A. Elschlager was born in Chicago, Ill., on May 25, 1943.
He received the B.S. degree in mathematics from the University of
Illinois, Urbana, in 1964, and the M.S. degree in mathematics from
the University of California, Berkeley, in 1969.

Since then he has been an Associate Scientist with the Lockheed
Missiles & Space Company, Inc., at the Lockheed Palo Alto Research
Center, Palo Alto, Calif. His current interests are picture
processing, operating systems, computer languages, and computer
understanding.

Mr. Elschlager is a member of the American Mathematical Society,
the Mathematical Association of America, and the Association for
Symbolic Logic.