Hierarchical Model-Based Motion Estimation

James R. Bergen, P. Anandan, Keith J. Hanna, and Rajesh Hingorani
David Sarnoff Research Center, Princeton NJ 08544, USA

Abstract. This paper describes a hierarchical estimation framework for the computation of diverse representations of motion information. The key features of the resulting framework (or family of algorithms) are a global model that constrains the overall structure of the motion estimated, a local model that is used in the estimation process, and a coarse-fine refinement strategy. Four specific motion models: affine flow, planar surface flow, rigid body motion, and general optical flow, are described along with their application to specific examples.

1 Introduction

A large body of work in computer vision over the last 10 or 15 years has been concerned with the extraction of motion information from image sequences. The motivation of this work is actually quite diverse, with intended applications ranging from data compression to pattern recognition (alignment strategies) to robotics and vehicle navigation. In tandem with this diversity of motivation is a diversity of representation of motion information: from optical flow, to affine or other parametric transformations, to 3-d egomotion plus range or other structure. The purpose of this paper is to describe a common framework within which all of these computations can be represented.

This unification is possible because all of these problems can be viewed from the perspective of image registration. That is, given an image sequence, compute a representation of motion that best aligns pixels in one frame of the sequence with those in the next. The differences among the various approaches mentioned above can then be expressed as different parametric representations of the alignment process. In all cases the function minimized is the same; the difference lies in the fact that it is minimized with respect to different parameters.
The key features of the resulting framework (or family of algorithms) are a global model that constrains the overall structure of the motion estimated, a local model that is used in the estimation process¹, and a coarse-fine refinement strategy. An example of a global model is the rigidity constraint; an example of a local model is that displacement is constant over a patch. Coarse-fine refinement or hierarchical estimation is included in this framework for reasons that go well beyond the conventional ones of computational efficiency. Its utility derives from the nature of the objective function common to the various motion models.

¹ Because this model will be used in a multiresolution data structure, it is "local" in a slightly unconventional sense that will be discussed below.

1.1 Hierarchical estimation

Hierarchical approaches have been used by various researchers (e.g., see [2, 10, 11, 22, 19]). More recently, a theoretical analysis of hierarchical motion estimation was described in
[8] and the advantages of using parametric models within such a framework have also been discussed in [5].
Arguments for use of hierarchical (i.e. pyramid based) estimation techniques for motion estimation have usually focused on issues of computational efficiency. A matching process that must accommodate large displacements can be very expensive to compute. Simple intuition suggests that if large displacements can be computed using low resolution image information great savings in computation will be achieved. Higher resolution information can then be used to improve the accuracy of displacement estimation by incrementally estimating small displacements (see, for example, [2]). However, it can also be argued that it is not only efficient to ignore high resolution image information when computing large displacements, in a sense it is necessary to do so. This is because of aliasing of high spatial frequency components undergoing large motion. Aliasing is the source of false matches in correspondence solutions or (equivalently) local minima in the objective function used for minimization. Minimization or matching in a multiresolution framework helps to eliminate problems of this type. Another way of expressing this is to say that many sources of non-convexity that complicate the matching process are not stable with respect to scale.
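The aliasing argument can be illustrated with a small one-dimensional experiment. The sketch below is not from the paper: the test signal, the circular shifts, and the 5-tap binomial blur (used as a stand-in for a coarse pyramid level) are all assumptions chosen for illustration. It evaluates the SSD between a shifted signal and candidate shifts of the reference, first at full resolution and then after low-pass filtering.

```python
import numpy as np

def ssd_curve(ref, moved, shifts):
    """SSD between `moved` and `ref` circularly shifted by each candidate shift."""
    return np.array([np.sum((moved - np.roll(ref, d)) ** 2) for d in shifts])

def count_local_minima(values):
    """Number of strict interior local minima of a 1-D sequence."""
    v = np.asarray(values)
    return int(np.sum((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])))

def binomial_blur(signal):
    """Circular 5-tap binomial low-pass filter (assumed stand-in for a coarse level)."""
    taps = np.array([1, 4, 6, 4, 1]) / 16.0
    return sum(w * np.roll(signal, k) for w, k in zip(taps, range(-2, 3)))

# Hypothetical test signal: one slow component plus one fast component
# whose period (4 samples) is shorter than the true displacement.
x = np.arange(64)
ref = np.sin(2 * np.pi * x / 32) + np.sin(2 * np.pi * x / 4)
true_shift = 6
moved = np.roll(ref, true_shift)
shifts = np.arange(-4, 17)

fine = ssd_curve(ref, moved, shifts)                                  # full resolution
coarse = ssd_curve(binomial_blur(ref), binomial_blur(moved), shifts)  # low-passed

print("local minima at full resolution:", count_local_minima(fine))
print("local minima after low-pass:   ", count_local_minima(coarse))
```

In this toy setting the high-frequency component produces several false minima spaced one period apart, while the low-passed objective has a single minimum at the true shift, which is the behavior the paragraph above describes.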
With only a few exceptions ([5, 9]), much of this work has concentrated on using a small family of "generic" motion models within the hierarchical estimation framework. Such models involve the use of some type of a smoothness constraint (sometimes allowing for discontinuities) to constrain the estimation process at image locations containing little or no image structure. However, as noted above, the arguments for use of a multiresolution, hierarchical approach apply equally to more structured models of image motion.
In this paper, we describe a variety of motion models used within the same hierarchical framework. These models provide powerful constraints on the estimation process and their use within the hierarchical estimation framework leads to increased accuracy, robustness and efficiency. We outline the implementation of four new models and present results using real images.
1.2 Motion Models
Because optical flow computation is an underconstrained problem, all motion estimation algorithms involve additional assumptions about the structure of the motion computed. In many cases, however, this assumption is not expressed explicitly as such, rather it is presented as a regularization term in an objective function [14, 16] or described primarily as a computational issue [18, 4, 2, 20].

Previous work involving explicitly model-based motion estimation includes direct methods [17, 21, 13] as well as methods for estimation under restricted conditions [7, 9]. The first class of methods uses a global egomotion constraint while those in the second class of methods rely on parametric motion models within local regions. The description "direct methods" actually applies equally to both types.
With respect to motion models, these algorithms can be divided into three categories: (i) fully parametric, (ii) quasi-parametric, and (iii) non-parametric. Fully parametric models describe the motion of individual pixels within a region in terms of a parametric form. These include affine and quadratic flow fields. Quasi-parametric models involve representing the motion of a pixel as a combination of a parametric component that is valid for the entire region and a local component which varies from pixel to pixel. For instance, the rigid motion model belongs to this class: the egomotion parameters constrain the local flow vector to lie along a specific line, while the local depth value determines the exact value of the flow vector at each pixel. By non-parametric models, we mean those such as are commonly used in optical flow computation, i.e. those involving the use of some type of a smoothness or uniformity constraint.
A parallel taxonomy of motion models can be constructed by considering local models that constrain the motion in the neighborhood of a pixel and global models that describe the motion over the entire visual field. This distinction becomes especially useful in analyzing hierarchical approaches where the meaning of "local" changes as the computation moves through the multiresolution hierarchy. In this scheme fully parametric models are global models, non-parametric models such as smoothness or uniformity of displacement are local models, and quasi-parametric models involve both a global and a local model.
The reason for describing motion models in this way is that it clarifies the relationship between different approaches and allows consideration of the range of possibilities in choosing a model appropriate to a given situation. Purely global (or fully parametric) models in essence trivially imply a local model so no choice is possible. However, in the case of quasi- or non-parametric models, the local model can be more or less complex. Also, it makes clear that by varying the size of local neighborhoods, it is possible to move continuously from a partially or purely local model to a purely global one.
The reasons for choosing one model or another are generally quite intuitive, though the exact choice of model is not always easy to make in a rigorous way. In general, parametric models constrain the local motion more strongly than the less parametric ones. A small number of parameters (e.g., six in the case of affine flow) are sufficient to completely specify the flow vector at every point within their region of applicability. However, they tend to be applicable only within local regions, and in many cases are approximations to the actual flow field within those regions (although they may be very good approximations). From the point of view of motion estimation, such models allow the precise estimation of motion at locations containing no image structure, provided the region contains at least a few locations with significant image structure.
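As a rough illustration of this point, the sketch below fits a six-parameter affine flow from gradient constraints at a handful of structured pixels and then evaluates the flow at an arbitrary location. It is a minimal sketch, not the paper's estimator: the parameter ordering (a1..a6), the per-pixel linearized constraint gx*u + gy*v = -delta_i, and the function names are assumptions made for this example.

```python
import numpy as np

def fit_affine_flow(x, y, gx, gy, delta_i):
    """Least-squares affine flow from per-pixel constraints gx*u + gy*v = -delta_i.

    Assumed parameterization (illustrative):
        u = a1 + a2*x + a3*y,   v = a4 + a5*x + a6*y
    """
    rows = np.stack([gx, gx * x, gx * y, gy, gy * x, gy * y], axis=1)
    params, *_ = np.linalg.lstsq(rows, -delta_i, rcond=None)
    return params

def affine_flow(params, x, y):
    """Evaluate the affine flow (u, v) at positions (x, y)."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y

# Synthetic check: 20 structured pixels with random gradients.
rng = np.random.default_rng(0)
true_params = np.array([0.5, 0.01, -0.02, -0.3, 0.03, 0.02])
x, y = rng.uniform(-10, 10, 20), rng.uniform(-10, 10, 20)
gx, gy = rng.normal(size=20), rng.normal(size=20)
u, v = affine_flow(true_params, x, y)
delta_i = -(gx * u + gy * v)          # residual implied by the true flow

estimated = fit_affine_flow(x, y, gx, gy, delta_i)
# The fitted model yields a flow vector anywhere in the region,
# including positions that contributed no gradient information.
u_flat, v_flat = affine_flow(estimated, 3.0, -4.0)
```

Because the six parameters are shared by the whole region, a few well-structured pixels are enough to determine the flow everywhere the model applies.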
Quasi-parametric models constrain the flow field less, but nevertheless constrain it to some degree. For instance, for rigidly moving objects under perspective projection, the rigid motion parameters (same as the egomotion parameters in the case of observer motion), constrain the flow vector at each point to lie along a line in the velocity space. One-dimensional image structure (e.g., an edge) is generally sufficient to precisely estimate the motion of that point. These models tend to be applicable over a wide region in the image, perhaps even the entire image. If the local structure of the scene can be further parametrized (e.g., planar surfaces under rigid motion), the model becomes fully parametric within the region.
Non-parametric models require local image structure that is two-dimensional (e.g., corner points, textured areas). However, with the use of a smoothness constraint it is usually possible to "fill-in" where there is inadequate local information. The estimation process is typically more computationally expensive than the other two cases. These models are more generally applicable (not requiring parametrizable scene structure or motion) than the other two classes.
1.3 Paper Organization
The remainder of the paper consists of an overview of the hierarchical motion estimation framework, a description of each of the four models and their application to specific examples, and a discussion of the overall approach and its applications.
Figure 1 describes the hierarchical motion estimation framework. The basic components of this framework are: (i) pyramid construction, (ii) motion estimation, (iii) image warping, and (iv) coarse-to-fine refinement.
There are a number of ways to construct the image pyramids. Our implementation uses the Laplacian pyramid described in [6], which involves simple local computations and provides the necessary spatial-frequency decomposition.
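For readers unfamiliar with the construction, a Laplacian pyramid can be sketched in a few lines. The version below is only an illustration under stated assumptions: the 5-tap binomial kernel, the edge-replicating border handling, and all function names are choices made here, not details taken from the paper or from [6].

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1]) / 16.0   # assumed 5-tap binomial kernel

def blur(image):
    """Separable 5-tap blur with edge replication."""
    height, width = image.shape
    padded = np.pad(image, 2, mode="edge")
    rows = sum(w * padded[:, k:k + width] for k, w in enumerate(KERNEL))
    return sum(w * rows[k:k + height, :] for k, w in enumerate(KERNEL))

def reduce_level(image):
    """Blur, then keep every second pixel in each direction."""
    return blur(image)[::2, ::2]

def expand_level(image, shape):
    """Upsample to `shape` by zero insertion followed by blurring."""
    up = np.zeros(shape)
    up[::2, ::2] = image
    return 4.0 * blur(up)

def laplacian_pyramid(image, levels):
    """Band-pass levels from fine to coarse, plus a final low-pass residual."""
    pyramid, current = [], image.astype(float)
    for _ in range(levels):
        smaller = reduce_level(current)
        pyramid.append(current - expand_level(smaller, current.shape))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by expanding and adding back each band."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = band + expand_level(current, band.shape)
    return current
```

Each level only needs local 5 x 5 neighborhoods, and the levels together split the image into spatial-frequency bands; `reconstruct` recovers the input up to floating-point error because every band stores exactly what the next coarser level cannot represent.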
The motion estimator varies according to the model. In all cases, however, the estimation process involves SSD minimization, but instead of performing a discrete search (such as in [1]), Gauss-Newton minimization is employed in a refinement process. The basic assumption behind SSD minimization is intensity constancy, as applied to the Laplacian pyramid images. Thus,
$$ I(\mathbf{x}, t) = I(\mathbf{x} - \mathbf{u}(\mathbf{x}),\, t - 1) $$
where x = (x, y) denotes the spatial image position of a point, I the (Laplacian pyramid) image intensity, and u(x) = (u(x, y), v(x, y)) denotes the image velocity at that point.
The SSD error measure for estimating the flow field within a region is:
$$ E(\{\mathbf{u}\}) = \sum_{\mathbf{x}} \bigl( I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{u}(\mathbf{x}),\, t - 1) \bigr)^2 $$
where the sum is computed over all the points within the region and {u} is used to denote the entire flow field within that region. In general this error (which is actually the sum of individual errors) is not quadratic in terms of the unknown quantities {u}, because of the complex pattern of intensity variations. Hence, we typically have a non-linear minimization problem at hand.
Note that the basic structure of the problem is independent of the choice of a motion model. The model is in essence a statement about the function u(x). To make this explicit, we can write,
$$ \mathbf{u}(\mathbf{x}) = \mathbf{u}(\mathbf{x}; \mathbf{p}_m), \tag{2} $$
where p_m is a vector representing the model parameters.
A standard numerical approach for solving such a problem is to apply Newton's method. However, for errors which are sums of squares a good approximation to Newton's method is the Gauss-Newton method, which uses a first order expansion of the individual error quantities before squaring. If {u}_i is the current estimate of the flow field during the ith iteration, the incremental estimate {δu} can be obtained by minimizing the quadratic error measure
$$ E(\{\delta\mathbf{u}\}) = \sum_{\mathbf{x}} \bigl( \Delta I + \nabla I \cdot \delta\mathbf{u}(\mathbf{x}) \bigr)^2, \tag{3} $$
where
$$ \Delta I(\mathbf{x}) = I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{u}_i(\mathbf{x}),\, t - 1), $$
that is the difference between the two images at corresponding pixels, after taking the current estimate into account.
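To make the refinement loop concrete, the following sketch combines warping, Gauss-Newton updates, and coarse-to-fine propagation for the simplest possible model: a single global translation of a one-dimensional signal. Everything specific here is an assumption for illustration (the 1-D setting, the plain low-pass pyramid used instead of a Laplacian pyramid, linear interpolation for warping, five iterations per level, and the function names); it is not the paper's implementation.

```python
import numpy as np

def reduce_1d(signal):
    """Blur with a 5-tap binomial kernel and subsample by two."""
    kernel = np.array([1, 4, 6, 4, 1]) / 16.0
    return np.convolve(signal, kernel, mode="same")[::2]

def refine_shift(prev, curr, u, iterations=5):
    """Gauss-Newton refinement of a single global shift u.

    Each iteration warps `prev` by the current estimate, forms the residual
    delta_i = curr - warped, and minimizes sum((delta_i + grad * du)^2) over du.
    """
    x = np.arange(prev.size, dtype=float)
    for _ in range(iterations):
        warped = np.interp(x - u, x, prev)     # I(x - u_i, t - 1)
        delta_i = curr - warped                # residual after warping
        grad = np.gradient(warped)             # spatial gradient
        u -= np.sum(grad * delta_i) / np.sum(grad ** 2)
    return u

def coarse_to_fine_shift(prev, curr, levels=3):
    """Estimate the shift at the coarsest level, then refine level by level."""
    pyr_prev, pyr_curr = [prev], [curr]
    for _ in range(levels):
        pyr_prev.append(reduce_1d(pyr_prev[-1]))
        pyr_curr.append(reduce_1d(pyr_curr[-1]))
    u = 0.0
    for level in range(levels, -1, -1):
        u = refine_shift(pyr_prev[level], pyr_curr[level], u)
        if level > 0:
            u *= 2.0                           # displacement doubles at the next finer level
    return u

# Hypothetical test signal: two smooth bumps displaced by 12 samples.
x = np.arange(256, dtype=float)

def bumps(shift):
    return (np.exp(-(x - 100 - shift) ** 2 / 200.0)
            + 0.5 * np.exp(-(x - 170 - shift) ** 2 / 32.0))

estimate = coarse_to_fine_shift(bumps(0.0), bumps(12.0))
```

At the coarsest of the three reduced levels the 12-sample displacement is only 1.5 samples, so the first-order expansion is reasonable there; each finer level then only has to correct a small residual.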
As such, the minimization problem described in Equation 3 is underconstrained. The different motion models constrain the flow field in different ways. When these are used to describe the flow field, the estimation problem can be reformulated in terms of the unknown (incremental) model parameters. The details of these reformulations are described in the sections on the individual motion models.
This flow field is quadratic in x and can be written also as
$$ \begin{aligned} u(\mathbf{x}) &= a_1 + a_2 x + a_3 y + a_7 x^2 + a_8 x y \\ v(\mathbf{x}) &= a_4 + a_5 x + a_6 y + a_7 x y + a_8 y^2 \end{aligned} \tag{11} $$
where the 8 coefficients (a_1, ..., a_8) are functions of the motion parameters (t, ω) and the surface parameters k. Since this 8-parameter form is rather well-known (e.g., see [15]) we omit its details.
If the egomotion parameters are known, then the three-parameter vector k can be used to represent the motion of the planar surface. Otherwise the 8-parameter representation can be used. In either case, the flow field is linear in the unknown parameters.
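The linearity of the 8-parameter form can be checked directly. The sketch below evaluates the quadratic flow field and also builds the basis rows that map the coefficient vector (a1, ..., a8) to the flow at each point. The function names are hypothetical, and the coefficient layout follows the usual 8-parameter convention in which a7 and a8 are shared between the two components.

```python
import numpy as np

def quadratic_flow(coeffs, x, y):
    """Evaluate the 8-parameter quadratic flow (u, v) at positions (x, y)."""
    a1, a2, a3, a4, a5, a6, a7, a8 = coeffs
    u = a1 + a2 * x + a3 * y + a7 * x ** 2 + a8 * x * y
    v = a4 + a5 * x + a6 * y + a7 * x * y + a8 * y ** 2
    return u, v

def quadratic_flow_basis(x, y):
    """Rows U, V such that u = U @ coeffs and v = V @ coeffs at each point."""
    x = np.ravel(np.asarray(x, dtype=float))
    y = np.ravel(np.asarray(y, dtype=float))
    one, zero = np.ones_like(x), np.zeros_like(x)
    u_rows = np.stack([one, x, y, zero, zero, zero, x * x, x * y], axis=1)
    v_rows = np.stack([zero, zero, zero, one, x, y, x * y, y * y], axis=1)
    return u_rows, v_rows

# Example grid and two arbitrary coefficient vectors.
x, y = np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(-1.0, 1.0, 5))
c1 = np.arange(1.0, 9.0)
c2 = np.full(8, 0.5)
u_rows, v_rows = quadratic_flow_basis(x, y)
```

Because the flow is a matrix product of fixed basis rows with the coefficient vector, substituting it into the quadratic error measure yields an ordinary linear least-squares problem in the coefficients.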
The problem of estimating planar surface motion has been extensively studied before [21, 1, 23]. In particular, Negahdaripour and Horn [21] suggest iterative methods for estimating the motion and the surface parameters, as well as a method of estimating the 8 parameters and then decomposing them into the five rigid motion parameters and the three surface parameters in closed form. Besides the embedding of these computations within the hierarchical estimation framework, we also take a slightly different approach to the problem.
We assume that the rigid motion parameters are already known or can be estimated (e.g., see Section 3.3 below). Then, the problem reduces to that of estimating the three surface parameters k. There are several practical reasons to prefer this approach: First, in many situations the rigid motion model may be more globally applicable than the planar surface model, and can be estimated using information from all the surfaces undergoing the same rigid motion. Second, unless the region of interest subtends a significant field of view, the second order components of the flow field will be small, and hence the estimation of the eight parameters will be inaccurate and the process may be unstable. On the other hand, the information concerning the three parameters k is contained in the first order components of the flow field, and (if the rigid motion parameters are known) their estimation will be more accurate and stable.
The Estimation Algorithm: Let k_i denote the current estimate of the surface parameters, and let t and ω denote the motion parameters. These parameters are used to construct an initial flow field that is used in the warping step. The residual information is then used to determine an incremental estimate δk.
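To refine the local models, we assume that 1/Z(x) is constant over 5 x 5 image patches centered on each image pixel. We then algebraically solve for this Z both in order to estimate its current value, and to eliminate it from the global error measure. Consider the local component of the error measure,
$$ E_{local} = \sum_{5 \times 5} E(\mathbf{t}, \boldsymbol{\omega}, 1/Z(\mathbf{x})). $$
Differentiating Equation 17 with respect to 1/Z(x) and setting the result to zero, we get
$$ 1/Z(\mathbf{x}) = \frac{-\sum_{5 \times 5} (\nabla I)^T A \mathbf{t} \,\bigl( \Delta I - (\nabla I)^T A \mathbf{t}_i / Z_i(\mathbf{x}) + (\nabla I)^T B \boldsymbol{\omega} - (\nabla I)^T B \boldsymbol{\omega}_i \bigr)}{\sum_{5 \times 5} \bigl( (\nabla I)^T A \mathbf{t} \bigr)^2}. \tag{19} $$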
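The closed-form solution for 1/Z(x) has the structure of a scalar least-squares problem solved independently in every 5 x 5 patch. The sketch below shows only that structure. It assumes two precomputed per-pixel arrays that are not named in the paper: `a`, the coefficient multiplying 1/Z(x) in the linearized error, and `c`, the sum of all remaining terms. The function names and the small `eps` guard against empty patches are also assumptions.

```python
import numpy as np

def box_sum(values, size=5):
    """Sum of `values` over a size x size window centered on each pixel (zero padded)."""
    radius = size // 2
    padded = np.pad(values, radius, mode="constant")
    height, width = values.shape
    return sum(
        padded[i:i + height, j:j + width]
        for i in range(size)
        for j in range(size)
    )

def local_inverse_depth(a, c, eps=1e-12):
    """Minimize sum over each window of (c + a * s)^2 with respect to s = 1/Z.

    Setting the derivative to zero gives s = -sum(a * c) / sum(a * a).
    """
    numerator = box_sum(a * c)
    denominator = box_sum(a * a)
    return -numerator / np.maximum(denominator, eps)

# Synthetic check: residuals generated by a constant inverse depth of 0.3.
rng = np.random.default_rng(1)
a = rng.normal(size=(12, 12))
c = -0.3 * a
recovered = local_inverse_depth(a, c)
```

The denominator is small wherever the patch contains little gradient energy along the direction induced by the translation, which is one way to see why local range estimates are unreliable in unstructured regions.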
To refine the global model, we minimize the error in Equation 17 summed over the entire image:
$$ E_{global} = \sum_{Image} E(\mathbf{t}, \boldsymbol{\omega}, 1/Z(\mathbf{x})). \tag{20} $$
We insert the expression for 1/Z(x) given in Equation 19 (not the current numerical value of the local parameter) into Equation 20. The result is an expression for E_global that is non-quadratic in t but quadratic in ω. We recover refined estimates of t and ω by performing one Gauss-Newton minimization step using the previous estimates of the global parameters, t_i and ω_i, as starting values. Expressions are evaluated numerically at t = t_i and ω = ω_i.
We then repeat the estimation algorithm several times at each image resolution.
Experiments with the rigid body motion model: We have chosen an outdoor scene to demonstrate the rigid body motion model. Figure 4a shows one of the input images, and Figure 4b shows the difference between the two input images. The algorithm was performed beginning at level 3 (subsampled by a factor of 8) of a Laplacian pyramid. The local surface parameters 1/Z(x) were all initialized to zero, and the rigid-body motion parameters were initialized to t_0 = (0, 0, 1)^T and ω = (0, 0, 0)^T. The model parameters were refined 10 times at each image resolution. Figure 4c shows the difference image between the second image and the first image after being warped using the final estimates of the rigid-body motion parameters and the local surface parameters. Figure 4d shows an image of the recovered local surface parameters 1/Z(x) such that bright points are nearer the camera than dark points. The recovered inverse ranges are plausible almost everywhere, except at the image border and near the recovered focus of expansion. The bright dot at the bottom right hand side of the inverse range map corresponds to a leaf in the original image that is blowing across the ground towards the camera. Figure 4e shows a table of rigid-body motion parameters that were recovered at the end of each resolution of analysis.
More experimental results and a detailed discussion of the algorithm's performance on various types of scenes can be found in [12].
3.4 General Flow Fields
The Model: Unconstrained general flow fields are typically not described by any global parametric model. Different local models have been used to facilitate the estimation process, including constant flow within a local window and locally smooth or continuous flow. The former facilitates direct local estimation [18, 20], whereas the latter model requires iterative relaxation techniques [16]. It is also not uncommon to use the combination of these two types of local models (e.g., [3, 10]).

The local model chosen here is constant flow within 5 x 5 pixel windows at each level of the pyramid. This is the same model as used by Lucas and Kanade [18] but here it is embedded as a local model within the hierarchical estimation framework.
The Estimation Algorithm: Assume that we have an approximate flow field from previous levels (or previous iterations at the same level). Assuming that the incremental flow vector δu is constant within the 5 x 5 window, Equation 3 can be written as
$$ E(\delta\mathbf{u}) = \sum_{\mathbf{x}} \bigl( \Delta I + \nabla I^T \delta\mathbf{u} \bigr)^2, \tag{21} $$
where the sum is taken within the 5 x 5 window. Minimizing this error with respect to δu leads to the equation,
$$ \Bigl[ \sum (\nabla I)(\nabla I)^T \Bigr] \delta\mathbf{u} = -\sum \nabla I \, \Delta I. \tag{22} $$
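The normal equations for one window amount to a 2 x 2 linear solve. The sketch below is a minimal illustration for a single window whose gradients and residuals are already available; the function name and argument layout are assumptions, and the solve presumes the matrix is non-singular.

```python
import numpy as np

def window_flow_increment(gx, gy, delta_i):
    """Solve [sum grad grad^T] du = -sum grad * delta_i for one window.

    gx, gy  : spatial gradient components at the window's pixels
    delta_i : residual image differences at the same pixels
    """
    gx, gy, delta_i = (np.ravel(v) for v in (gx, gy, delta_i))
    grads = np.stack([gx, gy], axis=1)       # one gradient vector per row
    lhs = grads.T @ grads                    # 2 x 2 sum of outer products
    rhs = -grads.T @ delta_i                 # 2-vector
    return np.linalg.solve(lhs, rhs)

# Synthetic 5 x 5 window whose residuals are consistent with a known increment.
rng = np.random.default_rng(2)
gx, gy = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
true_du = np.array([0.2, -0.1])
delta_i = -(gx * true_du[0] + gy * true_du[1])
du = window_flow_increment(gx, gy, delta_i)
```

In a full estimator this solve would be repeated for the window around every pixel, using residuals computed after warping with the current flow estimate.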
We make some observations concerning the singularities of this relationship. If the summing window consists of a single element, the 2 x 2 matrix on the left-hand-side is an outer product of a 2 x 1 vector and hence has a rank of at most unity. In our case, when the summing window consists of 25 points, the rank of the matrix on the left-hand-side will be two unless the directions of the gradient vectors ∇I everywhere within the window coincide. This situation is the general case of the aperture effect.
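These rank statements are easy to verify numerically. The sketch below builds the 2 x 2 matrix for three hypothetical windows (a single pixel, 25 pixels with parallel gradients, and 25 pixels with varied gradient directions) and reports its rank; the gradient values are made up for illustration.

```python
import numpy as np

def gradient_matrix(gx, gy):
    """2 x 2 sum of gradient outer products over a window."""
    gx, gy = np.ravel(gx), np.ravel(gy)
    return np.array([
        [np.sum(gx * gx), np.sum(gx * gy)],
        [np.sum(gx * gy), np.sum(gy * gy)],
    ])

rng = np.random.default_rng(3)

# Single pixel: the matrix is one outer product.
rank_single = np.linalg.matrix_rank(gradient_matrix(np.array([3.0]), np.array([4.0])))

# 25 pixels whose gradients all point along the direction (2, 1): a straight edge.
strength = rng.normal(size=25)
rank_parallel = np.linalg.matrix_rank(gradient_matrix(2.0 * strength, strength))

# 25 pixels with varied gradient directions: a corner or textured patch.
rank_diverse = np.linalg.matrix_rank(
    gradient_matrix(rng.normal(size=25), rng.normal(size=25))
)

print(rank_single, rank_parallel, rank_diverse)
```

When the rank is one, only the flow component along the common gradient direction is determined, which is the aperture effect in its window-based form.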
In our implementation of this technique, the flow estimate at each point is obtained by using a 5 x 5 window centered around that point. This amounts to assuming implicitly that the flow field varies smoothly over the image.
Experiments with the general flow model: We demonstrate the general flow algorithm on an image sequence containing several independently moving objects, a case for which the other motion models described here are not applicable. Figure 5a shows one image of the original sequence. Figure 5b shows the difference between the two frames that were used to compute image flow. Figure 5c shows little difference between the compensated image and the other original image. Figure 5d shows the horizontal component of the computed flow field, and Figure 5e shows the vertical component. In local image regions where image structure is well-defined, and where the local image motion is simple, the recovered motion estimates appear plausible. Errors predictably occur however at motion boundaries. Errors also occur in image regions where the local image structure is not well-defined (like some parts of the road), but for the same reason, such errors do not appear as intensity errors in the compensated difference image.
Thus far, we have described a hierarchical framework for the estimation of image motion between two images using various models. Our motivation was to generalize the notion of direct estimation to model-based estimation and unify a diverse set of model-based estimation algorithms into a single framework. The framework also supports the combined use of parametric global models and local models which typically represent some type of a smoothness or local uniformity assumption.
One of the unifying aspects of the framework is that the same objective function (SSD) is used for all models, but the minimization is performed with respect to different parameters. As noted in the introduction, this is enabled by viewing all these problems from the perspective of image registration.
It is interesting to contrast this perspective (of model-based image registration) with some of the more traditional approaches to motion analysis. One such approach is to compute image flow fields, which involves combining the local brightness constraint with some sort of a global smoothness assumption, and then interpret them using appropriate motion models. In contrast, the approach taken here is to use the motion models to constrain the flow field computation. The obvious benefit of this is that the resulting flow fields may generally be expected to be more consistent with models than general smooth flow fields. Note, however, that the framework also includes general smooth flow field techniques, which can be used if the motion model is unknown.
In the case of models that are not fully parametric, local image information is used to determine local image/scene properties (e.g., the local range value). However, the accuracy of these can only be as good as the available local image information. For example, in homogeneous areas of the scene, it may be possible to achieve perfect registration even if the surface range estimates (and the corresponding local flow vectors) are incorrect. However, in the presence of significant image structures, these local estimates may be expected to be accurate. On the other hand, the accuracy of the global parameters (e.g., the rigid motion parameters) depends only on having sufficient and sufficiently diverse local information across the entire region. Hence, it may be possible to obtain reliable estimates of these global parameters, even though estimated local information may not be reliable everywhere within the region. For fully parametric models, this problem does not exist.
The image registration problem addressed in this paper occurs in a wide range of image processing applications, far beyond the usual ones considered in computer vision (e.g., navigation and image understanding). These include image compression via motion compensated encoding, spatiotemporal analysis of remote sensing type of images, image database indexing and retrieval, and possibly object recognition. One way to state this general problem is as that of recovering the coordinate system that relates two images of a scene taken from two different viewpoints. In this sense, the framework proposed here unifies motion analysis across these different applications as well.
Acknowledgements: Many individuals have contributed to the ideas and results presented here. These include Peter Burt and Leonid Oliker from the David Sarnoff Research Center, and Shmuel Peleg from Hebrew University.