
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION

Vol. 9, No. 1, March 1998, pp. 38–50. Article No. VC970370

Subpixel Motion Estimation for Super-Resolution Image Sequence Enhancement*

Richard R. Schultz

Department of Electrical Engineering, University of North Dakota, P.O. Box 7165, Grand Forks, North Dakota 58202-7165. E-mail: [email protected]

Li Meng

Digital Video Products, LSI Logic Corporation, 1525 McCarthy Boulevard, Milpitas, California 95035. E-mail: [email protected]

and

Robert L. Stevenson

Laboratory for Image and Signal Analysis, Department of Electrical Engineering, University of Notre Dame, Notre Dame, Indiana 46556. E-mail: [email protected]

Received June 17, 1997; accepted December 31, 1997

Super-resolution enhancement algorithms are used to estimate a high-resolution video still (HRVS) from several low-resolution frames, provided that objects within the digital image sequence move with subpixel increments. A Bayesian multiframe enhancement algorithm is presented to compute an HRVS using the spatial information present within each frame as well as the temporal information present due to object motion between frames. However, the required subpixel-resolution motion vectors must be estimated from low-resolution and noisy video frames, resulting in an inaccurate motion field which can adversely impact the quality of the enhanced image. Several subpixel motion estimation techniques are incorporated into the Bayesian multiframe enhancement algorithm to determine their efficacy in the presence of global data transformations between frames (i.e., camera pan, rotation, tilt, and zoom) and independent object motion. Visual and quantitative comparisons of the resulting high-resolution video stills computed from two video frames and the corresponding estimated motion fields show that the eight-parameter projective motion model is appropriate for global scene changes, while block matching and Horn–Schunck optical flow estimation each have their own advantages and disadvantages when used to estimate independent object motion. © 1998 Academic Press

* This research was supported in part by the National Science Foundation Faculty Early Career Development (CAREER) Program, Grant MIP-9624849. In addition, this material is based upon work supported in part by the U.S. Army Research Office under Contract DAAH04-96-1-0449. Research was presented in part at Visual Communications and Image Processing '97 (San Jose, CA), February 1997.

1047-3203/98 $25.00. Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.

1. INTRODUCTION

Super-resolution enhancement techniques may be used to estimate a high-resolution still from several low-resolution video frames, provided that objects within the image sequence move with subpixel increments [1]. To properly incorporate temporal correlations into the multiframe observation model, high-quality subpixel motion vectors must be estimated between video frames [2–4]. Methods for attaining subpixel accuracy generally employ an interpolation of the image sequence frames, followed by the application of a parametric or nonparametric motion estimation scheme. The accuracy of the estimated motion fields has a direct influence on the quality of the high-resolution video still (HRVS) image.

The concept of multiframe enhancement was originally introduced by Tsai and Huang [5], in which an observation model was defined for a sequence consisting of subpixel shifts of the same scene. Stark and Oskoui formulated a projection onto convex sets (POCS) algorithm to compute an estimate from observations obtained by scanning or rotating an image with respect to the image acquisition sensor array. Tekalp et al. [6] first extended this POCS formulation to include sensor noise, and then Patti [7] incorporated interlaced frames and other video sampling patterns into this algorithm. Bayesian methods of image sequence frame integration were introduced by Cheeseman [8] and Schultz and Stevenson [1]. The multiframe
resolution enhancement algorithm proposed by Schultz and Stevenson [1] uses a Bayesian maximum a posteriori (MAP) estimation technique which incorporates a discontinuity-preserving prior to integrate progressively scanned video frames. A more general video observation model was also defined for the motion-compensated scan conversion of interlaced image sequences [9]. This research extends the Bayesian HRVS algorithm by comparing several motion estimation techniques, including the estimation of the eight-parameter projective model coefficients [10], conventional block matching motion estimation [11, 12], and Horn–Schunck optical flow estimation [13], to determine the efficacy of these methods with respect to the quality of the corresponding multiframe enhanced images.

The paper is organized as follows. Section 2 describes the Bayesian multiframe resolution enhancement technique which will be used in the simulations. An overview of three motion estimation techniques is provided in Section 3. The detection and elimination of inaccurate motion vectors may help improve the quality of localized regions within a high-resolution video still, and an algorithm used for this purpose is also proposed in Section 3. Simulation results are presented for two image sequences in Section 4, including a synthetic image sequence with a known subpixel camera pan between frames and an actual image sequence containing objects which move independently. A brief conclusion and future research directions are provided in Section 5.

2. BAYESIAN MULTIFRAME RESOLUTION ENHANCEMENT

A short low-resolution digital image sequence consisting of M frames is defined as

    \{ y^{(l)} \} \quad \text{for } l = k - \lfloor (M-1)/2 \rfloor, \ldots, k-1, k, k+1, \ldots, k + \lceil (M-1)/2 \rceil,    (1)

where y^{(k)} is the reference frame situated at or near the center of the sequence. In this expression, \lfloor x \rfloor denotes the largest integer less than or equal to x, and \lceil x \rceil represents the smallest integer greater than or equal to x. The goal of single frame image enhancement techniques, most notably bilinear [14], cubic B-spline [15], and Bayesian MAP interpolation [16], is to estimate the unknown high-resolution image z^{(k)} from the low-resolution reference frame y^{(k)}. However, if objects in the short image sequence move with subpixel increments from frame-to-frame, this additional temporal information can be utilized to improve the quality of the high-resolution estimate of z^{(k)}. In this section, a motion-compensated subsampling model is described, along with the resulting Bayesian multiframe resolution enhancement algorithm.

2.1. Image Sequence Observation Model

The subsampling model which maps q \times q high-resolution pixels within frame k, denoted as z^{(k)}_{r,s}, into a single low-resolution pixel y^{(k)}_{\mathbf{x}} at spatial location \mathbf{x} = (x_1, x_2) is given as

    y^{(k)}_{\mathbf{x}} = \frac{1}{q^2} \sum_{r = q x_1 - q + 1}^{q x_1} \sum_{s = q x_2 - q + 1}^{q x_2} z^{(k)}_{r,s}    (2)

for x_1 = 1, \ldots, N_1 and x_2 = 1, \ldots, N_2. This represents the integration of light over a single low-resolution charge-coupled device (CCD) image acquisition sensor containing q \times q high-resolution sensors within its spatial area. In matrix–vector notation, this can be expressed as

    y^{(k)} = A^{(k,k)} z^{(k)},    (3)

where A^{(k,k)} \in \mathbb{R}^{N_1 N_2 \times q^2 N_1 N_2} is referred to as the subsampling matrix. Intuitively, each row of A^{(k,k)} maps a square block of q \times q high-resolution samples into a single low-resolution pixel.

If the motion vectors are known exactly, the kth frame after compensation should be identical to the lth frame, such that

    y^{(l)}_{\mathbf{x}} = y^{(k)}_{x_1 - v(\mathbf{x}),\, x_2 - h(\mathbf{x})},    (4)

where the vertical and horizontal displacement vector components at location \mathbf{x} = (x_1, x_2) are denoted as v(\mathbf{x}) and h(\mathbf{x}), respectively. These displacement values may be fractional to represent subpixel-resolution motion between frames. Assuming that all motion vectors between frames k and l are known exactly and that purely translational motion can be used to represent the movement of objects between the two image sequence frames, the motion-compensated subsampling model is given as

    y^{(l)}_{\mathbf{x}} = y^{(k)}_{x_1 - v(\mathbf{x}),\, x_2 - h(\mathbf{x})} = \frac{1}{q^2} \sum_{r = q x_1 - q + 1}^{q x_1} \sum_{s = q x_2 - q + 1}^{q x_2} z^{(k)}_{r - q v(\mathbf{x}),\, s - q h(\mathbf{x})}.    (5)

Therefore, the overall high-resolution to low-resolution mapping becomes

    y^{(l)} = A^{(l,k)} z^{(k)}.    (6)

A^{(l,k)} has a structure similar to A^{(k,k)}, but with the summations over shifted sets of pixels. Since the rows of A^{(l,k)} which contain useful information are those for which elements of y^{(l)} are observed entirely from motion-compensated elements of z^{(k)}, low-resolution pixels which are not completely observable must be detected so that the corresponding rows of A^{(l,k)} can be deleted in the construction of the reduced matrix A'^{(l,k)}. Practically, the motion vectors must be estimated using one of the techniques to be presented in Section 3.

A practical video observation model useful for real image sequences is given as

    y'^{(l)} = A'^{(l,k)} z^{(k)} + n^{(l,k)},    (7)

where A'^{(l,k)} is an estimate of the motion-compensated subsampling matrix which both subsamples the high-resolution frame and takes into account the motion between frames, and n^{(l,k)} is an additive noise term representing the error in estimating A'^{(l,k)} from y^{(k)} and y^{(l)}. Additive noise in this case is assumed to be Gaussian-distributed. The notation y'^{(l)} denotes only those pixels within y^{(l)} which are observable entirely from motion-compensated elements of z^{(k)}.

2.2. High-Resolution Video Stills (HRVS) Algorithm

To estimate a high-resolution video still from a low-resolution image sequence, a Bayesian MAP estimation technique [1] is employed. The Bayesian estimate computed from M low-resolution frames is given as

    \hat{z}^{(k)} = \arg\max_{z^{(k)}} \log \Pr(z^{(k)} \mid y'^{(k-(M-1)/2)}, \ldots, y'^{(k-1)}, y^{(k)}, y'^{(k+1)}, \ldots, y'^{(k+(M-1)/2)}).    (8)

Applying Bayes' theorem to the posterior probability results in the following optimization problem:

    \hat{z}^{(k)} = \arg\max_{z^{(k)}} \{ \log \Pr(z^{(k)}) + \log \Pr(y'^{(k-(M-1)/2)}, \ldots, y'^{(k-1)}, y^{(k)}, y'^{(k+1)}, \ldots, y'^{(k+(M-1)/2)} \mid z^{(k)}) \}.    (9)

In this expression, \Pr(z^{(k)}) represents the prior image density, and the conditional density models the error in estimating the motion vectors used to construct the motion-compensated subsampling matrices A^{(l,k)}. The motion estimation error is assumed to be independent between frames, so that the joint conditional density may be written as

    \Pr(y'^{(k-(M-1)/2)}, \ldots, y'^{(k-1)}, y^{(k)}, y'^{(k+1)}, \ldots, y'^{(k+(M-1)/2)} \mid z^{(k)}) = \Pr(y^{(k)} \mid z^{(k)}) \times \prod_{\substack{l = k-(M-1)/2 \\ l \ne k}}^{k+(M-1)/2} \Pr(y'^{(l)} \mid z^{(k)}).    (10)

Since the signal z^{(k)} and the noise n^{(l,k)} are statistically uncorrelated, each of the conditional densities for l \ne k is an independent noise probability density function:

    \Pr(y'^{(l)} \mid z^{(k)}) = \Pr(A'^{(l,k)} z^{(k)} + n^{(l,k)} \mid z^{(k)}) = \Pr(n^{(l,k)} \mid z^{(k)}) = \Pr(n^{(l,k)}).    (11)

Thus, the optimization problem in (9) becomes

    \hat{z}^{(k)} = \arg\max_{z^{(k)}} \Big\{ \log \Pr(z^{(k)}) + \log \Pr(y^{(k)} \mid z^{(k)}) + \sum_{\substack{l = k-(M-1)/2 \\ l \ne k}}^{k+(M-1)/2} \log \Pr(y'^{(l)} \mid z^{(k)}) \Big\}.    (12)

In order to compute the MAP estimate, the prior image model \Pr(z^{(k)}) and the conditional densities \Pr(y^{(k)} \mid z^{(k)}) and \Pr(y'^{(l)} \mid z^{(k)}) for l \ne k must be defined.

The Huber–Markov random field (HMRF) model [1] is a Gibbs prior [17] which represents piecewise smooth data, with the probability density function defined as

    \Pr(z^{(k)}) = \frac{1}{Z} \exp \Big\{ -\frac{1}{2\beta} \sum_{c \in \mathcal{C}} \rho_\alpha(d_c^t z^{(k)}) \Big\}.    (13)

In this expression, Z is a normalizing constant known as the partition function, \beta is the Gibbs prior "temperature" parameter, and c is a local group of pixels contained within the set of all image cliques \mathcal{C}. The quantity d_c^t z^{(k)} is a spatial activity measure, with a small value in smooth image locations and a large value near edges. Four spatial activity measures are computed at each pixel in the high-resolution image, given by the following second-order finite differences:

    d_{\mathbf{x},1}^t z^{(k)} = z^{(k)}_{x_1, x_2-1} - 2 z^{(k)}_{x_1, x_2} + z^{(k)}_{x_1, x_2+1},
    d_{\mathbf{x},2}^t z^{(k)} = 0.5\, z^{(k)}_{x_1+1, x_2-1} - z^{(k)}_{x_1, x_2} + 0.5\, z^{(k)}_{x_1-1, x_2+1},
    d_{\mathbf{x},3}^t z^{(k)} = z^{(k)}_{x_1-1, x_2} - 2 z^{(k)}_{x_1, x_2} + z^{(k)}_{x_1+1, x_2},
    d_{\mathbf{x},4}^t z^{(k)} = 0.5\, z^{(k)}_{x_1-1, x_2-1} - z^{(k)}_{x_1, x_2} + 0.5\, z^{(k)}_{x_1+1, x_2+1}.

The likelihood of discontinuities is controlled by the Huber edge penalty function [1],

    \rho_\alpha(x) = \begin{cases} x^2, & |x| \le \alpha, \\ 2\alpha |x| - \alpha^2, & |x| > \alpha, \end{cases}    (14)

where \alpha is a threshold parameter which separates the quadratic and linear regions. A quadratic edge penalty, \lim_{\alpha \to \infty} \rho_\alpha(x) = x^2, characterizes the Gauss–Markov random field prior model. The use of the Huber function with the proper value of \alpha helps maintain discontinuities in the image estimate.

Each conditional density, \Pr(y^{(k)} \mid z^{(k)}) and \Pr(y'^{(l)} \mid z^{(k)}) for l \ne k, models the error in the displacement vector estimates used to construct the motion-compensated subsampling matrices A'^{(l,k)}. Since motion vectors are not required for frame y^{(k)}, the density \Pr(y^{(k)} \mid z^{(k)}) is given as

    \Pr(y^{(k)} \mid z^{(k)}) = \begin{cases} 1, & y^{(k)} = A^{(k,k)} z^{(k)}, \\ 0, & y^{(k)} \ne A^{(k,k)} z^{(k)}. \end{cases}    (15)

For other frames in the short sequence, the error in the video observation model is assumed to be Gaussian-distributed; therefore, each conditional density is given by the expression

    \Pr(y'^{(l)} \mid z^{(k)}) = \frac{1}{(2\pi)^{N_1 N_2 / 2}\, \sigma_{(l,k)}^{N_1 N_2}} \exp \Big\{ -\frac{1}{2 \sigma_{(l,k)}^2} \| y'^{(l)} - A'^{(l,k)} z^{(k)} \|^2 \Big\}    (16)

for l = k - (M-1)/2, \ldots, k-1 and l = k+1, \ldots, k + (M-1)/2. The error variance \sigma_{(l,k)}^2 for each frame is assumed to be proportional to the absolute frame index difference |l - k|.

By substituting Eqs. (13), (15), and (16) into (12), the Bayesian MAP estimate of the high-resolution image can be written as the constrained optimization problem

    \hat{z}^{(k)} = \arg\min_{z^{(k)} \in \mathcal{Z}} \Big\{ \sum_{\mathbf{x}} \sum_{i=1}^{4} \rho_\alpha(d_{\mathbf{x},i}^t z^{(k)}) + \sum_{\substack{l = k-(M-1)/2 \\ l \ne k}}^{k+(M-1)/2} \lambda_{(l,k)} \| y'^{(l)} - A'^{(l,k)} z^{(k)} \|^2 \Big\},    (17)

where the constraint set is defined as

    \mathcal{Z} = \{ z^{(k)} : y^{(k)} = A^{(k,k)} z^{(k)} \}.    (18)

Each frame, y'^{(l)} for l \ne k, has an associated confidence parameter, \lambda_{(l,k)} = \beta / \sigma_{(l,k)}^2, proportional to the confidence in the motion-compensated subsampling matrix estimate A'^{(l,k)}. Since the objective function in (17) is convex, the gradient projection technique [1, 18] is used to compute the unique solution \hat{z}^{(k)}.

3. SUBPIXEL MOTION ESTIMATION TECHNIQUES

Estimating accurate subpixel motion vectors is a critically important component of modeling an image sequence for use in super-resolution enhancement algorithms. The eight-parameter projective model coordinate transformation [10], block matching [12], and Horn–Schunck optical flow motion estimation [13] are described in this section. In all cases, two successive low-resolution video frames are first up-sampled by a factor of q using either bilinear, cubic B-spline [15], or single frame Bayesian MAP interpolation [16]. The resulting up-sampled frames are denoted as y_q^{(k)} and y_q^{(l)}; subpixel-resolution motion vectors are then estimated from these two frames. Since a single subpixel-resolution motion vector is required for each low-resolution image pixel in the image sequence observation model, the estimated motion vectors are down-sampled by averaging over q \times q vector blocks.

3.1. Eight-Parameter Projective Model

Parametric coordinate transformation algorithms assume that objects remain stationary while the camera or the camera lens moves; this includes transformations such as pan, rotation, tilt, and zoom. If an image sequence contains a global transformation between frames, the estimated motion field can be highly accurate due to the large ratio of observed image pixels to unknown motion model parameters. The parametric model which corresponds most closely to transformations that occur in the real world is the eight-parameter projective model [10]. The model is defined as

    x_1' = \frac{a_1 x_1 + a_2 x_2 + a_3}{a_7 x_1 + a_8 x_2 + 1}, \qquad x_2' = \frac{a_4 x_1 + a_5 x_2 + a_6}{a_7 x_1 + a_8 x_2 + 1},    (19)

where \mathbf{x} = (x_1, x_2) denotes the spatial location of a pixel in frame l, \mathbf{x}' = (x_1', x_2') represents the location of the transformed pixel in frame k, and a = \{a_1, \ldots, a_8\} denotes the eight unknown model parameters. This model can be expressed in a more compact matrix–vector form as

    \mathbf{x}' = \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \frac{A \mathbf{x} + \mathbf{b}}{\mathbf{c}^t \mathbf{x} + 1}.    (20)

The matrices and vectors have the following dimensions: A \in \mathbb{R}^{2 \times 2}, \mathbf{b} \in \mathbb{R}^{2 \times 1}, and \mathbf{c} \in \mathbb{R}^{2 \times 1}.

The "four point" method proposed by Mann and Picard [10] is used to compute the eight-parameter projective model coefficients. By selecting four points in frame y_q^{(l)} corresponding to four transformed points in the up-sampled reference frame y_q^{(k)}, eight nonlinear equations result for the eight unknowns. Solving these simultaneous nonlinear equations directly is not possible. To circumvent this problem, a bilinear motion model [19] approximation to the projective model is used, such that

    x_1' = q_1 x_1 x_2 + q_2 x_1 + q_3 x_2 + q_4,
    x_2' = q_5 x_1 x_2 + q_6 x_1 + q_7 x_2 + q_8.    (21)

Now, eight linear equations result for the four selected points, resulting in a least squares solution. The projective model parameters are then related to the bilinear coefficients in an iterative algorithm, in which the up-sampled reference frame is iteratively warped into y_q^{(l)} until the projective model parameters converge.

3.2. Block Matching Motion Estimation

The primary drawback of using parametric motion models is that they are only applicable in the presence of global motion. Simply stated, it is expected that they will fail when objects move independently. Nonparametric models do not possess this problem, so they may be used to estimate independent object motion [19]. However, due to the low ratio of observed image pixels to unknown motion model parameters, a number of the estimated motion vectors may be inaccurate.

Block matching [12] is a popular approach for estimating motion vectors from image sequences. This method assumes that the motion field is uniform over compact blocks of pixels and that the motion can be modeled as displacements of these blocks. Let a (2p+1) \times (2p+1) pixel region in the up-sampled frame y_q^{(l)} represent the block to be matched between frames k and l, with pixel locations contained in the set

    R_{\mathbf{x}} = \{ (r, s) \mid x_1 - p \le r \le x_1 + p;\; x_2 - p \le s \le x_2 + p \}.

The maximum allowable vertical and horizontal displacement of this block is d pixels, with this parameter defining the extent of the search area in the up-sampled reference frame y_q^{(k)}. Thus, candidate motion vectors are limited to the set

    V_{\mathbf{x}} = \{ (v(\mathbf{x}), h(\mathbf{x})) \mid -d \le v(\mathbf{x}) \le d;\; -d \le h(\mathbf{x}) \le d \}.

Using this notation, a single displacement vector can be estimated using the mean absolute difference (MAD) criterion as

    (\hat{v}(\mathbf{x}), \hat{h}(\mathbf{x})) = \arg\min_{(v(\mathbf{x}), h(\mathbf{x})) \in V_{\mathbf{x}}} \Big\{ \frac{1}{(2p+1)^2} \sum_{(r,s) \in R_{\mathbf{x}}} \big| y^{(l)}_{q,r,s} - y^{(k)}_{q,\,r-v(\mathbf{x}),\, s-h(\mathbf{x})} \big| \Big\}    (22)

for x_1 = 1, \ldots, N_1 and x_2 = 1, \ldots, N_2.

Block matching generally performs well if a number of spatial discontinuities are present within the block, but the technique fails to estimate vectors properly over flat image intensity regions. A large block size, say, p = 5, may be used to ensure that a sufficient spatial variation exists within the block.

3.3. Horn–Schunck Optical Flow Estimation

Horn–Schunck optical flow estimation [13] results in motion vector estimates which satisfy the optical flow equation with the minimum pixel-to-pixel variation in the velocity field. Assume that continuous image intensity, denoted as E(x_1, x_2, t), is constant along a particular object trajectory, such that the optical flow equation is given as

    E(x_1, x_2, t) = E(x_1 + \Delta x_1, x_2 + \Delta x_2, t + \Delta t) \quad \forall x_1, x_2, t.    (23)

Through a first-order Taylor series expansion of the right-hand side of Eq. (23), the optical flow equation can be rewritten as

    \varepsilon_{of}(v(\mathbf{x}), h(\mathbf{x})) = v(\mathbf{x})\, E_{x_1} + h(\mathbf{x})\, E_{x_2} + E_t \approx 0,    (24)

where the translational flow velocity vector consists of spatial derivatives; i.e.,

    v(\mathbf{x}) = \frac{dx_1}{dt}, \qquad h(\mathbf{x}) = \frac{dx_2}{dt}.    (25)

In the discretized optical flow equation, spatial derivative estimates, E_{x_1} and E_{x_2}, and the temporal derivative estimate, E_t, can be calculated from the interpolated frames as finite differences [13]:

    E_{x_1} = \tfrac{1}{4} \big[ y^{(k)}_{q,x_1+1,x_2} - y^{(k)}_{q,x_1,x_2} + y^{(k)}_{q,x_1+1,x_2+1} - y^{(k)}_{q,x_1,x_2+1} + y^{(l)}_{q,x_1+1,x_2} - y^{(l)}_{q,x_1,x_2} + y^{(l)}_{q,x_1+1,x_2+1} - y^{(l)}_{q,x_1,x_2+1} \big],

    E_{x_2} = \tfrac{1}{4} \big[ y^{(k)}_{q,x_1,x_2+1} - y^{(k)}_{q,x_1,x_2} + y^{(k)}_{q,x_1+1,x_2+1} - y^{(k)}_{q,x_1+1,x_2} + y^{(l)}_{q,x_1,x_2+1} - y^{(l)}_{q,x_1,x_2} + y^{(l)}_{q,x_1+1,x_2+1} - y^{(l)}_{q,x_1+1,x_2} \big],

    E_t = \tfrac{1}{4} \big[ y^{(l)}_{q,x_1,x_2} - y^{(k)}_{q,x_1,x_2} + y^{(l)}_{q,x_1+1,x_2} - y^{(k)}_{q,x_1+1,x_2} + y^{(l)}_{q,x_1,x_2+1} - y^{(k)}_{q,x_1,x_2+1} + y^{(l)}_{q,x_1+1,x_2+1} - y^{(k)}_{q,x_1+1,x_2+1} \big].

The motion field is then estimated using the following criterion:

    (\hat{v}, \hat{h}) = \arg\min_{(v,h)} \int_{x_2} \int_{x_1} \big[ \varepsilon_{of}^2(v(\mathbf{x}), h(\mathbf{x})) + \tau^2\, \varepsilon_s^2(v(\mathbf{x}), h(\mathbf{x})) \big]\, dx_1\, dx_2.    (26)

The smoothness measure, \varepsilon_s^2(v(\mathbf{x}), h(\mathbf{x})), is added to the optical flow constraint to ensure that the displacement vectors vary only slightly over a small neighborhood. For each point within the image, the smoothness measure is defined as

    \varepsilon_s^2(v(\mathbf{x}), h(\mathbf{x})) = \Big( \frac{\partial v(\mathbf{x})}{\partial x_1} \Big)^2 + \Big( \frac{\partial v(\mathbf{x})}{\partial x_2} \Big)^2 + \Big( \frac{\partial h(\mathbf{x})}{\partial x_1} \Big)^2 + \Big( \frac{\partial h(\mathbf{x})}{\partial x_2} \Big)^2.    (27)

The parameter \tau^2 controls the influence of the smoothness term, and thus the smoothness of the estimated optical flow field. Minimization of (26) can be performed by solving a pair of linear equations using Gauss–Seidel iterations [13]. Simulations have shown that it takes at least 2000 iterations to achieve convergence, even for a pair of relatively small video frames.

Theoretically, Horn–Schunck optical flow estimation should yield more accurate motion vectors than block matching, since the vectors are correlated throughout the motion field. However, estimating accurate gradients from image data is difficult, and for this reason \varepsilon_{of}^2(v(\mathbf{x}), h(\mathbf{x})) may be a poor estimate of the true constraints. In practice, motion fields estimated by applying the Horn–Schunck technique to actual image sequence frames contain a number of errors, since they are computed by minimizing an inaccurate objective function.

3.4. Detection and Elimination of Inaccurate Motion Estimates

In the multiframe enhancement algorithm, an inaccurate motion vector can cause a great deal of damage to the estimated high-resolution video still, even more so than by assuming that there is no motion at that point. Therefore, the detection and elimination of inaccurate motion vectors is an important step in the enhancement algorithm to improve the overall quality of the high-resolution video still.

The displaced frame difference (DFD) at location \mathbf{x} between the up-sampled frame l and the compensated image from frame k is denoted as

    \mathrm{DFD}^{(l,k)}_{\mathbf{x}} = \big| y^{(l)}_{q,\mathbf{x}} - y^{(k)}_{q,\,x_1 - v(\mathbf{x}),\, x_2 - h(\mathbf{x})} \big|.    (28)

Ideally, the DFD should be zero if the displacement vectors describe the motion exactly. However, there is always some error associated with the incorrect estimation of the motion vectors. Therefore, the DFD has a nonzero value which can serve as a criterion of how well the motion vectors have been estimated.

The mean, \bar{e}_{\mathrm{DFD}}, and variance, \sigma^2_{\mathrm{DFD}}, of the DFD are denoted as

    \bar{e}_{\mathrm{DFD}} = \frac{1}{N} \sum_{\mathbf{x}} \mathrm{DFD}^{(l,k)}_{\mathbf{x}} = \frac{1}{N} \sum_{\mathbf{x}} \big| y^{(l)}_{q,\mathbf{x}} - y^{(k)}_{q,\,x_1 - v(\mathbf{x}),\, x_2 - h(\mathbf{x})} \big|,    (29)

    \sigma^2_{\mathrm{DFD}} = \frac{1}{N-1} \sum_{\mathbf{x}} \big( \mathrm{DFD}^{(l,k)}_{\mathbf{x}} - \bar{e}_{\mathrm{DFD}} \big)^2,    (30)

where N is the total number of pixels under consideration. Since it is assumed that the motion vectors along the borders of each frame are not useful due to pixels entering and leaving the scene, N is given as N_1 \times N_2 minus the number of excluded border pixels. A motion vector at point \mathbf{x} is detected as an inaccurate estimate if \mathrm{DFD}^{(l,k)}_{\mathbf{x}} exceeds a threshold T which is dependent on the signal statistics. The threshold value used in the simulations is given as

    T = \bar{e}_{\mathrm{DFD}} + 2 \sigma_{\mathrm{DFD}}.    (31)

A simple change detection algorithm examines the straight frame difference (FD) between the up-sampled frames k and l,

    \mathrm{FD}^{(l,k)}_{\mathbf{x}} = \big| y^{(l)}_{q,\mathbf{x}} - y^{(k)}_{q,\mathbf{x}} \big|.    (32)

If \mathrm{FD}^{(l,k)}_{\mathbf{x}} is small, this implies that the intensity of pixel \mathbf{x} in frame l is almost the same as the corresponding pixel in frame k. Therefore, the motion vector at this point will be difficult to estimate using the block matching and optical flow techniques, and it may have to be eliminated from the vector field.

In the simulations, a motion vector estimate at point \mathbf{x} is considered to be accurate if the following two conditions are satisfied:

    \mathrm{DFD}^{(l,k)}_{\mathbf{x}} < T, \qquad \mathrm{FD}^{(l,k)}_{\mathbf{x}} > 2.

If an inaccurate motion vector estimate is detected at location \mathbf{x}, v(\mathbf{x}) and h(\mathbf{x}) are ignored, and the pixel is considered to be observable in only one of the two video frames.

4. SIMULATIONS

In order to compare the effects of the motion estimation techniques on the quality of the multiframe enhanced images computed by the Bayesian HRVS algorithm, two image sequences were used in the simulations. The first image sequence, Airport, is shown in Fig. 1. Airport was synthetically generated from a single digital image to model a diagonal camera pan with known motion vectors. The second image sequence, Mobile and Calendar, is shown in Fig. 2. This sequence is composed of several objects moving
independently. To show the effect of an estimated motion field on the quality of a high-resolution video still, only two frames within each sequence have been selected for these experiments. Thus, a high-resolution video still is computed from two low-resolution video frames and the single motion vector field that is estimated between them. It must be emphasized that we are attempting to determine empirically how the accuracy of an estimated motion field affects the quality of the high-resolution video still; with this in mind, only a single motion field (and consequently, two low-resolution video frames) can be used in the experiments. To obtain improved resolution, more frames could be added to each sequence. The reader is directed to previous research results by Schultz and Stevenson [1] to examine the visual and quantitative results that are possible when more than two frames are integrated using the Bayesian HRVS algorithm.

FIG. 1. Airport image sequence: (a) original high-resolution video frame z^{(k)}; (b) low-resolution video frame y^{(k)} (q = 2); (c) low-resolution video frame y^{(k+1)} (q = 2).

Both test image sequences were first subsampled by a factor of q (q = 2 or q = 4) and then interpolated back to their original dimensions so that quantitative comparisons could be made using the improved signal-to-noise ratio (SNR). The improved SNR is a quantitative measure of how much the estimated frame \hat{z}^{(k)} has improved over the reference frame, given as

    \Delta \mathrm{SNR} = 10 \log_{10} \frac{\| z^{(k)} - z_0^{(k)} \|^2}{\| z^{(k)} - \hat{z}^{(k)} \|^2} \quad (\text{in dB}).    (33)

In this expression, z_0^{(k)} is generated by a zero-order hold up-sampling of the reference frame y^{(k)}, z^{(k)} is the original
Page 8: Subpixel Motion Estimation for Super-Resolution Image Sequence Enhancement

SUBPIXEL MOTION ESTIMATION 45

FIG. 2. Mobile and calendar image sequence: (a) original high-resolution video frame z(k); (b) low-resolution video frame y(k) (q 5 2);(c) low-resolution video frame y(k11) (q 5 2).

high-resolution image, and z(k) is the estimated high-resolu- down-sample the displacement vectors by averagingq 3 q vector blocks.tion video still image. It should be noted that this quantita-

tive measure can only be used in idealized test cases where 3. Compute the Bayesian multiframe HRVS for eachmotion field using the following parameters: M 5 2;z(k) is known.

The procedure used to compute the multiframe high- q 5 2 or q 5 4; a 5 1.0; and l(l,k) 5 5.0/ul 2 ku.resolution estimates is as follows: Tables 1 and 2 provide a quantitative comparison of the

single frame and multiframe enhanced estimates computed1. Up-sample both low-resolution frames y(k) and y(l) byfrom the Airport and Mobile and Calendar sequences, re-a factor of q 5 2 or q 5 4, using (a) bilinear interpolation,spectively, while Figs. 3 and 4 provide a visual comparison(b) cubic B-spline interpolation, and (c) single frameof several multiframe estimates computed using the threeBayesian interpolation (M 5 1, a 5 1.0). The resultingdifferent subpixel motion estimation techniques. For allup-sampled frames are denoted as yq(k) and yq(l).multiframe estimates shown in Figs. 3 and 4, single frame2. Estimate the subpixel-resolution motion vectors be-Bayesian interpolation was used to up-sample the low-tween yq(k) and yq(l) by applying (a) eight-parameter pro-resolution frames so that the subpixel motion vector fieldsjective model estimation (‘‘four point’’ method), (b) blockcould be estimated. Conclusions drawn from the visual andmatching (p 5 4, d 5 9), and (c) Horn–Schunck opticalquantitative comparisons are as follows:flow estimation (t 2 5 10.0, 2000 iterations). Detect and

eliminate inaccurate motion vector estimates. Finally, 1. Provided that the motion is estimated correctly, val-
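The full-search block matching of step 2 can be sketched as an exhaustive search over integer displacements on the up-sampled frames; dividing the resulting vectors by q (after averaging q × q vector blocks, as in step 2) then yields subpixel displacements on the low-resolution grid. The sketch below is a minimal illustration assuming a mean-absolute-difference matching criterion; the authors' exact criterion and search strategy are not specified in this excerpt:

```python
import numpy as np

def block_match(ref, cur, p=4, d=9):
    """Full-search block matching: for each p x p block of `ref`, find the
    integer displacement within +/-d pixels that minimizes the mean
    absolute difference in `cur`. Returns a displacement field (vy, vx)."""
    H, W = ref.shape
    vy = np.zeros((H // p, W // p), dtype=int)
    vx = np.zeros((H // p, W // p), dtype=int)
    for by in range(H // p):
        for bx in range(W // p):
            y0, x0 = by * p, bx * p
            block = ref[y0:y0 + p, x0:x0 + p]
            best = np.inf
            for dy in range(-d, d + 1):
                for dx in range(-d, d + 1):
                    ys, xs = y0 + dy, x0 + dx
                    if ys < 0 or xs < 0 or ys + p > H or xs + p > W:
                        continue  # candidate block falls outside the frame
                    err = np.abs(block - cur[ys:ys + p, xs:xs + p]).mean()
                    if err < best:
                        best, vy[by, bx], vx[by, bx] = err, dy, dx
    return vy, vx
```

With q-times up-sampled inputs, an integer displacement of 1 on this grid corresponds to 1/q pixel at the original low resolution.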

46                                                SCHULTZ, MENG, AND STEVENSON

TABLE 1
Comparison of Airport Sequence High-Resolution Video Stills:
ΔSNR (in dB) of Single Frame and Multiframe Resolution Enhanced Estimates

                                                       q = 2            q = 4
Single frame up-sampling algorithm/
motion estimation technique                        All   Accurate   All   Accurate
----------------------------------------------------------------------------------
Single frame bilinear interpolation                0.93             0.59
Single frame cubic B-spline interpolation          3.08             1.40
Single frame Bayesian interpolation                3.44             1.67

Bilinear interpolation/eight-parameter             6.85    7.08     2.24    2.38
Bilinear interpolation/block matching              6.12    7.18     2.30    2.34
Bilinear interpolation/Horn–Schunck                3.08    3.17     1.79    1.85

Cubic B-spline interpolation/eight-parameter       6.17    7.17     2.30    2.70
Cubic B-spline interpolation/block matching        6.11    7.14     2.34    2.38
Cubic B-spline interpolation/Horn–Schunck          2.84    3.07     1.77    1.78

Bayesian interpolation/eight-parameter             7.00    7.18     2.42    2.52
Bayesian interpolation/block matching              5.96    6.88     2.23    2.20
Bayesian interpolation/Horn–Schunck                2.82    2.96     1.57    1.70

Note. ''All'' represents the use of all estimated motion vectors, while ''Accurate'' denotes that the inaccurate motion vector estimates have been detected and eliminated prior to the application of the multiframe resolution enhancement algorithm. ΔSNR values (in dB) represent the quantitative improvement of the high-resolution video still over the low-resolution reference frame.

Tables 1 and 2 provide a quantitative comparison of the single frame and multiframe enhanced estimates computed from the Airport and Mobile and Calendar sequences, respectively, while Figs. 3 and 4 provide a visual comparison of several multiframe estimates computed using the three different subpixel motion estimation techniques. For all multiframe estimates shown in Figs. 3 and 4, single frame Bayesian interpolation was used to up-sample the low-resolution frames so that the subpixel motion vector fields could be estimated. Conclusions drawn from the visual and quantitative comparisons are as follows:

1. Provided that the motion is estimated correctly, values of ΔSNR should be larger for the multiframe estimates than for the single frame estimates. This is quite evident for the multiframe Airport estimates in Table 1, which used either the eight-parameter projective model or the block matching motion vectors, and for the Mobile and Calendar estimates in Table 2, which used either the block matching or Horn–Schunck optical flow motion vectors.

TABLE 2
Comparison of Mobile and Calendar Sequence High-Resolution Video Stills:
ΔSNR (in dB) of Single Frame and Multiframe Resolution Enhanced Estimates

                                                       q = 2            q = 4
Single frame up-sampling algorithm/
motion estimation technique                        All   Accurate   All   Accurate
----------------------------------------------------------------------------------
Single frame bilinear interpolation                0.27             0.35
Single frame cubic B-spline interpolation          1.70             0.76
Single frame Bayesian interpolation                2.41             1.12

Bilinear interpolation/eight-parameter            -0.13   -0.04     1.01    1.03
Bilinear interpolation/block matching              2.35    2.82     1.30    1.33
Bilinear interpolation/Horn–Schunck                2.74    2.95     1.23    1.23

Cubic B-spline interpolation/eight-parameter       0.56    0.85     1.08    1.08
Cubic B-spline interpolation/block matching        2.83    3.07     1.29    1.32
Cubic B-spline interpolation/Horn–Schunck          2.54    2.84     1.16    1.18

Bayesian interpolation/eight-parameter             0.27    1.01     1.10    1.12
Bayesian interpolation/block matching              2.49    2.86     1.25    1.26
Bayesian interpolation/Horn–Schunck                2.58    2.79     1.15    1.17

Note. ''All'' represents the use of all estimated motion vectors, while ''Accurate'' denotes that the inaccurate motion vector estimates have been detected and eliminated prior to the application of the multiframe resolution enhancement algorithm. ΔSNR values (in dB) represent the quantitative improvement of the high-resolution video still over the low-resolution reference frame.


2. The quality of the subpixel-resolution motion vector estimates is dependent upon the accuracy of the up-sampled image data. Bilinear, cubic B-spline, and single frame Bayesian MAP interpolation (M = 1, α = 1.0) were used to up-sample the low-resolution video frames prior to the application of a particular motion estimation method. Empirically, it was determined that both cubic B-spline and Bayesian MAP interpolation provide useful and very similar up-sampled frames, while bilinear interpolation is rather ineffective.

3. The performance of the three motion estimation algorithms varies for the test image sequences. In the case of global motion, the eight-parameter projective model can recover the motion extremely well. As expected, the algorithm performs poorly on sequences which contain objects that move independently. On the other hand, block matching and Horn–Schunck optical flow estimation perform quite well in this case, but not for global data transformations. It is expected that the Horn–Schunck algorithm should perform better than block matching for the general case of independent object motion. However, as shown empirically, this is not always the case. Estimating the spatial and temporal gradients as required by the Horn–Schunck algorithm results in an inaccurate objective function and, consequently, in many inaccurate motion vectors.

4. In most cases, the multiframe high-resolution images and their corresponding ΔSNR values show that the quality of a particular HRVS is improved after inaccurate motion vectors have been detected and eliminated.

5. For the interpolation factor q = 4, Tables 1 and 2 show that the quality of the multiframe HRVS computed using M = 2 frames does not differ significantly from the single frame Bayesian estimate (M = 1, α = 1.0). This implies that the estimated motion vectors are quite poor for q > 2, and that the subpixel-resolution motion estimation schemes used in this research do not perform well for practical interpolation factors. Parametric techniques can be useful for estimating higher-resolution motion fields between frames containing global motion (e.g., reconnaissance and satellite image sequences). Estimating high-resolution motion fields between frames containing independent object motion is still an open research area, as block matching and Horn–Schunck estimation do not perform adequately.
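The Horn–Schunck method discussed above alternates between locally averaging the flow field and correcting it along the image gradient [13], so errors in the estimated spatial and temporal gradients propagate directly into the motion vectors. A minimal sketch of the classical iteration follows; the central-difference gradients, 4-neighbor averaging, and default parameter values are our assumptions and are not necessarily the discretization used in the simulations:

```python
import numpy as np

def horn_schunck(f1, f2, alpha2=10.0, n_iter=100):
    """Classical Horn-Schunck optical flow [13]: iteratively minimize the
    optical flow constraint error plus a quadratic smoothness penalty
    weighted by alpha2. Returns the per-pixel flow field (u, v)."""
    fx = np.gradient(f1, axis=1)   # spatial gradients (central differences)
    fy = np.gradient(f1, axis=0)
    ft = f2 - f1                   # temporal gradient (frame difference)

    def avg4(w):
        # 4-neighbor average of the flow field, with replicated borders
        wp = np.pad(w, 1, mode="edge")
        return 0.25 * (wp[:-2, 1:-1] + wp[2:, 1:-1] + wp[1:-1, :-2] + wp[1:-1, 2:])

    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_bar, v_bar = avg4(u), avg4(v)
        # Correct the local flow average along the gradient direction.
        t = (fx * u_bar + fy * v_bar + ft) / (alpha2 + fx**2 + fy**2)
        u = u_bar - fx * t
        v = v_bar - fy * t
    return u, v
```

Because the update divides by the gradient magnitudes, noisy or poorly estimated gradients on up-sampled frames directly corrupt the flow, which is consistent with the behavior reported in item 3 above.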

FIG. 3. Details of Airport sequence high-resolution video stills, q = 2: (a) low-res frame, y(k); (b) low-res frame, y(k+1); (c) high-res frame, z(k); (d) single frame Bayesian estimate, ΔSNR = 3.44 dB; (e) multiframe HRVS, eight-parameter model, all vectors, ΔSNR = 7.00 dB; (f) multiframe HRVS, eight-parameter model, accurate vectors, ΔSNR = 7.18 dB; (g) multiframe HRVS, block matching, all vectors, ΔSNR = 5.96 dB; (h) multiframe HRVS, block matching, accurate vectors, ΔSNR = 6.88 dB; (i) multiframe HRVS, Horn–Schunck, all vectors, ΔSNR = 2.82 dB; (j) multiframe HRVS, Horn–Schunck, accurate vectors, ΔSNR = 2.96 dB.


FIG. 4. Details of Mobile and Calendar sequence high-resolution video stills, q = 2: (a) low-res frame, y(k); (b) low-res frame, y(k+1); (c) high-res frame, z(k); (d) single frame Bayesian estimate, ΔSNR = 2.41 dB; (e) multiframe HRVS, eight-parameter model, all vectors, ΔSNR = 0.27 dB; (f) multiframe HRVS, eight-parameter model, accurate vectors, ΔSNR = 1.01 dB; (g) multiframe HRVS, block matching, all vectors, ΔSNR = 2.49 dB; (h) multiframe HRVS, block matching, accurate vectors, ΔSNR = 2.86 dB; (i) multiframe HRVS, Horn–Schunck, all vectors, ΔSNR = 2.58 dB; (j) multiframe HRVS, Horn–Schunck, accurate vectors, ΔSNR = 2.79 dB.

5. CONCLUSION

Super-resolution image sequence enhancement methods are used to estimate a high-resolution video still from several low-resolution video frames, provided that objects within the sequence move with subpixel increments. An image sequence observation model was presented for low-resolution frames, which models the subsampling of the unknown high-resolution data and accounts for independent object motion occurring between frames. Three single frame enhancement methods were used in the experiments, including the bilinear, cubic B-spline, and single frame Bayesian interpolation techniques. The Bayesian multiframe enhancement algorithm was described to reconstruct a high-resolution video still from several low-resolution image sequence frames. In this algorithm, estimating accurate subpixel-resolution motion vectors is critically important. The eight-parameter projective parametric transformation, block matching, and Horn–Schunck optical flow estimation techniques designed for subpixel motion estimation were utilized in the computer simulations. The resulting high-resolution video stills computed using the various estimated motion fields were analyzed both visually and quantitatively to determine the most useful motion estimation technique. It was determined that the parametric motion model results in accurate multiframe video estimates when a global transformation such as a camera pan, rotation, tilt, or zoom occurs between frames. Optical flow equation-based techniques tend to perform well when objects move independently in front of a stationary camera lens. Block matching can recover the motion well when the sequence contains a large number of spatial discontinuities and few flat image regions. All motion estimation techniques are inherently imperfect, and it is easy to see the locations of inaccurate motion estimates within a high-resolution video still. An algorithm for the detection and elimination of inaccurate motion vectors was described, and simulations verified that the HRVS is improved when the inaccurate motion vectors were eliminated from the motion field.

A number of issues will be explored in future research. First, the Horn–Schunck optical flow estimation method will be modified to allow discontinuities within the motion field by incorporating the Huber edge penalty function into the smoothness measure [20]. The goal is to automatically segment the motion field into regions representing independent objects. Furthermore, the motion of each object will be studied, since a more accurate subpixel-resolution motion estimate can be obtained by averaging over the motion field region where an individual object is undergoing a transformation between frames. Secondly, an iterative motion estimation approach will be implemented for the multiframe enhancement algorithm. Explicitly, multiframe estimates will serve as the up-sampled frames, and then the subpixel-resolution motion vector estimates will be refined in an iterative fashion. An improved image sensor model is also under investigation, which incorporates motion-compensated data subsampling and a point spread function (PSF) to represent blur due to an out-of-focus camera lens or object motion.


APPENDIX: SYMBOLS

N1            number of rows within a digital video frame
N2            number of columns within a digital video frame
M             number of frames within a digital image sequence
z             lexicographically-ordered digital image, represented as a vector
z(k)          frame k within a digital image sequence
z(k)_x        pixel at spatial location x = (x1, x2) within the kth frame
ẑ             an estimate of the vector z
||·||         Euclidean norm in real space
ρα(·)         Huber edge penalty function, dependent on parameter α
d^t_{x,i}z(k) spatial activity measure computed at pixel z(k)_x
α             Huber edge penalty function threshold parameter
β             Gibbs prior temperature parameter
λ             smoothing parameter
ε             mean or average value
σ             Gaussian noise standard deviation
Pr(·)         probability density function
exp{·}        exponential function
⌈·⌉           operator returning the least integral value greater than or equal to its argument
⌊·⌋           operator returning the greatest integral value less than or equal to its argument
Z             constraint set
q             integer-valued interpolation factor
y′            vector of useful observations, with reduced dimensionality
A′            matrix of useful constraints, with reduced row dimensionality
ΔSNR          improved signal-to-noise ratio, in decibels

REFERENCES

1. R. R. Schultz and R. L. Stevenson, Extraction of high-resolution frames from video sequences, IEEE Transactions on Image Processing 5(6), 1996, 996–1011.
2. C. A. Berenstein, L. N. Kanal, D. Lavine, and E. C. Olson, A geometric approach to subpixel registration accuracy, Computer Vision, Graphics, and Image Processing 40, 1987, 334–360.
3. G. de Haan and P. W. A. C. Biezen, Sub-pixel motion estimation with 3-D recursive search block-matching, Signal Processing: Image Communication 6, 1994, 229–239.
4. Q. Tian and M. N. Huhns, Algorithms for subpixel registration, Computer Vision, Graphics, and Image Processing 35, 1986, 220–233.
5. R. Y. Tsai and T. S. Huang, Multiframe image restoration and registration, in Advances in Computer Vision and Image Processing (R. Y. Tsai and T. S. Huang, Eds.), Vol. 1, pp. 317–339, JAI Press, London, 1984.
6. A. M. Tekalp, M. K. Ozkan, and M. I. Sezan, High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration, in Proceedings, IEEE International Conference on Acoustics, Speech, and Signal Processing, San Francisco, CA, 1992, pp. III-169 to III-172.
7. A. J. Patti, M. I. Sezan, and A. M. Tekalp, High-resolution standards conversion of low resolution video, in Proceedings, IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, 1995, pp. 2197–2200.
8. P. Cheesman, B. Kanefsky, R. Kraft, J. Stutz, and R. Hanson, Super-Resolved Surface Reconstruction from Multiple Images, Tech. Rep. FIA-94-12, NASA Ames Research Center, Moffett Field, CA, 1994.
9. R. R. Schultz and R. L. Stevenson, Motion-compensated scan conversion of interlaced video sequences, in Proceedings of the IS&T/SPIE Conference on Image and Video Processing IV, Vol. 2666, San Jose, CA, 1996, pp. 107–118.
10. S. Mann and R. W. Picard, Virtual bellows: constructing high quality stills from video, in Proceedings of the IEEE International Conference on Image Processing, Austin, TX, 1994, pp. 363–367.
11. M. Bierling and R. Thoma, Motion compensating field interpolation using a hierarchically structured displacement estimator, Signal Processing 11(4), 1986, 387–404.
12. H. G. Musmann, P. Pirsch, and H.-J. Grallert, Advances in picture coding, Proceedings of the IEEE 73(4), 1985, 523–548.
13. B. K. P. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence 17, 1981, 185–203.
14. A. K. Jain, Fundamentals of Digital Image Processing, Prentice–Hall, Englewood Cliffs, NJ, 1989.
15. H. S. Hou and H. C. Andrews, Cubic splines for image interpolation and digital filtering, IEEE Transactions on Acoustics, Speech, and Signal Processing 26(6), 1978, 508–517.
16. R. R. Schultz and R. L. Stevenson, A Bayesian approach to image expansion for improved definition, IEEE Transactions on Image Processing 3(3), 1994, 233–242.
17. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6), 1984, 721–741.
18. J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Computer Science and Applied Mathematics, Academic Press, New York, 1970.
19. A. M. Tekalp, Digital Video Processing, Prentice Hall, Upper Saddle River, NJ, 1995.
20. D. Shulman and J.-Y. Herve, Regularization of discontinuous flow fields, in Proceedings, Workshop on Visual Motion, Irvine, CA, 1989, pp. 81–86.

RICHARD R. SCHULTZ was born on March 19, 1967, in Grafton, North Dakota. He received the B.S.E.E. degree (summa cum laude) from the University of North Dakota in 1990, and the M.S.E.E. and Ph.D. degrees from the University of Notre Dame in 1992 and 1995, respectively. He joined the faculty of the Department of Electrical Engineering at the University of North Dakota in 1995, where he is currently an Assistant Professor. In 1996, Dr. Schultz received a National Science Foundation Faculty Early Career Development (CAREER) award to integrate his image processing research and educational activities. Dr. Schultz is a member of the IEEE, SPIE, ASEE, Eta Kappa Nu, and Tau Beta Pi. His current research interests include digital signal, image, and video processing.


LI MENG was born on September 23, 1971, in Beijing, China. She received the B.S. degree in electrical engineering from Beijing University in 1995, and the M.S.E.E. degree from the University of North Dakota in 1996. Ms. Meng joined LSI Logic Corporation (Milpitas, CA) in 1997. She conducts research and development within the Digital Video Products group.

ROBERT L. STEVENSON received the B.E.E. degree (summa cum laude) from the University of Delaware in 1986, and the Ph.D. in electrical engineering from Purdue University in 1990. While at Purdue, he was supported by graduate fellowships from the National Science Foundation, DuPont Corporation, Phi Kappa Phi, and Purdue University. He joined the faculty of the Department of Electrical Engineering at the University of Notre Dame in 1990, where he is currently an Associate Professor. His research interests include image/video processing, image/video compression, robust image/video communication systems, multimedia systems, ill-posed problems in computational vision, and computational issues in image processing. Dr. Stevenson currently serves as an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology. He is the General Chair of the 1998 Midwest Symposium on Circuits and Systems to be held on the campus of the University of Notre Dame, and he will co-chair the 1999 Conference on Visual Communications and Image Processing.