C COPYRIGHT NOTICE I © 2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Generalized Parallel-Perspective Stereo Mosaics from Airborne Video

Zhigang Zhu, Member, IEEE, Allen R. Hanson, Member, IEEE, and

Edward M. Riseman, Senior Member, IEEE

Abstract—In this paper, we present a new method for automatically and efficiently generating stereoscopic mosaics by seamless registration of images collected by a video camera mounted on an airborne platform. Using a parallel-perspective representation, a pair of geometrically registered stereo mosaics can be precisely constructed under quite general motion. A novel parallel ray interpolation for stereo mosaicing (PRISM) approach is proposed to make stereo mosaics seamless in the presence of obvious motion parallax and for rather arbitrary scenes. Parallel-perspective stereo mosaics generated with the PRISM method have better depth resolution than perspective stereo due to the adaptive baseline geometry. Moreover, unlike previous results showing that parallel-perspective stereo has a constant depth error, we conclude that the depth estimation error of stereo mosaics is in fact a linear function of the absolute depths of a scene. Experimental results on long video sequences are given.

Index Terms—Mosaicing, stereo vision, visual representation, epipolar geometry, image registration, view interpolation, airborne video analysis.

1 INTRODUCTION

RECENTLY, there have been attempts in a variety of applications to add 3D information into an image-based mosaic representation. Creating stereo mosaics from two rotating cameras was proposed by Huang and Hung [3]. The generation of stereo mosaics from a single off-center rotating camera was proposed and fully studied by Peleg and Ben-Ezra [13] and Shum and Szeliski [20] for image-based rendering applications. In our previous work on environmental monitoring using aerial video, we proposed to create stereo mosaics from a single camera with dominant translational motion [26]. In fact, the idea of generating stereo panoramas for either an off-center rotating camera or a translating camera can be traced back to the earlier work in robot vision applications by Ishiguro et al. [6] and Zheng and Tsuji [25]. The attraction of the recent studies on off-center rotating cameras lies in how to make stereo mosaics with good epipolar geometry and high image quality for image-based rendering [13], [14], [19], [20]. However, in stereo mosaics with a rotating camera, the viewers are constrained to rotationally viewing the stereo representations. Translational motion, on the other hand, is the prevalent sensor motion during ground vehicle navigation [25], [27] or aerial surveys [8], [26]. In [13], the authors mentioned that the same techniques developed for the stereoscopic circular projection of a rotating camera could be applied to a translating camera, but it turns out that there has been little serious work on this topic. A rotating camera can be easily controlled to achieve the desired circular motion, and to exhibit certain desirable geometric properties. In contrast, the translation of a camera over a large distance is much harder to control, and introduces rather different and often difficult geometric properties.

Clearly, the use of standard 2D mosaicing techniques based on 2D image transformations such as a manifold projection [12] cannot generate seamless mosaics in the presence of obvious motion parallax. Rectified mosaicing methods have been proposed for generating 2D mosaics "without the curl," with a panning camera that is not perfectly horizontal [32], [17] or with a translating camera facing a tilted planar surface [32]. However, these methods are all based on global parametric transformations between successive frames, and cannot apply to a translating camera viewing surfaces that are highly irregular or with large differences in heights, resulting in significantly different motion parallax. In generating seamless 2D mosaics from a hand-held rotating camera, Shum and Szeliski [21] used a local alignment (deghosting) technique to compensate for the small amount of motion parallax introduced by small translations of the camera. For 2D rectified mosaics under more general motion cases, non-straight stitching curves have been proposed (for example, [9]) to generate seamless mosaics for aerial and satellite images. More recently, Rousso et al. [15] proposed universal mosaicing using a "pipe projection." To deal with motion with parallax, they suggested that a 2D orthogonal projection could be generated by taking a collection of strips, each with a width of one pixel, from interpolated camera views in between the original camera positions, but details were not provided. Moreover, for stereo mosaics, an accurate mathematical model is required for precise 3D reconstruction. Kumar et al. [8] dealt with the geo-registration problem by utilizing an available geo-registered aerial image with broader coverage, as well as an accompanying coregistered digital elevation map. In more general cases for generating image mosaics with parallax, several techniques have been proposed to explicitly estimate the camera motion and residual parallax by recovering a projective depth value for each pixel [7], [16], [24]. A general yet efficient approach and a suitable representation is highly desired for generating seamless stereo mosaics for long image sequences under obvious motion parallax, preferably before 3D reconstruction.

226 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 2, FEBRUARY 2004

. Z. Zhu is with the Department of Computer Science, City College of New York, Convent Ave. and 138th Street, New York, NY 10031. E-mail: [email protected].

. A.R. Hanson and E.M. Riseman are with the Department of Computer Science, University of Massachusetts at Amherst, Amherst, MA 01003. E-mail: {hanson, riseman}@cs.umass.edu.

Manuscript received 25 July 2002; revised 27 Apr. 2003; accepted 18 Sept. 2003. Recommended for acceptance by M. Irani. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 117016.

0162-8828/04/$20.00 © 2004 IEEE Published by the IEEE Computer Society

Another interesting issue is how well the parallel-perspective stereo behaves in terms of depth resolution. It has been shown independently by others [1] and by us [26], [29] that parallel-perspective stereo is superior to both conventional perspective stereo and to the recently developed multi-perspective stereo with concentric mosaics for 3D reconstruction (e.g., in [20]), in that the adaptive baseline inherent in the parallel-perspective geometry permits depth accuracy independent of absolute depth. But, this conclusion was obtained and verified under ideal conditions (e.g., in the study of [1]). In practice, however, a serious consideration of stereo mosaicing from a real video sequence is the degree of error in the final mosaics when using real sensors. Real video cameras have central projection, undergo a more general motion, are subject to limited frame rates, and view scenes with depth changes.

As a real application of our work, our interdisciplinary NSF environmental monitoring project aims at developing techniques for estimating the standing biomass of forests, monitoring land use changes, habitat destruction, etc., using high resolution low-altitude video sequences [23], [26]. An instrumentation package, mounted on a small airplane, consists of two digital video cameras (telephoto and wide-angle), a Global Positioning System (GPS), an Inertial Navigation System (INS), and a profiling pulse laser [26]. The previous manual approach [23] used by our forestry experts utilized only a fraction of the available data due to the labor involved in manual interpretation of the large amount of video data. For example, recent projects in Bolivia involved more than 20 hours of video over 600 sites, and in Brazil over 120 hours (10 terabytes), which is prohibitive if the video is interpreted manually. A more compact representation and more flexible interactive 3D visualization interface are clearly necessary in such aerial video applications; in fact, for many applications dealing with large-scale natural or urban scenes, extending the field of view (FOV) of a 2D image, and then introducing the third dimension of depth, would be of great utility. Video surveillance [8], environmental monitoring [26], [29], image-based rendering [13], [20], compact video representation [4], [5], and robot navigation [25], [27] are just a few examples of the applications that would benefit from an extended and 3D-enhanced image-based representation.

In this paper, we will address the problem of creating seamless and geometrically registered 3D mosaics from a moving camera, undergoing a rather general motion and allowing viewpoints to change over a large distance. There are three significant contributions in this paper. First, a precise mathematical model of generalized parallel-perspective stereo is proposed, which not only supports seamless mosaicing under quite general motion, but also captures inherent 3D information of the scene in a pair of stereo mosaics. Second, we propose a novel technique called PRISM (parallel ray interpolation for stereo mosaicing) to efficiently convert the sequence of perspective images with large motion parallax into the parallel-perspective stereo mosaics. We note that the PRISM approach can be generalized to mosaics with other types of projections (such as circular projection and full parallel projection). Third, we further examine 1) whether the PRISM process (of image rectification followed by ray interpolation) introduces additional errors in the succeeding steps (e.g., depth recovery) and 2) whether the final "disparity equation" of the stereo mosaics, which exhibits a linear relation between depths and stereo mosaic displacements, really means that the recovered depth accuracy is independent of absolute depth. Results for mosaic construction from aerial video data of real scenes are shown and 3D reconstructions from these mosaics are given.

This paper is organized as follows: Section 2 gives the representation of generalized parallel-perspective stereo and its epipolar curve geometry under 3D translation. In Section 3, we discuss how image sequences with rather arbitrary, but dominant translational motion (i.e., constrained 6 DOF), can be used as input to develop stereo mosaics. In Section 4, we propose the novel ray interpolation approach, PRISM, to generate stereo mosaics from video with obvious motion parallax under translational motion. Section 5 gives a thorough error analysis of ray interpolation in stereo mosaicing. Several important conclusions are made regarding the conditions for generating effective stereo mosaics by the PRISM approach and subsequent 3D reconstruction. Finally, we summarize the main points of this paper and discuss directions of future work.

2 GENERALIZED PARALLEL-PERSPECTIVE STEREO

The basic idea of the parallel-perspective stereo can be explained as follows under 1D translation [26], [1]. Assume the camera motion is an ideal 1D translation, the optical axis is perpendicular to the motion, and the frames are dense enough. Then, we can generate two spatio-temporal mosaic images by extracting two scanlines of pixels (perpendicular to the motion) at the front and rear edges of each frame in motion (Fig. 1a). Each mosaic image thus generated is similar to a parallel-perspective image generated by a linear pushbroom camera [2], which has perspective projection in the direction perpendicular to the motion and parallel projection in the motion direction. In addition, these mosaics are obtained from two different oblique viewing angles (of a single camera's field of view), so that a stereo pair of left and right mosaics captures the inherent 3D information.

To cope with the real motion of an airborne camera, we will generalize (next section) the stereo mosaicing mechanism to deal with constrained 6 DOF motion, a rather general motion with a dominant translation direction. Since rotation effects could be removed by image rectification, here we will first show how to represent stereo mosaics under 3D translation (i.e., without camera rotation). We assume that the 3D curve collecting the moving viewpoints has a dominant translational motion (e.g., the Y direction in Fig. 1b) so that a parallel projection can be generated in that direction. Under 3D translation, parallel stereo mosaics can be generated in the same way as in the case of 1D translation. The main difference is that the viewpoints of the mosaics form a 3D curve instead of a 1D straight line.
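As an illustration (not part of the paper's presentation), the 1D-translation construction can be sketched with synthetic data: stacking the front and rear scanlines of a dense translating sequence yields the left and right mosaics. The frame model below is a deliberately simplified, hypothetical one, with the whole scene on the fixation plane so that a frame reduces to a window crop (zero parallax); all sizes and names are ours.

```python
import numpy as np

# Hypothetical setup: a flat textured scene sampled by a camera translating
# one pixel per frame along y. With the scene on the fixation plane, each
# "frame" is just a crop of the ground texture (no parallax).
H_IMG, W_IMG, DY = 64, 32, 20          # frame height, width, slit separation
scene = np.arange(1000 * W_IMG).reshape(1000, W_IMG)  # synthetic ground texture

def frame_at(ty):
    """Frame captured at translation ty (window crop under the flat-scene assumption)."""
    return scene[ty : ty + H_IMG]

front_row = H_IMG // 2 + DY // 2       # front slit: contributes to the left mosaic
rear_row  = H_IMG // 2 - DY // 2       # rear slit: contributes to the right mosaic

left  = np.stack([frame_at(t)[front_row] for t in range(200)])
right = np.stack([frame_at(t)[rear_row]  for t in range(200)])

# For points on the fixation plane, the two mosaics coincide up to a constant
# shift of dy rows: the constant "disparity" of the parallel-perspective pair.
assert np.array_equal(left[:200 - DY], right[DY:])
```

Off-plane points would break this exact alignment; their residual displacement is precisely the depth cue analyzed in Section 2.2.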

2.1 Mathematical Model

ZHU ET AL.: GENERALIZED PARALLEL-PERSPECTIVE STEREO MOSAICS FROM AIRBORNE VIDEO 227

Without loss of generality, we assume that two horizontal 1D-scanline slit windows (the rear slit and the front slit) have d_y/2 offsets to the left and right of the center of the image, respectively (Fig. 1a). The stereo mosaics are formed by the following two steps: scaling and then translating. We define the scaled vector of a camera position T = (T_x, T_y, T_z)^T (related to a common reference frame, the first frame in Fig. 1) as t = (t_x, t_y, t_z)^T = F T / H in the mosaicing coordinates, where F is the focal length of the camera and H is the height of a fixation plane (e.g., the average height of the terrain) on which one pixel in the y direction of the mosaics corresponds to H/F world distance. Then, from the frame in camera position (t_x, t_y, t_z), the front slit will be translated to (t_x, t_y + d_y/2) in the "left eye" mosaic, while the rear slit will be translated to (t_x, t_y - d_y/2) in the "right eye" mosaic; therefore, both mosaics share the same origin o. The above treatment (namely scaling and translating) has two benefits. By scaling, an appropriate parallel sampling in the y direction is maintained; therefore, a good aspect ratio of the mosaics is kept in the x and the y directions. This is the best choice for parallel sampling under parallel-perspective projection, especially for object points close to the fixation plane. By translating, appearance distortion of the mosaics is minimized (especially in the X direction) since the image slits are placed according to the camera locations of the corresponding image frames.
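A minimal numerical sketch of this scaling-and-translating placement; the values of F, H, and d_y and the helper name are hypothetical, not from the paper:

```python
# Scale a camera position into mosaic coordinates (t = F*T/H), then place the
# front/rear slits at +/- dy/2 from the scaled y position.
F, H, DY = 500.0, 1000.0, 20.0        # focal length, fixation-plane height, slit offset

def mosaic_origin(T):
    """Scaled camera position t = F * T / H: one mosaic pixel in y spans H/F world units."""
    tx, ty, tz = (F * c / H for c in T)
    return tx, ty, tz

tx, ty, _ = mosaic_origin((40.0, 200.0, 3.0))   # made-up camera position
front_slit_pos = (tx, ty + DY / 2)    # destination in the "left eye" mosaic
rear_slit_pos  = (tx, ty - DY / 2)    # destination in the "right eye" mosaic

print(front_slit_pos, rear_slit_pos)  # (20.0, 110.0) (20.0, 90.0)
```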

Suppose the corresponding pair of 2D points (one from each mosaic), (x_l, y_l) and (x_r, y_r), of a 3D point (X, Y, Z), is generated from original frames at the camera positions (T_xl, T_yl, T_zl) and (T_xr, T_yr, T_zr), respectively. The mathematical model of the generalized parallel-perspective stereo mosaics can be represented by the following equations:

x_l = F (X - T_xl)/(Z - T_zl) + F T_xl/H,    y_l = F Y/H - ((Z - T_zl)/H - 1) d_y/2,
x_r = F (X - T_xr)/(Z - T_zr) + F T_xr/H,    y_r = F Y/H + ((Z - T_zr)/H - 1) d_y/2.    (1)

It should be noted that generation of stereo mosaics requires only knowledge of the camera pose information, but not the 3D structure of the scene. Under 3D translation, the image scales (especially in the x direction) of the same scene regions in the left and right mosaics could be different due to the different Z translational components T_zl and T_zr in (1). However, when the translation in the Z direction is very small compared to the height H, as in our aerial video applications, the scale difference of the same regions in the left and the right mosaics of the stereo pair is small, which aids stereo matching (both by computers for 3D reconstruction and by humans during stereo viewing).
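The model in (1) can be exercised numerically, and checked against the disparity relation of Section 2.2; the point and viewpoint coordinates below are made up purely for illustration:

```python
# Forward projection of a 3D point into the left and right mosaics per (1).
F, H, DY = 500.0, 1000.0, 20.0

def project_left(P, T):
    X, Y, Z = P; Txl, _, Tzl = T
    x = F * (X - Txl) / (Z - Tzl) + F * Txl / H
    y = F * Y / H - ((Z - Tzl) / H - 1.0) * DY / 2
    return x, y

def project_right(P, T):
    X, Y, Z = P; Txr, _, Tzr = T
    x = F * (X - Txr) / (Z - Tzr) + F * Txr / H
    y = F * Y / H + ((Z - Tzr) / H - 1.0) * DY / 2
    return x, y

P = (15.0, 120.0, 1080.0)                        # made-up 3D point, 80 units below H
Tl, Tr = (10.0, 50.0, 4.0), (12.0, 90.0, 6.0)    # made-up viewpoints
xl, yl = project_left(P, Tl)
xr, yr = project_right(P, Tr)

# Consistency with the depth equation (4): Z = H*(1 + dy_disp/DY) + Tz_bar.
dy_disp = yr - yl
Tz_bar = (Tl[2] + Tr[2]) / 2
assert abs(H * (1.0 + dy_disp / DY) + Tz_bar - P[2]) < 1e-6
```

The assertion holds for any choice of point and viewpoints, since (4) follows algebraically from the y components of (1).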

2.2 Disparity, Baseline, and Epipolar Geometry

Because of the way the stereo mosaics are generated, the viewpoints of both are on the same smooth 3D motion track. The "scaled" camera position t_r corresponding to column y in the right mosaic is exactly the camera position t_l corresponding to column y + d_y in the left mosaic (e.g., the right-view point y_r H/F and the left-view point (y_r + d_y) H/F on the fixation plane in Fig. 1b are from the same view T_r), i.e.,

t_r(y) = t_l(y + d_y),    (2)

both of which are only functions of the y coordinate. Let us define the mosaic displacement vector between a pair of corresponding points (x_l, y_l) and (x_r, y_r) in the stereo mosaics as

(Δx, Δy) = (x_r - x_l, y_r - y_l).    (3)

In the general case of 3D translation, the depth of the point can be derived from (1) as

Z = H b_y/d_y + T̄_z = H (1 + Δy/d_y) + T̄_z,    (4)

where

b_y = (F/H) B_y = d_y + Δy    (5)

is defined as the scaled baseline in the y direction, which is the scaled version of the baseline (B_y = T_yl - T_yr) between the two viewpoints (T_xl, T_yl, T_zl) and (T_xr, T_yr, T_zr) that generate the point pair, and

T̄_z = (T_zl + T_zr)/2    (6)

is defined as the average camera height deviation of the stereo point pair related to the origin of the reference frame. Equation (4) represents the depth-baseline-disparity relation of the parallel-perspective stereo: The disparity of any corresponding point pair is a constant number d_y, but the mosaic displacement Δy varies with the depth of the 3D point and represents the adaptive baseline b_y for that point. Note that, in the depth equation, we adopt the same notations of disparity and baseline as in the two-view perspective stereo. From (4), we have

Δy = (d_y/H) (ΔZ - T̄_z),    (7)

which means that, under 3D translation, the mosaic displacement (Δy) of a 3D point is proportional to the relative depth deviation of the point, which is the real depth deviation from the fixation plane (ΔZ = Z - H), less the average camera height deviation (T̄_z). When the motion of the camera is constrained to a 2D translation in the XY plane (i.e., T_z = 0, thus T̄_z = 0), the stereo mosaic displacement of a 3D point is exactly proportional to the depth deviation of the point around the fixation plane H. It is also interesting to note that the selection of the two mosaic coordinate systems brings a constant shift d_y to the scaled baseline (b_y) and produces the fixation of the stereo mosaics to a horizontal fixation plane of an average height H. This is highly desirable for both stereo matching and stereoscopic viewing.

Fig. 1. Parallel-perspective stereo geometry. (a) Illustration of the dominant translational motion direction, the two slit windows, and the two parallel-perspective mosaics. (b) The stereo geometry of the generalized parallel-perspective projection under 3D translation (the X axis and the slit windows are perpendicular to the plane of the figure).

For stereo matching between two such mosaics, we need to know the epipolar geometry under general 3D translation. From (1) and (4), the corresponding point in the right mosaic of any point in the left mosaic will be on an epipolar curve determined by the left point and the 3D motion track, i.e.,

Δx = [ b_x Δy + b_z d_y (x_l - (t_xr + t_xl)/2)/F ] / [ Δy + d_y + b_z d_y/(2F) ],    (8)

where b_x ≡ b_x(y_l, Δy) = t_xl(y_l + d_y + Δy) - t_xl(y_l) and b_z ≡ b_z(y_l, Δy) = t_zl(y_l + d_y + Δy) - t_zl(y_l) are the "scaled" baseline functions in the x and z directions of the variables y_l and Δy. Here, we use the same "scaled" notation as for the baseline b_y in (5) and apply the relation in (2). Hence, Δx is a nonlinear function of the position (x_l, y_l) as well as the displacement Δy. This is quite different from the epipolar geometry of two-view perspective stereo because: 1) image columns with different y_l coordinates in parallel-perspective mosaics are projected from different viewpoints, which are reflected in the baseline function b_x(y_l, Δy), and 2) Δx is also a function of x_l due to the nonzero Z translation and therefore the nonzero b_z term in (8). We have the following conclusions for the epipolar geometry of parallel-perspective stereo:

1. In the general case of 3D translation, if we know the range of depth deviation (plus an average camera height deviation) from (7), i.e.,

±ΔZ_m = ±(|ΔZ|_max + |T̄_z|_max),    (9)

the search region for the corresponding point in the right mosaic is

Δy ∈ [ -(d_y/H) ΔZ_m, +(d_y/H) ΔZ_m ],

and lies along an epipolar curve (using (8)), which is different for every point (x_l, y_l) in general.

2. In the case of 2D translation (i.e., b_z = 0), the epipolar curve for a given point (x_l, y_l) in the left mosaic passes through the location (x_l, y_l) in the right mosaic (Fig. 2):

Δx = b_x(y_l, Δy) Δy/(Δy + d_y),    (10)

which implies that the stereo mosaics are aligned for all the points whose depths are H. The same epipolar curve function (of y_l and Δy) applies to all the points in the left mosaic with the same y_l coordinate.

3. In the ideal case where the viewpoints of the stereo mosaics lie on a 1D straight line (i.e., b_x = b_z = 0), the epipolar curves turn out to be horizontal lines (Δx = 0). Therefore, we can apply most of the existing stereo matching algorithms for rectified perspective stereo with little modification.
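The search-bound computation for the 2D-translation case can be sketched as follows; the smooth motion track t_xl(y) below is made up and stands in for the estimated camera path, and all constants are hypothetical:

```python
import math

# Epipolar search bounds: y range from (9), x curve from (10).
F, H, DY = 500.0, 1000.0, 20.0

def tx_left(y):
    """Made-up smooth scaled x motion track (in practice, from estimated poses)."""
    return 2.0 * math.sin(y / 300.0)

def bx(yl, dyd):
    """Scaled baseline function in x: difference of track positions, as in (8)."""
    return tx_left(yl + DY + dyd) - tx_left(yl)

dZ_max, Tz_dev_max = 100.0, 0.0            # depth range; 2D translation => Tz = 0
dZm = dZ_max + Tz_dev_max                  # equation (9)
dy_lo, dy_hi = -DY / H * dZm, DY / H * dZm # search range of y displacements

def dx(yl, dyd):
    """Epipolar curve in the 2D-translation case, equation (10)."""
    return bx(yl, dyd) * dyd / (dyd + DY)

# At dy = 0 the curve passes through (xl, yl): points at depth H stay aligned.
assert dx(123.0, 0.0) == 0.0
print(dy_lo, dy_hi)   # -2.0 2.0
```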

3 MOSAICING UNDER REALISTIC 6 DOF MOTION

This section discusses how to generate stereo mosaics under a more general motion model, with constrained 6 DOF. To generate meaningful and seamless stereo mosaics, we need to impose some constraints on the camera motion (Fig. 3a). First, the motion must have a dominant direction. Second, the angular orientation of the camera is constrained to a range that precludes it turning more than 180 degrees. Third, the rate of change of the angular orientation parameters must be slow enough to allow sufficient overlap of successive images for stereo mosaicing. These constraints are all reasonable and are satisfied by a sensor mounted in a light aircraft with normal turbulence. Within these constraints, the camera can realistically undergo six DOF motion. There are two steps necessary to generate a rectified image sequence that exhibits only (known) 3D translation, and from which we can subsequently generate seamless mosaics:

Step 1: Camera orientation estimation. Using an internally precalibrated camera, the extrinsic camera parameters (camera orientation) can be determined from an aerial instrumentation system (GPS, INS, and a laser profiler) [26] and bundle adjustment techniques [22]. The main point here is that we do not need to carry out a dense match between two successive frames. Instead, only sparse tie points widely distributed in the two images are needed to estimate the camera orientations.

Fig. 2. Epipolar curves in stereo mosaics under 2D translation. Given a left point (x_l, y_l), the baseline function b_x(y_l, Δy) is a shifted version of the motion track t_xr(y_r) by a constant t_xl(y_l), which results in the epipolar curve in the right mosaic using (10).

Fig. 3. Image rectification. (a) Original and (b) rectified image sequence.

Step 2: Image rectification. An image rotation transformation is applied to each frame in order to eliminate the rotational components (Fig. 3b). In fact, we only need to apply this kind of transformation to the two narrow slices in each frame that will contribute incrementally to a pair of mosaics. In our motion model, the 3D rotation is represented by a rotation matrix R, and the 3D translation is denoted by a vector T = (T_x, T_y, T_z)^T. A 3D point X_k = (X_k, Y_k, Z_k)^T with image coordinates u_k = (u_k, v_k, 1)^T at the current frame k can be related to its reference coordinates X = (X, Y, Z)^T by X = R_k X_k + T_k, where R_k and T_k are the rotation matrix and the translation vector of the kth frame related to the reference frame (e.g., the first frame). In the image rectification stage, a projective transformation A_k is applied to the kth frame of the video using the motion parameters obtained from the camera orientation estimation step:

u^p_k ≅ A_k u_k    (11)

with

A_k = F R_k F^{-1},    F = | F 0 0 |
                           | 0 F 0 |
                           | 0 0 1 |,    (12)

where u^p_k is the reprojected image point of the kth frame, and F is the camera's focal length. The resulting video sequence is a rectified image sequence as if it were captured by a "virtual" camera undergoing 3D translation (T_x, T_y, T_z). We assume that the vehicle's motion is primarily along the Y axis after eliminating the rotation, which implies that the mosaic will be produced along the Y direction.
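A sketch of the rectifying warp in (11) and (12); the focal length and the rotation (a small roll about the optical axis) are hypothetical stand-ins for the estimated per-frame orientation:

```python
import numpy as np

# Rectification per (11)-(12): warp each frame by A_k = F R_k F^{-1} to cancel
# a known camera rotation. R_k here is a made-up 2-degree roll.
F = 500.0
Fm = np.diag([F, F, 1.0])                     # calibration-like matrix from (12)
theta = np.deg2rad(2.0)
Rk = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
Ak = Fm @ Rk @ np.linalg.inv(Fm)              # projective transformation A_k

u = np.array([100.0, 50.0, 1.0])              # homogeneous image point (u, v, 1)
up = Ak @ u
up = up / up[2]                               # reprojected point in the rectified frame

# The warp is invertible: applying A_k^{-1} recovers the original pixel.
assert np.allclose(np.linalg.inv(Ak) @ (Ak @ u), u)
```

In practice, the warp would be applied image-wide (e.g., via a homography-based resampling) rather than per point, and only to the two narrow mosaicing slices, as the text notes.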

3.1 A Real Example

While a full calibration of parameters in all camera positions is a very difficult task in a long video sequence and is not the focus of this paper, we point out here that two different practical treatments with near real-time implementations are applied in our experiments [29]: unconstrained image mosaics using a dominant plane fitting technique without camera calibration, and geo-registered mosaics with a practical method for camera orientation estimation using the GPS/INS measurements. An underlying assumption in the practical treatments is that, if the translational component in the Z direction is much smaller than the distance itself, we use a constant scaling factor in the interframe motion estimation and image rectification for each frame to compensate for the Z translation. We show a real example of stereo mosaics from a 165-frame video sequence in Fig. 4, collected as part of a project [23], [26] with The Nature Conservancy (TNC) for determining biomass for preservation of a tropical forest in Bolivia. This example shows the creation of stereo mosaics from video of a real world scene, the epipolar geometry, and the 3D reconstruction and stereo viewing properties. Figs. 4a and 4b show a pair of stereo mosaics generated by using an unconstrained image mosaicing approach (referred to as "free mosaicing" in [29]) with the slit-window distance d_y = 224 (in pixels; the original image was 720×480). It is obvious from the mosaics that, after the compensation of the small rotation angles and the small Z components by image rectification, the rectified image sequence has significant translations in the x direction, as well as in the dominant y direction.


Fig. 4. Stereo mosaics and the epipolar geometry. (a) Left view mosaic. (b) Right view mosaic. (c) The y displacement map: mosaic displacement Δy (proportional to the relative depth ΔZ) is encoded as brightness (brightness runs from 0 when Δy = 18.3 pixels to 255 when Δy = -16.2 pixels), so higher elevation (i.e., closer to the camera) is brighter. (d) The histogram of the y displacements. (e) Illustration of the search ranges of the x displacements given a search range of the y displacements of [-10, +10] in the stereo mosaics, using (10).


There are two benefits of generating a seamless stereo mosaic pair. First, a human can perceive the 3D scene in a stereo mosaic pair (e.g., using a pair of polarized glasses, or even red/blue-green anaglyph glasses) without any explicit 3D recovery [28], [29]. Since stereo mosaics can be generated in real time, this leads to real-time stereoscopic viewing of the 3D scene. Experience of both forestry experts and laymen has shown that the stereoscopically viewed mosaics of trees are both compelling and vivid to the viewers. Results of high resolution stereo mosaics can be found at our Web sites [28]. Second, after stereo mosaicing, matches are only performed on the stereo mosaics for 3D recovery, not on individual pairs of video frames, resulting in a tremendous reduction in storage and computation. Fig. 4c shows the derived y "displacement" map (where displacement Δy is proportional to the relative depth ΔZ) from the pair of parallel-perspective stereo mosaics. This displacement map is obtained using a hierarchical subpixel dense correlation method [18]. Fig. 4d shows the histogram of the y displacements of this pair of stereo mosaics, indicating most of the pixels have displacements within -10.0 pixels to +10.0 pixels. Using (10), we derive the search range of the x displacements, [Δx_min, Δx_max], at each column of the left mosaic, if the search range of the y displacements in the stereo mosaics is [-10, +10] (Fig. 4e). It can be seen that, for the large part of the mosaics, the search ranges in the x direction are within ±3 pixels, except for the tail with large x motion components.

4 A RAY INTERPOLATION APPROACH FOR STEREO MOSAICING

After image rectification, we obtain a translational motion sequence with the rotational effect removed. However, the translational sequence exhibits obvious motion parallax. How can we generate seamless mosaics in a computationally effective way from this sequence? The key to our approach lies in the parallel-perspective representation and our novel PRISM (parallel ray interpolation for stereo mosaicing) approach [30]. For the left (or right) mosaic, we only need to take a front (or rear) slice of a certain width (determined by the interframe motion) from each frame and perform local registration between the overlapping slices of successive frames. We then directly generate parallel-perspective interpolated rays between two known discrete perspective views for the left (or right) mosaic so that the geometry of the mosaic generated exhibits true parallel projection in the direction of the dominant motion and, therefore, the mosaic is without geometric distortions.

4.1 PRISM: Parallel Ray Interpolation for Stereo Mosaicing

Let us examine the PRISM approach more rigorously in the general case of 3D translation (after image rectification). We take the left mosaic as an example and illustrate the geometry in Fig. 5. First, we define the fixed line of the front mosaicing slice in each frame as a scanline that is d_y/2 distance from the center of the frame. We use the term "fixed line" to indicate that pixels on that line can be directly copied to the corresponding location in the left mosaic. The width of the slice used for ray interpolation is determined by the camera locations of both frames and the depths of the points seen by the two frames. An interpretation plane (IP) of the fixed line is a plane passing through the nodal point and the fixed line of the frame. By the definition of parallel-perspective stereo mosaics under pure translation, all the IPs of fixed lines for the left mosaic are parallel to each other. Suppose that (S_x, S_y, S_z) is the translational vector of the camera between the previous (1st) frame of viewpoint (T_x, T_y, T_z) and the current (2nd) frame of viewpoint (T_x + S_x, T_y + S_y, T_z + S_z) (Fig. 5). We need to interpolate parallel-perspective rays between the two fixed lines of the 1st and the 2nd frames.

For each point (x1, y1) (to the right of the first fixed line y0 = dy/2) in the first frame, which will contribute to the left mosaic, we can find a corresponding point (x2, y2) (to the left of the second fixed line) in the second frame. We assume that (x1, y1) and (x2, y2), represented in their own frame coordinate systems, intersect at a 3D point (X, Y, Z). Then, the desired parallel-reprojected viewpoint (Txi, Tyi, Tzi) for the corresponding pair can be computed as

$$
T_{yi} = T_y + \frac{\left(y_1 - \tfrac{d_y}{2}\right)\left(F S_y - y_2 S_z\right)}{\left(y_1 - y_2\right)\left(F S_y - \tfrac{d_y}{2} S_z\right)}\, S_y, \qquad
T_{xi} = T_x + \frac{S_x}{S_y}\left(T_{yi} - T_y\right), \qquad
T_{zi} = T_z + \frac{S_z}{S_y}\left(T_{yi} - T_y\right), \tag{13}
$$

ZHU ET AL.: GENERALIZED PARALLEL-PERSPECTIVE STEREO MOSAICS FROM AIRBORNE VIDEO 231

Fig. 5. Ray interpolation by ray reprojection. For any pair of points between the two fixed lines of two successive frames, a new interpolated ray is back-projected, parallel to the interpretation planes of the fixed lines, from the intersection of the two corresponding rays cast from the viewpoints of the two existing frames.


where Tyi is calculated in a "virtually" interpolated IP that passes through the point (X, Y, Z) and is parallel to the IPs of the fixed lines of the first and second frames, and Txi and Tzi are calculated in such a way that all the viewpoints between (Tx, Ty, Tz) and (Tx+Sx, Ty+Sy, Tz+Sz) lie in a straight line. (Of course, we could fit a better motion curve rather than this linear local fit.) The reprojected "image" (xi, yi) of the point (X, Y, Z) from the interpolated viewpoint (Txi, Tyi, Tzi) is given by

$$
(x_i, y_i) = \left[\, \frac{x_1 - F\tfrac{S_x}{Z_i}}{1 - \tfrac{S_z}{Z_i}},\ \frac{d_y}{2} \,\right], \qquad
Z_i = \frac{F S_y - \tfrac{d_y}{2} S_z}{y_1 - \tfrac{d_y}{2}}, \tag{14}
$$

where Zi is introduced here to simplify the representation (it is loosely related to depth but is not actually the depth measured from the interpolated viewpoint). Note that the calculation of the x coordinate in the above equation indicates perspective projection in the x direction, while the constant y coordinate (= dy/2) indicates that the point is on the fixed line of the virtually interpolated view (and, hence, the interpolated projection ray is parallel to the IPs of the fixed lines).

In our aerial video application, the actual motion of the aircraft was mostly in the xy plane; therefore, the 3D translation is reduced to 2D translation (with Tz = 0 and, hence, Sz = 0), as shown in [30]. In this case, (13) and (14) can be greatly simplified. The terms in the two equations that need to be changed are

$$
T_{yi} = T_y + \frac{y_1 - d_y/2}{y_1 - y_2}\, S_y, \qquad
x_i = x_1 - \frac{S_x}{S_y}\left(y_1 - \frac{d_y}{2}\right). \tag{15}
$$

Knowing the interpolated viewpoint (Txi, Tyi, Tzi) and the point coordinates (xi, yi) in the virtually interpolated view, the left mosaicing coordinates (xl, yl) of the point can be calculated as

$$
(x_l, y_l) = \left( t_{xi} + x_i,\ t_{yi} + \frac{d_y}{2} \right), \tag{16}
$$

where txi = F·Txi/H and tyi = F·Tyi/H are the "scaled" translational components of the interpolated view.
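The 2D-translation case of (15) and (16) is compact enough to write out directly. The sketch below is only illustrative; the function name and sample values are ours, not from the paper, and the camera height H is assumed known (e.g., from the range profiler mentioned in Section 4.2.2).

```python
# Sketch of parallel ray interpolation under 2D translation (Sz = 0),
# following Eqs. (15)-(16). Names and sample values are illustrative.

def interpolate_ray(x1, y1, y2, Tx, Ty, Sx, Sy, dy, F, H):
    """Given a match (x1, y1) <-> (x2, y2) between two frames with
    viewpoints (Tx, Ty) and (Tx+Sx, Ty+Sy), return the interpolated
    viewpoint (Txi, Tyi) and the left-mosaic coordinates (xl, yl).
    Assumes y1 != y2 and Sy != 0 (nonzero interframe motion)."""
    # Eq. (15): viewpoint of the virtually interpolated view
    Tyi = Ty + (y1 - dy / 2) / (y1 - y2) * Sy
    Txi = Tx + (Sx / Sy) * (Tyi - Ty)
    # Eq. (15): reprojected x coordinate; y is fixed at dy/2
    xi = x1 - (Sx / Sy) * (y1 - dy / 2)
    # Eq. (16): scaled translation places the ray in the mosaic
    txi, tyi = F * Txi / H, F * Tyi / H
    return (Txi, Tyi), (txi + xi, tyi + dy / 2)
```

A quick sanity check: for a point exactly on the first fixed line (y1 = dy/2), (15) gives Tyi = Ty and xi = x1, so the pixel is copied directly from the first frame, exactly as the definition of the fixed line requires.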

4.1.1 Generalization and Discussions

The core idea of the PRISM algorithm is ray interpolation, which uses explicit motion parallax information between successive video frames to create better mosaics. However, the idea of ray interpolation is not limited to parallel-perspective projection. It can be generalized to other kinds of projection geometry, such as circular projection [13], [14], multiperspective panoramas [20], linear pushbroom cameras [2], full parallel projection [1], etc.

We also note that view interpolation has been suggested by others for generating seamless 2D mosaics under motion parallax [15]. Our work differs from theirs in three aspects, mainly due to explicitly employing the mosaicing geometry, namely, the parallel-perspective stereo representation, in the PRISM algorithm. First, our approach is direct and much more efficient: we do not need to generate many new images between each pair of original frames; instead, we directly generate interpolated parallel rays for the parallel-perspective mosaics. Second, we propose to stitch two images in the middle of the two fixed lines to minimize the occlusion problem, since the views of the points thus selected in the original images are as close as possible to the rays of the final mosaics. Last but not least, we use an accurate geometric model to maintain precise stereo geometry for 3D reconstruction.

4.2 Implementation and Experimental Analysis

4.2.1 A Fast PRISM Implementation

In principle, we need to match all the points between the two fixed lines of the successive frames to generate a complete parallel-perspective mosaic. In an effort to reduce the computational complexity and to handle textureless regions, a fast PRISM algorithm has been implemented [29], [30]. In summary, the fast PRISM algorithm consists of the following four steps, taking the left mosaic as an example (Fig. 6):

Step 1: Slice determination. Determine the fixed lines in the current frame k and the previous frame k-1 by the left slit window distance dy/2, and the "ideal" straight stitching lines by their 2D scaled translational parameters (t_x^(k), t_y^(k)) and (t_x^(k-1), t_y^(k-1)). The locations of the stitching lines are in the middle of the two fixed lines, i.e., at dy/2 + (t_y^(k) - t_y^(k-1))/2 in the (k-1)th frame and at dy/2 - (t_y^(k) - t_y^(k-1))/2 in the kth

232 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 2, FEBRUARY 2004

Fig. 6. A fast PRISM algorithm: sparse matching, region triangulation, and image warping.

Page 9: C COPYRIGHT NOTICE Izhu/tpami117016.pdf · Generalized Parallel-Perspective Stereo Mosaics from Airborne Video Zhigang Zhu, Member, IEEE, Allen R. Hanson,Member, IEEE, and Edward

frame. Thus, we have two overlapping slices in the kth and (k-1)th frames, each of which starts from the fixed line and ends a small distance past the stitching line (to ensure overlap), in opposite directions.

Step 2: Match and ray interpolation. Match a set of corresponding points as control point pairs in the two successive overlapping slices, {(P1i, P2i), i = 1, 2, ..., N}, in a given small region along the epipolar lines, around the straight stitching line. We use a correlation-based method to find a pair of piecewise linear matching curves passing through the control points in the two frames. The control point pairs are selected by measuring both the gradient magnitudes and the correlation values in matching. Then, a stitching curve is obtained which runs through the destination locations Qi (i = 1, ..., N) of the interpolated rays in the mosaic, computed for each corresponding pair (P1i, P2i) using (16).
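Step 2's correlation-based matching can be illustrated with a minimal 1D normalized cross-correlation search along an epipolar line. This is only a sketch of the general technique; the window size, search range, and helper names are illustrative, not the parameters of the actual system.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length windows."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db) if da > 0 and db > 0 else 0.0

def match_point(row1, row2, x, half=3, search=6):
    """Find the x' in row2 (within +/-search of x) whose window best
    correlates with the window centered at x in row1."""
    tmpl = row1[x - half : x + half + 1]
    best_x, best_c = x, -1.0
    for xp in range(x - search, x + search + 1):
        if xp - half < 0 or xp + half + 1 > len(row2):
            continue  # window would fall off the scanline
        c = ncc(tmpl, row2[xp - half : xp + half + 1])
        if c > best_c:
            best_c, best_x = c, xp
    return best_x, best_c
```

In the actual algorithm, candidates found this way are additionally filtered by gradient magnitude, as described above, so that only well-textured points become control points.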

Step 3: Triangulation. Select two sets of control points {Rmi (m = 1, 2; i = 1, ..., N-1)} on the fixed lines in the two successive frames, whose y coordinates are determined by the fixed lines and whose x coordinates are the averages of the consecutive control points on the matching curves, Pmi and Pm,i+1 (m = 1, 2; i = 1, ..., N-1), for appropriate triangulation. Mapping R1i and R2i into the corresponding mosaic coordinates results in S1i and S2i (i = 1, ..., N), this time by solely using the interframe translations (t_x^(k), t_y^(k)) and (t_x^(k-1), t_y^(k-1)). For the kth frame, we generate two sets of corresponding triangles (Fig. 6): the source triangles from point sets {P2i} and {R2i}, and the destination triangles from point sets {Qi} and {S2i}. Do the same triangulation for the (k-1)th frame.

Step 4: Warping. For each of the two frames, warp each source triangle into the corresponding destination triangle, under the assumption that the region within each triangle is a planar surface, which is reasonable given small interframe displacements. Since the two sets of destination triangles in the mosaic share the same control points on the stitching curve, the two slices are naturally stitched together in the mosaic.
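The per-triangle warp in Step 4 is the unique affine map taking a source triangle to its destination triangle. One standard way to express it (the paper does not prescribe a particular implementation) is through barycentric coordinates, sketched here with made-up coordinates:

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    w1 = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    w2 = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - w1 - w2, w1, w2

def warp_point(p, src, dst):
    """Map p from the source triangle src to the destination triangle dst
    by reusing its barycentric coordinates (an affine warp)."""
    w0, w1, w2 = barycentric(p, *src)
    return (w0 * dst[0][0] + w1 * dst[1][0] + w2 * dst[2][0],
            w0 * dst[0][1] + w1 * dst[1][1] + w2 * dst[2][1])
```

Because the map is affine, it is exact for planar patches, which is precisely the assumption stated in Step 4.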

4.2.2 Motion Parallax and Misalignment Analysis

We present some experimental results of real video mosaicing to show why ray interpolation is important for both stereo viewing and 3D reconstruction. Fig. 7 shows real examples of local match and ray interpolation results for two pairs of successive images of a UMass campus scene. In constructing the stereo mosaics, the distance between the front and the rear slice windows is dy = 192 pixels (original images are 720 x 480), and the range profiler tells us that the average height of the aerial camera above the ground is about H = 300 m. The quantitative analysis of 3D estimation error with such misalignments for both pairs is summarized in Table 1. As an example, we explain the details of the Fine Arts Center case in Fig. 7, Example 1. By manually selecting corresponding points on the ground and on the top of the ridge of the Fine Arts Center building in the stereo mosaics generated by the 3D mosaicing method (the fast PRISM algorithm), we find that the relative y displacement of the building top with respect to the ground is about Δy = -13 pixels. A 1-pixel misalignment (δy in Table 1) in stereo mosaics, when using a 2D mosaicing method without ray interpolation, will introduce a depth (and height) error of δZ = 1.56 m, using (4). While the relative error of the depth estimation of the roof (i.e., δZ/Z) is only about 0.56 percent, the relative error in height estimation (i.e., δZ/ΔZ) is as high as 7.7 percent. It can be seen that, even though the several pixels of interframe motion parallax are not sufficient for 3D estimation using interframe stereo, they are significant in improving the overall depth accuracy of stereo mosaics.
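The numbers quoted above can be checked with a few lines of arithmetic: with dy = 192 pixels and H = 300 m, each pixel of misalignment costs about H/dy meters of depth, consistent with the 1.56 m figure obtained from (4) (which appears earlier in the paper). The variable names below are ours:

```python
# Back-of-the-envelope check of the error figures for the Fine Arts
# Center example; values are reproduced from this section.
H, dy = 300.0, 192.0                  # camera height (m), mosaic disparity (pixels)
dz_per_pixel = H / dy                 # depth error per pixel of misalignment
building_dy = 13                      # measured rooftop displacement (pixels)
height = building_dy * dz_per_pixel   # estimated building height
Z_roof = H - height                   # depth of the roof from the camera

print(round(dz_per_pixel, 2))                 # ~1.56 m per pixel
print(round(dz_per_pixel / Z_roof * 100, 2))  # relative depth error, ~0.56 %
print(round(dz_per_pixel / height * 100, 1))  # relative height error, ~7.7 %
```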

Fig. 7g shows mosaiced results where camera orientation parameters were estimated by registering the planar ground surface of the scene via dominant motion analysis [29]. Readers can also compare the results of 3D mosaicing (with parallel-perspective projection) versus 2D mosaicing (with manifold projection) by examining the building boundaries associated with depth changes in the full 4,160 x 1,536 mosaics at our Web sites [28]. Clearly, geometrically seamless mosaicing by ray interpolation is very important for accurate 3D estimation as well as good visual appearance.

4.3 Discussions: Better Triangulation and Occlusion Handling

The locations of the stitching curves in the fast PRISM algorithm enable us to use the closest existing views to generate parallel-perspective rays. Using sparse control points on stitching curves and image warping, the fast PRISM algorithm only approximates the parallel-perspective geometry in stereo mosaics. However, the proposed PRISM technique can be implemented to use more feature points (thus, smaller triangles) in the overlapping slices, rather than just those around a single stitching curve, so that each triangle really covers a planar patch or a patch that is visually indistinguishable from a planar patch. Therefore, one of the critical issues is to robustly select and match the control points and to perform the better triangulation necessary to generate a geometrically correct and seamless mosaic. Morris and Kanade [10] have discussed the best triangulation given a set of 3D points of an object, based on its consistency with a set of images of the 3D object. The method proposed in their paper could be applied to our PRISM algorithm for better triangulation, an important topic that deserves further study.

Another important issue is occlusion handling in ray interpolation. In a perspective image, scene regions in different image locations have varying degrees of occlusion (Fig. 8a). In contrast, in a parallel-perspective image, the occlusion relations are always the same in the direction of the parallel projection (Fig. 8b). Now, the question is: when we transform a sequence of perspective images to a left-view (or right-view) parallel-perspective mosaic using parallel ray interpolation, how can we deal with these different spatial occlusion relations in the perspective views? While occlusion is known to be a particularly difficult problem in computer vision, our analysis shows that, in our case of stereo mosaics, we only need to deal with occlusion where we detect significant right-side depth boundaries for generating left mosaics, or left-side boundaries for right mosaics. We show the principle with the left-view parallel-perspective mosaic, working with the 1D intersection in the direction of the parallel projection. Let us consider a pair of successive frames (with viewpoints O1 and O2) of an image sequence (Fig. 8c). We define an occlusion viewpoint Ox as a viewpoint from which the left parallel ray intersects an occluding boundary (Bx) of an object (the box). It can be easily verified that the occlusion problem is avoided when both viewpoints O1 and O2 are on one side of the occlusion viewpoint Ox; otherwise, we face the occlusion problem. In such a case, a region on the more



distant surface (the ground) can only be seen from the second view for generating the left mosaic. Part of this region, which should be visible in the generated left mosaic, is bounded by the rays OxBx and O1Bx. If we are not dealing with a very complicated occluding scene, the third view that follows the pair of frames under consideration can usually also see this portion of the region, so matching points in the second and a third view (O3) for this region and back-projecting the rays to the desired left ray direction will solve the problem.
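To make the occlusion condition concrete: in a simplified 1D side view of our own making, every left-mosaic ray has slope dy/(2F) in the (y, z) plane, so the occlusion viewpoint Ox for a boundary point Bx = (Yb, Zb) lies at Ty = Yb - Zb·dy/(2F). The sketch below, with illustrative coordinates, simply tests which side of Ox each viewpoint falls on:

```python
# Illustrative 1D model of the occlusion condition in Fig. 8c; the
# coordinate convention (camera at z = 0, scene at depth Z > 0) is ours.

def occlusion_viewpoint(Yb, Zb, dy, F):
    """y position of the viewpoint whose left parallel ray (slope dy/(2F))
    passes through the occluding boundary point (Yb, Zb)."""
    return Yb - Zb * dy / (2.0 * F)

def pair_is_occlusion_free(Ty1, Ty2, Yb, Zb, dy, F):
    """True when both viewpoints lie on one side of the occlusion
    viewpoint Ox, the condition stated in the text."""
    Tyx = occlusion_viewpoint(Yb, Zb, dy, F)
    return (Ty1 - Tyx) * (Ty2 - Tyx) > 0
```

When the test fails, the pair straddles Ox and the fallback described above (matching against a third view O3) is needed.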

5 DEPTH ERROR CHARACTERIZATION OF STEREO MOSAICS

In theory, the adaptive baseline inherent in the parallel-perspective geometry permits depth accuracy independent of absolute depth, as shown in [26], [1]. However, in practice, an important question needs to be answered: since the motion parallax information between two successive perspective frames is used for making stereo mosaics, will the small baseline between frames introduce large errors in ray interpolation, as it does for direct depth estimation from successive frames?

In order to answer this question, we need to reexamine the ray interpolation process. In making parallel-perspective stereo mosaics, the disparities (dy) of all points are constant, since a fixed angle between the two viewing rays is selected for generating the stereo mosaics. As a consequence, for any point in the right mosaic, searching for the match point in the left mosaic amounts to finding an original frame in which this match pair has a predefined constant disparity (set by the distance between the two slit windows) but with an adaptive baseline depending on the depth of the point. Therefore, we


Fig. 7. Two examples of local match and triangulation for the left mosaic. In both examples (the first and the second columns), enlarged views of small windows of the following are shown: (a), (d) the previous frame; (b), (e) the current frame; and (c), (f) ground truth values of heights along the stitching lines (measured from the ground plane). The large gray crosses in (a), (b), (d), and (e) show the initially selected points (which are evenly distributed along the ideal stitching line) in the previous frame and their initial matches in the current frame obtained using the global transformation. The dark and light small crosses show the correct match pairs by feature selection and correlation (light matches light, dark matches dark). The fixed lines, stitching lines/curves, and the triangulation results are shown as lighter lines. The local match results show that points on the tops of the narrow building (the Fine Arts Center Building) in the first example and the tall building (the Campus Center Building) in the second example have larger motion parallax than ground points. The left-view mosaic of all the frames in the sequence using the fast PRISM algorithm is shown in (g).


formulate the problem as follows under 1D translation (Fig. 9). Let us for the moment assume that we accurately identify a point y3 = -dy/2 right in the center of the rear slit window in a view O3 (of the original video sequence), which will contribute to the right mosaic. Now we hope to find its match point yi = +dy/2, in the center of the front slit window of a certain view in the sequence, which could contribute to the left mosaic with parallel-perspective projection. Usually, we will not have the exact view Oi; instead, the point yi is reprojected (i.e., interpolated) from a virtual interpolated view Oi determined by a pair of corresponding points y1 and y2 in two existing consecutive views O1 and O2 in the original perspective image sequence. The localization error of the point yi depends on the errors in matching and localizing points y1 and y2 (and also in camera pose estimation). After some tedious mathematical derivation [31], we obtain an important conclusion: the depth error of the real stereo mosaics is proportional to the absolute depth:

$$
\Delta Z_{\mathrm{mosaic}} = \frac{Z}{d_y}\, \Delta y, \tag{17}
$$

where Δy represents the spatial localization error of the corresponding point pair in the original perspective images.
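Equation (17) can be read numerically: for the same localization error, halving the depth halves the depth error, while a larger mosaic disparity dy suppresses it. A tiny illustration with made-up numbers:

```python
def mosaic_depth_error(Z, dy, delta_y):
    """Eq. (17): depth error of the stereo mosaics, linear in depth Z."""
    return Z / dy * delta_y

# With dy = 192 pixels and a 1-pixel localization error (values illustrative):
e300 = mosaic_depth_error(300.0, 192, 1.0)   # ~1.56 m at Z = 300 m
e150 = mosaic_depth_error(150.0, 192, 1.0)   # half the depth, half the error
```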

At the beginning of this discussion, we assumed an accurate right (backward-looking) parallel ray, but we can incorporate the localization error of that ray as well. Symmetrically, the localization of the point in the right mosaic has the same amount of error, but the linear depth error characterization remains the same.

How good is this linear error characterization of the stereo mosaics, then? The analysis in [31] also shows that, even though the depth estimate from two successive views O1 and O2 cannot give us good 3D information, as shown by the large diamond-shaped error region in Fig. 9 (denoted by ΔZ_interframe), the localization error of the interpolated point (i.e., the left-viewing ray from viewpoint Oi) is much smaller, leading to significantly smaller depth estimation error (ΔZ_mosaic). The key factor is that the PRISM approach only needs interframe matches (which are much easier to obtain than in the large-baseline case), not the explicit depth information from interframe matches (which is subject to large errors). Quantitatively, it turns out that the depth error of the real stereo mosaics introduced by the ray interpolation step is bounded by the errors of two pairs of stereo views, O1 & O3 and O2 & O3, both with almost the same "optimal" baseline configurations as the real stereo mosaics. Obviously, the


Fig. 8. Illustrations of the differences in occlusions in (a) a perspective image and (b) a parallel image. The shaded regions are occluded. (c) illustrates how to perform ray interpolation with occlusion for a left-view mosaic. A symmetric relation holds for a right-view mosaic.

Fig. 9. Error analysis of ray interpolation. While depth estimation from two consecutive frames is subject to large error (ΔZ_interframe), the localization error of the interpolated ray for stereo mosaics turns out to be very small, and so does the depth error of the real stereo mosaics (ΔZ_mosaic).

TABLE 1
Error Analysis of 2D Mosaics of a Campus Scene (dy = 192 Pixels, H = 300 m)


"real" stereo mosaic approach provides a systematic way to achieve such "optimal" configurations.

From the derivation of the localization error of the interpolated point for ray interpolation, we have found that this error is proportional to Z/F, where F is the focal length, but is independent of the interframe translational magnitude [31]. This implies that stereo mosaics with the same degree of accuracy can be generated from sparse image sequences as well as dense ones, given that the interframe matches are correct and are not subject to the occlusion problems discussed in Section 4.3. An intuitive explanation is that, even though the depth errors via interframe stereo are inversely proportional to the magnitudes of the interframe motion, the projections of those error regions with different motion magnitudes onto the parallel ray direction remain the same.

6 CONCLUDING REMARKS AND FUTURE WORK

We have studied the representation and generation of a parallel-perspective stereoscopic mosaic pair from an image sequence captured by a camera with constrained 3D rotation and 3D translation. The 3D mechanism for the stereo mosaics includes two aspects: 1) the 3D mosaicing process consists of a global image rectification that eliminates rotation effects, followed by a fine local transformation and ray interpolation that accounts for the interframe motion parallax due to the 3D structure of the scene; and 2) the final mosaics are a stereo pair that embodies 3D information of the scene derived from optimal baselines. The core idea of ray interpolation in the PRISM algorithm, which leads to better and more accurate mosaics, can be generalized to mosaics with other types of projections.

Since parallel-perspective stereo mosaics provide adaptive baselines and large constant disparity, better depth resolution is achieved than in perspective stereo and in the recently developed multiperspective stereo with circular projection. We have arrived at several important conclusions. Ray interpolation between two successive views is actually very similar to image rectification; thus, the accuracy of a three-stage matching mechanism (i.e., matching for poses, mosaicing, and correspondences) for 3D recovery from stereo mosaics is comparable to that of perspective stereo with the same adaptive/optimal baseline configurations. The stereo mosaic mechanism thus provides a natural way to achieve such "optimal" configurations. We have proven that the depth error of stereo mosaics from real video is a linear function of the absolute depth, which extends our understanding of parallel-perspective stereo beyond the previous observation of constant depth resolution. We also show that the ray interpolation approach works equally well for both dense and sparse image sequences in terms of accuracy in depth estimation.

Given the nice stereo geometry of the parallel-perspective stereo mosaics, there are several open issues for future research. The first important issue is camera orientation estimation for accurate geo-registered stereo mosaics. Bundle adjustment is an obvious approach, but, in order to apply the technique automatically and efficiently (with little or no human intervention) to very long image sequences (usually with more than a thousand images), the robustness, convergence, and computational efficiency problems need to be studied.

The second important issue is the interframe matching and triangulation for ray interpolation in generating stereo mosaics. In our current implementation, a simple correlation approach may be sufficient for forest scenes with strong textures and quite dense image sequences. But, for a cultural scene with many textureless areas yet obvious depth boundaries, an accurate and robust feature selection and matching method is required to build the correspondences between the two slices in the successive frames for ray interpolation. Since ray interpolation actually deals with 3D information (even though not explicit 3D recovery), physical constraints such as homogeneous texture region constraints and boundary detection could be used to define the matching primitives and to provide better triangulation for ray interpolation.

The third important issue is the stereo correspondence problem between a pair of stereo mosaics. The advantage of stereo mosaics for 3D reconstruction is the strong stereo effect from two widely separated viewing directions, creating large constant disparity and adaptive baselines. However, large and adaptive baselines introduce difficulties in stereo matching. As one possible solution, for example, we can extract multiple (i.e., more than two) pairs of stereo mosaics with small viewing angle differences (i.e., small disparity dy) between each pair of nearby mosaics, thus constructing a "multidisparity" stereo mosaic system [28], analogous to a multibaseline stereo system [11]. Multidisparity stereo mosaics could be a natural solution for the problem of matching across large oblique viewing angles.

ACKNOWLEDGMENTS

This work is partially supported by the US National Science Foundation Challenges in CISE (Grant Number EIA-9726401), CNPq EIA-9970046, SGER EIA-0105272, and the Army Research Office (DURIP) DAAD19-99-1-0016. The authors are grateful to the anonymous reviewers for their insightful comments and suggestions, which have greatly improved the presentation of the paper.

REFERENCES

[1] J. Chai and H.-Y. Shum, "Parallel Projections for Stereo Reconstruction," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '00), pp. 493-500, 2000.

[2] R. Gupta and R. Hartley, "Linear Pushbroom Cameras," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 9, pp. 963-975, 1997.

[3] H.-C. Huang and Y.-P. Hung, "Panoramic Stereo Imaging System with Automatic Disparity Warping and Seaming," Graphical Models and Image Processing, vol. 60, no. 3, pp. 196-208, 1998.

[4] M. Irani, P. Anandan, J. Bergen, R. Kumar, and S. Hsu, "Efficient Representations of Video Sequences and Their Applications," Signal Processing: Image Comm., vol. 8, no. 4, pp. 327-351, May 1996.

[5] M. Irani and P. Anandan, "Video Indexing Based on Mosaic Representations," Proc. IEEE, vol. 86, no. 5, pp. 905-921, 1998.

[6] H. Ishiguro, M. Yamamoto, and S. Tsuji, "Omnidirectional Stereo for Making Global Map," Proc. IEEE Int'l Conf. Computer Vision (ICCV '90), pp. 540-547, 1990.

[7] R. Kumar, P. Anandan, M. Irani, J. Bergen, and K. Hanna, "Representation of Scenes from Collections of Images," Proc. IEEE Workshop Representation of Visual Scenes, pp. 10-17, 1995.

[8] R. Kumar, H. Sawhney, J. Asmuth, J. Pope, and S. Hsu, "Registration of Video to Geo-Registered Imagery," Proc. IAPR Int'l Conf. Pattern Recognition (ICPR '98), vol. 2, pp. 1393-1400, 1998.

[9] D.L. Milgram, "Adaptive Techniques in Photo Mosaicing," IEEE Trans. Computers, vol. 26, pp. 1175-1180, 1977.

[10] D.D. Morris and T. Kanade, "Image-Consistent Surface Triangulation," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '00), pp. 332-338, June 2000.

[11] M. Okutomi and T. Kanade, "A Multiple-Baseline Stereo," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 4, pp. 353-363, 1993.

[12] S. Peleg and J. Herman, "Panoramic Mosaics by Manifold Projection," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '97), pp. 338-343, 1997.

[13] S. Peleg and M. Ben-Ezra, "Stereo Panorama with a Single Camera," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '99), pp. 395-401, 1999.

[14] S. Peleg, M. Ben-Ezra, and Y. Pritch, "OmniStereo: Panoramic Stereo Imaging," IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 279-290, Mar. 2001.

[15] B. Rousso, S. Peleg, I. Finci, and A. Rav-Acha, "Universal Mosaicing Using Pipe Projection," Proc. IEEE Int'l Conf. Computer Vision (ICCV '98), pp. 945-952, 1998.

[16] H.S. Sawhney, "Simplifying Motion and Structure Analysis Using Planar Parallax and Image Warping," Proc. IAPR Int'l Conf. Pattern Recognition (ICPR '94), pp. 403-408, 1994.

[17] H.S. Sawhney, R. Kumar, G. Gendel, J. Bergen, D. Dixon, and V. Paragano, "VideoBrush: Experiences with Consumer Video Mosaicing," Proc. IEEE Workshop Applications of Computer Vision (WACV '98), pp. 56-62, Oct. 1998.

[18] H. Schultz, "Terrain Reconstruction from Widely Separated Images," Proc. SPIE, vol. 2486, pp. 113-123, Apr. 1995.

[19] H.-Y. Shum and L.-W. He, "Rendering with Concentric Mosaics," Proc. SIGGRAPH '99, pp. 299-306, Aug. 1999.

[20] H.-Y. Shum and R. Szeliski, "Stereo Reconstruction from Multiperspective Panoramas," Proc. IEEE Int'l Conf. Computer Vision (ICCV '99), pp. 14-21, 1999.

[21] H.-Y. Shum and R. Szeliski, "Construction of Panoramic Image Mosaics with Global and Local Alignment," Int'l J. Computer Vision, vol. 36, no. 2, pp. 101-130, 2000.

[22] Manual of Photogrammetry, fourth ed., C.C. Slama, ed., Am. Soc. of Photogrammetry, 1980.

[23] D. Slaymaker, H. Schultz, A. Hanson, E. Riseman, C. Holmes, M. Powell, and M. Delaney, "Calculating Forest Biomass with Small Format Aerial Photography, Videography and a Profiling Laser," Proc. 17th Biennial Workshop Color Photography and Videography in Resource Assessment, 1999.

[24] R. Szeliski and S.B. Kang, "Direct Methods for Visual Scene Reconstruction," Proc. IEEE Workshop Representation of Visual Scenes, pp. 26-33, 1995.

[25] J.Y. Zheng and S. Tsuji, "Panoramic Representation for Route Recognition by a Mobile Robot," Int'l J. Computer Vision, vol. 9, no. 1, pp. 55-76, 1992.

[26] Z. Zhu, A.R. Hanson, H. Schultz, F. Stolle, and E.M. Riseman, "Stereo Mosaics from a Moving Video Camera for Environmental Monitoring," Proc. First Int'l Workshop Digital and Computational Video, pp. 45-54, 1999.

[27] Z. Zhu, G. Xu, and X. Lin, "Panoramic EPI Generation and Analysis of Video from a Moving Platform with Vibration," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '99), pp. 531-537, 1999.

[28] Z. Zhu, "PRISM: Parallel Ray Interpolation for Stereo Mosaics," http://www-cs.engr.ccny.cuny.edu/~zhu/StereoMosaic.html or http://www.cs.umass.edu/~zhu/StereoMosaic.html, 2000.

[29] Z. Zhu, E.M. Riseman, and A.R. Hanson, "Theory and Practice in Making Seamless Stereo Mosaics from Airborne Video," Technical Report #01-01, Computer Science Dept., Univ. of Massachusetts-Amherst, Jan. 2001, http://www.cs.umass.edu/zhu/UM-CS-2001-001.pdf.

[30] Z. Zhu, E.M. Riseman, and A.R. Hanson, "Parallel-Perspective Stereo Mosaics," Proc. IEEE Int'l Conf. Computer Vision (ICCV '01), vol. I, pp. 345-352, July 2001.

[31] Z. Zhu, A.R. Hanson, H. Schultz, and E.M. Riseman, "Generation and Error Characteristics of Parallel-Perspective Stereo Mosaics from Real Video," Video Registration, M. Shah and R. Kumar, eds., Video Computing Series, Kluwer Academic, pp. 72-105, May 2003.

[32] A. Zomet, S. Peleg, and C. Arora, "Rectified Mosaicing: Mosaics Without the Curl," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '00), pp. 459-465, June 2000.

Zhigang Zhu received the BE, ME, and PhD degrees, all in computer science, from Tsinghua University, Beijing, China, in 1988, 1991, and 1997, respectively. He is currently an associate professor in the Department of Computer Science, the City College of the City University of New York. Previously, he was an associate professor at Tsinghua University and a senior research fellow at the University of Massachusetts, Amherst. From 1997 to 1999, he was the director of the Information Processing and Application Division in the Computer Science Department at Tsinghua University. His research interests include 3D computer vision, human-computer interaction (HCI), virtual/augmented reality, video representation, and various applications in education, environment, robotics, surveillance, and transportation. He has published more than 80 technical papers in the related fields. Dr. Zhu received the Science and Technology Achievement Award (second-prize winner) from the Ministry of Electronic Industry, China, in 1996 and the C.C. Lin Applied Mathematics Award (first-prize winner) from Tsinghua University in 1997. His PhD thesis, "On Environment Modeling for Visual Navigation," was selected in 1999 for a special award among the top 100 dissertations in China over the preceding three years, and a book based on his PhD thesis was published by China Higher Education Press in December 2001. He is a member of the IEEE and a member of the ACM.

Allen R. Hanson received the BS degree from Clarkson College of Technology in 1964 and the MS and PhD degrees in electrical engineering from Cornell University in 1966 and 1969, respectively. He joined the Computer Science Department as an associate professor in 1981 and has been a full professor since 1989. He has conducted research in computer vision, artificial intelligence, learning, and pattern recognition, and has more than 150 publications. He is the codirector of the Computer Vision Laboratory, with a diverse range of recent research including aerial digital video analysis for environmental science, three-dimensional terrain reconstruction, distributed sensor networks, motion analysis and tracking, mobile robot navigation, under-vehicle inspection for security applications, object recognition, color analysis, and image information retrieval. He has served on the editorial boards of the following journals: Computer Vision, Graphics, and Image Processing (1983-1990); Computer Vision, Graphics, and Image Processing: Image Understanding (1991-1994); and Computer Vision and Image Understanding (1995-present). He is a member of the IEEE.

Edward M. Riseman received the BS degree from Clarkson College of Technology in 1964 and the MS and PhD degrees in electrical engineering from Cornell University in 1966 and 1969, respectively. He joined the Computer Science Department as an assistant professor in 1969, has been a full professor since 1978, and served as chairman of the department from 1981 to 1985. He has conducted research in computer vision, artificial intelligence, learning, and pattern recognition, and has more than 200 publications. He has codirected the Computer Vision Laboratory since its inception in 1975, with a diverse range of recent research including aerial digital video analysis for environmental science, three-dimensional terrain reconstruction, distributed sensor networks, motion analysis and tracking, mobile robot navigation, biomedical image analysis, under-vehicle inspection for security applications, object recognition, color analysis, and image information retrieval. He served on the editorial board of Computer Vision and Image Understanding (CVIU) from 1992 to 1997 and has served on the editorial board of the International Journal of Computer Vision (IJCV) from 1987 to the present. He is a senior member of the IEEE and a fellow of the American Association for Artificial Intelligence (AAAI).

For more information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.

ZHU ET AL.: GENERALIZED PARALLEL-PERSPECTIVE STEREO MOSAICS FROM AIRBORNE VIDEO 237