INSERTION OF THREE-DIMENSIONAL OBJECTS IN ...wscg.zcu.cz/wscg2002/Papers_2002/B11.pdfthe optical center C, they determine three mutually or-thogonal directions. This condition allows

INSERTION OF THREE-DIMENSIONAL OBJECTS IN ARCHITECTURAL PHOTOS

Beatriz Silva Villa Alvarez

Paulo Cezar Pinto Carvalho

Marcelo Gattass

Instituto de Matemática Pura e Aplicada - IMPA

Estrada Dona Castorina 110

22.460-320 Rio de Janeiro/RJ

Brasil

Departamento de Informática

Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio

Marquês de São Vicente 225

22.453-900 Rio de Janeiro/RJ

Brasil


Abstract

This paper proposes a simple and interactive system for modifying a photographic picture of a three-dimensional scene involving architectural elements, so that the user can evaluate the aesthetic effects of such modifications and the impact they would have on other people if carried out in the real environment. The method is based on the existence, in architectural pictures, of three main directions of interest, which are mutually orthogonal. Identifying the vanishing points of these directions makes it possible to calibrate the camera used to take the picture and to insert new elements into the scene.

Keywords: three-dimensional editing, camera calibration, projection, pictures, vanishing points, orthogonal directions.

1 INTRODUCTION

This paper presents a simple and interactive system capable of modifying a photographic picture of a three-dimensional scene involving architectural elements by inserting new objects into certain planes of the scene. The system is aimed at professionals in Architecture, Photography, Advertising and the Visual Arts whose work involves modifying real settings. For instance, architects could insert new three-dimensional objects into building façades to evaluate the result of a refurbishment or restoration before spending time and money on the actual work. Advertising professionals might want an early visualization of their billboard campaigns to test their visual impact (see Figure 1). Graphic artists could also plan the wall paintings that are now so common in big cities, testing the aesthetic result virtually and studying the color and size of their works before they become real.

The system also allows extracting textures with adequate perspective correction, which could be used for rendering synthetic 3D models, and obtaining a new view of the scene in which the projections of vertical lines become parallel. This last tool is useful when the camera is too close to the object of interest and the picture cannot be taken with the image plane parallel to the vertical direction (hence, vertical lines appear to converge to their vanishing point).

The techniques proposed here can easily be embedded into existing 2D image editing software (for instance, as plug-ins), providing the user with new, easy-to-use, geometrically correct tools for modifying 3D scenes.

In Section 2, we describe the problem of editing three-dimensional scenes and its possible approaches. Section 3 discusses the proposed method in detail, and Section 4 describes the application that implements it. Finally, in Section 5, we draw some conclusions and suggest possible extensions of this work.

Figure 1: (a, b) Original images; (c) billboard applied to the house's façade

2 PROBLEM DESCRIPTION

When editing photographic pictures, one commonly wishes to insert new images into certain planes of the original picture, or to obtain a new view of the scene using a projection that preserves proportions parallel to a given plane. Such tasks are often executed by feel: the user deforms the image by trial and error until the desired effect is obtained. However, this process is time-consuming and tedious, and it does not always provide good results, even for experienced and capable users.

The problem of obtaining three-dimensional information from pictures has received a good deal of attention in the literature, but usually with purposes different from those of the present work. Several works [Chen95, MB95, HAA97] approach the visualization problem using image-based rendering techniques, whose goal is to obtain new views of a scene from a set of images. Other works deal with the three-dimensional reconstruction of a scene using stereo vision [FRL98] or by adopting mathematical models for the objects present in the scene [DTM96].

Unlike these approaches, the method presented here does not seek a complete reconstruction of the scene. Moreover, it uses as input a single picture, for which no metric information is assumed to be known, either about the scene (such as coordinates of reference points or measurements of elements) or about the camera position.

The present work has several points in common with [P3D99], which provides an environment for extracting three-dimensional information from a single picture. However, the emphasis of that work is the partial modeling of three-dimensional objects in the scene, whereas our goal is to provide tools for editing planes of the original picture in a geometrically correct manner, by inserting new objects into those planes with the proper perspective. The method also allows the user to extract a texture from the input picture with automatic perspective correction. Finally, it provides a way to simulate a photographic lens capable of correcting the angular distortion caused by perspective projection. We note that this projection cannot always be obtained with a real photographic lens.

3 METHOD DESCRIPTION

3.1 Camera Calibration

One of the preconditions for constructing geometric representations from photographic pictures is to know or identify the parameters of the camera that captured the picture, among them the camera orientation and its position in 3D space. With this information, one obtains a transformation that maps each point in space to its corresponding point in the image. This problem is called camera calibration; a more detailed study can be found in [Fau93, TGTL91]. We consider a simple pin-hole camera model, which does not take lenses into account: the image of a 3D object is obtained by its perspective projection onto a planar 2D surface through the camera optical center (the pin-hole).

Several camera-calibration methods rely on reference points in the picture whose real-world coordinates are known [CSG98, Tsai86]. Others [DTM96] employ known metric information, such as the dimensions of architectural elements like windows or doors. In the present paper, we assume that no such information is available. This is important, for instance, for old pictures, for which these data can no longer be recovered.

The calibration method used in this work, which is the same as in [P3D99], relies exclusively on information about directions; more precisely, on the vanishing points of three mutually orthogonal directions. As shown next, these vanishing points determine the position of the camera optical center relative to the image and the angle of view. However, without complementary metric information, the focal distance cannot be retrieved in real units (only in pixels). As a consequence, objects can be inserted into the scene with correct proportions, but without absolute control of their size. For example, in Figure 1, the size of the billboard can be specified only relative to elements of the scene, such as the floor-to-ceiling height.

The need to identify vanishing points of three orthogonal directions makes this method especially suitable for architectural pictures, which usually contain three easily identifiable directions. We also assume that the elements to be inserted into the image, or from which one wishes to obtain information, are aligned with these directions.

3.2 Obtaining the Camera’s Position

In this subsection we show how to obtain the camera position relative to the picture from the vanishing points $F_x$, $F_y$ and $F_z$ of the three orthogonal directions $X$, $Y$ and $Z$. In architectural pictures, the choice of these directions is natural, as they correspond to the main height, width and depth directions of buildings. The vanishing points are the points where the lines through the optical center parallel to these three directions intersect the projection plane. Therefore, together with the optical center $C$, they determine three mutually orthogonal directions. This condition allows retrieving the position of the optical center $C$ relative to the image, either by solving a system of equations expressing the orthogonality of $\vec{CF_x}$, $\vec{CF_y}$ and $\vec{CF_z}$, or by using the fact that the orthogonal projection of $C$ onto the image plane is the orthocenter of the triangle with vertices $F_x$, $F_y$ and $F_z$, as illustrated in Figure 2. The position $(u_c, v_c, w_c)$ of $C$ is expressed in the image coordinate system $S_i$, whose origin is at the bottom-left corner of the image, with two axes $u$ and $v$ aligned with the image borders and a third axis $w$ orthogonal to them. Coordinates in this system are expressed in pixels. As already mentioned, the real focal distance cannot be obtained unless the dimensions of a pixel are known. It is also important to note that the projection of the optical center does not necessarily coincide with the center of the image (Figure 3), since the image might represent only a portion of the original picture.
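The orthocenter construction can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the vanishing points are assumed finite and given in pixel coordinates of $S_i$, and the name `optical_center` is hypothetical. Writing $c = (u_c, v_c)$ for the orthocenter, $w_c$ follows from $\vec{CF_x} \cdot \vec{CF_y} = 0$ with $F_x$, $F_y$ on the plane $w = 0$, which gives $(F_x - c)\cdot(F_y - c) + w_c^2 = 0$.

```python
import math

Point = tuple[float, float]


def dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1]


def sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1])


def optical_center(fx: Point, fy: Point, fz: Point) -> tuple[float, float, float]:
    """Return (uc, vc, wc) of the optical center C in image coordinates S_i."""
    # The orthocenter c = (uc, vc) satisfies
    #   (c - Fx) . (Fy - Fz) = 0   and   (c - Fy) . (Fx - Fz) = 0,
    # a 2x2 linear system solved here with Cramer's rule.
    a1, b1 = sub(fy, fz)
    a2, b2 = sub(fx, fz)
    c1 = dot(fx, (a1, b1))
    c2 = dot(fy, (a2, b2))
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("vanishing points are collinear")
    uc = (c1 * b2 - c2 * b1) / det
    vc = (a1 * c2 - a2 * c1) / det
    # Orthogonality of C->Fx and C->Fy gives (Fx - c) . (Fy - c) + wc^2 = 0.
    # wc^2 is positive only if the triangle Fx Fy Fz is acute.
    wc2 = -dot(sub(fx, (uc, vc)), sub(fy, (uc, vc)))
    if wc2 <= 0:
        raise ValueError("vanishing points do not come from orthogonal directions")
    return uc, vc, math.sqrt(wc2)


if __name__ == "__main__":
    # Vanishing points of a synthetic camera with C = (320, 240, 800).
    print(optical_center((1120, -160), (-80, 1040), (-1280, -1360)))
    # prints (320.0, 240.0, 800.0)
```

Because only the orthocenter and one dot product are needed, the third vanishing point enters only through the triangle's altitudes; in practice, noise in the guidelines makes $w_c$ the most sensitive of the three values.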

Figure 2: Location of the optical center

In practice, the user does not specify the vanishing points $F_x$, $F_y$ and $F_z$ directly: for each main direction, he or she indicates two or more straight lines in the image that are projections of straight lines in the scene having that direction. The corresponding vanishing point is obtained by intersecting these lines. To let the user indicate the lines for each direction, the system offers an interface with three pairs of guidelines to be positioned over straight lines in the image, as illustrated in Figure 3.
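Intersecting the guidelines can be sketched as follows. With exactly two lines this is their ordinary intersection; with more than two, clicking errors mean the lines rarely meet at a single point, so this sketch takes the point that minimizes the sum of squared distances to all lines. That least-squares choice is our assumption, not necessarily what the system does, and the function names are hypothetical.

```python
import math

Point = tuple[float, float]


def line_through(p: Point, q: Point) -> tuple[float, float, float]:
    """Line a*u + b*v + c = 0 through p and q, scaled so that a^2 + b^2 = 1."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = p[0] * q[1] - q[0] * p[1]
    n = math.hypot(a, b)
    if n == 0:
        raise ValueError("a guideline needs two distinct points")
    return a / n, b / n, c / n


def vanishing_point(segments: list[tuple[Point, Point]]) -> Point:
    """Least-squares intersection of two or more image lines."""
    if len(segments) < 2:
        raise ValueError("at least two lines are required")
    # Normal equations of: minimize sum (a*u + b*v + c)^2 over (u, v).
    saa = sab = sbb = sac = sbc = 0.0
    for p, q in segments:
        a, b, c = line_through(p, q)
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        # All lines parallel in the image: the vanishing point is at infinity.
        raise ValueError("lines are parallel; vanishing point at infinity")
    u = (-sac * sbb + sbc * sab) / det
    v = (-sbc * saa + sac * sab) / det
    return u, v
```

Normalizing each line makes $a u + b v + c$ the signed distance to that line, so every guideline carries the same weight regardless of how far apart the user placed its two endpoints.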

3.3 Projection Equations

The camera projection equations associate each point in space with its perspective projection on the image. The position of a point $P$ in space is described by its coordinates in the world coordinate system $S_m$, with origin at the camera optical center $C$ and axes aligned with the main directions $X$, $Y$ and $Z$. The axes of this system pass through the points $F_x$, $F_y$ and $F_z$, respectively. Since the real focal distance is not known, coordinates in this system are also expressed in pixels; to express them in conventional units of length one would, once again, need to know the dimensions of a pixel.

Figure 3: Guidelines for the directions, and position of the optical center $C$

Therefore, the projection equations map a point $(X, Y, Z)$ of the world to its projection $(u, v)$ on the image. To write these equations, it is convenient to consider an intermediate coordinate system, that of the camera, $S_c$, with origin at the optical center $C$ and axes $U$, $V$ and $W$ parallel to the axes of the image coordinate system $S_i$. Figure 4 illustrates these systems. Given a point $(X, Y, Z)$ of the world, its coordinates in the camera system are given by:

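$$
\begin{pmatrix} U \\ V \\ W \end{pmatrix} =
\begin{pmatrix} X_u & Y_u & Z_u \\ X_v & Y_v & Z_v \\ X_w & Y_w & Z_w \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\qquad (1)
$$

where $(X_u, X_v, X_w)$, $(Y_u, Y_v, Y_w)$ and $(Z_u, Z_v, Z_w)$ are the unit vectors corresponding to $\vec{CF_x}$, $\vec{CF_y}$ and $\vec{CF_z}$, respectively.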

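Once the camera coordinates are obtained, the projection $(u, v)$ follows immediately. In homogeneous coordinates, it is given by:

$$
\begin{pmatrix} tu \\ tv \\ t \end{pmatrix} =
\begin{pmatrix} w_c & 0 & u_c \\ 0 & w_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} U \\ V \\ W \end{pmatrix}
\qquad (2)
$$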

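Thus, the camera projection equation is:

$$
\begin{pmatrix} tu \\ tv \\ t \end{pmatrix} =
\begin{pmatrix} w_c & 0 & u_c \\ 0 & w_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X_u & Y_u & Z_u \\ X_v & Y_v & Z_v \\ X_w & Y_w & Z_w \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\qquad (3)
$$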
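Equation (3) translates directly into code. In this sketch the function names are hypothetical, and the vanishing points and $(u_c, v_c, w_c)$ are assumed to come from the calibration step. The columns of the rotation matrix are the unit vectors along $\vec{CF_x}$, $\vec{CF_y}$ and $\vec{CF_z}$; we take the third component of each of these vectors as $+w_c$, i.e. depth measured from $C$ toward the picture plane. That sign choice is our assumption, made so that Equation (2) maps each axis direction onto its own vanishing point.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


def unit(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)


def rotation_from_vanishing_points(fx, fy, fz, center) -> Mat3:
    """3x3 matrix of Equation (1): column j is the unit vector along C->F_j."""
    uc, vc, wc = center
    # Assumed sign convention: the third component is +wc (depth from C
    # toward the picture plane), so that Equation (2) maps axis j onto F_j.
    cols = [unit((f[0] - uc, f[1] - vc, wc)) for f in (fx, fy, fz)]
    return [[cols[j][i] for j in range(3)] for i in range(3)]


def project(point: Vec3, rot: Mat3, center) -> tuple[float, float]:
    """Equation (3): world point (X, Y, Z) -> image point (u, v)."""
    uc, vc, wc = center
    # Equation (1): camera coordinates (U, V, W).
    U, V, W = (sum(rot[i][j] * point[j] for j in range(3)) for i in range(3))
    if W <= 0:
        raise ValueError("point is not in front of the camera")
    # Equation (2): (t*u, t*v, t) = (wc*U + uc*W, wc*V + vc*W, W).
    return (wc * U + uc * W) / W, (wc * V + vc * W) / W
```

Because the world origin is $C$, `project((1.0, 0.0, 0.0), rot, center)` lands (up to rounding) on the vanishing point $F_x$ itself, which is a convenient sanity check of a calibration.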

Figure 4: Coordinate systems

3.4 Editing in a Main Plane

Once the camera position and the projection equations have been obtained, it is possible to edit portions of the image that are projections of regions contained in planes parallel to two of the main axes. We consider planes parallel to axes $X$ and $Y$; the other cases are handled similarly.

We assume that the portion of the image to be edited is the projection of a rectangle with sides parallel to $X$ and $Y$, specified by selecting two diagonally opposite points $p_1$ and $p_2$, which are the projections of two vertices $P_1$ and $P_2$ of the rectangle. Infinitely many points in space project onto $p_1$ and $p_2$, corresponding to the possible depths $Z$ at which the rectangle can be located. Since changing this depth only scales the rectangle uniformly about $C$, and our purpose is to obtain the ratio between the rectangle's dimensions rather than their absolute values, the depth can be set arbitrarily. Thus, let us set $Z = 1$. We can now retrieve the positions of $P_1$ and $P_2$, and consequently those of the two other vertices $Q_1$ and $Q_2$ of the rectangle and their projections $q_1$ and $q_2$.
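A minimal sketch of this step, with hypothetical function names. It assumes the rotation matrix of Equation (1) and $(u_c, v_c, w_c)$ are already known, and that the camera-coordinate direction of an image point $(u, v)$ is $(u - u_c, v - v_c, w_c)$, which inverts Equation (2) up to scale. Since the matrix is orthonormal, its transpose converts that direction to world coordinates, and scaling it so that $Z = 1$ gives the vertex.

```python
Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


def matvec(m: Mat3, x: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * x[j] for j in range(3)) for i in range(3))


def transpose(m: Mat3) -> Mat3:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def project(point: Vec3, rot: Mat3, center) -> tuple[float, float]:
    """Equation (3) in inhomogeneous form."""
    uc, vc, wc = center
    U, V, W = matvec(rot, point)
    return uc + wc * U / W, vc + wc * V / W


def back_project_to_z1(p, rot: Mat3, center) -> Vec3:
    """World point on the plane Z = 1 whose image is p = (u, v)."""
    uc, vc, wc = center
    d_cam = (p[0] - uc, p[1] - vc, wc)       # ray direction, camera coordinates
    d_world = matvec(transpose(rot), d_cam)  # rot is orthonormal
    if abs(d_world[2]) < 1e-12:
        raise ValueError("ray is parallel to the plane Z = 1")
    s = 1.0 / d_world[2]                     # scale the ray so that Z = 1
    return (d_world[0] * s, d_world[1] * s, 1.0)


def rectangle_from_diagonal(p1, p2, rot: Mat3, center):
    """Vertices P1, Q1, P2, Q2, projections q1, q2, and width/height ratio."""
    P1 = back_project_to_z1(p1, rot, center)
    P2 = back_project_to_z1(p2, rot, center)
    Q1 = (P2[0], P1[1], 1.0)  # sides parallel to X and Y
    Q2 = (P1[0], P2[1], 1.0)
    height = abs(P2[1] - P1[1])
    if height < 1e-12:
        raise ValueError("p1 and p2 do not span a rectangle")
    ratio = abs(P2[0] - P1[0]) / height  # independent of the chosen depth
    q1, q2 = project(Q1, rot, center), project(Q2, rot, center)
    return (P1, Q1, P2, Q2), (q1, q2), ratio
```

The returned ratio is the quantity the text says can be recovered; the absolute coordinates of the vertices depend on the arbitrary choice $Z = 1$.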

We have now established a correspondence between a rectangle in space, $P_1Q_1P_2Q_2$, and its projection, the quadrilateral $p_1q_1p_2q_2$, by means of a two-dimensional projective transformation $T'$, which is the perspective projection executed by the camera restricted to the plane $Z = 1$. This correspondence allows retrieving a non-deformed image of the quadrilateral $p_1q_1p_2q_2$ through a warping transformation given by the inverse of $T'$. New elements, such as signs or pictures, can be applied over this non-deformed image. These elements can then be inserted back into the original image by means of a warping transformation inverse to the one applied in the previous step, that is, by $T'$ itself. The whole process is illustrated in Figure 5.
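A sketch of both warping directions, under several assumptions of ours: images are plain nested lists indexed as `img[v][u]` with $v$ growing upward as in $S_i$, sampling is nearest-neighbour, and all function names are hypothetical. On the plane $Z = 1$ the homogeneous vector $(X, Y, 1)$ is the world point itself, so $T'$ can be taken as the product of the two matrices of Equation (3). Both directions loop over the pixels of the destination image and map them back through the appropriate transformation, which avoids holes.

```python
import math

Mat3 = list[list[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse3(m: Mat3) -> Mat3:
    """Inverse of a 3x3 matrix through its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def apply_h(hm: Mat3, x: float, y: float) -> tuple[float, float]:
    """Apply a 2D projective transformation to (x, y)."""
    tx, ty, t = (hm[r][0] * x + hm[r][1] * y + hm[r][2] for r in range(3))
    return tx / t, ty / t


def plane_homography(rot: Mat3, center) -> Mat3:
    """T' for the plane Z = 1: intrinsic matrix of Eq. (2) times rotation of Eq. (1)."""
    uc, vc, wc = center
    k = [[wc, 0.0, uc], [0.0, wc, vc], [0.0, 0.0, 1.0]]
    return matmul(k, rot)


def rectify(img, hm: Mat3, x_range, y_range, out_w: int, out_h: int):
    """Non-deformed image of the rectangle: sample img at T'(X, Y)."""
    (x1, x2), (y1, y2) = x_range, y_range
    out = []
    for j in range(out_h):
        y = y1 + (j + 0.5) * (y2 - y1) / out_h
        row = []
        for i in range(out_w):
            x = x1 + (i + 0.5) * (x2 - x1) / out_w
            u, v = apply_h(hm, x, y)
            ui, vi = round(u), round(v)  # nearest neighbour
            inside = 0 <= vi < len(img) and 0 <= ui < len(img[0])
            row.append(img[vi][ui] if inside else None)
        out.append(row)
    return out


def paste_back(img, new, hm: Mat3, x_range, y_range):
    """Insert `new` (same layout as rectify's output) into img via T'^-1."""
    (x1, x2), (y1, y2) = x_range, y_range
    h_inv = inverse3(hm)
    for v in range(len(img)):
        for u in range(len(img[0])):
            x, y = apply_h(h_inv, u, v)
            i = math.floor((x - x1) / (x2 - x1) * len(new[0]))
            j = math.floor((y - y1) / (y2 - y1) * len(new))
            if 0 <= i < len(new[0]) and 0 <= j < len(new):
                img[v][u] = new[j][i]
    return img
```

`rectify` returns `None` for samples falling outside the original picture. A production tool would use bilinear or better interpolation instead of rounding.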

Figure 6 shows another example, in which a billboard is applied to a building façade seen from two different positions.

Figure 5: Stages of the editing process on a main plane

Figure 6: (a) original billboard image; (b), (c) buildings with inserted billboards

In many cases, it can be useful to execute only part of the process above; for instance, when one wishes to extract information from the scene instead of inserting new elements. The non-deformed image correctly displays the proportions among the elements on the plane being edited and preserves their angles, so it can be used to retrieve the relations among element dimensions (such as those of doors or windows) or to extract textures to be used in building virtual models of the scene. Figure 7 illustrates this process: the texture on the wall was extracted in a geometrically correct manner.

Figure 7: (a) original picture, (b) extracted texture

3.5 Perspective Correction

Though images produced by perspective projection are geometrically correct, they are not always the ideal representation of the object of interest. In particular, angles are not preserved unless their sides are parallel to the projection plane. For instance, when photographing a building whose height is large compared with its distance from the camera optical center, the camera has to be tilted in order to include as much of the object as possible. This makes the vertical direction non-parallel to the picture plane, introducing a vanishing point for that direction, so the observer has the impression that the vertical edges of the building converge. Even though this is perfectly normal, architects often prefer images in which the vertical direction is parallel to the picture plane, in order to avoid this effect. One possible solution is to use special lenses capable of producing pictures with the desired characteristics [Nik01].

Our system offers an alternative solution, allowing the user to eliminate the vanishing point of the vertical direction. The original picture is reprojected onto a new plane, parallel to the vertical direction $Y$ and positioned at the same distance from the camera as the original projection plane. Coordinate axes in this new plane are chosen so that $Y$ projects onto the vertical axis of the new picture. This is equivalent to introducing a new camera with the same optical center and focal distance, and new coordinate axes $\alpha$, $\beta$ and $\gamma$. These new axes are chosen so that $\beta$ has the same direction as $Y$ and $\gamma$ has a direction as close to $W$ as possible; this is done by taking $\gamma$ as the projection of $W$ onto the plane orthogonal to $Y$, as illustrated in Figure 8. Thus, the vectors $\vec\alpha$, $\vec\beta$ and $\vec\gamma$ corresponding to these new axes can be obtained from the unit vectors $\vec Y$ and $\vec W$ (respectively vertical and orthogonal to the original projection plane) using the equations below:

$$
\vec\beta = \vec Y, \qquad
\vec\gamma = \vec W - (\vec W \cdot \vec Y)\,\vec Y, \qquad
\vec\alpha = \vec\beta \times \vec\gamma
\qquad (4)
$$
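Equation (4) can be sketched as below. The vectors are expressed in the original camera coordinates $(U, V, W)$, so $\vec W = (0, 0, 1)$ and $\vec Y = (Y_u, Y_v, Y_w)$ is the unit vector of Equation (1). As written, $\vec\gamma$ in Equation (4) is not unit length; the sketch normalizes it so that the three axes are orthonormal. The function names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    n = math.sqrt(dot(a, a))
    if n < 1e-12:
        raise ValueError("zero-length vector")
    return (a[0] / n, a[1] / n, a[2] / n)


def new_camera_axes(y: Vec3, w: Vec3 = (0.0, 0.0, 1.0)):
    """Equation (4): alpha, beta, gamma in the original camera coordinates."""
    beta = normalize(y)
    k = dot(w, beta)
    # gamma: component of W orthogonal to Y, normalized to unit length.
    gamma = normalize((w[0] - k * beta[0], w[1] - k * beta[1], w[2] - k * beta[2]))
    alpha = cross(beta, gamma)  # already unit: beta and gamma are orthonormal
    return alpha, beta, gamma
```

If the picture was taken with the image plane already vertical ($\vec Y = (0, 1, 0)$), the function returns the identity axes, so the reprojection leaves the picture unchanged.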

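Finally, to obtain the reprojected picture, one expresses each point $(u, v)$ of the image in camera coordinates as $(u - u_c, v - v_c, w_c)$, transforms it to the coordinate system of the new camera, and applies the new perspective projection. Thus, a point $(u, v)$ of the original picture is transformed into a point whose homogeneous coordinates $(tu', tv', t)$ are given by:

$$
\begin{pmatrix} tu' \\ tv' \\ t \end{pmatrix} =
\begin{pmatrix} w_c & 0 & u_c \\ 0 & w_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \alpha_u & \alpha_v & \alpha_w \\ \beta_u & \beta_v & \beta_w \\ \gamma_u & \gamma_v & \gamma_w \end{pmatrix}
\begin{pmatrix} u - u_c \\ v - v_c \\ w_c \end{pmatrix}
\qquad (5)
$$

where $(\alpha_u, \alpha_v, \alpha_w)$, $(\beta_u, \beta_v, \beta_w)$ and $(\gamma_u, \gamma_v, \gamma_w)$ are the unit vectors corresponding to the vectors $\vec\alpha$, $\vec\beta$ and $\vec\gamma$ obtained in Equation (4).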

Equation (5) defines a warping transformation that allows the new image to be generated.
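A sketch of the reprojection and its inverse, assuming the new camera keeps the same $(u_c, v_c, w_c)$ and that $\vec\alpha$, $\vec\beta$, $\vec\gamma$ are unit vectors in the original camera coordinates. Each image point is written in camera coordinates as $(u - u_c, v - v_c, w_c)$, which is consistent with Equation (2). The function names are hypothetical.

```python
Vec3 = tuple[float, float, float]


def _dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def reproject_point(u, v, center, alpha: Vec3, beta: Vec3, gamma: Vec3):
    """Equation (5): point (u, v) of the original picture -> (u', v')."""
    uc, vc, wc = center
    d = (u - uc, v - vc, wc)  # camera coordinates of (u, v)
    a, b, g = _dot(alpha, d), _dot(beta, d), _dot(gamma, d)
    if abs(g) < 1e-12:
        raise ValueError("point maps to infinity in the new picture")
    # (t u', t v', t) = K (a, b, g), so t = g.
    return uc + wc * a / g, vc + wc * b / g


def source_point(u2, v2, center, alpha: Vec3, beta: Vec3, gamma: Vec3):
    """Inverse of Equation (5): pixel (u', v') of the new picture -> (u, v)."""
    uc, vc, wc = center
    d2 = (u2 - uc, v2 - vc, wc)
    # alpha, beta, gamma are orthonormal rows, so the inverse is the transpose.
    d = tuple(alpha[i] * d2[0] + beta[i] * d2[1] + gamma[i] * d2[2]
              for i in range(3))
    if abs(d[2]) < 1e-12:
        raise ValueError("pixel has no preimage in the original picture")
    return uc + wc * d[0] / d[2], vc + wc * d[1] / d[2]
```

To generate the new image one would loop over the pixels $(u', v')$ of the new picture, call `source_point`, and sample the original there; mapping forward with `reproject_point` alone would leave holes. Because $\vec\alpha \perp \vec Y$ and $\vec\gamma \perp \vec Y$, all points of a vertical 3D line share the same $u'$, which is exactly the effect described next.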

Figure 9 shows original images and the reprojected pictures generated by the system. The desired effect can be seen: the lines corresponding to the vertical direction of the buildings have become parallel and vertical in the reprojected pictures.

Figure 8: New camera’s computation process

Figure 9: Original images and reprojected pictures with vertical inclination

4 APPLICATION

The method described in the previous section was implemented in an application for Windows 98, NT4 or above, available at [VCG00]. The user starts by loading a picture and performing the camera calibration procedure, positioning a pair of guidelines for each of the three main directions $X$, $Y$ and $Z$ (Figure 10 and Figure 11). Calibration results can be saved for use in future sessions.

Figure 10: Guideline initialization.

Figure 11: Guidelines positioned along lines parallel to the main directions.

After calibration, the user can retrieve, with proper perspective correction, a portion of the image corresponding to the projection of a rectangle parallel to two of the main directions. First, he or she specifies whether the rectangle is parallel to plane $XY$, $XZ$ or $YZ$. Then, two diagonally opposite points of the projected rectangle must be indicated (Figure 12). The projected rectangle is drawn over the picture, and the corrected (unprojected) rectangle is shown in a separate window (Figure 13).

The unprojected picture can be saved to a file and used for texture extraction or proportion measurement. It can also be replaced by a new image, which is then placed back into the original image, as shown in Figure 14.

There is also a function for eliminating the vanishing point of the vertical ($Y$) direction, so that vertical lines project vertically in the new image. Figure 15 shows the result of applying this function.

Figure 12: Opposite corners of the projected rectangle.

Figure 13: (a) Projected, (b) unprojected rectangle.

5 CONCLUSION AND FUTURE WORK

We have presented a system for three-dimensional editing of architectural photographic pictures. The method is simple enough to be used by the professionals it is aimed at, requiring only intuitive information that can be easily identified in the scenes of interest. Since it does not require any previous data about the scene, it can be used with any picture (even historical ones) for extracting information or inserting new virtual elements.

Figure 14: (a) Unprojected billboard, (b) billboard inserted into the original image.

Figure 15: Image with vertically projected vertical lines.

A limitation of this method is that it assumes the edited region to lie on a single plane parallel to two main axes. When this hypothesis is not satisfied, the extracted image will contain elements whose proportions do not correspond to the real ones. Figure 16 illustrates this: when the marked region is extracted from the picture, the elements of the balcony are also extracted and displayed with deformations (in effect, the balcony is treated as if it were drawn on the wall instead of being a three-dimensional element). To address this problem, the editing surface would have to be extended to surfaces formed by rectangles parallel to the main axes.

Another natural extension would be to consider planes parallel to only one of the axes, as is the case, for instance, of ceilings, ramps, or walls not aligned with the main directions.

Figure 16: (a) Original image; (b) transformed image, considering all elements on the same plane.

Automatic or semi-automatic methods for extracting the guidelines, based on the recognition of straight lines in the image [SCG00], should also be considered. For better results, these lines should be extracted with sub-pixel accuracy.

References

[HAA97] Anjyo, K.; Horry, Y. and Arai, K. Tour into the picture: Using a spidery mesh interface to make animation from a single image. In Conference Proceedings SIGGRAPH '97, 1997, pp. 225–232.

[FRL98] Laveau, S.; Csurka, G.; Zeller, C.; Gauclin, C.; Faugeras, O.; Robert, L. and Zoghlami, I. 3D reconstruction of urban scenes from image sequences. Computer Vision and Image Understanding, Vol. 69, No. 3 (Article No. IV970665): 292–309, March 1998.

[Nik01] Nikon 28mm f/3.5 PC-Nikkor lens: perspective control technology. http://www.nikon.com, 2001.

[MB95] McMillan, L. and Bishop, G. Plenoptic modeling: An image-based rendering system. In Robert Cook, editor, Conference Proceedings (SIGGRAPH '95), ACM SIGGRAPH, Addison Wesley, Annual Conference Series, Los Angeles, California, 06-11 August 1995, pp. 39–46.

[Fau93] Faugeras, O. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge, Massachusetts, 1993.

[P3D99] Photo 3D. http://www.photo3D.com, 1999.

[Tsai86] Tsai, R. Y. An efficient and accurate camera calibration technique for 3D machine vision. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, 22-26 June 1986.

[Chen95] Chen, S. E. QuickTime VR: An image-based approach to virtual environment navigation. In Robert Cook, editor, Conference Proceedings (SIGGRAPH '95), ACM SIGGRAPH, Addison Wesley, Annual Conference Series, Los Angeles, California, 06-11 August 1995, pp. 29–38.

[SCG00] Szenberg, F.; Carvalho, P. C. and Gattass, M. Automatic Camera Calibration for Image Sequences of a Football Match. Interactive Conference on Pattern Recognition, Rio de Janeiro, RJ, Brasil, March 2000.

[CSG98] Szenberg, F.; Carvalho, P. C. and Gattass, M. Image-based modeling using a two-step camera calibration method. In Proceedings of International Symposium on Computer Graphics, Image Processing and Vision, 1998, pp. 388–395.

[DTM96] Taylor, C. J.; Debevec, P. and Malik, J. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Holly Rushmeier, editor, Conference Proceedings (SIGGRAPH '96), ACM SIGGRAPH, Addison Wesley, Annual Conference Series, New Orleans, Louisiana, 04-09 August 1996, pp. 11–20.

[TGTL91] Tommaselli, A. and Tozzi, C. Tutorial: calibração de câmeras usando feições geométricas [Tutorial: camera calibration using geometric features]. In IV Simpósio Brasileiro de Computação Gráfica e Processamento de Imagens, SIBGRAPI'91, Centro de Convenções Rebouças, São Paulo, 14-17 July 1991.

[VCG00] Villa Alvarez, B.; Carvalho, P. C. and Gattass, M. http://www.tecgraf.puc-rio.br/~bia/paper/3dedition.html
