INSERTION OF THREE-DIMENSIONAL OBJECTS IN ARCHITECTURAL PHOTOS
Beatriz Silva Villa Alvarez, Paulo Cezar Pinto Carvalho, Marcelo Gattass
Instituto de Matemática Pura e Aplicada - IMPA
Estrada Dona Castorina 110
22.460-320 Rio de Janeiro/RJ, Brasil
Departamento de Informática, Pontifícia Universidade Católica do
Rio de Janeiro - PUC-Rio
Marquês de São Vicente 225
22.453-900 Rio de Janeiro/RJ, Brasil
[email protected], [email protected] ,
[email protected]
Abstract
This paper proposes a simple, interactive system for modifying a
photographic picture of a three-dimensional scene involving
architectural elements, so that the user can evaluate the aesthetic
effects of such modifications and the impact they would have on
other people in the real environment. The method relies on the
existence, in architectural pictures, of three main directions of
interest, which are mutually orthogonal. Identifying the vanishing
points of these directions makes it possible to calibrate the camera
used to take the picture and to insert new elements into the scene.
Keywords: three-dimensional editing, camera calibration,
projection, pictures, vanishing points, orthogonal directions.
1 INTRODUCTION
This paper presents a simple and interactive system capable of
modifying a photographic picture of a three-dimensional scene
involving architectural elements, by inserting new objects into
certain planes in the scene. The system is targeted at Architecture,
Photography, Publicity and Visual Arts professionals whose needs
involve modifying real settings. For instance, architects could
insert new three-dimensional objects in building façades to
evaluate the result of refurbishment or restoration before
spending time and money on the actual execution. Publicity
professionals might want an early visualization of their outdoor
campaigns in order to test their visual impact (see Figure 1).
Graphic artists could also improve their wall paintings, nowadays
so common in big cities, by testing the aesthetic result virtually
and studying the color and size of their works before they become
real.
The system also allows extracting textures with adequate
perspective correction, which can be used for rendering synthetic
3D models, and obtaining a new view of the scene in which the
projections of vertical lines become parallel. This last tool is
useful when the camera is too close to the object of interest and
the picture cannot be taken with the image plane parallel to the
vertical direction (hence, vertical lines seem to converge to their
vanishing point).
The techniques proposed here can be easily embedded into existing
2D image editing software (for instance, as plug-ins), in order to
provide the user with new, easy-to-use, geometrically correct tools
for modifying 3D scenes.
In Section 2, we describe the problem of editing three-dimensional
scenes and its possible approaches. Then, in Section 3, we discuss
in detail the implementation of the proposed method, and in
Section 4 we present the resulting application. Finally, in
Section 5, we draw some conclusions and suggest possible extensions
of this work.
Figure 1: (a, b) Original images; (c) outdoor applied to the
house’s façade
2 PROBLEM DESCRIPTION
When editing photographic pictures, one commonly wishes to insert
new images in certain planes of the original picture or to obtain
a new view of the scene using a projection that preserves the
proportions parallel to a given plane. Such tasks are often
executed by feel: the user deforms the image, by trial and error,
until the desired effect is obtained. However, this process is
time-consuming and tedious, and it does not always provide good
results, even for experienced and capable users.
The problem of obtaining three-dimensional information from
pictures has been receiving a good deal of attention in the
literature, but usually with purposes different from the ones
related to the present work. Several works [Chen95, MB95, HAA97]
approach the visualization problem using image-based rendering
techniques whose goal is to obtain new views of a scene from a set
of images. Other works deal with the three-dimensional
reconstruction of a scene using stereo vision [FRL98] or adopting
mathematical models for the objects present in the scene [DTM96].
Unlike such approaches, the method presented here does not seek to
obtain a complete reconstruction of the scene. Moreover, it uses as
input data only one picture, for which no metric information is
assumed to be known, neither about the scene (such as coordinates
of reference points or measurements of elements) nor about the
camera position.
The present work has several points in common with [P3D99], which
provides an environment for extracting three-dimensional
information from a single picture. However, the emphasis of that
work is the partial modeling of three-dimensional objects in the
scene, whereas our goal is to provide tools for editing, in a
geometrically correct manner, planes of the original picture, by
inserting new objects into such planes with the proper
perspective. Our method also allows the user to extract a texture
from the input picture with automatic perspective correction, and
it provides a way to simulate a photographic lens capable of
correcting the angular distortion caused by the perspective
projection. Note that this projection cannot always be obtained
with a real photographic lens.
3 METHOD DESCRIPTION
3.1 Camera Calibration
One of the preconditions for constructing geometric representations
from photographic pictures is to know or identify the parameters of
the camera that captured the picture of interest; among them, the
camera orientation and virtual position in 3D space. With this
information, a transformation is obtained that maps each point in
space to its corresponding point in the image. This problem is
called camera calibration, and a more detailed study can be found
in [Fau93, TGTL91]. We consider a simple pin-hole camera model,
which does not take into account the existence of lenses: the
image of a 3D object is obtained by its perspective projection onto
a planar 2D surface through the camera optical center (the
pin-hole).
Several camera-calibration methods are based on the existence of
reference points in the picture, whose coordinates in the real
world are known [CSG98, Tsai86]. Others [DTM96] employ known
metric information, such as the dimensions of architectural elements
like windows or doors. In the present paper, we assume that no such
information is available. This is important, for instance, in
situations involving old pictures, for which it is impossible to
recover these data.
The calibration method used in this work, which is the same as in
[P3D99], relies exclusively on information about directions - more
precisely, on the vanishing points of three mutually orthogonal
directions. As will be shown next, these vanishing points determine
the position of the camera optical center relative to the image,
and hence the angle of view. However, without complementary metric
information, it is impossible to retrieve the real focal distance.
As a consequence, objects can be inserted in the scene with correct
proportions, but without absolute control of their size. For
example, in Figure 1, the size of the outdoor can be specified only
in relation to elements in the scene, such as the floor-to-ceiling
height.
The need to identify vanishing points of three orthogonal
directions makes this method especially suitable for architectural
pictures, in which there are usually three easily identifiable
directions. We also assume that the elements to be inserted into
the image, or about which one wishes to obtain information from the
image, are aligned with these directions.
3.2 Obtaining the Camera’s Position
In this subsection we show how to obtain the camera position in
relation to the picture from the vanishing points Fx, Fy and Fz of
the three orthogonal directions X, Y and Z. In architectural
pictures, the choice of these directions is natural, as they
correspond to the main height, width and depth directions of
buildings. The vanishing points are the points where the lines
parallel to these three directions passing through the optical
center intersect the projection plane. Therefore, the lines joining
the optical center C to the three vanishing points are mutually
orthogonal. This condition allows retrieving the position of the
optical center C in relation to the image, either by solving a
system of equations expressing the orthogonality of the vectors
CFx, CFy and CFz or by using the fact that C is projected onto the
orthocenter of the triangle with vertices Fx, Fy and Fz, as
illustrated in Figure 2. The position (uc, vc, wc) of C is
expressed in the image coordinate system Si, with origin at the
bottom-left corner, two axes u and v aligned with the borders of
the image, and a third axis w orthogonal to both. The coordinates
in this system are expressed in pixels. As already mentioned, it is
not possible to obtain the real focal distance unless one knows the
dimensions of a pixel. It is also important to note that the
projection of the optical center does not necessarily coincide with
the center of the image (Figure 3), since the image might represent
only a portion of the original picture.
Figure 2: Location of the optical center
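The orthocenter construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of the original system: the function name optical_center and the representation of points as (u, v) tuples are assumptions. It finds (uc, vc) as the orthocenter c of the triangle FxFyFz and then obtains wc from the orthogonality of CFx and CFy, which gives wc^2 = -(Fx - c) . (Fy - c).

```python
import math


def optical_center(fx, fy, fz):
    """Return (uc, vc, wc), in pixels, from three vanishing points (u, v).

    Hypothetical helper: assumes the three vanishing points are finite
    and correspond to mutually orthogonal scene directions.
    """
    # (uc, vc) is the orthocenter c of triangle Fx Fy Fz. Two altitude
    # conditions give a 2x2 linear system in c:
    #   (c - Fx) . (Fy - Fz) = 0   and   (c - Fy) . (Fx - Fz) = 0
    a11, a12 = fy[0] - fz[0], fy[1] - fz[1]
    a21, a22 = fx[0] - fz[0], fx[1] - fz[1]
    b1 = fx[0] * a11 + fx[1] * a12
    b2 = fy[0] * a21 + fy[1] * a22
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("vanishing points are collinear")
    # Cramer's rule for the 2x2 system.
    uc = (b1 * a22 - a12 * b2) / det
    vc = (a11 * b2 - a21 * b1) / det
    # Orthogonality of CFx and CFy: (Fx - c) . (Fy - c) + wc^2 = 0
    wc_squared = -((fx[0] - uc) * (fy[0] - uc) + (fx[1] - vc) * (fy[1] - vc))
    if wc_squared <= 0:
        raise ValueError("points are not vanishing points of orthogonal directions")
    return uc, vc, math.sqrt(wc_squared)
```

For example, the vanishing points (300, 150), (0, -150) and (-300, 450) yield an optical center at (100, 50, 200).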
In practice, the user does not specify the vanishing points Fx, Fy
and Fz directly: for each of the main directions, he/she indicates
two or more straight lines in the image that are projections of
straight lines in the scene having that direction. The
corresponding vanishing point is obtained by intersecting these
straight lines. To allow the user to indicate the straight lines
corresponding to each direction, the system offers an interface
with three pairs of guidelines to be positioned over the straight
lines in the image, as illustrated in Figure 3.
Figure 3: Guidelines for the directions, and position of the
optical center C
3.3 Projection Equations
The camera projection equations associate each point in space with
its perspective projection on the image. The position of a point P
in space is described by its coordinates in the world coordinate
system Sm, with origin at the camera optical center C and axes
aligned with the main directions X, Y and Z. The axes of this
system pass through points Fx, Fy and Fz, respectively. Since the
real focal distance is not known, the coordinates in this system
are also expressed in pixels. To express them in conventional
distance units one would have to know, once again, the dimensions
of a pixel.
The projection equations therefore map a point (X, Y, Z) of the
world to its projection (u, v) on the image. To write these
equations, it is convenient to consider an intermediate coordinate
system: that of the camera, Sc, with origin at the optical center C
and axes U, V and W parallel to the axes of the image coordinate
system Si. Figure 4 illustrates these systems. Given a point
(X, Y, Z) of the world, its coordinates in the camera system are
given by:
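\[
\begin{pmatrix} U \\ V \\ W \end{pmatrix}
=
\begin{pmatrix} X_u & Y_u & Z_u \\ X_v & Y_v & Z_v \\ X_w & Y_w & Z_w \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\tag{1}
\]
where (Xu, Xv, Xw), (Yu, Yv, Yw) and (Zu, Zv, Zw) are unit vectors
corresponding to the vectors CFx, CFy and CFz, respectively.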
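Once these camera coordinates are known, obtaining their projection
(u, v) is immediate. In homogeneous coordinates, it is given by:
\[
\begin{pmatrix} tu \\ tv \\ t \end{pmatrix}
=
\begin{pmatrix} w_c & 0 & u_c \\ 0 & w_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} U \\ V \\ W \end{pmatrix}
\tag{2}
\]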
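Thus, the camera projection equation is expressed by:
\[
\begin{pmatrix} tu \\ tv \\ t \end{pmatrix}
=
\begin{pmatrix} w_c & 0 & u_c \\ 0 & w_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X_u & Y_u & Z_u \\ X_v & Y_v & Z_v \\ X_w & Y_w & Z_w \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\tag{3}
\]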
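The projection equations can be exercised with a short Python sketch. The helper names camera_axes and project are hypothetical. The sketch also makes one assumption that the text does not spell out: to agree with the projection equation u = uc + wc U/W, the camera-frame direction from C to a vanishing point F is taken as (F_u - uc, F_v - vc, wc), so each axis points toward its vanishing point.

```python
import math


def unit(v):
    """Normalize a 3-vector."""
    n = math.sqrt(sum(c * c for c in v))
    if n < 1e-12:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def camera_axes(fx, fy, fz, center):
    """Unit vectors along CFx, CFy, CFz in camera coordinates.

    Assumption: the image plane is placed at W = wc, so that the
    direction toward a vanishing point F is (F_u - uc, F_v - vc, wc).
    """
    uc, vc, wc = center
    return tuple(unit((f[0] - uc, f[1] - vc, wc)) for f in (fx, fy, fz))


def project(point, axes, center):
    """Map a world point (X, Y, Z) to its image projection (u, v)."""
    x, y, z = point
    ex, ey, ez = axes
    uc, vc, wc = center
    # World -> camera: the columns of the matrix are the axis vectors.
    cam = tuple(ex[i] * x + ey[i] * y + ez[i] * z for i in range(3))
    if abs(cam[2]) < 1e-12:
        raise ValueError("point projects to infinity")
    # Camera -> image: perspective division.
    return (uc + wc * cam[0] / cam[2], vc + wc * cam[1] / cam[2])
```

A quick sanity check of the design: any point on the world X axis, such as (7, 0, 0), should project exactly onto the vanishing point Fx, since that axis passes through C and Fx.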
Figure 4: Coordinate systems
3.4 Editing in a Main Plane
Once the camera position and the projection equations have been
obtained, it is possible to edit portions of the image
corresponding to projections of regions contained in planes
parallel to two of the main axes. We will consider planes parallel
to axes X and Y, but the other cases can be handled similarly.
We assume that the portion of the image to be edited is the
projection of a rectangle with sides parallel to X and Y, specified
by selecting two diagonally opposed points p1 and p2 corresponding
to the projections of two vertices P1 and P2 of the rectangle.
There are infinitely many points in space that project onto p1 and
p2, corresponding to the different depths Z at which the rectangle
can be located. Since our purpose is to obtain the ratio between
the rectangle dimensions, and not to retrieve their absolute
values, this depth can be set arbitrarily. Thus, let us set Z = 1.
We can now retrieve the positions of P1 and P2, and consequently
those of the two other vertices Q1 and Q2 of the rectangle and
their projections q1 and q2.
We have now established a correspondence between a rectangle in
space, P1Q1P2Q2, and its projection, the quadrilateral p1q1p2q2, by
means of a two-dimensional projective transformation T′, which is
the perspective projection executed by the camera restricted to
plane Z = 1. This correspondence allows retrieving a non-deformed
image of the quadrilateral p1q1p2q2, obtained by a warping
transformation given by the inverse of T′. Over this non-deformed
image, new elements can be applied, such as signs or pictures.
These elements can be inserted back into the original image by
means of a warping transformation inverse to the one applied in the
previous step. The whole process is illustrated in Figure 5.
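The recovery of the rectangle from its two diagonal corners can be sketched as follows. This is an illustrative Python sketch rather than the original implementation: the function names are hypothetical, the axes argument is assumed to hold the three unit axis vectors of the projection matrix in camera coordinates, and image rays are assumed to be (u - uc, v - vc, wc) in the camera frame.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(point, axes, center):
    """World point (X, Y, Z) -> image point (u, v)."""
    ex, ey, ez = axes
    uc, vc, wc = center
    cam = tuple(ex[i] * point[0] + ey[i] * point[1] + ez[i] * point[2]
                for i in range(3))
    if abs(cam[2]) < 1e-12:
        raise ValueError("point projects to infinity")
    return (uc + wc * cam[0] / cam[2], vc + wc * cam[1] / cam[2])


def backproject(pixel, axes, center, depth=1.0):
    """World point on the plane Z = depth that projects onto `pixel`."""
    uc, vc, wc = center
    ray = (pixel[0] - uc, pixel[1] - vc, wc)   # viewing ray, camera frame
    # Camera -> world is the transpose of the rotation: dot with each axis.
    x, y, z = (dot(axis, ray) for axis in axes)
    if abs(z) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    s = depth / z
    return (s * x, s * y, depth)


def rectangle_from_diagonal(p1, p2, axes, center):
    """Return the projected quadrilateral (p1, q1, p2, q2) and the
    width/height ratio of the rectangle on the plane Z = 1."""
    P1 = backproject(p1, axes, center)
    P2 = backproject(p2, axes, center)
    width, height = abs(P2[0] - P1[0]), abs(P2[1] - P1[1])
    if height < 1e-12:
        raise ValueError("degenerate rectangle")
    # The remaining vertices share one coordinate with each given vertex.
    q1 = project((P2[0], P1[1], 1.0), axes, center)
    q2 = project((P1[0], P2[1], 1.0), axes, center)
    return (p1, q1, p2, q2), width / height
```

Only the ratio width/height is meaningful here, because the depth Z = 1 was chosen arbitrarily.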
Figure 6 shows another example, where an outdoor is applied to a
building façade seen from two different positions.
In many cases, it can be interesting to execute only part of the
process above - for instance, when one wishes to extract
information from the scene instead of inserting new elements. The
non-deformed image correctly displays the proportions among the
elements present on the plane being edited and preserves their
angles, so it can be used to retrieve the relations among the
dimensions of elements (such as doors or windows) or to extract
textures from it, to be used in the construction of virtual models
of the scene. Figure 7 illustrates this process: the texture on the
wall was extracted in a geometrically correct manner.
Figure 5: Stages of the editing process on a main plane
Figure 6: (a) original outdoor image, (b, c) buildings with
inserted outdoors
Figure 7: (a) original picture, (b) extracted texture
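The extraction of a non-deformed image is typically done by inverse mapping: for each pixel of the output, one computes where it comes from in the source picture and samples there. The sketch below shows this idea under simplifying assumptions that are not taken from the original system: the image is a list of rows, image[r][c] is addressed with r along v and c along u, sampling is nearest-neighbor, and the callable to_source (a hypothetical parameter) maps normalized rectangle coordinates (s, t) in [0, 1] to source pixel coordinates (u, v).

```python
import math


def unwarp(image, to_source, width, height):
    """Build a width x height image by sampling `image` at to_source(s, t).

    Pixels whose source position falls outside the picture are set to None.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for j in range(height):
        row = []
        for i in range(width):
            # Normalized coordinates of the center of output pixel (i, j).
            s = (i + 0.5) / width
            t = (j + 0.5) / height
            u, v = to_source(s, t)
            # Nearest-neighbor sampling in the source picture.
            c = int(math.floor(u + 0.5))
            r = int(math.floor(v + 0.5))
            if 0 <= r < rows and 0 <= c < cols:
                row.append(image[r][c])
            else:
                row.append(None)
        out.append(row)
    return out
```

In the setting of this section, to_source would take a point of the rectangle on the plane Z = 1, interpolated between P1 and P2 according to (s, t), and return its camera projection. Inserting a new element back into the picture uses the same loop in the opposite direction, iterating over the pixels of the quadrilateral in the original image.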
3.5 Perspective Correction
Though the images produced by perspective projection are
geometrically correct, they are not always the ideal representation
of the object of interest. In particular, angles are not preserved
unless their sides are parallel to the projection plane. For
instance, when taking the picture of a building whose height is
large with respect to its distance to the camera optical center
(i.e., situations in which the camera cannot be moved far enough
away), the camera has to be tilted in order to include the greatest
possible portion of the object. This causes the vertical direction
to be non-parallel to the plane of the picture, introducing a
vanishing point corresponding to this direction. As a result, the
observer has the impression that the vertical edges of the building
will meet. Even though this is perfectly normal, architects often
prefer images in which the vertical direction is parallel to the
plane of the picture, in order to avoid this effect. A possible
solution consists in using special lenses capable of producing
pictures with the desired characteristics [Nik01].
Our system offers an alternative solution, allowing the user to
eliminate the vanishing point corresponding to the vertical
direction. The original picture is reprojected onto a new plane,
parallel to the vertical direction Y and positioned at the same
distance from the camera as the original projection plane. We
select coordinate axes in this new plane so that Y is projected
along the vertical axis of the new picture. This is equivalent to
introducing a new camera with the same optical center and focal
distance, and new coordinate axes α, β and γ. These new axes are
selected so that β has the same direction as Y and γ has a
direction as close to W as possible. This is done by taking γ as
the projection of W onto the plane orthogonal to Y, as illustrated
in Figure 8. Thus, the vectors α, β and γ corresponding to these
new axes can be obtained from the unit vectors Y and W
(respectively vertical and orthogonal to the original projection
plane) using the equations below:
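\[
\vec{\beta} = \vec{Y}, \qquad
\vec{\gamma} = \vec{W} - (\vec{W} \cdot \vec{Y})\,\vec{Y}, \qquad
\vec{\alpha} = \vec{\beta} \times \vec{\gamma}
\tag{4}
\]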
Finally, to obtain the reprojected picture, one only needs to
regard each point (u, v) of the image as the point (u, v, wc),
transform it to the coordinate system of the new camera and apply
the new perspective projection. Thus, a point (u, v) in the
original picture is transformed into a point whose homogeneous
coordinates (tu′, tv′, t) are given by:
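\[
\begin{pmatrix} tu' \\ tv' \\ t \end{pmatrix}
=
\begin{pmatrix} w_c & 0 & u_c \\ 0 & w_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \alpha_u & \alpha_v & \alpha_w \\ \beta_u & \beta_v & \beta_w \\ \gamma_u & \gamma_v & \gamma_w \end{pmatrix}
\begin{pmatrix} u \\ v \\ w_c \end{pmatrix}
\tag{5}
\]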
where (αu, αv, αw), (βu, βv, βw) and (γu, γv, γw) are the unit
vectors corresponding to the vectors α, β and γ obtained in
Equation (4).
Equation (5) determines a warping transformation that allows
generating the new image.
Figure 9 shows the original images and the reprojected pictures
generated by the system. One can see the desired effect: the lines
corresponding to the vertical direction of buildings have become
parallel and vertical in the reprojected picture.
Figure 8: New camera’s computation process
Figure 9: Original images and reprojected pictures with vertical
inclination
4 APPLICATION
The method described in the previous section was implemented in an
application for Windows 98, NT4 or above, available at [VCG00].
The user starts by loading a picture and performing the camera
calibration procedure, by positioning a pair of guidelines for each
of the three main directions X, Y and Z (Figure 10 and Figure 11).
Calibration results can be saved for use in future sessions.
Figure 10: Guideline initialization.
Figure 11: Guidelines positioned along lines parallel to the main
directions.
After calibration, the user can retrieve, with proper perspective
correction, a portion of the image corresponding to the projection
of a rectangle parallel to two of the main directions. First,
he/she specifies whether the rectangle is parallel to plane XY, XZ
or YZ. Then, two diagonally opposed points of the projected
rectangle must be indicated (Figure 12). The projected rectangle is
drawn over the picture and the corrected (unprojected) rectangle is
shown in a separate window (Figure 13).
The unprojected picture can be saved to a file and used for texture
extraction or proportion measurement. It can also be replaced by a
new image and placed back into the original image, as shown in
Figure 14.
There is also a function for eliminating the vanishing point for
the vertical (Y) direction, in such a way that vertical lines
project vertically in the new image. Figure 15 shows the result
obtained when applying that function.
Figure 12: Opposing corners of projected rectangle.
Figure 13: (a) Projected, (b) unprojected rectangle.
5 CONCLUSION AND FUTURE WORK
We have presented a system for three-dimensional editing of
architectural photographic pictures. The method is simple enough to
be used by the professionals it is aimed at, requiring intuitive
information which can be easily identified in the scenes of
interest. The method does not require previous data about the
scenes, so it can be used with any picture (even historical ones)
for extracting information or inserting new virtual elements.
A limitation of this method is that it assumes the edited region
to lie on a single plane, parallel to two main axes. When this
hypothesis is not satisfied, the extracted image will contain
elements with proportions not corresponding to the real ones.
Figure 16 illustrates this. When extracting the marked region from
the picture, the elements in the balcony are also extracted and are
displayed with deformations (actually, the balcony is treated as if
it were drawn on the wall, instead of being a three-dimensional
element). To address this problem, it would be necessary to extend
the editing surface, considering surfaces formed by several
rectangles parallel to the main axes.
Figure 14: (a) Unprojected outdoor, (b) outdoor inserted into the
original image.
Figure 15: Image with vertically projected vertical lines.
Another natural extension would be to consider planes parallel to
only one of the axes. This would be the case of ceilings, ramps or
walls not aligned with the main directions, for instance.
Figure 16: (a) Original image; (b) transformed image, considering
all elements on the same plane.
The introduction of automatic or semi-automatic methods for
extracting guidelines, through the recognition of straight lines in
the image [SCG00], should also be considered. For better results,
these lines should be extracted at sub-pixel level.
References
[HAA97] Anjyo, K.; Horry, Y. and Arai, K. Tour into the picture:
Using a spidery mesh interface to make animation from a single
image. In Conference Proceedings SIGGRAPH ’97, 1997, pp 225–232.
[FRL98] Laveau, S.; Csurka, G.; Zeller, C.; Gauclin, C.;
Faugeras, O.; Robert, L. and Zoghlami, I. 3D reconstruction of
urban scenes from image sequences. Computer Vision and Image
Understanding, Vol. 69 No. 3 (Article No. IV970665): 292–309,
March 1998.
[Nik01] Lens Nikon 28mm f/3.5 PC-nikkor perspective control
technology. http://www.nikon.com, 2001.
[MB95] McMillan, L. and Bishop, G. Plenoptic modeling: An
image-based rendering system. In Robert Cook, editor, Conference
Proceedings (SIGGRAPH ’95), ACM SIGGRAPH, Addison Wesley, Annual
Conference Series, Los Angeles, California, 06-11 August 1995, pp
39–46.
[Fau93] Olivier Faugeras. Three-Dimensional Computer Vision: A
Geometric ViewPoint. MIT Press, Cambridge, Massachusetts, 1993.
[P3D99] Photo 3D. http://www.photo3D.com, 1999.
[Tsai86] Roger Y. Tsai. An efficient and accurate camera
calibration technique for 3d machine vision. IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, Miami
Beach, FL, 22-26 June 1986.
[Chen95] Shenchang Eric Chen. QuickTime VR - an image-based
approach to virtual environment navigation. In Robert Cook,
editor, Conference Proceedings (SIGGRAPH ’95), ACM SIGGRAPH,
Addison Wesley, Annual Conference Series, Los Angeles, California,
06-11 August 1995, pp 29–38.
[SCG00] Szenberg, F.; Carvalho, P.C. and Gattass, M. Automatic
Camera Calibration for Image Sequences of a Football Match.
Interactive Conference on Pattern Recognition. Rio de Janeiro, RJ,
Brasil, March 2000.
[CSG98] Szenberg, F.; Carvalho, P.C. and Gattass, M. Image-based
modeling using a two-step camera calibration method. In
Proceedings of International Symposium on Computer Graphics, Image
Processing and Vision, 1998, pp 388–395.
[DTM96] Taylor, C.J.; Debevec, P. and Malik, J. Modeling and
rendering architecture from photographs: A hybrid geometry- and
image-based approach. In Holly Rushmeier, editor, Conference
Proceedings (SIGGRAPH ’96), ACM SIGGRAPH, Addison Wesley, Annual
Conference Series, New Orleans, Louisiana, 04-09 August 1996, pp
11–20.
[TGTL91] Tommaselli, A. and Tozzi, C. Tutorial: calibração de
câmeras usando feições geométricas. In IV Simpósio Brasileiro de
Computação Gráfica e Processamento de Imagens, SIBGRAPI’91, Centro
de Convenções Rebouças, São Paulo, 14-17 July 1991.
[VCG00] Villa Alvarez, B.; Carvalho, P.C. and Gattass, M.
http://www.tecgraf.puc-rio.br/~bia/paper/3dedition.html