Fiducial Planning for Error-Bounded Pose Estimation of a Panoramic Camera in Large Environments
Daniel G. Aliaga Ingrid Carlbom {aliaga|carlbom}@bell-labs.com Lucent Technologies Bell Labs
1. INTRODUCTION
Panoramic image sensors are becoming increasingly popular because they capture large portions of the visual field
in a single image. These cameras are particularly effective for capturing and navigating through large, complex 3D
environments. Existing vision-based camera pose algorithms are derived for standard field-of-view cameras, but
few algorithms have been proposed to take advantage of the larger field of view of panoramic cameras.
Furthermore, while existing camera pose estimation algorithms work well in small spaces, they do not scale well to
large, complex 3D environments consisting of a number of interconnected spaces.
Accurate and robust estimation of the position and orientation of image sensors has been a recurring problem in
computer vision, computer graphics, and robot navigation. Stereo reconstruction methods use camera pose for
extracting depth information to reconstruct a 3D environment [12, 16]. Image-based rendering techniques [1, 3, 15,
19, 20, 28] require camera position and orientation to recreate novel views of an environment from a large number
of images. Augmented reality systems [5] use camera pose information to align virtual objects with real objects, and
robot navigation and localization methods [7, 8, 10, 30] must be able to obtain the robot’s current location in order
to maneuver through a (captured) space.
We can divide existing vision-based camera pose approaches into passive methods and active methods. Passive
methods derive camera pose without altering the environment but depend on its geometry for accurate results. For
example, techniques may rely upon matching environment features (e.g., edges) to an existing geometric model or
visual map [11, 29, 34]. To obtain robust and accurate pose estimates, the model or map must contain sufficient
detail to ensure correspondences at all times. Another class of passive methods, self-tracking methods, uses optical
flow to calculate changes in position and orientation [17]. However, self-tracking approaches are prone to
cumulative errors, making them particularly unsuited for large environments.
Active methods utilize fiducials, or landmarks, to reduce the dependency on the environment geometry. Although
fiducial methods are potentially more robust, the number and locations of the fiducials can significantly affect
accuracy. Existing techniques often focus on deriving pose estimates from a relatively small number of (noisy)
measurements [4, 6, 9, 18, 21, 27]. For large, arbitrarily shaped environments, such as the ones presented in this
article, no existing method determines the optimal number of fiducials, or their optimal placement, needed to
achieve a desired pose accuracy.
In this article, we present a robust camera pose algorithm and a working system to compute bounded-error
estimates of the position and orientation of panoramic images captured within large, arbitrarily complex
environments while moving the camera within a plane. We use a planning algorithm to place fiducials in an
environment so as to satisfy a set of fiducial constraints, including the number of visible fiducials, the distance from
the viewpoint to the fiducials, and the angle subtended by pairs of fiducials. Combined with an analytic error model,
we can either provide fiducial placements to achieve a desired pose estimation accuracy, or bound the pose
estimation error for a given fiducial placement (Figure 1).
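To make these fiducial constraints concrete, the following minimal Python sketch checks a single candidate viewpoint against the three constraints. The wall-segment visibility test, the threshold values, and the rule that at least one visible pair must subtend the minimum angle are illustrative assumptions, not the paper's actual parameters.

import itertools
import math

def segments_intersect(p, q, a, b):
    # True if the open sight line p-q properly crosses the wall segment a-b.
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2 = cross(a, b, p), cross(a, b, q)
    d3, d4 = cross(p, q, a), cross(p, q, b)
    return d1 * d2 < 0 and d3 * d4 < 0

def visible(viewpoint, fiducial, walls):
    # A fiducial is visible when its sight line crosses no wall segment.
    return not any(segments_intersect(viewpoint, fiducial, a, b) for a, b in walls)

def satisfies_constraints(viewpoint, fiducials, walls,
                          min_visible=3, max_dist=700.0, min_angle_deg=30.0):
    # Distances in cm, angles in degrees (threshold values are assumptions).
    vis = [f for f in fiducials
           if math.dist(viewpoint, f) <= max_dist and visible(viewpoint, f, walls)]
    if len(vis) < min_visible:
        return False
    # Require at least one visible pair to subtend the minimum angle.
    for f1, f2 in itertools.combinations(vis, 2):
        a1 = math.atan2(f1[1] - viewpoint[1], f1[0] - viewpoint[0])
        a2 = math.atan2(f2[1] - viewpoint[1], f2[0] - viewpoint[0])
        diff = abs(a1 - a2) % (2.0 * math.pi)
        diff = min(diff, 2.0 * math.pi - diff)
        if math.degrees(diff) >= min_angle_deg:
            return True
    return False

A planner can then sample candidate viewpoints across the floor plan and add or reposition fiducials until every sample passes this check; the paper's actual planning heuristic is not reproduced here.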
Our algorithm inserts small portable fiducials (e.g., light boxes) into an environment and triangulates camera pose
from the projections of the fiducials onto the panoramic images. We use a coarse 2D floor plan and a heuristic
solution to a variation of the classical art-gallery problem to suggest fiducial locations that satisfy the fiducial
constraints for viewpoints within the environment. Exact fiducial locations are not necessary and will be obtained
later via an optimization method. At the expense of more fiducials, enforcing stricter constraints increases pose
estimation accuracy. Our system requires little setup time and does not significantly alter the environment. We have
used our method in several environments, covering 500 to 1000 square feet, with an average pose accuracy of up to
0.66 cm. Our approach includes the following contributions:

Figure 1. Example setup. We show a floor plan and fiducial locations that, from all camera viewpoints within the
environment, satisfy a set of visibility, distance, and angle constraints (in the floor plan, the fiducials are labeled f1
through f10, d denotes the distance from the camera to a fiducial, and α denotes the angle subtended by a pair of
fiducials). To the left, we show a picture of one of our small portable fiducials. To the right, we show our
remote-controlled capture system, including a computer, panoramic camera, battery, and motorized cart.
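The triangulation of camera pose from fiducial projections can be illustrated with a minimal 2D sketch. The paper's exact formulation (and its error model) is not reproduced in this excerpt; the version below assumes the camera heading is known, so that each fiducial projection in the panorama yields an absolute bearing, and solves for position by linear least squares.

import numpy as np

def triangulate_position(fiducials, bearings):
    # fiducials: (N, 2) known fiducial positions; bearings: (N,) world-frame
    # angles (radians) from the camera to each fiducial, obtained from the
    # panoramic image after compensating for the camera heading.
    # Each bearing t constrains the camera (cx, cy) to a line through the fiducial:
    #   sin(t)*cx - cos(t)*cy = sin(t)*fx - cos(t)*fy
    f = np.asarray(fiducials, dtype=float)
    t = np.asarray(bearings, dtype=float)
    A = np.column_stack([np.sin(t), -np.cos(t)])
    b = np.sin(t) * f[:, 0] - np.cos(t) * f[:, 1]
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos  # least-squares (cx, cy)

# Synthetic check: bearings computed from a known camera position.
true_cam = np.array([120.0, 80.0])
fids = np.array([[0.0, 0.0], [400.0, 50.0], [200.0, 300.0]])
angles = np.arctan2(fids[:, 1] - true_cam[1], fids[:, 0] - true_cam[0])
print(triangulate_position(fids, angles))  # approximately [120. 80.]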
Table 1. Pose error bounds vs. maximum distance to fiducials. Rows correspond to fiducial sets computed for
fiducial distances of 100 to 700 cm in the test environment. Each column contains the mean and standard deviation
of the pose error bound for fiducial positioning errors of 64, 32, 16, 8, 4, 2, and 1 cm. The last column pair
("actual") refers to the fiducial positioning errors actually obtained via the global optimization.

Figure 14. Pose computations in a small office environment. (a) We show the floor plan of a small office
environment (i.e., a simple box, 2.5 m by 3 m), the location of all the available fiducials (colored boxes), and the
computed camera trajectory (using all the fiducials). (b) We show the pose error bounds for the image sequence
using subsets of the fiducials, plotted against the minimum number of visible fiducials for fiducial positioning
errors of 64, 32, 16, 8, 4, 2, and 1 cm and for the actual positioning errors.
Table 2. Pose error bounds vs. minimum number of visible fiducials. Rows correspond to fiducial sets computed for 2 to 16 visible fiducials in the test environment. Each column contains the mean and standard deviation of the pose error bound for fiducial positioning errors of 64, 32, 16, 8, 4, 2, and 1 cm. The last column pair (“actual”) refers to the fiducial positioning errors actually obtained via the global optimization.
Table 3. Pose error bounds vs. minimum subtended angle. Rows correspond to fiducial sets computed for angles of 0 to 140 degrees in the test environment. Each column contains the mean and standard deviation of the pose error bound for fiducial positioning errors of 64, 32, 16, 8, 4, 2, and 1 cm. The last column pair ("actual") refers to the fiducial positioning errors actually obtained via the global optimization.

Acknowledgments
We are grateful to Sid Ahuja, Multimedia Communications Research VP at Bell Labs, for supporting this research.
In addition, we thank Bob Holt for his mathematical help.
References
[1] D. Aliaga, I. Carlbom, "Plenoptic Stitching: A Scalable Method for Reconstructing Interactive Walkthroughs", Proceedings of ACM
SIGGRAPH, pp. 443-450, 2001.
[2] D. Aliaga, “Accurate Catadioptric Calibration for Real-time Pose Estimation in Room-size Environments", IEEE International
Conference on Computer Vision (ICCV), pp. 127-134, 2001.
[3] D. Aliaga, T. Funkhouser, D. Yanovsky, I. Carlbom, “Sea of Images”, IEEE Visualization, 2002.
[4] D. Avis, H. Imai, “Locating a Robot with Angle Measurements”, Journal of Symbolic Computation, No. 10, pp. 311-326, 1990.
[5] R. Azuma, “A Survey of Augmented Reality”, Presence: Teleoperators and Virtual Environments, 6(4), pp. 355-385, 1997.
[6] M. Betke, L. Gurvits, “Mobile Robot Localization Using Landmarks”, IEEE Transactions on Robotics and Automation, Vol. 13, No.
2, pp. 251-263, 1997.
[7] J. Borenstein, B. Everett, L. Feng, Navigating Mobile Robots: Systems and Techniques. A.K. Peters, Ltd., Wellesley, MA, 1996.
[8] T. Boult, "Remote Reality Demonstration", Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), pp. 966-967,
1998.
[9] R. Chatila, J.P. Laumond, “Position Referencing and Consistent World Modeling for Mobile Robots”, Proceedings of IEEE Intl.
Conference on Robotics and Automation, pp. 138-145, 1985.
[10] I. Cox, “Blanche: Position Estimation for an Autonomous Robot Vehicle”, Proceedings of IEEE Int. Workshop on Intelligent Robots
and Systems, pp. 432-439, 1989.
[11] F. Dellaert, W. Burgard, D. Fox, S. Thrun, "Using the Condensation Algorithm for Robust, Vision-based Mobile Robot
Localization”, Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), pp. 588-594, 1999.
[12] U. Dhond, J. Aggarwal, “Structure from Stereo – a Review”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No.
16, 1989.
[13] C. Geyer and K. Daniilidis, "Catadioptric Camera Calibration", Proceedings Int. Conf. on Computer Vision (ICCV), pp. 398-404,
1999.
[14] C. Geyer and K. Daniilidis, “Structure and Motion from Uncalibrated Catadioptric Views”, IEEE Conf. Computer Vision and
Pattern Recognition (CVPR), 2001.
[15] S. Gortler, R. Grzeszczuk, R. Szeliski, M. Cohen, "The Lumigraph", Computer Graphics (SIGGRAPH 96), pp. 43-54, 1996.
[16] S.B. Kang, R. Szeliski, “3D Scene Data Recovery using Omnidirectional Multibaseline Stereo”, IEEE Conf. Computer Vision and
Pattern Recognition (CVPR), pp. 364-370, 1996.
[17] S.B. Kang, “Catadioptric Self-Calibration”, Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), pp. 201-207,
June 2000.
[18] J.J. Leonard, H.F. Durrant-Whyte, “Mobile Robot Localization by Tracking Geometric Beacons”, IEEE Transactions on Robotics
and Automation, Vol. 7, No. 3, pp. 376-382, 1991.
[19] M. Levoy, P. Hanrahan, "Light Field Rendering", Computer Graphics (SIGGRAPH 96), pp. 31-42, 1996.
[20] L. McMillan, G. Bishop, "Plenoptic Modeling: An Image-Based Rendering System", Computer Graphics (SIGGRAPH 95), pp.
39-46, 1995.
[21] J. Mendelsohn, K. Daniilidis, “Constrained Self-Calibration”, Proceedings of IEEE Computer Vision and Pattern Recognition
(CVPR), pp. 581-587, 1999.
[22] T. Morita, T. Kanade, “A Sequential Factorization Method for Recovering Shape and Motion from Image Streams”, Proc. ARPA
Image Understanding Workshop, Vol. 2, pp. 1177–1188, 1994.
[23] S. Nayar, “Catadioptric Omnidirectional Camera”, Proc. of IEEE Computer Vision and Pattern Recognition, pp. 482-488, 1997.
[24] J. O'Rourke, Art Gallery Theorems and Algorithms, Oxford University Press, New York, 1987.
[25] M. Pollefeys, R. Koch, and L. van Gool, “Self-Calibration and Metric Reconstruction in Spite of Varying and Unknown Internal
Camera Parameters”, Proceedings Int. Conf. on Computer Vision (ICCV), pp. 90-95, 1998.
[26] K. Simsarian, T. Olson, N. Nandhakumar, “View-Invariant Regions and Mobile Robot Self-Localization”, IEEE Transactions on
Robotics and Automation, Vol. 12, No. 5, pp. 810-816, 1996.
[27] K. Sugihara, “Some Location Problems for Robot Navigation using a Single Camera”, Computer Vision, Graphics, and Image
Processing, Vol. 42, pp. 112-129, 1988.
[28] H. Shum, L. He, “Concentric Mosaics”, Proceedings of ACM SIGGRAPH, pp. 299-306, 1999.
[29] R. Talluri, J. K. Aggarwal, “Mobile Robot Self-Location Using Model-Image Feature Correspondence”, IEEE Transactions on
Robotics and Automation, Vol. 12, No. 1, pp. 63-77, 1996.
[30] C. J. Taylor, “Video Plus”, IEEE Workshop on Omnidirectional Vision, pp. 3-10, 2000.
[31] S. Teller, M. Antone, Z. Bodnar, M. Bosse, S. Coorg, M. Jethwa, N. Master, “Calibrated, Registered Images of an Extended Urban
Area", IEEE Computer Vision and Pattern Recognition (CVPR), 2001.
[32] B. Triggs, P. McLauchlan, R. Hartley, A. Fitzgibbon, "Bundle Adjustment - A Modern Synthesis", in Vision Algorithms: Theory and
Practice, Springer-Verlag, 2000.
[33] M. Ward, R. Azuma, R. Bennett, S. Gottschalk, and H. Fuchs, "A Demonstrated Optical Tracker with Scalable Work Area for Head-
Mounted Display Systems.", ACM Symposium on Interactive 3D Graphics (I3D 92), pp. 43-52, 1992.
[34] Y. Yagi, Y. Nishizawa, M. Yachida, “Map-Based Navigation for a Mobile Robot with Omnidirectional Image Sensor COPIS”, IEEE
Transactions on Robotics and Automation, Vol. 11, No. 5, pp. 634-648, 1995.
APPENDIX A
In this appendix, we briefly describe an optional method for improving the estimate of the 3D fiducial locations
before tracking and image capture. Using a tape measure, we measure the distances between pairs of mutually
visible fiducials and fit the resulting rigid configuration to the original floor plan. So long as the graph of mutual
fiducial visibility is at least biconnected, the distances alone provide a rigid fiducial configuration. To fully support
floor plans containing loops, a sufficient condition is that the fiducial visibility graph must be biconnected using
only local edges (i.e., if an edge is removed, the affected vertices must remain connected using only local edges and
not via a sequence of edges on the other side of the loop). To enforce this, we could modify the original planning
algorithm so that when searching for the most redundant fiducial to remove, we also verify that the resulting mutual
fiducial visibility graph is still biconnected.
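As a sketch of the biconnectivity test alone (ignoring the "local edges" restriction described above, which is not modeled here), one can simply verify that the mutual visibility graph is connected and that removing any single fiducial leaves it connected. This brute-force check is adequate for the few dozen fiducials involved.

def is_biconnected(num_fiducials, edges):
    # edges: pairs (i, j) of mutually visible fiducials, 0-based indices.
    # Biconnected: connected, at least three vertices, and no single vertex
    # whose removal disconnects the graph.
    adj = {v: set() for v in range(num_fiducials)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    def connected_without(excluded):
        remaining = [v for v in range(num_fiducials) if v != excluded]
        if not remaining:
            return True
        seen, stack = {remaining[0]}, [remaining[0]]
        while stack:
            for w in adj[stack.pop()]:
                if w != excluded and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(remaining)

    if num_fiducials < 3 or not connected_without(None):
        return False
    return all(connected_without(v) for v in range(num_fiducials))

print(is_biconnected(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # True: a 4-cycle
print(is_biconnected(4, [(0, 1), (1, 2), (2, 3)]))          # False: a path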
Using the N fiducial locations f1=(x1, y1) through fN=(xN, yN) computed by the planning algorithm and given a set of
fiducial distance measurements, we create a distance matrix. Then, using minimization, we obtain a new set of
fiducial locations f1′=(x1′, y1′) through fN′=(xN′, yN′) that produce a distance matrix with the same values as the
measured distances. We wish to minimize the function gplacement consisting of the sum of the squared differences
between the measured distances mij and the corresponding distances of the current fiducial set. (We use the
Kronecker delta to ignore terms for which we have no distance measurements.) The function and its partial
derivatives are given below:
$$ g_{\mathrm{placement}}(x_1', y_1', \ldots, x_N', y_N') = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} K_{ij}\, E_{ij}^{2} \qquad (3) $$

$$ \frac{\partial g_{\mathrm{placement}}}{\partial x_i'} = \sum_{j \neq i} 2\, K_{ij} E_{ij}\, (2x_j' - 2x_i'), \qquad \frac{\partial g_{\mathrm{placement}}}{\partial y_i'} = \sum_{j \neq i} 2\, K_{ij} E_{ij}\, (2y_j' - 2y_i'), $$
$$ E_{ij} = m_{ij}^{2} - \big( (x_i' - x_j')^{2} + (y_i' - y_j')^{2} \big), \qquad K_{ij} = 1 - \delta_{m_{ij},\,0} \qquad (4) $$
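A minimal numerical sketch of this minimization is given below, using the objective and gradient of equations (3) and (4). The choice of SciPy's L-BFGS-B solver is an assumption (the excerpt does not specify an optimizer); missing measurements are encoded as zeros so that the mask K_ij removes them, matching the Kronecker-delta convention.

import numpy as np
from scipy.optimize import minimize

def refine_fiducials(initial_xy, measured):
    # initial_xy: (N, 2) planned fiducial locations from the planning algorithm.
    # measured:   (N, N) symmetric matrix of tape-measured distances, with 0
    #             where no measurement exists (K_ij = 1 - delta_{m_ij,0}).
    m = np.asarray(measured, dtype=float)
    K = (m != 0).astype(float)

    def g_and_grad(p):
        xy = p.reshape(-1, 2)
        d = xy[:, None, :] - xy[None, :, :]                  # pairwise differences
        E = m ** 2 - (d ** 2).sum(axis=2)                    # E_ij from equation (4)
        g = 0.5 * np.sum(K * E ** 2)                         # each unordered pair once
        coeff = 2.0 * K * E
        grad = (coeff[:, :, None] * (-2.0 * d)).sum(axis=1)  # gradient of equation (4)
        return g, grad.ravel()

    res = minimize(g_and_grad, np.asarray(initial_xy, dtype=float).ravel(),
                   jac=True, method="L-BFGS-B")
    return res.x.reshape(-1, 2)

The refined configuration is determined only up to a rigid motion; the fitting step described next aligns it to the floor plan.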
We fit the resulting rigid fiducial configuration to the original ideal fiducial configuration by finding a
transformation that best aligns the two. We compute a translation (tx, ty) and rotation r of the configuration that
minimizes the sum of the squared distances between the fiducial locations in the floor plan and the fiducials of the
actual configuration. The function gfit to minimize is given below:
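The expression for gfit is not included in this excerpt. As a hedged sketch of the described alignment, a translation (tx, ty) and rotation r minimizing the summed squared distances between corresponding fiducials, a standard closed-form 2D least-squares fit can be used; this is an illustration, not necessarily the authors' formulation.

import numpy as np

def fit_rigid_2d(refined, planned):
    # Least-squares rotation r and translation t = (tx, ty) aligning the refined
    # fiducial configuration to the planned floor-plan locations:
    #   minimize sum_i || R(r) * refined_i + t - planned_i ||^2
    A = np.asarray(refined, dtype=float)
    B = np.asarray(planned, dtype=float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - ca, B - cb
    s = np.sum(A0[:, 0] * B0[:, 1] - A0[:, 1] * B0[:, 0])
    c = np.sum(A0[:, 0] * B0[:, 0] + A0[:, 1] * B0[:, 1])
    r = np.arctan2(s, c)                       # optimal rotation angle
    R = np.array([[np.cos(r), -np.sin(r)],
                  [np.sin(r),  np.cos(r)]])
    t = cb - R @ ca                            # optimal translation
    return r, t                                # apply as: refined @ R.T + t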