Pose estimation from one conic correspondence

by

Snehal I. Bhayani
201211008

A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of
MASTER OF TECHNOLOGY
in
INFORMATION AND COMMUNICATION TECHNOLOGY
to
DHIRUBHAI AMBANI INSTITUTE OF INFORMATION AND COMMUNICATION TECHNOLOGY

June, 2014
7/25/2019 Pose estimation from one conic correspondence
i) the thesis comprises my original work towards the degree of Master of
Technology in Information and Communication Technology at Dhirubhai
Ambani Institute of Information and Communication Technology and has
not been submitted elsewhere for a degree,
ii) due acknowledgment has been made in the text to all the reference material
used.
Snehal Bhayani
Certificate
This is to certify that the thesis work entitled Pose estimation from one conic correspondence has been carried out by Snehal I. Bhayani for the degree of Master of
Technology in Information and Communication Technology at Dhirubhai Ambani
Institute of Information and Communication Technology under my supervision.
Prof. Aditya Tatu
Thesis Supervisor
In this thesis we attempt to solve the problem of camera pose estimation from one
conic correspondence by exploiting the epipolar geometry. For this we make two
important assumptions which simplify the geometry further. The assumptions
are, (a) the scene conic is a circle, and (b) the translation vector is contained in a
known plane. These two assumptions are justified by noting that many artifacts
in scenes (especially indoor scenes) contain circles, which are wholly in front of
the camera. Additionally, there is a good possibility that the plane which contains
the translation vector would be known. Through the epipolar geometry framework,
a matrix equation is defined which relates the camera pose to one conic
correspondence and the normal vector defining the scene plane. Through the
assumptions, we simplify the system of polynomials in such a way that the task
of solving a set of seven simultaneous polynomial equations in seven variables is
transformed into the task of solving only two polynomials in two variables at a
time. For this we design a geometric construction. This method gives
a finite set of camera pose solutions. We test our propositions through synthetic
datasets and suggest an observation which helps in selecting a unique
solution from the finite set of pose solutions. For synthetic datasets, the solution
so obtained is quite accurate, with an error of the order of $10^{-4}$; for real datasets,
the solution is erroneous due to errors in the camera calibration data we have. We
justify this fact through an experiment. Additionally, the formulation of the
above-mentioned seven equations relating the pose to the conic correspondence
and the scene plane position helps us understand how the relative pose establishes
point and conic correspondences between the two images. We then compare the
performance of our geometric approach with the conventional way of optimizing
a cost function and show that the geometric approach gives us more accurate pose
solutions.
Of the four classes above, our work is about the fourth type of pose estimation
problem, the 2D perspective-2D perspective pose estimation problem. This approach
requires an overview of a two-camera setup. Hence, before we go further, a general
arrangement of the two-camera setup is introduced in the next section (1.1). The
mathematical spaces considered throughout this report would be euclidean spaces1,
unless specified otherwise.
1.1 Two camera setup
The purpose of introducing such an arrangement is two-fold. Firstly, it introduces
the various feature artifacts which would be used for establishing correspondences
between two images (like points, lines, conics etc.) and the varying
mathematical relationships amongst them, and secondly, the same framework sets
up the idea of multiple-view geometry (termed epipolar geometry by Hartley and Zisserman in [1]).
Figure 1.1: A setup describing epipolar geometry
1. One can wonder: we are dealing with projective spaces, and still the spaces considered are not projective spaces. The reason is, as shown in section (A.2.5), the projective space is obtained by "adding" points at infinity to an affine space (here we can consider a euclidean space as an affine space with origin as the point $[0, 0, 0]^T$). For practical purposes we assume the points we deal with are "not at infinity". Hence the projective space is reduced to an affine (or a euclidean) space.
A two-camera setup is depicted in figure (1.1). Here a pin-hole camera is decomposed
into a projection center O (a point in R3), an image plane π and its calibration
matrix K. This model is mainly of theoretical interest, but for our application
we see that this highly simplified model works well enough to be able to
ignore various practical issues in a camera model. Such a camera model shall be
denoted as cam(O, π, K). The calibration matrix houses quantities that determine
the relation between the position of a point x ∈ π in the 2D image coordinate system
with respect to its position in the 3D global coordinate system of the camera. Let
O⊥ be the intersection of the line from O perpendicular to π with π. Then the
matrix K gives us the distance of the plane π from the center O and the position of
the point O⊥ in terms of the local coordinate system of π. More on the structure
of K can be read from appendix (B.3).
As shown in the figure we have a pair of cameras cam(O1, π1, K) and cam(O2, π2, K)
with their centers at points O1 and O2 in R3. The calibration matrices are the same for
both cameras. The image planes associated with cameras O1 and O2 are π1
and π2 respectively. Now a quadratic curve is defined as the zero set of a second
order polynomial
$$Ax^2 + By^2 + Cxy + Dx + Ey + F = 0.$$
This polynomial can be written in matrix form as
$$\begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = 0. \quad (1.3)$$
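As a quick sanity check, the matrix form (1.3) takes only a few lines to implement. The sketch below is in Python with NumPy, used here purely for illustration (the thesis's own experiments use MATLAB); the helper names are my own:

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix of Ax^2 + By^2 + Cxy + Dx + Ey + F = 0, as in (1.3)."""
    return np.array([[A,   C/2, D/2],
                     [C/2, B,   E/2],
                     [D/2, E/2, F  ]])

def on_conic(M, x, y, tol=1e-9):
    """Check whether (x, y) satisfies [x y 1] M [x y 1]^T = 0."""
    p = np.array([x, y, 1.0])
    return abs(p @ M @ p) < tol

# Unit circle x^2 + y^2 - 1 = 0: A = B = 1, F = -1, all other coefficients 0.
M = conic_matrix(1, 1, 0, 0, 0, -1)
print(on_conic(M, 1.0, 0.0))   # a point on the circle
print(on_conic(M, 0.5, 0.5))   # a point off the circle
```

Note that the quadratic form is evaluated on the homogeneous point $[x, y, 1]^T$, which is exactly why the matrix notation extends cleanly to projective coordinates later.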
Using dual notation, henceforth we shall use the same notation, C, for a quadratic
curve and for the matrix representation of its defining polynomial. For the above
defined curve, C means the matrix
$$\begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix}$$
and also the set of points defined by the solution to equation (1.3). In the computer
vision community such a quadratic curve is termed a conic.
The conics in the two image planes are assumed to be C1 in π1 and C2 in π2.
Further, let a third plane π be oriented in R3, containing the scene
if and only if x and y are the images of the same scene point4. Another matrix,
discussed and taken up subsequently by noted researchers, is termed the fundamental matrix. This matrix is the un-calibrated counterpart of the essential matrix:
$$E = K^T F K.$$
This means the same point correspondence is defined, but the point measurements
don't need the calibration matrix to be known. A detailed explanation and treatment
of both these matrices can be found in the textbook [1] by Hartley and Zisserman.
These equations form the backbone of our thesis.
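The relation $E = K^T F K$ can be exercised numerically. The sketch below (Python/NumPy; the calibration matrix and pose are synthetic values of my own choosing) builds F from a known essential matrix $E = [t]_\times R$ (cf. [5]) and recovers E back:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

# Hypothetical calibration matrix and relative pose.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
R = np.eye(3)
t = np.array([0.3, -0.1, 1.0])

E_true = skew(t) @ R                 # essential matrix E = [t]x R
Ki = np.linalg.inv(K)
F = Ki.T @ E_true @ Ki               # its un-calibrated counterpart
E = K.T @ F @ K                      # E = K^T F K recovers the essential matrix
print(np.allclose(E, E_true))        # True
```

The round trip works because $K^T K^{-T} = I$ and $K^{-1} K = I$; note also that both E and F inherit rank 2 from $[t]_\times$.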
The relative orientation of plane π1 with respect to plane π2 in E3 is assumed to
be a rotation R and a translation t. These quantities are such that when a point y ∈ π2
is rotated and translated through R and t, we get the corresponding
point x ∈ π1 as x = Ry + t. Thus in the figure given above, if O1 is at the origin,
$O_1 = [0, 0, 0]^T$, then $O_2 = -R^T t$. The points of intersection of the line through O1 and O2
with planes π2 and π1 are known as the epipoles e1 and e2 of cameras 2 and 1 respectively.
The essential matrix, as introduced above, can be decomposed in terms of R and t [5]:
$$E = [t]_\times R.$$
The fundamental matrix in terms of R and t is decomposed as [1]:
$$F = [e_2]_\times H.$$
In lemma (1) in chapter (3), we prove the following relationship between the pose
parameters R, t and the variables of epipolar geometry, H, e:
$$t = \lambda K^{-1} e, \qquad R = \lambda^{-1}\left(K^{-1} H K + K^{-1} e v^T K\right), \quad (1.6)$$
where R, t, e and K have their usual meanings and λ is a real scaling factor. v
represents the position of the scene plane.
4. x and y are measured in image planes, assuming that the cameras are calibrated.
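Equation (1.6) determines R and t only up to the real scale λ. Since R must be a rotation, det(λR) = λ³ fixes λ as a cube root. A minimal sketch of this recovery step (Python/NumPy; the matrix M below stands in for $K^{-1}HK + K^{-1}ev^TK$, and all values are synthetic, not taken from a real image pair):

```python
import numpy as np

def recover_pose(M, e, K):
    """Recover (R, t) from M = K^-1 H K + K^-1 e v^T K, as in eq. (1.6).

    M = lambda * R and det(R) = 1, so lambda is the cube root of det(M)."""
    lam = np.cbrt(np.linalg.det(M))
    R = M / lam
    t = lam * np.linalg.inv(K) @ e        # t = lambda * K^-1 e
    return R, t, lam

# Synthetic check: build M and e from a known rotation, scale and translation.
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.],
                   [np.sin(theta),  np.cos(theta), 0.],
                   [0., 0., 1.]])
lam_true = 2.5
K = np.diag([800., 800., 1.])             # hypothetical calibration matrix
t_true = np.array([0.2, 0.1, 1.0])
e = K @ t_true / lam_true                 # consistent with t = lambda * K^-1 e
M = lam_true * R_true

R, t, lam = recover_pose(M, e, K)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

The cube-root trick is a standard normalization step; the thesis's lemma (1) supplies the matrix M itself from H, e, K and v.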
With the above setup in mind, one can now define pose estimation through
mathematical quantities. Considering the camera centers O1 and O2, the translation
vector t is defined as:
$$t = \vec{O}_1 - R\,\vec{O}_2, \quad (1.7)$$
where $\vec{O}_1$ and $\vec{O}_2$ are the vector representations of points O1 and O2 in R3.
R is the rotation matrix which maps image plane π2 to a position parallel to that
of π1 upon rotation. In other words, if π1 is defined as $u_1^T x + 1 = 0$, ∀x ∈ R3, and
π2 as $u_2^T x + 1 = 0$, ∀x ∈ R3, then R can be estimated as the rotation of the unit
vector $u_2/\|u_2\|$ to the unit vector $u_1/\|u_1\|$.
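One way to realize the rotation taking $u_2/\|u_2\|$ to $u_1/\|u_1\|$ is Rodrigues' formula about the axis $u_2 \times u_1$; a sketch (Python/NumPy, my own construction, which ignores the degenerate 180° case):

```python
import numpy as np

def rotation_between(u2, u1):
    """Rotation R with R (u2/|u2|) = u1/|u1|, via Rodrigues' formula."""
    a = u2 / np.linalg.norm(u2)
    b = u1 / np.linalg.norm(u1)
    v = np.cross(a, b)                 # rotation axis (unnormalized)
    c = a @ b                          # cosine of the angle between normals
    if np.isclose(c, -1.0):            # opposite normals: axis is ambiguous
        raise ValueError("180-degree case needs an explicit axis choice")
    vx = np.array([[0., -v[2], v[1]],
                   [v[2], 0., -v[0]],
                   [-v[1], v[0], 0.]])
    # R = I + [v]x + [v]x^2 / (1 + c), since (1 - c)/|v|^2 = 1/(1 + c).
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

R = rotation_between(np.array([0., 0., 1.]), np.array([1., 1., 1.]))
print(np.allclose(R @ [0., 0., 1.], np.array([1., 1., 1.]) / np.sqrt(3)))  # True
```

This is only one of infinitely many rotations mapping one unit vector to another (any further rotation about $u_1$ also works); the minimal-angle choice above is a common convention.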
Thus pose estimation in this thesis is defined to be the estimation of R and t, given
the two images or the two image planes, π1 and π2. In this thesis, the assumption is
that the cameras are calibrated and the camera calibration matrix K is the same for
both cameras. More assumptions would follow in later chapters, some for more
accuracy and some for simplicity.
1.3 Background work
The history of computer vision is rich and full of brilliant insights. Its association
with projective geometry is even richer. We have listed out the four main classes
of the pose estimation problem along the lines of a paper by Haralick, [4]. In a later
paper, Haralick et al. in [6] work on a similar kind of problem but exclusively
in a euclidean space, where they look at a closed form solution to pose estimation
from a set of three point perspective projections. This problem would be of
the type defined as the 2D perspective-3D pose estimation problem in point (3).
Photogrammetry deals with these problems in detail and has its applications to
computer vision as well. Higgins, in [5], introduced the essential matrix of equation
(1.5) primarily to tackle the problem of relative orientation, or in other words the
pose estimation problem, and in the same paper gives an algorithm to estimate
R and t from E. A comprehensive study on the fundamental matrix
and its related treatment has been carried out by other researchers: Zhang in [7]
and by Luong and Faugeras in [8]. For the second and more crucial part of the
problem, [5, 8, 7, 4, 9, 10] use point correspondences to estimate either the fundamental
matrix or the essential matrix. As against this, Heyden and Kahl in [11]
have used conic correspondences to estimate the fundamental matrix. The authors
give a brief survey of the various features (like points, lines, curves and many
more) used in the past to estimate the fundamental matrix. They also state the
reasons why conic correspondences are preferred by certain researchers over
the conventional point and line correspondences. The primary motivation is the
fact that many man-made objects contain a curve which is either a conic or can
be approximated as being formed of conics. Another reason is a property of projective
transformations: any projective transformation maps a conic into another
conic (also termed the projective invariance property). A projective transformation
is a pointwise mapping between two projective spaces.5 Ji et al. in [12] have
used a mix of various geometric features like points, lines and conics, to estimate
the pose of a camera with respect to the object coordinate frame. Towards the
same objective, they have considered a linear approach that combines geometric
features at different levels of complexity, thus improving the stability and
accuracy of the solution. The approach estimates the pose parameters from point
correspondences, line correspondences and 2D ellipse-3D circle correspondences.
For the circle-ellipse correspondence, they have obtained two polynomials which
define two constraints on the relative pose. But the authors have assumed that the
radius of the circle is known, and the property of the circle as a conic section is
not used completely, as the focus is more on using as many feature correspondences
as available. The same problem of 2D perspective-3D pose estimation is
worked on by Wang in [13]. The approach as proposed in the paper amounts
to estimating the pose of the camera from a single view and under the assumption
that the intrinsic camera parameters are known. The approach uses the image
of an absolute conic6 to estimate the pose of the camera. An added assumption,
that the image of the center of the 3D circle is known, is employed for the
minimal case where the image of only one circle is known. But it hasn't been explicitly
justified when such an assumption would hold true, though some methods of
estimating the same have been suggested.
5. The definition of a projective transformation in strict mathematical terms is given in appendix (A.2.3).
6. The absolute conic is an imaginary conic at infinity that consists of purely imaginary points. The image of the conic is shown to depend only on the calibration matrix.
Our contribution primarily lies in an attempt to solve the problem from a slightly
different perspective. It has motivated two different approaches for pose estimation.
The first approach is based on the equation
$$(R - tu^T)^T K^T C_2 K (R - tu^T) - \mu K^T C_1 K = 0_{3\times3},$$
where R, t, C1, C2 have their usual meanings, u ∈ R3 is the vector7 defining
the scene plane that contains the conic C, and µ is a scaling factor introduced to
account for the homogeneous quantities C1 and C2 in the equation. The above
equation is derived by combining epipolar geometry with one conic correspondence.
Intuitively, this equation describes the relationship between the pose R, t
and the pair of conics in correspondence, through the normal vector of the scene plane,
u. This constraint can be further simplified if we assume that the conic C in the scene,
whose images C1 and C2 are known, is a circle8 and that the translation vector lies
in a specified plane (defined by a normal vector w). These assumptions reduce the
number of unknown variables in the previous equation to get
$$(R - tu^T)^T K^T C_2 K (R - tu^T) - \mu K^T C_1 K = 0_{3\times3}, \qquad w^T t = 0, \quad (1.8)$$
where R, t and µ are to be estimated and C1, C2, u, K and w are known. A straightforward
way to solve the above equations is to write a gradient descent algorithm
via explicit calculation of gradient vectors, or to use MATLAB's inbuilt functions for
optimization on a cost function modeled from equation (1.8). Unfortunately, any
optimization method can, in general, get stuck in a local minimum, and through
experiments on synthetic datasets we have found that the algorithm does get stuck
at a point which is nowhere close to the true value. Such an experiment and its
result is given in section (4.1) of chapter (4). A second problem is that there is
no sure-shot way of figuring out how many global minima our system of
polynomials has. These facts make the behavior of the algorithm strongly dependent
on the starting points of the parameters. An estimate closer to the true value
helps the algorithm behave nicely and converge accurately to the true solution.
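The optimization baseline just described can be sketched with a general-purpose minimizer. Everything below is synthetic and illustrative (SciPy stands in for MATLAB's optimization toolbox; C1 is constructed so that a known (R, t, µ) satisfies (1.8) exactly, and the axis-angle parameterization of R is my own choice, not the thesis's):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

K = np.diag([2., 2., 1.])               # hypothetical calibration matrix
u = np.array([0.1, 0.2, -0.5])          # known scene-plane vector
w = np.array([0., 1., 0.])              # plane known to contain t
C2 = np.diag([1., 1., -1.])             # conic in the second image

# Ground truth chosen so that (1.8) holds exactly: w^T t = 0, and C1 is
# derived from the first equation of (1.8).
rv_true = np.array([0.05, -0.02, 0.1])
R_true = Rotation.from_rotvec(rv_true).as_matrix()
t_true = np.array([0.3, 0.0, 0.1])
mu_true = 1.0
A = R_true - np.outer(t_true, u)
Ki = np.linalg.inv(K)
C1 = Ki.T @ (A.T @ K.T @ C2 @ K @ A) @ Ki / mu_true

def cost(p):
    """Squared residual of (1.8); p packs (rotation vector, t, mu)."""
    R = Rotation.from_rotvec(p[:3]).as_matrix()
    t, mu = p[3:6], p[6]
    A = R - np.outer(t, u)
    res = A.T @ K.T @ C2 @ K @ A - mu * K.T @ C1 @ K
    return np.sum(res**2) + (w @ t)**2

p_true = np.concatenate([rv_true, t_true, [mu_true]])
p0 = p_true + 1e-2                      # start near the truth
sol = minimize(cost, p0, method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-16})
print(cost(p_true), sol.fun <= cost(p0))
```

Starting near the truth, the residual is driven toward zero; started far away, exactly the local-minimum behavior described above appears, which is what motivates the geometric construction.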
But with a starting point quite far off, the solution achieved upon convergence is
not at all close to the true value. To get around this problem, we design
7. The vector u defines the plane through the plane equation $x^T u + 1 = 0$, ∀x ∈ R3.
8. By its requirement of being a circle we mean a circle in the global coordinate system in R3.
a geometric construction9 such that one can estimate all possible pose solutions to
a given problem. For this we transform the problem of estimating pose solutions
through optimization of the cost function of equation (1.8) to a problem that involves
finding solutions to two pairs of polynomials, with each pair depending only on
two variables. The first pair consists of a degree-three and a degree-four polynomial,
whereas the second pair consists of quadratic polynomials. These polynomials
can be accurately solved using the symbolic computation toolbox available with
MATLAB. The advantage here is that at a time we have only two polynomials in
two variables to solve, which is a considerable improvement over the conventional
optimization task of solving seven polynomials in seven variables at
the same time. This is the reason for the high accuracy our approach achieves.
Further, solving these polynomials we get the pose as a finite set of all possible
solutions in the form of R and t. The process follows a geometric construction and
does not need optimization, which in turn helps improve the accuracy of the results.
The construction further improves our understanding of the above equation.
The equation (1.8) relates the image and camera coordinate systems through
a conic correspondence. As a set of observations, we propose some points on how
to pick one solution out of the finite set of all possible solutions obtained from
this approach. We perform experiments on both real and synthetic data for this
geometric approach to pose estimation. For synthetic datasets, we find that the
pose solutions thus estimated are accurate to an error of the order of $10^{-4}$. Especially
for datasets with a rotation matrix close to the identity matrix, the observations
help us select a solution which is closest to the true values. But the observations
don't hold true for datasets with rotation matrices considerably far from the identity
matrix. For such cases, we propose using one additional point correspondence,
which is beyond the scope of this thesis. For real datasets the estimated pose solution
is not accurate enough. But through a related experiment, we demonstrate
that the error in the pose solution is primarily due to the error in the camera calibration
process.
1.5 Layout of thesis
In chapter (2) we introduce the basics of epipolar geometry. It deals with the setup
of a two-camera system but from a projective geometry point of view. The prerequisites
of epipolar geometry are projective, affine and euclidean spaces, whose
9. For the time being, we consider only the euclidean coordinate system.
This is the geometric way of defining a point correspondence. One point worth
noting is that the camera setup of the figure (2.1) is in R3. If lines $\overrightarrow{qx_1}$ and $\overrightarrow{ox_2}$ are
parallel, they don't intersect in a point in E3, but in a point x∞, well defined in the
projective space P(E4), which by equation (A.13) is decomposed as
$$P(E^4) = E^3 \sqcup P(E^3),$$
where $\sqcup$ denotes the union of two disjoint sets. Thus the point x∞ lies in P(E3).
With this decomposition in mind, we can ensure that the point correspondence
between two images is well defined. This way of defining a point correspondence
motivates a special homography between two images. We call it special because
such a homography would be constructed through the scene plane. As shown
later, this mapping is a part of a more general mapping between these two images
through the scene. In the next section we intuitively describe this homography mapping
through a scene plane, and after that algebraically define the more general
mapping through scene points.
2.1.1 Geometric definition of homography between two images
Based on the way a point correspondence between two images through a scene
plane, π, is described, one can infer that such a mapping would be bijective. Distinct
positions for π would give different mappings, unless the planes are parallel
to each other. One point to note is that, given a pair of images and a scene, not
every point in the first image forms a correspondence pair with a point in the second
image through a homography realized through a scene plane. Only the points
which are projections of points on the scene plane, in both of the image planes,
form correspondence pairs through the homography mapping generated
through π. This is termed point transfer through the scene plane π by Hartley and
Zisserman in chapter (9), [1]. But the scene points (irrespective of whether they lie
on the scene plane or not) in general also set up point correspondences between the
two images. We look at this mapping in an algebraic formulation next.
2. Epipolar line of a point c ∈ π2: The line l in π1 obtained by the intersection of
the epipolar plane of c, as defined above, with the image plane π1 is known
as the epipolar line of c. This line is the set of all points of π1 which can be
mapped to c through the two-camera setup described above.

To conclude, each point x ∈ π2 has a unique epipolar line l ∈ π1 associated with it,
and the same epipolar plane is also the epipolar plane of all points x ∈ π1
such that x ∈ l. The fundamental matrix F encodes this correspondence:
$$l = F x, \quad (2.3)$$
where l is a vector representation of the line l in P(E3). Referring to section (A.2) of
appendix (A), we say that every line l in P(E3) corresponds to a plane through the
origin in E3, and the normal vector of this plane is denoted by l here. Hence this
representation is unique up to a non-zero scalar multiple, which conforms well
with the relationship given above. This is a point-line correspondence between
the two images that solely depends on the relative orientation of the two cameras.
It is just another perspective on the point-point correspondence of equation
(2.2). The geometric description of homography we saw in the previous section is
a constrained version of the present mapping, as is evident from figure (2.2). In other
words, the point correspondence pairs through the geometric description are a subset
of the correspondence pairs through the algebraic definition we discussed in the
present section. In summary, this section builds the framework of epipolar geometry
through which two images have point mappings realized through scene
points.
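Equation (2.3) is easy to exercise numerically. The sketch below (Python/NumPy; the pose and points are synthetic, with K = I for brevity so that $F = [t]_\times R$) builds F for a pure sideways translation and checks that every candidate correspondence of a point lies on its epipolar line:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

# Identity rotation, pure sideways translation, K = I:
# F = K^-T [t]x R K^-1 reduces to [t]x.
t = np.array([1., 0., 0.])
F = skew(t)

x = np.array([0.2, 0.1, 1.0])        # homogeneous image point
l = F @ x                            # its epipolar line, eq. (2.3)
# In this setup, corresponding points differ only by a horizontal
# disparity d, and each of them satisfies the line equation l . x' = 0.
for d in (0.0, 0.3, 1.7):
    xp = np.array([0.2 + d, 0.1, 1.0])
    print(np.isclose(l @ xp, 0.0))   # True for every disparity
```

The scalar ambiguity mentioned above is visible here: scaling F (or l) by any non-zero constant leaves the zero set of $l \cdot x' = 0$, and hence the epipolar line, unchanged.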
2.1.3 Some properties of the fundamental matrix, F
The fundamental matrix is of rank 2 and unique up to a non-zero real scalar. Certain
decompositions and properties of this fundamental matrix are listed below for
quick reference. Detailed discussions on properties and different interpretations
can be found in [1, 8, 7]:

1. If P1 and P2 are the projection matrices1 of the two cameras, then $F = [e_2]_\times P_2 P_1^\dagger$.

2. If the relative orientation and position between the two cameras are defined

1. A projection matrix of a camera is discussed in appendix (B.1).
3. If the scene contains a plane π and the point mapping through the plane is
defined by the homography H, then
$$F = [e]_\times H,$$
where e is the epipole of the image plane π2 of the second camera and H is
defined such that
$$x' = Hx, \quad x \in \pi_2, \ x' \in \pi_1. \quad (2.5)$$
The second property is helpful for an intuitive grasp of the setup. The fundamental
matrix maps points in one image to lines in the other, albeit up to a certain
ambiguity. The points are specified in local coordinate systems2. The decomposition,
though, is specified in terms of R and t, which can be seen as being external,
or specified in the absolute coordinate system, as compared to the image and scene
planes involved. This enables us to infer, from an algebraic point of view, how
a change in R and/or t affects the point mapping. For more clarification,
we can put equations (2.2) and (2.4) together:
$$x'^T K^{-T} [t]_\times R K^{-1} x = 0. \quad (2.6)$$
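The epipolar constraint of (2.6) can be verified directly on synthetic data: project a scene point into both cameras and evaluate the bilinear form. All numbers below are made up for illustration:

```python
import numpy as np

def project(K, R, t, X):
    """Pin-hole projection of scene point X into a camera with pose (R, t)."""
    x = K @ (R @ X + t)
    return x / x[2]                   # normalize the homogeneous coordinate

# Camera 1 at the origin, camera 2 rotated about y and translated.
K = np.diag([500., 500., 1.])
th = 0.2
R = np.array([[np.cos(th), 0., np.sin(th)],
              [0., 1., 0.],
              [-np.sin(th), 0., np.cos(th)]])
t = np.array([0.5, 0.1, 0.0])

tx = np.array([[0., -t[2], t[1]],     # [t]_x
               [t[2], 0., -t[0]],
               [-t[1], t[0], 0.]])

for X in (np.array([0.3, -0.2, 4.0]), np.array([-1.0, 0.5, 6.0])):
    x1 = project(K, np.eye(3), np.zeros(3), X)   # image in camera 1
    x2 = project(K, R, t, X)                     # image in camera 2
    # Epipolar constraint (2.6): x2^T K^-T [t]x R K^-1 x1 = 0.
    val = x2 @ np.linalg.inv(K).T @ tx @ R @ np.linalg.inv(K) @ x1
    print(np.isclose(val, 0.0))
```

The constraint vanishes because $K^{-1}x_2 \propto RX + t$ and $K^{-1}x_1 \propto X$, and both $(RX)^T (t \times RX)$ and $t^T (t \times RX)$ are identically zero.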
Intuitively, we see that this equation describes a relationship between point
mappings and the relative orientation between the two cameras. Such an interpretation
will be useful for the approach we have devised for pose estimation, as
the aim is to estimate R and t from various feature correspondences. For deeper
insight, there are two questions related to equations (2.5) and (2.6) which need
to be answered in chapter (3). These answers help in a better understanding
of the single stage geometric approach for pose estimation, taken up in chapter
(3). Next we take up both questions one by one.
2.1.4 Question on homography generated in a one camera setup
Before taking up the problem with two cameras, we consider a situation with just
one camera and the scene plane π 1. For a given relative orientation of the camera
2. To every plane (image or object) we fix an internal cartesian coordinate system. When we talk of the calibration matrix being fixed, we mean the coordinate system as well.
3 with respect to the scene plane π1, we can have a homography H representing
the mapping π → π14. Thus, given a relative orientation of the camera and the
planes, we can construct a unique homography. This statement is well proved and
discussed in depth in the textbook [1] by Hartley and Zisserman, and we
accept it here without proof. The actual question is the inverse of the above statement:
"For a given homography, can we orient the camera and scene plane in order to induce the
given matrix?". If we have fixed coordinate systems in both planes, the given
homography actually translates to a euclidean problem. The homography thus
gives us four point correspondences5 between the two planes π1 and π:
$$a_i \to b_i, \quad a_i \in \pi, \ b_i \in \pi_1, \quad 1 \le i \le 4. \quad (2.7)$$
Thus the problem is about finding an orientation between the camera and the scene
plane such that the point correspondences mentioned above are obtained. One
can show that not every given homography (or set of four point correspondences) can be
represented by an arrangement of the camera and the scene plane. It amounts to finding
the right representation and at the same time reducing the number of unknowns
and the number of equations in play. Once the basic arrangement is laid out, the
3. Following the discussion on cameras in section (1.1), by camera we mean a model comprising the centre O1, its image plane π and the calibration matrix K, fixed as well as known.
4. One more point to note is that we can fix any coordinate system in the π1 and π planes. Thereafter, a change of coordinate system in any of the planes amounts to multiplying the obtained homography matrix by an invertible matrix of coordinate transformation. In fact, the calibration matrix exists for the same reason: to transform coordinates from one coordinate system to another.
5. The point correspondences are also assumed to have been measured in the pre-decided euclidean coordinate system.
Such a euclidean arrangement is illustrated in figure (2.3). Here we have the
camera cam(O, π, K). A calibrated camera means that the relation between the
local coordinate system of π and the global coordinate system is fixed. In figure
(2.3), the origin of the coordinate system in π is $O_{plane}$ and the origin of the global
coordinate system is O; the line from O to $O_{plane}$ is perpendicular to π, and the
x-y axes of the global coordinate system are parallel to the $x_{plane}$-$y_{plane}$ axes of
the plane π. This information fixes the orientation of the plane π with respect
to the origin O and also the relationship of a point $P = [u_{plane}, v_{plane}]^T$ with the
global coordinate system. P as defined in the global coordinate system will be
$P \equiv [u_{plane}, v_{plane}, f]^T$, where f is the distance of O from the plane π. In terms
of polynomials, we can specify the same setup as a fixture of three quantities, viz.
f and the distances of two arbitrary points6 P1, P2 ∈ π from O. These constraints
fix the orientation and the position of the plane π with respect to the origin O. The
calibration matrix encodes this information in the form of the upper triangular matrix
K, but the equations help us understand the conditions that control the image
formation in a simple pin-hole camera.
With the basic setup in place, the point correspondences can now be defined as
mentioned before. Given four such point correspondences as labelled in equation
(2.7), we have to orient the plane π1 relative to the camera cam(O, π, K).
Orienting π1 in R3 to construct the desired homography
The way the point correspondence between π and π1 is defined, points ai, i = 1, ..., 4
in plane π are mapped to points bi = λiai in π1, where λi is a scaling factor for point
ai. The points λiai then have to lie in the same plane, π1. Further, the points bi are
measured in a local coordinate system, and hence their positions are represented
by five distance constraints. In other words, five inter-point distances dist(bi, bj)
are known, where dist(x, y) represents the euclidean distance between two points x
and y in R2. Hence we have six polynomial constraints in four variables, λi, i =
1, ..., 4. This proves the fact we stated before: not all homography mappings
can be realized by a relative orientation of the scene plane with respect to the given
camera. We have an interesting result to further reinforce this fact, by Poncelet,
6. The two arbitrary points ought to be specified in the local coordinate system. So we can select $P_1 = [1, 0]^T$ and $P_2 = [0, 1]^T$.
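The six constraints on the λi described above can be written out symbolically. A sketch in SymPy (the direction vectors ai and the distance symbols are hypothetical placeholders, chosen only to show the structure): the four scaled points λiai must be coplanar, and the five known inter-point distances among the bi must be reproduced, giving six polynomial equations in four unknowns.

```python
import sympy as sp

l = sp.symbols('lambda1:5', positive=True)   # the four unknown scales

# Hypothetical homogeneous directions a_i (rays through the camera centre).
a = [sp.Matrix(v) for v in ([0.1, 0.2, 1], [0.4, -0.1, 1],
                            [-0.3, 0.3, 1], [0.2, 0.5, 1])]
b = [l[i] * a[i] for i in range(4)]          # scene points b_i = lambda_i a_i

# One coplanarity constraint: the three difference vectors are degenerate.
coplanar = sp.Matrix.hstack(b[1] - b[0], b[2] - b[0], b[3] - b[0]).det()

# Five squared-distance constraints dist(b_i, b_j)^2 = d_ij^2, with the
# d_ij known from the local measurements (symbolic placeholders here).
d = sp.symbols('d01 d02 d03 d12 d13')
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
dists = [(b[i] - b[j]).dot(b[i] - b[j]) - dij**2
         for (i, j), dij in zip(pairs, d)]

eqs = [coplanar] + dists
print(len(eqs), len(l))   # 6 equations, 4 unknowns
```

With six equations in four unknowns the system is overdetermined, which is exactly why a generic homography need not be realizable by any placement of the scene plane.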
Figure 2.4: Geometric description of Poncelet's theorem, figure from [2].
Poncelet's theorem: A version of the famous Poncelet's theorem states that
"When a planar scene is the central projection of another plane (the image plane), the
planar scene and the image plane stay in perspective correspondence even if the
scene plane is rotated about the line of intersection of the image and the scene
planes. The center of perspectivity moves in a circle in the plane perpendicular to
this line of intersection".

For our requirements we can translate the same theorem as "Given an orientation
of the scene plane and the camera (consisting of the center, image plane and the calibration
matrix, with a fixed coordinate system) inducing the given homography, any further
change in the relative orientation of the scene plane with respect to the camera will change
the homography."

This fact is an important point towards building up the original problem. For it
shows that, in order to maintain the same homography in spite of a change in orientation
of the scene plane with respect to the image plane, the camera centre also needs
to move with respect to the image plane (specifically, in a circle). This means that if we
attempt to keep the distance of the camera centre from the image plane fixed, no two different
orientations of the scene plane can give the same homography.
2.1.5 Question on homography generated in a two camera setup
Adding one more camera to the above arrangement, we have two cameras and
the scene plane, π. We assume π1 and π2 are the image planes of the two cameras. Any
for one approach to pose estimation, taken up at the end of chapter (3). But it is
purely an optimization task, though there is some possibility of future work on it.
This thesis focuses on a different approach, one that involves a single defining equation
instead of the two here. We can combine these two equations by eliminating e. The
equation so formed forms the basis of our geometric approach. This equation has
been solved through an optimization tool as well, but with the results not good enough,
we create a geometric design and estimate R and t. A discussion on this design is
given in section (3.4) of chapter (3).
2.2 Conics
The epipolar geometry is laid out in the previous section. It defines the point correspondences between the two images. Such point correspondences lead to correspondences between more complex features of images. The main focus of this thesis being the use of conic correspondences for pose estimation, it is worthwhile investigating the formulation of conics, their basic properties and the mathematical definition of a conic correspondence. A conic is a second degree curve in a plane, described as the solution set of a quadratic equation:
ax^2 + bxy + cy^2 + dx + ey + f = 0.   (2.12)
This is the equation in the Euclidean plane. Its corresponding representation in P(E^3) is obtained by homogenizing equation (2.12) using a third variable as:

ax^2 + bxy + cy^2 + dxz + eyz + fz^2 = 0.   (2.13)
The same equation can be encoded using a symmetric matrix:

            [ a    b/2  d/2 ] [ x ]
[ x  y  z ] [ b/2  c    e/2 ] [ y ] = 0.   (2.14)
            [ d/2  e/2  f   ] [ z ]
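The encoding in (2.14) is easy to check numerically. A minimal sketch (not from the thesis; the function name is illustrative), building the symmetric matrix from the coefficients of (2.12):

```python
import numpy as np

def conic_matrix(a, b, c, d, e, f):
    """Symmetric matrix C encoding ax^2 + bxy + cy^2 + dx + ey + f = 0,
    so that a homogeneous point X = (x, y, 1) lies on the conic
    iff X^T C X = 0, as in equation (2.14)."""
    return np.array([[a,     b / 2, d / 2],
                     [b / 2, c,     e / 2],
                     [d / 2, e / 2, f    ]])

# Unit circle: x^2 + y^2 - 1 = 0 (a=1, b=0, c=1, d=0, e=0, f=-1).
C = conic_matrix(1, 0, 1, 0, 0, -1)
X = np.array([1.0, 0.0, 1.0])   # homogeneous point (1, 0)
print(X @ C @ X)                # 0.0: the point lies on the circle
```

Scaling C by any non-zero scalar leaves the solution set unchanged, which is the "up to scale" ambiguity used throughout.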
The matrix

C = [ a    b/2  d/2 ]
    [ b/2  c    e/2 ]
    [ d/2  e/2  f   ]

defines the conic up to a non-zero scalar multiple. Using dual notation, we use C to mean both the set of points of the conic and its defining equation. We use this notation to classify conics by inspecting the matrix C. Ideally, in a Euclidean plane, we can have either degenerate
point correspondences between points in π1 and π2 through the points of the scene plane are mapped through the homography H. The above equation is proved in a different way by Hartley, [1]. It can be shown with some algebra that u represents the position of the scene plane π. If π is represented by the solution set of the equation

π ≡ { [x y z]^T ∈ R^3 | m1 x + m2 y + m3 z + 1 = 0, m1, m2, m3 ∈ R },   (3.4)

then u is the vector [m1 m2 m3]^T, uniquely defining the position of π. Henceforth we shall alternately denote the plane π defined by a vector u as above by the notation π_u.
3.2 Conic correspondence
The scene plane π contains a conic C whose images are C1 and C2 in planes π1 and π2 respectively. These two conics are measured in the local coordinate system. Then we have the transformed conics C′1 = K^T C1 K and C′2 = K^T C2 K as representations of the two conics in a transformed local coordinate system, in which the x_plane and y_plane axes are aligned to the x and y axes of the camera's coordinate system and the origin O_plane is the point of intersection of the normal vector with π. We can use the equation of conic correspondence stated in equation (2.15),

H^T C2 H = µ C1,

to form the new constraint

(R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3}.   (3.5)
This equation transforms the problem of pose estimation from a conic correspondence into a problem of estimating R, t and u from a set of five equations. Though the matrix equation has six polynomial equations in all, its elements being unique up to a non-zero scalar multiple, we have five equations; or, by introducing one more variable µ, we have six equations but the additional variable µ. As evident from equation (3.5), t and u appear in the form of a scalar product, so we need to estimate u only up to a scalar multiple. This reduces the variable set to R, t, u = [1 n2 n3]^T and µ: nine parameters in all from six equations. In order to reduce the number of unknowns further, we introduce two assumptions:

1. The scene conic C is a circle in the global coordinate system.
2. The translation vector lies in a particular plane. Let us denote this plane by π_w and its defining normal vector by w.
The first assumption is easily realized by indoor scenes and, to an extent, by outdoor scenes. For example, a scene comprising household artifacts is quite likely to contain circular cross-sections in the form of bottle-mouths, cups, glasses, doorknobs, objects of art and craft that contain circular arcs and curves, holes in walls, etc. The complete circular curves need not be visible; partially visible curves can be fit with considerable accuracy. In most cases, the circular objects in scenes would be solids, more like circular discs, which implies that they would be wholly in front of the camera while imaging. This means that these circles would always be projected as ellipses. Many tools are available for detecting an ellipse in an image and then fitting a polynomial to it; the one we use for our experiments was developed by Prasad, [15]. The second assumption is not as commonly fulfilled as the first one. But it often happens that the camera is leveled and held on a tripod stand, even as it moves. This fact can be used to estimate the plane that contains the translation vector. Hence in such cases, the plane containing the translation vector is already known.
These two assumptions further reduce the number of variables in equation (3.5). In the next section (3.3) we prove lemma (2), which says that u can be estimated as a finite set of solutions, each unique up to a non-zero scalar multiple. For this we derive two equations, (3.16) and (3.17). This means that out of the three parameters of the vector u, two are estimated through the lemma. Thus we are left with seven parameters and six equations. Next, the second assumption reduces the parameter set by one more variable. In summary, by employing the two assumptions, we have a fully determined set of seven polynomials in seven variables. Rewriting the constraint equations, we get

(R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3},
w^T t = 0,   (3.6)

where u, C1, C2 and w are known and R, t and µ are to be estimated. If we consider the geometry described by the above equations, we can intuitively note that all of the seven polynomials are algebraically independent. This means that for non-trivial cases, unique solution(s) exist. These equations form the backbone of the approach to pose estimation we propose next.
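A candidate pose can be checked against the constraint set (3.6) numerically. A hypothetical sketch (variable names are ours; C1p and C2p stand for the calibrated conics K^T C1 K and K^T C2 K):

```python
import numpy as np

def pose_residuals(R, t, mu, u, w, C1p, C2p):
    """Residuals of the constraint set (3.6). Returns the 3x3 matrix
    residual (zero at a true pose) and the plane constraint w^T t."""
    H = R - np.outer(t, u)          # the matrix R - t u^T
    M = H.T @ C2p @ H - mu * C1p    # should vanish at a true pose
    return M, float(w @ t)
```

At the true R, t, u and µ, both returned residuals vanish; for any other candidate they measure the violation of (3.6).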
The conventional approach to pose estimation is a two stage task of estimating F from feature2 correspondences and then R and t from F. As mentioned earlier in the section on background work in chapter (1), there is a lot of literature on methods for estimating F from point or conic correspondences. But most of these methods treat F as a single mathematical entity to be estimated at once. For the second stage of estimating R and t, we have an algorithm proposed by Hartley in [10], based on the singular value decomposition of the fundamental matrix. A point worth noting here is that the estimation of R and t from F gives little insight into the two camera setup in Euclidean space. Additionally, the first assumption is not encoded directly in the fundamental matrix formulation, or in the way F is estimated from point correspondences. These are the reasons why we look for a different approach to pose estimation from a conic correspondence. The assumptions affect the methods we adopt because the first assumption, that the scene conic is a circle, places direct constraints on the position of the scene plane. These constraints on the plane position are not evident from the treatment of F as one quantity, or even when we estimate it directly from point or conic correspondences. The idea here is to break up the fundamental matrix in such a way that we have a direct relationship among the quantities describing the pose, R, t, the conic correspondence and the scene plane position. Equation (3.6) encodes such a relationship, and it can be solved through a geometric construction with which we can estimate all possible pose solutions with substantial accuracy. Next we give a derivation of two equations that put two polynomial constraints on the plane position by employing the first assumption. And though we are looking for a geometric construction in order to list out all possible pose solutions, in section (4.1) we give an account of a way to optimize a cost function that encodes equation (3.6), so that one can register the shortcomings of such an approach, and we justify our motivation for it.
3.3 Mathematical implication of the first assumption
on scene plane π
Let us consider the arrangement given in section (1.1), where only one camera cam(O1, π1, K) and the scene plane π are considered. Let the conic C1 ∈ π1 be known. Given this setup we claim that there are finitely many positions of the
2 Traditionally, features have included points, but lines, conics and curves have subsequently been used for estimating the fundamental matrix.
Figure 3.1: Two cones, Q1 and Q2, describing a conic correspondence.

relative orientation between the circles C and C′. The series of steps to follow demonstrates a geometric construction for estimating the pose once the two circles are known in R^3. The two conics C′1 and C′2 are known, which give us two cones Q1 and Q2 respectively. We apply lemma (2) to these two cones to get two sets of plane positions of the form u = [m1 m2 m3]^T, denoted by U1 and U2. The two sets are defined as follows: U1 is the set of planes π_u1 such that the intersection of π_u1 with cone Q1 is a circle, and U2 is the set of planes π_u2 such that the intersection of π_u2 with cone Q2 is a circle. The following property can be inferred from the proof of lemma (2):

Lemma 3. If π_u1 ∈ U1, then π_αu1 ∈ U1, ∀α ∈ R − {0}. Similarly, if π_u2 ∈ U2, then π_αu2 ∈ U2, ∀α ∈ R − {0}.
Proof. Let us apply lemma (2) to C1 and its cone Q1. Inspecting the form of equations (3.16) and (3.17) so obtained, we see that they are homogeneous polynomials
Figure 3.2: Rigid body motion of the cone Q′2 onto Q2.
in three variables m1, m2 and m3. By a change of variables, we transform them into polynomials in the two variables n2 = −m2/m1 and n3 = −m3/m1. Hence scaling u1 by α does not have any effect on n2 and n3. Similarly we can argue for the conic C2 and its cone Q2. Thus we have the result that if π_u1 ∈ U1, then π_αu1 ∈ U1, ∀α ∈ R − {0}; similarly, if π_u2 ∈ U2, then π_αu2 ∈ U2, ∀α ∈ R − {0}.
For every plane π_u1 ∈ U1, we can always find a plane π_u2 ∈ U2 such that the radius of the circle of intersection of π_u1 with Q1 is the same as the radius of the circle of intersection of π_u2 with cone Q2.4 Let us define the radius of the intersection of the plane π_u1 with Q1 as r_u1 and the radius of the intersection of π_u2 with Q2 as r_u2. This means that for every π_u1 in U1 we have a π_u2 in U2 such that r_u1 = r_u2. This relationship defines a pair of planes. The pair is important, as every such pair can give a possible pose estimate, and for every plane π_u1 we have two planes in U2 which form such a pair, viz. π_u2i and π_−u2i for some index i ∈ N. Thus the set of all possible pairs of planes which can give us a solution can be defined as

U_sol = {(π_u1, π_u2) ∈ U1 × U2 | r_u1 = r_u2}.   (3.18)
4 This can be seen since every cone extends to infinity, so the radius can take any positive real value by appropriately positioning the plane.
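A naive way to approximate the set U_sol of (3.18) numerically, assuming the two sets of candidate planes and the radii of their circular sections have already been computed (a sketch, not the thesis's implementation):

```python
import numpy as np

def candidate_pairs(planes1, radii1, planes2, radii2, tol=1e-6):
    """Pair every plane position u1 (circular section of Q1, radius r_u1)
    with the plane positions u2 whose section of Q2 has the same radius,
    up to a tolerance: an approximation of U_sol in equation (3.18)."""
    pairs = []
    for u1, r1 in zip(planes1, radii1):
        for u2, r2 in zip(planes2, radii2):
            if abs(r1 - r2) < tol:
                pairs.append((u1, u2))
    return pairs
```

With exact arithmetic the equality r_u1 = r_u2 is strict; in floating point a small tolerance is needed, and each matched pair is then a candidate pose as described above.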
Then the distance of p2 from x_c1rot should be the same as that of p1 from x_c1, giving us the following polynomial equation:

‖p2 − x_c1rot‖ = ‖p1 − x_c1‖.   (3.27)

The equations (3.23), (3.24) and (3.27) encode the solution for the parameters R and t. The point x_c1rot obtained as a solution to the above three equations helps us in determining R through the following constraints:

R x_c1 = x_c1rot,
R p1 = p2.   (3.28)

Let us define

A = [x_c1  p1  (x_c1 × p1)] and B = [x_c1rot  p2  (x_c1rot × p2)],   (3.29)
where x_c1 × p1 denotes the vector cross-product of x_c1 and p1, and similarly for x_c1rot and p2. Then we estimate R as

R = B A^{-1},   (3.30)

with A and B both being invertible matrices, justifying the existence of R as obtained above. Now, from the way the solution for R is designed, we can ascertain the following from equations (3.24), (3.27) and (3.28):

‖x_c1‖ = ‖x_c1rot‖,
‖p1‖ = ‖p2‖,   (3.31)

and the angle between the vectors x_c1 and p1 is the same as the angle between the vectors x_c1rot and p2. With these facts in mind, one can easily prove, with A and B as defined in equation (3.29), that

A^T A = B^T B.

From this it is straightforward to note that the matrix R = B A^{-1} obtained in equation (3.30) is a rotation matrix. Once R is known, t is estimated as

t = x_c2 − R x_c1,   (3.32)
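The construction in equations (3.28)-(3.30) can be sketched as follows (illustrative code, not the thesis's; note that a rotation preserves cross products, so B = RA and B A^{-1} recovers R exactly):

```python
import numpy as np

def rotation_from_vector_pairs(xc1, p1, xc1rot, p2):
    """Solve R xc1 = xc1rot, R p1 = p2 via R = B A^{-1}, with the third
    column supplied by the cross products as in equation (3.29)."""
    A = np.column_stack([xc1, p1, np.cross(xc1, p1)])
    B = np.column_stack([xc1rot, p2, np.cross(xc1rot, p2)])
    return B @ np.linalg.inv(A)     # A invertible if xc1, p1 independent

# Sanity check with a known rotation (90 degrees about z):
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
xc1 = np.array([1.0, 2.0, 3.0]); p1 = np.array([0.0, 1.0, 1.0])
R = rotation_from_vector_pairs(xc1, p1, Rz @ xc1, Rz @ p1)
print(np.allclose(R, Rz))  # True
```

The translation would then follow from (3.32) as t = xc2 − R @ xc1.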
Discussion on an experiment on a synthetic dataset for the geometric approach
For this dataset, we estimate all thirty-two possible distinct pose solutions R and t. Based on the points on non-uniqueness of solutions discussed previously, we select two pose solutions, which are shown in figure (3.4).

Figure 3.4: Pose solutions.

The true camera positions are shown in green and yellow. The first camera is shown in green. For the second camera (shown in yellow), the rotation matrix R_true is defined through Euler angles about the three coordinate axes as 1° about the x axis, 10° about the y axis and −8° about the z axis. The translation vector is set to t_true = [1 −11 1]^T. Let R1, t1
and R2, t2 be the two best pose solutions selected through our algorithm. The camera for pose solution R1, t1 is shown in blue, and almost coincides with the true pose of the second camera; the camera for pose solution R2, t2 is shown in black. The departure of the rotation matrices of these two solutions, and of the true solution, from the identity matrix is

d(R_true, I3) = 0.3515,  d(R1, I3) = 0.3516  and  d(R2, I3) = 1.8472.

The distances are based on the geodesic distance between two points R1 and R2 in the group SO(3), [18]:

d(R1, R2) = ‖log(R1^T R2)‖_F,

where R1 and R2 are two rotation matrices.
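Since ‖log(R1^T R2)‖_F = √2·θ for a relative rotation by angle θ, and trace(R) = 1 + 2 cos θ, the distance can be computed from the trace alone. A small sketch (ours, not the thesis's):

```python
import numpy as np

def geodesic_distance(R1, R2):
    """d(R1, R2) = ||log(R1^T R2)||_F. For a relative rotation by angle
    theta this equals sqrt(2)*theta; theta is recovered from the trace
    via trace(R) = 1 + 2 cos(theta)."""
    R = R1.T @ R2
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.sqrt(2.0) * np.arccos(cos_theta)

# A 10 degree rotation about the x axis:
a = np.deg2rad(10.0)
Rx = np.array([[1.0, 0.0,       0.0      ],
               [0.0, np.cos(a), -np.sin(a)],
               [0.0, np.sin(a),  np.cos(a)]])
print(geodesic_distance(np.eye(3), Rx))
```

For a 10° rotation this gives approximately 0.2469, matching (up to rounding) the first row of table 3.1.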
Table 3.1: Results of the single stage geometric approach on a synthetic dataset. Here R_true and t_true denote the true values, and R and t denote the pose solution obtained through convergence of the gradient descent scheme.

| Angles w.r.t. x, y, z axes | t_true | Geodesic distance of R from R_true | Recovered t | Angle between t and t_true | Geodesic distance of R from I3 | Selected solution has smallest geodesic length? |
|---|---|---|---|---|---|---|
| 10°, 0°, 0° | (0.5, 0.1, 0.1) | 2.1 × 10^-4 | (0.50097, 0.1020, 0.0989) | 0.2428° | 0.2469 | yes |
| 10°, 20°, 0° | (0.7, −0.1, 1) | 7.9 × 10^-6 | (0.6993, −0.0999, 1.0000) | 0.0028° | 0.5513 | yes |
| 0°, 10°, −5° | (1, −3, 0.1) | 2.3 × 10^-4 | (0.9993, −2.9993, 0.1000) | 0.0080° | 0.2760 | yes |
| 1°, 10°, −8° | (1, −11, 1) | 1.3 × 10^-4 | (0.9960, −11.0096, 1.0049) | 0.0321° | 0.3154 | yes |
| 30°, 0°, 0° | (−0.1, −1, −3) | 0.1666 × 10^-4 | (−0.0914, −1.0642, −3.0429) | 0.8615° | 0.7239 | yes |
| 1°, −30°, −80° | (0.0891, −0.0980, −0.0178) | 3.9 × 10^-4 | (0.0900, −0.9853, −0.1790) | 0.0264° | 2.093 | no |
One more point to note is that we estimate the translation vector only up to a non-zero scalar multiple. Hence, for visualization purposes in figure (3.4), we scale it by the same scalar that scales the true translation of the second camera. For this case we select R1, t1 as the best possible pose solution, taking into consideration the observation that this solution has its rotation matrix closest to the identity matrix in the geodesic sense. This was one experiment on a synthetic dataset for our proposed approach. We
constructing conic representations in the camera coordinate system. One can deduce from this discussion that the primary reason for the substantial error in the pose solution for a real dataset is errors in the calibration process we have employed through the calibration toolbox. It is worth noting, however, that our algorithm gives more accurate results than the conventional optimization process on synthetic datasets.
3.6 Summary
This chapter forms the core of the thesis. We start with an introduction to two equations, derived in section (1), which relate the relative pose to a conic correspondence. Based on these two equations, we devise a geometric construction in an epipolar geometry framework simplified by two important assumptions regarding the scene conic and the plane containing the true translation vector. The geometric approach thus proposed is tested on both synthetic and real datasets. The results so obtained are compared, analyzed and discussed in order to explain the performance of our proposed method. In the next chapter (4) we consider two alternate approaches to pose estimation from one conic correspondence. These approaches differ from the geometric method taken up in this chapter in the way the pose solution is estimated: they are based on the optimization of cost functions appropriately modeled on the equations that relate the pose R, t to the elements of the epipolar geometry, H, e, C1, C2.
This chapter describes two techniques for pose estimation which we have considered at certain points, but whose results have not been as good as the ones obtained with the geometric approach, discussed and reported in chapter (3). The first technique is based on the same set of equations as the geometric approach, which means the two assumptions defined in section (3.2) of chapter (3) also hold true here; but we estimate the pose through a conventional optimization scheme instead of the geometric construction. This is described in the next section, (4.1). The second approach is based on a different idea, which can be seen to be loosely based on the work of Higgins, [5], Zhang, [7] and Luong, [8]: we optimize a cost function modeled for one conic correspondence and one point correspondence. The optimization schemes have been either the gradient descent method, implemented through calculation of the gradient vectors, or MATLAB's inbuilt methods like lsqnonlin(.). The results for both implementations are comparable; hence in section (4.1.1) we report results of experiments for the first approach through gradient descent.
4.1 Estimating R and t through optimization
The equations which define the dependence of R and t on the conics C1, C2 and the scene plane π are

(R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3},
w^T t = 0,   (4.1)

where u, C1, C2, w and K are known and R, t and µ are to be estimated. For the sake of brevity we write C′1 = K^T C1 K and C′2 = K^T C2 K. From this we define the
This allows us to use the above lemma (2) for C′1, giving us the vector u up to a scalar multiple. Hence, from equation (3.6), we can consider u as a known constant and have to estimate all elements of t. The vector w being constant, the unknown variables are Y, t and µ. The norm considered for matrices here is the Frobenius norm:

‖A‖_F = sqrt(trace(A^T A)).

We have replaced the rotation matrix R with a real matrix Y and the additional constraints Y^T Y = I3 and det(Y) = 1. The cost function has been optimized through the command lsqnonlin(.) in MATLAB, [21]. Results of sample experiments with this approach are listed in section (4.1.1). With a random starting point, the behavior of the algorithm is as expected for a conventional optimization technique: after a certain value of the cost function is achieved, the algorithm tends to get stuck in a local minimum. Additionally, the final value achieved upon convergence depends on the starting point. For these reasons, it is practically infeasible to estimate a unique solution in the form of a global minimum of the cost function. This is evident from the results listed in table (4.1) of section (4.1.1). With a starting point close to the true value, the algorithm converges to a solution which is considerably close to the true value; but with a starting point far from the true values, the point reached upon convergence is far from the true solution.
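A rough Python analogue of this lsqnonlin(.) setup, with the orthogonality and determinant constraints added as penalty residuals (a sketch under our own variable packing, not the thesis's MATLAB code; C1p and C2p stand for K^T C1 K and K^T C2 K):

```python
import numpy as np

def residuals(x, u, w, C1p, C2p):
    """Stack the 6 independent entries of the symmetric matrix equation
    in (4.1), the plane constraint w^T t, and penalties for Y^T Y = I3
    and det(Y) = 1. x packs Y (9), t (3) and mu (1): the thirteen
    variables mentioned in the text."""
    Y = x[:9].reshape(3, 3); t = x[9:12]; mu = x[12]
    H = Y - np.outer(t, u)
    M = H.T @ C2p @ H - mu * C1p
    iu = np.triu_indices(3)                      # symmetric: 6 entries
    return np.concatenate([M[iu],
                           [w @ t],
                           (Y.T @ Y - np.eye(3)).ravel(),
                           [np.linalg.det(Y) - 1.0]])

# With SciPy available, the analogue of MATLAB's lsqnonlin(.) would be:
#   from scipy.optimize import least_squares
#   sol = least_squares(residuals, x0, args=(u, w, C1p, C2p))
```

As in the MATLAB experiments, the converged point depends heavily on the initial guess x0, and nothing forces convergence to a global minimum.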
One can also perform optimization by explicit computation of the gradients. The gradient vectors ∂E(Y, t, µ)/∂Y, ∂E(Y, t, µ)/∂t and ∂E(Y, t, µ)/∂µ are:

∂E(Y, t, µ)/∂Y = 4C′2 Y Y^T C′2 + 2(t^T C′2 Y u) C′2 t u^T + 2‖u‖^2 C′2 t t^T C′2 Y + 2C′2 Y L + C′2 t u^T L^T + L′ + 4Y Y^T Y − 4Y + 2 det(Y) Y^{-T} (det(Y) − 1),

∂E(Y, t, µ)/∂t = 2(t^T C′2 Y u) C′2 Y u + 2‖u‖^2 C′2^2 t + 4‖u‖^4 C′2 t − 4µ (u^T C′1 u) C′2 t + C′2 Y u + 2C′2 t u^T Y^T C′2 Y u + ‖u‖^2 (t^T C′2 t C′2 Y u + 2C′2 t t^T C′2 Y u) − µ C′2 Y C′1 u + 2(w^T t) w,

∂E(Y, t, µ)/∂µ = −2 t^T C′2 t u^T C′1 u + 2µ trace(C′1^2) − trace(Y^T C′2 Y C′1) − t^T C′2 Y C′1 u,   (4.3)

where L = (t^T C′2 t) u u^T − µ C′1 and L′ = ∂(t^T C′2 Y Y^T C′2 Y u)/∂Y. L′ has not been simplified further here, for it does not have a concise representation in matrix form; it can be simplified using symbolic computation toolboxes like MATLAB or Maple, or its analytic expression can be derived through some tedious matrix algebra.
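For instance, with a symbolic toolbox the un-simplified term L′ can be produced mechanically by differentiating the scalar entry-by-entry (a sketch using SymPy with small illustrative values; the thesis mentions MATLAB or Maple instead):

```python
import sympy as sp

# L' = d(t^T C2' Y Y^T C2' Y u)/dY, obtained symbolically. The numeric
# t, u and symmetric C2' below are illustrative placeholders.
Y = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f"y{i}{j}"))
t = sp.Matrix([1, 2, 3])
u = sp.Matrix([0, 1, 1])
C2p = sp.Matrix([[2, 0, 1], [0, 3, 0], [1, 0, 4]])  # any symmetric matrix

scalar = (t.T * C2p * Y * Y.T * C2p * Y * u)[0, 0]
Lprime = sp.Matrix(3, 3, lambda i, j: sp.diff(scalar, Y[i, j]))
```

Each entry of Lprime is the exact partial derivative with respect to the corresponding entry of Y, which is all a gradient descent implementation needs.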
purely optimization scheme, it is not feasible to estimate all possible pose solutions.
The function lsqnonlin(.) of the optimization toolbox in MATLAB offers two types of optimization algorithms: one is the Levenberg-Marquardt algorithm, [22], and the other is the trust region method. The two vary in a manner which is not particularly important for our problem at hand. What is crucial is the fact that these algorithms do not always converge to the global minimum, and even if they do, one can never fully ascertain how many distinct points of global minimum our cost function can attain. A second problem is that the cost function we are attempting to minimize is built from a set of thirteen polynomials in thirteen variables. Theoretically such a system has multiple solutions, and through a pure optimization approach it is not feasible to estimate all of the possible pose solutions.
4.2 Multi-stage approach to pose estimation: a comparison
Another approach which we have given some thought to is based on the two stage dependence of R and t on point and conic correspondences. This relationship depends on a property of the fundamental matrix that defines point correspondences between the two image planes π1 and π2 as

a ↔ b ⇔ b^T F a = 0,  a ∈ π1, b ∈ π2.   (4.4)
A fundamental matrix can be decomposed as F = [e]× H, as introduced in section (2.4). Thus, given n point correspondences {a_i ↔ b_i} as defined above, one can think of minimizing the error

f(F) = Σ_{i=1}^{n} (b_i^T F a_i)^2.
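Minimizing f(F) subject to ‖F‖_F = 1 is the classical linear method: each correspondence contributes one linear constraint on the entries of F, and the minimizer is a least singular vector. A sketch (ours, not the thesis's):

```python
import numpy as np

def estimate_F_linear(pts1, pts2):
    """Minimize f(F) = sum_i (b_i^T F a_i)^2 over ||F||_F = 1 by the
    classical linear (eight-point style) method: each correspondence
    gives one row of a design matrix, and F is its least right
    singular vector, then projected to rank 2."""
    rows = [np.outer(b, a).ravel() for a, b in zip(pts1, pts2)]
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    F = Vt[-1].reshape(3, 3)
    U, S, Vt2 = np.linalg.svd(F)    # enforce rank 2, as F must satisfy
    S[2] = 0.0
    return U @ np.diag(S) @ Vt2
```

In practice the coordinates are normalized first (Hartley's normalization) to condition the design matrix; that step is omitted here for brevity.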
This gives us the fundamental matrix F, from which we have the essential matrix E = K^T F K under the assumption that the calibration matrix K is known. Once E is known, R and t can be estimated through the relative orientation algorithm suggested by Hartley in [10]. This has led to a method of pose estimation from point correspondences which has been well studied in the past and successfully implemented. Theoretically, seven point correspondences of this form are sufficient to estimate F. The points involved in these correspondences need
to be in general position; by general position we mean that no three points should lie on the same line in either of the two planes.1 Similar to these approaches, we suggest a method for estimating F from point and conic correspondences. To begin with, let us consider one point correspondence,

a ↔ b,  a ∈ π1, b ∈ π2,   (4.5)

and one conic correspondence,

C1 ↔ C2,  C1 ∈ π1, C2 ∈ π2.

Let the scene conic C lie in the scene plane π. Then C1 and C2 are the images of C formed by the two cameras on the image planes π1 and π2 respectively, thus defining the above conic correspondence. The two cameras image the same scene plane, and hence there exists a homography between the two image planes, constructed by point transfer between the image planes through π. We defined this point transfer in section (2.1.1). Projective invariance of conics implies that the same homography ought to transform C1 into C2. If this homography is denoted by H, we have
H^T C2 H = µ C1.   (4.6)
This equation introduces a constraint on H in the form of the zero set of six homogeneous polynomials in nine homogeneous variables,2 which are the elements of the vector h. Let h = vec(H), where vec(.) is the usual vectorization operation in linear algebra that transforms an n × n matrix into the n^2 dimensional vector formed by stacking up the columns of the matrix. Equation (4.6) is thus transformed into the set of five polynomials given next:
transformed into a set of five polynomials given next:
f : R9 → R5 : f (h) =
hT S1h
hT S2h
hT S3h
hT S4h
−hT S5h
= 05×1, (4.7)
1 In fact, a set of three collinear points in one plane would invariably be mapped to three collinear points in the other plane.
2 The conic and homography representations are in homogeneous coordinates, due to which we estimate H up to a non-zero scalar multiple.
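A correspondence of the form (4.6) is straightforward to verify numerically, estimating µ in the least-squares sense (an illustrative sketch with fabricated H, C1 and C2, not data from the thesis):

```python
import numpy as np

def conic_correspondence_residual(H, C1, C2):
    """Check H^T C2 H = mu * C1 up to scale: estimate mu as the best
    scalar in the Frobenius sense and return the residual norm
    (zero for a true correspondence)."""
    M = H.T @ C2 @ H
    mu = np.sum(M * C1) / np.sum(C1 * C1)
    return np.linalg.norm(M - mu * C1)

# A correspondence fabricated to satisfy (4.6) exactly:
H = np.array([[1.0, 0.2, 0.1], [0.0, 1.1, 0.3], [0.0, 0.0, 1.0]])
C1 = np.diag([1.0, 1.0, -1.0])          # a circle
Hinv = np.linalg.inv(H)
C2 = 2.0 * Hinv.T @ C1 @ Hinv           # then H^T C2 H = 2 * C1
print(conic_correspondence_residual(H, C1, C2))  # ~0 (machine precision)
```

Such a residual is exactly what the polynomial system (4.7) measures, entry by entry.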
In conclusion, we note that the geometric approach to pose estimation from one conic correspondence gives us accurate pose solutions, with errors of the order of 10^-4. The idea for this approach rests on two important assumptions: that the scene conic is a circle, and that the translation vector lies in a known plane. With these two assumptions the geometry is highly simplified, due to which we are able to employ the computation toolbox in MATLAB to solve the simplified set of polynomials and obtain all possible pose solutions. This helps in estimating the finite set of all possible pose solutions. Next, an observation is made that the pose solution with the rotation matrix closest to the identity matrix is the best approximation to the true value. This observation helps in selecting one particular pose solution as the final solution to the pose estimation problem. With experiments on synthetic data, we show that this observation holds true for rotation matrices close enough to the identity matrix; for larger distances, we find the observation failing. This raises an important question which can form a part of future work: can a threshold be computed analytically such that, for all cases in which the distance of the rotation matrix from the identity matrix is less than this threshold, the observation holds true? If not, then we need to find another way to select one solution out of the finite set of pose solutions estimated through our geometric approach. Another sure way is to use one point correspondence; but as mentioned in chapter (3), we need to select a point correspondence which can be realized by only one pose out of all the solutions, and such a selection does not seem possible in all general cases, at least to our knowledge. So the search for a universal method to pick one pose solution is still an open problem.
Secondly, the results for the real dataset have been marred by inaccuracies in camera calibration. We have not pinpointed the source of error, but have shown that the error in the pose solution estimated through the geometric approach is solely due to
Lemma 4. We can define a subtraction map

φ : A × A → V,  (a, b) ↦ ν,  ∀a, b ∈ A, ν ∈ V,

where ν is the vector satisfying

ν + a = b

as per the definition of the '+' operator above. Thus we can define φ(a, b) = b − a ≡ →ab = ν. We prove here that this map is onto V and many-one.
Proof. Suppose that for two vectors v, u in V we have φ(a, b) = v and φ(a, b) = u. Then

v + a = b and u + a = b
⇒ v + a = u + a
⇒ (v − u) + a = a.   (A.1)

Further, the uniqueness property says that 0 is the only vector such that

0 + a = a, ∀a ∈ A.

Hence we have

v − u = 0 ⇒ v = u.

This proves that the map φ : A × A → V is well defined. Also, for every vector v in V and every point a in A, we can find a point b in A such that b = v + a. Hence

∀v ∈ V, ∃b ∈ A : φ(a, b) = v.
This proves that the map φ is onto. Moreover, we can find distinct points a1, a2, b1, b2 in A such that v = φ(a1, b1) = φ(a2, b2) for at least one v in V. This proves that φ is many-one.
Lemma 5. For three points a, b, c in A, φ(a, b) + φ(b, c) = φ(a, c), where φ is the map defined in lemma (4) above.
We shall accept this lemma without proof here.
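For completeness, a short sketch of the argument, using only the uniqueness property established in lemma (4) and the axioms of the '+' action:

```latex
\begin{proof}[Sketch]
Let $\nu_1=\varphi(a,b)$ and $\nu_2=\varphi(b,c)$, so that $\nu_1+a=b$ and
$\nu_2+b=c$. Then
\[
(\nu_1+\nu_2)+a \;=\; \nu_2+(\nu_1+a) \;=\; \nu_2+b \;=\; c ,
\]
and since $\varphi(a,c)$ is by definition the unique vector $\nu$ with
$\nu+a=c$, it follows that $\varphi(a,b)+\varphi(b,c)=\varphi(a,c)$.
\end{proof}
```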
Thus, with the definition of an affine space in place, we note that an affine space is a set of points together with such a vector space, and we represent it as (A, V, φ) where V is the underlying vector space, or alternately as (X, →X, φ) where →X is the underlying vector space. Henceforth in this text we will use
The projective space of dimension n, denoted P(E^{n+1}), is obtained by taking the quotient of an (n + 1)-dimensional vector space, E^{n+1} \ {0}, with respect to the equivalence relation

x ∼ x′ ⇔ ∃λ ∈ R \ {0} : x = λx′,  ∀x, x′ ∈ E^{n+1} \ {0}.   (A.6)

Here we assume E^{n+1} is a vector space over R. In some cases we may generalize to the complex field C, and this will be mentioned as required. One can verify that ∼ is indeed an equivalence relation here.
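The relation (A.6) can be tested numerically: x ∼ x′ exactly when all 2×2 minors x_i x′_j − x_j x′_i vanish, i.e. when [x | x′] has rank 1. A small sketch (ours):

```python
import numpy as np

def proj_equal(x, y, tol=1e-9):
    """x ~ y in P(E^{n+1}) iff x = lambda*y for some nonzero lambda,
    i.e. the antisymmetric part of x y^T vanishes (all 2x2 minors of
    the matrix [x | y] are zero)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.linalg.norm(np.outer(x, y) - np.outer(y, x)) < tol

print(proj_equal([1, 2, 3], [-2, -4, -6]))  # True: same projective point
print(proj_equal([1, 2, 3], [1, 2, 4]))     # False
```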
Many other equivalent definitions of P(E^{n+1}) are found in the literature. One might view each equivalence class as a 1-dimensional subspace of E^{n+1}; thus P(E^{n+1}) can be seen as the set of all 1-dimensional subspaces of E^{n+1}, or equivalently the set of all lines passing through the origin in E^{n+1}. These are different ways of looking at the definition, but essentially the same structure is obtained. Alternate ways of describing a projective geometry are interesting enough not to miss; hence, just for the sake of a lateral view:
A projective space is a triplet (P, L, I) such that

1. Any pair of distinct points is joined by a unique line.
2. Given any four points A, B, C and D with no three collinear, if AB intersects CD, then AC intersects BD.3
3. Every line is incident with at least three distinct points.
4. There exist three non-collinear points.

Here P is a set of points, L is a set of lines and I is an incidence structure which tells us which line is incident on which point and which point is incident on which line. From these axioms we can derive many other properties of a projective space, including its invariants; but they being out of the scope of this text, we skip them. Beutelspacher in [28] and Casse in [29] give an extensive treatment of this topic.

3 This axiom leads to the much talked about property of a projective plane that any two lines must intersect at a point.
subspace (k < n) into an l dimensional projective subspace where l ≤ k. In other words, a plane in E^n would be transformed into either a plane, a line or a point in E^m, whereas a homography preserves dimensions: a line is transformed into a line, and a plane into a plane. Here a projective line is defined as a 1-dimensional subspace of the projective space, a plane as a 2-dimensional subspace, a point as a 0-dimensional subspace, and so forth. The reason a homography preserves dimensions is worth noting. A homography, or collineation (a term used for homography in certain literature), is a projective mapping associated with an isomorphism →f : E^{n+1} → E′^{n+1}. Hence each vector of a basis of any subspace of E^{n+1} is uniquely mapped to a unique vector in E′^{n+1}, and a set of linearly independent vectors {m_i}_{1≤i≤k} is mapped to a set of linearly independent vectors {m′_i}_{1≤i≤k}. Hence a subspace of k dimensions spanned by {m_i}_{1≤i≤k} is uniquely mapped to a subspace of k dimensions spanned by {m′_i}_{1≤i≤k}.
More on subspaces
Given two subspaces U and V of P(E^{n+1}), the span of U and V, denoted ⟨U ∪ V⟩, is the smallest projective subspace containing U ∪ V (or, seen differently, the intersection of all subspaces of P(E^{n+1}) containing U ∪ V). One can then easily show that ⟨U ∪ V⟩ has the associated vector subspace F + G of E^{n+1}, where F, G are the subspaces of E^{n+1} associated with U, V.
A.2.5 Affine completion
Generalizing affine geometry we obtain projective geometry. Specifically, we show here the extension of an affine space to obtain a projective space. Consider an n dimensional affine space (X, →X, φ). Assuming {m_i}_{1≤i≤n+1} to be a basis of the affine space X, we can denote every point m ∈ X by taking the vector →m1m = →m = (x1, ..., xn) and representing it by its coordinates in the given basis. Extending this coordinate representation by appending a 1, we have →m_p = p((x1, ..., xn, 1)) = p([→m, 1]).5 Hence, as there is a one-one correspondence →m ↔ →m_p, we can represent every point a in P(E^{n+1}) not at infinity by a unique point m in X; and for every point m in X we have a unique point a in P(E^{n+1}).
5 p is the same canonical projection defined in equation (A.8).
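The lift-and-project correspondence above amounts to appending a 1 to affine coordinates and dividing by the last homogeneous coordinate on the way back; scale does not matter. A minimal sketch (function names are my own, not from the thesis):

```python
import numpy as np

def to_projective(m):
    """Lift affine coordinates (x1, ..., xn) to homogeneous (x1, ..., xn, 1)."""
    return np.append(m, 1.0)

def to_affine(mp):
    """Recover the affine point from a finite projective point (last coord != 0)."""
    if np.isclose(mp[-1], 0.0):
        raise ValueError("point at infinity: no affine representative")
    return mp[:-1] / mp[-1]

m = np.array([2.0, -1.0, 3.0])
mp = to_projective(m)        # homogeneous representative [2, -1, 3, 1]
same = to_affine(5.0 * mp)   # any nonzero scalar multiple maps back to m
print(same)
```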
where p4 is the rightmost column of P. The camera center in the world coordinate frame is defined as a vector C such that PC = 0. For a finite camera with non-singular M, C is the point represented as

C = [ −M^{−1} p4 ; 1 ].

In short, a finite camera is one whose center C is a finite point in the 3D world coordinate system. 2
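The center formula can be checked numerically: form a finite camera, solve for C = [ −M^{−1} p4 ; 1 ], and verify PC = 0. A minimal sketch, assuming an illustrative calibration K, rotation R and translation t (these values are placeholders, not from the thesis):

```python
import numpy as np

# A hypothetical finite camera P = K [R | t]; all values are illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, -1.0, 2.0])
P = K @ np.hstack([R, t[:, None]])

M, p4 = P[:, :3], P[:, 3]
# C = [-M^{-1} p4; 1]: M is non-singular, so the center is a finite point.
C = np.append(-np.linalg.solve(M, p4), 1.0)
print(P @ C)  # ~ [0, 0, 0]: the center satisfies PC = 0
```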
B.1.1 Elements of a finite projective camera
Assuming that we have a finite camera at hand, the camera projection matrix P = [ M | p4 ] is dissected into the following elements:
1. Column points: The leftmost 3 columns of P, namely p1, p2, p3, represent the images of the 3 principal directions X, Y, Z of the world coordinate system, and p4 represents the image of the origin of the world coordinate system. This is so because in P^3 a direction is represented by a point at infinity in that direction; hence the X direction is represented by the point (1, 0, 0, 0)^T, whose image is P (1, 0, 0, 0)^T = p1.
2. Row vectors: Denoting the rows of P as r1, r2, r3, the principal plane is the plane parallel to the image plane and passing through the camera center. Hence all points that project to image points represented by (x, y, 0) lie on this plane. Thus

PX = [ r1 ; r2 ; r3 ] X = (x, y, 0)^T,

and hence r3 X = 0. Thus r3 is the corresponding row representing the principal plane. Similarly we can see that the other two rows represent the planes which project to the X and Y axes of the image plane. They are known as axis planes.
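Both interpretations can be illustrated for a generic camera matrix: the first column is the image of the point at infinity in the X direction, and any point on the plane r3 projects to a point with vanishing third image coordinate. A sketch with an illustrative random P:

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.standard_normal((3, 4))  # a generic finite camera matrix (illustrative)

# Column points: p1 is the image of the point at infinity in the X direction.
X_inf = np.array([1.0, 0.0, 0.0, 0.0])
assert np.allclose(P @ X_inf, P[:, 0])

# Row planes: a world point X with r3 . X = 0 lies on the principal plane
# and projects to (x, y, 0), a point at infinity of the image plane.
r1, r2, r3 = P
X = np.linalg.svd(r3[None, :])[2][-1]  # a null vector of r3: r3 . X = 0
img = P @ X
print(img[2])  # ~ 0: the third image coordinate vanishes
```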
B.2 Infinite Camera
An infinite camera is one whose center is at infinity. Using the notation of the previous section, this means that M is a singular matrix. Applying the condition PC = 0, we get the camera center as

C = [ d_{3×1} ; 0 ],

where d is a null vector of M, so that PC = M d = 0.
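Numerically, the infinite-camera center is found in the null space of M. A minimal sketch with a hypothetical singular M (an affine-camera-like example, chosen for illustration only):

```python
import numpy as np

# A hypothetical camera with singular M (rank 2: last row is zero).
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
p4 = np.array([0.1, -0.2, 1.0])
P = np.hstack([M, p4[:, None]])

# The center C = [d; 0], with d a null vector of M, satisfies PC = 0.
d = np.array([0.0, 0.0, 1.0])  # M d = 0
C = np.append(d, 0.0)
print(P @ C)  # [0, 0, 0]: the center is a point at infinity
```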
2 Can we say that a point at infinity in the 3D world coordinate system is also a point at infinity in the 3D camera coordinate system?
This section contains mathematical proofs of certain statements claimed in various sections of the thesis. While some proofs may look trivial and others less so, a rigorous mathematical backbone rests on airtight arguments and reasoning. Hence we aspire to lay down the relevant proofs with utmost rigor.
Lemma 17. The zero set of the function f in equation (4.7) defines the set of valid values of h. We hypothesize that this set of points

X = { h ∈ R^9 | f(h) = 0_5 }

defines an implicit manifold of dimension four. In other words, the Jacobian of f,

J_X(h) = 2 [ h^T S1 ; h^T S2 ; h^T S3 ; h^T S4 ; h^T S5 ]_{5×9},   (C.1)

is a matrix of rank five for all nonzero values of the vector h ∈ R^9, where the Si are nine-dimensional quadrics, i.e. real symmetric 9×9 matrices, defined in section (4.2).
Proof. Let us assume that the five row vectors are linearly dependent. Then there exist scalars αi, i = 1, 2, ..., 5, not all zero, such that Σ_{i=1}^{5} αi h^T Si = 0. Using the definition of Si, i = 1, 2, ..., 5, and partitioning h as h = [ h1 ; h2 ; h3 ] with hi ∈ R^3, we can write

h^T S1 = [ h1^T C2 , 0_{1×3} , −p1 h3^T C2 ],
h^T S2 = [ 0_{1×3} , h2^T C2 , −p2 h3^T C2 ],
h^T S3 = [ h2^T C2 / 2 , h1^T C2 / 2 , −p3 h3^T C2 ],
h^T S4 = [ h3^T C2 / 2 , 0_{1×3} , h1^T C2 / 2 − p4 h3^T C2 ],
h^T S5 = [ 0_{1×3} , h3^T C2 / 2 , h2^T C2 / 2 − p5 h3^T C2 ].
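The rank-five claim can be spot-checked numerically. The sketch below assembles symmetric 9×9 matrices S1, ..., S5 whose products h^T Si reproduce the block expressions above, using a placeholder symmetric conic matrix C2 and placeholder scalars p1, ..., p5 (the actual values come from section 4.2 and are not reproduced here), and evaluates the rank of J_X(h) at a random h:

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder data: C2 a symmetric 3x3 matrix, p1..p5 scalars (illustrative).
A = rng.standard_normal((3, 3))
C2 = A + A.T
p = rng.standard_normal(5)
Z = np.zeros((3, 3))

# Symmetric 9x9 quadrics Si assembled so that h^T Si matches the block
# expressions in the proof, with h = [h1; h2; h3], hi in R^3.
S = [
    np.block([[C2, Z, Z],     [Z, Z, Z],      [Z, Z, -p[0] * C2]]),
    np.block([[Z, Z, Z],      [Z, C2, Z],     [Z, Z, -p[1] * C2]]),
    np.block([[Z, C2 / 2, Z], [C2 / 2, Z, Z], [Z, Z, -p[2] * C2]]),
    np.block([[Z, Z, C2 / 2], [Z, Z, Z],      [C2 / 2, Z, -p[3] * C2]]),
    np.block([[Z, Z, Z],      [Z, Z, C2 / 2], [Z, C2 / 2, -p[4] * C2]]),
]

h = rng.standard_normal(9)
J = 2 * np.vstack([h @ Si for Si in S])  # the 5x9 Jacobian J_X(h)
print(np.linalg.matrix_rank(J))          # 5 for generic h, C2, p
```

For generic data the five rows are independent: the first block column forces the coefficients of rows 1, 3, 4 to vanish, and the second block column then forces those of rows 2 and 5 to vanish.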