Pose estimation from one conic correspondence

by

Snehal I. Bhayani
201211008

A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of
MASTER OF TECHNOLOGY
in
INFORMATION AND COMMUNICATION TECHNOLOGY
to
DHIRUBHAI AMBANI INSTITUTE OF INFORMATION AND COMMUNICATION TECHNOLOGY

June, 2014
7/25/2019 Pose estimation from one conic correspondence
i) the thesis comprises my original work towards the degree of Master of
Technology in Information and Communication Technology at Dhirubhai
Ambani Institute of Information and Communication Technology and has
not been submitted elsewhere for a degree,
ii) due acknowledgment has been made in the text to all the reference material
used.
Snehal Bhayani
Certificate
This is to certify that the thesis work entitled Pose estimation from one conic correspondence has been carried out by Snehal I. Bhayani for the degree of Master of
Technology in Information and Communication Technology at Dhirubhai Ambani
Institute of Information and Communication Technology under my supervision.
Prof. Aditya Tatu
Thesis Supervisor
In this thesis we attempt to solve the problem of camera pose estimation from one
conic correspondence by exploiting the epipolar geometry. For this we make two
important assumptions which simplify the geometry further. The assumptions
are, (a) the scene conic is a circle, and (b) the translation vector is contained in a
known plane. These two assumptions are justified by noting that many artifacts
in scenes (especially indoor scenes) contain circles, which are wholly in front of
the camera. Additionally, there is a good possibility that the plane which contains
the translation vector would be known. Through the epipolar geometry framework,
a matrix equation is defined which relates the camera pose to one conic
correspondence and the normal vector defining the scene plane. Through the
assumptions, we simplify the system of polynomials in such a way that the task
of solving a set of seven simultaneous polynomial equations in seven variables is
transformed into the task of solving only two polynomials in two variables at a
time. For this we design a geometric construction. This method gives
a finite set of camera pose solutions. We test our propositions through synthetic
datasets and suggest an observation which helps in selecting a unique
solution from the finite set of pose solutions. For synthetic datasets, the solution
so obtained is quite accurate, with an error of the order of $10^{-4}$; for real datasets,
the solution is erroneous due to errors in the camera calibration data we have. We
justify this fact through an experiment. Additionally, the formulation of the
above-mentioned seven equations relating the pose to the conic correspondence
and the scene plane position helps us understand how the relative pose establishes
point and conic correspondences between the two images. We then compare the
performance of our geometric approach with the conventional way of optimizing
a cost function and show that the geometric approach gives us more accurate pose
solutions.
Of the four classes above, our work is about the fourth type of pose estimation
problem, the 2D perspective-2D perspective pose estimation problem. This approach
requires an overview of a two-camera setup. Hence, before we go further, a general
arrangement of the two-camera setup is introduced in the next section (1.1). The
mathematical spaces considered throughout this report would be euclidean spaces1,
unless specified otherwise.
1.1 Two camera setup
The purpose of introducing such an arrangement is two-fold. Firstly, it introduces
the various feature artifacts which would be used for establishing correspondences
between two images (like points, lines, conics etc.) and the varying
mathematical relationships amongst them, and secondly, the same framework sets
up the idea of multiple-view geometry (termed epipolar geometry by Hartley and Zisserman in [1]).
Figure 1.1: A setup describing epipolar geometry
1. One can wonder: we are dealing with projective spaces, and still the spaces considered are not projective spaces. The reason is, as shown in section (A.2.5), the projective space is obtained by "adding" points at infinity to an affine space (here we can consider a euclidean space as an affine space with origin as the point $[0, 0, 0]^T$). For practical purposes we assume the points we deal with are "not at infinity". Hence the projective space is reduced to an affine (or a euclidean) space.
A two-camera setup is depicted in figure (1.1). Here a pin-hole camera is decomposed
into a projection center O (a point in R3), an image plane π and its calibration
matrix K. This model is mainly of theoretical interest, but for our application
we see that this highly simplified model works well enough to be able to
ignore various practical issues in a camera model. Such a camera model shall be
denoted as cam(O, π, K). The calibration matrix houses quantities that determine
the relation between the position of a point x ∈ π in the 2D image coordinate system
with respect to its position in the 3D global coordinate system of the camera. Let
O⊥ be the intersection of the line from O perpendicular to π with π. Then the
matrix K gives us the distance of the plane π from the center O and the position of
the point O⊥ in terms of the local coordinate system of π. More on the structure
of K can be read from appendix (B.3).
As shown in the figure we have a pair of cameras cam(O1, π1, K) and cam(O2, π2, K)
with their centers at points O1 and O2 in R3. The calibration matrices are the same for
both cameras. The image planes associated with cameras O1 and O2 are π1
and π2 respectively. Now a quadratic curve is defined as the zero set of a second
order polynomial
$$Ax^2 + By^2 + Cxy + Dx + Ey + F = 0.$$
This polynomial can be written in matrix form as
$$\begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = 0. \quad (1.3)$$
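As a quick sanity check, the matrix form (1.3) takes only a few lines to implement. The sketch below is in Python with NumPy, used here purely for illustration (the thesis's own experiments use MATLAB); the helper names are my own:

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix of Ax^2 + By^2 + Cxy + Dx + Ey + F = 0, as in (1.3)."""
    return np.array([[A,   C/2, D/2],
                     [C/2, B,   E/2],
                     [D/2, E/2, F  ]])

def on_conic(M, x, y, tol=1e-9):
    """Check whether (x, y) satisfies [x y 1] M [x y 1]^T = 0."""
    p = np.array([x, y, 1.0])
    return abs(p @ M @ p) < tol

# Unit circle x^2 + y^2 - 1 = 0: A = B = 1, F = -1, all other coefficients 0.
M = conic_matrix(1, 1, 0, 0, 0, -1)
print(on_conic(M, 1.0, 0.0))   # a point on the circle
print(on_conic(M, 0.5, 0.5))   # a point off the circle
```

Note that the quadratic form is evaluated on the homogeneous point $[x, y, 1]^T$, which is exactly why the matrix notation extends cleanly to projective coordinates later.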
Using dual notation, henceforth we shall use the same notation, C, for a quadratic
curve and for the matrix representation of its defining polynomial. For the above
defined curve, C means the matrix
$$\begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix}$$
and also the set of points defined by the solution to equation (1.3). In the computer
vision community such a quadratic curve is termed a conic.
The conics in the two image planes are assumed to be C1 in π1 and C2 in π2.
Further, let a third plane π be oriented in R3, containing the scene
if and only if x and y are the images of the same scene point4. Another matrix,
discussed and taken up subsequently by noted researchers, is termed the fundamental matrix. This matrix is the un-calibrated counterpart of the essential matrix:
$$E = K^T F K.$$
This means the same point correspondence is defined, but the point measurements
don't need the calibration matrix to be known. A detailed explanation and treatment
of both these matrices can be found in the textbook [1] by Hartley and Zisserman.
These equations form the backbone of our thesis.
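The relation $E = K^T F K$ can be exercised numerically. The sketch below (Python/NumPy; the calibration matrix and pose are synthetic values of my own choosing) builds F from a known essential matrix $E = [t]_\times R$ (cf. [5]) and recovers E back:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

# Hypothetical calibration matrix and relative pose.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
R = np.eye(3)
t = np.array([0.3, -0.1, 1.0])

E_true = skew(t) @ R                 # essential matrix E = [t]x R
Ki = np.linalg.inv(K)
F = Ki.T @ E_true @ Ki               # its un-calibrated counterpart
E = K.T @ F @ K                      # E = K^T F K recovers the essential matrix
print(np.allclose(E, E_true))        # True
```

The round trip works because $K^T K^{-T} = I$ and $K^{-1} K = I$; note also that both E and F inherit rank 2 from $[t]_\times$.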
The relative orientation of plane π1 with respect to plane π2 in E3 is assumed to
be a rotation R and a translation t. These quantities are such that when a point y ∈ π2
is rotated and translated through R and t, we get the corresponding
point x ∈ π1 as x = Ry + t. Thus in the figure given above, if O1 is at the origin,
$O_1 = [0, 0, 0]^T$, then $O_2 = -R^T t$. The points of intersection of the line through O1 and O2
with planes π2 and π1 are known as the epipoles e1 and e2 of cameras 2 and 1 respectively.
The essential matrix, as introduced above, can be decomposed in terms of R and t [5]:
$$E = [t]_\times R.$$
The fundamental matrix in terms of R and t is decomposed as [1]:
$$F = [e_2]_\times H.$$
In lemma (1) in chapter (3), we prove the following relationship between the pose
parameters R, t and the variables of epipolar geometry, H, e:
$$t = \lambda K^{-1} e, \qquad R = \lambda^{-1}\left(K^{-1} H K + K^{-1} e v^T K\right), \quad (1.6)$$
where R, t, e and K have their usual meanings and λ is a real scaling factor. v
represents the position of the scene plane.
4. x and y are measured in image planes, assuming that the cameras are calibrated.
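Equation (1.6) determines R and t only up to the real scale λ. Since R must be a rotation, det(λR) = λ³ fixes λ as a cube root. A minimal sketch of this recovery step (Python/NumPy; the matrix M below stands in for $K^{-1}HK + K^{-1}ev^TK$, and all values are synthetic, not taken from a real image pair):

```python
import numpy as np

def recover_pose(M, e, K):
    """Recover (R, t) from M = K^-1 H K + K^-1 e v^T K, as in eq. (1.6).

    M = lambda * R and det(R) = 1, so lambda is the cube root of det(M)."""
    lam = np.cbrt(np.linalg.det(M))
    R = M / lam
    t = lam * np.linalg.inv(K) @ e        # t = lambda * K^-1 e
    return R, t, lam

# Synthetic check: build M and e from a known rotation, scale and translation.
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.],
                   [np.sin(theta),  np.cos(theta), 0.],
                   [0., 0., 1.]])
lam_true = 2.5
K = np.diag([800., 800., 1.])             # hypothetical calibration matrix
t_true = np.array([0.2, 0.1, 1.0])
e = K @ t_true / lam_true                 # consistent with t = lambda * K^-1 e
M = lam_true * R_true

R, t, lam = recover_pose(M, e, K)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

The cube-root trick is a standard normalization step; the thesis's lemma (1) supplies the matrix M itself from H, e, K and v.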
With the above setup in mind, one can now define pose estimation through
mathematical quantities. Considering the camera centers O1 and O2, the translation
vector t is defined as:
$$t = \vec{O}_1 - R\,\vec{O}_2, \quad (1.7)$$
where $\vec{O}_1$ and $\vec{O}_2$ are the vector representations of points O1 and O2 in R3.
R is the rotation matrix which maps image plane π2 to a position parallel to that
of π1 upon rotation. In other words, if π1 is defined as $u_1^T x + 1 = 0$, ∀x ∈ R3, and
π2 as $u_2^T x + 1 = 0$, ∀x ∈ R3, then R can be estimated as the rotation of the unit
vector $u_2/\|u_2\|$ to the unit vector $u_1/\|u_1\|$.
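One way to realize the rotation taking $u_2/\|u_2\|$ to $u_1/\|u_1\|$ is Rodrigues' formula about the axis $u_2 \times u_1$; a sketch (Python/NumPy, my own construction, which ignores the degenerate 180° case):

```python
import numpy as np

def rotation_between(u2, u1):
    """Rotation R with R (u2/|u2|) = u1/|u1|, via Rodrigues' formula."""
    a = u2 / np.linalg.norm(u2)
    b = u1 / np.linalg.norm(u1)
    v = np.cross(a, b)                 # rotation axis (unnormalized)
    c = a @ b                          # cosine of the angle between normals
    if np.isclose(c, -1.0):            # opposite normals: axis is ambiguous
        raise ValueError("180-degree case needs an explicit axis choice")
    vx = np.array([[0., -v[2], v[1]],
                   [v[2], 0., -v[0]],
                   [-v[1], v[0], 0.]])
    # R = I + [v]x + [v]x^2 / (1 + c), since (1 - c)/|v|^2 = 1/(1 + c).
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

R = rotation_between(np.array([0., 0., 1.]), np.array([1., 1., 1.]))
print(np.allclose(R @ [0., 0., 1.], np.array([1., 1., 1.]) / np.sqrt(3)))  # True
```

This is only one of infinitely many rotations mapping one unit vector to another (any further rotation about $u_1$ also works); the minimal-angle choice above is a common convention.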
Thus pose estimation in this thesis is defined to be the estimation of R and t, given
the two images or the two image planes, π1 and π2. In this thesis, the assumption is
that the cameras are calibrated and the camera calibration matrix K is the same for
both cameras. More assumptions would follow in later chapters, some for more
accuracy and some for simplicity.
1.3 Background work
The history of computer vision is rich and full of brilliant insights. Its association
with projective geometry is even richer. We have listed out the four main classes
of the pose estimation problem along the lines of a paper by Haralick, [4]. In a later
paper, Haralick et al. in [6] work on a similar kind of problem but exclusively
in a euclidean space, where they look at a closed form solution to pose estimation
from a set of three point perspective projections. This problem would be of
the type defined as the 2D perspective-3D pose estimation problem in point (3).
Photogrammetry deals with these problems in detail and has its applications to
computer vision as well. Higgins, in [5], introduced the essential matrix of equation
(1.5) primarily to tackle the problem of relative orientation, or in other words the
pose estimation problem, and in the same paper gives an algorithm to estimate
R and t from E. A comprehensive study on the fundamental matrix
and its related treatment has been carried out by other researchers: Zhang in [7]
and by Luong and Faugeras in [8]. For the second and more crucial part of the
problem, [5, 8, 7, 4, 9, 10] use point correspondences to estimate either the fundamental
matrix or the essential matrix. As against this, Heyden and Kahl in [11]
have used conic correspondences to estimate the fundamental matrix. The authors
give a brief survey of the various features (like points, lines, curves and many
more) used in the past to estimate the fundamental matrix. They also state the
reasons why conic correspondences are preferred by certain researchers over
the conventional point and line correspondences. The primary motivation is the
fact that many man-made objects contain a curve which is either a conic or can
be approximated as being formed of conics. Another reason is a property of projective
transformations: any projective transformation maps a conic into another
conic (also termed the projective invariance property). A projective transformation
is a pointwise mapping between two projective spaces.5 Ji et al. in [12] have
used a mix of various geometric features like points, lines and conics, to estimate
the pose of a camera with respect to the object coordinate frame. Towards the
same objective, they have considered a linear approach that combines geometric
features at different levels of complexity, thus improving the stability and
accuracy of the solution. The approach estimates the pose parameters from point
correspondences, line correspondences and 2D ellipse-3D circle correspondences.
For the circle-ellipse correspondence, they have obtained two polynomials which
define two constraints on the relative pose. But the authors have assumed that the
radius of the circle is known, and the property of the circle as a conic section is
not used completely, as the focus is more on using as many feature correspondences
as available. The same problem of 2D perspective-3D pose estimation is
worked on by Wang in [13]. The approach as proposed in the paper amounts
to estimating the pose of the camera from a single view and under the assumption
that the intrinsic camera parameters are known. The approach uses the image
of an absolute conic6 to estimate the pose of the camera. An added assumption,
that the image of the center of the 3D circle is known, is employed for the
minimal case where the image of only one circle is known. But it hasn't been explicitly
justified when such an assumption would hold true, though some methods of
estimating the same have been suggested.
5. The definition of a projective transformation in strict mathematical terms is given in appendix (A.2.3).
6. The absolute conic is an imaginary conic at infinity that consists of purely imaginary points. The image of the conic is shown to depend only on the calibration matrix.
Our contribution primarily lies in an attempt to solve the problem from a slightly
different perspective. It has motivated two different approaches for pose estimation.
The first approach is based on the equation
$$(R - tu^T)^T K^T C_2 K (R - tu^T) - \mu K^T C_1 K = 0_{3\times3},$$
where R, t, C1, C2 have their usual meanings, u ∈ R3 is the vector7 defining
the scene plane that contains the conic C, and µ is a scaling factor introduced to
account for the homogeneous quantities C1 and C2 in the equation. The above
equation is derived by combining epipolar geometry with one conic correspondence.
Intuitively, this equation describes the relationship between the pose R, t
and the pair of conics in correspondence, through the normal vector of the scene plane,
u. This constraint can be further simplified if we assume that the conic C in the scene,
whose images C1 and C2 are known, is a circle8 and that the translation vector lies
in a specified plane (defined by a normal vector w). These assumptions reduce the
number of unknown variables in the previous equation to get
$$(R - tu^T)^T K^T C_2 K (R - tu^T) - \mu K^T C_1 K = 0_{3\times3}, \qquad w^T t = 0, \quad (1.8)$$
where R, t and µ are to be estimated and C1, C2, u, K and w are known. A straightforward
way to solve the above equations is to write a gradient descent algorithm
via explicit calculation of gradient vectors, or to use MATLAB's inbuilt functions for
optimization on a cost function modeled from equation (1.8). Unfortunately, any
optimization method can, in general, get stuck in a local minimum, and through
experiments on synthetic datasets we have found that the algorithm does get stuck
at a point which is nowhere close to the true value. Such an experiment and its
result is given in section (4.1) of chapter (4). A second problem is that there is
no sure-shot way of figuring out how many global minima our system of
polynomials has. These facts make the behavior of the algorithm strongly dependent
on the starting points of the parameters. An estimate closer to the true value
helps the algorithm behave nicely and converge accurately to the true solution.
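The optimization baseline just described can be sketched with a general-purpose minimizer. Everything below is synthetic and illustrative (SciPy stands in for MATLAB's optimization toolbox; C1 is constructed so that a known (R, t, µ) satisfies (1.8) exactly, and the axis-angle parameterization of R is my own choice, not the thesis's):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

K = np.diag([2., 2., 1.])               # hypothetical calibration matrix
u = np.array([0.1, 0.2, -0.5])          # known scene-plane vector
w = np.array([0., 1., 0.])              # plane known to contain t
C2 = np.diag([1., 1., -1.])             # conic in the second image

# Ground truth chosen so that (1.8) holds exactly: w^T t = 0, and C1 is
# derived from the first equation of (1.8).
rv_true = np.array([0.05, -0.02, 0.1])
R_true = Rotation.from_rotvec(rv_true).as_matrix()
t_true = np.array([0.3, 0.0, 0.1])
mu_true = 1.0
A = R_true - np.outer(t_true, u)
Ki = np.linalg.inv(K)
C1 = Ki.T @ (A.T @ K.T @ C2 @ K @ A) @ Ki / mu_true

def cost(p):
    """Squared residual of (1.8); p packs (rotation vector, t, mu)."""
    R = Rotation.from_rotvec(p[:3]).as_matrix()
    t, mu = p[3:6], p[6]
    A = R - np.outer(t, u)
    res = A.T @ K.T @ C2 @ K @ A - mu * K.T @ C1 @ K
    return np.sum(res**2) + (w @ t)**2

p_true = np.concatenate([rv_true, t_true, [mu_true]])
p0 = p_true + 1e-2                      # start near the truth
sol = minimize(cost, p0, method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-16})
print(cost(p_true), sol.fun <= cost(p0))
```

Starting near the truth, the residual is driven toward zero; started far away, exactly the local-minimum behavior described above appears, which is what motivates the geometric construction.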
But with a starting point quite far off, the solution achieved upon convergence is
not at all close to the true value. To get around this problem, we design
7. The vector u defines the plane through the plane equation $x^T u + 1 = 0$, ∀x ∈ R3.
8. By its requirement of being a circle we mean a circle in the global coordinate system in R3.
a geometric construction9 such that one can estimate all possible pose solutions to
a given problem. For this we transform the problem of estimating pose solutions
through optimization of the cost function of equation (1.8) to a problem that involves
finding solutions to two pairs of polynomials, with each pair depending only on
two variables. The first pair consists of a degree-three and a degree-four polynomial,
whereas the second pair consists of quadratic polynomials. These polynomials
can be accurately solved using the symbolic computation toolbox available with
MATLAB. The advantage here is that at a time we have only two polynomials in
two variables to solve, which is a considerable improvement over the conventional
optimization task of solving seven polynomials in seven variables at
the same time. This is the reason for the high accuracy our approach achieves.
Further, solving these polynomials we get the pose as a finite set of all possible
solutions in the form of R and t. The process follows a geometric construction and
does not need optimization, which in turn helps improve the accuracy of the results.
The construction further improves our understanding of the above equation.
The equation (1.8) relates the image and camera coordinate systems through
a conic correspondence. As a set of observations, we propose some points on how
to pick one solution out of the finite set of all possible solutions obtained from
this approach. We perform experiments on both real and synthetic data for this
geometric approach to pose estimation. For synthetic datasets, we find that the
pose solutions thus estimated are accurate to an error of the order of $10^{-4}$. Especially
for datasets with a rotation matrix close to the identity matrix, the observations
help us select a solution which is closest to the true values. But the observations
don't hold true for datasets with rotation matrices considerably far from the identity
matrix. For such cases, we propose using one additional point correspondence,
which is beyond the scope of this thesis. For real datasets the estimated pose solution
is not accurate enough. But through a related experiment, we demonstrate
that the error in the pose solution is primarily due to the error in the camera calibration
process.
1.5 Layout of thesis
In chapter (2) we introduce the basics of epipolar geometry. It deals with the setup
of a two-camera system but from a projective geometry point of view. The prerequisites
of epipolar geometry are projective, affine and euclidean spaces, whose
9. For the time being, we consider only the euclidean coordinate system.
This is the geometric way of defining a point correspondence. One point worth
noting is that the camera setup of the figure (2.1) is in R3. If lines $\overrightarrow{qx_1}$ and $\overrightarrow{ox_2}$ are
parallel, they don't intersect in a point in E3, but in a point x∞, well defined in the
projective space P(E4), which by equation (A.13) is decomposed as
$$P(E^4) = E^3 \sqcup P(E^3),$$
where $\sqcup$ denotes the union of two disjoint sets. Thus the point x∞ lies in P(E3).
With this decomposition in mind, we can ensure that the point correspondence
between two images is well defined. This way of defining a point correspondence
motivates a special homography between two images. We call it special because
such a homography would be constructed through the scene plane. As shown
later, this mapping is a part of a more general mapping between these two images
through the scene. In the next section we intuitively describe this homography mapping
through a scene plane, and after that algebraically define the more general
mapping through scene points.
2.1.1 Geometric definition of homography between two images
Based on the way a point correspondence between two images through a scene
plane, π, is described, one can infer that such a mapping would be bijective. Distinct
positions for π would give different mappings, unless the planes are parallel
to each other. One point to note is that, given a pair of images and a scene, not
every point in the first image forms a correspondence pair with a point in the second
image through a homography realized through a scene plane. Only the points
which are projections of points on the scene plane, in both of the image planes,
form correspondence pairs through the homography mapping generated
through π. This is termed point transfer through the scene plane π by Hartley and
Zisserman in chapter (9), [1]. But the scene points (irrespective of whether they lie
on the scene plane or not) in general also set up point correspondences between the
two images. We look at this mapping in an algebraic formulation next.
2. Epipolar line of a point c ∈ π2: The line l in π1 obtained by the intersection of
the epipolar plane of c, as defined above, with the image plane π1 is known
as the epipolar line of c. This line is the set of all points of π1 which can be
mapped to c through the two-camera setup described above.

To conclude, each point x ∈ π2 has a unique epipolar line l ∈ π1 associated with it,
and the same epipolar plane is also the epipolar plane of all points x ∈ π1
such that x ∈ l. The fundamental matrix F encodes this correspondence:
$$l = F x, \quad (2.3)$$
where l is a vector representation of the line l in P(E3). Referring to section (A.2) of
appendix (A), we say that every line l in P(E3) corresponds to a plane through the
origin in E3, and the normal vector of this plane is denoted by l here. Hence this
representation is unique up to a non-zero scalar multiple, which conforms well
with the relationship given above. This is a point-line correspondence between
the two images that solely depends on the relative orientation of the two cameras.
It is just another perspective on the point-point correspondence of equation
(2.2). The geometric description of homography we saw in the previous section is
a constrained version of the present mapping, as is evident from figure (2.2). In other
words, the point correspondence pairs through the geometric description are a subset
of the correspondence pairs through the algebraic definition we discussed in the
present section. In summary, this section builds the framework of epipolar geometry
through which two images have point mappings realized through scene
points.
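Equation (2.3) is easy to exercise numerically. The sketch below (Python/NumPy; the pose and points are synthetic, with K = I for brevity so that $F = [t]_\times R$) builds F for a pure sideways translation and checks that every candidate correspondence of a point lies on its epipolar line:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

# Identity rotation, pure sideways translation, K = I:
# F = K^-T [t]x R K^-1 reduces to [t]x.
t = np.array([1., 0., 0.])
F = skew(t)

x = np.array([0.2, 0.1, 1.0])        # homogeneous image point
l = F @ x                            # its epipolar line, eq. (2.3)
# In this setup, corresponding points differ only by a horizontal
# disparity d, and each of them satisfies the line equation l . x' = 0.
for d in (0.0, 0.3, 1.7):
    xp = np.array([0.2 + d, 0.1, 1.0])
    print(np.isclose(l @ xp, 0.0))   # True for every disparity
```

The scalar ambiguity mentioned above is visible here: scaling F (or l) by any non-zero constant leaves the zero set of $l \cdot x' = 0$, and hence the epipolar line, unchanged.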
2.1.3 Some properties of the fundamental matrix, F
The fundamental matrix is of rank 2 and unique up to a non-zero real scalar. Certain
decompositions and properties of this fundamental matrix are listed below for
quick reference. Detailed discussions on properties and different interpretations
can be found in [1, 8, 7]:

1. If P1 and P2 are the projection matrices1 of the two cameras, then $F = [e_2]_\times P_2 P_1^\dagger$.

2. If the relative orientation and position between the two cameras are defined

1. A projection matrix of a camera is discussed in appendix (B.1).
3. If the scene contains a plane π and the point mapping through the plane is
defined by the homography H, then
$$F = [e]_\times H,$$
where e is the epipole of the image plane π2 of the second camera and H is
defined such that
$$x' = Hx, \quad x \in \pi_2, \ x' \in \pi_1. \quad (2.5)$$
The second property is helpful for an intuitive grasp of the setup. The fundamental
matrix maps points in one image to lines in the other, albeit up to a certain
ambiguity. The points are specified in local coordinate systems2. The decomposition,
though, is specified in terms of R and t, which can be seen as being external,
or specified in the absolute coordinate system, as compared to the image and scene
planes involved. This enables us to infer, from an algebraic point of view, how
a change in R and/or t affects the point mapping. For more clarification,
we can put equations (2.2) and (2.4) together:
$$x'^T K^{-T} [t]_\times R K^{-1} x = 0. \quad (2.6)$$
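The epipolar constraint of (2.6) can be verified directly on synthetic data: project a scene point into both cameras and evaluate the bilinear form. All numbers below are made up for illustration:

```python
import numpy as np

def project(K, R, t, X):
    """Pin-hole projection of scene point X into a camera with pose (R, t)."""
    x = K @ (R @ X + t)
    return x / x[2]                   # normalize the homogeneous coordinate

# Camera 1 at the origin, camera 2 rotated about y and translated.
K = np.diag([500., 500., 1.])
th = 0.2
R = np.array([[np.cos(th), 0., np.sin(th)],
              [0., 1., 0.],
              [-np.sin(th), 0., np.cos(th)]])
t = np.array([0.5, 0.1, 0.0])

tx = np.array([[0., -t[2], t[1]],     # [t]_x
               [t[2], 0., -t[0]],
               [-t[1], t[0], 0.]])

for X in (np.array([0.3, -0.2, 4.0]), np.array([-1.0, 0.5, 6.0])):
    x1 = project(K, np.eye(3), np.zeros(3), X)   # image in camera 1
    x2 = project(K, R, t, X)                     # image in camera 2
    # Epipolar constraint (2.6): x2^T K^-T [t]x R K^-1 x1 = 0.
    val = x2 @ np.linalg.inv(K).T @ tx @ R @ np.linalg.inv(K) @ x1
    print(np.isclose(val, 0.0))
```

The constraint vanishes because $K^{-1}x_2 \propto RX + t$ and $K^{-1}x_1 \propto X$, and both $(RX)^T (t \times RX)$ and $t^T (t \times RX)$ are identically zero.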
Intuitively, we see that this equation describes a relationship between point
mappings and the relative orientation between the two cameras. Such an interpretation
will be useful for the approach we have devised for pose estimation, as
the aim is to estimate R and t from various feature correspondences. For deeper
insight, there are two questions related to equations (2.5) and (2.6) which need
to be answered in chapter (3). These answers help in a better understanding
of the single stage geometric approach for pose estimation, taken up in chapter
(3). Next we take up both questions one by one.
2.1.4 Question on homography generated in a one camera setup
Before taking up the problem with two cameras, we consider a situation with just
one camera and the scene plane π 1. For a given relative orientation of the camera
2. To every plane (image or object) we fix an internal cartesian coordinate system. When we talk of the calibration matrix being fixed, we mean the coordinate system as well.
3 with respect to the scene plane π1, we can have a homography H representing
the mapping π → π14. Thus, given a relative orientation of the camera and the
planes, we can construct a unique homography. This statement is well proved and
discussed in depth in the textbook [1] by Hartley and Zisserman, and we
accept it here without proof. The actual question is the inverse of the above statement:
"For a given homography, can we orient the camera and scene plane in order to induce the
given matrix?". If we have fixed coordinate systems in both planes, the given
homography actually translates to a euclidean problem. The homography thus
gives us four point correspondences5 between the two planes π1 and π:
$$a_i \to b_i, \quad a_i \in \pi, \ b_i \in \pi_1, \quad 1 \le i \le 4. \quad (2.7)$$
Thus the problem is about finding an orientation between the camera and the scene
plane such that the point correspondences mentioned above are obtained. One
can show that not every given homography (or set of four point correspondences) can be
represented by an arrangement of the camera and the scene plane. It amounts to finding
the right representation and at the same time reducing the number of unknowns
and the number of equations in play. Once the basic arrangement is laid out, the
3. Following the discussion on cameras in section (1.1), by camera we mean a model comprising the centre O1, its image plane π and the calibration matrix K, fixed as well as known.
4. One more point to note is that we can fix any coordinate system in the π1 and π planes. Thereafter, a change of coordinate system in any of the planes amounts to multiplying the obtained homography matrix by an invertible matrix of coordinate transformation. In fact, the calibration matrix exists for the same reason: to transform coordinates from one coordinate system to another.
5. The point correspondences are also assumed to have been measured in the pre-decided euclidean coordinate system.
Such a euclidean arrangement is illustrated in figure (2.3). Here we have the
camera cam(O, π, K). A calibrated camera means that the relation between the
local coordinate system of π and the global coordinate system is fixed. In figure
(2.3), the origin of the coordinate system in π is $O_{plane}$ and the origin of the global
coordinate system is O; the line from O to $O_{plane}$ is perpendicular to π, and the
x-y axes of the global coordinate system are parallel to the $x_{plane}$-$y_{plane}$ axes of
the plane π. This information fixes the orientation of the plane π with respect
to the origin O and also the relationship of a point $P = [u_{plane}, v_{plane}]^T$ with the
global coordinate system. P as defined in the global coordinate system will be
$P \equiv [u_{plane}, v_{plane}, f]^T$, where f is the distance of O from the plane π. In terms
of polynomials, we can specify the same setup as a fixture of three quantities, viz.
f and the distances of two arbitrary points6 P1, P2 ∈ π from O. These constraints
fix the orientation and the position of the plane π with respect to the origin O. The
calibration matrix encodes this information in the form of the upper triangular matrix
K, but the equations help us understand the conditions that control the image
formation in a simple pin-hole camera.
With the basic setup in place, the point correspondences can now be defined as
mentioned before. Given four such point correspondences as labelled in equation
(2.7), we have to orient the plane π1 relative to the camera cam(O, π, K).
Orienting π1 in R3 to construct the desired homography
The way the point correspondence between π and π1 is defined, points ai, i = 1, ..., 4
in plane π are mapped to points bi = λiai in π1, where λi is a scaling factor for point
ai. The points λiai then have to lie in the same plane, π1. Further, the points bi are
measured in a local coordinate system, and hence their positions are represented
by five distance constraints. In other words, five inter-point distances dist(bi, bj)
are known, where dist(x, y) represents the euclidean distance between two points x
and y in R2. Hence we have six polynomial constraints in four variables, λi, i =
1, ..., 4. This proves the fact we stated before: not all homography mappings
can be realized by a relative orientation of the scene plane with respect to the given
camera. We have an interesting result to further reinforce this fact, by Poncelet,
6. The two arbitrary points ought to be specified in the local coordinate system. So we can select $P_1 = [1, 0]^T$ and $P_2 = [0, 1]^T$.
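The six constraints on the λi described above can be written out symbolically. A sketch in SymPy (the direction vectors ai and the distance symbols are hypothetical placeholders, chosen only to show the structure): the four scaled points λiai must be coplanar, and the five known inter-point distances among the bi must be reproduced, giving six polynomial equations in four unknowns.

```python
import sympy as sp

l = sp.symbols('lambda1:5', positive=True)   # the four unknown scales

# Hypothetical homogeneous directions a_i (rays through the camera centre).
a = [sp.Matrix(v) for v in ([0.1, 0.2, 1], [0.4, -0.1, 1],
                            [-0.3, 0.3, 1], [0.2, 0.5, 1])]
b = [l[i] * a[i] for i in range(4)]          # scene points b_i = lambda_i a_i

# One coplanarity constraint: the three difference vectors are degenerate.
coplanar = sp.Matrix.hstack(b[1] - b[0], b[2] - b[0], b[3] - b[0]).det()

# Five squared-distance constraints dist(b_i, b_j)^2 = d_ij^2, with the
# d_ij known from the local measurements (symbolic placeholders here).
d = sp.symbols('d01 d02 d03 d12 d13')
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
dists = [(b[i] - b[j]).dot(b[i] - b[j]) - dij**2
         for (i, j), dij in zip(pairs, d)]

eqs = [coplanar] + dists
print(len(eqs), len(l))   # 6 equations, 4 unknowns
```

With six equations in four unknowns the system is overdetermined, which is exactly why a generic homography need not be realizable by any placement of the scene plane.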
Figure 2.4: Geometric description of Poncelet's theorem, figure from [2].
Poncelet's theorem: A version of the famous Poncelet's theorem states that
"When a planar scene is the central projection of another plane (the image plane), the
planar scene and the image plane stay in perspective correspondence even if the
scene plane is rotated about the line of intersection of the image and the scene
planes. The center of perspectivity moves in a circle in the plane perpendicular to
this line of intersection".

For our requirements we can translate the same theorem as "Given an orientation
of the scene plane and the camera (consisting of the center, image plane and the calibration
matrix, with a fixed coordinate system) inducing the given homography, any further
change in the relative orientation of the scene plane with respect to the camera will change
the homography."

This fact is an important point towards building up the original problem. For it
shows that, in order to maintain the same homography in spite of a change in orientation
of the scene plane with respect to the image plane, the camera centre also needs
to move with respect to the image plane (specifically, in a circle). This means that if we
attempt to keep the distance of the camera centre from the image plane fixed, no two different
orientations of the scene plane can give the same homography.
2.1.5 Question on homography generated in a two camera setup
Adding one more camera to the above arrangement, we have two cameras and
the scene plane, π. We assume π1 and π2 are the image planes of the two cameras. Any
for one approach to pose estimation, taken up at the end of chapter (3). But it is
purely an optimization task, though there is some possibility of future work on it.
This thesis focuses on a different approach, one that involves a single defining equation
instead of the two here. We can combine these two equations by eliminating e. The
equation so formed forms the basis of our geometric approach. This equation has
been solved through an optimization tool as well, but with the results not good enough,
we create a geometric design and estimate R and t. A discussion on this design is
given in section (3.4) of chapter (3).
2.2 Conics
The epipolar geometry is laid out in the previous section. It defines the point correspondences between the two images. Such point correspondences lead to correspondences between more complex features of images. The main focus of this thesis being the use of conic correspondences for pose estimation, it is worthwhile investigating the formulation of conics, their basic properties and the mathematical definition of a conic correspondence. A conic is a second degree curve in a plane, described as the solution set of a quadratic equation:
ax^2 + bxy + cy^2 + dx + ey + f = 0.   (2.12)
This is the equation in the Euclidean plane. Its corresponding representation in P(E^3) is obtained by homogenizing equation (2.12) using a third variable as:

ax^2 + bxy + cy^2 + dxz + eyz + fz^2 = 0.   (2.13)
The same equation can be encoded using a symmetric matrix:

            [ a    b/2  d/2 ] [ x ]
[ x  y  z ] [ b/2  c    e/2 ] [ y ] = 0.   (2.14)
            [ d/2  e/2  f   ] [ z ]
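The encoding in (2.14) is easy to check numerically. A minimal sketch (not from the thesis; the function name is illustrative), building the symmetric matrix from the coefficients of (2.12):

```python
import numpy as np

def conic_matrix(a, b, c, d, e, f):
    """Symmetric matrix C encoding ax^2 + bxy + cy^2 + dx + ey + f = 0,
    so that a homogeneous point X = (x, y, 1) lies on the conic
    iff X^T C X = 0, as in equation (2.14)."""
    return np.array([[a,     b / 2, d / 2],
                     [b / 2, c,     e / 2],
                     [d / 2, e / 2, f    ]])

# Unit circle: x^2 + y^2 - 1 = 0 (a=1, b=0, c=1, d=0, e=0, f=-1).
C = conic_matrix(1, 0, 1, 0, 0, -1)
X = np.array([1.0, 0.0, 1.0])   # homogeneous point (1, 0)
print(X @ C @ X)                # 0.0: the point lies on the circle
```

Scaling C by any non-zero scalar leaves the solution set unchanged, which is the "up to scale" ambiguity used throughout.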
The matrix

C = [ a    b/2  d/2 ]
    [ b/2  c    e/2 ]
    [ d/2  e/2  f   ]

defines the conic up to a non-zero scalar multiple. Using dual notation, we use C to mean both the set of points of the conic and its defining equation. We use this notation to classify conics by inspecting the matrix C. Ideally, in a Euclidean plane, we can have either degenerate
point correspondences between points in π1 and π2 through the points of the scene plane are mapped through the homography H. The above equation is proved in a different way by Hartley, [1]. It can be shown with some algebra that u represents the position of the scene plane π. If π is represented by the solution set of the equation

π ≡ { [x y z]^T ∈ R^3 | m1 x + m2 y + m3 z + 1 = 0, m1, m2, m3 ∈ R },   (3.4)

then u is the vector [m1 m2 m3]^T, uniquely defining the position of π. Henceforth we shall alternately denote the plane π defined by a vector u as above by the notation π_u.
3.2 Conic correspondence
The scene plane π contains a conic C whose images are C1 and C2 in planes π1 and π2 respectively. These two conics are measured in the local coordinate system. Then we have the transformed conics C′1 = K^T C1 K and C′2 = K^T C2 K as representations of the two conics in a transformed local coordinate system, in which the x_plane and y_plane axes are aligned to the x and y axes of the camera's coordinate system and the origin O_plane is the point of intersection of the normal vector with π. We can use the equation of conic correspondence stated in equation (2.15),

H^T C2 H = µ C1,

to form the new constraint

(R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3}.   (3.5)
This equation transforms the problem of pose estimation from a conic correspondence into a problem of estimating R, t and u from a set of five equations. Though the matrix equation has six polynomial equations in all, its elements being unique up to a non-zero scalar multiple, we have five equations; or, by introducing one more variable µ, we have six equations but the additional variable µ. As evident from equation (3.5), t and u appear in the form of a scalar product, so we need to estimate u only up to a scalar multiple. This reduces the variable set to R, t, u = [1 n2 n3]^T and µ: nine parameters in all from six equations. In order to reduce the number of unknowns further, we introduce two assumptions:

1. The scene conic C is a circle in the global coordinate system.
2. The translation vector lies in a particular plane. Let us denote this plane by π_w and its defining normal vector by w.
The first assumption is easily realized by indoor scenes and, to an extent, by outdoor scenes. For example, a scene comprising household artifacts is quite likely to contain circular cross-sections in the form of bottle-mouths, cups, glasses, doorknobs, objects of art and craft that contain circular arcs and curves, holes in walls, etc. The complete circular curves need not be visible; partially visible curves can be fit with considerable accuracy. In most cases, the circular objects in scenes would be solids, more like circular discs, which implies that they would be wholly in front of the camera while imaging. This means that these circles would always be projected as ellipses. Many tools are available for detecting an ellipse in an image and then fitting a polynomial to it; the one we use for our experiments was developed by Prasad, [15]. The second assumption is not as commonly fulfilled as the first one. But it often happens that the camera is leveled and held on a tripod stand, even as it moves. This fact can be used to estimate the plane that contains the translation vector. Hence in such cases, the plane containing the translation vector is already known.
These two assumptions further reduce the number of variables in equation (3.5). In the next section (3.3) we prove lemma (2), which says that u can be estimated as a finite set of solutions, each unique up to a non-zero scalar multiple. For this we derive two equations, (3.16) and (3.17). This means that out of the three parameters of the vector u, two are estimated through the lemma. Thus we are left with seven parameters and six equations. Next, the second assumption reduces the parameter set by one more variable. In summary, by employing the two assumptions, we have a fully determined set of seven polynomials in seven variables. Rewriting the constraint equations, we get

(R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3},
w^T t = 0,   (3.6)

where u, C1, C2 and w are known and R, t and µ are to be estimated. If we consider the geometry described by the above equations, we can intuitively note that all of the seven polynomials are algebraically independent. This means that for non-trivial cases, unique solution(s) exist. These equations form the backbone of the approach to pose estimation we propose next.
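A candidate pose can be checked against the constraint set (3.6) numerically. A hypothetical sketch (variable names are ours; C1p and C2p stand for the calibrated conics K^T C1 K and K^T C2 K):

```python
import numpy as np

def pose_residuals(R, t, mu, u, w, C1p, C2p):
    """Residuals of the constraint set (3.6). Returns the 3x3 matrix
    residual (zero at a true pose) and the plane constraint w^T t."""
    H = R - np.outer(t, u)          # the matrix R - t u^T
    M = H.T @ C2p @ H - mu * C1p    # should vanish at a true pose
    return M, float(w @ t)
```

At the true R, t, u and µ, both returned residuals vanish; for any other candidate they measure the violation of (3.6).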
The conventional approach to pose estimation is a two stage task of estimating F from feature2 correspondences and then R and t from F. As mentioned earlier in the section on background work in chapter (1), there is a lot of literature on methods for estimating F from point or conic correspondences. But most of these methods treat F as a single mathematical entity to be estimated at once. For the second stage of estimating R and t, we have an algorithm proposed by Hartley in [10], based on the singular value decomposition of the fundamental matrix. A point worth noting here is that the estimation of R and t from F gives little insight into the two camera setup in Euclidean space. Additionally, the first assumption is not encoded directly in the fundamental matrix formulation, or in the way F is estimated from point correspondences. These are the reasons why we look for a different approach to pose estimation from a conic correspondence. The assumptions affect the methods we adopt because the first assumption, that the scene conic is a circle, places direct constraints on the position of the scene plane. These constraints on the plane position are not evident from the treatment of F as one quantity, or even when we estimate it directly from point or conic correspondences. The idea here is to break up the fundamental matrix in such a way that we have a direct relationship among the quantities describing the pose, R, t, the conic correspondence and the scene plane position. Equation (3.6) encodes such a relationship, and it can be solved through a geometric construction with which we can estimate all possible pose solutions with substantial accuracy. Next we give a derivation of two equations that put two polynomial constraints on the plane position by employing the first assumption. And though we are looking for a geometric construction in order to list out all possible pose solutions, in section (4.1) we give an account of a way to optimize a cost function that encodes equation (3.6), so that one can register the shortcomings of such an approach, and we justify our motivation for it.
3.3 Mathematical implication of the first assumption
on scene plane π
Let us consider the arrangement given in section (1.1), where only one camera cam(O1, π1, K) and the scene plane π are considered. Let the conic C1 ∈ π1 be known. Given this setup we claim that there are finitely many positions of the
2 Traditionally, features have included points, but lines, conics and curves have subsequently been used for estimating the fundamental matrix.
Figure 3.1: Two cones, Q1 and Q2, describing a conic correspondence.

relative orientation between the circles C and C′. The series of steps to follow demonstrates a geometric construction for estimating the pose once the two circles are known in R^3. The two conics C′1 and C′2 are known, which give us two cones Q1 and Q2 respectively. We apply lemma (2) to these two cones to get two sets of plane positions of the form u = [m1 m2 m3]^T, denoted by U1 and U2. The two sets are defined as follows: U1 is the set of planes π_u1 such that the intersection of π_u1 with cone Q1 is a circle, and U2 is the set of planes π_u2 such that the intersection of π_u2 with cone Q2 is a circle. The following property can be inferred from the proof of lemma (2):

Lemma 3. If π_u1 ∈ U1, then π_αu1 ∈ U1, ∀α ∈ R − {0}. Similarly, if π_u2 ∈ U2, then π_αu2 ∈ U2, ∀α ∈ R − {0}.
Proof. Let us apply lemma (2) to C1 and its cone Q1. Inspecting the form of equations (3.16) and (3.17) so obtained, we see that they are homogeneous polynomials
Figure 3.2: Rigid body motion of the cone Q′2 onto Q2.
in three variables m1, m2 and m3. By a change of variables, we transform them into polynomials in the two variables n2 = −m2/m1 and n3 = −m3/m1. Hence scaling u1 by α does not have any effect on n2 and n3. Similarly we can argue for the conic C2 and its cone Q2. Thus we have the result that if π_u1 ∈ U1, then π_αu1 ∈ U1, ∀α ∈ R − {0}; similarly, if π_u2 ∈ U2, then π_αu2 ∈ U2, ∀α ∈ R − {0}.
For every plane π_u1 ∈ U1, we can always find a plane π_u2 ∈ U2 such that the radius of the circle of intersection of π_u1 with Q1 is the same as the radius of the circle of intersection of π_u2 with cone Q2.4 Let us define the radius of the intersection of the plane π_u1 with Q1 as r_u1 and the radius of the intersection of π_u2 with Q2 as r_u2. This means that for every π_u1 in U1 we have a π_u2 in U2 such that r_u1 = r_u2. This relationship defines a pair of planes. The pair is important, as every such pair can give a possible pose estimate, and for every plane π_u1 we have two planes in U2 which form such a pair, viz. π_u2i and π_−u2i for some index i ∈ N. Thus the set of all possible pairs of planes which can give us a solution can be defined as

U_sol = {(π_u1, π_u2) ∈ U1 × U2 | r_u1 = r_u2}.   (3.18)
4 This can be seen since every cone extends to infinity, so the radius can take any positive real value by appropriately positioning the plane.
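A naive way to approximate the set U_sol of (3.18) numerically, assuming the two sets of candidate planes and the radii of their circular sections have already been computed (a sketch, not the thesis's implementation):

```python
import numpy as np

def candidate_pairs(planes1, radii1, planes2, radii2, tol=1e-6):
    """Pair every plane position u1 (circular section of Q1, radius r_u1)
    with the plane positions u2 whose section of Q2 has the same radius,
    up to a tolerance: an approximation of U_sol in equation (3.18)."""
    pairs = []
    for u1, r1 in zip(planes1, radii1):
        for u2, r2 in zip(planes2, radii2):
            if abs(r1 - r2) < tol:
                pairs.append((u1, u2))
    return pairs
```

With exact arithmetic the equality r_u1 = r_u2 is strict; in floating point a small tolerance is needed, and each matched pair is then a candidate pose as described above.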
Then the distance of p2 from x_c1rot should be the same as that of p1 from x_c1, giving us the following polynomial equation:

‖p2 − x_c1rot‖ = ‖p1 − x_c1‖.   (3.27)

The equations (3.23), (3.24) and (3.27) encode the solution for the parameters R and t. The point x_c1rot obtained as a solution to the above three equations helps us in determining R through the following constraints:

R x_c1 = x_c1rot,
R p1 = p2.   (3.28)

Let us define

A = [x_c1  p1  (x_c1 × p1)] and B = [x_c1rot  p2  (x_c1rot × p2)],   (3.29)
where x_c1 × p1 denotes the vector cross-product of x_c1 and p1, and similarly for x_c1rot and p2. Then we estimate R as

R = B A^{-1},   (3.30)

with A and B both being invertible matrices, justifying the existence of R as obtained above. Now, from the way the solution for R is designed, we can ascertain the following from equations (3.24), (3.27) and (3.28):

‖x_c1‖ = ‖x_c1rot‖,
‖p1‖ = ‖p2‖,   (3.31)

and the angle between the vectors x_c1 and p1 is the same as the angle between the vectors x_c1rot and p2. With these facts in mind, one can easily prove, with A and B as defined in equation (3.29), that

A^T A = B^T B.

From this it is straightforward to note that the matrix R = B A^{-1} obtained in equation (3.30) is a rotation matrix. Once R is known, t is estimated as

t = x_c2 − R x_c1,   (3.32)
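The construction in equations (3.28)-(3.30) can be sketched as follows (illustrative code, not the thesis's; note that a rotation preserves cross products, so B = RA and B A^{-1} recovers R exactly):

```python
import numpy as np

def rotation_from_vector_pairs(xc1, p1, xc1rot, p2):
    """Solve R xc1 = xc1rot, R p1 = p2 via R = B A^{-1}, with the third
    column supplied by the cross products as in equation (3.29)."""
    A = np.column_stack([xc1, p1, np.cross(xc1, p1)])
    B = np.column_stack([xc1rot, p2, np.cross(xc1rot, p2)])
    return B @ np.linalg.inv(A)     # A invertible if xc1, p1 independent

# Sanity check with a known rotation (90 degrees about z):
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
xc1 = np.array([1.0, 2.0, 3.0]); p1 = np.array([0.0, 1.0, 1.0])
R = rotation_from_vector_pairs(xc1, p1, Rz @ xc1, Rz @ p1)
print(np.allclose(R, Rz))  # True
```

The translation would then follow from (3.32) as t = xc2 − R @ xc1.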
Discussion on an experiment on a synthetic dataset for the geometric approach
For this dataset, we estimate all thirty-two possible distinct pose solutions R and t. Based on the points on non-uniqueness of solutions discussed previously, we select two pose solutions, which are shown in figure (3.4).

Figure 3.4: Pose solutions.

The true camera positions are shown in green and yellow. The first camera is shown in green. For the second camera (shown in yellow), the rotation matrix R_true is defined through Euler angles about the three coordinate axes as 1° about the x axis, 10° about the y axis and −8° about the z axis. The translation vector is set to t_true = [1 −11 1]^T. Let R1, t1
and R2, t2 be the two best pose solutions selected through our algorithm. The camera for pose solution R1, t1 is shown in blue, and almost coincides with the true pose of the second camera; the camera for pose solution R2, t2 is shown in black. The departure of the rotation matrices of these two solutions, and of the true solution, from the identity matrix is

d(R_true, I3) = 0.3515,  d(R1, I3) = 0.3516  and  d(R2, I3) = 1.8472.

The distances are based on the geodesic distance between two points R1 and R2 in the group SO(3), [18]:

d(R1, R2) = ‖log(R1^T R2)‖_F,

where R1 and R2 are two rotation matrices.
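Since ‖log(R1^T R2)‖_F = √2·θ for a relative rotation by angle θ, and trace(R) = 1 + 2 cos θ, the distance can be computed from the trace alone. A small sketch (ours, not the thesis's):

```python
import numpy as np

def geodesic_distance(R1, R2):
    """d(R1, R2) = ||log(R1^T R2)||_F. For a relative rotation by angle
    theta this equals sqrt(2)*theta; theta is recovered from the trace
    via trace(R) = 1 + 2 cos(theta)."""
    R = R1.T @ R2
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.sqrt(2.0) * np.arccos(cos_theta)

# A 10 degree rotation about the x axis:
a = np.deg2rad(10.0)
Rx = np.array([[1.0, 0.0,       0.0      ],
               [0.0, np.cos(a), -np.sin(a)],
               [0.0, np.sin(a),  np.cos(a)]])
print(geodesic_distance(np.eye(3), Rx))
```

For a 10° rotation this gives approximately 0.2469, matching (up to rounding) the first row of table 3.1.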
Table 3.1: Results of the single stage geometric approach on a synthetic dataset. Here R_true and t_true denote the true values, and R and t denote the pose solution obtained through convergence of the gradient descent scheme.

| Angles w.r.t. x, y, z axes | t_true | Geodesic distance of R from R_true | Recovered t | Angle between t and t_true | Geodesic distance of R from I3 | Selected solution has smallest geodesic length? |
|---|---|---|---|---|---|---|
| 10°, 0°, 0° | (0.5, 0.1, 0.1) | 2.1 × 10^-4 | (0.50097, 0.1020, 0.0989) | 0.2428° | 0.2469 | yes |
| 10°, 20°, 0° | (0.7, −0.1, 1) | 7.9 × 10^-6 | (0.6993, −0.0999, 1.0000) | 0.0028° | 0.5513 | yes |
| 0°, 10°, −5° | (1, −3, 0.1) | 2.3 × 10^-4 | (0.9993, −2.9993, 0.1000) | 0.0080° | 0.2760 | yes |
| 1°, 10°, −8° | (1, −11, 1) | 1.3 × 10^-4 | (0.9960, −11.0096, 1.0049) | 0.0321° | 0.3154 | yes |
| 30°, 0°, 0° | (−0.1, −1, −3) | 0.1666 × 10^-4 | (−0.0914, −1.0642, −3.0429) | 0.8615° | 0.7239 | yes |
| 1°, −30°, −80° | (0.0891, −0.0980, −0.0178) | 3.9 × 10^-4 | (0.0900, −0.9853, −0.1790) | 0.0264° | 2.093 | no |
One more point to note is that we estimate the translation vector only up to a non-zero scalar multiple. Hence, for visualization purposes in figure (3.4), we scale it by the same scalar that scales the true translation of the second camera. For this case we select R1, t1 as the best possible pose solution, taking into consideration the observation that this solution has its rotation matrix closest to the identity matrix in the geodesic sense. This was one experiment on a synthetic dataset for our proposed approach. We
constructing conic representations in the camera coordinate system. One can deduce from this discussion that the primary reason for the substantial error in the pose solution for a real dataset is errors in the calibration process we have employed through the calibration toolbox. It is worth noting, however, that our algorithm gives more accurate results than the conventional optimization process on synthetic datasets.
3.6 Summary
This chapter forms the core of the thesis. We start with an introduction to two equations, derived in section (1), which relate the relative pose to a conic correspondence. Based on these two equations, we devise a geometric construction in an epipolar geometry framework simplified by two important assumptions regarding the scene conic and the plane containing the true translation vector. The geometric approach thus proposed is tested on both synthetic and real datasets. The results so obtained are compared, analyzed and discussed in order to explain the performance of our proposed method. In the next chapter (4) we consider two alternate approaches to pose estimation from one conic correspondence. These approaches differ from the geometric method taken up in this chapter in the way the pose solution is estimated: they are based on the optimization of cost functions appropriately modeled on the equations that relate the pose R, t to the elements of the epipolar geometry, H, e, C1, C2.
This chapter describes two techniques for pose estimation which we have considered at certain points, but whose results have not been as good as the ones obtained with the geometric approach, discussed and reported in chapter (3). The first technique is based on the same set of equations as the geometric approach, which means the two assumptions defined in section (3.2) of chapter (3) also hold true here; but we estimate the pose through a conventional optimization scheme instead of the geometric construction. This is described in the next section, (4.1). The second approach is based on a different idea, which can be seen to be loosely based on the work of Higgins, [5], Zhang, [7] and Luong, [8]: we optimize a cost function modeled for one conic correspondence and one point correspondence. The optimization schemes have been either the gradient descent method, implemented through calculation of the gradient vectors, or MATLAB's inbuilt methods like lsqnonlin(.). The results for both implementations are comparable; hence in section (4.1.1) we report results of experiments for the first approach through gradient descent.
4.1 Estimating R and t through optimization
The equations which define the dependence of R and t on the conics C1, C2 and the scene plane π are

(R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3},
w^T t = 0,   (4.1)

where u, C1, C2, w and K are known and R, t and µ are to be estimated. For the sake of brevity we write C′1 = K^T C1 K and C′2 = K^T C2 K. From this we define the
This allows us to use the above lemma (2) for C′1, giving us the vector u up to a scalar multiple. Hence, from equation (3.6), we can consider u as a known constant and have to estimate all elements of t. The vector w being constant, the unknown variables are Y, t and µ. The norm considered for matrices here is the Frobenius norm:

‖A‖_F = sqrt(trace(A^T A)).

We have replaced the rotation matrix R with a real matrix Y and the additional constraints Y^T Y = I3 and det(Y) = 1. The cost function has been optimized through the command lsqnonlin(.) in MATLAB, [21]. Results of sample experiments with this approach are listed in section (4.1.1). With a random starting point, the behavior of the algorithm is as expected for a conventional optimization technique: after a certain value of the cost function is achieved, the algorithm tends to get stuck in a local minimum. Additionally, the final value achieved upon convergence depends on the starting point. For these reasons, it is practically infeasible to estimate a unique solution in the form of a global minimum of the cost function. This is evident from the results listed in table (4.1) of section (4.1.1). With a starting point close to the true value, the algorithm converges to a solution which is considerably close to the true value; but with a starting point far from the true values, the point reached upon convergence is far from the true solution.
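A rough Python analogue of this lsqnonlin(.) setup, with the orthogonality and determinant constraints added as penalty residuals (a sketch under our own variable packing, not the thesis's MATLAB code; C1p and C2p stand for K^T C1 K and K^T C2 K):

```python
import numpy as np

def residuals(x, u, w, C1p, C2p):
    """Stack the 6 independent entries of the symmetric matrix equation
    in (4.1), the plane constraint w^T t, and penalties for Y^T Y = I3
    and det(Y) = 1. x packs Y (9), t (3) and mu (1): the thirteen
    variables mentioned in the text."""
    Y = x[:9].reshape(3, 3); t = x[9:12]; mu = x[12]
    H = Y - np.outer(t, u)
    M = H.T @ C2p @ H - mu * C1p
    iu = np.triu_indices(3)                      # symmetric: 6 entries
    return np.concatenate([M[iu],
                           [w @ t],
                           (Y.T @ Y - np.eye(3)).ravel(),
                           [np.linalg.det(Y) - 1.0]])

# With SciPy available, the analogue of MATLAB's lsqnonlin(.) would be:
#   from scipy.optimize import least_squares
#   sol = least_squares(residuals, x0, args=(u, w, C1p, C2p))
```

As in the MATLAB experiments, the converged point depends heavily on the initial guess x0, and nothing forces convergence to a global minimum.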
One can also perform optimization by explicit computation of the gradients. The gradient vectors ∂E(Y, t, µ)/∂Y, ∂E(Y, t, µ)/∂t and ∂E(Y, t, µ)/∂µ are:

∂E(Y, t, µ)/∂Y = 4C′2 Y Y^T C′2 + 2(t^T C′2 Y u) C′2 t u^T + 2‖u‖^2 C′2 t t^T C′2 Y + 2C′2 Y L + C′2 t u^T L^T + L′ + 4Y Y^T Y − 4Y + 2 det(Y) Y^{-T} (det(Y) − 1),

∂E(Y, t, µ)/∂t = 2(t^T C′2 Y u) C′2 Y u + 2‖u‖^2 C′2^2 t + 4‖u‖^4 C′2 t − 4µ (u^T C′1 u) C′2 t + C′2 Y u + 2C′2 t u^T Y^T C′2 Y u + ‖u‖^2 (t^T C′2 t C′2 Y u + 2C′2 t t^T C′2 Y u) − µ C′2 Y C′1 u + 2(w^T t) w,

∂E(Y, t, µ)/∂µ = −2 t^T C′2 t u^T C′1 u + 2µ trace(C′1^2) − trace(Y^T C′2 Y C′1) − t^T C′2 Y C′1 u,   (4.3)

where L = (t^T C′2 t) u u^T − µ C′1 and L′ = ∂(t^T C′2 Y Y^T C′2 Y u)/∂Y. L′ has not been simplified further here, for it does not have a concise representation in matrix form; it can be simplified using symbolic computation toolboxes like MATLAB or Maple, or its analytic expression can be derived through some tedious matrix algebra.
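For instance, with a symbolic toolbox the un-simplified term L′ can be produced mechanically by differentiating the scalar entry-by-entry (a sketch using SymPy with small illustrative values; the thesis mentions MATLAB or Maple instead):

```python
import sympy as sp

# L' = d(t^T C2' Y Y^T C2' Y u)/dY, obtained symbolically. The numeric
# t, u and symmetric C2' below are illustrative placeholders.
Y = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f"y{i}{j}"))
t = sp.Matrix([1, 2, 3])
u = sp.Matrix([0, 1, 1])
C2p = sp.Matrix([[2, 0, 1], [0, 3, 0], [1, 0, 4]])  # any symmetric matrix

scalar = (t.T * C2p * Y * Y.T * C2p * Y * u)[0, 0]
Lprime = sp.Matrix(3, 3, lambda i, j: sp.diff(scalar, Y[i, j]))
```

Each entry of Lprime is the exact partial derivative with respect to the corresponding entry of Y, which is all a gradient descent implementation needs.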
purely optimization scheme, it is not feasible to estimate all possible pose solutions.
The function lsqnonlin(.) of the optimization toolbox in MATLAB offers two types of optimization algorithms: one is the Levenberg-Marquardt algorithm, [22], and the other is the trust region method. The two vary in a manner which is not particularly important for our problem at hand. What is crucial is the fact that these algorithms do not always converge to the global minimum, and even if they do, one can never fully ascertain how many distinct points of global minimum our cost function can attain. A second problem is that the cost function we are attempting to minimize is built from a set of thirteen polynomials in thirteen variables. Theoretically such a system has multiple solutions, and through a pure optimization approach it is not feasible to estimate all of the possible pose solutions.
4.2 Multi-stage approach to pose estimation: a comparison
Another approach which we have given some thought to is based on the two stage dependence of R and t on point and conic correspondences. This relationship depends on a property of the fundamental matrix that defines point correspondences between the two image planes π1 and π2 as

a ↔ b ⇔ b^T F a = 0,  a ∈ π1, b ∈ π2.   (4.4)
A fundamental matrix can be decomposed as F = [e]× H, as introduced in section (2.4). Thus, given n point correspondences {a_i ↔ b_i} as defined above, one can think of minimizing the error

f(F) = Σ_{i=1}^{n} (b_i^T F a_i)^2.
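Minimizing f(F) subject to ‖F‖_F = 1 is the classical linear method: each correspondence contributes one linear constraint on the entries of F, and the minimizer is a least singular vector. A sketch (ours, not the thesis's):

```python
import numpy as np

def estimate_F_linear(pts1, pts2):
    """Minimize f(F) = sum_i (b_i^T F a_i)^2 over ||F||_F = 1 by the
    classical linear (eight-point style) method: each correspondence
    gives one row of a design matrix, and F is its least right
    singular vector, then projected to rank 2."""
    rows = [np.outer(b, a).ravel() for a, b in zip(pts1, pts2)]
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    F = Vt[-1].reshape(3, 3)
    U, S, Vt2 = np.linalg.svd(F)    # enforce rank 2, as F must satisfy
    S[2] = 0.0
    return U @ np.diag(S) @ Vt2
```

In practice the coordinates are normalized first (Hartley's normalization) to condition the design matrix; that step is omitted here for brevity.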
This gives us the fundamental matrix F, from which we have the essential matrix E = K^T F K under the assumption that the calibration matrix K is known. Once E is known, R and t can be estimated through the relative orientation algorithm suggested by Hartley in [10]. This has led to a method of pose estimation from point correspondences which has been well studied in the past and successfully implemented. Theoretically, seven point correspondences of this form are sufficient to estimate F. The points involved in these correspondences need
to be in general position; by general position we mean that no three points should lie on the same line in either of the two planes.1 Similar to these approaches, we suggest a method for estimating F from point and conic correspondences. To begin with, let us consider one point correspondence,

a ↔ b,  a ∈ π1, b ∈ π2,   (4.5)

and one conic correspondence,

C1 ↔ C2,  C1 ∈ π1, C2 ∈ π2.

Let the scene conic C lie in the scene plane π. Then C1 and C2 are the images of C formed by the two cameras on the image planes π1 and π2 respectively, thus defining the above conic correspondence. The two cameras image the same scene plane, and hence there exists a homography between the two image planes, constructed by point transfer between the image planes through π. We defined this point transfer in section (2.1.1). Projective invariance of conics implies that the same homography ought to transform C1 into C2. If this homography is denoted by H, we have
H^T C2 H = µ C1.   (4.6)
This equation introduces a constraint on H in the form of the zero set of six homogeneous polynomials in nine homogeneous variables,2 which are the elements of the vector h. Let h = vec(H), where vec(.) is the usual vectorization operation in linear algebra that transforms an n × n matrix into the n^2 dimensional vector formed by stacking up the columns of the matrix. Equation (4.6) is thus transformed into the set of five polynomials given next:
transformed into a set of five polynomials given next:
f : R9 → R5 : f (h) =
hT S1h
hT S2h
hT S3h
hT S4h
−hT S5h
= 05×1, (4.7)
1 In fact, a set of three collinear points in one plane would invariably be mapped to three collinear points in the other plane.
2 The conic and homography representations are in homogeneous coordinates, due to which we estimate H up to a non-zero scalar multiple.
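A correspondence of the form (4.6) is straightforward to verify numerically, estimating µ in the least-squares sense (an illustrative sketch with fabricated H, C1 and C2, not data from the thesis):

```python
import numpy as np

def conic_correspondence_residual(H, C1, C2):
    """Check H^T C2 H = mu * C1 up to scale: estimate mu as the best
    scalar in the Frobenius sense and return the residual norm
    (zero for a true correspondence)."""
    M = H.T @ C2 @ H
    mu = np.sum(M * C1) / np.sum(C1 * C1)
    return np.linalg.norm(M - mu * C1)

# A correspondence fabricated to satisfy (4.6) exactly:
H = np.array([[1.0, 0.2, 0.1], [0.0, 1.1, 0.3], [0.0, 0.0, 1.0]])
C1 = np.diag([1.0, 1.0, -1.0])          # a circle
Hinv = np.linalg.inv(H)
C2 = 2.0 * Hinv.T @ C1 @ Hinv           # then H^T C2 H = 2 * C1
print(conic_correspondence_residual(H, C1, C2))  # ~0 (machine precision)
```

Such a residual is exactly what the polynomial system (4.7) measures, entry by entry.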
In conclusion, we note that the geometric approach to pose estimation from one conic correspondence gives us accurate pose solutions, with errors of the order of 10^-4. The idea for this approach rests on two important assumptions: that the scene conic is a circle, and that the translation vector lies in a known plane. With these two assumptions the geometry is highly simplified, due to which we are able to employ the computation toolbox in MATLAB to solve the simplified set of polynomials and obtain all possible pose solutions. This helps in estimating the finite set of all possible pose solutions. Next, an observation is made that the pose solution with the rotation matrix closest to the identity matrix is the best approximation to the true value. This observation helps in selecting one particular pose solution as the final solution to the pose estimation problem. With experiments on synthetic data, we show that this observation holds true for rotation matrices close enough to the identity matrix; for larger distances, we find the observation failing. This raises an important question which can form a part of future work: can a threshold be computed analytically such that, for all cases in which the distance of the rotation matrix from the identity matrix is less than this threshold, the observation holds true? If not, then we need to find another way to select one solution out of the finite set of pose solutions estimated through our geometric approach. Another sure way is to use one point correspondence; but as mentioned in chapter (3), we need to select a point correspondence which can be realized by only one pose out of all the solutions, and such a selection does not seem possible in all general cases, at least to our knowledge. So the search for a universal method to pick one pose solution is still an open problem.
Secondly, the results for the real dataset have been marred by inaccuracies in camera calibration. We have not pinpointed the source of error, but have shown that the error in the pose solution estimated through the geometric approach is solely due to
Lemma 4. We can define a subtraction map

φ : A × A → V,  (a, b) ↦ ν,  ∀a, b ∈ A, ν ∈ V,

where ν is the vector satisfying

ν + a = b

as per the definition of the '+' operator above. Thus we can define φ(a, b) = b − a ≡ →ab = ν. We prove here that this map is onto V and many-one.
Proof. Suppose that for two vectors v, u in V we have φ(a, b) = v and φ(a, b) = u. Then

v + a = b and u + a = b
⇒ v + a = u + a
⇒ (v − u) + a = a.   (A.1)

Further, the uniqueness property says that 0 is the only vector such that

0 + a = a, ∀a ∈ A.

Hence we have

v − u = 0 ⇒ v = u.

This proves that the map φ : A × A → V is well defined. Also, for every vector v in V and every point a in A, we can find a point b in A such that b = v + a. Hence

∀v ∈ V, ∃b ∈ A : φ(a, b) = v.
This proves that the map φ is onto. Moreover, we can find distinct points a1, a2, b1, b2 in A such that v = φ(a1, b1) = φ(a2, b2) for at least one v in V. This proves that φ is many-one.
Lemma 5. For three points a, b, c in A, φ(a, b) + φ(b, c) = φ(a, c), where φ is the map defined in lemma (4) above.
We shall accept this lemma without proof here.
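For completeness, a short sketch of the argument, using only the uniqueness property established in lemma (4) and the axioms of the '+' action:

```latex
\begin{proof}[Sketch]
Let $\nu_1=\varphi(a,b)$ and $\nu_2=\varphi(b,c)$, so that $\nu_1+a=b$ and
$\nu_2+b=c$. Then
\[
(\nu_1+\nu_2)+a \;=\; \nu_2+(\nu_1+a) \;=\; \nu_2+b \;=\; c ,
\]
and since $\varphi(a,c)$ is by definition the unique vector $\nu$ with
$\nu+a=c$, it follows that $\varphi(a,b)+\varphi(b,c)=\varphi(a,c)$.
\end{proof}
```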
Thus, with the definition of an affine space in place, we note that an affine space is a set of points together with such a vector space, and we represent it as (A, V, φ) where V is the underlying vector space, or alternately as (X, →X, φ) where →X is the underlying vector space. Henceforth in this text we will use
The projective space of dimension n, denoted P(E^{n+1}), is obtained by taking the quotient of an (n + 1)-dimensional vector space, E^{n+1} \ {0}, with respect to the equivalence relation

x ∼ x′ ⇔ ∃λ ∈ R \ {0} : x = λx′,  ∀x, x′ ∈ E^{n+1} \ {0}.   (A.6)

Here we assume E^{n+1} is a vector space over R. In some cases we may generalize to the complex field C, and this will be mentioned as required. One can verify that ∼ is indeed an equivalence relation here.
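The relation (A.6) can be tested numerically: x ∼ x′ exactly when all 2×2 minors x_i x′_j − x_j x′_i vanish, i.e. when [x | x′] has rank 1. A small sketch (ours):

```python
import numpy as np

def proj_equal(x, y, tol=1e-9):
    """x ~ y in P(E^{n+1}) iff x = lambda*y for some nonzero lambda,
    i.e. the antisymmetric part of x y^T vanishes (all 2x2 minors of
    the matrix [x | y] are zero)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.linalg.norm(np.outer(x, y) - np.outer(y, x)) < tol

print(proj_equal([1, 2, 3], [-2, -4, -6]))  # True: same projective point
print(proj_equal([1, 2, 3], [1, 2, 4]))     # False
```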
Many other equivalent definitions of P(E^{n+1}) are found in the literature. One might view each equivalence class as a 1-dimensional subspace of E^{n+1}; thus P(E^{n+1}) can be seen as the set of all 1-dimensional subspaces of E^{n+1}, or equivalently the set of all lines passing through the origin in E^{n+1}. These are different ways of looking at the definition, but essentially the same structure is obtained. Alternate ways of describing a projective geometry are interesting enough not to miss; hence, just for the sake of a lateral view:
A projective space is a triplet (P, L, I) such that

1. Any pair of distinct points is joined by a unique line.
2. Given any four points A, B, C and D with no three collinear, if AB intersects CD, then AC intersects BD.3
3. Every line is incident with at least three distinct points.
4. There exist three non-collinear points.

Here P is a set of points, L is a set of lines and I is an incidence structure which tells us which line is incident on which point and which point is incident on which line. From these axioms we can derive many other properties of a projective space, including its invariants; but they being out of the scope of this text, we skip them. Beutelspacher in [28] and Casse in [29] give an extensive treatment of this topic.

3 This axiom leads to the much talked about property of a projective plane that any two lines must intersect at a point.
subspace (k < n) into an l dimensional projective subspace where l ≤ k. In other words, a plane in E^n would be transformed into either a plane, a line or a point in E^m, whereas a homography preserves dimensions: a line is transformed into a line, and a plane into a plane. Here a projective line is defined as a 1-dimensional subspace of the projective space, a plane as a 2-dimensional subspace, a point as a 0-dimensional subspace, and so forth. The reason a homography preserves dimensions is worth noting. A homography, or collineation (a term used for homography in certain literature), is a projective mapping associated with an isomorphism →f : E^{n+1} → E′^{n+1}. Hence each vector of a basis of any subspace of E^{n+1} is uniquely mapped to a unique vector in E′^{n+1}, and a set of linearly independent vectors {m_i}_{1≤i≤k} is mapped to a set of linearly independent vectors {m′_i}_{1≤i≤k}. Hence a subspace of k dimensions spanned by {m_i}_{1≤i≤k} is uniquely mapped to a subspace of k dimensions spanned by {m′_i}_{1≤i≤k}.
More on subspaces
Given two subspaces U and V of P(E^{n+1}), the span of U and V, denoted ⟨U ∪ V⟩, is the smallest projective subspace containing U ∪ V (or, seen differently, the intersection of all subspaces of P(E^{n+1}) containing U ∪ V). One can then easily show that ⟨U ∪ V⟩ has the associated vector subspace F + G of E^{n+1}, where F, G are the subspaces of E^{n+1} associated with U, V.
A.2.5 Affine completion
Generalizing affine geometry we obtain projective geometry. Specifically, we show here the extension of an affine space to obtain a projective space. Consider an n dimensional affine space (X, →X, φ). Assuming {m_i}_{1≤i≤n+1} to be a basis of the affine space X, we can denote every point m ∈ X by taking the vector →m1m = →m = (x1, ..., xn) and representing it by its coordinates in the given basis. Extending this coordinate representation by appending a 1, we have →m_p = p((x1, ..., xn, 1)) = p([→m, 1]).5 Hence, as there is a one-one correspondence →m ↔ →m_p, we can represent every point a in P(E^{n+1}) not at infinity by a unique point m in X; and for every point m in X we have a unique point a in P(E^{n+1}).
5 p is the same canonical projection defined in equation (A.8).
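The lift-and-project correspondence above amounts to appending a 1 to affine coordinates and dividing by the last homogeneous coordinate on the way back; scale does not matter. A minimal sketch (function names are my own, not from the thesis):

```python
import numpy as np

def to_projective(m):
    """Lift affine coordinates (x1, ..., xn) to homogeneous (x1, ..., xn, 1)."""
    return np.append(m, 1.0)

def to_affine(mp):
    """Recover the affine point from a finite projective point (last coord != 0)."""
    if np.isclose(mp[-1], 0.0):
        raise ValueError("point at infinity: no affine representative")
    return mp[:-1] / mp[-1]

m = np.array([2.0, -1.0, 3.0])
mp = to_projective(m)        # homogeneous representative [2, -1, 3, 1]
same = to_affine(5.0 * mp)   # any nonzero scalar multiple maps back to m
print(same)
```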
where p4 is the rightmost column of P. The camera center in the world coordinate frame is defined as a vector C such that PC = 0. For a finite camera with non-singular M, C is the point represented as

C = [ −M^{−1} p4 ; 1 ].

In short, a finite camera is one whose center C is a finite point in the 3D world coordinate system. 2
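The center formula can be checked numerically: form a finite camera, solve for C = [ −M^{−1} p4 ; 1 ], and verify PC = 0. A minimal sketch, assuming an illustrative calibration K, rotation R and translation t (these values are placeholders, not from the thesis):

```python
import numpy as np

# A hypothetical finite camera P = K [R | t]; all values are illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, -1.0, 2.0])
P = K @ np.hstack([R, t[:, None]])

M, p4 = P[:, :3], P[:, 3]
# C = [-M^{-1} p4; 1]: M is non-singular, so the center is a finite point.
C = np.append(-np.linalg.solve(M, p4), 1.0)
print(P @ C)  # ~ [0, 0, 0]: the center satisfies PC = 0
```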
B.1.1 Elements of a finite projective camera
Assuming that we have a finite camera at hand, the camera projection matrix P = [ M | p4 ] is dissected into the following elements:
1. Column points: The leftmost 3 columns of P, namely p1, p2, p3, represent the images of the 3 principal directions X, Y, Z of the world coordinate system, and p4 represents the image of the origin of the world coordinate system. This is so because in P^3 a direction is represented by a point at infinity in that direction; hence the X direction is represented by the point (1, 0, 0, 0)^T, whose image is P (1, 0, 0, 0)^T = p1.
2. Row vectors: Denoting the rows of P as r1, r2, r3, the principal plane is the plane parallel to the image plane and passing through the camera center. Hence all points that project to image points represented by (x, y, 0) lie on this plane. Thus

PX = [ r1 ; r2 ; r3 ] X = (x, y, 0)^T,

and hence r3 X = 0. Thus r3 is the corresponding row representing the principal plane. Similarly we can see that the other two rows represent the planes which project to the X and Y axes of the image plane. They are known as axis planes.
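Both interpretations can be illustrated for a generic camera matrix: the first column is the image of the point at infinity in the X direction, and any point on the plane r3 projects to a point with vanishing third image coordinate. A sketch with an illustrative random P:

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.standard_normal((3, 4))  # a generic finite camera matrix (illustrative)

# Column points: p1 is the image of the point at infinity in the X direction.
X_inf = np.array([1.0, 0.0, 0.0, 0.0])
assert np.allclose(P @ X_inf, P[:, 0])

# Row planes: a world point X with r3 . X = 0 lies on the principal plane
# and projects to (x, y, 0), a point at infinity of the image plane.
r1, r2, r3 = P
X = np.linalg.svd(r3[None, :])[2][-1]  # a null vector of r3: r3 . X = 0
img = P @ X
print(img[2])  # ~ 0: the third image coordinate vanishes
```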
B.2 Infinite Camera
An infinite camera is one whose center is at infinity. Using the notation of the previous section, this means that M is a singular matrix. Applying the condition PC = 0, we get the camera center as

C = [ d_{3×1} ; 0 ],

where d is a null vector of M, so that PC = M d = 0.
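Numerically, the infinite-camera center is found in the null space of M. A minimal sketch with a hypothetical singular M (an affine-camera-like example, chosen for illustration only):

```python
import numpy as np

# A hypothetical camera with singular M (rank 2: last row is zero).
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
p4 = np.array([0.1, -0.2, 1.0])
P = np.hstack([M, p4[:, None]])

# The center C = [d; 0], with d a null vector of M, satisfies PC = 0.
d = np.array([0.0, 0.0, 1.0])  # M d = 0
C = np.append(d, 0.0)
print(P @ C)  # [0, 0, 0]: the center is a point at infinity
```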
2 Can we say that a point at infinity in the 3D world coordinate system is also a point at infinity in the 3D camera coordinate system?
This section contains mathematical proofs of certain statements claimed in various sections of the thesis. While some proofs may look trivial and others less so, a rigorous mathematical backbone rests on airtight arguments and reasoning. Hence we aspire to lay down the relevant proofs with utmost rigor.
Lemma 17. The zero set of the function f in equation (4.7) defines the set of valid values of h. We hypothesize that this set of points

X = { h ∈ R^9 | f(h) = 0_5 }

defines an implicit manifold of dimension four. In other words, the Jacobian of f,

J_X(h) = 2 [ h^T S1 ; h^T S2 ; h^T S3 ; h^T S4 ; h^T S5 ]_{5×9},   (C.1)

is a matrix of rank five for all nonzero values of the vector h ∈ R^9, where the Si are nine-dimensional quadrics, i.e. real symmetric 9×9 matrices, defined in section (4.2).
Proof. Let us assume that the five row vectors are linearly dependent. Then there exist scalars αi, i = 1, 2, ..., 5, not all zero, such that Σ_{i=1}^{5} αi h^T Si = 0. Using the definition of Si, i = 1, 2, ..., 5, and partitioning h as h = [ h1 ; h2 ; h3 ] with hi ∈ R^3, we can write

h^T S1 = [ h1^T C2 , 0_{1×3} , −p1 h3^T C2 ],
h^T S2 = [ 0_{1×3} , h2^T C2 , −p2 h3^T C2 ],
h^T S3 = [ h2^T C2 / 2 , h1^T C2 / 2 , −p3 h3^T C2 ],
h^T S4 = [ h3^T C2 / 2 , 0_{1×3} , h1^T C2 / 2 − p4 h3^T C2 ],
h^T S5 = [ 0_{1×3} , h3^T C2 / 2 , h2^T C2 / 2 − p5 h3^T C2 ].
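The rank-five claim can be spot-checked numerically. The sketch below assembles symmetric 9×9 matrices S1, ..., S5 whose products h^T Si reproduce the block expressions above, using a placeholder symmetric conic matrix C2 and placeholder scalars p1, ..., p5 (the actual values come from section 4.2 and are not reproduced here), and evaluates the rank of J_X(h) at a random h:

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder data: C2 a symmetric 3x3 matrix, p1..p5 scalars (illustrative).
A = rng.standard_normal((3, 3))
C2 = A + A.T
p = rng.standard_normal(5)
Z = np.zeros((3, 3))

# Symmetric 9x9 quadrics Si assembled so that h^T Si matches the block
# expressions in the proof, with h = [h1; h2; h3], hi in R^3.
S = [
    np.block([[C2, Z, Z],     [Z, Z, Z],      [Z, Z, -p[0] * C2]]),
    np.block([[Z, Z, Z],      [Z, C2, Z],     [Z, Z, -p[1] * C2]]),
    np.block([[Z, C2 / 2, Z], [C2 / 2, Z, Z], [Z, Z, -p[2] * C2]]),
    np.block([[Z, Z, C2 / 2], [Z, Z, Z],      [C2 / 2, Z, -p[3] * C2]]),
    np.block([[Z, Z, Z],      [Z, Z, C2 / 2], [Z, C2 / 2, -p[4] * C2]]),
]

h = rng.standard_normal(9)
J = 2 * np.vstack([h @ Si for Si in S])  # the 5x9 Jacobian J_X(h)
print(np.linalg.matrix_rank(J))          # 5 for generic h, C2, p
```

For generic data the five rows are independent: the first block column forces the coefficients of rows 1, 3, 4 to vanish, and the second block column then forces those of rows 2 and 5 to vanish.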