Building and Evaluation of a Mosaic of Images using Aerial ......SIFT. Ransac, em conjunto com o algoritmo DLT, é usado para calcular as transformações projetivas entre duas imagens

Building and Evaluation of a Mosaic of Images using

Aerial Photographs

Tiago André Simões Coito

Dissertação para obtenção do Grau de Mestre em

Engenharia Mecânica

Júri

Presidente: Prof. Mário Manuel Gonçalves da Costa

Orientador: Prof. João Rogério Caldas Pinto

Coorientador: Prof. José Raul Carreira Azinheira

Vogal: Doutor Lourenço da Penha e Costa Bandeira

Novembro de 2012

“We cannot solve problems by using the same kind

of thinking we used when we created them.”

Albert Einstein

- i -

Acknowledgements

I would like to thank to my family, because without them this work would not be possible;

I would like to thank also to my advisors Professors João Caldas Pinto for the help, dedication,

availability and especially patience to deal with my stubbornness and laziness and to Professor José

Azinheira, in particular by his constructive criticism on the way I was writing the dissertation.

I am also grateful to my colleagues, in special to Pedro Frazão for the excellent example of

what is to be dedicated and hardworking and for showing me that there are just advantages in doing

everything in advance. To José Azevedo for his friendship and for being always available to help, to

Tiago Pereira, Henrique Mata, Crispim Ponte, Inês Colaço, Tony Paulino, Ricardo Gomes, Joana

Rodrigues, Miguel Silveira and finally a special thanks to my dear friend Daniel Araújo, because

sometimes seems that the best people pass away in the first place.

I am thankful to all my friends and colleagues (they know who they are) with whom I spend

hours playing foosball and snooker! They are responsible for some of the best memories that I have

from these years.

Least, but not last, I would like to thank to all my colleagues and friends, inside and outside

Instituto Superior Técnico, who helped me these years to forget and overcome the difficulties that had

arisen. Especially to my good friends: Bruno Lopes, Mariana Lopes, Ricardo Assunção and Iotelma

Jumpe. A special thanks to my dear friend Rita Santos who passed away a few years ago due to a

brain tumor, for her courage.

This work was partially funded by the Project PEst-OE/EME/LA0022/2011, LAETA Unit

(IDMEC-Pole IST) of the Foundation for Science and Technology (Fundação para a Ciência e

Tecnologia).

- ii -

- iii -

Abstract

To solve image mosaicing problem, this work first use the Harris-Laplace method to first find scale-

invariant keypoints. Through Gaussian-weighting the images intensities with a square window around

each keypoint, it is possible to compute a finger print (descriptor) for each keypoint using the same

method as the used in SIFT. Ransac, together with DLT algorithm, is used to robust estimate the

projective transformations between two images from a set of putative correspondences. To find the

putative correspondences between keypoints is used the ratio of the Euclidean distances of the first

over the second nearest neighbor of each keypoint. Computed homographies are then used to build

the mosaic of images.

To study the robustness of the method a simulator was developed to take aerial photographies to

an input image representing the Earth surface. Small shifting the parameters involved to obtain each

photo, error measures, namely based on homography decomposition methods, were used to compare

the results between the estimated and the exact mosaics.

Results show the importance of the minimization of the tilt angles of the quadrotor, as well as the

necessity for high overlapping percentages between images. Results also demonstrate that the

images should be taken at high flight altitudes to avoid, on maximum, the parallax. However, the

researched method has shown to be invariant to translations, rotations, scale variations, perspective

transformations, contrast and brightness changes and even to the presence of some noise.

Keywords: Image mosaicing; aerial photography; UAV; SIFT; Ransac

- iv -

Resumo

Para resolver o problema da criação de mosaicos de imagem este trabalho primeiro usa o

algoritmo Harris-Laplace para encontrar pontos de interesse invariantes. Através da pesagem das

intensidades das imagens com um filtro Gaussiano de janela quadrada, é possível calcular uma

impressão digital (descritor) para cada ponto de interesse usando o método descrito no algoritmo do

SIFT. Ransac, em conjunto com o algoritmo DLT, é usado para calcular as transformações projetivas

entre duas imagens de um conjunto de correspondências putativas. Para encontrar as

correspondências putativas é usado o rácio das distâncias Euclidianas do primeiro sobre o segundo

vizinho mais próximo de cada ponto de interesse. As homografias calculadas são então usadas para

construir o mosaico de imagens.

Para estudar a robustez do método foi desenvolvido um simulador para tirar fotografias aéreas

a uma imagem representativa da superfície da Terra. Adicionando pequenas variações aos

parâmetros envolvidos para obter cada uma das fotografias, medidas de erro, nomeadamente

baseadas em métodos de decomposição das matrizes de homografias, foram usadas para comparar

os resultados entre os mosaicos estimados e exatos.

Os resultados demonstram a importância da minimização dos ângulos de inclinação do

quadrotor, bem como da necessidade de uma maior percentagem de sobreposição entre as imagens.

Os resultados também demonstram que as imagens devem ser tiradas a altitudes elevadas, para

evitar, ao máximo, a paralaxe. Contudo, o método estudado mostrou ser invariante a translações,

rotações, variações de escala, brilho e contraste, a transformações de perspectiva, e ainda à

presença de algum ruído.

Palavras-chave: Mosaico de imagens; fotografia aérea; UAV; SIFT; Ransac

- v -

Table of Contents

Acknowledgements .............................................................................................................................................................. i Abstract ............................................................................................................................................................................... iii Resumo ................................................................................................................................................................................ iv Table of Contents ................................................................................................................................................................. v

List of Figures .................................................................................................................................................................... vii List of Tables ....................................................................................................................................................................... xi List of Acronyms ............................................................................................................................................................... xiii List of Symbols ................................................................................................................................................................. xiv

1. Introduction ........................................................................................................................................ 1

1.1. Historic overview ....................................................................................................................... 1

1.2. Image mosaicing ....................................................................................................................... 2

1.3. Framework and goals of the dissertation .................................................................................. 3

1.4. Steps for the building of a mosaic of images ............................................................................ 4

1.5. Common issues ........................................................................................................................ 5

1.6. Structure of the dissertation ...................................................................................................... 5

1.7. Contributions ............................................................................................................................. 6

2. State of the Art ................................................................................................................................... 7

2.1. Research and application fields of image mosaicing................................................................ 7

2.2. Correct geometric deformations using data and/or camera models......................................... 7

2.3. Image registration using data/or camera models ..................................................................... 8

2.3.1. Feature based methods ................................................................................................................... 8

2.3.2. Exhaustive search ......................................................................................................................... 10

2.3.3. Frequency domain ......................................................................................................................... 10

2.3.4. Iteratively adjust ............................................................................................................................. 10

2.4. Eliminating seams from image mosaics ................................................................................. 11

3. Mathematical Problem Formulation .............................................................................................. 13

3.1. Image mosaicing ..................................................................................................................... 13

3.2. Geometric corrections ............................................................................................................. 14

3.2.1. Basic transformation problems ...................................................................................................... 15

3.2.2. Nadir view ...................................................................................................................................... 16

3.2.3. Orthorectification ........................................................................................................................... 17

3.2.4. Parallax.......................................................................................................................................... 18

3.2.5. Lens distortions ............................................................................................................................. 20

3.3. Image registration ................................................................................................................... 21

3.4. 2D homography ...................................................................................................................... 21

3.5. Direct Linear Transformation (DLT) algorithm ........................................................................ 23

3.6. Feature detection using Harris-Laplace .................................................................................. 26

3.7. Scale Invariant Feature Transform (SIFT) .............................................................................. 31

3.7.1. Assignment of a canonical orientation to each interest point ......................................................... 35

3.7.2. Interest point description ............................................................................................................... 36

- vi -

3.8. Putative correspondences ...................................................................................................... 37

3.9. Robust estimation using Ransac ............................................................................................ 38

3.10. Homography decomposition ................................................................................................. 42

4. Simulation of taking aerial photographies with UAVs ................................................................. 45

4.1. World reference frame ............................................................................................................ 45

4.2. UAV reference frame .............................................................................................................. 47

4.3. Camera reference frame ......................................................................................................... 50

4.4. Pinhole camera model ............................................................................................................ 51

4.5. Camera parameters and calibration ....................................................................................... 53

4.6. Resume ................................................................................................................................... 55

4.6.1. Detailed scheme of the reference frames ...................................................................................... 55

4.6.2. Scheme of the equations involved ................................................................................................. 56

4.7. Final notes .............................................................................................................................. 56

4.8. Examples of results given by Matlab ...................................................................................... 59

5. Implementation of the Image Mosaicing Method ......................................................................... 63

5.1. Image mosaicing method for two images ............................................................................... 63

5.1.1. Find homography ........................................................................................................................... 63

5.1.2. Stitch two images together ............................................................................................................ 64

5.2. Image mosaicing method for more than two images .............................................................. 66

5.3. Error measures implemented ................................................................................................. 67

5.3.1. Exact vs. estimated transformation ................................................................................................ 68

5.3.2. Homography decomposition based ............................................................................................... 70

6. Evaluation of the Mosaicing Method. Results and Discussion .................................................. 71

6.1. Robustness analysis to the implemented algorithm ............................................................... 71

6.1.1. Minimum number of inliers required .............................................................................................. 74

6.1.2. Runtime ......................................................................................................................................... 74

6.1.3. Influence of the random factor introduced by Ransac ................................................................... 74

6.1.4. Percentage of overlap between two images .................................................................................. 75

6.2. Example of a mosaic .............................................................................................................. 83

6.3. A comparison between mosaicing methods ........................................................................... 84

6.4. Application in real data ........................................................................................................... 84

7. Conclusions and Future Work ....................................................................................................... 89

7.1. Conclusions ............................................................................................................................ 89

7.2. Future work ............................................................................................................................. 90

References ......................................................................................................................................................................... 91 Appendices......................................................................................................................................................................... 99

Appendix A .................................................................................................................................................... 99

Appendix B .................................................................................................................................................. 104

Appendix C ................................................................................................................................................. 108

Appendix D ................................................................................................................................................. 112

Appendix E .................................................................................................................................................. 116

- vii -

List of Figures

Fig. 1.1 – Uavision quadrotor (small scale UAV) used in this project. .................................................... 3

Fig. 1.2 – Global scheme of the proposed algorithm. ............................................................................. 4

Fig. 3.1 – Common geometric transformations. The first three cases are typical examples for the affine

transformations. The remaining two are the common cases where perspective and polynomial

transformations are used, respectively.[21] ........................................................................................... 15

Fig. 3.2 – Attitude Rotations: Pitch, Roll, and Yaw. [84] ...................................................................... 15

Fig. 3.3 – Diagram showing the relationship between the zenith, the nadir, and different types of

horizon. The zenith is opposite the nadir [87]........................................................................................ 16

Fig. 3.4 – Perspective distortion resultant of a pitch rotation. ............................................................... 16

Fig. 3.5 – Perspective views of an A4 sheet. ........................................................................................ 17

Fig. 3.6 – From left to right: a) Nadir View; b) Small pitch rotation; c) High pitch rotation; d) Scheme of

the three views. It works similar to roll axis [88]. ................................................................................... 17

Fig. 3.7 – Orthographic views project as a right angle to the data plane. Perspective views project

from the surface onto the datum plane from a fixed location [91]. ........................................................ 18

Fig. 3.8 – A simplified illustration of the parallax of an object against a distant background due to a

perspective shift. When viewed from "Viewpoint A", the object appears to be in front of the blue

square. When the viewpoint is changed to "Viewpoint B", the object appears to have moved in front of

the red square [94]. ............................................................................................................................... 18

Fig. 3.9 – Parallax as a consequence of ground reliefs for high pitch angles. ..................................... 19

Fig. 3.10 – Parallax on nadir photos, where the bigger the angle of view of the camera the worst. .... 19

Fig. 3.11 – For successively higher altitudes (nadir views) the distortions diminish for the same

photographed area, however the resolution diminishes too. ................................................................. 19

Fig. 3.12 – Same camera tilts for successively higher altitudes gives worst results. ........................... 20

Fig. 3.13 – a) Distortion free image; b) Barrel distortion (fisheye lens); c) Pincushion distortion; d)

Example of a complex distortion [95]. ................................................................................................... 21

Fig. 3.14 – Observed object from different viewpoints. ......................................................................... 22

Fig. 3.15 – Contours of constant are shown by the fine lines............................................................ 29

Fig. 3.16 – Schematic of the D. Lowe scale space. .............................................................................. 33

Fig. 3.17 – Scale levels for each image of each octave ( ). [103] .................................................. 34

Fig. 3.18 – Orientation histogram [103]. ................................................................................................ 36

Fig. 3.19 – Interest point description [103]. ........................................................................................... 36

Fig. 3.20 – Orientation histograms. Adapted from [103]. ...................................................................... 37

Fig. 3.21 – A comparison between symmetric transfer error (upper) and reprojection error (lower) [63].

............................................................................................................................................................... 40

Fig. 3.22 – Desired and current camera frames. Involved notation [106]. ............................................ 42

Fig. 4.1 – Adopted virtual world image (8486 by 12000 pixels) and its coordinate system. For this

simulation each pixel was considered to have 1 m [108]. ..................................................................... 46

Fig. 4.2 – Clockwise reference frame adopted. .................................................................................... 46

- viii -

Fig. 4.3 – Caring for the representation of the results in Matlab. .......................................................... 47

Fig. 4.4 – UAV reference frame (blue) related to the world reference frame (red) when

. ................................................................................................................................................ 47

Fig. 4.5 – The pink reference frame corresponds to an UAV frame with . The

blue frame is the UAV frame with a single rotation yaw of 20 degrees. ................................................ 49


blue frame is the UAV frame with a single rotation pitch of 20 degrees. .............................................. 49


blue frame is the UAV frame with a single rotation roll of 20 degrees. ................................................. 50

Fig. 4.8 – Representation of the camera reference frame compared to the world and UAV reference

frames. ................................................................................................................................................... 51

Fig. 4.9 – Pinhole Model [111]. ............................................................................................................. 51

Fig. 4.10 – Real model (left) and simplified model (right) of a pinhole camera in the XZ plane. .......... 52

Fig. 4.11 – From world reference frame to image plane. Note on the notation involved. It is recalled

that is aligned with the flight direction of the UAV. ....................................................................... 55

Fig. 4.12 – Graph of the transformations involved. ............................................................................... 56

Fig. 4.13 – Strategy adopted to find the world coordinates associated to each image coordinate. ..... 58

Fig. 4.14 – The use of bilinear interpolation for accurate results. ......................................................... 58

Fig. 4.15 – Example of bilinear interpolation. For more information we recommend to consult: [63]. .. 58

Fig. 4.16 – Matlab representation of the model developed. The red rectangle shows the image plane

while the green rectangle shows the object plane. , . Camera position ( ,

1000, 600) meters. ................................................................................................................................ 60

Fig. 4.17 – Representation of the ground area covered by the photo taken in Fig. 4.16. The image on

the right is a detailed view. The red asterisk, in the middle, shows the position ( , , ) of

the camera in world coordinates (Fig. 4.1) projected onto a plane ( , ). A projection of and

reference frame is also shown in green. ............................................................................................... 60

Fig. 4.18 – Virtual photography taken (e.g. 320 per 240) according to Fig. 4.16 and Fig. 4.17. .......... 60

Fig. 4.19 – , . Camera position ( , 1000, 600) meters. a) Model

developed; b) Representation of the ground area; c) Image taken. ...................................................... 61

Fig. 4.20 – , , . Camera position ( , 1100, 800) meters. a) Model

developed; b) Representation of the ground area; c) Image taken. ...................................................... 61

Fig. 5.1 – Basic algorithm used to find a homography between two images. ....................................... 63

Fig. 5.2 – Image mosaic within the calculated boundaries. .................................................................. 64

Fig. 5.3 – Frames associated with the first two images. ....................................................................... 65

Fig. 5.4 – Example of the “maximum” strategy for two images. ............................................................ 65

Fig. 5.5 – Example of the “first in stays in” strategy for two images. .................................................... 66

Fig. 5.6 – Image chain assumed on our implementation. ..................................................................... 66

Fig. 5.7 – Euclidean distance. ............................................................................................................... 69

Fig. 6.1 – Taking two photographs with the Matlab script developed in chapter 4. The bottom image of

c) is the reference image, while in the top is the obtained applying a small shift to the first. ................ 71

- ix -

Fig. 6.2 – a) Estimated mosaic, b) Exact mosaic. ................................................................................. 72

Fig. 6.3 – Above we have the 84 putative correspondences between the two images obtained with

section 3.8. Below are the 62 correspondences (inliers) that remain after the robust estimation (section

3.9). ........................................................................................................................................................ 72

Fig. 6.4 – On the left we have the inliers (the same as the Fig. 6.3), but now with a rectangle around

defining the neighborhood of that point used to find its descriptor. The size of the square is

according to its characteristic scale. On the right is a zoom in view where we can see that each

interest point has its own canonical orientation. .................................................................................... 73

Fig. 6.5 – Ratio between the overlap percentages. Concept. ............................................................... 73

Fig. 6.6 – Scheme used for the pitch-overlap study. ............................................................................ 77

Fig. 6.7 – Scheme used for the roll-overlap study. ............................................................................... 78

Fig. 6.8 – Scheme used for the yaw-overlap study. .............................................................................. 79

Fig. 6.9 – Scheme used for the scale-overlap study. ............................................................................ 79

Fig. 6.10 – Contrast variations. a) ; b) ; c) ; .. 80

Fig. 6.11 – Brightness variations. a) ; b) ; c) ;81

Fig. 6.12 – Noise. a) ; b) ; ....................................................... 82

Fig. 6.13 – The red asterisks represent the coordinates where each photograph was taken. a)

Photographies taken. b) Desired positions and trajectory without errors. ............................................. 83

Fig. 6.14 – a) Exact mosaic. b) Estimated mosaic with the “maximum”-strategy between the

overlapping regions (see Fig. 5.4). c) Estimated mosaic with the “first in stays in”-strategy between the

overlapping regions (see Fig. 5.5). ........................................................................................................ 83

Fig. 6.15 – Two mosaics obtained from two pairs of images. 80 inliers were involved on the left and 16

on the right. ............................................................................................................................................ 85

Fig. 6.16 – Mosaic from three images. 59 inliers between the first and the second image and 48

between the second and the third. ........................................................................................................ 85

Fig. 6.17 – Results from the flight. a) trajectory; b) flight altitude. ........................................................ 86

Fig. 6.18 – Futsal pitch photograph. Interest points. ............................................................................. 86

Fig. 6.19 – Red lines. ............................................................................................................................ 86

Fig. 6.20 – Identical interest points. ...................................................................................................... 87

Fig. 6.21 – Straight lines are often not mapped as straight lines. ......................................................... 87

Fig. 7.1 – Matrix of coordinates. .......................................................................................................... 102

- x -

- xi -

List of Tables

Tab. 6.1 – Variability of the number of inliers. ....................................................................................... 75

Tab. 6.2 – Overlap percentages tested. ................................................................................................ 76

Tab. 6.3 – Example of a table for future interpretation. ......................................................................... 77

Tab. 6.4 – Minimum number of inliers on contrast evaluation. ............................................................. 80

Tab. 6.5 – Maximum geometric distance error on contrast evaluation. ................................................ 80





Tab. 6.10 – Comparison between methods. ......................................................................................... 84

Tab. 7.1 – Minimum number of inliers on pitch evaluation. ................................................................. 104

Tab. 7.2 – Medium number of inliers on pitch evaluation. .................................................................. 104

Tab. 7.3 – Maximum geometric distance error on pitch evaluation. Same table as Tab. 6.3. ............ 105

Tab. 7.4 – Medium geometric distance error on pitch evaluation. ...................................................... 105

Tab. 7.5 – Maximum pitch error on pitch evaluation. .......................................................................... 106

Tab. 7.6 – Medium pitch error on pitch evaluation. ............................................................................. 106

Tab. 7.7 – Maximum error on pitch evaluation. ................................................................................ 107

Tab. 7.8 – Medium error on pitch evaluation. .................................................................................. 107

Tab. 7.9 – Minimum number of inliers on roll evaluation. ................................................................... 108

Tab. 7.10 – Medium number of inliers on roll evaluation. ................................................................... 108

Tab. 7.11 – Maximum geometric distance error on roll evaluation. .................................................... 109

Tab. 7.12 – Medium geometric distance error on roll evaluation. ....................................................... 109

Tab. 7.13 – Maximum roll error on roll evaluation. .............................................................................. 110

Tab. 7.14 – Medium roll error on roll evaluation. ................................................................................ 110

Tab. 7.15 – Maximum error on roll evaluation. ................................................................................ 111

Tab. 7.16 – Medium error on roll evaluation. ................................................................................... 111

Tab. 7.17 – Minimum number of inliers on yaw evaluation. ................................................................ 112

Tab. 7.18 – Medium number of inliers on yaw evaluation. .................................................................. 112

Tab. 7.19 – Maximum geometric distance error on yaw evaluation. .................................................. 113

Tab. 7.20 – Medium geometric distance error on yaw evaluation. ..................................................... 113

Tab. 7.21 – Maximum yaw error on yaw evaluation. .......................................................................... 114

Tab. 7.22 – Medium yaw error on yaw evaluation. ............................................................................. 114

Tab. 7.23 – Maximum error on yaw evaluation. ............................................................................... 115

Tab. 7.24 – Medium error on yaw evaluation. .................................................................................. 115

Tab. 7.25 – Minimum number of inliers on scale evaluation. .............................................................. 116

Tab. 7.26 – Medium number of inliers on scale evaluation. ................................................................ 116

Tab. 7.27 – Maximum geometric distance error on scale evaluation. ................................................ 117

Tab. 7.28 – Medium geometric distance error on scale evaluation. ................................................... 117

- xii -

Tab. 7.29 – Maximum error on scale evaluation. ............................................................................. 118

Tab. 7.30 – Medium error on scale evaluation. ................................................................................ 118

Tab. 7.31 – Maximum error on scale evaluation. ............................................................................. 119

Tab. 7.32 – Medium error on scale evaluation. ................................................................................ 119

- xiii -

List of Acronyms

DLT – Direct Linear Transformation

DoG – Difference of Gaussian

GPS – Global Positioning System

IMU – Inertial Measuring Unit

LAETA – Laboratório de Energia, Transportes e Aeronáutica

LoG – Laplacian of Gaussian

Ransac – RANdom SAmple Consensus

SIFT – Scale Invariant Feature Transform

SVD – Single Value Decomposition

UAV – Unmanned Aerial Vehicle

WGS84 – World Geodesic System of 1984

- xiv -

List of Symbols

The nomenclature used is explained during the text. Here are some of the most relevant symbols.

or

Homography

Coordinates of point correspondences between both images

Normalized coordinates of point correspondences between both images

Average change of the image intensity

Image window

Image

Structure tensor/second moment matrix

or Gaussian window

Corner classification measure

Gaussian smoothed image

Local scale or Standard deviation (specified in the text)

Local scale

Integration scale

Gradient magnitude

Gradient orientation

Euclidean Distance

Transformation that brings points of the world frame into the image frame

Calibration matrix

Perspective projection

- 1 -

1. Introduction

1.1. Historic overview

Since ancient times the cartography has been dedicated to studying and creating maps.

Records of maps (visual representations of a region) found in Lascaux in southern France with near

17,300 years (Paleolithic Cave-Paintings) [1] are evidence of this in which it was tried to get an

overview map of the stars from the sky. Later in history maps found in Babylon, Greece and some

parts of Asia dating from a few thousand years ago [2] report that maps were very often related to

military purposes. Through the age of discoveries and more recently the discovery of photography

(early 19th century) which is used nowadays for satellite mapping much has evolved.

Through photography (art, science and practice of creating durable images by recording light or

other electromagnetic radiation [3]) it is possible to obtain very accurate representations of reality. The

photography has been used by mankind for various applications either as leisure, for example, to

preserve moments experienced by people, either as an art in which is included, for example, cinema

and perhaps the most important that is to observe and record the progress of all sorts of phenomena

in nature and thus develop science.

Aerial photography

In the present days, aerial photography shows to be an excellent form to gather information of a

given earth surface area. There are two main groups of these photographs: first are the photographs

taken at higher altitudes such as those by satellites, while the others are taken at a relatively lower

altitude by, for example, planes and Unmanned Aerial Vehicles (UAV). UAVs can fly from a few

meters (case of Micro Air Vehicles – MAV that can be as small as 15 centimeters) to 200 hundred of

kilometers height for some classes of hypersonic UAVs [4]. In the present work we are particularly

interested in the results given by a small-scale UAV (quadrotor) with mean flight capabilities where the

goal will be to take pictures at about 100 meters height (bird-eye views [5]), the enough to provide

good resolution images of the ground (measured as m/pixel) and the enough to guaranty some

robustness in terms of ground reliefs/human constructions as we will see later.

- 2 -

Unmanned Aerial Vehicles (UAVs)

Unmanned aerial vehicles (UAVs) or drones in the beginning were simple remotely piloted

aircrafts, but autonomous control is increasingly being employed [6]. Advances in technology, material

science and control engineering made the development of small-scale UAVs possible, affordable and

widely used in the military domain [5].

There are a wide variety of UAVs that may weigh from 18000 kilograms [7] to recently one UAV

developed at the University of Pennsylvania with the coordination of Professor Vijay Kumar with near

50 grams that consume just 15 W and has a diameter of 20 cm [8]. UAVs can cost from a few Euros to

tens of millions of Euros [7] and some of them have the capacity of carry at least 500 kg like MQ-1B

Predator [9].

These UAVs are equipped with sensors such as accelerometers, gyroscopes and barometers to

stabilize the flight attitude and is usual that they also include Global Positioning System (GPS)

receivers to obtain accurate position information. They may also include cameras, infrared cameras,

or other sensors [5].

More recently a lot of research is being done in attempts to flight through non-structured

environments which include 3D online reconstruction of the environment and consequent calculation

of adaptive trajectories. Flight formations are also one of the main research fields [8].

The trend for use of UAV technology in commercial aerial surveillance is expanding rapidly

thanks to new low cost UAV systems. Surveillance applications include: livestock monitoring, wildfire

mapping, pipeline security, home security, road patrol and anti-piracy [10]. Disaster assessment and

disaster management are also important applications and perhaps the most important ones since they

can directly deal with people lives [11]. Like [5] refer in disaster situations the available information

(e.g. maps) is no longer valid due to earthquakes or flooding so a quick and accurate overview of the

affected area provided by these UAVs can be very useful.

1.2. Image mosaicing

A lot of work has been done in mosaicing of images over the last 30 years (first algorithms

developed). Shortly after the photographic process was developed in 1839, the use of photographs

was demonstrated on topographical mapping [12]. However, these photographs initially acquired from

hill-tops or balloons and later with airplanes (1903), were in the beginning manually pieced together

[13]. Later in history after the first satellites started sending pictures back to earth the need for

automatic mosaicing methods increased.

So, in the present context, the goal of image mosaicing methods is to perform stitching

operations between a set of images obtained from small-scale UAVs of a determined region of interest

in order to create a single image with a larger field of view of the same region. A more complete

definition of image mosaicing is given ahead in chapter 3.

- 3 -

1.3. Framework and goals of the dissertation

This work falls within the scope of activities LAETA (Laboratório de Energia, Transportes e

Aeronáutica) in which there is an ongoing project to evaluate the fire safety in campgrounds and trailer

parks in Portugal.

This project aims to provide, up to date, geographic information of critical areas and its

surroundings to civil authorities, identifying enhancers of risk and this way supporting the fight against

fire. In this context, the major goal of this project aims to create a mosaic of images using a quadrotor

(Fig. 1.1). For this purpose, an analysis of the difficulty / efficiency of a set of acquisition imaging

systems that exists for quadrotors are already being studied. Methods for the automatic extraction of

information from mosaics will also be addressed (water sources, green zones, access roads, houses

or tents more vulnerable and everything that can help the fire department in charge) as well as a

possible integration of the mosaics created in Google Earth. One of the project objectives also aims to

study the possibility of integrating fish-eye lenses instead of conventional camera lens in order to take

advantage of its higher field of view.

Fig. 1.1 – Uavision quadrotor (small scale UAV) used in this project.

After a brief introduction and reference to the project behind this dissertation, and since the

project was still in an initial phase, the main objective for this dissertation was to study the state of the

art regarding the creation of mosaic images. Later the most appropriate method/methods found for the

present situation would be implemented and tested with a variety of representative images in Matlab.

A small study should be made to assess the robustness of the developed method in order to

inform the other members of the project, including those responsible for the control of the quadrotor,

for example, about the most appropriate flight altitude and overlap percentage between images, as

well as the maximum values allowed for the angles pitch, roll and yaw of the UAV while taking the

photos.

- 4 -

1.4. Steps for the building of a mosaic of images

To solve image mosaicing problem one of major strategies used in literature involve finding the

2D homographies (section 3.3) that describe the projective transformation between a pair of images;

naturally this requires that those images have some overlapping region between each other. Due to its

intuitive way to deal with the problem and also because of the quality of the achieved results shown by

numerous studies, find these homographies became the strategy adopted.

A basic global scheme, of the implemented algorithm, to solve image mosaicing problem

through homographies (which excludes data given by UAV sensors), is presented here for future

reference:

Fig. 1.2 – Global scheme of the proposed algorithm.

Find interest points in all the images

Describe interest points

Find putative correspondences

Find consistent correspondences

Estimate homography

Stitching operation

Image mosaic

- 5 -

1.5. Common issues

Several difficulties/challenges are reported in literature when images are obtained from low-

flying, small-scale UAVs, particularly when using homographies [5]. Next are enumerated some of

those difficulties. Related theoretical concepts are described in chapter 3:

Non - planar surfaces: Homography theory (2D homographies) is just true when we are

dealing with planar surfaces, where we assume image transformations as being from 2D to 2D and so

without ground reliefs. Homographies obtained from points matched at different elevations can cause

severe failures. Orthorectification (section 3.2.3), nadir view (section 3.2.2), parallax effect (section

3.2.4) are the concepts involved;

Low altitude: Naturally if flight altitude decrease, the importance of ground reliefs increase,

amplifying the problems already described;

Non – nadir views (section 3.2.2): biggest responsible for the occurrence of parallax effect;

Inaccurate data from UAV’s sensors: position and orientation given by GPS, Inertial

Measuring Unit (IMU) systems and altimeters, if accurate, would be enough to solve the image

mosaicing problem;

Lenses distortions (section 3.2.5): for example if fish eye lenses are being used;

Computational resources: important on real time applications, which fortunately is not our

case.

We finally note that this work aims photographing campsites, where usually there are no

relevant ground reliefs. Considering this fact homographies become an interesting option.

1.6. Structure of the dissertation

In chapter 2 is presented a revision of the state of the art regarding image mosaics. Chapter 3

addresses the theoretical concepts (“Mathematical Problem Formulation”). Chapter 4 concerns the

development of a simulator in Matlab that allow to take virtual photos to an image (this image

represent the world) considering for this that the world frame origin is in one of the extremes of such

image. To cope with this, information like the camera calibration matrix as well as the camera position

and orientation relative to the world frame need to be considered. This chapter is of extreme

importance since transformations involved allow to associate a pixel (of a picture taken to this “world

image”), given in pixel coordinates, to its corresponding point in the world coordinates (meters) and

vice versa. In chapter 5 is given a brief explanation of the most relevant practical aspects of the

implementation in Matlab. In Fig. 5.1 is presented a basic global algorithm of the image mosaicing

method. The differences between the methods mentioned on chapter 3 and the ones, in fact,

implemented are also addressed. This chapter 5 ends with an explanation of the error measures used.

Chapter 6 concerns the results and discussion. A robust analysis of the method is made. Examples of

results with real data are also presented. Finally, chapter 7 is for the conclusion and future work.

- 6 -

1.7. Contributions

The contributions of this dissertation concern the following topics:

1. It was developed a “toolbox” for Matlab to simulate the process of taking photographs with

UAVs. To do this, there are several inputs that need to be taken into account. They are: the calibration

matrix of the camera that is being used; angles pitch, roll and yaw of the UAV at the precise moment

when the photograph is to be taken; the cartesian coordinates (x, y, z) of the optical center relative to

the world frame; it is still necessary to provide a “virtual picture of the world” because this software

assumes that we are photographing a flat surface (the origin of the world reference frame is in the

upper left corner of the image); finally, we must provide the coordinate transformation matrix between

the UAV reference frame and the camera reference frame (they are not necessarily coincident).

This simulation does not deal with distortions of the camera lens.

2. A deep study of SIFT (Scale Invariant Feature Transform) based algorithms for mosaicing

methods led to choose a variation of this algorithm that is well suited for our purpose of building

mosaics from quadrotor images. The method to solve image mosaicing problem was developed in

Matlab. Inputs of the method are just the images to mosaicing. An advantage of this particular

implementation is that the input images do not need to have all the same sizes. An important

disadvantage is that it has to receive the images according to its stitching operation: the second image

stitch with the first, the third with the second and so on, however a possible method to solve this

problem is proposed as future work (section 7.2).

3. Since is not our goal a real time application, an improvement on the Ransac algorithm is

proposed. Part of it is implemented and the rest can be done easily as future work. Naturally, this

solution increases the computational effort of the algorithm (time consumption).

4. This work provides important feedback to those responsible for the control of the quadrotor.

The main features to be taken into account are: the overlap percentage between images - that need to

be high; and the importance of the minimization, in particular, of the pitch and roll angles of the UAV at

the precise moment when the photographs are taken – flight stability. A few comparative tables for this

are presented in the appendixes.

- 7 -

2. State of the Art

This chapter is divided as follows: in section 2.1 the main research and application fields of

image mosaicing methods are addressed. In the end of section 2.1 the image mosaicing problem is

divided into three smaller problems. These problems are mentioned, respectively, in sections 2.2, 2.3

and 2.4.

2.1. Research and application fields of image mosaicing

Nowadays the main research fields and applications of mosaicing methods can be categorized

into three topics: (1) the creation of image panoramas [14]; (2) the creation of planar mosaics from flat

surfaces [15]. It is often considered that satellite and UAV acquired Earth surfaces can be

approximated to flat surfaces [5]. Still in the planar mosaics context, microscopic mosaics for medical

purposes are also being studied [16]; (3) the 3D reconstruction of mosaics that includes:

The reconstruction of surfaces with relief: e.g., Earth surface (for precision) using

satellites (example of Google Earth); seafloor [17];

3D reconstruction of unstructured environments, e.g., interior of infrastructures [8];

Medical purposes, e.g., for facial reconstruction [18].

3D mosaics are also being used to get 3D representations of patient’s bones using

ultrasound technology [19];

Principals of epipolar geometry and stereo vision [20].

In the literature, the problem of image mosaicing is a combination of three problems: Correct

geometric deformations using data and/or camera models, image registration using data/or

camera models and eliminating seams from image mosaics [21]. Next it will be made a resume

regarding each of these three problems.

2.2. Correct geometric deformations using data and/or camera models

There are a wide variety of linear and nonlinear geometric distortions caused by camera lens.

Fish eye lens are an example of lens that produce images with a global distortion (barrel distortion).

According to [22] polynomial distortions can be used in such situations of global distortions using only

point correspondences between two images. Some methods proposed to solve lens distortions use

pre-computed data from calibration procedures to pre-processed photographies [23].This dissertation

is not focused on correcting distortions from camera lens, so we will skip this topic.

M. Irani and P. Anadan [24][25][26][27] chose to use polynomial transformations with more

degrees of freedom than the 8-parameter bilinear transformation that accurately handle perspective

distortions (and thus homographies). They use the extra degree of freedom in the transformations to

- 8 -

deal with the nonlinearities due to parallax, scene change, etc. [21]. A. Goshtasby [28] handle with

local deformations dividing the sub-regions in triangular regions with corners at the control points and

then he finds a linear transformation for that region that exactly maps corners to desired locations.

2.3. Image registration using data/or camera models

Image registration is the task of matching two or more images [21]. Automatic methods for

image registration used in image mosaicing literature can be categorized in many ways. Some authors

divide these methods just as feature based (section 2.3.1) and pixel based methods [5] (or intensity-

based [29]). On the other hand others authors as for example [21] consider a kind of a division of the

pixel based methods in three subgroups: exhaustive search (section 2.3.2), frequency domain

(section 2.3.3) and iteratively adjust (section 2.3.4) methods. Next, is given a brief introduction in

each of these groups.

2.3.1. Feature based methods

Feature based methods rely on accurate detection of image features. After the detection step,

strategies are used to find the features correspondences between images. The method finishes with

the calculation of the transformations between the images using the previous correspondences.

Examples are: [30][31].

Detection and description of features

Morphological features such as area, length and perimeter and color features such as the mean

value of the green pixels and the maximum value of the red pixels are used, for example, in the quality

assessment of lentils [32]. Features may also be points, lines or contours [29]. In computer vision,

some of main feature detection techniques currently used in the literature can be divided briefly as:

Edge detection: Canny [33]; Canny-Deriche [34]; Differential [35][36]; Sobel [37][38]; Prewitt [39];

Roberts Cross [40]. Corner detection: Harris operator [41]; Shi and Tomasi [42][43]; Level curve

curvature [44][45]; SUSAN [46][47]; FAST [48]; Harris-Laplacian scale invariant [49][31]. Blob

detection: Laplacian of Gaussian (LoG) [50]; Difference of Gaussians (DoG) [51]; Determinat of

Hessian (DoH); Maximally stable extremal regions, Principal curvature-based region detector (PCBR);

Ridge detection; Hough transform; Structure tensor; Affine invariant feature detection: Affine shape

adaptation, Harris affine, Hessian affine; Feature description: Scale-Invariant Feature Transform

(SIFT) [52][30][53]; Affine Sift Invariant Feature Transform (ASIFT) [54][55][56][57] is a fully affine

invariant method where the authors calculate SIFT interest points for several images. These images

are obtained simulating several camera rotations on the original image. Despite the large amount of

interest point this algorithm is expensive (time consumption); Speed Up Robust Feature (SURF) [58]

- 9 -

which is, up to some point, similar in concept as SIFT, in that they both focus on the spatial distribution

of gradient information; Gradient Location and Orientation Histogram (GLOH) which is a SIFT-like

descriptor that contains more spatial regions for the histograms. It uses the Principal Component

Analysis (PCA) to reduce its dimension to 64; Histogram of orientated gradients (HOG); Local Energy

based Shape Histogram (LESH) and Scale-space: Scale-space axioms; Implementation details;

Pyramids.

However, not all these techniques were specifically developed for our problem. From the

references mentioned above, from our research, the following were applied directly to problems

related to mosaic images using UAVs: SIFT [5]; SUSAN [59].

Some methods try to reduce the computational cost by finding just distinctive features in

portions of images, using for this accurate sensor data from aerial photography to provide guesses on

the overlapping regions[60].

Matching features

In order to find the correspondence features between a set of images, matching algorithms

have been proposed. Some of these algorithms are pixel-based, as the phase correlation method that

is mentioned in section 2.3.3, but here we are concerning just the feature-based like the nearest

neighborhood [30] and the Ransac for robust estimation [61][62].

Feature-based mosaicing algorithms usually include a step for the calculation of a vector of

descriptors for each feature. These vectors works like a fingerprint for each feature [30].

Ransac is mentioned in the literature several times due to the quality of its results on finding

correct matches between features [62]. However, Ransac does not receive just the features as inputs,

but also some initial guesses on the correspondences (putative correspondences). The aim of putative

correspondence methods is not the perfect matching, although they should present Ransac algorithm

a set of correspondences with at least 50% of inliers [62]. Examples of putative correspondence

methods – that use the vector-descriptors to find the putative correspondences between each pair of

images – are: the nearest neighborhood, which defines a correspondence as the keypoint (from image

2) with the minimum Euclidean distance from other keypoint (of image 1); and even a better method

mentioned by D. Lowe on SIFT method [30] is to use the ratio between the distance to the first and to

the second neighborhood, respectively.

Ransac, in this context of feature-based methods, appears often related to homographies and

to the use of the Direct Linear Transformation (DLT) algorithm [62]. Homographies can be computed

using the DLT algorithm from a set of more than four robust correspondences.

However, in the absence of distinctive features, this kind of approach is likely to fail [21].

In the context of this dissertation the reader should be aware of the variety of solutions available

to solve image mosaicing problem. Thus, does not make much sense to study all the methods that

exist, since it would take too much time. For that reason and since preliminary tests with the

correlation method did not go well, pixel-based methods were not deeply studied.

- 10 -

Transformation problem

After the correspondence problem is completed, transformations are then determined in order to

map the target image to the reference images, thereby establishing point-by-point correspondence

between the reference and the target images [29]. Some methods are only focused on correcting the

translation, rotation and scale between images[63][64][65][66], while others deal also with perspective

and projective transformations (homographies), as we saw before with Ransac using DLT [62].

2.3.2. Exhaustive search

Exhaustively searching for a best match for all possible motion parameters can be

computationally extremely expensive. Using hierarchical processing (i.e. coarse-to-fine [67]) results in

significant speed-ups. According to [68] this approach is also used to take advantage of parallel

processing for additional performance improvement.

2.3.3. Frequency domain

Frequency domain approaches uses the well known phase correlation technique to find the

displacements [63] [64] and other Fourier transform properties to find rotation and scale [65][66]. This

phase correlation technique estimates the 2-D translation between a pair of images by taking 2-D

Fourier Transforms of each image, computing the phase difference at each frequency, performing an

inverse Fourier Transform, and searching for a peak in the magnitude image. This technique proves to

work remarkably well, providing good initial guesses for image pairs which overlap by as little as 50%.

The technique works well for panorama and planar mosaics after lends distortion corrections and is

computationally efficient. These methods are, however, not specifically develop to UAV mosaicing

problems (perspective transformations and ground reliefs) since they can be sensitive to noise [21]

and will not work if the inter-frame motion is not mostly translational [15]. Still, according to our

reference [15] it was used only for planar scenes such as a documents, whiteboards and flat desktops.

2.3.4. Iteratively adjust

Iteratively adjusting camera-motion parameters leads to local minimums unless a reliable initial

estimate is provided. See for example the Levenberg-Marquardt iterative non-linear minimization

algorithm [69] which leads only to locally optimal solutions [15]. Initial estimates can be obtained using

a coarse global search or an efficiently implemented frequency domain approach [15][70].

Within our research these methods are not yet specifically used in UAVs mosaicing problems;

however it may be possible, in the future, to provide initial estimations to these methods through UAVs

sensors.

- 11 -

2.4. Eliminating seams from image mosaics

After the geometric corrections, images, most likely, require further processing to eliminate

remaining distortions and discontinuities. Alignment of images may be imperfect due to registration

errors resulting from, e.g. homography computation. In most cases images that need to be mosaiced

are not exposed evenly due to changing lighting conditions. However, these unwanted effects can be

alleviated during the compositing process.

Following [21] the main problem in image compositing is the problem of determining how the

pixels in an overlapping area should be represented.

Yahyanejad [5], e.g., simply replaces the overlapping pixels according to the order in which the

images are added to the mosaic. Each new image added overlaps the images already mosaiced.

Finding the best separation border between overlapping images [71] has the potential to

eliminate remaining geometric distortions. Such a border is likely to traverse around moving objects

avoiding double exposure [72][73]. The uneven exposure problem can be solved by histogram

equalization [73][74], by iteratively distributing the edge effect on the border to a large area [75], or by

a smooth blending function [76].

Following Begg C. L [77] it is possible to use the final overlapping region to calculate an

intensity correction and apply it to the images. The correction is calculated by finding the average

difference in intensity between the two images inside the overlap, and adding that values to every

pixel in the second image.

- 12 -

- 13 -

3. Mathematical Problem Formulation

This chapter is structured as follows: in the first section is given a brief explanation of what

image mosaicing is (section 3.1); the following section focuses on geometric transformations and

concepts derived from geometric distortions introduced when photographing with UAVs (section 3.2);

in the next section a brief definition of image registration is given (section 3.3); finally the concept of

homography is introduced (section 3.4) and all methods and algorithms arising are explained right

after.

Note that, in this dissertation, Matlab implementations do not always follow exactly the

approaches proposed by the authors (which are precisely those that are explained in this chapter), so,

in the Matlab implementation chapter, the introduced variations will be explained. Some practical

aspects regarding these algorithms (e.g.: constant values) are also discussed in the next chapters.

Another aspect is that the mathematical approach of some of the concepts that were not directly

involved in this dissertation were set aside (referring mainly to section 3.2).

Finally, the pinhole camera model is included in the chapter 4 to ease the reasoning.

3.1. Image mosaicing

Given a set of images with some overlapping regions between each other, mosaicing algorithms

are used to perform stitching operations on such images in order to create a single image with a larger

field of view. When pure image based, one may use mosaicing algorithms to find out the

corresponding camera angles used to take each image.

Naturally, there are several steps regarding image mosaicing. Methods are usually considered

as a combination of the three following problems [21]:

• Correct geometric deformations using image data and/or camera models (section 3.2)

• Image registration using image data and/or camera models (section 3.3)

• Eliminate seams from image mosaics (chapter 5.1.2)

- 14 -

Next, different situations where we apply mosaic algorithms, of increasing complexity, are

summarized [78]:

Planar scene: Situation wherein translations and a range of rotations of the camera optical

center are allowed1, but there are no depth variations (as ground reliefs). Any two images can be

related to each other by 2D homographies.

Panorama scene (or environment map [79]): when camera motion is resumed to rotations

around its optical center. Practice recently very used by Google Earth as Street View.

Scenes with depth variations: When two cameras view a 3D scene from two distinct positions.

The concept of motion parallax is involved, as well as epipolar geometry principles. When the camera

motion is known, the problem of depth map recovery is called stereo reconstruction (or multiframe

stereo if more than two views are used). This problem has been extensively studied in

photogrammetry and computer vision [80].

A version of such mosaics can also be seen in Google Earth, where a 3D reconstruction of the

Earth, including major ground reliefs, is performed.

As was said in section 1.4, this dissertation will be focused on methods regarding homographies

(Fig. 1.2), so we will approximate our problem to a planar scene situation considering that ground

reliefs are of small relevance for our problem.

In the next sections we will first speak about geometric corrections and give a definition of what

image registration is, later homography and related methods will be addressed.

3.2. Geometric corrections

A geometric transformation is a mapping that relocates image points.

Transformations can be global or local in nature. Global transformations are usually defined

by a single equation which is applied to the whole image. Local transformations are applied to a part of

an image and they are harder to express concisely [21].

Most common global transformations are the affine, perspective and polynomial

transformations [21] (Fig. 3.1). In geometry, an affine transformation (it is equivalent to a translation

followed by a linear transformation) is a transformation which preserves straight lines (i.e., all points

lying on a line initially, still lie on a line after transformation) and ratios of distances between points

lying on a straight line (e.g., the midpoint of a line segment remains the midpoint after transformation).

An affine transformation does not necessarily preserve angles or lengths. Translation, geometric

contraction, expansion, dilation, reflection, rotation, shear, similarity transformations, and spiral

1 Camera should be pointing down, high pitch and roll rotations - depending on the angle of view of the

camera - may result in photographies containing portions of the sky (points at infinity).

http://en.wikipedia.org/wiki/Expansion_(geometry)

http://en.wikipedia.org/wiki/Similarity_(geometry)

- 15 -

similarities are all affine transformations, as are their combinations [81]. As we will see (section 3.2.2)

perspective transformations2 are closely related to non nadir views, on the other hand polynomial

transformations are related to ground reliefs. Polynomial transformations are also often used to correct

geometric distortions caused by camera lens. Shah S. used polynomial transformations to correct fish-

eye lens [82] (section 3.2.5).

For more complex geometric distortions, one may use nonlinear (local) transformations

combined with others.

Fig. 3.1 – Common geometric transformations. The first three cases are typical examples for the affine

transformations. The remaining two are the common cases where perspective and polynomial

transformations are used, respectively.[21]

3.2.1. Basic transformation problems

Consider the current situation in which our UAV fly over and take pictures of an area previously

selected, we can easily notice that in the middle of air, without any kind of support, the simple effect of

wind and atmospheric pressure differences can cause oscillations on our UAV. Variations of flight

altitude, pitch, yaw and roll angles (Fig. 3.2) are naturally present, thereby causing geometric

distortions between images that we obtain and images that we wished to have. If the UAV controller is

not robust enough these previous distortions can make our problem very difficult to solve.

Fig. 3.2 – Attitude Rotations: Pitch, Roll, and Yaw. [83]

2 Projective transformation or homography is the composition of a pair of perspective projections.

- 16 -

Eventually it becomes necessary that the current methods used to generate mosaics deal

directly with the three simplest possible transformations that can occur when we try to stitch two

images. They are the affine transformations: relative rotations between images, scale effect and of

course the translation effect [84].

3.2.2. Nadir view

Nadir direction is just the vertical direction (pointing down) directly below a particular location

[85]. Fig. 3.3 shows the two vertical directions at a specified location, nadir and zenith directions.

Fig. 3.3 – Diagram showing the relationship between the zenith, the nadir, and different types of horizon.

The zenith is opposite the nadir [86].

If camera axis is not aligned with nadir direction at the exact moment in which we take pictures,

then resultant images will contain perspective distortions of the ground. Non nadir views can be

caused by non zero pitch and/or roll angles (camera tilt) of the UAV (Fig. 3.3).

Fig. 3.4 – Perspective distortion resultant of a pitch rotation.

The distortion produced by this phenomenon is characterized as being a geometrical

perspective transformation since the ground plane is not parallel to picture plane (Fig. 3.5).

- 17 -

Fig. 3.5 – Perspective views of an A4 sheet.

Fortunately both these perspective distortions provoked by pitch and roll rotations can be

corrected through perspective transformations [21] provided that pitch and roll angles remain relatively

small. Photos tend to lose accuracy in corners and edges of images for larger distances from the

camera as can be seen (Fig. 3.6 c)) where pitch and/or roll angles are relatively large.

Fig. 3.6 – From left to right: a) Nadir View; b) Small pitch rotation; c) High pitch rotation; d) Scheme of the

three views. It works similar to roll axis [87].

3.2.3. Orthorectification

At this point it was considered that pictures were taken of approximately flat surfaces (without

ground reliefs). However, the topography of the ground is also a very important source of distortion in

this type of aerial photos. A lot of work has been done in order to obtain photos somehow unbiased

from relief effects [88][89]. A photograph corrected by these effects caused by topography is

commonly referred as orthophoto. Fig. 3.7 exemplifies this concept through a top and side view of two

objects.

- 18 -

Fig. 3.7 – Orthographic views project as a right angle to the data plane. Perspective views project from the

surface onto the datum plane from a fixed location [90].

This concept is one of the most important in this research area, so it is worthy of a definition.

According to our reference [91] an orthophoto, orthophotograph or orthoimage is an aerial photograph

geometrically corrected ("orthorectified") such that the scale is uniform: the photo has the same lack of

distortion as a map. Unlike an uncorrected aerial photograph, an orthophotograph can be used to

measure true distances, because it is an accurate representation of the Earth's surface, having been

adjusted for topographic reliefs, [91] lens distortions and camera tilts.

3.2.4. Parallax

Parallax is a displacement or difference in the apparent position of an object viewed along two

different lines of sight, and is measured by the angle or semi-angle of inclination between those two

lines [92].

Fig. 3.8 – A simplified illustration of the parallax of an object against a distant background due to a

perspective shift. When viewed from "Viewpoint A", the object appears to be in front of the blue

square. When the viewpoint is changed to "Viewpoint B", the object appears to have moved in

front of the red square [93].

Parallax is a consequence of the presence of ground relief. Camera tilts (pitch and roll) are just

aggravating factors (Fig. 3.9).

- 19 -

Fig. 3.9 – Parallax as a consequence of ground reliefs for high pitch angles.

Orthorectification algorithms have parallax into account and they are often associated with

epipolar geometry and stereo vision [20].

Before we finish with this topic it is important to notice that even perfect nadir photos taken with

pinhole cameras can reveal the parallax problem (Fig. 3.10).

Fig. 3.10 – Parallax on nadir photos, where the bigger the angle of view of the camera the worst.

Ideally to solve this problem we would have to take infinite pictures at an infinite distance in

order to have one nadir-photo for each single ground pixel (Fig. 3.11).

Fig. 3.11 – For successively higher altitudes (nadir views) the distortions diminish for the same

photographed area, however the resolution diminishes too.

- 20 -

In aerial photography there are situations where it is easy to identify the presence of problems

related to parallax effect independently of if the photo was taken from the nadir direction or not.

Consider for example the situation in which you are looking in detail at a photo, more precisely a

building, and you can see one of its facades, it becomes clear that the viewpoint of the camera related

to that specific building was not in the nadir direction causing an unwanted look of the building. It is

very difficult to deal with this problem, but there are several ways to anticipate and diminish its effects:

UAVs may be equipped with better controllers in order to diminish the pitch and roll

oscillations during the flight;

Take more photos of the ground per unit of area;

Use cameras with smaller angles of view may help;

Take photos at higher altitudes, but consistent with the resolution desired.

However attention must be given to some situations as those illustrated in Fig. 3.12:

Fig. 3.12 – Same camera tilts for successively higher altitudes gives worst results.

As we saw topographic relief (section 3.2.3) and camera tilt (section 3.2.2) problems have

already been portrayed. In the next subchapter a brief description is made around corrections of

camera lens distortions.

3.2.5. Lens distortions

There are several optical and lens profiles, while some produce slight changes in pictures

others are responsible for important distortions as the case of fisheye lens.

Most photographers are aware of the barrel and pincushion distortion that typically appear at

the shorter and longer ends of zoom lens respectively, with continuous variation in between. Real-

world lenses, as an example, often exhibit a distortion pattern that varies from barrel to pincushion

within the same image. The effect of such lens design is to have a maximum distortion mid-field, while

the edges display little or virtually no distortion [94].

- 21 -

Fig. 3.13 – a) Distortion free image; b) Barrel distortion (fisheye lens); c) Pincushion distortion; d) Example

of a complex distortion [94].

Some modern lenses exhibit geometric distortion that varies with focusing distance. As a result,

this distortion can only be properly corrected using correction models that take into account focusing

distance [94].

Knowing the internal and external parameters affecting the performance of a camera such as

focal length, distance to the target and type of distortions present, it is possible to fix or at least to

minimize these imperfections through polynomial transformations. For more complex situations

nonlinear transformations may also be used [21].

When no information is available one of the methods used relies on this principle: “straight lines

remain straight lines” [94], where lens distortions are corrected after analyses of a set of

photographies taken for example of a grid3.

3.3. Image registration

According to: [63][95][96] image registration is the task of finding the optimal transformations

that allows us to transform different sets of data into one coordinate system. For our image mosaicing

problem data are photographies and the desired output are geometric transformations. Basically such

transformations, obtained by matching methods, allow align two images if they share some

overlapping region.

Image registration is, then, the central task of image mosaicing procedures, recording that the

problem of image mosaicing is a combination of: correct geometric deformations, image registration

and eliminating seams from mosaics [21].

3.4. 2D homography

As we said previously UAV sensors give inaccurate spatial information, so if we want our

mosaic to be as much accurate as possible and since the only information that's left are the color

scales of the pixels within the image, image based methods must be studied.

3 DxO Optics Pro Software can be used to solve several problems related to lens distortions [57].

- 22 -

After a state of the art revision it is easy to say that the creation of mosaic images is closely

related to the calculation of geometric transformations that allow us to transform a set of images into

the same reference frame. In this context several authors use planar projective transformations for this

purpose.

Homography, projective transformation, projectivity or collineation is a mathematical concept

used in projective geometry to describe a relationship between two planes, such that any point on one

plane corresponds to one point in the other plane, and vice versa. It describes what happens to the

perceived positions of observed objects when the point of view of the observer changes (Fig. 3.14).

Fig. 3.14 – Observed object from different viewpoints.

Formally, homography is a projective transformation that takes a set of points in (Fig.

3.14 - I1) to its corresponding set of points likewise in (Fig. 3.14 - I2).

Some relevant properties of projective transformations:

Projectivities form a group since the inverse of a projectivity is also a projectivity, and so

is the composition of two projectivities;

Do not preserve sizes or angles;

The map is a collineation since lines are mapped to lines, which preserves incidence

since any two lines in a plane meet;

Cross-ratio, or ratio of ratio of lengths, is invariant under any projective transformation of

a line.

The problem of estimating projective transformations involves the computation of a

matrix such that for each . A question that naturally arises is: how many correspondences

do we need to calculate a homography? Briefly, matrix contains entries, but is defined only up to

scale. Since each point-to-point correspondence accounts for constraints and the number of degree

of freedom of a projective transformation is , we need to specify four point correspondences (taking

into account that any of the points must not be collinear).

- 23 -

3.5. Direct Linear Transformation (DLT) algorithm

In the present work, DLT is used to find homographies given a set of more than 4 point

correspondences for which we have an over determined solution. Therefore, in order to introduce

involved equations we will begin with a simple linear algorithm for determining the exact given a set

of four to point correspondences satisfying:

(3.1)

Exact solution

Since we are dealing with homogeneous vectors4,

and differ from each other by a

nonzero scale factor; however the direction is kept the same. An alternative expression is to consider

the cross product, which is zero for parallel vectors:

(3.2)

Defining as the -th row of the matrix we can write:

(3.3)

Now if

then the cross product is defined as:

(3.4)

Since , we can rewrite the previous equation:

(3.5)

Where is a matrix, and is a 9-vector made up of the entries of the matrix .

4 Homogeneous coordinates, for our to case, are obtained adding a third coordinate to a point in

Euclidean 2-space. For more information it is recommended to see the reference [55].

- 24 -

,

(3.6)

One may find that only the first two rows of eq. (3.5) are linear independent, so rewriting we got:

(3.7)

Where is now a matrix.

As can be inferred, is a linear equation in the unknown where each point

correspondence gives two independent equations, so if we consider four of such points we obtain

coefficients for our matrix . Since is in general only determined up to scale, the 1-

dimensional null-space is not an impediment for finding a solution, since the scale may be arbitrarily

chosen due to usage of homogeneous coordinates (( ) and ( ) represent the same point,

where is the scale factor). Posteriorly a homography normalization can be performed5.

Over determined solution

Finally, if more than four point correspondences are given then there will not be an exact

solution to the over-determined system: apart from the trivial one. It is easy to see that this

happens due to the fact that the measurement of image coordinates is inexact (presence of noise).

To cope with this, one attempts to find an approximate solution that minimizes a suitable cost

function. It seems natural to attempt to minimize the norm , however to avoid the solution

a constraint such as must be used. The value of the norm, , is unimportant since is

only defined up to scale. The solution is then the unit singular vector corresponding to the smallest

singular value of [62] (p. 90). The resulting DLT algorithm is described next.

5 one normalization uses entry performing for this

, however attention must given when

points are mapped to infinity (skyline/the vanishing line of the plane) where ; On the other hand

there is a normalization that consider where

following the algebraic relation:

.

- 25 -

Objective: Find homography given a set of to point

correspondences such that

.

1. Normalization of : Compute a similarity transformation , consisting of a

translation and scaling, that takes points , to a new set of points such

that the centroid of the points is the coordinate origin and their

average distance from the origin is ;

2. Normalization of : Compute a similar transformation for the points in

the second image, transforming points to

;

3. From each correspondence compute the matrix using eq. (3.5)

6.

4. Assemble the matrices into a single matrix ;

5. Given , find that minimizes subject to ;

6. Obtain SVD (Single Value Decomposition) of : where and

are orthogonal matrices and is a diagonal matrix with non-zero entries;

7. is the last column of ;

8. Denormalization: set .

The normalized DLT algorithm [62] (p.107-109) introduces an essential normalization step to

point coordinates. Typical image points are of the order: , i.e., and are much

larger then , thus the entries of : , , , will be of order , while will be just unity.

Without entering in too much detail the thing is that all entries in must have similar magnitude (some

entries of need to be increased while others decreased) and to do that a new matrix is found that

minimizes the difference to in Frobenius norm: . This way for which has not rank

8 due to noisy data (and so does not have an exact solution) is replaced by that has rank

8 (with minimal changes) providing this way the exact solution of: . Frobenius norm is

simply given by the square root of the sum of squares of all entries. Results show that data

normalization achieved by rescaling and translating point coordinates gives dramatically better results

[62].

The basic DLT algorithm can be consulted here: page 91 of [62]. For readers interested, further

explanations about Least-squares solution as well as for SVD can be found here: [62] (p.592-593) and

[62] (p.585-587) respectively.

6 We use eq. (3.5) instead of eq. (3.7) because we do not have an exact solution and therefore the three

rows of are linear independent.

The normalized

DLT for 2D

homographies

The basic DLT for 2D

homographies

Least-squares solution of

homogeneous equations

Single Value

Decomposition

- 26 -

3.6. Feature detection using Harris-Laplace

In image processing, feature detection methods are used to extract information (or features)

from images in order to, later, use this information to make several local decisions. It’s difficult to give

an exact definition of feature since it often depends on the problem, however there are specialized

techniques used to find some classes of features. Feature detection methods can be generally divided

in four groups depending on the type of feature extracted: image regions/blobs[97]; corners/interest

points; edges and ridges [98].

Due to the fact that homographies can be calculated from a minimum set of four matching

points between a pair of images with some overlapping region, in this section it will be only addressed

methods that allow to find interest points / corners of images in order to, a posteriori, find the

correspondence between points of both images and solve the image mosaicing problem.

Moravec

Moravec’s corner detector is one of the earliest corner detection algorithms. Moravec algorithm

test how similar a patch centered on each pixel is to nearby and defines a corner to be a point if with

low self-similarity [99].

Denoting the image intensities by , and considering an image patch over the area the

change produced by the small shifts, , considered by the author7, give an average change of the

image intensity (sum of squared differences between the two patches):

(3.8)

Where specifies the image window: it is unity within a specified rectangular region around

each pixel and zero elsewhere.

Moravec’s corner detector works like this: for each pixel it considers the lowest value of from

the set of shifted patches mentioned (corner strength). After this it looks for local maxima above some

threshold. A smaller value of E indicates more similarity and smaller is the probability of this point to

become a corner.

7 The shifted patches ( ) used by Moravec that are considered comprise: {(1,0);(1,1);(0,1);(-1,1)} (shifts

at every 45 degrees).

- 27 -

In conclusion, three cases were considered in the previous paragraph [41]:

A. If the windowed image patch is flat (ie. Approximately constant in intensity), then all shifts will

result in only a small change;

B. If the window straddles an edge, then a shift along the edge will result in a small change,

but a shift perpendicular to the edge will result in a large change;

C. If the windowed patch is a corner or isolated point, then all shifts will result in a large change.

A corner can thus be detected by finding when the minimum change produced by any of the

shifts is large.

Harris / Harris & Stephens

Harris method [41] presented an improvement upon Moravec’s corner detector by adding full

rotation invariance. Moravec considered a discrete set of shifted patches (anisotropic) to deal with

rotation, however Harris by performing an analytic expansion about the shift origin make possible the

presence of all small shifts.

Recovering the previous equation used by Moravec:

(3.9)

In which Harris considered the corner score ( ) to be an autocorrelation matrix, and performing

a Taylor expansion to :

(3.10)

Where and are partial derivatives that can be obtained by the convolutions:

(3.11)

We get:

(3.12)

- 28 -

This can be written in the matrix form:

(3.13)

Where is the structure tensor/second moment matrix:

(3.14)

And , and can be obtained as:

(3.15)

Instead of the binary and rectangular window used by Moravec, Harris proposed to use an

isotropic smooth circular window like the Gaussian, reducing, this way, the previous noisy response:

(3.16)

Or in the normalized form:

(3.17)

Gaussian window acts to average in a local region while weighting those values near the center

more heavily. For the normalized form, the weight summation over all entries of is unitary.

Following Harris-Stephens article, let α and β be the eigenvalues of . α and β are proportional

to the principal curvatures of and so they will give a rotationally invariant description of .

Instead of use the explicit calculation of the eigenvalues and since there is a need for a corner

classification, Harris find attractive to use:

(3.18)

Where is used as a measure of the corner response and , sensitivity factor, was proposed

by the author to be ( - have been reported as feasible). is positive in the corner

regions, negative in the edge regions and small in the flat regions (Fig. 3.15). Notice that a pixel is

selected as a corner if its response is an 8 –way local maximum.

- 29 -

Fig. 3.15 – Contours of constant are shown by the fine lines.

As before, three cases were considered:

If both curvatures are small, so that the local auto-correlation function is flat, then the

windowed image region is of approximately constant intensity (ie. arbitrary shifts of the image

patch cause little change in );

If one curvature is high and the other low, so that the local auto-correlation function is ridge

shaped, then only shifts along the ridge (ie. along the edge) cause little change in : this

indicates an edge;

If both curvatures are high, so that the local auto-correlation function is sharply peaked,

then shifts in any direction will increase : this indicates a corner.

Harris-Laplace

Harris’s method presented good results, however it does not deal with scale changes. To

achieve robustness to scale changes Mikolajczyk and Schmid [49] proposed a new method for

detecting scale invariant interest points. Their method first computes a multi-scale representation for

the Harris interest point detector and then select points at which a local measure (the Laplacian) is

maximal over scales. For each interest point characteristic scale can also be determined. According its

authors, these points are invariant to scale, rotation and translation as well as robust to illumination

changes and limited changes of viewpoint.

In order to compute a multi-scale representation for the Harris corners the second moment

matrix was modified to reflect a scale-invariant property. Being the new scale adapted

second-moment matrix for the Harris-Laplace detector [31]:

(3.19)

- 30 -

Where is the Gaussian smoothed image:

(3.20)

And of course:

(3.21)

As we saw before (eq. (3.17)), is an isotropic, circular Gaussian kernel over the area

:

(3.22)

is a local scale for smoothing prior to the computation of image derivatives and is an

integration scale for accumulating the nonlinear operations on derivative operators into an integrated

image descriptor. is also the current scale at which the Harris corner points are detected.

Next, the algorithm searches over a fixed number of defined scales. This set of scales is defined

as:

(3.23)

Where is the scale factor between successive levels, is the initial value and is the number

of scale levels. Mikolajczyk and Schmid [31] set as in their experiments. They also used the

constant factor in the relation:

(3.24)

Investigated ranges for are within . Naturally , as initial scale, can be set as 1.

Finally, as before in the original Harris method for corner classification, the already known

expression (eq. (3.18)) is used:

(3.25)

And again a pixel is selected as a corner if its response is an 8 – way local maximum.

- 31 -

At this point we have Harris corner localizations at each scale level. Now comes the second and

final step of the algorithm where we must select points at which a local measure (the Laplacian) is

maximal over scales. Below a description of the procedure is given.

Calculate the scale normalized Laplacian operator (Laplacian of Gaussian - LoG) at each of the

scale levels considered:

(3.26)

Where and are the second derivatives in their respective directions and factor is

used to normalize the LoG across scales and make these measures comparable. Mikolajczyk and

Schmid demonstrate that the LoG measure attains the highest percentage of correctly detected corner

points in comparison to other scale-selection measures [49].

For each corner found previously by Harris:

If the scale which maximizes this LoG measure is between the two scale

neighborhoods:

(3.27)

Then the corner mentioned is selected and is deemed its characteristic scale.

If the scale which maximizes this LoG measure is one of the two extremes:

(3.28)

Then the corners mentioned are selected and or , respectively, are deemed as its

characteristic scale.

Otherwise corners are discarded.

3.7. Scale Invariant Feature Transform (SIFT)

David Lowe, in his papers [52][30], proposed a new method for image feature generation based

on the grey-level image. According to his article the first stage of SIFT method is to identify interest

point locations in scale space by looking for locations that are maxima or minima of a difference of

Gaussian function. Next, an explicit scale is determined for each point, which allows the image

description vector for that point to be sampled at an equivalent scale in each image. A canonical

orientation is determined at each location, so that matching can be performed relative to a consistent

local 2D coordinate frame. Finally D. Lowe, using and coordinates, characteristic scale and the

canonical orientation, describes each interest point with a local feature vector which is invariant to

translation, scaling and rotation, is also partially invariant to illumination changes and affine or 3D

projections. Invariance for the affine or 3D projections was achieved by blurring image gradient

locations.

http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT6/node2.html

- 32 -

Laplacian of Gaussian vs. Difference of Gaussian

To identify interest points in scale space, Lidenberg [100] has shown that under some rather

general assumptions on scale invariance, the Gaussian kernel and its derivatives are the only possible

smoothing kernel for scale space analysis. Therefore, the scale space of an image is defined as a

function:

(3.29)

Where is the Gaussian kernel used in the convolution (eq. (3.17)).

To detect stable interest points, D. Lowe proposed to use space scale extrema in the Difference

of Gaussian (DoG) function convolved with the image : , which can be computed from

the difference of two nearby scales separated by a constant multiplicative factor :

(3.30)

The Laplacian of Gaussian: , gives the second image derivative, which is good to locate

interest points as corners and edges. Linderberg [100] showed that true scale invariance is achieved

normalizing the Laplacian, , by the factor : and Mikolajczyk [101] found that the maxima

and minima of produced the most stable image features compared to a range of other possible

image functions. However, it is computationally heavy to calculate Laplacian of Gaussians. According

to D. Lowe the difference-of-Gaussian function provides a close approximation to the scale-normalized

Laplacian operator with less computational effort:

(3.31)

Where is constant over all scales and so has no influence in extrema location.

was found to be a good scale space factor with almost no influence on the stability of extrema

detection and localization.

Constructing the scale space from DoGs

The philosophy is to use a set of octaves each being composed by a set of scale levels. The

author proposed to use four octaves each with five scale levels [30][102] but other refinements can be

used. To get an image in one scale level a blurring step must be done to the previous image of the

same octave. The first image of each octave is obtained by downsampling the third scale level image

of the previous octave. The first image of the first octave is obtained from the initial image. Finally, as

we said, DoG are obtained performing the difference of two nearby scales (see Fig. 3.16).

- 33 -

A close analysis shows that D. Lowe used multiple octaves but just one DoG per octave on his

first article regarding SIFT [52]. Later, in his second article [30] he presented a generalization of his

method introducing multiple levels per octave. Despite the fact that his first article seems clear, several

of the initial constants used got changed and some of the modifications introduced were not clarified.

Due to such changes, the scale space step became the reason why there are several slightly different

versions of the same algorithm available on the Internet [103][102][104].

We will present here a schematic to ease the reasoning of how to construct a space scale from

DoGs, confronting the previous interpretations [103][102][104]. with D. Lowe articles:

Fig. 3.16 – Schematic of the D. Lowe scale space.

D. Lowe conclude that blurring the initial image led to an improvement of the results, however if

we smooth the initial image before extrema detection we are effectively discarding the high spatial

frequencies [102][30]. To solve this, he doubled the size of the input image (increasing the number of

stable keypoints by almost a factor of 4) using linear interpolation prior to building the first level of the

pyramid. He assumed that the original image had a blur of at least (the minimum needed to

prevent significant aliasing), and that therefore the double image would have a blur of relative to its

new pixel spacing.

In D. Lowe’s first paper [52], as we said, just the first DoG of each of the octaves were used.

and were set to while the scale, , used on was set to . It was also used a pre

blur of before the first level of each octave (not represented).

In the second paper [30], D. Lowe changed the amount of prior smoothing, , to , which

provides close to optimal repeatability according to his results. was set to , by simply taking every

second pixel in each row and column. Despite the fact that should be , D. Lowe claims that there

is no real change in the accuracy of the results, while the computation time is greatly reduced (no

need for bilinear interpolations).

- 34 -

In the implementation: [102] and were set to while the scale, , used on

was set to , on the other hand A.Vedaldi implementation [103] present a different solution. A.

vedaldi instead of blurring the initial image with and , he just uses a blur of

right after doubling the initial image. He also uses as instead of , while

and remains and respectively.

Finally an important aspect to say is that D. Lowe used different constant values for .

According to his papers if is the number of intervals in which each octave of scale space is divided

[30] then is given by:

(3.32)

For this case we were naturally considering which produced images in the stack

of blurred images for each octave.

Characteristic scale of a point that lies within a specific DoG level

The scale of each point, D. Lowe says that it is simply the smallest Gaussian used in the

Difference-of-Gaussian function of eq. (3.30). In other words, the scale of a point found within a DoG

image is the amount of blur, , used on the smallest Gaussian image used to calculate that DoG.

A. Vedaldi defined the scale for each of his levels as

, where , is the octave

number: and is the scale level: . He simply began with because

he considered that when D. Lowe initially doubled the size of the image he was giving a step

backwards on scale space and so the scale for the first level of the first octave should be less than 1

( . Utkarsh [102], on the other hand, used simply the relation:

(3.33)

Substituting we have:

Fig. 3.17 – Scale levels for each image of each octave ( ). [102]

An observation of these results states that the first level of each octave (excluding the first)

corresponds to the level in the previous octave [30][102].

- 35 -

Interest point location

Interest point locations are simply given by the local maxima and minima of the DoG functions.

Each sample point is compared to its eight neighbors in the current image and nine neighbors in the

scale above and below. It is selected only if it is larger than all of these neighbors or smaller than all of

them.

3.7.1. Assignment of a canonical orientation to each interest point

The idea of assigning this orientation is to provide rotation invariance to all the keypoints. The

method is very simple: the scale of the keypoint is used to select the Gaussian smoothed image,

(eq. (3.29)), with the closest scale, so that all computations are performed in a scale-invariant manner.

For each image sample, , at this scale, the gradient magnitude, , and orientation, ,

is computed using pixel differences for all the pixels within the image [52][30].

(3.34)

(3.35)

Next an orientation histogram is formed from the gradient orientations of sample points within a

region around the keypoint (this region is proportional to the of the keypoint). The orientation

histogram has bins of degrees each (covering the degrees). Each sample added to the

histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with

( of the keypoint)8.

Finally the highest peak in the orientation histogram is selected for that keypoint. Any other

peak that falls within the 80-100% of the highest peak is also selected and a new point is assigned

(with the same scale) for that orientation (Fig. 3.18). A parabola was used by D. Lowe to improve

accuracy on the peak position, considering always the three closest histogram values of each interest

point for the interpolation.

8 Using Gaussian kernels the amount added also depends on the distance from the keypoint. So gradients

that are far away from the keypoint will add smaller values to the histogram.

- 36 -

Fig. 3.18 – Orientation histogram [102].

3.7.2. Interest point description

Following [52][30] the first step is to use, as before in section 3.7.1, the scale of the keypoint to

select the Gaussian smoothed image, , with the closest scale and thus compute gradient

magnitudes, eq. (3.34), and orientations, eq. (3.35), for that keypoint in a scale-invariant manner.

Next, the descriptor coordinates and the gradient orientations are rotated relative to the

canonical orientation of each keypoint which was obtained in section 3.7.1. This will guarantee rotation

invariance.

In the next step we weight gradient magnitudes around each keypoint with a circular Gaussian

window of . Its purpose will be described soon.

To find the descriptor D. Lowe began for considering a window around each keypoint as

represented in Fig. 3.19:

Fig. 3.19 – Interest point description [102].

From Fig. 3.19 one may see that the keypoint lies “in between”. It does not lie exactly on one of

the entries of the window. That is because it does not. The window takes orientations

and magnitudes of the image “in-between” pixels. So we need to interpolate the image to generate

orientation and magnitude data “in between” pixels [102][30].

- 37 -

As we can see from Fig. 3.19 we divide each window into sixteen small windows.

Now assigning the previous calculated Gaussian-weighted gradient magnitudes, of each

window, into orientation histograms of bins each (Fig. 3.20) we obtain a

dimensional vector which summarize the content of all the small windows from the top left to the

bottom right (Fig. 3.19).

Fig. 3.20 – Orientation histograms. Adapted from [102].

Finally, to reduce the effects of illumination changes, the descriptor vector, let us call it , is

normalized to the unit by dividing the vector by the square root of the sum of the squares of all the

entries of itself:

(3.36)

A change in image contrast in which each pixel value is multiplied by a constant factor will

multiply gradients by the same constant, so this contrast change will be canceled by vector

normalization. A brightness change in which a constant is added to each image pixel will not affect the

gradient values, as they are computed from pixel differences [30].

To reduce the effects of non-linear illumination changes (as camera saturations or simply solar

reflections) that can cause large changes in relative magnitudes for some gradients, but are less likely

to affect the gradient orientations, D. Lowe reduced the influence of large gradient magnitudes by

thresholding the values in the normalized vector to each be no larger than (experimentally

determined), and then renormalizing to the unit.

3.8. Putative correspondences

Given two sets of interest points, where each interest point is described by a vector, the aim of

this step is not perfect matching (it is difficult to guaranty correct matches), but to provide an initial

point correspondence set (putative correspondences) in order to, later, use other method (e.g.

Ransac) to eliminate the mismatches [62].

- 38 -

According to D.Lowe [30] the best candidate match for each keypoint is found by identifying its

nearest neighbor. The nearest neighbor is defined as the keypoint with minimum Euclidean distance

for the invariant descriptor described in section 3.7.2.

The Euclidean distance, , between the vector describing interest point from image 1,

, and the vector describing interest point from image 2, , is given by:

(3.37)

Thus, what we are looking for is the minimum value of each column/row9 of .

Briefly, a more effective measure is obtained by comparing the distance of the closest neighbor

to that of the second-closest neighbor. According to D. Lowe if we reject all matches in which the

distance ratio is greater than (eq. (3.38)) we should get rid of of the false matches while

discarding less than of the correct matches:

(3.38)

3.9. Robust estimation using Ransac

Until now all efforts were made in order to find a set of corresponding points between two

images . However, errors in previous steps must be considered. There are two main sources

of errors: in the measure of points position, which is assumed to follow a Gaussian distribution, and

due to mismatched points (outliers) resulting from previous subchapter (section 3.8). As we said

previously (section 1.4) our final goal is to find projective transformations or homographies that

describe relations between pairs of images, therefore, our objective in this subchapter is then to find a

set of at least four correctly matched points (inliers) so that homographies may be estimated in an

optimal manner. Robust estimation comes from the fact that the methods used are robust (tolerant) to

outliers (measurements following, possibly, an unmodelled error distribution). This subchapter will

follow Richard Harley and Andrew Zisserman’s book [62].

9 In fact, we obtain two different results (not necessarily) if we just use the minimum values provided from the columns or from the rows.

- 39 -

Ransac

RANdom SAmple Consensus (Ransac) [61] is a commonly used method for robust estimation

since is able to cope with a large proportion of outliers. The idea behind is very simple: consider a

bunch of 2D points lying on a plan and that we want our model to be a line that best fits these points;

we first select two points randomly; these points define a line; the support for this line is measured by

the number of points that lie within a distance threshold (example: , where is the standard

deviation of the perpendicular distance of all the points to the mentioned line); this procedure is

repeated N times (so, each repetition picks two random points) and the line with most support is

deemed the robust fit; points within the distance threshold are the inliers and the others outliers.

Naturally, outliers will lead to lines with very little support.

Now considering our case, we want our model to be a planar homography, so instead of two

points we will need to have a minimum set of four point correspondences. Next, an automatic

estimation of the homography between two images using the Ransac robust estimation algorithm

adapted from Richard Harley and Andrew Zisserman’s book is summarized [62]:

Objective: Compute the 2D homography between two images.

Given: Set of putative correspondences between interest points of each image

Algorithm:

1. Select a random sample of 4 putative correspondences and compute the homography

;

2. Calculate the distance for each putative correspondence;

3. Compute the number of inliers consistent with by the number of correspondences for

which pixels ( is the standard deviation of );

4. If the number of inliers is above some threshold value go to step 7;

5. If not, repeat for N samples, steps 1 to 4 ( as we will see later is the number of trials);

6. Since no number of inliers was bigger than , choose the with the highest support

(largest consensus set). In the case of ties choose the solution that has the lowest

standard deviation of inliers;

7. Re-estimate using all correspondences classified as inliers.

Consider the calculations involved in the previous algorithm. We already explained how to

compute a homography of four 2D to 2D point correspondences as well as for over-determined

solutions (section 3.5); therefore, only the remaining distance and thresholds t, N and T are

discussed next.

- 40 -

Distance measure :

The simplest method of assessing the error of a correspondence from a homography is to

use the symmetric transfer error:

(3.39)

Where is the point correspondence. In this method measurement errors occur in both

images. Naturally gives the square of the point distance which is the sum of squared and

measurements.

Reprojection error is an example of a better, thought more expensive distance measure:

(3.40)

Where is the perfect correspondence. In the symmetric transfer error we had the

coordinate values: , however for the reprojection error points

need to be estimated in

such a way that the reprojection error is minimum (a cost function is involved). For readers who wish

to know more about reprojection error the following bibliographic reference is suggested [62].

Fig. 3.21 – A comparison between symmetric transfer error (upper) and reprojection error (lower) [62].

Distance threshold :

If we assume the measurement error of each point to be Gaussian with zero mean and

standard deviation , then the square of the point distance, , will be a sum of squared Gaussian

variables and follows a distribution with degrees of freedom ( , codimension of the modal, is

equals to two for homographies).

The probability that the value of a random variable is less than is given by the cumulative

chi-squared distribution:

(3.41)

- 41 -

Usually is chosen as so that there is probability that the point is an inlier:

with

(3.42)

Number of inliers :

To calculate this value a conservative estimation of the proportion of outliers must be done.

We know the total number, , of putative correspondences received by the Ransac algorithm, so we

then just need to perform:

(3.43)

This threshold value is set aside most of the times since we do not know a priori the fraction of

data consisting of outliers; however, when we do it can be computationally friendly. This value will be

considered again in chapter 5.

Number of trials :

Naturally the usage of random samples of putative correspondences has only the objective

of reduce the computational effort, so the question that rises is: how many samples do we need to,

with probability , at least one of the samples of is free from outliers? To answer this question let be

the probability of any selected data point (correspondence) be an inlier, and thus is the

probability that it is an outlier. Then at least selections of correspondences are required:

(3.44)

Usually is chosen as , however the same problem as before for appears: we do not

know the value of . An adaptive method to find is described next.

First we need initial values. The worst case guess is to consider , since for higher values

the algorithm is likely to fail. Considering the initial value for is, according to eq. (3.44), 72.

Whenever Ransac finds a consensus set containing more than 50% of the data points, we then know

that there is at least that proportion of inliers. Basically each time a sample gives a higher percentage

of inliers than all the previous results, then we just have to save the best score so far and update for

the new lower value. decreases with the decreasing of , thus, when the repetitions are over our

best score is determined. It may occur that determines a less than the number of samples that

have already been performed, in such cases the algorithm terminates.

- 42 -

3.10. Homography decomposition

As we will see later this section is of extreme importance in the error analysis of the mosaicing

method developed, it will allow to compare the exact homographies obtained from chapter 4, where

we know the exact translations and rotations used for the quadrotor when obtaining each of the

photographs, to the ones obtained through mosaicing methods.

The problem of Euclidean homography decomposition, also called Euclidean reconstruction

from homography, is that of retrieving the elements , and from the matrix [105]:

(3.45)

Soon each of these variables will be introduced.

Before eq. (3.45) the homography obtained from either the exact or the estimated methods

must be corrected from the calibration matrix:

(3.46)

The calibration matrix, , will be defined later on eq. (4.20), but without the minus signals in the

entries and (see section 4.5). Next, we need to normalize in such a way that

. We saw before on the note 5 (page 24) that this is achieved through the equation:

(3.47)

Now that we are in conditions to continue our previous reasoning from eq. (3.45), let us begin

for considering two different camera frames: the current and the desired, and in the figure

respectively:

Fig. 3.22 – Desired and current camera frames. Involved notation [105].

- 43 -

The homogeneous transformation matrix converting point coordinates from the desired

frame to the current frame is:

(3.48)

Where and are the rotation matrix and translation vector, respectively. In the figure, the

distances from the object plane to the corresponding camera frame are denoted as and . is the

normal to the plane.

Briefly, and not to enter into too much detail, we obtain:

(3.49)

Where is normalized with respect to the plane depth .

A very good self explanatory presentation of the equations involved can be found here for

interested readers [106]. The concept is to use the Single Value Decomposition (SVD) of the

homography matrix:

(3.50)

And then use the eigenvalues of the matrix as well as and to find , and . For this, the

main equations will be presented here. First, using this decomposition we obtain the new equation:

(3.51)

Where , and are related to , and by:

(3.52)

can be calculated from:

(3.53)

Wherein:

(3.54)

- 44 -

The values are the eigenvalues of .

is obtained from:

(3.55)

To finish we can compute the matrix as:

(3.56)

Where:

(3.57)

And the vector as:

(3.58)

Finally, we can decompose into the known Euler angles and take the roll, pitch and yaw

angles. Equations that give these angles will be presented only in chapter 5, due to some practical

aspects we have to take into account.

- 45 -

4. Simulation of taking aerial photographies with UAVs

The major purpose of this subchapter is to determine the transformations that allow to go from

the coordinates of a photographed point ( ) in the world reference frame (GPS) to a pixel point ( )

in the picture reference frame (eq. (4.1)). Naturally, for this purpose it is necessary to know the

location of the optical center of the camera in the world reference frame as well as the pitch, roll and

yaw rotations of the UAV in the precise moment when the photo is taken. Camera parameters, such

as focal length, must also be known.

(4.1)

Eq. (4.1) divides the problem in two parts. The first relates the transformation from the world

reference frame to the camera reference frame (eq. (4.2)10

).

(4.2)

The second goes from the camera reference frame to the image plane. For this, later will be

discussed the pinhole camera model and the calculation of the calibration matrix which is a

matrix that contains the intrinsic parameters of the camera. On the other hand, camera matrix ( ),

, will be given by:

(4.3)

Later, and are discussed.

4.1. World reference frame

In the real application of this problem the origin of the world reference frame and the

implemented coordinate system would respect the Global Positioning System (GPS). However in this

simulation instead of the World Geodesic System of 1984 (WGS84) used by the GPS, the system

adopted was in cartesian coordinates of the form: . To facilitate interaction with Matlab, the

origin of the world reference frame was made to coincide with the upper left corner of the virtual image

used to simulate the world (Fig. 4.1). It was from this image that the virtual photos were taken.

10 has inverse because it is a multiplication of rotation and translation matrixes and each of these is

invertible [55].

- 46 -

Fig. 4.1 – Adopted virtual world image (8486 by 12000 pixels) and its coordinate system. For this

simulation each pixel was considered to have 1 m [107].

On this simplification it was naturally assumed that the ground would be plane since we were

using flat satellite images to simulate the World. This allowed to be possible, in a first approach, to

take aerial photographs to this virtual "image of the World" with the exact knowledge of all the

variables involved (in a real situation such values, with the respective errors, would be made available

by the sensors). Later, with the obtained images, it would become possible to measure the accuracy

of the mosaicing techniques developed by comparing the resulting mosaics with the original virtual

image of the World.

To avoid possible misunderstandings, axes not only of the world reference frame, but of all

frames used in this dissertation were used with respect to the right-hand rule - clockwise reference

frame (Fig. 4.2).

Fig. 4.2 – Clockwise reference frame adopted.

What concern to the reading and writing functions of images provided by the Matlab Image

Acquisition Toolbox (imread (row, column) and imshow (row, column)) they operate according to the

red reference frame represented in Fig. 4.1 and Fig. 4.3 (direct reference frame). However, let us note

here that some features of Matlab are ruled in a counterclockwise reference frame, this includes the

editor used to select the black dot you can see in Fig. 4.3 as well as the Matlab very known plot

function that allows you, for the current situation, to represent points in images. So, instead of have

plot (x, y) as plot (column, row) - in this work plot was used in the modified form: plot (row, column) or

plot (y, x) for a consistent use / display of the results.

- 47 -

Fig. 4.3 – Caring for the representation of the results in Matlab.

4.2. UAV reference frame

After the definition of the world reference frame comes the need for the UAV reference frame.

According to [62] (p. 579) axes x, y and z are used respectively for the roll, pitch and yaw angles of an

UAV (see Fig. 3.2). It is also known that z positive is pointed to the ground when the roll, pitch and

yaw take zero value (blue reference frame in the Fig. 4.4). According to what was said earlier the

clocker wise reference frame was used.

Fig. 4.4 – UAV reference frame (blue) related to the world reference frame (red) when

11

.

11 The origin of the UAV reference frame related to the origin of the world ( , , ) in this and

in the next figures was of course arbitrary, since the intention was only to demonstrate the procedure adopted.

- 48 -

The passage from the world reference frame to the UAV reference frame ( ) is given in five

steps (five coordinate transformations).

(4.4)

Firstly, according to Fig. 4.4, is made a translation from the origin of the world reference frame

(red) to the origin of the UAV reference frame (green).

(4.5)

Then, comes a rotation in which according to Fig. 4.4, goes from the green reference frame to

the blue reference frame by rotating 180 ° around axis (assuming pitch, roll and yaw equal

zero):

(4.6)

Finally comes the three coordinate transformations resulting from the roll, pitch and yaw angles

of the UAV. According to [108], and because matrix multiplication is not commutative the first

transformation taking place is about yaw angle12

(around - axis that points to the ground on Fig.

4.4). Yaw rotation also occurs about the axis of the world frame, not only the body frame of the

UAV. If it was applied an arbitrary roll or pitch angle, different from zero, before yaw rotation,

would be no longer perpendicular to the ground plane or parallel with or saying in other way the

image plane would be no longer parallel to the ground plane (Fig. 4.5). So, this transformation is given

by:

(4.7)

12 The reader should pay attention that this analysis was done from the world reference frame step by step to the UAV reference frame (from the left to the right transformation in the eq. (4.4)). However when we have the coordinates of a point in the UAV reference frame, we will naturally follow from the right to the left and so the first angle to be corrected would the roll, then the pitch, and finally the yaw. The thing is, for the eq. (4.4), each time a new rotation is introduced on the left, it has no concern for the original body frame of the right.

- 49 -

Fig. 4.5 – The pink reference frame corresponds to an UAV frame with . The blue

frame is the UAV frame with a single rotation yaw of 20 degrees.

The second angle to be applied is the pitch angle - around y (axis through the right wing of the

plane) - which causes the up and down of the UAV nose. This axis must be, necessarily, parallel to

the plane of the ground [108]. If the roll angle were applied before pitch, the previous statement would

not be true (see Fig. 4.6) and we would get a different final transformation. This transformation is

described in the form:

(4.8)


frame is the UAV frame with a single rotation pitch of 20 degrees.

- 50 -

Finally is the roll angle – x axis (longitudinal axis of the UAV) – which allows the tilting of UAV to

the left or right (Fig. 4.7). This transformation is described in the form:

(4.9)


frame is the UAV frame with a single rotation roll of 20 degrees.

4.3. Camera reference frame

The transformation that characterizes the passage from the camera reference frame to the UAV

reference frame is debatable. In real situation the position of the camera on the UAV turns out to be

limited by physical circumstances, so that in this simulation it will be assumed that both reference

frame origins are coincident ( , , ) = ( , , ) and that it's possible to align ( )

with the flight direction of the UAV ( ) allowing this way the alignment of the camera and the

pictures taken with the UAV flight direction. Fig. 4.8 shows the camera reference frame in light blue

after a rotation of -90 degrees about the axis.

(4.10)

- 51 -

Fig. 4.8 – Representation of the camera reference frame compared to the world and UAV reference

frames.

4.4. Pinhole camera model

A pinhole camera is simply a camera that has no lens [109]. In its place is an extremely small

aperture through which exterior light is projected onto sensitive film or paper. Effectively is a light-proof

box with a single small hole in one side. Light from a scene passes through this point and projects an

inverted image on the opposite side of the box (Fig. 4.9).

Up to a certain point, the smaller the hole, the sharper the image, but the dimmer the projected

image. Optimally, the size of the aperture should be 1/100 or less of the distance between it and the

projected image [109].

Fig. 4.9 – Pinhole Model [110].

However, in the literature it is common to consider the simplification that the image plane stays

between the pinhole and the photographed object [62](Fig. 4.10). In this case, this simulation avoids

the need for a final correction related to the inversion of the photography taken as Fig. 4.9 would

indicate. Nowadays images that we dead with are already corrected from this inversion.

- 52 -

Fig. 4.10 – Real model (left) and simplified model (right) of a pinhole camera in the XZ plane.

From Fig. 4.10 it is possible to derive these elementary geometric equations:

(4.11)

represents the distance between the object and the pinhole;

is the focal length;

and are the dimensions of the photographed area on the ground;

and are the photography dimensions on the image plane.

Our goal then is to use equations (4.11) to find the transformation that project 3D points from

the object plane to points on 2D plane (se Fig. 4.11 of section 4.6.1 for full detailed scheme).

Projections can be modeled in three steps (eq. (4.12)) as we will see: First is a perspective

projective transform that relates the two point positions ( ); second a frame transformation ( ) in

order to fix of the projected point as and finally a space dimension reduction, :

(4.12)

Perspective projection

Perspective projection ( or ) corresponds to the well-known pinhole camera model. In this

model, like it has been seen in Fig. 4.9, a point in the object plane generates a ray through the optical

center intersecting the image plane at a given position. Using homogeneous transformations if

represents the original point defined in the camera frame then, with respect

to the previous eq. (4.11), its orthogonal projection on plane will have coordinates

and the derived homographic transform is given by [62]:

(4.13)

- 53 -

Frame transformation

The passage from to is just a pure translation along axis of named before as focal

length:

(4.14)

represents the 3D frame whose origin lies on the middle of the image plane.

Space dimension reduction

Finally, because we are only interested in points on the projection plane, we can use the

remaining coordinates to define 2D axes on this plane. Denoting this coordinates system as (yellow

in Fig. 4.11), these new axes are obtained from the old ones through the transform:

(4.15)

Recovering previous equations (4.3) and (4.12), it is now obtained and represented in eq.

(4.16):

(4.16)

However, the use of the homogeneous transformation introduce a scale factor ( ) that

needs to be posteriorly corrected (see the notes in section 4.7).

4.5. Camera parameters and calibration

Remembering eq. (4.3) is the last transformation matrix that needs to be calculated to

completely define our problem. This transformation is used to place the center of the image

coordinates on the top left corner of an image instead of in the middle. We need to do so, because

Matlab by default consider the top left corner as the origin of both matrices and images. It is

recommended you to see the yellow reference frame ( ) and red reference frame ( ) shown in

Fig. 4.11 (section 4.6.1) for further detail.

To find , the problem was divided in three steps: axes rescale, translation, rotation.

- 54 -

Axes rescale

The reference frame is in meters while is given in pixels. Naturally each pixel was

considered to have a very small rectangular area (not necessarily square). The division of the image

pixel size (e.g. ), respectively by the previous and values given in Fig. 4.10 that

represent the photography dimensions on the image plane, gives us the number of pixels for

measuring unit in both directions of the image: and respectively. These are the values that are

used to rescale the axes.

Translation

The translation between the origin of and the origin is ( ), where and are

respectively half the total image size for each direction in pixel dimensions (e.g. and )13

.

Rotation

The rotation (see Fig. 4.11 on section 4.6.1) is simple given by:

(4.17)

Camera calibration

Finally, is given by:

(4.18)

Recovering eq. (4.1), is given by

(4.19)

And the calibration matrix14

:

(4.20)

13 On a real case application values: , , and are calculated based on a calibration procedure.

Thus and are not necessarily half the image sizes. In section 4.7 (“Intrinsic Parameters”) is given a

bit more information about this. 14

The skew, , of the camera calibration matrix (between the and axis) is often set to zero. The skew

is in the entry: [55].

- 55 -

4.6. Resume

In this section is presented a resume of all what was said above to ease the reasoning. First a

detailed scheme with all the reference frames involved is presented (section 4.6.1); next the same is

done for the equations (section 4.6.2).

4.6.1. Detailed scheme of the reference frames

Fig. 4.11 shows the detailed relation scheme between all the reference frames involved.

Camera model as well as image and object plane are represented with respect to the UAV reference

frame and the world reference frame.

Fig. 4.11 – From world reference frame to image plane. Note on the notation involved. It is recalled that

is aligned with the flight direction of the UAV.

- 56 -

4.6.2. Scheme of the equations involved

Next is presented a small resume of how to obtain the transformation that relates 2D points in

the image frame with 3D points on the world frame.

Fig. 4.12 – Graph of the transformations involved.

4.7. Final notes

Scale factor

In order to solve the problem introduced by the scale factor (eq. (4.16)) we will recover eq. (4.1):

(4.21)

The thing is that, and are given in due the multiplication of the calibration

matrix (given in pixels) by the world coordinates and (given in meters). This way is easy to see

that we just need to divide by its last entry (that turns out to be in meters as we can see from eq.

(4.16)).

is naturally set to zero because we are working on the ground plane. The last entry of is

1 due to the use of homogeneous coordinates.

Intrinsic parameters

Focal length ( ), , , and are given by the fabricant of each camera. However in this

dissertation it was used the Camera Calibration Toolbox for Matlab [111] in order to find the camera

parameters of a Logitech Labtec WebCam:

(4.22)

(4.1) (4.2)

(4.3) (4.4)

(4.9) (4.5)

(4.10)

(4.6) (4.8) (4.7)

(4.12)

(4.15) (4.14)

(4.13)

(4.18)

- 57 -

Instead of the theoretical calibration matrix, eq. (4.22) ends up being used to take these virtual

photos.

Discussion concerning the virtual photo

In the beginning to find the virtual photo it was used directly eq. (4.21), where ( , ) was

replaced respectively by the pixel coordinates of Fig. 4.1. Then, it was selected just the

world coordinates ( , ) that lead to values inside image dimensions (e.g. and

). Finally each of these world pixel coordinates selected were used to obtain the virtual

photo.

This method, however, presented two problems. First was computationally very inefficient since

it needed to calculate eq. (4.21) more than 100 million times ( ) or eventually less if an

initial guess were provided (center of the camera coordinates). Second is that when a coordinate pixel

is provided, ( , ), in practice, the ( ) coordinates calculated are not integer numbers which turns

this method actually into an approach unless some interpolation method is used to calculate the

correct color degree for each pixel ( ). Nevertheless, despite the time consumption it gave good

results.

Later, for a quicker calculation, the goal was to introduce ( ) coordinates and obtain directly

( , ).

For this, it was considered the inverted transformation matrix :

(4.23)

As it was said before has inverse. Since the inverse of , which is a matrix, can be

obtained by adding a row of zeros ( ) [62] (p. 590) them has also inverse. A back-

projection of points to rays method can also be used [62] (p. 161-162) to find the set of points in space

that map to a certain image point ( ) using the pseudo-inverse (or right inverse [112]) of :

,

(4.24)

However, a detailed analysis shows that the scale effect, (eqs. (4.16) and (4.21)) that

supposedly need to be multiplied by ( ) in order to obtain ( , ) actually depends of ( , ),

which leads to an indeterminate problem.

- 58 -

Finally, using symbolic calculation with Matlab, and considering the fact that , it was

possible to obtain

:

(4.25)

And get this way expressions for both and

dependent of all the variables in study15.

Using homography theory and again to avoid, e.g., calculations, it was simply used

the four extreme points: ( , , and with their correspondent

inhomogeneous coordinates ( , ) in order to find the projective transformation or homography16

that would give the world coordinates for each image point in a very quicker way:

Fig. 4.13 – Strategy adopted to find the world coordinates associated to each image coordinate.

Later, and to improve our results a bilinear interpolation was used (Fig. 4.15 and Fig. 4.14).

Fig. 4.14 – The use of bilinear interpolation for accurate results.

Fig. 4.15 – Example of bilinear interpolation. For more information we recommend to consult: [62].

15 These equations are two big to be presented here or even in appendix due to their dimensions (more than 25000 characters each).

16 We can use homographies for this because world coordinates are being considered to be 2D coordinates, and so we can treat this problem as a normal 2D to 2D correspondence problem.

Select the 4 corners Compute and

for the 4 corners.

Find homography

Image plan

( , )

( , )

( , )

( , )

World plan

( , ) ( , )

( , ) ( , )

Image plan

Use homography

Bilinear interpolation

( , )

World plan

( , )

Image plan

- 59 -

Consider, for example, that the point given by the coordinates ( , ) gives us the

coordinates in the world image . So, the value would be weighted,

according to Fig. 4.15, in the form:

(4.26)

With respect to the equation:

(4.27)

Where is used to round the value to the lowest nearest integer.

Other application of this procedure

The origin of the UAV reference frame related to the origin of the world ( , , ) as

well as pitch, roll and yaw angles would be measured and given by UAV sensors. This way it is

possible, using previous transformations, to create an image mosaic just with the information provided

by the sensors. Naturally, this approach is valid while only we are treating the world as a flat surface.

4.8. Examples of results given by Matlab

This topic pretends to show the script developed in Matlab based on the previous equations in

order to demonstrate their accuracy. To ease the reader reasoning, it was considered an initial

position of the UAV with a yaw of degrees and zero roll and pitch angles. Introducing all the

parameters we may obtain the set of figures shown next.

- 60 -

Fig. 4.16 – Matlab representation of the model developed. The red rectangle shows the image plane while

the green rectangle shows the object plane. , . Camera position

( , 1000, 600) meters.

Fig. 4.17 – Representation of the ground area covered by the photo taken in Fig. 4.16. The image on the

right is a detailed view. The red asterisk, in the middle, shows the position ( , , ) of

the camera in world coordinates (Fig. 4.1) projected onto a plane ( , ). A projection of

and reference frame is also shown in green.

Fig. 4.18 – Virtual photography taken (e.g. 320 per 240) according to Fig. 4.16 and Fig. 4.17.

Next, and to end with this topic, are presented two other brief examples.

- 61 -

a) b) c)

Fig. 4.19 – , . Camera position ( , 1000, 600) meters. a) Model developed; b)

Representation of the ground area; c) Image taken.

a) b) c)

Fig. 4.20 – , , . Camera position ( , 1100, 800) meters. a) Model

developed; b) Representation of the ground area; c) Image taken.

- 62 -

- 63 -

5. Implementation of the Image Mosaicing Method

This chapter is not intended to exhaustively expose the lines of the Matlab code. The aim is to

present the algorithms and strategies used to solve problems and explain the similarities / differences

between the methods that were implemented and the ones addressed in the mathematical problem

formulation chapter. Several practical aspects like constant values previously skipped will also be

taken into account.

First is explained how from two images, with some overlapping region, we can get a stitched

image mosaic (section 5.1). A detailed scheme to help understanding the image mosaicing method

that was implemented is presented in Fig. 1.2. In section 5.2 this method is generalized for more than

two images. Section 5.3 is related to the error measures used to evaluate the accuracy of the method.

5.1. Image mosaicing method for two images

This section is divided in two parts. The first explains how to obtain a homography between two

images. Next, in section 5.1.2, the stitching operation for these images is addressed.

5.1.1. Find homography

This subsection presents a basic algorithm of the proposed method (Fig. 5.1) (follows the

strategy used in [113]). Previously a simplified version of this scheme was presented in Fig. 1.2

(section 1.4). All the considerations/practical issues related to each and every step of this basic

algorithm are explained in the Appendix A.

Fig. 5.1 – Basic algorithm used to find a homography between two images.

Describe interest points according to SIFT method (section 3.7.2)

Find putative correspondences using the ratio / nearest neighborhood (section 3.8)

Find consistent correspondences using Ransac (section 3.9)

Estimate homography using DLT algorithm (section 3.5)

Assign a characteristic scale for each interest point following Harris-Laplace algorithm (section 3.6)

Use Harris-Laplace algorithm (section 3.6) to find interest points in both images

images

Assign a canonical orientation to each interest point according to SIFT method (section 3.7.1)

- 64 -

5.1.2. Stitch two images together

This section is divided in four steps: Transformation that brings both images into the same

reference frame; Image mosaic boundaries; Application of the transformation to each image; and

Overlapping regions.

Transformation that brings both images into the same reference frame

Up to this point it is assumed we received two images linked by a homography:

. If we set the first image as our reference (identity matrix), then the second image can

be reassigned into the first reference frame according to:

(5.1)

Image mosaic boundaries

Now that we have the required transformations, we need to define the boundaries, in pixel size,

of the resultant image mosaic. For this, let us assume each of our images has pixels. Then

our first image alone (identity matrix) will initially set the size of our mosaic as . However if

we add the second image to the mosaic we need to calculate where its boundaries will lead us:

(5.2)

Normalizing the left matrix by its scale factors we get:

(5.3)

Next we have to find the maximum and minimum values for and , respectively from the sets:

and . This new four values will define the new

boundaries of our image mosaic (built from the two images). See Fig. 5.2.

Fig. 5.2 – Image mosaic within the calculated boundaries.

- 65 -

The black regions of the Fig. 5.2 were simply set to “NaN” (Not a Number).

Application of the transformation to each image

The transformation of the first image is trivial because it is the reference. However for the

second image we simply use a similar concept as the one used in eq. (5.2), but this time the input

coordinates are all the pixel coordinates of the final mosaic (similar to what we saw in equation (4.21)):

(5.4)

and are the mosaic dimensions in pixels.

Now, through bilinear interpolation (function in Matlab) it is possible to know the true

values for the pixels of the mosaic and . Fig. 5.3 shows the two images individually

placed within the mosaic before the overlapping operation. To avoid misunderstandings, these new

images will be called frames.

Fig. 5.3 – Frames associated with the first two images.

Overlapping regions

The stitching operation on the overlapping regions was done in two different ways. Simply

assigning to each pixel coordinates of the final image the maximum value obtained from all the frames

for that specific coordinate17

:

Frame 1

Frame 2

Mosaic (final image)

NaN NaN NaN NaN NaN NaN NaN

NaN NaN 0,87 0,43 0,14 0,08 0,42

NaN NaN 0,87 0,43 0,14 0,08 0,42

0,76 0,05 0,79 0,26 0,08 NaN NaN +

NaN NaN 0,40 0,18 0,58 0,12 0,90 =

0,76 0,05 0,79 0,26 0,58 0,12 0,90

0,75 0,53 0,31 0,65 0,23 NaN NaN

NaN NaN 0,26 0,26 0,55 0,18 0,94

0,75 0,53 0,31 0,65 0,55 0,18 0,94

0,57 0,93 0,17 0,75 0,15 NaN NaN


0,57 0,93 0,17 0,75 0,15 NaN NaN

Fig. 5.4 – Example of the “maximum” strategy for two images.

17 This was done individually for each RBG channel.

- 66 -

Or instead of the maximum, we can consider that the more transformations involved to find a

frame the more inaccurate are our results18

. This is the same as saying that pixels added sooner

(successively from the first to the last frame) will have priority over the others since they are more

likely to give the best results. See Fig. 5.5 to better understand the idea.

Frame 1

Frame 2

Mosaic (final image)


NaN NaN 0,87 0,43 0,14 0,08 0,42

NaN NaN 0,87 0,43 0,14 0,08 0,42

0,76 0,05 0,79 0,26 0,08 NaN NaN +

NaN NaN 0,40 0,18 0,58 0,12 0,90 =

0,76 0,05 0,79 0,26 0,08 0,12 0,90

0,75 0,53 0,31 0,65 0,23 NaN NaN

NaN NaN 0,26 0,26 0,55 0,18 0,94

0,75 0,53 0,31 0,65 0,23 0,18 0,94

0,57 0,93 0,17 0,75 0,15 NaN NaN


0,57 0,93 0,17 0,75 0,15 NaN NaN

Fig. 5.5 – Example of the “first in stays in” strategy for two images.

5.2. Image mosaicing method for more than two images

With relation to this point, the implementation developed in this dissertation did not enter into

great sophistications. What was considered was that each new image added to the mosaic was

directly associated with the image right before (Fig. 5.6).

Fig. 5.6 – Image chain assumed on our implementation.

The configuration of Fig. 5.6 was the adopted because the quadrotor images were obtained and

named according to the trajectories followed. So, for now, there was no need to deal with mixed

images without any explicit order.

The homographies between each pair of images was obtained as before in section 5.1.1. The

stitching operation was just a generalization of what was done in section 5.1.2. If we set the first image

as our reference (identity matrix), then the following images can be transformed into this global

reference frame according to:

(5.5)

18 Generalizing for cases with more than two frames involved.

- 67 -

5.3. Error measures implemented

This section starts with the definition of some equations concerning the error measures. Next

are described the two error measures adopted: “Exact vs. Estimated Transformation” (section 5.3.1)

and “Homography Decomposition Based” (section 3.10).

Flight altitude and covered area

Here is explained how to calculate the flight altitude necessary to cover a given area, ,

and how to calculate the area covered at a given flight altitude, . For these calculations we will

assume that we obtained, a priori, experimental data for the camera calibration matrix. On this

example photographs have pixels and the calibration matrix is given by eq. (4.22).

From Fig. 4.10 we can derive the relations:

,

(5.6)

For accurate results we can now use the data obtained from the Camera Calibration Toolbox

(eq. (4.22)) and set:

,

(5.7)

As we mention, considering the image pixel size as being, e.g., ( , ) and

recalling what was said in “Axes rescale” paragraph of section 4.5 we can write eq. (5.8)19

:

,

(5.8)

Where we use and , instead of simply and , because this values are half the

image sizes (Fig. 4.10).

Finally, combining these three equations (5.6), (5.7) and (5.8) respectively for or we have

the photographed area, , for a given desired flight altitude:

(5.9)

And the flight altitude, , necessary to cover a given area can be computed from:

(5.10)

19 and are constants and do not depend on or , however this is a calibration procedure based on

experimental data.

- 68 -

Overlap percentage

From previous equations one image taken at the height of meters covers

square meters of area, thus if two images have an overlap of it means that both cover a total area

given by:

(5.11)

Also, to achieve this of overlap between two consecutive images, we will consider in this

dissertation just two situations: a pure translation along or along , respectively given by:

(5.12)

(5.13)

Where and were mentioned previously on the equations (5.6) to (5.9):

(5.14)

(5.15)

is aligned with the ( ) according to Fig. 4.11.

Exact homography between two images

From chapter 4 we saw that it is possible to know the exact transformation between the world

plane and the picture plane just knowing the calibration matrix, the coordinates of the camera with

respect to the world frame, and the pitch, roll and yaw angles of the UAV.

Now, if we take two (controlled) pictures of our world image (Fig. 4.1), and if they have some

overlapping region between each other, it is possible to compute the exact homography between such

images. Let us consider two pictures obtained using the transformations described previously on eq.

(4.23): and , then the exact homography that will place point of the second

image into the reference frame of the first image is:

5.3.1. Exact vs. estimated transformation

Let us consider a mosaic composed by two images with some predetermined overlap

percentage between each other (equations (5.12) and (5.13)). The first image is the reference and the

second image is the object of our analyses. The transformation , as described previously

in eq. (5.16), is the exact transformation between the second image and the world reference frame.

The estimated transformation is obtained combining the transformation between the first image with

(5.16)

- 69 -

the world frame ( ) and the estimated homography obtained according to section

5.1.1 ( ):

(5.17)

Now, knowing that:

(5.18)

(5.19)

Where, e.g., and .

We can compute the error measure:

(5.20)

If we use the exact and the estimated transformations to calculate all the coordinates in the

world that are associated with each pixel from the exact and the estimated image, respectively, then

the Euclidean distance will be the distance or error, in meters, between each of these estimated and

exact positions. Fig. 5.7 will ease the comprehension of this measure.

Fig. 5.7 – Euclidean distance.

From the Euclidean distance matrix we now can compute:

(5.21)

(5.22)

(5.23)

(5.24)

- 70 -

5.3.2. Homography decomposition based

As before in section 5.3.1, let us consider a mosaic composed by two images with some

predetermined overlap percentage between each other (equations (5.12) and (5.13)). The error

measure described here is a straight forward application of the equations mentioned on section 3.10.

The idea behind is very simple, we need to decompose the estimated and the exact homography (see

section 5.1.1 and equation (5.16) respectively), and then calculate the difference between the exact

and the estimated results. The results are: roll, pitch, yaw and the translation coordinates that can be

obtained from the translation vector : and (equation (3.48)). Next are presented some

considerations with respect to the involved equations.

The homography decomposition method gives two solutions. For the correct solution the values

from equation (3.53) can be initially guessed from the direction of the -axis of the UAV:

.

The three Euler equations reported for this section are:

(5.25)

(5.26)

(5.27)

However, due to transformations such as the ones given by the equations (4.6) -

, (4.10) - and (4.17), the values of the roll, pitch and yaw can come rotated of

90 or 180 degrees. To correct this, the rotation matrix of equation (3.52) (here renamed as ) needed

to be corrected before the application of equations (5.25) to (5.27):

(5.28)

Also, after the recalculation of the , and angles with the equations (5.25) to (5.26)

the new true angles had to be corrected according to:

(5.29)

The translation (equation (3.48)) was also adjusted by a factor of .

- 71 -

6. Evaluation of the Mosaicing Method. Results and Discussion

This chapter is divided into sections. In section 6.1 a robustness analysis will be done to the

proposed mosaicing algorithm in order to find which is the best overlap percentage between two

images. An example of a mosaic built from a set of images is presented in section 6.2. Section 6.3

presents a comparison between the proposed image mosaicing method and the SIFT method. D.

Lowe has an executable, available for download on the internet, with the SIFT implementation (here:

[114]). Other three methods will be considered in section 6.3. Finally, in section 6.4, considerations are

made on applications with real data.

6.1. Robustness analysis to the implemented algorithm

This section starts with an explanation of how the robustness analysis was done. Section 6.1.1

introduces the importance of having a high number of inliers. In section 6.1.2 a consideration on the

computational cost of the mosaicing method is done. The influence of the random selection step of 4

putative correspondences in the Ransac algorithm is discussed in section 6.1.3. Finally is studied the

most adequate overlap percentage for each one of these parameters: pitch, roll, yaw and scale factor

(section 6.1.4). Changes of brightness; changes of contrast and noise are also addressed.

For the robust analysis this dissertation used the Matlab script developed in chapter 4. Knowing

the variables involved to get each image, two photographs were taken to the world image (Fig. 4.1).

The first image was set to have roll, pitch and yaw equals to zero. This first image was deemed our

world reference. The second image was obtained from the conditions in which the first was taken

adding a translation to the UAV position and/or adding small shifts to the roll, pitch or yaw angles (see

Fig. 6.1).

a) b) c)

Fig. 6.1 – Taking two photographs with the Matlab script developed in chapter 4. The bottom image of c) is

the reference image, while in the top is the obtained applying a small shift to the first.

The bottom image of Fig. 6.1 is an example of image obtained with the position of the UAV set

on meters; and ( just for display

purposes). For the second image was used:

meters; ; and . The translation along axis was obtained from eqs.

(5.12) and (5.14) considering an overlap percentage, , of : meters.

- 72 -

As was explained before in chapter 5, using the exact and the estimated homography, it is

possible to obtain the exact and the estimated mosaics:

Fig. 6.2 – a) Estimated mosaic, b) Exact mosaic.

The controlled environment, and the good overlap percentage between the two images used to

obtain the mosaics of the Fig. 6.2, explain why, visually, seems to be almost a perfect estimation of

the estimated homography.

Next Fig. 6.3 and Fig. 6.4 represent the putative matches and the inliers out of the interest

point of first image and the interest points of the second image.

Fig. 6.3 – Above we have the 84 putative correspondences between the two images obtained with section

3.8. Below are the 62 correspondences (inliers) that remain after the robust estimation (section

3.9).

- 73 -

Fig. 6.4 – On the left we have the inliers (the same as the Fig. 6.3), but now with a rectangle around

defining the neighborhood of that point used to find its descriptor. The size of the

square is according to its characteristic scale. On the right is a zoom in view where we can see

that each interest point has its own canonical orientation.

Error measures used to quantify the quality of the mosaic

For this, only the errors of section 5.3.1 (Exact vs. estimated transformation) and 5.3.2

(Homography decomposition based) were used.

Other errors, namely, a ratio between the overlap percentage of the estimated and the exact

mosaics were studied:

Fig. 6.5 – Ratio between the overlap percentages. Concept.

(6.1)

The overlap percentage is the percentage of the second image (top right image of Fig. 6.1) that

lies within the reference image.

Also, a possible error measure consisting of the difference between the Frobenius norm of the

exact and the estimated homography was studied [115][116].

Others criterions, as the number of inliers, were studied, however, most had the same problem;

they defined necessary but not sufficient conditions, and so were not used.

- 74 -

6.1.1. Minimum number of inliers required

Naturally, the number of inliers should be as high as possible. However, an experimental study

on fast 2D homography estimation for face recognition [117] refer that a good estimate should be

achievable with near the minimum required number of correspondences ( is the minimum) or at least,

as a rule of thumb, with less than point correspondences.

On the absence of parallax displacement (controlled environment) this study shows that the

DLT algorithm gives relatively good results for numbers of at least to inliers. But again, we must

say that a high number of inliers do not mean necessarily a good homography estimation. Point

correspondences should be equally distributed within the overlap region, avoiding problems as

collinearity.

6.1.2. Runtime

This study was not meant for real time applications. Next, are presented some offline runtimes

obtained using a laptop with a processor and of ram (this system was used for all the

tests in this dissertation). It is remembered that the code was written and executed with Matlab.

Tests showed that the time taken to read two .png images with of overlap

successively till the display of our final image mosaic was of about seconds ( seconds were just

for the final display). This includes the calculation of the descriptor vectors - entries each – of

to interest points per image resulting in a final average of inliers. If we include the time taken

to take each of the two images with chapter 4, it increases to around seconds.

6.1.3. Influence of the random factor introduced by Ransac

We saw before in section 3.9 that from the entire set of putative correspondences between two

images, the Ransac algorithm selects randomly just putative correspondences at a time. This

artifact, used only to speed up the algorithm, has consequences. The consequence is that each time

we run the algorithm it shows different results, because a different set of inliers is chosen.

Using the two-image strategy described in the beginning of this section (6.1), a small test was

made to assess on the variability of the results.

The reference image was set to: ; .

The second image was obtained with: since

the ; ; .

- 75 -

Now, running the mosaic algorithm times (for statistical independence) over these two

images, the following table was obtained:

Tab. 6.1 – Variability of the number of inliers.

10000 repetitions Standard Deviation Minimum value Medium value Maximum value

Number of Inliers: 14,85 114,00 166,12 189,00

The number of keypoints found in the first and in the second image used in this test was

respectively of and . The number of putative correspondences was .

From Tab. 6.1 we can see just a small glimpse of the problems that may rise from this

variability. In this particular situation we see that the minimum number of inliers represents just of

the maximum value that we could have. One may ask if this difference has meaning in the finals

results. For this situation it does not, but if our minimum value was of inliers and the maximum

possible was of it would be significant. In some situations the minimum value even goes below

inliers, and in such situations the mosaicing problem has no solution.

6.1.4. Percentage of overlap between two images

Naturally the bigger the overlap percentage between images, the lesser are the errors resulting

from the proposed method, however the number of images required to cover the same given area

increase. Despite the increase of the computational cost needed to stitch more images, the mosaicing

algorithm, as was said, was planned to run offline, so the computational cost (time consuming) was

not a problem.

Taking photographs at higher frequency rates will result in bigger overlap percentages, however

problems related to the UAV capabilities, as the speed of the data transference achieved with the USB

interface of the quadrotor, are not of our concern in this dissertation.

The objective of this section is to give a feedback of what the implemented method in this

dissertation can do by suggesting the more appropriate overlap percentage between two consecutive

images taking into account the physical limitations of the problem. Tests reported showed that the

controller board used on the quadrotor allows to take photographs with pitch and yaw angles below

degrees20. The information that arrived from the GPS sensors was inaccurate and of little use because

we were having errors of near to meters on the quadrotor position.

20 This is the same as saying that are errors are below degrees because naturally we wanted it to be

zero degrees.

- 76 -

Again, using the two-image strategy described in the beginning of this section (6.1), a series of

tables were computed (see Appendix). In each table two parameters was set to change: the overlap

percentage and the pitch/roll/yaw or scale (coordinate will change for scale evaluation) of the second

image. A small consideration on contrast variations, brightness changes and noise will be also

addressed. Each table will present a different error measure as we will see later. For statistical

independence each entry of these tables was computed from five different pairs of images repeating

the Ransac step times for each. In total each entry of each table was obtained from a total of

results.

The overlap percentages that were used in this analysis can be seen in Tab. 6.2, as well as the

individual and displacements that were added to the camera position (from the reference image to

the second image) to achieve such overlap percentages. These results were obtained from equations

(5.11) to (5.13).

Tab. 6.2 – Overlap percentages tested.

Overlap Percentage 20 30 35 40 45 50 55 60 65 70 80

Covered Area 4,06E+05 3,83E+05 3,72E+05 3,61E+05 3,49E+05 3,38E+05 3,27E+05 3,16E+05 3,04E+05 2,93E+05 2,71E+05

Translation along

328,78 287,68 267,13 246,58 226,03 205,49 184,94 164,39 143,84 123,29 82,19

Translation along

438,92 384,05 356,62 329,19 301,76 274,32 246,89 219,46 192,03 164,59 109,73

Naturally these percentages are just true if both images are taken with ,

which are not. So, these percentages are, in fact, just indications of what we should have if the UAV

controller was perfect, and that is why there is a need for the evaluation on the robustness of the

method for each percentage.

Pitch

Tab. 6.3 represents an example of a typical table of results. In this table each entry is filled with

the worst (maximum error) obtained over the repetitions. The blank spaces mean that the

algorithm has failed at least once for that specific pitch-percentage configuration.

- 77 -

Tab. 6.3 – Example of a table for future interpretation.

Maximum Geometric Distance Error (section 5.3.1) – in pixels x100 Ransac Overlap Percentage

x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch

(degrees)

12

6,6E+06 313,4 122,5 99,23 24,76

1>value

11

6,6E+04 1,6E+07 145,5 84,44 85,28 44,61

3>value≥1

10

6,2E+08 329,5 114,6 109,2 45,25 36,76

6>value≥3

9

5,1E+08 235,2 139,5 102,3 61,63 45,11 21,98

10>value≥6

8

5,3E+08 5,3E+07 505,4 210,4 48,85 59,34 55,09 12,62

15>value≥10

7

8,1E+06 581,6 103,4 64,84 41,91 23,20 24,01 9,68

20>value≥15

6

6,6E+07 1,1E+07 131,6 102,9 115,9 42,20 16,84 15,72 8,93

30>value≥20

5

3,3E+07 192,7 113,5 70,41 35,68 31,60 26,38 11,31 13,10

40>value≥30

4 2,7E+07 1,8E+03 92,03 113,6 67,19 26,15 26,06 22,05 13,64 8,73

60>value≥40

3 5,2E+06 101,6 75,21 38,83 31,14 39,49 30,64 15,00 11,83 10,71

100>value≥60

2 117,2 70,65 173,8 39,13 23,11 19,51 13,97 13,96 12,60 6,41

200>value≥100

1 1,2E+07 63,07 53,98 36,17 36,42 14,91 10,96 7,87 5,15 3,80 2,70

500>value≥200

0 12,18 1,14 1,55 0,57 0,69 0,62 0,69 0,33 0,37 0,20 0,12

1000>value≥500

-2 115,0 30,19 21,21 32,31 20,19 14,53 10,02 10,53 5,98 6,52 5,79

value≥1000

-4 64,04 31,35 23,63 16,20 10,65 10,50 9,12 7,13 6,92 4,69 3,87

-8 31,76 15,41 14,51 16,08 9,83 8,40 6,47 5,56 4,34 4,03 5,70

-12 15,15 13,32 7,52 6,05 7,08 5,09 6,07 7,25 4,53 7,82 6,79

-16 14,84 7,10 11,35 9,31 5,48 7,23 5,98 7,57 10,92 12,07 13,78

-20 14,73 15,61 10,74 12,82 12,41 15,54 13,56 19,03 23,27 41,45 38,09

The errors are in absolute values, and naturally is of our interest that they be as near zero as

possible. The geometric distance error (Euclidean distance: section 5.3.1), in this particular simulation,

is in meters since each pixel was set to represent one meter.

As was mentioned before we should expect maximum errors for the pitch around degrees.

Thus, if one image is taken with a pitch of and the next right after with then this is relatively

equivalent as having a pitch of for the first image and a pitch of for the second image. On may

ask: what about if we instead of this not just considered a simulation where the first image had

and the second ? That is because the is not yet very accurate. If in the future the controller of the

quadrotor gets better and the maximum error turns to be then there is no need to repeat the results

because we just have to check this tables for results around pitch equals .

A close look on Tab. 6.3 shows that we have better results for negative angles than for positive.

This is simply because a negative pitch increases the overlap percentage when the quadrotor is

moving straight (Fig. 6.6). The displacement can be computed from Tab. 6.2.

Fig. 6.6 – Scheme used for the pitch-overlap study.

- 78 -

The tables regarding the pitch can be seen in Appendix B. If we consider the maximum pitch to

be then it is recommended to have overlaps of near . Despite the fact that the maximum pitch

error obtained for this case was , the medium was just of with a standard deviation of

and a minimum of . Tables regarding “standard deviations”21

and “best results” were always

computed for all the situations in this dissertation, however are not included to avoid an enormous

unnecessary appendix. It was adopted a conservative point of view in this and in the next analyses.

To finish this topic: We saw that this mosaicing method is planned to run offline, and that is

possible to obtain very good results if we have a bit of “luck” in our random selection of the four

putative correspondences with Ransac. So, what is proposed is to, on offline applications, allow the

Ransac algorithm to test as much random samples as possible and just then choose the set

consensus set with the most inliers22

.

Roll

The displacement used for the roll was along the since it gave worst results than for

the .

Fig. 6.7 – Scheme used for the roll-overlap study.

Analyzing the tables in Appendix C for maximum rolls of , we see that the results are

significantly better than the ones obtained with the pitch. An overlap percentage of gave a

medium error value of just and a maximum of . We can also see that the biggest Euclidean

distance was of meters, but the medium was just of meters, which represents less than

pixels in the final mosaic.

One may see that there are maximum errors for the roll angle near (Tab. 7.13). There is

not a real explanation for that because this means that the second image had to be pointing towards

the sky and so we should not have had a solution, but yet the method found a solution. We assumed

this to be mathematical solutions (given by the method described on section 3.10) with no physical

meaning.

21 The standard deviation was obtained averaging all the standard deviations ( ). Since the

sample sizes are equal for all the situations (no need for weights) it was simply used the square root of

the standard deviations:

and then

with

because we are using pairs of images. For this and for information regarding variance of combined

sets of data – which is different from “averaging” – it is recommended the reference: [132]. 22

As we saw on section 6.1.1, empirically we know that a higher number of inliers have more chances go give better results (the Ransac itself uses this principle).

- 79 -

Yaw

At the time there was no precise information available regarding the capacity of the controller to

control the quadrotor direction, thus the displacement used for the roll was along the , because it

shows that we cannot have errors around and as we can see in the results of Appendix D.

Fig. 6.8 – Scheme used for the yaw-overlap study.

For overlaps of we got maximum yaw errors below in the range to , which is

very good when compared to the previous pitch and roll angles. Rotations of between to in

two consecutive photographs are very unlikely, yet, the maximum yaw error found was of .

Scale

Despite the inaccurate information received by the GPS sensors, those responsible for the

quadrotor reported that it is possible to take photographs at a relatively constant altitude. The

displacement used was along for no special reason.

Fig. 6.9 – Scheme used for the scale-overlap study.

Each image has pixels within ( ). From eq. (5.9), for a height of meters,

we have a covered area of squared meters ( pixels of the world image)23

. The

reason why was used an height of meters was simply because at this high we could sift our

altitude to near half its value ( meters pixels) without compromising the accuracy of

the photographs that we know to be obtained from the world image through bilinear interpolation.

The results in Appendix E show that, for overlaps of , a shift in the height of the second

image to values between and result on mosaics with maximum errors (section

5.3.2) of meters and medium bellow meters.

23 With

- 80 -

Contrast variations

For the change of contrast all the pixels of the second image (two-image strategy) were

multiplied by a constant value. Naturally, pixel values above or below (range: to ) were

saturated. The displacement used was along the .

Fig. 6.10 – Contrast variations. a) ; b) ; c) ;

Tab. 6.4 – Minimum number of inliers on contrast evaluation.

Minimum Number of Inliers

value≥250

x100 Ransac Overlap Percentage

250>value≥200

x5 Images 20 30 35 40 45 50 55 60 65 70 80

200>value≥150

Contrast Change (constant factor)

0,35 4 5 9 14 16 17 20 24 24 25 32

150>value≥120

0,50 5 10 16 24 31 38 40 52 52 58 82

120>value≥90

0,75 11 22 37 54 67 77 97 124 138 152 208

90>value≥60

0,90 16 44 69 95 114 128 151 179 201 218 282

60>value≥40

1,00 21 55 86 121 142 158 183 215 242 269 336

40>value≥20

1,10 17 45 73 107 124 137 163 192 218 243 312

20>value≥10

1,25 5 23 44 62 81 93 84 101 110 122 170

10>value≥7

1,40 4 6 13 17 18 19 22 27 28 32 57

7>value

Tab. 6.5 – Maximum geometric distance error on contrast evaluation.

Maximum Geometric Distance Error (section 5.3.1) – in pixels

0,5>value


1>value≥0,5

x5 Images 20 30 35 40 45 50 55 60 65 70 80

2>value≥1

Contrast Change

(constant factor)

0,35 4,7E+09 5,6E+07 2,11 3,43 2,20 1,26 1,36 0,60 0,56 0,31 0,17

5>value≥2

0,50 10,49 1,28 1,71 3,93 0,51 1,75 0,21 0,18 0,21 0,83 0,41

10>value≥5

0,75 10,12 1,14 1,56 0,85 0,32 0,40 0,63 0,18 0,27 0,16 0,08

30>value≥10

0,90 9,75 1,14 1,55 0,71 0,79 0,70 0,83 0,37 0,43 0,23 0,13

50>value≥30

1,00 12,18 1,14 1,55 0,57 0,69 0,62 0,72 0,33 0,37 0,20 0,12

100>value≥50

1,10 13,15 2,30 2,66 0,74 0,79 0,54 0,45 0,33 0,31 0,26 0,24

250>value≥100

1,25 513,1 20,86 9,22 10,39 3,95 2,21 4,82 4,08 6,02 1,91 1,62

1000>value≥250

1,40 2,0E+06 112,4 93,7 38,01 25,48 18,40 24,61 7,13 21,90 16,97 6,28

value≥1000

Results demonstrate that common changes on the contrast have little influence on the results.

- 81 -

Brightness changes

For the change of brightness a constant value was added to all the pixels of the second image

(two-image strategy). Naturally, pixel values under and above (range: to ) were saturated. The

displacement used was along the .

Fig. 6.11 – Brightness variations. a) ; b) ; c) ;



value≥200


200>value≥120

x5 Images 20 30 35 40 45 50 55 60 65 70 80

120>value≥90

Brightness Change (constant factor)

-0,50 9 6 7 10 17 19 21 23 26 26 32

90>value≥60

-0,25 20 55 85 120 141 156 182 212 233 266 333

60>value≥40

-0,10 21 55 86 120 141 157 184 214 241 268 335

40>value≥20

0,00 21 55 86 121 142 158 184 215 242 269 336

20>value≥10

0,10 17 45 73 106 120 134 152 186 208 232 303

10>value≥7

0,30 7 4 4 5 7 8 10 13 14 16 22

7>value



0,5>value


1>value≥0,5

x5 Images 20 30 35 40 45 50 55 60 65 70 80

3>value≥1

Brightness Change

(constant factor)

-0,50 147,8 23,02 174,9 53,7 27,4 21,00 20,38 11,80 10,81 9,34 8,27

10>value≥3

-0,25 12,01 2,22 2,86 1,14 1,84 1,16 0,69 0,89 0,64 0,36 0,35

25>value≥10

-0,10 12,18 0,63 1,49 0,57 0,70 0,63 0,70 0,32 0,37 0,19 0,19

50>value≥25

0,00 12,18 1,14 1,55 0,57 0,69 0,62 0,69 0,33 0,37 0,20 0,12

100>value≥50

0,10 13,91 0,89 2,00 0,67 0,62 0,63 0,64 0,38 0,40 0,74 0,49

500>value≥100

0,30 279,3 7,1E+06 1,3E+07 390,4 858,8 222,4 26,4 19,88 5,07 7,17 8,71

value≥500

As before for contrast, results demonstrate that common changes on the brightness have little

influence on the results.

- 82 -

Noise

A random value was added to each and every one of the pixels of the second image (two-image

strategy). The Matlab function used to add the noise was:

noise=random('unif',-0.15,0.15,[240 320 3]); (6.2)

Where a uniform distribution was used between the interval: The displacement

used was along the .

Fig. 6.12 – Noise. a) ; b) ;



value≥200


200>value≥100

x5 Images 20 30 35 40 45 50 55 60 65 70 80

100>value≥60

Noise

without noise 21 55 86 121 142 158 184 213 242 269 336

60>value≥40

[-0.05,0.05] 8 17 31 58 53 65 75 92 117 127 156

40>value≥20

[-0.10,0.10] 5 8 11 24 33 25 30 50 48 48 73

20>value≥10

[-0.15,0.15] 4 5 6 11 13 14 16 28 26 28 37

10>value



1>value


3>value≥1

x5 Images 20 30 35 40 45 50 55 60 65 70 80

10>value≥3

Noise

without noise

12,18 1,14 1,55 0,57 0,69 0,62 0,69 0,33 0,37 0,20 0,12

25>value≥10

[-0.05,0.05] 111,1 34,3 20,74 5,21 10,20 6,97 5,67 2,88 2,61 4,24 2,28

100>value≥25

[-0.10,0.10] 223,4 117,0 122,9 41,0 14,25 19,73 25,6 10,55 7,89 11,41 9,59

500>value≥100

[-0.15,0.15] 8,4E+06 1,7E+08 1,3E+07 238,4 60,9 93,1 42,2 16,90 33,6 16,31 6,84

value≥500

Noise, in certain ways, can be compared to the motion parallax because both add small

changes near the interest points, thus affecting the -descriptors. Results show that this method is

sensitive to noise.

- 83 -

6.2. Example of a mosaic

In this section we will present an example of a mosaic with 18 images. From section 6.1 we see

that combining the errors will result on an enormous increase of difficulty and thus we propose an

increase of the overlap percentage to a value of . The following example has used of

overlap.

The first image will be set our reference has we have been doing:

meters; and . Each of the following images will be obtained by adding a

Gaussian/uniform random factor: according to:

: . meters from eq. (5.13);

: ; meters from eq. (5.12);

: (Scale):

;

Pitch: ;

Roll: ;

Yaw: ;

Contrast variations: ;

Brightness changes: ;

Noise:

Fig. 6.13 – The red asterisks represent the coordinates where each photograph was taken. a)

Photographies taken. b) Desired positions and trajectory without errors.

Fig. 6.14 – a) Exact mosaic. b) Estimated mosaic with the “maximum”-strategy between the overlapping

regions (see Fig. 5.4). c) Estimated mosaic with the “first in stays in”-strategy between the

overlapping regions (see Fig. 5.5).

- 84 -

6.3. A comparison between mosaicing methods

Here is made a comparison between the mosaicing method studied in this dissertation and an

implementation of the SIFT method. D. Lowe, author of the SIFT, has an executable of the SIFT

method available online [114].

Three other methods are compared: Harris using correlation for matching (not invariant to

rotations or scale changes), SURF (implementation found here [118]) and ASIFT (implementation

found here [119]). However, despite the fact that there was some study regarding these methods this

dissertation will not address them. They are here because they have been tested and this information

may be of use for interested readers or as a suggestion for future research.

Briefly, to avoid excessive information, Tab. 6.10 gives the medium number of inliers and the

medium computation time taken for each method.

Tab. 6.10 – Comparison between methods.

x5 Images Method

Overlap Percentage

Chosen Method ASIFT SURF Harris SIFT

Time(s) Nº Inliers Time(s) Nº Inliers Time(s) Nº Inliers Time(s) Nº Inliers Time(s) Nº Inliers

20 9,32 27,8 701,0 2867,2 4,0 9,4 - - 1,88 57,6

30 9,28 78,8 - - 5,7 33,8 1,68 20,2 2,10 108,6

40 8,98 135,0 717,7 10228,2 3,3 66,8 - - 1,84 177,4

50 9,28 212,2 - - 3,2 97,2 0,98 60,6 1,72 233,8

60 9,00 256,6 - - 3,5 132,2 - - 2,0 283,6

These tests were done using the two-image strategy over five images with just one Ransac

sample each. ‘-’ mean the situation was not tested. SIFT and SURF methods have been implemented

in C and that is eventually the main reason why they run faster. On the few tests done, ASIFT proved

to give the best results, but the computational cost (time) was very high.

We saw before that the number of inliers provide a good guess on the accuracy of the results,

thus the most important thing to be notice from this table is that the SIFT method has more inliers than

the proposed method. However, the SIFT method is patented [53].

6.4. Application in real data

Unfortunately it was not provided, within the period of execution of this dissertation, a set of real

photographs with which was possible to work on.

However, those responsible for the image acquisition system with the quadrotor provided

preliminary data of some tests that were done. The photographs obtained corresponded to situations

where the quadrotor was controlled remotely by a technician without any planning on the trajectory to

follow. Also, these photographs do not take into account the overlap percentage nor the stabilization of

- 85 -

the quadrotor at the precise moment when the photographies were taken, because the mechanism

used to take photos was timed to take a photograph every (around) 2 to 5 seconds.

Images size is of and the three small mosaic obtained below, of a campsite, were

computed without any kind of pre-manipulation of the images.

Fig. 6.15 – Two mosaics obtained from two pairs of images. 80 inliers were involved on the left and 16 on

the right.

Fig. 6.16 – Mosaic from three images. 59 inliers between the first and the second image and 48 between

the second and the third.

To get these results the original algorithm was modified. The robust estimation step (section

3.9) was repeated, internally, times for each pair of images. The homographies with the highest

support (more inliers) were chosen.

Next, in Fig. 6.17 is presented the trajectory followed by the quadrotor and the respective flight altitude

(between 40 to 80 meters).

- 86 -

Fig. 6.17 – Results from the flight. a) trajectory; b) flight altitude.

To avoid parallax, a set of photographs were taken to a flat surface (futsal pitch). An example of

an image is presented next (Fig. 6.18).

Fig. 6.18 – Futsal pitch photograph. Interest points.

From Fig. 6.18 we see that the algorithm just find interest point in the lines, where we have the

highest derivatives between pixels. Harris-Laplace algorithm finds the interest points in the image gray

level, thus the lines in red fade into the gray background:

Fig. 6.19 – Red lines.

Other aspect is that the interest points have very similar descriptors, because its surroundings

are identical:

- 87 -

Fig. 6.20 – Identical interest points.

The fact that almost all the keypoints lie within a couple of lines results in a failure of the method

for such situations. It is very likely that between the four putative correspondences chosen by Ransac

(section 3.9), three of than are collinear (in section 5.1.1 is made a reference on this).

Other aspect to consider is that the photographs obtained with the camera used on the flight

(Logitech webcam c510) show to have same nonlinear distortion. It is recommended the reader to pay

attention on the supposed straight lines of the Fig. 6.21.

Fig. 6.21 – Straight lines are often not mapped as straight lines.

Finally, from these results three important aspects need to be mentioned:

Parallax can be diminished if the camera tilt is minimized and the photographs are taken at

higher altitudes;

Overlap between images should be high and not as low as (Fig. 6.15) or even less;

Interest points should be distinct (descriptor values) and should be spread within each

image.

- 88 -

- 89 -

7. Conclusions and Future Work

7.1. Conclusions

In this work, a SIFT-like mosaicing algorithm is proposed to solve image mosaicing problems.

Instead of using Difference of Gaussians to find interest points, this algorithm uses the Harris-Laplace

method. Relaying on the principles of projective transformations between two images, the Direct

Linear Transformation algorithm is used together with Ransac to robust estimate homographies.

The proposed solution has presented invariance to translation, rotation, scale, perspective

transformations, contrast variations, brightness changes and even showed to be robust in the

presence of noise.

To study the robustness of the method a simulator was developed to take aerial photographies

to an input image.

This simulator requires, along with other information, the camera calibration matrix. Naturally,

an input image of the “world”, from which the photographies will be taken need to be provided. In this

controlled environment, knowing all the parameters used to obtain each photograph (e.g. pitch) an

accuracy evaluation of the method was made. This was done changing the, above mentioned,

parameters involved (e.g. rotation between two images). Error measures were used to compare the

results between the estimated and the exact mosaics.

Results show the importance of the minimization of the tilt angles of the quadrotor at the precise

moment when the photographies are taken. Also the overlapping percentage between two images is

at stake. Overlapping percentages between to showed to give good results for a variety of

distortions. Results also demonstrate that the images should be taken at high flight altitudes to avoid,

on maximum, the parallax.

Unfortunately, real data was not provided in time to test this method. However preliminary data

showed the importance of having distinctive interest points within each image as well as what was

said on previous paragraph. Data from the GPS sensors proved to be very inaccurate.

- 90 -

7.2. Future work

This work revealed the importance of having the maximum number of inliers possible between

two photographs when using feature-based methods. To cope with this, and since the online

implementation is not a problem, is proposed to use more than just the Harris-Laplace method to find

interest points by combining the information of various methods. Difference of Gaussians (SIFT) can

be tested together with Harris-Laplace, although it is also focused on image corners as it is the Harris

method.

Two modifications to the Ransac algorithm are proposed to increase the number of inliers: the

first one is related to the samples used to find the 4 putative correspondences. Repeating the

process of the “ samples” several times (e.g. to ) and choosing the model (homography) that

has the highest support/consensus set, we can improve the results. This was actually implemented in

section 6.4, however was not used on previous results due to its time consumption. The other

modification proposed for the Ransac is to use the homography calculated from the best consensus

set to find a new consensus set. This because the consensus set we are using in this work is obtained

with the homography obtained from the four initial putative correspondences that gave the best

results. Iteratively repeating the above (using the homography to find a consensus set, and using the

consensus set to find a new homography) it should be possible to associate a consensus set with the

same homography two times in a row (situation in which the algorithm terminates).

If the solutions above do not improve significantly the results, then instead of using DLT

algorithm other methods should be studied to calculate the homographies from the inliers.

Now, moving to another field, it is important that future research be done regarding the

utilization of the data given by UAV sensors. Although it was not tested with real sensor-data, we

demonstrate in this dissertation, with simulations, how to mosaic a set of images just with the

information that supposedly we would receive from the sensors (this is how the exact mosaics were

obtained for comparison with the estimated mosaics – chapter 6).

In this work we assume that we know in advance that each image is mosaiced with the

previous. An important suggestion of future work is to allow the current algorithm to receive an

unordered set of images. To do this it is suggested to count the number of inliers between all pairs of

two images and then use a threshold for the number of inliers to define witch of these pairs have

overlapping regions.

Research most be done in order to find accurate methods to deal with the parallax and the

orthorectification of the photos.

Methods for the automatic extraction of information from mosaics should also be investigated.

Examples are: water sources, green zones, access roads, houses or tents more vulnerable and

everything that may help the authorities in charge.

A possible integration of the mosaics created in Google Earth may also be investigated.

The integration of fish-eye lenses in the quadrotor instead of conventional cameras can be a

topic of future research.

- 91 -

References

[1] C. Holly, " "Symbols from the Sky: Heavenly messages from the depths of prehistory may be

encoded on the walls of caves throughout Europe."," Seed Magazine, 13 July 2010. [Online].

Available: http://seedmagazine.com/content/article/symbols_from_the_sky. [Accessed October

2012].

[2] L. Bagrow, History of Cartography, Transaction Publishers, 1986.

[3] D. Spencer, The Focal Dictionary of Photographic Technologies., Focal Press, 1973, p. 454.

[4] "Unmanned Vehicles – Aerial – UAV," [Online]. Available: http://www.unmannedvehicles.co.uk-

/unmanned-vehicles-aerial-ua/. [Accessed October 2012].

[5] S. Yahyanejad, "Incremental Mosaicking of Images from Autonomous, Small-Scale UAVs,"

Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International

Conference on, pp. 329 - 336 , 2010.

[6] P. Shah, "Pakistan Says U.S. Drone Kills 13," New York Times, 18 June 2009.

[7] [Online]. Available: http://www.theuav.com/. [Accessed October 2012].

[8] V. Kumar, "Video: Robots that fly ... and cooperate," [Online]. Available: http://www.ted.com-

/talks/vijay_kumar_robots_that_fly_and_cooperate.html. [Accessed October 2012].

[9] "MQ-1B PREDATOR," 1 5 2012. [Online]. Available: http://www.af.mil/information/factsheets-

/factsheet.asp?fsID=122. [Accessed October 2012].

[10] [Online]. Available: http://www.droneamerica.com/. [Accessed October 2012].

[11] M. Quaritsch et al., "Collaborative Microdrones: Applications and Research Challenges.," In

Proceedings of the Second International Conference on Autonomic Computing and

Communication Systems (Autonomics 2008), p. 7, Sptember 2008.

[12] P. Wolf., Elements of Photogrammetry, 2 ed., McGraw-Hill, 1983.

[13] P. Kolonia., When more is better. Popular Photography, 58(1):, 1994, 1994, pp. 30-34.

[14] D. Steedly et al., "Efficiently registering video into panoramic mosaics," In Proceedings of the

Tenth IEEE International Conference on Computer Vision, vol. 2, p. 1300–1307, 2005.

[15] R. Szeliski, "Image mosaicing for tele-reality applications.," In Proc. of IEEE Workshop on

Applications of Computer Vision, pp. 44-53, 1994.

[16] K. Loewke et al., "Real-Time Image Mosaicing for Medical Applications," 2007.

[17] "Photo Mosaics of the Seafloor," Aug 2011. [Online]. Available: http://www.interactiveoceans-

.washington.edu/story/Photo+Mosaics+of+the+Seafloor. [Accessed October 2012].

[18] A. lanitis, "Reconstructing 3D faces in cultural heritage applications," VSMM, 2008.

[19] C. Wu, "3D Reconstruction of Anatomical Structures from Endoscopic Images," 2010.

[20] K. Ghormley, "Tutorial: Making DEMs and Orthophotos," April 2005. [Online]. Available:

http://www.microimages.com/documentation/Tutorials/demortho.pdf. [Accessed October 2012].

- 92 -

[21] S. Gumustekin, "An Introduction to Image Mosacing," July 1999. [Online]. Available:

http://web.iyte.edu.tr/eee/sevgum/research/mosaicing99/#kh:75. [Accessed October 2012].

[22] P. Bishop, "Multiple-center-of-projection images," In Proc. of ACM SIGGRAPH, 1998.

[23] G. Stein, "Lens Distortion Calibration Using Point Correspondences," In Proc. CVPR, 1996.

[24] M. Anandan, "Video indexing based on mosaic representations," Proceedings of the IEEE, pp.

905-921, May 1998.

[25] M. Anandan, "Robust multi-sensor image alignment," In Proc. of IEEE International Conf. on

Computer Vision, Jan 1998.

[26] M. Anandan et al., "Mosaic representations of video sequences and their Applications," Signal

Processing: Image Communication, special issue on Image and Video Semantics: Processing,

Analysis, and Application, May 1996.

[27] M. Anandan et al., "Video compression using mosaic representations," Signal Processing:

Image Communication, pp. 529-552, 1995.

[28] A. Goshtasby, "Piecewise linear mapping functions for image registration," Pattern Recognition,

pp. 459-468, 1986.

[29] A. Goshtasby, 2-D and 3-D Image Registration for Medical, Remote Sensing, and Industrial

Applications, Wiley Press, 2005.

[30] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of

Computer Vision, pp. 91-110, 2004.

[31] K. Mikolajczyk and C. Schmid, "Scale and affine invariant interest point detectors," International

Journal on Computer Vision, pp. 63-86, 2004.

[32] P. Winter and S. Sokhansanj et al., "Quality assessment and grading of lentils using machine

vision," In Agricultural Institute of Canada Annual Conference, July 1996.

[33] J. Canny, "A Computational Approach To Edge Detection," IEEE Trans. Pattern Analysis and

Machine Intelligence, p. 679–698, 1986.

[34] R. Deriche, "Using Canny's criteria to derive a recursively implemented optimal edge detector,"

Int. J. Computer Vision, p. 167–187, April 1987.

[35] T. Lindeberg, "Edge detection and ridge detection with automatic scale selection," International

Journal of Computer Vision, pp. 117-154, 1998.

[36] T. Lindeberg, "Discrete derivative approximations with scale-space properties: A basis for low-

level feature extraction," J. of Mathematical Imaging and Vision, pp. 349-376, 1993.

[37] I. Sobel, "A 3 × 3 isotropic gradient operator for image processing," 1968.

[38] I. Sobel, "An Isotropic 3×3 Gradient Operator, Machine Vision for Three – Dimensional

Scenes," pp. 376-379, 1990.

[39] J. Prewitt, "Object Enhancement and Extraction," Picture Process Psychopict, pp. 75-149,

1970.

[40] L. G. Roberts, "Machine Perception Of Three-Dimensional Solids," 1963.

- 93 -

[41] C. Harris and M. Stephens, "A combined corner and edge detector," Proceedings of the 4th

Alvey Vision Conference, p. 147–151, 1988.

[42] C. Tomasi, "Good Features to Track," 9th IEEE Conference on Computer Vision and Pattern

Recognition, June 1994.

[43] C. Tomasi and T. Kanade, "Detection and Tracking of Point Features," International Journal of

Computer Vision, p. 165–168, 2004.

[44] L. Kitchen and A. Rosenfeld, "Gray-level corner detection," Pattern Recognition Letters, 1982.

[45] J. Koenderink and W. Richards, "Two-dimensional curvature operators," Journal of the Optical

Society of America, pp. 1136-1141, 1988.

[46] S. Smith and J. Brady, "SUSAN - a new approach to low level image processing," International

Journal of Computer Vision, p. 45–78, May 1997.

[47] S. Smith and J. Brady, "Method for digitally processing images to determine the position of

edges and/or corners therein for guidance of unmanned vehicle". UK Patent 2272285, January

1997.

[48] E. Rosten and T. Drummond, "Machine learning for high-speed corner detection," European

Conference on Computer Vision, May 2006.

[49] K. Mikolajczyk and C. Schmid, "Indexing Based on Scale Invariant Interest Points," In

Proceedings of the 8th International Conference on Computer Vision, pp. 525-531, 2001.

[50] R. Wang, “Laplacian of Gaussian (LoG),” 2011. [Online]. Available: http://fourier.eng.hmc-

.edu/e161/lectures/gradient/node10.html. [Accessed October 2012].

[51] R. Wang, "Difference of Gaussians (DoG)," 2011. [Online]. Available: http://fourier.eng.hmc-

.edu/e161/lectures/gradient/node11.html. [Accessed October 2012].

[52] D. Lowe, "Object recognition from local scale-invariant features," Proceedings of the

International Conference on Computer Vision, p. 1150–1157, 1999.

[53] D. Lowe, "Method and apparatus for identifying scale invariant features in an image and use of

same for locating an object in an image". U.S. Patent 6,711,293, 23 March 2004.

[54] J. Morel and G. Yu, "ASIFT: A New Framework for Fully Affine Invariant Image Comparison,"

SIAM Journal on Imaging Sciences, 2009.

[55] J. Morel and G. Yu, "ASIFT: An Algorithm for Fully Affine Invariant Comparison," Image

Processing On Line, 2011.

[56] J. Morel and G. Yu, "A Fully Affine Invariant Image Comparison Method," Proc. IEEE

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009.

[57] J. Morel and G. Yu, "Is SIFT Scale Invariant?," Inverse Problems and Imaging, Feb 2011.

[58] H. Bay et al., "SURF: Speeded up robust features," In ECCV, 2006.

[59] Y. Huang et al., "Image Mosaicing For UAV Application," International Symposium on

Knowledge Acquisition and Modeling, 2008.

[60] H. Xiaowei et al., "An Approach of Fast Mosaic for Serial Remote Sensing Images from UAV,"

- 94 -

Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007.

[61] M. Fischler and R. Bolles, "Random sample consensus: a paradigm for model fitting with

applications to image analysis and automated cartography," Magazine Communications of the

ACM, vol. 24, pp. 381 - 395, June 1981.

[62] R. Harley and A. Zisserman, Multiple View Geometry in computer vision, UK: Cambridge

University Press, 2004.

[63] L. G. Brown, "A survey of image registration techniques," ACM Computing Surveys (CSUR), pp.

325 - 376, December 1992.

[64] C. Hines and D. Kuglin, "The phase correlation image alignment method," In IEEE 1975

Conference on Cybernetics and Society, p. 163–165, September 1975.

[65] B. Chatterji and R. B., "An fft-based technique for translation, rotation, and scale-invariant

image registration," IEEE Transactions on Image Processing, 1996.

[66] J. Davis, "Mosaics of scenes with moving objects," In Proc. of IEEE Conference on Computer

Vision and Pattern Recognition, 1998.

[67] P. Burt, "Smart sensing within a pyramid vision," Proceedings of the IEEE, pp. 1006-1015,

1988.

[68] S. Gümüstekin and R. Hall, "Image registration and mosaicing using a self calibrating camera,"

In Proc. of IEEE Int. Conf. on Image Processing, 1998.

[69] W. Press et al., Numerical Recipes in C: The Art of Scientific Computing., 2 ed., Cambridge,

England: Cambridge University Press, 1992.

[70] H. Sawhney and R. Kumar, "True multi-image alignment and its application to mosaicing and

lens distortion correction," In Proc. of IEEE Conference on Computer Vision and Pattern

Recognition, pp. 450-456, 1997.

[71] D. Milgram, "Adaptive techniques for photo mosaicing.," IEEE Transactions in Computers, pp.

1175-1180, 1977.

[72] J. Davis, "Mosaics of scenes with moving objects," In Proc. of IEEE Conference on Computer

Vision and Pattern Recognition, 1998.

[73] S. Gümüstekin e R. Hall, “Mosaic image generation on a flattened gaussian sphere,” In Proc. of

IEEE Workshop on Applications of Computer Vision, pp. 50-55, 1996.

[74] J. Lim, Two Dimensional Signal and Image Processing, Prentice Hall, 1989.

[75] S. Peleg, "Elimination of seams from photomosaics.," Computer Graphics and Image

Processing, pp. 90-94, 1981.

[76] P. Burt and E. Adelson, "A multiresolution spline with application to image mosaics.," ACM

Transactions on Graphics, pp. 217-236, 1983.

[77] C. Begg and R. Mukundan, "Hierarchical matching techniques for automatic image," Image and

Vision Computing New Zealand, pp. 381-386, Nov 2004.

[78] R. Szeliski, "Video Mosaics for Virtual Environments," IEEE Ccomputer Graphics and

- 95 -

Applications, vol. 16, pp. 22-30, March 1996.

[79] N. Greene, "Environment Mapping and Other Applications of World Projections," IEEE CG&A,

vol. 6, pp. 21-29, Nov. 1986.

[80] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, Cambridge: MIT

Press, 1993.

[81] M. Berger, Geometry. Translated from the French by M. Cole and S. Levy. Universitext, Berlin:

Springer-Verlag, 1987.

[82] S. Shah, "A simple calibration procedure for fish-eye (high distortion) lens camera,"

Proceedings of the 1994 IEEE International Conference on Robotics and Automation, vol. 4, pp.

3422 - 3427, 8-13 May 1994.

[83] "Nasa.gov," [Online]. Available: http://www.grc.nasa.gov/WWW/K-12/airplane/Images/rotations-

.gif. [Accessed October 2012].

[84] K. Nomizu and S. Sasaki, Affine Differential Geometry, Cambridge University Press, 1994.

[85] Encyclopædia Britannica, 11 ed., 2008.

[86] [Online]. Available: http://upload.wikimedia.org/wikipedia/commons/4/47/Zenith-Nadir-

Horizon.svg. [Accessed October 2012].

[87] "Google Earth," 2012. [Online]. Available: https://play.google.com/store/apps/details?id=com.-

google.earth. [Accessed Oct 2012].

[88] G. Smith, "Digital Orthophotography and GIS," ESRI Conference, 2005.

[89] P. Bolstad, GIS Fundamentals: A First Text on Geographic Information Systems, Eider Press,

2005.

[90] [Online]. Available: http://upload.wikimedia.org/wikipedia/commons/5/51/OrthoPerspective.svg.

[Accessed October 2012].

[91] G. Smith, "Digital Orthophotography and GIS," http://proceedings.esri.com/library/userconf-

/proc95/to150/p124.html.

[92] Oxford English Dictionary, 2 ed., 1989.

[93] [Online]. Available: http://upload.wikimedia.org/wikipedia/commons/1/10/Parallax_Example.svg.


[94] "DXO. Image Science. Lens distortion," [Online]. Available: http://www.dxo.com/en/photo/-

dxo_optics_pro/features/optics_geometry_corrections/distortion. [Accessed October 2012].

[95] X. Papademetris, "Image Registration: A Review," [Online]. Available: http://www.google.-

com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=7&ved=0CE4QFjAG&url=http%3A%2F%2Fn

oodle.med.yale.edu%2F~papad%2Fvarious%2Fpapademetris_Image_Registration.ppt&ei=RSx

VUO-kNcPJhAengIHoBg&usg=AFQjCNEjNQkazmz8GhFw00crpW_ZYpNS6Q. [Accessed

October 2012].

[96] J. Fitzpatrick, D. Hill and J. Calvin, "CHAPTER 8 Image Registration," in Handbook of Medical

Imaging - Volume 2, Medical Image Processing and Analysis, Bellingham, Washington, SPIE

- 96 -

PRESS, 2000, pp. 488-496.

[97] T. Lindeberg, "Detecting Salient Blob-Like Image Structures and Their Scales with a Scale-

Space Primal Sketch: A Method for Focus-of-Attention," International Journal of Computer

Vision 11, 1993.

[98] T. Lindeberg, "Edge detection and ridge detection with automatic scale selection," International

Journal of Computer Vision 30, pp. 117-154, 1998.

[99] H. Moravec, Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover,

Pittsburgh, Pennsylvania, 1980, p. 175.

[100] T. Lindeberg, "Scale-space theory: A basic tool for analysing structures at different scales,"

Journal of Applied Statistics, p. 1994, 224–270.

[101] K. Mikolajczyk and C. Schmid, "An affine invariant interest point detector," In European

Conference on Computer Vision, p. 128–142, 2002.

[102] U. Sinha, "SIFT: Scale Invariant Feature Transform," [Online]. Available: http://www.aishack.in-

/2010/05/sift-scale-invariant-feature-transform/3/. [Accessed October 2012].

[103] A. Vedaldi, "An Implementation of Lowe's scale invariante feature transform," [Online].

Available: http://www.vlfeat.org/~vedaldi/code/sift.html. [Accessed October 2012].

[104] “CodeForge.com. Free Open Source Codes Forge and Sharing.,” [Online]. Available:

http://www.codeforge.com/article/114823. [Accessed October 2012].

[105] M. Vargas and E. Malis, "Deeper understanding of the homography decomposition for vision-

based control," Unité de recherche INRIA Sophia Antipolis (France), 2007.

[106] O. Faugeras and F. Lustman, "Motion and structure from motion in a piecewise planar

environment," International Journal of Pattern Recognition and Artificial Intelligence, June 1988.

[107] "Aerial photograph:Year 2010," [Online]. Available: http://suisin.jimu.kyushuu.ac.jp/en-

/showcase/photo/aerial/h22/index.html. [Accessed May 2012].

[108] S. LaValle, "Yaw, pitch, and roll rotations," 20 4 2012. [Online]. Available: http://planning.cs.-

uiuc.edu/node102.html. [Accessed May 2012].

[109] "Pinhole art," [Online]. Available: http://microsites.lomography.com/pinhole/. [Accessed October

2012].

[110] [Online]. Available: http://upload.wikimedia.org/wikipedia/commons/3/3b/Pinhole-camera.svg.


[111] J. Bouguet, "Camera Calibration Toolbox for Matlab," 9 July 2010. [Online]. Available:

http://www.vision.caltech.edu/bouguetj/calib_doc/. [Accessed 2012].

[112] A. Greville and T. Greville, Generalized inverses. Theory and applications, 2 ed., New York,

NY: Springer, 2003.

[113] P. Azad et al., “Combining Harris Interest Points and the SIFT Descriptor for Fast Scale-

Invariant Object Recognition,” IEEE/RSJ international conference on Intelligent robots and

systems, 2009.

- 97 -

[114] D. Lowe, "Demo Software: SIFT Keypoint Detector," [Online]. Available: http://www.cs.ubc.ca-

/~lowe/keypoints/. [Accessed May 2012].

[115] S. Blackford, "How to Measure Errors," 1 10 1999. [Online]. Available: http://www.netlib-

.org/lapack/lug/node75.html. [Accessed May 2012].

[116] "Wolfram MathWorld," [Online]. Available: http://mathworld.wolfram.com/FrobeniusNorm.html.

[Accessed May 2012].

[117] J. Kämäräinen and P. Paalanen, "Experimental study on fast 2D homography estimation from a

few point correspondences," Lappeenranta, 2009.

[118] C. Evans, "The OpenSURF Computer Vision Library," 12 04 2012. [Online]. Available:

http://www.chrisevansdev.com/computer-vision-opensurf.html. [Accessed October 2012].

[119] J. Morel and G. Yu, "ASIFT: An Algorithm for Fully Affine Invariant Comparison," [Online].

Available: http://www.ipol.im/pub/art/2011/my-asift/. [Accessed May 2012].

[120] R. Wang, "Sharpening and Edge Detection," 19 09 2011. [Online]. Available:

http://fourier.eng.hmc.edu/e161/lectures/gradient/node10.html. [Accessed October 2012].

[121] R. Owens, "Computer Vision IT412," 29 10 1997. [Online]. Available: http://homepages.inf.ed-

.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT6/node2.html. [Accessed October 2012].

[122] W. Kecs, The convolution product and some applications, Reidel & Ed. Academici, 1982.

[123] A. Kovnatsky, "Feature points in image, Keypoint extraction," 11 Oct 2010. [Online]. Available:

http://www.mathworks.com/matlabcentral/fileexchange/29004-feature-points-in-image-keypoint-

extraction. [Accessed May 2012].

[124] A. Noble, "Finding corners," IVC - Image and Vision Computing, p. 121–128, 1988.

[125] J. Li and N. Allinson, "A comprehensive review of current local features for computer vision,"

Neurocomputing, pp. 1771-1787, 2008.

[126] L. Quam, "Hierarchical warp stereo," in Proc. DARPA lmage Understanding Workshop, p. 149–

155, December 1984.

[127] T. Headrick, Statistical Simulation: Power Method Polynomials and other Transformations,

Boca Raton: Chapman and Hall/CRC, 2010.

[128] R. Szeliski and J. Coughlan, "Hierarchical spline-based image registration.," In IEEE Computer

Society Conference on Computer Vision and Pattern Recognition (CVPR’94), June 1994.

[129] J. Bergen and P. Anandan, "Hierarchical modelbased motion estimation.," In Second European

Conference on Computer Vision (ECCV’92), p. 237–252, May 1992.

[130] A. Witkin and D. Terzopoulos, "Signal matching through scale space.," International Journal of

Computer Vision, p. 133–144, 1987.

- 98 -

- 99 -

Appendices

Appendix A

Harris-Laplace algorithm (section 3.6)

With concern to this sub-section we will focus on the following practical issues:

1) Convolutions and Gaussian kernels:

In this dissertation convolutions, as the one described on eq. (3.19) between the Gaussian

kernel, , and , were done using the Matlab function: . For example for the first entry

of the matrix ( ) we have:

A = conv2(Lx.^2, g, 'same') (7.1)

Where is the Gaussian kernel and “same” return the central part of the convolution, that is the

same size as .

In general, the Gaussian kernels in this dissertation were all obtained with the function .

Here is an example:

g = fspecial('gaussian',max(1,fix(6*S_I+1)), S_I) (7.2)

Where, in this particular case (still eq. (3.19)), is and “fix(6*S_I+1))” is the scalar that

will define the size of the Gaussian window.

It is important to notice that the sum of all the entries of is unity when the size of the window is

infinite. Since the size of the Gaussian window is limited, the function automatically normalize

all these entries of by its sum.

2) Derivatives of the Gaussian smoothed image :

Considering equations (3.20) and (3.21) we see that the authors first smooth the initial image

and then through derivation they calculate and .

However, in this implementation, the following relation was taken into account [120] [121]:

(7.3)

- 100 -

Since the convolution of generalized functions has the commutative property [122], we can first

derive the one dimensional Gaussian function:

(7.4)

Where, in the present code, using Matlab notation, is a vector defined between:

x = -round(3*S_D):round(3*S_D) (7.5)

Which is the same as saying that the size of the window is approximately .

And just then we convolve the Gaussian with the image:

Lx = conv2(img, G, 'same')

Ly = conv2(img, G', 'same') (7.6)

This idea, take prove to work perfectly, was taken from an implementation of this method by a

PhD student: Artiom Kovnatsky (implementation can be accessed here [123]).

3) Different constants involved:

What concerns to the constants involved, experiments show good results for the following

configuration:

From eq. (3.23), the initial sigma ( ) was set to , was set as and , number of scale

levels, as . The number of scale levels could be defined adaptively depending on the size of

the input image, however it seen unnecessary at this point;

From eq. (3.24) has remained as ;

All points closer than pixels from the border distance were discarded ( because the

descriptor window is a matrix around each keypoint).

4) Corner classification measure:

Instead of use eq. (3.25) for the corner classification it was used the modified measure

(harmonic mean) proposed by Nobel [124] and used by several authors [125]:

(7.7)

Where , , and were defined earlier in eq. (3.19) and is the distance from to the

next largest double-precision number, that is .

5) Laplace of Gaussian:

- 101 -

To solve the Laplacian of Gaussian eq. (3.26) we simple use:

LoG(x,y,S_I)=(S_I^2)*imfilter(img,(fspecial(...

'log',floor(6*S_I+1),S_I)),'replicate') (7.8)

is used to filter a multidimensional array with a multidimensional filter.

Canonical orientation according to SIFT (section 3.7.1)

In this dissertation, instead of using equations (3.29), (3.34) and (3.35) it was used the

derivative mask (derivative of the Gaussian kernel eq. (3.17):

(7.9)

Where is the characteristic scale of the keypoint (section 3.7).

Now using a region around each keypoint of (using Matlab notation:

) and not normalizing the results since there is no point in normalization (will

not influence the results) we have:

Pre_gradient_norm = sqrt(Ix.^2 + Iy.^2); (7.10)

gradient_angles = rad2deg( atan2( Iy, Ix ) + pi ); (7.11)

Where (=Ix) and (=Iy) are obtained from eq. (7.9) according to:

Ix = imfilter(img, dGdx, 'same'); (7.12)

Iy = imfilter(img, dGdx', 'same'); (7.13)

is the same as dGdx.

As was said in section 3.7.1 each sample added to the histogram is weighted by its gradient

magnitude (eq. (7.10)) and by a Gaussian-weighted circular window with scale*Factor :

g=fspecial('gaussian',max(1,fix(6*scale*Factor+1)),scale*Factor);

gradient_norm=imfilter(Pre_gradient_norm,g,'same'); (7.14)

- 102 -

Describe interest points according to SIFT (section 3.7.2)

The description of interest points is complicated. Here is given just a small explanation about

how it was implemented in our code.

Around each keypoint was defined a matrix of coordinates as described in Fig. 7.1. Actually is

not a matrix, but let us call it this way.

(…) (…) (…) (…) (…) (…) (…)

(…) (2,-2) (2,-1) (2,0) (2,1) (2,2) (…)

(…) (1,-2) (1,-1) (1,0) (1,1) (1,2) (…)

(…) (0,-2) (0,-1) (0,0) (0,1) (0,2) (…)

(…) (1,-2) (-1,-1) (-1,0) (-1,1) (-1,2) (…)

(…) (-2,-2) (-2,-1) (-2,0) (-2,1) (-1,2) (…)

(…) (…) (…) (…) (…) (…) (…)

Fig. 7.1 – Matrix of coordinates.

The size of this square matrix (Matrix ) depends on the characteristic scale, , of each

keypoint. Following the suggestion of the reference: [123], this implementation used a radius of

around each keypoint. So the square matrix has

entries.

Next, a rotation matrix was calculated for each keypoint according to its canonical

orientation. A new matrix , with new coordinates, was then obtained from the multiplication of the

rotation matrix by the coordinates of the matrix .

Once this was done the coordinates of the keypoint were added to all the entries of ,

resulting in the matrix .

Finally, as before, the equations (7.9) to (7.14) were used. However, instead of using the image

“img” in the equations (7.12) and (7.13) it was used only the pixels of the image “img” that had

coordinates in the matrix .

Briefly, our final window of gradient angles and gradient norms was obtained

downsampling by a factor of the gradient angles and gradient norm matrixes obtained from

the new equations (7.12) and (7.13).

The rest of the calculations regarding this topic do not need considerations.

- 103 -

Putative correspondences (section 3.8)

With relation to the nearest neighborhood, it was used just the minimum value from the columns

of (see note 9 (page 38) and equation (3.37)). Also instead of using the threshold of equation

(3.38) as , this implementation used .

Ransac (section 3.9)

This dissertation used the symmetric transfer error instead of the reprojection error;

The value (equation (3.43)) was not used;

Instead of it was used simply the relation: ;

Between the step and of the algorithm described in section 3.9, was added an

intermediate step to evaluate on the collinearity of the putative correspondences.

Whenever they are collinear the algorithm randomly selects other putative

correspondences. This reselection does not account as a trial , however if this step is

repeated more than times for each repetition then the Ransac algorithm fail and

a flag is activated to stop the mosaicing method.

was computed with the adaptive method described and was set to .

To avoid the algorithm to get stuck inside the loop created by the number of trials , the

algorithm, as before, stops after trials with the activation of a flag.

DLT algorithm (section 3.5)

This topic does not need considerations because the implementations follow exactly what was

written in section 3.5. All the parameters were defined there.

- 104 -

Appendix B

Tab. 7.1 – Minimum number of inliers on pitch evaluation.



x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch (degrees)

12 5 9 13 20 42

value≥250

11

4 8 13 12 26 43

250>value≥200

10

6 8 13 21 31 49

200>value≥170

9

5 8 13 14 27 33 54

170>value≥140

8

4 6 9 14 25 32 41 59

140>value≥120

7

5 9 14 23 30 37 56 88

120>value≥100

6

5 7 14 19 26 35 47 59 87

100>value≥80

5

7 10 13 23 33 45 59 70 103

80>value≥60

4 6 7 11 20 29 39 52 61 75 96

60>value≥40

3 6 11 16 25 38 48 53 66 80 94

40>value≥30

2 12 14 24 40 44 50 67 82 87 124

30>value≥20

1 5 12 17 24 42 58 66 92 123 131 169

20>value≥10

0 21 55 86 121 142 158 184 215 242 269 336

10>value≥7

-2 11 34 54 53 68 71 92 93 107 132 155

7>value

-4 20 38 48 52 78 89 103 106 112 134 151

-8 30 52 68 81 83 110 126 137 142 150 153

-12 53 68 88 91 128 122 134 145 155 137 108 -16 53 76 104 108 112 120 116 105 109 93 86 -20 61 77 80 99 88 83 83 66 63 58 56

Tab. 7.2 – Medium number of inliers on pitch evaluation.

Medium Number of Inliers x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch (degrees)

12 12,71 27,81 38,84 56,97 107,3

value≥250

11

7,03 20,07 35,73 48,81 69,78 114,7

250>value≥200

10

16,41 27,36 42,21 63,07 77,34 110,8

200>value≥170

9

12,45 26,06 39,93 52,05 74,07 91,51 125,4

170>value≥140

8

8,00 19,78 33,94 44,37 69,78 85,38 106,2 141,4

140>value≥120

7

12,99 28,50 43,07 62,44 76,87 98,74 131,0 179,8

120>value≥100

6

8,25 23,75 38,38 54,20 69,62 94,79 120,9 144,8 188,9

100>value≥80

5

21,32 28,70 48,24 64,46 90,64 113,9 130,5 153,9 192,1

80>value≥60

4 13,48 26,90 40,94 61,04 82,50 101,5 118,9 139,0 158,5 191,9

60>value≥40

3 21,20 40,55 61,80 79,50 96,31 109,0 126,9 145,7 158,7 201,2

40>value≥30

2 40,13 54,42 69,99 83,82 100,1 115,9 143,5 166,8 191,1 234,1

30>value≥20

1 10,14 40,15 57,46 75,21 101,5 128,3 155,7 187,6 218,7 247,0 295,3

20>value≥10

0 41,40 111,8 152,2 192,2 230,4 262,0 303,8 345,4 385,8 424,2 501,8

10>value≥7

-2 46,03 86,48 106,5 118,1 131,7 152,8 174,8 196,7 216,3 243,6 285,0

7>value

-4 59,26 96,21 114,9 132,7 158,2 178,6 195,0 209,5 233,8 257,0 304,2

-8 93,72 128,8 143,7 161,9 185,7 218,6 234,3 248,6 272,8 285,2 289,1

-12 109,1 148,2 165,7 192,6 223,9 245,4 258,0 269,9 283,9 265,8 242,4

-16 117,8 151,1 178,5 196,7 222,2 225,7 227,3 218,9 215,1 199,8 183,2

-20 109,8 145,4 158,9 173,3 163,9 171,1 161,9 155,2 153,9 147,0 131,7

- 105 -

Tab. 7.3 – Maximum geometric distance error on pitch evaluation. Same table as Tab. 6.3.

Maximum Geometric Distance Error (section 5.3.1) – in pixels x100 Ransac Overlap Percentage

x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch

(degrees)

12 6,6E+06 313,4 122,5 99,23 24,76

1>value

11

6,6E+04 1,6E+07 145,5 84,44 85,28 44,61

3>value≥1

10

6,2E+08 329,5 114,6 109,2 45,25 36,76

6>value≥3

9

5,1E+08 235,2 139,5 102,3 61,63 45,11 21,98

10>value≥6

8

5,3E+08 5,3E+07 505,4 210,4 48,85 59,34 55,09 12,62

15>value≥10

7

8,1E+06 581,6 103,4 64,84 41,91 23,20 24,01 9,68

20>value≥15

6

6,6E+07 1,1E+07 131,6 102,9 115,9 42,20 16,84 15,72 8,93

30>value≥20

5

3,3E+07 192,7 113,5 70,41 35,68 31,60 26,38 11,31 13,10

40>value≥30

4 2,7E+07 1,8E+03 92,03 113,6 67,19 26,15 26,06 22,05 13,64 8,73

60>value≥40

3 5,2E+06 101,6 75,21 38,83 31,14 39,49 30,64 15,00 11,83 10,71

100>value≥60

2 117,2 70,65 173,8 39,13 23,11 19,51 13,97 13,96 12,60 6,41

200>value≥100

1 1,2E+07 63,07 53,98 36,17 36,42 14,91 10,96 7,87 5,15 3,80 2,70

500>value≥200

0 12,18 1,14 1,55 0,57 0,69 0,62 0,69 0,33 0,37 0,20 0,12

1000>value≥500

-2 115,0 30,19 21,21 32,31 20,19 14,53 10,02 10,53 5,98 6,52 5,79

value≥1000

-4 64,04 31,35 23,63 16,20 10,65 10,50 9,12 7,13 6,92 4,69 3,87

-8 31,76 15,41 14,51 16,08 9,83 8,40 6,47 5,56 4,34 4,03 5,70

-12 15,15 13,32 7,52 6,05 7,08 5,09 6,07 7,25 4,53 7,82 6,79

-16 14,84 7,10 11,35 9,31 5,48 7,23 5,98 7,57 10,92 12,07 13,78

-20 14,73 15,61 10,74 12,82 12,41 15,54 13,56 19,03 23,27 41,45 38,09

Tab. 7.4 – Medium geometric distance error on pitch evaluation.

Medium Geometric Distance Error (section 5.3.1) – in pixels x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch

(degrees)

12 64,67 8,40 5,80 3,61 1,26

1>value

11

6,6E+04 26,98 6,63 3,97 2,97 1,46

3>value≥1

10

213,2 13,44 5,48 3,78 2,28 1,73

6>value≥3

9

186,2 10,82 5,65 3,88 3,00 2,15 1,23

10>value≥6

8

845,8 40,51 7,88 5,44 3,20 2,25 1,32 0,90

15>value≥10

7

52,02 11,90 4,78 3,80 2,67 1,39 0,91 0,61

20>value≥15

6

450,0 57,10 6,86 4,00 2,87 1,85 1,06 0,79 0,51

30>value≥20

5

25,64 8,85 5,13 2,94 1,87 1,38 1,05 0,76 0,54

40>value≥30

4 170,6 14,50 5,27 3,54 2,19 1,54 1,40 1,01 0,72 0,56

60>value≥40

3 24,34 6,15 3,40 2,44 1,86 1,66 1,20 0,86 0,71 0,57

100>value≥60

2 6,05 3,34 3,54 1,98 1,62 1,11 0,73 0,57 0,58 0,46

200>value≥100

1 29,90 3,81 3,22 1,52 1,04 0,66 0,59 0,46 0,38 0,39 0,31

500>value≥200

0 1,59 0,18 0,14 0,09 0,09 0,07 0,05 0,03 0,03 0,03 0,01

1000>value≥500

-2 3,92 1,91 1,46 1,32 1,07 0,74 0,55 0,49 0,39 0,36 0,30

value≥1000

-4 3,42 1,64 1,24 1,03 0,78 0,70 0,57 0,48 0,47 0,38 0,31

-8 1,60 0,90 0,80 0,76 0,60 0,42 0,38 0,37 0,34 0,32 0,34

-12 0,98 0,73 0,54 0,50 0,39 0,35 0,33 0,38 0,36 0,38 0,41

-16 0,86 0,61 0,52 0,52 0,46 0,48 0,43 0,48 0,51 0,56 0,73

-20 0,92 0,73 0,73 0,76 0,73 0,75 0,74 0,82 0,87 1,00 1,17

- 106 -

Tab. 7.5 – Maximum pitch error on pitch evaluation.

Maximum Pitch Error (section 5.3.2) – in degrees x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch

(degrees)

12 68,8 92,7 24,00 17,71 7,13

0.5>value≥0

11

98,1 71,3 22,97 13,79 16,04 8,67

1>value≥0.5

10

56,4 96,1 32,17 20,56 9,96 8,75

2>value≥1

9

75,6 52,7 27,04 19,86 10,35 9,30 4,74

4>value≥2

8

87,6 91,5 51,8 33,22 9,33 10,88 10,46 3,42

7>value≥4

7

78,9 55,1 25,44 12,22 10,17 5,48 4,48 2,84

10>value≥7

6

85,0 72,7 36,06 22,74 20,71 9,80 4,33 4,79 2,33

15>value≥10

5

63,1 40,1 31,37 12,43 9,20 7,49 6,19 2,82 2,75

20>value≥15

4 84,5 73,3 22,93 24,38 11,53 6,33 5,66 4,84 3,04 2,18

30>value≥20

3 89,8 21,61 16,43 8,37 6,63 8,90 5,97 3,84 3,33 2,13

40>value≥30

2 20,96 18,52 31,17 7,70 5,58 5,20 3,39 3,83 3,21 1,64

60>value≥40

1 72,4 16,76 15,00 8,93 7,62 4,00 3,17 1,68 1,23 1,14 0,93

90>value≥60

0 0,34 0,13 0,15 0,12 0,09 0,11 0,19 0,05 0,10 0,05 0,02

120>value≥90

-2 27,24 7,86 4,98 7,27 4,94 3,91 2,71 2,77 1,58 1,64 1,59

180≥value≥120

-4 16,97 7,17 7,54 3,85 3,09 3,25 2,74 1,81 1,77 1,30 1,32

-8 5,77 3,63 3,20 4,15 2,27 1,84 1,51 1,56 1,19 1,26 1,20

-12 4,67 2,82 2,60 1,95 1,55 1,84 1,32 2,08 1,24 1,17 1,64

-16 3,19 2,38 2,52 1,73 1,60 1,66 1,32 1,89 2,05 2,70 3,35

-20 3,85 3,64 1,91 2,35 2,33 2,85 2,28 3,20 3,71 7,08 9,25

Tab. 7.6 – Medium pitch error on pitch evaluation.

Medium Pitch Error (section 5.3.2) – in degrees x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch

(degrees)

12 19,34 5,93 4,09 2,83 1,15

0.5>value≥0

11

23,71 13,35 4,58 3,04 2,36 1,25

1>value≥0.5

10

8,99 10,19 4,43 2,70 1,82 1,48

2>value≥1

9

24,31 6,42 4,50 3,00 2,52 1,67 1,06

4>value≥2

8

31,85 13,87 5,80 4,25 2,61 1,75 1,17 0,76

7>value≥4

7

17,95 8,91 3,28 2,95 2,12 1,31 0,84 0,60

10>value≥7

6

26,45 8,60 4,96 3,14 2,26 1,72 1,00 0,84 0,52

15>value≥10

5

8,94 5,50 4,22 2,12 1,64 1,29 1,00 0,66 0,57

20>value≥15

4 23,17 9,30 4,03 3,01 1,65 1,33 1,21 0,95 0,69 0,61

30>value≥20

3 11,05 4,20 2,70 1,80 1,29 1,32 0,93 0,84 0,67 0,57

40>value≥30

2 3,63 2,30 2,42 1,47 1,25 0,92 0,62 0,49 0,51 0,41

60>value≥40

1 14,00 2,40 2,53 1,34 0,69 0,46 0,52 0,43 0,37 0,43 0,33

90>value≥60

0 0,15 0,03 0,07 0,05 0,05 0,05 0,06 0,03 0,04 0,03 0,01

120>value≥90

-2 2,71 1,19 0,95 1,16 0,82 0,68 0,48 0,50 0,36 0,30 0,27

180≥value≥120

-4 2,52 1,37 1,14 0,92 0,68 0,64 0,50 0,40 0,42 0,32 0,27

-8 1,20 0,79 0,66 0,71 0,52 0,38 0,34 0,31 0,26 0,26 0,28

-12 0,82 0,60 0,47 0,40 0,34 0,29 0,27 0,31 0,26 0,31 0,35

-16 0,74 0,45 0,40 0,44 0,34 0,34 0,34 0,38 0,37 0,47 0,65

-20 0,70 0,49 0,47 0,50 0,53 0,51 0,51 0,58 0,62 0,77 0,93

- 107 -

Tab. 7.7 – Maximum error on pitch evaluation.

Maximum Error (section 5.3.2) – in pixels

x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Pitch

(degrees)

12 4,6E+03 287,9 68,97 34,51 24,45

1>value

11

3,3E+06 626,6 39,98 43,33 41,56 24,40

3>value≥1

10

947,4 293,9 43,09 49,86 36,22 31,20

6>value≥3

9

3,2E+03 67,81 53,24 54,90 39,64 23,11 28,08

10>value≥6

8

1,3E+05 1,2E+03 238,9 56,24 35,42 42,19 29,71 16,89

15>value≥10

7

4,6E+03 81,85 68,03 28,19 31,38 22,50 17,03 12,86

20>value≥15

6

2,3E+05 347,3 87,77 35,62 51,26 36,84 20,33 15,89 11,88

30>value≥20

5

205,0 73,93 68,96 35,32 24,50 29,39 21,12 15,95 12,20

40>value≥30

4 836,8 194,9 35,51 54,00 48,58 28,95 16,37 18,56 14,04 14,13

60>value≥40

3 754,2 67,39 57,38 26,59 22,61 26,79 19,48 16,58 16,29 11,64

100>value≥60

2 25,22 40,33 34,92 25,79 22,15 17,27 14,08 10,05 9,45 10,64

200>value≥100

1 8,4E+03 43,85 39,41 25,65 15,29 12,76 18,28 9,57 6,89 8,00 6,15

500>value≥200

0 0,34 0,60 0,45 0,62 0,49 0,49 0,31 0,37 0,51 0,30 0,23

1000>value≥500

-2 44,80 18,87 21,37 38,32 21,52 10,92 13,34 11,28 11,63 10,38 10,92

value≥1000

-4 39,52 27,55 17,07 22,90 16,75 12,40 13,76 12,72 15,54 10,28 7,40

-8 25,17 17,81 19,77 18,87 16,23 9,37 12,77 12,65 8,91 9,24 8,94

-12 24,18 19,65 15,00 16,21 9,98 10,20 11,33 15,69 11,57 10,21 11,09

-16 25,53 13,17 17,52 23,12 12,27 11,92 11,27 10,55 13,88 13,24 14,05

-20 20,08 19,54 14,37 25,26 14,71 25,34 16,49 18,65 23,27 21,44 24,39

Tab. 7.8 – Medium error on pitch evaluation.

Medium Error (section 5.3.2) – in pixels


Pitch (degrees)

12 68,98 8,67 8,09 6,12 3,93

1>value

11

1,7E+04 15,68 7,44 6,81 6,08 4,34

3>value≥1

10

62,40 14,24 7,03 6,85 6,18 4,74

6>value≥3

9

70,93 8,72 8,02 6,35 6,27 4,59 4,35

10>value≥6

8

5,7E+03 78,86 12,47 8,04 6,70 6,15 4,50 3,69

15>value≥10

7

72,24 8,47 6,26 6,47 6,22 3,96 3,23 2,60

20>value≥15

6

2,8E+04 19,90 7,69 6,48 6,56 4,90 3,41 2,84 2,49

30>value≥20

5

7,92 8,44 7,74 6,07 4,67 3,99 3,74 2,77 2,28

40>value≥30

4 83,25 12,95 6,82 7,59 5,26 4,16 3,77 3,32 2,67 2,49

60>value≥40

3 44,55 6,34 5,56 3,84 3,67 3,83 2,94 2,77 3,50 2,24

100>value≥60

2 4,90 6,13 4,80 3,79 3,14 3,24 2,86 2,25 1,85 2,13

200>value≥100

1 139,6 6,89 6,03 3,63 2,14 2,03 2,16 1,21 1,03 0,92 0,83

500>value≥200

0 0,16 0,14 0,14 0,15 0,27 0,16 0,15 0,12 0,13 0,12 0,09

1000>value≥500

-2 6,70 2,78 2,66 2,85 3,04 2,96 2,66 2,17 1,92 2,09 2,07

value≥1000

-4 6,92 4,67 3,66 3,58 3,28 2,68 2,66 2,61 2,41 2,54 1,71

-8 4,39 3,58 3,78 3,54 3,25 2,36 2,13 2,39 2,12 2,58 2,18

-12 4,56 3,74 3,03 3,07 2,48 2,30 2,17 2,16 2,42 2,12 2,20

-16 5,20 3,39 2,87 3,09 2,63 2,49 2,59 2,68 2,84 2,95 3,31

-20 4,81 3,28 3,56 3,92 3,59 3,77 3,02 3,90 3,42 4,12 3,81

- 108 -

Appendix C

Tab. 7.9 – Minimum number of inliers on roll evaluation.

Minimum Number of Inliers x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16 5 8 13 20 29 55

value≥250

15

5 13 17 26 32 51

250>value≥200

14

5 10 16 25 28 34 55

200>value≥170

13

4 12 20 28 28 44 71

170>value≥140

12

4 9 18 27 30 38 50 72

140>value≥120

11

5 11 17 27 32 45 54 66

120>value≥100

10

4 10 16 21 27 39 47 54 75

100>value≥80

9 4 4 12 23 22 35 42 53 62 82

80>value≥60

8 4 8 11 20 25 34 50 65 76 94

60>value≥40

7 5 12 16 25 33 44 57 82 84 104

40>value≥30

3 4 17 30 35 40 55 61 61 82 92 108

30>value≥20

1 14 31 32 44 63 74 86 115 124 156 164

20>value≥10

0 57 103 124 157 176 201 220 251 278 302 350

10>value≥7

-5 36 55 55 76 81 92 108 114 118 130 147

7>value

-10 38 55 66 86 90 105 109 113 127 135 144

-20 36 57 69 79 91 97 100 99 85 81 86

Tab. 7.10 – Medium number of inliers on roll evaluation.

Medium Number of Inliers


x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16 13,88 26,92 39,13 56,23 95,80

value≥250

15

8,78 21,59 33,94 47,72 59,47 87,44

250>value≥200

14

5,57 14,89 26,59 40,62 50,79 68,25 100,7

200>value≥170

13

8,28 20,67 31,87 48,56 59,31 82,75 123,5

170>value≥140

12

15,45 30,25 40,14 54,11 67,71 94,60 132,9

140>value≥120

11

9,39 22,99 33,27 46,61 61,51 84,05 99,08 131,9

120>value≥100

10

15,19 27,84 40,93 52,00 68,80 83,97 97,56 137,7

100>value≥80

9

7,79 19,44 36,97 43,77 63,77 76,21 92,78 109,2 145,3

80>value≥60

8

13,59 29,18 41,13 56,29 68,91 84,06 111,0 141,4 163,4

60>value≥40

7 9,20 19,44 32,91 46,05 56,60 76,21 102,4 136,1 156,9 195,3

40>value≥30

3 7,17 37,98 49,14 63,69 85,16 100,9 113,2 135,7 149,5 173,1 212,2

30>value≥20

1 25,20 47,63 59,98 83,62 108,9 137,7 173,6 200,6 235,5 257,4 292,2

20>value≥10

0 81,40 155,8 195,2 233,2 269,0 302,4 334,8 370,2 411,8 445,4 509,2

10>value≥7

-5 62,38 91,39 103,0 133,0 148,6 171,9 193,3 209,3 235,3 260,0 287,3

7>value

-10 71,74 101,9 130,0 141,9 165,0 179,0 199,5 217,6 240,0 256,1 260,9

-20 55,54 95,88 119,3 143,7 152,8 167,4 179,9 176,6 165,6 156,9 155,4

- 109 -

Tab. 7.11 – Maximum geometric distance error on roll evaluation.



x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16 2,3E+08 2,0E+03 208,4 82,26 40,06

1>value

15

1,2E+09 1,2E+03 325,9 195,9 112,9 35,79

3>value≥1

14

3,1E+08 711,5 511,3 185,7 129,5 60,32 29,76

6>value≥3

13

2,3E+09 1,6E+03 387,8 153,8 76,73 47,12 28,39

10>value≥6

12

1,1E+08 4,8E+07 558,3 166,3 135,3 54,71 31,63 17,83

15>value≥10

11

5,2E+08 1,6E+03 293,8 179,2 78,87 47,80 35,74 17,91

20>value≥15

10

2,0E+08 5,9E+07 409,3 194,4 134,2 55,45 48,79 33,20 13,33

30>value≥20

9 1,1E+09 1,4E+04 181,0 145,8 74,34 59,56 34,99 18,45 14,48

40>value≥30

8 1,5E+18 1,2E+09 2,1E+03 219,8 83,19 53,56 35,28 26,06 22,66 17,88

60>value≥40

7 4,9E+08 1,5E+03 194,3 282,3 86,84 46,89 28,79 20,73 17,01 10,51

100>value≥60

3 4,1E+09 343,3 126,1 74,33 67,14 33,61 22,20 24,40 15,60 10,83 8,23

200>value≥100

1 1,1E+03 76,11 74,04 47,62 26,78 13,20 8,28 5,65 6,15 4,91 3,81

500>value≥200

0 0,00 0,00 0,00 0,45 0,72 0,68 0,78 0,36 0,28 0,30 0,20

1000>value≥500

-5 56,11 37,09 22,96 15,87 15,48 11,89 8,33 6,50 6,17 5,22 6,86

value≥1000

-10 46,28 18,55 11,99 10,43 9,08 6,60 6,82 5,46 5,92 5,41 6,14

-20 44,66 25,54 19,22 14,86 18,73 16,69 16,98 12,80 16,84 15,35 16,21

Tab. 7.12 – Medium geometric distance error on roll evaluation.

Medium Geometric Distance Error (section 5.3.1) – in pixels


x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16 180,1 20,74 9,13 4,49 1,80

1>value

15

1,7E+03 18,91 12,05 5,88 5,14 2,01

3>value≥1

14

3,1E+03 36,25 16,58 9,23 4,97 3,28 1,72

6>value≥3

13

810,6 23,90 12,76 5,10 4,26 2,00 1,47

10>value≥6

12

76,07 14,69 10,07 5,26 2,95 1,86 1,04

15>value≥10

11

300,0 22,68 10,46 6,77 3,85 2,68 2,25 1,01

20>value≥15

10

75,89 14,57 7,76 5,50 3,04 2,41 1,74 0,88

30>value≥20

9

916,1 31,56 10,19 7,08 3,45 3,13 1,89 1,24 0,83

40>value≥30

8

264,9 19,02 9,23 4,88 3,67 2,14 1,20 0,88 0,78

60>value≥40

7 161,1 23,49 10,29 6,09 4,52 2,97 1,78 0,97 0,76 0,70

100>value≥60

3 1,0E+03 12,08 7,82 4,92 3,52 2,39 1,39 1,19 0,86 0,69 0,55

200>value≥100

1 8,83 5,50 4,80 2,41 1,39 0,82 0,69 0,50 0,41 0,34 0,31

500>value≥200

0 0,00 0,00 0,00 0,03 0,06 0,08 0,07 0,04 0,04 0,03 0,03

1000>value≥500

-5 3,68 1,93 1,47 1,05 0,89 0,73 0,57 0,50 0,43 0,42 0,35

value≥1000

-10 2,18 1,29 0,88 0,77 0,67 0,60 0,52 0,47 0,39 0,38 0,38

-20 1,64 1,16 0,96 0,83 0,88 0,84 0,80 0,83 0,84 0,91 1,05

- 110 -

Tab. 7.13 – Maximum roll error on roll evaluation.

Maximum Roll Error (section 5.3.2) – in degrees


x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16

122,6 56,4 22,31 11,57 4,97

0.5>value≥0

15

156,5 51,2 29,57 22,94 16,07 5,29

1>value≥0.5

14

152,4 96,8 39,23 24,20 16,69 7,46 4,24

2>value≥1

13

166,3 54,4 30,14 20,58 10,91 6,85 4,85

4>value≥2

12

150,2 128,7 38,24 31,71 25,26 8,31 5,39 3,55

7>value≥4

11

150,1 53,5 24,36 23,13 12,59 7,96 5,91 3,61

10>value≥7

10

150,1 120,4 43,0 26,06 16,58 7,42 7,08 4,59 2,62

15>value≥10

9

145,7 74,1 34,06 27,25 10,49 9,27 5,27 3,41 2,24

20>value≥15

8 147,9 113,7 63,4 26,47 15,67 8,17 7,76 4,82 4,16 3,51

30>value≥20

7 105,3 60,0 25,73 33,39 10,92 8,17 5,75 3,51 3,14 2,32

40>value≥30

3 178,0 30,48 18,96 10,64 9,39 5,27 3,49 4,22 2,87 2,87 1,99

60>value≥40

1 54,9 12,36 9,88 7,11 5,01 2,27 1,60 1,31 1,32 0,82 1,15

90>value≥60

0 0,00 0,00 0,00 0,07 0,11 0,13 0,18 0,07 0,06 0,07 0,05

120>value≥90

-5 9,03 5,43 3,92 2,80 2,97 2,30 1,63 1,34 1,45 1,67 1,67

180≥value≥120

-10 7,55 5,05 2,52 2,16 1,79 1,48 1,58 1,25 1,16 1,15 1,33

-20 10,14 4,00 2,79 2,29 2,56 2,16 2,14 1,84 2,02 1,73 1,91

Tab. 7.14 – Medium roll error on roll evaluation.

Medium Roll Error (section 5.3.2) – in degrees


x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16 21,67 8,08 4,38 2,35 1,03

0.5>value≥0

15

60,3 8,01 5,69 3,23 2,56 1,07

1>value≥0.5

14

64,0 18,14 7,47 4,86 2,56 1,75 0,98

2>value≥1

13

53,2 9,86 5,65 2,55 2,21 1,15 0,96

4>value≥2

12

15,36 6,65 5,59 2,64 1,68 1,05 0,73

7>value≥4

11

26,85 10,64 4,89 3,30 2,24 1,54 1,34 0,73

10>value≥7

10

15,77 6,65 3,92 3,05 1,73 1,33 0,97 0,60

15>value≥10

9

50,3 13,62 4,90 3,51 1,88 1,74 1,08 0,77 0,52

20>value≥15

8

15,71 8,31 4,76 2,76 1,95 1,29 0,80 0,61 0,56

30>value≥20

7 40,2 11,58 5,33 3,24 2,39 1,69 1,19 0,72 0,55 0,53

40>value≥30

3 71,2 4,88 4,15 2,88 2,06 1,38 0,84 0,77 0,61 0,49 0,42

60>value≥40

1 4,56 2,62 2,63 1,38 0,84 0,55 0,49 0,38 0,33 0,28 0,27

90>value≥60

0 0,00 0,00 0,00 0,01 0,03 0,04 0,05 0,02 0,03 0,02 0,03

120>value≥90

-5 1,99 1,16 0,88 0,70 0,63 0,50 0,36 0,35 0,32 0,34 0,25

180≥value≥120

-10 1,27 0,87 0,60 0,50 0,47 0,37 0,34 0,31 0,26 0,22 0,22

-20 0,95 0,72 0,60 0,48 0,49 0,44 0,41 0,40 0,42 0,44 0,56

- 111 -

Tab. 7.15 – Maximum error on roll evaluation.



x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16 470,0 87,16 76,15 49,69 38,31

1>value

15

1,6E+03 96,59 128,3 68,53 49,30 41,21

3>value≥1

14

9,2E+03 186,2 106,2 82,82 90,89 58,02 38,82

6>value≥3

13

4,0E+05 211,9 106,2 79,65 51,78 44,18 28,01

10>value≥6

12

1,3E+06 440,0 123,5 87,71 64,24 43,77 32,19 23,77

15>value≥10

11

2,5E+04 120,3 138,0 73,05 52,07 49,81 30,92 24,34

20>value≥15

10

2,9E+04 380,1 127,1 95,11 89,71 59,70 41,39 44,02 25,16

30>value≥20

9 3,2E+03 290,7 114,1 88,79 53,30 42,20 41,16 34,45 26,09

40>value≥30

8 3,1E+05 110,9 121,6 126,9 43,03 62,37 41,24 23,24 21,93 26,46

60>value≥40

7 285,6 165,7 76,47 95,92 58,02 41,51 31,16 22,31 22,73 20,53

100>value≥60

3 1,3E+06 67,38 58,06 81,38 69,52 37,02 25,37 31,84 21,83 21,42 21,05

200>value≥100

1 64,36 45,12 50,91 31,43 31,94 17,80 18,74 10,08 11,49 12,15 9,33

500>value≥200

0 0,00 0,00 0,00 0,03 0,93 0,78 0,69 0,59 0,58 0,46 0,89

1000>value≥500

-5 55,72 38,50 30,55 28,07 25,95 21,04 19,45 18,16 17,88 15,49 15,88

value≥1000

-10 50,93 27,44 29,63 28,93 24,89 24,13 21,80 17,37 23,21 19,21 15,06

-20 64,45 82,65 31,01 28,68 28,21 27,95 20,19 23,17 20,07 23,18 27,30

Tab. 7.16 – Medium error on roll evaluation.



x5 Images 20 30 35 40 45 50 55 60 65 70 80

Roll (degrees)

16 49,25 17,78 13,78 10,76 7,38

1>value

15

138,9 20,84 21,61 12,17 13,35 7,60

3>value≥1

14

678,4 28,65 18,07 13,93 13,76 10,81 7,28

6>value≥3

13

1,2E+04 24,04 19,38 12,49 9,76 8,28 5,57

10>value≥6

12

28,81 19,83 20,77 11,89 8,49 7,92 5,87

15>value≥10

11

195,0 21,20 17,37 13,84 10,31 8,71 7,40 5,89

20>value≥15

10

32,74 23,65 14,45 14,23 10,26 8,26 7,32 6,24

30>value≥20

9

76,00 26,42 14,85 15,57 11,00 8,45 9,30 6,77 5,25

40>value≥30

8

15,93 19,55 15,51 11,03 10,84 8,24 7,13 4,99 5,26

60>value≥40

7 55,29 17,42 16,39 12,88 12,27 9,06 7,10 4,61 5,25 4,43

100>value≥60

3 8,7E+03 10,48 13,94 12,96 9,09 7,77 6,72 5,88 4,88 4,62 4,47

200>value≥100

1 7,10 10,19 10,81 7,28 5,22 3,79 2,74 1,80 1,85 1,67 1,64

500>value≥200

0 0,00 0,00 0,00 0,01 0,07 0,31 0,19 0,21 0,24 0,15 0,31

1000>value≥500

-5 9,63 8,69 6,93 6,06 5,55 4,50 4,69 4,34 3,82 3,74 3,52

value≥1000

-10 9,74 7,15 6,68 5,26 5,26 5,43 4,43 4,50 3,60 3,87 3,32

-20 10,69 8,58 5,75 5,95 5,62 5,07 4,72 4,80 4,58 5,66 5,84

- 112 -

Appendix D

Tab. 7.17 – Minimum number of inliers on yaw evaluation.


Yaw (degrees)

0 57 103 124 157 176 201 220 251 278 302 350

value≥250

5 12 22 28 38 42 43 50 59 68 75 81

250>value≥200

10 7 15 24 29 39 37 53 62 66 73 83

200>value≥170

20 6 11 16 15 21 28 26 41 41 47 53

170>value≥140

40 4 5 9 14 12 17 24 22 23 26 32

140>value≥120

70 5 9 9 13 18 16 22 23 25 33

120>value≥100

80 5 7 14 17 19 25 28 26 35 43

100>value≥80

90 5 7 9 13 15 23 29 28 33 48

80>value≥60

100 5 8 10 11 14 19 22 25 31 37

60>value≥40

110 4 4 6 7 12 12 15 19 25 28

40>value≥30

140 4 5 6 6 9 8 9 14 13 14 21

30>value≥20

180 6 11 12 13 16 21 27 29 33 37 43

20>value≥10

220

5 7 8 10 11 13 12 15 17

10>value≥7

250 5 8 9 14 12 17 15 17 21 23

7>value

260 7 11 13 14 18 19 23 26 32 32

270 8 12 19 23 26 32 37 43 50 64

280 7 10 16 18 17 28 32 35 35 49

290 6 10 10 12 14 16 25 23 30 42

320 4 9 8 11 13 16 23 20 32 35 42 340 8 16 18 23 22 28 37 41 44 55 66 350 9 17 26 30 35 41 53 60 71 71 90 355 14 17 30 33 42 43 50 62 77 82 90

Tab. 7.18 – Medium number of inliers on yaw evaluation.


Yaw (degrees)

0 81,40 155,8 195,2 233,2 269,0 302,4 334,8 370,2 411,8 445,4 509,2

value≥250

5 19,95 38,14 48,98 61,15 78,90 87,39 101,4 118,7 129,4 150,2 178,8

250>value≥200

10 15,47 32,56 45,03 57,50 64,66 80,69 93,78 113,1 118,6 133,9 165,8

200>value≥170

20 9,79 20,98 29,63 33,51 44,38 52,10 61,21 70,46 78,12 93,73 115,2

170>value≥140

40 6,16 11,57 16,46 20,81 23,13 29,60 36,72 40,39 43,23 50,15 69,47

140>value≥120

70 8,49 13,71 16,98 21,95 29,54 30,05 39,11 42,90 51,40 63,63

120>value≥100

80 8,96 12,75 19,41 25,88 31,18 37,88 44,49 49,51 58,60 77,30

100>value≥80

90 8,43 12,65 16,44 25,51 28,90 39,23 48,47 66,41 74,46 94,66

80>value≥60

100 7,02 11,68 15,29 16,98 21,63 30,05 36,41 40,77 45,76 61,25

60>value≥40

110 6,28 9,23 13,25 14,72 18,41 21,87 26,93 31,52 37,76 45,55

40>value≥30

140 5,17 6,72 7,95 9,84 12,46 13,69 15,86 20,27 21,80 24,29 32,12

30>value≥20

180 9,93 16,55 20,50 26,29 30,14 36,64 43,65 49,56 55,37 60,72 74,59

20>value≥10

220

9,64 11,87 12,64 14,13 17,67 20,95 23,65 23,36 34,60

10>value≥7

250 7,04 12,27 14,02 19,97 20,80 25,03 26,87 31,75 38,77 48,82

7>value

260 9,17 14,20 18,84 22,50 29,22 33,91 43,91 47,56 54,92 70,27

270 13,47 21,49 27,90 36,12 43,40 52,17 67,96 86,77 93,45 122,4

280 9,80 14,24 22,41 26,06 32,60 41,70 50,52 55,29 62,19 82,74

290 8,36 13,96 17,76 23,64 27,58 32,71 38,55 44,44 49,73 65,39 320 5,98 12,19 13,62 17,87 20,86 26,78 35,27 37,17 50,35 54,59 68,15 340 13,30 21,67 26,50 32,02 41,57 49,20 60,77 67,02 76,32 85,96 115,3 350 19,19 35,32 48,40 62,23 70,53 78,74 97,54 108,5 125,6 136,8 174,2 355 22,94 37,28 51,45 61,36 79,56 92,68 104,9 116,5 136,0 156,7 180,4

- 113 -

Tab. 7.19 – Maximum geometric distance error on yaw evaluation.

Maximum Geometric Distance Error (section 5.3.1) – in pixels x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Yaw (degrees)

0 0,00 0,00 0,00 0,45 0,72 0,68 0,78 0,36 0,35 0,30 0,20

1>value

5 1,3E+03 123,2 82,76 44,07 40,41 26,64 17,32 13,78 9,94 11,89 6,34

3>value≥1

10 8,5E+07 251,1 63,57 46,55 45,06 36,61 29,95 21,62 16,31 10,23 11,00

6>value≥3

20 3,4E+08 285,5 174,8 78,41 59,05 34,41 36,02 18,49 33,13 16,20 19,96

10>value≥6

40 1,8E+08 1,5E+03 1,0E+03 169,2 89,06 169,0 52,80 60,61 25,43 32,11 29,61

15>value≥10

70 5,4E+07 902,3 290,5 124,3 50,65 47,49 45,75 21,38 21,46 12,63

20>value≥15

80 6,1E+07 1,5E+08 290,5 73,72 46,50 39,96 28,79 23,41 21,94 11,66

30>value≥20

90 3,2E+08 6,1E+03 434,6 145,9 54,40 40,53 20,07 22,08 17,93 14,08

40>value≥30

100 1,8E+09 1,0E+04 512,2 335,0 105,7 51,86 56,11 30,78 24,18 14,82

60>value≥40

110 4,2E+08 4,1E+06 249,2 216,8 77,16 52,27 32,74 27,82 22,43 24,45

100>value≥60

140 8,2E+07 4,4E+08 4,1E+07 2,6E+03 408,0 168,0 133,9 79,23 37,10 65,22 29,31

200>value≥100

180 5,2E+08 3,5E+03 152,3 87,54 58,95 45,78 37,48 28,59 22,21 16,22 14,76

500>value≥200

220

1,3E+03 744,0 416,0 113,4 194,5 53,05 82,93 40,55 30,04

1000>value≥500

250 1,8E+08 927,6 371,1 126,6 72,88 52,91 93,70 53,73 27,16 26,42

value≥1000

260 2,3E+08 6,3E+06 353,0 87,95 98,57 58,24 25,75 29,45 26,76 20,28 270 362,8 342,9 105,1 70,09 42,69 26,48 21,74 20,60 11,83 7,48 280 6,6E+08 1,2E+03 100,4 74,38 68,79 54,87 52,31 25,40 19,67 16,09 290 8,9E+07 872,1 490,4 207,9 79,23 61,88 40,33 40,77 21,61 13,07 320 1,9E+09 3,7E+03 359,2 197,1 179,9 73,71 43,93 32,63 20,56 23,38 19,24 340 1,8E+08 214,0 131,3 92,58 153,3 57,93 28,79 20,49 18,45 14,58 14,08 350 4,1E+08 153,2 89,37 42,40 31,88 33,17 18,50 18,12 15,41 11,24 6,52 355 560,6 181,7 74,55 42,89 38,41 19,77 22,70 14,43 11,54 19,19 5,89

Tab. 7.20 – Medium geometric distance error on yaw evaluation.


Yaw (degrees)

0 0,00 0,00 0,00 0,03 0,06 0,08 0,07 0,04 0,04 0,03 0,03

1>value

5 24,40 6,92 4,24 2,83 1,89 1,72 1,11 0,96 0,76 0,61 0,49

3>value≥1

10 181,2 10,61 3,88 2,88 2,55 1,86 1,25 1,11 0,94 0,79 0,61

6>value≥3

20 758,6 10,11 7,27 4,09 3,22 1,98 2,04 1,40 1,27 1,06 0,85

10>value≥6

40 796,1 27,91 18,22 6,54 4,68 3,98 2,63 2,27 1,81 1,66 1,37

15>value≥10

70 89,66 21,29 11,92 5,38 3,56 3,04 2,24 1,93 1,55 1,25

20>value≥15

80 107,9 475,9 7,49 4,46 3,06 2,75 1,98 1,65 1,39 1,33

30>value≥20

90 505,4 17,21 6,35 4,14 3,46 2,22 1,56 1,16 1,04 0,96

40>value≥30

100 2,4E+03 27,71 9,45 7,31 4,91 2,82 2,32 1,88 1,81 1,45

60>value≥40

110 202,0 64,33 9,74 10,36 4,71 3,40 2,60 2,10 1,80 1,58

100>value≥60

140 381,9 325,3 81,06 23,76 11,00 9,23 5,30 3,15 2,86 2,86 2,14

200>value≥100

180 268,9 15,72 9,22 5,48 3,81 2,85 2,21 1,88 1,64 1,44 1,30

500>value≥200

220

23,81 11,56 10,44 7,21 4,92 3,62 2,96 2,84 2,00

1000>value≥500

250 115,5 16,41 13,44 7,97 5,26 3,56 3,45 2,38 2,26 1,85

value≥1000

260 47,64 22,55 10,58 5,29 3,64 3,05 2,24 2,08 1,79 1,44 270 22,85 10,77 6,28 3,84 2,77 2,06 1,51 1,07 1,01 0,89 280 203,7 19,34 6,09 4,65 3,69 2,80 2,01 1,75 1,54 1,20 290 104,3 17,53 10,52 5,18 4,16 2,85 2,34 1,88 1,65 1,24 320 1,5E+03 18,60 11,16 7,26 5,61 4,27 2,51 2,37 1,72 1,66 1,25 340 104,3 12,78 6,06 5,63 3,21 2,23 1,78 1,42 1,29 1,07 0,85 350 95,31 7,31 4,17 2,43 2,03 1,83 1,27 1,10 0,90 0,75 0,59 355 19,33 6,26 3,56 2,35 1,95 1,17 1,08 0,88 0,79 0,61 0,48

- 114 -

Tab. 7.21 – Maximum yaw error on yaw evaluation.

Maximum Yaw Error (section 5.3.2) – in degrees x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Yaw (degrees)

0 0,00 0,00 0,00 0,00 0,01 0,01 0,01 0,01 0,01 0,01 0,01

0.1>value≥0

5 11,02 1,81 1,46 1,01 0,86 1,02 0,61 0,46 0,35 0,45 0,36

0.3>value≥0.1

10 17,5 2,68 1,90 1,02 1,26 0,79 0,56 0,80 0,62 0,47 0,60

0.5>value≥0.3

20 37,5 2,89 2,00 1,42 1,69 1,29 2,15 0,52 1,00 0,87 0,62

1>value≥0.5

40 168,6 11,46 15,9 4,79 1,92 2,73 1,27 3,88 1,00 1,35 0,89

1.5>value≥1

70 177,0 12,08 5,89 3,07 1,89 2,44 2,12 1,06 0,89 0,54

2>value≥1.5

80 177,8 178,5 6,52 2,22 1,38 1,58 1,34 1,06 0,82 0,59

4>value≥2

90 179,7 133,7 10,63 2,04 1,90 1,24 1,03 0,53 0,65 0,39

7>value≥4

100 179,4 172,3 25,5 18,5 3,19 2,04 1,05 1,57 0,87 0,68

10>value≥7

110 158,6 177,1 5,46 15,5 2,76 2,70 1,56 1,39 0,80 0,82

15>value≥10

140 175,3 177,4 151,1 19,9 7,81 8,57 4,24 1,58 1,07 2,56 1,03

20>value≥15

180 21,0 8,16 8,17 3,02 2,28 1,16 1,06 0,92 0,86 0,65 0,59

50>value≥20

220

16,6 5,51 4,92 8,66 7,83 2,00 2,60 2,70 1,00

100>value≥50

250 175,9 25,4 19,0 8,50 2,36 1,67 5,37 2,00 1,44 1,26

180>value≥100

260 169,2 63,1 6,33 2,75 2,30 1,76 1,29 1,25 0,97 0,79

270 62,5 9,68 2,86 2,00 1,62 1,05 0,95 0,81 0,62 0,40

280 163,3 60,4 4,73 2,29 2,55 2,09 1,13 1,13 0,83 0,71

290 177,3 18,8 4,23 3,19 2,03 2,38 1,76 1,22 0,92 0,65

320 43,0 21,4 9,17 2,06 2,24 1,54 0,95 1,11 1,34 0,89 0,87 340 20,6 4,90 2,54 1,67 1,51 0,94 0,95 0,82 0,71 0,57 0,35 350 21,2 2,89 1,85 1,05 0,67 0,79 0,76 0,73 0,46 0,33 0,59 355 11,43 3,02 1,17 1,02 0,79 0,70 0,41 0,62 0,37 0,34 0,29

Tab. 7.22 – Medium yaw error on yaw evaluation.

Medium Yaw Error (section 5.3.2) – in degrees x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80

Yaw (degrees)

0 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00

0.1>value≥0

5 1,30 0,45 0,28 0,23 0,19 0,14 0,13 0,11 0,07 0,08 0,06

0.3>value≥0.1

10 1,72 0,65 0,27 0,25 0,24 0,17 0,14 0,15 0,11 0,11 0,08

0.5>value≥0.3

20 4,16 0,56 0,41 0,30 0,32 0,20 0,23 0,15 0,18 0,12 0,09

1>value≥0.5

40 7,19 2,49 1,77 0,65 0,47 0,42 0,29 0,34 0,23 0,21 0,15

1.5>value≥1

70 24,6 1,58 1,18 0,57 0,46 0,38 0,28 0,25 0,21 0,11

2>value≥1.5

80 17,0 21,3 0,89 0,55 0,28 0,35 0,27 0,19 0,16 0,13

4>value≥2

90 25,2 2,13 0,62 0,41 0,41 0,20 0,15 0,14 0,10 0,08

7>value≥4

100 49,6 5,02 0,99 0,82 0,60 0,29 0,23 0,20 0,22 0,14

10>value≥7

110 28,6 25,0 1,11 0,79 0,60 0,48 0,30 0,24 0,19 0,15

15>value≥10

140 15,88 17,4 12,50 2,68 0,61 0,90 0,52 0,29 0,24 0,30 0,21

20>value≥15

180 2,78 1,14 0,84 0,55 0,37 0,30 0,22 0,21 0,19 0,16 0,14

50>value≥20

220

2,25 0,84 0,83 0,73 0,46 0,40 0,38 0,39 0,22

100>value≥50

250 13,71 1,91 1,61 0,92 0,59 0,44 0,50 0,38 0,38 0,22

180>value≥100

260 4,32 2,56 1,00 0,76 0,49 0,42 0,32 0,28 0,22 0,18

270 2,83 1,38 0,82 0,61 0,57 0,38 0,29 0,19 0,12 0,08

280 13,59 2,39 0,71 0,51 0,45 0,37 0,26 0,23 0,21 0,14

290 12,40 1,90 0,81 0,65 0,44 0,33 0,34 0,24 0,18 0,17

320 6,92 1,24 0,87 0,50 0,51 0,32 0,23 0,22 0,16 0,18 0,14 340 2,21 0,68 0,34 0,30 0,22 0,17 0,16 0,15 0,13 0,11 0,08 350 1,44 0,57 0,29 0,17 0,18 0,20 0,13 0,13 0,09 0,08 0,06 355 0,97 0,47 0,19 0,21 0,15 0,12 0,10 0,11 0,08 0,07 0,06

- 115 -

Tab. 7.23 – Maximum error on yaw evaluation.



Yaw (degrees)

0 0,00 0,00 0,00 0,03 0,86 0,78 0,69 0,59 1,26 0,46 0,89

1>value

5 441,4 94,80 60,31 46,49 32,56 51,24 31,31 21,17 25,83 20,13 20,34

3>value≥1

10 374,3 134,4 64,59 43,11 47,34 45,62 50,51 43,29 42,95 30,42 25,85

6>value≥3

20 1,1E+03 131,9 186,4 91,12 63,87 45,56 77,83 64,64 52,95 45,40 31,94

10>value≥6

40 949,6 587,8 479,0 141,6 145,6 191,4 75,24 99,00 56,49 82,14 68,14

15>value≥10

70 1,8E+03 295,7 236,8 144,5 99,47 91,89 73,57 74,85 64,00 45,28

20>value≥15

80 3,2E+04 473,6 106,3 125,7 58,68 66,67 69,91 57,92 41,76 35,42

30>value≥20

90 1,7E+03 281,3 160,8 83,10 62,29 47,56 43,80 38,11 49,47 30,54

40>value≥30

100 1,7E+03 204,5 279,4 273,9 264,6 90,25 67,12 73,79 55,72 39,41

60>value≥40

110 1,2E+04 2,9E+05 224,5 527,8 109,4 145,1 64,47 55,27 47,89 53,66

100>value≥60

140 4,6E+03 1,9E+04 5,0E+04 577,4 468,7 513,6 389,9 119,2 94,17 246,6 58,62

200>value≥100

180 717,3 214,7 136,4 131,2 113,2 107,6 98,47 71,89 56,91 56,14 52,66

500>value≥200

220 703,1 402,9 408,1 213,6 276,6 392,4 138,6 202,5 103,0 84,88

1000>value≥500

250

397,6 265,0 144,6 154,2 95,09 190,2 82,55 79,97 77,35

value≥1000

260 329,0 305,0 185,9 159,3 99,11 101,2 70,39 59,38 56,56 53,96 270 304,0 212,8 92,76 63,86 45,43 40,71 39,42 41,67 32,16 22,93 280 665,5 234,8 140,3 113,7 90,92 61,26 57,01 54,92 49,18 42,12 290 591,2 251,3 264,9 137,2 93,02 81,69 61,13 76,04 75,61 41,20 320 1,1E+04 311,2 233,0 217,2 138,5 106,5 123,1 98,24 71,24 80,06 78,36 340 252,8 137,0 131,4 76,09 197,5 115,2 60,21 39,24 41,23 40,78 40,01 350 561,1 125,7 70,02 43,99 59,38 52,00 38,82 37,45 26,94 35,38 24,46 355 193,7 60,33 71,80 35,77 50,10 32,02 33,87 24,32 20,97 31,07 17,87

Tab. 7.24 – Medium error on yaw evaluation.



Yaw (degrees)

0 0,00 0,00 0,00 0,01 0,07 0,31 0,19 0,21 0,25 0,15 0,31

1>value

5 30,81 15,62 11,50 11,10 8,55 8,12 7,62 6,16 6,02 5,12 4,34

3>value≥1

10 40,31 23,14 12,73 10,66 12,63 9,45 7,71 7,47 7,77 7,21 5,58

6>value≥3

20 87,31 26,96 26,93 18,10 16,20 11,62 13,50 11,48 9,79 9,41 7,57

10>value≥6

40 358,2 79,44 59,56 27,74 34,61 24,91 20,69 18,55 14,59 14,28 14,07

15>value≥10

70 112,6 61,32 49,61 30,74 22,11 20,27 15,83 15,56 11,70 9,45

20>value≥15

80 220,1 74,15 24,78 26,85 18,07 15,48 13,21 12,03 11,02 10,15

30>value≥20

90 97,03 30,42 22,04 13,67 14,21 11,63 10,19 7,38 6,65 6,91

40>value≥30

100 270,5 55,87 35,17 24,89 27,80 16,57 14,54 15,90 13,69 11,42

60>value≥40

110 909,5 1,1E+03 56,57 53,75 22,07 21,34 16,67 14,67 13,34 13,01

100>value≥60

140 791,9 733,8 206,5 144,0 69,64 69,46 42,03 23,52 22,62 26,63 16,98

200>value≥100

180 59,87 28,08 27,22 24,92 25,72 20,90 16,82 14,43 12,78 10,87 9,20

500>value≥200

220

65,30 65,01 51,84 46,35 50,58 30,93 27,39 28,40 16,13

1000>value≥500

250 202,1 57,83 61,01 32,97 31,67 21,58 24,49 17,39 17,77 12,23

value≥1000

260 67,21 58,65 43,63 27,71 19,17 20,20 16,05 14,19 13,96 11,61 270 37,02 26,41 17,39 13,18 10,68 10,25 8,16 5,36 5,44 4,60 280 116,2 44,53 26,82 26,78 18,28 15,31 12,00 11,15 11,94 8,73 290 147,9 47,40 39,08 21,11 21,80 16,38 15,25 16,21 13,05 9,96 320 171,5 52,13 56,46 43,54 34,52 28,40 18,17 20,59 16,36 16,24 11,55 340 66,49 29,84 25,16 17,89 18,03 14,62 12,65 10,91 10,55 8,67 7,98 350 43,79 19,40 13,69 10,36 11,51 9,94 8,69 8,35 8,04 5,98 4,82 355 22,46 14,17 10,48 9,14 8,56 6,78 7,14 6,35 5,81 5,14 4,20

- 116 -

Appendix E

Tab. 7.25 – Minimum number of inliers on scale evaluation.


Scale factor (percentage)

50 4 5 4

value≥250

60

5 8 8 10 14 15 17 27

250>value≥200

70

4 6 8 9 11 16 17 20 25

200>value≥170

75 4 4 5 6 9 9 9 12 13 20

170>value≥140

80 5 6 7 9 12 13 14 14 20 21

140>value≥120

85 7 11 13 16 16 18 28 32 37 43

120>value≥100

90 4 15 17 27 28 38 43 50 60 56 83

100>value≥80

95 10 21 28 36 55 63 67 70 86 93 116

80>value≥60

100 57 103 124 157 176 201 220 251 278 302 350

60>value≥40

105 19 34 33 50 55 70 70 87 91 96 113

40>value≥30

110 13 20 33 33 46 47 61 65 72 73 100

30>value≥20

115 7 18 15 26 29 33 40 43 51 52 58

20>value≥10

120 8 11 14 15 18 22 24 27 31 29 42

10>value≥7

125 5 8 8 11 11 16 19 18 19 17 24

7>value

130 6 8 9 8 12 11 15 18 19 18 20 140 5 7 8 10 11 14 13 14 15 19 20 150 6 9 11 12 15 17 15 20 20 21 22 160 6 7 8 12 13 15 18 18 19 19 19 170 6 7 9 7 9 12 11 13 13 15 14 180 5 7 4 5 5 4 6 7 9 9 5

190 4 4 5 5 5 5 4 6 4 5

200 4 4 4 4 4 4

Tab. 7.26 – Medium number of inliers on scale evaluation.



50 7,24 9,20 11,40

value≥250

60

10,61 14,15 15,34 19,49 26,60 28,55 35,15 47,23

250>value≥200

70

9,05 11,55 14,63 16,78 17,92 24,92 29,51 34,14 44,68

200>value≥170

75 6,21 9,50 7,90 10,66 13,75 13,89 17,16 19,65 22,72 30,84

170>value≥140

80 6,21 8,29 10,84 12,68 16,40 18,89 23,52 27,10 32,71 42,00

140>value≥120

85 12,92 17,60 18,81 24,01 29,19 35,50 45,98 49,02 55,67 74,63

120>value≥100

90 8,51 25,51 29,84 44,58 54,13 63,03 73,94 86,65 103,5 114,6 139,2

100>value≥80

95 18,15 44,38 61,23 73,94 90,14 105,9 120,6 139,7 165,9 182,0 215,9

80>value≥60

100 81,40 155,8 195,2 233,2 269,0 302,4 334,8 370,2 411,8 445,4 509,2

60>value≥40

105 30,62 54,40 63,47 84,68 106,5 119,2 127,7 151,6 170,0 188,7 219,2

40>value≥30

110 23,73 42,89 53,00 56,19 77,93 81,99 102,4 112,2 124,4 133,1 165,4

30>value≥20

115 16,93 28,57 31,89 40,80 44,43 54,85 64,15 73,51 82,50 89,11 107,8

20>value≥10

120 13,31 17,73 20,75 28,73 28,01 32,95 42,32 46,16 47,57 59,02 66,06

10>value≥7

125 8,65 13,07 14,83 17,17 19,20 23,02 29,69 30,35 32,66 34,13 45,18

7>value

130 8,95 10,23 13,12 15,05 16,56 20,13 22,15 22,31 27,63 29,00 35,43 140 7,89 12,43 13,77 17,10 16,15 22,21 22,80 26,01 27,51 32,45 36,38 150 11,17 15,09 17,97 20,47 23,24 25,67 28,67 31,76 33,32 35,60 40,62 160 11,16 15,30 18,21 20,96 23,11 26,26 28,28 31,46 33,58 36,34 37,13 170 11,65 14,12 18,49 17,50 21,02 22,30 25,82 26,95 25,39 30,33 30,24 180 7,82 10,67 9,11 11,06 10,68 12,23 13,69 15,34 15,12 17,04 13,92

190 5,70 6,97 6,98 7,62 8,04 7,41 9,31 8,30 8,24 7,71

200 5,63 5,60 5,57 5,59 5,58 5,60

- 117 -

Tab. 7.27 – Maximum geometric distance error on scale evaluation.

Maximum Geometric Distance Error (section 5.3.1) – in pixels x100 Ransac Overlap Percentage x5 Images 20 30 35 40 45 50 55 60 65 70 80


50 6,1E+07 88,88 23,69

1>value

60 3,8E+06 162,7 82,94 31,54 16,81 16,83 10,89 6,78

3>value≥1

70 1,6E+07 282,0 77,87 110,9 47,42 26,04 26,16 10,28 7,83

6>value≥3

75 6,9E+07 5,7E+06 2,8E+07 730,4 123,0 63,96 44,82 23,12 16,15 21,31

10>value≥6

80 5,0E+07 7,2E+08 3,6E+07 134,9 98,41 56,27 36,05 32,36 22,08 14,26

15>value≥10

85 1,1E+08 403,4 118,4 77,74 58,61 38,97 16,96 15,10 13,03 12,44

20>value≥15

90 3,7E+07 170,2 115,7 57,43 44,72 26,67 21,61 17,92 9,20 10,24 6,29

30>value≥20

95 1,8E+08 155,9 65,64 36,18 36,79 21,85 15,37 10,98 11,49 8,89 5,60

40>value≥30

100 0,00 0,00 0,00 0,45 0,72 0,68 0,78 0,36 0,28 0,40 0,20

60>value≥40

105 346,2 81,64 54,58 65,78 26,99 27,02 17,47 14,49 9,99 10,83 7,13

100>value≥60

110 496,1 92,87 56,40 59,54 30,73 30,10 23,01 25,96 17,92 13,16 6,72

200>value≥100

115 3,9E+08 100,8 105,5 60,41 39,80 34,13 28,19 27,28 17,64 14,83 15,29

500>value≥200

120 1,5E+03 164,6 421,3 48,88 83,17 56,62 37,26 35,98 44,75 19,52 20,91

1000>value≥500

125 2,2E+09 2,1E+03 355,2 105,4 77,83 56,18 46,64 82,86 45,04 31,74 22,45

value≥1000

130 1,1E+10 490,6 910,7 1,6E+03 114,7 108,1 81,71 83,11 43,73 49,68 36,81 140 6,5E+08 330,1 481,5 1,2E+07 116,0 131,1 61,07 88,04 46,34 32,29 53,47 150 2,4E+03 295,0 183,6 174,8 96,71 100,1 84,76 69,61 50,96 44,21 28,10 160 416,6 535,3 164,2 270,3 191,1 96,26 100,8 78,29 59,20 54,73 40,33 170 7,0E+07 274,2 201,1 223,0 198,8 126,4 75,49 67,81 117,1 50,27 59,05 180 1,2E+06 586,2 8,5E+05 1,5E+08 2,6E+03 2,2E+07 395,4 160,0 268,5 134,9 500,0 190 1,3E+08 1,9E+07 3,6E+08 3,6E+07 570,9 6,5E+06 1,4E+03 3,0E+07 1,7E+08 1,7E+08 200 1,9E+08 1,9E+08 7,9E+09 1,9E+08 1,9E+08 650,6

Tab. 7.28 – Medium geometric distance error on scale evaluation.



50 7,54 3,08 1,32

1>value

60

34,98 5,04 3,54 1,82 1,25 0,95 0,83 0,64

3>value≥1

70

128,5 11,87 4,90 3,65 2,39 1,76 1,30 0,92 0,66

6>value≥3

75 144,9 33,94 32,62 8,34 3,76 2,92 1,95 1,74 1,26 1,03

10>value≥6

80 181,3 213,6 28,35 6,42 3,28 2,38 1,86 1,50 1,16 0,96

15>value≥10

85 192,7 10,17 6,16 4,13 2,73 1,88 1,27 1,15 0,96 0,72

20>value≥15

90 128,7 9,74 5,85 2,88 2,23 1,73 1,30 0,99 0,74 0,63 0,53

30>value≥20

95 191,1 6,50 3,49 2,12 1,62 1,24 1,00 0,77 0,55 0,49 0,40

40>value≥30

100 0,00 0,00 0,00 0,03 0,06 0,08 0,07 0,04 0,04 0,03 0,03

60>value≥40

105 14,71 4,44 3,66 2,48 1,45 1,38 1,08 0,82 0,67 0,65 0,50

100>value≥60

110 18,78 6,34 3,59 3,16 1,97 1,67 1,25 1,03 0,86 0,77 0,57

200>value≥100

115 94,52 5,94 5,12 4,10 2,84 2,40 1,63 1,46 1,28 1,07 0,91

500>value≥200

120 27,38 7,78 6,25 3,73 3,47 2,62 2,19 1,92 1,52 1,33 1,13

1000>value≥500

125 226,3 15,63 10,03 7,41 4,35 3,12 2,59 2,78 2,32 1,99 1,45

value≥1000

130 406,7 15,54 12,39 10,31 5,80 5,01 3,82 3,64 2,93 2,25 1,79 140 86,89 12,05 11,69 10,29 5,32 5,64 3,47 3,47 2,82 2,07 1,89 150 35,81 10,87 8,79 4,92 4,67 4,69 3,34 2,75 2,95 2,28 2,11 160 24,87 13,85 9,41 7,58 6,76 4,80 4,39 3,65 3,29 2,83 2,48 170 38,19 12,56 8,02 8,53 5,74 6,57 3,90 3,72 4,70 3,25 2,82 180 36,67 15,69 31,83 372,5 28,67 28,95 8,60 6,36 5,11 5,52 8,88

190 157,1 28,33 111,2 271,4 25,45 124,8 45,52 50,48 156,8 89,71

200 106,2 89,25 305,5 104,0 83,39 34,10

- 118 -

Tab. 7.29 – Maximum error on scale evaluation.




50 1,6E+03 187,8 22,02

1>value

60

6,9E+03 75,18 59,57 25,80 13,44 12,19 8,46 5,88

3>value≥1

70

9,3E+03 97,45 81,40 58,49 35,10 35,89 39,44 5,89 10,23

6>value≥3

75 1,1E+05 3,1E+06 2,1E+03 300,0 314,9 34,79 17,33 9,91 8,96 6,38

10>value≥6

80 9,0E+03 680,0 470,8 241,0 40,86 30,65 16,23 15,33 10,80 10,63

15>value≥10

85 629,2 98,54 127,2 45,26 13,93 10,88 5,93 6,18 5,22 4,87

20>value≥15

90 6,3E+06 160,7 73,36 22,78 10,01 9,50 7,44 5,29 3,15 2,84 2,80

30>value≥20

95 799,6 46,91 27,32 8,19 6,28 10,06 3,98 2,93 2,35 2,36 2,29

40>value≥30

100 0,00 0,00 0,00 0,11 0,19 0,18 0,21 0,09 0,08 0,09 0,05

60>value≥40

105 80,31 23,39 21,68 12,38 4,76 4,37 3,47 3,43 2,50 2,12 2,64

100>value≥60

110 152,3 30,37 13,48 10,24 6,83 4,92 4,49 3,58 2,97 2,68 2,07

200>value≥100

115 2,9E+03 30,33 19,93 14,26 10,79 8,49 6,18 3,68 3,87 2,87 3,53

500>value≥200

120 105,7 40,11 34,69 11,93 30,84 7,67 5,46 5,90 6,97 3,37 2,85

1000>value≥500

125 1,8E+04 89,46 124,2 26,89 16,10 10,31 14,23 11,38 5,86 5,29 7,19

value≥1000

130 555,2 76,63 126,1 84,63 30,84 31,21 9,69 18,12 7,98 6,38 5,76 140 1,1E+03 67,62 29,41 248,0 15,16 16,46 15,98 6,16 10,18 4,74 8,48 150 113,7 29,66 14,99 14,48 13,56 6,74 12,75 8,00 4,70 5,47 4,84 160 123,0 50,51 40,52 19,72 19,85 11,04 7,55 9,89 5,72 6,57 3,61 170 262,5 20,21 30,42 27,50 23,47 19,58 6,89 4,46 8,97 4,15 5,25 180 1,6E+04 44,00 3,7E+04 1,1E+03 63,62 693,9 23,38 14,61 9,38 21,59 99,25

190 1,2E+04 1,8E+04 3,3E+04 1,4E+04 206,8 1,6E+05 490,7 412,8 8,6E+03 1,1E+04

200 2,5E+03 207,0 2,4E+03 212,3 214,9 26,52

Tab. 7.30 – Medium error on scale evaluation.




50 31,46 11,67 3,92

1>value

60

98,35 11,09 8,61 4,18 2,73 2,51 1,78 1,37

3>value≥1

70

800,0 17,31 9,94 6,42 3,81 3,45 2,52 1,83 1,29

6>value≥3

75 2,7E+03 7,4E+03 55,68 17,13 5,57 3,92 2,96 2,97 2,12 2,08

10>value≥6

80 802,8 59,53 23,95 12,96 4,72 3,80 2,74 2,12 1,44 1,50

15>value≥10

85 37,51 16,11 9,17 5,18 2,73 2,11 1,29 1,42 1,24 0,99

20>value≥15

90 1,9E+05 11,90 6,62 2,81 2,29 2,17 1,26 0,86 0,83 0,66 0,58

30>value≥20

95 43,05 6,56 3,71 2,05 1,51 1,27 0,90 0,69 0,45 0,43 0,43

40>value≥30

100 0,00 0,00 0,00 0,02 0,05 0,06 0,06 0,03 0,03 0,02 0,02

60>value≥40

105 11,38 3,39 2,95 2,01 1,14 1,14 0,83 0,71 0,51 0,51 0,40

100>value≥60

110 18,59 4,89 2,91 2,18 1,50 1,23 0,93 0,73 0,64 0,51 0,39

200>value≥100

115 49,05 3,87 3,46 2,94 2,05 1,77 1,18 0,97 0,81 0,71 0,60

500>value≥200

120 17,27 5,75 3,87 2,50 2,69 1,52 1,31 1,39 0,88 0,80 0,67

1000>value≥500

125 310,7 9,24 7,36 3,28 2,55 1,44 1,46 1,46 1,16 1,33 0,82

value≥1000

130 68,23 8,65 8,26 7,95 3,13 2,71 1,94 2,24 1,69 1,25 1,10 140 63,87 4,79 5,12 3,08 2,71 2,49 1,86 1,33 1,51 1,19 1,00 150 9,06 5,12 3,13 2,07 2,22 1,57 1,43 1,28 0,96 1,13 0,79 160 14,59 6,37 4,59 2,73 2,41 1,69 1,44 1,15 1,00 0,94 0,73 170 6,25 4,52 2,24 2,98 1,43 2,25 1,26 1,12 1,68 1,02 0,93 180 148,2 4,96 298,7 90,16 6,64 10,24 2,34 1,76 1,25 1,46 4,88

190 77,47 82,62 138,2 735,1 18,82 9,1E+03 100,9 10,76 208,8 165,7

200 22,74 18,03 30,92 15,36 15,29 9,06

- 119 -

Tab. 7.31 – Maximum error on scale evaluation.




50 325,8 238,8 114,8

1>value

60

797,3 66,96 84,16 55,39 43,51 37,47 24,71 22,29

3>value≥1

70

1,0E+03 185,4 74,97 113,8 82,47 52,27 49,58 28,07 20,38

6>value≥3

75 9,8E+03 2,7E+05 399,7 245,0 65,14 55,97 47,52 48,43 40,39 29,24

10>value≥6

80 1,4E+03 157,7 152,1 108,8 78,45 50,31 42,60 32,71 34,08 41,06

15>value≥10

85 183,4 70,70 46,53 35,61 43,07 41,94 18,70 23,90 24,50 21,94

20>value≥15

90 1,2E+06 66,52 68,03 30,46 24,32 18,56 17,78 18,48 24,64 13,14 9,84

30>value≥20

95 178,5 39,05 24,39 17,33 15,74 14,02 14,39 10,47 10,81 10,68 8,79

40>value≥30

100 0,00 0,00 0,00 0,01 0,47 0,39 0,34 0,30 0,29 0,32 0,44

60>value≥40

105 47,05 25,32 29,77 28,63 15,20 15,05 17,27 12,61 12,94 14,19 9,60

100>value≥60

110 79,00 36,73 37,28 28,00 20,22 14,47 17,33 16,90 13,76 13,30 11,23

200>value≥100

115 1,1E+03 46,35 46,17 38,01 27,45 24,49 19,75 18,33 18,12 19,64 13,43

500>value≥200

120 87,35 57,80 127,7 42,35 34,68 39,37 26,06 24,25 21,87 17,39 17,28

1000>value≥500

125 1,3E+03 249,7 120,5 49,66 71,17 34,92 38,67 41,36 24,77 30,55 21,07

value≥1000

130 207,4 84,50 139,6 147,2 53,06 65,96 37,63 55,08 32,07 26,37 28,80 140 302,3 117,2 132,7 272,0 105,4 44,70 40,65 65,40 43,13 23,55 39,86 150 156,5 87,82 45,11 58,36 32,32 51,37 49,81 19,32 32,75 35,38 32,49 160 91,26 62,53 51,70 51,31 57,80 49,91 40,70 45,61 52,60 48,01 47,18 170 86,26 63,63 68,60 70,16 71,52 66,71 27,85 28,51 46,73 40,86 51,30 180 1,5E+03 114,0 5,5E+03 162,8 184,2 182,8 90,19 54,05 112,5 45,19 284,2 190 3,0E+03 2,4E+03 966,4 1,3E+03 258,8 1,2E+04 459,6 409,8 909,8 653,1 200 1,4E+03 332,4 756,8 332,8 332,9 139,5

Tab. 7.32 – Medium error on scale evaluation.




50 41,13 31,24 18,18

1>value

60

38,21 14,32 17,49 11,28 7,99 6,55 5,85 5,96

3>value≥1

70

74,95 31,83 16,28 18,05 12,44 10,48 8,93 6,01 4,57

6>value≥3

75 188,9 747,9 47,92 25,58 16,11 10,89 11,44 10,90 8,09 7,50

10>value≥6

80 113,5 39,33 38,17 20,40 12,49 9,72 9,75 7,94 6,94 5,39

15>value≥10

85 22,62 15,08 12,60 8,51 6,75 7,53 5,51 5,09 5,33 3,78

20>value≥15

90 3,2E+04 11,47 10,00 5,92 4,94 4,44 5,09 3,38 3,26 3,21 2,50

30>value≥20

95 15,85 7,26 5,72 4,06 3,67 3,15 3,13 2,87 2,40 2,30 1,92

40>value≥30

100 0,00 0,00 0,00 0,00 0,03 0,16 0,09 0,11 0,12 0,08 0,15

60>value≥40

105 9,68 5,68 5,58 4,38 3,89 3,16 3,23 2,66 2,95 2,53 2,34

100>value≥60

110 13,92 7,62 6,27 5,83 3,89 3,97 3,61 3,31 2,96 2,90 2,21

200>value≥100

115 27,02 9,32 7,68 6,11 5,98 6,39 4,72 3,81 4,02 3,82 3,10

500>value≥200

120 18,58 10,38 11,06 8,02 6,43 6,29 4,58 5,85 4,30 3,59 3,64

1000>value≥500

125 72,72 23,37 13,97 10,96 10,89 8,10 7,00 7,52 5,48 5,74 5,22

value≥1000

130 33,17 18,07 21,15 19,21 10,79 10,75 8,20 9,73 7,44 6,46 4,96 140 59,75 16,16 17,23 15,34 13,33 10,66 11,11 9,49 8,21 4,87 6,32 150 20,11 11,91 10,64 8,24 7,20 9,25 9,09 5,53 7,12 5,99 7,72 160 18,50 10,16 10,60 8,63 9,78 8,95 8,65 8,11 7,41 6,41 6,73 170 20,64 12,05 10,72 11,95 8,83 12,00 6,01 9,22 7,87 9,40 8,32 180 39,16 13,01 55,51 37,00 26,22 20,02 17,30 14,44 12,67 9,55 31,65 190 64,72 45,57 56,03 117,3 46,47 793,4 104,8 36,71 78,90 55,16 200 42,49 40,52 42,60 44,65 46,09 41,11

Building and Evaluation of a Mosaic of Images using Aerial ......SIFT. Ransac, em conjunto com o algoritmo DLT, é usado para calcular as transformações projetivas entre duas imagens

Documents