
ALMA MATER STUDIORUM · UNIVERSITÀ DI BOLOGNA

FACOLTÀ DI INGEGNERIA
Corso di Laurea Specialistica in Ingegneria Informatica

Thesis in Metodi Numerici per la Grafica

Un sistema di stereovisione attiva per la ricostruzione di forme 3D tramite superfici di suddivisione

An active stereovision system for 3D shape reconstruction using subdivision surfaces

Supervisor: Prof.ssa Morigi Serena

Co-supervisors: Prof. Casciola Giulio, Prof. Liverani Alfredo

Candidate: Rucci Marco

Session I, Academic Year 2007/2008

Page 2: An active stereovision system for 3D shape reconstruction using

Table of Contents

1 Introduction – English
2 Introduction – Italiano
3 Technologies Overview
   3.1 Reverse Engineering Applications
   3.2 3D Scanners
   3.3 Reconstruction
   3.4 Human-Computer Interaction
   3.5 Our Proposal
4 Stereoscopic Vision for 3D Tracking
   4.1 Introducing the Wii Remote Controller
   4.2 Camera Geometry
   4.3 Epipolar Geometry
5 From Points to Polyline Meshes
   5.1 Polyline Mesh Features
   5.2 Polyline Mesh Components Relationships
   5.3 Polyline Mesh Memory Layout
   5.4 Building the Polyline Mesh
   5.5 Search Strategies in the Polyline Mesh
6 From Polyline Mesh to Smooth Surfaces
   6.1 From n-sided Polygons to Tri-Quads
   6.2 Refining the Tri-Quad Mesh
   6.3 Smooth Surface Reconstruction Using Subdivision
7 Experimental Results
8 Conclusions
9 References


Figure 1: Traditional reverse engineering pipeline steps, with example images representing the digitalization and reconstruction of a historical artifact. Images courtesy of ENEA, Bologna and the Department of Mathematics, University of Bologna.

1 Introduction – English

In the field of Computer Aided Design (CAD), reverse engineering has become an effective method to create a 3D virtual model of a physical object for later use in software for computer-aided design and manufacturing. Reverse engineering has many applications in different fields, such as medical imaging, entertainment, cultural heritage, web commerce, collaborative design and, obviously, engineering; all of these applications can benefit in different ways from the reconstructed 3D virtual model.

The reverse engineering process usually involves two separate steps (see Figure 1):

• Measuring an existing object
• Reconstructing it as a 3D model

The physical object can be measured using 3D scanning technologies such as coordinate measuring machines or computed tomography scanners, and the output of the measuring process is usually in the form of a point cloud, i.e. a large set of vertices in a three-dimensional coordinate system, which lacks topological information and therefore is generally not directly usable in most 3D applications.

The point cloud is then usually converted to a mesh model, NURBS (Non-Uniform Rational B-Spline) surface model, or CAD model through a process commonly referred to as 3D reconstruction, so that it can be used for various purposes. This second step, reconstructing the virtual 3D object from the dense point cloud, is an inverse problem and generally does not admit a unique solution. Most proposed approaches to reconstruction from unstructured data points build polygon meshes that interpolate or approximate the input points. The fundamental difficulties of reconstruction arise from the lack of topological information in the data, but also from the noise and inaccuracies of the measuring process and the presence of obstructions and holes; consequently, additional assumptions and requirements on the input data are generally needed to make the problem tractable.

As a result, most reconstructed models need to be post-processed for simplification and optimization, introducing another step in the reverse engineering process.

The steps of measuring and reconstruction can be achieved using different techniques and devices, all of which have strengths and weaknesses. Regarding the different 3D scanning methods, we can outline the following quality measures:

• Accuracy and resolution
• Environmental sensitivity
• Repeatability
• Speed
• Cost

while for the reconstruction process the following properties are usually needed:

• Automaticity
• No restriction on topology
• Time and space efficiency
• Robustness (with respect to noisy data)

What is shared among all current reverse engineering solutions is the strict separation of the two fundamental steps of measuring and reconstruction, which makes this a linear, non-iterative and non-interactive process.

This dissertation will present a new method of reverse engineering for fast, simple and interactive acquisition and reconstruction of a virtual 3D model representing an existing physical object, which exploits an active stereo acquisition system supported by a reconstruction and visualization layer based on subdivision surfaces.

The presented method, summarized in Figure 2¹, overcomes the lack of user interaction and feedback typical of existing solutions, due to the strict separation of the two sequential main steps of measuring and reconstruction, by integrating them into an iterative and incremental process that gives the user real-time visual feedback on the ongoing work. This gives the user the ability not only to intervene in case of problems, inaccuracies or errors, but also to focus attention and measuring/reconstruction effort on the features of the more complex parts of the object.

The measuring step is achieved through an active stereo system made of two infrared cameras (available in the Wii Remote controller) and one infrared light emitter mounted on a pen-like device. The user controls the IR pen, the 3D position of which is tracked by the stereo rig, and can intuitively draw and refine the style lines of the object, i.e. the lines and curves that principally characterize the object's shape. This set of curves is called the Curve Network, and the process of interactively and incrementally drawing the curve network is called Interactive Surface Sketching. The techniques and devices used in the measuring step are explained in chapter 4.

¹ The "physical" object shown in this image is actually a rendered view of a 3D virtual model. The reason will become clear later, but it is primarily due to problems in the development of the active stereo vision system, which forced us to simulate the process of interactive surface sketching for the creation of the curve network by unrefining an existing 3D virtual model.

Figure 2: Fast and Interactive Reverse Engineering pipeline.

The visualization of the ongoing process is achieved using subdivision surfaces which, due to their intrinsic recursive nature, fit perfectly in our real-time process, providing a scalable, fast, easy-to-implement and high-quality method for the representation and visualization of surfaces. The problem of reconstruction and visualization using subdivision surfaces is the subject of chapter 6.

The whole fast reverse engineering process is supported internally by an ad-hoc designed data structure called the polyline mesh. The design of this data structure has been tailored to provide an efficient and easy-to-use way to store the acquired curve network and to support the development of the reconstruction and visualization algorithms, which need to be executed under strict time requirements to provide real-time visual feedback on the ongoing process. The design and performance evaluation of the Polyline Mesh Data Structure is the subject of chapter 5.

The remarkable presence of the user during the process, and the new way our system proposes to interface with Computer Aided Design software, lead us to the investigation of innovative or alternative techniques for Human-Computer Interaction (HCI), such as virtual reality, modal interfaces and different types of feedback, that should enhance and facilitate the user experience.

The dissertation will begin with a chapter dedicated to an overview of the existing systems, technologies and solutions for measuring and reconstruction of 3D shapes; at the end of that chapter, we will present in more detail our proposed solution for reverse engineering, which we will refer to as the Fast Interactive Reverse Engineering System (F.I.R.E. system).

2 Introduction – Italiano

Re-engineering, or inverse engineering, commonly indicated with the English term reverse engineering, is the process of creating a three-dimensional virtual model starting from an existing physical object. Reverse engineering has many applications in different fields that can benefit from the reconstructed 3D virtual model: medical, industrial, cinematographic, commercial and, in general, engineering.

The traditional reverse engineering process can be divided into two fundamental steps (see Figure 1):

• The measurement of an existing object.
• The reconstruction of that object into a three-dimensional geometric model.

The measurement of the object can be performed using various acquisition and digitization technologies, such as coordinate measuring machines or computed axial tomography, and the output of the measuring process is generally a point cloud, i.e. a set of vertices in a three-dimensional coordinate system, with no associated topological information, and which is therefore not directly usable in software for computer-aided design, manufacturing or engineering (CAD/CAM/CAE). Such a point cloud is therefore normally converted into a geometric model, such as a polygonal mesh or a NURBS surface, through the so-called 3D reconstruction process. The latter is an inverse problem that generally does not admit a unique solution. Many solutions proposed for this problem build polygonal meshes that interpolate or approximate the input points. The main difficulties in reconstruction arise from the lack of topological information in the data, but also from the limited precision of the acquisition instruments, hence from the presence of noise and outliers; it is therefore generally necessary to establish requirements and make assumptions on the input data in order to make the problem tractable.

The different reverse engineering solutions use different techniques and devices, each of which has advantages and disadvantages. Regarding 3D scanning methods, we can identify the following quality measures:

• Accuracy and resolution
• Repeatability
• Speed
• Cost

while for the reconstruction process the following properties are usually required:

• Automaticity
• No restriction on topology
• Time and space efficiency
• Robustness with respect to noise in the data

Current reverse engineering solutions share a strong separation of the two fundamental steps of measurement and reconstruction, making the process linear, non-iterative and non-interactive.

This dissertation will present a reverse engineering proposal for the simple and fast acquisition and reconstruction of a 3D virtual model representing a physical object, exploiting an active stereoscopic acquisition system and a reconstruction based on the use of subdivision surfaces.

The presented method, summarized in Figure 2, overcomes the typical limitations of current reverse engineering systems regarding the lack of user interaction and feedback, due to the marked separation of the two sequential steps of measurement and reconstruction, by integrating them into an incremental and iterative process. Each step provides real-time feedback on the ongoing process, which allows the user not only to intervene in case of problems or inaccuracies in the acquisition, but also to focus attention on the more complex parts of the object.

The measurement is obtained through an active stereovision system consisting of two infrared cameras (available in the Wii Remote Controller) and an infrared light emitter mounted on an ordinary pen. The user uses this IR pen, whose position is tracked by the stereo system, to intuitively draw and refine the style lines of the object, i.e. the lines and curves that principally characterize its shape. This set of curves is called the Curve Network, and the process of incrementally and interactively drawing the curve network is called Interactive Surface Sketching. The techniques and devices used in the measuring step are covered in chapter 4.

The visualization of the ongoing process is obtained by exploiting subdivision surfaces which, given their recursive nature, integrate very well into our real-time process, providing a scalable, fast and simple-to-implement method for the representation and visualization of surfaces. The problem of reconstruction through subdivision surfaces is covered in chapter 6.

The reverse engineering process is supported internally by a data structure (called the Polyline Mesh Data Structure) designed ad hoc to provide an efficient method for storing the acquired curve network and to support the development of the reconstruction algorithms, which must be executed under strict time requirements in order to provide real-time visual feedback on the ongoing process. The analysis, design and performance evaluation of the polyline mesh data structure are covered in chapter 5.

The massive presence and involvement of the user during the reverse engineering process, together with the non-traditional approach with which our system proposes to interface with computer-aided design software, have led us to investigate innovative or alternative techniques for human-machine interaction (HCI) aimed at improving and facilitating the user experience and involvement.

The first chapter of this thesis is dedicated to the presentation of existing systems, technologies and solutions for 3D measurement and reconstruction; our reverse engineering proposal, called Fast Interactive Reverse Engineering (F.I.R.E.), will then be presented in greater detail.


3 Technologies Overview

In this chapter we will introduce in more depth the current solutions involved in the reverse engineering process, such as measuring, reconstruction, human-computer interaction and virtual reality, and we will present our proposed approach.

Before this technology overview, we will look at some possible applications of reverse engineering, to better understand the differing requirements of the various application areas and which applications could benefit from the F.I.R.E. system.

3.1 Reverse Engineering Applications

The applications are wide-ranging and include [4]:

Collaborative design
    While CAD tools can be helpful in designing parts, in some cases the most intuitive design method is physical interaction with the model. Frequently, companies employ sculptors to design these models in a medium such as clay. Once the sculpture is ready, it may be digitized and reconstructed on a computer.

Manufacture
    Many manufacturable parts are currently designed with Computer Aided Design software. However, in some instances, a mechanical part exists and belongs to a working system, but there exists no computer model needed to regenerate the part. If such a part breaks, and neither spare parts nor casting molds exist, then it may be possible to remove the part from the working system and digitize it precisely for re-manufacture.

Medicine
    Applications of reverse engineering in medicine are wide-ranging as well. Prosthetics can be custom designed when the dimensions of the patient are known to high precision. Plastic surgeons can use the shape of an individual's face to model tissue scarring processes and visualize the outcomes of surgery. When performing radiation treatment, a model of the patient's shape can help guide the doctor in directing the radiation accurately.

Dissemination of museum artifacts
    Museum artifacts represent one-of-a-kind objects that attract the interest of scientists and lay-people world-wide. Traditionally, to visualize these objects, it has been necessary to visit potentially distant museums or obtain non-interactive images or video sequences. By digitizing these parts, museum curators can make them available for interactive visualization. See for example "The Digital Michelangelo Project" [5].

Special effects, video games, and virtual worlds
    Synthetic imagery is playing an increasingly prominent role in movies, video games and virtual reality. All of these applications require 3D models that may be taken from real life or from sculptures created by artists.

Web commerce
    As the World Wide Web provides a backbone for interaction over the Internet, commercial vendors are taking advantage of the ability to market products through this medium. By making 3D models of their products available over the Web, vendors can allow the customer to explore their products interactively.

Figure 3: Possible taxonomy of 3D scanners

Figure 4: Different techniques for optical 3D measuring

3.2 3D Scanners

3D scanners can be classified into two main types (see Figure 3): contact and non-contact. Non-contact 3D scanners can be further divided into two main categories, reflective and transmissive. There are a variety of technologies that fall under each of these categories; see for example Figure 4.

Contact

    Contact 3D scanners probe the subject through physical touch. A CMM (coordinate measuring machine) is an example of a contact 3D scanner. It is used mostly in manufacturing and can be very precise. The disadvantage of CMMs and contact 3D scanners is that they require contact with the object being scanned; thus, the act of scanning the object might modify or damage it. The other disadvantage of CMMs is that they are relatively slow compared to the other scanning methods.


Figure 5: A laser projector and a camera can be used to measure a 3D object using triangulation

Non-Contact Reflective
    Reflective scanners detect reflections of some kind of radiation in order to probe an object or environment.

Time-of-flight: light detection and ranging (lidar) scanning
    The time-of-flight 3D laser scanner finds the distance of a surface by timing the round-trip time of a pulse of light. A laser is used to emit a pulse of light, and the amount of time before the reflected light is seen by a detector is timed. The accuracy of a time-of-flight 3D laser scanner depends on how precisely we can measure this time.
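To make the accuracy requirement concrete (a standard relation, not spelled out in the original text), with round-trip time $\Delta t$ and speed of light $c$ the measured distance is

$$d = \frac{c\,\Delta t}{2}$$

so resolving depth to 1 mm requires timing the pulse to within roughly $2 \cdot 10^{-3}\,\mathrm{m}/c \approx 6.7$ ps, which is why timing precision dominates the accuracy of time-of-flight scanners.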

Triangulation
    The triangulation laser scanner shines a laser beam on the subject and uses a camera to look for the location of the projected light (see Figure 5). The same techniques used for classic 3D stereo vision systems can be applied in this case.

Structured light
    Structured-light 3D scanners project a pattern of light on the subject and look at the deformation of the pattern on the subject.

Non-Contact Passive
    Passive scanners do not emit any kind of radiation themselves, but instead rely on detecting reflected ambient radiation. Passive methods can be very cheap, because in most cases they do not need particular hardware.

Stereoscopic
    Stereoscopic systems usually employ two video cameras looking at the same scene. By analyzing the differences between the images seen by each camera, it is possible to determine information about the 3D structure of the scene.
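As a reference point (a standard result for rectified stereo pairs, not stated explicitly in this thesis): if the two cameras have focal length $f$ and are separated by a baseline $b$, a scene point whose horizontal image coordinates differ by the disparity $d = u_l - u_r$ lies at depth

$$z = \frac{f\,b}{d}$$

so depth precision degrades as the disparity shrinks, i.e. for distant points.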

Active stereo
    Active stereo emerges as an alternative approach to the traditional use of two cameras for scene reconstruction. The term active stereo refers to the measurement of visible radiation that is projected into the scene, in contrast to passive techniques that are based on the light already present in the scene.

In an active stereo vision system, a projector or a laser unit projects one or more points or beams of light onto the object and, similarly to the passive case, the images of the points or beams on the two cameras are analysed to infer the 3D structure of the observed object. With respect to the passive case, active stereo vision systems can deal with smooth objects that do not contain edges or corners, which in the passive case are detected using gradients and intensity discontinuities and are necessary for the stereo matching process. Moreover, in active stereo the matching process, i.e. the search for feature correspondences between pairs of images, is completely absent because the correspondence is unambiguous. Both of these features permit very dense depth maps of the scene, which generally produce more accurate reconstructions than is possible using passive stereo, but with the drawback that (current) active techniques tend to be more expensive, slower and more intrusive than their passive counterparts, and can be unfeasible for distant or non-"static" objects, since they need an indoor environment with well-controlled lighting.

3.3 Reconstruction

Point cloud data typically define numerous points on the surface of the object in terms of x, y and z coordinates, possibly together with a color value.

There is usually far too much data in the point cloud collected from the scanner or digitizer, and some of it may be corrupted by noise. Without further processing, the data is not in a form that can be used by applications such as CAD/CAM software or in rapid prototyping. Reconstruction techniques are used to edit the point cloud data, establish the topology of the points in the cloud, and combine it into useful 3D virtual models.

In a traditional 3D scanning system, the shortest part of the reverse engineering process is the first step of measuring; on the other hand, manipulating the data to produce a 3D virtual model of the object can be quite time-consuming and may even require days to complete. In fact, as shown in the example of Figure 6 [6], the reconstruction step generally consists in finding an interpolating or approximating mesh or surface from the point cloud, and also includes work for integrating multiple scans into one single 3D model with the so-called processes of registration or alignment.

In fact, except for the simplest objects, multiple scans must be acquired to cover the whole object's surface. The individual point clouds must be aligned, or registered, into a common coordinate system so that they can be integrated into a single 3D model. Different approaches exist, such as:

• Mechanical tracking: the scanner or the object may be attached to a coordinate measurement machine that tracks its position and orientation.
• Optical tracking: features of the object or markers are used for alignment.
• Turntable: the object is rotated by known angles, so the relative transformations between scans are known.
• Interactive alignment: the user supplies three or more matching points, which are used to compute a rigid transformation that aligns the points (see the sketch below).
• Automatic alignment: automatic feature matching for computing alignments is an active area of research.

Figure 6: Typical reverse engineering pipeline for rapid prototyping

Figure 7: Reconstruction from parallel and non-parallel object contours
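For interactive alignment, the rigid transformation can be recovered from the matched point pairs in closed form. The sketch below uses the standard SVD-based (Kabsch) solution; the thesis does not prescribe a specific algorithm, so this is an illustration rather than the method any of the cited systems actually uses.

```python
# Minimal sketch: rigid transformation (R, t) aligning matched 3D points,
# via the standard SVD/Kabsch solution. Illustrative only.
import numpy as np

def rigid_align(src: np.ndarray, dst: np.ndarray):
    """Return R (3x3), t (3,) minimizing sum ||R @ src_i + t - dst_i||^2.
    src, dst: (N, 3) arrays of matched points, N >= 3, not all collinear."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)   # centroids
    H = (src - src_c).T @ (dst - dst_c)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the least-squares solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Applying `pts @ R.T + t` then maps a new scan into the common coordinate system defined by the previous ones.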

Other techniques for 3D reconstruction are necessary when the output of the scanning system is not in the form of an unstructured point cloud. For example, Figure 7 shows some results of a method presented in [7], where the input data may lie on non-parallel cross-sections, which are typical of medical scanning systems.

Sometimes the input for the reconstruction doesn't come from a digitized object, but directly from a carefully user-designed curve network or polyline network. The reconstruction of a detailed mesh or a smooth surface from this type of input, mostly referred to as lofting or skinning, is commonly available in CAD software and can produce high-quality results.

Figure 9 shows some steps of the method of reconstruction from a curve network proposed in [8], in which the curve network, consisting of cubic B-splines, is inferred from curve control points connected by line segments in the polyline network through a curve network subdivision scheme. Additional points are introduced in the interior of patches using a connectivity construction algorithm and fairing (smoothing); finally, a smooth surface is computed using a modification of the Catmull-Clark subdivision scheme.

Figure 8: Skinning of cross-sectional curves

Figure 9: Lofting starting from a polyline network and producing a smooth surface through subdivision surfaces

Very often these methods of reconstruction require that the curve network satisfy strict constraints on the shape, the number of intersections and the topology associated with the curve network, and the user must be conscious of the process which is to be applied, to carefully create the necessary input.

Subdivision surfaces can also be exploited in the process of reconstruction from point clouds, as in the approach proposed by Hugues Hoppe in [9] and [10]: first a triangular mesh, consisting of a relatively large number of triangles, is built from the point data; then an optimized mesh is produced through the optimization of an energy function that explicitly models the trade-off between the goals of concise representation and good fit; finally, subdivision rules that also consider the sharp features of the mesh are applied to the optimized mesh.


Figure 10: Reconstruction from an unstructured point cloud using subdivision. From left to right: the point cloud; the optimized mesh; the optimized mesh with sharp features tagged; the resulting subdivision surface

3.4 Human-Computer Interaction

Human-computer interaction (HCI) is the study of interaction between people and computers. It is often regarded as the intersection of computer science, behavioral sciences, design and several other fields of study. Interaction between users and computers occurs at the user interface, which includes both software and hardware. The following definition is given by the Association for Computing Machinery [11]:

"Human-computer interaction is a discipline concerned with the design, evaluation and im-plementation of interactive computing systems for human use and with the study of major phe-nomena surrounding them."

Because human-computer interaction studies humans and machines in conjunction, it draws from supporting knowledge on both the machine and the human side.

Since the most classical and mature type of interaction between humans and computers happens through the input/output devices composed of keyboard, mouse and screen, most HCI efforts are put into the principles and methodologies of designing graphical user interfaces. But despite this de facto standardization of user-machine interaction, the goals of HCI are to develop new design methodologies, to experiment with new hardware devices, to prototype new software systems, to explore new paradigms for interaction, and to develop models and theories of interaction.

In this direction, our work proposes some solutions to enhance the user experience during the reverse engineering process by investigating the fields of Virtual Reality, gesture recognition, modal interfaces and different types of feedback.

• Virtual Reality
Virtual Reality is often used to describe a wide variety of applications, commonly associated with immersive, highly visual 3D environments. It is impossible to give an unambiguous definition, but in general VR systems try to create a virtual world that looks real, sounds real, moves and responds to interaction in real time, and even feels real.

• Gesture recognition
Gesture recognition is a topic in computer science with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any motion of the body, but commonly originate from the face or hands.

Figure 11: Ivan Sutherland using Sketchpad. The user of Sketchpad, the precursor of all modern interactive computer graphics systems, was able to draw with a light pen on the computer screen and see the results almost immediately.

Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs (Graphical User Interfaces), which still limit the majority of input to keyboard and mouse. It enables users to interface with the machine and interact naturally without any mechanical devices.

• Modal interfaces
A precise definition of a modal interface is given in [12]:

"A human-machine interface is modal with respect to a given gesture when the current stateof the interface is not the user's locus of attention and the interface will execute one amongseveral different responses to the gesture, depending on the system's current state."

An interface that uses no modes is known as a modeless interface.

In the field of Computer Aided Design, keyboard and mouse, coupled with a Graphical User Interface (GUI), form the currently most common way of interacting with the software layer. But the way humans interact with CAD software has been, since the beginning, an active field of study and experimentation. For example, Sketchpad (see Figure 11), a program written by Ivan Sutherland and now considered to be the ancestor of modern CAD programs, used a light pen to interact with the user. Ever since, different input and output peripherals have been developed to enhance the way users interface with a CAD system, regarding both input and output devices and paradigms: trackballs, touchpads, graphic tablets, wired gloves, haptic devices, virtual reality systems, stereoscopic displays, etc.

3.5 Our Proposal

We have seen so far that the reverse engineering process has many different applications in many different scenarios, but the requirements in terms of accuracy, time and cost are highly specific to the particular purpose; none of the current scanning and reconstruction techniques is suited for all applications.

Our system for reverse engineering can be considered as targeted toward situations with strict requirements in terms of time and cost, but without demanding requirements on accuracy. These characteristics make our system a candidate for possible "home user" applications, expanding the possible employment of the reverse engineering process towards this new market.

The 3D measuring step is accomplished with a hybrid contact-based active stereo system that makes use of two infrared cameras and a human-driven "probe" represented by an infrared light emitter device. It can be considered a hybrid solution because it is quite different from current active scanning techniques like time-of-flight, interferometry (structured light) or triangulation, primarily because we use a contact-based, human-driven approach.

In terms of cost, our system consists of very cheap equipment:

• Two Wii Remote Controllers
• One infrared LED emitter mounted on a pen and activated by a switch
• One personal computer with Bluetooth support

This is the base system for 3D scanning, but additional devices could be used to enhance the human-computer interaction experience.

While in classic active stereo the accuracy of the system derives primarily from the resolution of the cameras used and the quality and dimensions of the light emitter, in our system it is also limited by human accuracy factors, since the user substitutes the "mechanically" moved light projector normally used in standard active stereo vision systems. Nevertheless, the involvement of the user in the measuring process has its advantages, such as:

• It simplifies the ambiguous and resource-intensive reconstruction step, transferring, in an intuitive way, the responsibility of defining the topology of the object into the user's hands.

• It gives the ability to intervene by adding, modifying or discarding measures right during the acquisition process.

• It allows the detection of features of the object like edges and corners.

The process of incrementally drawing the style lines on the surface of the physical object, called interactive surface sketching, produces a curve network that is internally represented as a Polyline Mesh, i.e. a set of faces, vertices, edges and polylines.
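To fix ideas, the kinds of entities such a polyline mesh ties together could be sketched as below. This is purely illustrative: the type and field names are our invention, and the actual data structure and its memory layout are the subject of chapter 5.

```python
# Illustrative sketch only: the real Polyline Mesh Data Structure and its
# memory layout are described in chapter 5; these names are invented here.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Vertex:
    x: float
    y: float
    z: float

@dataclass
class Polyline:
    vertex_ids: List[int] = field(default_factory=list)  # sampled 3D points

@dataclass
class Edge:
    endpoints: Tuple[int, int]   # ids of the two end vertices
    polyline_id: int             # the polyline drawn along this edge

@dataclass
class Face:
    edge_ids: List[int] = field(default_factory=list)    # n-sided loop

@dataclass
class PolylineMesh:
    vertices: List[Vertex] = field(default_factory=list)
    polylines: List[Polyline] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)
    faces: List[Face] = field(default_factory=list)
```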

The reconstruction of a smooth surface from the polyline mesh is achieved through three main steps:

• tri-quad mesh reconstruction through the triquadrification process
• base mesh reconstruction through bilinearly blended Coons patches
• smooth surface reconstruction through subdivision surfaces


Since the user draws the curve network almost freely, its associated polyline mesh will contain faces that can be n-sided, non-planar and non-convex. The first step of reconstruction has the objective of finding a good subdivision of each face into three- and four-sided polygons; the output of this step is called a Tri-Quad Mesh.

The tri-quad mesh is however very coarse, especially during the first iterations of the F.I.R.E. process, and we need a method to refine it in order to supply sufficient information to the last step of reconstruction. The method we chose exploits Coons patches and produces a refined mesh called the Base Mesh. Finally, a smooth surface is computed and visualized using subdivision surfaces.
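For reference, the bilinearly blended Coons patch is the standard construction that fills a four-sided region from its boundary curves (stated here for completeness; the thesis applies it per face in chapter 6). Assuming boundary curves $c_0(u)$, $c_1(u)$ and $d_0(v)$, $d_1(v)$ meeting at corner points $P_{00}$, $P_{10}$, $P_{01}$, $P_{11}$, the patch is

$$S(u,v) = (1-v)\,c_0(u) + v\,c_1(u) + (1-u)\,d_0(v) + u\,d_1(v) - \left[(1-u)(1-v)\,P_{00} + u(1-v)\,P_{10} + (1-u)v\,P_{01} + uv\,P_{11}\right]$$

i.e. the two lofts between opposite boundary curves, minus the bilinear interpolant of the corners, which removes the doubly counted corner contributions.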

Independently of the reconstruction techniques used, our system, like most other 3D scanning techniques, needs the support of registration and alignment algorithms to integrate different scans of the same object from different viewpoints. The current implementation of our system lacks registration and alignment features, and this could be an area of future research and development. In any case, we can imagine that an interactive alignment procedure could seamlessly fit into our interactive-iterative process: as the user moves the object (or the cameras are moved), he could supply three or more points on the object that correspond to matching points on the 3D virtual part of the object output by previous scans.

The 3D acquisition and reconstruction layers are supported by an ad-hoc designed data structure called the Polyline Mesh Data Structure. In fact, rather than using and extending an existing, off-the-shelf general mesh data structure for storing the acquired curve network, we developed the polyline mesh DS to meet our specific requirements for the implementation of the reconstruction algorithms which, starting from a network of curves, build a smooth surface. In order to provide visual feedback on the ongoing reverse engineering process, these algorithms have to be performed in real time; thus, a lot of attention has been given to the efficiency and performance of the solution.

We said that our system makes use of a simple device as input peripheral: an infrared light source mounted on a pen, with a switch that triggers the light. We can notice how the possible meanings that we can attribute to the on/off state of the input are limited. This makes it difficult, or nearly impossible, to distinguish the different actions that the user wants to perform; for example, how do we differentiate the selection of the starting point of a new curve from the drawing of the curve itself? Moreover, the characteristics of our system make it impossible to differentiate behaviours based on the temporal duration of actions. For example, the classic single or double click usually performed with the mouse could not be implemented in our system, because noise, occlusions or environmental interference would easily confuse and break the algorithm for detecting these types of actions.

This leads us to the use of modal interfaces, so that the actions the user performs, and the functions the system subsequently executes, depend on the current mode of operation. But how can such a mode be changed without interrupting or slowing down the whole process?
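Whatever the switching mechanism, a modal interface dispatches the same input differently depending on the mode. A hypothetical sketch of what this could look like for the pen's single on/off input (mode names and actions are our invention; the thesis leaves the mechanism open):

```python
# Hypothetical sketch: interpreting the same pen event differently
# depending on the current mode of operation (modal interface).
from enum import Enum, auto

class Mode(Enum):
    DRAW = auto()     # points extend the curve being drawn
    SELECT = auto()   # points pick an existing curve
    DELETE = auto()   # points mark a curve for removal

def on_pen_event(mode: Mode, pen_on: bool, position):
    if not pen_on:
        return None                              # light off: nothing to do
    if mode is Mode.DRAW:
        return ("append_point", position)
    if mode is Mode.SELECT:
        return ("pick_nearest_curve", position)
    if mode is Mode.DELETE:
        return ("delete_picked_curve", position)
```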

The current solution is to support user input and enhance semantics through keyboard and mouse input, but with this solution some problems arise: we are limiting the positioning of the cameras and the physical object to near such input devices, which otherwise would not be within reach, slowing the process. We are planning to integrate the use of gesture recognition systems into our reverse engineering process, to enhance and experiment with new ways of HCI in the field of computer-aided design.

The use of a glove should allow us to increase the semantics of the actions performed with the pen, by means of gestures that imitate possible operations. In fact, both during the reverse engineering process and during post-processing surface editing, the user should be able to perform different actions such as entering a new curve, or selecting, deleting or modifying an existing curve. More common operations on the camera could also be performed through the glove and light pen, such as rotating, zooming or moving the point of view.

Figure 12: Hardware and software layers involved in our fast reverse engineering process

Moreover, we are also planning to enhance the main input peripheral, the IR pen, equipping it with a 3-axis accelerometer for improving the 3D tracking process and an off-centered small motor to provide haptic feedback.

The tactile or haptic feedback could help the user during the measuring process, for example by warning when the light source is near the edge of the field of view of one of the cameras, or when one or both cameras do not see the light source due to occlusions. This kind of help could be very useful if we consider that the user's focus during the reverse engineering process is primarily on the object rather than on the screen.

The real-time and interactive characteristics of our system, which allow the user to incrementally refine the 3D model of an existing object while constantly receiving visual feedback on the ongoing process, make our solution different from existing reverse engineering processes, which are often linear, non-iterative and non-interactive. All these characteristics, plus the planned features to be researched and developed, constitute what we have called a Fast Interactive Reverse Engineering System (F.I.R.E. System). The hardware and software layers involved in the F.I.R.E. system are summarized in Figure 12.


Figure 13: The wiimote seen from different views

4 Stereoscopic Vision for 3D Tracking

In the field of computer vision, many efforts have been devoted to the development of theories, algorithms and techniques for so-called "multiple view geometry" or "epipolar geometry", which focus on the problems that arise in the reconstruction of a 3-dimensional scene seen from multiple points of view by one or more cameras.

This chapter does not contain any novel aspects of geometry for computer vision, but we will introduce some of the basic concepts of multiple view geometry, pointing out how these theories and the related algorithms can be applied to our specific system, based on an active stereo rig that exploits the infrared cameras available in the wiimotes. To this end, we will first present some general information on the Wii Remote Controller, and we will see through a simple example how to use the wiimote infrared camera to build a head tracking system.

4.1 Introducing the Wii Remote Controller

The Wii Remote Controller, wiimote for short, is the main input device of the Nintendo Wii console, used as the primary interaction interface between the user and the system. The main feature of the Wii Remote is its motion sensing capability, which allows the user to interact via movement and pointing through the use of accelerometer and optical sensor technology.

The wiimote communicates with the Wii console through the standard Bluetooth HID (Human Interface Device) interface, without requiring any of the authentication or encryption features of the Bluetooth standard. These are precisely the characteristics that permitted the reverse engineering² process that enabled full use of all the wiimote functionalities on any personal computer equipped with a Bluetooth interface.

² Here the term "reverse engineering" refers to the process of discovering the technological principles of a device, object or system through analysis of its structure, function and operation, rather than the process of measuring and reconstructing a physical object into a 3D virtual model that is the main subject of this thesis. See [1].

Figure 14: Rotation axes of the wiimote. The rotation angles about x, y and z are usually called pitch, roll and yaw respectively.

This led to the development of different applications, such as Head Tracking for Desktop Virtual Reality [13], that make use, often in original ways, of the wiimote features. Note that the wiimote does not use novel or groundbreaking technologies; rather, the commercial success of the Nintendo Wii and the consequent mass production made available, at an affordable price, technologies that until now were not accessible to the home-user market.

4.1.1 Input & Output

The wiimote has three main input devices:

• a three-axis accelerometer
• an infrared camera
• 11 buttons

and a port that allows for external input features such as those contained in the Nunchuk and the Classic Controller.

It also has three different types of output that can provide the user with different feedback:

• Visual feedback through four LEDs placed on the wiimote's front face
• Audio feedback through a small speaker
• Tactile feedback through a small motor attached to an off-center weight

Motion Sensor
    The wiimote contains a three-axis linear accelerometer, the ADXL330 integrated circuit [14] manufactured by Analog Devices, which reports back, in arbitrary units, the instantaneous acceleration imparted on the controller by the user holding it, over a range of at least ±3g with 10% sensitivity.


IR Sensor
    The camera includes a built-in processor capable of tracking up to 4 moving sources of infrared light, and normally detects sources with wavelengths between 850 nm and 950 nm; if the IR-pass filter is removed, it can track any bright object.

Unlike for the accelerometer, technical specifications and optical and component characteristics are unavailable for the infrared camera. Methods of camera calibration, i.e. the process of recovering the parameters that characterize a camera, are described in the following sections, but we can anticipate that accurate results in the calibration of the wiimote IR camera were not achieved, so we have to rely on approximate values.

Different sources of information report a horizontal field of view (FoV) of about 41° and a vertical FoV of about 31°. The real dimensions of the image plane are unknown, but the values returned by the wiimote are in the range [0, 1023] × [0, 767]. Knowing that, some currently developed applications assume that each unitary increment in the image plane can be associated with an increment of the angle $\theta_u$ (see Figure 15) equal to the quotient between the FoV and the dimension of the image plane. That is, given a point P with image coordinates $(u_p, v_p)$, we can compute:

$$\theta_u = u_p \cdot \frac{\mathrm{FoV}_h}{\mathrm{dim}_h}, \qquad \theta_v = v_p \cdot \frac{\mathrm{FoV}_v}{\mathrm{dim}_v}$$

in which the ratio FoV/dim expresses the increment of the angle, in degrees, for each pixel in the two directions.

Using this technique, we are implicitly making the wrong assumption that the image plane has a hemispherical shape, because only in that case would a unitary increment of arc length correspond to a constant angular increment. Although the wiimote FoV is not very wide, we are nevertheless committing an error that is easily eliminated.

Through knowledge of the FoV, even if approximate, we can compute the focal length f and use this value to compute the angles $\theta_u$, $\theta_v$ as:

$$\theta_u = \arctan\left(\frac{u}{f}\right), \qquad \theta_v = \arctan\left(\frac{v}{f}\right)$$

As we will see, the previous formula is valid if we model the image formation process with the pinhole camera model.
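The following sketch (our reconstruction, not code from the thesis) contrasts the two mappings, assuming the values reported above: a horizontal FoV of about 41° and x coordinates in [0, 1023], with the focal length in pixels derived from the FoV.

```python
# Sketch: two ways of mapping a wiimote image coordinate (measured from the
# image center) to a view angle. Assumed values: ~41 degree horizontal FoV,
# 1024-pixel-wide image plane, as reported in the text.
import math

FOV_H = math.radians(41.0)  # approximate horizontal field of view
DIM_H = 1024.0              # wiimote x values range over [0, 1023]

def theta_linear(u: float) -> float:
    """Constant angle-per-pixel rule (the hemispherical assumption)."""
    return u * (FOV_H / DIM_H)

# Pinhole model: focal length in pixels chosen so the FoV spans the sensor.
F_PIXELS = (DIM_H / 2.0) / math.tan(FOV_H / 2.0)  # ~1370 pixels

def theta_pinhole(u: float) -> float:
    """Exact arctangent relation theta = atan(u / f)."""
    return math.atan(u / F_PIXELS)

# The two rules agree at the center and at the border (where f is derived)
# but disagree in between:
u = DIM_H / 4.0
print(math.degrees(theta_linear(u)), math.degrees(theta_pinhole(u)))
# ~10.25 vs ~10.59 degrees
```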

Leds, Rumble & Speaker The three kinds of feedback available through the wiimote, visual, audio andtactile, are primarily used to enhance the player experience providing feedback on the user actions duringthe game. Apart from the recreational use, these types of feedback play an inportant role in the field ofHuman-Computer Interaction.

4.1.2 Head Tracking Using the Wiimote

Using a single wiimote and two sources of infrared light placed at a fixed and known distance from each other, it is possible to track the position of the point in the middle of the two light sources in 3-dimensional space. This technique is used in the application "Head Tracking for Desktop VR Displays" developed by Johnny Chung Lee, and is a good example for understanding and applying some basic concepts of computer vision.


Figure 15: The quantities $\theta_u$ and $u_p$ involved in the computation of the focal distance

Using the pinhole camera model, a 3D point $m = (x, y, z)$ is mapped onto the image plane to the point $m' = (u, v)$ with coordinates:

$$u = f\,\frac{x}{z}, \qquad v = f\,\frac{y}{z}$$

Given a segment with end points

$$a = (x_a, y_a, z_a)^T, \qquad b = (x_b, y_b, z_b)^T$$

the projection of this segment on the image plane will have end points

$$a' = (u_a, v_a)^T = f \left( \frac{x_a}{z_a},\; \frac{y_a}{z_a} \right)^T, \qquad b' = (u_b, v_b)^T = f \left( \frac{x_b}{z_b},\; \frac{y_b}{z_b} \right)^T$$

and length

$$l' = f \left\| \left( \frac{x_a}{z_a} - \frac{x_b}{z_b},\; \frac{y_a}{z_a} - \frac{y_b}{z_b} \right)^T \right\|$$


If the line segment ab is parallel to the image plane, then the length of the image of the segment does not vary with respect to translations in the x, y directions; in fact:

$$z_a = z_b = z \;\Rightarrow\; l' = f \left\| \left( \frac{x_a - x_b}{z},\; \frac{y_a - y_b}{z} \right)^T \right\| = \frac{f}{z}\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2} = \frac{f\,l}{z}$$

(where l is the length of ab), i.e. the length of the image of the segment is inversely proportional to the distance from the camera.

Perspective projection, modeled by the pinhole camera, does not preserve length ratios along lines, and this can be seen, for example, by noting how the image of the mid-point of the segment does not coincide with the mid-point of the image of the segment; in fact:

$$\frac{a' + b'}{2} = \frac{f}{2} \left( \frac{x_a}{z_a} + \frac{x_b}{z_b},\; \frac{y_a}{z_a} + \frac{y_b}{z_b} \right)^T \;\neq\; m' = f \left( \frac{x_a + x_b}{z_a + z_b},\; \frac{y_a + y_b}{z_a + z_b} \right)^T$$

It is easy to see that if ab is parallel to the image plane, then they coincide.

Such considerations can be used to develop a 3D tracking system using one wiimote and two IR light sources at a fixed distance from one another. Leveraging the invariance with respect to translation in the x, y directions, we can compute the coordinates $(x_h, y_h, z_h)$ of the mid-point between the sources considering the ratio:

$$\frac{\mathrm{ledDist}/2}{z_H} = \frac{\mathrm{dotDist}/2}{f}$$

where ledDist is the fixed distance between the light sources and dotDist is the distance between the images of these two sources on the image plane of the camera (see Figure 16). Knowing $z_H = f \cdot \mathrm{ledDist}/\mathrm{dotDist}$, we can compute $x_h, y_h$.

The idea of Chung Lee is to exploit this 3D tracking system to visualize on the screen a 3D scene seen from a point of view that coincides with the current position of the observer with respect to the screen. In this situation, the observer can use the screen as a virtual portal onto the 3D scene.

Considering the screen reference frame positioned at the center of the screen with the z-axis perpendicular to it, as in Figure 17, let $O_w = (x_w, y_w, z_w)$ be the known position of the wiimote reference frame with respect to the screen. If the segment between the two light sources is always parallel to the x, y plane of the screen reference frame, and we only allow a rotation $\theta_w$ of the wiimote about the x-axis, then the invariance of the dimension of the image of the segment holds, and we can compute the head position with respect to the wiimote reference frame, $h' = (x'_h, y'_h, z'_h)$, using the formulas described above.

The head position h′ can be transformed into screen coordinates using:

$$h = R_x^T(\theta_w)\,(h' - O_w)$$

where $R_x(\theta_w)$ expresses the rotation about the x-axis by an angle $\theta_w$.
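A compact sketch of this computation (our paraphrase in code form, not Johnny Chung Lee's implementation; the LED separation value is a hypothetical placeholder):

```python
# Sketch of the head-tracking computation described above. Assumes the two
# IR dots are measured from the image center in units consistent with f,
# and that the segment between the LEDs stays parallel to the image plane.
import numpy as np

LED_DIST = 0.205  # hypothetical sensor-bar LED separation, in meters

def head_position(dot_l, dot_r, f):
    """dot_l, dot_r: (u, v) image coordinates of the two IR sources.
    f: focal length in the same units as the image coordinates.
    Returns the 3D midpoint (x_h, y_h, z_h) in the wiimote camera frame."""
    dot_l, dot_r = np.asarray(dot_l, float), np.asarray(dot_r, float)
    dot_dist = np.linalg.norm(dot_l - dot_r)   # image-plane separation
    z = f * LED_DIST / dot_dist                # z_H = f * ledDist / dotDist
    u_mid, v_mid = (dot_l + dot_r) / 2.0       # image of the midpoint
    x, y = u_mid * z / f, v_mid * z / f        # invert u = f*x/z, v = f*y/z
    return np.array([x, y, z])

def to_screen_frame(h_cam, O_w, theta_w):
    """h = R_x(theta_w)^T (h' - O_w): wiimote frame to screen coordinates."""
    c, s = np.cos(theta_w), np.sin(theta_w)
    R_x = np.array([[1.0, 0.0, 0.0],
                    [0.0,   c,  -s],
                    [0.0,   s,   c]])
    return R_x.T @ (h_cam - np.asarray(O_w, float))
```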

4.2 Camera Geometry

The pinhole camera is the simplest device capable of capturing an image of a three-dimensional scene: the light passes through the small hole, and the inverted image of the scene is impressed on the photosensitive surface (see Figure 18).

Figure 16: A schematic representation of the entities involved in 3D tracking using the wiimote and the sensor bar.

The pinhole camera model describes the mathematical relationship between the coordinates of a 3D point and its projection onto the image plane of an ideal pinhole camera. The mapping from the coordinates of a 3D point $M = (x, y, z)^T$ to the 2D image coordinates $m = (u, v)^T$ of the point's projection onto the image plane is given by

$$\begin{pmatrix} u \\ v \end{pmatrix} = \frac{f}{z} \begin{pmatrix} x \\ y \end{pmatrix}.$$

From Figure 19 we can see the entities involved in the pinhole camera model:

• The image plane I
• The center of projection, or camera center, C
• The focal distance f: the distance between the camera center and the image plane
• The optical axis: the line perpendicular to the image plane and passing through the camera center
• The principal point c: the intersection of the optical axis with the image plane
• The principal plane: the plane parallel to the image plane through the camera center

In Figure 19 the optical axis coincides with the z-axis, the principal point is (0, 0, -f) and the principal plane is the plane z = 0.

The pinhole camera model is widely used as an approximation of the image formation process of modern cameras, even though the model does not include geometric distortions or the blurring of unfocused objects caused by lenses and finite-size apertures. It also does not take into account that most practical cameras have only discrete image coordinates. Its validity depends mostly on the quality of the camera and, in general, decreases from the center of the image to the edges as lens distortion effects increase. Some of the effects that the pinhole camera model does not take into account can be neglected if a high-quality camera is used. If this simplification is not sufficient, non-linear mappings have to be incorporated into the camera model.

Figure 17: An example of wiimote-screen configuration

Figure 18: Representation of a pinhole camera

Figure 19: The pinhole camera model

Using homogeneous coordinates, the pinhole camera model can be expressed with the relation $m = PM$:

$$\begin{pmatrix} fx \\ fy \\ z \end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}$$

which describes the linear mapping between a point $M = (x, y, z, 1)^T$ and its image $m = (u, v, 1)^T$ through the Camera Projection Matrix P. This matrix represents only the simplest case of camera modeling, because it considers only the focal distance f. To represent the image formation process more accurately, we can consider the so-called intrinsic camera parameters which, besides the focal distance, are:

• the image center $(o_x, o_y)$
• the pixel dimensions $(s_x, s_y)$ ($s_y/s_x$ is the aspect ratio)
• the skew $s$ (related to the angle θ between the axes, typically $\pi/2$)

Intrinsic parameters can be encapsulated in the calibration matrix K:

$$K = \begin{bmatrix} f/s_x & s & o_x \\ 0 & f/s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$$

which expresses the transformation from "normalized camera coordinates" to "pixel coordinates".

The process of determining the calibration matrix is called camera calibration, and its result is a calibrated camera. If the matrix K is unknown, the camera is denoted an uncalibrated camera. For most real-world cameras, four out of the five intrinsic camera parameters are predictable: the aspect ratio is close to one, the skew close to zero, and the principal point close to the centre of the image. In contrast, the focal length might vary largely depending on the aperture angle of the lens. Camera calibration issues are considered next.

Figure 20: Pinhole camera model with camera frame and world frame not coincident

Considering the calibration matrix, we can now represent the camera projection matrix as:

$$P = K\,[I \mid 0]$$

The previous formulation only represents the particular choice of the camera reference frame with its centre of projection at the origin of the Euclidean/world space. In the general case (see Figure 20), the position and orientation of the camera are expressed with respect to a "world" reference frame by a transformation matrix

\[
G = \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix},
\]

composed of a 3x3 rotation matrix R and a 3x1 translation vector t; a point in world coordinates X_world is then transformed into camera coordinates X_cam by X_cam = R X_world + t. R and t are called the extrinsic camera parameters.

Considering the calibration matrix K and the world-to-camera transformation matrix G, the projection can be defined by m = PM, where P = K [I | 0] G or, equivalently, P = K [R | t].
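To make the chain of transformations concrete, the following C function (a minimal sketch with our own names, not code from the system) applies P = K [R | t] to a world point and dehomogenizes the result:

/* Project a 3D world point with a finite pinhole camera.
 * K: 3x3 calibration matrix; R, t: extrinsic parameters;
 * Mw: point in world coordinates; outputs pixel coordinates (u, v). */
void project_point( const double K[3][3], const double R[3][3],
                    const double t[3], const double Mw[3],
                    double* u, double* v )
{
    double Xc[3], m[3];
    int i;
    /* world -> camera: Xc = R*Mw + t */
    for( i=0 ; i<3 ; i++ )
        Xc[i] = R[i][0]*Mw[0] + R[i][1]*Mw[1] + R[i][2]*Mw[2] + t[i];
    /* camera -> homogeneous pixel coordinates: m = K*Xc */
    for( i=0 ; i<3 ; i++ )
        m[i] = K[i][0]*Xc[0] + K[i][1]*Xc[1] + K[i][2]*Xc[2];
    /* dehomogenize */
    *u = m[0] / m[2];
    *v = m[1] / m[2];
}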


Figure 21: Epipolar geometry for two cameras.

4.3 Epipolar Geometry

Given two views of the same 3D scene, acquired at the same time by the two cameras of a stereo rig or sequentially by a moving camera (with a static scene), they can be analyzed to extract information on the scene structure or on the intrinsic and extrinsic camera parameters. It is important to note that the two tasks of reconstructing the structure and the camera features and positions are interlinked and should not be considered as two decoupled problems.

Without considering occlusions, a point M = (x, y, z, 1) of the 3D scene is projected onto the image planes of the right and left cameras in the conjugate (or corresponding) points m_r = (u_r, v_r, 1)^T and m_l = (u_l, v_l, 1)^T.

If we denote the projection matrices of the right and left cameras by P_r, P_l, the images of the point M on the image planes of the two cameras are:

mr = Pr M , ml = Pl M

Let's introduce the following nomenclature (see Figure 21):
• Left epipole e_l: the projection of the optical center C_r of the right camera on the left image plane
• Right epipole e_r: the projection of the optical center C_l of the left camera on the right image plane
• Epipolar plane: the plane defined by the three points C_l, C_r, M
• Left epipolar line l_l: the intersection of the epipolar plane with the left image plane
• Right epipolar line l_r: the intersection of the epipolar plane with the right image plane
• Baseline: the line that connects the two projection centers C_l, C_r

The projected points m_l, m_r lie respectively on the rays from the projection centers C_l, C_r to M, and both rays lie on the epipolar plane.

Every possible left epipolar line passes through the left epipole e_l, and every right epipolar line passes through e_r.

Considering that the epipolar plane is completely defined by the two projection centers and a point on one image plane (e.g. m_l), it is always possible to compute the conjugate epipolar line (l_r) on which the conjugate image point (m_r) lies. This observation, which as we will see can be represented mathematically by the fundamental matrix, greatly helps the matching problem by restricting the search space for correspondences to a one-dimensional space (an epipolar line) rather than the whole image.

In the discussion above we implicitly assumed that it is known which image points in the two views represent the same 3D point in space. In general, the only sources of information for the so-called matching problem, i.e. the problem of detecting corresponding image features between two views of the same scene, are the images and their features. In an active stereo vision system the matching problem is practically absent: the light sources are artificially inserted into the scene, and the number of tracked lights and the spatial relations between them are usually known, making the matching problem easily solvable.

4.3.1 The Fundamental Matrix

The fundamental matrix is the algebraic representation of the epipolar geometry and states the mapping between an image point and the conjugate epipolar line.

Considering the equation m_l = P_l M, that expresses the projection of the 3D point M on the image plane of the camera identified by the projection matrix P_l, we can express the parametric equation of the ray from the camera projection center C_l through the image point m_l as:

\[
M(\lambda) = P_l^{+} m_l + \lambda C_l \qquad \text{where} \quad P_l P_l^{+} = I, \quad P_l C_l = 0.
\]

Considering the two points on the ray M(λ = 0) = P_l^+ m_l and M(λ = ∞) = C_l, they are projected on the image plane of the right camera, identified by the matrix P_r, in P_r P_l^+ m_l and P_r C_l. The line passing through these two points is the epipolar line l_r and can be expressed as:

\[
l_r = (P_r C_l) \times (P_r P_l^{+} m_l)
\]

The right epipole e_r is the point P_r C_l: it is indeed the projection of the left camera center on the right camera image plane. The equation of the epipolar line can also be written as:

\[
l_r = [e_r]_\times (P_r P_l^{+}) m_l
\]

where the notation [a]_× with a = (a_x, a_y, a_z)^T stands for

\[
[a]_\times = \begin{pmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{pmatrix}
\]

and this skew-symmetric matrix can be used to express the cross product as a matrix product. As said, the fundamental matrix expresses the mapping between an image point and the conjugate epipolar line, m_l → l_r:

F = [e_r]_× (P_r P_l^+)

The fundamental matrix allows the formulation of the correspondence condition:

m_r^T F m_l = 0

that is valid for every pair of conjugate image points.
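As an illustration, the condition can be checked directly on candidate pairs. The following C sketch (our own helper, with illustrative names) evaluates the residual m_r^T F m_l, which is zero for an exact conjugate pair and, with noisy data, is compared against a small threshold:

/* Evaluate the correspondence condition mr^T F ml for two image
 * points in homogeneous coordinates; 0 means a perfect match. */
double epipolar_residual( const double F[3][3],
                          const double ml[3], const double mr[3] )
{
    double Fml[3];
    int i;
    for( i=0 ; i<3 ; i++ )
        Fml[i] = F[i][0]*ml[0] + F[i][1]*ml[1] + F[i][2]*ml[2];
    return mr[0]*Fml[0] + mr[1]*Fml[1] + mr[2]*Fml[2];
}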


4.3.2 The Essential Matrix

The essential matrix is the specialization of the fundamental matrix to the case of normalized image coordinates; conversely, the fundamental matrix may be thought of as the generalization of the essential matrix in which the assumption of calibrated cameras is removed.

Considering the camera matrix decomposed as P = K [R | t] and an image point m = PM, if the calibration matrix K is known, then we may apply its inverse to the point m to obtain the point m' = K^{-1} m. Then m' = [R | t] M, where m' is the image point expressed in normalized coordinates. It may be thought of as the image of the point M with respect to a camera having the identity matrix as calibration matrix. The camera matrix K^{-1} P = [R | t] is called the normalized camera matrix.

Consider a pair of normalized camera matrices P_l = [I | 0] and P_r = [R | t]. The fundamental matrix corresponding to this pair of normalized cameras is called the essential matrix.

Given two conjugate image points m_l', m_r' (in normalized camera coordinates), we have:

m_r'^T E m_l' = 0

Knowing that the relation between un-normalized and normalized camera coordinates, given the calibration matrices K_l, K_r, is:

m_l' = K_l^{-1} m_l, \quad m_r' = K_r^{-1} m_r

and substituting for m_l', m_r', we have:

m_r^T K_r^{-T} E K_l^{-1} m_l = 0

Comparing this with the correspondence condition m_r^T F m_l = 0 of the fundamental matrix, it follows that the relationship between the fundamental and the essential matrix is F = K_r^{-T} E K_l^{-1}.

4.3.3 The Eight Point Algorithm

The eight-point algorithm is used to estimate the essential matrix or the fundamental matrix of a stereo camera pair from a set of corresponding image points. It was introduced in [15] for the case of the essential matrix but, as we will see, it can also be used to compute the fundamental matrix.

The algorithm's name derives from the fact that it estimates the essential or fundamental matrix from a set of eight (or more) corresponding image points.

The basic algorithm consists of three steps:
1. Define a set of homogeneous linear equations from a set of eight or more conjugate image points.
2. Solve the linear system.
3. Enforce the internal constraint of the fundamental matrix.

Given a pair of conjugate image points

m_l = [u_l v_l 1]^T, \quad m_r = [u_r v_r 1]^T

and the unknown fundamental matrix

\[
F = \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix}
\]

The constraint represented by the correspondence condition m_r^T F m_l = 0 can be rewritten as:

u_r u_l f_11 + u_r v_l f_12 + u_r f_13 + v_r u_l f_21 + v_r v_l f_22 + v_r f_23 + u_l f_31 + v_l f_32 + f_33 = 0

or A_{1x9} f_{9x1} = 0, with

A = [ u_r u_l   u_r v_l   u_r   v_r u_l   v_r v_l   v_r   u_l   v_l   1 ]

and

f = [ f_11  f_12  f_13  f_21  f_22  f_23  f_31  f_32  f_33 ]^T

Each pair of corresponding image points produces one row A_{1x9}, so, given a set of N conjugate points, we can build the linear system:

A_{Nx9} f_{9x1} = 0

with

\[
A = \begin{pmatrix}
u_{r1}u_{l1} & u_{r1}v_{l1} & u_{r1} & v_{r1}u_{l1} & v_{r1}v_{l1} & v_{r1} & u_{l1} & v_{l1} & 1 \\
u_{r2}u_{l2} & u_{r2}v_{l2} & u_{r2} & v_{r2}u_{l2} & v_{r2}v_{l2} & v_{r2} & u_{l2} & v_{l2} & 1 \\
\vdots & & & & \vdots & & & & \vdots \\
u_{rN}u_{lN} & u_{rN}v_{lN} & u_{rN} & v_{rN}u_{lN} & v_{rN}v_{lN} & v_{rN} & u_{lN} & v_{lN} & 1
\end{pmatrix}
\]

The least-squares solution for f is the singular vector corresponding to the smallest singular value of A, that is, the last column of V in the singular value decomposition A = U D V^T. Reordering the solution f back into a 3x3 matrix gives the result of the second step, F_est.

Since the fundamental matrix F is singular and of rank 2, and since in general we compute F_est from discretized data with some amount of noise, these properties will not naturally be present in F_est. We therefore enforce them to refine the intermediate result obtained above. For any singular matrix of rank two, the SVD yields a diagonal matrix D whose last element is zero; hence, taking the SVD of F_est and forcing the last diagonal element of D to zero, we can recompute a refined F through:

F_est = U D V^T with D = diag(r, s, t)
D' = diag(r, s, 0)
F = U D' V^T

As pointed out in [16], when the eight-point algorithm is implemented as described above it will often yield results that are only marginally useful, because the method is highly sensitive to noise in the matched point data. The solution is to transform the input image points using an isotropic scaling, preconditioning the matrix A without affecting the properties of the obtained fundamental matrix. In general, given a set of conjugate image points m_l, m_r and two independent 3x3 (affine) transformation matrices T_l, T_r that transform the points into the new image coordinates:

\hat{m}_l = T_l m_l, \quad \hat{m}_r = T_r m_r

Substituting in the correspondence condition m_r^T F m_l = 0 we have:

\hat{m}_r^T T_r^{-T} F T_l^{-1} \hat{m}_l = 0

from which we can define

\hat{F} = T_r^{-T} F T_l^{-1}

If we compute \hat{F} with the eight-point algorithm, using as input the transformed points, which satisfy

\hat{m}_r^T \hat{F} \hat{m}_l = 0,

we can then recover the fundamental matrix for the original data as:

F = T_r^T \hat{F} T_l

The question now becomes: how do we want to transform the matched points so as to decrease the amount of numerical error in the eight-point algorithm? In [16] Hartley proposed that the coordinate system of each of the two images should be transformed independently into a new coordinate system in which:
• the origin is centered at the centroid of the image points;
• after the translation, the coordinates are uniformly scaled so that the mean distance from the origin to a point equals √2.
This transformation can be done by defining the following matrix:

\[
T = \begin{pmatrix} 1/d & 0 & -\bar{u}/d \\ 0 & 1/d & -\bar{v}/d \\ 0 & 0 & 1 \end{pmatrix}
\]

where

\[
\bar{u} = \sum_{i=1}^{n} \frac{u_i}{n}, \qquad \bar{v} = \sum_{i=1}^{n} \frac{v_i}{n}
\]

and

\[
d = \sum_{i=1}^{n} \frac{\sqrt{(u_i - \bar{u})^2 + (v_i - \bar{v})^2}}{n\sqrt{2}}
\]


4.3.4 Camera Calibration

Camera calibration is a necessary step in order to extract metric information from 2D images. Much work has been done, starting in the photogrammetry community and more recently in computer vision. These techniques can be classified into three categories: photogrammetric calibration, self-calibration and mixed solutions.

• Photogrammetric calibration
Camera calibration is performed by observing a calibration object whose geometry in 3D space is known with very good precision. Calibration can be done very efficiently. The calibration object usually consists of two or three planes orthogonal to each other. Sometimes a plane undergoing a precisely known translation is also used. These approaches require an expensive calibration apparatus and an elaborate setup.

• Self-calibration
Techniques in this category do not use any calibration object. Just by moving a camera in a static scene, or having two or more images from different points of view, correspondences between the images are sufficient to recover both the internal and the external parameters, which allow the 3D structure to be reconstructed. While this approach is very flexible, it is not yet mature: because there are many parameters to estimate, it is not always possible to obtain reliable results.

• Mixed calibration
Techniques such as [17] or [18] only require the camera to observe a planar pattern shown at a few (at least two) different orientations, and the motion need not be known. Compared with self-calibration, they gain a considerable degree of robustness. The method described in [17] is available in OpenCV [19] and has been used in our system for the calibration of the wiimote IR camera. The results are however currently unsatisfactory, because of the limitation on the maximum number of light sources that the wiimote is capable of tracking. This could be an area of further investigation for the development of a calibration process tailored to the wiimote cameras and to infrared cameras in general where, compared to classic cameras, we often have a limited number of correspondences for each frame, but we can rely on an easier and faster solution to the matching problem that could allow a great number of views of the same pattern.

4.3.5 Structure Computation

This section describes how to compute the position of a point M in 3D space given its images m_l, m_r in two views and the camera matrices P_l, P_r of those views. Because of noise and inaccuracies in the points and in the camera matrices, the rays from the two camera centers passing through the two image points will not, in general, intersect.

This means that there will not be a point M which exactly satisfies both m_l = P_l M and m_r = P_r M, and that the image points do not satisfy the epipolar constraint m_r^T F m_l = 0. These statements are equivalent, since the two rays corresponding to a matching pair of points meet in space if and only if the points satisfy the epipolar constraint.

In [20] the projection equations m_l = P_l M, m_r = P_r M are combined into a homogeneous linear system of the form AM = 0, where

\[
A = \begin{pmatrix}
u_l\, p_l^{3T} - p_l^{1T} \\
v_l\, p_l^{3T} - p_l^{2T} \\
u_r\, p_r^{3T} - p_r^{1T} \\
v_r\, p_r^{3T} - p_r^{2T}
\end{pmatrix}
\]

and p_l^i, p_r^i are the rows of P_l, P_r,

that can be solved, for example, by using the SVD decomposition of A.

In the current implementation of the 3D tracking algorithm the following solution to the triangulation problem is used. Given a pair of homogeneous normalized image coordinates m_l = [u_l v_l 1]^T, m_r = [u_r v_r 1]^T corresponding to the unknown 3D point M = [x y z 1]^T, placing the reference system in the left camera center we have:

\[
M = M_l = \begin{pmatrix} x_l \\ y_l \\ z_l \\ 1 \end{pmatrix}
\qquad \text{and} \qquad
M_r = \begin{pmatrix} x_r \\ y_r \\ z_r \end{pmatrix} = R (M_l - C_r)
\]

Using the pinhole camera model and considering normalized image coordinates we can write M_l = z_l m_l, M_r = z_r m_r, and then

\[
z_r m_r = R (z_l m_l - C_r) \;\Rightarrow\;
\begin{pmatrix} z_r u_r \\ z_r v_r \\ z_r \end{pmatrix} =
\begin{pmatrix}
z_l (r^{1T} \cdot m_l) - r^{1T} C_r \\
z_l (r^{2T} \cdot m_l) - r^{2T} C_r \\
z_l (r^{3T} \cdot m_l) - r^{3T} C_r
\end{pmatrix}
\]

The last equation can be rewritten as a linear system Az = b in the unknowns z_r, z_l:

\[
\begin{pmatrix}
u_r & -(r^{1T} \cdot m_l) \\
v_r & -(r^{2T} \cdot m_l) \\
1 & -(r^{3T} \cdot m_l)
\end{pmatrix}
\begin{pmatrix} z_r \\ z_l \end{pmatrix} =
\begin{pmatrix} -r^{1T} C_r \\ -r^{2T} C_r \\ -r^{3T} C_r \end{pmatrix}
\]

that can be solved in the least-squares sense through the normal equations:

A^T A z = A^T b
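Since A is only 3x2, the normal equations reduce to a 2x2 system that can be solved in closed form. The following C sketch (illustrative names; the actual tracking code may differ in its details) implements this triangulation:

/* Triangulate M from normalized image points ml, mr, rotation R and
 * right camera center Cr, by solving A^T A z = A^T b with Cramer's
 * rule; returns -1 when the 2x2 system is singular. M = zl * ml. */
int triangulate( const double ml[3], const double mr[3],
                 const double R[3][3], const double Cr[3], double M[3] )
{
    double A[3][2], b[3];
    double n00 = 0, n01 = 0, n11 = 0, c0 = 0, c1 = 0, det, zl;
    int i;
    for( i=0 ; i<3 ; i++ )
    {
        double Rml = R[i][0]*ml[0] + R[i][1]*ml[1] + R[i][2]*ml[2];
        double RCr = R[i][0]*Cr[0] + R[i][1]*Cr[1] + R[i][2]*Cr[2];
        A[i][0] = mr[i];   /* coefficient of zr */
        A[i][1] = -Rml;    /* coefficient of zl */
        b[i]    = -RCr;
    }
    for( i=0 ; i<3 ; i++ )   /* accumulate A^T A and A^T b */
    {
        n00 += A[i][0]*A[i][0];  n01 += A[i][0]*A[i][1];
        n11 += A[i][1]*A[i][1];
        c0  += A[i][0]*b[i];     c1  += A[i][1]*b[i];
    }
    det = n00*n11 - n01*n01;
    if( det == 0.0 ) return -1;          /* degenerate configuration */
    zl = (n00*c1 - n01*c0) / det;        /* depth in the left camera */
    for( i=0 ; i<3 ; i++ )
        M[i] = zl * ml[i];               /* M = Ml = zl * ml */
    return 0;
}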


Figure 22: An overview of the reconstruction from curve network process in the F.I.R.E. system. The polyline mesh data structure supports the development of the reconstruction algorithms, storing the necessary information and offering facilities for their implementation.

5 From Points to Polyline Meshes

The polyline mesh is the representation of the geometric model underlying the curve network created by the user during the process of interactive surface sketching. While the curve network is only a visual representation of the object in terms of curves in 3D space, the polyline mesh also contains topological information, i.e. which pairs of vertices form an edge and which sequences of edges form a face. Basically, the polyline mesh is a polygon mesh in which each edge has an associated polyline that represents a curve, or a part of a curve, drawn by the user.

During the process of reconstruction from the curve network, schematized in Figure 22, different algorithms modify the geometric and topological information contained in the polyline mesh; to actually implement these algorithms, a data structure that offers facilities for storing, accessing and modifying this information is needed. This chapter presents some of the problems that arise in the design of a data structure for the storage of the polyline mesh, and a possible solution is presented and evaluated. In the rest of this thesis we will refer to this data structure as the polyline mesh DS.

Summarizing: the curve network is the visual representation of the interactive surface sketching process and the polyline mesh is its underlying geometric representation; the tri-quad mesh and the base mesh are successive refinements of the polyline mesh that alter its geometric and topological information.


5.1 Polyline Mesh Features

A polyline mesh is very similar to a polygon mesh, in the sense that it is a collection of faces, edges and vertices. In particular, we can define a polyline mesh M as M = (V, E, F, P) where
• V is a set of points in R³ called vertices;
• E is a set of edges: an edge is a pair of vertices;
• F is a set of faces: a face is a sequence of edges;
• P is a set of polylines, i.e. sequences of points in R³.

Between the elements of E, V, F, P the following relations must be respected:
• every edge e ∈ E connects exactly two vertices and is shared by at most two faces;
• every v ∈ V is shared by at least two edges;
• every f ∈ F is closed, i.e. the starting vertex of the first edge coincides with the ending vertex of the last edge;
• every e ∈ E has an associated polyline p ∈ P; the starting and ending vertices of the edge coincide with the first and last points of p.

Even though numerous data structures for polygonal mesh storage and manipulation exist [21][2], many of which are focused on flexibility (i.e. the ability to use a data structure in the greatest number of application scenarios), data structures designed for curve network management are not available off-the-shelf.

This raised the need for the development of an ad-hoc data structure, the polyline mesh DS, to support the different algorithms that operate on the polyline mesh, such as:
• face splitting, the basic building block for changing the polyline mesh topology;
• preprocessing of the data output by the measuring process;
• refinement through Coons patches;
• subdivision surface iteration steps;
• nearest neighborhood search.

Moreover, most of these algorithms have to be performed in real time, primarily to provide visual feedback for the ongoing fast reverse engineering process, so a lot of attention has been given to the efficiency and performance of the solution.
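As a reference for the rest of the chapter, the following C sketch shows one possible set of types realizing the components M = (V, E, F, P) and the constraints listed above; the field names are illustrative, and the actual polyline mesh DS adopts a different low-level memory layout (see section 5.3):

typedef struct Edge Edge;
typedef struct Face Face;

typedef struct
{
    double pos[3];      /* point in R^3 */
    int    n_edges;
    Edge** edges;       /* unordered list of incident edges */
} Vertex;

struct Edge
{
    Vertex* v[2];       /* exactly two vertices */
    Face*   f[2];       /* at most two faces (NULL if border edge) */
    int     n_points;
    double* points;     /* associated polyline: x0 y0 z0 x1 y1 z1 ... */
};

struct Face
{
    int    n_edges;
    Edge** edges;       /* ordered list of boundary edges */
};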

5.2 Polyline Mesh Components Relationships

An important subject in the design of a polygon mesh data structure is the choice of the appropriate relationships between faces, edges and vertices that the data structure will maintain. This decision affects the size of the data and how simply and efficiently we can perform operations on the structure.

Considering our dynamic scenario, in which the user modifies the polyline mesh topology in an interactive and incremental way while we also apply, at every step, different algorithms for the reconstruction and visualization of a smooth surface, we need a highly dynamic data structure which offers all the facilities for an efficient implementation of different kinds of algorithms.

The current implementation of the polyline mesh DS provides the following relations (see Figure 23):
• every face has a reference to an ordered list of edges;
• every edge has exactly two references to vertices and at most two references to faces;


Figure 23: Polyline mesh DS components relationships. A mesh with 4 faces, 10 edges and 7 vertices is shown. All the references between components are bidirectional. The white dotted lines represent bidirectional references available in the data structure. The possible navigations are: face to edge; edge to vertex; vertex to edge; edge to face.

• every vertex has a reference to an unordered list of edges.
If we describe these relations using the notation used by Rossignac [22][2], we have:

{ F, E, V : F ⇒ E 2→ V, V → E ≤2→ F }

In this notation the symbols V, E, F denote vertices, edges and faces. These are the primary entity nodes in the graph that represents the relations between the components, called the incidence graph. Arrows indicate incidence relations; their cardinality (i.e. the number of referenced elements), if fixed, is written above the arrow, otherwise a variable number of incident elements is indicated. For example, F → E implies that a face points to a variable number of edges, while E 2→ V indicates that each edge points to exactly two vertices.

Sometimes incidence references are organized in couples. For example, a face may have one reference to its bounding edge-vertex couples; such a case is indicated with parentheses, as in F → (E, V).

When the multiple arcs emanating from a node are ordered (possibly in cyclic fashion), the double arrow "⇒" is used instead of "→". For example, F ⇒ V indicates that each face is associated with a list of links to vertices, ordered in a circular fashion around the face.

Given this notation, we can compare our solution to other existing implementations.

5.2.1 Face-Vertex

The face-vertex structure is a simple and popular data structure commonly used by the computer graphics community. Rossignac gave the notation for this data structure as:

{F, V : F ⇒ V}


Figure 24: The face-vertex data structure only maintains the information of which vertices each face is made of. The reference F → V is unidirectional, thus it is only possible to traverse from faces to vertices.

Figure 25: Winged-edge data structure. Besides the relations shown in the image, each edge also points to two pairs (F, E).

which is a data structure where each face points to an ordered list of vertices. The face-vertex data structure has the advantage of being simple, but the incidence relation indicates that it is only possible to traverse from faces to vertices. Consequently, the face-vertex data structure can be quite inappropriate in many situations.

5.2.2 Winged-Edge

The notation for the winged-edge data structure is:

{ F, E, V : F → E 2⇒ V, E 4⇒ E, V → E 2⇒ (F, E) }


Figure 26: Half-edge data structure. Each edge is decomposed into two half-edges with opposite orientations that store different information.

Every face points to its edges; each edge points to its two vertices and to the four neighbouring edges. Each vertex points to the edges it is part of, and each edge also points to two pairs, each consisting of a neighbouring face and the edge that follows in the counter-clockwise orientation of that face.

5.2.3 Half-Edge

A half-edge data structure is an edge-centered data structure capable of maintaining incidence information of vertices, edges and faces. Each edge is decomposed into two half-edges with opposite orientations. One incident face, one incident vertex and the previous, next and opposite half-edges are stored in each half-edge. For each face and each vertex, one (arbitrary) incident half-edge is stored.

Reduced variants of the half-edge data structure can omit some of this information, for example the half-edge pointers in faces, or the storage of faces altogether.

5.2.4 Quad-Edge

The quad-edge data structure is similar to the half-edge and is primarily based on directed edges (half-edges) that store all the topological information. A possible representation is3:

{ he, F, V : he 2→ F, he 4⇒ he, he 2→ V, V → he }

where he represents a half-edge.

5.2.5 Corner-Table

The corner-table data structure was developed for the storage and compression of triangle meshes stored as triangle strips [23]. In Rossignac's notation, the corner-table data structure is:

3This representation differs from the one given in [2] but seems confirmed by [3].


Figure 27: Quad-edge data structure. Each half-edge has references to the two adjacent faces, the two vertices and the four half-edges belonging to the adjacent faces.

Figure 28: Corner-table data structure. Each vertex points to all the vertices reachable through connected edges.

{ F, V : V ≤5→ V, V 1→ F }

The corner-table data structure is limited, as it can only represent triangle meshes; moreover, the vertices are stored in arrays, and the indices of these arrays, i.e. the order in which the vertices are stored, describe the relations between the vertices, so the cost of modifying a mesh is high. If a new vertex is inserted or removed, all the arrays must be reallocated and the indices recomputed. An algorithm based on a corner-table data structure cannot modify the mesh, but it can create new meshes as temporary output. The corner-table data structure is therefore not appropriate for dealing with dynamic situations, although its simplicity makes it easy to implement many applications that require fast data access and small memory usage.

In order to compare our data structure with those mentioned above, we can consider as an example how many pointers are necessary to represent a cube; the results are summarized in Figure 29.


Representation       Pointers for a cube    Possible traversals
face-vertex          24                     region to face; face to vertex
winged-edge          192                    face to edge; edge to face; edge to vertex; vertex to edge; ...
half-edge            134                    face to half-edge; half-edge to vertex; half-edge to face; half-edge to half-edge; vertex to half-edge
quad-edge            200                    edge to half-edge; half-edge to edge; edge to vertex; vertex to edge
corner-table         —                      vertex to vertex; vertex to face
Polyline Mesh DS     96                     face to edge; edge to face; edge to vertex

Figure 29: Polygon mesh data structures: possible traversals and approximate memory requirements, in terms of the number of pointers necessary to represent a cube.

As an example, the number of pointers necessary to store a cube with the polyline mesh DS is computed as follows:
• 6 faces each pointing to 4 edges (24 pointers);
• 12 edges each pointing to 2 faces and 2 vertices (48 pointers);
• 8 vertices each pointing to 3 edges (24 pointers);
for a total of 6·4 + 12·(2+2) + 8·3 = 96 pointers. Note that, for comparing our data structure with the other existing data structures, we are not considering the polylines associated with each edge, i.e. we are not counting the references from each edge to the list of points that represents a polyline.

The same table also shows the possible traversals between the mesh components. What can be seen is that our data structure allows an intuitive navigation of the mesh, offering the most natural directions of navigation: from face to edge, from edge to vertex, from vertex to edge and from edge to face (F ↔ E ↔ V, simplifying the Rossignac notation).

These are also the relationships offered by the winged-edge data structure but, as can be seen from Figure 29, the memory footprint for the storage of a cube mesh is exactly halved, and it is lower than both the half-edge and the quad-edge data structures.


5.3 Polyline Mesh Memory Layout

Regardless of the high-level data structure design, i.e. the design and evaluation of the access relations between edges, faces and vertices, we can note how, on a lower level, the data structure is simply an organization of direct and indirect references to the 3D points that represent the polylines. Therefore, while the high-level design must be tailored to provide a coherent view of data retrieval, allowing an intuitive implementation of the algorithms that operate on the data stored in the polyline mesh, on a lower level we have to decide how the polylines and their 3D points are laid out in memory.

To this end it is important to point out that many variables affect the efficiency of a particular choice in different ways, depending on the data layout and allocation strategies, and that we have to evaluate a trade-off that maximizes the advantages for a specific category of algorithms. In the matter of efficiency, one of the most important features that a data structure can possess, with respect to the algorithms that operate on it, is the so-called memory locality.

We will now present some basic concepts about CPU caches and memory locality, and how the performance of an algorithm implementation w.r.t. cache efficiency can be evaluated (through cache profiling), with the purpose of applying this knowledge to the low-level design of the polyline mesh data structure.

5.3.1 CPU Cache

Figure 30: Typical hierarchy of memories

In the typical memory hierarchy of current computer architectures, cache memories sit between the fast processor registers and the slow, inexpensive main memory, and hold regions of recently referenced data. When the processor wishes to read or write a location in main memory, it first checks whether that memory location is in the cache: if so, we say that a cache hit has occurred, otherwise we have a cache miss. The latter incurs a slowdown to fetch the corresponding data from main memory; also, in order to make room for the new entry, the cache generally has to evict one of the existing entries. The algorithm it uses to choose the entry to evict is called the replacement policy. The fundamental problem of any replacement policy is that it must predict which existing cache entry is least likely to be used in the future. Four of the most common cache line replacement algorithms are:


• Least Recently Used: the cache line that was last referenced in the most distant past is replaced.
• First In First Out: the cache line from the set that was loaded in the most distant past is replaced.
• Least Frequently Used: the cache line that has been referenced the fewest number of times is replaced.
• Random: a randomly selected line from the cache is replaced.

When data is written to the cache, it must at some point be written to main memory as well; the timing of this write is controlled by what is known as the write policy. In a write-through cache, every write to the cache causes a write to main memory. Alternatively, a write-back cache marks which locations have been written over, and the data in these locations is written back to main memory only when it is evicted from the cache.

Caches are generally characterized by three parameters:
• associativity
• block size
• capacity

If the replacement policy is free to choose any entry in the cache to hold a copy, the cache is called fully associative; conversely, if each entry in main memory can go in just one place in the cache, the cache is direct mapped. Many caches implement a compromise and are described as N-way set associative.

There are three kinds of cache misses: instruction read misses, data read misses and data write misses. A read miss from an instruction cache generally causes the most delay, because the processor, or the thread of execution, has to wait until the instruction is fetched from main memory. A data read miss usually causes less delay, because instructions not dependent on the cache read can be issued and the processor does not stall. A data write miss generally causes the least delay, because the write can be queued and there are few limitations on the execution of subsequent instructions. Furthermore, any miss can be classified into three categories:
• compulsory (or cold) misses are those caused by the first reference to a datum;
• capacity misses are those that occur solely due to the finite size of the cache;
• conflict misses are those that could have been avoided, had the cache not evicted an entry earlier.

Caches work because most programs exhibit significant memory locality.

5.3.2 Memory Locality

Memory locality, locality of reference, and the principle of locality all refer to the same concept: the fact that programs tend to access only a small part of the address space at a given point in time.

There are two basic types of locality of memory reference: temporal and spatial.
• Temporal locality: a memory location that is referenced by a program at one point in time is likely to be referenced again in the near future.
• Spatial locality: memory close to the referenced memory is likely to be referenced soon. This means that if a program has referred to an address, it is very likely that an address in close proximity will be referred to soon.

Caches exploit temporal locality by retaining recently referenced data, and exploit spatial locality by fetching multiple contiguous words, a cache block, whenever a miss occurs.
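A minimal C illustration of the difference: both functions below read the same array of coordinates, but the first walks it contiguously, exploiting every fetched cache block, while the second jumps around and wastes most of each block (stride is assumed coprime to 3*n, so every element is still visited exactly once):

double sum_sequential( const double* pts, int n )
{
    double s = 0;
    int i;
    for( i=0 ; i<3*n ; i++ )       /* contiguous, cache-friendly */
        s += pts[i];
    return s;
}

double sum_strided( const double* pts, int n, int stride )
{
    double s = 0;
    int i;
    for( i=0 ; i<3*n ; i++ )       /* scattered, cache-hostile */
        s += pts[ (i*stride) % (3*n) ];
    return s;
}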


Because of the finite cache size and the limited associativity, the order in which instructions are executed and the memory layout of the accessed data can have a significant impact on the effectiveness of a particular algorithm.

From an empirical and implementation point of view, this principle translates into a data structure design tailored to maximize the number of both read and write cache hits and, conversely, to minimize the cache misses, for which it is necessary to go up the memory hierarchy, worsening data access times.

5.3.3 Cache Profiling

Traditional execution-time profiling tools like GNU gprof provide information about how a program spent its time and which functions called which other functions while it was executing. This information can show which pieces of the program are slower than expected and might be candidates for rewriting to make the program execute faster. While this information is undoubtedly useful, what is missing is an analysis of the causes of the slowdown. Sometimes a performance issue is due to a bad data structure and algorithm design that does not exploit the memory locality principle. To identify such situations, cache profiling tools help the developer by providing information on how instructions and data are retrieved and stored in the memory hierarchy.

Cachegrind, part of the Valgrind framework [24], is a tool for finding places where programs interact badly with caches and run slowly as a result. In particular, it performs a cache simulation of a program and can then annotate the source code line by line with the number of cache misses. The following statistics are collected:
• L1 instruction cache reads and misses;
• L1 data cache reads and read misses, writes and write misses;
• L2 unified cache reads and read misses, writes and write misses.

Cache profiling can be very useful for improving the performance of a program. Also, since one instruction cache read is performed per instruction executed, this statistic allows finding out how many instructions are executed per line, which can be useful for traditional profiling and test coverage.

Cachegrind simulates a machine with independent first-level instruction and data caches (I1 and D1) and a unified second-level cache (L2). This configuration is used by almost all modern machines.

The simulated cache configuration (cache size, associativity and line size) is determined automatically, but it is possible to specify the cache features manually from the command line.
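A typical invocation looks like the following (the program name is a placeholder and the cache geometry values are only examples):

# run the program under the cache simulator
valgrind --tool=cachegrind ./polyline_bench

# optionally override the simulated cache geometry: size,associativity,line size
valgrind --tool=cachegrind --I1=32768,8,64 --D1=32768,8,64 \
         --L2=1048576,8,64 ./polyline_bench

# annotate the sources line by line with the collected counts
cg_annotate cachegrind.out.<pid>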

Valgrind's cache profiling has a number of shortcomings: it does not account for kernel and other process activity, and it schedules threads differently from how they would run natively. While these and other factors make the results not highly accurate, they are close enough to be useful.

Cachegrind has been used extensively during the development of the polyline mesh data structure. Some of the results of these tests are presented in the next section, while more information can be found in the appendix on experimental results.

5.3.4 Memory Management

Coming back to our data structure and considering the memory locality principle, we can say that the data layout that should guarantee the best temporal and spatial locality is a plain sequential data structure, i.e. a one-dimensional array in which the 3D points that belong to a polyline occupy contiguous memory locations. Even in this simple situation we can make some considerations about different implementation choices, such as static or dynamic allocation strategies, the use of structure types for encapsulation rather than simple floating point values, or the use of external vector libraries for managing 3D points.

Every programming language provides facilities for managing the memory occupied by the different data types and objects. The C language provides three distinct ways to allocate memory for objects:
• Static memory allocation: space for the object is provided at compile time; the lifetime of these objects is the same as the program's (global variables). Statically allocated variables reside in the data segment.
• Automatic memory allocation: temporary objects can be stored on the stack, and the memory is automatically freed after the block in which they are declared is exited (local variables).
• Dynamic memory allocation: blocks of memory of arbitrary size can be requested and freed at run time, using library functions, from the region of memory called the heap.

These three approaches are appropriate in different situations and have various trade-offs. For example, static memory allocation has no allocation overhead, automatic allocation may involve a small amount of overhead, and dynamic memory allocation can potentially have a great deal of overhead for both allocation and deallocation. On the other hand, stack space is typically much more limited and transient than either static memory or heap space, and dynamic memory allocation allows the allocation of objects whose size is known only at run time. Where possible, automatic or static allocation is usually preferred, because the storage is managed by the compiler, freeing the programmer from manually allocating and releasing storage. However, many data structures can grow in size at run time and, since static and automatic allocations must have a fixed size at compile time, there are many situations in which dynamic allocation must be used.

The C language provides several functions for memory allocation and management:
• malloc and calloc, to reserve space;
• realloc, to resize or move a reserved block of memory to store data of different dimensions;
• free, to release memory back.
These functions can be found in the stdlib library.

As an example of an external vector library we will use the meschach library [25], whose vector objects4 are represented with the VEC structure, declared as:

typedef struct
{
    unsigned int dim, max_dim;
    Real *ve;
} VEC;

A 3-dimensional vector can be allocated via:

VEC* v = v_get( 3 );

The v_get function simply invokes the allocation procedure malloc to reserve the necessary space for the storage of the structure itself and for a one-dimensional array (with double precision, if not specified differently at compilation time) of the size given as parameter.

4The ANSI C language doesn't offer any facility for object-oriented programming, but the C struct data type can be considered a primitive object type which offers only simple data encapsulation.


Figure 31: Forced memory contiguity of both VEC objects and VEC coordinate arrays through preallocation of memory blocks of suitable size.

Figure 32: Normal meschach allocation: all VEC objects and their coordinates are allocated separately by multiple malloc calls, without guarantee of memory contiguity.

So, assuming we choose meschach and its vector type VEC for the internal representation of the 3D points of our data structure, we could allocate an array (statically or dynamically) of VEC pointers and initialize the single 3D points through the v_get(3) function supplied by the library.

The problem arising from this solution is that subsequent malloc invocations do not guarantee contiguity in the memory layout of the point coordinates (see Figure 32). As we will see later, this problem translates into a considerable performance loss during sequential access, both on read and write operations, mostly due to low memory locality, as shown by an increased cache miss rate.

Knowing what causes the performance issue, one possible solution is not to delegate the memory allocation burden to the meschach library functions, but to force memory contiguity by preallocating a single memory chunk through a single malloc invocation and then initializing the vector element pointers (i.e. the Real *ve fields of the VEC structures) so that they reference the previously allocated memory block (see Figure 31).

With this solution we gain both the advantages of the sequential memory layout and the advantage of meschach library support. Another obvious solution would be to store the 3D points in a simple one-dimensional array and encapsulate them in a VEC structure on demand, i.e. whenever it is necessary to call a meschach routine we call the v_get function and allocate the needed points. In this way, however, we pay a price in memory consumption (duplicating the point data) and in the time needed for dynamic data allocation and deallocation.

Even with the forced memory contiguity solution we have a slight performance loss (compared to the one-dimensional array solution), primarily due to the introduced level of indirection in the access to the point coordinates. A more in-depth analysis of this problem follows in the next section.

5.3.5 Performance of Different Solutions

Let us consider the following data structures:
• s1: statically allocated array of doubles.

double s1[TEST_SIZE*3];

• s2: dynamically allocated array of doubles.

double* s2;
s2 = malloc( sizeof(double) * 3 * TEST_SIZE );

• s3: statically allocated array of VEC pointers. The pointer array s3 and the double arrays s3[i]->ve are not in contiguous memory locations.

VEC* s3[ TEST_SIZE ];
for( i=0 ; i<TEST_SIZE ; i++ )
    s3[i] = v_get( 3 );

• s4: dynamically allocated array of VEC pointers. The pointer array s4 and the double arrays s4[i]->ve are allocated by the meschach function and are not in contiguous memory locations.

VEC** s4;
s4 = malloc( sizeof(VEC*) * TEST_SIZE );
for( i=0 ; i<TEST_SIZE ; i++ )
    s4[i] = v_get( 3 );

• s5: the VEC structure array s5 is statically allocated, while the double arrays s5[i].ve are in the heap, in a preallocated contiguous memory segment.

VEC s5[ TEST_SIZE ];
double* p = malloc( sizeof(double) * 3 * TEST_SIZE );
for( i=0 ; i<TEST_SIZE ; i++ )
{
    s5[i].dim = 3;
    s5[i].max_dim = 3;
    s5[i].ve = &p[i*3];
}


• s6: both the VEC structure array s6 and the double arrays s6[i].ve are in the heap, in a preallocated contiguous memory segment.

VEC* s6;
s6 = (VEC*)malloc( sizeof( VEC ) * TEST_SIZE );
double* p = malloc( sizeof(double) * 3 * TEST_SIZE );
for( i=0 ; i<TEST_SIZE ; i++ )
{
    s6[i].dim = 3;
    s6[i].max_dim = 3;
    s6[i].ve = &p[i*3];
}

Using these data structures, in order to analyze memory locality, cache efficiency and the consequent performance impact, the following operations have been executed:
• sequential reads
• sequential writes
• random reads
• random writes

Moreover, for each of the 6 data structures, the read and write operations have been executed with and without caching of the base address pointer of the 3D point on which the operation is executed. This separation allowed us to evaluate a possible performance advantage.
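Concretely, "base address pointer caching" means the following (illustrative fragment, written for the VEC-based structure s4 defined above):

/* without caching: every access goes through s4[i] and then ->ve */
void write_point_nocached( VEC** s4, int i, double x, double y, double z )
{
    s4[i]->ve[0] = x;
    s4[i]->ve[1] = y;
    s4[i]->ve[2] = z;
}

/* with caching: the base pointer is loaded once, then used directly */
void write_point_cached( VEC** s4, int i, double x, double y, double z )
{
    double* p = s4[i]->ve;
    p[0] = x; p[1] = y; p[2] = z;
}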

In this section we will first present the results of the sequential access operations only, because they are the most representative access pattern in our scenario (see for example the nearest neighborhood search algorithms).

In particular, this first test was executed with:
• 100000 3-dimensional points
• 300000 double precision coordinates
• 100 iterations on the single read and write operations
• 10 iterations to average the results

on a laptop machine with a 1.6 GHz Intel Pentium M processor, with 32 KB L1 instruction cache, 32 KB L1 data cache, 1 MB unified L2 cache and 512 MB of main memory, running Ubuntu 7.10.

As expected, comparing the results of sequential reads and sequential writes on the structures s1, s3 and s5, we can see how a careful design tailored to the maximization of locality in memory accesses can lead to a considerable performance improvement. Moreover, we can note how the differences in execution times (positive and negative) with respect to base pointer address caching are primarily due to the increased number of instructions executed and are not linked to variations of the cache miss rate.

Comparing the statistics about the access to structures allocated statically at compile time and dynamically at run time (e.g. s1 and s2), we can note how, despite there being no differences in cache management, there are nevertheless differences, even if minimal, in the access times. This means that, even if the malloc implementation guarantees memory contiguity in both cases, the data residing in the data segment is probably treated differently by the OS.

These considerations should not lead us to decide to use the data segment or the stack for the storage of the point coordinates since, to guarantee scalability in terms of mesh complexity, dynamic memory


Figure 33: Execution times of the sequential write operations for the six different structures, with and without base pointer caching.

Figure 34: Execution times of the sequential read operations for the six different structures, with and without base pointer caching.

allocation allows a finer memory management (run-time allocation, deallocation and reallocation of memory blocks) during program execution; for this reason, through the rest of this section only the dynamically allocated structures s2, s4 and s6 are considered.

The second set of tests was executed on a desktop machine with a dual-core 2.8 GHz Intel Pentium D processor, with 2x16 KB L1 data cache, 2x12 KB L1 instruction cache, 2x1 MB unified L2 cache and 1024 MB of main memory, running Ubuntu 7.10.

Figures 35 and 36 also show the data collected on the random read and random write operations, with a variable number of iterations on the single operation (i.e. how many times an operation is executed on the same point) in Figure 35, and a variable number of stored 3D points in Figure 36.

What can easily be seen is the indisputable evidence of how the simplicity of a one-dimensional array is rewarded by a great performance gain. Even the s6 structure, with data in contiguous memory locations,


Figure 35: Execution times for each operation with a variable number of iterations on the operations. From left to right, top to bottom: sequential writes, sequential reads, random writes, random reads. Bold lines represent the use of base pointer address caching. Unfortunately the legend is automatically generated by a bash script and reports wrong names for the structures: s1, s2 and s3 are the structures called s2, s4 and s6 in the text.

shows a non-negligible performance loss, and this is, I think, the most important observation, because it tells us that even a single level of indirection in the access to the data has a cost in efficiency, due both to the growth in the number of instructions executed and to data cache misses. The latter increase because we need to store in the cache not only the data itself but also the addresses (pointers) that allow us to retrieve the data, filling the cache capacity earlier and increasing the probability of capacity misses.

This extreme application of Ockham's razor couples badly with object-oriented programming and with the encapsulation technique, which is the basis of the information hiding principle, and cannot be used in every application; but in our scenario we have strict requirements in terms of efficiency, and we have to pay careful attention to every detail in order to fulfill our real-time requirements.

We arrived at this conclusion starting from the analysis of the possible layouts of the coordinates of the 3D points that represent the polylines. Analogous considerations can be made about the storage of


Figure 36: Execution times for each operation with a variable number of points stored. From left to right, top to bottom: sequential writes, sequential reads, random writes, random reads. Bold lines represent the use of base pointer address caching. Unfortunately the legend is automatically generated by a bash script and reports wrong names for the structures: s1, s2 and s3 are the structures called s2, s4 and s6 in the text.

the faces, edges and vertices which, although they can be some orders of magnitude fewer in number, are often used by the algorithms that operate on the polyline mesh with a sequential access pattern, and can consequently take advantage of the memory locality principle. In the current implementation, faces, edges and vertices are stored in simple one-dimensional arrays. A detail worth noting is that, while the 3D points are mostly static, faces, edges and vertices are often deleted and then substituted by one or more new modified versions (see for example the face splitting algorithm or the Coons algorithm). This "dynamism" raises two problems:

• the need for variable-length arrays;
• the management of the free memory blocks.

Both are well-known problems for which different solutions exist, but an analysis of the strengths and weaknesses of the different strategies is outside the scope of this thesis.
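As an example of a standard approach to the first problem, a dynamic array can be grown geometrically with realloc, amortizing the reallocation cost (a sketch, not the thesis' actual implementation):

#include <stdlib.h>

typedef struct
{
    double* data;
    size_t  size, capacity;
} DynArray;

/* Append one value, doubling the capacity when the array is full. */
int da_push( DynArray* a, double value )
{
    if( a->size == a->capacity )
    {
        size_t  ncap = a->capacity ? 2*a->capacity : 16;
        double* nd   = realloc( a->data, ncap * sizeof(double) );
        if( !nd ) return -1;         /* allocation failure */
        a->data = nd;
        a->capacity = ncap;
    }
    a->data[ a->size++ ] = value;
    return 0;
}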


5.4 Building the Polyline Mesh

We used the term interactive surface sketching to denote the process of "intuitively" drawing the style lines of the object directly on the surface of the object. But how much of this process is really free, and how is the curve insertion seen and handled by the low-level data structure? The following two paragraphs answer this question.

5.4.1 Creating the border edges

The first step in the creation of a curve network, as seen from the low-level data structure point of view, is the definition of the initial polygon that will define the first face.

In this step the inserted edges are subject to different constraints:
• the first edge inserted has no constraints;
• "intermediate" edges have to start from the last point (vertex) of the last edge entered;
• the last edge has to start from the last point (vertex) of the last edge entered and has to finish on the first vertex of the first edge.

After the user freely draws the first curve, we can simply force each new edge to start from the last vertex of the last edge created.

Regarding the last edge, if we do not want the user to have to explicitly request the closure of the polygon, we can show a visual feedback that activates when the user is within a threshold distance from the first vertex of the first edge.

One important feature of this step is the support for automatic snapping/selection of the vertices needed to fulfill the constraints on edge insertion; the development of this feature is the subject of section 5.5.

5.4.2 Face Splitting

Every time the user adds a new curve to the curve network, the topology of the face containing the starting and ending points of the new curve changes. The process of updating all the necessary relations between the involved polyline mesh entities is called face splitting and is described here in detail.

Let's introduce some nomenclature for describing the algorithm:
• f_s: the face to be split;
• e1: the starting edge, associated with the polyline V_0^{e1} = P_0^{e1}, ..., P_i^{e1}, ..., P_n^{e1} = V_1^{e1}, where P_i^{e1} is the starting point of the new edge;
• e2: the ending edge, associated with the polyline V_0^{e2} = P_0^{e2}, ..., P_j^{e2}, ..., P_m^{e2} = V_1^{e2}, where P_j^{e2} is the ending point of the new edge;
• e_new: the new edge, associated with the polyline V_0^{e_new} = P_i^{e1} = P_0^{e_new}, ..., P_l^{e_new} = P_j^{e2} = V_1^{e_new}.


Figure 37: The implemented solution: when an edge e1 is split into e11 and e12, the points of the associated polylines are copied into two newly allocated arrays.

The main steps of the face splitting algorithm are:
1. Create 2 new vertices V_0^{e_new}, V_1^{e_new} and 4 new edges e11, e12, e21, e22.
2. Identify the external faces f_ext^{e1}, f_ext^{e2} and update them.
3. Create the new edge e_new.
4. Delete the old face f_s.
5. Create the new faces f0, f1.
6. Delete the split edges e1, e2.

Face Splitting – step 1
The algorithm begins by creating the new vertices V_0^{e_new}, V_1^{e_new}, which are not associated, for now, with any edge; it then creates 2 new edges for every split edge:

e11: V_0^{e_new} = P_i^{e1}, ..., P_0^{e1} = V_0^{e1}
e12: V_0^{e_new} = P_i^{e1}, ..., P_n^{e1} = V_1^{e1}
e21: V_1^{e_new} = P_j^{e2}, ..., P_0^{e2} = V_0^{e2}
e22: V_1^{e_new} = P_j^{e2}, ..., P_m^{e2} = V_1^{e2}

The creation of these new edges implies the association with the newly created vertices V_0^{e_new}, V_1^{e_new}, which now both have valence 2. The implementation of the creation of the 4 new edges proceeds by copying all the points of the 4 new polylines from the arrays associated with e1, e2 into 4 newly allocated arrays.
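A minimal sketch of this copy strategy for one split edge (illustrative names; note that the polyline of e11, as defined above, runs from P_i back to P_0, a reversal this sketch does not perform):

#include <stdlib.h>
#include <string.h>

/* Split a polyline of n points (stored as x y z triples) at point
 * index i, copying the two halves, which share point i, into two
 * newly allocated arrays. */
void split_polyline( const double* pts, int n, int i,
                     double** first, int* n1,
                     double** second, int* n2 )
{
    *n1 = i + 1;                     /* points 0 .. i   */
    *n2 = n - i;                     /* points i .. n-1 */
    *first  = malloc( (*n1) * 3 * sizeof(double) );
    *second = malloc( (*n2) * 3 * sizeof(double) );
    memcpy( *first,  pts,       (*n1) * 3 * sizeof(double) );
    memcpy( *second, pts + 3*i, (*n2) * 3 * sizeof(double) );
}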


Figure 38: In this solution the polylines associated with the newly created edges simply point into the existing polyline of the split edge.

This solution was chosen over two others:
• Without copying any points or allocating any memory, just set the pointers to reference the locations in the existing polylines of e1 and e2 (see Figure 38). This solution has to handle two issues:
– the points P_i^{e1} and P_j^{e2} are shared among the arrays of e11, e12 and e21, e22;
– we need to keep track of the multiple references to the same allocated memory, because when we delete the edges e1 and e2 we do not want to deallocate the polylines associated with the new edges.
• If we copy just the "second" part of the array and shrink the first part, we only avoid the issue of the shared vertex point (see Figure 39).

Despite the fact that the chosen solution may be the least efficient, opting for one of the other two would have increased code complexity because of the need for reference counting.

Face Splitting – step 2
For every pair of edges we can identify one shared face f_s (the split face, input of the algorithm) and one "external" face per edge, f_ext^{e1}, f_ext^{e2}.

When the starting and ending points of the new edge are internal points of the polylines, and not vertices, determining the external faces is straightforward; in the case that the starting and/or ending edge of the new polyline is a border edge, the update process is simply skipped.

Once the external faces are identified, if they exist, the algorithm proceeds by modifying them: it eliminates the split edge from each face and inserts the two new edges, e11, e12 in f_ext^{e1} and e21, e22 in f_ext^{e2}. This is accomplished as follows:

• Before eliminating the split edge from the face, use it to obtain the references to its "previous" and "next" edges in the external face; call them e_prev^{e1}, e_next^{e1} for f_ext^{e1}, and e_prev^{e2}, e_next^{e2} for f_ext^{e2}.


Figure 39: In this solution we avoid the shared vertex problem by allocating only one part of the split polyline in a new memory block.

• Remove the edges $e_1$, $e_2$ from the external faces, but do not delete them, because they will be used later.

• Now the algorithm has to insert the new edges into the external faces, preserving the order of the edges. To do that, the algorithm tests whether $e_{11}$ shares a vertex with $e^{e_1}_{prev}$; if so, we can add $e_{11}$ between $e^{e_1}_{prev}$ and $e^{e_1}_{next}$ and then add $e_{12}$ between $e_{11}$ and $e^{e_1}_{next}$ in $f^{e_1}_{ext}$. If the test fails, we are sure that $e_{12}$ shares a vertex with $e^{e_1}_{prev}$, because in step one we created:

  $e_{11}: V^{e_{new}}_0 = P^{e_1}_i, \ldots, P^{e_1}_0 = V^{e_1}_0$
  $e_{12}: V^{e_{new}}_0 = P^{e_1}_i, \ldots, P^{e_1}_n = V^{e_1}_1$

  with $V^{e_1}_0$, $V^{e_1}_1$ vertices of $e_1$; since $e_1$ shares a vertex with $e^{e_1}_{prev}$ (because of adjacency), either $e_{11}$ or $e_{12}$ has to share a vertex with $e^{e_1}_{prev}$. Thus, if the test fails, we can add $e_{12}$ between $e^{e_1}_{prev}$ and $e^{e_1}_{next}$ and then add $e_{11}$ between $e_{12}$ and $e^{e_1}_{next}$ in $f^{e_1}_{ext}$. The same procedure is applied to $e_{21}$ and $e_{22}$.

Face Splitting – step 3
$e_{new}$, the new edge associated with the new polyline $V^{e_{new}}_0 = P^{e_1}_i = P^{e_{new}}_0, \ldots, P^{e_{new}}_l = P^{e_2}_j = V^{e_{new}}_1$, is created but not linked to any face. The valence of the two vertices $V^{e_{new}}_0$, $V^{e_{new}}_1$ is updated to 3, because now $V^{e_{new}}_0$ is shared among $e_{11}$, $e_{12}$ and $e_{new}$, and $V^{e_{new}}_1$ is shared among $e_{21}$, $e_{22}$ and $e_{new}$.

Face Splitting – step 4
Deleting the shared face $f_s$ from the polyline mesh structure in this step does not serve to free the space reserved for the face itself, but to eliminate the references to $f_s$ held by every edge of $f_s$. This is necessary because we are going to add new faces to the edges of $f_s$, and we must not violate the basic assumption that no edge can be shared by more than two faces.

Face Splitting – step 5
To create the two new faces $f_0$, $f_1$, information on the adjacency of the edges of (the copy of) $f_s$ is used. In a nutshell, the algorithm for creating the first face starts from a split edge, adds the first part of the split edge and navigates the edges of the face in an arbitrary direction, adding the edges encountered until the other split edge is reached. It then adds the first part of the second split edge and the new edge to close the polygon that defines the first face. The second face is created starting from the second part of the second split edge, continuing to navigate and add edges in the same direction until the first split edge is encountered; it then adds the second part of the first split edge and the new edge. In detail, the actual implementation works as follows:

• start from $e_{11}$ and add it to $f_0$ (see Figure 40.a)
• test which of the 2 edges adjacent to $e_1$ shares a vertex with $e_{11}$; this edge (which we call $e^{f_0}_{next}$) defines a "direction of navigation" of the face (see Figure 40.b)
• navigate the edges of $f_s$ in that direction, adding the edges encountered until the other split edge $e_2$ is reached
• now test which of the 2 parts of $e_2$ ($e_{21}$, $e_{22}$) shares a vertex with the last edge added to $f_0$ (call it $e_{2i}$) and add it to $f_0$ (see Figure 40.c)
• add the new edge $e_{new}$ to close the polygon that defines the first face (see Figure 40.d)
• to create the second face $f_1$, start by adding the remaining part of $e_2$
• navigate the edges of $f_s$ in the same direction, adding the edges encountered until $e_1$ is reached
• add $e_{12}$ to $f_1$
• add the new edge $e_{new}$ to close $f_1$
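The following compact sketch illustrates this navigation for the first face only, under the strong assumption that a face stores its edge ring in order; the Vertex/Edge/Face types and the sharesVertex helper are hypothetical, not the thesis data structures, and the symmetric construction of $f_1$ is omitted:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vertex {};
struct Edge { Vertex* v0; Vertex* v1; };
struct Face { std::vector<Edge*> edges; };  // edges kept in ring order

// True when two edges share an endpoint.
bool sharesVertex(const Edge* a, const Edge* b) {
    return a->v0 == b->v0 || a->v0 == b->v1 ||
           a->v1 == b->v0 || a->v1 == b->v1;
}

// Step 5 for the first face f0: starting from e11, walk the ring of (a copy
// of) fs away from e1 until e2 is met, then close with the matching half of
// e2 and the new edge enew.
Face buildFirstFace(const Face& fs, Edge* e1, Edge* e2,
                    Edge* e11, Edge* e21, Edge* e22, Edge* enew) {
    Face f0;
    f0.edges.push_back(e11);                              // Figure 40.a
    const auto& ring = fs.edges;
    std::size_t n = ring.size();
    std::size_t i = std::find(ring.begin(), ring.end(), e1) - ring.begin();
    // The neighbor of e1 that touches e11 fixes the navigation direction.
    int dir = sharesVertex(ring[(i + 1) % n], e11) ? 1 : -1;  // Figure 40.b
    for (std::size_t k = (i + n + dir) % n; ring[k] != e2; k = (k + n + dir) % n)
        f0.edges.push_back(ring[k]);                      // walk until e2
    Edge* half = sharesVertex(e21, f0.edges.back()) ? e21 : e22;
    f0.edges.push_back(half);                             // Figure 40.c
    f0.edges.push_back(enew);                             // Figure 40.d
    return f0;
}
```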

Face Splitting – step 6
The face splitting algorithm ends by deleting the edges $e_1$, $e_2$ and their associated polylines, which are no longer needed.

5.4.3 Face Splitting Special Cases

When the first and/or last points lie on a border edge, i.e. an edge with only one adjacent face, the algorithm proceeds as normal, simply skipping the step of updating the external face.

When the first and/or last points are the first or last point of the edge, i.e. they are vertices, the face splitting algorithm is slightly different: there is no need to create new vertices, to split edges and create new ones, or to modify external faces, and the creation of the new faces is simpler.

For the sake of code maintainability, to avoid implementing the three different cases of face splitting:

• from edge to edge
• from vertex to edge (and vice versa)
• from vertex to vertex

the following solution has been adopted. For all three cases the input of the algorithm is the same: two edges, two points on those edges, and the shared face. When one or both points are vertices, we can pass as input a fake edge, i.e. a degenerate polyline with only one point, which is also both the starting and ending vertex. If we add this fake edge between the edges adjacent to the corresponding vertex, execute the algorithm, and delete the split edges from the two new faces, we obtain the desired result.


Figure 40: Creation of the first face during the face splitting algorithm

5.5 Search Strategies in the Polyline Mesh

Considering the characteristics and the topological constraints of a polyline mesh, we have to supply the user with an automatic procedure for selecting the starting and ending vertices during the process of curve creation. The user is in fact constrained to insert curves (internally represented by an edge and an associated polyline) whose starting and ending vertices must be points or vertices of pre-existing polylines associated with edges of the same face. The problem that arises is therefore to find the pre-existing point or vertex at minimum distance from the 3D point currently output by the tracking process, which we will call $P_{out}$. This type of search is commonly known as Nearest Neighbor Search (NNS).

In our scenario the problem is: given a set of $N$ points in $\mathbb{R}^3$ (all the points of all the polylines) and a query point $P_{out} \in \mathbb{R}^3$, find the point of the set closest to $P_{out}$.

The analysis of this problem leads us to identify the following possible solutions:


Figure 41: Exhaustive search will compute the distance from the query point to all the points of all the polylines

1. Exhaustive Search
2. Search Using OpenGL Selection Mode
3. Incremental Search with Feedback
4. Search on Bidimensional Grid
5. Search on Bounding Volume Hierarchy

5.5.1 Exhaustive Search

A naive exhaustive search, or linear search, would calculate the distances of the point $P_{out}$ from all the points of all the edges currently in the polyline mesh data structure. Moreover, if we want not just the nearest point but the k-nearest neighbors, we have to partially order the results during the search. Obviously, for scalability and efficiency reasons, this type of search could be a bottleneck for a real-time application, unless we organize the data in a spatial data structure of some kind (BSP tree, kd-tree, R-tree, octree, regular grid, etc.) that restricts the number of operations needed to find the (k-)nearest neighbors of the 3D point $P_{out}$.

Whichever spatial data structure we choose, we have to weigh the management costs for insertion, deletion and modification of the data, and the memory footprint of the chosen solution, against the predominant operations that act on the data, in order to evaluate and quantify the potential performance advantages.
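For reference, the baseline linear scan is straightforward; this is a self-contained sketch (the Point3 type is illustrative) that compares squared distances so the square root is never needed:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct Point3 { double x, y, z; };

// Squared distance preserves ordering, so sqrt can be skipped.
double squaredDist(const Point3& a, const Point3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Linear scan over every point of every polyline: O(N) per query.
// `points` stands in for the flattened contents of the polyline mesh.
std::size_t nearestExhaustive(const std::vector<Point3>& points,
                              const Point3& pOut) {
    std::size_t best = 0;
    double bestD = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < points.size(); ++i) {
        double d = squaredDist(points[i], pOut);
        if (d < bestD) { bestD = d; best = i; }
    }
    return best;
}
```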

5.5.2 Search Using OpenGL Selection Mode

In the OpenGL library, the "selection mode" allows us to identify which primitives lie inside a specified view frustum volume. Only the primitives to which we have previously associated a numerical ID (not necessarily unique) are tested for containment in this volume. Such functionality could be used to build a cubic volume around the point $P_{out}$ and examine which point is nearest to $P_{out}$ and, with some additional work, to which edge this point belongs. The problem with this solution is that the size of the OpenGL name stack, on which we store the numerical IDs of the point and edge primitives, is rather limited (at least 64 entries, more often 128 or 256) compared to the number of points that we expect in our polyline mesh data structure. Moreover, even if we work around this limit, for example by executing more than one selection-mode rendering, we still have to establish the dimensions of the view frustum volumes


Figure 42: The OpenGL selection mode allows computing only the distances from the query point to the points of the polylines inside the selection frustum

Figure 43: An incremental search will first compute the distance from the query point to the edges, and then only the distances from the points of the polylines associated with the nearest edge(s)

in which we search for neighbors: a small volume might not find any result, making it necessary to proceed with an unpredictable number of iterations of increasing volume size (but how many iterations? by how much do we enlarge the volume?); conversely, if the volume is over-dimensioned, the number of primitives to analyse could be large.

5.5.3 Incremental Search with Feedback

The following solution is based on an approximation deriving from a peculiarity of the polyline mesh. Indeed, if we consider the edges of the polyline mesh as an adequate approximation of the polylines to which they refer, we can perform an exhaustive search, ordering the distances from the point $P_{out}$ to the edges, and then search for the nearest point only among the points of the closest edges. While an exhaustive search over all the points of all edges could be a performance and scalability issue, in our scenario an exhaustive search over the edge distances is not likely to be a problem.

Such a solution, being an approximation of the real situation, can lead to false positives, discarding the real solution. A quick fix for this issue is to delegate to the user the responsibility of identifying such situations, allowing him to request a further iteration of the algorithm that performs a search among the points of the subsequent edges (subsequent with respect to the ordering performed in the first step). The term "incremental search with feedback" underlines the user's ability to visually evaluate the found result and to decide whether the nearest point shown is suitable.

Figure 44: Given a specific plane in space, we can project the 3D points of the polylines onto that plane and partition the data in a regular grid.
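A sketch of one iteration of this strategy follows, under the assumption that the distance from $P_{out}$ to an edge is taken as the distance to the straight segment joining its two vertices (one plausible reading; the Edge type and field names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct Point3 { double x, y, z; };
struct Edge { Point3 v0, v1; std::vector<Point3> polyline; };

// Squared distance from p to the segment a-b approximating an edge.
double segmentDist2(const Point3& p, const Point3& a, const Point3& b) {
    double ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
    double wx = p.x - a.x, wy = p.y - a.y, wz = p.z - a.z;
    double len2 = ux*ux + uy*uy + uz*uz;
    double t = len2 > 0.0 ? (wx*ux + wy*uy + wz*uz) / len2 : 0.0;
    t = std::max(0.0, std::min(1.0, t));
    double dx = wx - t*ux, dy = wy - t*uy, dz = wz - t*uz;
    return dx*dx + dy*dy + dz*dz;
}

// Iteration k: consider only the polyline of the k-th nearest edge; the
// user requests iteration k+1 when the displayed result looks wrong
// (a false positive of the edge-distance approximation).
const Point3* incrementalSearch(const std::vector<Edge>& edges,
                                const Point3& pOut, std::size_t k) {
    std::vector<std::size_t> order(edges.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return segmentDist2(pOut, edges[a].v0, edges[a].v1)
             < segmentDist2(pOut, edges[b].v0, edges[b].v1);
    });
    if (k >= order.size()) return nullptr;
    const Point3* best = nullptr;
    double bestD = std::numeric_limits<double>::max();
    for (const Point3& q : edges[order[k]].polyline) {
        double d = (q.x-pOut.x)*(q.x-pOut.x) + (q.y-pOut.y)*(q.y-pOut.y)
                 + (q.z-pOut.z)*(q.z-pOut.z);
        if (d < bestD) { bestD = d; best = &q; }
    }
    return best;
}
```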

5.5.4 Search on Bidimensional Grid

Organizing the polyline points in a bi-dimensional (possibly regular) grid, built upon the image in the "window coordinate system" of the scene seen from a predefined point of view, we could execute a nearest neighbor search that exploits a culling based on the window coordinates of $P_{out}$.

The first problem arising from this solution is the choice of the point of view from which the bidimensional data structure is built. This choice considerably affects the spatial data layout, as well as the possible rejection of points due to frustum culling, and consequently the effectiveness of the search. For example, if we choose to use the same camera point of view that is used to render the image on the user's screen, we have to consider that the grid data structure, being view-dependent, must be recalculated at each transformation of the camera matrix. If instead we opt for a second, static camera, we need to establish its position and consider the cost of an additional rendering pass that has to be executed at each insertion, deletion or modification of the points in the polyline mesh data structure.

Moreover, this type of search does not provide the information (necessary for our purposes) about which edge a point belongs to; and if we intend to build a bi-dimensional grid structure to index the edges instead of the points, it is unclear what advantages this solution has with respect to the previously mentioned "incremental search with feedback", considering that:

1. It exploits a worse approximation of the distance from $P_{out}$ to an edge, because it works on discrete projections of the edges and consequently with distances based on the projection of $P_{out}$ and the projections of the edges.
2. It is still an approximation and needs user feedback in the presence of false results.
3. The grid must be updated at each change of the point of view on which the structure is built (unless we use the other proposed solution, which unfortunately does not lack its share of disadvantages).

5.5.5 Search on Bounding Volume Hierarchy

Another way to exploit the aforementioned peculiarity of the spatial layout of the data, and to take advantage of our primary polyline mesh data structure (intended until now only for storage of the curve network), is to consider this data structure as a Bounding Volume Hierarchy (BVH). Indeed, if for every edge we compute an edge-aligned bounding box that wraps the points of the polyline to which the edge is linked and, for every face, we compute an axis-aligned bounding box that includes the bounding boxes of its own edges, we obtain a 2-level bounding volume hierarchy.

Figure 45: The distances from the query point to the edge-aligned bounding boxes constitute, in our situation, a good spatial proximity approximation for the points of the polylines contained in them. Therefore we can exploit this estimate in different ways to restrict the search to the points of the polylines associated with the bounding boxes nearest to the query point

Such a BVH could be exploited in different ways for the nearest neighbor search:

1. Use the kNN-optimal algorithm [26].
2. Perform an incremental search with feedback using as heuristic, instead of the distance from the edge to $P_{out}$ as before, the distance from the edge-aligned bounding box to $P_{out}$. Although this does not completely eliminate false positives, it is certainly a more accurate estimate (a minimal sketch of this box distance follows the list).
3. Perform a search using the OpenGL selection mode, drawing in this rendering mode only the primitives that represent the edge bounding boxes or, in two steps, perform a first rendering of the face bounding boxes, pruning the faces (and their edges) that do not intersect the view frustum volume, and then proceed with the edge bounding box primitives or directly with the remaining polyline points. However, while this search might solve the problem of the limited stack size, we still have to confront the described problems arising from the choice of the view frustum volume dimensions.
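As a minimal sketch, the box-to-point distance used as the heuristic in option 2 can be computed as follows for an axis-aligned box (illustrative types; the oriented edge-aligned boxes would first require expressing the query point in the box's local frame):

```cpp
struct Point3 { double x, y, z; };
struct AABB { Point3 min, max; };

// Squared distance from a point to an axis-aligned bounding box: zero when
// the point lies inside. This is a lower bound on the distance to any
// polyline point contained in the box, which is what makes it usable as a
// conservative search heuristic.
double aabbDist2(const AABB& b, const Point3& p) {
    auto clamp = [](double v, double lo, double hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    };
    double dx = p.x - clamp(p.x, b.min.x, b.max.x);
    double dy = p.y - clamp(p.y, b.min.y, b.max.y);
    double dz = p.z - clamp(p.z, b.min.z, b.max.z);
    return dx*dx + dy*dy + dz*dz;
}
```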

It is worth noting how, by using the polyline mesh data structure as a Bounding Volume Hierarchy, we easily avoid many issues related to the ex-novo creation and management of a spatial data structure, because the hierarchical organization of the nodes and the positioning of the points inside the nodes are automatically inferred from the polyline network topology created during the fast reverse engineering process. Therefore all problems such as where to insert a new node, when to split leaf and intermediate nodes, and what the maximum dimensions of nodes and the depth of the structure should be, etc.⁵, are, in our case, determined by the intrinsic structural properties of the curve network.

⁵This and others are typical problems of spatial data structures (such as the R-tree) and are normally solved through heuristic functions that evaluate penalties and advantages of a set of possibilities.


Figure 46: The steps of 3D reconstruction from a curve network in our fast reverse engineering pipeline.

6 From Polyline Mesh to Smooth Surfaces

So far we have seen the basics of our fast reverse engineering process, in which, through the stereo vision system, we acquire a network of curves, incrementally refining the topology of the underlying polyline mesh through the face splitting algorithm, supported by a nearest neighbor search that helps the user during the selection of pre-existing vertices or points.

While at each step we are adding visual information to the 3D virtual model that, at the end of the process, should represent the physical object, the visualization of the mere network of polylines does not alone provide sufficient visual feedback on the ongoing process. From the beginning, in the design of our reverse engineering process, it was clear that at the end of the measuring process the network of curves had to be used to reconstruct the final 3D model through subdivision surfaces. Nothing prevents us from using the same algorithms during each step of the process.

It is also true that, in the first steps, trying to extrapolate any type of surface or detailed mesh from data that still lacks most of the visual information could lead to misleading results, so, most likely, the best choice is to provide the user with both visualizations. Moreover, the visualization of the unrefined polyline mesh is more suitable in assisting the user in the creation of new edges, because it allows a clearer distinction and visualization of the polyline base entities (faces, vertices, edges and associated polylines).

Thus, while the polyline mesh visualization is an important feedback, the smooth surface visualization is equally important to detect regions of the object that need to be further detailed and to give an overall idea of what the result will be with the currently acquired curve network.

During the acquisition process no limitations are imposed on the user regarding the number of sides of the faces, nor on their planarity or convexity. In fact, the user has only to choose a face and draw the new polyline from a starting vertex or point of the face to an ending vertex or point of the same face but on a different edge. This freedom has a drawback: the subdivision surface scheme that we chose when designing our reverse engineering process works with a mesh made only of quad faces as input⁶.

Therefore, we have to pre-process the polyline mesh containing arbitrary n-sided faces to transform them into quads, in order to supply the correct input to the subdivision process. To this end, we apply two preprocessing steps:

• a "triquadrification" step that transforms every n-sided face, with n > 4, into triangles and quads
• a refinement step (that transforms every triangle into quads)

6.1 From n-sided polygons to Tri-Quads

Considering that the faces, apart from being n-sided, are also non-planar and can be non-convex, another problem we have to face is that the subdivision surface algorithm input should be not only correct but also "good" with respect to planarity and convexity⁷; thus for each face we have to choose, among all the possible splittings, the best with respect to a chosen criterion.

In this section we will examine some techniques that could be used to choose a solution only with respect to a measure of planarity. In order to do this, we need a method for finding a reference plane on which the measure of planarity is computed.

Once we have chosen a measure of planarity, there are different methods that allow us to explore and compare all the possible ways in which a face can be split into triangles and quads. In fact, all the possible subdivisions generate a decision tree (see Figure 47) in which we could search for a solution with different techniques such as exhaustive, branch-and-bound and greedy searches. An analysis of this problem leads us to the decision to use a greedy search, which allows an easy and fast implementation and very good performance, but with the drawback of a non-optimal solution. In particular, our search strategy works in a recursive fashion as follows:

Given an n-sided face, with n > 5, for each of the possible different quads made of consecutive sequences of vertices, compute the planarity measure and choose the best solution. Split the face, creating a quad with the best found sequence of vertices, and restart the algorithm on the remaining (n−2)-sided polygon.

When the recursion reaches the termination condition n < 5, create the last face with the remaining three or four vertices. See Figure 48.
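A minimal sketch of this greedy search follows; the planarityError shown is a deliberately crude stand-in (squared distance of the fourth point from the plane of the first three), where the actual measures of sections 6.1.1–6.1.3 would be plugged in:

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Point3 { double x, y, z; };

// Crude illustrative planarity error: squared distance of the fourth point
// from the plane through the first three (lower is better).
double planarityError(const Point3& a, const Point3& b,
                      const Point3& c, const Point3& d) {
    double ux = b.x-a.x, uy = b.y-a.y, uz = b.z-a.z;
    double vx = c.x-a.x, vy = c.y-a.y, vz = c.z-a.z;
    double nx = uy*vz - uz*vy, ny = uz*vx - ux*vz, nz = ux*vy - uy*vx;
    double n2 = nx*nx + ny*ny + nz*nz;
    double dist = nx*(d.x-a.x) + ny*(d.y-a.y) + nz*(d.z-a.z);
    return n2 > 0.0 ? dist*dist / n2 : 0.0;
}

// Greedy triquadrification (Figure 48): repeatedly carve off the most planar
// quad of consecutive vertices until only a triangle or quad remains.
std::vector<std::vector<Point3>> triquadrify(std::vector<Point3> face) {
    std::vector<std::vector<Point3>> out;
    while (face.size() > 4) {
        std::size_t n = face.size(), best = 0;
        double bestErr = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < n; ++i) {      // the n candidate quads
            double err = planarityError(face[i], face[(i+1)%n],
                                        face[(i+2)%n], face[(i+3)%n]);
            if (err < bestErr) { bestErr = err; best = i; }
        }
        out.push_back({face[best], face[(best+1)%n],
                       face[(best+2)%n], face[(best+3)%n]});
        // Drop the quad's two interior vertices; its first and last vertices
        // stay and become adjacent in the remaining (n-2)-sided polygon.
        std::size_t i1 = (best+1) % n, i2 = (best+2) % n;
        if (i1 > i2) std::swap(i1, i2);
        face.erase(face.begin() + i2);
        face.erase(face.begin() + i1);
    }
    out.push_back(face);                           // final triangle or quad
    return out;
}
```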

6.1.1 Least Squares Plane

Given four points $p_1, p_2, p_3, p_4$ that represent a possible quad, we can use the least squares method to find the plane that minimizes the sum of the squared orthogonal distances of the points from that plane.

⁶Due to time limitations, the initial idea of applying an interpolating subdivision scheme has been replaced with the simpler and more versatile Catmull-Clark approximating subdivision surface scheme.

⁷We could also be interested in the area of the new polygons, the angles, the ratio of triangles to quads, the number of extraordinary vertices, etc.


Figure 47: Starting from a 7-sided face, at the first step we can create 7 different quads made of consecutive vertex sequences. From each of these subdivisions, the remaining 5-sided face can be split in 5 different ways, for a total of 35 different face subdivisions.

Figure 48: Example of a greedy search on a 7-sided face: at the first step the planarity measure is computed for the 7 quads $q_1, q_2, \ldots, q_7$. Only for the best solution, in this example $q_7$, is the resulting 5-sided face considered.


To this end we can consider that the least squares plane passes through the centroid [27] $c = \frac{1}{4}\sum_{i=1}^{4} p_i$ of the four points, and we can build the $3 \times 3$ covariance matrix:

$$S = \sum_{i=1}^{4} (p_i - c)(p_i - c)^T.$$

The left singular vector corresponding to the smallest singular value of the SVD decomposition of $S$ represents the solution of the least squares problem, i.e. the normal of the plane that minimizes the sum of the squared orthogonal distances of the points from the plane:

$$\sum_{i=1}^{4} d^2(\pi, p_i) = \sum_{i=1}^{4} \frac{\left((p_i - c) \cdot n\right)^2}{\|n\|_2^2}.$$
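As a sketch, and assuming the Eigen library, the fit can be computed directly from the SVD of the covariance matrix (Eigen sorts singular values in decreasing order, so the last column of U is the sought normal):

```cpp
#include <Eigen/Dense>

// Fit the least squares plane (point c, unit normal n) to four points,
// assuming the Eigen library. The normal is the left singular vector of the
// covariance matrix S associated with its smallest singular value.
void leastSquaresPlane(const Eigen::Matrix<double, 3, 4>& p,
                       Eigen::Vector3d& c, Eigen::Vector3d& n) {
    c = p.rowwise().mean();                      // centroid of the 4 points
    Eigen::Matrix3d S = Eigen::Matrix3d::Zero();
    for (int i = 0; i < 4; ++i) {
        Eigen::Vector3d d = p.col(i) - c;
        S += d * d.transpose();                  // covariance matrix
    }
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(S, Eigen::ComputeFullU);
    n = svd.matrixU().col(2);                    // smallest singular value
}
```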

6.1.2 Mean Normal Plane

Also in this case, we start from the knowledge that the centroid of the four points lies in the plane that "best" fits them, and we proceed as follows:

• Compute the four normal vectors $n_1, n_2, n_3, n_4$ of the four possible triangles that we can build in the quad.
• Compute the mean normal: $\bar{n} = \frac{1}{4}\sum_{i=1}^{4} n_i$
• Define the mean normal plane as: $(p - c) \cdot \bar{n} = 0$
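A corresponding sketch, again assuming Eigen and a consistent (e.g. counter-clockwise) vertex ordering so that the four triangle normals point to the same side:

```cpp
#include <Eigen/Dense>

// Mean normal plane of a (possibly non-planar) quad p.col(0..3). The four
// triangles of the quad are exactly the corner triangles (p[i-1], p[i], p[i+1]).
void meanNormalPlane(const Eigen::Matrix<double, 3, 4>& p,
                     Eigen::Vector3d& c, Eigen::Vector3d& n) {
    c = p.rowwise().mean();                    // centroid lies on the plane
    n.setZero();
    for (int i = 0; i < 4; ++i) {
        Eigen::Vector3d a = p.col((i + 1) % 4) - p.col(i);
        Eigen::Vector3d b = p.col((i + 3) % 4) - p.col(i);
        n += a.cross(b).normalized();          // normal of the corner triangle
    }
    n = (n / 4.0).normalized();                // mean normal
}
```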

6.1.3 3D Hough Transform

Reasoning about the possible ways to find a plane given a set of points, we have investigated the use of the Hough Transform [28].

In the bi-dimensional case, for the detection of lines in image analysis, the Hough Transform starts from the consideration that the line equation

$$y - mx - c = 0$$

expresses the mapping between a point $(m, c)$ in the parameter space and the line with parameters $(m, c)$ in image space. Conversely, given a point in the image plane, the same equation represents the mapping between the point $(x, y)$ in image space and the infinitely many points in the parameter space, each of which represents a pair of parameters $(m, c)$ of a line passing through $(x, y)$ (see Figure 49 for an example).


Figure 49: Each point $(x, y)$ in image space is mapped to the line of equation $y - mx - c = 0$ in the parameter space

A typical implementation of a line detection algorithm⁸ proceeds by quantizing the parameter space (represented in memory by a bi-dimensional array called the accumulator); for each image point $(x, y)$ that belongs to an edge, the equation $y - mx - c = 0$ is evaluated and the corresponding accumulator cells are incremented. At the end of the evaluation the algorithm searches for peaks in the accumulator, and the corresponding parameters $(m_h, c_h)$ represent candidate straight lines in the image (see Figure 50).

In this form the Hough transform cannot really be used, because the parameter $m$ is unbounded and the accumulator cannot be implemented. The solution is to use the normal parametrization of the line:

$$\rho = x \cos \theta + y \sin \theta$$

where $\rho$ is the orthogonal distance of the line from the origin and $\theta$ is the angle between the x-axis and the vector normal to the line. Using this representation, the image points $(x, y)$ are mapped into sinusoidal curves in the parameter space $(\rho, \theta)$.

The same principles can easily be extended to the three-dimensional case by representing a plane

$$ax + by + cz = d$$

⁸The Hough Transform can also be used to detect other shapes for which an analytical representation is known, and even free-form shapes through the Generalized Hough Transform.


Figure 50: The discretized parameter space and the corresponding votes in the accumulator array

Figure 51: The Hough transform for lines expressed in normal parametrization


in Hessian Normal Form:

$$n \cdot P = \rho \quad \text{where} \quad n = \begin{pmatrix} n_x \\ n_y \\ n_z \end{pmatrix} = \begin{pmatrix} \frac{a}{\sqrt{a^2+b^2+c^2}} \\ \frac{b}{\sqrt{a^2+b^2+c^2}} \\ \frac{c}{\sqrt{a^2+b^2+c^2}} \end{pmatrix} \quad \text{and} \quad \rho = \frac{d}{\sqrt{a^2+b^2+c^2}}.$$

Here $n$ is a unit normal vector and $\rho$ is the distance of the plane from the origin. The form $n \cdot P = \rho$ derives from $n \cdot (P - P_0) = 0$: if we choose $P_0$ as the intersection of the plane with the line passing through the origin and parallel to the normal direction $n$, we have

$$n \cdot (P - P_0) = 0 \;\Rightarrow\; n \cdot P = n \cdot P_0 \;\Rightarrow\; n \cdot P = |n||P_0| \cos(0) \;\Rightarrow\; n \cdot P = \rho.$$

If we express the unit normal vector in spherical coordinates:

$$n = \begin{pmatrix} n_x \\ n_y \\ n_z \end{pmatrix} = \begin{pmatrix} \sin \theta \cos \phi \\ \sin \theta \sin \phi \\ \cos \theta \end{pmatrix}$$

then

$$\rho = x \sin \theta \cos \phi + y \sin \theta \sin \phi + z \cos \theta.$$

Similarly to the 2D case, we can consider this equation as a mapping between the 3D point $(x, y, z)$ and the surface of equation:

$$\rho = f(\theta, \phi) = x \sin \theta \cos \phi + y \sin \theta \sin \phi + z \cos \theta$$

A plane detection algorithm could exploit this knowledge by evaluating the surface on a quantized $(\theta, \phi)$ space, storing the "votes" in a 3-dimensional accumulator array and searching for peaks in this array.
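A voting sketch with the O(N·Q²) cost discussed later in this section might look as follows (Q, rhoMax and the flattened accumulator layout are assumptions):

```cpp
#include <cmath>
#include <vector>

struct Point3 { double x, y, z; };

// For every point, evaluate rho = f(theta, phi) on a Q x Q grid of (theta,
// phi) and increment the accumulator cell of the quantized rho. rhoMax
// bounds |rho|; Q is the number of quantization intervals per axis.
std::vector<int> houghVote(const std::vector<Point3>& pts,
                           int Q, double rhoMax) {
    const double PI = 3.14159265358979323846;
    std::vector<int> acc(Q * Q * Q, 0);   // [theta][phi][rho], flattened
    for (const Point3& p : pts)
        for (int it = 0; it < Q; ++it) {
            double th = PI * (it + 0.5) / Q;            // theta in (0, pi)
            for (int ip = 0; ip < Q; ++ip) {
                double ph = 2.0 * PI * (ip + 0.5) / Q;  // phi in (0, 2*pi)
                double rho = p.x * std::sin(th) * std::cos(ph)
                           + p.y * std::sin(th) * std::sin(ph)
                           + p.z * std::cos(th);
                int ir = (int)((rho + rhoMax) / (2.0 * rhoMax) * Q);
                if (ir >= 0 && ir < Q) ++acc[(it * Q + ip) * Q + ir];
            }
        }
    return acc;  // cells with >= 3 votes mark planes through >= 3 points
}
```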

But how can we exploit this knowledge to efficiently retrieve a reference plane, or a planarity measure, for four given vertices?

We know that, given a point $p_1 = (x_1, y_1, z_1)$, we can compute the corresponding surface $f_1(\theta, \phi)$, and each point in the parameter space that lies on this surface represents a triplet defining one of the possible planes passing through the point $(x_1, y_1, z_1)$.

Given a second point $p_2 = (x_2, y_2, z_2)$, the corresponding surface $f_2(\theta, \phi)$ intersects $f_1(\theta, \phi)$ in a curve in the parameter space, and the points of this curve define the planes passing through the line joining the two points $p_1, p_2$, i.e. all planes passing through both $p_1$ and $p_2$.

Given a third point $p_3 = (x_3, y_3, z_3)$, the corresponding surface $f_3(\theta, \phi)$ intersects both $f_2(\theta, \phi)$ and $f_1(\theta, \phi)$, giving two more curves in the parameter space. The point of intersection of the three curves defines the plane passing through the points $p_1$, $p_2$ and $p_3$. Figure 54 shows that if the parameters $\rho, \theta, \phi$ vary in $[-\rho_{max}, \rho_{max}]$, $[0, \pi]$, $[0, 2\pi]$, the curves intersect in two points that define the same plane but with opposite normal directions (inside/outside).


Figure 52: Visualization of a surface in the parameter space deriving from the 3D Hough transform of a single point in space. Every point of this surface identifies a plane that passes through this 3D point. The image was generated using Octave

Figure 53: The 3D Hough transform of a point as visualized in the test application used to experiment with the 3D Hough transform. Every point of this surface identifies a plane that passes through this 3D point


Figure 54: The curves associated with the 3D Hough transforms of three points intersect in the parameter space in two points that represent the same plane passing through the three points

Now, if we consider the fourth surface $f_4(\theta, \phi)$ associated with the point $p_4$, we have six curves⁹ that intersect in a single point if and only if the four points $p_1, p_2, p_3, p_4$ are coplanar. If not, considering that with four points we can define four different triangles, and thus four different planes, we will have four points of intersection.

The idea is to exploit and analyze the distances between these four points in the Hough parameter space $(\rho, \theta, \phi)$ to compute a measure of planarity for a non-planar quad.

One major problem arising from this method is the choice of the resolution of the parameter space, which considerably affects both the effectiveness and the efficiency of the method. Normally, given $N$ points and quantizing the parameter space $(\rho, \theta, \phi)$ evenly in each direction with $Q$ intervals, we need to evaluate $N$ surfaces on the $(\theta, \phi)$ space, quantized in $Q^2$ points, with a time complexity of $O(N \cdot Q^2)$ for the computation of the accumulator array.

One possible solution to this problem is to use a "dynamic quantization and evaluation" in which we search for surface intersections by iteratively increasing the parameter space resolution and evaluating only where necessary, in a way similar to how the octree spatial data structure works.

For instance, since we are searching for cells in the accumulator that have at least three votes (i.e. they are the intersection of three or more surfaces), we can do the following:

• Starting from a $2 \times 2 \times 2$ 3D grid, evaluate the 4 surfaces in the 4 points of the $(\theta, \phi)$ space.
• Each cell with at least three votes is refined by subdividing it into eight octants.
• In the next step the 4 surfaces are re-evaluated, but only in correspondence with the $(\theta, \phi)$ points of the newly subdivided cells.
• Proceed refining the grid and evaluating only the cells where at least three votes are found, until a termination condition is satisfied (e.g. a predefined number of steps).

⁹These curves represent the six pencils of planes passing through the six possible lines connecting the four points: $p_1p_2$, $p_2p_3$, $p_3p_4$, $p_4p_1$, $p_1p_3$, $p_2p_4$.

Figure 55: An example of possible $(\theta, \phi)$ space subdivision.

6.2 Refining the Tri-Quad Mesh

6.2.1 Bilinearly Blended Coons Patches

Coons patches are used to find a surface that interpolates four given boundary curves. In particular, the bilinearly blended Coons patch for four curves $c_1(u), c_2(u)$ and $d_1(v), d_2(v)$, with $u \in [0, 1]$ and $v \in [0, 1]$, is a surface $x(u, v)$ that has these four curves as boundary curves:

$$x(u, 0) = c_1(u), \quad x(u, 1) = c_2(u)$$
$$x(0, v) = d_1(v), \quad x(1, v) = d_2(v)$$

Using the four boundary curves we can define two ruled surfaces:

$$r_c(u, v) = (1 - v)\,x(u, 0) + v\,x(u, 1)$$

and

$$r_d(u, v) = (1 - u)\,x(0, v) + u\,x(1, v)$$

but neither of them interpolates all four boundary curves together, because they are defined as linear interpolations between $c_1(u), c_2(u)$ and $d_1(v), d_2(v)$ respectively. The bilinearly blended Coons patch eliminates the "interpolation failures" of the ruled surfaces by defining the patch $x(u, v)$ as:

$$x = r_c + r_d - r_{cd}$$


where $r_{cd}$ is the bilinear interpolant to the four corners:

$$r_{cd}(u, v) = \begin{bmatrix} 1-u & u \end{bmatrix} \begin{bmatrix} x(0, 0) & x(0, 1) \\ x(1, 0) & x(1, 1) \end{bmatrix} \begin{bmatrix} 1-v \\ v \end{bmatrix}$$

Expanding the terms of $x = r_c + r_d - r_{cd}$, the parametric bilinearly blended Coons patch is:

$$x(u, v) = \begin{bmatrix} 1-u & u \end{bmatrix} \begin{bmatrix} x(0, v) \\ x(1, v) \end{bmatrix} + \begin{bmatrix} x(u, 0) & x(u, 1) \end{bmatrix} \begin{bmatrix} 1-v \\ v \end{bmatrix} - \begin{bmatrix} 1-u & u \end{bmatrix} \begin{bmatrix} x(0, 0) & x(0, 1) \\ x(1, 0) & x(1, 1) \end{bmatrix} \begin{bmatrix} 1-v \\ v \end{bmatrix}$$

But why are Coons patches used in our reconstruction process, and how can we adapt them, since the curve network is made of polylines instead of parametric curves?

The answer to the first question is that we need a way to refine the polyline mesh underlying the curve network "drawn" by the user, with the purpose of producing a sufficiently good base mesh to be the input of the subdivision surface reconstruction step. Coons patches offer a simple and effective way to refine the polyline mesh that fits perfectly in our reconstruction pipeline, being efficient and, as we will see, usable in a recursive fashion.

Before considering the problem of applying Coons patches given four boundary polylines, we can first note that for our purposes it is only necessary to evaluate a small number of points on the bilinearly blended Coons patch; in fact we will only compute one or a few points on the surface because, since each patch associated with a face of the polyline mesh is computed independently of the neighboring faces, a finer evaluation of the Coons patch will not in general produce a smooth surface.

Given the four boundary polylines, which for simplicity of notation all have the same number of points $n$:

$$c_1 = \{p^{c_1}_1, p^{c_1}_2, \ldots, p^{c_1}_n\}$$
$$c_2 = \{p^{c_2}_1, p^{c_2}_2, \ldots, p^{c_2}_n\}$$
$$d_1 = \{p^{d_1}_1, p^{d_1}_2, \ldots, p^{d_1}_n\}$$
$$d_2 = \{p^{d_2}_1, p^{d_2}_2, \ldots, p^{d_2}_n\}$$

we need to find the points on the polylines $x(u, 0)$, $x(u, 1)$, $x(0, v)$, $x(1, v)$ in order to obtain a point $x(u, v)$ on the Coons surface.

To this end, given a polyline $P = \{p_1, p_2, \ldots, p_n\}$, define the length $l$ of the polyline as:

$$l = \sum_{i=1}^{n-1} \|p_{i+1} - p_i\|$$

We can then associate with each point $p_i$ a corresponding parameter $t_i \in [0, 1]$:

$$t_i = \frac{1}{l} \sum_{j=1}^{i-1} \|p_{j+1} - p_j\|$$

Since the "discretely parametrized" polyline can only be evaluated at the points $t_i$, $i = 1, \ldots, n$, which point $P(t)$ should be associated with a general parameter value $t$? A possible answer is: the point $P(t_j)$ for which

$$|t - t_j| \le |t - t_i| \quad \forall i$$

Knowing the parametrization of a polyline, we can finally compute a point on the surface patch with:

$$x(u, v) = \begin{bmatrix} 1-u & u \end{bmatrix} \begin{bmatrix} d_1(v) \\ d_2(v) \end{bmatrix} + \begin{bmatrix} c_1(u) & c_2(u) \end{bmatrix} \begin{bmatrix} 1-v \\ v \end{bmatrix} - \begin{bmatrix} 1-u & u \end{bmatrix} \begin{bmatrix} x(0, 0) & x(0, 1) \\ x(1, 0) & x(1, 1) \end{bmatrix} \begin{bmatrix} 1-v \\ v \end{bmatrix} \tag{1}$$
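A self-contained sketch of this evaluation, combining the chord-length parametrization above with formula (1); the Point3 type and the orientation convention of the four polylines (c1 and d1 starting at the same corner) are assumptions:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };
Point3 operator+(Point3 a, Point3 b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
Point3 operator-(Point3 a, Point3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
Point3 operator*(double s, Point3 a) { return {s*a.x, s*a.y, s*a.z}; }

// Evaluate a chord-length parametrized polyline at t in [0,1] by returning
// the stored point whose parameter t_i is closest to t, exactly as defined
// above (no interpolation). Assumes at least two distinct points.
Point3 evaluate(const std::vector<Point3>& p, double t) {
    std::vector<double> ti(p.size(), 0.0);
    for (std::size_t i = 1; i < p.size(); ++i) {
        Point3 d = p[i] - p[i-1];
        ti[i] = ti[i-1] + std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < p.size(); ++i)
        if (std::fabs(t - ti[i]/ti.back()) < std::fabs(t - ti[best]/ti.back()))
            best = i;
    return p[best];
}

// Formula (1): a point of the bilinearly blended Coons patch from the four
// boundary polylines c1 = x(u,0), c2 = x(u,1), d1 = x(0,v), d2 = x(1,v).
Point3 coonsPoint(const std::vector<Point3>& c1, const std::vector<Point3>& c2,
                  const std::vector<Point3>& d1, const std::vector<Point3>& d2,
                  double u, double v) {
    Point3 rd = (1-u)*evaluate(d1, v) + u*evaluate(d2, v);
    Point3 rc = (1-v)*evaluate(c1, u) + v*evaluate(c2, u);
    Point3 rcd = (1-u)*((1-v)*c1.front() + v*c2.front())
               + u*((1-v)*c1.back() + v*c2.back());   // corner interpolant
    return rc + rd - rcd;
}
```

The refinement step described next would then call something like coonsPoint(c1, c2, d1, d2, 0.5, 0.5) to obtain the "Coons point" of a face.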

Formula (1) has been used in the implementation of the reconstruction step that transforms the Tri-Quad Mesh into the refined Base Mesh, as follows¹⁰:

For each face, compute the "Coons point" $C_p = x(\frac{1}{2}, \frac{1}{2})$ on the bilinearly blended Coons patch, using the polylines $c_1(u), c_2(u), d_1(v), d_2(v)$ associated with the edges of the face, and connect the new point $C_p$ to the points $c_1(\frac{1}{2}), c_2(\frac{1}{2}), d_1(\frac{1}{2}), d_2(\frac{1}{2})$, creating 4 new edges that subdivide the face into 4 new quad faces. Let's call the polyline mesh output of this step BaseMesh.

If we only want to refine the tri-quad mesh with a single step of the Coons algorithm, this is all that we need, and we can proceed to the next step of the reconstruction pipeline using BaseMesh as input for the subdivision surface algorithm. But if we want to apply another refinement step, we note that the newly created edges of BaseMesh do not have an associated polyline, so the points $c_1(\frac{1}{2}), c_2(\frac{1}{2}), d_1(\frac{1}{2}), d_2(\frac{1}{2})$ cannot be determined.

4,12 ), x( 1

2,34 ), x( 3

4,12 ), x( 1

2,14 ) and to create

the four new polylines Pnew1 , Pnew

2 , Pnew3 , Pnew

4 as:

10A triangular face with border polylines c1, c2, c3 is handled in a simpler way by considering the centroid of the three points c1(½),c2(½), c3(½)

72

Page 74: An active stereovision system for 3D shape reconstruction using

6.3 Smooth Surface Reconstruction Using Subdivision6 FROM POLYLINE MESH TO SMOOTH SURFACES

Pnew1 =

{Cp, x(

12

,14), c1(

12)}

Pnew2 =

{Cp, x(

12

,34), c2(

12)}

Pnew3 =

{Cp, x(

14

,12), d1(

12)}

Pnew4 =

{Cp, x(

34

,12), d2(

12)}

If we want to apply more than two refinement steps, we simply have to include more points sampled from the Coons patch. In particular, if we need $n$ refinement steps, the surface $x(u, v)$ has to be evaluated at:

$$x\!\left(\frac{i}{2n}, \frac{1}{2}\right), \quad i = 1, 2, \ldots, 2n-1$$
$$x\!\left(\frac{1}{2}, \frac{j}{2n}\right), \quad j = 1, 2, \ldots, 2n-1$$

Since $x(\frac{i}{2n}, \frac{1}{2}) = x(\frac{1}{2}, \frac{j}{2n})$ for $i = j = n$, we need $4n-3$ points on the Coons patch.

Techniques for utilizing Coons patches on triangles exist, but we have temporarily implemented in our system a simpler solution for refining triangular faces: the computation of the "Coons point" is replaced by the computation of the centroid of the three midpoints of the border polylines, where "midpoint" is intended as before.

This solution, apart from refining the patch, also always produces quad faces as output.

6.3 Smooth Surface Reconstruction Using Subdivision

After the refinement step through bilinearly blended Coons patches, our objective is to produce a smooth surface that represents our physical object.

A popular smooth surface representation is the tensor product NURBS. However, NURBS can only represent surfaces of arbitrary topological type by partitioning the model into a collection of individual NURBS patches. Adjacent patches must then be explicitly stitched together using geometric continuity conditions. A large number of parameters are therefore introduced, most of which are constrained by the continuity conditions; as a consequence, fitting NURBS in general requires high-dimensional constrained optimization.

Subdivision surfaces offer a valid alternative for the representation of a smooth surface, and the basic idea can be summarized as follows [30]:

"Subdivision defines a smooth curve or surface as the limit of a sequence of successive refinements."

Figure 56 shows an example of recursive refinement in the case of a curve connecting a number of initial points in the plane. The initial 4 points, connected through straight line segments, are used to compute 3 more points "in between" the old ones. Repeating the process yields a smoother-looking piecewise linear curve, and after one more repetition the curve already starts to look quite smooth.

Figure 56: An example of curve subdivision

Figure 57: An example of a sequence of successive refinements on an initial coarse mesh

The same approach can be used in 3D, and an example of subdivision for surfaces is shown in Figure 57. In this case each triangle in the original mesh is split into 4 new triangles, quadrupling the number of triangles in the mesh. Applying the same subdivision rule once again gives the mesh on the right. Both of these examples show what is known as interpolating subdivision.

How were the new points determined? One could imagine many ways to decide where the new points should go. Clearly, the shape and smoothness of the resulting curve or surface depend on the chosen rule.

There is a straightforward way to classify most subdivision schemes, based on four criteria:

• the type of refinement rule (face split or vertex split)
• the type of generated mesh (triangular or quadrilateral)
• whether the scheme is approximating or interpolating
• the smoothness of the limit surfaces

For the purpose of this thesis, and in particular for the completion of our reconstruction pipeline, we will only consider Catmull-Clark subdivision surfaces which, starting from an arbitrary polygonal mesh, produce at each step a refined quad mesh through face splitting rules; the limit surface is C2-continuous except at extraordinary vertices, where it is C1-continuous.

Figure 58: Three consecutive iterations of the Catmull-Clark subdivision for a quad mesh representing a cube

The Catmull-Clark subdivision, applied to the base mesh $M = (F, E, V)$ output of the previous reconstruction step, recursively refines the mesh using the following rules:

• for each $f$ in $F$, compute the face point $f_p$ as the centroid of all the vertices of the face:

$$f_p = \frac{1}{n} \sum_{i=1}^{n} v_i$$

• for each $e$ in $E$, compute the edge point $e_p$ as the average of the endpoints of the edge and the newly computed face points of the adjacent faces:

$$e_p = \frac{v_1 + v_2 + f_1 + f_2}{4}$$

• move each $v$ in $V$ to the new position:

$$\frac{k-2}{k}\, v + \frac{1}{k^2} \sum_{i=1}^{k} f_{p_i} + \frac{1}{k^2} \sum_{i=1}^{k} e_{m_i}$$

where $k$ is the valence of the vertex and $e_{m_i}$ is the midpoint of the $i$-th incident edge (the average of its endpoints).

The mesh is reconnected using the following method:

• Each new face point is connected to the new edge points of the edges defining the original face.
• Each new vertex point is connected to the new edge points of all original edges incident on the original vertex point.

Let's review the process on the simple mesh composed of four triangular faces in Figure 59-a):


Figure 59: An example of a Catmull-Clark step: a) the original mesh, b) face points, c) edge points, d) new vertex position (which in the 2D visualization overlaps with the old vertex position), e) the new refined mesh.

First, we construct the face points, calculated as the average of the vertices of each face. These points are shown in Figure 59-b), labeled with an F. Then the edge points are computed as the average of four points: the two vertices at the endpoints of the edge and the two new face points of the faces adjacent to the edge. The edge points are shown in Figure 59-c), labeled with an E. Then we move the central vertex to its new position using the formula described above and build the new "refined" mesh, connecting the new face points to the new edge points adjacent to each face, and the moved vertex to the edge points.
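For concreteness, a sketch of the three point rules on a closed mesh follows (illustrative index-based mesh representation; boundary handling and the reconnection into the refined quad mesh are omitted):

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct P3 {
    double x = 0, y = 0, z = 0;
    P3& operator+=(const P3& o) { x += o.x; y += o.y; z += o.z; return *this; }
};
P3 operator*(double s, const P3& a) { return {s * a.x, s * a.y, s * a.z}; }
P3 operator+(P3 a, const P3& b) { a += b; return a; }

struct Mesh {
    std::vector<P3> verts;
    std::vector<std::vector<std::size_t>> faces;  // vertex indices, ring order
};

void catmullClarkPoints(const Mesh& m,
                        std::vector<P3>& facePts,
                        std::map<std::pair<std::size_t, std::size_t>, P3>& edgePts,
                        std::vector<P3>& newVerts) {
    // Face points: centroid of each face.
    facePts.assign(m.faces.size(), P3{});
    for (std::size_t f = 0; f < m.faces.size(); ++f) {
        for (std::size_t v : m.faces[f]) facePts[f] += m.verts[v];
        facePts[f] = (1.0 / m.faces[f].size()) * facePts[f];
    }
    // Collect, for every edge, the (two, on a closed mesh) adjacent faces.
    std::map<std::pair<std::size_t, std::size_t>, std::vector<std::size_t>> edgeFaces;
    for (std::size_t f = 0; f < m.faces.size(); ++f)
        for (std::size_t i = 0; i < m.faces[f].size(); ++i) {
            std::size_t a = m.faces[f][i], b = m.faces[f][(i + 1) % m.faces[f].size()];
            edgeFaces[{std::min(a, b), std::max(a, b)}].push_back(f);
        }
    // Edge points: average of the two endpoints and the two face points.
    for (auto& [e, fs] : edgeFaces)
        if (fs.size() == 2)
            edgePts[e] = 0.25 * (m.verts[e.first] + m.verts[e.second]
                                 + facePts[fs[0]] + facePts[fs[1]]);
    // Vertex points: (k-2)/k * v + sum(fp)/k^2 + sum(em)/k^2, k = valence.
    std::vector<int> valence(m.verts.size(), 0);
    std::vector<P3> fSum(m.verts.size()), mSum(m.verts.size());
    for (auto& [e, fs] : edgeFaces) {
        P3 mid = 0.5 * (m.verts[e.first] + m.verts[e.second]);
        for (std::size_t v : {e.first, e.second}) { ++valence[v]; mSum[v] += mid; }
    }
    for (std::size_t f = 0; f < m.faces.size(); ++f)
        for (std::size_t v : m.faces[f]) fSum[v] += facePts[f];
    newVerts.assign(m.verts.size(), P3{});
    for (std::size_t v = 0; v < m.verts.size(); ++v) {
        double k = valence[v];
        newVerts[v] = ((k - 2.0) / k) * m.verts[v]
                    + (1.0 / (k * k)) * (fSum[v] + mSum[v]);
    }
}
```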

The Catmull-Clark subdivision scheme is not an interpolating scheme, and in fact vertices of the coarser mesh are not vertices of the refined mesh. Interpolation could be an attractive feature for our reconstruction purposes, because the original vertices defining the base mesh would also be points of the limit surface. Since the vertices of the base mesh are points of the curve network acquired on the physical object, the ideal situation would be to interpolate these points, and also the other points of the polyline associated with each edge of the base mesh. Unfortunately, the quality of interpolating subdivision surfaces is in general not as high as the quality of surfaces produced by approximating schemes, and the presence of noise in the data could easily worsen the situation. These considerations, together with time constraints for the development of an ad-hoc subdivision scheme interpolating the curve network, led us to the decision to use Catmull-Clark subdivision surfaces.


Figure 60: The unmodified boat's keel quad mesh.

7 Experimental Results

In this chapter we will see some experimental results of our fast interactive reverse engineering system. In particular we will focus on the reconstruction and visualization steps of the incremental acquisition pipeline, to better visualize, understand and compare the results and the expected efficacy of the chosen solutions.

Regarding the measuring step, we only achieved very rough results, because during the development of the software layer responsible for the 3D tracking we encountered several problems, primarily deriving from the wiimote camera calibration. Without an accurately calibrated camera, the accuracy of the 3D tracking system is notably compromised.

Since the reconstruction and visualization layers take a curve network as input, but the measuring step is currently not mature enough to produce a usable curve network, we simulate an acquisition process by building a curve network from an available polygon mesh and augmenting it by associating randomly generated polylines with the polygon mesh edges.

The use of synthetically generated curve networks has an advantage: it allows us to compare, at least visually, the result of the reconstruction from curve networks with the original polygon mesh, or with the smooth surface obtainable through subdivision of the original mesh.

To better understand how these curve networks are generated, we will see an example starting from a polygon mesh representing a boat's keel. In Figure 60 the original mesh is shown from different views.

By eliminating some edge loops from the mesh we obtain a less refined mesh that contains only minimal topological information. The original mesh is composed only of four-sided planar polygons, while the unrefined mesh contains general n-gons. In Figure 61 we can see the unrefined boat keel mesh, with the vertices represented as violet dots, the edges in blue and the border edges in red.

Figure 61: Unrefined boat's keel mesh.

Figure 62: A close-up of the keel curve network randomly generated from the unrefined mesh by linearly interpolating each edge in a fixed number of points and adding 2 and 4 percent of noise, respectively, in images a) and b)

To actually build a curve network from the unrefined mesh, we associate with each edge a polyline generated through a linear interpolation of the endpoints of the edge and, to simulate the inaccuracies of the acquisition process, we add a variable percentage of noise.
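A sketch of how one such noisy polyline might be generated (the sampling and noise parameters are assumptions, not the exact values used for the figures):

```cpp
#include <cmath>
#include <random>
#include <vector>

struct Point3 { double x, y, z; };

// Synthesize the polyline of one mesh edge: sample `count` (>= 2) points by
// linear interpolation between the endpoints a and b, then perturb each
// interior point with uniform noise whose amplitude is `noisePct` percent
// of the edge length. The endpoints stay exact so the mesh topology holds.
std::vector<Point3> syntheticPolyline(const Point3& a, const Point3& b,
                                      int count, double noisePct,
                                      std::mt19937& rng) {
    double len = std::sqrt((b.x-a.x)*(b.x-a.x) + (b.y-a.y)*(b.y-a.y)
                           + (b.z-a.z)*(b.z-a.z));
    double amp = len * noisePct / 100.0;
    std::uniform_real_distribution<double> noise(-amp, amp);
    std::vector<Point3> p(count);
    for (int i = 0; i < count; ++i) {
        double t = double(i) / (count - 1);
        p[i] = {a.x + t*(b.x-a.x), a.y + t*(b.y-a.y), a.z + t*(b.z-a.z)};
        if (i > 0 && i < count - 1) {
            p[i].x += noise(rng); p[i].y += noise(rng); p[i].z += noise(rng);
        }
    }
    return p;
}
```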

In Figure 62 we can see a close-up of the curve network generated with different amounts of noise.

To more accurately match the characteristics of a possible curve network output by the measuring process, we have to eliminate all the vertices with valence 2 that are not on the border polygon. In fact, as explained in chapter 5.4, the user can only draw new curves that start and end on an existing polyline, so vertices shared by only two edges cannot exist. To resolve this situation we "collapse" each vertex with valence 2 by joining the polylines associated with the two edges that share it. Figure 63 shows how the vertex collapse step modifies the topology associated with the curve network while preserving the polylines.

The first experiment for testing our reconstruction pipeline is on a curve network representing a torus (see Figure 64). This curve network comes from the unrefinement of the quad mesh shown in Figure 65.

The first step of our rendering pipeline consists in transforming n-sided faces, with n > 4, into faces made of triangles and quads. Since the process of synthetically generating a curve network always starts from a quad mesh and, after the unrefining and vertex collapsing steps, produces a curve network with an underlying polyline mesh with quad topology, we will consider the triquadrification step in a separate section; it also deserves a careful analysis, being the first processing applied to the data and thus notably influencing the final result.

Figure 63: In the left image there are three vertices, marked with circles, each linked to only two edges. The vertex collapse step eliminates these 3 vertices and joins the 4 edges into a single one that keeps the reference to the joined polylines

Figure 64: Curve network representing a torus

Figure 65: The original torus quad mesh

Figure 66: Polyline mesh refinement through Coons patches. Image a) represents the initial curve network. Images b) and c) are the curve networks output by the application of one and two Coons refinement steps respectively (the lines in red are the newly computed edges). Image d) is the smooth surface resulting from Catmull-Clark subdivision applied to the base mesh.

The subsequent step consists in the refinement of the single faces of the polyline mesh exploiting the bilinearly blended Coons patches. This refinement step, as explained in chapter 6.2, computes a new point for each face and builds four new quads if the face is four-sided, or three new quads if it is three-sided, by connecting the new point to the midpoints of the polylines associated with each edge. The new polyline mesh, output of the refinement, is called the base mesh.

For example, Figures 66-b) and 66-c) show two steps of refinement applied to the torus polyline mesh of Figure 66-a).

From the base mesh our process produces a smooth surface through subdivision, in particular through Catmull-Clark subdivision. The smooth surface resulting from the recursive application of six subdivision steps on the previously created base mesh is shown in Figure 66-d). The quality of the surface output by the subdivision process, as currently implemented, depends only on the vertices and faces of the base mesh and not on the polylines associated with the edges; in fact, the computation of the face points, edge points and vertex points of the Catmull-Clark subdivision does not take into account the points of the polylines, except for the starting and ending points, i.e. the vertices.


Figure 67: Interactive surface sketching and the reconstruction pipeline

The importance of the real-time visual feedback during the process of interactive surface sketching can be better understood if we visualize the progressive creation of a curve network.

Supposing that the user has drawn the curve network of Figure 67-a), the reconstruction pipeline proceeds by applying two steps of the Coons algorithm, producing the base mesh of Figure 67-b), and then the smooth surface of Figure 67-d) through Catmull-Clark subdivision. The result of the first iteration of the Catmull-Clark subdivision is shown in Figure 67-c).

As described in chapter 6, the user is able to see both the current curve network and the smooth surface. This combined visual feedback allows the user to identify parts of the object that need to be refined, in order to decide where to draw the next curve. The terms "Fast and Interactive Reverse Engineering" and "interactive surface sketching" have been used often in this thesis to emphasize precisely this characteristic of our system, which is rarely available in other reverse engineering solutions, where visual feedback on the smooth surface is only available after the physical object has been completely scanned and all data processing and 3D surface reconstruction is completed.

Figures 68 and 69 show two possible iterations of the reconstruction pipeline, where we simulate that the user has progressively drawn more curves on the object through interactive surface sketching.

The decision as to when the reconstructed 3D model is sufficiently accurate is up to the user alone, who can choose, based on his needs, whether to proceed with the refinement or to stop the reverse engineering process.

Another advantage of the real-time visual feedback is that it gives the user the ability to detect, at each iteration of the process, possible errors caused by the instruments or directly by the user. These kinds of errors can be easily spotted and corrected. For example, Figure 70 shows a curve network representing a shoe in which the curve visible in the rear part of the shoe is incorrect.


Figure 68: Interactive surface sketching and the reconstruction pipeline

Figure 69: Interactive surface sketching and the reconstruction pipeline


Figure 70: The visual feedback at each iteration of the reverse engineering pipeline allows easy detection of problems in the acquisition of a curve network.

Figure 71: The original shoe mesh

Let's see some other examples of our reconstruction pipeline, starting from the correct curve network representing a woman's shoe (Figure 72). The polyline mesh associated with the curve network contains much less topological information than the original mesh (Figure 71) but, despite that, the reconstruction pipeline does a good job of recovering the missing information from the points of the polylines associated with the polyline mesh.

Figure 72: Reconstruction from a curve network representing a woman's shoe.

84

Page 86: An active stereovision system for 3D shape reconstruction using

7 EXPERIMENTAL RESULTS

Figure 73: Example of reconstruction from a curve network representing an airplane

Figure 74: Example of reconstruction from a curve network representing a pawn. Since the starting curves present corners, the surface resulting from the reconstruction process is affected accordingly

The two examples of Figure 73 and Figure 74 start from curve networks generated by unrefining meshes that were designed specifically for the application of approximating subdivision surfaces. We can see, especially in the second figure, that the resulting surface inherits the sharp features of the curve network, because the bilinearly blended Coons patches, unlike an approximating subdivision surface scheme, interpolate some points of the polylines. The result of the reconstruction of the mechanical piece shown in Figure 75, where the polylines of the starting curve network are rather smooth, tells us that it is possible to achieve very good results, and underlines the importance of a correct acquisition of the curve network, both in terms of accuracy of the 3D scanning system and in terms of the user's responsibility in the choice of the curves that principally characterize the shape of the object.

The algorithm that produces tri and quad faces from n-sided faces, as presented in chapter 6.1, suffers from a major problem that is independent of the planarity estimate used.


Figure 75: Example of reconstruction from a curve network representing a mechanical piece

When a suitable subdivision from an n-sided face to tri-quads is found, new edges have to be created. Since, during the analysis of the triquadrification problem, we considered only the mesh associated with the curve network, without considering the polylines, the problem of which polyline to associate with the new edges was not evident, and it was solved by linearly interpolating the endpoints of each new edge. This is unfortunately a fundamental issue, because the new edge and its associated polyline are used in the next steps of the reconstruction.

In particular, the Coons algorithm applied to each face interpolates the border polylines of each face (thus also the new polylines generated by the triquadrification algorithm) and the result, as can be seen in Figure 76, is obviously wrong.

These issues were only detected during the final experimentation of our reconstruction technique because they become evident only after applying the subsequent steps of the reconstruction process. As said, the problem arises from the choice of the polylines associated with the new edges, so a method for computing the new points of these polylines has to be found in order to perform the scheduled reconstruction steps of our reverse engineering pipeline. Unfortunately, a solution to this problem has not been found; a possible idea could be to merge the steps of triquadrification and refinement so as to exploit the bilinearly blended Coons patches in the computation of the new points of the polylines.

The proposed triquadrification methods remain an effective solution when applied to a polygon mesh not augmented with polylines, whenever each face needs to be subdivided into triangles and quadrilaterals with a good solution with respect to a measure of planarity of the newly generated faces.
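
For reference, here is a minimal C++ sketch of one possible planarity measure, in the spirit of the Mean Normal Plane idea mentioned in Figure 76 (the names and the exact formulation are ours, for illustration, and not necessarily the estimate of Section 6.1): the normals of the triangles fanned from the face centroid are summed into a mean normal, and the planarity of the face is the largest distance of its vertices from the plane through the centroid with that normal.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec3 { double x, y, z; };

    static Vec3 sub(const Vec3& a, const Vec3& b) {
        return { a.x - b.x, a.y - b.y, a.z - b.z };
    }
    static Vec3 cross(const Vec3& a, const Vec3& b) {
        return { a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
    }
    static double dot(const Vec3& a, const Vec3& b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    // Illustrative planarity estimate for an n-sided face: sum the
    // (area-weighted) normals of the triangles fanned from the centroid,
    // then return the maximum distance of a vertex from the plane through
    // the centroid with that mean normal. 0 means perfectly planar.
    double planarityDeviation(const std::vector<Vec3>& v)
    {
        const std::size_t n = v.size();
        if (n < 3) return 0.0;                      // not a face

        Vec3 c { 0.0, 0.0, 0.0 };
        for (const Vec3& p : v) { c.x += p.x / n; c.y += p.y / n; c.z += p.z / n; }

        Vec3 m { 0.0, 0.0, 0.0 };
        for (std::size_t i = 0; i < n; ++i) {
            const Vec3 t = cross(sub(v[i], c), sub(v[(i + 1) % n], c));
            m.x += t.x; m.y += t.y; m.z += t.z;
        }
        const double len = std::sqrt(dot(m, m));
        if (len == 0.0) return 0.0;                 // degenerate face
        m.x /= len; m.y /= len; m.z /= len;

        double worst = 0.0;
        for (const Vec3& p : v)
            worst = std::max(worst, std::fabs(dot(sub(p, c), m)));
        return worst;
    }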

8 Conclusions

Reverse engineering offers useful applications in many different fields, ranging from medicine to industrial design to movie special effects. Traditional reverse engineering involves two main steps: the measuring of the physical object and its reconstruction as a 3D virtual object.


Figure 76: The curve network (a) representing the pawn has this time been used without applying the vertex collapsing step, so the faces are general n-gons. (b) is the triquadrification of (a) using the Mean Normal Plane technique; (c) is the curve network after one step of the Coons algorithm; (d) is the surface produced by the Catmull-Clark subdivision iterations.

Since the output of a typical 3D scanning system is in the form of an unstructured point cloud, various techniques have been developed to solve the problem of reconstruction from such a point cloud. However, these techniques suffer from some common limitations:

• the strict separation of the two fundamental steps of measuring and reconstruction makes the overall process linear, non-iterative, and non-interactive;

• the often overwhelming number of points acquired and the lack of topological information in these data, combined with the presence of noise and inaccuracies, usually require complex and time-consuming solutions.

In this thesis we have presented a solution to the reverse engineering problem that tries to address these issues by involving the user in an iterative and interactive surface sketching process. The interactive surface sketching is made possible by inexpensive and readily available infrared cameras, used in a stereo rig to track the position of an infrared pen controlled by the user. Because the user follows an established set of rules, the sketching process itself provides topological information, which we exploit to reconstruct a smooth surface using a fast process based on subdivision surfaces. The rapidity of this process in turn allows us to give the user visual feedback on the ongoing reverse engineering process. Thanks to this, the user can intervene immediately in case of problems due to inaccuracies in the acquisition or errors in the reconstruction. Moreover, the user can draw new curves, focusing on the most important characteristics of the object and basing these choices on the nearly instantaneous visual feedback.

While this procedure seems very promising, it presents several difficulties that still need to be resolved: the active stereovision system lacks a rigorous calibration process, due to problems with the instruments used, and the reconstruction step needs to be further refined to improve the robustness of the results.
