
Modelling 3D Object Shape

Charlie Nash

Master of Science by Research
School of Informatics

University of Edinburgh
2015


Abstract

Significant progress has been made in recent years in computer vision tasks such as object detection, recognition and segmentation. However, full scene understanding, where the goal is to infer the 3D positions and poses of objects in a scene, as well as the location of the lighting source and camera, remains a challenging task. To this end a model of object shape can be very useful, whether as a means of generating richly-annotated training data for a recognition model, or as a component in an inverse-graphics system. Additionally, in computer graphics, content creation is a central task, and a shape model can be used to synthesize realistic objects that can be placed within a scene. However, shape modelling is challenging due to the high level of variability that can be present in an object class, and the difficulty of finding an appropriate shape representation. We present a system that can learn to generate novel instances of an object class using a collection of examples from that object class as training data. The system automatically obtains a landmark representation of objects from the object class, learns a shape model using the landmark representation, samples from this model, and generates a mesh that matches the shape of the landmark sample. We evaluate the system on a number of object classes, and demonstrate its ability to produce class instances that are both novel and realistic.


Acknowledgements

I would like to thank my supervisor, Professor Chris Williams, for his invaluable input and guidance throughout the course of this project. This work was carried out in the Centre for Doctoral Training in Data Science at The University of Edinburgh and I would like to thank both the students and the management team of the centre for crafting a high-quality program and supporting this research.

This work was supported in part by the EPSRC Centre for Doctoral Training in Data Science, funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016427/1) and the University of Edinburgh.


Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Charlie Nash)


Table of Contents

1 Introduction
  1.1 Overview
  1.2 Contributions
  1.3 Structure of the thesis

2 Background
  2.1 Representations of object shape
    2.1.1 Volumetric shape representation
    2.1.2 Landmark shape representation
    2.1.3 Discussion of shape representations
  2.2 Other related work

3 Datasets
  3.1 Vases
  3.2 Cars
  3.3 Teapots
  3.4 Software and Pre-processing
    3.4.1 Software
    3.4.2 Wavefront obj files
    3.4.3 Alignment
    3.4.4 Scaling

4 Establishing Point Correspondences
  4.1 Correspondence overview
    4.1.1 Pairwise and groupwise shape correspondence
    4.1.2 Similarity-based correspondence
    4.1.3 Registration of point clouds and meshes
    4.1.4 Global surface transformation
    4.1.5 Local surface transformation
    4.1.6 The iterative closest point algorithm
  4.2 The method used
    4.2.1 Landmark selection
    4.2.2 Distance metric
    4.2.3 Deformation
    4.2.4 Regularisation
    4.2.5 Optimisation
    4.2.6 Iterative stiffness relaxation
    4.2.7 Re-estimation of template shape
    4.2.8 Summary and parameter selection
  4.3 Evaluation and analysis
    4.3.1 Correspondence examples
    4.3.2 Deformation examples

5 Models for Object Shape
  5.1 Statistical shape models
  5.2 Gaussian models
    5.2.1 Learning
  5.3 Latent variable models
    5.3.1 Factor analysis
    5.3.2 Probabilistic principal component analysis
    5.3.3 Gaussian process latent variable model
    5.3.4 Bayesian GP-LVM
  5.4 The method used
    5.4.1 Model set up
    5.4.2 Optimisation
    5.4.3 GP-LVM latent kernel density estimation
  5.5 Evaluation and Analysis
    5.5.1 Realism
    5.5.2 Generalisation
    5.5.3 Shape completion
    5.5.4 Log likelihood
    5.5.5 Data visualisation

6 Meshing shape samples
  6.1 Embedded deformation
  6.2 The method used
    6.2.1 Distance metric
    6.2.2 Regularisation
    6.2.3 Embedded deformation meshing
    6.2.4 Mesh smoothing
    6.2.5 Template choice
  6.3 Evaluation and Analysis
    6.3.1 Stiffness parameters
    6.3.2 Examples

7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work

Bibliography


Chapter 1

Introduction

A major goal in computer vision is to fully understand a natural scene: given an image, the task is to identify the objects present in a scene, their positions and poses, the source of illumination etc. This can be viewed as a pattern recognition problem in which the goal is to infer the probable cause of a sensory input. A general approach to a pattern recognition task is to build a generative model from which samples can be generated which are likely to resemble the patterns of interest [1]. A separate recognition model can then be trained on the synthetic, labelled data generated by the generative model. Using this approach, the scene understanding task can be tackled by generating synthetic, realistic scenes, and training a recognition model that can infer the variables of interest. One aspect of synthetic scene generation is 3D object modelling, where the task is to learn a model of object shape and appearance for a given object class such that novel, realistic objects from that class could be generated and placed in a scene.

Object modelling has additional applications in computer graphics, in which content creation is a key task. Vast numbers of 3D models are created by graphics artists for use in computer games, virtual reality worlds and visual effects in cinema. Even with modern interactive tools, the creation of this content requires considerable time and effort on the part of graphics artists. However, online repositories of 3D models are growing in size, with thousands of objects in a wide range of classes being available to download. Users can browse repositories and select content to appear in their scenes, but this resource could be put to another use: as data for the development of shape synthesis tools, which can learn to generate instances from an object class. A major benefit of this approach is that an unlimited number of objects could be generated, allowing for the creation of virtual worlds that can compete with the real world in terms of object variety.

The task of modelling 3D objects is challenging, as objects within an object class can exhibit considerable shape and appearance variability. Take the teapot object class for example. Focusing solely on shape, teapots vary in the placement of their handles and spouts, as well as in their overall geometry, with some models being short and wide, and some being tall and thin. Moreover, different elements of an object's shape can be highly dependent. For example, with car objects the angle of the windscreen is correlated with the overall height of the car. Successful models of object shape must account for such dependencies, often with few examples to utilise. In addition, it is not obvious how exactly to represent object shape. Shapes may be represented as grids of voxels, as collections of landmark or control points, as a set of geometric primitives, or in other ways, but each approach has its limitations.


Figure 1.1: System overview. (a) We take a collection of 3D meshes from a particular object class. (b) Corresponding landmark points are automatically obtained. (c) We learn a shape model using the landmark point representation. (d) We use the learned model to sample new instances from the object class in the form of landmark points. (e) We then generate a mesh that matches the shape of the sampled landmark points.

1.1 Overview

In this project we aim to develop a system that can generate novel instances of an object class, using a collection of examples from that object class as training data. We impose some further restrictions on the scope of the project: we assume that the objects in the input collection are 3D triangulated meshes, and we aim to generate objects also of this form. We restrict ourselves to modelling object shape rather than appearance. Following the approach of Eslami et al. [2], we require that the object model meet two requirements:

• Realism Samples should retain characteristic features of the class, and meshes should be of good quality.

• Novelty The model should be capable of producing samples that differ from the training examples.

To meet the project specification our approach is to separate the object generation process into three stages. In the first stage we determine a fixed-dimensional shape representation of the training examples by establishing correspondences between dense landmark points. In the second stage we specify, learn and sample from a probabilistic shape model, using the representation determined in the first stage. Finally we mesh the sampled landmark points by deforming a template mesh. Figure 1.1 provides an overview of the full system.

1.2 Contributions

This thesis contributes a three-part system for the generation of novel, realistic shapes from a given shape class. We propose methods for the establishment of point correspondences between objects from an object class, for the sampling of landmark points, and for the meshing of sampled landmark points. In particular, the main contributions of this thesis are:

• It extends the non-rigid iterative closest point algorithm of Amberg et al. [3] with a bidirectional distance metric. This method is shown to be an improvement on the original for establishing point correspondence.

• It provides an evaluation of probabilistic principal component analysis, the Gaussian process latent variable model, and the Bayesian Gaussian process latent variable model, in the context of shape landmark generation. The models are assessed by the novelty and realism of the samples that they produce, as well as by their performance on a shape reconstruction task.

• It proposes the application of embedded deformation [4] for the meshing of a given shape sample. This method is demonstrated to be effective at generating realistic meshes that match the shape of the landmark points.
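The idea behind the bidirectional distance in the first contribution can be illustrated with a simple point-set version: a one-sided nearest-neighbour distance can be small even when the deformed template covers only part of the target, while adding the reverse (target-to-template) term penalises leaving target regions unexplained. This is a sketch of the general idea rather than the exact metric used in the thesis (which operates on meshes, not bare point sets), written in Python/NumPy rather than the project's Matlab; function names are our own.

```python
import numpy as np

def nearest_sq_dists(a, b):
    """For each point in a (N, 3), the squared distance to its nearest point in b (M, 3)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise squared distances, (N, M)
    return d2.min(axis=1)

def bidirectional_distance(template, target):
    """Symmetric point-set distance: template-to-target plus target-to-template terms."""
    return (nearest_sq_dists(template, target).mean()
            + nearest_sq_dists(target, template).mean())
```

For example, if the template has points at the origin and at (1, 0, 0) and the target has only the origin, the one-sided target-to-template term is zero, but the bidirectional distance is 0.5, reflecting the unmatched template point.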

1.3 Structure of the thesis

This thesis is structured as follows: Chapter 2 reviews existing methods for modelling object shape. It covers both volumetric and surface representations of 3D objects and discusses the relative merits of each approach. Chapter 3 describes the data sets, pre-processing techniques and computational resources used in this project. Chapter 4 reviews existing methods for establishing point correspondence, describes the methods used in this project, and evaluates and analyses those methods. Chapter 5 describes general probabilistic models and their learning and sampling methods. It also describes and evaluates the particular models used in this project. Models are evaluated by the realism and novelty of the shape samples they produce, as well as by their performance on a shape reconstruction task. We also demonstrate additional features of the latent variable models, including data visualisation and interactive sampling. Chapter 6 describes the embedded deformation method of meshing shape samples. It provides examples of the meshes obtained and discusses the impact of certain parameters in the meshing process. Finally, Chapter 7 concludes the thesis by summarising the methods used and discussing the main findings. It also points to some directions for future research based on this work.


Chapter 2

Background

Shape modelling has a rich history, with shape models being used in graphics for shape synthesis [5, 6], animation [7] and font generation [8], as well as in computer vision for recognition [9] and segmentation tasks [10]. We are interested in statistical shape modelling, where properties of an object class are learned from a collection of examples. In order to learn a shape model we must first determine a representation of object shape such that shapes can be compared in a meaningful way. A wide variety of shape representations have been used, which stems from a fundamental ambiguity about what shape is, and the range of tasks which shape models have been used for. In Section 2.1 we describe how shape can be represented, with a focus on two classes of shape representations: volumetric representations (Section 2.1.1) and landmark representations (Section 2.1.2). In Section 2.2 we discuss other relevant shape modelling approaches that do not explicitly use these representations. This chapter is intended as a high-level overview of shape modelling approaches, as we provide additional background for point correspondence, shape modelling and meshing shape samples in Chapters 4, 5 and 6 respectively.

2.1 Representations of object shape

Figure 2.1: Representations of object shape. (a) Voxellated teapot model. (b) Teapot mesh with shape represented by dense landmark points.

Object shape can be encoded in a variety of ways: landmark representations describe objects as a collection of points on the object surface. Control points can be used in combination with an interpolation rule to create continuous lines or surfaces. Implicit shape representations encode the contour of a shape as the zero level set of an embedding function such as a signed distance map. Volumetric methods represent shapes as binary voxels placed on a fixed-size grid. In this section we restrict our attention to two classes of shape representation: volumetric representations (Section 2.1.1) and landmark representations (Section 2.1.2). Examples of these representations are shown in Figure 2.1. We discuss the relative merits of these representations for the tasks of shape modelling and synthesis in Section 2.1.3.

2.1.1 Volumetric shape representation

Volumetric shape models consist of binary voxels in a three-dimensional grid of fixed size. More specifically, an object can be represented by the vector x = (x_1, ..., x_V), where V is the number of voxels and x_v ∈ {0, 1} for v = 1, ..., V. A voxel is part of the object if x_v = 1, and it is not part of the object if x_v = 0.
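The flattening of an occupancy grid into this binary vector can be sketched as follows (a Python/NumPy illustration rather than the project's Matlab; the grid size and the box-shaped stand-in object are hypothetical):

```python
import numpy as np

# Hypothetical 32x32x32 occupancy grid, with a solid axis-aligned box
# standing in for an object.
V_SIDE = 32
grid = np.zeros((V_SIDE, V_SIDE, V_SIDE), dtype=np.uint8)
grid[8:24, 8:24, 8:24] = 1  # voxels inside the object are set to 1

# The vector representation x = (x_1, ..., x_V) with x_v in {0, 1}.
x = grid.ravel()
V = x.size                # number of voxels: 32^3 = 32768
occupied = int(x.sum())   # voxels belonging to the object: 16^3 = 4096
```

Even at this modest resolution the vector already has tens of thousands of entries, which previews the dimensionality concern discussed in Section 2.1.3.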

Wu et al. [11] used a volumetric representation in combination with a convolutional deep belief network to model object shape for a set of object classes. The network models the joint distribution of the binary voxel variables and object class labels, and was successfully used to recognise object classes and reconstruct 3D objects from a single depth image.

The volumetric representation can be viewed as a three-dimensional extension of a pixel representation. As such, work on modelling two-dimensional object shape in the pixel domain is relevant here. Boykov et al. [12] used grid-structured Markov random fields as models of images. Here the pairwise potentials are effective at capturing local shape characteristics like smoothness, but they are unable to capture more global properties, like the arrangement of object parts. Eslami et al. [2] proposed the Shape Boltzmann Machine, which models two-dimensional grids of binary pixels using a carefully constrained deep Boltzmann machine. This was shown to be effective in capturing both low-level and high-level shape properties of the tested objects.

2.1.2 Landmark shape representation

Three-dimensional shapes can be represented by a collection of n points S = {x_i}, i = 1, ..., n, where x_i ∈ R^3. These points are called landmarks, and they may be sparsely or densely distributed over the object surface. Collections of shapes with a landmark representation can be modelled using statistical shape models, where a generative model p(x | θ) is specified, and the parameters θ are learned using the collection of landmark points D = {v_i}, i = 1, ..., M, where M is the number of training examples. In order to learn such a model, the landmark points must be in correspondence. That is, if v_j = [v_j^1, ..., v_j^n] and v_k = [v_k^1, ..., v_k^n] are two shapes from an object class, then landmark v_j^i should be at the same location on shape v_j as v_k^i is on shape v_k. We discuss the task of establishing point correspondence in Chapter 4.
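Once landmarks are in correspondence, each shape can be flattened into a fixed-length vector v_j and the vectors stacked into a training set for a statistical shape model. The sketch below illustrates this assembly step with random points standing in for corresponded landmarks; the sizes and variable names are hypothetical, and Python/NumPy is used in place of the project's Matlab.

```python
import numpy as np

rng = np.random.default_rng(0)
n_landmarks, n_shapes = 100, 5  # hypothetical n and M

# Each shape: n landmarks in R^3, with row i corresponding across shapes.
shapes = [rng.standard_normal((n_landmarks, 3)) for _ in range(n_shapes)]

# Flatten each shape to a single vector v_j and stack into the training set D.
D = np.stack([s.ravel() for s in shapes])  # shape (M, 3n)

# A statistical shape model p(x | theta) is then fit to the rows of D;
# in the simplest Gaussian case theta includes the sample mean below.
mean = D.mean(axis=0)
```

The key property is that column d of D refers to the same coordinate of the same landmark in every shape, which is exactly what the correspondence step of Chapter 4 must guarantee.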

Statistical shape models, also known as point distribution models or active shape models, were first introduced by Cootes et al. [13] and have been used to model face shape [10], for medical image segmentation [14] and to model 3D cars [9]. We cover statistical shape models in more detail in Chapter 5.

2.1.3 Discussion of shape representations

In the context of shape modelling, both the volumetric representation and the landmark point representation of object shape have advantages and disadvantages. One advantage of the volumetric approach is that voxels across volumetric images are naturally in correspondence, whereas correspondence between landmark points is harder to achieve in a surface representation. In fact, establishing point correspondence is among the most difficult elements of shape modelling with a landmark representation.

One disadvantage of the volumetric approach is that the dimensionality involved may be very high, with even a relatively low resolution 64x64x64 voxel image containing 262,144 voxels. Moreover, many of these voxels will be redundant, lying on the inside or outside of all the models, and care will be needed to ensure that unnecessary work is not undertaken by modelling these voxels.

The greatest issue with the volumetric representation is that 3D models are generally created and rendered as meshes, and the process of converting from mesh to voxels and back again is likely to involve technical difficulties and manual intervention. This is a particular concern when the main goal of this project is to generate 3D meshes that could be placed into a scene and rendered. As such, we consider the landmark shape representation to be most appropriate for the purposes of this project.

2.2 Other related work

We have focused on shape modelling in the context of landmark and volumetric representations, but there exists a body of relevant work on shape modelling that makes use of other representations of object shape.

A parts-based approach was used by Kalogerakis et al. [5] to create novel class instances. Objects were synthesized by recombination of shape components from the training data. Fish et al. [6] also used a parts-based method, except that instead of the co-occurrence of part types, they modelled the geometric relationships between shape parts. For each shape, unary relations such as the relative length of a part, and binary relations such as the angle between two parts, were captured. One issue with these approaches is that they require part-annotated objects, which may be difficult to find for many classes of 3D model.

Human shape and pose variation was modelled by Anguelov et al. [15] by representing shape examples as deformations of a template shape. A linear subspace model was used to model the variation in transformation parameters. Recent work builds upon this to model human shape in motion [16]. The representations used in these models have very high dimension, with a linear transformation being specified for each of roughly 50,000 triangles. To learn reasonable models with such high dimensionality, a large number of training examples were used. However, for many object classes it is only possible to obtain relatively few data examples, so a smaller representation may be required.

Synthetic depth images of humans with a variety of body sizes, shapes, poses and camera views were generated by Shotton et al. [17] by randomly varying appearance parameters (pose, rotation, size etc.) of a base set of human models. The limitation of this approach in the context of this project is that the continuous shape variability of the objects is not modelled explicitly; rather it is captured by the base set of human models.

Other work has focused on generative models of handwritten digits [18]. In this framework digits could be generated by deforming the control points of a cubic B-spline, transforming from the object to the image frame, then uniformly placing Gaussian 'ink drops' along the transformed spline. More recently, a manifold of fonts was learned which allowed for continuous deformation between training fonts [8].

A 3D deformable model of object shape was learned using only 2D images and a small degree of user interaction [19]. Control points and subdivision surfaces were used to represent surfaces, and the positions of the control points were optimised so that the silhouette of the object matched the outline of the object in the 2D image.


Chapter 3

Datasets

As a first step in the modelling process, three 3D model datasets consisting of triangular meshes were acquired. Each dataset is made up of a variety of objects from a particular object class. The 3D models were scaled and aligned so that shape can be modelled without capturing size and pose variability. This chapter describes the datasets (Sections 3.1 - 3.3) and pre-processing steps (Section 3.4) in more detail.

3.1 Vases

131 3D vase models were obtained from 'The 3D pottery benchmark', a dataset originally designed for the evaluation of object retrieval methods [20]. The benchmark is a collection of models in Wavefront .obj format from a variety of sources including '3D Millennium', the 'NTU 3D model database', 'Google sketch repository', 'Archive 3D', and others. The models are organised into classes such as bowl, modern-bottle, modern-glass, mug and modern-vase, the last of which we used as our shape modelling dataset. This vase dataset consists of models ranging from low resolution (208 triangles) to very high resolution (179072 triangles). The main challenge of the vases dataset is the shape variability in the form of differences in neck size, lip width and base width between models. However, the task is made easier by the fact that vases are generally rotationally symmetric and lack multiple parts. Figure 3.1 shows a number of the vase models.

Figure 3.1: Pre-processed vase models.

3.2 Cars

59 3D hatchback models were selected from a cars dataset curated by Fidler et al. [21]. The dataset was originally used for object recognition and viewpoint estimation tasks and consists of cars in a variety of styles in Wavefront obj format. The selected hatchback models are characterised by nearly vertical rear doors and a shorter wheel-span than saloon-type cars. The resolution of the car models varies from 5322 triangles to 126172 triangles. The car models exhibit the lowest level of variation of the three object classes; however there still exist significant differences between models in windscreen angle, wheel placement, and overall length / height. Figure 3.2 shows a number of the car models.

Figure 3.2: Pre-processed car models.

3.3 Teapots

27 3D models of teapots were obtained from a database curated by Pol Moreno, a graduate student at The University of Edinburgh. The models originated at the Google 3D warehouse and Archive 3D online repositories. The models were provided in Wavefront obj format and vary in resolution from 1480 triangles to 339930 triangles. Teapots are a challenging class for object modelling as they feature distinct parts such as the spout and handle, and are highly variable in terms of their overall geometry. Figure 3.3 shows a number of the teapot models.

Figure 3.3: Pre-processed teapot models.

3.4 Software and Pre-processing

3.4.1 Software

Typically 3D models are processed using dedicated modelling programs such as Blender or Autodesk Maya. In these programs, the vertices and faces can be manipulated visually to craft or alter models, and the results can be rendered to create images. These environments offer a wide variety of tools for manipulating 3D models, but are designed for manual design and control, rather than the numerical computation that is used in this project. The Matlab computing environment has basic rendering facilities, and offers a rich environment for numeric computing. As such it was used to implement the various methods in this project.

3.4.2 Wavefront obj files

The first step in pre-processing was to convert the model files to objects that can be manipulated within Matlab. As the models were all in Wavefront obj format we used the Matlab OBJ toolbox that is available on the Mathworks file exchange [22]. The toolbox was used to read the obj files, and a structured object containing vertex and face information was output for each model. These could then be further processed within the Matlab environment.

Figure 3.4: Aligning teapot models. (a) Misaligned teapot model with principal x-y components shown in red. (b) Model rotated so that the first principal component lies on the x-axis.
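The core of such a reader is small. As an illustration only (not the Matlab OBJ toolbox's actual behaviour), a minimal Python sketch that handles just vertex (`v`) records and triangular face (`f`) records, converting the format's 1-based indices to 0-based, might look like this:

```python
def read_obj(lines):
    """Minimal Wavefront .obj reader: vertex positions and triangular faces only.

    Ignores normals, texture coordinates, materials and polygonal faces with
    more than three vertices; real .obj files need a fuller parser.
    """
    vertices, faces = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == 'v':
            vertices.append([float(c) for c in parts[1:4]])
        elif parts[0] == 'f' and len(parts) == 4:
            # Face entries look like "i", "i/t" or "i/t/n"; keep the vertex
            # index and convert from .obj's 1-based to 0-based indexing.
            faces.append([int(p.split('/')[0]) - 1 for p in parts[1:4]])
    return vertices, faces

# Tiny example: a single triangle.
obj_text = ["v 0 0 0", "v 1 0 0", "v 0 1 0", "f 1 2 3"]
verts, tris = read_obj(obj_text)
```

The vertex list and face list together are exactly the "structured object containing vertex and face information" needed for the subsequent processing steps.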

3.4.3 Alignment

In order to capture shape variation rather than scale or pose variation, the objects within the three datasets were brought into alignment and transformed to be on similar scales. When aligning the models the z-axis was chosen to be the vertical direction, and the models were transformed so that their lowest point touched the x-y plane. The x-axis was chosen to be the direction of greatest length of the models, so that the spouts and handles of the teapots, and the principal front-to-back direction of the cars, were aligned along the x-axis. We describe the alignment methods for each object class below.

• Vases The original vase models were badly aligned, with little consistency in the direction of the vertical part of the vases. To conform to the above alignment criteria the vases were first transformed to principal component space, where almost invariably the first principal component corresponds to the vertical direction. The vases were then flipped so that the first principal component lay along the z-axis, and translated so that the lowest point touched the x-y plane. Manual corrections were made to a small number of vases where the first principal component did not point in the vertical direction.

• Teapots A straightforward procedure was used to automatically align the teapot models. The axes were first flipped so that the z-axis was the vertical direction. Then the spout of the teapots was aligned along the x-axis. This was achieved by finding the principal components of the x-y dimensions and rotating so that the first principal component lay along the x-axis (Figure 3.4). Then the models were translated so that their highest vertex was centered at x = y = 0 and their lowest vertices touched the x-y plane. The highest vertex for the teapot models is typically a point on the centre of the lid, so it is sensible to centre the model around this point. In 6 cases teapots were incorrectly translated, due to their highest vertex being off-centre. These cases were manually corrected.

• Cars Fortunately the car models were pre-aligned in the intended way and did not require processing.


Figure 3.5: Scaling methods for 3D models. (a) Teapot model with bounding box. (b) Teapot model with vectors showing the standard deviation of the model's vertices in each dimension.

3.4.4 Scaling

We scaled the models in the three object classes in one of two different ways:

• Equal bounding box The models are scaled so that the bounding boxes of each model have the same volume.

• Equal variance The models are scaled so that the sum of the variance in the x, y and z directions is equal for each model.

Figure 3.5 demonstrates these rescaling techniques. The equal bounding box scaling method is sensitive to models that have a particularly low or high vertex range in one or more dimensions. To compensate, the object is scaled by a large amount in the other directions, which can result in objects that are considerably taller or broader than the others. The equal variance method is more robust to outliers; however, for models with internal vertices such as the cars it will produce poor scalings.

• Vases Few vase models have internal vertices, and there are many examples that are particularly tall and thin, or short and fat. As such the equal variance scaling was used to scale the vases.

• Teapots As with the vases, the teapot models do not contain internal vertices, and are susceptible to the problems associated with the equal bounding box method, so the equal variance scaling method is used.

• Cars Due to the internal vertices present in many car models, the equal bounding box scaling method is the most appropriate. Fortunately there are few examples of cars that are very large or small in a particular dimension, so the equal bounding box method works well in most cases.
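Both scaling rules reduce to computing a single per-model scale factor. The thesis implementation was in Matlab; the following NumPy sketch (function names are illustrative) shows the two rules. Note that bounding-box volume scales with the cube of the factor and variance with its square, hence the cube root and square root.

```python
import numpy as np

def bbox_scale(V, target_volume=1.0):
    """Scale factor s so the axis-aligned bounding box of s*V has the target volume."""
    extents = V.max(axis=0) - V.min(axis=0)          # (3,) box side lengths
    volume = np.prod(extents)
    return (target_volume / volume) ** (1.0 / 3.0)   # volume scales with s^3

def variance_scale(V, target_var=1.0):
    """Scale factor s so the summed per-axis variance of s*V equals target_var."""
    total_var = V.var(axis=0).sum()
    return np.sqrt(target_var / total_var)           # variance scales with s^2

# Example: the 8 corners of a 2 x 1 x 0.5 box.
V = np.array([[0, 0, 0], [2, 0, 0], [0, 1, 0], [0, 0, 0.5],
              [2, 1, 0], [2, 0, 0.5], [0, 1, 0.5], [2, 1, 0.5]], dtype=float)
s = bbox_scale(V)
extents = (s * V).max(axis=0) - (s * V).min(axis=0)
print(np.prod(extents))   # -> 1.0 (unit bounding-box volume after scaling)
```

Applying `variance_scale` instead would make the summed per-axis variance of the scaled vertices equal to the target, which is the behaviour used for the vases and teapots above.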


Chapter 4

Establishing Point Correspondences

In order to model the variation in objects with a surface representation it is necessary to first establish correspondences between the landmark or control points that characterise the object surfaces. More specifically, the task is to find a fixed number of indexed points on each surface such that points with the same index are in correspondence. Points on surfaces are in correspondence if their geometry is locally and globally comparable in the context of their respective surfaces. For instance, if the object class is human hands we would expect a point on the tip of the thumb of one hand to correspond to a point on the tip of the thumb of another hand. They are locally similar in the sense that they are both the tip of an approximately convex local surface, and they are globally similar in that they are on the same digit of the hand. This chapter reviews some of the existing techniques for establishing point correspondence (Section 4.1), details the particular methods used in this project (Section 4.2) and evaluates the methods by demonstrating correspondence and deformation examples (Section 4.3).

4.1 Correspondence overview

The first idea that comes to mind when tasked with establishing correspondences is to manually annotate the surfaces with a fixed number of points. In fact this was the approach used in the classic paper on active shape models by Cootes et al. [13]. However, this approach has obvious disadvantages: manual annotation is time-consuming and prone to human error, and is not feasible when dense correspondences are required. Instead we would like to automatically find correspondences, and there exist a wide range of methods developed for this purpose. These methods are generally tailored to the problem at hand: some assume that surfaces are related by a rigid or affine transformation, others assume that the surfaces can deform in nonrigid ways.

4.1.1 Pairwise and groupwise shape correspondence

Point correspondences between a group of shapes can be found by finding correspondences between pairs of shapes. The idea is that if point $i$ on shape $S_1$ corresponds to point $j$ on shape $S_2$, and point $j$ on shape $S_2$ corresponds to point $k$ on shape $S_3$, then point $i$ also corresponds to point $k$. That is, correspondence is transitive. Thus the task of establishing group point correspondence can be reduced to finding the correspondence between some template shape and each of the other shapes in the group. In practice this transitive property does not always hold, and the choice of template impacts the correspondences obtained. As such it is pragmatic to use a generic object from the group as the template, in the hope that the established correspondences will be relatively robust.

Other methods make use of the fact that all the shapes in a group should be in correspondence in their optimisation procedure. A natural extension of the pairwise approach is to obtain correspondences in a pairwise way, compute the mean shape, then use the mean shape as the template model for the pairwise approach. This can be repeated until there is little change in the mean shape obtained [23]. Another approach is to optimize correspondences by the quality of the statistical shape model that they produce. An example of one such approach is to select correspondences using the minimum description length principle, where models with a small coding length are favoured [24]. These groupwise methods optimize correspondences across all shapes at once, eliminating the template-choice problems of the pairwise approach.
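The mean-template iteration described above can be sketched in a few lines. Here `register_pairwise` is a hypothetical stand-in for any pairwise method (brute-force nearest neighbours, for illustration only), not the registration technique used later in this chapter:

```python
import numpy as np

def register_pairwise(template, shape):
    """Hypothetical pairwise registration: returns the point on `shape`
    matched to each template point. Here: nearest neighbour in Euclidean distance."""
    d = np.linalg.norm(template[:, None, :] - shape[None, :, :], axis=2)
    return shape[d.argmin(axis=1)]

def groupwise_correspondence(shapes, template, n_iters=5, tol=1e-8):
    """Register every shape to the template, recompute the mean shape,
    and repeat until the mean shape stops changing."""
    corr = np.stack([register_pairwise(template, s) for s in shapes])
    for _ in range(n_iters):
        corr = np.stack([register_pairwise(template, s) for s in shapes])
        mean_shape = corr.mean(axis=0)
        if np.linalg.norm(mean_shape - template) < tol:
            break
        template = mean_shape
    return corr, template
```

The returned array `corr` stacks, for each shape, the points placed in correspondence with the template, so `corr[:, i]` collects the group-wide correspondences of template point $i$.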

4.1.2 Similarity-based correspondence

Points on a surface can be characterized using shape features, which quantify in some way the shape of a particular point with respect to the other points on the surface. In [25], Belongie et al. introduced shape contexts, where a particular point is represented as the histogram of vectors that connect it to the other points on the shape. Other shape features that have been used include the normal direction of the points and the curvature of the surface at a point [8]. Given shape features for each point on a shape, a natural approach to finding point correspondence is to match points if their shape features are similar. To this end an objective function can be established that encourages feature vectors of corresponding points to be similar, while penalising correspondences which would cause the shapes to be distorted or deformed if brought into alignment [25].

4.1.3 Registration of point clouds and meshes

Surface registration is the process of finding a mapping between a template surface and a target surface so that template points are mapped to target points that correspond semantically [3]. Surfaces are typically polygon meshes, but these methods can be adapted to work with point clouds obtained from 3D scans. In this section a polygon mesh $\mathcal{M} = (V, E)$ is defined as a set $V = \{v_i\}_{i=1}^{V}$ of $V$ vertices and a set $E = \{(i,j)_k\}_{k=1}^{E}$ of $E$ edges. A point cloud $P = \{p_i\}_{i=1}^{P}$ is simply a set of $P$ points. For convenience we make no distinction between point clouds and meshes unless necessary, and write $S = \{x_i\}_{i=1}^{N}$ and $T = \{u_j\}_{j=1}^{M}$ for the template and target sets respectively.

Point correspondence for a pair of surfaces can be established by choosing a number of landmark points on the template surface, registering the template surface to a target surface, and considering the position of the deformed landmark points. By applying this process to each of the target surfaces in the object class, full group point correspondences can be found using the transitive property described above.

Surface registration can be formulated as an optimisation problem, where the objective function contains a data fit term which encodes the distance between the target surface and the transformed template surface for some transformation $\Psi$:

$$E_D(\Psi) = \mathrm{dist}(T, \Psi(S)). \tag{4.1}$$

In the case of point-cloud registration the distance term is typically point-to-point:

$$E_D(\Psi) = \sum_{i=1}^{N} \|\Psi(x_i) - u_{n(i)}\|^2, \tag{4.2}$$

where $n(i)$ is the index of the target point corresponding to template point $x_i$. For surfaces with normal vectors a point-to-plane metric can be used:

$$E_D(\Psi) = \sum_{i=1}^{N} \left( \mathbf{n}_i^\top \left( \Psi(x_i) - u_{n(i)} \right) \right)^2, \tag{4.3}$$

where $\mathbf{n}_i$ is the surface normal at vertex $i$. In some cases a weighted combination of both distance metrics is used:

$$E_D(\Psi) = \lambda_1 \sum_{i=1}^{N} \|\Psi(x_i) - u_{n(i)}\|^2 + \lambda_2 \sum_{i=1}^{N} \left( \mathbf{n}_i^\top \left( \Psi(x_i) - u_{n(i)} \right) \right)^2, \tag{4.4}$$

for some constants $\lambda_1$, $\lambda_2$. A significant issue with these distance metrics is that they encourage template points to be close to target points, but not the other way round. That is, there may be regions of the target surface that are not well covered by the deformed template. To this end an additional data fit term [26] can be included which penalizes transformations that do not cover the whole target surface:

$$E_{\text{Target}}(\Psi) = \sum_{j=1}^{M} \|\Psi(x_{m(j)}) - u_j\|^2, \tag{4.5}$$

where $m(j)$ is the index of the template point that corresponds to the target point $u_j$. Distance metrics that incorporate data fit terms in both directions are bidirectional, whereas measures that only match in one direction are unidirectional.
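The unweighted point-to-point terms of Equations 4.2 and 4.5 can be written out directly. The following sketch uses brute-force nearest neighbours for the index maps $n(i)$ and $m(j)$ (illustrative only; names are not from the thesis):

```python
import numpy as np

def nearest_indices(A, B):
    """For each point in A, the index of its nearest neighbour in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.argmin(axis=1)

def data_term(template, target, lam=1.0):
    """Bidirectional point-to-point data fit: Eq. 4.2 plus lam * Eq. 4.5 (unweighted)."""
    n = nearest_indices(template, target)    # n(i): target point matched to x_i
    m = nearest_indices(target, template)    # m(j): template point matched to u_j
    e_template = np.sum((template - target[n]) ** 2)   # template-to-target term
    e_target = np.sum((template[m] - target) ** 2)     # target coverage term
    return e_template + lam * e_target
```

With `lam=0` the metric is unidirectional; the second term is what penalizes deformations that leave parts of the target uncovered.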

Once a distance metric has been chosen a question comes to mind: what constraints should be imposed on the transformation $\Psi$? One option is to enforce a global transformation, so that each template vertex is transformed in the same way.

4.1.4 Global surface transformation

A global affine transformation can be specified by a $3 \times 3$ matrix $A$ and a $3 \times 1$ vector $\mathbf{t}$:

$$\Psi(S) = AS + \mathbf{t}. \tag{4.6}$$

The linear transformation $A$ may be constrained to be a rotation, thus assuming that the surfaces are related by a rigid transformation. An unconstrained matrix $A$ allows for nonrigid transformations. Global transformations can be specified with six degrees of freedom in the rigid case, and twelve in the affine case.


4.1.5 Local surface transformation

Alternatively, each vertex could be transformed separately as

$$\Psi_i(x_i) = A_i x_i + \mathbf{t}_i, \quad i = 1, \ldots, N, \tag{4.7}$$

where rigid or nonrigid transformations may be allowed, as in the global case. But this presents a problem: if vertices are allowed to deform independently then the optimisation problem has very many degrees of freedom, and the resulting surface is likely to be severely distorted. For example, a minimum could be found by collapsing all the template points to a single point on the target surface. As such it is necessary to impose regularisation on the transformations to encourage natural deformations of the template surface. Regularisation can be encoded by using additional terms in the objective function which penalize undesirable deformations. For example, in the case of nonrigid deformations a rigidity term [27] can be included:

$$E_{\text{rigid}}(\Psi) = \sum_{i=1}^{N} \mathrm{Rot}(A_i), \tag{4.8}$$

where

$$\mathrm{Rot}(A) = (\mathbf{a}_1^\top \mathbf{a}_2)^2 + (\mathbf{a}_1^\top \mathbf{a}_3)^2 + (\mathbf{a}_2^\top \mathbf{a}_3)^2 + (1 - \mathbf{a}_1^\top \mathbf{a}_1)^2 + (1 - \mathbf{a}_2^\top \mathbf{a}_2)^2 + (1 - \mathbf{a}_3^\top \mathbf{a}_3)^2, \tag{4.9}$$

and $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ are the column vectors of $A$. This term discourages affine transformations and encourages pure rigid transformations. Another regularisation term that is often used is the stiffness of the transformation [3]:

$$E_{\text{stiff}}(\Psi) = \sum_{i=1}^{N} \sum_{j \in \mathcal{N}(i)} \|A_i - A_j\|_F^2 + \|\mathbf{t}_i - \mathbf{t}_j\|^2, \tag{4.10}$$

where $\mathcal{N}(i)$ is the set of vertices neighbouring vertex $i$. This term encourages transformations of neighbouring points to be similar, thus avoiding jagged deformations. Other regularisation terms encourage surface normals of neighbouring vertices to be similar [28], encourage smoothness using thin plate splines [29], or manage holes in depth scans [27]. For a more detailed survey on regularisation terms for local nonrigid registration see [30].
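As a concrete check on the rigidity term (Eqs 4.8-4.9): the penalty measures how far the columns of $A$ are from an orthonormal frame, so it vanishes for a rotation and is positive for a shear. A small NumPy sketch, assuming the column-vector convention above:

```python
import numpy as np

def rot_penalty(A):
    """Rot(A): deviation of A's columns from an orthonormal frame (Eq. 4.9)."""
    a1, a2, a3 = A[:, 0], A[:, 1], A[:, 2]
    return ((a1 @ a2) ** 2 + (a1 @ a3) ** 2 + (a2 @ a3) ** 2
            + (1 - a1 @ a1) ** 2 + (1 - a2 @ a2) ** 2 + (1 - a3 @ a3) ** 2)

# A rotation incurs (numerically) no penalty; a shear does.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
shear = np.eye(3)
shear[0, 1] = 0.5                  # off-diagonal entry breaks orthonormality
print(rot_penalty(R) < 1e-12)      # -> True
print(rot_penalty(shear) > 0.0)    # -> True
```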

4.1.6 The iterative closest point algorithm

In the previous sections there has been a glaring omission: whenever a data fit or regularisation term was defined, it was assumed that correspondences for the surfaces were already known. More explicitly, for each template point $x_i$ we used the notion of an associated target point $u_{n(i)}$. However, in reality these associations are not known; in fact the whole point of these methods is to establish these correspondences.

If the template transformation $\Psi$ is known for a target surface, then the correspondences can be obtained by deforming the template and picking the closest point on the target surface. Conversely, if the correspondences are known then the transformation can be obtained by minimizing the appropriate objective function.


Problems of this kind are typically solved using the iterative closest point algorithm [31], where correspondences are estimated, a transformation is determined using the correspondences, after which the corresponding points are re-estimated based on the transformed template, and the process is repeated until convergence:

Algorithm 1 Iterative Closest Point
Input: Template surface $S$, target surface $T$, objective function $E(\Psi)$, stopping tolerance $\epsilon > 0$.
Output: Transformation $\Psi$ that maps the template surface to the target surface.
1: Initialize $E(\Psi_0) := \infty$, $\Psi_1 := I$, $t := 1$.
2: while $E(\Psi_{t-1}) - E(\Psi_t) > \epsilon$ do
3: Transform the template surface: $S \to \Psi_t(S)$.
4: For each transformed template point $\Psi_t(x_i)$ find the closest target point $u_i^t$.
5: Set $\Psi_{t+1} := \arg\min_{\Psi} E(\Psi)$ using the new correspondences $\{u_i^t\}_{i=1}^{N}$.
6: Set $t := t + 1$.
7: end while
8: $\Psi := \Psi_t$.

For rigid surface registration tasks the objective function $E(\Psi)$ can be minimized analytically; however, for some more complex nonrigid deformations this is a non-linear optimisation problem which must be solved using iterative methods.
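A minimal rigid ICP in the spirit of Algorithm 1, with the rigid step solved in closed form via the SVD (the Kabsch solution) is sketched below. This is an illustrative implementation of the classic rigid case, not the nonrigid method used later in this project:

```python
import numpy as np

def kabsch(X, Y):
    """Best rotation R and translation t minimising sum ||R x_i + t - y_i||^2."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cy - R @ cx

def icp(source, target, n_iters=20, tol=1e-10):
    """Iterative closest point: alternate closest-point matching and rigid fitting."""
    S = source.copy()
    prev_err = np.inf
    for _ in range(n_iters):
        d = np.linalg.norm(S[:, None, :] - target[None, :, :], axis=2)
        matches = target[d.argmin(axis=1)]    # closest-point correspondences
        R, t = kabsch(S, matches)
        S = S @ R.T + t                       # apply the rigid update
        err = np.sum((S - matches) ** 2)
        if prev_err - err < tol:              # stop when the error plateaus
            break
        prev_err = err
    return S
```

For a small rigid perturbation the closest-point matches are already correct, so a single Kabsch step recovers the alignment exactly; larger misalignments require the full alternation and may still converge to a poor local minimum.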

4.2 The method used

We require a flexible correspondence-finding method that can cope with global and local nonrigid shape variation, multiple object classes, and classes with potentially many examples. Ideally the method would be effective in the absence of manually annotated landmark points. We use a pairwise nonrigid mesh registration technique based on the adaptive stiffness regularisation and efficient optimisation method of Amberg et al. [3]. A bidirectional data fit term is used, and correspondences are improved by iteratively adapting the template mesh using the mean shape.

4.2.1 Landmark selection

We wish to obtain a reduced, fixed-dimensional representation of each shape. To this end we find landmark points on the template shape that represent the object at a specified resolution. Landmarks are also obtained for the target shape so that a bidirectional distance metric may be used. For a mesh $\mathcal{M} = (V, E)$ the points are obtained by maintaining two sets, a landmark set $\mathcal{L}$ and a point set $\mathcal{P}$. The point set is initially taken to be the set of all vertices $V$, and the landmark set is empty. A random point is chosen from the point set and added to the landmark set. Then all the points within a certain radius of the chosen point are removed from the point set. A new point is chosen from the point set and the process is repeated until the point set is empty. See Algorithm 2 for a formal description.

The output of this algorithm is a set of landmark points $\mathcal{L}$ that are roughly uniformly spaced over the surface of the object. The resolution of the landmark set can be varied by choosing the radius $r$. Figure 4.1 shows an application of the landmark selection algorithm to a car model.

Algorithm 2 Landmark Selection
Input: Vertices $V$, radius $r > 0$.
Output: Landmark set $\mathcal{L}$.
1: Initialize landmark set $\mathcal{L} := \emptyset$ and point set $\mathcal{P} := V$.
2: while $|\mathcal{P}| > 0$ do
3: Select a point $p$ uniformly at random from the remaining points $\mathcal{P}$.
4: Add it to the landmark set: $\mathcal{L} := \mathcal{L} \cup \{p\}$.
5: Find the set of points close to $p$: $\mathcal{P}_{\text{remove}} := \{x \in \mathcal{P} \mid \|p - x\| < r\}$.
6: Remove these points from the point set: $\mathcal{P} := \mathcal{P} \setminus \mathcal{P}_{\text{remove}}$.
7: end while

Figure 4.1: Landmark selection. (a) Car model with 96259 vertices. (b) 1027 landmarks obtained for the same model using the landmark selection algorithm.

4.2.2 Distance metric

Let $V = \{v_i\}_{i=1}^{V}$ be the set of vertices of the target shape, and let $X = \{x_i\}_{i=1}^{N}$ and $U = \{u_j\}_{j=1}^{M}$ be the sets of landmark points obtained for the template and target shape respectively. We use a weighted, bidirectional, point-to-point distance metric to obtain the data term in the objective function:

$$E_D(\Psi) = \sum_{i=1}^{N} w_i \|\Psi(x_i) - v_{n(i)}\|^2 + \lambda \sum_{j=1}^{M} w'_j \|\Psi(x_{m(j)}) - u_j\|^2, \tag{4.11}$$

where $\lambda$ is a pre-determined constant that weights the importance of the target-focused distance term. The weights $w$ and $w'$ are optional and may be used to indicate the reliability of a particular match. One approach is to set weights to zero if the surface normals for a match are not compatible. Alternatively, the weights may be obtained using the distance from template points to their corresponding target points:

$$s_i = \frac{1}{1 + \|x_i - v_{n(i)}\|}, \tag{4.12}$$

$$w_i = \frac{s_i}{\max_j s_j}. \tag{4.13}$$

This has the effect of giving more weight to matches that are closer to their target. The weights $w'$ can be obtained similarly. In our experiments, good correspondence could be achieved without weighting the matches in any way.
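The weighting scheme of Equations 4.12-4.13 in code (an illustrative sketch): closer matches receive weights nearer to 1, and the closest match always gets weight exactly 1.

```python
import numpy as np

def match_weights(template, matched_targets):
    """Weights from Eqs 4.12-4.13, given each template point and its matched target."""
    dist = np.linalg.norm(template - matched_targets, axis=1)   # ||x_i - v_n(i)||
    s = 1.0 / (1.0 + dist)                                      # Eq. 4.12
    return s / s.max()                                          # Eq. 4.13

w = match_weights(np.array([[0.0, 0, 0], [1.0, 0, 0]]),
                  np.array([[0.0, 0, 0], [2.0, 0, 0]]))
print(w)   # -> [1.  0.5]  (distances 0 and 1)
```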

4.2.3 Deformation

To allow for flexible deformations within object classes we use nonrigid local deformations. Following the approach in [3] we represent each landmark point in homogeneous coordinates $x = [x_1, x_2, x_3, 1]^\top$ and associate each with an affine $3 \times 4$ transformation matrix $A_i$:

$$\Psi_i(x_i) = A_i x_i, \quad i = 1, \ldots, N. \tag{4.14}$$

The data term can then be written as

$$E_D(\Psi) = \sum_{i=1}^{N} w_i \|A_i x_i - v_{n(i)}\|^2 + \lambda \sum_{j=1}^{M} w'_j \|A_{m(j)} x_{m(j)} - u_j\|^2. \tag{4.15}$$

This homogeneous representation will allow us to represent the full objective function as a quadratic function which can be minimized analytically.

4.2.4 Regularisation

For regularisation a stiffness or smoothness term is used, which penalises weighted differences of neighbouring transformations under the Frobenius norm:

$$E_s(\Psi) = \sum_{i=1}^{N} \sum_{j \in \mathcal{N}(i)} \|(A_i - A_j)\mathbf{G}\|_F^2. \tag{4.16}$$

The weighting matrix $\mathbf{G} := \mathrm{diag}(1, 1, 1, \gamma)$ contains a parameter $\gamma$ which weights differences in the skew and rotational parts of the transformation against the translational part. Here $\mathcal{N}(i)$ is the set of indices of points that neighbour $x_i$. The neighbourhood can be defined in a number of ways: for example, if a whole mesh is being deformed it may consist of the edges in the mesh triangulation, or simply connect points that are close to one another by Euclidean or geodesic distance. In this work a reduced landmark representation of surfaces was used, so neighbourhoods based on mesh triangulations were not possible. As such, neighbourhoods were defined by Euclidean distance:

$$\mathcal{N}(i) = \{j \mid x_j \in \mathrm{kNN}(x_i)\}, \quad i = 1, \ldots, N, \tag{4.17}$$

where $\mathrm{kNN}$ is the $k$-nearest neighbour function. Here $k \in \mathbb{Z}_{>0}$ is a parameter that can be specified.


4.2.5 Optimisation

The full objective function to be minimized is:

$$E(\Psi) = E_D(\Psi) + \alpha E_s(\Psi), \tag{4.18}$$

where $\alpha \in \mathbb{R}_{>0}$ weights the stiffness term. Following Amberg et al. [3] the data term can be rearranged as follows:

$$E_D(\Psi) = \sum_{i=1}^{N} w_i \|A_i x_i - v_{n(i)}\|^2 + \lambda \sum_{j=1}^{M} w'_j \|A_{m(j)} x_{m(j)} - u_j\|^2 \tag{4.19}$$

$$= \left\| (\mathbf{W} \otimes I_3) \left( \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_N \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} - \begin{bmatrix} v_{n(1)} \\ v_{n(2)} \\ \vdots \\ v_{n(N)} \end{bmatrix} \right) \right\|^2 \tag{4.20}$$

$$\quad + \lambda \left\| (\mathbf{W}' \otimes I_3) \left( \begin{bmatrix} A_{m(1)} & 0 & \cdots & 0 \\ 0 & A_{m(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_{m(M)} \end{bmatrix} \begin{bmatrix} x_{m(1)} \\ x_{m(2)} \\ \vdots \\ x_{m(M)} \end{bmatrix} - \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \end{bmatrix} \right) \right\|^2$$

where $\mathbf{W} := \mathrm{diag}(\sqrt{w_1}, \ldots, \sqrt{w_N})$, $\mathbf{W}' := \mathrm{diag}(\sqrt{w'_1}, \ldots, \sqrt{w'_M})$ and $\otimes$ is the Kronecker product. Then by introducing:

$$\mathbf{A} = [A_1 \cdots A_N]^\top, \tag{4.21}$$
$$\mathbf{U} = [v_{n(1)}, \ldots, v_{n(N)}]^\top, \tag{4.22}$$
$$\mathbf{U}' = [u_1, \ldots, u_M]^\top, \tag{4.23}$$

$$\mathbf{X} = \begin{bmatrix} x_1^\top & 0 & \cdots & 0 \\ 0 & x_2^\top & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_N^\top \end{bmatrix}, \tag{4.24}$$

$$\mathbf{X}' : [\mathbf{X}']_{ij} = \begin{cases} [x_{m(i)}]_1 & \text{if } j = 4m(i) - 3 \\ [x_{m(i)}]_2 & \text{if } j = 4m(i) - 2 \\ [x_{m(i)}]_3 & \text{if } j = 4m(i) - 1 \\ [x_{m(i)}]_4 & \text{if } j = 4m(i) \\ 0 & \text{otherwise} \end{cases}, \tag{4.25}$$

we can rewrite the data fit term as:

$$E_D(\Psi) = \|\mathbf{W}(\mathbf{X}\mathbf{A} - \mathbf{U})\|_F^2 + \lambda \|\mathbf{W}'(\mathbf{X}'\mathbf{A} - \mathbf{U}')\|_F^2. \tag{4.26}$$

To see this, note that in the first term of Equation 4.20 the squared norm is being taken of a $3N \times 1$ vector, where the entries are the stacked elements of the $3 \times 1$ vectors $\sqrt{w_i}(A_i x_i - v_{n(i)})$, $i = 1, \ldots, N$. Similarly, the first term of Equation 4.26 is the squared norm of an $N \times 3$ matrix, where the $i$'th row contains the three elements of the vector $\sqrt{w_i}(A_i x_i - v_{n(i)})$. As such the Euclidean norm of the first, and the Frobenius norm of the latter, are equivalent. The working is similar for the second term in each expression. As for the stiffness term, it may be rearranged as

$$E_s(\Psi) = \sum_{i=1}^{N} \sum_{j \in \mathcal{N}(i)} \|(A_i - A_j)\mathbf{G}\|_F^2 \tag{4.27}$$

$$= \|(\mathbf{M} \otimes \mathbf{G})\mathbf{A}\|_F^2, \tag{4.28}$$

where $\mathbf{M}$ is the node-arc incidence matrix of the graph defined by the neighbourhoods $\mathcal{N}(i)$, $i = 1, \ldots, N$, discussed previously. In this graph, points are nodes, and nodes are connected by an edge if there exists a neighbourhood which contains both points. Taking both terms into account, the full objective function can be expressed as the following quadratic form:

$$E(\Psi) = \left\| \begin{bmatrix} \alpha \mathbf{M} \otimes \mathbf{G} \\ \mathbf{W}\mathbf{X} \\ \lambda \mathbf{W}'\mathbf{X}' \end{bmatrix} \mathbf{A} - \begin{bmatrix} \mathbf{0} \\ \mathbf{W}\mathbf{U} \\ \lambda \mathbf{W}'\mathbf{U}' \end{bmatrix} \right\|_F^2 \tag{4.29}$$

$$\triangleq \|\mathbf{B}\mathbf{A} - \mathbf{C}\|_F^2. \tag{4.30}$$

This may be solved exactly for $\mathbf{A}$ by finding the gradient and setting it equal to zero:

$$\mathbf{A} = (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{C}. \tag{4.31}$$

See [3] for a proof that the Hessian $\mathbf{B}^\top \mathbf{B}$ is invertible. In practice, solving the linear system $(\mathbf{B}^\top \mathbf{B})\mathbf{A} = \mathbf{B}^\top \mathbf{C}$ is more efficient.

4.2.6 Iterative stiffness relaxation

The deformations that this method produces are sensitive to the stiffness parameter $\alpha$. High values enforce more global transformations, whereas low values allow points to deform independently of their neighbours. Amberg et al. [3] proposed an iterative reduction of the stiffness parameter, which allows for more robust fitting. A set of stiffness weights $\mathcal{A} = \{\alpha_1, \ldots, \alpha_A\}$, with $\alpha_a < \alpha_{a-1}$, is chosen, and for each weight the iterative closest point algorithm is applied and the correspondences are updated. The final transformation $\Psi$ is obtained by minimizing the objective function $E(\Psi)$ for correspondences that have been iteratively updated throughout the procedure. We modify this procedure by updating both the template shape and the correspondences at each iteration, instead of updating just the correspondences. Thus at each iteration a new template shape is determined, and the optimal transformation of this updated shape is found. We describe this process as the iterative stiffness relaxation (ISR) algorithm and give full details in Algorithm 3.

4.2.7 Re-estimation of template shape

After selecting a template shape and finding correspondences using the ISR algorithm we obtain a fixed-dimensional set of corresponding landmarks $\mathcal{D} = \{x_i\}_{i=1}^{M}$, where $M$ is the number of shapes in the object class. We can then compute the mean shape $\bar{x} = \frac{1}{M}\sum_i x_i$. To reduce bias due to the choice of initial template, we use the mean shape as a new template and repeat the ISR procedure to obtain the final set of correspondences.


Algorithm 3 Iterative Stiffness Relaxation
Input: Template landmark set $\mathcal{X}_t$, target vertices $V$, target landmark set $U$, stopping tolerance $\epsilon$, stiffness set $\mathcal{A} = \{\alpha_1, \ldots, \alpha_A\}$, with $\alpha_a < \alpha_{a-1}$.
Output: Point set $\mathcal{X}_c$ on the target surface that corresponds to the template landmark points $\mathcal{X}_t$.
1: Initialise $E(\Psi_0) := \infty$, $\Psi_1 := I$, $\mathcal{X}_c := \mathcal{X}_t$ and $t := 1$.
2: for $\alpha_i \in \mathcal{A}$ do
3: while $E(\Psi_{t-1}) - E(\Psi_t) > \epsilon$ do
4: For each template landmark point $x_i \in \mathcal{X}_c$ find the closest target vertex $v_{n(i)} \in V$ and the associated weight $w_i$.
5: For each target landmark point $u_j \in U$ find the corresponding template landmark point $x_{m(j)} \in \mathcal{X}_c$ and the associated weight $w'_j$.
6: Set $\Psi_{t+1} := \arg\min_{\Psi} E(\Psi)$.
7: Update the template points using the current transformation parameters: $\mathcal{X}_c := \Psi_{t+1}(\mathcal{X}_c)$.
8: Set $t := t + 1$.
9: end while
10: end for
11: Snap each point in $\mathcal{X}_c$ to the closest target vertex. The output $\mathcal{X}_c$ is the set of snapped points.

4.2.8 Summary and parameter selection

The method involves a number of different steps and the specification of various options and hyperparameters, so for clarity we provide the following step-by-step summary:

• Select template shape.

• Specify radii $r_{\text{template}}$ and $r_{\text{target}}$ for the landmark selection algorithm (Algorithm 2) and obtain template landmarks $X$.

• Configure the objective function $E(\Psi)$ by specifying parameters $\lambda$, $\gamma$ and $k$.

• Specify the stiffness set $\mathcal{A}$ and the tolerance $\epsilon$ for the ISR algorithm.

• Apply the ISR algorithm, obtain the mean shape, then reapply the ISR algorithm to obtain the final correspondences.

For each dataset, the parameters and options were chosen using a combination of insight and experimentation. The template shape was chosen to be a fairly representative shape, without distinguishing features. The radius for the landmark selection algorithm was chosen based on the scale of the objects, with the aim being to obtain around 700-800 landmarks. We used the same radius for both the template and the target objects. The $\gamma$ parameter, which weights differences in the skew and rotational parts of the transformation against the translational part, was set to 2 for all classes. The $k$ parameter, which determines the number of neighbours per neighbourhood in the regularisation term, was set to 6 for all classes. The $\lambda$ parameter, which weights the importance of the target landmarks in the distance metric, was set to 1, 0.5 and 0.1 for the vases, teapots and cars respectively. The lower value for the cars was chosen because some of the car models feature interior vertices with seats, dashboard, etc. This meant that with a high $\lambda$ value the template landmarks would be attracted to these interior points, thus producing poor correspondences. The stiffness parameters in the alpha set were chosen so that the initial transformations were global, and so that the final transformations did not become unstable. For the vases and teapots this was an equally spaced set $\mathcal{A} = \{100, \ldots, 5\}$; for the cars a slightly higher final value was required, with $\mathcal{A} = \{100, \ldots, 10\}$. Finally, the stopping tolerance $\epsilon$ was set so that a reasonable number of iterations (20-30) occurred for each stiffness setting, with values of $\epsilon = 0.005$, $\epsilon = 0.2$ and $\epsilon = 0.2$ for the cars, vases and teapots respectively.

Figure 4.2: Correspondences obtained for four car models. Points were manually selected on car (a), and the corresponding points were found on the other cars.

4.3 Evaluation and analysis

In the absence of ground-truth correspondences for the shapes of interest, a full quantitative evaluation of the correspondence procedure is not possible. Instead we consider some examples of the correspondences found and investigate the qualitative effects of the decisions made when formulating the method.

4.3.1 Correspondence examples

The first step in evaluating the effectiveness of the correspondence-finding procedure is to consider some examples. For each dataset we obtain correspondences using the methods outlined in the summary (Section 4.2.8). Then we manually select landmarks of interest on a reference object and find the corresponding points on three other models. We consider whether the correspondences are of good quality by assessing whether corresponding points share local and global geometrical features.

Figure 4.3: Correspondences obtained for four vase models. Points were manually selected on vase (a), and the corresponding points were found on the other vases.

Figure 4.4: Correspondences obtained for four teapot models. Points were manually selected on teapot (a), and the corresponding points were found on the other teapots.

• Cars (Figure 4.2) The chosen landmarks are at points of interest including the bottoms of the wheels, the wing-mirrors, and the top corners of the windscreen. In general the correspondences are good, with most points in rough correspondence across models, and key landmarks such as the bottoms of the wheels and the wing-mirrors corresponding correctly. However, there are a few cases where the correspondence could be better. For car (b) the point associated with the top corner of the rear passenger window should be further to the rear of the car. The point associated with the bottom of the front bumper of car (c) is too high. Also for car (c) the headlight landmark point is further back than it should be. This is probably due to the car having a very square front that doesn't curve back like most other examples.

• Vases (Figure 4.3) The chosen landmarks for the vase models are at the bases of the vases, the lips of the vases, and around central contours of the vases. The correspondences achieved are very good, with points on the vase lips and bases being in good correspondence. One slight issue is asymmetry in vase (c), where points on the right-hand side have slipped below points on the left-hand side. One would expect symmetrical deformations for objects like vases; however, asymmetry could be introduced by the random landmark sampling procedure.

• Teapots (Figure 4.4) The chosen landmarks for the teapot models are at the bases of the teapots, along the teapot handles, on the teapot spout, and at the top of the teapot lid. The correspondences obtained are workable, but a number of poor correspondences are evident. The point where the spout attaches to the body is far too high up on teapots (b) and (c). The uppermost point on the teapot handle (orange) is poorly positioned on teapot (c); in fact the landmark has left the handle and has moved to the upper part of the teapot body. These poor correspondences are understandable given that the teapots are so variable in shape, with spouts and handles attaching to the body at very different places across different models.

Based on these samples, we judge the correspondences obtained by the methods in this chapter to be of satisfactory quality, albeit with some errors due primarily to shape variability. Importantly, we achieve the primary goal of producing correspondences that are good enough to be used as a basis for the shape modelling task. In addition, it should be noted that the results are produced without any hand-annotated landmarks, and with a minimum of tweaking for different object types. As such, the methods could easily be applied to arbitrary object classes, which is a desirable property of a real system. A potential direction for future work is to take a parts-based approach, and find correspondences between object parts such as teapot handles or spouts. The parts could then be 'stitched' back together to produce a full shape. This approach would eliminate the issue present with the teapot models, where correspondences were found between separate object parts. The drawback is that part-annotated models are required, which may not be easy to obtain.

4.3.2 Deformation examples

In this section we demonstrate the effects of certain features of the correspondence-finding procedure by highlighting the deformations achieved when these features are modified or removed. In particular we focus on the use of local nonrigid deformations, the bidirectional distance metric, and the iterative stiffness relaxation algorithm.

• Local nonrigid transformations (Figure 4.5) Regularised local transformations allow a template shape to deform flexibly in order to match a target shape. Figure 4.5 compares deformations of a teapot model with (b) global and (c) local transformations. The global transformation is achieved by choosing a high value for the regularisation constant α, and by not applying the iterative stiffness relaxation algorithm. The local transformations are achieved by allowing the stiffness to decrease so that neighbouring landmarks are less constrained to behave similarly. It is clear from the figure that a global transformation is not sufficient to capture the differences in shape of the two teapot models, whereas the local transformations are effective in recovering the shape of the target model.

• Bidirectional distance metric (Figure 4.6) One of the notable features of the particular correspondence-finding process that we have used is the bidirectional distance metric. In the absence of manually annotated landmark points this feature is invaluable for obtaining reasonable correspondences, as Figure 4.6 shows. The bidirectional metric remedies the flaw in the single-direction metric, which is that the target shape does not have to be fully covered to optimise the objective function. Under the single-direction metric the deformation can find some subset of the target such that the template landmarks are close to target vertices and the smoothness constraints are satisfied, as in (d). The bidirectional metric pulls the landmark points up so that they cover every part of the target surface (c).

• Iterative stiffness relaxation (Figure 4.7) The iterative stiffness relaxation (ISR) algorithm produces a sequence of increasingly local transformations. This is beneficial as the initial global transformations serve as a good initialisation for further local transformations. Figure 4.7 shows deformations both with (c) and without (d) the ISR algorithm. The deformation that does not use the ISR algorithm has a single stiffness value, which is set to be the same as the final stiffness value of the deformation that does use ISR. The deformation achieved using the algorithm is of higher quality, covering the target surface more evenly and completely. The deformation with ISR is helped by the initial global affine transformation (b).

Figure 4.5: Global vs. local deformations. (a) Target model in blue with template landmark points shown in red. (b) Optimal global affine transformation of template points. (c) Template deformed using local transformations through the iterative stiffness relaxation algorithm.


Figure 4.6: Surface registration. (a) Target model in blue with template landmark points shown in red. (b) Intermediate iteration of surface registration. (c) Template deformed with the bidirectional distance metric. (d) The perils of an asymmetric distance metric: the template does not have to cover the target fully to achieve a low objective function value.

Figure 4.7: Iterative stiffness relaxation. (a) Target model in blue with template landmark points shown in red. (b) Iterative stiffness relaxation first finds an optimal global affine transformation. (c) Fully deformed template. (d) Template deformed without iterative stiffness relaxation.

In this section we have used example deformations to support the decisions made in formulating the method. We demonstrated that global transformations of landmark points are not sufficient to capture the shape differences within object classes (Figure 4.5). The bidirectional distance metric helped the deformed template to cover the entirety of the target shape (Figure 4.6). Finally, we showed the improved deformations obtained using the iterative stiffness relaxation algorithm rather than a single stiffness setting (Figure 4.7). These features seemed reasonable to include and resulted in qualitatively improved deformations; however, there are additional techniques that could have been explored. An extension to this project could consider alternative regularisation techniques or point-to-plane distance metrics. It would be interesting to see whether these alternatives could further improve the obtained correspondences.


Chapter 5

Models for Object Shape

In the previous chapter a process for establishing correspondences between landmark points on objects within an object class was described. This was an intermediate goal; a prerequisite for the shape modelling that will allow us to generate new plausible objects. In order to model object shape we specify a generative model p(x | θ) and learn the parameters θ from a collection of corresponding landmark points D = {x_i}_{i=1}^M for a particular object class. This model can then be sampled from to obtain new instances of the object class. In this chapter we present an overview of shape modelling using landmark points (Section 5.1) and describe a number of the models used in more detail, as well as their learning and sampling methods (Sections 5.2 - 5.3). We then evaluate the chosen models in terms of the realism and novelty of the samples they produce, as well as on a shape reconstruction task (Section 5.5). We then analyse how the train and test log-likelihood varies with the number of latent dimensions used for the PPCA model (Section 5.5.4). Finally, we demonstrate how the models can be used for data visualisation and interactive sampling (Section 5.5.5).

5.1 Statistical shape models

Generative models of object shape using landmark points are more commonly known as statistical shape models. We briefly introduced this class of models in Chapter 2. The most common statistical shape model is the linear subspace model of Cootes et al., which assumes that the data lie near a low-dimensional linear subspace. Independent components analysis has been used as a shape model in the context of medical image segmentation, where it was found to produce more local shape variation than PCA [32]. Romandhani et al. proposed a kernel PCA-based shape model to capture the nonlinear shape variations of human faces captured in 2D images from different viewpoints [33]. Campbell et al. used a Gaussian process latent variable model (GP-LVM) to model font shapes determined by dense landmark points. This model was chosen as it has been shown to require far fewer latent dimensions than linear models in order to account for the variability in the data.

In this project we use a probabilistic principal components analysis (PPCA) model as a baseline, and investigate the effectiveness of both a standard GP-LVM and a Bayesian GP-LVM in comparison. It is hoped that the GP-LVMs, with their non-linear mappings from latent to data space, will be able to more accurately reproduce the low-dimensional manifold on which the data lie. In Sections 5.2 - 5.3 we provide background for these models, including their learning and sampling methods.


Figure 5.1: Two-dimensional multivariate Gaussian with µ = [0, 0] and Σ = [2, 0.5; 0.5, 1.4].

5.2 Gaussian models

The multivariate Gaussian is the most commonly used model of continuous variables. It forms the basis of each of the models described in this chapter. The multivariate Gaussian has the following form:

p(x | µ, Σ) = 1/((2π)^{D/2} |Σ|^{1/2}) exp( −(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ) ),    (5.1)

where x is a random vector, µ is the mean of the distribution, and Σ is the covariance. We write p(x | µ, Σ) = N(x | µ, Σ) for compactness. Figure 5.1 shows an example of a two-dimensional Gaussian distribution. If we assume that the data are generated i.i.d. by a multivariate Gaussian then the probability density of the data given the parameters is:

p(D | µ, Σ) = ∏_{i=1}^{N} N(x_i | µ, Σ).    (5.2)

The goal is then to learn the parameters µ and Σ.

5.2.1 Learning

In order to learn the parameters we may use maximum likelihood (MLE), maximum a posteriori (MAP), or full Bayesian estimation. The likelihood of the parameters is a function which returns the probability density of the data given the parameters:

L(µ, Σ) = p(D | µ, Σ)    (5.3)
        = ∏_{i=1}^{N} 1/((2π)^{D/2} |Σ|^{1/2}) exp( −(1/2)(x_i − µ)ᵀ Σ⁻¹ (x_i − µ) ).    (5.4)

Taking logs we transform the product into a sum and obtain the log likelihood:

ℓ(µ, Σ) = −(ND/2) ln(2π) − (N/2) ln|Σ| − (1/2) ∑_{i=1}^{N} (x_i − µ)ᵀ Σ⁻¹ (x_i − µ).    (5.5)
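The log likelihood above translates directly into code. The following sketch (my own NumPy illustration, not from the thesis) evaluates Equation 5.5 and shows that, for a fixed covariance, the sample mean of Equation 5.8 maximises it:

```python
import numpy as np

def gaussian_log_likelihood(X, mu, Sigma):
    """Log likelihood of Equation 5.5 for the rows of X under N(mu, Sigma)."""
    N, D = X.shape
    diff = X - mu
    sign, logdet = np.linalg.slogdet(Sigma)   # stable log-determinant
    quad = np.einsum('ni,ij,nj->', diff, np.linalg.inv(Sigma), diff)
    return -0.5 * N * D * np.log(2 * np.pi) - 0.5 * N * logdet - 0.5 * quad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
mu_mle = X.mean(axis=0)                         # sample mean (Equation 5.8)
Sigma_mle = np.cov(X, rowvar=False, bias=True)  # sample covariance (Equation 5.9)
ll = gaussian_log_likelihood(X, mu_mle, Sigma_mle)
```

Perturbing the mean away from the MLE can only decrease the log likelihood, which is a cheap sanity check on the implementation.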


The MLE is obtained by finding the parameters that maximise the likelihood. As the log function is monotonic, it is sufficient to maximise the log likelihood:

µ, Σ = argmax_{µ,Σ} L(µ, Σ)    (5.6)
     = argmax_{µ,Σ} ℓ(µ, Σ).    (5.7)

The maximum can be found analytically by setting the gradient equal to zero. The MLE parameters are the sample mean and covariance:

µ = (1/N) ∑_{i=1}^{N} x_i = x̄,    (5.8)
Σ = (1/N) ∑_{i=1}^{N} (x_i − x̄)(x_i − x̄)ᵀ.    (5.9)

For Bayesian estimation, a prior p(θ) on the parameters is specified, and the posterior distribution p(θ | D) may be determined using Bayes' rule:

p(θ | D) = p(D | θ) p(θ) / ∫ p(D | θ) p(θ) dθ.    (5.10)

In MAP estimation, we seek the parameters that maximise the posterior distribution, θ = argmax_θ p(θ | D). In full Bayesian estimation we aim to recover or approximate the posterior distribution by computing the normalising constant ∫ p(D | θ) p(θ) dθ.

The Gaussian model is a natural first choice for shape modelling; however, the high dimensionality of the data relative to the number of data points is problematic. The covariance matrix Σ has O(M²) elements, where M is the number of landmark points. For a model with 700 landmark points, the covariance matrix has 4,410,000 elements. For the datasets of 30-180 data points that we are working with, the potential for overfitting is severe. As such, the multivariate Gaussian is not an appropriate choice without strong regularisation. The latent variable models in the following section circumvent this issue by assuming that the data are generated in a much lower-dimensional latent space before being transformed into the data space.

5.3 Latent variable models

In the Gaussian model presented above, the correlation between data variables was explicitly modelled with the covariance matrix. An alternative approach is to assume that the data are actually explained by a process in which hidden or latent variables are generated independently before being transformed into the data space.

5.3.1 Factor analysis

If we assume that the data variables x are generated by adding Gaussian noise to a linear function of continuous latent variables z, we obtain the following model:

p(x | z, θ) = N(x | Wz + µ, Ψ),    (5.11)


where the factor loading matrix W is a D × L matrix and Ψ is a D × D covariance matrix. Now if we assume that the latent variables have a standard Gaussian distribution p(z) = N(z | 0, I), and that the covariance matrix of the noise is diagonal, Ψ = diag(Ψ_1, ..., Ψ_D), then we obtain the factor analysis model. The restriction that the noise covariance is diagonal encourages the latent variables to explain the correlation between data variables. The marginal distribution of the data variables given the parameters is obtained by integrating out the hidden variables:

p(x | θ) = ∫ N(x | Wz + µ, Ψ) N(z | 0, I) dz    (5.12)
         = N(x | µ, WWᵀ + Ψ).    (5.13)

Note that the choice of the standard Gaussian distribution for the latent variables does not impact the flexibility of the model, as a latent mean µ_0 and covariance Σ_0 can be absorbed into the other parameters µ and W. In learning we aim to estimate the mean µ, the factor loading matrix W and the diagonal noise variance Ψ. The MLE of µ is simply the sample mean x̄, and the other parameters can be estimated using either the expectation maximisation (EM) algorithm, or the eigen-approach common in statistics [34].
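Equation 5.13 can be checked by simulation. The sketch below (my own illustration, not the thesis implementation) draws data from the factor analysis generative process and compares the empirical covariance against WWᵀ + Ψ:

```python
import numpy as np

rng = np.random.default_rng(1)
D, L, N = 4, 2, 200_000
W = rng.normal(size=(D, L))                # factor loading matrix (D x L)
Psi = np.diag(rng.uniform(0.1, 0.5, D))    # diagonal noise covariance
mu = rng.normal(size=D)

Z = rng.normal(size=(N, L))                # latent variables, z ~ N(0, I)
noise = rng.normal(size=(N, D)) * np.sqrt(np.diag(Psi))
X = Z @ W.T + mu + noise                   # x = W z + mu + noise (Equation 5.11)

empirical = np.cov(X, rowvar=False)
analytic = W @ W.T + Psi                   # marginal covariance (Equation 5.13)
```

For large N the empirical covariance converges to the analytic expression, confirming the marginalisation.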

5.3.2 Probabilistic principal component analysis

Consider the factor analysis model described above, but now impose the more restrictive assumption that the noise covariance Ψ is not just diagonal, but a scalar multiple of the identity matrix, Ψ = σ²I. This is the probabilistic principal components model [35].

5.3.2.1 Learning

Unlike factor analysis, the EM algorithm is not required to learn the parameters of the PPCA model. Recall the marginal distribution of the visible variables from Equation 5.13. In PPCA we can maximise the marginal likelihood with respect to the parameters θ = (W, σ²) as follows:

L(W, σ²) = p(D | µ, W, σ²)    (5.14)
         = ∏_{i=1}^{N} N(x_i | 0, WWᵀ + σ²I)    (5.15)
⇒ ℓ(W, σ²) = −(N/2) ln|C| − (1/2) ∑_{i=1}^{N} x_iᵀ C⁻¹ x_i    (5.16)
           = −(N/2) ( ln|C| + tr(C⁻¹Σ) ),    (5.17)

where C = WWᵀ + σ²I, Σ = (1/N) ∑_{i=1}^{N} x_i x_iᵀ is the sample covariance, and we assume that the data are centred. The MLE is obtained at

W = V(Λ − σ²I)^{1/2} R,    (5.18)

where V is the matrix of eigenvectors of the sample covariance Σ and Λ is the corresponding diagonal matrix of eigenvalues. R is an arbitrary orthogonal matrix, which may be chosen to be I. In regular PCA the data are transformed into the latent space with x = (VΛ)z, so that as the noise variance σ² → 0, the PPCA MLE tends to the regular PCA estimate. The MLE of the noise variance is

σ² = (1/(D − L)) ∑_{j=L+1}^{D} λ_j,    (5.19)

which is the average variance of the unused dimensions.
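Equations 5.18 and 5.19 translate directly into code. This sketch (my own, with R = I; not the thesis's Matlab implementation) fits PPCA from the eigendecomposition of the sample covariance:

```python
import numpy as np

def fit_ppca(X, L):
    """Maximum-likelihood PPCA (Equations 5.18-5.19), choosing R = I."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)                      # sample covariance
    evals, evecs = np.linalg.eigh(S)            # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # sort descending
    sigma2 = evals[L:].mean()                   # Equation 5.19
    W = evecs[:, :L] * np.sqrt(evals[:L] - sigma2)  # Equation 5.18
    return W, sigma2

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
W, sigma2 = fit_ppca(X, L=2)
```

A useful property to check: the model covariance C = WWᵀ + σ²I reproduces the top L eigenvalues of the sample covariance exactly, with the remaining eigenvalues all equal to σ².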

5.3.2.2 Covariance regularisation

In order to estimate the parameters of the PPCA model it is necessary to compute the sample covariance matrix Σ. For problems with few data points this can be problematic, as the sample covariance is likely to be a poor estimator of the true covariance. This can lead to incorrect estimation of the PPCA parameters, and to numerical issues where the marginal data covariance C is ill-conditioned, or even singular. Various regularisation methods have been proposed to improve the covariance estimate, including the Van Ness estimator [36], where Σ_VN(α) = α diag(Σ) for some scalar α, and shrinkage estimation, where Σ_S(δ) = δΣ + (1 − δ)I for some scalar δ ∈ [0, 1]. The closer δ is to 1, the more covariance information is retained; the closer to zero, the better the conditioning of the covariance estimate. The Ledoit-Wolf shrinkage estimator [37] is a method for choosing δ so as to minimise the expected squared loss E[‖Σ_S − Σ‖²_F] under the Frobenius norm as the number of observations and variables go to infinity. Bayesian approaches specify a prior on the covariance matrix p(Σ), such as the conjugate inverse-Wishart or scaled inverse-Wishart distributions [38]. The MAP estimate can then be obtained by finding the mode of the posterior distribution.
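To illustrate the shrinkage estimator defined above, the following sketch (my own; the δ value is arbitrary, and scikit-learn's `sklearn.covariance.LedoitWolf` provides a data-driven choice) shows how Σ_S(δ) = δΣ + (1 − δ)I repairs a singular sample covariance when there are fewer data points than dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 10, 30                    # fewer data points than dimensions
X = rng.normal(size=(N, D))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / N                # sample covariance: rank at most N - 1

delta = 0.7
S_shrunk = delta * S + (1 - delta) * np.eye(D)   # shrinkage estimator
```

Every eigenvalue of the shrunk estimate is at least 1 − δ, so the matrix is always invertible, whereas the raw sample covariance here is rank-deficient.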

5.3.2.3 Sampling

Sampling from the PPCA model is straightforward: first sample from the latent Gaussian distribution p(z) = N(z | 0, I), then transform to data space using the learned linear mapping W. Finally, add spherical Gaussian noise with learned variance σ². Equivalently, we can sample directly from the marginal data distribution p(x | θ) = N(x | µ, WWᵀ + σ²I).
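The two equivalent sampling routes can be written as follows (a sketch with assumed parameter values, not the thesis code):

```python
import numpy as np

rng = np.random.default_rng(4)
D, L = 6, 2
W = rng.normal(size=(D, L))   # assumed learned mapping
mu = rng.normal(size=D)       # assumed learned mean
sigma2 = 0.1                  # assumed learned noise variance

# Route 1: ancestral sampling -- z ~ N(0, I), then x = W z + mu + noise.
z = rng.normal(size=L)
x = W @ z + mu + rng.normal(scale=np.sqrt(sigma2), size=D)

# Route 2: sample the marginal N(mu, W W^T + sigma^2 I) directly.
C = W @ W.T + sigma2 * np.eye(D)
x2 = rng.multivariate_normal(mu, C)
```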

5.3.3 Gaussian process latent variable model

The Gaussian process latent variable model (GP-LVM) [39] is a generative probabilistic model that maps a low-dimensional space to the data space by means of a kernel function. The GP-LVM is best described as a non-linear extension of PPCA. Recall the joint distribution of the visible and latent variables given the parameters under the PPCA model:

p(x, z | W, σ²) = N(x | Wz, σ²I) N(z | 0, I).    (5.20)

When learning the PPCA model we integrated out the latent variables z to obtain the marginal distribution of the visible variables given the parameters. This probability was then maximised with respect to the parameters to obtain the maximum likelihood estimates of the parameters µ, W and σ². We can approach this task in a different way by placing a prior on W and integrating it out to obtain the marginal distribution of the visible variables given the latent variables and other parameters. Writing X = [x_1, ..., x_N]ᵀ and Z = [z_1, ..., z_N]ᵀ for the data and latent matrices respectively, and using a prior of the form p(W) = ∏_j N(w_j | 0, I), where w_j is the j'th row of W, we have

p(X, W | Z, σ²) = ∏_{i=1}^{N} N(x_i | Wz_i, σ²I) ∏_{j=1}^{D} N(w_j | 0, I)    (5.21)
⇒ p(X | Z, σ²) = ∫ ∏_{i=1}^{N} N(x_i | Wz_i, σ²I) ∏_{j=1}^{D} N(w_j | 0, I) dW    (5.22)
               = ∏_{j=1}^{D} N(x_{•j} | 0, ZZᵀ + σ²I),    (5.23)

where x_{•j} is the j'th column of the data matrix X. To see this, note that the prior over W is Gaussian, and so the joint distribution of X and W will also be Gaussian. After integration the marginal over X will be Gaussian, and it suffices to identify the mean and covariance of the distribution:

x_{•j} = Zw_j + ε_{•j}    (5.24)
⇒ E[x_{•j}] = E[Zw_j + ε_{•j}]    (5.25)
            = Z E[w_j] + E[ε_{•j}]    (5.26)
            = 0,    (5.27)

cov[x_{•j}] = E[x_{•j} x_{•j}ᵀ]    (5.28)
            = E[(Zw_j + ε_{•j})(Zw_j + ε_{•j})ᵀ]    (5.29)
            = Z E[w_j w_jᵀ] Zᵀ + E[ε_{•j}] E[w_jᵀ] Zᵀ    (5.30)
              + Z E[w_j] E[ε_{•j}ᵀ] + E[ε_{•j} ε_{•j}ᵀ]    (5.31)
            = ZZᵀ + σ²I.    (5.32)

Thus p(x_{•j} | Z, σ²) = N(x_{•j} | 0, ZZᵀ + σ²I), and the full marginal distribution of the data is the product p(X | Z, σ²) = ∏_{j=1}^{D} N(x_{•j} | 0, ZZᵀ + σ²I). Now writing K = ZZᵀ + σ²I we have the following marginal likelihood and log likelihood:

L(Z, σ²) = 1/((2π)^{DN/2} |K|^{D/2}) exp( −(1/2) tr(K⁻¹XXᵀ) )    (5.33)
⇒ ℓ(Z, σ²) = −(D/2) ln|K| − (1/2) tr(K⁻¹XXᵀ) + const.    (5.34)
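The covariance derived in Equation 5.32 can be verified numerically. This sketch (my own check, not part of the thesis) averages x_{•j} x_{•j}ᵀ over many draws of w_j and the noise, for a fixed latent matrix Z:

```python
import numpy as np

rng = np.random.default_rng(9)
N, L, sigma2, S = 5, 2, 0.3, 200_000
Z = rng.normal(size=(N, L))                 # fixed latent matrix

Ws = rng.normal(size=(S, L))                # S draws of w_j ~ N(0, I)
Es = rng.normal(scale=np.sqrt(sigma2), size=(S, N))
Xs = Ws @ Z.T + Es                          # each row is one draw of x_{.j}
empirical = Xs.T @ Xs / S                   # Monte Carlo estimate of E[x x^T]
analytic = Z @ Z.T + sigma2 * np.eye(N)     # Equation 5.32
```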

In the PPCA model we maximised the log likelihood with respect to the parameters W after marginalising the latent variables. Here we have marginalised the parameters and instead maximise the log likelihood with respect to the latent points Z. The gradient of the likelihood with respect to the latent variables is

∂ℓ/∂Z = K⁻¹XXᵀK⁻¹Z − DK⁻¹Z,    (5.35)

so that the solution Z : ∂ℓ/∂Z = 0 is satisfied when

(1/D) XXᵀ K⁻¹ Z = Z.    (5.36)


Figure 5.2: Locations in latent space of data points from the Oil data using a GP-LVM trained using scaled conjugate gradients for 400 iterations. Marker colours represent classes of oil flow; it is notable that they are well separated in the latent space. The colormap represents the precision with which the manifold was expressed in data-space for that latent point.

It can be shown (see Appendix A) that the optimal solution is

Z = U_L L Vᵀ,    (5.37)

where U_L is an N × L matrix whose columns are eigenvectors of XXᵀ, L is an L × L diagonal matrix whose j'th diagonal element is (λ_j/(αD) − 1/(βα))^{−1/2}, where λ_j is the j'th eigenvalue of XXᵀ, and V is an arbitrary L × L orthogonal matrix. It can be shown (Appendix B) that this solution for the optimal positions of the latent variables is equivalent to that obtained by regular PPCA using Z = XW. The key observation in the GP-LVM is that the marginal likelihood in Equation 5.33 is a product of independent Gaussian processes with a particular linear kernel K = ZZᵀ + σ²I. As such, this method of obtaining the latent positions Z can be extended by using non-linear kernels K, such as the radial basis function (RBF), which has the form

k(z, z′) = α exp( −(γ/2)(z − z′)ᵀ(z − z′) ) + δ_{z,z′} β⁻¹.    (5.38)

GP-LVM describes this broader class of dimensionality-reduction techniques that make use of non-linear kernels in this way. A pleasing characteristic of GP-LVMs is the high quality visualisations they can be used to create. Figure 5.2 is an example of a visualisation of a multi-class dataset with 2 latent dimensions.
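The RBF kernel of Equation 5.38 is a few lines of code. In this sketch (my own; the parameter values are arbitrary) the δ_{z,z′} β⁻¹ term appears as β⁻¹ added along the diagonal of the Gram matrix:

```python
import numpy as np

def rbf_kernel(Z, alpha=1.0, gamma=2.0, beta=100.0):
    """Gram matrix of Equation 5.38 for the rows of Z."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * sq) + np.eye(len(Z)) / beta

rng = np.random.default_rng(5)
Z = rng.normal(size=(20, 2))
K = rbf_kernel(Z)
```

The resulting matrix is symmetric and positive definite (the β⁻¹ diagonal term also acts as jitter that keeps it well-conditioned), with diagonal entries α + β⁻¹.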

5.3.3.1 Learning

When learning a GP-LVM, the goal is to estimate the latent points Z given the data X. This is achieved by maximising the probability of the data given the latent points, that is, Z = argmax_Z p(X | Z, σ²). The gradient of the log likelihood with respect to the latent points is given in Equation 5.35, and the gradient with respect to an individual latent coordinate, ∂ℓ/∂z_{n,j}, can be obtained using the chain rule. For the linear kernel the optimum can be found analytically by setting the gradient to zero; however, for more general non-linear kernels this is not possible.


As such, non-linear optimisers such as scaled conjugate gradients (SCG) [40] may be used to iteratively find a local optimum. In addition, the kernel may contain parameters such as α, γ and β for the RBF. Gradients can also be found with respect to these parameters, and they can be optimised simultaneously with the latent points.

5.3.3.2 Sampling

Having inferred the latent variables Z, we can treat the set of latent and data variables as a training set D = {(z_i, x_i)}_{i=1}^N. Now given a point in latent space z_* we can sample from the posterior distribution p(f_* | z_*, X, Z, θ). To determine the form of this distribution we first note that the data variables are generated by D independent Gaussian processes. That is,

p(f_* | z_*, X, Z, θ) = ∏_{d=1}^{D} p(f_{d*} | z_*, X, Z, θ).    (5.39)

If we write x_d = [x_{d1}, ..., x_{dN}]ᵀ for the data points in the d'th dimension, then we have the following joint distribution:

[x_d; f_{d*}] ∼ N( 0, [K_{x_d}, K_*; K_*ᵀ, k_{**}] ),    (5.40)

where for simplicity we assume that the mean is zero. The posterior distributions for each dimension have the following form:

p(f_{d*} | z_*, X, Z, θ) = N(f_{d*} | µ_{d*}, σ²_{d*}),    (5.41)
µ_{d*} = K_*ᵀ K_{x_d}⁻¹ x_d,    (5.42)
σ²_{d*} = k_{**} − K_*ᵀ K_{x_d}⁻¹ K_*.    (5.43)

To save on computation we can precompute α_d = K_{x_d}⁻¹ x_d and plug this into the above formulas whenever a posterior is required at a new latent point. The joint posterior mean is µ = [µ_{1*}, ..., µ_{D*}]ᵀ.
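A minimal version of the posterior computation for a single output dimension might look as follows (my own sketch with an RBF kernel and a small noise term; not the thesis's GPmat-based code):

```python
import numpy as np

def rbf(A, B, gamma=2.0):
    """RBF cross-covariances between the rows of A and B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * gamma * sq)

rng = np.random.default_rng(6)
Z = rng.normal(size=(30, 2))          # training latent points
x_d = np.sin(Z[:, 0])                 # one data dimension (toy target)
K = rbf(Z, Z) + 1e-4 * np.eye(30)     # K_{x_d} with a small noise/jitter term
alpha_d = np.linalg.solve(K, x_d)     # precomputed, reused for every z*

z_star = Z[:1]                        # query at a training input
K_star = rbf(Z, z_star)               # cross-covariances, shape (30, 1)
mean = (K_star.T @ alpha_d).item()                                   # Eq. 5.42
var = (rbf(z_star, z_star) - K_star.T @ np.linalg.solve(K, K_star)).item()  # Eq. 5.43
```

Querying at a training input, the posterior mean should approximately reproduce the training value and the posterior variance should be close to zero, which is a useful correctness check.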

We have seen how to sample from the posterior given a latent point, but the question remains of how to choose a latent point, since under the regular GP-LVM a prior is not imposed on the latent variables. One method is to use kernel density estimation on the inferred latent points Z. For a Gaussian kernel, the bandwidth parameter can be chosen using cross-validation, and latent points can be sampled from the kernel density estimate by picking a training latent point uniformly at random, then adding Gaussian noise according to the kernel bandwidth.
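The KDE sampling step just described is simple to implement. A sketch (mine; the function name and parameters are assumptions for illustration):

```python
import numpy as np

def sample_kde(Z_train, bandwidth, n, rng):
    """Sample a Gaussian KDE: pick training points uniformly, add noise."""
    idx = rng.integers(len(Z_train), size=n)
    noise = rng.normal(scale=bandwidth, size=(n, Z_train.shape[1]))
    return Z_train[idx] + noise

rng = np.random.default_rng(7)
Z_train = rng.normal(size=(40, 2))    # stand-in for inferred latent points
samples = sample_kde(Z_train, bandwidth=0.2, n=500, rng=rng)
```

Each sample is an exact draw from the mixture-of-Gaussians density that the KDE defines, so no rejection step is needed.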

5.3.4 Bayesian GP-LVM

In the GP-LVM, the parameters W were integrated out and the marginal likelihood was optimised with respect to the latent points Z. The Bayesian GP-LVM places a standard Gaussian prior on the latent variables,

p(Z) = ∏_{i=1}^{N} N(z_i | 0, I),    (5.44)


Figure 5.3: Sampling the GP-LVM. A kernel density estimate can be found for the distribution of points in latent space. Here a Gaussian kernel with bandwidth estimated using cross-validation was used with the teapots dataset. Red points are the optimised latent data points for the GP-LVM.

so that the marginal distribution of the data variables is

p(X | θ) = ∫ p(X | Z, θ) p(Z) dZ.    (5.45)

The goal is then to maximise the likelihood with respect to the hyperparameters θ. However, this integral is intractable, and so variational methods are used to determine a lower bound on the marginal likelihood which can be optimised instead. One advantage of the Bayesian GP-LVM is that for certain choices of kernel, the model can automatically determine the dimensionality of the latent space by setting the length-scale hyperparameter associated with certain dimensions to zero [41].

5.3.4.1 Sampling

Unlike the regular GP-LVM, under the Bayesian GP-LVM a prior is imposed on the latent variables. Sampling is therefore as straightforward as sampling from this prior, then mapping to data space using the posterior distribution described in Section 5.3.3.2.

5.4 The method used

We require a shape model that can produce samples that are both realistic and novel. It is an additional requirement that the models can be learned from few examples despite the high dimensionality of the data. We use a PPCA model as a baseline, and investigate the effectiveness of both a standard GP-LVM and a Bayesian GP-LVM in comparison. We implemented PPCA in Matlab, and made use of the GPmat and vargplvm Matlab toolboxes produced by the machine learning group at Sheffield University for the GP-LVM models [42, 43].


5.4.1 Model set up

The PPCA model was fitted using the shrinkage-regularised sample data covariance, with the Ledoit-Wolf shrinkage estimator. The number of latent dimensions was chosen to be the smallest k such that the sum of the first k eigenvalues of the covariance matrix explained 80% of the variance. This arguably small value of 80% was chosen to ensure that the number of latent dimensions was relatively small, as the small number of data points makes estimation of transformations of larger latent spaces problematic. In some cases this method would still require a large number of latent dimensions, in which case we would choose some reasonable smaller value. The GP-LVM was trained using an RBF kernel, and the number of latent dimensions was chosen to be the same as was determined for the PPCA model. The Bayesian GP-LVM was trained with the automatic relevance determination radial basis function kernel, and similarly the number of latent dimensions was chosen to be the same as for the PPCA model. However, on some occasions the optimisation of the Bayesian GP-LVM was difficult for high-dimensional latent spaces, and so we would pick a lower dimension, typically 5.

5.4.2 Optimisation

Fitting the GP-LVM models amounts to learning the optimal positions of the latent points. This cannot be achieved analytically for the RBF kernels used, and so general-purpose gradient-based optimisation techniques were used to train these models. Unless otherwise specified we used SCG with 100 iterations for the GP-LVM. For the Bayesian GP-LVM a variational deterministic training conditional approximation [44] was used with 20 active points, and 2000 iterations of SCG.

5.4.3 GP-LVM latent kernel density estimation

As mentioned in Section 5.3.3.2, the GP-LVM is not a true generative model in the sense that there is no prior on the latent variables. We use a kernel density estimate with a Gaussian kernel to approximate the latent distribution. The covariance of the kernel is spherical, with variance estimated using leave-one-out cross-validation. More specifically, we consider a range of possible variances, and for each variance we evaluate the log-likelihood of a held-out data point under the KDE determined by the other data points. This is repeated until each data point has been held out, and the mean log-likelihood is recorded. The variance with the highest mean log-likelihood is selected as the kernel parameter.
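The leave-one-out procedure above can be sketched as follows (my own vectorised version; the held-out point is excluded by zeroing the diagonal of the pairwise density matrix):

```python
import numpy as np

def loo_log_likelihood(Z, var):
    """Mean held-out log density under a spherical Gaussian KDE with variance var."""
    N, L = Z.shape
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    dens = np.exp(-0.5 * sq / var) / (2 * np.pi * var) ** (L / 2)
    np.fill_diagonal(dens, 0.0)               # exclude the held-out point itself
    return np.mean(np.log(dens.sum(axis=1) / (N - 1)))

rng = np.random.default_rng(8)
Z = rng.normal(size=(50, 2))                  # stand-in for latent points
variances = [0.01, 0.05, 0.1, 0.5, 1.0]       # assumed candidate grid
best_var = max(variances, key=lambda v: loo_log_likelihood(Z, v))
```

Each candidate variance is scored by the mean log density of every point under the KDE built from the remaining N − 1 points, and the best-scoring variance is kept.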

5.5 Evaluation and Analysis

The principal goal of this project is to design a system that can synthesize new shapes from a given object class. We follow the approach of Eslami et al. [2] and evaluate the shape models in terms of the realism of their samples, and their generalisation ability. We assess shape realism for each model by comparing example samples to a selection of data shapes. Realistic shapes should possess characteristic features of the object class, and might plausibly be drawn from the set of data shapes. Generalisation ability is assessed by comparing model samples to their closest data examples. If these samples are sufficiently different then there is evidence of that model's generalisation ability. We also assess the models in terms of their ability to complete occluded shapes, analyse how the log-likelihood varies with the number of latent dimensions for the PPCA model, and demonstrate visualisations created by the GP-LVM. Note that in this chapter we show samples of landmark points rather than meshes, as the meshing process obscures some of the detail of the shape samples. In Chapter 6 we demonstrate how a mesh can be constructed that fits the shape of a given landmark point sample.

5.5.1 Realism

To assess whether the shape models can produce realistic samples we compare a number of model samples with a random selection of data samples for each of the vases (Figure 5.4), cars (Figure 5.5) and teapots (Figure 5.6) datasets. The models were trained and sampled from as described in Section 5.4.

• Vases (Figure 5.4) Vase samples were generated using 6 latent dimensions for each model. The PPCA samples appear fairly realistic in general, with plausible contours and lips. Some of the samples (bottom left, top right, bottom right) have a noticeable central lip, where the top half of the vase seems to have sunk into the lower half. This is possibly one of the principal modes of variation, where in one direction the vases separate in the middle, and in the other they sink into themselves. The GP-LVM samples are also plausible in general, and do not suffer from the lip present in the PPCA samples. However, the top right sample is poor, with an entirely missing central section, and the sample second from the left on the bottom row does not have a well defined lip. The Bayesian GP-LVM samples are arguably the most realistic, with fewer issues present than the other models. One exception is sample (2), with its asymmetric point distribution and missing central section.

• Cars (Figure 5.5) Car samples were generated using 10 latent dimensions for both the PPCA and GP-LVM models, and 5 latent dimensions for the Bayesian GP-LVM. All three models produce realistic car samples; indeed almost all of them could plausibly be drawn from the training set. Aside from two very 'flat' samples (GP-LVM (1) and Bayesian GP-LVM (8)), the realism across all models is good. This reflects the low level of variance of the training set and the good correspondences found as a result.

• Teapots (Figure 5.6) Teapot samples were generated using 5 latent dimensions for each model. It is very clear that the teapots dataset is the most challenging, with each model failing to produce a set of realistic shapes. The PPCA samples are the best, and feature clear handles, spouts and lids, as well as reasonably smooth and symmetric bodies. However, a number of the samples feature uneven scattering of landmark points rather than an even distribution. The GP-LVM produces noisy, asymmetric, and disconnected samples with few redeeming qualities. This is likely to be a result of the kernel density sampling procedure, where samples are data points that have been modified in latent space and transformed into data space. The Bayesian GP-LVM produces more realistic samples than the standard GP-LVM, but they suffer from asymmetry and roughness uncharacteristic of the object class.


Figure 5.4: Sampled vases. (a) A random selection of shapes from the vases dataset. (b) Shapes sampled from the PPCA model. (c) Shapes sampled from the GP-LVM. (d) Shapes sampled from the Bayesian GP-LVM.


Figure 5.5: Sampled cars. (a) A random selection of shapes from the cars dataset. (b) Shapes sampled from the PPCA model. (c) Shapes sampled from the GP-LVM. (d) Shapes sampled from the Bayesian GP-LVM.

Overall, there is little to choose between the three models in terms of realism. The most notable points are the good quality vase samples produced by the Bayesian GP-LVM, and the poor quality teapot samples produced by the standard GP-LVM.

5.5.2 Generalisation

We assess the generalisation ability of the models by comparing model samples to their closest data examples, asking whether they are sufficiently different to be considered novel.

• Nearest data examples (Figure 5.7) Generalisation can also be assessed qualitatively, by sampling from a model, then comparing to the closest data example. It is hoped that the samples will differ from closest data examples in non-trivial ways, otherwise the model will simply be remembering the training data. We sample from each model and


Figure 5.6: Sampled teapots. (a) A random selection of shapes from the teapots dataset. (b) Shapes sampled from the PPCA model. (c) Shapes sampled from the GP-LVM. (d) Shapes sampled from the Bayesian GP-LVM.


Figure 5.7: Sampled and closest data examples. Samples are shown in blue, and closest data examples by Euclidean distance are shown in red. (a) PPCA model. (b) GP-LVM. (c) Bayesian GP-LVM.

find the data example that is closest by Euclidean distance in data space. An alternative option would be to find the closest training point in latent space, and to use that as the comparison; however this approach is inconsistent across models, with PPCA for example having a latent space on a larger scale than the GP-LVM models.

Figure 5.7 shows a number of vase samples (blue), along with their nearest data examples (red) for each model. Each model produces samples that are significantly different to their comparators, with different bases, contours, and lips. In particular, the second GP-LVM sample is notably different to its nearest data example, with a longer, thinner neck. The final Bayesian GP-LVM sample is also very different, with a more slender profile, and a different lip. Some examples however are quite similar, with the first GP-LVM sample being very close to its associated data sample, with similar contours and lip.

Overall it is hard to conclude that any one of the models generalises better than the others, with each producing samples that differ from their nearest data counterparts.
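The nearest-example comparison used above is simply an arg-min over Euclidean distances between flattened landmark vectors; a toy sketch (the arrays are invented stand-ins):

```python
import numpy as np

def nearest_data_example(sample, data):
    """Index of the training shape closest to `sample` in data space,
    where `data` holds one flattened landmark vector per row."""
    d2 = ((data - sample) ** 2).sum(axis=1)   # squared Euclidean distances
    return int(np.argmin(d2))

data = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
idx = nearest_data_example(np.array([0.9, 1.2]), data)   # row 1 is closest
```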

5.5.3 Shape completion

We further assess the quality of the shape models using a shape completion task, where a portion of a shape is hidden, and the goal is to reconstruct the missing region using the model. The models are trained on a training set, and the shape to be reconstructed is taken from a separate test set. This is to ensure that the models do not memorize the training data.

For the PPCA model, we reconstruct the missing variables using the mean of the conditional distribution $p(\mathbf{m} \mid \mathbf{v}, \theta)$, where $\mathbf{m}$ are the missing variables and $\mathbf{v}$ are the visible variables.


Figure 5.8: Shape completion. (a) Test shape from the cars dataset. (b) Test shape with missing points. (c) PPCA reconstruction. (d) GP-LVM reconstruction. (e) Bayesian GP-LVM reconstruction.

We know the joint distribution $p(\mathbf{m}, \mathbf{v} \mid \theta) = \mathcal{N}(\mathbf{m}, \mathbf{v} \mid \boldsymbol{\mu}, \mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I})$, and the conditional can be obtained using the rules for conditioning on Gaussians. For the GP-LVMs, shape completion is not so straightforward; we do not have a marginal data distribution in the same way as we do with the PPCA model. Instead, completion is achieved by optimising the likelihood of the visible variables with respect to the latent variables, and then, given the optimised latent variables, transforming back into the data space. For more details see [41].
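The PPCA completion step follows directly from the rules for conditioning a Gaussian: partition the covariance $\mathbf{C} = \mathbf{W}\mathbf{W}^\top + \sigma^2\mathbf{I}$ into visible and missing blocks and take the conditional mean. The sketch below is an illustrative implementation, not the project's code; the helper name and toy parameters are ours.

```python
import numpy as np

def ppca_complete(v, vis_idx, mis_idx, mu, W, sigma2):
    """Mean of p(m | v) under the PPCA marginal N(mu, W W^T + sigma2 I).
    vis_idx and mis_idx index the visible and missing dimensions."""
    C = W @ W.T + sigma2 * np.eye(W.shape[0])
    C_mv = C[np.ix_(mis_idx, vis_idx)]            # cross-covariance block
    C_vv = C[np.ix_(vis_idx, vis_idx)]            # visible-visible block
    return mu[mis_idx] + C_mv @ np.linalg.solve(C_vv, v - mu[vis_idx])

# toy check: 3-d data with one latent direction; reconstruct dimension 2
W = np.array([[1.0], [1.0], [1.0]])
mu = np.zeros(3)
m_hat = ppca_complete(np.array([2.0, 2.0]), [0, 1], [2], mu, W, 0.1)
```

Because the single latent direction couples all three dimensions, the reconstructed value lands close to the visible values of 2.0.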

To evaluate the models' shape completion capabilities we consider some examples, and also the average reconstruction error on each dataset.

• Completion examples (Figures 5.8 and 5.9) The first example comes from the cars dataset, and hides the rear end of a test car shape. Each model produces a plausible reconstruction of the missing variables, achieving an unbroken transition between visible and reconstructed variables, as well as reasonable lengths and distributions of points. The three reconstructions are qualitatively similar, with the only obvious difference being that the GP-LVM model's reconstructed roof slopes down more so than the other models'.

The second example comes from the vases dataset, and features a test vase missing its middle section, but retaining its base and neck. Each model achieves an unbroken transition between visible and reconstructed variables. However, the contours of the PPCA reconstruction are less smooth than the other models', and an unrealistic lip on the lower half of the vase is present. Both the GP-LVM and Bayesian GP-LVM reconstructions are of very high quality, with the Bayesian variant being slightly closer to the original data object.

• Reconstruction error (Table 5.1) We compute the mean squared reconstruction error across a test set for each dataset. For a particular test shape the reconstruction error is

$$E = \sum_{i=1}^{M} (\hat{m}_i - m_i)^2,$$

where $\hat{\mathbf{m}} = [\hat{m}_1, \ldots, \hat{m}_M]$ are the reconstructed missing variables, and $\mathbf{m} = [m_1, \ldots, m_M]$ are the true values of the missing variables. We average across test shapes for the two reconstruction tasks described above to obtain the values in Table 5.1.
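Computing a table entry then amounts to averaging this per-shape error over the test set; a toy sketch with invented arrays:

```python
import numpy as np

def reconstruction_error(m_hat, m_true):
    """E = sum_i (m_hat_i - m_i)^2 for a single test shape."""
    return float(((np.asarray(m_hat) - np.asarray(m_true)) ** 2).sum())

# averaged over a (toy) test set, as for the entries of Table 5.1
recons = [np.array([1.0, 2.0]), np.array([0.5, 0.5])]
truths = [np.array([1.1, 1.9]), np.array([0.5, 1.0])]
mean_err = sum(reconstruction_error(r, t) for r, t in zip(recons, truths)) / len(recons)
# mean_err is (0.02 + 0.25) / 2 = 0.135 for this toy data
```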

Figure 5.9: Shape completion. (a) Test shape from the vases dataset. (b) Test shape with missing points. (c) PPCA reconstruction. (d) GP-LVM reconstruction. (e) Bayesian GP-LVM reconstruction.

Model      Vases    Cars
PPCA       0.0223   0.00099
GP-LVM     0.0027   0.00095
BGP-LVM    0.0024   0.00093

Table 5.1: Mean reconstruction error on the test set for the vase and car tasks.

For both tasks the Bayesian GP-LVM is the best performer, closely followed by the standard GP-LVM, with the PPCA model having the greatest reconstruction error. On the vases task in particular, the PPCA model's reconstruction error is almost ten times that of the Bayesian GP-LVM. In the vases task 8 latent dimensions were used for each model, and in the cars task 10 latent dimensions were used for the PPCA and GP-LVM models, and 5 dimensions for the Bayesian GP-LVM.

The performance of each model on the shape completion task is again comparable; however the Bayesian GP-LVM stands out with high quality reconstructions and the lowest reconstruction error. PPCA has the worst performance on this task, particularly in terms of reconstruction error. The GP-LVM produces good reconstructions, but is edged out by the Bayesian variant in terms of reconstruction error.

5.5.4 Log likelihood

Figure 5.10 shows the train and test mean negative log-likelihood (NLL) for the PPCA model with varying latent dimensions. The mean NLL is obtained using cross-validation with 5 folds. For the vase dataset both the train and test NLL reduce sharply with the number of latent dimensions before reaching a plateau at around 50 latent dimensions. Thus a large number of dimensions are required for the linear subspace to accurately model the data. For the car dataset the train NLL decreases slowly before reaching a plateau at around 40 latent dimensions; however the test NLL halts its rate of decrease much earlier, at around 10 latent dimensions. This suggests the model begins to overfit the training data beyond 10 latent dimensions. For the teapot dataset both the train and test NLL decrease until around 16 latent dimensions, at which point the rate of decrease slows. As with the vases, it appears that a linear subspace with relatively high dimensionality is required to explain the teapots data.
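A curve like Figure 5.10 can be reproduced with scikit-learn, whose `PCA.score` returns the average log-likelihood of samples under the fitted probabilistic PCA model. The toy data below stands in for the landmark vectors; the function name is ours.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

def cv_nll(X, k, folds=5, seed=0):
    """Mean train/test negative log-likelihood of a PPCA model with
    k latent dimensions, estimated by k-fold cross-validation."""
    tr, te = [], []
    for train_idx, test_idx in KFold(folds, shuffle=True, random_state=seed).split(X):
        model = PCA(n_components=k).fit(X[train_idx])
        tr.append(-model.score(X[train_idx]))   # score = mean log-likelihood
        te.append(-model.score(X[test_idx]))
    return float(np.mean(tr)), float(np.mean(te))

rng = np.random.default_rng(0)
# toy stand-in: 50 landmark vectors with rank-4 structure plus noise
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 20)) + 0.1 * rng.normal(size=(50, 20))
train2, test2 = cv_nll(X, k=2)
train4, test4 = cv_nll(X, k=4)   # more latent dimensions fit the training data better
```

Sweeping k and plotting the two curves reproduces the train/test behaviour discussed above.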


Figure 5.10: Train and test negative log-likelihood by number of latent dimensions of the PPCA model. Panels: vases, cars, teapots.

5.5.5 Data visualisation

If we constrain the models to have 2 or 3 latent dimensions, we can obtain a visualisation of the high dimensional datasets. We do this by simply plotting the data in latent space. The quality of the visualisations does not necessarily reflect the realism and generalisation properties that we are interested in evaluating, but they can be interesting in terms of cluster and outlier identification. We create two-dimensional visualisations using the GP-LVM, and demonstrate outlier identification as well as a method to obtain smooth transitions between samples.

• Outlier visualisation (Figure 5.11) We trained a GP-LVM with 2 latent dimensions using 100 iterations of scaled conjugate gradients on the teapots dataset. The two-dimensional visualisation shown in Figure 5.11 shows that the data are primarily distributed in a central cluster, with a few outliers on either side. We highlight two of these outlier data points and plot the landmarks of the associated data examples. The examples that the model considers to be outliers are immediately recognisable as the two teapots that have a handle that arcs over the top of the teapot body.

• Straight line sampling (Figure 5.12) We trained a GP-LVM with 2 latent dimensions using 100 iterations of SCG on the vases data. Figure 5.12 demonstrates a nice feature of the latent variable models. By sampling uniformly along a straight line in latent space, we can create sets of landmarks that smoothly morph one object into another. Here the first sample vase has high shoulders, with each successive sample having lower shoulders and a longer neck. This demonstration raises the prospect of a user interface for shape generation, where a user could create a new shape by exploring the latent space.
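The straight-line morphing can be sketched with a GP posterior mean as the latent-to-data mapping. This is only a stand-in for the trained GP-LVM: the latent points and landmark vectors below are random placeholders, and the RBF hyperparameters are fixed rather than learned.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_mean(Z_star, Z, Y, noise=1e-2):
    """GP posterior mean mapping latent points Z_star into data space,
    given (optimised) latents Z and landmark vectors Y."""
    K = rbf(Z, Z) + noise * np.eye(len(Z))
    return rbf(Z_star, Z) @ np.linalg.solve(K, Y)

rng = np.random.default_rng(4)
Z = rng.normal(size=(20, 2))             # stand-in optimised latent points
Y = rng.normal(size=(20, 6))             # stand-in landmark vectors
line = np.linspace(Z[0], Z[1], num=5)    # uniform samples along a straight line
morphs = gp_mean(line, Z, Y)             # smooth transition between the two shapes
```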


Figure 5.11: Outlier visualisation using the GP-LVM. The left hand figure shows the optimised two-dimensional latent points with outliers indicated in orange and red. The corresponding teapot landmark points are shown on the right. The two models are ones that we would expect to be outliers, given their unusual geometry and handle placement.

Figure 5.12: Smooth data morphing using the GP-LVM. Latent points are chosen along a straight line; the corresponding smooth transition between vase landmarks is shown on the bottom line.


Chapter 6

Meshing shape samples

Once a model has been fitted, it may be sampled from as described in Section 4.2. Samples consist of D-dimensional vectors, which may be rearranged into a set of landmark points. However the object of this project was not to produce sets of landmark points, but to synthesize full 3D models in the form of triangulated meshes. We can produce these meshes by sampling a new set of landmarks and then generating a mesh whose shape closely matches the landmark points. To do this we take advantage of a widely used method from computer graphics called embedded deformation. In this chapter we describe the embedded deformation method (Section 6.1) and the way in which it is adapted for use in this project (Section 6.2). We then evaluate the method by considering examples of the produced meshes for each object class (Section 6.3).

6.1 Embedded deformation


Figure 6.1: Embedded deformation. (a) Grid of nodes (red) and vertices (blue). (b) Given a set of affine transformations for each node, the positions of the vertices can be found using an embedded deformation.

Embedded deformation [4] is a method for deforming a mesh by means of a reduced deformable model called a deformation graph. The deformation graph's nodes are typically a reasonably dense subset of the mesh vertices. In embedded deformation, transformations are found for each node of the deformation graph, and the rest of the mesh vertices are deformed by interpolation of these transformations. Let $\{v_i\}_{i=1}^N$ be the vertex set and let $\{g_j\}_{j=1}^M$ be the node set. If $A_j$ is the affine transformation associated with the $j$th node, then the deformed position of vertex $v_i$ is a weighted linear combination of node transformations:

$$v_i = \sum_{j=1}^{M} w_j(v_i)\left[A_j(v_i - g_j) + g_j\right], \qquad (6.1)$$

where the weights are set as follows:

$$w_j(v_i) = \begin{cases} \left(1 - \|v_i - g_j\|/d_{\max}\right)^2 & \text{if } g_j \in \mathrm{kNN}(v_i), \\ 0 & \text{otherwise.} \end{cases} \qquad (6.2)$$

Thus nodes that are closer to a vertex have more influence over its deformation, and k determines how many nodes influence a vertex. The embedded deformation technique is widely used in the computer graphics world, where it is useful for animating 3D models. An example of an embedded deformation is shown in Figure 6.1.
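Equations (6.1) and (6.2) can be implemented compactly. The sketch below additionally normalises each vertex's weights to sum to one (as in Sumner et al. [4], and needed for identity transformations to leave the mesh unchanged), and takes $d_{\max}$ as the distance to the (k+1)-th nearest node; the arrays at the end are toy stand-ins.

```python
import numpy as np

def ed_weights(verts, nodes, k=4):
    """Eq. (6.2): for each vertex, weights over its k nearest nodes,
    with d_max the distance to the (k+1)-th nearest node."""
    d = np.linalg.norm(verts[:, None, :] - nodes[None, :, :], axis=-1)
    W = np.zeros_like(d)
    for i, row in enumerate(d):
        nn = np.argsort(row)[:k + 1]
        w = (1.0 - row[nn[:k]] / row[nn[k]]) ** 2
        W[i, nn[:k]] = w / w.sum()               # normalise to sum to one
    return W

def ed_deform(verts, nodes, A, W):
    """Eq. (6.1): v_i' = sum_j w_j(v_i) [A_j (v_i - g_j) + g_j]."""
    out = np.zeros_like(verts)
    for j, (g, Aj) in enumerate(zip(nodes, A)):
        out += W[:, [j]] * ((verts - g) @ Aj.T + g)
    return out

rng = np.random.default_rng(0)
nodes = rng.normal(size=(8, 3))                  # deformation-graph nodes
verts = rng.normal(size=(20, 3))                 # mesh vertices
W = ed_weights(verts, nodes)
identity = np.stack([np.eye(3)] * 8)             # identity transforms leave the mesh fixed
out = ed_deform(verts, nodes, identity, W)
```

With identity node transformations the bracketed term reduces to $v_i$, so the normalised weights return every vertex unchanged, which is a useful sanity check.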

6.2 The method used

To construct a meshed sample we use the following procedure. Start with a template mesh $S$, its associated landmark points $L_t = \{x_i\}_{i=1}^M$ and the sampled landmark points $L_s = \{u_i\}_{i=1}^M$. Find the affine transformations $\{A_i\}_{i=1}^M$ that map the template landmark points to the sampled landmarks by deforming the template points. This can be accomplished by modifying the iterative stiffness reduction algorithm from Chapter 4. Instead of iteratively identifying correspondences and transformations as we did there, we actually know the correspondences, so we can optimise for these explicitly.

6.2.1 Distance metric

We use a distance metric of the form

$$E_D(\Psi) = \sum_{i=1}^{M} w_i \left\| A_i x_i - u_i \right\|^2. \qquad (6.3)$$

This encourages transformations that map the template landmarks to their corresponding sample landmarks. Again we set all of the weights to one.

6.2.2 Regularisation

Stiffness regularisation as defined in Equation 4.16 is again used to constrain the problem; however, defining the neighbourhoods that control the regularisation can be problematic. If we determine the neighbourhoods using k-nearest neighbours as we did previously, then the graphs produced may be of poor quality, as in Figure 6.2(a). This is because the correspondence finding procedure can yield landmarks that, while adjacent, are far enough apart as to not be k-nearest neighbours. One solution is to use the same incidence graph as was used for the template deformation in the original correspondence finding procedure (Section 4.2.4). For the template model, adjacent landmarks tended to be close together, so they would generally be connected in the incidence graph. Assuming that adjacent landmarks remain adjacent through the deformation process, using this incidence graph will ensure good connectivity, as in Figure 6.2(b).


Figure 6.2: Deformation graphs. (a) Graph constructed using 6-nearest neighbours on landmark points. Note the disconnected section, which would lead to poor deformation regularisation. (b) Original graph used to regularise the template deformation in the correspondence establishing procedure.

6.2.3 Embedded deformation meshing

These data fit and regularisation terms can then be plugged into the optimisation procedure as before (Section 4.2.5), and the iterative stiffness reduction algorithm can be used to determine the transformations $\{A_i\}_{i=1}^M$. Once the transformations of the template landmarks are established, the template vertices can be transformed using an embedded deformation, treating the landmarks as nodes of the deformation graph.

6.2.4 Mesh smoothing

Given a synthesised mesh, we optionally apply curvature flow smoothing [45] to remove rough features and noise. Smoothing in this way can enhance the quality and realism of the generated mesh. The basic idea behind curvature flow smoothing is to update vertices in the direction of their surface normal with a step size proportional to the mean curvature:

$$v_{\text{new}} = v_{\text{old}} - \lambda\, k(v_{\text{old}})\, n(v_{\text{old}}), \qquad (6.4)$$

where $\lambda$ is a constant, and $k(v)$ and $n(v)$ are the mean curvature and unit normal at vertex $v$ respectively. This kind of smoothing has desirable properties, including preservation of the original ratio of edge lengths, and a reduction in drift when compared to the popular umbrella operator [46].

6.2.5 Template choice

The embedded deformation meshing relies on a template mesh, whose vertices are deformed so that the template landmarks closely match the target landmarks. As such, the results are dependent on the template mesh used. We use three methods to pick the template mesh for a given sample: i) select a generic object that is likely to deform well; ii) select the nearest example to the sample in data space; iii) select the nearest example to the sample in latent space.


6.3 Evaluation and Analysis

In this section we present a number of examples of the embedded deformation meshing, and aim to demonstrate that the generated meshes are of good quality and match the sampled landmarks. We also consider the effects of the stiffness set $A$ on the meshes produced.

6.3.1 Sti�ness parameters

Figure 6.3: The impact of stiffness parameters on generated meshes. (a) Landmark points sampled from the Bayesian GP-LVM. (b) Generated mesh with low final stiffness parameter ($a_s = 1$). (c) Generated mesh with high final stiffness parameter ($a_s = 10$).

The final value $a_s$ of the stiffness set $A = [a_1, \ldots, a_s]$ is significant in determining the appearance of the generated mesh. High final stiffness values encourage the generated mesh to be similar to the template mesh, and can improve the realism of the generated mesh; low values, on the other hand, allow the generated mesh to better capture the shape of the sample. Figure 6.3 shows a sampled shape along with two generated meshes. The first mesh was created with a low final stiffness value, whereas the second mesh was created with a high final stiffness value. It is clear that the first mesh better captures the shape of the landmark points, but it is irregular and inconsistent, and not a satisfactory example of a realistic teapot. The second mesh is of better quality, but captures less of the landmark shape. For each dataset, some tuning of the stiffness parameters is required to find an acceptable compromise between mesh quality and faithful shape reproduction.

6.3.2 Examples

For each object class we sample from a trained Bayesian GP-LVM with 6 latent dimensions. We then select a generic template model and appropriate stiffness parameters, and apply the embedded deformation meshing technique. We assess whether the meshes produced match the sampled landmarks well, and whether they are of good quality in terms of regularity and realism.

• Vases (Figure 6.4) For the sampled vase landmarks, a final stiffness value of $a_s = 3$ was found to produce good meshes that match the landmark shape. The relatively low complexity of the template shape, combined with the high quality samples produced by the Bayesian GP-LVM, meant that this low stiffness could be used without compromising the integrity of the mesh produced. After applying the embedded deformation, we used smoothing to obtain the meshes in Figure 6.4. The meshes produced match the landmark points very well, with good reproduction of the vase bases, shoulders and lips. This is despite sampled shapes that differ significantly from the template mesh. The meshed samples are also of good quality, with even surfaces and realistic contours.

Figure 6.4: Meshing vase samples. The landmarks are generated by the Bayesian GP-LVM, and the embedded deformation is applied to a template shape to generate the meshes.

• Cars (Figure 6.5) For the car models a higher final stiffness value of $a_s = 5$ was chosen, as lower values yielded rough and unrealistic meshes. No smoothing was applied in this case, as it caused gaps in the meshes produced. The generated meshes are shown in Figure 6.5. As with the vase models, the meshes produced are satisfactory, with few irregularities present. They match the landmark points in terms of shape, but retain low-level features of the template model such as the headlights, wing mirrors, and wheels. One issue is that the wheels of the second car have been stretched upwards, and are no longer circular. The parts-based approach described in Section 4.3.1 could be a solution to this problem, as wheels could be modelled separately from the rest of the car, and we would expect the learned model of wheels to always sample circular rather than elliptical shapes.

Figure 6.5: Meshing car samples. The landmarks are generated by the Bayesian GP-LVM, and the embedded deformation is applied to a template shape to generate the meshes.

• Teapots (Figure 6.6) For the teapot models it was necessary to use a high final stiffness value of $a_s = 10$ in order to generate reasonably realistic meshes. As we demonstrated in Section 5.5.1, none of the shape models produced consistently plausible samples. By choosing a high final stiffness value, the embedded deformation procedure is regularised by the chosen template, thus producing more realistic meshes. We applied the embedded deformation with smoothing to obtain the meshes in Figure 6.6. The meshes produced approximate the sampled landmarks reasonably well, but miss fine detail such as the shape of the spout. The quality of the meshes is acceptable, with few irregularities present. However, in each of the generated meshes the handle has separated from the teapot body. This is a result of these parts not actually being connected in the template mesh.

Figure 6.6: Meshing teapot samples. The landmarks are generated by the Bayesian GP-LVM, and the embedded deformation is applied to a template shape to generate the meshes.

Overall, embedded deformation is an effective method of meshing the generated landmark points. Given a reasonable choice of stiffness parameters, the technique can produce meshes that are both realistic in terms of mesh quality, and consistent with the shape defined by the landmark points. The teapot meshes produced are of lower quality than the others, but this is more a consequence of the correspondence establishing procedure than of the meshing process. Importantly, the meshing process allows us to achieve the central goal of this project: generating novel and realistic triangulated meshes that could plausibly be inserted into a 3D scene.


Chapter 7

Conclusions and Future Work

Figure 7.1: Our system generates novel, realistic class instances for teapot, vase and car object classes.

7.1 Conclusions

In this project the main goal was to develop a system that can take in a set of 3D graphics models of a particular object class, and learn to generate novel, realistic instances of that object class. The system that was developed consisted of three main components: i) establishing point correspondence, where a fixed dimensional set of corresponding landmarks was found for each object in an object class; ii) modelling object shape, where generative probabilistic models of the landmark points were specified, learned and sampled from; iii) meshing shape samples, where a mesh was generated that matches the shape of a given set of sampled landmark points.

In the point correspondence component we used a variant of the iterative closest point algorithm with local non-rigid deformations, a bidirectional point-to-point data fit term, and iterative stiffness relaxation. We found that this method produces qualitatively good correspondences that could be used as the basis for a shape model. We also demonstrated by means of examples that the bidirectional data fit term, the use of local vertex transformations and the iterative stiffness reduction procedure all improved the deformation process, and thus the correspondences obtained.

For shape modelling, PPCA, the GP-LVM and the Bayesian GP-LVM were used. The three models were assessed on their ability to produce novel and realistic samples, as well as on a shape reconstruction task. Each model was found able to produce good quality samples for the vase and car datasets; however the samples produced for the teapot class were lacking in realism. This reflects more on the correspondences obtained for the teapot models and the small dataset size than it does on the shape models. The Bayesian GP-LVM was found to be most successful on the shape reconstruction task, with the lowest reconstruction error and qualitatively good samples. The PPCA model was least successful on this task, with by far the highest reconstruction error.

In the shape meshing component we adapted the embedded deformation procedure to transform a template mesh to match the shape of sampled landmarks. We found the meshes produced to be both realistic and faithful to the sampled landmark points. The meshes produced for sampled teapots were of lower quality, but again this is a knock-on effect from the correspondence establishing procedure. We also found the stiffness parameters to be significant in the appearance of the generated mesh. High stiffness values restricted the mesh to be more similar to the template used, whereas low values allowed the mesh to reproduce the shape of the landmark points, but sometimes at the expense of mesh quality.

Overall the methods used were successful in fulfilling the overall goal of producing novel, realistic shapes. However, there are a number of areas in which aspects of the method were less successful, such as the correspondence finding procedure, which struggled with the highly variable teapot models. The visual nature of the shape sampling task, and the lack of ground truth data, meant it was difficult to quantitatively evaluate the methods used. Instead we resorted to examples, which, although very useful, are limited in how fully they can explore the methods.

7.2 Future Work

One area for future work is in establishing point correspondences. In this work we used a point-to-point distance metric, but a point-to-plane metric may be more effective for low resolution meshes, as the template points will be attracted to points that densely cover the object surface rather than to a set of sparse landmarks. Another area for exploration is a parts-based approach. If the objects are segmented into known parts, then the problem of establishing point correspondences could be reduced to finding correspondences between points on each part. One disadvantage of this approach is that part-annotated models are required; however, there exist automated segmentation techniques that may be well suited to this task [47, 48]. Even if parts are known for just the template shape, improvements may be had by deforming the parts separately. In order to compare these techniques to those outlined in this project, it would be sensible to manually annotate a test set of models with landmark points. This would allow for a more robust quantitative evaluation in which an estimated landmark could be compared to the ground truth point.
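The distinction between the two metrics can be sketched directly: the point-to-plane residual only penalises displacement along the target's surface normal, so a template point can slide freely across the faces of a coarse target mesh instead of being pulled towards its sparse vertices. The names below are illustrative.

```python
import numpy as np

def point_to_point(p, q):
    # Euclidean distance between a template point and its closest target point.
    return np.linalg.norm(p - q)

def point_to_plane(p, q, n_hat):
    # Only displacement along the unit surface normal n_hat at q is penalised;
    # tangential sliding along the target surface is free.
    return abs(np.dot(p - q, n_hat))
```

For example, a point offset tangentially from a flat face incurs a large point-to-point residual but a small point-to-plane one, which is why the latter behaves as if the target surface were densely sampled.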

Another way in which parts information could be utilised is in shape modelling. Objects could be modelled as collections of parts and locations where the parts connect, and a joint distribution over these elements could be specified and learned. Kalogerakis et al. synthesized objects by combining a pre-determined set of parts in plausible combinations [5]. In that work some continuous geometric features were modelled, such as the length of table legs or the radius of a circular table top; however, there is room to extend the approach to the full continuous modelling used in this project.

A generative probabilistic model of object shape could be used as an element in a computer vision system. One approach is to use a shape model to generate synthetic, richly annotated data. This could then be used as training data for scene understanding tasks such as the estimation of object pose and location, as well as of scene elements such as camera position and lighting location. This approach was used by Shotton et al. [17] to recognise the location of human body parts from depth images. Synthetic depth images of humans in a variety of poses were generated, and a model was trained using the richly annotated data set. Another approach is to incorporate the shape model into vision-as-inverse-graphics systems such as Picture [49], where scenes are rendered based on sampled scene elements and then compared to a data image in order to find the set of scene variables most likely to have generated the image. Here the scene elements could be the object class, the pose and the location of the object, as well as the lighting and camera. The system could then sample objects such as vases or teapots in various locations, poses, and shapes until the rendered scene closely matches the data image.

Bibliography

[1] Peter Dayan, Geoffrey E Hinton, Radford M Neal, and Richard S Zemel. The Helmholtz machine. Neural Computation, 7(5):889–904, 1995.

[2] SM Ali Eslami, Nicolas Heess, Christopher KI Williams, and John Winn. The Shape Boltzmann Machine: a strong model of object shape. International Journal of Computer Vision, 107(2):155–176, 2014.

[3] Brian Amberg, Sami Romdhani, and Thomas Vetter. Optimal step nonrigid ICP algorithms for surface registration. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on, pages 1–8. IEEE, 2007.

[4] Robert W Sumner, Johannes Schmid, and Mark Pauly. Embedded deformation for shape manipulation. ACM Transactions on Graphics (TOG), 26(3):80, 2007.

[5] Evangelos Kalogerakis, Siddhartha Chaudhuri, Daphne Koller, and Vladlen Koltun. A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics (TOG), 31(4):55, 2012.

[6] Noa Fish, Melinos Averkiou, Oliver van Kaick, Olga Sorkine-Hornung, Daniel Cohen-Or, and Niloy J. Mitra. Meta-representation of shape families. ACM Transactions on Graphics, 33(4):34:1–34:11, July 2014.

[7] Matthew M. Loper, Naureen Mahmood, and Michael J. Black. MoSh: Motion and shape capture from sparse markers. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 33(6):220:1–220:13, November 2014.

[8] Neill D. F. Campbell and Jan Kautz. Learning a manifold of fonts. ACM Transactions on Graphics (SIGGRAPH), 33(4), 2014.

[9] M Zeeshan Zia, Michael Stark, Bernt Schiele, and Konrad Schindler. Detailed 3D representations for object recognition and modeling. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(11):2608–2623, 2013.

[10] Andreas Lanitis, Chris J Taylor, and Timothy F Cootes. Automatic interpretation and coding of face images using flexible models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(7):743–756, 1997.

[11] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: a deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.

[12] Yuri Boykov, Olga Veksler, and Ramin Zabih. Markov random fields with efficient approximations. In Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on, pages 648–655. IEEE, 1998.

[13] Timothy F Cootes, Christopher J Taylor, David H Cooper, and Jim Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.

[14] Tobias Heimann and Hans-Peter Meinzer. Statistical shape models for 3D medical image segmentation: a review. Medical Image Analysis, 13(4):543–563, 2009.

[15] Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. SCAPE: shape completion and animation of people. In ACM Transactions on Graphics (TOG), volume 24, pages 408–416. ACM, 2005.

[16] Gerard Pons-Moll, Javier Romero, Naureen Mahmood, and Michael J. Black. Dyna: A model of dynamic human shape in motion. ACM Transactions on Graphics (Proc. SIGGRAPH), 34(4):120:1–120:14, August 2015.

[17] Jamie Shotton, Toby Sharp, Alex Kipman, Andrew Fitzgibbon, Mark Finocchio, Andrew Blake, Mat Cook, and Richard Moore. Real-time human pose recognition in parts from single depth images. Communications of the ACM, 56(1):116–124, 2013.

[18] Michael Revow, Christopher KI Williams, and Geoffrey E Hinton. Using generative models for handwritten digit recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 18(6):592–606, 1996.

[19] Thomas J Cashman and Andrew W Fitzgibbon. What shape are dolphins? Building 3D morphable models from 2D images. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(1):232–244, 2013.

[20] Anestis Koutsoudis, George Pavlidis, Vassiliki Liami, Despoina Tsiafakis, and Christodoulos Chamzas. 3D pottery content-based retrieval based on pose normalisation and segmentation. Journal of Cultural Heritage, 11(3):329–338, 2010.

[21] Sanja Fidler, Sven Dickinson, and Raquel Urtasun. 3D object detection and viewpoint estimation with a deformable 3D cuboid model. In Advances in Neural Information Processing Systems, pages 611–619, 2012.

[22] Dirk-Jan Kroon. Wavefront obj toolbox. http://uk.mathworks.com/matlabcentral/fileexchange/27982-wavefront-obj-toolbox.

[23] Alejandro F Frangi, Daniel Rueckert, Julia Schnabel, Wiro J Niessen, et al. Automatic construction of multiple-object three-dimensional statistical shape models: Application to cardiac modeling. Medical Imaging, IEEE Transactions on, 21(9):1151–1166, 2002.

[24] Rhodri H Davies, Carole J Twining, Tim F Cootes, John C Waterton, and Camillo J Taylor. A minimum description length approach to statistical shape modeling. Medical Imaging, IEEE Transactions on, 21(5):525–537, 2002.

[25] Serge Belongie, Jitendra Malik, and Jan Puzicha. Shape matching and object recognition using shape contexts. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(4):509–522, 2002.

[26] Alan D Brett, Andrew Hill, and Christopher J Taylor. A method of 3D surface correspondence for automated landmark generation. In BMVC, 1997.

[27] Hao Li, Robert W Sumner, and Mark Pauly. Global correspondence optimization for non-rigid registration of depth scans. In Computer Graphics Forum, volume 27, pages 1421–1430. Wiley Online Library, 2008.

[28] Michael Wand, Bart Adams, Maksim Ovsjanikov, Alexander Berner, Martin Bokeloh, Philipp Jenke, Leonidas Guibas, Hans-Peter Seidel, and Andreas Schilling. Efficient reconstruction of nonrigid shape and motion from real-time 3D scanner data. ACM Transactions on Graphics (TOG), 28(2):15, 2009.

[29] Haili Chui and Anand Rangarajan. A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding, 89(2):114–141, 2003.

[30] Gary KL Tam, Zhi-Quan Cheng, Yu-Kun Lai, Frank C Langbein, Yonghuai Liu, David Marshall, Ralph R Martin, Xian-Fang Sun, and Paul L Rosin. Registration of 3D point clouds and meshes: a survey from rigid to nonrigid. Visualization and Computer Graphics, IEEE Transactions on, 19(7):1199–1217, 2013.

[31] Paul J Besl and Neil D McKay. Method for registration of 3-D shapes. In Robotics-DL Tentative, pages 586–606. International Society for Optics and Photonics, 1992.

[32] Alejandro F Frangi, Johan HC Reiber, Boudewijn PF Lelieveldt, et al. Independent component analysis in statistical shape models. In Medical Imaging 2003, pages 375–383. International Society for Optics and Photonics, 2003.

[33] Sami Romdhani, Shaogang Gong, Alexandra Psarrou, et al. A multi-view nonlinear active shape model using kernel PCA. In BMVC, volume 10, pages 483–492, 1999.

[34] David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.

[35] Michael E Tipping and Christopher M Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622, 1999.

[36] John Van Ness. On the dominance of non-parametric Bayes rule discriminant algorithms in high dimensions. Pattern Recognition, 12(6):355–368, 1980.

[37] Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004.

[38] James A O'Malley and Alan M Zaslavsky. Domain-level covariance analysis for multilevel survey data with structured nonresponse. Journal of the American Statistical Association, 103(484):1405–1418, 2008.

[39] Neil D Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. Advances in Neural Information Processing Systems, 16(3):329–336, 2004.

[40] Martin Fodslette Møller. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6(4):525–533, 1993.

[41] Michalis K Titsias and Neil D Lawrence. Bayesian Gaussian process latent variable model. In International Conference on Artificial Intelligence and Statistics, pages 844–851, 2010.

[42] Machine Learning Group, Department of Computer Science, The University of Sheffield. GPmat. http://ml.sheffield.ac.uk/~neil/gpmat/.

[43] Machine Learning Group, Department of Computer Science, The University of Sheffield. vargplvm. https://github.com/SheffieldML/vargplvm.

[44] James Hensman and Neil Lawrence. Gaussian processes for big data through stochastic variational inference. Advances in Neural Information Processing Systems, 25, 2012.

[45] Mathieu Desbrun, Mark Meyer, Peter Schröder, and Alan H Barr. Implicit fairing of irregular meshes using diffusion and curvature flow. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pages 317–324. ACM Press/Addison-Wesley Publishing Co., 1999.

[46] Gabriel Taubin. A signal processing approach to fair surface design. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, pages 351–358. ACM, 1995.

[47] Yu-Kun Lai, Shi-Min Hu, Ralph R Martin, and Paul L Rosin. Rapid and effective segmentation of 3D models using random walks. Computer Aided Geometric Design, 26(6):665–679, 2009.

[48] Aleksey Golovinskiy and Thomas Funkhouser. Consistent segmentation of 3D models. Computers & Graphics, 33(3):262–269, 2009.

[49] Tejas D. Kulkarni, Pushmeet Kohli, Joshua B. Tenenbaum, and Vikash Mansinghka. Picture: A probabilistic programming language for scene perception. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.