by Nan Wu - University of Toronto (blog.math.toronto.edu/GraduateBlog/files/2018/02/Nan-Wus-thesis.pdf)

DIFFERENTIAL GEOMETRY APPROACH FOR UNSUPERVISED MACHINE LEARNING ALGORITHMS

by

Nan Wu

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Mathematics

University of Toronto

© Copyright 2018 by Nan Wu


Abstract

Differential Geometry Approach for Unsupervised Machine Learning Algorithms

Nan Wu
Doctor of Philosophy

Graduate Department of Mathematics
University of Toronto

2018

Since its introduction in 2000, the locally linear embedding (LLE) algorithm has been widely applied in data science. In this thesis, we provide an asymptotic analysis of LLE under the manifold setup. First, by studying the regularized barycentric problem, we derive the corresponding kernel function of LLE. Second, we show that when the point cloud is sampled from a general closed manifold, the LLE algorithm does not always recover the Laplace-Beltrami operator asymptotically, and the result may depend on the non-uniform sampling. We demonstrate that a careful choice of the regularization is necessary to ensure the recovery of the Laplace-Beltrami operator. A comparison with other commonly applied nonlinear algorithms, particularly the diffusion map, is provided. Moreover, we discuss the relationship between two common nearest neighbor search schemes and the relationship of LLE with locally linear regression. Finally, we consider the case when the point cloud is sampled from a manifold with boundary. We show that if the regularization is chosen correctly, the LLE algorithm asymptotically recovers a linear second order differential operator with "free" boundary condition. This operator coincides with the Laplace-Beltrami operator in the interior of the manifold. We further modify the LLE algorithm into the Dirichlet Graph Laplacian algorithm, which can be used to recover the Laplace-Beltrami operator of the manifold with the Dirichlet boundary condition.


Acknowledgements

First and foremost, I would like to express my deep gratitude to my advisors Professor Alex Nabutovsky and Professor Hau-tieng Wu for their valuable supervision of my work and countless hours of discussing mathematical problems with me. I greatly benefited from their knowledge and deep understanding of mathematics, which they shared with me, and from their invaluable advice on the development of my academic career. In my long journey to get a Ph.D. degree, it was their kind encouragement and support that helped me to overcome my difficulties and frustrations.

I am grateful to Professor Regina Rotman, my Ph.D. committee member, who enriched my knowledge of Riemannian geometry and topology and provided thoughtful and patient guidance over the years. I also want to thank Professor Almut Burchard for teaching me analysis and for her helpful advice.

I would like to thank my collaborator and friend Zhifei Zhu for our enlightening discussions and the enjoyable time spent on the Riemannian geometry projects.

I also thank the staff of the Department of Mathematics at our university. I especially want to thank Jemima Merisca for her help, and I am also grateful for the assistance of the late Ida Bulat.

Last, but not least, I am grateful to my dear parents for their unconditional love and support, without which I would have never been able to pursue a Ph.D. in mathematics.


Contents

1 Introduction
  1.1 Notations
  1.2 LLE algorithm
  1.3 Barycentric coordinate on point cloud
  1.4 Perturbation analysis

2 LLE on closed manifolds
  2.1 Manifold Setup
  2.2 Some preliminary lemmas in Riemannian geometry
  2.3 Local covariance structure and local PCA
  2.4 Integral kernel of LLE and variance analysis on closed manifolds
    2.4.1 Proof of Theorem 2.4.1
  2.5 Biased analysis on closed manifolds
    2.5.1 Proof of Theorem 2.5.1
  2.6 Conclusion of the chapter: convergence of LLE on closed manifolds
  2.7 ε-radius neighborhood vs. K nearest neighborhood
  2.8 LLE vs. LLR
  2.9 Error in variable
  2.10 Numerical examples
    2.10.1 Sphere
    2.10.2 Examine the kernel
    2.10.3 Two-dimensional random tomography example

3 LLE on manifolds with boundary
  3.1 Setup on manifolds with boundary and preliminary lemmas
  3.2 Variance analysis and biased analysis on manifolds with boundary, when ρ = 3
    3.2.1 Conclusion when ρ = 3
    3.2.2 Proof of Theorem 3.2.2
    3.2.3 Proof of Theorem 3.2.1
  3.3 Dirichlet Graph Laplacian

Bibliography


Chapter 1

Introduction

Dimension reduction is a fundamental step in data analysis. In past decades, due to the high demand for analyzing the large scale, massive and complicated datasets accompanying technological advances, there have been many efforts to solve this problem from different angles. The resulting algorithms can be roughly classified into two types, linear and nonlinear. Linear methods include principal component analysis (PCA), multidimensional scaling, and others. Nonlinear methods include ISOMAP [65, 3], eigenmap [6], maximal variance unfolding [72], t-distributed stochastic neighbor embedding [69], commute time embedding [48], patch-to-tensor embedding [51], locally linear embedding (LLE) [49, 1] and its variations, such as Hessian LLE [22], modified LLE [74], robust LLE [15] and weighted LLE [46], and diffusion map (DM) [18] and its variations, such as local tangent space alignment [75], vector diffusion map [55, 57], horizontal diffusion map [27], orientable diffusion map (ODM) [54], magnetic diffusion map [26, 19], alternating diffusion [39, 64, 38, 37], multiview diffusion map [40], time coupled diffusion maps [42] and empirical intrinsic geometry (EIG) [53, 62, 63].

The subject of this thesis, LLE, was published in Science in 2000 [49]. It has been widely used and has been cited more than 10,000 times. The algorithm is designed to be intuitive and simple. It naturally merges the two ideas "fit locally" and "think globally". Given a point cloud, first, for each data point, we determine its nearest neighbors, and capture the local geometric structure of the dataset by finding the barycentric coordinates of those neighboring points with the help of a regularization. The barycentric coordinates generalize the following notion: they are an assignment of weights to the points in the neighborhood so that under such an assignment the center of the neighborhood is the center of mass. This is the "fit locally" part of LLE. Second, the barycentric coordinates are assembled into the LLE matrix, and the eigenvectors of this matrix are evaluated as coordinate maps to reduce the dimension of the point cloud. This is the "think globally" part of LLE. However, unlike the fruitful theoretical results from discussing eigenmap and the diffusion-based approach like DM [6, 44, 7, 32, 52, 30, 18, 58, 8, 70, 71, 28, 57, 24, 66, 55, 11, 10, 35, 47, 4, 14, 50, 13, 9, 31, 61, 33, 59], a systematic analysis of the LLE algorithm has not been undertaken, except an ad hoc argument shown in [6] based on some conditions.

In this thesis, we work under the assumption that the point cloud is (non-)uniformly sampled from a low dimensional manifold isometrically embedded in the Euclidean space. Under such a manifold setup, the "think globally" part of the LLE algorithm can be understood from the spectral geometry viewpoint [10, 11], where it is proved that the orthonormal eigenfunctions of the Laplace-Beltrami operator of a closed Riemannian manifold can be used as coordinate functions to embed the manifold into a Hilbert space with the Euclidean inner product. Hence, one should expect that the LLE matrix approximates the Laplace-Beltrami operator and the eigenvectors of


the LLE matrix approximate the eigenfunctions of the Laplace-Beltrami operator, so that if one uses the eigenvectors as coordinate maps to reduce the dimension of the point cloud, the topological structure of the underlying manifold is preserved.

One of the major contributions of this thesis is an asymptotic pointwise convergence analysis of LLE under the manifold (possibly with boundary) setup. Such analysis is achieved in the following steps. First, for the point cloud sampled from a manifold (possibly with boundary), we solve the regularized barycentric problem in the "fit locally" part of the LLE algorithm. From the barycentric coordinate estimation, we establish the kernel function of LLE, which depends on the regularization. Second, we prove that the LLE matrix asymptotically converges to an integral operator associated with the kernel function almost surely. Third, although it is widely believed that under the closed manifold setup the LLE matrix should asymptotically lead to the Laplace-Beltrami operator, we show that this might not always be the case. Specifically, we show that the behavior of the integral operator relies on the regularization. If the regularization is chosen properly, the integral operator approximates the Laplace-Beltrami operator, even if the sampling is non-uniform. If the regularization is not chosen properly, the acquired information will be contaminated by the extrinsic information (the second fundamental form of the underlying manifold). Finally, under the proper choice of the regularization, if the manifold has boundary, we show that the integral operator approximates a second order linear differential operator on the manifold with "free" boundary condition. The differential operator coincides with the Laplace-Beltrami operator in the interior of the manifold.

Given a point cloud sampled from a manifold with boundary, an important problem in scientific computation is how to recover the Laplace-Beltrami operator of the underlying manifold with the Dirichlet boundary condition. For example, the paper [29] on molecular dynamics provides motivation to solve such a problem. Based on the analysis of the LLE algorithm on manifolds with boundary, we propose an algorithm coined the Dirichlet Graph Laplacian algorithm. Without boundary detection, the algorithm constructs a diagonal matrix which approximates a bump function concentrated on the boundary of the manifold. The Dirichlet Graph Laplacian is constructed by adding the diagonal matrix onto the LLE matrix with the proposed regularization. If the manifold is closed, then the Dirichlet Graph Laplacian recovers the Laplace-Beltrami operator. However, if the manifold has a boundary, then the diagonal matrix will force a Dirichlet boundary condition. Specifically, we show that the Dirichlet Graph Laplacian converges to an integral operator almost surely, and the integral operator converges uniformly over finite dimensional eigensubspaces of the Laplace-Beltrami operator with the Dirichlet boundary condition.

The contents of this thesis can be summarized as follows. In Chapter 1, we review the LLE algorithm and the barycentric coordinates on the point cloud. We introduce the regularization which stabilizes the barycentric problem, together with the solution to the regularized barycentric problem. Finally, we provide a perturbation analysis of a special type of analytic symmetric matrix, which is used in the asymptotic analysis in later chapters. In Chapter 2, we establish an asymptotic analysis of LLE under the closed manifold setup. Moreover, we give a direct comparison of LLE with other relevant nonlinear machine learning algorithms, for example, the eigenmap and DM. The relationship between two common nearest neighbor search schemes is discussed, and we link LLE back to the widely applied kernel regression technique, locally linear regression (LLR), and the error in variable problem. In the end, we provide numerical simulations to support our theoretical findings. This chapter is based on joint work with Hau-tieng Wu [73]. In Chapter 3, we establish an asymptotic analysis of LLE under the manifold-with-boundary setup. Finally, we introduce the Dirichlet Graph Laplacian (DGL) algorithm by modifying the original LLE algorithm. This chapter is based on joint work with Hau-tieng Wu.


Table 1.1: Commonly used notations in this thesis.

Symbol — Meaning
p — Dimension of the ambient space
d — Dimension of the low-dimensional Riemannian manifold
(M, g) — d-dimensional smooth Riemannian manifold
dV — Riemannian volume form of (M, g)
|S^{d−1}| — Volume of the unit sphere S^{d−1}
exp_x — Exponential map at x
T_xM — Tangent space of M at x
Ric_x — Ricci curvature tensor of (M, g) at x
ι, ι_* — Isometric embedding of M into R^p and its differential
II_x — Second fundamental form of the embedding ι at x
P — Probability density function on ι(M)
n ∈ N — Number of data points sampled from M
X = {z_i}_{i=1}^n — Point cloud sampled from ι(M) ⊂ R^p
w_{z_k} ∈ R^N — Barycentric coordinates of z_k with respect to the data points in the ε-neighborhood

1.1 Notations

We fix the notations used in this thesis. For d ∈ N, I_{d×d} is the identity matrix of size d × d. For n ∈ N, denote by 1_n the n-dim vector with all entries 1. For ε ≥ 0, denote B^{R^p}_ε(x) := {y ∈ R^p : ‖x − y‖_{R^p} ≤ ε}. Denote e_i = [0, ⋯, 1, ⋯, 0]^⊤ ∈ R^p, the unit p-dim vector with 1 in the i-th entry. For p, r ∈ N with r ≤ p, denote by J_{p,r} ∈ R^{p×r} the matrix whose (i, i) entry is 1 for i = 1, …, r, with zeros elsewhere, and by J̄_{p,r} ∈ R^{p×r} the matrix whose (p − r + i, i) entry is 1 for i = 1, …, r, with zeros elsewhere. I_{p,r} := J_{p,r} J_{p,r}^⊤ is the p × p matrix whose (i, i) entry is 1 for i = 1, …, r and 0 elsewhere; Ī_{p,r} := J̄_{p,r} J̄_{p,r}^⊤ is the p × p matrix whose (i, i) entry is 1 for i = p − r + 1, …, p and 0 elsewhere. Denote by S(p) the set of real symmetric matrices of size p × p, by O(p) the orthogonal group in dimension p, and by o(p) the set of anti-symmetric matrices of size p × p. For M ∈ R^{p×p}, denote by M^⊤ the transpose of M and by M^† the Moore-Penrose pseudo-inverse of M. For a, b ∈ R, we use a ∧ b := min{a, b} and a ∨ b := max{a, b} to simplify the notation. We summarize the commonly used notations for the asymptotic analysis in Table 1.1 for the convenience of the readers.

1.2 LLE algorithm

We start by summarizing the LLE algorithm. Suppose X = {z_i}_{i=1}^n ⊂ R^p is the provided dataset, or the point cloud.

1. Fix ε > 0. For each z_k ∈ X, denote N_{z_k} := B^{R^p}_ε(z_k) ∩ (X \ {z_k}) = {z_{k,j}}_{j=1}^{n_k}, where n_k ∈ N is the number of points in N_{z_k}. N_{z_k} is called the ε-radius neighborhood of z_k. Alternatively, we can also fix a number K, and choose the K nearest points of z_k. This is called the K nearest neighbors (KNN) scheme.

2. For each z_k ∈ X, find its barycentric coordinates associated with N_{z_k} by

   w_{z_k} = argmin_{w ∈ R^{n_k}, w^⊤ 1_{n_k} = 1} ‖z_k − ∑_{j=1}^{n_k} w(j) z_{k,j}‖² ∈ R^{n_k}. (1.2.1)

   Notice that w_{z_k} satisfies w_{z_k}^⊤ 1_{n_k} = ∑_{j=1}^{n_k} w_{z_k}(j) = 1.


3. Define an n × n matrix W, called the LLE matrix, by

   W_{k,l} = w_{z_k}(j) if z_l = z_{k,j} ∈ N_{z_k}, and W_{k,l} = 0 otherwise. (1.2.2)

4. To reduce the dimension of X, it is suggested in [49] to embed X into a low dimensional Euclidean space via

   z_k ↦ Y_k = [v_1(k), ⋯, v_ℓ(k)]^⊤ ∈ R^ℓ, (1.2.3)

   for each z_k ∈ X, where ℓ is the dimension of the embedded points chosen by the user, and v_1, ⋯, v_ℓ ∈ R^n are the eigenvectors of (I − W)^⊤(I − W) corresponding to the ℓ smallest eigenvalues. Note that this is equivalent to minimizing the cost function ∑_{k=1}^n ‖Y_k − ∑_{l=1}^n W_{k,l} Y_l‖².
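The four steps above can be sketched in a few lines of NumPy. The following is a minimal illustrative implementation (our own function and variable names, not code from the thesis), using the ε-radius neighborhood scheme and the regularized weights discussed in the rest of this section; in practice one usually also discards the constant eigenvector of (I − W)^⊤(I − W).

```python
import numpy as np

def lle_embedding(X, eps, c, ell):
    """Minimal LLE sketch. X: n x p point cloud; eps: neighborhood radius;
    c: regularizer for the barycentric problem; ell: embedding dimension."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for k in range(n):
        dist = np.linalg.norm(X - X[k], axis=1)
        nbr = np.where((dist <= eps) & (dist > 0))[0]             # step 1: N_{z_k}
        G = (X[nbr] - X[k]).T                                     # local data matrix
        N = len(nbr)
        y = np.linalg.solve(G.T @ G + c * np.eye(N), np.ones(N))  # step 2: regularized weights
        W[k, nbr] = y / y.sum()                                   # barycentric coordinates
    M = (np.eye(n) - W).T @ (np.eye(n) - W)                       # step 3
    _, vecs = np.linalg.eigh(M)
    return vecs[:, :ell]                                          # step 4: bottom eigenvectors

# toy usage: a noisy circle in R^3, embedded into R^2
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t), 0.01 * rng.standard_normal(200)]
Y = lle_embedding(X, eps=0.4, c=1e-3, ell=2)
```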

Although the algorithm looks relatively simple, there are actually several details that should be discussed prior to the asymptotic analysis. To simplify the discussion, we focus on one point z_k ∈ X and assume that there are N data points in N_{z_k} = {z_{k,1}, ⋯, z_{k,N}}. To find the barycentric coordinates of z_k, we define the local data matrix associated with N_{z_k}:

G_n := [z_{k,1} − z_k  ⋯  z_{k,N} − z_k] ∈ R^{p×N}. (1.2.4)

It is important to note that G_n depends not only on n, but also on ε and z_k. However, we only keep n to make the notation easier. The other notations in this section are simplified in the same way. Minimizing (1.2.1) is equivalent to minimizing the functional w^⊤ G_n^⊤ G_n w over w ∈ R^N under the constraint w^⊤ 1_N = 1. Here, G_n^⊤ G_n is the Gramian matrix associated with the dataset {z_{k,1} − z_k, ⋯, z_{k,N} − z_k}. In general, G_n^⊤ G_n might be singular, and it is suggested in [49] to stabilize the algorithm by regularizing the equation:

(G_n^⊤ G_n + c I_{N×N}) y = 1_N, (1.2.5)

where c > 0 is the regularizer chosen by the user. For example, in [49], c is suggested to be δ/N, where 0 < δ < ‖G_n‖²_F is chosen by the user and ‖G_n‖_F is the Frobenius norm of G_n. It has been observed that LLE is sensitive to the choice of the regularizer (see, for example, [74]). We will later quantify this dependence under the manifold setup. Using the Lagrange multiplier method, the minimizer is

w_n = y_n / (y_n^⊤ 1_N), (1.2.6)

where y_n is the solution of (1.2.5). We are going to find w_n explicitly in the next section.
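As a quick sanity check of (1.2.5) and (1.2.6) (our own toy data, not an experiment from the thesis), the normalized solution w_n indeed minimizes w^⊤(G_n^⊤G_n + cI_{N×N})w among all vectors whose entries sum to 1:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N, c = 3, 10, 0.1
G = rng.standard_normal((p, N))            # stand-in for the local data matrix G_n
H = G.T @ G + c * np.eye(N)                # regularized Gramian
y = np.linalg.solve(H, np.ones(N))         # (1.2.5)
w = y / (np.ones(N) @ y)                   # (1.2.6)
assert np.isclose(w.sum(), 1.0)

# the Lagrange-multiplier solution should beat any other feasible vector
obj = lambda v: v @ H @ v
for _ in range(1000):
    v = rng.standard_normal(N)
    v = v / v.sum()                        # make the entries sum to 1
    assert obj(w) <= obj(v) + 1e-12
```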

1.3 Barycentric coordinate on point cloud

In this section, we explicitly express w_n, which is the essential step toward the asymptotic analysis. Suppose rank(G_n^⊤ G_n) = r_n. Note that r_n = rank(G_n G_n^⊤) = rank(G_n) ≤ p, so G_n^⊤ G_n is singular when p < N. Moreover, G_n^⊤ G_n is positive semidefinite. Denote the eigen-decomposition of G_n^⊤ G_n as V_n Λ_n V_n^⊤, where

Λ_n = diag(λ_{n,1}, λ_{n,2}, …, λ_{n,N}), (1.3.1)

λ_{n,1} ≥ λ_{n,2} ≥ ⋯ ≥ λ_{n,r_n} > λ_{n,r_n+1} = ⋯ = λ_{n,N} = 0, and

V_n = [v_{n,1}  ⋯  v_{n,N}] ∈ O(N). (1.3.2)

Clearly, {v_{n,i}}_{i=r_n+1}^N form an orthonormal basis of Null(G_n^⊤ G_n), which is equal to Null(G_n).

Then (1.2.5) is equivalent to solving

V_n (Λ_n + c I_{N×N}) V_n^⊤ y = 1_N, (1.3.3)

and the solution is

y_n = V_n (Λ_n + c I_{N×N})^{−1} V_n^⊤ 1_N = c^{−1} 1_N + V_n [(Λ_n + c I_{N×N})^{−1} − c^{−1} I_{N×N}] V_n^⊤ 1_N. (1.3.4)

Therefore,

w_n^⊤ = (1_N^⊤ + 1_N^⊤ V_n [c(Λ_n + c I_{N×N})^{−1} − I_{N×N}] V_n^⊤) / (N + 1_N^⊤ V_n [c(Λ_n + c I_{N×N})^{−1} − I_{N×N}] V_n^⊤ 1_N). (1.3.5)

Without recasting (1.3.5) into a proper form, it is not clear how to capture the geometric information contained in (1.3.5). Observe that while G_n^⊤ G_n is the Gramian matrix, G_n G_n^⊤ is related to the sample covariance matrix associated with N_{z_k}. We call (1/n) G_n G_n^⊤ the local sample covariance matrix.¹ Clearly, r_n ≤ p, and G_n G_n^⊤ and G_n^⊤ G_n share the same positive eigenvalues, λ_{n,1}, ⋯, λ_{n,r_n}. Denote the eigen-decomposition of G_n G_n^⊤ as U_n Λ_n U_n^⊤, where U_n ∈ O(p) and Λ_n is a p × p diagonal matrix. By a direct calculation, the first r_n columns of V_n are related to U_n by

V_n J_{N,r_n} = G_n^⊤ U_n (Λ_n^†)^{1/2} J_{p,r_n}, (1.3.6)

where V_n = [V_n J_{N,r_n} | V_n J̄_{N,N−r_n}]. Since (Λ_n + c I_{N×N})^{−1} − c^{−1} I_{N×N} has only r_n non-zero diagonal entries, based on (1.3.4), we have

y_n^⊤ = c^{−1} 1_N^⊤ + 1_N^⊤ V_n [(Λ_n + c I_{N×N})^{−1} − c^{−1} I_{N×N}] V_n^⊤
      = c^{−1} 1_N^⊤ + 1_N^⊤ G_n^⊤ U_n (Λ_n^†)^{1/2} J_{p,r_n} J_{p,r_n}^⊤ [(Λ_n + c I_{p×p})^{−1} − c^{−1} I_{p×p}] J_{p,r_n} J_{p,r_n}^⊤ (Λ_n^†)^{1/2} U_n^⊤ G_n.

Note that we have

U_n (Λ_n^†)^{1/2} J_{p,r_n} J_{p,r_n}^⊤ [(Λ_n + c I_{p×p})^{−1} − c^{−1} I_{p×p}] J_{p,r_n} J_{p,r_n}^⊤ (Λ_n^†)^{1/2} U_n^⊤
= −c^{−1} U_n J_{p,r_n} J_{p,r_n}^⊤ (Λ_n + c I_{p×p})^{−1} J_{p,r_n} J_{p,r_n}^⊤ U_n^⊤, (1.3.7)

which can be understood as a "regularized pseudo-inverse". Specifically, when c is small, we have

U_n J_{p,r_n} J_{p,r_n}^⊤ (Λ_n + c I_{p×p})^{−1} J_{p,r_n} J_{p,r_n}^⊤ U_n^⊤ ≈ (G_n G_n^⊤)^†. (1.3.8)

Denote

I_c(G_n G_n^⊤) := U_n J_{p,r_n} J_{p,r_n}^⊤ (Λ_n + c I_{p×p})^{−1} J_{p,r_n} J_{p,r_n}^⊤ U_n^⊤. (1.3.9)

¹The usual sample covariance matrix associated with N_{z_k} is defined as (1/(n−1)) ∑_{j=1}^N (z_{k,j} − μ_k)(z_{k,j} − μ_k)^⊤, where μ_k = (1/n) ∑_{j=1}^N z_{k,j}.
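The operator I_c of (1.3.9) can be checked numerically (an illustrative NumPy sketch with our own names, not the thesis's code): for small c it approaches the Moore-Penrose pseudo-inverse of G_nG_n^⊤, as claimed in (1.3.8).

```python
import numpy as np

rng = np.random.default_rng(2)
p, N = 4, 12
G = rng.standard_normal((p, N))            # stand-in for the local data matrix G_n
B = G @ G.T                                # G_n G_n^T (full rank here, so r_n = p)

def I_c(B, c):
    """Regularized pseudo-inverse of (1.3.9), via the eigendecomposition of B."""
    lam, U = np.linalg.eigh(B)
    keep = lam > 1e-10                     # the r_n nonzero eigenvalues
    return (U[:, keep] / (lam[keep] + c)) @ U[:, keep].T

# (1.3.8): for small c, I_c(G G^T) is close to the Moore-Penrose pseudo-inverse
assert np.allclose(I_c(B, 1e-9), np.linalg.pinv(B), atol=1e-6)
```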


Hence, we can recast (1.3.4) and (1.3.5) as

y_n^⊤ = c^{−1} 1_N^⊤ − c^{−1} 1_N^⊤ G_n^⊤ I_c(G_n G_n^⊤) G_n (1.3.10)

and

w_n^⊤ = (1_N^⊤ − 1_N^⊤ G_n^⊤ I_c(G_n G_n^⊤) G_n) / (N − 1_N^⊤ G_n^⊤ I_c(G_n G_n^⊤) G_n 1_N) = (1_N^⊤ − T_{n,z_k}^⊤ G_n) / (N − T_{n,z_k}^⊤ G_n 1_N), (1.3.11)

where

T_{n,z_k} := I_c(G_n G_n^⊤) G_n 1_N (1.3.12)

is chosen in order to have a better geometric insight into the LLE algorithm. We now summarize the expansion of the barycentric coordinates.

Proposition 1.3.1. Take a data set X = {z_i}_{i=1}^n ⊂ R^p. Suppose there are N data points in the ε-neighborhood of z_k, namely {z_{k,1}, ⋯, z_{k,N}} ⊂ B^{R^p}_ε(z_k) ∩ (X \ {z_k}). Assume p < N. Let G_n^⊤ G_n be the Gramian matrix associated with {z_{k,1} − z_k, ⋯, z_{k,N} − z_k}, and let {λ_{n,i}}_{i=1}^r and {u_{n,i}}_{i=1}^r, where r ≤ p is the rank of G_n^⊤ G_n, be the nonzero eigenvalues and the corresponding orthonormal eigenvectors of G_n G_n^⊤ satisfying (1.3.6). With T_{n,z_k} defined in (1.3.12), the barycentric coordinates of z_k coming from the regularized equation (1.2.5) are

w_n^⊤ = (1_N^⊤ − T_{n,z_k}^⊤ G_n) / (N − T_{n,z_k}^⊤ G_n 1_N). (1.3.13)
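Proposition 1.3.1 is easy to verify numerically (our own toy data): the closed form (1.3.13) agrees with solving the regularized equation (1.2.5) directly and normalizing as in (1.2.6).

```python
import numpy as np

rng = np.random.default_rng(3)
p, N, c = 3, 8, 0.05
G = rng.standard_normal((p, N))                      # stand-in for G_n
ones = np.ones(N)

# direct route: solve the regularized equation (1.2.5), normalize as in (1.2.6)
y = np.linalg.solve(G.T @ G + c * np.eye(N), ones)
w_direct = y / (ones @ y)

# closed form (1.3.13): w_n^T = (1_N^T - T^T G) / (N - T^T G 1_N)
lam, U = np.linalg.eigh(G @ G.T)
keep = lam > 1e-10
Ic = (U[:, keep] / (lam[keep] + c)) @ U[:, keep].T   # (1.3.9)
T = Ic @ G @ ones                                    # (1.3.12)
w_closed = (ones - G.T @ T) / (N - T @ G @ ones)

assert np.allclose(w_direct, w_closed)
```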

Remark 1.3.1. The denominator N − T_{n,z_k}^⊤ G_n 1_N is the sum of all entries of the numerator 1_N^⊤ − T_{n,z_k}^⊤ G_n. We could thus view the LLE matrix defined in (1.2.2) as a "normalized kernel" defined on the point cloud. However, we mention that while all entries of w_n sum to 1, the vector 1_N^⊤ − T_{n,z_k}^⊤ G_n might have negative entries, depending on the vector T_{n,z_k}. Therefore, in general, W cannot be understood as a transition matrix.
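The point of the remark can be seen on random data (our own toy setup, not an experiment from the thesis): the entries of w_n always sum to 1, yet negative entries occur routinely, so W is generally not a transition matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
found_negative = False
for _ in range(200):
    p, N, c = 2, 6, 1e-3
    G = rng.standard_normal((p, N))
    y = np.linalg.solve(G.T @ G + c * np.eye(N), np.ones(N))
    w = y / y.sum()
    assert np.isclose(w.sum(), 1.0)        # the entries always sum to 1
    found_negative = found_negative or (w < 0).any()
print(found_negative)                      # negative entries do show up in practice
```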

1.4 Perturbation analysis

In this section, we introduce a perturbation analysis of a special type of symmetric matrix. Such analysis is going to be used in the proof of the main theorem.

Suppose A : R → S(p), where S(p) is the set of real symmetric p × p matrices, is an analytic function around 0. We are going to introduce an algorithm to calculate the eigenvalues and orthonormal eigenvectors of A(ε) when ε is small enough. The method follows the standard approach, as in [2, 68]. For a discussion of more general matrices, interested readers are referred to [68].

Suppose

A(0) = [λ I_{d×d}, 0; 0, 0],

where 0 < d < p and λ ≠ 0. Decompose A(0) as

A(0) X(0) = X(0) Λ(0), (1.4.1)


where Λ(0) = A(0) is a diagonal matrix consisting of the eigenvalues of A(0), and

X(0) = [X_1, 0; 0, X_2] ∈ O(p),

where X_1 ∈ O(d) and X_2 ∈ O(p−d). Note that due to the possible nontrivial multiplicity of the eigenvalues, X(0) may not be uniquely determined. Take the Taylor expansion of A around 0 as

A(ε) = A(0) + A′(0) ε + (1/2) A″(0) ε² + O(ε³),

where ε > 0 is sufficiently small, and A′(0) and A″(0) are divided into blocks of the same size as those of A(0):

A′(0) = [A′_{11}, A′_{12}; A′_{21}, A′_{22}],   A″(0) = [A″_{11}, A″_{12}; A″_{21}, A″_{22}],

where A′_{11} ∈ S(d), A′_{22} ∈ S(p−d), A″_{11} ∈ S(d) and A″_{22} ∈ S(p−d). Let Λ(ε) ∈ R^{p×p} be the diagonal matrix consisting of the eigenvalues of A(ε) and X(ε) ∈ R^{p×p} be the matrix formed by the corresponding eigenvectors, i.e.

A(ε) X(ε) = X(ε) Λ(ε). (1.4.2)

Since A is symmetric, X and Λ are both analytic around 0 based on [5, Section 3.6.2, Theorem 1]. We thus have the following Taylor expansions when ε is sufficiently small:

Λ(ε) = Λ(0) + ε Λ′(0) + (1/2) ε² Λ″(0) + O(ε³),
X(ε) = X(0) + X′(0) ε + O(ε²).

Here Λ(0), Λ′(0) and Λ″(0) are all diagonal matrices, and the columns of X(ε) form an orthogonal set. Note that if we normalize X(ε) to be in O(p), then by the fact that the Lie algebra of O(p) is the set of anti-symmetric matrices, we know that X(0)^{−1} X′(0) is an anti-symmetric matrix. We discuss the eigendecomposition of A(ε) under two different setups, depending on the multiplicity of the eigenvalues.

When there is no repeated eigenvalue in either A′_{11} or A′_{22}. In the first case, we assume that the eigenvalues of A′_{11} are distinct and the eigenvalues of A′_{22} are distinct (but the eigenvalues of A′_{11} and the eigenvalues of A′_{22} could overlap). To get Λ(ε) up to the first order, we need to solve for Λ′(0). To determine Λ′(0), we check the first order derivative of A(ε) at ε = 0. Differentiating (1.4.2), we get

A′(0) X(0) + A(0) X′(0) = X′(0) Λ(0) + X(0) Λ′(0).

Denote

Λ′(0) = [Λ′_1, 0; 0, Λ′_2]

and set

X′(0) = X(0) C,

where

C = [C_{11}, C_{12}; C_{21}, C_{22}] ∈ R^{p×p}.

If we substitute X(0), X′(0), and Λ′(0) into the differentiated equation above, we obtain the following linear equations by comparing blocks:

A′_{11} X_1 = X_1 Λ′_1, (1.4.3)
A′_{22} X_2 = X_2 Λ′_2, (1.4.4)
A′_{12} X_2 = −λ X_1 C_{12}, (1.4.5)
A′_{21} X_1 = λ X_2 C_{21}. (1.4.6)

By (1.4.3) and (1.4.4), Λ′_1 and Λ′_2 are eigenvalue matrices of A′_{11} and A′_{22}, and X_1 and X_2 are the corresponding eigenvector matrices, so we obtain the first order approximation of the eigenvalues. Note that the above equations hold without assuming that the eigenvalues of A′_{11} are distinct and the eigenvalues of A′_{22} are distinct. Also note that although we could obtain the first order relationship between the eigenvectors of A(ε) and A(0), without assuming distinct eigenvalues, the eigenvectors may not be unique.
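Equations (1.4.3)–(1.4.4) can be illustrated numerically (a toy family A(ε) = A(0) + εA′(0) of our own choosing): to first order, the eigenvalues of A(ε) are λ + ε·eig(A′_{11}) and ε·eig(A′_{22}).

```python
import numpy as np

rng = np.random.default_rng(5)
p, d, lam = 5, 2, 3.0
A0 = np.zeros((p, p))
A0[:d, :d] = lam * np.eye(d)               # A(0) = [lam*I, 0; 0, 0]
B = rng.standard_normal((p, p))
A1 = (B + B.T) / 2                          # a symmetric A'(0)

eps = 1e-4
vals = np.sort(np.linalg.eigvalsh(A0 + eps * A1))
pred = np.sort(np.concatenate([
    lam + eps * np.linalg.eigvalsh(A1[:d, :d]),   # lam + eps * Lambda'_1
    eps * np.linalg.eigvalsh(A1[d:, d:]),         # eps * Lambda'_2
]))
assert np.allclose(vals, pred, atol=1e-6)   # agreement up to O(eps^2)
```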

If we want to further get Λ(ε) up to the second order and solve for X(ε) uniquely up to the first order, we need to solve for Λ′(0), Λ″(0), X(0), and X′(0). To do so, we need the assumption that A′_{11} and A′_{22} have no repeated eigenvalues, while we allow eigenvalues of A′_{11} to be the same as those of A′_{22}. By (1.4.5) and (1.4.6) we have

C_{12} = −λ^{−1} X_1^⊤ A′_{12} X_2, (1.4.7)
C_{21} = λ^{−1} X_2^⊤ A′_{21} X_1. (1.4.8)

Clearly, since A′_{11} and A′_{22} do not have repeated eigenvalues, X_1 and X_2 are uniquely defined and C_{12} and C_{21} can be uniquely determined. Since the information of C_{11} and C_{22} is not available from the first order equations, we need the higher order derivatives of A(ε). Differentiating (1.4.2) twice, we get

A″(0) X(0) + 2 A′(0) X′(0) + A(0) X″(0) = X″(0) Λ(0) + 2 X′(0) Λ′(0) + X(0) Λ″(0). (1.4.9)

Now, we further substitute X(0) and X′(0) = X(0) C into (1.4.9), and get

A″_{11} X_1 − X_1 Λ″_1 = 2 X_1 (C_{11} Λ′_1 − Λ′_1 C_{11}) − 2 A′_{12} X_2 C_{21}, (1.4.10)
A″_{22} X_2 − X_2 Λ″_2 = 2 X_2 (C_{22} Λ′_2 − Λ′_2 C_{22}) − 2 A′_{21} X_1 C_{12}. (1.4.11)

Since X_1 ∈ O(d) and X_2 ∈ O(p−d), we have

X_1^⊤ (A″_{11} X_1 + 2 A′_{12} X_2 C_{21}) = 2 (C_{11} Λ′_1 − Λ′_1 C_{11}) + Λ″_1, (1.4.12)
X_2^⊤ (A″_{22} X_2 + 2 A′_{21} X_1 C_{12}) = 2 (C_{22} Λ′_2 − Λ′_2 C_{22}) + Λ″_2. (1.4.13)

Since the diagonal entries of C_{11} Λ′_1 − Λ′_1 C_{11} and C_{22} Λ′_2 − Λ′_2 C_{22} are zero, and the off-diagonal entries of Λ″_1 and Λ″_2 are zero, the off-diagonal entries of C_{11} and C_{22}, as well as Λ″_1 and Λ″_2, can be found from (1.4.12) and (1.4.13). Specifically,


since A′_{11} and A′_{22} do not have repeated eigenvalues, we have

(C_{11})_{m,n} = −(1 / (2((Λ′_1)_{m,m} − (Λ′_1)_{n,n}))) e_m^⊤ (X_1^⊤ A″_{11} X_1 + (2/λ) X_1^⊤ A′_{12} A′_{21} X_1) e_n,
(Λ″_1)_{m,m} = e_m^⊤ (X_1^⊤ A″_{11} X_1 + (2/λ) X_1^⊤ A′_{12} A′_{21} X_1) e_m,

where 1 ≤ m ≠ n ≤ d, and

(C_{22})_{m,n} = −(1 / (2((Λ′_2)_{m,m} − (Λ′_2)_{n,n}))) e_m^⊤ (X_2^⊤ A″_{22} X_2 − (2/λ) X_2^⊤ A′_{21} A′_{12} X_2) e_n,
(Λ″_2)_{m,m} = e_m^⊤ (X_2^⊤ A″_{22} X_2 − (2/λ) X_2^⊤ A′_{21} A′_{12} X_2) e_m,

where 1 ≤ m ≠ n ≤ p − d. By the above evaluation, we know C_{i,j} = −C_{j,i} for 1 ≤ i ≠ j ≤ p, and what is left unknown is the diagonal entries of C. To determine the diagonal entries of C, we normalize X(ε) = X(0) + X(0) C ε + O(ε²) so that X(ε) ∈ O(p). We thus have

I_{p×p} = (X(0) + X′(0) ε + O(ε²))^⊤ (X(0) + X′(0) ε + O(ε²))
        = X(0)^⊤ X(0) + (C^⊤ X(0)^⊤ X(0) + X(0)^⊤ X(0) C) ε + O(ε²)
        = I_{p×p} + 2 ε diag(C) + O(ε²), (1.4.14)

where the last equality holds since C_{i,j} = −C_{j,i} when i ≠ j, and diag(C) is the diagonal matrix with diag(C)_{i,i} = C_{i,i} for i = 1, …, p. As a result, the diagonal entries of C are of order ε, and we have the following solution for the eigenvalues and eigenvectors of A(ε):

Λ(ε) = [λ I_{d×d} + ε Λ′_1 + (1/2) ε² Λ″_1, 0; 0, ε Λ′_2 + (1/2) ε² Λ″_2] + O(ε³), (1.4.15)
X(ε) = X(0)(I_{p×p} + ε S) + O(ε²) ∈ O(p), (1.4.16)

where S := C − diag(C), and the last equality holds since the entries of diag(C) are of order ε. Note that S is an anti-symmetric matrix. This result can be understood from the fact that the Lie algebra of O(p) is the set of anti-symmetric matrices, and the tangent vector at X(0) leading to X(ε) is X(0) S.

When there exists a repeated eigenvalue in A′22. In this case, we assume that A′22 may have repeated eigen-values, and to simplify the discussion, we assume that A′11 does not have a repeated eigenvalue. Recall (1.4.3) and(1.4.4). Write

Λ′2 =

[Λ′2,1 0

0 Λ′2,2

],

where Λ′2,2 ∈ Rl×l , 1 ≤ l ≤ p− d, is a diagonal matrix with the same diagonal entries, denoted as γ ∈ R. Tosimplify the discussion, we assume that the diagonal entries of Λ′2,1 ∈ R(p−d−l)×(p−d−l) are all distinct and aredifferent from γ . Hence, we have

\[
\Lambda'(0) = \begin{bmatrix} \Lambda'_1 & 0 & 0 \\ 0 & \Lambda'_{2,1} & 0 \\ 0 & 0 & \Lambda'_{2,2} \end{bmatrix}.
\]
Let $\Gamma_1 \in O(d)$ be the orthonormal eigenvector matrix of $A'_{11}$ and $\Gamma_2 \in O(p-d)$ be any orthonormal eigenvector matrix of $A'_{22}$. Define
\[
\Gamma = \begin{bmatrix} \Gamma_1 & 0 \\ 0 & \Gamma_2 \end{bmatrix}.
\]

Consider
\[
\bar A(\varepsilon) = \Gamma^{-1}A(\varepsilon)\Gamma. \qquad (1.4.17)
\]
Note that $\bar A(\varepsilon)$ has the same eigenvalue matrix as $A(\varepsilon)$. By a direct expansion,
\[
\bar A(\varepsilon) = \Gamma^{-1}A(\varepsilon)\Gamma = \Gamma^{-1}A(0)\Gamma + \Gamma^{-1}A'(0)\Gamma\,\varepsilon + \frac12\Gamma^{-1}A''(0)\Gamma\,\varepsilon^2 + O(\varepsilon^3) = \bar A(0) + \bar A'(0)\varepsilon + \frac12\bar A''(0)\varepsilon^2 + O(\varepsilon^3),
\]
where $\bar A(0) := \Gamma^{-1}A(0)\Gamma$, $\bar A'(0) := \Gamma^{-1}A'(0)\Gamma$, and $\bar A''(0) := \Gamma^{-1}A''(0)\Gamma$. By the assumption on $A(0)$, we have
\[
\bar A(0) = \Gamma^{-1}A(0)\Gamma = A(0).
\]

Furthermore, we have
\[
\bar A'(0) = \Gamma^{-1}A'(0)\Gamma = \begin{bmatrix} \Gamma_1^{-1}A'_{11}\Gamma_1 & \Gamma_1^{-1}A'_{12}\Gamma_2 \\ \Gamma_2^{-1}A'_{21}\Gamma_1 & \Gamma_2^{-1}A'_{22}\Gamma_2 \end{bmatrix} = \begin{bmatrix} \Lambda'_1 & \Gamma_1^{-1}A'_{12}\Gamma_2 \\ \Gamma_2^{-1}A'_{21}\Gamma_1 & \Lambda'_2 \end{bmatrix},
\]
where the last equality holds since $\Gamma_1$ and $\Gamma_2$ are eigenvector matrices of $A'_{11}$ and $A'_{22}$. Then, we divide $\bar A(0)$, $\bar A'(0)$, $\bar A''(0)$ and $\Lambda''(0)$ into blocks in the same way as that of $\Lambda'(0)$:

\[
\bar A(0) = \begin{bmatrix} \lambda I_{d\times d} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad
\bar A'(0) = \begin{bmatrix} \Lambda'_1 & \bar A'_{12,1} & \bar A'_{12,2} \\ \bar A'_{21,1} & \Lambda'_{2,1} & 0 \\ \bar A'_{21,2} & 0 & \Lambda'_{2,2} \end{bmatrix},
\]
\[
\bar A''(0) = \begin{bmatrix} \bar A''_{11} & \bar A''_{12,1} & \bar A''_{12,2} \\ \bar A''_{21,1} & \bar A''_{22,11} & \bar A''_{22,12} \\ \bar A''_{21,2} & \bar A''_{22,21} & \bar A''_{22,22} \end{bmatrix}, \qquad
\Lambda''(0) = \begin{bmatrix} \Lambda''_1 & 0 & 0 \\ 0 & \Lambda''_{2,1} & 0 \\ 0 & 0 & \Lambda''_{2,2} \end{bmatrix},
\]
where we use the following notation for the blocks of $\bar A'(0)$:
\[
\Gamma_2^{-1}A'_{21}\Gamma_1 = \begin{bmatrix} \bar A'_{21,1} \\ \bar A'_{21,2} \end{bmatrix}, \qquad
\Gamma_1^{-1}A'_{12}\Gamma_2 = \begin{bmatrix} \bar A'_{12,1} & \bar A'_{12,2} \end{bmatrix}.
\]

If $\bar X(\varepsilon)$ is an orthonormal eigenvector matrix of $\bar A(\varepsilon)$, by (1.4.17), we have
\[
X(\varepsilon) = \Gamma\bar X(\varepsilon).
\]
By the expansion $\bar X(\varepsilon) = \bar X(0) + \varepsilon\bar X'(0) + O(\varepsilon^2)$, we have $X(0) = \Gamma\bar X(0)$ and $X'(0) = \Gamma\bar X'(0)$. Therefore, it is sufficient to find $\bar X(0)$ and $\bar X'(0)$. Since
\[
\bar A'_{22} = \Gamma_2^{-1}A'_{22}\Gamma_2 = \begin{bmatrix} \Lambda'_{2,1} & 0 \\ 0 & \Lambda'_{2,2} \end{bmatrix}
\]

is a diagonal matrix after the conjugation with Γ, by the assumption about the eigenvalues and (1.4.3) and (1.4.4),we have

\[
\bar X(0) = \begin{bmatrix} \bar X_1 & 0 & 0 \\ 0 & \bar X_{2,1} & 0 \\ 0 & 0 & \bar X_{2,2} \end{bmatrix}.
\]
Similarly, define $\bar X'(0) = \bar X(0)C$, where we divide $C$ into blocks in the same way as that of $\Lambda'(0)$:
\[
C = \begin{bmatrix} C_{11} & C_{12,1} & C_{12,2} \\ C_{21,1} & C_{22,11} & C_{22,12} \\ C_{21,2} & C_{22,21} & C_{22,22} \end{bmatrix}.
\]

Under such a block decomposition, we apply (1.4.3) to $\bar A'(0)$, and we have
\[
\Lambda'_1\bar X_1 = \bar X_1\Lambda'_1, \qquad (1.4.18)
\]
\[
\Lambda'_{2,1}\bar X_{2,1} = \bar X_{2,1}\Lambda'_{2,1}, \qquad (1.4.19)
\]
\[
\Lambda'_{2,2}\bar X_{2,2} = \bar X_{2,2}\Lambda'_{2,2}, \qquad (1.4.20)
\]
\[
\bar A'_{12,1}\bar X_{2,1} = -\lambda\bar X_1 C_{12,1}, \qquad (1.4.21)
\]
\[
\bar A'_{12,2}\bar X_{2,2} = -\lambda\bar X_1 C_{12,2}, \qquad (1.4.22)
\]
\[
\bar A'_{21,1}\bar X_1 = \lambda\bar X_{2,1} C_{21,1}, \qquad (1.4.23)
\]
\[
\bar A'_{21,2}\bar X_1 = \lambda\bar X_{2,2} C_{21,2}. \qquad (1.4.24)
\]

Then, applying (1.4.9) to $\bar A''(0)$, we have
\[
\bar A''_{11}\bar X_1 - \bar X_1\Lambda''_1 = 2\bar X_1(C_{11}\Lambda'_1 - \Lambda'_1 C_{11}) - 2\bar A'_{12,1}\bar X_{2,1}C_{21,1} - 2\bar A'_{12,2}\bar X_{2,2}C_{21,2}, \qquad (1.4.25)
\]
\[
\bar A''_{22,11}\bar X_{2,1} - \bar X_{2,1}\Lambda''_{2,1} = 2\bar X_{2,1}(C_{22,11}\Lambda'_{2,1} - \Lambda'_{2,1}C_{22,11}) - 2\bar A'_{21,1}\bar X_1 C_{12,1}, \qquad (1.4.26)
\]
\[
\bar A''_{22,22}\bar X_{2,2} - \bar X_{2,2}\Lambda''_{2,2} = 2\bar X_{2,2}(C_{22,22}\Lambda'_{2,2} - \Lambda'_{2,2}C_{22,22}) - 2\bar A'_{21,2}\bar X_1 C_{12,2}, \qquad (1.4.27)
\]
\[
\bar A''_{22,12}\bar X_{2,2} = 2\bar X_{2,1}(C_{22,12}\Lambda'_{2,2} - \Lambda'_{2,1}C_{22,12}) - 2\bar A'_{21,1}\bar X_1 C_{12,2}, \qquad (1.4.28)
\]
\[
\bar A''_{22,21}\bar X_{2,1} = 2\bar X_{2,2}(C_{22,21}\Lambda'_{2,1} - \Lambda'_{2,2}C_{22,21}) - 2\bar A'_{21,2}\bar X_1 C_{12,1}. \qquad (1.4.29)
\]

Since $\Lambda'_1$ and $\Lambda'_{2,1}$ both have distinct diagonal entries, by (1.4.18) and (1.4.19), we have
\[
\bar X_1 = I_{d\times d} \qquad\text{and}\qquad \bar X_{2,1} = I_{(p-d-l)\times(p-d-l)}.
\]


In this case, $C_{12,1}$ and $C_{21,1}$ can be uniquely determined by (1.4.21) and (1.4.23), and we have
\[
C_{12,1} = -\frac{1}{\lambda}\bar A'_{12,1}, \qquad (1.4.30)
\]
\[
C_{21,1} = \frac{1}{\lambda}\bar A'_{21,1}. \qquad (1.4.31)
\]
Similarly, by (1.4.22) and (1.4.24), we have
\[
C_{12,2} = -\frac{1}{\lambda}\bar A'_{12,2}, \qquad (1.4.32)
\]
\[
C_{21,2} = \frac{1}{\lambda}\bar A'_{21,2}. \qquad (1.4.33)
\]

By plugging (1.4.32) into (1.4.27), and using the assumption that $\Lambda'_{2,2} = \gamma I_{l\times l}$, we can solve for $\Lambda''_{2,2}$. Indeed, since $\Lambda'_{2,2}$ is a scalar multiple of the identity matrix, $C_{22,22}\Lambda'_{2,2} - \Lambda'_{2,2}C_{22,22} = 0$ in (1.4.27). Thus, (1.4.27) becomes
\[
(\bar A''_{22,22} - 2\lambda^{-1}\bar A'_{21,2}\bar A'_{12,2})\bar X_{2,2} = \bar X_{2,2}\Lambda''_{2,2}, \qquad (1.4.34)
\]
and $\Lambda''_{2,2}$ and $\bar X_{2,2}$ are the eigenvalue and orthonormal eigenvector matrices of $\bar A''_{22,22} - 2\lambda^{-1}\bar A'_{21,2}\bar A'_{12,2}$. Thus, we have obtained the eigenvalue information. However, note that in general $\bar X_{2,2}$ cannot be uniquely determined.

If we want to uniquely determine the eigenvectors $X(\varepsilon)$, we have to further assume that $\Lambda''_{2,2}$ does not have repeated diagonal entries; that is, the eigenvalues of $\bar A''_{22,22} - 2\lambda^{-1}\bar A'_{21,2}\bar A'_{12,2}$ do not repeat. Under this assumption, $\bar X_{2,2}$ is uniquely determined, and we can proceed to solve $C$. With $\bar X_{2,2}$, from (1.4.28) and (1.4.29) we can solve $C_{22,12}$ and $C_{22,21}$, since $\Lambda'_{2,2}$ is a scalar multiple of the identity matrix and the diagonal entries of $\Lambda'_{2,1}$ are assumed to be different from $\gamma$. In fact, we have

\[
C_{22,12} = (\gamma I_{(p-d-l)\times(p-d-l)} - \Lambda'_{2,1})^{-1}\Big(\frac12\bar A''_{22,12}\bar X_{2,2} + \bar A'_{21,1}C_{12,2}\Big), \qquad (1.4.35)
\]
\[
C_{22,21} = \bar X_{2,2}^\top\Big(\frac12\bar A''_{22,21} + \bar A'_{21,2}C_{12,1}\Big)(\Lambda'_{2,1} - \gamma I_{(p-d-l)\times(p-d-l)})^{-1}. \qquad (1.4.36)
\]

Next, $\Lambda''_1$, $\Lambda''_{2,1}$ and the off-diagonal entries of $C_{11}$ and $C_{22,11}$ are solved by rewriting (1.4.25) and (1.4.26) as
\[
2(C_{11}\Lambda'_1 - \Lambda'_1 C_{11}) + \Lambda''_1 = \bar A''_{11} + 2\bar A'_{12,1}C_{21,1} + 2\bar A'_{12,2}\bar X_{2,2}C_{21,2}, \qquad (1.4.37)
\]
\[
2(C_{22,11}\Lambda'_{2,1} - \Lambda'_{2,1}C_{22,11}) + \Lambda''_{2,1} = \bar A''_{22,11} + 2\bar A'_{21,1}C_{12,1}. \qquad (1.4.38)
\]

Therefore, with the assumption that $\Lambda''_{2,2}$ does not have repeated diagonal entries, we have
\[
(C_{11})_{m,n} = \frac{-1}{(\Lambda'_1)_{m,m} - (\Lambda'_1)_{n,n}}\,e_m^\top\Big(\frac12\bar A''_{11} + \bar A'_{12,1}C_{21,1} + \bar A'_{12,2}\bar X_{2,2}C_{21,2}\Big)e_n, \quad 1 \le m \ne n \le d,
\]
\[
(C_{22,11})_{m,n} = \frac{-1}{(\Lambda'_{2,1})_{m,m} - (\Lambda'_{2,1})_{n,n}}\,e_m^\top\Big(\frac12\bar A''_{22,11} + \bar A'_{21,1}C_{12,1}\Big)e_n, \quad 1 \le m \ne n \le p-d-l.
\]
However, $C_{22,22}$ cannot be determined this way and more information is needed. Indeed, note that (1.4.27) can be rewritten as

\[
\Lambda''_{2,2} = \bar X_{2,2}^\top(\bar A''_{22,22}\bar X_{2,2} + 2\bar A'_{21,2}C_{12,2}) = \bar X_{2,2}^\top\Big(\bar A''_{22,22} - \frac{2}{\lambda}\bar A'_{21,2}\bar A'_{12,2}\Big)\bar X_{2,2} \qquad (1.4.39)
\]
since $C_{22,22}\Lambda'_{2,2} - \Lambda'_{2,2}C_{22,22} = 0$, which is the same as (1.4.34). Thus, it is not informative, and we need higher order derivatives of $A(\varepsilon)$ at $0$ to solve for $C_{22,22}$. Suppose we know $A'''(0)$, and denote $\bar A'''(0) = \Gamma^{-1}A'''(0)\Gamma$, which is divided correspondingly as
\[
\bar A'''(0) = \begin{bmatrix} \bar A'''_{11} & \bar A'''_{12,1} & \bar A'''_{12,2} \\ \bar A'''_{21,1} & \bar A'''_{22,11} & \bar A'''_{22,12} \\ \bar A'''_{21,2} & \bar A'''_{22,21} & \bar A'''_{22,22} \end{bmatrix}. \qquad (1.4.40)
\]

Then, if we differentiate (1.4.2) three times and use a similar method as before, we get
\[
(C_{22,22})_{m,n} = \frac{-1}{(\Lambda''_{2,2})_{m,m} - (\Lambda''_{2,2})_{n,n}}\,e_m^\top\Big[\bar X_{2,2}^\top\bar A'''_{22,22}\bar X_{2,2} - \frac{2}{\lambda^2}\bar X_{2,2}^\top\bar A'_{21,2}(\bar A'_{11} - \gamma I_{d\times d})\bar A'_{12,2}\bar X_{2,2} + \frac{1}{\lambda}\bar X_{2,2}^\top\bar A'_{21,2}\bar A''_{12,2}\bar X_{2,2} + \frac{1}{\lambda}\bar X_{2,2}^\top\bar A''_{21,2}\bar A'_{12,2}\bar X_{2,2} - \frac{4}{\lambda^2}\bar X_{2,2}^\top(\bar A'_{21,2}\bar A'_{12,1})(\gamma I_{(p-d-l)\times(p-d-l)} - \Lambda'_{2,1})^{-1}(\bar A'_{21,1}\bar A'_{12,2})\bar X_{2,2}\Big]e_n. \qquad (1.4.41)
\]

By normalizing $X(\varepsilon)$, we can get the diagonal terms of $C_{11}$, $C_{22,11}$ and $C_{22,22}$, which are of order $\varepsilon$. As a result, we have
\[
\Lambda(\varepsilon) = \begin{bmatrix} \lambda I_{d\times d} + \varepsilon\Lambda'_1 + \varepsilon^2\Lambda''_1 & 0 & 0 \\ 0 & \varepsilon\Lambda'_{2,1} + \varepsilon^2\Lambda''_{2,1} & 0 \\ 0 & 0 & \varepsilon\Lambda'_{2,2} + \varepsilon^2\Lambda''_{2,2} \end{bmatrix} + O(\varepsilon^3), \qquad (1.4.42)
\]
\[
\bar X(\varepsilon) = \bar X(0)(I_{p\times p} + \varepsilon(C - \mathrm{diag}(C))) + O(\varepsilon^2) \in O(p), \qquad (1.4.43)
\]
where the last equality holds since the entries of $\mathrm{diag}(C)$ are of order $\varepsilon$. Finally, we can find $X(0)$ and $X'(0)$ by using
\[
X(0) = \Gamma\bar X(0), \qquad X'(0) = \Gamma\bar X'(0).
\]

General cases. In general, if $\Lambda'_1$ or $\Lambda'_{2,1}$ has repeated diagonal entries, we divide them into more blocks, and the blocks with repeated diagonal entries can be treated in the same way as $\Lambda'_{2,2}$ above. We skip the details here.


Chapter 2

LLE on closed manifolds

2.1 Manifold Setup

Let X be a p-dimensional random vector. Assume that the distribution of X is supported on a d-dimensionalcompact, smooth Riemannian manifold (M,g) isometrically embedded in Rp via ι : M → Rp, where we assumethat M is boundary-free to simplify the discussion. Denote d(·, ·) to be the geodesic distance associated with g.For the tangent space TyM on y ∈M, denote ι∗TyM to be the embedded tangent space in Rp. Denote the normalspace at ι(y) as (ι∗TyM)⊥. Denote IIx to be the second fundamental form of ι at x. Denote expy : TyM→M to bethe exponential map at y. Denote Sd−1 to be the (d−1)-dim unit sphere embedded in Rp, and denote |Sd−1| to bethe volume. Unless otherwise stated, in this paper we will carry out the calculation with the normal coordinate.

We now define the probability density function (p.d.f.) associated with X . The random vector X : Ω→ Rp

is a measurable function with respect to the probability space (Ω,F ,P), where P is the probability measuredefined on the sigma algebra F in Ω. By assumption, the range of X is supported on ι(M). Let B be the Borelsigma algebra of ι(M), and denote by PX the probability measure defined on B, induced from P. When d < p,the p.d.f. of X is not well-defined as a function on Rp. Thus, for an integrable function ζ : ι(M)→ R, we have

\[
\mathbb{E}\zeta(X) = \int_\Omega \zeta(X(\omega))\,dP(\omega) = \int_{\iota(M)}\zeta(z)\,dP_X(z), \qquad (2.1.1)
\]

where the second equality follows from the fact that PX is the induced probability measure. If PX is absolutelycontinuous with respect to the volume density on ι(M), by the Radon-Nikodym theorem, dPX (z) = P(z)ι∗dV (z),where ι∗dV (z) is the induced measure on ι(M) via ι , P is a non-negative measurable function defined on ι(M)

and dV is the volume form associated with the metric g. Thus, (2.1.1) becomes

\[
\mathbb{E}\zeta(X) = \int_{\iota(M)}\zeta(z)P(z)\,\iota_*dV(z) = \int_M \zeta(\iota(y))P(\iota(y))\,dV(y), \qquad (2.1.2)
\]

where the second equality comes from the change of variable $z = \iota(y)$. We thus call $P$ the p.d.f. of $X$ on $M$. When $P$ is a constant function, we call $X$ a uniform random sampling scheme; otherwise, it is nonuniform. Since $\iota$ is an isometric embedding, when we do the calculation, we will abuse the notation and write

\[
\mathbb{E}\zeta(X) = \int_M \zeta(y)P(y)\,dV(y). \qquad (2.1.3)
\]


In other words, we will not distinguish whether a function is defined on $\iota(M)$ or on $M$.

With the above discussion in mind, to facilitate the upcoming analysis, we make the following assumption about the random vector $X$ and the regularity of the associated p.d.f.

Assumption 2.1.1. Assume $P_X$ is absolutely continuous with respect to the volume density on $\iota(M)$ so that $dP_X = P\,\iota_*dV$, where $P$ is a measurable function. We further assume that $P \in C^5(\iota(M))$ and that there exist $P_m > 0$ and $P_M \ge P_m$ so that $P_m \le P(x) \le P_M < \infty$ for all $x \in \iota(M)$.

Let $\mathcal{X} = \{\iota(x_i)\}_{i=1}^n \subset \iota(M) \subset \mathbb{R}^p$ denote a set of independent and identically distributed (i.i.d.) random samples of $X$, where $x_i \in M$. We could then run LLE on $\mathcal{X}$. For $\iota(x_k) \in \mathcal{X}$ and $\varepsilon > 0$, we have $\mathcal{N}_{\iota(x_k)} := \{\iota(x_{k,1}), \dots, \iota(x_{k,N})\} = B^{\mathbb{R}^p}_\varepsilon(\iota(x_k)) \cap (\mathcal{X} \setminus \{\iota(x_k)\})$. Take $G_n \in \mathbb{R}^{p\times N}$ to be the local data matrix associated with $\mathcal{N}_{\iota(x_k)}$ and evaluate the barycentric coordinates $w_n = [w_{n,1}, \dots, w_{n,N}]^\top \in \mathbb{R}^N$. Again, while $G_n$ and $w_n$ depend on $\varepsilon$, $n$, and $x_k$, to ease the notation, we only keep $n$ to indicate that we have finitely many sample points.
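For concreteness, here is a minimal sketch of the fitting step that produces the barycentric coordinates $w_n$. The trace-scaled regularization constant below is a common convention and a hypothetical choice; it is not necessarily the convention used in (1.2.5):

```python
import numpy as np

def barycentric_weights(x, neighbors, c=1e-3):
    """Regularized barycentric coordinates of x with respect to its neighbors:
    minimize ||x - sum_j w_j neighbors[j]||^2 subject to sum_j w_j = 1,
    with a Tikhonov term added to the local Gram matrix."""
    G = neighbors - x                          # rows are the local data vectors
    C = G @ G.T                                # local Gram matrix
    C = C + c * np.trace(C) * np.eye(len(C))   # hypothetical trace-scaled regularizer
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()

x = np.zeros(2)
nbrs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
w = barycentric_weights(x, nbrs)
assert abs(w.sum() - 1.0) < 1e-12
assert np.allclose(w, 0.25)                    # equal weights by symmetry
```

When the local Gram matrix is rank-deficient (as happens whenever $N > d$), the regularizer is what makes the solution unique, which is exactly why the regularization term plays a central role in the asymptotic analysis.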

In our asymptotic analysis of LLE under the manifold setup, we make the following assumption about thetangent space of the manifold.

Assumption 2.1.2. Since the barycentric coordinates are rotationally and translationally invariant, without loss of generality we assume that the manifold is translated and rotated properly, so that $\iota_*T_xM$ is spanned by $e_1, \dots, e_d$.

2.2 Some preliminary lemmas in Riemannian geometry

In this section we prepare several technical lemmas. For $v \in \mathbb{R}^p$, we use the following notation to simplify the proofs:
\[
v = [[v_1, v_2]] \in \mathbb{R}^p, \qquad (2.2.1)
\]
where $v_1 \in \mathbb{R}^d$ forms the first $d$ coordinates of $v$ and $v_2 \in \mathbb{R}^{p-d}$ forms the last $p-d$ coordinates of $v$. Thus, under Assumption 2.1.2, for $v = [[v_1, v_2]] \in T_{\iota(x)}\mathbb{R}^p$, $v_1 = J_{p,d}^\top v$ is tangential to $\iota_*T_xM$ and $v_2 = J_{p,p-d}^\top v$ is the coordinate of the normal component of $v$ associated with a chosen basis of the normal bundle. The first three lemmas are basic facts about the exponential map, the normal coordinate, and the volume form. The proofs of Lemmas 2.2.1 and 2.2.2 are standard and we skip them. Interested readers are referred to [55].

Lemma 2.2.1. Fix $x \in M$. If we use the polar coordinate $(t,\theta) \in [0,\infty)\times S^{d-1}$ to parametrize $T_xM$, the volume form has the following expansion:
\[
dV = \Big(t^{d-1} - \frac16\mathrm{Ric}_x(\theta,\theta)t^{d+1} - \frac{1}{12}\nabla_\theta\mathrm{Ric}_x(\theta,\theta)t^{d+2} - \Big(\frac{1}{40}\nabla^2_{\theta\theta}\mathrm{Ric}_x(\theta,\theta) + \frac{1}{180}\sum_{a,b=1}^d R_x(\theta,a,\theta,b)R_x(\theta,a,\theta,b) - \frac{1}{72}\mathrm{Ric}_x(\theta,\theta)^2\Big)t^{d+3} + O(t^{d+4})\Big)\,dt\,d\theta,
\]
where $R_x$ is the Riemannian curvature of $(M,g)$ at $x$. If we use the Cartesian coordinate to parametrize $T_xM$, the volume form has the following expansion:
\[
dV = \Big(1 - \sum_{i,j=1}^d\frac16\mathrm{Ric}_x(\partial_i,\partial_j)u^iu^j - \sum_{i,j,k=1}^d\frac{1}{12}\nabla_k\mathrm{Ric}_x(\partial_i,\partial_j)u^iu^ju^k - \sum_{i,j,k,l=1}^d\Big[\frac{1}{40}\nabla^2_{kl}\mathrm{Ric}_x(\partial_i,\partial_j) + \frac{1}{180}\sum_{a,b=1}^d R_x(\partial_i,\partial_a,\partial_j,\partial_b)R_x(\partial_k,\partial_a,\partial_l,\partial_b) - \frac{1}{72}\mathrm{Ric}_x(\partial_i,\partial_j)\mathrm{Ric}_x(\partial_k,\partial_l)\Big]u^iu^ju^ku^l + O(\|u\|^5)\Big)\,du,
\]
where $u = u^i\partial_i \in T_xM$.

Lemma 2.2.2. Fix $x \in M$. For $u \in T_xM$ with $\|u\|$ sufficiently small, we have the following Taylor expansion:
\[
\iota\circ\exp_x(u) - \iota(x) = \iota_*u + \frac12 II_x(u,u) + \frac16\nabla_u II_x(u,u) + \frac{1}{24}\nabla^2_{uu}II_x(u,u) + \frac{1}{120}\nabla^3_{uuu}II_x(u,u) + O(\|u\|^6).
\]

Lemma 2.2.3. Fix $x \in M$. If we use the polar coordinate $(t,\theta) \in [0,\infty)\times S^{d-1}$ to parametrize $T_xM$, when $\tilde t = \|\iota\circ\exp_x(\theta t) - \iota(x)\|_{\mathbb{R}^p}$ is sufficiently small, we have
\[
\tilde t = t - \frac{1}{24}\|II_x(\theta,\theta)\|^2 t^3 - \frac{1}{24}\nabla_\theta II_x(\theta,\theta)\cdot II_x(\theta,\theta)t^4 - \Big(\frac{1}{80}\nabla^2_{\theta\theta}II_x(\theta,\theta)\cdot II_x(\theta,\theta) + \frac{1}{90}\nabla_\theta II_x(\theta,\theta)\cdot\nabla_\theta II_x(\theta,\theta) + \frac{1}{1152}\|II_x(\theta,\theta)\|^4\Big)t^5 + O(t^6),
\]
\[
t = \tilde t + \frac{1}{24}\|II_x(\theta,\theta)\|^2\tilde t^3 + \frac{1}{24}\nabla_\theta II_x(\theta,\theta)\cdot II_x(\theta,\theta)\tilde t^4 + \Big(\frac{1}{80}\nabla^2_{\theta\theta}II_x(\theta,\theta)\cdot II_x(\theta,\theta) + \frac{1}{90}\nabla_\theta II_x(\theta,\theta)\cdot\nabla_\theta II_x(\theta,\theta) + \frac{7}{1152}\|II_x(\theta,\theta)\|^4\Big)\tilde t^5 + O(\tilde t^6).
\]
Hence, $(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_{\tilde t}(\iota(x)) \cap \iota(M)) \subset T_xM$ is star-shaped.

Proof. Let $\gamma(t)$ be the geodesic in $\iota(M)$ with $\gamma(0) = \iota(x)$. If $\gamma^{(i)}(0)$ denotes the $i$-th derivative of $\gamma(t)$ with respect to $t$ at $0$, then we have
\[
\gamma(t) = \gamma(0) + \gamma^{(1)}(0)t + \frac12\gamma^{(2)}(0)t^2 + \frac16\gamma^{(3)}(0)t^3 + \frac{1}{24}\gamma^{(4)}(0)t^4 + \frac{1}{120}\gamma^{(5)}(0)t^5 + O(t^6). \qquad (2.2.2)
\]

Moreover, since $\gamma(t)$ is a geodesic, if we apply the product rule, we have
\[
\gamma^{(1)}(0)\cdot\gamma^{(1)}(0) = 1, \qquad (2.2.3)
\]
\[
\gamma^{(2)}(0)\cdot\gamma^{(1)}(0) = 0,
\]
\[
\gamma^{(2)}(0)\cdot\gamma^{(2)}(0) = -\gamma^{(3)}(0)\cdot\gamma^{(1)}(0),
\]
\[
3\gamma^{(3)}(0)\cdot\gamma^{(2)}(0) = -\gamma^{(4)}(0)\cdot\gamma^{(1)}(0),
\]
\[
4\gamma^{(4)}(0)\cdot\gamma^{(2)}(0) + 3\gamma^{(3)}(0)\cdot\gamma^{(3)}(0) = -\gamma^{(5)}(0)\cdot\gamma^{(1)}(0),
\]
where $\gamma^{(l)}$ is the $l$-th derivative of $\gamma$ and $l \in \mathbb{N}$. From (2.2.2), we have

\[
\|\gamma(t)-\gamma(0)\|^2_{\mathbb{R}^p} = \gamma^{(1)}(0)\cdot\gamma^{(1)}(0)t^2 + \big(\gamma^{(2)}(0)\cdot\gamma^{(1)}(0)\big)t^3 + \Big(\frac13\gamma^{(3)}(0)\cdot\gamma^{(1)}(0) + \frac14\gamma^{(2)}(0)\cdot\gamma^{(2)}(0)\Big)t^4 + \Big(\frac{1}{12}\gamma^{(4)}(0)\cdot\gamma^{(1)}(0) + \frac16\gamma^{(3)}(0)\cdot\gamma^{(2)}(0)\Big)t^5 + \Big(\frac{1}{60}\gamma^{(5)}(0)\cdot\gamma^{(1)}(0) + \frac{1}{24}\gamma^{(4)}(0)\cdot\gamma^{(2)}(0) + \frac{1}{36}\gamma^{(3)}(0)\cdot\gamma^{(3)}(0)\Big)t^6 + O(t^7). \qquad (2.2.4)
\]

If we substitute (2.2.3) into (2.2.4), we have
\[
\|\gamma(t)-\gamma(0)\|^2_{\mathbb{R}^p} = t^2 - \frac{1}{12}\gamma^{(2)}(0)\cdot\gamma^{(2)}(0)t^4 - \frac{1}{12}\gamma^{(3)}(0)\cdot\gamma^{(2)}(0)t^5 - \Big(\frac{1}{40}\gamma^{(4)}(0)\cdot\gamma^{(2)}(0) + \frac{1}{45}\gamma^{(3)}(0)\cdot\gamma^{(3)}(0)\Big)t^6 + O(t^7). \qquad (2.2.5)
\]

Therefore,
\[
\tilde t = \|\gamma(t)-\gamma(0)\|_{\mathbb{R}^p} = t - \frac{1}{24}\gamma^{(2)}(0)\cdot\gamma^{(2)}(0)t^3 - \frac{1}{24}\gamma^{(3)}(0)\cdot\gamma^{(2)}(0)t^4 - \Big(\frac{1}{80}\gamma^{(4)}(0)\cdot\gamma^{(2)}(0) + \frac{1}{90}\gamma^{(3)}(0)\cdot\gamma^{(3)}(0) + \frac{1}{1152}\big(\gamma^{(2)}(0)\cdot\gamma^{(2)}(0)\big)^2\Big)t^5 + O(t^6). \qquad (2.2.6)
\]

By comparing the order, we have
\[
t = \tilde t + \frac{1}{24}\gamma^{(2)}(0)\cdot\gamma^{(2)}(0)\tilde t^3 + \frac{1}{24}\gamma^{(3)}(0)\cdot\gamma^{(2)}(0)\tilde t^4 + \Big(\frac{1}{80}\gamma^{(4)}(0)\cdot\gamma^{(2)}(0) + \frac{1}{90}\gamma^{(3)}(0)\cdot\gamma^{(3)}(0) + \frac{7}{1152}\big(\gamma^{(2)}(0)\cdot\gamma^{(2)}(0)\big)^2\Big)\tilde t^5 + O(\tilde t^6). \qquad (2.2.7)
\]

Finally, by applying Lemma 2.2.2 to (2.2.2) with $\gamma(t) = \iota\circ\exp_x(\theta t)$ and substituting the corresponding terms for $\gamma^{(l)}(0)$, the conclusion follows.

The essence of Lemma 2.2.3 is to describe how well we can estimate the local geodesic distance by the ambient space metric. When the manifold setup is considered in an algorithm, this lemma is helpful in the analysis, since most of the time we only have access to the ambient space metric, not the intrinsic Riemannian metric.
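A quick numerical illustration (not from the thesis) on the unit circle, where $\|II_x(\theta,\theta)\| = 1$ and the derivative terms of $II$ vanish, so Lemma 2.2.3 predicts that the chordal distance equals $t - t^3/24 + O(t^5)$:

```python
import numpy as np

# On the unit circle, the geodesic of arclength t from (1,0) is (cos t, sin t).
# Lemma 2.2.3 then predicts chordal distance t_tilde = t - t^3/24 + O(t^5).
for t in [0.2, 0.1, 0.05]:
    chord = np.hypot(np.cos(t) - 1.0, np.sin(t))   # = 2*sin(t/2) exactly
    assert abs(chord - (t - t**3 / 24)) < t**5     # remainder is t^5/1920 + ...
```

Here the exact chord is $2\sin(t/2) = t - t^3/24 + t^5/1920 - \cdots$, so the fourth-order term is absent, as the lemma predicts for a manifold with parallel second fundamental form.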

To alleviate the notational load, we denote
\[
B_\varepsilon(x) := \iota^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x)) \cap \iota(M)) \subset M, \qquad (2.2.8)
\]
and, for a sufficiently small $\varepsilon$, by Lemma 2.2.3, denote $\tilde\varepsilon$ so that
\[
\varepsilon = \tilde\varepsilon - \frac{1}{24}\|II_x(\theta,\theta)\|^2\tilde\varepsilon^3 - \frac{1}{24}\nabla_\theta II_x(\theta,\theta)\cdot II_x(\theta,\theta)\tilde\varepsilon^4 - \Big(\frac{1}{80}\nabla^2_{\theta\theta}II_x(\theta,\theta)\cdot II_x(\theta,\theta) + \frac{1}{90}\nabla_\theta II_x(\theta,\theta)\cdot\nabla_\theta II_x(\theta,\theta) + \frac{1}{1152}\|II_x(\theta,\theta)\|^4\Big)\tilde\varepsilon^5 + O(\tilde\varepsilon^6).
\]

To have a more succinct proof, we prepare the following integral expansion, which comes from a direct expansion; the proof is skipped.


Lemma 2.2.4. For $d \in \mathbb{N}$, $\gamma > -d$ and $h_1, h_2, h_3 \in \mathbb{R}$, we have the following asymptotic expansion when $\varepsilon$ is sufficiently small:
\[
\int_0^{\varepsilon + h_1\varepsilon^3 + h_2\varepsilon^4 + h_3\varepsilon^5 + O(\varepsilon^6)} t^{d-1+\gamma}\,dt = \frac{\varepsilon^{d+\gamma}}{d+\gamma}\Big(1 + (d+\gamma)h_1\varepsilon^2 + (d+\gamma)h_2\varepsilon^3 + \Big[(d+\gamma)h_3 + \frac{(d+\gamma)(d+\gamma-1)}{2}h_1^2\Big]\varepsilon^4\Big) + O(\varepsilon^{d+\gamma+5}).
\]
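Lemma 2.2.4 can be verified numerically by comparing the exact integral with the claimed expansion; the constants below are arbitrary test values:

```python
import numpy as np

# Exact integral of t^{d-1+gamma} over [0, eps + h1 eps^3 + h2 eps^4 + h3 eps^5]
# versus the expansion claimed in Lemma 2.2.4; the error should be O(eps^{d+gamma+5}).
d, g = 2, 1.0                      # arbitrary test values of d and gamma
h1, h2, h3 = 0.3, -0.2, 0.1

def exact(eps):
    U = eps + h1 * eps**3 + h2 * eps**4 + h3 * eps**5
    return U**(d + g) / (d + g)

def expansion(eps):
    c = d + g
    return eps**c / c * (1 + c * h1 * eps**2 + c * h2 * eps**3
                         + (c * h3 + c * (c - 1) / 2 * h1**2) * eps**4)

for eps in [0.1, 0.05]:
    assert abs(exact(eps) - expansion(eps)) < 50 * eps**(d + g + 5)
```

The check simply uses that the integral is $U^{d+\gamma}/(d+\gamma)$ with $U$ the upper limit, and binomially expands $U^{d+\gamma}$.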

In the next lemma, we calculate the asymptotic expansion of a few quantities that we are going to use in proving the main theorem. Note that, in order to capture the extra terms introduced by the barycentric coordinate, we calculate the normal component two orders higher than the tangential one. Handling the normal component is the main reason we need the $C^5$ regularity of $P$.

Lemma 2.2.5. Fix $x \in M$ and suppose Assumption 2.1.2 holds. When $\varepsilon$ is sufficiently small, we have the following expansion of $\mathbb{E}[f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$:
\[
\mathbb{E}[f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] = \frac{|S^{d-1}|}{d}f(x)P(x)\varepsilon^d + \frac{|S^{d-1}|}{d(d+2)}\Big[\frac12 P(x)\Delta f(x) + \frac12 f(x)\Delta P(x) + \nabla f(x)\cdot\nabla P(x) - \frac{s(x)f(x)P(x)}{6} + \frac{d(d+2)\,\omega(x)f(x)P(x)}{24}\Big]\varepsilon^{d+2} + O(\varepsilon^{d+3}) \qquad (2.2.9)
\]
and the following expansion of $\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] \in \mathbb{R}^p$:
\[
\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] = [[v_1, v_2]] + O(\varepsilon^{d+5}), \qquad (2.2.10)
\]
where $v_1 \in \mathbb{R}^d$ and $v_2 \in \mathbb{R}^{p-d}$ are defined in (2.2.14) and (2.2.15) respectively, and contain terms of order $\varepsilon^{d+2}$ and $\varepsilon^{d+4}$.

Proof. By Lemma 2.2.1, Lemma 2.2.2 and Lemma 2.2.3,
\[
\mathbb{E}[f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] = \int_{B_\varepsilon(x)} f(y)P(y)\,dV(y)
\]
\[
= \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\Big(f(x) + \nabla_\theta f(x)t + \frac12\nabla^2_{\theta,\theta}f(x)t^2 + O(t^3)\Big)\Big(P(x) + \nabla_\theta P(x)t + \frac12\nabla^2_{\theta,\theta}P(x)t^2 + O(t^3)\Big)\Big(t^{d-1} - \frac16\mathrm{Ric}_x(\theta,\theta)t^{d+1} + O(t^{d+2})\Big)\,dt\,d\theta
\]
\[
= A_1 + B_1 + C_1 + O(\varepsilon^{d+4}),
\]


where
\[
A_1 := \int_{S^{d-1}}\int_0^{\tilde\varepsilon} f(x)P(x)\,t^{d-1}\,dt\,d\theta, \qquad (2.2.11)
\]
\[
B_1 := \int_{S^{d-1}}\int_0^{\tilde\varepsilon} \big(\nabla_\theta f(x)P(x) + f(x)\nabla_\theta P(x)\big)\,t^d\,dt\,d\theta,
\]
\[
C_1 := \int_{S^{d-1}}\int_0^{\tilde\varepsilon} \Big[-\frac16 f(x)P(x)\mathrm{Ric}_x(\theta,\theta) + \nabla_\theta f(x)\nabla_\theta P(x) + \frac12\nabla^2_{\theta,\theta}f(x)P(x) + \frac12\nabla^2_{\theta,\theta}P(x)f(x)\Big]t^{d+1}\,dt\,d\theta,
\]

the second equality holds by Lemma 2.2.3 and the last equality holds due to the symmetry of the sphere. Indeed, the symmetry forces all odd order terms to contribute only at order $\varepsilon^{d+4}$; for example,
\[
B_1 = \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\big(\nabla_\theta f(x)P(x) + f(x)\nabla_\theta P(x)\big)t^d\,dt\,d\theta = \frac{1}{d+1}\int_{S^{d-1}}\big(\nabla_\theta f(x)P(x) + f(x)\nabla_\theta P(x)\big)\Big(\varepsilon + \frac{1}{24}\|II_x(\theta,\theta)\|^2\varepsilon^3 + O(\varepsilon^4)\Big)^{d+1}d\theta
\]
\[
= \frac{\varepsilon^{d+1}}{d+1}\int_{S^{d-1}}\big(\nabla_\theta f(x)P(x) + f(x)\nabla_\theta P(x)\big)\Big(1 + \frac{d+1}{24}\|II_x(\theta,\theta)\|^2\varepsilon^2 + O(\varepsilon^3)\Big)d\theta = O(\varepsilon^{d+4}),
\]
since $\int_{S^{d-1}}\big(\nabla_\theta f(x)P(x) + f(x)\nabla_\theta P(x)\big)\big(1 + \frac{d+1}{24}\|II_x(\theta,\theta)\|^2\varepsilon^2\big)\,d\theta = 0$. The other even order terms can be expanded by a direct calculation. We have

\[
A_1 = \int_{S^{d-1}}\int_0^{\tilde\varepsilon} f(x)P(x)t^{d-1}\,dt\,d\theta = \frac{f(x)P(x)}{d}\int_{S^{d-1}}\Big(\varepsilon + \frac{1}{24}\|II_x(\theta,\theta)\|^2\varepsilon^3 + O(\varepsilon^4)\Big)^d d\theta
\]
\[
= \frac{\varepsilon^d f(x)P(x)}{d}\int_{S^{d-1}}\Big(1 + \frac{d}{24}\|II_x(\theta,\theta)\|^2\varepsilon^2 + O(\varepsilon^3)\Big)d\theta = \varepsilon^d|S^{d-1}|f(x)P(x)\Big[\frac1d + \frac{\omega(x)}{24}\varepsilon^2\Big] + O(\varepsilon^{d+3}).
\]

A similar argument holds for $B_1$. By denoting $R_2(\theta) := -\frac16 f(x)P(x)\mathrm{Ric}_x(\theta,\theta) + \nabla_\theta f(x)\nabla_\theta P(x) + \frac12\nabla^2_{\theta,\theta}f(x)P(x) + \frac12\nabla^2_{\theta,\theta}P(x)f(x)$, we have

\[
C_1 = \int_{S^{d-1}}\int_0^{\tilde\varepsilon} R_2(\theta)t^{d+1}\,dt\,d\theta = \frac{1}{d+2}\int_{S^{d-1}}\Big(\varepsilon + \frac{1}{24}\|II_x(\theta,\theta)\|^2\varepsilon^3 + O(\varepsilon^4)\Big)^{d+2}R_2(\theta)\,d\theta
\]
\[
= \frac{\varepsilon^{d+2}}{d+2}\int_{S^{d-1}}\Big(1 + \frac{d+2}{24}\|II_x(\theta,\theta)\|^2\varepsilon^2 + O(\varepsilon^3)\Big)R_2(\theta)\,d\theta = \frac{\varepsilon^{d+2}}{d+2}\int_{S^{d-1}}R_2(\theta)\,d\theta + O(\varepsilon^{d+4}).
\]


To proceed, note that by expressing $\theta$ in the local coordinate as $\theta^i\partial_i$, we have, for example,
\[
\int_{S^{d-1}}\nabla_\theta f(x)\nabla_\theta P(x)\,d\theta = \sum_{i,j}\int_{S^{d-1}}\partial_i f(x)\partial_j P(x)\theta^i\theta^j\,d\theta = \sum_i\int_{S^{d-1}}\partial_i f(x)\partial_i P(x)(\theta^i)^2\,d\theta = \frac{|S^{d-1}|}{d}\nabla f(x)\cdot\nabla P(x),
\]
where the second equality holds since odd order terms disappear when integrated over the sphere, and the last equality holds since $\int_{S^{d-1}}(\theta^i)^2\,d\theta = \frac1d\int_{S^{d-1}}\sum_{i=1}^d(\theta^i)^2\,d\theta = \frac{|S^{d-1}|}{d}$, again due to the symmetry of the sphere. The same argument leads to

\[
\int_{S^{d-1}} f(x)P(x)\mathrm{Ric}_x(\theta,\theta)\,d\theta = \frac{|S^{d-1}|}{d}f(x)P(x)s(x), \qquad
\int_{S^{d-1}} \nabla^2_{\theta,\theta}f(x)P(x)\,d\theta = \frac{|S^{d-1}|}{d}P(x)\Delta f(x), \qquad (2.2.12)
\]
\[
\int_{S^{d-1}} \nabla^2_{\theta,\theta}P(x)f(x)\,d\theta = \frac{|S^{d-1}|}{d}f(x)\Delta P(x),
\]
where $s(x)$ is the scalar curvature of $(M,g)$ at $x$. As a result, we have

\[
C_1 = \frac{|S^{d-1}|}{d(d+2)}\Big[\frac12 P(x)\Delta f(x) + \frac12 f(x)\Delta P(x) + \nabla f(x)\cdot\nabla P(x) - \frac{s(x)f(x)P(x)}{6}\Big]\varepsilon^{d+2} + O(\varepsilon^{d+4}).
\]

By putting all the above together, we have
\[
\mathbb{E}[f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] = \frac{|S^{d-1}|}{d}f(x)P(x)\varepsilon^d + \frac{|S^{d-1}|}{d(d+2)}\Big[\frac12 P(x)\Delta f(x) + \frac12 f(x)\Delta P(x) + \nabla f(x)\cdot\nabla P(x) - \frac{s(x)f(x)P(x)}{6} + \frac{d(d+2)\,\omega(x)f(x)P(x)}{24}\Big]\varepsilon^{d+2} + O(\varepsilon^{d+3}).
\]
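As a Monte Carlo sanity check (not from the thesis), take $M = S^2$ with the uniform density and $f = 1$. Then $s(x) = 2$ and $\omega(x) = 1$, so the $\varepsilon^{d+2}$ bracket, in which the scalar curvature enters with a minus sign and the second fundamental form contributes $d(d+2)\omega/24$, vanishes, and the expansion predicts $\mathbb{E}[\chi_{B_\varepsilon}] \approx \varepsilon^2/4$, which is in fact exact for the sphere:

```python
import numpy as np

# M = S^2, uniform density, f = 1: the eps^{d+2} bracket vanishes since
# -s/6 + d(d+2)*omega/24 = -1/3 + 1/3 = 0, and E[chi_{B_eps}] = eps^2/4 exactly.
rng = np.random.default_rng(2)
pts = rng.standard_normal((400000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)      # uniform samples on S^2
north = np.array([0.0, 0.0, 1.0])
for eps in [0.5, 0.25]:
    frac = np.mean(np.linalg.norm(pts - north, axis=1) < eps)
    assert abs(frac - eps**2 / 4) < 0.002
```

The exactness follows from the spherical cap formula: the cap of chordal radius $\varepsilon$ has height $\varepsilon^2/2$, hence area fraction $\varepsilon^2/4$.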

Next, we evaluate $\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$. Again, by Lemma 2.2.1, Lemma 2.2.2 and Lemma 2.2.3, we have
\[
\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] = \int_{B_\varepsilon(x)}(\iota(y)-\iota(x))f(y)P(y)\,dV(y) \qquad (2.2.13)
\]
\[
= \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\Big(\iota_*\theta\,t + \frac12 II_x(\theta,\theta)t^2 + \frac16\nabla_\theta II_x(\theta,\theta)t^3 + \frac{1}{24}\nabla^2_{\theta\theta}II_x(\theta,\theta)t^4 + O(t^5)\Big) \times \Big(f(x) + \nabla_\theta f(x)t + \frac12\nabla^2_{\theta,\theta}f(x)t^2 + O(t^3)\Big) \times \Big(P(x) + \nabla_\theta P(x)t + \frac12\nabla^2_{\theta,\theta}P(x)t^2 + O(t^3)\Big) \times \Big(t^{d-1} - \frac16\mathrm{Ric}_x(\theta,\theta)t^{d+1} + O(t^{d+2})\Big)\,dt\,d\theta
\]
\[
= A_2 + B_2 + C_2 + D_2 + O(\varepsilon^{d+6}),
\]


where
\[
A_2 := \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\iota_*\theta\,f(x)P(x)t^d\,dt\,d\theta,
\]
\[
B_2 := \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\Big[\iota_*\theta\big(\nabla_\theta f(x)P(x) + \nabla_\theta P(x)f(x)\big) + \frac12 II_x(\theta,\theta)f(x)P(x)\Big]t^{d+1}\,dt\,d\theta,
\]
\[
C_2 := \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\Big[\iota_*\theta\big(\nabla_\theta f(x)\nabla_\theta P(x) + \nabla^2_{\theta,\theta}P(x)f(x) + \nabla^2_{\theta,\theta}f(x)P(x)\big) + \frac16\nabla_\theta II_x(\theta,\theta)f(x)P(x) - \frac16 f(x)P(x)\mathrm{Ric}_x(\theta,\theta)\Big]t^{d+2}\,dt\,d\theta
\]

and
\[
D_2 := \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\Big[\iota_*\theta\Big(\frac16\nabla^3_{\theta,\theta,\theta}f(x)P(x) + \frac16\nabla^3_{\theta,\theta,\theta}P(x)f(x) + \frac12\nabla^2_{\theta,\theta}f(x)\nabla_\theta P(x) + \frac12\nabla^2_{\theta,\theta}P(x)\nabla_\theta f(x) - \frac16\mathrm{Ric}_x(\theta,\theta)\big[f(x)\nabla_\theta P(x) + \nabla_\theta f(x)P(x)\big]\Big) + \frac12 II_x(\theta,\theta)\Big(\nabla_\theta f(x)\nabla_\theta P(x) + \frac12\big[P(x)\nabla^2_{\theta,\theta}f(x) + f(x)\nabla^2_{\theta,\theta}P(x)\big] - \frac16\mathrm{Ric}_x(\theta,\theta)f(x)P(x)\Big) + \frac16\nabla_\theta II_x(\theta,\theta)\big(P(x)\nabla_\theta f(x) + f(x)\nabla_\theta P(x)\big) + \frac{1}{24}f(x)P(x)\nabla^2_{\theta\theta}II_x(\theta,\theta)\Big]t^{d+3}\,dt\,d\theta,
\]
and the $O(\varepsilon^{d+5})$ term disappears in the last equality due to the symmetry of the sphere. The main difference between evaluating $\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ and $\mathbb{E}[f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ is the existence of $\iota(y)$ in the integrand in (2.2.13). Clearly, the former is a vector while the latter is a scalar. Due to the curvature, $\iota(y)-\iota(x)$ does not in general lie in $\iota_*T_xM$ for all $y \in B_\varepsilon(x)$, and we need to carefully trace the normal components. By Lemma 2.2.4,

\[
A_2 = \int_{S^{d-1}}\int_0^{\tilde\varepsilon} f(x)P(x)\iota_*\theta\,t^d\,dt\,d\theta = \frac{f(x)P(x)}{d+1}\int_{S^{d-1}}\iota_*\theta\Big(\varepsilon + \frac{1}{24}\|II_x(\theta,\theta)\|^2\varepsilon^3 + \frac{1}{24}\nabla_\theta II_x(\theta,\theta)\cdot II_x(\theta,\theta)\varepsilon^4 + O(\varepsilon^5)\Big)^{d+1}d\theta
\]
\[
= \frac{f(x)P(x)}{d+1}\int_{S^{d-1}}\Big(\iota_*\theta\,\varepsilon^{d+1} + \frac{d+1}{24}\iota_*\theta\,\|II_x(\theta,\theta)\|^2\varepsilon^{d+3} + \frac{d+1}{24}\iota_*\theta\,\nabla_\theta II_x(\theta,\theta)\cdot II_x(\theta,\theta)\varepsilon^{d+4} + O(\varepsilon^{d+5})\Big)d\theta
\]
\[
= \frac{f(x)P(x)}{24}\Big[\iota_*\int_{S^{d-1}}\theta\,\nabla_\theta II_x(\theta,\theta)\cdot II_x(\theta,\theta)\,d\theta\Big]\varepsilon^{d+4} + O(\varepsilon^{d+5}) = \frac{f(x)P(x)|S^{d-1}|}{24}\iota_*R_0(x)\varepsilon^{d+4} + O(\varepsilon^{d+5}),
\]
where the second last equality holds due to the symmetry of the sphere. We can see that $A_2 = O(\varepsilon^{d+4})$ and $A_2 \in \iota_*T_xM$. Similarly, we have $C_2 = O(\varepsilon^{d+6})$, but $C_2$ might not lie in $\iota_*T_xM$ due to the term $\nabla_\theta II_x(\theta,\theta)f(x)P(x)$.


$B_2$ can be evaluated by a similar direct expansion:
\[
B_2 = \int_{S^{d-1}}\int_0^{\tilde\varepsilon}\Big[P(x)\iota_*\theta(\nabla f(x)\cdot\theta) + f(x)\iota_*\theta(\nabla P(x)\cdot\theta) + \frac12 II_x(\theta,\theta)f(x)P(x)\Big]t^{d+1}\,dt\,d\theta
\]
\[
= \frac{\varepsilon^{d+2}}{d+2}\int_{S^{d-1}}\Big[P(x)\iota_*\theta\theta^\top\nabla f(x) + f(x)\iota_*\theta\theta^\top\nabla P(x) + \frac{f(x)P(x)}{2}II_x(\theta,\theta)\Big]d\theta + \frac{\varepsilon^{d+4}}{24}\int_{S^{d-1}}\|II_x(\theta,\theta)\|^2\Big[P(x)\iota_*\theta\theta^\top\nabla f(x) + f(x)\iota_*\theta\theta^\top\nabla P(x) + \frac{f(x)P(x)}{2}II_x(\theta,\theta)\Big]d\theta + O(\varepsilon^{d+5})
\]
\[
= \frac{|S^{d-1}|}{d+2}\Big[\frac{P(x)\iota_*\nabla f(x) + f(x)\iota_*\nabla P(x)}{d} + \frac{f(x)P(x)N_0(x)}{2}\Big]\varepsilon^{d+2} + \frac{|S^{d-1}|}{24}\Big[\iota_*M_1(x)\big[P(x)\nabla f(x) + f(x)\nabla P(x)\big] + \frac{f(x)P(x)N_1(x)}{2}\Big]\varepsilon^{d+4} + O(\varepsilon^{d+5}),
\]
where the second equality holds by the same argument as that for $B_1$ and the last equality holds since $\int_{S^{d-1}}\theta\theta^\top d\theta = \frac{|S^{d-1}|}{d}I_{d\times d}$.

For $D_2$, we only need to explicitly write down the $\varepsilon^{d+4}$ term. By the same argument as that for $B_1$, we have
\[
D_2 = \frac{\varepsilon^{d+4}}{d+4}\int_{S^{d-1}}\Big[\iota_*\theta\Big(\frac16\nabla^3_{\theta,\theta,\theta}f(x)P(x) + \frac16\nabla^3_{\theta,\theta,\theta}P(x)f(x) + \frac12\nabla^2_{\theta,\theta}f(x)\nabla_\theta P(x) + \frac12\nabla^2_{\theta,\theta}P(x)\nabla_\theta f(x) - \frac16\mathrm{Ric}_x(\theta,\theta)\big[f(x)\nabla_\theta P(x) + \nabla_\theta f(x)P(x)\big]\Big) + \frac12 II_x(\theta,\theta)\Big(\nabla_\theta f(x)\nabla_\theta P(x) + \frac12\big[P(x)\nabla^2_{\theta,\theta}f(x) + f(x)\nabla^2_{\theta,\theta}P(x)\big] - \frac16\mathrm{Ric}_x(\theta,\theta)f(x)P(x)\Big) + \frac16\nabla_\theta II_x(\theta,\theta)\big(P(x)\nabla_\theta f(x) + f(x)\nabla_\theta P(x)\big) + \frac{1}{24}f(x)P(x)\nabla^2_{\theta\theta}II_x(\theta,\theta)\Big]d\theta + O(\varepsilon^{d+6}).
\]

We now simplify this complicated expression. The first term on the right-hand side of $D_2$ becomes $|S^{d-1}|Jf(x) \in \iota_*T_xM$. For the second term on the right-hand side of $D_2$, we rewrite it as
\[
\int_{S^{d-1}}II_x(\theta,\theta)\Big[\nabla_\theta f(x)\nabla_\theta P(x) + \frac12 P(x)\nabla^2_{\theta,\theta}f(x) + \frac12 f(x)\nabla^2_{\theta,\theta}P(x) - \frac16\mathrm{Ric}_x(\theta,\theta)f(x)P(x)\Big]d\theta
\]
\[
= \nabla f(x)^\top\int_{S^{d-1}}II_x(\theta,\theta)\theta\theta^\top d\theta\,\nabla P(x) + \frac12 P(x)\,\mathrm{tr}\Big(\int_{S^{d-1}}II_x(\theta,\theta)\theta\theta^\top d\theta\,\nabla^2 f(x)\Big) + \frac12 f(x)\,\mathrm{tr}\Big(\int_{S^{d-1}}II_x(\theta,\theta)\theta\theta^\top d\theta\,\nabla^2 P(x)\Big) - \frac16 f(x)P(x)\int_{S^{d-1}}II_x(\theta,\theta)\mathrm{Ric}_x(\theta,\theta)\,d\theta
\]
\[
= |S^{d-1}|\Big[\nabla f(x)^\top M_2(x)\nabla P(x) + \frac12 P(x)\,\mathrm{tr}(M_2(x)\nabla^2 f(x)) + \frac12 f(x)\,\mathrm{tr}(M_2(x)\nabla^2 P(x)) - \frac16 f(x)P(x)N_2(x)\Big],
\]

which is in $(\iota_*T_xM)^\perp$, where we use the equality $u^\top Mv = \mathrm{tr}(vu^\top M)$ for a $d\times d$ matrix $M$ and $u,v \in \mathbb{R}^d$. The third term on the right-hand side of $D_2$ simply becomes
\[
\int_{S^{d-1}}\nabla_\theta II_x(\theta,\theta)\big(P(x)\nabla_\theta f(x) + f(x)\nabla_\theta P(x)\big)d\theta = P(x)\int_{S^{d-1}}\nabla_\theta II_x(\theta,\theta)\theta^\top d\theta\,\nabla f(x) + f(x)\int_{S^{d-1}}\nabla_\theta II_x(\theta,\theta)\theta^\top d\theta\,\nabla P(x) = |S^{d-1}|\big[P(x)R_1(x)\nabla f(x) + f(x)R_1(x)\nabla P(x)\big],
\]
which might or might not lie in $\iota_*T_xM$. Therefore, we have

\[
D_2 = \frac{|S^{d-1}|}{d+4}\Big(Jf(x) + \frac12\nabla f(x)^\top M_2(x)\nabla P(x) - \frac{1}{12}f(x)P(x)N_2(x) + \frac14\big[P(x)\mathrm{tr}(M_2(x)\nabla^2 f(x)) + f(x)\mathrm{tr}(M_2(x)\nabla^2 P(x))\big] + \frac16\big[P(x)R_1(x)\nabla f(x) + f(x)R_1(x)\nabla P(x)\big] + \frac{1}{24}f(x)P(x)R_2(x)\Big)\varepsilon^{d+4}.
\]

As a result, by putting the above together, when expressing $\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ as $[[v_1, v_2]]$, we have
\[
v_1 = J_{p,d}^\top\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] \qquad (2.2.14)
\]
\[
= \frac{|S^{d-1}|}{d+2}\,\frac{J_{p,d}^\top\big[P(x)\iota_*\nabla f(x) + f(x)\iota_*\nabla P(x)\big]}{d}\,\varepsilon^{d+2} + \frac{|S^{d-1}|}{24}J_{p,d}^\top\iota_*\Big(M_1(x)\big[P(x)\nabla f(x) + f(x)\nabla P(x)\big] + f(x)P(x)R_0(x)\Big)\varepsilon^{d+4} + \frac{|S^{d-1}|}{d+4}J_{p,d}^\top\Big(Jf(x) + \frac16\big(P(x)R_1(x)\nabla f(x) + f(x)R_1(x)\nabla P(x)\big) + \frac{1}{24}f(x)P(x)R_2(x)\Big)\varepsilon^{d+4} + O(\varepsilon^{d+5}),
\]

Page 28: by Nan Wu - University of Torontoblog.math.toronto.edu/GraduateBlog/files/2018/02/Nan-Wus-thesis.pdf · Moreover, we discuss the relationship between two common nearest neighbor search

CHAPTER 2. LLE ON CLOSED MANIFOLDS 24

and
\[
v_2 = J_{p,p-d}^\top\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] \qquad (2.2.15)
\]
\[
= \frac{|S^{d-1}|}{d+2}f(x)P(x)J_{p,p-d}^\top N_0(x)\varepsilon^{d+2} + \frac{|S^{d-1}|}{24}f(x)P(x)J_{p,p-d}^\top N_1(x)\varepsilon^{d+4} + \frac{|S^{d-1}|}{d+4}J_{p,p-d}^\top\Big(\frac12\nabla f(x)^\top M_2(x)\nabla P(x) + \frac14\big[P(x)\mathrm{tr}(M_2(x)\nabla^2 f(x)) + f(x)\mathrm{tr}(M_2(x)\nabla^2 P(x))\big] - \frac{1}{12}f(x)P(x)N_2(x)\Big)\varepsilon^{d+4} + \frac{|S^{d-1}|}{6(d+4)}J_{p,p-d}^\top\Big[P(x)R_1(x)\nabla f(x) + f(x)R_1(x)\nabla P(x) + \frac14 f(x)P(x)R_2(x)\Big]\varepsilon^{d+4} + O(\varepsilon^{d+5}).
\]

2.3 Local covariance structure and local PCA

We call
\[
C_x := \mathbb{E}[(X-\iota(x))(X-\iota(x))^\top\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)] \in \mathbb{R}^{p\times p} \qquad (2.3.1)
\]
the local covariance matrix at $\iota(x) \in \iota(M)$, which is the covariance matrix associated with the local PCA [55, 17, 67, 12, 36, 41]. In the proof of LLE under the manifold setup, the eigenstructure of $C_x$ plays an essential role due to its relationship with the barycentric coordinate. Geometrically, for a $d$-dimensional manifold, the first $d$ eigenvectors of $C_x$, corresponding to the largest $d$ eigenvalues, provide an estimated basis for the embedded tangent space $\iota_*T_xM$, and the remaining eigenvectors form an estimated basis for the normal space at $\iota(x)$. To be more precise, a smooth manifold can be well approximated locally by an affine subspace, but this approximation cannot be perfect when the curvature does not vanish. It is well known that the contribution of the curvature is of high order, so for the purpose of fitting the manifold we can ignore it; for example, in [55, 17] the local PCA is applied to estimate the tangent space. In LLE, however, the curvature plays an essential role, and a careful analysis is needed to understand its influence. In Lemma 2.2.5, we show a generalization of the result shown in [55, 17] by expanding $C_x$ up to the third order term for the sake of capturing the behavior of LLE. The third order term is needed for analyzing the regularization step shown in (1.2.5).
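The claim that the top $d$ eigenvectors of $C_x$ estimate $\iota_*T_xM$, with the normal direction suppressed by curvature to higher order, can be illustrated numerically. The sketch below (sample size and $\varepsilon$ are arbitrary choices, not from the thesis) runs local PCA at the north pole of $S^2$:

```python
import numpy as np

# Local PCA at the north pole x of S^2 embedded in R^3: the top d = 2
# eigenvectors of the empirical local covariance matrix approximately span
# the tangent plane z = 0, and the remaining eigenvector is nearly normal.
rng = np.random.default_rng(1)
eps, x = 0.3, np.array([0.0, 0.0, 1.0])
pts = rng.standard_normal((200000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)       # uniform samples on S^2
pts = pts[np.linalg.norm(pts - x, axis=1) < eps]        # keep the ambient eps-ball
diff = pts - x
C = diff.T @ diff / len(diff)                           # empirical local covariance
vals, vecs = np.linalg.eigh(C)                          # ascending eigenvalues
assert abs(vecs[2, 0]) > 0.99        # smallest-eigenvalue eigenvector ~ e_3 (normal)
assert vals[0] < 0.05 * vals[1]      # normal variance suppressed by curvature
```

The tangential eigenvalues scale like $\varepsilon^{d+2}$, while the normal one scales like $\varepsilon^{d+4}$, which is exactly the $O(\varepsilon^2)$ relative gap displayed by the spectral ratio above.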

Proposition 2.3.1. Fix $x \in M$ and suppose Assumption 2.1.2 holds. When $\varepsilon$ is sufficiently small, we have
\[
C_x = \frac{|S^{d-1}|P(x)}{d(d+2)}\varepsilon^{d+2}\left(\begin{bmatrix} I_{d\times d} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} M^{(2)}_{11} & M^{(2)}_{12} \\ M^{(2)}_{21} & M^{(2)}_{22} \end{bmatrix}\varepsilon^2 + \begin{bmatrix} M^{(4)}_{11} & M^{(4)}_{12} \\ M^{(4)}_{21} & M^{(4)}_{22} \end{bmatrix}\varepsilon^4 + O(\varepsilon^6)\right),
\]

where $M^{(2)}_{11}, M^{(4)}_{11} \in S(d)$, $M^{(2)}_{22}, M^{(4)}_{22} \in S(p-d)$, $M^{(2)}_{12}, M^{(4)}_{12} \in \mathbb{R}^{d\times(p-d)}$, $M^{(2)}_{12} = M^{(2)\top}_{21}$, and $M^{(4)}_{12} = M^{(4)\top}_{21}$. These matrices are defined in (2.3.5), (2.3.7), (2.3.9), and (2.3.10). $M^{(2)}_{22}$ depends on $II_x$ but does not depend on the p.d.f. $P$, while $M^{(4)}_{22}$ depends on $II_x$ and its derivatives, the Ricci curvature, and $P$.

Proof. We use the notation $\langle\cdot,\cdot\rangle$ for the inner product and use the notation (2.2.8). The $(m,n)$-th entry of $C_x = \mathbb{E}[(X-\iota(x))(X-\iota(x))^\top\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ is
\[
e_m^\top C_x e_n = \int_{B_\varepsilon(x)} \langle\iota(y)-\iota(x), e_m\rangle\langle\iota(y)-\iota(x), e_n\rangle P(y)\,dV(y). \qquad (2.3.2)
\]


The quantities $\iota\circ\exp_x(\theta t)$, $\tilde\varepsilon$ and $dV$ need to be expanded up to higher order terms. By the change of variable $y = \exp_x(t\theta)$, where $(t,\theta) \in [0,\infty)\times S^{d-1}$ constitutes the polar coordinate, we have the following expressions:
\[
\iota\circ\exp_x(\theta t) - \iota(x) = K_1(\theta)t + K_2(\theta)t^2 + K_3(\theta)t^3 + K_4(\theta)t^4 + K_5(\theta)t^5 + O(t^6),
\]
\[
\tilde\varepsilon = \varepsilon + H_1(\theta)\varepsilon^3 + H_2(\theta)\varepsilon^4 + H_3(\theta)\varepsilon^5 + O(\varepsilon^6),
\]
\[
dV(\exp_x(t\theta)) = t^{d-1} + R_1(\theta)t^{d+1} + R_2(\theta)t^{d+2} + R_3(\theta)t^{d+3} + O(t^{d+4}),
\]
\[
P(\exp_x(t\theta)) = P_0 + P_1(\theta)t + P_2(\theta)t^2 + P_3(\theta)t^3 + P_4(\theta)t^4 + O(t^5),
\]

where, by Lemma 2.2.2,
\[
K_1(\theta) = \iota_*\theta, \quad K_2(\theta) = \frac12 II_x(\theta,\theta), \quad K_3(\theta) = \frac16\nabla_\theta II_x(\theta,\theta), \quad K_4(\theta) = \frac{1}{24}\nabla^2_{\theta\theta}II_x(\theta,\theta), \quad K_5(\theta) = \frac{1}{120}\nabla^3_{\theta\theta\theta}II_x(\theta,\theta);
\]
by Lemma 2.2.3,
\[
H_1(\theta) = \frac{1}{24}\|II_x(\theta,\theta)\|^2, \quad H_2(\theta) = \frac{1}{24}\nabla_\theta II_x(\theta,\theta)\cdot II_x(\theta,\theta), \quad H_3(\theta) = \frac{1}{80}\nabla^2_{\theta\theta}II_x(\theta,\theta)\cdot II_x(\theta,\theta) + \frac{1}{90}\nabla_\theta II_x(\theta,\theta)\cdot\nabla_\theta II_x(\theta,\theta) + \frac{7}{1152}\|II_x(\theta,\theta)\|^4;
\]
by Lemma 2.2.1,
\[
R_1(\theta) = -\frac16\mathrm{Ric}_x(\theta,\theta), \quad R_2(\theta) = -\frac{1}{12}\nabla_\theta\mathrm{Ric}_x(\theta,\theta), \quad R_3(\theta) = -\frac{1}{40}\nabla^2_{\theta\theta}\mathrm{Ric}_x(\theta,\theta) - \frac{1}{180}\sum_{a,b=1}^d R_x(\theta,a,\theta,b)R_x(\theta,a,\theta,b) + \frac{1}{72}\mathrm{Ric}_x(\theta,\theta)^2;
\]
and
\[
P_0 := P(x), \quad P_1(\theta) := \nabla_\theta P(x), \quad P_2(\theta) := \frac12\nabla^2_{\theta,\theta}P(x), \quad P_3(\theta) := \frac16\nabla^3_{\theta,\theta,\theta}P(x), \quad P_4(\theta) := \frac{1}{24}\nabla^4_{\theta,\theta,\theta,\theta}P(x).
\]

Note that $H_1$, $H_3$, $R_1$, $R_3$, $P_0$, $P_2$, and $P_4$ are even functions on $S^{d-1}$ and $H_2$, $R_2$, $P_1$ and $P_3$ are odd functions on $S^{d-1}$. Similarly, for $m,n = 1,\dots,p$, we have
\[
\langle\iota\circ\exp_x(\theta t)-\iota(x), e_m\rangle\langle\iota\circ\exp_x(\theta t)-\iota(x), e_n\rangle = A_{m,n}(\theta)t^2 + B_{m,n}(\theta)t^3 + C_{m,n}(\theta)t^4 + D_{m,n}(\theta)t^5 + E_{m,n}(\theta)t^6 + O(t^7),
\]


where
\[
A_{m,n}(\theta) = \langle K_1(\theta), e_m\rangle\langle K_1(\theta), e_n\rangle,
\]
\[
B_{m,n}(\theta) = \langle K_2(\theta), e_m\rangle\langle K_1(\theta), e_n\rangle + \langle K_1(\theta), e_m\rangle\langle K_2(\theta), e_n\rangle,
\]
\[
C_{m,n}(\theta) = \langle K_2(\theta), e_m\rangle\langle K_2(\theta), e_n\rangle + \langle K_1(\theta), e_m\rangle\langle K_3(\theta), e_n\rangle + \langle K_3(\theta), e_m\rangle\langle K_1(\theta), e_n\rangle,
\]
\[
D_{m,n}(\theta) = \langle K_1(\theta), e_m\rangle\langle K_4(\theta), e_n\rangle + \langle K_2(\theta), e_m\rangle\langle K_3(\theta), e_n\rangle + \langle K_3(\theta), e_m\rangle\langle K_2(\theta), e_n\rangle + \langle K_4(\theta), e_m\rangle\langle K_1(\theta), e_n\rangle,
\]
\[
E_{m,n}(\theta) = \langle K_1(\theta), e_m\rangle\langle K_5(\theta), e_n\rangle + \langle K_2(\theta), e_m\rangle\langle K_4(\theta), e_n\rangle + \langle K_3(\theta), e_m\rangle\langle K_3(\theta), e_n\rangle + \langle K_4(\theta), e_m\rangle\langle K_2(\theta), e_n\rangle + \langle K_5(\theta), e_m\rangle\langle K_1(\theta), e_n\rangle.
\]

Observe that A_{m,n}, C_{m,n} and E_{m,n}, for m,n = 1,\dots,p, are even functions on S^{d-1}, while B_{m,n} and D_{m,n} are odd functions on S^{d-1}. Plugging these expressions into (2.3.2), we have

\[
\begin{aligned}
e_m^\top C_x e_n = \int_{S^{d-1}}\int_0^{\tilde{\varepsilon}} &\big(A_{m,n}(\theta)t^2 + B_{m,n}(\theta)t^3 + C_{m,n}(\theta)t^4 + D_{m,n}(\theta)t^5 + E_{m,n}(\theta)t^6 + O(t^7)\big)\\
&\times\big(P_0 + P_1(\theta)t + P_2(\theta)t^2 + P_3(\theta)t^3 + P_4(\theta)t^4 + O(t^5)\big)\\
&\times\big(t^{d-1} + R_1(\theta)t^{d+1} + R_2(\theta)t^{d+2} + R_3(\theta)t^{d+3} + O(t^{d+4})\big)\,dt\,d\theta.
\end{aligned}
\]

We now collect terms of the same order to simplify the calculation. We focus on those terms with order less than or equal to \varepsilon^{d+6}:

\[
\begin{aligned}
e_m^\top C_x e_n = \int_{S^{d-1}}\int_0^{\tilde{\varepsilon}} \Big[&P_0A_{m,n}(\theta)t^{d+1} + \big(P_0B_{m,n}(\theta)+P_1(\theta)A_{m,n}(\theta)\big)t^{d+2}\\
&+\big(P_0A_{m,n}(\theta)R_1(\theta)+P_0C_{m,n}(\theta)+P_1(\theta)B_{m,n}(\theta)+P_2(\theta)A_{m,n}(\theta)\big)t^{d+3}\\
&+\big(P_0A_{m,n}(\theta)R_2(\theta)+P_0B_{m,n}(\theta)R_1(\theta)+P_1(\theta)A_{m,n}(\theta)R_1(\theta)+P_0D_{m,n}(\theta)\\
&\qquad+P_1(\theta)C_{m,n}(\theta)+P_2(\theta)B_{m,n}(\theta)+P_3(\theta)A_{m,n}(\theta)\big)t^{d+4}\\
&+\big(P_0A_{m,n}(\theta)R_3(\theta)+P_0B_{m,n}(\theta)R_2(\theta)+P_1(\theta)A_{m,n}(\theta)R_2(\theta)+P_0C_{m,n}(\theta)R_1(\theta)\\
&\qquad+P_1(\theta)B_{m,n}(\theta)R_1(\theta)+P_2(\theta)A_{m,n}(\theta)R_1(\theta)+P_0E_{m,n}(\theta)+P_1(\theta)D_{m,n}(\theta)\\
&\qquad+P_2(\theta)C_{m,n}(\theta)+P_3(\theta)B_{m,n}(\theta)+P_4(\theta)A_{m,n}(\theta)\big)t^{d+5}\Big]\,dt\,d\theta + O(\varepsilon^{d+7}).
\end{aligned}
\]

By further expanding the integration in t over [0,\tilde{\varepsilon}] = [0,\ \varepsilon + H_1(\theta)\varepsilon^3 + H_2(\theta)\varepsilon^4 + H_3(\theta)\varepsilon^5 + O(\varepsilon^6)] by Lemma 2.2.4, we have

\[
e_m^\top C_x e_n = \varepsilon^{d+2}Q^{(0)}_{m,n}(x) + \varepsilon^{d+4}Q^{(2)}_{m,n}(x) + \varepsilon^{d+6}Q^{(4)}_{m,n}(x) + O(\varepsilon^{d+7}),
\]


where
\[
Q^{(0)}_{m,n}(x) = \frac{P_0}{d+2}\int_{S^{d-1}} A_{m,n}(\theta)\,d\theta,
\]
\[
Q^{(2)}_{m,n}(x) = \int_{S^{d-1}} \Big( P_0A_{m,n}(\theta)H_1(\theta) + \frac{1}{d+4}\big[P_0A_{m,n}(\theta)R_1(\theta)+P_0C_{m,n}(\theta)+P_1(\theta)B_{m,n}(\theta)+P_2(\theta)A_{m,n}(\theta)\big]\Big)\,d\theta,
\]
and
\[
\begin{aligned}
Q^{(4)}_{m,n}(x) = \int_{S^{d-1}} \Big( &P_0A_{m,n}(\theta)H_3(\theta) + \frac{d+1}{2}P_0A_{m,n}(\theta)H_1^2(\theta) + \big[P_0B_{m,n}(\theta)+P_1(\theta)A_{m,n}(\theta)\big]H_2(\theta)\\
&+\big[P_0A_{m,n}(\theta)R_1(\theta)+P_0C_{m,n}(\theta)+P_1(\theta)B_{m,n}(\theta)+P_2(\theta)A_{m,n}(\theta)\big]H_1(\theta)\\
&+\frac{1}{d+6}\big[P_0A_{m,n}(\theta)R_3(\theta)+P_0B_{m,n}(\theta)R_2(\theta)+P_1(\theta)A_{m,n}(\theta)R_2(\theta)+P_0C_{m,n}(\theta)R_1(\theta)\\
&\qquad+P_1(\theta)B_{m,n}(\theta)R_1(\theta)+P_2(\theta)A_{m,n}(\theta)R_1(\theta)+P_0E_{m,n}(\theta)+P_1(\theta)D_{m,n}(\theta)\\
&\qquad+P_2(\theta)C_{m,n}(\theta)+P_3(\theta)B_{m,n}(\theta)+P_4(\theta)A_{m,n}(\theta)\big]\Big)\,d\theta.
\end{aligned}
\]

To finish the proof, we evaluate Q^{(0)}_{m,n}, Q^{(2)}_{m,n}, and Q^{(4)}_{m,n} for 1 \le m,n \le p. Due to Assumptions 2.1.2 and 2.3.1, e_1,\dots,e_d is an orthonormal basis of \iota_*T_xM and e_{d+1},\dots,e_p is an orthonormal basis of (\iota_*T_xM)^{\perp}. Therefore, we have \langle K_1(\theta),e_i\rangle = \langle \iota_*\theta, e_i\rangle = 0 for i = d+1,\dots,p. Using Lemma 2.2.2 and the symmetry of the sphere, we can evaluate the term of order \varepsilon^{d+2} in C_x. For 1 \le m = n \le d, we have

\[
Q^{(0)}_{m,n} = \frac{P_0}{d+2}\int_{S^{d-1}} A_{m,n}(\theta)\,d\theta = \frac{P(x)}{d+2}\int_{S^{d-1}} |\langle \iota_*\theta, e_1\rangle|^2\,d\theta = \frac{|S^{d-1}|P(x)}{d(d+2)}; \tag{2.3.3}
\]

for other m and n, \int_{S^{d-1}} A_{m,n}(\theta)\,d\theta = 0. Thus, the coefficient of the \varepsilon^{d+2} term is \frac{|S^{d-1}|P(x)}{d(d+2)}\begin{bmatrix} I_{d\times d} & 0\\ 0 & 0\end{bmatrix}. Denote M^{(0)}_{11} = I_{d\times d}\in\mathbb{R}^{d\times d}, M^{(0)}_{12} = 0\in\mathbb{R}^{d\times(p-d)}, M^{(0)}_{21} = M^{(0)\top}_{12}, and M^{(0)}_{22} = 0\in\mathbb{R}^{(p-d)\times(p-d)}.

Next, we evaluate the term of order \varepsilon^{d+4} in C_x. Note that \langle \mathrm{II}_x(\theta,\theta), e_m\rangle = 0 for m = 1,\dots,d, so B_{m,n}(\theta) = 0 for 1 \le m,n \le d. Thus, for 1 \le m,n \le d, by a direct calculation,
\[
\begin{aligned}
Q^{(2)}_{m,n} ={}& \frac{P(x)}{24}\int_{S^{d-1}} \langle \iota_*\theta,e_m\rangle\langle \iota_*\theta,e_n\rangle\|\mathrm{II}_x(\theta,\theta)\|^2\,d\theta\\
&- \frac{P(x)}{6(d+4)}\int_{S^{d-1}} \langle \iota_*\theta,e_m\rangle\langle \iota_*\theta,e_n\rangle\,\mathrm{Ric}_x(\theta,\theta)\,d\theta\\
&- \frac{P(x)}{6(d+4)}\int_{S^{d-1}} \big[\langle \iota_*\theta,e_m\rangle\langle \mathrm{II}_x(e_n,\theta),\mathrm{II}_x(\theta,\theta)\rangle + \langle \iota_*\theta,e_n\rangle\langle \mathrm{II}_x(e_m,\theta),\mathrm{II}_x(\theta,\theta)\rangle\big]\,d\theta\\
&+ \frac{1}{2(d+4)}\int_{S^{d-1}} \nabla^2_{\theta,\theta}P(x)\,\langle \iota_*\theta,e_m\rangle\langle \iota_*\theta,e_n\rangle\,d\theta,
\end{aligned}
\tag{2.3.4}
\]

where we use the fact that \langle \nabla_\theta \mathrm{II}_x(\theta,\theta), e_m\rangle = -\langle \mathrm{II}_x(e_m,\theta), \mathrm{II}_x(\theta,\theta)\rangle when m = 1,\dots,d. By definition, it is clear


that Am,n(θ) = 0 when 1≤ m≤ d and d +1≤ n≤ p. Thus, for 1≤ m≤ d and d +1≤ n≤ p, we have

\[
\begin{aligned}
Q^{(2)}_{m,n} ={}& \frac{P(x)}{6(d+4)}\int_{S^{d-1}} \langle \iota_*\theta,e_m\rangle\langle \nabla_\theta \mathrm{II}_x(\theta,\theta), e_n\rangle\,d\theta\\
&+ \frac{1}{2(d+4)}\int_{S^{d-1}} \nabla_\theta P(x)\,\langle \iota_*\theta,e_m\rangle\langle \mathrm{II}_x(\theta,\theta), e_n\rangle\,d\theta.
\end{aligned}
\tag{2.3.5}
\]

By definition, for d+1 \le m,n \le p, A_{m,n}(\theta) = B_{m,n}(\theta) = 0, and hence
\[
Q^{(2)}_{m,n} = \frac{P(x)}{4(d+4)}\int_{S^{d-1}} \langle \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \mathrm{II}_x(\theta,\theta),e_n\rangle\,d\theta. \tag{2.3.6}
\]

Finally, we evaluate the \varepsilon^{d+6} term. Again, recall that when d+1 \le m,n \le p, A_{m,n}(\theta) = 0 and B_{m,n}(\theta) = 0. Therefore Q^{(4)}_{m,n}, where d+1 \le m,n \le p, consists of only
\[
Q^{(4)}_{m,n} = \int_{S^{d-1}} \Big( P_0C_{m,n}(\theta)H_1(\theta) + \frac{1}{d+6}\big(P_0C_{m,n}(\theta)R_1(\theta) + P_0E_{m,n}(\theta) + P_1(\theta)D_{m,n}(\theta) + P_2(\theta)C_{m,n}(\theta)\big)\Big)\,d\theta.
\]

Based on Lemmas 2.2.1, 2.2.2 and 2.2.3, for d+1 \le m,n \le p, we have
\[
\begin{aligned}
Q^{(4)}_{m,n} ={}& \frac{P(x)}{96}\int_{S^{d-1}} \langle \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \mathrm{II}_x(\theta,\theta),e_n\rangle\|\mathrm{II}_x(\theta,\theta)\|^2\,d\theta\\
&- \frac{P(x)}{24(d+6)}\int_{S^{d-1}} \langle \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \mathrm{II}_x(\theta,\theta),e_n\rangle\,\mathrm{Ric}_x(\theta,\theta)\,d\theta\\
&+ \frac{P(x)}{48(d+6)}\int_{S^{d-1}} \big[\langle \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \nabla^2_{\theta\theta}\mathrm{II}_x(\theta,\theta),e_n\rangle + \langle \nabla^2_{\theta\theta}\mathrm{II}_x(\theta,\theta),e_m\rangle\langle \mathrm{II}_x(\theta,\theta),e_n\rangle\big]\,d\theta\\
&+ \frac{P(x)}{36(d+6)}\int_{S^{d-1}} \langle \nabla_\theta \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \nabla_\theta \mathrm{II}_x(\theta,\theta),e_n\rangle\,d\theta\\
&+ \frac{1}{12(d+6)}\int_{S^{d-1}} \nabla_\theta P(x)\big(\langle \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \nabla_\theta \mathrm{II}_x(\theta,\theta),e_n\rangle + \langle \nabla_\theta \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \mathrm{II}_x(\theta,\theta),e_n\rangle\big)\,d\theta\\
&+ \frac{1}{4(d+6)}\int_{S^{d-1}} \nabla^2_{\theta,\theta}P(x)\,\langle \mathrm{II}_x(\theta,\theta),e_m\rangle\langle \mathrm{II}_x(\theta,\theta),e_n\rangle\,d\theta.
\end{aligned}
\tag{2.3.7}
\]

Since we only need to evaluate Q^{(4)}_{m,n}, where d+1 \le m,n \le p, for the analysis of LLE, we omit the calculation for the other pairs of m,n. We thus conclude that

\[
C_x = \varepsilon^{d+2}\frac{|S^{d-1}|P(x)}{d(d+2)}\left(\begin{bmatrix} I_{d\times d} & 0\\ 0 & 0\end{bmatrix} + \begin{bmatrix} M^{(2)}_{11} & M^{(2)}_{12}\\ M^{(2)}_{21} & M^{(2)}_{22}\end{bmatrix}\varepsilon^2 + \begin{bmatrix} M^{(4)}_{11} & M^{(4)}_{12}\\ M^{(4)}_{21} & M^{(4)}_{22}\end{bmatrix}\varepsilon^4 + O(\varepsilon^6)\right), \tag{2.3.8}
\]
where M^{(j)}_{11}\in\mathbb{R}^{d\times d} is defined as
\[
e_m^\top M^{(j)}_{11}e_n = \frac{d(d+2)}{|S^{d-1}|P(x)}\,Q^{(j)}_{m,n}, \tag{2.3.9}
\]


for m,n = 1,\dots,d and j = 2,4; M^{(j)}_{22}\in\mathbb{R}^{(p-d)\times(p-d)} is defined as
\[
e_m^\top M^{(j)}_{22}e_n = \frac{d(d+2)}{|S^{d-1}|P(x)}\,Q^{(j)}_{m+d,n+d}, \tag{2.3.10}
\]
for m,n = 1,\dots,p-d and j = 2,4; and M^{(2)}_{12}\in\mathbb{R}^{d\times(p-d)} is defined as
\[
\begin{aligned}
e_m^\top M^{(2)}_{12}e_n &= \frac{d(d+2)}{|S^{d-1}|P(x)}\,Q^{(2)}_{m,n+d}\\
&= \frac{d(d+2)}{6(d+4)|S^{d-1}|}\int_{S^{d-1}} \langle \iota_*\theta,e_m\rangle\langle \nabla_\theta \mathrm{II}_x(\theta,\theta),e_n\rangle\,d\theta\\
&\quad+ \frac{d(d+2)}{2(d+4)|S^{d-1}|P(x)}\int_{S^{d-1}} \nabla_\theta P(x)\,\langle \iota_*\theta,e_m\rangle\langle \mathrm{II}_x(\theta,\theta),e_n\rangle\,d\theta
\end{aligned}
\tag{2.3.11}
\]
for m = 1,\dots,d and n = 1,\dots,p-d, and M^{(2)}_{21} = M^{(2)\top}_{12}.

Since P is bounded from below by P_m, when \varepsilon is sufficiently small, the \varepsilon^{d+2} term is dominant and the largest d eigenvalues of C_x are of order \varepsilon^{d+2}. The other eigenvalues of C_x are of higher order and depend on the \varepsilon^{d+4} term or even the \varepsilon^{d+6} term. The behavior of the eigenvectors is more complicated, due to the possible multiplicity of the corresponding eigenvalues.
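This order structure can be checked numerically. The following sketch is our own illustration, not part of the thesis: the choice of the unit sphere S^2 \subset \mathbb{R}^3 (so d = 2, p = 3), the sample size, and the base point are all hypothetical, and the truncated ball average is a Monte Carlo stand-in for the expectation defining C_x.

```python
import numpy as np

# Monte Carlo sketch: uniform samples on S^2 (d = 2, p = 3); the local
# covariance C_x = E[(X - x)(X - x)^T chi_{B_eps(x)}(X)] should have two
# eigenvalues of order eps^(d+2) = eps^4 and one of higher order eps^6.
rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # uniform points on S^2
x = np.array([0.0, 0.0, 1.0])                   # base point on the sphere

def local_cov(eps):
    # restrict to the ambient ball B_eps(x) and average the outer products
    D = X[np.linalg.norm(X - x, axis=1) < eps] - x
    return D.T @ D / len(X)

for eps in (0.4, 0.2):
    ev = np.sort(np.linalg.eigvalsh(local_cov(eps)))[::-1]
    print(eps, ev[0] / eps**4, ev[2] / eps**6)   # roughly constant ratios
```

Halving \varepsilon should scale the two tangent eigenvalues by about 2^4 and leave the normal eigenvalue much smaller, consistent with (2.3.8).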

To simplify the statement of the eigen-structure, following Assumption 2.1.2, we make one more assumption.

Assumption 2.3.1. Following Assumption 2.1.2, without loss of generality, we assume that the manifold is translated and rotated properly, so that e_{d+1},\dots,e_p "diagonalize" the second fundamental form; that is, M^{(2)}_{22} in Proposition 2.3.1 is diagonalized to \Lambda^{(2)}_2 = \mathrm{diag}(\lambda^{(2)}_{d+1},\dots,\lambda^{(2)}_p).

The eigen-structure of the local covariance matrix is summarized in the following Proposition.

Proposition 2.3.2. Fix x \in M. Suppose \varepsilon is sufficiently small and Assumptions 2.1.2 and 2.3.1 hold. Let r = \mathrm{rank}(C_x). The eigen-decomposition C_x = U_x\Lambda_xU_x^\top, where U_x \in O(p) and \Lambda_x \in \mathbb{R}^{p\times p} is a diagonal matrix, is summarized below.

Case 0: When r = d, we have
\[
C_x = \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\left(\begin{bmatrix} I_{d\times d} & 0\\ 0 & 0\end{bmatrix} + \begin{bmatrix} O(\varepsilon^2) & 0\\ 0 & 0\end{bmatrix}\right)
\]
and
\[
\Lambda_x = \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\begin{bmatrix} I_{d\times d}+O(\varepsilon^2) & 0\\ 0 & 0\end{bmatrix} + O(\varepsilon^4),\qquad
U_x(\varepsilon) = U_x(0)(I_{p\times p}+\varepsilon^2 S)+O(\varepsilon^4)\in O(p),
\]
where U_x(0) = \begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix}, X_1 \in O(d), X_2 \in O(p-d), and S \in o(p).

Case 1: When all diagonal entries of \Lambda^{(2)}_2 are nonzero, we have:


\[
\begin{aligned}
\Lambda_x &= \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\begin{bmatrix} I_{d\times d}+\varepsilon^2\Lambda^{(2)}_1+\varepsilon^4\Lambda^{(4)}_1 & 0\\ 0 & \varepsilon^2\Lambda^{(2)}_2+\varepsilon^4\Lambda^{(4)}_2\end{bmatrix} + O(\varepsilon^6),\\
U_x &= U_x(0)(I_{p\times p}+\varepsilon^2 S)+O(\varepsilon^4)\in O(p),
\end{aligned}
\]
where \Lambda^{(2)}_1,\Lambda^{(4)}_1\in\mathbb{R}^{d\times d} and \Lambda^{(4)}_2\in\mathbb{R}^{(p-d)\times(p-d)} are diagonal matrices with diagonal entries of order 1, U_x(0) = \begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix}\in O(p), X_1\in O(d), X_2\in O(p-d), and S\in o(p). The explicit expressions of these matrices are listed in (2.3.14)-(2.3.21).

Case 2: When l diagonal entries of \Lambda^{(2)}_2 are 0, where 1 \le l \le p-d, we have the following eigen-decomposition under some conditions. Divide C_x into blocks corresponding to the multiplicity l as
\[
C_x = \frac{|S^{d-1}|P(x)}{d(d+2)}\varepsilon^{d+2}\left(\begin{bmatrix} I_{d\times d} & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix} + \begin{bmatrix} M^{(2)}_{11} & M^{(2)}_{12,1} & M^{(2)}_{12,2}\\ M^{(2)}_{21,1} & \Lambda^{(2)}_{2,1} & 0\\ M^{(2)}_{21,2} & 0 & 0\end{bmatrix}\varepsilon^2 + \begin{bmatrix} M^{(4)}_{11} & M^{(4)}_{12,1} & M^{(4)}_{12,2}\\ M^{(4)}_{21,1} & M^{(4)}_{22,11} & M^{(4)}_{22,12}\\ M^{(4)}_{21,2} & M^{(4)}_{22,21} & M^{(4)}_{22,22}\end{bmatrix}\varepsilon^4 + O(\varepsilon^6)\right), \tag{2.3.12}
\]

where M^{(2)}_{12,1}, M^{(4)}_{12,1}\in\mathbb{R}^{d\times(p-d-l)}, M^{(2)}_{12,2}, M^{(4)}_{12,2}\in\mathbb{R}^{d\times l}, M^{(2)}_{12,1} = M^{(2)\top}_{21,1}, M^{(4)}_{12,1} = M^{(4)\top}_{21,1}, M^{(2)}_{12,2} = M^{(2)\top}_{21,2}, M^{(4)}_{12,2} = M^{(4)\top}_{21,2}, M^{(4)}_{22,11}\in S(p-d-l), M^{(4)}_{22,22}\in S(l), M^{(4)}_{22,12}\in\mathbb{R}^{(p-d-l)\times l}, and M^{(4)}_{22,21} = M^{(4)\top}_{22,12}.

Denote the eigen-decomposition of the matrix M^{(4)}_{22,22}-2M^{(2)}_{21,2}M^{(2)}_{12,2} as
\[
M^{(4)}_{22,22}-2M^{(2)}_{21,2}M^{(2)}_{12,2} = U_{2,2}\Lambda^{(4)}_{2,2}U_{2,2}^\top, \tag{2.3.13}
\]
where U_{2,2}\in O(l) and \Lambda^{(4)}_{2,2} = \mathrm{diag}[\lambda^{(4)}_{p-l+1},\dots,\lambda^{(4)}_p] is a diagonal matrix. If we further assume that all diagonal entries of \Lambda^{(4)}_{2,2} are nonzero, we have

\[
\begin{aligned}
\Lambda_x &= \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\begin{bmatrix} I_{d\times d}+\varepsilon^2\Lambda^{(2)}_1+\varepsilon^4\Lambda^{(4)}_1 & 0 & 0\\ 0 & \varepsilon^2\Lambda^{(2)}_{2,1}+\varepsilon^4\Lambda^{(4)}_{2,1} & 0\\ 0 & 0 & \varepsilon^4\Lambda^{(4)}_{2,2}\end{bmatrix}+O(\varepsilon^6),\\
U_x &= U_x(0)(I_{p\times p}+\varepsilon^2 S)+O(\varepsilon^4)\in O(p),
\end{aligned}
\]
where \Lambda^{(4)}_1 and \Lambda^{(4)}_{2,1} are diagonal matrices, U_x(0) = \begin{bmatrix} X_1 & 0 & 0\\ 0 & X_{2,1} & 0\\ 0 & 0 & X_{2,2}\end{bmatrix}\in O(p), X_1\in O(d), X_{2,1}\in O(p-d-l), X_{2,2}\in O(l), and S\in o(p). The explicit formulas for these matrices are listed in (2.3.22)-(2.3.24).

Proof. We now evaluate the eigenvalues and eigenvectors of C_x shown in (2.3.8) based on the technique introduced in Section 1.4.


For Case 0, when \varepsilon is sufficiently small, we have
\[
C_x = \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\left(\begin{bmatrix} I_{d\times d} & 0\\ 0 & 0\end{bmatrix} + \begin{bmatrix} O(\varepsilon^2) & 0\\ 0 & 0\end{bmatrix}\right),
\]
and hence the d non-zero eigenvalues satisfy
\[
\Lambda_x = \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\begin{bmatrix} I_{d\times d}+O(\varepsilon^2) & 0\\ 0 & 0\end{bmatrix}+O(\varepsilon^4),\qquad
U_x(\varepsilon) = U_x(0)(I_{p\times p}+\varepsilon^2 S)+O(\varepsilon^4)\in O(p),
\]
where U_x(0) = \begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix}, X_1\in O(d), X_2\in O(p-d), and S\in o(p). Note that in this case S, X_1 and X_2 cannot be uniquely determined by the order \varepsilon^{d+2} part of C_x.

For Case 1, when \varepsilon is sufficiently small, we have
\[
\begin{aligned}
\Lambda_x &= \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\begin{bmatrix} I_{d\times d}+\varepsilon^2\Lambda^{(2)}_1+\varepsilon^4\Lambda^{(4)}_1 & 0\\ 0 & \varepsilon^2\Lambda^{(2)}_2+\varepsilon^4\Lambda^{(4)}_2\end{bmatrix} + O(\varepsilon^6),\\
U_x(\varepsilon) &= U_x(0)(I_{p\times p}+\varepsilon^2 S)+O(\varepsilon^4)\in O(p),
\end{aligned}
\]
where U_x(0) = \begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix}, X_1\in O(d), X_2\in O(p-d), and
\[
S := \begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix}\in o(p). \tag{2.3.14}
\]

Since M^{(2)}_{22} is a diagonal matrix under Assumptions 2.1.2 and 2.3.1, (1.4.4) implies that M^{(2)}_{22} = \Lambda^{(2)}_2. From (1.4.7) and (1.4.8), we have
\[
S_{12} = -X_1^\top M^{(2)}_{12}X_2, \tag{2.3.15}
\]
\[
S_{21} = X_2^\top M^{(2)}_{21}X_1. \tag{2.3.16}
\]
If all eigenvalues of M^{(2)}_{11} are distinct, then X_1 can be uniquely determined; if all eigenvalues of M^{(2)}_{22} are distinct, since it is a diagonal matrix, X_2 is the identity matrix. Moreover, \Lambda^{(4)}_1, \Lambda^{(4)}_2 and S can be uniquely determined:

\[
\Lambda^{(4)}_1 = \mathrm{diag}\big(X_1^\top M^{(4)}_{11}X_1 + 2X_1^\top M^{(2)}_{12}M^{(2)}_{21}X_1\big), \tag{2.3.17}
\]
\[
\Lambda^{(4)}_2 = \mathrm{diag}\big(M^{(4)}_{22} - 2M^{(2)}_{21}M^{(2)}_{12}\big), \tag{2.3.18}
\]
\[
(S_{11})_{m,n} = \frac{-1}{(\Lambda^{(2)}_1)_{m,m}-(\Lambda^{(2)}_1)_{n,n}}\, e_m^\top\big(X_1^\top M^{(4)}_{11}X_1 + 2X_1^\top M^{(2)}_{12}M^{(2)}_{21}X_1\big)e_n, \tag{2.3.19}
\]
\[
(S_{22})_{i,j} = \frac{-1}{(\Lambda^{(2)}_2)_{i,i}-(\Lambda^{(2)}_2)_{j,j}}\, e_i^\top\big(M^{(4)}_{22} - 2M^{(2)}_{21}M^{(2)}_{12}\big)e_j, \tag{2.3.20}
\]

where 1 \le m \ne n \le d and 1 \le i \ne j \le p-d. On the other hand, if M^{(2)}_{22} has q+t distinct eigenvalues, where


q,t \ge 0, and q of the eigenvalues are simple, then based on the perturbation theory in the introduction, we have

\[
X_2 = \begin{bmatrix} I_{q\times q} & 0 & \cdots & 0\\ 0 & X_2^1 & \cdots & 0\\ 0 & 0 & \ddots & 0\\ 0 & 0 & \cdots & X_2^t\end{bmatrix} \tag{2.3.21}
\]

since M^{(2)}_{22} is diagonal under Assumption 2.3.1. Each of X_2^1,\dots,X_2^t corresponds to a repeated eigenvalue, and each of them is an orthogonal matrix whose dimension depends on the multiplicity of the repeated eigenvalue. We mention that they may be uniquely determined by higher order terms in C_x, as described in the perturbation theory in the introduction.

For Case 2, when ε is sufficiently small, by dividing all matrices into blocks of the same size, we have

\[
\begin{aligned}
\Lambda_x &= \frac{|S^{d-1}|P(x)\varepsilon^{d+2}}{d(d+2)}\begin{bmatrix} I_{d\times d}+\varepsilon^2\Lambda^{(2)}_1+\varepsilon^4\Lambda^{(4)}_1 & 0 & 0\\ 0 & \varepsilon^2\Lambda^{(2)}_{2,1}+\varepsilon^4\Lambda^{(4)}_{2,1} & 0\\ 0 & 0 & \varepsilon^4\Lambda^{(4)}_{2,2}\end{bmatrix}+O(\varepsilon^6),\\
U_x(\varepsilon) &= U_x(0)(I_{p\times p}+\varepsilon^2 S)+O(\varepsilon^4)\in O(p),
\end{aligned}
\tag{2.3.22}
\]
\[
U_x(0) = \begin{bmatrix} X_1 & 0 & 0\\ 0 & X_{2,1} & 0\\ 0 & 0 & X_{2,2}\end{bmatrix}\in O(p),\qquad
S = \begin{bmatrix} S_{11} & S_{12,1} & S_{12,2}\\ S_{21,1} & S_{22,11} & S_{22,12}\\ S_{21,2} & S_{22,21} & S_{22,22}\end{bmatrix}\in o(p),
\]

by Assumptions 2.1.2 and 2.3.1, where \Lambda^{(2)}_1 is the eigenvalue matrix of M^{(2)}_{11}, the diagonal entries of \Lambda^{(2)}_{2,1} are nonzero, X_1\in O(d), X_{2,1}\in O(p-d-l) and X_{2,2}\in O(l). By (1.4.7) and (1.4.8), we have
\[
\begin{aligned}
S_{12,1} &= -X_1^\top M^{(2)}_{12,1}X_{2,1}, & S_{21,1} &= X_{2,1}^\top M^{(2)}_{21,1}X_1,\\
S_{12,2} &= -X_1^\top M^{(2)}_{12,2}X_{2,2}, & S_{21,2} &= X_{2,2}^\top M^{(2)}_{21,2}X_1.
\end{aligned}
\tag{2.3.23}
\]

If the eigenvalues of M^{(2)}_{11} are distinct, then X_1 is the corresponding orthonormal eigenvector matrix. \Lambda^{(2)}_{2,2} = 0 by the assumption of Case 2. Recall from (2.3.13) that \Lambda^{(4)}_{2,2} is the eigenvalue matrix of M^{(4)}_{22,22}-2M^{(2)}_{21,2}M^{(2)}_{12,2}. If \Lambda^{(4)}_{2,2} has distinct diagonal entries, then X_{2,2} is the corresponding orthonormal eigenvector matrix. Recall that if \Lambda^{(2)}_1, \Lambda^{(2)}_{2,1}


and \Lambda^{(4)}_{2,2} each has distinct diagonal entries, then U_x(0) and S can be determined uniquely, and we have

\[
\begin{aligned}
S_{22,12} &= -(\Lambda^{(2)}_{2,1})^{-1}\Big(\tfrac{1}{2}M^{(4)}_{22,12}X_{2,2} + M^{(2)}_{21,1}X_1S_{12,2}\Big),\\
S_{22,21} &= X_{2,2}^\top\Big(\tfrac{1}{2}M^{(4)}_{22,21} + M^{(2)}_{21,2}X_1S_{12,1}\Big)(\Lambda^{(2)}_{2,1})^{-1},\\
\Lambda^{(4)}_1 &= \mathrm{diag}\big[X_1^\top\big(M^{(4)}_{11}X_1 + 2M^{(2)}_{12,1}S_{21,1} + 2M^{(2)}_{12,2}X_{2,2}S_{21,2}\big)\big],\\
\Lambda^{(4)}_{2,1} &= \mathrm{diag}\big[M^{(4)}_{22,11} + 2M^{(2)}_{21,1}X_1S_{12,1}\big],\\
(S_{11})_{m,n} &= \frac{-1}{(\Lambda^{(2)}_1)_{m,m}-(\Lambda^{(2)}_1)_{n,n}}\, e_m^\top\Big[X_1^\top\Big(\tfrac{1}{2}M^{(4)}_{11}X_1 + M^{(2)}_{12,1}S_{21,1} + M^{(2)}_{12,2}X_{2,2}S_{21,2}\Big)\Big]e_n,
\end{aligned}
\]

where 1 \le m \ne n \le d, and

\[
(S_{22,11})_{m,n} = \frac{-1}{(\Lambda^{(2)}_{2,1})_{m,m}-(\Lambda^{(2)}_{2,1})_{n,n}}\, e_m^\top\Big[\tfrac{1}{2}M^{(4)}_{22,11} + M^{(2)}_{21,1}X_1S_{12,1}\Big]e_n, \tag{2.3.24}
\]

where d+1 \le m \ne n \le p-l. However, we need higher order terms of C_x to solve for S_{22,22}, following the same steps as in the evaluation of (1.4.41); we skip the details here. Finally, if the diagonal entries of \Lambda^{(2)}_{2,1} are distinct, then X_{2,1} is the identity matrix. If \Lambda^{(2)}_{2,1} or \Lambda^{(4)}_{2,2} contains repeated eigenvalues, then it can be described as in (2.3.21). We also skip the details here.

In general, the eigen-structure of C_x may be more complicated than the cases considered in Proposition 2.3.2. In such general situations, we could apply the same perturbation theory to evaluate the eigenvalues. Since the proof is similar but notationally heavy, and it does not bring further insight into LLE, we skip the details of these more general situations.

2.4 Integral kernel of LLE and variance analysis on closed manifolds

We now study the asymptotic behavior of LLE. Under the manifold setup, from now on, we fix
\[
c = n\varepsilon^{d+\rho}, \tag{2.4.1}
\]
and we call \rho the regularization order. By (1.3.13), for \boldsymbol{v}\in\mathbb{R}^N, we have
\[
\sum_{j=1}^N w_k(j)\boldsymbol{v}(j) = \frac{\mathbf{1}_N^\top\boldsymbol{v} - \mathbf{1}_N^\top G_n^\top I_{n\varepsilon^{d+\rho}}(G_nG_n^\top)G_n\boldsymbol{v}}{N - \mathbf{1}_N^\top G_n^\top I_{n\varepsilon^{d+\rho}}(G_nG_n^\top)G_n\mathbf{1}_N}. \tag{2.4.2}
\]

Before proceeding, we provide a geometric interpretation of this formula. By the eigen-decomposition G_nG_n^\top = U_n\Lambda_nU_n^\top and the fact that I_{n\varepsilon^{d+\rho}}(G_nG_n^\top) = U_nJ_{p,r_n}J_{p,r_n}^\top(\Lambda_n+n\varepsilon^{d+\rho}I_{p\times p})^{-1}J_{p,r_n}J_{p,r_n}^\top U_n^\top = U_nI_{n\varepsilon^{d+\rho}}(\Lambda_n)U_n^\top by the definition of I_\rho in (1.3.9), we have \mathbf{1}_N^\top G_n^\top I_{n\varepsilon^{d+\rho}}(G_nG_n^\top)G_n\boldsymbol{v} = \mathbf{1}_N^\top G_n^\top U_nI_{n\varepsilon^{d+\rho}}(\Lambda_n)U_n^\top G_n\boldsymbol{v} and \mathbf{1}_N^\top G_n^\top I_{n\varepsilon^{d+\rho}}(G_nG_n^\top)G_n\mathbf{1}_N = \mathbf{1}_N^\top G_n^\top U_nI_{n\varepsilon^{d+\rho}}(\Lambda_n)U_n^\top G_n\mathbf{1}_N. By the discussion of the local PCA in Section 2.3, U_n^\top G_n means evaluating the coordinates of all neighboring points of \iota(x_k) in the basis composed of the column vectors of U_n, U_n^\top G_n\mathbf{1}_N means the mean coordinate of all neighboring points, and I_{n\varepsilon^{d+\rho}}(\Lambda_n) means a regularized weighting of the coordinates that helps to enhance the nonlinear geometry of the point cloud, so that G_n^\top U_nI_{n\varepsilon^{d+\rho}}(\Lambda_n)U_n^\top G_n is a quadratic form of the averaged coordinates of all neighboring points. We could thus view the "kernel" part, \mathbf{1}_N^\top G_n^\top U_nI_{n\varepsilon^{d+\rho}}(\Lambda_n)U_n^\top G_n, as preserving the geometry of the point cloud, by evaluating how strongly the weighted coordinates of the neighboring points are related to the mean coordinate of all neighboring points through the inner product.
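To make this interpretation concrete, here is a minimal numerical sketch of the weight formula behind (2.4.2). It is our own illustration: the data are hypothetical, and the truncated regularized pseudo-inverse I_c is simplified to a plain regularized inverse (A + cI)^{-1}, which agrees with it on the full-rank part.

```python
import numpy as np

# Sketch of the regularized-barycentric weights of (2.4.2); the pseudo-inverse
# I_c(G G^T) is simplified here to (G G^T + c I)^{-1}, with c = n eps^(d+rho).
def lle_weights(x, neighbors, c):
    G = (neighbors - x).T                  # p x N matrix of centered neighbors
    N = G.shape[1]
    ones = np.ones(N)
    Ic = np.linalg.inv(G @ G.T + c * np.eye(G.shape[0]))
    w = ones - G.T @ (Ic @ (G @ ones))     # numerator weights of (2.4.2)
    return w / w.sum()                     # w.sum() equals the denominator

rng = np.random.default_rng(1)
x = np.zeros(3)
nbrs = x + 0.1 * rng.normal(size=(10, 3))  # hypothetical neighbors of x
w = lle_weights(x, nbrs, c=1e-3)
print(w.sum())                             # the weights sum to one
```

By construction the normalized weights sum to one, which is the barycentric constraint of LLE.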

Asymptotically, by the law of large numbers, when conditioning on \iota(x_k),
\[
\frac{1}{n}G_n\mathbf{1}_N = \frac{1}{n}\sum_{j=1}^N(\iota(x_{k,j})-\iota(x_k)) \xrightarrow{n\to\infty} \mathbb{E}\big[(X-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))}(X)\big],
\]
and we "expect" the following to hold:
\[
nI_{n\varepsilon^{d+\rho}}(G_nG_n^\top) = I_{\varepsilon^{d+\rho}}\Big(\frac{1}{n}G_nG_n^\top\Big) \xrightarrow{n\to\infty} I_{\varepsilon^{d+\rho}}(C_{x_k}).
\]
Also, we would "expect" to have
\[
nI_{n\varepsilon^{d+\rho}}(G_nG_n^\top)\,\frac{1}{n}G_n\mathbf{1}_N \xrightarrow{n\to\infty} I_{\varepsilon^{d+\rho}}(C_{x_k})\big[\mathbb{E}(X-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}\big] =: T_{\iota(x_k)}.
\]

Hence, for f\in C(\iota(M)), for \iota(x_k) and its corresponding neighborhood \mathcal{N}_{\iota(x_k)}, we would "expect" to have
\[
\begin{aligned}
\sum_{j=1}^N w_k(j)f(x_{k,j}) &\xrightarrow{n\to\infty} \frac{\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)f(X)] - T_{\iota(x_k)}^\top\mathbb{E}[(X-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)f(X)]}{\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)] - T_{\iota(x_k)}^\top\mathbb{E}[(X-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]}\\
&= \frac{\mathbb{E}[f(X)(1-T_{\iota(x_k)}^\top(X-\iota(x_k)))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]}{\mathbb{E}[(1-T_{\iota(x_k)}^\top(X-\iota(x_k)))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]}.
\end{aligned}
\tag{2.4.3}
\]

However, it is not possible to directly see how the convergence happens, due to the dependence among the different terms and on how the regularized pseudo-inverse converges. The dependence on the regularization order is also not clear. A careful theoretical analysis is needed.

To proceed with the proof, we need to discuss a critical observation. Note that the term C_x might be ill-conditioned for the pseudo-inverse procedure, and the regularized pseudo-inverse depends on how the regularization penalty \rho is chosen. As we will see later, the choice of \rho is critical for the outcome. The ill-conditionedness depends on the manifold geometry and can be complicated. In this thesis we focus on the following three cases.

Condition 2.4.1. Follow the notations used in Proposition 2.3.2. For the local covariance matrix C_x with rank r, without loss of generality, we consider the following three cases:

• Case 0: r = d;

• Case 1: r = p > d, and \lambda^{(2)}_{d+1},\dots,\lambda^{(2)}_p are nonzero;

• Case 2: r = p > d, \lambda^{(2)}_{d+1},\dots,\lambda^{(2)}_{p-l} are nonzero, where 1 \le l \le p-d, \lambda^{(2)}_{p-l+1} = \dots = \lambda^{(2)}_p = 0, and \lambda^{(4)}_{p-l+1},\dots,\lambda^{(4)}_p are nonzero.

At first glance, it seems restrictive to assume that r = p in Cases 1 and 2 when r > d. However, it is general enough in the following sense. In Cases 1 and 2, if C_x is degenerate, that is, d < r < p, then locally the manifold only occupies a lower dimensional affine subspace. Therefore, the sampled data are constrained to this affine subspace, and hence the rank of the local sample covariance matrix satisfies r_n \le r. As a result, the analysis can be carried out on this affine subspace without changing the outcome. More general situations could be studied by the same analysis techniques shown below, but they would not provide more insight into our understanding of the algorithm and would introduce additional notational burdens. For f\in C(\iota(M)), define

\[
Qf(x) := \frac{\mathbb{E}[f(X)(1-T_{\iota(x)}^\top(X-\iota(x)))\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}(X)]}{\mathbb{E}[(1-T_{\iota(x)}^\top(X-\iota(x)))\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}(X)]}. \tag{2.4.4}
\]

The following theorem summarizes the relationship between LLE and Q f under these three cases.

Theorem 2.4.1. Fix f\in C(\iota(M)). Suppose the regularization order is \rho\in\mathbb{R} and \varepsilon = \varepsilon(n), so that \frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\to 0 and \varepsilon\to 0 as n\to\infty. With probability greater than 1-n^{-2}, for all x_k\in\mathcal{X}, under the different conditions listed in Condition 2.4.1, we have:
\[
\sum_{j=1}^N w_k(j)f(x_{k,j}) - f(x_k) =
\begin{cases}
Qf(x_k) - f(x_k) + O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big) & \text{in Case 0,}\\[8pt]
Qf(x_k) - f(x_k) + O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+[(-1)\vee(0\wedge(\rho-4))]}}\Big) & \text{in Cases 1 and 2.}
\end{cases}
\tag{2.4.5}
\]
In particular, when \rho \le 3, with probability greater than 1-n^{-2}, for all x_k\in\mathcal{X} and for all cases listed in Condition 2.4.1, we have:
\[
\sum_{j=1}^N w_k(j)f(x_{k,j}) - f(x_k) = Qf(x_k) - f(x_k) + O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big). \tag{2.4.6}
\]

Note that the convergence rate in Case 0 is fast no matter which regularization order \rho is chosen, while the convergence rate in Cases 1 and 2 depends on \rho. This theorem echoes several practical findings that the choice of regularization is critical to the performance of LLE, and it suggests that we should choose \rho = 3.

Remark 2.4.1. We should compare the convergence rate of LLE with that of the DM. The convergence rate in Case 0 is the same as that of the Eigenmap or the DM without any normalization [57], while the convergence rate in Cases 1 and 2 is the same as that of the \alpha-normalized DM [18] when \rho \ge 4 [57]. Note that the main convergence rate bottleneck for the \alpha-normalized DM comes from the probability density function estimation, while the convergence bottleneck for LLE is the regularized pseudo-inverse.

Theorem 2.4.1 describes how LLE could be viewed as a "diffusion process" on the dataset. Note that
\[
\mathbb{E}[f(X)(1-T_{\iota(x_k)}^\top(X-\iota(x_k)))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]
= \int_M \big(1-T_{\iota(x_k)}^\top(\iota(y)-\iota(x_k))\big)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(y))f(\iota(y))P(y)\,dV(y). \tag{2.4.7}
\]
Therefore, we can view w_k as a "zero-one" kernel supported on B^{\mathbb{R}^p}_\varepsilon(x_k)\cap\iota(M), with a correction depending on T_{\iota(x_k)}. Note that after the correction, the whole operator may no longer be a diffusion.

Corollary 2.4.1. The integral kernel associated with LLE, when the regularization order is \rho\in\mathbb{R}, is
\[
K_{\mathrm{LLE}}(x,y) = \big[1-T_{\iota(x)}^\top(\iota(y)-\iota(x))\big]\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(M)}(\iota(y)), \tag{2.4.8}
\]
where x,y\in M and
\[
T_{\iota(x)} := I_{\varepsilon^{d+\rho}}(C_x)\big[\mathbb{E}(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}\big]\in\mathbb{R}^p. \tag{2.4.9}
\]


Note that K_{\mathrm{LLE}} depends on \varepsilon, on the geometry of the manifold near x, and on \rho via T_{\iota(x)}. We provide some properties of the kernel function K_{\mathrm{LLE}}. By a direct expansion, we have T_{\iota(x)}^\top = \sum_{i=1}^r \frac{u_i^\top\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}(X)]}{\lambda_i+\varepsilon^{d+\rho}}\,u_i^\top, where (u_i,\lambda_i) is the i-th eigen-pair of C_x. Since |\mathbb{E}(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}(X)| is bounded above by \mathrm{vol}(M)\varepsilon, \lambda_i+\varepsilon^{d+\rho} is bounded below by \varepsilon^{d+\rho}, and each u_i is a unit vector, |T_{\iota(x)}| is bounded above by \sum_{i=1}^r\frac{\varepsilon\,\mathrm{vol}(M)}{\lambda_i+\varepsilon^{d+\rho}}. Consequently, we have the following proposition.

Proposition 2.4.1. The kernel K_{\mathrm{LLE}} is compactly supported and is in L^2(M\times M). Thus, the linear operator A: L^2(M,PdV)\to L^2(M,PdV) defined by
\[
Af(x) := \mathbb{E}[f(X)(1-T_{\iota(x)}^\top(X-\iota(x)))\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}(X)] \tag{2.4.10}
\]
is Hilbert-Schmidt.

Note that the kernel function K_{\mathrm{LLE}}(x,\cdot) depends on x and hence on the manifold, and the kernel is dominated by normal bundle information, due to the regularized pseudo-inverse procedure. For example, if M is an affine subspace of \mathbb{R}^p and the data is uniformly sampled, then \mathbb{E}[(X-x)\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}(X)] = 0. Consequently, T_x = 0 and K_{\mathrm{LLE}}(x,y) = 1. If M is S^{p-1}, the unit sphere centered at the origin embedded in \mathbb{R}^p, and the data is uniformly sampled, the first p-1 dominant eigenvectors are perpendicular to x and the last eigenvector is parallel to x. By a direct calculation, \mathbb{E}[(X-x)\chi_{B^{\mathbb{R}^p}_\varepsilon(x)}(X)] is parallel to x, and hence K_{\mathrm{LLE}}(x,y) behaves like the quadratic function 1-c\,u_p^\top(y-x) = 1-c\,x^\top(y-x), where c is a constant depending on the eigenvalues.

2.4.1 Proof of Theorem 2.4.1

For each x_k, denote \boldsymbol{f} = (f(x_{k,1}), f(x_{k,2}),\dots,f(x_{k,N}))^\top\in\mathbb{R}^N. By the expansion
\[
\sum_{j=1}^N w_k(j)f(x_{k,j}) = \frac{\mathbf{1}_N^\top\boldsymbol{f} - \mathbf{1}_N^\top G_n^\top I_{n\varepsilon^{d+\rho}}(G_nG_n^\top)G_n\boldsymbol{f}}{N - \mathbf{1}_N^\top G_n^\top I_{n\varepsilon^{d+\rho}}(G_nG_n^\top)G_n\mathbf{1}_N},
\]

we can write \sum_{j=1}^N w_k(j)f(x_{k,j}) - f(x_k) as
\[
\frac{\frac{1}{n}\sum_{j=1}^N(f(x_{k,j})-f(x_k)) - \big[\frac{1}{n}\sum_{j=1}^N(x_{k,j}-x_k)\big]^\top nI_{n\varepsilon^{d+\rho}}(G_nG_n^\top)\big[\frac{1}{n}\sum_{j=1}^N(x_{k,j}-x_k)(f(x_{k,j})-f(x_k))\big]}{\frac{N}{n} - \big[\frac{1}{n}\sum_{j=1}^N(x_{k,j}-x_k)\big]^\top nI_{n\varepsilon^{d+\rho}}(G_nG_n^\top)\big[\frac{1}{n}\sum_{j=1}^N(x_{k,j}-x_k)\big]}. \tag{2.4.11}
\]

Note that we have
\[
nI_{n\varepsilon^{d+\rho}}(G_nG_n^\top) = I_{\varepsilon^{d+\rho}}\Big(\frac{1}{n}G_nG_n^\top\Big).
\]
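The algebraic step from the weight formula to the centered ratio in (2.4.11) can be sanity-checked numerically. This is our own toy set-up (random data, a hypothetical value for f(x_k), and I_c again simplified to a regularized inverse):

```python
import numpy as np

# Check: the centered ratio form of sum_j w_k(j) f(x_{k,j}) - f(x_k), as in
# (2.4.11), agrees with the direct weight formula of (2.4.2).
rng = np.random.default_rng(5)
p, N, c = 3, 12, 1e-4
xk = rng.normal(size=p)
nbrs = xk + 0.1 * rng.normal(size=(N, p))
f = rng.normal(size=N)
fk = 0.7                                    # a hypothetical value for f(x_k)
G = (nbrs - xk).T
ones = np.ones(N)
Ic = np.linalg.inv(G @ G.T + c * np.eye(p))
den = N - ones @ G.T @ (Ic @ (G @ ones))
direct = (f.sum() - ones @ G.T @ (Ic @ (G @ f))) / den - fk
g = f - fk                                  # centered values f(x_{k,j}) - f(x_k)
centered = (g.sum() - (G @ ones) @ (Ic @ (G @ g))) / den
print(abs(direct - centered))               # agrees up to rounding
```

The agreement holds exactly because subtracting f(x_k) from both the numerator and the ratio only shifts the barycentric average.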

Thus, the goal is to relate the finite sum quantity (2.4.11) to the following "expectation"
\[
\frac{Af(x_k)}{A1(x_k)} - f(x_k) = Qf(x_k) - f(x_k), \tag{2.4.12}
\]
where A is defined in (2.4.10). Note that LLE is a ratio of two dependent random variables, and the denominator and numerator both involve a complicated mixture of the sampling points. Therefore, the convergence fluctuation cannot


be simply computed. We control the size of the fluctuation of the following five terms,
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N 1, \tag{2.4.13}
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N (f(x_{k,j})-f(x_k)), \tag{2.4.14}
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N (x_{k,j}-x_k), \tag{2.4.15}
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N (x_{k,j}-x_k)(f(x_{k,j})-f(x_k)), \tag{2.4.16}
\]
\[
\frac{1}{n\varepsilon^d}G_nG_n^\top + \varepsilon^{\rho}I_{p\times p}, \tag{2.4.17}
\]

as functions of n and \varepsilon by a Bernstein-type inequality. Here, we put \varepsilon^{-d} in front of each term to normalize the kernel so that the computation is consistent with the existing literature, like [17, 57]. The sizes of the fluctuations of these terms are controlled in the following lemmas. The term (2.4.13) is the usual kernel density estimation, so we have the following lemma.

Lemma 2.4.1. When n is large enough, with probability greater than 1-n^{-2}, for all k = 1,\dots,n,
\[
\left|\frac{1}{n\varepsilon^d}\sum_{j=1}^N 1 - \mathbb{E}\frac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\right| = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2}}\Big).
\]
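The scale of this deviation can be illustrated by a small simulation. This is our own toy example, not from the thesis: the dimension, the uniform density on [-1,1]^2, the sample size and the number of trials are all hypothetical choices.

```python
import numpy as np

# Empirical check of the kernel-density-type concentration: for uniform
# samples on [-1,1]^2 (density P = 1/4, d = 2), the normalized count in
# B_eps(0) concentrates around pi/4, well within the deviation scale
# sqrt(log n)/(n^(1/2) eps^(d/2)) of Lemma 2.4.1.
rng = np.random.default_rng(4)
d, eps, n, trials = 2, 0.2, 20_000, 100
devs = []
for _ in range(trials):
    U = rng.uniform(-1, 1, size=(n, d))
    frac = (np.linalg.norm(U, axis=1) < eps).mean() / eps**d
    devs.append(abs(frac - np.pi / 4))   # E[eps^{-d} chi] = pi eps^2 (1/4)/eps^2
bound = np.sqrt(np.log(n)) / (np.sqrt(n) * eps**(d / 2))
print(np.mean(devs), bound)              # observed deviation vs. the scale
```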

The behavior of (2.4.14) is summarized in the following Lemma. Although the proof is standard, we provideit for the sake of self-containedness.

Lemma 2.4.2. When n is large enough, with probability greater than 1-n^{-2}, for all k = 1,\dots,n,
\[
\left|\frac{1}{n\varepsilon^d}\sum_{j=1}^N (f(x_{k,j})-f(x_k)) - \mathbb{E}\frac{1}{\varepsilon^d}(f(X)-f(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\right| = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\]

Proof. By denoting
\[
F_{1,j} = \frac{1}{\varepsilon^d}(f(x_j)-f(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(x_j),
\]
we have
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N (f(x_{k,j})-f(x_k)) = \frac{1}{n}\sum_{j\ne k,\, j=1}^n F_{1,j}.
\]
Define a random variable
\[
F_1 := \frac{1}{\varepsilon^d}(f(X)-f(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X).
\]
Clearly, when j\ne k, the F_{1,j} can be viewed as i.i.d. samples of F_1. Note that we have

\[
\frac{1}{n}\sum_{j\ne k,\, j=1}^n F_{1,j} = \frac{n-1}{n}\Big[\frac{1}{n-1}\sum_{j\ne k,\, j=1}^n F_{1,j}\Big].
\]

Since \frac{n-1}{n}\to 1 as n\to\infty, the error incurred by replacing \frac{1}{n} by \frac{1}{n-1} is of order \frac{1}{n}, which is negligible asymptotically.


Thus, we can simply focus on analyzing \frac{1}{n-1}\sum_{j\ne k,\, j=1}^n F_{1,j}. By Lemma 2.2.5, we have
\[
\begin{aligned}
\mathbb{E}[F_1] &= \frac{|S^{d-1}|}{2d(d+2)}\big[\Delta\big((f(y)-f(x_k))P(y)\big)\big|_{y=x_k}\big]\varepsilon^2 + O(\varepsilon^3),\\
\mathbb{E}[F_1^2] &= \frac{|S^{d-1}|}{2d(d+2)}\big[\Delta\big((f(y)-f(x_k))^2P(y)\big)\big|_{y=x_k}\big]\varepsilon^{-d+2} + O(\varepsilon^{-d+3}),
\end{aligned}
\]
where \Delta acts on the variable y, and we apply the lemma by viewing f(y)P(y) as a function and evaluating the integration over the uniform measure. Thus, we conclude that
\[
\sigma_1^2 := \mathrm{Var}(F_1) = \frac{|S^{d-1}|}{2d(d+2)}\big[\Delta\big((f(y)-f(x_k))^2P(y)\big)\big|_{y=x_k}\big]\varepsilon^{-d+2} + O(\varepsilon^{-d+3}). \tag{2.4.18}
\]
To simplify the discussion, we assume that \Delta((f(y)-f(x_k))^2P(y))|_{y=x_k} \ne 0, so that \sigma_1^2 = O(\varepsilon^{-d+2}) when \varepsilon is small enough. In the case that \Delta((f(y)-f(x_k))^2P(y))|_{y=x_k} = 0, the variance is of higher order, and the proof is the same.

With the above bounds, we can apply the large deviation theory. First, note that the random variable F_1 is uniformly bounded by
\[
c_1 = 2\|f\|_{L^\infty}\varepsilon^{-d}
\]
and
\[
\sigma_1^2/c_1 \to 0 \ \text{as}\ \varepsilon\to 0,
\]

so we apply Bernstein’s inequality to provide a large deviation bound. Recall Bernstein’s inequality

Pr

1

n−1

n

∑j 6=k, j=1

(F1, j−E[F1])> β1

≤ e−

nβ21

2σ21 + 2

3 c1β1 ,

where \beta_1 > 0. Since our goal is to estimate a quantity of order \varepsilon^2, which is the order at which the Laplace-Beltrami operator appears, we need to take \beta_1 = \beta_1(\varepsilon) much smaller than \varepsilon^2, in the sense that \beta_1/\varepsilon^2\to 0 as \varepsilon\to 0. In this case, c_1\beta_1 is much smaller than \sigma_1^2, and hence 2\sigma_1^2+\frac{2}{3}c_1\beta_1 \le 3\sigma_1^2 when \varepsilon is small enough. Thus, when \varepsilon is small enough, the exponent in Bernstein's inequality is bounded from below by

\[
\frac{n\beta_1^2}{2\sigma_1^2+\frac{2}{3}c_1\beta_1} \ge \frac{n\beta_1^2}{3\sigma_1^2} \ge \frac{n\beta_1^2\varepsilon^{d-2}}{3\frac{|S^{d-1}|}{d(d+2)}\big[\Delta\big((f(y)-f(x_k))^2P(y)\big)\big|_{y=x_k}\big]}.
\]
Suppose n is chosen large enough so that
\[
\frac{n\beta_1^2\varepsilon^{d-2}}{3\frac{|S^{d-1}|}{d(d+2)}\big[\Delta\big((f(y)-f(x_k))^2P(y)\big)\big|_{y=x_k}\big]} = 3\log(n);
\]

that is, the deviation from the mean is set to
\[
\beta_1 = \frac{3\sqrt{\log(n)}\sqrt{\frac{|S^{d-1}|}{d(d+2)}\big[\Delta\big((f(y)-f(x_k))^2P(y)\big)\big|_{y=x_k}\big]}}{n^{1/2}\varepsilon^{d/2-1}} = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big), \tag{2.4.19}
\]
where the implied constant in O\big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\big) is \sqrt{\frac{|S^{d-1}|}{d(d+2)}\big[\Delta\big((f(y)-f(x_k))^2P(y)\big)\big|_{y=x_k}\big]}. Note that, by the assumption that \varepsilon = \varepsilon(n) is chosen so that \frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\to 0, we know that \beta_1/\varepsilon^2 = O\big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\big)\to 0. This implies that a deviation greater than \beta_1 happens with probability less than

\[
\exp\Big(-\frac{n\beta_1^2}{2\sigma_1^2+\frac{2}{3}c_1\beta_1}\Big) \le \exp\Big(-\frac{n\beta_1^2\varepsilon^{d-2}}{3\frac{|S^{d-1}|}{d(d+2)}\big[\Delta\big((f(y)-f(x_k))^2P(y)\big)\big|_{y=x_k}\big]}\Big) = \exp(-3\log(n)) = 1/n^3.
\]

As a result, by a simple union bound, we have
\[
\Pr\Big\{\frac{1}{n-1}\sum_{j\ne k,\, j=1}^n (F_{1,j}-\mathbb{E}[F_1]) > \beta_1 \ \text{for some}\ k = 1,\dots,n\Big\} \le n\,\exp\Big(-\frac{n\beta_1^2}{2\sigma_1^2+\frac{2}{3}c_1\beta_1}\Big) \le 1/n^2.
\]

Denote \Omega_1 to be the event that the deviation bound \frac{1}{n-1}\sum_{j\ne k,\, j=1}^n (F_{1,j}-\mathbb{E}[F_1]) \le \beta_1 holds for all k = 1,\dots,n, where \beta_1 is chosen in (2.4.19). We now proceed to (2.4.15). In this case, we need to discuss the different cases indicated by Condition 2.4.1.

Lemma 2.4.3. Suppose Case 0 in Condition 2.4.1 holds. When n is large enough, with probability greater than 1-n^{-2}, for all k = 1,\dots,n,
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\Big] = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big),
\]
where i = 1,\dots,d.

Suppose Case 1 in Condition 2.4.1 holds. When n is large enough, with probability greater than 1-n^{-2}, for all k = 1,\dots,n,
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\Big] = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big),
\]
where i = 1,\dots,d, and
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\Big] = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\Big),
\]
where i = d+1,\dots,p.

Suppose Case 2 in Condition 2.4.1 holds. When n is large enough, with probability greater than 1-n^{-2}, for all k = 1,\dots,n,
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\Big] = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big),
\]
where i = 1,\dots,d,
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\Big] = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\Big),
\]
where i = d+1,\dots,p-l, and
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)\Big] = O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-3}}\Big),
\]
where i = p-l+1,\dots,p.

Proof. First, we prove Case 1; Case 0 is a special case of Case 1. Suppose Case 1 holds and fix x_k. Write
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k) = \frac{1}{n}\sum_{j\ne k,\, j=1}^n\sum_{\ell=1}^p F_{2,\ell,j}\,e_\ell,
\]
where
\[
F_{2,\ell,j} := \frac{1}{\varepsilon^d}\,e_\ell^\top(x_j-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(x_j);
\]
when j\ne k, the F_{2,\ell,j} are i.i.d. samples of the random variable
\[
F_{2,\ell} := \frac{1}{\varepsilon^d}\,e_\ell^\top(X-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X).
\]

Similarly, we can focus on analyzing \frac{1}{n-1}\sum_{j\ne k,\, j=1}^n F_{2,\ell,j}, since \frac{n-1}{n}\to 1 as n\to\infty. By plugging f = 1 into (2.2.10), we have
\[
\mathbb{E}[F_{2,\ell}] = \frac{|S^{d-1}|\varepsilon^2}{d+2}\, e_\ell^\top \begin{bmatrix} \frac{1}{d}J_{p,d}^\top\iota_*\nabla P(x)\\[4pt] \frac{1}{2}P(x)J_{p,p-d}^\top N_0(x)\end{bmatrix} + O(\varepsilon^4),
\]
and by (2.3.8) we have
\[
\mathbb{E}[F_{2,\ell}^2] =
\begin{cases}
\dfrac{|S^{d-1}|P(x)\varepsilon^{-d+2}}{d(d+2)} + O(\varepsilon^{-d+4}) & \text{when } \ell = 1,\dots,d,\\[8pt]
\dfrac{P(x)\varepsilon^{-d+4}}{4(d+4)}\displaystyle\int_{S^{d-1}}|\langle \mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2\,d\theta + O(\varepsilon^{-d+6}) & \text{when } \ell = d+1,\dots,p.
\end{cases}
\]
Thus, we conclude that
\[
\sigma_{2,\ell}^2 := \mathrm{Var}(F_{2,\ell}) =
\begin{cases}
\dfrac{|S^{d-1}|P(x)\varepsilon^{-d+2}}{d(d+2)} + O(\varepsilon^{-d+4}) & \text{when } \ell = 1,\dots,d,\\[8pt]
\dfrac{P(x)\varepsilon^{-d+4}}{4(d+4)}\displaystyle\int_{S^{d-1}}|\langle \mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2\,d\theta + O(\varepsilon^{-d+6}) & \text{when } \ell = d+1,\dots,p.
\end{cases}
\]

Note that for \ell = d+1,\dots,p, the variance is of higher order than for \ell = 1,\dots,d. By the same argument, in Case 0 we have \mathbb{E}[F_{2,\ell}] = \sigma_{2,\ell}^2 = 0 for \ell = d+1,\dots,p.

With the above bounds, we can apply the large deviation theory. For \ell = 1,\dots,d, the random variable F_{2,\ell} is uniformly bounded by c_{2,\ell} = 2\varepsilon^{-d+1}, and \sigma_{2,\ell}^2/c_{2,\ell}\to 0 as \varepsilon\to 0, so when \varepsilon is sufficiently small and n is sufficiently large, the exponent in Bernstein's inequality,
\[
\Pr\Big\{\frac{1}{n-1}\sum_{j\ne k,\, j=1}^n(F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]) > \beta_{2,\ell}\Big\} \le \exp\Big(-\frac{n\beta_{2,\ell}^2}{2\sigma_{2,\ell}^2+\frac{2}{3}c_{2,\ell}\beta_{2,\ell}}\Big),
\]


where $\beta_{2,\ell}>0$, satisfies

$$\frac{n\beta_{2,\ell}^2}{2\sigma_{2,\ell}^2+\frac{2}{3}c_{2,\ell}\beta_{2,\ell}}\ge\frac{n\beta_{2,\ell}^2}{3\sigma_{2,\ell}^2}\ge\frac{n\beta_{2,\ell}^2\varepsilon^{d-2}}{3\frac{|S^{d-1}|P(x)}{d(d+2)}}=3\log(n)\,;$$

that is, the deviation from the mean is set to

$$\beta_{2,\ell}=\frac{3\sqrt{\log(n)}\sqrt{3\frac{|S^{d-1}|P(x)}{d(d+2)}}}{n^{1/2}\varepsilon^{d/2-1}}=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big).\qquad(2.4.20)$$

For $\ell=d+1,\ldots,p$, since the variance is of higher order, by the same argument we have

$$\beta_{2,\ell}=\frac{3\sqrt{\log(n)}\sqrt{3\frac{P(x)}{4(d+4)}\int_{S^{d-1}}|\langle\mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2d\theta}}{n^{1/2}\varepsilon^{d/2-2}}=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\Big).\qquad(2.4.21)$$

As a result, in both Case 0 and Case 1, by a simple union bound, for $\ell=1,\ldots,d$, we have

$$\Pr\Big\{\Big|\frac{1}{n}\sum_{j\neq k,\,j=1}^{n}F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\Big|>\beta_{2,\ell}\ \text{for some }k=1,\ldots,n\Big\}\le 1/n^2,$$

where

$$\beta_{2,\ell}=\frac{3\sqrt{\log(n)}\sqrt{3\frac{|S^{d-1}|P(x)}{d(d+2)}}}{n^{1/2}\varepsilon^{d/2-1}}=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big),\qquad(2.4.22)$$

and in Case 1, for $\ell=d+1,\ldots,p$, we have

$$\Pr\Big\{\Big|\frac{1}{n}\sum_{j\neq k,\,j=1}^{n}F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\Big|>\beta_{2,\ell}\ \text{for some }k=1,\ldots,n\Big\}\le 1/n^2,$$

where

$$\beta_{2,\ell}=\frac{3\sqrt{\log(n)}\sqrt{3\frac{P(x)}{4(d+4)}\int_{S^{d-1}}|\langle\mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2d\theta}}{n^{1/2}\varepsilon^{d/2-2}}=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\Big).\qquad(2.4.23)$$
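The Bernstein-type deviation levels chosen above can be sanity-checked numerically. The following is a minimal sketch (illustrative only, with generic bounded, mean-zero variables standing in for the $F_{2,\ell}$), showing that the empirical-mean deviation of $n$ i.i.d. bounded samples stays below a $3\sqrt{\log(n)/n}$ threshold of the kind set in (2.4.20):

```python
import numpy as np

rng = np.random.default_rng(42)
# Generic bounded, mean-zero samples standing in for the variables F_{2,l}.
for n in (10_000, 1_000_000):
    samples = rng.uniform(-1.0, 1.0, size=n)   # |samples| <= 1, mean 0, variance 1/3
    deviation = abs(samples.mean())            # empirical-mean deviation from the true mean 0
    threshold = 3.0 * np.sqrt(np.log(n) / n)   # Bernstein-type deviation level beta
    print(n, deviation < threshold)            # both lines print True
```

The threshold exceeds the standard deviation of the empirical mean by a factor of $3\sqrt{3\log(n)}$, so the comparison holds with overwhelming probability, in line with the $1-n^{-2}$ statements above.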

For Case 2, by plugging $f=1$ into (2.2.10), we have

$$\mathbb{E}[F_{2,\ell}]=\begin{cases}\dfrac{\varepsilon^2|S^{d-1}|\,e_\ell^\top\iota_*\nabla P(x)}{d(d+2)}+O(\varepsilon^4)&\text{when }\ell=1,\ldots,d,\\[1.5ex]\dfrac{\varepsilon^2|S^{d-1}|P(x)\,e_\ell^\top J_{p,p-d}^\top N_0(x)}{2(d+2)}+O(\varepsilon^4)&\text{when }\ell=d+1,\ldots,p-l,\\[1.5ex]\dfrac{\varepsilon^4\,e_\ell^\top R_1(x)\nabla P(x)}{6(d+4)}+O(\varepsilon^5)&\text{when }\ell=p-l+1,\ldots,p,\end{cases}$$


and by (2.3.8) we have

$$\mathbb{E}[F_{2,\ell}^2]=\begin{cases}\dfrac{|S^{d-1}|P(x)\varepsilon^{-d+2}}{d(d+2)}+O(\varepsilon^{-d+4})&\text{when }\ell=1,\ldots,d,\\[1.5ex]\dfrac{P(x)\varepsilon^{-d+4}}{4(d+4)}\displaystyle\int_{S^{d-1}}|\langle\mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2d\theta+O(\varepsilon^{-d+6})&\text{when }\ell=d+1,\ldots,p-l,\\[1.5ex]\dfrac{P(x)\varepsilon^{-d+6}}{36(d+6)}\displaystyle\int_{S^{d-1}}|\langle\nabla_\theta\mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2d\theta+O(\varepsilon^{-d+8})&\text{when }\ell=p-l+1,\ldots,p.\end{cases}$$

Thus, we conclude that

$$\sigma_{2,\ell}^2:=\mathrm{Var}(F_{2,\ell})=\begin{cases}\dfrac{|S^{d-1}|P(x)\varepsilon^{-d+2}}{d(d+2)}+O(\varepsilon^{-d+4})&\text{when }\ell=1,\ldots,d,\\[1.5ex]\dfrac{P(x)\varepsilon^{-d+4}}{4(d+4)}\displaystyle\int_{S^{d-1}}|\langle\mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2d\theta+O(\varepsilon^{-d+6})&\text{when }\ell=d+1,\ldots,p-l,\\[1.5ex]\dfrac{P(x)\varepsilon^{-d+6}}{36(d+6)}\displaystyle\int_{S^{d-1}}|\langle\nabla_\theta\mathrm{II}_x(\theta,\theta),e_\ell\rangle|^2d\theta+O(\varepsilon^{-d+8})&\text{when }\ell=p-l+1,\ldots,p.\end{cases}$$

By the same large deviation argument, whose details we skip, we conclude the claim with

$$\beta_{2,\ell}=\begin{cases}O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big)&\text{when }\ell=1,\ldots,d,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\Big)&\text{when }\ell=d+1,\ldots,p-l,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-3}}\Big)&\text{when }\ell=p-l+1,\ldots,p.\end{cases}\qquad(2.4.24)$$

Denote by $\Omega_2$ the event that the deviation $\big|\frac{1}{n}\sum_{j\neq k,\,j=1}^{n}F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\big|\le\beta_{2,\ell}$ holds for all $\ell=1,\ldots,p$ and $k=1,\ldots,n$, where the $\beta_{2,\ell}$ are chosen as in (2.4.22) under Case 0 in Condition 2.4.1, as in (2.4.22) and (2.4.23) under Case 1, and as in (2.4.24) under Case 2.

Denote the eigen-decomposition of $\frac{1}{n\varepsilon^d}G_nG_n^\top$ as $U_n\Lambda_nU_n^\top$, where $U_n\in O(p)$ and $\Lambda_n\in\mathbb{R}^{p\times p}$ is a diagonal matrix, and the eigen-decomposition of $\frac{1}{\varepsilon^d}C_x$ as $U\Lambda U^\top$, where $U\in O(p)$ and $\Lambda\in\mathbb{R}^{p\times p}$ is a diagonal matrix. Note that

$$n\varepsilon^d\,\mathcal{I}_{n\varepsilon^{d+\rho}}(G_nG_n^\top)=\mathcal{I}_{\varepsilon^\rho}\Big(\frac{1}{n\varepsilon^d}G_nG_n^\top\Big).$$

We first control $\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)=I_{p,r_n}(\Lambda_n+\varepsilon^\rho)^{-1}I_{p,r_n}-I_{p,r}(\Lambda+\varepsilon^\rho)^{-1}I_{p,r}$ based on the three cases listed in Condition 2.4.1. By Proposition 2.3.1, the first $d$ eigenvalues of $\mathbb{E}F$ are of order $\varepsilon^2$. In Case 0, all the remaining eigenvalues are $0$; in Case 1, all the remaining eigenvalues are nonzero and of order $\varepsilon^4$; in Case 2, there are $l$ nonzero eigenvalues of order $\varepsilon^6$ and $p-d-l$ remaining eigenvalues of order $\varepsilon^4$.
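To make the regularized inverse $\mathcal{I}_{c}(M)=U I_{p,r}(\Lambda+c)^{-1}I_{p,r}U^\top$ concrete, here is a minimal numerical sketch (the function name and the rank-truncation tolerance are our own illustrative choices) for a symmetric positive semi-definite matrix:

```python
import numpy as np

def regularized_inverse(M, c, tol=1e-12):
    """I_c(M) = U I_{p,r} (Lambda + c)^{-1} I_{p,r} U^T for symmetric PSD M of rank r."""
    lam, U = np.linalg.eigh(M)                        # eigen-decomposition M = U diag(lam) U^T
    inv = np.where(lam > tol, 1.0 / (lam + c), 0.0)   # invert only the rank-r (nonzero) part
    return (U * inv) @ U.T

# rank-1 example: only the nonzero eigenvalue is (regularized-)inverted
M = np.array([[2.0, 0.0], [0.0, 0.0]])
print(regularized_inverse(M, c=0.5))  # -> diag(1/2.5, 0)
```

Only the eigenvalues above the rank cutoff are shifted by $c$ and inverted, which is exactly why the "small" eigenvalues of the local covariance dominate the regularized quantities studied below.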

Lemma 2.4.4. When $n$ is large enough, with probability greater than $1-n^{-2}$, for Case 0 in Condition 2.4.1, we have

$$\big|e_i^\top[\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)]e_i\big|=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2+2(2\wedge\rho)}}\Big)$$


for $i=1,\ldots,d$; for Case 1 in Condition 2.4.1, we have

$$\big|e_i^\top[\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)]e_i\big|=\begin{cases}O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2+2(2\wedge\rho)}}\Big)&\text{for }i=1,\ldots,d,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-4+2(4\wedge\rho)}}\Big)&\text{for }i=d+1,\ldots,p\,;\end{cases}$$

for Case 2 in Condition 2.4.1, we have

$$\big|e_i^\top[\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)]e_i\big|=\begin{cases}O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2+2(2\wedge\rho)}}\Big)&\text{for }i=1,\ldots,d,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-4+2(4\wedge\rho)}}\Big)&\text{for }i=d+1,\ldots,p-l,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-6+2(6\wedge\rho)}}\Big)&\text{for }i=p-l+1,\ldots,p\,.\end{cases}$$

Moreover, for each case in Condition 2.4.1, when $n$ is sufficiently large, with probability greater than $1-n^{-2}$, we have $U_n=U\Theta+\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\big(\frac{\log(n)}{n\varepsilon^{d-4}}\big)$, where $S\in o(p)$ and $\Theta\in O(p)$; $\Theta$ commutes with $\mathcal{I}_{\varepsilon^\rho}(\Lambda)$.

Note that $\frac{\log(n)}{n\varepsilon^{d-4}}$ is asymptotically bounded by $\varepsilon^6$ due to the assumption that $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}$ asymptotically approaches zero as $n\to\infty$.

Proof. We start by analyzing $\frac{1}{n\varepsilon^d}G_nG_n^\top$. The proof can be found in [57, (6.12)-(6.19)]; here we summarize the results in our notation. Denote

$$F_{3,a,b,i}:=\frac{1}{\varepsilon^d}e_a^\top(x_{k,i}-x_k)(x_{k,i}-x_k)^\top e_b,$$

so that

$$\frac{1}{n\varepsilon^d}G_nG_n^\top=\frac{1}{n}\sum_{a,b=1}^{p}\sum_{i=1}^{N}F_{3,a,b,i}\,e_ae_b^\top.$$

Note that for each $a,b=1,\ldots,p$, $\{F_{3,a,b,i}\}_{i=1}^{n}$ are i.i.d. realizations of the random variable $F_{3,a,b}=\frac{1}{\varepsilon^d}e_a^\top(X-x_k)(X-x_k)^\top e_b\,\chi_{B_\varepsilon^{\mathbb{R}^p}(x_k)}(X)$. Denote $F_3\in\mathbb{R}^{p\times p}$ so that the $(a,b)$-th entry of $F_3$ is $F_{3,a,b}$. Note that $C_x=\varepsilon^d\mathbb{E}F_3$.

The random variable $F_{3,a,b}$ is bounded by $c_{3,a,b}=2\varepsilon^{-d+2}$ when $a,b=1,\ldots,d$, by $c_{3,a,b}=c_{a,b}\varepsilon^{-d+4}$ when $a,b=d+1,\ldots,p$, and by $c_{3,a,b}=c_{a,b}\varepsilon^{-d+3}$ for the other pairs of $a,b$, where the $c_{a,b}$, when $a>d$ or $b>d$, are constants depending on the second fundamental form [55, (B.33)-(B.34)].

The variance of $F_{3,a,b}$, denoted $\sigma_{3,a,b}^2$, is $s_{a,b}\varepsilon^{-d+4}$ when $a,b=1,\ldots,d$, $s_{a,b}\varepsilon^{-d+8}$ when $a,b=d+1,\ldots,p$, and $s_{a,b}\varepsilon^{-d+6}$ for the other pairs of $a,b$ (see [55, (B.33)-(B.35)] or [57]), where the $s_{a,b}$ are constants depending on the second fundamental form. Again, to simplify the discussion, we assume that $c_{a,b}$ and $s_{a,b}$ are nonzero for all $a,b=1,\ldots,p$. When the variance is of higher order, the deviation can be evaluated similarly and we skip the details.¹ Thus, for $\beta_{3,1},\beta_{3,2},\beta_{3,3}>0$, by Bernstein's inequality, we have

$$\Pr\Big\{\Big|\frac{1}{n}\sum_{i\neq k,\,i=1}^{n}F_{3,a,b,i}-\mathbb{E}F_{3,a,b}\Big|>\beta_{3,1}\Big\}\le\exp\Big(-\frac{(n-1)\beta_{3,1}^2}{s_{a,b}\varepsilon^{-d+4}+c_{a,b}\varepsilon^{-d+2}\beta_{3,1}}\Big)\qquad(2.4.25)$$

¹ For example, when the manifold is flat around $x_k$ and $\varepsilon$ is sufficiently small, $c_{a,b}=s_{a,b}=0$ when $a>d$ or $b>d$, and the proof of the bound is trivial.


when $a,b=1,\ldots,d$,

$$\Pr\Big\{\Big|\frac{1}{n}\sum_{i\neq k,\,i=1}^{n}F_{3,a,b,i}-\mathbb{E}F_{3,a,b}\Big|>\beta_{3,2}\Big\}\le\exp\Big(-\frac{(n-1)\beta_{3,2}^2}{s_{a,b}\varepsilon^{-d+8}+c_{a,b}\varepsilon^{-d+4}\beta_{3,2}}\Big)\qquad(2.4.26)$$

when $a,b=d+1,\ldots,p$, and

$$\Pr\Big\{\Big|\frac{1}{n}\sum_{i\neq k,\,i=1}^{n}F_{3,a,b,i}-\mathbb{E}F_{3,a,b}\Big|>\beta_{3,3}\Big\}\le\exp\Big(-\frac{(n-1)\beta_{3,3}^2}{s_{a,b}\varepsilon^{-d+6}+c_{a,b}\varepsilon^{-d+3}\beta_{3,3}}\Big)\qquad(2.4.27)$$

for the other cases.

Choose $\beta_{3,1}$, $\beta_{3,2}$ and $\beta_{3,3}$ so that $\beta_{3,1}/\varepsilon^2\to0$, $\beta_{3,2}/\varepsilon^4\to0$ and $\beta_{3,3}/\varepsilon^3\to0$ as $\varepsilon\to0$, so that when $\varepsilon$ is sufficiently small,

$$s_{a,b}\varepsilon^{-d+4}+c_{a,b}\varepsilon^{-d+2}\beta_{3,1}\le 2s_{a,b}\varepsilon^{-d+4}\quad\text{for all }a,b=1,\ldots,d,$$
$$s_{a,b}\varepsilon^{-d+8}+c_{a,b}\varepsilon^{-d+4}\beta_{3,2}\le 2s_{a,b}\varepsilon^{-d+8}\quad\text{for all }a,b=d+1,\ldots,p,$$
$$s_{a,b}\varepsilon^{-d+6}+c_{a,b}\varepsilon^{-d+3}\beta_{3,3}\le 2s_{a,b}\varepsilon^{-d+6}\quad\text{for the other pairs of }a,b\,.$$

To guarantee that the deviation in (2.4.25) (respectively (2.4.26) and (2.4.27)) exceeding $\beta_{3,1}$ (respectively $\beta_{3,2}$ and $\beta_{3,3}$) happens with probability less than $\frac{1}{n^3}$, $n$ should satisfy $\frac{n\beta_{3,1}^2}{\log(n)}\ge 6s_{a,b}\varepsilon^{-d+4}$ (respectively $\frac{n\beta_{3,2}^2}{\log(n)}\ge 6s_{a,b}\varepsilon^{-d+8}$ and $\frac{n\beta_{3,3}^2}{\log(n)}\ge 6s_{a,b}\varepsilon^{-d+6}$). By setting $\beta_{3,1}=\sqrt{6s_{a,b}}\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}$, $\beta_{3,2}=\sqrt{6s_{a,b}}\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-4}}$, and $\beta_{3,3}=\sqrt{6s_{a,b}}\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-3}}$, the conditions $\beta_{3,1}/\varepsilon^3\to0$, $\beta_{3,2}/\varepsilon^5\to0$ and $\beta_{3,3}/\varepsilon^4\to0$ as $\varepsilon\to0$ hold by the assumed relationship between $n$ and $\varepsilon$, and the deviations in (2.4.25), (2.4.26) and (2.4.27) are well controlled by $\beta_{3,1}$, $\beta_{3,2}$ and $\beta_{3,3}$ respectively, with probability greater than $1-n^{-3}$. Define the deviation of $\frac{1}{n\varepsilon^d}G_nG_n^\top$ from $\mathbb{E}F_3$ as

$$E:=\frac{1}{n\varepsilon^d}G_nG_n^\top-\mathbb{E}F_3\in\mathbb{R}^{p\times p}.\qquad(2.4.28)$$

As a result, again by a trivial union bound, with probability greater than $1-n^{-2}$, for all $x_k$, we have

$$|E_{a,b}|\le\frac{c\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\ \text{when }a,b=1,\ldots,d,\qquad |E_{a,b}|\le\frac{c\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-4}}\ \text{when }a,b=d+1,\ldots,p,\qquad |E_{a,b}|\le\frac{c\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-3}}\ \text{otherwise},\qquad(2.4.29)$$

where

$$c:=\max_{a,b=1,\ldots,p}\sqrt{6s_{a,b}}.\qquad(2.4.30)$$

Denote by $\Omega_3$ the event that the deviation bound (2.4.29) is satisfied. With the above preparation, we now prove Lemma 2.4.4 case by case.
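The entrywise concentration of the rescaled matrix $\frac{1}{n\varepsilon^d}G_nG_n^\top$ around $\mathbb{E}F_3$ can be illustrated in the simplest flat case. A minimal sketch (a uniform disk in $\mathbb{R}^2$ stands in for the local chart; the sample size and radius are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 200_000, 0.5
# Flat d = 2 toy model: X uniform on the disk of radius eps around the base point.
r = eps * np.sqrt(rng.random(n))          # radius with density proportional to r
phi = 2.0 * np.pi * rng.random(n)
X = np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)

C_emp = X.T @ X / n                       # empirical second-moment (local covariance) matrix
C_true = (eps**2 / 4.0) * np.eye(2)       # E[X X^T] for the uniform disk of radius eps
print(np.abs(C_emp - C_true).max())       # entrywise deviation of order n^{-1/2}
```

The observed entrywise deviation is of order $n^{-1/2}$, matching the $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}$ scale of (2.4.29) up to the $\varepsilon$-dependent constants.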

Case 0 in Condition 2.4.1. Note that both $\mathbb{E}F$ and $\frac{1}{n\varepsilon^d}G_nG_n^\top$ are of rank $r=d$ due to the geometric constraints. By the calculation in Section 1.4, conditional on $\Omega_3$, (2.4.28) holds, and the nonzero eigenvalues of


$\frac{1}{n\varepsilon^d}G_nG_n^\top$ (there are only $d$ such eigenvalues) deviate from the nonzero eigenvalues of $\mathbb{E}F_3$ by $O\big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\big)$, which is smaller than $\varepsilon^3$ by the assumed relationship between $n$ and $\varepsilon$. Thus, $r_n=d$ when $\varepsilon$ is sufficiently small, and we have

$$I_{p,r_n}(\Lambda_n+\varepsilon^\rho)^{-1}I_{p,r_n}-I_{p,d}(\Lambda+\varepsilon^\rho)^{-1}I_{p,d}=I_{p,d}\big[(\Lambda_n+\varepsilon^\rho)^{-1}-(\Lambda+\varepsilon^\rho)^{-1}\big]I_{p,d}.$$

Denote the $i$-th eigenvalue of $\mathbb{E}F_3=\frac{1}{\varepsilon^d}C_x$ as $\lambda_i$, where $i=1,\ldots,d$. By a direct calculation, we have

$$\big|e_i^\top[\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)]e_i\big|=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2+2(2\wedge\rho)}}\Big)$$

for $i=1,\ldots,d$ when $\varepsilon$ is sufficiently small, since we have

$$\frac{1}{\lambda_i+O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\big)+\varepsilon^\rho}-\frac{1}{\lambda_i+\varepsilon^\rho}=\frac{1}{\lambda_i+\varepsilon^\rho}\Bigg(\frac{1}{O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}(\lambda_i+\varepsilon^\rho)}\big)+1}-1\Bigg)=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}(\lambda_i+\varepsilon^\rho)^2}\Big)=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2+2(2\wedge\rho)}}\Big)$$

due to the fact that $\lambda_i$ is of order $\varepsilon^2$ for $i=1,\ldots,d$, that $\lambda_i+\varepsilon^\rho=O(\varepsilon^{2\wedge\rho})$, and that $n^{1/2}\varepsilon^{d/2+1}\to\infty$ as $n\to\infty$. Suppose there are $1\le l\le d$ distinct eigenvalues, and that the multiplicity of the $j$-th distinct eigenvalue is $p_j\in\mathbb{N}$. Clearly, $\sum_{i=1}^{l}p_i=p$. By the calculation in Section 1.4, whose details we skip, conditional on $\Omega_3$ we have $U_n=U\Theta+\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\big(\frac{\log(n)}{n\varepsilon^{d-4}}\big)$, where $S\in o(p)$,

$$\Theta=\begin{bmatrix}X^{(1)}&0&\cdots&0\\0&X^{(2)}&\cdots&0\\0&0&\ddots&0\\0&0&\cdots&X^{(l)}\end{bmatrix}\in O(p),\qquad(2.4.31)$$

and $X^{(j)}\in O(p_j)$, $j=1,\ldots,l$, comes from the $j$-th distinct eigenvalue. Note that $\Theta$ commutes with $\Lambda$ and $\mathcal{I}_{\varepsilon^\rho}(\Lambda)$.

Case 1 in Condition 2.4.1. By the calculation in Section 1.4, conditional on $\Omega_3$, the first $d$ eigenvalues of $\frac{1}{n\varepsilon^d}G_nG_n^\top$ deviate from the first $d$ eigenvalues of $\mathbb{E}F$ by $O\big(\frac{c\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}\big)$, which is smaller than $\varepsilon^3$, and the remaining $p-d$ eigenvalues of $\frac{1}{n\varepsilon^d}G_nG_n^\top$ deviate from the remaining $p-d$ eigenvalues of $\mathbb{E}F$ by $O\big(\frac{c\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-4}}\big)$, which is smaller than $\varepsilon^5$. Thus, again, when $\varepsilon$ is sufficiently small, $r_n=r=p$, and $I_{p,r_n}(\Lambda_n+\varepsilon^\rho)^{-1}I_{p,r_n}-I_{p,r}(\Lambda+\varepsilon^\rho)^{-1}I_{p,r}=\big[(\Lambda_n+\varepsilon^\rho)^{-1}-(\Lambda+\varepsilon^\rho)^{-1}\big]$. Therefore,

$$\big|e_i^\top[\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)]e_i\big|=\begin{cases}O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2+2(2\wedge\rho)}}\Big)&\text{for }i=1,\ldots,d,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-4+2(4\wedge\rho)}}\Big)&\text{for }i=d+1,\ldots,p,\end{cases}$$

when $\varepsilon$ is sufficiently small. Again, by the calculation in Section 1.4, conditional on $\Omega_3$, we have $U_n=U\Theta+\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\big(\frac{\log(n)}{n\varepsilon^{d-4}}\big)$, where $S\in o(p)$ and $\Theta\in O(p)$ is defined in (2.4.31).


Case 2 in Condition 2.4.1. A similar discussion holds. In this case, conditional on $\Omega_3$, we have

$$\big|e_i^\top[\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)]e_i\big|=\begin{cases}O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2+2(2\wedge\rho)}}\Big)&\text{for }i=1,\ldots,d,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-4+2(4\wedge\rho)}}\Big)&\text{for }i=d+1,\ldots,p-l,\\[1.5ex]O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-6+2(6\wedge\rho)}}\Big)&\text{for }i=p-l+1,\ldots,p,\end{cases}$$

when $\varepsilon$ is sufficiently small. Similarly, conditional on $\Omega_3$, $U_n=U\Theta+\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\big(\frac{\log(n)}{n\varepsilon^{d-4}}\big)$, where $S\in o(p)$ and $\Theta\in O(p)$ is defined in (2.4.31).

We now return to finish the proof of Theorem 2.4.1. Denote $\Omega:=\cap_{i=1,\ldots,4}\Omega_i$. It is clear that the probability of the event space $\Omega$ is greater than $1-4n^{-2}$. Below, all arguments are conditional on $\Omega$. When $\varepsilon$ is sufficiently small, based on Lemma 2.4.4, we have

$$\mathcal{I}_{\varepsilon^\rho}\Big(\frac{1}{n\varepsilon^d}G_nG_n^\top\Big)=\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F)+\tilde{E}_3,\qquad(2.4.32)$$

where

$$\tilde{E}_3:=U_n\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)U_n^\top-U\mathcal{I}_{\varepsilon^\rho}(\Lambda)U^\top$$
$$=\Big(U\Theta+\tfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\big(\tfrac{\log(n)}{n\varepsilon^{d-4}}\big)\Big)\big[\mathcal{I}_{\varepsilon^\rho}(\Lambda)+E_{3,1}\big]\Big(U\Theta+\tfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\big(\tfrac{\log(n)}{n\varepsilon^{d-4}}\big)\Big)^\top-U\mathcal{I}_{\varepsilon^\rho}(\Lambda)U^\top$$
$$=\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta\big[S\mathcal{I}_{\varepsilon^\rho}(\Lambda)+\mathcal{I}_{\varepsilon^\rho}(\Lambda)S^\top\big]\Theta^\top U^\top+U\Theta E_{3,1}\Theta^\top U^\top+\big[\text{higher order terms}\big],$$

and $E_{3,1}:=\mathcal{I}_{\varepsilon^\rho}(\Lambda_n)-\mathcal{I}_{\varepsilon^\rho}(\Lambda)$, whose bound is provided in Lemma 2.4.4. Define

$$E_3:=\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta\big[S\mathcal{I}_{\varepsilon^\rho}(\Lambda)+\mathcal{I}_{\varepsilon^\rho}(\Lambda)S^\top\big]\Theta^\top U^\top+U\Theta E_{3,1}\Theta^\top U^\top.\qquad(2.4.33)$$

By (2.4.3), we have

$$\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)=\mathbb{E}F_2+E_2,\qquad(2.4.34)$$

where the bound of $E_2$ is provided in Lemma 2.4.3. Similarly, we have

$$\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)=\mathbb{E}F_4+E_4,\qquad(2.4.35)$$

where the bound of $E_4$ is the same as that in Lemma 2.4.3.


Table 2.1: The relevant items in each error term in $E_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)E_4+\mathbb{E}F_2^\top E_3\mathbb{E}F_4$. The bounds are for entrywise errors. T means the tangential components in all Cases, N means the normal components in Case 1, N1 means the first $p-d-l$ normal components of order $\varepsilon^{d+4}$, and N2 means the last $l$ normal components of order $\varepsilon^{d+6}$ in Case 2. "Total" means the overall bound of $E_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)E_4+\mathbb{E}F_2^\top E_3\mathbb{E}F_4$, where only the major terms depending on $n$ and $\varepsilon$ in the leading order terms are shown.

              Case 0                               Case 1                                                                        Case 2
              T                                    T                                 N                                           T                                 N1                                N2
EF2           ε^2                                  ε^2                               ε^2                                         ε^2                               ε^2                               ε^4
Iερ(EF3)      ε^{-(2∧ρ)}                           ε^{-(2∧ρ)}                        ε^{-(4∧ρ)}                                  ε^{-(2∧ρ)}                        ε^{-(4∧ρ)}                        ε^{-(6∧ρ)}
EF4           ε^2                                  ε^2                               ε^2                                         ε^2                               ε^2                               ε^4
E2            √log(n)/(n^{1/2}ε^{d/2-1})           √log(n)/(n^{1/2}ε^{d/2-1})        √log(n)/(n^{1/2}ε^{d/2-2})                  √log(n)/(n^{1/2}ε^{d/2-1})        √log(n)/(n^{1/2}ε^{d/2-2})        √log(n)/(n^{1/2}ε^{d/2-3})
E3,1          √log(n)/(n^{1/2}ε^{d/2-2+2(2∧ρ)})    √log(n)/(n^{1/2}ε^{d/2-2+2(2∧ρ)}) √log(n)/(n^{1/2}ε^{d/2-4+2(4∧ρ)})           √log(n)/(n^{1/2}ε^{d/2-2+2(2∧ρ)}) √log(n)/(n^{1/2}ε^{d/2-4+2(4∧ρ)}) √log(n)/(n^{1/2}ε^{d/2-6+2(6∧ρ)})
E4            √log(n)/(n^{1/2}ε^{d/2-1})           √log(n)/(n^{1/2}ε^{d/2-1})        √log(n)/(n^{1/2}ε^{d/2-2})                  √log(n)/(n^{1/2}ε^{d/2-1})        √log(n)/(n^{1/2}ε^{d/2-2})        √log(n)/(n^{1/2}ε^{d/2-3})
Total         √log(n)/(n^{1/2}ε^{d/2+(2∧ρ)-3})     √log(n)/(n^{1/2}ε^{d/2+(2∧ρ)-3}) + √log(n)/(n^{1/2}ε^{d/2+(4∧ρ)-4})          √log(n)/(n^{1/2}ε^{d/2+(2∧ρ)-3}) + √log(n)/(n^{1/2}ε^{d/2+(4∧ρ)-4})

We can therefore recast $\big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big]^\top\mathcal{I}_{\varepsilon^\rho}\big(\frac{1}{n\varepsilon^d}G_nG_n^\top\big)\big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)(f(x_{k,j})-f(x_k))\big]$ as

$$[\mathbb{E}F_2+E_2]^\top\big[\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)+\tilde{E}_3\big][\mathbb{E}F_4+E_4]=\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+\big[E_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)E_4+\mathbb{E}F_2^\top E_3\mathbb{E}F_4\big]+\big[\text{higher order terms}\big].$$

We now control the error term $E_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)E_4+\mathbb{E}F_2^\top E_3\mathbb{E}F_4$, which depends on the tangential and normal components. Since the errors are of different orders in the tangential and normal directions, we evaluate the total error separately.

To avoid a tedious description of each Case, we summarize the leading order of each term for the different Cases in Table 2.1. We mention that in Case 2, if the N1 part is empty, that is, if the non-trivial eigenvalues corresponding to the normal bundle are all of order $\varepsilon^{d+6}$, the final error rate is $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}$, which is the same as in Case 0.

We only carry out the calculation for Case 1 and skip the details for the other cases since the calculation is the same. By checking the error orders in Table 2.1, the leading order error term of $E_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)E_4$ is controlled by $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(2\wedge\rho)-3}}+\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(4\wedge\rho)-4}}$, where $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(2\wedge\rho)-3}}$ comes from the tangential part and $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(4\wedge\rho)-4}}$ comes from the normal part. Note that the sizes of $(2\wedge\rho)-3$ and $(4\wedge\rho)-4$ depend on the chosen $\rho$, so we keep both. On the other hand, by (2.4.33) and Table 2.1, the error $\mathbb{E}F_2^\top E_3\mathbb{E}F_4$ is controlled by $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(4\wedge2\rho)-6}}+\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(8\wedge2\rho)-8}}$. By a direct comparison, it is clear that when $\varepsilon$ is sufficiently small, no matter which $\rho$ is chosen, $\mathbb{E}F_2^\top E_3\mathbb{E}F_4$ is dominated by $E_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)E_4$, and hence the total error term is controlled by $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(2\wedge\rho)-3}}+\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(4\wedge\rho)-4}}$.

Therefore, conditional on $\Omega$, for all $k=1,\ldots,n$, the deviation of the numerator of (2.4.11) from $\mathbb{E}[F_1]-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4$ depends on $\rho$ and the different Cases in Condition 2.4.1. For Case 0, it is controlled by

$$O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big)+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(2\wedge\rho)-3}}\Big)=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big)$$

since $(2\wedge\rho)-3=(-1)\wedge(\rho-3)\le-1$; for Cases 1 and 2, it is controlled by

$$O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big)+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(2\wedge\rho)-3}}\Big)+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(4\wedge\rho)-4}}\Big)=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+[(-1)\vee(0\wedge(\rho-4))]}}\Big),$$

which comes from the fact that $(2\wedge\rho)-3\le-1$ and $(4\wedge\rho)-4=0\wedge(\rho-4)$. Similarly, the deviation of the denominator of


(2.4.11) from $\mathbb{E}[F_0]-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_2$ depends on $\rho$ and the different Cases in Condition 2.4.1. For Case 0, it is controlled by

$$O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2}}\Big)+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(2\wedge\rho)-3}}\Big)=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2}}\Big)$$

since $(2\wedge\rho)-3\le-1<0$; for Cases 1 and 2, it is controlled by

$$O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2}}\Big)+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(2\wedge\rho)-3}}\Big)+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+(4\wedge\rho)-4}}\Big)=O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2}}\Big),$$

which comes from the fact that $(4\wedge\rho)-4=0\wedge(\rho-4)\le0$.

As a result, conditional on $\Omega$, for all $k=1,\ldots,n$, we have

$$\sum_{j=1}^{N}w_k(j)f(x_{k,j})-f(x_k)=\begin{cases}\dfrac{\mathbb{E}[F_1]-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big)}{\mathbb{E}[F_0]-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_2+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2}}\Big)}&\text{in Case 0,}\\[3ex]\dfrac{\mathbb{E}[F_1]-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+[(-1)\vee(0\wedge(\rho-4))]}}\Big)}{\mathbb{E}[F_0]-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_2+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2}}\Big)}&\text{in Cases 1 and 2,}\end{cases}\qquad(2.4.36)$$

which leads to

$$\sum_{j=1}^{N}w_k(j)f(x_{k,j})-f(x_k)=\begin{cases}Qf(x_k)-f(x_k)+O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big)&\text{in Case 0,}\\[2ex]Qf(x_k)-f(x_k)+O\Big(\dfrac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+[(-1)\vee(0\wedge(\rho-4))]}}\Big)&\text{in Cases 1 and 2,}\end{cases}\qquad(2.4.37)$$

where the equality comes from rewriting (2.4.12) as

$$Qf(x_k)-f(x_k)=\frac{\mathbb{E}F_1-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_4}{\mathbb{E}F_0-\mathbb{E}F_2^\top\mathcal{I}_{\varepsilon^\rho}(\mathbb{E}F_3)\mathbb{E}F_2},$$

and the fact that $\mathbb{E}F_0$ is of order $1$ and $\mathbb{E}F_1$ is of order $\varepsilon^2$. Hence, we finish the proof.
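The quantity $\sum_j w_k(j)f(x_{k,j})$ analyzed above is the regularized barycentric approximation that LLE computes in each neighborhood. A minimal sketch of one neighborhood (the regularization constant c plays the role of $n\varepsilon^{d+\rho}$; the function name and data are illustrative):

```python
import numpy as np

def lle_weights(x, neighbors, c=1e-3):
    """Regularized barycentric weights: minimize |x - sum_j w_j y_j|^2 + c|w|^2 with sum_j w_j = 1."""
    G = neighbors - x                        # local data matrix, rows y_j - x
    gram = G @ G.T + c * np.eye(len(G))      # regularized local Gram matrix
    w = np.linalg.solve(gram, np.ones(len(G)))
    return w / w.sum()                       # normalize so the weights sum to one

x = np.array([0.0, 0.0])
nbrs = np.array([[0.1, 0.0], [-0.1, 0.05], [0.0, -0.1], [0.05, 0.1]])
w = lle_weights(x, nbrs)
print(w.sum())  # 1.0 up to round-off: the barycentric constraint
```

With such weights, $\sum_j w_k(j)f(x_{k,j})-f(x_k)$ is exactly the left-hand side of (2.4.37), whose population limit $Qf(x_k)-f(x_k)$ is expanded in the next section.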

2.5 Biased analysis on closed manifolds

For $f\in C(\iota(M))$, by the definition of $A$, we have

$$Qf(x)=\frac{(Af)(x)}{(A1)(x)},\qquad(2.5.1)$$

where $1$ means the constant function. We now provide an approximation-of-identity expansion of the $Q$ operator. By a direct expansion, we have

$$Af(x)=\int_M K_{\mathrm{LLE}}(x,y)f(\iota(y))P(y)dV(y).\qquad(2.5.2)$$


While the formula of the $Q$ operator looks like the diffusion process commonly encountered in graph Laplacian based approaches, like the DM [18], the proof and the result are essentially different. To ease the notation, define

$$N_0(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\mathrm{II}_x(\theta,\theta)d\theta,\qquad(2.5.3)$$

$$M_2(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\mathrm{II}_x(\theta,\theta)\theta\theta^\top d\theta,\qquad H_f(x):=\mathrm{tr}\big(M_2(x)\nabla^2f(x)\big),$$

where $f\in C^3(\iota(M))$.
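The quantity $N_0(x)$ in (2.5.3) can be checked numerically on the unit sphere $S^2\subset\mathbb{R}^3$, where $\mathrm{II}_x(\theta,\theta)=-x$ for every unit tangent vector $\theta$, so $N_0(x)=-x$. A minimal Monte Carlo sketch (the finite-difference step and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0, 1.0])  # base point on the unit sphere S^2 in R^3

# For the unit sphere, the geodesic through x with unit velocity theta is
# gamma(t) = cos(t) x + sin(t) theta, so II_x(theta, theta) = gamma''(0) = -x.
def second_fundamental_form(theta, h=1e-4):
    gamma = lambda t: np.cos(t) * x + np.sin(t) * theta
    return (gamma(h) - 2.0 * gamma(0.0) + gamma(-h)) / h**2  # finite-difference gamma''(0)

# Average II_x(theta, theta) over unit tangent vectors theta (the circle z = 0).
angles = 2.0 * np.pi * rng.random(5000)
thetas = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
N0 = np.mean([second_fundamental_form(th) for th in thetas], axis=0)
print(N0)  # close to (0, 0, -1): the averaged curvature vector points to the center
```

This matches the geometric picture used below: $N_0$ is a normal-direction quantity built from the second fundamental form, and it vanishes exactly when the manifold is locally flat.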

Theorem 2.5.1. Suppose $f\in C^3(\iota(M))$ and $P\in C^5(\iota(M))$, and fix $x\in M$. Assume that Assumptions 2.1.2 and 2.3.1 hold and the regularization order is $\rho\in\mathbb{R}$. Following the same notation used in Proposition 2.3.2, we have the following result:

$$Qf(x)-f(x)=\big(C_1(x)+C_2(x)\big)\varepsilon^2+O(\varepsilon^3),\qquad(2.5.4)$$

where $C_1(x)$ and $C_2(x)$ depend on the different cases stated in Condition 2.4.1.

• Case 0. In this case,

$$C_1(x)=\frac{1}{d+2}\Big[\frac{1}{2}\Delta f(x)+\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}-\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big],\qquad(2.5.5)$$

$$C_2(x)=0.\qquad(2.5.6)$$

• Case 1. In this case,

$$C_1(x)=\frac{\frac{1}{d+2}\Big[\frac{1}{2}\Delta f(x)+\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}-\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big]}{1-\frac{d}{2(d+2)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}},\qquad(2.5.7)$$

$$C_2(x)=-\frac{\frac{1}{4(d+4)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)(H_f^\top(x)e_i)}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}{\frac{1}{d}-\frac{1}{2(d+2)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}.\qquad(2.5.8)$$

• Case 2. In this case,

$$C_1(x)=\frac{\frac{1}{d+2}\Big[\frac{1}{2}\Delta f(x)+\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}-\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big]}{1-\frac{d}{2(d+2)}\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}},\qquad(2.5.9)$$

$$C_2(x)=-\frac{\frac{1}{4(d+4)}\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)(H_f^\top(x)e_i)}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}{\frac{1}{d}-\frac{1}{2(d+2)}\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}.\qquad(2.5.10)$$

Intuitively, based on the approximation of the identity, the kernel representation of the $Q$ operator suggests that asymptotically we get the function value back, with the second order derivative popping out in the second order error term. In the GL setup, it is well known that the second order derivative term is the Laplace-Beltrami operator when the p.d.f. is constant [18]. However, due to the interaction between the geometric structure and the barycentric coordinate, LLE usually does not lead to the Laplace-Beltrami operator, except in special situations. Note that while we can still see the Laplace-Beltrami operator in $C_1$, it is contaminated by other quantities, including $N_0(x)$, $H_f(x)$ and $\lambda_i^{(2)}$. These terms all depend on the second fundamental form. When $\rho>4$, the curvature term appears in the $\varepsilon^2$ order term.

This theorem states that the asymptotic behavior of LLE is sensitive to the choice of $\rho$. We discuss each case based on different choices of $\rho$. If $\rho<2$, for all cases,

$$C_1(x)=\frac{1}{d+2}\Big[\frac{1}{2}\Delta f(x)+\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}\Big]\quad\text{and}\quad C_2(x)=0,\qquad(2.5.11)$$

which comes from the fact that when $\varepsilon^\rho$ is large, $T_{\iota(x)}$ is small, and hence $K_{\mathrm{LLE}}$ is dominated by $1$. Note that not only the Laplace-Beltrami operator but also the p.d.f. is involved if the sampling is non-uniform. Therefore, when $\rho$ is chosen too small, the resulting asymptotic operator is the Laplace-Beltrami operator only when the sampling is uniform. If $\rho=3$, for all cases we have

$$C_1(x)=\frac{1}{2(d+2)}\Delta f(x)\quad\text{and}\quad C_2(x)=0.\qquad(2.5.12)$$

In this case, we recover the Laplace-Beltrami operator, and the asymptotic result of LLE is independent of the non-uniform p.d.f. This theoretical finding partially explains why such regularization could lead to a good result. If $\rho>4$, since $\varepsilon^{d+\rho}$ is smaller than all eigenvalues of the local covariance matrix, asymptotically $\varepsilon^{d+\rho}$ is negligible and the result depends on the different cases considered in Condition 2.4.1: for Case 0, we have

$$C_1(x)=\frac{1}{2(d+2)}\Delta f(x)\quad\text{and}\quad C_2(x)=0\,;$$

for Case 1, we have

$$C_1(x)=\frac{\frac{1}{2(d+2)}\Delta f(x)}{1-\frac{d^2}{4(d+2)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)^2}{\lambda_i^{(2)}}},\qquad C_2(x)=-\frac{\frac{d}{8(d+4)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)(H_f^\top(x)e_i)}{\lambda_i^{(2)}}}{\frac{1}{d}-\frac{d}{4(d+2)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)^2}{\lambda_i^{(2)}}},$$

and for Case 2, we have

$$C_1(x)=\frac{\frac{1}{2(d+2)}\Delta f(x)}{1-\frac{d^2}{4(d+2)}\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)^2}{\lambda_i^{(2)}}},\qquad C_2(x)=-\frac{\frac{d}{8(d+4)}\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)(H_f^\top(x)e_i)}{\lambda_i^{(2)}}}{\frac{1}{d}-\frac{d}{4(d+2)}\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)^2}{\lambda_i^{(2)}}}.$$

Note that when $\rho>4$, we do not get the Laplace-Beltrami operator asymptotically in Cases 1 and 2. Furthermore, the behavior of LLE is dominated by the curvature and is independent of the p.d.f.

It is worth mentioning a specific situation when $\rho>4$. Suppose the principal curvatures are all equal to some $\kappa\in\mathbb{R}$ in the directions $e_i$, where $i=d+1,\ldots,p$, and vanish in the other directions. Then there is a choice of basis $e_1,\ldots,e_d$ so that $\mathrm{II}_x(\theta,\theta)\cdot e_i=\sum_{j=1}^{d}\kappa\theta_j^2=\kappa$, where $\theta=(\theta_1,\ldots,\theta_d)\in S^{d-1}$. Under this specific situation, by a direct expansion, we have the simplification

$$\frac{d}{8(d+4)}\big(N_0^\top(x)e_i\big)\big(H_f^\top(x)e_i\big)=\frac{1}{2(d+2)}\Delta f(x),$$

which leads to $C_1(x)+C_2(x)=0$. Therefore, asymptotically we obtain a fourth order term.

We mention that the statement "suppose $\varepsilon$ is sufficiently small" in Proposition 2.3.1, Proposition 2.3.2 and Theorem 2.5.1 is a technical condition needed in the proof of Lemma 2.2.3, which describes how well we can estimate the local geodesic distance by the ambient space metric. This technical condition depends on the fact that the exponential map is a diffeomorphism only when it is restricted to a subset of $\iota_*T_xM$ that is bounded by the injectivity radius of the manifold; that is, $\varepsilon$ needs to be less than the injectivity radius. For any closed (compact without boundary) smooth manifold, the various curvatures are bounded and the injectivity radius is strictly positive, so there exists $\varepsilon_0>0$ less than the injectivity radius such that for all $\varepsilon\le\varepsilon_0$ the statement "suppose $\varepsilon$ is sufficiently small" is satisfied. The relationship between the curvature and $\varepsilon_0$ can be further elaborated by quoting the well known result in [16]: for a closed Riemannian manifold of dimension $d$ with sectional curvature bounded by $K\ge0$ and volume bounded below by $v>0$, the injectivity radius is bounded below by $i(d,K,v)>0$, where $i(d,K,v)$ can be expressed explicitly in terms of $d$, $K$ and $v$. Hence, $\varepsilon_0$ needs to satisfy $\varepsilon_0<i(d,K,v)$.

2.5.1 Proof of Theorem 2.5.1

We need the following proposition for the proof.

Proposition 2.5.1. Suppose $l=\mathrm{nullity}(M_{22}^{(2)})>0$ and Assumptions 2.1.2 and 2.3.1 hold. Then $\langle\mathrm{II}_x(\theta,\theta),e_i\rangle=0$ for $p-l+1\le i\le p$. Moreover, for $m,n=p-l+1,\ldots,p$, we have

$$\big[M_{22,22}^{(4)}-2M_{21,2}^{(2)}M_{12,2}^{(2)}\big]_{m-p+l,\,n-p+l}=\frac{d(d+2)}{36(d+6)|S^{d-1}|}\int_{S^{d-1}}\langle\nabla_\theta\mathrm{II}_x(\theta,\theta),e_m\rangle\langle\nabla_\theta\mathrm{II}_x(\theta,\theta),e_n\rangle d\theta$$
$$-\frac{d^2(d+2)^2}{18|S^{d-1}|^2(d+4)^2}\sum_{k=1}^{d}\int_{S^{d-1}}\langle\nabla_\theta\mathrm{II}_x(\theta,\theta),e_m\rangle\langle\iota_*\theta,e_k\rangle d\theta\int_{S^{d-1}}\langle\iota_*\theta,e_k\rangle\langle\nabla_\theta\mathrm{II}_x(\theta,\theta),e_n\rangle d\theta.$$

This proposition essentially says that if $\mathrm{nullity}(M_{22}^{(2)})=l>0$ and $M_{22}^{(2)}$ is diagonalized as in (2.3.12), then geometrically $e_{p-l+1},\ldots,e_p$ are perpendicular to the second fundamental form $\mathrm{II}_x(\theta,\theta)$. Furthermore, the eigenvalues of order $\varepsilon^{d+6}$ in Case 2 of Proposition 2.3.2 depend only on the third order derivative of the embedding, $\nabla_\theta\mathrm{II}_x(\theta,\theta)$, in those directions.

Proof. Suppose $l=\mathrm{nullity}(M_{22}^{(2)})>0$. By Assumption 2.3.1, $M_{22}^{(2)}$ is diagonalized as in (2.3.12). Therefore, based on (2.3.6) and (2.3.10), we have

$$\int_{S^{d-1}}\langle\mathrm{II}_x(\theta,\theta),e_m\rangle\langle\mathrm{II}_x(\theta,\theta),e_m\rangle d\theta=0,\qquad(2.5.13)$$

where $m=p-l+1,\ldots,p$.

If we denote $\theta=\theta^i\partial_i\in S^{d-1}\subset T_xM$, the following expression for the second fundamental form holds:

$$\langle\mathrm{II}_x(\theta,\theta),e_m\rangle=\sum_{i=1}^{d}p_{ii}^m(\theta^i)^2+2\sum_{i<j}p_{ij}^m\theta^i\theta^j,\qquad(2.5.14)$$


where $p_{ij}^m=\langle\mathrm{II}_x(\partial_i,\partial_j),e_m\rangle\in\mathbb{R}$, $i,j=1,\ldots,d$, are the corresponding coefficients. Note that $\iota_*\partial_i=e_i$ for $i=1,\ldots,d$. By plugging (2.5.14) into (2.5.13), we have

$$0=\int_{S^{d-1}}\Big[\Big(\sum_{i=1}^{d}p_{ii}^m(\theta^i)^2\Big)^2+4\sum_{k=1}^{d}p_{kk}^m(\theta^k)^2\sum_{i<j}p_{ij}^m\theta^i\theta^j+4\Big(\sum_{i<j}p_{ij}^m\theta^i\theta^j\Big)^2\Big]d\theta$$
$$=\frac{|S^{d-1}|}{d(d+2)}\Big(3\sum_{i=1}^{d}(p_{ii}^m)^2+2\sum_{i<j}p_{ii}^mp_{jj}^m+4\sum_{i<j}(p_{ij}^m)^2\Big)$$
$$=\frac{|S^{d-1}|}{d(d+2)}\Big(2\sum_{i=1}^{d}(p_{ii}^m)^2+\Big(\sum_{i=1}^{d}p_{ii}^m\Big)^2+4\sum_{i<j}(p_{ij}^m)^2\Big),$$

which leads to the conclusion that $p_{ij}^m=0$ for all $i$ and $j$. To get the expansion of $\big[M_{22,22}^{(4)}-2M_{21,2}^{(2)}M_{12,2}^{(2)}\big]_{m-p+l,\,n-p+l}$, we directly plug the above formula into (2.3.5) and (2.3.7) and get the claim.

Recall the definition of $T_{\iota(x)}=\mathcal{I}_{\varepsilon^{d+\rho}}(C_x)\big[\mathbb{E}(X-\iota(x))\chi_{B_\varepsilon^{\mathbb{R}^p}(x)}\big]$ in (2.4.9), which can be expanded as

$$T_{\iota(x)}=\sum_{i=1}^{r}\frac{\mathbb{E}[(X-\iota(x))\chi_{B_\varepsilon^{\mathbb{R}^p}(\iota(x))}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}\,u_i\in\mathbb{R}^p,$$

where $r$ is the rank of $C_x$ and $u_i$ and $\lambda_i$ form the $i$-th eigen-pair of $C_x$. Clearly, $T_{\iota(x)}$ is dominated by the "small" eigenvalues of $C_x$.

Define the following notation to simplify the statement of the next lemma:

$$J:=J_{p,p-d}J_{p-d,p-d-l}\in\mathbb{R}^{p\times(p-d-l)}.\qquad(2.5.15)$$

Lemma 2.5.1. Fix $x\in M$ and assume Assumptions 2.1.2 and 2.3.1 hold. Suppose $\varepsilon$ is sufficiently small. Following the same notation used in Proposition 2.3.2, under the three cases shown in Condition 2.4.1, $T_{\iota(x)}$ satisfies:

Case 0. $T_{\iota(x)}=[[v_1,\ v_2]]+[[O(\varepsilon^2),\ 0]]$, where

$$v_1=\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}},\qquad v_2=0.$$

Case 1. $T_{\iota(x)}=[[v_1,\ v_2]]+[[O(\varepsilon^2),\ O(1)]]$, where

$$v_1=\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\,X_1S_{12}J_{p,p-d}^\top e_i,$$

$$v_2=\frac{1}{\varepsilon^2}\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\,X_2J_{p,p-d}^\top e_i,$$

and $N_0(x)$ is defined in (2.5.3).


Case 2. $T_{\iota(x)}=[[v_1,\ v_2]]+[[O(\varepsilon^2),\ O(1)]]$, where

$$v_1=\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\,X_1S_{12,1}J^\top e_i+\sum_{i=p-l+1}^{p}\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}\,X_1S_{12,2}J_{p,l}^\top e_i,\qquad(2.5.16)$$

$$v_2=\frac{1}{\varepsilon^2}\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\begin{bmatrix}X_{2,1}&0\\0&X_{2,2}\end{bmatrix}J_{p,p-d}^\top e_i+\frac{1}{\varepsilon^2}\sum_{i=p-l+1}^{p}\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}\begin{bmatrix}X_{2,1}&0\\0&X_{2,2}\end{bmatrix}J_{p,p-d}^\top e_i,\qquad(2.5.17)$$

where $\alpha_i\in\mathbb{R}$ is defined in (2.5.19).

Proof. We prove the lemma case by case, and we will recycle the equations shown in Lemma 2.2.5. Note that although the eigenvectors of $C_x$ might not be unique, we will see that the result is independent of the choice of the eigenvectors.

Case 0 in Condition 2.4.1. In this case, by Proposition 2.3.2, denote the $i$-th eigenvector of $C_x$ as

$$u_i=\begin{bmatrix}X_1J_{p,d}^\top e_i+O(\varepsilon^2)\\0_{(p-d)\times1}\end{bmatrix},$$

where $i=1,\ldots,d$ and $X_1\in O(d)$, and the corresponding eigenvalue as $\lambda_i=\frac{|S^{d-1}|P(x)}{d(d+2)}\varepsilon^{d+2}+O(\varepsilon^{d+4})$. By Lemma 2.2.5 and Lemma 2.3.2, for $1\le i\le d$ we have

$$\mathbb{E}[(X-\iota(x))\chi_{B_\varepsilon^{\mathbb{R}^p}(x_k)}(X)]\cdot u_i=\frac{|S^{d-1}|}{d+2}\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{d}\varepsilon^{d+2}+O(\varepsilon^{d+4}),\ \frac{P(x)J_{p,p-d}^\top N_0(x)}{2}\varepsilon^{d+2}+O(\varepsilon^{d+4})\Big]\!\Big]\cdot\big[\big[X_1J_{p,d}^\top e_i+O(\varepsilon^2),\,0\big]\big]=\frac{|S^{d-1}|}{d(d+2)}u_i^\top\iota_*\nabla P(x)\,\varepsilon^{d+2}+O(\varepsilon^{d+4}),$$

where the last equality comes from the fact that $\langle J_{p,d}^\top\iota_*\nabla P(x),\,X_1J_{p,d}^\top e_i\rangle=e_i^\top J_{p,d}X_1^\top J_{p,d}^\top\iota_*\nabla P(x)=u_i^\top\iota_*\nabla P(x)$. Thus,

$$\frac{\mathbb{E}[(X-\iota(x))\chi_{B_\varepsilon^{\mathbb{R}^p}(x_k)}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}=\frac{\frac{|S^{d-1}|}{d(d+2)}u_i^\top\iota_*\nabla P(x)\,\varepsilon^{d+2}+O(\varepsilon^{d+4})}{\frac{P(x)|S^{d-1}|}{d(d+2)}\varepsilon^{d+2}+\varepsilon^{d+\rho}+O(\varepsilon^{d+4})}=\frac{u_i^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+O(\varepsilon^2),$$

where the last expansion holds for every chosen regularization order $\rho$. Specifically, when $\rho>2$ it is trivial; when $\rho\le2$,

$$\frac{u_i^\top\iota_*\nabla P(x)+O(\varepsilon^2)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}+O(\varepsilon^2)}-\frac{u_i^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}$$

is of order smaller than $\varepsilon^2$ since the denominator is dominated by


$\varepsilon^{\rho-2}$. Hence, since the $u_i$ form an orthonormal set, we have

$$T_{\iota(x)}=\sum_{i=1}^{d}\frac{\mathbb{E}[(X-\iota(x))\chi_{B_\varepsilon^{\mathbb{R}^p}(x_k)}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}\,u_i=\sum_{i=1}^{d}\Big(\frac{u_i^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+O(\varepsilon^2)\Big)\big[\big[X_1J_{p,d}^\top e_i+O(\varepsilon^2),\ 0\big]\big]=\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}},\ 0\Big]\!\Big]+\big[\big[O(\varepsilon^2),\ 0\big]\big].$$

Case 1 in Condition 2.4.1. The eigenvalues of $C_x$ are $\lambda_i=\frac{|S^{d-1}|P(x)}{d(d+2)}\big(\varepsilon^{d+2}+\lambda_i^{(2)}\varepsilon^{d+4}+O(\varepsilon^{d+6})\big)$ for $i=1,\ldots,d$ and $\lambda_i=\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(2)}\varepsilon^{d+4}+O(\varepsilon^{d+6})$ for $i=d+1,\ldots,p$. The eigenvectors of $C_x$ are

$$u_i=\begin{bmatrix}X_1J_{p,d}^\top e_i\\0_{(p-d)\times1}\end{bmatrix}+\varepsilon^2U_x(0)Se_i+O(\varepsilon^4)=\big[\big[X_1J_{p,d}^\top e_i+O(\varepsilon^2),\ O(\varepsilon^2)\big]\big]$$

for $i=1,\ldots,d$, where $U_x(0)=\begin{bmatrix}X_1&0\\0&X_2\end{bmatrix}\in O(p)$, and

$$u_i=\begin{bmatrix}0_{d\times1}\\X_2J_{p,p-d}^\top e_i\end{bmatrix}+\varepsilon^2U_x(0)Se_i+O(\varepsilon^4)=\big[\big[J_{p,d}^\top U_x(0)Se_i\varepsilon^2+O(\varepsilon^4),\ X_2J_{p,p-d}^\top e_i+O(\varepsilon^2)\big]\big]$$

for $i=d+1,\ldots,p$, where $X_1\in O(d)$ and $X_2\in O(p-d)$. For $1\le i\le d$, we have

$$\mathbb{E}[(X-\iota(x))\chi_{B_\varepsilon^{\mathbb{R}^p}(x_k)}(X)]\cdot u_i=\frac{|S^{d-1}|}{d+2}\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{d}\varepsilon^{d+2}+O(\varepsilon^{d+4}),\ \frac{P(x)J_{p,p-d}^\top N_0(x)}{2}\varepsilon^{d+2}+O(\varepsilon^{d+4})\Big]\!\Big]\cdot\big[\big[X_1J_{p,d}^\top e_i+O(\varepsilon^2),\ O(\varepsilon^2)\big]\big]=\frac{|S^{d-1}|}{d(d+2)}\big(\iota_*\nabla P(x)\big)^\top J_{p,d}X_1J_{p,d}^\top e_i\,\varepsilon^{d+2}+O(\varepsilon^{d+4}),$$

and hence

$$\frac{\mathbb{E}[(X-\iota(x))\chi_{B_\varepsilon^{\mathbb{R}^p}(x_k)}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}\,u_i=\frac{\frac{|S^{d-1}|}{d(d+2)}\big(\iota_*\nabla P(x)\big)^\top J_{p,d}X_1J_{p,d}^\top e_i\,\varepsilon^{d+2}+O(\varepsilon^{d+4})}{\frac{|S^{d-1}|P(x)}{d(d+2)}\varepsilon^{d+2}+\varepsilon^{d+\rho}+\lambda_i^{(2)}\varepsilon^{d+4}+O(\varepsilon^{d+6})}\big[\big[X_1J_{p,d}^\top e_i+O(\varepsilon^2),\ O(\varepsilon^2)\big]\big]=\Big[\!\Big[\frac{\big(\iota_*\nabla P(x)\big)^\top J_{p,d}X_1J_{p,d}^\top e_i}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\,X_1J_{p,d}^\top e_i,\ 0\Big]\!\Big]+\big[\big[O(\varepsilon^2),\ O(\varepsilon^2)\big]\big].$$


Since the columns of $J_{p,d}X_1$ form an orthonormal basis of $\iota_*T_xM$, we have
$$\sum_{i=1}^{d}\frac{\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}\,u_i
=\sum_{i=1}^{d}\Big[\!\Big[\frac{(\iota_*\nabla P(x))^\top J_{p,d}X_1J_{p,d}^\top e_i}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}X_1J_{p,d}^\top e_i,\ 0\Big]\!\Big]+\big[\!\big[O(\varepsilon^2),\,O(\varepsilon^2)\big]\!\big]
=\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+O(\varepsilon^2),\ O(\varepsilon^2)\Big]\!\Big].$$

For $d+1\le i\le p$, similarly we have
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]\cdot u_i
=\Big[\!\Big[\tfrac{|S^{d-1}|}{d(d+2)}J_{p,d}^\top\iota_*\nabla P(x)\varepsilon^{d+2}+O(\varepsilon^{d+4}),\ \tfrac{|S^{d-1}|}{2(d+2)}P(x)J_{p,p-d}^\top N_0(x)\varepsilon^{d+2}+O(\varepsilon^{d+4})\Big]\!\Big]\cdot\big[\!\big[J_{p,d}^\top U_x(0)Se_i\varepsilon^2+O(\varepsilon^4),\,X_2J_{p,p-d}^\top e_i+O(\varepsilon^2)\big]\!\big]
=\frac{|S^{d-1}|}{2(d+2)}P(x)N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i\,\varepsilon^{d+2}+O(\varepsilon^{d+4}),$$
and hence
$$\frac{\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}\,u_i
=\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i\,\varepsilon^{d+2}+O(\varepsilon^{d+4})}{\frac{2}{d}\lambda_i^{(2)}\varepsilon^{d+4}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{d+\rho}+O(\varepsilon^{d+6})}\big[\!\big[J_{p,d}^\top U_x(0)Se_i\varepsilon^2+O(\varepsilon^4),\,X_2J_{p,p-d}^\top e_i+O(\varepsilon^2)\big]\!\big]
=\Big[\!\Big[\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}J_{p,d}^\top U_x(0)Se_i,\ \frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_2J_{p,p-d}^\top e_i\,\frac{1}{\varepsilon^2}\Big]\!\Big]+\big[\!\big[O(\varepsilon^2),\,O(1)\big]\!\big].$$

As a result, by the fact that $J_{p,d}^\top U_x(0)Se_i=X_1S_{12}J_{p,p-d}^\top e_i$ when $i=d+1,\dots,p$, in this case we have
$$T_{\iota(x)}=\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_1S_{12}J_{p,p-d}^\top e_i,\ \frac{1}{\varepsilon^2}\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_2J_{p,p-d}^\top e_i\Big]\!\Big]+\big[\!\big[O(\varepsilon^2),\,O(1)\big]\!\big].$$

Case 2 in Condition 2.4.1. In this case, the eigenvalues of $C_x$ are
$$\lambda_i=\begin{cases}\frac{|S^{d-1}|P(x)}{d(d+2)}\varepsilon^{d+2}+O(\varepsilon^{d+4})&\text{for }i=1,\dots,d,\\[4pt] \frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(2)}\varepsilon^{d+4}+O(\varepsilon^{d+6})&\text{for }i=d+1,\dots,p-l,\\[4pt] \frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}\varepsilon^{d+6}+O(\varepsilon^{d+8})&\text{for }i=p-l+1,\dots,p.\end{cases}$$


Adapt notations from Proposition 2.3.2 and use
$$U_x(0)=\begin{bmatrix}X_1&0&0\\0&X_{2,1}&0\\0&0&X_{2,2}\end{bmatrix}\in O(p),\qquad S=\begin{bmatrix}S_{11}&S_{12,1}&S_{12,2}\\S_{21,1}&S_{22,11}&S_{22,12}\\S_{21,2}&S_{22,21}&S_{22,22}\end{bmatrix}\in o(p),$$
where $X_1\in O(d)$, $X_{2,1}\in O(p-d-l)$ and $X_{2,2}\in O(l)$. The eigenvectors of $C_x$, on the other hand, are $u_i=\begin{bmatrix}X_1J_{p,d}^\top e_i\\0_{(p-d)\times1}\end{bmatrix}+\varepsilon^2U_x(0)Se_i+O(\varepsilon^4)$ for $i=1,\dots,d$, $u_i=\begin{bmatrix}0_{d\times1}\\X_{2,1}J^\top e_i\end{bmatrix}+\varepsilon^2U_x(0)Se_i+O(\varepsilon^4)$ for $i=d+1,\dots,p-l$, and $u_i=\begin{bmatrix}0_{d\times1}\\X_{2,2}J_{p,l}^\top e_i\end{bmatrix}+\varepsilon^2U_x(0)Se_i+O(\varepsilon^4)$ for $i=p-l+1,\dots,p$.

Similar to Case 1, we could evaluate $\frac{\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}u_i$ for $i=1,\dots,p-l$, and have
$$\sum_{i=1}^{p-l}\frac{\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}u_i
=\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_1S_{12,1}J^\top e_i,\ \frac{1}{\varepsilon^2}\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_{2,1}J^\top e_i\Big]\!\Big]+\big[\!\big[O(\varepsilon^2),\,O(1)\big]\!\big].$$

For $p-l+1\le i\le p$, based on Proposition 3.2.3, we have
$$N_0^\top(x)J_{p,p-d}\begin{bmatrix}X_{2,1}&0\\0&X_{2,2}\end{bmatrix}J_{p,p-d}^\top e_i=N_0^\top(x)J_{p,l}X_{2,2}J_{p,l}^\top e_i=0,\qquad(2.5.18)$$
and hence
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]\cdot u_i
=\Big[\!\Big[\tfrac{|S^{d-1}|}{d(d+2)}J_{p,d}^\top\iota_*\nabla P(x)\varepsilon^{d+2}+O(\varepsilon^{d+4}),\ \tfrac{|S^{d-1}|}{2(d+2)}P(x)J_{p,p-d}^\top N_0(x)\varepsilon^{d+2}+O(\varepsilon^{d+4})\Big]\!\Big]\cdot\big[\!\big[X_1S_{12,2}J_{p,l}^\top e_i\varepsilon^2+O(\varepsilon^4),\ X_{2,2}J_{p,l}^\top e_i+O(\varepsilon^2)\big]\!\big]
=\alpha_i\varepsilon^{d+4}+O(\varepsilon^{d+6}),\qquad(2.5.19)$$
where we use the fact that $J_{p,d}^\top U_x(0)Se_i=X_1S_{12,2}J_{p,l}^\top e_i$ when $i=p-l+1,\dots,p$, and $\alpha_i\in\mathbb{R}$ is the coefficient of the order $\varepsilon^{d+4}$ term. Note that the $\varepsilon^{d+2}$ term disappears due to (2.5.18). We mention that since $\alpha_i$ will be canceled


out in the main theorem, we do not spell it out explicitly. Therefore,
$$\sum_{i=p-l+1}^{p}\frac{\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]\cdot u_i}{\lambda_i+\varepsilon^{d+\rho}}u_i
=\sum_{i=p-l+1}^{p}\frac{\alpha_i\varepsilon^{d+4}+O(\varepsilon^{d+6})}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}\varepsilon^{d+6}+\varepsilon^{d+\rho}+O(\varepsilon^{d+8})}\big[\!\big[X_1S_{12,2}J_{p,l}^\top e_i\varepsilon^2+O(\varepsilon^4),\ X_{2,2}J_{p,l}^\top e_i+O(\varepsilon^2)\big]\!\big]
=\sum_{i=p-l+1}^{p}\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}\Big[\!\Big[X_1S_{12,2}J_{p,l}^\top e_i,\ X_{2,2}J_{p,l}^\top e_i\,\frac{1}{\varepsilon^2}\Big]\!\Big]+\big[\!\big[O(\varepsilon^2),\,O(1)\big]\!\big].$$

As a result, in this case we have
$$T_{\iota(x)}=[[v_1,\,v_2]]+[[O(\varepsilon^2),\,O(1)]],$$
where
$$v_1=\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_1S_{12,1}J^\top e_i+\sum_{i=p-l+1}^{p}\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}X_1S_{12,2}J_{p,l}^\top e_i$$
and
$$v_2=\frac{1}{\varepsilon^2}\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_{2,1}J^\top e_i+\frac{1}{\varepsilon^2}\sum_{i=p-l+1}^{p}\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}X_{2,2}J_{p,l}^\top e_i.$$


We introduce the following notations to simplify the proof:
$$\omega(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\|\mathrm{II}_x(\theta,\theta)\|^2\,d\theta,\qquad
N_1(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\|\mathrm{II}_x(\theta,\theta)\|^2\,\mathrm{II}_x(\theta,\theta)\,d\theta,$$
$$N_2(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\mathrm{II}_x(\theta,\theta)\,\mathrm{Ric}_x(\theta,\theta)\,d\theta,\qquad
M_1(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\|\mathrm{II}_x(\theta,\theta)\|^2\,\theta\theta^\top d\theta,$$
$$M_2(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\mathrm{II}_x(\theta,\theta)\,\theta\theta^\top d\theta,\qquad
R_0(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\theta\,\nabla_\theta\mathrm{II}_x(\theta,\theta)\cdot\mathrm{II}_x(\theta,\theta)\,d\theta,$$
$$R_1(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\nabla_\theta\mathrm{II}_x(\theta,\theta)\,\theta^\top d\theta,\qquad
R_2(x):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\nabla_{\theta\theta}\mathrm{II}_x(\theta,\theta)\,d\theta.$$

For $f\in C^3(\iota(M))$ and $P\in C^5(M)$, define
$$\Omega_f:=\frac{1}{2}\nabla f(x)^\top M_2(x)\nabla P(x)+\frac{1}{4}P(x)\,\mathrm{tr}\big(M_2(x)\nabla^2f(x)\big)+\frac{1}{6}P(x)R_1(x)\nabla f(x)\qquad(2.5.20)$$
and
$$J_f(x):=\frac{1}{|S^{d-1}|}\,\iota_*\int_{S^{d-1}}\theta\Big(\frac{1}{6}\nabla^3_{\theta,\theta,\theta}f(x)P(x)+\frac{1}{6}\nabla^3_{\theta,\theta,\theta}P(x)f(x)+\frac{1}{2}\nabla^2_{\theta,\theta}f(x)\nabla_\theta P(x)+\frac{1}{2}\nabla^2_{\theta,\theta}P(x)\nabla_\theta f(x)-\frac{1}{6}\mathrm{Ric}_x(\theta,\theta)\big[f(x)\nabla_\theta P(x)+\nabla_\theta f(x)P(x)\big]\Big)d\theta.$$

We prepare some calculations. By Lemma 2.2.5, we have
$$\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\frac{|S^{d-1}|}{d}P(x)\varepsilon^d+\frac{|S^{d-1}|}{d(d+2)}\Big[\frac{1}{2}\Delta P(x)+\frac{s(x)P(x)}{6}+\frac{d(d+2)\,\omega(x)P(x)}{24}\Big]\varepsilon^{d+2}+O(\varepsilon^{d+3}),\qquad(2.5.21)$$
and hence
$$\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\mathbb{E}[f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]-f(x)\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\frac{|S^{d-1}|}{d(d+2)}\Big[\frac{1}{2}P(x)\Delta f(x)+\nabla f(x)\cdot\nabla P(x)\Big]\varepsilon^{d+2}+O(\varepsilon^{d+3}).\qquad(2.5.22)$$

Similarly, by Lemma 2.2.5, we have
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=[[v_1,v_2]],\qquad(2.5.23)$$


where
$$v_1=\frac{|S^{d-1}|}{d(d+2)}J_{p,d}^\top\iota_*\nabla P(x)\,\varepsilon^{d+2}+\frac{|S^{d-1}|}{24}J_{p,d}^\top\iota_*\big(M_1(x)\nabla P(x)+P(x)R_0(x)\big)\varepsilon^{d+4}+\frac{|S^{d-1}|}{d+4}J_{p,d}^\top\Big[J_1(x)+\frac{1}{6}R_1(x)\nabla P(x)+\frac{1}{24}P(x)R_2(x)\Big]\varepsilon^{d+4}+O(\varepsilon^{d+5})$$
and
$$v_2=\frac{|S^{d-1}|}{2(d+2)}P(x)J_{p,p-d}^\top N_0(x)\,\varepsilon^{d+2}+\frac{|S^{d-1}|}{24}P(x)J_{p,p-d}^\top N_1(x)\,\varepsilon^{d+4}+\frac{|S^{d-1}|}{d+4}J_{p,p-d}^\top\Big[\frac{1}{4}\mathrm{tr}\big(M_2(x)\nabla^2P(x)\big)-\frac{1}{12}P(x)N_2(x)\Big]\varepsilon^{d+4}+\frac{|S^{d-1}|}{6(d+4)}J_{p,p-d}^\top\Big[R_1(x)\nabla P(x)+\frac{1}{4}P(x)R_2(x)\Big]\varepsilon^{d+4}+O(\varepsilon^{d+5}).$$

Again, by Lemma 2.2.5, we have
$$\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\mathbb{E}[(X-\iota(x))f(X)\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]-f(x)\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=[[v_1,v_2]],\qquad(2.5.24)$$
where
$$v_1=\frac{|S^{d-1}|}{d(d+2)}P(x)J_{p,d}^\top\iota_*\nabla f(x)\,\varepsilon^{d+2}+\frac{|S^{d-1}|}{24}P(x)J_{p,d}^\top\iota_*M_1(x)\nabla f(x)\,\varepsilon^{d+4}+\frac{|S^{d-1}|}{d+4}J_{p,d}^\top\Big[J_f(x)-f(x)J_1(x)+\frac{1}{6}P(x)R_1(x)\nabla f(x)\Big]\varepsilon^{d+4}+O(\varepsilon^{d+5})$$
and
$$v_2=\frac{|S^{d-1}|}{d+4}J_{p,p-d}^\top\Big(\frac{1}{2}\nabla f(x)^\top M_2(x)\nabla P(x)+\frac{1}{4}P(x)\,\mathrm{tr}\big(M_2(x)\nabla^2f(x)\big)+\frac{1}{6}P(x)R_1(x)\nabla f(x)\Big)\varepsilon^{d+4}+O(\varepsilon^{d+5})
=\frac{|S^{d-1}|}{d+4}J_{p,p-d}^\top\Omega_f\,\varepsilon^{d+4}+O(\varepsilon^{d+5}).$$

With the above preparation, we are ready to prove Theorem 2.5.1.

Proof of Theorem 2.5.1. The proof is straightforward, and we show it case by case.

Case 0 in Condition 2.4.1. In this case, by Lemma 2.5.1, (2.5.24), and (2.5.23),

$$T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\frac{|S^{d-1}|}{d(d+2)}\frac{P(x)\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\varepsilon^{d+2}+O(\varepsilon^{d+4})$$
and
$$T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\frac{|S^{d-1}|}{d(d+2)}\frac{\nabla P(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\varepsilon^{d+2}+O(\varepsilon^{d+4}),$$


and hence
$$\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]-T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\frac{|S^{d-1}|}{d(d+2)}\Big[\frac{1}{2}P(x)\Delta f(x)+\nabla f(x)\cdot\nabla P(x)-\frac{P(x)\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big]\varepsilon^{d+2}+O(\varepsilon^{d+4}).$$
Note that $T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ is of order $O(\varepsilon^{d+2})$ for any $\rho$; therefore
$$\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]-T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\frac{|S^{d-1}|}{d}P(x)\varepsilon^d+O(\varepsilon^{d+2}).$$
As a result, we conclude that
$$Qf(x)-f(x)=\frac{1}{d+2}\Big[\frac{1}{2}\Delta f(x)+\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}-\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big]\varepsilon^2+O(\varepsilon^4).$$
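As a consistency check (our own remark, not in the source), note how the regularization order interacts with this expansion: for any $\rho>2$ the term $\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}$ vanishes as $\varepsilon\to0$, so the two gradient terms cancel and the Laplace-Beltrami operator is recovered:

```latex
% For \rho > 2 we have \varepsilon^{\rho-2} \to 0, hence
\lim_{\varepsilon\to 0}
  \Big[\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}
     - \frac{\nabla f(x)\cdot\nabla P(x)}
            {P(x)+\frac{d(d+2)}{|S^{d-1}|}\,\varepsilon^{\rho-2}}\Big] = 0,
\qquad\text{so}\qquad
Qf(x)-f(x)=\frac{\varepsilon^{2}}{2(d+2)}\,\Delta f(x)+o(\varepsilon^{2}).
```

For $\rho<2$, by contrast, the second fraction vanishes instead and the gradient term $\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}$ survives, so the limiting operator acquires a drift when the sampling is non-uniform.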

Case 1 in Condition 2.4.1. Observe that by Lemma 2.5.1, the tangential component of $T_{\iota(x)}$ is of order $O(1)$ and the normal component of $T_{\iota(x)}$ is of order $O(\frac{1}{\varepsilon^2})$. Hence, by (2.5.23),
$$T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\frac{|S^{d-1}|P(x)}{2(d+2)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\,\varepsilon^d+O(\varepsilon^{d+2}),$$
and hence
$$\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]-T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\Big[\frac{|S^{d-1}|}{d}P(x)-\frac{|S^{d-1}|P(x)}{2(d+2)}\sum_{i=d+1}^{p}\frac{(N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\Big]\varepsilon^d+O(\varepsilon^{d+2}).$$

Similarly, by (2.5.24),
$$T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\Big[\frac{|S^{d-1}|P(x)}{d(d+2)}\Big(\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\nabla f(x)^\top J_{p,d}X_1S_{12}J_{p,p-d}^\top e_i\Big)+\frac{|S^{d-1}|}{d+4}\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\Omega_f^\top J_{p,p-d}X_2J_{p,p-d}^\top e_i\Big]\varepsilon^{d+2}+O(\varepsilon^{d+3}),$$

which could be significantly simplified. Since $M^{(2)}_{12}$ satisfies (2.3.11), by a direct expansion we have that
$$\nabla f(x)^\top J_{p,d}M^{(2)}_{12}=\frac{d(d+2)}{P(x)(d+4)}\Big(\frac{1}{2}\nabla f(x)^\top M_2(x)\nabla P(x)+\frac{1}{6}P(x)R_1(x)\nabla f(x)\Big)^\top J_{p,p-d}.$$


By (2.3.15), we have $X_1S_{12}=-M^{(2)}_{12}X_2$, and hence
$$\frac{|S^{d-1}|P(x)}{d(d+2)}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\nabla f(x)^\top J_{p,d}X_1S_{12}J_{p,p-d}^\top e_i
=-\frac{|S^{d-1}|}{d+4}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\Big(\frac{1}{2}\nabla f(x)^\top M_2(x)\nabla P(x)+\frac{1}{6}P(x)R_1(x)\nabla f(x)\Big)^\top J_{p,p-d}X_2J_{p,p-d}^\top e_i.$$

Combining this with $\Omega_f$ defined in (2.5.20), the second and third terms in $T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ are simplified. As a result, we have
$$\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]-T_{\iota(x)}^\top\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]
=\Big[\frac{|S^{d-1}|}{d(d+2)}\Big(\frac{1}{2}P(x)\Delta f(x)+\nabla f(x)\cdot\nabla P(x)-\frac{P(x)\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big)-\frac{|S^{d-1}|P(x)}{4(d+4)}\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}H_f^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i\Big]\varepsilon^{d+2}+O(\varepsilon^{d+3}).$$

To finish the proof for Case 1, we claim that
$$\sum_{i=d+1}^{p}\frac{N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}H_f^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i
=\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)(H_f^\top(x)e_i)}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}.$$

Recall (2.3.21). Suppose there are $q+t$ distinct eigenvalues of $M^{(2)}_{22}$, where $q,t\in\mathbb{N}\cup\{0\}$, so that $q$ of them are simple. We have
$$X_2=\begin{bmatrix}I_{q\times q}&0&\cdots&0\\0&X_2^1&\cdots&0\\0&0&\ddots&0\\0&0&\cdots&X_2^t\end{bmatrix},$$
where the $X_2^j$, $j=1,\dots,t$, are orthogonal matrices whose sizes equal the multiplicities of the associated eigenvalues. Suppose $X_2^1\in O(\alpha)$, where $\alpha>1$. Then
$$\sum_{i=d+q}^{d+q+\alpha}(N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i)(H_f^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i)=\sum_{i=d+q}^{d+q+\alpha}(N_0^\top(x)e_i)(H_f^\top(x)e_i),$$
since the left-hand side is the inner product between the projections of $N_0(x)$ and $H_f(x)$ onto the eigenspace. By a similar argument for the other blocks, we conclude the claim. By exactly the same argument we have

$$\sum_{i=d+1}^{p}\frac{(N_0^\top(x)J_{p,p-d}X_2J_{p,p-d}^\top e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}=\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}.$$

In conclusion, we have
$$Qf(x)-f(x)=(C_1(x)+C_2(x))\varepsilon^2+O(\varepsilon^3),$$


where
$$C_1(x)=\frac{\frac{1}{d(d+2)}\Big[\frac{1}{2}\Delta f(x)+\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}-\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big]}{\frac{1}{d}-\frac{1}{2(d+2)}\displaystyle\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}$$
and
$$C_2(x)=-\frac{\frac{1}{4(d+4)}\displaystyle\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)(H_f^\top(x)e_i)}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}{\frac{1}{d}-\frac{1}{2(d+2)}\displaystyle\sum_{i=d+1}^{p}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}.$$

Case 2 in Condition 2.4.1. In this case, by (2.5.16) and (2.5.17), we rewrite $T_{\iota(x)}$ as
$$T_{\iota(x)}=[[v_1,\,v_2]]+[[O(\varepsilon^2),\,O(1)]],$$
where
$$v_1=\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_1S_{12,1}J^\top e_i+\sum_{i=p-l+1}^{p}\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}X_1S_{12,2}J_{p,l}^\top e_i$$
and
$$v_2=\frac{1}{\varepsilon^2}\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_{2,1}J^\top e_i+\frac{1}{\varepsilon^2}\sum_{i=p-l+1}^{p}\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}X_{2,2}J_{p,l}^\top e_i.$$
Note that $\frac{\alpha_i}{\frac{|S^{d-1}|P(x)}{d(d+2)}\lambda_i^{(4)}+\varepsilon^{\rho-6}}$ is of order $1$ or smaller, no matter which regularization order $\rho$ is chosen. Rewrite (2.5.23) up to $O(\varepsilon^{d+4})$ as
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\Big[\!\Big[\frac{|S^{d-1}|}{d(d+2)}J_{p,d}^\top\iota_*\nabla P(x)\,\varepsilon^{d+2}+O(\varepsilon^{d+4}),\ \frac{|S^{d-1}|}{2(d+2)}P(x)J_{p,p-d}^\top N_0(x)\,\varepsilon^{d+2}+O(\varepsilon^{d+4})\Big]\!\Big].$$

We claim that in $\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top T_{\iota(x)}$, the "fourth order" terms, i.e., the terms with $\sum_{i=p-l+1}^{p}$, do not have a dominant contribution asymptotically. We show this by proving that for each $i=p-l+1,\dots,p$, we have
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top\Big[\!\Big[X_1S_{12,2}J_{p,l}^\top e_i,\ \frac{1}{\varepsilon^2}X_{2,2}J_{p,l}^\top e_i\Big]\!\Big]=O(\varepsilon^{d+1})\qquad(2.5.25)$$


and
$$\mathbb{E}[(f(X)-f(x))(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top\Big[\!\Big[X_1S_{12,2}J_{p,l}^\top e_i,\ \frac{1}{\varepsilon^2}X_{2,2}J_{p,l}^\top e_i\Big]\!\Big]=O(\varepsilon^{d+3}).\qquad(2.5.26)$$
Since the tangential direction of $T_{\iota(x)}$ is of order $1$ and the normal direction of $T_{\iota(x)}$ is of order $\varepsilon^{-2}$, it suffices to focus on the normal direction in order to show (2.5.25). By Proposition 3.2.3, the dominant term in the normal direction satisfies
$$N_0^\top(x)J_{p,l}X_{2,2}J_{p,l}^\top e_i=0,$$
and hence (2.5.25) follows.

To show (2.5.26), for each $p-l+1\le i\le p$, by a direct expansion we have
$$\mathbb{E}[(f(X)-f(x))(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top\Big[\!\Big[X_1S_{12,2}J_{p,l}^\top e_i,\ \frac{1}{\varepsilon^2}X_{2,2}J_{p,l}^\top e_i\Big]\!\Big]
=\Big(\frac{|S^{d-1}|}{d(d+2)}P(x)\iota_*\nabla f(x)^\top J_{p,d}X_1S_{12,2}J_{p,l}^\top e_i+\frac{|S^{d-1}|}{d+4}\Omega_f^\top J_{p,l}X_{2,2}J_{p,l}^\top e_i\Big)\varepsilon^{d+2}+O(\varepsilon^{d+3}).$$

Again, it is sufficient to focus on the normal direction. We now claim that
$$\frac{|S^{d-1}|}{d(d+2)}P(x)\iota_*\nabla f(x)^\top J_{p,d}X_1S_{12,2}J_{p,l}^\top e_i+\frac{|S^{d-1}|}{d+4}\Omega_f^\top J_{p,l}X_{2,2}J_{p,l}^\top e_i=0.\qquad(2.5.27)$$
Based on Lemma 2.5.1 and (2.3.23), the first part of (2.5.27) becomes
$$\frac{|S^{d-1}|P(x)}{d(d+2)}\nabla f(x)^\top J_{p,d}X_1S_{12,2}J_{p,l}^\top e_i
=-\frac{|S^{d-1}|P(x)}{d(d+2)}\nabla f(x)^\top J_{p,d}M^{(2)}_{12,2}X_{2,2}J_{p,l}^\top e_i
=-\frac{|S^{d-1}|P(x)}{d(d+2)}\nabla f(x)^\top J_{p,d}M^{(2)}_{12}J_{p,p-d}^\top J_{p,l}X_{2,2}J_{p,l}^\top e_i
=-\frac{|S^{d-1}|}{d+4}\Big(\frac{1}{2}\nabla f(x)^\top M_2(x)\nabla P(x)+\frac{1}{6}P(x)R_1(x)\nabla f(x)\Big)^\top J_{p,l}X_{2,2}J_{p,l}^\top e_i,$$
where the second equality comes from the direct expansion
$$M^{(2)}_{12,2}=M^{(2)}_{12}J_{p,p-d}^\top J_{p,l}$$
and the last equality comes from (2.3.11). For the second part of (2.5.27), based on Proposition 3.2.3, for $p-l+1\le i\le p$ we have
$$\frac{|S^{d-1}|}{d+4}\Omega_f^\top J_{p,l}X_{2,2}J_{p,l}^\top e_i
=\frac{|S^{d-1}|}{d+4}\Big(\frac{1}{2}\nabla f(x)^\top M_2(x)\nabla P(x)+\frac{1}{6}P(x)R_1(x)\nabla f(x)\Big)^\top J_{p,l}X_{2,2}J_{p,l}^\top e_i.$$
Thus, the two terms in (2.5.27) cancel each other and (2.5.26) follows. Based on the above discussion, we know that


$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top T_{\iota(x)}$ is dominated by
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_1S_{12,1}J^\top e_i,\ \frac{1}{\varepsilon^2}\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\begin{bmatrix}X_{2,1}&0\\0&X_{2,2}\end{bmatrix}J_{p,p-d}^\top e_i\Big]\!\Big],$$
which is of order $O(\varepsilon^d)$ by an argument similar to that in Case 1, and $\mathbb{E}[(f(X)-f(x))(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top T_{\iota(x)}$ is dominated by
$$\mathbb{E}[(f(X)-f(x))(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]^\top\Big[\!\Big[\frac{J_{p,d}^\top\iota_*\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}+\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}X_1S_{12,1}J^\top e_i,\ \frac{1}{\varepsilon^2}\sum_{i=d+1}^{p-l}\frac{N_0^\top(x)JX_{2,1}J^\top e_i}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}\begin{bmatrix}X_{2,1}&0\\0&X_{2,2}\end{bmatrix}J_{p,p-d}^\top e_i\Big]\!\Big],$$

which is of order $O(\varepsilon^{d+2})$ by a similar argument as in Case 1. By putting the above together, we conclude that
$$Qf(x)-f(x)=(C_1(x)+C_2(x))\varepsilon^2+O(\varepsilon^3),$$
where
$$C_1(x)=\frac{\frac{1}{d(d+2)}\Big[\frac{1}{2}\Delta f(x)+\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)}-\frac{\nabla f(x)\cdot\nabla P(x)}{P(x)+\frac{d(d+2)}{|S^{d-1}|}\varepsilon^{\rho-2}}\Big]}{\frac{1}{d}-\frac{1}{2(d+2)}\displaystyle\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}$$
and
$$C_2(x)=-\frac{\frac{1}{4(d+4)}\displaystyle\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)(H_f^\top(x)e_i)}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}}{\frac{1}{d}-\frac{1}{2(d+2)}\displaystyle\sum_{i=d+1}^{p-l}\frac{(N_0^\top(x)e_i)^2}{\frac{2}{d}\lambda_i^{(2)}+\frac{2(d+2)}{P(x)|S^{d-1}|}\varepsilon^{\rho-4}}},$$

and hence we finish the proof.

2.6 Conclusion of the chapter: convergence of LLE on closed manifolds

By combining the variation analysis and the bias analysis shown above, we conclude the following pointwise

convergence theorem for LLE, when we have a proper choice of ρ .

Theorem 2.6.1. Take $f\in C^3(\iota(M))$, $\rho=3$, and $\varepsilon=\varepsilon(n)$ so that $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\to0$ and $\varepsilon\to0$ as $n\to\infty$. Then, with probability greater than $1-n^{-2}$, for all $x_k\in\mathcal{X}$,
$$\frac{1}{\varepsilon^2}\Big[\sum_{j=1}^{N}w_k(j)f(x_{k,j})-f(x_k)\Big]=\frac{1}{2(d+2)}\Delta f(x_k)+O(\varepsilon)+O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\Big).$$

Based on the Borel-Cantelli Lemma, it is clear that asymptotically LLE converges almost surely. For practical


purposes, we need to discuss the bandwidth choice when $\rho=3$. Based on the assumption about the relationship between $n$ and $\varepsilon$, we have $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\to0$ as $n\to\infty$, but the convergence rate of $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}$ might be slower than that of $\varepsilon\to0$. Suppose we call a bandwidth "optimal" if it balances the standard deviation and the bias for all cases in Condition 2.4.1; that is, $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\asymp\varepsilon$. We then have $\frac{n}{\log(n)}\asymp\frac{1}{\varepsilon^{d+4}}$, and we can estimate the optimal bandwidth from $n$.
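The balance $n/\log(n)\asymp\varepsilon^{-(d+4)}$ can be inverted to obtain a rule-of-thumb bandwidth from $n$. The following sketch is our own illustration; the proportionality constant `c` is an unspecified tuning parameter, not something the theory pins down:

```python
import math

def optimal_bandwidth(n: int, d: int, c: float = 1.0) -> float:
    """Rule-of-thumb LLE bandwidth from n/log(n) ~ 1/eps^(d+4),
    i.e. eps = c * (log(n)/n)^(1/(d+4)); c is a tuning constant."""
    return c * (math.log(n) / n) ** (1.0 / (d + 4))
```

For instance, on a $d=2$ manifold the bandwidth shrinks like $(\log(n)/n)^{1/6}$ as the sample size grows.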

2.7 ε-radius neighborhood vs. K-nearest neighborhood

In the original article [49], the KNN scheme was proposed for the LLE algorithm. However, the analysis in this thesis has been based on the ε-radius neighborhood scheme. These two schemes are closely related asymptotically from the viewpoint of density function estimation [43]. The following argument shows that the developed theorems are actually transferable to the KNN scheme under the manifold setup.

For $\iota(x_k)\in\mathcal{X}$, take the $K$ nearest neighbors of $\iota(x_k)$, namely $\iota(x_{k,1}),\dots,\iota(x_{k,K})$, with respect to the Euclidean distance. Intuitively, $K$ is closely related to the volume of the minimal ball centered at $x_k$ with radius $\varepsilon(x_k)$ containing the $K$ nearest neighbors of $x_k$, where $\varepsilon(x_k)$ depends on $K$ and the p.d.f.; that is, we expect to have
$$nP(x_k)\,\mathrm{vol}(D_{x_k})\approx K,\qquad(2.7.1)$$
where $D_x:=B^{\mathbb{R}^p}_{\varepsilon(x)}(\iota(x))\cap\iota(M)$ is the minimal ball centered at $x\in M$ with radius $\varepsilon(x)>0$ so that $D_x$ contains the $K$ nearest neighbors of $x$. Under the smoothness assumption on the p.d.f. and the manifold setup, we claim that asymptotically, as $n\to\infty$, this relationship holds uniformly over the manifold a.s., provided $K=K(n)$, $K/\log(n)\to\infty$ and $K/n\to0$ as $n\to\infty$. This claim can be achieved by slightly modifying the argument for the theorem in [20] to obtain a large deviation bound for (2.7.1) when $n$ is finite. To bound $\Pr\{\sup_{x\in M}|\frac{K}{n\,\mathrm{vol}(D_x)}-P(x)|>\alpha\}$, where $\alpha>0$, it is sufficient to bound the two terms on the right-hand side of [20, equation (10)]. By a straightforward calculation of the equations on page 539 in [20], we achieve the bound $\Pr\{\sup_{x\in M}|\frac{K}{n\,\mathrm{vol}(D_x)}-P(x)|>\alpha\}\le\mathrm{poly}(n)e^{-cK\alpha^3}$, where $c$ is a constant depending on $d$ and the upper bound of $P(x)$ on $M$, and $\mathrm{poly}(n)=3(1+2^{p+3}n^{p+3})$.² Therefore, if we choose $\alpha=(\frac{2p+10}{c})^{1/3}(\frac{\log n}{K})^{1/3}$, then with probability greater than $1-n^{-2}$ we have uniformly $\frac{K}{n\,\mathrm{vol}(D_x)}=P(x)+O(\alpha)$. Note that by the assumption, $\alpha\to0$ as $n\to\infty$. We conclude

that with probability greater than $1-n^{-2}$,
$$\varepsilon(x)=\Big(\frac{d}{|S^{d-1}|}\Big)^{1/d}\Big(\frac{K}{nP(x)}\Big)^{1/d}\Big(1+O\Big(\Big(\frac{\log n}{K}\Big)^{1/3}\Big)\Big),\qquad(2.7.2)$$
where we use the fact that $\mathrm{vol}(D_x)=\frac{|S^{d-1}|}{d}\varepsilon(x)^d+O(\varepsilon(x)^{d+1})$ when $\varepsilon(x)$ is sufficiently small. It is transparent that $\varepsilon(x)$ depends on $n$ and that $\varepsilon(x)\to0$ a.s. as $n\to\infty$, since $K(n)/n\to0$ by assumption. In other words, $\varepsilon$ is not a constant value; it is a function depending on the p.d.f. If we require $K=K(n)$ to additionally satisfy $\frac{K(n)}{n}\big(\frac{K(n)}{\log(n)}\big)^{d/2}\to\infty$, then $\varepsilon(x_k)$ satisfies $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon(x)^{d/2+1}}\to0$ a.s. On the other hand, notice that the statement of Theorem 2.6.1 is pointwise. Therefore, its proof can be directly employed in the case when $\varepsilon$ is chosen pointwise, and hence for the KNN scheme. As a result, if we take $\rho=3$, $K/n\to0$, $K/\log(n)\to\infty$, and $(K/n)(K/\log(n))^{d/2}\to\infty$ when

²This can be observed by combining [20, equations (6), (7), (9) and (10)]. The second term on the right-hand side of [20, equation (10)] is dominated by the first term. To bound the first term, we can substitute $\delta=\frac{K\beta}{4n(P_M+\beta)}$ and $M=\frac{4kP_Mn}{\beta}$ into the fourth unlabeled equation on page 539 in [20], where $P_M$ is the upper bound of the p.d.f. In the fourth unlabeled equation, $\alpha$ is the upper bound of the volume ratio of $B^{\mathbb{R}^p}_{2\varepsilon(x)}(\iota(x))\cap\iota(M)$ and $B^{\mathbb{R}^p}_{\varepsilon(x)}(\iota(x))\cap\iota(M)$, which can be chosen as $3^d$ when $\varepsilon(x)$ is sufficiently small. Finally, we use the fact that the equation follows when $\beta$ is small.


$n\to\infty$, then by plugging (2.7.2) into Theorem 2.6.1, when $n$ is sufficiently large, the following convergence holds for all $x_k$ with probability greater than $1-2n^{-2}$:
$$\sum_{j=1}^{K}w_k(j)f(x_{k,j})-f(x_k)=\frac{(\frac{d}{|S^{d-1}|})^{2/d}}{2(d+2)}\frac{\Delta f(x_k)}{P(x_k)^{2/d}}\Big(\frac{K}{n}\Big)^{2/d}+O\Big(\Big(\frac{\log(n)}{K}\Big)^{1/3}\Big(\frac{K}{n}\Big)^{2/d}\Big)+O\Big(\Big(\frac{\log(n)}{K}\Big)^{1/2}\Big(\frac{K}{n}\Big)^{1/d}\Big).\qquad(2.7.3)$$

In summary, unless the sampling is uniform, we do not obtain the Laplace-Beltrami operator with the KNN scheme. Based on the expansion (2.7.3), to obtain the Laplace-Beltrami operator with the KNN scheme, we could numerically consider a "normalized LLE matrix"; that is, find the eigenstructure of $L:=E^{-1}(W-I)$, where $W$ is the ordinary LLE matrix and $E\in\mathbb{R}^{n\times n}$ is a diagonal matrix with $E_{ii}=\varepsilon(x_i)^2$. Since the analysis of the pointwise convergence of $L$ is similar to that of Theorem 2.6.1, we skip the details here.
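The normalized matrix $L=E^{-1}(W-I)$ can be sketched as follows. This is a minimal illustration of the construction, not the thesis's code: the regularizer $n\varepsilon^{d+\rho}$ follows (1.3.9) applied in the Gram-matrix form of the barycentric problem, and the intrinsic dimension `d` is assumed known:

```python
import numpy as np

def knn_lle_matrix(X, K, d, rho=3):
    """KNN-based LLE weight matrix W (regularized barycentric weights),
    local bandwidths eps(x_k) = distance to the K-th neighbor, and the
    normalized LLE matrix L = E^{-1} (W - I) with E_kk = eps(x_k)^2."""
    n = X.shape[0]
    W = np.zeros((n, n))
    eps = np.zeros(n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for k in range(n):
        idx = np.argsort(D[k])[1:K + 1]      # K nearest neighbors (skip self)
        eps[k] = D[k, idx[-1]]               # local bandwidth eps(x_k)
        G = (X[idx] - X[k]).T                # p x K local data matrix
        C = G.T @ G                          # K x K Gram matrix
        reg = n * eps[k] ** (d + rho)        # regularizer n * eps^(d + rho)
        w = np.linalg.solve(C + reg * np.eye(K), np.ones(K))
        W[k, idx] = w / w.sum()              # barycentric weights sum to 1
    L = np.diag(1.0 / eps ** 2) @ (W - np.eye(n))
    return W, L, eps
```

By construction each row of $W$ sums to $1$, so each row of $L$ sums to $0$; that is, $L$ annihilates constants, as a discrete Laplacian should.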

2.8 LLE vs. LLR

Based on the above theoretical study under the manifold setup, we can link LLE to locally linear regression (LLR) [25, 17]. Recall that in LLR we locally fit a linear function to the response, and the associated kernel depends on the inverse of a variant of the covariance matrix. We summarize how LLR operates. Consider the following regression model:

$$Y=m(X)+\sigma(X)\xi,\qquad(2.8.1)$$
where $\xi$ is a random error independent of $X$ with $\mathbb{E}(\xi)=0$ and $\mathrm{Var}(\xi)=1$, and both the regression function $m$ and the conditional variance function $\sigma^2$ are defined on $\mathbb{R}^d$. Let $\{(X_l,Y_l)\}_{l=1}^n$ denote a random sample observed from model (2.8.1), with $\mathcal{X}:=\{X_l\}_{l=1}^n$ being sampled from $X$. Given $\{(X_l,Y_l)\}_{l=1}^n$ and $x\in\mathbb{R}^d$, the problem is then to estimate $m(x)$, assuming enough smoothness of $m$. Choose a smooth kernel function with fast decay, $K:[0,\infty]\to\mathbb{R}$, and a bandwidth $\varepsilon>0$. The LLR estimator for $m(x)$ is defined as $e_1^\top\beta_x$, where
$$\beta_x=\arg\min_{\beta\in\mathbb{R}^{d+1}}(\mathbf{Y}-\mathbf{X}_x\beta)^\top W_x(\mathbf{Y}-\mathbf{X}_x\beta),\qquad(2.8.2)$$
$$\mathbf{Y}=(Y_1,\dots,Y_n)^\top,\qquad \mathbf{X}_x=\begin{bmatrix}1&\dots&1\\X_1&\dots&X_n\end{bmatrix}^\top\in\mathbb{R}^{n\times(d+1)},\qquad W_x=\mathrm{diag}\big(K_\varepsilon(X_1,x),\dots,K_\varepsilon(X_n,x)\big)\in\mathbb{R}^{n\times n},$$
and $K_\varepsilon(X_l,x):=\varepsilon^{-d}K(\|X_l-x\|_{\mathbb{R}^d}/\varepsilon)$. By a direct expansion, (2.8.2) becomes
$$\beta_x=(\mathbf{X}_x^\top W_x\mathbf{X}_x)^{-1}\mathbf{X}_x^\top W_x\mathbf{Y}\qquad(2.8.3)$$
if $(\mathbf{X}_x^\top W_x\mathbf{X}_x)^{-1}$ exists. We have $\mathbf{X}_x^\top=\begin{bmatrix}\mathbf{1}_n^\top\\G_x\end{bmatrix}$, where $G_x$ is the data matrix associated with $\{X_i\}_{i=1}^n$ centered at $x$. By yet another direct expansion using the block inversion,
$$e_1^\top\beta_x=w_x^{(LLR)\top}\mathbf{Y},\qquad(2.8.4)$$


where $w_x^{(LLR)}$ is called the "smoothing kernel" and satisfies
$$w_x^{(LLR)\top}:=\frac{\mathbf{1}_n^\top W_x-\mathbf{1}_n^\top W_xG_x^\top(G_xW_xG_x^\top)^{-1}G_xW_x}{\mathbf{1}_n^\top W_x\mathbf{1}_n-\mathbf{1}_n^\top W_xG_x^\top(G_xW_xG_x^\top)^{-1}G_xW_x\mathbf{1}_n}.\qquad(2.8.5)$$

Through a direct comparison, we see that the vector $w_x^{(LLR)}$ is almost the same as the weight vector in the LLE algorithm shown in (1.3.11), except for the weighting by the chosen kernel: in LLE, the kernel function and its support are both determined by the data, while in LLR the kernel is selected in the beginning and the data points are weighted by the chosen kernel, as in $G_xW_x$. If we choose the kernel to be a zero-one kernel supported on the ball of radius $\varepsilon$ centered at $x$, then we "recover" (1.3.11).

Under the low-dimensional manifold setup, $G_xW_xG_x^\top$ might not be of full rank. Note that the term $G_xW_xG_x^\top$ is the weighted local covariance matrix, which is considered in [55] to estimate the tangent space. Unlike the regularized pseudo-inverse (1.3.9) in LLE, to handle this degeneracy issue, in LLR the data matrix $G_x$ is constructed by projecting the point cloud onto the estimated tangent plane. This projection step can be understood as taking the Moore-Penrose pseudo-inverse approach to handle the degeneracy. We mention that in [17, Section 6], the relationship between LLR and manifold learning under the manifold setup is established. It is shown that asymptotically, the smoothing matrix from the kernel $w_x^{(LLR)}$ leads to the Laplace-Beltrami operator. That result is parallel to the one reported in this thesis.

These relationships between LLE and LLR suggest the possibility of fitting the data locally by taking locally polynomial regression into account, and of generalizing the barycentric coordinates by fitting a polynomial function locally. This might lead to a variation of LLE that captures more delicate structure of the manifold, in a different adaptive way. Since this direction is outside the scope of this thesis, the study of this possibility is left to future work.

2.9 Error in variable

In this work, we analyze LLE under the assumption that the dataset is sampled directly from a manifold, without any influence of noise. However, noise is inevitable and a further study is needed. From the analysis, we observe that LLE handles the error-in-variable challenge "in some sense".

Suppose the dataset is $\{y_i\}_{i=1}^n\subset\mathbb{R}^p$, where $y_i=z_i+\xi_i$, $z_i$ is supported on a manifold, and the $\xi_i$ are i.i.d. noise with good properties. The question is how much information LLE can recover about $\{z_i\}_{i=1}^n$. A parallel problem for the GL, or the more general graph connection Laplacian (GCL), has been studied in [23, 24], where it is shown that the spectral properties of the GL and GCL are robust to noise. While a similar analysis could be applied to LLE by viewing it as a kernel method, we mention that we might benefit from taking the special algorithmic structure of LLE into account.

When the dimension of the dataset is high, the noise might have nontrivial behavior. For example, when the dimension $p=p(n)$ satisfies $p(n)/n\to\gamma>0$ as $n\to\infty$ (known as the large $p$, large $n$ setup), it is problematic even to estimate the covariance matrix. Note that the covariance matrix is directly related to the LLE algorithm, since it appears in the regularized pseudo-inverse $\mathcal{I}_{n\varepsilon^{d+\rho}}(G_nG_n^\top)$, where $G_n$ is the local data matrix associated with $y_k$ determined from the noisy dataset $\{y_i\}_{i=1}^n$, and $G_nG_n^\top$ is the covariance matrix. Under the large $p$, large $n$ setup, the eigenvalues and eigenvectors of the covariance matrix will both be biased, depending on the "signal-to-noise ratio" and $\gamma$ [34]. A careful manipulation of the noise, or a modification of the covariance matrix estimator, is needed in order to address these introduced biases. For example, the


"shrinkage technique" was introduced to correct the eigenvalue bias with a theoretical guarantee [56, 21]. The covariance matrix estimator based on the shrinkage technique is $C_n:=\sum_{l=1}^{p}f(\lambda_l)u_lu_l^\top$, where $u_l$ and $\lambda_l$ form the $l$-th eigenpair of $G_nG_n^\top$ and $f$ is the designed shrinkage function.

A direct comparison shows that the regularized pseudo-inverse in LLE behaves like a shrinkage technique. Recall from (1.3.9) that $\mathcal{I}_{n\varepsilon^{d+\rho}}(G_nG_n^\top)=\sum_{l=1}^{r_n}\frac{1}{\lambda_l+n\varepsilon^{d+\rho}}u_lu_l^\top$, where $r_n$ is the rank of $G_nG_n^\top$; the shrinkage function is $f(x)=\frac{1}{x+n\varepsilon^{d+\rho}}\chi_{(0,\infty)}(x)$, where $\chi$ is the indicator function. Although how $f$ corrects the noise impact is outside the scope of this thesis, there is potential to carefully improve the regularized pseudo-inverse by taking the shrinkage technique into account. In other words, by modifying the barycentric coordinate evaluation and applying the techniques discussed in [23, 24], it is possible to improve the LLE algorithm.

2.10 Numerical examples

2.10.1 Sphere

Suppose that $S^{p-1}\subset\mathbb{R}^p$ is the unit sphere in $\mathbb{R}^p$. Denote by $H_k$ the space of homogeneous harmonic polynomials on $\mathbb{R}^p$ restricted to $S^{p-1}$. The space $H_k$ is the eigenspace of the Laplace-Beltrami operator on $S^{p-1}$ corresponding to the eigenvalue $-k(k+p-2)$, and the dimension of $H_k$ is $\binom{p+k-1}{p-1}-\binom{p+k-3}{p-1}$ [60]. In this example, we show that if we choose an $\varepsilon^{d+\rho}$ that is too small, then we are not going to get the Laplace-Beltrami operator. When $\rho=8$, which is much greater than $3$, by Theorem 2.5.1 we have

$$Qf(x_k)-f(x_k)=\Big(-\frac{p-1}{8(p+3)(p+5)}\sum_{i=1}^{p-1}\partial_i^4f(x_k)-\frac{p-1}{24(p+3)(p+5)}\sum_{i\neq j}\partial_i^2\partial_j^2f(x_k)-\frac{p+1}{24(p+3)(p+5)}\sum_{i=1}^{p-1}\partial_i^2f(x_k)\Big)\varepsilon^4+O(\varepsilon^6).\qquad(2.10.1)$$

It is obvious that asymptotically we obtain a fourth-order differential operator instead of the Laplace-Beltrami operator. Specifically, when $p=2$, that is, on $S^1$,
$$Qf(x_k)-f(x_k)=-\frac{1}{280}\big(f''''(x_k)+f''(x_k)\big)\varepsilon^4+O(\varepsilon^6).\qquad(2.10.2)$$
We mention that if the data set $\{x_i\}_{i=1}^n$ is non-uniformly sampled from $S^1$ with p.d.f. $P$, then for any $x_k$ we have $Qf(x_k)-f(x_k)=C\varepsilon^4+O(\varepsilon^6)$, where $C$ depends on the first four derivatives of $f$ at $x_k$ and the first three derivatives of $P$ at $x_k$.

Calculation of the sphere case. The calculation flow can serve as a simplified proof of Theorem 2.5.1 under this special manifold setup, so we provide the details here. Consider the unit sphere $S^{p-1}\subset\mathbb{R}^p$. We assume that the center of the sphere is at $[0,\dots,0,1]$, that the data set $\{x_i\}_{i=1}^n$ is uniformly sampled from $S^{p-1}$, and that $x_k$ is at the origin. To simplify the calculation, for $v\in\mathbb{R}^p$ denote by $v_1\in\mathbb{R}^{p-1}$ the first $p-1$ coordinates of $v$ and by $v_2\in\mathbb{R}$ the last coordinate of $v$, and use the notation $v=[[v_1,v_2]]$. We parametrize $S^{p-1}\setminus\{[0,\dots,0,2]\}$ by the normal coordinates at $x_k$ via
$$\theta t\mapsto[[\theta\sin(t),\,1-\cos(t)]]\in S^{p-1}\setminus\{[0,\dots,0,2]\},$$


where $\theta\in S^{p-2}\subset T_{x_k}S^{p-1}\approx\mathbb{R}^{p-1}$ and $t\in[0,\pi)$ is the geodesic distance. The volume form is
$$dV=\sin^{p-2}(t)\,dt\,d\theta.$$
Denote by $r:=r(\varepsilon)$ the radius of the ball $\exp_{x_k}^{-1}(B^{\mathbb{R}^p}_\varepsilon(x_k)\cap S^{p-1})$ in $T_{x_k}S^{p-1}$, where $\varepsilon$ is assumed to be sufficiently small. By a direct calculation, we have

$$\mathbb{E}[XX^\top\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]=\begin{bmatrix}\int_{S^{p-2}}\int_0^r\theta\theta^\top\sin^2(t)\sin^{p-2}(t)\,dt\,d\theta & \int_{S^{p-2}}\int_0^r\theta\big(\sin(t)-\sin(t)\cos(t)\big)\sin^{p-2}(t)\,dt\,d\theta\\[4pt] \int_{S^{p-2}}\int_0^r\theta^\top\big(\sin(t)-\sin(t)\cos(t)\big)\sin^{p-2}(t)\,dt\,d\theta & \int_{S^{p-2}}\int_0^r(1-\cos(t))^2\sin^{p-2}(t)\,dt\,d\theta\end{bmatrix}.$$

Since $\int_{S^{p-2}}\theta\theta^\top d\theta=\frac{|S^{p-2}|}{p-1}I_{(p-1)\times(p-1)}$ and $\int_{S^{p-2}}\theta\,d\theta=0$, we conclude that
$$C_{x_k}=\mathbb{E}[XX^\top\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]=\begin{bmatrix}\Big(\frac{|S^{p-2}|}{p-1}\int_0^r\sin^p(t)\,dt\Big)I_{(p-1)\times(p-1)} & 0\\ 0 & |S^{p-2}|\int_0^r(1-\cos(t))^2\sin^{p-2}(t)\,dt\end{bmatrix},$$
which is a diagonal matrix containing the eigenvalues of $C_{x_k}$, and we can choose $\{e_i\}_{i=1}^p$ to be its orthonormal eigenvectors. Next, we have

$$\mathbb{E}[X\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]=\Big[\Big[\int_{S^{p-2}}\int_0^r\theta\sin^{p-1}(t)\,dt\,d\theta,\ \int_{S^{p-2}}\int_0^r(1-\cos(t))\sin^{p-2}(t)\,dt\,d\theta\Big]\Big]=\Big[\Big[0,\ |S^{p-2}|\int_0^r(1-\cos(t))\sin^{p-2}(t)\,dt\Big]\Big].$$

We now choose $\rho=8$. Therefore, by definition,
$$T_{x_k}=\mathcal{I}_{\varepsilon^{p+5}}(C_{x_k})\big[\mathbb{E}[X\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(X)]\big]=\Big[\Big[0,\ \frac{\int_0^r(1-\cos(t))\sin^{p-2}(t)\,dt}{\int_0^r(1-\cos(t))^2\sin^{p-2}(t)\,dt+\varepsilon^{p+7}}\Big]\Big]=\Big[\Big[0,\ \frac{\int_0^r(1-\cos(t))\sin^{p-2}(t)\,dt}{\int_0^r(1-\cos(t))^2\sin^{p-2}(t)\,dt+r^{p+7}+O(r^{p+9})}\Big]\Big],$$

where the last equality holds since $r=r(\varepsilon)=\varepsilon+O(\varepsilon^3)$ and hence $\varepsilon^{p+7}=r^{p+7}+O(r^{p+9})$. Thus, the kernel centered at $x_k=0$ and evaluated at $y=[[\theta\sin(t),1-\cos(t)]]\in\mathbb{R}^p$ satisfies
$$K_{\mathrm{LLE}}(x_k,y)=1-[[\theta\sin(t),1-\cos(t)]]\cdot T_{x_k}=1-(1-\cos(t))\,\frac{\int_0^r(1-\cos(t))\sin^{p-2}(t)\,dt}{\int_0^r(1-\cos(t))^2\sin^{p-2}(t)\,dt+r^{p+7}+O(r^{p+9})}$$
$$=1-(1-\cos(t))\Big(\frac{2(p+3)}{(p+1)r^2}+\frac{p^2+14p-3}{6(p+1)(p+5)}+O(r^2)\Big).$$

Suppose $f\in C^5(S^{p-1})$; we are going to calculate $\frac{\int K_{\mathrm{LLE}}(x_k,y)f(y)\,dV(y)}{\int K_{\mathrm{LLE}}(x_k,y)\,dV(y)}-f(x_k)$. The evaluation of $\int K_{\mathrm{LLE}}(x_k,y)\,dV(y)$



is direct, and we have
$$\int K_{\mathrm{LLE}}(x_k,y)\,dV(y)=\int_{S^{p-2}}\int_0^r\Big(1-(1-\cos(t))\Big(\frac{2(p+3)}{(p+1)r^2}+\frac{p^2+14p-3}{6(p+1)(p+5)}+O(r^2)\Big)\Big)\sin^{p-2}(t)\,dt\,d\theta=\frac{4|S^{p-2}|}{(p+1)(p^2-1)}\,r^{p-1}+O(r^{p+1}).$$
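As a quick numerical sanity check of the leading coefficient (our own sketch, not part of the derivation), one can integrate the kernel against $\sin^{p-2}(t)$ for a specific $p$ and small $r$ and compare with $\frac{4}{(p+1)(p^2-1)}r^{p-1}$; the common $|S^{p-2}|$ factor is dropped:

```python
import numpy as np

# Check the leading order of the t-integral behind int K_LLE dV:
# it should approach 4 r^(p-1) / ((p+1)(p^2-1)) as r -> 0.
p, r = 3, 0.01
t = np.linspace(0.0, r, 200001)
c0 = (p**2 + 14 * p - 3) / (6 * (p + 1) * (p + 5))
integrand = (1 - (1 - np.cos(t)) * (2 * (p + 3) / ((p + 1) * r**2) + c0)) \
            * np.sin(t)**(p - 2)
# trapezoid rule by hand (avoids version-dependent numpy helpers)
I = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t))
lead = 4 * r**(p - 1) / ((p + 1) * (p**2 - 1))
```

The relative discrepancy is of order $r^2$, in agreement with the $O(r^{p+1})$ remainder.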

On the other hand, we have
$$\int K_{\mathrm{LLE}}(x_k,y)(f(y)-f(x_k))\,dV(y)=\int_{S^{p-2}}\int_0^r\Big(\nabla_\theta f(x_k)t+\frac12\nabla^2_{\theta\theta}f(x_k)t^2+\frac16\nabla^3_{\theta\theta\theta}f(x_k)t^3+\frac1{24}\nabla^4_{\theta\theta\theta\theta}f(x_k)t^4+O(t^5)\Big)$$
$$\times\Big(1-(1-\cos(t))\Big[\frac{2(p+3)}{(p+1)r^2}+\frac{p^2+14p-3}{6(p+1)(p+5)}+O(r^2)\Big]\Big)\sin^{p-2}(t)\,dt\,d\theta.$$

We calculate each part of the above integration by using the symmetry of the sphere $S^{p-2}$ in the tangent space. Specifically, we have
$$\int_{S^{p-2}}\int_0^r\Big(\nabla_\theta f(x_k)t+\frac12\nabla^2_{\theta\theta}f(x_k)t^2+\frac16\nabla^3_{\theta\theta\theta}f(x_k)t^3+\frac1{24}\nabla^4_{\theta\theta\theta\theta}f(x_k)t^4+O(t^5)\Big)\sin^{p-2}(t)\,dt\,d\theta$$
$$=\frac{\int_{S^{p-2}}\nabla^2_{\theta\theta}f(x_k)\,d\theta}{2(p+1)}\,r^{p+1}+\Big(\frac{\int_{S^{p-2}}\nabla^4_{\theta\theta\theta\theta}f(x_k)\,d\theta}{24(p+3)}-\frac{(p-2)\int_{S^{p-2}}\nabla^2_{\theta\theta}f(x_k)\,d\theta}{12(p+3)}\Big)r^{p+3}+O(r^{p+5})$$

and
$$\int_{S^{p-2}}\int_0^r\Big(\nabla_\theta f(x_k)t+\frac12\nabla^2_{\theta\theta}f(x_k)t^2+\frac16\nabla^3_{\theta\theta\theta}f(x_k)t^3+\frac1{24}\nabla^4_{\theta\theta\theta\theta}f(x_k)t^4+O(t^5)\Big)(1-\cos(t))\sin^{p-2}(t)\,dt\,d\theta$$
$$=\frac{\int_{S^{p-2}}\nabla^2_{\theta\theta}f(x_k)\,d\theta}{4(p+3)}\,r^{p+3}+\frac{\int_{S^{p-2}}\nabla^4_{\theta\theta\theta\theta}f(x_k)-(2p-3)\nabla^2_{\theta\theta}f(x_k)\,d\theta}{48(p+5)}\,r^{p+5}+O(r^{p+7}).$$

Due to the $\frac{2(p+3)}{(p+1)r^2}$ term, the term of order $r^{p+1}$ is cancelled, and we obtain
$$\int K_{\mathrm{LLE}}(x_k,y)(f(y)-f(x_k))\,dV(y)=-\frac{1}{6(p+1)(p+3)(p+5)}\Big(\int_{S^{p-2}}\nabla^4_{\theta\theta\theta\theta}f(x_k)\,d\theta+\int_{S^{p-2}}\nabla^2_{\theta\theta}f(x_k)\,d\theta\Big)r^{p+3}+O(r^{p+5}).$$

We use the fact that $r=r(\varepsilon)=\varepsilon+O(\varepsilon^3)$ and summarize the result as follows:
$$\frac{\int K_{\mathrm{LLE}}(x_k,y)f(y)\,dV(y)}{\int K_{\mathrm{LLE}}(x_k,y)\,dV(y)}-f(x_k)=\frac{\int K_{\mathrm{LLE}}(x_k,y)(f(y)-f(x_k))\,dV(y)}{\int K_{\mathrm{LLE}}(x_k,y)\,dV(y)}$$
$$=-\frac{p^2-1}{24|S^{p-2}|(p+3)(p+5)}\Big(\int_{S^{p-2}}\nabla^4_{\theta\theta\theta\theta}f(x_k)\,d\theta+\int_{S^{p-2}}\nabla^2_{\theta\theta}f(x_k)\,d\theta\Big)\varepsilon^4+O(\varepsilon^6).$$



Finally, if we use the formulas
$$\int_{S^{p-2}}x_i^4\,d\theta=\frac{3}{(p-1)(p+1)}|S^{p-2}|,\qquad \int_{S^{p-2}}x_i^2x_j^2\,d\theta=\frac{1}{(p-1)(p+1)}|S^{p-2}|\ \ (i\neq j),\qquad \int_{S^{p-2}}x_k^2x_ix_j\,d\theta=0\ \ (i\neq j),\qquad(2.10.3)$$
we obtain the expansion (2.10.1).
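These spherical moment identities can be spot-checked by Monte Carlo. The following sketch (our addition) verifies the normalized versions, $\mathbb{E}[x_i^4]=\frac{3}{(p-1)(p+1)}$ and $\mathbb{E}[x_i^2x_j^2]=\frac{1}{(p-1)(p+1)}$ for uniform points on $S^{p-2}$, with $p=4$:

```python
import numpy as np

# Monte-Carlo check of the sphere moment identities, written as expectations
# over the normalized uniform measure on S^{p-2}: E[x_i^4] = 3/((p-1)(p+1))
# and E[x_i^2 x_j^2] = 1/((p-1)(p+1)) for i != j.
rng = np.random.default_rng(1)
p = 4                                              # so S^{p-2} = S^2 in R^3
g = rng.standard_normal((1_000_000, p - 1))
x = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform points on S^{p-2}
m4 = np.mean(x[:, 0] ** 4)                         # should be near 3/15 = 0.2
m22 = np.mean(x[:, 0] ** 2 * x[:, 1] ** 2)         # should be near 1/15
```

Normalizing Gaussian vectors is a standard way to sample the uniform measure on the sphere, so the two sample means converge to the stated constants.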

We now numerically demonstrate the relationship between the non-uniform sampling scheme and the regularization term. Fix $n=30{,}000$. Take non-uniform sampling points $\theta_i:=2\pi U_i+0.3\sin(2\pi i/n)$ on $(0,2\pi]$, where $i=1,\ldots,n$ and $U_i$ follows the uniform distribution on $[0,1]$, and construct $\mathcal{X}_2=\{(\cos(\theta_i),\sin(\theta_i))^\top\}_{i=1}^n\subset\mathbb{R}^2$. Run LLE with $\varepsilon=0.0002$ and different $\rho$'s, and evaluate the first 400 eigenvalues. Based on the theory, when $\rho<3$, the asymptotics depend on the non-uniform density function; when $\rho=3$, we recover the Laplace–Beltrami operator in the $\varepsilon^2$ order term; when $\rho>3$, we get the fourth order differential operator in the $\varepsilon^4$ order term, which depends on the non-uniform density function. See Figure 2.1 for a comparison of the estimated and predicted eigenvalues under the different setups. We clearly see that the eigenvalues are well predicted for each $\rho$. When $\rho=8$, we get the fourth order term that depends on the non-uniform density function; when $\rho=3$, LLE is independent of the non-uniform density function and we recover the spectrum of the Laplace–Beltrami operator in the second order term, as predicted by the developed theory; when $\rho=-5$, the non-uniform density function comes into play, and the eigenvalues are slightly shifted. To enhance the visualization, the differences between the estimated eigenvalues on $S^1$ and the theoretical values are shown in the middle subplot. The eigenfunctions provide more information: when $\rho=-5$ and $\rho=8$, the dependence of the eigenfunctions on the non-uniform density function can be clearly seen.
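The experiment above can be reproduced in outline with a few lines of code. The following sketch is ours; the brute-force $\varepsilon$-neighborhood search and the regularizer convention $c=n\varepsilon^{d+\rho}$ from (1.2.5) are the only assumptions:

```python
import numpy as np

def lle_weight_matrix(X, eps, d, rho):
    """Barycentric LLE weights on an eps-neighborhood graph, with the
    regularizer c = n * eps**(d + rho); X is an n x p point cloud."""
    n = X.shape[0]
    c = n * eps**(d + rho)
    W = np.zeros((n, n))
    for k in range(n):
        dist = np.linalg.norm(X - X[k], axis=1)
        idx = np.flatnonzero((dist < eps) & (dist > 0))
        G = X[idx] - X[k]                         # N x p local data matrix
        gram = G @ G.T + c * np.eye(len(idx))     # regularized Gram matrix
        w = np.linalg.solve(gram, np.ones(len(idx)))
        W[k, idx] = w / w.sum()                   # normalized barycentric weights
    return W
```

The spectrum compared in Figure 2.1 is then obtained from the eigen-decomposition of $I-W$, rescaled by the appropriate power of $\varepsilon$.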

Next, we show the results on $S^2$ with different radii under the non-uniform sampling scheme with $\rho=3$ and different $\varepsilon$'s. Fix $n=30{,}000$. Take uniform sampling points $x_i=(x_{i1},x_{i2},x_{i3})^\top\in S^2\subset\mathbb{R}^3$, where $i=1,\ldots,n$; randomly choose $n/10$ points; randomly perturb those $n/10$ points by setting $x_{i3}:=x_{i3}+1-\cos(2\pi U_i)$, where $U_i$ follows the uniform distribution on $[0,1]$; and set $y_i:=\frac{(x_{i1},x_{i2},x_{i3})^\top}{\|(x_{i1},x_{i2},x_{i3})^\top\|}$. As a result, $\mathcal{Y}:=\{y_i\}_{i=1}^n\subset S^2$ is non-uniformly distributed on $S^2$. Denote by $r\mathcal{Y}$ the scaled sampling points on the sphere with radius $r>0$. Run LLE on $r\mathcal{Y}$ with different $\varepsilon$'s, and evaluate the first 400 eigenvalues. We consider $r=0.5,1,2$. For $r=1$, consider $\varepsilon=0.02$; for $r=0.5$, consider $\varepsilon=0.02/4$ and $0.02/6$; for $r=2$, consider $\varepsilon=0.02\times4$ and $0.02\times3$. Based on the theory, when $\rho=3$, LLE is independent of the non-uniform density function and we obtain the eigenvalues of the Laplace–Beltrami operator in all cases. See Figure 2.2 for the results under the different setups. Theoretically, the eigenvalues of $S^2$, without counting multiplicities, are $\nu_i=-i(i+1)$, where $i=0,1,\ldots$, and the multiplicity of $\nu_i$ is $2i+1$. When the radius is $r>0$, the eigenvalues are scaled by $r^{-2}$. The eigenvalues, as shown in Figure 2.2, are well estimated by LLE, and the gap between the eigenvalues of spheres with different radii is predicted. The sawtooth behavior of the error comes from the spectral convergence behavior of eigenvalues with multiplicities; note that there are 19 eigenvalues with multiplicity greater than 1 among the first 400 eigenvalues, which match the 19 oscillations found in Figure 2.2(b). The eigenfunctions are shown in Figure 2.2(c). As predicted, the first eigenfunction is constant, as shown in $\psi_1$. The eigenspace of $\nu_1$ is spanned by the three linear functions $x$, $y$ and $z$ restricted to $S^2$; therefore, $\psi_4$ is linear. The eigenspace of $\nu_\ell$ is spanned by the spherical harmonics of order $\ell$, and its oscillation is illustrated in $\psi_9$, associated with $\nu_2$, and $\psi_{16}$, associated with $\nu_3$.
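The theoretical values used for the comparison are easy to generate; a minimal sketch (ours) lists the eigenvalue magnitudes $i(i+1)/r^2$ with multiplicity $2i+1$:

```python
import numpy as np

def sphere_eigenvalues(num, radius=1.0):
    """First `num` Laplace-Beltrami eigenvalue magnitudes on a sphere of the
    given radius, listed with multiplicity: i*(i+1)/radius**2, 2i+1 times."""
    vals = []
    i = 0
    while len(vals) < num:
        vals.extend([i * (i + 1) / radius**2] * (2 * i + 1))
        i += 1
    return np.array(vals[:num])
```

Comparing this list against the sorted LLE spectrum (after the $\varepsilon$-scaling) produces the black circles of Figure 2.2(a).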



(a) S1 eigenvalues (b) S1 eigenvalue errors (c) S1 eigenfunctions

Figure 2.1: The first 400 eigenvalues of LLE on 30,000 points sampled from $S^1$ under a non-uniform sampling scheme with $\rho=-5,3,8$. $\lambda_k$ and $\psi_k$ are the $k$-th largest eigenvalue and the corresponding eigenfunction of LLE under the different situations; $\lambda_k$ estimates the $k$-th smallest eigenvalue of the Laplace–Beltrami operator or of the fourth order differential operator, depending on the situation. The theoretical values, $L_k:=\lceil\frac{k-1}{2}\rceil^2$ for the Laplace–Beltrami operator and $L_k:=\lceil\frac{k-1}{2}\rceil^4-\lceil\frac{k-1}{2}\rceil^2$ for the fourth order differential operator $f''''+f''$, where $\lceil x\rceil$ denotes the least integer greater than or equal to $x$, are provided for comparison. The eigenvalues and the theoretical values under the different setups are shown in 2.1(a), with the $L_k$ for the Laplace–Beltrami operator shown as black crosses and those for the fourth order operator as black circles. To enhance the visualization, the deviations of the evaluated eigenvalues from the theoretical values under the different setups are shown in 2.1(b). The tenth eigenfunctions, associated with the tenth largest eigenvalues of LLE under the different setups, are shown in 2.1(c). Note that when $\rho=3$, we recover the spectrum of the Laplace–Beltrami operator even though the sampling is non-uniform; when $\rho=-5$, the non-uniform density function comes into play, and the eigenvalues are shifted from the theoretical values. For the non-uniform sampling scheme and $\rho=8$, theoretically the first three eigenvalues come from the sixth order term and depend on the non-uniform density function; therefore, numerically the first three eigenvalues are non-zero. When $\rho=-5$ and $\rho=8$, the eigenfunctions are the same (up to a global rotation) and depend on the non-uniform density function.

2.10.2 Examine the kernel

We now show the numerical simulations of the corresponding kernel on the unit circle $S^1$ embedded in $\mathbb{R}^2$. We take a uniform grid $\theta_i:=2\pi i/n$ on $(0,2\pi]$, where $n\in\mathbb{N}$ and $i=1,\ldots,n$, and construct $\mathcal{X}=\{x_i:=(\cos(\theta_i),\sin(\theta_i))^\top\}_{i=1}^n\subset\mathbb{R}^2$, which can be viewed as a uniformly sampled set from the unit circle. We fix $n=10{,}000$. We then run LLE with $\varepsilon=[(\cos(\theta_{K/2})-1)^2+\sin(\theta_{K/2})^2]^{1/2}$, where $K\in\mathbb{N}$. See Figure 2.3 for examples of the corresponding kernels when $K=80$ and $K=320$. Note that the constructed normalized kernel, $\frac{K_{\mathrm{LLE}}(x_{1000},y)}{\int K_{\mathrm{LLE}}(x_{1000},y)\,dV(y)}$, is not positive.
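A minimal sketch (ours) reproduces this sign change at one point: it evaluates $K_{\mathrm{LLE}}(x_k,\cdot)=1-(y-x_k)\cdot T_{x_k}$ on the $\varepsilon$-neighbors of $x_{1000}$, with $T_{x_k}$ built from the regularized local second-moment matrix. The convention $c=n\varepsilon^{d+\rho}$ is an assumption carried over from the algorithm's definition; with $\rho=8$ the regularizer is negligible here.

```python
import numpy as np

n, K, rho = 10000, 320, 8
theta = 2 * np.pi * np.arange(1, n + 1) / n
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
k = 999                                           # the point x_1000 (0-based)
eps = np.linalg.norm(X[k] - X[(k + K // 2) % n])  # eps from the K/2-th grid step

dist = np.linalg.norm(X - X[k], axis=1)
idx = np.flatnonzero((dist < eps) & (dist > 0))
G = X[idx] - X[k]                                 # neighbors, centered at x_k
C = G.T @ G                                       # 2 x 2 local second-moment matrix
c = n * eps**(1 + rho)                            # regularizer c = n*eps^(d+rho), d = 1
T = np.linalg.solve(C + c * np.eye(2), G.sum(axis=0))
kernel = 1.0 - G @ T                              # K_LLE(x_k, .) on the neighbors
```

Plotting `kernel` against the angular position of the neighbors reproduces the sign-changing curves of Figure 2.3(a).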



(a) S2 eigenvalues (b) S2 eigenvalue errors (c) S2 eigenfunctions

Figure 2.2: 2.2(a): the first 400 eigenvalues of LLE with $\rho=3$ but different $\varepsilon$, over $n=30{,}000$ non-uniformly sampled points on $S^2$ with different radii $r>0$. $\lambda_k$ is the $k$-th smallest eigenvalue of the Laplace–Beltrami operator estimated by LLE under the different situations. When $r=0.5$ (respectively $r=1$ and $r=2$), the $\lambda_k$ are shown as the black (respectively blue and gray) curve. The results with the other $\varepsilon$ are shown as red dashes (respectively blue dashes) when $r=0.5$ (respectively $r=2$). The theoretical eigenvalues for the canonical $S^2$ (with radius 1), denoted $L_k$, $k=1,\ldots$, are provided for comparison (superimposed as black circles). 2.2(b): to enhance the visualization, the differences between the theoretical and numerical values, $\log_{10}(\lambda_k)-\log_{10}(L_k)$, are shown with the same colors and line properties as in 2.2(a). Some eigenfunctions evaluated when $r=0.5$ are shown in 2.2(c).

Next, we show the numerical simulations of the corresponding kernel on the 1-dim flat torus $T^1\sim\mathbb{R}/\mathbb{Z}$ with the metric induced from the canonical metric on $\mathbb{R}^1$. We take a uniform grid $\{\theta_i=2\pi i/n\}_{i=1}^n$ on $T^1$, and take $\mathcal{X}=\{x_i:=(\cos(\theta_i),\sin(\theta_i))^\top\}_{i=1}^n\subset\mathbb{R}^2$ to illustrate the flat torus. Fix $n=10{,}000$ and run LLE with $\varepsilon=|\theta_{K/2}|$, where $K\in\mathbb{N}$. See Figure 2.3 for examples of the corresponding kernels when $K=80$ and $K=320$. The constructed normalized kernel, as the theory predicts, is constant. Note that in this case we can view the 1-dim flat torus as the unit circle, since we have access to the geodesic distance information on the manifold.

Finally, we take a look at the unit sphere $S^2$ embedded in $\mathbb{R}^3$ with center at $(0,0,1)$, and its corresponding kernel. We uniformly sample $n$ points, $\mathcal{X}=\{x_i\}_{i=1}^n\subset\mathbb{R}^3$, from $S^2$. Fix $n=10{,}000$ and run LLE with 400 nearest neighbors. See Figure 2.3 for the corresponding kernel. Note that the normalized kernel is not positive. These examples show that even for simple manifolds, the corresponding kernels might be complicated.



(a) S1 kernel (b) T1 kernel (c) S2 kernel

Figure 2.3: 2.3(a): the sampled $S^1$ is illustrated as the gray circle embedded in the $(x,y)$-plane. The black thick line indicates the first 320 neighbors of the central point $x_{1000}$. The red line is the corresponding normalized kernel, $\frac{K_{\mathrm{LLE}}(x_{1000},y)}{\int K_{\mathrm{LLE}}(x_{1000},y)\,dV(y)}$, when $K=80$, and the blue line is the corresponding normalized kernel when $K=320$. It is clear that the kernel changes sign. 2.3(b): a surrogate of the sampled flat 1-dim torus $T^1$ is illustrated as the gray circle embedded in the $(x,y)$-plane. The black thick line indicates the first 320 neighbors of the central point $x_{1000}$. The red line is the corresponding normalized kernel when $K=80$, and the blue line is the corresponding normalized kernel when $K=320$. In this flat manifold case, the kernel is constant. 2.3(c): a surrogate of the uniformly sampled $S^2$. Only the first 10,000 nearest points of the chosen $x=(0,0,0)$ are plotted, as gray points. Note that the scales of the $x$ and $y$ axes and of the $z$ axis are different. The black points indicate the first 400 neighbors of $x$. The red points are the corresponding normalized kernel values when $K=400$. It is clear that the kernel is not positive.

2.10.3 Two-dimensional random tomography example

To further examine the capability of LLE from the viewpoint of nonlinear dimension reduction, we consider the two-dimensional random tomography problem [56]. It is chosen because its geometric structure is well understood yet complicated.

We briefly describe the dataset and refer the interested reader to [56]. The classical two-dimensional transmission computerized tomography problem is to recover the function $f:\mathbb{R}^2\to\mathbb{R}$ from its Radon transform. In the parallel beam model, the Radon transform of $f$ is given by the line integral $R_\theta f(s)=\int_{x\cdot\theta=s}f(x)\,dx$, where $\theta\in S^1$ is perpendicular to the beaming direction $\theta^\perp\in S^1$, $S^1$ is the unit circle, and $s\in\mathbb{R}$. We call $\theta$ the projection direction and $R_\theta f$ the projected image. There are cases, however, in which we only have the projected images while the projection directions are unknown. In such cases, the problem at hand is to estimate $f$ from these projected images without knowing their corresponding projection directions. To better study this random projection problem, we need the following facts and assumptions. First, we know that for $f\in L^2(\mathbb{R}^2)$ with compact support



within $B_1(0)$, the map $R_\cdot f:\theta\in S^1\mapsto L^2([-1,1])$ is continuous [56]. To simplify the discussion, we assume that there is no symmetry in $f$; that is, $R_{\theta_1}f$ and $R_{\theta_2}f$ are different for all pairs $\theta_1\neq\theta_2$. Next, take $S:=\{s_i\}_{i=1}^p$ to be the chosen set of sampling points on $[-1,1]$, where $p\in\mathbb{N}$. In this example, we assume that $S$ is a uniform grid on $[-1,1]$; that is, $s_i=-1+2(i-1)/(p-1)$. For $\theta\in S^1$, denote the discretization of the projected image $R_\theta f$ by $D_S:L^2([-1,1])\to\mathbb{R}^p$, defined by $D_S:R_\theta f\mapsto(R_\theta f\star h_\varepsilon(s_1),R_\theta f\star h_\varepsilon(s_2),\ldots,R_\theta f\star h_\varepsilon(s_p))^\top\in\mathbb{R}^p$, where $h_\varepsilon(x):=\frac1\varepsilon h(\frac x\varepsilon)$, $h$ is a Schwartz function, and $h_\varepsilon$ converges weakly to the Dirac delta measure at $0$ as $\varepsilon\to0$. Note that, in general, $R_\theta f$ is an $L^2$ function when $f$ is an $L^2$ function; therefore, we need a convolution to model the sampling step. We assume that the discretization $D_S$ is fine enough so that $M_1:=\{D_S\circ R_\theta f\}_{\theta\in S^1}$ is also simple. In other words, we assume that $p$ is large enough so that $M_1$ is a one-dimensional closed simple curve embedded in $\mathbb{R}^p$ and $M_1$ is diffeomorphic to $S^1$. Finally, we sample finitely many points from $S^1$ uniformly and obtain the simulation.
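For intuition, the discretization $D_S\circ R_\theta$ can be sketched crudely (our addition): the convolution with $h_\varepsilon$ is replaced by simple binning of pixel masses along the direction $\theta$, which already produces projection vectors of the kind described above.

```python
import numpy as np

def radon_projection(img, angle, p):
    """Parallel-beam projection of a square image onto p bins along the
    direction `angle` (a crude Riemann-sum discretization of R_theta f)."""
    n = img.shape[0]
    xs = np.linspace(-1, 1, n)              # pixel-center coordinates in [-1,1]^2
    Xc, Yc = np.meshgrid(xs, xs)
    s = Xc * np.cos(angle) + Yc * np.sin(angle)   # signed offset x . theta
    bins = np.clip(((s + 1) / 2 * p).astype(int), 0, p - 1)
    proj = np.zeros(p)
    np.add.at(proj, bins.ravel(), img.ravel())    # accumulate pixel masses
    return proj
```

Collecting the $p$-dimensional vectors obtained this way over a grid of angles traces out a discretized version of the curve $M_1$.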

With the above facts and assumptions, we sample the Radon transform $\mathcal{X}:=\{x_i:=D_S\circ R_{\theta_i}f\}_{i=1}^n\subset\mathbb{R}^p$ with finitely many projection directions $\{\theta_i\}_{i=1}^n$, where $\{\theta_i\}_{i=1}^n$ is a finite uniform grid on $S^1$; that is, $\mathcal{X}$ is sampled from the one-dimensional manifold $M_1$. For the simulations with the Shepp–Logan phantom, we take $n=4096$, and the number of discretization points is $p=128$. It has been shown in [56] that the DM can recover $M_1$ up to diffeomorphism; that is, we can achieve the nonlinear dimension reduction. To avoid distractions, we do not consider any noise as is considered in [56], and focus our analysis on the clean dataset. The Shepp–Logan image, some examples of the projections, and the results of PCA, DM and LLE are shown in Figure 2.4. As shown in [56], PCA fails to embed $\mathcal{X}$ with only the first three principal components, while DM succeeds. There could be additional discussion of the DM, particularly its robustness to noise and the metric design; these have been extensively discussed in [56], so they are not discussed here. For LLE, we take $\varepsilon=0.004$. The embedding results of LLE with different regularization orders, $\rho=8,3,-5$, are shown. Due to the complicated geometrical structure, we encounter difficulty even in recovering the topology of $M_1$ by LLE if the regularization order is not chosen properly.

To examine whether the sign of the kernel corresponding to LLE is indeterminate on this dataset, we fix $x_{3555}\in\mathcal{X}$ and apply PCA to visualize its $K=150$ neighbors. The kernel function is shown in Figure 2.4 as the color encoded on the embedded points. The sign of the kernel is indeterminate, as predicted by the above theory due to the existence of curvature. In summary, we should be careful when we apply LLE to a complicated real dataset.



Figure 2.4: Top row: the left panel is the Shepp–Logan phantom, the middle panel shows two projected images from two different projection directions, and the right panel shows the linear dimension reduction of the dataset by the first three principal components, $u_1$, $u_2$ and $u_3$. Middle row: the left panel shows the diffusion map (DM) of the dataset, where the embedding is done by choosing the first two non-trivial eigenvectors of the graph Laplacian, $\phi_2$ and $\phi_3$, and we simply take the Gaussian kernel to design the affinity without applying the $\alpha$-normalization technique [18]; the middle panel shows the DM of the dataset where we apply the $\alpha$-normalization technique with $\alpha=1$; and the right panel shows that the sign of the kernel corresponding to the locally linear embedding (LLE) is indeterminate, where the black cross indicates $x_{3555}$ and the kernel values on its neighbors are encoded by color (the neighbors are visualized by the top three principal components, $v_1$, $v_2$ and $v_3$). Bottom row: the embeddings using the second and third eigenvectors of LLE, $\psi_2$ and $\psi_3$, under different setups. The left panel shows the result with $\rho=-5$, the middle panel shows the result with $\rho=3$, and the right panel shows the result with $\rho=8$. The results show the importance of choosing the regularization properly and are explained by the theory.


Chapter 3

LLE on manifolds with boundary

3.1 Setup on manifolds with boundary and preliminary lemmas

Let $X$ be a $p$-dimensional random vector with range supported on a $d$-dimensional compact, smooth Riemannian manifold $(M,g)$ isometrically embedded in $\mathbb{R}^p$ via $\iota:M\to\mathbb{R}^p$. When the boundary of $M$ is not empty, we assume that it is smooth. When $\partial M\neq\emptyset$, we define the $\varepsilon$-neighborhood of $\partial M$ as
$$M_\varepsilon=\{x\in M\,|\,d(x,\partial M)<\varepsilon\}.\qquad(3.1.1)$$

Let $P\in C^2(\iota(M))$ be the probability density function (p.d.f.) associated with $X$, and assume there exist $P_m>0$ and $P_M\geq P_m$ so that $P_m\leq P(x)\leq P_M<\infty$ for all $x\in\iota(M)$. Let $\mathcal{X}=\{\iota(x_i)\}_{i=1}^n\subset\iota(M)\subset\mathbb{R}^p$ denote a set of independent and identically distributed (i.i.d.) random samples of $X$, where $x_i\in M$. For $\iota(x_k)\in\mathcal{X}$ and $\varepsilon>0$, we have $\mathcal{N}_{\iota(x_k)}:=\{\iota(x_{k,1}),\cdots,\iota(x_{k,N})\}\subset B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))\cap(\mathcal{X}\setminus\{\iota(x_k)\})$. Take $G_n\in\mathbb{R}^{p\times N}$ to be the local data matrix associated with $\mathcal{N}_{\iota(x_k)}$ and evaluate the barycentric coordinates $w_n=[w_{n,1},\cdots,w_{n,N}]^\top\in\mathbb{R}^N$.

For $x\in M_\varepsilon$, define
$$D_\varepsilon(x)=(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(M))\subset T_xM\approx\mathbb{R}^d.\qquad(3.1.2)$$

Denote $x'=\operatorname{argmin}_{y\in\partial M}d(y,x)$ and $\varepsilon(x)=\min_{y\in\partial M}d(y,x)$. Clearly, we have $0\leq\varepsilon(x)\leq\varepsilon$. Choose the normal coordinates $\{\partial_i\}_{i=1}^d$ around $x$ so that $x'=\iota\circ\exp_x(\varepsilon(x)\partial_d)$. If $\varepsilon$ is sufficiently small, such $x'$ is unique. In this chapter, we again assume Assumption 2.1.2. Note that $(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(\partial M))$ can be regarded as the graph of a function depending on the curvature;

that is, there is a domain $K\subset\mathbb{R}^{d-1}$ and a smooth function $q$ defined on $K$ such that
$$(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(\partial M))=\{(u_1,\cdots,u_d)\in T_xM\,|\,(u_1,\cdots,u_{d-1})\in K,\ u_d=q(u_1,\cdots,u_{d-1})\},\qquad(3.1.3)$$
where $q(u_1,\cdots,u_{d-1})$ can be approximated by $\varepsilon(x)+\sum_{i,j=1}^{d-1}a_{ij}u_iu_j$ up to an error given by a cubic function of $u_1,\ldots,u_{d-1}$, and the $a_{ij}$ come from the second fundamental form of the embedding of $\partial M$ into $\mathbb{R}^p$. Note that in general $(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(\partial M))$ is not symmetric across the axes $\partial_1,\ldots,\partial_{d-1}$.

Now we define the symmetric region associated with $(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(\partial M))$.

Definition 3.1.1. For $x\in M_\varepsilon$ and $\varepsilon$ sufficiently small, choose a normal coordinate $\{\partial_i\}_{i=1}^d$ around $x$ so that


CHAPTER 3. LLE ON MANIFOLDS WITH BOUNDARY 78

$\operatorname{argmin}_{y\in\partial M}d(y,x)=\iota\circ\exp_x(\varepsilon(x)\partial_d)$. The symmetric region associated with $(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x)))$ is defined as
$$\tilde D_\varepsilon(x)=\Big\{(u_1,\cdots,u_d)\in T_xM\,\Big|\,\sum_{i=1}^d u_i^2\leq\varepsilon^2\ \text{and}\ u_d\leq\varepsilon(x)+\sum_{i,j=1}^{d-1}a_{ij}u_iu_j\Big\}\subset T_xM.\qquad(3.1.4)$$

For $x\notin M_\varepsilon$ and $\varepsilon$ sufficiently small, choose a normal coordinate $\{\partial_i\}_{i=1}^d$ around $x$ and define the symmetric region associated with $(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x)))$ as
$$\tilde D_\varepsilon(x)=\Big\{(u_1,\cdots,u_d)\in T_xM\,\Big|\,\sum_{i=1}^d u_i^2\leq\varepsilon^2\Big\}\subset T_xM.\qquad(3.1.5)$$

Clearly, $\tilde D_\varepsilon(x)$ is an approximation of $D_\varepsilon(x)$ up to a third order error term. When $x\in M_\varepsilon$, it is symmetric across $\partial_1,\ldots,\partial_{d-1}$, since if $(u_1,\cdots,u_i,\cdots,u_d)\in\tilde D_\varepsilon(x)$, then $(u_1,\cdots,-u_i,\cdots,u_d)\in\tilde D_\varepsilon(x)$ for $i=1,\cdots,d-1$.

The next lemma describes the error between $\int_{D_\varepsilon(x)}f(u)\,du$ and $\int_{\tilde D_\varepsilon(x)}f(u)\,du$.

Lemma 3.1.1. Fix $x\in M$. When $\varepsilon>0$ is sufficiently small, we have
$$\Big|\int_{D_\varepsilon(x)}du-\int_{\tilde D_\varepsilon(x)}du\Big|=O(\varepsilon^{d+2}).\qquad(3.1.6)$$

Proof. Based on (2.2.3) and the definition of $\tilde D_\varepsilon(x)$, the distance between the boundary of $D_\varepsilon(x)$ and the boundary of $\tilde D_\varepsilon(x)$ is of order $O(\varepsilon^3)$, and the volume of the boundary of $D_\varepsilon(x)$ is of order $O(\varepsilon^{d-1})$. Hence the volume difference between $D_\varepsilon(x)$ and $\tilde D_\varepsilon(x)$ is of order $O(\varepsilon^{d-1}\cdot\varepsilon^3)=O(\varepsilon^{d+2})$. The conclusion follows.

For $x\in M$, consider the following quantities, which can be understood as moments capturing the geometric asymmetry:
$$\mu_v(x,\varepsilon):=\int_{\tilde D_\varepsilon(x)}\prod_{i=1}^d u_i^{v_i}\,du,\qquad(3.1.7)$$
where $v=[v_1,\ldots,v_d]^\top$ describes the moment order.

We summarize the behavior of the moments that we need here:

Lemma 3.1.2. Suppose $\varepsilon$ is sufficiently small. Then $\mu_0(x,\varepsilon)$, $\mu_{e_d}(x,\varepsilon)$, $\mu_{2e_i}(x,\varepsilon)$ and $\mu_{2e_i+e_d}(x,\varepsilon)$ are continuous functions of $x$ on $M$ for all $i=1,\cdots,d$. Define $\frac{|S^{d-2}|}{d-1}=1$ when $d=1$. Then those functions can be quantitatively described as follows.

1. If $x\in M_\varepsilon$, $\mu_0$ is an increasing function of $\varepsilon(x)$ and
$$\mu_0(x,\varepsilon)=\frac{|S^{d-1}|}{2d}\varepsilon^d+\int_0^{\varepsilon(x)}\frac{|S^{d-2}|}{d-1}(\varepsilon^2-h^2)^{\frac{d-1}{2}}\,dh+O(\varepsilon^{d+1}).\qquad(3.1.8)$$
If $x\notin M_\varepsilon$, then
$$\mu_0(x,\varepsilon)=\frac{|S^{d-1}|}{d}\varepsilon^d.\qquad(3.1.9)$$
In general, the following bounds hold for $\mu_0(x,\varepsilon)$:
$$\frac{|S^{d-1}|}{2d}\varepsilon^d+O(\varepsilon^{d+1})\leq\mu_0(x,\varepsilon)\leq\frac{|S^{d-1}|}{d}\varepsilon^d.\qquad(3.1.10)$$



2. If $x\in M_\varepsilon$, $\mu_{e_d}$ is an increasing function of $\varepsilon(x)$ and
$$\mu_{e_d}(x,\varepsilon)=-\frac{|S^{d-2}|}{d^2-1}(\varepsilon^2-\varepsilon(x)^2)^{\frac{d+1}{2}}+O(\varepsilon^{d+2}).\qquad(3.1.11)$$
If $x\notin M_\varepsilon$, then
$$\mu_{e_d}(x,\varepsilon)=0.\qquad(3.1.12)$$
In general, $\mu_{e_d}(x,\varepsilon)$ is of order $O(\varepsilon^{d+1})$.

3. If $x\in M_\varepsilon$, $\mu_{2e_i}$ is an increasing function of $\varepsilon(x)$ for $i=1,\cdots,d$. We have
$$\mu_{2e_i}(x,\varepsilon)=\frac{|S^{d-1}|}{2d(d+2)}\varepsilon^{d+2}+\int_0^{\varepsilon(x)}\frac{|S^{d-2}|}{d^2-1}(\varepsilon^2-h^2)^{\frac{d+1}{2}}\,dh+O(\varepsilon^{d+3}),\qquad(3.1.13)$$
for $i=1,\cdots,d-1$, and
$$\mu_{2e_d}(x,\varepsilon)=\frac{|S^{d-1}|}{2d(d+2)}\varepsilon^{d+2}+\int_0^{\varepsilon(x)}\frac{|S^{d-2}|}{d-1}(\varepsilon^2-h^2)^{\frac{d-1}{2}}h^2\,dh+O(\varepsilon^{d+3}).\qquad(3.1.14)$$
If $x\notin M_\varepsilon$, then
$$\mu_{2e_i}(x,\varepsilon)=\frac{|S^{d-1}|}{d(d+2)}\varepsilon^{d+2}.\qquad(3.1.15)$$
In general, the following bounds hold for $\mu_{2e_i}(x,\varepsilon)$, where $i=1,\cdots,d$:
$$\frac{|S^{d-1}|}{2d(d+2)}\varepsilon^{d+2}+O(\varepsilon^{d+3})\leq\mu_{2e_i}(x,\varepsilon)\leq\frac{|S^{d-1}|}{d(d+2)}\varepsilon^{d+2}.\qquad(3.1.16)$$

4. If $x\in M_\varepsilon$, $\mu_{2e_i+e_d}$ is an increasing function of $\varepsilon(x)$ and
$$\mu_{2e_i+e_d}(x,\varepsilon)=-\frac{|S^{d-2}|}{(d^2-1)(d+3)}(\varepsilon^2-\varepsilon(x)^2)^{\frac{d+3}{2}}+O(\varepsilon^{d+4}),\qquad(3.1.17)$$
for $i=1,\cdots,d-1$. And
$$\mu_{3e_d}(x,\varepsilon)=-\frac{|S^{d-2}|}{(d^2-1)(d+3)}(\varepsilon^2-\varepsilon(x)^2)^{\frac{d+1}{2}}\big(2\varepsilon^2+(d+1)\varepsilon(x)^2\big)+O(\varepsilon^{d+4}).\qquad(3.1.18)$$
If $x\notin M_\varepsilon$, then
$$\mu_{2e_i+e_d}(x,\varepsilon)=0.\qquad(3.1.19)$$
In general, $\mu_{2e_i+e_d}(x,\varepsilon)$ is of order $O(\varepsilon^{d+3})$.

The proof is a straightforward integration, so we omit it here. This lemma tells us that when $x\notin M_\varepsilon$, that is, when $x$ is far away from the boundary, all odd order moments vanish due to the symmetry of the integration domain. However, when $x\in M_\varepsilon$, this no longer holds: the integration domain becomes asymmetric, and the odd moments no longer vanish.
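The closed-form moments can be spot-checked numerically. The following sketch (ours) verifies $\mu_{e_d}$ for $d=2$ with a straight boundary, where (3.1.11) holds without the higher order correction:

```python
import numpy as np

# Monte-Carlo check of mu_{e_d} (eq. 3.1.11) for d = 2: integrate u_d over
# {|u| <= eps, u_d <= eps_x} and compare with
# -|S^0|/(d^2-1) * (eps^2 - eps_x^2)^((d+1)/2) = -(2/3)*(eps^2-eps_x^2)^(3/2).
rng = np.random.default_rng(0)
eps, eps_x = 0.5, 0.2
m = 2_000_000
u = rng.uniform(-eps, eps, size=(m, 2))
inside = (np.sum(u**2, axis=1) <= eps**2) & (u[:, 1] <= eps_x)
vol_box = (2 * eps)**2                       # volume of the sampling box
mc = np.mean(u[:, 1] * inside) * vol_box     # Monte-Carlo estimate of the moment
exact = -(2.0 / 3.0) * (eps**2 - eps_x**2)**1.5
```

For a flat boundary the formula is exact, so the Monte-Carlo estimate agrees up to sampling error.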

For $v\in\mathbb{R}^p$, denote
$$v=[[\bar v_1,\bar v_2]]\in\mathbb{R}^p,\qquad(3.1.20)$$
where $\bar v_1\in\mathbb{R}^d$ forms the first $d$ coordinates of $v$ and $\bar v_2\in\mathbb{R}^{p-d}$ forms the last $p-d$ coordinates of $v$. Thus, for $v=[[\bar v_1,\bar v_2]]\in T_{\iota(x)}\mathbb{R}^p$, $\bar v_1=J_{p,d}^\top v$ is the coordinate of the tangential component of $v$ on $\iota_*T_xM$ and $\bar v_2=J_{p,p-d}^\top v$ is the coordinate of the normal component of $v$ associated with a chosen basis of the normal bundle.

Define $N_{ij}(x)=J_{p,p-d}^\top\mathrm{II}(e_i,e_j)$. Note that $N_{ij}(x)$ is symmetric in $i$ and $j$.

For $d\leq r\leq p$, we define
$$J_{p,r-d}:=J_{p,p-d}J_{p-d,r-d}\in\mathbb{R}^{p\times(r-d)}.\qquad(3.1.21)$$

We now calculate some major ingredients that we are going to use in the proof of the main theorem. Specifically, we calculate the first two order terms of $\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ and $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$, and the first two order terms of the tangential components of $\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ and $\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$.

Lemma 3.1.3. Fix $x\in M$ and $f\in C^3(M)$. When $\varepsilon>0$ is sufficiently small, the following expansions hold.

1. $\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ satisfies
$$\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=P(x)\mu_0(x,\varepsilon)+\partial_dP(x)\mu_{e_d}(x,\varepsilon)+O(\varepsilon^{d+2}).\qquad(3.1.22)$$

2. $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ satisfies
$$\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=P(x)\partial_df(x)\mu_{e_d}(x,\varepsilon)+\sum_{i=1}^d\Big(\frac{P(x)}{2}\partial^2_{ii}f(x)+\partial_if(x)\partial_iP(x)\Big)\mu_{2e_i}(x,\varepsilon)+O(\varepsilon^{d+3}).\qquad(3.1.23)$$

3. The vector $\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ satisfies
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=[[\bar v_1,\bar v_2]],\qquad(3.1.24)$$
where
$$\bar v_1=P(x)\mu_{e_d}(x,\varepsilon)J_{p,d}^\top e_d+\sum_{i=1}^d\partial_iP(x)\mu_{2e_i}(x,\varepsilon)J_{p,d}^\top e_i+O(\varepsilon^{d+3}),$$
$$\bar v_2=\frac{P(x)}{2}\sum_{i=1}^dN_{ii}(x)\mu_{2e_i}(x,\varepsilon)+O(\varepsilon^{d+3}).\qquad(3.1.25)$$

4. The vector $\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ satisfies
$$\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=[[\bar v_1,\bar v_2]],\qquad(3.1.26)$$
where
$$\bar v_1=P(x)\sum_{i=1}^d\partial_if(x)\mu_{2e_i}(x,\varepsilon)J_{p,d}^\top e_i+\sum_{i=1}^{d-1}\big[\partial_if(x)\partial_dP(x)+\partial_df(x)\partial_iP(x)+P(x)\partial^2_{id}f(x)\big]\mu_{2e_i+e_d}(x,\varepsilon)J_{p,d}^\top e_i\qquad(3.1.27)$$
$$+\sum_{i=1}^d\Big[\partial_if(x)\partial_iP(x)+\frac{P(x)}{2}\partial^2_{ii}f(x)\Big]\mu_{2e_i+e_d}(x,\varepsilon)J_{p,d}^\top e_d+O(\varepsilon^{d+4}),$$
$$\bar v_2=P(x)\sum_{i=1}^{d-1}\partial_if(x)N_{id}(x)\mu_{2e_i+e_d}(x,\varepsilon)+\frac{P(x)}{2}\partial_df(x)N_{dd}(x)\mu_{3e_d}(x,\varepsilon)+O(\varepsilon^{d+4}).\qquad(3.1.28)$$

Note that this lemma can be viewed as the generalization of Lemma 2.2.5 to manifolds with boundary. In particular, when $x\notin M_\varepsilon$, we recover Lemma 2.2.5.

Proof. First, we calculate $\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$:
$$\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\int_{D_\varepsilon(x)}\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du\qquad(3.1.29)$$
$$=P(x)\int_{\tilde D_\varepsilon(x)}du+\int_{\tilde D_\varepsilon(x)}\sum_{i=1}^d\partial_iP(x)u_i\,du+O(\varepsilon^{d+2})=P(x)\mu_0(x,\varepsilon)+\partial_dP(x)\mu_{e_d}(x,\varepsilon)+O(\varepsilon^{d+2}),$$
where the second equality holds by applying Lemma 3.1.1, since the error of changing the domain from $D_\varepsilon(x)$ to $\tilde D_\varepsilon(x)$ is of order $O(\varepsilon^{d+2})$. Note that $P(x)$ is bounded away from $0$.

Second, we calculate $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$. Note that when $\varepsilon$ is sufficiently small, we have
$$f\circ\exp_x(u)-f(x)=\sum_{i=1}^d\partial_if(x)u_i+\frac12\sum_{i,j=1}^d\partial^2_{ij}f(x)u_iu_j+O(\|u\|^3),\qquad(3.1.30)$$
which is of order $O(\varepsilon)$ for $u\in D_\varepsilon(x)$. By a direct expansion, we have

$$\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\int_{D_\varepsilon(x)}\Big(\sum_{i=1}^d\partial_if(x)u_i+\frac12\sum_{i,j=1}^d\partial^2_{ij}f(x)u_iu_j+O(\|u\|^3)\Big)\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\qquad(3.1.31)$$
$$\times\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du,$$



which by Lemma 3.1.1 and the symmetry of $\tilde D_\varepsilon(x)$ becomes
$$\int_{\tilde D_\varepsilon(x)}\Big[P(x)\sum_{i=1}^d\partial_if(x)u_i+\frac{P(x)}{2}\sum_{i,j=1}^d\partial^2_{ij}f(x)u_iu_j+\sum_{i=1}^d\partial_if(x)u_i\sum_{j=1}^d\partial_jP(x)u_j+O(\|u\|^3)\Big]du$$
$$=P(x)\partial_df(x)\int_{\tilde D_\varepsilon(x)}u_d\,du+\sum_{i=1}^d\Big(\frac{P(x)}{2}\partial^2_{ii}f(x)+\partial_if(x)\partial_iP(x)\Big)\int_{\tilde D_\varepsilon(x)}u_i^2\,du+O(\varepsilon^{d+3})$$
$$=P(x)\partial_df(x)\mu_{e_d}(x,\varepsilon)+\sum_{i=1}^d\Big(\frac{P(x)}{2}\partial^2_{ii}f(x)+\partial_if(x)\partial_iP(x)\Big)\mu_{2e_i}(x,\varepsilon)+O(\varepsilon^{d+3}).$$
Note that the leading term in the integrand is of order $O(\varepsilon)$; hence the error of changing the domain from $D_\varepsilon(x)$ to $\tilde D_\varepsilon(x)$ is of order $O(\varepsilon^{d+3})$.

Third, by a direct expansion, we have
$$\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\int_{D_\varepsilon(x)}\Big(\iota_*u+\frac12\mathrm{II}_x(u,u)+O(\|u\|^3)\Big)\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\qquad(3.1.32)$$
$$\times\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du,$$
which is a vector in $\mathbb{R}^p$. We then find the tangential part and the normal part of $\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ respectively. The tangential part is
$$\int_{D_\varepsilon(x)}(\iota_*u+O(\|u\|^3))\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du\qquad(3.1.33)$$
$$=\int_{\tilde D_\varepsilon(x)}(\iota_*u+O(\|u\|^3))\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du+O(\varepsilon^{d+3}),$$
where the equality holds by Lemma 3.1.1. Similarly, by Lemma 3.1.1, the normal part is

$$\int_{D_\varepsilon(x)}\Big(\frac12\mathrm{II}_x(u,u)+O(\|u\|^3)\Big)\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du\qquad(3.1.34)$$
$$=\int_{\tilde D_\varepsilon(x)}\Big(\frac12\mathrm{II}_x(u,u)+O(\|u\|^3)\Big)\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du+O(\varepsilon^{d+4}),$$
since the leading term $P(x)\mathrm{II}_x(u,u)$ is of order $O(\varepsilon^2)$ on $D_\varepsilon(x)$. As a result, by putting the tangential part and the normal part together, $\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=[[\bar v_1,\bar v_2]]$, where

$$\bar v_1=J_{p,d}^\top\Big[P(x)\int_{\tilde D_\varepsilon(x)}\iota_*u\,du+\int_{\tilde D_\varepsilon(x)}\iota_*u\sum_{i=1}^d\partial_iP(x)u_i\,du+O(\varepsilon^{d+3})\Big]\qquad(3.1.35)$$
$$=\Big(P(x)\int_{\tilde D_\varepsilon(x)}u_d\,du\Big)J_{p,d}^\top e_d+\sum_{i=1}^d\Big(\partial_iP(x)\int_{\tilde D_\varepsilon(x)}u_i^2\,du\Big)J_{p,d}^\top e_i+O(\varepsilon^{d+3})$$
$$=P(x)\mu_{e_d}(x,\varepsilon)J_{p,d}^\top e_d+\sum_{i=1}^d\partial_iP(x)\mu_{2e_i}(x,\varepsilon)J_{p,d}^\top e_i+O(\varepsilon^{d+3}),$$
$$\bar v_2=\frac{P(x)}{2}J_{p,p-d}^\top\int_{\tilde D_\varepsilon(x)}\mathrm{II}_x(u,u)\,du+O(\varepsilon^{d+3})=\frac{P(x)}{2}\sum_{i=1}^dN_{ii}(x)\mu_{2e_i}(x,\varepsilon)+O(\varepsilon^{d+3}).\qquad(3.1.36)$$

Finally, we evaluate $\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]$ and then find the tangential part and the normal part. By a direct expansion,
$$\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=\int_{D_\varepsilon(x)}\Big(\iota_*u+\frac12\mathrm{II}_x(u,u)+O(\|u\|^3)\Big)\Big(\sum_{i=1}^d\partial_if(x)u_i+\frac12\sum_{i,j=1}^d\partial^2_{ij}f(x)u_iu_j+O(\|u\|^3)\Big)\qquad(3.1.37)$$
$$\times\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du.$$

The tangential part is
$$\int_{D_\varepsilon(x)}(\iota_*u+O(\|u\|^3))\Big(\sum_{i=1}^d\partial_if(x)u_i+\frac12\sum_{i,j=1}^d\partial^2_{ij}f(x)u_iu_j+O(\|u\|^3)\Big)\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du.\qquad(3.1.38)$$
The leading term $P(x)\iota_*u\sum_{i=1}^d\partial_if(x)u_i$ is of order $O(\varepsilon^2)$ on $D_\varepsilon(x)$; therefore, the error of changing the domain from $D_\varepsilon(x)$ to $\tilde D_\varepsilon(x)$ is of order $O(\varepsilon^{d+4})$. The normal part is

$$\int_{D_\varepsilon(x)}\Big(\frac12\mathrm{II}_x(u,u)+O(\|u\|^3)\Big)\Big(\sum_{i=1}^d\partial_if(x)u_i+\frac12\sum_{i,j=1}^d\partial^2_{ij}f(x)u_iu_j+O(\|u\|^3)\Big)\Big(P(x)+\sum_{i=1}^d\partial_iP(x)u_i+O(\|u\|^2)\Big)\Big(1-\sum_{i,j=1}^d\frac16\mathrm{Ric}_x(i,j)u_iu_j+O(\|u\|^3)\Big)du.\qquad(3.1.39)$$
The leading term $P(x)\mathrm{II}_x(u,u)\sum_{i=1}^d\partial_if(x)u_i$ is of order $O(\varepsilon^3)$ on $D_\varepsilon(x)$; therefore, the error of changing the domain from $D_\varepsilon(x)$ to $\tilde D_\varepsilon(x)$ is of order $O(\varepsilon^{d+5})$. Putting the above together, $\mathbb{E}[(X-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(X)]=[[\bar v_1,\bar v_2]]$, where by the symmetry of $\tilde D_\varepsilon(x)$ we have

v_1 = J^⊤_{p,d}[ P(x)∫_{D_ε(x)} ι_*u ∑_{i=1}^d ∂_i f(x)u_i du + ∫_{D_ε(x)} ι_*u ∑_{i=1}^d ∂_i f(x)u_i ∑_{j=1}^d ∂_jP(x)u_j du   (3.1.40)

+ (P(x)/2)∫_{D_ε(x)} ι_*u ∑_{i,j=1}^d ∂²_{ij} f(x)u_iu_j du + O(ε^{d+4}) ]

= P(x)∑_{i=1}^d (∂_i f(x)∫_{D_ε(x)} u_i² du) J^⊤_{p,d}e_i

+ ∑_{i=1}^{d−1} [∂_i f(x)∂_dP(x) + ∂_d f(x)∂_iP(x) + P(x)∂²_{id} f(x)] ∫_{D_ε(x)} u_i²u_d du J^⊤_{p,d}e_i

+ ∑_{i=1}^d ([∂_i f(x)∂_iP(x) + (P(x)/2)∂²_{ii} f(x)] ∫_{D_ε(x)} u_i²u_d du) J^⊤_{p,d}e_d + O(ε^{d+4})

= P(x)∑_{i=1}^d ∂_i f(x)µ_{2e_i}(x,ε)J^⊤_{p,d}e_i

+ ∑_{i=1}^{d−1} [∂_i f(x)∂_dP(x) + ∂_d f(x)∂_iP(x) + P(x)∂²_{id} f(x)]µ_{2e_i+e_d}(x,ε)J^⊤_{p,d}e_i

+ ∑_{i=1}^d [∂_i f(x)∂_iP(x) + (P(x)/2)∂²_{ii} f(x)]µ_{2e_i+e_d}(x,ε)J^⊤_{p,d}e_d + O(ε^{d+4}),

and

v_2 = (P(x)/2) J^⊤_{p,p−d} ∑_{i=1}^d ∂_i f(x)∫_{D_ε(x)} II_x(u,u)u_i du + O(ε^{d+4})   (3.1.41)

= P(x)∑_{i=1}^{d−1} ∂_i f(x)N_{id}(x)µ_{2e_i+e_d}(x,ε) + (P(x)/2)∂_d f(x)N_{dd}(x)µ_{3e_d}(x,ε) + O(ε^{d+4}).

3.2 Variance analysis and bias analysis on manifolds with boundary, when ρ = 3

From now on, we fix the regularizer in (1.2.5) to be

c = nε^{d+3}.   (3.2.1)
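The way a regularizer of this kind enters the weight construction can be sketched in code. This is a hedged illustration only: (1.2.5) itself is not reproduced in this chapter, so the sketch assumes the standard regularized barycentric problem behind LLE, and `lle_weights` is a hypothetical helper name, not the thesis's notation.

```python
import numpy as np

def lle_weights(x, nbrs, c):
    """Regularized barycentric weights at x (a sketch, assuming the standard
    LLE formulation: minimize ||x - sum_j w_j y_j||^2 + c*||w||^2 subject to
    sum_j w_j = 1, over neighbors y_j).  The constant c plays the role of the
    regularizer c = n * eps**(d+3) fixed in (3.2.1)."""
    G = nbrs - x                                   # rows are y_j - x
    gram = G @ G.T + c * np.eye(len(nbrs))         # regularized local Gram matrix
    w = np.linalg.solve(gram, np.ones(len(nbrs)))  # unnormalized solution
    return w / w.sum()                             # enforce sum_j w_j = 1
```

For a point surrounded symmetrically by its neighbors, the weights come out equal; for example, the four axis neighbors of the origin in R² each receive weight 1/4.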

Recall that C_x = E[(X − ι(x))(X − ι(x))^⊤χ_{B^{R^p}_ε(ι(x))}(X)]. Suppose rank(C_x) = r. For x ∈ M, let C_x = UΛU^⊤ be the eigendecomposition of C_x, let λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_r > λ_{r+1} = ⋯ = λ_p = 0 be the eigenvalues of C_x, and let β_i be the corresponding orthonormal eigenvectors.

In the next lemma we calculate C_x. This lemma can be viewed as a generalization of Proposition 2.3.1, in the sense that when x ∉ M_ε the result reduces to that of Proposition 2.3.1. To handle the boundary effect, we only need to calculate the first two order terms in the eigenvalues and orthonormal eigenvectors of C_x.


Lemma 3.2.1. Fix x ∈ M. Then

C_x = P(x)[ M^{(0)}(x,ε)  0 ; 0  0 ] + [ M^{(11)}(x,ε)  M^{(12)}(x,ε) ; M^{(21)}(x,ε)  0 ] + O(ε^{d+4}),   (3.2.2)

where M^{(0)} is a d × d diagonal matrix with m-th diagonal entry µ_{2e_m}(x,ε). In particular, when x ∉ M_ε, [ M^{(11)}(x,ε)  M^{(12)}(x,ε) ; M^{(21)}(x,ε)  0 ] = 0.

The first d eigenvalues of C_x are

λ_i = P(x)µ_{2e_i}(x,ε)[1 + γ_i(x)ε] + O(ε^{d+4}),   (3.2.3)

where i = 1, …, d. If x ∉ M_ε, then γ_i(x) = 0. The last p − d eigenvalues of C_x are λ_i = O(ε^{d+4}), where i = d + 1, …, p.

Suppose that rank(C_x) = r. Then the corresponding orthonormal eigenvector matrix is

X(x,ε) = X(x,0) + εX(x,0)S(x) + O(ε²),   (3.2.4)

where

X(x,0) = [ X_1(x) 0 0 ; 0 X_2(x) 0 ; 0 0 X_3(x) ],  S(x) = [ S_{11}(x) S_{12}(x) S_{13}(x) ; S_{21}(x) S_{22}(x) S_{23}(x) ; S_{31}(x) S_{32}(x) S_{33}(x) ],   (3.2.5)

with X_1 ∈ O(d), X_2 ∈ O(r − d) and X_3 ∈ O(p − r). The matrix S(x) is divided into blocks in the same way as X(x,0), and S(x) is an antisymmetric matrix. In particular, if x ∉ M_ε, then S(x) = 0.

The proof is essentially the same as that of Proposition 2.3.1, except that we need to handle the fact that the integration domain is no longer symmetric when x is close to the boundary.
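The eigenvalue sizes in Lemma 3.2.1 can be sanity-checked numerically. The following is a hypothetical example, not from the thesis: for the unit circle in R² (d = 1, p = 2, uniform density), the local truncated covariance has one tangent eigenvalue of order ε^{d+2} = ε³ and a much smaller normal eigenvalue of order ε^{d+4} = ε⁵.

```python
import numpy as np

def local_covariance(eps, m=20000):
    """Riemann sum for C_x = E[(X - x)(X - x)^T chi_{B_eps(x)}(X)] at
    x = (1, 0) on the unit circle with uniform density (a sketch)."""
    thetas = np.linspace(-np.pi, np.pi, m, endpoint=False)
    pts = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    diff = pts - np.array([1.0, 0.0])
    mask = np.linalg.norm(diff, axis=1) <= eps   # truncate to the eps-ball
    return (diff[mask].T @ diff[mask]) / m       # each sample has weight 1/m

C = local_covariance(0.1)
lams = np.linalg.eigvalsh(C)[::-1]               # descending eigenvalues
```

With ε = 0.1 the tangent eigenvalue is close to ε³/(3π), while the normal eigenvalue is two orders of ε smaller, matching the O(ε^{d+2}) versus O(ε^{d+4}) split in the lemma.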

Proof. By definition, the (m,n)-th entry of C_x is

e_m^⊤C_xe_n = ∫_{D_ε(x)} (ι(y) − ι(x))^⊤e_m (ι(y) − ι(x))^⊤e_n P(y) dV(y).   (3.2.6)

By the expression

ι ∘ exp_x(u) − ι(x) = ι_*u + (1/2)II_x(u,u) + O(u³),   (3.2.7)

we have

(ι(y) − ι(x))^⊤e_m (ι(y) − ι(x))^⊤e_n = (e_m^⊤ι_*u)(e_n^⊤ι_*u) + (1/2)(e_m^⊤ι_*u)(e_n^⊤II_x(u,u)) + (1/2)(e_m^⊤II_x(u,u))(e_n^⊤ι_*u) + O(u⁴).   (3.2.8)


Thus, (3.2.6) is reduced to

e_m^⊤C_xe_n = ∫_{D_ε(x)} ((e_m^⊤ι_*u)(e_n^⊤ι_*u) + (1/2)(e_m^⊤ι_*u)(e_n^⊤II_x(u,u)) + (1/2)(e_m^⊤II_x(u,u))(e_n^⊤ι_*u) + O(u⁴)) × (P(x) + ∇_uP(x) + O(u²)) (1 − (1/6)∑_{i,j=1}^d Ric_x(i,j)u_iu_j + O(u³)) du.   (3.2.9)

For 1 ≤ m, n ≤ d, (e_m^⊤ι_*u)(e_n^⊤ι_*u) = u_mu_n. Moreover, e_m^⊤II_x(u,u) and e_n^⊤II_x(u,u) are zero, so

e_m^⊤C_xe_n = ∫_{D_ε(x)} (u_mu_n + O(u⁴)) (P(x) + ∇_uP(x) + O(u²)) (1 − (1/6)∑_{i,j=1}^d Ric_x(i,j)u_iu_j + O(u³)) du   (3.2.10)

= P(x)∫_{D_ε(x)} u_mu_n du + ∫_{D_ε(x)} u_mu_n ∑_{k=1}^d u_k∂_kP(x) du + O(ε^{d+4}),

where we use Lemma 3.1.1 to handle the error of changing the domain from D_ε(x) to D̃_ε(x), which is O(ε^{d+4}). By the symmetry of the domain D̃_ε(x), if 1 ≤ m = n ≤ d,

M^{(0)}_{m,n} = ∫_{D_ε(x)} u_m² du = µ_{2e_m}(x,ε),   (3.2.11)

and M^{(0)}_{m,n} = 0 otherwise.

Next,

M^{(11)}_{m,n} = ∫_{D_ε(x)} u_mu_n ∑_{k=1}^d u_k∂_kP(x) du.   (3.2.12)

So, by the symmetry of the domain D̃_ε(x), we have

M^{(11)}_{m,n} = ∂_dP(x)µ_{2e_m+e_d} if 1 ≤ m = n ≤ d; ∂_nP(x)µ_{2e_n+e_d} if m = d and 1 ≤ n < d; ∂_mP(x)µ_{2e_m+e_d} if n = d and 1 ≤ m < d; and M^{(11)}_{m,n} = 0 otherwise.

For 1 ≤ m ≤ d and d + 1 ≤ n ≤ p,

e_m^⊤C_xe_n = ∫_{D_ε(x)} ((1/2)(e_m^⊤ι_*u)(e_n^⊤II_x(u,u)) + O(u⁴)) (P(x) + ∇_uP(x) + O(u²))   (3.2.13)

× (1 − (1/6)∑_{i,j=1}^d Ric_x(i,j)u_iu_j + O(u³)) du

= (P(x)/2)∫_{D_ε(x)} u_m(e_n^⊤II_x(u,u)) du + O(ε^{d+4}).   (3.2.14)

We use Lemma 3.1.1 to handle the error of changing the domain from D_ε(x) to D̃_ε(x), which is O(ε^{d+5}); hence

M^{(12)}(x)_{m,n−d} = (P(x)/2)∫_{D_ε(x)} u_m(e_n^⊤II_x(u,u)) du.   (3.2.15)


Similarly, for 1 ≤ n ≤ d and d + 1 ≤ m ≤ p,

e_m^⊤C_xe_n = (P(x)/2)∫_{D_ε(x)} u_n(e_m^⊤II_x(u,u)) du + O(ε^{d+4}),   (3.2.16)

and

M^{(21)}(x)_{m−d,n} = (P(x)/2)∫_{D_ε(x)} u_n(e_m^⊤II_x(u,u)) du.   (3.2.17)

At last, for d + 1 ≤ m, n ≤ p, e_m^⊤ι_*u and e_n^⊤ι_*u are 0; therefore, e_m^⊤C_xe_n = O(ε^{d+4}).

Based on Lemma 3.1.2, [ M^{(0)}(x,ε) 0 ; 0 0 ] is of order O(ε^{d+2}) and [ M^{(11)}(x,ε) M^{(12)}(x,ε) ; M^{(21)}(x,ε) 0 ] is of order O(ε^{d+3}). Note that the entries of [ M^{(11)}(x,ε) M^{(12)}(x,ε) ; M^{(21)}(x,ε) 0 ] are integrals of odd-order polynomials over D_ε(x); hence the matrix is 0 when x ∉ M_ε.

We can apply the perturbation theory in the introduction: the first d eigenvalues of C_x are

λ_i = P(x)µ_{2e_i}(x,ε) + λ_i^{(1)}(x) + O(ε^{d+4}),   (3.2.18)

for i = 1, …, d and any x ∈ M, where λ_i^{(1)}(x) are the eigenvalues of M^{(11)}(x,ε), and the λ_i^{(1)}(x) are of order O(ε^{d+3}). Note that µ_{2e_i}(x,ε) is of order ε^{d+2}; thus we define the order-O(1) term

γ_i(x) = λ_i^{(1)}(x) / (P(x)µ_{2e_i}(x,ε)ε).   (3.2.19)

Here γ_i(x) depends on P. When x ∉ M_ε, since the λ_i^{(1)}(x) are zero, we have γ_i(x) = 0. Therefore,

λ_i = P(x)µ_{2e_i}(x,ε)[1 + γ_i(x)ε] + O(ε^{d+4}),   (3.2.20)

where i = 1, …, d. Moreover, λ_i = O(ε^{d+4}) for i = d + 1, …, p.

Suppose that rank(C_x) = r. Based on the perturbation theory in the introduction, the orthonormal eigenvector matrix of C_x is of the form

X(x,ε) = [ X_1(x) 0 0 ; 0 X_2(x) 0 ; 0 0 X_3(x) ] + ε[ X_1(x) 0 0 ; 0 X_2(x) 0 ; 0 0 X_3(x) ] S(x) + O(ε²),   (3.2.21)

where X_1(x) ∈ O(d), X_2(x) ∈ O(r − d) and X_3(x) ∈ O(p − r). S(x) is an antisymmetric matrix depending on [ M^{(11)}(x,ε) M^{(12)}(x,ε) ; M^{(21)}(x,ε) 0 ] and the higher order terms of C_x. In particular, if x ∉ M_ε, then S(x) = 0. Moreover, if among the first d eigenvalues of C_x there are 1 ≤ t ≤ d distinct ones,


then there is a choice of basis e_1, ⋯, e_d of the tangent space of M so that

X_1(x) = [ X_1^{(1)}(x) 0 ⋯ 0 ; 0 X_1^{(2)}(x) ⋯ 0 ; 0 0 ⋱ 0 ; 0 0 ⋯ X_1^{(t)}(x) ],   (3.2.22)

where each X_1^{(i)}(x) is an orthogonal matrix corresponding to the same eigenvalue of C_x. The conclusion follows.

Recall that

T^⊤_{ι(x)} = E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤ U I_{p,r}(Λ + ε^{d+3}I_{p×p})^{−1}U^⊤   (3.2.23)

= ∑_{i=1}^r E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤ β_iβ_i^⊤ / (λ_i + ε^{d+3}) ∈ R^p.   (3.2.24)
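The equality of the matrix form (3.2.23) and the eigenpair sum (3.2.24) can be checked numerically. In the sketch below, C_x is replaced by a synthetic rank-r PSD matrix and the expectation vector by a placeholder v; I_{p,r} keeps the first r spectral modes.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, c = 5, 3, 1e-3                     # c plays the role of eps**(d+3)
B = rng.standard_normal((p, r))
Cx = B @ B.T                             # synthetic PSD matrix of rank r
v = rng.standard_normal(p)               # placeholder for E[(X - iota(x)) chi]

lam, U = np.linalg.eigh(Cx)
order = np.argsort(lam)[::-1]            # descending: lam_1 >= ... >= lam_p
lam, U = lam[order], U[:, order]

Ipr = np.diag((np.arange(p) < r).astype(float))     # I_{p,r}: first r modes
T_matrix = v @ U @ Ipr @ np.linalg.inv(np.diag(lam) + c * np.eye(p)) @ U.T
T_sum = sum((v @ U[:, i]) * U[:, i] / (lam[i] + c) for i in range(r))
```

Both routes produce the same vector in R^p, which is the content of the second equality above.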

We are going to calculate T_{ι(x)} in the next lemma. For our purpose, we only need to calculate the first two order terms in the tangent component of T_{ι(x)} and the first order term in the normal direction.

Lemma 3.2.2. T_{ι(x)} = [[v_1^{(−1)} + v_{1,1}^{(0)} + v_{1,2}^{(0)}, v_2^{(−1)}]] + [[O(ε), O(1)]], where

v_1^{(−1)} = (µ_{e_d}(x,ε)/µ_{2e_d}(x,ε)) J^⊤_{p,d}e_d,   (3.2.25)

v_{1,1}^{(0)} = ∇P(x)/P(x),   (3.2.26)

v_{1,2}^{(0)} = −ε(µ_{e_d}(x,ε)γ_d(x)/µ_{2e_d}(x,ε)) J^⊤_{p,d}e_d − (µ_{e_d}(x,ε)ε^{d+3}/(P(x)(µ_{2e_d}(x,ε))²)) J^⊤_{p,d}e_d   (3.2.27)

+ εµ_{e_d}(x,ε)∑_{i=1}^d [ (e_d^⊤J_{p,d}X_1(x)S_{11}(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i/µ_{2e_i}(x,ε) + (e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)S_{11}(x)J^⊤_{p,d}e_i/µ_{2e_i}(x,ε) ]

+ ∑_{i=d+1}^r [ P(x)(µ_{e_d}(x,ε)/ε^{d+1})(e_d^⊤J_{p,d}X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i) ] X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i

+ ∑_{i=d+1}^r [ (P(x)/2)∑_{j=1}^d (µ_{2e_j}/ε^{d+2})N_{jj}(x)^⊤J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i ] X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i,

and

v_2^{(−1)} = ∑_{i=d+1}^r [ P(x)(µ_{e_d}(x,ε)/ε^{d+2})(e_d^⊤J_{p,d}X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i) ] J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i   (3.2.28)

+ ∑_{i=d+1}^r [ (P(x)/2)∑_{j=1}^d (µ_{2e_j}/ε^{d+3})N_{jj}(x)^⊤J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i ] J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i.

Note that by Lemma 3.1.2, v_1^{(−1)} is of order ε^{−1} when x ∈ M_ε and 0 when x ∉ M_ε; v_{1,2}^{(0)} is of order 1, since µ_{e_d}(x,ε) is of order ε^{d+1} and µ_{2e_i}(x,ε) is of order ε^{d+2} for i = 1, …, d. Moreover, when x ∉ M_ε, we have µ_{e_d}(x,ε) = 0 and S_{12}(x) = 0, hence v_{1,2}^{(0)} = 0. Similarly, v_2^{(−1)} is of order ε^{−1}.


Proof. Recall that

T^⊤_{ι(x)} = ∑_{i=1}^r E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤ β_iβ_i^⊤ / (λ_i + ε^{d+3}).   (3.2.29)

We evaluate the terms in T_{ι(x)} one by one.

Based on Lemma 3.2.1, the first d eigenvalues are λ_i = P(x)µ_{2e_i}(x,ε)[1 + γ_i(x)ε] + O(ε^{d+4}), where i = 1, …, d, and the corresponding eigenvectors are

β_i = [ X_1(x)J^⊤_{p,d}e_i ; 0_{(p−d)×1} ] + [ εX_1(x)S_{11}(x)J^⊤_{p,d}e_i + O(ε²) ; O(ε) ],   (3.2.30)

where X_1(x) ∈ O(d).

For i = d + 1, …, r, λ_i = O(ε^{d+4}), and the corresponding eigenvectors are

β_i = [ 0_{d×1} ; J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i ] + [ εX_1(x)S_{12}(x)J^⊤_{p,r−d}e_i + O(ε²) ; O(ε) ],   (3.2.31)

where X_2(x) ∈ O(r − d).

By Lemma 3.1.3, we have

E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)] = [[v_1, v_2]],   (3.2.32)

where

v_1 = P(x)µ_{e_d}(x,ε)J^⊤_{p,d}e_d + ∑_{i=1}^d ∂_iP(x)µ_{2e_i}(x,ε)J^⊤_{p,d}e_i + O(ε^{d+3}),

v_2 = (P(x)/2)∑_{i=1}^d N_{ii}(x)µ_{2e_i} + O(ε^{d+3}).   (3.2.33)

Next, we calculate E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_i for i = 1, …, d. Note that the normal component of β_i is of order O(ε) and the normal component of E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)] is of order O(ε^{d+2}), so they will only contribute in the O(ε^{d+3}) term. Therefore, for i = 1, …, d, the first two order terms of E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_i are

E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_i   (3.2.34)

= (P(x)µ_{e_d}(x,ε))(e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i) + ε(P(x)µ_{e_d}(x,ε))(e_d^⊤J_{p,d}X_1(x)S_{11}(x)J^⊤_{p,d}e_i) + ∑_{j=1}^d (∂_jP(x)µ_{2e_j}(x,ε))(e_j^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i) + O(ε^{d+3}).

By putting the above expressions together, a direct calculation shows that the normal component of

∑_{i=1}^d E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_iβ_i^⊤ / (λ_i + ε^{d+3})

is of order O(1) and the tangent component of the same sum is of order O(ε^{−1}):

P(x)µ_{e_d}(x,ε)∑_{i=1}^d (e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3})   (3.2.35)

+ εP(x)µ_{e_d}(x,ε)∑_{i=1}^d (e_d^⊤J_{p,d}X_1(x)S_{11}(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3})

+ ∑_{i=1}^d ∑_{j=1}^d (∂_jP(x)µ_{2e_j}(x,ε))(e_j^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3})

+ εP(x)µ_{e_d}(x,ε)∑_{i=1}^d (e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)S_{11}(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3}) + O(ε),

where the first term is of order ε^{−1} and the second to the fourth terms are of order 1, since µ_{e_d}(x,ε) is of order ε^{d+1}, µ_{2e_i}(x,ε) is of order ε^{d+2} for i = 1, …, d, and λ_i is of order ε^{d+2} for i = 1, …, d.

To simplify the formula for the tangent component of T_{ι(x)}, recall from (3.2.22) that

X_1(x) = [ X_1^{(1)}(x) 0 ⋯ 0 ; 0 X_1^{(2)}(x) ⋯ 0 ; 0 0 ⋱ 0 ; 0 0 ⋯ X_1^{(t)}(x) ],   (3.2.36)

where 1 ≤ t ≤ d. Here different X_1^{(k)} correspond to different λ_k. In particular, X_1^{(k)} ∈ O(d_k) corresponds to the eigenvalue λ_k of C_x, where d_k ∈ N is the multiplicity of λ_k. Note that ∑_{i=1}^d (e_j^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i, for j = 1, …, d, is the projection of J^⊤_{p,d}e_j onto the space spanned by the columns of [0, ⋯, X_1^{(k)}, ⋯, 0]^⊤ ∈ R^{d×d_k}, where X_1^{(k)} corresponds to λ_k = λ_j. In other words, if λ_j ≠ λ_i, then

(e_j^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i = 0.   (3.2.37)

Thus we have

∑_{i=1}^d (e_j^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3}) = J^⊤_{p,d}e_j / (λ_j + ε^{d+3}).   (3.2.38)

We conclude that

P(x)µ_{e_d}(x,ε)∑_{i=1}^d (e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3})   (3.2.39)

= P(x)µ_{e_d}(x,ε)J^⊤_{p,d}e_d / (λ_d + ε^{d+3})

= µ_{e_d}(x,ε)J^⊤_{p,d}e_d / ( µ_{2e_d}(x,ε)[1 + γ_d(x)ε + ε^{d+3}/(P(x)µ_{2e_d}(x,ε))] + O(ε^{d+4}) )

= (µ_{e_d}(x,ε)/µ_{2e_d}(x,ε)) J^⊤_{p,d}e_d [1 − γ_d(x)ε − ε^{d+3}/(P(x)µ_{2e_d}(x,ε))] + O(ε),

where µ_{e_d}(x,ε)J^⊤_{p,d}e_j/µ_{2e_j}(x,ε) is of order ε^{−1}, since µ_{e_d}(x,ε) is of order ε^{d+1} and µ_{2e_j}(x,ε) is of order ε^{d+2}.


Similarly,

∑_{i=1}^d ∑_{j=1}^d (∂_jP(x)µ_{2e_j}(x,ε))(e_j^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3})   (3.2.40)

= ∑_{j=1}^d (∂_jP(x)µ_{2e_j}(x,ε))J^⊤_{p,d}e_j / (λ_j + ε^{d+3})

= ∑_{j=1}^d (∂_jP(x)µ_{2e_j}(x,ε))J^⊤_{p,d}e_j / (P(x)µ_{2e_j}(x,ε)[1 + γ_j(x)ε] + ε^{d+3} + O(ε^{d+4}))

= ∇P(x)/P(x) + O(ε).

At last,

εP(x)µ_{e_d}(x,ε)∑_{i=1}^d (e_d^⊤J_{p,d}X_1(x)S_{11}(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3})   (3.2.41)

+ εP(x)µ_{e_d}(x,ε)∑_{i=1}^d (e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)S_{11}(x)J^⊤_{p,d}e_i / (λ_i + ε^{d+3})

= εµ_{e_d}(x,ε)∑_{i=1}^d [ (e_d^⊤J_{p,d}X_1(x)S_{11}(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / (µ_{2e_i}(x,ε)[1 + γ_i(x)ε] + ε^{d+3} + O(ε^{d+4})) + (e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)S_{11}(x)J^⊤_{p,d}e_i / (µ_{2e_i}(x,ε)[1 + γ_i(x)ε] + ε^{d+3} + O(ε^{d+4})) ]

= εµ_{e_d}(x,ε)∑_{i=1}^d [ (e_d^⊤J_{p,d}X_1(x)S_{11}(x)J^⊤_{p,d}e_i)X_1(x)J^⊤_{p,d}e_i / µ_{2e_i}(x,ε) + (e_d^⊤J_{p,d}X_1(x)J^⊤_{p,d}e_i)X_1(x)S_{11}(x)J^⊤_{p,d}e_i / µ_{2e_i}(x,ε) ] + O(ε).

By combining the above equations, the tangent component of ∑_{i=1}^d E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_iβ_i^⊤/(λ_i + ε^{d+3}) can be simplified.

Next, we calculate E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_i for i = d + 1, …, r:

E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_i   (3.2.42)

= P(x)εµ_{e_d}(x,ε)(e_d^⊤J_{p,d}X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i) + (P(x)/2)∑_{j=1}^d µ_{2e_j}N_{jj}(x)^⊤J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i + O(ε^{d+3}),

where both terms are of order O(ε^{d+2}).

Note that λ_i = O(ε^{d+4}) for i = d + 1, …, r, so we have

E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_i / (λ_i + ε^{d+3})   (3.2.43)

= E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_i / (ε^{d+3} + O(ε^{d+4}))

= P(x)(µ_{e_d}(x,ε)/ε^{d+2})(e_d^⊤J_{p,d}X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i) + (P(x)/2)∑_{j=1}^d (µ_{2e_j}/ε^{d+3})N_{jj}(x)^⊤J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i + O(1).


A direct calculation shows that

∑_{i=d+1}^r E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_iβ_i^⊤ / (λ_i + ε^{d+3})   (3.2.44)

= ∑_{i=d+1}^r E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_iβ_i^⊤ / (ε^{d+3} + O(ε^{d+4}))

= [[ ∑_{i=d+1}^r [ P(x)(µ_{e_d}(x,ε)/ε^{d+1})(e_d^⊤J_{p,d}X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i) ] X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i

+ ∑_{i=d+1}^r [ (P(x)/2)∑_{j=1}^d (µ_{2e_j}/ε^{d+2})N_{jj}(x)^⊤J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i ] X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i + O(ε),

∑_{i=d+1}^r [ P(x)(µ_{e_d}(x,ε)/ε^{d+2})(e_d^⊤J_{p,d}X_1(x)S_{12}(x)J^⊤_{p,r−d}e_i) ] J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i

+ ∑_{i=d+1}^r [ (P(x)/2)∑_{j=1}^d (µ_{2e_j}/ε^{d+3})N_{jj}(x)^⊤J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i ] J_{p−d,r−d}X_2(x)J^⊤_{p,r−d}e_i + O(1) ]].

Note that the tangent component is of order O(1) and the normal component is of order ε^{−1}. Summing up ∑_{i=1}^d E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_iβ_i^⊤/(λ_i + ε^{d+3}) and ∑_{i=d+1}^r E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤β_iβ_i^⊤/(λ_i + ε^{d+3}), we obtain the conclusion.

Recall that

K_{LLE}(x,y) = [1 − T^⊤_{ι(x)}(ι(y) − ι(x))]χ_{B^{R^p}_ε(ι(x)) ∩ ι(M)}(ι(y)),   (3.2.45)

and

Q f(x) := E[ f(X)K(x,X)] / E[K(x,X)].   (3.2.46)
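Given any vector in the role of T_{ι(x)}, (3.2.45) and (3.2.46) can be evaluated empirically. The following sketch uses placeholder values for T and synthetic data, so it only illustrates the structure of the kernel and of Q f, not the actual asymptotics.

```python
import numpy as np

rng = np.random.default_rng(1)
p, eps = 2, 0.3
x = np.zeros(p)
T = np.array([0.5, -0.2])                # placeholder for T_{iota(x)}
Y = rng.uniform(-1, 1, (5000, p))        # synthetic "data" points
f = lambda y: y[..., 0]**2 + y[..., 1]   # a test function with f(x) = 0

in_ball = np.linalg.norm(Y - x, axis=1) <= eps
K = (1.0 - (Y - x) @ T) * in_ball        # affine-in-(y - x) kernel K_LLE(x, y)
Qf = (f(Y) * K).sum() / K.sum()          # empirical version of (3.2.46)
```

Because |T| is small relative to 1/ε here, the kernel stays positive on the ball, so Q f is a weighted average of f over the ε-neighborhood of x.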

3.2.1 Conclusion when ρ = 3

We fix the regularizer in (1.2.5) to be c = nε^{d+3}. The following theorem describes the convergence behavior and the convergence rate of the matrix W.

Theorem 3.2.1. Suppose f ∈ C³(M). Suppose ε = ε(n) is chosen so that √log(n)/(n^{1/2}ε^{d/2+1}) → 0 and ε → 0 as n → ∞. Then, with probability greater than 1 − n^{−2}, for all k = 1, …, n,

∑_{j=1}^n [W − I_{n×n}]_{kj} f(x_j) = Q f(x_k) − f(x_k) + O(√log(n)/(n^{1/2}ε^{d/2−1})).

We have the following theorem describing Q when ε is sufficiently small.

Theorem 3.2.2. Let (M,g) be a d-dimensional compact, smooth Riemannian manifold isometrically embedded in R^p, where M may have smooth boundary. Suppose f ∈ C³(M) and P ∈ C²(M). Then

Q f(x) − f(x) = ∑_{i=1}^d φ_i(x,ε)∂²_{ii} f(x) + g(V(x,ε), ∇f(x)) + O(ε³),   (3.2.47)

where V is a vector field; V and φ_i are defined in Notation 3.2.1. If x ∉ M_ε, then

φ_i(x,ε) = ε²/(2(d + 2))   (3.2.48)

for i = 1, ⋯, d, and V(x,ε) = 0. If x ∈ M_ε, then φ_i(x,ε) is of order ε² for i = 1, ⋯, d, and V(x,ε) = O(ε²).

Remark 3.2.1. Since V(x,ε) = 0 if x ∉ M_ε, V(x,ε)/ε² = O(1) for x ∈ M_ε, and φ_i(x,ε)/ε² is of order O(1) for all x, the operator (Q f(x) − f(x))/ε² converges to a second order differential operator as ε → 0. Hence, we conclude that when c = nε^{d+3}, the LLE matrix (1/ε²)[W − I_{n×n}] recovers a differential operator with free boundary condition.

3.2.2 Proof of Theorem 3.2.2

To simplify the proof of the main theorems, we introduce the following notation.

Notation 3.2.1. Define functions

φ_i(x,ε) = (µ_{2e_d}(x,ε)µ_{2e_i}(x,ε) − µ_{2e_i+e_d}(x,ε)µ_{e_d}(x,ε)) / (2µ_{2e_d}(x,ε)µ_0(x,ε) − 2µ_{e_d}(x,ε)²)   (3.2.49)

for i = 1, ⋯, d. Define a vector field

V(x,ε) = ∑_{i=1}^d V_i(x,ε)∂_i,   (3.2.50)

where

V_i(x,ε) = −[ µ_{2e_i}(x,ε)µ_{2e_d}(x,ε)v_{1,2}^{(0)⊤}J^⊤_{p,d}e_i + (∂_iP(x)/P(x))µ_{2e_i+e_d}(x,ε)µ_{e_d}(x,ε) + µ_{2e_i+e_d}(x,ε)µ_{2e_d}(x,ε)v_2^{(−1)⊤}N_{id}(x) ] / (µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²),   (3.2.51)

for i = 1, ⋯, d − 1, and

V_d(x,ε) = −[ µ_{2e_d}(x,ε)²v_{1,2}^{(0)⊤}J^⊤_{p,d}e_d + (∂_dP(x)/P(x))µ_{3e_d}(x,ε)µ_{e_d}(x,ε) + (1/2)µ_{3e_d}(x,ε)µ_{2e_d}(x,ε)v_2^{(−1)⊤}N_{dd}(x) ] / (µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²),   (3.2.52)

where v_{1,2}^{(0)} is defined in (3.2.27) and v_2^{(−1)} is defined in (3.2.28).

The next lemma describes the behavior of the above seemingly complicated notation. The proof follows from a direct calculation based on Lemma 3.1.2 and Lemma 3.2.2.

Lemma 3.2.3. If x ∈ M_ε, then φ_i(x,ε) is of order ε² for i = 1, ⋯, d, and

V(x,ε) = O(ε²).   (3.2.53)

If x ∉ M_ε, then

φ_i(x,ε) = ε²/(2(d + 2))   (3.2.54)

for i = 1, ⋯, d, and

V(x,ε) = 0.   (3.2.55)

In the proof of the main theorem, we calculate the first two order terms in Q f(x). First, we calculate E[χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)} and show that it is dominated by the order-ε^d terms. Then we calculate E[( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)} and show that it is dominated by the order-ε^{d+2} terms. Hence their ratio is dominated by the order-ε² terms.

By Lemma 3.1.3 and Lemma 3.2.2, we have

E[χ_{B^{R^p}_ε(ι(x))}(X)] = P(x)µ_0(x,ε) + O(ε^{d+1}),   (3.2.56)

E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)] = [[P(x)µ_{e_d}(x,ε)J^⊤_{p,d}e_d + O(ε^{d+2}), O(ε^{d+2})]],   (3.2.57)

and T_{ι(x)} = [[v_1^{(−1)} + v_{1,1}^{(0)} + v_{1,2}^{(0)}, v_2^{(−1)}]] + [[O(ε), O(1)]], where

v_1^{(−1)} = (µ_{e_d}(x,ε)/µ_{2e_d}(x,ε))J^⊤_{p,d}e_d,   (3.2.58)

v_{1,1}^{(0)} = ∇P(x)/P(x),   (3.2.59)

and v_{1,2}^{(0)} and v_2^{(−1)} are defined in Lemma 3.2.2. Moreover, v_{1,2}^{(0)} is of order O(1) and v_2^{(−1)} is of order O(ε^{−1}). Hence,

E[χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)}   (3.2.60)

= P(x)[µ_0(x,ε) − µ_{e_d}(x,ε)²/µ_{2e_d}(x,ε)] + O(ε^{d+1})

= P(x)[(µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²)/µ_{2e_d}(x,ε)] + O(ε^{d+1}),

where the leading term in the above expression is of order ε^d by Lemma 3.1.2.

Based on Lemma 3.1.3, we have

E[( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)] = P(x)∂_d f(x)µ_{e_d}(x,ε) + ∑_{i=1}^d [(P(x)/2)∂²_{ii} f(x) + ∂_i f(x)∂_iP(x)]µ_{2e_i}(x,ε) + O(ε^{d+3}),

and

E[(X − ι(x))( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)] = [[v_1, v_2]],   (3.2.61)

where

v_1 = P(x)∑_{i=1}^d (∂_i f(x)µ_{2e_i}(x,ε))J^⊤_{p,d}e_i + ∑_{i=1}^{d−1} [∂_i f(x)∂_dP(x) + ∂_d f(x)∂_iP(x) + P(x)∂²_{id} f(x)]µ_{2e_i+e_d}(x,ε)J^⊤_{p,d}e_i   (3.2.62)

+ ∑_{i=1}^d ([∂_i f(x)∂_iP(x) + (P(x)/2)∂²_{ii} f(x)]µ_{2e_i+e_d}(x,ε))J^⊤_{p,d}e_d + O(ε^{d+4}),

v_2 = P(x)∑_{i=1}^{d−1} ∂_i f(x)N_{id}(x)µ_{2e_i+e_d}(x,ε) + (P(x)/2)∂_d f(x)N_{dd}(x)µ_{3e_d}(x,ε) + O(ε^{d+4}).   (3.2.63)


Therefore, we have

E[(X − ι(x))( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)}   (3.2.64)

= P(x)∑_{i=1}^d (∂_i f(x)µ_{2e_i}(x,ε))v_1^{(−1)⊤}J^⊤_{p,d}e_i + P(x)∑_{i=1}^d (∂_i f(x)µ_{2e_i}(x,ε))v_{1,1}^{(0)⊤}J^⊤_{p,d}e_i

+ P(x)∑_{i=1}^d (∂_i f(x)µ_{2e_i}(x,ε))v_{1,2}^{(0)⊤}J^⊤_{p,d}e_i + ∑_{i=1}^{d−1} [∂_i f(x)∂_dP(x) + ∂_d f(x)∂_iP(x) + P(x)∂²_{id} f(x)]µ_{2e_i+e_d}(x,ε)v_1^{(−1)⊤}J^⊤_{p,d}e_i

+ ∑_{i=1}^d [∂_i f(x)∂_iP(x) + (P(x)/2)∂²_{ii} f(x)]µ_{2e_i+e_d}(x,ε)v_1^{(−1)⊤}J^⊤_{p,d}e_d

+ P(x)∑_{i=1}^{d−1} ∂_i f(x)µ_{2e_i+e_d}(x,ε)v_2^{(−1)⊤}N_{id}(x) + (P(x)/2)∂_d f(x)µ_{3e_d}(x,ε)v_2^{(−1)⊤}N_{dd}(x) + O(ε^{d+3}).

Note that by Lemma 3.1.2, the first term is of order ε^{d+1} and the second to seventh terms are of order ε^{d+2}. Furthermore, we can simplify the first and the second terms as:

P(x)∑_{i=1}^d (∂_i f(x)µ_{2e_i}(x,ε))v_1^{(−1)⊤}J^⊤_{p,d}e_i = P(x)∂_d f(x)µ_{e_d}(x,ε),   (3.2.65)

P(x)∑_{i=1}^d (∂_i f(x)µ_{2e_i}(x,ε))v_{1,1}^{(0)⊤}J^⊤_{p,d}e_i = ∑_{i=1}^d ∂_i f(x)∂_iP(x)µ_{2e_i}(x,ε).   (3.2.66)

Next we calculate E[( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)}. Clearly, the common terms P(x)∂_d f(x)µ_{e_d}(x,ε) and ∑_{i=1}^d ∂_i f(x)∂_iP(x)µ_{2e_i}(x,ε) are canceled, and hence only terms of order ε^{d+2} are left in the difference; that is, we have

E[( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)}

= (P(x)/2)∑_{i=1}^d ∂²_{ii} f(x)µ_{2e_i}(x,ε) − P(x)∑_{i=1}^d (∂_i f(x)µ_{2e_i}(x,ε))v_{1,2}^{(0)⊤}J^⊤_{p,d}e_i

− ∑_{i=1}^{d−1} [∂_i f(x)∂_dP(x) + ∂_d f(x)∂_iP(x) + P(x)∂²_{id} f(x)]µ_{2e_i+e_d}(x,ε)v_1^{(−1)⊤}J^⊤_{p,d}e_i

− ∑_{i=1}^d [∂_i f(x)∂_iP(x) + (P(x)/2)∂²_{ii} f(x)]µ_{2e_i+e_d}(x,ε)v_1^{(−1)⊤}J^⊤_{p,d}e_d

− P(x)∑_{i=1}^{d−1} ∂_i f(x)µ_{2e_i+e_d}(x,ε)v_2^{(−1)⊤}N_{id}(x) − (P(x)/2)∂_d f(x)µ_{3e_d}(x,ε)v_2^{(−1)⊤}N_{dd}(x) + O(ε^{d+3}).

Therefore, the ratio

(E[( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)}) / (E[χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)})   (3.2.67)


could be expanded to

(1/2)∑_{i=1}^d ∂²_{ii} f(x) (µ_{2e_i}(x,ε)µ_{2e_d}(x,ε)/(µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²)) − ∑_{i=1}^d ∂_i f(x) (µ_{2e_i}(x,ε)µ_{2e_d}(x,ε)/(µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²)) v_{1,2}^{(0)⊤}J^⊤_{p,d}e_i

− ∑_{i=1}^{d−1} [∂_i f(x)(∂_dP(x)/P(x)) + ∂_d f(x)(∂_iP(x)/P(x)) + ∂²_{id} f(x)] (µ_{2e_i+e_d}(x,ε)µ_{2e_d}(x,ε)/(µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²)) v_1^{(−1)⊤}J^⊤_{p,d}e_i

− ∑_{i=1}^d [∂_i f(x)(∂_iP(x)/P(x)) + (1/2)∂²_{ii} f(x)] (µ_{2e_i+e_d}(x,ε)µ_{2e_d}(x,ε)/(µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²)) v_1^{(−1)⊤}J^⊤_{p,d}e_d

− ∑_{i=1}^{d−1} ∂_i f(x) (µ_{2e_i+e_d}(x,ε)µ_{2e_d}(x,ε)/(µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²)) v_2^{(−1)⊤}N_{id}(x) − (∂_d f(x)/2) (µ_{3e_d}(x,ε)µ_{2e_d}(x,ε)/(µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²)) v_2^{(−1)⊤}N_{dd}(x) + O(ε³).

Note that v_1^{(−1)⊤}J^⊤_{p,d}e_i = µ_{e_d}(x,ε)/µ_{2e_d}(x,ε) if i = d, and it is 0 otherwise. Hence, the above expression can be further simplified into

(E[( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))( f(X) − f(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)}) / (E[χ_{B^{R^p}_ε(ι(x))}(X)] − E[(X − ι(x))χ_{B^{R^p}_ε(ι(x))}(X)]^⊤T_{ι(x)})   (3.2.68)

= ∑_{i=1}^d ∂²_{ii} f(x)[ (µ_{2e_i}(x,ε)µ_{2e_d}(x,ε) − µ_{2e_i+e_d}(x,ε)µ_{e_d}(x,ε)) / (2µ_0(x,ε)µ_{2e_d}(x,ε) − 2µ_{e_d}(x,ε)²) ]

− ∑_{i=1}^{d−1} ∂_i f(x)[ (µ_{2e_i}(x,ε)µ_{2e_d}(x,ε)v_{1,2}^{(0)⊤}J^⊤_{p,d}e_i + (∂_iP(x)/P(x))µ_{2e_i+e_d}(x,ε)µ_{e_d}(x,ε) + µ_{2e_i+e_d}(x,ε)µ_{2e_d}(x,ε)v_2^{(−1)⊤}N_{id}(x)) / (µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²) ]

− ∂_d f(x)[ (µ_{2e_d}(x,ε)²v_{1,2}^{(0)⊤}J^⊤_{p,d}e_d + (∂_dP(x)/P(x))µ_{3e_d}(x,ε)µ_{e_d}(x,ε) + (1/2)µ_{3e_d}(x,ε)µ_{2e_d}(x,ε)v_2^{(−1)⊤}N_{dd}(x)) / (µ_0(x,ε)µ_{2e_d}(x,ε) − µ_{e_d}(x,ε)²) ] + O(ε³)

= ∑_{i=1}^d φ_i(x,ε)∂²_{ii} f(x) + g(V(x,ε), ∇f(x)) + O(ε³).

The conclusion of the theorem follows from applying Lemma 3.2.3 to simplify the expressions.

3.2.3 Proof of Theorem 3.2.1

The proof is similar to the proof of Theorem 2.4.1.

For each x_k, denote 𝐟 = ( f(x_1), f(x_2), …, f(x_n))^⊤ ∈ R^n. By a direct expansion, we have

∑_{j=1}^N [W − I_{n×n}]_{kj} f(x_j) = (𝟙_N^⊤𝐟 − 𝟙_N^⊤G_n^⊤U_nI_{p,r_n}(Λ_n + nε^{d+3}I_{p×p})^{−1}U_n^⊤G_n𝐟) / (N − 𝟙_N^⊤G_n^⊤U_nI_{p,r_n}(Λ_n + nε^{d+3}I_{p×p})^{−1}U_n^⊤G_n𝟙_N) − f(x_k),   (3.2.69)

which can be rewritten as

[ (1/(nε^d))∑_{j=1}^N ( f(x_{k,j}) − f(x_k)) − [(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k)]^⊤ U_nI_{p,r_n}(Λ_n/(nε^d) + ε³I_{p×p})^{−1}U_n^⊤ [(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k)( f(x_{k,j}) − f(x_k))] ] / [ N/(nε^d) − [(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k)]^⊤ U_nI_{p,r_n}(Λ_n/(nε^d) + ε³I_{p×p})^{−1}U_n^⊤ [(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k)] ].   (3.2.70)

The goal is to relate the finite sum quantity (3.2.70) to Q f(x_k) = g_1/g_2,


where

g_1 = E[(1/ε^d)χ_{B^{R^p}_ε(ι(x_k))}(X)( f(X) − f(x_k))] − E[(1/ε^d)(X − x_k)χ_{B^{R^p}_ε(ι(x_k))}(X)]^⊤ (U I_{p,r}(Λ/ε^d + ε³I_{p×p})^{−1}U^⊤) E[(1/ε^d)(X − x_k)χ_{B^{R^p}_ε(ι(x_k))}(X)( f(X) − f(x_k))]   (3.2.71)

and

g_2 = E[(1/ε^d)χ_{B^{R^p}_ε(ι(x_k))}(X)] − E[(1/ε^d)(X − x_k)χ_{B^{R^p}_ε(ι(x_k))}(X)]^⊤ (U I_{p,r}(Λ/ε^d + ε³I_{p×p})^{−1}U^⊤) E[(1/ε^d)(X − x_k)χ_{B^{R^p}_ε(ι(x_k))}(X)].   (3.2.72)

Moreover, we relate

B_{kk} f(x_k) = (‖G_n𝟙_N‖²_{R^p}/(N²ε)) f(x_k) = (1/ε)( ‖(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k)‖_{R^p} / (N/(nε^d)) )² f(x_k)   (3.2.73)

to

B_ε(x_k) f(x_k) = (1/ε)( ‖E[(1/ε^d)(X − ι(x_k))χ_{B^{R^p}_ε(ι(x_k))}(X)]‖_{R^p} / E[(1/ε^d)χ_{B^{R^p}_ε(ι(x_k))}(X)] )² f(x_k).   (3.2.74)

We now control the size of the fluctuation of the following four terms:

(1/(nε^d))∑_{j=1}^N 1,   (3.2.75)

(1/(nε^d))∑_{j=1}^N ( f(x_{k,j}) − f(x_k)),   (3.2.76)

(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k),   (3.2.77)

(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k)( f(x_{k,j}) − f(x_k)),   (3.2.78)

as functions of n and ε by a Bernstein-type inequality. Here, we put ε^{−d} in front of each term to normalize the kernel so that the computation is consistent with the existing literature, like [17, 57].

The sizes of the fluctuations of these terms are controlled in the following lemmas. The term (3.2.75) is the usual kernel density estimation, so we have the following lemma.

Lemma 3.2.4. Suppose ε = ε(n) is chosen so that √log(n)/(n^{1/2}ε^{d/2+1}) → 0 and ε → 0 as n → ∞. Then, with probability greater than 1 − n^{−2}, for all k = 1, …, n,

| (1/(nε^d))∑_{j=1}^N 1 − E[(1/ε^d)χ_{B^{R^p}_ε(x_k)}(X)] | = O(√log(n)/(n^{1/2}ε^{d/2})).   (3.2.79)

Denote by Ω_0 the event space on which the above lemma is satisfied. The behavior of (3.2.76) is summarized in the following lemma.

Lemma 3.2.5. Suppose ε = ε(n) is chosen so that √log(n)/(n^{1/2}ε^{d/2+1}) → 0 and ε → 0 as n → ∞. Then, with probability greater than 1 − n^{−2}, for all k = 1, …, n,

| (1/(nε^d))∑_{j=1}^N ( f(x_{k,j}) − f(x_k)) − E[(1/ε^d)( f(X) − f(x_k))χ_{B^{R^p}_ε(x_k)}(X)] | = O(√log(n)/(n^{1/2}ε^{d/2−1})).   (3.2.80)

Proof. By denoting

F_{1,j} = (1/ε^d)( f(x_j) − f(x_k))χ_{B^{R^p}_ε(x_k)}(x_j),   (3.2.81)

we have

(1/(nε^d))∑_{j=1}^N ( f(x_{k,j}) − f(x_k)) = (1/n)∑_{j≠k, j=1}^n F_{1,j}.   (3.2.82)

Define a random variable

F_1 := (1/ε^d)( f(X) − f(x_k))χ_{B^{R^p}_ε(x_k)}(X).   (3.2.83)

Clearly, when j ≠ k, the F_{1,j} can be viewed as i.i.d. samples of F_1. Note that we have

(1/n)∑_{j≠k, j=1}^n F_{1,j} = ((n − 1)/n)[ (1/(n − 1))∑_{j≠k, j=1}^n F_{1,j} ].   (3.2.84)

Since (n − 1)/n → 1 as n → ∞, the error incurred by replacing 1/n by 1/(n − 1) is of order 1/n, which is negligible asymptotically, so we can simply focus on analyzing (1/(n − 1))∑_{j≠k, j=1}^n F_{1,j}. We have, by Lemma 3.1.2 and Lemma 3.1.3,

E[F_1] = O(ε) if x ∈ M_ε,   (3.2.85)

E[F_1] = O(ε²) if x ∉ M_ε,   (3.2.86)

E[F_1²] = ∑_{i=1}^d P(x_k)(∂_i f(x_k))²µ_{2e_i}(x_k,ε)ε^{−2d} + O(ε^{−d+3}).   (3.2.87)

By Lemma 3.1.2, (|S^{d−1}|/(2d(d + 2)))ε^{−d+2} + O(ε^{−d+3}) ≤ µ_{2e_i}(x_k,ε)ε^{−2d} ≤ (|S^{d−1}|/(d(d + 2)))ε^{−d+2}; therefore, in any case,

σ_1² := Var(F_1) ≤ (|S^{d−1}|‖P‖_{L^∞}/(d(d + 2)))ε^{−d+2} + O(ε^{−d+3}).   (3.2.88)

With the above bounds, we can apply the large deviation theory. First, note that the random variable F_1 is uniformly bounded by

c_1 = 2‖f‖_{L^∞}ε^{−d},   (3.2.89)

so we apply Bernstein's inequality to provide a large deviation bound. Recall Bernstein's inequality:

Pr{ (1/(n − 1))∑_{j≠k, j=1}^n (F_{1,j} − E[F_1]) > η_1 } ≤ exp( −nη_1² / (2σ_1² + (2/3)c_1η_1) ),   (3.2.90)

where η_1 > 0.
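The concentration rate that Bernstein's inequality yields here can be illustrated by a Monte Carlo sketch. This is a hedged toy setting, not the thesis's: X is uniform on [0, 1] with d = 1, and F is the normalized ball indicator used in the kernel density estimate of Lemma 3.2.4, so the empirical mean should deviate from E[F] by far less than the √log(n)/(n^{1/2}ε^{d/2}) scale.

```python
import numpy as np

rng = np.random.default_rng(42)
n, eps, x0 = 200000, 0.05, 0.5
X = rng.uniform(0.0, 1.0, n)
F = (np.abs(X - x0) <= eps) / eps       # eps^{-d} * chi_{B_eps(x0)}(X), d = 1
mean_F = F.mean()                        # empirical mean; E[F] = 2 exactly
dev = abs(mean_F - 2.0)                  # observed deviation from the mean
rate = np.sqrt(np.log(n)) / (np.sqrt(n) * np.sqrt(eps))   # Lemma 3.2.4 scale
```

The variance of F is of order ε^{−d}, matching σ_1² above, and the observed deviation sits comfortably inside a small multiple of the predicted rate.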

Note that E[F_1] = O(ε) if x_k ∈ M_ε and E[F_1] = O(ε²) if x_k ∉ M_ε. Hence, we assume η_1 = O(ε^{2+s}), where s > 0. Then c_1η_1 = O(ε^{−d+2+s}). If ε is small enough, 2σ_1² + (2/3)c_1η_1 ≤ Cε^{−d+2} for some constant C which depends on P. We have

nη_1² / (2σ_1² + (2/3)c_1η_1) ≥ nη_1²ε^{d−2}/C.   (3.2.91)


Suppose n is chosen large enough so that

nη_1²ε^{d−2}/C ≥ 3 log(n);   (3.2.92)

that is, the deviation from the mean is set to

η_1 ≥ O(√log(n)/(n^{1/2}ε^{d/2−1})).   (3.2.93)

Note that by the assumption that η_1 = O(ε^{2+s}), we know that η_1/ε² = √log(n)/(n^{1/2}ε^{d/2+1}) → 0. It implies that a deviation greater than η_1 happens with probability less than

exp( −nη_1² / (2σ_1² + (2/3)c_1η_1) ) ≤ exp( −nη_1²ε^{d−2}/C ) ≤ exp(−3 log(n)) = 1/n³.

As a result, by a simple union bound, we have

Pr{ (1/(n − 1))∑_{j≠k, j=1}^n (F_{1,j} − E[F_1]) > η_1 for some k = 1, …, n } ≤ n exp( −nη_1² / (2σ_1² + (2/3)c_1η_1) ) ≤ 1/n².   (3.2.94)

Denote by Ω_1 the event space on which the deviation bound (1/(n − 1))∑_{j≠k, j=1}^n (F_{1,j} − E[F_1]) ≤ η_1 holds for all k = 1, …, n, where η_1 is chosen as in (3.2.93).

Lemma 3.2.6. Suppose ε = ε(n) is chosen so that √log(n)/(n^{1/2}ε^{d/2+1}) → 0 and ε → 0 as n → ∞. Then, with probability greater than 1 − n^{−2}, for all k = 1, …, n,

e_i^⊤[ (1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k) − E[(1/ε^d)(X − x_k)χ_{B^{R^p}_ε(x_k)}(X)] ] = O(√log(n)/(n^{1/2}ε^{d/2−1})),   (3.2.95)

where i = 1, …, d, and

e_i^⊤[ (1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k) − E[(1/ε^d)(X − x_k)χ_{B^{R^p}_ε(x_k)}(X)] ] = O(√log(n)/(n^{1/2}ε^{d/2−2})),   (3.2.96)

where i = d + 1, …, p.

Proof. Fix x_k. Write

(1/(nε^d))∑_{j=1}^N (x_{k,j} − x_k) = (1/n)∑_{j≠k, j=1}^n ∑_{ℓ=1}^p F_{2,ℓ,j}e_ℓ,   (3.2.97)

where

F_{2,ℓ,j} := (1/ε^d)e_ℓ^⊤(x_j − x_k)χ_{B^{R^p}_ε(x_k)}(x_j),   (3.2.98)

and we know that when j ≠ k, the F_{2,ℓ,j} are i.i.d. samples of

F_{2,ℓ} := (1/ε^d)e_ℓ^⊤(X − x_k)χ_{B^{R^p}_ε(x_k)}(X).   (3.2.99)


Similarly, we can focus on analyzing (1/(n − 1))∑_{j≠k, j=1}^n F_{2,ℓ,j}, since (n − 1)/n → 1 as n → ∞. By Lemma 3.1.3 we have

E[F_{2,ℓ}] = (P(x)µ_{e_d}(x,ε)ε^{−d})e_ℓ^⊤e_d + ∑_{i=1}^d (∂_iP(x)µ_{2e_i}(x,ε)ε^{−d})e_ℓ^⊤e_i + O(ε³) when ℓ = 1, …, d,

E[F_{2,ℓ}] = (P(x)ε^{−d}/2)e_ℓ^⊤∑_{i=1}^d N_{ii}(x)µ_{2e_i} + O(ε³) when ℓ = d + 1, …, p.

In other words, by Lemma 3.1.2, for ℓ = 1, …, d we have E[F_{2,ℓ}] = O(ε) if x_k ∈ M_ε, and E[F_{2,ℓ}] = O(ε²) if x_k ∉ M_ε. Moreover, E[F_{2,ℓ}] = O(ε²) for ℓ = d + 1, …, p.

By (3.2.10) we have, for ℓ = 1, …, d,

E[F_{2,ℓ}²] ≤ C_ℓε^{−d+2} + O(ε^{−d+3}),   (3.2.100)

where C_ℓ depends on ‖P‖_{L^∞}. For ℓ = d + 1, …, p,

E[F_{2,ℓ}²] ≤ C_ℓε^{−d+4} + O(ε^{−d+5}),   (3.2.101)

where C_ℓ depends on ‖P‖_{L^∞} and the second fundamental form of M. Thus, we conclude that

σ_{2,ℓ}² := Var(F_{2,ℓ}) ≤ C_ℓε^{−d+2} + O(ε^{−d+3}) when ℓ = 1, …, d,   (3.2.102)

σ_{2,ℓ}² := Var(F_{2,ℓ}) ≤ C_ℓε^{−d+4} + O(ε^{−d+5}) when ℓ = d + 1, …, p.   (3.2.103)

Note that for ℓ = d + 1, …, p, the variance is of higher order than that for ℓ = 1, …, d.

With the above bounds, we can apply the large deviation theory. For ℓ = 1, …, d, the random variable F_{2,ℓ} is uniformly bounded by c_{2,ℓ} = 2ε^{−d+1}. Since E[F_{2,ℓ}] = O(ε) if x_k ∈ M_ε, and E[F_{2,ℓ}] = O(ε²) if x_k ∉ M_ε, we assume η_{2,ℓ} = O(ε^{2+s}), where s > 0. Then c_{2,ℓ}η_{2,ℓ} = O(ε^{−d+3+s}). If ε is small enough, 2σ_{2,ℓ}² + (2/3)c_{2,ℓ}η_{2,ℓ} ≤ Cε^{−d+2} for some constant C which depends on P and the manifold M. We have

nη_{2,ℓ}² / (2σ_{2,ℓ}² + (2/3)c_{2,ℓ}η_{2,ℓ}) ≥ nη_{2,ℓ}²ε^{d−2}/C.   (3.2.104)

Suppose n is chosen large enough so that

nη_{2,ℓ}²ε^{d−2}/C ≥ 3 log(n);   (3.2.105)

that is, the deviation from the mean is set to

η_{2,ℓ} ≥ O(√log(n)/(n^{1/2}ε^{d/2−1})).   (3.2.106)

Note that by the assumption that η_{2,ℓ} = O(ε^{2+s}), we know that η_{2,ℓ}/ε² = √log(n)/(n^{1/2}ε^{d/2+1}) → 0.

Thus, when $\varepsilon$ is sufficiently small and $n$ is sufficiently large, the exponent in Bernstein's inequality gives
\[
\Pr\Big[\frac{1}{n-1}\sum_{j\neq k,\,j=1}^{n}\big(F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\big)>\eta_{2,\ell}\Big]
\le\exp\Big(-\frac{n\eta_{2,\ell}^2}{2\sigma_{2,\ell}^2+\frac{2}{3}c_{2,\ell}\eta_{2,\ell}}\Big)\le\frac{1}{n^3}.
\tag{3.2.107}
\]


By a simple union bound, for $\ell=1,\ldots,d$, we have
\[
\Pr\Big[\Big|\frac{1}{n}\sum_{j\neq k,\,j=1}^{n}F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\Big|>\eta_{2,\ell}\ \text{for some }k=1,\ldots,n\Big]\le 1/n^2.
\]
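As a quick numerical illustration, outside the proof, this concentration can be checked by Monte Carlo for an interior point. The uniform density on the unit square and all numerical values below are illustrative assumptions, not part of the argument; the code only verifies that the empirical deviation of the kernel average stays within a constant multiple of $\sqrt{\log n}/(n^{1/2}\varepsilon^{d/2-1})$.

```python
import numpy as np

# For x_k in the interior, F_{2,l} = eps^{-d} e_l^T (X - x_k) chi_{B_eps(x_k)}(X)
# has mean O(eps^2) and variance O(eps^{-d+2}); the empirical mean should deviate
# by at most about sqrt(log n) / (n^{1/2} eps^{d/2-1}).
rng = np.random.default_rng(0)
d, eps, n, trials = 2, 0.2, 20000, 50
x_k = np.array([0.5, 0.5])                        # interior point of the unit square

deviations = []
for _ in range(trials):
    X = rng.uniform(0.0, 1.0, size=(n, d))        # i.i.d. uniform point cloud
    diff = X - x_k
    inside = np.linalg.norm(diff, axis=1) < eps   # indicator of the eps-ball
    F = diff[:, 0] * inside / eps**d              # samples of F_{2,1}
    deviations.append(abs(F.mean()))              # E[F_{2,1}] = 0 here by symmetry

eta = np.sqrt(np.log(n)) / (n**0.5 * eps**(d / 2 - 1))
print(max(deviations), eta)
assert max(deviations) < 5 * eta                  # deviation within C * eta
```

The constant $5$ is an arbitrary illustrative choice of the multiplicative constant hidden in the $O(\cdot)$ notation.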

For $\ell=d+1,\ldots,p$, the random variable $F_{2,\ell}$ is uniformly bounded by $c_{2,\ell}=2\varepsilon^{-d+1}$. Since $\mathbb{E}[F_{2,\ell}]=O(\varepsilon^2)$ for $\ell=d+1,\ldots,p$, we assume $\eta_{2,\ell}=O(\varepsilon^{3+s})$, where $s>0$. Then $c_{2,\ell}\eta_{2,\ell}=O(\varepsilon^{-d+4+s})$. If $\varepsilon$ is small enough, $2\sigma_{2,\ell}^2+\frac{2}{3}c_{2,\ell}\eta_{2,\ell}\le C\varepsilon^{-d+4}$ for some constant $C$ which depends on $M$ and $P$. We have
\[
\frac{n\eta_{2,\ell}^2}{2\sigma_{2,\ell}^2+\frac{2}{3}c_{2,\ell}\eta_{2,\ell}}\ge\frac{n\eta_{2,\ell}^2\,\varepsilon^{d-4}}{C}.
\tag{3.2.108}
\]
Suppose $n$ is chosen large enough so that
\[
\frac{n\eta_{2,\ell}^2\,\varepsilon^{d-4}}{C}=3\log(n);
\tag{3.2.109}
\]
that is, the deviation from the mean is set to
\[
\eta_{2,\ell}=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big).
\tag{3.2.110}
\]
Note that by the assumption that $\eta_{2,\ell}=O(\varepsilon^{3+s})$, we know that $\eta_{2,\ell}/\varepsilon^3=\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$.

By a similar argument, for $\ell=d+1,\ldots,p$, we have
\[
\Pr\Big[\Big|\frac{1}{n}\sum_{j\neq k,\,j=1}^{n}F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\Big|>\eta_{2,\ell}\ \text{for some }k=1,\ldots,n\Big]\le 1/n^2.
\]

Denote $\Omega_2$ to be the event space on which the deviation $\big|\frac{1}{n}\sum_{j\neq k,\,j=1}^{n}F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\big|\le\eta_{2,\ell}$ holds for all $\ell=1,\ldots,p$ and $k=1,\ldots,n$, where the $\eta_{2,\ell}$ are chosen as in (3.2.106) and (3.2.110).

The next lemma summarizes the behavior of (3.2.78) and can be proved similarly to Lemma 3.2.6.

Lemma 3.2.7. Suppose $\varepsilon=\varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$ and $\varepsilon\to 0$ as $n\to\infty$. We have with probability greater than $1-n^{-2}$ that for all $k=1,\ldots,n$,
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)-\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\Big]
=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),
\tag{3.2.111}
\]
where $i=1,\ldots,d$, and
\[
e_i^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)-\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\Big]
=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big),
\tag{3.2.112}
\]
where $i=d+1,\ldots,p$.

Denote $\Omega_3$ to be the event space on which Lemma 3.2.7 is satisfied.


In the next two lemmas, we describe the behavior of $\frac{1}{n\varepsilon^d}G_nG_n^\top$. The proofs are the same as that of Lemma 2.4.4 with $\rho=3$.

Lemma 3.2.8. Suppose $\varepsilon=\varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$ and $\varepsilon\to 0$ as $n\to\infty$. We have with probability greater than $1-n^{-2}$ that for all $k=1,\ldots,n$,
\[
\Big|e_i^\top\Big(\frac{1}{n\varepsilon^d}G_nG_n^\top-\frac{1}{\varepsilon^d}C_{x_k}\Big)e_j\Big|=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),
\tag{3.2.113}
\]
where $i,j=1,\ldots,d$;
\[
\Big|e_i^\top\Big(\frac{1}{n\varepsilon^d}G_nG_n^\top-\frac{1}{\varepsilon^d}C_{x_k}\Big)e_j\Big|=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-4}}\Big),
\tag{3.2.114}
\]
where $i,j=d+1,\ldots,p$;
\[
\Big|e_i^\top\Big(\frac{1}{n\varepsilon^d}G_nG_n^\top-\frac{1}{\varepsilon^d}C_{x_k}\Big)e_j\Big|=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big),
\tag{3.2.115}
\]
otherwise.

Lemma 3.2.9. We have $r_n\le r$ and $r_n$ is a non-decreasing function of $n$; if $n$ is large enough, $r_n=r$. Suppose $\varepsilon=\varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$ and $\varepsilon\to 0$ as $n\to\infty$. We have with probability greater than $1-n^{-2}$ that for all $k=1,\ldots,n$,
\[
\Big|e_i^\top\Big[I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}-I_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}\Big]e_i\Big|
=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)\quad\text{for }i=1,\ldots,r,
\tag{3.2.116}
\]
\[
U_n=U\Theta+\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\Big(\frac{\log n}{n\varepsilon^{d-4}}\Big),
\tag{3.2.117}
\]
where $S\in o(p)$ and $\Theta\in O(p)$, and $\Theta$ commutes with $I_{p,r}\big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\big)^{-1}$.

Denote $\Omega_4$ to be the event space on which Lemma 3.2.9 is satisfied.

In the proofs of Lemma 3.2.2 and Theorem 3.2.2, we need the order $\varepsilon^{d+3}$ terms of the eigenvalues $\lambda_i$ of $C_x$ for $i=1,\ldots,d$, and we need the order $\varepsilon$ term of the eigenvectors $\beta_i$ of $C_x$ for $i=1,\ldots,p$. We also use the fact that the $\lambda_i$ of $C_x$ for $i=d+1,\ldots,p$ are of order $O(\varepsilon^{d+4})$, so that we can calculate the leading terms (of order $\varepsilon^2$) of $Qf(x)$ for all $x\in M$. Since $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$, the above two lemmas imply that the differences between the first $d$ eigenvalues of $\frac{1}{n\varepsilon^d}G_nG_n^\top$ and those of $\frac{1}{\varepsilon^d}C_{x_k}$ are less than $O(\varepsilon^3)$, while the differences between the remaining eigenvalues are less than $O(\varepsilon^4)$. In other words, we can make sure that the remaining eigenvalues of $\frac{1}{n\varepsilon^d}G_nG_n^\top$ are of order $O(\varepsilon^4)$. Moreover, $U_n$ and $U\Theta$ differ by a matrix of order $O(\varepsilon^3)$. Consequently, in the following proof, we can show that the deviation between $\sum_{j=1}^{N}[W-I_{n\times n}]_{kj}f(x_j)$ and $Qf(x_k)$ is less than $\varepsilon^2$ for all $x_k$.
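The eigenvalue orders just described can be observed numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the unit sphere $S^2\subset\mathbb{R}^3$ stands in for a generic $d=2$ manifold with $p=3$, and we compute the spectrum of $\frac{1}{n\varepsilon^d}G_nG_n^\top$ at one point, where the two tangent eigenvalues are of order $\varepsilon^2$ and the normal one is of order $\varepsilon^4$.

```python
import numpy as np

# Eigenvalue orders of the local covariance (1/(n eps^d)) G_n G_n^T on a point
# cloud sampled from S^2 in R^3 (d = 2, p = 3): d tangent eigenvalues ~ eps^2,
# p - d normal eigenvalues ~ eps^4.
rng = np.random.default_rng(1)
n, eps = 100000, 0.3
X = rng.normal(size=(n, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # uniform samples on S^2
x_k = np.array([0.0, 0.0, 1.0])

nbr = X[np.linalg.norm(X - x_k, axis=1) < eps]  # eps-ball neighbors in R^3
G = (nbr - x_k).T                               # local data matrix G_n (p x N_k)
C = G @ G.T / (n * eps**2)                      # (1/(n eps^d)) G_n G_n^T with d = 2
eigs = np.sort(np.linalg.eigvalsh(C))[::-1]     # descending eigenvalues
print(eigs)
assert eigs[2] < 0.1 * eigs[1]                  # normal eigenvalue is much smaller
```

Here the ratio of the third to the first eigenvalue behaves like $\varepsilon^2$ up to a constant, which is why the remaining eigenvalues can be treated as $O(\varepsilon^4)$ relative to the $O(\varepsilon^2)$ tangent ones.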

Proof of Theorem 3.2.1. Denote $\Omega:=\cap_{i=0,\ldots,4}\Omega_i$. By a direct union bound, the probability of the event space $\Omega$ is greater than $1-n^{-2}$. Below, all arguments are conditional on $\Omega$. Based on the previous lemmas, we have, for $k=1,\ldots,n$,
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}1=\mathbb{E}\,\frac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\Big),
\tag{3.2.118}
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}\big(f(x_{k,j})-f(x_k)\big)=\mathbb{E}\,\frac{1}{\varepsilon^d}\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big),
\tag{3.2.119}
\]


\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)=\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+\mathcal{E}_1,
\tag{3.2.120}
\]
where $\mathcal{E}_1\in\mathbb{R}^p$, $e_i^\top\mathcal{E}_1=O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\big)$ for $i=1,\ldots,d$, and $e_i^\top\mathcal{E}_1=O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\big)$ for $i=d+1,\ldots,p$;
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)=\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+\mathcal{E}_2,
\tag{3.2.121}
\]
where $\mathcal{E}_2\in\mathbb{R}^p$, $e_i^\top\mathcal{E}_2=O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\big)$ for $i=1,\ldots,d$, and $e_i^\top\mathcal{E}_2=O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\big)$ for $i=d+1,\ldots,p$.

Moreover,
\begin{align*}
&U_nI_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U_n^\top-UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\\
&=\Big(U\Theta+\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\Big(\frac{\log n}{n\varepsilon^{d-4}}\Big)\Big)\Big[I_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)\Big]\\
&\qquad\times\Big(U\Theta+\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\Big(\frac{\log n}{n\varepsilon^{d-4}}\Big)\Big)^\top-UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\\
&=\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta\Big[SI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}+I_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}S^\top\Big]\Theta^\top U^\top\\
&\qquad+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)I_{p\times p}+\big[\text{higher order terms}\big].
\end{align*}
Define a $p\times p$ matrix
\[
\mathcal{E}_3=\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta\Big[SI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}+I_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}S^\top\Big]\Theta^\top U^\top+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)I_{p\times p}.
\tag{3.2.122}
\]

Then
\begin{align*}
&\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\Big]^\top U_nI_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)\Big]\tag{3.2.123}\\
&=\Big[\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+\mathcal{E}_1\Big]^\top\Big[UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top+\mathcal{E}_3+\text{higher order terms}\Big]\\
&\qquad\times\Big[\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+\mathcal{E}_2\Big]\\
&=\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\Big[UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\Big]\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\\
&\quad+\mathcal{E}_1^\top UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\\
&\quad+\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\,\mathcal{E}_3\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\\
&\quad+\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\mathcal{E}_2+\text{higher order terms}.
\end{align*}
Note that
\[
\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\mathcal{E}_2=T_{\iota(x_k)}\mathcal{E}_2.
\tag{3.2.124}
\]


When $x\in M_\varepsilon$,
\[
T_{\iota(x_k)}\mathcal{E}_2=\big[\big[O(\varepsilon^{-1}),O(\varepsilon^{-1})\big]\big]\cdot\Big[\Big[O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big)\Big]\Big]=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\tag{3.2.125}
\]
When $x\notin M_\varepsilon$,
\[
T_{\iota(x_k)}\mathcal{E}_2=\big[\big[O(1),O(\varepsilon^{-1})\big]\big]\cdot\Big[\Big[O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big)\Big]\Big]=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big).
\tag{3.2.126}
\]

Moreover, when $x_k\in M_\varepsilon$ or $x_k\in M\setminus M_\varepsilon$, by a similar calculation as in Lemma 3.2.2, $UI_{p,r}\big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\big)^{-1}U^\top\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)=[[O(1),O(1)]]$. Hence,
\[
\mathcal{E}_1^\top UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\tag{3.2.127}
\]
Next, we calculate $\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\,\mathcal{E}_3\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)$. By a straightforward calculation, we can show that it is dominated by
\[
O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X).
\]

Hence, when $x_k\in M_\varepsilon$,
\[
\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\,\mathcal{E}_3\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\tag{3.2.128}
\]
When $x_k\notin M_\varepsilon$,
\[
\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\,\mathcal{E}_3\,\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)=O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big).
\tag{3.2.129}
\]

In conclusion, for $k=1,\ldots,n$ we have
\begin{align*}
&\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\Big]^\top U_nI_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)\Big]\tag{3.2.130}\\
&=\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\Big[UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\Big]\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\\
&\quad+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\end{align*}

A similar argument shows that for $k=1,\ldots,n$,
\begin{align*}
&\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\Big]^\top U_nI_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\Big]\tag{3.2.131}\\
&=\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)^\top\Big[UI_{p,r}\Big(\frac{\Lambda}{\varepsilon^d}+\varepsilon^3I_{p\times p}\Big)^{-1}U^\top\Big]\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\\
&\quad+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\Big).
\end{align*}

By Theorem 3.2.2, $g_1$ has order $O(\varepsilon^2)$ and $g_2$ has order $1$. Hence, (3.2.118), (3.2.119), (3.2.130) and (3.2.131) imply that
\[
\sum_{j=1}^{n}[W-I_{n\times n}]_{kj}f(x_j)=g_1+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big)g_2+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\Big)=Qf(x_k)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\tag{3.2.132}
\]

3.3 Dirichlet Graph Laplacian

Motivated by the spirit of LLE, we introduce our algorithm to estimate the Laplace–Beltrami operator with the Dirichlet boundary condition from a given point cloud.

Fix $\varepsilon$ and let $\mathcal{N}_k:=B^{\mathbb{R}^p}_{\varepsilon}(x_k)\cap(\mathcal{X}\setminus\{x_k\})=\{x_{k,j}\}_{j=1}^{N_k}$. Recall that $G_n$ is the local data matrix associated with $\mathcal{N}_{x_k}$:
\[
G_n:=\begin{bmatrix}
| & & |\\
x_{k,1}-x_k & \cdots & x_{k,N_k}-x_k\\
| & & |
\end{bmatrix}\in\mathbb{R}^{p\times N_k}.
\tag{3.3.1}
\]
Define an $n\times n$ diagonal matrix $B$ such that
\[
B_{kk}=\frac{\|G_n\mathbf{1}_{N_k}\|^2_{\mathbb{R}^p}}{N_k^2\,\varepsilon}.
\tag{3.3.2}
\]

We call $B$ the bumping matrix. With the bumping matrix, the Dirichlet graph Laplacian (DGL) is given by
\[
L:=I_{n\times n}-W-B.
\tag{3.3.3}
\]
Note that we do not detect the boundary in the algorithm; instead, based on the intrinsic geometric nature of the truncated barycentric coordinate, the bumping matrix $B$ takes care of the boundary automatically. This renders the DGL an approximation of the Laplace–Beltrami operator with the Dirichlet boundary condition.
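A minimal sketch of the bumping-matrix construction follows, assuming a one-dimensional point cloud on $M=[0,1]$ (so $d=p=1$); the choice of $n$ and $\varepsilon$ is illustrative, and the LLE weight matrix $W$ from the earlier sections is taken as given and not rebuilt here. Rows of $B$ near the boundary receive an $O(\varepsilon)$ entry while interior rows are nearly zero, which is what lets $L=I-W-B$ enforce the Dirichlet condition without explicit boundary detection.

```python
import numpy as np

# Bumping matrix B of (3.3.2): B_kk = ||G_n 1_{N_k}||^2 / (N_k^2 eps), where the
# columns of G_n are the displacements to the eps-ball neighbors of x_k.
rng = np.random.default_rng(2)
n, eps = 4000, 0.05
X = np.sort(rng.uniform(0.0, 1.0, size=n))       # point cloud on [0, 1]

B = np.zeros(n)
for k in range(n):
    diff = X - X[k]
    nbr = diff[(np.abs(diff) < eps) & (diff != 0)]   # eps-ball minus x_k itself
    if nbr.size:
        B[k] = nbr.sum() ** 2 / (nbr.size ** 2 * eps)  # ||G_n 1||^2 / (N_k^2 eps)

print(B[0], np.median(B[n // 4: 3 * n // 4]))
# the boundary entry is of order eps; deep-interior entries are far smaller
assert B[0] > 10 * np.median(B[n // 4: 3 * n // 4])
```

For a boundary point all neighbor displacements point inward, so $\|G_n\mathbf{1}_{N_k}\|\approx N_k\varepsilon/2$ and $B_{kk}\approx\varepsilon/4$, while at an interior point the displacements cancel up to sampling noise.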

We define a "boundary function" $B_\varepsilon(x)$ on $M$ as
\[
B_\varepsilon(x)=\frac{1}{\varepsilon}\Bigg[\frac{\big\|\mathbb{E}\big[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)\big]\big\|_{\mathbb{R}^p}}{\mathbb{E}\big[\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)\big]}\Bigg]^2.
\tag{3.3.4}
\]

The bumping matrix and the boundary function are related by the following proposition.

Proposition 3.3.1. 1. When $\varepsilon$ is sufficiently small, the boundary function satisfies
\[
B_\varepsilon(x)=\frac{1}{\varepsilon}\Bigg[\frac{\mu_{e_d}(x,\varepsilon)^2}{\mu_0(x,\varepsilon)^2}+\frac{2\partial_dP(x)}{P(x)}\Bigg(\frac{\mu_{e_d}(x,\varepsilon)\mu_{2e_d}(x,\varepsilon)}{\mu_0(x,\varepsilon)^2}-\frac{\mu_{e_d}(x,\varepsilon)^3}{\mu_0(x,\varepsilon)^3}\Bigg)\Bigg]+O(\varepsilon^3).
\tag{3.3.5}
\]
If $x\in M_\varepsilon$,
\[
B_\varepsilon(x)=O(\varepsilon).
\tag{3.3.6}
\]
In particular, when $x\in\partial M$, $B_\varepsilon(x)=\frac{4|S^{d-2}|^2}{(d^2-1)^2|S^{d-1}|^2}\varepsilon+O(\varepsilon^2)$. If $x\notin M_\varepsilon$,
\[
B_\varepsilon(x)=O(\varepsilon^3).
\tag{3.3.7}
\]
2. Suppose $f\in C^2(M)$. Suppose $\varepsilon=\varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$ and $\varepsilon\to 0$ as $n\to\infty$. We have with probability greater than $1-n^{-2}$ that if $x_k\in M_\varepsilon$, then
\[
B_{kk}f(x_k)=B_\varepsilon(x_k)f(x_k)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big);
\]
if $x_k\notin M_\varepsilon$, then
\[
B_{kk}f(x_k)=B_\varepsilon(x_k)f(x_k)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big).
\tag{3.3.8}
\]

Proof. By Lemma 3.1.3, we have
\[
\mathbb{E}[\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)]=P(x)\mu_0(x,\varepsilon)+O(\varepsilon^{d+1}),
\tag{3.3.9}
\]
\[
\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)]=\big[\big[P(x)\mu_{e_d}(x,\varepsilon)J_{p,d}^\top e_d+O(\varepsilon^{d+2}),\,O(\varepsilon^{d+2})\big]\big].
\tag{3.3.10}
\]
Hence,
\begin{align*}
\big\|\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)]\big\|^2_{\mathbb{R}^p}
&=\big(P(x)\mu_{e_d}(x,\varepsilon)+\partial_dP(x)\mu_{2e_d}(x,\varepsilon)\big)^2+O(\varepsilon^{2d+4})\tag{3.3.11}\\
&=P(x)^2\mu_{e_d}(x,\varepsilon)^2+2P(x)\partial_dP(x)\mu_{e_d}(x,\varepsilon)\mu_{2e_d}(x,\varepsilon)+O(\varepsilon^{2d+4}),
\end{align*}
and
\[
\mathbb{E}[\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)]^2=P(x)^2\mu_0(x,\varepsilon)^2+2P(x)\partial_dP(x)\mu_0(x,\varepsilon)\mu_{e_d}(x,\varepsilon)+O(\varepsilon^{2d+2}).
\tag{3.3.12}
\]
Therefore, by the definition (3.3.4), we have
\begin{align*}
B_\varepsilon(x)&=\frac{1}{\varepsilon}\Bigg(\frac{\big\|\mathbb{E}[(X-\iota(x))\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)]\big\|_{\mathbb{R}^p}}{\mathbb{E}[\chi_{B^{\mathbb{R}^p}_{\varepsilon}(\iota(x))}(X)]}\Bigg)^2\tag{3.3.13}\\
&=\frac{1}{\varepsilon}\Bigg[\frac{P(x)^2\mu_{e_d}(x,\varepsilon)^2+2P(x)\partial_dP(x)\mu_{e_d}(x,\varepsilon)\mu_{2e_d}(x,\varepsilon)+O(\varepsilon^{2d+4})}{P(x)^2\mu_0(x,\varepsilon)^2+2P(x)\partial_dP(x)\mu_0(x,\varepsilon)\mu_{e_d}(x,\varepsilon)+O(\varepsilon^{2d+2})}\Bigg]\\
&=\frac{\mu_{e_d}(x,\varepsilon)^2}{\varepsilon\,\mu_0(x,\varepsilon)^2}+\frac{2\partial_dP(x)}{\varepsilon\,P(x)}\Bigg(\frac{\mu_{e_d}(x,\varepsilon)\mu_{2e_d}(x,\varepsilon)}{\mu_0(x,\varepsilon)^2}-\frac{\mu_{e_d}(x,\varepsilon)^3}{\mu_0(x,\varepsilon)^3}\Bigg)+O(\varepsilon^3).
\end{align*}

Recall that by Lemma 3.2.6,
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)=\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+\mathcal{E}_1,
\tag{3.3.14}
\]
where $\mathcal{E}_1\in\mathbb{R}^p$, $e_i^\top\mathcal{E}_1=O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\big)$ for $i=1,\ldots,d$, and $e_i^\top\mathcal{E}_1=O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\big)$ for $i=d+1,\ldots,p$.


We then have
\begin{align*}
B_{kk}f(x_k)&=\frac{\|G_n\mathbf{1}_{N}\|^2_{\mathbb{R}^p}}{N^2\varepsilon}f(x_k)\tag{3.3.15}\\
&=\frac{1}{\varepsilon}\Bigg(\frac{\big\|\frac{1}{n\varepsilon^d}\sum_{j=1}^{N}(x_{k,j}-x_k)\big\|}{\frac{N}{n\varepsilon^d}}\Bigg)^2f(x_k)\\
&=\frac{1}{\varepsilon}\Bigg(\frac{\big\|\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+\mathcal{E}_1\big\|}{\mathbb{E}\,\frac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)+O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\big)}\Bigg)^2f(x_k)\\
&=\frac{1}{\varepsilon}\cdot\frac{\big\|\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\big\|^2+2\big\|\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\big\|\,\|\mathcal{E}_1\|+\text{higher order terms}}{\big(\mathbb{E}\,\frac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\big)^2+2\,\mathbb{E}\,\frac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\,O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\big)+\text{higher order terms}}\,f(x_k).
\end{align*}

Note that by Lemma 3.1.3 and Lemma 3.1.2, $\big\|\mathbb{E}\,\frac{1}{\varepsilon^d}(X-x_k)\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)\big\|^2$ is of order $\varepsilon^4$ if $x_k\notin M_\varepsilon$, and of order $\varepsilon^2$ if $x_k\in M_\varepsilon$. Moreover, $\mathbb{E}\,\frac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_{\varepsilon}(x_k)}(X)$ is always of order $1$.

Hence, if $x_k\in M_\varepsilon$,
\[
B_{kk}f(x_k)=B_\varepsilon(x_k)f(x_k)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\tag{3.3.16}
\]
If $x_k\notin M_\varepsilon$,
\[
B_{kk}f(x_k)=B_\varepsilon(x_k)f(x_k)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big).
\tag{3.3.17}
\]
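The two regimes of the boundary function can be checked by a Monte Carlo estimate of (3.3.4). The unit disk with uniform density below is an illustrative assumption: $B_\varepsilon$ is of order $\varepsilon$ at a boundary point, while at a deep-interior point the conditional mean displacement nearly cancels and the value is orders of magnitude smaller.

```python
import numpy as np

# Monte Carlo estimate of B_eps(x) of (3.3.4) on the unit disk (d = p = 2,
# uniform density P), at a boundary point and at an interior point.
rng = np.random.default_rng(3)
n, eps = 400000, 0.1
X = rng.uniform(-1.0, 1.0, size=(4 * n, 2))          # rejection sampling
X = X[np.linalg.norm(X, axis=1) <= 1.0][:n]          # uniform on the unit disk

def boundary_fn(x):
    diff = X - x
    inside = np.linalg.norm(diff, axis=1) < eps      # chi_{B_eps(x)}(X)
    m = diff[inside].mean(axis=0)                    # E[(X-x)chi] / E[chi]
    return np.linalg.norm(m) ** 2 / eps              # definition (3.3.4)

B_bnd = boundary_fn(np.array([1.0, 0.0]))            # x on the boundary
B_int = boundary_fn(np.array([0.0, 0.0]))            # x in the interior
print(B_bnd, B_int)
assert B_int < 0.1 * B_bnd    # interior value is orders of magnitude smaller
```

At the boundary the ball is cut roughly in half, so the conditional mean displacement is of size proportional to $\varepsilon$ and $B_\varepsilon\propto\varepsilon$; the test only checks the separation of the two regimes, not the exact constant.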

Finally, we define the following integral operator from $C(M)$ to $C(M)$:
\[
R_\varepsilon f(x):=Qf(x)-f(x)+B_\varepsilon(x)f(x),
\tag{3.3.18}
\]
where $f\in C(M)$.

If we combine Theorem 3.2.1 and Proposition 3.3.1, then we have

Theorem 3.3.1. 1. Suppose $f\in C^3(M)$. Suppose $\varepsilon=\varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$ and $\varepsilon\to 0$ as $n\to\infty$. We have with probability greater than $1-n^{-2}$ that for all $k=1,\ldots,n$,
\[
-\sum_{j=1}^{n}L_{kj}f(x_j)=R_\varepsilon f(x_k)+O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\]
2. We have for any $x\in M$,
\[
R_\varepsilon f(x)=\sum_{i=1}^{d}\phi_i(x,\varepsilon)\partial^2_{ii}f(x)+g\big(V(x,\varepsilon),\nabla f(x)\big)+B_\varepsilon(x)f(x)+O(\varepsilon^3).
\tag{3.3.19}
\]
When $x\notin M_\varepsilon$, we have
\[
R_\varepsilon f(x)=\frac{1}{2(d+2)}\Delta f(x)\,\varepsilon^2+O(\varepsilon^3);
\tag{3.3.20}
\]


when $x\in M_\varepsilon$, we have
\[
R_\varepsilon f(x)=B_\varepsilon(x)f(x)+O(\varepsilon^2).
\tag{3.3.21}
\]
In particular, if $x\in\partial M$,
\[
R_\varepsilon f(x)=Cf(x)\,\varepsilon+O(\varepsilon^2),
\tag{3.3.22}
\]
where $C$ is a constant which only depends on the dimension of $M$.

Next, we show the convergence of $\frac{R_\varepsilon f(x)}{\varepsilon^2}$ to the Laplace–Beltrami operator with the Dirichlet boundary condition, $\Delta^{(D)}$, in the $L^2$ sense as $\varepsilon\to 0$. Suppose $-\lambda_i$, $i=1,\ldots$, are the eigenvalues of $\Delta^{(D)}$, so that $0<\lambda_1\le\lambda_2\le\ldots$. Since $\Delta^{(D)}$ is not a bounded operator with respect to the $L^2$ norm, we cannot show the convergence in the operator norm. Instead, we show the convergence uniformly over finite dimensional subspaces composed of finitely many eigenspaces of $\Delta^{(D)}$. For $k\in\mathbb{N}$, define
\[
E_k:=\{f\in C^\infty(M)\,|\,\Delta^{(D)}f=-\lambda_kf,\ f|_{\partial M}=0\ \text{and}\ \|f\|_{L^2}=1\}.
\tag{3.3.23}
\]

The convergence result can be stated as follows.

Theorem 3.3.2. Fix $K\in\mathbb{N}$. There exists a constant $C$ which only depends on $M$ and $P$ such that if $\varepsilon$ satisfies
\[
\varepsilon\le\frac{C}{(1+\lambda_K^{d/4+1})^4},
\tag{3.3.24}
\]
we have for all $f\in\oplus_{l=1}^{K}E_l$ that
\[
\Big\|\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta^{(D)}f(x)\Big\|_{L^2}\le C\varepsilon^{1/2}.
\tag{3.3.25}
\]

Proof. In this proof, to simplify the notation, we use $\lesssim$ to denote "less than or equal to, up to a multiplicative constant which is independent of $\varepsilon$ and $f$". Based on (3.3.19), for $f\in\oplus_{l=1}^{K}E_l$ and any $x\in M$, we have
\begin{align*}
&\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta f(x)\tag{3.3.26}\\
&=\sum_{i=1}^{d}\Big[\frac{\phi_i(x,\varepsilon)}{\varepsilon^2}-\frac{1}{2(d+2)}\Big]\partial^2_{ii}f(x)+g\Big(\frac{V(x,\varepsilon)}{\varepsilon^2},\nabla f(x)\Big)+\frac{B_\varepsilon(x)}{\varepsilon^2}f(x)+O(\varepsilon).
\end{align*}
In particular, if $x\notin M_\varepsilon$, by applying Theorem 3.3.1,
\[
\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta f(x)=O(\varepsilon).
\tag{3.3.27}
\]

We first bound the $O(\varepsilon)$ term in (3.3.26) and (3.3.27). Denote by $\alpha=(\alpha_1,\ldots,\alpha_d)$ a multi-index. The $O(\varepsilon)$ term in (3.3.26) and (3.3.27) depends on $D^{|\alpha|}f$ for $|\alpha|=1,\ldots,3$. Note that
\[
\|f\|_{C^l(M)}\lesssim\|f\|_{H^{d/2+l+1}(M)}\lesssim 1+\|\Delta^{d/4+l/2+1/2}f\|_{L^2(M)}\lesssim 1+\lambda_K^{d/4+l/2+1/2},
\]
where the first inequality is the Sobolev embedding theorem [45] and the third inequality follows from the assumption that $f\in\oplus_{l=1}^{K}E_l$. Note that $\|f\|_{C^l(M)}\varepsilon^{l/2}\lesssim\varepsilon^{l/4}$ if and only if $\varepsilon\approx(1+\lambda_K^{d/4+l/2+1/2})^{-4/l}$, which is sufficient when
\[
\varepsilon\approx\frac{1}{(1+\lambda_K^{d/4+1})^4}.
\tag{3.3.28}
\]
Combining the above discussions, we conclude that for a given $K\in\mathbb{N}$, if $\varepsilon\approx\frac{1}{(1+\lambda_K^{d/4+2})^4}$, the $O(\varepsilon)$ terms in (3.3.26) and (3.3.27) are bounded above by $\varepsilon^{1/2}$.

Next, we bound $\big\|\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta f(x)\big\|_{L^2}$. Fix $x\in M_\varepsilon$; it follows from Proposition 3.3.1 and a straightforward calculation that $\frac{B_\varepsilon(x)}{\varepsilon^2}\lesssim\frac{1}{\varepsilon}$ for $\varepsilon$ sufficiently small. Similarly, the expressions
\[
\frac{\phi_i(x,\varepsilon)}{\varepsilon^2}\quad\text{for }i=1,\ldots,d
\tag{3.3.29}
\]
and
\[
\frac{V(x,\varepsilon)}{\varepsilon^2}
\tag{3.3.30}
\]
are of order $O(1)$. Hence, they are uniformly bounded from above on $M_\varepsilon$ for $\varepsilon$ sufficiently small. Finally, note that $\mathrm{Vol}(M_\varepsilon)\lesssim\varepsilon$ since the boundary is smooth, while $\mathrm{Vol}(M\setminus M_\varepsilon)$ is of order $1$. Thus, putting the above together, we have the following estimate:
\begin{align*}
&\Big\|\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta f(x)\Big\|_{L^2}\tag{3.3.31}\\
&\le\Big\|\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta f(x)\Big\|_{L^2(M_\varepsilon)}+\Big\|\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta f(x)\Big\|_{L^2(M\setminus M_\varepsilon)}\\
&\lesssim\Big\|\sum_{i=1}^{d}\Big[\frac{\phi_i(x,\varepsilon)}{\varepsilon^2}-\frac{1}{2(d+2)}\Big]\partial^2_{ii}f(x)+g\Big(\frac{V(x,\varepsilon)}{\varepsilon^2},\nabla f(x)\Big)+\frac{B_\varepsilon(x)}{\varepsilon^2}f(x)\Big\|_{L^2(M_\varepsilon)}+\varepsilon^{1/2}\\
&\lesssim\max_{1\le|\alpha|\le 2}\|D^\alpha f\|_{L^\infty(M_\varepsilon)}\mathrm{Vol}(M_\varepsilon)+\Big\|\frac{B_\varepsilon(x)}{\varepsilon^2}\Big\|_{L^\infty(M_\varepsilon)}\|f\|_{L^\infty(M_\varepsilon)}\mathrm{Vol}(M_\varepsilon)+\varepsilon^{1/2}\\
&\lesssim\|f\|_{C^2(M_\varepsilon)}\varepsilon+\|f\|_{L^\infty(M_\varepsilon)}+\varepsilon^{1/2},
\end{align*}
where we use the fact that $\max_{|\alpha|=2}\|D^\alpha f\|_{L^\infty(M_\varepsilon)}+\max_{|\alpha|=1}\|D^\alpha f\|_{L^\infty(M_\varepsilon)}\le 2\|f\|_{C^2(M_\varepsilon)}$ in the last step. Based on (3.3.28), if $\varepsilon\lesssim\frac{1}{(1+\lambda_K^{d/4+1})^4}$, then $\|f\|_{C^2(M_\varepsilon)}\varepsilon\le\|f\|_{C^2(M)}\varepsilon\lesssim\varepsilon^{1/2}$. To control $\|f\|_{L^\infty(M_\varepsilon)}$, we use the assumption that $\Delta$ is the Laplace–Beltrami operator with the Dirichlet boundary condition and that $f\in\oplus_{l=1}^{K}E_l$ is smooth. As a result, we have $\|f\|_{L^\infty(M_\varepsilon)}\lesssim\varepsilon$ when $\varepsilon$ is sufficiently small. In conclusion, when $\varepsilon\approx\frac{1}{(1+\lambda_K^{d/4+1})^4}$, for all $f\in\oplus_{l=1}^{K}E_l$ we have
\[
\Big\|\frac{R_\varepsilon f(x)}{\varepsilon^2}-\frac{1}{2(d+2)}\Delta f(x)\Big\|_{L^2}\lesssim\varepsilon^{1/2}.
\tag{3.3.32}
\]


Bibliography

[1] A. Alvarez-Meza, J. Valencia-Aguirre, G. Daza-Santacoloma, and G. Castellanos-Domínguez. Global and local choice of the number of nearest neighbors in locally linear embedding. Pattern Recognition Letters, 32:2171–2177, 2011.

[2] A. L. Andrew and R. C. E. Tan. Computation of derivatives of repeated eigenvalues and the corresponding eigenvectors of symmetric matrix pencils. SIAM Journal on Matrix Analysis and Applications, 20(1):78–100, 1998.

[3] M. Balasubramanian and E. L. Schwartz. The Isomap Algorithm and Topological Stability. Science, 295(January):7a–7, 2002.

[4] J. Bates. The embedding dimension of Laplacian eigenfunction maps. Appl. Comput. Harmon. Anal., 37(3):516–530, 2014.

[5] H. Baumgärtel. Analytic perturbation theory for matrices and operators. Operator theory. Birkhäuser Verlag, 1985.

[6] M. Belkin and P. Niyogi. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Comput., 15(6):1373–1396, 2003.

[7] M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. In Proceedings of the 18th Conference on Learning Theory (COLT), pages 486–500, 2005.

[8] M. Belkin and P. Niyogi. Convergence of Laplacian eigenmaps. In Adv. Neur. In.: Proceedings of the 2006 Conference, volume 19, page 129. The MIT Press, 2007.

[9] M. Belkin, Q. Que, Y. Wang, and X. Zhou. Toward understanding complex spaces: graph Laplacians on manifolds with singularities and boundaries. In JMLR: Workshop and Conference Proceedings, volume 23, pages 1–24, 2012.

[10] P. Bérard. Spectral Geometry: Direct and Inverse Problems. Springer, 1986.

[11] P. Bérard, G. Besson, and S. Gallot. Embedding Riemannian manifolds by their heat kernel. Geom. Funct. Anal., 4:373–398, 1994.

[12] A. Bernstein and A. Kuleshov. Data-based Manifold Reconstruction via Tangent Bundle Manifold Learning. ICML-2014, Topological Methods for Machine Learning Workshop, pages 1–6, 2014.

[13] T. Berry and T. Sauer. Density Estimation on Manifolds with Boundary. 2015.


[14] T. Berry and T. Sauer. Local kernels and the geometric structure of data. Applied and Computational Harmonic Analysis, 40(3):439–469, 2016.

[15] H. Chang and D. Y. Yeung. Robust locally linear embedding. Pattern Recognition, 39:1053–1065, 2006.

[16] J. Cheeger, M. Gromov, M. Taylor, et al. Finite propagation speed, kernel estimates for functions of the Laplace operator, and the geometry of complete Riemannian manifolds. Journal of Differential Geometry, 17(1):15–53, 1982.

[17] M.-Y. Cheng and H.-T. Wu. Local linear regression on manifolds and its geometric interpretation. J. Am. Stat. Assoc., 108:1421–1434, 2013.

[18] R. R. Coifman and S. Lafon. Diffusion maps. Appl. Comput. Harmon. Anal., 21(1):5–30, 2006.

[19] J. Dever. Eigenvalue sums of theta Laplacians on finite graphs. arXiv preprint arXiv:1609.05999, 2016.

[20] L. P. Devroye and T. J. Wagner. The strong uniform consistency of nearest neighbor density estimates. Ann. Stat., 5(3):536–540, 1977.

[21] D. L. Donoho, M. Gavish, and I. M. Johnstone. Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model. ArXiv e-prints, 2013.

[22] D. L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. P. Natl. Acad. Sci. USA, 100(10):5591–5596, 2003.

[23] N. El Karoui. On information plus noise kernel random matrices. Ann. Stat., 38(5):3191–3216, 2010.

[24] N. El Karoui and H.-T. Wu. Connection graph Laplacian methods can be made robust to noise. Ann. Stat., 44(1):346–372, 2016.

[25] J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. Chapman and Hall/CRC, 1996.

[26] M. Fanuel, C. Alaíz, A. Fernández, and J. Suykens. Magnetic eigenmaps for the visualization of directed networks. Applied and Computational Harmonic Analysis, 44(1):189–199, 2018.

[27] T. Gao. The Diffusion Geometry of Fibre Bundles. ArXiv e-prints, 2016.

[28] N. García Trillos and D. Slepčev. A variational approach to the consistency of spectral clustering. Appl. Comput. Harmon. Anal., pages 1–39, 2015.

[29] A. S. Georgiou, J. M. Bello-Rivas, C. W. Gear, H.-T. Wu, E. Chiavazzo, and I. G. Kevrekidis. An exploration algorithm for stochastic simulators driven by energy gradients. Entropy, 19(7):294, 2017.

[30] E. Giné and V. Koltchinskii. Empirical graph Laplacian approximation of Laplace–Beltrami operators: Large sample results. In Anthony Bonato and Jeannette Janssen, editors, IMS Lecture Notes, volume 51 of Monograph Series, pages 238–259. The Institute of Mathematical Statistics, 2006.

[31] M. J. Goldberg and S. Kim. Some Remarks on Diffusion Distances. Journal of Applied Mathematics, 2010:1–17, 2010.

[32] M. Hein, J. Audibert, and U. von Luxburg. From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. In Proceedings of the 18th Conference on Learning Theory (COLT), pages 470–485, 2005.


[33] M. Hein, J.-Y. Audibert, and U. von Luxburg. Graph Laplacians and their Convergence on Random Neighborhood Graphs. JMLR, 8:1325–1368, 2007.

[34] I. M. Johnstone. High dimensional statistical inference and random matrices. arXiv:math/0611589v1, 2006.

[35] P. W. Jones, M. Maggioni, and R. Schul. Manifold parametrizations by eigenfunctions of the Laplacian and heat kernels. P. Natl. Acad. Sci. USA, 105(6):1803–8, 2008.

[36] D. N. Kaslovsky and F. G. Meyer. Non-asymptotic analysis of tangent space perturbation. Information and Inference, 3(2):134–187, 2014.

[37] O. Katz, R. Talmon, Y.-L. Lo, and H.-T. Wu. Diffusion-based nonlinear filtering for multimodal data fusion with application to sleep stage assessment. Information Fusion, In press, 2018.

[38] R. R. Lederman and R. Talmon. Learning the geometry of common latent variables using alternating-diffusion. Appl. Comp. Harmon. Anal., 2015.

[39] R. R. Lederman, R. Talmon, H.-T. Wu, Y.-L. Lo, and R. R. Coifman. Alternating diffusion for common manifold learning with application to sleep stage assessment. pages 5758–5762. IEEE, 2015.

[40] O. Lindenbaum, A. Yeredor, M. Salhov, and A. Averbuch. MultiView Diffusion Maps. ArXiv e-prints, 2015.

[41] A. V. Little, M. Maggioni, and L. Rosasco. Multiscale geometric methods for data sets I: Multiscale SVD, noise and curvature. Appl. Comput. Harmon. Anal., 43(3):504–567, 2017.

[42] N. F. Marshall and M. J. Hirn. Time coupled diffusion maps. Applied and Computational Harmonic Analysis, 1:1–20, 2017.

[43] D. S. Moore and J. W. Yackel. Consistency Properties of Nearest Neighbor Density Function Estimators. Ann. Statist., 5(1):143–154, 1977.

[44] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. In Twentieth Anniversary Volume, pages 1–23. Springer New York, 2009.

[45] R. S. Palais. Foundations of global non-linear analysis. 1969.

[46] Y. Pan, S. Ge, and A. Al Mamun. Weighted locally linear embedding for dimension reduction. Pattern Recognition, 42:798–811, 2009.

[47] J. W. Portegies. Embeddings of Riemannian manifolds with heat kernels and eigenfunctions. 2016. arXiv:1311.7568 [math.DG].

[48] H. J. Qiu and E. R. Hancock. Clustering and embedding using commute times. IEEE Trans. Patt. Anal. Mach. Intell., 29(11):1873–90, 2007.

[49] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

[50] M. Salhov, A. Bermanis, G. Wolf, and A. Averbuch. Approximately-isometric diffusion maps. Applied and Computational Harmonic Analysis, 38(3):399–419, 2015.

[51] M. Salhov, G. Wolf, and A. Averbuch. Patch-to-tensor embedding. Appl. Comput. Harmon. Anal., 33(2):182–203, 2012.


[52] A. Singer. From graph to manifold Laplacian: The convergence rate. Appl. Comput. Harmon. Anal., 21(1):128–134, 2006.

[53] A. Singer and R. R. Coifman. Non-linear independent component analysis with diffusion maps. Appl. Comput. Harmon. Anal., 25(2):226–239, 2008.

[54] A. Singer and H.-T. Wu. Orientability and diffusion maps. Applied and Computational Harmonic Analysis, 31(1):44–58, 2011.

[55] A. Singer and H.-T. Wu. Vector diffusion maps and the connection Laplacian. Comm. Pure Appl. Math., 65(8):1067–1144, 2012.

[56] A. Singer and H.-T. Wu. 2-d tomography from noisy projections taken at unknown random directions. SIAM J. Imaging Sci., 6(1):136–175, 2013.

[57] A. Singer and H.-T. Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2017.

[58] O. Smolyanov, H. v. Weizsäcker, and O. Wittich. Chernoff's theorem and discrete time approximations of Brownian motion on manifolds. Potential Anal., 26(1):1–29, 2007.

[59] A. Spira, R. Kimmel, and N. Sochen. A short-time Beltrami kernel for smoothing images and manifolds. IEEE Transactions on Image Processing, 16(6):1628–1636, 2007.

[60] E. M. Stein and G. Weiss. Introduction to Fourier analysis on Euclidean spaces (PMS-32), volume 32. Princeton University Press, 2016.

[61] A. Szlam. Asymptotic regularity of subdivisions of Euclidean domains by iterated PCA and iterated 2-means. Applied and Computational Harmonic Analysis, 27(3):342–350, 2009.

[62] R. Talmon and R. R. Coifman. Intrinsic modeling of stochastic dynamical systems using empirical geometry. Applied and Computational Harmonic Analysis, 39(1):138–160, 2015.

[63] R. Talmon and R. R. Coifman. Empirical intrinsic geometry for nonlinear modeling and time series filtering. Proc. Nat. Acad. Sci., 110(31):12535–12540, 2013.

[64] R. Talmon and H.-T. Wu. Discovering a latent common manifold with alternating diffusion for multimodal sensor data analysis. Appl. Comput. Harmon. Anal., accepted for publication, 2018.

[65] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323, 2000.

[66] N. G. Trillos, M. Gerlach, M. Hein, and D. Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator. arXiv preprint arXiv:1801.10108, 2018.

[67] H. Tyagi, E. Vural, and P. Frossard. Tangent space estimation for smooth embeddings of Riemannian manifolds. Information and Inference, 2(1):69–114, 2013.

[68] N. P. Van Der Aa, H. G. Ter Morsche, and R. R. M. Mattheij. Computation of eigenvalue and eigenvector derivatives for a general complex-valued eigensystem. Electronic Journal of Linear Algebra, 16(1):300–314, 2007.


[69] L. van der Maaten and G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

[70] U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Ann. Stat., 36(2):555–586, April 2008.

[71] X. Wang. Spectral Convergence Rate of Graph Laplacian. ArXiv e-prints, 1510.08110, 2015.

[72] K. Q. Weinberger and L. K. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. AAAI, pages 1683–1686, 2006.

[73] H.-T. Wu and N. Wu. Think globally, fit locally under the Manifold Setup: Asymptotic Analysis of Locally Linear Embedding. ArXiv e-prints, 1703.04058, 2017.

[74] Z. Zhang and J. Wang. MLLE: Modified locally linear embedding using multiple weights. Advances in Neural Information Processing Systems, pages 1593–1600, 2006.

[75] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput., 26:313–338, 2004.