Laplacian Eigenmaps for Dimensionality Reduction and Data Representation By Mikhail Belkin, Partha Niyogi Slides by Shelly Grossman Big Data Processing Seminar Amir Averbuch 28.12.2014
Introduction
• A geometrically-motivated algorithm for non-linear dimensionality reduction.
• An attempt to recover a representation of the data in its intrinsic structure (if one exists), keeping close points close together.
• Shares common properties with LLE, Spectral Clustering, Diffusion Maps, and other non-linear dimensionality reduction methods.
Agenda
Preliminaries & Reminders
Geometric motivation
The Algorithm & Justification
Relation to Laplace operator and Heat Kernels
Similar algorithms
Examples
Open questions
Preliminaries & Reminders
Manifolds
• A space that resembles the Euclidean space ℝⁿ in a neighborhood of each point.
Dimensionality Reduction
• “Unfolding” a manifold embedded in a high-dimensional space so each data point is assigned a low-dimensional representation.
• x1,…,xk ∈ M, M embedded in ℝ𝑙.
• Target: find y1,…,yk ∈ ℝ𝑚, 𝑚 ≪ 𝑙, where yi represents xi.
Example
• X = (x1, x2, …, xn), naively mapped to the first coordinate x1.
– Not very good…
• Later on, we will see criteria for “Good” and “Bad” representations.
Graph Laplacian ℒ
𝓛 = 𝑫 − 𝑾
• Unweighted graphs – W is the adjacency matrix, D the degree matrix:
𝐷𝑖,𝑗 = deg(𝑣𝑖) if 𝑖 = 𝑗, and 0 otherwise.
• Weighted graphs – W is the weights matrix:
𝐷𝑖,𝑖 = Σ𝑗 𝑤(𝑣𝑖, 𝑣𝑗)
• Entry-wise:
ℒ𝑖,𝑗 = 𝐷𝑖,𝑖 − 𝑤(𝑣𝑖, 𝑣𝑖) if 𝑖 = 𝑗; −𝑤(𝑣𝑖, 𝑣𝑗) if 𝑖 ≠ 𝑗 and there is an edge 𝑣𝑖 ↔ 𝑣𝑗; 0 otherwise.
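A minimal numpy sketch of this construction (the 4-node example graph and its weights are made up for illustration):

```python
import numpy as np

# Symmetric weight matrix of a small 4-node graph (0 = no edge).
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [0.5, 0.0, 0.0, 1.0],
              [0.0, 2.0, 1.0, 0.0]])

D = np.diag(W.sum(axis=1))  # D_ii = sum over j of w(v_i, v_j)
L = D - W                   # graph Laplacian: L = D - W

# Each row of L sums to 0, so the constant vector 1 is in its null space.
```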
Geometric motivation
• The graph Laplacian is a discrete approximation of the Laplace operator on manifolds.
• Eigenvectors of the Laplacian matrix are discrete analogues of eigenfunctions of the Laplace operator.
• The Laplace operator, in turn, determines the inner product on the tangent space at each point of the manifold.
• The inner product is used to define geometric notions such as length, angle, and orthogonality.
• See S. Rosenberg, The Laplacian on a Riemannian Manifold, 1997, pp. 11, 18.
The Algorithm
• Input: k points in ℝ𝑙 (samples from the data).
• We do not know whether these points actually lie on a manifold of lower dimension – it is an assumption.
• Output: an embedding map of these points to a lower dimension.
Adjacency Graph Construction
• The k data points are translated to k graph nodes.
• Edges are defined according to a metric set on the points.
– Which points are considered “close”?
• Two alternatives:
– 𝜀-close nodes are connected
– n nearest neighbors
𝜀-close nodes
• Connect 𝑥𝑖 and 𝑥𝑗 when ‖𝑥𝑖 − 𝑥𝑗‖² < 𝜀.
• ‖∙‖ is the usual Euclidean norm in ℝ𝑙.
• Geometrically intuitive, but often leads to disconnected graphs.
• Need to choose 𝜀.
[Figure: four nodes; a pair at distance 0.8𝜀 is connected, a pair at distance 1.2𝜀 is not.]
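A sketch of the 𝜀-neighborhood construction in numpy (the sample points and the value of 𝜀 are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # 20 sample points in R^3
eps = 1.5                          # threshold; must be tuned per dataset

# Pairwise squared Euclidean distances.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# Connect x_i and x_j when ||x_i - x_j||^2 < eps; no self-loops.
A = (d2 < eps).astype(float)
np.fill_diagonal(A, 0.0)
```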
n nearest neighbors
• n=2:
– Node 1 is close to 4, 5
– Node 2 is close to 1, 3
– Node 3 is close to 1, 2
– Node 4 is close to 1, 5
– Node 5 is close to 1, 4
• Easy to pick, and good chances of getting a connected graph, but not as intuitive.
[Figure: five nodes illustrating the n=2 neighbor relations above.]
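The nearest-neighbor construction can be sketched as follows; since "j is among the n nearest neighbors of i" is not a symmetric relation, the graph is symmetrized here (connect if either direction holds), which is one common convention:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))      # sample points (arbitrary here)
n = 2                                 # number of nearest neighbors

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor

# Indices of each point's n nearest neighbors.
nbrs = np.argsort(d2, axis=1)[:, :n]

A = np.zeros((len(X), len(X)))
rows = np.repeat(np.arange(len(X)), n)
A[rows, nbrs.ravel()] = 1.0
A = np.maximum(A, A.T)                # symmetrize the relation
```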
Choosing weights
• Binary: 𝑊𝑖,𝑗 ∈ {0,1}; 1 for an existing edge in the adjacency graph, 0 otherwise.
• Heat kernel: for 𝑡 ∈ ℝ⁺, 𝑊𝑖,𝑗 = 𝑒^(−‖𝑥𝑖−𝑥𝑗‖²/𝑡) for an existing edge, 0 otherwise.
• Intuition regarding the heat kernel will be provided later on.
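The heat-kernel weighting can be sketched as below; the tiny point set and adjacency matrix are made up for illustration:

```python
import numpy as np

def heat_kernel_weights(X, A, t):
    """W_ij = exp(-||x_i - x_j||^2 / t) on edges of A, 0 otherwise."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.where(A > 0, np.exp(-d2 / t), 0.0)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # adjacency graph
W = heat_kernel_weights(X, A, t=1.0)     # e.g. W[0, 1] = e^{-1}
```

As t → ∞ the weights of existing edges tend to 1, recovering the binary choice.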
Eigenmap computation
• Repeat the following for each connected component:
• Solve a generalized eigenvector problem:
ℒ𝐟 = 𝜆𝐷𝐟
– This will result in a set of eigenvalues and matching eigenvectors.
• Take the m eigenvectors matching the smallest eigenvalues, omitting 0: 𝐟𝟏, 𝐟𝟐, …, 𝐟𝐦.
• Embed: 𝐱𝐢 → (𝐟𝟏(𝑖), …, 𝐟𝐦(𝑖))
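A minimal sketch of this step with scipy, assuming a connected component whose weight matrix is already built (a 5-node path graph stands in for real data):

```python
import numpy as np
from scipy.linalg import eigh

# Binary weight matrix of a connected 5-node path graph.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

D = np.diag(W.sum(axis=1))
L = D - W

# Generalized eigenproblem L f = lambda D f (eigenvalues ascending).
vals, vecs = eigh(L, D)

m = 2
Y = vecs[:, 1:m + 1]   # drop the constant eigenvector for lambda = 0
# Row i of Y is the m-dimensional embedding of x_i.
```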
Justification
• Suppose m=1 (map the sample to a line).
• The map is: 𝐱𝐢 → 𝑦𝑖
• Minimize: Σ𝑖,𝑗 (𝑦𝑖 − 𝑦𝑗)² 𝑊𝑖,𝑗
• It can be proved that
½ Σ𝑖,𝑗 (𝑦𝑖 − 𝑦𝑗)² 𝑊𝑖,𝑗 = 𝐲ᵗℒ𝐲
• The minimizing vector matches the smallest nonzero eigenvalue of the generalized problem ℒ𝐟 = 𝜆𝐷𝐟.
• A similar argument applies for m>1.
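The identity ½ Σ𝑖,𝑗 (𝑦𝑖 − 𝑦𝑗)² 𝑊𝑖,𝑗 = 𝐲ᵗℒ𝐲 can be checked numerically on a random symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((6, 6))
W = (W + W.T) / 2              # symmetric weights
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W
y = rng.standard_normal(6)

lhs = 0.5 * sum(W[i, j] * (y[i] - y[j]) ** 2
                for i in range(6) for j in range(6))
rhs = y @ L @ y                # y^t L y
```

Expanding (𝑦𝑖 − 𝑦𝑗)² and using the symmetry of W gives 2𝐲ᵗ𝐷𝐲 − 2𝐲ᵗ𝑊𝐲 = 2𝐲ᵗℒ𝐲, which is the identity above.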
Notes
• We find eigenmaps for each connected component separately.
• We take m eigenvectors, where m is the dimension of the embedded manifold (if known).
• The eigenvalue 0 is omitted, as its eigenvector is the constant vector 𝟏 (in ℒ, each row sums to 0). Using it would map the corresponding coordinate of an entire component to a single value.
Relation to Laplace Operator
• A similar process can be applied in the continuous case.
• 𝑓 maps points of the embedded manifold to a low-dimensional space (e.g. the real line). For any two nearby points:
|𝑓(𝑦) − 𝑓(𝑥)| ≤ 𝑑𝑖𝑠𝑡ℳ(𝑥, 𝑦) ‖𝛻𝑓(𝑥)‖ + 𝑜(𝑑𝑖𝑠𝑡ℳ(𝑥, 𝑦))
• It can be proved with tools from functional analysis that a mapping 𝑓 that best preserves local distances is an eigenfunction of the Laplace operator on the manifold.
Heat Kernel and Choice of Weight Matrix
Discrete Laplacian ↔ Laplace operator ↔ Heat equation
• Heat equation: (𝜕/𝜕𝑡 + ℒ) 𝑢 = 0
• The solution 𝑢(𝑥, 𝑡) can be expressed using the heat kernel, which is approximately the Gaussian
(4𝜋𝑡)^(−𝑚/2) 𝑒^(−‖𝑥−𝑦‖²/4𝑡)
• Plugging the initial condition 𝑓(𝑥) = 𝑢(𝑥, 0) into the heat equation, we get an estimate of the Laplacian using Gaussian weights 𝑊𝑖,𝑗:
ℒ𝑓(𝑥𝑖) ≈ (1/𝑡) [ 𝑓(𝑥𝑖) − 𝛼 Σ𝑗 𝑒^(−‖𝑥𝑖−𝑥𝑗‖²/4𝑡) 𝑓(𝑥𝑗) ]
LLE
• In LLE we had:
1. Calculate weights 𝑊𝑖,𝑗.
2. Use 𝑊𝑖,𝑗 to calculate the representation 𝑦𝑖.
• Step 2 can also be done by calculating the smallest eigenvectors of 𝑀 = (𝐼 − 𝑊)ᵀ(𝐼 − 𝑊).
• Regarding 𝑀 as an operator on functions defined on the dataset, it can be shown that
𝑀𝑓 ≈ ½ ℒ²𝑓
• Therefore, LLE calculates the eigenfunctions of the iterated Laplacian.
• Eigenfunctions of ℒ² are the same as the eigenfunctions of ℒ.
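The last claim is easy to check numerically: since ℒ is symmetric positive semi-definite, squaring its eigenvalues preserves their order, so ℒ and ℒ² share eigenvectors (a path-graph Laplacian is used here because its eigenvalues are distinct, avoiding degenerate eigenspaces):

```python
import numpy as np

# Laplacian of a 5-node path graph; its eigenvalues are distinct.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)          # eigenpairs of L
vals2, vecs2 = np.linalg.eigh(L @ L)    # eigenpairs of L^2

# Eigenvalues get squared; eigenvectors agree up to sign.
```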
Clustering
• In the last lecture: Clustering ↔ Minimal graph cut.
• We also saw that normalized spectral clustering solves a generalized eigenvector problem:
ℒ𝐯 = 𝜆𝐷𝐯
• One can also show how finding the minimal cut reduces to finding the eigenvectors of the graph Laplacian.
• Therefore the Laplacian has a role in both dimensionality reduction and clustering.
• These can be viewed as two sides of the same coin.
Examples
• Classic Swiss roll:
[Figure: original sample, Laplacian representation, and PCA, side by side.]
Toy vision example – bars
[Figure: sample bars, Laplacian representation, and PCA.]
• Dimension 1600 (40×40 images) reduced to dimension 2. The sample was 500 horizontal bars and 500 vertical bars.
Linguistics
• 300 most popular words in the Brown Corpus (compiled in 1961).
• Each such word is represented as a vector of dimension 600 containing bigram count information:
𝑤𝑖 = (𝑐(𝑤₁𝑤𝑖), …, 𝑐(𝑤₃₀₀𝑤𝑖), 𝑐(𝑤𝑖𝑤₁), …, 𝑐(𝑤𝑖𝑤₃₀₀))
• Dimensionality reduction using Laplacian eigenmaps will give us a bonus – soft clustering of words with similar syntactic categories.
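The bigram representation can be sketched on a toy corpus (the sentence and vocabulary below are invented stand-ins for the 300 most frequent Brown Corpus words):

```python
from collections import Counter

# Toy corpus; the slides use the 300 most frequent Brown Corpus words.
tokens = "the cat sat on the mat and the cat ran".split()
vocab = ["the", "cat", "sat", "on", "mat", "and", "ran"]

bigrams = Counter(zip(tokens, tokens[1:]))

def bigram_vector(w):
    # (c(w_1 w), ..., c(w_k w), c(w w_1), ..., c(w w_k))
    left = [bigrams[(v, w)] for v in vocab]    # counts of v followed by w
    right = [bigrams[(w, v)] for v in vocab]   # counts of w followed by v
    return left + right

vec = bigram_vector("cat")   # dimension 2 * |vocab| = 14
```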
Linguistics
[Figure: word clusters – infinitives (to be), prepositions, modal verbs.]
Speech
• Given a short recording of speech, can we recognize and represent phonetic data efficiently?
• Convert the speech signal to its Fourier transform, and label each vector of Fourier coefficients (dimension = 256) with its phonetic identity.
• Labels are not disclosed to Laplacian eigenmap algorithm.
[Figure legend:]
Fricatives (f, v, s, z)
Closures – stop consonants (g, k, t, d, p, b)
Vowels
Nasals (n, m)
Open Questions
• Finding an isometry of a manifold in a low-dimensional space.
– Dimensionality reduction with global preservation of distances.
• The process does not reveal the intrinsic dimensionality of the manifold, even though we assume the data lies on one.
• Assumes uniform sampling.
• Manifold boundaries.
• Choice of 𝜖 and 𝑡.
• Do we really care about the underlying manifold? This requires research on specific problems in various areas.