Transcript
E.G.M. Petrakis Dimensionality Reduction 1
Dimensionality Reduction
Given N vectors in n dims, find the k most important axes to project them; k is user-defined (k < n)
Applications: information retrieval & indexing; identify the k most important features, or reduce indexing dimensions for faster retrieval (low-dim indices are faster)
Techniques
Eigenvalue analysis techniques [NR'92]: the Karhunen-Loeve (K-L) transform and Singular Value Decomposition (SVD); both need O(N^2) time
FastMap [Faloutsos & Lin 95]: dimensionality reduction and mapping of objects to vectors in O(N) time
Mathematical Preliminaries
For an n×n square matrix S, unit vector x, and scalar value λ with Sx = λx: x is an eigenvector of S and λ is an eigenvalue of S
The eigenvectors of a symmetric matrix (S = S^T) are mutually orthogonal and its eigenvalues are real
Rank r of a matrix: the maximum number of independent columns or rows
Example 1
Intuition: S defines an affine transform y = Sx that involves scaling and rotation; the eigenvectors are unit vectors along the new directions, and the eigenvalues denote the scaling

S = [2 1; 1 3] has eigenvalues λ1 = 3.62 with u1 = [0.52 0.85]^T (eigenvector of the major axis) and λ2 = 1.38 with u2 = [0.85 -0.52]^T
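This example is easy to check numerically; a minimal NumPy sketch (not part of the original slides):

```python
# Numerical check of Example 1: eigenvalues/eigenvectors of S = [2 1; 1 3].
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is the right routine for symmetric matrices: it returns real
# eigenvalues in ascending order and orthonormal eigenvector columns.
eigvals, eigvecs = np.linalg.eigh(S)

print(np.round(eigvals, 2))   # [1.38 3.62]
# Eigenvector of the largest eigenvalue = the major-axis direction
# (up to sign), approximately [0.53 0.85] (the slide rounds to 0.52).
print(np.round(np.abs(eigvecs[:, 1]), 2))
```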
Example 2
If S is real and symmetric (S = S^T) then it can be written as S = UΛU^T
The columns of U are the eigenvectors of S; U is column-orthonormal (UU^T = I); Λ is diagonal with the eigenvalues of S
S = [2 1; 1 3] = [0.52 0.85; 0.85 -0.52] · [3.62 0; 0 1.38] · [0.52 0.85; 0.85 -0.52]^T
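The decomposition can likewise be verified numerically (a NumPy check, not slide code, using the same S as in Example 1):

```python
# Check that S = U Lam U^T reconstructs S and that U is orthogonal.
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, U = np.linalg.eigh(S)    # columns of U are eigenvectors of S
Lam = np.diag(eigvals)

print(np.allclose(U @ Lam @ U.T, S))     # True: the factorization reconstructs S
print(np.allclose(U @ U.T, np.eye(2)))   # True: U is orthogonal
```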
Karhunen-Loeve (K-L)
Project in a k-dimensional space (k < n) minimizing the error of the projections (sum of squared differences); K-L gives a linear combination of axes sorted by importance: keep the first k dims
[figure: 2-dim points and the 2 K-L directions; for k = 1, keep x']
Computation of K-L
Put the N vectors as rows in A = [a_ij]
Compute B = [a_ij - a_p], where a_p = (1/N) Σ_{i=1..N} a_ip is the average of column p
Covariance matrix: C = B^T B
Compute the eigenvectors of C
Sort in decreasing eigenvalue order
Approximate each object by its projections on the directions of the first k eigenvectors
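The steps above can be sketched in NumPy; `kl_transform` is my naming, not the lecture's code:

```python
import numpy as np

def kl_transform(A, k):
    """Project the rows of A (N vectors, n dims) on the k eigenvectors
    of the covariance matrix with the largest eigenvalues."""
    B = A - A.mean(axis=0)                 # subtract column averages a_p
    C = B.T @ B                            # covariance matrix, n x n
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # decreasing eigenvalue order
    W = eigvecs[:, order[:k]]              # first k eigenvectors as columns
    return B @ W                           # N x k projections

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))              # 100 vectors in 5 dims
print(kl_transform(A, 2).shape)            # (100, 2)
```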
Intuition
B shifts the origin to the center of gravity of the vectors (by a_p) and has zero column mean
C represents attribute-to-attribute similarity; C is square, real, and symmetric
Eigenvectors and eigenvalues are computed on C, not on A
C denotes the affine transform that minimizes the error
Approximate each vector with its projections along the first k eigenvectors
Example
Input vectors: [1 2], [1 1], [0 0]; the column averages are 2/3 and 1

A = [1 2; 1 1; 0 0]
B = [1/3 1; 1/3 0; -2/3 -1] and C = B^T B = [2/3 1; 1 2]

Eigenvalues: λ1 = 2.53 with u1 = [0.47 0.88]^T, λ2 = 0.13 with u2 = [-0.88 0.47]^T
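The example's numbers can be reproduced with NumPy (a quick check, not slide code):

```python
# Reproducing the K-L example: B, C, and the eigen-decomposition of C.
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 1.0],
              [0.0, 0.0]])
B = A - A.mean(axis=0)            # column averages are 2/3 and 1
C = B.T @ B

print(np.round(C, 2))             # [[0.67 1.  ] [1.   2.  ]]
eigvals, eigvecs = np.linalg.eigh(C)
print(np.round(eigvals, 2))       # [0.13 2.53]
print(np.round(np.abs(eigvecs[:, 1]), 2))   # [0.47 0.88], the major direction
```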
SVD
For general rectangular matrices: an N×n matrix (N vectors, n dimensions); SVD groups similar entities (documents) together, and groups similar terms together, each group corresponding to a concept
The query is related to the CS concept and unrelated to the medical concept
LSI returns docs that also contain the terms "retrieval" and "information", which are not specified by the query
q = [1 0 0 0 0]^T

q_c = V^T q = [0.58 0.58 0.58 0 0; 0 0 0 0.71 0.71] · [1 0 0 0 0]^T = [0.58 0]^T
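The query mapping q_c = V^T q in NumPy, with V read off the slide's two concept rows (the term order is assumed):

```python
# Folding a query into concept space: q_c = V^T q.
import numpy as np

V = np.array([[0.58, 0.00],
              [0.58, 0.00],
              [0.58, 0.00],
              [0.00, 0.71],
              [0.00, 0.71]])

q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # query on a single term

qc = V.T @ q
print(qc)   # the query maps entirely to the first concept: [0.58, 0]
```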
FastMap
Works with distances and has two roles:
1. Maps objects to vectors so that their distances are preserved (then apply SAMs for indexing)
2. Dim. reduction: given N vectors with n attributes each, find N vectors with k attributes such that the distances are preserved as much as possible
Main idea
Pretend that objects are points in some unknown n-dimensional space; project these points on k mutually orthogonal axes; compute the projections using distances only
The heart of FastMap is the method that projects objects on a line: take 2 objects which are far apart (pivots) and project on the line that connects the pivots
Project Objects on a Line
O_a, O_b: pivots; O_i: any object
d_ij: shorthand for D(O_i, O_j)
x_i: the first coordinate of O_i in the k-dimensional space
If O_i is close to O_a, x_i is small
Apply the cosine law:
d_bi^2 = d_ai^2 + d_ab^2 - 2·x_i·d_ab
Solving for x_i:
x_i = (d_ai^2 + d_ab^2 - d_bi^2) / (2·d_ab)
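The projection formula translates directly into code (a sketch; the sample points and distance function are illustrative, not from the slides):

```python
# Cosine-law projection of an object on the line through two pivots.
import numpy as np

def project_on_line(d_ai, d_bi, d_ab):
    """Coordinate of O_i on the line through pivots O_a and O_b."""
    return (d_ai**2 + d_ab**2 - d_bi**2) / (2.0 * d_ab)

# Sanity check with actual 2-d points, where the answer is plain geometry:
Oa, Ob, Oi = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([1.0, 2.0])
d = lambda p, q: float(np.linalg.norm(p - q))
print(project_on_line(d(Oa, Oi), d(Ob, Oi), d(Oa, Ob)))   # approximately 1.0
```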
Choose Pivots
Complexity: O(N); the optimal algorithm would require O(N^2) time
Steps 2, 3 can be repeated 4-5 times to improve the accuracy of the selection
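The transcript omits the numbered steps, so the following is a sketch of the usual FastMap pivot heuristic: seed with an arbitrary object, jump to the object farthest from it, then to the object farthest from that one, iterating a few times:

```python
# Linear-time pivot heuristic (a sketch; names are mine, not the lecture's).
import random

def choose_pivots(objects, dist, iterations=5):
    a = b = random.choice(objects)                    # step 1: arbitrary seed
    for _ in range(iterations):                       # steps 2, 3 repeated
        a = max(objects, key=lambda o: dist(o, b))    # farthest from b
        b = max(objects, key=lambda o: dist(o, a))    # farthest from a
    return a, b
```

Each pass is O(N), so the heuristic stays linear overall, versus the O(N^2) cost of finding the truly farthest pair.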
Extension for Many Dimensions
Consider the (n-1)-dimensional hyperplane H that is perpendicular to the line O_aO_b
Project the objects on H and apply the previous step: choose two new pivots; the new x_i is the next coordinate of each object; repeat until k-dimensional vectors are obtained
The distance on H is not D; D': the distance between the projected objects
Distance on the Hyper-Plane H
D' on H can be computed from the Pythagorean theorem:
D'(O_i', O_j')^2 = D(O_i, O_j)^2 - (x_i - x_j)^2
The ability to compute D' allows for computing a second line on H, and so on
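A compact end-to-end sketch following the slides (my code, not the lecture's): project on the pivot line, then recurse on the hyperplane H using the adjusted distance D'^2 = D^2 - (x_i - x_j)^2. Pivot selection here is simplified to the farthest-from-the-first-object heuristic:

```python
import math

def fastmap(objects, dist, k):
    """Map objects with metric `dist` to k-dimensional coordinate vectors."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        """Squared distance after projecting out the first `col` coordinates."""
        s = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            s -= (coords[i][c] - coords[j][c]) ** 2
        return max(s, 0.0)

    for col in range(k):
        a = max(range(n), key=lambda i: d2(i, 0, col))   # pivot heuristic
        b = max(range(n), key=lambda i: d2(i, a, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:            # all remaining projected distances are zero
            break
        dab = math.sqrt(dab2)
        for i in range(n):         # cosine-law projection on the pivot line
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords
```

For points that really are Euclidean, choosing k equal to the true dimensionality reproduces the original distances (up to floating-point error); for general metric data the distances are only approximated.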
C. Faloutsos, K.-I. Lin, "FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets", Proc. ACM SIGMOD, 1995