Estimating Intrinsic Dimension
Justin Eberhardt
UMD, Mathematics and Statistics
Advisor: Dr. Kang James
Jan 14, 2016
Outline
- Introduction
- Nearest-Neighbor Estimators
  - Regression Estimator
  - Maximum Likelihood Estimator
  - Revised Maximum Likelihood Estimator
- Comparison
- Summary
Intrinsic Dimension
Definition:
- The least number of parameters required to generate a dataset
- The minimum number of dimensions that describes a dataset without significant loss of features
Ex 1: Intrinsic Dimension
[Figure: a 2-D sheet rolled up in 3-D space (axes x, y, z) is flattened (unrolled) into the x-y plane. Int Dim = 2]
Ex 2: Intrinsic Dimension
Each image is 28 x 28 pixels, so one image is a 784-dimensional observation.
Ex 2: Intrinsic Dimension
Int Dim = 2
[Figure: 2-D Isomap embedding of the image set; the images vary smoothly between "top & bottom loop" and "no loop" shapes. Isomap Project, J. Tenenbaum & J. Langford, Stanford]
Applications
- Biometrics: facial recognition, fingerprints, iris
- Genetics
Why do we need to reduce dimensionality?
- Low-dimensional datasets are more efficient to store and process
- Very high-dimensional matrices strain even supercomputers
- Data in 1, 2, and 3 dimensions can be visualized
Ex: Facial Recognition in MN
- 5 million people
- 2 images per person (front and profile)
- 1028 x 1028 pixels per image (~1 megapixel)
Total memory required:
- n = 5,000,000
- p = (2)(1028)(1028) ≈ 2.11 million dimensions
- Matrix size: (5 x 10^6)(2.11 x 10^6) ≈ 10^13 cells (about 10 trillion)
- Memory at 2 bytes per cell: 2(10^13) = 2 x 10^13 bytes ≈ 20 terabytes
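The arithmetic above can be reproduced in a few lines; a minimal Python sketch, assuming 2 bytes per matrix cell as on the slide:

```python
# Back-of-envelope storage estimate for the Minnesota face-database example.
# Assumes 2 bytes per matrix cell, as on the slide; figures are illustrative.
n = 5_000_000                     # people (one row per person)
p = 2 * 1028 * 1028               # two 1028x1028 images -> 2,113,568 columns
cells = n * p                     # entries in the n x p data matrix
bytes_total = 2 * cells           # 2 bytes per cell

print(f"p = {p:,} dimensions")                    # 2,113,568
print(f"cells = {cells:.2e}")                     # ~1.06e13
print(f"memory = {bytes_total / 1e12:.1f} TB")    # ~21 TB (the slide rounds to ~20 TB)
```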
Intrinsic Dimension Estimators
Objective:
To find a simple formula that uses nearest neighbor (NN) information to quickly estimate intrinsic dimension
Intrinsic Dimension Estimators
Project Description:
Through simulation, we will compare the effectiveness of three proposed NN intrinsic dimension estimators.
Intrinsic Dimension Estimators
Note:
Traditional methods for estimating intrinsic dimension, such as PCA, fail on non-linear manifolds.
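To see the failure concretely, here is a minimal sketch (synthetic Swiss roll data and numpy only; the construction is ours, not from the slides): PCA reports substantial variance along all three raw axes even though the data is intrinsically 2-dimensional.

```python
# PCA on a Swiss roll: the linear method sees three significant directions,
# although the intrinsic dimension is 2.
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5 * np.pi * (1 + 2 * rng.random(2000))        # position along the roll
X = np.column_stack([theta * np.cos(theta),             # rolled-up 2-D sheet in 3-D
                     21 * rng.random(2000),             # height along the roll axis
                     theta * np.sin(theta)])

Xc = X - X.mean(axis=0)
var = np.linalg.svd(Xc, compute_uv=False) ** 2          # PCA eigenvalues via SVD
print(var / var.sum())   # all three proportions are non-negligible
```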
Intrinsic Dimension Estimators: Nearest-Neighbor Methods
- Regression Estimator (K. Pettis, T. Bailey, A. Jain & R. Dubes, 1979)
- Maximum Likelihood Estimator (E. Levina & P. Bickel, 2005)
- Revised Maximum Likelihood Estimator (D. MacKay & Z. Ghahramani, 2005)
Distance Matrix
D[i,j]: Euclidean distance from x_i to x_j (e.g., d_{2,3} is the distance from x_2 to x_3)

       1        2        3        ...  N
  1    0        d_{1,2}  d_{1,3}  ...  d_{1,n}
  2    d_{2,1}  0        d_{2,3}  ...  d_{2,n}
  3    d_{3,1}  d_{3,2}  0        ...  d_{3,n}
  ...
  N    d_{n,1}  d_{n,2}  d_{n,3}  ...  0
Nearest Neighbor Matrix
T[i,k]: Euclidean distance between x_i and the kth NN to x_i (e.g., t_{2,k} is the distance between x_2 and the kth NN to x_2)

       1   2        3        ...  N
  1    0   t_{1,2}  t_{1,3}  ...  t_{1,n}
  2    0   t_{2,2}  t_{2,3}  ...  t_{2,n}
  3    0   t_{3,2}  t_{3,3}  ...  t_{3,n}
  ...
  N    0   t_{n,2}  t_{n,3}  ...  t_{n,n}
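A minimal numpy sketch of how these two matrices can be built (the function name and toy data are ours):

```python
# Build the distance matrix D and the row-sorted NN matrix T for a dataset X.
# As in the slide, column 0 of T is zero: each point's sorted distances start
# with the distance to itself.
import numpy as np

def distance_and_nn_matrices(X):
    diff = X[:, None, :] - X[None, :, :]        # pairwise difference vectors
    D = np.sqrt((diff ** 2).sum(axis=-1))       # D[i, j] = ||x_i - x_j||
    T = np.sort(D, axis=1)                      # T[i, k] = dist from x_i to kth NN
    return D, T

X = np.random.default_rng(1).normal(size=(12, 2))
D, T = distance_and_nn_matrices(X)
print(T[:, 0])   # all zeros, matching the zero first column above
```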
Notation
- m: intrinsic dimension
- p: dimension of the raw dataset
- n: number of observations
- f(x): density (pdf) at observation x
- T_{x,k} or T_k: distance from observation x to its kth NN
- N(t,x): number of observations within distance t of observation x
[Figure: notation illustrated on a curve (m = 1) embedded in the plane (p = 2) with N = 12 points; a circle of radius t around x contains N(t,x) = 3 points, with t_1, t_2, t_3 the distances to the three nearest neighbors.]
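The counting function N(t, x) from the figure is equally short; a sketch (the helper name is ours):

```python
# N(t, x_i): the number of observations within distance t of x_i,
# excluding x_i itself.
import numpy as np

def neighbor_count(X, i, t):
    d = np.linalg.norm(X - X[i], axis=1)   # distances from x_i to every point
    return int((d <= t).sum()) - 1         # subtract the zero self-distance

X = np.random.default_rng(2).normal(size=(12, 2))
print(neighbor_count(X, i=0, t=1.0))
```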
NN Regression Estimator
Derivation outline:
1. Density of the distance to the kth NN (single observation, approximated as Poisson)
2a. Expected distance to the kth NN (single observation)
2b. Sample-averaged distance to the kth NN
3. Expected distance to the sample-averaged kth NN, yielding a linear relation Y = (1/m) X between Y = \log(\bar t_k) and X = \log(k)

The density of the distance to the kth NN is defined by
f_{x,k}(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P[t \le T_{x,k} \le t + \Delta t]
Distance to kth NN pdf
Trinomial distribution (k-1 observations within distance t, one in the shell (t, t + \Delta t], and n-k beyond):
f_{x,k}(t)\,\Delta t = \frac{n!}{(k-1)!\,1!\,(n-k)!}\,[f(x)V_t]^{k-1}\,[f(x)\Delta V_t]\,[1 - f(x)V_t]^{n-k}
Equivalently, in binomial form:
f_{x,k}(t) = n f(x) V_t' \binom{n-1}{k-1}\,[f(x)V_t]^{k-1}\,[1 - f(x)V_t]^{n-k}
Assumptions:
- f(x) is (locally) constant
- n is large
- f(x)V_t is small
Regression Estimator
1 1 ( 1) ( ) ( )
,
( ) ( ) [( 1) ( ) ( ) ]( )
( 1)!
mm m k n f x V m t
k x
nf x V m mt n f x V m t ef t
k
Approximate as Poisson
1
1 11
1( ) ( ) ( )
, where ( 1) ( ) ( )
( )
m
m m
k nf x V mm k c n f x V m
k k c
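This density says that U = T_k^m is Gamma(k, rate c), so E[T_k^m] = k/c. A Monte Carlo sketch checking this under our own choice of setup (uniform data on [-1,1]^2, so m = 2, f(x) = 1/4, V(2) = pi):

```python
# Monte Carlo check: under the Poisson approximation, E[T_k^m] = k / c with
# c = (n-1) f(x) V(m). Uniform points on [-1,1]^2, query point pinned at the
# origin so boundary effects are negligible for small k.
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 2000, 5, 2
c = (n - 1) * 0.25 * np.pi          # (n-1) * f(x) * V(m), f = 1/4, V(2) = pi

samples = []
for _ in range(500):
    X = rng.uniform(-1, 1, size=(n, 2))
    X[0] = 0.0                                     # one observation at the origin
    d = np.sort(np.linalg.norm(X - X[0], axis=1))  # d[0] = 0 is the self-distance
    samples.append(d[k] ** m)                      # T_k^m for this replicate

print(np.mean(samples), k / c)   # the two numbers should agree closely
```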
Expected distance to the kth NN:
E(t_{k,x}) = \int_0^{\infty} t\,f_{k,x}(t)\,dt = \frac{\Gamma(k + 1/m)}{\Gamma(k)}\,[(n-1)f(x)V(m)]^{-1/m}
Writing C_n = [(n-1)f(x)V(m)]^{-1/m} and G_{k,m} = k^{1/m}\,\Gamma(k)/\Gamma(k + 1/m), this becomes
\log(G_{k,m}) + \log(\bar t_k) = \frac{1}{m}\log(k) + \log(C_n)
Estimate m using simple linear regression of \log(\bar t_k) on \log(k).
where \bar t_k = \frac{1}{n}\sum_{i=1}^{n} t_{k,X_i} is the sample average of the kth-NN distances.
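A minimal sketch of the regression estimator (our function name; the Gamma correction G_{k,m} is omitted, so this is the uncorrected log-log fit):

```python
# Regression estimator: regress log(t_bar_k) on log(k); the fitted slope
# estimates 1/m. The G_{k,m} correction is omitted for brevity.
import numpy as np

def regression_estimate(X, k_max=20):
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    T = np.sort(D, axis=1)[:, 1:k_max + 1]      # drop the zero self-distance
    t_bar = T.mean(axis=0)                      # sample-averaged kth-NN distance
    k = np.arange(1, k_max + 1)
    slope, _ = np.polyfit(np.log(k), np.log(t_bar), 1)
    return 1.0 / slope                          # slope ~ 1/m  =>  m ~ 1/slope
```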
Ex: Swiss Roll Dataset
[Figure: regression of ln(\bar t_k) on ln(k) for the Swiss roll data; fitted line y = 0.4912x - 0.0913. The slope estimates 1/m, so 1/m ≈ 0.49 and m ≈ 2.]
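Applying the regression_estimate sketch above to a synthetic Swiss roll reproduces the behavior on this slide (exact values vary with the sample and with k_max):

```python
# Usage example: a synthetic Swiss roll with intrinsic dimension 2.
import numpy as np

rng = np.random.default_rng(4)
theta = 1.5 * np.pi * (1 + 2 * rng.random(2000))
X = np.column_stack([theta * np.cos(theta),
                     21 * rng.random(2000),
                     theta * np.sin(theta)])
print(regression_estimate(X, k_max=20))   # expect a value near 2 (slope near 0.49)
```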
Datasets
- Faces: raw dimension = 4096, Int Dim ~ 3 to 5
- Gaussian Sphere: raw dim = 3, Int Dim = 3
- Swiss Roll: raw dim = 3, Int Dim = 2
- Double Swiss Roll: raw dim = 3, Int Dim = 2
Results: Regression Estimator (K = N/100)
- Gaussian Sphere: ~3.0
- Swiss Roll: ~2.0
- Double Swiss Roll: ~2.0
- Faces: ~3.5
NN Maximum Likelihood Estimator
Derivation outline:
1. Counting process: binomial, approximated as Poisson
2. Joint counting probability and joint occurrence density
3. Log-likelihood function
4. Solve \partial L_x / \partial m = 0 for m
Maximum Likelihood Estimator
N(t,x) = number of counts within distance t of x:
N(t,x) \sim \mathrm{BIN}(n, f(x)V_t), \quad \lambda_t = n f(x) V_t
The number of counts between distances r and s is binomial; approximated as Poisson with mean \lambda_{r,s} = \lambda_s - \lambda_r:
P[N_{r,s} = n] = \frac{e^{-(\lambda_s - \lambda_r)}\,(\lambda_s - \lambda_r)^n}{n!}, \quad \text{for } 0 < r < s
Maximum Likelihood Estimator
Joint pdf of the distances to the K NN: the event that the ordered NN distances equal t_1, ..., t_K corresponds to
P[N_{t_0,t_1} = 0,\ N_{t_1,t_1+\Delta t} = 1,\ N_{t_1+\Delta t,t_2} = 0,\ \ldots,\ N_{t_K,t_K+\Delta t} = 1]
Multiplying the independent Poisson probabilities and letting \Delta t \to 0 gives, over a window [0, R],
f_{T_1,\ldots,T_K}(t_1,\ldots,t_K) = e^{-\int_0^R \lambda_t'\,dt}\,\prod_{i=1}^{K}\lambda_{t_i}'
Log-Likelihood Function
L_x = \int_0^R \ln(\lambda_t')\,dN_t - \int_0^R \lambda_t'\,dt, \quad \text{where } \lambda_t' = e^{\theta} V(m)\,m\,t^{m-1} \text{ and } e^{\theta} = n f(x)
Setting \partial L_x / \partial \theta = 0 gives N(R,x) - e^{\theta} V(m) R^m = 0.
Setting \partial L_x / \partial m = 0 gives
\left[\frac{1}{m} + \frac{V'(m)}{V(m)}\right] N(R,x) + \sum_{j=1}^{N(R,x)} \ln T_j - e^{\theta}\left[V'(m) + V(m)\ln R\right] R^m = 0
Substituting the first condition into the second:
\sum_{j=1}^{N(R)} \ln T_j + \frac{N(R)}{m} - N(R)\ln R = 0 \quad \Longrightarrow \quad \frac{N(R)}{m} = \sum_{j=1}^{N(R)} \ln\frac{R}{T_j}
Solving for m and using the kth-NN distance T_k in place of the radius R (using MLE):
\hat m_x = \left[\frac{1}{k-1}\sum_{j=1}^{k-1} \ln\frac{T_k}{T_j}\right]^{-1}
Averaging over the n observations (E. Levina & P. Bickel):
\hat m_k = \frac{1}{n}\sum_{i=1}^{n} \left[\frac{1}{k-1}\sum_{j=1}^{k-1} \ln\frac{T_k(x_i)}{T_j(x_i)}\right]^{-1}
Averaging the inverses over the n observations instead (D. MacKay & Z. Ghahramani):
\hat m_k^{-1} = \frac{1}{n(k-1)}\sum_{i=1}^{n}\sum_{j=1}^{k-1} \ln\frac{T_k(x_i)}{T_j(x_i)}
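A sketch of both averaging schemes side by side (our function name; requires k >= 2):

```python
# k-NN MLE of intrinsic dimension: Levina-Bickel averages the per-point
# estimates m_hat(x_i); the MacKay-Ghahramani revision averages the inverses
# 1/m_hat(x_i) and inverts once at the end.
import numpy as np

def mle_estimates(X, k):
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    T = np.sort(D, axis=1)[:, 1:k + 1]          # NN distances, self excluded
    logs = np.log(T[:, -1:] / T[:, :-1])        # log(T_k / T_j), j = 1..k-1
    inv_m = logs.mean(axis=1)                   # per-point 1 / m_hat(x_i)
    levina_bickel = np.mean(1.0 / inv_m)        # average the m_hat(x_i)
    mackay_ghahramani = 1.0 / np.mean(inv_m)    # average the inverses, then invert
    return levina_bickel, mackay_ghahramani
```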
Results: MLE Estimator, revised (MacKay & Ghahramani) (K = N/100)
- Gaussian Sphere: ~3.0
- Swiss Roll: ~2.0
- Double Swiss Roll: ~2.1
- Faces: ~3.5
Comparison
Input data: 3-dimensional Gaussian sphere, Int Dim = 3, 2000 observations.
[Figure: intrinsic dimension estimate (0 to 6) vs. number of neighbors K (1 to 1000, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: 3-dimensional Swiss roll, intrinsic dimension = 2, 2000 observations.
[Figure: intrinsic dimension estimate (0 to 4) vs. number of neighbors K (1 to 1000, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: 3-dimensional double Swiss roll, Int Dim = 2, 2000 observations.
[Figure: intrinsic dimension estimate (0 to 4) vs. number of neighbors K (1 to 1000, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: face data, 128 data points (128 images).
[Figure: intrinsic dimension estimate (0 to 10) vs. number of neighbors K (1 to 100, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: 3-dimensional Gaussian sphere vs. uniform cube, Int Dim = 3, 1000 observations.
[Figure: intrinsic dimension estimate (0 to 6) vs. number of neighbors K (1 to 1000, log scale) for Gaussian-distributed and uniform-distributed data.]
Comparison
Input data: 25-dimensional Gaussian, intrinsic dimension = 15; 2000/1000/500/250 observations.
[Figure: intrinsic dimension estimate (10 to 15) vs. number of neighbors K (0 to 20), with one curve each for N = 250, 500, 1000, and 2000.]
Isomap
[Figure-only slide.]
Summary
- The regression and revised MLE estimators share similar characteristics when the intrinsic dimension is small
- As the intrinsic dimension increases, the estimators become more dependent on K
- The type of distribution does not appear to be highly influential when the intrinsic dimension is small
Thank You!
Dr. Kang James, Dr. Barry James & Dr. Steve Trogdon
Example: Swiss Roll Data (Int Dim = 2)
[Figure: three plots of the sample-averaged kth-NN distance for k = 0 to 50: \bar t_k vs. k, \bar t_k^2 vs. k, and \bar t_k^3 vs. k. The \bar t_k^2 plot is approximately linear, consistent with m = 2.]
Consider random, m-dimensional data:
- f(x): density at observation x
- \bar t_k: average distance to the kth NN
Assume that f(x) is approximately constant near x. Then
k/n \approx f(x)\,V(\bar t_k), \quad \text{where } V(\bar t_k) = V(m)\,\bar t_k^{\,m} \text{ and } V(m) \text{ is a constant.}
Therefore k \propto \bar t_k^{\,m}, and a plot of \bar t_k^{\,m} vs. k should be linear.
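A quick numerical sketch of this check, with our own choice of data and linearity score: raise \bar t_k to candidate powers and see which power is closest to linear in k.

```python
# Check which power m makes t_bar_k^m approximately linear in k.
# Correlation with k is used as a crude linearity score; the data is uniform
# in 2-D, so the score should peak at m = 2.
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(size=(1000, 2))                       # intrinsic dimension 2
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
t_bar = np.sort(D, axis=1)[:, 1:51].mean(axis=0)      # t_bar_k for k = 1..50
k = np.arange(1, 51)

for m in (1, 2, 3):
    r = np.corrcoef(k, t_bar ** m)[0, 1]
    print(m, round(float(r), 5))
```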