Estimating Intrinsic Dimension
Justin Eberhardt
UMD, Mathematics and Statistics
Advisor: Dr. Kang James
Jan 14, 2016
Outline
- Introduction
- Nearest-Neighbor Estimators
  - Regression Estimator
  - Maximum Likelihood Estimator
  - Revised Maximum Likelihood Estimator
- Comparison
- Summary
Intrinsic Dimension
Definition:
- The least number of parameters required to generate a dataset
- The minimum number of dimensions that describes a dataset without significant loss of features
Ex 1: Intrinsic Dimension
[Figure: a 2-D sheet rolled up in 3-D space (axes x, y, z) is flattened (unrolled) into the x-y plane. Int Dim = 2]
Ex 2: Intrinsic Dimension
Each image is 28 x 28 pixels, so one image is a 784-dimensional observation.
Ex 2: Intrinsic Dimension
Int Dim = 2
[Figure: 2-D Isomap embedding of the image set; the images vary smoothly between "top & bottom loop" and "no loop" shapes. Isomap Project, J. Tenenbaum & J. Langford, Stanford]
Applications
- Biometrics: facial recognition, fingerprints, iris
- Genetics
Why do we need to reduce dimensionality?
- Low-dimensional datasets are more efficient to store and process
- Very high-dimensional matrices strain even supercomputers
- Data in 1, 2, and 3 dimensions can be visualized
Ex: Facial Recognition in MN
- 5 million people
- 2 images per person (front and profile)
- 1028 x 1028 pixels per image (~1 megapixel)
Total memory required:
- n = 5,000,000
- p = (2)(1028)(1028) ≈ 2.11 million dimensions
- Matrix size: (5 x 10^6)(2.11 x 10^6) ≈ 10^13 cells (about 10 trillion)
- Memory at 2 bytes per cell: 2(10^13) = 2 x 10^13 bytes ≈ 20 terabytes
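The arithmetic above can be reproduced in a few lines; a minimal Python sketch, assuming 2 bytes per matrix cell as on the slide:

```python
# Back-of-envelope storage estimate for the Minnesota face-database example.
# Assumes 2 bytes per matrix cell, as on the slide; figures are illustrative.
n = 5_000_000                     # people (one row per person)
p = 2 * 1028 * 1028               # two 1028x1028 images -> 2,113,568 columns
cells = n * p                     # entries in the n x p data matrix
bytes_total = 2 * cells           # 2 bytes per cell

print(f"p = {p:,} dimensions")                    # 2,113,568
print(f"cells = {cells:.2e}")                     # ~1.06e13
print(f"memory = {bytes_total / 1e12:.1f} TB")    # ~21 TB (the slide rounds to ~20 TB)
```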
Intrinsic Dimension Estimators
Objective:
To find a simple formula that uses nearest neighbor (NN) information to quickly estimate intrinsic dimension
Intrinsic Dimension Estimators
Project Description:
Through simulation, we will compare the effectiveness of three proposed NN intrinsic dimension estimators.
Intrinsic Dimension Estimators
Note:
Traditional methods for estimating intrinsic dimension, such as PCA, fail on non-linear manifolds.
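To see the failure concretely, here is a minimal sketch (synthetic Swiss roll data and numpy only; the construction is ours, not from the slides): PCA reports substantial variance along all three raw axes even though the data is intrinsically 2-dimensional.

```python
# PCA on a Swiss roll: the linear method sees three significant directions,
# although the intrinsic dimension is 2.
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5 * np.pi * (1 + 2 * rng.random(2000))        # position along the roll
X = np.column_stack([theta * np.cos(theta),             # rolled-up 2-D sheet in 3-D
                     21 * rng.random(2000),             # height along the roll axis
                     theta * np.sin(theta)])

Xc = X - X.mean(axis=0)
var = np.linalg.svd(Xc, compute_uv=False) ** 2          # PCA eigenvalues via SVD
print(var / var.sum())   # all three proportions are non-negligible
```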
Intrinsic Dimension Estimators: Nearest-Neighbor Methods
- Regression Estimator (K. Pettis, T. Bailey, A. Jain & R. Dubes, 1979)
- Maximum Likelihood Estimator (E. Levina & P. Bickel, 2005)
- Revised Maximum Likelihood Estimator (D. MacKay & Z. Ghahramani, 2005)
Distance Matrix
D[i,j]: Euclidean distance from x_i to x_j (e.g., d_{2,3} is the distance from x_2 to x_3)

       1        2        3        ...  N
  1    0        d_{1,2}  d_{1,3}  ...  d_{1,n}
  2    d_{2,1}  0        d_{2,3}  ...  d_{2,n}
  3    d_{3,1}  d_{3,2}  0        ...  d_{3,n}
  ...
  N    d_{n,1}  d_{n,2}  d_{n,3}  ...  0
Nearest Neighbor Matrix
T[i,k]: Euclidean distance between x_i and the kth NN to x_i (e.g., t_{2,k} is the distance between x_2 and the kth NN to x_2)

       1   2        3        ...  N
  1    0   t_{1,2}  t_{1,3}  ...  t_{1,n}
  2    0   t_{2,2}  t_{2,3}  ...  t_{2,n}
  3    0   t_{3,2}  t_{3,3}  ...  t_{3,n}
  ...
  N    0   t_{n,2}  t_{n,3}  ...  t_{n,n}
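A minimal numpy sketch of how these two matrices can be built (the function name and toy data are ours):

```python
# Build the distance matrix D and the row-sorted NN matrix T for a dataset X.
# As in the slide, column 0 of T is zero: each point's sorted distances start
# with the distance to itself.
import numpy as np

def distance_and_nn_matrices(X):
    diff = X[:, None, :] - X[None, :, :]        # pairwise difference vectors
    D = np.sqrt((diff ** 2).sum(axis=-1))       # D[i, j] = ||x_i - x_j||
    T = np.sort(D, axis=1)                      # T[i, k] = dist from x_i to kth NN
    return D, T

X = np.random.default_rng(1).normal(size=(12, 2))
D, T = distance_and_nn_matrices(X)
print(T[:, 0])   # all zeros, matching the zero first column above
```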
Notation
- m: intrinsic dimension
- p: dimension of the raw dataset
- n: number of observations
- f(x): density (pdf) at observation x
- T_{x,k} or T_k: distance from observation x to its kth NN
- N(t,x): number of observations within distance t of observation x
[Figure: notation illustrated on a curve (m = 1) embedded in the plane (p = 2) with N = 12 points; a circle of radius t around x contains N(t,x) = 3 points, with t_1, t_2, t_3 the distances to the three nearest neighbors.]
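The counting function N(t, x) from the figure is equally short; a sketch (the helper name is ours):

```python
# N(t, x_i): the number of observations within distance t of x_i,
# excluding x_i itself.
import numpy as np

def neighbor_count(X, i, t):
    d = np.linalg.norm(X - X[i], axis=1)   # distances from x_i to every point
    return int((d <= t).sum()) - 1         # subtract the zero self-distance

X = np.random.default_rng(2).normal(size=(12, 2))
print(neighbor_count(X, i=0, t=1.0))
```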
NN Regression Estimator
Derivation outline:
1. Density of the distance to the kth NN (single observation, approximated as Poisson)
2a. Expected distance to the kth NN (single observation)
2b. Sample-averaged distance to the kth NN
3. Expected distance to the sample-averaged kth NN, yielding a linear relation Y = (1/m) X between Y = \log(\bar t_k) and X = \log(k)

The density of the distance to the kth NN is defined by
f_{x,k}(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P[t \le T_{x,k} \le t + \Delta t]
Distance to kth NN pdf
Trinomial distribution (k-1 observations within distance t, one in the shell (t, t + \Delta t], and n-k beyond):
f_{x,k}(t)\,\Delta t = \frac{n!}{(k-1)!\,1!\,(n-k)!}\,[f(x)V_t]^{k-1}\,[f(x)\Delta V_t]\,[1 - f(x)V_t]^{n-k}
Equivalently, in binomial form:
f_{x,k}(t) = n f(x) V_t' \binom{n-1}{k-1}\,[f(x)V_t]^{k-1}\,[1 - f(x)V_t]^{n-k}
Assumptions:
- f(x) is (locally) constant
- n is large
- f(x)V_t is small
Regression Estimator
1 1 ( 1) ( ) ( )
,
( ) ( ) [( 1) ( ) ( ) ]( )
( 1)!
mm m k n f x V m t
k x
nf x V m mt n f x V m t ef t
k
Approximate as Poisson
1
1 11
1( ) ( ) ( )
, where ( 1) ( ) ( )
( )
m
m m
k nf x V mm k c n f x V m
k k c
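This density says that U = T_k^m is Gamma(k, rate c), so E[T_k^m] = k/c. A Monte Carlo sketch checking this under our own choice of setup (uniform data on [-1,1]^2, so m = 2, f(x) = 1/4, V(2) = pi):

```python
# Monte Carlo check: under the Poisson approximation, E[T_k^m] = k / c with
# c = (n-1) f(x) V(m). Uniform points on [-1,1]^2, query point pinned at the
# origin so boundary effects are negligible for small k.
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 2000, 5, 2
c = (n - 1) * 0.25 * np.pi          # (n-1) * f(x) * V(m), f = 1/4, V(2) = pi

samples = []
for _ in range(500):
    X = rng.uniform(-1, 1, size=(n, 2))
    X[0] = 0.0                                     # one observation at the origin
    d = np.sort(np.linalg.norm(X - X[0], axis=1))  # d[0] = 0 is the self-distance
    samples.append(d[k] ** m)                      # T_k^m for this replicate

print(np.mean(samples), k / c)   # the two numbers should agree closely
```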
Expected distance to the kth NN:
E(t_{k,x}) = \int_0^{\infty} t\,f_{k,x}(t)\,dt = \frac{\Gamma(k + 1/m)}{\Gamma(k)}\,[(n-1)f(x)V(m)]^{-1/m}
Writing C_n = [(n-1)f(x)V(m)]^{-1/m} and G_{k,m} = k^{1/m}\,\Gamma(k)/\Gamma(k + 1/m), this becomes
\log(G_{k,m}) + \log(\bar t_k) = \frac{1}{m}\log(k) + \log(C_n)
Estimate m using simple linear regression of \log(\bar t_k) on \log(k).
where \bar t_k = \frac{1}{n}\sum_{i=1}^{n} t_{k,X_i} is the sample average of the kth-NN distances.
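A minimal sketch of the regression estimator (our function name; the Gamma correction G_{k,m} is omitted, so this is the uncorrected log-log fit):

```python
# Regression estimator: regress log(t_bar_k) on log(k); the fitted slope
# estimates 1/m. The G_{k,m} correction is omitted for brevity.
import numpy as np

def regression_estimate(X, k_max=20):
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    T = np.sort(D, axis=1)[:, 1:k_max + 1]      # drop the zero self-distance
    t_bar = T.mean(axis=0)                      # sample-averaged kth-NN distance
    k = np.arange(1, k_max + 1)
    slope, _ = np.polyfit(np.log(k), np.log(t_bar), 1)
    return 1.0 / slope                          # slope ~ 1/m  =>  m ~ 1/slope
```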
Ex: Swiss Roll Dataset
[Figure: regression of ln(\bar t_k) on ln(k) for the Swiss roll data; fitted line y = 0.4912x - 0.0913. The slope estimates 1/m, so 1/m ≈ 0.49 and m ≈ 2.]
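Applying the regression_estimate sketch above to a synthetic Swiss roll reproduces the behavior on this slide (exact values vary with the sample and with k_max):

```python
# Usage example: a synthetic Swiss roll with intrinsic dimension 2.
import numpy as np

rng = np.random.default_rng(4)
theta = 1.5 * np.pi * (1 + 2 * rng.random(2000))
X = np.column_stack([theta * np.cos(theta),
                     21 * rng.random(2000),
                     theta * np.sin(theta)])
print(regression_estimate(X, k_max=20))   # expect a value near 2 (slope near 0.49)
```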
Datasets
- Faces: raw dimension = 4096, Int Dim ~ 3 to 5
- Gaussian Sphere: raw dim = 3, Int Dim = 3
- Swiss Roll: raw dim = 3, Int Dim = 2
- Double Swiss Roll: raw dim = 3, Int Dim = 2
Results: Regression Estimator (K = N/100)
- Gaussian Sphere: ~3.0
- Swiss Roll: ~2.0
- Double Swiss Roll: ~2.0
- Faces: ~3.5
NN Maximum Likelihood Estimator
Derivation outline:
1. Counting process: binomial, approximated as Poisson
2. Joint counting probability and joint occurrence density
3. Log-likelihood function
4. Solve \partial L_x / \partial m = 0 for m
Maximum Likelihood Estimator
N(t,x) = number of counts within distance t of x:
N(t,x) \sim \mathrm{BIN}(n, f(x)V_t), \quad \lambda_t = n f(x) V_t
The number of counts between distances r and s is binomial; approximated as Poisson with mean \lambda_{r,s} = \lambda_s - \lambda_r:
P[N_{r,s} = n] = \frac{e^{-(\lambda_s - \lambda_r)}\,(\lambda_s - \lambda_r)^n}{n!}, \quad \text{for } 0 < r < s
Maximum Likelihood Estimator
Joint pdf of the distances to the K NN: the event that the ordered NN distances equal t_1, ..., t_K corresponds to
P[N_{t_0,t_1} = 0,\ N_{t_1,t_1+\Delta t} = 1,\ N_{t_1+\Delta t,t_2} = 0,\ \ldots,\ N_{t_K,t_K+\Delta t} = 1]
Multiplying the independent Poisson probabilities and letting \Delta t \to 0 gives, over a window [0, R],
f_{T_1,\ldots,T_K}(t_1,\ldots,t_K) = e^{-\int_0^R \lambda_t'\,dt}\,\prod_{i=1}^{K}\lambda_{t_i}'
Log-Likelihood Function
L_x = \int_0^R \ln(\lambda_t')\,dN_t - \int_0^R \lambda_t'\,dt, \quad \text{where } \lambda_t' = e^{\theta} V(m)\,m\,t^{m-1} \text{ and } e^{\theta} = n f(x)
Setting \partial L_x / \partial \theta = 0 gives N(R,x) - e^{\theta} V(m) R^m = 0.
Setting \partial L_x / \partial m = 0 gives
\left[\frac{1}{m} + \frac{V'(m)}{V(m)}\right] N(R,x) + \sum_{j=1}^{N(R,x)} \ln T_j - e^{\theta}\left[V'(m) + V(m)\ln R\right] R^m = 0
Substituting the first condition into the second:
\sum_{j=1}^{N(R)} \ln T_j + \frac{N(R)}{m} - N(R)\ln R = 0 \quad \Longrightarrow \quad \frac{N(R)}{m} = \sum_{j=1}^{N(R)} \ln\frac{R}{T_j}
Solving for m and using the kth-NN distance T_k in place of the radius R (using MLE):
\hat m_x = \left[\frac{1}{k-1}\sum_{j=1}^{k-1} \ln\frac{T_k}{T_j}\right]^{-1}
Averaging over the n observations (E. Levina & P. Bickel):
\hat m_k = \frac{1}{n}\sum_{i=1}^{n} \left[\frac{1}{k-1}\sum_{j=1}^{k-1} \ln\frac{T_k(x_i)}{T_j(x_i)}\right]^{-1}
Averaging the inverses over the n observations instead (D. MacKay & Z. Ghahramani):
\hat m_k^{-1} = \frac{1}{n(k-1)}\sum_{i=1}^{n}\sum_{j=1}^{k-1} \ln\frac{T_k(x_i)}{T_j(x_i)}
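A sketch of both averaging schemes side by side (our function name; requires k >= 2):

```python
# k-NN MLE of intrinsic dimension: Levina-Bickel averages the per-point
# estimates m_hat(x_i); the MacKay-Ghahramani revision averages the inverses
# 1/m_hat(x_i) and inverts once at the end.
import numpy as np

def mle_estimates(X, k):
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    T = np.sort(D, axis=1)[:, 1:k + 1]          # NN distances, self excluded
    logs = np.log(T[:, -1:] / T[:, :-1])        # log(T_k / T_j), j = 1..k-1
    inv_m = logs.mean(axis=1)                   # per-point 1 / m_hat(x_i)
    levina_bickel = np.mean(1.0 / inv_m)        # average the m_hat(x_i)
    mackay_ghahramani = 1.0 / np.mean(inv_m)    # average the inverses, then invert
    return levina_bickel, mackay_ghahramani
```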
Results: MLE Estimator, revised (MacKay & Ghahramani) (K = N/100)
- Gaussian Sphere: ~3.0
- Swiss Roll: ~2.0
- Double Swiss Roll: ~2.1
- Faces: ~3.5
Comparison
Input data: 3-dimensional Gaussian sphere, Int Dim = 3, 2000 observations.
[Figure: intrinsic dimension estimate (0 to 6) vs. number of neighbors K (1 to 1000, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: 3-dimensional Swiss roll, intrinsic dimension = 2, 2000 observations.
[Figure: intrinsic dimension estimate (0 to 4) vs. number of neighbors K (1 to 1000, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: 3-dimensional double Swiss roll, Int Dim = 2, 2000 observations.
[Figure: intrinsic dimension estimate (0 to 4) vs. number of neighbors K (1 to 1000, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: face data, 128 data points (128 images).
[Figure: intrinsic dimension estimate (0 to 10) vs. number of neighbors K (1 to 100, log scale) for NN MLE, Rev NN MLE, and NN REG.]
Comparison
Input data: 3-dimensional Gaussian sphere vs. uniform cube, Int Dim = 3, 1000 observations.
[Figure: intrinsic dimension estimate (0 to 6) vs. number of neighbors K (1 to 1000, log scale) for Gaussian-distributed and uniform-distributed data.]
Comparison
Input data: 25-dimensional Gaussian, intrinsic dimension = 15; 2000/1000/500/250 observations.
[Figure: intrinsic dimension estimate (10 to 15) vs. number of neighbors K (0 to 20), with one curve each for N = 250, 500, 1000, and 2000.]
Isomap
[Figure-only slide.]
Summary
- The regression and revised MLE estimators share similar characteristics when the intrinsic dimension is small
- As the intrinsic dimension increases, the estimators become more dependent on K
- The type of distribution does not appear to be highly influential when the intrinsic dimension is small
Thank You!
Dr. Kang James, Dr. Barry James & Dr. Steve Trogdon
Example: Swiss Roll Data (Int Dim = 2)
[Figure: three plots of the sample-averaged kth-NN distance for k = 0 to 50: \bar t_k vs. k, \bar t_k^2 vs. k, and \bar t_k^3 vs. k. The \bar t_k^2 plot is approximately linear, consistent with m = 2.]
Consider random, m-dimensional data:
- f(x): density at observation x
- \bar t_k: average distance to the kth NN
Assume that f(x) is approximately constant near x. Then
k/n \approx f(x)\,V(\bar t_k), \quad \text{where } V(\bar t_k) = V(m)\,\bar t_k^{\,m} \text{ and } V(m) \text{ is a constant.}
Therefore k \propto \bar t_k^{\,m}, and a plot of \bar t_k^{\,m} vs. k should be linear.
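A quick numerical sketch of this check, with our own choice of data and linearity score: raise \bar t_k to candidate powers and see which power is closest to linear in k.

```python
# Check which power m makes t_bar_k^m approximately linear in k.
# Correlation with k is used as a crude linearity score; the data is uniform
# in 2-D, so the score should peak at m = 2.
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(size=(1000, 2))                       # intrinsic dimension 2
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
t_bar = np.sort(D, axis=1)[:, 1:51].mean(axis=0)      # t_bar_k for k = 1..50
k = np.arange(1, 51)

for m in (1, 2, 3):
    r = np.corrcoef(k, t_bar ** m)[0, 1]
    print(m, round(float(r), 5))
```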