Measuring Dependency via Intrinsic Dimensionality
ICPR 2016 – December 5th 2016
Simone Romano *, Oussama Chelly, Nguyen Xuan Vinh, James Bailey, Michael E. Houle
[email protected] @ialuronico
* Currently I am an applied scientist for in London UK
Simone Romano, NII Tokyo
Outline: Motivation, Intrinsic Dimensionality Theory, Intrinsic Dimensional Dependency, Conclusions
Connection between local ID and α-Rényi dimension
Theorem 1. The α-Rényi dimension can be expressed as:

$$\dim_\alpha(X) = \frac{\int f^\alpha(x)\,\mathrm{ID}(x)\,dx}{\int f^\alpha(x)\,dx},$$

where f is the pdf of X.
Special case: for α = 1, dim(X) is the growth rate of the Shannon entropy.
According to Theorem 1, dim(X) is then the expectation of the local ID, because the denominator ∫ f(x) dx equals 1:

$$\dim(X) = \int f(x)\,\mathrm{ID}(x)\,dx.$$

This can be estimated using the local ID estimators proposed in [Amsaleg et al., 2015].
kNN estimators of dimensionality
We propose kNN estimators for the intrinsic dimensionality $\dim_\alpha$ of the variables X.
In the special case of α = 1, we can simply average the local ID over the n data points:

$$\dim(X) = \frac{1}{n}\sum_{j=1}^{n}\mathrm{ID}(x_j) = -\frac{1}{n}\sum_{j=1}^{n}\left(\frac{1}{k}\sum_{i=1}^{k}\ln\frac{d_i(x_j)}{d_k(x_j)}\right)^{-1},$$

where $d_i(x)$ is the distance of x to its i-th nearest neighbor.
We can employ the estimators for $\dim_\alpha$ to build a dependency measure between multiple variables.
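The α = 1 estimator above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it uses brute-force Euclidean distances instead of an index structure, assumes all points are distinct, and the function names `local_id` and `dim_estimate` are hypothetical.

```python
import math
import random


def local_id(index, data, k):
    """kNN (maximum-likelihood) estimate of the local ID at data[index]."""
    x = data[index]
    # Brute-force distances to every other point; assumes no duplicate points,
    # since a zero distance would make the logarithm undefined.
    dists = sorted(math.dist(x, q) for j, q in enumerate(data) if j != index)
    knn = dists[:k]
    d_k = knn[-1]  # distance to the k-th nearest neighbor
    mean_log_ratio = sum(math.log(d / d_k) for d in knn) / k
    return -1.0 / mean_log_ratio


def dim_estimate(data, k):
    """Estimate dim(X) for alpha = 1 as the average local ID over all points."""
    return sum(local_id(i, data, k) for i in range(len(data))) / len(data)


random.seed(0)
n, k = 300, 20
# Points on a straight line in the plane: a 1-dimensional manifold.
line = [(t, t) for t in (random.random() for _ in range(n))]
# Points spread uniformly over the unit square: 2-dimensional data.
square = [(random.random(), random.random()) for _ in range(n)]

dim_line = dim_estimate(line, k)
dim_square = dim_estimate(square, k)
print(f"line: {dim_line:.2f}  square: {dim_square:.2f}")
```

With a finite sample the estimates are biased, so the line and the square should score near 1 and 2 respectively rather than exactly at those values.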
Building a dependency measure: choice of α
Bigger α makes the estimation of dimensionality less sensitive to noise.
[Figure: left, scatter plot of X1 against X2; right, estimated dimensionality dim_α(X1; X2) plotted against α for α from 1 to 3.]
Figure: With α ≈ 3 the estimated dimensionality is too small: it is indeed 1, as if the figure depicted a perfect 1-dimensional manifold.
When building a dependency measure, we want to be sensitive to noise.
Therefore we use dim(X) with α = 1 to build a dependency measure because:
- it is sensitive to noise;
- it has a simple estimator;
- we can use properties of the Shannon entropy.
Intrinsic Dimensional Dependency
The Intrinsic Dimensional Dependency (IDD) for X = (X_1, ..., X_D):

$$\mathrm{IDD}(X) \triangleq \frac{\sum_{i=1}^{D}\dim(X_i) - \dim(X)}{\sum_{i=1}^{D}\dim(X_i) - \max_i \dim(X_i)}.$$

Properties
1. 0 ≤ IDD(X) ≤ 1;
2. IDD(X) = 0 iff all X_i are independent;
3. IDD(X) = 1 if there exist one or more manifolds of dimension 1 whose union embeds X.
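As a quick sanity check of the definition, consider an idealised case with D = 2 in which each marginal has dimension exactly 1. These values are illustrative assumptions, not estimates from data. If the two variables are independent and fill the plane, dim(X) = 2; if X lies on a single curve, dim(X) = 1:

$$\mathrm{IDD}(X) = \frac{(1 + 1) - 2}{(1 + 1) - 1} = 0 \quad \text{(independent)}, \qquad \mathrm{IDD}(X) = \frac{(1 + 1) - 1}{(1 + 1) - 1} = 1 \quad \text{(1-dimensional manifold)}.$$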
Algorithm to compute IDD
IDD is computed using a single parameter k (the number of nearest neighbors).
We use the copula transformation to make it invariant to the marginals.
We use KD-trees to speed up computations.
IDD(X, k)
1. Copula transform X
2. X = X + ε, where ε is Gaussian noise of magnitude 10^-6
3. Build KD-trees for X and each X_i
4. Compute dim(X) and dim(X_i)
5. return IDD(X)
Average computational complexity: O(Dn log n + nk log n)
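The steps of the algorithm can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the copula transform is taken to be a rank transform scaled to (0, 1], brute-force neighbor search stands in for the KD-trees (so the complexity above does not apply), and the names `copula_transform`, `dim_estimate` and `idd` are hypothetical.

```python
import math
import random


def copula_transform(column):
    """Replace each value by its rank divided by n (assumed form of the transform)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank / len(column)
    return ranks


def dim_estimate(points, k):
    """Average of the kNN local ID estimates (alpha = 1) over all points."""
    total = 0.0
    for i, p in enumerate(points):
        # Brute force stands in for the KD-tree query of step 3.
        knn = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        total += -k / sum(math.log(d / knn[-1]) for d in knn)
    return total / len(points)


def idd(columns, k, noise=1e-6, seed=0):
    """IDD of D variables, given as a list of D equal-length columns."""
    rng = random.Random(seed)
    # Steps 1-2: copula transform, then tiny Gaussian noise to break ties.
    cols = [[u + rng.gauss(0.0, noise) for u in copula_transform(c)]
            for c in columns]
    # Step 4: dimensionality of each marginal and of the joint data.
    marginal = [dim_estimate([(v,) for v in c], k) for c in cols]
    joint = dim_estimate(list(zip(*cols)), k)
    # Step 5: normalised gap between summed marginal and joint dimensionality.
    return (sum(marginal) - joint) / (sum(marginal) - max(marginal))


rng = random.Random(1)
n, k = 400, 20
x = [rng.random() for _ in range(n)]
y = [rng.random() for _ in range(n)]

idd_identical = idd([x, x], k)    # two identical variables
idd_independent = idd([x, y], k)  # two independent variables
print(f"identical: {idd_identical:.2f}  independent: {idd_independent:.2f}")
```

Identical variables should score close to 1. Because the kNN estimates are biased on finite samples, the independent pair will score well below the identical pair but not exactly 0 in this sketch.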
Choice of k
Caveat: if k is chosen too small, no relationship will be identified even if there exists only a small amount of noise.
[Figure: scatter plots of two relationships, (X1; Y1) and (X2; Y2), and their scores IDD(X1; Y1) and IDD(X2; Y2) plotted against k for k from 0 to 400.]
With small k the blue relationship gets scored ≈ 0.
In this work we chose k ≈ n/4.
Synthetic relationships
IDD behaves well on synthetic relationships:
- Rel. A: all variables are identical;
- Rel. B: there are multiple 1-dimensional manifolds;
- Rel. C: there is a functional relationship between one variable and the remaining variables;
- Rel. D: all variables are independent.
[Figure: four panels (Relationship A, B, C and D) plotting the scores of IDD, UDS and MAC, on a scale from 0 to 1, against the number of variables D from 2 to 5.]
Real data sets
Data set 1: identification of correlated traffic sensors in Melbourne.
Due to the nature of traffic flow, the top 100 pairs identified by the dependency measure should consist of sensors that are geographically close (in km).

Measure         IDD         MID         MIC         MAC         UDS
Distance (km)   6.6 ± 5.5   7.1 ± 5.0   7.1 ± 5.5   7.4 ± 5.6   7.5 ± 5.4

Data set 2: identify the building whose energy consumption is most dependent on the outdoor temperature at the University of Melbourne.
[Figure: four scatter plots against Max Temperature (0 to 60), one per top-ranked building: Parkville Building (top for IDD and UDS), Baillieu Library (top for MID), ICT Building (top for MIC), Sydney Myer Building (top for MAC).]
IDD makes it possible to identify buildings whose cooling system has two different functioning regimes.
Conclusion - Summary
We discussed Intrinsic Dimensional Dependency (IDD), a dependency measure between multiple variables to identify manifold dependencies.
To achieve this goal:
- we identified the connection between α-Rényi dimension and local intrinsic dimensionality;
- we proposed novel global estimators of dimensionality.
References I
Amsaleg, L., Chelly, O., Furon, T., Girard, S., Houle, M. E., Kawarabayashi, K.-i., and Nett, M. (2015). Estimating local intrinsic dimensionality. In SIGKDD, pages 29–38. ACM.
Houle, M. E. (2013). Dimensionality, discriminability, density and distance distributions. In Data Mining Workshops (ICDMW).
Nguyen, H.-V., Mandros, P., and Vreeken, J. (2016). Universal dependency analysis. In SDM.
Nguyen, H. V., Müller, E., Vreeken, J., Efros, P., and Böhm, K. (2014). Multivariate maximal correlation analysis. In ICML, pages 775–783.
Rényi, A. (1961). On measures of entropy and information.
Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M., and Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science.
References II
Sugiyama, M. and Borgwardt, K. M. (2013). Measuring statistical dependence via the mutual information dimension. In IJCAI.
von Brünken, J., Houle, M. E., and Zimek, A. (2015). Intrinsic dimensional outlier detection in high-dimensional data.