Feature Extraction for Outlier Detection in High-Dimensional Spaces
Hoang Vu Nguyen
Vivekanand Gopalkrishnan
Motivation
• Outlier detection techniques compute distances between points in the full feature space
→ curse of dimensionality
→ solution: feature extraction
• Feature extraction techniques do not consider class imbalance
→ not suitable for asymmetric classification (and hence for outlier detection!)
Overview
• DROUT: Dimensionality Reduction/Feature Extraction for OUTlier Detection
Extract features for the detection process
To be integrated with outlier detectors
[Diagram: Training set → DROUT → Features; Testing set + Features → Detector → Outliers]
Background
• Training set:
Normal class ωm: cardinality Nm, mean vector μm, covariance matrix ∑m
Anomaly class ωa: cardinality Na, mean vector μa, covariance matrix ∑a
Nm >> Na
Total number of points: Nt = Nm + Na; total mean vector: μt = (Nm·μm + Na·μa)/Nt
∑w = (Nm/Nt)·∑m + (Na/Nt)·∑a
∑b = (Nm/Nt)·(μm − μt)(μm − μt)T + (Na/Nt)·(μa − μt)(μa − μt)T
∑t = ∑w + ∑b
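As a concrete check of these definitions, here is a minimal NumPy sketch (the function name `scatter_matrices` is illustrative, not from the paper) that computes the three scatter matrices and satisfies ∑t = ∑w + ∑b:

```python
import numpy as np

def scatter_matrices(X_m, X_a):
    """Within-, between-, and total scatter of a two-class training set.

    X_m: normal-class samples (N_m x d); X_a: anomaly samples (N_a x d).
    Follows the slide's definitions with biased (1/N) class covariances.
    """
    N_m, N_a = len(X_m), len(X_a)
    N_t = N_m + N_a
    mu_m, mu_a = X_m.mean(axis=0), X_a.mean(axis=0)
    mu_t = (N_m * mu_m + N_a * mu_a) / N_t          # total mean

    S_m = np.cov(X_m, rowvar=False, bias=True)      # class covariance of normals
    S_a = np.cov(X_a, rowvar=False, bias=True)      # class covariance of anomalies

    S_w = (N_m / N_t) * S_m + (N_a / N_t) * S_a     # within-class scatter
    d_m, d_a = mu_m - mu_t, mu_a - mu_t
    S_b = (N_m / N_t) * np.outer(d_m, d_m) + (N_a / N_t) * np.outer(d_a, d_a)
    return S_w, S_b, S_w + S_b                      # S_t = S_w + S_b
```

With biased class covariances, ∑w + ∑b equals the covariance of the pooled training set exactly, which makes the decomposition easy to verify.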
Background (cont.)
• Eigenspace of a scatter matrix ∑ (spanned by its eigenvectors) consists of 3 subspaces: principal, noise, and null
Solving the eigenvalue problem yields d eigenvalues v1 ≥ v2 ≥ … ≥ vd
Noise and null subspaces are caused by noise and, mainly, by insufficient training data
Existing methods discard the noise and null subspaces → loss of information
Jiang et al. 2008: regularize all 3 subspaces before performing feature extraction
[Figure: plot of eigenvalues over index i; principal subspace P (1..m), noise subspace N (m+1..r), null subspace Ø (r+1..d)]
DROUT Approach
• Weight-adjusted Within-Class Scatter Matrix
∑w = (Nm/Nt) . ∑m + (Na/Nt) . ∑a
Nm >> Na → ∑a is far less reliable than ∑m
Weighting ∑m and ∑a by (Nm/Nt) and (Na/Nt) lets ∑m dominate ∑w
→ when extracting features from ∑w (using PCA etc.), dimensions (eigenvectors) associated mainly with small eigenvalues of ∑m are unexpectedly removed
→ the extracted dimensions are not really relevant for the asymmetric classification task
Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009
• Solution
∑w = wm·∑m + wa·∑a with wm < wa and wm + wa = 1
→ more suitable for asymmetric classification
DROUT Approach (cont.)
• Which matrix to regularize first?
Goal: extract features that minimize the within-class and maximize the between-class variances
Within-class variances are estimated from limited training data
→ small estimated variances tend to be unstable and cause overfitting
→ proceed by regularizing the 3 subspaces of the (weight-adjusted) within-class scatter matrix first
DROUT Approach (cont.)
• Subspace decomposition
Solving the eigenvalue problem on the (weight-adjusted) ∑w yields eigenvectors {e1, e2, …, ed} with corresponding eigenvalues v1 ≥ v2 ≥ … ≥ vd
Identify m, the boundary between the principal and noise subspaces:
vmed = mediani ≤ r {vi}
vm+1 = maxi ≤ r {vi | vi < 2vmed – vr}
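The median rule above can be sketched in a few lines of NumPy (the name `split_principal_noise` is illustrative; eigenvalues are assumed already sorted in descending order, so the largest eigenvalue below the threshold is simply the first one below it):

```python
import numpy as np

def split_principal_noise(eigvals, r):
    """Locate m, the boundary between principal and noise subspaces.

    eigvals: eigenvalues sorted descending (v_1 >= ... >= v_d);
    r: rank of the scatter matrix (v_{r+1}..v_d are ~0).
    Median rule: v_{m+1} is the largest of the first r eigenvalues
    that falls below 2*v_med - v_r.
    """
    v = np.asarray(eigvals, dtype=float)
    v_med = np.median(v[:r])
    thresh = 2 * v_med - v[r - 1]            # v_r in the slide's 1-based notation
    below = np.where(v[:r] < thresh)[0]      # 0-based indices below the threshold
    return int(below[0])                     # v[below[0]] = v_{m+1}, so m = below[0]
```

For example, with eigenvalues [10, 8, 1, 0.9, 0.8] and r = 5, the median is 1 and the threshold is 2·1 − 0.8 = 1.2, so v3 = 1 is v_{m+1} and m = 2.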
DROUT Approach (cont.)
• Subspace regularization
a = v1 . vm . (m – 1)/(v1 – vm)
b = (mvm – v1)/(v1 – vm)
Regularize:
i ≤ m: xi = vi
m < i ≤ r: xi = a/(i + b)
r < i ≤ d: xi = a/(r + 1 + b)
A = [ei . wi]1 ≤ i ≤ d where wi = 1/sqrt(xi)
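A small sketch of these regularization formulas, returning the weights wi = 1/sqrt(xi) (the function name is illustrative). The model a/(i + b) is fitted so that it passes through (1, v1) and (m, vm); eigenvalues in the noise subspace are replaced by model values and the null subspace gets the constant value at i = r + 1:

```python
import numpy as np

def regularized_weights(eigvals, m, r):
    """Regularize eigenvalues x_i per the slide's three-subspace rule
    and return the feature weights w_i = 1 / sqrt(x_i)."""
    v = np.asarray(eigvals, dtype=float)
    v1, vm = v[0], v[m - 1]                  # v_1 and v_m (1-based notation)
    a = v1 * vm * (m - 1) / (v1 - vm)
    b = (m * vm - v1) / (v1 - vm)
    i = np.arange(1, len(v) + 1)             # 1-based index
    x = np.where(i <= m, v,                  # principal: keep v_i
        np.where(i <= r, a / (i + b),        # noise: model a/(i+b)
                 a / (r + 1 + b)))           # null: constant a/(r+1+b)
    return 1.0 / np.sqrt(x)
```

One can check the fit algebraically: a/(1 + b) = v1 and a/(m + b) = vm, so the model is continuous with the kept principal eigenvalues at both ends.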
[Figure: plot of the weights wi over index i across the subspaces P (1..m), N (m+1..r), and Ø (r+1..d)]
DROUT Approach (cont.)
• Subspace regularization and extraction: transform each data point p as p̃ = AT·p
Form the new (weight-adjusted) total scatter matrix from the transformed data (see Background) and solve the eigenvalue problem on it
B = matrix whose columns are the c resulting eigenvectors with largest eigenvalues
→ feature extraction is done only after regularization → limits loss of information
Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008
Transform matrix: M = A . B
DROUT Approach (cont.)
• Summary:
1. Let ∑w = wm·∑m + wa·∑a
2. Compute A from ∑w
3. Transform the training set using A
4. Compute the new total scatter matrix ∑t
5. Compute B by solving the eigenvalue problem on ∑t
6. M = A·B
7. Use M to transform the testing set
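The summary above can be sketched end-to-end in NumPy. One deliberate simplification in this sketch: instead of the three-subspace median-rule regularization, the eigenvalues of ∑w are simply floored at a small fraction of the largest one, so this is an approximation of DROUT rather than the exact algorithm (names and the floor constant are assumptions of the sketch):

```python
import numpy as np

def drout(X_m, X_a, w_m=0.1, w_a=0.9, c=None):
    """Approximate DROUT transform; returns M so that a test point p
    maps to M.T @ p (equivalently, rows of X_test map via X_test @ M)."""
    d = X_m.shape[1]
    c = c or d // 2                                   # number of extracted features
    # Step 1: weight-adjusted within-class scatter
    S_m = np.cov(X_m, rowvar=False, bias=True)
    S_a = np.cov(X_a, rowvar=False, bias=True)
    S_w = w_m * S_m + w_a * S_a
    # Step 2: regularizing matrix A = [e_i / sqrt(x_i)]
    v, E = np.linalg.eigh(S_w)
    v, E = v[::-1], E[:, ::-1]                        # eigenvalues descending
    x = np.maximum(v, 1e-3 * v[0])                    # simplified regularization (floor)
    A = E / np.sqrt(x)                                # scale each eigenvector column
    # Step 3: transform the training set
    Y = np.vstack([X_m, X_a]) @ A
    # Steps 4-5: total scatter in the transformed space; keep top-c eigenvectors
    S_t = np.cov(Y, rowvar=False, bias=True)
    _, F = np.linalg.eigh(S_t)
    B = F[:, ::-1][:, :c]
    # Step 6: final transform matrix
    return A @ B
```

The resulting M is d × c, so applying it to the testing set reduces each point to c extracted features, matching the pipeline on this slide.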
Related Work
• APCDA
Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009
Uses weight-adjusted scatter matrices for feature extraction
Discards noise and null subspaces → loss of information
• ERE
Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008
Performs regularization before feature extraction
Ignores class imbalance → not suitable for outlier detection
• ACP
David Lindgren and Per Spangeus: A novel feature extraction algorithm for asymmetric classification. IEEE Sensors Journal, 4(5):643–650, 2004
Considers neither noise/null subspaces nor class imbalance
Outlier Detection with DROUT
• Detectors: ORCA
Stephen D. Bay and Mark Schwabacher: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In KDD, pages 29–38, 2003
BSOUT
George Kollios, Dimitrios Gunopulos, Nick Koudas, and Stefan Berchtold: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans. Knowl. Data Eng., 15(5):1170–1187, 2003
Outlier Detection with DROUT (cont.)
• Datasets: KDD Cup 1999
Normal class (60593 records) vs. U2R class (246)
d = 34 (7 categorical attributes are excluded)
Training set: 1000 normal recs. vs. 50 anomalous recs.
Ann-thyroid 1
Class 3 vs. class 1
d = 21
Training set: 450 normal recs. vs. 50 anomalous recs.
Ann-thyroid 2
Class 3 vs. class 2
d = 21
Training set: 450 normal recs. vs. 50 anomalous recs.
• Parameter settings:
wm = 0.1 and wa = 0.9
Number of extracted features c ≤ d/2
Conclusion
• Summary of contributions
Explored the effect of feature extraction on outlier detection
DROUT: a novel feature-extraction framework to be integrated with outlier detectors
Results on real datasets with two detection methods are promising
• Future work
More experiments on larger datasets
Examine other approaches to dimensionality reduction