Unsupervised Anomaly Detection for High Dimensional Data Dr. Thayasivam, Umashanger Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Dr. Thayasivam, Umashanger Unsupervised Anomaly Detection for High Dimensional Data
Unsupervised Anomaly Detection for High Dimensional Data
Dr. Thayasivam, Umashanger
Department of Mathematics, Rowan University
July 19th, 2013
International Workshop in Sequential Methodologies (IWSM-2013)
Outline of Talk
I Motivation : Biometrics
I SVM (Supervised Learning) Approach
I Unsupervised L2E Estimation Approach
I Experimental Results
I Concluding Remarks
Introduction
We are drowning in the deluge of data that are being collected world-wide, while starving for knowledge at the same time. Anomalous events occur relatively infrequently. However, when they do occur, their consequences can be quite dramatic, and quite often in a negative sense.
* - J. Naisbitt, Megatrends: Ten New Directions Transforming Our Lives. New York: Warner Books, 1982.
Need for Accurate Speaker Recognition
I Method of recognizing a person based on his or her voice
I One of the forms of biometric identification
I Need for accurate and scalable speaker recognition: VoIP applications
I Applications in diverse areas: telephone, internet banking, online trading, forensics
I Corporate and government sectors security enforcement
What is intrusion detection?
I Intrusions are activities that violate the security policy of a system.
I Intrusion detection is the process used to identify malicious behavior that targets a network and its resources
Intrusion Detection System
I Intrusion Detection Systems (IDSs) play a key role as a defense mechanism against malicious attacks in network security.
I They monitor traffic between users and networks for abnormal activity.
I They analyze patterns/signatures based on data packets.
Intrusion Detection Techniques
I Misuse intrusion detection: intrusion signatures
I Statistical/anomaly intrusion detection
Misuse intrusion detection
I Catch the intrusions in terms of the characteristics of known attacks or system vulnerabilities
I Built with knowledge of bad behaviors
I Collection of signatures: signature analysis
I Examine event stream for signature match: pattern matching
I Cannot detect novel or unknown attacks
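The pattern-matching idea above can be sketched as follows. This is a minimal illustration, assuming each event stream is a list of system-call names and each signature is a contiguous subsequence of a known attack. The signature set and the call names are hypothetical and chosen only for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical signatures: contiguous system-call runs taken from known attacks.
SIGNATURES: List[Tuple[str, ...]] = [
    ("open", "chmod", "exec"),
    ("socket", "bind", "exec"),
]


def contains(stream: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` occurs as a contiguous run inside `stream`."""
    n, k = len(stream), len(pattern)
    return any(tuple(stream[i:i + k]) == tuple(pattern) for i in range(n - k + 1))


def misuse_detect(stream: Sequence[str]) -> bool:
    """Flag the stream if it matches any known signature."""
    return any(contains(stream, sig) for sig in SIGNATURES)


known_attack = ["read", "open", "chmod", "exec", "close"]
novel_attack = ["read", "mmap", "ptrace", "write"]  # not covered by any signature

print(misuse_detect(known_attack), misuse_detect(novel_attack))
```

The second stream goes unflagged because no stored signature describes it, which is the limitation stated in the last bullet.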
Anomaly detection?
I An anomaly is a pattern in the data that does not conform to the expected behavior
I Also referred to as outliers, exceptions, peculiarities, surprises, etc.
I Detect any action that significantly deviates from the normal behavior
I Built with knowledge of normal behaviors
I Examine event stream for deviations from normal
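As a rough illustration of "built with knowledge of normal behaviors", the sketch below summarizes normal measurements by their mean and standard deviation and flags new values that deviate by more than a threshold. The z-score rule, the threshold of 3, and the traffic numbers are assumptions made for the example, not the method used in this talk.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_normal_profile(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Summarize normal behavior by its sample mean and standard deviation."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def flag_deviations(values: Sequence[float], mean: float, std: float,
                    threshold: float = 3.0) -> List[bool]:
    """Mark each value whose distance from the mean exceeds `threshold` std devs."""
    return [abs(v - mean) / std > threshold for v in values]


# Hypothetical measurements observed under normal conditions.
normal_traffic = [98.0, 101.0, 100.0, 99.0, 102.0, 100.0]
mean, std = fit_normal_profile(normal_traffic)

# New events: one close to the normal profile, one far from it.
flags = flag_deviations([100.5, 140.0], mean, std)
print(flags)
```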
Applications of Anomaly detection
I Network intrusion detection
I Insurance / Credit card fraud detection
I Healthcare Informatics / Medical diagnostics
I Industrial Damage Detection
I Image Processing / Video surveillance
I Novel Topic Detection in Text Mining
Real-World Anomalies
Figure: Real-world anomalies
Key Challenges
I Defining a representative normal region is challenging
I The boundary between normal and outlying behavior is often not precise
I The exact notion of an outlier is different for different application domains
I Availability of labeled data for training/validation
I Data are extremely large and noisy, and can be complex
I Normal behavior keeps evolving
I Fast and accurate real-time detection
Novelty detection
I Identification of new or unknown data or signals that a machine learning system is not aware of during training.
I A fundamental requirement of a good classification or identification system
I Abnormalities are very rare, or there may be no data that describe the faulty conditions
Techniques/approaches to detect anomalies
I Supervised - The data (observations, measurements, etc.) are labeled with pre-defined classes.
I Unsupervised - Class labels of the data are unknown
I Given a set of data, the task is to establish the existence of classes or clusters in the data
Support Vector Machine (SVM)
I A popular supervised anomaly detection technique
I SVMs are linear classifiers that find a hyperplane to separate two classes of data, positive and negative
I The common features in the normal and adversary groups need to be learned and differentiated
I After discovering the key characteristics of network traffic patterns, a decision boundary is superimposed in the space of feature representations.
SVM for Network Traffic Classification
I Effectively understand the patterns of network traffic and detect measurements deemed untrustworthy from malicious targets
I Eliminates the need for arbitrary assumptions about the underlying network topology and parameters or thresholds in favor of direct training data.
I Discover key characteristics of network traffic patterns by superimposing a boundary in the space of measurements.
SVM Framework
I Cast the problem of detecting malicious nodes in an SVM classification framework
I Labeled training examples: $(\vec{x}_i, y_i)$, where $\vec{x}_i$ is the representation of the $i$-th example in the feature space and $y_i \in \{1, -1\}$ is the corresponding label
I Decision boundary function: $y(\vec{x}) = \vec{w} \cdot \vec{x} + w_0$, where $\vec{w}$ is the weight vector and $w_0$ is the bias.
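The decision boundary function above can be evaluated directly. The sketch below assumes a two-dimensional feature vector. The weight values and the convention of mapping a value of exactly zero to the positive class are illustrative choices, not taken from the talk.

```python
from typing import Sequence


def decision_value(w: Sequence[float], x: Sequence[float], w0: float) -> float:
    """Compute y(x) = w . x + w0 for weight vector w and bias w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def predict_label(w: Sequence[float], x: Sequence[float], w0: float) -> int:
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if decision_value(w, x, w0) >= 0 else -1


# Hypothetical weights and bias for a 2-D feature space.
w, w0 = [2.0, -1.0], -0.5

print(predict_label(w, [1.0, 0.5], w0))  # lies on the positive side
print(predict_label(w, [0.0, 1.0], w0))  # lies on the negative side
```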
SVM Framework
I Network Traffic Features: $\vec{x}$
I Optimization Function: $\vec{w}$ and $w_0$
I Prediction of Training Set Label: $\vec{w} \cdot \vec{x} + w_0$
SVM Optimization Problem
I
$$\min \; \frac{1}{2}\|\vec{W}\|^2 + \gamma \sum_{i=1}^{N} \varepsilon_i \quad \text{subject to} \quad y_i\,\bigl(\vec{W} \cdot \Phi(\vec{x}_i) + W_0\bigr) \ge 1 - \varepsilon_i, \;\; \forall i$$
I where $N$: number of training examples.
I $\varepsilon_i$: collection of non-negative slack variables that account for possible misclassifications.
I $\gamma$: trade-off factor between the slack variables and the regularization on the norm of the weight vector $\vec{W}$.
I The constraint in this minimization implies that we want our predictions, $\vec{W} \cdot \Phi(\vec{x}_i) + W_0$, to agree with the labels $y_i$.
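To make the roles of the slack variables and of $\gamma$ concrete, the sketch below evaluates the objective for a given $(\vec{W}, W_0)$. It assumes a linear feature map, $\Phi(\vec{x}) = \vec{x}$, and takes each slack at its smallest feasible value, $\varepsilon_i = \max(0, 1 - y_i(\vec{W} \cdot \vec{x}_i + W_0))$. The data points are hypothetical.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def slack(w: Sequence[float], w0: float, x: Sequence[float], y: int) -> float:
    """Smallest non-negative slack satisfying y * (w . x + w0) >= 1 - slack."""
    return max(0.0, 1.0 - y * (dot(w, x) + w0))


def primal_objective(w, w0, X, Y, gamma: float) -> float:
    """0.5 * ||w||^2 + gamma * (sum of slacks), with Phi(x) = x (linear kernel)."""
    return 0.5 * dot(w, w) + gamma * sum(slack(w, w0, x, y) for x, y in zip(X, Y))


# Hypothetical training set: the third point sits on the decision boundary,
# so it needs a slack of 1 to satisfy its constraint.
X = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
Y = [1, -1, -1]
w, w0 = [1.0, 0.0], -1.0

print(primal_objective(w, w0, X, Y, gamma=10.0))
```

A larger $\gamma$ makes the same margin violation more expensive relative to the norm penalty.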
Solution to the SVM Optimization Problem
I Solve the optimization by quadratic programming in the dual
I Estimate parameters by cross-validation on the training set
I Given $\vec{W}^*$ and $W_0^*$, predict whether a node is an adversary or not by looking at the sign of $\vec{W}^* \cdot \Phi(\vec{x}) + W_0^*$.
I The LibSVM package is used to implement the SVM-model-based anomaly detection
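The sketch below shows the train-then-predict-by-sign workflow end to end without external dependencies. It does not solve the dual quadratic program and does not use LibSVM. As a stand-in, it runs full-batch subgradient descent on the primal soft-margin objective with a linear kernel. The learning rate, epoch count, and toy data are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X: List[List[float]], Y: List[int], gamma: float = 1.0,
                     lr: float = 0.01, epochs: int = 200) -> Tuple[List[float], float]:
    """Subgradient descent on 0.5*||w||^2 + gamma * sum of hinge losses.

    Stand-in for the dual QP solver: same objective, different algorithm.
    """
    w, w0 = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_w0 = 0.0
        for x, y in zip(X, Y):
            if y * (dot(w, x) + w0) < 1.0:  # margin violated: hinge term is active
                for j, xj in enumerate(x):
                    grad_w[j] -= gamma * y * xj
                grad_w0 -= gamma * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        w0 -= lr * grad_w0
    return w, w0


def predict(w: Sequence[float], w0: float, x: Sequence[float]) -> int:
    """Classify by the sign of w . x + w0 (+1 normal, -1 adversary, by assumption)."""
    return 1 if dot(w, x) + w0 >= 0 else -1


# Hypothetical, linearly separable training data.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
Y = [1, 1, 1, -1, -1, -1]

w_star, w0_star = train_linear_svm(X, Y)
print([predict(w_star, w0_star, x) for x in X])
```

In practice the trade-off factor would be chosen by cross-validation, as the second bullet states. The fixed `gamma=1.0` here is only a placeholder.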
Key Challenges in Supervised learning
I Defining a representative normal region is challenging
I The boundary between normal and outlying behavior is often not precise
I The exact notion of an outlier is different for different application domains
I Availability of labeled data for training/validation
I Data are extremely large and noisy, and can be complex
I Normal behavior keeps evolving
I Fast and accurate real-time detection
What is a Mixture Model?
I Let $f_{\theta_m}(\vec{x})$ denote the general mixture probability density function with $m$ components:
$$f_{\theta_m}(\vec{x}) = \sum_{i=1}^{m} \pi_i \, f(\vec{x} \mid \vec{\phi}_i).$$
I $\pi_i \ge 0$ for $i = 1, \ldots, m$, $\sum_{i=1}^{m} \pi_i = 1$, and $\theta_m = (\pi_1, \ldots, \pi_{m-1}, \pi_m, \vec{\phi}_1^T, \ldots, \vec{\phi}_m^T)^T$.
I In theory, the $f(\vec{x} \mid \vec{\phi}_i)$'s could be any parametric density, although in practice they are often from the same parametric family (usually Gaussian)
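A minimal sketch of the mixture density above, assuming univariate Gaussian components so that each $\vec{\phi}_i$ reduces to a mean and a variance. The function names and the example weights are illustrative only.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x: float, weights: Sequence[float], means: Sequence[float],
                variances: Sequence[float]) -> float:
    """Evaluate f(x) = sum_i pi_i * phi(x | mu_i, var_i)."""
    if any(p < 0 for p in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must be non-negative and sum to 1")
    return sum(p * normal_pdf(x, mu, var)
               for p, mu, var in zip(weights, means, variances))


# Hypothetical two-component mixture: a broad bulk plus a small, narrow group.
weights, means, variances = [0.7, 0.3], [0.0, 4.0], [1.0, 0.25]
print(mixture_pdf(0.0, weights, means, variances))
```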
Estimation Approach with Built-in Robustness using L2E
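I When $m$ is known, we want to find an $f_{\theta_m}(\vec{x})$ that is close to $g(\vec{x})$, the true (unknown) density of the data, in $L_2$ distance.
I That is,
$$L_2(f_{\theta_m}, g) = \int_{-\infty}^{\infty} \bigl[f_{\theta_m}(\vec{x}) - g(\vec{x})\bigr]^2 \, d\vec{x}.$$
I The aim is to derive an estimate of $\theta_m$ that minimizes the $L_2$ distance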
Estimation Approach with Built-in Robustness
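$$L_2(f_{\theta_m}, g) = \int_{-\infty}^{\infty} f_{\theta_m}^2(\vec{x}) \, d\vec{x} \;-\; 2 \int_{-\infty}^{\infty} f_{\theta_m}(\vec{x})\, g(\vec{x}) \, d\vec{x} \;+\; \int_{-\infty}^{\infty} g(\vec{x})^2 \, d\vec{x}$$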
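The three-term expansion above can be checked numerically. The sketch assumes, purely for illustration, that both densities are known univariate normals, which is not the case in practice because $g$ is unknown. The trapezoid integrator and its range are also illustrative choices.

```python
import math
from typing import Callable


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def integrate(func: Callable[[float], float], lo: float, hi: float,
              steps: int = 20000) -> float:
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h


def f(x: float) -> float:
    """Candidate model density (assumed): N(0, 1)."""
    return normal_pdf(x, 0.0, 1.0)


def g(x: float) -> float:
    """Stand-in for the true density (assumed known here): N(1, 1)."""
    return normal_pdf(x, 1.0, 1.0)


LO, HI = -12.0, 13.0  # wide enough that both tails are negligible
direct = integrate(lambda x: (f(x) - g(x)) ** 2, LO, HI)
expanded = (integrate(lambda x: f(x) ** 2, LO, HI)
            - 2.0 * integrate(lambda x: f(x) * g(x), LO, HI)
            + integrate(lambda x: g(x) ** 2, LO, HI))
print(direct, expanded)
```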
Estimation Approach with Built-in Robustness
I The last integral is constant with respect to $\theta_m$
I The first integral is often available as a closed-form expression
I The second integral is simply the average height of the density estimate, so the second term may be estimated as $-2n^{-1} \sum_{i=1}^{n} f_{\theta_m}(\vec{X}_i)$, where $\vec{X}_i$ is a sample observation.
Computational Algorithm
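I The L2E estimator of $\theta_m$ is given by
$$\hat{\theta}_m^{L2E} = \arg\min_{\theta_m} \left[ \int_{-\infty}^{\infty} f_{\theta_m}^2(\vec{x}) \, d\vec{x} \;-\; 2n^{-1} \sum_{i=1}^{n} f_{\theta_m}(\vec{X}_i) \right]$$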
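A small sketch of the bracketed criterion, assuming the simplest case of a single univariate normal component, for which the first integral has the closed form $1/(2\sigma\sqrt{\pi})$. The data set, including its single gross outlier, is hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def l2e_criterion(mu: float, sigma: float, data: Sequence[float]) -> float:
    """Integral of f^2 minus (2/n) * sum f(X_i) for the model f = N(mu, sigma^2)."""
    integral_f_squared = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    average_height = sum(normal_pdf(x, mu, sigma ** 2) for x in data) / len(data)
    return integral_f_squared - 2.0 * average_height


# Hypothetical sample: three points near 0 and one gross outlier at 100.
data = [-1.0, 0.0, 1.0, 100.0]
sample_mean = sum(data) / len(data)  # dragged to 25 by the outlier

print(l2e_criterion(0.0, 1.0, data))          # centered on the bulk of the data
print(l2e_criterion(sample_mean, 1.0, data))  # centered on the sample mean
```

The criterion is lower when the model sits on the bulk of the data than when it sits on the outlier-contaminated sample mean, which illustrates the built-in robustness.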
Computational Algorithm
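I Normal identity:
$$\int_{-\infty}^{\infty} \phi(x \mid \mu_1, \sigma_1^2)\, \phi(x \mid \mu_2, \sigma_2^2)\, dx = \phi(\mu_1 - \mu_2 \mid 0, \sigma_1^2 + \sigma_2^2),$$
I where $\phi(x \mid \mu, \sigma^2)$ is the normal density function with mean $\mu$ and variance $\sigma^2$.
I For multivariate Gaussian mixtures (GMM), $f(\vec{x} \mid \vec{\phi}_i) = \phi(\vec{x} \mid \vec{\mu}_i, \Sigma_i)$, the use of the above identity reduces the key integral to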
Computational Algorithm
$$\int_{-\infty}^{\infty} f_{\theta_m}^2(\vec{x}) \, d\vec{x} = \sum_{k=1}^{m} \sum_{l=1}^{m} \pi_k \pi_l \, \phi(\vec{\mu}_k - \vec{\mu}_l \mid 0, \Sigma_k + \Sigma_l)$$
I This makes the integral tractable and thereby significantly reduces the computation involved in minimizing the L2E criterion.
I Thus, the L2E estimate may be computed by any standard optimization algorithm.
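The closed-form double sum and the minimization step can be sketched as follows, assuming univariate components so that each $\Sigma_k$ is a scalar variance. A coarse grid search over a single location parameter stands in for "any standard optimization algorithm". The grid, the fixed variance, and the data are assumptions made for the example.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_sq_integral(weights, means, variances) -> float:
    """Closed form of the integral of f^2 via the normal identity (1-D case)."""
    return sum(pk * pl * normal_pdf(mk - ml, 0.0, vk + vl)
               for pk, mk, vk in zip(weights, means, variances)
               for pl, ml, vl in zip(weights, means, variances))


def l2e_criterion(weights, means, variances, data: Sequence[float]) -> float:
    """Closed-form first term minus twice the average fitted density height."""
    avg_height = sum(
        sum(p * normal_pdf(x, m, v) for p, m, v in zip(weights, means, variances))
        for x in data) / len(data)
    return mixture_sq_integral(weights, means, variances) - 2.0 * avg_height


def grid_search_mean(data: Sequence[float], var: float, grid: Sequence[float]) -> float:
    """Pick the grid value of mu minimizing L2E for a single N(mu, var) component."""
    return min(grid, key=lambda mu: l2e_criterion([1.0], [mu], [var], data))


# Hypothetical sample: a cluster around 0 plus one outlier at 50.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]
grid = [0.5 * i for i in range(-10, 121)]  # candidate means from -5 to 60

mu_l2e = grid_search_mean(data, var=1.0, grid=grid)
mu_mean = sum(data) / len(data)
print(mu_l2e, mu_mean)
```

The grid search is only a placeholder. A gradient-based or simplex optimizer over all of $\theta_m$ would play the same role for a full mixture.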
Data Analysis
I The effective detection and identification of anomalies in traffic requires the ability to separate them from normal network traffic.
I Network traffic data set from the University of New Mexico.
I Trace files contained 13831 sample observations with process IDs and their respective system calls.
I We apply our L2E (unsupervised) approach and compare its performance with SVM (supervised)
Results: Accuracy with increasing dimensions (70%-30% train-test partition of the data)