Department of Computer Science Intelligent Systems Laboratory
FUZZY DATA MINING AND GENETIC ALGORITHMS APPLIED TO INTRUSION DETECTION
Susan M. Bridges [email protected]
Rayford B. Vaughn [email protected]
23rd National Information Systems Security Conference
October 16-19, 2000
OUTLINE
• AI and Intrusion Detection
• Intrusion Detection System Design
• Fuzzy Logic and Data Mining
– Fuzzy Association Rules
– Fuzzy Frequency Episodes
• Intrusion Detection via Fuzzy Data Mining
• GAs for System Optimization
• Summary and Future Work
AI TECHNIQUES AND INTRUSION DETECTION
• Long history of AI techniques applied to intrusion detection. For example:
– Rule-Based Expert Systems (Lunt and Jagannathan 1988)
– State Transition Analysis (Ilgun and Kemmerer 1995)
– Genetic Algorithms (Me 1998)
– Inductive Sequential Patterns (Teng, Chen and Lu 1990)
– Artificial Neural Networks (Debar, Becker, and Siboni 1992)
• Data mining applied to intrusion detection is an active area of research. Examples include:
– Lee, Stolfo, and Mok (1998)
– Barbara, Jajodia, Wu, and Speegle (2000)
UNIQUE FEATURES OF OUR WORK
• Combines fuzzy logic with data mining
• Overcomes sharp boundary problems of many systems
• Reduces false positive errors
• Can be used for both anomaly detection and misuse detection
• Includes real-time components
• Uses genetic algorithms for system optimization
FUZZY LOGIC AND SECURITY
• Many security-related features are quantitative
– e.g., temporal statistical measurements (Porras and Valdes 1998; Lee and Stolfo 1998)
» SN: number of SYN flags in TCP headers during the last 2 s
» FN: number of FIN flags in TCP headers during the last 2 s
» RN: number of RST flags in TCP headers during the last 2 s
» PN: number of distinct destination ports during the last 2 s
[Figure: timeline with a Timestamp axis from 0.0 to 6.0, marking the first 2 seconds, the second 2 seconds, the third 2 seconds, and so on]
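The four measurements above can be sketched as a small windowing function. This is only an illustration under stated assumptions: the `Packet` record and the `window_features` function are hypothetical names, packets are assumed to be already parsed, and "the last 2 s" is taken to mean the half-open interval (now - 2, now].

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed TCP packet record (an assumption for illustration)."""
    timestamp: float        # seconds
    flags: FrozenSet[str]   # e.g. frozenset({"SYN"})
    dst_port: int


def window_features(packets: Iterable[Packet], now: float, width: float = 2.0) -> Dict[str, int]:
    """Compute SN, FN, RN, and PN over the interval (now - width, now]."""
    recent = [p for p in packets if now - width < p.timestamp <= now]
    return {
        "SN": sum("SYN" in p.flags for p in recent),   # packets with SYN set
        "FN": sum("FIN" in p.flags for p in recent),   # packets with FIN set
        "RN": sum("RST" in p.flags for p in recent),   # packets with RST set
        "PN": len({p.dst_port for p in recent}),       # distinct destination ports
    }


# Made-up traffic, for illustration only.
packets = [
    Packet(0.5, frozenset({"SYN"}), 80),
    Packet(1.2, frozenset({"SYN", "ACK"}), 443),
    Packet(1.9, frozenset({"FIN"}), 80),
    Packet(3.1, frozenset({"RST"}), 22),
]
features = window_features(packets, now=2.0)
print(features)
```

Calling `window_features` at successive timestamps yields one (SN, FN, RN, PN) record per window, which is the kind of quantitative record the later slides treat as input to fuzzy data mining.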
FUZZY LOGIC ALLOWS OVERLAPPING CATEGORIES
[Figure: membership (0 to 1) versus number of Destination Ports (0 to 30) for the categories Low, Medium, and High. Panel a: Non-fuzzy sets. Panel b: Fuzzy sets.]
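The contrast between the two panels can be sketched in code. The slides do not give the shapes or breakpoints of the membership functions, so the trapezoids and every number below are assumptions chosen only for illustration; `trapezoid`, `fuzzify`, and `crisp_category` are hypothetical names.

```python
import math
from typing import Dict


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge


# Assumed breakpoints for the number of destination ports (not from the slides).
FUZZY_SETS = {
    "LOW": (0, 0, 5, 10),
    "MEDIUM": (5, 10, 15, 20),
    "HIGH": (15, 20, math.inf, math.inf),
}


def fuzzify(x: float) -> Dict[str, float]:
    """Degree of membership of x in each overlapping fuzzy set."""
    return {name: trapezoid(x, *params) for name, params in FUZZY_SETS.items()}


def crisp_category(x: float) -> str:
    """Non-fuzzy alternative: exactly one category, with sharp cut-offs."""
    if x < 7.5:
        return "LOW"
    if x < 17.5:
        return "MEDIUM"
    return "HIGH"


for ports in (7, 8):
    print(ports, crisp_category(ports), fuzzify(ports))
```

With these assumed breakpoints, 7 and 8 destination ports fall on opposite sides of the crisp cut-off, while their fuzzy memberships in LOW and MEDIUM differ only by degree.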
FUZZY LOGIC OVERCOMES SHARP BOUNDARY PROBLEMS
[Figure: two axes marked 0, x1, x2, x3, x4, ... One shows the interval Medium with points A and B; the other shows LOW, MEDIUM, and HIGH with the same points A and B.]
If Medium is the normal pattern, then without fuzzy sets, A and B are both “outside” of the normal pattern. Fuzzy logic allows “degrees of normality.”
INTELLIGENT INTRUSION DETECTION MODEL
• Integration of fuzzy logic with data mining
– Fuzzy association rules
– Fuzzy frequency episodes
• Preliminary architecture
– Includes both misuse detection and anomaly detection
– Integrates machine-level and network-level information
• Optimization using genetic algorithms
[Figure: architecture diagram. Labels in the diagram: Network Traffic or Audit Data (1), (2), ..., (m); Host or Network Device; Intrusion Detection Sentry 1, 2, ..., s; Core Component; Machine Learning Component (by mining fuzzy association rules and fuzzy frequency episodes); Intrusion Detection Module 1, ..., n, n’, n+1; Anomaly Detection; Misuse Detection; Decision-Making Module; Communication Module; Background Unit; Experts; Administrator; Server; Clients.]
MINING FUZZY ASSOCIATION RULES
• Association rules represent commonly found patterns in data.
• Association Rule Format: R1: X→Y, c, s
– X and Y are disjoint sets of items
– s (support) tells how often X and Y co-occur in the data
– c (confidence) tells how often Y is associated with X
• Our system is unique: X and Y are fuzzy variables that take fuzzy sets as values
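For the ordinary (non-fuzzy) case, support and confidence are usually written as below. The notation is the conventional one from the association-rule literature rather than something taken from the slides: D is the set of records, and s(Z) is the fraction of records that contain every item of an itemset Z.

```latex
\[
s(Z) = \frac{\lvert \{\, t \in D : Z \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
s(X \rightarrow Y) = s(X \cup Y),
\qquad
c(X \rightarrow Y) = \frac{s(X \cup Y)}{s(X)}
\]
```

Read this way, s is the fraction of records in which X and Y co-occur, and c is the fraction of the records containing X that also contain Y.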
FUZZY ASSOCIATION RULES
Sample Fuzzy Association Rule:
{ SN=LOW, FN=LOW } → { RN=LOW }, c = 0.924, s = 0.49
Interpretation:
– SN, FN, and RN are fuzzy variables.
– The pattern { SN=LOW, FN=LOW, RN=LOW } occurred in 49% of the training cases.
– When the pattern { SN=LOW, FN=LOW } occurs, { RN=LOW } occurs at the same time with probability 92.4%.
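A rough sketch of how c and s might be computed when items are fuzzy follows. The slides do not say how membership degrees are combined, so this assumes one common convention: a record supports an itemset to the degree given by the product of its item memberships, and support is the mean of those degrees. The function names and the four records are made up for illustration and do not reproduce the values in the sample rule.

```python
from math import prod  # Python 3.8+
from typing import Dict, List, Tuple

Item = Tuple[str, str]        # (fuzzy variable, fuzzy set), e.g. ("SN", "LOW")
Record = Dict[Item, float]    # membership degree of each item in one record


def fuzzy_support(records: List[Record], itemset: List[Item]) -> float:
    """Mean over records of the product of the items' membership degrees."""
    return sum(prod(r.get(i, 0.0) for i in itemset) for r in records) / len(records)


def fuzzy_confidence(records: List[Record], antecedent: List[Item], consequent: List[Item]) -> float:
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    return fuzzy_support(records, antecedent + consequent) / fuzzy_support(records, antecedent)


SN_LOW, FN_LOW, RN_LOW = ("SN", "LOW"), ("FN", "LOW"), ("RN", "LOW")

# Made-up membership degrees for four records (illustration only).
records = [
    {SN_LOW: 1.0, FN_LOW: 1.0, RN_LOW: 1.0},
    {SN_LOW: 0.5, FN_LOW: 1.0, RN_LOW: 1.0},
    {SN_LOW: 1.0, FN_LOW: 0.5, RN_LOW: 0.0},
    {SN_LOW: 0.0, FN_LOW: 0.0, RN_LOW: 1.0},
]

s = fuzzy_support(records, [SN_LOW, FN_LOW, RN_LOW])
c = fuzzy_confidence(records, [SN_LOW, FN_LOW], [RN_LOW])
print(f"s = {s}, c = {c}")
```

Using the minimum instead of the product is another common choice; either way, a record that is only partly LOW contributes partially instead of being counted as all or nothing.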
SAMPLE FUZZY FREQUENCY EPISODE RULES
{ E1: PN=LOW, E2: PN=MEDIUM } → { E3: PN=MEDIUM }, c = 0.854, s = 0.108, w = 10 seconds
– E1, E2, and E3 are events occurring within a time window of 10 seconds.
– PN is a fuzzy variable.
– The events occur in the order E1, E2, E3, but there may also be intervening events.
– { PN=LOW, PN=MEDIUM, PN=MEDIUM } occurred in 10.8% of all training cases.
– When { PN=LOW, PN=MEDIUM } occurs, { PN=MEDIUM } will follow with 85.4% probability.
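The ordering and window conditions can be illustrated with a small matcher. As a simplification, each event here carries a single crisp label such as "PN=LOW" instead of fuzzy membership degrees; the `episode_occurs` function and the event list are hypothetical.

```python
from typing import Sequence, Tuple

Event = Tuple[float, str]   # (timestamp in seconds, label such as "PN=LOW")


def episode_occurs(events: Sequence[Event], episode: Sequence[str], window: float) -> bool:
    """True if the labels in `episode` appear in order, possibly with other
    events in between, all within `window` seconds of the first matched event.
    `events` is assumed to be sorted by timestamp."""
    if not episode:
        return True
    for i, (start, label) in enumerate(events):
        if label != episode[0]:
            continue
        k = 1  # index of the next episode element still to be matched
        for t, lab in events[i + 1:]:
            if t - start > window:
                break               # left the time window for this start
            if k < len(episode) and lab == episode[k]:
                k += 1              # matched in order; intervening events are skipped
        if k == len(episode):
            return True
    return False


# Made-up event sequence (illustration only).
events = [
    (0.0, "PN=LOW"), (2.0, "PN=HIGH"), (4.0, "PN=MEDIUM"),
    (6.0, "PN=MEDIUM"), (20.0, "PN=LOW"), (31.0, "PN=MEDIUM"),
]
episode = ["PN=LOW", "PN=MEDIUM", "PN=MEDIUM"]
print(episode_occurs(events, episode, window=10.0))
print(episode_occurs(events, episode, window=5.0))
```

In the first call the episode is matched by the events at 0.0, 4.0, and 6.0 seconds, with the PN=HIGH event at 2.0 seconds treated as an intervening event.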
FUZZY DATA MINING FOR INTRUSION DETECTION
• Modification of non-fuzzy methods developed by Lee, Stolfo, and Mok (1998)
• Anomaly Detection Approach
– Mine a set of fuzzy association rules from data with no anomalies.
– When given new data, mine fuzzy association rules from this data.
– Compare the similarity of the sets of rules mined from new data and “normal” data.
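The comparison step can be sketched as follows. The slides do not give the similarity formula, so the definition below is an assumption: rules that appear in both sets are scored by how closely their confidence and support agree, and the total score is normalized by the size of each rule set. The names `rule_similarity` and `ruleset_similarity` and all rule values are hypothetical.

```python
from typing import Dict, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]   # (antecedent items, consequent items)
RuleSet = Dict[Rule, Tuple[float, float]]        # rule -> (confidence, support)


def rule_similarity(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Assumed score in [0, 1]: 1 minus the larger relative change in c or s.
    Confidence and support of the reference rule are assumed to be positive."""
    (c1, s1), (c2, s2) = a, b
    return max(0.0, 1.0 - max(abs(c1 - c2) / c1, abs(s1 - s2) / s1))


def ruleset_similarity(normal: RuleSet, new: RuleSet) -> float:
    """Assumed set-level score: summed similarity of shared rules,
    normalized by the size of each rule set and multiplied."""
    if not normal or not new:
        return 0.0
    total = sum(rule_similarity(normal[r], new[r]) for r in normal.keys() & new.keys())
    return (total / len(normal)) * (total / len(new))


r1: Rule = (("SN=LOW", "FN=LOW"), ("RN=LOW",))
r2: Rule = (("SN=MEDIUM",), ("FN=MEDIUM",))

# Made-up (confidence, support) values, for illustration only.
normal_rules: RuleSet = {r1: (0.9, 0.5), r2: (0.8, 0.4)}
new_rules: RuleSet = {r1: (0.9, 0.25)}

print(ruleset_similarity(normal_rules, normal_rules))
print(ruleset_similarity(normal_rules, new_rules))
```

Under this assumed definition, identical rule sets score 1, and the score falls as rules go missing or as their confidence and support drift away from the "normal" values.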
[Figure: Similarities between Training Data Set and Different Test Data Sets by Mining Fuzzy Association Rules on SN, FN, and RN]
Training data collected in the afternoon.
Test data: T1-T3: afternoon; T4-T6: evening; T7-T9: late night
Data source: MSU CS network
[Figure: Similarities between Training Data Set and Different Test Data Sets by Mining Fuzzy Frequency Episodes on PN]
Training data collected in the afternoon.
Test data: T1-T3: afternoon; T4-T6: evening; T7-T9: late night
Data source: MSU CS network
[Figure: chart of Similarity (axis 0 to 1) by Test Data Set]
Similarities between Training Data Set and Different Test Data Sets by Mining Fuzzy Association Rules on SN, FN, and RN
Test data set   Baseline   Network1   Network3
Similarity      0.744      0.309      0.315

[Figure: chart of Similarity (axis 0 to 1) by Testing Data Set]
Similarities between Training Data Set and Different Test Data Sets by Mining Fuzzy Frequency Episodes on PN
Test data set   Baseline   Network1   Network3
Similarity      0.885      0          0.000155

Training data: no intrusions
Test data:
– baseline (no intrusions)
– network1 (includes simulated IP spoofing intrusions)
– network3 (includes simulated port scanning intrusions)
REAL-TIME INTRUSION DETECTION
Given a fuzzy episode rule R: { e1, …, ek-1 } → { ek }, c, s, w,
if { e1, …, ek-1 } has occurred in the current event sequence,
then { ek } can be predicted to occur next with confidence c.
If the next event does not match any prediction from the rule set, it is flagged as an anomaly.
Define anomaly percentage = number of anomalies / number of events
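The prediction-and-alarm loop can be sketched as below, under several simplifying assumptions that are not from the slides: events carry crisp labels, a rule fires only when its antecedent matches the most recent events contiguously (the window w, confidence c, and intervening events are ignored), and an event for which no rule fires is counted as an anomaly. The `anomaly_percentage` function and the rule set are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# antecedent (e1, ..., ek-1) -> set of events ek predicted to come next
RuleSet = Dict[Tuple[str, ...], Set[str]]


def anomaly_percentage(events: List[str], rules: RuleSet) -> float:
    """Number of anomalies divided by number of events."""
    if not events:
        return 0.0
    anomalies = 0
    for i, event in enumerate(events):
        history = tuple(events[:i])
        predicted: Set[str] = set()
        for antecedent, next_events in rules.items():
            n = len(antecedent)
            # The rule fires if the antecedent equals the last n events seen so far.
            if n <= len(history) and history[len(history) - n:] == antecedent:
                predicted |= next_events
        if event not in predicted:
            anomalies += 1   # includes the case where no rule fired at all
    return anomalies / len(events)


# Made-up rules and event stream (illustration only).
rules: RuleSet = {
    ("PN=LOW",): {"PN=LOW", "PN=MEDIUM"},
    ("PN=LOW", "PN=MEDIUM"): {"PN=MEDIUM"},
    ("PN=MEDIUM",): {"PN=LOW"},
}
events = ["PN=LOW", "PN=MEDIUM", "PN=MEDIUM", "PN=HIGH"]
print(anomaly_percentage(events, rules))
```

In this toy stream the first event is counted as an anomaly because no rule can fire on an empty history, and PN=HIGH is counted because no fired rule predicts it.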
[Figure: Anomaly Percentages of Different Test Data Sets in Real-time Intrusion Detection by Mining Fuzzy Frequency Episodes on PN]
Training data: no intrusions
Test data: T1’-T3’: no intrusions; T4’-T6’: simulated mscan
Comparing the false positive error rates of fuzzy episode rules with non-fuzzy versions for real-time intrusion detection