Po-Ching Lin Dept. of CSIE National Chung Cheng University

Casting out Demons: Sanitizing Training Data for Anomaly Sensors, by G. F. Cretu, A. Stavrou, M. E. Locasto, S. J. Stolfo, and A. D. Keromytis, in IEEE Symposium on Security and Privacy (S&P), 2008. Presented by Po-Ching Lin, Dept. of CSIE, National Chung Cheng University.

Transcript

Po-Ching Lin, Dept. of CSIE

National Chung Cheng University

Problem definition

Two main approaches to detecting malicious inputs, behavior, network traffic, etc.: signature matching and anomaly detection.

Challenge of effective anomaly detection of malicious traffic: a highly accurate model of normal traffic. Real network traffic is usually polluted or unclean, so using it as the training data can be a problem.

How can we sanitize training data for AD sensors?


Solution outline

[Figure: the training set is partitioned into micro-datasets, each producing a micro-model MM1–MM5. Legend: noise = attack or non-regularity; Mi = micro-model i.]

Assumption: an attack or abnormality appears only in small subsets of a large training set.

The solution:
1. Test each packet against the micro-models using a voting scheme and build a "normal" model.
2. Data deemed abnormal is used to build an abnormal model.
3. The abnormal model can be distributed between sites.
4. A shadow sensor architecture handles false positives.

Assumption & micro models

Observation: over a long period, attacks and abnormalities are a minority class of data.

Deriving the micro-models

T = {md1, md2, ..., mdN}, where mdi is the micro-dataset starting at time (i − 1) ∗ g, with g ranging from 3 to 5 hours.

Mi = AD(mdi): the micro-model computed from mdi.
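A minimal sketch of this partitioning, assuming each packet carries a timestamp and a byte payload; the set-of-n-grams AD() stand-in, the field names, and the 3-hour default are illustrative assumptions, not the sensors or code used in the paper.

```python
def AD(micro_dataset, n=3):
    # Illustrative stand-in for an anomaly sensor (e.g., Anagram or Payl):
    # here a micro-model is simply the set of byte n-grams seen in its window.
    model = set()
    for packet in micro_dataset:
        payload = packet["payload"]
        model.update(payload[i:i + n] for i in range(len(payload) - n + 1))
    return model

def build_micro_models(packets, g_hours=3):
    """Split training traffic T into micro-datasets md_1..md_N, where md_i
    covers the window [(i-1)*g, i*g), and train one micro-model per window."""
    g = g_hours * 3600  # granularity g in seconds (3 to 5 hours in the paper)
    start = min(p["time"] for p in packets)
    micro_datasets = {}
    for p in packets:
        i = int((p["time"] - start) // g)   # which micro-dataset p falls into
        micro_datasets.setdefault(i, []).append(p)
    return [AD(md) for _, md in sorted(micro_datasets.items())]
```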


Deriving the sanitized training set

Test each packet Pj with all the micro-models Mi: Lj,i = TEST(Pj, Mi), where Lj,i = 1 if Mi deems Pj abnormal and 0 otherwise.

Combine the output from the models: SCORE(Pj) = (1/W) Σi wi · Lj,i, where W = Σi wi.

Sanitize the training dataset: Tsan = {Pj | SCORE(Pj) ≤ V}, Msan = AD(Tsan). Tabn = {Pj | SCORE(Pj) > V}, Mabn = AD(Tabn).
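A small sketch of the voting and thresholding step, continuing the set-based toy models from the previous sketch; the subset test inside TEST(), the default threshold V, and the uniform weights are assumptions for illustration.

```python
def TEST(packet, model, n=3):
    """Return 1 if this micro-model deems the packet abnormal, i.e. its
    payload contains n-grams the model never saw; else 0 (toy criterion)."""
    payload = packet["payload"]
    grams = {payload[i:i + n] for i in range(len(payload) - n + 1)}
    return 0 if grams <= model else 1

def sanitize(packets, micro_models, V=0.3, weights=None):
    """SCORE(Pj) = (1/W) * sum_i w_i * L_{j,i}; packets with SCORE <= V go to
    T_san and the rest to T_abn, from which M_san and M_abn can be built."""
    if weights is None:
        weights = [1.0] * len(micro_models)  # simple (unweighted) voting
    W = sum(weights)
    T_san, T_abn = [], []
    for p in packets:
        score = sum(w * TEST(p, M) for w, M in zip(weights, micro_models)) / W
        (T_san if score <= V else T_abn).append(p)
    return T_san, T_abn
```

Msan = AD(T_san) and Mabn = AD(T_abn) then follow as in the slide above; weighted voting would simply pass non-uniform weights.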


Evaluation of sanitization

Use two anomaly sensors for evaluation: Anagram and Payl.

Experimental corpus: 500 hours of real network traffic from three different hosts (www, www1, and lists), with cross-validation. 300 hours of traffic are used to build the micro-models, the next 100 hours to generate the sanitized model, and the remaining 100 hours for testing.

Without sanitizing the training data

Legend: A: Anagram; A-S: Anagram + Snort; A-SAN: Anagram + sanitization; P: Payl; P-SAN: Payl + sanitization. V ∈ [0.15, 0.45].

Analysis of sanitization parameters

Three parameters for fine-tuning: the granularity of micro-models, the voting algorithm (simple voting vs. weighted voting), and the voting threshold.


Simple voting vs. Weighted voting


Results from the other two hosts


Granularity impact


Other impacts


Latency for different ADs


Long-lasting training attacks


Collaborative Sanitization

Comparing local models of abnormality with those generated by other sites.

Direct model differencing: Mcross = Msan − {Mabni ∩ Msan}.

Indirect model differencing: differencing the sets of packets used to compute the models. If a packet Pj is considered abnormal by at least one Mabni, features are extracted from the packet for computing the new local abnormal model, which is then used for computing the cross-sanitized model.
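A rough sketch of direct model differencing, keeping the set-of-n-grams view of a model used in the earlier sketches; real sensors store models in structures such as Bloom filters, so this set arithmetic is only an illustrative approximation.

```python
def direct_model_differencing(M_san, remote_abnormal_models):
    """M_cross = M_san − {M_abn_i ∩ M_san}: remove from the local sanitized
    model anything that a remote site's abnormal model also contains."""
    M_cross = set(M_san)
    for M_abn in remote_abnormal_models:
        M_cross -= (M_abn & M_san)
    return M_cross
```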

Training attacks


Conclusion & limitations in this paper

The capability of anomaly detection in the micro-models: how effective are Payl and Anagram?

The traffic in the evaluation: "normality" is diverse in a real environment.

Deriving packets to form the training set is stateless: attacks can span multiple packets or even connections.

