Network Payload-based Anomaly Detection and Content-based Alert Correlation
Ke Wang
Thesis Defense, Aug. 14th, 2006
Department of Computer Science, Columbia University
1
Network Payload-based Anomaly Detection and
Content-based Alert Correlation
Ke Wang
Thesis Defense, Aug. 14th, 2006
Department of Computer Science, Columbia University
2
Why do we need payload-based anomaly detection
Attacks that are normal connections may carry bad (anomalous) content indicative of a new exploit
Slow and stealthy, or targeted/hitlist worms do not display "loud and obvious" scanning or propagation behavior detectable via flow statistics
This sensor augments other sensors and enriches the view of the network
3
Conjecture and Goal: Detect Zero-Day Exploits via Content Analysis
Targeted attacks (sophisticated, stealthy, no "loud and obvious" propagation)
A true zero-day attack will manifest as "never before seen data" delivered to an application or server
Generate a signature immediately to stop further propagation; no need to wait for "payload prevalence" (a sufficient number of repeated occurrences of the same content)
Develop sensors that are accurate, efficient, and scalable, with resiliency to mimicry attacks
4
Contributions
Demonstrate the usefulness of analyzing network payload for anomaly detection
PAYL: 1-gram modeling
Anagram: higher order n-gram modeling
Randomized modeling/testing that can help thwart mimicry attacks
Ingress/egress payload correlation to capture a worm’s initial propagation attempt
Efficient privacy-preserving payload correlation across sites, and automatic signature generation
5
Contributions
Demonstrate the usefulness of analyzing network payload for anomaly detection
PAYL: 1-gram modeling (incremental learning; clustering for space saving; multi-centroid fine-grained modeling)
Anagram: higher order n-gram modeling
Randomized modeling/testing that can help thwart mimicry attacks
Ingress/egress payload correlation to capture a worm's initial propagation attempt
Efficient privacy-preserving payload correlation across sites
6
Motivation of PAYL
Content traffic to different ports has very different payload distributions
Within one port, packets with different lengths also have different payload distributions
Furthermore, worm/virus payloads are usually quite different from normal distributions
Previous work:
Attack signatures: Snort, Bro
First few bytes of a packet: NATE, PHAD, ALAD
Service-specific IDS [CKrugel02]: coarse modeling, 256 ASCII characters in 6 groups
7
Example byte distributions for different ports
(Figure panels: Src/Dest Port 22 = ssh, Src/Dest Port 25 = mail, Src/Dest Port 80 = web)
8
Example byte distribution for different payload lengths of port 80 on the same host server
9
CR II distribution versus a normal distribution
10
How to model "normal" content: 1-gram centroid
The average relative frequency of each byte, and the standard deviation of the frequency of each byte, for payload length 185 of port 80
Models are computed from the packet stream incrementally, conditioned on port/service and length of packet
Hands-free epoch-based training
Fine-grained multi-centroid modeling
Clustering: merge two neighbouring centroids if their Manhattan distance is smaller than a threshold; saves space, removes redundancy, linear-time computation; improves modeling accuracy for length bins with few training data (sparseness)
Self-calibration phase: sampled training data sets an initial threshold
Detection phase: packets are compared against models using the simplified Mahalanobis distance
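The 1-gram centroid and simplified Mahalanobis scoring described above can be sketched in Python. This is a minimal illustration only: the class name, the smoothing constant added to the standard deviation, and the single-centroid scope are assumptions, not the licensed PAYL implementation.

```python
class PaylCentroid:
    """1-gram centroid sketch: per-byte mean relative frequency and
    standard deviation, conceptually keyed by (port, payload length)."""

    def __init__(self, smoothing=0.001):
        self.n = 0
        self.sum = [0.0] * 256   # running sum of relative frequencies
        self.sq = [0.0] * 256    # running sum of squared frequencies
        self.smoothing = smoothing  # assumed constant, avoids divide-by-zero

    def _freq(self, payload: bytes):
        freq = [0.0] * 256
        for b in payload:
            freq[b] += 1.0 / len(payload)
        return freq

    def train(self, payload: bytes):
        freq = self._freq(payload)
        for i in range(256):
            self.sum[i] += freq[i]
            self.sq[i] += freq[i] ** 2
        self.n += 1

    def score(self, payload: bytes) -> float:
        """Simplified Mahalanobis distance:
        d(x, y) = sum_i |x_i - y_i| / (sigma_i + smoothing)."""
        freq = self._freq(payload)
        d = 0.0
        for i in range(256):
            mean = self.sum[i] / self.n
            var = max(self.sq[i] / self.n - mean ** 2, 0.0)
            d += abs(freq[i] - mean) / (var ** 0.5 + self.smoothing)
        return d
```

A higher score means the tested payload's byte distribution is further from the learned centroid.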
12
Performance comparison: single centroid vs. multi-centroid
Cons:
Cannot capture attacks displaying a normal byte distribution
Easily fooled by mimicry attacks with proper padding
14
Example: phpBB forum attack
Relatively normal byte distribution, so PAYL misses it
Abnormal sequence of commands for exploitation
The attack invariants: the subsequence of new, distinct byte values should be "malicious"
What we need: capture the order dependence of byte sequences, i.e., higher order n-gram modeling
GET /modules/Forums/admin/admin_styles.php?phpbb_root_path=http://81.174.26.111/cmd.gif?&cmd=cd%20/tmp;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo|..HTTP/1.1.Host:.128.59.16.26.User‑Agent:.Mozilla/4.0.(compatible;.MSIE.6.0;.Windows.NT.5.1;)..
15
Contributions
Demonstrate the usefulness of analyzing network payload for anomaly detection
Anagram: higher order n-gram modeling (binary-based modeling; Bloom filter for space efficiency; semi-supervised learning; privacy-preserving payload alerts for correlation)
Randomized modeling/testing that can help thwart mimicry attacks
Ingress/egress payload correlation to capture a worm's initial propagation attempt
Efficient privacy-preserving payload correlation across sites
16
Overview of Anagram
Binary-based higher order n-gram modeling
Models all the distinct n-grams appearing in the normal training data
During test, compute the percentage of never-seen distinct n-grams out of the total n-grams in a packet
Semi-supervised learning: normal traffic is modeled; prior known malicious traffic is modeled (Snort rules, captured malcode)
The model is space-efficient by using Bloom filters
Previous work:
Foreign system call sequences [Forrest96]
Trie-based n-gram storage and comparison
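The binary n-gram model over a Bloom filter can be sketched as follows. This is illustrative only: the filter size, the number of hash functions, and the use of SHA-1 as the hash family are assumptions, not the actual Anagram implementation.

```python
import hashlib


class Anagram:
    """Binary-based n-gram model backed by a Bloom filter (sketch).
    Scores a packet by the fraction of its n-grams never seen in
    training; membership is only ever over-reported (Bloom semantics),
    so the score can under-count, never over-count, new n-grams."""

    def __init__(self, n=5, bits=2 ** 20, hashes=3):
        self.n, self.bits, self.hashes = n, bits, hashes
        self.filter = bytearray(bits // 8)

    def _positions(self, gram: bytes):
        # Derive `hashes` bit positions from salted SHA-1 digests.
        for i in range(self.hashes):
            h = hashlib.sha1(bytes([i]) + gram).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def train(self, payload: bytes):
        for i in range(len(payload) - self.n + 1):
            for p in self._positions(payload[i:i + self.n]):
                self.filter[p // 8] |= 1 << (p % 8)

    def score(self, payload: bytes) -> float:
        total = max(len(payload) - self.n + 1, 1)
        new = 0
        for i in range(len(payload) - self.n + 1):
            gram = payload[i:i + self.n]
            if not all(self.filter[p // 8] & (1 << (p % 8))
                       for p in self._positions(gram)):
                new += 1  # never-seen n-gram
        return new / total
```

A packet whose n-grams all appeared in training scores 0; a packet of entirely novel content scores close to 1.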
False positive rate (with 100% detection rate) for different training times and different values of n for the n-grams
Low false positive rate per packet (better per flow)
No significant gain after 4 days' training
Higher order n-grams need longer training time to build a good model
3-grams are not long enough to distinguish malicious byte sequences from normal ones
Normal traffic: real web traffic collected from two CUCS web servers
Test worms: CR, CRII, WebDAV, Mirela, phpBB forum attack, nsiislog.dll buffer overflow (MS03-022)
The false positive rate (with 100% detection rate) for different n-grams, under both normal and semi-supervised training – per packet rate
21
Mimicry attacks
Attackers can mimic normal traffic and hide the exploit inside "the sled" to evade the sensor easily
Example: the polymorphic mimicry worm developed by [OK05] targeting PAYL, which does encoding and traffic blending to simulate the normal profile
22
Contributions
Demonstrate the usefulness of analyzing network payload for anomaly detection
Randomized modeling/testing that can help thwart mimicry attacks
Ingress/egress payload correlation to capture a worm’s initial propagation attempt
Efficient privacy-preserving payload correlation across sites
23
Randomization against mimicry attacks
The general idea of payload-based mimicry attacks is to craft small pieces of exploit code with a large amount of "normal" padding to make the whole packet look normal
If we randomly choose the payload portion for modeling/testing, the attacker does not know precisely which byte positions it may have to pad to appear normal, making it harder to hide the exploit code
This is a general technique that can be used for both PAYL and Anagram, or any other payload anomaly detector
For Anagram, additional randomization: keep the n-gram size a secret
24
Randomized Modeling
Separate the whole packet randomly into several (possibly interleaved) substrings or subsequences S1, S2, ..., SN, and build one model for each of them
The test packet's payload is divided accordingly
25
Shortcomings:
Models from sub-partitions may be similar: higher memory consumption, no real model diversity
The test partitioning needs to be the same as the training partitioning: less flexibility; need to retrain to change partitions
(Figure: the top plot is the model built from the whole packet; the bottom two are the models built from two random sub-partitions.)
26
Randomized Testing
A simpler strategy that does not incur substantial overhead
Build one model for the whole packet, and randomize the tested portions
Separate the whole packet randomly into several (possibly interleaved) partitions S1, S2, ..., SN
Score each randomly chosen partition separately
Use the maximum score: Score_new = max_i (N_i / T_i)
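One way to realize this, sketched below, is to assign each payload byte to a random partition and take the maximum partition score. Here `score_fn`, the number of partitions, and the seed are hypothetical parameters standing in for any payload anomaly scorer (e.g., a PAYL- or Anagram-style score):

```python
import random


def randomized_test_score(payload: bytes, score_fn, parts=4, seed=None):
    """Randomized testing (sketch): split the payload into `parts`
    randomly interleaved partitions, score each partition with the
    detector's score function, and return the maximum score."""
    rng = random.Random(seed)
    partitions = [bytearray() for _ in range(parts)]
    for b in payload:
        partitions[rng.randrange(parts)].append(b)
    return max(score_fn(bytes(p)) for p in partitions if p)
```

Because the attacker cannot predict which bytes land in which partition, padding the packet to look globally normal no longer guarantees that every tested partition looks normal.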
27
28
PAYL test on the mimicry attack designed by [OK05] targeting it, 20-fold randomized testing:

                      Detection times   Avg. FP   Std. FP
Pure random mask      16/20             0.269%    0.375%
Chunked random mask   14/20             0.175%    0.409%
29
Anagram test: average false positive rate and standard deviation with 100% detection rate, chunked random mask, 10-fold randomized testing
(Results shown for both normal training and semi-supervised training.)
30
Contributions
Demonstrate the usefulness of analyzing network payload for anomaly detection
Randomized modeling/testing that can help thwart mimicry attacks
Ingress/egress payload correlation to capture a worm's initial propagation attempt (detect slow or stealthy worms; immediate signature generation)
Efficient privacy-preserving payload correlation across sites
31
Ingress/egress correlation to detect worm propagation
Self-propagating worms will start attacking other machines (by sending at least the exploit portion of their content) shortly after a host is infected
The attacked destination port will be the same, since the worm is exploiting the same vulnerability
An approach to stop the worm's very first propagation attempt: if we detect anomalous egress packets to port i that are very similar to anomalous ingress packets to port i, there is a high probability that a worm has started its propagation
Advantage: can detect slow or stealthy worms, which won't show probe behavior and thus avoid probe detectors
32
Similarity metrics to compare the payloads of two or more anomalous packet alerts (similarity score in [0, 1]):

Metric                              Data used   Handles fragments   Detects metamorphic   Score
String equality (SE)                Raw data    No                  No                    1 for equal, 0 otherwise
Longest common substring (LCS)      Raw data    Yes                 No                    2*C/(L1+L2)
Longest common subsequence (LCSeq)  Raw data    Yes                 Some                  2*C/(L1+L2)
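For example, the LCS similarity with the 2*C/(L1+L2) score from the table can be computed with a standard dynamic program (a sketch; the function name is ours):

```python
def lcs_similarity(a: bytes, b: bytes) -> float:
    """Longest-common-substring similarity, scored 2*C/(L1+L2),
    where C is the length of the longest common substring and
    L1, L2 are the payload lengths. O(len(a)*len(b)) time."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the matching run
                best = max(best, cur[j])
        prev = cur
    return 2 * best / (len(a) + len(b)) if (a or b) else 0.0
```

Identical payloads score 1; payloads with no common substring score 0.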
Experiment result
33
|d0|$@|0 ff|5|d0|$@|0|h|d0| @|0|j|1|j|0|U|ff|5|d8|$@|0 e8 19 0 0 0 c3 ff|%`0@|0 ff|%d0@|0 ff|%h0@|0 ff|%p0@|0 ff|%t0@|0 ff|%x0@|0 ff|%|0@|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 0 0 0 0 0 0 0 0 0 0 0 0 0|\EXPLORER.EXE|0 0 0|SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon|0 0 0|SFCDisable|0 0 9d ff ff ff|SYSTEM\CurrentControlSet\Services\W3SVC\Parameters\Virtual Roots|0 0 0 0|/Scripts|0 0 0 0|/MSADC|0 0|/C|0 0|/D|0 0|c:\,,217|0 0 0 0|d:\,,217|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc…
LCS signature generation: Code Red II
[...] substrings or tokens from suspicious IPs, which still depends on the scanning behavior
Detection occurs some time after the worm propagation
Cannot detect slow and stealthy worms
35
Contributions
Demonstrate the usefulness of analyzing network payload for anomaly detection
Randomized modeling/testing that can help thwart mimicry attacks
Ingress/egress payload correlation to capture a worm's initial propagation attempt
Each site has a distinct content flow: diversity via content (not system or software)
Find global, common "invariants in content": if multiple sites see the same or similar content alerts, it is highly likely to be a true worm or targeted outbreak
Separate TPs from FPs: the "false false positive" problem
Reduces false positives by creating white lists of those alerts that cannot be correlated
Higher standard to prevent mimicry attacks: exploit writers/attackers have to learn the distinct content traffic patterns of many different sites
Needs to be privacy-preserving
37
Related Research
DNAD/Worminator (slow/IP) sharing
Domino alert sharing
The DShield.org model for content sharing and querying
Could also serve as a "trap" to detect [...]
(Figure captions:)
Frequency distribution; the most frequent character is a space (ASCII code 32). Size ≈ 8160 bits.
List of 3-grams in the original string. A box represents a space; the underlined n-gram appears twice in the original alert. 25 n-grams take approximately 600 bits.
Bloom filter of the above n-grams. If three hash values are used, a minimum optimal size would be ~150 bits.
Z-String; the space (box) is the most frequent character. Non-appearing characters are removed. 15 characters = 120 bits.
40
Real traffic evaluation
Goal: measure performance in identifying true alerts from false positives
Ideal: true positives have very high similarity scores, while false positives have very low scores
Mix the collection of attacks into two hours of traffic from www and www1
Multiple, differently-fragmented instances of Code Red and Code Red II simulate a real worm attack
Mixed sets are run through PAYL and Anagram, with the alerting threshold reduced so that 100% of attacks are detected, but with possibly higher FP rates
String evaluation
41
Real traffic evaluation (II)
False positive score range; the blue bar represents the 99.9th percentile, white the maximum score
Range of scores across multiple instances of the same worm (CR or CRII), and across instances of different worms (CR vs. CRII), e.g., polymorphism
Methods are, from 1 to 8: Raw-LCS, Raw-LCSeq, Raw-ED, Freq-MD, ZStr-LCS, ZStr-LCSeq, ZStr-ED, N-grams with n=5.
42
Real traffic evaluation (III)
Correlation of identical (non-polymorphic) attacks works accurately for all techniques; non-fragmented attacks score near 1
Z-Strings (MD, LCSeq, ED) and n-grams handle fragmentation well
Polymorphism is hard to detect; only Raw-LCSeq and n-grams score well
Overall, n-grams are particularly effective at eliminating false positives, and Bloom filters enable privacy preservation
43
Signature Generation
Each class of techniques can generate its own signature
Raw packets: exchange the LCS/LCSeq (not privacy-preserving)
Byte frequency/Z-Strings: given the frequency distribution, the Z-String is generated by ordering bytes from most to least frequent and dropping the least frequent
N-grams: robust to reordering or fragmentation; if position information is available, can "flatten" into a deployable string signature
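A Z-String as described can be derived directly from the byte-frequency distribution (a sketch; the `keep` cutoff of 15 follows the earlier 15-character example and is otherwise an assumption):

```python
from collections import Counter


def z_string(payload: bytes, keep: int = 15) -> bytes:
    """Z-String sketch: order the byte values that appear in the
    payload from most to least frequent, drop non-appearing bytes,
    and truncate to the `keep` most frequent. Ties break on byte
    value so the result is deterministic."""
    counts = Counter(payload)
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    return bytes(ranked[:keep])
```

Because only the rank order of byte frequencies is shared, the raw payload itself is not revealed, which is what makes the representation privacy-preserving.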
44
Signature/Query generation (II)
Accuracy of the signatures: the cumulative frequency of signature match scores computed by matching normal traffic against different worm signatures. The closer to the y-axis, the more accurate.
The six curves represent, in order from left to right: 1) n-gram signature, 2) Z-string signature compared using LCS, 3) LCS of the raw signature, 4) Z-string signature using LCSeq, 5) LCSeq of the raw signature, 6) byte-frequency signature.
46
Signature for polymorphic worms
Our approaches work poorly here, since they are based on payload similarity
Will there be enough invariants for an accurate signature?
Slammer: first byte "0x04"
CLET shellcode 2: "\0xff\0xff\0xff" and "\0xeb\0x31"
Proposed alternative: a "generalized signature" specifying the higher-level pattern of an attack, instead of being raw-payload based: "0xeb 0x31"B {92 bytes, entropy: E, "0xff 0xff 0xff"B}
47
Conclusions
Network payload-based PAYL and Anagram can detect zero-day attacks with high accuracy and low false positives
Randomization helps thwart mimicry attacks
Ingress/egress correlation detects a worm's initial propagation and generates accurate worm signatures; good at detecting slow/stealthy worms
Privacy-preserving payload alert correlation across sites can identify true anomalies and reduce false positives, with accurate signature generation
48
Accomplishments
Major papers:
Anagram: A Content Anomaly Detector Resistant to Mimicry Attack, K. Wang, J. Parekh, S. Stolfo, RAID, Sept 2006.
Privacy-preserving Payload-based Correlation for Accurate Malicious Traffic Detection, J. Parekh, K. Wang, S. Stolfo, SIGCOMM LSAD Workshop, Sept 2006.
Anomalous Payload-based Worm Detection and Signature Generation, K. Wang, G. Cretu, S. Stolfo, RAID, Sept 2005.
FLIPS: Hybrid Adaptive Intrusion Prevention, M. Locasto, K. Wang, A. Keromytis, S. Stolfo, RAID, Sept 2005.
Anomalous Payload-based Network Intrusion Detection, K. Wang, S. Stolfo, RAID, Sept 2004.
Software implementation (licensed by Columbia): PAYL sensor, Anagram sensor
49
Future Work
Further evaluation, including measures/features of high-entropy partitions
Optimization problem: model parameter settings (n-gram size, thresholds, etc.), random mask generation
Real deployment of multiple-site correlation
Shadow server architecture implementation and testing
Pushing into the host: integration with instrumented application software
50
Thank you!
Q/A?
51
Backup slides
52
Overview of PAYL: how it works
Principles of operation:
Normal packet content is automatically learned, based upon unsupervised anomaly detection algorithms
Fine-grained modeling of normal payload: site and application specific, also conditioned on packet length
Build the byte frequency distribution and its standard deviation as the normal profile
For test data, compute the simplified Mahalanobis distance against its centroid to measure similarity
Each site/host has a "unique" content flow that may be automatically learned
UAD generates a model over "unlabeled" data
The model detects anomalies in collected training data (forensics) and anomalies in the data stream (detection)
Computational approach: outlier detection
Two frameworks: geometric and probabilistic/statistical
Several algorithms; PAYL is based upon comparison of content statistical distributions
Handles "noise" in the data: no guarantees of "attack-free" data, but assumes most data is "attack-free"
Return to main slides
54
Epoch-based learning
To determine how much training data is enough, or whether the model is ready for use
An epoch is measured in terms of the number of packets analyzed, or by means of a time period
The training phase is sufficiently complete if the currently computed model has changed little for several consecutive epochs
Need to define model similarity measurements
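A minimal sketch of that stopping rule, assuming a per-epoch model-change metric (e.g., the Manhattan distance between successive epoch models); the threshold and epoch-count values below are illustrative, not the thesis's settings:

```python
def training_complete(epoch_distances, threshold=0.05, stable_epochs=3):
    """Epoch-based stopping sketch: training is sufficiently complete
    when the model-change metric has stayed below `threshold` for
    `stable_epochs` consecutive epochs."""
    if len(epoch_distances) < stable_epochs:
        return False
    return all(d < threshold for d in epoch_distances[-stable_epochs:])
```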
55
Epoch-based learning: PAYL
Metric 1: number of new centroids produced in the current epoch
Metric 2: Manhattan distance of each centroid to the nearest one computed in the prior epoch
Return to main slides
56
(Figure: likelihood of seeing new n-grams per 10,000 content packets, for 3-grams, 5-grams, and 7-grams; the likelihood ranges from 0 to 0.01.)
The likelihood of seeing new n-grams is the percentage of new distinct n-grams out of the total n-grams in this epoch.
The computed Mahalanobis distance of the normal and attack packets
The normal data's distances are displayed as several bands, which illustrates that we might have multiple centroids for one length
58
Multiple-centroid modeling for each length
Goal: build finer-grained models for the payload to detect anomalies more accurately
Problems:
We don't know how many clusters may exist
We can only access each packet's data once, in sequence; we cannot store the packets in memory
So traditional clustering algorithms like K-means or EM cannot easily be applied here
Our solution:
Standard metric to compare two statistical distributions, the Mahalanobis distance:
d(x, y) = (x - y)^T C^{-1} (x - y), where C_ij = Cov(y_i, y_j)
Here x is the test data and y is its profile. When we assume each ASCII value is independent, the formula can be simplified to:
d(x, y) = sum_{i=0}^{n-1} |x_i - y_i| / sigma_i
Return to main slides
60
Incremental Learning
Average of N data points: mean_N = (1/N) * sum_{i=1}^{N} x_i
When the (N+1)-th data point arrives:
mean_{N+1} = (N * mean_N + x_{N+1}) / (N+1) = mean_N + (x_{N+1} - mean_N) / (N+1)
For the standard deviation, we can rewrite the variance as:
Var(X) = E(X - EX)^2 = E(X^2) - (EX)^2
Therefore we don't need to keep previous data to update the average and standard deviation
Each centroid stores only the averages of x and x^2
Return to main slides
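The update rules above translate directly into a small running-statistics class (a sketch; the class and attribute names are ours):

```python
class RunningStats:
    """Incremental mean/std sketch: keeps only the running averages
    of x and x^2, as on the slide, so no past data is stored."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0      # running E[X]
        self.mean_sq = 0.0   # running E[X^2]

    def update(self, x: float):
        self.n += 1
        # mean_{N+1} = mean_N + (x_{N+1} - mean_N) / (N+1)
        self.mean += (x - self.mean) / self.n
        self.mean_sq += (x * x - self.mean_sq) / self.n

    @property
    def std(self) -> float:
        # Var(X) = E(X^2) - (E X)^2
        return max(self.mean_sq - self.mean ** 2, 0.0) ** 0.5
```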
61
Manhattan distance

  Mdis(x, y) = Σ_{i=0}^{n−1} |x_i − y_i|

[Figure: two example byte histograms x and y]

  Example: Mdis(x, y) = Σ_{i=1}^{6} |x_i − y_i| = 23
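The metric in code (a minimal sketch):

```python
def manhattan(x, y):
    """Manhattan distance between two byte-frequency
    distributions of equal length."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# Example over three bins:
assert manhattan([1, 2, 3], [4, 0, 3]) == 5
```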
62
Example of clustering across length bins

Original centroids

Clustered centroids
Return to main slides
63
Self-calibration
 Training data is sampled
 Use a FIFO buffer to keep the most recent samples, to capture concept drift
 After training, compute the distances of the samples against the centroid and set the anomaly threshold to the maximum
 At the start of the detection phase, increase the threshold by t% if the alert rate is higher than a user-specified parameter
Return to main slides
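A sketch of this calibration logic, with illustrative names; the per-sample distances are assumed to be precomputed by the sensor:

```python
from collections import deque

class SelfCalibrator:
    """FIFO of recent training-sample distances; threshold = max
    distance among those samples, inflated by t% whenever the
    observed alert rate exceeds the user-specified target."""
    def __init__(self, capacity, t_percent, max_alert_rate):
        self.samples = deque(maxlen=capacity)  # FIFO captures concept drift
        self.t = t_percent / 100.0
        self.max_alert_rate = max_alert_rate
        self.threshold = None

    def add_sample(self, distance):
        self.samples.append(distance)

    def finish_training(self):
        self.threshold = max(self.samples)

    def adjust(self, alerts, packets):
        # At the start of detection: raise the threshold by t%
        # if the alert rate is above the target.
        if packets and alerts / packets > self.max_alert_rate:
            self.threshold *= 1.0 + self.t
```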
Semi-batch clustering for stream processing

Main idea:
 Store byte distributions of M packets
 Optimize aggregate clustering of the M packets
 Merge the resulting centroids into the existing centroids from the prior batch of data
 Can ameliorate the problem of packet ordering
 The batch size M needs to be chosen properly: a tradeoff between accuracy and memory consumption
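A minimal sketch of this idea; the greedy nearest-centroid merge below stands in for the batch optimization step, and the merge threshold is an illustrative parameter:

```python
def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

class SemiBatchClusterer:
    """Buffers M byte distributions, then folds each one into the
    nearest existing centroid (count-weighted average) or starts a
    new centroid if none is within merge_threshold."""
    def __init__(self, batch_size, merge_threshold):
        self.M = batch_size
        self.merge_threshold = merge_threshold
        self.buffer = []
        self.centroids = []  # list of (mean_vector, sample_count)

    def add(self, dist):
        self.buffer.append(dist)
        if len(self.buffer) >= self.M:
            self.flush()

    def flush(self):
        for x in self.buffer:
            self._merge(x)
        self.buffer = []

    def _merge(self, x):
        best, best_d = None, None
        for i, (c, _) in enumerate(self.centroids):
            d = manhattan(x, c)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= self.merge_threshold:
            c, n = self.centroids[best]
            new_c = [(ci * n + xi) / (n + 1) for ci, xi in zip(c, x)]
            self.centroids[best] = (new_c, n + 1)
        else:
            self.centroids.append((list(x), 1))
```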
67
One-pass clustering result. First six centroids for W dataset, length 1460
68
Semi-batch clustering result. First six centroids for W dataset, length 1460
Return to main slides
69
Performance
 Training over 3 days of data, detection over 2 days
 Data from two web servers
 Training: 29 seconds (60 Mbits/sec)
 Detection: 12 seconds (54 Mbits/sec)
 FP rate: 42 / 625,595 packets (0.006%)
 Coverage: 20 of 30 known attacks in the data detected
70
Bloom filter
 A Bloom filter (BF) is a one-way data structure that supports insert and verify operations, yet is fast and space-efficient
 Represented as a bit vector; bit b is set if h_i(e) = b, where h_i is a hash function and e is the element in question
 No false negatives, although false positives are possible in a saturated BF via hash collisions; use multiple hash functions for robustness
 Each n-gram is a candidate element to be inserted or verified in the BF
 Bloom filters are also privacy-preserving, since n-grams cannot be extracted from the resulting bit vector
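A minimal Bloom filter sketch; deriving the k hash positions from SHA-256 is an illustrative choice, not a detail prescribed by the thesis:

```python
import hashlib

class BloomFilter:
    """Bit-vector Bloom filter with k derived hash functions.
    Supports insert and verify; elements (n-grams) cannot be
    recovered from the bit vector, only tested for membership."""
    def __init__(self, size_bits=2 ** 20, num_hashes=3):
        self.m = size_bits
        self.k = num_hashes
        self.bits = bytearray(self.m // 8)

    def _positions(self, ngram: bytes):
        # One salted SHA-256 digest per hash function.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + ngram).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def insert(self, ngram: bytes):
        for b in self._positions(ngram):
            self.bits[b // 8] |= 1 << (b % 8)

    def verify(self, ngram: bytes) -> bool:
        return all(self.bits[b // 8] >> (b % 8) & 1
                   for b in self._positions(ngram))
```

In use, every n-gram of a training payload is inserted; at detection time each n-gram of a test payload is verified against the filter.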
The binary-based approach is simple and efficient, but too sensitive to noisy data

Pre-compute a bad content model using Snort rules and a collection of worm samples, to supervise the learning
 This model should match very few normal packets, while identifying malicious traffic (often, new exploits reuse portions of old exploits)
 The model contains the distinct n-grams appearing in these malcode collections
 Use a small, clean dataset to exclude the normal n-grams appearing in the Snort rules and virus samples
73
Bad content model (purple part)
N-grams in snort rules and collected malwares
N-grams in clean traffic
74
[Figure: two histograms; x-axis: Bad Content Model Matching Score, y-axis: percentage of the packets; left panel: Normal Content Packets, right panel: Attack Packets]
Distribution of bad content matching scores for normal packets (left) and attack packets (right).
The “matching score” is the percentage of the n-grams of a packet that match the bad content model
75
Use of bad content model

Training: ignore possibly malicious n-grams
 Packets whose n-grams match the bad content model beyond a maximum number are ignored
 Packets with a high matching score (>5%) are ignored, since new attacks might reuse old exploit code
 Ignoring a few packets is harmless for training

Testing: scoring separates malicious from normal
 If a never-before-seen n-gram also appears in the bad content model, give it a higher weight factor t (t = 5 in our experiments)

  Score = (N_new + t · N_new_bad) / T

where N_new counts the never-seen n-grams not in the bad content model, N_new_bad the never-seen n-grams that do match it, and T the total number of n-grams in the packet.
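A sketch of the weighted scoring; `known_bf` and `bad_bf` stand for the normal-traffic and bad-content Bloom filters, and any object with a `verify(ngram) -> bool` method works:

```python
def anagram_score(payload: bytes, n, known_bf, bad_bf, t=5):
    """Score = (N_new + t * N_new_bad) / T: new n-grams count 1,
    new n-grams that also match the bad content model count t,
    and T is the total number of n-grams in the packet."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0
    n_new = n_new_bad = 0
    for g in grams:
        if not known_bf.verify(g):
            if bad_bf.verify(g):
                n_new_bad += 1
            else:
                n_new += 1
    return (n_new + t * n_new_bad) / len(grams)
```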
Back
76
Feedback-based learning with shadow servers
 Training attacks: an attacker sends malicious data during training time to poison the model
  The bad content model cannot guarantee 100% detection
 The most reliable defense is to use the feedback of a host-based shadow server to supervise the training
 Also useful for adaptive learning to accommodate concept drift
 PAYL/Anagram can be used as a first-line classifier to amortize the expensive cost of the shadow server
  Only a small percentage of the traffic is sent to the shadow server, instead of all of it
  The feedback of the shadow server can improve the accuracy of Anagram
77
Back
78
The structure of the mimicry worm
79
The maximum possible padding length per packet for different varieties of this mimicry attack

Version (x, y):  418, 10 | 418, 100 | 730, 10 | 730, 10 | 730, 100 | 730, 100
Padding length:    125   |   149    |   437   |   461   |   1167   |   1191

Each cell in the top row contains a tuple (x, y), representing a variant sequence of y packets of x bytes each. The second row gives the maximum number of bytes that can be used for padding in each packet.
Return to main
Launched CodeRed and CodeRed II in our controlled test environment, captured the traces, and merged them into a real web server’s trace
 Simulates a real worm attacking and propagating on a real server
Interesting behavior observed about the worm:
 Propagation occurred with packets fragmented differently than the initial attack packets
 Multiple types of fragmentation
81
Different fragmentation for CR and CRII

Code Red II (total 3818 bytes)
  Incoming: 1460, 1460, 898
  Outgoing: 1448, 1448, 922

Code Red (total 4039 bytes)
  Incoming: 4, 13, 453, 1460, 1460, 649
            4, 375, 1460, 1460, 740
            4, 13, 362, 91, 1460, 1460, 649
  Outgoing: 1448, 1448, 1143
82
Results of correlation for different metrics

Metric      | Detects propagation | False alerts
SE          | No                  | No
LCS(0.5)    | Yes                 | No
LCSeq(0.5)  | Yes                 | No

The number in parentheses is the threshold setting for the similarity score used to decide whether a propagation has occurred
Return to main
83
Data Diversity
Example byte distribution for payload length 536 of port 80 for the three sites.
84
Sites   | 3 largest (length, distance) pairs
EX, W   | 1448, 0.7896 | 1460, 0.7851 | 216, 0.6241
EX, W1  | 1460, 0.9746 | 1448, 0.8731 | 536, 0.5540
W, W1   | 892, 0.7502  | 1460, 0.7456 | 1448, 0.7122

PAYL: for each pair of sites, the 3 packet lengths with the largest Manhattan distance between their byte distributions.

Dataset A    | Dataset B    | Common 5-grams | Common Perc (%)
EX (509347)  | W (953345)   | 129468         | 17.5%
EX (509347)  | W1 (974292)  | 99366          | 13.4%
W1 (974292)  | W (953345)   | 454586         | 47.2%

Anagram: the number of unique 5-grams in datasets W, W1, and EX, and the number of common 5-grams between each pair of sites.
Back
85
Testing methodology
Three sets of traffic:
 www1 and www2: Columbia webservers, 100 packets each
 Malicious packet dataset, 56 packets
 Known ground truth

Arranged into three sets of pairs:
 10,000 “good vs. good”
 1,540 “bad vs. bad”
 5,600 “good vs. bad” between www1 and the malicious dataset

Compare:
 Similarity of the approaches
 Effectiveness in correlating
 Ability to generate signatures
86
Similarity – direct string comparison
[Figure: similarity scores over 80 random packet pairs; y-axis: similarity score (0–1); curves: Raw LCS, Zstr LCS, Raw LCSeq, Raw ED, Manhattan Distance, Zstr LCSeq, Zstr ED]
High-level view of score similarities (80 random pairs)
 Most of the techniques are similar, except LCS (vulnerable to slight differences)
 ED and LCSeq are very similar
 N-gram techniques not included (they don’t compute similarity over the entire packet datagram)
To compare the differences more precisely, normalize and compare scores:
 Compute similarity score vectors V_A, V_B
 Match their medians
 Scale ranges proportionally so the min and max values match
 Manhattan distance is then computed between the vectors
 Each privacy-enabled technique is compared against Raw-LCSeq (baseline)
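A sketch of this comparison; the unit-range scaling and per-pair averaging below are illustrative choices, as the slide does not fix the exact scaling:

```python
import statistics

def normalize(v):
    """Shift so the median is 0, then scale so the range is 1."""
    med = statistics.median(v)
    shifted = [x - med for x in v]
    rng = max(shifted) - min(shifted)
    return shifted if rng == 0 else [x / rng for x in shifted]

def score_vector_distance(va, vb):
    """Average Manhattan distance between two normalized score
    vectors, e.g. a privacy-enabled technique vs. the Raw-LCSeq
    baseline over the same set of packet pairs."""
    a, b = normalize(va), normalize(vb)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```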
88
Similarity of packets (III)

Type | Raw-LCS | Raw-ED |  MD   | ZStr-LCS | ZStr-LCSeq | ZStr-ED
G-G  |  .0948  | .0336  | .0669 |  .2079   |   .0794    |  .0667
B-B  |  .0508  | .0441  | .0653 |  .0399   |   .0263    |  .0669
G-B  |  .0251  | .0241  | .0110 |  .0310   |   .0191    |  .0233
 Unsurprisingly, Raw-ED is closest to Raw-LCSeq
 All privacy-preserving methods are close when correlating pairs that include attack traffic; they may be leveraging the difference between byte distributions

Manhattan distance between packet frequency distributions:
GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0\x0d\n.
GET /notarealfile.idq?UOIRJVFJWPOIVNBUNIVUWIFOJIVNNZCIVIVIGJBMOMKRNVEWIFUVNVGFWERIOUNVUNWIUNFOWIFGITTOOWENVJSNVSFDVIRJGOGTNGTOWGTFGPGLKJFGOIRWTPOIREPTOEIGPOEWKFVVNKFVVSDNVFDSFNKVFKGTRPOPOGOPIRWOIRNNMSKVFPOSVODIOREOITIGTNJGTBNVNFDFKLVSPOERFROGDFGKDFGGOTDNKPRJNJIDH%u1234DSPPOITEBFBWEJFBHREWJFHFRG=bla HTTP/1.0\x0d\n.
 The anomalous n-grams of a suspicious payload are stored in a Bloom filter and exchanged among sites
 By checking the n-grams of local alerts against the Bloom-filter alert, it is easy to tell how similar the alerts are to each other
  The common malicious n-grams can be used for general signature generation, even for polymorphic worms
 Privacy-preserving, with no loss of accuracy
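A sketch of the cross-site check; `remote_bf` stands for the Bloom-filter alert received from another site, and only needs a `verify(ngram) -> bool` method:

```python
def alert_similarity(local_ngrams, remote_bf):
    """Fraction of a local alert's anomalous n-grams that are
    present in a remote site's Bloom-filter alert. Raw n-grams
    are never exchanged -- only the bit vector is, so the check
    is privacy-preserving."""
    if not local_ngrams:
        return 0.0
    hits = sum(1 for g in local_ngrams if remote_bf.verify(g))
    return hits / len(local_ngrams)
```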
 Anagram not only detects suspicious packets, it also identifies the corresponding malicious n-grams!
 These n-grams are good targets for further analysis and signature generation
 The set of n-grams is order-independent, so attack-vector reordering will fail.
92
Anagram flattened signature for attack
GET /modules/Forums/admin/admin_styles.php?phpbb_root_path=http://81.174.26.111/cmd.gif?&cmd=cd%20/tmp;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo|..HTTP/1.1.Host:.128.59.16.26.User‑Agent:.Mozilla/4.0.(compatible;.MSIE.6.0;.Windows.NT.5.1;)..