Intrusion Detection and Malware Analysis Automatic signature generation Pavel Laskov Wilhelm Schickard Institute for Computer Science

Feb 11, 2022

Page 1: Intrusion Detection and Malware Analysis - Automatic signature

Intrusion Detection and Malware Analysis
Automatic signature generation

Pavel Laskov
Wilhelm Schickard Institute for Computer Science

Page 2: Intrusion Detection and Malware Analysis - Automatic signature

The quest for attack signatures

Post-mortem: security research, computer forensics
Reactive: analysis of anomalies (forensic sinks)
Proactive: acquisition and analysis of malicious data

Page 3: Intrusion Detection and Malware Analysis - Automatic signature

A general framework for ASG

Clustering: finding groups of similar malicious events
Token extraction: finding common patterns in malicious data
Signature assembly: assessment of extracted tokens

Page 4: Intrusion Detection and Malware Analysis - Automatic signature

Signature format

A set of tokens t1, . . . , tn
A set of support values ν1, . . . , νn
A threshold θ
Evaluation rule:

\[ \sum_{i=1}^{n} \nu_i \, M(t_i, s) > \theta, \]

where

\[ M(t_i, s) = \begin{cases} 1 & \text{if } t_i \text{ is present in a string } s \\ 0 & \text{otherwise} \end{cases} \]
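The evaluation rule above can be sketched in a few lines of Python. This is only an illustration: the class name `Signature`, its field names, and the sample tokens are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signature:
    """Hypothetical container for tokens t_i, supports nu_i and threshold theta."""
    tokens: List[bytes]
    supports: List[float]
    threshold: float

    def score(self, s: bytes) -> float:
        # Sum of nu_i * M(t_i, s), where M is 1 if t_i occurs in s and 0 otherwise.
        return sum(nu for t, nu in zip(self.tokens, self.supports) if t in s)

    def matches(self, s: bytes) -> bool:
        # The signature fires when the weighted sum strictly exceeds theta.
        return self.score(s) > self.threshold


# Made-up tokens and weights, purely for illustration.
sig = Signature(tokens=[b"GET /", b"cmd.exe"], supports=[0.5, 0.5], threshold=0.75)
```

With these made-up values, a string must contain both tokens to exceed the threshold; a string containing only one of them scores 0.5 and is not flagged.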

Page 5: Intrusion Detection and Malware Analysis - Automatic signature

Signature examples

Banload keylogger:

Storm worm:

Page 6: Intrusion Detection and Malware Analysis - Automatic signature

Invariance as a main principle of ASG

Invariance is inherent in attacks due to the extremely specific nature of exploits.
Diversity makes signatures more general and accurate.
Too much diversity makes signatures smaller and leads to false positives.

Page 7: Intrusion Detection and Malware Analysis - Automatic signature

Token extraction: basic definitions

A token is a substring found in malicious content that satisfies pre-defined empirical conditions, such as:
    minimal length
    minimal support: percentage of malicious events it occurs in
Two tokens are said to be distinct if neither is a substring of the other.
A token s that is a substring of another token t is ignored unless it satisfies the tokenization conditions at occurrences that are not part of t.

Page 8: Intrusion Detection and Malware Analysis - Automatic signature

Distinct token examples

s1 = “ddddfddf”, s2 = “ddddedde”
    Distinct tokens: “dddd”, “dd”
s1 = “dddfdddf”, s2 = “dddeddde”
    Distinct tokens: “ddd”
s1 = “abcbabcbaba”, s2 = “abcbabxbaby”
    Distinct tokens: “abcbab”, “ba”
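The three examples above can be reproduced with a brute-force sketch. It is not the suffix-tree method of the lecture: it enumerates all substrings, processes candidates from longest to shortest, and keeps a candidate only if enough strings contain an occurrence that does not lie inside an occurrence of an already accepted token. That reading of "not part of t", as well as the function names and parameters, are assumptions made for illustration.

```python
def occurrences(tok, s):
    """Start positions of all (possibly overlapping) occurrences of tok in s."""
    return [i for i in range(len(s) - len(tok) + 1) if s.startswith(tok, i)]


def distinct_tokens(strings, min_len=1, min_support=1.0):
    need = min_support * len(strings)
    # Candidates: substrings of sufficient length that occur in enough strings.
    cands = {s[i:j] for s in strings
             for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}
    cands = [t for t in cands if sum(t in s for s in strings) >= need]

    accepted = []
    # Longest first, so every possible superstring is decided before its substrings.
    for tok in sorted(cands, key=lambda t: (-len(t), t)):
        free = 0  # number of strings with an occurrence outside accepted tokens
        for s in strings:
            covered = [(i, i + len(a)) for a in accepted for i in occurrences(a, s)]
            if any(not any(lo <= i and i + len(tok) <= hi for lo, hi in covered)
                   for i in occurrences(tok, s)):
                free += 1
        if free >= need:
            accepted.append(tok)
    return accepted
```

Calling `distinct_tokens(["abcbabcbaba", "abcbabxbaby"])` returns `["abcbab", "ba"]`: "bab" is rejected because in s1 both of its occurrences lie inside occurrences of "abcbab", while "ba" has an occurrence in each string that "abcbab" does not cover.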

Page 12: Intrusion Detection and Malware Analysis - Automatic signature

Token extraction: basic algorithm

Traverse a generalized suffix tree (GST) from top to bottom.
For each node, output its path from the root if its depth is at least Lmin and the number of non-zero entries in its leaf count is at least νn, where n is the number of input strings.
Output the percentage of non-zero entries in its leaf count as the token support.
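The traversal above can be sketched as follows. To keep the example short, it builds a naive, uncompressed suffix trie instead of a true GST and stores, per node, the set of strings passing through it in place of a leaf count vector. These substitutions, the function names, and the treatment of both thresholds as inclusive ("at least") are assumptions for illustration; a real implementation would use a linear-time suffix tree construction.

```python
def build_suffix_trie(strings):
    """Naive suffix trie; each node records which input strings reach it."""
    root = {"children": {}, "ids": set()}
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "ids": set()})
                node["ids"].add(sid)
    return root


def extract_tokens(strings, min_len=1, min_support=1.0):
    """Return {token: support} for all paths meeting both thresholds."""
    n = len(strings)
    tokens = {}
    stack = [(build_suffix_trie(strings), "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node["children"].items():
            # Support never grows with depth, so weak branches can be pruned.
            if len(child["ids"]) >= min_support * n:
                label = path + ch
                if len(label) >= min_len:
                    tokens[label] = len(child["ids"]) / n
                stack.append((child, label))
    return tokens
```

For the input strings "abbaa" and "baaaa" with Lmin = 1 and ν = 100%, the sketch yields the tokens "a", "aa", "b", "ba" and "baa", each with support 1.0.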

Page 13: Intrusion Detection and Malware Analysis - Automatic signature

Token extraction example

Input: strings “abbaa” and “baaaa”, Lmin = 1, ν = 100%

Output: “a”, “aa”, “b”, “ba”, “baa”

[Figure: generalized suffix tree for “abbaa#” and “baaaa$” with per-node counts]

Page 18: Intrusion Detection and Malware Analysis - Automatic signature

Open issues in the basic algorithm

How can we define unique “end-of-string” markers for a full alphabet of byte values?
    Escaping
    Special encoding: extra bytes (bits)
How can we avoid generation of non-distinct tokens?
    Post-processing
    Complex suffix tree accounting
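One possible reading of the "special encoding" option is to widen the alphabet: every byte keeps its value 0..255, and the k-th string receives the terminator symbol 256 + k, which cannot collide with any payload byte. The function name and the choice of integers as symbols are assumptions made for this sketch.

```python
def encode_with_terminators(strings):
    """Map each byte string to a list of ints and append a unique end marker.

    Payload symbols stay in 0..255; the marker of string k is 256 + k,
    so markers are unique and never clash with real byte values.
    """
    return [list(s) + [256 + k] for k, s in enumerate(strings)]


encoded = encode_with_terminators([b"\x00\xff", b"\xff"])
```

The price of this encoding is that symbols no longer fit into a single byte, which is the "extra bytes (bits)" trade-off named on the slide.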

Page 20: Intrusion Detection and Malware Analysis - Automatic signature

Auxiliary node data

Let N be a given internal node of a GST and L_N be the label path from the root to N.

Leaf index (LI) is a set of suffixes that are descendants of a given node (characterized by their positions in a string).
Prefix leaf index (PLI) is a set of suffixes containing distinct tokens for which L_N is a prefix.
Suffix leaf index (SLI) is a set of suffixes containing distinct tokens for which L_N is a suffix.

Page 21: Intrusion Detection and Malware Analysis - Automatic signature

Exemplary GST with leaf indices

Input: strings “abbaa” and “baaaa”

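[Figure: generalized suffix tree for “abbaa#” and “baaaa$”, internal nodes annotated with their leaf indices]

Leaf indices of the internal nodes, written as [positions in “abbaa” | positions in “baaaa”]:
    “a”:   [1 4 5 | 2 3 4 5]
    “b”:   [2 3 | 1]
    “aa”:  [4 | 2 3 4]
    “baa”: [3 | 1]
    “aaa”: [∅ | 2 3]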

Page 22: Intrusion Detection and Malware Analysis - Automatic signature

Sufficient condition for token distinctness

Theorem
A label path L_N is a distinct token if at least νn components in the set LI \ (PLI ∪ SLI) are non-empty.

Page 23: Intrusion Detection and Malware Analysis - Automatic signature

Token extraction with distinctness check

Traverse the suffix tree in reverse label depth order, performing the following steps for each node:

1. Compute B = PLI ∪ SLI.
2. Compute T = LI \ B. If at least νn components of T are non-empty:
       B = B ∪ T
       report a distinct token
3. Propagate B to the PLI of the parent node.
4. Propagate B, shifted by one, to the SLI of the suffix link node (if one exists).

Page 24: Intrusion Detection and Malware Analysis - Automatic signature

Signature assembly

Goal: remove tokens that frequently occur in normal traffic.
Rules for removal:
    ν−(ti) > ν+(ti)
    ν−(ti) > νmax
Underlying problem: set matching.
Algorithms:
    Knuth-Morris-Pratt: O(k(n + M))
    Aho-Corasick: O(k + n + M)
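The two removal rules can be sketched as a simple filter. The sketch assumes that ν+(ti) is the support of a token in malicious data and ν−(ti) its support in benign data, and it uses Python's built-in substring test instead of Aho-Corasick set matching. The function names, the default value of `nu_max`, and the sample traffic are hypothetical.

```python
def negative_support(token, benign):
    """Fraction of benign samples that contain the token (nu-minus)."""
    return sum(token in s for s in benign) / len(benign)


def assemble(tokens, benign, nu_max=0.1):
    """Keep only tokens that are rare in benign traffic.

    tokens maps each token to its support nu-plus in malicious data.
    """
    kept = {}
    for tok, nu_pos in tokens.items():
        nu_neg = negative_support(tok, benign)
        # Rule 1: more frequent in benign than in malicious data.
        # Rule 2: more frequent in benign data than the tolerated maximum.
        if nu_neg > nu_pos or nu_neg > nu_max:
            continue
        kept[tok] = nu_pos
    return kept


# Made-up samples, purely for illustration.
benign = [b"GET / HTTP/1.1", b"POST /a HTTP/1.1", b"GET /b HTTP/1.0"]
candidates = {b"HTTP/1.1": 1.0, b"GET": 0.5, b"cmd.exe": 0.9}
signature_tokens = assemble(candidates, benign)
```

Here only `b"cmd.exe"` survives: the other two tokens occur in two thirds of the benign samples. The naive substring test scans every sample once per token; a multi-pattern matcher such as Aho-Corasick avoids that repeated scanning.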

Page 25: Intrusion Detection and Malware Analysis - Automatic signature

Signature refinement

Given the set of token/support pairs {(t1, ν1), . . . , (tk, νk)}, signature refinement consists of the following steps:

Normalization: support values are normalized so that they add up to 1:

\[ \nu_i \leftarrow \frac{\nu_i}{\sum_{j=1}^{k} \nu_j} \]

Calibration: the threshold θ is calibrated on benign data so as not to exceed some maximal false positive rate.
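Both refinement steps can be sketched as follows. The calibration strategy shown here, picking the lowest observed benign score that keeps the false positive rate within the limit, is only one plausible choice and is an assumption, as are the function names and the default `max_fpr`.

```python
def normalize(supports):
    """Rescale the support values so that they add up to 1."""
    total = sum(supports)
    return [nu / total for nu in supports]


def score(tokens, supports, s):
    """Weighted sum of the supports of all tokens present in s."""
    return sum(nu for t, nu in zip(tokens, supports) if t in s)


def calibrate(tokens, supports, benign, max_fpr=0.01):
    """Choose theta so that at most max_fpr of benign samples score above it."""
    scores = sorted((score(tokens, supports, s) for s in benign), reverse=True)
    # Number of benign samples that may still (wrongly) exceed the threshold.
    allowed = min(int(max_fpr * len(scores)), len(scores) - 1)
    return scores[allowed]


# Made-up tokens and benign samples, purely for illustration.
tokens = [b"aa", b"bb"]
supports = normalize([3.0, 1.0])
benign = [b"xx", b"bb", b"xx", b"xx"]
theta = calibrate(tokens, supports, benign, max_fpr=0.0)
```

Because the evaluation rule uses a strict inequality, setting θ to a benign score guarantees that this sample and all lower-scoring ones are not flagged.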

Page 26: Intrusion Detection and Malware Analysis - Automatic signature

Lessons learned

Automatic signature generation enables one to quickly extract signatures from samples of malicious and benign traffic.
Careful choice of algorithms and data structures is important for the practical feasibility of ASG.
ASG enables some very interesting applications to malware analysis, especially detection of malware communication.

Page 27: Intrusion Detection and Malware Analysis - Automatic signature

Recommended reading

D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.

Konrad Rieck, Guido Schwenk, Tobias Limmer, Thorsten Holz, and Pavel Laskov. Botzilla: Detecting the "phoning home" of malicious software. In Proc. of 25th ACM Symposium on Applied Computing (SAC), March 2010. (to appear).