Automated Signature Extraction for High Volume Attacks Yehuda Afek Anat Bremler-Barr Shir Landau Feibish This work is part of the Kabarnit–Cyber Consortium (2012-2014) under Magnet program, funded by the chief scientist in the Israeli ministry of Industry, Trade and Labor. This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.

Automated Signature Extraction for High Volume Attacks

Feb 23, 2016

MICHEAL HOFER

Page 1: Automated Signature Extraction for High Volume Attacks

Automated Signature Extraction for High Volume Attacks

Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish

This work is part of the Kabarnit–Cyber Consortium (2012-2014) under Magnet program, funded by the chief scientist in the Israeli ministry of Industry, Trade and Labor. This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.

Page 2: Automated Signature Extraction for High Volume Attacks

Current DDoS Attack

Zombies on innocent computers

Server-level DDoS attacks

Infrastructure-level DDoS attacks

Bandwidth-level DDoS attacks

Page 3: Automated Signature Extraction for High Volume Attacks

High volume attacks - Current Defense

Many different types of attackers

Defense lines (Defense Line 1, Defense Line 2, Defense Line 3, ..., Defense Line n), for example:

access control list filtering

behavioral analysis

SYN cookies, challenge-response

Remaining attacks:

Botnets (millions of computers)

Hard to identify behaviorally, under the radar screen

Zero-day: no known signatures

… Call for HELP!!

Page 4: Automated Signature Extraction for High Volume Attacks

Signature based DDoS Attack Detection

Unknown (zero-day) attacks:

Some hope: attack tools usually leave some unique footprint (a repeating pattern)

Example in packet: Connection: KEEP-ALIVE

Today: signatures are found manually (by human eye)

Our goal: find them automatically

Signatures are used by anti-DDoS devices and firewalls to stop the attack

Mitigation in minutes is good enough for these types of attacks

Page 5: Automated Signature Extraction for High Volume Attacks

Signatures are also used in NIDS/IPS (Snort, Bro, etc.)

Worm detection (automated extraction)

Previous work:

Worm behavior (address dispersion, suspicious code, etc.)

Fixed-length signatures

Non-scalable

Notable works:

Kephart et al. '94

Honeycomb [Kreibich et al. '04]

Earlybird [Singh et al. '04]

Autograph [Kim et al. '04]

Hancock [Griffin et al. '09]

Page 6: Automated Signature Extraction for High Volume Attacks

System Overview

Our Challenge: Automatically find signatures that appear frequently only during attack

Where (input collection):

In mitigation box (DDoS Guard/firewall/anti-DDoS etc.)

In the cloud: collect data from several collectors

Signature Extraction

Input: attack time traffic sample, peace time traffic sample

Output: attack signatures, e.g. Connection: KEEP-ALIVE

Page 7: Automated Signature Extraction for High Volume Attacks

Signature Extraction - High Level

Input: attack time traffic sample, peace time traffic sample

Output: attack signatures, e.g. Connection: KEEP-ALIVE

Signature Extraction:

Find frequent strings in attack time traffic

Find frequent strings in peace time traffic

Take only strings found in attack and not in peace
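
The three steps above can be sketched as a set difference. This is only an illustration under simplifying assumptions: "frequent strings" are approximated here by whole header lines counted exactly, not by the heavy-hitters technique the talk develops later, and the names `frequent_lines`, `extract_signatures`, `min_fraction` and the sample packets are hypothetical.

```python
from collections import Counter


def frequent_lines(packets, min_fraction):
    """Return the lines that occur in at least min_fraction of the packets."""
    counts = Counter()
    for payload in packets:
        # Count each distinct line once per packet.
        counts.update(set(payload.split("\r\n")))
    return {
        line
        for line, c in counts.items()
        if line and c / len(packets) >= min_fraction
    }


def extract_signatures(attack_packets, peace_packets, min_fraction=0.9):
    """Keep only strings that are frequent in attack traffic and not in peace traffic."""
    attack = frequent_lines(attack_packets, min_fraction)
    peace = frequent_lines(peace_packets, min_fraction)
    return sorted(attack - peace)


# Made-up samples: the attack tool sends the header value in upper case.
attack_sample = [
    "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: KEEP-ALIVE",
    "GET /a HTTP/1.1\r\nHost: example.com\r\nConnection: KEEP-ALIVE",
]
peace_sample = [
    "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive",
    "GET /b HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive",
]
signatures = extract_signatures(attack_sample, peace_sample)
```

With these samples the only string that is frequent during the attack and absent from peace time traffic is the upper-case header.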

Page 8: Automated Signature Extraction for High Volume Attacks

Our Goal

Automatically find signatures that appear frequently only during attack

Requirements:

1. Find minimal set of signatures
   Some filtering devices have limited capacity

2. Allow signatures of varying lengths

3. Don't include signatures found in legitimate traffic
   Minimum false positives

4. Minimize space and time usage
   Large amounts of data
   Quick response

Page 9: Automated Signature Extraction for High Volume Attacks

Finding Frequent Strings in Traffic

Input: Sequence of packets

Output: Strings that appear frequently in packets

Common stringology solution: use suffix trees/arrays (too much space)

Our solution uses heavy hitters

[Diagram: attack time traffic sample and peace time traffic sample; find frequent strings in attack time traffic; find frequent strings in peace time traffic; take only strings found in attack and not in peace; output attack signatures, e.g. Connection: KEEP-ALIVE]

Page 10: Automated Signature Extraction for High Volume Attacks

Heavy Hitters (Frequent Items)

Input: N values, integer v

Output: the values that each appear at least N/v times (at most v of them)

Approximate solution:

Uses O(v) space!

One pass over input!

Known counter-based HH algorithms:

Misra & Gries 1982

Lossy Counting - Manku and Motwani 2002

Space Saving - Metwally et al. 2005 (currently using)

Page 11: Automated Signature Extraction for High Volume Attacks

Space saving Heavy Hitters [Metwally et al 2005]

Algorithm:

Maintain v values, and their counters.

counter  value
1        10
1        22
1        30

Input: 10 22 30 10 35 50

Page 12: Automated Signature Extraction for High Volume Attacks

Space saving Heavy Hitters [Metwally et al 2005]

Algorithm:

Maintain v values, and their counters.

If next value x is one of the v, increment its counter.

counter  value
2        10
1        22
1        30

Input: 10 22 30 10 35 50

Page 13: Automated Signature Extraction for High Volume Attacks

Space saving Heavy Hitters [Metwally et al 2005]

Algorithm:

Maintain v values, and their counters.

If next value x is one of the v, increment its counter.

Else take item with minimal counter c:

Replace value with x

New counter is c+1

Error rate: N/v

counter  value
2        10
2        35
1        30

Input: 10 22 30 10 35 50
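
The update rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `SpaceSaving` and its fields are hypothetical, and the minimum is found by a linear scan over the v counters (the published algorithm uses a more efficient structure). Ties on the minimal counter are broken by insertion order, which happens to match the table on the slide.

```python
class SpaceSaving:
    """Keep at most `capacity` values with (over)estimated counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # value -> counter

    def update(self, item):
        if item in self.counts:
            # Value is already tracked: increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free slot: start tracking the value.
            self.counts[item] = 1
        else:
            # Replace the value with the minimal counter c; new counter is c + 1.
            victim = min(self.counts, key=self.counts.get)
            c = self.counts.pop(victim)
            self.counts[item] = c + 1


hh = SpaceSaving(capacity=3)
states = []
for value in [10, 22, 30, 10, 35, 50]:
    hh.update(value)
    states.append(dict(hh.counts))  # snapshot after each input value
```

Here `states[3]` and `states[4]` correspond to the tables shown after the second 10 and after 35 arrive.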

Page 14: Automated Signature Extraction for High Volume Attacks

Our Solution

Heavy hitters usually done on numbers… how do we use it for text?

k-grams: strings of length exactly k

Trivial idea: for each packet, take all k-grams (sliding window) and do heavy hitters on them

Fixed length is not good enough:

Either too short: cuts up longer signatures (substring pollution: too many heavy hitters for one signature)

Or too long: noisy signatures

Example payload: abcabcadefgfsdghjghnfdghfgsdhfjs

k-grams: b1 = abca, b2 = bcab, b3 = cabc
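
The sliding window and the substring-pollution problem can be illustrated with a short sketch. The function name `kgrams` is hypothetical, and exact counting with `Counter` stands in for a heavy-hitters structure.

```python
from collections import Counter


def kgrams(payload, k):
    """Return all substrings of length exactly k, in sliding-window order."""
    return [payload[i:i + k] for i in range(len(payload) - k + 1)]


# The first three 4-grams of the payload shown on the slide.
first_three = kgrams("abcabcadefgfsdghjghnfdghfgsdhfjs", 4)[:3]

# Substring pollution: the single repeated pattern "abc" in this payload
# produces three distinct 4-grams that are all equally frequent.
counts = Counter(kgrams("abcabcabcd", 4))
```

A payload shorter than k yields no k-grams, since the range is empty.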

Page 15: Automated Signature Extraction for High Volume Attacks

Our Solution: Double Heavy Hitters

Double Heavy Hitters algorithm: two separate instances of heavy hitters

Heavy Hitters 1: Find heavy hitters of k-grams

Heavy Hitters 2: Find heavy hitters of varying-length strings created during run of Heavy Hitters 1

Input to Heavy Hitters 1: k-grams

Input to Heavy Hitters 2: strings

Output is output of Heavy Hitters 2

Page 16: Automated Signature Extraction for High Volume Attacks

Double Heavy Hitters Algorithm

While processing k-grams in Heavy Hitters 1, find max run of k-grams:

Already in Heavy Hitters 1

Counters of consecutive k-grams maintain predefined ratio

Create string

Insert into Heavy Hitters 2

[Diagram: a stream of k-grams (abca, bcab, cabc, abcd, bcda, cdab, dabc, abca, ...), each marked Y or N for "Is already in Heavy Hitters 1?"; a run of k-grams abca, bcab, cabc goes through "Check ratio" and forms the string abcabc]

Page 17: Automated Signature Extraction for High Volume Attacks

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams (b_i): b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
1        abca
1        bcab
1        cabc

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

Page 18: Automated Signature Extraction for High Volume Attacks

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams (b_i): b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
2        abca
1        bcab
1        cabc

String = abca

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

Page 19: Automated Signature Extraction for High Volume Attacks

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams (b_i): b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
2        abca
2        bcab
1        cabc

String = abcab

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

Page 20: Automated Signature Extraction for High Volume Attacks

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams (b_i): b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
2        abca
2        bcab
2        cabc

String = abcabc

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

Page 21: Automated Signature Extraction for High Volume Attacks

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams (b_i): b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
3        abcd
2        bcab
2        cabc

String = abcabc

Heavy Hitters 2
counter  string
1        abcabc
0        NULL
0        NULL
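
The example run on the input abcabcabcd can be reproduced with a short sketch. It is written under stated assumptions and is not the authors' code: the names `SpaceSaving` and `double_heavy_hitters` are hypothetical, the "predefined ratio" test is assumed to compare the smaller to the larger counter of two consecutive k-grams (after the current increment) against a `ratio` parameter, and a run is flushed to Heavy Hitters 2 when it is broken or the packet ends.

```python
class SpaceSaving:
    """Minimal Space Saving counter set (linear-scan minimum)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def update(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1


def double_heavy_hitters(packets, k, capacity, ratio=0.5):
    hh1 = SpaceSaving(capacity)  # heavy hitters of k-grams
    hh2 = SpaceSaving(capacity)  # heavy hitters of varying-length strings
    for payload in packets:
        run, prev = "", 0  # current string and the counter of its last k-gram
        for i in range(len(payload) - k + 1):
            gram = payload[i:i + k]
            seen = gram in hh1.counts  # "Is already in Heavy Hitters 1?"
            if not seen and run:
                hh2.update(run)  # run is broken: insert the string into HH2
                run = ""
            hh1.update(gram)
            cur = hh1.counts[gram]
            if seen:
                if run and min(prev, cur) / max(prev, cur) >= ratio:
                    run += gram[-1]  # extend the string by one character
                else:
                    if run:
                        hh2.update(run)  # ratio check failed: flush, restart
                    run = gram
                prev = cur
        if run:
            hh2.update(run)  # flush the run that reaches the end of the packet
    return hh1, hh2


hh1, hh2 = double_heavy_hitters(["abcabcabcd"], k=4, capacity=3)
```

With three counters per instance, the final state matches the last step of the example: Heavy Hitters 1 holds abcd, bcab and cabc, and Heavy Hitters 2 holds the single string abcabc.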

Page 22: Automated Signature Extraction for High Volume Attacks

Heavy Hitters on text - improving the estimation

Problem: substrings in heavy hitters

Only longest run is in input to HH2

Correct the count: after run of algorithm, for all strings s in Heavy Hitters 2, find other strings which contain s and add their counters to s's counter

Heavy Hitters 2 (before correction)
counter  string
200      wonder
300      woman
100      wonderwoman

Heavy Hitters 2 (after correction)
real counter  counter  string
300           200      wonder
400           300      woman
100           100      wonderwoman
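
The correction step can be sketched as follows, using the counters from the table. The function name `correct_counts` is hypothetical; the sketch adds the original (uncorrected) counters of every other string that contains s, so corrections do not cascade.

```python
def correct_counts(counts):
    """For each string s, add the counters of the other strings that contain s."""
    return {
        s: c + sum(
            other_c
            for other, other_c in counts.items()
            if other != s and s in other
        )
        for s, c in counts.items()
    }


corrected = correct_counts({"wonder": 200, "woman": 300, "wonderwoman": 100})
```

Both wonder and woman are substrings of wonderwoman, so each gains its 100 occurrences, while wonderwoman itself is unchanged.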

Page 23: Automated Signature Extraction for High Volume Attacks

Double Heavy Hitters Algorithm Analysis

Input:

Input to HH1: N k-grams

Input to HH2: C consecutive grams

Error bounds:

For HH1 with v items: N/v

For HH2 with v items: C/v

We prove: C ≤ N/(k + 1)

Overall: Error bound of the Double Heavy Hitters algorithm

Page 24: Automated Signature Extraction for High Volume Attacks

Signature Extraction - High Level

Formalize with thresholds

Input: attack time traffic sample, peace time traffic sample

Output: attack signatures, e.g. Connection: keep-ALIVE

Signature Extraction:

Find frequent strings in attack time traffic

Find frequent strings in peace time traffic

Take only strings found in attack and not in peace

Page 25: Automated Signature Extraction for High Volume Attacks

Choose Signatures

Create signatures that never appear in legitimate traffic

Strings in attack with frequency > Attack-High

Thresholds: Attack-high, Peace-low, Peace-high, Delta

Page 26: Automated Signature Extraction for High Volume Attacks

Choose Signatures

Create signatures that never appear in legitimate traffic

Strings in attack with frequency > Attack-High

Strings in peace time

Signatures

False positives

Thresholds: Attack-high, Peace-low, Peace-high, Delta

Page 27: Automated Signature Extraction for High Volume Attacks

Choose Signatures

Create signatures that rarely appear in legitimate traffic

Strings in attack with frequency > Attack-High

Strings in peace with frequency > Peace-Low

Signatures

False positives

Thresholds: Attack-high, Peace-low, Peace-high, Delta

Page 28: Automated Signature Extraction for High Volume Attacks

Choose Signatures

Create signatures that may appear in legitimate traffic, but appear in attack traffic much more

Strings in attack with frequency > Attack-High

Strings in peace with frequency > Peace-high: False positives

Strings in peace with frequency > Peace-Low: Signatures only if attack frequency at least delta more than peace frequency

Remaining strings in attack: Signatures

Thresholds: Attack-high, Peace-low, Peace-high, Delta
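
The four thresholds can be combined into one selection rule. The sketch below is one possible reading, not the authors' code: frequencies are assumed to be fractions in [0, 1], "at least delta more" is assumed to mean an absolute difference, whitelist lookup is by exact match, and the function name `choose_signatures` and all sample values are made up.

```python
def choose_signatures(attack_freq, peace_freq, attack_high, peace_low,
                      peace_high, delta):
    """Pick strings frequent in attack traffic and rare enough in peace traffic."""
    signatures = []
    for s, fa in attack_freq.items():
        if fa <= attack_high:
            continue  # not frequent enough during the attack
        fp = peace_freq.get(s, 0.0)
        if fp > peace_high:
            continue  # common in legitimate traffic: would cause false positives
        if fp > peace_low and fa - fp < delta:
            continue  # in between: keep only if attack exceeds peace by delta
        signatures.append(s)
    return sorted(signatures)


# Made-up frequencies for illustration only.
attack = {
    "Connection: KEEP-ALIVE": 0.90,
    "Host: ": 0.95,
    "Accept: */*": 0.80,
    "X-Rare: 1": 0.10,
}
peace = {"Host: ": 0.99, "Accept: */*": 0.30}
chosen = choose_signatures(
    attack, peace, attack_high=0.5, peace_low=0.05, peace_high=0.6, delta=0.4
)
```

In this made-up input the Host prefix is rejected because it is common in peace time, the rare header is rejected because it is not frequent in the attack, and the Accept header passes only because its attack frequency exceeds its peace frequency by more than delta.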

Page 29: Automated Signature Extraction for High Volume Attacks

Use peace traffic to create filters

Use our Double Heavy Hitters algorithm on peace time traffic:

Peace time traffic packets payload: abcabcadefgfsdghjghnfdghfg......

k-grams: b1 = abca, b2 = bcab, b3 = cabc, ……

Output values, classified by frequency (scale 0%, 50%, 100%, with thresholds Peace-low and Peace-high):

White list: frequency > Peace-high

Maybe white list: frequency > Peace-Low

Not white list

Page 30: Automated Signature Extraction for High Volume Attacks

Extracting Attack Signatures

Now use Double Heavy Hitters algorithm on attack time traffic with filters (Modified DHH)

Attack traffic packets payload: hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas

k-grams: b1 = hagd, b2 = agdh, b3 = gdhd, ……

Heavy Hitters 1, then string, then Heavy Hitters 2, then output values

White list: discard if contained in whitelist string

Maybe white list:

Output values with frequency > Attack-High: Signatures

Page 31: Automated Signature Extraction for High Volume Attacks

Evaluations

Overall eleven tests:

Ten real attack captures
  5 captures of peacetime traffic
  5 synthetic peacetime captures

One synthetic attack in real peace time traffic

Compare to human expert

Page 32: Automated Signature Extraction for High Volume Attacks

Sample Signatures

Extra newline between header fields

Use of upper-case characters, where usually lower

Use of a rarely used HTTP field

Use of rare user agent

Could not be identified manually

Page 33: Automated Signature Extraction for High Volume Attacks

Results - Accuracy of Double Heavy Hitters estimation

Graph of frequency of signatures

RED - Actual count (frequency) in attack traffic

BLUE - Algorithm (DHH) estimation of frequency of signatures

[Graph: x-axis Signatures (1 to 37), y-axis Percent (0 to 100); series: Algorithm (DHH), Actual Count (frequency)]

Page 34: Automated Signature Extraction for High Volume Attacks

Results - Attack Rate Estimation

[Chart: x-axis Test Number (1 to 9), y-axis Attack rate (0 to 100); groups: tests with real peace time traffic, tests with synthetic peace time traffic; legend includes "Human Ex..."]

Page 35: Automated Signature Extraction for High Volume Attacks

Results - Recall and Precision Estimation

[Chart: x-axis Test Number (1 to 11), y-axis Percent (0 to 100); groups: tests with real peace time traffic, tests with synthetic peace time traffic; legend includes "Peacetime ba..."]

Precision: relevant packets from all identified

Recall: identified packets from all relevant

Average: 99.96

Worst case: 99.8

Page 36: Automated Signature Extraction for High Volume Attacks

Future Work

Identify signatures always found in same packets

Good synthetic peace-time traffic, global white-list

Support regular expression signatures

Page 37: Automated Signature Extraction for High Volume Attacks
