Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Publisher : ANCS’ 06 Author : Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman and Randy H. Katz Presenter : Yu-Hsiang Wang Date : 2010/11/17 1

Jan 04, 2016

Transcript
Page 1: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Publisher: ANCS '06
Author: Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman and Randy H. Katz
Presenter: Yu-Hsiang Wang
Date: 2010/11/17

Page 2: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Outline

• Introduction
• DFA Analysis for Individual Regular Expression
• Regular Expression Rewrites
• Regular Expressions Grouping
• Evaluation results

Page 3: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Introduction

• A theoretical worst-case study [14] shows that a single regular expression of length n can be expressed as an NFA with O(n) states. When the NFA is converted into a DFA, it may generate O(Σ^n) states.
(Σ: a finite set of input symbols, 2^8 symbols from the ASCII code)

• The processing complexity for each character in the input is O(1) in a DFA, but O(n^2) in an NFA when all n states are active at the same time.

Page 4: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Introduction

• To handle m regular expressions, two choices are possible:
- processing them individually in m automata: O(m) processing per input character
- compiling the m regular expressions into a composite DFA: O(1) processing per input character
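The two choices can be sketched with toy automata. The two hand-written DFAs below (one for strings containing "ab", one for strings containing "bb", over the alphabet {a, b}) are assumptions made for illustration and do not come from the slides. Running them side by side costs one table lookup per automaton per character, while the composite DFA built by the product construction costs a single lookup per character.

```python
# Hypothetical toy DFAs over the alphabet {"a", "b"}.
# Each DFA is a tuple: (start_state, accepting_states, transition_table).
DFA_AB = (0, {2}, {  # accepts strings containing "ab"
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
})
DFA_BB = (0, {2}, {  # accepts strings containing "bb"
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 0, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
})


def run_individually(dfas, text):
    """m automata: one lookup per automaton per character, i.e. O(m)."""
    states = [d[0] for d in dfas]
    for ch in text:
        states = [d[2][(s, ch)] for s, d in zip(states, dfas)]
    return [s in d[1] for s, d in zip(states, dfas)]


def build_composite(dfas, alphabet):
    """Product construction over the reachable tuples of component states."""
    start = tuple(d[0] for d in dfas)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        cur = todo.pop()
        for ch in alphabet:
            nxt = tuple(d[2][(s, ch)] for s, d in zip(cur, dfas))
            delta[(cur, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, delta, seen


def run_composite(composite, dfas, text):
    """Composite DFA: a single lookup per character, i.e. O(1)."""
    start, delta, _ = composite
    state = start
    for ch in text:
        state = delta[(state, ch)]
    return [s in d[1] for s, d in zip(state, dfas)]


DFAS = [DFA_AB, DFA_BB]
COMPOSITE = build_composite(DFAS, "ab")
```

The price of the single lookup is memory: the composite automaton can have up to the product of the component state counts, which is what the grouping slides later address.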

Page 5: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Design Consideration

• Completeness of matching results:
Pattern: ab*
Input: abbb
- Exhaustive matching: a, ab, abb, abbb
- Non-overlapping matching: a single result, e.g. abbb (left-most longest match) or a (shortest match)

• DFA execution model for substring matching (patterns without ^ attached at the beginning):
- Repeated search: start scanning from one position; if there is no match, start again at the next position
- One-pass search: .* is prepended to each pattern without ^
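The matching semantics above can be checked with a short sketch using Python's standard `re` module. The function names are made up for this example, and Python's engine is a backtracking matcher rather than a DFA, so the code only illustrates which results each semantics reports, not how a DFA would produce them.

```python
import re


def exhaustive_matches(pattern, text, start=0):
    """Every substring beginning at `start` that matches the whole pattern."""
    compiled = re.compile(pattern)
    return [text[start:end]
            for end in range(start, len(text) + 1)
            if compiled.fullmatch(text, start, end)]


def shortest_match(pattern, text):
    """Non-overlapping result under the shortest-match policy."""
    found = exhaustive_matches(pattern, text)
    return min(found, key=len) if found else None


def longest_match(pattern, text):
    """Non-overlapping result under the longest-match policy."""
    found = exhaustive_matches(pattern, text)
    return max(found, key=len) if found else None


def repeated_search(pattern, text):
    """Try each start position in turn, restarting after every failure."""
    compiled = re.compile(pattern)
    return any(compiled.match(text, pos) for pos in range(len(text) + 1))


def one_pass_search(pattern, text):
    """Prepend .* so that one anchored scan covers every start position."""
    return re.match("(?s).*(?:" + pattern + ")", text) is not None


print(exhaustive_matches("ab*", "abbb"))
print(shortest_match("ab*", "abbb"), longest_match("ab*", "abbb"))
```

Both search models report the same inputs as matching; they differ in how often the scan has to restart.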

Page 6: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

DFA Analysis

• We use exhaustive matching and one-pass search
• Typical patterns in network payload scanning applications

Page 7: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Case 4 : DFA of Quadratic size

• If an input contains multiple Bs, the DFA needs to remember the number of Bs it has seen and their locations

Page 8: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Case 4 Rewrites

Rewrite Rule (1)
• Rewriting is enabled by relaxing the requirement of exhaustive matching to that of non-overlapping matching
• The new pattern essentially implements non-overlapping left-most shortest match

• Ex: ^SEARCH\s+[^\n]{1024} is rewritten to ^SEARCH\s[^\n]{1024}
Input: SEARCH\s\s ... \s aa ... a (1024 \s followed by 1024 a)
• The rewritten pattern has a number of states linear in j (the repetition count, 1024 here) because it has removed the ambiguity in matching \s
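A small check of the rewrite on inputs shaped like the one above, using Python's `re`. The repetition count is scaled down from 1024 to 4 and the helper names are invented for this sketch. The sample inputs contain only spaces and the letter a; because Python's `\s` also matches `\n`, inputs with a newline among the whitespace may behave differently, so this is a spot check rather than a proof of equivalence.

```python
import re

K = 4  # assumption: scaled down from 1024 to keep the demo small

ORIGINAL = re.compile(r"^SEARCH\s+[^\n]{%d}" % K)
REWRITTEN = re.compile(r"^SEARCH\s[^\n]{%d}" % K)


def sample_input(num_spaces, num_as):
    """SEARCH followed by a run of spaces and a run of 'a' characters."""
    return "SEARCH" + " " * num_spaces + "a" * num_as


def flags_agree(num_spaces, num_as):
    """True when both patterns make the same match / no-match decision."""
    text = sample_input(num_spaces, num_as)
    return (ORIGINAL.match(text) is not None) == (REWRITTEN.match(text) is not None)


# The rewritten pattern reports the shortest match: SEARCH, one \s, K characters.
match = REWRITTEN.match(sample_input(3, 5))
print(match.end())
```

On these inputs the two patterns flag the same strings, and the rewritten one always stops at the earliest possible end position.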

Page 9: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Case 5 : DFA of Exponential Size
• We need to remember all possible effects of the preceding As, as they may yield different results when combined with subsequent inputs.

[Figure: example input AABABA combined with the continuations BCD and OBCD; layout lost in extraction]

Page 10: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Case 5 : DFA of Exponential Size
• Often used for detecting buffer overflow attempts: .*AUTH\s[^\n]{100}
• The DFA needs to remember all the possible AUTH\s: DFA > 10000 states
- A second AUTH\s can either match [^\n]{100} or be counted as a new match of the start of the pattern AUTH\s
• Can't be efficiently processed by an NFA-based approach either

[Figure: NFA for .*AUTH\s[^\n]{100}, with transitions on A, U, T, H, \s followed by a chain of 100 [^\n] states]

Input: AUTH\sAUTH\s AUTH\s\s AUTH\s\s\s …
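The state blow-up can be reproduced with a subset construction on a much smaller pattern. The sketch below uses the simplified pattern .*A.{k} over the two-symbol alphabet {A, B}; both the pattern and the alphabet are assumptions chosen to keep the example short, and the function name is hypothetical. The NFA is built by hand: state 0 loops on every symbol and also moves to state 1 on A, states 1 to k advance on any symbol, and state k+1 accepts.

```python
def count_dfa_states(k, alphabet="AB"):
    """Count the DFA states produced by subset construction for .*A.{k}."""

    def step(states, ch):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)          # the .* prefix keeps state 0 active
                if ch == "A":
                    nxt.add(1)      # a new candidate match starts here
            elif s <= k:
                nxt.add(s + 1)      # count one more character after the A
            # state k + 1 is accepting and has no outgoing transitions
        return frozenset(nxt)

    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        cur = todo.pop()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


for k in range(6):
    print(k, count_dfa_states(k))
```

Each DFA state records which of the last k+1 characters were A, so the count doubles with every increment of k, while the NFA grows by only one state.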

Page 11: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Case 5 Rewrites

• Only the first AUTH\s matters
- If there is a '\n' within the next 100 bytes, none of the AUTH\s matches the pattern
- Otherwise, the first AUTH\s and the following characters have already matched the pattern
• Rewrite the pattern to:
([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*AUTH\s[^\n]{100}
which generates a DFA of only 106 states
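The behaviour of the rewritten pattern can be spot-checked on a few inputs with Python's `re`. The repetition count is scaled down from 100 to 5, the function names are invented for this sketch, and the original pattern's .* prefix is modelled with `search`. The inputs are hand-picked samples covering the two cases above, so agreement here is an illustration and not a proof of equivalence. Python's engine is a backtracking matcher, so the sketch says nothing about DFA size.

```python
import re

K = 5  # assumption: scaled down from 100 to keep the inputs readable

# Original pattern; the leading .* is modelled by using search().
ORIGINAL = re.compile(r"AUTH\s[^\n]{%d}" % K)

# Rewritten pattern, anchored at the start of the input via match().
REWRITTEN = re.compile(
    r"(?:[^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,%d}\n)*"
    r"AUTH\s[^\n]{%d}" % (K - 1, K)
)


def original_flags(text):
    return ORIGINAL.search(text) is not None


def rewritten_flags(text):
    return REWRITTEN.match(text) is not None


SAMPLES = [
    "xxAUTH abcde",         # one AUTH\s followed by K non-newline characters
    "AUTH ab\ncde",         # newline inside the window: no match
    "AUTH AUTH abcde",      # second AUTH\s is consumed as payload of the first
    "AUTH a\nAUTH abcde",   # first attempt broken by \n, second one matches
]

for text in SAMPLES:
    print(repr(text), original_flags(text), rewritten_flags(text))
```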

Page 12: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Regular Expressions Grouping

• Some composite patterns generate DFAs of exponential size

• Interaction: two patterns interact with each other if their composite DFA contains more states than the sum of the two individual ones

Page 13: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Regular Expressions Grouping

Multi-core architectures (ex: IXP 2800 NPU, 16 processing units)
• Goal: design an algorithm that divides regular expressions into several groups, so that one processing unit can run one or several composite DFAs.

• The size of the local memory of each processing unit is quite limited
- Compute pair-wise interaction results and form a graph
- Pick the pattern with the fewest interactions to start a new group
- Keep adding patterns until the limit is reached
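The grouping steps above can be sketched as a greedy loop. This is one reading of the slide, not the authors' code: the function `group_patterns`, the `composite_size` callback, and the toy size model are all hypothetical. In a real system `composite_size` would compile the group into a composite DFA and count its states; here it is replaced by a made-up formula in which interacting pairs add a multiplicative penalty.

```python
def group_patterns(patterns, interacts, composite_size, limit):
    """Greedy grouping sketch.

    patterns: list of pattern names
    interacts: set of frozenset({p, q}) pairs that interact (graph edges)
    composite_size: hypothetical callback giving a group's composite DFA size
    limit: largest composite DFA size that fits in one processing unit
    """
    def degree(p, others):
        return sum(1 for q in others if q != p and frozenset((p, q)) in interacts)

    remaining = list(patterns)
    groups = []
    while remaining:
        # Seed the new group with the pattern that has the fewest interactions.
        seed = min(remaining, key=lambda p: degree(p, remaining))
        group = [seed]
        remaining.remove(seed)
        while remaining:
            # Next candidate: fewest interactions with the current members.
            cand = min(remaining, key=lambda p: degree(p, group))
            if composite_size(group + [cand]) > limit:
                break  # limit reached, close this group
            group.append(cand)
            remaining.remove(cand)
        groups.append(group)
    return groups


# Toy data (assumed for illustration only).
SIZES = {"p1": 10, "p2": 12, "p3": 8, "p4": 15}
EDGES = {frozenset(("p1", "p2")), frozenset(("p1", "p3"))}


def toy_size(group):
    """Made-up size model: sum of sizes plus a product term per interacting pair."""
    total = sum(SIZES[p] for p in group)
    for i, p in enumerate(group):
        for q in group[i + 1:]:
            if frozenset((p, q)) in EDGES:
                total += SIZES[p] * SIZES[q]
    return total


GROUPS = group_patterns(list(SIZES), EDGES, toy_size, limit=50)
print(GROUPS)
```

With this toy data the interacting pairs end up in different groups, which is the intended effect: patterns that blow up when combined are kept apart, and each group stays within the memory limit.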

Page 14: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Regular Expressions Grouping


Page 15: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Evaluation results

• Effect of Rule Rewriting
- L7-filter: protocol identifiers (70 regular expressions)
- Bro: intrusion patterns (2781 regular expressions)
- SNORT: no regular expressions in April 2003; 1131 out of 4867 rules use regular expressions as of Jan 2006

Page 16: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Evaluation results

• Effect of Grouping Multiple Patterns

Page 17: Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Evaluation results
