SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification
Fang Yu (1), T. V. Lakshman (2), Marti Austin Motoyama (1), Randy H. Katz (1)
(1) EECS Department, UC Berkeley; (2) Bell Laboratories, Lucent Technologies
Feb 04, 2016
2
Outline
• Introduction to multi-match classification
• Multi-match classification using TCAM
  – May consume a large amount of TCAM memory
  – May consume high power
• Set Splitting Algorithm (SSA)
  – A memory and power efficient scheme for multi-match classification
• Simulation results
• Conclusions
3
Packet Classification
• Single-match classification
  – Assumption: all the filters are associated with priorities
  – Only the highest-priority match matters
  – E.g., longest prefix match
• Multi-match classification
  – Report all matching results
  – No priority among filters
  – Intrusion detection systems: identify all the related rules
  – Also required by accounting applications

A rule for MS-SQL Worm detection:
udp $EXTERNAL_NET any -> $HOME_NET 1434
content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send"

A rule for RPC oldpassword overflow attempt:
udp $EXTERNAL_NET any -> $HOME_NET any
content:"|00 01 86 A9|"; offset:12; depth:4; content:"|00 00 00 01|"; distance:4; within:4; byte_jump:4,4,relative,align; byte_jump:4,4,relative,align; byte_test:4,>,64,0,relative; content:"|00 00 00 00|"; offset:4; depth:4; sid:2027; rev:4;

In each rule, the first line applies to the packet header and the remaining options to the packet payload.
4
Ternary-CAM (TCAM)
• Fully associative memory: compares the input string with all the entries in parallel
  – For multiple matches, reports the index of the first match
• Each cell takes one of three logic states
  – ‘0’, ‘1’, and ‘?’ (don’t care)
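[Figure: TCAM lookup example. The input 192.128.101.100 is compared in parallel against the TCAM entries 168.100.???.???, 192.128.???.???, and 192.128.101.???, and a match is reported. Diagram labels: entry, cell, width.]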
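The ternary comparison described above can be sketched in software as follows. This is a minimal illustrative model, not the hardware behavior in detail: the helper names (`to_cells`, `cells_match`, `tcam_lookup`, `all_matches`) are hypothetical, and the order of the three entries is an assumption, since the slide's figure does not survive in the transcript.

```python
from typing import List, Optional


def to_cells(dotted: str) -> str:
    """Expand a dotted address into TCAM cells; a '???' octet becomes 8 don't-care cells."""
    return "".join(
        "?" * 8 if octet == "???" else format(int(octet), "08b")
        for octet in dotted.split(".")
    )


def cells_match(entry: str, key: str) -> bool:
    """An entry matches when every cell is '?' or equals the corresponding key bit."""
    return len(entry) == len(key) and all(e == "?" or e == k for e, k in zip(entry, key))


def tcam_lookup(entries: List[str], key: str) -> Optional[int]:
    """Model of one TCAM lookup: only the index of the first matching entry is reported."""
    for index, entry in enumerate(entries):  # real hardware compares all entries in parallel
        if cells_match(entry, key):
            return index
    return None


def all_matches(entries: List[str], key: str) -> List[int]:
    """What multi-match classification wants instead: every matching index."""
    return [i for i, entry in enumerate(entries) if cells_match(entry, key)]


# Entry order below is an assumption made for illustration.
entries = [to_cells(e) for e in ("168.100.???.???", "192.128.???.???", "192.128.101.???")]
key = to_cells("192.128.101.100")

print(tcam_lookup(entries, key))  # 1
print(all_matches(entries, key))  # [1, 2]
```

The gap between the two printed results is the multi-match problem: a single lookup hides the second matching entry.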
5
Challenges of Multi-match Classification using TCAM
• Memory efficient
  – 9 Mbits to 18 Mbits of TCAM is priced at $200-$300
• Power efficient
• Easy update
• High speed
  – TCAM is fast (e.g., 4 ns per lookup), but it only returns the first match result
  – We want all the matching results within a few cycles
• What about returning a bit vector of the matching results?
  – Processing the bit vector can take time if the vector is long
  – Not efficient, since the vector is sparse in most cases
[Figure: TCAM power (in watts, 0 to 12) versus number of entries searched in parallel (0 to 60,000), with one curve each for 250, 207, 165, and 125 million lookups per second.]
6
Previous Solutions: Geometric Intersection-based Solution [Hot Interconnects 04]
• Add additional intersection filters
  – High speed
    • Return all the matching results within one cycle
  – Memory efficient
    • Create ~10N intersection filters for the Snort rule set
    • May create O(N^F) intersection filters in the worst case
  – Energy efficient
  – Easily updatable
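[Figure: the TCAM stores the rules and the SRAM stores the match list (indices of rules). The TCAM holds the intersection entry "Filter 1&2" (tcp $SQL_SERVER 1433 $EXTERNAL_NET 139) together with the two original filters (tcp $SQL_SERVER 1433 $EXTERNAL_NET any; tcp any any any 139). The input tcp $SQL_SERVER 1433 $EXTERNAL_NET 139 produces a match.]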
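As a rough illustration of what an intersection filter is, the sketch below intersects two filters field by field. It assumes a simplified model in which every field is either an exact value or the wildcard `any`; real header fields are prefixes and ranges, so this is not the published construction. The names `ANY` and `intersect` are hypothetical.

```python
from typing import Optional, Tuple

ANY = "any"
Filter = Tuple[str, ...]


def intersect(f1: Filter, f2: Filter) -> Optional[Filter]:
    """Field-wise intersection; returns None when the two filters cannot both match."""
    result = []
    for a, b in zip(f1, f2):
        if a == ANY:
            result.append(b)          # wildcard: the other filter's field decides
        elif b == ANY or a == b:
            result.append(a)
        else:
            return None               # conflicting exact values: empty intersection
    return tuple(result)


# The two filters from the slide's figure (protocol, src, src port, dst, dst port).
filter_a = ("tcp", "$SQL_SERVER", "1433", "$EXTERNAL_NET", ANY)
filter_b = ("tcp", ANY, ANY, ANY, "139")

print(intersect(filter_a, filter_b))
# ('tcp', '$SQL_SERVER', '1433', '$EXTERNAL_NET', '139')
```

Storing this extra entry ahead of the two original filters lets one lookup report both matches, at the cost of one more TCAM entry per intersection.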
7
Previous Solution: MUD [ Sigcomm’05]
• Encode the index of each entry as a discriminator field and include the encoded value in the TCAM entry
  – Search the TCAM with the discriminator field initially set to all don’t cares
  – After finding a matching result at index j, search again with discriminator field value ‘> j’
[Figure: TCAM entries (Filter 1, Filter 2, Filter 3), each extended with a discriminator field; the input is the packet info.]
8
Previous Solution: MUD (Cont.)
• High speed
  – 1+d+(k-2)*(d-1) = O(dk) TCAM lookups to get k matching results
    • d is the logarithm of the number of entries in the TCAM (d = log2 N)
    • Decreased to 1+d*(k-1)/r with DIRPE, where r is smaller than d
• Memory efficient
• Energy efficient
  – All the entries in the TCAM are accessed on every lookup, which leads to high power consumption
• Easily updatable
Our Goal: Find a memory and power efficient solution
9
Observation
• Split filters into two sets to reduce intersections
  – Report the union of results from all sets
  – No need to include the intersections of filters from different sets
  – Decreases the number of filters in TCAM and so decreases power consumption
  – Increases the number of TCAM accesses
[Figure: filters F1 ... FN with the regions matching F1, matching FN, and matching both F1 and FN.
Original (one set): N filters + O(N^2) intersections, 1 TCAM lookup.
Two sets: N filters + 1 intersection, 2 TCAM lookups.]
10
Problem Definition
• Given a set of filters F = (F1, F2, ..., FN)
• Filters create a set of intersections I = (I1, I2, ..., IM)
  – E.g., I1 = intersection of (F1, F5, F6)
• How to divide the filters into several sets such that
  – Residual intersection set I’: intersections of filters in the same set
  – N + |I’| < TCAM size
  – The number of sets (TCAM accesses) is minimum
  – NP-hard problem!
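To make the residual intersection set I’ concrete, here is a small sketch that evaluates a given split. The function names and the toy data are hypothetical; each intersection is modeled simply as the set of filter names that create it.

```python
from typing import Dict, FrozenSet, List


def residual_intersections(
    intersections: List[FrozenSet[str]], set_of: Dict[str, int]
) -> List[FrozenSet[str]]:
    """Intersections whose filters all sit in the same set must still be stored in TCAM."""
    return [i for i in intersections if len({set_of[f] for f in i}) == 1]


def fits_in_tcam(
    n_filters: int, intersections: List[FrozenSet[str]], set_of: Dict[str, int], tcam_size: int
) -> bool:
    """The slide's constraint: N + |I'| < TCAM size."""
    return n_filters + len(residual_intersections(intersections, set_of)) < tcam_size


# Toy instance (made up for illustration).
intersections = [
    frozenset({"F1", "F5", "F6"}),
    frozenset({"F1", "F2"}),
    frozenset({"F2", "F3"}),
]

poor_split = {"F1": 0, "F5": 0, "F6": 0, "F2": 1, "F3": 1}
good_split = {"F1": 0, "F5": 1, "F6": 0, "F2": 1, "F3": 0}

print(len(residual_intersections(intersections, poor_split)))  # 2
print(len(residual_intersections(intersections, good_split)))  # 0
```

Finding the split that minimizes the residual set is the hard part; checking a candidate split, as done here, is cheap.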
11
Split Rules into Two Sets
• Still an NP-hard problem (known as maximum set splitting or maximum hypergraph cut)
• Best known approximation algorithms
  – Yield a performance ratio of 0.72 to the optimum solution
  – Require quadratic programming, which is slow when the number of filters is large
• Our SSA algorithm
  – Removes at least half of the intersections
  – O(NM) complexity, where N is the total number of filters and M is the total number of intersections
12
Maximum Satisfiability Problem
• Maximum Satisfiability Problem
  – A set of literals {F1, ¬F1, F2, ¬F2, ..., FN, ¬FN}
  – A set of clauses; each clause is a subset of the literals
    • E.g., C1 = {F1, F5, F6}
  – Goal: find an assignment of F that satisfies the maximum number of clauses
13
Johnson’s Algorithm to Maximum Satisfiability Problem
• Assign each clause c a weight of 2^(-|c|)
  – E.g., the weight of C1 = {F1, F5, F6} is 2^(-3)
• Let Fi be any literal which hasn’t been assigned a value yet
  – If the total weight of the clauses containing Fi is higher than that of the clauses containing ¬Fi
    • Assign Fi a true value and remove all clauses containing Fi
    • Multiply the weight of all the clauses containing ¬Fi by 2
  – Otherwise
    • Assign Fi a false value and remove all clauses containing ¬Fi
    • Multiply the weight of all the clauses containing Fi by 2
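The steps above can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' implementation: the representation of a literal as a `(name, polarity)` pair, the function names, and the toy clause set are all assumptions. Exact fractions are used so that the weight comparisons are not affected by rounding.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Iterable, List, Tuple

Literal = Tuple[str, bool]   # (variable name, True for F, False for not-F)
Clause = FrozenSet[Literal]


def johnson(variables: Iterable[str], clauses: Iterable[Clause]) -> Dict[str, bool]:
    """Greedy weighted assignment following the slide's description."""
    # Each live clause keeps its unassigned literals and its current weight 2^(-|c|).
    live = [[set(c), Fraction(1, 2 ** len(c))] for c in clauses]
    assignment: Dict[str, bool] = {}
    for v in variables:
        pos = sum((w for lits, w in live if (v, True) in lits), Fraction(0))
        neg = sum((w for lits, w in live if (v, False) in lits), Fraction(0))
        value = pos > neg                      # ties go to False, as on the slide
        assignment[v] = value
        remaining = []
        for lits, w in live:
            if (v, value) in lits:
                continue                       # clause is satisfied: remove it
            if (v, not value) in lits:
                lits.discard((v, not value))   # this literal can no longer help
                w *= 2                         # so the clause becomes more urgent
            remaining.append([lits, w])
        live = remaining
    return assignment


def count_satisfied(clauses: List[Clause], assignment: Dict[str, bool]) -> int:
    return sum(any(assignment[name] == pol for name, pol in c) for c in clauses)


# Toy instance: four clauses with two literals each.
clauses = [
    frozenset({("F1", True), ("F2", True)}),
    frozenset({("F1", False), ("F3", True)}),
    frozenset({("F2", False), ("F3", False)}),
    frozenset({("F1", True), ("F3", False)}),
]
assignment = johnson(["F1", "F2", "F3"], clauses)
print(assignment, count_satisfied(clauses, assignment))
```

On this toy instance the greedy choice happens to satisfy every clause; in general only the weighted bound is guaranteed.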
14
Johnson’s Theorem
• If all the clauses have at least k literals
  – Johnson’s algorithm satisfies at least a fraction (2^k - 1)/2^k of the total clauses
  – E.g., for k = 2, it satisfies at least ¾ of the clauses
  – It has been proved that (2^k - 1)/2^k is the best approximation bound for k > 2
15
Filter Set Split Algorithm (SSA)
• Convert the set splitting problem into a maximum satisfiability problem
  – Each filter corresponds to a literal
  – For any intersection (e.g., I1 = intersection of F1, F5, and F6), add two clauses
    • C = {F1, F5, F6} and C’ = {¬F1, ¬F5, ¬F6}
    • The total number of clauses is 2M, where M is the number of intersections
• Run Johnson’s algorithm and assign each filter Fi either a true value (put in set one) or a false value (put in set two)
16
Filter Set Split Algorithm (SSA) (cont.)
• According to Johnson’s theorem
  – Every clause has at least 2 literals, so at least ¾ of the 2M clauses are satisfied: 2M * 3/4 = 1.5M
  – At most 0.5M clauses are unsatisfied, so at least 0.5M of the intersections have both clauses satisfied
• Suppose that for the intersection of F1, F5, and F6, both C = {F1, F5, F6} and C’ = {¬F1, ¬F5, ¬F6} are satisfied
  – At least one of F1, F5, F6 is true and at least one is false
  – F1, F5, F6 are split into different sets, so this intersection doesn’t need to be present in the TCAM
• At least 50% of the intersections are removed!
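A compact sketch of the whole split, under the same reading of the slides: each intersection contributes an all-positive and an all-negated clause, and the greedy weighted assignment then decides each filter's set. The function names and the toy filter set are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List


def ssa_split(filters: List[str], intersections: List[FrozenSet[str]]) -> Dict[str, bool]:
    """Return True for filters placed in set one and False for set two."""
    live = []
    for inter in intersections:
        for polarity in (True, False):  # clause C (all positive) and C' (all negated)
            live.append([{(f, polarity) for f in inter}, Fraction(1, 2 ** len(inter))])
    placement: Dict[str, bool] = {}
    for f in filters:
        pos = sum((w for lits, w in live if (f, True) in lits), Fraction(0))
        neg = sum((w for lits, w in live if (f, False) in lits), Fraction(0))
        value = pos > neg
        placement[f] = value
        remaining = []
        for lits, w in live:
            if (f, value) in lits:
                continue                      # satisfied clause is removed
            if (f, not value) in lits:
                lits.discard((f, not value))
                w *= 2                        # double the weight of the clause that lost a literal
            remaining.append([lits, w])
        live = remaining
    return placement


def residual(intersections: List[FrozenSet[str]], placement: Dict[str, bool]):
    """Intersections whose filters all ended up in the same set."""
    return [i for i in intersections if len({placement[f] for f in i}) == 1]


# Toy instance (made up for illustration).
filters = ["F1", "F2", "F3", "F4", "F5", "F6"]
intersections = [
    frozenset({"F1", "F5", "F6"}),
    frozenset({"F1", "F2"}),
    frozenset({"F2", "F3"}),
    frozenset({"F3", "F4"}),
]
placement = ssa_split(filters, intersections)
left = residual(intersections, placement)
print(placement)
print(f"{len(left)} of {len(intersections)} intersections remain")
```

Applying `ssa_split` again to each resulting set, with its residual intersections, would presumably give the four-set variant, though the slides do not spell out that step.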
17
Review of the SSA Scheme
• High speed
  – Deterministic lookup rate. E.g., if filters are split into two sets, only 2 TCAM lookups per packet are needed.
  – Sets are logically independent, so lookups can be parallelized
• Memory efficient
  – Guarantees the removal of at least 50% of the intersections each time the filter set is split into two sets
• Energy efficient
  – Low memory requirement
  – Accesses each filter only once per packet
• Easily updatable
  – A new filter can be inserted into whichever set creates the fewest intersections
18
Simulation Setup
• Tests on the Snort rule header sets
  – Compare SSA with two TCAM-based solutions:
    • MUD
    • Geometric Intersection-based solution
  – Compare SSA with two representative software-based solutions:
    • HiCuts
    • EGT-PC
  – Evaluation metrics
    • Memory consumption
    • Lookup rate
    • Power consumption
    • Update cost
19
Memory Usage
Total number of extra intersection filters in TCAMs.

Version  Geometric            SSA-2                          SSA-4
         Intersection-based   Extra intersections  Saving    Extra intersections  Saving
2.0.0    3453                 46                   98.67%    1                    99.97%
2.0.1    3754                 47                   98.75%    1                    99.97%
2.1.0    3758                 47                   98.75%    0                    100%
2.1.1    4067                 55                   98.65%    0                    100%

Total number of TCAM entries used.

Version  MUD   Geometric Intersection-based   SSA-2   SSA-4
2.0.0    240   3693                           286     241
2.0.1    255   4009                           302     256
2.1.0    257   4015                           304     257
2.1.1    263   4330                           318     263
20
Classification Speed
• MUD
  – One packet may match up to 12 unique filters and requires a maximum of 20 TCAM lookups
  – Common packets such as HTTP packets match 4 unique filters and may require 5 to 9 TCAM lookups; a Napster packet requires 9 to 15 TCAM lookups
• Geometric Intersection-based solution
  – 1 TCAM lookup per packet
• SSA-2
  – 2 TCAM lookups per packet
• SSA-4
  – 4 TCAM lookups per packet
  – If the average packet size is 402.7 bytes, SSA-4 operates at a 201.35 Gbps classification rate
  – In the worst case, if every packet is 40 bytes, SSA-4 achieves a 20 Gbps rate
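The SSA-4 rates quoted above can be reproduced with simple arithmetic. The sketch assumes a TCAM that performs 250 million lookups per second (one lookup every 4 ns, the speed mentioned earlier in the deck); that figure and the function name are assumptions of this example, not something the slide states directly.

```python
def classification_rate_gbps(
    lookups_per_second: float, lookups_per_packet: int, packet_bytes: float
) -> float:
    """Line rate sustained when every packet costs a fixed number of TCAM lookups."""
    packets_per_second = lookups_per_second / lookups_per_packet
    return packets_per_second * packet_bytes * 8 / 1e9


TCAM_LOOKUPS_PER_SECOND = 250e6  # assumption: one lookup every 4 ns

worst_case = classification_rate_gbps(TCAM_LOOKUPS_PER_SECOND, 4, 40)
average = classification_rate_gbps(TCAM_LOOKUPS_PER_SECOND, 4, 402.7)

print(worst_case)          # 20.0
print(round(average, 2))   # 201.35
```

With 2 lookups per packet (SSA-2) the same formula doubles both rates, which is the speed side of the trade-off against the extra intersection entries SSA-2 keeps.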
21
Update Cost
Version  MUD   Geometric Intersection-based   SSA-2          SSA-4
               Avg      Max                   Avg    Max     Avg     Max
2.0.0    1     31.73    157                   1.33   17      1.002   2
2.0.1    1     35.24    135                   1.34   19      1       1
2.1.0    1     34.71    135                   1.36   20      1.002   2
2.1.1    1     36.00    172                   1.41   26      1.006   2
• Update cost in terms of newly inserted filters
22
Power Consumption
[Figure: number of TCAM entries accessed per packet (y-axis, 0 to 8000) versus Snort version (2.0.0, 2.0.1, 2.1.0, 2.1.1), for MUD (HTTP packets), MUD (Napster packets), MUD (worst case), Geometric Intersection-based, SSA-2, and SSA-4.]
• Energy used by a TCAM grows linearly with
  – The number of entries searched in parallel
  – The number of TCAM accesses per packet
• Metric: total TCAM entries accessed per packet
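As a rough illustration of the metric, the sketch below combines the Snort 2.0.0 entry counts from the memory-usage table with the lookup counts from the classification-speed slide. How the authors computed the plotted values is not stated, so this is only one plausible reading: MUD and the geometric scheme search every entry on each lookup, while SSA touches each stored entry once per packet.

```python
def entries_accessed(entries_per_lookup: int, lookups_per_packet: int) -> int:
    """Metric from the slide: total TCAM entries accessed per packet."""
    return entries_per_lookup * lookups_per_packet


def reduction(ours: int, theirs: int) -> float:
    """Fractional saving of `ours` relative to `theirs`."""
    return 1 - ours / theirs


# Snort 2.0.0 TCAM entry counts, taken from the memory-usage table.
MUD_ENTRIES, GEOMETRIC_ENTRIES, SSA2_ENTRIES, SSA4_ENTRIES = 240, 3693, 286, 241

per_packet = {
    "MUD (HTTP, 5 lookups)": entries_accessed(MUD_ENTRIES, 5),
    "MUD (worst case, 20 lookups)": entries_accessed(MUD_ENTRIES, 20),
    "Geometric Intersection-based": entries_accessed(GEOMETRIC_ENTRIES, 1),
    "SSA-2": SSA2_ENTRIES,  # assumption: each set is searched once, so each entry is touched once
    "SSA-4": SSA4_ENTRIES,
}

for scheme, count in per_packet.items():
    print(f"{scheme}: {count}")

print(round(reduction(SSA2_ENTRIES, per_packet["MUD (HTTP, 5 lookups)"]), 3))         # 0.762
print(round(reduction(SSA2_ENTRIES, per_packet["MUD (worst case, 20 lookups)"]), 3))  # 0.94
```

Under these assumptions the SSA-2 saving relative to MUD falls between roughly 76% and 94%, in line with the range quoted in the conclusions.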
24
Conclusions
• SSA is a memory and power efficient solution to the multi-match classification problem
  – O(NM) complexity
  – Guaranteed to remove at least 50% of the intersections each time the filter set is split
  – Compared to MUD
    • Uses a similar amount of TCAM memory
    • Yields a 75% to 95% reduction in power consumption
  – Compared to the Geometric Intersection-based solution
    • Uses 90% less TCAM memory and power
    • Requires one additional TCAM lookup per packet