Packet Classification On Multiple Fields
Pankaj Gupta and Nick McKeown
Computer Systems Laboratory, Stanford University
{pankaj,nickm}@stanford.edu
Jan 26, 2016
High Performance Switching and Routing, Telecom Center Workshop: Sept 4, 1997.
2
Why classify packets?
To determine which flow they belong to => to decide what service they should receive
A router needs to identify the flow of every incoming packet and then perform the appropriate special processing
3
Special Processing Requires Identification of Flows
• All packets of a flow obey a pre-defined rule and are processed similarly by the router
• Classification is based on an arbitrary number of fields in the packet header
• E.g. a flow = (src-IP-address, dst-IP-address), or a flow = (dst-IP-prefix, protocol), etc.
4
Network services:
• Routing
• Access-control in firewalls
• Policy-based routing
• Provision of differentiated qualities of service
• Traffic billing
5
What to determine?
• Forward or filter a packet?
• Where to forward it to?
• What class of service should it receive?
• How much to charge for transporting it?
6
Packet Classifier
[Figure: block diagram. Labels: Incoming Packet, HEADER, Packet Classification, Classifier (policy database) holding rules and actions, Action, Forwarding Engine]
7
Need for Differentiated Services
[Figure: example network. Labels: ISP1, ISP2, ISP3, NAP, E1, E2; interfaces X, Y, Z]
Service | Example
Traffic Shaping | Ensure that ISP3 does not inject more than 50Mbps of total traffic on interface X, of which no more than 10Mbps is email traffic
Packet Filtering | Deny all traffic from ISP3 (on interface X) destined to E2
Policy Routing | Send all voice-over-IP traffic arriving from E1 (on interface Y) and destined to E2 via a separate ATM network
8
Table 2:
Class | Relevant Packet Fields
Email and from ISP2 | Source Link-layer Address, Source Transport port number
From ISP2 | Source Link-layer Address
From ISP3 and going to E2 | Source Link-layer Address, Destination Network-Layer Address
All other packets | ---------
9
Packet Classification: Problem Definition
Given a classifier C with N rules, Rj, 1 ≤ j ≤ N, where Rj consists of three entities:
1) A regular expression Rj[i], 1 ≤ i ≤ d, on each of the d header fields,
2) A number, pri(Rj), indicating the priority of the rule in the classifier, and
3) An action, referred to as action(Rj).
For an incoming packet P with the header considered as a d-tuple of points (P1, P2, …, Pd), the d-dimensional packet classification problem is to find the rule Rm with the highest priority among all the rules Rj matching the d-tuple; i.e., pri(Rm) > pri(Rj), ∀ j ≠ m, 1 ≤ j ≤ N, such that Pi matches Rj[i], ∀ 1 ≤ i ≤ d. We call rule Rm the best matching rule for packet P.
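The definition above can be illustrated with a minimal sketch that scans the rules one by one and keeps the highest-priority match. Everything in it is an assumption made for illustration: the `Rule` class, the restriction of each field specification to an inclusive integer range (the definition allows general regular expressions), the convention that a larger `pri` value means higher priority, and the three toy rules over (destination port, protocol).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    fields: Tuple[Tuple[int, int], ...]  # one inclusive (lo, hi) range per header field
    pri: int                             # assumption: larger value = higher priority
    action: str


def matches(rule: Rule, packet: Sequence[int]) -> bool:
    """True if every packet field P_i lies inside the rule's range R[i]."""
    return all(lo <= p <= hi for (lo, hi), p in zip(rule.fields, packet))


def classify(rules: Sequence[Rule], packet: Sequence[int]) -> Optional[Rule]:
    """Return the best matching rule, or None if no rule matches."""
    best = None
    for rule in rules:  # sequential scan: work grows linearly with the number of rules
        if matches(rule, packet) and (best is None or rule.pri > best.pri):
            best = rule
    return best


# Hypothetical 2-field classifier: (destination port, protocol number).
ANY_PORT = (0, 65535)
ANY_PROTO = (0, 255)
rules = [
    Rule("web-tcp", ((80, 80), (6, 6)), pri=3, action="deny"),
    Rule("ftp-udp", ((20, 21), (17, 17)), pri=2, action="permit"),
    Rule("default", (ANY_PORT, ANY_PROTO), pri=1, action="permit"),
]

print(classify(rules, (80, 6)).name)     # web-tcp
print(classify(rules, (5000, 17)).name)  # default
```

This is only the brute-force reading of the definition; it says nothing about how a fast classifier would organise the rules.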
10
Classification is a Generalization of Lookup
• Classifier = routing table
• One-dimension (destination address)
• Rule = routing table entry
• Regular expression = prefix
• Action = (next-hop-address, port)
• Priority = prefix-length
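The correspondence listed above can be sketched as a one-dimensional classifier in which each rule is a prefix and the priority is the prefix length. The routing table contents below (prefixes, next hops, ports) are hypothetical, and the linear scan is used only to keep the sketch short; it is not how a real lookup engine would search.

```python
import ipaddress

# Hypothetical routing table: (prefix, (next-hop-address, port)).
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), ("192.0.2.1", 0)),
    (ipaddress.ip_network("152.163.0.0/16"), ("192.0.2.2", 1)),
    (ipaddress.ip_network("152.163.190.0/24"), ("192.0.2.3", 2)),
]


def lookup(dst: str):
    """Return the action of the matching entry with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, action in routes:
        # rule matches if the address is inside the prefix;
        # priority = prefix length, so keep the longest match
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, action)
    return None if best is None else best[1]


print(lookup("152.163.190.69"))  # ('192.0.2.3', 2)
print(lookup("10.0.0.1"))        # ('192.0.2.1', 0)
```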
11
Example 4D classifier
Rule | L3-DA | L3-SA | L4-DP | L4-PROT | Action
R1 | 152.163.190.69/255.255.255.255 | 152.163.80.11/255.255.255.255 | * | * | Deny
R2 | 152.168.3.0/255.255.255.255 | 152.163.200.157/255.255.255.255 | eq www | udp | Deny
R3 | 152.168.3.0/255.255.255.255 | 152.163.200.157/255.255.255.255 | range 20-21 | udp | Permit
R4 | 152.168.3.0/255.255.255.255 | 152.163.200.157/255.255.255.255 | eq www | tcp | Deny
R5 | 152.163.198.4/… | 152.163.36.0/… | gt 1023 | tcp | Deny
R6 | 152.163.198.4/255.255.255.255 | 152.163.36.0/255.255.255.255 | gt 1023 | tcp | Permit
12
Example Classification Results
Pkt Hdr | L3-DA | L3-SA | L4-DP | L4-PROT | Rule, Action
P1 | 152.163.190.69 | 152.163.80.11 | www | tcp | R1, Deny
P2 | 152.168.3.21 | 152.163.200.157 | www | udp | R2, Deny
13
General characteristics of Classifiers
Number of rules: not a large number (0.7% have more than 1000; mean of 50 rules)
Number of fields: max of 8 fields:
  src/dst network-layer address
  src/dst transport-layer port numbers
  type-of-service (TOS) field
  protocol field
  transport-layer protocol flags
17% of rules: 1 field, 23%: 3 fields, 60%: 4 fields
14
General characteristics of Classifiers (contd.)
Transport-layer protocol field: TCP, UDP, ICMP, IGMP, (E)IGRP, GRE, IPINIP or '*'
Transport-layer field specification: 10.2% have a range specification
Rules with non-contiguous masks: 14% of classifiers have them, and 10.2% of all rules
Many different rules in the same classifier share a number of field specifications
Redundant rules: 8% of rules in classifiers
  4.4% of rules are backward redundant
  3.6% of rules are forward redundant
15
Goals
The algorithm should:
o Be fast enough to operate at OC48c linerates and preferably at OC192c linerates
o Allow matching on arbitrary fields
o Support general classification rules: prefixes, operators, wildcards
o Be suitable for implementation in both software and hardware
o Not have expensive memory requirements
o Scale in terms of both memory and speed with the size of the classifier
16
Previous work
Simplest classification algorithm: evaluating rules sequentially
• Simple and efficient in its use of memory
• Poor scaling properties: time grows linearly with the number of rules
17
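Classification with Ternary-CAMs
[Figure: TCAM. The packet header is compared against memory array entries 0, 1, 2, 3, …, M; the per-entry match bits (0, 1, 0, 0, 1 in the example) feed a priority encoder, which outputs the first matching rule]
TCAMs are too expensive, too small, and consume too much power for large classifiers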
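The behaviour in the figure can be mimicked in software with a small sketch: each entry stores a value and a mask, every entry is compared against the key, and a priority encoder picks the first hit. The 8-bit entries and the convention that a mask bit of 1 means "compare this bit" are assumptions for illustration; a real TCAM performs all comparisons in parallel in hardware rather than in a loop.

```python
def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None if nothing matches.

    entries: list of (value, mask); a mask bit of 1 means the bit must match,
    a mask bit of 0 means "don't care" (assumed convention).
    """
    # One match bit per entry (computed sequentially here, in parallel in a TCAM).
    match_bits = [(key & mask) == (value & mask) for value, mask in entries]
    # Priority encoder: the lowest-numbered matching entry wins.
    for index, hit in enumerate(match_bits):
        if hit:
            return index
    return None


# Toy 8-bit entries, highest priority first.
entries = [
    (0b10100000, 0b11110000),  # 1010xxxx
    (0b10000000, 0b11000000),  # 10xxxxxx
    (0b00000000, 0b00000000),  # xxxxxxxx, matches everything
]

print(tcam_lookup(entries, 0b10101111))  # 0
print(tcam_lookup(entries, 0b10010000))  # 1
```

Because the first matching entry wins, rules would have to be stored in priority order for the result to be the best matching rule.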
18
Structure of the Classifiers
[Figure: rules R1, R2 and R3; 4 regions]
A classification algorithm must keep a record of each region and be able to determine the region to which each newly arriving packet belongs
19
Structure of the Classifiers
[Figure: overlapping rules R1, R2 and R3, with intersections {R1, R2}, {R2, R3} and {R1, R2, R3}; 7 regions]
The more regions the classifier contains, the more storage is required and the longer it takes to classify a packet
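One way to see how overlap increases the number of regions is to label every point of a small grid with the set of rules that match it and count the distinct labels. The rectangle coordinates below are hypothetical, chosen so that R1 and R3 only overlap inside R2, and a "region" is taken here to mean a distinct set of matching rules (the empty set, where no rule matches, is counted too).

```python
# Hypothetical 2-D rules as half-open boxes (x0, x1, y0, y1).
rules = {
    "R1": (0, 6, 0, 6),
    "R2": (2, 8, 2, 8),
    "R3": (4, 10, 4, 10),
}


def matching_rules(x, y):
    """Set of rule names whose box contains the point (x, y)."""
    return frozenset(
        name
        for name, (x0, x1, y0, y1) in rules.items()
        if x0 <= x < x1 and y0 <= y < y1
    )


# Label every point of a 12 x 12 grid and collect the distinct labels.
regions = {matching_rules(x, y) for x in range(12) for y in range(12)}

for region in sorted(regions, key=lambda r: (len(r), sorted(r))):
    print(sorted(region))
print(len(regions))  # 7
```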
20
Algorithm
Packet classification problem: map S bits in the packet header to T bits of classID, where T = log N and N is the number of classifier rules
A simple and fast way of doing this mapping: pre-compute the value of classID for each of the 2^S different packet headers:
• Yields the answer in one step (one memory access)
• Requires too much memory
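A quick calculation shows why the one-step table is impractical. The sketch below computes the size of a table with 2^S entries of T bits each; the helper names and the choice of N = 4096 rules (which gives T = 12) are assumptions for illustration, as is the 16-bit case used for comparison.

```python
def class_id_bits(num_rules: int) -> int:
    """T = ceil(log2 N): bits needed to name one of N rules."""
    return max(1, (num_rules - 1).bit_length())


def direct_table_bits(S: int, T: int) -> int:
    """Size in bits of a table with one T-bit classID per S-bit header value."""
    return (1 << S) * T


T = class_id_bits(4096)             # 12 bits of classID
small = direct_table_bits(16, T)    # a single 16-bit field
full = direct_table_bits(128, T)    # a 128-bit header

print(small // 8 // 1024, "KiB")    # 96 KiB
print(f"{full / 8:.3e} bytes")      # astronomically large
```

A direct table over one 16-bit field is small, which is why indexing short pieces of the header separately is attractive, while a direct table over the whole header is out of reach.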
21
Recursive Flow Classification
Perform the same mapping, but over several stages
One-step: 2^S = 2^128 → 2^T = 2^12
Multi-step: 2^S = 2^128 → 2^64 → 2^32 → 2^T = 2^12
22
Recursive Flow Classification
Consists of P phases, each with a set of parallel memory lookups
Each lookup is a reduction: the value returned by the memory lookup is shorter than the index of the memory access
23
Chunking of a Packet
[Figure: the packet header fields (Source L3 Address, Destination L3 Address, L4 protocol and flags, Source L4 port, Destination L4 port, Type of Service) are split into Chunk #0 to Chunk #7]
The chunks are used to index into multiple memories in parallel
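A minimal sketch of chunking, assuming the six-chunk layout that appears in Table 6 of these slides (source address high and low halves, destination address high and low halves, 8-bit L4 protocol, 16-bit destination L4 port) rather than the eight chunks of the figure. The function name `chunk_header` and the sample packet values are hypothetical.

```python
import ipaddress


def chunk_header(src_ip: str, dst_ip: str, protocol: int, dst_port: int):
    """Split header fields into chunks, each small enough to index a table directly."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return [
        src >> 16,          # chunk 0: Src L3 bits 31..16
        src & 0xFFFF,       # chunk 1: Src L3 bits 15..0
        dst >> 16,          # chunk 2: Dst L3 bits 31..16
        dst & 0xFFFF,       # chunk 3: Dst L3 bits 15..0
        protocol & 0xFF,    # chunk 4: L4 protocol (8 bits)
        dst_port & 0xFFFF,  # chunk 5: destination L4 port (16 bits)
    ]


chunks = chunk_header("152.163.80.11", "152.163.190.69", 6, 80)
print(chunks)  # [39075, 20491, 39075, 48709, 6, 80]
```

Each chunk value can then be used as the address into its own phase 0 table.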
24
Packet Flow
[Figure: packet flow through Phase 0, Phase 1, Phase 2 and Phase 3. Labels: Header, index, eqID, action]
25
Example 4D classifier
Rule | L3-DA | L3-SA | L4-DP | L4-PROT | Action
R1 | 152.163.190.69/255.255.255.255 | 152.163.80.11/255.255.255.255 | * | * | Deny
R2 | 152.168.3.0/255.255.255.255 | 152.163.200.157/255.255.255.255 | eq www | udp | Deny
R3 | 152.168.3.0/255.255.255.255 | 152.163.200.157/255.255.255.255 | range 20-21 | udp | Permit
R4 | 152.168.3.0/255.255.255.255 | 152.163.200.157/255.255.255.255 | eq www | tcp | Deny
R5 | 152.163.198.4/… | 152.163.36.0/… | gt 1023 | tcp | Deny
R6 | 152.163.198.4/255.255.255.255 | 152.163.36.0/255.255.255.255 | gt 1023 | tcp | Permit
26
In phase 0
chunk #6: 1. {www=80} 2. {20,21} 3. {>1023} 4. {remaining numbers}
  can be encoded by eqIDs 00b to 11b; reduction: 16 to 2 bits
chunk #4: 1. {tcp} 2. {udp} 3. {remaining numbers}
  can be encoded by 2 bits; reduction: 8 to 2 bits
In phase 1
CESs (chunk equivalence sets): 1. {({80},{udp})} 2. {({20-21},{udp})} 3. {({80},{tcp})} 4. {({gt 1023},{tcp})} 5. {all remaining crossproducts}
  can be encoded by 3 bits
  reduction from concatenating the two eqIDs: 4 to 3 bits; total reduction: 24 to 3 bits
27
RFC preprocessing for chunk j of phase 0
For each rule rl in the classifier
    project the jth component of rl onto the number line (from 0 to 2^b-1),
    marking the start and end points of each of its constituent intervals
End for ;
bmp := 0 ;
For n in 0…2^b-1
    If (any rule starts or ends at n)
        update bmp ;
    End if ;
    If (bmp not seen earlier)
        eq := new_Equivalence_Class( ) ;
        eq->cbm := bmp ;
    Else
        eq := the equivalence class whose cbm is bmp ;
    End if ;
    table_0_j[n] = eq->ID ;
End for ;
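The phase 0 preprocessing can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: each rule's specification for the chunk is assumed to be a single inclusive range, the class bitmap is held in a Python integer (bit k set means rule k matches), and eqIDs are assigned in order of first appearance. The ranges used are the destination-port specifications of rules R1 to R6 of the example 4D classifier, with www taken as port 80.

```python
def phase0_table(ranges, width_bits):
    """Build the phase 0 table for one chunk.

    ranges[k] is the inclusive (lo, hi) range of rule k for this chunk.
    Returns (table, cbms): table[n] is the eqID of chunk value n and
    cbms[eqID] is the class bitmap of that equivalence class.
    """
    size = 1 << width_bits
    # delta[n] holds the bitmap bits that flip when moving from n-1 to n.
    delta = [0] * (size + 1)
    for k, (lo, hi) in enumerate(ranges):
        delta[lo] ^= 1 << k      # rule k starts matching at lo
        delta[hi + 1] ^= 1 << k  # and stops matching after hi
    table, cbms, seen = [], [], {}
    bmp = 0
    for n in range(size):
        bmp ^= delta[n]          # update bmp where a rule starts or ends
        if bmp not in seen:      # new equivalence class
            seen[bmp] = len(cbms)
            cbms.append(bmp)
        table.append(seen[bmp])
    return table, cbms


# Destination-port field of R1..R6: *, eq 80, range 20-21, eq 80, gt 1023, gt 1023.
ports = [(0, 65535), (80, 80), (20, 21), (80, 80), (1024, 65535), (1024, 65535)]
table, cbms = phase0_table(ports, 16)

print(len(cbms))                      # 4 equivalence classes
print(table[80], bin(cbms[table[80]]))
```

Four classes fit in 2 bits, which is the 16-to-2-bit reduction the slides describe for this chunk.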
28
RFC preprocessing for chunk i of phase j (j>0)
index := 0 ;
listEqs := nil ;
For each CES, c1eq, of chunk c1
  For each CES, c2eq, of chunk c2
    …
      For each CES, cmeq, of chunk cm
          intersectedBmp := c1eq->cbm & c2eq->cbm & … & cmeq->cbm ;
          neweq := searchList(listEqs, intersectedBmp) ;
          If (not found in listEqs)
              neweq := new_Equivalence_Class( ) ;
              neweq->cbm := intersectedBmp ;
              add neweq to listEqs ;
          End if ;
          table_j_i[index] := neweq->ID ;
          index++ ;
      End for ;
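A minimal Python sketch of this combining step, under stated assumptions: class bitmaps are Python integers, `itertools.product` stands in for the nested loops (its last argument varies fastest, like the innermost loop), and eqIDs are assigned in order of first appearance. The two input lists are the class bitmaps one would obtain for the destination-port and protocol fields of rules R1 to R6 of the example classifier (bit k set means rule k+1 matches); they are written out by hand here.

```python
from itertools import product


def combine_chunks(parent_cbms):
    """Combine the equivalence classes of several parent chunks.

    parent_cbms: one list of class bitmaps per parent chunk, indexed by eqID.
    Returns (table, cbms); table is indexed by the parents' eqIDs combined
    in mixed radix, first parent most significant.
    """
    table, cbms, seen = [], [], {}
    for combo in product(*parent_cbms):  # every combination of parent CESs
        bmp = -1                         # all ones
        for cbm in combo:
            bmp &= cbm                   # rules matching in every parent chunk
        if bmp not in seen:              # new equivalence class
            seen[bmp] = len(cbms)
            cbms.append(bmp)
        table.append(seen[bmp])
    return table, cbms


# Destination port classes: remaining, {20,21}, {80}, {>1023}
port_cbms = [0b000001, 0b000101, 0b001011, 0b110001]
# Protocol classes: remaining, {tcp}, {udp}
proto_cbms = [0b000001, 0b111001, 0b000111]

table, cbms = combine_chunks([port_cbms, proto_cbms])
print(len(table), len(cbms))  # 12 5
```

The five resulting classes correspond to the five CESs listed for phase 1 of the example, which is why 3 bits are enough.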
29
Performance of RFC
1. The number of phases P
   we combine those chunks together which have the most correlation
2. The reduction tree used
   we combine as many chunks as we can without causing unreasonable memory consumption
30
Choice of Reduction Tree
Number of phases = P = 3; 10 memory accesses
[Figure: two reduction trees, Tree_A and Tree_B, each combining chunks 0 to 5 into a ClassID]
31
Choice of Reduction Tree
Number of phases = P = 4; 11 memory accesses
[Figure: two reduction trees, Tree_A and Tree_B, each combining chunks 0 to 5 into a ClassID]
32
RFC lookup in Hardware
[Figure: hardware lookup pipeline over Phase 0, Phase 1 and Phase 2. Labels: SRAM1, SRAM2, SDRAM1, SDRAM2; Chks 0-2, Chks 3-5; chunks 0 and 1 replicated]
Clk: 125 MHz => 31.25 million packets per second (one packet every 4 clock cycles)
33
RFC lookup in software
30 lines of code in C
Compiled on a 333 MHz Pentium II PC running Windows NT:
the worst-case path for the code took (140 clks + 9 tm) for three phases and (146 clks + 11 tm) for four phases, where tm is the memory access time (60 ns)
=> 0.98 us for 3 phases and 1.1 us for 4 phases
close to one million packets per second
the average lookup time is 50% faster than the worst case
34
RFC lookup operation
For (each chunk, chkNum, of phase 0)
    eqNums[0][chkNum] = contents of appropriate rfctable at memory address pktFields[chkNum] ;
End for
For (phaseNum = 1…numPhases-1)
    For (each chunk, chkNum, in phase phaseNum)
        chd = parent descriptor of (phaseNum, chkNum) ;
        index = eqNums[phaseNum of chd->chkParents[0]][chkNum of chd->chkParents[0]] ;
        For (i = 1…chd->numChkParents-1)
            index = index * (total #equivIDs of chd->chkParents[i]) + eqNums[phaseNum of chd->chkParents[i]][chkNum of chd->chkParents[i]] ;
        End for
        eqNums[phaseNum][chkNum] = contents of appropriate rfctable at address index ;
    End for
End for
Return eqNums[numPhases-1][0] ;
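The lookup can be sketched in Python with a tiny two-phase configuration. The tables, the parent descriptors and the dictionary-based bookkeeping are all hypothetical and hand-written for illustration; in a real deployment they would come out of the preprocessing step, and the final value would be the classID used to select the action.

```python
# Phase 0: one table per chunk, indexed directly by the chunk value.
phase0_tables = [
    [0, 0, 1, 1, 2, 2, 2, 2],  # chunk 0: 3-bit value -> eqID in {0, 1, 2}
    [0, 1, 1, 0],              # chunk 1: 2-bit value -> eqID in {0, 1}
]
# Later phases: each chunk lists its parents as (phase, chunk) and has a table
# indexed by the parents' eqIDs combined in mixed radix.
later_phases = [
    [{"parents": [(0, 0), (0, 1)], "table": [0, 1, 0, 2, 3, 3]}],  # phase 1
]
# Number of distinct eqIDs produced by each parent chunk.
num_eq_ids = {(0, 0): 3, (0, 1): 2}


def rfc_lookup(pkt_fields):
    """Return the value in the single chunk of the last phase (the classID)."""
    eq = {}
    for chk, table in enumerate(phase0_tables):
        eq[(0, chk)] = table[pkt_fields[chk]]
    for phase_num, chunks in enumerate(later_phases, start=1):
        for chk, desc in enumerate(chunks):
            parents = desc["parents"]
            index = eq[parents[0]]
            for parent in parents[1:]:
                index = index * num_eq_ids[parent] + eq[parent]
            eq[(phase_num, chk)] = desc["table"][index]
    return eq[(len(later_phases), 0)]


print(rfc_lookup([2, 1]))  # 2
print(rfc_lookup([7, 0]))  # 3
```

Every step is a table read followed by a multiply-and-add to form the next index, so the lookup cost is fixed by the number of chunks and phases, not by the number of rules.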
35
Table 6:
# | Src L3 31..16 | Src L3 15..0 | Dst L3 31..16 | Dst L3 15..0 | L4 protocol (8 bits) | Dstn L4 (16 bits) | Action
0 | 0.83/1.. | 0.77/1… | 0.0/1… | 4.6/1… | udp | * | permit
1 | 0.83/1.. | 1.0/255.0 | 0.0/1… | 4.6/1… | udp | 20-30 | permit
2 | 0.83/1.. | 0.77/1… | 0.0/1… | 0.0/0.0 | * | 21 | permit
3 | 0.0/0.0 | 0.0/0.0 | 0.0/0.0 | 0.0/0.0 | * | 21 | deny
4 | 0.0/0.0 | 0.0/0.0 | 0.0/0.0 | 0.0/0.0 | * | * | permit
36
Variations and improvements of RFC
1. RFC can be extended to process a larger number of fields in each packet header
2. RFC can be sped up by taking advantage of available fast lookup algorithms
3. The "adjacency groups" technique can be employed to reduce the memory requirements when processing large classifiers
37
Adjacency Groups
Size of the RFC table ~ number of CESs
Rules R and S are adjacent in dimension i if:
1. they have the same action
2. all but the ith field have exactly the same specification in the two rules
3. all rules appearing between them have either the same action or are disjoint from R
Two rules are simply adjacent if they are adjacent in some dimension
So we merge adjacent rules
38
Example of adjacency groups
R(a1,b1,c1,d1) S(a1,b1,c2,d1) T(a2,b1,c2,d1) U(a2,b1,c1,d1) V(a1,b1,c4,d2) W(a1,b1,c3,d2) X(a2,b1,c3,d2) Y(a2,b1,c4,d2)
Merge along Dimension 3:
RS(a1,b1,c1+c2,d1) TU(a2,b1,c1+c2,d1) VW(a1,b1,c3+c4,d2) XY(a2,b1,c3+c4,d2)
Merge along Dimension 1:
RSTU(a1+a2,b1,c1+c2,d1) VWXY(a1+a2,b1,c3+c4,d2)
Carry out an RFC phase (assume chunks 1 & 2 are combined, and also chunks 3 & 4 are combined):
RSTU(m1,n1) VWXY(m1,n2)
Merge:
RSTUVWXY(m1,n1+n2)
Continue with RFC …
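The first two merge steps of this example can be reproduced with a simplified sketch. It is not the full adjacency test: the ordering condition (3) is not checked, all eight rules are assumed to share one action (called "permit" here), and each field specification is modelled as a set of symbolic values so that a merged field such as c1+c2 becomes a set union. The helper names `rule` and `merge_along` are hypothetical.

```python
def merge_along(rules, dim):
    """Merge rules that share an action and agree on every field except `dim`.

    Simplification: condition 3 of adjacency (rules lying in between) is ignored.
    """
    groups = {}
    for name, fields, action in rules:
        key = (fields[:dim], fields[dim + 1:], action)
        if key in groups:
            gname, gfields = groups[key]
            merged = gfields[:dim] + (gfields[dim] | fields[dim],) + gfields[dim + 1:]
            groups[key] = (gname + name, merged)
        else:
            groups[key] = (name, fields)
    return [(n, f, key[2]) for key, (n, f) in groups.items()]


def rule(name, a, b, c, d, action="permit"):
    """Build a rule whose four fields are single-element sets."""
    return (name, tuple(frozenset([x]) for x in (a, b, c, d)), action)


rules = [
    rule("R", "a1", "b1", "c1", "d1"), rule("S", "a1", "b1", "c2", "d1"),
    rule("T", "a2", "b1", "c2", "d1"), rule("U", "a2", "b1", "c1", "d1"),
    rule("V", "a1", "b1", "c4", "d2"), rule("W", "a1", "b1", "c3", "d2"),
    rule("X", "a2", "b1", "c3", "d2"), rule("Y", "a2", "b1", "c4", "d2"),
]

step1 = merge_along(rules, 2)  # dimension 3 of the slide (0-based index 2)
step2 = merge_along(step1, 0)  # dimension 1 of the slide (0-based index 0)

print([name for name, _, _ in step1])  # ['RS', 'TU', 'VW', 'XY']
print([name for name, _, _ in step2])  # ['RSTU', 'VWXY']
```

Eight rules shrink to two groups before the RFC phase is carried out, which is the source of the memory saving.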
39
RFC: Pros and Cons
Advantages
• Suitable for multiple fields
• Supports non-contiguous masks
• Fast accesses
Disadvantages
• Large pre-processing time
• Incremental updates slow
• Large worst-case storage requirements