EE384Y: Packet Switch Architectures
Part II: Address Lookup and Classification

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
[email protected]
http://www.stanford.edu/~nickm
Mar 26, 2015
1
2
Generic Router Architecture (Review from EE384x)

[Figure: datapath of a generic router. Header processing: Lookup IP Address → Update Header, backed by an Address Table mapping IP address → next hop (~1M prefixes, off-chip DRAM). Queue Packet: Buffer Memory (~1M packets, off-chip DRAM).]
3
Lookups Must be Fast
Year   Line rate   40B packets (Mpkt/s)
1997   622 Mb/s    1.94
1999   2.5 Gb/s    7.81
2001   10 Gb/s     31.25
2003   40 Gb/s     125

1. Lookup mechanism must be simple and easy to implement
2. (Surprise?) Memory access time is the long-term bottleneck
4
Memory Technology (2003-04)
Technology        Single chip density   $/chip ($/MByte)        Access speed   Watts/chip
Networking DRAM   64 MB                 $30-$50 ($0.50-$0.75)   40-80 ns       0.5-2 W
SRAM              4 MB                  $20-$30 ($5-$8)         4-8 ns         1-3 W
TCAM              1 MB                  $200-$250 ($200-$250)   4-8 ns         15-30 W
Note: Price, speed and power are manufacturer and market dependent.
5
Lookup Mechanism is Protocol Dependent
Networking Protocol   Lookup Mechanism              Techniques
MPLS, ATM, Ethernet   Exact match search            – Direct lookup
                                                    – Associative lookup
                                                    – Hashing
                                                    – Binary/multi-way search trie/tree
IPv4, IPv6            Longest-prefix match search   – Radix trie and variants
                                                    – Compressed trie
                                                    – Binary search on prefix intervals
6
Outline
I. Routing Lookups
• Overview
• Exact matching
  – Direct lookup
  – Associative lookup
  – Hashing
  – Trees and tries
• Longest prefix matching
  – Why LPM?
  – Tries and compressed tries
  – Binary search on prefix intervals
• References
II. Packet Classification
7
Exact Matches in ATM/MPLS
Direct Memory Lookup

[Figure: the VCI/MPLS label is used directly as a memory address; the data read out is (outgoing port, new VCI/label).]

• VCI/label space is 24 bits
  – Maximum 16M addresses. With 64b data, this is 1Gb of memory.
• VCI/label space is private to one link
• Therefore, table size can be "negotiated"
• Alternately, use a level of indirection
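The direct-lookup idea above can be sketched in a few lines (a Python sketch; the 10-bit negotiated label space and the helper names are illustrative assumptions, not from the slides):

```python
# Direct memory lookup for ATM/MPLS labels: the label is used directly
# as an array index, so a lookup is a single memory reference. A real
# line card would negotiate a smaller label space (or add a level of
# indirection) to avoid a table with 2^24 entries.

LABEL_BITS = 10  # hypothetical negotiated label space (up to 24 bits in general)

table = [None] * (1 << LABEL_BITS)  # one entry per possible label

def install(label, out_port, new_label):
    table[label] = (out_port, new_label)

def lookup(label):
    # One memory reference: index straight into the table.
    return table[label]

install(42, out_port=3, new_label=77)
```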
8
Exact Matches in Ethernet Switches
• Layer-2 addresses are usually 48 bits long
• The address is global, not just local to the link
• The range/size of the address is not "negotiable" (like it is with ATM/MPLS)
• 2^48 > 10^12, therefore cannot hold all addresses in table and use direct lookup
9
Exact Matches in Ethernet Switches (Associative Lookup)

• Associative memory (aka Content Addressable Memory, CAM) compares all entries in parallel against incoming data.

[Figure: the 48-bit network address is presented to the CAM; the match location then indexes a "normal" memory whose data gives the port.]
10
Exact Matches in Ethernet Switches: Hashing

• Use a pseudo-random hash function (relatively insensitive to actual function)
• Bucket linearly searched (or could be binary search, etc.)
• Leads to unpredictable number of memory references

[Figure: the 48-bit network address is hashed down to, say, 16 bits; the hashed value indexes a memory of pointers, each pointing to the list/bucket of network addresses that hash to it.]
11
Exact Matches Using Hashing: Number of Memory References

Expected number of memory references:

  ER = 1 + (1/2) · E[length of list | list not empty]
     = 1 + (M/N) / (2 · (1 − (1 − 1/N)^M))

Where:
  ER = expected number of memory references
  M  = number of memory addresses in table
  N  = number of linked lists (buckets)
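As a numerical sanity check of the formula above (a sketch; the parameter values are illustrative):

```python
# Expected number of memory references for a hash table with M entries
# spread over N buckets, searched linearly within a bucket:
#   ER = 1 + (M/N) / (2 * (1 - (1 - 1/N)**M))

def expected_refs(M, N):
    p_nonempty = 1.0 - (1.0 - 1.0 / N) ** M
    avg_nonempty_list = (M / N) / p_nonempty
    return 1.0 + avg_nonempty_list / 2.0

# With many more buckets than entries, lists are short and ER approaches
# 1.5 (reach the bucket, then the entry is usually first in its list).
print(round(expected_refs(1000, 2**16), 3))
```

As the table fills (M grows relative to N), lists lengthen and ER grows, which is the "unpredictable number of memory references" drawback noted on the previous slide.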
12
Exact Matches in Ethernet Switches: Perfect Hashing

[Figure: the 48-bit network address is hashed down to, say, 16 bits, which index the memory directly to yield the port.]

There always exists a perfect hash function.

Goal: With a perfect hash function, memory lookup always takes O(1) memory references.

Problem:
– Finding perfect hash functions (particularly minimal perfect hash functions) is very complex.
– Updates?
13
Exact Matches in Ethernet Switches
Hashing
• Advantages:
  – Simple
  – Expected lookup time is small
• Disadvantages:
  – Inefficient use of memory
  – Non-deterministic lookup time
Attractive for software-based switches, but decreasing use in hardware platforms
14
Exact Matches in Ethernet Switches: Trees and Tries

[Figure: a binary search tree over N entries (depth log2 N, comparing < / > at each node) and a binary search trie (branching on one address bit, 0/1, per level).]

Binary search tree: lookup time dependent on table size, but independent of address length; storage is O(N).
Binary search trie: lookup time bounded and independent of table size; storage is O(NW).
15
Exact Matches in Ethernet Switches: Multiway Tries

[Figure: a 16-ary search trie; each node holds 16 (key, ptr) entries from (0000, ptr) to (1111, ptr), with paths shown for the addresses 000011110000 and 111111111111. Ptr = 0 means no children.]

Q: Why can't we just make it a 2^48-ary trie?
16
Exact Matches in Ethernet Switches
Multiway tries
Degree of   # Mem         # Nodes   Total Memory   Fraction
Tree        References    (x10^6)   (MBytes)       Wasted (%)
2           48            1.09      4.3            49
4           24            0.53      4.3            73
8           16            0.35      5.6            86
16          12            0.25      8.3            93
64          8             0.17      21             98
256         6             0.12      64             99.5

Table produced from 2^15 randomly generated 48-bit addresses.
As degree increases, more and more pointers are "0".
17
Exact Matches in Ethernet Switches Trees and Tries
• Advantages:
  – Fixed lookup time
  – Simple to implement and update
• Disadvantages:
  – Inefficient use of memory and/or requires large number of memory references
18
Outline
I. Routing Lookups
• Overview
• Exact matching
  – Direct lookup
  – Associative lookup
  – Hashing
  – Trees and tries
• Longest prefix matching
  – Why LPM?
  – Tries and compressed tries
  – Binary search on prefix intervals
• References
II. Packet Classification
19
Longest Prefix Matching: IPv4 Addresses
• 32-bit addresses
• Dotted quad notation: e.g. 12.33.32.1
• Can be represented as integers on the IP number line [0, 2^32 − 1]: a.b.c.d denotes the integer a·2^24 + b·2^16 + c·2^8 + d

[Figure: IP number line from 0.0.0.0 to 255.255.255.255]
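The mapping onto the IP number line is mechanical; a minimal sketch (the function name is ours):

```python
# Convert dotted-quad notation to its position on the IP number line,
# per the formula a*2^24 + b*2^16 + c*2^8 + d.

def ip_to_int(dotted):
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

print(ip_to_int("12.33.32.1"))  # 12*2^24 + 33*2^16 + 32*2^8 + 1
```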
20
Class-based Addressing
[Figure: the IP number line split at 0.0.0.0, 128.0.0.0, and 192.0.0.0 into classes A, B, C, D, E.]

Class          Range                         MS bits   netid       hostid
A              0.0.0.0 - 127.255.255.255     0         bits 1-7    bits 8-31
B              128.0.0.0 - 191.255.255.255   10        bits 2-15   bits 16-31
C              192.0.0.0 - 223.255.255.255   110       bits 3-23   bits 24-31
D (multicast)  224.0.0.0 - 239.255.255.255   1110      -           -
E (reserved)   240.0.0.0 - 255.255.255.255   11110     -           -
21
Lookups with Class-based Addresses
[Figure: the class determines the netid length, then an exact match on (netid, port#) yields the port — e.g. 186.21 → Port 1 in the Class B table, 192.33.32 → Port 3 in the Class C table for destination 192.33.32.1.]
22
Problems with Class-based Addressing
• Fixed netid-hostid boundaries too inflexible
  – Caused rapid depletion of address space
• Exponential growth in size of routing tables
23
Early Exponential Growth in Routing Table Sizes
[Figure: number of BGP routes advertised vs. time]
24
Classless Addressing (and CIDR)
• Eliminated class boundaries
• Introduced the notion of a variable-length prefix between 0 and 32 bits long
• Prefixes represented by P/l: e.g., 122/8, 212.128/13, 34.43.32/22, 10.32.32.2/32, etc.
• An l-bit prefix represents an aggregation of 2^(32−l) IP addresses
25
CIDR:Hierarchical Route Aggregation
[Figure: sites S (192.2.1/24) and T (192.2.2/24) reach ISP P (192.2.0/22) through routers R1 and R2; ISP Q (200.11.0/22) attaches through R3 and R4. On the IP number line, 192.2.1/24 and 192.2.2/24 nest inside 192.2.0/22, so the backbone routing table needs only the single entry 192.2.0/22 → R2.]
26
Post-CIDR Routing Table sizes
Source: http://www.cidr-report.org/
27
Routing Lookups with CIDR
[Figure: IP number line with prefixes 192.2.0/22 → R2, 192.2.2/24 → R3 (nested inside it), and 200.11.0/22 → R4; destination addresses 192.2.0.1, 192.2.2.100, and 200.11.0.33 fall into these ranges.]
LPM: Find the most specific route, or the longest matching prefix among all the prefixes matching the destination address of an incoming packet
28
Longest Prefix Match is Harder than Exact Match
• The destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix
• Hence, one needs to search among the space of all prefix lengths, as well as the space of all prefixes of a given length
29
LPM in IPv4: Use 32 Exact Match Algorithms for LPM!

[Figure: the network address is matched in parallel against exact-match tables for prefixes of length 1, of length 2, …, of length 32; a priority encoder picks the longest matching length and outputs the port.]
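The scheme above can be sketched sequentially (a Python sketch; in hardware the 32 tables are probed in parallel and a priority encoder picks the winner — here a loop from longest to shortest plays that role):

```python
# LPM built from per-length exact-match tables: one dict per prefix
# length, keyed by the top `length` bits of the prefix.

def add_route(tables, prefix, length, port):
    tables.setdefault(length, {})[prefix >> (32 - length)] = port

def lpm(tables, addr):
    # Try every length, longest first; first hit is the longest match.
    for length in range(32, 0, -1):
        port = tables.get(length, {}).get(addr >> (32 - length))
        if port is not None:
            return port
    return None  # a default route would go here

tables = {}
add_route(tables, 0xC0020000, 22, "R2")  # 192.2.0/22
add_route(tables, 0xC0020200, 24, "R3")  # 192.2.2/24
```

A lookup of 192.2.2.100 finds the /24 before the /22, matching the "most specific route" rule of the earlier CIDR slide.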
30
Metrics for Lookup Algorithms
• Speed (= number of memory accesses)
• Storage requirements (= amount of memory)
• Low update time (support ~5K updates/s)
• Scalability
  – With length of prefix: IPv4 unicast (32b), Ethernet (48b), IPv4 multicast (64b), IPv6 unicast (128b)
  – With size of routing table: (sweet spot for today's designs = 1 million)
• Flexibility in implementation
• Low preprocessing time
31
Radix Trie
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: binary trie with nodes A-H; each trie node holds (left-ptr, right-ptr, next-hop-ptr if prefix). P1-P4 sit at the nodes reached by their prefix bits. Lookup of 10111 walks the trie bit by bit; adding P5 = 1110* creates a new node I.]
32
Radix Trie
• W-bit prefixes: O(W) lookup, O(NW) storage and O(W) update complexity
Advantages:
– Simplicity
– Extensible to wider fields

Disadvantages:
– Worst case lookup slow
– Wastage of storage space in chains
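The radix-trie lookup and update just described can be sketched directly (a Python sketch; prefixes are written as bit-strings, and the class and function names are ours):

```python
# A minimal 1-bit radix trie for LPM, following the node layout above:
# each node holds left/right child pointers and a next-hop if the node
# corresponds to a prefix.

class Node:
    def __init__(self):
        self.child = {"0": None, "1": None}
        self.next_hop = None

def insert(root, prefix, next_hop):
    # O(W) update: walk/extend the path spelled by the prefix bits.
    n = root
    for bit in prefix:
        if n.child[bit] is None:
            n.child[bit] = Node()
        n = n.child[bit]
    n.next_hop = next_hop

def lookup(root, addr_bits):
    # O(W) lookup: walk the trie, remembering the last prefix seen.
    n, best = root, None
    for bit in addr_bits:
        n = n.child[bit]
        if n is None:
            break
        if n.next_hop is not None:
            best = n.next_hop
    return best

root = Node()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    insert(root, prefix, hop)
```

Looking up 10111 with the slide's table returns H2 (P2 = 10* is the longest prefix that matches), illustrating why a lookup must remember the most recent prefix rather than stop at the first match.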
33
Leaf-pushed Binary Trie
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: binary trie with internal nodes A-E; all next-hops are pushed down to the leaves, so each trie node holds (left-ptr or next-hop, right-ptr or next-hop) and the leaves carry P1, P2, P3, P4.]
34
PATRICIA
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: Patricia tree with internal nodes A (bit-position 2), B, D (bit-position 3), F, G (bit-position 5) and leaves holding P1-P4; each internal node stores (bit-position, left-ptr, right-ptr), skipping bits where there is no branching. Lookup of 10111 tests only the stored bit positions (numbered 1-5) and must verify the match, backtracking on failure.]
35
PATRICIA

• W-bit prefixes: O(W^2) lookup, O(N) storage and O(W) update complexity

Advantages:
– Decreased storage
– Extensible to wider fields

Disadvantages:
– Worst case lookup slow
– Backtracking makes implementation complex
36
Path-compressed Tree
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: path-compressed tree; each node stores (variable-length bitstring, next-hop if prefix present, bit-position, left-ptr, right-ptr) — e.g. node A = (1, -, 2), node B = (10, P2, 4), node D = (1010, P3, 5), with leaves P1 and P4. Lookup of 10111 compares the stored bitstrings along the path.]
37
Path-compressed Tree

• W-bit prefixes: O(W) lookup, O(N) storage and O(W) update complexity

Advantages:
– Decreased storage

Disadvantages:
– Worst case lookup slow
38
Multi-bit Tries
Binary trie: depth = W, degree = 2, stride = 1 bit
Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits
39
Prefix Expansion with Multi-bit Tries
If stride = k bits, prefix lengths that are not a multiple of k need to be expanded.

E.g., k = 2:
Prefix   Expanded prefixes
0*       00*, 01*
11*      11*

Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k−1)
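Prefix expansion is a one-liner in spirit (a Python sketch; prefixes as bit-strings, function name ours):

```python
# Expand a prefix whose length is not a multiple of the stride k into
# the equivalent set of prefixes at the next multiple-of-k length;
# the worst case is 2^(k-1) expanded prefixes (pad = k-1 bits).

def expand(prefix, k):
    target = -(-len(prefix) // k) * k   # round length up to a multiple of k
    pad = target - len(prefix)
    if pad == 0:
        return [prefix]
    return [prefix + format(i, f"0{pad}b") for i in range(2 ** pad)]

print(expand("0", 2))   # the slide's example: 0* becomes 00*, 01*
print(expand("11", 2))  # already a multiple of k: unchanged
```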
40
Four-ary Trie (k=2)
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: four-ary trie with nodes A-G; each node holds (ptr00, ptr01, ptr10, ptr11, next-hop-ptr if prefix). After expansion, P1 becomes P11/P12 and P4 becomes P41/P42. Lookup of 10111 consumes two address bits per step.]
41
Prefix Expansion Increases Storage Consumption
• Replication of next-hop ptr
• Greater number of unused (null) pointers in a node

Time ~ W/k
Storage ~ (N·W/k) · 2^(k−1)
42
Generalization: Different Strides at Each Trie Level
• 16-8-8 split
• 4-10-10-8 split
• 24-8 split
• 21-3-8 split
Optional Exercise: Why does this not work well for IPv6?
43
Choice of Strides: Controlled Prefix Expansion [Sri98]
Given a forwarding table and a desired number of memory accesses in the worst case (i.e., maximum tree depth, D):

A dynamic programming algorithm computes the optimal sequence of strides that minimizes the storage requirements; it runs in O(W^2 · D) time.

Advantages:
– Optimal storage under these constraints

Disadvantages:
– Updates lead to sub-optimality anyway
– Hardware implementation difficult
44
Binary Search on Prefix Intervals [Lampson98]

Prefix      Interval
P1  /0      0000…1111
P2  00/2    0000…0011
P3  1/1     1000…1111
P4  1101/4  1101…1101
P5  001/3   0010…0011

[Figure: the IP number line 0000-1111, partitioned by the prefixes P1-P5 into the disjoint intervals I1-I6 with boundaries at 0010, 0100, 1000, 1101, and 1110.]
45
Alphabetic Tree

[Figure: a binary search tree over the interval boundaries 0001, 0011, 0111, 1100, 1101 (root 0111); its leaves are the intervals I1-I6, at depths corresponding to weights 1/2, 1/4, 1/8, 1/16, 1/32, 1/32. The same number line, prefixes P1-P5, and intervals I1-I6 from the previous slide are shown below the tree.]
46
Another Alphabetic Tree

[Figure: a differently balanced tree over the same boundaries 0001, 0011, 0111, 1100, 1101; the leaves I1-I6 now sit at depths corresponding to a different assignment of the weights 1/2, 1/4, 1/8, 1/16, 1/32, 1/32.]
47
Multiway Search on Intervals

• W-bit, N prefixes: O(log N) lookup, O(N) storage

Advantages:
– Storage is linear
– Can be 'balanced'
– Lookup time independent of W

Disadvantages:
– But, lookup time is dependent on N
– Incremental updates complex
– Each node is big in size: requires higher memory bandwidth
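The interval search can be sketched with a plain binary search over the interval start points, using the five-prefix example from the earlier slide (a sketch; Python's bisect stands in for the alphabetic tree, and the precomputed answer array is derived by hand from that slide's table):

```python
# Binary search on prefix intervals: the prefixes P1 /0, P2 00/2,
# P3 1/1, P4 1101/4, P5 001/3 partition the 4-bit number line into six
# intervals; each interval's best-matching prefix is precomputed.

import bisect

starts  = [0b0000, 0b0010, 0b0100, 0b1000, 0b1101, 0b1110]  # I1..I6
answers = ["P2",   "P5",   "P1",   "P3",   "P4",   "P3"]

def lookup(addr):
    # Find the rightmost interval start <= addr; its precomputed
    # answer is the longest matching prefix.
    return answers[bisect.bisect_right(starts, addr) - 1]

print(lookup(0b1101))  # falls in P4's single-address interval
```

Note the trade-off the slide states: the search depth depends on the number of intervals (hence on N), not on the address width W.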
48
Routing Lookups: References
• [lulea98] A. Brodnik, S. Carlsson, M. Degermark, S. Pink. “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp 3-14. [Example of techniques for decreasing storage consumption]
• [gupta98] P. Gupta, S. Lin, N. McKeown. "Routing lookups in hardware at memory access speeds", Infocom 1998, pp 1241-1248, vol. 3. [Example of hardware-optimized trie with increased storage consumption]
• P. Gupta, B. Prabhakar, S. Boyd. "Near-optimal routing lookups with bounded worst case performance", Proc. Infocom, March 2000. [Example of deliberately skewing alphabetic trees]
• P. Gupta. "Algorithms for routing lookups and packet classification", PhD Thesis, Ch 1 and 2, Dec 2000, available at http://yuba.stanford.edu/~pankaj/phd.html [Background and introduction to LPM]
49
Routing Lookups: References (contd.)

• [lampson98] B. Lampson, V. Srinivasan, G. Varghese. "IP lookups using multiway and multicolumn search", Infocom 1998, pp 1248-56, vol. 3.
• [LC-trie] S. Nilsson, G. Karlsson. "Fast address lookup for Internet routers", IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.
• [sri98] V. Srinivasan, G. Varghese. "Fast IP lookups using controlled prefix expansion", Sigmetrics, June 1998.
• [wald98] M. Waldvogel, G. Varghese, J. Turner, B. Plattner. "Scalable high speed IP routing lookups", Sigcomm 1997, pp 25-36.