EE384Y: Packet Switch Architectures
Part II: Address Lookup and Classification

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
[email protected]
http://www.stanford.edu/~nickm
Mar 26, 2015
1
2
Generic Router Architecture (Review from EE384x)

[Figure: datapath of a generic router. Header processing: Lookup IP Address → Update Header, backed by an Address Table mapping IP address → next hop (~1M prefixes, off-chip DRAM). Queue Packet: Buffer Memory (~1M packets, off-chip DRAM).]
3
Lookups Must be Fast
Year   Line rate   40B packets (Mpkt/s)
1997   622 Mb/s    1.94
1999   2.5 Gb/s    7.81
2001   10 Gb/s     31.25
2003   40 Gb/s     125

1. Lookup mechanism must be simple and easy to implement
2. (Surprise?) Memory access time is the long-term bottleneck
4
Memory Technology (2003-04)
Technology        Single chip density   $/chip ($/MByte)        Access speed   Watts/chip
Networking DRAM   64 MB                 $30-$50 ($0.50-$0.75)   40-80 ns       0.5-2 W
SRAM              4 MB                  $20-$30 ($5-$8)         4-8 ns         1-3 W
TCAM              1 MB                  $200-$250 ($200-$250)   4-8 ns         15-30 W
Note: Price, speed and power are manufacturer and market dependent.
5
Lookup Mechanism is Protocol Dependent
Networking Protocol   Lookup Mechanism              Techniques
MPLS, ATM, Ethernet   Exact match search            – Direct lookup
                                                    – Associative lookup
                                                    – Hashing
                                                    – Binary/multi-way search trie/tree
IPv4, IPv6            Longest-prefix match search   – Radix trie and variants
                                                    – Compressed trie
                                                    – Binary search on prefix intervals
6
Outline
I. Routing Lookups
• Overview
• Exact matching
  – Direct lookup
  – Associative lookup
  – Hashing
  – Trees and tries
• Longest prefix matching
  – Why LPM?
  – Tries and compressed tries
  – Binary search on prefix intervals
• References
II. Packet Classification
7
Exact Matches in ATM/MPLS
Direct Memory Lookup

[Figure: the VCI/MPLS label is used directly as a memory address; the data read out is (outgoing port, new VCI/label).]

• VCI/label space is 24 bits
  – Maximum 16M addresses. With 64b data, this is 1Gb of memory.
• VCI/label space is private to one link
• Therefore, table size can be "negotiated"
• Alternately, use a level of indirection
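The direct-lookup idea above can be sketched in a few lines (a Python sketch; the 10-bit negotiated label space and the helper names are illustrative assumptions, not from the slides):

```python
# Direct memory lookup for ATM/MPLS labels: the label is used directly
# as an array index, so a lookup is a single memory reference. A real
# line card would negotiate a smaller label space (or add a level of
# indirection) to avoid a table with 2^24 entries.

LABEL_BITS = 10  # hypothetical negotiated label space (up to 24 bits in general)

table = [None] * (1 << LABEL_BITS)  # one entry per possible label

def install(label, out_port, new_label):
    table[label] = (out_port, new_label)

def lookup(label):
    # One memory reference: index straight into the table.
    return table[label]

install(42, out_port=3, new_label=77)
```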
8
Exact Matches in Ethernet Switches
• Layer-2 addresses are usually 48 bits long
• The address is global, not just local to the link
• The range/size of the address is not "negotiable" (like it is with ATM/MPLS)
• 2^48 > 10^12, therefore cannot hold all addresses in table and use direct lookup
9
Exact Matches in Ethernet Switches (Associative Lookup)

• Associative memory (aka Content Addressable Memory, CAM) compares all entries in parallel against incoming data.

[Figure: the 48-bit network address is presented to the CAM; the match location then indexes a "normal" memory whose data gives the port.]
10
Exact Matches in Ethernet Switches: Hashing

• Use a pseudo-random hash function (relatively insensitive to actual function)
• Bucket linearly searched (or could be binary search, etc.)
• Leads to unpredictable number of memory references

[Figure: the 48-bit network address is hashed down to, say, 16 bits; the hashed value indexes a memory of pointers, each pointing to the list/bucket of network addresses that hash to it.]
11
Exact Matches Using Hashing: Number of Memory References

Expected number of memory references:

  ER = 1 + (1/2) · E[length of list | list not empty]
     = 1 + (M/N) / (2 · (1 − (1 − 1/N)^M))

Where:
  ER = expected number of memory references
  M  = number of memory addresses in table
  N  = number of linked lists (buckets)
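As a numerical sanity check of the formula above (a sketch; the parameter values are illustrative):

```python
# Expected number of memory references for a hash table with M entries
# spread over N buckets, searched linearly within a bucket:
#   ER = 1 + (M/N) / (2 * (1 - (1 - 1/N)**M))

def expected_refs(M, N):
    p_nonempty = 1.0 - (1.0 - 1.0 / N) ** M
    avg_nonempty_list = (M / N) / p_nonempty
    return 1.0 + avg_nonempty_list / 2.0

# With many more buckets than entries, lists are short and ER approaches
# 1.5 (reach the bucket, then the entry is usually first in its list).
print(round(expected_refs(1000, 2**16), 3))
```

As the table fills (M grows relative to N), lists lengthen and ER grows, which is the "unpredictable number of memory references" drawback noted on the previous slide.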
12
Exact Matches in Ethernet Switches: Perfect Hashing

[Figure: the 48-bit network address is hashed down to, say, 16 bits, which index the memory directly to yield the port.]

There always exists a perfect hash function.

Goal: With a perfect hash function, memory lookup always takes O(1) memory references.

Problem:
– Finding perfect hash functions (particularly minimal perfect hash functions) is very complex.
– Updates?
13
Exact Matches in Ethernet Switches
Hashing
• Advantages:
  – Simple
  – Expected lookup time is small
• Disadvantages:
  – Inefficient use of memory
  – Non-deterministic lookup time
Attractive for software-based switches, but decreasing use in hardware platforms
14
Exact Matches in Ethernet Switches: Trees and Tries

[Figure: a binary search tree over N entries (depth log2 N, comparing < / > at each node) and a binary search trie (branching on one address bit, 0/1, per level).]

Binary search tree: lookup time dependent on table size, but independent of address length; storage is O(N).
Binary search trie: lookup time bounded and independent of table size; storage is O(NW).
15
Exact Matches in Ethernet Switches: Multiway Tries

[Figure: a 16-ary search trie; each node holds 16 (key, ptr) entries from (0000, ptr) to (1111, ptr), with paths shown for the addresses 000011110000 and 111111111111. Ptr = 0 means no children.]

Q: Why can't we just make it a 2^48-ary trie?
16
Exact Matches in Ethernet Switches
Multiway tries
Degree of   # Mem         # Nodes   Total Memory   Fraction
Tree        References    (x10^6)   (MBytes)       Wasted (%)
2           48            1.09      4.3            49
4           24            0.53      4.3            73
8           16            0.35      5.6            86
16          12            0.25      8.3            93
64          8             0.17      21             98
256         6             0.12      64             99.5

Table produced from 2^15 randomly generated 48-bit addresses.
As degree increases, more and more pointers are "0".
17
Exact Matches in Ethernet Switches Trees and Tries
• Advantages:
  – Fixed lookup time
  – Simple to implement and update
• Disadvantages:
  – Inefficient use of memory and/or requires large number of memory references
18
Outline
I. Routing Lookups
• Overview
• Exact matching
  – Direct lookup
  – Associative lookup
  – Hashing
  – Trees and tries
• Longest prefix matching
  – Why LPM?
  – Tries and compressed tries
  – Binary search on prefix intervals
• References
II. Packet Classification
19
Longest Prefix Matching: IPv4 Addresses
• 32-bit addresses
• Dotted quad notation: e.g. 12.33.32.1
• Can be represented as integers on the IP number line [0, 2^32 − 1]: a.b.c.d denotes the integer a·2^24 + b·2^16 + c·2^8 + d

[Figure: IP number line from 0.0.0.0 to 255.255.255.255]
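The mapping onto the IP number line is mechanical; a minimal sketch (the function name is ours):

```python
# Convert dotted-quad notation to its position on the IP number line,
# per the formula a*2^24 + b*2^16 + c*2^8 + d.

def ip_to_int(dotted):
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

print(ip_to_int("12.33.32.1"))  # 12*2^24 + 33*2^16 + 32*2^8 + 1
```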
20
Class-based Addressing
[Figure: the IP number line split at 0.0.0.0, 128.0.0.0, and 192.0.0.0 into classes A, B, C, D, E.]

Class          Range                         MS bits   netid       hostid
A              0.0.0.0 - 127.255.255.255     0         bits 1-7    bits 8-31
B              128.0.0.0 - 191.255.255.255   10        bits 2-15   bits 16-31
C              192.0.0.0 - 223.255.255.255   110       bits 3-23   bits 24-31
D (multicast)  224.0.0.0 - 239.255.255.255   1110      -           -
E (reserved)   240.0.0.0 - 255.255.255.255   11110     -           -
21
Lookups with Class-based Addresses
[Figure: the class determines the netid length, then an exact match on (netid, port#) yields the port — e.g. 186.21 → Port 1 in the Class B table, 192.33.32 → Port 3 in the Class C table for destination 192.33.32.1.]
22
Problems with Class-based Addressing
• Fixed netid-hostid boundaries too inflexible
  – Caused rapid depletion of address space
• Exponential growth in size of routing tables
23
Early Exponential Growth in Routing Table Sizes
[Figure: number of BGP routes advertised vs. time]
24
Classless Addressing (and CIDR)
• Eliminated class boundaries
• Introduced the notion of a variable-length prefix between 0 and 32 bits long
• Prefixes represented by P/l: e.g., 122/8, 212.128/13, 34.43.32/22, 10.32.32.2/32, etc.
• An l-bit prefix represents an aggregation of 2^(32−l) IP addresses
25
CIDR:Hierarchical Route Aggregation
[Figure: sites S (192.2.1/24) and T (192.2.2/24) reach ISP P (192.2.0/22) through routers R1 and R2; ISP Q (200.11.0/22) attaches through R3 and R4. On the IP number line, 192.2.1/24 and 192.2.2/24 nest inside 192.2.0/22, so the backbone routing table needs only the single entry 192.2.0/22 → R2.]
26
Post-CIDR Routing Table sizes
Source: http://www.cidr-report.org/
27
Routing Lookups with CIDR
[Figure: IP number line with prefixes 192.2.0/22 → R2, 192.2.2/24 → R3 (nested inside it), and 200.11.0/22 → R4; destination addresses 192.2.0.1, 192.2.2.100, and 200.11.0.33 fall into these ranges.]
LPM: Find the most specific route, or the longest matching prefix among all the prefixes matching the destination address of an incoming packet
28
Longest Prefix Match is Harder than Exact Match
• The destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix
• Hence, one needs to search among the space of all prefix lengths, as well as the space of all prefixes of a given length
29
LPM in IPv4: Use 32 Exact Match Algorithms for LPM!

[Figure: the network address is matched in parallel against exact-match tables for prefixes of length 1, of length 2, …, of length 32; a priority encoder picks the longest matching length and outputs the port.]
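The scheme above can be sketched sequentially (a Python sketch; in hardware the 32 tables are probed in parallel and a priority encoder picks the winner — here a loop from longest to shortest plays that role):

```python
# LPM built from per-length exact-match tables: one dict per prefix
# length, keyed by the top `length` bits of the prefix.

def add_route(tables, prefix, length, port):
    tables.setdefault(length, {})[prefix >> (32 - length)] = port

def lpm(tables, addr):
    # Try every length, longest first; first hit is the longest match.
    for length in range(32, 0, -1):
        port = tables.get(length, {}).get(addr >> (32 - length))
        if port is not None:
            return port
    return None  # a default route would go here

tables = {}
add_route(tables, 0xC0020000, 22, "R2")  # 192.2.0/22
add_route(tables, 0xC0020200, 24, "R3")  # 192.2.2/24
```

A lookup of 192.2.2.100 finds the /24 before the /22, matching the "most specific route" rule of the earlier CIDR slide.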
30
Metrics for Lookup Algorithms
• Speed (= number of memory accesses)
• Storage requirements (= amount of memory)
• Low update time (support ~5K updates/s)
• Scalability
  – With length of prefix: IPv4 unicast (32b), Ethernet (48b), IPv4 multicast (64b), IPv6 unicast (128b)
  – With size of routing table: (sweet spot for today's designs = 1 million)
• Flexibility in implementation
• Low preprocessing time
31
Radix Trie
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: binary trie with nodes A-H; each trie node holds (left-ptr, right-ptr, next-hop-ptr if prefix). P1-P4 sit at the nodes reached by their prefix bits. Lookup of 10111 walks the trie bit by bit; adding P5 = 1110* creates a new node I.]
32
Radix Trie
• W-bit prefixes: O(W) lookup, O(NW) storage and O(W) update complexity
Advantages:
– Simplicity
– Extensible to wider fields

Disadvantages:
– Worst case lookup slow
– Wastage of storage space in chains
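The radix-trie lookup and update just described can be sketched directly (a Python sketch; prefixes are written as bit-strings, and the class and function names are ours):

```python
# A minimal 1-bit radix trie for LPM, following the node layout above:
# each node holds left/right child pointers and a next-hop if the node
# corresponds to a prefix.

class Node:
    def __init__(self):
        self.child = {"0": None, "1": None}
        self.next_hop = None

def insert(root, prefix, next_hop):
    # O(W) update: walk/extend the path spelled by the prefix bits.
    n = root
    for bit in prefix:
        if n.child[bit] is None:
            n.child[bit] = Node()
        n = n.child[bit]
    n.next_hop = next_hop

def lookup(root, addr_bits):
    # O(W) lookup: walk the trie, remembering the last prefix seen.
    n, best = root, None
    for bit in addr_bits:
        n = n.child[bit]
        if n is None:
            break
        if n.next_hop is not None:
            best = n.next_hop
    return best

root = Node()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    insert(root, prefix, hop)
```

Looking up 10111 with the slide's table returns H2 (P2 = 10* is the longest prefix that matches), illustrating why a lookup must remember the most recent prefix rather than stop at the first match.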
33
Leaf-pushed Binary Trie
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: binary trie with internal nodes A-E; all next-hops are pushed down to the leaves, so each trie node holds (left-ptr or next-hop, right-ptr or next-hop) and the leaves carry P1, P2, P3, P4.]
34
PATRICIA
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: Patricia tree with internal nodes A (bit-position 2), B, D (bit-position 3), F, G (bit-position 5) and leaves holding P1-P4; each internal node stores (bit-position, left-ptr, right-ptr), skipping bits where there is no branching. Lookup of 10111 tests only the stored bit positions (numbered 1-5) and must verify the match, backtracking on failure.]
35
PATRICIA

• W-bit prefixes: O(W^2) lookup, O(N) storage and O(W) update complexity

Advantages:
– Decreased storage
– Extensible to wider fields

Disadvantages:
– Worst case lookup slow
– Backtracking makes implementation complex
36
Path-compressed Tree
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: path-compressed tree; each node stores (variable-length bitstring, next-hop if prefix present, bit-position, left-ptr, right-ptr) — e.g. node A = (1, -, 2), node B = (10, P2, 4), node D = (1010, P3, 5), with leaves P1 and P4. Lookup of 10111 compares the stored bitstrings along the path.]
37
Path-compressed Tree

• W-bit prefixes: O(W) lookup, O(N) storage and O(W) update complexity

Advantages:
– Decreased storage

Disadvantages:
– Worst case lookup slow
38
Multi-bit Tries
Binary trie: depth = W, degree = 2, stride = 1 bit
Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits
39
Prefix Expansion with Multi-bit Tries
If stride = k bits, prefix lengths that are not a multiple of k need to be expanded.

E.g., k = 2:
Prefix   Expanded prefixes
0*       00*, 01*
11*      11*

Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k−1)
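Prefix expansion is a one-liner in spirit (a Python sketch; prefixes as bit-strings, function name ours):

```python
# Expand a prefix whose length is not a multiple of the stride k into
# the equivalent set of prefixes at the next multiple-of-k length;
# the worst case is 2^(k-1) expanded prefixes (pad = k-1 bits).

def expand(prefix, k):
    target = -(-len(prefix) // k) * k   # round length up to a multiple of k
    pad = target - len(prefix)
    if pad == 0:
        return [prefix]
    return [prefix + format(i, f"0{pad}b") for i in range(2 ** pad)]

print(expand("0", 2))   # the slide's example: 0* becomes 00*, 01*
print(expand("11", 2))  # already a multiple of k: unchanged
```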
40
Four-ary Trie (k=2)
Prefix      Next-hop
P1  111*    H1
P2  10*     H2
P3  1010*   H3
P4  10101*  H4

[Figure: four-ary trie with nodes A-G; each node holds (ptr00, ptr01, ptr10, ptr11, next-hop-ptr if prefix). After expansion, P1 becomes P11/P12 and P4 becomes P41/P42. Lookup of 10111 consumes two address bits per step.]
41
Prefix Expansion Increases Storage Consumption
• Replication of next-hop ptr
• Greater number of unused (null) pointers in a node

Time ~ W/k
Storage ~ (N·W/k) · 2^(k−1)
42
Generalization: Different Strides at Each Trie Level
• 16-8-8 split
• 4-10-10-8 split
• 24-8 split
• 21-3-8 split
Optional Exercise: Why does this not work well for IPv6?
43
Choice of Strides: Controlled Prefix Expansion [Sri98]
Given a forwarding table and a desired number of memory accesses in the worst case (i.e., maximum tree depth, D):

A dynamic programming algorithm computes the optimal sequence of strides that minimizes the storage requirements; it runs in O(W^2 · D) time.

Advantages:
– Optimal storage under these constraints

Disadvantages:
– Updates lead to sub-optimality anyway
– Hardware implementation difficult
44
Binary Search on Prefix Intervals [Lampson98]

Prefix      Interval
P1  /0      0000…1111
P2  00/2    0000…0011
P3  1/1     1000…1111
P4  1101/4  1101…1101
P5  001/3   0010…0011

[Figure: the IP number line 0000-1111, partitioned by the prefixes P1-P5 into the disjoint intervals I1-I6 with boundaries at 0010, 0100, 1000, 1101, and 1110.]
45
Alphabetic Tree

[Figure: a binary search tree over the interval boundaries 0001, 0011, 0111, 1100, 1101 (root 0111); its leaves are the intervals I1-I6, at depths corresponding to weights 1/2, 1/4, 1/8, 1/16, 1/32, 1/32. The same number line, prefixes P1-P5, and intervals I1-I6 from the previous slide are shown below the tree.]
46
Another Alphabetic Tree

[Figure: a differently balanced tree over the same boundaries 0001, 0011, 0111, 1100, 1101; the leaves I1-I6 now sit at depths corresponding to a different assignment of the weights 1/2, 1/4, 1/8, 1/16, 1/32, 1/32.]
47
Multiway Search on Intervals

• W-bit, N prefixes: O(log N) lookup, O(N) storage

Advantages:
– Storage is linear
– Can be 'balanced'
– Lookup time independent of W

Disadvantages:
– But, lookup time is dependent on N
– Incremental updates complex
– Each node is big in size: requires higher memory bandwidth
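The interval search can be sketched with a plain binary search over the interval start points, using the five-prefix example from the earlier slide (a sketch; Python's bisect stands in for the alphabetic tree, and the precomputed answer array is derived by hand from that slide's table):

```python
# Binary search on prefix intervals: the prefixes P1 /0, P2 00/2,
# P3 1/1, P4 1101/4, P5 001/3 partition the 4-bit number line into six
# intervals; each interval's best-matching prefix is precomputed.

import bisect

starts  = [0b0000, 0b0010, 0b0100, 0b1000, 0b1101, 0b1110]  # I1..I6
answers = ["P2",   "P5",   "P1",   "P3",   "P4",   "P3"]

def lookup(addr):
    # Find the rightmost interval start <= addr; its precomputed
    # answer is the longest matching prefix.
    return answers[bisect.bisect_right(starts, addr) - 1]

print(lookup(0b1101))  # falls in P4's single-address interval
```

Note the trade-off the slide states: the search depth depends on the number of intervals (hence on N), not on the address width W.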
48
Routing Lookups: References
• [lulea98] A. Brodnik, S. Carlsson, M. Degermark, S. Pink. “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp 3-14. [Example of techniques for decreasing storage consumption]
• [gupta98] P. Gupta, S. Lin, N. McKeown. "Routing lookups in hardware at memory access speeds", Infocom 1998, pp 1241-1248, vol. 3. [Example of hardware-optimized trie with increased storage consumption]
• P. Gupta, B. Prabhakar, S. Boyd. "Near-optimal routing lookups with bounded worst case performance", Proc. Infocom, March 2000. [Example of deliberately skewing alphabetic trees]
• P. Gupta. "Algorithms for routing lookups and packet classification", PhD Thesis, Ch 1 and 2, Dec 2000, available at http://yuba.stanford.edu/~pankaj/phd.html [Background and introduction to LPM]
49
Routing Lookups: References (contd.)

• [lampson98] B. Lampson, V. Srinivasan, G. Varghese. "IP lookups using multiway and multicolumn search", Infocom 1998, pp 1248-56, vol. 3.
• [LC-trie] S. Nilsson, G. Karlsson. "Fast address lookup for Internet routers", IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.
• [sri98] V. Srinivasan, G. Varghese. "Fast IP lookups using controlled prefix expansion", Sigmetrics, June 1998.
• [wald98] M. Waldvogel, G. Varghese, J. Turner, B. Plattner. "Scalable high speed IP routing lookups", Sigcomm 1997, pp 25-36.