From Nick McKeown's tutorial, 1999, and slides from Kalyanaraman (with figure from Keshav). Some slides modified by C. Pham.
Forwarding Decisions
• ATM and MPLS switches
– Direct Lookup
• Bridges and Ethernet switches
– Associative Lookup
– Hashing
– Trees and tries
• IP Routers
– CIDR
– Patricia trees/tries
– Other methods
– Caching
• Packet Classification
Source: Forwarding Decisions, cpham.perso.univ-pau.fr/ENSEIGNEMENT/IUP/RouterForwarding.pdf
ATM and MPLS Switches: Direct Lookup
[Figure: the incoming VCI is used directly as the memory address; the data read from memory is the (Port, VCI) pair.]
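A minimal sketch of direct lookup, assuming a 16-bit VCI space and a flat table indexed by the incoming VCI. The names `vc_table`, `install`, and `direct_lookup` are illustrative and do not come from the slides.

```python
# Direct lookup: the incoming VCI is the memory address.
# The table size assumes a 16-bit VCI; entries are (output port, outgoing VCI).
TABLE_SIZE = 1 << 16

vc_table = [None] * TABLE_SIZE


def install(vci_in, port_out, vci_out):
    """Program one connection into the table."""
    vc_table[vci_in] = (port_out, vci_out)


def direct_lookup(vci_in):
    """One memory read, no search: returns (port, VCI) or None."""
    return vc_table[vci_in]


install(42, 3, 17)
print(direct_lookup(42))
```

The lookup cost is a single memory access regardless of how many connections are installed, which is why it suits small, dense label spaces.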
Bridges and Ethernet Switches: Associative Lookups
[Figure: a 48-bit network address is presented as search data to an associative memory (CAM); the CAM outputs a hit flag and a log2 N-bit address that selects the associated data.]
Advantages:
• Simple
Disadvantages:
• Slow
• High power
• Small
• Expensive
Bridges and Ethernet Switches: Hashing
[Figure: a hashing function reduces the 48-bit search data to a 16-bit memory address; the memory outputs a hit flag, a log2 N-bit address, and the associated data.]
Lookups Using Hashing: An example
[Figure: a CRC-16 hashing function reduces the 48-bit search data to a 16-bit value that selects one of N linked lists (shown with entries #1 #2 #3 #4, #1 #2, and #1 #2 #3); the lists hold M entries in memory. Outputs are a hit flag, a log2 N-bit address, and the associated data.]
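The scheme in the figure can be sketched as a hash table with chaining. This is a hedged illustration: `binascii.crc_hqx` is one 16-bit CRC (CCITT polynomial) from the Python standard library, chosen here only because the slide says "CRC-16"; the names `learn` and `mac_lookup` and the sample address are assumptions.

```python
import binascii

N_LISTS = 1 << 16  # one list head per 16-bit hash value

# Sparse stand-in for an array of N_LISTS list heads.
buckets = {}


def bucket_index(mac: bytes) -> int:
    """Reduce a 48-bit address to a 16-bit list index."""
    return binascii.crc_hqx(mac, 0)


def learn(mac: bytes, port: int) -> None:
    """Insert or update an address in its list."""
    chain = buckets.setdefault(bucket_index(mac), [])
    for i, (known, _) in enumerate(chain):
        if known == mac:
            chain[i] = (mac, port)
            return
    chain.append((mac, port))


def mac_lookup(mac: bytes):
    """Walk one list; cost grows with the length of that list."""
    for known, port in buckets.get(bucket_index(mac), []):
        if known == mac:
            return port
    return None


learn(bytes.fromhex("0200000000a1"), 4)
print(mac_lookup(bytes.fromhex("0200000000a1")))
```

The lookup time depends on how the M entries spread over the N lists, which is the source of the non-deterministic lookup time noted for hashing.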
Lookups Using Hashing: Performance of simple example
[Figure: performance plot with two annotated regions: "Most addresses in their own list" and "Most addresses in one list".]
Lookups Using Hashing
Advantages:
• Simple
• Expected lookup time can be small
Disadvantages:
• Non-deterministic lookup time
• Inefficient use of memory
Trees and Tries
[Figure: Binary search tree with N entries; each node branches on < or >, and the depth is log2 N.]
[Figure: Binary search trie; each level branches on the next bit (0 or 1). Bit-string labels as extracted: 111010.]
Tries
• An entry is:
– a pointer to another array,
– a special symbol indicating no better match, or
– a null pointer indicating that the longest match is the parent node
• Two ways to improve performance:
– cache recently used addresses in a CAM
– move common entries up to a higher level (match longer strings)
[Figure: example lookup of 128.32.1.2.]
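A rough sketch of the trie walk described above, assuming one trie level per address octet and prefixes that end on octet boundaries. A missing child stands in for the null pointer; `TrieNode`, `insert`, `longest_match`, and the next-hop labels are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # octet -> TrieNode; a missing key acts as the null pointer
        self.next_hop = None  # set when some prefix ends at this node


def insert(root, prefix, next_hop):
    """Add a prefix given as dotted octets, e.g. '128.32'."""
    node = root
    for octet in prefix.split("."):
        node = node.children.setdefault(int(octet), TrieNode())
    node.next_hop = next_hop


def longest_match(root, address):
    """Descend one octet at a time, remembering the deepest match seen."""
    node, best = root, None
    for octet in address.split("."):
        node = node.children.get(int(octet))
        if node is None:
            break  # null pointer: the longest match is an ancestor
        if node.next_hop is not None:
            best = node.next_hop
    return best


root = TrieNode()
insert(root, "128.32", "A")
insert(root, "128.32.1", "B")
print(longest_match(root, "128.32.1.2"))
```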
Tries: Multiway tries
[Figure: 16-ary search trie; each node holds entries of the form (4-bit value, pointer), for example (0000, ptr), (1111, ptr), and (0000, 0). Example keys: 000011110000 and 111111111111.]
Trees and Tries: Multiway tries
[Table not recovered.] Table produced from 2^15 randomly generated 48-bit addresses
IP Routers: Class-based addresses
[Figure: the IP address space is divided into Class A, Class B, Class C, and D. The destination 212.17.9.4 is resolved by exact match in the routing table for its class (Class A, Class B, or Class C); the entry 212.17.9.0 maps to Port 4.]
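A small sketch of class-based forwarding, assuming the standard leading-octet ranges for classes A, B, and C. The function names and the one-entry table are illustrative; only the 212.17.9.0 to Port 4 mapping comes from the slide.

```python
def classful_network(address):
    """Return the class-based network number, or None for class D and above."""
    octets = [int(o) for o in address.split(".")]
    first = octets[0]
    if first < 128:
        keep = 1  # class A: 8-bit network number
    elif first < 192:
        keep = 2  # class B: 16-bit network number
    elif first < 224:
        keep = 3  # class C: 24-bit network number
    else:
        return None
    return ".".join(str(o) for o in octets[:keep] + [0] * (4 - keep))


# Exact-match routing table keyed by network number.
routing_table = {"212.17.9.0": 4}


def classful_lookup(address):
    return routing_table.get(classful_network(address))


print(classful_lookup("212.17.9.4"))
```

Because the class fixes the prefix length, the lookup reduces to an exact match, which is what CIDR gives up.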
IP Routers: CIDR
[Figure: the address space from 0 to 2^32 - 1, drawn twice. Class-based: divided into A, B, C, D. Classless: prefixes 65/24, 128.9/16, and 142.12/19 cover ranges of arbitrary size; 128.9/16 starts at 128.9.0.0, spans 2^16 addresses, and contains 128.9.16.14.]
IP Routers: CIDR
[Figure: the address space from 0 to 2^32 - 1 with nested prefixes 128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24, and 128.9.25/24, and a lookup of 128.9.16.14.]
Most specific route = “longest matching prefix”
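To make "longest matching prefix" concrete, here is a deliberately naive linear scan over the prefixes shown in the figure, using the standard `ipaddress` module. It is a sketch of the matching rule only, not of any fast lookup structure; `longest_prefix_match` is an illustrative name.

```python
import ipaddress

# Prefixes from the figure, written out in full dotted form.
PREFIXES = [
    "128.9.0.0/16",
    "128.9.16.0/20",
    "128.9.176.0/20",
    "128.9.19.0/24",
    "128.9.25.0/24",
]
table = [ipaddress.ip_network(p) for p in PREFIXES]


def longest_prefix_match(address):
    """Return the most specific prefix containing the address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


print(longest_prefix_match("128.9.16.14"))
```

Here 128.9.16.14 falls inside both 128.9/16 and 128.9.16/20, and the /20 wins because it is the longer prefix.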
IP Routers: Metrics for Lookups
[Figure: forwarding table with Prefix and Port columns. Prefixes: 128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24, 128.9.25/24, 142.12/19, 65/24. Port column as extracted: 35271013. Example lookup: 128.9.16.14.]
• Lookup time
• Storage space
• Update time
• Preprocessing time
IP Router Lookup
IPv4 unicast destination-address-based lookup
[Figure: the header of an incoming packet enters the forwarding engine; the next-hop computation looks up the destination address in the forwarding table (Destination, Next Hop) and returns the next hop.]
Lookup Performance Required
Gigabit Ethernet (84B packets): 1.49 Mpps
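The 1.49 Mpps figure can be checked with a few lines of arithmetic. The breakdown of the 84 bytes (64-byte minimum frame, 8-byte preamble, 12-byte inter-frame gap) is standard Ethernet framing and is added here as context; the variable names are illustrative.

```python
LINE_RATE_BPS = 1_000_000_000  # Gigabit Ethernet
WIRE_BYTES = 84                # 64-byte minimum frame + 8-byte preamble + 12-byte gap

packets_per_second = LINE_RATE_BPS / (WIRE_BYTES * 8)
lookup_budget_ns = 1e9 / packets_per_second  # time available per lookup

print(f"{packets_per_second / 1e6:.2f} Mpps, {lookup_budget_ns:.0f} ns per packet")
```

At this rate a router has about 672 ns per packet, which bounds the number of memory accesses a lookup can afford.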
Size of the Routing Table
Source: http://www.telstra.net/ops/bgptable.html
[Figure: routing table size over time. Annotations: exponential growth before CIDR; about 10k new prefixes per year.]
Size of the Forwarding Table
Source: http://www.telstra.net/ops/bgptable.html
[Figure: number of prefixes vs. year, 95 to 00. Annotations: 10,000/year; renewed exponential growth.]
Renewed growth due to multi-homing of enterprise networks!
Routing Lookups in Hardware
[Figure: number of prefixes vs. prefix length.]
Most prefixes are 24 bits or shorter
Routing Lookups in Hardware
[Figure: destination 142.19.6.14. The top 24 bits (142.19.6) index a table of 2^24 = 16M entries holding prefixes up to 24 bits; the selected entry (flag 1) contains the next hop.]
Routing Lookups in Hardware
[Figure: destination 128.3.72.44. The top 24 bits (128.3.72) index the first table (prefixes up to 24 bits), whose entries hold either flag 1 and a next hop, or flag 0 and a pointer. For flag 0, the pointer gives a base and the low 8 bits of the address give an offset into a second table (prefixes above 24 bits) that holds the next hops.]
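The two-table scheme in the figure can be sketched as follows. This is a simplified illustration: a dict replaces the 16M-entry first table, the pointer is stored directly as a list index, and the helper names and next-hop labels are hypothetical.

```python
import ipaddress

tbl24 = {}     # top 24 bits -> (1, next_hop) or (0, base index into tbl_long)
tbl_long = []  # next hops for prefixes longer than 24 bits, in blocks of 256


def to_int(address):
    return int(ipaddress.ip_address(address))


def set_next_hop(addr24, next_hop):
    """All routes under this /24 share one next hop."""
    tbl24[to_int(addr24) >> 8] = (1, next_hop)


def set_long_block(addr24, default_hop, overrides):
    """This /24 contains longer prefixes: allocate a 256-entry block."""
    base = len(tbl_long)
    block = [default_hop] * 256
    for low_byte, hop in overrides.items():
        block[low_byte] = hop
    tbl_long.extend(block)
    tbl24[to_int(addr24) >> 8] = (0, base)


def hw_lookup(address):
    """At most two table reads: the first table, then base + offset."""
    a = to_int(address)
    entry = tbl24.get(a >> 8)
    if entry is None:
        return None
    flag, value = entry
    if flag == 1:
        return value
    return tbl_long[value + (a & 0xFF)]


set_next_hop("142.19.6.0", "hop-A")
set_long_block("128.3.72.0", "hop-B", {44: "hop-C"})
print(hw_lookup("142.19.6.14"), hw_lookup("128.3.72.44"))
```

The design trades memory for a bounded number of memory accesses per lookup.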
Caching Addresses
[Figure: a router with a CPU and buffer memory on the slow path, and three line cards (each with DMA, MAC, and local buffer memory) on the fast path.]
Advantages:
• Increased average lookup performance
Disadvantages:
• Decreased locality in backbone traffic
• Cache size
• Cache management overhead
• Hardware implementation difficult
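As a software illustration of the fast-path/slow-path split, the sketch below puts a small LRU cache in front of a slower full lookup. The LRU policy, the class name `RouteCache`, and the stand-in slow lookup are assumptions; the slides do not specify a replacement policy.

```python
from collections import OrderedDict


class RouteCache:
    """Fast path: small LRU cache. Slow path: full table lookup on a miss."""

    def __init__(self, capacity, slow_lookup):
        self.capacity = capacity
        self.slow_lookup = slow_lookup
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, address):
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            self.hits += 1
            return self.entries[address]
        self.misses += 1
        next_hop = self.slow_lookup(address)
        self.entries[address] = next_hop
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return next_hop


# Hypothetical slow path: derive a next-hop label from the first octet.
cache = RouteCache(2, lambda a: "hop-" + a.split(".")[0])
for dst in ["10.0.0.1", "10.0.0.1", "11.0.0.1", "12.0.0.1", "10.0.0.1"]:
    cache.lookup(dst)
print(cache.hits, cache.misses)
```

With many concurrent destinations and little locality, most lookups miss, which is the backbone-traffic disadvantage listed above.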
Caching Addresses
• LAN: average flow < 40 packets
• WAN: huge number of flows
[Figure: cache hit rate, with cache = 10% of full table.]
Course Outline
• Packet Lookup and Classification: Where does a packet go next?
• Switching Fabrics: How does the packet get there?
Background: Circuit switch
• A switch that can handle N calls has N logical inputs and N logical outputs
– N up to 200,000
• Moves 8-bit samples from an input to an output port
– Recall that samples have no headers
– Destination of sample depends on time at which it arrives at the switch
• In practice, input trunks are multiplexed
– Multiplexed trunks carry frames = set of samples
• Goal: extract samples from frame, and depending on position in frame, switch to output
– each incoming sample has to get to the right output line and the right slot in the output frame
Call blocking
• Can’t find a path from input to output
• Internal blocking
– slot in output frame exists, but no path
• Output blocking
– no slot in output frame is available
• Output blocking is reduced in transit switches
– need to put a sample in one of several slots going to the desired next hop
Multiplexors and demultiplexors
• Most trunks time division multiplex voice samples
• At a central office, trunk is demultiplexed and distributed to active circuits
• Synchronous multiplexor
– N input lines
– Output runs N times as fast as input
[Figure: a MUX combines input lines 1, 2, 3, …, N into one frame with slots 1 2 3 … N; a De-MUX splits the frame back into lines 1, 2, 3, …, N.]
Time division switching
• Key idea: when de-multiplexing, position in frame determines output trunk
• Time division switching interchanges sample position within a frame: time slot interchange (TSI)
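A time slot interchange can be sketched as a buffer that is written in arrival order and read in a permuted order. The function name and the representation of the connection map (output slot to input slot) are illustrative assumptions.

```python
def time_slot_interchange(frame, slot_map):
    """Reorder one frame: output slot k carries the sample from input slot slot_map[k]."""
    buffer = list(frame)  # write phase: samples stored in arrival order
    return [buffer[slot_map[k]] for k in range(len(slot_map))]  # read phase


frame_in = ["a", "b", "c", "d"]
slot_map = [2, 0, 3, 1]  # set up at connection establishment
print(time_slot_interchange(frame_in, slot_map))
```

Every sample is written once and read once per frame, so the memory must sustain two accesses per sample per frame time.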
Time Division Switching: Limitations
• To build a 120,000-circuit switch
– must read and write 120,000 samples every 125 µs, i.e. one read or write operation about every 0.5 ns!
– Today DRAM has access times from 80 down to 40 ns
– With 40 ns DRAM, each access is 80 times slower than what we need
– Maximum number of circuits = 120,000/80 = 1500!
– Too small!!
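The numbers on this slide follow from the frame time and the two memory accesses (one write, one read) needed per sample. The short calculation below is a check under exactly those assumptions; variable names are illustrative.

```python
FRAME_TIME_S = 125e-6   # one frame every 125 microseconds
CIRCUITS = 120_000
DRAM_ACCESS_S = 40e-9   # fastest DRAM access time quoted on the slide

# Each sample is written once and read once per frame.
time_per_access_s = FRAME_TIME_S / (2 * CIRCUITS)
max_circuits = FRAME_TIME_S / (2 * DRAM_ACCESS_S)

print(f"{time_per_access_s * 1e9:.2f} ns per access, about {int(max_circuits)} circuits")
```

The exact values are about 0.52 ns and about 1560 circuits; the slide rounds to 0.5 ns, which gives the factor of 80 and the figure of 1500.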
Space division switching
• Each sample takes a different path through the switch, depending on its destination
Crossbar
• Simplest possible space-division switch
• Crosspoints can be turned on or off, long enough to transfer a packet from an input to an output
• Expensive
• Internally nonblocking
– but need N^2 crosspoints
– time to set each crosspoint grows quadratically
[Figure: crossbar with Data In on the rows, Data Out on the columns, and a configuration input.]
Multistage crossbar (1)
• In a crossbar during each switching time only one cross-point per row or column is active
• Can save crosspoints if a cross-point can attach to more than one input line
• This is done in a multistage crossbar
[Figure: three stages: N/n arrays of size n x k, then k arrays of size N/n x N/n, then N/n arrays of size k x n.]
Multistage crossbar (2)
• Can suffer internal blocking
– unless sufficient number of second-level stages, k ≥ n
• Number of crosspoints < N^2
• Finding a path from input to output requires a depth-first-search
• Scales better than crossbar, but still not too well
– 120,000 call switch needs ~250 million crosspoints
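The crosspoint count of the three-stage arrangement (N/n arrays of n x k, k arrays of N/n x N/n, N/n arrays of k x n) can be worked out directly. The sketch below is illustrative: the group size n = 240 and the choice k = 2n - 1 (Clos's strict-sense nonblocking condition) are assumptions made here, not values given on the slides.

```python
def three_stage_crosspoints(N, n, k):
    """Total crosspoints for N/n arrays (n x k), k arrays (N/n x N/n), N/n arrays (k x n)."""
    groups = N // n
    first = groups * n * k
    middle = k * groups * groups
    last = groups * k * n
    return first + middle + last


N = 120_000
n = 240            # assumed group size; must divide N
k = 2 * n - 1      # assumed number of middle-stage arrays

multistage = three_stage_crosspoints(N, n, k)
single_crossbar = N * N
print(multistage, single_crossbar)
```

Under these assumptions the multistage design needs about 235 million crosspoints, the same order as the slide's ~250 million and far below the 14.4 billion of a single crossbar.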
The true cost of telephone switching
• In a central switching system, the high cost is the line card.
• Now the true cost is the copper wire to the customer premises!!
• In long-distance, the high cost is in laying lines, acquiring rights of way and switch-control software!
• So, saving a few thousand crosspoints is not going to make phone calls cheaper!
Packet switches
• In a circuit switch, path of a sample is determined at time of connection establishment
• No need for a sample header: position in frame is used
• In a packet switch, packets carry a destination field or label
– Need to look up destination port on-the-fly
• Datagram switches
– lookup based on entire destination address (longest-prefix match)
• Cell or Label-switches
– lookup based on VCI or Labels
Blocking in packet switches
• Can have both internal and output blocking
• Internal
– no path to output
• Output
– trunk unavailable
• Unlike a circuit switch, cannot predict if packets will block (why?)
• If packet is blocked, must either buffer or drop
Dealing with blocking in packet switches
• Over-provisioning
– internal links much faster than inputs
• Buffers
– at input or output
• Backpressure
– if switch fabric doesn’t have buffers, prevent packet from entering until path is available
• Parallel switch fabrics
– increases effective switching capacity
Switch Fabrics: Buffered crossbar
• What happens if packets at two inputs both want to go to same output?
• Can defer one at an input buffer
• Or, buffer cross-points: complex arbiter
Switch fabric element
• Goal: towards building “self-routing” fabrics
• Can build complicated fabrics from a simple element
• Routing rule: if 0, send packet to upper output, else to lower output
– If both packets to same output, buffer or drop
[Figure: a two-output element with outputs 0 (upper) and 1 (lower), and two arriving packets labelled "data 10" and "data 00".]
• What if two packets both want to go to the same output → output blocking
[Figure: 8 x 8 switch fabric with inputs and outputs labelled 000 to 111.]
Blocking in Banyan Switches: Sorting
• Can avoid blocking by choosing order in which packets appear at input ports
• If we can
– present packets at inputs sorted by output
– remove duplicates
– remove gaps
– precede banyan with a perfect shuffle stage
– then no internal blocking
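The self-routing rule and the effect of sorting can be tried out in a small simulation. This sketch models the fabric as an 8 x 8 omega (shuffle-exchange) network, a banyan-class topology chosen here for simplicity; the wiring, the drop-on-conflict policy, and the function name are assumptions rather than details from the slides.

```python
def route(packets, n_bits=3):
    """Self-route packets through an omega network.

    packets maps input port -> destination port.
    Returns (final position of each surviving packet, list of blocked inputs).
    """
    mask = (1 << n_bits) - 1
    positions = {src: src for src in packets}
    blocked = []
    for stage in range(n_bits):
        taken = {}
        for src in sorted(positions):
            pos = positions[src]
            # Routing rule: examine one destination bit per stage, MSB first.
            bit = (packets[src] >> (n_bits - 1 - stage)) & 1
            # Perfect shuffle: rotate the position left by one bit.
            shuffled = ((pos << 1) | (pos >> (n_bits - 1))) & mask
            # 0 selects the upper element output, 1 the lower one.
            new_pos = (shuffled & ~1) | bit
            if new_pos in taken:
                blocked.append(src)  # two packets want the same link: drop one
            else:
                taken[new_pos] = src
        positions = {src: pos for pos, src in taken.items()}
    return positions, blocked


# Sorted by output, no duplicates, no gaps at the inputs: no blocking.
print(route({0: 1, 1: 3, 2: 6}))
# Distinct outputs, but the paths share an internal link: internal blocking.
print(route({0: 0, 4: 1}))
```

The second call shows that internal blocking can occur even when the two packets are headed to different outputs.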
Output Queueing: The “ideal”
[Figure: packets labelled 1 and 2 queued at the switch outputs.]
Output Queueing: How fast can we make centralized shared memory?
[Figure: ports 1, 2, …, N connected to a shared memory built from 5 ns SRAM over a 200-byte bus.]
• 5 ns per memory operation
• Two memory operations per packet
• Therefore, up to 160 Gb/s
• In practice, closer to 80 Gb/s
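The 160 Gb/s bound follows from the bus width and the access time, assuming each 5 ns operation moves a full 200-byte word and each packet is written once and read once. Variable names are illustrative.

```python
BUS_BYTES = 200      # width of the memory bus
ACCESS_NS = 5        # SRAM time per memory operation
OPS_PER_PACKET = 2   # one write on arrival, one read on departure

raw_gbps = BUS_BYTES * 8 / ACCESS_NS  # bits per nanosecond equals Gb/s
switch_gbps = raw_gbps / OPS_PER_PACKET

print(raw_gbps, switch_gbps)
```

The raw memory bandwidth is 320 Gb/s; halving it for the write and the read gives the 160 Gb/s on the slide.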
Switching Fabrics
• Output and Input Queueing
• Output Queueing
• Input Queueing
– Scheduling algorithms
– Combining input and output queues
– Multicast traffic
– Other non-blocking fabrics
• Multistage Switches
Interconnects: Input Queueing with Crossbar
[Figure: crossbar with Data In and Data Out; a scheduler sets the crossbar configuration.]
Memory b/w = 2R
Input Queueing: Head of Line Blocking
[Figure: delay vs. load; the load axis is marked at 58.6% and 100%.]
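The 58.6% mark corresponds to the classic saturation throughput of FIFO input queues under uniform random traffic, which tends to 2 - sqrt(2) ≈ 0.586 for large switches. The simulation below is a hedged sketch of that effect; the port count, slot count, seed, and random tie-breaking are assumptions chosen for illustration.

```python
import random


def hol_saturation_throughput(n_ports=16, slots=5000, seed=1):
    """Per-port throughput of a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line packet at each (always backlogged) input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each output serves one contending input; the losers stay blocked
        # and hold up every packet queued behind them.
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next packet moves to the head
            delivered += 1
    return delivered / (slots * n_ports)


throughput = hol_saturation_throughput()
print(round(throughput, 3))
```

For a 16-port switch the result is typically close to 0.6, slightly above the large-switch limit.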
Head of Line Blocking
Input Queueing: Virtual output queues
Input Queues: Virtual Output Queues
[Figure: delay vs. load; the load axis is marked at 100%.]
Input Queueing: Virtual Output Queues
[Figure: input-queued crossbar with virtual output queues and a scheduler.]
Memory b/w = 2R
Scheduler: can be quite complex!
Input Queueing: Why is serving long/old queues better than serving maximum number of queues?
• When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.
• When traffic is non-uniform, some queues become longer than others.
• A good algorithm keeps the queue lengths matched, and services a large number of queues.
[Figure: average occupancy vs. VOQ number, in two panels: uniform traffic and non-uniform traffic.]
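One way to favour long queues is to match inputs to outputs greedily in order of VOQ occupancy. The sketch below is only a greedy approximation of a longest-queue-first policy, not an exact maximum weight matching and not an algorithm named on the slides; the function name and the sample occupancies are hypothetical.

```python
def greedy_longest_queue_first(voq):
    """voq[i][j] = cells waiting at input i for output j.

    Returns a dict input -> output with each input and output used at most once.
    """
    n = len(voq)
    candidates = sorted(
        ((voq[i][j], i, j) for i in range(n) for j in range(n) if voq[i][j] > 0),
        key=lambda t: (-t[0], t[1], t[2]),  # longest queue first, ties by index
    )
    used_inputs, used_outputs, match = set(), set(), {}
    for _, i, j in candidates:
        if i not in used_inputs and j not in used_outputs:
            match[i] = j
            used_inputs.add(i)
            used_outputs.add(j)
    return match


occupancy = [
    [3, 0, 1],
    [2, 4, 0],
    [0, 1, 0],
]
print(greedy_longest_queue_first(occupancy))
```

In this example the two longest queues are served and input 2 waits, whereas a maximum-size matching would serve three shorter queues instead; the trade-off between the two goals is what the slide's question is about.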