Computer Networks: Network layer (Part 4)
Jan 15, 2016
Network layer (so far)
• Network layer functions
• Network layer implementation (IP)
• Today
  – Network layer devices (routers)
    • Network processors
    • Input/output port functions
    • Forwarding functions
    • Switching fabric
  – Advanced network layer topics
    • Routing problems
    • Routing metric selection
    • Overlay networks
NL: Router Architecture Overview
• Key router functions:
  – Run routing algorithms/protocols (RIP, OSPF, BGP) and construct the routing table
  – Switch/forward datagrams from incoming to outgoing link based on the route
NL: Routing vs. Forwarding
• Routing: the process by which the forwarding table is built and maintained
  – One or more routing protocols
  – Procedures (algorithms) to convert routing info to a forwarding table
• Forwarding: the process of moving packets from input to output, using
  – The forwarding table
  – Information in the packet
NL: What Does a Router Look Like?
• Network processor/controller
  – Handles routing protocols, error conditions
• Line cards
  – Network interface cards
• Forwarding engine
  – Fast path routing (hardware vs. software)
• Backplane
  – Switch or bus interconnect
NL: Network Processor
• Runs routing protocols and downloads the forwarding table to the forwarding engines
  – Uses two forwarding tables per engine to allow easy switchover (double buffering)
• Typically performs “slow” path processing
  – ICMP error messages
  – IP option processing
  – IP fragmentation
  – IP multicast packets
NL: Fast-path router processing
• Packet arrives at the inbound line card
• Header transferred to the forwarding engine
• Forwarding engine determines the output interface
• Forwarding engine signals the result to the line card
• Packet copied to the outbound line card
NL: Input Port Functions
• Decentralized switching:
  – Given the datagram destination, look up the output port using the routing table in input port memory
  – Goal: complete input port processing at ‘line speed’
  – Queuing: occurs if datagrams arrive faster than the forwarding rate into the switch fabric
[Figure: input port pipeline — physical layer (bit-level reception), then data link layer (e.g., Ethernet; see Chapter 5), then decentralized switching]
NL: Input Port Queuing
• Fabric slower than input ports combined => queuing may occur at input queues
• Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward
• queueing delay and loss due to input buffer overflow!
NL: Input Port Queuing
• Possible solution: virtual output queueing
  – Maintain a per-output buffer at each input
  – Solves the head-of-line blocking problem
  – Each of the M×N input buffers places a bid for its output
  – Crossbar connect
  – Challenge: mapping the bids to a crossbar schedule (see the sketch below)
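A minimal sketch of virtual output queueing in Python. The class and the bid interface are illustrative, not from any particular router; a real scheduler (e.g., an iterative algorithm like iSLIP) would run request/grant/accept rounds over these bids.

```python
# Virtual output queueing: each input keeps one queue per output, so a
# packet blocked on a busy output no longer holds up packets behind it
# that want a different output (no head-of-line blocking).

from collections import deque

class VOQInput:
    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, output):
        self.queues[output].append(packet)

    def bids(self):
        # One bid per non-empty queue; the crossbar scheduler matches
        # bids from all inputs to outputs each cell time.
        return [out for out, q in enumerate(self.queues) if q]

    def dequeue(self, output):
        return self.queues[output].popleft()
```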
NL: Forwarding Engine
• General-purpose processor + software
• Packet trains help the route-cache hit rate
  – Packet train = sequence of packets for the same/similar flows
  – Similar to the idea behind IP switching (ATM/MPLS), where long-lived flows map onto a single label
• Example: Partridge et al., “A 50-Gb/s IP Router”, IEEE/ACM Trans. on Networking, Vol. 6, No. 3, June 1998
  – 8KB L1 Icache: holds the full forwarding code
  – 96KB L2 cache: forwarding table cache
  – 16MB L3 cache: full forwarding table × 2 (double buffered for updates)
NL: Binary trie
Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111*
[Figure: binary trie over these prefixes; each node branches on the next bit (0 = left, 1 = right), with prefix-bearing nodes labeled A–I]
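As a concrete sketch (Python; class and function names are illustrative), here is a unibit trie over the prefix table above, performing longest-prefix match one bit at a time:

```python
# A minimal binary (unibit) trie for longest-prefix match, built over
# the route prefixes from the slide (A 0*, B 01000*, ..., I 1111*).

class TrieNode:
    def __init__(self):
        self.children = {}   # '0' or '1' -> TrieNode
        self.route = None    # route label if a prefix ends here

def insert(root, prefix, route):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.route = route

def longest_prefix_match(root, addr_bits):
    # Walk the trie bit by bit, remembering the last route seen.
    node, best = root, None
    for bit in addr_bits:
        if node.route is not None:
            best = node.route
        node = node.children.get(bit)
        if node is None:
            return best
    return node.route if node.route is not None else best

root = TrieNode()
for route, prefix in [("A", "0"), ("B", "01000"), ("C", "011"),
                      ("D", "1"), ("E", "100"), ("F", "1100"),
                      ("G", "1101"), ("H", "1110"), ("I", "1111")]:
    insert(root, prefix, route)

print(longest_prefix_match(root, "01000110"))  # B (longest match beats A)
print(longest_prefix_match(root, "10111010"))  # D (E = 100* does not match 101...)
```

One memory access per bit is exactly the cost the multibit tries below are designed to reduce.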
NL: Path-compressed binary trie
• Eliminate single branch point nodes
• Variants include PATRICIA and BSD tries
(Same route prefixes as above.)
[Figure: path-compressed binary trie over the prefixes; single-branch chains are collapsed, and each internal node records the bit position to test (Bit=1 through Bit=4), with leaves A–I]
NL: Patricia tries and variable prefix match
• Patricia tree
  – Arrange route entries into a series of bit tests
  – Worst case = 32 bit tests
  – Problem: memory speed is a bottleneck
  – Used in older BSD Unix routing implementations
[Figure: Patricia trie over the entries default 0/0, 128.2/16, 128.32/16, 128.32.130/24, and 128.32.150/24; each internal node gives the bit position to test (e.g., bits 10, 16, 19), with 0 = left child, 1 = right child]
NL: Multi-bit tries
• Compare multiple bits at a time
  – Reduces memory accesses
  – Forces table expansion for prefixes falling in between strides
  – Variable-length multi-bit tries
  – Fixed-length multi-bit tries
• Most route entries are Class C, so cut the prefix tree at 16-bit depth
  – Many prefixes are 8, 16, or 24 bits in length
  – 64K bit mask
  – Bit = 1 if the tree continues below the cut (root head)
  – Bit = 1 if a leaf is at depth 16 or less (genuine head)
  – Bit = 0 if part of a range covered by a leaf
NL: Variable stride multi-bit trie
• Single level has variable stride lengths
(Same route prefixes as above.)
[Figure: variable-stride multibit trie over the prefixes; different nodes use different stride lengths, with each prefix expanded to fill the slots it covers in its node]
NL: Fixed stride multi-bit trie
• Single level has equal strides
(Same route prefixes as above.)
[Figure: fixed-stride multibit trie; a 3-bit first-level stride (slots 000–111) and 2-bit second-level strides, with prefixes A–I expanded to fill the slots they cover]
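A sketch of the fixed-stride case in Python, using the same prefixes with a 3-bit first stride and a 2-bit second stride as in the figure (data layout and names are illustrative). Controlled prefix expansion fills each node's slots, so a lookup costs one memory access per level instead of one per bit:

```python
# A minimal fixed-stride multibit trie built by controlled prefix
# expansion. Addresses passed to lookup() are assumed to have at least
# sum(STRIDES) bits.

STRIDES = [3, 2]

def new_node(stride):
    return {"routes": [None] * (1 << stride),
            "children": [None] * (1 << stride)}

def insert(node, level, prefix, label):
    s = STRIDES[level]
    if len(prefix) <= s:
        # Expand the short prefix into every slot it covers at this node.
        pad = s - len(prefix)
        base = int(prefix, 2) << pad
        for i in range(base, base + (1 << pad)):
            node["routes"][i] = label
    else:
        c = int(prefix[:s], 2)
        if node["children"][c] is None:
            node["children"][c] = new_node(STRIDES[level + 1])
        insert(node["children"][c], level + 1, prefix[s:], label)

def lookup(root, bits):
    node, best, level = root, None, 0
    while node is not None:
        s = STRIDES[level]
        chunk, bits = int(bits[:s], 2), bits[s:]
        if node["routes"][chunk] is not None:
            best = node["routes"][chunk]   # remember best match per level
        node = node["children"][chunk]
        level += 1
    return best

root = new_node(STRIDES[0])
# Insert in order of increasing prefix length so that, within a node,
# slots expanded from longer prefixes overwrite those from shorter ones.
for label, prefix in sorted([("A", "0"), ("B", "01000"), ("C", "011"),
                             ("D", "1"), ("E", "100"), ("F", "1100"),
                             ("G", "1101"), ("H", "1110"), ("I", "1111")],
                            key=lambda t: len(t[1])):
    insert(root, 0, prefix, label)

print(lookup(root, "01000"))  # B, found in two memory accesses
print(lookup(root, "10100"))  # D
```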
NL: Other data structures
• Ruiz-Sánchez, Biersack, Dabbous, “Survey and Taxonomy of IP Address Lookup Algorithms”, IEEE Network, Vol. 15, No. 2, March 2001
  – LC trie
  – Lulea trie
  – Full expansion/compression
  – Binary search on prefix lengths
  – Binary range search
  – Multiway range search
  – Multiway range trees
  – Binary search on hash tables (Waldvogel, SIGCOMM 97)
NL: Prefix Match issues
• Scaling – IPv6
• Stride choice
  – Tuning the stride to the route table
  – Bit shuffling
NL: Speeding up Prefix Match - Alternatives
• Route caches
  – Exploit temporal locality: many packets to the same destination
• Protocol acceleration
  – Add a clue (5 bits) to the IP header indicating where the IP lookup ended on the previous node (Bremler-Barr, SIGCOMM 99)
• Content addressable memory (CAM)
  – Hardware-based route lookup: input = tag, output = value associated with the tag
  – Requires an exact match with the tag
    • Multiple cycles (1 per prefix searched) with a single CAM
    • Multiple CAMs (1 per prefix) searched in parallel
  – Ternary CAM
    • 0, 1, and don’t-care values in the tag match
    • Priority (i.e., longest prefix) by order of entries in the CAM
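A software emulation may make the ternary-CAM behavior concrete (Python sketch; the entry layout is illustrative). Each entry is a value/mask pair, and storing longer prefixes first reproduces the priority-by-order rule, using the route entries from the Patricia slide:

```python
# Emulated ternary CAM: each entry is (value, mask, next_hop); masked
# bits are "don't care". Entries are sorted longest-prefix-first so the
# first match is the longest match, mirroring TCAM priority by order.

def ip_to_int(dotted):
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def tcam_build(prefixes):
    entries = []
    for prefix, length, next_hop in prefixes:
        mask = ((0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF) if length else 0
        entries.append((ip_to_int(prefix) & mask, mask, next_hop))
    entries.sort(key=lambda e: bin(e[1]).count("1"), reverse=True)
    return entries

def tcam_lookup(entries, addr):
    a = ip_to_int(addr)
    for value, mask, next_hop in entries:  # in hardware, all rows compare in parallel
        if a & mask == value:
            return next_hop
    return None

table = tcam_build([("0.0.0.0", 0, "default"),
                    ("128.32.0.0", 16, "if1"),
                    ("128.32.130.0", 24, "if2")])
print(tcam_lookup(table, "128.32.130.7"))  # if2 (longest match)
print(tcam_lookup(table, "128.32.9.9"))    # if1
print(tcam_lookup(table, "10.0.0.1"))      # default
```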
NL: Types of network switching fabrics
[Figure: four fabric types — shared memory, shared bus, crossbar interconnection, and multistage interconnection]
NL: Types of network switching fabrics
• Issues
  – Switch contention: packets arrive faster than the switching fabric can switch them
  – The speed of the switching fabric versus the line card speed determines input queuing vs. output queuing
NL: Switching Via Memory
• First generation routers:
  – Packet copied by the system’s (single) CPU
  – 2 bus crossings per datagram
  – Speed limited by memory bandwidth
• Modern routers:
  – Input port processor performs the lookup and copies the packet into memory
  – E.g., Cisco Catalyst 8500
[Figure: input port → memory → output port, over the system bus]
NL: Switching Via Bus
• Datagram moves from input port memory to output port memory via a shared bus
• Bus contention: switching speed limited by bus bandwidth
• 1 Gbps bus (e.g., Cisco 1900): sufficient speed for access and enterprise routers (not regional or backbone)
NL: Switching Via An Interconnection Network
• Overcome bus bandwidth limitations
• Crossbar networks
  – Fully connected (n² elements)
  – All one-to-one, invertible permutations supported
NL: Switching Via An Interconnection Network
• A crossbar with n² elements is hard to scale
• Multi-stage interconnection networks (Banyan)
  – Initially developed to connect processors in a multiprocessor
  – Typically O(n log n) elements
  – Datagram fragmented into fixed-length cells
  – Cells switched through the fabric
  – Cisco 12000: Gbps through an interconnection network
  – Blocking possible (not all one-to-one, invertible permutations supported)
[Figure: multistage Banyan network with inputs A–D and outputs W–Z]
NL: Output Ports
• Output contention
  – Datagrams arrive from the fabric faster than the output port’s transmission rate
  – Buffering required
  – A scheduling discipline chooses among queued datagrams for transmission
NL: Output port queueing
• Buffering when the arrival rate via the switch exceeds the output line speed
• Queueing (delay) and loss due to output port buffer overflow!
NL: Advanced topics
• Routing synchronization
• Routing instability
• Routing metrics
• Overlay networks
NL: Routing Update Synchronization
• Another interesting robustness issue to consider...
• Even apparently independent processes can eventually synchronize
  – The intuitive assumption that independent streams will not synchronize is not always valid
  – Example: periodic routing protocol messages from different routers
  – Abrupt transition from unsynchronized to synchronized system states
NL: Examples/Sources of Synchronization
• TCP congestion windows
  – Cyclical behavior shared by flows through a gateway
• Periodic transmission by audio/video applications
• Periodic downloads
• Synchronized client restart
  – After a catastrophic failure
• Periodic routing messages
  – Manifests itself as periodic packet loss on pings
• Pendulum clocks on the same wall
• Automobile traffic patterns
NL: How Synchronization Occurs
• Weak coupling arises when A’s behavior is triggered off of B’s message arrival
• Weak coupling can result in eventual synchronization
[Figure: router A’s period-T update timer being perturbed by message arrivals from B]
NL: Routing Source of Synchronization
• A router resets its timer after processing its own and incoming updates
• This creates weak coupling among routers
• Solutions (one sketched below)
  – Set the timer based on a clock event that is not a function of processing other routers’ updates, or
  – Add randomization, or reset the timer before processing the update
    • With increasing randomization, an abrupt transition from predominantly synchronized to predominantly unsynchronized
• Most protocols now incorporate some form of randomization
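A minimal sketch of the randomization fix in Python (the base period and jitter fraction are illustrative values, not from any particular protocol):

```python
# Instead of scheduling the next routing update a fixed T seconds after
# finishing update processing (which weakly couples routers), add
# random jitter so that the routers' periods drift apart rather than
# locking together.

import random

UPDATE_PERIOD = 30.0   # seconds (hypothetical base period)
JITTER = 0.5           # fraction of the period to randomize over

def next_update_delay():
    # Uniform jitter in [T*(1 - J/2), T*(1 + J/2)]; with enough
    # randomization, independent routers stay unsynchronized.
    return UPDATE_PERIOD * (1 + JITTER * (random.random() - 0.5))
```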
NL: Routing Instability
• Reference: C. Labovitz, R. Malan, F. Jahanian, “Internet Routing Instability”, SIGCOMM 1997
• Recorded BGP messages at major exchanges
• Discovered orders of magnitude more updates than expected
  – The bulk were duplicate withdrawals
    • Stateless implementation of BGP: did not keep track of information passed to peers
    • Impact of a few implementations
  – Strong frequency (30/60 sec) components
    • Interaction with other local routing/links etc.
NL: Route Flap Storm
• Overloaded routers fail to send Keep_Alive messages and are marked as down
• BGP peers find alternate paths
• The overloaded router re-establishes its peering session
• It must then send large updates
• The increased load causes more routers to fail!
NL: Route Flap Dampening
• Routers now give higher priority to BGP Keep_Alive messages to avoid the problem
• Associate a penalty with each route (sketched below)
  – Increase it when the route flaps
  – Exponentially decay the penalty with time
• When the penalty reaches a threshold, suppress the route
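A sketch of the penalty mechanism in Python. The per-flap penalty, half-life, and thresholds are illustrative numbers, not any vendor's defaults:

```python
# Route-flap dampening: accumulate a penalty per flap, decay it
# exponentially over time, suppress the route above one threshold and
# reuse it once the penalty decays below a lower one.

import math

PENALTY_PER_FLAP = 1000.0
HALF_LIFE = 15 * 60          # penalty halves every 15 minutes
SUPPRESS_THRESHOLD = 2000.0
REUSE_THRESHOLD = 750.0

class DampenedRoute:
    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def decayed_penalty(self, now):
        # Exponential decay since the last flap.
        dt = now - self.last_update
        return self.penalty * math.exp(-math.log(2) * dt / HALF_LIFE)

    def flap(self, now):
        self.penalty = self.decayed_penalty(now) + PENALTY_PER_FLAP
        self.last_update = now
        if self.penalty >= SUPPRESS_THRESHOLD:
            self.suppressed = True

    def usable(self, now):
        if self.suppressed and self.decayed_penalty(now) < REUSE_THRESHOLD:
            self.suppressed = False   # penalty decayed: reuse the route
        return not self.suppressed

r = DampenedRoute()
for t in (0, 60, 120):        # three flaps in two minutes
    r.flap(t)
print(r.usable(120))          # False: route suppressed
print(r.usable(120 + 3600))   # True: penalty decayed below reuse threshold
```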
NL: BGP Oscillations
• BGP can potentially explore every possible path through the network: (n-1)! combinations
• The limit between update messages (MinRouteAdver) reduces exploration
  – Forces a router to process all outstanding messages
• Typical Internet failover times
  – New/shorter link: 60 seconds
    • Results in simple replacement at nodes
  – Down link: 180 seconds
    • Results in a search of possible options
  – Longer link: 120 seconds
    • Results in replacement or search based on length
NL: Routing Metrics
• The choice of link cost defines the traffic load
  – Low cost = high probability the link belongs to the SPT and will attract traffic, which in turn increases its cost
• Main problem: convergence
  – Avoid oscillations
  – Achieve good network utilization
NL: Metric Choices
• Static metrics (e.g., hop count)
  – Good only if links are homogeneous
  – Definitely not the case in the Internet
• Static metrics do not take into account
  – Link delay
  – Link capacity
  – Link load (hard to measure)
NL: Original ARPANET Metric
• Cost proportional to queue size
  – Instantaneous queue length as a delay estimator
• Problems
  – Did not take into account link speed
  – Poor indicator of expected delay due to rapid fluctuations
  – Delay may be long even if the queue is small, due to contention for other resources
NL: Metric 2 - Delay Shortest Path Tree
• Delay = (depart time - arrival time) + transmission time + link propagation delay
  – (Depart time - arrival time) captures queuing
  – Transmission time captures link capacity
  – Link propagation delay captures the physical length of the link
• Measurements averaged over 10 seconds
  – Update sent if the difference exceeds a threshold, or every 50 seconds regardless
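A small sketch of the computation in Python (function and parameter names are illustrative):

```python
# Metric 2: per-packet delay measured as time spent in the router plus
# transmission and propagation; averaged values are reported only on
# significant change, or at least every 50 seconds.

def packet_delay(arrival, depart, pkt_bits, link_bps, prop_delay):
    queueing = depart - arrival          # time queued inside the router
    transmission = pkt_bits / link_bps   # captures link capacity
    return queueing + transmission + prop_delay

def should_report(avg_delay, last_reported, threshold, elapsed):
    # Send an update if the 10-second average moved past the threshold,
    # or unconditionally every 50 seconds.
    return abs(avg_delay - last_reported) > threshold or elapsed >= 50
```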
NL: Performance of Metric 2
• Works well for light to moderate load
  – Static values dominate
• Oscillates under heavy load
  – Queuing dominates
NL: Specific Problems
• The range is too wide
  – A highly loaded 9.6 Kbps link can appear 127 times costlier than a lightly loaded 56 Kbps link
  – This can make a 127-hop path look better than a 1-hop path
• No limit on the change between reports
• All nodes calculate routes simultaneously
  – Triggered by a link update
NL: Example
[Figure: two networks, Net X and Net Y, connected by two parallel paths through routers A and B; traffic initially favors one path]
After everyone re-calculates routes, all traffic shifts to the other path... oscillations!
NL: Consequences
• Low network utilization (50% in the example)
• Congestion can spread elsewhere
• Routes can oscillate between short and long paths
• Large swings lead to frequent route updates
  – More messages
  – Frequent SPT re-calculation
NL: Revised Link Metric
• Better metric: packet delay = f(queueing, transmission, propagation)
• When lightly loaded, transmission and propagation are good predictors
• When heavily loaded, queueing delay is dominant, so transmission and propagation are bad predictors
NL: Normalized Metric
• If a loaded link looks very bad, then everyone will move off of it
• We want some traffic to stay on it, to balance load and avoid oscillations
  – It is still an OK path for some
• The hop-normalized metric diverts only routes that have an alternate that is not too much longer
• Also limits the relative values and the range of values advertised, giving gradual change
NL: Revised Metric
• Limits on relative change (sketched below)
  – Measured link delay is taken over a 10-second period
  – Link utilization is computed as 0.5 × current sample + 0.5 × last average
  – Max change limited to slightly more than ½ hop
  – Min change limited to slightly less than ½ hop
  – Bounds oscillations
• Normalized according to link type
  – A satellite link should look good when queueing on other links increases
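A sketch of the smoothing and clamping in Python. The averaging and the roughly half-hop bound follow the slide; the utilization-to-metric map `to_metric` is a placeholder for the link-type-specific normalization curves on the next slide:

```python
# Revised ARPANET metric: smooth the utilization sample, map it to a
# cost via a link-type-specific curve, and clamp how far the advertised
# cost may move in one reporting interval.

HALF_HOP = 0.5  # max metric movement per update, in hop-equivalents

def revised_metric(current_sample, last_average, last_metric, to_metric):
    # Utilization smoothed as .5 * current sample + .5 * last average.
    utilization = 0.5 * current_sample + 0.5 * last_average
    raw = to_metric(utilization)  # e.g., different curves for satellite vs. terrestrial
    # Clamp the change so the advertised cost moves gradually,
    # bounding the routing oscillations.
    delta = max(-HALF_HOP, min(HALF_HOP, raw - last_metric))
    return utilization, last_metric + delta
```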
NL: Routing Metric vs. Link Utilization
[Figure: new metric (routing units) as a function of link utilization (25–100%) for four link types — 9.6 Kbps satellite, 9.6 Kbps terrestrial, 56 Kbps satellite, 56 Kbps terrestrial]
NL: Observations
• Utilization effects
  – High load never increases the cost to more than 3 × the cost of an idle link
  – Cost = f(link utilization) only at moderate to high loads
• Link types
  – The most expensive link is 7 × the least expensive link
  – A high-speed satellite link is more attractive than a low-speed terrestrial link
• Allows routes to be gradually shed from a link
NL: Idealized Network Response Maps
• Load of the “average” link as a function of that link’s cost
• Created empirically
[Figure: mean load on a link (0.0–1.0) vs. link cost (0.5–4.0), one curve per level of increasing applied network load]
NL: Equilibrium Calculation
• Combine the utilization-to-cost and cost-to-utilization maps
• Equilibrium points lie at the intersections
[Figure: mean load on a link vs. link cost, with the D-SPF and HN-SPF metric maps overlaid on the network response curves for increasing applied network load]
NL: Routing Dynamics
• Limiting the maximum metric change bounds the oscillation
[Figure: utilization vs. link reported cost, showing the metric map, the network response curve, and the resulting bounded oscillation]
NL: Routing Dynamics
[Figure: utilization vs. reported cost, showing the metric map and network response while easing in a new link]
NL: Overlay Routing
• Basic idea:
  – Treat multiple hops through the IP network as one hop in an overlay network
  – Run a routing protocol on the overlay nodes
• Why?
  – For performance – can run a more clever protocol on the overlay
  – For efficiency – can make core routers very simple
  – For functionality – can provide new features such as multicast, active processing, IPv6
NL: Overlay for Performance
• References
  – Savage et al., “The End-to-End Effects of Internet Path Selection”, SIGCOMM 99
  – Andersen et al., “Resilient Overlay Networks”, SOSP 2001
• Why would IP routing not give good performance?
  – Policy routing – limits selection/advertisement of routes
  – Early exit/hot-potato routing – local, not global, incentives
  – Lack of performance-based metrics – AS hop count is the wide-area metric
• How bad is it really?
  – Look at the performance gain an overlay provides
NL: Quantifying Performance Loss
• Measure round trip time (RTT) and loss rate between pairs of hosts
  – Caveat: ICMP rate limiting
• Alternate path characteristics
  – 30-55% of hosts had lower latency via an alternate path
  – 10% of alternate routes have 50% lower latency
  – 75-85% have lower loss rates
NL: Bandwidth Estimation
• RTT & loss for multi-hop path– RTT by addition
– Loss either worst or combine of hops – why?• Large number of flows combination of probabilities
• Small number of flows worst hop
• Bandwidth calculation– TCP bandwidth is based primarily on loss and RTT
• 70-80% paths have better bandwidth• 10-20% of paths have 3x improvement
60
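A sketch of these combinations in Python. The function names are illustrative, and the throughput line uses the well-known Mathis et al. approximation BW ≈ MSS / (RTT · √p); the papers' exact TCP models may differ:

```python
# Combining per-hop measurements into path estimates, plus a rough TCP
# bandwidth estimate from loss and RTT.

import math

def path_rtt(hop_rtts):
    return sum(hop_rtts)                       # RTT adds across hops

def path_loss_many_flows(hop_losses):
    # Many independent flows: per-hop losses combine probabilistically.
    p_deliver = 1.0
    for p in hop_losses:
        p_deliver *= (1.0 - p)
    return 1.0 - p_deliver

def path_loss_few_flows(hop_losses):
    # A few flows: the single worst hop dominates.
    return max(hop_losses)

def tcp_bandwidth(mss_bytes, rtt_s, loss):
    # Mathis-style approximation, bytes per second.
    return mss_bytes / (rtt_s * math.sqrt(loss))

rtt = path_rtt([0.01, 0.03, 0.02])                 # 60 ms path
loss = path_loss_many_flows([0.01, 0.02, 0.005])   # ~3.5% combined loss
print(tcp_bandwidth(1460, rtt, loss))              # roughly 130 KB/s
```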
NL: Overlay for Efficiency
• Multi-path routing
  – More efficient use of links, or QOS
  – Need to be able to direct packets based on more than just the destination address, which can be computationally expensive
  – What granularity? Per source? Per connection? Per packet?
    • Per packet: re-ordering problems
    • Per source / per flow: coarse grain vs. fine grain
  – Take advantage of the relative duration of flows
    • Most bytes are on long flows
NL: Overlay for Features
• How do we add new features to the network?
  – Does every router need to support the new feature?
  – Choices
    • Reprogram all routers (active networks)
    • Support the new feature within an overlay
  – Basic technique: tunnel packets
• Tunnels
  – IP-in-IP encapsulation (sketched below)
  – Poor interaction with firewalls, multi-path routers, etc.
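A minimal sketch of IP-in-IP encapsulation in Python. The header checksum is left zero for brevity; a real implementation must compute it, and would usually let the kernel handle fragmentation and identification:

```python
# IP-in-IP: prepend an outer IPv4 header whose protocol field is 4
# (IP-in-IP) and whose addresses are the tunnel endpoints; the original
# datagram rides unchanged as the payload.

import struct
import socket

def encapsulate(inner_packet, tunnel_src, tunnel_dst):
    total_len = 20 + len(inner_packet)
    outer = struct.pack("!BBHHHBBH4s4s",
                        0x45,              # version 4, IHL 5 (20-byte header)
                        0,                 # TOS
                        total_len,         # total length
                        0, 0,              # identification, flags/fragment offset
                        64,                # TTL
                        4,                 # protocol 4 = IP-in-IP
                        0,                 # header checksum (omitted in this sketch)
                        socket.inet_aton(tunnel_src),
                        socket.inet_aton(tunnel_dst))
    return outer + inner_packet

# The decapsulating tunnel endpoint strips the outer 20-byte header and
# forwards the inner packet as if it had arrived directly.
```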
NL: Examples
• IP V6 & IP Multicast– Tunnels between routers supporting feature
• Mobile IP– Home agent tunnels packets to mobile host’s location
– http://www.rfc-editor.org/rfc/rfc2002.txt
• QOS– Needs some support from intermediate routers
63
NL: Overlay Challenges
• How do you build an efficient overlay?
  – You probably don’t want all n² links – which links should you create?
  – Without direct knowledge of the underlying topology, how do you know what’s nearby and what is efficient?
NL: Future of Overlay
• Application-specific overlays – why should overlay nodes only do routing?
  – Caching: intercept requests and create responses
  – Transcoding: changing the content of packets to match available bandwidth
  – Peer-to-peer applications
NL: Network layer summary
• Network layer functions
• Specific network layers (IPv4, IPv6)
• Specific network layer devices (routers)
• Advanced network layer topics
NL: Network trace
• http://www.cse.ogi.edu/class/cse524/trace.txt
NL: End of material for midterm
• Midterm next Monday 10/29/2001, covering...
  – Technical material in lectures
  – Chapters 1, 4, and 5
    • Chapter 1
    • Chapter 4.1-4.7
    • Chapter 5
  – Review questions at the end of the chapters