The Impact of Internet Policy and Topology on Delayed Routing Convergence

Jan 20, 2018


Abraham Rodgers

Transcript
Page 1: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

The Impact of Internet Policy and Topology on Delayed Routing Convergence

Page 2: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Good Old Days

The Internet always was, and still is, BAD in reliability, availability & QoS.

For historical reasons, QoS simply was not there initially: the “Best Effort” principle.

E-mail & web surfing did not demand high standards.

Page 3: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Money Time

It CAN’T be tolerated any more.

The Internet is becoming a major factor in the economy:

e-commerce, VoIP, real-time video, etc.

Page 4: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

QoS Battle

Efforts to bring QoS to the Internet are enormous, BUT:

A stable underlying infrastructure MUST exist for any application-level solution!

Page 5: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Bad News

The existing Internet backbone DOES NOT provide rapid restoration and rerouting.

There is NO effective interdomain path fail-over!

Fail-over for a single failure takes milliseconds in the PSTN, minutes in the Internet.

Page 6: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

It hurts!

The impact on performance is huge. While a path is being restored:

30 times more packet loss

4 times the end-to-end latency

Some fail-overs take 15 minutes; the average is 3 minutes.

Page 7: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

What’s the Problem?

Slow convergence during fail-over.

Routing tables oscillate for a long period after a failure, seeking a consistent view of the network.

WHO is to blame? BGP, the currently used inter-domain routing protocol.

Page 8: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Nasty things about BGP

AS-path-based BGP solves the count-to-infinity problem of RIP, but exacerbates the number of routing table oscillations.

With unbounded delay, BGP may explore ALL possible paths after a single failure: O(N!).

Page 9: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

And More…

Even assuming bounded delay, BGP convergence for a full-mesh topology without filters is O(N * T DELAY).

N is the number of ASes, and there are 70000 of them in the Internet.

T DELAY is about 30 sec. (the recommendation is 30 sec +/- a short random jitter).
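
To get a feel for the bound above, the arithmetic can be sketched in Python. The inputs are the slide's own figures (N = 70000, T DELAY = 30 sec.); the helper name `worst_case_seconds` is made up for this sketch, and the result is only the model's worst-case upper bound for an unfiltered full mesh, not a measured convergence time.

```python
# Hypothetical helper: evaluates the N * T_DELAY worst-case bound from the slide.
def worst_case_seconds(n_as: int, t_delay_sec: float) -> float:
    return n_as * t_delay_sec

N_AS = 70000         # number of ASes quoted on the slide
T_DELAY_SEC = 30.0   # recommended delay, ignoring the random jitter

bound = worst_case_seconds(N_AS, T_DELAY_SEC)
days = bound / 86400  # seconds per day
print(f"{bound:.0f} sec, about {days:.1f} days")
```

Under these assumptions the bound comes to roughly three and a half weeks, which is why the full-mesh, no-filter case matters as a limit rather than as a prediction.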

Page 10: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

And Even More…

It is possible for autonomous systems to define “unsafe” policies that cause persistent route oscillation.

Page 11: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

So What?!

All this stuff is interesting in theory but has little to do with reality.

Page 12: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Any-Way?

BGP4, as used in Internet routers, has bounded delay, provided by the MinRouteAdver timer, which delays the distribution of overly rapid updates.

So O(N!) performance is irrelevant in the real Internet.
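
The rate-limiting idea behind MinRouteAdver can be sketched as follows. This is a simplified illustration, not router code: the class `MinRouteAdverTimer`, its methods, and the peer and route strings are all hypothetical, the timer is kept per peer, and jitter is ignored.

```python
class MinRouteAdverTimer:
    """Per-peer rate limiter: at most one update per `interval` seconds."""

    def __init__(self, interval: float = 30.0):
        self.interval = interval
        self.last_sent = {}  # peer -> time the last update was sent
        self.pending = {}    # peer -> latest route held back (older ones are overwritten)

    def _may_send(self, peer: str, now: float) -> bool:
        last = self.last_sent.get(peer)
        return last is None or now - last >= self.interval

    def submit(self, peer: str, route: str, now: float):
        """Return the route if it may be sent now; otherwise hold it and return None."""
        if self._may_send(peer, now):
            self.last_sent[peer] = now
            self.pending.pop(peer, None)
            return route
        self.pending[peer] = route
        return None

    def flush(self, peer: str, now: float):
        """Release the held-back route once the interval has elapsed, else None."""
        if peer in self.pending and self._may_send(peer, now):
            self.last_sent[peer] = now
            return self.pending.pop(peer)
        return None


timer = MinRouteAdverTimer(interval=30.0)
first = timer.submit("peerB", "X via A", now=0.0)        # sent immediately
second = timer.submit("peerB", "X via C A", now=5.0)     # held back
third = timer.submit("peerB", "X via D C A", now=10.0)   # replaces the held route
early = timer.flush("peerB", now=20.0)                   # interval not yet elapsed
late = timer.flush("peerB", now=30.0)                    # latest held route goes out
```

Only the most recent held route is sent when the timer expires, so intermediate routes are never advertised. That is what bounds the number of updates, at the cost of added delay per hop.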

Page 13: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Who cares?

BGP divergence has never been observed in practice and remains a theoretical problem.

There are modifications to BGP policies that guarantee convergence.

Page 14: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

What a Mesh?!

The Internet topology is a long way from being a complete mesh.

Filtering of BGP updates is done by almost every BGP node.

Page 15: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Now What?

Experimental results indicate fail-over problems caused by poor BGP performance.

To study and resolve those problems, much more realistic models of Internet BGP processes should be developed.

Page 16: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Drug “Providers?”

The Internet retains a hierarchy with several tiers of ISPs.

This hierarchy is defined by commercial relationships.

Smaller ISPs are customers of bigger ones.

Page 17: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Talk to me…

Transit – the upstream provider provides transit service to the customer.

Default-free routing tables are passed downstream.

Customer & backbone routes are passed upstream.

Peer – a symmetric connection providing access to each other's customers.

Never used for transit to other ISPs.

Only customer & backbone routes are exchanged.

Backup transit – normally acts like a Peer; provides transit after fault detection.

Page 18: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

It is strictly business

The filtering mechanisms of AS border routers are used to enforce those commercial relationships: if you don't want the other side to use some route, you should not announce it. So:

Send customer & backbone routes to all peers.

Provide other routing information (learned from peers & upstream) only to customers.
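
The two export rules above can be written as a small decision function. This is a minimal sketch: the relationship labels and the name `should_export` are made up for illustration, and sending customer & backbone routes upstream as well follows the transit description on the earlier slide.

```python
# Hypothetical labels for where a route was learned and who it is sent to.
BACKBONE = "backbone"   # the ISP's own routes
CUSTOMER = "customer"
PEER = "peer"
UPSTREAM = "upstream"

def should_export(learned_from: str, send_to: str) -> bool:
    """Decide whether a route learned from one neighbor type is announced to another."""
    # Customer & backbone routes are announced to everyone.
    if learned_from in (CUSTOMER, BACKBONE):
        return True
    # Routes learned from peers or upstream are announced to customers only.
    return send_to == CUSTOMER

# Print the full policy matrix for inspection.
for src in (BACKBONE, CUSTOMER, PEER, UPSTREAM):
    for dst in (CUSTOMER, PEER, UPSTREAM):
        print(f"{src:>8} -> {dst:<8}: {should_export(src, dst)}")
```

A route learned from one peer is never announced to another peer, which is exactly the "never used for transit" property of a peer connection.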

Page 19: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

[Figure: example three-tier topology (Tier 1, Tier 2, Tier 3) of ASes A through J, connected by Peer, Transit, and Back-up links.]

Page 20: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

No Free Lunches

Transit relations – Inbound filters

Prefix filters limit customer announcements to the “legitimate” address space of the customer.

Used by 100% of ISPs.

The upstream provider is willing to transit routes for its customers only.
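
An inbound prefix filter of this kind can be sketched with Python's standard `ipaddress` module. The function name `accept_from_customer` and the address blocks (taken from the documentation ranges) are assumptions made for illustration; real routers express this as prefix lists in their own configuration language.

```python
import ipaddress

def accept_from_customer(announced: str, customer_space: list) -> bool:
    """Accept an announcement only if it lies inside the customer's registered space."""
    net = ipaddress.ip_network(announced)
    return any(
        net.version == block.version and net.subnet_of(block)
        for block in map(ipaddress.ip_network, customer_space)
    )

# Hypothetical customer whose "legitimate" address space is one /24.
CUSTOMER_SPACE = ["198.51.100.0/24"]

inside = accept_from_customer("198.51.100.128/25", CUSTOMER_SPACE)   # within the /24
outside = accept_from_customer("192.0.2.0/24", CUSTOMER_SPACE)       # someone else's block
too_wide = accept_from_customer("198.51.0.0/16", CUSTOMER_SPACE)     # covers more than the /24
```

Rejecting the wider /16 matters as much as rejecting the unrelated block: a customer announcing a covering prefix would otherwise attract traffic for addresses it does not own.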

Page 21: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Friend to friend

Peer relations – Outbound filters

Community filters are based on tagging routes to distinguish customer routes. Only updates for routes tagged as customer routes pass the filter.

Used by 73% of ISPs.

Page 22: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Don’t talk too much

Peer relations – Outbound filters (cont.)

Prefix filters may also be used to distinguish customer routes.

Applying a prefix filter only (done by 13% of ISPs) may create an unintentional back-up transit path.

Page 23: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Check it

Peer relations – Outbound filters (cont.)

ASPath regular expressions are used to explicitly permit route advertisements.

Combining ASPath & prefix filters prevents the creation of an unintentional back-up transit path.

Both ASPath & prefix filters are used by 13% of ISPs.

Page 24: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

[Figure: three-tier topology (Tier 1, Tier 2, Tier 3) of ASes A through J, with Peer, Transit, Back-up, and Unintentional Back-up links; the path “D-C” is marked.]

Example: in the absence of an ASPath check, the path “D-C”, learned by A after the A-D link failure, will be announced by A to B, providing an unintentional back-up path from C to B through A.
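
The difference between a prefix-only filter and a combined prefix + ASPath filter can be sketched as below, from the point of view of A's outbound filter towards its peer B. Everything here is illustrative: the AS labels follow the figure, the prefix is a documentation range assumed to belong to customer D, the AS path is written nearest AS first (so the slide's “D-C” path appears as "C D"), and the regular expression is one plausible way to say "received directly from customer D".

```python
import ipaddress
import re

# Assumed address space of customer D (documentation range).
CUSTOMER_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]
# The AS path must start with D, i.e. the route came straight from the customer.
CUSTOMER_ASPATH = re.compile(r"^D( \w+)*$")

def prefix_ok(prefix: str) -> bool:
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(block) for block in CUSTOMER_PREFIXES)

def aspath_ok(as_path: str) -> bool:
    return CUSTOMER_ASPATH.match(as_path) is not None

def export_to_peer(prefix: str, as_path: str, check_aspath: bool) -> bool:
    """A's outbound filter towards peer B."""
    if not prefix_ok(prefix):
        return False
    return aspath_ok(as_path) if check_aspath else True

# Normal case: D's prefix learned directly from D.
direct = export_to_peer("198.51.100.0/24", "D", check_aspath=True)
# After the A-D link fails, A learns D's prefix through C instead.
leaked = export_to_peer("198.51.100.0/24", "C D", check_aspath=False)
blocked = export_to_peer("198.51.100.0/24", "C D", check_aspath=True)
```

With the prefix filter alone, the route through C still looks like a customer route because the prefix is D's, so it is announced to B. The ASPath check is what notices that it no longer arrives from the customer.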

Page 25: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Trust Me…

Peer relations – Inbound filters

Generally, ISPs just trust their peers to send only valid information.

Only “bogon” filters, which identify generally illegal (private, unallocated, etc.) addresses, are applied.

80% of ISPs use “bogon” filters.

20% of ISPs use none.

Page 26: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Let Us Introduce…

The model of BGP convergence is a directed graph.

A node represents an AS.

The model is given for a fixed destination X.

The shortest path is chosen.

An arc e(u,v) exists iff u informs v about its best route to X (not vice versa).

The graph is not symmetric.

The topology of the graph differs for different destinations X.

Page 27: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Up And Down

Given X, a client connected to the network by a single arc to node A (the AS of X).

Link goes down: TDOWN is the time elapsed until every node knows there is no path to X (new stable state).

Connection re-established: TUP is the time elapsed until all nodes add a route to X to their tables.

Page 28: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

What We Want to Hear

After the connection is established, a node learns about its best path to X in a time that depends on its shortest path to X.

Proof by simple induction.

TUP convergence is governed by d, the maximal shortest distance from X to any node.

O(d * T DELAY), where T DELAY is T WAIT + T SEND.

T DELAY may be of the same order as MinRouteAdver, especially if implemented on a per-peer (not peer + destination) basis.
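
The quantity d can be computed with a breadth-first search over the model graph. The sketch below assumes the graph is stored as a dictionary mapping each node u to the nodes v it informs (the arcs e(u,v) of the model); the example topology and the 30 sec. value for T DELAY are made up for illustration.

```python
from collections import deque

def max_shortest_distance(arcs: dict, source: str) -> int:
    """Return d: the largest shortest-path distance (in arcs) from source to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in arcs.get(u, []):
            if v not in dist:          # first visit is the shortest distance in BFS
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

# Hypothetical topology: X attaches to A; announcements flow along the arcs.
ARCS = {
    "X": ["A"],
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
}
T_DELAY_SEC = 30.0  # assumed per-hop delay

d = max_shortest_distance(ARCS, "X")
t_up_bound = d * T_DELAY_SEC
print(f"d = {d}, TUP bound = {t_up_bound:.0f} sec")
```

Here the farthest node, E, is four arcs from X, so the bound is four per-hop delays. Adding links can only shorten these distances, which is why more connectivity helps TUP.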

Page 29: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

And What We don’t…

After the A-X link goes down, multiple update messages are sent along arcs.

Nodes will announce back-up paths for which a withdrawal has not been received yet.

Generally, updates will propagate more slowly via long paths, because each router adds a 0 to 30 sec delay, and always adds ~30 sec after the initial update is received.

A simple path from X to A is covered by time T if each node in the path has received an update from the preceding node and re-sent an update to the next node before time T.

Page 30: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Long Down

Node U has no route to X at time T iff all simple paths from X to U are covered.

A simple path of length L is covered in O(L * T DELAY) time.

TDOWN convergence is governed by D, the length of the longest simple path from X to any node.

O(D * T DELAY)
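
The quantity D can be found by exhaustive depth-first search over simple paths. This is a brute-force sketch that is only practical for tiny graphs, since the number of simple paths grows factorially. The graph representation (node u mapped to the nodes it informs), the four-AS full mesh, and the 30 sec. T DELAY are assumptions made for illustration.

```python
def longest_simple_path(arcs: dict, source: str) -> int:
    """Return D: the length in arcs of the longest simple path starting at source."""
    def dfs(u: str, visited: frozenset) -> int:
        best = 0
        for v in arcs.get(u, []):
            if v not in visited:  # a simple path never revisits a node
                best = max(best, 1 + dfs(v, visited | {v}))
        return best
    return dfs(source, frozenset([source]))

# Hypothetical example: X attaches to A, and ASes A-D form a full mesh.
ASES = ["A", "B", "C", "D"]
ARCS = {u: [v for v in ASES if v != u] for u in ASES}
ARCS["X"] = ["A"]
T_DELAY_SEC = 30.0  # assumed per-hop delay

D = longest_simple_path(ARCS, "X")
t_down_bound = D * T_DELAY_SEC
print(f"D = {D}, TDOWN bound = {t_down_bound:.0f} sec")
```

In this mesh every AS is one arc from A, yet the longest simple path from X visits all four ASes in turn. Dense connectivity keeps shortest paths short but gives a withdrawal many long detours to exhaust.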

Page 31: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

What Do You Want?

Minimize the network diameter to improve TUP - increase connectivity!

Minimize the longest possible paths to improve TDOWN - decrease connectivity!

NP-complete problem.

For a full mesh, the diameter is 1 and the longest path is N.

Page 32: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Welcome to the Reality

6 months of experimental studies.

Geographically and topologically diverse BGP sessions with > 20 ISPs.

Artificial BGP transitions (announcements & withdrawals) injected into > 10 providers.

A broad spectrum of other ISPs surveyed.

Page 33: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Real World Example

A Japanese ISP (ISP4) has BGP peer sessions with providers ISP1, ISP2, ISP3 at Mae-West.

Withdraw route Ri from ISPi.

Observe the paths announced by ISP4 in every case.

Page 34: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

[Figure: R1 fault. Steady-state and back-up paths among ISP1, ISP5, and ISP4.]

The only back-up path explored is ISP1 -> ISP5 -> ISP4.

The path was explored in 96% of events (92 sec. on average).

No path was explored in 4% of events (32 sec. on average).

Page 35: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

[Figure: R2 fault. Steady-state, back-up, and “vagabond” paths among ISP2, ISP4, ISP5, ISP6, ISP10, ISP11, ISP12, and ISP13.]

No path was explored in 7% of events (54 sec. on average).

Only ISP2-ISP5-ISP4 was explored in 63% of events (79 sec. on average).

ISP2-ISP5-ISP4 & ISP2-ISP5-ISP6-ISP4 were explored in 7% of events (88 sec. on average).

11 more unique paths appeared in 45 distinct sequences of announcements. Most of them are “vagabond” back-up paths resulting from router configuration errors.

Page 36: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

It Was an Easy One…

Withdrawal of R3 from ISP3 causes exploration of a fairly complex topology.

More than 20 distinct paths were announced.

Almost 150 different combinations of announcements.

Much bigger convergence times (~140 sec).

Only 35% of those paths are “legitimate”; the rest are “vagabond” unintentional back-up paths.

Page 37: The Impact of Internet Policy and Topology on Delayed Routing Convergence.
Page 38: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Do not Interfere!

The selection & order of back-up paths depend on the interaction of MinRouteAdver timers on routers.

MinRouteAdver is usually implemented on a per-peer (not peer + address) basis, so earlier instability interferes.

For example: in the ISP1 case, in 4% of events the initial delay at ISP4 was longer than the delay needed to propagate the back-up path.

Page 39: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

LA to SF via Haifa

Vagabond paths were found in the majority of 200 monitored ISP pairs.

They usually persist for a short period (several days).

Those erroneous paths do not conform to any intended or published policy.

A single error may have global impact, mainly because of the lack of inbound filters on peer connections.

Vagabond paths may impact performance and need to be automatically detected!

Page 40: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

You call it a line?

Average convergence delay clearly corresponds to the length of the longest back-up path.

Back-up paths are determined by policy and topology.

The data contains significant variability, but it suggests a linear relationship.

Page 41: The Impact of Internet Policy and Topology on Delayed Routing Convergence.
Page 42: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

But Some are more equal

Topology depends on the ISP tier.

Smaller ISPs typically purchase transit from multiple upstream providers.

Smaller ISPs implement back-up transit policies that are unnecessary in large ISPs.

Longest legitimate path: 9 ASes for Tier 1, 12 ASes for Tier 2.

Page 43: The Impact of Internet Policy and Topology on Delayed Routing Convergence.
Page 44: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

This way

Supported by the provided example:

ISP1 is a large tier-1 backbone provider.

ISP2 is a moderate-sized US-based tier-2 provider.

ISP3 is a regional tier-3 network.

Tier-1 & tier-2 topology is much simpler, and their customers are much less impacted by fail-over problems.

Page 45: The Impact of Internet Policy and Topology on Delayed Routing Convergence.
Page 46: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Now You See

The Internet lacks the level of reliability required by its future role.

Route fail-over complexity scales linearly with the longest back-up path for the route.

The length of the back-up paths depends on the number of contractual relationships & on policy implementation.

Page 47: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Advice is free

For Customer: if you do mission-critical stuff, connect to large providers.

For Small ISP: limit the number of transit & backup transit connections.

For All ASes: avoid vagabond paths.

Better route validation & authentication mechanisms are needed.

Page 48: The Impact of Internet Policy and Topology on Delayed Routing Convergence.

Any Proposals??

Adaptive MinRouteAdver timers?

Additional information included in BGP withdrawal messages?

Other?