An open problem in Internet Routing --- Policy Language Design for BGP
Nov 3, 2003
Timothy G. Griffin Intel Research,
Cambridge UK
Architecture of Dynamic Routing
AS 1
AS 2
EGP (= BGP)
EGP = Exterior Gateway Protocol
IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP, EIGRP (Cisco)
Policy based: BGP
The Routing Domain of BGP is the entire Internet
IGP
IGP
Technology of Distributed Routing
Link State
• Topology information is flooded within the routing domain
• Best end-to-end paths are computed locally at each router.
• Best end-to-end paths determine next-hops.
• Based on minimizing some notion of distance
• Works only if policy is shared and uniform
• Examples: OSPF, IS-IS
Vectoring
• Each router knows little about network topology
• Only best next-hops are chosen by each router for each destination network.
• Best end-to-end paths result from composition of all next-hop choices
• Does not require any notion of distance
• Does not require uniform policies at all routers
• Examples: RIP, BGP
The Gang of Four
Link State, IGP: OSPF, IS-IS
Vectoring, IGP: RIP
Vectoring, EGP: BGP
Partial View of www.cl.cam.ac.uk (128.232.0.20) Neighborhood
AS 786 ja.net (UKERNA)
AS 1239 Sprint
AS 4373 Online Computer Library Center
Originates > 180 prefixes, including 128.232.0.0/16
AS 3356 Level 3
AS 6461 AboveNet
AS 1213 HEAnet (Irish academic and research)
AS 7 UK Defense Research Agency
AS 5459 LINX
AS 702 UUNET
AS 20965 GEANT
How Many ASNs are there today?
Thanks to Geoff Huston. http://bgp.potaroo.net on November 3, 2003
16,046
Four Types of BGP Messages
• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing previously announced routes.
announcement = prefix + attribute values
BGP Attributes
Value  Code                      Reference
-----  ------------------------  ---------
1      ORIGIN                    [RFC1771]
2      AS_PATH                   [RFC1771]
3      NEXT_HOP                  [RFC1771]
4      MULTI_EXIT_DISC           [RFC1771]
5      LOCAL_PREF                [RFC1771]
6      ATOMIC_AGGREGATE          [RFC1771]
7      AGGREGATOR                [RFC1771]
8      COMMUNITY                 [RFC1997]
9      ORIGINATOR_ID             [RFC2796]
10     CLUSTER_LIST              [RFC2796]
11     DPA                       [Chen]
12     ADVERTISER                [RFC1863]
13     RCID_PATH / CLUSTER_ID    [RFC1863]
14     MP_REACH_NLRI             [RFC2283]
15     MP_UNREACH_NLRI           [RFC2283]
16     EXTENDED COMMUNITIES      [Rosen]
...
255    reserved for development
From IANA: http://www.iana.org/assignments/bgp-parameters
Most important attributes
Not all attributes need to be present in every announcement
BGP Route Processing
Receive BGP Updates
Apply Import Policies (Apply Policy = filter routes & tweak attributes)
Best Route Selection (based on attribute values)
Best Route Table
Apply Export Policies (Apply Policy = filter routes & tweak attributes)
Transmit BGP Updates (best routes)
Install forwarding entries for best routes in the IP Forwarding Table.
Policy is open-ended programming, constrained only by vendor configuration language
Route Selection Summary
Enforce relationships:
• Highest Local Preference
Traffic engineering:
• Shortest ASPATH
• Lowest MED
• i-BGP < e-BGP (e-BGP routes are preferred over i-BGP routes)
• Lowest IGP cost to BGP egress
Throw up hands and break ties:
• Lowest router ID
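The ordering above can be sketched as a lexicographic comparison. This is a minimal sketch, not a vendor implementation: the `Route` fields are hypothetical names chosen for illustration, and MED is compared unconditionally here, whereas real routers normally compare it only between routes from the same neighboring AS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    # All field names are hypothetical, chosen for this sketch.
    local_pref: int
    as_path: tuple
    med: int
    via_ebgp: bool
    igp_cost: int
    router_id: int


def preference_key(route):
    """Smaller key = more preferred; fields follow the slide's order."""
    return (
        -route.local_pref,           # highest local preference first
        len(route.as_path),          # then shortest ASPATH
        route.med,                   # then lowest MED (simplified, see lead-in)
        0 if route.via_ebgp else 1,  # then e-BGP over i-BGP
        route.igp_cost,              # then lowest IGP cost to egress
        route.router_id,             # finally lowest router ID
    )


def best_route(routes):
    return min(routes, key=preference_key)


customer_route = Route(100, (2, 2, 2, 2), 0, True, 10, 1)
peer_route = Route(90, (1, 2), 0, True, 5, 2)
short_route = Route(100, (7, 2), 0, False, 20, 3)
```

With these made-up routes, `customer_route` beats `peer_route` despite its longer path, because local preference is compared first; against `short_route` (same local preference) the shorter ASPATH wins.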
ASPATH Attribute
AS 7018
135.207.0.0/16 AS Path = 6341
AS 1239 Sprint
AS 1755 Ebone
AT&T
AS 3549 Global Crossing
135.207.0.0/16 AS Path = 7018 6341
135.207.0.0/16 AS Path = 3549 7018 6341
AS 6341
135.207.0.0/16
AT&T Research
Prefix Originated
AS 12654 RIPE NCC RIS project
AS 1129 Global Access
135.207.0.0/16 AS Path = 7018 6341
135.207.0.0/16 AS Path = 1239 7018 6341
135.207.0.0/16 AS Path = 1755 1239 7018 6341
135.207.0.0/16 AS Path = 1129 1755 1239 7018 6341
In fairness: could you do this “right” and still scale?
Exporting internal state would dramatically increase global instability and amount of routing state
Shorter Doesn’t Always Mean Shorter
AS 4
AS 3
AS 2
AS 1
Mr. BGP says that path 4 1 is better than path 3 2 1
Duh!
Shedding Inbound Traffic with ASPATH Prepending
Prepending will (usually) force inbound traffic from AS 1 to take the primary link
AS 1 (provider)
AS 2 (customer), originates 192.0.2.0/24
Backup link: 192.0.2.0/24 ASPATH = 2 2 2
Primary link: 192.0.2.0/24 ASPATH = 2
Yes, this is a Glorious Hack …
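A minimal sketch of why prepending works here, assuming AS 1 gives both announcements the same local preference so that ASPATH length decides. The function names `announce` and `pick_link` are hypothetical.

```python
def announce(own_as, extra_prepends=0):
    """AS path sent by the originating AS, padded with extra copies of itself."""
    return (own_as,) * (1 + extra_prepends)


def pick_link(announcements):
    """Pick the link whose announcement has the shortest AS path.

    Assumes equal local preference on every link (an assumption of this sketch).
    """
    return min(announcements, key=lambda link: len(announcements[link]))


# What AS 1 hears from customer AS 2 for 192.0.2.0/24.
received_at_as1 = {
    "primary": announce(2),                   # ASPATH = 2
    "backup": announce(2, extra_prepends=2),  # ASPATH = 2 2 2
}
chosen = pick_link(received_at_as1)
```

Under that assumption AS 1 picks the primary link. The sketch says nothing about what happens when local preference differs between the two links.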
… But Padding Does Not Always Work
AS 1 (provider)
AS 3 (provider)
AS 2 (customer), originates 192.0.2.0/24
Backup link: 192.0.2.0/24 ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Primary link: 192.0.2.0/24 ASPATH = 2
AS 3 will send traffic on the “backup” link because it prefers customer routes and local preference is considered before ASPATH length!
Padding in this way is often used as a form of load balancing
COMMUNITY Attribute to the Rescue!
AS 1 (provider)
AS 3 (provider)
AS 2 (customer), originates 192.0.2.0/24
Primary link (to AS 1): 192.0.2.0/24 ASPATH = 2
Backup link (to AS 3): 192.0.2.0/24 ASPATH = 2 COMMUNITY = 3:70
Customer import policy at AS 3:
If 3:90 in COMMUNITY then set local preference to 90
If 3:80 in COMMUNITY then set local preference to 80
If 3:70 in COMMUNITY then set local preference to 70
AS 3: normal customer local pref is 100, peer local pref is 90
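A minimal sketch of AS 3's import policy and its effect on route choice. Assumptions not stated on the slide: AS 1 is a peer of AS 3 and passes on the route with ASPATH = 1 2, at most one of the three community values is attached, and the route tuples and function names are hypothetical.

```python
# Community value -> local preference, as in the customer import policy at AS 3.
COMMUNITY_LOCAL_PREF = {"3:90": 90, "3:80": 80, "3:70": 70}
# Defaults at AS 3: customer routes 100, peer routes 90.
DEFAULT_LOCAL_PREF = {"customer": 100, "peer": 90}


def import_local_pref(neighbor_type, communities):
    """Local preference AS 3 assigns to an imported route."""
    if neighbor_type == "customer":
        for community, pref in COMMUNITY_LOCAL_PREF.items():
            if community in communities:
                return pref
    return DEFAULT_LOCAL_PREF[neighbor_type]


def choose(routes):
    """Highest local preference first, then shortest ASPATH."""
    def key(route):
        _name, neighbor_type, as_path, communities = route
        return (-import_local_pref(neighbor_type, communities), len(as_path))
    return min(routes, key=key)[0]


# (name, neighbor type, ASPATH, communities) as seen by AS 3.
via_as1 = ("via AS 1", "peer", (1, 2), frozenset())
padded = ("backup link", "customer", (2,) * 14, frozenset())
tagged = ("backup link", "customer", (2,), frozenset({"3:70"}))
```

With padding only, AS 3 still picks the backup link (local preference 100 beats 90). With community 3:70 the backup route drops to 70 and AS 3 prefers the route via AS 1.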
Don’t celebrate just yet…
[Figure: Provider A (Tier 1), Provider B (Tier 1), Provider C (Tier 2), and a customer, joined by links labeled peering and provider/customer]
Now, customer wants a backup link to C….
Customer installs a “backup link” …
[Figure: the same providers and customer; the customer's links are labeled primary and backup]
customer sends “lower my preference” Community value
Disaster Strikes!
[Figure: the same providers and customer, with primary and backup links]
customer is happy that backup was installed …
The primary link is repaired, and something odd occurs…
[Figure: the same providers and customer, with primary and backup links]
YIKES --- routing DOES NOT return to normal!!!
WAIT! It Gets Better…
[Figure: nodes A, B, C, D joined by links labeled P and B]
P = primary B = backup
OOOOOPS!
[Figure: nodes A, B, C, D joined by links labeled P and B]
Suppose A, B, C all break ties in the same direction (clockwise or counter-clockwise)
No solution = Protocol Divergence
What the heck is going on?
• There is no guarantee that a BGP configuration has a unique routing solution.
  – When multiple solutions exist, the (unpredictable) order of updates will determine which one wins.
• There is no guarantee that a BGP configuration has any solution!
  – And checking configurations is NP-Complete [GW1999]
• Complex policies (weights, communities setting preferences, and so on) increase chances of routing anomalies.
  – … yet this is the current trend!
What Problem is BGP Solving?
Underlying problem                      Distributed means of computing a solution
Shortest Paths                          RIP, OSPF, IS-IS
???? = Stable Paths [GSW1998, GSW2002]  BGP
An instance of the Stable Paths Problem (SPP)
• A graph of nodes and edges,
• Node 0, called the origin,
• For each non-zero node, a set of permitted paths to the origin. This set always contains the “null path”.
• A ranking of permitted paths at each node. Null path is always least preferred. (Not shown in diagram)
When modeling BGP: nodes represent BGP speaking routers, and 0 represents a node originating some address block
Permitted paths at each node (most preferred … least preferred):
node 1: 1 3 0, 1 0
node 2: 2 1 0, 2 0
node 3: 3 0
node 4: 4 2 0, 4 3 0
node 5: 5 2 1 0
A Solution to a Stable Paths Problem
A solution is an assignment of permitted paths to each node such that
• node u’s assigned path is either the null path or is a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph,
• each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors.
A solution need not represent a shortest path tree, or a spanning tree.
[Figure: SPP instance with permitted paths 1 3 0, 1 0 at node 1; 2 1 0, 2 0 at node 2; 3 0 at node 3; 4 2 0, 4 3 0 at node 4]
An SPP may have multiple solutions
DISAGREE
Permitted paths (most preferred first):
node 1: 1 2 0, 1 0
node 2: 2 1 0, 2 0
[Figure: three copies of the graph on nodes 0, 1, 2, showing the instance, the first solution, and the second solution]
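A rough simulation of how the order of updates picks the outcome in DISAGREE. It assumes a simplified model in which nodes are activated one at a time and each activated node switches to its best available path; the `run` and `is_stable` helpers are hypothetical names, and this is not the full protocol.

```python
ORIGIN = 0
NULL = ()  # the null path

# DISAGREE: each node prefers the path through the other node.
DISAGREE = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
}


def best_available(node, permitted, state):
    """Highest ranked permitted path whose next hop currently holds its tail."""
    for path in permitted[node]:
        if state.get(path[1], NULL) == path[1:]:
            return path
    return NULL


def run(permitted, schedule):
    """Activate nodes one at a time in the given order, starting from null paths."""
    state = {node: NULL for node in permitted}
    state[ORIGIN] = (ORIGIN,)
    for node in schedule:
        state[node] = best_available(node, permitted, state)
    del state[ORIGIN]
    return state


def is_stable(permitted, state):
    """True when no node would change its path if activated."""
    full = dict(state)
    full[ORIGIN] = (ORIGIN,)
    return all(full[n] == best_available(n, permitted, full) for n in permitted)


node1_first = run(DISAGREE, [1, 2, 1, 2])
node2_first = run(DISAGREE, [2, 1, 2, 1])
```

Whichever node moves first keeps its direct path, and the other node routes through it. Both outcomes are stable, and they differ.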
BAD GADGET : No Solution
Nodes: 0 (origin), 1, 2, 3, 4
Permitted paths (most preferred first):
node 1: 1 3 0, 1 0
node 2: 2 1 0, 2 0
node 3: 3 2 0, 3 0
This is an SPP version of the example first presented in Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan, and Deborah Estrin. Computer Networks, Jan. 2000
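A rough sketch of the oscillation, assuming a simplified round-robin activation model rather than the real protocol. Only nodes 1, 2, 3 are modeled, since no permitted paths are listed for node 4. `GOOD` is a hypothetical contrast case made up for this sketch, in which every node prefers its direct path.

```python
ORIGIN = 0
NULL = ()  # the null path

# BAD GADGET core: 1 prefers to go via 3, 3 via 2, and 2 via 1.
BAD_GADGET = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 2, 0), (3, 0)],
}
# Hypothetical contrast: same paths, but direct paths ranked first.
GOOD = {node: list(reversed(paths)) for node, paths in BAD_GADGET.items()}


def best_available(node, permitted, state):
    """Highest ranked permitted path whose next hop currently holds its tail."""
    for path in permitted[node]:
        if state.get(path[1], NULL) == path[1:]:
            return path
    return NULL


def diverges(permitted, order, max_rounds=100):
    """True if round-robin activation revisits a state without settling."""
    state = {node: NULL for node in permitted}
    state[ORIGIN] = (ORIGIN,)
    seen = set()
    for _ in range(max_rounds):
        snapshot = tuple(sorted(state.items()))
        if snapshot in seen:
            return True  # same state at a round boundary: it will loop forever
        seen.add(snapshot)
        changed = False
        for node in order:
            new_path = best_available(node, permitted, state)
            if new_path != state[node]:
                state[node] = new_path
                changed = True
        if not changed:
            return False  # nobody wants to move: a stable assignment
    raise RuntimeError("undecided within max_rounds")
```

This only demonstrates divergence for one particular schedule. The stronger claim on the slide, that no solution exists at all, means no schedule can converge.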
SURPRISE!
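Nodes: 0 (origin), 1, 2, 3, 4
Permitted paths (most preferred first):
node 1: 1 3 0, 1 0
node 2: 2 1 0, 2 0
node 3: 3 4 2 0, 3 0
node 4: 4 0, 4 2 0, 4 3 0
Becomes a BAD GADGET if link (4, 0) goes down.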
BGP is not robust : it is not guaranteed to recover from network failures.
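The effect of the link failure can be checked by brute force on this small instance. This is a sketch under assumptions: paths are tuples, a link failure is modeled by deleting every permitted path that crosses the link, and the helper names are hypothetical. Exhaustive search is only feasible because the instance is tiny.

```python
from itertools import product

ORIGIN = 0
NULL = ()  # the null path

# SURPRISE: permitted paths per node, most preferred first.
SURPRISE = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 4, 2, 0), (3, 0)],
    4: [(4, 0), (4, 2, 0), (4, 3, 0)],
}


def best_available(node, permitted, assignment):
    for path in permitted[node]:
        if assignment.get(path[1], NULL) == path[1:]:
            return path
    return NULL


def solutions(permitted):
    """Enumerate every assignment and keep the stable ones."""
    nodes = sorted(permitted)
    found = []
    for combo in product(*(permitted[n] + [NULL] for n in nodes)):
        assignment = dict(zip(nodes, combo))
        full = dict(assignment)
        full[ORIGIN] = (ORIGIN,)
        if all(full[n] == best_available(n, permitted, full) for n in nodes):
            found.append(assignment)
    return found


def fail_link(permitted, link):
    """Drop every permitted path that traverses the failed link."""
    def uses(path):
        return any({a, b} == set(link) for a, b in zip(path, path[1:]))
    return {n: [p for p in paths if not uses(p)] for n, paths in permitted.items()}


before = solutions(SURPRISE)
after = solutions(fail_link(SURPRISE, (4, 0)))
```

Before the failure there is exactly one stable assignment (node 4 uses 4 0). After removing link (4, 0), the search finds none.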
PRECARIOUS
Permitted paths at each node (most preferred first):
node 1: 1 2 0, 1 0
node 2: 2 1 0, 2 0
node 3: 3 1 0, 3 1 2 0
node 4: 4 3 1 0, 4 5 3 1 2 0, 4 3 1 2 0
node 5: 5 3 1 0, 5 6 3 1 2 0, 5 3 1 2 0
node 6: 6 3 1 0, 6 4 3 1 2 0, 6 3 1 2 0
As with DISAGREE, the part formed by nodes 1 and 2 has two distinct solutions
The part formed by nodes 3, 4, 5, 6 has a solution only when node 1 is assigned the direct path (1 0).
Has a solution, but path vector may not find it!
A Sufficient Condition for Robustness
Checking PPO at the “language level” is an NP-Complete problem
P ≤ Q : transitive closure of (subpath relation on permitted paths union the path ranking relation at each node)
Partially Partially Ordered (PPO): For all paths P and Q, P ≤ Q and Q ≤ P implies (P = Q or head(P) = head(Q))
This is a sufficient condition for robustness
PPO iff ranking functions can be rewritten to be strictly increasing along all paths
Why is BGP not causing more trouble?
If the provider/customer digraph is acyclic and every AS obeys the commandments
• Thou shall prefer customer routes over all others
• Thou shall use provider routes only as a last resort
• Thou shall not provide transit between peers or providers
then the BGP configuration is robust. [see Gao-Rexford and Gao-Griffin-Rexford]
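The three commandments can be sketched as a preference table plus an export filter. This is a minimal illustration: the numeric values are arbitrary (only their order matters), and the names `LOCAL_PREF` and `may_export` are hypothetical. Ranking peer routes above provider routes is one common reading of "provider routes only as a last resort".

```python
# Illustrative local preference by the relationship a route was learned over.
# Only the ordering matters: customer > peer > provider.
LOCAL_PREF = {"customer": 100, "peer": 90, "provider": 80}


def may_export(learned_from, export_to):
    """Export filter implied by 'no transit between peers or providers'.

    learned_from: "customer", "peer", "provider", or "self" (locally originated)
    export_to:    "customer", "peer", or "provider"
    """
    if learned_from in ("customer", "self"):
        return True  # customer and own routes are announced to everyone
    return export_to == "customer"  # peer/provider routes go to customers only


# Every (learned_from, export_to) pair that the filter blocks.
BLOCKED = sorted(
    (src, dst)
    for src in ("customer", "peer", "provider", "self")
    for dst in ("customer", "peer", "provider")
    if not may_export(src, dst)
)
```

The four blocked combinations are exactly the cases where an AS would carry traffic between two neighbors, neither of which is its customer.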
Hierarchical BGP (HBGP)
HBGP
HBGP + PEER
HBGP + BU
HBGP + PEER + BU
[GR2000, GGR2001]
Can BGP be fixed?
Joint work with Aaron Jaggard (UPenn Math) and Vijay Ramachandran (Yale CS) to appear at SIGCOMM 2003
• BGP policy languages have evolved organically
• A policy language really should be designed!
• But how?
Design Dimensions
• Robustness (required!)
• Transparency (required!)
• Expressive Power
• Autonomy (“local wiggle room”)
• Local vs. Global Constraints
• Policy Opacity
Tradeoffs galore
General Autonomy
Suppose C and K are any predicates that partition all routes. Then it is possible to write policies, with no inbound filtering, such that for all imported routes, those that satisfy C are ranked below those that satisfy K.
A Partial Order for the Design Space
( J_1 , L_1 ) < ( J_2 , L_2 )
if and only if for all S : SPP
1. J_1(S) implies J_2(S)
2. L_1(S) implies L_2(S)
Local Constraint, Global Constraint
Robust Designs
( J, L ) is a robust design if (J and L) implies PPO
Examples:
( True, SP )
( PPO, True )
Robust Subspace
( PPO, True ): not tractable
( True, SP ): tractable
[Figure axes: Expressive Power, Constraint Simplicity]
Need Global Constraints
Theorem: Any robust system supporting both transparency and autonomy must have a non-trivial global constraint
Global constraints must be a part of design from the start
Next?
• Need techniques for constructing policy languages.
• Design of protocols to enforce global constraints.
• Can ad-hocery be avoided?