Resilient Routing Reconfiguration
Ye Wang, Hao Wang*, Ajay Mahimkar+, Richard Alimi, Yin Zhang+, Lili Qiu+, Yang Richard Yang
*Google, +University of Texas at Austin, Yale University
ACM SIGCOMM 2010, New Delhi, India
September 2, 2010
Multiple failures can occur together, e.g., concurrent fiber cuts in Sprint (2006)
Planned maintenance affects multiple network elements and may overlap with unexpected failures (e.g., due to inaccurate SRLG information)
Increasingly stringent requirements on reliability: VoIP, video conferencing, gaming, mission-critical apps, etc.
SLAs have teeth: a violation directly affects ISP revenue
Need resiliency: the network should recover quickly and smoothly from one or multiple overlapping failures
Challenge: Topology Uncertainty
The number of failure scenarios quickly explodes
500-link network, 3-link failures: > 20,000,000!
Difficult to optimize routing to avoid congestion under all possible failure scenarios
Brute-force failure enumeration is clearly infeasible; existing methods handle only hundreds of topologies
Difficult to install fast rerouting: preconfigure 20,000,000 backup routes on routers?
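The blow-up above is plain binomial counting: a network with |E| links has C(|E|, k) distinct k-link failure scenarios. A quick check, assuming Python 3.8+ for `math.comb` (the helper name `num_failure_scenarios` is illustrative):

```python
from math import comb

def num_failure_scenarios(num_links: int, max_failures: int) -> int:
    """Count failure scenarios with 1..max_failures simultaneously failed links."""
    return sum(comb(num_links, k) for k in range(1, max_failures + 1))

exactly_three = comb(500, 3)                  # 3-link failures only
up_to_three = num_failure_scenarios(500, 3)   # 1-, 2- and 3-link failures
print(f"{exactly_three:,} {up_to_three:,}")   # prints "20,708,500 20,833,750"
```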
Existing Approaches & Limitations
Focus exclusively on reachability, e.g., FRR, FCP (Failure Carrying Packets), Path Splicing
May suffer from congestion and unpredictable performance
Congestion is mostly caused by rerouting under failures [Iyer et al.]
Multiple network element failures have a domino effect on FRR rerouting, resulting in network instability [N. So & H. Huang]
Only consider a small subset of failures, e.g., single-link failures [D. Applegate et al.]
Insufficient for demanding SLAs
Online routing re-optimization after failures
Too slow; cannot support fast rerouting
R3: Resilient Routing Reconfiguration
A novel link-based routing protection scheme
Requires no enumeration of failure scenarios
Provably congestion-free for all up-to-F link failures
Efficient w.r.t. router processing/storage overhead
Flexible in supporting diverse practical requirements
Problem Formulation
Goal: congestion-free rerouting under up-to-F link failures
Input: topology $G(V,E)$, link capacity $c_e$, traffic demand $d$
$d_{ab}$: traffic demand from router $a$ to router $b$
Output: base routing $r$, protection routing $p$
$r_{ab}(e)$: fraction of $d_{ab}$ carried by link $e$
$p_l(e)$: (link-based) fast rerouting for link $l$, i.e., the fraction of link $l$'s traffic carried by link $e$ when $l$ fails
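[Figure: example topology with routers a, b, c, d and link capacities of 5 and 10; demand $d_{ab} = 6$ is carried entirely on link ab ($r_{ab}(ab) = 1$), and link ab is protected via c ($p_{ab}(ac) = 1$, $p_{ab}(cb) = 1$)]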
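The quantities in the formulation can be made concrete with plain dictionaries. This is a minimal sketch, assuming links are named by their endpoint pair (e.g. "ab") and that $p_l(l) = 0$, as in the figure; the function names are hypothetical. It computes only loads, since the figure's per-link capacity assignment is not recoverable.

```python
from fractions import Fraction

# Demand d[(a, b)], base routing r[(a, b)][e], protection routing p[l][e]
# (values taken from the figure: d_ab = 6, r_ab(ab) = 1, p_ab(ac) = p_ab(cb) = 1).
d = {("a", "b"): 6}
r = {("a", "b"): {"ab": Fraction(1)}}
p = {"ab": {"ac": Fraction(1), "cb": Fraction(1)}}

def base_load(d, r):
    """Load on each link e under the base routing: sum over (a, b) of d_ab * r_ab(e)."""
    load = {}
    for od, demand in d.items():
        for e, frac in r[od].items():
            load[e] = load.get(e, 0) + demand * frac
    return load

def load_after_failure(load, p, l):
    """Move link l's base load onto its protection routing p_l (assumes p_l(l) = 0)."""
    new_load = {e: x for e, x in load.items() if e != l}
    for e, frac in p[l].items():
        new_load[e] = new_load.get(e, 0) + load.get(l, 0) * frac
    return new_load

before = base_load(d, r)                       # ab carries 6
after = load_after_failure(before, p, "ab")    # ac and cb each carry 6
```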
From Topology Uncertainty to Traffic Uncertainty
Instead of optimizing for the original traffic demand on all possible topologies under failures, R3 optimizes protection routing for a set of traffic demands on the original topology
The rerouting virtual demand set captures the effect of failures on the amount of rerouted traffic
Protection routing on the original topology can be easily reconfigured for use after a failure occurs
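Rerouting Virtual Demand Set
Failure scenario (f) → rerouted traffic (x)
Rerouted traffic under all possible up-to-F-link failure scenarios (independent of r):
$X_F = \{ x \mid 0 \le x_l \le c_l,\ \sum_{l \in E} x_l / c_l \le F \}$
$X_F$ is convex: it is the convex hull of the points in which at most $F$ links each reroute traffic equal to their capacity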
[Figure: example topology with routers a, b, c, d; each link is labeled load/capacity, e.g., ac: 4/5, ab: 2/10]
Rerouted traffic $x_l$ after link $l$ fails = base load on $l$ under $r$ (if $r$ is congestion-free, then $x_l \le c_l$)
ac fails: rerouted traffic $x_{ac} = 4$; upper bound $x_{ac} \le c_{ac} = 5$
ab fails: rerouted traffic $x_{ab} = 2$; upper bound $x_{ab} \le c_{ab} = 10$
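The mapping from a failure scenario to a point $x$, and the membership test for $X_F$, can be sketched as follows, using the loads and capacities from the example above. The function names are hypothetical. The last two checks show that $X_F$ is a superset: $x = (5, 10)$ corresponds to no actual failure of this base routing, yet it lies in $X_2$.

```python
from fractions import Fraction

def rerouted_traffic(failed, base_load):
    """One failure scenario -> one point x: x_l = base load on l if l failed, else 0."""
    return {l: (base_load[l] if l in failed else 0) for l in base_load}

def in_XF(x, cap, F):
    """Membership test for X_F = { x | 0 <= x_l <= c_l, sum_l x_l / c_l <= F }."""
    if any(not (0 <= x[l] <= cap[l]) for l in x):
        return False
    return sum(Fraction(x[l], cap[l]) for l in x) <= F

base = {"ac": 4, "ab": 2}     # base loads from the example
cap = {"ac": 5, "ab": 10}     # link capacities c_ac, c_ab

x_ac = rerouted_traffic({"ac"}, base)          # {'ac': 4, 'ab': 0}
x_both = rerouted_traffic({"ac", "ab"}, base)  # {'ac': 4, 'ab': 2}
```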
R3 Overview
Offline precomputation: plan (r, p) together for the original demand d plus the rerouting virtual demand x on the original topology G(V,E) to minimize congestion; p may “use” links that will later fail
Online reconfiguration: convert and use p for fast rerouting after failures
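Offline Precomputation
Compute (r, p) to minimize MLU (Maximum Link Utilization) for the original demand d plus the rerouting demand $x \in X_F$: r carries d, p carries $x \in X_F$
$\min_{(r,p)} \mathrm{MLU}$
s.t. [1] r is a routing and p is a routing;
[2] $\forall x \in X_F,\ \forall e \in E:\ \frac{\sum_{a,b \in V} d_{ab}\, r_{ab}(e) + \sum_{l \in E} x_l\, p_l(e)}{c_e} \le \mathrm{MLU}$
(first sum: original traffic; second sum: rerouting traffic)
Challenge: [2] has an infinite number of constraints, one for each $x \in X_F$
Solution: apply LP duality to obtain an equivalent formulation with a polynomial number of constraints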
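To see why constraint [2] is tractable, fix a candidate (r, p) and a link e. Substituting $y_l = x_l / c_l$ turns the worst case over $X_F$ into $\max \sum_l y_l\, c_l\, p_l(e)$ subject to $0 \le y_l \le 1$ and $\sum_l y_l \le F$. For integer F and nonnegative p, this is a fractional knapsack whose optimum is the sum of the F largest values of $c_l\, p_l(e)$. The sketch below only *evaluates* the MLU of a given (r, p) this way. It is not R3's duality-based optimizer. The triangle topology and all numbers are hypothetical.

```python
import heapq

def worst_case_mlu(base_load, p, cap, F):
    """MLU of a fixed (r, p) under d + X_F.

    base_load[e]: sum_{a,b} d_ab * r_ab(e), the original traffic on e
    p[l][e]:      fraction of link l's rerouted traffic carried by e
    cap[e]:       capacity c_e
    """
    mlu = 0.0
    for e in cap:
        # Worst-case rerouting load on e: F largest values of c_l * p_l(e).
        weights = [cap[l] * p[l].get(e, 0.0) for l in p]
        reroute = sum(heapq.nlargest(F, weights))
        mlu = max(mlu, (base_load.get(e, 0.0) + reroute) / cap[e])
    return mlu

# Hypothetical triangle a-b-c: each link is protected by the two-hop path.
cap = {"ab": 10, "ac": 5, "bc": 10}
base = {"ab": 2, "ac": 4, "bc": 0}
p = {"ab": {"ac": 1.0, "bc": 1.0},
     "ac": {"ab": 1.0, "bc": 1.0},
     "bc": {"ab": 1.0, "ac": 1.0}}
mlu_1 = worst_case_mlu(base, p, cap, F=1)   # bottleneck ac: (4 + 10) / 5
```

Because $X_F$ lets each failed link reroute up to its full capacity, the result is conservative. Here it exceeds 1 even though the actual base loads are small.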
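Online Reconfiguration
Step 1: Fast rerouting after ac fails
Precomputed p for ac: $p_{ac}(ac) = 1/3$, $p_{ac}(ab) = 1/3$, $p_{ac}(ad) = 1/3$
When ac fails, fast reroute using $p_{ac}$ without $p_{ac}(ac)$, which is equivalent to the rescaled routing $\xi_{ac}$: $\xi_{ac}(ac) = 0$, $\xi_{ac}(ab) = 1/2$, $\xi_{ac}(ad) = 1/2$
In general, when link $l$ fails: $\xi_l(e) = p_l(e) / (1 - p_l(l))$ for $e \ne l$, and $\xi_l(l) = 0$
[Figure: $p_{ac}(e)$ before the failure and the rescaled detour $\xi_{ac}(e)$ after ac fails, on the example topology with routers a, b, c, d]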
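The Step 1 rescaling is a one-line computation per link. This is a minimal sketch using exact fractions so that the slide's values come out exactly. The name `rescale` is hypothetical. It assumes $p_l(l) < 1$, that is, some of l's traffic avoids l itself.

```python
from fractions import Fraction as Fr

def rescale(p_l, l):
    """Detour xi_l for failed link l: drop p_l(l) and rescale the remaining entries.

    xi_l(e) = p_l(e) / (1 - p_l(l)) for e != l, and xi_l(l) = 0.
    """
    denom = 1 - p_l.get(l, 0)
    return {e: (Fr(0) if e == l else frac / denom) for e, frac in p_l.items()}

p_ac = {"ac": Fr(1, 3), "ab": Fr(1, 3), "ad": Fr(1, 3)}
xi_ac = rescale(p_ac, "ac")   # ac -> 0, ab -> 1/2, ad -> 1/2
```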
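Online Reconfiguration (Cont.)
Step 2: Reconfigure p after the failure of ac
Current p for ab: $p_{ab}(ac) = 1/2$, $p_{ab}(ad) = 1/2$
When ac fails, the 1/2 on ac needs to be “detoured” using $\xi_{ac}$: $p_{ab}(ac) = 0$, $p_{ab}(ad) = 3/4$, $p_{ab}(ab) = 1/4$
In general, when link $l$ fails: $\forall l' \ne l$, $p_{l'}(e) \leftarrow p_{l'}(e) + p_{l'}(l)\, \xi_l(e)$ for $e \ne l$, and then $p_{l'}(l) \leftarrow 0$
Apply the detour $\xi_{ac}$ to every protection routing (for other links) that uses ac
[Figure: $p_{ab}(e)$ before and after the failure of ac, on the example topology with routers a, b, c, d]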
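The Step 2 update can be sketched as a function over a dictionary of protection routings, one per link. The name `apply_detour` and the dictionary layout are assumptions. The detour $\xi_{ac}$ is taken from Step 1. The result reproduces the slide's numbers for $p_{ab}$.

```python
from fractions import Fraction as Fr

def apply_detour(p, l, xi_l):
    """Update every other link's protection routing after link l fails.

    For each l' != l: p_l'(e) += p_l'(l) * xi_l(e) for e != l, then p_l'(l) = 0.
    Returns a new dict and leaves p unchanged.
    """
    updated = {}
    for lp, row in p.items():
        if lp == l:
            updated[lp] = dict(xi_l)   # the failed link's own traffic follows xi_l
            continue
        moved = row.get(l, 0)          # share of l's protection traffic that used l
        new_row = {e: frac for e, frac in row.items() if e != l}
        for e, frac in xi_l.items():
            if e != l:
                new_row[e] = new_row.get(e, 0) + moved * frac
        new_row[l] = Fr(0)
        updated[lp] = new_row
    return updated

xi_ac = {"ac": Fr(0), "ab": Fr(1, 2), "ad": Fr(1, 2)}
p = {"ab": {"ac": Fr(1, 2), "ad": Fr(1, 2)}}
p_after = apply_detour(p, "ac", xi_ac)   # p_ab: ac -> 0, ad -> 3/4, ab -> 1/4
```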
R3 Guarantees
Sufficient condition for congestion-free rerouting: if $\exists (r, p)$ such that MLU ≤ 1 under $d + X_F$, then there is no congestion under any failure scenario involving up to F links
Necessary condition under single-link failures: if there exists a protection routing that guarantees no congestion under any single-link failure scenario, then $\exists (r, p)$ such that MLU ≤ 1 under $d + X_1$
Hence adding a superset of the rerouted traffic to the original demand is not overly wasteful
Open problem: is R3 optimal for more than one link failure?
R3 online reconfiguration is independent of the order in which multiple failures occur
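The order-independence property can be checked numerically on a small case. The sketch below combines the two online steps into one hypothetical `fail_link` function. It applies the rescaling to the failed link's own row and the detour to every other row. It fails ac and ab in both orders and compares the results exactly. The topology (links ab, ac, ad, bc, bd, cd) and the protection routings are made-up illustrations, not data from the talk.

```python
from fractions import Fraction as Fr

LINKS = ["ab", "ac", "ad", "bc", "bd", "cd"]

def dense(**fracs):
    """Full row over LINKS; links not given carry 0."""
    return {e: Fr(fracs.get(e, 0)) for e in LINKS}

def fail_link(p, l):
    """Online reconfiguration for failed link l (assumes p_l(l) < 1)."""
    denom = 1 - p[l][l]
    xi = {e: (Fr(0) if e == l else frac / denom) for e, frac in p[l].items()}
    new_p = {}
    for lp, row in p.items():
        if lp == l:
            new_p[lp] = xi   # l's own traffic now follows the detour xi_l
        else:
            moved = row[l]   # detour the part of p_l' that used l via xi_l
            new_p[lp] = {e: (Fr(0) if e == l else row[e] + moved * xi[e]) for e in row}
    return new_p

# Hypothetical protection routings: ab via a-c-b and a-d-b, ac via a-c, a-b-c
# and a-d-c, ad via a-b-d and a-c-d.
p = {
    "ab": dense(ac=Fr(1, 2), bc=Fr(1, 2), ad=Fr(1, 2), bd=Fr(1, 2)),
    "ac": dense(ac=Fr(1, 3), ab=Fr(1, 3), bc=Fr(1, 3), ad=Fr(1, 3), cd=Fr(1, 3)),
    "ad": dense(ab=Fr(1, 2), bd=Fr(1, 2), ac=Fr(1, 2), cd=Fr(1, 2)),
}
order1 = fail_link(fail_link(p, "ac"), "ab")
order2 = fail_link(fail_link(p, "ab"), "ac")
```

Exact `Fraction` arithmetic matters here. With floats, the two orders may differ by rounding error.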
R3 Extensions
Fixed base routing: r can be given (e.g., as an outcome of OSPF)
Trade-off between no-failure and failure protection: add a penalty envelope β (≥ 1) to bound no-failure performance
Trade-off between MLU and end-to-end delay: add an envelope γ (≥ 1) to bound end-to-end path delay
Prioritized traffic protection: associate different protection levels with traffic of different priorities
Realistic failure scenarios: Shared Risk Link Group, Maintenance Link Group
Traffic variations: optimize (r, p) for $d \in D$ + $x \in X_F$
Robustness on Base Routing
OSPFInvCap: link weight is inversely proportional to bandwidth
OSPF: optimized link weights
[Figure: Left: single failure; Right: two failures]
A better base routing can lead to better routing protection