Responsive Yet Stable Traffic Engineering Srikanth Kandula Dina Katabi, Bruce Davie, and Anna Charny

Mar 27, 2015

Transcript
Page 1: Responsive Yet Stable Traffic Engineering Srikanth Kandula Dina Katabi, Bruce Davie, and Anna Charny.

High Performance Switching and Routing Telecom Center Workshop: Sept 4, 1997.

Responsive Yet Stable Traffic Engineering

Srikanth Kandula

Dina Katabi, Bruce Davie, and Anna Charny

Page 2:

• ISPs need to map traffic onto the underlying topology

• Good mapping ⇒ good performance & low cost

• Good mapping requires load balancing

[Figure: network between Ingress and Egress; label "100%"]

Page 3:

Moreover, ISPs want to rebalance load when an unexpected event causes congestion

[Figure: network between Ingress and Egress; label "100%"]

Page 4:

Moreover, ISPs want to rebalance load when an unexpected event causes congestion: failure, BGP reroute, flash crowd, or attack

[Figure: network between Ingress and Egress with a traffic change; label "100%"]

Page 5:

Moreover, ISPs want to rebalance load when an unexpected event causes congestion: failure, BGP reroute, flash crowd, or attack

[Figure: network between Ingress and Egress with a traffic change; labels "100%" and "Move Traffic"]

Page 6:

But rebalancing load in real time is risky

• Need to rebalance load ASAP: remove congestion before it affects users' performance

• But moving quickly may overshoot ⇒ congestion on a different path ⇒ more drops …

[Figure: Ingress1 and Ingress2; a traffic change leads to "Congestion!"]


Page 8:

Problem: How to make Traffic Engineering:

• Responsive: reacts ASAP

• Stable: converges to balanced load without overshooting or generating new congestion

Page 9:

Current Approaches

Offline TE (e.g., OSPF-TE)

• Avoids the risk of instability caused by real-time adaptation, but also misses the benefits

• Balances the load in steady state

• Deals with failures and changes in demands by computing routes that work under most conditions

• Overprovisions for unanticipated events

Online TE (e.g., MATE)

• Tries to adapt to unanticipated events

• But can overshoot, causing drops and instability

[Figure: Long-Term Demands ⇒ OSPF-TE ⇒ Link Weights]

Page 10:

This Talk

• TeXCP: Responsive & Stable Online TE

• Idea: Use adaptive load balancing, but add explicit-feedback congestion control to prevent overshoot and drops

• TeXCP keeps utilization always within a few percent of optimal

• Compare to MATE and OSPF-TE, showing that TeXCP outperforms both

Page 11:

Typical Formalization of the TE Problem

Find a routing that:

Min Max-Utilization

• Removes hot spots and balances load

• High Max-Utilization is an indicator that the ISP should upgrade its infrastructure
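As a rough illustration of the Min Max-Utilization objective, the Python sketch below computes the maximum link utilization for two candidate splits of one demand over two one-link paths. The link names, capacities, and demand are made-up values, and `link_loads` and `max_utilization` are hypothetical helper names, not part of TeXCP.

```python
def link_loads(demand, split, paths):
    """Per-link load when `demand` is divided over `paths` in the fractions of `split`."""
    load = {}
    for fraction, path in zip(split, paths):
        for link in path:
            load[link] = load.get(link, 0.0) + fraction * demand
    return load


def max_utilization(load, capacity):
    """Largest load/capacity ratio over all links (the quantity TE tries to minimize)."""
    return max(load.get(link, 0.0) / cap for link, cap in capacity.items())


# Illustrative values only (assumptions, not taken from the talk).
capacity = {"A": 10.0, "B": 10.0}   # link capacities
paths = [["A"], ["B"]]              # two disjoint one-link paths
demand = 12.0                       # one ingress-egress demand

all_on_one = max_utilization(link_loads(demand, [1.0, 0.0], paths), capacity)
balanced = max_utilization(link_loads(demand, [0.5, 0.5], paths), capacity)
print(all_on_one, balanced)
```

Under these assumed numbers, sending everything over one path pushes that link above its capacity, while the even split keeps both links well below it; a Min Max-Utilization routing would prefer the second split.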

Page 12:

Online TE involves solving 2 sub-problems

1. Find the traffic split that minimizes the Max-Utilization

2. Converge to the balanced traffic splits in a stable manner

Also, an implementation mechanism to force traffic to follow the desired splits

Page 13:

Implementation: Force traffic along the right paths

Solution:

• A TeXCP agent per ingress-egress (IE) pair, at the ingress node

• ISP configures each TeXCP agent with paths between the IE pair

• Paths are pinned (e.g., MPLS tunnels)

[Figure: TeXCP Agent at the Ingress, with paths to the Egress]

Page 14:

Sub-Problem: Distributedly, TeXCP agents find balanced traffic splits

Solution: TeXCP Load Balancer

• Periodically, TeXCP agent probes a path for its utilization

Probes follow the slow path like ICMP messages

[Figure: two paths from Ingress to Egress; probes report U1 = 0.4 and U2 = 0.7]

Page 15:

Sub-Problem: Distributedly, TeXCP agents find balanced traffic splits

Solution: TeXCP Load Balancer

• Periodically, TeXCP agent probes a path for its utilization

• A TeXCP agent iteratively moves traffic from over-utilized paths to under-utilized paths:

$\Delta r_p \propto r_p \, (\bar{u}(t) - u_p(t))$

where $r_p$ is this agent's traffic on path $p$

• Deal with different path capacity

• Deal with inactive paths ($r_p = 0$)

Page 16:

Sub-Problem: Distributedly, TeXCP agents find balanced traffic splits

Solution: TeXCP Load Balancer

• Periodically, TeXCP agent probes a path for its utilization

• A TeXCP agent iteratively moves traffic from over-utilized paths to under-utilized paths:

$\Delta r_p \propto r_p \, (\bar{u}(t) - u_p(t)), \qquad \bar{u} = \frac{\sum_i r_i u_i}{\sum_i r_i}$

where $r_p$ is this agent's traffic on path $p$

• Deal with different path capacity

• Deal with inactive paths ($r_p = 0$)

Proof in paper
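To make the load-balancer rule concrete, here is a minimal Python sketch of one rebalancing step: each path's rate changes in proportion to its current rate times the gap between the agent's rate-weighted average utilization and that path's utilization. The gain `step`, the small floor `eps` used so an inactive path (rate 0) can still pick up traffic, and the final renormalization are assumptions made for illustration; the talk defers the exact treatment of inactive paths and differing capacities to the paper.

```python
def rebalance(rates, utils, step=0.1, eps=1e-3):
    """One load-balancer step for a single agent (illustrative sketch).

    rates: this agent's traffic on each path; assumes sum(rates) > 0.
    utils: utilization probed on each path.
    step, eps: hypothetical tuning knobs, not values from the talk.
    """
    total = sum(rates)
    # Rate-weighted average utilization over this agent's paths.
    u_bar = sum(r * u for r, u in zip(rates, utils)) / total
    new_rates = []
    for r, u in zip(rates, utils):
        weight = r if r > 0 else eps          # let inactive paths gain traffic
        new_rates.append(max(0.0, r + step * weight * (u_bar - u)))
    # Rescale so the agent still sends the same total demand.
    scale = total / sum(new_rates)
    return [x * scale for x in new_rates]


# Utilizations taken from the probing slide (U1 = 0.4, U2 = 0.7);
# the 60/40 starting split is an assumed example.
new_rates = rebalance([60.0, 40.0], [0.4, 0.7])
print(new_rates)

# An inactive but lightly loaded path starts to receive a little traffic.
woken = rebalance([100.0, 0.0], [0.9, 0.1])
print(woken)
```

In this sketch the utilizations are treated as fixed inputs for one step; in a running system each step would be followed by fresh probes, since moving traffic changes the utilization that the next probe reports.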

Page 17:

Sub-Problem: Converge to balanced load in a stable way

Solution: Use Experience from Congestion Control (XCP)

Congestion Control

• Flow from sender to receiver

• Senders share the bottleneck; need coordination to prevent oscillations

Online TE

• Flow from ingress to egress

• TeXCP agents share physical link; need coordination to prevent oscillations

Move in really small increments ⇒ No overshoot! The challenge is to move traffic quickly w/o overshoot

Page 18:

Congestion Management Layer between Load Balancer and Data Plane

• Set of light-weight per-path congestion controllers

Unlike prior online TE, the Load Balancer can push a decision to the data plane only as fast as the Congestion Management Layer allows it

[Figure: Ingress-Egress Demands; Load Balancers; Path Controllers forming the Congestion Management layer; callout "Analogous to an Application"]
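One way to picture this gating is the Python sketch below, in which a per-path controller holds an allowed rate that only changes when a probe returns explicit feedback, and the load balancer's desired rate is capped by it. The class name `PathController`, its methods, and all numbers are hypothetical; this is an assumed reading of the layering, not the talk's implementation.

```python
class PathController:
    """Illustrative per-path congestion controller (hypothetical API)."""

    def __init__(self, rate_bps):
        self.allowed_bps = rate_bps

    def on_probe(self, feedback_bps):
        # Assumes each probe returns an explicit rate change in bits per second.
        self.allowed_bps = max(0.0, self.allowed_bps + feedback_bps)

    def admit(self, desired_bps):
        # The load balancer's wish is honored only up to the allowed rate.
        return min(desired_bps, self.allowed_bps)


controller = PathController(rate_bps=1_000_000.0)
desired = 1_600_000.0            # what the load balancer wants on this path

sent = []
for _ in range(3):               # three probe intervals
    controller.on_probe(100_000.0)   # assumed feedback of +100 kbps per probe
    sent.append(controller.admit(desired))
print(sent)
```

With these assumed values the traffic on the path ramps up by 100 kbps per probe interval instead of jumping straight to the desired rate, which is the sense in which the load balancer moves only as fast as the layer below allows.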

Page 19:

Per-Path Light-Weight Congestion Controller

• Explicit feedback from core routers (like XCP)

• Periodically collects feedback in ICMP-like probes

[Figure: a Probe between Ingress and Egress carrying Utilization and Feedback fields]

Page 20:

Per-Path Light-Weight Congestion Controller

[Figure: a Probe between Ingress and Egress; labels U = 0.8, F = 100 kbps]

Page 21:

Per-Path Light-Weight Congestion Controller

[Figure: a Probe between Ingress and Egress; labels U = 0.8, F = 100 kbps and U = 0.2, F = 500 kbps]

Page 22:

Per-Path Light-Weight Congestion Controller

• Explicit feedback from core routers (like XCP)

• Periodically collects feedback in ICMP-like probes

• Core router computes aggregate feedback Δ = Spare BW − Queue / Max-RTT

• Estimates number of IE-flows by counting probes, and divides feedback between them

Occasional explicit feedback in probes… Need software changes only
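The core-router computation can be sketched in a few lines of Python. The sketch reads the formula as Spare BW minus (Queue divided by Max-RTT), which is the dimensionally consistent reading, and it divides the feedback equally among the IE-flows whose probes were counted; the equal split, the function names, and the sample numbers are assumptions for illustration.

```python
def aggregate_feedback(capacity_bps, input_rate_bps, queue_bits, max_rtt_s):
    """Delta = spare bandwidth minus the rate needed to drain the queue in one Max-RTT."""
    spare_bps = capacity_bps - input_rate_bps
    return spare_bps - queue_bits / max_rtt_s


def per_flow_feedback(delta_bps, probes_seen):
    """Split the aggregate feedback equally among IE-flows (assumed policy)."""
    if probes_seen == 0:
        return 0.0
    return delta_bps / probes_seen


# Illustrative numbers only (not from the talk).
delta = aggregate_feedback(
    capacity_bps=10_000_000.0,
    input_rate_bps=9_000_000.0,
    queue_bits=250_000.0,
    max_rtt_s=0.5,
)
share = per_flow_feedback(delta, probes_seen=5)
print(delta, share)
```

A standing queue reduces the feedback even when there is spare bandwidth, so the router asks agents to slow their increase (or to decrease) until the queue drains.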

Page 23:

Stability Idea

Per-path controller works at a faster timescale than load balancer ⇒ Can decouple components ⇒ Stabilize separately

Informally stated:

Theorem 1: Given a particular load split, the path controller stabilizes the traffic on each link

Theorem 2: Given stable path controllers,

• Every TeXCP agent sees balanced load on all paths

• Unused paths have higher utilization than used paths

[Figure: Load Balancer above two Path Controllers, between Ingress and Egress]

Page 24:

Performance

Page 25:

Simulation Setup

Standard for TE

• Rocketfuel topologies

• Average demands follow gravity model

• IE-traffic consists of large # of Pareto on-off sources

TeXCP Parameters:

• Each agent is configured with 10 shortest paths

• Probe for explicit feedback every 0.1s

• Load balancer re-computes a split every 0.5s

Compare to Optimal Max-Utilization

• Obtained with a centralized oracle that has immediate and exact demands info, and uses as many paths as necessary
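For readers unfamiliar with the gravity model, the Python sketch below shows one common form of it: the demand from node s to node d is proportional to the total traffic leaving s times the total traffic entering d. The node names and volumes are invented, `gravity_demands` is a hypothetical helper, and dropping self-pairs is an extra simplification; the talk does not say which variant its simulations use.

```python
def gravity_demands(out_traffic, in_traffic):
    """Demand s -> d = out[s] * in[d] / sum(in), skipping s == d (assumed variant)."""
    total_in = sum(in_traffic.values())
    return {
        (s, d): out_traffic[s] * in_traffic[d] / total_in
        for s in out_traffic
        for d in in_traffic
        if s != d
    }


# Invented per-node totals, in arbitrary units.
out_traffic = {"A": 6.0, "B": 3.0, "C": 1.0}
in_traffic = {"A": 6.0, "B": 3.0, "C": 1.0}

demands = gravity_demands(out_traffic, in_traffic)
for pair, volume in sorted(demands.items()):
    print(pair, volume)
```

Pairs of large nodes exchange the most traffic under this model, which is why a few ingress-egress pairs typically dominate the demand matrix.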

Page 26:

TeXCP Balances Load Without Oscillations

TeXCP converges to within a few percent of optimal

[Figure: Maximum Link Utilization vs. Time (s)]

Page 27:

TeXCP Balances Load Without Oscillations

Utilizations of all links in the network change without oscillations

[Figure: Link Utilization vs. Time (s)]

Page 28:

Comparison with MATE

• MATE is the state-of-the-art in online TE

• All simulation parameters are from the MATE paper

[Figure: topology with Ingress 1 to 3, Egress 1 to 3, and links L1, L2, L3]

Page 29:

TeXCP balances load better than MATE

[Figure: two panels, TeXCP and MATE, each plotting Offered Link Load vs. Time (s), with the annotation "decrease in cross traffic"]

Avg. drop rate in MATE is 20% during convergence

Explicit feedback allows TeXCP to react faster and without oscillations

Page 30:

Comparison with OSPF-TE

• OSPF-TE is the most-studied offline TE scheme

• It computes link weights which, when used in OSPF, balance the load

• OSPF-TE-FAIL is an extension that optimizes for failures

• OSPF-TE-Multi-TM is an extension that optimizes for variations in traffic demands

Page 31:

Comparison with OSPF-TE under Static Load

[Figure: Ratio of Max-U to Opt. (axis 1 to 1.6) for OSPF-TE and TeXCP on Abovenet, Genuity, Sprint, Tiscali, and AT&T; reference level: Optimal]

Page 32:

Comparison with OSPF-TE under Static Load

[Figure: Ratio of Max-U to Opt. (axis 1 to 1.6) for OSPF-TE and TeXCP on Abovenet, Genuity, Sprint, Tiscali, and AT&T]

TeXCP is within a few percent of optimal, outperforming OSPF-TE

Page 33:

Comparison with OSPF-TE-Fail

[Figure: Ratio of Max-U to Opt. (axis 1 to 3) for OSPF-TE-Fail and TeXCP on Abovenet, Genuity, Sprint, Tiscali, and AT&T]

TeXCP allows an ISP to support the same failure resilience with about ½ the capacity!

Page 34:

Performance When Traffic Deviates From Long-term Averages

[Figure: Ratio of Max-U to Opt. (axis 1 to 1.9) vs. Deviation from Long-term Average Demands (axis 1 to 5), for OSPF-TE-Multi-TM and TeXCP]

TeXCP reacts better to real-time demands!

Page 35:

Conclusion

• TeXCP: Responsive & Stable Online TE

• Combines load balancing with a Congestion Management Layer to prevent overshoot and drops

• TeXCP keeps utilization always within a few percent of optimal

• Compared to MATE, it is faster and does not overshoot

• Compared to OSPF-TE, it keeps utilization 20% to 100% lower ⇒ it supports the same failure resilience with ½ the capacity ⇒ major savings for the ISP

http://nms.lcs.mit.edu/projects/texcp/