Update Damping in BGP Geoff Huston Chief Scientist, APNIC

Jun 27, 2020

Page 1:

Update Damping in BGP

Geoff Huston, Chief Scientist, APNIC

Page 2:

BGP Growth: Table Size

Page 3:

BGP Growth: Updates (05 – 06)

[Figure: BGP Prefix Updates per Day, January 2005 to September 2006. X axis: Date. Y axis: Prefixes / Day, in millions (0.0 to 0.8). Series: Total BGP Prefix Count, Withdrawn Prefix Count.]

Page 4:

Limits to Growth?

Are there practical limits to the size of the routed network?

Limits to routing database size?

Limits in routing update processing load?

Practical bounds for time to reach “converged” routing states?

Page 5:

Current Understandings

The protocol message peak rate is increasing faster than the number of routed entries

BGP is a “chatty” protocol

Dense interconnection implies higher levels of path exploration to stabilize on best available paths

Some concern that BGP has practical limits in terms of size and convergence times within the bounds of currently deployed routing machinery

Some further concern that these limits may be reached in the near-term future

Page 6:

Profiling BGP Load

Use a BGP monitor connected to DFZ update feeds

Quagga

Log all updates

Process logs and generate daily profile

http://bgpupdates.potaroo.net
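
A minimal sketch of the "process logs and generate daily profile" step, assuming each logged update has already been parsed into a (timestamp, prefix, origin AS) tuple. The record layout, the function name daily_profile and the sample records are hypothetical; the Quagga log format and the actual processing behind the site above are not described in these slides.

```python
from collections import Counter
from datetime import datetime, timezone

def daily_profile(updates):
    """Count updates per (day, prefix) and per (day, origin AS).

    `updates` is an iterable of (unix_timestamp, prefix, origin_as) tuples.
    This layout is an assumption made for illustration.
    """
    by_prefix = Counter()
    by_origin = Counter()
    for ts, prefix, origin_as in updates:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        by_prefix[(day, prefix)] += 1
        by_origin[(day, origin_as)] += 1
    return by_prefix, by_origin

# Made-up sample records, not data from the talk.
sample = [
    (0, "192.0.2.0/24", 64500),
    (60, "192.0.2.0/24", 64500),
    (120, "198.51.100.0/24", 64501),
    (86400, "192.0.2.0/24", 64500),
]
by_prefix, by_origin = daily_profile(sample)
```

Sorting either counter with most_common() gives the per-prefix and per-origin-AS rankings that the following slides plot.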

Page 7:

Update Distribution by Prefix

Page 8:

Update Distribution by Origin AS

Page 9:

Previous Analysis of BGP Update Profile

Update load profile and convergence times do not appear to be precisely aligned to routing table size

The BGP load profile is heavily skewed, with a small number of route objects, and a small number of origin ASes, contributing a disproportionate amount to the routing update load

Background load appears to be heavily related to close-to-collector routing events that affect large numbers of routed objects

Intense load appears to be related to close-to-origin routing events that affect small numbers of routed objects with each event

As the network grows the highly active component of route load does not appear to grow proportionally

Page 10:

What’s the cause here?

[Figure: BGP Updates recorded at AS2.0, June 28 – July 12. Label: AS21452.]

Page 11:

What’s the cause here?

[Figure: BGP Updates recorded at AS2.0, June 28 – July 12. Label: AS21452.]

This daily cycle of updates with a weekend profile is a characteristic signature of the origin AS performing some form of load-based routing

Page 12:

Poor Traffic Engineering?

An increasing trend to “multi-home” an AS with multiple transit providers

Spread traffic across the multiple transit paths by selectively altering advertisements

The use of load monitors and BGP control systems to automate the process

Poor tuning (or no tuning!) of the automated traffic engineering process produces extremely unstable BGP outcomes!

[Figure: diagram showing AS1, AS2 and AS3]

Page 13:

BGP Update Load Profile

It appears that the majority of the BGP load is caused by a very small number of unstable origination configurations, possibly driven by automated systems with limited or no feedback control

This problem is getting larger over time

The related protocol update load consumes routing resources, but does not change the base information state – it generally oscillates across a small set of states that do not imply local forwarding change

Page 14:

Mitigating BGP Update Loads

Current set of deployed “tools” to mitigate BGP update overheads:

1. Minimum Route Advertisement Interval Timer (MRAI)

2. Withdrawal MRAI Timer

3. Route Flap Damping

4. Output Queue Compression

Page 15:

1. MRAI Timer

Optional timer in BGP

ON in Ciscos (30 seconds)

OFF in Junipers (0 seconds)

Suppress the advertisement of successive updates to a peer for a given prefix until the timer expires

Commonly implemented as: suppress ALL updates to a peer until a per-peer MRAI timer expires
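
The per-prefix behaviour described on this slide can be sketched roughly as follows. This is an illustrative model for one peer, not any vendor's implementation: the class name MraiQueue, its methods and the choice to keep only the newest held update are assumptions.

```python
class MraiQueue:
    """Per-prefix MRAI for one peer: at most one advertisement per prefix per interval."""

    def __init__(self, interval=30.0):
        self.interval = interval
        self.last_sent = {}  # prefix -> time of the last advertisement sent
        self.pending = {}    # prefix -> most recent held-back update

    def submit(self, now, prefix, update):
        """Return the update if it may be sent now, else hold it and return None."""
        last = self.last_sent.get(prefix)
        if last is None or now - last >= self.interval:
            self.last_sent[prefix] = now
            self.pending.pop(prefix, None)
            return update
        # Timer still running: remember only the newest update for this prefix.
        self.pending[prefix] = update
        return None

    def flush(self, now):
        """Release held updates whose timer has expired."""
        released = []
        for prefix in list(self.pending):
            if now - self.last_sent[prefix] >= self.interval:
                released.append((prefix, self.pending.pop(prefix)))
                self.last_sent[prefix] = now
        return released
```

The per-peer variant mentioned on the slide would keep a single timer for the whole peer instead of one entry per prefix.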

Page 16:

2. Withdrawal MRAI Timer

Variant on MRAI where withdrawals are also time limited in the same way as updates

Page 17:

3. Route Flap Damping

RFD attempts to apply a heuristic to identify noisy prefixes and apply a longer term suppression to update propagation

Uses the concept of a “penalty” score applied to a prefix learned from a peer

Each update and withdrawal adds to the score

The score decays exponentially over time

If the score exceeds a suppress threshold the route is damped

Damping remains in place until the score drops below the release threshold

Damping is applied to the adj-rib-in
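
The penalty mechanism above can be sketched as a small state machine kept per (peer, prefix). The numeric defaults below (penalty per event, thresholds, half-life) are illustrative assumptions and are not taken from the slides; the class and method names are hypothetical.

```python
class FlapDamper:
    """Penalty score with exponential decay for one prefix learned from one peer."""

    def __init__(self, penalty_per_event=1000.0, suppress=2000.0,
                 release=750.0, half_life=900.0):
        # All four values are illustrative assumptions.
        self.penalty_per_event = penalty_per_event
        self.suppress = suppress
        self.release = release
        self.half_life = half_life  # seconds for the score to halve
        self.penalty = 0.0
        self.updated_at = 0.0
        self.damped = False

    def _decay(self, now):
        elapsed = now - self.updated_at
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.updated_at = now

    def record_event(self, now):
        """Call for each update or withdrawal; returns True if the route is damped."""
        self._decay(now)
        self.penalty += self.penalty_per_event
        if self.penalty > self.suppress:
            self.damped = True
        return self.damped

    def is_damped(self, now):
        """Damping stays in place until the score decays below the release threshold."""
        self._decay(now)
        if self.damped and self.penalty < self.release:
            self.damped = False
        return self.damped
```

With these assumed values, three events in quick succession push the score over the suppress threshold, and the route stays damped for more than one half-life afterwards.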

Page 18:

4. Output Queue Compression

BGP is a rate-throttled protocol (due to TCP transport)

A process-loaded BGP peer applies back pressure to the ‘other’ side of the BGP session by closing the advertised TCP receive window

The local BGP process may then perform queue compression on the output queue for that peer, removing queued updates that refer to the same prefix

[Figure labels: "Apply queue compression when this queue forms" (sender's output queue); "Close TCP window when this queue forms" (receiver's input queue)]
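
A minimal sketch of the compression step, assuming the output queue is keyed by prefix so that a newly queued update replaces any older queued update for the same prefix. The class name and the decision to move the replaced entry to the tail of the queue are assumptions for illustration.

```python
from collections import OrderedDict

class CompressingQueue:
    """Output queue that keeps only the newest queued update for each prefix."""

    def __init__(self):
        self._queue = OrderedDict()  # prefix -> latest update, in queueing order

    def enqueue(self, prefix, update):
        # Drop any older queued update for this prefix, then queue the new one at the tail.
        self._queue.pop(prefix, None)
        self._queue[prefix] = update

    def drain(self, max_items):
        """Send up to max_items updates once the peer reopens its TCP window."""
        sent = []
        while self._queue and len(sent) < max_items:
            sent.append(self._queue.popitem(last=False))
        return sent

    def __len__(self):
        return len(self._queue)
```

Compression only has an effect while updates are actually waiting in the queue, which is why it depends on the peer applying back pressure.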

Page 19:

Some Observations

RFD – long term suppression

Route Flap Damping extends convergence times by hours with no real offsetting benefit

MRAI – short term suppression

MRAI variations in the network make path exploration noisier

Even with piecemeal MRAI deployment we still have a significant routing load attributable to Path Exploration

Output Queue Compression

Rarely triggered in today’s network!

Page 20:

BGP Update Types

Code  Description

Announced-to-Announced Updates
AA+   Announcement of an already announced prefix with a longer AS Path (update to longer path)
AA-   Announcement of an announced prefix with a shorter AS Path (update to shorter path)
AA0   Announcement of an announced prefix with a different path of the same length (update to a different AS Path of same length)
AA*   Announcement of an announced prefix with the same path but different attributes (update of attributes)
AA    Announcement of an announced prefix with no change in path or attributes (possible BGP error or data collection error)

Withdrawn-to-Announced Updates
WA+   Announcement of a withdrawn prefix, with longer AS Path
WA-   Announcement of a withdrawn prefix, with shorter AS Path
WA0   Announcement of a withdrawn prefix, with different AS Path of the same length
WA*   Announcement of a withdrawn prefix with the same AS Path, but different attributes
WA    Announcement of a withdrawn prefix with the same AS Path and same attributes

Announced-to-Withdrawn
AW    Withdrawal of an announced prefix

Withdrawn-to-Withdrawn
WW    Withdrawal of a withdrawn prefix (possible BGP error or a data collection error)
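
The classification in this table can be sketched as a function of the prefix's current state and the incoming update. The function name, argument layout and the representation of an announcement as an (AS path, attributes) pair are assumptions for illustration; the first-ever announcement of a prefix is not covered by the table and is rejected here.

```python
def classify(withdrawn, last, new):
    """Return the update code for one prefix.

    withdrawn: True if the prefix is currently withdrawn.
    last: (as_path, attrs) of the most recent announcement seen for the prefix.
    new: (as_path, attrs) for an announcement, or None for a withdrawal.
    as_path is a tuple of AS numbers; attrs is any comparable value.
    """
    if new is None:
        return "WW" if withdrawn else "AW"
    if last is None:
        raise ValueError("no earlier announcement to compare against")
    state = "WA" if withdrawn else "AA"
    (old_path, old_attrs), (new_path, new_attrs) = last, new
    if len(new_path) > len(old_path):
        suffix = "+"   # longer AS path
    elif len(new_path) < len(old_path):
        suffix = "-"   # shorter AS path
    elif new_path != old_path:
        suffix = "0"   # different path, same length
    elif new_attrs != old_attrs:
        suffix = "*"   # same path, different attributes
    else:
        suffix = ""    # nothing changed
    return state + suffix
```

For example, an announced prefix whose path grows from (64500, 64501) to (64500, 64502, 64501) is classified as AA+.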

Page 21:

April 2007 BGP Update Profile

Code  Count

AA+   607,093
AA-   555,609
AA0   594,029
AA*   782,404
AA    195,707
WA+   238,141
WA-   190,328
WA0   51,780
WA*   30,797
WA    77,440
AW    627,538
WW    0

BGP Path Exploration?

Totals of each type of prefix updates, using a recording of all BGP updates as heard by AS2.0 for the month of April 2007
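
The relative proportions plotted on the next slide follow directly from these counts. The short calculation below uses the counts exactly as listed; treating AA+, AA0 and AA as the path exploration candidates is an assumption taken from the later Path Exploration Damping slides, since this slide does not say which codes its annotation covers.

```python
# Counts as listed for April 2007.
counts = {
    "AA+": 607_093, "AA-": 555_609, "AA0": 594_029, "AA*": 782_404,
    "AA": 195_707, "WA+": 238_141, "WA-": 190_328, "WA0": 51_780,
    "WA*": 30_797, "WA": 77_440, "AW": 627_538, "WW": 0,
}

total = sum(counts.values())

# Percentage share of each update type.
share = {code: 100.0 * n / total for code, n in counts.items()}

# Assumed candidate set: the codes named on the PED slides.
candidates = sum(counts[c] for c in ("AA+", "AA0", "AA"))
candidate_share = 100.0 * candidates / total

for code, pct in share.items():
    print(f"{code:4s} {pct:5.1f}%")
print(f"candidates {candidate_share:5.1f}%")
```

The largest single share is AA*, at just under 20 percent, which is consistent with the 0% to 20% axis of the chart that follows.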

Page 22:

BGP Update Profile

[Figure: Relative proportion of BGP Prefix Update Types. X axis: update type (AA+, AA-, AA0, AA*, AA, WA+, WA-, WA0, WA*, WA, AW, WW). Y axis: share of updates (0% to 20%). Annotation: Path Exploration Candidates.]

Page 23:

Time Distribution of Updates

[Figure: Time Distribution of Updates (Hours). X axis: Time Interval between updates of the same prefix (hours), 0 to 720. Y axis: Update Count (log), 1 to 10,000,000. Annotation: 24 hour cycles?]

Elapsed time between received updates for the same prefix - hours
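
A distribution like this one can be built by recording, for every update, the time since the previous update for the same prefix and bucketing the result. The sketch below assumes updates are available as (timestamp in seconds, prefix) pairs; that layout and the function name are hypothetical.

```python
from collections import Counter

def interval_histogram(updates, bucket_seconds):
    """Histogram of elapsed time between successive updates of the same prefix.

    updates: iterable of (timestamp_seconds, prefix) pairs (assumed layout).
    Returns a Counter mapping bucket index -> number of intervals in that bucket.
    """
    last_seen = {}
    hist = Counter()
    for ts, prefix in sorted(updates):
        if prefix in last_seen:
            hist[int((ts - last_seen[prefix]) // bucket_seconds)] += 1
        last_seen[prefix] = ts
    return hist

# Made-up sample, bucketed by hour.
sample = [(0, "a"), (30, "a"), (3600, "a"), (10, "b"), (7300, "b")]
hourly = interval_histogram(sample, bucket_seconds=3600)
```

Passing bucket_seconds=60 or bucket_seconds=1 gives the minute and second views of the same data.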

Page 24:

Time Distribution of Updates

[Figure: Time Distribution of Updates (Minutes). X axis: Time Interval between updates of the same prefix (minutes), 0 to 240. Y axis: Update Count (log), 1 to 10,000,000. Annotation: Route Flap Damping?]

Elapsed time between received updates for the same prefix - minutes

Page 25:

Time Distribution of Updates

[Figure: Time Distribution of Updates (Seconds). X axis: Time Interval between updates of the same prefix (seconds), 0 to 240. Y axis: Update Count (log), 1 to 1,000,000. Annotation: MRAI Timer.]

Elapsed time between received updates for the same prefix - seconds

Page 26:

Update Sequence Length Distribution

[Figure: Update Sequences (using 35 second interval timer). X axis: Sequence Length (updates), 0 to 50. Y axis: Log of Number of sequences, 1 to 10,000,000.]

A “sequence” is a set of updates for the same prefix that are separated by an interval <= the sequence timer (35 seconds)
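
The sequence definition above can be sketched as a single pass over one prefix's update times: an update joins the current sequence when it arrives within the timer of the previous update, otherwise it starts a new sequence. The function name and the use of plain timestamps in seconds are assumptions.

```python
def sequence_lengths(timestamps, timer=35.0):
    """Split one prefix's update times into sequences and return their lengths."""
    lengths = []
    previous = None
    for ts in sorted(timestamps):
        if previous is not None and ts - previous <= timer:
            lengths[-1] += 1      # within the timer: extend the current sequence
        else:
            lengths.append(1)     # gap too long (or first update): new sequence
        previous = ts
    return lengths

# Made-up update times, in seconds, for a single prefix.
example = sequence_lengths([0, 30, 60, 200, 235, 500])
```

Counting how often each length occurs across all prefixes gives the distribution plotted on this slide.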

Page 27:

Path Exploration Damping (PED)

A prevalent form of path hunting is the update sequence of increasing AS path length, followed by a withdrawal, all closely coupled in time

{AA+, AA0, AA} *, AW

The AA+, AA0 and AA updates are intermediate noise updates in this case, representing transient routing states

Can these updates be locally suppressed for a short interval to see if they are part of a BGP Path Exploration activity?

The suppression would hold the update in the local output queue either until a fixed time interval expires (in which case the update is released) or until it is replaced by a subsequent queued update (or withdrawal) for the same prefix

Page 28:

Path Exploration Damping

Apply a 35 second MRAI timer to AA+, AA0 and AA updates queued to eBGP peers

No MRAI timer is applied to any other updates or to withdrawals

35 seconds is used to compensate for MRAI-filtered update sequences that use a 30 second interval
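
A rough sketch of the rule on this slide, for one eBGP peer: AA+, AA0 and AA updates are held for 35 seconds, and any other update or a withdrawal for the same prefix is sent at once and discards whatever was being held. The class name, method names and the decision not to restart the timer when a held update is replaced are assumptions; the slides do not specify these details.

```python
HELD_CODES = {"AA+", "AA0", "AA"}

class PathExplorationDamper:
    """Hold likely path-exploration updates for one eBGP peer."""

    def __init__(self, hold=35.0):
        self.hold = hold
        self.held = {}  # prefix -> (release_time, update)

    def submit(self, now, prefix, code, update):
        """Return the list of updates to send to the peer right now."""
        if code in HELD_CODES:
            # Keep the original release time if something is already held
            # (assumption: the timer is not restarted by a replacement).
            release_at = self.held[prefix][0] if prefix in self.held else now + self.hold
            self.held[prefix] = (release_at, update)
            return []
        # Any other update, or a withdrawal, supersedes the held update.
        self.held.pop(prefix, None)
        return [update]

    def tick(self, now):
        """Release held updates whose hold interval has expired."""
        due = [p for p, (release_at, _) in self.held.items() if release_at <= now]
        return [self.held.pop(p)[1] for p in due]
```

In the {AA+, AA0, AA} *, AW pattern from the previous slide, only the final withdrawal would reach the peer if the whole sequence completes within the hold interval.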

Page 29:

PED Results on BGP data

[Figure: BGP Update Damping - average damped updates per second. X axis: Hour, 1 to 697. Y axis: Damped Updates / second, 0 to 10.]

Path Exploration Damping applied to BGP updates recorded at AS2.0, June 28 – July 12

Page 30:

PED Results on BGP data

[Figure: BGP Update Damping - peak damped updates per second. X axis: Hour, 1 to 697. Y axis: Peak damped updates / second, 0 to 1000.]

Path Exploration Damping applied to BGP updates recorded at AS2.0, June 28 – July 12

Page 31:

PED Results on BGP data

21% of all updates in the collection period would have been eliminated by Path Exploration Damping

Average update rate for the month would fall from 1.60 prefix updates per second to 1.22 prefix updates per second

Average peak update rates fall from 355 to 290 updates per second
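
For a sense of scale, the quoted rate figures can be turned into relative reductions with a simple ratio. This is arithmetic on the numbers as quoted on this slide, not an additional result; the helper name is hypothetical.

```python
def relative_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before

# Figures as quoted on the slide.
average_drop = relative_reduction(1.60, 1.22)  # prefix updates per second
peak_drop = relative_reduction(355, 290)       # peak updates per second

print(f"average rate: {average_drop:.1f}% lower")
print(f"peak rate:    {peak_drop:.1f}% lower")
```

The two ratios come out at roughly 24 percent and 18 percent, sitting either side of the 21 percent share of updates quoted above.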

Page 32:

Summary

Much of the update processing load in BGP is in processing non-informative intermediate states caused by BGP Path Exploration

Existing approaches to suppress this processing load appear to be too coarse to be very effective

Some significant leverage in further reducing BGP peak load rates can be obtained by applying a more selective algorithm to the MRAI approach in BGP, attempting to isolate Path Exploration updates by the use of local heuristics

Page 33:

Thank You

Questions?