LINX65 - Handling BGP Attribute Errors (Rob Shakir)

May 27, 2015

Rob Shakir

BGP Attribute Error discussion following AS4_PATH global table problems - video at http://rob.sh/files/linx65-presentation.mp4
Transcript
Page 1: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Handling BGP Attribute Errors
Rob Shakir (GX Networks)
[email protected] / RJS-RIPE

Monday, 18 May 2009

Page 2: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Outline / Motivation

• BGP Errors - Current Handling

• AS4_PATH Bug and Optional Transitives

• Update to RFC 4893

• IETF IDR Drafts

• Why you should care!

Page 3: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Attributes and Errors

• Types of BGP Attributes

• Well-known Mandatory

• Well-known Discretionary

• Optional Transitive

• Optional Non-Transitive

• RFC 4271

• “A NOTIFICATION message is sent when an error condition is detected. The BGP connection is closed immediately after it is sent.”
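
The four attribute categories above come from the Optional and Transitive bits of each attribute's flags octet (RFC 4271, section 4.3), plus the type code for the well-known mandatory case. A minimal sketch of that mapping follows; the function name `classify_attribute` and the constant names are illustrative and do not come from the slides.

```python
# Flag bits of the path attribute flags octet (RFC 4271, section 4.3).
FLAG_OPTIONAL = 0x80    # 0 = well-known, 1 = optional
FLAG_TRANSITIVE = 0x40  # 1 = transitive

# Type codes of the well-known mandatory attributes: ORIGIN, AS_PATH, NEXT_HOP.
WELL_KNOWN_MANDATORY = {1, 2, 3}


def classify_attribute(flags: int, type_code: int) -> str:
    """Return the attribute category named on the slide (illustrative helper)."""
    if flags & FLAG_OPTIONAL:
        if flags & FLAG_TRANSITIVE:
            return "optional transitive"
        return "optional non-transitive"
    # Well-known attributes are told apart by type code, not by flag bits.
    if type_code in WELL_KNOWN_MANDATORY:
        return "well-known mandatory"
    return "well-known discretionary"


print(classify_attribute(0x40, 2))   # AS_PATH
print(classify_attribute(0xC0, 7))   # AGGREGATOR
print(classify_attribute(0xC0, 17))  # AS4_PATH
```

Note that RFC 4271 as quoted above closes the session for an error in any of these categories; the classification only matters once errors are handled differently per category.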

Page 4: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Current Error Handling (1)

• AS_PATH Error (Well-known Mandatory)

• Worst case - loops and invalid routing.

[Diagram: AS65300 and AS65400 connected via eBGP. An invalid AS_PATH results in a NOTIFICATION and session teardown.]

Page 5: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Current Error Handling (2)

• AGGREGATOR Error (Optional Transitive)

• Worst case? Loss of routing metadata?

[Diagram: two routers in AS65300 connected via iBGP. An invalid AGGREGATOR results in a NOTIFICATION and session teardown.]

Page 6: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Problem?

• All errors are treated equally.

• Is this the right behaviour?

• “Good, we’re being cautious!”

• “Why is my AS suddenly disconnected from the global table?”

Page 7: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

AS4_PATH

• Defined in RFC 4893 (Optional Transitive)

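[Diagram: AS70000 (AS4 speaker) --eBGP-- AS65400 (non-AS4 speaker) --eBGP-- AS71000 (AS4 speaker)]

• At AS70000 (AS4 speaker): AS_PATH: i, AS4_PATH: not used

• At AS65400 (non-AS4 speaker): AS_PATH: 23456 i, AS4_PATH: 70000

• At AS71000 (AS4 speaker): AS_PATH: 65400 70000 i, AS4_PATH: not used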
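
The substitution of 23456 (AS_TRANS) towards the non-AS4 speaker and the later recovery of the real path can be sketched as below. This is a simplified illustration of the RFC 4893 behaviour: it treats both attributes as flat lists of ASNs and ignores AS_SET segments, and the function names are made up for this example.

```python
AS_TRANS = 23456  # reserved 2-byte ASN that stands in for any 4-byte ASN


def to_old_speaker(path):
    """Split a 4-byte AS path into (AS_PATH, AS4_PATH) for a non-AS4 peer."""
    as_path = [asn if asn <= 0xFFFF else AS_TRANS for asn in path]
    # AS4_PATH is only needed when some ASN does not fit in two bytes.
    as4_path = list(path) if any(asn > 0xFFFF for asn in path) else None
    return as_path, as4_path


def reconstruct(as_path, as4_path):
    """Rebuild the 4-byte path at an AS4 speaker from AS_PATH and AS4_PATH."""
    if as4_path is None or len(as_path) < len(as4_path):
        # No AS4_PATH, or it is longer than AS_PATH: use AS_PATH alone.
        return list(as_path)
    # Keep the leading ASNs added by non-AS4 speakers, then the AS4_PATH.
    lead = len(as_path) - len(as4_path)
    return list(as_path[:lead]) + list(as4_path)


# AS70000 originates the route and sends it to the non-AS4 speaker AS65400.
as_path, as4_path = to_old_speaker([70000])
# AS65400 prepends itself to AS_PATH and passes AS4_PATH on untouched.
as_path = [65400] + as_path
# AS71000 recovers the real path.
print(reconstruct(as_path, as4_path))
```

The key point for the rest of the talk is the middle step: the non-AS4 speaker forwards AS4_PATH without looking inside it.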

Page 8: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Neat! And Errors?

• Shouldn’t really see errors!

• Cleaned like AS_PATH

• Mixed NEW and OLD confederations

• “To prevent the possible propagation of confederation path segments outside of a confederation, the path segment types AS_CONFED_SEQUENCE and AS_CONFED_SET [RFC3065] are declared invalid for the AS4_PATH attribute” (RFC 4893)

Page 9: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Whoops!

• December 10th 2008

• 91.207.218.0/23

• AS4_PATH: (65044 65057) 196629 (7 bytes)

• AS_PATH: xx xx 35320 23456 (13 bytes)

• Confederation information in AS4_PATH

• The first RFC-compliant NEW speaker to see the UPDATE tears down the session on which it received the UPDATE.
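
The check that trips the first RFC-compliant NEW speaker can be sketched as a walk over the AS4_PATH segments looking for confederation segment types. The parser below assumes the usual segment layout (one-octet type, one-octet ASN count, then 4-octet ASNs) and segment type codes 3 and 4 for AS_CONFED_SEQUENCE and AS_CONFED_SET. The function names are illustrative, and the sample bytes are built here from the ASNs on the slide (treating the parenthesised pair as one confederation segment), not taken from a packet capture.

```python
import struct

AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4
CONFED_TYPES = {AS_CONFED_SEQUENCE, AS_CONFED_SET}


def parse_as4_path(data: bytes):
    """Decode an AS4_PATH value into a list of (segment_type, [asns])."""
    segments = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated segment header")
        seg_type, count = data[offset], data[offset + 1]
        offset += 2
        end = offset + 4 * count
        if end > len(data):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", data[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def has_confed_segments(segments) -> bool:
    """True if any segment type is invalid for AS4_PATH under RFC 4893."""
    return any(seg_type in CONFED_TYPES for seg_type, _ in segments)


# Illustrative encoding of "(65044 65057) 196629".
sample = (struct.pack("!BBII", AS_CONFED_SEQUENCE, 2, 65044, 65057)
          + struct.pack("!BBI", AS_SEQUENCE, 1, 196629))
segments = parse_as4_path(sample)
print(segments, has_confed_segments(segments))
```

Under RFC 4271 handling, a positive result here leads to a NOTIFICATION and teardown of the session the UPDATE arrived on.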

Page 10: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

What went wrong?

• ASN running mixed confeds with mixed OLD/NEW speakers and JunOS.

[Diagram: AS65301 (AS4 speaker) --eBGP-- AS65302 (non-AS4 speaker) --eBGP-- AS65303 (AS4 speaker)]

• AS65301: copies AS_CONFED_SET into AS4_PATH

• AS65302: transits AS4_PATH (not checked!)

• AS65303: invalid AS4_PATH received, sends NOTIFICATION and tears down the session

Page 11: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Why is this concerning?

[Diagram: AS35320 (AS running JunOS and confeds), then AS5413 (arbitrary ASN with an AS4-aware border), then eBGP to AS3356 (transit provider), which provides the global table.]

• The first RFC-compliant AS4 speaker in the path reacts.

• Teardown can be towards transit (likely taking every prefix on these sessions with it!)

• An UPDATE can be crafted to reach a target via a specific path.

Page 12: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Our Recommended Fix

• Recommended: Don’t send NOTIFICATION; treat the UPDATE as a withdrawal of the prefix via this path.

• “Punish” broken paths without breaking every prefix via a session.

• Prefix might become unreachable.
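
The difference between the two behaviours can be sketched with a toy per-session route table. Everything here (the `Session` class, `handle_update`, the `treat_as_withdraw` switch, the documentation prefix 192.0.2.0/24 and the placeholder attribute values) is hypothetical and only illustrates the idea on the slide, not any vendor's implementation.

```python
class Session:
    """Toy BGP session holding the routes learned from one peer."""

    def __init__(self, peer):
        self.peer = peer
        self.up = True
        self.routes = {}  # prefix -> attributes learned from this peer

    def teardown(self):
        """RFC 4271 behaviour: NOTIFICATION sent, every route is lost."""
        self.up = False
        self.routes.clear()


def handle_update(session, prefix, attrs, attrs_valid, treat_as_withdraw=True):
    """Apply one UPDATE; attrs_valid is the result of attribute checking."""
    if attrs_valid:
        session.routes[prefix] = attrs
    elif treat_as_withdraw:
        # Recommended fix: drop only this prefix via this path.
        session.routes.pop(prefix, None)
    else:
        # Current behaviour: lose every prefix on the session.
        session.teardown()


transit = Session("AS3356")
handle_update(transit, "192.0.2.0/24", {"origin": "i"}, attrs_valid=True)
handle_update(transit, "91.207.218.0/23", {"origin": "i"}, attrs_valid=True)
# A later UPDATE for one prefix carries a malformed attribute.
handle_update(transit, "91.207.218.0/23", {}, attrs_valid=False)
print(transit.up, sorted(transit.routes))
```

With the recommended handling the session stays up and only the affected prefix disappears, which is the "prefix might become unreachable" trade-off noted above.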

Page 13: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Likely RFC Fix

• draft-ietf-rfc4893bis

• Ignore the broken parts of the AS4_PATH.

• IOS implemented this: 12.0(32)S(Y8|13)

• Doesn’t lose reachability, and recovers from an error “in the wild”

• Some implications in loop detection?
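
One possible reading of "ignore the broken parts" is to discard the confederation segments and process whatever remains of the AS4_PATH. A minimal sketch under that assumption, with segments represented as `(segment_type, [asns])` tuples and a made-up function name:

```python
# Segment type codes for AS_CONFED_SEQUENCE and AS_CONFED_SET.
CONFED_TYPES = {3, 4}


def sanitise_as4_path(segments):
    """Drop confederation segments, keep the rest of the AS4_PATH."""
    return [(seg_type, asns) for seg_type, asns in segments
            if seg_type not in CONFED_TYPES]


# The AS4_PATH from the December 2008 incident, as (type, asns) tuples,
# treating the parenthesised pair as a single confederation segment.
received = [(3, [65044, 65057]), (2, [196629])]
print(sanitise_as4_path(received))
```

The session stays up and the prefix stays reachable, at the cost of path information being silently dropped, which is where the loop-detection question comes from.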

Page 14: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

AS_PATH and AS4_PATH

• Last LINX meeting - AS_PATH length problems.

• Different Case: Well Known Mandatory

• Highlights interesting point relating to AS4_PATH - loop detection for AS4?

• Bugs will always mean that invalid information is propagated.

Page 15: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

The General Case

• draft-ietf-rfc4893bis fixes this specific case, but what about others?

• Errors in other optional transitives still cause session teardown.

• Revise this behaviour? Don’t require NOTIFICATION be sent.

• Tell our neighbour that someone in their path did something wrong?

Page 16: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

draft-scudder-idr-optional-transitive

• Handles the case of Optional Transitives that are not formed or checked by our neighbour

• The Partial bit is set to 1 if some BGP speaker passes the attribute on without checking it.

• These are the “tunneled” UPDATEs

• Recommended behaviour: Treat as a withdraw of the prefix and log.
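
The decision described in the bullets above can be sketched as a check on the attribute flags. This follows the slide's summary rather than the draft text itself; the function name, return strings and logger name are invented for the example.

```python
import logging

FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20  # set when some speaker passed the attribute unchecked

log = logging.getLogger("bgp.sketch")


def action_for_malformed(flags: int, prefix: str) -> str:
    """Choose how to react to a malformed attribute with the given flags."""
    tunneled_mask = FLAG_OPTIONAL | FLAG_TRANSITIVE | FLAG_PARTIAL
    if (flags & tunneled_mask) == tunneled_mask:
        # Our neighbour neither created nor checked this attribute.
        log.warning("malformed tunneled attribute, withdrawing %s", prefix)
        return "treat-as-withdraw"
    # Anything else keeps the RFC 4271 behaviour.
    return "notification"


print(action_for_malformed(0xE0, "91.207.218.0/23"))  # optional transitive, Partial set
print(action_for_malformed(0xC0, "91.207.218.0/23"))  # optional transitive, Partial clear
```

The distinction matters because a Partial bit of 0 suggests the directly connected neighbour originated or validated the attribute, so the fault is likely theirs.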

Page 17: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

draft-scholl-idr-advisory

• New MP-BGP capability (ADVISORY): allows a string to be transmitted between two routers.

• NOT a replacement for NOTIFICATION

• Inform our neighbour that we’re considering an UPDATE as invalid.

• Not just error handling:

• “in-band” notification (e.g. maintenance)

Page 18: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

draft-nalawade-bgp-soft-notify

• There has been some opposition to ADVISORY

• Humans already have phone and e-mail!

• SOFT-NOTIFICATION was a previous suggestion (2003)

• Intended to allow for graceful recovery from an error.

• Structured payload (no IM via BGP!)

Page 19: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Implications of these Drafts

• Protocol-wise, this isn’t core functionality

• Vendors and protocol-purists not necessarily interested?

• Operationally, we need to be robust!

• Do we trust everyone in the global table?

• Easier direct communication of events and settings between operators.

• Capability (you can turn it off!)

Page 20: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Conclusions

• Blanket handling of BGP errors is suboptimal.

• Fix handling optional transitive errors (make the protocol more robust!)

• Add method to communicate these errors without tearing sessions down.

• Operators’ voices are really needed here!

Page 21: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Questions, Comments, Corrections?

Many thanks to:
Andy Davidson (NetSumo)
Jonathan Oddy (Hostway)
David Freedman (Claranet)
Will Hargrave (LONAP)
Greg Hankins (Force10)

Page 22: LINX65 - Handling BGP Attribute Errors (Rob Shakir)

Questions or comments: [email protected]

RJS-RIPE

Public comments? IETF IDR - [email protected]

(To subscribe: [email protected], in body: subscribe idr-post)
