Oct 18, 2015
5/28/2018 BRKRST-3320 - Troubleshooting BGP
2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3320 Cisco Public
Introduction: Housekeeping
Cell Phones
Who am I?
Who are you?
    Service Provider
    Enterprise
    Studying for CCIE
Advanced Class
    Assume BGP Operational Experience
        Basic configuration
        Show commands
        Understand BGP attributes
Introduction: Operating Systems
IOS vs. IOS-XR vs. NX-OS
    Troubleshooting concepts are the same
    Some variation in show command syntax and output
    Will use all three in this presentation
Introduction: Agenda
    Generic Troubleshooting Advice
    Troubleshooting Peers
    Bestpath Algorithm
    Table Version
    Initial Convergence
    Periodic Convergence
    High Utilization
    Layer 3 VPNs
    Looking Glasses
Generic Troubleshooting Advice
Narrow down the problem
    Can you reproduce it?
    Which device(s) are the cause of the problem?
    Reduce your configs
Troubleshoot one thing at a time
    100k routes flapping? Pick one route and focus on that one route
Have a co-worker take a look
    Forces you to talk through the problem
    Different set of eyes may spot something
Sniffer capture, sniffer capture, sniffer capture
Generic Troubleshooting Advice: Syslogs
Use NTP to sync timestamps on your routers
    clock timezone EST -5 0
    clock summer-time EDT recurring
    ntp server x.x.x.x
Use a syslog server
    logging monitor informational
    logging host x.x.x.x
    service timestamps log datetime msec localtime
Generic Troubleshooting Advice: Syslogs
Centralized, time-synced syslogs are a great troubleshooting ...
Generic Troubleshooting Advice: log-neighbor-changes
bgp log-neighbor-changes
    Generates a syslog message when a peer goes up or down
    Always configure this
    OSPF, ISIS, and EIGRP all have log-neighbor-changes too
Generic Troubleshooting Advice: Define Normal
"The CPU on this router is high"
    High compared to what?
    What is the CPU load normally at this time of day?
Things to keep track of
    CPU load
    Free memory
    Largest block of memory
    Input/output load for interfaces
    Rate of BGP bestpath changes
    Etc., etc.
Generic Troubleshooting Advice: Define Normal
Cacti is a handy tool for polling and graphing data from various network devices
    http://www.cacti.net/
Generic Troubleshooting Advice: Sniffer Captures
Use SPAN to get traffic to your sniffer
    monitor session 1 source interface Te2/4 rx
    monitor session 1 destination interface Te2/2
IOS-XR
    Only supported on ASR-9000
    Use ACLs to control what packets to SPAN
RSPAN
    RSPAN has all the features of SPAN, plus support for source ports and destination ports that are distributed across multiple switches, allowing one to monitor any destination port located on the RSPAN VLAN. Hence, one can monitor the traffic on one switch using a device on another switch.
Generic Troubleshooting Advice: Embedded Packet Capture
Ability to capture packets on the router
    Primarily for control-plane traffic
    Difficult to capture transit traffic on distributed platforms
    Is supported on some platforms
Very handy if a dedicated sniffer is not available
Available on IOS and NX-OS
Generic Troubleshooting Advice: IOS Embedded Packet Capture
Create a buffer
    monitor capture buffer buf1 size 512 max-size 512 circular
Define which interface and direction to capture
    monitor capture point ip cef dwalton-cap gig 0/0 in
Associate the buffer with the capture
    monitor capture point associate dwalton-cap buf1
Start/Stop the capture
    monitor capture point start dwalton-cap
    monitor capture point stop dwalton-cap
Export the capture to a .pcap file
    monitor capture buffer buf1 export tftp://172.26.2.254/buf1.
Generic Troubleshooting Advice: Wireshark
You probably know this already, but
    Wireshark is your best friend
    It is free
    You can get it here
        http://www.wireshark.org/
Generic Troubleshooting Advice: Wireshark
Can do complex filters
    ANDs, ORs, ()s, etc.
    If the filter is red, your syntax is busted
    If the filter is green, your syntax is correct
Generic Troubleshooting Advice: Wireshark
Wireshark does a LOT
    Enough for someone to write an 800-page book on how to use it
    ISBN-13: 978-18939399
Generic Troubleshooting Advice: Debugs
Send output to the logging buffer, not the console
    logging buffered
    no logging console
Use millisecond timestamps
    service timestamps debug datetime msec localtime
    service timestamps log datetime msec localtime
Use ACLs to limit output
    brain1(config)#access-list 100 permit ip host 1.1.1.1 host
    brain1#debug ip packet 100
    IP packet debugging is on for access list 100
    brain1#
If you need to enable a very chatty debug
    reload in 10
    Run your debug
    reload cancel
Generic Troubleshooting Advice: Event Tracing
Collects event information for various protocols
Runs in the background
Events are stored in memory
    Debug output is not generated
    Syslogs are not generated
    Finite number of most recent events are stored
Use show commands later to
    Display an event in a debug-like format
    Merge events from various protocols
Easier on the box than debugs
Generic Troubleshooting Advice: Event Tracing
    brain1(config)#monitor event-trace ?
      adjacency   Adjacency Events
      all-traces  Configure merged event traces
      atom        AToM Event Trace
      cef         CEF traces
      [snip]
    brain1(config)#monitor event-trace adjacency enable
    brain1(config)#end
Generic Troubleshooting Advice: Event Tracing
    brain1#show monitor event-trace adjacency all
    Feb 14 17:15:48.270: GLOBAL: adj mgr notified of fibidb state change int FastEthernet0/0 to
    Feb 14 17:15:50.958: GLOBAL: adj mgr notified of fibidb state change int FastEthernet0/0 to
    Feb 14 17:15:51.682: GLOBAL: adj ipv4 bundle changed to IPv4 no fixup adj oce [OK]
    Feb 14 17:15:51.682: ADJ: IP 172.26.38.1 FastEthernet0/0/0: update oce bundle, IPv4 incompl [OK]
    Feb 14 17:15:51.682: ADJ: IP 172.26.38.1 FastEthernet0/0/0: allocate [OK]
    Feb 14 17:15:51.686: ADJ: IP 172.26.38.1 FastEthernet0/0/0: request resolution [OK]
    Feb 14 17:15:51.734: ADJ: IP 172.26.38.1 FastEthernet0/0/0: request to add ARP [OK]
    Feb 14 17:15:51.734: ADJ: IP 172.26.38.1 FastEthernet0/0/0: allocate [Ignr]
    Feb 14 17:15:51.734: ADJ: IP 172.26.38.1 FastEthernet0/0/0: add source ARP [OK]
    Feb 14 17:15:51.734: ADJ: IP 172.26.38.1 FastEthernet0/0/0: request to update [OK]
    Feb 14 17:15:51.734: ADJ: IP 172.26.38.1 FastEthernet0/0/0: update oce bundle, IPv4 no fixu
    Feb 14 17:15:51.734: ADJ: IP 172.26.38.1 FastEthernet0/0/0: update [OK]
    brain1#
Generic Troubleshooting Advice: Out Of Band Access
Don't be the person who has to ... hours to console into a box
If you don't have out-of-band access for every router and/or switch in your network... get it... please
Troubleshooting Peers
Failed Peering: Configurations
R1
    interface Loop0
     ip address 1.1.1.1/32
    !
    router bgp 100
     neighbor 2.2.2.2 remote-as ...
     neighbor 2.2.2.2 update-source ...
R2
    interface Loop0
     ip address 2.2.2.2/32
    !
    router bgp 100
     neighbor 1.1.1.1 remote-as ...
     neighbor 1.1.1.1 update-source ...
R1#sh tcp brief all
    TCB       Local Address   Foreign Address   (state)
    64328548  *.179           2.2.2.2.*         LISTEN
    R1#
Check
    AS Numbers
    IP addresses for TCP
    eBGP Multihop?
Failed Peering: Connectivity
R1
    interface Loop0
     ip address 1.1.1.1/32
    !
    router bgp 100
     neighbor 2.2.2.2 remote-as ...
     neighbor 2.2.2.2 update-source ...
R2
    interface Loop0
     ip address 2.2.2.2/32
    !
    router bgp 100
     neighbor 1.1.1.1 remote-as ...
     neighbor 1.1.1.1 update-source ...
Check
    Extended ping between BGP peering addresses
    R1#ping 2.2.2.2 source Loop0
    Sending 5, 100-byte ICMP Echos to 2.2.2.2
    Packet sent with a source address of 1.1.1.1
    .....
    Success rate is 0 percent (0/5)
    R1#
Failed Peering: Connectivity
BGP runs on top of IP and can be affected by many things
No connectivity?
    IGP issues
    Access lists
    TCP problems
Peers come up but flap, are s...
    MTU issues: extended ping an... address ranges, DF bit, etc.
    Rate limiting
    Traffic shaping
Debugs may be needed
Failed Peering: Notifications
BGP NOTIFICATIONs consist of an error code, subcode, and data
    All error codes and subcodes can be found here
        http://www.iana.org/assignments/bgp-parameters/bgp-parameters.xml
        http://tinyurl.com/bgp-notification-codes
    Data portion may contain what triggered the notification
        Example: corrupt part of the UPDATE
Pay attention to who sent vs. received the NOTIFICATION
    If Router X sent the NOTIFICATION, it means he noticed the issue
    Does not mean Router X is the cause of the issue
Failed Peering: Notifications
    %BGP-3-NOTIFICATION: sent to neighbor 2.2.2.2 2/2 (peer in wrong
    2 bytes 00C8 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 00C
    00B4 0202 0202 1002 0601 0400 0100 0102 0280 0002 0202 00
The first 2 in 2/2 is the Error Code... so OPEN Message Error
    Value  Name                        Reference
    1      Message Header Error        RFC 4271
    2      OPEN Message Error          RFC 4271
    3      UPDATE Message Error        RFC 4271
    4      Hold Timer Expired          RFC 4271
    5      Finite State Machine Error  RFC 4271
    6      Cease                       RFC 4271
Failed Peering: Notifications
    Subcode #  Subcode Name                    Subcode Description
    1          Unsupported BGP version         The version of BGP the peer is run... with the local version of BGP
    2          Bad Peer AS                     The AS this peer is locally configur... the AS the peer is advertising
    3          Bad BGP Identifier              The BGP router ID is the same as ... ID
    4          Unsupported Optional Parameter  There is an option in the packet wh... speaker doesn't recognize
    6          Unacceptable Hold Time          The remote BGP peer has request... which is not allowed (too low)
    7          Unsupported Capability          The peer has asked for support for ... local router does not support
OPEN Message Subcodes shown above
The second 2 in 2/2 is the Error Subcode... so Bad Peer AS
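As a rough illustration of reading the code/subcode pair in a NOTIFICATION log line, the sketch below maps a string such as 2/2 to the names in the error code and OPEN Message subcode tables. The function name decode_notification is hypothetical, and only the OPEN Message subcodes listed on the slide are included; anything else is printed numerically.

```python
# Error codes and OPEN Message subcodes as listed in the slide tables (RFC 4271).
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

OPEN_SUBCODES = {
    1: "Unsupported BGP version",
    2: "Bad Peer AS",
    3: "Bad BGP Identifier",
    4: "Unsupported Optional Parameter",
    6: "Unacceptable Hold Time",
    7: "Unsupported Capability",
}


def decode_notification(code_subcode: str) -> str:
    """Translate a 'code/subcode' string (hypothetical helper, not an IOS feature)."""
    code, subcode = (int(part) for part in code_subcode.split("/"))
    name = ERROR_CODES.get(code, f"Unknown error code {code}")
    # Only the OPEN Message subcodes are named here; others stay numeric.
    if code == 2 and subcode in OPEN_SUBCODES:
        return f"{name} / {OPEN_SUBCODES[subcode]}"
    return f"{name} / subcode {subcode}"


if __name__ == "__main__":
    print(decode_notification("2/2"))
    print(decode_notification("4/0"))
```

The same lookup applies to the 4/0 (hold time expired) NOTIFICATION that shows up later in the deck.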
Failed Peering: Notifications
Sniff of BGP Notification sent from R2 to R1
    R2#show log | include NOTIFICATION
    %BGP-3-NOTIFICATION: sent to neighbor 10.1.2.1 2/2 (peer in wrong AS)
    2 bytes 0064 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104
    0064 00B4 0101 0101 1002 0601 0400 0100 0102 0280 0002 0202 00
x0064 = data of NOTIFICATION
x0064 = decimal 100
[Diagram: R1 (10.1.2.1) and R2]
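The hex in that log line can also be unpacked by hand. The sketch below assumes, as the slide suggests, that the first 2 bytes are the NOTIFICATION data and the rest is the OPEN message that triggered it, laid out per RFC 4271 (16-byte marker, length, type, then version, My AS, hold time, BGP identifier, optional parameter length). The function name parse_notification_dump is hypothetical.

```python
import struct


def parse_notification_dump(hex_dump: str) -> dict:
    """Split the logged bytes into NOTIFICATION data and the OPEN header fields."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    data, msg = raw[:2], raw[2:]  # assumption: 2 data bytes, then the OPEN
    # BGP header: 16-byte marker, 2-byte length, 1-byte type
    _marker, length, msg_type = struct.unpack("!16sHB", msg[:19])
    # OPEN body: version, My AS, hold time, BGP identifier, opt param length
    version, my_as, hold_time, bgp_id, opt_len = struct.unpack("!BHH4sB", msg[19:29])
    return {
        "notification_data": int.from_bytes(data, "big"),
        "length": length,
        "type": msg_type,  # 1 = OPEN
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": ".".join(str(octet) for octet in bgp_id),
        "opt_param_len": opt_len,
    }


# Hex copied from the R2 log line on the slide.
DUMP = (
    "0064 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 "
    "0064 00B4 0101 0101 1002 0601 0400 0100 0102 0280 0002 0202 00"
)

if __name__ == "__main__":
    for field, value in parse_notification_dump(DUMP).items():
        print(f"{field}: {value}")
```

Read this way, the OPEN carries AS 100 with router ID 1.1.1.1, which matches the x0064 = decimal 100 note.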
Failed Peering: Notifications
Question: What did R1 see?
    R1#sh log | include NOTIFICATION
    %BGP-3-NOTIFICATION: received from neighbor 10.1.2.2 2/2 (peer in wrong AS
R1 (10.1.2.1)
    router bgp 100
     no synchronization
     bgp log-neighbor-changes
     neighbor 10.1.2.2 remote-as 200
     no auto-summary
R2 (10.1.2.2)
    router bgp 200
     no synchronization
     bgp log-neighbor-changes
     neighbor 10.1.2.1 remote-as 10
     no auto-summary
Failed Peering: Decoding Hex
What if a peer sends you a message that causes us to send a NOTIFICATION?
    Corrupt UPDATE
    Bad OPEN message, etc.
View the message that triggered the NOTIFICATION
    show ip bgp neighbor 1.1.1.1 | begin Last reset
    Last reset 5d12h, due to BGP Notification sent, invalid or corrupt
    Message received that caused BGP to send a Notification:
    FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    005C0200 00004140 01010040 0206065D
    1CFC059F 400304D5 8C20F480 04040000
    05054005 04000000 55C0081C 329C4844
    329C6E28 329C6E29 58F50082 58F5EACE
    58F5FA02 58F5FA6E 18D14E70
Failed Peering: Decoding Hex
You don't like reading hex?
    Nice write-up here on converting hex output to Wireshark .pcap
        http://ccie-in-3-months.blogspot.com/2010/08/decoding-ripe-expe
        http://tinyurl.com/bgp-hex-decode
    In a nutshell, put the hex dump in this format
Failed Peering: Decoding Hex
Now use Wireshark's text2pcap.exe to add the needed headers
Open bgp_message.pcap with Wireshark
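The exact layout shown on the slide is not preserved here, so the following is only a sketch of one way to prepare the input. It assumes the offset-prefixed hex dump style (as produced by od -Ax -tx1) that text2pcap documents as its input format. The helper name to_text2pcap_input is hypothetical.

```python
def to_text2pcap_input(hex_dump: str, width: int = 16) -> str:
    """Reformat a router hex dump as offset-prefixed lines of space-separated bytes."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    lines = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        # Six-digit hex offset, then each byte as two hex digits
        lines.append(f"{offset:06x} " + " ".join(f"{byte:02x}" for byte in chunk))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # First rows of the "Message received that caused BGP to send a Notification" dump.
    sample = "FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 005C0200 00004140 01010040 0206065D"
    print(to_text2pcap_input(sample), end="")
```

The resulting text file would then be passed to text2pcap. Its -T option (dummy TCP header with chosen source and destination ports) is one way to add the headers, but check text2pcap -h on your version before relying on it.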
Troubleshooting Peers: eBGP TTL
BGP uses a TTL of 1 for eBGP peers
    Also verifies if NEXTHOP is directly connected
For eBGP peers that are more than 1 hop away, a larger TTL must be used
    No longer verifies if NEXTHOP is directly connected
    neighbor x.x.x.x ebgp-multihop [2-255]
[Diagram: R1 and R2]
Troubleshooting Peers: eBGP TTL
Loopback peering to directly connected eBGP peer
    Typically used to load-balance over multiple links
Two options for configuring this
Option #1: The old way
    Use ebgp-multihop
    Change the TTL to 2
    Disables the "is the NEXTHOP on a connected subnet" check
[Diagram: R1 and R2]
    R1#
    router bgp 100
     no synchronization
     bgp log-neighbor-changes
     neighbor 2.2.2.2 remote-as 200
     neighbor 2.2.2.2 ebgp-multihop 2
     neighbor 2.2.2.2 update-source Loopback0
     no auto-summary
Troubleshooting Peers: eBGP TTL
Option #2: The new way
    Use disable-connected-check
    Still uses a TTL of 1
    Disables the "is the NEXTHOP on a connected subnet" check
[Diagram: R1 and R2]
    R1#
    router bgp 100
     no synchronization
     bgp log-neighbor-changes
     neighbor 2.2.2.2 remote-as 200
     neighbor 2.2.2.2 disable-connected-check
     neighbor 2.2.2.2 update-source Loopback0
     no auto-summary
Failed Peering: Notifications, Hold Time Expired
Assume R1 sends "hold time expired" NOTIFICATION to R2
    R1 did not receive a KA from R2 for holdtime seconds
One of two issues
    R2 is not generating keepalives
    R2 is generating keepalives but R1 is not receiving them
First figure out if R2 is building keepalives
    Is R2 out of memory or CPU?
    Output drops on the outbound interface towards R1?
    When did R2 last build a keepalive?
        R2#show ip bgp neighbors 1.1.1.1
        Last read 00:00:15, last write 00:00:44, hold time is 180
        keepalive interval is 60 seconds
    Is the TCP window open?
    show ip bgp summary
        Watch R2's MsgSent counter for R1... does it increment?
Failed Peering: Notifications, Hold Time Expired
Assuming R2 is sending keepalives, why isn't R1 receiving them?
    Input drops on R1
    Lost in transit?
Do R1 and R2 still have IP connectivity?
    Ping using peering addresses (loopback to loopback)
    Ping with mss (max-segment-size) with df-bit set
MSS: Max Segment Size
    536 bytes by default
    Path MTU Discovery finds smallest MTU between R1 and R2
    Subtract 40 bytes for TCP/IP overhead
Note the MSS and ping accordingly
    R1#sh ip bgp neighbors
    BGP neighbor is 2.2.2.2, remote AS 2, external link
    Datagrams (max data segment is 1460 bytes):
    R1#ping 2.2.2.2 source loop0 size 1500 df-bit
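The "subtract 40 bytes" arithmetic can be written down as a small sketch. It assumes a 20-byte IP header and a 20-byte TCP header with no options, and that the ping size is the full IP datagram size, as in the slide's example (MSS 1460, ping size 1500). Both function names are hypothetical.

```python
TCP_IP_OVERHEAD = 40  # assumed: 20-byte IP header + 20-byte TCP header, no options


def mss_from_mtu(path_mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given path MTU."""
    return path_mtu - TCP_IP_OVERHEAD


def ping_size_for_mss(mss: int) -> int:
    """Datagram size to ping with (DF bit set) to exercise a full-sized segment."""
    return mss + TCP_IP_OVERHEAD


if __name__ == "__main__":
    print(ping_size_for_mss(1460))  # MSS from the slide's show output
    print(ping_size_for_mss(536))   # the default MSS
```

If the ping at that size fails with the DF bit set while smaller pings succeed, the path cannot carry the full-sized segments the session has negotiated.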
Failed Peering: Notifications, Hold Time Expired
show ip bgp summary
    Watch R1's MsgRcvd counter for R2... it should be incrementing
When did R1 last receive keepalive?
    R1#show ip bgp neighbors 2.2.2.2
    Last read 00:00:95, last write 00:00:44, hold time is 1
    keepalive interval is 60 seconds
Speaker Flap: Case Study
[Diagram: R1 and R2, NOTIFICATION]
    %BGP-5-ADJCHANGE: neighbor 1.1.1.1 Down BGP Notification sent
    %BGP-3-NOTIFICATION: sent to neighbor 1.1.1.1 4/0 (hold time expired)
    R2#show ip bgp neighbor 1.1.1.1 | include last reset
    Last reset 00:01:02, due to BGP Notification sent, hold time expired
There are lots of possibilities here
    Is R1 having a problem sending keepalives?
        CPU at 100%?
        Out of memory?
    Are the keepalives being lost in the cloud?
    Is R2 having a problem receiving the keepalive?
Speaker Flap: Case Study
Did R1 build and transmit a keepalive for R2?
    debug ip bgp keepalive
    show ip bgp neighbor
When did we last send or receive data with the peer?
    R2#show ip bgp neighbors 1.1.1.1
    BGP neighbor is 1.1.1.1, remote AS 100, external link
    BGP version 4, remote router ID 1.1.1.1
    BGP state = Established, up for 00:12:49
    Last read 00:01:15, last write 00:00:44, hold time is 180,
    keepalive interval is 60 seconds
R2 hasn't received a keepalive in more than "keepalive interval" seconds
Time to check R1
    How is R1 on memory?
    What is R1's CPU load?
    Is R2's TCP window open?
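The "Last read" versus "keepalive interval" comparison can be scripted against saved show output. This is a minimal sketch that assumes the two lines appear in the format shown on the slide; the function name keepalive_overdue is hypothetical.

```python
import re


def keepalive_overdue(neighbor_output: str) -> bool:
    """True if 'Last read' is older than the keepalive interval in the output."""
    last_read = re.search(r"Last read (\d+):(\d+):(\d+)", neighbor_output)
    interval = re.search(r"keepalive interval is (\d+) seconds", neighbor_output)
    if last_read is None or interval is None:
        raise ValueError("expected 'Last read' and 'keepalive interval' lines")
    hours, minutes, seconds = (int(part) for part in last_read.groups())
    age = hours * 3600 + minutes * 60 + seconds
    return age > int(interval.group(1))


# Output copied from the slide (R2 looking at neighbor 1.1.1.1).
SAMPLE = (
    "Last read 00:01:15, last write 00:00:44, hold time is 180,\n"
    "keepalive interval is 60 seconds\n"
)

if __name__ == "__main__":
    print(keepalive_overdue(SAMPLE))
```

A True result only says the peer has been quiet for longer than one keepalive interval; the session does not drop until the hold time (180 seconds here) expires.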
Speaker Flap: Case Study
    R1#show ip bgp sum | begin Neighbor
    Neighbor  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    2.2.2.2   53       284      10167   0    97    00:01:20  0
    R1#show ip bgp sum | begin Neighbor
    Neighbor  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    2.2.2.2   53       284      10167   0    98    00:02:24  0
The two samples are at least one BGP keepalive interval apart
    The number of packets generated is increasing
    The number of packets transmitted is not increasing
OutQ is incrementing due to keepalives
MsgSent is not incrementing
Something is "stuck" in the OutQ
The keepalives aren't leaving R1!
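The same comparison (OutQ growing while MsgSent stands still) can be sketched in code. This assumes rows laid out as on the slide (Neighbor, MsgRcvd, MsgSent, TblVer, InQ, OutQ, Up/Down, State/PfxRcd); full IOS output also has V and AS columns, which would shift the indexes. All names here are hypothetical.

```python
from typing import NamedTuple


class PeerSample(NamedTuple):
    msg_sent: int
    out_q: int


def parse_summary_row(row: str) -> PeerSample:
    """Pick MsgSent and OutQ out of one neighbor row (slide column layout assumed)."""
    fields = row.split()
    return PeerSample(msg_sent=int(fields[2]), out_q=int(fields[5]))


def output_looks_stuck(earlier: PeerSample, later: PeerSample) -> bool:
    """Messages are being queued (OutQ up) but none are counted as sent."""
    return later.out_q > earlier.out_q and later.msg_sent == earlier.msg_sent


if __name__ == "__main__":
    # The two samples from the slide: OutQ 97 at 00:01:20, then 98 at 00:02:24.
    first = parse_summary_row("2.2.2.2 53 284 10167 0 97 00:01:20 0")
    second = parse_summary_row("2.2.2.2 53 284 10167 0 98 00:02:24 0")
    print(output_looks_stuck(first, second))
```

The samples need to be taken at least one keepalive interval apart, otherwise an unchanged MsgSent proves nothing.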
Speaker Flap: Case Study
This is a layer 2 or 3 transport issue, etc.
BGP OPENs and Keepalives are small
UPDATEs can be much larger
    So maybe small packets work but larger packets do not?
    R1#ping 2.2.2.2
    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
    !!!!!
    Success rate is 100 percent (5/5), round-trip min/avg/max = 16/21/24 ms
    R1#ping 2.2.2.2 size 1500 df-bit
    Type escape sequence to abort.
    Sending 5, 1500-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
    Packet sent with the DF bit set
    .....
    Success rate is 0 percent (0/5)
Bestpath Algorithm
Quick bestpath review
    Remember BGP only advertises one path per prefix... the bestpath
    Cannot advertise path from one iBGP peer to another
    Bestpath selection process is a little lengthy
First eliminate paths that are ineligible for bestpath
    1  Not synchronized       Only happens if "sync" is configured AND the route isn't in your IGP
    2  Inaccessible NEXTHOP   IGP does not have a route to the BGP NEXTHOP
    3  Received-only paths    Happens if "soft-reconfig inbound" is applied. A path will be received-only if it was denied/modified by inbound policy.
Best Path Algorithm
    1   Weight                  Highest wins   Scope is router only
    2   LOCAL_PREFERENCE        Highest wins   Scope is AS only
    3   Locally Originated                     Redistribution or network statement favored ... address
    4   AS_PATH                 Shortest wins  Skipped if "bgp bestpath as-path ignore" is configured; AS_SET counts as 1; CONFED parts do not count
    5   ORIGIN                  Lowest wins    IGP < EGP < Incomplete
    6   MED                     Lowest wins    MEDs are compared only if the first AS in the AS_PATH is the same
    7   eBGP over iBGP
    8   Metric to Next Hop      Lowest wins    IGP cost to the BGP NEXTHOP
    9   Multiple Paths in RIB                  Flag path as multipath if max-paths is configured
    10  Oldest External         Wins           Unless "bgp bestpath compare-routerid" is configured
    11  BGP Router ID           Lowest
    12  CLUSTER_LIST            Smallest       Shorter CLUSTER_LIST wins
    13  Neighbor Address        Lowest         Lowest neighbor address
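To make the ordering concrete, here is a simplified sketch of a pairwise comparison covering only steps 1, 2, 4, 5, 6, 7, 8 and 11 of the table. It is not how any router implements bestpath: the Path fields and function names are hypothetical, the remaining steps are left out, and because MED is only compared between paths from the same neighbor AS, a pairwise walk like this can be order-dependent.

```python
from dataclasses import dataclass
from functools import reduce

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "Incomplete": 2}


@dataclass(frozen=True)
class Path:
    weight: int
    local_pref: int
    as_path: tuple      # AS numbers, neighbor AS first
    origin: str         # "IGP", "EGP" or "Incomplete"
    med: int
    is_ebgp: bool
    igp_metric: int     # IGP cost to the BGP NEXTHOP
    router_id: str      # dotted quad


def _rid(router_id: str) -> tuple:
    return tuple(int(octet) for octet in router_id.split("."))


def better(a: Path, b: Path) -> Path:
    """Return the preferred of two eligible paths (subset of the steps only)."""
    if a.weight != b.weight:                      # 1: highest weight
        return a if a.weight > b.weight else b
    if a.local_pref != b.local_pref:              # 2: highest LOCAL_PREFERENCE
        return a if a.local_pref > b.local_pref else b
    if len(a.as_path) != len(b.as_path):          # 4: shortest AS_PATH
        return a if len(a.as_path) < len(b.as_path) else b
    if ORIGIN_RANK[a.origin] != ORIGIN_RANK[b.origin]:  # 5: lowest ORIGIN
        return a if ORIGIN_RANK[a.origin] < ORIGIN_RANK[b.origin] else b
    if a.as_path[:1] == b.as_path[:1] and a.med != b.med:  # 6: lowest MED, same first AS
        return a if a.med < b.med else b
    if a.is_ebgp != b.is_ebgp:                    # 7: eBGP over iBGP
        return a if a.is_ebgp else b
    if a.igp_metric != b.igp_metric:              # 8: lowest metric to next hop
        return a if a.igp_metric < b.igp_metric else b
    return a if _rid(a.router_id) <= _rid(b.router_id) else b  # 11: lowest router ID


def bestpath(paths):
    return reduce(better, paths)
```

For example, a path with LOCAL_PREFERENCE 200 beats one with 100 even if its AS_PATH is longer, because step 2 is decided before step 4 is ever looked at.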
Best Path Algorithm
show ip bgp x.x.x.x bestpath
    Will show you only the bestpath for x.x.x.x
    Handy if you have lots of paths for a prefix
    R2#sh ip bgp 7.4.4.0/24 bestpath
    BGP routing table entry for 7.4.4.0/24, version 2
    Paths: (20 available, best #13, table Default-IP-Routing
    Flag: 0x820
      Not advertised to any peer
      100
        192.150.6.11 from 192.150.6.11 (192.150.6.11)
          Origin IGP, metric 0, localpref 100, valid, extern
    R2#
show ip bgp x.x.x.x multipath
    Same concept but will show you all of the multipaths for x.x.x.x
Best Path Algorithm
IOS-XR has sh ip bgp x.x.x.x bestpath-compare
    Explains why the bestpath is the best
BGP Table Version
Lots of things must happen when bestpaths change
    RIB must be notified
    Peers must be informed
    Must have a way to track who has been informed of which bestpath
Prefix Table Version
    Each prefix has a 32 bit number that is its table version
    A prefix's table version is bumped for every bestpath change
    "Bumped" means the table version changes from the current version to the next available version #.
    Assume 10.0.0.0/8 has a table version of #27 and the highest table version used by any prefix is #30. If 10.0.0.0/8 has a bestpath change, its table version is bumped to #31.
BGP Table Version
show ip bgp x.x.x.x will show you a prefix's table version
    R1#sh ip bgp 10.0.0.0
    BGP routing table entry for 10.0.0.0/8, version 31
    Paths: (1 available, best #1, table Default-IP-Routing-Table)
    Flag: 0x820
      Not advertised to any peer
      200
        2.2.2.2 from 2.2.2.2 (2.2.2.2)
          Origin IGP, metric 0, localpref 100, valid, external, best
    R1#
5/28/2018 BRKRST-3320 - Troubleshooting BGP
54/108
2013 Cisco and/or its affiliates. All rights reserved.BRKRST-3320 Cisco Public
BGP Table Version
RIB & Peer Table Versions
We have a table version for the RIB Also have a table version for each peer Used to keep track of which bestpath changes have been propagate
If peer 1.1.1.1 has a table version of #60 this tells us we have
1.1.1.1 of all bestpath changes for prefixes with a table versio If any prefix has a table version > #60 then we need to inform
that prefixs bestpath
Once 1.1.1.1 has been updated his table version will be updat
accordingly Same concept for the RIB and its table version
BGP Table Version
show ip bgp summary is best for viewing RIB and peer version #s
    R2#show ip bgp summ
    BGP router identifier 2.2.2.2, local AS number 200
    BGP table version is 13, main routing table version 13
    3 network entries using 351 bytes of memory
    3 path entries using 156 bytes of memory
    Neighbor  V  AS   MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/Pfx
    1.1.1.1   4  100  4386     4388     13      0    0     01:20:24  1
    R2#
Highest table version of any prefix = main routing table version
    RIB is converged
    1.1.1.1 is converged
BGP Table Version
Example
    Assume the highest table version of any prefix is #10
        The RIB has a table version of #10
            The RIB is up to date for all prefixes
        All peers have a table version of #10
            Our peers are currently converged
    5 prefixes experience a bestpath change
        Highest table version is now #15
    Inform the RIB of these 5 changes
        Do RIB adds, deletes, and/or modifies
        When complete, set the RIB table version to #15
    Inform our peers of these 5 changes
        Build updates and/or withdraws for each peer
        When complete, set our peers' table versions to #15
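The bookkeeping in this example can be modelled in a few lines. This is a toy sketch of the idea only, not router code: the class and method names are hypothetical, and the 32-bit wrap of the version number is ignored.

```python
class BgpTable:
    """Toy model of prefix, RIB and per-peer table versions."""

    def __init__(self, prefix_versions: dict, peers: list):
        self.prefix_versions = dict(prefix_versions)
        self.highest = max(self.prefix_versions.values(), default=0)
        # Start converged: RIB and every peer are at the highest version
        self.rib_version = self.highest
        self.peer_versions = {peer: self.highest for peer in peers}

    def bestpath_change(self, prefix: str) -> None:
        """Bump the prefix to the next available version number."""
        self.highest += 1
        self.prefix_versions[prefix] = self.highest

    def pending_for(self, version: int) -> list:
        """Prefixes whose bestpath changed after the given table version."""
        return sorted(p for p, v in self.prefix_versions.items() if v > version)

    def update_rib(self) -> None:
        self.rib_version = self.highest

    def update_peer(self, peer: str) -> None:
        self.peer_versions[peer] = self.highest


if __name__ == "__main__":
    # Ten made-up prefixes at versions 1..10, one peer, everything converged.
    table = BgpTable({f"10.{i}.0.0/16": i + 1 for i in range(10)}, ["1.1.1.1"])
    for i in range(5):
        table.bestpath_change(f"10.{i}.0.0/16")
    print(table.highest, table.pending_for(table.peer_versions["1.1.1.1"]))
    table.update_rib()
    table.update_peer("1.1.1.1")
    print(table.rib_version, table.peer_versions["1.1.1.1"])
```

Comparing a peer's version with the highest prefix version is the same check you make by eye on the TblVer column of show ip bgp summary.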
BGP Table Version
Why am I babbling about this?
    Gives you a way to know who has been informed about what
    Provides a way to tell how many bestpath changes your network is experiencing
        You have 150k routes and see the table version increase by 150k every minute... something is wrong!!
        You have 150k routes and see the table version increase by 300 every minute... normal network churn
    You should monitor the table version in your network to determine what is normal for you
    If the table version is increasing rapidly then that could explain why B... and BGP IO are busy
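One way to turn "monitor the table version" into a number is to sample it twice and compute bumps per minute, then compare that with the route count. This is a sketch only; how you collect the samples (SNMP, screen scraping, etc.) is up to you, the function names are hypothetical, and version wrap is ignored.

```python
def bumps_per_minute(version_then: int, version_now: int, seconds_between: float) -> float:
    """Table version increase, normalised to one minute."""
    if seconds_between <= 0:
        raise ValueError("samples must be taken at different times")
    return (version_now - version_then) * 60 / seconds_between


def churn_fraction(bumps_per_min: float, route_count: int) -> float:
    """Bestpath changes per minute as a fraction of the table size."""
    return bumps_per_min / route_count


if __name__ == "__main__":
    routes = 150_000
    # The two cases from the slide: +150k per minute vs. +300 per minute.
    print(churn_fraction(bumps_per_minute(1_000, 151_000, 60), routes))
    print(churn_fraction(bumps_per_minute(1_000, 1_300, 60), routes))
```

What counts as normal is site-specific, so the useful output is the trend over time rather than any fixed threshold.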
Initial Convergence
BGP Convergence
Hey... Who are you calling slow?
Two general convergence situations
    Initial startup
    Periodic route changes
Convergence: Initial Startup
Initial convergence happens when:
    A router boots
    RP failover
    clear ip bgp *
How long initial convergence takes is a factor of the amount of work to be done and the router/network's ability to do this fast and efficiently
Convergence: Initial Startup
Initial convergence can be stressful... if you are approaching BGP scalability limits t... you will see issues.
Convergence: Initial Startup
What work needs to be done?
    1) Accept routes from all peers
        Not too difficult
    2) Calculate bestpaths
        This is easy
    3) Install bestpaths in the RIB
        Also fairly easy
    4) Advertise bestpaths to all peers
        This can be difficult and may take several minutes depending on the following variables
Convergence: Key Variables
BGP Variables
    The number of routes
    The number of peers
    The number of update-groups
    The ability to advertise routes to each update-group efficiently
Router Variables
    CPU horsepower
    Code version
    Outbound Interface Bandwidth
Convergence: UPDATE Packing
An UPDATE contains a set of Attributes and a list of prefixes (NLRI)
BGP starts an UPDATE by building an attribute set
BGP then packs as many destinations (NLRIs) as it can into the UPDATE
    NLRI = Network Layer Reachability Information
    Only NLRI with a matching attribute set can be placed in the UPDATE
    NLRI are added to the UPDATE until it is full (4096 bytes max)
UPDATE Packing refers to how efficiently an implementation packs NLRI
    Least efficient: BGP only puts one NLRI per UPDATE
    Most efficient: BGP puts all NLRI with a certain Attribute set in one UPDATE
[Figure: Least Efficient shows separate UPDATEs, each carrying the attributes MED 50 / Origin IGP and a single NLRI (10.1.1.0/24, 10.1.2.0/24, ...). Most Efficient shows one UPDATE carrying MED 50 / Origin IGP once, followed by 10.1.1.0/24, 10.1.2.0/24 and 10.1.3.0/24.]
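The difference between the two packing strategies comes down to grouping prefixes by attribute set. The sketch below shows that grouping under simplifying assumptions: attribute sets are plain hashable tuples, the 4096-byte limit is ignored, and the function name pack_updates is hypothetical.

```python
from collections import defaultdict


def pack_updates(routes):
    """Group prefixes by attribute set: one UPDATE per distinct set (size limit ignored)."""
    updates = defaultdict(list)
    for prefix, attribute_set in routes:
        updates[attribute_set].append(prefix)
    return dict(updates)


if __name__ == "__main__":
    attrs = (("MED", 50), ("Origin", "IGP"))
    routes = [
        ("10.1.1.0/24", attrs),
        ("10.1.2.0/24", attrs),
        ("10.1.3.0/24", attrs),
    ]
    packed = pack_updates(routes)
    print("least efficient:", len(routes), "UPDATEs")
    print("most efficient: ", len(packed), "UPDATE(s)")
```

With the three prefixes from the figure, one-NLRI-per-UPDATE needs three messages, while grouping by attribute set needs one.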
Convergence: UPDATE Packing
The fewer attribute sets you have the better
    More NLRI will share an attribute set
    Fewer UPDATEs to converge
Things you can do to reduce attribute sets
    next-hop-self for all iBGP sessions
    Don't accept/send communities you don't need
    Use cluster-id to put RRs in the same POP in a cluster
To see how many attribute sets you have
    show ip bgp summary
    190844 network entries using 21565372 bytes of memory
    302705 path entries using 15740660 bytes of memory
    57469/31045 BGP path/bestpath attribute entries using 620665 ... memory
Convergence: TCP MSS (Max Segment Size)
TCP MSS (max segment size) is also a factor in convergence times. The larger the MSS, the fewer TCP packets it takes to transport the BGP updates. Fewer packets mean less overhead and faster convergence.
[Figure: With the default MSS, a BGP UPDATE (attributes plus NLRI) is split into two TCP packets, each with its own IP and TCP header. With an increased MSS, the entire BGP UPDATE can fit in one TCP packet.]
Convergence: TCP MSS (Max Segment Size)
MSS = Max Segment Size
  Limit on packet size for a TCP socket
  536 bytes by default
Path MTU Discovery
  Finds smallest MTU between R1 and R2
  Subtract 40 bytes for TCP/IP overhead
  Enabled by default for BGP
  neighbor 2.2.2.2 transport path-mtu-discovery disable
To find the MSS
  R1#sh ip bgp neighbors
  BGP neighbor is 2.2.2.2, remote AS 3, external link
  Datagrams (max data segment is 1460 bytes):
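The arithmetic can be checked with a short Python sketch. It assumes 20-byte IP and TCP headers with no options (the 40 bytes above) and treats one full 4096-byte UPDATE as the payload. The function names are made up for the example. Real TCP is a byte stream, so actual segment counts depend on how updates are queued.

```python
import math

TCP_IP_OVERHEAD = 40  # bytes: 20-byte IP header + 20-byte TCP header, no options

def mss_from_mtu(path_mtu):
    """MSS derived from the smallest MTU on the path."""
    return path_mtu - TCP_IP_OVERHEAD

def segments_needed(payload_bytes, mss):
    """Number of TCP segments needed to carry payload_bytes."""
    return math.ceil(payload_bytes / mss)

full_update = 4096  # maximum BGP UPDATE size in bytes

print(mss_from_mtu(1500))                   # 1460
print(segments_needed(full_update, 536))    # 8
print(segments_needed(full_update, 1460))   # 3
```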
Convergence: Update Groups
BGP must create updates based on the policies towards each peer
Peers with a common outbound policy are members of the same update-group
  iBGP vs. eBGP
  Outbound route-map, prefix-lists, etc.
UPDATEs are generated for one member of an update-group and then replicated to the other members
[Figure: Less efficient: two peers in different update-groups, so the UPDATE (attributes plus NLRI) is built once per group. More efficient: two peers in the same update-group, so the UPDATE is built once and replicated.]
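A rough Python sketch of the grouping idea follows. The peer list, the route-map names, and the choice of (session type, outbound route-map) as the policy key are all assumptions for illustration. A real router considers more of the outbound policy than this.

```python
from collections import defaultdict

# Hypothetical peers; addresses and route-map names are illustrative only.
peers = [
    {"addr": "10.0.0.1", "type": "ibgp", "route_map_out": None},
    {"addr": "10.0.0.2", "type": "ibgp", "route_map_out": None},
    {"addr": "192.0.2.1", "type": "ebgp", "route_map_out": "CUST-OUT"},
    {"addr": "192.0.2.5", "type": "ebgp", "route_map_out": "CUST-OUT"},
    {"addr": "192.0.2.9", "type": "ebgp", "route_map_out": "PEER9-OUT"},
]

def update_groups(peers):
    """Group peers that share the same (assumed) outbound policy key."""
    groups = defaultdict(list)
    for peer in peers:
        key = (peer["type"], peer["route_map_out"])
        groups[key].append(peer["addr"])
    return dict(groups)

groups = update_groups(peers)
print(len(groups))  # 3 update-groups for 5 peers
```

Here UPDATEs are built three times instead of five. Giving the last peer the common route-map would bring that down to two.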
Convergence: Dropping TCP ACKs
Primarily an issue on RRs (Route Reflectors) with
  One or two interfaces connecting to the core
  Hundreds of RRCs (Route Reflector Clients)
RR sends out tons of UPDATEs to RRCs
RRCs send TCP ACKs
RR core-facing interface(s) receive a huge wave of TCP ACKs
[Figure: the RR sends BGP UPDATEs to the RRCs, and the RRCs send TCP ACKs back to the RR.]
Convergence: Dropping TCP ACKs
Interface input queue fills up and TCP ACKs are dropped
Each time a TCP packet is dropped, the session goes into slow start
  It takes a good deal of time for a TCP session to come out of slow start
Increase the input queue
  hold-queue 1000 in
  If you still see drops, increase to 4096
Convergence
How do you know if BGP has converged?
Watch the global table version
  Increases by 1 for every bestpath change
  In the lab: table version stabilizes
  In the real world: reaches your normal rate of change
Watch peer InQ and OutQs
  Wait for all InQ and OutQs to be empty
  To list peers with non-empty queues
  show ip bgp summ | e 0 0
Watch peer table versions
  show ip bgp summ
  If peer table version == global table version and InQ/OutQ are empty, BGP has converged for that peer
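The per-peer check can be written as a small Python predicate. This is a sketch only: the dictionary fields (tbl_ver, inq, outq) and the sample values are assumptions standing in for the columns of show ip bgp summ, and nothing here parses real router output.

```python
def peer_converged(global_version, peer):
    """True when the peer has caught up and both queues are empty."""
    return (peer["tbl_ver"] == global_version
            and peer["inq"] == 0
            and peer["outq"] == 0)

def unconverged_peers(global_version, peers):
    """Addresses of peers that BGP has not yet converged for."""
    return [p["addr"] for p in peers if not peer_converged(global_version, p)]

# Hypothetical snapshot of two peers
peers = [
    {"addr": "10.0.0.1", "tbl_ver": 5000, "inq": 0, "outq": 0},
    {"addr": "10.0.0.2", "tbl_ver": 4200, "inq": 0, "outq": 37},
]

print(unconverged_peers(5000, peers))  # ['10.0.0.2']
```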
Convergence: Initial Convergence Summary
Initial convergence time is a function of the amount of work that needs to be done and the router/network's ability to do it quickly and efficiently
Reduce the number of attribute sets in BGP
  Use next-hop-self, don't send communities you don't need, etc.
Reduce the number of unique outbound policies towards all peers
  Try to find a small set of common policies, rather than individualizing policies
  The fewer update-groups the better
MSS/PMTU
  Efficient packaging of BGP messages in TCP
Stop TCP ACK drops
  Increase interface input queues on RRs
Periodic Convergence
Convergence: Route Changes
There are 2 elements to route change convergence for BGP
Failure Detection
  How long does it take to see the failure? (t0 to t1)
Convergence
  How long does it take to process and propagate information about the failure? (t1 to t2)
[Figure: timeline with Failure at t0, Process and Propagate from t1, and Recovery at t2.]
Convergence: Route Changes
Time to Detect Failure
  Address Tracking Filter
  Nexthop Tracking
  Peer Down Detection
Time to Respond to Failure
  MRAI = Min Route Advertisement Interval
  Advertising the new information
Convergence: Address Tracking Filter
Quick ATF review
ATF = Address Tracking Filter
ATF is a middle man between the RIB and RIB clients
  BGP, OSPF, EIGRP, etc. are clients of the RIB
A client tells ATF which prefixes it is interested in
ATF tracks each prefix
  Notifies the client when the route to a registered prefix changes
  Client is responsible for taking action based on the ATF notification
Provides a scalable, event-driven model for dealing with RIB changes
Convergence: Nexthop Tracking
BGP nexthop tracking
  Relies on ATF
  Event driven convergence model
Register NEXTHOPs with ATF
  10.1.1.3
  10.1.1.5
ATF filters out changes for 10.1.1.1/32, 10.1.1.2/32, and 10.1.1.4/32
  BGP has not registered for these
Changes to 10.1.1.3/32 and 10.1.1.5/32 are passed along to BGP
  Recompute bestpath for prefixes that use these NEXTHOPs
  No need to wait for BGP Scanner
[Figure: the RIB holds 10.1.1.1/32 through 10.1.1.5/32. ATF sits between the RIB and BGP and passes along only changes for the BGP NEXTHOPs 10.1.1.3 and 10.1.1.5.]
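The register-and-notify model can be sketched in Python as below. The class and method names are invented for this example and do not correspond to any router internals. The sketch only shows the filtering behavior described above: changes to unregistered prefixes never reach the client.

```python
class AddressTrackingFilter:
    """Toy ATF: clients register prefixes and get callbacks on changes."""

    def __init__(self):
        self._registrations = {}  # prefix -> list of client callbacks

    def register(self, prefix, callback):
        self._registrations.setdefault(prefix, []).append(callback)

    def rib_change(self, prefix, event):
        """Called for every RIB change; returns how many clients were notified."""
        callbacks = self._registrations.get(prefix, [])
        for callback in callbacks:
            callback(prefix, event)
        return len(callbacks)

notified = []  # what the BGP "client" hears about
atf = AddressTrackingFilter()
for nexthop in ("10.1.1.3/32", "10.1.1.5/32"):
    atf.register(nexthop, lambda prefix, event: notified.append((prefix, event)))

atf.rib_change("10.1.1.1/32", "metric change")  # filtered out, BGP not registered
atf.rib_change("10.1.1.3/32", "route removed")  # passed along to BGP

print(notified)  # [('10.1.1.3/32', 'route removed')]
```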
Convergence: Nexthop Tracking
Enabled by default
  [no] bgp nexthop trigger enable
BGP registers all nexthops with ATF
  show ip bgp attr next-hop ribfilter
Trigger delay is configurable
  bgp nexthop trigger delay
  5 seconds by default
Debugs
  debug ip bgp events nexthop
  debug ip bgp rib-filter
Convergence: Peer Down Detection
BGP must learn that the peer is down
  Default keepalive/holdtime values are 60 seconds and 180 seconds
  My 2c: use a 3 second KA with a 9 second holdtime
  Tune your IGP to converge in under 9 seconds
  Use BFD (bidirectional forwarding detection) if you need to be more aggressive
eBGP directly connected
  bgp fast-external-fallover
  If the interface goes down so does the eBGP peer
  Reduce carrier-delay settings
    0 msec for down
    100 msec for up
eBGP multihop
  Relies on holdtime or BFD
Convergence: Peer Down Detection
iBGP peers
  Relies on holdtime or BFD
BFD on iBGP peers
  Know how fast your IGP converges!
  Your BFD dead timer must be greater than that amount
iBGP peer down detection isn't as critical as eBGP. Why?
  IGP should be tuned to converge quickly
  Fast IGP + BGP Nexthop Tracking = BGP reacts quickly to nexthop changes
  BGP can route around a change in the core prior to bringing down iBGP peers
Convergence: Fast Session Deactivation
Fast Session Deactivation
  neighbor x.x.x.x fall-over
  Register peer's address with ATF
  ATF informs BGP of routing changes to the peer
  When we lose our route to the peer, bring the peer down
  No need to wait for holdtime to expire
Primary use case is eBGP multihop
[Figure: eBGP multihop example in which FSD brings the peer down. Step labels #1 to #3 are truncated in the source.]
Convergence: Fast Session Deactivation
Very dangerous for iBGP peers
  IGP may not have a route to a peer for a split second
  FSD would tear down the BGP session
Imagine if you lose your IGP route to your RR (Route Reflector) for just 100ms
  Every RR to RRC session would flap
Off by default
  neighbor x.x.x.x fall-over
Convergence: FSD vs. BFD
Why do we have both?
  FSD was developed first
    Goal was fast BGP neighbor detection without the expense of fast keepalives
  BFD came later
    Goal was fast neighbor detection for multiple protocols
    Fast keepalives not as much of a concern
      BFD KAs are generated by linecards
      CPUs are also much faster today
FSD
  Relies on control plane (absence of a route in the RIB) to tear down the peer
  We could have a route but not have connectivity
BFD
  Relies on forwarding plane to detect down peer
  If we lose connectivity, the peer comes down
Convergence: MRAI (Minimum Route Advertisement Interval)
How is the timer enforced for peer X?
  Timer starts when all routes have been advertised to X
  For the next MRAI (seconds) we will not propagate any bestpath changes to X
  Once X's MRAI timer expires, send him updates and withdraws
  Restart the timer and the process repeats
User may see a wave of updates and withdraws to peer X every MRAI seconds
User will NOT see a delay of MRAI between each individual update or withdraw
  BGP would never converge if this were the case
Convergence: MRAI
MRAI timeline for BGP peer w/ MRAI of 5 seconds
T0: The big bang
T7: Bestpath Change #1
  UPDATE sent immediately
  MRAI timer starts, will expire at T12
T10: Bestpath Change #2
  Must wait until T12 for MRAI to expire
T12: MRAI expires
  Bestpath Change #2 is Txed
  MRAI timer starts, will expire at T17
T17: MRAI expires
  No pending UPDATEs
[Figure: timeline from t0 to t15 and beyond showing Bestpath Change #1, TX update #1 and Start MRAI; Bestpath Change #2; MRAI Expires, TX update #2 and Start MRAI; MRAI expires.]
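The timeline above can be reproduced with a simplified Python model of one peer's MRAI timer. The function name and the model itself are assumptions for illustration: the timer is restarted at each transmission, and every change that arrives while a send is already scheduled rides along with it.

```python
def mrai_send_times(change_times, mrai):
    """Times at which UPDATEs go out to one peer, given bestpath change times."""
    sends = []
    for t in sorted(change_times):
        if not sends:
            sends.append(t)  # no timer running yet: send immediately
            continue
        last = sends[-1]
        if t <= last:
            continue  # batched into the send already scheduled at `last`
        # Earliest permitted send is when the timer started at `last` expires
        sends.append(max(t, last + mrai))
    return sends

# Bestpath changes at T7 and T10 with an MRAI of 5 seconds
print(mrai_send_times([7, 10], 5))      # [7, 12]
# A third change at T11 is sent in the same wave at T12
print(mrai_send_times([7, 10, 11], 5))  # [7, 12]
```

The second call shows why the user sees waves of updates rather than one MRAI-sized gap per change.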
Convergence: MRAI
BGP is not a link state protocol, it is path vector
  May take several rounds/cycles of exchanging updates and withdraws for the network to converge
  MRAI must expire between each round!
The more fully meshed the network and the more tiers of ASes, the more rounds required for convergence
Think about
  How many tiers of ASes there are in the Internet
  How meshy peering can be in the Internet
Convergence: MRAI
Internet churn means we are constantly setting and waiting on MRAI
  One flapping prefix slows convergence for all prefixes
  Internet table sees roughly 6 bestpath changes per second
For iBGP and PE-CE eBGP peers
  neighbor x.x.x.x advertisement-interval 0
  Has been the default since 12.0(32)S
For regular eBGP peers
  Lowering to 0 may get you dampened
  OK to lower for eBGP peers if they are not using dampening
High CPU Utilization
High Utilization
Define "High"
  Know what normal CPU utilization is for the router in question
  Is the CPU spiking due to BGP Scanner or is it constant?
Look at the scenario
  Is BGP going through Initial Convergence?
  If not then route churn is the usual culprit
    Illegal recursive lookup or some other factor causes bestpath changes for the entire table
Router#show process cpu
CPU utilization for five seconds: 100%/0%; one minute: 99%; five min
....
139 6795740 1020252 6660 88.34% 91.63% 74.01% 0 BGP Rout
High Utilization
How to identify route churn?
  Do sh ip bgp summary, note the table version
  Wait 60 seconds
  Do sh ip bgp summary, compare the table version from 60 seconds ago
You have 150k routes and see the table version increase by ...
  This is probably normal route churn
  Know how many bestpath changes you normally see per minute
You have 150k routes and see the table version increase by ...
  This is bad and is the cause of your high CPU
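The comparison can be expressed as a small Python helper. Everything numeric here is an assumption for illustration: the two table-version deltas (300 and 150,000) and the 0.5 threshold are hypothetical, and the right threshold for a given router is whatever its normal per-minute churn suggests.

```python
def churn_ratio(version_before, version_after, total_routes):
    """Bestpath changes in the sample interval as a fraction of the table size."""
    return (version_after - version_before) / total_routes

def looks_like_heavy_churn(version_before, version_after, total_routes,
                           threshold=0.5):
    """True when the churn ratio reaches the (hypothetical) threshold."""
    return churn_ratio(version_before, version_after, total_routes) >= threshold

total_routes = 150_000

# Hypothetical samples taken 60 seconds apart
print(looks_like_heavy_churn(1_000_000, 1_000_300, total_routes))  # False
print(looks_like_heavy_churn(1_000_000, 1_150_000, total_routes))  # True
```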
High Utilization
What causes massive table version changes?
Flapping peers
  Hold-timer expiring?
  Corrupt UPDATE?
Route churn
  Don't try to troubleshoot the entire BGP table at once
  Identify one prefix that is churning and troubleshoot that one prefix
  Will likely fix the problem with the rest of the BGP table churn
High Utilization
Table Version Changing Rapidly: A Little Lab Fun
RP/0/RP0/CPU0:XR#sh route | include 00:00:
Wed Apr 27 13:53:40.201 EDT
O 1.0.0.0/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/
O 1.0.0.4/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/
O 1.0.0.8/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/
O 1.0.0.12/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0
...
(4 seconds later)
RP/0/RP0/CPU0:XR#sh route | include 00:00:
Wed Apr 27 13:53:44.162 EDT
B 1.0.0.0/30 [20/2] via 1.1.1.1, 00:00:01
B 1.0.0.4/30 [20/2] via 1.1.1.1, 00:00:01
B 1.0.0.8/30 [20/2] via 1.1.1.1, 00:00:01
B 1.0.0.12/30 [20/2] via 1.1.1.1, 00:00:01
...
High Utilization
Table Version Changing Rapidly: A Little Lab Fun
RP/0/RP0/CPU0:aggies#sh ip bgp 1.0.0.4
Wed Apr 27 14:00:36.066 EDT
...
Last Modified: Apr 27 14:00:35.387 for 00:00:00
Paths: (1 available, no best path)
...
  100
    1.1.1.1 (inaccessible) from 1.1.1.1 (1.1.1.1)
...
(3 seconds later)
RP/0/RP0/CPU0:aggies#sh ip bgp 1.0.0.4
Wed Apr 27 14:00:38.710 EDT
...
Last Modified: Apr 27 14:00:38.387 for 00:00:00
Paths: (1 available, no best path)
...
    1.1.1.1 (metric 2) from 1.1.1.1 (1.1.1.1)
...
1.1.1.1 (NH) flapping
High Utilization
Something is wrong with NEXTHOP 1.1.1.1
  Flip flops between inaccessible and accessible with an IGP cost of 2
Troubleshoot 1.1.1.1 and the churning will stop
Layer 3 VPNs
Layer 3 VPNs
Troubleshooting Checklist
#1 PE1 to PE2 core connectivity
  Verify you can ping from loopback to loopback
  Verify you can mpls ping from loopback to loopback
    PE loopbacks must be /32
  Check IGP
  Check LDP
#2 PE1 to CE1 and PE2 to CE2 connectivity
  Can each PE ping their directly connected CE?
  Remember to do ping vrf FOO x.x.x.x
Layer 3 VPNs
#3 PE to PE vrf connectivity
  Can PEs ping the vrf interface of the other PE?
  If not, double check your import/export Route Targets
#4 PE to CE connectivity
  Verify each PE can ping the CE connected to the other PE
#5 CE to CE connectivity
  At this point you should be able to ping CE to CE
Looking Glasses
The Internet: BGP Looking Glasses
You are advertising your address space to your ISPs
Q: How can you verify they are receiving it?
Q: How can you verify the rest of the Internet is receiving it?
A: BGP Looking Glasses
BGP Looking Glass servers are computers on the Internet running one of a variety of publicly available Looking Glass software implementations. A Looking Glass server (or LG server) is accessed remotely for the purpose of viewing routing info. Essentially, the server acts as a limited, read-only portal to routers of whatever organization is running the Looking Glass server. Typically, publicly accessible looking glass servers are run by ISPs or NOCs.
http://www.bgp4.as
The Internet: BGP Looking Glasses
https://www.sprint.
Show bgp route 72.16
72.163.0.0/20
The Internet: BGP Looking Glasses
host$ nslookup www.cisc
...
Address: 72.163.4.161
host$
http://whois.arin.net/ui
The Internet: BGP Looking Glasses
Huge list of looking glasses here
http://www.bgp4.as/looking-glasses
The Level3 looking glass will translate AS #s to company names
The Internet: BGP Looking Glasses
AS-PATH: 3549 6327
AS-PATH Translation: GBLX SHAWFIBER
The Internet: Whose AS is that anyway?
Long list here
http://bgp.potaroo.net/cidr/autnums.html
Or lookup a specific AS
http://whois.arin.net/rest/asn/AS1239/pft
The University's Route Views project was originally conceived as a tool for Internet operators to obtain real-time information about the global routing system from the perspectives of several different backbones and locations around the Internet. Although other tools handle related tasks, such as the various Looking Glass Collections (see e.g. NANOG, or the DTI NSPIXP-2 Looking Glass), they typically either provide only a constrained view of the routing system (e.g., either a single provider, or the route server) or they do not provide real-time access to routing data.
While the Route Views project was originally motivated by interest on the part of operators in determining how the global routing system viewed their prefixes and/or AS space, there have been many other interesting uses of this Route Views data. For example, NLANR has used Route Views data for AS path visualization (see also NLANR), and to study IPv4 address space utilization (archive). Others have used Route Views data to map IP addresses to origin AS for various topological studies. CAIDA has used it in conjunction with the NetGeo database in generating geographic locations for hosts, functionality that both CoralReef and the Skitter project support.
University of Oregon Route Views
http://www.r