• Troubleshooting Methodology
• Packet Forwarding Review
• Data Plane
• Active Monitoring
• Passive Flow Monitoring
• QoS
• Control Plane
• Logging
• Routing Protocol Stability
• Getting Started
Agenda
• This session is about basic network troubleshooting, focusing on fault detection & isolation
• Some non-Cisco specifics
• For context, we will cover some basic methodologies and functional elements of network behavior
• This session is NOT about
• Architectures of specific platforms
• Data Center technologies
• This is the 90 min tour. ;-)
Keeping Focused: What This Session is About
The Big Picture
[Diagram: a client and server connected across the network and an Enterprise DC. The client is not happy; the network operator says "pings fine!" and "it's the network"; the application operator says "it's not the network"; the user reports "can't ping it", "Internet's down", "somebody's downloading something", "is it Monday?"]
• A lot of stuff going on
• Multiple networks
• Multiple applications
• Multiple layered services
• Mis-information / inconsistency
Some More (network) Detail
[Diagram: the unhappy client sits on a LAN, reaching Server A in the Enterprise DC and Server B across the Internet via ISP A / ISP B and the Enterprise WAN, with supporting services in the path: DNS, DHCP, 802.1x.]
• Redundant paths / ECMP / LAG
• Overlays
• Load balancers
• Firewalls
• NATs
… and it keeps on going
[Diagram: the same client/server topology, now with redundant paths, overlays, load balancers, firewalls and NATs layered in.]
Why network state awareness?
• Quick detection of hard failures
• Early warning for
• soft failures
• performance issues
• and tomorrow's problems
• Faster problem resolution
• Greater confidence in network by users and application operators
• Control Plane
• Processes a variety of information sources and policies; creates the routing information base (RIB)
• Best-known intention, without an actual packet in hand
• Data Plane
• The actual forwarding process (might be SW- or HW-based)
• Granted some decision flexibility
• Driven by arriving packet details, traffic conditions, etc.
Control Plane & Data Plane
[Diagram: the control plane (routing protocols, APIs, statics, PfR, admin edict, "gossip" from other routers) builds the RIB, inspected with show ip route, show ip bgp, show ip ospf, show ip policy. The data plane forwards packets between interfaces, inspected with show ip cef and show mpls forwarding, and observed through passive measurements: ifMIB, *Flow, CBQoS, show policy-map interface, show interface, show flow monitor.]
• Control plane: condenses options, driven by policies and (relatively) slower-moving, aggregated information, e.g. prefix reachability, interface state
• Data plane responds to packet conditions
• Destination prefix to egress interface matching
• Multi-path (ECMP / LAG) member selection
• Interface congestion
• QoS class state
• Access Lists
• Packet processing fields (TTL expire, etc)
• IPv4 fragmentation, etc
Data Plane Decision Flexibility
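The ECMP/LAG member selection mentioned above is typically a hash over packet fields. A minimal sketch, assuming a 5-tuple hash; real platforms use vendor-specific hardware hash functions:

```python
import hashlib

def ecmp_member(src_ip, dst_ip, proto, sport, dport, n_members):
    """Pick an ECMP/LAG member by hashing the 5-tuple.
    Illustrative only: real routers use platform-specific hashes."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_members

# Same 5-tuple always lands on the same member (no reordering within a flow);
# different flows spread across members.
a = ecmp_member("10.0.0.1", "10.0.0.2", 6, 40000, 80, 4)
b = ecmp_member("10.0.0.1", "10.0.0.2", 6, 40000, 80, 4)
print(a == b, 0 <= a < 4)
```

This per-flow stickiness is why traceroute probes with varying ports can see different paths while a single TCP connection does not.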
• Each network device makes an independent forwarding decision
• Explicit local / domain policies
• Device perspective might not be symmetric
• Data plane flexibility generally happens at the WAN edge and at admin boundaries (traffic engineering)
• Asymmetric routing
Network as a System: Independent Decisions
[Diagram: traffic from A to B crosses R1–R6; part of the path is in a network you don't control. R5 is doing an ECMP hash onto a congested link.]
User / Agent Checks
• Treat network as a black box: are your beacon services working?
• Synthetic service check (HTTP, DNS, etc.)
• Ping (not all remotes will respond)
• Data plane is exercised and tested
• Variety = better coverage (multiple IP addresses / L4 ports per location)
• Validate similar treatment (QoS) as real user traffic
• Uptime and performance (loss, latency) metrics
• Look for patterns, changes from normal. All down vs. some down.
• Capture and validate real user (human) incidents. What got missed?
• Use wisely: network and server resources are consumed
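A synthetic beacon check can be as simple as a timed TCP connect. A minimal sketch (the demo stands up a local listener as a stand-in for a real beacon service):

```python
import socket
import time

def tcp_check(host, port, timeout=2.0):
    """Synthetic beacon check: TCP connect to a service.
    Returns (ok, latency_ms); latency is None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, (time.monotonic() - start) * 1000.0
    except OSError:
        return False, None

# Demo against a local listener (stand-in for a real beacon service).
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
ok, latency = tcp_check("127.0.0.1", port)
print(ok, latency is not None and latency >= 0)
srv.close()
```

Running such checks from several locations, against several addresses/ports per service, gives the "variety = better coverage" effect noted above.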
[Diagram: IP SLA on Cisco IOS devices actively generates traffic to measure the network, from a source toward a destination or an IP SLA responder. Operations: FTP, DNS, DHCP, TCP, jitter, ICMP, UDP, DLSw, HTTP, LDP, H.323, SIP, RTP. Measurement metrics: network latency, jitter (with distribution stats), connectivity, packet loss. Uses: network performance monitoring, SLA monitoring, network assessment, MPLS monitoring, VoIP monitoring, availability, troubleshooting. Results are exposed as MIB data.]
IP SLA: Synthetic Traffic Measurements
• IP SLA Multicast
• One-way delay (NTP required), one-way jitter, packet loss
• Configuration is on the IP SLA sender
• Have to specify each responder explicitly in an endpoint-list
• Responder becomes a mcast receiver; IGMPv3 (*,G) and (S,G) behavior
• Platforms: ISR G2, ISR4451-X, ASR1k, CSR1000v, Cat4k (Sup7/6), c7600
IPSLA Multicast Support
SLAsender(config)#ip sla endpoint-list type ip mylist
 ip-address 172.16.1.2,172.17.1.2 port 3800
SLAsender(config)#ip sla 1
 udp-jitter 224.1.1.1 4000 endpoint-list mylist source-ip 172.16.1.1 source-port 4500 num-packets 100 interval 25

(Control messages are unicast to each responder; the measurement traffic itself is multicast.)
Reference
IPSLA and Relatives
• IP SLA on router/switch – good use of deployed infra
• May not be a true check of the data plane (shadow router)
• Resource contention (CPU) – group scheduling
• Simplistic service checks
• User end-system based agent software
• Uses the true stack (OS, browser) on the PC
• Truly end to end (can include WiFi)
• Includes end-system resource view
• BYOD deployment challenges
• Dedicated agent
• Mixture of benefits from end-system and network
• True stack can be challenging
show interface
• Classic command
• Check up status
• Monitor in/out bit/packet changes
# show interface
GigabitEthernet1 is up, line protocol is up
  Hardware is CSR vNIC, address is 000c.291a.7f97 (bia 000c.291a.7f97)
  Internet address is 192.168.225.130/24
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full Duplex, 1000Mbps, link type is auto, media type is RJ45
  output flow-control is unsupported, input flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:05:35, output 00:09:58, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     25349 packets input, 2381158 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     3958 packets output, 312408 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     56 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out
snmp ifmib ifindex persist
snmp ifmib trap throttle
interface <intf>
 [no] logging event link-status
 [no] snmp trap link-status
 load-interval 30
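The counters in this output are what you want to watch for change. A sketch of scraping "<count> <name>" error counters from show interface text; poll twice and diff the dicts to spot actively incrementing errors (the sample text below is hypothetical):

```python
import re

SAMPLE = """0 runts, 0 giants, 0 throttles
12 input errors, 3 CRC, 0 frame, 0 overrun, 0 ignored
0 output errors, 0 collisions, 2 interface resets"""

def error_counters(text):
    """Scrape '<n> <name>' counters from 'show interface'-style text
    into a dict, so two polls can be compared for deltas."""
    counters = {}
    for num, name in re.findall(r"(\d+) ([a-zA-Z ]+?)(?:,|$)", text, re.M):
        counters[name.strip()] = int(num)
    return counters

c = error_counters(SAMPLE)
print(c["input errors"], c["CRC"], c["interface resets"])
```

In practice you would fetch the same data via the ifMIB/SNMP rather than screen-scraping, but the delta logic is identical.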
traceroute
• Understand the limitations
• Sends 3 packets (default) at each TTL
• Implementations
• Linux/Cisco: UDP (ICMP and TCP-SYN are Linux optional)
• UDP DST port # used to keep track of packets, increments per packet. Initial = 33434 (default)
• SRC port #: randomized (Linux), incrementing per packet (IOS)
• Linux (GNU inetutils-traceroute)
• UDP DST port # increments per TTL (not per packet)
• SRC port is random but fixed for the entire run
• Windows: ICMP Echo request
• IOS ICMP responses limited to 1 per 500 ms
• Configurable via: ip icmp rate-limit unreachable <ms>
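The two UDP port-numbering schemes above can be sketched side by side (the 33434 base for inetutils is an assumption; only per-TTL vs. per-packet increment is the point):

```python
def classic_udp_ports(base=33434, ttls=3, probes=3):
    """Classic (Linux/Cisco-style) traceroute: UDP dst port
    increments per probe packet, across all TTLs."""
    ports, p = [], base
    for _ in range(ttls):
        row = []
        for _ in range(probes):
            row.append(p)
            p += 1
        ports.append(row)
    return ports

def inetutils_udp_ports(base=33434, ttls=3, probes=3):
    """GNU inetutils traceroute: dst port increments per TTL,
    identical for all probes at a given hop."""
    return [[base + t] * probes for t in range(ttls)]

print(classic_udp_ports())    # three distinct ports per hop
print(inetutils_udp_ports())  # one port per hop, repeated
```

This is why classic traceroute disperses probes across more ECMP paths per hop than the inetutils variant.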
• UDP: widest dispersion against path possibilities, but difficult to understand
• ICMP: blocked frequently
• TCP: narrower dispersion, so the story might be misleading; then again, the Internet is effectively "the TCP/80 network"
Unix traceroute
• Multiple path options
• Topology ‘shortcuts’ (same router seen at diff hop)
• Ultimately all paths result in similar e2e delay
$ traceroute 62.2.88.172
traceroute to 62.2.88.172 (62.2.88.172), 30 hops max, 60 byte packets
1 152.22.242.65 (152.22.242.65) 1.044 ms 1.371 ms 1.585 ms
2 152.22.240.8 (152.22.240.8) 0.219 ms 0.328 ms 0.327 ms
3 128.109.70.9 (128.109.70.9) 1.066 ms 1.059 ms 1.168 ms
4 rtp7600-gw-to-dep7600-gw2.ncren.net (128.109.70.137) 1.634 ms 1.628 ms 1.736 ms
5 rlasr-gw-link1-to-rtp7600-gw.ncren.net (128.109.9.17) 5.354 ms 5.446 ms 5.557 ms
6 128.109.9.117 (128.109.9.117) 5.671 ms 128.109.9.170 (128.109.9.170) 7.141 ms 128.109.9.117 (128.109.9.117) 5.433 ms
7 wscrs-gw-to-ws-a1a-ip-asr-gw-sec.ncren.net (128.109.1.105) 9.174 ms 128.109.1.209 (128.109.1.209) 8.256 ms 6.397 ms
8 dcp-brdr-03.inet.qwest.net (205.171.251.110) 18.414 ms chr-edge-03.inet.qwest.net (65.114.0.205) 27.353 ms 27.438 ms
9 dcp-brdr-03.inet.qwest.net (205.171.251.110) 21.739 ms 63-235-40-106.dia.static.qwest.net (63.235.40.106) 17.750 ms
dcp-brdr-03.inet.qwest.net (205.171.251.110) 22.450 ms
10 63-235-40-106.dia.static.qwest.net (63.235.40.106) 22.531 ms 22.516 ms 84-116-130-173.aorta.net (84.116.130.173) 140.738 ms
11 nl-ams02a-rd1-te0-2-0-2.aorta.net (84.116.130.65) 140.831 ms 140.816 ms 84-116-130-173.aorta.net (84.116.130.173) 144.819 ms
12 nl-ams02a-rd1-te0-2-0-2.aorta.net (84.116.130.65) 144.074 ms 144.761 ms 84-116-130-58.aorta.net (84.116.130.58) 138.455 ms
13 84-116-130-58.aorta.net (84.116.130.58) 141.844 ms 141.924 ms 142.459 ms
14 84.116.204.234 (84.116.204.234) 145.603 ms 145.891 ms 145.987 ms
15 * * *
16 62-2-88-172.static.cablecom.ch (62.2.88.172) 268.281 ms 268.245 ms 268.176 ms
Annotated pattern from the trace above (hops anonymized):
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
6 FGF
7 HII
8 JKK   +10 ms (unsustained)
9 JLJ
10 LLM  +120 ms (sustained: the Atlantic crossing)
11 NNM
12 NNO
13 PPP
14 QQQ
15 ***  filter and/or > 100 ms delay
16 RRR  ~268 ms (all three probes)
Reference
Unix inetutils traceroute
• Narrower view (no alternate paths directly seen)
• Repeating nodes suggest multipath, or (unlikely) a routing issue
$ inetutils-traceroute --resolve-hostname 62.2.88.172
traceroute to 62.2.88.172 (62.2.88.172), 64 hops max
1 152.22.242.65 (152.22.242.65) 0.783ms 0.727ms 0.798ms
2 152.22.240.8 (152.22.240.8) 0.226ms 0.228ms 0.221ms
3 128.109.70.9 (128.109.70.9) 0.967ms 0.980ms 0.962ms
4 128.109.70.137 (rtp7600-gw-to-dep7600-gw2.ncren.net) 1.576ms 1.598ms 1.567ms
5 128.109.9.17 (rlasr-gw-link1-to-rtp7600-gw.ncren.net) 5.149ms 5.140ms 5.126ms
6 128.109.9.166 (128.109.9.166) 7.113ms 7.098ms 7.306ms
7 128.109.1.209 (128.109.1.209) 7.835ms 8.326ms 7.958ms
8 65.114.0.205 (chr-edge-03.inet.qwest.net) 19.944ms 9.299ms 40.372ms
9 63.235.40.106 (63-235-40-106.dia.static.qwest.net) 18.442ms 18.412ms 18.432ms
10 63.235.40.106 (63-235-40-106.dia.static.qwest.net) 22.424ms 22.391ms 75.960ms
11 84.116.130.173 (84-116-130-173.aorta.net) 145.434ms 146.301ms 145.445ms
12 84.116.130.58 (84-116-130-58.aorta.net) 137.583ms 137.556ms 137.661ms
13 84.116.130.58 (84-116-130-58.aorta.net) 142.476ms 141.886ms 141.819ms
14 84.116.204.234 (84.116.204.234) 144.841ms 145.034ms 144.964ms
15 * * *
16 62.2.88.172 (62-2-88-172.static.cablecom.ch) 287.318ms 176.670ms 254.237ms
Packets for hops 9 and 12 took a 'shortcut'; packets for hops 10 and 13 went the long way.
Reference
lft
• lft ‘layer 4 traceroute’ dynamically adjusts to responses
• Firewall detection, whois and AS lookup integrated
• Narrower packet changes, so narrower multi-path
$ sudo lft -ENA 62.2.88.172
Tracing ________________________________________________________________.
TTL LFT trace to 62-2-88-172.static.cablecom.ch (62.2.88.172):80/tcp
1 [AS81] [NCREN-B22] 152.22.242.65 20.1/17.2ms
2 [AS81] [NCREN-B22] 152.22.240.8 20.1/20.1ms
3 [AS81] [CONCERT] 128.109.70.9 20.1/20.1ms
4 [AS81] [CONCERT] rtp7600-gw-to-dep7600-gw2.ncren.net (128.109.70.137) 20.1/20.1ms
5 [AS81] [CONCERT] rlasr-gw-link1-to-rtp7600-gw.ncren.net (128.109.9.17) 20.1/20.1ms
6 [AS81] [CONCERT] 128.109.9.117 20.1/20.1ms
7 [AS209] [unknown] chr-edge-03.inet.qwest.net (65.121.156.209) 20.1/19.5ms
8 [AS209] [QWEST-INET-35] dcp-brdr-03.inet.qwest.net (205.171.251.110) 20.1/18.4ms
9 [AS209] [QWEST-INET-17] 63-235-40-106.dia.static.qwest.net (63.235.40.106) 20.1/60.3ms
10 [AS6830] [84-RIPE/LGI-Infrastructure] 84-116-130-173.aorta.net (84.116.130.173) 160.7/160.7ms
11 [AS6830] [84-RIPE/LGI-Infrastructure] nl-ams02a-rd1-te0-2-0-2.aorta.net (84.116.130.65) 160.7/160.7ms
12 [AS6830] [84-RIPE/LGI-Infrastructure] 84-116-130-58.aorta.net (84.116.130.58) 140.6/140.6ms
** [firewall] the next gateway may statefully inspect packets
13 [AS6830] [84-RIPE/LGI-Infrastructure] 84.116.204.234 160.7/160.6ms
** [neglected] no reply packets received from TTL 14
15 * [AS6830] [RIPE-C3/CC-HO841-NET] [target] 62-2-88-172.static.cablecom.ch (62.2.88.172):80 160.7ms
(Used TCP/80 SYN probes.)
Reference
mtr
• Interactive combined traceroute and ping
• Gives a sense of the health of the path (loss, delay standard deviation)
• Narrow path view
Reference
aakhter-nlr-ubuntu-01 (0.0.0.0)                          Sat May 30 18:57:09 2015
Keys: Help  Display mode  Restart statistics  Order of fields  quit
                                               Packets               Pings
 Host                                   Loss%  Snt  Last   Avg  Best  Wrst StDev
 1. 152.22.242.65                        0.0%  145   0.8   0.9   0.7  10.0   0.8
 2. 152.22.240.8                         0.0%  145   0.3   0.2   0.2   0.3   0.0
 3. 128.109.70.9                         0.0%  145   1.0   3.3   1.0 182.3  17.2
 4. rtp7600-gw-to-dep7600-gw2.ncren.net  1.0%  145   9.2   4.1   1.6 203.4  18.6
 5. rlasr-gw-link1-to-rtp7600-gw.ncren.net 0.0% 145  5.3   5.3   5.1   6.8   0.2
 6. 128.109.9.166                        0.0%  145   7.1   7.3   7.1  16.1   0.8
 7. wscrs-gw-to-ws-a1a-ip-asr-gw-sec.ncren.net 0.0% 145 6.8  8.3   6.2  10.6   1.0
 8. chr-edge-03.inet.qwest.net           0.0%  145   9.4  12.3   9.3  62.1   9.5
 9. dcp-brdr-03.inet.qwest.net           0.0%  145  21.8  22.8  21.7  70.7   5.5
10. 63-235-40-106.dia.static.qwest.net   0.0%  145  21.8  24.5  21.7  86.1  10.6
11. 84-116-130-173.aorta.net             0.0%  145 144.8 145.0 144.7 152.9   1.0
12. nl-ams02a-rd1-te0-2-0-2.aorta.net    0.0%  145 144.1 145.5 144.0 165.4   3.7
13. 84-116-130-58.aorta.net              5.0%  144 142.9 142.3 142.0 145.6   0.4
14. 84.116.204.234                       5.0%  144 145.1 145.1 144.9 145.3   0.0
15. 217-168-62-150.static.cablecom.ch    5.0%  144 145.9 146.1 145.2 164.3   1.9
16. 62-2-88-172.static.cablecom.ch       5.0%  144 313.0 260.3 152.6 508.0  80.0
• Hops 3–4: just local noise (high worst-case), no carry-over to later hops
• Hops 13–16: sustained 5% loss — likely something wrong 12->13, or on the way back
• Hop 16: note the variability (StDev 80); probably just the end system
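The loss-pattern reading above follows one rule: intermediate-hop loss that vanishes before the destination is usually ICMP rate-limiting on that router, while loss that persists to the destination is real. A sketch:

```python
def classify_loss(loss_pct):
    """Classify an mtr-style per-hop loss%% column.
    Loss at intermediate hops that does NOT reach the destination is
    usually just ICMP rate-limiting; loss that persists from some hop
    through to the destination points at a real problem at/after that hop."""
    if not loss_pct or loss_pct[-1] == 0:
        return "no end-to-end loss (intermediate loss = likely rate-limiting noise)"
    start = len(loss_pct) - 1
    while start > 0 and loss_pct[start - 1] > 0:
        start -= 1
    return f"sustained loss starting at hop {start + 1}"

print(classify_loss([0, 0, 10, 0, 0]))       # noise at hop 3 only
print(classify_loss([0, 0, 0, 5, 5, 5, 5]))  # real loss from hop 4 onward
```

Remember the reverse-path caveat: the fault may be at the flagged hop or anywhere on the way back.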
Follow the Flow with NetFlow
• Per-Node: Data plane observations and decisions captured
• Src/dst mac/IP/port#s, DSCP values, in/out interfaces, etc.
• Network view: flows centrally analyzed by a NetFlow collector/analyzer
• Biggest value: strategically placed partial views (eg WAN edge)
[Diagram: flows from R1–R6 exported to a central NetFlow collector; LiveAction shown as the analyzer.]
• Developed and patented at Cisco Systems in 1996
• NetFlow is the de facto standard for acquiring IP operational data
• Standardized in IETF via IPFIX
• Provides network and security monitoring, network planning, traffic analysis, and IP accounting
• Packet capture is like a wire tap
• NetFlow is like a phone bill
NetFlow—What Is It?
Network World article: NetFlow Adoption on the Rise (http://www.networkworld.com/newsletters/nsm/2005/0314nsm1.html)
Flexible NetFlow: Multiple Monitors with Unique Key Fields

Packet 1 arrives, and each flow monitor extracts its own key fields:

Flow Monitor 1 – Traffic Analysis Cache
 Key fields (from Packet 1): Source IP 3.3.3.3, Destination IP 2.2.2.2, Source Port 23, Destination Port 22078, Layer 3 Protocol TCP (6), TOS Byte 0, Input Interface Ethernet 0
 Non-key fields: Packets, Bytes, Timestamps, Next Hop Address
 Cache entry:
 Src IP   Dest IP  Src Port  Dest Port  Protocol  TOS  Input I/F  …  Pkts
 3.3.3.3  2.2.2.2  23        22078      6         0    E0         …  1100

Flow Monitor 2 – Security Analysis Cache
 Key fields (from Packet 1): Source IP 3.3.3.3, Dest IP 2.2.2.2, Input Interface Ethernet 0, SYN Flag 0
 Non-key fields: Packets, Timestamps
 Cache entry:
 Source IP  Dest. IP  Input I/F  Flag  …  Pkts
 3.3.3.3    2.2.2.2   E0         0     …  11000
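The key/non-key split above is just aggregation by a chosen tuple. A minimal sketch of two caches built from the same packets with different key fields (field names are illustrative):

```python
from collections import defaultdict

def build_cache(packets, key_fields):
    """Aggregate packets into flow entries keyed by the chosen key fields;
    non-key data (packets, bytes) accumulates per entry, mirroring how each
    Flexible NetFlow monitor keeps its own cache with its own keys."""
    cache = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = tuple(pkt[f] for f in key_fields)
        cache[key]["packets"] += 1
        cache[key]["bytes"] += pkt["bytes"]
    return dict(cache)

pkts = [
    {"src": "3.3.3.3", "dst": "2.2.2.2", "sport": 23, "dport": 22078, "in_if": "E0", "bytes": 100},
    {"src": "3.3.3.3", "dst": "2.2.2.2", "sport": 23, "dport": 22090, "in_if": "E0", "bytes": 200},
]
traffic = build_cache(pkts, ("src", "dst", "sport", "dport"))  # per-connection view
security = build_cache(pkts, ("src", "dst", "in_if"))          # coarser host view
print(len(traffic), len(security))  # 2 1
```

Coarser keys mean fewer, fatter entries; that is the trade-off each monitor makes independently.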
• The Flexible NetFlow Forwarding Status field captures the forwarding decision (and drop reason) for a flow
• The Drop Count increments on any explicit drop by the router
NetFlow Forwarding Status & Drop Count Fields
• Network nodes can discover and validate RTP, TCP and IP-CBR traffic on a hop-by-hop basis
• À la carte metric selections (loss, latency, jitter, etc.), applied to operator-selected sets of traffic
• Allows fault isolation and network span validation
• Per-application thresholds and alerting
Network Performance Monitor
Performance Monitor Information Elements

Media Monitoring:
• RTP SSRC
• RTP Jitter (min/max/mean)
• Transport Counter (expected/loss)
• Media Counter (bytes/packets/rate)
• Media Event
• Collection interval

Application Response Time:
• TCP MSS
• TCP round-trip time
• CND – Client Network Delay (min/max/sum)
• SND – Server Network Delay (min/max/sum)
• ND – Network Delay (min/max/sum)
• AD – Application Delay (min/max/sum)
• Total Response Time (min/max/sum)
• Total Transaction Time (min/max/sum)
• Number of New Connections
• Number of Late Responses
• Number of Responses by Response Time (7-bucket histogram)
• Number of Retransmissions
• Number of Transactions
• Client/Server Bytes
• Client/Server Packets

Other Metrics:
• L3 counter (bytes/packets)
• Flow event
• Flow direction
• Client and server address
• Source and destination address
• Transport information
• Input and output interfaces
• L3 information (TTL, DSCP, TOS, etc.)
• Application information (from NBAR2)
• Monitoring class hierarchy
NetFlow QoS Analysis
[Diagram: flow 5-tuple -> DPI/NBAR -> QoS processing -> DSCP, visualized in Cisco Prime Infrastructure and LiveAction]
• How is my flow being classified?
• Did this QoS class drop traffic?
• QoS queue performance (drops)
• QoS class structure class-map and policy map names
NetFlow QoS flow exporter:
 option c3pl-class-table timeout <timeout>
 option c3pl-policy-table timeout <timeout>

QoS queue performance:
flow record type performance-monitor qos-record
 match policy qos queue index
 collect policy qos queue drops
(or)
flow record qos-record
 match policy qos queue index
 collect policy qos queue drops

Flow to QoS association:
flow record type performance-monitor A
 match connection client ipv4 address
 match connection server ipv4 address
 match connection server transport port
 collect policy qos class hierarchy
 collect policy qos queue id
 …
(or)
flow record qos-class-record
 match ipv4 source address
 match ipv4 destination address
 collect policy qos classification hierarchy
 collect policy qos queue index
 …
Enhanced NetFlow CLI Example
R1#show flow monitor qos-flow-monitor cache
IP FORWARDING STATUS: Forward
IPV4 SOURCE ADDRESS: 192.168.32.128
IPV4 DESTINATION ADDRESS: 224.0.0.5
INTERFACE INPUT: Null
INTERFACE OUTPUT: Gi2
FLOW DIRECTION: Output
IP DSCP: 0x30
policy qos class hierarchy: WAN-EDGE-4-CLASS: CONTROL
policy qos queue index: 1073741827
IP FORWARDING STATUS: Consume
IPV4 SOURCE ADDRESS: 192.168.225.128
IPV4 DESTINATION ADDRESS: 192.168.225.130
INTERFACE INPUT: Gi1
INTERFACE OUTPUT: Null
FLOW DIRECTION: Input
IP DSCP: 0x04
policy qos class hierarchy: WAN-EDGE-4-CLASS: class-default
policy qos queue index: 0
IP FORWARDING STATUS: Forward
IPV4 SOURCE ADDRESS: 192.168.225.128
IPV4 DESTINATION ADDRESS: 5.5.5.5
INTERFACE INPUT: Gi1
INTERFACE OUTPUT: Gi2
FLOW DIRECTION: Output
IP DSCP: 0x00
policy qos class hierarchy: WAN-EDGE-4-CLASS: class-default
policy qos queue index: 1073741829
Notes: DSCP 0x30 = CS6 lands in the 'control' class; the Gi1-to-Null "Consume" flow is my VTY session; the Gi1-to-Gi2 flow is data traffic.
platform qos performance-monitor
!
flow record qos-class-record
match routing forwarding-status
match ipv4 dscp
match ipv4 source address
match ipv4 destination address
match interface input
match interface output
match flow direction
collect policy qos classification hierarchy
collect policy qos queue index
!
flow monitor qos-flow-monitor
record qos-class-record
!
interface GigabitEthernet1
ip flow monitor qos-flow-monitor input
!
interface GigabitEthernet2
ip flow monitor qos-flow-monitor output
service-policy output WAN-EDGE-4-CLASS
• IOS QoS collects vital information regarding the health of QoS classes
• Pre- and post-policy bytes, drops, etc.
• Same class names from different routers can be compared
• For flow-level analysis, use NetFlow QoS reporting
• snmp mib persist cbqos
CBQoS MIB
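The pre- and post-policy byte counters mentioned above make per-class drop percentage a one-line calculation. An illustrative sketch (the exact CBQoS-MIB object names vary and are not shown here):

```python
def class_drop_pct(pre_bytes, post_bytes):
    """Drop percentage for a QoS class from pre-/post-policy byte counters,
    as exposed by CBQoS-MIB-style polling. Illustrative calculation only."""
    if pre_bytes == 0:
        return 0.0
    return 100.0 * (pre_bytes - post_bytes) / pre_bytes

# 1 MB offered, 950 KB transmitted: 5% dropped in this class.
print(round(class_drop_pct(1_000_000, 950_000), 1))  # 5.0
```

Because class names are consistent across routers, the same calculation can be compared network-wide.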
Dedicated Protocol Analyzers
• Wireshark and other protocol analyzers are great
• Detailed analysis for a variety of protocols, at a deep level
• Dedicated probes are expensive to deploy pervasively
• Operator has to make difficult judgment calls on where the problem is going to be – before it happens
• Can be challenging after the fact: needs on-site trained personnel
Embedded Packet Capture & Analyze
• Capture packets locally to buffer on router
• Store to flash, USB, FTP, TFTP for analysis in a protocol analyzer
• IOS XE Cat4k Sup7E & Sup7L-E (XE 3.3.0SG) include built-in Wireshark decode capability
• Capture does not add traffic to network
LY-2851-8#monitor capture buffer pcap-buffer1 size 10000 max-size 1550
LY-2851-8#monitor capture point ip cef pcap-point1 g0/0 both
LY-2851-8#monitor capture point associate pcap-point1 pcap-buffer1
LY-2851-8#monitor capture point start pcap-point1
LY-2851-8#monitor capture point stop pcap-point1
LY-2851-8#monitor capture buffer pcap-buffer1 export ftp://10.17.0.252/images/test.cap
iOAM6 (prototype)
• Instrumented IPv6 extension header on user packets
• vs. IPv4 record-route option header
• v6 Ext Headers better designed
• Domain level control
• Minimal performance hit (handled in data plane)
• Packets continue on regular path
• Instrumentation
• Packet sequence numbers => detect packet loss
• Time stamps => one way delay
• Node and ingress/egress interface names => path recording
• Send interest to [email protected]
[Diagram: enhanced telemetry — a network element adds per-hop and end-to-end data (Node-ID, ingress/egress interface, sequence #, timestamp, app data) into selected data traffic; apps/controllers use it for a v6 traffic matrix, live flow tracing, delay distribution, bi-casting control, loss matrix/monitoring, and app data monitoring.]
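The sequence-number and timestamp instrumentation above yields loss and one-way delay directly. A minimal sketch, assuming synchronized clocks (as the slides note, NTP is required for one-way measurements); record layout is hypothetical:

```python
def ioam_stats(records):
    """Derive loss and mean one-way delay from iOAM-style per-packet
    metadata: each record is (sequence_number, tx_timestamp_ms, rx_timestamp_ms).
    Gaps in the sequence numbers are counted as lost packets."""
    seqs = sorted(r[0] for r in records)
    expected = seqs[-1] - seqs[0] + 1
    lost = expected - len(seqs)
    delays = [rx - tx for _, tx, rx in records]
    return lost, sum(delays) / len(delays)

recs = [(1, 0, 12), (2, 20, 31), (4, 60, 74)]  # seq 3 missing
lost, avg_delay = ioam_stats(recs)
print(lost, round(avg_delay, 1))  # 1 12.3
```

Because the metadata rides on real user packets taking their regular path, this measures what traffic actually experienced, not what a probe saw.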
See BRKRST-2606 for more info.
iOAM6 Path Trace
• Basic configuration
ipv6 ioam path-record
ipv6 ioam node-id <node id>
• Extended Ping
H1#ping
Protocol [ip]: ipv6
Target IPv6 address: ::A:1:1:0:1D
Repeat count [5]: 1
Datagram size [100]: 300
Timeout in seconds [2]:
Extended commands? [no]: yes
Source address or interface: gig0/1
UDP protocol? [no]:
Verbose? [no]: yes
Precedence [0]:
DSCP [0]:
Include hop by hop Path Record option? [no]: yes
Sweep range of sizes? [no]:
Type escape sequence to abort.
Sending 1, 300-byte ICMP Echos to ::A:1:1:0:1D, timeout is 2 seconds:
(Gi0/1)R1(Gi0/2)----(Gi0/1)R4(Gi0/2)----(Gi0/2)R3(Gi0/3)----H3----(Gi0/3)R3(Gi0/2)----(Gi0/2)R4(Gi0/1)----(Gi0/2)R1(Gi0/1)
Reply to request 0 (35 ms)
Success rate is 100 percent (1/1), round-trip min/avg/max = 35/35/35 ms
[Diagram: H1 — R1 — R4 — R3 — H3 (::A:1:1:0:1D), with R2 as an alternate path; the v6 extension header is applied/decapped at the edges.]
• 3Ws: When, where, and what
• Change is normal, but some changes are more interesting:
• Single change that causes loss of reachability or suboptimal performance
• Instability: high rate of change
Control Plane
Logging
• Centrally: for ease of analysis and search
• syslog-ng – preprocessing, relay and store (file/db)
• Logstash (ELK), fluentd – multi-source collection, storage and analysis
• Locally: in case logs can't get home
service timestamps log datetime msec show-timezone
!
logging host <ipaddr>
logging trap 6
logging source-interface Loopback0
!
logging buffered <size> 6
logging persistent url disk0:/syslog size <TotalLogsSize> filesize <OneFileSize>
State of the Routing Table
• Be familiar with normal behavior of important service prefixes
• Establish quickly if problem is control plane or data plane
• show ip route / ipRouteTable MIB / show ip traffic (Drop stats)
• Nagios: check_snmp_iproute.pl
• Track objects and EEM:
(config)# track 100 ip route 0.0.0.0 0.0.0.0 reachability
(config)# event manager applet TrackRoute_0.0.0.0
(config-applet)# event track 100 state any
(config-applet)# action 1.0 syslog msg "route is $_track_state"
#01:09:21: %HA_EM-6-LOG: TrackRoute_0.0.0.0: route is down
blog.ipspace.net
#show ip route 192.168.2.2
Routing entry for 192.168.2.2/32
Known via "ospf 1", distance 110, metric 11, type intra area
Last update from 10.0.0.2 on FastEthernet0/0, 00:00:13 ago
Routing Descriptor Blocks:
* 10.0.0.2, from 2.2.2.2, 00:00:13 ago, via FastEthernet0/0
Route metric is 11, traffic share count is 1
blog.ipspace.net
• Remember that OSPF nodes in an area should be consistent
• Understand the 'normal' rate of changes
• SPF runs per hour
• show ip ospf stat detail
• number of LSAs expected
• OSPF-MIB: ospfSpfRuns, ospfAreaLSACount
• Route missing?
• Where is the network supposed to be attached? Is it still?
• show interface (on the advertising router)
• show ip ospf database …
OSPF Area / AS-Wide
# show ip ospf
Routing Process "ospf 1" with ID 192.168.0.1
Start time: 00:01:46.195, Time elapsed: 00:48:27.308
Supports only single TOS(TOS0) routes
Supports opaque LSA
Supports Link-local Signaling (LLS)
Supports area transit capability
Supports NSSA (compatible with RFC 3101)
Supports Database Exchange Summary List Optimization (RFC 5243)
Event-log enabled, Maximum number of events: 1000, Mode: cyclic
Router is not originating router-LSAs with maximum metric
Initial SPF schedule delay 5000 msecs
Minimum hold time between two consecutive SPFs 10000 msecs
Maximum wait time between two consecutive SPFs 10000 msecs
Incremental-SPF disabled
Minimum LSA interval 5 secs
Minimum LSA arrival 1000 msecs
LSA group pacing timer 240 secs
Interface flood pacing timer 33 msecs
Retransmission pacing timer 66 msecs
Number of external LSA 0. Checksum Sum 0x000000
Number of opaque AS LSA 0. Checksum Sum 0x000000
Number of DCbitless external and opaque AS LSA 0
Number of DoNotAge external and opaque AS LSA 0
Number of areas in this router is 1. 1 normal 0 stub 0 nssa
Number of areas transit capable is 0
External flood list length 0
IETF NSF helper support enabled
Cisco NSF helper support enabled
Reference bandwidth unit is 100 mbps
Area BACKBONE(0)
Number of interfaces in this area is 4 (1 loopback)
Area has no authentication
SPF algorithm last executed 00:47:05.379 ago
SPF algorithm executed 4 times
Area ranges are
Number of LSA 16. Checksum Sum 0x078460
Number of opaque link LSA 0. Checksum Sum 0x000000
Number of DCbitless LSA 0
Number of indication LSA 0
Number of DoNotAge LSA 0
Flood list length 0
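The "SPF runs per hour" baseline mentioned earlier can be computed from two polls of an ospfSpfRuns-style counter. A minimal sketch; the comparison baseline is an arbitrary assumption you would tune per network:

```python
def spf_rate_per_hour(sample1, sample2, interval_s):
    """Rate of SPF runs per hour from two counter samples taken
    interval_s seconds apart (e.g. OSPF-MIB ospfSpfRuns).
    Knowing the normal rate lets you alarm on instability."""
    return (sample2 - sample1) * 3600.0 / interval_s

rate = spf_rate_per_hour(100, 130, 900)  # 30 runs in 15 minutes
print(rate, rate > 60)  # 120.0 vs. an assumed 60/hr baseline -> alarm
```

The same delta-and-compare pattern applies to ospfAreaLSACount: a stable area has a stable LSA count, so sustained growth or churn is the early warning.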
OSPF Neighborships
• neighbor adjacencies
• log-adjacency-changes (on by default)
• show ip ospf neighbor detail (OSPF-MIB: ospfNbrState, ospfNbrEvents, ospfNbrLSRetransQLen)
(config) router ospf <id>
(config-router) log-adjacency-changes [detail]
%OSPF-5-ADJCHG: Process 12, Nbr 172.25.25.1 on Serial0/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Oct 14 09:57:43: %OSPF-5-ADJCHG: Process 12, Nbr 172.25.25.1 on ...
# show ip ospf neighbor detail
Neighbor 192.168.0.7, interface address 10.0.0.3
    In the area 0 via interface GigabitEthernet0/1
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 10.0.0.3 BDR is 10.0.0.4
    Options is 0x12 in Hello (E-bit, L-bit)
    Options is 0x52 in DBD (E-bit, L-bit, O-bit)
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:39
    Neighbor is up for 00:33:10
    Index 2/2/2, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec
RtrA#show ip eigrp neighbors
IP-EIGRP neighbors for process 1
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
2 10.1.1.1 Et0 12 6d16h 20 200 0 233
1 10.1.4.3 Et1 13 2w2d 87 522 0 452
0 10.1.4.2 Et1 10 2w2d 85 510 0 3
Show IP EIGRP Neighbors — column guide:
• Hold: seconds remaining before declaring the neighbor down
• Uptime: how long since the neighbor was last discovered
• SRTT: how long it takes this neighbor to respond to reliable packets
• RTO: how long we'll wait before retransmitting if no acknowledgement
• Q Cnt: outstanding (unacknowledged) packets
• Seq Num: last reliable packet sent
• So the log tells us why the neighbor is bouncing, but what do the reasons mean?
• E.g. "peer restarted" means you have to ask the peer; it's the one that restarted the session
Log-Neighbor-Changes Messages
Neighbor 10.1.1.1 (Ethernet0) is down: peer restarted
Neighbor 10.1.1.1 (Ethernet0) is up: new adjacency
Neighbor 10.1.1.1 (Ethernet0) is down: holding time expired
Neighbor 10.1.1.1 (Ethernet0) is down: retry limit exceeded
(Others exist, but not often.)
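Messages in this shape are easy to turn into structured data so a log pipeline can count bounces per neighbor and reason. A sketch of a parser for the format shown above (the regex is an assumption fitted to these examples, not an official message grammar):

```python
import re

PATTERN = re.compile(
    r"Neighbor (?P<nbr>\S+) \((?P<intf>[^)]+)\) is (?P<state>up|down)(?:: (?P<reason>.*))?"
)

def parse_nbr_change(line):
    """Parse a neighbor-change log line into nbr/intf/state/reason fields.
    Returns None if the line does not match the expected shape."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None

r = parse_nbr_change("Neighbor 10.1.1.1 (Ethernet0) is down: holding time expired")
print(r["nbr"], r["intf"], r["state"], r["reason"])
```

Aggregating the parsed reasons over time answers the "what do they mean" question statistically: a neighbor that repeatedly dies with "holding time expired" points at a lossy link, not a peer restart.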
BGP Monitoring Protocol (BMP) Overview
Collecting Pre-Policy BGP Messages
[Diagram: eBGP updates from external BGP peers land in the BMP client's Adj-RIB-In (pre-inbound-filter); a copy of each update is sent as a BMP message to the BMP collector before inbound filtering/policing builds the Loc-RIB (post-inbound-filter) and iBGP updates go to internal peers.]
• A BMP receiver can be configured with both IPv4 & IPv6 host addresses
• The BGP speaker process is referred to as the BMP client
• The BMP client provides only a pre-policy view of the Adj-RIB-In of a peer
• Post-policy view is not supported
• A BGP peer can be monitored by multiple BMP receivers
• Any update message from the peer (irrespective of address-family) is sent to the BMP receiver
• Multiple BMP receivers can be configured across all BGP instances
• Each BGP instance sends update messages for its peers to the BMP receivers monitoring those peers
Be Prepared!
• Be prepared and have data collection systems enabled
• Enable passive monitoring
• Call signaling and endpoint logging systems: CDR, syslog, SNMP traps, etc.
• Session monitoring on endpoints, application infrastructure and network
• Enable active tests
• Periodic endpoint-to-endpoint calls
• Network performance probes
• Helpdesk
• Interview script
• Access to tools, logs, etc.
• Firefighters run drills, so should your teams!
• Be familiar with the tools and how they respond on your network
• Cross-domain teams (applications, UC, security, servers)
Expanding your Toolbox and Knowledge
• Great open source tools to look at
• Network topology & IP address management: netdot, GestióIP
• Performance tests: iperf3
• Service checks: Nagios Core, Zenoss Community
• NetFlow / Log analysis: logstash, fluentd
• Template driven config generation: ansible
• Just Some of the Sessions at Cisco Live
• BRKARC-2002 - Techniques of a Network Detective
• BRKARC-2011 - Overview of Packet Capturing Tools in Cisco Switches and Routers
• BRKARC-2019 - Operating an ASR1000
• LTRARC-2003 - IOS-XE hands-on troubleshooting
Participate in the “My Favorite Speaker” Contest
• Promote your favorite speaker through Twitter and you could win $200 of Cisco Press products (@CiscoPress)
• Send a tweet and include
• Your favorite speaker’s Twitter handle @aakhter
• Two hashtags: #CLUS #MyFavoriteSpeaker
• You can submit an entry for more than one of your “favorite” speakers
• Don’t forget to follow @CiscoLive and @CiscoPress
• View the official rules at http://bit.ly/CLUSwin
Promote Your Favorite Speaker and You Could Be a Winner
Complete Your Online Session Evaluation
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online
• Give us your feedback to be entered into a Daily Survey Drawing. A daily winner will receive a $750 Amazon gift card.
• Complete your session surveys through the Cisco Live mobile app or your computer on Cisco Live Connect.
Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Table Topics
• Meet the Engineer 1:1 meetings
• Related sessions
R&S Related Cisco Education Offerings (Course / Description / Certification)
• CCIE Routing & Switching: CCIE R&S Advanced Workshops (CIERS-1 & CIERS-2) plus self-assessments, workbooks & labs. Expert-level trainings, including instructor-led workshops, self-assessments, practice labs and CCIE Lab Builder, to prepare candidates for the CCIE R&S practical exam.
• CCNP Routing & Switching: Implementing Cisco IP Routing v2.0, Implementing Cisco IP Switched Networks v2.0, Troubleshooting and Maintaining Cisco IP Networks v2.0. Professional-level instructor-led trainings to prepare candidates for the CCNP R&S exams (ROUTE, SWITCH and TSHOOT). Also available in self-study eLearning formats with Cisco Learning Labs.
• CCNA Routing & Switching: Interconnecting Cisco Networking Devices: Part 2 (or combined). Configure, implement and troubleshoot local and wide-area IPv4 and IPv6 networks. Also available in self-study eLearning format with Cisco Learning Lab.
• CCENT Routing & Switching: Interconnecting Cisco Networking Devices: Part 1. Installation, configuration, and basic support of a branch network. Also available in self-study eLearning format with Cisco Learning Lab.
For more details, please visit: http://learningnetwork.cisco.com
Questions? Visit the Learning@Cisco Booth or contact [email protected]
Performance Monitor Configuration
• Flow Record: what metrics to collect?
• Flow Exporter (optional): where to send data?
• Flow Monitor: ties the Flow Record and Flow Exporter together
• Class-map: what traffic to monitor?
• Policy-map: references the class-map and flow monitor
• Interface: policy applied inbound or outbound
Flow Record defines what metrics to collect and how to collect them (just like in Flexible NetFlow configuration)
Performance monitor introduces
flow record type performance-monitor
Match field types perform aggregation on that field, i.e.
match ipv4 source address
match ipv4 destination address
will create a unique entry per src-dst combination
Example Configuration – Flow Record
flow record type performance-monitor default-rtp-pt-name
 match ipv4 protocol
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 match transport rtp ssrc
 match policy performance-monitor classification hierarchy
 collect routing forwarding-status
 collect ipv4 dscp
 collect ipv4 ttl
 collect transport packets expected counter
 collect transport packets lost counter
 collect transport packets lost rate
 collect transport event packet-loss counter
 collect transport rtp jitter mean
 collect transport rtp jitter minimum
 collect transport rtp jitter maximum
 collect interface input
 collect interface output
 collect counter bytes
 collect counter packets
 collect counter bytes rate
 collect timestamp interval
 collect application name
 collect application media bytes counter
 collect application media bytes rate
 collect application media packets counter
 collect application media packets rate
 collect application media event
 collect monitor event
 collect transport rtp payload-type
!
flow monitor pulls together the flow record, exporter, and specific cache management configurations (just like Flexible NetFlow)
Special type of flow monitorflow monitor type performance-monitor
(optional) Flow exporter configures how the NetFlow exporting is done
Policy map specifies which traffic to monitor (via class-map), how to monitor (via monitor), and any per-class threshold crossing actions
Typed policy-map (performance monitor)
Example Configuration – monitor
flow exporter mn-campus-samplicator
 destination 10.1.160.37
 source Loopback0
 transport udp 2055
 template data timeout 60
 option c3pl-class-table
 option c3pl-policy-table
 option interface-table
 option application-table
 option sub-application-table
!
flow monitor type performance-monitor default-rtp-pt-name
 record default-rtp-pt-name
 exporter mn-campus-samplicator
 cache timeout synchronized 10 export-spread 5
 history size 10
!
policy-map type performance-monitor rtp-traffic-name
 class VOIP
  flow monitor default-rtp-pt-name
  react 1 transport-packets-lost-rate
   threshold value ge 1.00
   alarm severity error
   action syslog
 class VIDEO-CONF
  flow monitor default-rtp-pt-name
Example Configuration – Interface attachment
• Finally, policy map is applied to interface
• Note typed policy is used
• Direction of monitoring (input|output) selectable for some platforms
interface gigabitEthernet 0/1
service-policy type performance-monitor input rtp-traffic-name
• AQM provides deeper insight into the media flows that are processed by the CUBE / Voice gateways
ISRG2, c8xx 15.3(3)M
• Available via MIB, CDR and performance monitor
Audio Quality Metrics (AQM) on CUBE
(Diagram: CUBE / voice gateway between a PRI leg and a SIP/media leg)
‘media monitoring’ configuration under ‘voice service voip’ or dial-peer
Controls generation of metrics on CUBE/VG
To export via NetFlow, regular performance monitor configuration –just include the AQM fields
MIBCISCO-VOICE-DIAL-CONTROL-MIB
Example Configuration –AQM performance monitor
voice service voip
 media monitoring [num] persist
 ! num is number of channels used to monitor
 media statistics
 ! delay calc, MOS etc.

OR

dial-peer voice [tag] voip
 media monitoring
!
flow record type performance-monitor aqm
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 collect application voice number called
 collect application voice number calling
 …
Regular performance monitoring configuration continues
VQM provides deeper insight into the video flows (H.264) that are crossing routers
ISRG2, c8xx 15.3(3)M
Available via performance monitor
Video Quality Metrics (VQM) on ISR G2
‘no shut’ under ‘video monitoring’ global config.
To export via NetFlow, regular performance monitor configuration – just include the VQM fields
Example Configuration –VQM performance monitor
video monitoring
 maximum-sessions 10
 no shutdown
!
flow record type performance-monitor vqm-rec
 match ipv4 protocol
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 match transport rtp ssrc
 collect application video resolution [ width | height ] last
 collect application video frame rate
 collect application video payload bitrate [ average | fluctuation ]
 collect application video frame [ I | STR | LTR | super-P | NR ] counter frames
 collect application video frame [ I | STR | LTR | super-P | NR ] counter packets [lost]
 collect application video frame [ I | STR | LTR | super-P | NR ] counter bytes
 collect application video frame [ I | STR | LTR | super-P | NR ] slice-quantization-level
 collect application video eMOS compression [ network | bitstream ]
 collect application video eMOS packet-loss [ network | bitstream ]
 collect application video frame percentage damaged
 collect application video scene-complexity
 collect application video level-of-motion
 collect transport rtp sequence-number [ last ]
Individual monitor intervals:
• show performance monitor history
Aggregation over all stored intervals:
• show performance monitor status
show commands

1861-AA0213#show performance monitor history
Load for five secs: 20%/16%; one minute: 8%; five minutes: 4%
Time source is NTP, 01:52:12.052 EST Fri Oct 29 2010

Codes: * - field is not configurable under flow record
       NA - field is not applicable for configured parameters

Match: ipv4 src addr = 10.1.160.19, ipv4 dst addr = 10.1.3.5, ipv4 prot = udp,
       trns src port = 32760, trns dst port = 22802, SSRC = 1717646439
Policy: all-apps, Class: telepresence-CS4, Interface: FastEthernet0/0, Direction: input

start time 01:51:31
============
*history bucket number : 1
*counter flow : 1
 counter bytes : 162329
 counter bytes rate (Bps) : 5410
*counter bytes rate per flow (Bps) : 5410
*counter bytes rate per flow min (Bps) : 5410
*counter bytes rate per flow max (Bps) : 5410
 counter packets : 773
*counter packets rate per flow : 25
 counter packets dropped : 0
 routing forwarding-status reason : Unknown
 interface input : Fa0/0
 interface output : Vl1000
 monitor event : false
 ipv4 dscp : 32
 ipv4 ttl : 58
 application media bytes counter : 146869
 application media packets counter : 773
 application media bytes rate (Bps) : 4895
*application media bytes rate per flow (Bps) : 4895
*application media bytes rate per flow min (Bps) : 4895
*application media bytes rate per flow max (Bps) : 4895
 application media packets rate (pps) : 25
 application media event : Normal
*transport rtp flow count : 1
 transport rtp jitter mean (usec) : 476
 transport rtp jitter minimum (usec) : 1
 transport rtp jitter maximum (usec) : 1997
*transport rtp payload type : 96
 transport event packet-loss counter : 0
*transport event packet-loss counter min : 0
*transport event packet-loss counter max : 0
 transport packets expected counter : 773
 transport packets lost counter : 0
*transport packets lost counter minimum : 0
*transport packets lost counter maximum : 0
 transport packets lost rate ( % ) : 0.00
*transport packets lost rate min ( % ) : 0.00
*transport packets lost rate max ( % ) : 0.00
for reference
Where do I want my data sent?
Router(config)# flow exporter my-exporter
Router(config-flow-exporter)# destination 1.1.1.1

What data do I want to meter?
Router(config)# flow record my-record
Router(config-flow-record)# match ipv4 destination address
Router(config-flow-record)# match ipv4 source address
Router(config-flow-record)# collect counter bytes

How do I want to cache information?
Router(config)# flow monitor my-monitor
Router(config-flow-monitor)# exporter my-exporter
Router(config-flow-monitor)# record my-record

Which interface do I want to monitor?
Router(config)# interface s3/0
Router(config-if)# ip flow monitor my-monitor input
1. Configure the Exporter
2. Configure the Flow Record
3. Configure the Flow Monitor
4. Apply to an Interface
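• Once attached, the configuration can be verified from exec mode; two standard checks (using the names from the example above):

Router# show flow monitor my-monitor cache
Router# show flow exporter my-exporter statistics

The first shows the active flow cache entries, the second confirms that export packets are actually being sent to the collector.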
Service Planning
FNF Configuration - Example
• How is my flow being classified?
• Did this class drop traffic?
• QoS queue performance (drops)
• QoS class structure class-map and policy map names
NetFlow QoS Reporting

Flow exporter:
 option c3pl-class-table timeout <timeout>
 option c3pl-policy-table timeout <timeout>
QoS queue performance:
flow record type performance-monitor qos-record
 match policy qos queue index
 collect policy qos queue drops
(or)
flow record qos-record
 match policy qos queue index
 collect policy qos queue drops
Flow to QoS association:
flow record type performance-monitor A
 match connection client ipv4 address
 match connection server ipv4 address
 match connection server transport port
 collect policy qos class hierarchy
 collect policy qos queue id
 …
(or)
flow record qos-class-record
 match ipv4 source address
 match ipv4 destination address
 collect policy qos classification hierarchy
 collect policy qos queue index
 …
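• As with the RTP examples earlier, a QoS record only takes effect once it is referenced from a monitor and attached via a typed policy map; a minimal sketch (monitor, policy, and interface names below are illustrative):

flow monitor type performance-monitor qos-mon
 record qos-record
!
policy-map type performance-monitor qos-pol
 class class-default
  flow monitor qos-mon
!
interface GigabitEthernet0/1
 service-policy type performance-monitor input qos-pol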
show ip traffic
R1#show ip traffic [interface <interface>]
IP statistics:
Rcvd: 1117 total, 1116 local destination
0 format errors, 0 checksum errors, 0 bad hop count
0 unknown protocol, 0 not a gateway
0 security failures, 0 bad options, 0 with options
Opts: 0 end, 0 nop, 0 basic security, 0 loose source route
0 timestamp, 0 extended security, 0 record route
0 stream ID, 0 strict source route, 0 alert, 0 cipso, 0 ump
0 other
Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
0 fragmented, 0 fragments, 0 couldn't fragment
Bcast: 58 received, 0 sent
Mcast: 442 received, 221 sent
Sent: 842 generated, 1195 forwarded
Drop: 1 encapsulation failed, 0 unresolved, 0 no adjacency
0 no route, 0 unicast RPF, 0 forced drop
0 options denied
Drop: 0 packets with source IP address zero
Drop: 0 packets with internal loop back IP address
0 physical broadcast
Reinj: 0 in input feature path, 0 in output feature path
ICMP statistics:
Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 0 unreachable
0 echo, 0 echo reply, 0 mask requests, 0 mask replies, 0 quench
0 parameter, 0 timestamp, 0 timestamp replies, 0 info request, 0 other
0 irdp solicitations, 0 irdp advertisements
0 time exceeded, 0 info replies
Sent: 0 redirects, 0 unreachable, 0 echo, 0 echo reply
0 mask requests, 0 mask replies, 0 quench, 0 timestamp, 0 timestamp replies
0 info reply, 0 time exceeded, 0 parameter problem
0 irdp solicitations, 0 irdp advertisements
UDP statistics:
Rcvd: 58 total, 0 checksum errors, 58 no port 0 finput
Sent: 0 total, 0 forwarded broadcasts
BGP statistics:
Rcvd: 0 total, 0 opens, 0 notifications, 0 updates
0 keepalives, 0 route-refresh, 0 unrecognized
Sent: 0 total, 0 opens, 0 notifications, 0 updates
0 keepalives, 0 route-refresh
TCP statistics:
Rcvd: 1471 total, 0 checksum errors, 85 no port
Sent: 597 total
..
OSPF statistics:
Last clearing of OSPF traffic counters never
Rcvd: 460 total, 0 checksum errors
414 hello, 8 database desc, 3 link state req
22 link state updates, 13 link state acks
Sent: 245 total
199 hello, 12 database desc, 2 link state req
21 link state updates, 12 link state acks
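• Note the counters above are cumulative since boot (or the last clear); to establish a clean baseline before a measurement window, they can be reset from exec mode:

Router# clear ip traffic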