
Network State Awareness and Troubleshooting · • This session is about basic network troubleshooting, focusing on fault detection & isolation ... eg. prefix reachability, ... (NTP

Aug 29, 2018


dangkhue
Transcript
Page 1: Network State Awareness and Troubleshooting · • This session is about basic network troubleshooting, focusing on fault detection & isolation ... eg. prefix reachability, ... (NTP
Page 2

Network State Awareness and Troubleshooting

Aamer Akhter / [email protected]

BRKARC-2025

Page 3

Agenda

• Troubleshooting Methodology

• Packet Forwarding Review

• Data Plane

• Active Monitoring

• Passive Flow Monitoring

• QoS

• Control Plane

• Logging

• Routing Protocol Stability

• Getting Started

Page 4

Keeping Focused: What This Session is About

• This session is about basic network troubleshooting, focusing on fault detection & isolation

• Some non-Cisco specifics

• For context, we will cover some basic methodologies and functional elements of network behavior

• This session is NOT about

  • Architectures of specific platforms

  • Data Center technologies

• This is the 90 min tour. ;-)

Page 5

The Big Picture

[Figure: a client and a server on either side of the network, with a network operator and an application operator. Speech bubbles: "Not happy", "It's not the network", "It's the network", "Is it Monday?", "Pings fine!", "Can't ping it.", "Internet's down.", "Somebody's downloading something. (?)"]

Page 6

Some More (network) Detail

• A lot of stuff going on

• Multiple networks

• Multiple applications

• Multiple layered services

• Mis-information / inconsistency

[Figure: network diagram with a client ("Not happy") on the LAN, Server A, Server B, the Enterprise DC, the Enterprise WAN, ISP A, the Internet, and supporting services (DNS, DHCP, 802.1x).]

Page 7

… and it keeps on going

• Redundant paths / ECMP / LAG

• Overlays

• Load balancers

• Firewalls

• NATs

[Figure: the same network diagram with ISP B added: a client ("Not happy") on the LAN, Server A, Server B, the Enterprise DC, the Enterprise WAN, ISP A, ISP B, the Internet, and supporting services (DNS, DHCP, 802.1x).]

Page 8

Why network state awareness?

• Quick detection of hard failures

• Early warning for

• soft failures

• performance issues

• and tomorrow's problems

• Faster problem resolution

• Greater confidence in network by users and application operators

Page 9

Think Like a Network Detective

Find the Suspects / Question Suspects / Improve

Be Prepared

Page 10

Control Plane & Data Plane

• Control Plane

  • Processes variety of information sources and policies, creates routing information base (RIB)

  • Best known intention w/o actual packet in hand

• Data Plane

  • The actual forwarding process (might be SW or HW based)

  • Granted some decision flexibility

  • Driven by arriving packet details, traffic conditions etc.

[Figure: a packet crossing the data plane between Int A, Int B and Int C, with the control plane above it. Labels around the control plane: Routing Protocol(s), APIs, Statics, PfR, "Gossip from other routers", "Passive Measurements", "Admin Edict".]

Control plane commands: show ip route, show ip bgp, show ip ospf, show ip policy

Data plane commands: show ip cef, show mpls forwarding…, show policy-map int…, show interface, show flow monitor

Data plane instrumentation: ifmib, *Flow, CbQoS

Page 11

Data Plane Decision Flexibility

• Control plane: condenses options driven by policies and (relatively) slower-moving, aggregated information, e.g. prefix reachability, interface state

• Data plane responds to packet conditions

  • Destination prefix to egress interface matching

  • Multi-path (ECMP / LAG) member selection

  • Interface congestion

  • QoS class state

  • Access Lists

  • Packet processing fields (TTL expire, etc)

  • IPv4 fragmentation, etc
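The first two data plane decisions listed on this slide, destination prefix matching and multi-path member selection, can be sketched in a few lines. This is only an illustration: the `FIB` contents, interface names and the CRC32 hash are assumptions made for the example, and real platforms use their own hardware-specific hash functions and inputs.

```python
import ipaddress
import zlib

# Hypothetical FIB: prefix -> list of equal-cost egress interfaces.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): ["Gi0/0"],
    ipaddress.ip_network("10.0.0.0/8"): ["Gi0/1"],
    ipaddress.ip_network("10.1.0.0/16"): ["Gi0/2", "Gi0/3"],
}

def longest_prefix_match(dst):
    """Return the most specific FIB prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    return max(matches, key=lambda net: net.prefixlen)

def select_egress(src, dst, proto, sport, dport):
    """Longest match first, then hash the 5-tuple to pick one ECMP member."""
    members = FIB[longest_prefix_match(dst)]
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    # CRC32 stands in for the platform hash (an assumption for this sketch).
    return members[zlib.crc32(key) % len(members)]
```

Because the member is chosen from the packet's own header fields, two flows between the same pair of hosts can leave on different links, while every packet of one flow stays on the same link.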

Page 12

Network as a System: Independent Decisions

• Each network device makes an independent forwarding decision

  • Explicit local / domain policies

  • Device perspective might not be symmetric

  • Data plane flexibility

• Asymmetric routing

  • Generally happens at WAN-edge and admin boundaries (traffic engineering)

[Figure: hosts A and B connected through routers R1 to R6. Part of the path is labeled "your network", the rest "You don't control". Callouts: "Congested link", "R5 is doing ECMP hash".]

Page 13

Data Plane

Page 14

User / Agent Checks

• Treat network as a black box: are your beacon services working?

  • Synthetic service check (HTTP, DNS, etc.)

  • Ping (not all remotes will respond)

• Data plane is exercised and tested

  • Variety = better coverage (multiple IP addresses / L4 ports per location)

• Validate similar treatment (QoS) as real user traffic

• Uptime and performance (loss, latency) metrics

• Look for patterns, changes from normal. All down vs some down.

• Capture and validate real user (human) incidents. What got missed?

• Use wisely: network and server resources consumed
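The "uptime and performance metrics" and "all down vs some down" points can be illustrated with a small summary over probe results. This is a sketch under assumptions: the beacon names and the sample format (round-trip times in ms, `None` for a lost probe) are invented for the example, and collecting the samples is left out.

```python
from statistics import mean

def summarize(samples):
    """samples: list of RTTs in ms, with None for each lost probe."""
    answered = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(answered)) / len(samples)
    avg_ms = mean(answered) if answered else None
    return {"loss_pct": loss_pct, "avg_ms": avg_ms}

def classify(results):
    """results: dict of beacon name -> samples. Reports the outage pattern."""
    down = [name for name, s in results.items()
            if summarize(s)["loss_pct"] == 100.0]
    if not down:
        return "all up"
    if len(down) == len(results):
        return "all down"
    return "some down: " + ", ".join(sorted(down))
```

An "all down" result points toward something shared (the probe host itself, its uplink), while "some down" narrows the search to what the failing beacons have in common.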

[Figure: hosts A and B connected through routers R1, R2, R3, R5 and R6]

Page 15

IP SLA: Synthetic Traffic Measurements

Uses: Network Performance Monitoring, Service Level Agreement (SLA) Monitoring, Network Assessment, Multiprotocol Label Switching (MPLS) Monitoring, VoIP Monitoring, Availability, Troubleshooting

Measurement Metrics: Latency, Network Jitter, Dist. of Stats, Connectivity, Packet Loss

Operations: Jitter, FTP, DNS, DHCP, DLSW, ICMP, UDP, TCP, HTTP, LDP, H.323, SIP, RTP

[Figure: an IP SLA source (Cisco IOS Software) sends actively generated traffic to measure the network toward a destination and an IP SLA responder (Cisco IOS Software); results are exposed as MIB data.]

Page 16

IPSLA Multicast Support

• IPSLA Multicast

  • One Way Delay (NTP req)

  • One Way Jitter

  • Packet Loss

• Configuration is on IP SLA Sender

• Have to specify each responder explicitly in endpoint-list

• Responder becomes mcast receiver, IGMPv3 (G) and (S,G) behavior

• ISRG2, ISR4451X, ASR1k, CSR1000v, cat4k(sup7/6), c7600

SLAsender(config)#ip sla endpoint-list type ip mylist
 ip-address 172.16.1.2,172.17.1.2 port 3800
SLAsender(config)#ip sla 1
 udp-jitter 224.1.1.1 4000 endpoint-list mylist source-ip 172.16.1.1 source-port 4500 num-packets 100 interval 25

[Figure: unicast control and multicast traffic between the sender and the responders]

Reference

Page 17

IPSLA and Relatives

• IPSLA on router/switch – good use of deployed infra

• May not be true check of data plane (shadow router)

• Resource contention (CPU) – group scheduling

• Simplistic service checks

• User end-system based agent software

• Uses true stack (OS, browser) on PC

• Truly end to end (could include WiFi)

• Includes end system resource view

• BYOD deployment challenges

• Dedicated Agent

• Mixture of benefits from end-system and network

• True stack can be challenging

Page 18

show interface

• Classic command

• Check up status

• Monitor in/out bit/packet changes

# show interface
GigabitEthernet1 is up, line protocol is up
  Hardware is CSR vNIC, address is 000c.291a.7f97 (bia 000c.291a.7f97)
  Internet address is 192.168.225.130/24
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full Duplex, 1000Mbps, link type is auto, media type is RJ45
  output flow-control is unsupported, input flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:05:35, output 00:09:58, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     25349 packets input, 2381158 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     3958 packets output, 312408 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     56 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

snmp ifmib ifindex persist
snmp ifmib trap throttle
interface <intf>
 [no] logging event link-status
 [no] snmp trap link-status
 load-interval 30
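"Monitor in/out bit/packet changes" usually means sampling a byte counter twice and turning the difference into a rate. A minimal sketch follows, assuming the two samples come from the same interface counter and that the counter wrapped at most once between samples; the 32-bit default is an assumption that matches classic ifInOctets-style counters, and `counter_bits=64` would suit high-capacity counters.

```python
def rate_bps(octets_t0, octets_t1, seconds, counter_bits=32):
    """Bits per second between two byte-counter samples.

    The modulo handles a single counter wrap; more than one wrap in the
    interval cannot be detected from two samples alone.
    """
    delta = (octets_t1 - octets_t0) % (2 ** counter_bits)
    return delta * 8 / seconds
```

For example, if the input byte counter moves from 2381158 to 2384908 over a 30 second interval, the function reports 1000 bits/sec.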

Page 19

traceroute

• Understand the limitations

• Sends 3 packets (default) at each TTL

• Implementations

  • Linux/Cisco: UDP (ICMP and TCP-SYN are Linux optional)

    • UDP DST port # used to keep track of packets, increments per packet. Initial = 33434 (default)

    • SRC port #: randomized (Linux), incrementing per packet (IOS)

    • Widest dispersion against possibilities. Difficult to understand though.

  • Linux (GNU inetutils-traceroute)

    • UDP DST port # increments per TTL (not per packet)

    • SRC port is random but fixed per entire run

    • Narrower dispersion. Story might be misleading.

  • Windows: ICMP Echo request

    • ICMP blocked frequently

• IOS ICMP responses limited to 1 per 500ms

  • Configurable via: ip icmp rate-limit unreachable <ms>

• Internet: aka the TCP/80 network
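The difference between the two UDP port behaviors can be made concrete with a small sketch. It assumes the first probe uses the base port 33434 exactly and that three probes are sent per TTL; individual implementations may start one port higher, so treat the absolute numbers as illustrative.

```python
BASE_PORT = 33434  # default initial UDP destination port

def dst_port_per_packet(ttl, probe, probes_per_ttl=3):
    """Port increments for every probe sent (ttl is 1-based, probe is 0-based)."""
    return BASE_PORT + (ttl - 1) * probes_per_ttl + probe

def dst_port_per_ttl(ttl, probe):
    """Port increments once per TTL, so all probes of a hop share one port."""
    return BASE_PORT + (ttl - 1)

def ports_for_hop(port_fn, ttl, probes_per_ttl=3):
    """Destination ports used by the probes of a single hop."""
    return [port_fn(ttl, p) for p in range(probes_per_ttl)]
```

With per-packet increments every probe presents a different 5-tuple to ECMP hashing, so the three probes of one hop can take three different paths. With per-TTL increments the three probes of a hop hash identically, which hides alternate paths.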

Page 20

Unix traceroute

• Multiple path options

• Topology ‘shortcuts’ (same router seen at diff hop)

• Ultimately all paths result in similar e2e delay

$ traceroute 62.2.88.172

traceroute to 62.2.88.172 (62.2.88.172), 30 hops max, 60 byte packets

1 152.22.242.65 (152.22.242.65) 1.044 ms 1.371 ms 1.585 ms

2 152.22.240.8 (152.22.240.8) 0.219 ms 0.328 ms 0.327 ms

3 128.109.70.9 (128.109.70.9) 1.066 ms 1.059 ms 1.168 ms

4 rtp7600-gw-to-dep7600-gw2.ncren.net (128.109.70.137) 1.634 ms 1.628 ms 1.736 ms

5 rlasr-gw-link1-to-rtp7600-gw.ncren.net (128.109.9.17) 5.354 ms 5.446 ms 5.557 ms

6 128.109.9.117 (128.109.9.117) 5.671 ms 128.109.9.170 (128.109.9.170) 7.141 ms 128.109.9.117 (128.109.9.117) 5.433 ms

7 wscrs-gw-to-ws-a1a-ip-asr-gw-sec.ncren.net (128.109.1.105) 9.174 ms 128.109.1.209 (128.109.1.209) 8.256 ms 6.397 ms

8 dcp-brdr-03.inet.qwest.net (205.171.251.110) 18.414 ms chr-edge-03.inet.qwest.net (65.114.0.205) 27.353 ms 27.438 ms

9 dcp-brdr-03.inet.qwest.net (205.171.251.110) 21.739 ms 63-235-40-106.dia.static.qwest.net (63.235.40.106) 17.750 ms

dcp-brdr-03.inet.qwest.net (205.171.251.110) 22.450 ms

10 63-235-40-106.dia.static.qwest.net (63.235.40.106) 22.531 ms 22.516 ms 84-116-130-173.aorta.net (84.116.130.173) 140.738 ms

11 nl-ams02a-rd1-te0-2-0-2.aorta.net (84.116.130.65) 140.831 ms 140.816 ms 84-116-130-173.aorta.net (84.116.130.173) 144.819 ms

12 nl-ams02a-rd1-te0-2-0-2.aorta.net (84.116.130.65) 144.074 ms 144.761 ms 84-116-130-58.aorta.net (84.116.130.58) 138.455 ms

13 84-116-130-58.aorta.net (84.116.130.58) 141.844 ms 141.924 ms 142.459 ms

14 84.116.204.234 (84.116.204.234) 145.603 ms 145.891 ms 145.987 ms

15 * * *

16 62-2-88-172.static.cablecom.ch (62.2.88.172) 268.281 ms 268.245 ms 268.176 ms

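Responder pattern per hop (one letter per distinct responder, one position per probe):

Hop  Probes  Note
1    AAA
2    BBB
3    CCC
4    DDD
5    EEE
6    FGF
7    HII
8    JKK     +10ms (unsustained)
9    JLJ
10   LLM     +120ms (sustained), Atlantic crossing
11   NNM
12   NNO
13   PPP
14   QQQ
15   ***
16   RRR     ~268ms (all three)

Hops 15-16: filter + > 100 ms delay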

Reference
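Spotting hops where more than one router answered is easy to automate. The sketch below parses output in the `name (ip) rtt ms` layout shown on this slide; the function names are invented for the example, and wrapped continuation lines (those that do not start with a hop number) are simply skipped.

```python
import re

IP_RE = re.compile(r"\((\d+\.\d+\.\d+\.\d+)\)")

def responders(hop_line):
    """Distinct responder IPs on one traceroute output line, in order seen."""
    seen = []
    for ip in IP_RE.findall(hop_line):
        if ip not in seen:
            seen.append(ip)
    return seen

def multipath_hops(lines):
    """Map hop number -> responders, for hops answered by several routers."""
    result = {}
    for line in lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # header or wrapped continuation line
        ips = responders(line)
        if len(ips) > 1:
            result[int(fields[0])] = ips
    return result
```

Several responders at one hop indicate load sharing somewhere upstream of that hop; they do not by themselves indicate a fault.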

Page 21

Unix inetutils traceroute

• Narrower view (no alternate paths directly seen)

• Repeating nodes suggest multipath, or (unlikely) routing issue

$ inetutils-traceroute --resolve-hostname 62.2.88.172
traceroute to 62.2.88.172 (62.2.88.172), 64 hops max
  1  152.22.242.65 (152.22.242.65)  0.783ms  0.727ms  0.798ms
  2  152.22.240.8 (152.22.240.8)  0.226ms  0.228ms  0.221ms
  3  128.109.70.9 (128.109.70.9)  0.967ms  0.980ms  0.962ms
  4  128.109.70.137 (rtp7600-gw-to-dep7600-gw2.ncren.net)  1.576ms  1.598ms  1.567ms
  5  128.109.9.17 (rlasr-gw-link1-to-rtp7600-gw.ncren.net)  5.149ms  5.140ms  5.126ms
  6  128.109.9.166 (128.109.9.166)  7.113ms  7.098ms  7.306ms
  7  128.109.1.209 (128.109.1.209)  7.835ms  8.326ms  7.958ms
  8  65.114.0.205 (chr-edge-03.inet.qwest.net)  19.944ms  9.299ms  40.372ms
  9  63.235.40.106 (63-235-40-106.dia.static.qwest.net)  18.442ms  18.412ms  18.432ms
 10  63.235.40.106 (63-235-40-106.dia.static.qwest.net)  22.424ms  22.391ms  75.960ms
 11  84.116.130.173 (84-116-130-173.aorta.net)  145.434ms  146.301ms  145.445ms
 12  84.116.130.58 (84-116-130-58.aorta.net)  137.583ms  137.556ms  137.661ms
 13  84.116.130.58 (84-116-130-58.aorta.net)  142.476ms  141.886ms  141.819ms
 14  84.116.204.234 (84.116.204.234)  144.841ms  145.034ms  144.964ms
 15  * * *
 16  62.2.88.172 (62-2-88-172.static.cablecom.ch)  287.318ms  176.670ms  254.237ms

Packets for hops 9 and 12 took a ‘shortcut’ and packets for hops 10 and 13 went the long way

Reference

Page 22

lft

• lft ‘layer 4 traceroute’ dynamically adjusts to responses

• Firewall detection, whois and AS lookup integrated

• Narrower packet changes, so narrower multi-path

$ sudo lft -ENA 62.2.88.172

Tracing ________________________________________________________________.

TTL LFT trace to 62-2-88-172.static.cablecom.ch (62.2.88.172):80/tcp
 1  [AS81] [NCREN-B22] 152.22.242.65 20.1/17.2ms
 2  [AS81] [NCREN-B22] 152.22.240.8 20.1/20.1ms
 3  [AS81] [CONCERT] 128.109.70.9 20.1/20.1ms
 4  [AS81] [CONCERT] rtp7600-gw-to-dep7600-gw2.ncren.net (128.109.70.137) 20.1/20.1ms
 5  [AS81] [CONCERT] rlasr-gw-link1-to-rtp7600-gw.ncren.net (128.109.9.17) 20.1/20.1ms
 6  [AS81] [CONCERT] 128.109.9.117 20.1/20.1ms
 7  [AS209] [unknown] chr-edge-03.inet.qwest.net (65.121.156.209) 20.1/19.5ms
 8  [AS209] [QWEST-INET-35] dcp-brdr-03.inet.qwest.net (205.171.251.110) 20.1/18.4ms
 9  [AS209] [QWEST-INET-17] 63-235-40-106.dia.static.qwest.net (63.235.40.106) 20.1/60.3ms
10  [AS6830] [84-RIPE/LGI-Infrastructure] 84-116-130-173.aorta.net (84.116.130.173) 160.7/160.7ms
11  [AS6830] [84-RIPE/LGI-Infrastructure] nl-ams02a-rd1-te0-2-0-2.aorta.net (84.116.130.65) 160.7/160.7ms
12  [AS6830] [84-RIPE/LGI-Infrastructure] 84-116-130-58.aorta.net (84.116.130.58) 140.6/140.6ms
**  [firewall] the next gateway may statefully inspect packets
13  [AS6830] [84-RIPE/LGI-Infrastructure] 84.116.204.234 160.7/160.6ms
**  [neglected] no reply packets received from TTL 14
15 * [AS6830] [RIPE-C3/CC-HO841-NET] [target] 62-2-88-172.static.cablecom.ch (62.2.88.172):80 160.7ms

Used tcp/80 SYN

Reference

Page 23

mtr

• Interactive combined traceroute and ping

• Gives a sense of health of path (loss, delay Standard Deviation)

• Narrow path view

Reference

aakhter-nlr-ubuntu-01 (0.0.0.0)    Sat May 30 18:57:09 2015
Keys: Help  Display mode  Restart statistics  Order of fields  quit
                                                  Packets        Pings
 Host                                            Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. 152.22.242.65                                 0.0%  145    0.8    0.9    0.7   10.0    0.8
 2. 152.22.240.8                                  0.0%  145    0.3    0.2    0.2    0.3    0.0
 3. 128.109.70.9                                  0.0%  145    1.0    3.3    1.0  182.3   17.2
 4. rtp7600-gw-to-dep7600-gw2.ncren.net           1.0%  145    9.2    4.1    1.6  203.4   18.6
 5. rlasr-gw-link1-to-rtp7600-gw.ncren.net        0.0%  145    5.3    5.3    5.1    6.8    0.2
 6. 128.109.9.166                                 0.0%  145    7.1    7.3    7.1   16.1    0.8
 7. wscrs-gw-to-ws-a1a-ip-asr-gw-sec.ncren.net    0.0%  145    6.8    8.3    6.2   10.6    1.0
 8. chr-edge-03.inet.qwest.net                    0.0%  145    9.4   12.3    9.3   62.1    9.5
 9. dcp-brdr-03.inet.qwest.net                    0.0%  145   21.8   22.8   21.7   70.7    5.5
10. 63-235-40-106.dia.static.qwest.net            0.0%  145   21.8   24.5   21.7   86.1   10.6
11. 84-116-130-173.aorta.net                      0.0%  145  144.8  145.0  144.7  152.9    1.0
12. nl-ams02a-rd1-te0-2-0-2.aorta.net             0.0%  145  144.1  145.5  144.0  165.4    3.7
13. 84-116-130-58.aorta.net                       5.0%  144  142.9  142.3  142.0  145.6    0.4
14. 84.116.204.234                                5.0%  144  145.1  145.1  144.9  145.3    0.0
15. 217-168-62-150.static.cablecom.ch             5.0%  144  145.9  146.1  145.2  164.3    1.9
16. 62-2-88-172.static.cablecom.ch                5.0%  144  313.0  260.3  152.6  508.0   80.0

Callouts:

• Hops 3-4: just local noise, no carry over to later hops

• Hops 13-16: sustained loss. Likely something wrong 12->13, or way back

• Hop 16: note variability, probably just the end system
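The reading rule behind these callouts (loss only matters if it carries over to every later hop) can be written down as a short sketch. The function name and the zero threshold are choices made for this example.

```python
def first_sustained_loss(loss_by_hop, threshold=0.0):
    """Return the 1-based hop where loss starts and persists to the last hop.

    Loss that disappears at a later hop is treated as local noise, for
    example a router rate-limiting its own ICMP replies. Returns None when
    no loss reaches the final hop.
    """
    first = None
    for hop, loss in enumerate(loss_by_hop, start=1):
        if loss > threshold:
            if first is None:
                first = hop
        else:
            first = None  # loss did not carry over, so reset
    return first
```

Fed the Loss% column of the mtr output on this slide, the isolated 1.0% at hop 4 is ignored and hop 13 is reported, which matches the "something wrong 12->13" reading.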

Page 24

Follow the Flow with NetFlow

• Per-Node: Data plane observations and decisions captured

• Src/dst mac/IP/port#s, DSCP values, in/out interfaces, etc.

• Network view: flows centrally analyzed by a NetFlow collector/analyzer

• Biggest value: strategically placed partial views (e.g. WAN edge)

[Figure: hosts A and B connected through routers R1 to R6, with flows exported to a NetFlow collector (LiveAction)]

Page 25

NetFlow: What Is It?

• Developed and patented at Cisco Systems in 1996

• NetFlow is the de facto standard for acquiring IP operational data

• Standardized in IETF via IPFIX

• Provides network and security monitoring, network planning, traffic analysis, and IP accounting

• Packet capture is like a wire tap

• NetFlow is like a phone bill

Network World article: NetFlow Adoption on the Rise

http://www.networkworld.com/newsletters/nsm/2005/0314nsm1.html

Page 26

Flexible NetFlow: Multiple Monitors with Unique Key Fields

Flow Monitor 1 (Traffic Analysis Cache)

Key Fields          Packet 1
Source IP           3.3.3.3
Destination IP      2.2.2.2
Source Port         23
Destination Port    22078
Layer 3 Protocol    TCP - 6
TOS Byte            0
Input Interface     Ethernet 0

Non-Key Fields: Packets, Bytes, Timestamps, Next Hop Address

Src. IP   Dest. IP  Source Port  Dest. Port  Protocol  TOS  Input I/F  …  Pkts
3.3.3.3   2.2.2.2   23           22078       6         0    E0         …  1100

Flow Monitor 2 (Security Analysis Cache)

Key Fields          Packet 1
Source IP           3.3.3.3
Dest IP             2.2.2.2
Input Interface     Ethernet 0
SYN Flag            0

Non-Key Fields: Packets, Timestamps

Source IP  Dest. IP  Input I/F  Flag  …  Pkts
3.3.3.3    2.2.2.2   E0         0     …  11000
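The key field / non-key field split can be modeled as a dictionary whose key is the tuple of key fields and whose value accumulates the non-key counters. The sketch below is only a model of the idea: the field names, the sample packets and the `build_cache` helper are invented for illustration and are not how a router implements its cache.

```python
from collections import defaultdict

TRAFFIC_KEYS = ("src_ip", "dst_ip", "src_port", "dst_port",
                "protocol", "tos", "input_if")
SECURITY_KEYS = ("src_ip", "dst_ip", "input_if", "syn")

def build_cache(packets, key_fields):
    """Aggregate packets into flows: one entry per unique key-field tuple."""
    cache = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = tuple(pkt[f] for f in key_fields)
        cache[key]["packets"] += 1       # non-key field: packet counter
        cache[key]["bytes"] += pkt["length"]  # non-key field: byte counter
    return dict(cache)

# Two hypothetical packets that differ only in destination port.
base = {"src_ip": "3.3.3.3", "dst_ip": "2.2.2.2", "src_port": 23,
        "protocol": 6, "tos": 0, "input_if": "E0", "syn": 0}
packets = [
    dict(base, dst_port=22078, length=100),
    dict(base, dst_port=22079, length=200),
]
```

The same two packets produce two flows under the traffic-analysis keys but collapse into a single flow under the security keys, which is why each monitor keeps its own cache.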

Page 27

NetFlow Forwarding Status & Drop Count Fields

• Flexible NetFlow Forwarding Status field captures the forwarding status (and drop reason) for a flow.

• Drop Count increments on any explicit drop by the router

Page 28

Network Performance Monitor

Network nodes are able to discover & validate RTP, TCP and IP-CBR traffic on a hop-by-hop basis

À la carte metric (loss, latency, jitter etc.) selections, applied on operator-selected sets of traffic

Allows for fault isolation and network span validation

Per-application thresholds and alerting.

Page 29

Performance Monitor Information Elements

Media Monitoring

• RTP SSRC

• RTP Jitter (min/max/mean)

• Transport Counter (expected/loss)

• Media Counter (bytes/packets/rate)

• Media Event

• Collection interval

• TCP MSS

• TCP round-trip time

Application Response Time

• CND – Client Network Delay (min/max/sum)

• SND – Server Network Delay (min/max/sum)

• ND – Network Delay (min/max/sum)

• AD – Application Delay (min/max/sum)

• Total Response Time (min/max/sum)

• Total Transaction Time (min/max/sum)

• Number of New Connections

• Number of Late Responses

• Number of Responses by Response Time (7-bucket histogram)

• Number of Retransmissions

• Number of Transactions

• Client/Server Bytes

• Client/Server Packets

Other Metrics

• L3 counter (bytes/packets)

• Flow event

• Flow direction

• Client and server address

• Source and destination address

• Transport information

• Input and output interfaces

• L3 information (TTL, DSCP, TOS, etc.)

• Application information (from NBAR2)

• Monitoring class hierarchy

Page 30

NetFlow QoS Analysis

• How is my flow being classified?

• Did this QoS class drop traffic?

[Figure: flow 5-tuple, DPI/NBAR, QoS processing, DSCP; screenshots from Cisco Prime Infra and LiveAction]

Page 31

• QoS queue performance (drops)

• QoS class structure: class-map and policy-map names

NetFlow QoS Flow exporter:
 option c3pl-class-table timeout <timeout>
 option c3pl-policy-table timeout <timeout>

QoS Queue performance:
 flow record type performance-monitor qos-record
  match policy qos queue index
  collect policy qos queue drops
(or)
 flow record qos-record
  match policy qos queue index
  collect policy qos queue drops

Flow to QoS Association:
 flow record type performance-monitor A
  match connection client ipv4 address
  match connection server ipv4 address
  match connection server transport port
  collect policy qos class hierarchy
  collect policy qos queue id
  …
(or)
 flow record qos-class-record
  match ipv4 source address
  match ipv4 destination address
  collect policy qos classification hierarchy
  collect policy qos queue index
  …

Page 32

Enhanced NetFlow CLI Example

R1#show flow monitor qos-flow-monitor cache

IP FORWARDING STATUS: Forward

IPV4 SOURCE ADDRESS: 192.168.32.128

IPV4 DESTINATION ADDRESS: 224.0.0.5

INTERFACE INPUT: Null

INTERFACE OUTPUT: Gi2

FLOW DIRECTION: Output

IP DSCP: 0x30

policy qos class hierarchy: WAN-EDGE-4-CLASS: CONTROL

policy qos queue index: 1073741827

IP FORWARDING STATUS: Consume

IPV4 SOURCE ADDRESS: 192.168.225.128

IPV4 DESTINATION ADDRESS: 192.168.225.130

INTERFACE INPUT: Gi1

INTERFACE OUTPUT: Null

FLOW DIRECTION: Input

IP DSCP: 0x04

policy qos class hierarchy: WAN-EDGE-4-CLASS: class-default

policy qos queue index: 0

IP FORWARDING STATUS: Forward

IPV4 SOURCE ADDRESS: 192.168.225.128

IPV4 DESTINATION ADDRESS: 5.5.5.5

INTERFACE INPUT: Gi1

INTERFACE OUTPUT: Gi2

FLOW DIRECTION: Output

IP DSCP: 0x00

policy qos class hierarchy: WAN-EDGE-4-CLASS: class-default

policy qos queue index: 1073741829

Callouts:

• First flow: 0x30 = CS6: in ‘control’ class

• Second flow: My VTY session

• Third flow: Data traffic
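Reading hex DSCP values such as 0x30 off a flow cache is error-prone, so a small decoder helps. The sketch below maps a 6-bit DSCP value to its conventional name (CSn, AFxy, EF) and falls back to the decimal value otherwise; the function name is invented for this example.

```python
def dscp_name(dscp):
    """Map a 6-bit DSCP value to its common name, if it has one."""
    if dscp == 46:
        return "EF"
    cls = dscp >> 3            # top three bits: class
    low = dscp & 0b111         # bottom three bits
    if low == 0:
        return f"CS{cls}"      # class selector: low bits all zero
    if 1 <= cls <= 4 and low % 2 == 0:
        return f"AF{cls}{low >> 1}"  # assured forwarding: class + drop precedence
    return f"DSCP {dscp}"
```

For the three cache entries on this slide, 0x30 decodes to CS6, 0x00 to CS0 (best effort), and 0x04 has no standard name.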

platform qos performance-monitor

!

flow record qos-class-record

match routing forwarding-status

match ipv4 dscp

match ipv4 source address

match ipv4 destination address

match interface input

match interface output

match flow direction

collect policy qos classification hierarchy

collect policy qos queue index

!

flow monitor qos-flow-monitor

record qos-class-record

!

interface GigabitEthernet1

ip flow monitor qos-flow-monitor input

!

interface GigabitEthernet2

ip flow monitor qos-flow-monitor output

service-policy output WAN-EDGE-4-CLASS

Page 33

CBQoS MIB

• IOS QoS collects vital information regarding health of QoS classes

  • Pre and Post bytes, drops, etc

• Same class names from different routers can be compared

• For flow level analysis, use NetFlow QoS reporting

• ‘snmp mib persist CBQoS’

Page 34

Dedicated Protocol Analyzers

• Wireshark and other protocol analyzers are great

  • Detailed analysis for variety of protocols at deep level

• Dedicated probes are expensive to deploy pervasively

  • Operator has to make difficult judgment calls on where the problem is going to be, before it happens

  • Can be challenging after the fact: need on-site trained personnel.

Page 35

Embedded Packet Capture & Analyze

• Capture packets locally to buffer on router

• Store to flash, USB, FTP, TFTP for analysis in protocol analyzer

  • IOS XE Cat 4k Sup 7E & Sup 7L-E (XE 3.3.0 SG) include built in Wireshark decode capability

• Capture does not add traffic to network

LY-2851-8#monitor capture buffer pcap-buffer1 size 10000 max-size 1550

LY-2851-8#monitor capture point ip cef pcap-point1 g0/0 both

LY-2851-8#monitor capture point associate pcap-point1 pcap-buffer1

LY-2851-8#monitor capture point start pcap-point1

LY-2851-8#monitor capture point stop pcap-point1

LY-2851-8#monitor capture buffer pcap-buffer1 export ftp://10.17.0.252/images/test.cap

Gig0/0

Page 36

iOAM6 (prototype)

• Instrumented IPv6 extension header on user packets

  • vs. IPv4 record-route option header

  • v6 Ext Headers better designed

• Domain level control

• Minimal performance hit (handled in data plane)

• Packets continue on regular path

• Instrumentation

  • Packet sequence numbers => detect packet loss

  • Time stamps => one way delay

  • Node and ingress/egress interface names => path recording

• Send interest to [email protected]

[Figure: network elements feeding apps/controller. Enhanced Telemetry: per hop and end-to-end data added to (selected) data traffic, into the packet (Node-ID, ingress i/f, egress i/f, Sequence#, Timestamp, App-Data). Uses shown: v6 traffic matrix, live flow tracing, delay distribution, bi-casting control, loss matrix/monitor, app data monitoring.]

BRKRST-2606 for more info
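How sequence numbers and timestamps turn into loss and one-way delay can be shown with a short sketch. It assumes the receiving edge has collected `(sequence, send time, receive time)` tuples for the packets that arrived, that times are in milliseconds, and that sender and receiver clocks are synchronized; the record layout is invented for the example and is not the iOAM6 header format.

```python
def analyze(records, first_seq, last_seq):
    """records: iterable of (seq, tx_ms, rx_ms) for packets that arrived.

    Returns (lost, delays): the sequence numbers never seen in the range
    first_seq..last_seq, and a dict of seq -> one-way delay in ms.
    """
    delays = {seq: rx - tx for seq, tx, rx in records}
    lost = [s for s in range(first_seq, last_seq + 1) if s not in delays]
    return lost, delays
```

A gap in the sequence numbers identifies exactly which packets were lost, which an interface drop counter cannot do, and the delay values are only as good as the clock synchronization between the two ends.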

Page 37

iOAM6 Path Trace

• Basic configuration

ipv6 ioam path-record
ipv6 ioam node-id <node id>

• Extended Ping

H1#ping
Protocol [ip]: ipv6
Target IPv6 address: ::A:1:1:0:1D
Repeat count [5]: 1
Datagram size [100]: 300
Timeout in seconds [2]:
Extended commands? [no]: yes
Source address or interface: gig0/1
UDP protocol? [no]:
Verbose? [no]: yes
Precedence [0]:
DSCP [0]:
Include hop by hop Path Record option? [no]: yes
Sweep range of sizes? [no]:
Type escape sequence to abort.
Sending 1, 300-byte ICMP Echos to ::A:1:1:0:1D, timeout is 2 seconds:
(Gi0/1)R1(Gi0/2)----(Gi0/1)R4(Gi0/2)----(Gi0/2)R3(Gi0/3)----H3----(Gi0/3)R3(Gi0/2)----(Gi0/2)R4(Gi0/1)----(Gi0/2)R1(Gi0/1)
Reply to request 0 (35 ms)
Success rate is 100 percent (1/1), round-trip min/avg/max = 35/35/35 ms

[Figure: H1 reaches H3 (::A:1:1:0:1D) through routers R1, R2, R3 and R4; two callouts mark where the V6 extension header is applied/decapped.]

Page 38

Demo Time

Page 39

Control Plane

Page 40

Control Plane

• 3Ws: When, where, and what

• Change is normal, but some changes are more interesting:

  • Single change that causes loss of reachability or suboptimal performance

  • Instability: high rate of change

Page 41

Logging

• Centrally: for ease of analysis and search

• syslog-ng – preprocessing, relay and store(file/db)

• Logstash(ELK), fluentd – multisource collection, storage and analysis

• Locally: in case logs can’t get home

service timestamps log datetime msec show-timezone
!
logging host <ipaddr>
logging trap 6
logging source-interface Loopback0
!
logging buffered <size> 6
logging persistent url disk0:/syslog size <TotalLogsSize> filesize <OneFileSize>
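Once logs are collected centrally, the first processing step is usually splitting each message into facility, severity and mnemonic so it can be searched and counted. The sketch below assumes the common `%FACILITY-SEVERITY-MNEMONIC: text` layout of IOS messages; the regular expression is a simplification written for this example and will not match every platform's variations.

```python
import re

MSG_RE = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+): (.*)")

def parse(line):
    """Split an IOS-style log line into its parts, or return None."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    facility, severity, mnemonic, text = m.groups()
    return {"facility": facility, "severity": int(severity),
            "mnemonic": mnemonic, "text": text}

def sent_to_host(severity, trap_level=6):
    """With 'logging trap 6', messages of severity 0 through 6 are sent."""
    return severity <= trap_level
```

Lower numbers are more severe, so a trap level of 6 (informational) forwards everything except debug output (severity 7).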

Page 42

State of the Routing Table

• Be familiar with normal behavior of important service prefixes

• Establish quickly if problem is control plane or data plane

• show ip route / ipRouteTable MIB / show ip traffic (Drop stats)

• Nagios: check_snmp_iproute.pl

• Track objects and EEM

(config) track 100 ip route 0.0.0.0 0.0.0.0 reachability
event manager applet TrackRoute_0.0.0.0
event track 100 state any
action 1.0 syslog msg "route is $_track_state"

#01:09:21: %HA_EM-6-LOG: TrackRoute_0.0.0.0: route is down

blog.ipspace.net

#show ip route 192.168.2.2

Routing entry for 192.168.2.2/32

Known via "ospf 1", distance 110, metric 11, type intra area

Last update from 10.0.0.2 on FastEthernet0/0, 00:00:13 ago

Routing Descriptor Blocks:

* 10.0.0.2, from 2.2.2.2, 00:00:13 ago, via FastEthernet0/0

Route metric is 11, traffic share count is 1

blog.ipspace.net
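The age in "Last update from ... ago" is a quick hint that a route has just changed. A minimal Python sketch that pulls the source protocol, next hop and age out of `show ip route <prefix>` output; it assumes the hh:mm:ss age format shown above (older routes print the age differently), and the function names and the 60-second window are hypothetical:

```python
import re

# Excerpt of the "show ip route 192.168.2.2" output shown above.
SAMPLE_OUTPUT = """\
Routing entry for 192.168.2.2/32
  Known via "ospf 1", distance 110, metric 11, type intra area
  Last update from 10.0.0.2 on FastEthernet0/0, 00:00:13 ago
"""


def parse_route_entry(text):
    """Extract the key facts from one routing entry, or None if it is absent."""
    prefix = re.search(r"Routing entry for (\S+)", text)
    known = re.search(r'Known via "([^"]+)", distance (\d+), metric (\d+)', text)
    update = re.search(
        r"Last update from (\S+) on ([^,\s]+), (\d+):(\d+):(\d+) ago", text
    )
    if not (prefix and known and update):
        return None
    hours, minutes, seconds = (int(part) for part in update.group(3, 4, 5))
    return {
        "prefix": prefix.group(1),
        "source": known.group(1),
        "distance": int(known.group(2)),
        "metric": int(known.group(3)),
        "next_hop": update.group(1),
        "interface": update.group(2),
        "age_seconds": hours * 3600 + minutes * 60 + seconds,
    }


def recently_changed(entry, window_seconds=60):
    """True if the route was updated within the (hypothetical) window."""
    return entry["age_seconds"] < window_seconds


if __name__ == "__main__":
    entry = parse_route_entry(SAMPLE_OUTPUT)
    print(entry["prefix"], entry["source"], entry["age_seconds"], recently_changed(entry))
```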


• Remember that OSPF nodes in an area should be consistent (same link-state database)

• Understand the ‘normal’ rate of change

• SPF runs per hour

• show ip ospf stat detail

• number of LSAs expected

• OSPF-MIB: ospfSpfRuns, ospfAreaLsaCount

• Route missing?

• Where is the network supposed to be attached? Is it still?

• show interface (on advertising router)

• show ip ospf database …

OSPF Area / AS-Wide

# show ip ospf

Routing Process "ospf 1" with ID 192.168.0.1

Start time: 00:01:46.195, Time elapsed: 00:48:27.308

Supports only single TOS(TOS0) routes

Supports opaque LSA

Supports Link-local Signaling (LLS)

Supports area transit capability

Supports NSSA (compatible with RFC 3101)

Supports Database Exchange Summary List Optimization (RFC 5243)

Event-log enabled, Maximum number of events: 1000, Mode: cyclic

Router is not originating router-LSAs with maximum metric

Initial SPF schedule delay 5000 msecs

Minimum hold time between two consecutive SPFs 10000 msecs

Maximum wait time between two consecutive SPFs 10000 msecs

Incremental-SPF disabled

Minimum LSA interval 5 secs

Minimum LSA arrival 1000 msecs

LSA group pacing timer 240 secs

Interface flood pacing timer 33 msecs

Retransmission pacing timer 66 msecs

Number of external LSA 0. Checksum Sum 0x000000

Number of opaque AS LSA 0. Checksum Sum 0x000000

Number of DCbitless external and opaque AS LSA 0

Number of DoNotAge external and opaque AS LSA 0

Number of areas in this router is 1. 1 normal 0 stub 0 nssa

Number of areas transit capable is 0

External flood list length 0

IETF NSF helper support enabled

Cisco NSF helper support enabled

Reference bandwidth unit is 100 mbps

Area BACKBONE(0)

Number of interfaces in this area is 4 (1 loopback)

Area has no authentication

SPF algorithm last executed 00:47:05.379 ago

SPF algorithm executed 4 times

Area ranges are

Number of LSA 16. Checksum Sum 0x078460

Number of opaque link LSA 0. Checksum Sum 0x000000

Number of DCbitless LSA 0

Number of indication LSA 0

Number of DoNotAge LSA 0

Flood list length 0
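The "SPF runs per hour" baseline can be estimated from the two counters in this output. A rough Python sketch, assuming the hh:mm:ss.mmm format of "Time elapsed" shown above and a single area; the function name is hypothetical:

```python
import re

# Excerpt of the "show ip ospf" output shown above.
SAMPLE_OUTPUT = """\
 Routing Process "ospf 1" with ID 192.168.0.1
 Start time: 00:01:46.195, Time elapsed: 00:48:27.308
    Area BACKBONE(0)
        SPF algorithm last executed 00:47:05.379 ago
        SPF algorithm executed 4 times
"""


def spf_runs_per_hour(show_ip_ospf):
    """Average SPF runs per hour since the OSPF process started (first area only)."""
    elapsed = re.search(r"Time elapsed: (\d+):(\d+):(\d+(?:\.\d+)?)", show_ip_ospf)
    runs = re.search(r"SPF algorithm executed (\d+) times", show_ip_ospf)
    if not (elapsed and runs):
        return None
    hours = (
        int(elapsed.group(1))
        + int(elapsed.group(2)) / 60
        + float(elapsed.group(3)) / 3600
    )
    if hours == 0:
        return None
    return int(runs.group(1)) / hours


if __name__ == "__main__":
    print(round(spf_runs_per_hour(SAMPLE_OUTPUT), 2))
```

An average over the whole uptime hides bursts. In this sample the last SPF ran 47 minutes ago, so all four runs happened right after start-up. Sampling the counter periodically (or polling ospfSpfRuns) and comparing successive values gives a more useful rate.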


OSPF Neighborships

• neighbor adjacencies

• log-adjacency-changes (on by default)

• show ip ospf neighbor detail (OSPF-MIB: ospfNbrState, ospfNbrEvents, ospfNbrLSRetransQLen)

(config) router ospf <id>

(config-router) log-adjacency-changes [detail]

%OSPF-5-ADJCHG: Process 12, Nbr 172.25.25.1 on Serial0/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Oct 14 09:57:43: %OSPF-5-ADJCHG: Process 12, Nbr 172.25.25.1 on ...

# show ip ospf neighbor detail
Neighbor 192.168.0.7, interface address 10.0.0.3
In the area 0 via interface GigabitEthernet0/1
Neighbor priority is 1, State is FULL, 6 state changes
DR is 10.0.0.3 BDR is 10.0.0.4
Options is 0x12 in Hello (E-bit, L-bit)
Options is 0x52 in DBD (E-bit, L-bit, O-bit)
LLS Options is 0x1 (LR)
Dead timer due in 00:00:39
Neighbor is up for 00:33:10
Index 2/2/2, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0

Last retransmission scan time is 0 msec, maximum is 0 msec
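To tell a single adjacency change from instability, the ADJCHG messages can be counted per neighbor on the log server. A minimal Python sketch, assuming the message layout shown above; the function name is hypothetical, and the second sample line (LOADING to FULL) is an illustrative message, not one taken from the slide:

```python
import re
from collections import Counter

ADJCHG_RE = re.compile(
    r"%OSPF-5-ADJCHG: Process (?P<process>\d+), Nbr (?P<nbr>\S+) on (?P<intf>\S+) "
    r"from (?P<old>\w+) to (?P<new>\w+)"
)


def count_lost_adjacencies(lines):
    """Count transitions out of FULL per (neighbor, interface)."""
    losses = Counter()
    for line in lines:
        match = ADJCHG_RE.search(line)
        if match and match["old"] == "FULL" and match["new"] != "FULL":
            losses[(match["nbr"], match["intf"])] += 1
    return losses


DOWN = (
    "%OSPF-5-ADJCHG: Process 12, Nbr 172.25.25.1 on Serial0/0 "
    "from FULL to DOWN, Neighbor Down: Dead timer expired"
)
# Illustrative recovery message (assumed wording).
UP = (
    "%OSPF-5-ADJCHG: Process 12, Nbr 172.25.25.1 on Serial0/0 "
    "from LOADING to FULL, Loading Done"
)

if __name__ == "__main__":
    for key, count in count_lost_adjacencies([DOWN, UP, DOWN]).items():
        print(key, count)
```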


RtrA#show ip eigrp neighbors

IP-EIGRP neighbors for process 1

H   Address     Interface   Hold   Uptime   SRTT   RTO   Q     Seq
                            (sec)           (ms)         Cnt   Num
2   10.1.1.1    Et0         12     6d16h    20     200   0     233
1   10.1.4.3    Et1         13     2w2d     87     522   0     452
0   10.1.4.2    Et1         10     2w2d     85     510   0     3

Neighbors: Show IP EIGRP Neighbors

• Hold: seconds remaining before declaring neighbor down

• Uptime: how long since the last time neighbor was discovered

• SRTT: how long it takes for this neighbor to respond to reliable packets

• RTO: how long we’ll wait before retransmitting if no acknowledgement

• Q Cnt: outstanding packets

• Seq Num: last reliable packet sent
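The same columns can be checked by script. A minimal Python sketch that parses the neighbor table and returns neighbors with packets still queued; it assumes the nine-column layout shown above, and the function names are hypothetical:

```python
# Output of "show ip eigrp neighbors" as shown above.
SAMPLE_OUTPUT = """\
IP-EIGRP neighbors for process 1
H   Address     Interface   Hold   Uptime   SRTT   RTO   Q     Seq
                            (sec)           (ms)         Cnt   Num
2   10.1.1.1    Et0         12     6d16h    20     200   0     233
1   10.1.4.3    Et1         13     2w2d     87     522   0     452
0   10.1.4.2    Et1         10     2w2d     85     510   0     3
"""


def parse_eigrp_neighbors(text):
    """Return one dict per neighbor row."""
    neighbors = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 9 columns and start with the numeric handle (H).
        if len(parts) != 9 or not parts[0].isdigit():
            continue
        neighbors.append({
            "handle": int(parts[0]),
            "address": parts[1],
            "interface": parts[2],
            "hold_s": int(parts[3]),
            "uptime": parts[4],
            "srtt_ms": int(parts[5]),
            "rto_ms": int(parts[6]),
            "q_cnt": int(parts[7]),
            "seq_num": int(parts[8]),
        })
    return neighbors


def with_outstanding_packets(neighbors):
    """Neighbors whose Q Cnt is non-zero (packets waiting to be sent)."""
    return [n for n in neighbors if n["q_cnt"] > 0]


if __name__ == "__main__":
    table = parse_eigrp_neighbors(SAMPLE_OUTPUT)
    print(len(table), with_outstanding_packets(table))
```

A Q Cnt that stays above zero across several samples is the interesting case; a single non-zero reading can simply mean an update was in flight.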


Neighbors

• These messages tell us why the neighbor is bouncing, but what do they mean?

• e.g., "peer restarted" means you have to ask the peer; it is the one that restarted the session

Log-Neighbor-Changes Messages

Neighbor 10.1.1.1 (Ethernet0) is down: peer restarted

Neighbor 10.1.1.1 (Ethernet0) is up: new adjacency

Neighbor 10.1.1.1 (Ethernet0) is down: holding time expired

Neighbor 10.1.1.1 (Ethernet0) is down: retry limit exceeded

Others, but not often


BGP Monitoring Protocol (BMP) Overview: Collecting Pre-Policy BGP Messages

[Figure: external and internal BGP peers send eBGP and iBGP updates to the BMP client. The client sends its Adj-RIB-in (pre-inbound-filter) view to the BMP collector as BMP messages; inbound filtering/policing then produces the Loc-RIB (post-inbound-filter).]


• BMP receiver can be configured with both ipv4 & ipv6 host addresses.

• The BGP speaker process is referred to as the BMP Client.

• BMP client provides only pre-policy view of the ADJ-RIB-IN of a peer

• Post-policy view is not supported

• A BGP peer can be monitored by multiple BMP receivers

• Any update message from the peer (irrespective of the address family) is sent to the BMP receiver

• Multiple BMP receivers can be configured across all BGP instances

• Each BGP instance will send update messages for peers under it to the BMP receivers monitoring the corresponding peers
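On the receiver side, every BMP message starts with a small common header. A minimal Python sketch of decoding it, assuming the 6-byte layout from RFC 7854 (version, message length, message type); BMP implementations of this era may follow an earlier draft, and the function name is hypothetical:

```python
import struct

# Message types defined in RFC 7854.
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def parse_bmp_common_header(data):
    """Decode version (1 byte), total message length (4 bytes) and type (1 byte)."""
    if len(data) < 6:
        raise ValueError("BMP common header needs at least 6 bytes")
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    return {
        "version": version,
        "length": length,  # includes the 6-byte common header itself
        "type": BMP_MSG_TYPES.get(msg_type, "Unknown (%d)" % msg_type),
    }


if __name__ == "__main__":
    # A header-only Initiation message, built here purely for illustration.
    print(parse_bmp_common_header(struct.pack("!BIB", 3, 6, 4)))
```

The pre-policy updates described above arrive as Route Monitoring messages (type 0), each carrying a per-peer header followed by the BGP UPDATE PDU.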

BMP client


OpenBMP

Historical record of prefix withdraws

Current route views and peer status


Demo Time


Getting Started


Be Prepared!

• Be prepared and have data collection systems enabled

• Enable passive monitoring

• Call signaling and endpoint logging systems: CDR, Syslog, SNMP Traps etc.

• Session monitoring on endpoints, application infrastructure and network

• Enable active tests

• Periodic endpoint to endpoint calls

• Network performance probes

• Helpdesk

• Interview Script

• Access to tools, logs, etc.

• Firefighters run drills, so should your teams!

• Be familiar with the tools and how they respond on your network

• Cross-domain teams (applications, UC, security, servers)


Expanding your Toolbox and Knowledge

• Great open source tools to look at

• Network topology & IP address management: netdot, GestióIP

• Performance tests: iperf3

• Service checks: Nagios Core, Zenoss Community

• NetFlow / Log analysis: logstash, fluentd

• Template driven config generation: ansible

• Just Some of the Sessions at Cisco Live

• BRKARC-2002 - Techniques of a Network Detective

• BRKARC-2011 - Overview of Packet Capturing Tools in Cisco Switches and Routers

• BRKARC-2019 - Operating an ASR1000

• LTRARC-2003 - IOS-XE hands-on troubleshooting


Participate in the “My Favorite Speaker” Contest

• Promote your favorite speaker through Twitter and you could win $200 of Cisco Press products (@CiscoPress)

• Send a tweet and include

• Your favorite speaker’s Twitter handle @aakhter

• Two hashtags: #CLUS #MyFavoriteSpeaker

• You can submit an entry for more than one of your “favorite” speakers

• Don’t forget to follow @CiscoLive and @CiscoPress

• View the official rules at http://bit.ly/CLUSwin

Promote Your Favorite Speaker and You Could Be a Winner


Complete Your Online Session Evaluation

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online

• Give us your feedback to be entered into a Daily Survey Drawing. A daily winner will receive a $750 Amazon gift card.

• Complete your session surveys through the Cisco Live mobile app or your computer on Cisco Live Connect.


Continue Your Education

• Demos in the Cisco campus

• Walk-in Self-Paced Labs

• Table Topics

• Meet the Engineer 1:1 meetings

• Related sessions


Thank you


R&S Related Cisco Education Offerings

Course: CCIE R&S Advanced Workshops (CIERS-1 & CIERS-2) plus Self Assessments, Workbooks & Labs
Description: Expert level trainings including: instructor led workshops, self assessments, practice labs and CCIE Lab Builder to prepare candidates for the CCIE R&S practical exam.
Cisco Certification: CCIE® Routing & Switching

Course: Implementing Cisco IP Routing v2.0; Implementing Cisco IP Switched Networks v2.0; Troubleshooting and Maintaining Cisco IP Networks v2.0
Description: Professional level instructor led trainings to prepare candidates for the CCNP R&S exams (ROUTE, SWITCH and TSHOOT). Also available in self study eLearning formats with Cisco Learning Labs.
Cisco Certification: CCNP® Routing & Switching

Course: Interconnecting Cisco Networking Devices: Part 2 (or combined)
Description: Configure, implement and troubleshoot local and wide-area IPv4 and IPv6 networks. Also available in self study eLearning format with Cisco Learning Lab.
Cisco Certification: CCNA® Routing & Switching

Course: Interconnecting Cisco Networking Devices: Part 1
Description: Installation, configuration, and basic support of a branch network. Also available in self study eLearning format with Cisco Learning Lab.
Cisco Certification: CCENT® Routing & Switching

For more details, please visit: http://learningnetwork.cisco.com

Questions? Visit the Learning@Cisco Booth or contact [email protected]


Backup Slides


Performance Monitor Configuration

• Flow Record: what metrics to collect?

• Flow Exporter (optional): where to send data?

• Flow Monitor

• Class-map: what traffic to monitor?

• Policy-map

• Interface: applied inbound or outbound


Flow Record defines what metrics to collect and how to collect them (just like in Flexible NetFlow configuration)

Performance monitor introduces
flow record type performance-monitor

Match fields are key fields: entries are aggregated on them.

i.e.
match ipv4 source address
match ipv4 destination address

will create a unique entry per src-dst combination

Example Configuration – Flow Record

flow record type performance-monitor default-rtp-pt-name
 match ipv4 protocol
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 match transport rtp ssrc
 match policy performance-monitor classification hierarchy
 collect routing forwarding-status
 collect ipv4 dscp
 collect ipv4 ttl
 collect transport packets expected counter
 collect transport packets lost counter
 collect transport packets lost rate
 collect transport event packet-loss counter
 collect transport rtp jitter mean
 collect transport rtp jitter minimum
 collect transport rtp jitter maximum
 collect interface input
 collect interface output
 collect counter bytes
 collect counter packets
 collect counter bytes rate
 collect timestamp interval
 collect application name
 collect application media bytes counter
 collect application media bytes rate
 collect application media packets counter
 collect application media packets rate
 collect application media event
 collect monitor event
 collect transport rtp payload-type

!


flow monitor pulls together the flow record, exporter, and specific cache management configurations (just like Flexible NetFlow)

Special type of flow monitor:
flow monitor type performance-monitor

(optional) Flow exporter configures how the NetFlow exporting is done

Policy map specifies which traffic to monitor (via class-map), how to monitor (via monitor), and any per-class threshold crossing actions

Typed policy-map (performance monitor)

Example Configuration – monitor

flow exporter mn-campus-samplicator
 destination 10.1.160.37
 source Loopback0
 transport udp 2055
 template data timeout 60
 option c3pl-class-table
 option c3pl-policy-table
 option interface-table
 option application-table
 option sub-application-table
!
flow monitor type performance-monitor default-rtp-pt-name
 record default-rtp-pt-name
 exporter mn-campus-samplicator
 cache timeout synchronized 10 export-spread 5
 history size 10
!
policy-map type performance-monitor rtp-traffic-name
 class VOIP
  flow monitor default-rtp-pt-name
  react 1 transport-packets-lost-rate
   threshold value ge 1.00
   alarm severity error
   action syslog
 class VIDEO-CONF
  flow monitor default-rtp-pt-name


Example Configuration – Interface attachment

• Finally, policy map is applied to interface

• Note typed policy is used

• Direction of monitoring (input|output) selectable for some platforms

interface gigabitEthernet 0/1
 service-policy type performance-monitor input rtp-traffic-name


• AQM provides deeper insight into the media flows that are processed by the CUBE / Voice gateways

ISRG2, c8xx 15.3(3)M

• Available via MIB, CDR and performance monitor

Audio Quality Metrics (AQM) on CUBE

[Figure: CUBE / voice gateway with a PRI leg and a SIP/media leg]


‘media monitoring’ configuration under ‘voice service voip’ or dial-peer

Controls generation of metrics on CUBE/VG

To export via NetFlow, use the regular performance monitor configuration and just include the AQM fields

MIB: CISCO-VOICE-DIAL-CONTROL-MIB

Example Configuration –AQM performance monitor

voice service voip
 media monitoring [num] persist
 ! num is number of channels used to monitor media statistics
 ! delay calc, MOS etc

OR

dial-peer voice [tag] voip
 media monitoring
!
flow record type performance-monitor aqm
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 collect application voice number called
 collect application voice number calling
 …

Regular performance monitoring configuration continues


VQM provides deeper insight into the video flows (H.264) that are crossing routers

ISRG2, c8xx 15.3(3)M

Available via performance monitor

Video Quality Metrics (VQM) on ISR G2


‘no shut’ under ‘video monitoring’ global config.

To export via NetFlow, use the regular performance monitor configuration and just include the VQM fields

Example Configuration –VQM performance monitor

video monitoring
 maximum-sessions 10
 no shutdown
!
flow record type performance-monitor vqm-rec
 match ipv4 protocol
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 match transport rtp ssrc
 collect application video resolution [ width | height ] last
 collect application video frame rate
 collect application video payload bitrate [ average | fluctuation ]
 collect application video frame [ I | STR | LTR | super-P | NR ] counter frames
 collect application video frame [ I | STR | LTR | super-P | NR ] counter packets [lost]
 collect application video frame [ I | STR | LTR | super-P | NR ] counter bytes
 collect application video frame [ I | STR | LTR | super-P | NR ] slice-quantization-level
 collect application video eMOS compression [ network | bitstream ]
 collect application video eMOS packet-loss [ network | bitstream ]
 collect application video frame percentage damaged
 collect application video scene-complexity
 collect application video level-of-motion
 collect transport rtp sequence-number [ last ]


Individual monitor intervals:

• show performance monitor history

Aggregation over all stored intervals:

• show performance monitor status

show commands

1861-AA0213#show performance monitor history
Load for five secs: 20%/16%; one minute: 8%; five minutes: 4%
Time source is NTP, 01:52:12.052 EST Fri Oct 29 2010

Codes: * - field is not configurable under flow record
       NA - field is not applicable for configured parameters

Match: ipv4 src addr = 10.1.160.19, ipv4 dst addr = 10.1.3.5, ipv4 prot = udp, trns src port = 32760, trns dst port = 22802, SSRC = 1717646439
Policy: all-apps, Class: telepresence-CS4, Interface: FastEthernet0/0, Direction: input

start time 01:51:31
============
*history bucket number : 1
*counter flow : 1
counter bytes : 162329
counter bytes rate (Bps) : 5410
*counter bytes rate per flow (Bps) : 5410
*counter bytes rate per flow min (Bps) : 5410
*counter bytes rate per flow max (Bps) : 5410
counter packets : 773
*counter packets rate per flow : 25
counter packets dropped : 0
routing forwarding-status reason : Unknown
interface input : Fa0/0
interface output : Vl1000
monitor event : false
ipv4 dscp : 32
ipv4 ttl : 58
application media bytes counter : 146869
application media packets counter : 773
application media bytes rate (Bps) : 4895
*application media bytes rate per flow (Bps) : 4895
*application media bytes rate per flow min (Bps) : 4895
*application media bytes rate per flow max (Bps) : 4895
application media packets rate (pps) : 25
application media event : Normal
*transport rtp flow count : 1
transport rtp jitter mean (usec) : 476
transport rtp jitter minimum (usec) : 1
transport rtp jitter maximum (usec) : 1997
*transport rtp payload type : 96
transport event packet-loss counter : 0
*transport event packet-loss counter min : 0
*transport event packet-loss counter max : 0
transport packets expected counter : 773
transport packets lost counter : 0
*transport packets lost counter minimum : 0
*transport packets lost counter maximum : 0
transport packets lost rate ( % ) : 0.00
*transport packets lost rate min ( % ) : 0.00
*transport packets lost rate max ( % ) : 0.00

for reference
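The history output is a list of "name : value" pairs, so checking it by script is straightforward. A minimal Python sketch that parses one bucket and applies the same test as `threshold value ge 1.00` on the lost rate; it assumes one field per line separated by " : ", and the function names are hypothetical:

```python
# A few fields from the history bucket shown above, one per line.
SAMPLE_BUCKET = """\
counter packets : 773
transport rtp jitter mean (usec) : 476
transport packets expected counter : 773
transport packets lost counter : 0
transport packets lost rate ( % ) : 0.00
*transport packets lost rate max ( % ) : 0.00
"""


def parse_bucket(text):
    """Map each field name (without the leading '*') to its string value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip().lstrip("*")] = value.strip()
    return fields


def loss_alarm(fields, threshold_percent=1.00):
    """True when the lost rate is greater than or equal to the threshold."""
    return float(fields["transport packets lost rate ( % )"]) >= threshold_percent


if __name__ == "__main__":
    bucket = parse_bucket(SAMPLE_BUCKET)
    print(bucket["transport rtp jitter mean (usec)"], loss_alarm(bucket))
```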


Service Planning: FNF Configuration - Example

1. Configure the Exporter (Where do I want my data sent?)
Router(config)# flow exporter my-exporter
Router(config-flow-exporter)# destination 1.1.1.1

2. Configure the Flow Record (What data do I want to meter?)
Router(config)# flow record my-record
Router(config-flow-record)# match ipv4 destination address
Router(config-flow-record)# match ipv4 source address
Router(config-flow-record)# collect counter bytes

3. Configure the Flow Monitor (How do I want to cache information?)
Router(config)# flow monitor my-monitor
Router(config-flow-monitor)# exporter my-exporter
Router(config-flow-monitor)# record my-record

4. Apply to an Interface (Which interface do I want to monitor?)
Router(config)# interface s3/0
Router(config-if)# ip flow monitor my-monitor input
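Because the four steps follow a fixed pattern, they lend themselves to template-driven generation. A minimal Python sketch using only the standard library's `string.Template` (a stand-in for a fuller tool such as Ansible); the function name and parameter names are hypothetical, and the values reuse the example names above:

```python
from string import Template

# The four FNF steps: exporter, record, monitor, interface attachment.
FNF_TEMPLATE = Template("""\
flow exporter $exporter
 destination $collector
!
flow record $record
 match ipv4 destination address
 match ipv4 source address
 collect counter bytes
!
flow monitor $monitor
 exporter $exporter
 record $record
!
interface $interface
 ip flow monitor $monitor input
""")


def render_fnf(exporter, collector, record, monitor, interface):
    """Fill the template; raises KeyError if a placeholder is left unset."""
    return FNF_TEMPLATE.substitute(
        exporter=exporter,
        collector=collector,
        record=record,
        monitor=monitor,
        interface=interface,
    )


if __name__ == "__main__":
    print(render_fnf("my-exporter", "1.1.1.1", "my-record", "my-monitor", "s3/0"))
```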


• How is my flow being classified?

• Did this class drop traffic?

• QoS queue performance (drops)

• QoS class structure class-map and policy map names

NetFlow QoS Reporting

Flow exporter:
 option c3pl-class-table timeout <timeout>
 option c3pl-policy-table timeout <timeout>

QoS queue performance:
flow record type performance-monitor qos-record
 match policy qos queue index
 collect policy qos queue drops
(or)
flow record qos-record
 match policy qos queue index
 collect policy qos queue drops

Flow to QoS association:
flow record type performance-monitor A
 match connection client ipv4 address
 match connection server ipv4 address
 match connection server transport port
 collect policy qos class hierarchy
 collect policy qos queue id
 …
(or)
flow record qos-class-record
 match ipv4 source address
 match ipv4 destination address
 collect policy qos classification hierarchy
 collect policy qos queue index
 …


show ip traffic

R1#show ip traffic [interface <interface>]

IP statistics:

Rcvd: 1117 total, 1116 local destination

0 format errors, 0 checksum errors, 0 bad hop count

0 unknown protocol, 0 not a gateway

0 security failures, 0 bad options, 0 with options

Opts: 0 end, 0 nop, 0 basic security, 0 loose source route

0 timestamp, 0 extended security, 0 record route

0 stream ID, 0 strict source route, 0 alert, 0 cipso, 0 ump

0 other

Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble

0 fragmented, 0 fragments, 0 couldn't fragment

Bcast: 58 received, 0 sent

Mcast: 442 received, 221 sent

Sent: 842 generated, 1195 forwarded

Drop: 1 encapsulation failed, 0 unresolved, 0 no adjacency

0 no route, 0 unicast RPF, 0 forced drop

0 options denied

Drop: 0 packets with source IP address zero

Drop: 0 packets with internal loop back IP address

0 physical broadcast

Reinj: 0 in input feature path, 0 in output feature path

ICMP statistics:

Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 0 unreachable

0 echo, 0 echo reply, 0 mask requests, 0 mask replies, 0 quench

0 parameter, 0 timestamp, 0 timestamp replies, 0 info request, 0 other

0 irdp solicitations, 0 irdp advertisements

0 time exceeded, 0 info replies

Sent: 0 redirects, 0 unreachable, 0 echo, 0 echo reply

0 mask requests, 0 mask replies, 0 quench, 0 timestamp, 0 timestamp replies

0 info reply, 0 time exceeded, 0 parameter problem

0 irdp solicitations, 0 irdp advertisements

UDP statistics:

Rcvd: 58 total, 0 checksum errors, 58 no port 0 finput

Sent: 0 total, 0 forwarded broadcasts

BGP statistics:

Rcvd: 0 total, 0 opens, 0 notifications, 0 updates

0 keepalives, 0 route-refresh, 0 unrecognized

Sent: 0 total, 0 opens, 0 notifications, 0 updates

0 keepalives, 0 route-refresh

TCP statistics:

Rcvd: 1471 total, 0 checksum errors, 85 no port

Sent: 597 total

..

OSPF statistics:

Last clearing of OSPF traffic counters never

Rcvd: 460 total, 0 checksum errors

414 hello, 8 database desc, 3 link state req

22 link state updates, 13 link state acks

Sent: 245 total

199 hello, 12 database desc, 2 link state req

21 link state updates, 12 link state acks
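For the control-plane versus data-plane question, the Drop counters in this output are the part worth watching. A minimal Python sketch that extracts them and reports the non-zero ones; it assumes the "Drop:" layout shown above, and the function names are hypothetical:

```python
import re

# The Sent/Drop/Reinj lines from the "show ip traffic" output shown above.
SAMPLE_OUTPUT = """\
  Sent: 842 generated, 1195 forwarded
  Drop: 1 encapsulation failed, 0 unresolved, 0 no adjacency
        0 no route, 0 unicast RPF, 0 forced drop
        0 options denied
  Drop: 0 packets with source IP address zero
  Drop: 0 packets with internal loop back IP address
        0 physical broadcast
  Reinj: 0 in input feature path, 0 in output feature path
"""


def parse_ip_drops(text):
    """Return {reason: count} for every counter listed under a 'Drop:' label."""
    drops = {}
    in_drop = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Drop:"):
            in_drop = True
            stripped = stripped[len("Drop:"):]
        elif re.match(r"[A-Za-z ]+:", stripped):
            # Any other label (Sent:, Reinj:, ...) ends the Drop block.
            in_drop = False
        if in_drop:
            for count, reason in re.findall(r"(\d+) ([^,]+)", stripped):
                drops[reason.strip()] = int(count)
    return drops


def nonzero_drops(drops):
    """Only the reasons that have actually dropped packets."""
    return {reason: count for reason, count in drops.items() if count}


if __name__ == "__main__":
    print(nonzero_drops(parse_ip_drops(SAMPLE_OUTPUT)))
```

The counters are cumulative, so comparing two samples taken a few minutes apart shows whether a drop reason is still incrementing.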