8/6/2019 Network Diagnosis and Troubleshooting Summary
Network Diagnosis and Troubleshooting Summary by Bob Chan
Documentation
Baselining Objective
Discover the true performance of the network
Provide comparison between normal and abnormal situations
Verify policies
Identify over-utilization and under-utilization areas
Long-term performance and capacity prediction
Steps of baselining
Planning for the first baseline
Start with data points which represent defined policies
Collect data for a day or two before the actual baseline to
determine whether the right data is collected from the right
devices
Conduct network baselining on a regular basis
Speed up fault isolation
Understand how the network is affected by changes
Identifying devices and ports of interest
Clearer reports
Either keep them from changing or make changes in an informed manner
Use port description field to track the ports
Determine the duration of baseline
At least 7 days; 2 to 4 weeks is adequate
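The comparison between normal and abnormal situations can be sketched as follows. This is only a minimal illustration, assuming link utilization is sampled as percentages; the function names, the sample values and the three-standard-deviation threshold are hypothetical and not part of the original notes.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize baseline utilization samples (percent) as mean and standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def is_abnormal(value, baseline, k=3.0):
    """Flag a reading more than k standard deviations from the baseline mean.

    k is an assumed threshold; tune it to the network's own policies.
    """
    return abs(value - baseline["mean"]) > k * baseline["stdev"]

# Hypothetical utilization samples collected during the baseline period.
baseline = build_baseline([22.0, 25.0, 24.0, 23.0, 26.0, 24.0, 24.0])
print(is_abnormal(25.0, baseline))  # reading close to the baseline mean
print(is_abnormal(80.0, baseline))  # reading far above the baseline mean
```

A longer baseline period gives a more trustworthy mean and deviation, which is one reason a multi-week duration is preferred.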
Network documentation
Overview
Facilitate more effective troubleshooting
Save time when building network configurations again
Network configuration table
Contain accurate and up-to-date records of components of the
network.
Provide information to identify and correct faults
Should include: type, model, hostname, location, data link layer
address, network layer address, other physical aspects
A table for budgetary purposes should be kept separate
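A network configuration table can be kept as structured records so that it is easy to search during fault isolation. The sketch below is one possible layout, using the fields listed above; the class name, helper function and sample entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DeviceRecord:
    """One row of a network configuration table."""
    hostname: str
    device_type: str
    model: str
    location: str
    data_link_address: str   # data link layer address, e.g. a MAC address
    network_address: str     # network layer address, e.g. an IP address
    notes: str = ""          # other physical aspects

def find_by_address(table, network_address):
    """Return the records whose network layer address matches."""
    return [r for r in table if r.network_address == network_address]

# Hypothetical entries, for illustration only.
table = [
    DeviceRecord("R1", "router", "2811", "Rack A", "0011.2233.4455", "10.0.0.1"),
    DeviceRecord("SW1", "switch", "2960", "Rack B", "0011.2233.6677", "10.0.0.2"),
]
print([r.hostname for r in find_by_address(table, "10.0.0.2")])
```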
Network topology diagrams
Notations and symbols should be consistent
Cloud symbol = out of scope network
Should include: device name, interface name, IP address, routing
protocols
Discover network configuration information
show version - device name, model, OS version (all)
show ip interface - active interfaces + addresses (R)
show ip interface brief - brief summary of interfaces (R)
show interfaces {interface-name} - MAC address (R)
show ip protocols - routing protocols enabled (R)
show spanning-tree / show spantree - spanning tree status (all)
show cdp neighbors - directly connected Cisco devices (all)
show cdp entry {device id} - details of connected devices (all)
show interfaces description - active ports + addresses (S)
show interfaces status - ports summary (S)
show etherchannel summary - EtherChannel (S)
show interfaces trunk - trunk ports (S)
show tech-support - all information (more than needed)
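Output collected with these commands can be turned into documentation records automatically. The sketch below parses text in the usual layout of the brief interface summary; the sample text is illustrative rather than captured from a real device, and the function name is hypothetical.

```python
def parse_interface_brief(output):
    """Parse brief interface summary text into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:   # skip the header row
        parts = line.split()
        if len(parts) < 6:
            continue                               # ignore malformed lines
        rows.append({
            "interface": parts[0],
            "ip_address": parts[1],
            # The status may be two words, e.g. 'administratively down'.
            "status": " ".join(parts[4:-1]),
            "protocol": parts[-1],
        })
    return rows

# Illustrative sample, not real device output.
SAMPLE = """
Interface        IP-Address    OK? Method Status                Protocol
FastEthernet0/0  192.168.1.1   YES manual up                    up
Serial0/0        unassigned    YES unset  administratively down down
"""

for row in parse_interface_brief(SAMPLE):
    print(row["interface"], row["ip_address"], row["status"], row["protocol"])
```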
End system configuration table
End systems are important because they can affect network performance
Provide complete picture of the network
Should include: device name, OS, IP address, subnet mask,
default gateway, DNS server, high-bandwidth network
applications
End system topology diagrams
Should include: device name, OS, IP address, subnet mask,
interface names, VLANs
Discover end system configuration information
OS and hardware information
Access command line
ipconfig / winipcfg / ifconfig - TCP/IP setting
route print - active routes
arp -a - ARP information
ping - check connectivity
tracert / traceroute - view routes
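Because the command names differ between operating systems, a small lookup can help when scripting end system discovery. This is a sketch only: the function name and dictionary keys are hypothetical, and `netstat -rn` is an assumed Unix-like counterpart to `route print`, which the notes do not mention.

```python
def discovery_commands(os_name):
    """Return end system discovery commands for a given OS family."""
    windows = os_name.lower().startswith("win")
    return {
        "tcpip_settings": "ipconfig /all" if windows else "ifconfig",
        # 'netstat -rn' is an assumption for Unix-like systems.
        "active_routes": "route print" if windows else "netstat -rn",
        "arp_information": "arp -a",
        "connectivity": "ping",
        "trace_route": "tracert" if windows else "traceroute",
    }

print(discovery_commands("Windows")["trace_route"])
print(discovery_commands("Linux")["trace_route"])
```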
Documentation guidelines
Determine scope
Know the objective
Be consistent
Keep the documents accessible
Maintain the documentation
Troubleshooting methodologies and tools
Overview
A systematic approach makes troubleshooting manageable, less
confusing and less time-consuming
Rocket scientist approach (theorist)
Analyze until the root cause is identified, then correct with precision
Time-consuming and resource-intensive
Caveman approach (practical)
Swap components until the network functions again
Not reliable; the root cause may still be present
General troubleshooting process
Remarks: stages are not mutually exclusive, policies should be
established in each stage
Step 1 Gather symptoms
From NM system alerts, console messages and users
Break down the problems to smaller ones
Questioning technique
Ask questions that relate to the problem
Use each question to eliminate or discover possibilities
Make the questions understandable to users
Ask when the problem was first seen
Ask the user to recreate the problem if possible
Determine the sequence of events before the problem happened
Match the symptoms with common problem causes
Step 2 Isolate the problem
Use the layer models to categorize the problems
Further gather and document symptoms
Step 3 Correct the problem
Implement
Test
Document (especially if a new problem is created)
Approaches
Types
Bottom-up
Work up through OSI layer model
Good to deal with physical problems
Check every device and document all conclusions and
possibilities after obtaining authorization
Top-down
Work down through OSI layer model
Good to deal with application problems
Check every network application and document all
conclusions and possibilities after obtaining authorization
Divide and conquer
Work directly on a particular layer, based on the troubleshooter's
experience and the symptoms
If a layer is functioning, the layers beneath it are normally
working too
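The three approaches differ only in the order in which layers are examined. The sketch below illustrates this under simple assumptions: layers are numbered 1 to 7, `layer_ok` is a hypothetical callable that reports whether a layer works, and a fault at one layer breaks every layer above it.

```python
def check_order(approach):
    """Return the order in which OSI layers (1-7) are examined."""
    if approach == "bottom-up":
        return list(range(1, 8))
    if approach == "top-down":
        return list(range(7, 0, -1))
    raise ValueError(f"unknown approach: {approach}")

def divide_and_conquer(layer_ok, start_layer=3):
    """Return the lowest failing layer, or None if every checked layer works."""
    if layer_ok(start_layer):
        # Layers beneath a working layer are normally fine, so search upward.
        for layer in range(start_layer + 1, 8):
            if not layer_ok(layer):
                return layer
        return None
    # The start layer fails: walk down until a working layer is found.
    faulty = start_layer
    for layer in range(start_layer - 1, 0, -1):
        if layer_ok(layer):
            break
        faulty = layer
    return faulty

# Hypothetical fault at layer 2: layers 2 and above do not work.
print(divide_and_conquer(lambda layer: layer < 2))
```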
Selecting guidelines
Tools
Network management system frameworks
End stations can send alerts when problems are recognized
Management entities are programmed to react
Agents in end stations gather information
Such information will be sent via NM protocols like SNMP
Five areas: Performance, Configuration, Accounting, Fault and
Security
Knowledge base tools - databases
Performance measurement and reporting tools - CiscoView, Netsys
Baseliner
Event and fault management tools - Cisco Network Analysis Module,
protocol analyzers, pair / cable testers
OSI layer 1 troubleshooting
Critical characteristics
When the physical layer fails, the upper layers cannot operate either
Ping timeout
Not able to telnet
Not able to access network drives and servers
Page cannot be displayed when attempting to access web pages
Noncritical characteristics
Equipment indicators
System LED - It shows whether the system is receiving power
and functioning correctly
POST: off = running, green = success, amber = failed
Remote Power Supply (RPS) LED - It indicates whether or not
the remote power supply is in use
Port Mode LED - It indicates the current state of the Mode button.
Port Status LEDs - They have different meanings, depending on
the current value of the Mode LED.
Console messages
show interfaces
no keepalive - makes the interface appear up; should not be used
Performance lower than baseline
Poor configuration
Incorrect clock rate, incorrect clock source, incorrect serial
links (sync/async), interface shutdown, encapsulations, IP
addressing, duplex and speed
Inadequate capacity
Unstable routing due to marginal link or port
Excessive traffic across low speed link
Overloaded server or service
Exceeded design limits
Distance limit of cable - signal attenuation
Collisions
Large collision domains, duplex mismatch, late collisions
Use show interface ethernet/fastethernet
Electromagnetic Interference (EMI) effects
Impulse noise (voltage fluctuation, 270mV on 10BaseT and
30 or 40mV on 1000BaseT), random noise, alien cross-talk
(parallel cables) and Near End Cross Talk (untwisted cable >
13mm)
Faulty media or hardware
Loose cable, dirty contacts, wrong cable, return loss
Power LED, Fan, power cable
Resources and utilization
CPU and memory
Power
Network
Console (error) messages
Format: %FACILITY-SEVERITY-MNEMONIC: Message-text
Facility (hardware, protocol, or module)
Severity (of the situation, lower number = more serious)
Mnemonic (Unique identifier of the message)
Message-text (describe the condition)
Useful commands
show buffers - memory buffer pool statistics
show environment - power supply and temperature
show processes cpu / show processes memory - resource utilization
show stacks - display processor stacks, requires stack decoder
show context - show exception information in NVRAM
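The console message format described above (%FACILITY-SEVERITY-MNEMONIC: Message-text) is regular enough to parse with a short pattern. The following is a sketch, assuming a single-part facility name; some platforms add a sub-facility, which this pattern does not handle. The function name is hypothetical.

```python
import re

# Assumes the simple form %FACILITY-SEVERITY-MNEMONIC: Message-text.
MESSAGE_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)"
)

def parse_console_message(line):
    """Split a console message into its four fields, or return None."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])  # lower number = more serious
    return fields

msg = parse_console_message("%LINK-3-UPDOWN: Interface Serial0, changed state to up")
print(msg["facility"], msg["severity"], msg["mnemonic"])
```

Sorting parsed messages by the numeric severity puts the most serious conditions first.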
OSI layer 2 troubleshooting
More difficult to troubleshoot because problems often appear as suboptimal
operation: frames either do not take the best path or are dropped
Framing errors
A frame that does not end on an 8-bit byte boundary
Noisy serial line
Improperly designed cable
Incorrect clock (rate)
T1 link problem because of incorrect framing or coding
specification
Use show interfaces to reveal
Frame error count
Invalid Cyclic Redundancy Check
Layer-2 to Layer-3 address mapping errors
Occur in point-to-multipoint, Frame Relay and broadcast Ethernet
A correct destination Layer-2 address must be given to a frame
Layer-2 to Layer-3 address mapping mechanism and potential errors
Static maps
In Ethernet environment, change of NIC can lead to problem
In Frame Relay environment, incorrect DLCIs assigned by
Telco
Dynamic maps (ARP)
Devices do not respond to ARP or Inverse-ARP requests
Invalid ARP replies due to misconfiguration, DoS or Man-in-
the-middle attacks
Symptoms (except man-in-the-middle attack)
No direct Layer-3 communications
Layer-2 communications are ok
No or incorrect Layer-2 address when doing ARP inspection
Useful commands
show arp
show cdp neighbor detail
show frame-relay map
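An ARP inspection like the one described above can be partly automated. The sketch below assumes the ARP table has already been collected as (IP, MAC) pairs, with `None` for an unresolved address; the function name and sample data are hypothetical. A MAC address shared by several IPs is not always a fault (proxy ARP can cause it), so the result is a list of entries to examine rather than a verdict.

```python
from collections import defaultdict

def audit_arp_table(entries):
    """Return (unresolved IPs, MACs that answer for more than one IP).

    entries is a list of (ip, mac) pairs; mac is None when unresolved.
    """
    unresolved = [ip for ip, mac in entries if mac is None]
    ips_by_mac = defaultdict(set)
    for ip, mac in entries:
        if mac is not None:
            ips_by_mac[mac].add(ip)
    shared = {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}
    return unresolved, shared

# Hypothetical ARP table contents.
entries = [
    ("10.0.0.1", "0011.2233.4455"),
    ("10.0.0.2", None),
    ("10.0.0.3", "0011.2233.4455"),
]
print(audit_arp_table(entries))
```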
Spanning Tree Protocol
Problems occur when the exchange of Bridge Protocol Data Units (BPDUs)
fails
Symptoms
Unusually high backplane utilization
Rapid address re-learning
Rapidly incrementing frame counters
Poor link performance
Broadcast storm within Layer-2 domain
Causes
Bad transceivers
Cabling issues
Hardware failures such as ports and Supervisor engine
Unidirectional link between bridges (cause STP loops)
UDLD protocol (to prevent STP loops)
A Layer-2 protocol which works with Layer-1 mechanisms
Able to detect the neighbor's identity and shut down misconnected
ports
Operations
Exchange protocol packets between neighbors
Packets contain the device/port ID of the sender and of its
neighbors
Neighboring ports should see their own echo in packets
received from the other side; otherwise the link is
considered unidirectional after a specific time
Ports on a unidirectional link are disabled by UDLD
and can only be re-enabled manually
Configuration
UDLD is disabled by default
Use udld enable either in global mode or on a particular
interface (the interface command overrides the global one)
Use show udld interface to verify UDLD operation
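The echo check at the heart of UDLD operation can be modelled in a few lines. This is a simplified sketch of the idea only, not the protocol's actual packet format or timers; the function name, packet layout and device names are hypothetical.

```python
def udld_port_state(own_id, received_packets, timeout_reached):
    """Decide the port state from the echoes seen so far.

    own_id is this port's (device, port) ID. Each received packet is a dict
    whose 'neighbors' list holds the IDs the sender has heard from.
    """
    echoed = any(own_id in pkt["neighbors"] for pkt in received_packets)
    if echoed or not timeout_reached:
        return "enabled"
    # No echo within the allowed time: treat the link as unidirectional.
    return "disabled"  # stays disabled until re-enabled manually

own = ("SW1", "Gi0/1")
healthy = [{"sender": ("SW2", "Gi0/1"), "neighbors": [("SW1", "Gi0/1")]}]
one_way = [{"sender": ("SW2", "Gi0/1"), "neighbors": []}]
print(udld_port_state(own, healthy, timeout_reached=True))
print(udld_port_state(own, one_way, timeout_reached=True))
```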
Ethernet broadcast traffic
Causes
Poorly programmed or configured applications
Huge Layer-2 broadcast domain
Other network problems such as STP loops or route flapping
Discover
Either compare with baseline or use protocol analyzer
Solutions
Create separate VLANs
Configure switches to be multicast aware
Use scheduling for distribution services to control broadcast
Ethernet switch flooding
Causes
Asymmetric routing because of HSRP configuration on Layer-3
switches
STP Topology Change Notification (TCN)
Overflow of switch forwarding table (CAM)
Solutions
Set the router's ARP timeout and the switch's forwarding table aging
time close to each other
Enable STP portfast feature on ports
Use port security feature
EtherChannel
Cause
Non-identical configuration on both sides
Symptoms
Loss of connectivity (due to switching loops)
Increased backplane utilization
Rapid MAC address re-learning
Interfaces may enter the errdisable state
Solution
Configure the ports on both sides to have the same speed, duplex,
native VLAN and trunk settings
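Checking that both sides of an EtherChannel are configured identically is a simple comparison once the settings have been collected. The sketch below assumes each side is described by a dictionary; the key names and sample values are hypothetical.

```python
def etherchannel_mismatches(local, remote,
                            keys=("speed", "duplex", "native_vlan", "trunk_mode")):
    """Return {setting: (local value, remote value)} for settings that differ."""
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }

# Hypothetical port settings on the two sides of the bundle.
side_a = {"speed": 1000, "duplex": "full", "native_vlan": 1, "trunk_mode": "on"}
side_b = {"speed": 1000, "duplex": "half", "native_vlan": 1, "trunk_mode": "on"}
print(etherchannel_mismatches(side_a, side_b))
```

An empty result means the compared settings agree; it does not rule out differences in settings that were not collected.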
T1 framing errors
Use show controllers t1
Check if clock source is provided by Telco (Line)
Check if the framing format is same as the line
Check if the line coding matches
ISDN
Useful commands
show isdn status
debug isdn q931 - show ISDN Layer-3 call setup exchange (debug isdn q921 shows the Layer-2 exchange)
debug dialer - show dialer list and dialer map
Check PPP connection
Frame Relay
Check physical connectivity
Verify LMI information exchange (show frame-relay lmi)
Verify PVC status (Active, inactive or deleted)
Verify Frame Relay encapsulation
OSI layer 3 troubleshooting
General
Distribute-list blocking (except OSPF and ISIS)
Passive interface (RIP/IGRP can still receive routing updates)
Missing or incorrect network or neighbor statement
Layer-1 and Layer-2 problems
show ip protocols
show ip interface
show ip interface brief
debug ip routing
RIP
Incompatible version types
By default, a router receives versions 1 and 2 but sends version 1 only
Mismatched authentication key (in version 2 only)
Hop count limit (more than 15)
Discontiguous networks
Add static route
Make the middle network part of the same major network
Use version 2 with no auto-summary
Invalid source address
Caused by IP unnumbered
Use no validate-update-source to solve the problem
Flapping routes
Large routing table
debug ip rip
EIGRP
Mismatched K values on both sides
Default K1=1, K2=0, K3=1, K4=0 and K5=0
Stuck in active
Congested or bad link
Low router resources
Long query range
Excessive redundancy
Duplicate router ID
Change loopback address
show ip eigrp interfaces
show ip eigrp neighbors
debug ip eigrp
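The K values mentioned for EIGRP are the weights in its classic composite metric, which is why both sides must agree on them. The sketch below computes that metric and compares K values between neighbors; it assumes the IOS defaults K1=K3=1 and K2=K4=K5=0, and the function names and sample link are hypothetical.

```python
def eigrp_metric(min_bandwidth_kbps, total_delay_usec, load=1, reliability=255,
                 k=(1, 0, 1, 0, 0)):
    """Classic EIGRP composite metric for a path.

    k is (K1, K2, K3, K4, K5); the default tuple is the assumed IOS default.
    """
    k1, k2, k3, k4, k5 = k
    bw = 10**7 // min_bandwidth_kbps      # scaled inverse of the slowest link
    delay = total_delay_usec // 10        # delay in tens of microseconds
    metric = k1 * bw + (k2 * bw) // (256 - load) + k3 * delay
    if k5 != 0:
        # The reliability term only applies when K5 is non-zero.
        metric = metric * k5 // (reliability + k4)
    return 256 * metric

def k_values_match(local_k, neighbor_k):
    """Neighbors only form an adjacency when all five K values are equal."""
    return tuple(local_k) == tuple(neighbor_k)

# Hypothetical T1 path: 1544 kbps, 20000 microseconds of total delay.
print(eigrp_metric(1544, 20000))
print(k_values_match((1, 0, 1, 0, 0), (1, 1, 1, 0, 0)))
```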
OSPF
Access list blocking (multicast hello 224.0.0.5)
Mismatched parameters
Hello and dead interval
Authentication type
Area ID
Area options
State issues
Stuck in ATTEMPT
No response when trying to contact a neighbor
Misconfigured neighbor statement
Stuck in INIT
Two-way communication has not been established
Access list blocking OSPF hellos
Authentication enabled on one side only
Stuck in EXCHANGE
Fail to exchange Database Descriptor (DBD) packets
Duplicate router ID
Mismatched interface MTU
Point-to-point link unnumbered
show ip ospf interface
debug ip ospf events
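The OSPF state issues listed above lend themselves to a lookup table that maps a stuck neighbor state to the causes worth checking first. This is a sketch: the names are hypothetical, and placing the last three causes under EXCHANGE is an assumption based on their position in the flattened outline.

```python
# Causes are taken from the notes; the grouping under EXCHANGE is assumed.
STUCK_STATE_CAUSES = {
    "ATTEMPT": [
        "No response when trying to contact a neighbor",
        "Misconfigured neighbor statement",
    ],
    "INIT": [
        "Two-way communication has not been established",
        "Access list blocking OSPF hellos",
        "Authentication enabled on one side only",
    ],
    "EXCHANGE": [
        "Fail to exchange Database Descriptor (DBD) packets",
        "Duplicate router ID",
        "Mismatched interface MTU",
        "Point-to-point link unnumbered",
    ],
}

def likely_causes(state):
    """Return the causes to check for a neighbor stuck in the given state."""
    return STUCK_STATE_CAUSES.get(state.upper(), [])

for cause in likely_causes("init"):
    print(cause)
```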
BGP
Neighbors not initializing
Updates will only be exchanged upon Established neighbor state
Routes not being installed in routing table
IBGP
Routes not synchronized
Next hop is unreachable
EBGP
Next hop is unreachable in case of multihop EBGP
Multiexit discriminator (MED) value is infinite
ISIS
Adjacency problems
show clns neighbors
debug isis adj-packets
debug isis update-packets
OSI layer 4 troubleshooting
ACL
Implement standard access lists as close as possible to the protected
destination
Implement extended access lists as close as possible to the source
of the traffic being filtered
show log
show ip access-list {number/name}
show ip interface
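The ACL placement rule above can be tied to the access list number. The sketch below assumes the usual Cisco IOS numbering for IP access lists (standard 1-99 and 1300-1999, extended 100-199 and 2000-2699); the function names are hypothetical.

```python
def acl_kind(number):
    """Classify a numbered IP access list as standard or extended."""
    if 1 <= number <= 99 or 1300 <= number <= 1999:
        return "standard"
    if 100 <= number <= 199 or 2000 <= number <= 2699:
        return "extended"
    raise ValueError(f"not a numbered IP access list: {number}")

def recommended_placement(number):
    """Return where the access list should be applied."""
    placement = {
        "standard": "as close as possible to the protected destination",
        "extended": "as close as possible to the source of the traffic",
    }
    return placement[acl_kind(number)]

print(acl_kind(10), "-", recommended_placement(10))
print(acl_kind(101), "-", recommended_placement(101))
```

The rule follows from what each type can match: a standard list filters on source address only, so placing it near the source would block that source from every destination.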
NAT
DHCP
Source address of DHCP-Request packet is 0.0.0.0
Since NAT requires both a valid destination and a valid source address,
DHCP is difficult to run on a router with NAT
DNS and WINS
When using dynamic NAT, the relationship between inside and outside
addresses changes frequently, so the outside DNS servers cannot
accurately represent the network inside the router
SNMP
SNMP management station may not be able to contact SNMP
agents on the other side of the NAT router because NAT can alter
the addressing information in the payload
show ip nat
debug ip nat
Others
Local system logging
logging on
Network Time Protocol (NTP)
ntp peer {NTP server IP address}
ntp peer authenticate
Logging timestamps
service timestamps debug datetime {localtime} {msec} {show-timezone}
NetBIOS
netstat - display protocol statistics and current TCP/IP
connections
nbtstat - display protocol statistics and current NetBIOS
connections running on TCP/IP