Network Diagnosis and Troubleshooting Summary

Apr 08, 2018

Bob Chan
  • 8/6/2019 Network Diagnosis and Troubleshooting Summary

    1/14

    Network Diagnosis and Troubleshooting Summary by Bob Chan

    Documentation

    Baselining Objective

    Discover the true performance of the network

    Provide comparison between normal and abnormal situations

    Verify policies

    Identify over-utilization and under-utilization areas

    Long-term performance and capacity prediction

    Steps of baselining

    Planning for the first baseline

    Start with data points which represent defined policies

    Collect data for a day or two before the actual baseline to determine whether the right data is being collected from the right devices

    Conduct network baselining on a regular basis

    Speed up fault isolation

    Understand how the network is affected by changes

    Identifying devices and ports of interest

    Clearer reports

    Either keep them unchanged or change them in an informed manner

    Use port description field to track the ports

    Determine the duration of baseline

    At least 7 days; 2 to 4 weeks is adequate
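    The comparison between normal and abnormal situations can be sketched as follows. This is a minimal illustration, not a prescribed method: the sample readings, the function names and the band of three standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize baseline utilization samples (percent) as mean and standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def classify(value, baseline, k=3.0):
    """Label a new reading relative to the band mean +/- k * stdev (k is an assumed threshold)."""
    low = baseline["mean"] - k * baseline["stdev"]
    high = baseline["mean"] + k * baseline["stdev"]
    if value > high:
        return "over-utilized"
    if value < low:
        return "under-utilized"
    return "normal"

# Hypothetical utilization readings (percent) collected from one port during the baseline.
baseline = build_baseline([30, 32, 28, 35, 31, 29, 33])
print(classify(34, baseline))
print(classify(80, baseline))
```

    A longer baseline gives more samples and therefore a more trustworthy band, which is one reason a duration of several weeks is preferred over a few days.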

    Network documentation

    Overview

    Facilitate more effective troubleshooting

    Save time when network configurations have to be rebuilt

    Network configuration table

    Contain accurate and up-to-date records of components of the

    network.

    Provide information to identify and correct faults

    Should include: type, model, hostname, location, data link layer

    address, network layer address, other physical aspects

    Tables kept for budgetary purposes should be separate
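    One row of such a table could be modelled as below. This is only a sketch: the class name, field names and the sample values are hypothetical, and the completeness check is an assumed convenience rather than part of any standard.

```python
from dataclasses import dataclass, asdict

@dataclass
class DeviceRecord:
    """One row of a network configuration table (field names are illustrative)."""
    hostname: str
    device_type: str
    model: str
    location: str
    data_link_address: str   # for example a MAC address
    network_address: str     # for example an IP address

def missing_fields(record):
    """Return the names of fields that were left empty, so stale rows are easy to spot."""
    return [name for name, value in asdict(record).items() if not value.strip()]

# Hypothetical entry; every value below is made up for illustration.
r1 = DeviceRecord("R1", "router", "example-model", "server room", "0000.5e00.5301", "")
print(missing_fields(r1))
```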

    Network topology diagrams

    Notations and symbols should be consistent

    Cloud symbol = out of scope network

    Should include: device name, interface name, IP address, routing

    protocols

    Discover network configuration information

    show version - device name, model, OS version (all)

    show ip interface - active interfaces + addresses (R)

    show ip interface brief - brief summary of interfaces (R)

    show interfaces {interface-name} - MAC address (R)

    show ip protocols - routing protocols enabled (R)

    show spanning-tree / show spantree - spanning tree status (all)

    show cdp neighbors - directly connected Cisco devices (all)

    show cdp entry {device id} - details of connected devices (all)

    show interfaces description - active ports + addresses (S)

    show interfaces status - ports summary (S)

    show etherchannel summary - EtherChannel (S)

    show interfaces trunk - trunk ports (S)

    show tech-support - all information (more than needed)

    End system configuration table

    End systems are important, can affect network performance

    Provide complete picture of the network

    Should include: device name, OS, IP address, subnet mask,

    default gateway, DNS server, high-bandwidth network

    applications

    End system topology diagrams

    Should include: device name, OS, IP address, subnet mask,

    interface names, VLANs

    Discover end system configuration information

    OS and hardware information

    Access command line

    ipconfig / winipcfg / ifconfig - TCP/IP settings

    route print - active routes

    arp -a - ARP information

    ping - check connectivity

    tracert / traceroute - view routes
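    Because the command names differ between Windows and Unix-like systems, a discovery script has to pick the right variant. The sketch below only builds the command lists and does not run them; the function name, the target address 192.0.2.1 and the extra options (ipconfig /all, ping -c 4) are assumptions for illustration.

```python
import platform

def discovery_commands(system=None, target="192.0.2.1"):
    """Return the end-system discovery commands for the given OS name (defaults to this host)."""
    system = system or platform.system()
    if system == "Windows":
        return [
            ["ipconfig", "/all"],     # TCP/IP settings
            ["route", "print"],       # active routes
            ["arp", "-a"],            # ARP information
            ["ping", target],         # connectivity
            ["tracert", target],      # route taken
        ]
    # Unix-like systems use ifconfig and traceroute instead.
    return [
        ["ifconfig"],
        ["arp", "-a"],
        ["ping", "-c", "4", target],
        ["traceroute", target],
    ]

for command in discovery_commands("Windows"):
    print(" ".join(command))
```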

    Documentation guidelines

    Determine scope

    Know the objective

    Be consistent

    Keep the documents accessible

    Maintain the documentation

    Troubleshooting methodologies and tools

    Overview

    A systematic approach makes troubleshooting manageable, less confusing and less time-consuming

    Rocket scientist approach (theorist)

    Analyze until the root cause is identified, then correct it with precision

    Time-consuming and resource-intensive

    Caveman approach (practical)

    Swap the things until the network functions again

    Not reliable; the root cause may still be present

    General troubleshooting process

    Remarks: stages are not mutually exclusive, policies should be

    established in each stage

    Step 1: Gather symptoms

    From network management (NM) system alerts, console messages and users

    Break the problem down into smaller ones

    Questioning technique

    Ask questions that relate to the problem

    Use each question to eliminate or discover possibilities

    Make the question understandable by users

    Ask when the problem was first seen

    Ask user to recreate the problem if possible

    Determine the event sequence before the problem happened

    Match the symptoms with common problem causes

    Step 2: Isolate the problem

    Use the layer models to categorize the problems

    Further gather and document symptoms

    Step 3: Correct the problem

    Implement

    Test

    Document (especially if a new problem is introduced)

    Approaches

    Types

    Bottom-up

    Work up through OSI layer model

    Good to deal with physical problems

    Check every device and document all conclusions and possibilities after obtaining authorization

    Top-down

    Work down through OSI layer model

    Good to deal with application problems

    Check every network application and document all conclusions and possibilities after obtaining authorization

    Divide and conquer

    Work directly on a particular layer, chosen from the troubleshooter's experience and the symptoms

    If a layer is functioning, the layers beneath it are normally working too
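    The three approaches differ only in the order in which the OSI layers are examined, which the following sketch makes explicit. The function names are hypothetical, and the divide-and-conquer rule encodes the rule of thumb above (a working layer implies working lower layers) as an assumption.

```python
OSI_LAYERS = ["physical", "data link", "network", "transport",
              "session", "presentation", "application"]

def layer_order(approach):
    """Order in which layers are examined for the two sweep approaches."""
    if approach == "bottom-up":
        return list(OSI_LAYERS)
    if approach == "top-down":
        return list(reversed(OSI_LAYERS))
    raise ValueError(f"unknown approach: {approach}")

def divide_and_conquer(start_layer, start_layer_works):
    """Layers still to check after testing start_layer first."""
    index = OSI_LAYERS.index(start_layer)
    if start_layer_works:
        # Layers beneath a working layer are normally fine, so move upward.
        return OSI_LAYERS[index + 1:]
    # Otherwise the fault is at this layer or below it, so move downward.
    return list(reversed(OSI_LAYERS[:index + 1]))

print(layer_order("bottom-up"))
print(divide_and_conquer("network", start_layer_works=True))
```

    For example, a successful ping tests the network layer; if it works, only the transport layer and above remain to be checked.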

    Selecting guidelines

    Tools

    Network management system frameworks

    End stations can send alerts when problems are recognized

    Management entities are programmed to react

    Agents in end stations gather information

    Such information will be sent via NM protocols like SNMP

    Five areas: Performance, Configuration, Accounting, Fault and

    Security

    Knowledge base tools - databases

    Performance measurement and reporting tools - CiscoView, Netsys Baseliner

    Event and fault management tools - Cisco Network Analysis Module, protocol analyzers, pair / cable testers

    OSI layer 1 troubleshooting

    Critical characteristics

    When the physical layer fails, the upper layers cannot operate either

    Ping timeout

    Not able to telnet

    Not able to access network drives and servers

    Page cannot be displayed when attempting to access web pages

    Noncritical characteristics

    Equipment indicators

    System LED - It shows whether the system is receiving power

    and functioning correctly

    POST: off = running, green = success, amber = failed

    Remote Power Supply (RPS) LED - It indicates whether or not

    the remote power supply is in use

    Port Mode LED - It indicates the current state of the Mode button.

    Port Status LEDs - Their meaning depends on the current value of the Mode LED.

    Console messages

    show interfaces

    no keepalive - makes the interface appear to be up; should not be used

    Performance lower than baseline

    Poor configuration

    Incorrect clock rate, incorrect clock source, incorrect serial

    links (sync/async), interface shutdown, encapsulations, IP

    addressing, duplex and speed

    Inadequate capacity

    Unstable routing due to marginal link or port

    Excessive traffic across low speed link

    Overloaded server or service

    Exceeded design limits

    Cable distance limit exceeded - signal attenuation

    Collisions

    Large collision domains, duplex mismatch, late collisions

    Use show interface ethernet/fastethernet

    Electromagnetic Interference (EMI) effects

    Impulse noise (voltage fluctuation, 270mV on 10BaseT and 30 or 40mV on 1000BaseT), random noise, alien crosstalk (parallel cables) and near-end crosstalk (untwisted cable > 13mm)

    Faulty media or hardware

    Loose cable, dirty contacts, wrong cable, return loss

    Power LED, Fan, power cable

    Resources and utilization

    CPU and memory

    Power

    Network

    Console (error) messages

    Format: %FACILITY-SEVERITY-MNEMONIC: Message-text

    Facility (hardware, protocol, or module)

    Severity (of the situation, lower number = more serious)

    Mnemonic (Unique identifier of the message)

    Message-text (describe the condition)
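    The message format above lends itself to a regular expression. The sketch below is one possible parser; the pattern and function name are assumptions, the severity range 0 to 7 follows the usual syslog levels, and the sample line is a typical link state message used only for illustration.

```python
import re

# %FACILITY-SEVERITY-MNEMONIC: Message-text
MESSAGE_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)"
)

def parse_console_message(line):
    """Split a console message into its four fields, or return None if it does not match."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])  # lower number = more serious
    return fields

sample = "%LINK-3-UPDOWN: Interface Serial0, changed state to up"
print(parse_console_message(sample))
```

    Sorting parsed messages by the numeric severity puts the most serious conditions first.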

    Useful commands

    show buffers - memory buffer pool statistics

    show environment - power supply and temperature

    show processes cpu/memory - resource utilization

    show stacks - display processor stacks, requires stack decoder

    show context - show exception information in NVRAM

    OSI layer 2 troubleshooting

    More difficult to troubleshoot because problems often show up as suboptimal operation: frames either do not take the best path or are dropped

    Framing errors

    A frame that does not end on an 8-bit byte boundary

    Noisy serial line

    Improperly designed cable

    Incorrect clock (rate)

    T1 link problem because of incorrect framing or coding

    specification

    Use show interfaces to reveal

    Frame error count

    Invalid Cyclic Redundancy Check
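    The two checks behind these counters can be illustrated in a few lines. This is a simplified sketch, not how an interface implements them: it uses Python's zlib.crc32 as a stand-in for the frame check sequence, and the function names and the little-endian byte order of the appended checksum are assumptions.

```python
import zlib

def ends_on_byte_boundary(bit_count):
    """A frame whose length is not a whole number of 8-bit bytes is a framing error."""
    return bit_count % 8 == 0

def append_fcs(payload: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 computed over the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def fcs_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC-32 and compare it with the trailing 4 bytes."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == fcs

frame = append_fcs(b"hello")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit, as line noise might
print(fcs_ok(frame), fcs_ok(corrupted))
```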

    Layer-2 to Layer-3 address mapping errors

    Occur in point-to-multipoint, Frame Relay and broadcast Ethernet

    A correct destination Layer-2 address must be given to a frame

    Layer-2 to Layer-3 address mapping mechanism and potential errors

    Static maps

    In Ethernet environment, change of NIC can lead to problem

    In Frame Relay environment, incorrect DLCIs assigned by

    Telco

    Dynamic maps (ARP)

    Devices do not respond to ARP or Inverse-ARP requests

    Invalid ARP replies due to misconfiguration, DoS or Man-in-

    the-middle attacks

    Symptoms (except man-in-the-middle attack)

    No direct Layer-3 communications

    Layer-2 communications are ok

    No or incorrect Layer-2 address when doing ARP inspection

    Useful commands

    show arp

    show cdp neighbor detail

    show frame-relay map

    Spanning Tree Protocol

    Problems occur when the exchange of Bridge Protocol Data Units (BPDUs) fails

    Symptoms

    Unusually high backplane utilization

    Rapid address re-learning

    Rapidly incrementing frame counters

    Poor link performance

    Broadcast storm within Layer-2 domain

    Causes

    Bad transceivers

    Cabling issues

    Hardware failures such as ports and Supervisor engine

    Unidirectional link between bridges (cause STP loops)

    UDLD protocol (to prevent STP loops)

    A Layer-2 protocol which works with Layer-1 mechanisms

    Able to detect the identity of neighbors and shut down misconnected ports

    Operations

    Exchange protocol packets between neighbors

    Packets contain the device/port ID of the sending device and of its neighbors

    Neighboring ports should see their own echo in packets received from the other side; otherwise the link is considered unidirectional after a specific time

    Ports on a unidirectional link are disabled by UDLD and can only be re-enabled manually
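    The echo check described in these operations can be modelled roughly as follows. This is a toy model, not the UDLD protocol itself: the packet structure, the names and the max_missing threshold (standing in for the "specific time") are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class UdldPacket:
    sender: tuple                               # (device_id, port_id) of the sending port
    echoes: set = field(default_factory=set)    # ports the sender has heard from

def evaluate_link(local_port, received_packets, max_missing=3):
    """Return 'unidirectional' after max_missing consecutive packets lacking our own echo."""
    missing = 0
    for packet in received_packets:
        if local_port in packet.echoes:
            missing = 0          # the neighbor hears us, so the link works both ways
        else:
            missing += 1
        if missing >= max_missing:
            return "unidirectional"
    return "bidirectional"

local = ("SwitchA", "Gi0/1")
neighbor = ("SwitchB", "Gi0/1")
healthy = [UdldPacket(neighbor, {local}) for _ in range(5)]
one_way = [UdldPacket(neighbor, set()) for _ in range(5)]
print(evaluate_link(local, healthy), evaluate_link(local, one_way))
```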

    Configuration

    UDLD is disabled by default

    Use udld enable either in global configuration mode or on a particular interface (the interface command overrides the global one)

    Use show udld interface to verify UDLD operation

    Ethernet broadcast traffic

    Causes

    Poorly programmed or configured applications

    Huge Layer-2 broadcast domain

    Other network problems such as STP loops or route flapping

    Discover

    Either compare with baseline or use protocol analyzer

    Solutions

    Create separate VLANs

    Configure switches to be multicast aware

    Use scheduling for distribution services to control broadcast

    Ethernet switch flooding

    Causes

    Asymmetric routing because of HSRP configuration on Layer-3

    switches

    STP Topology Change Notification (TCN)

    Overflow of switch forwarding table (CAM)

    Solutions

    Set the router's ARP timeout and the switch's forwarding table aging time close to each other

    Enable STP portfast feature on ports

    Use port security feature
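    Why the two timers should be close can be shown with simple arithmetic: while the router still holds an ARP entry but the switch has already aged out the MAC address, frames to that address are flooded. In the sketch below the function names and the 60 second tolerance are assumptions, and the values 14400 s and 300 s are commonly cited defaults that should be verified on the actual platform.

```python
def flooding_window_seconds(arp_timeout, mac_aging_time):
    """Seconds during which the router may still use an ARP entry whose MAC the switch has forgotten."""
    return max(0, arp_timeout - mac_aging_time)

def timers_aligned(arp_timeout, mac_aging_time, tolerance=60):
    """True when the two timers are within the (assumed) tolerance of each other."""
    return abs(arp_timeout - mac_aging_time) <= tolerance

# Assumed defaults: ARP timeout of 4 hours, forwarding table aging time of 5 minutes.
print(flooding_window_seconds(14400, 300))
print(timers_aligned(14400, 300))
```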

    EtherChannel

    Cause

    Non-identical configuration on both sides

    Symptoms

    Loss of connectivity (due to switching loops)

    Increased backplane utilization

    Rapid MAC address re-learning

    Interfaces may turn to ErrDisable state

    Solution

    Configure the ports on both sides to have the same speed, duplex, and native VLAN (on trunks)

    T1 framing errors

    Use show controllers t1

    Check if clock source is provided by Telco (Line)

    Check if the framing format is same as the line

    Check if the line coding matches

    ISDN

    Useful commands

    show isdn status

    debug isdn q931 - show Layer-3 (call setup) exchange

    debug dialer - show dialer list and dialer map

    Check PPP connection

    Frame Relay

    Check physical connectivity

    Verify LMI information exchange (show frame-relay lmi)

    Verify PVC status (Active, inactive or deleted)

    Verify Frame Relay encapsulation

    OSI layer 3 troubleshooting

    General

    Distribute-list blocking (except OSPF and ISIS)

    Passive interface (RIP/IGRP can still receive routing updates)

    Missing or incorrect network or neighbor statement

    Layer-1 and 2 problem

    show ip protocols

    show ip interface

    show ip interface brief

    debug ip routing

    RIP

    Incompatible version types

    By default, a router receives versions 1 and 2 but sends version 1 only

    Mismatched authentication key (in version 2 only)

    Hop count limit (more than 15)

    Discontiguous networks

    Add static route

    Make the middle network part of the same major network

    Use version 2 with no auto-summary
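    The discontiguous network problem comes from classful auto-summarization: two subnets of the same major network, separated by a different network, are both advertised as the same classful route. The sketch below shows the summarization step with Python's ipaddress module; the function name and the sample subnets are illustrative, and only unicast classes A to C are handled.

```python
import ipaddress

def classful_network(subnet):
    """Return the classful (major) network that auto-summary would advertise for a subnet."""
    net = ipaddress.ip_network(subnet)
    first_octet = int(net.network_address) >> 24
    if first_octet < 128:
        prefix = 8       # class A
    elif first_octet < 192:
        prefix = 16      # class B
    else:
        prefix = 24      # class C
    if net.prefixlen < prefix:
        return net       # already shorter than the classful mask, leave unchanged
    return net.supernet(new_prefix=prefix)

left = classful_network("172.16.1.0/24")
right = classful_network("172.16.2.0/24")
middle = classful_network("192.168.1.0/24")
print(left, right, middle)
```

    Both sides summarize to the same major network, so each router ignores the other's advertisement; this is why the fixes above are a static route, renumbering the middle network, or version 2 with no auto-summary.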

    Invalid source address

    Caused by IP unnumbered

    Use no validate-update-source to solve the problem

    Flapping routes

    Large routing table

    debug ip rip

    EIGRP

    Mismatched K values on both sides

    Default K1=1, K2=0, K3=1, K4=0 and K5=0
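    The K values are the weights of the classic EIGRP composite metric, which is why both sides must agree on them. The sketch below assumes the commonly documented formula with K1 = K3 = 1 and K2 = K4 = K5 = 0 as defaults; the function names and the integer rounding are assumptions, and the sample values describe a 1544 kbps link with 20000 microseconds of delay.

```python
def eigrp_metric(min_bandwidth_kbps, total_delay_usec, load=1, reliability=255,
                 k=(1, 0, 1, 0, 0)):
    """Classic EIGRP composite metric (a sketch; rounding details may differ on real devices)."""
    k1, k2, k3, k4, k5 = k
    bandwidth = 10**7 // min_bandwidth_kbps      # inverse of the slowest link on the path
    delay = total_delay_usec // 10               # delay in tens of microseconds
    metric = k1 * bandwidth + (k2 * bandwidth) // (256 - load) + k3 * delay
    if k5 != 0:
        # The reliability term only applies when K5 is non-zero.
        metric = metric * k5 // (reliability + k4)
    return 256 * metric

def k_values_match(local_k, remote_k):
    """Neighbors only form an adjacency when all five K values are identical."""
    return tuple(local_k) == tuple(remote_k)

print(eigrp_metric(1544, 20000))
print(k_values_match((1, 0, 1, 0, 0), (1, 1, 1, 0, 0)))
```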

    Stuck in active

    Congested or bad link

    Low router resources

    Long query range

    Excessive redundancy

    Duplicate router ID

    Change loopback address

    show ip eigrp interfaces

    show ip eigrp neighbors

    debug ip eigrp

    OSPF

    Access list blocking (multicast hello 224.0.0.5)

    Mismatched parameters

    Hello and dead interval

    Authentication type

    Area ID

    Area options
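    Checking two neighbors for mismatched parameters amounts to comparing the fields listed above. A minimal sketch, assuming each side's interface settings have already been collected into a dictionary; the key names and the sample values are hypothetical.

```python
# Parameters that must agree on both sides before an OSPF adjacency can form.
MUST_MATCH = ("hello_interval", "dead_interval", "auth_type", "area_id", "area_options")

def adjacency_mismatches(local, remote):
    """Return the names of parameters that differ between two neighbor configurations."""
    return [name for name in MUST_MATCH if local.get(name) != remote.get(name)]

# Hypothetical settings for two routers on the same segment.
r1 = {"hello_interval": 10, "dead_interval": 40, "auth_type": "none",
      "area_id": "0.0.0.0", "area_options": "normal"}
r2 = {"hello_interval": 30, "dead_interval": 120, "auth_type": "none",
      "area_id": "0.0.0.0", "area_options": "normal"}
print(adjacency_mismatches(r1, r2))
```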

    State issues

    Stuck in ATTEMPT

    No response when trying to contact a neighbor

    Misconfigured neighbor statement

    Stuck in INIT

    Two-way communication has not been established

    Access list blocking OSPF hellos

    Authentication enabled on one side only

    Stuck in EXCHANGE

    Fail to exchange Database Descriptor (DBD) packets

    Duplicate router ID

    Mismatched interface MTU

    Point-to-point link unnumbered

    show ip ospf interface

    debug ip ospf events

    BGP

    Neighbors not initializing

    Updates will only be exchanged upon Established neighbor state

    Routes not being installed in routing table

    IBGP

    Routes not synchronized

    Next hop is unreachable

    EBGP

    Next hop is unreachable in case of multihop EBGP

    Multiexit discriminator (MED) value is infinite

    ISIS

    Adjacency problems

    show clns neighbors

    debug isis adj packets

    debug isis update-packets

    OSI layer 4 troubleshooting

    ACL

    Implement standard access lists as close as possible to the protected destination

    Implement extended access lists as close as possible to the source of the traffic being filtered

    show log

    show ip access-list {number/name}

    show ip interface

    NAT

    DHCP

    Source address of DHCP-Request packet is 0.0.0.0

    Because NAT requires both a valid destination and a valid source address, DHCP is difficult to run through a router performing NAT

    DNS and WINS

    When using dynamic NAT, the relationship between inside and outside addresses changes frequently, so outside DNS servers cannot accurately represent the network inside the router

    SNMP

    SNMP management station may not be able to contact SNMP

    agents on the other side of the NAT router because NAT can alter

    the addressing information in the payload

    show ip nat

    debug ip nat

    Others

    Local system logging

    logging on

    Network Time Protocol (NTP)

    ntp peer {NTP server IP address}

    ntp peer authenticate

    Logging timestamps

    service timestamps debug datetime {localtime} {msec} {show-timezone}

    NetBIOS

    netstat - display protocol statistics and current TCP/IP connections

    nbtstat - display protocol statistics and current NetBIOS connections running on TCP/IP