Cisco Support Community - Troubleshooting High Cpu on a 6500 With Sup720 - 2014-06-17

Jul 05, 2018

  • 8/16/2019 Cisco Support Community - Troubleshooting High Cpu on a 6500 With Sup720 - 2014-06-17

     Tue, 09/29/2015 - 01:46

    Adam Casella  Mar 8th, 2011

    Troubleshooting high CPU on a 6500 with sup720

    The purpose of this document is to cover how to determine the cause of high

    CPU on a 6500/7600 with a sup720. The troubleshooting methods discussed in

    this documentation will make it possible to determine the cause of 90% of

    all high CPU issues on the sup720.

    The majority of high CPU on the sup720 is related to CPU usage on the MSFC,

    thus the majority of this document will cover high CPU on the MSFC.

    Because it would not be possible to cover every possible reason high CPU can

    be caused on the sup720, I will demonstrate how to use some of the tools

    built-in to the sup720 to show general methods on how to narrow down the

    cause of high CPU.

    If you are unable to determine the reason based on this documentation,

    please open a TAC case to investigate this issue further.

     

    ** Note that these methods can also be used to determine the cause of high CPU on an RSP720, Sup32 and VS-S720, due to their common architecture.

     

    Determining Where the CPU utilization is occurring:

     

    Within the sup720 6500, there are two types of CPUs. One is used for

    layer 2 operations and is commonly referred to as the SP (Switch Processor)

    CPU. The other CPU is used for layer 3/4 operations and is commonly referred

    to as the RP (Route Processor) CPU. Both of these processors are located on

    the MSFC3 complex, each with a 1 gig in-band channel to the supervisor.

    Also depending on the module you may also have a DFC (Distributed Feature

    Card) to perform forwarding locally on that module. The DFC also has its

    own CPU, which performs processing locally on the line card. Under certain

    scenarios high CPU can be seen on these modules.

     

    High CPU on the SP (Switch processor):

    High CPU on the SP is much less common than high CPU on the RP. The

    reasons for high CPU on the SP are typically related to layer 2 operations

    of the sup720, such as spanning-tree (processing of BPDUs) or processing

    IGMP snooping/IGMP queries/membership reports as well as LACP/PAGP.

     

    You can view the CPU utilization using the following command:

     

    SP CPU Util:

     

    Switch# remote command switch show process cpu

     

    OR

     

    Switch#remote login switch

    Switch-sp#show process cpu

     

    High CPU on the RP (Route Processor):

    High CPU on the RP is caused by traffic that must be processed for layer 3

    operations, such as ARP, HSRP, or traffic forwarded in software. Below I will go over

    troubleshooting steps when seeing high CPU on the IP Input/ARP input process

    as well as CPU utilization caused by interrupt switched traffic on the RP

    CPU.

    You can view the CPU utilization using the following command:

     

    RP CPU Util:

    Switch#show process cpu

     

    High CPU on a DFC/module:

    The CPU on the DFC helps program the TCAM and routes in hardware,

    since each DFC has its own TCAM.

    High CPU on a DFC is not very common and can occur for a few different

    reasons. One reason you may see high CPU on the DFC is due to Netflow Data

    Export. Typically CPU from NDE is expected, but in rare instances it can

    become high enough to disrupt other processes.

    You can view the CPU utilization using the following command:

    DFC CPU Util:

     

    Switch# attach

    Switch-DFC#show process cpu

     

    Types of CPU utilization:

     

    There are two types of CPU utilization within IOS: interrupt and process.

     

    Process based CPU utilization:

     

    CPU utilization caused by a process can occur for a few reasons, listed

    below:

     

    1.) Process switched traffic. This is traffic that is hitting a specific

    process in order to be forwarded OR processed by the CPU. An example of

    each would be traffic being forwarded via the "IP Input" process OR control-

    plane traffic hitting the "PIM process".

     

    2.) A process trying to clean up tables/previous actions performed. This

    can be seen in processes such as "CEF Scanner" OR "BGP Scanner", which are

    used to clean/update the CEF and BGP tables.

     

    Interrupt based CPU utilization:

     

    CPU caused by an interrupt is always traffic based. Interrupt switched

    traffic is traffic that does not match a specific process, but still needs

    to be forwarded.

     

    Determining the type of CPU utilization:

    Process and Interrupt CPU utilization are both listed in the "show process

    cpu" command. The breakdown below shows how to determine what

    percentage of the CPU utilization is due to interrupt traffic or process

    switched traffic:

     

    6500-3#sh proc cpu

    CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%

     

    First value (before the slash) - Percentage of total CPU utilization.

    Second value (after the slash) - Percentage of the CPU that is caused by interrupts.

     

    Percentage of process CPU util. = Total CPU - Interrupt CPU util.
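As a worked example of the subtraction above, the following Python sketch pulls the two five-second values out of a "show process cpu" summary line and derives the process share. It is not part of the original document: the function name and the 85%/60% sample line are hypothetical and chosen only for illustration.

```python
import re

def split_cpu_utilization(line):
    """Return (total, interrupt, process) percentages for the five-second window."""
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("not a 'show process cpu' summary line")
    total = int(match.group(1))       # value before the slash
    interrupt = int(match.group(2))   # value after the slash
    return total, interrupt, total - interrupt

# Hypothetical sample values, not taken from the capture above.
sample = "CPU utilization for five seconds: 85%/60%; one minute: 70%; five minutes: 65%"
total, interrupt, process = split_cpu_utilization(sample)
print(f"total={total}% interrupt={interrupt}% process={process}%")
```

With this made-up line, most of the load is interrupt driven, which would point towards the netdr capture rather than the interface buffers.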

    Common reasons for HIGH CPU on the MSFC/RP:

     

    IP traffic with a TTL of 1 - The switch needs to send an ICMP time exceeded

    message to the host letting it know the packet has expired in transit.

    This cannot be done in hardware and thus the packet must be punted to the

    MSFC. Find the device sending traffic with a TTL of 1 and stop it from

    sending traffic, increase the TTL OR install the MLS TTL rate-limiter.

    Using an ACL with the log keyword - Since a log keyword requires a syslog

    message to be generated this must be punted to the RP CPU as it cannot be

    done in hardware. Remove the log keyword from the ACL.

    Using a PBR route-map without a set statement - Any traffic that matches a PBR

    route-map with no set statement will be punted. This is due to the fact

    that we need to program the next-hop in hardware and if the next-hop is not

    known, this traffic must be punted to determine the next hop. Configure a

    set statement OR remove the policy route from the interface.

    FIB TCAM Exception - If you try to install more routes than are possible into

    the FIB TCAM you will see the following error message in the logs:

     

    %CFIB-SP-7-CFIB_EXCEPTION : FIB TCAM exception, Some entries will be software switched

    %CFIB-SP-STBY-7-CFIB_EXCEPTION : FIB TCAM exception, Some entries will be software switched

     

    This error message is received when the amount of available space in the

    TCAM is exceeded. This results in high CPU. This is a FIB TCAM limitation.

    Once the TCAM is full, a flag is set and the FIB TCAM exception is raised.

    This stops new routes from being added to the TCAM. Therefore, traffic for

    routes that could not be installed will be software switched. The removal of

    routes does not help resume hardware switching. Once the TCAM enters the

    exception state, the system must be

    reloaded to get out of that state. You can view if you have hit a FIB TCAM

    exception with the following command:

    6500-2#sh mls cef exception status

    Current IPv4 FIB exception state = TRUE

    Current IPv6 FIB exception state = FALSE

    Current MPLS FIB exception state = FALSE

    When the exception state is TRUE, the FIB TCAM has hit an exception.
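If this check is run across many switches, the output can be screened automatically. The Python sketch below is an illustration only (the function name is hypothetical); it assumes the output has exactly the three-line format shown above and returns the protocols whose exception state is TRUE.

```python
import re

def fib_exceptions(output):
    """Return the protocols whose FIB exception state is reported as TRUE."""
    pattern = re.compile(r"Current (\S+) FIB exception state = (TRUE|FALSE)")
    return [proto for proto, state in pattern.findall(output) if state == "TRUE"]

# Output copied from the "sh mls cef exception status" example above.
output = """\
Current IPv4 FIB exception state = TRUE
Current IPv6 FIB exception state = FALSE
Current MPLS FIB exception state = FALSE
"""
print(fib_exceptions(output))
```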

    The maximum number of routes that can be installed in the TCAM can be increased with the mls

    cef maximum-routes command.

     

    This issue is common when trying to route a full BGP table on PFC-3A or a

    PFC-3B.

    **Note a failover of the supervisors in a dual supervisor system will not recover this exception, even

    though the “show mls cef exception status” will no longer indicate a FIB exception. A full

    reload of the switch is required.

    ICMP redirects - If traffic is taking a path that is not efficient, an ICMP

    redirect will be sent out to inform the host of a better next-hop. This

    will cause the packet to be punted in order to trigger the MSFC to send the

    ICMP redirect to the host. This can be seen when performing a netdr

    capture. An example of using netdr can be seen in the “Tools used to

    determine the source of the CPU utilization:” section.

    Turn off ICMP redirects to stop this traffic from being punted. However,

    this is an indication of a network inefficiency that the redirect was

    attempting to resolve dynamically. User interaction is needed in order to track down

    this inefficiency.

    If you need assistance in determining why ICMP redirects are being generated,

    please open a TAC case.

     

    CEF Glean adjacency  - This can happen when there is no ARP resolution for the next

    hop. All traffic must be punted in order to trigger an ARP request for the

    next hop. This will always manifest itself as interrupt based traffic.

    To protect the RP CPU from this issue you can implement the glean adjacency MLS

    rate-limiter.

    Netflow and ACL feature configured on the same interface matching the same traffic  -

    You cannot have an ACL based feature and a Flow based feature configured on

    the same interface for the same traffic. An example of this would be having

    NAT and PBR configured on the same interface matching the same traffic.

    NAT is netflow assisted, as the first packet in every flow needs to be

    punted to create the netflow entry in hardware. Once the netflow entry is

    created all subsequent packets will hit this hardware netflow entry and thus

    be forwarded in hardware.

    Policy-Based Routing is ACL based. This will create a “policy-route” state

    when a route-map is configured to use PBR and applied to that interface.

    This will point to a special adjacency, which is where the next-hop is

    specified in the “set” statement of the route-map.

    The issue comes when a packet matches both the NAT and the PBR feature: the

    traffic cannot be sent to the CPU to be put into Netflow AND be redirected

    to the PBR special adj, thus this traffic must be software switched. If

    these two features overlap, these features are taken out of hardware and the

    traffic is software switched. When this occurs neither feature may be

    applied to the matching traffic.

    If a packet does not match both the ACL based feature and Netflow based

    feature match criteria then the relevant function (ACL based or Netflow

    based) will be performed in hardware.

    Therefore, for proper hardware based performance in situations where ACL

    based features and Netflow based features are configured on the same

    interfaces it is important to have unique policies.

     

    To work around this problem do not have both an ACL based and Netflow based

    feature configured on the same interface, matching the same traffic.

     

    You can read more about troubleshooting feature conflict issues via the following link:

     

    https://supportforums.cisco.com/docs/DOC-15670

     

    Directed Broadcast traffic – All broadcast traffic must be sent to the MSFC on

    a vlan when a layer 3 interface is configured within that vlan. This

    includes directed broadcast traffic. Use multicast instead of directed

    broadcast.

     

    Bridging loop - If a bridging loop occurs on the network, this could cause

    high CPU on the MSFC. All broadcast traffic must be sent to the MSFC on a

    vlan when a layer 3 interface is configured within that vlan.

     

    You can determine what traffic is hitting the CPU by using a netdr capture

    to track down the source interface of the loop (See Using Netdr to determine

    traffic punted to the CPU section).

    GRE with non-unique tunnel source - On the sup720, tunnel sources must be

    unique for all tunnels. Tunnels with a non-unique source will be software

    switched. The workaround for this limitation is to use either unique

    loopback interfaces for every GRE tunnel OR use secondary addresses on a

    loopback interface for the tunnel source addresses. For more

    information see CSCdy72539.

    You may also see the following error:

     

    %Warning: Using same source IP for more than one IP/GRE tunnels may cause software

    switching packets for tunnels using this address. If possible, use a unique tunnel source

    for Interface Tunnel

     

    Other common unsupported features on Sup720-PFC3:

     

    The following features/traffic types are common features that are not

    supported by the 6500 and will cause high CPU if implemented:

     

    **Note this is not an exhaustive list and there may be unsupported features not listed below.

     

    NBAR

    Traffic with IP options field set.

    Multicast RPF drops

    RSVP (INTSERV QOS) *can be used for tunnels

    CEF accounting

    Multicast traffic and NAT – see CSCek78254

     

    The following link will give a larger list of all unsupported features and

    commands on the sup720-PFC3:

     

    http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2SX/release/notes/features.html#wp3691673

     

    Tools used to determine the source of the CPU utilization:

     

    Determine the source of RP CPU utilization using interface buffers:

     

    **Note** you will only be able to see traffic in the interface buffers on a

    layer 3 interface if the traffic is being process switched (see

    “Determining the type of CPU utilization” above). This will not work when

    traffic is being interrupt switched. In the case of interrupt switched

    traffic use the netdr capture instead.

     

    One of the quickest ways to determine the layer 3 interface that is the

    source of traffic that is causing high CPU is to see which interface has a

    large number of drops/flushes on the interface's input queue. The input

    queue on a layer 3 interface is the CPU queue for that interface on the

    sup720. If we ever see packets/drops on the input queue on the sup720 it

    is always due to traffic that is being sent towards the CPU. You can

    narrow down the location of such an interface with the following commands:

     

    6500-2#show interface | include is up|drop

     

    Vlan10 is up, line protocol is up

      Input queue: 74/75/18063/18063 (size/max/drops/flushes); Total output drops: 0

    Vlan20 is up, line protocol is up

      Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

     

    We can see that SVI (Switched Virtual Interface) 10 has 74 packets in its

    buffer, whose queue size is 75 packets. This demonstrates that a large

    amount of traffic is being punted on this interface to the RP CPU, since

    this queue is full.
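On a switch with many layer 3 interfaces, scanning this output by eye is tedious. The following Python sketch shows one possible way to flag the interfaces worth a closer look. It is an illustration under assumptions: the function name and the 90% fill threshold are hypothetical, and the parser expects the "show interface | include is up|drop" format shown above.

```python
import re

QUEUE_RE = re.compile(r"Input queue: (\d+)/(\d+)/(\d+)/(\d+)")

def busy_interfaces(output, fill_ratio=0.9):
    """Return (interface, size, max, drops) for input queues that are nearly full or dropping."""
    results = []
    current = None
    for line in output.splitlines():
        line = line.strip()
        if " is up" in line and "line protocol" in line:
            current = line.split()[0]          # interface name is the first word
            continue
        match = QUEUE_RE.search(line)
        if match and current is not None:
            size, maximum, drops, _flushes = map(int, match.groups())
            # fill_ratio is an assumed threshold, not a Cisco recommendation.
            if drops > 0 or (maximum and size / maximum >= fill_ratio):
                results.append((current, size, maximum, drops))
    return results

# Output copied from the example above.
output = """\
Vlan10 is up, line protocol is up
  Input queue: 74/75/18063/18063 (size/max/drops/flushes); Total output drops: 0
Vlan20 is up, line protocol is up
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
"""
print(busy_interfaces(output))
```

Only Vlan10 is reported for the sample output, which matches the manual reading above.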

     

    Now that we can see a large amount of traffic within this queue, we can

    look at what is in this queue with the command "show buffers input-interface

    vlan 10 header". This command will display the IP header of the packet so

    we can attempt to determine the source. If you want to look at the entire

    packet you can use the command "show buffers input-interface vlan 10

    packet".

     

    Below is the output from this command for SVI 10

     

    6500-2#sh buffers input-interface vlan 10 header

     

    Buffer information for Small buffer at 0x4667A08C

      data_area 0x802F664, refcount 1, next 0x466AE968, flags 0x200

      linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1

      if_input 0x530D5048 (Vlan10), if_output 0x0 (None)

      inputtime 00:00:00.000 (elapsed never)

      outputtime 00:00:00.000 (elapsed never), oqnumber 65535

      datagramstart 0x802F6DA, datagramsize 60, maximum size 308

      mac_start 0x802F6DA, addr_start 0x802F6DA, info_start 0x0

      network_start 0x802F6E8, transport_start 0x802F6FC, caller_pc 0x41F78790

     

    source: 10.10.10.2, destination: 10.100.101.10, id: 0x0000, ttl: 1,

     

    TOS: 0 prot: 6, source port 0, destination port 0

     

    Above we can see the basic information about this traffic that is included

    in the IP header, including the TOS, TTL and protocol encapsulated within

    the IP header.

     

    If we view the entire packet we can look at more in-depth information,

    including the layer 2 information, as can be seen below:

     

    6500-2#sh buffers input-interface vlan 10 packet

     

    Buffer information for Small buffer at 0x466A23B0

      data_area 0x80340A4, refcount 1, next 0x466E991C, flags 0x200

      linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1

      if_input 0x52836BE4 (Vlan10), if_output 0x0 (None)

      inputtime 16:32:10.292 (elapsed 00:00:50.608)

      outputtime 00:00:00.000 (elapsed never), oqnumber 65535

      datagramstart 0x803411A, datagramsize 60, maximum size 308

      mac_start 0x803411A, addr_start 0x803411A, info_start 0x0

      network_start 0x8034128, transport_start 0x0, caller_pc 0x41F78790

     

    source: 10.10.10.2, destination: 10.100.101.10, id: 0x0000, ttl: 1,

     

    TOS: 0 prot: 6, source port 0, destination port 0

     

       0: 0015C726 FB800000 01000600 08004500  ..G&{.........E.

      16: 002E0000 00000106 36510A0A 0A020A64  ........6Q.....d

      32: 650A0000 00000000 00000000 00005000  e.............P.

      48: 0000265C 00000001 02030405 FD        ..&\........}

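    Dest MAC = 0015.C726.FB80 (bytes 0-5)

    Source MAC = 0000.0100.0600 (bytes 6-11)

    Ethertype = 0x0800 for IP traffic (bytes 12-13)

    Src. IP = 0A0A0A02, i.e. 10.10.10.2 (bytes 26-29)

    Dest IP = 0A64650A, i.e. 10.100.101.10 (bytes 30-33)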
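The same fields can be pulled out of the hex dump programmatically. The Python sketch below decodes the first three rows of the dump above. It is an illustration only: the function names are hypothetical, and the offsets assume an untagged Ethernet II frame carrying an IPv4 header without options (the 0x45 byte at offset 14 in this capture).

```python
import ipaddress

# Hex words copied from the dump rows at offsets 0, 16 and 32 above.
DUMP = (
    "0015C726 FB800000 01000600 08004500 "
    "002E0000 00000106 36510A0A 0A020A64 "
    "650A0000 00000000 00000000 00005000"
)

def format_mac(raw):
    """Format six raw bytes in dotted notation (xxxx.xxxx.xxxx)."""
    return ".".join(raw[i:i + 2].hex() for i in range(0, 6, 2))

def decode_frame(hex_words):
    """Decode Ethernet II and IPv4 fields (assumes no VLAN tag and no IP options)."""
    frame = bytes.fromhex(hex_words.replace(" ", ""))
    return {
        "dest_mac": format_mac(frame[0:6]),
        "src_mac": format_mac(frame[6:12]),
        "ethertype": hex(int.from_bytes(frame[12:14], "big")),
        "ttl": frame[22],          # IP header starts at byte 14; TTL is its 9th byte
        "protocol": frame[23],
        "src_ip": str(ipaddress.IPv4Address(frame[26:30])),
        "dst_ip": str(ipaddress.IPv4Address(frame[30:34])),
    }

for field, value in decode_frame(DUMP).items():
    print(f"{field}: {value}")
```

The decoded TTL of 1 and protocol 6 agree with the "ttl: 1" and "prot: 6" values printed in the buffer header.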

     

    Using “show ip traffic” statistics to see why traffic is punted:

     

    6500-2#show interface | i is up|drop

      Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

    Vlan10 is up, line protocol is up

      Input queue: 74/75/18063/18063 (size/max/drops/flushes); Total output drops: 0

    Vlan20 is up, line protocol is up

     

    SVI 10/Interface Vlan 10 is receiving a large amount of traffic that is

    being punted to the RP CPU. We can look at what is in this queue with the

    command "show buffers input-interface vlan 10 header".

     

    Below is the output from this command for SVI 10

     

    6500-2#sh buffers input-interface vlan 10 header

     

    Buffer information for Small buffer at 0x4667A08C

      data_area 0x802F664, refcount 1, next 0x466AE968, flags 0x200

      linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1

      if_input 0x530D5048 (Vlan10), if_output 0x0 (None)

      inputtime 00:00:00.000 (elapsed never)

      outputtime 00:00:00.000 (elapsed never), oqnumber 65535

      datagramstart 0x802F6DA, datagramsize 60, maximum size 308

      mac_start 0x802F6DA, addr_start 0x802F6DA, info_start 0x0

      network_start 0x802F6E8, transport_start 0x802F6FC, caller_pc 0x41F78790

     

    source: 10.10.10.2, destination: 10.100.101.10, id: 0x0000, ttl: 1,

      TOS: 0 prot: 6, source port 0, destination port 0

     

    Buffer information for Small buffer at 0x4667C7E8

      data_area 0x80314A4, refcount 1, next 0x46695FD0, flags 0x200

      linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1

      if_input 0x530D5048 (Vlan10), if_output 0x0 (None)

      inputtime 00:00:00.000 (elapsed never)

      outputtime 00:00:00.000 (elapsed never), oqnumber 65535

      datagramstart 0x803151A, datagramsize 60, maximum size 308

      mac_start 0x803151A, addr_start 0x803151A, info_start 0x0

      network_start 0x8031528, transport_start 0x803153C, caller_pc 0x41F78790

     

    source: 10.10.10.1, destination: 10.10.10.2, id: 0xD096, ttl: 255, prot: 1

     

    Since at this point we are unsure why this traffic is being punted, we can

    look at “show ip traffic" statistics to see why this traffic is being punted

    to the CPU. First start by clearing the IP traffic statistics. We can then

    see what is incrementing in these counters to see what would be the cause:

     

    6500-2#clear ip traffic

    Clear "show ip traffic" counters [confirm]

    6500-2#sh ip traffic

    IP statistics:

      Rcvd: 33516 total, 0 local destination

      0 format errors, 0 checksum errors, 33516 bad hop count

    Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble

      0 fragmented, 0 couldn't fragment

      Bcast: 0 received, 0 sent

      Mcast: 0 received, 0 sent

      Sent: 0 generated, 0 forwarded

      Drop: 40005 encapsulation failed, 0 unresolved, 0 no adjacency

      0 no route, 0 unicast RPF, 0 forced drop

      0 options denied, 0 source IP address zero

     

    ICMP statistics:

      Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 0 unreachable

      0 echo, 0 echo reply, 0 mask requests, 0 mask replies, 0 quench

      0 parameter, 0 timestamp, 0 info request, 0 other

      0 irdp solicitations, 0 irdp advertisements

      0 time exceeded, 0 timestamp replies, 0 info replies

      Sent: 0 redirects, 0 unreachable, 0 echo, 0 echo reply

      0 mask requests, 0 mask replies, 0 quench, 0 timestamp

      0 info reply, 58464 time exceeded, 0 parameter problem

    In the output above, the "bad hop count" and ICMP "time exceeded" counters

    are incrementing. On the 6500 all traffic with a TTL of 1 is punted to the

    CPU so that an ICMP TTL expired message can be sent to the host who sent

    this traffic.

     

    Also, the first packet in the buffer can be seen to have TTL of 1, which is

    why this traffic is punted. We can see that the 2nd packet is sourced from

    10.10.10.1 (SVI 10) sent to 10.10.10.2. This packet is an ICMP TTL expired

    message.
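Because the counters were cleared first, any counter that is no longer zero is a candidate cause. The Python sketch below lists the non-zero counters from a few lines of the "sh ip traffic" output above. It is a rough illustration: the function name is hypothetical, and counters with the same name in different sections (for example "time exceeded" under Rcvd and Sent) share one key, so a production parser would need to track the section as well.

```python
import re

# A counter is a number followed by its name, ending at a comma or end of line.
COUNTER_RE = re.compile(r"(\d+) ([A-Za-z' ]+?)(?=,|$)", re.MULTILINE)

def nonzero_counters(output):
    """Return {counter name: value} for every counter above zero."""
    counters = {}
    for value, name in COUNTER_RE.findall(output):
        if int(value) > 0:
            counters[name.strip()] = int(value)
    return counters

# Lines copied from the "sh ip traffic" output above.
sample = """\
IP statistics:
  Rcvd: 33516 total, 0 local destination
  0 format errors, 0 checksum errors, 33516 bad hop count
  Drop: 40005 encapsulation failed, 0 unresolved, 0 no adjacency
ICMP statistics:
  Sent: 0 redirects, 0 unreachable, 0 echo, 0 echo reply
  0 info reply, 58464 time exceeded, 0 parameter problem
"""
for name, value in nonzero_counters(sample).items():
    print(f"{name}: {value}")
```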

     

    Using Netdr to determine traffic punted to the CPU:

     

    A netdr capture is performed on the MSFC CPU controller. This is the

    closest location you can capture a packet on the MSFC in order to determine

    why traffic is being punted to the MSFC/RP CPU. With a Sup720 or Sup32 it

    allows one to capture packets on the RP or SP inband. The netdr command can

    be used to capture both Tx and Rx packets in the software-switching path.

     

    Cat6500#debug netdr capture ?

    acl (11) Capture packets matching an acl

    and-filter (3) Apply filters in an and function: all must match

    continuous (1) Capture packets continuously: cyclic overwrite

    destination-ip-address (10) Capture all packets matching ip dst address

    dstindex (7) Capture all packets matching destination index

    ethertype (8) Capture all packets matching ethertype

    interface (4) Capture packets related to this interface

    or-filter (3) Apply filters in an or function: only one must match

    rx (2) Capture incoming packets only

    source-ip-address (9) Capture all packets matching ip src address

    srcindex (6) Capture all packets matching source index

    tx (2) Capture outgoing packets only

    vlan (5) Capture packets matching this vlan number

     

    OPTIONS:

    Using the continuous option, the switch will capture packets on the RP-

    inband continuously, fill the entire capture buffer (4096 packets) and then

    start to overwrite the buffer in a FIFO fashion.

    The tx and rx options will capture packets coming from the MSFC and going

    to the MSFC respectively.

    The and-filter and the or-filter specify that an and or an or will be

    applied respectively to all of the options that follow. For example, if

    you use the syntax below, then both option #1 and option #2 must match for

    the packet to be captured. Similarly, if the or-filter is used either

    option #1 or option #2 or both must match for the packet to be captured.

    debug netdr capture and-filter option#1 option#2

    The interface option is used to capture packets to or from the specified

    interface. The interface can be either an SVI or a L3 interface on the

    switch.

    The vlan option is used to capture all packets in the specified VLAN. The

    VLAN specified can also be one of the internal VLANs associated with a L3

    interface.

    The srcindex and dstindex options are used to capture all packets matching

    the source ltl and destination ltl indices respectively. Note that the

    interface option above only allows the capture of packets to or from a L3

    interface (SVI or physical). Using the srcindex or dstindex options allows

    the capture of Tx or Rx packets on a given L2 interface. The srcindex and

    dstindex options work with either L2 or L3 interface indices.

    The ethertype option allows the capture of all packets matching the

    specified ethertype.

    The source-ip-address and destination-ip-address options allow the capture

    of all packets matching the specified source or destination IP address

    respectively.
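The difference between the and-filter and the or-filter described above can be shown with a short Python sketch. This only models the matching logic; it is not how the switch implements it, and the function name and packet field names are hypothetical.

```python
def matches(packet, filters, mode="and"):
    """Combine per-field comparisons with AND (all must match) or OR (any may match)."""
    results = [packet.get(field) == wanted for field, wanted in filters.items()]
    return all(results) if mode == "and" else any(results)

# Hypothetical packet and filter values for illustration.
packet = {"src_ip": "10.10.10.2", "dst_ip": "10.100.101.10", "vlan": 10}
filters = {"src_ip": "10.10.10.2", "dst_ip": "192.0.2.1"}

print(matches(packet, filters, mode="and"))  # False: the destination does not match
print(matches(packet, filters, mode="or"))   # True: the source matches
```

In practice this means an and-filter narrows a capture to one specific flow, while an or-filter widens it to anything touching either address.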

     

    Below is an example of capturing traffic destined to 10.100.101.10 sourced

    from 10.10.10.2 going to the RP CPU:

     

    6500-2#debug netdr cap rx and-filter source-ip-address 10.10.10.2 destination-ip-address 10.100.101.10

     

    6500-2#sh netdr cap

     

    A total of 4096 packets have been captured

    The capture buffer wrapped 0 times

    Total capture capacity: 4096 packets

     

    ------- dump of incoming inband packet -------

    interface Vl10, routine mistral_process_rx_packet_inlin, timestamp 00:00:11

    dbus info: src_vlan 0xA(10), src_indx 0xC0(192), len 0x40(64)

    bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)

    10020400 000A0000 00C00000 40080000 00060468 0E000040 00000000 03800000

    mistral hdr: req_token 0x0(0), src_index 0xC0(192), rx_offset 0x76(118)

      requeue 0, obl_pkt 0, vlan 0xA(10)

    destmac 00.15.C7.26.FB.80, srcmac 00.00.01.00.06.00, protocol 0800

    protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 46, identifier 0

      df 0, mf 0, fo 0, ttl 100, src 10.10.10.2, dst 10.100.101.10 

    tcp src 0, dst 0, seq 0, ack 0, win 0 off 5 checksum 0x265C

     

    Ingress Vlan of traffic = src_vlan 0xA(10)

    Layer 3 interface traffic is coming from = interface Vl10

    Ethertype and SRC/DST MAC addresses = the destmac/srcmac/protocol line

    IP Header = the "protocol ip" line and the line that follows it

    SRC index (source of ingress traffic) = src_indx 0xC0(192)

    Dest Index (where traffic is being sent) = dest_indx 0x380(896)

     

    You can use this information to track down the source of the traffic being

    punted. Please refer to the "Troubleshooting with a NETDR capture on a

    sup720/6500" documentation

    (https://supportforums.cisco.com/document/59956/troubleshooting-netdr-capture-sup7206500)

    for a further explanation of how to interpret this data.
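When a capture holds thousands of packets, extracting the key fields with a script makes it easier to spot the top talkers. The Python sketch below parses one packet from the netdr dump above. It is an illustration under assumptions: the function name and dictionary keys are hypothetical, and the regular expressions only cover the field layout shown in this example.

```python
import re

# One regular expression per field of interest in a netdr packet dump.
FIELD_PATTERNS = {
    "interface": r"interface (\S+),",
    "src_vlan": r"src_vlan 0x[0-9A-Fa-f]+\((\d+)\)",
    "src_index": r"src_indx 0x[0-9A-Fa-f]+\((\d+)\)",
    "dest_index": r"dest_indx 0x[0-9A-Fa-f]+\((\d+)\)",
    "dest_mac": r"destmac ([0-9A-Fa-f.]+)",
    "src_mac": r"srcmac ([0-9A-Fa-f.]+)",
    "ttl": r"ttl (\d+)",
    "src_ip": r"src (\d+\.\d+\.\d+\.\d+)",
    "dst_ip": r"dst (\d+\.\d+\.\d+\.\d+)",
}

def parse_netdr_packet(block):
    """Return the fields found in one 'dump of incoming inband packet' block."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, block)
        if match:
            fields[name] = match.group(1)
    return fields

# Lines copied from the netdr capture above.
block = """\
interface Vl10, routine mistral_process_rx_packet_inlin, timestamp 00:00:11
dbus info: src_vlan 0xA(10), src_indx 0xC0(192), len 0x40(64)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
destmac 00.15.C7.26.FB.80, srcmac 00.00.01.00.06.00, protocol 0800
  df 0, mf 0, fo 0, ttl 100, src 10.10.10.2, dst 10.100.101.10
"""
for field, value in parse_netdr_packet(block).items():
    print(f"{field}: {value}")
```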

     

    Please open a TAC case if you need further assistance interpreting this

    data.

     

    Using a CPU SPAN to determine traffic being punted to the CPU

     

    This capture is performed on the ASIC that is connected to the RP/SP

    CPU. This will allow you to replicate traffic that is being sent to the

    RP or SP CPU to a capture device. This can be handy for determining the

    cause of the HIGH CPU OR determining if traffic is being sent to or from

    the CPU for processing (such as HSRP/OSPF/PIM control plane traffic).

    When using the 12.2(18)SXF train and earlier the configuration for an

    inband span session is as follows:

     

     

    RP Console:

     

    Router#monitor session

     

    SP Console:

     

    Router#remote login switch

    Router-sp#test monitor session

     

    -OR-

     

    Router#remote login switch

    Router-sp#test monitor

    Router-sp#test monitor session

     

    On the 12.2(33)SXH train and later, this is the configuration for an inband

    sp->rp span session:

     

    Router(config)# monitor session 1 type local

    Router(config-mon-local)# source cpu

    Router(config-mon-local)# destination interface gigabitethernet 1/2

    Router(config-mon-local)# no shutdown
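
    Once the session is configured, it can be checked and later removed with

    the standard SPAN commands. A brief sketch, assuming session number 1 as in

    the configuration above; output is omitted because it varies by release:

    ```
    Router# show monitor session 1
    Router# configure terminal
    Router(config)# no monitor session 1
    ```

    Removing the session when the capture is complete stops the replication of

    CPU-bound traffic to the destination port.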

     


    For more information please reference the following link:

     

    http://cco.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2SX/configuration/guide/span.html#wp1109488

     

    Once this information is collected you can then use the source MAC/source IP

    information to determine the source of the traffic.
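
    For example, a source MAC or IP address seen in the capture can be looked up

    in the MAC address and ARP tables to find the port and VLAN it sits behind.

    A minimal sketch: the MAC address 0000.1111.2222 and IP address 10.1.1.10

    are placeholders, not values from this document, and the exact keyword form

    (mac-address-table versus mac address-table) depends on the software release.

    ```
    Router# show mac-address-table address 0000.1111.2222
    Router# show ip arp 10.1.1.10
    ```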

     

    Troubleshooting CPU spikes.

     

    At times it is not possible to determine the cause of a CPU spike, because

    "show process cpu" cannot be run while the spike is occurring. One way to

    get around this is to set up an EEM script that runs the command for you

    when the CPU goes above a certain value. The following EEM script runs

    "show process cpu sorted" when the CPU utilization of the device reaches

    50% or higher:

     

    event manager scheduler script thread class default number 1

    event manager applet High_CPU

      event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 get-type exact entry-op ge entry-val 50 poll-interval 0.5

      action 0.0 syslog msg "High CPU DETECTED. Please wait - logging Information to <file system>:high_cpu.txt"

      action 0.1 cli command "enable"

      action 0.2 cli command "show clock | append <file system>:high_cpu.txt"

      action 1.2 cli command "term length 0"

      action 1.3 cli command "show process cpu sorted | append <file system>:high_cpu.txt"

     

    Please replace <file system> with the location of the file system, without

    the "<>".
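
    After the applet is entered, you can confirm that it is registered and, once

    a spike has occurred, read back what it logged. A sketch, assuming disk0:

    was chosen as the file system (an assumption; use whichever file system you

    configured):

    ```
    Router# show event manager policy registered
    Router# show event manager history events
    Router# more disk0:high_cpu.txt
    ```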

     

    If you need further assistance in determining the cause of your high CPU,

    please open a Cisco TAC case.


    Comments


    katerina.dardoufa Tue, 09/29/2015 - 01:46

    Great article!

     

     Thanks for sharing

     


    anandbhalla Fri, 08/17/2012 - 06:46

    Under section "common reasons for High CPU on MSFC/RP:"

    How about a situation where PBR route-map is used with a match ip access-list and set

    statement, but the ip access-list does not exist on the router ?


    Akshay Balaganur Mon, 01/16/2012 - 11:43

    Nice and informative.

     You might wanna check the SNMP OID though.. Looks like there is a typo.

    1.3.6.1.4.1.9.9.109.1.1.1.1.3 is a valid OID ( Ends with 3 not 3.1 )

    Cheers,

    Akshay


    Wachirajit G  Fri, 11/28/2014 - 01:12

    Hi Akshay,

    Some document end with 3.1. Please advise

    http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/63992-6k-high-cpu.html

    1.3.6.1.4.1.9.9.109.1.1.1.1.3.1


    p.hruby Tue, 03/19/2013 - 01:59

    Akshay,

    look at this nice document about CPU monitoring and OIDs

    http://www.cisco.com/en/US/tech/tk648/tk362/technologies_tech_note09186a0080094a94.shtml

    Petr
