Contents

Why Monitor IPT Components?

About AppManager

CallManager Server Health

CallManager Services Health

CallManager Database

CallManager Functionality

IP Gateway Health

QoS Monitoring

Layer 2 and 3 Switches

Reporting

Conclusion

Appendix A: Supported Environments

Appendix B: Summary Guidelines

Best Practices for Monitoring Cisco Systems IP Telephony Networks

White Paper January 2005

This paper highlights suggested best practices to ensure a successful CallManager deployment. Other white papers that address the management of the other Cisco IPT components are available.

Each IP telephony deployment is different, but generally a Cisco Systems AVVID IP Telephony deployment includes a CallManager cluster, voice gateways, a Unity voice mail server, routers, L2/L3 switches, IP phones, and other applications.

Why Monitor IPT Components?

Cisco’s AVVID (Architecture for Voice, Video, and Integrated Data) exemplifies high-reliability IP telephony (IPT), but its reliability depends on the proper configuration and operation of dozens of associated components.

In the following sections, we’ll discuss a few of the IPT components you should plan to monitor day in and day out. In many cases, good management and monitoring practices can alert you to potential risks before they actually create problems for users.

Monitoring Cisco IPT with NetIQ® AppManager will enhance performance, cost-effectiveness, and reliability, and simplify the management of your IPT network. A comprehensive management solution is vitally important to the success and reliability of your IP telephony implementation.

About AppManager

The AppManager suite from NetIQ is the best, most comprehensive, and most reliable system fault and performance management solution on the market. AppManager was designed to manage the Windows NT/2000 systems that support Cisco IP telephony. It can perform hundreds of simple and sophisticated monitoring and management tasks related to Windows 2000 services, DNS/DHCP and WINS, SQL Server, and even hardware, such as CPUs and fans.

Modules have been developed for AppManager to specifically manage the Cisco AVVID system. AppManager works to ensure the availability and performance of VoIP systems and networks through the use of Knowledge Scripts, which are network management rules designed to handle one or more tasks. Depending on the task, Knowledge Scripts can collect performance data (for example, how many calls have been attempted today), monitor systems for simple or complex events (for example, call quality is poor or a service is down), and respond with one or more actions (such as raising an alert when there’s a problem, or restarting a service automatically).
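The collect, monitor, and respond pattern described above can be sketched in Python. This is a minimal, hypothetical model and not AppManager’s actual Knowledge Script API. The `Rule` class, its fields, and the single fixed threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical Knowledge Script-style rule: collect a value, evaluate it, respond."""

    name: str
    collect: Callable[[], float]  # returns the current metric value
    threshold: float  # an event fires when a sample exceeds this value
    actions: List[Callable[[str, float], None]] = field(default_factory=list)
    history: List[float] = field(default_factory=list)  # the collected data stream

    def run_once(self) -> bool:
        """Collect one sample, keep it for trending, and run every action if it is over threshold."""
        value = self.collect()
        self.history.append(value)
        if value > self.threshold:
            for action in self.actions:
                action(self.name, value)
            return True
        return False


if __name__ == "__main__":
    alerts = []
    samples = iter([42.0, 91.5])  # made-up CPU percentages
    cpu_rule = Rule(
        "CallManager CPU",
        collect=lambda: next(samples),
        threshold=80.0,
        actions=[lambda name, value: alerts.append(f"{name}: {value:.1f}%")],
    )
    cpu_rule.run_once()  # 42.0 is under threshold: no action
    cpu_rule.run_once()  # 91.5 is over threshold: alert recorded
    print(alerts)  # ['CallManager CPU: 91.5%']
```

The sketch keeps every sample in `history` whether or not it triggers an action. This mirrors the point that the same script both gathers trending data and raises events.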

CallManager Server Health

AppManager for Cisco CallManager checks CPU and memory utilization for CallManager processes at each server you choose to monitor and raises an alert when a process exceeds its utilization threshold, indicating reduced performance or increased risk of a failure. It tracks average CPU and memory usage over time for the CallManagers in a cluster, gives you access to a list of the processes that are consuming the most CPU resources, and can display the information it discovers in charts and graphs.
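The per-process checks described above can be illustrated with a short sketch. It flags processes over a CPU threshold, ranks the top consumers, and averages one process’s usage over time. The function names and the snapshot format (a dictionary mapping process name to CPU percentage) are hypothetical, because AppManager gathers and stores this data itself.

```python
from statistics import mean
from typing import Dict, List, Tuple

# A snapshot maps process name -> CPU percentage (hypothetical format).
Snapshot = Dict[str, float]


def processes_over_threshold(snapshot: Snapshot, threshold: float) -> List[str]:
    """Names of processes whose CPU usage exceeds the threshold, sorted alphabetically."""
    return sorted(name for name, pct in snapshot.items() if pct > threshold)


def top_cpu_processes(snapshot: Snapshot, n: int = 3) -> List[Tuple[str, float]]:
    """The n heaviest CPU consumers, highest first."""
    return sorted(snapshot.items(), key=lambda item: item[1], reverse=True)[:n]


def average_usage(history: List[Snapshot], process: str) -> float:
    """Average usage of one process over the snapshots in which it appears.

    Raises statistics.StatisticsError if the process never appears.
    """
    return mean(s[process] for s in history if process in s)


if __name__ == "__main__":
    snap = {"ccm.exe": 62.0, "sqlservr.exe": 18.5, "inetinfo.exe": 4.0}  # made-up values
    print(processes_over_threshold(snap, 50.0))  # ['ccm.exe']
    print(top_cpu_processes(snap, 2))  # [('ccm.exe', 62.0), ('sqlservr.exe', 18.5)]
```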

Careful, thorough management of your Cisco CallManager servers will let you know about a potential problem so that you can respond proactively, before the problem affects your users. AppManager offers numerous Knowledge Scripts devoted to monitoring the Cisco CallManager application, resources, and critical services. With other scripts, you can monitor for spikes in CPU and memory usage.

Note: Knowledge Script names in bold indicate that the script is recommended. Knowledge Scripts not in bold are suggested. Other Knowledge Scripts not mentioned in this document may be useful for your specific requirements, so be sure to review the Knowledge Script Guide for more complete listings.

Here’s a list of the most important things to monitor right from the start:

• CPU usage. Run the CiscoCallMgr_CCM_SystemUsage script to monitor and set thresholds for the CallManager CPU usage and total CPU usage. Also run CiscoCallMgr_CCM_CpuHigh, which lets you set thresholds for maximum CPU usage for all other CallManager processes. Run these scripts every five minutes. Then run CiscoCallMgr_Report_SystemUsage or create a chart to compile the data that you’ve collected. The Maximum and Average data streams can provide invaluable trending information.

Also, look for spikes in CPU usage. To isolate which process or application is causing the spikes, run NT_TopCpuProcs. It’s possible that an application other than CallManager is causing the problem. Spikes in excess of 80 percent may indicate that your system can’t handle any new functions or that the CallManager might start dropping calls. Consider adding another server or moving phones to balance the loads carried by all your servers. If the culprit is a rogue process, stop it.

• Physical memory. Run the CiscoCallMgr_CCM_SystemUsage script to monitor and set thresholds for CallManager memory usage and total memory usage. Also run CiscoCallMgr_CCM_MemoryHigh, which allows you to set thresholds for maximum memory usage for all other CallManager processes. Run these scripts every five minutes. Then run CiscoCallMgr_Report_SystemUsage or create a chart to compile the data that you’ve collected. The Minimum, Maximum, and Average data streams can provide invaluable trending information.

Also, look for spikes in memory usage. To isolate which process or application is causing the spike in memory usage or the memory leak, run NT_TopMemProcs. It’s possible that an application other than CallManager is causing the problem. Spikes in usage of 75 to 80 percent could indicate heavy usage or a more serious issue, such as a virus or a denial-of-service attack. Spikes in excess of 80 percent may indicate that your system can’t handle any new functions or that the CallManager might start dropping calls. Consider adding another server or moving phones to balance the loads carried by all your servers. If the culprit is a rogue process or a memory leak, stop the identified process.

• Hard disk. We recommend that you monitor your hard disks every 12 hours. Among the benefits is ensuring the status of the different disks belonging to a RAID array. (Although the array may be in proper working condition, one of the physical drives may not be.)

Several AppManager scripts automate the task of monitoring hard disk status:

− CIM_DiskArrayFail. Monitors each physical drive in the array set. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, drive failures, and drive degradation.

− CiscoCallMgr_Sys_PhysicalDiskBusy. Monitors physical disk operation time and queue length. A disk is considered busy if its disk operation time is high or the queue length is long.

− CiscoCallMgr_Sys_PhysicalDiskIO. Monitors physical disk reads, writes, and transfers per second. For disk array subsystems, you need to enable Performance Monitor disk counters before you can run CiscoCallMgr_Sys_PhysicalDiskIO. If you have not already enabled Performance Monitor for disk activities, run the program %systemroot%\system32\diskperf.exe with the -y switch, then reboot your system. On Windows 2000 servers, only the physical disk counter is enabled by default.

− CIM_IDAFail. Monitors IDA controllers for the operational status of IDA drives. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, drive failures, and drive degradation.

− CIM_SCSIFail. Monitors the operational status of discovered SCSI drives. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, drive failures, and drive degradation.

− CIM_SCSITimeout. Monitors hard and soft resets and command timeouts for the SCSI controller. This script raises events for SNMP or Compaq Insight Manager (CIM) failures.

• Disk space usage. You should closely monitor disk space usage on your CallManager servers, especially if logs are activated. You’ll be able to avoid many problems altogether if you take a proactive stance toward managing log file sizes. Run the NT_LogicalDiskSpace script every 12 hours. Usage above 75 percent or less than 1 GB of free space is a signal to delete temporary files and to archive and delete log files.

• Virtual memory. Run NT_MemUtil every two minutes to monitor the usage of virtual memory, as well as physical memory and paging files. A spike in usage in excess of 75 percent could indicate heavy usage or a more serious issue, such as a virus or a denial-of-service attack. Run the AvgValueByHr report script to summarize the data you’ve collected on an hourly basis. Look at the Minimum and Average memory data streams to help you set event thresholds and establish growth needs. Look at the Maximum memory data stream to help detect memory leaks.

• Fans. It’s a good idea to check the status of your CallManager server’s fans periodically; a once-a-day check is sufficient. Run the following script:

− CIM_FanSummary. Monitors the status of system and CPU fans. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, fan failures, and fan degradation.

• Power supply. The status of your CallManager’s power supply is perhaps one of the more vital conditions that you can monitor. Run either or both of the following scripts every two hours:

− CIM_UPSBatteryLow. Monitors the UPS (uninterruptible power supply) battery life. This script raises events for SNMP or HP Compaq Insight Manager (CIM) failures, AC power on, and low battery.

− CIM_UPSLineStatus. Monitors the UPS AC power line. This script raises events for SNMP or Compaq Insight Manager (CIM) failures and AC power line failure.

• Temperature. You should monitor the condition of the server’s temperature sensors as well as your system’s overall thermal environment. Once every hour, run:

− CIM_ThermalStatus. Monitors the system’s thermal environment and the status of the temperature sensors. If the overall thermal environment is abnormal or the temperature sensors are operating outside their normal range, the script raises an event with a degraded or critical condition.

• Memory leaks. A memory leak occurs when a process requests memory for temporary usage, but does not release the memory when the process no longer needs it. This memory accumulation by a process can then starve other processes that need memory, leaving your system unstable or degraded.

Run CiscoCallMgr_CCM_SystemUsage to monitor physical memory usage for the CallManager process and total physical memory usage. Run this script for a week or two, and then create a chart or run CiscoCallMgr_Report_SystemUsage to compile the data that you’ve collected.

Graph the memory values to identify possible memory leak conditions at the system level. You can identify a potential memory leak condition by noticing that the maximum free memory values continuously diminish over time or memory values for a particular process continually increase over time (assuming other parameters, such as the number of registered devices, remain somewhat constant).

To pinpoint the faulty process, run NT_TopMemProcs. Then use the AppManager Chart Console to graph the daily minimum memory usage for that process over time. Double-click a data point to see details on memory use by the top processes.

• Network Interface Cards. It is also important to monitor the bandwidth on the Network Interface Cards. If the NIC on a particular CallManager is over-utilized, problems could occur with call setup and other communications.

Run the NT_NetworkBusy script every 15 minutes to monitor the traffic on all CallManager Network Interface Cards. An event will be raised if bandwidth utilization exceeds the threshold.
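Several of the server-health guidelines above reduce to simple rules. The sketch below encodes three of them. The first is the memory-spike bands: 75 to 80 percent suggests heavy usage or a possible attack, and above 80 percent suggests the system may not take on new work. The second is the disk-space rule: more than 75 percent used or less than 1 GB free. The third is a memory-leak check that fits a least-squares line to a process’s daily minimum memory usage. This is a sketch under assumptions. "1 GB" is read as 2^30 bytes. The 1 MB-per-day leak tolerance is an arbitrary placeholder, not a NetIQ or Cisco figure. All function names are hypothetical.

```python
from typing import Sequence

GIB = 2 ** 30  # assumption: the "1 GB" guideline is read as 2**30 bytes


def classify_memory_spike(used_pct: float) -> str:
    """Map a memory-usage spike onto the bands given in the guidelines."""
    if used_pct > 80:
        return "critical"  # may not handle new functions; calls may start dropping
    if used_pct >= 75:
        return "warning"  # heavy usage, or possibly a virus or denial-of-service attack
    return "normal"


def needs_disk_cleanup(used_pct: float, free_bytes: int) -> bool:
    """True when usage is above 75 percent or free space is below 1 GB."""
    return used_pct > 75 or free_bytes < GIB


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their sample index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2  # mean of the indices 0..n-1
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def leak_suspected(daily_min_mb: Sequence[float], tolerance_mb_per_day: float = 1.0) -> bool:
    """Flag a possible leak when a process's daily minimum memory use keeps climbing.

    The default tolerance is a placeholder; tune it against your own baseline.
    """
    return trend_slope(daily_min_mb) > tolerance_mb_per_day
```

A regression slope is used instead of checking that every day is higher than the last. Real samples are noisy, and a single flat day should not hide an otherwise steady climb. As the guidelines note, the check is only meaningful while other parameters, such as the number of registered devices, stay roughly constant.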

CallManager Services Health

The CallManager server—and thus, the rest of your IPT network—is only as reliable as the applications and services on which it depends. You’ll need to monitor the following essential components.

• Cisco CallManager service. The Cisco CallManager service runs on the Cisco IP Telephony Applications Server to provide software-only call processing as well as signaling and call control functionality. You’ll want to monitor the status of the CallManager service every five minutes.

− Run CiscoCallMgr_CCM_HealthCheck and set the parameters to alert you when the service has been restarted or if a restart attempt fails.

In addition, several other scripts monitor vital health-related functions:

− CiscoCallMgr_CCM_RoleStatus. Determines whether a CallManager’s role is Primary or Backup. This script raises an event for status transitions. A Backup is defined as any CallManager with no registered phones (hardware or software).

− CiscoCallMgr_CCM_Heartbeat. Monitors the CallManager heartbeat. Each CallManager installed in your system should be sending out a signal to all registered devices—letting them know it’s active—every 30 seconds. This script raises an event if the heartbeat stops or falls below the specified threshold. A low heartbeat indicates that the CallManager service was stopped and then restarted.

• Cisco TFTP service. Cisco Trivial File Transfer Protocol (TFTP) builds and serves files consistent with the trivial file transfer protocol, a simplified version of FTP. TFTP servers distribute information to IP phones about the locations of CallManagers and the existence of patches they need to install, if any. The TFTP service on Windows 2000 provides configuration files and other information to Cisco devices as they register. You’ll want to be notified if the TFTP service goes down, if TFTP errors occur, or if an exceptionally large number of TFTP requests pass over the network.

If a TFTP server isn’t working properly, you’ll probably see problems with phone and gateway registration. The TFTP server may serve up corrupt configuration files, or may fail to respond to requests.

You can monitor the TFTP service with the same script you use to monitor the CallManager service. Run CiscoCallMgr_CCM_HealthCheck every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails.

In addition, several other AppManager scripts monitor vital TFTP-related functions:

− CiscoCallMgr_TftpRequests. Monitors the total number of TFTP requests handled during an interval. This number includes the local requests that were successfully handled by the server, Not Found requests, and requests that have been aborted or rejected by the TFTP server.

− CiscoCallMgr_TftpErrors. Monitors TFTP-related errors that occur during an interval.

− CiscoCallMgr_TftpHeartbeat. Monitors the Cisco TFTP heartbeat.

− CiscoCallMgr_TftpChangeNotify. Monitors the number of TFTP change notifications handled during an interval.

− CiscoCallMgr_TftpSegmentPctLost. Monitors the percentage of TFTP segments lost during an interval.

− CiscoCallMgr_TftpSegmentsSent. Monitors the number of TFTP segments sent during an interval.

• Cisco Messaging Interface service. The Cisco Messaging Interface service provides the communication between the voice-mail system and Cisco CallManager. Use the CiscoCallMgr_CCM_HealthCheck script to monitor the status of the Cisco Messaging Interface service. Run the script every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails.

• Cisco IP Voice Media Streaming APP service. The Cisco IP Voice Media Streaming Application service provides voice media streaming functionality for the Cisco CallManager for use with MTP, conferencing, and music on hold (MOH). Run CiscoCallMgr_CCM_HealthCheck every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails.

• Cisco CTI Manager service. The CTI Manager contains the CTI components that interface with applications. With the CTI Manager service, applications have access to resources and functionality of all Cisco CallManagers in the cluster and have improved failover capability. CiscoCallMgr_CCM_HealthCheck also monitors the status of the Cisco CTI Manager service. Run the script every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails (assuming that the CTI Manager service is in use).

NumOfActiveCMLink is a counter that shows the total number of active CallManager links in the cluster. If this value drops to 0, then there is definitely something wrong with the CTI Manager service. If this number is non-zero, but is less than the total number of active CallManagers, you may have a problem with the CallManager servers in the cluster. Run the following scripts to further monitor the CTI Manager service:

− CiscoCallMgr_CTI_Manager. Monitors the number of CTI Manager connections, open devices, open lines, and active CallManager links.

− CiscoCallMgr_RegCtiPorts. Monitors the number of currently registered CTI ports.

• Cisco Telephony Call Dispatcher service. The Telephony Call Dispatcher service provides centralized services for Cisco Web Attendant clients and pilot points. You can monitor the Telephony Call Dispatcher service with the same script you use to monitor other Cisco services. Run CiscoCallMgr_CCM_HealthCheck every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails (assuming that Web Attendant is in use).

• Cisco RIS Data Collector service. The Real-time Information Server (RIS) maintains real-time Cisco CallManager information and provides an interface through which the Cisco RIS Data Collector service and the SNMP Agent retrieve that information. Use the CiscoCallMgr_CCM_HealthCheck script to monitor the status of the RIS Data Collector service. Run the script every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails.

• Cisco Database Layer Monitor service. The Cisco Database Layer Monitor service monitors aspects of the database layer as well as call detail records (CDRs). The database layer comprises a set of dynamic link libraries (DLLs) that provide a common access point for applications that need to access the database to add, retrieve, and change data. The Cisco Database Layer Monitor service performs functions such as determining whether the primary server is available during failover. Monitor this service every five minutes with the CiscoCallMgr_CCM_HealthCheck script, setting the parameters to alert you when the service has been restarted or if a restart attempt fails.

• System backups. Make sure you’re always kept informed if system backups don’t take place as scheduled—preferably every night. Monitor your backup servers and drives and make sure they aren’t in any danger of crashing. Cisco provides backup utilities that can be monitored. Run CiscoCallMgr_CiscoBackupStatus regularly and after scheduled backups to monitor the Cisco IP Telephony Applications Backup Utility program.

• Internet Information Services. Internet Information Services (IIS) supports Cisco CallManager configuration through active server pages (ASP), gives the Cisco CallManager server access to Administration web pages, and helps secure Cisco CallManager administration functions. IIS processes and applications can consume excessive CPU resources, and IIS servers, Web services, and processes can go down.

− Configure CiscoCallMgr_IIS_HealthCheck to notify you if any of the above events occur.

− Because IIS logs error information, you should also monitor its logs. Use CiscoCallManager_CCM_EventLog to look for failed or errored ASP and HTTP requests and other communication failures.

In addition to the above-mentioned scripts, daily running of the following scripts will provide additional IIS monitoring capability:

− CiscoCallMgr_IIS_CpuHigh. Monitors CPU usage for IIS application processes.

− CiscoCallMgr_IIS_KillTopCPUProcs. Monitors CPU usage for IIS processes and kills processes that use excessive CPU resources.

− CiscoCallMgr_IIS_MemoryHigh. Monitors working set memory usage and memory pool usage for IIS application processes.

− CiscoCallMgr_IIS_RestartServer. Restarts an IIS server.

− CiscoCallMgr_IIS_ServiceUptime. Monitors Web sites and Web services uptime.

• DC Directory Server service. The DC Directory Server provides phone number lookup and other directory services for Cisco IP phones. Run CiscoCallMgr_CCM_HealthCheck every five minutes to monitor the status of the DC Directory Server service and to automatically restart this service if it goes down.

• Domain Name Service. You’re probably already monitoring this important service on your network. DNS is just as critical for CallManager as it is for the rest of your network, enabling each VoIP phone to locate its CallManager server.

− Run the NT_DNSConnectivity script to make sure CallManager servers never lose connectivity to DNS servers.

− Use the CiscoCallMgr_Sys_EventLog script to scan the Windows event logs for DNS errors.

• Security. Another member of your organization may be in charge of network security. But security failures become your problem if they take down the phone system. It’s a good idea to keep informed about any secure areas, such as the CallManager server, that have been compromised or threatened.

− NT_FailedLogins. Monitors for failed logon attempts to the server since the last interval (possibly due to break-in attempts).

• CCM system counters. NetIQ plans to provide the ability to monitor some of the key system counters provided in recent releases of CallManager. Performance counters such as CallsRejectedDueToThrottling, CodeRedEntryExit, CodeYellowEntryExit, and others are planned for a future release.
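The guidance above on NumOfActiveCMLink and on the CallManager heartbeat can be expressed as two small decision functions. This is a sketch under assumptions, and the function names are hypothetical. It treats the heartbeat as a counter that advances while the service runs and starts again from a low value after a restart. The low-value threshold is a parameter you would tune, not a documented default.

```python
def interpret_cti_links(num_active_cm_links: int, active_callmanagers: int) -> str:
    """Interpret NumOfActiveCMLink against the number of active CallManagers in the cluster."""
    if num_active_cm_links == 0:
        return "CTI Manager service problem"
    if num_active_cm_links < active_callmanagers:
        return "possible CallManager server problem"
    return "ok"


def heartbeat_status(previous: int, current: int, low_threshold: int) -> str:
    """Classify two successive heartbeat readings taken one polling interval apart.

    Assumes the heartbeat is a counter that advances while the service runs and
    starts again from a low value after the service is restarted.
    """
    if current == previous:
        return "stopped"  # the counter did not advance during the interval
    if current < previous or current < low_threshold:
        return "restarted"  # counter reset, or still low after a restart
    return "ok"
```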

CallManager Database

The CallManager SQL database keeps records of your administrative configuration data, call route tables, and information about all calls made. Without the database, CallManager can’t access any of its administrative configuration data or its routing plan. Database status, accessibility, and available space are the most critical metrics to track, but you should also keep tabs on CPU and memory utilization, at minimum. Set the parameters in the following Knowledge Scripts to raise events if a critical SQL service, such as MSSQLServer, goes down, or if the Windows 2000 Application Event Log includes a message that a SQL scheduled job has failed.

To gather the most information, run the following scripts every five minutes:

• CiscoCallMgr_SQL_Accessibility. Monitors SQL Server and database accessibility.

• CiscoCallMgr_SQL_RepTransactions. Monitors the number of transactions marked for replication but not yet replicated.

• CiscoCallMgr_ServerDown. Monitors the status of the SQL Server service. Automatically restarts the service if it is down and the Auto-Start option is set to yes.
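The response described for CiscoCallMgr_ServerDown can be sketched as a small decision function. A threshold check on the count reported by CiscoCallMgr_SQL_RepTransactions is included alongside it. The function names, the action strings, and the idea of a fixed backlog limit are assumptions for illustration, not the scripts’ actual parameters.

```python
from typing import List


def sql_server_down_actions(service_running: bool, auto_start: bool) -> List[str]:
    """Responses to one SQL Server service check.

    An event is raised whenever the service is down; a restart is attempted only
    when the Auto-Start option is set to yes.
    """
    if service_running:
        return []
    actions = ["raise event"]
    if auto_start:
        actions.append("restart service")
    return actions


def replication_backlog_alert(pending_transactions: int, max_pending: int) -> bool:
    """True when more transactions await replication than the (hypothetical) limit allows."""
    return pending_transactions > max_pending
```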

Other useful metrics to monitor include:

• CiscoCallMgr_SQL_BlockedProcesses. Monitors the SQL processes that have been blocked.

• CiscoCallMgr_SQL_CPUUtil. Monitors the percentage of CPU resources used by SQL Server processes.

• CiscoCallMgr_SQL_DataGrowthRate. Monitors the data growth and shrink rates for all SQL Server databases.

• CiscoCallMgr_SQL_DataSpace. Monitors the data space available and data space being used for all SQL Server databases.

• CiscoCallMgr_SQL_DBGrowthRate. Monitors database growth and shrink rates.

• CiscoCallMgr_SQL_DbOption. Monitors database options.

• CiscoCallMgr_SQL_DBSpace. Monitors the database space available and space being used for all SQL Server databases.

• CiscoCallMgr_SQL_Errorlog. Monitors the SQL Server error log.

• CiscoCallMgr_SQL_LogGrowthRate. Monitors log growth and shrink rates for all SQL Server databases.

• CiscoCallMgr_SQL_LogSpace. Monitors the log space available and log space being used for all SQL Server databases.

• CiscoCallMgr_SQL_MemUtil. Monitors the amount of working set memory used by SQL Server processes.

• CiscoCallMgr_SQL_NearFileMaxSize. Monitors the size of all SQL Server database files.

• CiscoCallMgr_SQL_NearMaxConnect. Monitors SQL Server opened connection usage.

• CiscoCallMgr_SQL_NearMaxLocks. Monitors SQL Server lock utilization.

• CiscoCallMgr_SQL_NetError. Monitors SQL Server network errors.

• CiscoCallMgr_SQL_RepTranSec. Monitors the number of transactions replicated per second.
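Some of the database metrics above become more useful when combined. The sketch below computes three derived figures, each under a stated assumption. The percentage of space used is calculated from the used and available figures, assuming allocated space equals used plus available. The connection percentage is the share of the connection limit in use. The replication estimate is a rough time for the backlog reported by CiscoCallMgr_SQL_RepTransactions to clear at the rate reported by CiscoCallMgr_SQL_RepTranSec, assuming that rate stays constant. All function names are hypothetical.

```python
def percent_used(used_mb: float, available_mb: float) -> float:
    """Percentage of allocated space in use (assumes allocated = used + available)."""
    total = used_mb + available_mb
    if total <= 0:
        raise ValueError("total space must be positive")
    return 100.0 * used_mb / total


def connection_usage_pct(open_connections: int, max_connections: int) -> float:
    """Share of the configured connection limit currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * open_connections / max_connections


def replication_drain_seconds(pending_transactions: int, replicated_per_sec: float) -> float:
    """Rough time to clear the replication backlog at the current, assumed-constant rate."""
    if pending_transactions == 0:
        return 0.0
    if replicated_per_sec <= 0:
        return float("inf")  # nothing is replicating, so the backlog will not drain
    return pending_transactions / replicated_per_sec
```

A drain time that keeps growing from one interval to the next is a clearer signal than either raw count alone. It means transactions are being queued faster than they are replicated.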

CallManager Functionality

Once you’ve taken care of the absolutely essential monitoring tasks outlined in the sections above, you’re ready to extend coverage once more—to the functionality of CallManager itself and some of the “extras” that supplement or ship with CallManager.

• Registered devices. Whenever a device (e.g., phone, gateway, gatekeeper) has a problem registering with its CallManager, you should take a closer look. In the Windows 2000 Application Event Log, an error listed as DeviceTransientConnection indicates that a device made a connection to the CallManager server on TCP port 2000, but that the connection was terminated before registration was accomplished. This could mean there’s a problem with the device, with the network connection, or with the server or database. The device itself may be unauthorized, which could indicate a security breach. For obvious reasons, you’ll want to know anytime there’s a problem with device registration, or if the number of currently registered devices exceeds the number of devices that you know are authorized.

• Registered phones. You’ll want to be kept informed any time the number of registered phones decreases rapidly or falls below your threshold. Run CiscoCallMgr_RegHardwarePhones every 15 minutes. In addition, monitor the number of currently registered station devices other than Cisco hardware phones, such as Cisco IP SoftPhones, Cisco uOne ports, and Cisco Unity voice ports, with CiscoCallMgr_RegOtherDevices.

• MGCP gateway registration. At minimum, run CiscoCallMgr_MGCP_GatewayCheck every five minutes to monitor for new and missing MGCP gateways.

An additional AppManager script provides more data to aid your monitoring efforts:

− CiscoCallMgr_CCM_DeviceStatus. Monitors the status of gateways within a cluster. Possible statuses include registered, unregistered, rejected, and unknown.

• Gatekeeper registration. You should periodically (every five minutes should do) verify that the CallManager is registered with the gatekeeper. Run CiscoCallMgr_CCM_DeviceStatus to monitor the status of gatekeepers within a cluster. Possible statuses include registered, unregistered, rejected, and unknown.

• Calls in progress. When a phone goes off hook, it is a call in progress until it goes back on hook. If all calls that are in progress are connected, the number of calls in progress and the number of active calls will be the same. For capacity-planning purposes, you should establish an upper-limit threshold for the number of calls that can be in progress. Run CiscoCallMgr_CallsInProgress every five minutes over a period of time. Then run the AvgValueByHr report script to graph the data streams that will help you decide what constitutes the calls-in-progress threshold. Once you’ve established your baseline, you can configure the CallsInProgress script to alert you when the number of in-progress calls exceeds Cisco sizing guidelines.

• Active calls. Active calls are those that have a voice path connected. For capacity-planning purposes, you should establish an upper-limit threshold for the number of active calls. Run CiscoCallMgr_CallsActive every five minutes over a period of time. Then run the CiscoCallMgr_Report_CallsByHour report script to graph the data streams that will help you decide what constitutes the active-call threshold. Once you’ve established your baseline, you can configure the CallsActive script to alert you when the number of active calls exceeds your system’s capacity.

To gather additional information about active calls, run NetworkDevice_ISDNDChannelUtil to monitor total gateway call activity.

• Attempted calls. You should monitor attempted calls over time and use the collected data to compute the Busy Hour Call Attempts (BHCA) value. Run the CiscoCallMgr_CallActivity script every 15 minutes to gather the data and then run CiscoCallMgr_Report_CallsByHour to graph the data.

• Completed calls. A completed call is an active call that completed without an abnormal termination code. You should monitor completed calls over time and use the collected data to compute busy hour call completions. Run the CiscoCallMgr_CallActivity script every 15 minutes to gather the data and then run CiscoCallMgr_Report_CallsByHour to graph the data.

• Active PRI channels. Collection of this data over time can help you understand call patterns and busy hour peak calls. You can use baseline data to detect real-time underutilization of circuits, which is an indication of possible system performance degradation (including hard-to-detect PSTN call routing or circuit-down conditions). Data trending helps you plan for circuit growth and provisioning.

Several AppManager Knowledge Scripts can provide the information that you need:

− CiscoCallMgr_MGCP_PRI_Channels. Monitors MGCP PRI devices for the number of currently active and out-of-service channels. PRIs can be grouped into logical Trunk Groups for thresholding across multiple PRIs. The PRIs are generally grouped by any combination of carrier, local, long distance, international, etc.

− CiscoCallMgr_MGCP_PRI. Monitors calls completed and outbound busy attempts for MGCP PRI devices and also the status of the PRI D-Channel.

− CiscoCallMgr_MGCP_T1CAS_Channels. Monitors MGCP T1 devices for the number of currently active and out-of-service channels. T1s can be grouped into logical Trunk Groups for thresholding across multiple T1s. The T1s are generally grouped by any combination of carrier, local, long distance, international, etc.

− CiscoCallMgr_MGCP_T1CAS. Monitors calls completed and outbound busy attempts for MGCP T1 devices.

− CiscoCallMgr_H323_CallsAttempted. Monitors the number of calls attempted by an H.323 device during an interval.

− CiscoCallMgr_H323_CallsInProgress. Monitors the number of calls in progress by an H.323 device.

− NetworkDevice_ISDNBChannelUtil. Monitors the total number of gateway PRI channels in use and E1 interface channels in use.

− NetworkDevice_InterfaceHealth. Monitors the parent resource for the interfaces on a network device.

− CiscoCallMgr_CCM_PRIChannels. For CallManager 3.1 and above, monitors the number of active PRI voice channels and PRI spans in service.
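The baseline comparison described under Active PRI channels can be sketched in a few lines of Python. This is an illustration only: it assumes you have exported active-channel samples (for example, from CiscoCallMgr_MGCP_PRI_Channels) as (time slot, active channels) pairs, and the function names, the export format, and the 50 percent ratio are hypothetical choices rather than AppManager features.

```python
from statistics import mean


def baseline_by_slot(history):
    """Average active-channel count for each time-of-day slot.

    history: iterable of (slot, active_channels) pairs, where slot is a
    time-of-day label such as "09:45" (a hypothetical export format).
    """
    samples = {}
    for slot, active in history:
        samples.setdefault(slot, []).append(active)
    return {slot: mean(values) for slot, values in samples.items()}


def is_underutilized(slot, active, baseline, ratio=0.5):
    """True when active channels fall below `ratio` times the slot baseline."""
    expected = baseline.get(slot)
    if not expected:
        # No baseline (or a zero baseline) for this slot: nothing to compare.
        return False
    return active < ratio * expected


history = [("09:45", 40), ("09:45", 44), ("03:00", 2)]
baseline = baseline_by_slot(history)
print(is_underutilized("09:45", 12, baseline))  # True
```

Comparing each sample with the baseline for the same time of day, rather than with one fixed value, avoids false alarms during naturally quiet periods such as overnight.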

• In-service PRI spans. The total number of in-service PRI spans should remain constant, changing only when a new circuit is provisioned or an existing circuit is disconnected. Run CiscoCallMgr_CCM_PRIChannels every five minutes and set it to alert you when the number of in-service spans falls below an acceptable level.

In addition, run NetworkDevice_InterfaceHealth to gather further information about the parent resources for the interfaces that you are monitoring.

• Port status (FXO, FXS, and Analog). Make sure that your monitoring efforts include watching call activity through your FXO, FXS and Analog ports, as well as knowing when the ports become inactive. The total number of in-service ports should remain fairly constant.

Run the following AppManager scripts to monitor active and in-service ports, completed calls, and outbound busy attempts:

− CiscoCallMgr_MGCP_FXO. Monitors completed calls and outbound busy attempts for MGCP FXO devices.

− CiscoCallMgr_MGCP_FXS. Monitors completed calls and outbound busy attempts for MGCP FXS devices.

− CiscoCallMgr_AnalogPortsActive. Monitors the number of currently active analog ports.

− CiscoCallMgr_AnalogPortsOutOfService. Monitors the number of analog ports out of service.

− CiscoCallMgr_CCM_FXOPorts. For CallManager 3.1 and above, monitors the number of active and in-service FXO ports.

− CiscoCallMgr_CCM_FXSPorts. For CallManager 3.1 and above, monitors the number of active and in-service FXS ports.

• Active Conference Bridge calls. Conference Bridge, the conferencing software that ships with CallManager, supports two types of conference dial-in procedure: “Meet-Me” and “Ad-Hoc.” Conference Bridge works with either multicast or unicast conference devices, but in either case you must configure in advance the maximum number of audio streams that a conference must support.

You should monitor Conference Bridge conferences and streams in real time to identify under- and over-utilization and to ensure that users are able to set up and complete conference calls when desired and that conference devices are configured to meet demands for audio streams.

Five AppManager scripts can provide all of the data that you need:

− CiscoCallMgr_ConfBridgeActiveConf. Monitors the number of active conferences for a Conference Bridge.

− CiscoCallMgr_ConfBridgeActiveStreams. Monitors the number of active streams for a Conference Bridge.

− CiscoCallMgr_ConfBridgeAvailStreams. Monitors the number of available streams for a Conference Bridge.

− CiscoCallMgr_ConfBridgeConferences. Monitors the number of conferences completed during an interval.

− CiscoCallMgr_ConfBridgeStreams. Monitors the number of streams on conferences completed during an interval.

• Available Conference resources. Run CiscoCallMgr_ConfBridgeAvailStreams to alert you if the number of available Conference Bridge streams falls below the minimum acceptable level. If the number of available streams frequently falls below the acceptable level, consider adding more Conference Bridge resources.
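One way to make “frequently falls below” concrete is to measure the share of polling samples in which available streams dropped under the minimum. The sketch below assumes you have exported CiscoCallMgr_ConfBridgeAvailStreams values as a list of numbers; the 5 percent cutoff and the function names are hypothetical.

```python
def shortfall_fraction(available_streams, minimum):
    """Fraction of samples in which available streams fell below `minimum`."""
    if not available_streams:
        return 0.0
    below = sum(1 for value in available_streams if value < minimum)
    return below / len(available_streams)


def should_add_bridge_resources(available_streams, minimum, max_fraction=0.05):
    """Treat shortfalls in more than `max_fraction` of samples as 'frequent'."""
    return shortfall_fraction(available_streams, minimum) > max_fraction


samples = [12, 8, 3, 15, 2]                     # available streams per poll
print(shortfall_fraction(samples, 5))           # 0.4
print(should_add_bridge_resources(samples, 5))  # True
```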

• Active transcoding resources. With calls coming into your network from the PSTN and from other VoIP networks, you may see problems with codec incompatibility. Among the resources CallManager allocates is a transcoding resource that allows IP phones using different codecs to communicate transparently. Transcoding is particularly useful if bandwidth is tight and certain network segments are restricted to the lower-bandwidth codecs. For example, a call placed using a low-bandwidth codec may be transferred to a voicemail system that requires a G.711 (high-bandwidth codec) data stream. In such a case, the lack of a transcoder can mean a dropped or failed call.

Run the following scripts every three minutes to monitor active resources and to be notified should the number of available resources fall below an acceptable level:

− CiscoCallMgr_TranscoderResources. For CallManager 3.1 and above, monitors active and available transcoder resources on all transcoder devices registered to a CallManager.

− CiscoCallMgr_Transcoder_Device. Monitors an individual transcoder device for active resources and available resources. This script also monitors whether the transcoder device ran out of resources at any time during the specified interval.

− CiscoCallMgr_TranscoderUnavailable. For CallManager 3.1 and above, monitors the number of times during the interval that a CallManager attempted to allocate a transcoder resource when none was available.

• Media Termination Points (MTPs). Available on some Cisco switches, the MTP application supports call hold and transfer for H.323 endpoints and PSTN phones, which would otherwise be unable to hold or transfer calls on a VoIP network. An MTP acts as a proxy: it keeps the held call alive on behalf of the endpoint that lacks hold support while communicating the call’s location to the party at the other end. Without MTPs, many incoming calls that a telephone user places on hold or transfers are dropped.

Because you can’t predict how many incoming calls will need MTP resources at any given moment, it’s a good idea to keep records of how many active streams each MTP supports at different times of the day, how often MTPs are requested, and how often those requests go unfulfilled because of call volume.

− CiscoCallMgr_MTP_Device. Monitors an individual MTP device for active and available resources.
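The MTP record-keeping suggested above can be summarized as an hourly unfulfilled-request rate. This sketch assumes hypothetical per-request records of (hour of day, fulfilled) pairs; CiscoCallMgr_MTP_Device itself reports active and available resources, so where such per-request records come from is left open here.

```python
from collections import defaultdict


def unfulfilled_rate_by_hour(requests):
    """Share of MTP requests that went unfulfilled, per hour of day.

    requests: iterable of (hour, fulfilled) pairs, with hour in 0-23 and
    fulfilled a bool (a hypothetical record format).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for hour, fulfilled in requests:
        totals[hour] += 1
        if not fulfilled:
            failures[hour] += 1
    return {hour: failures[hour] / totals[hour] for hour in sorted(totals)}


records = [(9, True), (9, False), (9, True), (9, True), (14, True)]
print(unfulfilled_rate_by_hour(records))  # {9: 0.25, 14: 0.0}
```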

• Music on Hold (MOH) Servers. A plug-in installed during CallManager installation, the MOH server allows users to hear music while they’re waiting on hold. MOH won’t work unless you also configure the CallManager server to use the MOH streams generated by the MOH server. The MOH Audio Translator application can transform a given .mp3 audio file into MOH audio source files formatted for each of the four supported codec types. Based on a source ID that identifies the type of codec making the MOH request, source files are then sent in streaming (UDP) format to the proper port.

The MOH server has several Windows 2000 performance counters to monitor, and you’ll also want to know if any MOH requests end in failed connections, indicating a configuration mismatch between the server and the CallManager. The IP Voice Media Streaming application that enables MTPs and unicast conference bridges also enables the MOH server, so make sure you receive an alert if it goes down for any reason.

− CiscoCallMgr_MOHDevice. Monitors the number of currently active and available resources of Music On Hold devices.

− CiscoCallMgr_MOHServer_LostConnections. Monitors the number of times during the specified interval that a Music On Hold server lost connections with CallManager.

• Available bandwidth. Voice traffic requires specific bandwidth based on codec. G.711 uses 64 Kbps of voice payload in each direction of a call, plus packet header overhead. G.723 and G.729 require significantly less bandwidth because they compress the voice signal, but congestion can still severely impact call quality. Each time you add a new application to your network, you risk oversubscribing certain links. Congestion will almost certainly affect overall call performance, particularly if it causes data loss or excess latency; under network oversubscription, voice quality can degrade catastrophically.

Ensure that you have adequate bandwidth, and ensure that you know when bandwidth availability is low, by running the following scripts every five minutes:

− CiscoCallMgr_LocationBandwidth. Monitors the current available bandwidth for a Cisco CallManager location.

− NetworkDevice_SingleWANLink_Util. Monitors a single WAN (serial, T1, or T3) link on a network device.

− NetworkDevice_WANLink_Util. Monitors WAN (serial, T1, or T3) links on a network device.
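To estimate how many calls a link can carry, you can add packet header overhead to the codec payload rate. The sketch below assumes 20 ms packetization, a 40-byte IPv4/UDP/RTP header on every packet, no header compression, and no Layer 2 framing; real link-layer overhead will raise the per-call figure. The function names are hypothetical.

```python
# Assumed per-packet overhead: IPv4 (20 bytes) + UDP (8) + RTP (12).
IP_UDP_RTP_HEADER_BYTES = 40

# Codec payload bit rates in bits per second.
CODEC_BITRATE_BPS = {"G.711": 64_000, "G.729": 8_000}


def per_call_bandwidth_bps(codec, packet_ms=20, header_bytes=IP_UDP_RTP_HEADER_BYTES):
    """One-direction IP bandwidth for one call, excluding Layer 2 framing."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = CODEC_BITRATE_BPS[codec] * packet_ms / 1000 / 8
    return (payload_bytes + header_bytes) * 8 * packets_per_second


def max_calls(link_bps, codec, voice_share=1.0):
    """Calls that fit in the share of a link reserved for voice."""
    return int(link_bps * voice_share // per_call_bandwidth_bps(codec))


print(per_call_bandwidth_bps("G.711"))  # 80000.0
print(per_call_bandwidth_bps("G.729"))  # 24000.0
print(max_calls(1_536_000, "G.729"))    # 64
```

Under these assumptions, a G.711 call needs about 80 Kbps in each direction and a G.729 call about 24 Kbps, so a 1.536 Mbps T1 payload could carry at most 19 G.711 calls or 64 G.729 calls if voice had the link to itself.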

• IP phone functionality. You should monitor IP phones for their registration status, the validity of their dial tones, jitter, latency, and lost packet count. By frequently checking CallManager Call Detail Records (CDRs) and Call Management Records (CMRs), you’ll gain access to valuable information about call metrics and call quality. CallManager writes CMRs only for Cisco IP phones and for gateways that use MGCP (Media Gateway Control Protocol) to interface with CallManager. CallManager doesn’t keep these records by default; do the following to start collecting them:

a. In Cisco CallManager Administration, select Service > Service Parameters > CallManager.

b. To enable the generation of CDRs, set CDREnabled to T.

c. To enable the generation of CMRs, set CallDiagnosticsEnabled to T.

The following AppManager scripts provide the IP phone monitoring capability you need:

− CiscoCallMgr_RegHardwarePhones. Monitors the number of registered hardware phones.

− CiscoCallMgr_CCM_PhoneCheck. Monitors for new and missing phones and raises events that identify each phone by directory number or description.

− CiscoCallMgr_CCM_LossOfHardwarePhones. Monitors for loss of hardware phones and raises events based on a configured threshold.

− CiscoCallMgr_CallQuality. Monitors calls recorded in the CallManager database on the Publisher for jitter, latency, and lost data. This script checks CMRs periodically for lost packets, jitter, and latency, all of which can degrade the quality of voice transmission and lead to user complaints. Latency is the most important statistic to track. The CMR estimates latency for a call from the Network Time Protocol (NTP) timestamps that sender and receiver exchange in RTCP reports. One-way latency for a VoIP call should stay below 140 to 150 ms, or call quality noticeably deteriorates.

AppManager will generate an event with the full CDR record that includes source number, destination number, duration of call, failure cause code, and the latency, loss, and jitter metric values averaged for that call.

− CiscoCallMgr_CallFailures. Monitors calls recorded in the CallManager database on the Publisher for calls that ended with an abnormal termination code.

AppManager will generate an event with the full CDR record that includes source number, destination number, duration of call, failure cause code, and the latency, loss, and jitter metric values averaged for that call.

− CiscoCallMgr_CCM_DeviceStatus. Monitors the status of key devices within a cluster. Possible statuses include registered, unregistered, rejected, and unknown.
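The Call Quality thresholds in Appendix B (3% packet loss, 45 ms jitter, 150 ms latency) can be applied to per-call metrics such as those in a CMR. This is a minimal sketch; the CallMetrics record and its field names are hypothetical and do not reflect the actual CMR schema.

```python
from dataclasses import dataclass


@dataclass
class CallMetrics:
    """Per-call averages; field names are illustrative, not the CMR schema."""
    packet_loss_pct: float
    jitter_ms: float
    latency_ms: float


# Thresholds from the Call Quality rows of Appendix B.
THRESHOLDS = {"packet_loss_pct": 3.0, "jitter_ms": 45.0, "latency_ms": 150.0}


def quality_violations(call):
    """Names of the metrics on which `call` exceeds its threshold."""
    return [name for name, limit in THRESHOLDS.items() if getattr(call, name) > limit]


print(quality_violations(CallMetrics(1.0, 20.0, 180.0)))  # ['latency_ms']
```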

• Cisco CallManager CDR Reporting and Analysis. The AppManager for Call Data Analysis module enables customers to collect and report on call detail records (CDRs) produced by VoIP systems such as Cisco CallManager. These records usually contain information such as call origination, call destination, call duration, and call termination status. Most VoIP systems also provide information about the quality of the calls they process, including metrics such as jitter and latency, as well as the number of packets that were sent, received, and lost. With Call Data Analysis, customers can create and schedule detailed reports, using AppManager Knowledge Scripts that analyze the traffic represented by the CDR data. Sample reports include Call Volume Report, Call Success Rate Report, Call Completion Rate Report, Call Failure Cause Report, and Call Quality Report.

IP Gateway Health

We suggest that you constantly monitor VoIP gateways for availability, CPU statistics, memory usage, and link utilization. Run the following AppManager scripts to gather all of the necessary data:

• NetworkDevice_Chassis_Usage. Monitors the physical chassis of a network device.

• NetworkDevice_Interface_Health. Monitors the interfaces on a network device.

• NetworkDevice_LANLink_Util. Monitors the LAN links on a network device.

• NetworkDevice_WANLink_Util. Monitors the WAN (serial, T1, or T3) links on a network device.

QoS Monitoring

In order for VoIP users to receive an acceptable level of voice quality, VoIP traffic must be given priority over other kinds of network traffic, such as data. The main goal of Quality of Service (QoS) is to ensure that VoIP traffic receives the preferential treatment it deserves, thereby reducing or eliminating the delay of voice packets that travel across a network.

You should monitor the following metrics that affect VoIP call quality:

• Delay. The end-to-end delay, or latency, as measured between endpoints is a key factor in determining VoIP call quality.

• Jitter. Jitter, also called delay variation, is the variation in the arrival times of datagrams sent during a simulated VoIP call. High jitter adversely affects call quality.

• Jitter buffer loss. Jitter buffer loss is the amount of data lost when jitter exceeds what the jitter buffer can absorb. It reduces call clarity and therefore overall call quality.

• Packet loss. When a datagram is lost during a VoIP transmission, you can lose an entire syllable or word in a conversation. Obviously, data loss can severely impair call quality.

• MOS. The MOS (Mean Opinion Score) is an overall score representing the quality of a call, expressed as a number between 1 and 5: a MOS of 5 is excellent, and a MOS of 1 is unacceptably bad. By comparing your measured network metrics with the subjective MOS, you can identify which network factor is affecting voice quality.

• R-value. The E-model, defined in ITU (International Telecommunication Union) Recommendation G.107, is a complex calculation whose output is a single score, the R-value, derived from delays and equipment impairment factors. R-values range from 100 (excellent) to 0 (poor), and an R-value can be mapped directly to an estimated MOS.
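The R-value to MOS mapping given in ITU-T G.107 is: MOS = 1 for R < 0; MOS = 1 + 0.035R + 7 × 10^-6 × R(R − 60)(100 − R) for 0 ≤ R ≤ 100; and MOS = 4.5 for R > 100. A direct Python transcription follows; the function name is hypothetical.

```python
def r_to_mos(r):
    """Estimated MOS for an E-model R-value (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # Cubic mapping used between the two clamped ends of the scale.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


print(round(r_to_mos(93.2), 2))  # 4.41
```

The formula meets the clamped values exactly at R = 0 (MOS 1) and R = 100 (MOS 4.5), so the piecewise function is continuous.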

Several AppManager scripts simulate a VoIP call between Performance Endpoints. After simulating a call, the scripts can gather data about some or all of the QoS metrics as they relate to your network:

• VoIPQuality_CallPerf_G711a. Simulates a VoIP call between endpoints using the G.711a codec, an ITU standard codec supported by H.323-compliant devices. Uses A-law companding, a popular standard in Europe.

• VoIPQuality_CallPerf_G711u. Simulates a VoIP call between endpoints using the G.711u codec, an ITU standard codec supported by H.323-compliant devices. Uses μ-law companding, the most frequently used method in North America.

• VoIPQuality_CallPerf_G723.1-ACELP. Simulates a VoIP call between endpoints using the G.723.1-ACELP codec, which uses the algebraic code-excited linear prediction (ACELP) compression algorithm.

• VoIPQuality_CallPerf_G723.1-MPMLQ. Simulates a VoIP call between endpoints using the G.723.1-MPMLQ codec, which uses the multipulse maximum likelihood quantization (MPMLQ) compression algorithm.

• VoIPQuality_CallPerf_G726. Simulates a VoIP call between endpoints using the G.726 codec, a waveform codec that uses Adaptive Differential Pulse Code Modulation (ADPCM). ADPCM is a variation of pulse code modulation (PCM) that encodes only the difference between successive samples, producing a lower bit rate.

• VoIPQuality_CallPerf_G729. Simulates a VoIP call between endpoints using the G.729 codec, which compresses speech to 8 Kbps while maintaining high quality.

• VoIPQuality_CallPerf_G729A. Simulates a VoIP call between endpoints using the G.729A codec, a reduced-complexity version of the G.729 codec. It was developed for simultaneous voice and data applications for which the G.729 codec was too complex. Speech quality is virtually indistinguishable between G.729 and G.729A.

Many other Knowledge Scripts simulate a VoIP call between Cisco SAA-enabled routers. The VoIPQuality_CiscoSAA scripts simulate calls using the same codecs as the VoIPQuality_CallPerf scripts.

Finally, the CiscoCallMgr_CallQuality Knowledge Script monitors calls recorded in the CallManager database on the Publisher for jitter, latency, and lost data.

Layer 2 and 3 Switches

We highly recommend that you continually monitor Layer 2 and Layer 3 switches for switch failures, card failures (such as reboots and crashes), memory utilization, CPU utilization, power supply status, temperature status, fan status, QoS parameters, and IP phone port status.

Three AppManager scripts provide the monitoring capability you need:

• NetworkDevice_Chassis_Usage. Monitors the physical chassis of a network device.

• NetworkDevice_Interface_Health. Monitors the interfaces on a network device.

• NetworkDevice_LANLink_Util. Monitors the LAN links on a network device.

Reporting

AppManager collects data about the performance of IP telephony and stores it in the AppManager repository, a Microsoft SQL Server database. You can access this data in real time or historically for all of your reporting needs.

Real-time

The AppManager Chart Console lets you generate and view charts of data streams generated by Knowledge Script jobs. As the jobs run, the data streams in the charts are continually updated with new information. The Chart Console provides key data that you can use instantly to manage and troubleshoot your Cisco IPT environment.

You can use the AppManager GUI- or Web-based Chart Console to view collected data in real-time at regular intervals as low as one minute. Viewing of data can be organized and segmented by data stream and access to charts can be restricted by AppManager user login.

All data displayed in charts can be easily viewed using AppManager Report scripts and selecting the desired data stream. In addition, AppManager ships the Chart2HTML Knowledge Script, which allows you to easily convert charts to Reports.

Historical

We recommend that you collect trending information whenever and wherever possible. Trending information should contain at least maximum and average values, which can then be used to define “above average” and “peak” thresholds for the different parameters. Where possible, define these thresholds from the average and maximum values observed during the busy hour of the day, to avoid unnecessary alerts.
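The busy-hour threshold approach above can be sketched as follows, assuming you have grouped historical samples for one parameter by hour of day and have already identified the busy hour. The function names and the three labels are hypothetical.

```python
from statistics import mean


def busy_hour_thresholds(samples_by_hour, busy_hour):
    """'Above average' and 'peak' thresholds from busy-hour samples.

    samples_by_hour maps hour of day (0-23) to values collected for one
    parameter (for example, CPU percent) over many days.
    """
    values = samples_by_hour[busy_hour]
    return {"above_average": mean(values), "peak": max(values)}


def classify(value, thresholds):
    """Label a new sample relative to the busy-hour thresholds."""
    if value > thresholds["peak"]:
        return "peak"
    if value > thresholds["above_average"]:
        return "above average"
    return "normal"


history = {10: [40, 50, 60], 3: [5, 6]}
limits = busy_hour_thresholds(history, busy_hour=10)
print(classify(55, limits))  # above average
```

Deriving both thresholds from the busy hour, rather than from the whole day, keeps routine peak-time load from triggering alerts.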

Compiling the Collected Data

AppManager reports are generated using Report Knowledge Scripts. AppManager ships with dozens of Report scripts that generate HTML reports from any type of collected data. Access to reports can be restricted using Microsoft IIS web site directory security.

The following is a list of frequently used generic report Knowledge Scripts:

• AggValueHistory. Generates a report from data in the archive and aggregate tables.

• AvgMaxMinValue. Displays the average, maximum, and minimum values of the data stream(s) collected by a Knowledge Script within a specified time frame.

• AvgValueByDay. Details the average daily value of data streams collected by Knowledge Script jobs.

• AvgValueByHr. Displays the average values by hour of the data stream(s) collected by a Knowledge Script within a time range.

• AvgValueByMin. Displays the average values by minute of the data stream(s) collected by a Knowledge Script within a time range.

Analyzing Call Activity

This section introduces a formal process that you can adapt to your organization to establish baseline and trend information and to act on the information collected this way.

It is important to note that unexpected additions of phones, sidecars (7914s), gateways, or applications (IP Softphones, Web attendants, IP Manager Assistants, etc.), or any other changes, might affect system utilization. Therefore, document your plans for analyzing the collected data and for capacity planning. Here is an example of the steps involved in a post-analysis of collected data:

1. Analyze the Call Detail Records (CDRs) and/or relevant performance counters to determine attempted calls (CA) and completed calls (CC). If using CDRs, group the data into time slots (for example, 15 minutes’ worth of data at a time). Compute the HCA (hourly CA) and the HCC (hourly CC) for every time slot of data. For example, to get hourly values from 15-minute slots, multiply the numbers found for each slot by 4.

2. Using the data above, you can determine:

− The busiest hour during the day (all days);

− The busiest hour during the week (all weeks);

− The top three busiest days of the year.

“Busiest” can be defined in terms of HCA, HCC, or even talk-minutes if you are looking at the problem from a cost perspective. If you are using 15-minute time slots for data analysis, finding the “weekly busiest hour of call attempts (BHCA)” means finding the four consecutive time slots that have the highest combined number of call attempts. The result could be, for example: “on Tuesday from 9:45am to 10:45am, with a BHCA value of 1,252.”

3. Once you’ve determined your busiest hours (daily and weekly) and your top three busiest days (yearly), use the [data] collected during:

− the busiest hour during the day (averaged),

− the busiest hour during the week (averaged), and

− the busiest hour of the busiest days (considered as a peak value)

Note: [data] could be any of the parameters highlighted in this document (total virtual memory used, virtual bytes used by CCM.exe, and so on), so for each [data] element you want to baseline, you’ll obtain three values.
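Steps 1 and 2 can be sketched in Python. The sketch assumes call attempts (CA) have already been counted per 15-minute slot, in time order; it scales a slot count to an hourly rate and slides a four-slot window to find the busiest hour. The function names are hypothetical. Summing raw attempts over four consecutive slots gives the attempts actually made in that hour; ranking windows by HCA instead would select the same window, because every slot is scaled by the same factor.

```python
def hourly_rate(slot_count, slot_minutes=15):
    """Scale a per-slot count to an hourly rate (x4 for 15-minute slots)."""
    return slot_count * (60 // slot_minutes)


def busiest_hour(slot_counts, slots_per_hour=4):
    """Find the run of consecutive slots with the most call attempts.

    slot_counts: call attempts (CA) per 15-minute slot, in time order.
    Returns (start_index, total_attempts) for the busiest hour.
    """
    if len(slot_counts) < slots_per_hour:
        raise ValueError("need at least one full hour of slots")
    window = sum(slot_counts[:slots_per_hour])
    best_start, best_total = 0, window
    for start in range(1, len(slot_counts) - slots_per_hour + 1):
        # Slide the window one slot forward: add the newest, drop the oldest.
        window += slot_counts[start + slots_per_hour - 1] - slot_counts[start - 1]
        if window > best_total:
            best_start, best_total = start, window
    return best_start, best_total


counts = [100, 200, 300, 250, 400, 150, 50]
print(hourly_rate(300))      # 1200
print(busiest_hour(counts))  # (1, 1150)
```

For these sample counts, the busiest hour starts at the second slot and totals 1,150 attempts.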

You can use the data you’ve collected to plan system upgrades or to analyze whether your system is likely to sustain periods of high usage. For example, if you plan to add more phones to your system within six months, monitor call activity during the busiest hour of the busiest day for one month and divide it by the current number of phones to determine the peak call activity per phone.
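The per-phone calculation in the example above is a simple ratio. The sketch below also projects busy-hour attempts for a planned phone count; that projection assumes per-phone activity stays constant as phones are added, which is an assumption rather than a guarantee. The function names and the 500-phone figure are hypothetical.

```python
def peak_calls_per_phone(busiest_hour_attempts, phone_count):
    """Busy-hour call attempts per phone."""
    if phone_count <= 0:
        raise ValueError("phone_count must be positive")
    return busiest_hour_attempts / phone_count


def projected_busy_hour_attempts(busiest_hour_attempts, phone_count, planned_phones):
    """Project busy-hour attempts, assuming per-phone activity stays constant."""
    return peak_calls_per_phone(busiest_hour_attempts, phone_count) * planned_phones


print(peak_calls_per_phone(1252, 500))                      # 2.504
print(round(projected_busy_hour_attempts(1252, 500, 650)))  # 1628
```

With a busy-hour figure of 1,252 attempts across 500 phones, each phone averages about 2.5 attempts in the busy hour, and 650 phones would project to roughly 1,628 attempts.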

Conclusion

Cisco CallManager is an excellent choice for your IP telephony implementation. But as with any sophisticated system, there may be a few hurdles along the way to your goal of a VoIP network with five-nines of reliability. Network hardware and links go down; software applications, services, and processes consume limited CPU and memory resources; intruders interfere with administrative files and records. Cisco has worked hard to ensure the Cisco CallManager system will be as reliable as the telephone networks we all take for granted.

Keeping the network running perfectly, all the time, requires proactive management and a good understanding of the various system components including the operating systems, databases, and servers that support Cisco CallManager. An intelligent deployment and ongoing monitoring practice are required to keep Cisco CallManager and its associated software and hardware operational, efficient, and reliable — a task that can be quite time-consuming. NetIQ AppManager for VoIP software and Knowledge Scripts provide the strategic and tactical tools in support of the necessary monitoring tasks required for company-specific SLAs.

Appendix A: Supported Environments

AppManager modules support the following platforms:

Module | Supported platforms
Cisco CallManager | Cisco CallManager 3.0(x), 3.1(x), 3.2(x), 3.3(x), 3.4(x), 4.0(x)
Cisco Intelligent Contact Manager (ICM) | Cisco ICM 4.6 or later
Cisco ICS | Cisco Integrated Communication System 7750
Cisco IP Interactive Voice Response (IP IVR) | Cisco IP IVR 2.2 or later
Cisco IP/TV | Cisco IP/TV 3.2 or later
Cisco Personal Assistant | Cisco Personal Assistant 1.2 or later
Cisco Unity | Cisco Unity 3.0(2), 3.1(x) and 4.0(x)
Cisco Unity Bridge | Cisco Unity Bridge 2.1 or later
Compaq Insight Manager | Cisco Media Convergence Server (MCS) 7800 series; Compaq Insight Manager agent 3.2 or later
Dell OpenManage | Dell PowerEdge servers running Dell OpenManage version 3.1 or later
H.323 Call Setup | Microsoft Windows 2000 SP2, or Windows NT 4.0 SP6a
Lotus Domino Unified Messaging | Domino Server 4.5, 4.6 or 5.0
Microsoft Exchange Unified Messaging | Exchange Server 5.0, 5.5 and Exchange 2000 Server
Network Devices | Cisco Systems switches, routers and gateways, including VG200/248; Nortel BayStack switches, models 460 and above; Nortel Networks routers, BayRS v14 and above; Nortel Access Stack Node (ASN) Series; Nortel Backbone Concentrator Node (BCN) Series; Nortel Backbone Link Node (BLN) Series; Nortel Backbone Node (BN) Series; Nortel Passport Advanced Remote Node (ARN) Series; Nortel Passport Series, including 8600 series; Extreme Networks switches using ExtremeWare v6.1.8 and above; Alcatel OmniSwitch/Router 6000 and 7000 Series
SIP Call Setup | Microsoft Windows 2000 SP2
Video Quality | Microsoft Windows 2000 SP2, or Windows NT 4.0 SP6a; Microsoft Windows Media Player version 7.x or later; RealOne Player or RealPlayer G2 or later
VoIP Quality (Call Performance) | Microsoft Windows 2000 SP2, Windows NT 4.0 SP6a; Linux for x86; Sun Solaris (x86 and SPARC)
Windows | Microsoft Windows 2000 SP2, or Windows NT 4.0 SP6a

Appendix B: Summary Guidelines

CallManager Server Health | Required AM KS | Thresholds (Warn/Crit) | Polling Interval | Data Collection

CPU
CallManager CPU Usage | CiscoCallMgr_CCM_SystemUsage | – | Every 5 Minutes | Y
Total CPU | CiscoCallMgr_CCM_SystemUsage | – | Every 5 Minutes | Y
CPU for CallManager Process | CiscoCallMgr_CCM_CpuHigh | 90% | Every 5 Minutes | –
CPU for all other CallManager processes | CiscoCallMgr_CCM_CpuHigh | 20% | Every 5 Minutes | –
Isolate Process Spikes in CPU | NT_TopCpuProcs | – | – | Y

Memory
Memory by the CallManager Process | CiscoCallMgr_CCM_SystemUsage | – | Every 5 Minutes | Y
Total Memory | CiscoCallMgr_CCM_SystemUsage | – | Every 5 Minutes | Y
Memory for all CallManager Processes | CiscoCallMgr_CCM_MemHigh | – | Every 5 Minutes | –
Memory for all other processes | CiscoCallMgr_CCM_MemHigh | – | Every 5 Minutes | –
Isolate Process Spikes in Memory (Memory Leaks) | NT_TopMemProcs | n/a | – | Y
Physical Memory | NT_MemUtil | n/a | – | –
Virtual Memory | NT_MemUtil | 90% | Every 5 Minutes | Y
Paging Space | NT_MemUtil | 70% | Every 5 Minutes | Y
Paging High | NT_PagingHigh | – | Every 10 Minutes | –

Disk
Disk Usage | NT_LogicalDiskSpace | n/a / 80% | Every 12 Hours | –
Disk Array Status | CIM_DiskArrayFail | Down | Every 12 Hours | –

Fans
Fan Status | CIM_FanSummary | Down | – | –

Power Supply
Battery Status | CIM_UPSBatteryLow | – | – | –
AC Power Status | CIM_UPSLineStatus | – | – | –

Temperature
Server Temperature Status | CIM_ThermalStatus | – | – | –

Network Interface Cards
NIC Card Bandwidth Utilization | NT_NetworkBusy | – | Every 15 Minutes | –
Server Up/Down | – | Down | – | –

CCM AppManager Application Monitors | Required AM KS | Thresholds (Warn/Crit) | Polling Interval | Data Collection

AppManager 6.0 Services
netiQms | – | Down | – | –
netiQmc | – | Down | – | –
netiQccm | – | Down | – | –

CCM SQL (CallManager Database) | Required AM KS | Thresholds (Warn/Crit) | Polling Interval | Data Collection

SQL Accessibility | CiscoCallMgr_SQL_Accessibility | n/a | – | –
SQL Server Status | CiscoCallMgr_SQL_ServerDown | Down | – | –
SQL Transaction Replication | CiscoCallMgr_SQL_RepTransaction | Trans to be Repl. Max # 10 | Every 1 Hour | –

CCM Application Services (CallManager Services Health) | Required AM KS | Thresholds (Warn/Crit) | Polling Interval | Data Collection

CallManager Services
Cisco Call Service (Cisco Call) | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y
Cisco DB Layer Monitor Service (Aupair) | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y
Cisco TFTP Service (Cisco Tftp) | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y
Cisco IP Voice Media Streaming App Service | n/a | – | – | –
Cisco Message Interface service | n/a | – | – | –
Cisco Telephony Call Dispatcher Service (Cisco Telephony Call Dispatcher) | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y
Cisco DC Directory Server Service | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y
Cisco SNMP Data Collector service | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y
Cisco Extension Mobility Logout (CiscoUserLogoutSvc) | CiscoCallMgr_CCM_HealthCheck | n/a | Every 1 minute | Y
Cisco MOH Audio Translator service | n/a | – | – | –
Cisco RIS Data Collector service (Cisco RIS Data Collector) | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y
Cisco CDR Insert Service (InsertCDR) | CiscoCallMgr_CCM_HealthCheck | Down | Every 1 minute | Y

CallManager Heartbeat
CallManager Keep-Alive | CiscoCallMgr_CCM_Heartbeat | – | – | –

CallManager TFTP Status
TFTP Requests | CiscoCallMgr_TFTPRequests | 30% | Every 30 Minutes | –
TFTP Errors
Not Found errors | CiscoCallMgr_TFTPErrors | – | Every 30 Minutes | –
Request Aborted | CiscoCallMgr_TFTPErrors | – | Every 30 Minutes | –
Overflow errors | CiscoCallMgr_TFTPErrors | – | Every 30 Minutes | –

CallManager Role Status
Primary/Secondary status of all CallManagers | CiscoCallMgr_CCM_RoleStatus | Change in status (CCM_Publisher: Secondary --> Pri; Primary Sub: Pri --> Secondary; Secondary Sub: Secondary --> Pri) | Every 5 minutes | –

CallManager Backup Status
burBackup service (burBack) | CiscoCallMgr_CiscoBackupStatus | Down | Once a day | –
Successful backup | CiscoCallMgr_CiscoBackupStatus | fails | Once a day | –
CA Arcserve BrightSTOR Agent | – | Down | – | –

IIS Server Service Health
IIS Service Status | CiscoCallMgr_IIS_HealthCheck | Down | Every 1 minute | –

Domain Name Service (can be added)
DNS Connectivity | NT_DNSConnectivity | – | Every 1 Hour | –

Security
Failed Logon Attempts | NT_FailedLogons | – | Every 1 Hour | Y

CallManager Functionality | Required AM KS | Thresholds (Warn/Crit) | Polling Interval | Data Collection

Call Information (thresholds: n/a)
CallManager Calls Active | CiscoCallMgr_CallsActive | – | Every 5 Minutes | Y
CallManager Calls In Progress | CiscoCallMgr_Call_in_Progress | – | Every 5 Minutes | Y
Calls Attempted/Calls Completed (Busy Hour Reporting) | CiscoCallMgr_CallActivity | – | Every 5 Minutes | Y

Call Quality
Packet Loss | CiscoCallMgr_CallQuality | 3% | Every 5 Minutes | Y
Jitter | CiscoCallMgr_CallQuality | 45 ms | Every 5 Minutes | Y
Latency | CiscoCallMgr_CallQuality | 150 ms | Every 5 Minutes | Y
CallFailures | CiscoCallMgr_CallFailures | – | Every 5 Minutes | Y

IP Phone Functionality
Loss of HW Phones | CiscoCallMgr_LossOfHardwarePhones | 10% | Every 5 Minutes | –
New or Missing Phones | CiscoCallMgr_PhoneCheck | – | – | Y
Registered Hardware Phones | CiscoCallMgr_RegHardwarePhones | 2000 Phones / 3 Phones | Every 30 Minutes | Y
Status of Critical Phones | CiscoCallMgr_DeviceStatus | – | – | –

MGCP Gateway Registration
New or Missing Gateways | CiscoCallMgr_MGCP_GatewayCheck | – | – | –
Gateway Registration Status | CiscoCallMgr_DeviceStatus | – | – | –
Gatekeeper Registration Status | CiscoCallMgr_DeviceStatus | – | – | –

MGCP Call Activity
FXO calls completed and outbound busy attempts | CiscoCallMgr_MGCP_FXO | – | Every 5 Minutes | Y
FXO calls active and in-service FXO ports | CiscoCallMgr_CCM_FXOPorts | – | Every 5 Minutes | Y
FXS calls completed and outbound busy attempts | CiscoCallMgr_MGCP_FXS | – | Every 5 Minutes | Y
FXS calls active and in-service FXS ports | CiscoCallMgr_CCM_FXSPorts | – | Every 5 Minutes | Y
PRI calls active and out-of-service channels | CiscoCallMgr_MGCP_PRI_Channels | – | Every 5 Minutes | Y
PRI calls completed, outbound busy attempts, and D-Channel status | CiscoCallMgr_MGCP_PRI | – | Every 5 Minutes | Y
T1 calls active and out-of-service channels | CiscoCallMgr_MGCP_T1CAS_Channels | – | Every 5 Minutes | Y
T1 calls completed and outbound busy attempts | CiscoCallMgr_MGCP_T1CAS | – | Every 5 Minutes | Y

H323 Call Activity
H323 Calls Attempted | CiscoCallMgr_H323_CallsAttempted | – | Every 5 Minutes | Y
H323 Calls In Progress | CiscoCallMgr_H323_CallsInProgress | – | Every 5 Minutes | Y

Analog Port Activity
Analog Ports Active | CiscoCallMgr_AnalogPortsActive | – | Every 5 Minutes | Y
Analog Ports out of Service | CiscoCallMgr_AnalogPortsOutOfService | – | Every 5 Minutes | Y

Music On Hold Resources
Status of active and available MOH resources | CiscoCallMgr_MOHDevice | – | – | Y
MOH server connection status with CallManager | CiscoCallMgr_MOHServer_LostConnections | – | – | Y

Transcoder Resources
Active and available transcoder resources | CiscoCallMgr_TranscoderResources | – | – | Y

Media Termination Points
Active and available MTP resources | CiscoCallMgr_MTP_Device | – | – | Y

Cisco CallManager Operational Reports

Registered Hardware Phones | ReportAM_AvgValByDay
CallManager Service Availability | CiscoCallMgr_Report_ServicesAvailability
SW & HW Inventory | ReportAM_Inventory

System Usage
CPU | CiscoCallMgr_Report_SystemUsage
Memory and Memory Leaks | CiscoCallMgr_Report_SystemUsage
Virtual Memory (Memory Leaks) | ReportAM_AvgValueByHr

CallManager Call Information
Calls In Progress | ReportAM_AvgValueByHr
Calls Active | CiscoCallMgr_Report_CallsByHour
Busy Hour Calls Attempted/Calls Completed across CallManagers | CiscoCallMgr_Report_CallsByHour
Busy Hour Calls Attempted/Calls Completed per CallManager | CiscoCallMgr_Report_CallActivity
CallQuality | CiscoCallMgr_Report_CallQuality

MGCP Call Information
Total Active Channels per PRI/T1 | CiscoCallMgr_Report_MGCPChannelUsage
Total Active Calls per MGCP Gateway | CiscoCallMgr_Report_GatewayUsage