IBM Tivoli Netcool/OMNIbus 7.3.1 Key Performance Indicators

Presented by Christina Grimes

© 2010 IBM Corporation
Page 1: 2012 04 12 STE Omnibus Performance

Agenda

– Introduction
– Object Server Key Performance Indicators
– Probe and Gateway Key Performance Indicators
– Q&A

Introduction

To check that Omnibus is performing well, you can monitor several key performance indicators (KPIs).

When using the KPIs, first establish a baseline on the system while it is under normal load and operation.

Use the KPIs to measure the effect of changes to the environment: compare the KPI measurements taken after a change against the baseline.
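The baseline comparison can be sketched in Python as follows. The snippet computes the percentage change of each KPI against its baseline value. The KPI names and values are hypothetical placeholders, not OMNIbus output, so substitute the measurements you actually collect.

```python
# Hypothetical KPI names and values: replace them with the KPIs you actually collect.
def compare_to_baseline(baseline, current):
    """Return the percentage change of each KPI relative to its baseline value."""
    changes = {}
    for name, base in baseline.items():
        if name not in current:
            continue  # KPI was not measured after the change
        if base == 0:
            changes[name] = None  # percentage change is undefined for a zero baseline
        else:
            changes[name] = (current[name] - base) * 100.0 / base
    return changes


# Illustrative values only (not taken from a real system).
baseline = {"profiler_total_s": 20.0, "status_rows": 40000, "connections": 50}
after_change = {"profiler_total_s": 30.0, "status_rows": 42000, "connections": 50}

for kpi, pct in compare_to_baseline(baseline, after_change).items():
    shown = "n/a" if pct is None else f"{pct:+.1f}%"
    print(f"{kpi}: {shown}")
```

With these sample values, the profiler total shows +50.0%, the row count +5.0% and connections +0.0%.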

Object Server Key Performance Indicators

– Object Server Granularity
  • Profiler Report
  • Trigger Stats
– Number of rows in alerts.status, alerts.journal and alerts.details
– Number of inserts in the alerts.status table
– Number of Connections
– CPU usage of nco_objserv
– Memory Usage of nco_objserv
– Memstore usage

Object Server Granularity

The Object Server Granularity property specifies the update interval for IDUC (Insert, Delete, Update, Control) notifications to clients.

The default Granularity period is 60 seconds; changing the default is not recommended.

The Object Server should service all client and trigger requests within the Granularity period.

Client and trigger requests may temporarily take more time than the Granularity period.

If client and trigger requests consistently take more time than the Granularity period, Object Server performance degrades, resulting in:

– Client disconnections
– Delays or errors when logging in to the Object Server
– Delays in propagating events between multiple Object Servers
– Eventually, an unresponsive Object Server
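The difference between temporary and consistent overruns can be made concrete with a small check, sketched here in Python. The input is the busy time (client plus trigger time) for each Granularity period. Treating "consistently" as three consecutive periods over the limit is a hypothetical threshold and is not an OMNIbus setting.

```python
GRANULARITY_S = 60.0  # default Granularity period


def consistently_over(busy_seconds_per_period, granularity_s=GRANULARITY_S, run_length=3):
    """Return True when client + trigger time exceeds the Granularity period
    in `run_length` consecutive periods.

    run_length is a hypothetical definition of "consistently"; isolated
    overruns are tolerated because requests may temporarily take longer.
    """
    run = 0
    for busy in busy_seconds_per_period:
        run = run + 1 if busy > granularity_s else 0
        if run >= run_length:
            return True
    return False


print(consistently_over([45.0, 72.0, 50.0, 65.0]))  # False: isolated spikes only
print(consistently_over([61.5, 63.0, 70.2, 55.0]))  # True: three periods in a row over 60 s
```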

Object Server Profiler Report

Profile statistics are also logged to the profile log file $NCHOME/omnibus/log/ObjectServername_profiler_report.log#.

The profiler report shows a breakdown of the time spent for each client connection, and the total time spent by client type, for each granularity period.

You can use the profile log file to analyze how the ObjectServer time is spent during each granularity period and calculate the percentage of time used. For example, if the granularity period is set to 60 seconds and the total time spent for all the connections during a particular period was 30 seconds, you can calculate that the ObjectServer spent 50% of its available time on running SQL commands from client connections.

Example Profiler Report

Mon Oct 12 17:39:46 2009: Individual user profiles:
Mon Oct 12 17:39:46 2009: 'Administrator' (uid = 0) time on adminhost: 0.000000s
Mon Oct 12 17:39:46 2009: 'isql' (uid = 0) time on omnihost1.ibm.com: 3.770000s
Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on probehost.ibm.com: 5.010000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s
Mon Oct 12 17:39:46 2009: 'c@xxxxx@xxxxx:11.0' (uid = 45) time on omnihost1.ibm.com: 0.000000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 45) time on omnihost1.ibm.com: 9.870000s
Mon Oct 12 17:39:46 2009: 'c@xxxxx@xxxxx:11.0' (uid = 55) time on omnihost1.ibm.com: 0.000000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 55) time on omnihost1.ibm.com: 6.020000s
Mon Oct 12 17:39:46 2009: 'GATEWAY' (uid = 0) time on omnihost1.ibm.com: 0.270000s
Mon Oct 12 17:39:46 2009: 'GATEWAY' (uid = 0) time on omnihost1.ibm.com: 0.000000s
Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on omnihost1.ibm.com: 3.010000s
Mon Oct 12 17:39:46 2009: Grouped user profiles:
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'PROBE': 8.020000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'GATEWAY': 0.270000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'c@xxxxx@xxxxx:11.0': 0.000000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'e@c0B4D@c0142:11.0': 25.93000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'isql': 3.77000s
Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'Administrator': 0.000000s
Mon Oct 12 17:39:46 2009: Total time in the report period (59.275782s): 29.980000s
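Because the profiler report is plain text, it can be scanned with regular expressions. The Python sketch below assumes the line formats in the example report (other OMNIbus versions may word them differently). It extracts the per-connection times and the report total, then lists the connections with the highest times. It reads the total from the summary line rather than recomputing it from the individual lines.

```python
import re

# Per-connection line: '<application>' (uid = N) time on <host>: <seconds>s
INDIVIDUAL = re.compile(
    r"'(?P<app>[^']*)' \(uid = (?P<uid>\d+)\) time on (?P<host>[^\s:]+): (?P<secs>[\d.]+)s"
)
# Summary line: Total time in the report period (<period>s): <busy>s
TOTAL = re.compile(r"Total time in the report period \((?P<period>[\d.]+)s\): (?P<busy>[\d.]+)s")


def parse_profiler_report(text):
    """Return (per-connection tuples, report period in s, total busy time in s)."""
    clients = [
        (m["app"], int(m["uid"]), m["host"], float(m["secs"]))
        for m in INDIVIDUAL.finditer(text)
    ]
    total = TOTAL.search(text)
    if total is None:
        return clients, None, None
    return clients, float(total["period"]), float(total["busy"])


def top_clients(clients, n=2):
    """Connections sorted by time spent, highest first."""
    return sorted(clients, key=lambda c: c[3], reverse=True)[:n]


# Excerpt of the example report above.
SAMPLE = """\
Mon Oct 12 17:39:46 2009: 'isql' (uid = 0) time on omnihost1.ibm.com: 3.770000s
Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on probehost.ibm.com: 5.010000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s
Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 45) time on omnihost1.ibm.com: 9.870000s
Mon Oct 12 17:39:46 2009: Total time in the report period (59.275782s): 29.980000s
"""

clients, period, busy = parse_profiler_report(SAMPLE)
for app, uid, host, secs in top_clients(clients):
    print(f"{app} (uid={uid}) on {host}: {secs:.2f}s")
print(f"busy {busy:.2f}s of {period:.2f}s")
```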

Analyzing the Profiler Report

Goal: Profiler Report “Total time in the report period” + Trigger Statistics “Time for all triggers” < Object Server Granularity period of 60 seconds.

• The current Total time in the report period of 29.980000s indicates that ~50% of the Object Server Granularity period is spent servicing client connections.

• Identify the clients with the highest times in the report period:

Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s

Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 45) time on omnihost1.ibm.com: 9.870000s

• High times for probes can indicate an event flood or a large number of status or details inserts.
• High times for event lists or WebGUI clients can indicate inefficient filters.
• High times for Object Server gateway clients can indicate the transfer of many events or a resync.

Object Server Trigger Statistics

Trigger statistics are also logged to the file $NCHOME/omnibus/log/ObjectServername_trigger_stats.log#.

The trigger statistics log file shows the amount of time that each trigger has used in the last profiling period.

You can use this log file for automation debugging, and to determine which triggers are slow due to slow-running SQL queries.

Example Trigger Statistics

Mon Oct 12 18:03:56 2009: Trigger Profile Report
….
Mon Oct 12 18:03:56 2009: Trigger Group 'primary_only'
Mon Oct 12 18:03:56 2009: Trigger time for 'generic_clear': 5.879707s
Mon Oct 12 18:03:56 2009: Trigger time for 'expire': 0.008233s
Mon Oct 12 18:03:56 2009: Trigger time for 'delete_clears': 0.007219s
Mon Oct 12 18:03:56 2009: Trigger time for 'enrich_and_correlate': 23.007219s
…
Mon Oct 12 18:03:56 2009: Trigger Group 'iduc_triggers'
Mon Oct 12 18:03:56 2009: Trigger time for 'disconnect_iduc_missed': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'iduc_stats_update': 0.000949s
Mon Oct 12 18:03:56 2009: Trigger time for 'iduc_messages_tblclean': 0.000089s
Mon Oct 12 18:03:56 2009: Trigger time for 'deduplicate_iduc_stats': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'iduc_stats_insert': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger Group 'automatic_backup_system'
Mon Oct 12 18:03:56 2009: Trigger time for 'backup_succeeded': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'backup_failed': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger time for 'backup_state_integrity': 0.000000s
Mon Oct 12 18:03:56 2009: Trigger Group 'gateway_triggers'
Mon Oct 12 18:03:56 2009: Trigger time for 'resync_finished': 0.000000s
Mon Oct 12 18:03:56 2009: Time for all triggers in report period (60s): 29.789663s
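The trigger statistics log can be scanned with regular expressions in the same way. The Python sketch below assumes the line formats in the example above. Keying the result by trigger name assumes each trigger name appears once per report.

```python
import re

TRIGGER_TIME = re.compile(r"Trigger time for '(?P<name>[^']+)': (?P<secs>[\d.]+)s")
ALL_TRIGGERS = re.compile(
    r"Time for all triggers in report period \((?P<period>[\d.]+)s\): (?P<secs>[\d.]+)s"
)


def parse_trigger_stats(text):
    """Return ({trigger name: seconds}, total trigger seconds or None)."""
    times = {m["name"]: float(m["secs"]) for m in TRIGGER_TIME.finditer(text)}
    total = ALL_TRIGGERS.search(text)
    return times, (float(total["secs"]) if total else None)


def slowest_triggers(times, n=2):
    """Triggers sorted by time used, highest first."""
    return sorted(times.items(), key=lambda item: item[1], reverse=True)[:n]


# Excerpt of the example trigger statistics above.
SAMPLE = """\
Mon Oct 12 18:03:56 2009: Trigger Group 'primary_only'
Mon Oct 12 18:03:56 2009: Trigger time for 'generic_clear': 5.879707s
Mon Oct 12 18:03:56 2009: Trigger time for 'expire': 0.008233s
Mon Oct 12 18:03:56 2009: Trigger time for 'delete_clears': 0.007219s
Mon Oct 12 18:03:56 2009: Trigger time for 'enrich_and_correlate': 23.007219s
Mon Oct 12 18:03:56 2009: Time for all triggers in report period (60s): 29.789663s
"""

times, total = parse_trigger_stats(SAMPLE)
for name, secs in slowest_triggers(times):
    print(f"{name}: {secs:.3f}s")
print(f"all triggers: {total}s")
```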

Analyzing Trigger Statistics

Goal: Profiler Report “Total time in the report period” + Trigger Statistics “Time for all triggers” < Granularity period of 60 seconds.

• The current Profiler Report Total time in the report period (29.980000s) plus the Trigger Statistics Time for all triggers (29.789663s) is 59.769663s. This indicates that the Object Server is at ~100% utilization of the 60-second Granularity period.
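The goal check is a simple sum and comparison. The Python sketch below applies it to the two totals from the example reports and also expresses the result as a percentage of the default 60-second Granularity period.

```python
GRANULARITY_S = 60.0  # default Granularity period


def granularity_budget(profiler_total_s, trigger_total_s, granularity_s=GRANULARITY_S):
    """Apply the goal: profiler total + trigger total < Granularity period.

    Returns (busy seconds, utilization in percent, goal met?).
    """
    busy = profiler_total_s + trigger_total_s
    utilization = busy * 100.0 / granularity_s
    return busy, utilization, busy < granularity_s


busy, utilization, met = granularity_budget(29.980000, 29.789663)
print(f"busy={busy:.6f}s utilization={utilization:.1f}% goal_met={met}")
# busy=59.769663s utilization=99.6% goal_met=True
```

With these numbers the goal is met by only about 0.23 s, which is effectively full use of the period. How much safety margin to keep below 60 s is a judgment call for each environment.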

• Identify the triggers with the highest times:

Mon Oct 12 18:03:56 2009: Trigger time for 'enrich_and_correlate': 23.007219s
Mon Oct 12 18:03:56 2009: Trigger time for 'generic_clear': 5.879707s

High times for generic_clear or deduplication triggers can indicate high event throughput or a high number of resident events.

Ensure that best practices are followed when creating custom triggers:

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_bestpracticestriggers.html

Keep trigger execution time to a minimum: no other writes can be performed in the Object Server while a trigger is executing.

Number of Rows in alerts.status, alerts.journal and alerts.details

The alerts.details table should only be used when alerts.status cannot hold enough information for a specific alarm, or during rules file development. On production systems, it is suggested that the alerts.details table be kept below 5,000 rows. A large number of rows in alerts.details can degrade ObjectServer performance.

The details statement in a probe rules file generates records in the alerts.details table. details($*) records each token as one row in alerts.details. If details($*) is enabled in your rules file, each event in the alerts.status table might therefore have 10 to 50 rows in alerts.details.

To disable details:

– Comment out every details($*) statement in all of your probe rules files.
– Restart all probes.
– Clear the current records from the details table (with "delete from alerts.details").
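The cost of details($*) can be estimated with simple arithmetic, sketched here in Python. The sketch assumes every event carries the same number of details rows. With 10 to 50 rows per event, alerts.details reaches the 5,000-row guideline once alerts.status holds only 500 to 100 events.

```python
DETAILS_ROW_GUIDELINE = 5000  # suggested production ceiling for alerts.details


def estimated_details_rows(status_events, details_rows_per_event):
    """Rough estimate assuming every alerts.status event has the same number of details rows."""
    return status_events * details_rows_per_event


def max_events_within_guideline(details_rows_per_event, guideline=DETAILS_ROW_GUIDELINE):
    """Largest number of alerts.status events that keeps alerts.details at or below the guideline."""
    return guideline // details_rows_per_event


for rows_per_event in (10, 50):  # the 10 to 50 range quoted for details($*)
    print(rows_per_event, max_events_within_guideline(rows_per_event))
# 10 500
# 50 100
```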

Manage volume of alerts.details

Throughput of events on the alerts.status table

Deduplication triggers prevent repeated events from increasing the number of rows, but each deduplication still takes processing time in the Object Server.
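A simplified Python model of deduplication shows why the row count stays flat while the work does not. The model assumes, as in the default ObjectServer configuration, that events are matched on the Identifier column and that a repeat increments Tally. In OMNIbus the real work is done by an ObjectServer trigger, not by code like this, and the identifiers below are hypothetical.

```python
def insert_events(status, identifiers):
    """Simplified model of deduplication on alerts.status (not the actual trigger).

    status maps Identifier -> row; a repeated Identifier increments Tally
    instead of adding a row. Returns (new inserts, deduplications).
    """
    new_inserts = deduplications = 0
    for identifier in identifiers:
        row = status.get(identifier)
        if row is None:
            status[identifier] = {"Tally": 1}
            new_inserts += 1
        else:
            row["Tally"] += 1  # row count unchanged, but work is still done
            deduplications += 1
    return new_inserts, deduplications


status = {}
print(insert_events(status, ["rtr1:linkDown", "rtr1:linkDown", "rtr2:linkUp", "rtr1:linkDown"]))
print(len(status), status["rtr1:linkDown"]["Tally"])
```

Four incoming events produce two rows (two new inserts and two deduplications), and Tally for the repeated event reaches 3.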

Object Server Connections

The maximum permitted number of connections is determined by the ObjectServer Connections property, which has a default of 30.

The Connections property of the Object Server should always be increased in a production environment.

Only a finite number of connections can be made to the ObjectServer. When the maximum number of connections is reached, new connections are refused.

A refused connection might result in the temporary loss of access to data or loss of input to probes or gateways.

The maximum value of the Connections property is 1024.

An error is logged in the Object Server log when the maximum number of connections is exceeded.

Connections can be monitored with the stats_triggers trigger group, which writes its data to the master.stats table.
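Once the current connection count is available (for example from master.stats; retrieving it is not shown), the remaining headroom is easy to compute. The Python sketch below does this. The 80% warning threshold is a hypothetical choice, not an OMNIbus default.

```python
DEFAULT_CONNECTIONS = 30   # default value of the Connections property
MAX_CONNECTIONS = 1024     # maximum permitted value


def connection_headroom(current, connections_property=DEFAULT_CONNECTIONS):
    """Return (free connection slots, percent of the limit in use)."""
    if not 1 <= connections_property <= MAX_CONNECTIONS:
        raise ValueError("Connections property must be between 1 and 1024")
    free = connections_property - current
    return free, current * 100.0 / connections_property


WARN_PERCENT = 80.0  # hypothetical alerting threshold

for current, limit in ((24, 30), (150, 500)):
    free, used = connection_headroom(current, limit)
    flag = "WARN" if used >= WARN_PERCENT else "ok"
    print(f"{current}/{limit} connections: {free} free, {used:.0f}% used [{flag}]")
```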

Monitoring Row Count, Volume and Connections

Enable the stats_triggers trigger group

In the default ObjectServer configuration, the stats_triggers group uses triggers to gather several statistics and metrics. You can enable the trigger group in a production environment.

The triggers in the stats_triggers trigger group count:

– current event counts in alerts.status, alerts.details and alerts.journal
– the number of inserts to alerts.status, alerts.details and alerts.journal
– the number of deduplications and new inserts to alerts.status
– the number of client connections

Stats triggers

The data gathered by this trigger group and its automation is written periodically to the master.stats table.

The default write interval is 300 seconds; this value is configurable in the statistics_gather trigger.

By default, 5 days of data are retained.
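With the defaults, the retention window holds 5 × 86,400 / 300 = 1,440 write intervals. The small Python helper below assumes one set of statistics is written per interval and shows how changing the interval affects that number. How master.stats lays those statistics out as rows is not covered here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def collection_points(retention_days=5, write_interval_s=300):
    """Number of write intervals covered by the retention window."""
    return retention_days * SECONDS_PER_DAY // write_interval_s


print(collection_points())                      # 1440 with the defaults
print(collection_points(write_interval_s=60))   # 7200 at a 60-second interval
```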

CPU Usage

Monitor the CPU usage of the nco_objserv process.

If the Object Server is under heavy load, this is reflected in its CPU usage.

The profiler report and trigger statistics logs show the source of the heavy load.

Sizing considerations:

https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Tivoli%20Netcool%20OMNIbus/page/OMNIbus%20Sizing%20Guide
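One way to turn the cumulative CPU time of nco_objserv into a usage percentage is to sample it twice and divide by the elapsed wall-clock time. A source for the cumulative time is, for example, the TIME column reported by ps, though this is an assumption about your monitoring tooling. The Python sketch below shows only the arithmetic, with hypothetical sample values.

```python
def cpu_percent(cpu_start_s, cpu_end_s, wall_start_s, wall_end_s):
    """Average CPU usage between two samples of a process's cumulative CPU time.

    100% equals one fully busy core; a multi-threaded process can exceed it.
    """
    elapsed = wall_end_s - wall_start_s
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing wall-clock times")
    return (cpu_end_s - cpu_start_s) * 100.0 / elapsed


# Hypothetical samples taken 60 seconds apart.
print(cpu_percent(120.0, 126.0, 0.0, 60.0))  # 10.0
```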

Memory Usage

Monitor the memory usage of the nco_objserv process.

The memory usage of the process increases in proportion to the number of rows in the alerts.status, alerts.details and alerts.journal tables (and in any additional tables you have defined), to the number of connections, and to the level of client activity.

The memory usage should remain stable over time, and any increases should correspond to increases in the numbers of table rows or additional clients.
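The rule that memory increases should correspond to more rows or more clients can be turned into a simple check, sketched here in Python. The sketch flags samples where memory grew although neither the row count nor the connection count grew. The heuristic, the tolerance and the sample values are all hypothetical.

```python
def unexplained_memory_growth(samples, tolerance_mb=0.0):
    """Indexes of samples where nco_objserv memory grew although neither the
    table row count nor the connection count grew (hypothetical heuristic)."""
    flagged = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        memory_grew = cur["memory_mb"] - prev["memory_mb"] > tolerance_mb
        load_grew = cur["rows"] > prev["rows"] or cur["connections"] > prev["connections"]
        if memory_grew and not load_grew:
            flagged.append(i)
    return flagged


# Hypothetical periodic samples (memory, total rows, connections).
samples = [
    {"memory_mb": 512, "rows": 40000, "connections": 40},
    {"memory_mb": 530, "rows": 42000, "connections": 40},  # growth explained by rows
    {"memory_mb": 545, "rows": 42000, "connections": 40},  # unexplained growth
    {"memory_mb": 540, "rows": 41000, "connections": 38},
]
print(unexplained_memory_growth(samples))  # [2]
```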

Sizing considerations

https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Tivoli%20Netcool%20OMNIbus/page/OMNIbus%20Sizing%20Guide

Object Server Memstore Usage

To monitor the memstore, inspect the content of the catalog.memstores table.

For each row, compare the value of the UsedBytes column with the values of the SoftLimit column and the HardLimit column.

Memstores are containers maintained by the ObjectServer that hold ObjectServer data and tables in memory. Memstores have a finite size and, when full, do not permit any further data to be inserted, so you must ensure that they do not become full.

Once a memstore soft limit is exceeded, an error is logged to the Object Server log file. Once the hard limit is exceeded, the Object Server shuts down.
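The comparison of UsedBytes with SoftLimit and HardLimit can be sketched in Python as follows. Retrieving the rows is not shown; you could, for example, run a select against catalog.memstores with nco_sql. The store names, byte counts and the 80% early-warning level are hypothetical, and the sketch assumes both limits are positive byte counts.

```python
WARN_FRACTION = 0.8  # hypothetical early-warning level, as a fraction of SoftLimit


def memstore_state(used_bytes, soft_limit, hard_limit, warn_fraction=WARN_FRACTION):
    """Classify one catalog.memstores row by comparing UsedBytes with its limits.

    Assumes both limits are positive byte counts.
    """
    if used_bytes > hard_limit:
        return "hard limit exceeded"
    if used_bytes > soft_limit:
        return "soft limit exceeded"
    if used_bytes >= warn_fraction * soft_limit:
        return "approaching soft limit"
    return "ok"


# Hypothetical rows: (store name, UsedBytes, SoftLimit, HardLimit)
rows = [
    ("store_a", 10_000_000, 400_000_000, 500_000_000),
    ("store_b", 350_000_000, 400_000_000, 500_000_000),
]
for name, used, soft, hard in rows:
    print(name, memstore_state(used, soft, hard))
```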

Summary: Monitoring Object Server KPIs

Object Server Profiler Report and trigger statistics logs
– Object Server Granularity
  • Profiler Report
  • Trigger Stats

Enable Stats triggers
– Number of rows in alerts.status, alerts.journal and alerts.details
– Number of inserts in the alerts.status table
– Number of Connections

System monitors
– CPU usage of nco_objserv
– Memory Usage of nco_objserv

Object Server catalog.memstores table
– Memstore usage

Probe and Gateway KPI

Probe Key Performance Indicators
– CPU Usage
– Memory Usage
– Number of events received
– Average Time spent processing rules

Gateway Key Performance Indicators
– CPU Usage
– Memory Usage

Probe Self Monitoring

Probe self monitoring can log the following information in the probe log file and/or at the Object Server:

– CPU time
– Rules File Processing Time
– Memory Used
– Number of Events Received
– Number of Events Discarded from Rules
– Number of Events Generated by Self Monitoring

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/probegtwy/concept/omn_prb_enablingroi.htm

Probe Self Monitoring Example: Probe Log File (Info MessageLevel)

--- ProbeStats report at Heartbeat ---
Timestamp: 1294869063
Probe: glf
Host: hostname
PID: 29976
ProbeID: glf@hostname
NumberEventsProcessed: 8785
NumberEventsDiscarded: 0
NumberEventsGenerated: 0
DiscardPercentage: 0.000000
TotalRulesTime: 0.785492 sec
TotalCPUTime: 43.089477 sec
ProbeUpTime: -1294869003 sec
AverageEventsPerSecond: 8785
AverageRulesFileTime: 89 usec
--- ProbeStats report end ---
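Because the ProbeStats report holds one "Key: value" pair per line, it is easy to parse, as the Python sketch below shows. The sketch also shows that, in this sample, AverageRulesFileTime (89 usec) matches TotalRulesTime divided by NumberEventsProcessed (about 89.41 usec). That relationship is inferred from the sample, not taken from IBM documentation.

```python
import re

# One "Key: value" pair per line; the --- banner lines do not match.
FIELD = re.compile(r"^(?P<key>\w+): (?P<value>\S+)", re.MULTILINE)


def parse_probestats(text):
    """Map each ProbeStats field name to its first whitespace-delimited value (as text)."""
    return {m["key"]: m["value"] for m in FIELD.finditer(text)}


REPORT = """\
--- ProbeStats report at Heartbeat ---
Timestamp: 1294869063
Probe: glf
Host: hostname
PID: 29976
ProbeID: glf@hostname
NumberEventsProcessed: 8785
NumberEventsDiscarded: 0
NumberEventsGenerated: 0
DiscardPercentage: 0.000000
TotalRulesTime: 0.785492 sec
TotalCPUTime: 43.089477 sec
ProbeUpTime: -1294869003 sec
AverageEventsPerSecond: 8785
AverageRulesFileTime: 89 usec
--- ProbeStats report end ---
"""

stats = parse_probestats(REPORT)
events = int(stats["NumberEventsProcessed"])
rules_s = float(stats["TotalRulesTime"])
print(f"computed average rules time: {rules_s / events * 1e6:.2f} usec")  # 89.41
print(f"reported AverageRulesFileTime: {stats['AverageRulesFileTime']} usec")
```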

Summary: Monitoring Probe and Gateway KPIs

Probe Key Performance Indicators
– System Monitors and/or Probe Self Monitoring
  • CPU Usage
  • Memory Usage
– Probe Self Monitoring
  • Number of events received
  • Average Time spent processing rules

Gateway Key Performance Indicators
– System Monitors
  • CPU Usage
  • Memory Usage

Summary

Object Server Key Performance Indicators
– Object Server Granularity
  • Profiler Report
  • Trigger Stats
– Number of rows in alerts.status, alerts.journal and alerts.details
– Number of inserts in the alerts.status table
– Number of Connections
– CPU usage of nco_objserv
– Memory Usage of nco_objserv
– Memstore usage

Probe Key Performance Indicators
– Number of events received
– CPU Usage
– Memory Usage
– Average Time spent processing rules

Gateway Key Performance Indicators
– CPU Usage
– Memory Usage

Omnibus Best Practice References

Omnibus 7.3.1 Best Practices Guide:

https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=cdd16df5-7bb8-4ef1-bcb9-cefb1dd40581#fullpageWidgetId=W05de62601548_4e85_8940_81bb58657a85&file=b7d8ad85-7152-47e8-926a-802246204e8c

Sizing Guide:

https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Tivoli%20Netcool%20OMNIbus/page/OMNIbus%20Sizing%20Guide

Multitier Architecture:

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/install/concept/omn_esf_configuringdeploymultitieredarch.html

Best Practices for Performance Tuning:

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_bestpracticespertuning.html

SQL Query Guidelines:

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_sqlqueryguidelines.html

Best Practices for Creating Triggers:

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus.doc_7.3.1/omnibus/wip/admin/reference/omn_adm_per_bestpracticestriggers.html

Questions?