Knowledge Article
Best Practices for making changes to an active Manager run in BMC Performance Assurance for Servers
Knowledge Article ID: KA310873
Version: 1.0
Status: Published
Published date: 01/20/2011
Problem
The goal of this document is to cover how to make changes to an active Manager run, which changes require a Manager run
to be resubmitted, and when changes will be applied to an active Manager run.
NOTE: This document was originally published as Solution SLN000000168714.
BMC Performance Assurance for Servers 7.4.10, 7.4.00, 7.3.00, 7.2.00
Unix
Solution
Section I: Overview
There are 4 scripts executed by an active Manager run that collect, process, and transfer data. These are:
The [date]-[date].Manager script (Jan-01-2004.00.00-Dec-31-2004.23.59.Manager)
o The .Manager script is created when a Manager run is submitted and the same script is used for the
entire duration of the Manager run
o The .Manager script is scheduled to run 'COLLECT_LEAD_TIME' minutes before the start of data
collection. By default the COLLECT_LEAD_TIME is 30 minutes and is configurable under Options -> Collect
tab -> "Run script before collection time". In Perform version 7.1.20 and earlier the .Manager script was
executed 15 minutes before data collection start time.
o The .Manager script creates and schedules all of the other scripts used for data collection, data transfer,
and data processing
o The .Manager script will remove itself from cron/pcron when the End Date of the Manager run has been
reached
o In Perform version 7.2.00 and later the *.Manager script executes the udrCollectMgr processes which
handle data collection and data transfer for the Manager run
The [date]-[date].Collect script (Jan-01-2004.00.00-Jan-01-2004.23.59.Collect)
o A new .Collect script is created each day by the .Manager script
o In Perform version 7.2.00 and later, the *.Collect script's only purpose is to ensure that there is a
udrCollectMgr process running for the Manager run. It will run every 'Collector restart interval' minutes and
start a new udrCollectMgr process to take over data collection and data transfer for the run if one isn't already running.
o In Perform version 7.1.20 and earlier, the .Collect script is scheduled to run at the specified start time of
data collection. So, if data collection is scheduled to run from 00:00 to 23:59 the .Collect script will be
scheduled to run each day at 00:00. The data collection start time is controlled by the Daily Begin field in
Manager.
o In Perform version 7.1.20 and earlier, the .Collect script will remove itself from cron/pcron when it has
finished starting data collection
o In Perform version 7.1.20 and earlier, if Collector Restart is enabled in Manager, after the .Collect script
has started data collection it will reschedule itself as the '[date]-[date].Collect query' script. This script will run
every 'Restart Interval' (by default 15 minutes) to query each remote node to ensure that data collection is
running.
The [date]-[date].XferData script (Jan-01-2004.00.00-Jan-01-2004.23.59.XferData)
o A new .XferData script is created each day by the .Manager script
o The .XferData script is scheduled by Manager to run Processing Delay minutes after data collection has
completed. So, if the Options => Advanced Features => Other => Processing Delay is set to 10 minutes (the
default) and the Daily End is set to 23:59 then the .XferData script will be scheduled to run at 00:09 (10
minutes after data collection completes)
o The .XferData script will remove itself from cron/pcron when it has finished all data transfer attempts
o The .XferData script executes the .ProcessDay script when all data transfer attempts have been
executed, or when all expected data is present on the managing node (whichever comes first)
o In Perform version 7.2.00 and later the purpose of the *.XferData script is to check if all data has been
transferred to the console by the udrCollectMgr process or if the Transfer Duration has elapsed. Once either
of those two events has happened it will execute the *.ProcessDay script.
The [date]-[date].ProcessDay script (Jan-01-2004.00.00-Jan-01-2004.23.59.ProcessDay)
o A new .ProcessDay script is created each day by the .Manager script
o The .ProcessDay script is executed by the .XferData script after data transfer is complete (or time has
expired)
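The scheduling rules above reduce to simple clock arithmetic. The following Python sketch is illustrative only: the function name and dictionary keys are hypothetical and are not part of Perform. It applies the default COLLECT_LEAD_TIME (30 minutes) and Processing Delay (10 minutes) described above to a single day of data collection.

```python
from datetime import datetime, timedelta


def daily_schedule(collect_start, collect_end,
                   lead_minutes=30, processing_delay_minutes=10):
    """Return when the .Manager and .XferData scripts would be scheduled.

    Hypothetical helper: it mirrors the rules described in this article,
    not any Perform API.
    """
    return {
        # .Manager runs COLLECT_LEAD_TIME minutes before collection starts.
        "manager": collect_start - timedelta(minutes=lead_minutes),
        # .XferData runs Processing Delay minutes after collection ends.
        "xferdata": collect_end + timedelta(minutes=processing_delay_minutes),
    }


schedule = daily_schedule(datetime(2004, 1, 1, 0, 0),
                          datetime(2004, 1, 1, 23, 59))
print(schedule["manager"])   # 2003-12-31 23:30:00
print(schedule["xferdata"])  # 2004-01-02 00:09:00
```

Passing lead_minutes=15 models the behavior described for Perform version 7.1.20 and earlier.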
Section II: Making changes to a Manager run
The rule: You should only press the green activate button in the Manager GUI when scheduling a new Manager run.
Do not press the activate button again after making modifications to an active Manager run.
Each night the .Manager script will read all of the files that define your Manager run. This includes:
Your .vcmds file
The Workload definition file (.an file) specified for the run
All the Domain files (.dmn files) specified for the run
This means that any changes made to the .vcmds file, .an file, or .dmn files will propagate into the next execution of the Manager run without it being re-activated.
The changes are propagated into the Manager run when the new scripts are created at night. Keep in mind, however, that all of the scripts for a day of data collection and data processing are created at the same time: COLLECT_LEAD_TIME minutes before data collection begins (30 minutes by default; 15 minutes in Perform version 7.1.20 and earlier). As a result, changes to data collection take effect starting with the data collection run on the night of your change, but changes to data processing take effect the following night.
So, if a change is made to a Manager run that collects data from 00:00 to 23:59 and has a 31 minute processing delay:
Change made            Seen for data collection   Seen for data processing
Jan 1, 2004 @ 10:00    Jan 2, 2004 @ 00:00        Jan 3, 2004 @ 00:30
So, if on January 1st at 10:00 a new workload is added to the .an file for a Manager run, the new workload will not appear in
Visualizer until January 3rd (it will be applied to the data collected on January 2nd). The reason is that the data processing
scripts for the data collected on January 1st have already been created. So, even though the workload characterization has
changed, the .ProcessDay script that is scheduled to run that night has already been created with the old workload
characterization.
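The timing in this example can be worked through mechanically. The sketch below is a hypothetical helper (not part of Perform) that assumes the scripts for a day are created COLLECT_LEAD_TIME minutes before that day's collection start, and that processing is scheduled Processing Delay minutes after collection ends. A change made exactly at script creation time is treated here as too late for that day, which is an assumption.

```python
from datetime import datetime, time, timedelta


def change_effective_times(change_made, daily_begin=time(0, 0),
                           daily_end=time(23, 59), lead_minutes=30,
                           processing_delay_minutes=10):
    """Return (first collection start, first processing time) that see a change."""
    lead = timedelta(minutes=lead_minutes)
    # Find the first collection start whose scripts are created after the change.
    start = datetime.combine(change_made.date(), daily_begin)
    while start - lead <= change_made:
        start += timedelta(days=1)
    # Collection ends at Daily End; roll over if the interval crosses midnight.
    end = datetime.combine(start.date(), daily_end)
    if end <= start:
        end += timedelta(days=1)
    return start, end + timedelta(minutes=processing_delay_minutes)


collection, processing = change_effective_times(
    datetime(2004, 1, 1, 10, 0), processing_delay_minutes=31)
print(collection)  # 2004-01-02 00:00:00
print(processing)  # 2004-01-03 00:30:00
```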
Changes to some Manager options require that the run be stopped and restarted
There are some fields in a Manager run that should not be modified until the old Manager run is stopped and restarted.
These fields are:
The 'Output Files Directory' from the Main Manager GUI -> Data tab
The 'Data Source' toggle from the Main Manager GUI -> Data tab
The 'Start Date' from the Main Manager GUI -> Schedule tab
The 'End Date' from the Main Manager GUI -> Schedule tab
The 'Use time stamp for output directory' button from the Options -> Advanced Features -> Other tab
All other fields within the .vcmds file can be modified without stopping and restarting the Manager run. All fields in the .an file
and .dmn files can be modified without stopping and restarting the Manager run.
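As a quick illustration of the rule above, the following sketch checks a list of changed fields against the five fields that require a stop and restart. The field labels are copied from this article; the function itself is hypothetical and does not read a real .vcmds file.

```python
# Fields that this article says require stopping and restarting the run.
RESTART_REQUIRED_FIELDS = frozenset({
    "Output Files Directory",
    "Data Source",
    "Start Date",
    "End Date",
    "Use time stamp for output directory",
})


def needs_restart(changed_fields):
    """Return the sorted subset of changed fields that require a restart."""
    return sorted(RESTART_REQUIRED_FIELDS.intersection(changed_fields))


print(needs_restart(["End Date", "Processing Delay"]))  # ['End Date']
```

An empty result means the change can be left to propagate into the run overnight.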
Section III: Frequently Asked Questions
Question 1
Is there an easy way to make changes to a Manager run and have them applied to the run immediately? For example,
adding a node for data collection at 10:00 and having it start collecting data immediately and process that data that
night? What about adding a workload to the .an file and having it process today's data with that new workload
characterization?
Unfortunately there is no easy way to get a change applied to a Manager run immediately. The files that define the base
Manager run (.vcmds file, .an file, .dmn files) are only referenced when the scripts are created at night. From that point
forward the run is controlled by files in the Manager output directory such as the [date]-[date].Variables file, the [date]-
[date].NItable file, and the scripts themselves. For that reason adding a node to the domain (.dmn) file won't have any effect
on the active Manager run because the active node list is stored in the [date]-[date].NItable file. Likewise, adding a workload
to the Workload definition (.an) file won't have any effect on the active Manager run because the workload definition that will
be used is stored within the .ProcessDay script itself.
Question 2
Is there any way to stop and restart a Manager run in such a way that all collected data (the data from the beginning
of the day collected by the old Manager run, and the data collected after the Manager run has been restarted) will be
processed by the new Manager run which includes my desired change?
There is no seamless way to do this.
In Perform version 7.2.00 and later it is possible, but it isn't simple. Once you've made your change to the Manager .vcmds file,
Domain (.dmn) file, or Analyze Commands (.an) file, you would need to do the following:
1. Put a run.quit file into the existing Manager Output Directory for the run.
2. Find the running udrCollectMgr process from the Manager run you just stopped and kill it.
3. Rename the *.vcmds file to a new name.
4. Submit the Manager run under the new vcmds file name with a start date of today's date without changing the start
time of data collection (so if it was 00:00 leave it at that).
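Steps 1 and 3 are plain file operations and can be sketched as below. This is a hedged illustration, not a supported procedure: it assumes an empty run.quit file is sufficient (the article does not say what the file must contain), the function names and example file names are hypothetical, and steps 2 and 4 (killing udrCollectMgr and resubmitting the run) are deliberately left as manual actions.

```python
import tempfile
from pathlib import Path


def request_stop(output_dir):
    """Step 1: place a run.quit file in the Manager Output Directory.

    Assumption: an empty file is enough to signal the stop.
    """
    quit_file = Path(output_dir) / "run.quit"
    quit_file.touch()
    return quit_file


def rename_vcmds(vcmds_path, new_name):
    """Step 3: rename the .vcmds file, refusing to clobber an existing file."""
    old = Path(vcmds_path)
    new = old.with_name(new_name)
    if new.exists():
        raise FileExistsError(f"{new} already exists")
    old.rename(new)
    return new


if __name__ == "__main__":
    # Demonstration in a throwaway directory with hypothetical file names.
    with tempfile.TemporaryDirectory() as tmp:
        vcmds = Path(tmp) / "myrun.vcmds"
        vcmds.write_text("")
        print(request_stop(tmp).name)                      # run.quit
        print(rename_vcmds(vcmds, "myrun_v2.vcmds").name)  # myrun_v2.vcmds
```

Keeping the two functions separate leaves room to perform step 2 by hand between them.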
In general changes to your Manager run are best handled when you can wait a 24-hour period for the change to take effect.
In Perform version 7.1.20 and earlier there is no easy way to do that. Stopping the original Manager run will cause the data
collected by that Manager run to be left on the remote node (as the new Manager run will not know the data exists and will
thus not transfer it). Hence, the only data that will be processed and available in Visualizer will be the data collected from the
time that the new Manager run was activated.
Question 3
What happens if I don't deactivate my existing Manager run but do press the green activate button after making
changes to its .vcmds file?
In Perform version 7.2.00 and later the Manager GUI should reject your attempt to reactivate the already active Manager run.
In Perform version 7.1.20 and earlier there will be two Manager runs using the same .vcmds file active. That means that
each Manager run will attempt to issue data collection requests for the same set of nodes, process the data from the same
set of nodes, and create Visualizer files for the same set of nodes. This will certainly result in wasted processing time, but
can result in other problems as well. It is best to ensure that only one Manager run is active for any .vcmds file. Until one of
the Manager runs is stopped there will be two Visualizer files created having the same name and containing the same data
but in different Manager output directories.
Question 4
If I absolutely need a change to occur in my Manager run immediately, what can I do?
First, ask yourself, "Why do I need this change to happen immediately? Why can't I wait 24 hours for this to happen?" Trying
to make a change that happens immediately will introduce a great deal of unnecessary risk into your environment since it
isn't well supported and could mean the entire Manager run will fail.
There are two basic options at this point:
A. If you are willing to lose data from the current day for the period before the change was made, stop and restart
your Manager run. Data collection will start on the next whole hour and whatever change was made will be part of
this new Manager run. The data collected from before the change will never be transferred to the managing node
and it won't be processed by the re-activated Manager run. You must stop the previously activated Manager run or
data will be double processed - the whole day by the original Manager run and the part of the data including your
change by the new Manager run. That means that the data from whichever run finishes last will end up being
populated into Visualizer (which will typically be the original run - not what you want).
B. If you cannot lose any data you should contact Technical Support for suggestions. Technical Support may be able
to suggest the necessary modifications to the existing configuration files in the Manager Output Directory to apply
the change you want to make to the existing Manager run. This is somewhat risky as incorrectly applied changes
could cause data processing to fail. For Perform version 7.2.00 and later, see the answer to 'Question 2' for an
overview of what you need to do.
Knowledge Article
How can I change nodes with missing metric groups from Warnings to OK state in UDR Collection Manager (UCM) status reports?
Knowledge Article ID: KA295604
Version: 1.0
Status: Published
Published date: 01/20/2011
Problem
How can I change nodes with missing metric groups from Warnings to OK state in UDR Collection Manager (UCM) status
reports?
This document was originally published as Solution SLN000000203444.
In the Collector Status Reporting (CSR) reports nodes and domains can be in one of three different states: OK, Warning, or
Failed. A node will be assigned an 'OK' state for data collection if data collection has been successfully started on the
remote node and data is being collected for all configured metric groups. A node will be assigned a 'Warning' state for data
collection if data collection has failed to start on the remote node, or if data collection has failed for one or more configured
metric groups on the remote node. A node will be assigned a 'Failed' state for data collection after data collection has failed
and the time for all configured retries has passed.
By default, it is quite likely that some configured metric groups will not be collected on almost every node. For example, the
Solaris collector will attempt to gather the 'SRM Statistics' group which will only be enabled on machines using the Solaris
Resource Manager. Each operating system has at least one group that is unlikely to be collected on the majority of the
machines in an environment. This means that almost all nodes will be assigned a 'Warning' status for data collection. Even
nodes where data collection has been requested but the Perform product isn't even installed will be assigned a 'Warning'
status for most of the day because by default UDR Collection Manager (UCM) will continue to issue a start collection request
(if previous ones have failed) over the first 90% of the collection interval (just under 22 hours). This means that in order to
make the collection status useful a mechanism is needed to filter the groups that are known to be uncollectible on a
machine. This will allow nodes with successful collection (collecting the configured groups we believe they should collect) to
appear in 'OK' state and those with collection problems will appear in 'Warning' state. This is where the udrCollectFilter
command comes in.
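The state rules and the effect of filtering can be summarized in a few lines. The sketch below is a simplified reading of the description above, not the logic UCM actually implements; the function and parameter names are hypothetical. It also reproduces the "just under 22 hours" figure as 90% of a 24-hour collection interval.

```python
from datetime import timedelta


def node_state(started, missing_groups, filtered_groups=(), retries_expired=False):
    """Classify a node as 'OK', 'Warning', or 'Failed' (simplified model)."""
    if not started:
        # Collection never started: Failed only once all retries have run out.
        return "Failed" if retries_expired else "Warning"
    # Missing groups that are listed in the filter do not count against the node.
    unexpected = set(missing_groups) - set(filtered_groups)
    return "Warning" if unexpected else "OK"


def retry_window(collection_interval):
    """Start requests are retried over the first 90% of the interval."""
    return collection_interval * 9 // 10


print(node_state(True, ["SRM Statistics"]))                      # Warning
print(node_state(True, ["SRM Statistics"], ["SRM Statistics"]))  # OK
print(retry_window(timedelta(hours=24)))                         # 21:36:00
```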
Section II: Running udrCollectFilter for a specific node against a specific collection interval
During normal data collection when the remote node is able to communicate back to the console on port 6768 the remote
node will send status messages to the console indicating data collection problems. The messages are reported in the
Collection Status Reporting (CSR) reports under the 'Messages' section of each node's status page. For example, here are
some messages for a node called 'topgun':
Thu May 19 23:30:06 2005 [UCM-Information] Collect request sent from console and received by the agent
Thu May 19 23:30:06 2005 [Agent-Information] Collect Request - Data Collection pending
Fri May 20 00:00:00 2005 [Agent-Information] Collect Request - Data Collection active
Fri May 20 00:00:15 2005 [Agent-Information] Collect Request - Agent repository write active
Fri May 20 00:00:25 2005 [Agent-Warning] Metric group not supported SRM Statistics
These messages indicate that topgun received and accepted a data collection request from the console at 11:30 PM, started
collecting the data at midnight, and notified the console that data for the 'SRM Statistics' group was requested, but is not
being collected on the machine. This missing group causes the node to appear in 'Warning' state in the CSR reports.
The udrCollectFilter command can be used to tell the CSR reports that the missing 'SRM Statistics' group is acceptable for
This will create a file called 'topgun.flt' in the /usr/adm/best1_default/local/manager/filter directory on the console using the
groups that were not collected on March 16th 2006. The '-d MM-DD-YYYY' flag is optional. If the -d date flag is not specified
then all groups that have failed to collect at any point will be included in the node's filter file.
The contents of the topgun.flt file created by the udrCollectFilter command are in this case just a single line:
"SRM Statistics"
This indicates that if the agent on node 'topgun' reports that the 'SRM Statistics' metric group isn't being collected that is OK,
and the node can be reported in 'OK' state rather than 'Warning' state. If some time in the future another group were to stop
being collected node 'topgun' would be reported in 'Warning' state again. Using udrCollectFilter enables specific groups to
be filtered while maintaining the alerting feature if another group unexpectedly stops being collected.
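To make the relationship between the status messages and the filter file concrete, here is a small Python sketch that extracts the uncollected metric groups from CSR message lines and formats them as quoted filter entries. It only illustrates the idea; it is not how udrCollectFilter is implemented, and the regular expression covers only the warning wordings that appear in the sample messages in this article.

```python
import re

# Agent warning wordings taken from the sample messages in this article.
_MISSING = re.compile(
    r"\[Agent-Warning\] (?:"
    r"Metric group not supported|"
    r"Metric group not available|"
    r"Metric group currently has no data available for collection|"
    r"Collect Request - Data not collected for metric group"
    r") (.+)$"
)


def missing_groups(messages):
    """Return the uncollected metric groups, in first-seen order, without repeats."""
    groups = []
    for line in messages:
        match = _MISSING.search(line.strip())
        if match and match.group(1) not in groups:
            groups.append(match.group(1))
    return groups


def filter_lines(groups):
    """Format groups as quoted lines, matching the topgun.flt example."""
    return [f'"{group}"' for group in groups]


messages = [
    "Fri May 20 00:00:15 2005 [Agent-Information] Collect Request - Agent repository write active",
    "Fri May 20 00:00:25 2005 [Agent-Warning] Metric group not supported SRM Statistics",
]
print(filter_lines(missing_groups(messages)))  # ['"SRM Statistics"']
```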
NOTE: You should only run udrCollectFilter against a data collection request that has completed to get a complete list of
groups to be filtered. Groups that haven't been terminated by the collector but contain no data will not show as unavailable
until data collection has completed.
NOTE: By default udrCollectFilter will not overwrite an existing filter file for the node. You must specify the '-r' flag to force an
existing filter file to be overwritten.
Section III: Running udrCollectFilter against all nodes
The udrCollectFilter command can also be run to create filter files for all nodes that have sent messages to the console. The
command to run udrCollectFilter against all nodes is simply:
$BEST1_HOME/bgs/bin/udrCollectFilter
If there is already an existing [hostname].flt file for a node the udrCollectFilter command will not overwrite it. Instead it will only
create new [hostname].flt files. If you wanted to overwrite existing [hostname].flt files it would be necessary to specify the '-r'
flag.
Section IV: Inadvertently masking real problems with udrCollectFilter
The udrCollectFilter command will create filter entries for any metric groups which currently aren't being collected on a
machine. That means that if udrCollectFilter is run against a machine that is having collection problems critical groups may
be added to the filter list for that machine.
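One way to guard against this is to review candidate filter entries against a list of groups you always expect to collect before accepting a generated filter file. The sketch below is a hypothetical check, not a feature of udrCollectFilter, and the set of critical groups is an assumption you would define for your own environment.

```python
def risky_filter_entries(candidate_groups, critical_groups):
    """Return candidate filter entries that are considered critical."""
    critical = set(critical_groups)
    return [group for group in candidate_groups if group in critical]


# Hypothetical site policy: Cpu Statistics must always be collected.
candidates = ["Cpu Statistics", "SRM Statistics"]
print(risky_filter_entries(candidates, {"Cpu Statistics"}))  # ['Cpu Statistics']
```

A non-empty result suggests investigating the node's collection problem before filtering those groups.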
For example, here are messages for node 'topcat':
Wed May 18 23:30:04 2005 [UCM-Information] Collect request sent from console and received by the agent
Thu May 19 00:24:42 2005 [Agent-Information] Collect Request - Data Collection active
Thu May 19 00:24:57 2005 [Agent-Warning] Metric group currently has no data available for collection Cpu Statistics
Thu May 19 00:24:58 2005 [Agent-Warning] Metric group not available PRM Configuration
Thu May 19 00:24:58 2005 [Agent-Information] Collect Request - Agent repository write active
Thu May 19 23:59:19 2005 [Agent-Information] Collect Request - Data Collection Complete
Thu May 19 23:59:19 2005 [Agent-Warning] Collect Request - Data not collected for metric group Cpu Statistics
Thu May 19 23:59:19 2005 [Agent-Warning] Collect Request - Data not collected for metric group Raid Configuration
Thu May 19 23:59:19 2005 [Agent-Warning] Collect Request - Data not collected for metric group Raid Statistics
Thu May 19 23:59:19 2005 [Agent-Warning] Collect Request - Data not collected for metric group User Id Statistics
Thu May 19 23:59:19 2005 [Agent-Information] Collect Request - Processing complete
Fri May 20 00:04:02 2005 [UCM-Warning] No data collected for some of the configured metric groups
Fri May 20 00:04:08 2005 [UCM-Information] Collect Request - Agent data transfer successful
Fri May 20 00:04:08 2005 [UCM-Information] Collect Request - Data deleted in Agent repository
Running udrCollectFilter will create a topcat.flt file that looks like this:
Problem
Processes available for manually transferring the Perform capacity planning data from the Perform remote agent server to
the Perform console server when a Manager run does not successfully transfer the data.
NOTE: This Knowledge Article was originally published as Resolution 210290.
BMC Performance Assurance for Microsoft Windows Servers 7.5.00, 7.4.00, 7.3.00, 7.2.00
BMC Performance Assurance for Unix 7.5.00, 7.4.00, 7.3.00, 7.2.00
Manager
Solution
Section I: UNIX Perform Console Options
There are three options available to recover from a failed data transfer for all supported releases of Perform and one
additional option for Perform version 7.4.00 and later:
A. For Perform version 7.4.00 and later only: Re-run the [date]-[date].XferData script with the '-r' flag.
B. Use the UDR Collection Manager (UCM) command line utility to recover the Manager run that failed to
transfer the data.
C. Transfer the data using the best1collect run command from the command line
D. Manually transfer the data using an external tool such as ftp or scp
For Perform version 7.4.00 and later
Option A: For Perform version 7.4.00 and later: Re-run the [date]-[date].XferData script
The [date]-[date].XferData script supports a new '-r' recovery option to allow it to automatically recover failed data transfer
and initiate data processing.
If the *.XferData script is being kept in the Manager Output Directory (not automatically deleted) after the Manager run
completes it can be re-run to re-transfer and then execute re-processing for the Manager run.
For example, to re-try data transfer for the Manager run in the '/bmc/perform/manager/run1/Mar-16-2008.15.16' Manager
Output Directory for the April 23rd 00:00 - 23:59 Manager run:
/bmc/perform/manager/run1/Mar-16-2008.15.16/Apr-23-2008.00.00-Apr-23-2008.23.59.XferData -r &
INFO: Thu Apr 24 13:06:13 CDT 2008 XferData Starting UDRCollectManager in recovery mode.
udrCollectMgr: manager run was initiated
INFO: Thu Apr 24 13:06:13 CDT 2008 XferData Recovery RC=0, Run is initiated..
INFO:Thu Apr 24 13:06:14 CDT 2008 XferData recovery started.
udrCollectMgr: time is expired for manager run
udrCollectMgr: manager run is registered and complete for date specified
The [date]-[date].XferData script will initiate udrCollectMgr processes to handle the data transfer and then once the transfer
phase is complete it will initiate the [date]-[date].ProcessDay script to re-process the data.
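Locating the right script to re-run depends on the [date]-[date] naming convention. The following sketch builds the script name and the recovery command line for a given collection interval. The helper names are hypothetical, the timestamp format is inferred from the example file names in this article, and month abbreviations assume the default C locale.

```python
from datetime import datetime

# Inferred from names such as Apr-23-2008.00.00-Apr-23-2008.23.59.XferData
_STAMP = "%b-%d-%Y.%H.%M"


def script_name(start, end, suffix):
    """Build a [date]-[date].<suffix> script name for one collection interval."""
    return f"{start.strftime(_STAMP)}-{end.strftime(_STAMP)}.{suffix}"


def recovery_command(output_dir, start, end):
    """Return the argument list for re-running XferData with the '-r' flag."""
    path = f"{output_dir.rstrip('/')}/{script_name(start, end, 'XferData')}"
    return [path, "-r"]


cmd = recovery_command("/bmc/perform/manager/run1/Mar-16-2008.15.16",
                       datetime(2008, 4, 23, 0, 0),
                       datetime(2008, 4, 23, 23, 59))
print(" ".join(cmd))
```

The sketch only assembles the command; it does not check that the script still exists in the Manager Output Directory or execute it.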
For all supported releases of Perform (including 7.4.00 and later)
Option B: Use the UCM Command Line Utility
To recover failed data transfer using the UCM command line executable:
For Windows, if a path contains spaces, enclose the command line parameter in double quotes
(as in the example above). Also ensure that the directory separators are specified as two backslashes, not a single
backslash. The following is an example of the messages you might receive for a successful transfer request:
best1collect on [managing node]: requesting a push of collected data via the Service Daemon...
Node : [remote node]. Start from: Mon Mar 18 00:00:00 2002.
Sun Dec 8 15:12:19 2002
*Node: [agent node] has acknowledged Push request successfully.
Option D: Manually transfer the data
Transfer the UDR data using an external transfer tool if necessary.
Section II: Windows Perform Console
The PATROL Perform Manager on the Windows managing console assumes that the data is already present on the
managing node when executing a Manager run that uses the Use existing data option. Therefore, you need to use another
method to get the data to the managing node before running Manager again to reprocess the data.
Option A: For Perform version 7.4.00 and later: Use the 'Recover' option in the GUI
In Perform version 7.4.00 and later there is a new 'Recover' option in the Perform console Manager 'Schedule' GUI to
recover data transfer and data processing.
Step 1
Open the Perform console and select the Manager -> Schedule object in the left-pane. That will bring up the Manager
Schedule list.
Step 2
Select the target Manager run for transfer recovery. Right-click and select 'Recover' from the menu. That will pop up a
-G: Generate status reports. This will update the .xml pages in the status directory for the web-based status report. Use the
-s option to specify a script and the -d option to specify the run date.
'-E' flag: Comma separated list of error codes and definitions
-E: Output a comma separated list of all error ids, levels and strings (no options)
For example:
> $BEST1_HOME/bgs/bin/udrCollectStat -E
ID, Severity, Description
0, Information, Normal Operation
1, Error, No conditions in alert
2, Information, Drill down request received
3, Information, Drill down request cleared
4, Error, Bad selector received in message
5, Error, Invalid metric request received
6, Error, Policy file does not exist
7, Warning, Cannot reach agent for alert
<-- cut -->
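Because the '-E' output is comma separated, it is easy to load into a lookup table for scripts that need to translate error ids. The sketch below assumes the three-column layout shown in the example (ID, Severity, Description); the function name is hypothetical, and the sample text is an excerpt of the output above rather than the full list.

```python
def parse_error_table(text):
    """Map error id -> (severity, description) from 'udrCollectStat -E' style output."""
    errors = {}
    for line in text.splitlines():
        # Split on the first two commas only, so descriptions may contain commas.
        parts = [part.strip() for part in line.split(",", 2)]
        # Skip the header, the command line, and any non-data lines.
        if len(parts) == 3 and parts[0].isdigit():
            errors[int(parts[0])] = (parts[1], parts[2])
    return errors


sample = """ID, Severity, Description
0, Information, Normal Operation
6, Error, Policy file does not exist
7, Warning, Cannot reach agent for alert"""

table = parse_error_table(sample)
print(table[7])  # ('Warning', 'Cannot reach agent for alert')
```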