Technical Report OnCommand Performance Manager Best Practices OnCommand Performance Manager Version 2.0 Bob Allegretti, NetApp Technical Marketing Performance Management August 2015 | TR-4448 Abstract This document describes some best practices when using NetApp ® OnCommand ® Performance Manager for managing NetApp clustered Data ONTAP ® systems.
Technical Report
OnCommand Performance Manager
Best Practices OnCommand Performance Manager Version 2.0 Bob Allegretti, NetApp Technical Marketing Performance Management
August 2015 | TR-4448
Abstract
This document describes some best practices when using NetApp® OnCommand® Performance Manager for managing NetApp clustered Data ONTAP® systems.
3.3 Policies and Events
4 Common Use Cases
4.1 Common Workflows
4.4 Correlating Events and Metrics
4.5 Discovering Most Active Volumes Globally
4.6 View the Effects of Caching
Once configured, Performance Manager forwards performance events to Unified Manager. Unified
Manager then displays performance events on the dashboard, providing links to Performance Manager
event details, and it retains events in the performance event inventory.
3 Architectural Elements
Performance Manager manages storage performance through three basic architectural elements:
1. Dashboard
2. Performance visualization
3. Events (and alerts)
Performance Manager analyzes the entire storage environment. On the dashboard, it lists the clusters and,
in some cases, the objects that need immediate attention. You can view more details by navigating between
storage objects and displaying performance metrics collected over time. Performance Manager can also
generate customized alerts and events based on user-defined policies specific to application
environments.
3.1 Dashboard
Ideally, storage environments manage themselves without any human intervention. The next best option
is a tool indicating what demands attention now. Performance Manager does this by composing a cluster
dashboard that first lists clusters that are of the most interest. The order of precedence is first, clusters
that can’t be reached; second, actively alerting clusters; and, last, most active clusters. The dashboard
also presents other high-level information such as key performance metrics, use of the most active
resources, and simple color-coded alert status indicators.
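The ordering rule described above can be sketched as a three-tier sort. This is only an illustration, not the product's implementation: the `ClusterStatus` record, its field names, and the sample cluster names are hypothetical, and using IOPS as the measure of "most active" (and as the tiebreaker within each tier) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ClusterStatus:
    # Hypothetical record; these field names are illustrative only.
    name: str
    reachable: bool
    active_alerts: int
    iops: float


def dashboard_order(clusters):
    """Sort clusters: unreachable first, then alerting, then by activity."""
    def key(c):
        if not c.reachable:
            tier = 0
        elif c.active_alerts > 0:
            tier = 1
        else:
            tier = 2
        # Assumption: within a tier, busier clusters (higher IOPS) come first.
        return (tier, -c.iops)
    return sorted(clusters, key=key)


clusters = [
    ClusterStatus("cluster-a", True, 0, 7274.0),
    ClusterStatus("cluster-b", True, 2, 1200.0),
    ClusterStatus("cluster-c", False, 0, 0.0),
    ClusterStatus("cluster-d", True, 0, 310.0),
]
ordered_names = [c.name for c in dashboard_order(clusters)]
print(ordered_names)
```

In this sample, the unreachable cluster sorts ahead of the alerting one, and the two healthy clusters follow in descending order of IOPS.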
The example dashboard in Figure 5 shows three clusters. In this figure you see that there are no active
alerts because all color-coded status indicators are green and that the cluster called ontaptme-fc-cluster is most active at 7,274 IOPS. You also see that the most active disk aggregate runs at 21% busy and the most active node operates at 45% utilization.
The Performance Manager object landing page focuses on a specific storage object and presents summary
and detailed performance information for it. All storage objects have a landing page with a
similar look and feel. Because not all objects are the same, a cluster landing page has slightly
different metrics and views than a volume landing page. However, all landing pages summarize key
high-level metrics and categorize events for a given storage object over the prior three days. For example, Figure 7 shows a cluster node object landing page in which summary metrics charts display latency,
IOPS, MBps, and utilization. Below each of the metrics charts, links appear to any new and obsolete
events for the given metric and object pair. In addition to object-specific information, the landing page
provides access to the performance explorer through which you can observe interactions with other
objects.
Performance Explorer
Performance Explorer is a modular component of Performance Manager common to all object landing
appear on the OnCommand® Unified Manager dashboard and cause an e-mail alert to be sent to an
administrator.
There are two types of events: system and user-defined. Performance Manager has built-in system event
generation, with thresholds established by NetApp engineering. These include thresholds associated with the internal operations of the system, such as node busy conditions, file system layout
factors, and disk utilization. Take system events seriously and act on them. User-defined events result
from the violation of a user-created threshold policy.
Events appear in chart timelines so that they can be visually correlated with all other metrics. This
helps confirm expected relationships, or reveal unexpected ones, between object resource consumption and the triggering of an event. In Figure 10 you can see that a
critical system event coincides with aggregate utilization (red line) crossing 50% (note that
the chart key is not shown to conserve space). In the on-screen display, the user can access additional
event details by hovering the mouse pointer over the red dot.
When an event is generated, the following attributes are recorded:
Status: warning or critical
Type: system or user-defined
State: new or obsolete
Duration: how long the alerting condition lasted
Associated storage object
Description: why the event occurred
Figure 11 shows details from a user-defined CRITICAL event in which volume latency exceeds a 5.00
ms/op threshold setting for 3 hours and 10 minutes.
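The event attributes listed above, together with a user-defined threshold policy, can be modeled in a few lines. The following is a minimal sketch under stated assumptions: the `PerformanceEvent` class, the `evaluate_latency_policy` function, the 5-minute sample interval, and the volume name are all hypothetical, and measuring duration as the longest consecutive run of samples above the threshold is an assumption rather than the documented behavior of Performance Manager.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class PerformanceEvent:
    # Mirrors the attributes listed in the text; names are illustrative.
    status: str            # "warning" or "critical"
    type: str              # "system" or "user-defined"
    state: str             # "new" or "obsolete"
    duration: timedelta    # how long the alerting condition lasted
    storage_object: str
    description: str


def evaluate_latency_policy(volume, samples, threshold_ms, interval):
    """Return a critical user-defined event if latency exceeded the threshold.

    samples: latency readings in ms/op, evenly spaced by `interval`.
    Assumption: duration is the longest consecutive run above the threshold.
    """
    longest = run = 0
    for value in samples:
        run = run + 1 if value > threshold_ms else 0
        longest = max(longest, run)
    if longest == 0:
        return None
    return PerformanceEvent(
        status="critical",
        type="user-defined",
        state="new",
        duration=longest * interval,
        storage_object=volume,
        description=f"Volume latency exceeded {threshold_ms:.2f} ms/op",
    )


# 38 consecutive 5-minute samples above 5.00 ms/op last 3 hours 10 minutes.
readings = [3.0] * 4 + [6.5] * 38 + [4.0] * 2
event = evaluate_latency_policy("vol_example", readings, 5.00, timedelta(minutes=5))
print(event)
```

The sample readings are synthetic and were chosen only to reproduce the duration quoted for Figure 11; they are not measurements from the report.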
4 Common Use Cases
As with many management tools, the number of potential use cases is very large. This section
highlights some of the common procedures for using Performance Manager.
4.1 Common Workflows
The main purpose of storage performance management is to confirm that storage systems are operating
as expected. Performance Manager is an excellent tool to aid in establishing expectations after
The lower portion of the cluster landing page is where the “hot objects” functionality resides (see Figure 13). The number of objects, for example, volumes, can easily reach into the hundreds and beyond. The bulk of
the volumes might contain data that is either at rest or only casually accessed and is therefore uninteresting. This
feature shows the most active objects sorted by a user-selected metric over a user-selected time range.
The descending bars of the histograms in Figure 13, for example, “Top 10 Volumes,” show that
Performance Manager sorts volumes from the highest measured latency downward.
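The "hot objects" idea, ranking objects by a chosen metric over a chosen time range, can be sketched as a simple top-N query. This is an illustration only: the `top_objects` function, the sample record layout, and the choice of the per-object average as the ranking statistic are assumptions, not a description of how Performance Manager computes its charts.

```python
def top_objects(samples, metric, start, end, n=10):
    """Rank objects by their average `metric` value within [start, end].

    samples: list of dicts with keys "object", "time", and metric names.
    Assumption: objects are ranked by the mean of the metric in the range.
    """
    totals = {}
    counts = {}
    for s in samples:
        if start <= s["time"] <= end:
            totals[s["object"]] = totals.get(s["object"], 0.0) + s[metric]
            counts[s["object"]] = counts.get(s["object"], 0) + 1
    averages = {obj: totals[obj] / counts[obj] for obj in totals}
    # Highest average first, then keep the first n entries.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Synthetic samples; "time" is minutes from an arbitrary starting point.
samples = [
    {"object": "vol1", "time": 0, "latency": 2.0},
    {"object": "vol1", "time": 5, "latency": 4.0},
    {"object": "vol2", "time": 0, "latency": 9.0},
    {"object": "vol2", "time": 5, "latency": 7.0},
    {"object": "vol3", "time": 0, "latency": 1.0},
    {"object": "vol3", "time": 20, "latency": 50.0},
]
top = top_objects(samples, "latency", start=0, end=10, n=2)
print(top)
```

The late spike on `vol3` falls outside the selected time range, so it does not affect the ranking, which shows why the time range matters as much as the metric.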
4.3 Managing Objects
Performance Manager object management functions have a common look and feel through the modular
object landing pages and Performance Explorer (see section 3.2). The following use cases utilize the
features and functionality of these components and are applicable with simple modifications to other
A critical aspect of performance management is knowing how resources are used and understanding
performance resource consumption. This knowledge can lead to rebalancing workloads by relocating
volumes or LUNs to newly added or underused nodes. It can also guide the choice of a node for
a new workload. In either case, Performance Manager provides node utilization metrics over time and
provides ways to easily compare them.
To compare node utilization within a cluster, access any node object landing page in the cluster (Figure 16). You can do this from the dashboard directly or from the cluster landing page. On the left side of the
screen is a list of the sibling nodes in the cluster. Add all of the node objects to the metrics charts on the
right side of the screen using the Add to Charts buttons (annotated) in Performance Explorer.
After adding all node objects to the metrics charts, the graph appears with the utilization metric plotted for
all the nodes (Figure 17).
Figure 16) Node landing page.
Figure 17) Metrics chart comparing node utilization within a cluster.
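The same comparison can be made on exported utilization numbers. The sketch below assumes the per-node utilization series are already available as plain lists of percentages; the `compare_node_utilization` function, the node names, and the sample values are hypothetical, and ranking by mean utilization is one reasonable choice rather than a documented method.

```python
from statistics import mean


def compare_node_utilization(series):
    """Summarize each node's utilization and rank from least to most busy.

    series: dict mapping node name -> list of utilization percentages.
    """
    summary = {
        node: {"mean": mean(values), "peak": max(values)}
        for node, values in series.items()
    }
    # Lowest mean utilization first: the first entry has the most headroom.
    return sorted(summary.items(), key=lambda kv: kv[1]["mean"])


# Synthetic utilization samples (percent) for three sibling nodes.
utilization = {
    "node-01": [45, 50, 55],
    "node-02": [20, 25, 30],
    "node-03": [60, 70, 65],
}
ranked = compare_node_utilization(utilization)
least_busy = ranked[0][0]
print(least_busy, ranked)
```

Reporting the peak alongside the mean is deliberate: a node with a low average but high peaks may still be a poor target for a new workload.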
Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.
Software derived from copyrighted NetApp material is subject to the following license and disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.
The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer
Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).
Trademark Information
NetApp, the NetApp logo, Go Further, Faster, AltaVault, ASUP, AutoSupport, Campaign Express,
Cloud ONTAP, Clustered Data ONTAP, Customer Fitness, Data ONTAP, DataMotion, Fitness,