Confidence in a connected world. WHITE PAPER: BUSINESS BENEFITS Top 10 Backup Reports Using Veritas™ Backup Reporter Hal Uygur & Josef Pfeiffer Data Protection Group: Veritas NetBackup January 2009
Contents
Introduction
Report 1: Success Rate
Report 2: Risk Analysis
Report 3: Failure Analysis
Report 4: Predictive Forecasting
Report 5: Drive Utilization and Throughput
Report 6: Backup Window
Report 7: Deduplication
Report 8: Media Trending and Analysis
Report 9: Backup Size
Report 10: Week At A Glance
Summary
Introduction

The sheer growth of data volume places a greater burden on the protection of these assets. Data protection vendors continue
to offer more technologies to keep up and scale with the growth of data. Technologies such as continuous data protection
(CDP), deduplication, snapshot client, and virtual tape library (VTL) are now part of the mainstream offerings and provide
increasingly advanced techniques to protect the data assets and provide better alignment with the business value of the data
being protected. Data protection managers now find themselves having to choose and manage their data protection processes
across an increasingly complex landscape. Visibility into each technology as well as across each technology at more logical and
business layers is becoming very challenging. The management paradigm, “you can’t manage it if you can’t measure it” is very
fitting here and underscores the importance of good metrics and reporting in data protection management. Most of the leading
data protection product vendors have now expanded their offerings to include reporting. Several years back, when data protection
technology was mainly tape-based, reporting was seen more as a “nice to have.” Today, it is becoming an integral part of the
data protection solution. The last several years have also witnessed the growth of the data protection services sector. Service
providers also require robust reporting around the tiers of services offered in addition to showing compliance with formal service
level agreements (SLAs) and being able to bill their customers. Backup reporting has emerged as a way to obtain greater efficiency,
monitor service levels, and justify the business costs associated with protecting data.
Symantec's leadership in data protection backup software led it to create Veritas™ Backup Reporter, a backup reporting product
designed with input from hundreds of users. By developing a reporting product that is not clouded by adjacent domains
such as storage, networks, and servers, Symantec has been able to focus exclusively on the backup application and gather more
details to help create the best reports for NetBackup™, Backup Exec™, and PureDisk™ as well as third-party backup products,
including IBM Tivoli® Storage Manager, EMC NetWorker®, and CommVault Galaxy®.
This paper provides a top 10 style categorization of the fundamental disciplines—performance and capacity planning, service level
management, and compliance—all essential to good management of data protection.
Report 1: Success Rate
The success rate report is your "here is the proof" report that all computing machinery was successfully backed up last night. It
measures success rate based on the outcome of the last backup. Simply put, the measure of whether you have protected the data
assets is based on whether a backup was successful or not. Thus, initial attempts that have failed need to be ignored once a good
backup is attained before the next scheduled backup is run. So the “true” success rate is calculated by using the status of the last
job for each server and aggregating it to arrive at an overall success rate for the target population. This contrasts with the raw
success rate calculation where the number of successes is divided by the number of jobs. Both metrics have significance for their
audiences. The true success rate report is typically sent to end users, internal/external customers, and CIOs/CTOs. The raw success
rate report is typically sent to those who manage the data protection operation. These people would typically also receive the true
success rate report because they are the ones who will be contacted if success is not attained. While satisfied that all computing
assets were protected, they would need to know why failures may have occurred before successful backups were attained. There
is one more report included in the success rate reporting portfolio; it’s the “first job success rate.” This essentially is a measure of
readiness and efficiency. In short, it’s the inverse of the true (or last) success rate. Here, only the first job is selected and measured
on its outcome. The emphasis here is being prepared when a new day’s backup schedule kicks off. Cases with low first job and
high last job success rates would point to environments in which additional human interaction (debugging, troubleshooting,
escalating) and computing cycles need to be applied. Eventually, satisfaction was attained, but significant work was required. This
is in contrast to being very successful at the start and not having to apply additional cycles to achieve the desired results. A good
feature in reports where a metric is to be evaluated against a target is the inclusion of this value as another data set in the report.
This provides quick and intuitive visualization: periods when the target is not met are clearly seen as the points below this value.
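The distinction among these three metrics can be sketched in a few lines. The job-record shape below (server, timestamp, success flag) is illustrative only and is not Backup Reporter's actual data model:

```python
from collections import defaultdict

def success_rates(jobs):
    """Raw, true (last-job), and first-job success rates.

    `jobs` is a list of (server, timestamp, succeeded) tuples; this
    shape is hypothetical, chosen only to illustrate the metrics.
    """
    # Raw rate: successes divided by total number of jobs.
    raw = sum(1 for _, _, ok in jobs if ok) / len(jobs)

    by_server = defaultdict(list)
    for server, ts, ok in jobs:
        by_server[server].append((ts, ok))

    # True rate: only each server's most recent job outcome counts,
    # so failed retries are ignored once a good backup is attained.
    true_rate = sum(max(runs)[1] for runs in by_server.values()) / len(by_server)
    # First-job rate: a measure of readiness when the schedule kicks off.
    first_rate = sum(min(runs)[1] for runs in by_server.values()) / len(by_server)
    return raw, true_rate, first_rate
```

A server whose first attempt fails but whose retry succeeds raises the true rate while lowering the raw and first-job rates — exactly the gap that signals extra human intervention was required.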
Report 2: Risk Analysis

An actual case of risk analysis involves a large enterprise whose data protection team had SLAs with each business unit. Analysis
indicated that all of their data was being protected. Their published reports showed a 100 percent success rate for business
unit satisfaction. However, eventually one business unit requested a restore of a corrupted database. Embarrassingly, the data
protection manager notified the requestor that no such backup existed. The irate business unit manager demanded an explanation
because success rate reports were at 100 percent. The data protection manager had to explain that the 100 percent success
rate was measured against servers that were being backed up; however, there was no such measure of servers without backup
software or even servers with backup software but not included on the backup schedule. Therefore to complement the success
rate reports (because clearly success rate reports don't tell the whole story) is the risk portfolio of reports. These reports add
detail and accuracy by answering questions such as: is all computing machinery being backed up? For those that are,
are all drives, file systems, and directories being backed up? There are many other questions that these reports can answer. In
order to report on whether all computing assets are being protected, the source of data must originate from somewhere other
than the backup product. This could be an asset management system, a configuration management database (CMDB), a homegrown
database, or even a spreadsheet that functions as the authoritative ledger for all computing assets. Data from any of these sources
is then compared to what is provided by the backup product and the exceptions are reported as servers that are not defined to the
backup product. Risk analysis reports help satisfy your auditors because the reports demonstrate that there is a process in place to
verify the occurrence of backups and necessary actions.
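The comparison itself reduces to a set difference. This sketch assumes plain lists of host names from the authoritative ledger and from the backup product; the names are hypothetical:

```python
def unprotected_assets(ledger_servers, backup_clients):
    """Servers present in the authoritative asset ledger (CMDB,
    asset database, or spreadsheet) but unknown to the backup
    product -- the exceptions a risk analysis report flags."""
    return sorted(set(ledger_servers) - set(backup_clients))
```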
Beyond the initial exposure (as in "Where am I at risk in terms of what is not being protected?"), another risk report focuses on the
servers that are being backed up and verifies that all data on each server is being protected and nothing is neglected. Client risk
analysis reports show exposures at a server-policy level. Increasingly, different data types on servers (file servers, Web servers,
database servers, etc.) are being scheduled separately. So while a server's file systems may have been successfully backed up,
the server may still be at risk because its database was not backed up. With such a granular level of reporting, the exposure from the failed
database backup doesn’t go unnoticed. Additional reports will be included in this portfolio as virtualization and blade technologies
provide for more “fly under the radar” opportunities and challenges for data protection managers.
Report 3: Failure Analysis

A best practice for remediating failures is the ability to perform solid analysis of why backups fail and then address root causes.
This results in steady and lasting improvement to operational performance. There are several reports that comprise this portfolio.
First and foremost is the simple pie chart that provides a breakdown of the percentage of failures by specific error codes. To ensure
that this report is statistically significant, the ability to filter out specific codes is important; for example, excluding failures that
result from operator termination of backups. The resulting report will provide a breakdown of not only the different types of error
codes, but also what percent each error code contributes to the total number of error codes. A first glance here allows operations
managers to assess if there are many different errors spread equally across the backup environment.
An error code distribution spread evenly across the pie chart is perhaps the most undesired result because it verifies that failures are
plentiful and for a variety of reasons. This is in contrast to a similar number of codes where only one or two are the high-volume drivers.
In the latter case, by fixing the two high-volume codes first, significant progress is made in improving performance, whereas in the
previous case, after fixing the first two codes, much improvement is still needed. After getting the initial breakdown of error codes,
the next logical step is to determine if these errors are limited to a number of servers or if they are indiscriminate across the backup
environment; for example, is an "80:20" type scenario in effect where 20% of the servers are causing 80% of all failures? A simple
rankings report against the error codes will illustrate the behavior. A third and significant analysis is the grouping of errors. One approach
is breaking down errors as either technical or operational; for example, backup window closed (operational), client could not be reached
(technical), no tapes available in storage unit (operational), read/write operation failed (technical), etc. Beyond the pie and Pareto chart
type analysis, looking at failures historically through trends also provides verification that problems are being addressed.
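The pie-chart breakdown and the 80:20 check both reduce to counting error codes. A minimal sketch, with the filter for operator-terminated jobs modeled as an `exclude` set (the codes themselves are placeholders):

```python
from collections import Counter

def failure_pareto(error_codes, exclude=()):
    """Percentage breakdown of failures by error code, largest first.
    Codes in `exclude` (e.g. operator-terminated jobs) are filtered
    out so the breakdown stays statistically meaningful."""
    counts = Counter(code for code in error_codes if code not in exclude)
    total = sum(counts.values())
    return [(code, 100.0 * n / total) for code, n in counts.most_common()]
```

A result dominated by one or two codes is the favorable case described above; a flat distribution signals failures for many different reasons.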
Report 4: Predictive Forecasting
In addition to the numerous reports that verify that data assets are being protected, forward-looking reports are an integral part
of the reporting portfolio. These reports provide a much-needed perspective of what conditions 6, 9, 15 and more months out might
be like. The ability to forecast resource demand and use the information for sizing the infrastructure to meet the demand will help
ensure that key performance indicators (KPIs) such as success rate, throughput, etc. will continue to meet requirements. Of the several
forecasting reports, the backup size forecast is the first one that should be configured.
Regardless of whether data size or widgets are being forecast, it is important that sufficient historical data is collected and included
in the report to ensure that resulting forecast numbers are statistically significant. Future decisions should never be based on just a
few historical data points. Furthermore, in order to dampen out the cyclical (that is, full vs. incremental) nature of the historical data,
setting reports with the sufficient blocks of times help to ensure that results are statistically significant and relevant. Keeping these
configuration techniques in mind, the resulting report can provide important information on whether current resources can satisfy the
future demand. If backed-up data grows 30% over the next year, can the backup servers and media servers accommodate the additional
load and still meet backup windows? Will tape drive performance become a bottleneck where new drives may need to be procured?
Answers to these and similar questions are what the backup size forecast reports can shed light on. Beyond backup size, other key
forecast variables include number of clients, number of virtual machines, and file count. All of these collectively provide important
information on components that can impact performance and breach SLAs. Going beyond the physical infrastructure components,
the use of business-level views around this infrastructure provides even more powerful perspectives. Insights become available
into whether data will grow faster in Europe or North America, whether the finance business unit will require more tape inventory,
or whether the Hong Kong data center will double its number of virtual machines.
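Any forecasting report of this kind rests on fitting a trend to historical observations. As a simplified stand-in for Backup Reporter's forecasting (whose actual model is not documented here), an ordinary least-squares line over equally spaced backup-size samples:

```python
def forecast(history, periods_ahead):
    """Extrapolate a least-squares trend line, fitted to equally
    spaced historical values, `periods_ahead` periods past the last
    sample. Real forecasts need enough history to be significant."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)
```

As the paper notes, a forecast built from only a few data points, or from a window too short to dampen the full-versus-incremental cycle, is not statistically meaningful.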
Report 5: Drive Utilization and Throughput
One of the big ticket components of the data protection infrastructure is the tape drive. Without adequate management of
these assets, cost and performance can quickly spiral out of control. Knowing the critical role that these assets play in the data
protection workflow, many operations overcompensate against these assets so as to not run into capacity problems. This behavior
is largely driven by the absence of reliable reports and forecast of capacity and growth. The drive utilization reports based on heat
chart style graphical illustration provide a comprehensive picture of drive activity and utilization. In its simplest form, a drive's
activity can be broken down to any hour of any day; utilization during each hour is calculated by correlating, at the minute level,
the time that each job is using the drive. Thus the utilization percentages shown for any drive during any
hour are precise down to the minute. Aggregating from this level of precision provides capacity planners with reliable information on
usage and trends. The configurable color settings that define utilization ranges (0–100%) provide a powerful means of analysis. In
addition to analyzing utilization during specific times, the report provides advanced averaging calculations so that when utilization
figures for long time intervals are formulated, they are statistically significant and not diluted by straight-line averaging methodology.
In order to contain costs, more companies are sharing drives to increase utilization. These reports also provide for smart filtering,
which shows shared utilization in addition to physical utilization. The reports can be aggregated and/or filtered by logical/physical
drive, drive type, media server, and library. Thus the drive utilization reports from an hourly and day-of-week type template can
provide more than a dozen different types of analyses of the tape drive infrastructure. And for the architect that must know the
drive utilization metric, a precise metric is produced.
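The minute-level utilization calculation described above can be sketched as overlap arithmetic between job intervals and an hour bucket. Times here are minutes since midnight, and the interval shape is hypothetical:

```python
def hourly_utilization(job_intervals, hour_start, hour_end):
    """Fraction of one hour a drive was busy, from (start, end) job
    intervals in minutes. Each job's overlap with the hour is summed,
    and the total is capped at the hour length in case jobs overlap."""
    busy = 0
    for start, end in job_intervals:
        busy += max(0, min(end, hour_end) - max(start, hour_start))
    return min(busy, hour_end - hour_start) / (hour_end - hour_start)
```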
Now that you have a good understanding of drive utilization, the next logical question is, “While the drives are being utilized, how
well are they performing?” The fundamental measure of drive performance is throughput; that is, the speed at which data is being
transmitted through the drives onto the tape or in the case of restores, from the tape out to the disk storage unit. Throughput is
a rate metric and measures speed; the unit of measure here is kilobytes per second (KB/sec). Just like the drive utilization reports,
heat chart style graphical illustration enables a compelling representation of throughput. Through the light to dark shade of green,
you can easily see a drive’s performance and further be able to isolate drives so that both fast and slow speeds are observed.
The main differentiator of the throughput reporting is the precision stemming from the granularity of data collected along with
advanced averaging techniques.
Traditional, simplistic methods of measuring drive throughput (that is, the amount of data transferred divided by job duration)
can be quite misleading and inaccurate. The methods used here start with collecting data at a fragment level and capturing
the actual times in which the reading and writing operations occur (as opposed to the simplistic job start and end times, which
completely ignore drive queuing times as well as data seek times). The resultant metrics enable a precise and insightful
analysis of drive performance. Scenarios where the same drive shows both high and low throughput can be further decomposed to
the underlying jobs and the servers and policies behind the jobs. Similar to the utilization reports, they can be aggregated and/
or filtered by logical/physical drive, drive type, media server, and library. An excellent use case of these reports is capturing
performance against drives in which there is restore/recovery activity. These metrics can then be included in the industry-standard
Recovery Time Objective (RTO) measure as part of recovery planning activities.
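The difference between the naive and fragment-level throughput calculations can be made concrete. Here each fragment record carries its size and its actual read/write start and end times (a hypothetical shape), so queuing time never enters the denominator:

```python
def drive_throughput_kb_s(fragments):
    """Throughput from fragment-level (kilobytes, io_start, io_end)
    records, with times in seconds. Only active read/write time is
    counted, unlike size divided by whole-job duration."""
    total_kb = sum(kb for kb, _, _ in fragments)
    active_s = sum(end - start for _, start, end in fragments)
    return total_kb / active_s
```

For two 1,000 KB fragments written in 10 seconds each inside a 60-second job, the naive method reports roughly 33 KB/sec while the fragment-level method reports the drive's real 100 KB/sec.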
Report 6: Backup Window
Increasingly, time (that is, when backups occur and how long they run) is becoming an important aspect of data protection.
End users and customers alike are demanding that specific numbers of backup copies of data are completed by a certain hour
(for example, start of business day) along with not commencing before a certain hour. Likewise, more complex interdependencies,
such as taking a backup of a database before and after business processing updates, are also becoming commonplace. The "backup
window” reports are thus an important part of the data protection manager’s need to demonstrate that activity is occurring
during the defined times, repeatable from day-to-day, and the resources required to deliver on these requirements have sufficient
capacity. The backup window reports provide insights into what happens by hours of day and days of week. These reports are
complemented with a graphical drawing of the backup window for compelling visual presentation and analyses. One can quickly
determine if all backup activity is occurring within the defined window. When spillovers outside of windows occur, there is also
reporting to zoom in on why this is happening. Longer running jobs, sub-optimal scheduling, data growth, and client growth are
typical contributors to the out-of-window condition. Window activity should be examined collectively in the context of number of
jobs, number of clients, and amount of data being backed up across each hour. Furthermore, similar to the drive utilization and
throughput reports, looking at window performance across broad timelines using intelligent averaging is also necessary. Missing
a window once or twice doesn’t necessarily point to broader systematic problems and thus the averaging context needs to be
examined alongside the actual daily context.
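Spillover detection itself is a simple comparison of job times against the window. A sketch with hours on a single-night scale where values past 24 run into the next morning (a real window that spans midnight needs more careful time handling):

```python
def out_of_window(jobs, window_start, window_end):
    """Names of (name, start_hour, end_hour) jobs that began before
    the window opened or finished after it closed."""
    return [name for name, start, end in jobs
            if start < window_start or end > window_end]
```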
Report 7: Deduplication
Deduplication is nothing short of a megatrend when referring to backup. Remote offices have been a great area where
deduplication can be used to send backups to a more centralized data center location. By eliminating the need to deal with disk or
tape drives in these offices, where IT staff is usually minimal to none, a large amount of risk is avoided. Veritas Backup Reporter
allows you to report on NetBackup PureDisk Remote Office Edition so that there is one email or one set of reports that shows which
remote offices are at risk due to a failed or non-occurring backup. It also can be used to show how much deduplication is occurring
at the remote office before transmitting. Over time these deduplication rates and common failures can be analyzed to show trends
that can help users make more educated decisions about the backup environment and which offices, data or servers need more
attention. In the data center, deduplication reporting helps to determine what the actual deduplication rate is at each PureDisk
storage unit. Reporting can be abstracted to view the protected size or stored size across all PureDisk environments or drilled down
to one specific area. Common file or data types can then be identified and matched with how well they deduplicate.
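The rate itself relates the logical (protected) size to the physical (stored) size. A sketch with illustrative field names, not PureDisk's actual schema:

```python
def dedup_metrics(protected_kb, stored_kb):
    """Deduplication expressed two common ways: the ratio of logical
    (protected) size to physical (stored) size, and percent of space
    saved. 1,000 KB protected in 100 KB stored is a 10:1 ratio."""
    ratio = protected_kb / stored_kb
    saved_pct = 100.0 * (1 - stored_kb / protected_kb)
    return ratio, saved_pct
```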
Report 8: Media Trending and Analysis
Backup media resides at the heart of data protection operations. Unfortunately, not all data can be stored in one place, so
managing the amount of space on multiple tape cartridges, drives, libraries, VTLs, disk devices, and other locations can get
complicated. Capacity planning has become an important role in doing backups, and Veritas Backup Reporter has a number of
reports aimed directly at this. At a high level, a report can be generated to show overall historical supply and demand. Ideally,
supply would always equal demand; why pay for more storage than you need? At the same time, you don't want so little supply
that you cannot meet backup demands. Backup Reporter can show the supply and demand for the overall environment or a
particular storage location such as a VTL or tape library. And by applying predictive forecasting, you can determine exactly when
more supply needs to be added based on the growing demand.
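Combining the supply-and-demand view with forecasting reduces, in its simplest form, to projecting demand forward until it crosses supply. A sketch assuming linear growth (Backup Reporter's actual forecasting model is not reproduced here):

```python
def periods_until_shortfall(supply, current_demand, growth_per_period):
    """Number of periods until projected demand first exceeds supply,
    assuming demand grows linearly by `growth_per_period` per period.
    Returns None if demand is not growing under this assumption."""
    if growth_per_period <= 0:
        return None  # demand never catches up with supply
    demand = current_demand
    periods = 0
    while demand <= supply:
        demand += growth_per_period
        periods += 1
    return periods
```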
Report 9: Backup Size
This report provides visibility into the amount of data being backed up in a computing environment. There are several versions
of this report—which should be in every technology manager's portfolio—including a running total of the amount of data backed
up by day for the last two weeks. This helps verify whether there are significant swings in the amount of data crossing the wire.
For environments in which there are well-defined schedules where incremental backups occur during the week followed by full
backups on weekends, a 2–3 week timeline will verify the spikes on the weekends from the full backups. By zooming in on the
amount of data being backed up on the weekends, significant shifts in overall data volume can be observed. Another type of size-
based report considers data on a monthly basis and then restricts the totals to full backups only. This report provides significant
insights on whether data growth in general is being observed across the computing environment. An additional derivative of this
report is looking at size in a business context; for example, by geography, data center, business unit, and application. The primary
audiences for this report are the CIO, operations manager, and data protection infrastructure architects. Just as we expect CEOs to
be very familiar with numbers such as annual revenue, % growth and margins, this is one of the metrics that CIOs and CTOs need
to know.
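The daily and full-only variants of this report are both aggregations over job records. A sketch using (day, schedule_level, kilobytes) tuples as a hypothetical input shape:

```python
from collections import defaultdict

def size_by_day(jobs, full_only=False):
    """Total kilobytes backed up per day, optionally restricted to
    full backups as in the monthly full-only variant of this report."""
    totals = defaultdict(int)
    for day, level, kb in jobs:
        if full_only and level != "full":
            continue
        totals[day] += kb
    return dict(totals)
```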
Report 10: Week At A Glance
Reporting out to systems administrators, DBAs, and application owners is now a common requirement across most enterprises.
Similarly, with service providers, demonstrating that they are meeting the service objectives to their customers is typically part of
the contract. Leading the list of reports for end users and internal/external customers is the week-at-a-glance report. This report
provides quick and comprehensive verification that all is well. Quick in that the visual presentation of failures will be readily
apparent, and comprehensive in that all servers are itemized. An excellent use case of business views is the need to segregate the
data relevant to the audience. So a DBA will expect to only see the servers on which their databases reside and not have to sift
through a report that includes servers that are of no interest. While providing for a weekly summary, this report is quite versatile
because you can drill down to any server on any day and get the job level detail. The grouping requirement is more critical for
service providers because they need to ensure that each customer sees only their data.
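The grid underlying a week-at-a-glance view is a server-by-day status matrix. In this sketch, `jobs` maps a hypothetical (server, day) key to a success flag, '-' marks days with no job, and the per-audience segregation is handled by passing only the relevant servers:

```python
def week_at_a_glance(jobs, servers, days):
    """Server-by-day grid of 'OK', 'FAIL', or '-' (no job ran),
    itemizing every server so failures are readily apparent."""
    return {server: [("OK" if jobs[(server, day)] else "FAIL")
                     if (server, day) in jobs else "-"
                     for day in days]
            for server in servers}
```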
Summary

As data grows, backup operations are becoming increasingly complex; however, customers are benefiting from great gains in
operational productivity by leveraging backup reporting and service-level management disciplines. It is very difficult to improve
a process that isn't measurable, so the first step is defining, monitoring, and analyzing data on an ongoing basis. Veritas Backup
Reporter provides an excellent means for doing so. Veritas Backup Reporter provides many different types of reports that address
a variety of different needs for key stakeholders within the organization. From compliance reports such as backup success rate,
backup window, and risk analysis reports, to inventory planning and operational reports such as predictive forecasting and
week-at-a-glance reports, Veritas Backup Reporter plays an important role in improving and optimizing data center backup and
recovery operations.
About Symantec

Symantec is a global leader in infrastructure software, enabling businesses and consumers to have confidence in a connected world. The company helps customers protect their infrastructure, information, and interactions by delivering software and services that address risks to security, availability, compliance, and performance. Headquartered in Cupertino, Calif., Symantec has operations in 40 countries. More information is available at www.symantec.com.