Confidence in a connected world. WHITE PAPER: BUSINESS BENEFITS Top 10 Backup Reports Using Veritas™ Backup Reporter Hal Uygur & Josef Pfeiffer Data Protection Group: Veritas NetBackup January 2009
Contents
Introduction
Report 1: Success Rate
Report 2: Risk Analysis
Report 3: Failure Analysis
Report 4: Predictive Forecasting
Report 5: Drive Utilization and Throughput
Report 6: Backup Window
Report 7: Deduplication
Report 8: Media Trending and Analysis
Report 9: Backup Size
Report 10: Week At A Glance
Summary
Introduction

The sheer growth of data volume places a greater burden on the protection of these assets. Data protection vendors continue
to offer more technologies to keep up and scale with the growth of data. Technologies such as continuous data protection
(CDP), deduplication, snapshot client, and virtual tape library (VTL) are now part of the mainstream offerings and provide
increasingly advanced techniques to protect the data assets and provide better alignment with the business value of the data
being protected. Data protection managers now find themselves having to choose and manage their data protection processes
across an increasingly complex landscape. Visibility into each technology as well as across each technology at more logical and
business layers is becoming very challenging. The management paradigm, “you can’t manage it if you can’t measure it” is very
fitting here and underscores the importance of good metrics and reporting in data protection management. Most of the leading
data protection product vendors have now expanded their offerings to include reporting. Several years back, when data protection
technology was mainly tape-based, reporting was seen more as a “nice to have.” Today, it is becoming an integral part of the
data protection solution. The last several years have also witnessed the growth of the data protection services sector. Service
providers also require robust reporting around the tiers of services offered in addition to showing compliance with formal service
level agreements (SLAs) and being able to bill their customers. Backup reporting has emerged as a way to obtain greater efficiency,
monitor service levels, and justify the business costs associated with protecting data.
Symantec's leadership in data protection backup software led it to create Veritas™ Backup Reporter, a backup reporting product
designed with input from hundreds of users. By developing a reporting product that is not clouded by adjacent domains
such as storage, networks, and servers, Symantec has been able to focus exclusively on the backup application and gather more
details to help create the best reports for NetBackup™, Backup Exec™, and PureDisk™ as well as third-party backup products,
including IBM Tivoli® Storage Manager, EMC NetWorker®, and CommVault Galaxy®.
This paper provides a top 10 style categorization of the fundamental disciplines—performance and capacity planning, service level
management, and compliance—all essential to good management of data protection.
Report 1: Success Rate
The success rate report is your "here is the proof" report that all computing machinery was successfully backed up last night. It
measures success rate based on the outcome of the last backup. Simply put, the measure of whether you have protected the data
assets is based on whether a backup was successful or not. Thus, initial attempts that have failed need to be ignored once a good
backup is attained before the next scheduled backup is run. So the “true” success rate is calculated by using the status of the last
job for each server and aggregating it to arrive at an overall success rate for the target population. This contrasts with the raw
success rate calculation where the number of successes is divided by the number of jobs. Both metrics have significance for their
audiences. The true success rate report is typically sent to end users, internal/external customers, and CIOs/CTOs. The raw success
rate report is typically sent to those who manage the data protection operation. These people would typically also receive the true
success rate report because they are the ones who will be contacted if success is not attained. While satisfied that all computing
assets were protected, they would need to know why failures may have occurred before successful backups were attained. There
is one more report included in the success rate reporting portfolio; it’s the “first job success rate.” This essentially is a measure of
readiness and efficiency. In short, it’s the inverse of the true (or last) success rate. Here, only the first job is selected and measured
on its outcome. The emphasis here is being prepared when a new day’s backup schedule kicks off. Cases with low first job and
high last job success rates would point to environments in which additional human interaction (debugging, troubleshooting,
escalating) and computing cycles need to be applied. Eventually, satisfaction was attained, but significant work was required. This
is in contrast to being very successful at the start and not having to apply additional cycles to achieve the desired results. A good
feature in reports where a metric is to be evaluated against a target is the inclusion of this value as another data set in the report.
This provides quick and intuitive visualization: periods when the target is not met are clearly seen as the points below this value.
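The distinction among these three metrics can be sketched in a few lines. The job-record shape below (server, timestamp, success flag) is illustrative only and is not Backup Reporter's actual data model:

```python
from collections import defaultdict

def success_rates(jobs):
    """Raw, true (last-job), and first-job success rates.

    `jobs` is a list of (server, timestamp, succeeded) tuples; this
    shape is hypothetical, chosen only to illustrate the metrics.
    """
    # Raw rate: successes divided by total number of jobs.
    raw = sum(1 for _, _, ok in jobs if ok) / len(jobs)

    by_server = defaultdict(list)
    for server, ts, ok in jobs:
        by_server[server].append((ts, ok))

    # True rate: only each server's most recent job outcome counts,
    # so failed retries are ignored once a good backup is attained.
    true_rate = sum(max(runs)[1] for runs in by_server.values()) / len(by_server)
    # First-job rate: a measure of readiness when the schedule kicks off.
    first_rate = sum(min(runs)[1] for runs in by_server.values()) / len(by_server)
    return raw, true_rate, first_rate
```

A server whose first attempt fails but whose retry succeeds raises the true rate while lowering the raw and first-job rates — exactly the gap that signals extra human intervention was required.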
Report 2: Risk Analysis

An actual case of risk analysis involves a large enterprise whose data protection team had SLAs with each business unit. Analysis
indicated that all of their data was being protected. Their published reports showed a 100 percent success rate for business
unit satisfaction. However, eventually one business unit requested a restore of a corrupted database. Embarrassingly, the data
protection manager notified the requestor that no such backup existed. The irate business unit manager demanded an explanation
because success rate reports were at 100 percent. The data protection manager had to explain that the 100 percent success
rate was measured against servers that were being backed up; however, there was no such measure of servers without backup
software or even servers with backup software but not included on the backup schedule. Therefore to complement the success
rate reports (because clearly success rate reports don't tell the whole story) is the risk portfolio of reports. These reports add
detail and accuracy by answering questions such as: is all computing machinery being backed up? For those that are,
are all drives, file systems, and directories being backed up? There are many other questions that these reports can answer. In
order to report on whether all computing assets are being protected, the source of data must originate from somewhere other
than the backup product. This could be an asset management system, a configuration management database (CMDB), a homegrown
database, or even a spreadsheet that functions as the authoritative ledger for all computing assets. Data from any of these sources
is then compared to what is provided by the backup product and the exceptions are reported as servers that are not defined to the
backup product. Risk analysis reports help satisfy your auditors because the reports demonstrate that there is a process in place to
verify the occurrence of backups and necessary actions.
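The comparison itself reduces to a set difference. This sketch assumes plain lists of host names from the authoritative ledger and from the backup product; the names are hypothetical:

```python
def unprotected_assets(ledger_servers, backup_clients):
    """Servers present in the authoritative asset ledger (CMDB,
    asset database, or spreadsheet) but unknown to the backup
    product -- the exceptions a risk analysis report flags."""
    return sorted(set(ledger_servers) - set(backup_clients))
```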
Beyond the initial exposure (as in "Where am I at risk in terms of what is not being protected?"), another risk report focuses on the
servers that are being backed up and verifies that all data on each server is being protected and nothing is neglected. Client risk
analysis reports show exposures at a server-policy level. Increasingly, different data types on servers (file servers, Web servers,
database servers, etc.) are being scheduled separately. So while a server's file systems may have been successfully backed up,
the server may still be at risk because its database was not backed up. With such a granular level of reporting, the exposure from the failed
database backup doesn’t go unnoticed. Additional reports will be included in this portfolio as virtualization and blade technologies
provide for more “fly under the radar” opportunities and challenges for data protection managers.
Report 3: Failure Analysis

A best practice for remediating failures is the ability to perform solid analysis of why backups fail and then address root causes.
This results in steady and lasting improvement to operational performance. There are several reports that comprise this portfolio.
First and foremost is the simple pie chart that provides a breakdown of the percentage of failures by specific error codes. To ensure
that this report is statistically significant, the ability to filter out specific codes is important; for example, excluding failures that
result from operator termination of backups. The resulting report will provide a breakdown of not only the different types of error
codes, but also what percent each error code contributes to the total number of error codes. A first glance here allows operations
managers to assess if there are many different errors spread equally across the backup environment.
An error code distribution spread evenly across the pie chart is perhaps the most undesired result because it verifies that failures are
plentiful and for a variety of reasons. This is in contrast to a similar number of codes where only one or two are the high-volume drivers.
In the latter case, by fixing the two high-volume codes first, significant progress is made in improving performance, whereas in the
previous case, after fixing the first two codes, much improvement is still needed. After getting the initial breakdown of error codes,
the next logical step is to determine if these errors are limited to a number of servers or if they are indiscriminate across the backup
environment; for example, is an "80:20" type scenario in effect where 20% of the servers are causing 80% of all failures? A simple
rankings report against the error codes will illustrate the behavior. A third and significant analysis is the grouping of errors. One approach
is breaking down errors as either technical or operational; for example, backup window closed (operational), client could not be reached
(technical), no tapes available in storage unit (operational), read/write operation failed (technical), etc. Beyond the pie and Pareto chart
type analysis, looking at failures historically through trends also provides verification that problems are being addressed.
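The pie-chart breakdown and the 80:20 check both reduce to counting error codes. A minimal sketch, with the filter for operator-terminated jobs modeled as an `exclude` set (the codes themselves are placeholders):

```python
from collections import Counter

def failure_pareto(error_codes, exclude=()):
    """Percentage breakdown of failures by error code, largest first.
    Codes in `exclude` (e.g. operator-terminated jobs) are filtered
    out so the breakdown stays statistically meaningful."""
    counts = Counter(code for code in error_codes if code not in exclude)
    total = sum(counts.values())
    return [(code, 100.0 * n / total) for code, n in counts.most_common()]
```

A result dominated by one or two codes is the favorable case described above; a flat distribution signals failures for many different reasons.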
Report 4: Predictive Forecasting
In addition to the numerous reports that verify that data assets are being protected, forward-looking reports are an integral part
of the reporting portfolio. These reports provide a much-needed perspective of what conditions 6, 9, 15 and more months out might
be like. The ability to forecast resource demand and use the information for sizing the infrastructure to meet the demand will help
ensure that key performance indicators (KPIs) such as success rate, throughput, etc. will continue to meet requirements. Of the several
forecasting reports, the backup size forecast is the first one that should be configured.
Regardless of whether data size or widgets are being forecast, it is important that sufficient historical data is collected and included
in the report to ensure that resulting forecast numbers are statistically significant. Future decisions should never be based on just a
few historical data points. Furthermore, in order to dampen out the cyclical (that is, full vs. incremental) nature of the historical data,
setting reports with the sufficient blocks of times help to ensure that results are statistically significant and relevant. Keeping these
configuration techniques in mind, the resulting report can provide important information on whether current resources can satisfy the
future demand. If backed-up data grows 30% over the next year, can the backup servers and media servers accommodate the additional
load and still meet backup windows? Will tape drive performance become a bottleneck where new drives may need to be procured?
Answers to these and similar questions are what the backup size forecast reports can shed light on. Beyond backup size, other key
forecast variables include number of clients, number of virtual machines, and file count. All of these collectively provide important
information on components that can impact performance and breach SLAs. Going beyond the physical infrastructure components,
the use of business-level views around this infrastructure provides even more powerful perspectives. Insights become available
into whether data will grow faster in Europe or North America, whether the finance business unit will require more tape inventory,
or whether the Hong Kong data center will double its number of virtual machines.
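Any forecasting report of this kind rests on fitting a trend to historical observations. As a simplified stand-in for Backup Reporter's forecasting (whose actual model is not documented here), an ordinary least-squares line over equally spaced backup-size samples:

```python
def forecast(history, periods_ahead):
    """Extrapolate a least-squares trend line, fitted to equally
    spaced historical values, `periods_ahead` periods past the last
    sample. Real forecasts need enough history to be significant."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)
```

As the paper notes, a forecast built from only a few data points, or from a window too short to dampen the full-versus-incremental cycle, is not statistically meaningful.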
Report 5: Drive Utilization and Throughput
One of the big ticket components of the data protection infrastructure is the tape drive. Without adequate management of
these assets, cost and performance can quickly spiral out of control. Knowing the critical role that these assets play in the data
protection workflow, many operations overcompensate against these assets so as to not run into capacity problems. This behavior
is largely driven by the absence of reliable reports and forecast of capacity and growth. The drive utilization reports based on heat
chart style graphical illustration provide a comprehensive picture of drive activity and utilization. In its simplest form, a drive's
activity can be broken down to any hour of any day; utilization during each hour is calculated by correlating, at the minute level,
the time that each job is using the drive. Thus the utilization percentages shown for any drive during any
hour are precise down to the minute. Aggregating from this level of precision provides capacity planners with reliable information on
usage and trends. The configurable color settings that define utilization ranges (0–100%) provide a powerful means of analysis. In
addition to analyzing utilization during specific times, the report provides advanced averaging calculations so that when utilization
figures for long time intervals are formulated, they are statistically significant and not diluted by straight-line averaging methodology.
In order to contain costs, more companies are sharing drives to increase utilization. These reports also provide for smart filtering,
which shows shared utilization in addition to physical utilization. The reports can be aggregated and/or filtered by logical/physical
drive, drive type, media server, and library. Thus the drive utilization reports from an hourly and day-of-week type template can
provide more than a dozen different types of analyses of the tape drive infrastructure. And for the architect that must know the
drive utilization metric, a precise metric is produced.
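The minute-level utilization calculation described above can be sketched as overlap arithmetic between job intervals and an hour bucket. Times here are minutes since midnight, and the interval shape is hypothetical:

```python
def hourly_utilization(job_intervals, hour_start, hour_end):
    """Fraction of one hour a drive was busy, from (start, end) job
    intervals in minutes. Each job's overlap with the hour is summed,
    and the total is capped at the hour length in case jobs overlap."""
    busy = 0
    for start, end in job_intervals:
        busy += max(0, min(end, hour_end) - max(start, hour_start))
    return min(busy, hour_end - hour_start) / (hour_end - hour_start)
```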
Now that you have a good understanding of drive utilization, the next logical question is, “While the drives are being utilized, how
well are they performing?” The fundamental measure of drive performance is throughput; that is, the speed at which data is being
transmitted through the drives onto the tape or in the case of restores, from the tape out to the disk storage unit. Throughput is
a rate metric and measures speed; the unit of measure here is kilobytes per second (KB/sec). Just like the drive utilization reports,
heat chart style graphical illustration enables a compelling representation of throughput. Through the light to dark shade of green,
you can easily see a drive’s performance and further be able to isolate drives so that both fast and slow speeds are observed.
The main differentiator of the throughput reporting is the precision stemming from the granularity of data collected along with
advanced averaging techniques.
Traditional, simplistic methods of measuring drive throughput (that is, the amount of data transferred divided by job duration)
can be quite misleading and inaccurate. The methods used here start with collecting data at a fragment level and capturing
the actual times in which the reading and writing operations occur (as opposed to the simplistic job start and end times, which
completely ignore drive queuing times as well as data seek times). The resultant metrics enable a precise and insightful
analysis of drive performance. Scenarios where the same drive shows both high and low throughput can be further decomposed to
the underlying jobs and the servers and policies behind the jobs. Similar to the utilization reports, they can be aggregated and/
or filtered by logical/physical drive, drive type, media server, and library. An excellent use case of these reports is capturing
performance against drives in which there is restore/recovery activity. These metrics can then be included in the industry-standard
Recovery Time Objective (RTO) measure as part of recovery planning activities.
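The difference between the naive and fragment-level throughput calculations can be made concrete. Here each fragment record carries its size and its actual read/write start and end times (a hypothetical shape), so queuing time never enters the denominator:

```python
def drive_throughput_kb_s(fragments):
    """Throughput from fragment-level (kilobytes, io_start, io_end)
    records, with times in seconds. Only active read/write time is
    counted, unlike size divided by whole-job duration."""
    total_kb = sum(kb for kb, _, _ in fragments)
    active_s = sum(end - start for _, start, end in fragments)
    return total_kb / active_s
```

For two 1,000 KB fragments written in 10 seconds each inside a 60-second job, the naive method reports roughly 33 KB/sec while the fragment-level method reports the drive's real 100 KB/sec.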
Report 6: Backup Window
Increasingly, time (that is, when backups occur and how long they run) is becoming an important aspect of data protection.
End users and customers alike are demanding that specific numbers of backup copies of data are completed by a certain hour
(for example, start of business day) along with not commencing before a certain hour. Likewise, more complex interdependencies,
such as taking a backup of a database before and after business processing updates, are also becoming commonplace. The "backup
window” reports are thus an important part of the data protection manager’s need to demonstrate that activity is occurring
during the defined times, repeatable from day-to-day, and the resources required to deliver on these requirements have sufficient
capacity. The backup window reports provide insights into what happens by hours of day and days of week. These reports are
complemented with a graphical drawing of the backup window for compelling visual presentation and analyses. One can quickly
determine if all backup activity is occurring within the defined window. When spillovers outside of windows occur, there is also
reporting to zoom in on why this is happening. Longer running jobs, sub-optimal scheduling, data growth, and client growth are
typical contributors to the out-of-window condition. Window activity should be examined collectively in the context of number of
jobs, number of clients, and amount of data being backed up across each hour. Furthermore, similar to the drive utilization and
throughput reports, looking at window performance across broad timelines using intelligent averaging is also necessary. Missing
a window once or twice doesn’t necessarily point to broader systematic problems and thus the averaging context needs to be
examined alongside the actual daily context.
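Spillover detection itself is a simple comparison of job times against the window. A sketch with hours on a single-night scale where values past 24 run into the next morning (a real window that spans midnight needs more careful time handling):

```python
def out_of_window(jobs, window_start, window_end):
    """Names of (name, start_hour, end_hour) jobs that began before
    the window opened or finished after it closed."""
    return [name for name, start, end in jobs
            if start < window_start or end > window_end]
```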
Report 7: Deduplication
Deduplication is nothing short of a megatrend when referring to backup. Remote offices have been a great area where
deduplication can be used to send backups to a more centralized data center location. By eliminating the need to deal with disk or
tape drives in these offices, where IT staff is usually minimal to none, a large amount of risk is avoided. Veritas Backup Reporter
allows you to report on NetBackup PureDisk Remote Office Edition so that there is one email or one set of reports that shows which
remote offices are at risk due to a failed or non-occurring backup. It also can be used to show how much deduplication is occurring
at the remote office before transmitting. Over time these deduplication rates and common failures can be analyzed to show trends
that can help users make more educated decisions about the backup environment and which offices, data or servers need more
attention. In the data center, deduplication reporting helps to determine what the actual deduplication rate is at each PureDisk
storage unit. Reporting can be abstracted to view the protected size or stored size across all PureDisk environments or drilled down
to one specific area. Common file or data types can then be identified and matched with how well they deduplicate.
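The rate itself relates the logical (protected) size to the physical (stored) size. A sketch with illustrative field names, not PureDisk's actual schema:

```python
def dedup_metrics(protected_kb, stored_kb):
    """Deduplication expressed two common ways: the ratio of logical
    (protected) size to physical (stored) size, and percent of space
    saved. 1,000 KB protected in 100 KB stored is a 10:1 ratio."""
    ratio = protected_kb / stored_kb
    saved_pct = 100.0 * (1 - stored_kb / protected_kb)
    return ratio, saved_pct
```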
Report 8: Media Trending and Analysis
Backup media resides at the heart of data protection operations. Unfortunately, not all data can be stored in one place, so
managing the amount of space on multiple tape cartridges, drives, libraries, VTLs, disk devices, and other locations can get
complicated. Capacity planning has become an important role in doing backups, and Veritas Backup Reporter has a number of
reports aimed directly at this. At a high level, a report can be generated to show overall historical supply and demand. Ideally,
supply would always equal demand; why pay for more storage than you need? At the same time, you don't want so little supply
that you cannot meet backup demands. Backup Reporter can show the supply and demand for the overall environment or a
particular storage location such as a VTL or tape library. And by applying predictive forecasting, you can determine exactly when
more supply needs to be added based on the growing demand.
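Combining the supply-and-demand view with forecasting reduces, in its simplest form, to projecting demand forward until it crosses supply. A sketch assuming linear growth (Backup Reporter's actual forecasting model is not reproduced here):

```python
def periods_until_shortfall(supply, current_demand, growth_per_period):
    """Number of periods until projected demand first exceeds supply,
    assuming demand grows linearly by `growth_per_period` per period.
    Returns None if demand is not growing under this assumption."""
    if growth_per_period <= 0:
        return None  # demand never catches up with supply
    demand = current_demand
    periods = 0
    while demand <= supply:
        demand += growth_per_period
        periods += 1
    return periods
```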
Report 9: Backup Size
This report provides visibility into the amount of data being backed up in a computing environment. There are several versions
of this report—which should be in every technology manager's portfolio—including a running total of the amount of data backed
up by day for the last two weeks. This helps verify whether there are significant swings in the amount of data crossing the wire.
For environments in which there are well-defined schedules where incremental backups occur during the week followed by full
backups on weekends, a 2–3 week timeline will verify the spikes on the weekends from the full backups. By zooming in on the
amount of data being backed up on the weekends, significant shifts in overall data volume can be observed. Another type of size-
based report considers data on a monthly basis and then restricts the totals to full backups only. This report provides significant
insights on whether data growth in general is being observed across the computing environment. An additional derivative of this
report is looking at size in a business context; for example, by geography, data center, business unit, and application. The primary
audiences for this report are the CIO, operations manager, and data protection infrastructure architects. Just as we expect CEOs to
be very familiar with numbers such as annual revenue, % growth and margins, this is one of the metrics that CIOs and CTOs need
to know.
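The daily and full-only variants of this report are both aggregations over job records. A sketch using (day, schedule_level, kilobytes) tuples as a hypothetical input shape:

```python
from collections import defaultdict

def size_by_day(jobs, full_only=False):
    """Total kilobytes backed up per day, optionally restricted to
    full backups as in the monthly full-only variant of this report."""
    totals = defaultdict(int)
    for day, level, kb in jobs:
        if full_only and level != "full":
            continue
        totals[day] += kb
    return dict(totals)
```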
Report 10: Week At A Glance
Reporting out to systems administrators, DBAs, and application owners is now a common requirement across most enterprises.
Similarly, with service providers, demonstrating that they are meeting the service objectives to their customers is typically part of
the contract. Leading the list of reports for end users and internal/external customers is the week-at-a-glance report. This report
provides quick and comprehensive verification that all is well. Quick in that the visual presentation of failures will be readily
apparent, and comprehensive in that all servers are itemized. An excellent use case of business views is the need to segregate the
data relevant to the audience. So a DBA will expect to only see the servers on which their databases reside and not have to sift
through a report that includes servers that are of no interest. While providing for a weekly summary, this report is quite versatile
because you can drill down to any server on any day and get the job level detail. The grouping requirement is more critical for
service providers because they need to ensure that each customer sees only their data.
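The grid underlying a week-at-a-glance view is a server-by-day status matrix. In this sketch, `jobs` maps a hypothetical (server, day) key to a success flag, '-' marks days with no job, and the per-audience segregation is handled by passing only the relevant servers:

```python
def week_at_a_glance(jobs, servers, days):
    """Server-by-day grid of 'OK', 'FAIL', or '-' (no job ran),
    itemizing every server so failures are readily apparent."""
    return {server: [("OK" if jobs[(server, day)] else "FAIL")
                     if (server, day) in jobs else "-"
                     for day in days]
            for server in servers}
```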
Summary

As data grows, backup operations are becoming increasingly complex; however, customers are benefiting from great gains in
operational productivity by leveraging backup reporting and service-level management disciplines. It is very difficult to improve
a process that isn't measurable, so the first step is defining, monitoring, and analyzing data on an ongoing basis. Veritas Backup
Reporter provides an excellent means for doing so. Veritas Backup Reporter provides many different types of reports that address
a variety of different needs for key stakeholders within the organization. From compliance reports such as backup success rate,
backup window, and risk analysis reports, to inventory planning and operational reports such as predictive forecasting and
week-at-a-glance reports, Veritas Backup Reporter plays an important role in improving and optimizing data center backup and
recovery operations.
About Symantec

Symantec is a global leader in infrastructure software, enabling businesses and consumers to have confidence in a connected world. The company helps customers protect their infrastructure, information, and interactions by delivering software and services that address risks to security, availability, compliance, and performance. Headquartered in Cupertino, Calif., Symantec has operations in 40 countries. More information is available at www.symantec.com.