
N281842

Veritas NetBackup™

Backup Planning and Performance Tuning Guide

UNIX, Windows, and Linux

Release 6.0


Veritas NetBackup Backup Planning and Performance Tuning Guide

Copyright © 2003 - 2006 Symantec Corporation. All rights reserved.

Veritas NetBackup 6.0
PN: 281842

Symantec, the Symantec logo, and NetBackup are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.

Portions of this software are derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm. Copyright 1991-92, RSA Data Security, Inc. Created 1991. All rights reserved.

The product described in this document is distributed under licenses restricting its use, copying, distribution, and decompilation/reverse engineering. No part of this document may be reproduced in any form by any means without prior written authorization of Symantec Corporation and its licensors, if any.

THIS DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.

Symantec Corporation
20330 Stevens Creek Blvd.
Cupertino, CA 95014
www.symantec.com

Printed in the United States of America.

Page 3: Veritas NetBackup Backup Planning and Performance Tuning Guide

Third-party legal notices

Third-party software may be recommended, distributed, embedded, or bundled with this Symantec product. Such third-party software is licensed separately by its copyright holder. All third-party copyrights associated with this product are listed in the accompanying release notes.

Technical support

For technical assistance, visit http://support.veritas.com and select phone or email support. Use the Knowledge Base search feature to access resources such as TechNotes, product alerts, software downloads, hardware compatibility lists, and our customer email notification service.


Contents

Section I  Backup planning and configuration guidelines

Chapter 1  NetBackup capacity planning
    Introduction ... 13
    Analyzing your backup requirements ... 14
    Designing your backup system ... 16
        Calculate the required data transfer rate for your backups ... 17
        Calculate how long it will take to back up to tape ... 18
        Calculate how many tape drives are needed ... 20
        Calculate the required data transfer rate for your network(s) ... 21
        Calculate the size of your NetBackup catalog ... 22
        Calculate the size of the EMM server ... 23
        Calculate how much media is needed for full and incremental backups ... 25
        Calculate the size of the tape library needed to store your backups ... 26
        Design your master backup server based on your previous findings ... 27
        Estimate the number of master servers needed ... 29
        Design your media server ... 31
        Estimate the number of media servers needed ... 32
        Design your NOM server ... 33
        Summary ... 36
    Questionnaire for capacity planning ... 37

Chapter 2  Master server configuration guidelines
    Managing NetBackup job scheduling ... 40
        Delays in starting jobs ... 40
        Delays in running queued jobs ... 40
        Job delays caused by unavailable media ... 41
        Delays after removing a media server ... 41
        Limiting factors for job scheduling ... 41
        Adjusting the server’s network connection options ... 42
        Using NOM to monitor jobs ... 43
        Disaster recovery testing and job scheduling ... 43
    Miscellaneous considerations ... 44
        Processing of storage units ... 44
        Disk staging ... 44
        File system capacity ... 45
    NetBackup catalog strategies ... 45
        Catalog backup types ... 46
        Guidelines for managing the catalog ... 46
        Catalog backup not finishing in the available window ... 47
        Catalog compression ... 48
    Merging/splitting/moving servers ... 48
        Moving the EMM server ... 49
    Guidelines for policies ... 49
        Include and exclude lists ... 49
        Critical policies ... 50
        Schedule frequency ... 50
    Managing logs ... 50
        Optimizing the performance of vxlogview ... 50
        Interpreting legacy error logs ... 51

Chapter 3  Media server configuration guidelines
    Network and SCSI/FC bus bandwidth ... 54
    How to change the threshold for media errors ... 54
        Adjusting media_error_threshold ... 55
    How to reload the st driver without rebooting Solaris ... 57
    Media Manager drive selection ... 58
    Robot types and NetBackup port configuration ... 58

Chapter 4  Media configuration guidelines
    Dedicated or shared backup environment ... 60
    Pooling ... 60
    Disk versus tape ... 60

Chapter 5  Database backup guidelines
    Introduction ... 64
    Considerations for database backups ... 64

Chapter 6  Best practices
    Best practices: new tape drive technologies ... 66
    Best practices: tape drive cleaning ... 66
    Best practices: storing tape cartridges ... 68
    Best practices: recoverability ... 68
        Suggestions for data recovery planning ... 69
    Best practices: naming conventions ... 71
        Policy names ... 71
        Schedule names ... 72
        Storage unit/storage group names ... 72

Section II  Performance tuning

Chapter 7  Measuring performance
    Overview ... 76
    Controlling system variables for consistent testing conditions ... 76
        Server variables ... 76
        Network variables ... 77
        Client variables ... 78
        Data variables ... 78
    Evaluating performance ... 79
    Evaluating UNIX system components ... 84
        Monitoring CPU load ... 84
        Measuring performance independent of tape or disk output ... 84
    Evaluating Windows system components ... 85
        Monitoring CPU load ... 86
        Monitoring memory use ... 87
        Monitoring disk load ... 87

Chapter 8  Tuning the NetBackup data transfer path
    Overview ... 90
    The data transfer path ... 90
    Basic tuning suggestions for the data path ... 91
    NetBackup client performance ... 95
    NetBackup network performance ... 96
        Network interface settings ... 96
        Network load ... 97
        NetBackup media server network buffer size ... 97
        NetBackup client communications buffer size ... 99
        The NOSHM file ... 100
        Using multiple interfaces ... 101
    NetBackup server performance ... 102
        Shared memory (number and size of data buffers) ... 102
        Parent/child delay values ... 108
        Using NetBackup wait and delay counters ... 108
        Fragment size and NetBackup restores ... 119
        Other restore performance issues ... 122
    NetBackup storage device performance ... 126

Chapter 9  Tuning other NetBackup components
    Multiplexing and multi-streaming ... 130
        When to use multiplexing and multi-streaming ... 130
        Effects of multiple data streams on backup/restore ... 132
    Encryption ... 133
    Compression ... 133
        How to enable compression ... 133
    Using both encryption and compression ... 134
    NetBackup Java ... 134
    Vault ... 134
    Fast recovery with bare metal restore ... 135
    Backing up many small files ... 135
        FlashBackup ... 136

Chapter 10  Tuning disk I/O performance
    Hardware performance hierarchy ... 140
        Performance hierarchy level 1 ... 142
        Performance hierarchy level 2 ... 142
        Performance hierarchy level 3 ... 143
        Performance hierarchy level 4 ... 144
        Performance hierarchy level 5 ... 145
        General notes on performance hierarchies ... 145
    Hardware configuration examples ... 147
    Tuning software for better performance ... 148

Chapter 11  OS-related tuning factors
    Kernel tuning (UNIX) ... 152
        Kernel parameters on Solaris 8 and 9 ... 152
        Kernel parameters in Solaris 10 ... 154
        Message queue and shared memory parameters on HP-UX ... 155
        Kernel parameters on Linux ... 157
    Adjusting data buffer size (Windows) ... 157
    Other Windows issues ... 159

Appendix A  Additional resources
    Performance tuning information at Vision Online ... 161
    Performance monitoring utilities ... 161
    Freeware tools for bottleneck detection ... 161
    Mailing list resources ... 162

Index ... 163


Section I

Backup planning and configuration guidelines

Section I helps you lay the foundation of good backup performance through planning and configuring your NetBackup installation. Section I also includes some best practices.

Section I includes these chapters:

■ NetBackup Capacity Planning

■ Master Server Configuration Guidelines

■ Media Server Configuration Guidelines

■ Media Configuration Guidelines

■ Database Backup Guidelines

■ Best Practices

Note: For a discussion of tuning factors and general recommendations that may be applied to an existing installation, see Section II.



Chapter 1  NetBackup capacity planning

This chapter explains how to design your backup system as a foundation for good performance.

This chapter includes the following sections:

■ “Introduction” on page 13

■ “Analyzing your backup requirements” on page 14

■ “Designing your backup system” on page 16

■ “Questionnaire for capacity planning” on page 37


Veritas NetBackup is a high-performance data protection application. Its architecture is designed for large and complex distributed computing environments. NetBackup provides a scalable storage management server that can be configured for network backup, recovery, archival, and file migration services.

This manual is for administrators who want to analyze, evaluate, and tune NetBackup performance. It is intended to answer questions such as the following: How big should the backup server be? How can the NetBackup server be tuned for maximum performance? How many CPUs and tape drives are needed? How can backups be configured to run as fast as possible? How can recovery times be improved? What tools can characterize or measure how NetBackup is handling data?

Note: The most critical performance factors lie in hardware rather than software. Hardware selection and configuration carry roughly four times the weight that software carries in determining performance. Although this guide provides some hardware configuration assistance, it assumes for the most part that your devices are correctly configured.

Disclaimer

It is assumed you are familiar with NetBackup and your applications, operating systems, and hardware. The information in this manual is advisory only, presented in the form of guidelines. Changes to an installation undertaken as a result of the information contained herein should be verified in advance for appropriateness and accuracy. Some of the information contained herein may apply only to certain hardware or operating system architectures.

Note: The information in this manual is subject to change.


Introduction

The first step toward accurately estimating your backup requirements is a complete understanding of your environment. Many performance issues can be traced to hardware or environmental issues. A basic understanding of the entire backup data path is important in determining the maximum performance you can expect from your installation.

Every backup environment has a bottleneck. It may be a fast bottleneck, but it will determine the maximum performance obtainable with your system.

Example: Consider the configuration illustrated below. In this environment, backups run slowly; that is, they do not complete in the scheduled backup window. Total throughput is 8 to 10 megabytes per second.

What makes the backups run slowly? How can NetBackup or the environment be configured to increase backup performance in this situation?

Figure 1-1 Dedicated NetBackup server

The explanation is that the LAN, with a speed of 100 megabits per second, has a theoretical throughput of 12.5 megabytes per second. In practice, 100BaseT throughput is unlikely to exceed 70% utilization, so the best delivered data rate is about 8 megabytes per second to the NetBackup server. The throughput can be even lower once you account for TCP/IP packet headers, TCP window-size constraints, router hops (latency on ACK packets delays the sending of the next data packet), host CPU utilization, file system overhead, and other users’ activity on the LAN. Since the LAN is the slowest element in the backup path, it is the first place to look in order to increase backup performance in this configuration.
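As a quick sanity check of that arithmetic, the following Python sketch (illustrative only; the 70% utilization figure is the rule of thumb cited above, not a measured value) computes the best-case delivered rate for a 100BaseT LAN:

# Best-case delivered backup rate over a 100BaseT LAN.
lan_speed_megabits = 100                          # 100BaseT link speed
theoretical_mb_per_sec = lan_speed_megabits / 8   # 12.5 megabytes/second
utilization = 0.70                                # practical ceiling for 100BaseT
delivered = theoretical_mb_per_sec * utilization
print(f"Best delivered rate: {delivered:.1f} MB/second")  # ~8.8, consistent with the 8 MB/second above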


Analyzing your backup requirements

Many elements influence your backup strategy. You must analyze and compare these factors and then make backup decisions according to your site’s priorities. When you plan your installation’s NetBackup capacity, ask yourself the following questions:

■ Which systems need to be backed up?

It is important that you identify all systems that need to be backed up and then list each system separately so that you can identify any that require more resources to back up. Document which machines have local tape drives or libraries attached and be sure to write down the model type of each tape drive or library. In addition, record each host name, operating system and version, database type and version, network technology (for example, ATM or 100BaseT), and location.

■ How much data will be backed up?

Calculate how much data you need to back up. Include the total disk space on each individual system, including that for databases. Remember to add the space on mirrored disks only once.

By calculating the total size for all disks, you can design a system that takes future growth into account. You should also consider the future by estimating how much data you will need to back up in six months to a few years from now.

■ Do you plan to back up databases or raw partitions?

If you plan to back up databases, you need to identify the database engines, their version numbers, and the method that you will use to back them up. NetBackup can back up several database engines and raw file systems, and databases can be backed up while they are online or offline. To back up any database while it is online, you need a NetBackup database agent for your particular database engine.

If you use NetBackup Advanced Client to back up databases using raw partitions, you are actually backing up as much data as the total size of your raw partition. Also, remember to add the size of your database backups to your final calculations when figuring out how much data you need to back up.

■ Will you be backing up specialty servers like MS-Exchange, Lotus Notes, etc.?

If you are planning on backing up any specialty servers, you will need to identify their types and application release numbers. As previously mentioned, you may need a special NetBackup agent to properly back up your particular servers.

■ What types of backups are needed and how often should they take place?


The frequency of your backups has a direct impact on your:

■ Tape requirements

■ Data transfer rate considerations

■ Restore opportunities.

To properly size your backup system, you must decide on the type and frequency of your backups. Will you perform daily incremental and weekly full backups? Monthly or bi-weekly full backups?

■ How much time is available to run each backup?

It is important to know the window of time that is available for each backup. The length of a window dictates several aspects of your backup strategy. For example, you may want a larger window of time to back up multiple, high-capacity servers. Or you may consider the use of advanced NetBackup features such as synthetic backups, a local snapshot method, or FlashBackup.

■ How long should backups be retained?

An important factor when designing your backup strategy is your policy for backup expiration. The amount of time a backup is kept is also known as the “retention period.” A fairly common policy is to expire your incremental backups after one month and your full backups after six months. With this policy, you can restore any daily file change from the previous month and restore data from full backups for the previous six months. The length of the retention period depends on your own unique requirements and business needs, and perhaps regulatory requirements. However, keep in mind that the length of your retention period has a directly proportional effect on the number of tapes you will need and the size of your NetBackup catalog database. Your NetBackup catalog database keeps track of all the information on all your tapes. The catalog size is tightly tied to your retention period and the frequency of your backups. Also, database management daemons and services may become bottlenecks.

■ If backups are sent off site, how long must they remain off site?

If you plan to send tapes to an off site location as a disaster recovery option, you must identify which tapes to send off site and how long they remain off site. You might decide to duplicate all your full backups, or only a select few. You might also decide to duplicate certain systems and exclude others. As tapes are sent off site, you will need to buy new tapes to replace them until they are recycled back from off site storage. If you forget this simple detail, you will run out of tapes when you most need them.

■ What is your network technology?

If you are planning on backing up any system over a network, note the network types that you will be using. The next section, “Designing your backup system,” explains how to calculate the amount of data you can transfer over those networks in a given time.

Depending on the amount of data that you want to back up and the frequency of those backups, you might want to consider installing a private network just for backups.

■ What new systems will be added to your site in the next six months?

It is important to plan for future growth when designing your backup system. By analyzing the potential growth of your current or future systems, you can ensure that your backup solution accommodates the environment you will have in the future. Remember to add any resulting growth factor to your total backup solution.

■ Will user-directed backups or restores be allowed?

Allowing users to do their own backups and restores can reduce the time it takes to initiate certain operations. However, user-directed operations can also result in higher support costs and the loss of some flexibility. User-directed operations can monopolize media and tape drives when you most need them. They can also generate more support calls and training issues while the users become familiar with the new backup system. You will need to decide whether allowing user access to some of your backup systems’ functions is worth the potential costs.

Other factors to consider when planning your backup capacity include:

■ Data type: What are the types of data: text, graphics, database? How compressible is the data? How many files are involved? Will the data be encrypted? (Note that encrypted backups may run slower. See “Encryption” on page 133 for more information.)

■ Data location: Is the data local or remote? What are the characteristics of the storage subsystem? What is the exact data path? How busy is the storage subsystem?

■ Change management: Because hardware and software infrastructure will change over time, is it worth the cost to create an independent test-backup environment to ensure your production environment will work with the changed components?

Designing your backup system

Following an analysis of your backup requirements, you can begin designing your backup system. Use the following subsections in the order shown below.


Note: The ideas and examples that follow are based on standard and ideal calculations. Your numbers will differ based on your particular environment, data, and compression rates.

■ “Calculate the required data transfer rate for your backups” on page 17

■ “Calculate how long it will take to back up to tape” on page 18

■ “Calculate how many tape drives are needed” on page 20

■ “Calculate the required data transfer rate for your network(s)” on page 21

■ “Calculate the size of your NetBackup catalog” on page 22

■ “Calculate the size of the EMM server” on page 23

■ “Calculate how much media is needed for full and incremental backups” on page 25

■ “Calculate the size of the tape library needed to store your backups” on page 26

■ “Design your master backup server based on your previous findings” on page 27

■ “Estimate the number of master servers needed” on page 29

■ “Design your media server” on page 31

■ “Estimate the number of media servers needed” on page 32

■ “Design your NOM server” on page 33

■ “Summary” on page 36

Calculate the required data transfer rate for your backups

This is the rate of transfer your system must achieve to complete a backup of all your data in the allowed time window. Use the following formula to calculate your ideal data transfer rate for full and incremental backups:

Ideal data transfer rate = (Amount of data to back up) / (Backup window)

On average, the daily change in data for many systems is between 10 and 20 percent. Multiplying the amount of data to back up by this change rate (for example, 20%) and dividing by the backup window gives the required data rate for incremental backups.

If you are running cumulative-incremental backups, you need to take into account which data is changing, since that affects the size of your backups. For example, if the same 20% of the data is changing daily, your cumulative-incremental backup will be much smaller than if a completely different 20% changes every day.

Example: Calculating your ideal data transfer rate during the week

Assumptions:

Amount of data to back up during a full backup = 500 gigabytes
Amount of data to back up during an incremental backup = 20% of a full backup
Daily backup window = 8 hours

Solution 1:

Full backup = 500 gigabytes
Ideal data transfer rate = 500 gigabytes / 8 hours = 62.5 gigabytes/hour

Solution 2:

Incremental backup = 100 gigabytes
Ideal data transfer rate = 100 gigabytes / 8 hours = 12.5 gigabytes/hour

To calculate your ideal data transfer rate during the weekends, divide the amount of data that needs to be backed up by the length of the weekend backup window.
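This calculation is simple enough to script. Here is a minimal Python sketch using the example’s assumed values (500 gigabytes, a 20% daily change rate, and an 8-hour window are illustrations, not recommendations):

def ideal_transfer_rate(data_gb, window_hours):
    # Ideal data transfer rate = (amount of data to back up) / (backup window)
    return data_gb / window_hours

full_gb = 500                    # full backup size, gigabytes
incr_gb = full_gb * 0.20         # incremental = 20% of a full backup
window_hours = 8                 # daily backup window

print(ideal_transfer_rate(full_gb, window_hours))   # 62.5 gigabytes/hour
print(ideal_transfer_rate(incr_gb, window_hours))   # 12.5 gigabytes/hour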

Calculate how long it will take to back up to tape

Once you know what your ideal data transfer rates are for backups, you can figure out what kind of tape drive technology will meet your needs. Because you also know the length of your available backup windows and the amount of data that needs to be backed up, you can also calculate how many tape drives you will need.

The table below lists the transfer rates for several tape drive technologies. The values listed are those published by their individual manufacturers and those observed in real-life situations. Keep in mind that device manufacturers list optimum rates for their devices. In reality, it is quite rare to achieve those values when a system has to deal with the overhead of the operating system, CPU loads, bus architecture, data types, and other hardware and software issues.

The typical gigabytes/hour values in Table 1-1, “Tape drive data transfer rates,” represent a range of real-life transfer rates for several devices, with and without compression. When you design your backup system, consider the nature of both your data and your environment. It is generally wise to estimate on the conservative side when planning capacity. For instance, use the low end of the typical gigabytes/hour range for your planning unless you have specific reasons to use the higher numbers.

To calculate the length of your backups using a particular tape drive, use the formula:


Backup length = (Amount of data to back up) / ((Number of drives) * (Tape drive transfer rate))

Example: Calculating how long a backup will take

Assumptions:

Amount of data to back up during a full backup = 500 gigabytes
Daily backup window = 8 hours
Ideal transfer rate (data / backup window) = 500 gigabytes / 8 hours = 62.5 gigabytes/hour

Solution 1:

Tape drive = 1 drive, LTO gen 1
Tape drive transfer rate = 37 gigabytes/hour
Backup length = 500 gigabytes / ((1 drive) * (37 gigabytes/hour)) = 13.51 hours

With a data transfer rate of 37 gigabytes/hour, a single LTO gen 1 tape drive will take 13.51 hours to perform a 500 gigabyte backup. A single LTO gen 1 tape drive cannot perform the backup in eight hours; you will need a faster tape drive or a second LTO gen 1 tape drive.

Solution 2:

Tape drive = 1 drive, LTO gen 2
Tape drive transfer rate = 75 gigabytes/hour
Backup length = 500 gigabytes / ((1 drive) * (75 gigabytes/hour)) = 6.67 hours

With a data transfer rate of 75 gigabytes/hour, a single LTO gen 2 tape drive will take 6.67 hours to perform a 500 gigabyte backup.
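A Python sketch of the same computation, using the low-end typical rates quoted in this section (37 and 75 gigabytes/hour are the example figures, not guarantees):

def backup_hours(data_gb, drives, drive_gb_per_hour):
    # Backup length = data / (number of drives * per-drive transfer rate)
    return data_gb / (drives * drive_gb_per_hour)

print(round(backup_hours(500, 1, 37), 2))   # LTO gen 1: 13.51 hours
print(round(backup_hours(500, 1, 75), 2))   # LTO gen 2: 6.67 hours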

Table 1-1 Tape drive data transfer rates

Drive       Theoretical gigabytes/hour   Theoretical gigabytes/hour   Typical gigabytes/hour
            (no compression)             (2:1 compression)
LTO gen 1   54                           108                          37-65
LTO gen 2   108                          216                          75-130
LTO gen 3   288                          576                          200-345
SDLT 320    57                           115                          40-70
SDLT 600    129                          259                          90-155
STK 9940B   108                          252 (2.33:1)                 75-100


Depending on the many factors that can influence your tape drives’ transfer rates, you may obtain higher or lower rates. The solutions in the examples above are approximations of what you can expect.

Note also that a backup of encrypted data may take more time. See “Encryption” on page 133 for more information.

Calculate how many tape drives are needed

To calculate how many tape drives you will need to perform your backups, use the formula below and the typical gigabytes/hour transfer rates from Table 1-1, “Tape drive data transfer rates,” on page 19.

Number of drives = (Amount of data to back up) /((Backup window) * (Tape drive transfer rate))

Example: Calculating the number of tape drives needed to perform a backup

Assumptions:

Amount of data to back up = 500 gigabytes
Backup window = 8 hours

Solution 1:

Tape drive type = SDLT 320
Tape drive transfer rate = 40 gigabytes/hour
Number of drives = 500 gigabytes / ((8 hours) * (40 gigabytes/hour)) = 1.56 = 2 drives

Solution 2:

Tape drive type = SDLT 600
Tape drive transfer rate = 90 gigabytes/hour
Number of drives = 500 gigabytes / ((8 hours) * (90 gigabytes/hour)) = 0.69 = 1 drive
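Because a fractional drive must be rounded up to a whole drive, the calculation is a natural fit for math.ceil. A minimal Python sketch (the drive rates are the example’s assumed values):

import math

def drives_needed(data_gb, window_hours, drive_gb_per_hour):
    # Number of drives = data / (window * per-drive rate), rounded up.
    return math.ceil(data_gb / (window_hours * drive_gb_per_hour))

print(drives_needed(500, 8, 40))   # SDLT 320: 1.56 -> 2 drives
print(drives_needed(500, 8, 90))   # SDLT 600: 0.69 -> 1 drive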

Although it is quite straightforward to calculate the number of drives needed to perform a backup, it is difficult to spread the data streams evenly across all drives. To effectively spread your data, you have to experiment with various backup schedules, NetBackup policies, and your hardware configuration. See “Basic tuning suggestions for the data path” on page 91 to determine your options.

Another important aspect of calculating how many tape drives you need is determining how many tape drives you can attach to a single drive controller.

When calculating the maximum number of tape drives that you can attach to a controller, you must know the drive and controller maximum transfer rates as published by their manufacturers. Failure to use maximum transfer rates for your calculations can result in saturated controllers and unpredictable results.


The table below displays the transfer rates for several drive controllers. In practice, your transfer rates might be slower because of the inherent overhead of several variables including your file system layout, system CPU load, and memory usage.

Table 1-2 Drive controller data transfer rates

Drive controller          Theoretical megabytes/second   Theoretical gigabytes/hour
ATA-5 (ATA/ATAPI-5)       66                             237.6
Wide Ultra 2 SCSI         80                             288
iSCSI                     100                            360
1 Gigabit Fibre Channel   100                            360
SATA/150                  150                            540
Ultra-3 SCSI              160                            576
2 Gigabit Fibre Channel   200                            720
SATA/300                  300                            1080
Ultra320 SCSI             320                            1152
4 Gigabit Fibre Channel   400                            1440

Calculate the required data transfer rate for your network(s)

When designing your backup system to perform backups over a network, you need to move data from your clients to your backup servers at a fast enough rate to finish your backups within your allotted backup window. Table 1-3 below lists the typical transfer rates of some fairly common network technologies. To calculate the required data transfer rate, use the formula below:

Required network data transfer rate = (Amount of data to back up) / (Backup window)

Table 1-3 Network data transfer rates

Network technology     Theoretical gigabytes/hour   Typical gigabytes/hour
10BaseT (switched)     3.6                          2.7
100BaseT (switched)    36                           32
1000BaseT (switched)   360                          320


Note: For additional information on the importance of matching network bandwidth to your tape drives, see “Network and SCSI/FC bus bandwidth” on page 54.

Example: Calculating network transfer rates

Assumptions:

Amount of data to back up = 500 gigabytes
Backup window = 8 hours
Required network transfer rate = 500 gigabytes / 8 hours = 62.5 gigabytes/hour

Solution 1:

Network technology = 10BaseT (switched)
Typical transfer rate = 2.7 gigabytes/hour

Using the values from Table 1-3, “Network data transfer rates,” a single 10BaseT network has a typical transfer rate of 2.7 gigabytes/hour. This network cannot handle your required data transfer rate of 62.5 gigabytes/hour. In this case, you would have to explore some other options, such as:

■ Backing up your data over a faster network (1000BaseT)

■ Backing up large servers to dedicated tape drives (media servers)

■ Performing your backups during a longer time window

■ Performing your backups over faster dedicated private networks.

Solution 2:

Network technology = 1000BaseT (switched)
Typical transfer rate = 320 gigabytes/hour

Using the values from Table 1-3, a single 1000BaseT network has a typical transfer rate of 320 gigabytes/hour. This network technology can handle your backups with room to spare.
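A quick way to screen candidate networks against the required rate (the typical rates below are the Table 1-3 figures; treat them as planning estimates, not guarantees):

required_gb_per_hour = 500 / 8      # 62.5 gigabytes/hour, from the example

typical_rates = {                   # typical gigabytes/hour, per Table 1-3
    "10BaseT (switched)": 2.7,
    "100BaseT (switched)": 32,
    "1000BaseT (switched)": 320,
}

for network, rate in typical_rates.items():
    verdict = "sufficient" if rate >= required_gb_per_hour else "too slow"
    print(f"{network}: {rate} GB/hour -> {verdict}")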

Calculating the data transfer rates for your networks can help you identify your potential bottlenecks by looking at the transfer rates of your slowest networks. “Basic tuning suggestions for the data path” on page 91 provides several solutions for dealing with multiple networks and bottlenecks.

Calculate the size of your NetBackup catalog

An important factor when designing your backup system is to calculate how much disk space you need to store your NetBackup catalog. Your catalog keeps track of all the files that have been backed up. The catalog’s size is directly tied to several variables, including the frequency of your backups, the number of files being backed up, the path length of each file being backed up, and your retention periods. On average, the size of your catalog can be between 1% and 2% (or higher) of the total data being tracked.


To calculate your NetBackup catalog size, you need to know how much data you will be backing up for full and incremental backups, how often these backups will be performed, and for how long they will be retained. Here are two simple formulas to calculate these values:

Data being tracked = (Amount of data to back up) * (Number of backups) * (Retention period)

NetBackup catalog size = 120 bytes * (number of files tracked)

Note: If you select NetBackup’s True Image Restore option, your catalog will be twice as large as a catalog without this option selected. True Image Restore collects the information required to restore directories to their contents at the time of any selected full or incremental backup. Because the additional information that NetBackup collects for incremental backups is the same as that of a full backup, incremental backups take much more disk space when you collect True Image Restore information.

Example: Calculating the size of your NetBackup catalog

Assumptions:

Amount of data to back up = 100 gigabytes
Incremental backups = 20% of all data
Full backups per month = 4
Retention period for full backups = 6 months
Incremental backups per month = 30
Retention period for incremental backups = 1 month

Solution:

Size of full backups = 100 gigabytes * 4 * 6 months = 2.4 terabytes
Size of incremental backups = (20% of 100 gigabytes) * 30 * 1 month = 600 gigabytes
Total data tracked = 2.4 terabytes + 600 gigabytes = 3 terabytes
NetBackup catalog size = 2% of 3 terabytes = 60 gigabytes

Based on the previous assumptions, it will take 60 gigabytes of disk space to hold the catalog. Compression can reduce the size of your catalog to one-sixth or less of its uncompressed size. When the catalog is decompressed, only the images and time period for the particular system you need to restore are decompressed.
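The example’s arithmetic, captured as a small Python helper (the 2% catalog-to-data ratio is this section’s rule of thumb; adjust it for your own file counts and path lengths):

def catalog_size_gb(full_gb, fulls_per_month, full_retention_months,
                    incr_fraction, incrs_per_month, incr_retention_months,
                    catalog_fraction=0.02):
    # Data tracked = retained full backups + retained incremental backups.
    fulls_gb = full_gb * fulls_per_month * full_retention_months
    incrs_gb = full_gb * incr_fraction * incrs_per_month * incr_retention_months
    return (fulls_gb + incrs_gb) * catalog_fraction

# Values from the example above.
print(catalog_size_gb(100, 4, 6, 0.20, 30, 1))   # 60.0 gigabytes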

Calculate the size of the EMM server

By default, the EMM server resides on the NetBackup master server. The amount of space needed for the EMM server is determined by the size of the NetBackup database (NBDB), as explained below.


Note: This space must be included when determining size requirements for a master or media server, depending on where the EMM server is installed.

Space for the NBDB on the EMM server is required in the following two locations:

UNIX:
/usr/openv/db/data
/usr/openv/db/staging

Windows:
install_path\NetBackupDB\data
install_path\NetBackupDB\staging

Calculate the required space for the NBDB in each of the two directories, as follows:

60 MB + (2 KB * number of volumes configured for EMM)

where EMM is the Enterprise Media Manager, and volumes are NetBackup (EMM) media volumes. Note that 60 MB is the default amount of space needed for the NBDB database used by the EMM server. It includes pre-allocated space for configuration information for devices and storage units.

Note: During NetBackup installation, the install script looks for 60 MB of free space in the above /data directory; if there is insufficient space, the installation fails. The space in /staging is only required when a hot catalog backup is run.

Example: Calculating the space needed for the EMM server

Assuming there are 1000 EMM volumes to back up, the total space needed for the EMM server in /usr/openv/db/data is:

60 MB + (2 KB * 1000 volumes) = 62 MB

The same amount of space is required in /usr/openv/db/staging. The amount of space required may grow over time as the NBDB database increases in size.
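The same formula in Python (a sketch; the 60 MB pre-allocation and 2 KB per volume are the defaults stated above):

def nbdb_space_mb(emm_volumes):
    # 60 MB pre-allocated, plus 2 KB per configured EMM volume.
    return 60 + (2 * emm_volumes) / 1024

# Required in both the data and staging directories.
print(round(nbdb_space_mb(1000)))   # ~62 MB for 1000 volumes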

Note: The above 60 MB of space is pre-allocated, and is derived from the following separate databases that are consolidated into the EMM database in NetBackup 6.0: globDB, ltidevs, robotic_def, namespace.chksum, ruleDB, poolDB, volDB, mediaDB, storage_units, stunit_groups, SSOhosts, and media errors database. See the NetBackup Release Notes, in the section titled “Enterprise Media Manager Databases,” for additional details on files and database information included in the EMM database.


Calculate how much media is needed for full and incremental backups

As part of planning your backup strategy, calculate how many tapes will be needed to store and retrieve your backups. The number of tapes that you will need depends on:

■ The amount of data that you are backing up

■ The frequency of your backups

■ The planned retention periods

■ The capacity of the media used to store your backups.

If you expect your site's workload to increase over time, you can ease the pain of future upgrades by planning for expansion. Design your initial backup architecture so it can evolve to support more clients and servers. Invest in the faster, higher-capacity components that will serve your needs beyond the present.

A simple formula for calculating your tape needs is shown here:

Number of tapes = (Amount of data to back up) / (Tape capacity)

To calculate how many tapes will be needed based on all your requirements, the above formula can be expanded to

Number of tapes = ((Amount of data to back up) * (Frequency of backups) * (Retention period)) / (Tape capacity)

Table 1-4 Tape capacities

Drive       Theoretical gigabytes        Theoretical gigabytes
            (no compression)             (2:1 compression)
LTO gen 1   100                          200
LTO gen 2   200                          400
LTO gen 3   400                          800
SDLT 320    160                          320
SDLT 600    300                          600
STK 9940B   200                          400

Example: Calculating how many tapes are needed to store all your backups

Preliminary calculations:

Size of full backups = 500 gigabytes * 4 (per month) * 6 months = 12 terabytes
Size of incremental backups = (20% of 500 gigabytes) * 30 * 1 month = 3 terabytes
Total data tracked = 12 terabytes + 3 terabytes = 15 terabytes

Solution 1:

Tape drive type = LTO gen 1
Tape capacity without compression = 100 gigabytes
Tape capacity with 2:1 compression = 200 gigabytes

Without compression:
Tapes needed for full backups = 12 terabytes / 100 gigabytes = 120
Tapes needed for incremental backups = 3 terabytes / 100 gigabytes = 30
Total tapes needed = 120 + 30 = 150 tapes

With 2:1 compression:
Tapes needed for full backups = 12 terabytes / 200 gigabytes = 60
Tapes needed for incremental backups = 3 terabytes / 200 gigabytes = 15
Total tapes needed = 60 + 15 = 75 tapes

Solution 2:

Tape drive type = LTO gen 3
Tape capacity without compression = 400 gigabytes
Tape capacity with 2:1 compression = 800 gigabytes

Without compression:
Tapes needed for full backups = 12 terabytes / 400 gigabytes = 30
Tapes needed for incremental backups = 3 terabytes / 400 gigabytes = 7.5 ~= 8
Total tapes needed = 30 + 8 = 38 tapes

With 2:1 compression:
Tapes needed for full backups = 12 terabytes / 800 gigabytes = 15
Tapes needed for incremental backups = 3 terabytes / 800 gigabytes = 3.75 ~= 4
Total tapes needed = 15 + 4 = 19 tapes
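A Python sketch of the tape count, rounding up per backup type as the solutions above do (tape capacities are Table 1-4 values; 1 terabyte is treated as 1000 gigabytes, matching the arithmetic above):

import math

def tapes_needed(full_tb, incr_tb, tape_capacity_gb):
    # Round up separately for full and incremental backups.
    fulls = math.ceil(full_tb * 1000 / tape_capacity_gb)
    incrs = math.ceil(incr_tb * 1000 / tape_capacity_gb)
    return fulls + incrs

print(tapes_needed(12, 3, 100))   # LTO gen 1, no compression: 150 tapes
print(tapes_needed(12, 3, 800))   # LTO gen 3, 2:1 compression: 19 tapes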

Calculate the size of the tape library needed to store your backups

To calculate how many robotic library tape slots are needed to store all your backups, take the number of backup tapes calculated in “Calculate how much media is needed for full and incremental backups” on page 25 and add tapes for catalog backup and for cleaning:

Tape slots needed = (Number of tapes needed for backups) + (Number of tapes needed for catalog backups) + 1 (for a cleaning tape)

A typical number of tapes needed for catalog backups is two.

Additional tapes may be needed for the following:


■ If you plan to duplicate tapes or to reserve some media for special (non-backup) use, add those tapes to the above formula.

■ Add tapes needed for future data growth. Make sure your system has a viable upgrade path as new tape drives become available.

Design your master backup server based on your previous findings

To design and configure a master backup server, you must:

■ Perform an initial backup requirements analysis, as outlined in the section “Analyzing your backup requirements” on page 14.

■ Perform the calculations outlined in the previous steps of the current section.

Designing a backup server becomes a simple task once the basic design constraints are known:

■ Amount of data to back up

■ Size of the NetBackup catalog

■ Number of tape drives needed

■ Number of networks needed

Given the above, a simple approach to designing your backup server can be outlined as follows:

■ Acquire a dedicated server

■ Add tape drives and controllers (for saving your backups)

■ Add disk drives and controllers (for OS and NetBackup catalog)

■ Add network cards

■ Add memory

■ Add CPUs


Figure 1-2 Backup server hardware component

In some cases, it may not be practical to design a generic server to back up all of your systems. You might have one or several large servers that cannot be backed up over a network within your backup window. In such cases, it is best to back up those servers using their own locally-attached tape drives. Although this section discusses how to design a master backup server, you can still use its information to properly add the necessary tape drives and components to your other servers.

The next example shows how to configure a master server using the design elements gathered from the previous sections.

Example: Designing your master backup server

Assumptions:

Amount of data to back up during full backups = 500 gigabytes
Amount of data to back up during incremental backups = 100 gigabytes
Tape drive type = SDLT 600
Tape drives needed = 1
Network technology = 100BaseT
Network cards needed = 1
Size of NetBackup catalog after 6 months = 60 gigabytes (from “Example: Calculating the size of your NetBackup catalog” on page 23)

Solution (the following values are based on Table 1-6, “CPUs needed per master/media server component,” and Table 1-7, “Memory needed per master/media server component”):

CPUs needed for network cards = 1
CPUs needed for tape drives = 1


CPUs needed for OS = 1
Total CPUs needed = 1 + 1 + 1 = 3

Memory needed for network cards = 16 megabytes
Memory needed for tape drives = 128 megabytes
Memory needed for OS and NetBackup = 1 gigabyte
Total memory needed = 16 megabytes + 128 megabytes + 1 gigabyte = 1.144 gigabytes

Based on the above, your master server needs 3 CPUs and 1.144 gigabytes of memory. In addition, you need 60 gigabytes of disk space to store your NetBackup catalog, along with the disks and drive controllers needed to install your operating system and NetBackup (2 gigabytes should be ample for most installations). This server also requires one SCSI card, or another, faster adapter, for use with the tape drive (and robot arm), and a single 100BaseT card for network backups.
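The component totals, expressed as a short sketch (the per-component figures come from Tables 1-6 and 1-7; this simply re-runs the example’s sums):

# Master server sizing from the example's component counts.
cpus = 1 + 1 + 1                # network cards + tape drives + OS and NetBackup
memory_mb = 16 + 128 + 1000     # 100BaseT card + SDLT 600 drive + OS and NetBackup

print(cpus)                     # 3 CPUs
print(memory_mb / 1000)         # ~1.144 gigabytes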

When designing your backup server solution, begin with a dedicated server for optimum performance. In addition, consult with your server’s hardware manufacturer to ensure that the server can handle your other components. In most cases, servers have specific restrictions on the number and mixture of hardware components that can be supported concurrently. Overlooking this last detail can cripple even the best of plans.

Estimate the number of master servers needed

One of the key elements in designing your backup solution is estimating how many master servers are needed. As a rule, the number of master servers is proportional to the number of media servers. To determine how many master servers are required, consider the following:

■ The master server must be able to periodically communicate with all its media servers. If there are too many media servers, master server processing may be overloaded.

■ Consider business-related requirements. For example, if an installation has different applications which require different backup windows, a single master may have to run backups continually, leaving no spare time for catalog cleaning, catalog backup, or maintenance.

■ If at all possible, design your configuration with one master server per firewall domain. In addition, do not share robotic tape libraries between firewall domains.

■ As a rule, the number of clients (separate physical hosts) per master server is not a critical factor for NetBackup. Ordinary backup processing performed by each client has little or no impact on the NetBackup server, unless, for instance, the clients all have database extensions or are trying to run ALL_LOCAL_DRIVES at the same time.


■ Plan your configuration so that it contains no single point of failure. Provide sufficient redundancy to ensure high availability of the backup process. Having more tape drives or media may reduce the number of media servers needed per master server.

■ Consider limiting the number of media servers handled by a master server to the lower end of the estimates in Table 1-5, “Number of media servers supported by a master server,” below.

Although a well-managed NetBackup environment can handle more media servers than the numbers listed in this table, you may find your backup operations more efficient and manageable with fewer but larger media servers. The variation in the number of media servers per master server for each scenario in the table depends on the number of jobs submitted, multiplexing, multi-streaming, and network capacity.

For information on designing a master server, refer to “Design your master backup server based on your previous findings” on page 27.

Note: This table provides a rough estimate only, as a guideline for initial planning. Note also that the RAM amounts shown below are for a base NetBackup installation; RAM requirements vary depending on the NetBackup features, options, and agents being used.

Table 1-5 Number of media servers supported by a master server

Master server type | RAM | Processors | Master backups | Media server backups | Media configuration | Media servers per master server
Solaris | 2 gigabytes | 4 | Not backing up clients | Media server backing up itself only | 10-20 tape drives in not more than 2 libraries | 25-40
Solaris | 4 gigabytes | 4 | Not backing up clients | Media server backing up itself only | 10-20 tape drives in not more than 2 libraries | 35-50
Solaris | 8+ gigabytes | 4 | Not backing up clients | Media server backing up network clients | 20-40 tape drives in not more than 2 libraries | 50-70
Windows | 2 gigabytes | 4 | Not backing up clients | Media server backing up itself only | 15-30 tape drives in not more than 2 libraries | 10+
Windows | 4 gigabytes | 4 | Not backing up clients | Media server backing up itself only | 20-40 tape drives in not more than 2 libraries | 20+
Windows | 8+ gigabytes | 4 | Not backing up clients | Media server backing up network clients | 40-128 tape drives in not more than 2 libraries | 50+

Design your media server

You can use a media server not only to back up itself, but also to back up other systems and reduce or balance the load on your master server. With NetBackup, the robotic control of a library can be on either the master server or the media server.

Table 1-6 CPUs needed per master/media server component

Component         How many and what kind of component           CPUs per component
Network cards     2-3 100BaseT cards                            1
                  5-7 10BaseT cards                             1
                  1 ATM card                                    1
                  1-2 Gigabit Ethernet cards with coprocessor   1
Tape drives       2 LTO gen 3 drives                            1
                  2-3 SDLT 600 drives                           1
                  2-3 LTO gen 2 drives                          1
                  3-4 LTO gen 1 drives                          1
OS and NetBackup                                                1

Table 1-7 Memory needed per master/media server component

Component                Type of component   Memory per component
Network cards                                16 megabytes
Tape drives              LTO gen 3 drive     256 megabytes
                         SDLT 600 drive      128 megabytes
                         LTO gen 2 drive     128 megabytes
                         LTO gen 1 drive     64 megabytes
OS and NetBackup                             1 gigabyte
OS, NetBackup, and NOM                       1 or more gigabytes
NetBackup multiplexing                       8 megabytes * (# streams) * (# drives)

The information in the above tables is a rough estimate only, intended as a guideline for initial planning.

In addition to the above media server components, you must also add the necessary disk drives to store the NetBackup catalog and your operating system. The size of the disks needed to store your catalog depends on the calculations explained earlier under “Calculate the size of your NetBackup catalog” on page 22.

Estimate the number of media servers needed

Here are some guidelines for estimating the number of media servers needed:

■ I/O performance is generally more important than CPU performance.

■ Consider CPU, I/O, and memory expandability when choosing a server.

■ Consider how many CPUs are needed (see “CPUs needed per master/media server component” on page 31). Here are some general guidelines:

Experiments (with Sun Microsystems) have shown that a useful, conservative estimate is 5MHz of CPU capacity per 1MB/second of data movement in and out of the NetBackup media server. Keep in mind that the operating system and other applications also use the CPU. This estimate is for the power available to NetBackup itself.


Example:

A system backing up clients over the network to a local tape drive at the rate of 10MB/second would need 100MHz of available CPU power: 50MHz to move data from the network to the NetBackup server, and 50MHz to move data from the NetBackup server to tape.
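This rule of thumb can be expressed as a quick calculation. The following is a minimal sketch (POSIX shell; the data rate is an illustrative value), which assumes the data crosses the server twice, once coming in and once going out:

#!/bin/sh
# Estimate the CPU capacity (in MHz) that NetBackup needs for a given
# aggregate data rate, using the 5MHz-per-1MB/second rule of thumb.
RATE_MB_SEC=10                          # aggregate backup rate (example value)
CPU_MHZ=`expr $RATE_MB_SEC \* 5 \* 2`   # 5MHz per MB/second, once in and once out
echo "NetBackup needs approximately ${CPU_MHZ}MHz of CPU capacity"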

■ Consider how much memory is needed (see “Memory needed per master/media server component” on page 32).

At least 512 megabytes of RAM is recommended if the server is running a Java GUI. NetBackup uses shared memory for local backups. NetBackup buffer usage will affect how much memory is needed. See the “Tuning the NetBackup data transfer path” chapter for more information on NetBackup buffers.

Keep in mind that non-NetBackup processes need memory in addition to what NetBackup needs.
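The multiplexing entry in Table 1-7 can be evaluated the same way. A minimal sketch, with illustrative stream and drive counts:

#!/bin/sh
# Estimate shared memory consumed by multiplexed backups, per Table 1-7:
# 8 megabytes * (# streams) * (# drives).
STREAMS=4     # multiplexing streams per drive (example value)
DRIVES=2      # concurrently writing tape drives (example value)
echo "Multiplexing buffers: `expr 8 \* $STREAMS \* $DRIVES` megabytes"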

A media server moves data from disk (on relevant clients) to storage (usually disk or tape). The server must be carefully sized to maximize throughput. Maximum throughput is attained when the server keeps its tape devices streaming. (For an explanation of streaming, see “Tape streaming” on page 126.)

Media server factors to consider for sizing include:

■ Disk storage access time

■ Adapter (for example, SCSI) speed

■ Bus (for example, PCI) speed

■ Tape device speed

■ Network interface (for example, 100BaseT) speed

■ Amount of system RAM

■ Other applications, if the host is non-dedicated

The platform chosen must be able to drive all network interfaces and keep all tape devices streaming.

Design your NOM server

Before setting up a NetBackup Operations Manager (NOM) server, review the recommendations and requirements listed in the installation chapter of the NetBackup Operations Manager Getting Started Guide. Note, for example:

■ NOM server software does not have to be installed on the same server as NetBackup 6.0 master server software. Since the NOM server is also a web server, installing NOM on a master server may impact security and performance. (The guidelines provided here assume that the NOM server is a standalone host not acting as a master server.)


■ Symantec recommends that you not install NOM software on a clustered NetBackup master server.

Sizing considerations

The size of your NOM server depends largely on the number of NetBackup objects that NOM manages. The following factors determine NOM server size:

■ Number of master servers to manage (the number of media servers is irrelevant)

■ Number of policies

■ Number of jobs run per day

■ Number of media

■ Number of catalog images

Based on the above factors, the following NOM server components should be sized accordingly:

■ Disk space (installed NOM binary + NOM database, described below)

■ Type and number of CPUs

■ RAM

The next section describes the NOM database and how it affects disk space requirements, followed by overall sizing guidelines for NOM.

NOM database

The Sybase database used by NOM is similar to that used by NetBackup and is installed as part of the NOM server installation.

■ The disk space needed for the initial installation of NOM depends on the volume of data initially loaded onto the server, based on the following: number of policy data records, number of job data records, number of media data records, and number of catalog image records.

■ The rate of NOM database growth depends on the quantity of data being managed: policy data, job data, media data, and catalog data.


Sizing guidelines

The following guidelines are presented in groups based on the number of objects that your NOM server manages.

It is assumed that your NOM server is a standalone host (the host is not acting as a NetBackup master server).

Note: Symantec recommends multiple NOM servers for deployments larger than those described in the following guidelines.

Note: The guidelines are intended for basic planning purposes, and do not represent fixed recommendations or restrictions.

In Table 1-8, find the installation category that matches your site, based on the number of master servers that your NOM server will manage, the number of jobs per day, and so forth. Then, using that NetBackup installation category (A, B, C, or D), read across Table 1-9 to the recommended NOM server capacities.

Table 1-8 NOM sizing guidelines

NetBackup installation category  Master servers  Jobs per day  Policies     Alerts per day  Media
A                                1 - 3           200 - 500     200 - 300    100 - 200       5000
B                                3 - 5           500 - 1000    300 - 500    200 - 300       10000
C                                5 - 7           1000 - 5000   1000 - 4000  500 - 800       20000
D                                8 - 10          5000 - 8000   4000 - 8000  800 - 3000      30000

Table 1-9 NOM server capacities

NetBackup installation category  OS       CPU type            Number of CPUs  RAM   Disk space
A                                Windows  Pentium V           2               2 GB  80 GB
                                 Solaris  Sun Sparc 1200 MHz  1               2 GB  80 GB
B                                Windows  Pentium V           2               2 GB  80 GB
                                 Solaris  Sun Sparc 1200 MHz  1               2 GB  80 GB
C                                Windows  Pentium V           4               4 GB  80 GB
                                 Solaris  Sun Sparc 1050 MHz  2               4 GB  80 GB
D                                Windows  Pentium V           4               4 GB  80 GB
                                 Solaris  Sun Sparc 1050 MHz  2               8 GB  80 GB


Summary

Using the guidelines provided in this chapter, design a solution that can do a full backup and incremental backups of your largest system within your time window. The remainder of the backups can happen over successive days.

Eventually, your site may outgrow its initial backup solution. By following these guidelines, you can add more capacity at a future date without having to redesign your basic strategy. With proper design and planning, you can create a backup strategy that will grow with your environment.

As outlined in the previous sections, the number and location of the backup devices depend on a number of factors:

■ The amount of data on the target systems

■ The available backup and restore windows

■ The available network bandwidth

■ The speed of the backup devices

If one drive causes backup window conflicts, another can be added, providing the aggregate throughput of two drives. The trade-off is that the second drive imposes extra CPU, memory, and I/O loads on the media server.

If you find that you cannot complete backups in the allocated window, one approach is to either increase your backup window or decrease the frequency of your full and incremental backups.

Another approach is to reconfigure your site to speed up overall backup performance. Before you make any such change, you should understand what determines your current backup performance. List or diagram your site network and systems configuration. Note the maximum data transfer rates for all the components of your backup configuration and compare these against the rate you must meet for your backup window. This will identify the slowest components and, consequently, the cause of your bottlenecks. Some likely areas for bottlenecks include the networks, tape drives, client OS load, and file system fragmentation.

Questionnaire for capacity planning

Use the following questionnaire to fill in information about the characteristics of your systems and how they will be used. This data can help determine your NetBackup client configurations and backup requirements.

Table 1-10 Backup questionnaire

Question                Explanation

System name             Any unique name to identify the machine: the hostname or any unique name for each system.

Vendor                  The hardware vendor who made the system (for example, Sun, HP, IBM, generic PC).

Model                   For example: Sun E450, HP K580, Pentium II 300MHz, HP Proliant 8500.

OS version              For example: Solaris 9, HP-UX 11i, Windows 2000 DataCenter.

Building / location     Identify physical location by room, building, and/or campus.

Total storage           Total available internal and external storage capacity.

Used storage            Total used internal and external storage capacity. If the amount of data to be backed up is substantially different from the amount used, note that.

Type of external array  For example: Hitachi, EMC, EMC CLARiiON, STK.

Network connection      For example: 10/100MB, Gigabit, T1. It is important to know whether the LAN is a switched network.

Database (DB)           For example: Oracle 8.1.6, SQL Server 7.

Hot backup required?    If so, the optional database agent is required to back up the database.

Key application         For example: Exchange server, accounting system, software developer's code repository, NetBackup critical policies.

Backup window           For example: incrementals run M-F from 11PM to 6AM; fulls run all day Sunday. This information helps determine where potential bottlenecks will be and how to configure a solution.

Retention policy        For example: incrementals for 2 weeks, full backups for 13 weeks. This information helps determine how to size the number of slots needed in a library.

Existing backup media   Type of media currently used for backups.

Comments?               Any special situations to be aware of? Any significant patches on the operating system? Will the backups be over a WAN? Do the backups need to go through a firewall?


Chapter 2

Master Server configuration guidelines

This chapter provides guidelines and recommendations for better performance on the NetBackup master server.

This chapter includes the following sections:

■ “Managing NetBackup job scheduling” on page 40

■ “Miscellaneous considerations” on page 44

■ “Merging/splitting/moving servers” on page 48

■ “Guidelines for policies” on page 49

■ “Managing logs” on page 50


Managing NetBackup job scheduling

This section discusses issues related to NetBackup job scheduling.

Delays in starting jobs

The NetBackup Policy Execution Manager (nbpem) may not begin a backup at exactly the time a backup policy's schedule window opens. This can happen when you define a schedule or modify an existing schedule with a window start time close to the current time.

For instance, suppose you create a schedule at 5:50 PM, specifying that backups should start at 6:00 PM. You complete the policy definition at 5:55 PM. At 6:00 PM, you expect to see a backup job for the policy start, but it does not. Instead, the job takes another several minutes to start.

The explanation is that NetBackup receives and queues policy change events as they happen, but processes them periodically as configured in the Policy Update Interval setting under Host Properties > Master Server > Properties > Global Settings (the default is 10 minutes). The backup does not start until the first time NetBackup processes policy changes after the policy definition is completed at 5:55 PM. NetBackup may not process the changes until 6:05 PM. For each policy change, NetBackup determines what needs to be done and updates its work list accordingly.

Delays in running queued jobs

If jobs remain in the queue and only one job runs at a time, make sure the following attributes are set to allow jobs to run simultaneously:

■ Host Properties > Master Server > Properties > Global Attributes > Maximum jobs per client (should be greater than 1).

■ Host Properties > Master Server > Properties > Client Attributes setting for Maximum data streams (should be greater than 1).

■ Policy attribute Limit jobs per policy (should be greater than 1).

■ Policy schedule attribute Media multiplexing (should be greater than 1).

■ Check the storage unit properties:

■ Is the storage unit enabled to use multiple drives (Maximum concurrent write drives)? If you want to increase this value, remember to set it to fewer than the number of drives available to this storage unit. Otherwise, restores and other non-backup activities will not be able to run while backups to the storage unit are running.


■ Is the storage unit enabled for multiplexing (Maximum streams per drive)? You can write a maximum of 32 jobs to one tape at the same time.

Job delays caused by unavailable media

If the media in a storage unit are not configured or are unusable (for example, expired, past the maximum mounts setting, or in the wrong pool), the job will fail if no other storage units are usable. If media are unavailable, new media must be added, or the media configuration must be changed to make media available (such as changing the volume pool or the maximum mounts).

If the media in a storage unit are usable but are currently busy, the job will be queued. The NetBackup Activity Monitor should display the reason for the job queuing, such as “media are in use.” If the media are in use, the media will eventually stop being used and the job will run.

Delays after removing a media server

A job may be queued by the NetBackup Job Manager (nbjm) if the media server is not available. This is not because of communication time-outs, but because EMM knows the media server is down and the NetBackup Resource Broker (nbrb) queues the request to be retried later.

If a media server is configured in EMM but has been physically removed, powered off, or disconnected from the network, or if the network is down for any reason, the media and device selection logic of EMM will queue the job if no other media servers are available. The Activity Monitor should display the reason for the job queuing, such as “media server is offline.” Once the media server is online again in EMM, the job will start. In the meantime, if other media servers are available, the job will run on another media server.

If a media server is not configured in EMM (removed from the configuration), regardless of the physical state of the media server, EMM will not select that media server for use. If no other media servers are available, the job will fail.

Limiting factors for job scheduling

For every backup submitted, there may be one bprd process for the duration of the job. When many requests are submitted to NetBackup simultaneously, NetBackup will increase its use of memory and may eventually impact the overall performance of the system. This type of performance degradation is associated with the way a given operating system handles memory requests. It may affect the functioning of all applications running on the system in question, not just NetBackup.


Note: The Activity Monitor may not update if there are many (thousands of) jobs to view. If this happens, you may need to change the memory setting using the NetBackup Java command jnbSA with the -mx option. Refer to the “INITIAL_MEMORY, MAX_MEMORY” subsection in the NetBackup System Administrator’s Guide for UNIX and Linux, Volume I. Note that this situation does not affect NetBackup's ability to continue running jobs.

Adjusting the server’s network connection options

When running many simultaneous jobs, the CPU utilization of the master server may become very high. To reduce utilization and improve performance, adjust the network connection options for the local machine on the Host Properties > Master Server > Master Server Properties > Firewall display in the NetBackup Administration Console, or add the following bp.conf entry on a UNIX master server:

CONNECT_OPTIONS = localhost 1 0 2
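On a UNIX master server, the entry can be appended directly to the configuration file. A minimal sketch, assuming the default install path:

echo "CONNECT_OPTIONS = localhost 1 0 2" >> /usr/openv/netbackup/bp.conf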

For an explanation of the CONNECT_OPTIONS values, refer to the NetBackup System Administrator’s Guide for UNIX and Linux, Volume II.

The NetBackup Troubleshooting Guide also provides information on network connectivity issues.


Using NOM to monitor jobs

The NetBackup Operations Manager (NOM) can be used to monitor the performance of NetBackup jobs. NOM can also manage and monitor dozens of NetBackup installations spread across multiple locations. Some of the features provided by NOM are the following:

■ Web-based interface for efficient, remote administration across multiple NetBackup servers from a single, centralized console.

■ Policy-based alert notification, using predefined alert conditions to specify typical issues or thresholds within NetBackup.

■ Flexible reporting, on issues such as backup performance, media utilization, and rates of job success.

■ Consolidated job and job policy views per server (or group of servers), for filtering and sorting job activity.

For more information on the capabilities of NOM, refer to the NOM online help in the Administration console, or see the NetBackup Operations Manager Getting Started Guide.

Disaster recovery testing and job scheduling

The following techniques may help in your disaster recovery testing.

■ Prevent the expiration of empty media.

a Go to the following directory:

UNIX: cd /usr/openv/netbackup/bin
Windows: install_path\NetBackup\bin

b Enter the following:

mkdir bpsched.d
cd bpsched.d
echo 0 > CHECK_EXPIRED_MEDIA_INTERVAL

■ Prevent the expiration of images

a Go to the following directory:

UNIX: cd /usr/openv/netbackup
Windows: cd install_path\NetBackup

b Enter the following:

UNIX: touch NOexpire
Windows: echo 0 > NOexpire

■ Prevent backups from starting by shutting down bprd (NetBackup Request Manager). This will suspend scheduling of new jobs by nbpem. To shut down bprd, you can use the Activity Monitor in the NetBackup Administration Console.

Restart bprd to resume scheduling.
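The UNIX steps above can be combined into a short script. The following is a minimal sketch, assuming default install paths; remove the CHECK_EXPIRED_MEDIA_INTERVAL and NOexpire files to restore normal expiration when testing is complete.

#!/bin/sh
# Suspend expiration of empty media and of images for DR testing (UNIX).
cd /usr/openv/netbackup/bin
mkdir -p bpsched.d                       # create the override directory if absent
cd bpsched.d
echo 0 > CHECK_EXPIRED_MEDIA_INTERVAL    # prevent expiration of empty media

cd /usr/openv/netbackup
touch NOexpire                           # prevent expiration of images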

Miscellaneous considerations

Consider the following issues when planning for or troubleshooting NetBackup.

Processing of storage units

NetBackup storage units are processed in alphabetical order. You can affect how storage units are selected, and therefore when media servers are used, by being aware of the alphabetical order of the name of each storage unit. You can also have some control over load balancing by using storage unit groups.

Storage unit groups contain a list of storage units that are available for that policy to use. A storage unit group can be configured to use storage units in any of three ways, in the New Storage Unit Group dialog of the NetBackup Administration Console.

■ Use storage units in the order in which they are listed in the group.

■ Choose the least recently selected storage unit in the group.

■ Configure the storage unit group as a failover group. This means the first storage unit in the group will be the only storage unit used. If the storage unit is busy, then backups will queue. The second storage unit will only be used if the first storage unit is down.

Disk staging

With disk staging, images can be created on disk initially, then copied later to another media type (as determined in the disk staging schedule). The media type for the final destination is typically tape, but could be disk. This two-stage process leverages the advantages of disk-based backups in the near term, while preserving the advantages of tape-based backups for the long term.

Note that disk staging can be used to increase backup speed. For more information, refer to the NetBackup System Administrator’s Guide, Volume I.


File system capacity

There must be ample file system space for NetBackup to record its logging and catalog entries on each master server, media server, and client. If logging or catalog entries exhaust available file system space, NetBackup will cease to function. The ability to increase the size of the file system via volume management is recommended. The disk containing the NetBackup master catalog should be protected with mirroring or RAID hardware or software technology.

NetBackup catalog strategies

The NetBackup catalog resides on the disk of the NetBackup master server. The catalog consists of the following parts:

■ Image database: The image database contains information about what has been backed up. It is by far the largest part of the catalog.

■ NetBackup data stored in relational databases: This includes the media and volume data describing media usage and volume information which is used during the backups.

■ NetBackup configuration files: Policy, schedule and other flat files used by NetBackup.

For more information on the catalog, refer to “Catalog Maintenance and Performance Optimization” in the NetBackup Administrator's Guide Volume 1.

The NetBackup catalogs on the master server tend to grow large over time and eventually fail to fit on a single tape. Here is the layout of the first few directory levels of the NetBackup catalogs on the master server:

Figure 2-3 Directory layout on the master server (original figure not reproduced). In summary: the image database and other catalog directories (such as /images, /error, /jobs, /media, /vault, and /config) reside under /usr/openv/netbackup/db; the relational database files (NBDB.db, NBDB.log, EMM_DATA.db, EMM_INDEX.db, BMRDB.db, BMRDB.log, BMR_DATA.db, BMR_INDEX.db, and vxdbms.conf) reside under /usr/openv/db/data; license key and authentication information resides under /usr/openv/var; and the configuration files server.conf and databases.conf reside under /usr/openv/var/global.


Catalog backup types

In addition to the existing cold catalog backups (which require that no jobs be running), NetBackup 6.0 introduces online “hot” catalog backups. These hot catalog backups can be performed while other jobs are running.

Note: For NetBackup release 6.0 and beyond, it is recommended that you use schedule-based, incremental hot catalog backups with periodic full backups as your preferred catalog backup method.

Guidelines for managing the catalog

■ NetBackup catalog pathnames (cold catalog backups)



When defining the file list, use absolute pathnames for the locations of the NetBackup and Media Manager catalog paths and include the server name in the path. This is in case the media server performing the backup is changed.

■ Back up the catalog using online, hot catalog backup

This type of catalog backup is for highly active NetBackup environments in which continual backup activity is occurring. It is considered an online, hot method because it can be performed while regular backup activity is taking place. This type of catalog is policy-based and can span more than one tape. It also allows for incremental backups, which can significantly reduce catalog backup times for large catalogs.

■ Store the catalog on a separate file system

The NetBackup catalog can grow quickly depending on backup frequency, retention periods, and the number of files being backed up. If you store the NetBackup catalog data on its own file system, this ensures that other disk resources, root file systems, and the operating system are not impacted by the catalog growth. For information on how to move the catalog, refer to “Catalog compression” on page 48.

■ Change the location of the NetBackup relational database files

The location of the NetBackup relational database files can be changed and/or split into multiple directories for better performance. For example, by placing the transaction log file, NBDB.log, on a physically separate drive, you gain better protection against disk failure and increased efficiency in writing to the log file. Refer to the procedure in the section “Moving NBDB Database Files After Installation” in the “NetBackup Relational Database” appendix of the NetBackup System Administrator’s Guide, Volume I.

■ Delay to compress catalog

The default value for this parameter is 0, which means that NetBackup does not compress the catalog. As your catalog increases in size, you may want to use a value between 10 and 30 days for this parameter. When you restore old backups, which requires looking at catalog files that have been compressed, NetBackup automatically uncompresses the files as needed, with minimal performance impact. For information on how to compress the catalog, refer to “Catalog compression” on page 48.

Catalog backup not finishing in the available window

If your cold catalog backups are not finishing in the backup window, or hot catalog backups are running a long time, here are some possible solutions:

■ Use catalog archiving. Catalog archiving reduces the size of online catalog data by relocating the large catalog .f files to secondary storage. NetBackup administration will continue to require regularly scheduled catalog backups, but without the large amount of online catalog data, the backups will be faster.

■ Off load some policies, clients, and backup images from the current master server to a new, additional master, so that each master has a window large enough to allow its catalog backup to finish. Since a media server can be connected to one master server only, additional media servers may be needed. For assistance in adding another master server to lighten the workload of the existing master, contact Symantec Consulting.

■ Determine whether most of the catalog backup time is being used in expiring backup images. If this is the case, make sure the master's primary DNS server is available by running nslookup. The command should respond quickly. Also, investigate whether there are any media servers which no longer exist. The image cleaning operation will time out on these repeatedly, trying to expire fragments, if the media servers were not removed from the NetBackup configuration correctly.

Catalog compression

When the NetBackup image catalog becomes too large for the available disk space, there are two ways to manage this situation:

■ Compress the image catalog

■ Move the image catalog.

For details, refer to “Moving the Image Catalog” and “Compressing and Uncompressing the Image Catalog” in the NetBackup System Administrator’s Guide, Volume I.

Note that NetBackup compresses images after each backup session, regardless of whether or not any backups were successful. This happens right before the execution of the session_notify script and the backup of the catalog. The actual backup session is extended until compression is complete.

Merging/splitting/moving servers

A master server schedules and maintains backup information for a given set of systems. The Enterprise Media Manager (EMM) server and its database maintain centralized device and media related information used on all servers that are part of the configuration. By default, the EMM server and the NetBackup Relational Database (NBDB) that contains the EMM data are located on the master server. A large and dynamic data center can expect to periodically reconfigure the number and organization of its backup servers.


Centralized management, reporting, and maintenance are the benefits of working in a centralized NetBackup environment. Once a master server has been established, it is possible to merge its databases with another master server, giving control over its set of server backups to the new master server.

Conversely, if the backup load on a master server has grown to the point where backups are not finishing in the backup window, it may be desirable to split that master server into two master servers.

It is possible to merge or split NetBackup master servers or EMM servers. It is also possible to convert a media server to a master server or a master server to a media server. However, the procedures to accomplish this are complex and require a detailed knowledge of NetBackup database interactions. Merging or splitting NetBackup, Media Manager and EMM databases to another server is not recommended without involving a Symantec consultant to determine the changes needed, based on your specific configuration and requirements.

Moving the EMM server

The EMM server can be moved from the master server to another server, to create a “remote” EMM server. In some cases, moving the EMM server off the master may improve capacity and performance for the NetBackup master server and for the EMM server. For assistance, refer to “Moving the NetBackup Database from One Host to Another” in Appendix A of the NetBackup System Administrator's Guide, Volume I.

Guidelines for policies

The following items may have performance implications.

Include and exclude lists

■ Do not use excessive wildcards in file lists.

When wildcards are used, NetBackup compares every file name against the wildcards. This decreases NetBackup performance. Instead of placing /tmp/* (UNIX) or C:\Temp\* (Windows) in an include or exclude list, use /tmp/ or C:\Temp.

■ Use exclude files to exclude large useless files.

Reduce the size of your backups by using exclude lists for the files your installation does not need to preserve. For instance, you may decide to exclude temporary files. Use absolute paths for your exclude list entries, so that valuable files are not inadvertently excluded. Before adding files to the exclude list, confirm with the affected users that their files can be safely excluded. Should disaster (or user error) strike, not being able to recover files costs much more than backing up extra data.

When a policy specifies that all local drives be backed up (ALL_LOCAL_DRIVES), nbpem initiates a parent job (nbgenjob) that connects to the client and runs bpmount -i to get a list of mount points. Then nbpem initiates a job with its own unique job identification number for each mount point. Next the client bpbkar starts a stream for each job. Then, and only then, the exclude list is read by NetBackup. When the entire job is excluded, bpbkar exits with a status 0, stating that it sent 0 of 0 files to backup. The resulting image files are treated just as any other successful backup's image files. They expire in the normal fashion when the expiration date in the image header files specifies they are to expire.

Critical policies

For online, hot catalog backups (a new feature in NetBackup 6.0), make sure to identify those policies that are crucial to recovering your site in the event of a disaster. For more information on hot catalog backup and critical policies, refer to the NetBackup System Administrator’s Guide, Volume I.

Schedule frequency

To minimize the number of times you back up files that have not changed, and to minimize your consumption of bandwidth, media, and other resources, consider limiting the frequency of your full backups to monthly or even quarterly, followed by weekly cumulative incremental backups and daily incremental backups.

Managing logs

Optimizing the performance of vxlogview

As explained in the NetBackup Troubleshooting Guide, the vxlogview command is used for viewing logs created by unified logging (VxUL). The vxlogview command delivers optimum performance when a file ID is specified in the query.

For example: when viewing messages logged by the NetBackup Resource Broker (nbrb) for a given day, you can filter out the library messages while viewing the nbrb logs. To achieve this, run vxlogview as follows:

vxlogview -o nbrb -i nbrb -n 0

Note that -i nbrb specifies the file ID for nbrb. Specifying the file ID improves the performance, because the search is confined to a smaller set of files.
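Restricting the query to a time range narrows the search further. For example, to view one hour of nbrb messages (the -b and -e date options are described in the NetBackup Troubleshooting Guide; the exact syntax may vary by release, and the date values here are illustrative):

vxlogview -o nbrb -i nbrb -n 0 -b "05/05/06 13:00:00" -e "05/05/06 14:00:00"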


Interpreting legacy error logs

This section describes the fields in the legacy log files written to the /usr/openv/netbackup/db/error directory on UNIX (the install_path\NetBackup\db\error folder on Windows). On UNIX, there is a link to the most current file in the error directory; the link is called daily_messages.log. Note that the information in these logs provides the basis for the NetBackup ALL LOG ENTRIES report. For more information on legacy logging and unified logging (VxUL), refer to the NetBackup Troubleshooting Guide.

Here is a sample message from an error log:

1021419793 1 2 4 nabob 0 0 0 *NULL* bpjobd TERMINATED bpjobd

The meaning of the various fields in this message (the fields are delimited by blanks) is defined in Table 2-11, Meaning of daily_messages log fields. Table 2-12, Message types, lists the values for the message type, which is the third field in the log message.

Table 2-11 Meaning of daily_messages log fields

Field  Definition                                                       Value
1      Time this event occurred (ctime)                                 1021419793 (= number of seconds since 1970)
2      Error database entry version                                     1
3      Type of message                                                  2
4      Severity of error (1: Unknown, 2: Debug, 4: Informational,       4
       8: Warning, 16: Error, 32: Critical)
5      Server on which error was reported                               nabob
6      Job ID (included if pertinent to the log entry)                  0
7      (optional entry)                                                 0
8      (optional entry)                                                 0
9      Client on which error occurred, if applicable, otherwise *NULL*  *NULL*
10     Process which generated the error message                        bpjobd
11     Text of error message                                            TERMINATED bpjobd

Table 2-12 Message types

Type value  Definition of this message type
1           Unknown
2           General
4           Backup
8           Archive
16          Retrieve
32          Security
64          Backup status
128         Media device
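Because the fields are delimited by blanks, the daily_messages log can be summarized with standard tools. The following is a minimal sketch (assuming the default UNIX log location and the field positions in Table 2-11) that prints Error and Critical entries and tallies all entries by severity:

#!/bin/sh
# Scan the current daily_messages log; field meanings follow Table 2-11.
awk '
{
    sev[$4]++                         # tally entries by severity (field 4)
    if ($4 >= 16) {                   # 16 = Error, 32 = Critical
        msg = ""
        for (i = 11; i <= NF; i++)    # field 11 onward is the message text
            msg = msg " " $i
        printf "%s server=%s process=%s%s\n", $1, $5, $10, msg
    }
}
END {
    for (s in sev)
        printf "severity %s: %d entries\n", s, sev[s]
}' /usr/openv/netbackup/db/error/daily_messages.log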


Chapter 3

Media Server configuration guidelines

This chapter provides configuration guidelines for the media server along with related background information.

This chapter includes the following sections:

■ “Network and SCSI/FC bus bandwidth” on page 54

■ “How to change the threshold for media errors” on page 54

■ “How to reload the st driver without rebooting Solaris” on page 57

■ “Media Manager drive selection” on page 58

■ “Robot types and NetBackup port configuration” on page 58


Network and SCSI/FC bus bandwidth

Configure no more than two high-performance tape drives per SCSI/fibre-channel connection. A SCSI/fibre-channel configuration should be able to handle both drives at maximum rated compression. Tape drive wear and tear is much reduced, and efficiency increased, if the data stream matches the tape drive capacity and is sustained.

Note: Make sure that both your inbound network connection and your SCSI/FC bus have enough bandwidth to feed all of your tape drives.

Example:

Bus: iSCSI (360 GB/hour)
Tape drives: two LTO gen 3 drives, each rated at approximately 300 GB/hour (2:1 compression)

In this example, the tape drives require more speed than the iSCSI bus provides. Only one tape drive will stream in this configuration. The solution is to add a second iSCSI bus, or to move to a connection that is fast enough to feed data to the tape drives efficiently.

How to change the threshold for media errors

Some backup failures can occur because no media are available. If you see this kind of error, you can execute the following script and then run the NetBackup Media List report to check the status of media:

UNIX: /usr/openv/netbackup/bin/goodies/available_media
Windows: install_path\NetBackup\bin\goodies\available_media

The NetBackup Media List report may show that some media is frozen and therefore cannot be used for backups.

One of the reasons NetBackup freezes media is because of recurring I/O errors. The NetBackup Troubleshooting Guide describes the recommended approaches for dealing with this issue, for example, under NetBackup error code 96. It is also possible to configure the NetBackup error threshold value. The method for doing this is described in this section.

Each time a read, write, or position error occurs, NetBackup records the time, media ID, type of error, and drive index in the EMM database. Then NetBackup scans to see whether that media has had “m” of the same errors within the past “n” hours. The variable “m” is a tunable parameter known as media_error_threshold. The default value of media_error_threshold is 2 errors.


The variable “n” is known as time_window. The default value of time_window is 12 hours. If a tape volume has more than media_error_threshold errors, NetBackup will take the appropriate action:

■ If the volume has not been previously assigned for backups, then NetBackup will:

■ set the volume status to FROZEN

■ select a different volume

■ log an error

■ If the volume is in the NetBackup media catalog and has been previously selected for backups, then NetBackup will:

■ set the volume to SUSPENDED

■ abort the current backup

■ log an error

Adjusting media_error_threshold

To configure the NetBackup media error thresholds, use the nbemmcmd command on the media server as follows. NetBackup freezes a tape volume or downs a drive for which these values are exceeded. For more detail on the nbemmcmd command, refer to the man page or to the NetBackup Commands Guide.

UNIX:
/usr/openv/netbackup/bin/admincmd/nbemmcmd -changesetting -time_window unsigned integer -machinename string -media_error_threshold unsigned integer -drive_error_threshold unsigned integer

Windows:
install_path\NetBackup\bin\admincmd\nbemmcmd.exe -changesetting -time_window unsigned integer -machinename string -media_error_threshold unsigned integer -drive_error_threshold unsigned integer

For example, if the -drive_error_threshold is set to the default value of 2, the drive is downed after 3 errors in 12 hours. If the -drive_error_threshold is set to a value of 6, it would take 7 errors in the same 12 hour period before the drive would be downed.
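For example, the following invocation (UNIX path; the media server name mserver1 is hypothetical) raises both thresholds so that a freeze or down requires more than five errors within the 12-hour window:

/usr/openv/netbackup/bin/admincmd/nbemmcmd -changesetting \
    -machinename mserver1 \
    -time_window 12 \
    -media_error_threshold 5 \
    -drive_error_threshold 5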


Note: The following description has nothing to do with the number of times NetBackup retries a backup/restore that fails. That situation is controlled by the global configuration parameter “Backup Tries” for backups and the bp.conf entry RESTORE_RETRIES for restores. This algorithm merely deals with whether I/O errors on tape should cause media to be frozen or drives to be downed.

When a read/write/position error occurs on tape, the error returned by the operating system does not distinguish between whether the error is caused by the tape or the drive. To prevent the failure of all backups in a given timeframe, bptm tries to identify a bad tape volume or drive based on past history, using the following logic:

■ Each time an I/O error occurs on a read/write/position, bptm logs the error in the file /usr/openv/netbackup/db/media/errors (UNIX) or install_path\NetBackup\db\media\errors (Windows). The error message includes the time of the error, media ID, drive index and type of error.

Examples of the entries in this file:

07/21/96 04:15:17 A00167 4 WRITE_ERROR
07/26/96 12:37:47 A00168 4 READ_ERROR

■ Each time an entry is made, the past entries are scanned to determine if the same media ID and/or drive has had this type of error in the past “n” hours. “n” is known as the time_window. The default time window is 12 hours.

When performing the history search for the time_window entries, EMM notes past errors that match the media ID, the drive, or both the drive and the media ID. The purpose of this is to determine the cause of the error. For example, if a given media ID gets write errors on more than one drive, it is assumed that the tape volume is bad and NetBackup freezes the volume. If more than one media ID gets a particular error on the same drive, it is assumed the drive is bad and the drive goes to a “down” state. If only past errors are found on the same drive with the same media ID, then EMM assumes that the volume is bad and freezes it.

■ Freezing or downing does not occur on the first error. There are two other parameters, media_error_threshold and drive_error_threshold. The default value of both of these parameters is 2. For a “freeze” or “down” to happen, more than the threshold number of errors must occur (by default, at least three errors must occur) in the time window for the same drive/media ID.
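To review the error history that bptm has recorded for a particular volume, the errors file can be searched directly. A minimal sketch, using the default UNIX location and the illustrative media ID from the sample entries above:

grep A00167 /usr/openv/netbackup/db/media/errors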


Note: If either media_error_threshold or drive_error_threshold is 0, freezing or downing occurs the first time any I/O error occurs. media_error_threshold is looked at first, so if both values are 0, freezing will override downing. It is not recommended that these values be set to 0.

Changing the default values is not recommended unless there is a good reason to do so. One obvious change would be to set very large threshold values, effectively disabling the mechanism so that a tape is never frozen or a drive never downed.

Freezing and downing is primarily intended to benefit backups. If read errors occur on a restore, freezing media has little effect. NetBackup still accesses the tape to perform the restore. In the restore case, downing a bad drive may help.

How to reload the st driver without rebooting Solaris

The devfsadmd daemon enhances device management in Solaris. This daemon is capable of dynamically reconfiguring devices during the boot process and in response to kernel event notification.

The devfsadm command, located in /usr/sbin, is the command-line form of devfsadmd. devfsadm replaces drvconfig (for management of the physical device tree /devices) and devlinks (for management of logical devices in /dev). devfsadm also replaces the commands for specific device class types, such as /usr/sbin/tapes.

Thus, in order to recreate tape devices for NetBackup after changing the /kernel/drv/st.conf file without rebooting the server, perform the following steps:

To reload the st driver without rebooting

1 Shut down the NetBackup and Media Manager daemons.

2 Obtain the module id for the st driver in kernel:

/usr/sbin/modinfo | grep SCSI

The module id is the first field in the line corresponding to the SCSI tape driver.

3 Unload the st driver from the kernel:

/usr/sbin/modunload -i "module id"


4 Use devfsadm to recreate the device nodes in /devices and the device links in /dev for tape devices, by running any one (not all) of the following commands:

/usr/sbin/devfsadm -i st
/usr/sbin/devfsadm -c tape
/usr/sbin/devfsadm -C -c tape (use this form to enforce cleanup if dangling logical links are present in /dev)

5 Reload the st driver:

/usr/sbin/modload st

6 Restart the NetBackup and Media Manager daemons.
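Steps 2 through 5 can be scripted. The following is a minimal sketch for Solaris; it assumes the st module's modinfo description contains the string "SCSI tape" and that the NetBackup and Media Manager daemons have already been shut down:

#!/bin/sh
# Reload the st driver after editing /kernel/drv/st.conf (Solaris).
MODID=`/usr/sbin/modinfo | grep "SCSI tape" | awk '{print $1}'`  # module id is field 1
/usr/sbin/modunload -i "$MODID"     # unload the st driver from the kernel
/usr/sbin/devfsadm -c tape          # recreate tape device nodes and links
/usr/sbin/modload st                # reload the st driver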

Media Manager drive selection

Once the media and device selection logic (MDS) in the EMM service determines which storage unit to use, MDS attempts to select a drive that matches the storage unit selection criteria, such as media server, robot number, robot type, and density. MDS prefers loaded drives over unloaded drives (a loaded drive removes the overhead of loading a media in a drive). If no loaded drives are available, MDS attempts to select the best usable drive suited for the job. In general, MDS prefers non-shared drives over shared drives, and it attempts to select the least recently used drive.

Robot types and NetBackup port configuration

There is no facility in NetBackup to route ACSLS communications through any server other than the one where the backup or restore is taking place. Unlike the tldd/tldcd design for TLD robotic control, which requires one point of robotic control, acsd is a single robotic daemon that runs on each server with ACS drives attached. The same is true for TLM (ADIC SDLC and ADIC DAS controlled libraries), where tlmd runs on each server with TLM drives. Such robot types, which have no single point of robotic control, provide resiliency in case of a NetBackup server failure. However, extra planning may be required to accommodate them given your firewall requirements.

ACS has been enhanced to be more firewall friendly. For more information, refer to the “STK Automated Cartridge System (ACS)” appendix of the NetBackup Media Manager System Administrator’s Guide.


Chapter 4

Media configuration guidelines

This chapter provides guidelines and recommendations for better performance with NetBackup media.

This chapter includes the following sections:

■ “Dedicated or shared backup environment” on page 60

■ “Pooling” on page 60

■ “Disk versus tape” on page 60


Dedicated or shared backup environment

One design decision is whether to make your backup environment dedicated or shared. Dedicated SANs are secure but expensive. Shared environments cost less, but require more work to make them secure. A SAN installation with a database may require the performance of a RAID 1 array. An installation backing up a basic file structure may satisfy its needs with RAID 5 or NAS.

Pooling

Here are some useful conventions for media pools (formerly known as volume pools):

■ Configure a scratch pool for management of scratch tapes. If a scratch pool exists, EMM can move volumes from that pool to other pools that do not have volumes available.

■ Use the available_media script in the goodies directory. You can put the available_media report into a script that redirects the report output to a file and emails the file to the administrators daily or weekly (see the sketch after this list). This helps track which tapes are full, frozen, suspended, and so on. By means of a script, you can also filter the output of the available_media report to generate custom reports.

To monitor media, you can also use the NetBackup Operations Manager (NOM). For instance, NOM can be configured to issue an alert if there are fewer than X number of media available, or if more than X% of the media is frozen or suspended.

■ Use the none pool for cleaning tapes.

■ Do not create too many pools. The existence of too many pools causes the library capacity to become fragmented across the pools. Consequently, the library becomes filled with many partially-full tapes.
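The following is a minimal sketch of the email technique mentioned in the first item above, suitable for a daily cron job. The report file location, recipient address, and the use of mailx are assumptions; adjust them for your site.

#!/bin/sh
# Capture the available_media report and mail it to the administrators.
REPORT=/tmp/available_media.$$          # temporary report file (assumed path)
ADMINS="backup-admins@example.com"      # site-specific recipients (assumed)

/usr/openv/netbackup/bin/goodies/available_media > $REPORT

# Mail the full report; a grep on FROZEN or SUSPENDED here could produce
# a filtered custom report instead.
mailx -s "NetBackup available_media report" $ADMINS < $REPORT
rm -f $REPORT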

Disk versus tape

Disk is becoming more common as a backup medium. Storing backup data on disk generally provides faster restores.

Tuning disk-based storage for performance is similar to tuning tape-based storage. The optimal buffer settings for a site can vary according to its configuration. It takes thorough testing to determine these settings.

Disk-based backup storage can be useful if you have a lot of incremental backups and the percentage of data change is small. If the volume of data in incremental copies is insufficient to ensure streaming to tape drives, writing to disk can speed the backup process and alleviate wear and tear on your tape drives.

Here are some factors to consider when choosing to back up a given dataset to disk or tape:

■ Short or long retention period

■ Incremental or full backup

■ Intermediate (staging) or long-term storage

■ Delay in recovery time

Here are some benefits of backing up to disk rather than tape:

■ No need to multiplex

Writing to disk does not need to be streamed. This means that multiplexing is not necessary.

Multiplexing is only necessary with tape because the tape must be streamed. Multiplexing allows multiple clients and multiple file systems to be backed up to the same tape simultaneously, thus streaming the drive. However, this functionality slows down the restore. (See “Tape streaming” on page 126 for an explanation of streaming.)

■ Instant access to data

Most tape drives on the market have a “time to data” of close to two minutes. This time includes the amount of time to move the tape from its slot, load it into the drive and seek an appropriate place on tape. Disk has an effective time to data of zero seconds. To understand the significance of eliminating this delay, consider that restoring a large file system whose backups reside on 30 different tapes means that a two-minute delay per tape adds almost two hours to the restore. This includes the time it takes to eject and unload the 30 tapes.

■ Fewer full backups.

With tape-based systems, full backups must be done regularly because of the instant access to data issue described above. Otherwise, the number of tapes required for a restore significantly increases both the time to restore and the chance that a single tape will cause the restore to fail. Since disk arrays are protected by RAID software, they do not have this problem.


Chapter 5

Database backup guidelines

This chapter gives planning guidelines for database backup.

This chapter includes the following sections:

■ “Introduction” on page 64

■ “Considerations for database backups” on page 64


Introduction

Before you create a database, decide how to protect the database against potential failures. Answer the following questions before developing your backup strategy.

■ Is it acceptable to lose any data if a hardware failure damages some of the files that constitute a database?

■ Will you ever need to recover to past points-in-time?

■ Does the database need to be available at all times (24x7)?

For specific information on backing up and restoring your database, refer to the NetBackup administrator’s guide for your database product. In addition, the manufacturer of your database product may provide publications that document backup recommendations and methods.

Considerations for database backups

When planning your database backups, consider the following.

■ Fragmentation and databases

Using a smaller fragment size in a backup of a database such as Oracle will not improve backup performance, and may hinder restore performance. Database backups (when not using Advanced Client) are unaffected by fragmentation since there is only one “file” per backup image. There is no advantage in tape positioning with or without fast-locate blocks.

■ Using Advanced Client

NetBackup Advanced Client provides snapshot backup technology combined with off-host data movement for local networks and SAN environments. A data snapshot can be created on disk in seconds and then backed up directly to tape. Users can significantly reduce CPU and I/O overhead from application or database servers while eliminating the backup window altogether.

Advanced Client helps reduce the impact on applications that require 24x7xforever availability. Advanced Client is available on UNIX and Windows systems, and supports all NetBackup libraries and drives. It can be used with multi-streaming and multiplexing, and with a variety of disk arrays.


Chapter 6

Best practices

This chapter describes an assortment of best practices, and includes the following sections:

■ “Best practices: new tape drive technologies” on page 66

■ “Best practices: tape drive cleaning” on page 66

■ “Best practices: storing tape cartridges” on page 68

■ “Best practices: recoverability” on page 68

■ “Best practices: naming conventions” on page 71


Best practices: new tape drive technologiesSymantec provides a white paper on best practices for migrating your NetBackup installation to new tape technologies:

“Best Practices: Migrating to or Integrating New Tape Drive Technologies in Existing Libraries,” available at www.support.veritas.com.

Recent tape drives offer noticeably higher capacity than the previous generation of tape drives targeted at the open-systems market. Administrators may want to take advantage of these higher-capacity, higher performance tape drives, but are concerned about integrating these into an existing tape library. The white paper discusses different methods for doing so and the pros and cons of each.

Best practices: tape drive cleaning

This section discusses several ways to clean tape drives. Refer to the NetBackup Media Manager System Administrator's Guide for details on how to use the methods discussed here.

Note: The TapeAlert feature is discussed in detail later in this section.

Here are the tape drive cleaning methods that can be used in a NetBackup installation:

■ Frequency-based cleaning

■ On-demand cleaning

■ TapeAlert

■ Robotic cleaning

Frequency-based cleaning

NetBackup does frequency-based cleaning by tracking the number of hours a drive has been in use. When this time reaches a configurable threshold, NetBackup creates a job that mounts and exercises a cleaning tape. This cleans the drive in a preventive fashion. The advantage of this method is that typically no drives are unavailable while they await cleaning. There is also no limitation on platform or robot type. On the downside, cleaning is done more often than necessary, which adds system wear and consumes time that could be used to write to the drive. This method is also hard to tune: when tapes are new, drive cleaning is needed less frequently, and the need for cleaning increases as the tape inventory ages. This increases the amount of tuning administration needed and, consequently, the margin of error.
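
One way to inspect and adjust the cleaning frequency from the command line is the tpclean utility on the device host; the following is a minimal sketch with a placeholder drive name (consult the Media Manager guide for exact usage on your release):

# List mount time and cleaning frequency for each configured drive:
/usr/openv/volmgr/bin/tpclean -L

# Set a 25-hour cleaning frequency for a drive:
/usr/openv/volmgr/bin/tpclean -F drive01 25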


On-demand cleaning

Refer to the NetBackup Media Manager System Administrator's Guide for more information on this topic.

TapeAlert

TapeAlert allows reactive cleaning for most drive types: a tape drive notifies EMM when it needs to be cleaned, and EMM then performs the cleaning. You must have a cleaning tape configured in at least one library slot to use this feature. TapeAlert is the recommended cleaning solution if it can be implemented.

Not all drives, at all firmware levels, support this type of reactive cleaning. Where reactive cleaning is not supported on a particular drive, frequency-based cleaning may be substituted. This solution is not vendor or platform specific. Symantec has not tested specific firmware levels; however, the drive vendor should be able to confirm that the TapeAlert feature is supported.

■ How TapeAlert works

To understand NetBackup's behavior with drive-cleaning TapeAlerts, it is important to understand the TapeAlert interface to a drive. The TapeAlert interface to a tape drive is via the SCSI bus, based on a Log Sense page, which contains 64 alert flags. The conditions that cause a flag to be set and cleared are device-specific and are determined by the device vendor.

The configuration of the Log Sense page is via a Mode Select page. The Mode Sense/Select configuration of the TapeAlert interface is compatible with the SMART diagnostic standard for disk drives.

NetBackup reads the TapeAlert Log Sense page at the beginning and end of a write/read job. TapeAlert flags 20 to 25 are used for cleaning management, although some drive vendors' implementations may vary. NetBackup uses TapeAlert flag 20 (Clean Now) and TapeAlert flag 21 (Clean Periodic) to determine when it needs to clean a drive.

When a drive is selected by NetBackup for a backup, the Log Sense page is reviewed by bptm for status. If one of the clean flags is set, the drive will be cleaned before the job starts.

If a backup is in progress and one of the clean flags is set, the flag is not read until a tape is dismounted from the drive.

If a job spans media and, during the first tape, one of the clean flags is set, the cleaning light comes on and the drive will be cleaned before the second piece of media is mounted in the drive.

The implication is that the present job will conclude its ongoing write despite a TapeAlert Clean Now or Clean Periodic message. That is, the TapeAlert will not require the loss of what has been written to tape so far.


This is true regardless of the number of NetBackup jobs involved in writing out the rest of the media.

Note that the behavior described here may change in the future.

If a large number of media become FROZEN as a result of having implemented TapeAlert, there is a strong likelihood of underlying media and/or tape drive issues.

■ Disabling TapeAlert

To disable TapeAlert, create a touch file called NO_TAPEALERT:

UNIX: /usr/openv/volmgr/database/NO_TAPEALERT

Windows: install_path\volmgr\database\NO_TAPEALERT
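
For example, on a UNIX media server the feature can be disabled and later re-enabled as follows (the path is taken from above):

touch /usr/openv/volmgr/database/NO_TAPEALERT
# To re-enable TapeAlert, remove the touch file:
rm /usr/openv/volmgr/database/NO_TAPEALERT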

Robotic cleaning

Robotic cleaning is reactive rather than proactive, so it is not subject to the limitations described above. Because cleaning occurs only when needed, unnecessary cleanings are eliminated, frequency tuning is not an issue, and the drive can spend more time moving data rather than in maintenance operations.

Library-based cleaning is not supported by EMM for most robots, since robotic library and operating systems vendors have implemented this type of cleaning in many different ways.

Best practices: storing tape cartridges

Two key issues in tape management are how long and where to store the tapes that you need to keep on site. Typically, a DLT tape can be stored for up to three years. The storage location should be climate-controlled and away from sunlight. In addition, the tapes should always be stored in their plastic boxes.

Best practices: recoverability

Recovering from data loss involves both planning and technology to support your recovery objectives and time frames. The following table (Table 6-13, Methods and procedures for recoverability) describes how you can use NetBackup and other tools to recover from various mishaps or disasters. The methods and procedures you adopt for your installation should be documented and tested regularly to ensure that your installation can recover from a disaster.

Table 6-13: Methods and procedures for recoverability

■ File deleted before backup: recovery is not possible.

■ File deleted after backup: recoverable using standard NetBackup restore procedures.

■ Backup client failure: recoverable; restore the data using NetBackup.

■ Media failure: recoverable through backup image duplication.

■ Master/media server failure: recoverable through manual failover to an alternate server.

■ Loss of the backup database: recoverable through NetBackup database recovery.

■ No NetBackup software: recoverable if multiplexing was not used, by reading the media without NetBackup, using GNU tar.

■ Complete site disaster: recoverable through Vaulting and off-site media storage.

Additional material may be found in the following books:

■ The Resilient Enterprise: Recovering Information Services from Disasters, by Symantec and industry authors, published by Symantec Software Corporation.

■ Blueprints for High Availability: Designing Resilient Distributed Systems, by Evan Marcus and Hal Stern, published by John Wiley and Sons.

■ Implementing Backup and Recovery: The Readiness Guide for the Enterprise, by David B. Little and David A. Chapa, published by Wiley Technology Publishing.

Suggestions for data recovery planning

It is important to have a well-documented and tested plan for recovering from a logical error, an operator error, or a site disaster. The following practices have been found effective for recoverability in production environments. Refer also to the NetBackup Troubleshooting Guide and the NetBackup System Administrator's Guide for further information on disaster recovery.

■ Always use a regularly scheduled hot catalog backup

Refer to “Catalog Recovery from an Online Backup” in the NetBackup Troubleshooting Guide.

■ Review the disaster recovery plan often


Review your site-specific recovery procedures and verify that they are accurate and up-to-date. Also, verify that the more complex systems, such as the NetBackup master and media servers, have current procedures for rebuilding the machines with the latest software.

■ Perform test recoveries on a regular basis

Implement a plan to perform restores of various systems to alternate locations. This plan should include selecting random production backups and restoring the data to a non-production system. A checksum can then be performed on one or many of the restored files and compared to the actual production data. Be sure to include offsite storage as part of this testing. The end-user or application administrator can also be involved in determining the integrity of the restored data.
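
For example, a standard checksum utility can make the comparison (the paths here are placeholders for a restored file and its production original):

cksum /restoretest/data/file1 /production/data/file1

If the reported checksums and sizes match, the restored file is identical to the production copy.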

■ Support NetBackup recoverability:

■ Back up the NetBackup catalog to two tapes.

The catalog contains information vital for NetBackup recovery. Its loss could result in hours or days of recovery time through manual processes. The cost of a single tape is a small price to pay for the added insurance of rapid recovery in the event of an emergency.

■ Back up the catalog after each backup.

If a hot catalog backup is used, an incremental catalog backup can be done after each backup session. Extremely busy backup environments should also use a scheduled hot catalog backup, since their backup sessions end infrequently.

In the event of a catastrophic failure, the recovery of images is slowed by not having all images available. If a manual backup occurs just before the master server or the drive that contains the backed-up files crashes, the manual backup must be imported to recover the most recent version of the files.

■ Record the IDs of catalog backup tapes.

Record the catalog tapes in the site run book or another public location to ensure rapid identification in the event of an emergency. If the catalog tapes are not identified ahead of time, a significant amount of time may be lost by scanning every tape in a library to find them.

The vmphyinv utility can be used to mount all tapes in a robotic library and identify the catalog tape(s). Note that vmphyinv identifies cold catalog tapes.

■ Designate label prefixes for catalog backups.

Make it easy to identify the NetBackup catalog data in times of emergency. Label the catalog tapes with a unique prefix such as “DB” on the tape barcodes, so your operators can find the catalog tapes without delay.


■ Place NetBackup catalogs in specific robot slots.

Place a catalog backup tape in the first or last slot of a robot to more easily identify the tape in an emergency. This also allows for easy tape movement if manual tape handling is necessary.

■ Put the NetBackup catalog on different online storage than the data being backed up.

In the case of a site storage disaster, the catalogs of the backed-up data should not reside on the same disks as production data. The reason behind this is straightforward: you want to avoid the case where, if a disk drive loses production data, it also loses the catalog of the production data, resulting in increased downtime.

■ Regularly confirm the integrity of the NetBackup catalog.

On a regular basis, such as quarterly or after major operations or personnel changes, walk through the process of recovering a catalog from tape. This essential part of NetBackup administration can save hours in the event of a catastrophe.

Best practices: naming conventions

Use a consistent naming convention on all NetBackup master servers; examples are provided below. Use lowercase for all names. In most cases, case does not cause issues, but issues can occur when the installation comprises both UNIX and Windows master and media servers.

Policy names

One good naming convention for policies is platform_datatype_server(s).

Example 1: w2k_filesystems_trundle

This policy name designates a policy for a single Windows server doing file system backups.

Example 2: w2k_sql_servers

This policy name designates a policy for backing up a set of Windows 2000 SQL servers. Several servers may be backed up by this policy. Servers that are candidates for being included in a single policy are those running the same operating system and with the same backup requirements. Grouping servers within a single policy reduces the number of policies and eases the management of NetBackup.


Schedule names

Create a generic scheme for schedule naming. One recommended set of schedule names is daily, weekly, and monthly. Another recommended set is incremental, cumulative, and full. This convention keeps the management of NetBackup to a minimum. It also helps with the implementation of Vault, if your site uses Vault.

Storage unit/storage group names

A good naming convention for storage units is to name the storage unit after the media server and the type of data being backed up.

Two examples: mercury_filesystems and mercury_databases

where “mercury” is the name of the media server and “filesystems” and “databases” identify the type of data being backed up.


Section II: Performance tuning

Section II explains how to measure your current NetBackup performance, and gives general recommendations and examples for tuning NetBackup.

Section II includes these chapters:

■ Measuring performance

■ Tuning the NetBackup data transfer path

■ Tuning other NetBackup components

■ Tuning disk I/O performance

■ OS-related tuning factors

■ Additional resources


Chapter 7: Measuring performance

This chapter provides suggestions for measuring NetBackup performance.

This chapter includes the following sections:

■ “Overview” on page 76

■ “Controlling system variables for consistent testing conditions” on page 76

■ “Evaluating performance” on page 79

■ “Evaluating UNIX system components” on page 84

■ “Evaluating Windows system components” on page 85


Overview

The final measure of NetBackup performance is the length of time required for backup operations to complete (usually known as the backup window), or the length of time required for a critical restore operation to complete. However, measuring existing performance and improving future performance by means of those measurements calls for performance metrics more reliable and reproducible than simple wall clock time. This chapter discusses these metrics in more detail.

After establishing accurate metrics as described here, you can measure the current performance of NetBackup and your system components to compile a baseline performance benchmark. With a baseline, you can apply changes in a controlled way. By measuring performance after each change, you can accurately measure the effect of each change on NetBackup performance.

Controlling system variables for consistent testing conditions

For reliable performance evaluation, eliminate as many unpredictable variables as possible in order to create a consistent backup environment. Only a consistent environment will produce reliable and reproducible performance measurements. Some of the variables to consider are described below as they relate to the NetBackup server, the network, the NetBackup client, or the data itself.

Server variables

It is important to eliminate all other NetBackup activity from your environment when you are measuring the performance of a particular NetBackup operation. One area to consider is the automatic scheduling of backup jobs by the NetBackup scheduler.

When policies are created, they are usually set up to allow the NetBackup scheduler to initiate the backups. The NetBackup scheduler will initiate backups based on the traditional NetBackup frequency-based scheduling or on certain days of the week, month, or other time interval. This process is called calendar-based scheduling. As part of the backup policy definition, the Start Window is used to indicate when the NetBackup scheduler can start backups using either frequency-based or calendar-based scheduling. When you perform backups for the purpose of performance testing, this setup might interfere since the NetBackup scheduler may initiate backups unexpectedly, especially if the operations you intend to measure run for an extended period of time.


The simplest way to prevent the NetBackup scheduler from running backup jobs during your performance testing is to create a new policy specifically for use in performance testing and to leave the Start Window field blank in the schedule definition for that policy. This prevents the NetBackup scheduler from initiating any backups automatically for that policy. After creating the policy, you can run the backup on demand by using the Manual Backup command from the NetBackup Administration Console.

To prevent the NetBackup scheduler from running backup jobs unrelated to the performance test, you may want to set all other backup policies to inactive by using the Deactivate command from the NetBackup Administration Console. Of course, you must reactivate the policies to start running backups again.

You can use a user-directed backup to run the performance test as well. However, using the Manual Backup option for a policy is preferred. With a manual backup, the policy contains the entire definition of the backup job, including the clients and files that are part of the performance test. Running the backup manually, straight from the policy, leaves no doubt about which policy is used for the backup, and makes it easier to change and test individual backup settings from the policy dialog.
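
If you prefer to script this setup, the same steps can be done from the command line. The following is a minimal sketch, assuming a UNIX master server and placeholder policy and schedule names; verify the options against the NetBackup commands documentation for your release:

# Deactivate a production policy for the duration of the test:
/usr/openv/netbackup/bin/admincmd/bpplinfo production_policy -modify -inactive

# Start the test policy manually (equivalent to Manual Backup in the console):
/usr/openv/netbackup/bin/bpbackup -i -p perf_test_policy -s full_schedule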

Before you start the performance test, check the Activity Monitor to make sure there is no NetBackup processing currently in progress. Similarly, check the Activity Monitor after the performance test for unexpected activity (such as an unanticipated restore job) that may have occurred during the test.

Additionally, check for non-NetBackup activity on the server during the performance test and try to reduce or eliminate it.

Note: By default, NetBackup logging is set to a minimum level. To gather more logging information, set the legacy and unified logging levels higher and create the appropriate legacy logging directories. For details on how to use NetBackup logging, refer to the logging chapter of the NetBackup Troubleshooting Guide. Keep in mind that higher logging levels consume more disk space.

Network variables

Network performance is key to achieving optimum performance with NetBackup. Ideally, you would use a completely separate network for performance testing to avoid the possibility of skewing the results by encountering unrelated network activity during the course of the test.

In many cases, a separate network is not available. Ensure that non-NetBackup activity is kept to an absolute minimum during the time you are evaluating performance. If possible, schedule testing for times when backups are not active. Even occasional short bursts of network activity may be enough to skew the results during portions of the performance test. If you are sharing the network with production backups occurring for other systems, you must account for this activity during the performance test.

Another network variable you must consider is host name resolution. NetBackup depends heavily upon a timely resolution of host names to operate correctly. If you have any delays in host name resolution, including reverse name lookup to identify a server name from an incoming connection from a certain IP address, you may want to eliminate that delay by using the HOSTS (Windows) or /etc/hosts (UNIX) file for host name resolution on systems involved in your performance test environment.
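
For example, static entries such as the following remove DNS delays from the test path (the host names and addresses are illustrative only):

192.168.10.5    master1    # NetBackup master server
192.168.10.6    media1     # media server
192.168.10.7    client1    # client under test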

Client variables

Make sure the client system is in a relatively quiescent state during performance testing. A lot of activity, especially disk-intensive activity such as virus scanning on Windows, will limit the data transfer rate and skew the results of your tests.

One possible mistake is to allow another NetBackup server, such as a production backup server, to have access to the client during the course of the test. This may result in NetBackup attempting to back up the same client to two different servers at the same time, which would severely impact the results of a performance test in progress at that time.

Different file systems have different performance characteristics. For example, comparing data throughput results from operations on a UNIX VxFS or Windows FAT file system to those from operations on a UNIX NFS or Windows NTFS system may not be valid, even if the systems are otherwise identical. If you do need to make such a comparison, factor the difference between the file systems into your performance evaluation testing, and into any conclusions you may draw from that testing.

Data variables

Controlling the data you are backing up improves the repeatability of performance testing. If possible, move the data you will use for testing backups to its own drive or logical partition (not a mirrored drive), and defragment the drive before you begin performance testing. For testing restores, start with an empty disk drive or a recently defragmented disk drive with ample empty space. This reduces the impact of disk fragmentation on the NetBackup performance test and yields more consistent results between tests.

Similarly, for testing backups to tape, always start each test run with an empty piece of media. You can do this by expiring existing images for that piece of media through the Catalog node of the NetBackup Administration Console, or by running the bpexpdate command. Another approach is to use the bpmedia command to freeze any media containing existing backup images so that NetBackup selects a new piece of media for the backup operation. This step helps reduce the impact of tape positioning on the NetBackup performance test and yields more consistent results between tests. It also reduces the mounting and unmounting of media that holds NetBackup catalog images and that cannot be used for normal backups.
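
For example, either of the following commands prepares a piece of media before a test run (the media ID A00001 is a placeholder):

# Expire all images on the media so it can be overwritten:
/usr/openv/netbackup/bin/admincmd/bpexpdate -m A00001 -d 0

# Or freeze the media so NetBackup selects a fresh piece of media instead:
/usr/openv/netbackup/bin/admincmd/bpmedia -freeze -m A00001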

When you test restores from tape, always restore from the same backup image on the tape to achieve consistent results between tests.

In general, using a large data set will generate a more reliable and reproducible performance test than a small data set. A performance test using a small data set would probably be skewed by startup and shutdown overhead within the NetBackup operation. These variables are difficult to keep consistent between test runs and are therefore likely to produce inconsistent test results. A large data set minimizes the effect of startup and shutdown times.

Design the makeup of the dataset to represent the makeup of the data in the intended production environment. For example, if the data set in the production environment contains many small files on file servers, then the data set for the performance testing should also contain many small files. A representative test data set will more accurately predict the NetBackup performance that you can reasonably expect in a production environment.

The type of data can help reveal bottlenecks in the system. Files consisting of non-compressible (random) data cause the tape drive to run at its lower rated speed. As long as the other components of the data transfer path are keeping up, you may identify the tape drive as the bottleneck. On the other hand, files consisting of highly-compressible data can be processed at higher rates by the tape drive when hardware compression is enabled. This may result in a higher overall throughput and possibly expose the network as the bottleneck.

Many values in NetBackup provide data amounts in kilobytes and rates in kilobytes per second. For greater accuracy, divide by 1024 rather than rounding off to 1000 when you convert from kilobytes to megabytes or from kilobytes per second to megabytes per second.
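
For example, a job that NetBackup reports as 524,288 kilobytes is exactly 512 megabytes (524,288 / 1024 = 512); dividing by 1000 would overstate it as roughly 524 megabytes.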

Evaluating performance

There are two primary locations from which to obtain NetBackup data throughput statistics: the NetBackup Activity Monitor and the NetBackup All Log Entries report. The choice of location depends on the type of NetBackup operation you are measuring: non-multiplexed backup, restore, or multiplexed backup.


You can obtain statistics for all three types of operations from the NetBackup All Log Entries report. You can obtain statistics for non-multiplexed backup or restore operations from the NetBackup Activity Monitor. For multiplexed backup operations, you can obtain the overall statistics from the All Log Entries report after all the individual backup operations that are part of the multiplexed backup are complete. In this case, the statistics available in the Activity Monitor for each individual backup operation are relative only to that operation, and do not reflect the actual total data throughput to the tape drive.

There may be small differences between the statistics available from these two locations due to slight differences in rounding techniques between the entries in the Activity Monitor and the entries in the All Logs report. For a given type of operation, choose either the Activity Monitor or the All Log Entries report and consistently record your statistics only from that location. In both the Activity Monitor and the All Logs report, the data-streaming speed is reported in kilobytes per second. If a backup or restore is repeated, the reported speed can vary between repetitions depending on many factors, including the availability of system resources and system utilization, but the reported speed can be used to assess the performance of the data-streaming process.

The statistics from the NetBackup error logs show the actual amount of time spent reading and writing data to and from tape. This does not include time spent mounting and positioning the tape. Cross-referencing the information from the error logs with data from the bpbkar log on the NetBackup client (showing the end-to-end elapsed time of the entire process) indicates how much time was spent on operations unrelated to reading and writing to and from the tape.

To evaluate performance through the NetBackup Activity Monitor

1 Run the backup or restore job.

2 Open the NetBackup Activity Monitor.

3 Verify that the backup or restore job completed successfully.

The Status column should contain a zero (0).

4 View the log details for the job by selecting the Actions > Details menu option, or by double-clicking on the entry for the job. Select the Detailed Status tab.

5 Obtain the NetBackup performance statistics from the following fields in the Activity Monitor:

■ Start Time/End Time: These fields show the time window during which the backup or restore job took place.


■ Elapsed Time: This field shows the total elapsed time from when the job was initiated to job completion; it can be used as an indication of total wall clock time for the operation.

■ KB per Second: This is the data throughput rate.

■ Kilobytes: Compare this value to the amount of data. Although it should be comparable, the NetBackup data amount will be slightly higher because of administrative information, known as metadata, saved for the backed up data.

For example, if you display properties for a directory containing 500 files, each 1 megabyte in size, the directory shows a size of 500 megabytes, or 524,288,000 bytes, which is equal to 512,000 kilobytes. The NetBackup report may show 513,255 kilobytes written, reporting 1255 kilobytes more than the file size of the directory. This is true for a flat directory. Subdirectory structures may diverge due to the way the operating system tracks used and available space on the disk. Also, be aware that the operating system may be reporting how much space was allocated for the files in question, not just how much data is actually there. For example, if the allocation block size is 1 kilobyte, 1000 1-byte files will report a total size of 1 megabyte, even though 1 kilobyte of data is all that exists. The greater the number of files, the larger this discrepancy may become.

To evaluate performance using the All Log Entries report

1 Run the backup or restore job.

2 Run the All Log Entries report from the NetBackup Reports node in the NetBackup Administration Console. Be sure that the Date/Time Range that you select covers the time period during which the job was run.

3 Verify that the job completed successfully by searching for an entry such as “the requested operation was successfully completed” for a backup, or “successfully read (restore) backup id...” for a restore.

4 Obtain the NetBackup performance statistics from the following entries in the report.

Note: The messages shown here will vary according to the locale setting of the master server.

Entry: started backup job for client <name>, policy <name>, schedule <name> on storage unit <name>
Statistic: The Date and Time fields for this entry show the time at which the backup job started.

Entry: successfully wrote backup id <name>, copy <number>, <number> Kbytes
Statistic: For a multiplexed backup, this entry shows the size of the individual backup job, and the Date and Time fields show the time at which the job finished writing to the storage device. The overall statistics for the multiplexed backup group, including the data throughput rate to the storage device, are found in a subsequent entry.

Entry: successfully wrote <number> of <number> multiplexed backups, total Kbytes <number> at Kbytes/sec
Statistic: For multiplexed backups, this entry shows the overall statistics for the multiplexed backup group, including the data throughput rate.

Entry: successfully wrote backup id <name>, copy <number>, fragment <number>, <number> Kbytes at <number> Kbytes/sec
Statistic: For non-multiplexed backups, this entry combines the information of the previous two multiplexed-backup entries into one entry showing the size of the backup job, the data throughput rate, and the time, in the Date and Time fields, at which the job finished writing to the storage device.

Entry: the requested operation was successfully completed
Statistic: The Date and Time fields for this entry show the time at which the backup job completed. This value is later than the "successfully wrote" entry above because it includes extra processing time at the end of the job for tasks such as NetBackup image validation.

Entry: begin reading backup id <name>, (restore), copy <number>, fragment <number> from media id <name> on drive index <number>
Statistic: The Date and Time fields for this entry show the time at which the restore job started reading from the storage device. (The latter part of the entry is not shown for restores from disk, as it does not apply.)

Entry: successfully restored from backup id <name>, copy <number>, <number> Kbytes
Statistic: For a multiplexed restore (generally speaking, all restores from tape are multiplexed restores, as non-multiplexed restores require additional action from the user), this entry shows the size of the individual restore job, and the Date and Time fields show the time at which the job finished reading from the storage device. The overall statistics for the multiplexed restore group, including the data throughput rate, are found in a subsequent entry.

Entry: successfully restored <number> of <number> requests <name>, read total of <number> Kbytes at <number> Kbytes/sec
Statistic: For multiplexed restores, this entry shows the overall statistics for the multiplexed restore group, including the data throughput rate.

Entry: successfully read (restore) backup id media <number>, copy <number>, fragment <number>, <number> Kbytes at <number> Kbytes/sec
Statistic: For non-multiplexed restores (generally speaking, only restores from disk are treated as non-multiplexed restores), this entry combines the information of the previous two multiplexed-restore entries into one entry showing the size of the restore job, the data throughput rate, and the time, in the Date and Time fields, at which the job finished reading from the storage device.


Additional information

The NetBackup All Log Entries report also contains entries similar to those described above for other NetBackup operations, such as the image duplication operations used to create additional copies of a backup image. Those entries have a similar format and may be useful for analyzing the performance of NetBackup for those operations.

The bptm debug log file for tape backups (or bpdm log file for disk backups) will contain the entries that are in the All Log Entries report, as well as additional detail about the operation that may be useful for performance analysis. One example of this additional detail is the intermediate data throughput rate message for multiplexed backups, as shown below:

... intermediate after <number> successful, <number> Kbytes at <number> Kbytes/sec

This message is generated whenever an individual backup job completes that is part of a multiplexed backup group. In the debug log file for a multiplexed backup group consisting of three individual backup jobs, for example, there could be two intermediate status lines, then the final (overall) throughput rate.

For a backup operation, the bpbkar debug log file will also contain additional detail about the operation that may be useful for performance analysis.

Keep in mind, however, that writing the debug log files during the NetBackup operation introduces some overhead that would not normally be present in a production environment. Factor that additional overhead into any calculations done on data captures while debug log files are in use.

The information in the All Logs report is also found in /usr/openv/netbackup/db/error (UNIX) or install_path\NetBackup\db\error (Windows).

See the NetBackup Troubleshooting Guide to learn how to set up NetBackup to write these debug log files.



Evaluating UNIX system components

In addition to evaluating NetBackup's performance, you should also verify that common system resources are in adequate supply.

Monitoring CPU load

Use the vmstat utility to monitor CPU load and memory use. Add up the "us" and "sy" CPU columns to get the total CPU load on the system (refer to the vmstat man page for details). The vmstat scan rate indicates the amount of swapping activity taking place.

The sar command also provides insight into UNIX memory usage.
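
For example (a typical invocation; column names vary slightly by platform):

# Report CPU and memory activity every 5 seconds, 12 times:
vmstat 5 12

# Where the sysstat tools are installed, sar gives a similar CPU view:
sar -u 5 12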

Measuring performance independent of tape or disk output

It is possible to measure the disk (read) component of NetBackup's speed independent of the network and tape components. Two techniques are described below. The first, using bpbkar, is easier; the second may be helpful in more limited circumstances.

In these procedures, the master server is the client.

To measure disk I/O using bpbkar

1 Turn on the legacy bpbkar log by ensuring that the bpbkar directory exists.

UNIX: /usr/openv/netbackup/logs/bpbkar

Windows: install_path\NetBackup\logs\bpbkar

2 Set logging level to 1.

3 Enter the following:

UNIX:

/usr/openv/netbackup/bin/bpbkar -nocont -dt 0 -nofileinfo -nokeepalives filesystem > /dev/null

where filesystem is the path being backed up.

Windows:

install_path\NetBackup\bin\bpbkar32 -nocont X:\ > NUL

where X:\ is the path being backed up.

4 Check the time it took NetBackup to move the data from the client disk:

UNIX: The start time is the first PrintFile entry in the bpbkar log, the end time is the entry “Client completed sending data for backup,” and the amount of data is given in the entry Total Size.

Windows: Check the bpbkar log for the entry Elapsed time.
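
As a quick cross-check on a UNIX client, the same command can be wrapped in time(1) to capture elapsed wall clock time directly (the file system path is a placeholder):

time /usr/openv/netbackup/bin/bpbkar -nocont -dt 0 -nofileinfo -nokeepalives /export/home > /dev/null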


To measure disk I/O using the bpdm_dev_null touch file (UNIX only)

For UNIX systems, the procedure below can be useful as a follow-on to the bpbkar procedure (above). If the bpbkar procedure shows that the disk read performance is not the bottleneck and does not help isolate the problem, then the bpdm_dev_null procedure described below may be helpful. If the bpdm_dev_null procedure shows poor performance, the bottleneck is somewhere in the data transfer between the bpbkar process on the client and the bpdm process on the server. The problem may involve the network, or shared memory (such as not enough buffers, or buffers that are too small). To change shared memory settings, see “Shared memory (number and size of data buffers)” on page 102.

Caution: If not used correctly, the following procedure can lead to data loss. Touching the bpdm_dev_null file redirects all disk backups to /dev/null, not just the backups that use the storage unit created by this procedure. You should disable active production policies for the duration of this test and remove the bpdm_dev_null file as soon as this test is complete.

1 Enter the following command:

touch /usr/openv/netbackup/bpdm_dev_null

Note: The bpdm_dev_null file redirects any backup that uses a disk storage unit to /dev/null.

2 Create a new disk storage unit, using /tmp or some other directory as the image directory path.

3 Create a policy that uses the new disk storage unit.

4 Run a backup using this policy. NetBackup will create a file in the storage unit directory as if this were a real backup to disk. This degenerate image file will be zero bytes long.

5 To remove the zero-length file and clear the NetBackup catalog of a backup that cannot be restored, run this command:

/usr/openv/netbackup/bin/admincmd/bpexpdate -backupid backupid -d 0

where backupid is the name of the file residing in the storage unit directory.

Evaluating Windows system components

In addition to evaluating NetBackup's performance, you should also verify that common system resources are in adequate supply. You may want to use the Windows Performance Monitor utility included with Windows. For information about using the Performance Monitor, refer to your Microsoft documentation.

The Performance Monitor organizes information by object, counter, and instance.

An object is a system resource category, such as a processor or physical disk. Properties of an object are counters. Counters for the Processor object include %Processor Time, which is the default counter, and Interrupts/sec. Duplicate counters are handled via instances. For example, to monitor the %Processor Time of a specific CPU on a multiple CPU system, the Processor object is selected, then the %Processor Time counter for that object is selected, followed by the specific CPU instance for the counter.

When you use the Performance Monitor, you can view data in real time format or collect the data in a log for future analysis. Specific components to evaluate include CPU load, memory use, and disk load.

Note: It is recommended that a remote host be used for monitoring of the test host, to reduce load that might otherwise skew results.

Monitoring CPU load

To determine whether the system has enough power to accomplish the requested tasks, monitor the % Processor Time counter for the Processor object to determine how hard the CPU is working, and monitor the Processor Queue Length counter for the System object to determine how many processes are actively waiting for the processor.

For % Processor Time, values of 0 to 80 percent are generally considered safe. Values from 80 percent to 90 percent indicate that the system is being pushed hard, while consistent values above 90 percent indicate that the CPU is a bottleneck.

Spikes approaching 100 percent are normal and do not necessarily indicate a bottleneck. However, if sustained loads approaching 100 percent are observed, efforts to tune the system to decrease process load or an upgrade to a faster processor should be considered.

Sustained Processor Queue Lengths greater than two indicate too many threads are waiting to be executed. To correctly monitor the Processor Queue Length counter, the Performance Monitor must be tracking a thread-related counter. If you consistently see a queue length of 0, verify that a non-zero value can be displayed.


Note: The default scale for the Processor Queue Length may not be equal to 1. Be sure to read the data correctly. For example, if the default scale is 10x, then a reading of 40 actually means that only 4 processes are waiting.

Monitoring memory use

Memory is a critical resource for increasing the performance of backup operations. When you examine memory usage, view information on:

■ Committed Bytes. Committed Bytes displays the size of virtual memory that has been committed, as opposed to reserved. Committed memory must have disk storage available or must not require the disk storage because the main memory is large enough. If the number of Committed Bytes approaches or exceeds the amount of physical memory, you may encounter issues with page swapping.

■ Page Faults/sec. Page Faults/sec is a count of the page faults in the processor. A page fault occurs when a process refers to a virtual memory page that is not in its Working Set in main memory. A high Page Fault rate may indicate insufficient memory.

Monitoring disk load

To use disk performance counters to monitor disk performance in the Performance Monitor, you may need to enable those counters. Windows may not have enabled the disk performance counters by default on your system.

For more information about disk performance counters, from a command prompt, type:

diskperf -help

To enable these counters and allow disk monitoring:

Note: On a Windows 2000 system, this is set by default.

1 From a command prompt, type:

diskperf -y

2 Reboot the system.

To disable these counters and cancel disk monitoring:

1 From a command prompt, type:

diskperf -n


2 Reboot the system.

When you monitor disk performance, use the %Disk Time counter for the PhysicalDisk object to track the percentage of elapsed time that the selected disk drive is busy servicing read or write requests.

Also monitor the Avg. Disk Queue Length counter; values greater than 1 that persist for more than a second indicate that multiple processes are waiting for the disk to service their requests.

Several techniques may be used to increase disk performance, including:

■ Check the fragmentation level of the data. A highly fragmented disk limits throughput levels. Use a disk maintenance utility to defragment the disk.

■ Consider adding additional disks to the system to increase performance. If multiple processes are attempting to log data simultaneously, dividing the data among multiple physical disks may help.

■ Determine if the data transfer involves a compressed disk. The use of Windows compression to automatically compress the data on the drive adds additional overhead to disk read or write operations, adversely affecting the performance of NetBackup. Use Windows compression only if it is needed to avoid a disk full condition.

■ Consider converting to a system based on a Redundant Array of Inexpensive Disks (RAID). Though more expensive, RAID devices generally offer greater throughput and, depending on the RAID level employed, improved reliability.

■ Determine what type of controller technology is being used to drive the disk. Consider if a different system would yield better results. See the table “Drive controller data transfer rates” on page 21 for throughput rates for common controllers.


Chapter 8: Tuning the NetBackup data transfer path

This chapter provides guidelines and recommendations for improving performance in the data transfer path of NetBackup.

This chapter includes the following sections:

■ “Overview” on page 90

■ “The data transfer path” on page 90

■ “Basic tuning suggestions for the data path” on page 91

■ “NetBackup client performance” on page 95

■ “NetBackup network performance” on page 96

■ “NetBackup server performance” on page 102

■ “NetBackup storage device performance” on page 126


Overview

This chapter contains information on ways to optimize NetBackup. It is not intended to provide tuning advice for particular systems. If you would like help fine-tuning your NetBackup installation, please contact Symantec Consulting Services.

Before examining the factors that affect backup performance, please note that an important first step is to ensure that your system meets NetBackup’s recommended minimum requirements. Refer to the NetBackup Installation Guide and NetBackup Release Notes for information about these requirements. Additionally, Symantec recommends that you have the most recent NetBackup software patch installed.

Many performance issues can be traced to hardware or other environmental issues. A basic understanding of the entire data transfer path is essential in determining the maximum obtainable performance in your environment. Poor performance is often the result of poor planning, which can be based on unrealistic expectations of any particular component of the data transfer path.

The data transfer path

The component that limits the overall performance of NetBackup is, of course, the slowest component in the backup system. For example, a fast tape drive combined with an overloaded server yields poor performance. Similarly, a fast tape drive on a slow network also yields poor performance.

The backup system is referred to as the data transfer path. The path usually starts at the data on the disk and ends with a backup copy on tape or disk.

This chapter subdivides the standard NetBackup data transfer path into four basic components: the NetBackup client, the network, the NetBackup server, and the storage device.

Note: This chapter discusses NetBackup performance evaluation and improvement from a testing perspective. It describes ways to isolate performance variables in order to get a sense of the effect each variable has on overall system performance, and to optimize NetBackup performance with regard to that variable. It may not be possible to optimize every variable on your production system.

The requirements for database backups may not be the same as for file system backups. This information applies to file system backups unless otherwise noted.


Basic tuning suggestions for the data path

In every backup system there is always room for improvement. Obtaining the best performance from a backup infrastructure is not complex, but it requires careful review of the many factors that can affect processing. The first step is to gain an accurate assessment of each hardware, software, and networking component in the backup data path. Many performance problems can be resolved before you attempt to change NetBackup parameters.

NetBackup software offers plenty of resources to help isolate performance problems and assess the impact of configuration changes. However, it is essential to thoroughly test both backup and restore processes after making any changes to the NetBackup configuration parameters.

This section provides practical ideas to improve your backup system performance and avoid bottlenecks. You can find more details on several of the topics and solutions described here in the following NetBackup manuals:

Veritas NetBackup System Administrator’s Guide for UNIX, Volumes I & II

Veritas NetBackup System Administrator’s Guide for Windows, Volumes I & II

Veritas NetBackup Troubleshooting Guide (for UNIX and Windows)

Tuning suggestions:

■ Use multiplexing.

Multiplexing is a NetBackup option that lets you write multiple data streams from several clients at once to a single tape drive or several tape drives. Multiplexing can be used to improve the backup performance of slow clients, multiple slow networks, and many small backups (such as incremental backups). Multiplexing reduces the time each job spends waiting for a device to become available, thereby making the best use of the transfer rate of your storage devices.

Multiplexing is not recommended when restore speed is of paramount interest or when your tape drives are slow. To reduce the impact of multiplexing on restore times, you can improve your restore performance by reducing the maximum fragment size for the storage units. If the fragment size is small, so that the backup image is contained in several fragments, a NetBackup restore can quickly skip to the specific fragment containing the file to be restored. In contrast, if the fragment size is large enough to contain the entire image, the NetBackup restore starts at the very beginning of the image and reads through the image until it finds the desired file.

Multiplexed backups can be de-multiplexed to improve restore performance by using bpduplicate to move fragmented images to a sequential image on a new tape.
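
For example, a multiplexed image can be duplicated to a sequential copy with a command along these lines (the backup ID and destination storage unit name are placeholders; see the NetBackup commands documentation for the full option list):

/usr/openv/netbackup/bin/admincmd/bpduplicate -backupid client1_1135879500 -dstunit mercury_tape_stu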


Refer to the Veritas NetBackup System Administrator’s Guide for more information about using multiplexing.

■ Consider striping a disk volume across drives.

A striped set of disks will pull data from all disk drives concurrently, allowing faster data transfers between disk drives and tape drives.

■ Maximize the use of your backup windows.

You can configure all your incremental backups to run at the same time every day and stagger your full backups across multiple days. Large systems can be backed up over the weekend while smaller systems are spread over the week. You can even start full backups earlier than the incremental backups; they might finish before the incrementals and give you back all or most of your backup window.

■ Convert large clients into media servers to decrease backup times and reduce network traffic.

Any machine with locally-attached drives can be used as a media server to back up itself or other systems. By converting large client systems into media servers, your backup data no longer travels over the network (except for catalog data), and you get the fastest transfer speeds afforded by locally-attached devices. Another benefit of media servers is that you can use them to balance the load of backing up other clients for your NetBackup master. A media server can back up clients on a network where it has a local connection, thus saving network traffic for a master that might have to go over routers to communicate with those clients. A special case of a media server is a SAN Media Server, which is a NetBackup media server that backs up itself only and comes at a lower cost than a regular media server.

■ Use dedicated private networks to decrease backup times and network traffic.

If you configure one or more networks dedicated to backups, you can reduce the time it takes to back up the systems on those networks and reduce or eliminate traffic on your enterprise networks. In addition, you can convert to faster technologies and even back up your systems at any time without affecting the enterprise network's performance (assuming that users do not mind the system loads while backups take place).

■ Avoid a concentration of servers on one network.

If you have a concentration of large servers that you back up over the same general network, you might want to convert some of these into media servers or attach them to private backup networks. Doing either will decrease backup times and reduce network traffic for your other backups.

■ Use dedicated backup servers to perform your backups.


When selecting a server for performing your backups, use a dedicated system just for performing backups. A server that shares the load of running several applications unrelated to backups can severely affect your performance and maintenance windows.

■ Use drives from tape libraries attached to other systems.

You can use tape drives from a tape library attached to your master server or another media server, or you can dedicate a library to your large servers. Systems using these drives become media servers that can back up themselves and others through locally-attached drives. The robotic control arm of the library can be connected to either the master server or the media server.

■ Consider the requirements of backing up your catalog.

Remember that the NetBackup catalog needs to be backed up. To facilitate NetBackup catalog recovery, it is highly recommended that the master server have access to a dedicated tape drive, either standalone or within a robotic library.

■ Try to level the backup load.

You can use multiple drives to reduce backup times; however, since you may not be able to split data streams evenly, you may need to experiment with the configuration of the streams or the configuration of the NetBackup policies to spread the load across multiple drives.

■ Bandwidth limiting

The bandwidth limiting feature lets you restrict the network bandwidth consumed by one or more NetBackup clients on a network. The bandwidth setting appears under Host Properties > Master Servers > Properties. The actual limiting occurs on the client side of the backup connection. This feature restricts bandwidth only during backups; restores are unaffected.

When a backup starts, NetBackup reads the bandwidth limit configuration and then determines the appropriate bandwidth value and passes it to the client. As the number of active backups increases or decreases on a subnet, NetBackup dynamically adjusts the bandwidth limiting on that subnet. If additional backups are started, the NetBackup server instructs the other NetBackup clients running on that subnet to decrease their bandwidth setting. Similarly, bandwidth per client is increased if the number of clients decreases. Changes to the bandwidth value occur on a periodic basis rather than as backups stop and start. This characteristic can reduce the number of bandwidth value changes.
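
On a UNIX master server, bandwidth limiting can also be configured with LIMIT_BANDWIDTH entries in bp.conf. A sketch, assuming the rate is specified in kilobytes per second and using placeholder addresses (verify the syntax in the System Administrator's Guide for your release):

# /usr/openv/netbackup/bp.conf on the master server:
# limit clients in the range 192.168.10.1 through 192.168.10.50 to 500 KB/sec
LIMIT_BANDWIDTH = 192.168.10.1 192.168.10.50 500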

■ Load balancing

NetBackup provides ways to balance loads between servers, clients, policies, and devices. Note that these settings may interact with each other:


compensating for one issue can cause another. The best approach is to use the defaults unless you anticipate or encounter an issue.

■ Adjust the backup load on the server.

Change the Limit jobs per policy attribute for one or more of the policies that the server is backing up. For example, decreasing Limit jobs per policy reduces the load on a server on a specific subnetwork. Reconfiguring policies or schedules to use storage units on other servers also reduces the load. Another possibility is to use bandwidth limiting on one or more clients.

■ Adjust the backup load on the server during specific time periods only.

Reconfigure schedules that execute during the time periods of interest, so they use storage units on servers that can handle the load (assuming you are using media servers).

■ Adjust the backup load on the clients.

Change the Maximum jobs per client global attribute. For example, increasing Maximum jobs per client increases the number of concurrent jobs that any one client can process and therefore increases the load.

■ Reduce the time to back up clients.

Increase the number of jobs that clients can perform concurrently, or use multiplexing. Another possibility is to increase the number of jobs that the server can perform concurrently for the policy or policies that are backing up the clients.

■ Give preference to a policy.

Increase the Limit jobs per policy attribute value for the preferred policy relative to other policies. Alternatively, increase the priority for the policy.

■ Adjust the load between fast and slow networks.

Increase the values of Limit jobs per policy and Maximum jobs per client for the policies and clients on a faster network. Decrease these values for slower networks. Another solution is to use bandwidth limiting.

■ Limit the backup load produced by one or more clients.

Use bandwidth limiting to reduce the bandwidth used by the clients.

■ Maximize the use of devices

Use multiplexing. Also, allow as many concurrent jobs per storage unit, policy, and client as possible without causing server, client, or network performance issues.

■ Prevent backups from monopolizing devices.


Limit the number of devices that NetBackup can use concurrently for each policy or limit the number of drives per storage unit. Another approach is to exclude some of your devices from Media Manager control.

NetBackup client performance

This section lists some factors to consider when evaluating the NetBackup client component of the NetBackup data transfer path. Examine these conditions to identify possible changes that may improve the overall performance of NetBackup.

■ Disk fragmentation. Fragmentation is a condition where data is scattered around the disk in non-contiguous blocks. This condition severely impacts the data transfer rate from the disk. Fragmentation can be repaired using hard disk management utility software offered by a variety of vendors.

■ Number of disks. Consider adding additional disks to the system to increase performance. If multiple processes are attempting to log data simultaneously, dividing the data among multiple physical disks may help.

■ Disk arrays. Consider converting to a system based on a Redundant Array of Inexpensive Disks (RAID). Though more expensive, RAID devices generally offer greater throughput and, depending on the RAID level employed, improved reliability.

■ The type of controller technology being used to drive the disk. Consider if a different system would yield better results.

■ Virus scanning. If virus scanning is turned on for the system, it may severely impact the performance of the NetBackup client during a backup or restore operation. This may be especially true for systems such as large Windows file servers. You may wish to disable virus scanning during backup or restore operations to avoid the impact on performance.

■ NetBackup notify scripts. The bpstart_notify.bat and bpend_notify.bat scripts are very useful in certain situations, such as shutting down a running application to back up its data. However, these scripts must be written with care to avoid any unnecessary lengthy delays at the start or end of the backup job. If the scripts are not performing tasks essential to the backup operation, you may want to remove them.

■ NetBackup software location. If the data being backed up is located on the same physical disk drive as the NetBackup installation, performance may be adversely affected, especially if NetBackup debug log files are being used. If they are being used, the extent of the degradation will be greatly influenced by the NetBackup verbose setting for the debug logs. If possible, install NetBackup on a separate physical disk drive to avoid this disk drive contention.

■ Snapshots (hardware or software). If snapshots need to be taken before the actual backup of data, the time needed to take the snapshot will affect the overall performance.

■ Job tracker. If the NetBackup Client Job Tracker is running on the client, then NetBackup will gather an estimate of the data to be backed up prior to the start of a backup job. Gathering this estimate will affect the startup time, and therefore the data throughput rate, because no data is being written to the NetBackup server during this estimation phase. You may wish to avoid running the NetBackup Client Job Tracker to avoid this delay.

■ Client location. You may wish to consider adding a locally attached tape device to the client and changing the client to a NetBackup media server if you have a substantial amount of data on the client. For example, backing up 100 gigabytes of data to a locally attached tape drive will generally be more efficient than backing up the same amount of data across a network connection to a NetBackup server. Of course, there are many variables to consider, such as the bandwidth available on the network, that will affect the decision to back up the data to a locally attached tape drive as opposed to moving the data across the network.

■ Determining the theoretical performance of the NetBackup client software. You can use the NetBackup client command bpbkar (UNIX) or bpbkar32 (Windows) to determine the speed at which the NetBackup client can read the data to be backed up from the disk drive. This may eliminate data read speed as a possible performance bottleneck. For the procedure, see “Measuring performance independent of tape or disk output” on page 84.
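As a rough sketch of that procedure on a UNIX client, you can time a bpbkar read of a file system while discarding its output; the -nocont flag and the path shown here are illustrative, so verify the exact invocation against the referenced procedure for your release:

# Read /home with bpbkar and throw the data away, so only the
# client-side disk read speed is measured (no network, no tape)
time /usr/openv/netbackup/bin/bpbkar -nocont /home > /dev/null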

NetBackup network performance

To improve the overall performance of NetBackup, consider the following network components and factors.

Network interface settings

Make sure your network connections are properly installed and configured. Note the following:

■ Network interface cards (NICs) for NetBackup servers and clients must be set to full-duplex.

■ Both ends of each network cable (the NIC card and the switch) must be set identically as to speed and mode (both NIC and switch must be at full duplex). Otherwise, link down, excessive/late collisions, and errors will result.

■ If auto-negotiate is being used, make sure that both ends of the connection are set at the same mode and speed. The higher the speed, the better.

■ In addition to NICs and switches, all routers must be set to full duplex.

Consult the operating system documentation for instructions on how to determine and change the NIC settings.

Note: Using AUTOSENSE may cause network problems and performance issues.

Network load

There are two key considerations to monitor when you evaluate remote backup performance:

■ The amount of network traffic

■ The amount of time that network traffic is high

Small bursts of high network traffic for short durations will have some negative impact on the data throughput rate. However, if the network traffic remains consistently high for a significant amount of time during the operation, the network component of the NetBackup data transfer path will very likely be the bottleneck. Always try to schedule backups during times when network traffic is low. If your network is heavily loaded, you may wish to implement a secondary network which can be dedicated to backup and restore traffic.

NetBackup media server network buffer size

The NetBackup media server has a tunable parameter that you can use to adjust the size of the network communications buffer used to receive data from the network (a backup) or write data to the network (a restore). This parameter specifies the value that is used to set the network buffer size for backups and restores.

UNIX

The default value for this parameter is 32032.

Windows

The default value for this parameter is derived from the NetBackup data buffer size (see below for more information about the data buffer size) using the following formula:

For backup jobs: (<data_buffer_size> * 4) + 1024

For restore jobs: (<data_buffer_size> * 2) + 1024


For tape: because the default value for the NetBackup data buffer size is 65536 bytes, this formula results in a default NetBackup network buffer size of 263168 bytes for backups and 132096 bytes for restores.

For disk: because the default value for the NetBackup data buffer size is 262144 bytes, this formula results in a default NetBackup network buffer size of 1049600 bytes for backups and 525312 bytes for restores.

To set this parameter, create the following files:

UNIX

/usr/openv/netbackup/NET_BUFFER_SZ

/usr/openv/netbackup/NET_BUFFER_SZ_REST

Windows

install_path\NetBackup\NET_BUFFER_SZ

install_path\NetBackup\NET_BUFFER_SZ_REST

These files contain a single integer specifying the network buffer size in bytes. For example, to use a network buffer size of 64 Kilobytes, the file would contain 65536. If the files contain the integer 0 (zero), the default value for the network buffer size is used.

If the NET_BUFFER_SZ file exists, and the NET_BUFFER_SZ_REST file does not exist, the contents of NET_BUFFER_SZ will specify the network buffer size for both backup and restores.

If the NET_BUFFER_SZ_REST file exists, its contents will specify the network buffer size for restores.

If both files exist, the NET_BUFFER_SZ file will specify the network buffer size for backups, and the NET_BUFFER_SZ_REST file will specify the network buffer size for restores.
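For example, a minimal sketch on a UNIX media server (the sizes are illustrative; remember the values are in bytes):

# 256 KB network buffer for backups, 64 KB for restores
echo 262144 > /usr/openv/netbackup/NET_BUFFER_SZ
echo 65536 > /usr/openv/netbackup/NET_BUFFER_SZ_REST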

Because local backup or restore jobs on the media server do not send data over the network, this parameter has no effect on those operations. It is used only by the NetBackup media server processes which read from or write to the network, specifically, the bptm or bpdm processes. It is not used by any other NetBackup processes on a master server, media server, or client.

This parameter is the counterpart on the media server to the Communications Buffer Size parameter on the client, which is described below. The network buffer sizes are not required to be the same on all of your NetBackup systems for NetBackup to function properly; however, setting the Network Buffer Size parameter on the media server and the Communications Buffer Size parameter on the client (see below) to the same value has significantly improved the throughput of the network component of the NetBackup data transfer path in some installations.

Similarly, the network buffer size does not have a direct relationship with the NetBackup data buffer size (described under “Shared memory (number and size of data buffers)” on page 102). They are separately tunable parameters. However, setting the network buffer size to a substantially larger value than the data buffer has achieved the best performance in many NetBackup installations.

Synthetic full backups on AIX NetBackup servers

If synthetic full backups on AIX NetBackup servers are running slowly, increase the NET_BUFFER_SZ network buffer to 262144 (256 KB). To do this, create a file called /usr/openv/netbackup/NET_BUFFER_SZ and change the default setting (32032) to 262144. This file is unformatted, and should contain only the size in bytes:

$ cat /usr/openv/netbackup/NET_BUFFER_SZ
262144
$

Changing this value can affect backup and restore operations on the media servers. Test backups and restores to ensure that the change you make does not negatively impact performance.

NetBackup client communications buffer size

The NetBackup client has a tunable parameter that you can use to adjust the size of the network communications buffer used to write data to the network for backups.

This client parameter is the counterpart on the client to the Network Buffer Size parameter on the media server, described above. As mentioned, the network buffer sizes are not required to be the same on all of your NetBackup systems for NetBackup to function properly. However, setting the Network Buffer Size parameter on the media server (see above) and the Communications Buffer Size parameter on the client to the same value achieves the best performance in some NetBackup installations.

To set the communications buffer size parameter (on UNIX clients)

Create the /usr/openv/netbackup/NET_BUFFER_SZ file.

As with the media server, it should contain a single integer specifying the communications buffer size. Generally, performance is better when the value in the NET_BUFFER_SZ file on the client matches the value in the NET_BUFFER_SZ file on the media server.

Note: The NET_BUFFER_SZ_REST file is not used on the client. The value in the NET_BUFFER_SZ file is used for both backups and restores.


To set the communications buffer size parameter (on Windows clients)

1 From Host Properties in the NetBackup Administration Console, expand Clients and open the Client Properties > Windows Client > Client Settings dialog for the Windows client on which the parameter is to be changed.

2 Enter the desired value in the Communications buffer field.

This parameter is specified in the number of kilobytes. The default value is 32. An extra kilobyte is added internally for backup operations. Therefore, the default network buffer size for backups is 33792 bytes. In some NetBackup installations, this default value is too small. Increasing the value to 128 improves performance in these installations.

Because local backup jobs on the media server do not send data over the network, this parameter has no effect on these local operations. This parameter is used by only the NetBackup client processes which write to the network, specifically, the bpbkar32 process. It is not used by any other NetBackup for Windows processes on a master server, media server, or client.

3 If you modify the NetBackup buffer settings, test the performance of restores with the new settings.

The NOSHM file

When a master or media server backs itself up, NetBackup uses shared memory to speed up the backup. In this case, NetBackup uses shared memory rather than socket communications to transport the data between processes. However, sometimes situations may arise where it is not possible or desirable to use shared memory during a backup. Touching the file NOSHM in the /usr/openv/netbackup (UNIX) directory or the install_path\NetBackup (Windows) directory causes the client and server to use socket communications rather than shared memory to interchange the backup data. (Touching a file means changing the file’s modification and access times.)

The file name is NOSHM and should not contain any extension.

Each time a backup runs, NetBackup checks for the existence of this file, so no services need to be stopped and started for it to take effect. One example of when it might be necessary to use NOSHM is when the master or media server hosts another application that uses a large amount of shared memory, for instance, Oracle.

NOSHM is also useful for testing, both as a workaround while solving a shared memory issue and to verify that an issue is caused by shared memory.
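For example, on a UNIX media server:

# Force socket communications instead of shared memory for local backups
touch /usr/openv/netbackup/NOSHM

# Remove the file to return to shared memory for local backups
rm /usr/openv/netbackup/NOSHM

Because NetBackup checks for the file each time a backup runs, no services need to be restarted in either case.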


Note: NOSHM only affects backups when it is applied to a system with a directly-attached storage unit.

NOSHM forces a local backup to run as though it were a remote backup. A local backup is a backup of a client that has a directly-attached storage unit, such as a client that happens to be a master or media server. A remote backup is a backup that passes the data across a network connection from the client to a master or media server’s storage unit.

A local backup normally has one or more bpbkar processes that read from the disk and write into shared memory, and a bptm process that reads from shared memory and writes to the tape. A remote backup has one or more bptm (child) processes that read from a socket connection to bpbkar and write into shared memory, and a bptm (parent) process that reads from shared memory and writes to the tape. NOSHM forces the remote backup model even when the client and the media server are the same system.

For a local backup without NOSHM, shared memory is used between bptm and bpbkar. Whether the backup is remote or local, and whether NOSHM exists or not, shared memory is always used between bptm (parent) and bptm (child).

Note: NOSHM does not affect the shared memory used by the bptm process to buffer data being written to tape. bptm uses shared memory for any backup, local or otherwise.

Using multiple interfaces

For a master or media server configured with more than one network interface, distributing NetBackup traffic over all available network interfaces can improve performance. This can be achieved by configuring a unique hostname for the server for each network interface and setting up bp.conf entries for these hostnames.

For example, suppose the server is configured with three network interfaces. Each of the network interfaces connects to one or more NetBackup clients. The following configuration allows NetBackup to use all three network interfaces:

■ In the server’s bp.conf file, add one entry for each network interface:

SERVER=server-neta
SERVER=server-netb
SERVER=server-netc

■ In each client’s bp.conf, make the following entries:

SERVER=server-neta
SERVER=server-netb
SERVER=server-netc


It is okay for a client to have an entry for a server that is not currently on the same network.

NetBackup server performance

To improve NetBackup server performance, consider the following factors regarding the data transfer path.

■ Shared memory (number and size of data buffers)

■ Parent/child delay values

■ Using NetBackup wait and delay counters

■ Fragment size and NetBackup restores

■ Other restore performance issues

Shared memory (number and size of data buffers)

The NetBackup media server uses shared memory to buffer data between the network and the tape or disk drive (or between the disk and the tape drive if the NetBackup media server and client are the same system). The number and size of these shared data buffers can be configured on the NetBackup media server.

The size and number of the tape and disk buffers may be changed so that NetBackup optimizes its use of shared memory. Changing the default buffer size may result in better throughput for high-performance tape drives. These changes may also improve throughput for other types of drives.

Buffer settings are for media servers only and should not be used on a pure master server or client.

Note: Restores use the same buffer size that was used to back up the images being restored.

Default number of shared data buffers

The default number of shared data buffers for various NetBackup operations is shown in Table 8-1.

Table 8-1 Default number of shared data buffers

NetBackup Operation                  UNIX   Windows
Non-multiplexed backup               8      16
Multiplexed backup                   4      8
Restore of non-multiplexed backup    8      16
Restore of multiplexed backup        12     12
Verify                               8      16
Import                               8      16
Duplicate                            8      16


Default size of shared data buffers

The default size of shared data buffers for various NetBackup operations is shown in Table 8-2.

Table 8-2 Default size of shared data buffers

NetBackup Operation          Size of Shared Data Buffers (UNIX and Windows)
Non-multiplexed backup       64K (tape), 256K (disk)
Multiplexed backup           64K (tape), 256K (disk)
Restore, verify, or import   Same size as used for the backup
Duplicate                    Read side: same size as used for the backup;
                             write side: 64K (tape), 256K (disk)

On Windows, a single tape I/O operation is performed for each shared data buffer. Therefore, this size must not exceed the maximum block size for the tape device or operating system. For Windows systems, the maximum block size is generally 64K, although in some cases customers are using a larger value successfully. For this reason, the terms “tape block size” and “shared data buffer size” are synonymous in this context.

Amount of shared memory required by NetBackup

Use this formula to calculate the amount of shared memory required by NetBackup:


(number_data_buffers * size_data_buffers) * number_tape_drives * max_multiplexing_setting

For example, assume that the number of shared data buffers is 16, the size of the shared data buffers is 64 Kilobytes, there are two tape drives, and the maximum multiplexing setting is four. Following the formula above, the amount of shared memory required by NetBackup is:

(16 * 65536) * 2 * 4 = 8 MB

Be careful when changing these settings (see the next caution).

Changing the number of shared data buffers

To change the number of shared data buffers, create the following file(s) on the media server (note that the NUMBER_DATA_BUFFERS_RESTORE file is only needed for restore from tape, not from disk):

UNIX

For tape:
/usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
/usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_RESTORE

For disk:
/usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_DISK

Windows

For tape:
<install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS
<install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS_RESTORE

For disk:
<install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS_DISK

These files contain a single integer specifying the number of shared data buffers NetBackup will use. For backups (in the NUMBER_DATA_BUFFERS and NUMBER_DATA_BUFFERS_DISK files), the integer’s value must be a power of 2.

If the NUMBER_DATA_BUFFERS file exists, its contents will be used to determine the number of shared data buffers to be used for multiplexed and non-multiplexed backups.

NUMBER_DATA_BUFFERS_DISK allows for a different value when doing backup to disk instead of tape. If NUMBER_DATA_BUFFERS exists but NUMBER_DATA_BUFFERS_DISK does not, NUMBER_DATA_BUFFERS applies to all backups. If both files exist, NUMBER_DATA_BUFFERS applies to tape backups and NUMBER_DATA_BUFFERS_DISK applies to disk backups. If only NUMBER_DATA_BUFFERS_DISK is present, it applies to disk backups only.

If the NUMBER_DATA_BUFFERS_RESTORE file exists, its contents will be used to determine the number of shared data buffers to be used for multiplexed restores from tape.


The NetBackup daemons do not have to be restarted for the new values to be used. Each time a new job starts, bptm checks the configuration file and adjusts its behavior.
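For example, a minimal sketch on a UNIX media server (the values are illustrative; the backup value must be a power of 2):

# 16 buffers for tape backups, 16 for multiplexed restores from tape
echo 16 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
echo 16 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_RESTORE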

Changing the size of shared data buffers

Caution: It is critical to perform both backup and restore testing if the shared data buffer size is changed. If all NetBackup media servers are not running in the same operating system environment, it is critical to test restores on each of the NetBackup media servers that may be involved in a restore operation. For example, if a UNIX NetBackup media server is used to write a backup to tape with a shared data buffer (block size) of 256 Kilobytes, then it is possible that a Windows NetBackup media server will not be able to read that tape. In general, it is strongly recommended you test restore as well as backup operations, to avoid the potential for data loss. See “Testing changes made to shared memory” on page 107.

To change the size of the shared data buffers, create the following file on the media server:

UNIX

For tape:
/usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS

For disk:
/usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK

Windows

For tape:
install_path\NetBackup\db\config\SIZE_DATA_BUFFERS

For disk:
install_path\NetBackup\db\config\SIZE_DATA_BUFFERS_DISK

This file contains a single integer specifying the size of each shared data buffer in bytes. The integer must be a multiple of 1024 (a multiple of 32 kilobytes is recommended); see Table 8-3 below for recommended values. For example, to use a shared data buffer size of 64 Kilobytes, the file would contain the integer 65536.

The NetBackup daemons do not have to be restarted for the parameter values to be used. Each time a new job starts, bptm checks the configuration file and adjusts its behavior.


Analyze the buffer usage by checking the bptm debug log before and after altering the size of buffer parameters.

IMPORTANT: Because the data buffer size equals the tape I/O size, the value specified in SIZE_DATA_BUFFERS must not exceed the maximum tape I/O size supported by the tape drive or operating system. This is usually 256 or 128 Kilobytes. Check your operating system and hardware documentation for the maximum values. Take into consideration the total system resources and the entire network. The Maximum Transmission Unit (MTU) for the LAN network may also have to be changed. NetBackup expects the value for NET_BUFFER_SZ and SIZE_DATA_BUFFERS to be in bytes, so in order to use 32k, use 32768 (32 x 1024).

Note: Some Windows tape devices are not able to write with block sizes higher than 65536 (64 Kilobytes). Backups created on a UNIX media server with SIZE_DATA_BUFFERS set to more than 65536 cannot be read by some Windows media servers. This means that the Windows media server would not be able to import or restore any images from media that were written with SIZE_DATA_BUFFERS greater than 65536.

Note: The size of the shared data buffers used for a restore operation is determined by the size of the shared data buffers in use at the time the backup was written. This file is not used by restores.

Table 8-3 Absolute byte value to be entered in SIZE_DATA_BUFFERS

Kilobytes per Data Buffer   SIZE_DATA_BUFFERS Value
32                          32768
64                          65536
96                          98304
128                         131072
160                         163840
192                         196608
224                         229376
256                         262144


Recommended shared memory settings

The SIZE_DATA_BUFFERS setting is typically increased to 256 KB and NUMBER_DATA_BUFFERS to 16. To configure NetBackup to use 16 x 256 KB data buffers, specify 262144 (256 x 1024) in SIZE_DATA_BUFFERS and 16 in NUMBER_DATA_BUFFERS.
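On a UNIX media server, these recommended values can be set as follows:

# 16 shared data buffers of 256 KB each
echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
echo 16 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS

As with any buffer change, test both backups and restores afterward (see “Testing changes made to shared memory” below).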

Note that increasing the size and number of the data buffers will use up more shared memory, which is a limited system resource. The total amount of shared memory used for each tape drive is:

(buffer_size * num_buffers) * drives * MPX

where MPX is the multiplexing factor. For two tape drives, each with an MPX of 4 and with 16 buffers of 256k, the total shared memory usage would be:

(16 * 262144) * 2 * 4 = 32768 K (32 MB)

If large amounts of memory are to be allocated, the kernel may require additional tuning to allow enough shared memory to be available for NetBackup's requirements. For more information, see “Kernel tuning (UNIX)” on page 152.

Note: AIX media servers do not need shared memory tuning, because AIX uses dynamic memory allocation.

Be cautious if you change these parameters. Make your changes carefully, monitoring for performance changes with each modification. For example, increasing the tape buffer size can cause some backups to run slower. Also, there have been cases with restore issues. After any changes, be sure to include restores as part of your validation testing.

Testing changes made to shared memory

After making changes, it is vitally important to verify that the following tests complete successfully:

1 Run a backup.

2 Restore the data from the backup.

3 Restore data from a backup created prior to the changes to SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS.

Before and after altering the size or number of data buffers, examine the buffer usage information in the bptm debug log file. The values in the log should match your buffer settings. The relevant bptm log entries are similar to the following:

12:02:55 [28551] <2> io_init: using 65536 data buffer size
12:02:55 [28551] <2> io_init: CINDEX 0, sched bytes for monitoring = 200
12:02:55 [28551] <2> io_init: using 8 data buffers

or:

15:26:01 [21544] <2> mpx_setup_restore_shm: using 12 data buffers, buffer size is 65536

When you change these settings, take into consideration the total system resources and the entire network. The Maximum Transmission Unit (MTU) for the local area network (LAN) may also have to be changed.

Parent/child delay values

Although rarely changed, it is possible to modify the parent and child delay values for a process. To change these values, create the following files:

UNIX
/usr/openv/netbackup/db/config/PARENT_DELAY
/usr/openv/netbackup/db/config/CHILD_DELAY

Windows
<install_path>\NetBackup\db\config\PARENT_DELAY
<install_path>\NetBackup\db\config\CHILD_DELAY

These files contain a single integer specifying the value in milliseconds to be used for the delay corresponding to the name of the file. For example, to use a parent delay of 50 milliseconds, the PARENT_DELAY file would contain the integer 50.
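For example, a minimal sketch on a UNIX media server (the values are illustrative; both files hold a delay in milliseconds):

# 50 ms parent delay, 20 ms child delay
echo 50 > /usr/openv/netbackup/db/config/PARENT_DELAY
echo 20 > /usr/openv/netbackup/db/config/CHILD_DELAY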

See “Using NetBackup wait and delay counters” below for more information about how to determine if you should change these values.

Note: The following section refers to the bptm process on the media server during back up and restore operations from a tape storage device. If you are backing up to or restoring from a disk storage device, substitute bpdm for bptm throughout the section. For example, to activate debug logging for a disk storage device, the following directory must be created: /usr/openv/netbackup/logs/bpdm (UNIX) or install_path\NetBackup\logs\bpdm (Windows).

Using NetBackup wait and delay counters

During a backup or restore operation the NetBackup media server uses a set of shared data buffers to isolate the process of communicating with the tape from the process of interacting with the disk or network. Through the use of Wait and Delay counters, you can determine which process on the NetBackup media server has to wait more often: the data producer or the data consumer.


Achieving a good balance between the data producer and the data consumer processes is an important factor in achieving optimal performance from the NetBackup server component of the NetBackup data transfer path.

Understanding the two-part communication process

The two-part communication process differs depending on whether the operation is a backup or restore and whether the operation involves a local client or a remote client.

Local clients

When the NetBackup media server and the NetBackup client are part of the same system, the NetBackup client is referred to as a local client.

■ Backup of local client

For a local client, the bpbkar (UNIX) or bpbkar32 (Windows) process reads data from the disk during a backup and places it in the shared buffers. The bptm process reads the data from the shared buffer and writes it to tape.

■ Restore of local client

During a restore of a local client, the bptm process reads data from the tape and places it in the shared buffers. The tar (UNIX) or tar32 (Windows) process reads the data from the shared buffers and writes it to disk.

[Figure: Producer - consumer relationship during a backup. Data flows from the NetBackup client across the network to the bptm child process (the producer), which fills the shared buffers; the bptm parent process (the consumer) drains the shared buffers and writes to tape.]


Remote clients

When the NetBackup media server and the NetBackup client are part of two different systems, the NetBackup client is referred to as a remote client.

■ Backup of remote client

The bpbkar (UNIX) or bpbkar32 (Windows) process on the remote client reads data from the disk and writes it to the network. Then a child bptm process on the media server receives data from the network and places it in the shared buffers. The parent bptm process on the media server reads the data from the shared buffers and writes it to tape.

■ Restore of remote client

During the restore of the remote client, the parent bptm process reads data from the tape and places it into the shared buffers. The child bptm process reads the data from the shared buffers and writes it to the network. The tar (UNIX) or tar32 (Windows) process on the remote client receives the data from the network and writes it to disk.

Roles of processes during backup and restore operations

When a process attempts to use a shared data buffer, it first verifies that the next buffer in order is in a correct state. A data producer needs an empty buffer, while a data consumer needs a full buffer. The following chart provides a mapping of processes and their roles during backup and restore operations:

Operation        Data Producer                        Data Consumer
Local Backup     bpbkar (UNIX) or bpbkar32 (Windows)  bptm
Remote Backup    bptm (child)                         bptm (parent)
Local Restore    bptm                                 tar (UNIX) or tar32 (Windows)
Remote Restore   bptm (parent)                        bptm (child)

If a full buffer is needed by the data consumer but is not available, the data consumer increments the Wait and Delay counters to indicate that it had to wait for a full buffer. After a delay, the data consumer will check again for a full buffer. If a full buffer is still not available, the data consumer increments the Delay counter to indicate that it had to delay again while waiting for a full buffer. The data consumer will repeat the delay and full buffer check steps until a full buffer is available.


This sequence is summarized in the following algorithm:

while (Buffer_Is_Not_Full) {
    ++Wait_Counter;
    while (Buffer_Is_Not_Full) {
        ++Delay_Counter;
        delay(DELAY_DURATION);
    }
}

If an empty buffer is needed by the data producer but is not available, the data producer increments the Wait and Delay counters to indicate that it had to wait for an empty buffer. After a delay, the data producer will check again for an empty buffer. If an empty buffer is still not available, the data producer increments the Delay counter to indicate that it had to delay again while waiting for an empty buffer. The data producer will repeat the delay and empty buffer check steps until an empty buffer is available.

The algorithm for a data producer has a similar structure:

while (Buffer_Is_Not_Empty) {
    ++Wait_Counter;
    while (Buffer_Is_Not_Empty) {
        ++Delay_Counter;
        delay(DELAY_DURATION);
    }
}

Analysis of the Wait and Delay counter values indicates which process, producer or consumer, has had to wait most often and for how long.

There are four basic Wait and Delay Counter relationships:

■ Data Producer >> Data Consumer. The data producer has substantially larger Wait and Delay counter values than the data consumer.

The data consumer is unable to receive data fast enough to keep the data producer busy. Investigate means to improve the performance of the data consumer. For a backup operation, check if the data buffer size is appropriate for the tape drive being used (see below).

If the data consumer still has a substantially larger value in this case, try increasing the number of shared data buffers to improve performance (see below).

■ Data Producer = Data Consumer (large value). The data producer and the data consumer have very similar Wait and Delay counter values, but those values are relatively large.


This may indicate that the data producer and data consumer are regularly attempting to use the same shared data buffer. Try increasing the number of shared data buffers to improve performance (see below).

■ Data Producer = Data Consumer (small value). The data producer and the data consumer have very similar Wait and Delay counter values, but those values are relatively small.

This indicates that there is a good balance between the data producer and data consumer, which should yield good performance from the NetBackup server component of the NetBackup data transfer path.

■ Data Producer << Data Consumer. The data producer has substantially smaller Wait and Delay counter values than the data consumer.

The data producer is unable to deliver data fast enough to keep the data consumer busy. Investigate ways to improve the performance of the data producer. For a restore operation, check if the data buffer size (see below) is appropriate for the tape drive being used.

If the data producer still has a relatively large value in this case, try increasing the number of shared data buffers to improve performance (see below).

The bullets above describe the four basic relationships possible. Of primary concern is the relationship and the size of the values. Information on determining substantial versus trivial values appears on the following pages. The relationship of these values only provides a starting point in the analysis. Additional investigative work may be needed to positively identify the cause of a bottleneck within the NetBackup data transfer path.

Determining wait and delay counter values

Wait and Delay counter values can be found by creating and reading debug log files on the NetBackup media server.

Note: Writing the debug log files introduces some additional overhead and will have a small impact on the overall performance of NetBackup. This impact will be more noticeable for a high verbose level setting. Normally, you should not need to run with debug logging enabled on a production system.

To determine wait and delay counter values for a local client backup:

1 Activate debug logging by creating these two directories on the media server:

UNIX

/usr/openv/netbackup/logs/bpbkar
/usr/openv/netbackup/logs/bptm


Windows

install_path\NetBackup\logs\bpbkar
install_path\NetBackup\logs\bptm

2 Execute your backup.

Look at the log for the data producer (bpbkar on UNIX or bpbkar32 on Windows) process in:

UNIX
/usr/openv/netbackup/logs/bpbkar

Windows
install_path\NetBackup\logs\bpbkar

The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the backup:

... waited 224 times for empty buffer, delayed 254 times

In this example the Wait counter value is 224 and the Delay counter value is 254.

3 Look at the log for the data consumer (bptm) process in:

UNIX

/usr/openv/netbackup/logs/bptm

Windows
install_path\NetBackup\logs\bptm

The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the backup:

... waited for full buffer 1 times, delayed 22 times

In this example, the Wait counter value is 1 and the Delay counter value is 22.
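Rather than scanning the logs by eye, the wait and delay summary lines can be pulled out with grep. This is a sketch; legacy debug log file names vary by platform and release, so adjust the paths to match your environment:

# Extract the wait/delay summary lines from the bpbkar and bptm logs
grep "waited" /usr/openv/netbackup/logs/bpbkar/log.*
grep "waited" /usr/openv/netbackup/logs/bptm/log.*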

To determine wait and delay counter values for a remote client backup:

1 Activate debug logging by creating this directory on the media server:

UNIX
/usr/openv/netbackup/logs/bptm

Windows

install_path\NetBackup\logs\bptm

2 Execute your backup.

3 Look at the log for the bptm process in:

UNIX
/usr/openv/netbackup/logs/bptm

Windows
install_path\NetBackup\logs\bptm

4 Delays associated with the data producer (bptm child) process will appear as follows:

... waited for empty buffer 22 times, delayed 151 times, ...


In this example, the Wait counter value is 22 and the Delay counter value is 151.

5 Delays associated with the data consumer (bptm parent) process will appear as:

... waited for full buffer 12 times, delayed 69 times

In this example the Wait counter value is 12, and the Delay counter value is 69.

To determine wait and delay counter values for a local client restore:

1 Activate logging by creating the two directories on the NetBackup media server:

UNIX
/usr/openv/netbackup/logs/bptm
/usr/openv/netbackup/logs/tar

Windows
install_path\NetBackup\logs\bptm
install_path\NetBackup\logs\tar

2 Execute your restore.

Look at the log for the data consumer (tar or tar32) process in the tar log directory created above. The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the restore:

... waited for full buffer 27 times, delayed 79 times

In this example, the Wait counter value is 27, and the Delay counter value is 79.

3 Look at the log for the data producer (bptm) process in the bptm log directory created above.

The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the restore:

... waited for empty buffer 1 times, delayed 68 times

In this example, the Wait counter value is 1 and the Delay counter value is 68.

To determine wait and delay counter values for a remote client restore:

1 Activate debug logging by creating the following directory on the media server:

UNIX
/usr/openv/netbackup/logs/bptm

Windows
install_path\NetBackup\logs\bptm

2 Execute your restore.


3 Look at the log for bptm in the bptm log directory created above.

4 Delays associated with the data consumer (bptm child) process will appear as follows:

... waited for full buffer 36 times, delayed 139 times

In this example, the Wait counter value is 36 and the Delay counter value is 139.

5 Delays associated with the data producer (bptm parent) process will appear as follows:

... waited for empty buffer 95 times, delayed 513 times

In this example the Wait counter value is 95 and the Delay counter value is 513.

Note: When you run multiple tests, you can rename the current log file. NetBackup will automatically create a new log file, which prevents you from erroneously reading the wrong set of values.

Deleting the debug log file will not stop NetBackup from generating the debug logs. You must delete the entire directory. For example, to stop bptm logging, you must delete the bptm subdirectory. NetBackup will automatically generate debug logs at the specified verbose setting whenever the directory is detected.

Using wait and delay counter values to analyze issues

You can use the bptm debug log file to verify that the following tunable parameters have successfully been set to the desired values. You can use these parameters and the Wait and Delay counter values to analyze issues. These additional values include:

■ Data buffer size. The size of each shared data buffer can be found on a line similar to:

... io_init: using 65536 data buffer size

■ Number of data buffers. The number of shared data buffers may be found on a line similar to:

... io_init: using 16 data buffers

■ Parent/child delay values. The values in use for the duration of the parent and child delays can be found on a line similar to:

... io_init: child delay = 20, parent delay = 30 (milliseconds)

■ NetBackup Media Server Network Buffer Size. The values in use for the Network Buffer Size parameter on the media server can be found on lines similar to these in debug log files:


The receive network buffer is used by the bptm child process to read from the network during a remote backup:

... setting receive network buffer to 263168 bytes

The send network buffer is used by the bptm child process to write to the network during a remote restore:

... setting send network buffer to 131072 bytes

See “NetBackup media server network buffer size” on page 97 for more information about the Network Buffer Size parameter on the media server.

Suppose you wanted to analyze a local backup in which there was a 30-minute data transfer duration baselined at 5 Megabytes/second with a total data transfer of 9,000 Megabytes. Because a local backup is involved, if you refer to “Roles of processes during backup and restore operations” on page 110, you can determine that bpbkar (UNIX) or bpbkar32 (Windows) is the data producer and bptm is the data consumer.

You would next want to determine the Wait and Delay values for bpbkar (or bpbkar32) and bptm by following the procedures described in “Determining wait and delay counter values” on page 112. For this example, suppose those values were:

Using these values, you can determine that the bpbkar (or bpbkar32) process is being forced to wait by a bptm process which cannot move data out of the shared buffer fast enough.

Next, you can determine time lost due to delays by multiplying the Delay counter value by the parent or child delay value, whichever applies.

In this example, the bpbkar (or bpbkar32) process uses the child delay value, while the bptm process uses the parent delay value. (The defaults for these values are 20 for child delay and 30 for parent delay.) The values are specified in milliseconds. See “Parent/child delay values” on page 108 for more information on how to modify these values.

Process                               Wait     Delay
bpbkar (UNIX) or bpbkar32 (Windows)   29364    58033
bptm                                  95       105


Use the following equations to determine the amount of time lost due to these delays:

bpbkar (UNIX) or bpbkar32 (Windows) = 58033 delays X 0.020 seconds
                                    = 1160 seconds
                                    = 19 minutes 20 seconds

bptm                                = 105 delays X 0.030 seconds
                                    = 3 seconds

This is useful in determining that the delay duration for the bpbkar (or bpbkar32) process is significant. If this delay were entirely removed, the resulting transfer time of 10:40 (total transfer time of 30 minutes minus delay of 19 minutes and 20 seconds) would indicate a throughput value of 14 Megabytes/sec, nearly a threefold increase. This type of performance increase would warrant expending effort to investigate how the tape drive performance can be improved.

The number of delays should be interpreted within the context of how much data was moved. As the amount of data moved increases, the significance threshold for counter values increases as well.

Again, using the example of a total of 9,000 Megabytes of data being transferred, assume a 64-Kilobyte buffer size. You can determine the total number of buffers to be transferred using the following equations:

Number_Kbytes = 9,000 X 1024
              = 9,216,000 Kilobytes

Number_Slots  = 9,216,000 / 64
              = 144,000

The Wait counter value can now be expressed as a percentage of the total number of buffers transferred:

bpbkar (UNIX) or bpbkar32 (Windows) = 29364 / 144,000
                                    = 20.39%

bptm                                = 95 / 144,000
                                    = 0.07%

In this example, in the 20 percent of cases where the bpbkar (or bpbkar32) process needed an empty shared data buffer, that shared data buffer has not yet been emptied by the bptm process. A value this large indicates a serious issue, and additional investigation would be warranted to determine why the data consumer (bptm) is having issues keeping up.

In contrast, the delays experienced by bptm are insignificant for the amount of data transferred.

You can also view the Delay and Wait counters as a ratio. For bpbkar (UNIX) or bpbkar32 (Windows):

58033 / 29364 = 1.98

In this example, on average the bpbkar (or bpbkar32) process had to delay twice for each wait condition that was encountered. If this ratio is substantially large, you may wish to consider increasing the parent or child delay value, whichever one applies, to avoid the unnecessary overhead of checking for a shared data buffer in the correct state too often. Conversely, if this ratio is close to 1, you may wish to consider reducing the applicable delay value to check more often and see if that increases your data throughput performance. Keep in mind that the parent and child delay values are rarely changed in most NetBackup installations.

The preceding information explains how to determine if the values for Wait and Delay counters are substantial enough for concern. The Wait and Delay counters are related to the size of data transfer. A value of 1,000 may be extreme when only 1 Megabyte of data is being moved. The same value may indicate a well-tuned system when gigabytes of data are being moved. The final analysis must determine how these counters affect performance by considering such factors as how much time is being lost and what percentage of time a process is being forced to delay.

Correcting issues uncovered by wait and delay counter values

The following lists identify ways to correct issues that are uncovered by the Wait and Delay counter values.

■ bptm read waits

The bptm debug log contains messages such as:

... waited for full buffer 1681 times, delayed 12296 times

The first number in the message is the number of times bptm waited for a full buffer, which is the number of times bptm write operations waited for data from the source. If, using the technique described in the section “Determining wait and delay counter values” on page 112, you determine that the Wait counter indicates a performance issue, then changing the number of buffers will not help, but adding multiplexing may help.



■ bptm write waits

The bptm debug log contains messages such as:

... waited for empty buffer 1883 times, delayed 14645 times

The first number in the message is the number of times bptm waited for an empty buffer, which is the number of times bptm experienced data arriving from the source faster than the data could be written to tape. If, using the technique described in the section “Determining wait and delay counter values” on page 112, you determine that the Wait counter indicates a performance issue, then reduce the multiplexing factor if you are using multiplexing. Also, adding more buffers may help.

■ bptm delays

The bptm debug log contains messages such as:

... waited for empty buffer 1883 times, delayed 14645 times

The second number in the message is the number of times bptm waited for an available buffer. If, using the technique described in the section “Determining wait and delay counter values” on page 112, you determine that the Delay counter indicates a performance issue, this will need investigation. Each delay interval is 30 ms.

Fragment size and NetBackup restores

Below is a summary of how fragment size affects NetBackup restores for non-multiplexed and multiplexed images, followed by a more in-depth discussion.

The fragment size affects where tape markers are placed and how many tape markers are used. (The default fragment size is 1 terabyte for tape storage units and 512 GB for disk.) As a rule, a larger fragment size results in faster backups, but may result in slower restores when recovering a small number of individual files.

The Reduce fragment size to setting on the Storage Unit dialog limits the largest fragment size of the image. By limiting the size of the fragment, the size of the largest read during restore is minimized, reducing restore time. This is especially important when restoring a small number of individual files rather than entire directories or file systems.

For many sites, a fragment size of approximately 10 gigabytes will result in good performance for both backup and restore.

When choosing a fragment size, consider the following:

■ Larger fragment sizes usually favor backup performance, especially when backing up large amounts of data. Creating smaller fragments will slow down large backups: each time a new fragment is created, the backup stream is interrupted.


■ Larger fragment sizes do not hinder performance when restoring large amounts of data. But when restoring a few individual files, larger fragments may slow down the restore.

■ Larger fragment sizes do not hinder performance when restoring from non-multiplexed backups. For multiplexed backups, larger fragments may slow down the restore. In multiplexed backups, blocks from several images can be mixed together within a single fragment. During restore, NetBackup positions to the nearest fragment and starts reading the data from there, until it comes to the desired file. Splitting multiplexed backups into smaller fragments can improve restore performance.

■ During restores, newer, faster devices can handle large fragments well. Slower devices, especially if they do not use fast locate block positioning, will restore individual files faster if fragment size is smaller. (In some cases, SCSI fast tape positioning can improve restore performance.)

Note: Unless you have particular reasons for creating smaller fragments (such as when restoring a few individual files, restoring from multiplexed backups, or restoring from older equipment), larger fragment sizes are likely to yield better overall performance.

Restore of a non-multiplexed image

bptm positions to the media fragment and the actual tape block containing the first file to be restored. If fast-locate is available, bptm uses that for the positioning. If fast-locate is not available, bptm uses MTFSF/MTFSR (forward space filemark/forward space record) to do the positioning.

The first file is then restored.

After that, for every subsequent file to be restored, bptm determines where that file is, relative to the current position. If it is faster for bptm to position to that spot rather than to read all the data in between (and if fast locate is available), bptm uses positioning to get to the next file instead of reading all the data in between.

If fast-locate is not available, bptm can read the data as quickly as it can position with MTFSR (forward space record).

Therefore, fragment sizes for non-multiplexed restores matter if fast-locate is NOT available. In general, given smaller fragments, a restore reads less extraneous data. You can set the maximum fragment size for the storage unit on the Storage Unit dialog in the NetBackup Administration Console (Reduce fragment size to).


Restore of a multiplexed image

bptm positions to the media fragment containing the first file to be restored. If fast-locate is available, bptm uses that for the positioning. If fast-locate is not available, bptm uses MTFSF (forward space filemark) for the positioning. The restore cannot “fine-tune” positioning to get to the block containing the first file, because of the randomness of how multiplexed images are written. So, the restore starts reading, throwing away all the data (for this client and other clients) until it reaches the block that contains the first file.

The first file is then restored.

After that, the logic is the same as that for non-multiplexed restores with one exception: if the current position and the next file position are in the same fragment, the restore cannot use positioning, for the same reason that it cannot use “fine-tune” positioning to get to the first file.

However, if the next file position is in a subsequent fragment further down the media (or even on a different media), then the restore uses positioning methods to get to that fragment instead of reading all the data in between.

So, there is an advantage to keeping multiplexed fragments to a smaller size. The optimal fragment size depends on the site's data and situation. For multi-gigabyte images, it is probably desirable to keep fragments to 1 gigabyte or less. Remember that the storage unit attribute to limit fragment size is based on the total amount of data in the fragment (not the total amount of data for any one client).

Note that when multiplexed images are being written, each time a client backup stream starts or ends, by definition, that is a new fragment. A new fragment is also created when a checkpoint occurs for a backup that has checkpoint restart enabled. So not all fragments are of the maximum fragment size. Of course, end-of-media (EOM) also causes new fragment(s).

Some examples may help illustrate when smaller fragments do and do not help restores.

Example 1: Assume you are backing up four streams to a multiplexed tape, and each stream is a single, 1 gigabyte file and a default maximum fragment size of 1 TB has been specified. The resultant backup image logically looks like the following. ‘TM’ denotes a tape mark, or file mark, that indicates the start of a fragment.

TM <4 gigabytes data> TM

When restoring any one of the 1 gigabyte files, the restore positions to the TM and then has to read all 4 gigabytes to get the 1 gigabyte file.

If you set the maximum fragment size to 1 gigabyte:


TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM

this does not help, since the restore still has to read all four fragments to pull out the 1 gigabyte of the file being restored.

Example 2: This is the same as Example 1, but assume four streams are backing up 1 gigabyte worth of /home or C:\. With the maximum fragment size (Reduce fragment size) set to a default of 1 TB (and assuming all streams are relatively the same performance), you again end up with:

TM <4 gigabytes data> TM

Restoring /home/file1 or C:\file1 and /home/file2 or C:\file2 from one of the streams will have to read as much of the 4 gigabytes as necessary to restore all the data. But if you set Reduce fragment size to 1 gigabyte, the image looks like this:

TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM

In this case, /home/file1 or C:\file1 starts in the second fragment, and bptm positions to the second fragment to start the restore of /home/file1 or C:\file1 (saving the reading of 1 gigabyte so far). After /home/file1 is done, if /home/file2 or C:\file2 is in the third or fourth fragment, the restore can position to the beginning of that fragment before it starts reading as it looks for the data.

These examples illustrate that whether fragmentation benefits a restore depends on what the data is, what is being restored, and where in the image the data is. In Example 2, reducing the fragment size from 1 gigabyte to half a gigabyte (512 Megabytes) increases the chance the restore can locate by skipping instead of reading when restoring relatively small amounts of an image.

Fragmentation and checkpoint restart

If the policy’s Checkpoint Restart feature is enabled, NetBackup creates a new fragment at each checkpoint, based on the Take checkpoints every setting. For more information on Checkpoint Restart, refer to the NetBackup System Administrator’s Guide, Volume I.

Other restore performance issues

Common reasons for restore performance issues are described in the following subsections.


NetBackup catalog performance

The disk subsystem where the NetBackup catalog resides has a large impact on the overall performance of NetBackup. To improve restore performance, configure this subsystem for fast reads. The NetBackup binary catalog format provides scalable and fast catalog access.

NUMBER_DATA_BUFFERS_RESTORE setting

This parameter can help keep other NetBackup processes busy while a multiplexed tape is positioned during a restore. Increasing this value causes NetBackup buffers to occupy more physical RAM. This parameter only applies to multiplexed restores. For more information on this parameter, see “Shared memory (number and size of data buffers)” on page 102.

Index performance issues

For information, refer to “Indexing the Catalog for Faster Access to Backups” in the NetBackup 6.0 System Administrator’s Guide, Volume I.

Search performance with many small backups

To improve search performance when you have many small backup images, run the following command as root on the master server:

UNIX:

/usr/openv/netbackup/bin/admincmd/bpimage -create_image_list -client client_name

Windows:

install_directory\bin\admincmd\bpimage -create_image_list -client client_name

where client_name is the name of the client with many small backup images.

In the directory:

UNIX:

/usr/openv/netbackup/db/images/client_name

Windows:

install_path\NetBackup\db\images\client_name

the bpimage command creates the following files:

IMAGE_LIST List of images for this client

IMAGE_INFO Information about the images for this client

IMAGE_FILES The file information for small images


Do not edit these files, because they contain offsets and byte counts that are used for seeking to and reading the image information.

Note: These files increase the size of the client directory.

Restore performance in a mixed environment

If you encounter restore performance issues in a mixed environment (UNIX and Windows), consider reducing the TCP wait interval parameter, tcp_deferred_ack_interval. Under Solaris 8, the default value of this parameter is 100ms. (Root privileges are required to change this parameter.)

The current value of tcp_deferred_ack_interval can be obtained by executing the following command (this example is for Solaris):

/usr/sbin/ndd -get /dev/tcp tcp_deferred_ack_interval

The value of tcp_deferred_ack_interval can be changed by executing the command:

/usr/sbin/ndd -set /dev/tcp tcp_deferred_ack_interval value

where value is the number that provides the best performance for the system. Some experimentation may be needed, as the optimum value varies from system to system. A suggested starting value is 20. In any case, the value must not exceed 500ms, as this may break TCP/IP.

Once the optimum value for the system is found, the ndd -set command can be placed in a script under the /etc/rc2.d directory so that it executes at boot time.
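For example, a minimal boot script might look like the following (the script name and the value 20 are illustrative only; use the value you determined by testing):

#!/sbin/sh
# Example /etc/rc2.d/S99ndd_tcp: re-apply the tuned TCP setting at boot
/usr/sbin/ndd -set /dev/tcp tcp_deferred_ack_interval 20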

Multiplexing set too high

If multiplexing is too high, needless tape searching may occur. The ideal setting is the minimum needed to stream the drives.

Restores from multiplexed database backups

NetBackup can run several restores at the same time from a single multiplexed tape. This is done by means of the MPX_RESTORE_DELAY option, which specifies how long, in seconds, the server waits for additional restore requests of files or raw partitions that are in a set of multiplexed images on the same tape. The restore requests received within this period are executed simultaneously. By default, the delay is 30 seconds.

This may be a useful parameter to change if multiple stripes from a large database backup are multiplexed together on the same tape. If the MPX_RESTORE_DELAY option is changed, you do not need to stop and restart the NetBackup processes for the change to take effect.
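For example, to make the server wait up to two minutes for related restore requests, you might add the following entry to the bp.conf file on the master server (the value shown is illustrative only, not a recommendation):

MPX_RESTORE_DELAY = 120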


When bprd, the request daemon on the master server, receives the first stream of a multiplexed restore request, it triggers the MPX_RESTORE_DELAY timer to start counting the configured amount of time. At this point, bprd watches and waits for related multiplexed jobs from the same client before starting the overall job. If another associated stream is received within the timeout period, it is added to the total job, and the timer is reset to the MPX_RESTORE_DELAY period. Once the timeout has been reached without an additional stream being received by bprd, the timeout window closes, all associated restore requests are sent to bptm, and a tape is mounted. If any associated restore requests are received after this event, they are queued to wait until the tape that is now “In Use” is returned to an idle state.

If MPX_RESTORE_DELAY is not set high enough, NetBackup may need to mount and read the same tape multiple times to collect all of the header information necessary for the restore. Ideally, NetBackup would read a multiplexed tape, collecting all of the header information it needs, with a single pass of the tape, thus minimizing the time to restore.

Example (Oracle): Suppose that MPX_RESTORE_DELAY is not set in the bp.conf file, so its value is the default of 30 seconds. Suppose also that you initiate a restore from an Oracle RMAN backup that was backed up using 4 channels or 4 streams, and you use the same number of channels to restore.

RMAN passes NetBackup a specific data request, telling NetBackup what information it needs to start and complete the restore. The first request is passed and received by NetBackup in 29 seconds, causing the MPX_RESTORE_DELAY timer to be reset. The next request is passed and received by NetBackup in 22 seconds, so again the timer is reset. The third request is received 25 seconds later, resetting the timer a third time, but the fourth request is received 31 seconds after the third. Since the fourth request was not received within the restore delay interval, NetBackup only starts three of the four restores. Instead of reading from the tape once, NetBackup queues the fourth restore request until the previous three requests are completed. Since all of the multiplexed images are on the same tape, NetBackup mounts, rewinds, and reads the entire tape again to collect the multiplexed images for the fourth restore request.

Note that in addition to NetBackup's reading the tape twice, RMAN waits to receive all the necessary header information before it begins the restore.

If MPX_RESTORE_DELAY had been larger than 30 seconds, NetBackup would have received all four restore requests within the restore delay window and collected all the necessary header information with one pass of the tape. Oracle would have started the restore after this one tape pass, improving the restore performance significantly.


MPX_RESTORE_DELAY needs to be set with caution, because it can decrease performance if its value is set too high. Suppose, for instance, that the MPX_RESTORE_DELAY is set to 1800 seconds. When the final associated restore request arrives, NetBackup resets the request delay timer as it did with the previous requests. NetBackup then must wait for the entire 1800-second interval before it can start the restore.

Therefore, try to set the value of MPX_RESTORE_DELAY so that it is neither too high nor too low.

NetBackup storage device performance

This section looks at storage device functionality in the NetBackup data transfer path. Changes in these areas may improve NetBackup performance.

Tape drive wear and tear is much less, and efficiency is greater, if the data stream matches the tape drive capacity and is sustained. Generally speaking, most tape drives have much slower throughput than disk drives. Match the number of drives and the throughput per drive to the speed of the SCSI/FC connection, and/or follow the hardware vendors’ recommendations.

These are some of the factors that affect tape drive performance:

Media positioning

When a backup or restore is performed, the storage device must position the tape so that the data is over the read/write head. Depending on the location of the data and the overall performance of the media device, this can take a significant amount of time. When you conduct performance analysis with media containing multiple images, it is important to account for the time lag that occurs before the data transfer starts.

Tape streaming

If a tape device is being used at its most efficient speed, it is said to be streaming the data onto the tape. Generally speaking, if a tape device is streaming, there will be little physical stopping and starting of the media. Instead the media will be constantly spinning within the tape drive. If the tape device is not being used at its most efficient speed, it may continually start and stop the media from spinning. This behavior is the opposite of tape streaming and usually results in a poor data throughput rate.

Data compression

Most tape devices support some form of data compression within the tape device itself. Compressible data (such as text files) yields a higher data throughput rate than non-compressible data, if the tape device supports hardware data compression.


Tape devices typically come with two performance rates: maximum throughput and nominal throughput. Maximum throughput is based on how fast compressible data can be written to the tape drive when hardware compression is enabled in the drive. Nominal throughput refers to rates achievable with non-compressible data.

Note: Tape drive data compression cannot be set by NetBackup. Follow the instructions provided with your OS and tape drive to be sure data compression is set correctly.

In general, tape drive data compression is preferable to client (software) compression such as that available in NetBackup. Client compression may be desirable in some cases, such as for reducing the amount of data transmitted across the network for a remote client backup. See “Tape versus client compression” on page 133 for more information.


Chapter 9

Tuning other NetBackup components

This chapter provides guidelines and recommendations for improving performance in certain features or components of NetBackup.

This chapter includes the following sections:

■ “Multiplexing and multi-streaming” on page 130

■ “Encryption” on page 133

■ “Compression” on page 133

■ “Using both encryption and compression” on page 134

■ “NetBackup java” on page 134

■ “Vault” on page 134

■ “Fast recovery with bare metal restore” on page 135

■ “Backing up many small files” on page 135


Multiplexing and multi-streaming

Consider the following factors regarding multiplexing and multi-streaming.

When to use multiplexing and multi-streaming

Multiple data streams can reduce the time for large backups. The reduction is achieved by splitting the data to be backed up into multiple streams and then using multiplexing, multiple drives, or a combination of the two for processing the streams concurrently. In addition, configuring the backup so each physical device on the client is backed up by a separate data stream that runs concurrently with streams from other devices can significantly reduce backup times.

Note: For best performance, use only one data stream to back up each physical device on the client. Running multiple concurrent streams from a single physical device can adversely affect the time to back up that device because the drive heads must move back and forth between tracks containing the files for the respective streams.

Multiplexing is not recommended for database backups, when restore speed is of paramount interest, or when your tape drives are slow.

Backing up across a network, unless the network bandwidth is very broad, can nullify the ability to stream. Typically, a single client can send enough data to saturate a single 100BaseT network connection. A gigabit network has the capacity to support network streaming for some clients. Keep in mind that multiple streams use more of the client’s resources than a single stream. We recommend testing to make sure that the client can handle the multiple data streams and that the users are not affected by the high rate of data transfer.

Multiplexing and multi-streaming can be powerful tools to ensure that all tape drives are streaming. With NetBackup, both can be used at the same time. It is important to distinguish between the two concepts:

■ Multiplexing writes multiple data streams to a single tape drive.


Figure 9-1 Multiplexing diagram

■ Multi-streaming writes multiple data streams, each to its own tape drive, unless multiplexing is used.

Figure 9-2 Multistreaming diagram

Here are some things to consider with regard to multiplexing:

■ Experiment with different multiplexing factors to find one where the tape drive is just streaming, that is, where the writes just fill the maximum bandwidth of your drive. This is the optimal multiplexing factor. For instance, if you determine that you can get 5 Megabytes/sec from each of multiple concurrent read streams, then you would use a multiplexing factor of two to get the maximum throughput to a DLT7000 (that is, 10 Megabytes/sec).

■ Use a higher multiplexing factor for incremental backups.

■ Use a lower multiplexing factor for local backups.

■ Expect the duplication of a multiplexed tape to take a longer period of time if it is demultiplexed, because multiple read passes of the source tape must be made.


■ When you duplicate a multiplexed backup, demultiplex it.

By demultiplexing the backups when they are duplicated, the time for recovery is significantly reduced.

■ Do not use multi-streaming on single mount points. Multi-streaming takes advantage of the ability to stream data from several devices at once. This permits backups to take advantage of Read Ahead on a spindle or set of spindles in RAID environments. Multi-streaming from a single mount point encourages head thrashing and may result in degraded performance. Only conduct multi-streamed backups against single mount points if they are striped (RAID 0). However, even this is likely to result in degraded performance.

Effects of multiple data streams on backup/restore

■ Multiplexing

To use multiplexing effectively, you must understand the implications of multiplexing on restore times. Multiplexing may decrease overall backup time when you are backing up large numbers of clients over slow networks, but it does so at the cost of recovery time. Restores from multiplexed tapes must pass over all nonapplicable data. This action increases restore times. When recovery is required, demultiplexing causes delays in the restore process. This is because NetBackup must do more tape searching to accomplish the restore.

Restores should be tested, before the need to do a restore arises, to determine the impact of multiplexing on restore performance.

When you initially set up a new environment, keep the multiplexing factor low. Typically, a multiplexing factor of four or less does not greatly affect the speed of restores, depending on the type of drive and the type of system. If the backups do not finish within their assigned window, multiplexing can be increased to meet the window. However, increasing the multiplexing factor provides diminishing returns as the number of multiplexing clients increases. The optimum multiplexing factor is the number of clients needed to keep the buffers full for a single tape drive.

Set the multiplexing factor to four and do not multistream. Run benchmarks in this environment. Then, if needed, you can begin to change the values involved until both the backup and restore window parameters are met.

■ Multi-streaming

The NEW_STREAM directive is useful for fine-tuning streams so that no disk subsystem is under-utilized or over-utilized.
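For example, a policy backup selections list such as the following (the paths are illustrative) uses NEW_STREAM to send each mount point down its own stream when the policy allows multiple data streams:

NEW_STREAM
/home
NEW_STREAM
/export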


Encryption

When the NetBackup encryption option is enabled, your backups may run slower. How much slower depends on the throttle point in your backup path. If the network is the bottleneck, encryption should not hinder performance. If the network is not the bottleneck, encryption may slow down the backup.

In some field tests, certain local backups actually ran faster with encryption than without it, and memory utilization was found to be roughly the same with and without encryption.

Compression

Two types of compression can be used with NetBackup: client compression (configured in the NetBackup policy) and tape drive compression (handled by the device hardware). Some or all of the files may also have been compressed by other means prior to the backup.

How to enable compression

NetBackup client compression can be enabled by selecting the compression option in the NetBackup Policy Attributes window.

How tape drive compression is enabled depends on your operating system and the type of tape drive. Check with the operating system and drive vendors, or read their documentation to find out how to enable tape compression.

With UNIX device addressing, these options are frequently part of the device name. A single tape drive has multiple names, each with a different functionality built into the name. (This is really done with major and minor device numbers.) So, for instance, on Solaris, if you address /dev/rmt/2cbn, you get drive 2, hardware-compressed, with no-rewind option. If you address /dev/rmt/2n, its function should be uncompressed with the no-rewind option. The choice of device names determines device behavior.
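For example, on Solaris you can confirm that each device node is reachable and addresses the intended drive by querying each name with mt (the device numbers are illustrative):

mt -f /dev/rmt/2cbn status     # drive 2, hardware compression, no rewind
mt -f /dev/rmt/2n status       # drive 2, no compression, no rewind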

If the media server is UNIX, there is no compression when the backup is to a disk storage unit. The compression options in this case are limited to client compression. If the media server with the disk storage unit is Windows, and the directory used by the disk storage unit is compressed, then there will be compression on the disk write just as there would be for any file writes to that directory by any application.

Tape versus client compression

■ Tape drive compression is almost always preferable to client compression.

■ Tape compression offloads the compression task from the client and server.


■ Avoid using both tape compression and client compression, as this can actually increase the amount of backed-up data.

■ Only in rare cases is it beneficial to use client (software) compression. For very dense data, compression algorithms take a long time and often increase the overall size of the images when compressing an already compressed image. In cases where the files are already compressed, devices should be pointed to native device drivers. In other cases, NetBackup client compression should be turned off, and the hardware should handle the compression.

■ On UNIX: client compression reduces the amount of data sent over the network, but increases the load on the client. The NetBackup client configuration setting MEGABYTES_OF_MEMORY may help client performance. It is undesirable to compress files that are already compressed. If you find that this is happening with your backups, refer to the NetBackup configuration option COMPRESS_SUFFIX, which can be edited through bpsetconfig.
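For example, to exclude common pre-compressed file types from client compression, COMPRESS_SUFFIX could be set as follows (the suffix list and install path are illustrative):

echo "COMPRESS_SUFFIX = .gz .tgz .zip" | /usr/openv/netbackup/bin/admincmd/bpsetconfig -h client_name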

Using both encryption and compression

If a policy is enabled for both encryption and compression, the client first compresses the backup data and then encrypts it.

Client data compression (configured in the policy) is generally not needed, because data compression is handled internally by tape drives. When data is encrypted, it becomes randomized, and is no longer compressible. Therefore, data compression must be performed prior to any data encryption. In considering whether or not to use NetBackup client compression, see “Compression” on page 133.

NetBackup java

For performance improvement, refer to the following sections in the NetBackup System Administrator’s Guide for UNIX and Linux, Volume I: “Configuring the NetBackup-Java Administration Console,” and the subsection “NetBackup-Java Performance Improvement Hints.” In addition, the NetBackup Release Notes may contain information about NetBackup Java performance.

Vault

Refer to the “Best Practices” chapter of the NetBackup Vault System Administrator’s Guide.


Fast recovery with bare metal restore

Veritas Bare Metal Restore (BMR) provides a simplified, automated method by which to recover an entire system (including the operating system and applications). BMR automates the restore process to ensure rapid, error-free recovery. This process requires one Bare Metal Restore command and then a system boot. BMR guarantees integrity and consistency and is supported for both UNIX and Windows systems.

Note: BMR requires the True image restore option. This option has implications on the size of the NetBackup catalog. Refer to “Calculate the size of your NetBackup catalog” on page 22 for more details.

Backing up many small files

NetBackup takes longer to back up many small files than a single large file of the same total size. The following may improve performance when backing up many small files.

■ Use the FlashBackup (or FlashBackup-Windows) policy type. This is a feature of NetBackup Advanced Client. FlashBackup is described in the NetBackup Advanced Client System Administrator’s Guide.

See “FlashBackup” on page 136 of this Tuning guide for a related tuning issue.

■ On Windows, make sure virus scans are turned off (this may double performance).

■ Snap a mirror (such as with the FlashSnap method in Advanced Client) and back that up as a raw partition. This does not allow individual file restore from tape.

Some specific things to try to improve performance include:

■ Turn off or reduce logging.

The NetBackup logging facility has the potential to affect the performance of backup and recovery processing. Logging is usually enabled only while troubleshooting a NetBackup problem, so that any performance impact is short-term. The size of the impact depends on the amount of logging used and the verbosity level set.

■ Make sure the NetBackup buffer size is the same size on both the servers and clients.

■ Consider upgrading NIC drivers as new releases appear.


■ Run the following bpbkar throughput test on the client with Windows:

C:\Veritas\NetBackup\bin\bpbkar32 -nocont path > NUL 2> temp.f

(for example, C:\Veritas\NetBackup\bin\bpbkar32 -nocont c:\ > NUL 2> temp.f)

■ When initially configuring the Windows server, optimize TCP/IP throughput as opposed to shared file access.

■ Always select the choice of boosting background performance on Windows versus foreground performance.

■ Turn off NetBackup Client Job Tracker if the client is a system server.

■ Regularly review the patch announcements for every server OS. Install patches that affect TCP/IP functions, such as correcting out-of-sequence delivery of packets.

FlashBackup

If using Advanced Client FlashBackup with a copy-on-write snapshot method

If you are using the FlashBackup feature of Advanced Client with a copy-on-write method such as nbu_snap, assign the snapshot cache device to a separate hard drive. This will improve performance by reducing disk contention and potential head thrashing due to the writing of data to maintain the snapshot.

Tunable read buffer for Solaris (with nbu_snap method)

If the storage unit write speed (either tape or disk) is relatively fast, reading the client disk may become a bottleneck during a FlashBackup raw partition backup. By default, FlashBackup reads the raw partition using fixed 128 KB buffers for full backups and 32 KB buffers for incrementals.

In most cases, the default read buffer size will allow FlashBackup to stay ahead of the storage unit write speed. To further minimize the number of iowaits when reading client data, however, you can tune the FlashBackup read buffer size, allowing the nbu_snap driver to read continuous device blocks up to 1 MB per iowait, depending on the disk driver support. The read buffer size can be adjusted separately for both full backup and incremental backup.

In general, a larger buffer yields faster raw partition backup (but see the following note). In the case of VxVM striped volumes, if the read buffer is configured as a multiple of the striping block size, data can be read in parallel from the disks, significantly speeding up raw partition backup.


How to adjust the FlashBackup read buffer for Solaris clients

1 Create the following touch file on each Solaris client:

/usr/openv/netbackup/FBU_READBLKS

2 Enter the desired values in the FBU_READBLKS file, as follows.

On the first line of the file, enter an integer value for the read buffer size in bytes for full backups and/or the read buffer size in bytes for incremental backups. The default is to read the raw partition in 131072 bytes (128 KB) during full backups and in 32768 bytes (32 KB) for incremental backups. If changing both values, separate them with a space. For example, to set the full backup read buffer to 256 KB and the incremental read buffer to 64 KB, enter the following on the first line of the file:

262144 65536

You can use the second line of the file to set the tape record write size, also in bytes. The default is the same size as the read buffer. The first entry on the second line sets the full backup write buffer size, the second value sets the incremental backup write buffer size.

Note: Resizing the read buffer for incremental backups can result in a faster backup in some cases, and a slower backup in others. The result depends on such factors as the location of the data to be read, the size of the data to be read relative to the size of the read buffer, and the read characteristics of the storage device and the I/O stack. Experimentation may be necessary to achieve the best setting.
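Putting the two steps together, the touch file from the example above could be created as follows (the values are illustrative, not recommendations):

echo "262144 65536" > /usr/openv/netbackup/FBU_READBLKS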


Chapter 10

Tuning disk I/O performance

This chapter describes the hardware issues affecting disk performance with NetBackup. This information is intended to provide a general approach to disk tuning, not specific recommendations for your environment. Based on your hardware and other requirements unique to your site, you can use this information to adjust your configuration for better performance.

This chapter includes the following sections:

■ “Hardware performance hierarchy” on page 140

■ “Hardware configuration examples” on page 147

■ “Tuning software for better performance” on page 148

Note: The critical factors in performance are not software-based. They are hardware selection and configuration. Hardware has roughly four times the weight that software has in determining performance.


Hardware performance hierarchy

The following diagram shows the key hardware elements, and the interconnections (levels) between them, which affect performance. The diagram shows two disk arrays and a single non-disk device (tape, Ethernet connections, and so forth).

Figure 10-3 Performance hierarchy diagram

Performance hierarchy levels are described in later sections of this chapter.


In general, all data going to or coming from disk must pass through host memory. In the following diagram, a dashed line shows the path that the data takes through a media server.

Figure 10-4 Data stream in NetBackup media server to arrays

The data moves up through the ethernet PCI card at the far right. The card sends the data across the PCI bus and through the PCI bridge into host memory. NetBackup then writes this data to the appropriate location. In a disk example, the data passes through one or more PCI bridges, over one or more PCI buses, through one or more PCI cards, across one or more fibre channels, and so on.

Sending data through more than one PCI card increases bandwidth by breaking up the data into large chunks and sending a group of chunks at the same time to multiple destinations. For example, a write of 1 MB could be split into 2 chunks going to 2 different arrays at the same time. If the path to each array is x bandwidth, the aggregate bandwidth will be approximately 2x.

Each level in the Performance Hierarchy diagram represents the transitions over which data will flow. These transitions have bandwidth limits.

Between each level there are elements that can affect performance as well.


Performance hierarchy level 1

Level 1 is the interconnect within a typical disk array that attaches individual disk drives to the adaptor on each disk shelf. A shelf is a physical entity placed into a rack. Shelves usually contain around 15 disk drives. If you use fibre channel drives, the Level 1 interconnect is one or two fibre channel arbitrated loops (FC-AL). When Serial ATA (SATA) drives are used, the Level 1 interconnect is the SATA interface.

Level 1 bandwidth potential is determined by the technology used.

For FC-AL, the arbitrated loop could be either 1 gigabit or 2 gigabit fibre channel. An arbitrated loop is a shared-access topology, which means that only 2 entities on the loop can be communicating at one time. For example, one disk drive and the shelf adaptor can communicate. So even though a single disk drive might be capable of 2 gigabit bursts of data transfers, there is no aggregation of this bandwidth (that is, multiple drives cannot be communicating with the shelf adaptor at the same time, resulting in multiples of the individual drive bandwidth).

Performance hierarchy level 2

Level 2 is the interconnect external to the disk shelf. It attaches one or more shelves to the array RAID controller. This is usually FC-AL, even if the drives in the shelf are something other than fibre channel (SATA, for example). This shared-access topology allows only one pair of endpoints to communicate at any given time.


Larger disk arrays will have more than one internal FC-AL. Shelves may even support two FC-ALs, so that there are two paths between the RAID controller and every shelf, which provides redundancy and load balancing.

Performance hierarchy level 3

Level 3 is the interconnect external to the disk array and host.

While this diagram shows a single point-to-point connection between an array and the host, a real-world use more typically includes a SAN fabric (having one or more fibre channel switches). The logical result is the same, in that either is a data path between the array and the host.

When these paths are not arbitrated loops (for example, if they were fabric fibre channel), they do not have the shared-access topology limitations. That is, if two arrays are connected to a fibre channel switch and the host has a single fibre channel connection to the switch, the arrays can be communicating at the same time (the switch does the coordination with the host fibre channel connection). However, this does not aggregate bandwidth, since the host is still limited to a single fibre channel connection.

Fibre channel is generally 1 or 2 gigabit (both arbitrated loop and fabric topology). Faster speeds are coming on the market. A general rule-of-thumb when considering protocol overhead is that one can divide the gigabit rate by 10 to get an approximate megabyte-per-second bandwidth. So, 1-gigabit fibre channel can theoretically achieve approximately 100 MB/second and 2-gigabit fibre channel can theoretically achieve approximately 200 MB/second.

Fibre channel is also similar to traditional LANs, in that a given interface can support multiple connection rates. That is, a 2-gigabit fibre channel port will also connect to devices that only support 1 gigabit.


Performance hierarchy level 4

Level 4 is the interconnect within a host for the attachment of PCI cards.

A typical host will support 2 or more PCI buses, with each bus supporting 1 or more PCI cards. A bus has a topology similar to FC-AL in that only 2 endpoints can be communicating at the same time. That is, if there are 4 cards plugged into a PCI bus, only one of them can be communicating with the host at a given instant. Multiple PCI buses are implemented to allow multiple data paths to be communicating at the same time, resulting in aggregate bandwidth gains.

PCI buses have 2 key factors involved in bandwidth potential: the width of the bus (32 or 64 bits) and the clock or cycle time of the bus (in Mhz).

As a rule of thumb, a 32-bit bus can transfer 4 bytes per clock and a 64-bit bus can transfer 8 bytes per clock. Most modern PCI buses support both 64-bit and 32-bit cards. Currently PCI buses are available in 4 clock rates:

■ 33 Mhz

■ 66 Mhz

■ 100 Mhz (sometimes referred to as PCI-X)

■ 133 Mhz (sometimes referred to as PCI-X)

PCI cards also come in different clock rate capabilities.

Backward-compatibility is very common; for example, a bus rated at 100 Mhz will support 100, 66, and 33 Mhz cards.

Likewise, a 64-bit bus will support both 32-bit and 64-bit cards.

They can also be mixed; for example, a 100-Mhz 64-bit bus can support any mix of clock and width that are at or below those values.


Note: In a shared-access topology, a slow card can negatively impact the performance of other fast cards on the same bus. This is because the bus adjusts to the right clock and width for each transfer. One moment it could be doing 100 Mhz 64 bit to card #2 and at another moment doing 33 Mhz 32 bit to card #3. Since the transfer to card #3 will be so much slower, it takes longer to complete. The time that is lost may otherwise have been used for moving data faster with card #2.

You should also remember that a PCI bus is a unidirectional bus, which means that when it is doing a transfer in one direction, it cannot move data in the other direction, even from another card.

Real-world bandwidth is generally around 80% of the theoretical maximum (clock * width). Following are rough estimates for bandwidths that can be expected:

64 bit/ 33 Mhz = approximately 200 MB/second

64 bit/ 66 Mhz = approximately 400 MB/second

64 bit/100 Mhz = approximately 600 MB/second

64 bit/133 Mhz = approximately 800 MB/second

Performance hierarchy level 5

Level 5 is the interconnect within a host between PCI bridge(s) and memory. This bandwidth is rarely a limiting factor in performance.

General notes on performance hierarchies

The hardware components between interconnect levels can also have an impact on bandwidth.

■ A drive has sequential access bandwidth and average latency times for seek and rotational delays.

Drives perform optimally when doing sequential I/O to disk. Non-sequential I/O forces movement of the disk head (that is, seek and rotational latency).


This movement is a huge overhead compared to the amount of data transferred. So the more non-sequential I/O done, the slower it will get.

Reading and/or writing more than one stream at a time will result in a mix of short bursts of sequential I/O with seek and rotational latency in between, which will significantly degrade overall throughput. Because different drive types have different seek and rotational latency specifications, the type of drive selected has a large effect on how much the degradation will be.

From best to worst, such drives are fibre channel, SCSI, and SATA, with SATA drives usually having twice the latency of fibre channel drives. However, SATA drives have about 80% of the sequential performance of fibre channel drives.

■ A RAID controller has cache memory of varying sizes. The controller also does the parity calculations for RAID-5. Better controllers have this calculation (called “XOR”) in hardware which makes it faster. If there is no hardware-assisted calculation, the controller processor must perform it, and controller processors are not usually high performance.

■ A PCI card can be limited either by the speed supported for the port(s) or the clock rate to the PCI bus.

■ A PCI bridge is usually not an issue because it is sized to handle whatever PCI buses are attached to it.

■ Memory can be a limit if there is other intensive non-I/O activity in the system.

Note that the host CPU(s) do not appear in the “Performance hierarchy diagram” on page 140.

While CPU performance is obviously a contributor to all performance, it is generally not the bottleneck in most modern systems for I/O intensive workloads, because there is very little work done at that level. The CPU must execute a read operation and a write operation, but those operations do not take up much bandwidth. An exception is when older gigabit ethernet card(s) are involved, because the CPU has to do more of the overhead of network transfers.


Hardware configuration examples

The examples below are not intended as particular recommendations for your site; they are intended to show some factors to consider when adjusting hardware for better NetBackup performance.

Example 1

A general hardware configuration could have dual 2-gigabit fibre channel ports on a single PCI card. In such a case, the following is true:

■ Potential bandwidth is approximately 400 MB/second.

■ For maximum performance, the card must be plugged into at least a 66 Mhz PCI slot.

■ No other cards on that bus should need to transfer data at the same time. That single card will saturate the PCI bus.

■ Putting 2 of these cards (4 ports total) onto the same bus and expecting them to aggregate to 800 MB/second will never work unless the bus and cards are 133 Mhz.

Example 2

The following more detailed example shows a pyramid of bandwidth potentials with aggregation capabilities at some points. Suppose you have the following hardware:

■ 1x 66 Mhz quad 1 gigabit ethernet

■ 4x 66 Mhz 2 gigabit fibre channel

■ 4x disk array with 1 gigabit fibre channel port

■ 1x Sun V880 server (2x 33 Mhz PCI buses and 1x 66 Mhz PCI bus)

In this case, for maximum backup and restore throughput with clients on the network, the following is one way to assemble the hardware so that no constraints limit throughput.

■ The quad 1-gigabit ethernet card can do approximately 400 MB/second throughput at 66 Mhz.

■ It requires at least a 66 Mhz bus, because putting it in a 33 Mhz bus would limit throughput to approximately 200 MB/second.

■ It will completely saturate the 66 Mhz bus, so do not put any other cards on that bus that need significant I/O at the same time.

Since the disk arrays have only 1-gigabit fibre channel ports, the fibre channel cards will degrade to 1 gigabit each.


■ Each card can therefore move approximately 100 MB/second. With four cards, the total is approximately 400 MB/second.

■ However, you do not have a single PCI bus available that can support 400 MB/second, since the 66-Mhz bus is already taken by the ethernet card.

■ There are two 33-Mhz buses which can each support approximately 200 MB/second. Therefore, you can put 2 of the fibre channel cards on each of the 2 buses.

This configuration can move approximately 400 MB/second for backup or restore. Real-world results of a configuration like this show approximately 350 MB/second.

Tuning software for better performance

Note: The size of individual I/O operations should be scaled such that the overhead is relatively low compared to the amount of data moved. That means the I/O size for a bulk transfer operation (such as a backup) should be relatively large.

The optimum size of I/O operations is dependent on many factors and varies greatly depending on the hardware setup.

Below is the performance hierarchy diagram, but in this version, each array only has a single shelf.


Figure 10-5 Example hierarchy with single shelf per array

Note the following:

■ Each shelf in the disk array has 9 drives because it uses a RAID 5 group of 8+1 (that is, 8 data disks + 1 parity disk).

The RAID controller in the array uses a stripe unit size when performing I/O to these drives. Suppose that you know the stripe unit size to be 64KB. This means that when writing a full stripe (8+1) it will write 64KB to each drive.

The amount of non-parity data is 8 * 64KB, or 512KB. So, internal to the array, the optimal I/O size is 512KB. This means that crossing Level 3 to the host PCI card should perform I/O at 512KB.

■ The diagram shows two separate RAID arrays on two separate PCI buses. You want both to be performing I/O transfers at the same time.

If each is optimal at 512K, the two arrays together are optimal at 1MB.


You can implement software RAID-0 to make the two independent arrays look like one logical device. RAID-0 is a plain stripe with no parity. Parity protects against drive failure, and this configuration already has RAID-5 parity protecting the drives inside the array.

The software RAID-0 is configured for a stripe unit size of 512KB (the I/O size of each unit) and a stripe width of 2 (1 for each of the arrays).

Since 1MB is the optimum I/O size for the volume (the RAID-0 entity on the host), that size is used throughout the rest of the I/O stack.

■ If possible, configure the file system mounted over the volume for 1MB. The application performing I/O to the file system also uses an I/O size of 1MB. In NetBackup, I/O sizes are set in the configuration touch file .../db/config/SIZE_DATA_BUFFERS_DISK. See “Changing the size of shared data buffers” on page 105 for more information.
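For example, on a UNIX media server, setting the NetBackup disk I/O size to 1MB might look like the following (the full install path is assumed here; the value matches this example configuration, not a general recommendation):

echo "1048576" > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK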


Chapter 11

OS-related tuning factors

This chapter provides OS-related tuning recommendations that can improve NetBackup performance.

This chapter includes the following sections:

■ “Kernel tuning (UNIX)” on page 152

■ “Adjusting data buffer size (Windows)” on page 157

■ “Other Windows issues” on page 159


Kernel tuning (UNIX)

Several kernel tunable parameters can affect NetBackup performance on UNIX.

Note: Keep in mind that changing these parameters may affect other applications that use the same parameters. Making sizeable changes to these parameters may result in performance trade-offs. Usually, the best approach is to make small changes and monitor the results.

Kernel parameters on Solaris 8 and 9

The Solaris operating system dynamically builds the operating system kernel with each boot of the system.

The parameters below reflect minimum settings for a system dedicated to running Veritas NetBackup software.

Note: The parameters described in this section can be used on Solaris 8, 9, and 10. However, many of the following parameters are obsolete in Solaris 10. See “Kernel parameters in Solaris 10” on page 154 for a list of the parameters now obsolete in Solaris 10, and for further assistance with Solaris 10 parameters.

Below are brief definitions of the message queue, semaphore, and shared memory parameters. The parameter definitions apply to a Solaris system. The values for these parameters can be set in the file /etc/system.

■ Message queues

set msgsys:msginfo_msgmax = maximum message size

set msgsys:msginfo_msgmnb = maximum length of a message queue in bytes. The length of the message queue is the sum of the lengths of all the messages in the queue.

set msgsys:msginfo_msgmni = number of message queue identifiers

set msgsys:msginfo_msgtql = maximum number of outstanding messages system-wide that are waiting to be read across all message queues.

■ Semaphores

set semsys:seminfo_semmap = number of entries in semaphore map

set semsys:seminfo_semmni = maximum number of semaphore identifiers system-wide

set semsys:seminfo_semmns = number of semaphores system-wide

set semsys:seminfo_semmnu = maximum number of undo structures in system

set semsys:seminfo_semmsl = maximum number of semaphores per id


set semsys:seminfo_semopm = maximum number of operations per semop call

set semsys:seminfo_semume = maximum number of undo entries per process

■ Shared memory

set shmsys:shminfo_shmmin = minimum shared memory segment size

set shmsys:shminfo_shmmax = maximum shared memory segment size

set shmsys:shminfo_shmseg = maximum number of shared memory segments that can be attached to a given process at one time

set shmsys:shminfo_shmmni = maximum number of shared memory identifiers that the system will support

The ipcs -a command displays system resources and their allocation, and is a useful command to use when a process is hanging or sleeping to see if there are available resources for it to use.

Example

This is an example of tuning the kernel parameters for NetBackup master servers and media servers on a Solaris 8 or 9 system. Symantec provides this information only to assist in kernel tuning for NetBackup. See “Kernel parameters in Solaris 10” on page 154 for Solaris 10.

These are recommended minimum values. If /etc/system already contains any of these entries, use the larger of the existing setting and the setting provided here. Before modifying /etc/system, use the command /usr/sbin/sysdef -i to view the current kernel parameters.

After you have changed the settings in /etc/system, reboot the system to allow the changed settings to take effect. After rebooting, the sysdef command will display the new settings.

*BEGIN NetBackup with the following recommended minimum settings in a Solaris /etc/system file

*Message queues

set msgsys:msginfo_msgmap=512

set msgsys:msginfo_msgmax=8192

set msgsys:msginfo_msgmnb=65536

set msgsys:msginfo_msgmni=256

set msgsys:msginfo_msgssz=16

set msgsys:msginfo_msgtql=512

set msgsys:msginfo_msgseg=8192

*Semaphores

set semsys:seminfo_semmap=64

set semsys:seminfo_semmni=1024

set semsys:seminfo_semmns=1024


set semsys:seminfo_semmnu=1024

set semsys:seminfo_semmsl=300

set semsys:seminfo_semopm=32

set semsys:seminfo_semume=64

*Shared memory

set shmsys:shminfo_shmmax=16777216

set shmsys:shminfo_shmmin=1

set shmsys:shminfo_shmmni=220

set shmsys:shminfo_shmseg=100

*END NetBackup recommended minimum settings

■ Socket Parameters on Solaris 8 and 9

The TCP_TIME_WAIT_INTERVAL parameter sets the amount of time to wait after a TCP socket is closed before it can be used again. This is the time that a TCP connection remains in the kernel's table after the connection has been closed. The default value for most systems is 240000, which is 4 minutes (240 seconds) in milliseconds. If your server is slow because it handles many connections, check the current value for TCP_TIME_WAIT_INTERVAL and consider reducing it.

For Solaris or HP-UX, use the following command:

ndd -get /dev/tcp tcp_time_wait_interval
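If the value is high for your workload, it can be reduced with ndd -set and, as with tcp_deferred_ack_interval, made permanent in a boot script. For example (the value 60000, or 1 minute, is illustrative only):

/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 60000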

■ Force load parameters on Solaris 8 and 9

When system memory gets low, Solaris unloads unused drivers from memory and reloads drivers as needed. Tape drivers are a frequent candidate for unloading, since they tend to be less heavily used than disk drivers. Depending on the timing of these unload and reload events for the st (Sun), sg (Symantec), and Fibre Channel drivers, various issues may result. These issues can range from devices “disappearing” from a SCSI bus to system panics.

Symantec recommends adding the following “forceload” statements to the /etc/system file. These statements prevent the st and sg drivers from being unloaded from memory:

forceload: drv/st

forceload: drv/sg

Other statements may be necessary for various Fibre Channel drivers, such as the following example for JNI:

forceload: drv/fcaw

Kernel parameters in Solaris 10

In Solaris 10, all System V IPC facilities are either automatically configured or can be controlled by resource controls. Facilities that can be shared are memory,


message queues, and semaphores. For information on tuning these system resources, see Chapter 6, “Resource Controls (Overview),” in the Sun System Administration Guide: Solaris Containers-Resource Management and Solaris Zones.

For further assistance with Solaris parameters, refer to the Solaris Tunable Parameters Reference Manual, available at:

http://docs.sun.com/app/docs/doc/819-2724?q=Solaris+Tunable+Parameters

The following sections of the Solaris Tunable Parameters Reference Manual may be of particular interest:

■ What’s New in Solaris System Tuning in the Solaris 10 Release?

■ System V Message Queues

■ System V Semaphores

■ System V Shared Memory

Parameters obsolete in Solaris 10

The following parameters are obsolete in Solaris 10. Although they can still be included in the Solaris /etc/system file and are used to initialize the default resource control values, Sun does not recommend their use in Solaris 10.

semsys:seminfo_semmns
semsys:seminfo_semvmx
semsys:seminfo_semmnu
semsys:seminfo_semaem
semsys:seminfo_semume
semsys:seminfo_semusz
semsys:seminfo_semmap
shmsys:shminfo_shmseg
shmsys:shminfo_shmmin
msgsys:msginfo_msgmap
msgsys:msginfo_msgseg
msgsys:msginfo_msgssz
msgsys:msginfo_msgmax

Message queue and shared memory parameters on HP-UX

The kernel parameters that deal with message queues and shared memory can be mapped to work on an HP-UX system. Table 11-4 lists the HP-UX kernel tuning parameter settings.

Table 11-4 Kernel tuning parameters for HP-UX

Name     Minimum Value
mesg     1
msgmap   514
msgmax   8192
msgmnb   65536
msgssz   8
msgseg   8192
msgtql   512
msgmni   256
sema     1
semmap   semmni+2
semmni   300
semmns   300
semmnu   300
semume   64
semvmx   32767
shmem    1
shmmni   300
shmseg   120
shmmax   Calculate shmmax using the formula provided under "Recommended shared memory settings" on page 107.*

* shmmax = NetBackup shared memory allocation = (SIZE_DATA_BUFFERS * NUMBER_DATA_BUFFERS) * number of drives * MPX per drive

SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS are also discussed under "Recommended shared memory settings" on page 107.

To change these kernel parameters, use the System Administration Manager (SAM) unless you are thoroughly familiar with changing kernel parameters and rebuilding the kernel from the command line. From SAM, select Kernel Configuration > Configurable Parameters. Find the parameter to change, select Actions > Modify Configurable Parameter, and enter the new value. Repeat this for each parameter you want to change. Once all the values have been changed, select Actions > Process New Kernel. A warning appears stating that a reboot is required to put the new values into effect. After the reboot, use the sysdef command to confirm that the correct values are in place.
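To illustrate the shmmax formula in Table 11-4 (all values below are assumptions chosen for the example, not recommendations): with 256-Kilobyte data buffers (SIZE_DATA_BUFFERS = 262144), 16 buffers (NUMBER_DATA_BUFFERS = 16), two tape drives, and a multiplexing factor of 4 per drive, shmmax must be at least 262144 * 16 * 2 * 4 = 33554432 bytes, or 32 Megabytes.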

Caution: Any change to these kernel parameters requires a reboot to move the new kernel into place. Do not make the changes unless a system reboot can be performed; otherwise, the changes will not take effect.

Kernel parameters on Linux

To modify the Linux kernel tunable parameters, use sysctl. sysctl is used to view, set, and automate kernel settings in the /proc/sys directory. Most of these parameters can be changed online; to make the changes permanent, edit /etc/sysctl.conf. The kernel must have support for the procfs file system statically compiled in or dynamically loaded as a module.
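For example (kernel.shmmax is shown only as an illustration; the value is an assumption, not a NetBackup recommendation):

# View the current value
sysctl kernel.shmmax
# Change it for the running kernel
sysctl -w kernel.shmmax=16777216
# Persist the change across reboots
echo "kernel.shmmax = 16777216" >> /etc/sysctl.conf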

The default buffer size for tapes is 32K on Linux. To change it, either rebuild the kernel with changes to st_options.h, or pass buffer options to the st driver at boot time. An example grub.conf entry is:

title Red Hat Linux (2.4.18-24.7.x)
    root (hd0,0)
    kernel /vmlinuz-2.4.18-24.7.x ro root=/dev/hda2 st=buffer_kbs:256,max_buffers:8
    initrd /initrd-2.4.18-24.7.x.img

For further information on setting boot options for st, see /usr/src/linux*/drivers/scsi/README.st, subsection BOOT TIME.
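If the st driver is built as a loadable module rather than into the kernel (an assumption about your configuration), the same options can typically be set as module parameters in /etc/modules.conf on 2.4-series kernels; README.st also describes the module parameter syntax. A minimal sketch:

options st buffer_kbs=256 max_buffers=8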

Adjusting data buffer size (Windows)

The maximum data buffer size possible under Windows is 1024 Kilobytes, calculated as a multiple of operating system pages (1 page = 4 Kilobytes): the maximum is 256 OS pages, counted from 0 to 255 (the hex value 0xFF). Setting anything larger defaults back to 64 Kilobytes, the default size for the Scatter/Gather List.

The setting of the maximum usable block size is dependent on the Host Bus Adapter (HBA) miniport driver, not the tape driver or the OS. For example, the readme for the QLogic QLA2200 card contains the following:

* MaximumSGList

Windows includes enhanced scatter/gather list support for doing very large SCSI I/O transfers. Windows supports up to 256 scatter/gather segments of 4096 bytes each, allowing transfers up to 1048576 bytes.


Note: The OEMSETUP.INF file has been updated to automatically update the registry to support 65 scatter/gather segments. Normally, no additional changes will be necessary as this typically results in the best overall performance.

To change the data buffer size, do the following:

1 Click Start > Run and open the REGEDT32 program.

2 Select HKEY_LOCAL_MACHINE and follow the tree structure down to the QLogic driver as follows: HKEY_LOCAL_MACHINE > SYSTEM > CurrentControlSet > Services > Ql2200 > Parameters > Device.

3 Double-click MaximumSGList:REG_DWORD:0x21.

4 Enter a value from 16 to 255 (0x10 hex to 0xFF). A value of 255 (0xFF) enables the maximum 1 Megabyte transfer size. Setting a value higher than 255 reverts to the default of 64-Kilobyte transfers. The default value is 33 (0x21).

5 Click OK.

6 Exit the Registry Editor, then shut down and reboot the system.

The key setting here is the so-called SGList (Scatter/Gather list): the number of pages that can be either scattered or gathered (that is, read or written) in one DMA transfer. For the QLA2200, setting MaximumSGList to 0xFF (or just to 0x40 for 256Kb) allows 256Kb buffer sizes to be configured for NetBackup. Use extreme caution when modifying this registry value, and always contact the vendor of the SCSI/Fibre Channel card first to ascertain the maximum value that the particular card can support.
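As an alternative to editing the value by hand, the change can be captured in a .reg file. This is a sketch only: the key path follows the QLA2200 example above, and the value 0x40 corresponds to the 256-Kilobyte example in the text; confirm the correct path and the maximum supported value with the HBA vendor before applying it.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ql2200\Parameters\Device]
"MaximumSGList"=dword:00000040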

The same approach should be possible for other HBAs as well, especially Fibre Channel cards.

The default for JNI Fibre Channel cards using driver version 1.16 is 0x80 (512 Kilobytes, or 128 pages). The default for the Emulex LP8000 is 0x81 (513 Kilobytes, or 129 pages).

Note that for this approach to work, the HBA must install its own SCSI miniport driver. If it does not (as is the case with legacy cards such as older SCSI adapters), transfers are limited to 64 Kilobytes.

In conclusion, the built-in limit on Windows is 1024 Kilobytes, unless you are using the default Microsoft miniport driver for legacy cards. The limits are imposed by the HBA drivers and by the physical devices attached to them.

For example, Quantum DLT7000 drives work best with 128-Kilobyte buffers and StorageTek 9840 drives with 256-Kilobyte buffers. If these values are increased too far, damage could result to the HBA, to the tape drives, or to any devices in between (fibre bridges and switches, for example).

Other Windows issues

■ Troubleshooting NetBackup's use of configuration files on Windows systems.

If you create a configuration file on a Windows system for NetBackup's use (on UNIX systems, such files are called touch files), the file name must match the file name that NetBackup expects. In particular, make sure the file name does not have an extension, such as .txt.

If, for instance, you create a file called NOexpire to prevent the expiration of backup images, this file will not produce the desired effect if its name is NOexpire.txt.

Note also: the file must use a supported type of encoding, such as ANSI. Unicode encoding is not supported; if the file is in Unicode, it will not produce the desired effect.

To check the encoding type, open the file using a tool that displays the current encoding, such as Notepad. Select File > Save As and check the options in the Encoding field. ANSI encoding will work properly.
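From a command prompt, the true file name can be verified with dir, since Windows Explorer may hide known extensions such as .txt. The directory below is an assumption for illustration; use the location where NetBackup expects the touch file:

cd /d "C:\Program Files\VERITAS\NetBackup\bin"
dir NOexpire*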

■ Disable antivirus software when running file system backups on Windows 2000 or Windows XP. Antivirus applications scan every file that NetBackup backs up, which loads down the client's CPU and slows its backups.

As a workaround, in the Backup, Archive, and Restore interface, on the General tab of the NetBackup Client Properties dialog, clear the checkbox next to Perform incrementals based on archive bit.


Appendix A: Additional resources

This appendix lists additional sources of information.

Performance tuning information at Vision Online

For additional information on NetBackup tuning, go to van.veritas.com and click Vision Online 2005, then click Data Management, and select the NetBackup Performance Tuning document.

Performance monitoring utilities

■ Storage Mountain, previously called Backup Central, is a resource for all backup-related issues. It is located at http://www.storagemountain.com.

■ The following article discusses how and why to design a scalable data installation: “High-Availability SANs,” Richard Lyford, FC Focus Magazine, April 30, 2002.

Freeware tools for bottleneck detection

■ Iperf, for measuring TCP and UDP bandwidth:

http://dast.nlanr.net/Projects/Iperf1.1.1/index.htm

■ Bonnie, for measuring the performance of UNIX file system operations:

http://www.textuality.com/bonnie

■ Bonnie++, extends the capabilities of Bonnie:

http://www.coker.com.au/bonnie++/readme.html

■ Tiobench, for testing I/O performance with multiple running threads:

http://sourceforge.net/projects/tiobench/


Mailing list resources

■ You can find Veritas NetBackup news groups at:

http://forums.veritas.com

Search on the keyword "NetBackup" to find threads relevant to NetBackup.

■ The email list Veritas-bu discusses backup-related products such as NetBackup. Archives for Veritas-bu are located at:

http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

■ The Usenet news group comp.arch.storage.
