7/26/2019 Exchange_best_practice.pdf

    Engineering White Paper

    EMC CLARiiON Storage Solutions

Microsoft Exchange 2003 Best Practices: Storage Configuration Guidelines

    Abstract

This white paper presents the latest storage configuration guidelines and best practices for Microsoft Exchange on CLARiiON storage systems. It is focused on Exchange 2003, but most material also applies to Exchange 2000.

    Published 8/8/2005


Copyright 2005 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

Part Number H1363

EMC CLARiiON Storage Solutions: Microsoft Exchange 2003 Best Practices Storage Configuration Guidelines


Table of Contents

Executive Summary
Intended Audience
Introduction
Environmental Parameters for Storage Design
    User Community Information
    Backup/Recovery Requirements
    Other Organizational Requirements or Constraints
Planning Storage for the Exchange Production Data
    Exchange Storage Groups
        How Many ESGs per Server?
        How Many Databases per ESG?
        How Many LUNs per ESG?
    Calculating the Base I/O per User Requirement
    Calculating the IOPS Requirement for an Exchange Environment
    RAID Types and the Read/Write Ratio
    Other Factors That May Impact I/O
    Calculating the Capacity Requirement for Database LUNs
    Choosing a RAID and Disk Type
        Comparing RAID 1/0 to RAID 5
        Comparing 10K rpm to 15K rpm
        Comparing 73 GB, 146 GB, and 300 GB
        Capacity Check
        Summary
    MetaLUNs
        Building Blocks
    Log LUN Configuration
    Additional Storage Considerations for the Exchange Production Data
        Public Folders
        SMTP Queue
        Keeping EDB Files and STM Files Together
        Smaller Exchange Environments
Planning Storage for Local Recovery
    SnapView for Disk-Based Replication
        Clone-Based Replication
        Snapshot-Based Replication
        Comparison of Local Replication Options
    Online Backup to Disk
    Recovery Storage Groups
Planning Storage for Local Message Archiving
Storage-System Considerations
    iSCSI Guidelines
    CLARiiON Storage Systems Comparison
Putting It All Together
    Consider Site-Specific Constraints
    Configure the Cleanest Looking Layout Diagram
    Plan Throughout for Operational Resiliency
    Validate the Design
Additional Recommendations for Optimal Performance
    Storage-System Tuning
    Exchange Server and Windows Environment
    Windows File-System Alignment
Conclusion
Appendix A: Storage Design Examples
    Example 1
        RAID-Adjusted Back-end Disk IOPS Calculation
        Capacity Calculation with 200 MB Mailboxes (73 GB Drives)
        Capacity Calculation with 400 MB Mailboxes (73 GB Drives)
        Matching Up the IOPS and Capacity Requirements
    Example 2
        Capacity Calculation with 200 MB Mailboxes
        Matching Up the IOPS and Capacity Requirements
    Example 3
        Calculations
Appendix B: Quantifying Exchange User Activity
    Determining the Peak Activity Period
    Measuring IOPS per User
    Read/Write Ratio
    Performance Counter Guidelines
Appendix C: Additional Resources
    EMC White Papers
    Microsoft White Papers


Executive Summary

This white paper proceeds through the step-by-step process of planning the storage layout for Exchange data on an EMC CLARiiON storage system, offering considerations and the latest best practice recommendations along the way. The approach taken is to design a layout that meets the following goals:

- Optimal performance: During peak periods, user response times are still acceptable and there is no buildup of mail queues.
- Efficient backup and rapid recovery: Backups complete within the allotted window, with an acceptable impact on the production environment. Local recovery meets the service-level agreement (SLA) requirements.
- Simplicity of design: The resulting configuration is straightforward to implement and easy to manage.

In addition to recommendations for production data storage layout, the paper includes considerations for configuring CLARiiON storage for Exchange backup, local replication, and archiving.

Intended Audience

The intended audience for this white paper is system engineers who have customers interested in implementing Microsoft Exchange using EMC CLARiiON Fibre Channel storage.

The reader should have a general knowledge of Microsoft Exchange and Windows technology, as well as an understanding of basic CLARiiON features and terminology.

Introduction

It's said that if you ask 10 consultants to architect an Exchange storage design for an organization, you will get back 10 (or more) different design proposals. CLARiiON storage systems offer a great deal of flexibility with various combinations of RAID and disk types. Many Exchange storage configurations will work well on CLARiiON systems, but some may cause unforeseen bottlenecks. As new versions of Exchange are released and as CLARiiON array technology advances, best practice information for Exchange storage design is constantly evolving.

Environmental Parameters for Storage Design

Several factors figure into the storage design for an Exchange environment. The more you know about an organization's use of the existing messaging system, and the better defined the requirements are for the new implementation, the closer you should be able to come to constructing a useful design.

This section describes the data items to gather. Whenever possible, it is valuable to have this data supported by empirical measurements (concurrent users, IOPS, read/write ratio, log files/day, etc.) from the current environment.

User Community Information

The information gathered in this category should lead to a good estimate of the I/O profile for a set of users over time. It should also lead to determining the users' storage requirements. Appropriate tools and counters for quantifying the following information are included in Appendix B:

- How many total users?
  - Today
  - Anticipated growth over the next few years
- How many concurrent users during the peak period?
- How many mailboxes not associated with an individual user (such as a central help desk mailbox)?


- What mail client is used?
  - Outlook (2003 cached, or other)
  - Outlook Web Access
  - Mobile devices (Blackberry)
- When are the peak activity periods?
  - What is the typical working day? Is there geographic dispersal of the users across time zones?
- What is the Exchange activity level of the users?
  - Categorization of the user types leads to estimated base IOPS demand (see Table 1 on page 9).
  - Measured I/O in the existing environment gives the best starting point.
- Are there special-category users with different security, performance, or backup/recovery requirements?
- What are the mailbox size limits?
- Is there anything else pertinent that helps to describe the user profile for this organization?
  - Heavy use of personal folders?
  - Do users often send large documents?
  - Integrated use of voice mail?
  - Considerable use of Outlook 2003 shared folders?
- What are the characteristics of public folder usage?
  - Size of the public store
  - Replication activity among public stores

Backup/Recovery Requirements

The choice of backup and recovery method will play an important part in the resulting storage design. Once again, measurement of the existing environment will provide the best starting point for the new design.

- What is the deleted item retention period?
  - This is the time period that Exchange maintains an item after a user has deleted it, typically 10 to 30 days.
- What is the chosen backup method?
  - Backup to disk using standard Exchange online backup
  - Backup to tape using standard Exchange online backup
  - Clone-based replication (with archival backup to disk or tape): Uses an EMC application such as Replication Manager SE (RMSE) or Replication Manager (RM) to create physical copies of the Exchange data on CLARiiON clones (BCVs). Archival backup to disk or tape performs an offline copy of the Exchange data files (databases and logs) from the BCV.
  - Snapshot-based replication (with archival backup to disk or tape): Uses an EMC application such as RMSE or RM to create copies of the Exchange data on CLARiiON snaps. Archival backup to disk or tape performs an offline copy of the Exchange data files (databases and logs) from the snapshot.
- What is the timing of the backup activity?
- What are the requirements (service-level agreement) for recovery?
- Is a distance replication or disaster recovery solution planned?
  - DR site distance
  - Network connection


Other Organizational Requirements or Constraints

This category covers any additional pertinent factors that have already been decided or have been added as a requirement.

- What are the type, number, and location of Exchange servers?
- Are the Exchange servers clustered?
- What is the planned Exchange front-end/back-end server layout?
- What is the SAN/network structure?
- Is there an existing CLARiiON storage system?
- What other software will be operating in the Exchange environment?
  - Antivirus
  - E-mail archiving solution
  - Applications integrated with Exchange (e.g., for workflow)
  - Exchange-integrated third-party tools (e.g., for mailbox recovery, enhanced indexing)

Planning Storage for the Exchange Production Data

This section provides guidelines for configuring storage to handle the production Exchange data. When designing a storage configuration for Exchange, the first disk measurement to consider must be the I/O operations per second (IOPS) that the Exchange environment requires. Once you have calculated the number of drives necessary to meet the I/O demand, determine the capacity requirement and adjust the drive count upward if necessary.

Exchange Storage Groups

The Exchange storage group (ESG) is the fundamental unit for layout planning. When backing up Exchange, the elements of an ESG should be treated together.

How Many ESGs per Server?

In the past, Microsoft recommended that, to make efficient use of CPU and memory, the administrator fill up an ESG with users before creating additional Exchange storage groups. However, improvements in Exchange (starting in Exchange 2000 SP2) allow the use of Exchange's maximum of four production storage groups without running out of system resources. Because there is a single set of log files for all databases within one storage group, there are advantages to configuring more ESGs on a server even when you don't have to. With Exchange 2003, in most cases it is best to use all four storage groups.

Following are some considerations for the various ESG configuration options.

Using All Four ESGs

- Works well with most servers deployed today (at least dual processor and 1 GB of memory).
- Offers the best granularity for performance. Using multiple Exchange storage groups results in more log operations in parallel. Since database performance depends on log file performance, increasing the number of logs can increase overall performance.
- Offers the most granular management. Using four storage groups allows maintenance and recovery operations that are fully compartmentalized and affect the fewest users possible.
- Allows fully separate treatment of a set of users with different performance, security, or backup/recovery requirements.


Using the Fewest ESGs Possible

- Makes the most efficient use of server resources and thus is better for underpowered servers. This has become a less significant (potentially negligible) factor with newer servers and Exchange 2003.
- Fewer ESGs may be easier to manage.
- This comes at the expense of lost flexibility and added exposure. Downtime or data loss caused by the loss of a particular database or storage group affects more users.

Using Two or Three ESGs

- If a server does not have the resources to handle four ESGs, spreading users across an extra storage group or two can be a compromise.
- Appropriate for a smaller number of mailboxes on the server (~1,500 or less).
- Some organizations may prefer to reserve the fourth storage group for growth, low-risk testing, or some recovery scenarios.
- In some cases, the optimal disk layout may align better with two or three ESGs for performance or capacity reasons.

How Many Databases per ESG?

There can be up to five databases in each Exchange 2000 or 2003 storage group. Following are some considerations for database configuration.

Using Five Databases

- A particular database can become logically corrupt. With five databases, logical problems on one database will affect the data of the fewest people.
- For some backup methods, a single database can be restored without affecting the users in other databases in the ESG. Thus database recovery affects the fewest people.
- Makes for databases of the smallest possible size. If you are allowing space for offline defragmentation, you can keep all databases on the same LUN but need to allow defrag space only for the size of one database (the largest).

Using Fewer Than Five Databases

- Administration may be a little easier when operating manually.
- This is one way of reserving for growth.
- Single-instance store is on a per-database basis. Spreading users across fewer databases may provide storage and performance benefits from single-instance storage (e.g., when sending a large document to a large distribution list).
- In some cases, the optimal disk layout may align better with three or four databases for performance or capacity reasons.

How Many LUNs per ESG?

In most cases, two LUNs should be allocated for each ESG: one for the transaction logs, and one for the databases (.edb and .stm files).

The reasons for placing all ESG databases together, and some considerations for variations, are described in the next sections.

Maintaining All Databases on the Same LUN

- Less complexity for backup/recovery and disk configuration.
- Offers the optimal capacity utilization for offline database defragmentation.
- Allows growth space to be shared among any of the databases.
- Best performance for volume shadow copies. Fewer LUNs provide additional safety margin for completing VSS split operations within the allotted 10-second time window.


Distributing Databases on Multiple LUNs

- Allows for more granular restore from clones (but during recovery, all databases in the ESG must be down anyway because of the resetting of the checkpoint file).
- In the case of very large mailboxes, LUN sizes can approach 1 TB or more. Splitting the ESG databases across two or three LUNs can improve clone synchronization times and the performance of incremental SAN Copy from a clone.

Calculating the Base I/O per User Requirement

The best way to provide enough I/O to your application, especially in a large Exchange environment, is to know your users' usage profile. Sizing of the storage infrastructure should be based on a careful analysis of the number of current and anticipated users, and their messaging habits and patterns. The fundamental calculation concerns I/Os per second (IOPS) per user.

Table 1 describes four Exchange user categories and provides an estimate of the IOPS demand per user for each.[1]

Table 1. Exchange User Profiles

Typical User Profile   Description                                 Mailbox Size   Expected I/Os per Second
Light POP3             Hosted Internet mail                        < 25 MB        0.08
Light MAPI             Infrequent e-mail access; small mailboxes

[The remaining rows of Table 1 were lost in extraction. The worked example later in this paper assumes 0.5 IOPS for a typical user and 1 IOPS for a heavy user.]

Several factors can increase IOPS demand beyond these base profiles:

- A large number of users per server (>2000 users)
- Large mailboxes (>200 MB): Start with a rough estimate that for each doubling of the mailbox size over 100 MB, you increase the IOPS per user by about one third.
- Regularly sending very large documents (>5 MB): Integrated voice mail is roughly equivalent to large documents.
- Blackberry client users: Count each Blackberry user as the equivalent of two to three typical users.
- Journaling: This adds significant overhead. Start with an estimate of double the I/O.

[1] IOPS numbers referenced from the Microsoft Exchange Server 2003 Performance and Scalability Guide white paper.
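The mailbox-size rule of thumb above (roughly one third more IOPS per user for each doubling of mailbox size over 100 MB) can be sketched as a small calculation. The function name and the logarithmic interpolation between doublings are illustrative assumptions, not part of the paper:

```python
import math

def size_adjusted_iops(base_iops, mailbox_mb):
    """Add about one third of the base IOPS for each doubling of the
    mailbox size over 100 MB (rule of thumb from the text; log2
    interpolation between exact doublings is an assumption)."""
    if mailbox_mb <= 100:
        return base_iops
    doublings = math.log2(mailbox_mb / 100)
    return base_iops * (1 + doublings / 3)

# A 0.5 IOPS user with a 400 MB mailbox (two doublings over 100 MB):
# 0.5 x (1 + 2/3), or roughly 0.83 IOPS
print(round(size_adjusted_iops(0.5, 400), 2))  # 0.83
```

As with all of these estimates, a measured value from the existing environment should take precedence over the rule of thumb.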


A typical read/write ratio in Exchange is from 2:1 to 3:1. A ratio lower than this (a higher percentage of writes) will also increase the IOPS per user on RAID storage. This is discussed in "RAID Types and the Read/Write Ratio" on page 10.

Organizations typically use many mailboxes in Exchange that are not associated with an individual user. These mailboxes often serve as the central contact point for a group or a conference room, or are used by integrated applications. Depending on the number of these unassociated mailboxes (it can be significant: 10 percent or more) and their activity level (which can vary significantly), it may be appropriate to factor them into the IOPS calculation. By default, treat these mailboxes as equivalent to typical user mailboxes, and always include them in the capacity calculation.

Calculating the IOPS Requirement for an Exchange Environment

This is a key step in configuring CLARiiON storage for good Exchange performance. It is a best practice to configure dedicated disk drives for the Exchange databases. Calculate the periods of highest I/O demand during the day by looking at the anticipated cumulative effect of user activity, system activity (virus checkers), and background activity (local or remote replication). Balance the I/O where possible with scheduling (backup during off-peak hours) and even distribution of users across ESGs. Then, plan a design that will handle the resulting peak I/O load.

Start with the measurement or estimate of the I/O profile of the Exchange users in the organization. Plan for the peak user load time, typically mid-morning on Monday.

RAID Types and the Read/Write Ratio

Depending on the particular organizational requirements, there are two RAID type options that can be appropriate for production Exchange database LUNs:

- RAID 1/0: This offers the best performance with high protection, but only 50 percent of the RAID group capacity is usable. It is frequently recommended because, with today's larger disk drives, it provides sufficient space across the number of spindles required for handling the peak I/O load. On a RAID 1/0 LUN, there are two physical I/O operations for each write requested (a write to each mirrored disk), described as a write penalty of two.
- RAID 5: This configuration offers a higher usable capacity per RAID group than RAID 1/0. It can be effective for environments with very large mailboxes and/or lower IOPS requirements. However, in a RAID 5 group there are four physical I/O operations for each write requested (two reads to calculate parity, one write for data, and one write for parity).

    Regardless of the RAID type chosen, it is important to configure enough drives to handle the I/O demand.

Table 2. Write Penalty by RAID Type

RAID Type                        Write Penalty
RAID 1/0 (Striping + Mirroring)  2
RAID 5 (Striping + Parity)       4

An example with 3,000 Exchange users, separated evenly into four storage groups, runs through the set of calculations in the next few sections. The example is highlighted in a series of text boxes.

- 1,000 heavy users at a peak of 1 IOPS each [1,000 IOPS]
- 2,000 typical users at a peak of 0.5 IOPS each [1,000 IOPS]
- A general read/write ratio of 2:1
- Maximum concurrency of active users at 90%

This results in a requirement of [(1000 + 1000) x 0.9] = 1,800 host-based IOPS for the 3,000 users during their peak activity period.
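The arithmetic in the example box can be reproduced as a short sketch; the function name and input shape are illustrative, not from the paper:

```python
def peak_host_iops(user_groups, concurrency):
    """Total host-based IOPS at peak: sum (user count x per-user IOPS)
    across the user profiles, then apply the peak concurrency factor
    (the fraction of users active at the same time)."""
    total = sum(count * iops for count, iops in user_groups)
    return total * concurrency

# 1,000 heavy users at 1 IOPS and 2,000 typical users at 0.5 IOPS,
# with 90% of users concurrently active:
print(peak_host_iops([(1000, 1.0), (2000, 0.5)], 0.9))  # 1800.0
```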


Use this formula to adjust the base user I/O requirement in the ESG by figuring in the write penalty for each RAID group:

(Base IOPS x Read %) + (Base IOPS x Write % x Write Penalty) = RAID-Adjusted Back-End IOPS

Carrying on the example above, with 1,800 total base IOPS and a 2:1 read/write ratio:

For RAID 1/0:
(1800 x 2/3) + (1800 x 1/3 x 2) = 1200 + 1200 = 2400 IOPS

For RAID 5:
(1800 x 2/3) + (1800 x 1/3 x 4) = 1200 + 2400 = 3600 IOPS
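The write-penalty adjustment can be expressed as a small helper. This is a sketch of the formula above, with the penalties taken from Table 2; the names are illustrative:

```python
# Write penalties per physical write, from Table 2
WRITE_PENALTY = {"RAID 1/0": 2, "RAID 5": 4}

def raid_adjusted_iops(base_iops, read_fraction, raid_type):
    """(Base IOPS x Read %) + (Base IOPS x Write % x Write Penalty)."""
    write_fraction = 1 - read_fraction
    return (base_iops * read_fraction
            + base_iops * write_fraction * WRITE_PENALTY[raid_type])

# 1,800 base IOPS with a 2:1 read/write ratio (read fraction 2/3):
print(round(raid_adjusted_iops(1800, 2 / 3, "RAID 1/0")))  # 2400
print(round(raid_adjusted_iops(1800, 2 / 3, "RAID 5")))    # 3600
```

Note how the same host load translates into 50 percent more back-end disk I/O on RAID 5 than on RAID 1/0 at this read/write ratio.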

Other Factors That May Impact I/O

There are other administrative operations that may impact I/O. Several of these should be scheduled to take place only during off-peak times (see the following examples). This additional I/O activity must be accounted for by increasing the IOPS capacity by some amount over the RAID-adjusted IOPS requirement. To estimate the amount of I/O overhead these additional activities will cost, it is best to perform tests in an environment that matches the target production environment as closely as possible. The less sure you are of this overhead, the more capacity you should assign to ensure good performance.

Examples of Background I/O Activity That Cannot Be Scheduled to Off-Peak Times

- High load on the server: The more active mailboxes a server is managing, and the less memory it has, the less likely any particular user's mailbox will be cached. If this has not been figured into the per-user IOPS requirement, it should be figured in here. For example, going from 2,500 to 4,000 users on a system can increase the IOPS per user by about 10 percent.
- Server-based antivirus protection: Besides the extra reads, antivirus software can add 20 percent or more to the CPU utilization of the Exchange server.
- Integrated features and applications: The impact here depends on the number and type of any integrated features (such as content indexing) or applications (such as workflow), and the amount of their use.
- Synchronous or asynchronous mirroring: If a mirroring solution is used for distance replication or disaster recovery, it should be factored in very carefully.

The total I/O demand during peak user load can be calculated by adding the cumulative overhead of the background activity to the RAID-adjusted IOPS requirement. This overhead is often calculated simply as an added percentage, but some activities (such as virus checkers) involve primarily reads, in which case it is more accurate to ignore the write penalty in the calculation.

RAID-Adjusted User IOPS + I/O Overhead = IOPS Requirement at Peak User Activity

For example, if operating on an active 3,000-user Exchange server running a frequently used workflow application, it would be reasonable to start with an estimated overhead percentage of 20 percent. Continuing the example to calculate the IOPS requirement during peak user activity:

RAID 1/0:
2400 + 20% = 2880 IOPS

RAID 5:
3600 + 20% = 4320 IOPS
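As a sketch, the flat-percentage overhead method completes the example calculation (the function name is assumed; a read-heavy overhead such as a virus checker would, per the text, more accurately be added before the write-penalty adjustment):

```python
def iops_with_overhead(raid_adjusted_iops, overhead_percent):
    """Add background-activity overhead as a flat percentage on top of
    the RAID-adjusted IOPS requirement (the simple method in the text)."""
    return raid_adjusted_iops * (1 + overhead_percent / 100)

# 20% overhead on the 3,000-user example:
print(iops_with_overhead(2400, 20))  # 2880.0 (RAID 1/0)
print(iops_with_overhead(3600, 20))  # 4320.0 (RAID 5)
```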


    There are other schedulable activities that can significantly increase I/O demand on the Exchange LUNs.

    You should understand their impact. It is important to factor these activities into the peak I/O requirements

    if they are scheduled to take place during a period of high user activity.

    Examples of Schedulable Activities

    Online backup to disk or tape

    This places heavy read activity on the production LUNs. There is added overhead if the Exchange

    server is used to manage the backup.

    Local clone-based replication

    Clone-based replication using an application such as RMSE involves synchronizing all source

    Exchange LUNs in the ESG to their clones. During the incremental synchronization part of the

    process, there is heavy back-end read activity against the production LUNs.

    Once the copy is complete on the clones, Eseutil performs an integrity check on each page of the database replicas. The validated copy may also then be archived or copied (via SAN Copy) to a remote site. These checking and archiving functions cause high read activity, but it is to the clones rather than the production LUNs. As long as best practice is followed and the clone LUNs are on separate drives from production LUNs, this heavy read activity comes into calculations only in terms of backup scheduling and use of overall array resources. A single Eseutil process can use up to 30 percent of the CPU of one SP.

    Local snap-based replication

    Snaps will have a more significant impact on production Exchange LUNs than clone-based replication.

    If snaps are chosen as a backup method, it is particularly important to factor in the associated impact.

    This is discussed further in the section Snapshot-Based Replication on page 22.

    Exchange online maintenance

    By default, Exchange schedules online database maintenance for a 4-hour period nightly to perform functions that include clearing out deleted mailboxes and deleted items that have gone past their retention period, plus online defragmentation. This timing and duration can be adjusted.

    Because of the heavy I/O that online maintenance adds, schedule it to take place during the period of lightest activity. At the beginning of online maintenance, Exchange performs an Active Directory lookup for each user in the database. Slightly offsetting the online maintenance start times of the databases will reduce the impact of these searches on the Active Directory.

    You cannot perform a backup on a database at the same time it is undergoing online maintenance. Maintenance will pause until the backup job for that database completes, but it will not extend operation past its allotted time window. Take this into consideration to ensure that online maintenance gets enough time each day.

    In summary, take additional I/O activity into consideration when calculating the anticipated demand. Then,

    design to accommodate the peak I/O load with that overhead factored in.

    Calculating the Capacity Requirement for Database LUNs

    Usually the I/O requirements will determine the number of drives required, but it is also necessary to know the space requirement for the databases in the storage group. For certain environments (such as large mailboxes or low IOPS per user) the storage capacity requirements may call for more disk space than performance needs dictate. Comparatively, this is an easy calculation.

    For each category of users in the ESG, multiply the maximum allowed mailbox size by the number of users in that category. If the number of users is planned to grow, factor that in here. Allow an additional percentage of the database size for deleted item retention (~10 percent for a typical 30-day retention period). Because it's important not to run out of space on the LUN, allow an additional buffer of 10 to 20 percent of the sum of the databases on the LUN.


    Offline Defragmentation

    With Exchange 2003, in most cases it is no longer necessary to perform offline defragmentation of the databases. Normal online maintenance will defragment the database, but it will not compact the size of the file. The only way to actually shrink the database size is to perform offline defragmentation, where the database is dismounted and Eseutil is used to rebuild a new copy. To have the shortest time offline, this rebuild is performed on the same LUN as the source. There must be free space equaling at least 110 percent of the size of the database for the rebuild. If multiple databases are stored on the same LUN, enough space must be allowed to handle the rebuild of the largest database.

    Summarizing the Calculation

    Space Required for the ESG Database LUNs =

    Maximum Mailbox Size * Number of mailboxes

    + Extra space for the deleted item retention

    + Public Folder space (if part of the ESG)

    + 10% to 20% free space for growth protection

    + Space for offline defragmentation if required

    When four or five databases are planned for each storage group, deleted item retention is typical, and space is required for offline defragmentation, you can perform a quick capacity calculation by simply allocating free space equal to 50 percent of the mailbox total (62.5 x 1.5 = 93.75).

    For example, with 750 users in one ESG, distributed evenly across five databases but all on the same LUN (no offline defrag space included):

    250 heavy users @ 100 MB Mailbox 25 GB

    500 typical users @ 75 MB Mailbox 37.5 GB

    Sum requirement for mailboxes 62.5 GB

    Add 10% for deleted item retention 62.5 x 1.1 = 68.8 GB

    Size of each database 68.8 / 5 = 13.8 GB

    Add 15% for extra free space 68.8 x 1.15 = 79.1 GB

    To include space for offline defragmentation, add space equivalent to the size of the largest database.

    With space for offline defragmentation 79.1 + 13.8 = 92.9 GB (extra free space is already available)
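    The worked example above can be reproduced with a short Python sketch (illustrative; the function and its defaults are ours, mirroring the steps above, and the small difference from the text's 92.9 GB comes from intermediate rounding):

```python
def esg_database_space_gb(mailboxes, retention_pct=10, growth_pct=15,
                          databases_per_lun=5, include_defrag=True):
    """Space for an ESG database LUN; mailboxes is a list of
    (user_count, mailbox_size_mb) tuples."""
    total_gb = sum(users * mb for users, mb in mailboxes) / 1000
    with_retention = total_gb * (1 + retention_pct / 100)
    largest_db = with_retention / databases_per_lun  # even distribution assumed
    space = with_retention * (1 + growth_pct / 100)
    if include_defrag:  # room to rebuild the largest database offline
        space += largest_db
    return round(space, 1)

# 250 heavy users @ 100 MB plus 500 typical users @ 75 MB
print(esg_database_space_gb([(250, 100), (500, 75)]))  # -> 92.8
```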

    Choosing a RAID and Disk Type

    Regardless of RAID type, a physical disk can handle a certain number of Exchange-style IOPS. The IOPS capacity of disks continues to improve with new disk models, but the performance improvement has not kept pace with the increase in storage capacity. Consequently, most Exchange disk configurations today are determined by I/O requirements rather than capacity.

    There is not consistent agreement on the IOPS capability of a disk drive. Although some sequential I/O tests have indicated that a CLARiiON 10K rpm drive can perform at a speed greater than 300 IOPS, a more practical value to use with the Exchange 4 KB random I/Os on CLARiiON has been determined to be in the vicinity of 130 IOPS. Similarly, a practical value to use for 15K rpm drives is 180 IOPS.


    Table 3. IOPS per Spindle

    Disk rpm    CLARiiON Disk I/O Capacity with Exchange Databases
    10K         130 IOPS
    15K         180 IOPS

    Using these values, you can divide the RAID-adjusted IOPS requirement by the per-disk IOPS value to construct a table indicating the number of drives needed to handle the I/O demand of the ESG. Although in the end you may be required to round up the number of drives further to meet RAID 1/0 or RAID 5 geometry requirements, the following table leaves the number as calculated (rounded up only to the next whole drive).

    RAID-Adjusted IOPS / IOPS per Disk = Drive Count for the Exchange Database LUNs

    Continuing the example, calculating the drive requirement for the RAID 1/0 IOPS total (2880) and RAID 5

    IOPS total (4320):

    Table 4. Disk-Drive Requirements by RAID Type and Disk Speed - All Users

               RAID 1/0         RAID 5
    10K rpm    23 (2880/130)    34 (4320/130)
    15K rpm    16 (2880/180)    24 (4320/180)
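    The drive-count arithmetic behind Table 4 can be checked with a trivial Python sketch (function name is ours):

```python
import math

def drives_needed(raid_adjusted_iops, iops_per_disk):
    """Spindles required for the ESG database LUNs, rounded up to a whole drive."""
    return math.ceil(raid_adjusted_iops / iops_per_disk)

for rpm, per_disk in (("10K", 130), ("15K", 180)):
    # RAID 1/0 total (2880 IOPS) and RAID 5 total (4320 IOPS) from the example
    print(rpm, drives_needed(2880, per_disk), drives_needed(4320, per_disk))
```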

    Since it's advisable to lay out Exchange databases on dedicated drives, the most straightforward design is to dedicate one or more RAID groups for each ESG.

    If the 3,000 users are spread across four storage groups, applying the example to some dedicated RAID configurations:

    Table 5. Disk-Drive Requirements by RAID Type and Disk Speed - Per ESG

               RAID 1/0    RAID 5
    10K rpm    6 (3+3)     10 (two 4+1 groups), or 9 ((4+1) and (3+1))
    15K rpm    4 (2+2)     6 (5+1)

    This assumes that the I/O requirements for each storage group are the same. In practice, some fine-tuning is often required because of varying user counts and activity levels.

    Comparing RAID 1/0 to RAID 5

    It is an accepted notion that RAID 1/0 is a better choice for random-write environments like Exchange. The effect is somewhat subtle; since all writes hit the write cache, RAID 1/0 and RAID 5 groups perform equally well until the storage system is sufficiently busy and the write cache becomes saturated. The advantage of using RAID 1/0 rather than RAID 5 with Exchange is that RAID 1/0 groups can flush the cache of Exchange's random write load about 15 percent to 30 percent faster than RAID 5 groups. This equates to cache-speed performance at a higher random-write load. Additionally, rebuild times and rebuild impact are reduced with RAID 1/0 in the event of disk failures (see Table 6, reproduced from the CLARiiON Best Practices for Fibre Channel Storage white paper).


    Table 6. RAID Types and Relative Performance in Failure Scenarios

    RAID Type    Rebuild IOPS Loss    Rebuild Time                       Impact of Second Failure during Rebuild
    RAID 5       50%                  15% to 50% slower than RAID 1/0    Loss of data
    RAID 1/0     20% - 25%            15% to 50% faster than RAID 5      Loss of data 14% of the time in an eight-disk group (1/[n-1])
    RAID 1*      20% - 25%            15% to 50% faster than RAID 5      Loss of data

    * RAID 1 is not a recommended RAID type for Exchange database LUNs, but it may be appropriate for some smaller log LUN configurations.

    The downside may be cost. RAID 1/0 requires more drives for a given capacity, but that extra capacity may

    not be valuable when the spindle count is determined by I/O throughput. Consider the following

    calculation to determine the number of users that one physical drive can handle:

    User IOPS per Drive = IOPS per Drive x Host-Based IOPS / Back-End IOPS

    The ratio of host-based IOPS to back-end IOPS depends on the read/write ratio and the RAID type. The easiest way to get this figure is to add the left and right sides of the read/write ratio and divide by (read ratio + write ratio x write penalty).

    RAID 1/0, User IOPS per Drive with a 3:1 read/write ratio on a 10K drive:

    130 x (3 + 1) / (3 + 1 x 2) = 130 x 4 / 5 = 104 IOPS

    RAID 5, User IOPS per Drive with a 3:1 read/write ratio on a 10K drive:

    130 x (3 + 1) / (3 + 1 x 4) = 130 x 4 / 7 = 74 IOPS

    Users per Drive = User IOPS per Drive / IOPS per User

    RAID 1/0, .8 IOPS per user:

    104 / .8 = 130 users per drive

    RAID 5, .8 IOPS per user:

    74 / .8 = 92 users per drive
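    These users-per-drive figures follow directly from the read/write ratio and the RAID write penalty; a Python sketch (illustrative, with our own function names):

```python
def user_iops_per_drive(disk_iops, read_ratio, write_ratio, write_penalty):
    """Host-visible IOPS one spindle can absorb, given the RAID write penalty."""
    host_io = read_ratio + write_ratio
    back_end_io = read_ratio + write_ratio * write_penalty
    return disk_iops * host_io / back_end_io

def users_per_drive(disk_iops, read_ratio, write_ratio, write_penalty,
                    iops_per_user):
    return int(user_iops_per_drive(disk_iops, read_ratio, write_ratio,
                                   write_penalty) / iops_per_user)

# 10K drive (130 IOPS), 3:1 read/write ratio, .8 IOPS per user
print(users_per_drive(130, 3, 1, 2, 0.8))  # RAID 1/0 (penalty 2) -> 130
print(users_per_drive(130, 3, 1, 4, 0.8))  # RAID 5 (penalty 4)   -> 92
```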

    Table 7 uses these calculations for six drives, configured as RAID 1/0 and RAID 5, with a few different mailbox sizes, allowing 50 percent free space on the LUN for deleted item retention, growth, and offline defragmentation.

    Table 7. Capacity Comparison for Six 73 GB Drives by RAID Type and Mailbox Size

                          Usable      Users @        Required Capacity
                          Capacity    .8 IOPS        75 MB       150 MB      250 MB
                                                     Mailboxes   Mailboxes   Mailboxes
    RAID 1/0 (3+3) 10K    198 GB      780 (130x6)    88 GB       176 GB      293 GB
    RAID 5 (5+1) 10K      330 GB      552 (92x6)     62 GB       124 GB      207 GB

    Note that once the mailbox size gets much over 150 MB, there is not enough space on the RAID 1/0 group to accommodate the number of users whose I/O it can handle. This does not take into consideration the fact that the increased mailbox size causes the IOPS per user to increase.

    RAID 5 becomes more appropriate when the capacity requirements are high, relative to the I/O demand,

    such as when the I/O demand is low (< .4 IOPS) or the mailbox limit is high (>250 MB).


    Comparing 10K rpm to 15K rpm

    Another choice for buying performance is the 15K rpm drive. The 15K rpm drive offers up to 30 percent better performance than 10K rpm drives in the kind of random-access load that Exchange presents. The increased speed helps the write cache avoid saturation and keeps writes going at cache speeds. This does not apply to the sequential-access log devices, where the two drive speeds perform about the same.

    Comparing 73 GB, 146 GB, and 300 GB

    Smaller drives offer more performance per gigabyte. However, the performance/gigabyte curve drops as larger drives are deployed. Base the disk-size decision for production Exchange drives on:

    The I/O capacity of the drive (number of users it will support, when averaged across the RAID group).

    The average maximum mailbox size for those users.

    Design simplicity and flexibility.

    For example, if a RAID 1/0 3+3 group will support the I/O demand of 600 Exchange users with an average mailbox maximum of 100 MB, 73 GB drives provide more than enough storage capacity. In some cases,

    where the predominant disk on the storage system is a larger size or faster speed, it may make sense to

    standardize on that type to simplify LUN layout. Refer to Appendix A for some comparative examples.

    Capacity Check

    The amount of usable space in a RAID group can be calculated with the following formula2:

    RAID 1/0:

    Usable space of a RAID 1/0 Group = Usable Drive Space x (# Drives / 2)

    RAID 5:

    Usable space of a RAID 5 Group = Usable Drive Space x (# Drives - 1)

    Table 8 compares the usable capacity of 10 drives configured as RAID 1/0 and as RAID 5.

    Table 8. Usable Capacity of 10 Drives by Disk Size and RAID Type

    Raw Capacity    Usable Capacity    Usable Capacity, 10 Spindles    Usable Capacity, 10 Spindles
    per Drive       per Drive          as 5+5 RAID 1/0                 as Two 4+1 RAID 5
    36 GB*          33 GB              165 GB (33x10/2)                264 GB (2x33x[5-1])
    73 GB           66 GB              330 GB                          528 GB
    146 GB          134 GB             670 GB                          1072 GB
    300 GB          268 GB             1340 GB                         2144 GB

    * No longer sold
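    The usable-space formulas and the Table 8 figures can be verified with a trivial Python sketch (function names are ours):

```python
def usable_raid10_gb(usable_per_drive_gb, drives):
    """RAID 1/0: half the spindles hold mirrored copies."""
    return usable_per_drive_gb * drives // 2

def usable_raid5_gb(usable_per_drive_gb, drives):
    """RAID 5: one spindle's worth of capacity goes to parity."""
    return usable_per_drive_gb * (drives - 1)

# Ten 73 GB drives (about 66 GB usable each), as in Table 8
print(usable_raid10_gb(66, 10))    # one 5+5 RAID 1/0 group -> 330
print(2 * usable_raid5_gb(66, 5))  # two 4+1 RAID 5 groups  -> 528
```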

    Typically, a disk layout that meets the I/O requirements for an ESG will contain more than enough capacity to meet the ESG storage requirements. However, if the mailboxes are particularly large, the I/O demand per user is very low, or a large amount of space is required for offline defragmentation, the storage requirements may affect the design.

    2 For exact figures, especially with the use of vault drives, refer to the CLARiiON Capacity Calculator at http://clariipub.corp.emc.com/cse/capacitycalc/capacitycalc.htm


    Referring to Table 8, if the required storage for the ESG went beyond 330 GB, the RAID 1/0 spindle count of 10 for 36 GB and 73 GB drives would not be adequate. Options would be to increase the drive count, move up to 146 GB drives, or switch to RAID 5.

    Using the calculation of 92.9 GB per ESG from the example, disk space requirements would not be a

    factor except for a RAID 1/0 2+2 configuration of 36 GB drives.

    Appendix A includes some additional examples of I/O and capacity calculations.

    Summary

    Because storage capacity of disk drives has outpaced their increase in I/O throughput, IOPS capacity is the

    standard to use today when determining the required number of drives. Since a RAID 1/0 group can perform better than a RAID 5 group under certain I/O loads with the same number of spindles, and

    since the space disadvantage of RAID 1/0 has become less significant, we recommend RAID 1/0 as the

    default choice for Exchange database volumes. RAID 5 may be appropriate for some customers depending

    on I/O load and cost considerations.

    The decision between 10K and 15K drives will likely come down to cost. Since IOPS are the determining factor in how many disks you need, the number of 15K drives required will usually be less than the 10K drive requirement. Balance the additional cost per drive against savings on additional drives, DAEs, and possibly cabinets.

    MetaLUNs

    CLARiiON storage systems can combine multiple LUNs into a larger metaLUN that spans multiple RAID groups. MetaLUNs offer two primary advantages: I/O load balancing and expandability. Usually you will configure an ESG for its maximum anticipated size at the start, but it is possible to use a metaLUN to handle gradual growth.

    The main advantage is the ability to distribute I/O over many spindles without resorting to host striping. Striped volumes are particularly advantageous with workloads such as Exchange that are random and bursty, and metaLUNs make the use of striped volumes simple. Suppose that you have planned two ESGs to reside on their own 3+3 RAID 1/0 groups, for a total of 12 spindles. An alternative design would be to create a metaLUN for each of the ESGs, both spanning the two 3+3 groups. The same 12 spindles would still be handling the I/O of the two ESGs, but in this case, if the I/O demand of one ESG is higher than the other, the combined load will be balanced across all of the drives. This can help to avoid an I/O bottleneck for a particular ESG. The two storage groups would also share the cost of a disk rebuild. By spanning two RAID groups, they double the risk of being affected by a rebuild, but a rebuild will affect only half the disks of the metaLUN.

    Figure 1. MetaLUNs Sharing Two RAID Groups

    Another choice would be a single 6+6 RAID group. It would yield the same performance, but with less growth potential. To accommodate growth, additional free space on the existing RAID group set can be


    used by concatenating the metaLUN. RAID group expansion also provides room for metaLUN growth (by concatenating the metaLUN), along with added performance.

    For added load balancing across a RAID group set, you can interleave the data of multiple storage groups on the RAID set by creating metaLUNs for each ESG in the following order (illustrated in Figure 2):

    Stripe the first component of ESG1

    Stripe the first component of ESG2

    Concatenate the next component of ESG1

    Concatenate the next component of ESG2

    Figure 2. Interleaving MetaLUNs

    Building Blocks

    Configuring metaLUNs can add a level of complexity to an Exchange design. In practice, the metaLUN configuration performs well and is easier to manage if you choose from a limited number of proven RAID group types and sizes. These groups serve as flexible building blocks that will be easy to multiply out in the full Exchange storage design.

    Following are recommended building block elements that are small enough to be flexible, allow for growth,

    and have been proven to perform well:

    RAID 1/0: 3+3; 4+4; 5+5

    RAID 5: 4+1

    It's possible to overuse metaLUNs. While offering the benefits of flexibility and load balancing, metaLUNs residing on the same RAID group all share the risk of a physical disk failure in that group. It is worth considering the consequences of a RAID group failure when you plan your metaLUN layout. Typically, two or three ESG metaLUNs sharing a RAID group set become a practical limit.

    Log LUN Configuration

    When configuring disks for Exchange, most attention is paid to the database LUNs because they typically represent the highest risk of a performance bottleneck. But database performance also depends on log response time: database transactions are gated by the completion of the associated log write.

    When choosing a RAID type for log file LUNs, I/O performance and data protection are the overriding

    factors rather than capacity. RAID 1/0 is the best RAID type to use for log LUNs. It provides better

    response time than RAID 5 in degraded situations. In the case of a disk failure, RAID 1/0 rebuilds complete faster than RAID 5. The longer the rebuild period, the more vulnerability there is to data loss. Data loss always occurs if a second drive is lost during the rebuild of a RAID 5 or RAID 1 group (see Table 6 on page 15).

    Although writes to the log LUN are sequential, performance tests have shown that you can take best

    advantage of a set of drives by sharing a set of log LUNs on them. A rough rule of thumb for log drives is to allocate one-eighth to one-tenth the number of spindles you have allocated for the databases, rounding up for RAID 1/0. For example, if you have calculated the need for 36 drives to handle the

    drives is to allocate one-eighth to one-tenth the number of spindles you have allocated for the databases,rounding up for RAID 1/0. For example, if you have calculated the need for 36 drives to handle the


    databases of four Exchange storage groups on a server, you could allocate four drives in a RAID 1/0 2+2

    group to handle the four transaction log LUNs for that server.
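    The one-eighth to one-tenth rule of thumb can be expressed as a small Python sketch (the function name is ours; rounding up to an even spindle count reflects the RAID 1/0 mirrored-pair requirement):

```python
import math

def log_drives(database_drives, divisor=10):
    """Log spindles as a fraction of database spindles, rounded up to an
    even count for RAID 1/0 mirrored pairs."""
    n = math.ceil(database_drives / divisor)
    return n if n % 2 == 0 else n + 1

# 36 database drives across four storage groups -> a 2+2 RAID 1/0 log group
print(log_drives(36))  # -> 4
```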

    To avoid added recovery complications in the unlikely case of the physical loss of a RAID group, the log

    files for multiple servers should be stored on separate drives.

    There are some other important considerations in the design of the disk layout for the Exchange transaction logs:

    There is one set of transaction logs for each ESG (i.e., the logs for all databases in the ESG are combined).

    For manageability and flexibility, each set of log files should reside on its own LUN. This is an actual requirement for VSS-based backups.

    The transaction log files for an ESG should always be on a separate LUN from their associated databases. The log LUN should never share the same spindles as the database LUN for the same ESG, even on small systems. This is for protection rather than performance. If something should happen to the database, the log files are essential to recover transactions since the last backup. If those log files reside on the same physical disk and that disk is damaged, this option is lost.

    Log I/Os are 100 percent writes. They are most frequently 512-byte writes, but can be up to 64 KB or larger.

    Host I/O to the log LUN equals approximately 10 to 15 percent of the host I/O to the database LUNs.

    The size of each Exchange log file is 5 MB. Most online Exchange backup processes will delete log files whose transactions have been committed to the database. It is important to confirm that committed log files are being deleted to avoid running out of space on the log LUN.

    Circular logging is an Exchange feature that causes log files to be deleted after their transactions have been committed to the database. Only a handful of the logs are maintained at any time to save space. However, this sacrifices the ability to recover a database up to the minute. It is off by default and should never be turned on.

    Calculating Log LUN Storage Capacity Requirements

    You can calculate storage capacity requirements for a log LUN by multiplying 5 MB by the maximum number of log files maintained before being pruned.

    If you don't know the maximum number of log files generated in a day, you can use a rough rule of thumb of one log file per user per day. (Microsoft uses an estimate of two logs per day for configuration; EMC's actual log file rate is considerably less than one per day.)

    Unless you specify otherwise, typical Exchange online backups will prune the log files on each run. For

    example, if a backup is run nightly, the log LUN for a 1,000-user ESG would minimally need space for

    1,000 log files, or 5 GB. Be sure to allow extra capacity to ensure that the log file LUN never runs out of

    space.
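    This log LUN sizing rule can be sketched in Python (illustrative; the headroom parameter is our own assumption, standing in for the "extra capacity" the text advises):

```python
import math

def log_lun_size_gb(users, logs_per_user_per_day=1, days_between_backups=1,
                    headroom_pct=20, log_file_mb=5):
    """Log LUN capacity from the one-log-per-user-per-day rule of thumb."""
    log_files = users * logs_per_user_per_day * days_between_backups
    gb = log_files * log_file_mb / 1000
    return math.ceil(gb * (1 + headroom_pct / 100))

# 1,000-user ESG with a nightly backup: 5 GB minimum, 6 GB with headroom
print(log_lun_size_gb(1000, headroom_pct=0))  # -> 5
print(log_lun_size_gb(1000))                  # -> 6
```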

    Additional Storage Considerations for the Exchange Production Data

    The previous configuration guidelines have covered design recommendations for database and log LUNs

    for medium-to-large Exchange implementations. This section describes some additional considerations.

    Public Folders

    It's difficult to provide general guidelines for configuring public folder storage because usage varies so widely. Some organizations practically ignore the existence of Exchange's public folder capability, while others use them extensively for shared document repositories, discussion groups, shared calendars, and several other purposes. By default, the public store is contained within a dedicated database in the first


    storage group; but when used actively, it is often configured on its own Exchange server with at least one

    replicated copy to another Exchange server.

    The best starting point for planning public folder storage for a newly migrated Exchange 2003 environment

    is to examine the current I/O, storage usage, and growth rate for public folders in the current environment.

    Adjust these measurements as necessary based on any planned changes to the public folder use policy (such as adding a new integrated application that makes significant use of public folders, or switching shared documents to file shares). Then, deploy using the principles described in this paper as with any other Exchange storage group.

    SMTP Queue

    The SMTP message queue should be placed on CLARiiON storage. It is not necessary to create replicas of

    the SMTP queue. However, the queue can be placed on one of the database LUNs for the server as long as the additional I/O and capacity requirements are factored in.

    Keeping EDB Files and STM Files Together

    An Exchange message store (database) consists of an EDB file (containing all content generated by MAPI clients, indexed message properties, and more) and a streaming (STM) file containing content generated by Internet clients. Since the EDB file and STM file compose a complete message store, it is advisable to keep them together on the same LUN.

    Smaller Exchange Environments

    If a relatively small (fewer than 20 database drives or 1,000 users) Exchange system is implemented on a

    CLARiiON system, the same I/O requirements apply for the database LUNs, and thus it remains important

    to allocate enough drives. It still may make sense to use RAID 1/0 for the database drives.

    Database and log files should still be kept on separate RAID groups. For these smaller configurations, it may be appropriate to use a RAID 1 pair for log files. If the number of drives is tight, the extra capacity on the log drives could be used for some other light purpose.

    If there are only two RAID groups available for Exchange data, the users can be split across two ESGs and

    the logs placed with the databases for the alternate ESG. However, you must then factor in the log I/O requirement when calculating the spindle count.

    Figure 3. Sharing a Database LUN with the Logs from an Alternate Storage Group


    Planning Storage for Local Recovery

    This section provides guidelines for configuring storage to handle local replicas and disk-based backups of Exchange production data.

    SnapView for Disk-Based Replication

    SnapView is an optional software package for the EMC CLARiiON storage system3. Using SnapView, users can create a point-in-time view (or multiple views) of a LUN, which can subsequently be made accessible to another server, or simply held as a point-in-time copy for possible restoration. For instance, a system administrator can make the SnapView replica accessible to a backup server so that the production server can continue processing without the downtime traditionally associated with backup processes. In the event of a data corruption on the source LUN, SnapView replicas can be used to restore the contents of a corrupted LUN to the point-in-time creation of the replica. SnapView can create replicas using either clones (BCVs) or snapshots.

    There are currently two products sold by EMC that facilitate the management and integration with

    Exchange 2003 and the Windows Volume Shadow Copy Service (VSS) to create verified SnapView

    replicas of an ESG, ready to be restored immediately:

    Replication Manager/SE (RMSE)

    Replication Manager (RM)

    Clone-Based Replication

    The backup option that provides the most rapid recovery of an Exchange database today is clone-based replication. SnapView clones provide users the ability to create fully populated copies of LUNs within a single array. Once synchronized, clones can be fractured from their source, and then presented to a secondary server for read and write access. Following the initial synchronization, clones can be incrementally resynchronized, where only the data that has changed on the source since the clone was fractured will be copied to the clone. In the event of a data corruption on the source LUN, clones can be used to restore the source LUN via a reverse synchronization operation. This returns the source LUN to the point-in-time view of the source as it was when the clone was fractured. In the unlikely event of a hardware error on the LUN (for instance, a multiple-drive failure), the clone can be repurposed as the production LUN. Thus, clones provide protection against both software errors and hardware errors.

    RM and RMSE allow you to configure up to eight clones for each of the Exchange production LUNs.

    Consider the following when clone-based replication is chosen as the backup method:

    Know the characteristics of the clone backups you'll be using.

    Number of clones for each production LUN: Each extra clone maintains a validated backup copy for a different point in time.

    Timing and frequency of the backup operation: Spread out backups of the storage groups and time them for low-usage periods to take best advantage of array resources.

    Fibre drives are recommended for clones. Their performance is well-suited to clone resynchronizations. ATA drives are not recommended for clone LUNs in an active Exchange environment because the clone resynchronization operation to an ATA drive is considerably (up to several times) slower. The likelihood of affecting the production environment during this time increases. Additionally, with an IOPS rate of less than half that of fibre drives, the backup window will also be extended on ATA drives by an Eseutil check that takes longer to complete.

    Clone LUNs can be bound on drives that differ in size, speed, RAID geometry, or even drive type. For instance, some users may elect to use RAID 1/0 for their production LUNs, and RAID 5 for their

    3 SnapView is supported on all CLARiiON models, CX300 and higher. The only CLARiiON platform on which SnapView is not supported is the CX200.

    EMC CLARiiON Storage Solutions Microsoft Exchange 2003 Best PracticesStorage Configuration Guidelines 21

  • 7/26/2019 Exchange_best_practice.pdf

    22/41

    8/8/2005

    clones. RAID 5 is recommended for clones because they dont have the same IOPS requirements, and

    provide greater capacity. Additionally, some users may elect to put production data on 15K rpm

    drives, and use 10K rpm drives for their clones.

Eseutil will be the most I/O-intensive activity on the database clones. Use the Eseutil throughput requirements when determining the spindle count here.

When using RAID 5 with clones, take modified RAID 3 (MR3) support into consideration. RAID 5 4+1 sets offer a good balance of rebuild time with MR3 support [4].

146 GB and 300 GB drives may also be more appropriate with clones. The combination of RAID 5 and larger drives allows you to configure sufficient extra capacity to store multiple clones on the same RAID group. Since clone synch operations are scheduled, the I/O capacity for a set of drives can be shared to handle multiple backups occurring at different times.

    Do not place clone LUNs in the same RAID group that contains their source LUN.

Plan clone layouts to avoid backups occurring simultaneously on different LUNs configured on the same RAID group. LUN resynchronizations and Eseutil integrity checks both involve a heavy amount of I/O. Running two or more of these activities at the same time will slow them all down and possibly affect user response times.

    Consider using metaLUNs for clones to provide more spindles to improve Eseutil performance.

Keep in mind the building blocks discussion on page 18. For simplicity, it is advisable to select your RAID group from the recommended choices and expand using this as a base.

To add some extra protection, if you are configuring multiple clones for an Exchange LUN, it's best to alternate the clones on separate spindle sets.

    Snapshot-Based Replication

    RM and RMSE can create a VSS shadow copy via a SnapView snapshot. As with clones, snapshots

    provide users a readable and writeable LUN replica. Snapshots, however, are not fully populated copies.

They use a pointer-and-copy-based design, where pointers map to data regions on the source LUN until they are changed, at which point the original data is copied to a reserved area and the pointers are

    redirected accordingly. In this way, users have only to allocate sufficient disk space to accommodate the

    changes to the source LUN.

The process of allocating the pointers according to a particular point in time is referred to as starting a SnapView session. To see the contents of a particular session, a user can activate a snapshot to the session. The reserved LUN is the private LUN that contains the original source data, and the process of copying that data is referred to as copy on first write (since it must occur only on the initial change to the source LUN).

    As with clones, SnapView session data can be used to restore a corrupted source LUN. Much like the

reverse synchronization that clones offer, SnapView sessions can be rolled back to the point-in-time view

    of when the session was started.
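The copy-on-first-write mechanics described above can be sketched in a few lines of Python. This is purely an illustrative toy model, not SnapView's actual implementation; the class and method names are invented for the example.

```python
class SnapSession:
    """Toy copy-on-first-write (COFW) model of a SnapView session."""

    def __init__(self, source):
        self.source = source    # live production LUN, modeled as {block: data}
        self.reserved = {}      # reserved LUN: original data, saved on first write

    def write(self, block, data):
        # COFW: preserve the original contents the first time a block changes
        if block in self.source and block not in self.reserved:
            self.reserved[block] = self.source[block]
        self.source[block] = data

    def snapshot_read(self, block):
        # Snapshot view: reserved copy if the block has changed, else the live block
        return self.reserved.get(block, self.source.get(block))

    def rollback(self):
        # Restore the source to the point in time when the session started
        self.source.update(self.reserved)


lun = {0: "mail-a", 1: "mail-b"}
session = SnapSession(lun)
session.write(0, "mail-a2")             # only the first change to a block incurs a copy
assert session.snapshot_read(0) == "mail-a" and lun[0] == "mail-a2"
session.rollback()
assert lun[0] == "mail-a"
```

Note that only changed blocks consume reserved-area space, which is why snap cache sizing depends on the rate of change to the source LUN.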

    Snap-based replication is not an ideal backup method to use with Exchange for the following reasons:

SnapView copy on first write (COFW) must be completed before allowing a change to production data. This causes additional overhead on the production LUNs, especially for the period shortly after the snap session is started, and can noticeably impact performance. Even if the replication is performed during off hours, the snap session remains active during the day to hold the backup copy. In that case, the highest COFW activity takes place when users become active the following morning.

The Eseutil integrity check on the snap results in very heavy read activity to the production database LUN, which can cause elevated disk response times.

The replica taken is not a completely separate physical copy. An unlikely physical loss of the production LUN results in a loss of the backup as well.

[4] For information on MR3 writes, refer to the "CLARiiON RAID 5 Optimizations" section of the EMC CLARiiON Fibre Channel Storage Fundamentals white paper.


The space required for the reserved LUN pool (snap cache) is larger than typical, possibly equivalent to the cumulative size of all files on the source LUN, because of the random nature of Exchange I/O.

    Comparison of Local Replication Options

    When comparing the benefits of clone-based to snapshot-based replication of Exchange data, the clone

    option has several advantages. It is the recommended solution for Exchange local replication. The

following tables provide a brief comparison of clone/snapshot capabilities and impact, and of the currently released EMC local replication solutions.

Table 9. Comparison of SnapView Clones and Snapshots for Exchange Local Replication

Feature                               Clones                            Snapshots
Accessible to secondary server        Yes                               Yes
Recovery in event of software error   Yes                               Yes
Maximum replicas                      8                                 8
Can be used with MirrorView           No                                Yes
Recovery in event of hardware error   Yes                               No
Disk space, as a percentage of        100% per clone                    Varies widely; with a daily
the source LUN                                                          snapshot for local Exchange
                                                                        replication, the snapshot may
                                                                        approach 100%
Performance impact on source          Only during resync, and the       Yes, for the duration of the
                                      impact is typically less than     session; any activity on the
                                      COFW activity                     snap (such as Eseutil or backup
                                                                        to tape) directly impacts the
                                                                        production LUN

Table 10. Comparison of VSS-Integrated EMC Local Replication Products

Product                  Supported Exchange   Supported OS     Supported Array             Replicas
                         Versions
RMSE (a)                 5.5, 2000, 2003      Windows 2000,    CLARiiON                    Up to eight clones
                                              Windows 2003                                 or snapshots
Replication Manager (b)  5.5, 2000, 2003      Windows 2000,    CLARiiON, Symmetrix,        Up to eight clones
                                              Windows 2003     some third-party arrays     or snapshots
                                                               (see EMC Support Matrix)

(a) RMSE also supports replication of SQL and Windows file systems.
(b) Replication Manager also supports additional operating systems and replication of additional applications.

Online Backup to Disk

Disk-based Exchange online backup has become a competitive alternative to tape backup, offering faster performance and higher reliability. Consider the following when online backup to disk is chosen as the primary backup method.

Capacity requirements are determined by the size and frequency of the backups and the number of copies maintained before archiving to tape.

Online backups perform sequential writes. A configuration that optimizes this type of I/O will perform best.


CLARiiON ATA disk drives are effective for backup to disk. They perform well with sequential I/O operations, especially with FLARE Release 13 or higher and a RAID 3 configuration. They are also most competitive with the cost of tape backup.

If keeping more than one backup of an ESG on disk, you can alternate copies across two different RAID groups for added protection.

Most testing has been done on RAID 5 4+1 (RAID 3 4+1 for ATA drives), which has been determined to be the sweet spot for backup-to-disk performance.

Multiple streams within the same LUN will often result in better overall throughput, although the per-stream performance will understandably decline. If each stream is sent to a separate LUN on the RAID group, the data in a particular stream will be more contiguous on the disk and thus restore somewhat faster (by 5 MB/s to 10 MB/s).

For the most rapid recovery and the least user impact, individual Exchange databases should be as small as possible with the smallest possible number of users. If the backup method will be online backup to disk, you have added incentive to use the maximum number of storage groups and databases within those ESGs.

If online backup is performed during a period of active production, be sure to take this additional activity into account when calculating I/O requirements, also paying attention to overall array limitations.

For more detail on this topic, refer to the white paper EMC CLARiiON Backup Storage Solutions CX Series: Backup-to-Disk Performance Guide.

Recovery Storage Groups

With the addition of the recovery storage group (RSG) feature in Exchange Server 2003, a separate server for mailbox recovery is no longer required. Within an RSG, an administrator can mount a second copy of a mailbox database and use it to recover any data it contains.

    When allocating storage for RSGs, plan a single LUN to handle the size of the largest regular Exchange

    storage group, plus existing logs for that ESG. Log files can reside in the same LUN as the databases since

    users cannot log on to these and mail cannot be delivered to them.

With local replication to clones, it's convenient to mount a snapshot of the clones for use with a recovery

    storage group. Of course this is useful only if the data you need to recover is recent enough to exist on theclone replicas. You cannot mount the clone directly for use with an RSG because mounting the Exchange

    database would make it unusable for the standard VSS recovery. When planning to mount this replica, be

    sure to allocate some snap cache for this purpose.

For more detail on this topic, refer to the Microsoft white paper Using Exchange Server 2003 Recovery Storage Groups.

Planning Storage for Local Message Archiving

Message archiving on CLARiiON storage has been a growing component of new messaging system designs. The archiving implementation generally involves adding one or more servers to the environment to manage moving the content of messages older than a set age out of the standard Exchange database structure. The archiving servers also manage the near-line retrieval of these messages when a user calls for one.

EMC Legato EmailXtender software provides this archiving capability. The EmailXtender manual provides a formula to estimate the amount of data to be archived.

Container File Storage (disk space for archived messages) =
    Number of users
    x Messages per user per day
    x Days per work week
    x Number of weeks mail is retained
    x Average message size (KB)
    x .000001 (to convert the result to GB)

For example: 3200 x 20 x 5 x 250 x 50 x .000001 = 4000 GB

EmailXtender also estimates approximately 20 percent overhead for the installation and Message Center drive, plus .5 GB for queuing. The space required on the installation LUN in this case would be:

4000 x .2 + .5 = 800.5 GB
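The two calculations above can be wrapped in a small Python helper for capacity planning. This is a sketch using the figures from this paper; the function names are ours and not part of EmailXtender.

```python
def container_storage_gb(users, msgs_per_day, days_per_week, weeks_retained, avg_msg_kb):
    """Disk space for archived messages, in GB (the 1e-6 factor converts KB to GB)."""
    return users * msgs_per_day * days_per_week * weeks_retained * avg_msg_kb * 1e-6


def install_lun_gb(container_gb, overhead=0.20, queue_gb=0.5):
    """Installation/Message Center LUN: ~20 percent overhead plus 0.5 GB for queuing."""
    return container_gb * overhead + queue_gb


container = container_storage_gb(3200, 20, 5, 250, 50)  # about 4000 GB
install = install_lun_gb(container)                      # about 800.5 GB
```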

Because there is usually a very large quantity of archived data, and access to it is less time-critical, this is a very appropriate application for CLARiiON ATA drives.

Storage-System Considerations

Once you have determined the I/O requirements of the new messaging system and settled upon an appropriate number and type of disk drive, the next step is to consider the throughput and features of the storage system.

The best throughput and data protection from the storage system results from a design that aims for balanced use of the array resources, considering both the layout of the data and the timing of schedulable activities. Consider the following when planning your disk layout:

Avoid configuring Exchange database LUNs on the CLARiiON persistent storage manager (PSM) drives (drives 0-2). However, these drives may be suitable for the lower I/O requirements of a properly configured set of log LUNs.

It's not required to have the log LUN and database LUN for an ESG managed by opposite CLARiiON storage processors (SPs). It matters more that the overall I/O demand is balanced across the two SPs.

There is a small performance advantage (three to four percent) to be gained by binding the RAID 1/0 primaries and secondaries on different back-end buses. The main advantage to this approach is that the administrator does not have to be cognizant of which LUN is on which back-end bus. The back end is balanced by virtue of the RAID group layout.

Don't forget to include hot spares in the configuration. Typically, you should add one hot spare for every 30 drives.

Clone LUNs must be assigned to the same SP as their source. (It is possible to configure a clone on the opposite SP from its source, but the clone would be trespassed for synchronizations in this case.) Ensure that both the current SP owner and preferred owner are the same for both the source and target LUNs.

MirrorView (synchronous or asynchronous) cannot be used with any LUN that is part of a clone group (source or target).

Some activities within the Exchange environment place a heavy load on the CLARiiON SP. These include performing an Eseutil check against a database, and running CLARiiON layered applications, particularly when acting upon several LUNs at once. When configuring an array for Exchange and scheduling Exchange administrative operations, it's important to consider the limits of the SP CPU resources.


iSCSI Guidelines

The CLARiiON CX500i and CX300i models provide direct support for iSCSI connections. Note the following considerations before choosing an iSCSI model CLARiiON storage system for Exchange:

    Compared to Fibre Channel, iSCSI offers a lower cost hurdle for customers moving from direct-attached or internal server-based storage to SAN. Connectivity components are less expensive and IP

    network expertise is more common.

There is inherently more processing required with the iSCSI protocol than Fibre Channel, and the delivery of iSCSI packets is less regular. This introduces extra latency into the I/O stream. Exchange

    activity can generate high disk I/O, and it has a low tolerance for slow disk response.

The iSCSI models support the ability to boot directly from the storage system, but only with an iSCSI HBA (TCP/IP Offload Engine, or TOE) installed in the server.

    The iSCSI models support only up to a two-node Exchange cluster.

    The CLARiiON Disk Library is fibre-based and requires the network infrastructure of a fibre SAN.

Local replication, including VSS-supported shadow copies for Exchange 2003 via Replication Manager, is supported. These storage system models are internally identical to their equivalent Fibre Channel models; therefore, internal performance such as clone synchronizations should be the same.

Remote replication using CLARiiON layered applications (SAN Copy or MirrorView) is not supported to or from the iSCSI models.

Front-end ports on iSCSI models (1 Gb versus 2 Gb for fibre models) have the potential to be a bottleneck during high I/O activity. Eseutil, for example, can process an Exchange database at the rate of 10 GB per minute. All CX models have Fibre Channel back ends.
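A quick back-of-envelope check, using the 10 GB per minute Eseutil figure above and decimal units, shows why a 1 Gb front-end port can become the limit (our arithmetic, ignoring protocol overhead):

```python
# Eseutil sustained read rate, from the figure cited in this paper
eseutil_mb_per_s = 10 * 1000 / 60            # 10 GB/min is roughly 167 MB/s
required_gbps = eseutil_mb_per_s * 8 / 1000  # roughly 1.33 Gb/s on the wire

# A 1 Gb iSCSI front-end port cannot sustain this; a 2 Gb Fibre Channel port can.
for name, port_gbps in [("1 Gb iSCSI", 1.0), ("2 Gb Fibre Channel", 2.0)]:
    verdict = "bottleneck" if required_gbps > port_gbps else "sufficient"
    print(name, verdict)
```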

    When configuring an iSCSI model CLARiiON storage system for Exchange, note the following

    recommendations:

When working with the Microsoft iSCSI Initiator Service, use the Initiator Control Panel to configure the LUNs on the CLARiiON storage system as persistent targets. This is necessary to automatically reestablish a connection to the storage system upon a restart.

Use Gigabit Ethernet and separate network traffic from storage traffic. Dedicated Gigabit Ethernet offers the best throughput to the CLARiiON iSCSI models.

    Configure redundant controllers and switches for high availability.

While it is common to use a network interface card (NIC) in servers for iSCSI connections to save cost, consider replacing the NIC with an iSCSI HBA. A TOE handles the overhead added by the

    TCP/IP processing, which can be particularly important on an active Exchange server.

    If you use a NIC, install software in this order:

    1. Microsoft iSCSI initiator

    2. Navisphere Agent

    3. PowerPath

During the installation of Navisphere Agent, make sure to answer yes when asked if the system uses

    the Microsoft iSCSI Initiator.

    If you use a TOE, the recommended order is:

    1. HBA software (such as QLogic SANsurfer)

    2. Navisphere Agent

    3. PowerPath

    EMC CLARiiON Storage Solutions Microsoft Exchange 2003 Best PracticesStorage Configuration Guidelines 26

  • 7/26/2019 Exchange_best_practice.pdf

    27/41

    8/8/2005

During the installation of Navisphere Agent, make sure to answer no when asked if the system uses the

    iSCSI initiator.

The bottom line for performance is to minimize traffic latency in the iSCSI connections between servers and the storage system, and to test Exchange performance in the environment to ensure that response time meets requirements.

For further detail on Microsoft iSCSI guidelines, refer to Knowledge Base article 839686, Support for iSCSI Technology Components in Exchange Server. For information on Microsoft iSCSI cluster support requirements, refer to the FAQ at:

http://www.microsoft.com/WindowsServer2003/technologies/storage/iscsi/iscsicluster.mspx

CLARiiON Storage Systems Comparison

Table 11 lists useful specifications for the five current CLARiiON CX models.

Table 11. CLARiiON CX Series Storage Systems Feature Summary

Feature                                      CX700     CX500     CX300     CX500i    CX300i
Maximum Disks                                240       120       60        120       60
Storage Processors (SPs)                     2         2         2         2         2
CPUs/SP                                      2x3GHz    2x1.6GHz  1x800MHz  2x1.6GHz  1x800MHz
Front-End Ports/SP                           4 @ 2Gb   2 @ 2Gb   2 @ 2Gb   2 @ 1Gb   2 @ 1Gb
                                             (fibre)   (fibre)   (fibre)   (iSCSI)   (iSCSI)
Back-End Fibre Channel Ports/SP              4 @ 2Gb   2 @ 2Gb   1 @ 2Gb   2 @ 2Gb   1 @ 2Gb
I/O Buses                                    4         2         1         2         1
Array Cache                                  8 GB      4 GB      2 GB      4 GB      2 GB
Highly Available Hosts                       256       128       64        128 (c)   64 (c)
Maximum LUNs                                 2048      1024      512       1024      512
MirrorView Images (Total Primary +           100       50        n/a       n/a       n/a
Secondary) (a)
Snapshot LUNs                                300       150       100       150       100
Clone Groups                                 50        25        25        25        25
Clone Objects (a)                            100       50        50        50        50
SAN Copy Concurrent Sessions (and Max        16 (100)  8 (50)    4 (50)    n/a       n/a
Destinations)
Max Incremental SAN Copy Source LUNs         100       50        25        n/a       n/a
Exchange Users with no replication (b)       20,000    10,000    5,000     6,000     3,000
Exchange Users with local replication        10,000    5,000     2,500     TBD       TBD
(RMSE)

(a) Clone objects and MirrorView images share the maximum.
(b) As described in this paper, the number of Exchange users a storage system can handle will vary greatly. The number in this table is provided primarily as a starting point and for comparison between the models. It refers to a storage system dedicated to Exchange users (.4 IOPS per user). While it is very unusual for a CLARiiON storage processor to fail, it is advisable to plan for and test an SP failure scenario to understand how the environment would run in degraded mode, and then size accordingly.
(c) For iSCSI, each NIC/TOE connected to the storage system counts as a connection.


Be aware of these specifications when determining the appropriate storage system(s) for the new Exchange storage design. For example, you can create up to 50 clone objects on a CLARiiON CX500. Each LUN in a clone group, including the source, counts as one clone object. If the local replication plan calls for two clone copies of each Exchange production LUN, the CX500 can support up to 16 Exchange LUNs (each production LUN and its two clones represent three clone objects, for a total of 48 clone objects). If the Exchange design follows the standard recommendation of two LUNs per storage group, the CX500 can handle eight ESGs. The number of servers does not matter here. The CX500 could support four Exchange servers with two ESGs each, two servers with four ESGs each, or any other combination totaling eight ESGs. If only one clone is used per production LUN, the CX500 could support up to 12 ESGs, assuming that no other resource limits are reached.
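The clone-object arithmetic in this example generalizes to a small sizing helper. This is a sketch of our own; the clone-object limits come from Table 11, and two LUNs per ESG is the standard recommendation from this paper.

```python
def max_esgs(clone_object_limit, clones_per_lun, luns_per_esg=2):
    """How many ESGs fit, given that a source LUN and each of its clones
    all count against the array's clone-object limit."""
    objects_per_lun = 1 + clones_per_lun               # the source plus its clones
    max_luns = clone_object_limit // objects_per_lun   # whole LUNs only
    return max_luns // luns_per_esg                    # two LUNs per ESG by default


print(max_esgs(50, clones_per_lun=2))   # CX500, two clones per LUN -> 8 ESGs
print(max_esgs(50, clones_per_lun=1))   # CX500, one clone per LUN -> 12 ESGs
print(max_esgs(100, clones_per_lun=2))  # CX700 limit of 100 objects -> 16 ESGs
```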

Putting It All Together

This section presents some final suggestions for a successful Exchange storage design.

Consider Site-Specific Constraints

The resulting storage design must obviously take into account any requirements or restrictions in the environment where the messaging system will be implemented.

    For example, the organization may have an existing CX500 available that they want to use before any

additional array is purchased. They may not be able to dedicate an entire array to Exchange use, or may have a certain type of drive already in place for the Exchange data. There are likely to be many other

    decisions made already that will affect the storage design, such as number and location of Exchange

    servers, number of ESGs per server, network capacity, etc.

    Regardless of the constraints, the core requirement of providing enough drives to meet peak I/O demand

    remains, as does the strong recommendation to keep log and database LUNs for the same ESG on separate

    spindles.

Configure the Cleanest-Looking Layout Diagram

    Using a building block style, draw up a clean-looking storage layout diagram. This will ease understanding

    of the design, help identify possible weaknesses, and aid in the storage administration of the

    implementation.

    Plan Throughout for Operational Resiliency

    Increasing availability through the elimination of single failure points is easier in SAN environments.

Consider all possible points of failure and distribute users from the same functional area to minimize the impact of planned or unplanned downtime.

Separate ESGs: width first (more ESGs), depth second (databases per ESG)

Separate Exchange servers: typically 4,000 or fewer users per mailbox server

Separate storage systems: spread out users (see following example)

Geographically dispersed locations: for larger organizations

    Upper management and other priority users and key group mailboxes may have stricter SLAs than the rest

of the organization. These users should also be distributed, in addition to the extra performance or HA configurations they are provided.

Example

For an organization of 7,000 typical Exchange 2003 users, a single CX500 could handle their basic mailbox requirements. A CX700 could handle their mailbox requirements, including local replication to clones. As an alternative to the single CX700, consider designing the configuration with two CX500s handling 3,500 users each. Distributing these 3,500 users over all four ESGs places fewer than 900 users per Exchange storage group and under 200 per database, minimizing user impact and maintaining conservative database sizes.
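The per-ESG and per-database numbers in this example follow directly from the division below (a sketch; four storage groups and five databases per group are the Exchange 2003 maximums):

```python
users_per_array = 7000 // 2    # two CX500s, 3,500 users each
esgs_per_server = 4            # Exchange 2003 maximum storage groups
dbs_per_esg = 5                # Exchange 2003 maximum databases per ESG

users_per_esg = users_per_array // esgs_per_server  # 875, under 900 per ESG
users_per_db = users_per_esg // dbs_per_esg         # 175, under 200 per database
```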


    Additionally, each of the CX500s could be configured to handle disaster recovery for the peer storage

    system in the unlikely event one storage system experiences a significant problem. By providing additional

    storage space and temporarily postponing