Page 1: Jetstress Field Guide v2.0.0.8

Template Version October 2011

Monday, 8 July 2013

Version 2.0.0.8 [Issued]

Prepared by


Exchange Community

MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, our provision of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The descriptions of other companies’ products in this document, if any, are provided only as a convenience to you. Any such references should not be considered an endorsement or support by Microsoft. Microsoft cannot guarantee their accuracy, and the products may change over time. Also, the descriptions are intended as brief highlights to aid understanding, rather than as thorough coverage. For authoritative descriptions of these products, please consult their respective manufacturers.

© 2011 Microsoft Corporation. All rights reserved. Any use or distribution of these materials without express authorization of Microsoft Corp. is strictly prohibited.

Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Page ii, , Version 2.0.0.8

Prepared by "document.docx" last modified on 8 Jul. 13, Rev 2


Revision and Signoff Sheet

Change Record

Date | Author | Version | Change reference
22/03/2013 | Neil Johnson | 2.0.0.1 | First draft for Jetstress 2013
03/04/2013 | Neil Johnson | 2.0.0.2 | Updates after feedback from Robert Gillies and Ramon Infante
19/06/2013 | Neil Johnson | 2.0.0.5 | Final issue after internal review
20/06/2013 | Neil Johnson | 2.0.0.6 | Updated Error Table description with JET codes; added troubleshooting information for ESE 606
20/06/2013 | Neil Johnson | 2.0.0.7 | Fixed formatting issues


Document Contributors

Name | Position | Section
Neil Johnson | Senior Consultant, UK MCS | Author
Alexandre Costa | SENIOR SDET, Exchange Test | Jetstress internals
Ross Smith IV | PRINCIPAL PROGRAM MANAGER, Exchange CXP | Configuring Jetstress
Ramon b. Infante | DIR, WW COMMUNITIES, UC | Various
Matt Gossage | PRINCIPAL PROGRAM MANAGER LEAD | Various
Umair Ahmad | SDET II, Exchange Test | Various


Reviewers

Name | Version | Position | Date
Neil Johnson | 2.0.0.1 | Senior Consultant II, MCS UK |
Alexandre Costa | 2.0.0.1 | SENIOR SDET, Exchange Test |
Ross Smith IV | 2.0.0.1 | PRINCIPAL PROGRAM MANAGER, Office 365 - CAT SVCS |
Ramon b. Infante | 2.0.0.1 | DIR, WW COMMUNITIES, UC |
Matt Gossage | 2.0.0.1 | PRINCIPAL PROGRAM MANAGER LEAD, Exchange PM – US |
Umair Ahmad | 2.0.0.1 | SDET II, Exchange Test – US |
Nathan Muggli | 2.0.0.1 | SENIOR PROGRAM MANAGER, Exchange PM - US |
Scott Schnoll | 2.0.0.1 | PRINCIPAL TECHNICAL WRITER, Content Publishing |
Boris Lokhvitsky | 2.0.0.1 | DELIVERY ARCHITECT, US-US-MCS West SL 2 |
Jeff Mealiffe | 2.0.0.1 | SENIOR PROGRAM MANAGER LEAD, Office 365 - CAT SVCS |
Robert Gillies | 2.0.0.1 | REGIONAL ARCHITECT, US-MCS DOD SL 2 |
David Mosier | 2.0.0.1 | PRINCIPAL CONSULTANT, US-MCS Civilian SL 2 |

Table 1: Document reviewers


Table of Contents

1 Purpose........................................................................................................................1

2 What is New in Jetstress 2013......................................................................................1

3 Introduction to Jetstress...............................................................................................2

4 Jetstress Internals........................................................................................................3

4.1 Main Jetstress Components......................................................................................................3

4.1.1 Auto Tuning Component.................................................................................................................3

4.1.2 Thread Dispatcher...........................................................................................................................5

4.1.3 Background Log Checksummer.......................................................................................................5

4.1.4 Offline Log and Database Checksummer.........................................................................................5

4.1.5 Reporting and Verification..............................................................................................................6

5 Planning for Jetstress...................................................................................................7

5.1 Jetstress testing flow chart........................................................................................................7

5.1.1 High Level Test Overview................................................................................................................7

5.1.2 Process with Automatic thread tuning............................................................................................8

5.2 When should I run Jetstress in my project?..............................................................................9

5.3 Where should I run Jetstress in my infrastructure?.................................................................10

5.4 Failure Mode Testing...............................................................................................................11

5.4.1 Raid Array Testing.........................................................................................................................11

5.4.2 Resilient Component Testing.........................................................................................................11

5.4.3 Example of a failed degraded mode test.......................................................................................12

5.5 Jetstress testing inside virtual machines.................................................................................13

5.5.1 What is different about Jetstress inside a virtual machine?..........................................................13

5.6 How much time should I allocate for Jetstress testing?..........................................................15

5.6.1 Initialisation...................................................................................................................................15

5.6.2 Testing...........................................................................................................................................15

5.6.3 Clean-up........................................................................................................................................16

5.7 Preparing for the Jetstress test...............................................................................................17

5.8 What happens if the test fails?................................................................................................18

6 Installing Jetstress......................................................................................................19

6.1 Documentation.......................................................................................................................19


6.2 Jetstress Version and Download.............................................................................................19

6.3 Prerequisites...........................................................................................................................20

6.4 Getting ESE Files necessary for Jetstress.................................................................................21

6.4.1 File locations from an installed Exchange Server...........................................................................21

6.4.2 File locations from the installation media.....................................................................................21

6.5 Installation...............................................................................................................................22

6.5.1 Application Installation..................................................................................................................22

6.5.2 ESE File Installation........................................................................................................................24

7 Configuring Jetstress..................................................................................................26

7.1 Jetstress Test Types.................................................................................................................26

7.1.1 Test a disk subsystem throughput.................................................................................................26

7.1.2 Test an Exchange mailbox profile..................................................................................................26

7.2 Initial configuration.................................................................................................................27

8 Jetstress Output Files.................................................................................................33

9 Reading Jetstress report data.....................................................................................34

9.1 Target design values................................................................................................................34

9.2 Reading the Jetstress Test Result Report................................................................................35

9.2.1 Test Summary................................................................................................................................35

9.2.2 Database Sizing and Throughput...................................................................................................35

9.2.3 Jetstress System Parameters.........................................................................................................36

9.2.4 Database Configuration.................................................................................................................36

9.2.5 Transactional I/O Performance.....................................................................................................36

9.2.6 Background Database Maintenance I/O Performance..................................................................37

9.2.7 Log Replication I/O Performance...................................................................................................37

9.2.8 Total I/O Performance...................................................................................................................38

9.2.9 Host System Performance.............................................................................................................39

9.2.10 Error Counts Per Volume...............................................................................................................39

9.2.11 Test Log.........................................................................................................................................42

9.3 Interpreting Jetstress test results............................................................................................43

9.4 Test evaluation........................................................................................................................44

10 Appendix A – Configuring thread count.................................................................45

11 Appendix B – Configuring sluggishsessions............................................................46


12 Appendix C - Running a Jetstress Test with JetstressCmd.exe................................47

13 Appendix E – Running Jetstress on a production server.........................................49

14 Common Issues......................................................................................................50

14.1 Troubleshooting Jetstress.......................................................................................................50

14.1.1 Jetstress cannot attach to or create a database............................................................................50

14.1.2 Error loading Performance Monitor counters...............................................................................50

14.1.3 Unable to tune for the parameters...............................................................................................51

14.1.4 Unable to mount databases due to invalid mount point configuration.........................................51

14.1.5 Jetstress testing failed. Error: System.ApplicationException: Faulty performance counter paths: \MSExchange Database(*)\*.........................................................................................................................52


1 Purpose

This document is intended to explain the process and requirements for validating an Exchange 2013 storage solution prior to releasing an Exchange deployment into production.

It will explain how Jetstress works, how to plan for and perform a Jetstress test, and how to analyse the results of the test.

This document is not intended to provide Exchange storage design guidance. For guidance on Exchange 2013 server design and planning, refer to Planning and Deployment.

2 What is New in Jetstress 2013

Jetstress 2013 is an evolution of Jetstress 2010. It includes several improvements and bug fixes, and it allows validation of Exchange Server 2013 solutions.

A quick outline of new features:

The Event log is captured and logged to the test log. These events show up in the Jetstress UI as the test is progressing.

Any errors are logged against the volume on which they occurred. The final report shows the error counts per volume in a new sub-section.

A single I/O error anywhere will fail the test. If CRC errors occur, the affected blocks might be remapped; a re-run of Jetstress should verify that they were indeed remapped.

Detects -1018, -1019, -1021, -1022 and -1119 errors, hung I/O, DbtimeTooNew and DbtimeTooOld.

Threads, which generate I/O, are now controlled at a global level. Instead of specifying threads per database (Threads/DB), you now specify a global thread count, which works against all databases. This improves the granularity of thread tuning and enables automatic tuning to work more effectively.

Jetstress configuration files (JetstressConfig.XML) generated by an older version of Jetstress can no longer be used.
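To illustrate why a global thread count gives finer-grained tuning, the sketch below compares the total thread counts reachable when one Threads/DB value is applied to every database with those reachable from a single global setting. The database count and thread ranges are arbitrary illustrative values, not Jetstress defaults, and the function names are hypothetical.

```python
def per_database_totals(databases: int, max_threads_per_db: int) -> list[int]:
    """Total thread counts reachable when one Threads/DB value applies to every database."""
    return [databases * t for t in range(1, max_threads_per_db + 1)]


def global_totals(max_total_threads: int) -> list[int]:
    """Total thread counts reachable with a single global thread count."""
    return list(range(1, max_total_threads + 1))


if __name__ == "__main__":
    databases = 8  # illustrative value only
    print(per_database_totals(databases, 4))  # [8, 16, 24, 32]
    print(global_totals(32)[:10])             # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With eight databases, the per-database model can only move the total load in steps of eight threads, whereas the global model can move it one thread at a time, which gives auto-tuning far more points to settle on.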

Important Changes

Do not use Jetstress 2013 for older versions of Exchange Server. Jetstress 2013 has only been tested with Exchange Server 2013.


3 Introduction to Jetstress

Jetstress is a tool for simulating Exchange database I/O load without requiring Exchange to be installed. It is primarily used to validate physical deployments against the theoretical design targets that were derived during the design phase.

To simulate the complex Exchange database I/O pattern effectively, Jetstress uses the same ESE.DLL that Exchange uses in production. It is therefore vital that Jetstress uses the same version of the Extensible Storage Engine (ESE) files as the Exchange infrastructure you will build in production.

Ideally, Jetstress testing should be part of the overall project plan. The best time to schedule it is just before Exchange is installed on the servers.

Jetstress testing provides the following benefits before live users are deployed:

Validates that the physical deployment is capable of meeting specific performance requirements

Validates that the storage design is capable of meeting specific performance requirements

Finds weak components prior to deploying in production

Proves storage and I/O stability

The most important aspect of Jetstress testing is that it allows you to see how the physically deployed storage and server infrastructure will behave once a real Exchange workload is applied. This often works out differently from expectations, especially in scenarios where shared storage infrastructure is deployed or where the storage design is complex.

Often the Jetstress test will not produce the expected results. Sometimes, subtle configuration changes to the storage infrastructure (for example, driver or firmware updates) are enough to get the test to pass.

It is important to remember that when the Jetstress test reports a failure, Jetstress itself has not failed; it is simply reporting on the performance of your storage solution. This may seem an obvious point; however, a large number of customer escalation cases for Jetstress are not actually Jetstress cases but storage performance cases. If you need to remediate a test failure, remember that Jetstress is a simple tool that is used worldwide by thousands of Exchange professionals and in Office 365. It is extremely unlikely that Jetstress is broken; it is far more likely that your storage deployment has a design issue or misconfiguration.

Fundamentally, a successful Jetstress test validates that all of the hardware and software components within the I/O stack from the operating system down to the physical disk drive are working to a sufficient level to meet the predicted performance required by Exchange to operate successfully.


Important:

The validity of your Jetstress testing is only as good as the user profile analysis and workload prediction that was completed during the design phase of the project.

4 Jetstress Internals

4.1 Main Jetstress Components

Like Exchange, Jetstress is an ESE-based application. It runs in user memory space and makes API calls to ESE, which in turn calls the Windows file system and I/O Manager to access the data stored on disk. During each of these tasks, Windows records performance information about the specific task and about the operating system as a whole. Once the test is complete, Jetstress analyses the performance data to determine whether the system meets the targets specified at the beginning of the test.

Figure 1 - Main Jetstress Components

4.1.1 Auto Tuning Component

This component is responsible for auto tuning within Jetstress. Each thread performs a set number of ESE calls, which generates a set amount of disk I/O, so raising or lowering the thread count modifies the storage workload. The auto-tuning component attempts to determine the maximum thread count that the storage solution can support while remaining within the published disk latency guidelines for Exchange Server. The Jetstress test parameters for disk latency are shown in section 9.3 Interpreting Jetstress test results.
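The idea behind auto-tuning can be sketched as a search for the largest thread count whose measured latency stays within a limit. This is not Jetstress's actual algorithm: the sketch assumes latency never decreases as threads are added (which permits a binary search), `measure_latency_ms` is a hypothetical callback standing in for a real measurement, and the 20 ms default is an assumed database read latency limit.

```python
from typing import Callable, Optional


def max_threads_within_latency(
    measure_latency_ms: Callable[[int], float],
    limit_ms: float = 20.0,
    max_threads: int = 256,
) -> Optional[int]:
    """Return the largest thread count whose latency is <= limit_ms.

    Assumes latency is non-decreasing as threads increase, which allows a
    binary search. Returns None if even a single thread exceeds the limit.
    """
    lo, hi = 1, max_threads
    best: Optional[int] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_latency_ms(mid) <= limit_ms:
            best = mid      # mid passes; look for a higher passing count
            lo = mid + 1
        else:
            hi = mid - 1    # mid fails; look lower
    return best


if __name__ == "__main__":
    # Hypothetical latency model: 5 ms baseline plus 0.5 ms per thread.
    def model(threads: int) -> float:
        return 5.0 + 0.5 * threads

    print(max_threads_within_latency(model))  # 30
```

Real storage rarely behaves this smoothly, which is one reason auto-tuning can still fail and a manual thread count may be needed.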


New:

Auto-tuning has been improved in Jetstress 2013 by moving to a global thread controller. Auto-tuning may still fail; however, it should succeed in many more scenarios than it did in Jetstress 2010.


4.1.2 Thread Dispatcher

The thread dispatcher is responsible for managing workload within Jetstress. The main areas of interest within the thread dispatcher are as follows:

ThreadCount: the number of transactional threads globally. Before Exchange 2010 it was the number of threads per storage group, and in Exchange 2010 it was the number of threads per database. In Exchange 2013 it is a global parameter.

ThreadTypes: each thread chooses one type of work to perform against the database, and the same thread can perform different types of work during a given run. There are four types: insert, read, update and delete (all against records in a table). The default operation mix for an Exchange 2010 simulation is 40%, 35%, 5% and 20%, respectively.

SluggishSessions: the default is 1 for Exchange 2010. It is usually used to fine-tune the amount of work performed by a given thread. Internally, a thread sleeps for (SluggishSessions * TaskRunTime) before picking up the next task to run. For example, if SluggishSessions is 3 and an insert task took 100 ms in the last cycle, the thread sleeps for 300 ms before moving on to the next cycle. A value of 0 means “go full throttle”.
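The ThreadTypes and SluggishSessions behaviour described above can be sketched in a few lines. The weighted random choice is an assumption about how a thread might pick its next operation, not a description of Jetstress source code; only the 40/35/5/20 split and the SluggishSessions * TaskRunTime formula come from the text, and the function names are hypothetical.

```python
import random
from collections import Counter

# Default Exchange 2010 operation mix quoted above (percent of tasks).
OPERATION_MIX = {"insert": 40, "read": 35, "update": 5, "delete": 20}


def pick_operations(count: int, seed: int = 0) -> list[str]:
    """Choose `count` operations at random, weighted by OPERATION_MIX."""
    rng = random.Random(seed)
    return rng.choices(list(OPERATION_MIX), weights=list(OPERATION_MIX.values()), k=count)


def sluggish_sleep_ms(sluggish_sessions: int, task_run_time_ms: float) -> float:
    """Pause before the next task: SluggishSessions * TaskRunTime."""
    if sluggish_sessions < 0:
        raise ValueError("SluggishSessions cannot be negative")
    return sluggish_sessions * task_run_time_ms


if __name__ == "__main__":
    sample = pick_operations(100_000)
    counts = Counter(sample)
    for op in OPERATION_MIX:
        print(f"{op:>6}: {counts[op] / len(sample):.1%}")
    # Worked example from the text: SluggishSessions = 3, task took 100 ms.
    print(sluggish_sleep_ms(3, 100), "ms")  # 300 ms
```

Because the pause scales with how long the last task took, a higher SluggishSessions value reduces each thread's share of the load, which is why it is useful for fine-tuning below the granularity of a single thread.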

4.1.3 Background Log Checksummer

This component simulates the I/O overhead of additional database copies. This copy operation has an I/O cost which increases with each additional copy.

4.1.4 Offline Log and Database Checksummer

This process checksums all database and log files at the end of a Jetstress run to ensure that all data is intact. It also provides performance data on CRC checksum speed, which is relevant if VSS copies require a checksum before backup.

This process is extremely hard on storage hardware, often applying an I/O load many times greater than the workload that the actual Jetstress test applies.
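To see why this phase is so demanding, consider what a checksum pass does: it reads every byte of every database and log file sequentially. The sketch below does the same and reports throughput. It uses CRC32 from Python's `zlib` module purely as a stand-in; ESE uses its own page checksum format, and the chunk size, file names and function names are illustrative assumptions.

```python
import tempfile
import time
import zlib
from pathlib import Path

CHUNK_BYTES = 4 * 1024 * 1024  # 4 MiB sequential reads (arbitrary choice)


def checksum_file(path: Path) -> tuple[int, int]:
    """Return (crc32, bytes_read) for one file, read sequentially in chunks."""
    crc = 0
    total = 0
    with path.open("rb") as handle:
        while chunk := handle.read(CHUNK_BYTES):
            crc = zlib.crc32(chunk, crc)  # running CRC across chunks
            total += len(chunk)
    return crc, total


def checksum_tree(root: Path) -> float:
    """Checksum every file under root; return throughput in MB/s."""
    start = time.perf_counter()
    total = 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        total += checksum_file(path)[1]
    elapsed = time.perf_counter() - start
    return total / 1_000_000 / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Demonstration on a throwaway directory with two small files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("db1.edb", "E00.log"):
            (Path(tmp) / name).write_bytes(b"\x00" * 8_000_000)
        print(f"{checksum_tree(Path(tmp)):.0f} MB/s")
```

On freshly written files the operating system cache can inflate the figure considerably; reading files that are not cached gives a more representative view of what the storage itself can sustain.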

Important

If you are running Jetstress on multiple servers in parallel on shared storage infrastructure, it is vital that the CRC check is not running while other servers are performing their Jetstress tests. Selecting the “multi-host” option during the test configuration causes the testing process to stop and wait for confirmation before beginning the CRC check to avoid servers interfering with each other’s results.

While working out the correct thread count to use, it is not necessary to let the checksum part of the test complete. To skip the checksum, you can either click Cancel, which stops the checksum part of the test but still generates the performance test report, or edit the Jetstress configuration file and change the VerifyChecksum value to false (the default is true):

<VerifyChecksum>false</VerifyChecksum>
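If you prefer to script this change across several servers, the sketch below uses Python's standard `xml.etree.ElementTree`. It assumes only that the configuration file contains a `VerifyChecksum` element somewhere in the tree, as the fragment above shows; nothing else about the JetstressConfig.xml layout is assumed, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _set_verify_checksum(root: ET.Element, enabled: bool) -> int:
    """Set every VerifyChecksum element under root; return how many were changed."""
    changed = 0
    for elem in root.iter():
        # Match on the local name so a default XML namespace does not matter.
        if isinstance(elem.tag, str) and elem.tag.rsplit("}", 1)[-1] == "VerifyChecksum":
            elem.text = "true" if enabled else "false"
            changed += 1
    return changed


def set_verify_checksum_text(xml_text: str, enabled: bool) -> str:
    """Return a copy of the XML text with VerifyChecksum set to true/false."""
    root = ET.fromstring(xml_text)
    if _set_verify_checksum(root, enabled) == 0:
        raise ValueError("No VerifyChecksum element found")
    return ET.tostring(root, encoding="unicode")


def update_config_file(path: str, enabled: bool) -> None:
    """Edit a configuration file on disk in place."""
    tree = ET.parse(path)
    if _set_verify_checksum(tree.getroot(), enabled) == 0:
        raise ValueError(f"No VerifyChecksum element found in {path}")
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = "<Config><VerifyChecksum>true</VerifyChecksum></Config>"
    print(set_verify_checksum_text(sample, enabled=False))
```

For example, `update_config_file("JetstressConfig.xml", enabled=False)` run from the Jetstress folder would apply the change. Note that ElementTree does not preserve XML comments or namespace prefixes exactly, so a plain text edit remains an equally valid choice.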


4.1.5 Reporting and Verification

At the end of a Jetstress test, the reporting and verification process compares the observed performance results against a set of acceptable values. These results are then written to an HTML file. During the test, binary performance data is written to a BLG file.
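Conceptually, verification compares each observed value with a threshold and fails on any breach. In the sketch below, the metric names and the 20 ms database read / 10 ms log write limits are assumptions for illustration (the limits Jetstress applies are described later in this guide); the rule that any I/O error fails the test comes from the Jetstress 2013 feature list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit_ms: float


# Assumed latency limits, for illustration only.
THRESHOLDS = (
    Threshold("db_read_avg_latency_ms", 20.0),
    Threshold("log_write_avg_latency_ms", 10.0),
)


def evaluate(observed: dict[str, float]) -> tuple[bool, list[str]]:
    """Compare one volume's observed values with THRESHOLDS; return (passed, failures)."""
    failures: list[str] = []
    # Jetstress 2013 fails the test on any I/O error.
    errors = observed.get("io_errors", 0)
    if errors > 0:
        failures.append(f"io_errors: {errors:g}")
    for t in THRESHOLDS:
        value = observed.get(t.metric)
        if value is None:
            failures.append(f"{t.metric}: no data")
        elif value > t.limit_ms:
            failures.append(f"{t.metric}: {value:.1f} ms exceeds {t.limit_ms:.1f} ms")
    return (not failures, failures)


if __name__ == "__main__":
    print(evaluate({"db_read_avg_latency_ms": 25.0, "log_write_avg_latency_ms": 3.0}))
```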


5 Planning for Jetstress

Jetstress testing can be difficult to account for in the planning process, particularly how much time to allocate for testing and in which phases of the project it should occur. This section tries to answer these questions and explains the process in more detail.

5.1 Jetstress testing flow chart

The aim of the following process is to find the maximum workload that still passes the test. Fundamentally, you increase the workload until the test either fails or meets the design goals identified in the Mailbox Role Calculator.

Important:

The last value before failure is the highest workload that the system can support. If this value is below the design target, use SluggishSessions to fine-tune the test. If the storage is still unable to meet the requirements, you have determined that it is unsuitable for the intended workload.

The following process assumes that you are using the disk subsystem throughput test and auto-tuning as recommended.
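The core loop of this process, raising the workload until a run fails and keeping the last passing value, can be sketched as follows. `run_jetstress` is a hypothetical callback standing in for a complete test run at a given thread count; the target IOPS would come from the Mailbox Role Calculator, and the simulated storage in the demonstration is invented.

```python
from typing import Callable, NamedTuple, Optional


class RunResult(NamedTuple):
    passed: bool   # did the run meet the latency/error criteria?
    iops: float    # achieved database IOPS for the run


def find_max_passing(run_jetstress: Callable[[int], RunResult],
                     start_threads: int = 1,
                     step: int = 1,
                     max_threads: int = 128) -> Optional[tuple[int, float]]:
    """Raise the thread count until a run fails; return the last passing (threads, iops)."""
    last_pass: Optional[tuple[int, float]] = None
    threads = start_threads
    while threads <= max_threads:
        result = run_jetstress(threads)
        if not result.passed:
            break  # the previous run is the highest supportable workload
        last_pass = (threads, result.iops)
        threads += step
    return last_pass


def meets_target(last_pass: Optional[tuple[int, float]], target_iops: float) -> bool:
    """True if the highest passing run reached the calculator's IOPS target."""
    return last_pass is not None and last_pass[1] >= target_iops


if __name__ == "__main__":
    # Hypothetical storage: 100 IOPS per thread, fails above 12 threads.
    def fake_run(threads: int) -> RunResult:
        return RunResult(passed=threads <= 12, iops=100.0 * threads)

    best = find_max_passing(fake_run)
    print(best, meets_target(best, 1200.0))  # (12, 1200.0) True
```

If `meets_target` returns False, the next step in the process is fine-tuning with SluggishSessions; if the target is still out of reach, the storage is unsuitable for the intended workload.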

5.1.1 High Level Test Overview

Figure 2 - High Level Test Overview shows a high-level flowchart for Jetstress testing. The process begins with a completed Mailbox Role Calculator and ends when the test has passed successfully while meeting the targets identified in the calculator.

Figure 2 - High Level Test Overview


5.1.2 Process with Automatic thread tuning

Figure 3 - Jetstress test flowchart for automatic thread tuning


5.2 When should I run Jetstress in my project?

Jetstress testing can often take place at multiple phases within the project plan. Depending on the design approach taken, Jetstress testing may be performed during both the planning (design) and build phases of a project.

Figure 4 - SDM phase overview

So, why would you run Jetstress during the planning/design phase of a project? The simple answer is that, with today’s powerful hardware, Exchange design teams must use standard “chunks” of hardware to create their design. Rather than attempting to guess the I/O limits of the hardware, it is preferable to run some Jetstress tests on it to determine the maximum storage I/O capacity of the system. This allows the design team to specify the bill of materials much more precisely, saving money and reducing risk.

However, if you have already proven the solution in the lab, why test again at build time? This is a common question. Many projects schedule only enough time to test a single server and its storage solution, in the belief that they only need to validate the design. The problem with this approach is that it assumes a zero error rate in the build-out. What happens if someone forgets part of the build on one server, or deploys a different device driver from the one used in the lab? What happens if a faulty piece of hardware has been deployed? Jetstress testing at build time is a great way to validate that the physically deployed hardware and software can provide the required I/O performance for Exchange. It is also a way to identify failing components such as disk drives; it is much less stressful to identify a weak batch of disks during a Jetstress test than on a Monday morning after a large user migration!

If the project plan will allow it, build in sufficient time to test each server and storage chassis that will be deployed before migrating user mailboxes to it. Remember that Jetstress can be fully automated, so with a little bit of planning it can be left to run overnight and may not actually add any significant overhead to the project.


5.3 Where should I run Jetstress in my infrastructure?

To ensure that the Jetstress test is representative of production, it is recommended to run Jetstress on every set of disks that will hold mailbox database copies (active, passive or lagged). The test is designed to validate the storage system, so where multiple Exchange servers use the same storage system, you must test them in parallel to simulate the production workload. If the storage system also supports other workloads that are not yet active at the time of testing, you should use IOMeter to simulate them.

Note:

It is important to remember not to run Jetstress on production servers that have Exchange Server already installed. This may lead to problems with Exchange performance counters. It is recommended to run Jetstress BEFORE installing Exchange Server into production.

In the event that you have already installed and configured Jetstress on your production Exchange Servers, refer to the following article for more information on resolving Exchange Performance Counter problems:

http://blogs.technet.com/b/mikelag/archive/2010/09/10/how-to-unload-reload-performance-counters-on-exchange-2010.aspx

Each database copy must be designed to provide sufficient I/O to support the copy if it were to become active. Therefore, by testing each database LUN in parallel, we are validating that the storage solution is able to meet the design requirements. We are also validating that any pieces of shared infrastructure are able to meet the demand of the entire solution, rather than simply testing each server individually.

Note:

Where there is no shared infrastructure and all storage is directly attached, servers may be tested individually. However, to be a valid test, the test must include any active, replica or lagged LUNs that could come online at the same time.


5.4 Failure Mode Testing

5.4.1 Raid Array Testing

Thanks to the Exchange I/O improvements made since Exchange 2007, it is now viable to deploy Exchange Server databases on a wide range of storage types, from JBOD to RAID 6. RAID arrays offer a good compromise between data redundancy and performance. However, they can also suffer a significant performance reduction when operating in degraded mode (after a spindle failure). For this reason, it is recommended to design RAID arrays that host Exchange Server databases so that they provide sufficient IOPS for the Exchange workload even when running in degraded mode.

Important:

While testing for failure scenarios it is not necessary to run your Jetstress test at peak working load. Instead, it is recommended to modify the thread count until the Jetstress test achieves just above the Total Database Required IOPS / Server value reported in the Mailbox Role Calculator.
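Choosing a failure-mode workload then comes down to picking the smallest thread count whose IOPS clears the calculator value. The sketch below assumes you have already measured IOPS at a few thread counts; the measurements and function names are invented for illustration.

```python
from typing import Optional


def lowest_threads_above(measured: dict[int, float],
                         required_iops: float) -> Optional[tuple[int, float]]:
    """Smallest thread count whose measured IOPS is at or above the requirement."""
    passing = sorted((t, iops) for t, iops in measured.items() if iops >= required_iops)
    return passing[0] if passing else None


def headroom_percent(achieved_iops: float, required_iops: float) -> float:
    """How far the achieved IOPS sits above (or below) the requirement, in percent."""
    return (achieved_iops - required_iops) / required_iops * 100.0


if __name__ == "__main__":
    # Illustrative measurements only: thread count -> achieved IOPS.
    measured = {4: 830.0, 6: 1150.0, 7: 1256.0, 8: 1390.0}
    choice = lowest_threads_above(measured, 1200.0)
    print(choice)                                          # (7, 1256.0)
    print(f"{headroom_percent(choice[1], 1200.0):.1f}%")   # 4.7%
```

Keeping the headroom small means a degraded or rebuilding array is judged against the workload it must actually carry, rather than against the peak load used in the optimal-state test.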

From a service availability perspective, it is important to validate that your storage can provide sufficient performance under all common failure conditions. For this reason, it is recommended to run the Jetstress test while the array is operating in each of the following conditions.

Array Condition | Test Importance | Description
Optimal | Recommended for all deployments | All disk spindles operating normally
Degraded | Recommended for all deployments | Single spindle removed from the array
Rebuilding | Recommended if the array has a hot spare (see footnote 1) | Failed spindle replaced and array controller rebuilding the array

Table 2: RAID array testing conditions

Ideally, the Jetstress test should still pass during a degraded mode test. If the test fails, refer to this post to analyse the failure severity.

5.4.2 Resilient Component Testing

Any aspect of the storage solution that has been designed to be resilient should also be tested in a failed state to determine the impact. For example, if there are multiple paths between the host and the storage controller, the Jetstress test should still pass if one of them is disabled. There are too many possible types of resilient components to list them all here; the general spirit of this test is to identify potential sources of failure within your storage solution and ensure that Jetstress still passes when each of them enters a degraded state.

1 If your array does not contain a hot spare, you can choose to perform array rebuilds out of hours to minimize the end-user impact; however, this increases your exposure to data loss. If you plan to perform array rebuilds during working hours, it is recommended to perform a Jetstress test run while the array is rebuilding, even if you do not have a hot spare configured.


5.4.3 Example of a failed degraded mode test

This example shows an unacceptable test result. I have chosen to show an unacceptable result because a good test is just a flat line, which is not particularly interesting. In this instance, the storage was based on RAID 6. The Jetstress test was configured to run at 1256 IOPS (the Mailbox Role Calculator predicted 1200 IOPS). Approximately halfway through the test, a hard disk drive was (carefully) removed from the array and the spare began rebuilding.

The test data shows that the average read I/O latency (Exchange Database ==> Instances\I/O Database Reads (Attached) Average Latency) increased from 11 ms to more than 400 ms, with latency spikes of 3000-4000 ms on the affected LUN. Latency took 18 hours to return to normal after the failure. This represents a clear failure of the degraded mode test.
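To quantify a degraded mode test like this one, you can compare the average latency before and after the point of failure. A minimal Python sketch, assuming the latency samples (in milliseconds) have already been exported from the .BLG file, for example with the Windows `relog` tool. The `threshold_ms` default of 20 ms and the sample values are illustrative assumptions, not figures from this test:

```python
from statistics import mean
from typing import Dict, Sequence


def degraded_mode_summary(latency_ms: Sequence[float], failure_index: int,
                          threshold_ms: float = 20.0) -> Dict[str, object]:
    """Summarise read latency before and after a failure injected at failure_index."""
    before, after = latency_ms[:failure_index], latency_ms[failure_index:]
    return {
        "avg_before_ms": mean(before),
        "avg_after_ms": mean(after),
        "peak_after_ms": max(after),
        # threshold_ms is an assumed target; use the latency targets from this guide.
        "passes": mean(after) <= threshold_ms,
    }


# Synthetic samples shaped like the example: ~11 ms, then hundreds to thousands of ms.
print(degraded_mode_summary([11, 12, 10, 450, 3200, 420], failure_index=3))
```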

Important:

Common failure modes such as a disk rebuild should not materially affect the test results.

Figure 5: Degraded mode failure

Note:

Refer to the following article on understanding storage configuration for Exchange Server 2013 for more information on recommended RAID configurations for Exchange Server:

http://technet.microsoft.com/en-us/library/ee832792.aspx


5.5 Jetstress testing inside virtual machines

A quick history lesson: over the years, we have seen a huge increase in deployments on hypervisor technology. During the early stages of hypervisor use for Exchange, we worked with a number of customers who observed inaccurate results during their Jetstress tests of virtual machines. This culminated in the Exchange Product Group releasing a statement that advised against running Jetstress inside a virtual machine and recommended testing on the root of the hypervisor instead. This obviously worked for Hyper-V, but was not practical for all hypervisors. On 30th March 2012, after significant internal testing against modern hypervisors, the Exchange Product Group announced that it is now viable to perform your Jetstress testing directly from inside the virtual machines that are planned to host the Exchange Mailbox role.

The single caveat is that the hypervisor being used is one of the following or newer:

Microsoft Windows Server 2008 R2 (or newer)
Microsoft Hyper-V Server 2008 R2 (or newer)
VMware ESX 4.1 (or newer)

Information:

More information about deploying Exchange Server 2013 on a Hypervisor can be found here:

http://technet.microsoft.com/en-us/library/jj619301.aspx

5.5.1 What is different about Jetstress inside a virtual machine?

The approach and testing process do not change. The aim of the test is to validate that the storage presented to the virtual guest can provide sufficient performance to meet the requirements predicted by the Mailbox Role Calculator. All performance counters and recommended values are the same for physical servers and virtual guests, and the recommendations for testing RAID arrays and failure modes still apply.

However, there are things that we may need to consider during our Jetstress testing.

1. Is the virtual host operating at a normal working load during our test? If the host has capacity for 10 virtual machines and we are testing with a single virtual machine running, then there is the possibility that we will experience performance problems once the host is fully loaded.

2. Does the host server have any high availability technology that we need to test in degraded mode? This could include multiple paths to the storage or network, or even a hypervisor HA solution. Additionally, the host may be the failover location for other guests, meaning that its workload may increase dramatically in a failure scenario.

3. Follow the current recommended practices from both Microsoft and your hypervisor vendor. Yes, I know this is obvious but it still amazes me how many problems are resolved by following the recommended guidance!


Guidance

The spirit of the test is to ensure that the system can meet its predicted workload during normal working conditions and during any common failure modes that the system has been designed to survive.


For more information about virtualizing Exchange Server:

Announcing Enhanced Hardware Virtualization Support for Exchange 2010 (this applies equally to Exchange Server 2013): http://blogs.technet.com/b/exchange/archive/2011/05/16/announcing-enhanced-hardware-virtualization-support-for-exchange-2010.aspx

Demystifying Exchange 2010 SP1 Virtualization (this applies equally to Exchange Server 2013): http://blogs.technet.com/b/exchange/archive/2011/10/11/demystifying-exchange-2010-sp1-virtualization.aspx

Best Practices for Virtualizing Exchange Server 2010 with Windows Server® 2008 R2 Hyper-V™ (this applies equally to Exchange Server 2013): http://www.microsoft.com/download/en/details.aspx?id=2428

5.6 How much time should I allocate for Jetstress testing?

Jetstress testing can take a long time to complete, and it is vital that this time is correctly planned for within your Exchange project plan.

Generally, the test procedure can be broken into three parts: Initialisation, Testing and Clean-up.

5.6.1 Initialisation

This phase includes installation, prerequisites and initial database creation. Of these tasks, initial database creation takes the longest. Database creation time varies between hardware deployments, but expect around 24 hours per 10 TB of data per server (~7 GB/minute). If you are using direct attached storage and initialise multiple servers in parallel, these predictions apply to each server. If you are using shared storage, initialisation may take considerably longer.

Data (TB) | 1 | 2 | 5 | 10 | 50 | 100
Time (hours) | 2.4 | 4.8 | 12.0 | 24.1 | 120.3 | 240.6
Time (days) | 0.1 | 0.2 | 0.5 | 1.0 | 5.0 | 10.0

Table 3: Database initialisation time
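The rule of thumb above (roughly 24 hours per 10 TB per server) can be turned into a quick planning estimate. A minimal Python sketch, assuming direct attached storage and a linear creation rate; shared storage may take considerably longer:

```python
HOURS_PER_10_TB = 24.0  # Rule of thumb from this guide, per server, direct attached storage.


def init_time_hours(data_tb: float) -> float:
    """Estimate Jetstress database initialisation time for one server."""
    return data_tb * HOURS_PER_10_TB / 10.0


for tb in (1, 2, 5, 10, 50, 100):
    hours = init_time_hours(tb)
    print(f"{tb:>4} TB: {hours:6.1f} hours ({hours / 24:4.1f} days)")
```

Table 3 appears to be based on a creation rate of about 7 GB/minute rather than exactly 24 hours per 10 TB, so its values are marginally higher (24.1 hours for 10 TB, for example).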

5.6.2 Testing

The duration of the testing phase varies with the complexity and maturity of the design. If your design is based on complex, cutting-edge storage technology, you will very likely need to allocate more time for testing. If your design is based on common direct attached components, the testing phase is likely to be quite short. For simple direct attached solutions, allow 2-5 days; for complex SAN solutions, try to allocate up to 10 working days. If you are working in a complex enterprise with large-scale, complex storage infrastructure, budget 4-6 weeks for Jetstress testing, because troubleshooting storage performance issues can often be very time-consuming.

5.6.3 Clean-up

Before the server can be put into production, the Jetstress application and the test databases that were created must be removed. The recommended procedure is as follows:

1. Uninstall Jetstress and reboot.
2. Copy the Jetstress data to a safe location.
3. Delete the Jetstress installation folder.
4. Remove all test databases.

Depending on complexity, allow between 1 and 2 hours per Exchange server that needs to have Jetstress uninstalled.

Tip:

If you have a complex deployment, you can use the scripts embedded here:

The scripts parse your JetstressConfig.XML file and remove all database and log folders defined in the test. The scripts take two input parameters:

[XMLFile] Path to JetstressConfig.XML file – defaults to “C:\Program Files\Exchange Jetstress\JetstressConfig.xml” if no other value is specified.

[Prompt] $true or $false; the default is $true. Specify $false to use the scripts as part of an automated process.

Note that these scripts are unsupported and you use them entirely at your own risk. They are provided here for convenience only.
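The embedded scripts themselves are not reproduced here. As an illustration of the same idea, the following Python sketch collects the database and log folders from a Jetstress configuration file and optionally deletes them. The element names `DatabasePath` and `LogPath` are hypothetical placeholders; check the tag names in your own JetstressConfig.xml before using anything like this, and treat it as unsupported in the same way as the scripts above.

```python
import shutil
import xml.etree.ElementTree as ET
from typing import Iterable, List

DEFAULT_XML = r"C:\Program Files\Exchange Jetstress\JetstressConfig.xml"
# Hypothetical tag names: verify them against your own JetstressConfig.xml.
PATH_TAGS = ("DatabasePath", "LogPath")


def folders_from_config(root: ET.Element, tags: Iterable[str] = PATH_TAGS) -> List[str]:
    """Return unique folder paths from the given elements, in document order."""
    wanted = set(tags)
    folders: List[str] = []
    for elem in root.iter():
        local_tag = elem.tag.rsplit("}", 1)[-1]  # Ignore any XML namespace prefix.
        text = (elem.text or "").strip()
        if local_tag in wanted and text and text not in folders:
            folders.append(text)
    return folders


def remove_folders(folders: Iterable[str], prompt: bool = True) -> None:
    """Delete each folder; with prompt=True, ask for confirmation first."""
    for folder in folders:
        if prompt and input(f"Delete {folder}? [y/N] ").strip().lower() != "y":
            continue
        shutil.rmtree(folder, ignore_errors=True)
```

Usage, mirroring the [XMLFile] and [Prompt] parameters: `remove_folders(folders_from_config(ET.parse(DEFAULT_XML).getroot()), prompt=True)`.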


5.7 Preparing for the Jetstress test

Jetstress simulates an Exchange database workload. To ensure that the environment is ready, it should be configured according to both the hardware vendor’s and Microsoft’s recommendations.

Refer to Understanding Exchange 2013 Storage Configuration Options for further detail.

As a starting point, ensure that the following conditions have been met:

1. If multiple clusters will be sharing any aspect of the disk subsystem, the server/storage configuration must be Cluster/Multi-Cluster Certified.

2. Verify with vendors that drivers and firmware are current and consistent across all servers. Drivers and firmware include, but are not limited to, the following items:

a. Server BIOS/firmware
b. SCSI/Array Controller firmware and driver
c. Fibre Host Bus Adapter (HBA) firmware and driver
d. Fibre switch/hub firmware
e. SAN (Storage Area Network) enclosure Operating System/Microcode/firmware
f. Hard disk firmware

3. Verify that the HBA/SAN specific configuration is set correctly and is consistent across all servers. Many HBAs use registry keys to customize the configuration to a specific SAN platform (for example, Queue Depth).

4. RAID controller stripe size is 256 KB or greater (refer to hardware vendor for guidance).

5. Read/write cache is set to 75% write and 25% read on all LUNs.

6. Configure the storage logical unit numbers (LUNs), considering both Exchange log devices and database devices.

7. Format the LUNs within Windows with the NTFS file system. Best practice is a 64 KB allocation unit size.

8. NTFS compression is not enabled.

9. File-level antivirus is configured to exclude all Exchange data locations and any directories that Jetstress has been configured to use.

10. Storport.sys has been updated to the latest supported version for your hardware.
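If you record these settings per LUN (for example in a build spreadsheet), part of the checklist can be expressed as a simple validation. A minimal Python sketch; the dictionary keys are hypothetical, and the values must be gathered from your vendor's tools and from Windows (for example, `fsutil fsinfo ntfsinfo <drive>` reports the allocation unit size as Bytes Per Cluster):

```python
from typing import Any, Dict, List


def preflight_issues(lun: Dict[str, Any]) -> List[str]:
    """Return the checklist items (4, 5 and 7 to 9 above) that a LUN's settings violate."""
    issues: List[str] = []
    if lun["stripe_size_kb"] < 256:
        issues.append("RAID controller stripe size is below 256 KB")
    if lun["write_cache_pct"] != 75:
        issues.append("controller cache is not 75% write / 25% read")
    if lun["allocation_unit_kb"] != 64:
        issues.append("NTFS allocation unit size is not 64 KB")
    if lun["ntfs_compression"]:
        issues.append("NTFS compression is enabled")
    if not lun["av_excluded"]:
        issues.append("file-level antivirus does not exclude this location")
    return issues


# Hypothetical LUN record that breaks two rules (stripe size and allocation unit).
print(preflight_issues({"stripe_size_kb": 128, "write_cache_pct": 75,
                        "allocation_unit_kb": 4, "ntfs_compression": False,
                        "av_excluded": True}))
```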


5.8 What happens if the test fails?

It is important to determine the pass and fail criteria for the test. The test will find the peak working load that the storage is able to provide at the I/O latency targets recommended by the Microsoft Exchange Team. These are defined in section 8.3 Interpreting Jetstress test results.

If the achieved IOPS recorded by the Jetstress test meet or exceed the targets documented within the Exchange design, the storage solution is deemed to have passed the test. If they do not meet the design targets, the storage solution is deemed to have failed the test.
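The pass/fail rule is a straightforward comparison, and the same two numbers give the I/O headroom mentioned in section 7.1.1. A minimal Python sketch, using 1256 and 1200 IOPS (the figures from the degraded mode example) purely as sample input:

```python
from typing import Tuple


def evaluate_storage(achieved_iops: float, required_iops: float) -> Tuple[str, float]:
    """Compare Achieved Transactional I/O per Second with the design target.

    Returns ("PASS" or "FAIL", headroom as a percentage of the requirement).
    """
    headroom_pct = (achieved_iops - required_iops) / required_iops * 100.0
    result = "PASS" if achieved_iops >= required_iops else "FAIL"
    return result, headroom_pct


result, headroom = evaluate_storage(achieved_iops=1256, required_iops=1200)
print(f"{result}: {headroom:.1f}% headroom")  # PASS: 4.7% headroom
```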

If the test shows that the storage has failed to meet its design targets, it will be necessary to perform remediation. This usually involves a combination of resources from the design/project, build, hardware and storage vendor teams. The aim of remediation is to determine why the achieved IOPS were below the design target and to provide a remediation plan before submitting the solution for a re-test.

Before beginning significant storage redesign work, check the basics listed in section 5.7 Preparing for the Jetstress test. The most common causes of Jetstress test failures are simple configuration steps missed during deployment and/or a misconfigured Jetstress test.

One of the most common pitfalls when a test fails is focussing on Jetstress itself. Remember that Jetstress has not failed; your storage has failed the test. Jetstress is just the messenger. Instead, concentrate on understanding the data that Jetstress has provided and how you can fix your storage solution. Jetstress is a well-proven tool and is extremely unlikely to be the root cause of your storage test failing.

Advice:

It is much easier to resolve configuration problems during this phase of the deployment than after the Exchange servers have been put into production. It is far better to suffer a small delay to the project timescales than put a service into production that does not meet its original goals.


6 Installing Jetstress

6.1 Documentation

The document that you are currently reading represents the main source of information for Jetstress 2013. If you are validating Exchange Server 2003, 2007 or 2010, refer to the Jetstress Field Guide for Jetstress 2010.

6.2 Jetstress Version and Download

Version | Build | Usage | Link
14.01.0225.017 | 32 bit | Exchange 2003 (see footnote 2) | http://www.microsoft.com/en-us/download/details.aspx?id=20054
14.01.0225.017 | 64 bit | Exchange 2007, Exchange 2010 | http://www.microsoft.com/en-us/download/details.aspx?id=4167
15.0.658.4 | 64 bit | Exchange 2013 | http://www.microsoft.com/en-us/download/details.aspx?id=36849

Table 4 - Jetstress version and download table

Note: Although there is a 32-bit build of Exchange 2007, it is not recommended or supported to use these ESE files to run a Jetstress test. This is due to the requirement for a 64-bit address space to simulate a realistic Exchange I/O pattern.

Jetstress 2013 will not allow you to use an XML configuration file from an older version of Jetstress.

Always ensure that you use the same version of Jetstress to initialise the databases and to perform the testing.

2 Refer to Appendix D – Exchange 2003 for information on configuring Jetstress 14.01.225.x for Exchange 2003.


6.3 Prerequisites

.NET Framework 4.5 or higher
A copy of your 64-bit production ESE files (see footnote 3):
  ese.dll
  eseperf.dll
  eseperf.hxx
  eseperf.ini
  eseperf.xml

It is important that the version of ESE that is used for the test is the same version that will be used in production.

3 See section 6.4 Getting ESE Files necessary for Jetstress for the locations of these files.


6.4 Getting ESE Files necessary for Jetstress

Jetstress requires ESE to function. The needed files are available from an installed Exchange server or from the Exchange installation media. It is recommended to get the files from an installed Exchange server that has been fully updated and patched. If you are validating Exchange 2010 or newer, it is possible to get the necessary files directly from the installation media without requiring an Exchange installation.

Note: AMD64 refers to the x86-64 bit architecture and is not specific to AMD processors. Do NOT use the x86 files!

6.4.1 File locations from an installed Exchange Server

File | Path
ESE.DLL | C:\Program Files\Microsoft\Exchange Server\V15\Bin
ESEPERF.DLL | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.HXX | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.INI | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.XML | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64

Table 5 - ESE file locations on running Exchange server

6.4.2 File locations from the installation media

File | Path
ESE.DLL | \setup\serverroles\common
ESEPERF.DLL | \setup\serverroles\common\perf\amd64
ESEPERF.HXX | \setup\serverroles\common\perf\amd64
ESEPERF.INI | \setup\serverroles\common\perf\amd64
ESEPERF.XML | \setup\serverroles\common\perf\amd64

Table 6 - ESE file locations from installation media

Caution

Remember to use the same version of ESE files in your Jetstress tests that you will use in production.
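To reduce copy mistakes, the file locations in Table 5 can be turned into a copy plan. A minimal Python sketch, assuming the default Exchange 2013 installation path shown in Table 5 and the default Jetstress installation folder; it only builds the list of source and destination paths, so it runs on any platform. The installation media layout in Table 6 differs and is not encoded here.

```python
from pathlib import PureWindowsPath
from typing import List, Tuple

# Sub-folders relative to the Exchange installation root, as listed in Table 5.
ESE_FILES = {
    "ESE.DLL": r"Bin",
    "ESEPERF.DLL": r"Bin\perf\AMD64",
    "ESEPERF.HXX": r"Bin\perf\AMD64",
    "ESEPERF.INI": r"Bin\perf\AMD64",
    "ESEPERF.XML": r"Bin\perf\AMD64",
}


def ese_copy_plan(
    exchange_root: str = r"C:\Program Files\Microsoft\Exchange Server\V15",
    jetstress_dir: str = r"C:\Program Files\Exchange Jetstress",
) -> List[Tuple[PureWindowsPath, PureWindowsPath]]:
    """Return (source, destination) pairs for the ESE files Jetstress needs."""
    root, dest = PureWindowsPath(exchange_root), PureWindowsPath(jetstress_dir)
    return [(root / sub / name, dest / name) for name, sub in ESE_FILES.items()]


for src, dst in ese_copy_plan():
    print(f"{src} -> {dst}")
```

Because Exchange should not be installed on the Jetstress server, `exchange_root` would typically point at a UNC path to an installed Exchange server; on Windows, each pair could then be passed to `shutil.copy2(str(src), str(dst))`.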


6.5 Installation

Before starting this section, ensure that all prerequisites have been met and that Exchange Server is not installed on any servers being used for Jetstress testing.

6.5.1 Application Installation

# Instruction Screenshot

1. Begin Jetstress installation

2. Accept License agreement


3. Leave the installation options as default unless you have a good reason to change them.

Note: All performance data and HTML reports will be stored in the installation folder, so if your system drive is short of space, select an alternative folder.

4. This is the last chance to stop the installation. Click on “Next” to install…


5. Once installation is completed click on “Close”.

Table 7 - Jetstress installation instructions

6.5.2 ESE File Installation

# Instruction Screenshot

1. Copy ESE prerequisite files into the Jetstress installation folder.

By default this is “c:\Program Files\Exchange Jetstress”


2. Start “Exchange Jetstress 2013”

Note: Jetstress requires local Administrator access. If User Account Control (UAC) is enabled, ensure that you start the JetstressWin.EXE process as an administrator.

3. Click on “Start new test”

4. Jetstress will attempt to use the ESE files that were copied over in step 1. The first time this occurs, Jetstress must be restarted. Verify in the output on this screen that the ESE version is correct and that the last line of the status output states that Jetstress must be restarted.

Close Jetstress

This is the end of the Jetstress installation.

Table 8 - ESE installation instructions


7 Configuring Jetstress

For the purposes of this document, we will be configuring a disk subsystem throughput test. The goal of this test is to identify the peak working IOPS value that the storage subsystem can sustain while remaining within the disk latency targets established by the Exchange Product Group.

7.1 Jetstress Test Types

7.1.1 Test a disk subsystem throughput

This test uses some fixed parameters to determine the maximum storage performance at maximum working capacity (80%). This is the recommended test type since it identifies the maximum working load of the storage solution for use with Exchange Server 2013 while the disks are filled to capacity. The values observed from this test can be used both to qualify the solution ready for production and to calculate available system I/O headroom once the service is in production. This test should be regarded as mandatory for each Exchange server released into production.

Database Size Control

Where you are testing multiple databases per volume, Jetstress will automatically calculate the database size of all databases on the same volume to ensure that the test runs at 80% of volume capacity.

If your volume is oversized for your solution and the test databases are too large, you can control the size of the databases by reducing the “size the database using storage capacity percentage” value during test configuration.
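The sizing rule can be sketched as follows. This is a minimal Python illustration, assuming the capacity percentage setting scales the 80% target linearly; Jetstress's own calculation may differ in detail:

```python
def per_database_size_gb(volume_capacity_gb: float, databases_on_volume: int,
                         capacity_percentage: float = 100.0,
                         fill_fraction: float = 0.80) -> float:
    """Approximate test database size when several databases share one volume.

    fill_fraction reflects the 80% volume capacity target; capacity_percentage is
    the "size the database using storage capacity percentage" setting (assumed
    here to scale the target linearly).
    """
    usable_gb = volume_capacity_gb * fill_fraction * capacity_percentage / 100.0
    return usable_gb / databases_on_volume


print(round(per_database_size_gb(126, 4), 1))      # 25.2
print(round(per_database_size_gb(126, 4, 50), 1))  # 12.6
```

The first call roughly reproduces the 4 x 25GB databases on a 126GB LUN seen in the sample report in section 9.2.2.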

7.1.2 Test an Exchange mailbox profile

This test helps you determine whether your storage system meets or exceeds the planned Exchange mailbox profile. In the Exchange mailbox profile test scenario, you can specify the number of mailbox users, IOPS per mailbox and quota size to simulate the profiled Exchange mailbox load. This test type can be useful if your storage has been specifically designed to operate only at a specific disk capacity (see footnote 4).

Note: Even if this test type is used, it is still recommended to complete the disk subsystem throughput test to determine the maximum working load of the storage solution at full capacity.

4 It is not recommended to design Exchange storage performance based on less than 80% capacity utilisation.


7.2 Initial configuration

# Instruction Screenshot

1. Open “Exchange Jetstress 2013”

2. Click on “Start new test”

3. Check that the status text does not ask for a restart and that the last two lines state that the ESE engine and performance libraries were detected.


4. Since this is the first time we are configuring a test, we will accept the defaults and click “Next”.

This will create a new configuration file called JetstressConfig.xml in the default installation directory. If you already have an XML configuration file, select it instead.

5. Select the “Test disk subsystem throughput” test and click “next”

6. Ensure that “Suppress tuning and use thread count” is unchecked. This is a change from Jetstress 2010, where auto-tuning rarely worked; auto-tuning should work in most scenarios with Jetstress 2013. If auto-tuning fails, revert to manual thread configuration as per Appendix A – Configuring Thread Count.

You should always test with 100% database capacity and target IOPS throughput. However, if the storage presented to your servers is greatly oversized, you can control the Jetstress test database sizes by reducing the size the database using storage capacity percentage.

Most validation tests should leave both values at 100.


7. Configure the test for performance. If you are testing a shared storage platform, enable the multi-host checkbox. Ensure that run “background database maintenance” is checked. Set “continue the test run despite encountering errors” to enabled.

If any errors are detected during the test, they will be reported in a new table to highlight disk errors.

8. Enter the folder for storing the test results and set the correct duration for Jetstress. A minimum of one successful 2-hour test and a separate successful 24-hour test is required for deployment validation.

Note: While auto-tuning or configuring thread count, you can set a shorter than 2 hour test by typing directly into the window.

0.75 = 45 minutes, 0.50 = 30 minutes, 0.25 = 15 minutes

Recommendation: Use 0.50 (30 minute) test runs to set thread count for SAN storage.

9. Configure the test to represent the production deployment.

Number of databases should be the total number of database copies on this server, including active, passive and lagged copies.

Number of copies per database represents the total number of copies that will exist for each unique database. This value simply simulates some log I/O reads to account for the log shipping between active and passive databases; it does NOT actually copy logs between servers.

For example, if your 6 server DAG contained 30 databases, with 1 active copy, 2 passive HA copies and 1 lagged copy per database (or 120 database copies spread across 6 servers, with each server hosting 20 copies), you would set the number of databases to 20 and the number of copies per database to 4.
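The arithmetic in this example can be written out as a small helper. A minimal Python sketch, assuming database copies are spread as evenly as possible across the DAG members; if your layout is uneven, take the per-server count from the Mailbox Role Calculator's Distribution tab instead:

```python
import math
from typing import Tuple


def jetstress_database_settings(unique_databases: int, copies_per_database: int,
                                servers: int) -> Tuple[int, int]:
    """Return (Number of databases, Number of copies per database) for one server.

    Uses the ceiling so that the busiest server is tested if copies do not divide evenly.
    """
    total_copies = unique_databases * copies_per_database
    return math.ceil(total_copies / servers), copies_per_database


# 6-server DAG, 30 databases, 1 active + 2 passive HA + 1 lagged copy each.
print(jetstress_database_settings(30, 4, 6))  # (20, 4)
```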

10. Configure the database and log file paths appropriately.

Scroll to the bottom of this page to find the “next” link.

Note: Refer to the Mailbox Role Calculator’s Distribution Tab to understand how your database should be configured.

11. If this is the first time the test has been run select to “Create new databases”, otherwise select “Attach existing databases”.


12. Verify that the paths are as expected and click “Prepare test”

13. This will begin database initialisation – this process will vary but plan on 24 hours for every 10TB worth of data to be initialised.

This value should equate to 80% of the available storage. Refer to section 5.6.1 Initialisation for further information on database sizes and creation time.


14. Once the test has been initialised, click “Execute Test”.

15. Once the test has completed, close Jetstress and copy the Jetstress report and performance data somewhere for analysis.

Each performance test will generate the following files.

Performance_<date>.XML
Performance_<date>.HTML
Performance_<date>.BLG
DBChecksum_<date>.XML
DBChecksum_<date>.HTML
DBChecksum_<date>.BLG
XMLConfig_<date>.XML

Ensure that you make a copy of all of these files.

Note: In addition you may also wish to make a copy of the *.EVT files which contain event log data taken during the test.
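Before tearing down a test server, it can help to confirm that every output file was actually copied. A minimal Python sketch that matches the file name patterns above with wildcards, so the exact `<date>` format does not matter; the file names in the example are made up for illustration:

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

EXPECTED_PATTERNS = [
    "Performance_*.xml", "Performance_*.html", "Performance_*.blg",
    "DBChecksum_*.xml", "DBChecksum_*.html", "DBChecksum_*.blg",
    "XMLConfig_*.xml",
]


def missing_outputs(filenames: Iterable[str]) -> List[str]:
    """Return the expected patterns that no copied file matches (case-insensitive)."""
    names = [name.lower() for name in filenames]
    return [p for p in EXPECTED_PATTERNS
            if not any(fnmatchcase(name, p.lower()) for name in names)]


copied = ["Performance_2013_7_8.xml", "Performance_2013_7_8.html",
          "Performance_2013_7_8.blg", "XMLConfig_2013_7_8.xml"]
print(missing_outputs(copied))  # ['DBChecksum_*.xml', 'DBChecksum_*.html', 'DBChecksum_*.blg']
```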

Table 9 - Jetstress initial configuration


8 Jetstress Output Files

This section will explain what output files will be created after the test and what is in each one.

File | Content | Purpose
Performance_<date>.BLG | Binary performance data captured during the performance test | Provides detailed data for analysis. Open this file in Performance Monitor (perfmon) and examine the counters manually to understand reasons for failure.
Performance_<date>.XML | XML report for the performance test | Provides the status report data in XML format.
Performance_<date>.HTML | HTML report for the performance test | Provides an easy-to-read status report for the test.
DBChecksum_<date>.BLG | Binary performance data captured during the checksum test | Provides binary performance data gathered during the CRC checksum of the database. Useful if the checksum fails or takes a long time to complete.
DBChecksum_<date>.XML | XML report for the checksum test | Provides the status report data in XML format.
DBChecksum_<date>.HTML | HTML report for the checksum test | Provides an easy-to-read status report for the checksum test.
XMLConfig_<date>.XML | XML configuration file | Provides a backup of the Jetstress configuration file used for the test.

Table 10 - Jetstress output files


9 Reading Jetstress report data

This section will walk through a very simple sample report and explain where the key values are stored and how to interpret the data.

9.1 Target design values

Before we can evaluate our Jetstress data, we need to know what our design targets are. Assuming that the storage design was based on data from the Mailbox Role Calculator (as it should be), the information we need is in the following table on the Role Requirements tab.

Make a note of the following value:

Total Database Required IOPS / Server


9.2 Reading the Jetstress Test Result Report

The following report is for a test with four databases configured.

9.2.1 Test Summary

This section is a basic summary of the test, when it started, finished and which versions of operating system and ESE were used.

The most important part of this section is the overall test result, pass or fail.

9.2.2 Database Sizing and Throughput

This section shows some more detailed parameters regarding the test. A “test disk subsystem throughput” test report will always show 100% for Capacity Percentage and Throughput Percentage. In this example, 4 x 25GB databases were created on a 126GB LUN. Jetstress created a total of 101GB (109154926592 bytes) of data for testing, which is approximately 80% of the available space. This is normal behaviour: by default, in performance mode, Jetstress uses 80% of the disk capacity to allow room for growth during the test process.
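The figures in this example can be checked quickly. A minimal Python sketch, assuming the report's GB values use binary units (1 GB = 2^30 bytes):

```python
bytes_created = 109_154_926_592  # Taken from the sample report.
lun_size_gb = 126

created_gb = bytes_created / 2**30
print(f"{created_gb:.1f} GB created, {created_gb / lun_size_gb:.1%} of the LUN")
# 101.7 GB created, 80.7% of the LUN
```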

The most important value in this section is the Achieved Transactional I/O per Second. In this example the test validated the storage can provide 231 transactional I/O per second. This represents random database IOPS.

Note:

To validate that the test has met the design requirements, compare the Achieved Transactional I/O per Second from your Jetstress report with the Total Database Required IOPS / Server value from the Mailbox Role Calculator, recorded in section 9.1 Target design values.


9.2.3 Jetstress System Parameters

This section displays some system values that Jetstress used for this test. The important values for analysis here are the thread count and number of copies per database.

9.2.4 Database Configuration

This section lists the paths for each database and log combination. In this example, 4 x 25GB databases were configured on a single LUN. Check that all of the test databases are listed here and the path names are correct.

9.2.5 Transactional I/O Performance

This section of the report displays the Transactional I/O values that were achieved for each database. Transactional I/O does not include I/O for Background Database Maintenance.

BDM I/O is mostly sequential, so it is not usually considered during the design phase.

Information:

If you sum the values highlighted in the red box the result should add up to the Achieved Transactional I/O per second reported in the Database Sizing and Throughput table.

In this example, 33.859 + 24.069 + 33.87 + 23.491 + 33.978 + 24.186 + 34.043 + 23.807 = ~231 IOPS.
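The same check in Python, using the eight values from the example report (presumably a read rate and a write rate for each of the four databases):

```python
# Per-database transactional I/O values copied from the sample report.
per_database_iops = [33.859, 24.069, 33.87, 23.491, 33.978, 24.186, 34.043, 23.807]

achieved = sum(per_database_iops)
print(f"Achieved Transactional I/O per Second: {achieved:.1f}")  # 231.3
```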

9.2.6 Background Database Maintenance I/O Performance

This section displays the I/O that was used to perform Background Database Maintenance only. The sum of the values in the red box shows the total amount of I/O used for BDM operations. These are sequential operations and we do not usually need to account for them in our design. However, take the advice of your storage vendor on this aspect; some storage platforms do not handle sequential I/O as well as others and may require some additional design work to deal with BDM more gracefully.

9.2.7 Log Replication I/O Performance

This section displays the I/O overhead for log file replication. In this example there were two replica copies (replicas=2), which is shown by a non-zero count for I/O Log Reads/sec. If this value is greater than zero, it confirms that database replication is being simulated.

Note:

For those who noticed: I finally provided a report that shows log I/O. I know, the little things count!

9.2.8 Total I/O Performance

This table shows all I/O that was recorded during the test (transactional I/O plus BDM I/O plus LOG I/O). The sum of the I/O values in the areas highlighted in red should agree (roughly) with the values observed at the storage subsystem.

In this case, the sum suggests that the storage subsystem had to deal with 349 IOPS. However, roughly one third of those IOPS (349 - 231 = 118) were sequential and so were not accounted for during the design process, since most disk subsystems handle sequential I/O very efficiently.
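The split between transactional and other I/O is simple subtraction. A minimal Python sketch using the rounded figures from this example:

```python
total_iops = 349          # sum of the Total I/O Performance table (example)
transactional_iops = 231  # Achieved Transactional I/O per Second (example)

# Everything that is not transactional is BDM or log I/O (sequential).
non_transactional = total_iops - transactional_iops
share = non_transactional / total_iops
print(f"{non_transactional} IOPS ({share:.0%}) were non-transactional")  # 118 IOPS (34%) ...
```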

The following chart shows the IOPS observed from the Windows host during the Jetstress test. This counter includes all system IOPS as well as the test IOPS; however, there should be a strong correlation between the IOPS observed on the Windows host and at the storage subsystem. If the IOPS observed at the Windows host and at the storage controller disagree, the Windows host values take precedence from a Jetstress validation perspective.

Figure 6 - Host observed IOPS

It is important to differentiate between sequential IOPS and transactional (random) IOPS when validating your storage.

We are only interested in transactional IOPS when Jetstress testing; BDM and LOG I/O are sequential in nature, so we ignore them from a performance planning perspective for Exchange Server.

Storage teams are often confused by the results of a Jetstress test, since the Achieved Transactional I/O per Second value is much lower than the observations they make at the storage system. It is important to differentiate between the workloads.

Note:

It is an invalid approach to sum the values displayed in the Total I/O Performance table and compare them to the Total Database Required IOPS / Server predicted by the Mailbox Role calculator. The only value from the Jetstress report that is required for validation is Achieved Transactional I/O per Second. All other values are for support and curiosity only!

9.2.9 Host System Performance

Figure 7: Host System Performance Table

This section of the report shows the observed system performance during the test. This section is most often used for troubleshooting. The most important thing to note from this section is that the CPU load from Jetstress is usually minimal. Jetstress has been optimized to evaluate the storage subsystem and not the host performance itself.

9.2.10 Error Counts Per Volume

If Jetstress detects I/O errors during the test, it will try to continue running the test and will report the errors in both the Test Log and the Error Counts per Volume table. The table lists each volume along with the number and type of I/O errors that were recorded.

Figure 8: Error Counts Per Volume Table

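Error Type | JET/ESE Error Type | Error Code
--- | --- | ---
IO Failures | JET_errDiskIO | -1022
IO Failures | JET_errReadVerifyFailure | -1018
IO Failures | JET_errPageNotInitialized | -1019
IO Failures | JET_errReadPgnoVerifyFailure | -1118
IO Failures | JET_errDiskReadVerificationFailure | -1021
Filesystem Corruptions | JET_errCheckpointCorrupt | -533
Filesystem Corruptions | JET_errMissingLogFile | -528
Filesystem Corruptions | JET_errLogFileCorrupt | -501
Filesystem Corruptions | JET_errInvalidPath | -1023
Filesystem Corruptions | JET_errInvalidSystemPath | -1024
Filesystem Corruptions | JET_errInvalidLogDirectory | -1025
Filesystem Corruptions | JET_errFileAccessDenied | -1032
Filesystem Corruptions | JET_errFileInvalidType | -1812
Filesystem Corruptions | JET_errLogCorrupted | -1852
Filesystem Corruptions | JET_errObjectNotFound | -1305
Lost Flush | JET_errReadLostFlushVerifyFailure | -1119
Lost Flush | JET_errDbTimeTooOld | -566
Lost Flush | JET_errDbTimeTooNew | -567

Table 11: JET Error Code Groupings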
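To triage a long error list, the groupings in Table 11 can be turned into a lookup. This is a minimal Python sketch; the dictionary simply mirrors the table, and the helper names `classify` and `blocks_production` are hypothetical. Codes not in the table are reported as "Unknown".

```python
from typing import Iterable, Optional, Tuple

# Groupings from Table 11: group -> {error code: JET/ESE error name}
JET_ERROR_GROUPS = {
    "IO Failures": {
        -1022: "JET_errDiskIO",
        -1018: "JET_errReadVerifyFailure",
        -1019: "JET_errPageNotInitialized",
        -1118: "JET_errReadPgnoVerifyFailure",
        -1021: "JET_errDiskReadVerificationFailure",
    },
    "Filesystem Corruptions": {
        -533: "JET_errCheckpointCorrupt",
        -528: "JET_errMissingLogFile",
        -501: "JET_errLogFileCorrupt",
        -1023: "JET_errInvalidPath",
        -1024: "JET_errInvalidSystemPath",
        -1025: "JET_errInvalidLogDirectory",
        -1032: "JET_errFileAccessDenied",
        -1812: "JET_errFileInvalidType",
        -1852: "JET_errLogCorrupted",
        -1305: "JET_errObjectNotFound",
    },
    "Lost Flush": {
        -1119: "JET_errReadLostFlushVerifyFailure",
        -566: "JET_errDbTimeTooOld",
        -567: "JET_errDbTimeTooNew",
    },
}

# Reverse index: error code -> (group, error name)
_BY_CODE = {code: (group, name)
            for group, codes in JET_ERROR_GROUPS.items()
            for code, name in codes.items()}


def classify(code: int) -> Tuple[str, Optional[str]]:
    """Return (group, error name) for a JET error code, or ("Unknown", None)."""
    return _BY_CODE.get(code, ("Unknown", None))


def blocks_production(codes: Iterable[int]) -> bool:
    """True if any recorded code is a lost flush event."""
    return any(classify(code)[0] == "Lost Flush" for code in codes)
```

Treating every Lost Flush code as a production blocker follows the guidance in this section; how the other groups are handled is left to your own judgement and your storage vendor's advice.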

Information

Some failure events are more important than others. Lost Flush events signal that significant data corruption has occurred and that something is very wrong with your storage (under no circumstances should you entertain putting a system into production that experiences ANY lost flush events during a test). However, some other IO Failures are relatively normal. For example, in a JBOD environment we may see -1021 (JET_errDiskReadVerificationFailure). Although this signifies that the data we read was not the same as the data we originally wrote (the checksum failed), Exchange will try to deal with this scenario via page patching in normal operation, so it is not of critical importance.

For a full list of JET/ESE event types, see the article Extensible Storage Engine Error Codes.

What is a Lost Flush?

A lost flush occurs when a write operation is issued to the disk and the OS reports the operation as having completed successfully, but the data never actually gets physically committed to non-volatile storage. The two main reasons for this to happen are:

1. A bug somewhere in the storage stack.
2. Power loss on storage with write caching enabled: in this case, the operation is committed to the volatile cache of the disk or controller, but if the hardware loses power, the data never actually makes it to non-volatile storage, even though the application was told that it did. This is why we only run with write caching enabled on storage whose cache is backed by a battery: if it loses power, the controller makes sure to flush the uncommitted cache contents to the disk.

A lost flush is a very insidious type of storage failure for a database engine because the consequences can range from none (if we are very lucky) to nasty and potentially undetectable logical database corruption (more likely).

Undetected lost flushes on the active copy may show up as a JET_errDbTimeTooNew (-567) replication error on the passive copy. Undetected lost flushes on the passive copy may show up as a JET_errDbTimeTooOld (-566) replication error on the passive copy.

ESE has implemented lost flush detection, based on a flush map. Basically, every time we issue a write on a page, we flip a bit on the actual page and also store that bit in a flush map in memory. If we read the page again off the disk, we check the bit against the in-memory flush map and if they don’t match, it means the flush was lost.
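The flush map idea can be illustrated with a toy model. This Python sketch is a simplification for illustration only and is not how ESE is actually implemented: `ToyDisk`, `FlushMapEngine` and `LostFlushError` are hypothetical names, and a dropped write is simulated by a flag on the fake disk.

```python
class LostFlushError(Exception):
    """Raised when a page's on-disk flush bit disagrees with the in-memory flush map."""


class ToyDisk:
    """Stand-in for storage: page number -> (flush_bit, data). Can be told to lose a write."""

    def __init__(self):
        self.pages = {}
        self.lose_next_write = False

    def write(self, pgno, flush_bit, data):
        if self.lose_next_write:
            # Simulate a lost flush: the caller is told the write succeeded,
            # but nothing reaches "non-volatile" storage.
            self.lose_next_write = False
            return True
        self.pages[pgno] = (flush_bit, data)
        return True

    def read(self, pgno):
        return self.pages[pgno]


class FlushMapEngine:
    def __init__(self, disk):
        self.disk = disk
        self.flush_map = {}  # in-memory copy of the last flush bit written per page

    def write_page(self, pgno, data):
        # Flip the page's bit on every write and remember the new value in memory.
        bit = 1 - self.flush_map.get(pgno, 0)
        self.flush_map[pgno] = bit
        self.disk.write(pgno, bit, data)

    def read_page(self, pgno):
        bit_on_disk, data = self.disk.read(pgno)
        # A mismatch means the most recent write never reached the disk.
        if bit_on_disk != self.flush_map[pgno]:
            raise LostFlushError(f"lost flush detected on page {pgno}")
        return data
```

With a single bit per page, this toy model would miss two consecutive lost writes to the same page, because the expected bit flips back to the value already on disk.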

Important:

The bottom line for lost flushes is that you should NEVER put a system into production that has recorded lost flushes during the Jetstress test. You must be 100% certain that you have resolved the underlying problem and have at least one good 24-hour test with no lost flushes recorded before accepting the solution into production.

9.2.11 Test Log

This section of the report is a log of the Jetstress test. It is most often used for troubleshooting failures.

9.3 Interpreting Jetstress test results

Jetstress evaluates latency values for Database Reads and LOG writes, since these affect the end-user experience.

Performance Test “Strict” mode (<= 6 hour test)

Average Database Read Latency: 20ms
Average Log File Write Latency: 10ms
Max Database Read Latency: 100ms
Max Log File Write Latency: 100ms

Stress Test “Lenient” mode (> 6 hour test)

Average Database Read Latency: 20ms
Average Log File Write Latency: 10ms
Max Database Read Latency: 200ms
Max Log File Write Latency: 200ms
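These thresholds can be captured as data. A minimal Python sketch, assuming the 6-hour boundary belongs to strict mode (as the "<= 6 hour" wording suggests) and treating a value equal to a limit as a failure, in line with the "<20ms" wording in 9.4; Jetstress's own boundary handling may differ. The counter names are hypothetical keys, not Jetstress report field names.

```python
# Latency limits in milliseconds, taken from the strict and lenient lists above.
STRICT = {"db_read_avg": 20, "log_write_avg": 10, "db_read_max": 100, "log_write_max": 100}
LENIENT = {"db_read_avg": 20, "log_write_avg": 10, "db_read_max": 200, "log_write_max": 200}


def limits_for(test_hours: float) -> dict:
    """Strict mode for tests of 6 hours or less, lenient mode for longer tests."""
    return STRICT if test_hours <= 6 else LENIENT


def latency_failures(test_hours: float, observed_ms: dict) -> list:
    """Return the names of the counters whose observed latency is not below its limit."""
    limits = limits_for(test_hours)
    return [name for name, limit in limits.items() if observed_ms[name] >= limit]
```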

9.4 Test evaluation

Evaluate the following criteria for each test run. The first criterion is validated against the design target and must be checked manually; Jetstress does not validate this value. The second and third are checked against pre-defined latency targets for Exchange; if these values are not within tolerance, Jetstress will report the test as failed.

1. DB IOPS Target: Is the Achieved Transactional I/O per Second in the test report higher than the Total Database Required IOPS / Server predicted in the Mailbox Role Calculator?

2. DB Read Latency: Is the I/O Database Reads Average Latency in the test report <20ms?

3. LOG Write Latency: Is the I/O Log Writes Average Latency in the test report <10ms?

DB IOPS Target | DB Read Latency | LOG Write Latency | Action
--- | --- | --- | ---
PASS | PASS | PASS | Test successful.
FAIL | PASS | PASS | The test is failing to meet the IOPS target, but the latency values are good. Increase the thread count by 1 and re-test. Use sluggishsessions to fine-tune if necessary.
PASS | FAIL | FAIL | At least one database has recorded latency over threshold. If the latency values are very close to the limits, increase sluggishsessions by 1; if the achieved IOPS is well above the target and the latency values are well above the limits, decrease the thread count.
PASS | PASS | FAIL | As for PASS / FAIL / FAIL above.
PASS | FAIL | PASS | As for PASS / FAIL / FAIL above.
FAIL | FAIL | FAIL | If the test shows that Achieved IOPS is below the design target AND the test latency values are above limits, the storage solution is unable to meet the requirements. At this stage, it is necessary to re-evaluate the storage design and begin troubleshooting the physical deployment to determine the correct remediation.
FAIL | FAIL | PASS | As for FAIL / FAIL / FAIL above.
FAIL | PASS | FAIL | As for FAIL / FAIL / FAIL above.

Table 12 - Quick results analysis table
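The three criteria and the actions in Table 12 can be summarised in a single function. This Python sketch is an illustration only: the limits are the average latency targets from 9.4, `evaluate_run` is a hypothetical name, and the required IOPS value must come from your own Mailbox Role Calculator output.

```python
from typing import Tuple


def evaluate_run(achieved_iops: float, required_iops: float,
                 db_read_avg_ms: float, log_write_avg_ms: float) -> Tuple[bool, bool, bool, str]:
    """Return (iops_ok, read_ok, log_ok, recommended action) following Table 12."""
    iops_ok = achieved_iops > required_iops  # criterion 1, checked manually
    read_ok = db_read_avg_ms < 20            # criterion 2
    log_ok = log_write_avg_ms < 10           # criterion 3

    if iops_ok and read_ok and log_ok:
        action = "Test successful"
    elif not iops_ok and read_ok and log_ok:
        action = "Increase thread count by 1 and re-test; fine-tune with sluggishsessions if necessary"
    elif iops_ok:
        # IOPS target met, but at least one latency check failed.
        action = "Latency over threshold: increase sluggishsessions by 1 if close to limits, else decrease thread count"
    else:
        # IOPS target missed and at least one latency check failed.
        action = "Storage cannot meet requirements: re-evaluate the design and troubleshoot the deployment"
    return iops_ok, read_ok, log_ok, action


# 200 is a hypothetical design target, used here for illustration only.
print(evaluate_run(231, 200, 15, 5)[3])  # Test successful
```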

10 Appendix A – Configuring thread count

Jetstress 2013 has been updated so that the auto-tuning feature works in far more scenarios than before. For this reason, it is recommended to begin Jetstress testing in auto-tuning mode and only revert to manual thread configuration if auto-tuning fails to set a thread value.

Thread count controls how many IOPS Jetstress attempts to drive through the storage subsystem. Setting this value correctly requires some trial and error. For the process described in this document, the goal is to increase the thread count to a value that fails and then reduce it until the test passes; this value should then represent the peak working IOPS that the storage subsystem can support.
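The increase-then-reduce procedure can be sketched as a simple search. This is a minimal Python sketch, assuming a hypothetical `run_test(threads)` callback that runs one Jetstress test and reports pass or fail; the `max_threads` cap of 64 is an arbitrary safety limit, not a Jetstress setting.

```python
from typing import Callable, Optional


def find_peak_thread_count(run_test: Callable[[int], bool],
                           start: int = 1,
                           max_threads: int = 64) -> Optional[int]:
    """Step the thread count up until a run fails, then down until one passes.

    run_test(threads) is a hypothetical callback that runs one Jetstress test
    at that thread count and returns True if the run met its targets.
    """
    threads = min(start, max_threads)
    last_passed = None

    # Phase 1: increase the thread count while runs keep passing.
    while threads <= max_threads and run_test(threads):
        last_passed = threads
        threads += 1
    if last_passed is not None:
        return last_passed  # highest count that passed before the first failure

    # Phase 2: the starting count failed, so reduce it until a run passes.
    while threads > 1:
        threads -= 1
        if run_test(threads):
            return threads
    return None  # even a single thread failed
```

In practice each call is a full, possibly multi-hour test run, and results near the limit can vary between runs, so treat the returned value as a starting point for confirmation runs rather than a final answer.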

Each thread generates a workload on the system. For example, if the storage design team recommended that the storage for a given server was able to support 1000 IOPS:

Target IOPS = 1000

$$\text{Starting thread count} = \frac{\text{Target IOPS}}{65}$$

Given this example:

$$\text{Starting thread count} = \frac{1000}{65} \approx 15.38 \quad \text{(round up to 16)}$$

Notes:

Try auto-tuning with Jetstress 2013

If in doubt, start with thread=1 and work up until the test fails.

If the predicted thread count is less than 1, it may be necessary to modify the sluggishsessions value afterwards.

The exact quantity of IOPS generated per thread will change as the storage system workload changes. As the storage system gets closer to its performance limit the IOPS per thread value will reduce. Jetstress was designed to produce approximately 60 IOPS per thread at 20ms disk latency.
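The starting-point formula can be written as a one-line helper. This Python sketch uses the 65 IOPS-per-thread divisor from the formula above and, as an assumption, clamps the result to at least one thread; per the notes above, a predicted value below 1 is a hint that sluggishsessions may need adjusting instead.

```python
import math


def starting_thread_count(target_iops: float, iops_per_thread: float = 65) -> int:
    """Target IOPS / 65, rounded up, never below one thread."""
    return max(1, math.ceil(target_iops / iops_per_thread))


print(starting_thread_count(1000))  # 16
```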

11 Appendix B – Configuring sluggishsessions

If it is not possible to achieve the right IOPS value by modifying the thread count, it becomes necessary to modify the sluggishsessions value within the JetstressConfig.xml file.

The sluggishsessions value adds a pause between each task. This allows a level of fine-tuning over the workload dispatched by Jetstress.

As sluggishsessions is increased, the achieved IOPS value decreases.

To change the value, open the JetstressConfig.xml file and look for the default configuration option:

<SluggishSessions>1</SluggishSessions>

Modify the value, save the configuration file and then re-start Jetstress.
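If you re-run tests often, the edit can be scripted. This Python sketch is hypothetical: it assumes JetstressConfig.xml is UTF-8 encoded and contains a `<SluggishSessions>` element written exactly as shown above, and it uses a text substitution rather than an XML parser so the rest of the file is left untouched. Back up the file before running it.

```python
import re

_PATTERN = re.compile(r"<SluggishSessions>\s*\d+\s*</SluggishSessions>")


def set_sluggish_sessions(config_path: str, value: int) -> int:
    """Rewrite every <SluggishSessions> value in the file; return how many were changed."""
    if value < 0:
        raise ValueError("sluggishsessions must not be negative")
    # newline="" keeps the file's original line endings intact on read and write.
    with open(config_path, encoding="utf-8", newline="") as f:
        text = f.read()
    new_text, count = _PATTERN.subn(f"<SluggishSessions>{value}</SluggishSessions>", text)
    if count == 0:
        raise ValueError("no <SluggishSessions> element found in " + config_path)
    with open(config_path, "w", encoding="utf-8", newline="") as f:
        f.write(new_text)
    return count
```

Restart Jetstress afterwards, as described above, so that the new value is picked up.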

12 Appendix C - Running a Jetstress Test with JetstressCmd.exe

Both JetstressWin.exe and JetstressCmd.exe use the common Jetstress core library files, which means you will get comparable test results with the same XML configuration file. We recommend that you use JetstressWin.exe to create new test scenarios, and JetstressCmd.exe to open and run the test scenarios by using the Config (/c) command-line option. You can also see all the other available options by using the /? (help) command-line option.

Action | Argument and example of use | Description
--- | --- | ---
Help | /? | The help for the command-line program
Config | /c JetstressConfig.xml | Open a configuration file
Generate | /g | Generate a sample XML configuration file
TimeOut | /TimeOut 2H0M0S | Test duration. Default is 2 hours.
Output | /output c:\output | Path for test output. Default is the current directory.
DBPath | /dbpath m:\sg1\mdb /dbpath n:\sg2\mdb | Database paths for each storage group
LogPath | /log x:\sg1\log y:\sg2\log | Log path for each storage group
PctCapacity | /pctcapacity 100 | Specify capacity percentage
Throughput | /throughput 100 | Specify throughput percentage
Threads | /threads | Suppress auto tuning and specify thread count
DoNotRunDBMPerformance | | Do not run background database maintenance during performance/stress test
RunDBMPerformance | | Run background database maintenance during soft recovery test
New | /new | Create new databases
Open | /open | Open existing databases
Bak | /bak | Restore backup database
Recovery | /recovery | Run soft recovery test
Streaming | | Run streaming backup test
Transaction | | Run transaction performance test
VerifyCheckSum | | Run database checksums
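For scripted runs, the options above can be assembled programmatically. This Python sketch covers only the Config, TimeOut and Output options from the table; `build_jetstresscmd_args` is a hypothetical helper, and whether these options combine exactly as shown for your Jetstress version is an assumption to verify with `/?`.

```python
import re
from typing import List, Optional

# TimeOut values follow the "2H0M0S" pattern shown in the table.
_TIMEOUT = re.compile(r"\d+H\d+M\d+S")


def build_jetstresscmd_args(config: str, timeout: Optional[str] = None,
                            output: Optional[str] = None) -> List[str]:
    """Assemble a JetstressCmd.exe argument list from options in the table above."""
    args = ["JetstressCmd.exe", "/c", config]
    if timeout is not None:
        if not _TIMEOUT.fullmatch(timeout):
            raise ValueError("expected a duration such as 2H0M0S")
        args += ["/TimeOut", timeout]
    if output is not None:
        args += ["/output", output]
    return args
```

On the Jetstress host, the resulting list could be passed to `subprocess.run(args, check=True)` from the Jetstress installation folder.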

13 Appendix E – Running Jetstress on a production server

Although the formal support position on this is that you shouldn’t do it – ever – at all – under any circumstances – in fact you shouldn’t even be reading this section of the field guide … we all accept there are cases where it can be necessary, such as when attaching new storage to an existing server or troubleshooting performance bottlenecks on existing servers.

That still doesn’t mean it’s ok to do it!!

If you really MUST do it, here are some things to know before beginning…

Before testing:

1. Record the start-up state of all Exchange services.
2. Stop and disable all Exchange services on the server.
3. Copy the ESE files from the currently installed version of Exchange Server. Jetstress will detect that the performance counters are already installed for this version of ESE and will use them, which prevents performance counter problems afterwards!

Do not unload/reload performance counters after the test (if you have used the same ESE files as are currently installed, this is unnecessary and could break things!).

After testing:

1. Clean up the Jetstress test databases.
2. Uninstall Jetstress.
3. Set Exchange services back to the state they were in before you began testing.
4. Reboot your Exchange server.
5. Verify that the Exchange performance counters are working.
6. Inspect the Windows System and Application event logs for errors.

Remember: This is not supported or recommended – only follow this as a matter of last resort or under the instruction of Microsoft Support/Microsoft Consulting Services.

14 Common Issues

14.1 Troubleshooting Jetstress

While using Jetstress, you may encounter some known issues. This section provides possible causes and the recommended solutions.

14.1.1 Jetstress cannot attach to or create a database

Event log error that may be displayed: Error -1023

Possible cause: The path of the database or log files is incorrect.

Solution: Ensure that the paths and file names are correct.

Event log error that may be displayed: Error -1032

Possible cause: Permissions are insufficient to access the .edb file or the log files.

Solution: Verify that permissions are sufficient for the account under which Jetstress is running. Jetstress requires read/write permission to the directories it is using.

Event log error that may be displayed: Error -550 (0)

Possible cause: The last time Jetstress was run, it ended uncleanly. This caused the log files to become unsynchronized with the database.

Solution: Delete the Jetstress database (*.edb), log files (*.log), and checkpoint file (*.chk), and re-create the Jetstress database. You can also use Eseutil.exe with the /r switch to resynchronize the logs and database.

Event log error that may be displayed: Error -1022

Possible cause: The failure is caused by circular logging by Jetstress.

Solution: Check the log drive for the log file name that is identified in the event log. Delete that log file and all the log files that have a higher number in the file name. Then run Eseutil.exe /r to recover Jetstress.edb. When the database is in a good state, delete all the log files in the log directory and rerun Jetstress.

14.1.2 Error loading Performance Monitor counters

JetstressWin.exe relies on performance counters to monitor the system. JetstressWin.exe requires the ESE database counters to be installed.

Cause: When the counters are not loaded correctly, you may see exception errors related to performance counters.

Solution: To reload the counters, exit JetstressWin.exe. Locate the directory where JetstressWin.exe was installed and verify that the eseperf.dll, eseperf.hxx, and eseperf.ini files exist in that directory. In a command shell window, type the command unlodctr ESE and then press Enter. This unregisters the ESE Database performance counters. Start JetstressWin.exe and allow it to reload the performance counters.

14.1.3 Unable to tune for the parameters

This error indicates that Jetstress could not find appropriate parameters that could be used to run a performance or stress test at the desired level of I/O load.

Cause: This can be caused by several factors. The most common reason is that the storage subsystem has multiple hosts attached to it, and those hosts are competing for common resources during the tuning process.

Solution: When you are running in a scenario such as this, you can run Jetstress on a single host with tuning enabled to generate the appropriate load parameters, and then rerun the test on the other hosts with the Suppress Tuning option enabled and the tuning parameters entered manually from the results of the first test.

14.1.4 Unable to mount databases due to invalid mount point configuration

When using mount points and running the Prepare phase of Jetstress, the operation fails with the error “There is insufficient disk space on volume <system drive>:\”, where <system drive> is the drive letter where you keep your root mount folder.

Cause: This error means that one or more of the mount points is invalid or the mount point folder path is not connected to its LUN. Database creation fails saying that volume C: (or in general, the system volume) does not have enough space. The issue here is that some of the mount-points mapped to directories in the system volume are not properly configured and so Jetstress is looking at the directory (thus checking against the system drive itself), rather than the actual disk.

Troubleshooting: Execute a DIR command in the mount point root folder.

ALL mount point folder paths are indicated by a <JUNCTION> notation. Any folder that is listed as a <DIR> is not attached to its mount point and is likely causing the problem.
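The same check can be scripted. This Python sketch relies on `os.path.ismount`, which on Windows reports a folder that hosts a volume mount point as a mount point; a folder that DIR would list as <DIR> should come back as an ordinary directory. The function name is hypothetical, and the result should be confirmed against the DIR output.

```python
import os
from typing import List


def folders_not_mounted(mount_root: str) -> List[str]:
    """List subfolders of mount_root that are ordinary directories, not mount points."""
    with os.scandir(mount_root) as it:
        entries = sorted(it, key=lambda e: e.name)
    suspects = []
    for entry in entries:
        # A folder that should host a LUN but is not a mount point is the
        # likely cause of the "insufficient disk space" error.
        if entry.is_dir() and not os.path.ismount(entry.path):
            suspects.append(entry.name)
    return suspects
```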

Solution: The mount point folder can be listed as <DIR> for a number of reasons. To resolve the issue:

1. Verify that the LUN is present and in good health.
2. Use the storage system array management software to verify that the LUN has an assigned logical drive.
3. Using the Disk Management MMC, re-assign the LUN to the correct mount point.

14.1.5 Jetstress testing failed. Error: System.ApplicationException: Faulty performance counter paths: \MSExchange Database(*)\*

Jetstress version 658.004 is incompatible with ESE version 620 (CU1) and later when you try to run a test with more than 38 databases configured. If you experience this issue, either use the RTM version of ESE (516.26) or use a version of ESE later than 726, which will be released with CU2.

Additionally, a fixed version of Jetstress (726) will be released that will work with all versions of ESE after 516.26 (Exchange 2013 ESE).
