Page 1: Exchange 2010 High Availability

Exchange Deployment Planning Services

Exchange 2010 High Availability

Page 2: Exchange 2010 High Availability

Ideal audience for this workshop:
• Messaging SME
• Network SME
• Security SME

Exchange 2010 High Availability

Page 3: Exchange 2010 High Availability

Exchange 2010 High Availability

During this session, focus on the following:
• How will we leverage this functionality in our organization?
• What availability and service level requirements do we have around our messaging solution?

Page 4: Exchange 2010 High Availability

Agenda
• Review of Exchange Server 2007 Availability Solutions
• Overview of Exchange 2010 High Availability
• Exchange 2010 High Availability Fundamentals
• Exchange 2010 High Availability Deep Dive
• Exchange 2010 Site Resilience

Page 5: Exchange 2010 High Availability

Exchange Server 2007 Single Copy Clustering
• SCC out-of-box provides little high availability value
− On Store failure, SCC restarts the Store on the same machine; no CMS failover
− SCC does not automatically recover from storage failures
− SCC does not protect your data, your most valuable asset
− SCC does not protect against site failures
− SCC redundant network is not leveraged by CMS
• Conclusion
− SCC only provides protection from server hardware failures and bluescreens, the relatively easy components to recover
− Supports rolling upgrades without losing redundancy

Page 6: Exchange 2010 High Availability

Exchange Server 2007 Continuous Replication

[Diagram: logs are (1) copied, (2) inspected, and (3) replayed into a database copy. Log shipping targets: a local disk (local continuous replication), another node within a cluster (cluster continuous replication), or a standby server or cluster (standby continuous replication).]

Page 7: Exchange 2010 High Availability

Exchange Server 2007 HA Solution (CCR + SCR)

[Diagram: two CCR clusters (each with Node A and Node B) in the San Jose AD site host DB1–DB6, with SCR replicating to a standby server in the Dallas AD site. Outlook (MAPI) clients connect to the clustered mailbox servers; OWA, ActiveSync, and Outlook Anywhere connect via Client Access Servers in each site.]

Limitations:
• SCR managed separately; no GUI
• Manual “activation” of remote mailbox server
• Clustering knowledge required
• Database failure requires server failover
• Mailbox server can’t co-exist with other roles

Page 8: Exchange 2010 High Availability

Exchange 2010 High Availability Goals
• Reduce complexity
• Reduce cost
• Native solution - no single point of failure
• Improve recovery times
• Support larger mailboxes
• Support large scale deployments

Make High Availability Exchange deployments mainstream!

Page 9: Exchange 2010 High Availability

Exchange 2010 High Availability Architecture

[Diagram: six Mailbox servers spanning the Dallas and San Jose AD sites host replicated copies of DB1–DB5; clients connect through Client Access Servers (CAS) in each site.]

• Failover managed within Exchange
• Database (DB) centric failover
• Easy to extend across sites
• All clients connect via CAS servers

Page 10: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
• Database Availability Group
• Server
• Database
• Database Copy
• Active Manager
• RPC Client Access service

[Diagram: a DAG containing servers (SVR) hosting database (DB) copies, each running Active Manager (AM), with RPC Client Access (RPC CAS) in front.]

Page 11: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Database Availability Group
• A group of up to 16 servers hosting a set of replicated databases
• Wraps a Windows Failover Cluster
− Manages servers’ membership in the group
− Heartbeats servers, quorum, cluster database
• Defines the boundary of database replication
• Defines the boundary of failover/switchover (*over)
• Defines the boundary for the DAG’s Active Manager

Page 12: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Server
• Unit of membership for a DAG
• Hosts the active and passive copies of multiple mailbox databases
• Executes Information Store, CI, Assistants, etc., services on active mailbox database copies
• Executes replication services on passive mailbox database copies

Page 13: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Server (Continued)
• Provides the connection point between the Information Store and RPC Client Access
• Very few server-level properties relevant to HA
− Server’s Database Availability Group
− Server’s Activation Policy

Page 14: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Mailbox Database
• Unit of *over
• A database has 1 active copy – the active copy can be mounted or dismounted
• Maximum # of passive copies == # servers in DAG – 1

Page 15: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Mailbox Database (Continued)
− ~30-second database *overs
− Server failover/switchover involves moving all active databases to one or more other servers
− Database names are unique across a forest
− Defines properties relevant at the database level
− GUID: a database’s unique ID
− EdbFilePath: path at which copies are located
− Servers: list of servers hosting copies

Page 16: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Active/Passive vs. Source/Target
• Availability Terms
− Active: Selected to provide email services to clients
− Passive: Available to provide email services to clients if the active fails
• Replication Terms
− Source: Provides data for copying to a separate location
− Target: Receives data from the source

Page 17: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Mailbox Database Copy
− Scope of replication
− A copy is either source or target of replication at any given time
− A copy is either active or passive at any given time
− Only 1 copy of each database in a DAG is active at a time
− A server may not host more than 1 copy of any database

Page 18: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Mailbox Database Copy
Defines properties applicable to an individual database copy:
− Copy status: Healthy, Initializing, Failed, Mounted, Dismounted, Disconnected, Suspended, FailedAndSuspended, Resynchronizing, Seeding
− CopyQueueLength
− ReplayQueueLength
− ActiveCopy
− ActivationSuspended
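These per-copy properties can be inspected with Get-MailboxDatabaseCopyStatus; a minimal sketch, assuming a database named DB1:

# Show the status of every copy of DB1 across the DAG
Get-MailboxDatabaseCopyStatus -Identity DB1 |
    Format-Table Name,Status,CopyQueueLength,ReplayQueueLength,ContentIndexState -AutoSize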

Page 19: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Active Manager
• Exchange-aware resource manager (high availability’s brain)
− Runs on every server in the DAG
− Manages which copies should be active and which should be passive
− Definitive source of information on where a database is active or mounted
− Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport)
− Information stored in cluster database

Page 20: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Active Manager
• Active Directory is still primary source for configuration info
• Active Manager is primary source for changeable state information (such as active and mounted)
• Replication service monitors health of all mounted databases, and monitors ESE for I/O errors or failure

Page 21: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Continuous Replication
• Continuous replication has the following basic steps:
− Database copy seeding of target
− Log copying from source to target
− Log inspection at target
− Log replay into database copy

Page 22: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Database Seeding
• There are several ways to seed the target instance:
− Automatic seeding
− Update-MailboxDatabaseCopy cmdlet (see the sketch below)
− Can be performed from active or passive copies
− Manually copy the database
− Backup and restore (VSS)
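A minimal reseed sketch using the Update-MailboxDatabaseCopy cmdlet named above; the database and server names, and the choice to delete existing files, are assumptions for illustration:

# Suspend the copy before reseeding
Suspend-MailboxDatabaseCopy -Identity "DB1\EXMBX2" -Confirm:$false
# Reseed the copy on EXMBX2, discarding any existing (diverged) files
Update-MailboxDatabaseCopy -Identity "DB1\EXMBX2" -DeleteExistingFiles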

Page 23: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Log Shipping
• Log shipping in Exchange 2010 leverages TCP sockets
− Supports encryption and compression
− Administrator can set the TCP port to be used
• Replication service on the target notifies the active instance of the next log file it expects
− Based on the last log file it inspected
• Replication service on the source responds by sending the required log file(s)
• Copied log files are placed in the target’s Inspector directory

Page 24: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Log Inspection
• The following actions are performed to verify the log file before replay:
− Physical integrity inspection
− Header inspection
− Move any Exx.log files that exist on the target to the ExxOutOfDate folder, if the target was previously a source
• If inspection fails, the file is recopied and inspected (up to 3 times)
• If the log file passes inspection, it is moved into the database copy’s log directory

Page 25: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Log Replay
• Log replay has moved to the Information Store
• The following validation tests are performed prior to log replay:
− Recalculate the required log generations by inspecting the database header
− Determine the highest generation present in the log directory to ensure that a log file exists
− Compare the highest log generation present in the directory to the highest log file that is required
− Make sure the logs form the correct sequence
− Query the checkpoint file, if one exists
• Replay the log file using a special recovery mode (undo phase is skipped)

Page 26: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Lossy Failure Process
• In the event of failure, the following steps occur for the failed database:
− Active Manager determines the best copy to activate
− The Replication service on the target server attempts to copy missing log files from the source (ACLL)
− If successful, the database mounts with zero data loss
− If unsuccessful (lossy failure), the database mounts based on the AutoDatabaseMountDial setting
− The mounted database generates new log files (using the same log generation sequence)
− Transport Dumpster requests are initiated for the mounted database to recover lost messages
− When the original server or database recovers, it runs through divergence detection and performs an incremental reseed or requires a full reseed
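The same dial logic applies when an administrator forces a switchover; as a sketch (database and server names assumed), Move-ActiveMailboxDatabase can override the mount dial for a one-off activation:

# Switch DB1 to EXMBX2, refusing to mount if any log data would be lost
Move-ActiveMailboxDatabase DB1 -ActivateOnServer EXMBX2 -MountDialOverride:Lossless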

Page 27: Exchange 2010 High Availability

Exchange 2010 High Availability Fundamentals
Backups
• Streaming backup APIs for public use have been cut; you must use VSS for backups
− Backup from any copy of the database/logs
− Always choose the passive (or active) copy
− Backup an entire server
− Designate a dedicated backup server for a given database
• Restore from any of these backup scenarios

[Diagram: a three-server Database Availability Group with copies of DB1–DB3; a VSS requestor backs up from any copy.]

Page 28: Exchange 2010 High Availability

Multiple Database Copies Enable New Scenarios
• Exchange 2010 HA – site/server/disk failure
• E-mail archive – archiving/compliance
• Extended/protected dumpster retention – recover deleted items

[Diagram: a three-server Database Availability Group hosting copies of DB1–DB3, including a 7–14 day lag copy.]

Page 29: Exchange 2010 High Availability

Mailbox Database Copies
• Create up to 16 copies of each mailbox database
• Each mailbox database must have a unique name within the Organization
− Mailbox database objects are global configuration objects
− All mailbox database copies use the same GUID
− No longer connected to specific Mailbox servers

Page 30: Exchange 2010 High Availability

Mailbox Database Copies
• Each DAG member can host only one copy of a given mailbox database
− Database path and log folder path for the copy must be identical on all members
• Copies have settable properties (see the sketch below)
− Activation Preference
− RTM: Used as second sort key during best copy selection
− SP1: Used for distributing active databases; used as primary sorting key when using Lossless mount dial
− Replay Lag and Truncation Lag
− Using these features affects your storage design
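For instance, activation preference is set per copy with Set-MailboxDatabaseCopy; database and server names here are illustrative:

# Make the copy of DB1 on EXMBX2 the second-most-preferred copy
Set-MailboxDatabaseCopy -Identity "DB1\EXMBX2" -ActivationPreference 2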

Page 31: Exchange 2010 High Availability

Lagged Database Copies
• A lagged copy is a passive database copy with a replay lag time greater than 0
• Lagged copies are only for point-in-time protection; they are not a replacement for point-in-time backups
− Logical corruption and/or mailbox deletion prevention scenarios
− Provide a maximum of 14 days protection
• When should you deploy a lagged copy?
− Useful only to mitigate a risk
− May not be needed if deploying a backup solution (e.g., DPM 2010)
• Lagged copies are not HA database copies
− Lagged copies should never be automatically activated by the system
− Steps for manual activation documented at http://technet.microsoft.com/en-us/library/dd979786.aspx
• Lagged copies affect your storage design
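A sketch of creating a lagged copy with the replay/truncation lag settings described above; the server name and the 7-day lag are assumptions (14 days is the supported maximum):

# Add a copy of DB1 on EXMBX3 that replays logs 7 days behind the active
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EXMBX3 -ReplayLagTime 7.00:00:00 -TruncationLagTime 0.00:00:00
# Block automatic activation of the lagged copy, per the guidance above
Suspend-MailboxDatabaseCopy -Identity "DB1\EXMBX3" -ActivationOnly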

Page 32: Exchange 2010 High Availability

DAG Design
Two Failure Models
• Design for all database copies activated
− Design for the worst case - server architecture handles 100 percent of all hosted database copies becoming active
• Design for targeted failure scenarios
− Design server architecture to handle the active mailbox load during the worst failure case you plan to handle
− 1 member failure requires 2 or more HA copies and 2 or more servers
− 2 member failure requires 3 or more HA copies and 4 or more servers
− Requires Set-MailboxServer <Server> -MaximumActiveDatabases <Number>

Page 33: Exchange 2010 High Availability

DAG Design
It’s all in the layout
• Consider this scenario
− 8 servers, 40 databases with 2 copies

Server 1   Server 2   Server 3   Server 4   Server 5   Server 6   Server 7   Server 8
DB1        DB6        DB11       DB16       DB21       DB26       DB31       DB36
DB2        DB7        DB12       DB17       DB22       DB27       DB32       DB37
DB3        DB8        DB13       DB18       DB23       DB28       DB33       DB38
DB4        DB9        DB14       DB19       DB24       DB29       DB34       DB39
DB5        DB10       DB15       DB20       DB25       DB30       DB35       DB40
DB36’      DB31’      DB26’      DB21’      DB16’      DB11’      DB6’       DB1’
DB37’      DB32’      DB27’      DB22’      DB17’      DB12’      DB7’       DB2’
DB38’      DB33’      DB28’      DB23’      DB18’      DB13’      DB8’       DB3’
DB39’      DB34’      DB29’      DB24’      DB19’      DB14’      DB9’       DB4’
DB40’      DB35’      DB30’      DB25’      DB20’      DB15’      DB10’      DB5’

Page 34: Exchange 2010 High Availability

DAG Design
It’s all in the layout
• If I have a single server failure
− Life is good
(Same 8-server, 40-database layout as the previous slide; every failed active has a surviving copy on another server.)

Page 35: Exchange 2010 High Availability

DAG Design
It’s all in the layout
• If I have a double server failure
− Life could be good…
(Same layout; e.g., losing Servers 1 and 2 leaves every database with a copy, since their databases’ second copies live on Servers 8 and 7.)

Page 36: Exchange 2010 High Availability

DAG Design
It’s all in the layout
• If I have a double server failure
− Life could be bad…
(Same layout; e.g., losing Servers 1 and 8 takes down both copies of DB1–DB5 and DB36–DB40.)

Page 37: Exchange 2010 High Availability

DAG Design
It’s all in the layout
• Now let’s consider this scenario
− 4 servers, 12 databases with 3 copies (9 copies per server: 3 active, 6 passive)
− With a single server failure: the failed server’s 3 active databases are redistributed across the 3 surviving servers
− With a double server failure: every database still has at least one surviving copy, because its 3 copies sit on 3 different servers

[Copy layout table: DB1–DB12 and their second (’) and third (’’) copies distributed across Servers 1–4.]

Page 38: Exchange 2010 High Availability

Deep Dive on Exchange 2010 High Availability Basics

• Quorum
• Witness
• DAG Lifecycle
• DAG Networks

Page 39: Exchange 2010 High Availability

Quorum

Page 40: Exchange 2010 High Availability

Quorum
• Used to ensure that only one subset of members is functioning at one time
• A majority of members must be active and have communications with each other
• Represents a shared view of members (voters and some resources)
• Dual usage
− Data shared between the voters representing configuration, etc.
− Number of voters required for the solution to stay running (majority); quorum is a consensus of voters
− When a majority of voters can communicate with each other, the cluster has quorum
− When a majority of voters cannot communicate with each other, the cluster does not have quorum

Page 41: Exchange 2010 High Availability

Quorum
• Quorum is not only necessary for cluster functions; it is also necessary for DAG functions
− In order for a DAG member to mount and activate databases, it must participate in quorum
• Exchange 2010 uses only two of the four available cluster quorum models
− Node Majority (DAGs with an odd number of members)
− Node and File Share Majority (DAGs with an even number of members)
• Quorum = (N/2) + 1 (whole numbers only)
− 6 members: (6/2) + 1 = 4 votes for quorum (can lose 3 voters)
− 9 members: (9/2) + 1 = 5 votes for quorum (can lose 4 voters)
− 13 members: (13/2) + 1 = 7 votes for quorum (can lose 6 voters)
− 15 members: (15/2) + 1 = 8 votes for quorum (can lose 7 voters)
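The “(whole numbers only)” note means integer division; a quick PowerShell illustration of the arithmetic above (the function name is just for this sketch):

# Votes needed for quorum in an N-member cluster: floor(N/2) + 1
function Get-QuorumMajority([int]$MemberCount) {
    [math]::Floor($MemberCount / 2) + 1
}
Get-QuorumMajority 13   # returns 7, so a 13-member DAG can lose 6 voters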

Page 42: Exchange 2010 High Availability

Witness and Witness Server

Page 43: Exchange 2010 High Availability

Witness

• A witness is a share on a server external to the DAG that participates in quorum by providing a weighted vote for the DAG member that holds a lock on the witness.log file
− Used only by DAGs that have an even number of members
• The witness server does not maintain a full copy of quorum data and is not a member of the DAG or cluster

Page 44: Exchange 2010 High Availability

Witness
• Represented by the File Share Witness resource
− File share witness cluster resource, directory, and share are automatically created and removed as needed
− Uses the Cluster IsAlive check for availability
− If the witness is not available, cluster core resources are failed and moved to another DAG member
− If the other DAG member does not bring the witness resource online, the resource remains in a Failed state, with restart attempts every 60 minutes
− See http://support.microsoft.com/kb/978790 for details on this behavior

Page 45: Exchange 2010 High Availability

Witness
• If in a Failed state and needed for quorum, the cluster will try to online the File Share Witness resource once
− If the witness cannot be restarted, it is considered failed and quorum is lost
− If the witness can be restarted, it is considered successful and quorum is maintained
− An SMB lock is placed on witness.log
− Node PAXOS information is incremented and the updated PAXOS tag is written to witness.log
• If in an Offline state and needed for quorum, the cluster will not try to restart it – quorum is lost

Page 46: Exchange 2010 High Availability

Witness

• When the witness is no longer needed to maintain quorum, the lock on witness.log is released
• Any member that locks the witness retains the weighted vote (“locking node”)
− Members in contact with the locking node are in the majority and maintain quorum
− Members not in contact with the locking node are in the minority and lose quorum

Page 47: Exchange 2010 High Availability

Witness Server

• No pre-configuration typically necessary
− Exchange Trusted Subsystem must be a member of the local Administrators group on the witness server if the witness server is not running Exchange 2010
• Cannot be a member of the DAG (present or future)
• Must be in the same Active Directory forest as the DAG

Page 48: Exchange 2010 High Availability

Witness Server

• Can be Windows Server 2003 or later
− File and Printer Sharing for Microsoft Networks must be enabled
• Replicating the witness directory/share with DFS is not supported
• Not necessary to cluster the witness server
− If you do cluster the witness server, you must use Windows 2008
• A single witness server can be used for multiple DAGs
− Each DAG requires its own unique witness directory/share

Page 49: Exchange 2010 High Availability

Database Availability Group Lifecycle

Page 50: Exchange 2010 High Availability

Database Availability Group Lifecycle
• Create a DAG

New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW -DatabaseAvailabilityGroupIpAddresses 10.0.0.8
New-DatabaseAvailabilityGroup -Name DAG2 -DatabaseAvailabilityGroupIpAddresses 10.0.0.8,192.168.0.8

• Add Mailbox servers to the DAG

Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2

• Add a mailbox database copy

Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EXMBX2

Page 51: Exchange 2010 High Availability

Database Availability Group Lifecycle
• DAG is created initially as an empty object in Active Directory
− Continuous replication or 3rd party replication using Third Party Replication mode
− Once changed to Third Party Replication mode, the DAG cannot be changed back
− DAG is given a unique name and configured for IP addresses (or configured to use DHCP)

Page 52: Exchange 2010 High Availability

Database Availability Group Lifecycle
• When the first Mailbox server is added to a DAG
− A failover cluster is formed with the name of the DAG using Node Majority quorum
− The server is added to the DAG object in Active Directory
− A cluster name object (CNO) for the DAG is created in the default Computers container using the security context of the Replication service
− The name and IP address of the DAG are registered in DNS
− The cluster database for the DAG is updated with info about local databases

Page 53: Exchange 2010 High Availability

Database Availability Group Lifecycle
• When the second and subsequent Mailbox servers are added to a DAG
− The server is joined to the cluster for the DAG
− The quorum model is automatically adjusted
− The server is added to the DAG object in Active Directory
− The cluster database for the DAG is updated with info about local databases

Page 54: Exchange 2010 High Availability

Database Availability Group Lifecycle
• After servers have been added to a DAG
− Configure the DAG (see the sketch below)
− Network encryption
− Network compression
− Replication port
− Configure DAG networks
− Network subnets
− Collapse DAG networks into a single network with multiple subnets
− Enable/disable MAPI traffic/replication
− Block network heartbeat cross-talk (Server1\MAPI !<-> Server2\Repl)
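A configuration sketch for the DAG-level settings above; the DAG name and chosen values are assumptions:

# Encrypt and compress replication traffic between subnets (64327 is the default port)
Set-DatabaseAvailabilityGroup -Identity DAG1 -NetworkEncryption InterSubnetOnly -NetworkCompression InterSubnetOnly -ReplicationPort 64327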

Page 55: Exchange 2010 High Availability

Database Availability Group Lifecycle
• After servers have been added to a DAG
− Configure DAG member properties (see the sketch below)
− Automatic database mount dial
− BestAvailability, GoodAvailability, Lossless, custom value
− Database copy automatic activation policy
− Blocked, IntrasiteOnly, Unrestricted
− Maximum active databases
− Create mailbox database copies
− Seeding is performed automatically, but you have options
− Monitor health and status of database copies and perform switchovers as needed
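A sketch of the member-level settings just listed; the server name and values are assumptions:

# Tune a DAG member: mount dial, activation policy, and active-database cap
Set-MailboxServer -Identity EXMBX1 -AutoDatabaseMountDial GoodAvailability -DatabaseCopyAutoActivationPolicy IntrasiteOnly -MaximumActiveDatabases 20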

Page 56: Exchange 2010 High Availability

Database Availability Group Lifecycle
• Before you can remove a server from a DAG, you must first remove all replicated databases from the server
• When a server is removed from a DAG:
− The server is evicted from the cluster
− The cluster quorum is adjusted
− The server is removed from the DAG object in Active Directory
• Before you can remove a DAG, you must first remove all servers from the DAG (see the sketch below)
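The teardown order described above, sketched with the corresponding cmdlets (database, server, and DAG names illustrative):

# 1. Remove the replicated database copies from the member
Remove-MailboxDatabaseCopy -Identity "DB1\EXMBX2" -Confirm:$false
# 2. Remove the member from the DAG (evicts it from the cluster)
Remove-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2
# 3. Once all members are removed, remove the DAG itself
Remove-DatabaseAvailabilityGroup -Identity DAG1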

Page 57: Exchange 2010 High Availability

DAG Networks

Page 58: Exchange 2010 High Availability

DAG Networks

• A DAG network is a collection of subnets
• All DAGs must have:
− Exactly one MAPI network
− MAPI network connects DAG members to network resources (Active Directory, other Exchange servers, etc.)
− Zero or more replication networks
− Separate network on separate subnet(s)
− Used for/by continuous replication only
− LRU determines which replication network to use when multiple replication networks are configured

Page 59: Exchange 2010 High Availability

DAG Networks

• DAG networks are initially created based on enumeration of cluster networks
− Cluster enumeration is based on subnet
− One cluster network is created for each subnet

Page 60: Exchange 2010 High Availability

DAG Networks

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.0.16/24            192.168.0.1
EX2 – REPLICATION    10.0.0.16/24               N/A

Name           Subnet(s)        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15), EX2 (192.168.0.16)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15), EX2 (10.0.0.16)         False                 True

Page 61: Exchange 2010 High Availability

DAG Networks

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.1.15/24            192.168.1.1
EX2 – REPLICATION    10.0.1.15/24               N/A

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Page 62: Exchange 2010 High Availability

DAG Networks

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

• To collapse subnets into two DAG networks and disable replication for the MAPI network:

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04

Page 63: Exchange 2010 High Availability

DAG Networks

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True

• Result of collapsing subnets into two DAG networks and disabling replication for the MAPI network (commands from the previous slide):

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04

Page 64: Exchange 2010 High Availability

DAG Networks

• Automatic network detection occurs only when members are added to the DAG
− If networks are added after the member is added, you must perform discovery:
Set-DatabaseAvailabilityGroup -DiscoverNetworks
• DAG network configuration is persisted in the cluster registry
− HKLM\Cluster\Exchange\DAG Network
• DAG networks include built-in encryption and compression
− Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs
− Compression: Microsoft XPRESS, based on the LZ77 algorithm
• DAGs use a single TCP port for replication and seeding
− Default is TCP port 64327
− If you change the port and you use Windows Firewall, you must manually change firewall rules

Page 65: Exchange 2010 High Availability

Deeper Dive on Exchange 2010 High Availability Advanced Features

• Active Manager
• Best Copy Selection
• Datacenter Activation Coordination Mode

Page 66: Exchange 2010 High Availability

Active Manager

Page 67: Exchange 2010 High Availability

Active Manager

• Exchange component that manages *overs
− Runs on every server in the DAG
− Selects the best available copy on failovers
− Is the definitive source of information on where a database is active
− Stores this information in the cluster database
− Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport)

Page 68: Exchange 2010 High Availability

Active Manager

• Active Manager roles
− Standalone Active Manager
− Primary Active Manager (PAM)
− Standby Active Manager (SAM)

• Active Manager client runs on CAS and Hub

Page 69: Exchange 2010 High Availability

Active Manager

• Transition of role state logged into Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel)

Page 70: Exchange 2010 High Availability

Active Manager

• Primary Active Manager (PAM)
− Runs on the node that owns the cluster core resources (cluster group)
− Gets topology change notifications
− Reacts to server failures
− Selects the best database copy on *overs
− Detects failures of the local Information Store and local databases

Page 71: Exchange 2010 High Availability

Active Manager

• Standby Active Manager (SAM)
− Runs on every other node in the DAG
− Detects failures of the local Information Store and local databases
− Reacts to failures by asking the PAM to initiate a failover
− Responds to queries from CAS/Hub about which server hosts the active copy
• Both roles are necessary for automatic recovery
− If the Replication service is stopped, automatic recovery will not happen
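To see which member currently holds the PAM role, the DAG can be queried with the -Status switch; a sketch assuming a DAG named DAG1:

# PrimaryActiveManager is populated only when -Status is specified
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status | Format-List Name,PrimaryActiveManager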

Page 72: Exchange 2010 High Availability

Best Copy Selection

Page 73: Exchange 2010 High Availability

Best Copy Selection

• Process of finding the best copy to activate for an individual database given a list of status results of potential copies for activation

• Active Manager selects the “best” copy to become the new active copy when the existing active copy fails

Page 74: Exchange 2010 High Availability

Best Copy Selection – RTM

• Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary
• Selects from the sorted list based on which set of criteria is met by each copy
• Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy

Page 75: Exchange 2010 High Availability

Best Copy Selection – SP1

• Sorts copies by activation preference when the auto database mount dial is set to Lossless
− Otherwise, sorts copies based on copy queue length, with activation preference used as a secondary sorting key if necessary
• Selects from the sorted list based on which set of criteria is met by each copy
• Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy

Page 76: Exchange 2010 High Availability

Best Copy Selection

• Is the database mountable? Is the copy queue length <= AutoDatabaseMountDial?
− If yes, the database is marked as the current active and a mount request is issued
− If not, the next best copy is tried (if one is available)
• During best copy selection, any servers that are unreachable or “activation blocked” are ignored

Page 77: Exchange 2010 High Availability

Best Copy Selection

Criteria   Copy Queue Length   Replay Queue Length   Content Index Status
1          < 10 logs           < 50 logs             Healthy
2          < 10 logs           < 50 logs             Crawling
3          N/A                 < 50 logs             Healthy
4          N/A                 < 50 logs             Crawling
5          N/A                 < 50 logs             N/A
6          < 10 logs           N/A                   Healthy
7          < 10 logs           N/A                   Crawling
8          N/A                 N/A                   Healthy
9          N/A                 N/A                   Crawling
10         Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource

Page 78: Exchange 2010 High Availability

Best Copy Selection – RTM
• Four copies of DB1
• DB1 currently active on Server1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Diagram: DB1 active on Server1 (failed), with passive copies on Server2, Server3, and Server4.]

Page 79: Exchange 2010 High Availability

Best Copy Selection – RTM
• Sort the list of available copies by copy queue length (using activation preference as a secondary sort key if necessary):
− Server3\DB1
− Server2\DB1
− Server4\DB1
(Same copy table as the previous slide.)

Page 80: Exchange 2010 High Availability

Best Copy Selection – RTM
• Only two copies meet the first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy):
− Server3\DB1 (lowest copy queue length – tried first)
− Server2\DB1
− Server4\DB1 does not qualify (its CI state is Crawling)
(Same copy table as the previous slide.)

Page 81: Exchange 2010 High Availability

Best Copy Selection – SP1
• Four copies of DB1
• DB1 currently active on Server1
• Auto database mount dial set to Lossless

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Diagram: DB1 active on Server1 (failed), with passive copies on Server2, Server3, and Server4.]

Page 82: Exchange 2010 High Availability

Best Copy Selection – SP1
• Sort the list of available copies by activation preference:
− Server2\DB1
− Server3\DB1
− Server4\DB1
(Same copy table as the previous slide.)

Page 83: Exchange 2010 High Availability

Best Copy Selection – SP1
• Sort the list of available copies by activation preference:
− Server2\DB1 (lowest preference value – tried first)
− Server3\DB1
− Server4\DB1
(Same copy table as the previous slide.)

Page 84: Exchange 2010 High Availability

Best Copy Selection

• After Active Manager determines the best copy to activate
− The Replication service on the target server attempts to copy missing log files from the source (ACLL)
− If successful, the database mounts with zero data loss
− If unsuccessful (lossy failure), the database mounts based on the AutoDatabaseMountDial setting
− If the data loss is outside the dial setting, the next copy is tried

Page 85: Exchange 2010 High Availability

Best Copy Selection

• After Active Manager determines the best copy to activate
− The mounted database generates new log files (using the same log generation sequence)
− Transport Dumpster requests are initiated for the mounted database to recover lost messages
− When the original server or database recovers, it runs through divergence detection and either performs an incremental resync or requires a full reseed

Page 86: Exchange 2010 High Availability

Datacenter Activation Coordination Mode

Page 87: Exchange 2010 High Availability

Datacenter Activation Coordination Mode
• DAC mode is a property of a DAG
• Acts as an application-level form of quorum
− Designed to prevent multiple copies of the same database mounting on different members due to loss of network

Page 88: Exchange 2010 High Availability

Datacenter Activation Coordination Mode
• RTM: DAC mode is only for DAGs with three or more members that are extended to two Active Directory sites
− Don’t enable for two-member DAGs where each member is in a different AD site, or for DAGs where all members are in the same AD site
− DAC mode also enables use of the Site Resilience tasks
− Stop-DatabaseAvailabilityGroup
− Restore-DatabaseAvailabilityGroup
− Start-DatabaseAvailabilityGroup
• SP1: DAC mode can be enabled for all DAGs (see the sketch below)
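Enabling DAC mode is a single property change on the DAG; a sketch assuming a DAG named DAG1:

# DagOnly enables the DACP bit logic described on the next slides
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly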

Page 89: Exchange 2010 High Availability

Datacenter Activation Coordination Mode
• Uses the Datacenter Activation Coordination Protocol (DACP), which is a bit in memory set to either:
− 0 = can’t mount
− 1 = can mount

Page 90: Exchange 2010 High Availability

Datacenter Activation Coordination Mode
• Active Manager startup sequence
− DACP is set to 0
− The DAG member communicates with the other DAG members it can reach to determine the current value of their DACP bits
− If the starting DAG member can communicate with all other members, its DACP bit switches to 1
− If the other DACP bits are set to 0, the starting DAG member’s DACP bit remains at 0
− If another DACP bit is set to 1, the starting DAG member’s DACP bit switches to 1

Page 91: Exchange 2010 High Availability

Improvements in Service Pack 1

Replication and Copy Management enhancements in SP1

Page 92: Exchange 2010 High Availability

Improvements in Service Pack 1
• Continuous replication changes
− Enhanced to reduce data loss
− Eliminates the log drive as a single point of failure
• Automatically switches between modes:
− File mode (original, log file shipping)
− Block mode (enhanced log block shipping)
• Switching process:
− Initial mode is file mode
− Block mode is triggered when the target needs the Exx.log file (e.g., copy queue length = 0)
− All healthy passives are processed in parallel
− File mode is triggered when block mode falls too far behind (e.g., copy queue length > 0)

Page 93: Exchange 2010 High Availability

Improvements in Service Pack 1

[Diagram: Continuous Replication – File Mode vs. Block Mode. In file mode, the target requests closed log files (“Send me the latest log files … I have log 2”) until the database copy is up to date. In block mode, log data is shipped from the active’s ESE log buffer to a replication log buffer on the target as it is written; a log fragment is detected and converted to a complete log, which is then built and inspected.]

Page 94: Exchange 2010 High Availability

Improvements in Service Pack 1
• SP1 introduces the RedistributeActiveDatabases.ps1 script (keeps database copies balanced across DAG members)
− Moves databases to the most preferred copy
− If cross-site, tries to balance between sites
• Targetless admin switchover altered for stronger activation preference affinity
− First pass of best copy selection is sorted by activation preference, not copy queue length
− This accepts a longer activation time in exchange for a more even distribution: you might pick a copy with more logs to play, but you get a better distribution of databases
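A usage sketch for the script, run from the Exchange Scripts directory (the DAG name is an assumption):

# Rebalance DAG1 by moving each database to its activation-preference-1 copy
.\RedistributeActiveDatabases.ps1 -DagName DAG1 -BalanceDbsByActivationPreference -Confirm:$false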

Page 95: Exchange 2010 High Availability

Improvements in Service Pack 1
• *over performance improvements
− In RTM, a *over immediately terminated replay on the copy becoming active, and the mount operation did the necessary log recovery
− In SP1, a *over drives the database to clean shutdown by playing all logs on the passive copy, so no recovery is required on the new active

Page 96: Exchange 2010 High Availability

Improvements in Service Pack 1
• DAG maintenance scripts
− StartDagServerMaintenance.ps1
− Runs Suspend-MailboxDatabaseCopy for each database copy hosted on the DAG member
− Pauses the node in the cluster, which prevents it from holding or acquiring the PAM role
− Sets the DatabaseCopyAutoActivationPolicy parameter on the DAG member to Blocked
− Moves all active databases currently hosted on the DAG member to other DAG members
− If the DAG member currently owns the default cluster group, moves the default cluster group (and therefore the PAM role) to another DAG member

Page 97: Exchange 2010 High Availability

Improvements in Service Pack 1
• DAG maintenance scripts
− StopDagServerMaintenance.ps1
− Runs Resume-MailboxDatabaseCopy for each database copy hosted on the DAG member
− Resumes the node in the cluster, which re-enables full cluster functionality for the DAG member
− Sets the DatabaseCopyAutoActivationPolicy parameter on the DAG member to Unrestricted

Page 98: Exchange 2010 High Availability

Improvements in Service Pack 1
• CollectOverMetrics.ps1 and CollectReplicationMetrics.ps1 rewritten

Page 99: Exchange 2010 High Availability

Improvements in Service Pack 1
• Exchange Management Console enhancements in SP1
− Manage DAG IP addresses
− Manage witness server/directory and alternate witness server/directory

Page 100: Exchange 2010 High Availability

Switchovers and Failovers (*overs)

Page 101: Exchange 2010 High Availability

Exchange 2010 *Overs
• Within a datacenter
− Database *over
− Server *over
• Between datacenters
− Single database *over
− Server *over
• Datacenter switchover

Page 102: Exchange 2010 High Availability

Single Database Cross-Datacenter *Over

− Database mounted in another datacenter and another Active Directory site
− Serviced by “new” Hub Transport servers
− “Different OwningServer” – for routing
− Transport dumpster re-delivery now from both Active Directory sites
− Serviced by “new” CAS
− “Different CAS URL” – for protocol access
− Outlook Web App now re-directs the connection to the second CAS farm
− Other protocols proxy or redirect (varies)

Page 103: Exchange 2010 High Availability

Datacenter Switchover
− Customers can evolve to site resilience: standalone → local redundancy → site resilience
− Consider namespace design at first deployment
− Keep extending the DAG!
− Monitoring and many other concepts/skills are just re-applied
− Normal administration remains unchanged
− Disaster recovery is not an HA event

Page 104: Exchange 2010 High Availability

Site Resilience

Page 105: Exchange 2010 High Availability

Agenda

• Understand the steps required to build and activate a standby site for Exchange 2010
− Site Resilience Overview
− Site Resilience Models
− Planning and Design
− Site Activation Steps
− Client Behavior

Page 106: Exchange 2010 High Availability

Site Resilience Drivers

• Business requirements drive site resilience
− When a risk assessment reveals a high-impact threat to meeting SLAs for data loss and loss of availability
− Site resilience is required to mitigate the risk
− Business requirements dictate a low recovery point objective (RPO) and recovery time objective (RTO)

Page 107: Exchange 2010 High Availability

Site Resilience Overview

• Ensuring business continuity brings expense and complexity
− A site switchover is a coordinated effort with many stakeholders that requires practice to ensure the real event is handled well
• Exchange 2010 reduces cost and complexity
− Low-impact testing can be performed with cross-site single database switchover

Page 108: Exchange 2010 High Availability

Exchange 2007 site resilience choices
• CCR+SCR and /recoverCMS
• SCC+SCR and /recoverCMS
• CCR stretched across datacenters
• SCR and database portability
• SCR and /m:RecoverServer
• SCC stretched across datacenters with synchronous replication

Page 109: Exchange 2010 High Availability

Exchange 2010 makes it simpler
• Database Availability Group (DAG) with members in different datacenters/sites
− Supports automatic and manual cross-site database switchovers and failovers (*overs)
− No stretched Active Directory site
− No special networking needed
− No /recoverCMS

Page 110: Exchange 2010 High Availability

Suitability of site resilience solutions

Solution                         RTO goal   RPO goal   Deployment complexity
Ship backups and restore         High       High       Low
Standby Exchange 2003 clusters   Moderate   Low        High
CCR+SCR in separate AD sites     Moderate   Low        Moderate
CCR in a stretched AD site       Low        Low        High
Exchange 2010 DAGs               Low        Low        Low

Page 111: Exchange 2010 High Availability

Site Resilience Models

Voter Placement and Infrastructure Design

Page 112: Exchange 2010 High Availability

Infrastructure Design

• There are two key models you have to take into account when designing site resilient solutions
− Datacenter / Namespace Model
− User Distribution Model
• When planning for site resilience, each datacenter is considered active
− Exchange Server 2010 site resilience requires active CAS, HUB, and UM in the standby datacenter
− These services are used by databases mounted in the standby datacenter after a single database *over

Page 113: Exchange 2010 High Availability

Infrastructure Design
User Distribution Models
• The locality of the users will ultimately determine your site resilience architecture
− Are users primarily located in one datacenter?
− Are users located in multiple datacenters?
− Is there a requirement to maintain a user population in a particular datacenter?
• Active/Passive user distribution model
− Database copies deployed in the secondary datacenter, but no active mailboxes are hosted there
• Active/Active user distribution model
− User population dispersed across both datacenters, with each datacenter being the primary datacenter for its specific user population

Page 114: Exchange 2010 High Availability

Infrastructure Design
Client Access Arrays
• 1 CAS array per AD site
− Multiple DAGs within an AD site can use the same CAS array
• FQDN of the CAS array needs to resolve to a load-balanced virtual IP address in DNS, but only in internal DNS
− You need a load balancer for the CAS array, as well
• Set the databases in the AD site to utilize the CAS array via the Set-MailboxDatabase -RPCClientAccessServer property (see the sketch below)
• By default, new databases have the RPCClientAccessServer value set on creation
− If the database was created prior to creating the CAS array, it is set to a random CAS FQDN (or the local machine if roles are co-located)
− If the database is created after creating the CAS array, it is set to the CAS array FQDN
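A sketch of creating a CAS array and pointing a database at it; the array name, FQDN, site, and database name are assumptions:

# Create the array object for the Dallas AD site
New-ClientAccessArray -Name "outlook.contoso.com" -Fqdn outlook.contoso.com -Site Dallas
# Point DB1's RPC endpoint at the array FQDN
Set-MailboxDatabase DB1 -RpcClientAccessServer outlook.contoso.com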

Page 115: Exchange 2010 High Availability

Voter Placement

• The majority of voters should be deployed in the primary datacenter
− Primary = datacenter with the majority of the user population
• If the user population is spread across datacenters, deploy multiple DAGs to prevent a WAN outage from taking one datacenter offline

Page 116: Exchange 2010 High Availability

Voter Placement

[Diagram: Portland and Seattle datacenters hosting DAG1 (MBX01–MBX04) and DAG2 (MBX05–MBX08), with Hub Transport servers HUB01–HUB04 and Client Access servers CAS01–CAS04 distributed across both sites. Each DAG has its witness in its primary datacenter and an alternate witness in the other datacenter.]

Page 117: Exchange 2010 High Availability

Site Resilience

Namespace, Network and Certificate Planning

Page 118: Exchange 2010 High Availability

Planning for site resilience
Namespaces
• Each datacenter is considered active and needs its own namespaces
• Each datacenter needs the following namespaces
− OWA/OA/EWS/EAS namespace
− POP/IMAP namespace
− RPC Client Access namespace
− SMTP namespace
• In addition, one of the datacenters will maintain the Autodiscover namespace

Page 119: Exchange 2010 High Availability

Planning for site resilience
Namespaces
• Best practice: use split DNS for Exchange hostnames used by clients
• Goal: minimize the number of hostnames
− mail.contoso.com for Exchange connectivity on the intranet and Internet
− mail.contoso.com has different IP addresses in intranet/Internet DNS
• Important – before moving down this path, be sure to map out all host names (outside of Exchange) that you want to create in the internal zone

Page 120: Exchange 2010 High Availability

Planning for site resilience
Namespaces

[Diagram: two datacenters, each with CAS, HT, MBX, and AD.]

Datacenter 1
• Internal DNS: mail.contoso.com, pop.contoso.com, imap.contoso.com, autodiscover.contoso.com, smtp.contoso.com, outlook.contoso.com
• External DNS: mail.contoso.com, pop.contoso.com, imap.contoso.com, autodiscover.contoso.com, smtp.contoso.com
• Exchange config: ExternalURL = mail.contoso.com; CAS array = outlook.contoso.com; OA endpoint = mail.contoso.com

Datacenter 2
• Internal DNS: mail.region.contoso.com, pop.region.contoso.com, imap.region.contoso.com, smtp.region.contoso.com, outlook.region.contoso.com
• External DNS: mail.region.contoso.com, pop.region.contoso.com, imap.region.contoso.com, smtp.region.contoso.com
• Exchange config: ExternalURL = mail.region.contoso.com; CAS array = outlook.region.contoso.com; OA endpoint = mail.region.contoso.com

Page 121: Exchange 2010 High Availability

Planning for site resilience
Network
• Design high availability for dependencies
− Active Directory
− Network services (DNS, TCP/IP, etc.)
− Telephony services (Unified Messaging)
− Backup services
− Infrastructure (power, cooling, etc.)

Page 122: Exchange 2010 High Availability

Planning for site resilience
Network
• Latency
− Must have less than 250 ms round trip
• Network cross-talk must be blocked
− Router ACLs should be used to block traffic between the MAPI and replication networks
− If DHCP is used for the replication network, DHCP can be used to deploy static routes
• Lower the TTL for all Exchange records to 5 minutes
− OWA/EAS/EWS/OA, IMAP/POP, SMTP, RPC CAS
− Both internal and external DNS zones

Page 123: Exchange 2010 High Availability

Planning for site resilience
Certificates

Wildcard certs
• Pros: one cert for both sides; flexible if names change
• Cons: wildcard certs can be expensive, or impossible to obtain; WM 5 clients don’t work with wildcard certs; setting the Cert Principal Name to *.company.com is global to all CAS in the forest

Intelligent firewall
• Pros: traffic is forwarded to the ‘correct’ CAS
• Cons: requires ISA or another firewall which can forward based on properties; additional hardware required; AD replication delays affect publishing rules

Load balancer
• Pros: load balancer can listen for both external names and forward to the ‘correct’ CAS
• Cons: requires multiple certificates; requires multiple IPs; requires a load balancer

Same config in both sites
• Pros: just an A record change required after site failover
• Cons: no way to run the DR site as active during normal operation

Manipulate Cert Principal Name
• Pros: minimal configuration changes required after failover; works with all clients
• Cons: setting the Cert Principal Name to mail.company.com is global to all CAS in the forest

Page 124: Exchange 2010 High Availability

Planning for site resilience
Certificates
• Best practice: minimize the number of certificates
− 1 certificate for all CAS servers + reverse proxy + Edge/Hub
− Use a Subject Alternative Name (SAN) certificate, which can cover multiple hostnames
− 1 additional certificate if using OCS
− OCS requires certificates with <=1024-bit keys and the server name in the certificate principal name
• If leveraging a certificate per datacenter, ensure the Certificate Principal Name is the same on all certificates
− Outlook Anywhere won’t connect if the Principal Name on the certificate does not match the value configured in msstd: (default matches the OA RPC endpoint)
− Set-OutlookProvider EXPR -CertPrincipalName msstd:mail.contoso.com

Page 125: Exchange 2010 High Availability

Datacenter Switchover

Switchover Tasks

Page 126: Exchange 2010 High Availability

Datacenter Switchover Process
• Failure occurs
• Activation decision
• Terminate partially running primary datacenter
• Activate secondary datacenter
− Validate prerequisites
− Activate mailbox servers
− Activate other roles (in parallel with the previous step)
• Service is restored

Page 127: Exchange 2010 High Availability

Datacenter Switchovers

Primary to Standby
1. Primary site fails
2. Stop-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite <PSiteName> -ConfigurationOnly (run this in both datacenters)
3. Stop-Service clussvc
4. Restore-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite <SSiteName>
5. Databases mount (assuming no activation blocks)
6. Adjust DNS records for SMTP and HTTPS

Standby to Primary
1. Verify all services are working
2. Start-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite <PSiteName>
3. Set-DatabaseAvailabilityGroup <DAGName> -WitnessServer <ServerName> -WitnessDirectory <Directory>
4. Reseed data
5. Schedule downtime for dismount
6. Change DNS records back
7. Move-ActiveMailboxDatabase <DBName> -ActivateOnServer <ServerName>
8. Mount databases in primary datacenter

Page 128: Exchange 2010 High Availability

Datacenter Switchover Tasks

• Stop-DatabaseAvailabilityGroup
− Adds failed servers to the stopped list
− Removes servers from the started list
• Restore-DatabaseAvailabilityGroup
− Forces quorum
− Evicts stopped nodes
− Starts using the alternate file share witness if necessary
• Start-DatabaseAvailabilityGroup
− Removes servers from the stopped list
− Joins servers to the cluster
− Adds joined servers to the started list

Page 129: Exchange 2010 High Availability

Client Experiences
Typical Outlook Behavior
• All Outlook versions behave consistently in a single datacenter scenario
− Profile points to the RPC Client Access Server array
− Profile is unchanged by failovers or loss of CAS
• All Outlook versions should behave consistently in a datacenter switchover scenario
− Primary datacenter Client Access Server DNS name is bound to the IP address of the standby datacenter’s Client Access Server
− Autodiscover continues to hand out the primary datacenter CAS name as the Outlook RPC endpoint
− Profile remains unchanged

Page 130: Exchange 2010 High Availability

Client Experiences
Outlook – Cross-Site DB Failover Experience
• Behavior is to perform a direct connect from the CAS array in the first datacenter to the mailbox server hosting the active copy in the second datacenter
• You can only get a redirect to occur by changing the RPCClientAccessServer property on the database

Page 131: Exchange 2010 High Availability

Client Experiences
Other Clients
• Other client behavior varies based on protocol and scenario

Client            In-Site *Over Scenario   Out-of-Site *Over Scenario   Datacenter Switchover
OWA               Reconnect                Manual redirect              Reconnect
OA                Reconnect                Reconnect / Autodiscover     Reconnect
EAS               Reconnect                Redirect or proxy            Reconnect
POP/IMAP          Reconnect                Proxy                        Reconnect
EWS               Reconnect                Autodiscover                 Reconnect
Autodiscover      N/A                      Seamless                     Reconnect
SMTP/PowerShell   N/A                      N/A                          Reconnect

Page 132: Exchange 2010 High Availability

End of Exchange 2010 High Availability Module

Page 133: Exchange 2010 High Availability

For More Information

• Exchange Server Tech Center
http://technet.microsoft.com/en-us/exchange/default.aspx
• Planning services
http://technet.microsoft.com/en-us/library/cc261834.aspx
• Microsoft IT Showcase Webcasts
http://www.microsoft.com/howmicrosoftdoesitwebcasts
• Microsoft TechNet
http://www.microsoft.com/technet/itshowcase

Page 134: Exchange 2010 High Availability

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.