Disaster Recovery Sites on AWS: Minimal Cost, Maximum Efficiency

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Disaster Recovery Site on AWS: Minimal Cost, Maximum Efficiency

Abdul Sathar Sait, AWS

March 26, 2014

What You Will Learn

• Disaster Recovery and Business Continuity

• Why AWS for disaster recovery?

• Common DR architectures

• Backup and Restore

• Pilot Light

• Warm Standby

• Hot Standby

• Customer case study

• Where to go next

Disruptions to Business Continuity

• Caused by outages of IT infrastructure

• Affect businesses of all kinds and sizes

• Can be very expensive

What Causes Downtime

• Natural disasters

• Security incidents

• Equipment failure

• Human error

Conventional Disaster Recovery Sites

• High cost

• Low ROI

• Implemented only for most critical systems

• Usually scaled down to 50% of production

• Hosting systems in a remote region is challenging

• Costly software licenses based on hardware usage

Disaster Recovery on AWS

• Unprecedented capabilities to implement DR sites

• Easily set up DR sites in different geographic regions

• Cut down DR site cost by up to 70%

• Substantial savings on software licenses

Global Reach from Your Desktop

Common DR architectures

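Each architecture differs from the others in terms of RTO (recovery time objective: how quickly service must be restored), RPO (recovery point objective: how much recent data may be lost), and cost.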
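To make RTO and RPO concrete, here is a minimal Python sketch that estimates both for a recovery plan. It assumes a simplified model that is not from the talk: the worst-case RPO is the backup interval plus the time needed to copy a backup to the DR site, and the RTO is the sum of recovery steps that run one after another. `RecoveryPlan` and `meets_objectives` are hypothetical names, and the timings are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hypothetical timings, in minutes, for one DR architecture."""

    backup_interval: float  # time between successive backups or replication points
    replication_lag: float  # time for a backup to reach the DR site
    detect: float           # time to detect the outage and declare a disaster
    provision: float        # time to launch or scale resources at the DR site
    restore: float          # time to restore data onto those resources
    dns_cutover: float      # time for DNS changes to take effect (roughly the TTL)

    def worst_case_rpo(self) -> float:
        # A failure just before the next backup lands loses a full interval of
        # writes, plus whatever had not yet been copied to the DR site.
        return self.backup_interval + self.replication_lag

    def estimated_rto(self) -> float:
        # The recovery steps run sequentially, so their durations add up.
        return self.detect + self.provision + self.restore + self.dns_cutover


def meets_objectives(plan: RecoveryPlan, rto_target: float, rpo_target: float) -> bool:
    """True if the plan's estimated RTO and worst-case RPO are within the targets."""
    return plan.estimated_rto() <= rto_target and plan.worst_case_rpo() <= rpo_target


# Illustrative numbers for a nightly backup-and-restore plan.
nightly = RecoveryPlan(backup_interval=24 * 60, replication_lag=60,
                       detect=15, provision=30, restore=120, dns_cutover=5)
print(nightly.worst_case_rpo(), nightly.estimated_rto())  # 1500 170
```

Under this model, continuous replication (pilot light and beyond) shrinks `backup_interval` toward zero, which is what lowers RPO, while keeping resources running shrinks `provision` and `restore`, which is what lowers RTO.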

AWS Storage Options

Amazon Simple Storage Service (S3): highly scalable object storage

• Objects from 1 byte to 5 TB in size

• 99.999999999% durability

Amazon Elastic Block Store (EBS): high-performance block storage

• Volumes from 1 GB to 1 TB in size

• Mounted as drives on instances, with snapshot and cloning functionality

Amazon Glacier: long-term object archive

• Extremely low cost per gigabyte

• 99.999999999% durability
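The 99.999999999% ("eleven nines") durability figure is easier to reason about as an expected loss rate. This is a minimal sketch, assuming the figure is an annual, per-object durability (which is how AWS phrases it for S3) and that losses are independent. The function names are hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def expected_annual_loss(num_objects: int, nines: int = 11) -> Fraction:
    """Expected number of objects lost per year.

    Assumes each object independently has an annual loss probability of
    10**-nines; 11 nines corresponds to 99.999999999% durability.
    """
    per_object_loss = Fraction(1, 10 ** nines)
    return num_objects * per_object_loss


def years_between_losses(num_objects: int, nines: int = 11) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(num_objects, nines)


print(years_between_losses(10_000_000))  # 10000
```

With 10 million stored objects, this model predicts about one lost object every 10,000 years on average.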

Backup and Restore

Simple to get started

• Easy starting point for exploring the AWS cloud

• Low technical barrier to entry

• Focus on incorporating the cloud into your DR strategy, not on the complex technical issues of hot-hot systems

Lowest cost

• Very high levels of data durability at a low price

• The main cost is storing snapshots in Amazon S3

• Archiving possibilities beyond tape using Amazon Glacier
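Keeping recent backups in S3 and moving older ones to Glacier can be automated with an S3 lifecycle rule. This is a minimal sketch, assuming backups are written under a `backups/` prefix. The function name, rule ID, bucket name, and the 30- and 365-day values are hypothetical. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` expects.

```python
from typing import Optional


def glacier_archive_lifecycle(rule_id: str, prefix: str, transition_days: int,
                              expire_days: Optional[int] = None) -> dict:
    """Build an S3 lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class after `transition_days` days and, optionally,
    deletes them after `expire_days` days."""
    if expire_days is not None and expire_days <= transition_days:
        # Sanity check: expiring on or before the transition day skips the archive.
        raise ValueError("expire_days must be greater than transition_days")
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": transition_days, "StorageClass": "GLACIER"}],
    }
    if expire_days is not None:
        rule["Expiration"] = {"Days": expire_days}
    return {"Rules": [rule]}


# Applying it needs boto3 and AWS credentials; the bucket name is hypothetical:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-dr-backups",
#     LifecycleConfiguration=glacier_archive_lifecycle("archive-backups", "backups/", 30, 365),
# )
```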

Backup & Restore Architecture

Back up and restore

Create instances from AMIs

Restore data from backups
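Creating instances from AMIs and restoring data from backups assumes the backups already exist in the DR region. This is a minimal sketch of that step using boto3-style calls. `copy_snapshot` and `run_instances` are real EC2 API operations, but the helper names are hypothetical. The client is passed in, so the logic can be exercised without AWS. The client must be created for the DR (destination) region, because EC2 copies a snapshot into the region that receives the request.

```python
def replicate_snapshot(dr_ec2, source_region: str, snapshot_id: str) -> str:
    """Copy an EBS snapshot from `source_region` into the DR region.

    `dr_ec2` is an EC2 client for the destination region, for example
    boto3.client("ec2", region_name="us-west-1").
    """
    response = dr_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]  # ID of the new snapshot in the DR region


def launch_from_ami(dr_ec2, ami_id: str, instance_type: str) -> str:
    """Launch one instance in the DR region from a pre-built AMI."""
    response = dr_ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]
```

AMIs are regional too, so the AMI must also exist in the DR region. During recovery, you would then create a volume from the copied snapshot (`create_volume` with `SnapshotId`) and attach it to the new instance. Those steps are omitted here for brevity.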

Many Ways to Back Up


Pilot Light Architecture

Build resources around a replicated dataset

• Keep the ‘pilot light’ on by replicating core databases

• Build AWS resources around the dataset and leave them in a stopped state

Scale resources in AWS in response to a DR event

• Start up a pool of resources in AWS when events dictate

• Scale up the database instance to handle production capacity
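Turning up the pilot light can be scripted. This is a minimal sketch, assuming the core database runs on Amazon RDS in the DR region and the application servers are pre-built EC2 instances left in the stopped state. `start_instances` and `modify_db_instance` are real API operations. The helper name, instance IDs, and DB instance class are hypothetical.

```python
def ignite_pilot_light(ec2, rds, instance_ids, db_instance_id: str,
                       production_db_class: str) -> list:
    """Start the stopped DR instances and scale the database up.

    Assumes `ec2` and `rds` are boto3 clients for the DR region and that
    the core database is an Amazon RDS instance.
    """
    # Start the pool of pre-built, stopped instances around the replicated dataset.
    started = ec2.start_instances(InstanceIds=list(instance_ids))
    # Resize the database to production capacity now, instead of waiting
    # for the next maintenance window.
    rds.modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        DBInstanceClass=production_db_class,
        ApplyImmediately=True,
    )
    return [s["InstanceId"] for s in started["StartingInstances"]]
```

If the replicated database is an RDS read replica, it also has to be promoted (`promote_read_replica`) before it accepts writes. That step is not shown here.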


Switch over to AWS

• Make the necessary DNS changes to redirect traffic to the DR site on AWS
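The DNS change for a switchover can be a single Route 53 UPSERT. This is a minimal sketch, assuming the application is reached through a CNAME such as `app.example.com` (a CNAME cannot sit at the zone apex) and that a low TTL is acceptable so resolvers pick up the change quickly. `change_resource_record_sets` is the real Route 53 operation. The helper names, record names, and hosted zone ID are hypothetical.

```python
def switchover_change_batch(record_name: str, dr_endpoint: str, ttl: int = 60) -> dict:
    """Route 53 change batch that points `record_name` at the DR endpoint."""
    return {
        "Comment": f"DR switchover for {record_name}",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite the existing one
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": dr_endpoint}],
                },
            }
        ],
    }


def switch_over(route53, hosted_zone_id: str, record_name: str, dr_endpoint: str) -> str:
    """Submit the change and return the Route 53 change ID."""
    response = route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=switchover_change_batch(record_name, dr_endpoint),
    )
    return response["ChangeInfo"]["Id"]
```

The returned ID can be polled with `get_change` until its status is `INSYNC`. Clients may still use the old answer until the previous TTL expires.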



Create instances from AMIs


Warm Standby

Build an environment similar to production at a reduced scale

• Keep data and files synchronized between production and the DR site through replication

• Use fewer and smaller instances than in production

• Use RIs (Reserved Instances) for capacity reservation and cost savings

Scale resources in AWS in response to a DR event

• Scale out the environment by adding more instances

• Scale up the instances to handle production capacity
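Scaling out the warm standby can be a single capacity change. This is a minimal sketch, assuming the reduced-scale DR tier runs in an Auto Scaling group. `update_auto_scaling_group` is the real Auto Scaling operation. The helper name, group name, and capacities are hypothetical.

```python
from typing import Optional


def scale_out_for_dr(autoscaling, group_name: str, desired: int,
                     max_size: Optional[int] = None) -> None:
    """Grow a warm-standby Auto Scaling group to production capacity.

    `autoscaling` is a boto3 Auto Scaling client for the DR region.
    """
    max_size = desired if max_size is None else max_size
    if max_size < desired:
        raise ValueError("max_size must be at least the desired capacity")
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=desired,  # keep at least production capacity running
        MaxSize=max_size,
        DesiredCapacity=desired,
    )
```

Scaling up, meaning moving to larger instance types, is a separate step. For example, you could point the group at a launch configuration with a bigger instance type, or stop, resize, and restart individual instances. It is not shown here.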

Warm Standby Architecture

Switch over to AWS

• Make the necessary DNS changes to redirect traffic to the DR site on AWS




Hot Standby

Build the DR site as a mirror image of production

• Keep all data and files synchronized between production and the DR site, using synchronous replication if possible

• Choose the size and number of instances for an acceptable level of performance, so that nothing needs to change in a DR event

• Use RIs (Reserved Instances) for capacity reservation and cost savings

Multi-site Architecture

Load balance between production and DR if the latency and error-propagation risk between the two sites are acceptable
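One way to load balance between production and DR at the DNS level is Route 53 weighted records, where each site gets its own record under the same name. This is a minimal sketch. The helper name, endpoints, and weights are hypothetical. `SetIdentifier` and `Weight` are real Route 53 record fields, and weights must be between 0 and 255.

```python
def weighted_change_batch(record_name: str, targets: dict, ttl: int = 60) -> dict:
    """Build weighted CNAME records.

    `targets` maps a set identifier (e.g. "production", "dr") to a tuple
    of (endpoint, weight).
    """
    changes = []
    for set_id, (endpoint, weight) in targets.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"Route 53 weights must be between 0 and 255, got {weight}")
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes records that share a name
                "Weight": weight,         # share of answers relative to the total weight
                "TTL": ttl,
                "ResourceRecords": [{"Value": endpoint}],
            },
        })
    return {"Changes": changes}
```

The batch is submitted with `change_resource_record_sets`. Setting a site's weight to 0 drains traffic from it without deleting its record.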


If the DR site is isolated, switch over to AWS

• Make the necessary DNS changes to redirect traffic to the DR site on AWS


A DR site on AWS can protect either:

• A primary site in the customer's data center

• A primary site that is itself on AWS

Primary and DR Sites on AWS

What enabled this?

• Eight isolated S3 regions

• AWS CloudFormation allows quick bootstrapping of another region

• Amazon Route 53 latency-based routing and failover
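Bootstrapping another region with CloudFormation amounts to launching the same template through a client bound to that region. This is a minimal sketch. `create_stack` is the real CloudFormation operation. The helper name, stack name, template URL, and parameter names are hypothetical.

```python
def bootstrap_region(cfn, stack_name: str, template_url: str, parameters: dict) -> str:
    """Create a stack from a shared template in the region `cfn` is bound to.

    `cfn` is a boto3 CloudFormation client, for example
    boto3.client("cloudformation", region_name="eu-west-1").
    """
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        # CloudFormation expects every parameter value as a string.
        Parameters=[{"ParameterKey": k, "ParameterValue": str(v)}
                    for k, v in parameters.items()],
    )
    return response["StackId"]
```

To block until the new region is ready, you can use the client's `stack_create_complete` waiter (`cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)`).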

DNS Failover

[Figure: a user in San Francisco and sites in eu-west-1 (Ireland), us-east-1 (Northern Virginia), and us-west-1 (Northern California)]
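Route 53 DNS failover uses a pair of failover records under one name. The PRIMARY record carries a health check, and Route 53 answers with the SECONDARY record only when that check fails. This is a minimal sketch, assuming the health check already exists (for example, created with `create_health_check`). The helper name, endpoints, and health check ID are hypothetical.

```python
from typing import Optional


def failover_change_batch(record_name: str, primary: str, secondary: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """Build PRIMARY/SECONDARY failover CNAME records for Route 53."""

    def record(role: str, endpoint: str, health_check_id: Optional[str] = None) -> dict:
        rrs = {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": role.lower(),
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": endpoint}],
        }
        if health_check_id is not None:
            # Route 53 answers with the primary only while this health check passes.
            rrs["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrs}

    return {"Changes": [record("PRIMARY", primary, primary_health_check_id),
                        record("SECONDARY", secondary)]}
```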

What didn’t go wrong

• Official NYC evacuation map stayed up

• USA TODAY Weather map stayed up

• Thousands of other maps used for weather reporting, data visualization, and coordination around the event all stayed up


Thank you!
