Disaster Recovery Sites on AWS: Minimal Cost, Maximum Efficiency

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Disaster Recovery Site on AWS: Minimal Cost, Maximum Efficiency

Ryan Holland, AWS

July 10, 2014

What You Will Learn

• Disaster Recovery and Business Continuity

• Why AWS for disaster recovery?

• Common DR architectures

• Backup and restore

• Pilot light

• Warm Standby

• Hot Standby

• Customer case study

• Where to go next

Disruptions to Business Continuity

Caused by outage of IT infrastructure

Affects businesses of all kinds and sizes

Can be very expensive

What causes downtime

• Natural disaster

• Security incident

• Equipment failure

• Human error

Conventional Disaster Recovery Sites

• High cost

• Low ROI

• Implemented only for most critical systems

• Usually scaled down to 50% of production

• Maintaining systems in a remote region is challenging

• Costly software licenses based on hardware usage

Disaster Recovery on AWS

• Unprecedented capabilities to implement DR sites

• Easily set up DR sites in different geographic regions

• Cut down DR site cost by up to 70%

• Substantial savings on software licenses

Global Reach from Your Desktop

Tools for Implementing DR on AWS

• Leverage tools like CloudFormation to automate deployment.

• Choose an AMI strategy that fits the RTO requirements.

• Cross-region EBS snapshot and AMI copy

• Cross-region read replicas for Amazon RDS for MySQL

• Amazon Route 53 and Auto Scaling

• EC2 reserved instances
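
As an illustration of the cross-region EBS snapshot copy listed above, here is a minimal sketch. It assumes a boto3 EC2 client (boto3 is not mentioned in the presentation) that was created for the DR region, since the copy request is issued against the destination region. The function name and the snapshot ID and regions in the usage comment are hypothetical.

```python
def copy_snapshot_to_dr_region(dest_ec2, source_region, snapshot_id,
                               description="DR copy"):
    """Copy an EBS snapshot into the region that dest_ec2 is bound to.

    Assumption: dest_ec2 is a boto3 EC2 client for the DR region.
    Returns the ID of the new snapshot in that region.
    """
    response = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=description,
    )
    return response["SnapshotId"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   dr_ec2 = boto3.client("ec2", region_name="us-west-1")
#   new_id = copy_snapshot_to_dr_region(dr_ec2, "us-east-1", "snap-...")
```

Passing the client in as an argument keeps the function easy to exercise with a stub during DR drills.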

AWS Storage Options

Simple Storage Service: Highly scalable object storage

• 1 byte to 5 TB in size

• 99.999999999% durability

Elastic Block Store: High performance block storage device

• 1 GB to 1 TB in size

• Mount as drives to instances with snapshot/cloning functionalities

Glacier: Long term object archive

• Extremely low cost per gigabyte

• 99.999999999% durability

Common DR architectures

Each architecture differs from the others in terms of RTO, RPO, and cost
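
To make the two objectives concrete: the recovery point objective (RPO) bounds how much recent data may be lost, and the recovery time objective (RTO) bounds how long recovery may take. The sketch below checks a periodic-backup plan against both. The function name and all numbers are illustrative assumptions, not figures from the presentation.

```python
from datetime import timedelta


def meets_objectives(backup_interval, restore_duration, rpo, rto):
    """Return (rpo_ok, rto_ok) for a periodic-backup DR plan.

    Assumption: data written since the last completed backup is lost,
    so the worst-case loss window equals the backup interval.
    """
    rpo_ok = backup_interval <= rpo
    rto_ok = restore_duration <= rto
    return rpo_ok, rto_ok


# Hypothetical plan: nightly backups and a 6-hour restore,
# checked against a 4-hour RPO and an 8-hour RTO.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    restore_duration=timedelta(hours=6),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(result)  # (False, True)
```

In this hypothetical case the restore is fast enough, but nightly backups cannot satisfy a 4-hour RPO, which is the kind of gap that pushes a design toward replication-based architectures.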

Simple to get started

• Easy starting point for exploring the AWS cloud

• Low technical barrier to entry

• Focus on incorporating cloud into your DR strategy, not on complex technical issues related to hot-hot systems

Lowest cost

• Very high levels of data durability at low price

• Cost of storing snapshots in Amazon S3

• Archiving possibilities beyond tape using Amazon Glacier

Backup & Restore Architecture

Back up and restore

Create instances from AMIs

Restore data from backups

Many Ways to Back Up

Backup & Restore Considerations

• Make sure you keep your AMIs current

• Use CloudFormation or other automation tools

• Consider EC2 light utilization reserved instances

• Test your DR plan frequently. Then test some more.
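
One way to act on "keep your AMIs current" is to flag images older than a chosen age. The sketch below is a minimal example under stated assumptions: the input is a list of dicts shaped like the "Images" entries returned by the EC2 DescribeImages API (each with "ImageId" and an ISO 8601 "CreationDate"), and the function name and age threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_images(images, max_age_days, now=None):
    """Return the ImageIds of AMIs older than max_age_days.

    Assumption: CreationDate looks like "2014-07-01T12:00:00.000Z".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for image in images:
        created = datetime.strptime(
            image["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ"
        ).replace(tzinfo=timezone.utc)
        if created < cutoff:
            stale.append(image["ImageId"])
    return stale
```

Running a check like this on a schedule gives an early warning that a restore would start from an outdated image.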

Build resources around replicated dataset

• Keep ‘pilot light’ on by replicating core databases

• Build AWS resources around dataset and leave in stopped state

Scale resources in AWS in response to a DR event

• Start up pool of resources in AWS when events dictate

• Scale up the database instance to handle production capacity

Pilot Light Architecture


Create instances from AMIs


Activating a Pilot Light DR Site

• Use CloudFormation and Auto Scaling to stage infrastructure.

• Keep your AMIs or bootstrapping scripts current.

• Leverage EC2 heavy utilization reserved instances for the database
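
The activation steps for a pilot light site (start the stopped resources, then scale up the database) can be sketched as follows. This assumes boto3 EC2 and RDS clients for the DR region and a database running on Amazon RDS; neither is stated in the presentation. The function name and all identifiers passed in are hypothetical.

```python
def activate_pilot_light(ec2, rds, instance_ids, db_identifier, db_class):
    """Start stopped instances and scale up the replicated database.

    Assumption: ec2 and rds are boto3 clients bound to the DR region.
    """
    # Start the pre-built application servers that were left stopped.
    ec2.start_instances(InstanceIds=instance_ids)
    # Resize the pilot-light database to production capacity.
    rds.modify_db_instance(
        DBInstanceIdentifier=db_identifier,
        DBInstanceClass=db_class,
        ApplyImmediately=True,
    )
```

If the pilot-light database is a read replica, it would also need to be promoted before taking writes; that step is left out of this sketch.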


Build an environment similar to production at a reduced scale

• Keep data and files synchronized between production and DR site by replication

• Use smaller and/or fewer instances than Production.

Scale resources in AWS in response to a DR event

• Scale out the environment by adding more instances

• Scale up the instances to handle production capacity

Warm Standby Architecture



Moving Warm Standby to Production

• Use CloudFormation and Auto Scaling to resize infrastructure.

• Leverage EC2 heavy utilization reserved instances for the database and the warm standby instances.
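
Resizing a warm standby with Auto Scaling can be sketched as a single group update. The example assumes a boto3 Auto Scaling client for the DR region (not part of the presentation); the function name, group name, and sizes are hypothetical.

```python
def scale_to_production(autoscaling, group_name, min_size, max_size, desired):
    """Resize an Auto Scaling group from standby to production capacity.

    Assumption: autoscaling is a boto3 "autoscaling" client for the DR region.
    """
    # Reject inconsistent sizes locally before calling the service.
    if not min_size <= desired <= max_size:
        raise ValueError("desired capacity must lie between min and max size")
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=min_size,
        MaxSize=max_size,
        DesiredCapacity=desired,
    )
```

Raising the minimum size along with the desired capacity keeps the group from scaling back down to standby levels while it is serving production traffic.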


Build DR site as mirror image of Production

• Keep all data and files synchronized between production and DR site, by synchronous replication if possible

• Pick the size and number of instances based on the acceptable level of performance, without any change in case of a DR event

• Use Reserved Instances (RIs) for capacity reservation and cost savings

Multi-site Architecture

Load balance between production and DR

• If latency and error propagation risk between production and DR sites are acceptable

If DR site is isolated, then switch over to AWS

• Make necessary DNS changes to redirect traffic to the DR site on AWS
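
The DNS change can be prepared ahead of time as a pair of Route 53 failover records, so traffic moves to the DR site when the primary's health check fails. The sketch below only builds the change batch; it assumes CNAME records and an existing health check, and the function name, hostnames, and IDs in the usage comment are hypothetical.

```python
def failover_change_batch(name, primary_target, dr_target,
                          health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with PRIMARY and SECONDARY records."""

    def record(set_id, role, target, extra=None):
        rrset = {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": set_id,
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        }
        rrset.update(extra or {})
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {"Changes": [
        # The primary record is served only while its health check passes.
        record("primary", "PRIMARY", primary_target,
               {"HealthCheckId": health_check_id}),
        # The secondary record points at the DR site on AWS.
        record("dr", "SECONDARY", dr_target),
    ]}


# Hypothetical usage with a boto3 Route 53 client:
#   route53.change_resource_record_sets(
#       HostedZoneId="Z...", ChangeBatch=failover_change_batch(...))
```

A short TTL is used here so that resolvers pick up the switch quickly; the right value depends on how much DNS query volume is acceptable.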


A DR site on AWS can serve:

• A primary site in a customer data center

• A primary site on AWS itself

Primary and DR Sites on AWS

What enabled this?

• Eight isolated S3 regions

• AWS CloudFormation allows quick bootstrap of another region.

• Route 53 latency-based routing and failover

[Figure: DNS Failover, showing a user in San Francisco and the regions us-west-1 (Northern California), us-east-1 (Northern Virginia), and eu-west-1 (Ireland).]

What didn’t go wrong

• Official NYC evacuation map stayed up

• USA TODAY Weather map stayed up

• Thousands of other maps used for weather reporting, data visualization and coordination around the event all stayed up


Thank you!
