Page 1
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Disaster Recovery Site on AWS:
Minimal Cost Maximum Efficiency
Abdul Sathar Sait, Vikram Garlapati, and Kamal Arora (AWS)
November 15, 2013
Page 2
What you will learn
• Why AWS for disaster recovery?
• Common DR architectures
– Pilot light architecture
   • Demo
   • Code walkthrough
– Backup and restore
• Customer case studies
• Where to go next
Page 3
Conventional Disaster Recovery sites
• High cost
• Low ROI
• Implemented only for the most critical systems
• Usually scaled down to 50% of production
• Managing systems in a remote region is challenging
• Costly software licenses based on hardware usage
Page 4
Disaster Recovery site on AWS
• Unprecedented capabilities to implement DR sites
• Easily set up DR sites in different geographic regions
• Cut down DR site cost by up to 70%
• Substantial savings on software licenses
Page 5
Global reach from your desktop
Page 6
Common DR architectures
• Backup and restore
• Pilot light
• Warm standby
• Hot standby
Page 7
Pilot light architecture
Page 8
Pilot light architecture
Create instances from AMIs
Page 9
Pilot light architecture
Build resources around replicated dataset
• Keep ‘pilot light’ on by replicating core databases
• Build AWS resources around dataset and leave in stopped state
Page 10
Pilot light architecture
Build resources around replicated dataset
• Keep ‘pilot light’ on by replicating core databases
• Build AWS resources around dataset and leave in stopped state
Scale resources in AWS in response to a DR event
• Start up pool of resources in AWS when events dictate
• Scale up the database instance to handle production capacity
Page 11
Pilot light architecture
Switchover to AWS
• Make necessary DNS changes to redirect traffic to the DR site on AWS
Page 13
Simple DR solution – awsdrdemo.com
[Diagram: Amazon Route 53 in an active/passive configuration. US East (N. Virginia), active: Elastic Load Balancing, web/app servers in an Auto Scaling group, data volume, Oracle master DB. US West (N. California), scaled-down standby: Oracle slave DB kept current by data replication, plus a copy of the web/app server AMI.]
Page 14
Simple DR solution – awsdrdemo.com
[Diagram, after failover: US East (N. Virginia) is gone and US West (N. California) is active, with Amazon Route 53, Elastic Load Balancing, web/app servers in an Auto Scaling group, data volume, and the Oracle master and slave DB labels. Labeled actions: DNS failover, autoscale, scale up DB.]
Page 15
Architecture
Amazon Route 53 (active/passive): awsdrdemo.com, failover.awsdrdemo.com
Active mirroring / replication between the primary and secondary data volumes
Webserver AMI: AMI copy (ami-996634f0)
US East (N. Virginia), primary
• VPC ID: vpc-5f9ef53e
• Subnet IDs: subnet-440c786c, subnet-289ef549, subnet-2c9ef54d
• Active ELB: DRDemoPrimaryELB-52152634.us-east-1.elb.amazonaws.com
• Web servers: i-36af5751
• Primary database server: i-026aad65, private IP 174.168.1.11
US West (N. California), scaled-down standby
• VPC ID: vpc-a4f2efcc
• Subnet IDs: subnet-bbf2efd3, subnet-884b01ce, subnet-bef2efd6
• DR ELB: created on failover
• Web servers: created on failover
• Secondary database server: i-3b266960, private IP 174.168.1.11
• Failover App instance: i-55cfde0e, Elastic IP 54.215.157.25
Page 16
console.aws.amazon.com
Demo – AWS Resources
Page 17
awsdrdemo.com
Demo – Application
Page 18
failover.awsdrdemo.com
Demo – Failover Kickoff
Page 19
status.awsdrdemo.com/dr
Demo – Failover Status Updates
Page 20
Failover Steps
• Launch Failover Application
• AWS CloudFormation – Launch web servers
• Resize Target Database Instance
• Route 53 DNS Updates
• AWS CloudFormation – Launch ELB
• Go Live
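The failover steps lend themselves to a small driver script that runs each step in turn and stops at the first failure. The following is a sketch under assumptions: `run_steps` and the step function names are placeholders, each step would wrap the corresponding AWS CLI calls, and the ordering is illustrative rather than the presenters' actual script.

```shell
#!/usr/bin/env bash
# run_steps STEP...: call each named shell function in order and
# stop at the first one that returns a non-zero status.
run_steps() {
  local step
  for step in "$@"; do
    echo "START ${step}"
    if ! "${step}"; then
      echo "FAILED ${step}"
      return 1
    fi
    echo "DONE ${step}"
  done
}

# Placeholder steps (assumptions); replace the bodies with real commands.
resize_database() { :; }
launch_dr_stack() { :; }
update_dns()      { :; }
```

A run would then look like `run_steps resize_database launch_dr_stack update_dns`. The START/DONE/FAILED lines are the natural place to publish status messages for the failover status page.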
Page 21
Failover Application Architecture
Components (within the AWS Region): Admin, Users, Failover App, CLI, Webserver AMI, CloudFormation, SNS HTTP Notification
(1) Trigger DR procedure
(2) Invoke Shell Script
(3) Launch CloudFormation
(4) Script Updates
(5) CF Updates
(6) Real-time feed from SNS
Page 22
Metadata Requests
// Sample code for an instance metadata request using the .NET HttpWebRequest API
using System.IO;
using System.Net;
using System.Text;

string uri = "http://169.254.169.254/latest/meta-data/placement/availability-zone";
// Create the web request and fetch the response
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse();
Encoding enc = Encoding.GetEncoding(1252);
StreamReader loResponseStream = new StreamReader(webresponse.GetResponseStream(), enc);
// Get the Availability Zone value
string availzone = loResponseStream.ReadToEnd();
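The Availability Zone returned by this request can also be used to derive the region, instead of hard-coding `--region` in the CLI scripts. A minimal shell sketch follows. It assumes the instance allows unauthenticated (IMDSv1-style) metadata requests and that the zone name is the region name plus one trailing letter. The function names `current_az` and `region_from_az` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask the instance metadata service for the
# Availability Zone. Only works from inside an EC2 instance.
current_az() {
  curl -s --max-time 2 \
    http://169.254.169.254/latest/meta-data/placement/availability-zone
}

# Hypothetical helper: strip the trailing zone letter,
# for example us-west-1a becomes us-west-1.
region_from_az() {
  local az="$1"
  printf '%s\n' "${az%[a-z]}"
}
```

On an instance this could be combined as `region=$(region_from_az "$(current_az)")`.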
Page 23
Amazon Route 53 Updates
# Retrieve existing ELB details from the Route 53 hosted zone
domainname="www.awsdrdemo.com"
hostedzoneid="ZXXXXXXXXXXXXR"
# Retrieve the ELB alias hosted zone ID from the existing Route 53 record
zoneid=$(aws --region us-west-1 --output text route53 list-resource-record-sets \
  --hosted-zone-id "$hostedzoneid" --start-record-name "$domainname" \
  --start-record-type A --max-items 1 \
  --query 'ResourceRecordSets[0].AliasTarget.HostedZoneId')
# Retrieve the ELB alias DNS name from the same record
dns=$(aws --region us-west-1 --output text route53 list-resource-record-sets \
  --hosted-zone-id "$hostedzoneid" --start-record-name "$domainname" \
  --start-record-type A --max-items 1 \
  --query 'ResourceRecordSets[0].AliasTarget.DNSName')
# Apply the DNS change described in route53.json
aws --region us-west-1 route53 change-resource-record-sets --hosted-zone-id "$hostedzoneid" \
  --change-batch file:///usr/local/bin/route53.json
http://vrg.s3.amazonaws.com/downloads/route53.json
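The contents of `route53.json` are not shown on the slide. As an illustration only, a change batch that repoints an alias A record at a load balancer could be generated as follows. The function name `make_change_batch`, the `UPSERT` action, the comment text, and the `EvaluateTargetHealth` setting are assumptions, not taken from the presenters' file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Route 53 change batch that points an
# alias A record at a load balancer.
#   $1 record name, $2 load balancer hosted zone ID, $3 load balancer DNS name
make_change_batch() {
  local record_name="$1" elb_zone_id="$2" elb_dns="$3"
  cat <<EOF
{
  "Comment": "Redirect traffic to the DR site",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${record_name}",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "${elb_zone_id}",
          "DNSName": "${elb_dns}",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
}
```

The output can be written to a file and passed to `change-resource-record-sets` with `--change-batch file://...`. The zone ID and DNS name passed in would be those of the DR load balancer, not the failed primary.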
Page 24
Resize Database Instance
# Stop the DB instance so that it can be resized
aws --region us-west-1 ec2 stop-instances --instance-ids "$dbInstanceId"
# Wait until the instance is fully stopped; the instance type can only be changed while it is stopped
aws --region us-west-1 ec2 wait instance-stopped --instance-ids "$dbInstanceId"
# Publish an Amazon SNS message for the action
aws --region us-west-1 sns publish --topic-arn "$snsarn" --message "Resizing the stopped instance"
# Resize the DB instance
aws --region us-west-1 ec2 modify-instance-attribute --instance-id "$dbInstanceId" \
  --instance-type "{\"Value\": \"m1.small\"}"
# Start the resized DB instance
aws --region us-west-1 ec2 start-instances --instance-ids "$dbInstanceId"
Page 25
AWS CloudFormation Stack Launch
# Launch the DR stack using the AWS CloudFormation template
launchedstackid=$(aws --region us-west-1 --output text cloudformation create-stack \
  --stack-name "$stackname" \
  --template-body file:///usr/local/bin/ELBWithEC2Instances.template \
  --notification-arns "$snsarn" \
  --parameters ParameterKey=HostedZoneId,ParameterValue="$hostedzoneid")
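`create-stack` returns as soon as the request is accepted, so a failover script typically has to wait for the stack to finish before going live. A minimal polling sketch follows, assuming the status is read with `describe-stacks`. `wait_for_status` and `stack_status` are hypothetical helper names, and the default attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# wait_for_status STATUS_CMD EXPECTED [ATTEMPTS] [DELAY_SECONDS]
# Polls STATUS_CMD (a command or function that prints a status) until it
# prints EXPECTED. Gives up early on a FAILED or ROLLBACK status.
wait_for_status() {
  local status_cmd="$1" expected="$2" attempts="${3:-60}" delay="${4:-10}"
  local i=0 status
  while [ "$i" -lt "$attempts" ]; do
    status="$("$status_cmd")"
    if [ "$status" = "$expected" ]; then
      return 0
    fi
    case "$status" in
      *FAILED*|*ROLLBACK*)
        echo "stack entered status $status" >&2
        return 1
        ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $expected" >&2
  return 1
}

# Hypothetical status command; assumes $stackname is set as in the launch script.
stack_status() {
  aws --region us-west-1 --output text cloudformation describe-stacks \
    --stack-name "$stackname" --query 'Stacks[0].StackStatus'
}
```

In the failover script this would be called as `wait_for_status stack_status CREATE_COMPLETE`. Taking the status command as a parameter keeps the loop testable without AWS access.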
Page 26
AWS CloudFormation Template
{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Description" : "AWS CloudFormation Template ELBWithEC2Instances: Create a load balanced, Auto Scaled sample website where the instances are locked down to only accept traffic from the load balancer. This script creates an Auto Scaling group behind a load balancer with a simple health check. The web site is available on port 80, however, the instances can be configured to listen on any port (80 by default).",
  "Parameters" : {
    "KeyPairName" : {
      "Description" : "Name of an existing Amazon EC2 key pair for SSH access",
      "Type" : "String",
      "Default" : "kamalkeydr"
    },
    "InstanceType" : {
      "Description" : "WebServer EC2 instance type",
      "Type" : "String",
      "Default" : "m1.small",
      "AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge","cc1.4xlarge","cc2.8xlarge","cg1.4xlarge"],
      "ConstraintDescription" : "must be a valid EC2 instance type."
    },
    "WebServerPort" : {
      "Description" : "TCP/IP port of the web server",
      "Type" : "String",
      "Default" : "80"
    },
    "HostedZoneId" : {
      "Type" : "String",
      "Description" : "The Record Set's Hosted Zone Id for the existing hosted zone",
      "Default" : "Z1M58G0W56PQJA"
    }
  },
  "Mappings" : {
    "AWSInstanceType2Arch" : {
      "t1.micro" : { "Arch" : "64" },
      "m1.small" : { "Arch" : "64" },
      "m1.medium" : { "Arch" : "64" },
      "m1.large" : { "Arch" : "64" },
      "m1.xlarge" : { "Arch" : "64" },
      "m2.xlarge" : { "Arch" : "64" },
      "m2.2xlarge" : { "Arch" : "64" },
      "m2.4xlarge" : { "Arch" : "64" },
      "c1.medium" : { "Arch" : "64" },
      "c1.xlarge" : { "Arch" : "64" }
    },
    "AWSRegionArch2AMI" : {
      "us-west-1" : { "32" : "ami-5e41761b", "64" : "ami-5e41761b" }
    }
  },
  "Resources" : {
    "WebServerGroup" : {
      "Type" : "AWS::AutoScaling::AutoScalingGroup",
      "Properties" : {
        "AvailabilityZones" : [ "us-west-1a"],
        "LaunchConfigurationName" : { "Ref" : "LaunchConfig" },
        "MinSize" : "2",
        "MaxSize" : "2",
        "LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],
        "VPCZoneIdentifier" : ["subnet-bbf2efd3"]
      }
    },
    "LaunchConfig" : {
      "Type" : "AWS::AutoScaling::LaunchConfiguration",
      "Properties" : {
        "ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" },
          { "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" }, "Arch" ] } ] },
        "UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},
        "SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],
        "InstanceType" : { "Ref" : "InstanceType" },
        "KeyName" : { "Ref" : "KeyPairName" },
        "AssociatePublicIpAddress" : "true"
      }
    },
    "ElasticLoadBalancer" : {
      "Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
      "Properties" : {
        "SecurityGroups" : [ { "Ref" : "LoadBalancerSecurityGroup" } ],
        "Subnets" : ["subnet-bbf2efd3"],
        "Listeners" : [ {
          "LoadBalancerPort" : "80",
          "InstancePort" : { "Ref" : "WebServerPort" },
          "Protocol" : "HTTP"
        } ],
        "HealthCheck" : {
          "Target" : { "Fn::Join" : [ "", ["HTTP:", { "Ref" : "WebServerPort" }, "/"]]},
          "HealthyThreshold" : "2",
          "UnhealthyThreshold" : "10",
          "Interval" : "10",
          "Timeout" : "3"
        }
      }
    },
    "LoadBalancerSecurityGroup" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Enable HTTP access on port 80",
        "VpcId" : "vpc-a4f2efcc",
        "SecurityGroupIngress" : [ {
          "IpProtocol" : "tcp",
          "FromPort" : "80",
          "ToPort" : "80",
          "CidrIp" : "0.0.0.0/0"
        } ],
        "SecurityGroupEgress" : [ {
          "IpProtocol" : "tcp",
          "FromPort" : { "Ref" : "WebServerPort" },
          "ToPort" : { "Ref" : "WebServerPort" },
          "CidrIp" : "0.0.0.0/0"
        } ]
      }
    },
    "myDNS" : {
      "Type" : "AWS::Route53::RecordSetGroup",
      "Properties" : {
        "HostedZoneName" : "awsdrdemo.com.",
        "Comment" : "Zone apex alias targeted to myELB LoadBalancer.",
        "RecordSets" : [
          {
            "Name" : "www.awsdrdemo.com.",
            "Type" : "A",
            "AliasTarget" : {
              "HostedZoneId" : { "Fn::GetAtt" : ["ElasticLoadBalancer", "CanonicalHostedZoneNameID"] },
              "DNSName" : { "Fn::GetAtt" : ["ElasticLoadBalancer","CanonicalHostedZoneName"] }
            }
          }
        ]
      }
    },
    "InstanceSecurityGroup" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Enable HTTP access on the inbound port",
        "VpcId" : "vpc-a4f2efcc",
        "SecurityGroupIngress" : [ {
          "IpProtocol" : "tcp",
          "FromPort" : { "Ref" : "WebServerPort" },
          "ToPort" : { "Ref" : "WebServerPort" },
          "CidrIp" : "0.0.0.0/0"
        } ]
      }
    }
  },
  "Outputs" : {
    "URL" : {
      "Description" : "URL of the website",
      "Value" : { "Fn::Join" : [ "", [ "http://", { "Fn::GetAtt" : [ "ElasticLoadBalancer", "DNSName" ]}]]}
    }
  }
}
Template sections: Headers, Parameters, Mappings, Resources, Outputs
http://vrg.s3.amazonaws.com/downloads/ELBWithEC2Instances.template
Page 27
Parameters
"Parameters" : {
  "KeyPairName" : {
    "Description" : "Name of an existing Amazon EC2 key pair for SSH access",
    "Type" : "String"
  },
  "InstanceType" : {
    "Description" : "WebServer EC2 instance type",
    "Type" : "String",
    "Default" : "m1.small",
    "AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge","cc1.4xlarge","cc2.8xlarge","cg1.4xlarge"],
    "ConstraintDescription" : "must be a valid EC2 instance type."
  },
  "HostedZoneId" : {
    "Type" : "String",
    "Description" : "The Record Set's Hosted Zone Id for the existing hosted zone"
  }
}
Page 28
Resources – Web Servers
"WebServerGroup" : {
  "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    "AvailabilityZones" : [ "us-west-1a"],
    "LaunchConfigurationName" : { "Ref" : "LaunchConfig" },
    "MinSize" : "2",
    "MaxSize" : "2",
    "LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],
    "VPCZoneIdentifier" : ["subnet-bbf2efd3"]
  }
},
"LaunchConfig" : {
  "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" },
      { "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" }, "Arch" ] } ] },
    "UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},
    "SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],
    "KeyName" : { "Ref" : "KeyPairName" }
  }
}
Page 30
A disaster recovery site on AWS can serve
• A primary site in a customer data center
• A primary site on AWS itself
Page 31
Primary and DR sites on AWS
Page 32
Backup & Restore pattern
Simple to get started
Easy starting point for exploring the AWS cloud
Low technical barrier to entry
Focus on incorporating cloud into your DR strategy, not on complex technical issues related to hot-hot systems
Cost-effective
Very high levels of data durability at low price
Cost of storing snapshots in Amazon S3
Archiving possibilities beyond tape using Amazon Glacier
Page 33
Backup and restore
Page 34
Backup and restore
Page 35
Backup and restore
• Create instances from AMIs
• Restore data from backups
Page 36
Many ways to back up
Page 39
Customer case study