How to Design for High Availability & Scale with AWS

Blazeclan 1

Agenda

• Introduction
  • High Availability
  • Scalability
  • Fault Tolerance
• AWS Global Infrastructure
• Key Design Concepts
  • Design for Failure
  • Scaling
  • Self Healing / Fault Tolerant
  • Multiple AZ Architecture
  • Loose Coupling
• Sample Architectures

Cloud IT Better

Introduction

How Often Do You See This?

Cost of Downtime

A report published in 2010 on the top 412 eCommerce sites says:
• The median length of downtime was 840 minutes
• On average, each of them saw 3291 minutes of downtime

Lost Revenue

• On average, each of them lost $800,099 in revenue due to downtime
• The total amount of revenue lost due to downtime across all 412 companies was $329,640,928!

Online Business & Downtime Facts

The Average Hourly Loss because of Data Center Downtime in 2012

Source: http://www.techrepublic.com/blog/data-center/infographic-the-outrageous-costs-of-data-center-downtime

How to Build a HIGHLY AVAILABLE, SCALABLE, DURABLE AND RESILIENT Web Application

High Availability

• Uptime of an Application
• Planned or Unplanned Outage or Downtime
• Offline, Unreachable, or Partially Available
• Slow to Use
• Goal
  • No Downtime
  • Always Available

Uptime: 99.999%
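To make an uptime target such as 99.999% concrete, it helps to convert it into a downtime budget. The sketch below does that arithmetic for a 365-day year; the function name and the list of targets are illustrative choices, not part of the original deck.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only; 99.999% is the figure quoted on the slide.
for target in (99.0, 99.9, 99.95, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Under this calculation, "five nines" leaves only a little over five minutes of downtime per year, which is why it is rarely reachable without redundancy at every tier.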

Scalability

[Figure: Demand and Resources plotted over Time]

Scalability doesn’t Guarantee Availability

Ability of an application to accommodate change in traffic without architectural changes

Availability may be impacted if the application cannot scale

Fault Tolerance

• Built-in redundancy so applications can continue functioning when components fail
• Fault tolerance is crucial to High Availability

Image courtesy: Gigamone.com


AWS Global Infrastructure

AWS democratizes High Availability

• Multiple Servers
• Isolated Redundant Data Centers
• Regions across the Globe
• Availability Zones within Regions

Source: http://aws.amazon.com/about-aws/globalinfrastructure/#reglink-sa

AWS Capacity

Source: http://www.slideshare.net/AmazonWebServices/aws-webinar-scaling-on-aws-for-the-first-10-million-users

AWS Platform

Source: http://www.slideshare.net/AmazonWebServices/aws-webinar-scaling-on-aws-for-the-first-10-million-users

AWS Building Blocks

Inherently Highly Available and Fault Tolerant Services
• Amazon S3
• Amazon SQS
• Amazon DynamoDB
• Amazon SNS
• Amazon CloudFront
• Amazon SES
• Amazon Route 53
• Amazon SWF
• Elastic Load Balancer
• …

Highly Available with the Right Architecture
• Amazon EC2
• Amazon EBS
• Amazon RDS
• Amazon VPC

Span Across AZs

Architect Across AZs


Design For Failure

Avoid Impact on Business

Avoid single points of failure

Application Should Continue to Function

Assume everything fails, and work backwards

“Everything fails, all the time” – Werner Vogels, CTO, Amazon

Obama’s prized limo after it broke down during his Israel visit!

Ask Questions for the Right Architecture

What happens if a node in your system fails?

If there are master and slaves in your architecture, what if the master node fails?

If a load balancer is sitting in front of an array of application servers, what if that load balancer fails?

What are my single points of failure?

What kind of scenarios do I have to plan for?

Lots of Questions

How do you recognize that failure?

How do I replace that node?

What if the cache keys grow beyond the memory limit of an instance?

How does the failover occur, and how is a new slave instantiated and brought into sync with the master?

What if a downstream service times out or returns an exception?
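One common answer to the last question is to retry the call with exponential backoff and jitter, and to give up after a bounded number of attempts. The following is a minimal sketch of that idea; `call_with_retries` and `FlakyService` are hypothetical names introduced for illustration, and the delays are arbitrary.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a timeout or connection error, back off and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 0.1s, 0.2s, 0.4s, ...
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries


class FlakyService:
    """Hypothetical downstream dependency that times out twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls < 3:
            raise TimeoutError("downstream timed out")
        return "ok"


service = FlakyService()
# Sleeping is stubbed out so the example runs instantly.
result = call_with_retries(service.fetch, sleep=lambda seconds: None)
print(result, "after", service.calls, "calls")
```

Bounding the attempts matters: unbounded retries against a failing service tend to make the outage worse rather than better.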

Build Mechanisms to Handle Failure

• Build process threads that resume on reboot
• Allow the state of the system to re-sync by reloading messages from queues
• Keep pre-configured and pre-optimized virtual images to support the above points on launch/boot
• Avoid in-memory sessions or stateful user context; move that to data stores
• Have a coherent backup and restore strategy for your data, and automate it
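The point about re-syncing state from queues can be illustrated with a small replay routine: a restarted process rebuilds its in-memory view by re-reading messages and skipping any it has already applied. This is a sketch under assumed message shapes; the `(message_id, order_id, status)` tuples and the sample data are hypothetical.

```python
def rebuild_state(messages):
    """Rebuild order status by replaying (message_id, order_id, status) messages."""
    state = {}
    seen = set()
    for message_id, order_id, status in messages:
        if message_id in seen:
            continue  # queues may redeliver; applying a message twice must be harmless
        seen.add(message_id)
        state[order_id] = status  # later messages overwrite earlier ones
    return state


# Hypothetical backlog read from a queue after a reboot ("m2" was delivered twice).
backlog = [
    ("m1", "order-1", "placed"),
    ("m2", "order-2", "placed"),
    ("m2", "order-2", "placed"),
    ("m3", "order-1", "shipped"),
]
state = rebuild_state(backlog)
print(state)
```

Because the handler is idempotent, the same backlog can be replayed after every restart without corrupting the result.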

Image courtesy: http://www.outsmarthormones.com/wp-content/uploads/2011/06/Fix.jpg

Design for Failure

Source: http://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_ftha_04.pdf


Scaling

Auto Scaling

• Lets you automatically scale Amazon EC2 capacity up or down
• Lets you terminate server instances at will
• Lets you add more instances in response to increasing load
• Launches a replacement instance immediately in case of a failure
• Lets the application transition seamlessly if the primary server fails
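The scale-up and scale-down behaviour described above can be pictured as a simple proportional rule that sizes the fleet so average CPU moves toward a target. The sketch below is an illustration of the idea only, not the Auto Scaling API; the function name, the 60% target, and the fleet limits are all assumptions.

```python
import math


def desired_capacity(current_instances, avg_cpu_percent, target_cpu_percent=60.0,
                     min_instances=2, max_instances=10):
    """Proportional scaling sketch: pick a fleet size that moves CPU toward the target."""
    wanted = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    # Clamp to the configured bounds, as a scaling group would.
    return max(min_instances, min(max_instances, wanted))


print(desired_capacity(4, 90.0))  # load is above target: scale out
print(desired_capacity(4, 15.0))  # load is well below target: scale in to the minimum
print(desired_capacity(8, 95.0))  # would exceed the cap: held at the maximum
```

Keeping a minimum of two instances reflects the deck's broader advice: even at low load, a single instance is a single point of failure.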

Image Courtesy: http://www.knovelblogs.com/wp-content/uploads

Elastic Load Balancing (ELB)

• Distributes incoming traffic to an application across several Amazon EC2 instances
• ELB is given a DNS host name, and requests sent to this host name are delegated to a pool of Amazon EC2 instances
• ELB detects unhealthy instances within its pool of Amazon EC2 instances and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored
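The rerouting behaviour described above can be sketched as a round-robin balancer that skips instances marked unhealthy. This is a toy model for illustration, not how ELB is implemented; the `LoadBalancer` class and the instance IDs are hypothetical.

```python
class LoadBalancer:
    """Toy round-robin balancer that only routes to healthy instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._position = 0

    def mark_unhealthy(self, instance):
        self.healthy.discard(instance)

    def mark_healthy(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance, skipping unhealthy ones."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._position]
            self._position = (self._position + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


balancer = LoadBalancer(["i-a", "i-b", "i-c"])
balancer.mark_unhealthy("i-b")  # e.g. a failed health check
during_outage = [balancer.route() for _ in range(4)]
balancer.mark_healthy("i-b")    # instance restored
after_recovery = [balancer.route() for _ in range(3)]
print(during_outage, after_recovery)
```

Clients keep using one entry point throughout; only the pool membership behind it changes.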

ELB & Auto Scaling

• Auto Scaling & ELB are an ideal combination
• ELB gives a single DNS name for addressing
• Auto Scaling ensures there is always the right number of healthy Amazon EC2 instances to accept requests


Fault Tolerant

Fault Tolerance

• To build fault-tolerant applications on Amazon EC2, it’s important to follow best practices such as:
  • Being able to commission replacement instances quickly
  • Using Amazon EBS for persistent storage
  • Using multiple Availability Zones and Elastic IP addresses


Multi-AZ Architecture

Multi-AZ Design Considerations

• Achieve greater fault tolerance by distributing your application geographically
• The Amazon EC2 service level agreement commitment is 99.95% availability for each Amazon EC2 Region
• Deploy an application that spans multiple Availability Zones
• Redundant instances for each tier of an application could be placed in distinct Availability Zones
• ELB can automatically balance traffic across multiple instances & multiple Availability Zones
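The benefit of placing redundant instances in distinct Availability Zones can be estimated with basic probability: if copies fail independently, the application is down only when all of them are down at once. The sketch below assumes independent failures and a made-up 99% availability for a single-AZ deployment; neither figure comes from AWS.

```python
def combined_availability(single_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, assuming independent failures."""
    return 1 - (1 - single_availability) ** replicas


# Hypothetical single-AZ deployment that is up 99% of the time.
single_az = 0.99
for zones in (1, 2, 3):
    print(f"{zones} AZ(s): {combined_availability(single_az, zones):.4%}")
```

Real failures are not fully independent (a bad deployment can take out every zone at once), so this figure is an upper bound rather than a promise.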

Image Courtesy: http://chriscampcommunications.blogspot.in

Multi-AZ Architecture


Loose Coupling

Loosely Coupled Systems

• Loosely coupled systems are more fault tolerant and can achieve a bigger scale
• Loosely coupled systems on AWS
  • De-coupling systems allows for hybrid models (in-cloud + in-physical data center)
  • Balancing between clusters enables easier scaling
  • Using queues (Amazon SQS) buffers against failures
• Design for a jumble of black boxes
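How a queue buffers against failures can be sketched with Python's standard `queue.Queue` standing in for Amazon SQS: the producer only enqueues work, and a job whose handler fails goes back on the queue instead of being lost. All names here are hypothetical, and the explicit re-enqueue is a simplification; SQS itself makes an unacknowledged message visible again after a visibility timeout.

```python
import queue


def drain(work_queue, handler, max_attempts=3):
    """Process (job, attempts) pairs until the queue is empty; requeue failed jobs."""
    completed = []
    while True:
        try:
            job, attempts = work_queue.get_nowait()
        except queue.Empty:
            return completed
        try:
            completed.append(handler(job))
        except Exception:
            if attempts + 1 < max_attempts:
                work_queue.put((job, attempts + 1))  # keep the job for a later retry


# Producer side: the web tier only enqueues and never calls the worker directly.
jobs = queue.Queue()
for name in ("resize-1", "resize-2", "resize-3"):
    jobs.put((name, 0))

remaining_failures = {"resize-2": 1}  # hypothetical: this job fails once


def handler(job):
    if remaining_failures.get(job, 0) > 0:
        remaining_failures[job] -= 1
        raise RuntimeError("worker crashed mid-job")
    return job.upper()


completed = drain(jobs, handler)
print(completed)
```

The producer never notices the worker failure; the failed job is simply completed later, after the others.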

Blazeclan

Decoupling using SQS

Loose Coupling - Best Practices on AWS

• Use Amazon SQS to isolate components
• Use Amazon SQS as buffers between components
• Design every component so that it exposes a service interface, is responsible for its own scalability, and interacts with other components asynchronously
• Bundle the logical construct of a component into an Amazon Machine Image so that it can be deployed more often
• Make your applications as stateless as possible. Store session state outside of the component (in Amazon SimpleDB, if appropriate)
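The stateless-application practice can be sketched as a request handler that loads and saves session data through an external store on every request, so any server can serve any user. In this sketch `SessionStore` is an in-memory stand-in for a shared data store, and the handler and session layout are hypothetical.

```python
class SessionStore:
    """In-memory stand-in for an external data store shared by all servers."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


def handle_add_to_cart(store, session_id, item):
    """Stateless handler: everything it needs comes from the store, not server memory."""
    session = store.load(session_id)
    session["cart"] = session.get("cart", []) + [item]
    store.save(session_id, session)
    return session["cart"]


store = SessionStore()
# The two calls could be served by two different instances behind a load balancer.
first = handle_add_to_cart(store, "session-42", "book")
second = handle_add_to_cart(store, "session-42", "pen")
print(first, second)
```

Because no request depends on what a particular server remembers, instances can be terminated or replaced without logging users out.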


Sample Architectures

High Availability Architecture in RDS

Web Hosting on AWS

Scalable Reader Farm

Design for High Availability & Scale

Don’t let this happen to your Business

Our AWS Expert Solution Architects can help you review your architecture.

Avail our 2-hour free consultancy!

For any assistance, please contact us at [email protected]

Upcoming Webinars

Check out Our Upcoming Webinars

www.blazeclan.com/webinars

[email protected]

Follow Us On:

Our Blog: http://blog.blazeclan.com/

Thank you

https://www.facebook.com/cloudITbetter?ref=hl

https://twitter.com/cloudlytics

http://www.linkedin.com/company/blazeclan-technologies-pvt-ltd-?trk=top_nav_home

https://plus.google.com/112247726525503815239

http://www.slideshare.net/cloudITbetter

http://pinterest.com/clouditbetter/
