© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling Derek Chiles, AWS Solutions Architecture (@derekchiles) July 10, 2014

More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Sep 08, 2014


Running your Amazon EC2 instances in Auto Scaling groups allows you to improve your application's availability right out of the box. Auto Scaling replaces impaired or unhealthy instances automatically to maintain your desired number of instances (even if that number is one). You can also use Auto Scaling to automate the provisioning of new instances and software configurations, as well as to keep track of usage and costs by app, project, or cost center. Of course, you can also use Auto Scaling to adjust capacity as needed - on demand, on a schedule, or dynamically based on demand. In this session, we show you a few of the tools you can use to enable Auto Scaling for the applications you run on Amazon EC2.
Transcript
Page 1: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

More Nines for Your Dimes: Improving Availability and Lowering Costs

using Auto Scaling

Derek Chiles, AWS Solutions Architecture

(@derekchiles)

July 10, 2014

Page 2: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Topics We’ll Cover Today

• Auto Scaling introduction

• Console demo

• Maintaining application response times and fleet utilization

• Handling cyclical demand, unexpected “weather events”

• Auto Scaling for 99.9% Uptime

• Single-instance groups

• Cost control and asymmetric scaling responses

• CloudFormation, custom scripts, and multiple inputs

• Using performance testing to choose scaling strategies

• Dealing with bouncy or steep curves

AWS

The Weather Channel

Nokia

Adobe

Dreambox

Page 3: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Ways You Can Use Auto Scaling

Launch EC2 instances and groups from reusable templates

Scale up and down as needed automatically

Auto-replace instances and maintain EC2 capacity

Page 4: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Common Scenarios

• Schedule a one-time scale out and flip to production

• Follow daily, weekly, or monthly cycles

• Provision capacity dynamically by scaling on CPU, memory, request rate, queue depth, users, etc.

• Auto-tag instances with cost center, project, version, stage

• Auto-replace instances that fail ELB or EC2 checks

• Auto-balance instances across multiple zones.

Prepare for a Big Launch

Fit Capacity to Demand

Be Ready for Spikes

Simplify Cost Allocation

Maintain Stable Capacity

Go Multi-AZ

Page 5: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Demo

Learn the new terms:

Launch Configuration

Auto Scaling Group

Scaling Policy

Amazon CloudWatch Alarm

Amazon SNS Notification

Page 6: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2
Page 7: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

What’s New in Auto Scaling

Better integration

• EC2 console support

• Scheduled scaling policies in CloudFormation templates

• ELB connection draining

• Auto-assign public IPs in VPC

• Spot + Auto Scaling

More APIs

• Create groups based on running instances

• Create launch configurations based on running instances

• Attach running instances to a group

• Describe account limits for groups and launch configs

Page 8: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Scale Up, Control Costs, Improve Availability

Page 9: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Scale Up, Control Costs, Improve Availability

Page 10: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

The Weather Company

• Top 30 web property in the U.S.

• 2nd most viewed television channel in the U.S.

• 85% of U.S. airlines depend on our forecasts

• Major retailers base marketing spend and store displays on our forecasts

• 163 million unique visitors across TV and web

Page 11: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Wunderground Radar and Maps

100 million hits a day

One billion data points per day

Migrated the wunderground.com real-time radar mapping system to the AWS Cloud

Page 12: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

30,000 Personal Weather Stations

Source: Wunderground, Inc. 2013

Page 13: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Page 14: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Page 15: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Page 16: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Page 17: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Hurricane Sandy

Page 18: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Before Migration – Traditional IT Model doesn’t scale well

• Server Count (110 Servers)

• Avg. CPU Load

• HTTP Response Latency (~6000 ms)

After Migration - Wunderground Radar App

• Server Count (from 110 to 170 Instances)

• Avg. CPU Load

• HTTP Response Latency (5-15 ms)

Page 19: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Radar on AWS Auto Scaling Architecture

Page 20: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Radar on AWS

CPU Utilization

Page 21: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Radar on AWS

Host Count

Page 22: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Radar on AWS

Page 23: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Radar on AWS

Page 24: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Radar on AWS

Page 25: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Scale up to ensure consistent performance during high demand

Page 26: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Scale Up, Control Costs, Improve Availability

Page 27: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Auto Scaling for 99.9% Uptime

Page 28: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Here.com Local Search Application

• Local Search app

• First customer-facing application on AWS

• Obvious need for uptime

Page 29: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Here.com Local Search Architecture

Regions: US-East-1, US-West-2, EU-West-1, AP-Southeast-1

US-East-1a: Zookeeper1, Zookeeper2, Zookeeper3, Frontend Group, Backend Groups

US-East-1b: Zookeeper1, Zookeeper2, Zookeeper3, Frontend Group, Backend Groups

Page 30: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Here.com Local Search Architecture

Regions: US-East-1, US-West-2, EU-West-1, AP-Southeast-1

US-East-1a: Zookeeper1, Zookeeper2, Zookeeper3, Frontend Group, Backend Groups

US-East-1b: Zookeeper1, Zookeeper2, Zookeeper3, Frontend Group, Backend Groups

Single-Instance Auto Scaling Groups (Zookeeper)

1. Auto-healing: Instances auto-register in DNS via Route 53

2. Dynamic: Auto Scaling Group Names are used for cluster-node lookups (cluster1-zookeeper1)

3. Used Standard Tools such as DNS instead of Queries or Elastic IPs
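The naming convention on this slide can be sketched in a few lines of Python. This is only an illustration, not the Here.com implementation: the group-name pattern `cluster1-zookeeper1` comes from the slide, while the function names and the DNS zone `example.internal` are hypothetical placeholders.

```python
from __future__ import annotations


def group_name(cluster: str, role: str, index: int) -> str:
    """Build an Auto Scaling group name such as 'cluster1-zookeeper1'."""
    return f"{cluster}-{role}{index}"


def node_fqdn(cluster: str, role: str, index: int,
              zone: str = "example.internal") -> str:
    """DNS name a replacement instance could register for itself.

    The zone is a hypothetical placeholder, not taken from the talk.
    """
    return f"{group_name(cluster, role, index)}.{zone}"


def ensemble(cluster: str, size: int = 3) -> list[str]:
    """Stable lookup names for every Zookeeper node in one cluster."""
    return [node_fqdn(cluster, "zookeeper", i) for i in range(1, size + 1)]


if __name__ == "__main__":
    for name in ensemble("cluster1"):
        print(name)
```

Because each single-instance group owns exactly one name, a replacement instance can take over the same DNS record, and clients never need to learn a new address.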

Page 31: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Auto Scaling when upgrading without any downtime

Page 32: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Diagram: Map Data on S3; US-East-1a; Zookeeper1; cluster1; old, old

Page 33: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Diagram: Map Data on S3; US-East-1a; Zookeeper1; cluster1; old, old; New Data V2

Page 34: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Diagram: Map Data on S3; US-East-1a; Zookeeper1; cluster1; old, old; New Data V2

Page 35: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Diagram: Map Data on S3; US-East-1a; Zookeeper1; cluster1; old, old; New Data V2

Page 36: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Diagram: Map Data on S3; US-East-1a; Zookeeper1; cluster1; old, old; New Data V2

Page 37: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Diagram: Map Data on S3; US-East-1a; Zookeeper1; cluster1; old, old; New Data V2; New V2, New V2

Page 38: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Diagram: Map Data on S3; US-East-1a; Zookeeper1; cluster1; old, old; New Data V2; New V2, New V2

Page 39: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Common scenario: Blue Green Deployments Using Auto Scaling

Diagram: Load Balancing (ELB); Web Server Fleet (Amazon EC2) under “Auto scaling”, with four v1.1 instances and four v1.2 instances; Database Fleet (RDS or DB on EC2)

Auto scaling settings: Max instances, Min instances, Scaling Trigger, Custom Metrics, Upper Threshold, Lower Threshold, Increment by

Page 40: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Here.com Local Search Success

• Increased Uptime to 99.9%

• All instances with detected health problems have been successfully replaced by Auto Scaling with zero intervention.

• Zookeeper setup has performed flawlessly

“We’ve been paranoid so it still pages us; It’s beginning to feel silly.”

Page 41: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Scale Up, Control Costs, Improve Availability

Page 42: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Adobe Creative Cloud Runs on AWS

Page 43: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Adobe Shared Cloud Architecture on AWS

Page 44: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Auto Scaling the Web Layer

Based on:

• Number of HTTP requests

• Average CPU load

• Network in/out

Page 45: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Auto Scaling the Web Layer

Based on:

• Number of HTTP requests

• Average CPU load

• Network in/out

Auto Scaling the Worker Layer

Based on SQS queue length

Page 46: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Scale up fast, scale down slow

Page 47: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Cost Control

• Scheduled scaling: we analyzed our traffic and picked numbers.

– scale up in the morning, scale down in the evening

• Policies for slow scale down

• Stage environments: downscale everything to “min-size” daily (or more)

Page 48: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

CloudFormation + Auto Scaling

"ScaleUpPolicy" : {
  "Type" : "AWS::AutoScaling::ScalingPolicy",
  "Properties" : {
    "AdjustmentType" : "ChangeInCapacity",
    "AutoScalingGroupName" : { "Ref" : "WorkerAutoScalingGroup" },
    "Cooldown" : { "Ref" : "cooldown" },
    "ScalingAdjustment" : { "Ref" : "adjustup" }
  }
},
"WorkerAlarmScaleUp" : {
  "Type" : "AWS::CloudWatch::Alarm",
  "Properties" : {
    "EvaluationPeriods" : { "Ref" : "evalperiod" },
    "Statistic" : "Sum",
    "Threshold" : { "Ref" : "upthreshold" },
    "AlarmDescription" : "Scale up if the work load of transcode queue is high",
    "Period" : { "Ref" : "period" },
    "AlarmActions" : [ { "Ref" : "ScaleUpPolicy" }, { "Ref" : "scalingSNStopic" } ],
    "Namespace" : "AWS/SQS",
    "Dimensions" : [ { "Name" : "QueueName", "Value" : { "Ref" : "queuename" } } ],
    "ComparisonOperator" : "GreaterThanThreshold",
    "MetricName" : "ApproximateNumberOfMessagesVisible"
  }
}
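To make the alarm's parameters concrete, here is a minimal Python sketch of how a `GreaterThanThreshold` alarm with a given `Threshold` and `EvaluationPeriods` could be evaluated against per-period sums. It is a simplified model for illustration only: the function name is hypothetical, and real CloudWatch alarms also handle missing data and other states that this sketch ignores.

```python
from typing import Sequence


def alarm_fires(period_sums: Sequence[float], threshold: float,
                evaluation_periods: int) -> bool:
    """Return True when the most recent `evaluation_periods` sums all exceed the threshold.

    Simplified model: missing data points are not handled.
    """
    if evaluation_periods < 1 or len(period_sums) < evaluation_periods:
        return False  # not enough data to evaluate
    recent = period_sums[-evaluation_periods:]
    return all(value > threshold for value in recent)


if __name__ == "__main__":
    # Queue depth summed per period, oldest first (made-up sample values).
    print(alarm_fires([10, 60, 70], threshold=50, evaluation_periods=2))
```

Raising `evaluation_periods` makes the alarm slower to react but less likely to trigger a scale-up on a single noisy period.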

Page 49: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

How – Custom Metrics. . .

Sat Oct 6 05:51:03 UTC 2012

Number of AZs: 4

Number of Web Servers: 16

Number of Healthy Web Servers: 16

ELB Request Count: 9523.0

Request Count Per Healthy Web Server: 595.1875

Network In Per Healthy Web Server: 51 MB

Network Out Per Healthy Web Server: 1 MB

CPU Per Healthy Web Server: 25.23875

Publishing Custom Metrics: InstanceRequestCount, HealthyWebServers, InstanceNetworkIn, InstanceNetworkOut, InstanceCPUUtilization to namespace WebServer in us-east-1

. . .
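The per-server figures in this log output are simple averages over the healthy fleet. A small sketch of that arithmetic follows, assuming the script divides each fleet-wide total by the healthy server count; the function name is illustrative and the publishing step to CloudWatch is left out.

```python
def per_healthy_server(total: float, healthy_servers: int) -> float:
    """Average a fleet-wide total over the healthy web servers."""
    if healthy_servers <= 0:
        raise ValueError("no healthy servers to average over")
    return total / healthy_servers


if __name__ == "__main__":
    # Values taken from the log output on the slide.
    elb_request_count = 9523.0
    healthy_web_servers = 16
    print(per_healthy_server(elb_request_count, healthy_web_servers))  # 595.1875
```

Dividing by healthy servers rather than all servers keeps the metric honest when instances are failing health checks, since the remaining servers carry the full load.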

Page 50: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

How – multi-input scaling

Scale up

+2 instances if more than 50 visible messages for >5 min

+50% instances if more than 1000 msg for >2 min

+ fixed 100 instances if more than 10000 msg for >1 min

Scale down

-10 instances if 0 msg for more than 10 min

-25% if 0 msg for more than 30 min
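The rules on this slide can be read as a small decision function. The sketch below is one possible interpretation, not Adobe's implementation: it assumes one queue-depth sample per minute, checks the most aggressive scale-up rule first and the longest-duration scale-down rule first, rounds percentage changes up, and never drops below a hypothetical `minimum`. In the real setup each rule would be a separate alarm and policy.

```python
import math
from typing import Callable, Sequence


def sustained(samples: Sequence[int], minutes: int,
              predicate: Callable[[int], bool]) -> bool:
    """True when the last `minutes` one-minute samples all satisfy `predicate`."""
    return len(samples) >= minutes and all(predicate(s) for s in samples[-minutes:])


def desired_capacity(current: int, samples: Sequence[int], minimum: int = 1) -> int:
    """Apply the slide's rules to per-minute queue depths (oldest first)."""
    if sustained(samples, 1, lambda s: s > 10000):
        return current + 100                       # fixed +100 instances
    if sustained(samples, 2, lambda s: s > 1000):
        return current + math.ceil(current * 0.5)  # +50%
    if sustained(samples, 5, lambda s: s > 50):
        return current + 2                         # +2 instances
    if sustained(samples, 30, lambda s: s == 0):
        return max(minimum, current - math.ceil(current * 0.25))  # -25%
    if sustained(samples, 10, lambda s: s == 0):
        return max(minimum, current - 10)          # -10 instances
    return current


if __name__ == "__main__":
    print(desired_capacity(10, [1500, 1500]))  # +50% of 10 instances
```

Note the asymmetry: the scale-up rules react within one to five minutes, while the scale-down rules wait ten minutes or more, which matches the "scale up fast, scale down slow" mantra.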

Page 51: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Adobe’s Advice

• Use CloudFormation!

• Know your system, thresholds

• Watch your scaling history

• Scaling up is easy, scaling down not so much

• Mantra: scale up fast; scale down slow

Page 52: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2
Page 53: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Scaling strategies we use

Scaling with CloudWatch alarms

Scheduled scaling (one-time, recurring)

Page 54: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

A little background on our application

• Ruby on Rails

• Unicorn

• We teach kids math!

Page 55: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

A workload well suited for auto scaling

Page 56: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Scaling with CloudWatch alarms

Page 57: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Performance test to get a baseline

• Discover the ideal number of worker processes per server

– Too few and resources go unused

– Too many and performance suffers under load

• Obtain the maximum load sustainable per server

– Our performance tests measure the number of concurrent users

• Find the chokepoint

– For us, this was CPU utilization

Page 58: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Performance testing

Page 59: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Identify the breaking point

Breaking point was at about 400 users per server

Page 60: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Our first method to find scale points

• Provision a static number of servers that we know can handle peak load

• Adjust scale up and scale down alarms based on observed highs and lows

• This worked, but was super inefficient, both in time and money spent

Page 61: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Let’s do some math – identify variables

Independent

• Concurrent users

Dependent

• CPU utilization

• Memory utilization

• Disk I/O

• Network I/O

Page 62: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Let’s do some math – find the slope

• Adding about 1600 users per hour

• Which is about 27 per minute

• We know that we can handle a max of about 400 users per server at 80% CPU usage

• Which is about 0.2% CPU usage per user

Page 63: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Let’s do some math – when to scale?

• We know (from other testing) that it takes us about 5 minutes for a new node to come online

• We’re adding 27 users per minute

• Which means we need to start spinning up new nodes when we’re about 135 users (27 x 5) per node short of max

• Which is at about 53% utilization: (80% - (0.2% * 135))
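The arithmetic from the last two slides can be reproduced with a short Python sketch. All input numbers come from the slides; the function and parameter names are illustrative, and the per-minute rate is rounded to a whole user as the slides do.

```python
def scale_out_cpu_threshold(users_per_hour: float,
                            max_users_per_server: int,
                            cpu_at_max_pct: float,
                            launch_minutes: int) -> float:
    """CPU utilization (percent) at which a new node should start launching."""
    users_per_minute = round(users_per_hour / 60)             # 1600 / 60 is about 27
    cpu_per_user_pct = cpu_at_max_pct / max_users_per_server  # 80 / 400 = 0.2
    headroom_users = users_per_minute * launch_minutes        # 27 * 5 = 135
    return cpu_at_max_pct - cpu_per_user_pct * headroom_users


if __name__ == "__main__":
    threshold = scale_out_cpu_threshold(1600, 400, 80.0, 5)
    print(f"Start scaling out at about {threshold:.0f}% CPU")
```

A faster ramp or a slower instance launch both enlarge the headroom term and push the threshold lower, which is why the same formula needs to be re-run whenever either input changes.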

Page 64: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

How much to scale up by?

• The lowest we can scale up by is 1 node per AZ, otherwise we would be unbalanced

• For us, this is an extra 800 users of capacity in five minutes, more than enough to keep up with our rate of adding 1600 users per hour

• Adding 800 users of capacity every five minutes, we could support 9600 additional users per hour
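A quick check of these capacity numbers, as a Python sketch. The count of two Availability Zones is an assumption inferred from the slide's figures (800 users per step divided by 400 users per server); the talk does not state it directly, and the function name is illustrative.

```python
def capacity_added(zones: int, users_per_server: int,
                   launch_minutes: int) -> tuple:
    """Return (users added per scaling step, users added per hour)."""
    step = zones * users_per_server        # one node per AZ in each step
    steps_per_hour = 60 // launch_minutes  # one step per node launch time
    return step, step * steps_per_hour


if __name__ == "__main__":
    # zones=2 is an inferred assumption, see the note above.
    step, per_hour = capacity_added(zones=2, users_per_server=400, launch_minutes=5)
    print(step, per_hour)  # 800 9600
```

Since 9600 users per hour of added capacity is six times the observed growth of 1600 users per hour, the smallest balanced step leaves a wide margin.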

Page 65: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Evaluate your predictions

• In the real world, we’ve inched up from scaling at 53%

• Our perf test is a little harsher than the real world

• Numbers derived from the perf test are only as accurate as the simulation of traffic in your perf test.

Page 66: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Scheduled scaling

Page 67: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Acceleration in load is not constant

Request count for a 24 hour period

Page 68: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

We can’t use one size fits all

• Scale too aggressively

– Overprovisioning: increases cost

– Bounciness: we add more than we need and have to partially scale back shortly after scaling up, which increases cost

• Scale too timidly

– Poor performance

– Outages due to lack of capacity

Page 69: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Putting it all together

Page 70: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

The opportunity cost of NOT scaling

• Our usage curve from 3/20

• Low of about 5 concurrent users

• High of about 10,000 concurrent users

Page 71: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

The opportunity cost of NOT scaling

• No autoscaling

• 672 instance hours

• $302.40 at on-demand prices

Page 72: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

The opportunity cost of NOT scaling

• Autoscaling four times per day

• 360 instance hours

• $162 at on-demand prices

• 46% savings vs no autoscaling

Page 73: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

The opportunity cost of NOT scaling

• Autoscaling as needed, twelve times per day

• 272 instance hours

• $122.40 at on-demand prices

• 24% savings vs scaling 4 times per day

• 60% savings vs no autoscaling

Page 74: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

The opportunity cost of NOT scaling

$302/day

$162/day

$122/day
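The savings percentages on the preceding slides follow directly from the instance-hour counts. The Python sketch below reproduces them; the hourly rate of $0.45 is not stated in the talk but is derived from the slide's own figures ($302.40 / 672 instance hours), and the function names are illustrative.

```python
def daily_cost(instance_hours: float, hourly_rate: float) -> float:
    """On-demand cost for one day of the given instance hours."""
    return instance_hours * hourly_rate


def savings_pct(baseline_hours: float, improved_hours: float) -> float:
    """Percentage saved when moving from the baseline to the improved schedule."""
    return (1 - improved_hours / baseline_hours) * 100


if __name__ == "__main__":
    rate = 0.45  # derived from the slides: 302.40 / 672
    for label, hours in [("no autoscaling", 672),
                         ("4 times per day", 360),
                         ("12 times per day", 272)]:
        print(f"{label}: ${daily_cost(hours, rate):.2f}")
    print(f"{savings_pct(672, 360):.0f}% vs no autoscaling (4 times per day)")
    print(f"{savings_pct(360, 272):.0f}% vs 4 times per day (12 times per day)")
    print(f"{savings_pct(672, 272):.0f}% vs no autoscaling (12 times per day)")
```

Because cost is proportional to instance hours at a fixed on-demand rate, the percentages can be computed from hours alone.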

Page 75: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Demand curve hugs the usage curve…

Page 76: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

…and a (mostly) flat response curve

Page 77: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

“Auto Scaling saves us a lot of money; with a little bit of math, flexibility of AWS allows us to further save by aligning our demand curve with usage curve.” -- Dreambox

Page 78: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Why Auto Scaling?

Scale Up, Control Costs, Improve Availability

Page 79: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Key Takeaways

• Maintaining application response times and fleet utilization

• Scaling up and handling unexpected “weather events”

• Auto Scaling for 99.9% Uptime

• Single-instance groups

• Cost control and asymmetric scaling responses

• CloudFormation, custom scripts, and multiple inputs

• Using performance testing to choose scaling strategies

• Dealing with bouncy or steep curves

The Weather Channel

Nokia

Adobe

Dreambox

Page 80: More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling and Amazon EC2

Thank You!

Derek Chiles

@derekchiles