AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS

AWS Government, Education, & Nonprofits Symposium

Canberra, Australia | May 20, 2014

Managing Seasonal Workloads on AWS

Clayton Brown, Ecosystem Solution Architect

Why are customers adopting cloud computing?

Variable expense: Replace capital expenditure with variable expense

Source: IDC Whitepaper, sponsored by Amazon, “The Business Value of Amazon Web Services Accelerates Over Time,” July 2012

Average of 400 servers replaced per customer

Economies of scale: Lower variable expense than companies can achieve themselves

Why are customers adopting cloud computing?

Saved $34m on SmartHub application

Tens of millions of dollars saved with first 12 apps migrated to AWS

50% reduction in analytics costs

Multiple global regions help build highly available applications

Web Server

Availability Zone 1

Web Server

Availability Zone 2

Web Server

Regional AWS design provides high availability as a baseline

Corporate Data Center

Which can be fully integrated with existing assets

[Figure: demand over time, Week 1 to Week 5, against capacity stepped at 1m, 1.5m, and 2.0m; regions labeled "Wasted Capacity" and "Lost Customers, Rush Hardware"]

Scaling on-premises infrastructure can be a challenge

Sizing capacity for peak is harder still

[Figure: capacity of resources versus actual demand by quarter, Q1 through Q1 of the following year, with capacity stepped at 200k, 300k, and 600k; regions labeled "Wasted Capacity" and "Lost Customers, Order Hardware"]

[Figure: number of cores by day of the week; 3000 cores for risk management processes, 300 cores on weekends]

Different workloads have different usage patterns

[Figure: typical weekly traffic to Amazon.com, Sunday to Saturday, against provisioned capacity]

[Figure: November traffic to Amazon.com against provisioned capacity, annotated with 76% and 24%]

[Figure: actual demand versus predicted demand over time; the gaps are labeled "Customer dissatisfaction" and "Waste"]

Elastic capacity: No need to guess capacity requirements and over-provision

AWS enables companies to match resources to demand

[Figure: elastic capacity following demand over time]

AWS enables companies to match costs to demand
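The contrast between fixed and elastic capacity in the charts above can be put into numbers. The following is a minimal sketch, not data from the talk: the hourly demand figures and the headroom value are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical hourly demand, in servers needed (illustrative numbers only).
demand = [20, 25, 40, 90, 100, 60, 30, 20]


def fixed_capacity_report(demand, capacity):
    """Return (wasted, unmet) server-hours for a fleet of fixed size."""
    wasted = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return wasted, unmet


def elastic_capacity_report(demand, headroom=0):
    """An elastic fleet follows demand plus a fixed headroom, so nothing is unmet."""
    wasted = headroom * len(demand)
    return wasted, 0


# Size for the peak hour, size for the average hour, or follow demand.
peak_sized = fixed_capacity_report(demand, capacity=max(demand))
average_sized = fixed_capacity_report(demand, capacity=sum(demand) // len(demand))
elastic = elastic_capacity_report(demand, headroom=2)

print("sized for peak:   ", peak_sized)
print("sized for average:", average_sized)
print("elastic:          ", elastic)
```

Sizing for peak wastes capacity in every quiet hour, sizing for the average leaves demand unmet at the peak, and the elastic fleet pays only for its headroom.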

November 10th, 2010: Turned off last physical web server of Amazon.com

October 31st, 2011: Turned off last web servers supporting European business

[Figure: November traffic to Amazon.com]

40 servers to 5000 in 3 days

[Figure: number of EC2 instances, 4/12/2008 to 4/20/2008; annotations: steady state of ~40 instances, “Techcrunched”, launch of Facebook modification, EC2 scaled to peak of 5000 instances]

Automation is a key enabler to elastic usage

Bootstrapping or DevOps: The process of automatically configuring the software and settings on your machines as they boot, each time they boot. Your infrastructure as code.

[Diagram: Amazon Route 53, Elastic Load Balancer, S3 bucket, CloudFront distribution, web servers (Web ASG, Elastic Beanstalk), app servers, Master and Standby, RR 1 to RR 4, ElastiCache cluster]

This is a stack

In AWS everything can be automated; everything is an API

Resources are no longer finite; they are elastic in AWS

CloudFormation is a great cookie cutter

Your infrastructure as code.

This is a STACK: a JavaScript Object Notation (JSON) template of your datacenter / workload. Your infrastructure as code.

Template sections: Headers, Parameters, Mappings, Resources, Outputs
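The template sections named above can be seen in a small example. This is a sketch under stated assumptions rather than the template from the talk: the parameter `EnvName`, the mapping `EnvToSize`, the resource names, and the instance sizes are hypothetical, and the AMI ID is the one shown on the slides. The resource types and intrinsic functions (`Ref`, `Fn::FindInMap`, `Fn::GetAZs`) are standard CloudFormation.

```python
import json

template = {
    # Headers
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: one launch configuration and one Auto Scaling group",
    # Parameters: values supplied when the stack is created
    "Parameters": {
        "EnvName": {
            "Type": "String",
            "AllowedValues": ["dev", "test", "prod"],
            "Default": "dev",
        },
    },
    # Mappings: lookup tables keyed by a parameter value
    "Mappings": {
        "EnvToSize": {
            "dev": {"InstanceType": "m3.medium"},
            "test": {"InstanceType": "m3.medium"},
            "prod": {"InstanceType": "m3.large"},
        },
    },
    # Resources: what the stack actually creates
    "Resources": {
        "WebLaunchConfig": {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "ImageId": "ami-0535d66c",
                "InstanceType": {
                    "Fn::FindInMap": ["EnvToSize", {"Ref": "EnvName"}, "InstanceType"]
                },
            },
        },
        "WebGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "AvailabilityZones": {"Fn::GetAZs": ""},
                "LaunchConfigurationName": {"Ref": "WebLaunchConfig"},
                "MinSize": "1",
                "MaxSize": "1",
            },
        },
    },
    # Outputs: values reported back once the stack exists
    "Outputs": {
        "GroupName": {"Value": {"Ref": "WebGroup"}},
    },
}

template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the template is plain JSON text, it can be committed to version control and reused to stamp out dev, test, and prod stacks from the same file.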

Git, Subversion, Mercurial

Dev

Test

Prod

CloudFormation is context aware

Your infrastructure as code.

Create: DEV (dev.mysite.com)

Create: TEST (test.mysite.com)

Create: PROD (prod.mysite.com)

Elastic resources require utility pricing

Enabling customers to Optimize Costs based on Utilization

Meeting base workload, variable and peak with different pricing models

Architecting Tips for scaling to meet Seasonal Patterns

Auto Scaling groups are useful for more than just fault tolerance

•  Vertical Scaling

•  Horizontal Scaling

•  Auto Scaling

•  Scheduled Scaling

•  Programmatic Scaling

•  Database Tier Scaling

•  Asynchronous Process Scaling

•  Event Scaling

ASG == Minimum unit of deployment

myAutoScalingGroup: myLaunchConfig, Min 1, Max 1, Desired 1

Launch Configuration: myLaunchConfig, ami-0535d66c, m3.large

Availability Zones: ap-southeast-2a, ap-southeast-2b

myElasticLoadBalancer

A minimum of 1 instance creates auto-healing groups

Vertical Scaling (Scale UP)

Vertical Scaling using different instance types

[Figure: DB instance type by day of the month]

End of the Month Scaling

75% Savings
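The arithmetic behind end-of-month scaling can be sketched as follows. The prices and the 27/3 day split are hypothetical, chosen only to show the shape of the calculation; they are not the figures behind the savings number on the slide.

```python
HOURS_PER_DAY = 24


def monthly_cost_cents(days_small, days_large, small_cents_per_hour, large_cents_per_hour):
    """Cost of running the small size for days_small and the large size for days_large."""
    return HOURS_PER_DAY * (
        days_small * small_cents_per_hour + days_large * large_cents_per_hour
    )


# Hypothetical prices in cents per hour: the large size costs 4x the small size.
SMALL, LARGE = 10, 40

# Baseline: run the large size for all 30 days.
always_large = monthly_cost_cents(0, 30, SMALL, LARGE)

# Scaled: run the small size for 27 days and the large size for the last 3.
month_end_only = monthly_cost_cents(27, 3, SMALL, LARGE)

saving = 1 - month_end_only / always_large
print(f"always large: {always_large} cents")
print(f"month-end only: {month_end_only} cents")
print(f"saving: {saving:.1%}")
```

The saving grows with the price gap between the two sizes and shrinks with the number of days spent on the larger one.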

Small: 1.7 GB, 1 ECU, 1 virtual core

Medium: 3.75 GB, 2 ECUs, 1 virtual core

Large: 7.5 GB, 4 ECUs, 2 virtual cores

Extra Large: 15 GB, 8 ECUs, 4 virtual cores

Hi-Mem XL: 17.1 GB, 6.5 ECUs, 2 virtual cores

Hi-Mem 2XL: 34.2 GB, 13 ECUs, 4 virtual cores

Hi-Mem 4XL: 68.4 GB, 26 ECUs, 8 virtual cores

High-CPU Med: 1.7 GB, 5 ECUs, 2 virtual cores

High-CPU XL: 7 GB, 20 ECUs, 8 virtual cores

Micro: 613 MB, up to 2 ECUs (for short bursts)

Cluster GPU 4XL: 22 GB, 33.5 ECUs, 8 Nehalem virtual cores, 2 x NVIDIA Tesla “Fermi” M2050 GPUs

Cluster Compute 4XL: 23 GB, 33.5 ECUs, 8 Nehalem virtual cores

Cluster Compute 8XL: 60.5 GB, 88 ECUs, 2 x 8-core Intel Xeon

Memory intensive Cluster Compute

Processor Intensive

Average Applications

Minimal resources

Multiple Family Types, optimized for different uses

Multiple sizes of instance within a family type

Vertical Scaling using Launch Configurations

myAutoScalingGroup: smallConfig, Min 1, Max 2, Desired 1, Termination Policy: Oldest Instance

ElasticIP (EIP) / Elastic NIC (ENI)

Availability Zone: ap-southeast-2a

Launch Config A: smallConfig, ami-0535d66c, small

Launch Config B: bigConfig, ami-0535d66c, large

UPDATE myAutoScalingGroup: bigConfig, Min 1, Max 2, Desired 2, Termination Policy: Oldest Instance

UPDATE Desired = 1
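The vertical scaling sequence on this slide (switch the group to the larger launch configuration, raise desired capacity to 2, then lower it back to 1 so the oldest instance is terminated) can be traced with a toy model. This is a simulation sketch only: `ToyAutoScalingGroup` is a hypothetical class written for illustration and does not call any AWS API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    launch_order: int
    instance_type: str


@dataclass
class ToyAutoScalingGroup:
    launch_config: str  # instance type that new instances are launched with
    min_size: int
    max_size: int
    desired: int = 0
    instances: list = field(default_factory=list)
    _clock: count = field(default_factory=count, repr=False)

    def update(self, launch_config=None, desired=None):
        """Apply new settings, then launch or terminate to reach desired capacity."""
        if launch_config is not None:
            self.launch_config = launch_config
        if desired is not None:
            self.desired = max(self.min_size, min(self.max_size, desired))
        # Launch with the current launch configuration until desired is met.
        while len(self.instances) < self.desired:
            self.instances.append(Instance(next(self._clock), self.launch_config))
        # Termination policy: oldest instance first.
        while len(self.instances) > self.desired:
            oldest = min(self.instances, key=lambda i: i.launch_order)
            self.instances.remove(oldest)


group = ToyAutoScalingGroup(launch_config="small", min_size=1, max_size=2)
group.update(desired=1)                          # one small instance
group.update(launch_config="large", desired=2)   # a large instance joins
group.update(desired=1)                          # the oldest (small) one is terminated

final_types = [i.instance_type for i in group.instances]
print(final_types)
```

Existing instances keep the type they were launched with, which is why the swap needs the temporary second instance before the old one is removed.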

Database Tier scaling is automated when using RDS

Push Button Scaling: UP / DOWN

Read Only Replica: IN / OUT

Snapshot & Restore: ON / OFF

Database Tier management is heavily automated using RDS

High Availability

Host Replacement

High Scalability

Asynchronous Replication

Horizontal Scaling (Scale OUT)

Availability Zones: ap-southeast-2a, ap-southeast-2b

Launch Configuration: ami-0535d66c

myAutoScalingGroup: myLaunchConfig, Min 2, Max 100, Desired 2

elb-cname.amazonaws.com

ASG UPDATE Desired = 4

Elastic Load Balancing (ELB) over multiple Availability Zones (AZs)

ASG UPDATE Desired = 2

HOST LEVEL METRICS

AGGREGATE LEVEL METRICS

LOG ANALYSIS

EXTERNAL SITE PERFORMANCE

Auto Scaling (Elastic Usage)


Launch Configuration: ami-0535d66c



Desired = 4

Auto Scaling using Policies to Scale Out

Scale UP +1

Scale DOWN -1
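A step policy of the "+1 / -1" kind shown here can be sketched as a pure function. The CPU thresholds are assumptions made for illustration (the slides do not give any), the function name is hypothetical, and the Min 2 / Max 100 bounds are taken from the Auto Scaling group on the horizontal scaling slide.

```python
def next_desired(current, avg_cpu, min_size=2, max_size=100,
                 scale_out_above=70.0, scale_in_below=30.0):
    """Return the new desired capacity after one policy evaluation."""
    if avg_cpu > scale_out_above:
        change = 1       # scale up: add one instance
    elif avg_cpu < scale_in_below:
        change = -1      # scale down: remove one instance
    else:
        change = 0       # inside the comfort band: do nothing
    # Never leave the group's min/max bounds.
    return max(min_size, min(max_size, current + change))


desired = 2
history = []
# Hypothetical average CPU readings, one per evaluation period.
for cpu in [85, 90, 75, 50, 20, 10, 5, 5]:
    desired = next_desired(desired, cpu)
    history.append(desired)

print(history)
```

The last reading is below the scale-in threshold, but the group stays at its minimum of 2 rather than shrinking further.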


Launch Configuration: ami-0535d66c



API Update Desired = 4

Auto Scaling using API to Scale In / Out

Scale UP +1

Scale DOWN -1

AutoScalingGroups*: myLaunchConfig, Min 0, Max 100, Desired 0

Launch Configuration: launchWhenCheap, ami-0535d66c, m3.large, Spot price: 0.05

Automate Workload Patterns using Scheduled Scaling

as-put-scheduled-update-group-action ScaleUp --auto-scaling-group my-test-asg --recurrence "30 0 1 1,6,12 0" --desired-capacity 20

as-put-scheduled-update-group-action ScaleOff --auto-scaling-group my-test-asg --start-time "2013-05-13T08:00:00Z" --desired-capacity 0
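The recurrence value in the first command is a five-field cron expression. The sketch below only labels and expands the fields so the schedule is easier to read; `describe_recurrence` is a hypothetical helper, and it deliberately handles just `*`, single numbers, and comma lists, not ranges or step values.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def _expand(value):
    """Turn '*' into '*' and '1,6,12' into [1, 6, 12]."""
    if value == "*":
        return "*"
    return [int(part) for part in value.split(",")]


def describe_recurrence(expression):
    """Map each of the five cron fields to its name."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return {name: _expand(value) for name, value in zip(FIELD_NAMES, parts)}


schedule = describe_recurrence("30 0 1 1,6,12 0")
for name, value in schedule.items():
    print(f"{name:>12}: {value}")
```

Read this way, the ScaleUp action is keyed to 00:30 in January, June, and December, with both the day-of-month and day-of-week fields restricted.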

Auto Scaling with Alarms & Policies

Achieve High Utilization with this style of architecture, eliminating waste

Trigger auto-scaling policy

Reserved Instances, On Demand, Spot Pricing

Scheduled, Adaptive, Predictive

Optimize delivery using S3 static hosting and CloudFront

(1) Single CNAME: www.mysite.com

(2) Served from EC2: *.php

(3) Served from S3: /images/*

London, Paris, NY

Lower Cost, Lower Latency, Higher Scale

Fault Tolerance, High Availability, High Utilization

Scaling Asynchronous Processing

Asynchronous Process Scaling with SQS Messaging

•  Amazon managed queue service

•  Decouple your components

•  Think parallel

•  Implement elasticity

•  Drive Auto Scaling fleets using Queue Depth
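Driving a fleet from queue depth comes down to a small calculation: divide the backlog by what one worker can clear, then clamp to the group's bounds. This is a minimal sketch with hypothetical names and numbers; in practice the queue depth would come from a queue metric rather than a literal.

```python
import math


def desired_workers(queue_depth, messages_per_worker, min_size, max_size):
    """Workers needed to clear the backlog, clamped to the group's bounds."""
    needed = math.ceil(queue_depth / messages_per_worker)
    return max(min_size, min(max_size, needed))


# Assumption: one worker clears about 100 messages per scaling interval.
empty_queue = desired_workers(0, 100, min_size=2, max_size=100)
busy_queue = desired_workers(950, 100, min_size=2, max_size=100)
flooded_queue = desired_workers(50_000, 100, min_size=2, max_size=100)

print(empty_queue, busy_queue, flooded_queue)
```

Because the queue absorbs the burst, the fleet can lag behind demand without losing work, which is what makes this pattern a good fit for asynchronous processing.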

[Diagram: tight coupling, with Controller A, Controller B, and Controller C linked directly]

[Diagram: loose coupling using queues, with Controller A, Controller B, and Controller C each paired with a queue (Q)]

[Diagram: Amazon SQS carrying processing tasks / processing triggers and processing results]

[Diagram: user, S3 bucket for ingest, SNS topic, three SQS queues (image for thumbnail, image for mobile, image for web) whose queue size drives three Auto Scaling groups of instances (Min 5, Min 10, Min 2), S3 bucket for originals, RRS S3 bucket to serve content to CloudFront, CloudFront download distribution]

Asynchronous Process Scaling with SQS Messaging (SQS)

[Diagram: user, S3 bucket for ingest, S3 bucket for originals, RRS S3 bucket to serve content to CloudFront, CloudFront download distribution, three Auto Scaling groups of instances, SWF, instance running decider]

Asynchronous Process Scaling with Simple Workflow (SWF)

AutoScalingGroups*: myLaunchConfig, Min 0, Max 100, Desired 0

Launch Configuration: launchWhenCheap, ami-0535d66c, m3.large, Spot price: 0.05

Optimize costs using Auto Bidding groups and spot pricing

aws autoscaling create-launch-configuration --launch-configuration-name launchWhenCheap --image-id ami-0535d66c --instance-type m3.large --spot-price 0.05

[Diagram: a producer writing to an SQS queue read by consumers]

[Diagram: Amazon Elastic MapReduce Hadoop cluster with name node, core nodes (HDFS), and task nodes, running multiple JobFlow steps (mapper, reducer; HiveQL, Pig Latin, Cascading); inputs are logs from EC2 instances via Flume/Fluentd (log aggregator) and code/scripts in Amazon S3; connected stores are Amazon S3 and Amazon DynamoDB/RDS; BI apps query over JDBC/ODBC using HiveQL or Pig Latin]

Scale to 1000s of nodes when needed and back to zero using EMR

Optionally using a Spot Pricing strategy on task nodes

Event Based Scaling

Parameterized Scaling via CloudFormation

myAutoScalingGroup: myLaunchConfig, Min 2, Max 100, Desired inputParameter
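Passing the desired capacity in as a stack parameter might look like the sketch below. The parameter name `DesiredCapacity` and the helper `build_event_template` are hypothetical; the resource names, bounds, AMI ID, and instance type are the ones shown on the slides, and the Availability Zone names are written in the standard `ap-southeast-2a` form.

```python
import json


def build_event_template(default_desired=2):
    """Build a template whose desired capacity is supplied at stack creation."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # The input parameter: set per event instead of editing the template.
            "DesiredCapacity": {
                "Type": "Number",
                "Default": default_desired,
                "MinValue": 2,
                "MaxValue": 100,
            },
        },
        "Resources": {
            "myLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {"ImageId": "ami-0535d66c", "InstanceType": "m3.large"},
            },
            "myAutoScalingGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": ["ap-southeast-2a", "ap-southeast-2b"],
                    "LaunchConfigurationName": {"Ref": "myLaunchConfig"},
                    "MinSize": "2",
                    "MaxSize": "100",
                    "DesiredCapacity": {"Ref": "DesiredCapacity"},
                },
            },
        },
    }


event_template = build_event_template()
group_props = event_template["Resources"]["myAutoScalingGroup"]["Properties"]
print(json.dumps(group_props, indent=2))
```

The same template can then be launched ahead of a known event with a larger parameter value and updated back down afterwards, without changing the file under version control.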

Are you confident in your N+1?

February, 2012

Automated failover using pilot light configurations

[Diagram: pilot light configuration. Amazon Route 53 sits in front of the user or system; the running environment has a web server, application server, database server, and data volume; data mirroring/replication feeds a second environment whose web server and application server are not running and whose database server is a smaller instance with its own data volume]

UPDATE: Desired = 0 → 1, Desired = 0 → 1, Desired = 1 → 1

Just in Time systems which can be during an event

•  ~30th biggest E-commerce operation, globally

•  ~200 distinct applications, many mobile

•  Hundreds of new, untested analytical approaches

•  Processing hundreds of TB of data on thousands of servers

•  Spikes of hundreds of thousands of concurrent users

•  Critically compressed budget

•  Less than a year to execute

•  Core systems will be used for a single critical day

•  Constitutionally-mandated completion date

Support Systems which can be retired immediately after an event

THANK YOU

Please give us your feedback by filling out the Feedback Forms
