AWS Government, Education, & Nonprofits Symposium Canberra, Australia | May 20, 2014 Managing Seasonal Workloads on AWS Clayton Brown Ecosystem Solution Architect
May 08, 2015
Why are customers adopting cloud computing?
Variable expense: replace capital expenditure with variable expense
Source: IDC Whitepaper, sponsored by Amazon, “The Business Value of Amazon Web Services Accelerates Over Time,” July 2012
Average of 400 servers replaced per customer
Economies of scale: lower variable expense than companies can achieve themselves
Why are customers adopting cloud computing?
Saved $34m on SmartHub application
Tens of millions of dollars saved with first 12 apps migrated to AWS
50% reduction in analytics costs
Multiple global regions help build highly available applications
Web Server
Availability Zone 1
Web Server
Availability Zone 2
Web Server
Regional AWS design provides High Availability as a baseline
Corporate Data Center
Which can be fully integrated with existing assets
[Figure: demand vs. capacity over time, Week 1 to Week 5, with levels marked at 1m, 1.5m and 2.0m; regions labelled "Wasted Capacity" and "Lost Customers, Rush Hardware"]
Scaling on-premises infrastructure can be a challenge
Sizing capacity for peak is harder still
[Figure: capacity of resources vs. actual demand over time, Q1 through Q1 of the following year, with levels marked at 200k, 300k and 600k; regions labelled "Wasted Capacity" and "Lost Customers, Order Hardware"]
[Figure: number of cores by day of the week: 3000 cores for risk management processes, 300 cores on weekends]
Different workloads have different usage patterns
[Figure: typical weekly traffic to Amazon.com, Sunday through Saturday]
[Figure: November traffic to Amazon.com against provisioned capacity, annotated 76% and 24%]
[Figure: actual demand vs. predicted demand over time; regions labelled "Customer dissatisfaction" and "Waste"]
Elastic capacity: no need to guess capacity requirements and over-provision
AWS enables companies to match resources to demand
[Figure: elastic capacity tracking demand over time]
AWS enables companies to match costs to demand
November 10th 2010 Turned off last physical web server of
Amazon.com
October 31st 2011 Turned off last web servers supporting
European business
[Figure: November traffic to Amazon.com]
[Figure: number of EC2 instances per day, 4/12/2008 to 4/20/2008. Annotations: steady state of ~40 instances; launch of Facebook modification; “Techcrunched”; EC2 scaled to peak of 5000 instances]
40 servers to 5000 in 3 days
Automation is a key enabler to elastic usage
Bootstrapping or DevOps: the process of automatically configuring the software and settings on your machines as they boot, each time they boot. Your infrastructure as code.
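As a rough illustration of bootstrapping, the sketch below renders a shell script that could be handed to an instance at launch so that it configures itself. The function name, the package list, the use of yum and the /etc/profile.d/app.sh path are all assumptions made for the example, not details from the talk.

```python
def render_user_data(packages, app_env):
    """Render a boot-time shell script (hypothetical helper, not an AWS API)."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    if packages:
        # Sort so the generated script is reproducible across runs.
        lines.append("yum install -y " + " ".join(sorted(packages)))
    for key in sorted(app_env):
        # Persist application settings for later login shells.
        lines.append(f"echo 'export {key}={app_env[key]}' >> /etc/profile.d/app.sh")
    return "\n".join(lines) + "\n"


script = render_user_data({"httpd", "php"}, {"APP_ENV": "dev"})
print(script)
```

Because the script is generated from data, the same code can describe every machine in a tier, which is the sense in which the infrastructure becomes code.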
[Figure: example web stack: Amazon Route 53, Elastic Load Balancer, CloudFront Distribution backed by an S3 Bucket, Web Servers in a Web ASG (Elastic Beanstalk), App servers, database Master and Standby with read replicas RR 1 to RR 4, and an ElastiCache Cluster]
This is a stack
In AWS everything can be automated; everything is an API
Resources are no longer finite; they are elastic in AWS
CloudFormation is a great cookie cutter
Your infrastructure as code.
This is a STACK. JavaScript Object Notation (JSON). A template of your datacenter / workload. Your infrastructure as code.
Template sections: Headers, Parameters, Mappings, Resources, Outputs
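The template sections named above can be sketched as a Python dictionary that serializes to JSON. This is a minimal, hypothetical web tier: the logical names (WebLaunchConfig, WebGroup, RegionToAmi, DesiredCapacity) are invented for the example, and the AMI ID and instance type are simply reused from later slides.

```python
import json


def build_template():
    """Return a minimal CloudFormation-style template as a dict."""
    return {
        # Header
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical web tier",
        # Values supplied when the stack is created
        "Parameters": {
            "DesiredCapacity": {"Type": "Number", "Default": "2"},
        },
        # Static lookup tables
        "Mappings": {
            "RegionToAmi": {"ap-southeast-2": {"Ami": "ami-0535d66c"}},
        },
        "Resources": {
            "WebLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {
                        "Fn::FindInMap": ["RegionToAmi", {"Ref": "AWS::Region"}, "Ami"]
                    },
                    "InstanceType": "m3.large",
                },
            },
            "WebGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "WebLaunchConfig"},
                    "MinSize": "2",
                    "MaxSize": "100",
                    "DesiredCapacity": {"Ref": "DesiredCapacity"},
                },
            },
        },
        "Outputs": {
            "GroupName": {"Value": {"Ref": "WebGroup"}},
        },
    }


template = build_template()
template_json = json.dumps(template, indent=2)
print(template_json)
```

Since the result is plain text, it can be kept in version control like any other source file.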
Git Subversion Mercurial
Dev
Test
Prod
CloudFormation is context aware
Your infrastructure as code.
Create: DEV (dev.mysite.com)
Create: TEST (test.mysite.com)
Create: PROD (prod.mysite.com)
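One way to picture a single template serving dev, test and prod is a per-environment table of parameter values that is passed in at stack creation. In the sketch below the hostnames come from the slide, while the group sizes and the parameter names (HostName, MinSize, MaxSize) are assumptions for illustration. The output uses the ParameterKey / ParameterValue shape that stack-creation parameters take.

```python
# Hostnames are from the slide; the sizes are hypothetical.
ENVIRONMENTS = {
    "dev": {"hostname": "dev.mysite.com", "min_size": 1, "max_size": 1},
    "test": {"hostname": "test.mysite.com", "min_size": 1, "max_size": 4},
    "prod": {"hostname": "prod.mysite.com", "min_size": 2, "max_size": 100},
}


def stack_parameters(env):
    """Translate one environment's settings into stack parameters."""
    settings = ENVIRONMENTS[env]
    pairs = (
        ("HostName", settings["hostname"]),
        ("MinSize", settings["min_size"]),
        ("MaxSize", settings["max_size"]),
    )
    return [{"ParameterKey": k, "ParameterValue": str(v)} for k, v in pairs]


for env in ("dev", "test", "prod"):
    print(env, stack_parameters(env))
```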
Elastic resources require utility pricing
Enabling customers to Optimize Costs based on Utilization
Meeting base workload, variable and peak with different pricing models
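A small arithmetic sketch of mixing pricing models follows. It assumes, purely for illustration, that the base workload is covered by reserved capacity, the variable band by on-demand instances, and anything above that by spot; the prices (in cents per instance-hour) are made up and are not AWS prices.

```python
# Hypothetical prices in cents per instance-hour.
PRICES = {"reserved": 6, "on_demand": 10, "spot": 3}


def split_demand(demand, base, variable_ceiling):
    """Split an instance count into (reserved, on_demand, spot)."""
    reserved = base  # paid for whether or not it is used
    on_demand = min(max(demand - base, 0), variable_ceiling - base)
    spot = max(demand - variable_ceiling, 0)
    return reserved, on_demand, spot


def hourly_cost(demand, base, variable_ceiling, prices=PRICES):
    """Cost in cents for one hour at the given demand."""
    reserved, on_demand, spot = split_demand(demand, base, variable_ceiling)
    return (
        reserved * prices["reserved"]
        + on_demand * prices["on_demand"]
        + spot * prices["spot"]
    )


print(split_demand(10, base=4, variable_ceiling=8))
print(hourly_cost(10, base=4, variable_ceiling=8))
```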
Architecting Tips for scaling to meet Seasonal Patterns
Auto Scaling groups are useful for more than just fault tolerance
• Vertical Scaling
• Horizontal Scaling
• Auto Scaling
• Scheduled Scaling
• Programmatic Scaling
• Database Tier Scaling
• Asynchronous Process Scaling
• Event Scaling
ASG == Minimum unit of deployment
myAutoScalingGroup - myLaunchConfig - Min 1 - Max 1 - Desired 1
Launch Configuration: myLaunchConfig - ami-0535d66c - m3.large
Availability Zones: ap-southeast-2a, ap-southeast-2b
myElasticLoadBalancer
A minimum of 1 instance turns an Auto Scaling group into an auto-healing group
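The auto-healing behaviour of a group with min, max and desired all set to 1 can be pictured as a reconciliation step: compare the healthy instances with the target and launch or terminate the difference. This is a simplified model written for this example, not the service's actual algorithm, and the instance ID is hypothetical.

```python
def reconcile(healthy_instances, minimum, desired, maximum):
    """Return how many instances to launch (positive) or terminate (negative)."""
    # Keep the target inside the group's bounds.
    target = max(minimum, min(desired, maximum))
    return target - len(healthy_instances)


# The single instance failed its health check, so one replacement is launched.
print(reconcile([], minimum=1, desired=1, maximum=1))
# The replacement is healthy, so nothing more needs to happen.
print(reconcile(["i-example"], minimum=1, desired=1, maximum=1))
```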
Vertical Scaling (Scale UP)
Vertical Scaling using different instance types
[Figure: DB instance type by day of the month (days 1 to 29): end-of-month scaling, 75% savings]
Micro: 613 MB, up to 2 ECUs (for short bursts)
Small: 1.7 GB, 1 ECU, 1 virtual core
Medium: 3.75 GB, 2 ECUs, 1 virtual core
Large: 7.5 GB, 4 ECUs, 2 virtual cores
Extra Large: 15 GB, 8 ECUs, 4 virtual cores
Hi-Mem XL: 17.1 GB, 6.5 ECUs, 2 virtual cores
Hi-Mem 2XL: 34.2 GB, 13 ECUs, 4 virtual cores
Hi-Mem 4XL: 68.4 GB, 26 ECUs, 8 virtual cores
High-CPU Med: 1.7 GB, 5 ECUs, 2 virtual cores
High-CPU XL: 7 GB, 20 ECUs, 8 virtual cores
Cluster GPU 4XL: 22 GB, 33.5 ECUs, 8 Nehalem virtual cores, 2 x NVIDIA Tesla “Fermi” M2050 GPUs
Cluster Compute 4XL: 23 GB, 33.5 ECUs, 8 Nehalem virtual cores
Cluster Compute 8XL: 60.5 GB, 88 ECUs, 2 x 8-core Intel Xeon
Memory intensive Cluster Compute
Processor Intensive
Average Applications
Minimal resources
Multiple Family Types, optimized for different uses
Multiple sizes of instance within a family type
Vertical Scaling using Launch Configurations
myAutoScalingGroup - smallConfig - Min 1 - Max 2 - Desired 1 - TP: Oldest Instance
ami-0535d66c
ElasticIP (EIP) / Elastic NIC (ENI)
Launch Config A: smallConfig - ami-0535d66c - small
ap-southeast-2a
Launch Config B: bigConfig - ami-0535d66c - large
UPDATE myAutoScalingGroup - bigConfig - Min 1 - Max 2 - Desired 2 - TP: Oldest Instance
Vertical Scaling
UPDATE Desired = 1
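The sequence on this slide (switch the group to the larger launch configuration, raise desired to 2, then drop back to 1 so the oldest instance is terminated) can be walked through with a toy model. The Group class below is a hypothetical stand-in written for this example; it is not an AWS SDK object, and the instance IDs are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    launch_config: str
    desired: int
    instances: list = field(default_factory=list)  # oldest first
    launched: int = 0

    def converge(self):
        """Launch or terminate until the instance count matches desired."""
        while len(self.instances) < self.desired:
            self.launched += 1
            # New instances always use the current launch configuration.
            self.instances.append((f"i-{self.launched}", self.launch_config))
        while len(self.instances) > self.desired:
            self.instances.pop(0)  # termination policy: oldest instance


group = Group("smallConfig", desired=1)
group.converge()                  # one small instance running

group.launch_config = "bigConfig"
group.desired = 2
group.converge()                  # a large instance joins the small one

group.desired = 1
group.converge()                  # the oldest (small) instance is removed
print(group.instances)
```

The group ends with a single instance built from the larger configuration, which is the scale-up the slide describes.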
Database Tier scaling is automated when using RDS
Push Button Scaling: UP / DOWN
Read Only Replica: IN / OUT
Snapshot & Restore: ON / OFF
Database Tier management is heavily automated using RDS
High Availability
Host Replacement
High Scalability
Asynchronous Replication
Horizontal Scaling (Scale OUT)
Availability Zones: ap-southeast-2a, ap-southeast-2b
Launch Configuration: myLaunchConfig - ami-0535d66c - m3.large
myAutoScalingGroup - myLaunchConfig - Min 2 - Max 100 - Desired 2
elb-cname.amazonaws.com
ASG UPDATE Desired = 4
Elastic Load Balancing (ELB) over multiple Availability Zones (AZs)
ASG UPDATE Desired = 2
HOST LEVEL METRICS
AGGREGATE LEVEL METRICS
LOG ANALYSIS
EXTERNAL SITE PERFORMANCE
Auto Scaling (Elastic Usage)
Availability Zones: ap-southeast-2a, ap-southeast-2b
Launch Configuration: myLaunchConfig - ami-0535d66c - m3.large
myAutoScalingGroup - myLaunchConfig - Min 2 - Max 100 - Desired 2
Desired = 4
Auto Scaling using Policies to Scale Out
Scale UP +1
Scale DOWN -1
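A +1 / -1 policy of this kind can be sketched as a function that nudges the desired count when a metric crosses a threshold and never leaves the group's min/max bounds. The CPU metric and the 70% / 30% thresholds are assumptions for the example; the min of 2 and max of 100 match the group on the slide.

```python
def apply_policy(desired, cpu_percent, minimum=2, maximum=100, high=70.0, low=30.0):
    """Return the new desired capacity after one policy evaluation."""
    if cpu_percent > high:
        desired += 1   # scale out by one instance
    elif cpu_percent < low:
        desired -= 1   # scale in by one instance
    # The group never goes outside its configured bounds.
    return max(minimum, min(maximum, desired))


print(apply_policy(2, cpu_percent=85.0))
print(apply_policy(5, cpu_percent=10.0))
```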
Availability Zones: ap-southeast-2a, ap-southeast-2b
Launch Configuration: myLaunchConfig - ami-0535d66c - m3.large
myAutoScalingGroup - myLaunchConfig - Min 2 - Max 100 - Desired 2
API Update Desired = 4
Auto Scaling using API to Scale In / Out
Scale UP +1
Scale DOWN -1
AutoScalingGroups* - myLaunchConfig - Min 0 - Max 100 - Desired 0
Availability Zones: ap-southeast-2a, ap-southeast-2b
Launch Configuration: launchWhenCheap - ami-0535d66c - m3.large - Spot price: 0.05
Automate Workload Patterns using Scheduled Scaling
as-put-scheduled-update-group-action ScaleUp --auto-scaling-group my-test-asg --recurrence "30 0 1 1,6,12 0" --desired-capacity 20
as-put-scheduled-update-group-action ScaleOff --auto-scaling-group my-test-asg --start-time "2013-05-13T08:00:00Z" --desired-capacity 0
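The --recurrence value in the first command is a five-field cron expression. The sketch below only names the fields of such an expression; it handles plain numbers, comma lists and "*", and is an illustrative helper rather than a full cron parser.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_recurrence(expr):
    """Map each cron field name to "*" or a list of integers."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return {
        name: "*" if part == "*" else [int(v) for v in part.split(",")]
        for name, part in zip(CRON_FIELDS, parts)
    }


schedule = parse_recurrence("30 0 1 1,6,12 0")
print(schedule)
```

Read this way, the ScaleUp action is tied to 00:30 in January, June and December, with the day fields set to 1 and 0 respectively.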
Auto Scaling with Alarms & Policies
Achieve High Utilization with this style of architecture, eliminating waste
Trigger auto-scaling policy
Reserved Instances On Demand Spot Pricing
Scheduled Adaptive Predictive
Optimize delivery using S3 static hosting and CloudFront
London
Paris
NY
1. Single CNAME www.mysite.com
2. Served from EC2: *.php
3. Served from S3: /images/*
Lower Cost Lower Latency Higher Scale
Fault Tolerance High Availability High Utilization
Scaling Asynchronous Processing
Asynchronous Process Scaling with SQS Messaging
• Amazon managed queue service
• Decouple your components
• Think parallel
• Implement elasticity
• Drive Auto Scaling fleets using Queue Depth
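Driving a worker fleet from queue depth comes down to a small calculation: divide the backlog by what one worker can absorb, round up, and clamp to the group's bounds. The function name and the messages-per-worker figure below are assumptions for illustration.

```python
import math


def desired_workers(queue_depth, messages_per_worker, minimum, maximum):
    """Return how many workers the current backlog calls for."""
    if messages_per_worker <= 0:
        raise ValueError("messages_per_worker must be positive")
    needed = math.ceil(queue_depth / messages_per_worker)
    # Stay within the Auto Scaling group's min and max.
    return max(minimum, min(maximum, needed))


print(desired_workers(1050, messages_per_worker=100, minimum=2, maximum=50))
```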
Controller A Controller B Controller C
Controller A Controller B Controller C
Q Q Q
Tight Coupling
Loose Coupling using Queues
Amazon SQS
Processing task/processing trigger
Processing results
Min 5 Min 10 Min 2
S3 Bucket For Ingest
User
SNS Topic
RRS S3 Bucket to Serve content to CloudFront
S3 Bucket For originals
CloudFront Download Distribution
SQS Queue Size for Thumbnail
SQS Queue Size Image for Mobile
SQS Queue Size Image for Web
Auto scaling Group
Instances
Auto scaling Group
Instances
Auto scaling Group
Instances
Asynchronous Process Scaling with SQS Messaging (SQS)
S3 Bucket For Ingest
User
RRS S3 Bucket to Serve content to CloudFront
S3 Bucket For originals
CloudFront Download Distribution
Auto scaling Group
Instances
Auto scaling Group
Instances
Auto scaling Group
Instances
SWF
Instance running decider
Asynchronous Process Scaling with Simple Workflow (SWF)
AutoScalingGroups* - myLaunchConfig - Min 0 - Max 100 - Desired 0
Availability Zones: ap-southeast-2a, ap-southeast-2b
Launch Configuration: launchWhenCheap - ami-0535d66c - m3.large - Spot price: 0.05
Optimize costs using Auto Bidding groups and spot pricing
aws autoscaling create-launch-configuration --launch-configuration-name launchWhenCheap --image-id ami-0535d66c --instance-type m3.large --spot-price 0.05
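The effect of a bid-based launch configuration can be modelled very simply: capacity is available while the market price is at or below the bid, and drops to zero otherwise. The sketch assumes that bidding model and uses invented market prices; only the 0.05 bid comes from the slide.

```python
def spot_capacity(current_price, bid, wanted):
    """Return the instance count obtainable at the current spot price."""
    return wanted if current_price <= bid else 0


# Hypothetical market prices over three periods, against the 0.05 bid.
history = (0.03, 0.05, 0.08)
print([spot_capacity(price, bid=0.05, wanted=10) for price in history])
```

A workload built this way has to tolerate losing its instances when the price rises above the bid, which is why the talk pairs it with queue-driven, asynchronous processing.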
SQS queue
Consumers
Producer
Consumers
Amazon Elastic MapReduce Hadoop Cluster
HDFS
Task Node
Core Node
Amazon S3
Amazon DynamoDB/RDS
BI Apps
Logs from EC2 Instances via Flume/Fluentd (Log Aggregator)
Code/ Scripts
Amazon S3
Amazon Elastic MapReduce
HiveQL Pig Latin Cascading
Mapper Reducer
Runs multiple JobFlow Steps
Name Node
JDBC/ODBC
HiveQL Pig Latin
Query
Task Node
Core Node
Scale to 1000s of nodes when needed and back to zero using EMR
Optionally using a Spot Pricing strategy on task nodes
Event Based Scaling
Parameterized Scaling via CloudFormation
myAutoScalingGroup - myLaunchConfig - Min 2 - max 100 - Desired inputParameter
Are you confident in your N+1?
February, 2012
Automated failover using pilot light configurations
Web Server
Application Server
Database Server
Data Volume
Data Mirroring/ Replication
Not Running
Smaller Instance
Amazon Route 53
User or system
Web Server
Application Server
Database Server
Data Volume
UPDATE: Desired = 0 → 1, Desired = 0 → 1, Desired = 1 → 1
Web Server
Application Server
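The pilot-light failover above amounts to changing the desired count of each tier. The sketch below assumes the three updates on the slide refer to the web, application and database tiers in that order, which the extracted text does not state explicitly; the tier names are labels chosen for the example.

```python
# Desired counts while the secondary site idles (assumed tier order).
PILOT_LIGHT = {"web": 0, "app": 0, "db": 1}
# Desired counts once failover is triggered.
FAILOVER = {"web": 1, "app": 1, "db": 1}


def failover_updates(current, target):
    """Return {tier: (old, new)} for tiers whose desired count must change."""
    return {
        tier: (current.get(tier), new)
        for tier, new in target.items()
        if current.get(tier) != new
    }


print(failover_updates(PILOT_LIGHT, FAILOVER))
```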
Just-in-time systems which can be brought up during an event
• ~30th biggest E-commerce operation, globally
• ~200 distinct applications, many mobile
• Hundreds of new, untested analytical approaches
• Processing hundreds of TB of data on thousands of servers
• Spikes of hundreds of thousands of concurrent users
• Critically compressed budget
• Less than a year to execute
• Core systems will be used for a single critical day
• Constitutionally-mandated completion date
Support Systems which can be retired immediately after an event
THANK YOU Please give us your feedback by filling out the Feedback Forms