Auto-Scaling to Minimize Cost and Meet Application Deadlines in Cloud Workflows
Post on 28-Jan-2015
Ming Mao, Marty Humphrey
CS Department, University of Virginia
SC 11 (Nov 16, TCC 305)
Introduction
Resource provisioning questions are not trivial
Under-provisioning → hurts performance
Over-provisioning → pay more than necessary
How many resources?
What types of resources?
When to acquire or release?
How to use them?
A performance-resource mapping problem
Auto-Scaling
Schedule-based and rule-based auto-scaling
E.g. “run 10 instances between 8 AM and 6 PM every day and 2 instances at all other times.”
E.g. “add (remove) 2 instances when the average CPU utilization is above 70% (below 20%) for 5 minutes.”
Simple and convenient, works well for simple applications
What if the relationship between performance and the resource utilization indicators is complex?
The resource utilization indicators are low-level and may not be expressive enough
They do not consider the user budgets well
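The CPU-utilization rule quoted above can be sketched as follows. This is only an illustration of the rule-based style, not any provider's API; the function name `rule_based_delta` and its parameters are hypothetical, and the samples are assumed to cover one 5-minute window.

```python
def rule_based_delta(cpu_samples, high=70.0, low=20.0, step=2):
    """Return the change in instance count for one window of CPU samples (%).

    Mirrors the rule "add (remove) 2 instances when the average CPU
    utilization is above 70% (below 20%) for 5 minutes".
    """
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > high:
        return step       # scale out
    if avg < low:
        return -step      # scale in
    return 0              # stay as is


print(rule_based_delta([82, 75, 91, 78, 80]))  # → 2
```

The rule sees only a low-level indicator: nothing in it refers to deadlines or budget, which is the limitation the slide points out.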
Auto-Scaling
Goals of auto-scaling mechanisms
Balance performance and cost
E.g. meet performance goals at minimum cost, or maximize utility within a limited budget
Reflect different options for computing resources
E.g. VMs differ in processing power and price
Be aware of practical considerations
E.g. a VM may take several minutes to be ready for use
Be aware of the cloud billing model
E.g. billed by instance-hours
Support specific application performance requirements
E.g. deadlines, the number of concurrent users, communication latency
Cloud application model
[Figure: example cloud application made of 11 service units: Entry Point (1), Authentication (2), Data Validation (3), Loading Profile (4), Credit History (5), Health Record (6), Base Model (7), Third Party Evaluation (8), Advanced Model (9), Complete Model (10), Response (11). Jobs from three classes (Non-Member, Silver Member, Gold Member) are processed by these service units on cloud VMs managed by an auto-scaling component.]
App consists of service units
Job consists of tasks
Jobs are categorized into classes (deadline and processing flow)
Cloud offers multiple VM types (price and processing power)
The app has no advance knowledge of the workload
VMs take time to start up (VM acquisition delay) and are billed by the hour
Problem definition
Cloud application: $app = \{S_i\}$
Job class: $J = \{DAG(S_i),\ deadline \mid S_i \in app\}$
Cloud VM: $VM_v = \{[j^{J}_{S_i}]_v,\ c_v,\ lag_v\}$
Workload: $W_t = \sum_{J, S_i} j^{J}_{S_i}$
Scaling plan: $Scaling_t = \{VM_v,\ N_v\}$
Scheduling plan: $Schedule_t = \{j^{J}_{S_i} \rightarrow VM_v\}$
Goal: $Min(C) = Min\left(\sum_{v} c_v N_v\right)$
Solution
SCS (Scaling-Consolidation-Scheduling)
Task bundling
Deadline assignment
Scaling
Instance consolidation
Scheduling
Solution – Step 1
Task bundling
Idea: force tasks to run on the same instance to improve performance and save data transfer cost
Example
[Figure: before bundling, T6 runs on Server 1 and T8 on Server 2; after bundling, both run on Server 1 and are treated as one bundled task T6'.]
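A minimal sketch of the bundling idea, under an assumption the slide does not state: consecutive tasks in a chain are bundled when they prefer the same VM type. The function `bundle`, the tuple layout, and the sample VM preferences are hypothetical.

```python
from itertools import groupby


def bundle(chain):
    """Group consecutive tasks that share a preferred VM type.

    chain: list of (task_name, preferred_vm_type) in execution order.
    Returns a list of (task_names, vm_type); each entry would run on
    one instance, so no data moves between servers inside a bundle.
    """
    bundles = []
    for vm_type, group in groupby(chain, key=lambda task: task[1]):
        bundles.append(([name for name, _ in group], vm_type))
    return bundles


chain = [("T6", "Standard"), ("T8", "Standard"), ("T9", "High-CPU")]
print(bundle(chain))
# → [(['T6', 'T8'], 'Standard'), (['T9'], 'High-CPU')]
```

Here T6 and T8 end up in one bundle, matching the T6' example in the figure.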
Solution – Step 2
Deadline assignment
Idea: to break task dependencies, assign deadlines proportionally based on task running time (on their cost-efficient machines)
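Proportional deadline assignment can be illustrated on the simplest case, a linear chain of tasks. This sketch assumes a chain rather than a general DAG; the function name, the calendar date, and the running times are hypothetical.

```python
from datetime import datetime, timedelta


def assign_deadlines(start, deadline, run_minutes):
    """Split the window [start, deadline] among chained tasks.

    Each task receives a share of the window proportional to its
    estimated running time; the result is one sub-deadline per task.
    """
    total = sum(run_minutes)
    window = (deadline - start).total_seconds()
    deadlines, elapsed = [], 0
    for minutes in run_minutes:
        elapsed += minutes
        deadlines.append(start + timedelta(seconds=window * elapsed / total))
    return deadlines


start = datetime(2011, 11, 16, 15, 0)      # 3:00 PM, arbitrary date
end = datetime(2011, 11, 16, 16, 30)       # 4:30 PM job deadline
for d in assign_deadlines(start, end, [10, 20, 15]):
    print(d.strftime("%H:%M"))             # 15:20, 16:00, 16:30
```

Once every task has its own deadline, each task can be provisioned and scheduled without looking at its predecessors again.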
Example
[Figure: DAG of tasks T1 to T13 before and after deadline assignment. Before, only the job window from 3:00 PM to 4:30 PM is given; after, the timeline is divided at 3:10, 3:20, 3:50, 4:00, 4:20 and 4:30.]
Task upgrading
$rank = \frac{makespan_{before} - makespan_{after}}{cost_{after} - cost_{before}}$
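The rank formula is a ratio of time gained to money spent. The sketch below evaluates it for two hypothetical upgrade candidates and picks the one with the highest rank; the selection rule and all numbers are assumptions for illustration, as the slide gives only the formula.

```python
def upgrade_rank(makespan_before, makespan_after, cost_before, cost_after):
    """rank = (makespan_before - makespan_after) / (cost_after - cost_before)."""
    extra_cost = cost_after - cost_before
    if extra_cost == 0:
        raise ValueError("rank is undefined when the upgrade changes no cost")
    return (makespan_before - makespan_after) / extra_cost


# (makespan before, makespan after, cost before, cost after); made-up values
candidates = {
    "T3": (90, 70, 0.085, 0.68),
    "T7": (90, 80, 0.085, 0.50),
}
best = max(candidates, key=lambda name: upgrade_rank(*candidates[name]))
print(best)  # → T3
```

T3 shortens the makespan by 20 minutes for $0.595 extra, while T7 gains 10 minutes for $0.415, so T3 buys more time per dollar.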
Solution – Step 3
Determine the number of instances
From deadline assignment, we have
Task running time: $t_m$
Task execution interval: $[T_0, T_1]$
Load vector
$LV_m = [t_m / (T_1 - T_0)]$
# of instances = $[LV_m]$
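Example
[Figure: load vectors over the interval 3:00 to 4:00, with marks at 3:15 and 3:45. Per-task loads (e.g. 0.25 for T1 and 0.5 for T2) are added into a total load per VM type (e.g. 0.75 for VM1).]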
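The load computation can be sketched as below. Each task contributes $t_m / (T_1 - T_0)$ to its VM type, and the loads are summed per type. Rounding the total up to a whole instance is an assumption made here to obtain an integer count; the function name and the task data are hypothetical.

```python
import math
from collections import defaultdict


def instances_needed(tasks):
    """tasks: iterable of (vm_type, run_minutes, interval_minutes).

    Returns (counts, loads): the summed load per VM type and the
    instance count obtained by rounding that load up (an assumption).
    """
    loads = defaultdict(float)
    for vm_type, run, interval in tasks:
        loads[vm_type] += run / interval   # t_m / (T1 - T0)
    counts = {vm: math.ceil(load) for vm, load in loads.items()}
    return counts, dict(loads)


counts, loads = instances_needed([("VM1", 15, 60), ("VM1", 30, 60)])
print(loads)   # → {'VM1': 0.75}
print(counts)  # → {'VM1': 1}
```

A total load of 0.75 means one instance of that type is busy three quarters of the interval, which leaves a partial instance hour unused.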
Solution – Step 5
Instance consolidation
Idea: put tasks on the same instance even if some tasks do not run most cost-efficiently on that instance type
Example
[Figure: before consolidation, T11 runs on a High-CPU instance and T12 on a Standard instance between 3:00 PM and 4:00 PM, each leaving idle time; after consolidation, T11 and T12 share one instance in the same hour.]
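Why consolidation can pay off under hourly billing is easiest to see with numbers. The sketch below compares two partly idle instances with one shared instance. The prices come from the evaluation setup in this deck; the running times and the helper names are hypothetical, and the deadline check is reduced to "both tasks fit in the shared hour".

```python
import math

PRICE = {"Standard": 0.085, "High-CPU": 0.68}  # $/hour


def billed(vm_type, busy_minutes):
    """Cost of one instance billed by full instance-hours."""
    return math.ceil(busy_minutes / 60) * PRICE[vm_type]


def consolidation_saving(run_a, home_a, run_b, home_b, target, window=60):
    """Saving ($) from moving both tasks to one `target` instance.

    run_a / run_b map VM type -> running time in minutes.
    Returns 0.0 when the tasks do not fit in the shared window.
    """
    combined = run_a[target] + run_b[target]
    if combined > window:
        return 0.0
    separate = billed(home_a, run_a[home_a]) + billed(home_b, run_b[home_b])
    return max(0.0, separate - billed(target, combined))


t11 = {"High-CPU": 20, "Standard": 45}   # slower on Standard
t12 = {"High-CPU": 5, "Standard": 10}
saving = consolidation_saving(t11, "High-CPU", t12, "Standard", "Standard")
print(round(saving, 3))  # → 0.68
```

T11 runs slower on the Standard instance, but both tasks still finish within the hour, so the whole High-CPU instance hour is saved.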
Solution – Step 6
Scheduling – Earliest Deadline First
Dynamic scaling ensures that tasks at risk of missing their deadlines are detected in time
$\sum_i \frac{t_i}{T_{end\_i} - T_{start\_i}} < 1$
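A minimal sketch of Earliest Deadline First together with the load condition above. The condition is read here as a per-instance check (an assumption): if the summed load of the tasks on an instance stays below 1, they fit; otherwise more capacity is needed. Task data and function names are hypothetical.

```python
def total_load(tasks):
    """Sum of t_i / (T_end_i - T_start_i) over all tasks.

    tasks: list of (name, run_minutes, start_minute, end_minute).
    """
    return sum(run / (end - start) for _, run, start, end in tasks)


def edf_order(tasks):
    """Task names sorted by deadline, earliest first."""
    return [name for name, _, _, _ in sorted(tasks, key=lambda t: t[3])]


tasks = [("T12", 10, 0, 60), ("T11", 20, 0, 40), ("T13", 5, 0, 30)]
print(edf_order(tasks))        # → ['T13', 'T11', 'T12']
print(total_load(tasks) < 1)   # → True
```

With a load of about 0.83 the three tasks fit on one instance; a load of 1 or more would be the signal to acquire another instance before deadlines are missed.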
Solution – Overview
[Figure: overview of the SCS steps, including a parallelism reduction step.]
Evaluation
Workload patterns
Application models
Baselines: Greedy, GAIN
Parameter         Value
Time              72 hours
Task execution    Randomly generated
VM lag            8 min
VM Type           Price
Micro             $0.02/hour
Standard          $0.085/hour
High-CPU          $0.68/hour
High-Memory       $0.50/hour
Evaluation
SCS cost saving ranges from 6.8% to 40.4%
The performance difference is larger with longer deadlines
Evaluation – High volume vs. low volume
High workload (10X) vs. low workload (X)
Pipeline, 1-hour deadline
[Figure: bar chart "High Volume vs. Low Volume"; cost ($), axis from 0 to 120, for the Stable, Growing, Cycle and OnOff workload patterns; series: Greedy-High, GAIN-High, SCS-High, Greedy-Low, GAIN-Low, SCS-Low.]
Evaluation – Imprecise parameters
Pipeline application, 20% variance in estimated execution time, 0.5-hour deadline
SCS finishes more than 90% of jobs before their deadlines, much better than Greedy (40%) and GAIN (50%)
Pipeline application, 20% variance in the estimated VM acquisition time, 1-hour deadline
SCS beats Greedy and GAIN
Performance is more affected by the VM acquisition time
[Figure: "Deadline (0.5 hour) Non-Miss Rate for Imprecise Task Execution Estimation"; non-miss rate (%), axis from 0% to 100%, for the Stable, Growing, Cycle and OnOff workload patterns; series: Greedy, GAIN, SCS.]
[Figure: "Deadline (1 hour) Non-Miss Rate for Imprecise Instance Acquisition Lag"; non-miss rate (%), axis from 0% to 100%, for the Stable, Growing, Cycle and OnOff workload patterns; series: Greedy, GAIN, SCS.]
Related work
Dynamic resource provisioning in virtualized environments
Multi-tier web applications, queuing theory, control theory
Workflow scheduling in Grid environments with deadline and budget constraints
Single workflow instance
Resource pool is limited
Cloud economics
Cloud provider side vs. cloud user side
Current cloud auto-scaling mechanisms
E.g. AWS auto-scaling, RightScale, enStratus, Scalr, AzureScale project, etc.
Conclusion and future work
Conclusions
SCS cost saving ranges from 6.8% to 40.4%
SCS can better handle different workload volumes and imprecise parameters
Choosing proper VM types based on the workload saves cost
Instance consolidation can help save partial instance hours
VM acquisition time plays a very important role
Future work
Different scheduling approaches
Real scientific applications
Insufficient budget cases: maximize cloud user benefits/utilities under budget constraints
Data-intensive applications
Thank you!