
Fault-Tolerant Workflow Scheduling Using Spot Instances on Clouds

Deepak Poola, Kotagiri Ramamohanarao, and Rajkumar Buyya

Cloud Computing and Distributed Systems (CLOUDS) Laboratory, Department of Computing and Information Systems, The University of Melbourne

Email: deepakc@student.unimelb.edu.au,{kotagiri,rbuyya}@unimelb.edu.au

ICCS-2014, Cairns, Australia

Cloud Computing

• Offers resources as a subscription-based service

• Highly scalable

• Highly available

• Driven by market principles

• Dynamically configured and delivered on demand

• Different pricing models

Benefits of Cloud Computing

• Scalability or elasticity

• On-Demand resource provisioning

• Wide range of resource types

• Pay-as-you-go model

• Attractive cost models

• Illusion of unlimited resources

• Cheap and fast storage facilities

• Plethora of tools for ease of use
  – Content delivery
  – Monitoring
  – Networking
  – Deployment and management

Spot Instances

• Introduced by Amazon around December 2009

• Sell idle or unused datacenter capacity

• The spot price is set by an auction-like mechanism

• Varies with time and instance type

• Varies between regions and availability zones

• The bid must be higher than or equal to the spot price for the instance to keep running; when the spot price rises above the bid, the instance is terminated (an out-of-bid failure)

• Offers cost reductions of up to 60%
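The out-of-bid rule above can be made concrete with a small simulation. This is a minimal sketch, not part of the original work. It assumes a spot price history stored as sorted (time in hours, price) change points, where each price holds until the next change. The function name `first_out_of_bid_time` and the trace values are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a spot price history: (time_in_hours, price)
# change points sorted by time; each price holds until the next change point.
PriceTrace = List[Tuple[float, float]]


def first_out_of_bid_time(trace: PriceTrace, bid: float, start: float) -> Optional[float]:
    """Return the first time >= start at which the spot price exceeds `bid`,
    i.e. when a spot instance bidding `bid` would be terminated (an out-of-bid
    failure). Return None if the bid stays at or above the spot price."""
    for i, (t, price) in enumerate(trace):
        # This price is in effect on the interval [t, t_end).
        t_end = trace[i + 1][0] if i + 1 < len(trace) else float("inf")
        if t_end <= start:
            continue  # interval ends before the instance was started
        if price > bid:
            return max(t, start)
    return None


trace = [(0.0, 0.10), (2.0, 0.12), (5.0, 0.30), (6.0, 0.11)]
print(first_out_of_bid_time(trace, bid=0.15, start=0.0))  # 5.0
print(first_out_of_bid_time(trace, bid=0.35, start=0.0))  # None
```

A higher bid survives more price spikes but exposes the user to higher spot prices. The strategies later in the deck manage exactly this trade-off.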

Workflows

• Scientific workflow systems aim to automate large, complex data analyses, making them easier for scientists to carry out.

• A workflow is a collection of tasks that are data-dependent or control-dependent; it can be represented as a directed acyclic graph (DAG).

• Workflow scheduling maps tasks to resources while maintaining their dependencies.

• Key terms
  – Makespan
  – Cost
  – Deadline
  – Budget

Sample Workflow
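A scheduler that "maintains dependencies" can only dispatch a task once all its parents have finished. One standard way to obtain such an order for a DAG is Kahn's topological sort. The sketch below is illustrative, not the authors' scheduler. It assumes the workflow is given as a dict mapping each task to the list of its parent tasks, with every parent also present as a key; the task names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order workflow tasks so every task appears after all of its parents
    (Kahn's algorithm). `deps` maps each task to the list of its parents."""
    # Count unfinished parents per task and build parent -> children edges.
    indegree = {task: len(parents) for task, parents in deps.items()}
    children: Dict[str, List[str]] = {task: [] for task in deps}
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)

    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:  # all parents scheduled
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("dependency graph contains a cycle; not a DAG")
    return order


deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(topological_order(deps))  # ['A', 'B', 'C', 'D']
```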

Research overview

• A just-in-time and adaptive scheduling heuristic

• Uses spot and on-demand instances

• An intelligent bidding strategy

• Minimizes the execution cost

• Provides a robust schedule

• Satisfies the deadline constraint

Background

• A workflow is represented as a DAG

• Makespan is the total elapsed time of the workflow execution

• Pricing models
  – On-Demand
  – Spot

• Critical Path is the longest path from the start node to the exit node
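The critical path is the runtime-weighted longest path through the DAG. With unlimited resources and no data-transfer delays (both simplifying assumptions made here), its length is also the shortest achievable makespan. The following is a minimal sketch computing it by memoized earliest-finish times. The example runtimes are made up for illustration.

```python
from typing import Dict, List, Tuple


def critical_path(deps: Dict[str, List[str]],
                  runtime: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return (length, path) of the longest runtime-weighted path in the DAG.
    `deps` maps each task to its parents; `runtime` gives each task's duration."""
    finish: Dict[str, float] = {}    # earliest finish time of each task
    best_parent: Dict[str, str] = {}  # parent that determines each task's start

    def earliest_finish(task: str) -> float:
        if task not in finish:
            start = 0.0
            for p in deps[task]:
                f = earliest_finish(p)
                if f > start:  # the latest-finishing parent gates this task
                    start = f
                    best_parent[task] = p
            finish[task] = start + runtime[task]
        return finish[task]

    exit_task = max(deps, key=earliest_finish)
    path = [exit_task]
    while path[-1] in best_parent:  # walk back to the entry task
        path.append(best_parent[path[-1]])
    return finish[exit_task], path[::-1]


deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
runtime = {"A": 2.0, "B": 5.0, "C": 3.0, "D": 1.0}
print(critical_path(deps, runtime))  # (8.0, ['A', 'B', 'D'])
```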

Latest Time to On-Demand (LTO)

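• LTO is the latest time at which the algorithm must switch to on-demand instances to still satisfy the deadline constraint

Figure: timeline from Start through LTO to Deadline; spot instances are used up to LTO, on-demand instances from LTO to the Deadline.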
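The slides do not give an explicit LTO formula. A natural reading is the deadline minus the time the remaining critical path would take on on-demand instances. The sketch below assumes that form, with no failures once the switch happens. Both function names are hypothetical.

```python
def latest_time_to_on_demand(deadline: float, remaining_cp_on_demand: float) -> float:
    """Assumed form: LTO = deadline - remaining critical-path time on on-demand
    instances (i.e. the last moment at which switching still meets the deadline)."""
    return deadline - remaining_cp_on_demand


def choose_pricing_model(current_time: float, deadline: float,
                         remaining_cp_on_demand: float) -> str:
    """Use spot instances while there is slack before LTO, on-demand afterwards."""
    lto = latest_time_to_on_demand(deadline, remaining_cp_on_demand)
    return "spot" if current_time < lto else "on-demand"


# Deadline at 10 h with 4 h of critical path left gives LTO = 6 h.
print(choose_pricing_model(5.0, 10.0, 4.0))  # spot
print(choose_pricing_model(6.0, 10.0, 4.0))  # on-demand
```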

System Model

Runtime Estimation

• We use Downey’s analytical model

• Downey’s model requires:
  – the task’s average parallelism, A
  – the coefficient of variation of parallelism, σ
  – the task length
  – the number of cores

• The model of Cirne et al. is used to generate A and σ
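Downey's model is commonly stated as a piecewise speedup function S(n) of the core count n, with separate low-variance (σ ≤ 1) and high-variance (σ > 1) cases. The sketch below follows that usual statement. It further assumes that "task length" means the single-core runtime, so that the runtime on n cores is length / S(n). The function names are illustrative.

```python
def downey_speedup(n: int, A: float, sigma: float) -> float:
    """Speedup on n cores for a task with average parallelism A and
    coefficient of variation of parallelism sigma (Downey's model)."""
    if n < 1 or A < 1 or sigma < 0:
        raise ValueError("need n >= 1, A >= 1, sigma >= 0")
    if sigma <= 1:  # low-variance case
        if n <= A:
            return A * n / (A + sigma * (n - 1) / 2)
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2))
        return A  # speedup saturates at the average parallelism
    # high-variance case (sigma > 1)
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
    return A


def estimated_runtime(length: float, n: int, A: float, sigma: float) -> float:
    """Runtime on n cores, assuming `length` is the single-core runtime."""
    return length / downey_speedup(n, A, sigma)


print(estimated_runtime(100.0, 1, A=8.0, sigma=0.5))    # 100.0
print(estimated_runtime(100.0, 100, A=8.0, sigma=0.5))  # 12.5
```

Speedup is 1 on a single core and never exceeds A, so adding cores beyond the task's parallelism does not shorten its estimated runtime.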

Failure Estimator

• Estimates the failure probability of a particular bid price

• Based on the spot price history

• The spot price history of the preceding month is considered

• The total time covered by the spot price history, HT, is measured

• The total out-of-bid time for the bid price $bid_t$, $OBT_{bid_t}$, is measured
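The two measured quantities suggest a failure probability of FP = OBT / HT, the fraction of the history during which the bid would have been out-of-bid. The slides list HT and OBT but do not show this ratio explicitly, so treat it as an assumption. The sketch uses the same hypothetical step-function price trace as earlier examples. A one-month history would correspond to a window of roughly 720 hours.

```python
from typing import List, Tuple

# Hypothetical spot price history: sorted (time_in_hours, price) change points.
PriceTrace = List[Tuple[float, float]]


def failure_probability(trace: PriceTrace, bid: float, end: float) -> float:
    """Assumed estimator FP = OBT / HT over the window [trace[0][0], end).

    HT: total time covered by the spot price history.
    OBT: total time during which the spot price was above `bid`.
    """
    start = trace[0][0]
    ht = end - start
    if ht <= 0:
        raise ValueError("history window must have positive length")
    obt = 0.0
    for i, (t, price) in enumerate(trace):
        if t >= end:
            break  # change points after the window are ignored
        t_next = trace[i + 1][0] if i + 1 < len(trace) else end
        if price > bid:
            obt += min(t_next, end) - t
    return obt / ht


trace = [(0.0, 0.10), (2.0, 0.12), (5.0, 0.30), (6.0, 0.11)]
print(failure_probability(trace, bid=0.15, end=10.0))  # 0.1
```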

Scheduling Algorithm


Two Types of Scheduling Algorithms

• Conservative: CP and LTO are estimated on the lowest-cost instance type
  – CP is the longest, hence there is less slack time
  – Uses spot instances cautiously under relaxed deadlines

• Aggressive: CP and LTO are estimated on the highest-cost instance type
  – CP is the shortest, hence there is more slack time
  – Opts for expensive on-demand instances when failures occur
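The difference between the two variants can be sketched by estimating CP on the cheapest or the most expensive instance type and deriving LTO from it. The example assumes the form LTO = deadline − CP, a hypothetical two-entry instance catalogue, and that a more expensive type is proportionally faster. None of these values come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical instance catalogue: name -> (price per hour, relative speed).
# Assumes that a more expensive instance type is also a faster one.
INSTANCE_TYPES: Dict[str, Tuple[float, float]] = {
    "small": (0.10, 1.0),
    "large": (0.40, 2.0),
}


def cp_length(deps: Dict[str, List[str]], runtime: Dict[str, float]) -> float:
    """Length of the longest runtime-weighted path through the workflow DAG."""
    finish: Dict[str, float] = {}

    def earliest_finish(task: str) -> float:
        if task not in finish:
            start = max((earliest_finish(p) for p in deps[task]), default=0.0)
            finish[task] = start + runtime[task]
        return finish[task]

    return max(earliest_finish(t) for t in deps)


def lto_for_policy(policy: str, deps: Dict[str, List[str]],
                   base_runtime: Dict[str, float], deadline: float) -> float:
    """Assumed form LTO = deadline - CP, with CP estimated on the lowest-cost
    instance type (conservative) or the highest-cost one (aggressive)."""
    if policy not in ("conservative", "aggressive"):
        raise ValueError(f"unknown policy: {policy}")
    pick = min if policy == "conservative" else max
    name = pick(INSTANCE_TYPES, key=lambda k: INSTANCE_TYPES[k][0])
    speed = INSTANCE_TYPES[name][1]
    scaled = {task: r / speed for task, r in base_runtime.items()}
    return deadline - cp_length(deps, scaled)


deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
base_runtime = {"A": 2.0, "B": 5.0, "C": 3.0, "D": 1.0}  # hours on "small"
print(lto_for_policy("conservative", deps, base_runtime, deadline=20.0))  # 12.0
print(lto_for_policy("aggressive", deps, base_runtime, deadline=20.0))    # 16.0
```

The aggressive variant gets a later LTO, so it stays on spot instances longer. It then relies on the fast, expensive on-demand type if failures force a switch.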

Bidding Strategy

Intelligent Bidding Strategy

• Current spot price (pspot)

• On-demand price (pOD)

• Failure probability (FP) of the previous bid price

• LTO

• Current time (CT)

• α

• β

Intelligent Bidding Strategy

• α: dictates how far above the current spot price the bid value is placed

• β: determines how fast the bid value approaches the on-demand price

• The FP of the previous bid is fed back into the computation of the current bid price
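The exact bid equation is not reproduced in this transcript. The function below is therefore a hypothetical rule that only mirrors the described roles of the inputs. It starts from a margin above the spot price (set by α and raised by the previous FP). It then blends toward the on-demand price as the current time CT approaches LTO, with β setting how quickly that happens. It is an illustration, not the authors' formula.

```python
import math


def intelligent_bid(p_spot: float, p_od: float, fp_prev: float,
                    lto: float, ct: float, alpha: float, beta: float) -> float:
    """Hypothetical bid rule (not the paper's equation):
    - base bid: p_spot * (1 + alpha + fp_prev), i.e. a margin above spot that
      grows when the previous bid failed more often;
    - weight exp(-(LTO - CT) / beta) rises to 1 as CT approaches LTO, pulling
      the bid toward the on-demand price; larger beta gets there sooner."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    slack = max(lto - ct, 0.0)
    weight = math.exp(-slack / beta)
    base = p_spot * (1.0 + alpha + fp_prev)
    bid = weight * p_od + (1.0 - weight) * base
    # Keep the bid at or above the spot price, and cap it at the on-demand
    # price, where the scheduler could rent an on-demand instance instead.
    return min(max(bid, p_spot), p_od)


# Bids rise toward p_od = 0.5 as the current time approaches LTO = 10.
for ct in (0.0, 8.0, 10.0):
    print(ct, round(intelligent_bid(0.1, 0.5, 0.05, 10.0, ct, 0.2, 2.0), 3))
```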


Other Bidding Strategies

• On-Demand Bidding Strategy: uses the on-demand price as the bid price.

• Naive Bidding Strategy: uses the current spot price as the bid price.
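The two baseline strategies differ sharply in how easily they are outbid. A naive bid equals the current spot price, so any later price increase terminates the instance. An on-demand bid survives every spike up to the on-demand price. The sketch below checks whether an instance launched with each bid survives a given interval. It uses a hypothetical price trace and helper names that do not come from the slides.

```python
from typing import List, Tuple

# Hypothetical spot price history: sorted (time_in_hours, price) change points.
PriceTrace = List[Tuple[float, float]]


def on_demand_bid(p_spot: float, p_od: float) -> float:
    """On-Demand Bidding Strategy: bid the on-demand price."""
    return p_od


def naive_bid(p_spot: float, p_od: float) -> float:
    """Naive Bidding Strategy: bid the current spot price."""
    return p_spot


def price_at(trace: PriceTrace, t: float) -> float:
    """Spot price in effect at time t (the last change point at or before t)."""
    price = trace[0][1]
    for ts, p in trace:
        if ts > t:
            break
        price = p
    return price


def survives(trace: PriceTrace, bid: float, start: float, duration: float) -> bool:
    """True if the spot price never exceeds `bid` during [start, start + duration)."""
    end = start + duration
    if price_at(trace, start) > bid:
        return False
    return all(p <= bid for ts, p in trace if start < ts < end)


trace = [(0.0, 0.10), (2.0, 0.12), (5.0, 0.30), (6.0, 0.11)]
p_od, start = 0.40, 1.0
for strategy in (naive_bid, on_demand_bid):
    bid = strategy(price_at(trace, start), p_od)
    print(strategy.__name__, survives(trace, bid, start, duration=3.0))
# naive_bid False
# on_demand_bid True
```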

Simulation Setup

• CloudSim was used for simulation

• A LIGO workflow with 1,000 tasks was used

• For on-demand instances, 9 different VM types were considered

• For spot instances, a single VM type was used

Results : Comparison between algorithms

Mean execution cost of algorithms with varying deadline (with 95% confidence interval)

Results : Comparison between bidding strategies

Mean Execution Cost of bidding strategies with varying deadline (with 95% confidence interval)

Results : Task Failures

Mean number of task failures under each bidding strategy

Results : Checkpointing

Conclusion

• Two scheduling heuristics that map workflow tasks onto spot and on-demand instances are presented

• They minimize the execution cost

• They are robust and tolerant of out-of-bid failures and performance variations

• A bidding strategy that bids intelligently to minimize cost is presented

• The use of checkpointing is demonstrated, offering cost savings of up to 14%

© Copyright The University of Melbourne 2009
