Venugopal adec

Post on 06-May-2015

Autonomic Decentralised Elasticity Management of Cloud Applications

Srikumar Venugopal
School of Computer Science and Engineering
University of New South Wales, Sydney, Australia
E: srikumarv@cse.unsw.edu.au
W: http://www.cse.unsw.edu.au/~srikumarv

Agenda

Background & Motivation

Problem Statement

Solution Overview

Evaluation: Methodology and Results

Conclusion

The Promise of Cloud Computing

Background & Motivation

State-of-the-art in Auto-scaling

Product/Project        Trigger                        Controller                  Actions
Amazon Autoscaling     CloudWatch metrics/Threshold   Rule-based/Schedule-based   Add/Remove Capacity
WASABi                 Azure Diagnostics/Threshold    Rule-based                  Add/Remove Capacity, Custom
RightScale/Scalr       Load monitoring                Rule-based/Schedule-based   Add/Remove Capacity, Custom
Google Compute Engine  CPU Load, etc.                 Rule-based                  Add/Remove Capacity

Academic
CloudScale             Demand Prediction              Control theory              Voltage-scaling
Cataclysm              Threshold-based                Queueing-model              Admission Control
IBM Unity              Application Utility            Utility functions/RL        Add/Remove Capacity

Cons of Rule-based Auto-scaling

• Currently, the most popular mechanisms for auto-scaling are rule-based mechanisms

• The effectiveness of rule-based autoscaling is determined by the trigger conditions

• Setting up the triggers is a trial-and-error process.
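The trial-and-error nature of trigger setup can be illustrated with a minimal sketch. The names ThresholdRule and instances_over_time, the threshold values and the utilisation trace are all hypothetical and are not taken from any product listed above. As a simplification, the trace is replayed as-is and is not adjusted for the capacity that the rule adds or removes.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """A hypothetical scale-out/scale-in rule on average CPU utilisation."""
    scale_out_above: float  # add capacity when utilisation exceeds this
    scale_in_below: float   # remove capacity when utilisation drops below this

    def decide(self, cpu_utilisation: float) -> int:
        """Return the change in instance count: +1, -1 or 0."""
        if cpu_utilisation > self.scale_out_above:
            return 1
        if cpu_utilisation < self.scale_in_below:
            return -1
        return 0


def instances_over_time(rule, load_trace, start=2):
    """Replay a utilisation trace and record the instance count after each sample."""
    count, history = start, []
    for cpu in load_trace:
        count = max(1, count + rule.decide(cpu))  # never drop below one instance
        history.append(count)
    return history


# Illustrative utilisation samples (fractions of CPU), not measured data.
trace = [0.55, 0.72, 0.78, 0.74, 0.58, 0.76]
cautious = instances_over_time(ThresholdRule(0.85, 0.30), trace)
eager = instances_over_time(ThresholdRule(0.70, 0.60), trace)
```

On the same trace the cautious rule never reacts, while the eager rule changes the instance count at every sample. Neither is obviously right, which is the tuning problem the slide describes.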

Cons of Rule-based Autoscaling

• Commercial products are rule-based
  – Gives “illusion of control” to users
  – Leads to the problem of defining the “right” thresholds

• Centralised controllers
  – Communication overhead increases with size
  – Processing overhead also increases (Big Data!)

• Limited to one application per VM

Challenges of large-scale elasticity

• Large numbers of instances and apps
  – Deriving solutions takes time

• Dynamic conditions
  – Apps are going critical all the time

• Shifting bottlenecks
  – Greedy solutions may create bottlenecks in other places

• Network partitions, fault tolerance…

H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.

Problem Statement

Initial Conditions

[Figure: an IaaS provider hosting two instances. Instance1 (App Server1) runs app1 and app2; Instance2 (App Server2) runs app3 and app4.]

A Critical Event

[Figure: the same two instances on the IaaS provider. Instance1 (App Server1) runs app1 and app2; Instance2 (App Server2) runs app3 and app4.]

Placement 1

[Figure: app1 moved to Instance2. Instance1 (App Server1) runs app2; Instance2 (App Server2) runs app3, app4 and app1.]

Placement 2

[Figure: app2 moved to Instance2. Instance1 (App Server1) runs app1; Instance2 (App Server2) runs app3, app4 and app2.]

Placement 3

[Figure: a third instance is provisioned. Instance1 (App Server1) runs app2; Instance2 (App Server2) runs app3 and app4; Instance3 (App Server3) runs app1.]

Placements 4 & 5

[Figure: two placements in which app1 is duplicated. In both, Instance1 (App Server1) runs app2 and Instance2 (App Server2) runs app3 and app4; the second placement adds Instance3 (App Server3). Each diagram shows two copies of app1.]

Challenges of App Placement

• Load shifts are dynamic

• Multiple applications may go critical simultaneously

• Instance provisioning should be controlled

• Service QoS must be maintained

Twin Objectives

• Provisioning Problem
  – To determine the smallest number of servers required to satisfy resource requirements of all the applications

• Dynamic Placement Problem
  – To distribute the applications so as to maximise utilisation yet meet each app’s response time and availability requirements
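The provisioning objective resembles bin packing. As an illustration only, the sketch below uses the first-fit-decreasing heuristic, which is a standard technique swapped in here and not the method the talk proposes. The function name, the demand figures (percent of one server's capacity) and the capacity value are hypothetical.

```python
def first_fit_decreasing(demands, capacity):
    """Pack application demands onto as few servers as the heuristic finds.

    demands: dict mapping app name -> resource demand (same unit as capacity).
    Returns a list of servers, each a list of app names.
    """
    servers = []  # each entry is [remaining_capacity, [app names]]
    for app, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"{app} needs more than one server can offer")
        for server in servers:
            if server[0] >= demand:      # first existing server with room
                server[0] -= demand
                server[1].append(app)
                break
        else:                            # no server had room: provision one
            servers.append([capacity - demand, [app]])
    return [apps for _, apps in servers]


# Hypothetical demands, in percent of a single server's capacity.
demands = {"app1": 60, "app2": 30, "app3": 40, "app4": 50}
placement = first_fit_decreasing(demands, capacity=100)
```

A static packing like this ignores the response-time and availability constraints of the dynamic placement problem, and it has to be recomputed whenever load shifts.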

Solution Overview

Decentralised Elastic Control

• Instances control their own utilisation
  – Monitoring, management and feedback

• Local controllers are learning agents
  – Reinforcement Learning

• Servers are linked by ZooKeeper
  – Agility, Flexibility, Co-ordination

• We call our system ADEC (Autonomic Decentralised Elasticity Control)

Software Architecture of ADEC

Reinforcement Learning

• Learn optimal management policies over time
  – vs. Model-based policies

• Learn long-term effects of short-term actions
  – If the state-action pairs are chosen correctly

• We have applied Q-Learning to this problem
  – Initial actions are drawn using a Boltzmann distribution
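A minimal sketch of textbook one-step Q-learning with Boltzmann (softmax) action selection follows. It is not ADEC's controller: the state names, the learning rate alpha, the discount gamma, the temperature and the reward value are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def boltzmann_probabilities(q_values, temperature):
    """Softmax over Q-values: higher-valued actions are picked more often."""
    peak = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - peak) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def choose_action(q_values, temperature, rng=random):
    """Sample one action according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


actions = ["move", "duplicate", "merge"]
q = defaultdict(float)  # Q-table keyed by (state, action), initialised to 0

# One illustrative learning step with hypothetical states and reward.
new_value = q_update(q, "high_load", "move", reward=0.5,
                     next_state="normal", actions=actions)
probs = boltzmann_probabilities({a: q[("high_load", a)] for a in actions},
                                temperature=1.0)
picked = choose_action({a: q[("high_load", a)] for a in actions}, temperature=1.0)
```

A high temperature makes the choice nearly uniform, which suits early exploration; lowering it over time makes the controller increasingly greedy.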

Abstract View of the Control Scheme

States

Basic Actions

Server actions:       create    terminate   find
                      (-3.5)    (3.5)       (3.5)

Application actions:  move      duplicate   merge
                      (0.5)     (0.5)       (0.5)

Actions and Rewards

• Actual actions are a combination of a server and an application action
  – E.g. find and move, merge and terminate

• 11 pre-defined actions
  – Reducing complexity

• Each action is associated with a reward
  – -ve rewards for actions incurring costs (e.g. start server)
  – +ve rewards for actions that save (e.g. terminate)

Co-ordination using find

• A server looks up the other servers with the least load
  – ZooKeeper lookup

• It sends a move message to the selected server

• The selected server replies with accept or reject
  – accept has a +ve reward
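The find, move, accept/reject exchange can be sketched as below. An in-memory dict stands in for the ZooKeeper lookup, and the Server class, the accept_below admission threshold and all load figures are hypothetical; the slides do not say how a server decides to accept.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    load: float                       # current utilisation, 0.0 to 1.0
    apps: list = field(default_factory=list)
    accept_below: float = 0.60        # hypothetical admission threshold

    def handle_move(self, app: str, app_load: float) -> bool:
        """Accept the app only if it keeps this server under its threshold."""
        if self.load + app_load <= self.accept_below:
            self.apps.append(app)
            self.load += app_load
            return True
        return False


def find_and_move(source, registry, app, app_load):
    """Offer `app` to the least-loaded other server; return the acceptor or None."""
    candidates = [s for s in registry.values() if s is not source]
    if not candidates:
        return None
    target = min(candidates, key=lambda s: s.load)  # the "find" step
    if target.handle_move(app, app_load):           # the "move" message
        source.apps.remove(app)
        source.load -= app_load
        return target
    return None                                     # target replied "reject"


s1 = Server("s1", 0.90, ["app1", "app2"])
s2 = Server("s2", 0.30, ["app3"])
s3 = Server("s3", 0.50, ["app4"])
registry = {s.name: s for s in (s1, s2, s3)}

accepted_by = find_and_move(s1, registry, "app1", 0.25)  # s2 has room
rejected = find_and_move(s1, registry, "app2", 0.50)     # nobody has room
```

In a learning controller the reply would feed back as the reward for the action, so a server that keeps getting rejected learns to prefer other actions.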

Shrinking

• The controller is always reward maximising
  – Highest reward is for merge+terminate

• A controller initiates its own shutdown
  – Low load on its applications

• Gets exclusive lock on termination
  – Only one instance can terminate at a time

• Transfers state before shutdown
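The shrinking steps above can be sketched in a single process. Here a threading.Lock stands in for the cluster-wide termination lock, servers are plain dicts, and the low_load cut-off of 0.20 is a made-up value; none of this reflects ADEC's actual implementation.

```python
import threading

# Stand-in for a cluster-wide exclusive lock on termination.
termination_lock = threading.Lock()


def try_shutdown(server, peers, low_load=0.20):
    """Attempt merge+terminate: hand all apps to a peer, then stop.

    Returns the list of apps transferred, or None if the server stays up.
    """
    if server["load"] >= low_load or not peers:
        return None  # not idle enough, or nowhere to merge into
    if not termination_lock.acquire(blocking=False):
        return None  # another instance is already terminating
    try:
        target = min(peers, key=lambda p: p["load"])
        moved = list(server["apps"])
        target["apps"].extend(moved)      # transfer state before shutdown
        target["load"] += server["load"]
        server["apps"].clear()
        server["load"] = 0.0
        server["running"] = False
        return moved
    finally:
        termination_lock.release()


a = {"name": "a", "load": 0.10, "apps": ["app1"], "running": True}
b = {"name": "b", "load": 0.40, "apps": ["app2"], "running": True}
moved = try_shutdown(a, [b])
```

Serialising terminations this way prevents two lightly loaded servers from shutting down at once and handing their applications to each other.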

Information on the DHT

• Server event notification

• List of applications on each server

• Server status updates (load information)

• Q-value updates

Evaluation

Experiment 1: Testing ADEC

• IaaS provider: Amazon EC2
  – Small instances and a high-CPU instance

• Load tester: Apache JMeter

• Application server: Tomcat 6.0
  – JVM with 1 GB RAM

• Server thresholds: 60% and 85%

Experiment 1: Testing

• Six web applications
  – Test application: Hotel Management
  – Search, Book, Confirm

• Five were subjected to a background load
  – Uniform random

• One was subjected to the test load

• Application thresholds: 200 and 500 ms

• Metrics
  – Average Response Time, Drop Rate, Servers
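A sketch of how two of the metrics above might be computed and compared against a pair of thresholds. The band labels ("normal", "warning", "critical"), the helper names and the sample values are assumptions; the slides give the threshold values but not how ADEC maps them to states.

```python
from statistics import mean


def band(value, lower, upper):
    """Place a measurement in one of three bands delimited by two thresholds."""
    if value < lower:
        return "normal"
    if value < upper:
        return "warning"
    return "critical"


def summarise(samples):
    """samples: response times in ms, with None marking a dropped request.

    Returns (average response time of served requests, drop rate).
    """
    served = [s for s in samples if s is not None]
    drop_rate = (len(samples) - len(served)) / len(samples)
    return mean(served), drop_rate


# Illustrative samples, not measurements from the experiment.
samples = [120, 180, None, 280, 260]
avg_ms, drop_rate = summarise(samples)

app_state = band(avg_ms, 200, 500)       # application thresholds in ms
server_state = band(0.90, 0.60, 0.85)    # server utilisation thresholds
```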

Peaking Workload

Poisson Workload
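For reference, a Poisson workload can be generated by drawing exponentially distributed gaps between requests. The sketch below is generic; the rate, duration and seed are illustrative and are not the parameters used in the experiment.

```python
import random


def poisson_arrivals(rate_per_s, duration_s, seed=None):
    """Return request arrival times (seconds) for a Poisson process.

    Inter-arrival gaps are exponential with mean 1 / rate_per_s.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return arrivals
        arrivals.append(t)


# Hypothetical load: 50 requests per second on average, for 100 seconds.
arrivals = poisson_arrivals(rate_per_s=50, duration_s=100, seed=1)
```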

Conclusion

• Demonstrated a co-ordination architecture for provisioning web applications

• Each server is independent and the system is managed by a set of simple states and actions

• Instances start up and shut down on their own to meet application objectives

Ongoing Work

• Improved performance modelling for quick detection of slowdowns

• Using utility functions for defining application priorities

• Extension to SOA and BPM
  – Collaboration with Technical Univ of Vienna, Austria

• Scaling the database
  – ElasCass project

Questions?

srikumarv@cse.unsw.edu.au

Thank you!
