OGF19 -- NC
Service Level Agreements and QoS: what do we measure and why?
Omer F. Rana
[email protected]
School of Computer Science, Cardiff University, UK
Welsh eScience Centre, UK
Mar 27, 2015
OGF19 – NC (QoS BoF)
QoS Management
QoS has been explored in:
– Computer Networks: bandwidth, delay, packet loss rate and jitter.
– Multimedia Applications: frame rate and computation resources.
– Grid Computing: network QoS, computation and storage requirements.
QoS Management (continued)
QoS management covers a range of different activities, from resource specification, selection and allocation through to resource release.
A QoS system should address the following:
– Specifying QoS requirements
– Mapping QoS requirements to resource capabilities
– Negotiating QoS with resource owners
– Establishing contracts / SLAs with clients
– Reserving and allocating resources
– Monitoring parameters associated with QoS sessions
– Adapting to varying resource quality characteristics
– Terminating QoS sessions
User Expectations vs. Resource Management
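The activities listed above can be read as phases of a QoS session. The following is a minimal sketch of that reading, assuming the phases run in the listed order and that monitoring and adaptation may alternate until termination. The names `QoSPhase` and `next_phases` are hypothetical and do not come from the slides.

```python
from enum import IntEnum


class QoSPhase(IntEnum):
    """Hypothetical phase names, one per activity on the slide."""
    SPECIFY = 1    # specify QoS requirements
    MAP = 2        # map requirements to resource capabilities
    NEGOTIATE = 3  # negotiate QoS with resource owners
    CONTRACT = 4   # establish a contract / SLA with the client
    RESERVE = 5    # reserve and allocate resources
    MONITOR = 6    # monitor parameters of the QoS session
    ADAPT = 7      # adapt to varying resource quality
    TERMINATE = 8  # terminate the session and release resources


def next_phases(phase):
    """Return the phases allowed to follow `phase` (assumed ordering)."""
    if phase is QoSPhase.MONITOR:
        return {QoSPhase.ADAPT, QoSPhase.TERMINATE}
    if phase is QoSPhase.ADAPT:
        return {QoSPhase.MONITOR, QoSPhase.TERMINATE}
    if phase is QoSPhase.TERMINATE:
        return set()
    # All earlier phases simply advance to the next one in the list.
    return {QoSPhase(phase + 1)}
```

A session object could call `next_phases` before each transition to reject out-of-order requests, for example allocation before an SLA exists.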
When is QoS needed?
– Interactive sessions:
  – Computation steering (control parameters & data exchange)
  – Interactive visualization (visualization & simulation services)
– Response within a limited time span
– Co-scheduling or co-location support
[Figure: from SCIRun, University of Utah]
– Application QoS: user perception, response time, application security, etc.
– Middleware QoS: computation, memory and storage
– Network QoS: bandwidth, packet loss, delay, jitter
What is a Service Level Agreement (SLA)?
Client: Can you do X for me for Y in return?
Provider: Yes
Result: SLA
Distinguish between:
– Discovery of a suitable provider (P2P search, directory service)
– Establishment of an SLA (SLA-Offer, then SLA-Accept or SLA-Reject)
An SLA is a relationship between a client and a provider in the context of a particular capability (service) provision.
What is an SLA?
Client: Can you do X for me for Y in return?
Provider: No, but I can do Z for Y
Client: Accept
Result: SLA
Messages: SLA-Offer, SLA-CounterOffer, then SLA-Accept or SLA-Reject
What is an SLA?
Client: Can you do X for me for Y in return?
Provider: No
Client: Can you do Z for me for Y in return?
Result: SLA
Negotiation phase (single or multi-round)
Messages: SLA-Offer, SLA-CounterOffer
SLA-OfferDependency
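The three exchanges above (direct acceptance, counter-offer, and multi-round negotiation) can be sketched as one small protocol. This is an illustrative sketch only: the message names follow the slides, but `Offer`, `provider_respond`, `negotiate`, and the price catalogue are hypothetical. The counter-offer rule (propose the most expensive service the payment covers) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    service: str    # what the client asks for ("X" or "Z" on the slides)
    payment: float  # what the client gives in return ("Y")


def provider_respond(offer, catalogue):
    """Provider side: reply to an SLA-Offer.

    `catalogue` maps service name -> minimum price (hypothetical policy).
    """
    price = catalogue.get(offer.service)
    if price is not None and offer.payment >= price:
        return "SLA-Accept", offer
    # Assumed rule: counter with the most expensive affordable service.
    for service, p in sorted(catalogue.items(), key=lambda kv: -kv[1]):
        if p <= offer.payment:
            return "SLA-CounterOffer", Offer(service, offer.payment)
    return "SLA-Reject", None


def negotiate(preferences, payment, catalogue):
    """Client side: one SLA-Offer per round, in order of preference."""
    for service in preferences:
        reply, proposal = provider_respond(Offer(service, payment), catalogue)
        if reply == "SLA-Accept":
            return proposal
        if reply == "SLA-CounterOffer" and proposal.service in preferences:
            return proposal  # the client accepts the counter-offer
    return None  # no SLA was established
```

With a catalogue of `{"X": 10.0, "Z": 5.0}`, a client that offers 5.0 for X and also accepts Z ends up with an SLA for Z, which corresponds to the "No, but I can do Z for Y" exchange.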
Variations
[Figure: a Client and several Providers sharing one SLA; a Client and several Providers with two SLAs]
Multi-provider SLA: a single SLA is divided across multiple providers (e.g. workflow composition).
SLA dependencies: for an SLA to be valid, another SLA has to be agreed (e.g. co-allocation).
QoS Context
QoS:
– Per service
– Per workflow (set of services)
Workflow:
– Aggregating metrics across services
– Support through some workflow enactor
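As an illustration of aggregating metrics across services, the sketch below combines per-service values for a purely sequential workflow. The aggregation rules are assumptions chosen for the example, not taken from the slides: response times add up, availabilities multiply, and throughput is limited by the slowest service. The metric names and the function `aggregate_sequential` are hypothetical.

```python
import math


def aggregate_sequential(services):
    """Aggregate per-service QoS metrics for a sequential workflow.

    `services` is a list of dicts with the hypothetical keys
    'response_time', 'availability' and 'throughput'.
    """
    return {
        # Each service runs after the previous one, so delays accumulate.
        "response_time": sum(s["response_time"] for s in services),
        # The workflow is available only if every service is available.
        "availability": math.prod(s["availability"] for s in services),
        # The slowest service bounds the throughput of the whole chain.
        "throughput": min(s["throughput"] for s in services),
    }
```

A workflow enactor would need different rules for parallel branches, for instance taking the maximum response time across branches instead of the sum.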
What is an SLA?
– A dynamically established and managed relationship between two parties
– The objective is “delivery of a service” by one of the parties in the context of the agreement
– Delivery involves: Functional QoS Properties
www.utilityserve.com SLA
Forming the Agreement
Distinguish between:
– The agreement itself
– Mechanisms that lead to the formation of the agreement
Mechanisms that lead to agreement:
– Negotiation (single or multi-shot)
– One-shot creation
– Policy-based creation of agreements, etc.
WS-Agreement
Agreement:
– Name/ID
– Context: information about the agreement (initiator, responder, expiration time)
– Terms Composition:
  – Service Terms: information about the service (service description terms; generally, these are domain dependent)
  – Guarantee Terms: information about the service level (service level objectives, qualifying conditions for the agreement to be valid, penalty terms, etc.)
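To make the structure above concrete, here is a minimal in-memory model of an agreement. It is a sketch under stated assumptions: WS-Agreement itself is an XML-based specification, and the Python class and field names below (`Context`, `GuaranteeTerm`, `Agreement`, `is_expired`) are hypothetical simplifications, not the schema's element names.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Context:
    """Information about the agreement itself."""
    initiator: str
    responder: str
    expiration_time: datetime


@dataclass
class GuaranteeTerm:
    """Information about the service level."""
    service_level_objective: str
    qualifying_condition: str = ""  # when the guarantee applies
    penalty: str = ""               # consequence of a violation


@dataclass
class Agreement:
    name: str
    context: Context
    # Service description terms are domain dependent, so a plain dict is used.
    service_terms: dict = field(default_factory=dict)
    guarantee_terms: list = field(default_factory=list)

    def is_expired(self, now):
        """True once the expiration time in the context has been reached."""
        return now >= self.context.expiration_time
```

For example, `Agreement("demo", Context("client", "provider", datetime(2007, 2, 1)), {"cpu_count": 4}, [GuaranteeTerm("response_time <= 2s")])` describes a single guarantee attached to a domain-specific service description.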
WS-Agreement Terms
From: Viktor Yarmolenko (U Manchester)
SLA Life Cycle
– Identify Provider: on completion of a discovery phase
– Define SLA: define what is being requested
– Agree on SLA terms: agree on Service Level Objectives
– Monitor SLA Violation: confirm whether SLOs are being violated
– Destroy SLA: expire SLA
– Penalty for SLA Violation
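The monitoring step of the life cycle amounts to comparing measured values with the agreed Service Level Objectives. The sketch below assumes each SLO is a numeric range per metric; the function `violated_slos` and the metric names in the usage note are hypothetical. Treating a missing measurement as a violation is also an assumption.

```python
def violated_slos(slos, measurements):
    """Return the names of metrics whose SLO is currently violated.

    `slos` maps metric -> (lower, upper); None means that side is unbounded.
    `measurements` maps metric -> latest observed value.
    """
    violations = []
    for metric, (lower, upper) in slos.items():
        value = measurements.get(metric)
        if value is None:
            # Assumption: an unmonitored objective counts as violated.
            violations.append(metric)
        elif lower is not None and value < lower:
            violations.append(metric)
        elif upper is not None and value > upper:
            violations.append(metric)
    return violations
```

With `slos = {"response_time_s": (None, 2.0), "bandwidth_mbps": (100.0, None)}`, a measured response time of 2.5 s is reported as a violation, which could then trigger the penalty step.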
CATNETs: Metrics “Pyramid”
SLA Classes
– Guaranteed: constraints to be exactly observed; SLA is precisely/exactly defined; adaptation algorithm / optimization heuristics
– Controlled-load: some constraints may be observed; range-oriented SLA; optimization heuristics
– Best-effort: any resources will do; no adaptation support
Adaptation Algorithm
Assume a total capacity $C = C_G + C_A + C_B$, where G, A and B denote ‘guaranteed’, ‘adaptive’ and ‘best effort’.
Adapt(c(u,t), g(u)):
Net capacity: $N_G(t) = C_G(t) - \sum_{u \in G} g(u)$
If $N_G(t) < 0$ (guarantees can’t be honoured), then:
– ADD $\left(\sum_{u \in G} g(u) - C_G(t)\right)$ from A to G
– ADD $\left(C_A(t) - \left[\sum_{u \in G} g(u) - C_G(t)\right]\right)$ from A to B
Before invoking the adaptive function, ensure that:
– the request at time t does not exceed $g(u)$ (SLA)
– $\sum_{u \in G} g(u) \le C_G$
where $C_G(t)$ is the capacity in the guaranteed block at time t, and $g(u)$ is the guarantee to user u.
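The adaptation rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `adapt` and the returned dictionary are hypothetical. Capping the transfer at the size of the adaptive block is an added assumption, since the slide does not say what happens when the shortfall exceeds the adaptive capacity.

```python
def adapt(c_g, c_a, c_b, guarantees):
    """Split the adaptive block A between G and B at one time instant.

    c_g, c_a, c_b: current capacities of the guaranteed, adaptive and
    best-effort blocks. `guarantees` maps user u -> g(u).
    """
    demand = sum(guarantees.values())  # sum of g(u) over users in G
    net_g = c_g - demand               # net capacity N_G(t)
    shortfall = max(0.0, -net_g)       # positive only when N_G(t) < 0
    # Assumption: A cannot lend more than it holds.
    to_g = min(shortfall, c_a)
    to_b = c_a - to_g                  # the remainder of A goes to B
    return {
        "G": c_g + to_g,
        "B": c_b + to_b,
        "uncovered": shortfall - to_g,  # guarantees that still cannot be met
    }
```

For example, with capacities 50, 30 and 20 and guarantees totalling 65, the guaranteed block is short by 15, so 15 units move from A to G and the remaining 15 units of A go to best effort.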
SLA Adaptation
Assume a total capacity $C = C_G + C_A + C_B$.
– ‘Best effort’ can use the adaptive capacity, as long as it is not used by the ‘guaranteed’.
– When QoS degrades for ‘guaranteed’, the adaptive capacity is utilized to compensate for the degradation.
– ‘Best effort’ can still utilize the remaining adaptive capacity, as long as it is not used by the ‘guaranteed’.
– When the congested capacity is restored, the adaptive capacity can be used entirely by the ‘best effort’.
[Figure: capacity bars partitioned into G, A and B blocks at successive stages of adaptation]
Before invoking the adaptive function, ensure that:
– the request at time t does not exceed what was agreed in the SLA
– the total capacity within all SLAs at time t does not exceed $C_G$
Aim: compensation for QoS degradation for the ‘guaranteed’ class only.
SLA Adaptation
If we define a set $QoS = \{a_1, a_2, \ldots, a_n\}$, where each $a_i$ represents a different parameter of interest such as CPU, network bandwidth, etc., then we could compare sets $QoS_x = \{a_{1x}, \ldots, a_{nx}\}$ and $QoS_y = \{a_{1y}, \ldots, a_{ny}\}$.
QoS values are specified in the SLA either as a range between $a_y$ and $a_x$, or as a set of permitted values, $a_i \in \{x, y, z\}$.
Assume the existence of a cost function: $Cost(a_i) = c_i \cdot a_i$
Then: $Service\_Cost(QoS) = \sum_{i=1}^{n} (c_i \cdot a_i)$
The proposed heuristic: $Total\ Cost = \max \sum_{i=1}^{n} Service\_Cost(QoS_i)$, where n is the total number of active services.
The QoS broker implements this heuristic by varying the resource quality selection, based on what was agreed in the SLA, aiming to optimize resource utilization and maximize the overall cost.
This heuristic could be mapped to the Generalized Assignment Problem (GAP).
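The cost heuristic can be illustrated with a brute-force search over the QoS levels that each SLA permits. This is a sketch under stated assumptions: the slides do not give the broker's actual algorithm, so exhaustive enumeration stands in for it here, and the capacity constraint is added to make the selection non-trivial. All names (`service_cost`, `best_selection`, `unit_cost`, `capacity`) are hypothetical.

```python
from itertools import product


def service_cost(qos, unit_cost):
    """Service_Cost(QoS): sum of c_i * a_i over the parameters of one service."""
    return sum(unit_cost[param] * amount for param, amount in qos.items())


def best_selection(options, unit_cost, capacity):
    """Pick one QoS level per active service so that total cost is maximal.

    `options` holds, per service, the list of QoS levels its SLA allows.
    `capacity` maps parameter -> total amount available (assumed constraint).
    """
    best, best_total = None, float("-inf")
    for choice in product(*options):
        # Total amount of each parameter consumed by this combination.
        used = {p: sum(q.get(p, 0) for q in choice) for p in capacity}
        if any(used[p] > capacity[p] for p in capacity):
            continue  # infeasible: exceeds available resources
        total = sum(service_cost(q, unit_cost) for q in choice)
        if total > best_total:
            best, best_total = choice, total
    return best, best_total
```

Enumeration grows exponentially with the number of services, which is one reason a mapping to the Generalized Assignment Problem and its approximation techniques is attractive.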
[Figure: Grid Node. Labels: Grid QoS service interface; QoS Grid Service; Reservation Manager; Allocation Manager; Policy Manager; Resources]
Main components
– Policy Manager: provides dynamic information about the domain-specific resource characteristics and policy.
– Reservation Manager: provides advance/immediate resource reservation. Its data structure contains reservation entries, and it interacts with the Policy Manager for resource characteristics.
– Allocation Manager: interacts with the underlying resource manager for resource allocation (e.g. DSRT, Bandwidth Broker).
[Figure labels: Client's Application; UDDIe; QoS Discovery; QoS Broker; SLAs; Grid nodes 1, 2 and 3, each with a QoS service (Reservation, Allocation, Policy)]
Joint work with Argonne National Lab. (Gregor von Laszewski et al.)
Reservation Approaches
Resource reservation / allocation is based on two strategies:
– Time-domain: reserve the whole ‘compute’ power of the Grid node. Guaranteed exclusive access.
– Resource-domain: reserve a CPU slot of the Grid node. Shared access with guaranteed resource capacity. Suitable for lightweight applications/services.
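The difference between the two strategies can be sketched with a toy reservation table. This is an illustration only: the class `GridNode`, its methods, and the conservative overlap check are hypothetical and are not the Reservation Manager's actual data structure. A time-domain reservation is modelled as a request for every CPU slot over an interval, a resource-domain reservation as a request for some of them.

```python
class GridNode:
    """Toy model of a node with a fixed number of CPU slots."""

    def __init__(self, cpu_slots):
        self.cpu_slots = cpu_slots
        self.reservations = []  # entries are (start, end, slots)

    def _used(self, start, end):
        # Conservative estimate: add up every reservation overlapping
        # the half-open interval [start, end).
        return sum(s for (a, b, s) in self.reservations
                   if a < end and start < b)

    def reserve_resource_domain(self, start, end, slots):
        """Shared access: reserve `slots` CPU slots for [start, end)."""
        if self._used(start, end) + slots > self.cpu_slots:
            return False
        self.reservations.append((start, end, slots))
        return True

    def reserve_time_domain(self, start, end):
        """Exclusive access: reserve the whole node for [start, end)."""
        return self.reserve_resource_domain(start, end, self.cpu_slots)
```

On a four-slot node, a one-slot reservation for times 0 to 10 blocks a time-domain request for 5 to 15, while a time-domain request for 10 to 20 succeeds and then excludes all other requests in that interval.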
Best Effort
Guaranteed