EECS 262a Advanced Topics in Computer Systems
Lecture 13
M-CBS (Con’t) and DRF
October 10th, 2012
John Kubiatowicz and Anthony D. Joseph
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~kubitron/cs262
Online Scheduling for Realtime
Schedulability Test
• Test to determine whether a feasible schedule exists
• Sufficient Test
  – If test is passed, then tasks are definitely schedulable
  – If test is not passed, tasks may be schedulable, but not necessarily
• Necessary Test
  – If test is passed, tasks may be schedulable, but not necessarily
  – If test is not passed, tasks are definitely not schedulable
• Exact Test (= Necessary + Sufficient)
  – The task set is schedulable if and only if it passes the test
Rate Monotonic Analysis: Assumptions
A1: Tasks are periodic (activated at a constant rate).
    Period $P_i$ = interval between two consecutive activations of task $T_i$
A2: All instances of a periodic task $T_i$ have the same computation time $C_i$
A3: All instances of a periodic task have the same relative deadline, which is equal to the period ($D_i = P_i$)
A4: All tasks are independent (i.e., no precedence constraints and no resource constraints)

Implicit assumptions:
A5: Tasks are preemptable
A6: No task can suspend itself
A7: All tasks are released as soon as they arrive
A8: All overhead in the kernel is assumed to be zero (or part of $C_i$)
Rate Monotonic Scheduling: Principle
• Principle: Each process is assigned a (unique) priority based on its period (rate); always execute the active job with highest priority
• The shorter the period, the higher the priority: $P_i < P_j$ implies priority($T_i$) > priority($T_j$) (priority 1 = low priority)
• W.l.o.g. number the tasks in reverse order of priority

  Process   Period   Priority   Name
  A         25       5          T1
  B         60       3          T3
  C         42       4          T2
  D         105      1          T5
  E         75       2          T4
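To make the assignment rule concrete, here is a tiny Python sketch (my own illustration, not from the slides) that reproduces the table above by numbering tasks from longest period (priority 1, lowest) to shortest period (highest):

```python
# Rate-monotonic priority assignment: shorter period -> higher priority.
tasks = {"A": 25, "B": 60, "C": 42, "D": 105, "E": 75}  # name -> period

# Sort by period, longest first, so the longest period gets priority 1 (lowest).
for prio, (name, period) in enumerate(
        sorted(tasks.items(), key=lambda kv: -kv[1]), start=1):
    print(f"{name}: period={period}, priority={prio}")
```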
Example: Rate Monotonic Scheduling
• Example instance [figure: RM schedule of an example task set]

• A periodic task set $T_1, T_2, \ldots, T_n$ with $D_i = P_i$, $i = 1, \ldots, n$, is schedulable by the rate monotonic scheduling algorithm if

  $$\sum_{i=1}^{n} \frac{C_i}{P_i} \le n\,(2^{1/n} - 1), \qquad n = 1, 2, \ldots$$

• This schedulability test is “sufficient”!
• For harmonic periods ($T_j$’s period evenly divides $T_i$’s period), the utilization bound is 100%
• Note that $n\,(2^{1/n} - 1) \to \ln 2 \approx 0.693$ for $n \to \infty$
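The bound is easy to check in code. A minimal sketch of the Liu & Layland sufficient test (function and variable names are mine):

```python
def rm_sufficient_test(tasks):
    """Liu & Layland sufficient test for rate monotonic scheduling.

    tasks: list of (C_i, P_i) pairs (computation time, period).
    True guarantees schedulability; False is inconclusive, since the
    test is only sufficient.
    """
    n = len(tasks)
    utilization = sum(c / p for c, p in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)
```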
RMS Example
• Task set (period $P_i$, computation time $C_i$): $T_1 = (4, 1)$, $T_2 = (5, 2)$, $T_3 = (7, 2)$, so

  $$\frac{C_1}{P_1} = \frac{1}{4} = 0.25, \quad \frac{C_2}{P_2} = \frac{2}{5} = 0.4, \quad \frac{C_3}{P_3} = \frac{2}{7} \approx 0.286$$

• The schedulability test requires

  $$\sum_{i=1}^{n} \frac{C_i}{P_i} \le n\,(2^{1/n} - 1), \qquad n = 1, 2, \ldots$$

• Hence, we get

  $$\sum_{i=1}^{3} \frac{C_i}{P_i} \approx 0.936 > 3\,(2^{1/3} - 1) \approx 0.780$$

  which does not satisfy the schedulability condition
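Running the sketch from above on this task set reproduces the slide’s numbers:

```python
tasks = [(1, 4), (2, 5), (2, 7)]      # (C_i, P_i) for T1, T2, T3
print(sum(c / p for c, p in tasks))   # ~0.936
print(3 * (2 ** (1 / 3) - 1))         # ~0.780
print(rm_sufficient_test(tasks))      # False: the sufficient test fails
```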
EDF: Assumptions
A1: Tasks are periodic or aperiodic.
    Period $P_i$ = interval between two consecutive activations of task $T_i$
A2: All instances of a periodic task $T_i$ have the same computation time $C_i$
A3: All instances of a periodic task have the same relative deadline, which is equal to the period ($D_i = P_i$)
A4: All tasks are independent (i.e., no precedence constraints and no resource constraints)

Implicit assumptions:
A5: Tasks are preemptable
A6: No task can suspend itself
A7: All tasks are released as soon as they arrive
A8: All overhead in the kernel is assumed to be zero (or part of $C_i$)
EDF Scheduling: Principle
• Preemptive priority-based dynamic scheduling
• Each task is assigned a (current) priority based on how close its absolute deadline is
• The scheduler always schedules the active task with the closest absolute deadline

[Figure: EDF schedule over the interval 0–15 for the task set $T_1 = (4, 1)$, $T_2 = (5, 2)$, $T_3 = (7, 2)$]
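A minimal discrete-time simulation sketch of this rule (my own illustration, not from the slides; tasks are $(P_i, C_i)$ pairs with deadline = period):

```python
def edf_simulate(tasks, horizon):
    """At each time unit, run the released-but-unfinished job with the
    earliest absolute deadline. Returns the task index run at each step."""
    jobs = []            # each job is [abs_deadline, remaining, task_index]
    schedule = []
    for t in range(horizon):
        for i, (period, wcet) in enumerate(tasks):
            if t % period == 0:                  # new job released
                jobs.append([t + period, wcet, i])
        jobs = [j for j in jobs if j[1] > 0]     # drop finished jobs
        if jobs:
            job = min(jobs, key=lambda j: j[0])  # earliest deadline first
            job[1] -= 1
            schedule.append(job[2])
        else:
            schedule.append(None)                # idle
    return schedule

print(edf_simulate([(4, 1), (5, 2), (7, 2)], 15))
```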
EDF: Schedulability Test
Theorem (Utilization-based Schedulability Test): A task set $T_1, T_2, \ldots, T_n$ with $D_i = P_i$ is schedulable by the earliest deadline first (EDF) scheduling algorithm if

  $$\sum_{i=1}^{n} \frac{C_i}{D_i} \le 1$$

Exact schedulability test (necessary + sufficient)
Proof: [Liu and Layland, 1973]
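Since the EDF test is just a utilization sum, a one-line check suffices (reusing the $(C_i, P_i)$ task format from the RM sketch above):

```python
def edf_schedulable(tasks):
    """Exact EDF test for periodic tasks with D_i = P_i:
    schedulable if and only if total utilization <= 1."""
    return sum(c / p for c, p in tasks) <= 1.0

# The task set that failed the RM sufficient test passes under EDF:
print(edf_schedulable([(1, 4), (2, 5), (2, 7)]))  # 0.936 <= 1 -> True
```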
EDF Optimality
EDF Properties
• EDF is optimal with respect to feasibility (i.e., schedulability)
• EDF is optimal with respect to minimizing the maximum lateness
(Slide credit: Frank Drews, Real-Time Systems)
EDF Example: Domino Effect
EDF minimizes lateness of the “most tardy task” [Dertouzos, 1974]
Constant Bandwidth Server
• Intuition: give a fixed share of the CPU to a certain class of jobs
  – Good for tasks with probabilistic resource requirements
• Basic approach: slots (called “servers”) scheduled with EDF, rather than jobs
  – CBS server defined by two parameters: Qs and Ts
  – Mechanism for tracking processor usage so that no more than Qs CPU seconds are used every Ts seconds (or whatever measurement you like) when there is demand; otherwise you get to use the processor as you like
• Since using EDF, can mix hard real-time and soft real-time tasks (a budget-tracking sketch follows below)
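A minimal sketch of the budget-tracking mechanism, following the usual CBS rules (Abeni & Buttazzo); the class and attribute names are mine:

```python
class CBServer:
    """Constant bandwidth server: at most Q units of CPU per period T."""

    def __init__(self, Q, T):
        self.Q, self.T = Q, T
        self.budget = Q
        self.deadline = 0.0     # current absolute server deadline

    def job_arrives(self, now):
        # If the leftover budget exceeds what the server's bandwidth Q/T
        # allows before the current deadline, reset budget and deadline.
        if self.budget >= (self.deadline - now) * self.Q / self.T:
            self.deadline = now + self.T
            self.budget = self.Q

    def execute(self, dt):
        # Charge dt of execution against the budget; on exhaustion,
        # replenish the budget and postpone the deadline by one period.
        self.budget -= dt
        while self.budget <= 0:
            self.budget += self.Q
            self.deadline += self.T
```

Jobs served by the CBS are inserted into the EDF queue with the server’s current deadline, which is what lets hard and soft real-time tasks share one EDF scheduler safely.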
Today’s Papers
• Implementing Constant-Bandwidth Servers upon Multiprocessor Platforms, Sanjoy Baruah, Joel Goossens, and Giuseppe Lipari. Appears in Proceedings of the Real-Time and Embedded Technology and Applications Symposium (RTAS), 2002. (From last time!)
• Dominant Resource Fairness: Fair Allocation of Multiple Resource Types, A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Appears in Proceedings of USENIX NSDI 2011, Boston, MA, March 2011
• Thoughts?
CBS on multiprocessors
• Basic problem: EDF is not all that efficient on multiprocessors
  – The schedulability constraint is considerably weaker than for uniprocessors
• Key idea of paper: send the highest-utilization jobs to specific processors, use EDF for the rest
  – Minimizes the number of processors required
  – New acceptance test (a hedged sketch follows below)
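A hedged sketch of that idea (the paper’s exact acceptance test differs in its details; here the residual servers are checked against the well-known global-EDF bound of the form $U_{sum} \le m - (m-1)\,u_{max}$, and each server is assumed to have utilization at most 1):

```python
def admit(utilizations, m):
    """Dedicate processors to the heaviest servers until the remaining
    servers pass a global-EDF utilization bound on the processors left."""
    us = sorted(utilizations, reverse=True)   # heaviest first
    while us and m > 0:
        if sum(us) <= m - (m - 1) * us[0]:    # bound on remaining servers
            return True                        # schedule the rest with EDF
        us.pop(0)                              # pin heaviest to its own CPU
        m -= 1
    return not us                              # every server got its own CPU
```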
Is this a good paper?
• What were the authors’ goals?
• What about the evaluation/metrics?
• Did they convince you that this was a good system/approach?
• Were there any red flags?
• What mistakes did they make?
• Does the system/approach meet the “Test of Time” challenge?
• How would you review this paper today?
What is Fair Sharing?
• n users want to share a resource (e.g., CPU)
  – Solution: allocate each user 1/n of the shared resource
• Generalized by max-min fairness
  – Handles the case where a user wants less than its fair share
  – E.g., user 1 wants no more than 20%
• Generalized by weighted max-min fairness
  – Give weights to users according to importance
  – User 1 gets weight 1, user 2 weight 2
[Bar charts: an equal split gives each of three users 33% of the CPU; max-min fairness with user 1 capped at 20% gives 20%/40%/40%; weighted max-min with weights 1 and 2 gives 33%/66%]
Why is Fair Sharing Useful?
• Weighted Fair Sharing / Proportional Shares
  – User 1 gets weight 2, user 2 weight 1
• Priorities
  – Give user 1 weight 1000, user 2 weight 1
• Reservations
  – Ensure user 1 gets 10% of a resource
  – Give user 1 weight 10, sum of weights ≤ 100
• Share guarantee
  – Each user can get at least 1/n of the resource
  – But will get less if her demand is less
• Strategy-proof
  – Users are not better off by asking for more than they need
  – Users have no reason to lie
• Max-min fairness is the only “reasonable” mechanism with these two properties
Why Care about Fairness?
• Desirable properties of max-min fairness
  – Isolation policy: a user gets her fair share irrespective of the demands of other users
  – Flexibility separates mechanism from policy: proportional sharing, priority, reservation, ...
• Many schedulers use max-min fairness
  – Datacenters: Hadoop’s fair scheduler, capacity scheduler, Quincy
  – OS: round-robin, proportional sharing, lottery scheduling, Linux CFS, ...
  – Networking: WFQ, WF2Q, SFQ, DRR, CSFQ, ...
When is Max-Min Fairness not Enough?
• Need to schedule multiple, heterogeneous resources
  – Example: task scheduling in datacenters
    » Tasks consume more than just CPU – CPU, memory, disk, and I/O
• What are today’s datacenter task demands?
Heterogeneous Resource Demands
[Scatter plot of per-task demands on a 2000-node Hadoop cluster at Facebook (Oct 2010): most tasks need ~<2 CPU, 2 GB RAM>, but some tasks are memory-intensive and some are CPU-intensive]
Problem
• Single-resource example
  – 1 resource: CPU
  – User 1 wants <1 CPU> per task
  – User 2 wants <3 CPU> per task
  [Bar chart: the CPU is split 50%/50%]
• Multi-resource example
  – 2 resources: CPUs & memory
  – User 1 wants <1 CPU, 4 GB> per task
  – User 2 wants <3 CPU, 1 GB> per task
  – What is a fair allocation?
  [Bar chart: CPU and memory shares marked “?”]
Problem definition
How to fairly share multiple resources when users have heterogeneous demands on them?
Demands at Facebook
Model
• Users have tasks according to a demand vector
  – E.g., <2, 3, 1>: each of the user’s tasks needs 2 R1, 3 R2, 1 R3
  – Not needed in practice; can simply measure actual consumption
• Resources given in multiples of demand vectors
• Assume divisible resources
• Asset Fairness
  – Equalize each user’s sum of resource shares
• Cluster with 70 CPUs, 70 GB RAM
  – U1 needs <2 CPU, 2 GB RAM> per task
  – U2 needs <1 CPU, 2 GB RAM> per task

Max/min Theorem for DRF
• A user Ui has a bottleneck resource Rj in an allocation A iff Rj is saturated and all users using Rj have a smaller (or equal) dominant share than Ui
• Max/min Theorem for DRF
  – An allocation A is max/min fair iff every user has a bottleneck resource
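The allocation mechanism behind this theorem is the paper’s DRF algorithm: repeatedly launch one task for the user with the smallest dominant share, where a user’s dominant share is the largest fraction of any single resource allocated to that user. A minimal sketch (function and variable names are mine), using the <1 CPU, 4 GB> / <3 CPU, 1 GB> demands from earlier and the paper’s 9-CPU, 18-GB example cluster:

```python
def drf(capacities, demands, steps):
    """Each iteration launches one task for the user with the smallest
    dominant share, as long as that user's next task still fits."""
    n, m = len(demands), len(capacities)
    alloc = [[0.0] * m for _ in range(n)]        # per-user allocation
    used = [0.0] * m                             # totals per resource
    for _ in range(steps):
        # Dominant share = largest fractional share across resources.
        dom = [max(alloc[u][r] / capacities[r] for r in range(m))
               for u in range(n)]
        for u in sorted(range(n), key=lambda u: dom[u]):
            if all(used[r] + demands[u][r] <= capacities[r]
                   for r in range(m)):
                for r in range(m):
                    alloc[u][r] += demands[u][r]
                    used[r] += demands[u][r]
                break
        else:
            break                                # no task fits: saturated
    return alloc

# User 1 ends with 3 tasks <3 CPU, 12 GB>, user 2 with 2 tasks <6 CPU, 2 GB>;
# both dominant shares equalize at 2/3.
print(drf([9, 18], [[1, 4], [3, 1]], steps=20))
```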
Desirable Fairness Properties (1)
• Recall max/min fairness from networking
  – Maximize the bandwidth of the minimum flow [Bert92]
• Progressive filling (PF) algorithm (a water-filling sketch follows below)
  1. Allocate ε to every flow until some link is saturated
  2. Freeze the allocation of all flows on the saturated link and go to 1
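A minimal water-filling sketch of progressive filling for a single shared resource (my own illustration; the real PF algorithm iterates over flows and links):

```python
def max_min_share(capacity, demands):
    """Grow every unfrozen user's allocation until the resource is
    saturated or the user's demand is met (then freeze that user)."""
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = capacity
    while active and remaining > 1e-9:
        share = remaining / len(active)
        for i in list(active):
            take = min(share, demands[i] - alloc[i])
            alloc[i] += take
            remaining -= take
            if alloc[i] >= demands[i] - 1e-9:
                active.discard(i)      # demand satisfied: freeze this user
    return alloc

# Reproduces the earlier chart: capacity 100, user 1 capped at 20
# -> [20, 40, 40].
print(max_min_share(100, [20, 100, 100]))
```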
Evaluation
• Micro-experiments on EC2
  – Evaluate DRF’s dynamic behavior when demands change
  – Compare DRF with the current Hadoop scheduler
• Macro-benchmark through simulations
  – Simulate the Facebook trace with DRF and the current Hadoop scheduler
DRF Inside Mesos on EC2
[Time-series plots of User 1’s shares, User 2’s shares, and the dominant shares: User 1’s dominant resource is memory, User 2’s dominant resource is CPU; the dominant shares are equalized, and the share guarantee holds at ~70% dominant share]
Fairness in Today’s Datacenters
• Hadoop Fair Scheduler / capacity scheduler / Quincy
  – Each machine consists of k slots (e.g., k = 14)
  – Run at most one task per slot
  – Give jobs “equal” number of slots, i.e., apply max-min fairness to slot-count
• This is what the DRF paper compares against
Experiment: DRF vs Slots
[Bar charts of the number of Type 1 and Type 2 jobs finished: slot-based fair sharing shows low utilization and thrashing. Type 1 jobs: <2 CPU, 2 GB>; Type 2 jobs: <1 CPU, 0.5 GB>]
Experiment: DRF vs Slots
[Bar charts of job completion time for Type 1 and Type 2 jobs: low utilization hurts performance, and slot-based fair sharing again causes thrashing. Type 1 job: <2 CPU, 2 GB>; Type 2 job: <1 CPU, 0.5 GB>]
Reduction in Job Completion Time: DRF vs Slots
• Simulation of 1-week Facebook traces

Summary
• DRF provides multiple-resource fairness in the presence of heterogeneous demand
  – First generalization of max-min fairness to multiple resources
• DRF’s properties
  – Share guarantee: each user gets at least 1/n of one resource
  – Strategy-proofness: lying can only hurt you
  – Performs better than current approaches
Is this a good paper?
• What were the authors’ goals?
• What about the evaluation/metrics?
• Did they convince you that this was a good system/approach?
• Were there any red flags?
• What mistakes did they make?
• Does the system/approach meet the “Test of Time” challenge?
• How would you review this paper today?