The Supermarket Model with Known and Predicted Service Times
Michael Mitzenmacher∗ Matteo Dell’Amico†
Abstract
The supermarket model refers to a system with a large number of queues, where arriving customers choose d queues at random and join the queue with the fewest customers. The supermarket model demonstrates the power of even small amounts of choice, as compared to simply joining a queue chosen uniformly at random, for load balancing systems. In this work we perform simulation-based studies to consider variations where service times for a customer are predicted, as might be done in modern settings using machine learning techniques or related mechanisms. Our primary takeaway is that using even seemingly weak predictions of service times can yield significant benefits over blind First In First Out queueing in this context. However, some care must be taken when using predicted service time information to both choose a queue and order elements for service within a queue; while in many cases using the information for both choosing and ordering is beneficial, in many of our simulation settings we find that simply using the number of jobs to choose a queue is better when using predicted service times to order jobs in a queue. Although this study is simulation based, it leaves many natural theoretical open questions for future work.
1 Introduction
The success of machine learning has opened up new opportunities in terms of improving the efficiency of a wide array of processes. In this paper, we consider opportunities for using machine learning predictions in a specific setting: queueing in large distributed systems using “the power of two choices”. This also leads us to consider variants of these systems that do not use predictions as a starting point and that appear not to have been studied previously. While our study here is simulation-based, with both synthetic and real data sets, it leads to several new open theoretical questions.
∗School of Engineering and Applied Sciences, Harvard University. [email protected]. This work was supported in part by NSF grants CCF-1563710 and CCF-1535795.
†[email protected]

We start with key background. In queueing settings, the supermarket model (also described as the power of two choices, or balanced allocations) is typically described in the following way. Suppose we have a system of n First In, First Out (FIFO) queues. Jobs¹ arrive to the system as a Poisson process of rate λn, and service times are independent and exponentially distributed with mean 1. If each job selects a random queue on arrival, then via Poisson splitting [23, Section 8.4.2] each queue acts as a standard M/M/1 queue, and in equilibrium the fraction of queues with at least i jobs is $\lambda^{i}$. Note that we consider here the tails of the queue length distribution, as they make for easier comparisons. If each job selects two random queues on arrival and chooses to wait at the queue with fewer jobs (breaking ties randomly), then in the limiting system as n grows to infinity, in equilibrium the fraction of queues with at least i jobs is $\lambda^{2^{i}-1}$. That is, the tails decrease doubly exponentially in i, instead of singly exponentially. In practice, even for moderate values of n (say in the large hundreds), one obtains performance close to this mean field limit; this can be proven based on appropriate concentration bounds. More generally, for d choices where d is an integer constant greater than 1, the fraction of queues with at least i jobs falls like $\lambda^{(d^{i}-1)/(d-1)}$ [20, 29]. While there are many variations on the supermarket model and its analysis (see, e.g., [1, 6, 24, 28]), here we focus on this standard, simple formulation, although we allow for general service distributions instead of just the exponential distribution.
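As a quick numerical check on these formulas, the following Python sketch evaluates the mean-field tails $s_i = \lambda^{(d^i-1)/(d-1)}$ (taking d = 1 to mean the random-queue case $\lambda^i$) and converts them into an average time in system with Little's law, $E[T] = \frac{1}{\lambda}\sum_{i \ge 1} s_i$: the sum is the expected number of jobs at a queue, and each queue receives arrivals at rate λ. It assumes exponential service with mean 1 and FIFO queues, as in the standard model; the function names and the truncation tolerance are our own illustrative choices.

```python
def tail_fraction(lam: float, d: int, i: int) -> float:
    """Mean-field fraction of queues with at least i jobs (d = 1: random queue)."""
    if d == 1:
        return lam ** i
    # (d^i - 1)/(d - 1) = 1 + d + ... + d^(i-1) is always an integer
    return lam ** ((d ** i - 1) // (d - 1))


def mean_time_in_system(lam: float, d: int, tol: float = 1e-15) -> float:
    """Little's law: E[T] = (expected jobs per queue) / (arrival rate per queue)."""
    total, i = 0.0, 1
    while True:
        s = tail_fraction(lam, d, i)
        total += s
        if s < tol:  # remaining tail terms are negligible
            break
        i += 1
    return total / lam


for d in (1, 2, 3):
    print(d, round(mean_time_in_system(0.9, d), 4))
```

For d = 1 this reduces to the M/M/1 value 1/(1 − λ); for d = 2 and λ = 0.5 it gives about 1.266, close to the simulated FIFO shortest-queue entry for λ = 0.5 in Table 1.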
As stated previously, here we study variations of the supermarket model where service times are predicted. To describe our work and goals, we start by considering the baseline where service times are known.
The analysis for the basic supermarket model described above assumes that service times are exponentially distributed but specific job service times are not known. (Extensions of the analysis to more general distributions are known [1, 6].) As such, an incoming job uses only the number of jobs at each chosen queue to decide which queue to join. As both a theoretical question and for possible practical implementations, it seems worthwhile to know what further improvement is possible if service times of the jobs were known. Recently, Hellemans and Van Houdt proved results in the supermarket model setting where job reservations are made at d randomly chosen queues, and once the first reservation reaches the point of obtaining service, the other reservations are canceled. This corresponds to choosing the least loaded (in terms of total remaining service time²) of d queues using FIFO queues. Their work applies to general service distributions; for the class of phase-type service distributions, they are able to express the limiting behavior of the system in terms of delayed differential equations [11]. Their results, including theorems regarding the system behavior as well as simulations, show that using service time information can lead to significant improvements in the average time a job spends in the system. (The subsequent work [12] examines several additional variations.) Because of space limitations, we leave discussion of the challenges of extending these results beyond FIFO queues to Appendix A.1.

¹In this paper we use jobs instead of the more specific term customers, as the model applies to a variety of load-balancing settings.

arXiv:1905.12155v2 [cs.PF] 8 Oct 2020
However, when the service times are known, there are two possible ways to potentially improve performance. First, as above, one can use the service times when selecting a queue, by choosing the least loaded queue. Second, one can order the jobs using a strategy other than FIFO; the natural strategies to minimize the average time in the system (response time) are shortest job first (SJF), preemptive shortest job first (PSJF), and shortest remaining processing time (SRPT). Here shortest job first assumes no preemption and always schedules the job with the smallest service time when a job completes; preemptive shortest job first allows preemption, so that a job with a smaller service time can preempt the running job; and shortest remaining processing time allows preemption but is based on the remaining processing time instead of the total service time of a job. Note that here we assume a preempted job does not need to start from the beginning and can later continue service where it left off. Also, while apparently somewhat less natural, PSJF allows job priorities to be assigned on arrival to a queue without the need for updating, unlike SRPT.
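To make the distinction among these disciplines concrete, here is a minimal Python sketch of how a single queue might decide which job holds the server after an event (an arrival or a completion). The `Job` fields and the `priority` and `pick_next` helpers are hypothetical names for illustration, and the tie rule (the running job keeps the server on a tie) is our assumption, not something specified in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    arrival: float    # arrival time at the queue
    size: float       # total service time x
    remaining: float  # service time still needed


def priority(job: Job, policy: str) -> float:
    """Smaller value means served first."""
    if policy == "FIFO":
        return job.arrival
    if policy in ("SJF", "PSJF"):
        return job.size       # fixed on arrival, never updated
    if policy == "SRPT":
        return job.remaining  # shrinks while the job is in service
    raise ValueError(f"unknown policy {policy!r}")


def pick_next(running: Optional[Job], waiting: List[Job], policy: str) -> Optional[Job]:
    """Job that should hold the server after an arrival or completion event."""
    if running is not None and policy in ("FIFO", "SJF"):
        return running  # non-preemptive: the running job keeps the server
    # Running job listed first so that min() keeps it on ties (assumption).
    candidates = ([running] if running is not None else []) + waiting
    if not candidates:
        return None
    return min(candidates, key=lambda j: priority(j, policy))
```

For example, if the running job has size 5 with 1 unit left and a job of size 2 arrives, SJF keeps the running job, PSJF switches to the new job because 2 < 5, and SRPT keeps the running job because 1 < 2.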
In the setting of a single queue, Mitzenmacher has recently considered the setting where service times are predicted rather than known exactly [21]. In this model, the jobs have a joint service-predicted service density function g(x, y), where x is the true service time and y is the predicted service time. He provides formulae for the average response time using the corresponding strategies shortest predicted job first (SPJF), preemptive shortest predicted job first (PSPJF), and shortest predicted remaining processing time (SPRPT). Simulation results suggest that in the single queue setting even weak predictors can greatly improve performance over FIFO queues. However, using the power of two choices already provides great improvements in systems with multiple queues. It is therefore natural to consider whether predictions would still provide significant performance gains in the supermarket model.

²We use service time and processing time interchangeably in this paper; both terms have been used historically.
The contributions of this paper include the following.
• For the case of known service times, we provide a simulation study with synthetic traces showing the potential gains when using SJF, PSJF, and SRPT queues in the supermarket model, providing an appropriate baseline.
• We similarly examine through simulations the benefits when only predicted information is available, using FIFO, SPJF, PSPJF, and SPRPT queues. Here we use both synthetic and real-world datasets.
• We determine somewhat counterintuitive behaviors; for example, we find many cases where choosing the predicted least loaded queue performs worse than simply choosing the shortest queue.
• We provide a number of open questions related to the analysis and use of these systems.
2 Additional Related Work
The power of two choices was first analyzed in the discrete settings of hashing, modeled as balls and bins processes [3, 14, 18]. It was subsequently analyzed in the setting of queueing systems, in particular in the mean field limit (also referred to as the fluid limit) as the number of queues grows to infinity [20, 29].
Ordering jobs by service time has been studied extensively in single queues. The text [10] provides an excellent introduction to the analysis of standard approaches such as SJF and SRPT in the single queue setting.
Our work falls into a recent line of work that aims to use machine learning predictions to improve traditional algorithms. For example, Lykouris and Vassilvitskii [16] show how to use prediction advice from machine learning algorithms to improve online algorithms for caching in a way that provides provable performance guarantees, using the framework of competitive analysis. Other recent works with this theme include the development of learned Bloom filters [15, 22] and heavy hitter algorithms that use predictions [13]. One prior work in this vein has specifically looked at scheduling with predictions in the setting of a fixed collection of jobs, and considers variants of shortest predicted processing time that yield good performance in terms of the competitive ratio, with the performance depending on the accuracy of the predictions [25].
In scheduling of queues, some works have looked at the effects of using imprecise information, including for load balancing in multiple queue settings. For example, Mitzenmacher considers using old load information to place jobs (in the context of the power of two choices) [19]. The TAGS strategy is an approach to utilizing multiple queues when no information exists about the service time; jobs that run more than some threshold in the first queue are canceled and passed to the second queue, and so on [9].
For single queues, Wierman and Nuyens look at variations of SRPT and SJF with inexact job sizes, bounding the performance gap based on bounds on how inexact the estimates can be [32]. Dell’Amico, Carra, and Michiardi note that such bounds may be impractical, as outliers in estimating job sizes occur frequently; they empirically study scheduling policies for queueing systems with estimated sizes [8]. We note that [8] points out there are natural methods to estimate job size, such as by running a small portion of the code in a coding job; we expect this or other inputs would be features in a machine learning formulation. Recent work by Scully and Harchol-Balter has considered scheduling policies that are based on the amount of service received, where the scheduler only knows the service received approximately, subject to adversarial noise, and the goal is to develop robust policies [26]. Also, for single queues, many prediction-based policies appear to fit within the more general framework of SOAP policies presented by Scully et al. [27].
Our work differs from these past works in providing a model specifically geared toward studying performance with machine-learning based predictions in the context of the supermarket model.
3 Known Service Times
3.1 Scheduling Beyond FIFO
To begin, we note again that the work of [11] shows that for the supermarket model with d choices for constant d, known service times (independently chosen from a given service time distribution), and FIFO scheduling, the equations for the stationary distribution can be determined when the queue is chosen according to the least loaded policy. However, scheduling schemes within each queue that make use of the service times, including shortest job first (SJF), preemptive shortest job first (PSJF), and shortest remaining processing time (SRPT), do not appear to have been studied previously.
While in this paper we are primarily interested in the performance of the supermarket model with predicted service times, we provide results for these variations as a baseline for our later results, as they do not appear to have been studied.
In the simulation experiments we present, we simulate 1000 initially empty queues over 10000 units of time, and take the average response time for all jobs that terminate after time 1000 and before time 10000. We then take the average of this value over 100 simulations. Waiting for the first 1000 time units allows the system to approach the stationary distribution. Variations of the supermarket model have a limiting equilibrium distribution as the number of queues goes to infinity [6], and in practice we find 1000 queues provides an accurate estimate of the limiting behavior. In the experiments we focus on two example service distributions: exponential with mean 1, and a Weibull distribution with cumulative distribution function $1 - e^{-\sqrt{2x}}$. (The Weibull distribution is more heavy-tailed, but also has mean 1. We have also done experiments with a more heavy-tailed Weibull distribution with cumulative distribution function $1 - e^{-\sqrt[3]{6x}}$; the general trends are similar for this distribution as for the Weibull distribution we discuss.) Arrivals are Poisson with arrival rate λ; we focus on results with λ ≥ 0.5, as for smaller arrival rates all our proposed schemes perform very well and it becomes difficult to see performance differences. Unless otherwise noted, in the simulations each job chooses d = 2 queues at random. While we have done simulations for larger d values, and at a high level there are similar trends, studying the detailed effects of larger d across the many variations we study is left for future work.
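The two Weibull service distributions above can be sampled by inverting their cumulative distribution functions: if U is uniform on (0, 1], then $x = (\ln U)^2/2$ has CDF $1 - e^{-\sqrt{2x}}$ and $x = (-\ln U)^3/6$ has CDF $1 - e^{-\sqrt[3]{6x}}$ (using that 1 − U is also uniform). The following Python sketch, with hypothetical function names, checks numerically that both have mean 1; it illustrates the setup and is not the simulation code used for the experiments.

```python
import math
import random


def weibull_sqrt(rng: random.Random) -> float:
    """Service time with CDF 1 - exp(-sqrt(2x)); mean 1."""
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    return math.log(u) ** 2 / 2.0


def weibull_cbrt(rng: random.Random) -> float:
    """Heavier-tailed service time with CDF 1 - exp(-(6x)^(1/3)); mean 1."""
    u = 1.0 - rng.random()
    return (-math.log(u)) ** 3 / 6.0


rng = random.Random(12345)
n = 200_000
xs = [weibull_sqrt(rng) for _ in range(n)]
ys = [weibull_cbrt(rng) for _ in range(n)]
print(sum(xs) / n, sum(ys) / n)  # both sample means should be close to 1
```

Equivalently, Python's `random.weibullvariate(alpha, beta)` (scale alpha, shape beta) produces these distributions as `weibullvariate(0.5, 0.5)` and `weibullvariate(1/6, 1/3)`.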
Figure 1(a) shows the results where the least loaded queue is chosen (ties broken randomly), while Figure 1(b) shows the results where the shortest queue is chosen, for exponential service times. Figures 1(c) and 1(d) present the results for the Weibull distributed service times. Generally, we see that using the known service times to order jobs at the queue is very powerful; indeed, the gain from using SRPT appears larger than the gain from moving from shortest queue to least loaded, and similarly the gain from using SJF and PSJF is larger under high enough loads.
As the charts make it somewhat more difficult to see some important details, we present numerical results for exponentially distributed service times in Table 1 to mark some key points. While generally the benefits from using the service times to both choose the queue and order the queue are complementary, this is not always the case. We see that using least loaded rather than shortest queue when using PSJF can increase the average time in the system under suitably high load. (This also occurs with the Weibull distribution under sufficiently high loads.) We also see that using PSJF can give worse performance than using SJF; however, this does not happen in our experiments with the Weibull distribution, where the ability of preemption to help avoid waiting for long-running jobs appears to be more helpful. While it is known that PSJF can behave worse than SJF, these examples highlight that the interactions when using service time information in multiple choice systems must be treated carefully.

[Figure 1: average time in system versus λ (arrival rate, 0.5 to 1.0) for FIFO, SJF, PSJF, and SRPT. Panels: (a) Exponential service times, queue chosen by least loaded; (b) Exponential service times, queue chosen by shortest queue; (c) Weibull service times, queue chosen by least loaded; (d) Weibull service times, queue chosen by shortest queue.]
Figure 1: Exponential and Weibull service times, two choice supermarket model, with various queue scheduling policies.

Average time in system, exponential service times:
        Shortest queue                      Least loaded
λ       FIFO    SJF     PSJF    SRPT        FIFO    SJF     PSJF    SRPT
0.5     1.2658  1.2585  1.1669  1.1337      1.1510  1.1460  1.1462  1.0973
0.6     1.4078  1.3857  1.2527  1.2020      1.2401  1.2280  1.2289  1.1518
0.7     1.6148  1.5567  1.3726  1.2962      1.3749  1.3467  1.3490  1.2307
0.8     1.9485  1.7997  1.5542  1.4367      1.5975  1.5297  1.5371  1.3533
0.9     2.6168  2.2054  1.8850  1.6873      2.0534  1.8634  1.8915  1.5783
0.95    3.3923  2.5903  2.2248  1.9408      2.5852  2.1999  2.2685  1.8096
0.98    4.5384  3.0618  2.6721  2.2614      3.3798  2.6197  2.7807  2.1038
0.99    5.4855  3.3856  2.8596  2.4903      4.0451  2.8514  3.1696  2.3176
Table 1: Results from choosing from the shortest queue compared with choosing the least loaded.
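The FIFO columns of Table 1 can be approximated with a compact simulation, because under FIFO a queue is fully described by the departure times of the jobs it holds. The sketch below is our own illustration, not the authors' simulator: it assumes exponential service with mean 1, samples d distinct queues, breaks ties randomly, and averages over jobs that finish inside a (warmup, horizon) window as described above. The parameter values in the usage line are small illustrative choices, not the 1000 queues and 10000 time units used in the experiments.

```python
import random
from collections import deque


def simulate_fifo(n, lam, horizon, warmup, d=2, rule="shortest", seed=0):
    """Average response time of FIFO queues in the d-choice supermarket model.

    rule="shortest": join the sampled queue holding the fewest jobs.
    rule="least_loaded": join the sampled queue with the least remaining work
    (exact service times). Service times are exponential with mean 1.
    """
    if rule not in ("shortest", "least_loaded"):
        raise ValueError(f"unknown rule {rule!r}")
    rng = random.Random(seed)
    # Under FIFO, departure times within a queue are increasing.
    pending = [deque() for _ in range(n)]
    t, total, done = 0.0, 0.0, 0
    while True:
        t += rng.expovariate(lam * n)  # Poisson arrivals, total rate lam * n
        if t >= horizon:
            break
        best_key, best_q = None, None
        for q in rng.sample(range(n), d):  # d distinct queues
            dq = pending[q]
            while dq and dq[0] <= t:  # discard jobs that have already left
                dq.popleft()
            if rule == "shortest":
                load = len(dq)
            else:
                load = dq[-1] - t if dq else 0.0  # remaining work in the queue
            key = (load, rng.random())  # random tie-breaking
            if best_key is None or key < best_key:
                best_key, best_q = key, q
        dq = pending[best_q]
        start = dq[-1] if dq else t  # service starts when the last job leaves
        departure = start + rng.expovariate(1.0)
        dq.append(departure)
        if warmup < departure < horizon:  # count jobs finishing in the window
            total += departure - t
            done += 1
    return total / done


if __name__ == "__main__":
    for rule in ("shortest", "least_loaded"):
        print(rule, simulate_fifo(200, 0.9, 500.0, 100.0, rule=rule, seed=1))
```

Because the least loaded rule here uses exact remaining work, it corresponds to the FIFO least loaded column; SJF, PSJF, and SRPT would need a full event-driven simulator that tracks preemption.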
3.2 Choosing a Queue with Exact Information
Given the improvements possible using known service times in the supermarket model, we now consider methods for choosing a queue beyond the queue with the least load, in the setting without predictions. Given full information about the service times of jobs at each queue, a job could be placed so that it minimizes the additional waiting time. The additional waiting time when placing an arriving job is the sum of the remaining service times of all jobs in the queue that will remain ahead of the arriving job, plus the product of the service time of the arriving job and the number of jobs it will be placed ahead of. Equivalently, we can consider the total waiting time for each queue before and after the arriving job would be placed (ignoring the possibility of future jobs), and place the job in the queue that leads to the smallest increase.

Alternatively, if control is not centralized, we might consider selfish jobs that seek only to minimize their own waiting time when choosing a queue. In this case the arriving job will consider, for each available queue choice, the sum of the remaining service times of all jobs that will be ahead of it.
Our results, given in Figure 2, show that choosing a queue to minimize the additional waiting time in these situations does yield a small improvement over least loaded SRPT, as might be expected. Because the additional improvement is small, we expect in many systems it may not be worthwhile to implement this modification, even if expected waiting time is the primary performance metric. Our results also show that while selfish jobs have a significant negative effect, the overall average time in the system still remains smaller than in the standard supermarket model when choosing the shorter of two FIFO queues.
[Figure 2: average time in system versus λ (arrival rate, 0.5 to 1.0) for SRPT, SELFISH, and MIN-ADD. Panels: (a) Exponential service times, queue choice methods; (b) Weibull service times, queue choice methods.]
Figure 2: Comparing methods of choosing a queue. All queues use SRPT within the queue; in the figure, SRPT means each job chooses the queue with smallest remaining work, SELFISH means each job chooses the queue that minimizes its waiting time, and MIN-ADD means each job chooses the queue that minimizes the additional waiting time added.

4 Predicted Service Times
In many settings, it may be unreasonable to expect to obtain exact service times, but predicted service times may be available. Indeed, with advances in machine learning techniques, we expect that in many settings some type of prediction will be available. As noted in [21], in the context of scheduling within a single queue, one would expect that even weak predictors may be very useful, since ordering jobs correctly most of the time will produce significant gains. As we have seen, however, even without predictions it is not always clear whether using load information both for choosing a queue and for ordering within a queue provides complementary gains. Naturally, the same question will arise again when using predicted service times.
4.1 The Prediction Model
In what follows, we utilize a simple model used in [21], namely that there is a continuous joint distribution g(x, y) for the actual service time x and predicted service time y.
With this model, there remains an issue of how to describe the predicted remaining service time. Suppose that the original predicted service time for a job is y, but the actual service time is x > y. If the amount of service is being tracked, and the service received has been t, then as the remaining service time is x − t, it is natural to use y − t as the predicted remaining service time. Of course, at some point it may become the case that t > y, and the predicted remaining service time will be negative, which seems unsuitable.
We use $(y - t)^+ = \max(y - t, 0)$ as the predicted remaining service time. We recognize (as noted in [21]) that this is problematic; clearly the predicted remaining service time should be positive, and ideally would be a function f(y, t) of the initial prediction and the time served thus far. However, determining the appropriate function would appear to require some knowledge of the joint distribution g(x, y); our aim here is to explore simple, general approaches (such as choosing the shorter of two queues and using SRPT) that are agnostic to the underlying distribution g. In many situations, it may be computationally undesirable to utilize knowledge of g, or g may not be known or may change over time. We therefore leave the question of how to optimize the estimate of the predicted remaining time to achieve the best performance in this context as future work.
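The following Python sketch, with hypothetical names, shows the priority that each predicted-size discipline assigns to a job with initial prediction y after it has received t units of service, using the $(y - t)^+$ convention above. It illustrates one way SPRPT can suffer from underpredicted long jobs: once t exceeds y the job's priority is 0, so no newly arriving job can preempt it, whereas under PSPJF the job keeps its original priority y. The strict comparison used for preemption is our assumption.

```python
def predicted_remaining(y: float, t: float) -> float:
    """(y - t)^+ : predicted remaining service time, floored at 0."""
    return max(y - t, 0.0)


def predicted_priority(y: float, t: float, policy: str) -> float:
    """Smaller value = higher priority; y = initial prediction, t = service so far."""
    if policy in ("SPJF", "PSPJF"):
        return y  # assigned on arrival, never updated
    if policy == "SPRPT":
        return predicted_remaining(y, t)
    raise ValueError(f"unknown policy {policy!r}")


# A job predicted to need 1.0 units has already run for 2.0 units (it was
# underpredicted); a new job arrives with a prediction of 0.5 units.
running = (1.0, 2.0)
arriving = (0.5, 0.0)
for policy in ("PSPJF", "SPRPT"):
    preempts = predicted_priority(*arriving, policy) < predicted_priority(*running, policy)
    print(policy, "arriving job preempts:", preempts)
```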
We consider various models for predictions (some of which were used in [21]). The models are intended to be exemplary; they do not match a specific real-world setting, and indeed it would be difficult to consider the range of possible real-world predictions. Rather, they are meant to show generally that even moderately accurate predictions can yield strong performance, and to show that a variety of interesting behaviors can occur under this framework.
In one model, which we refer to as exponential predictions, a job with actual service time x has a predicted service time that is exponentially distributed with mean x. This model is not meant to accurately represent a specific situation, but is potentially useful for theoretical analysis in that the corresponding density function g(x, y) is easy to write down, and it highlights how even noisy predictions can perform well. Also, exponential service times are a standard first consideration in queueing theory. In another model, which we refer to as α-predictions, a job with service time x has a predicted service time that is uniform over [(1 − α)x, (1 + α)x], for a scale parameter 0 ≤ α ≤ 1. Again, this is a simple model that captures inaccurate estimates naturally. Finally, we introduce a model that we dub (α, β)-predictions, which makes use of the following notion of a reversal. For a service distribution with cumulative distribution function S(x), the reversal of x is $S^{-1}(1 - S(x))$. For example, if x is the value that is at the 70th percentile of the distribution, the reversal is the value at the 30th percentile of the distribution. For an (α, β)-prediction, when the service time is x, with probability β the predicted service time is the reversal of x, and with all remaining probability the predicted service time is uniform over [(1 − α)x, (1 + α)x]. We use this model to represent cases where severe mispredictions are possible, so that jobs with very large service times might be mistakenly predicted as having very small service times (and vice versa). We might expect such mispredictions to be very problematic when scheduling jobs according to their predicted service times.
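A minimal Python sketch of the three prediction models follows. The reversal defaults to exponential service times with mean 1, where $S(x) = 1 - e^{-x}$ and $S^{-1}(p) = -\ln(1 - p)$; for another service distribution one would pass its CDF and inverse CDF. (For exponential predictions with service density f, the joint density is $g(x, y) = f(x)\,\frac{1}{x}e^{-y/x}$.) All function names are hypothetical; this illustrates the definitions and is not the code used for the experiments.

```python
import math
import random


def exp_cdf(x: float) -> float:
    return 1.0 - math.exp(-x)  # S(x) for exponential service, mean 1


def exp_inv_cdf(p: float) -> float:
    return -math.log(1.0 - p)  # S^{-1}(p)


def reversal(x, cdf=exp_cdf, inv_cdf=exp_inv_cdf):
    """S^{-1}(1 - S(x)): maps the 70th percentile to the 30th, and so on."""
    return inv_cdf(1.0 - cdf(x))


def exponential_prediction(x, rng):
    return rng.expovariate(1.0 / x)  # exponentially distributed with mean x


def alpha_prediction(x, alpha, rng):
    return rng.uniform((1.0 - alpha) * x, (1.0 + alpha) * x)


def alpha_beta_prediction(x, alpha, beta, rng, cdf=exp_cdf, inv_cdf=exp_inv_cdf):
    if rng.random() < beta:
        return reversal(x, cdf, inv_cdf)  # severe misprediction
    return alpha_prediction(x, alpha, rng)


rng = random.Random(0)
x = rng.expovariate(1.0)  # a true service time
print(x, exponential_prediction(x, rng), alpha_prediction(x, 0.5, rng),
      alpha_beta_prediction(x, 0.5, 0.2, rng))
```

Note that the reversal is an involution: reversing twice returns the original service time, and the median of the distribution is its own reversal.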
There further remains the question of how to account for the predicted workload at a queue. We discuss several variations.
1. Least Loaded Total: One could simply treat the predicted service times as actual service times, and track the total predicted service time remaining at a queue. That is, when a new job arrives at a queue, the predicted service time for the job is added to the total, and the total predicted service time reduces at a rate of one unit per unit time when a job is in the system (with a lower bound of 0); when the queue empties, the total predicted service time is reset to 0. An advantage of this approach is that in implementation the queue state can be represented by a single number. The disadvantage is that when a job’s predicted service time differs greatly from the real service time, this approach does not correspondingly update when that job completes.
2. Least Loaded Updated: Here one updates the queue state both on a job arrival and a job completion; when a job completes, the predicted service time at the queue is recomputed as the sum of the predicted service times of the remaining jobs. With small additional complexity, the accuracy of the predicted work at the queue improves substantially.
3. Shortest Queue: One can always simply use the number of jobs rather than the predicted service time to choose the queue.
We note Least Loaded Updated performs much better than Least Loaded Total, and focus on it for the rest of the paper. More on the comparison is given in Appendix A.2.
4.2 Scheduling with Predictions
We begin as before by first considering the effect of the choice of scheduling procedure within a queue, by examining results for FIFO, shortest predicted job first (SPJF), preemptive shortest predicted job first (PSPJF), and shortest predicted remaining processing time (SPRPT) in various settings. Our figures consider the least loaded updated and shortest queue variations described above (as the least loaded total variation generally performs significantly worse, and so we do not generally consider this model, although for comparison purposes we present some results for it in the next subsection). We again consider exponential and Weibull distributed service times as previously described.
Our first results for exponential predictions, shown in Figure 3, already show two key points: predicted service times can work quite well, but there are also surprising and interesting behaviors. First, choosing the shortest queue generally performs better than choosing the least loaded queue according to the predicted service times of jobs in the queue; for this set of experiments, only with Weibull distributed service times and FIFO service does choosing the queue based on the predicted load in the queue perform better than using just the number of jobs. That is, when using strategies within the queue that utilize the predicted information, it is worse to also use the predicted load to choose the queue. Hence, even in this very simple case, we see that using predicted information for multiple subtasks (choosing a queue, and ordering within a queue) can lead to worse performance than simply using the information for one of the subtasks.
Second, PSPJF performs better than SPRPT on the Weibull distribution. On reflection, this seems reasonable from first principles; a long job that is incorrectly predicted to have a small remaining processing time can lead to increased waiting times for many jobs under SPRPT, but preempting based on the initial prediction of the job time ameliorates this effect.
We now examine results in the setting of α-predictions. We first look at the case of SPRPT; results for other schemes have similar characteristics. We compare SRPT (exact service times, no prediction) with SPRPT for α = 0.2, 0.5, and 0.8, using both the least loaded updated and shortest queue policies. The results appear in Figure 4. The primary takeaway is again that using predictions offers what is arguably surprisingly little loss in performance, even at large values of α. Here, we find that least loaded does better than shortest queue for small values of α, but for α = 0.8 and high arrival rates shortest queue can perform slightly better. This is consistent with our results for the exponential model; under large error, using predictions both to choose a queue and within a queue can lead to over-using the predictions.
[Figure 3: average time in system versus λ (arrival rate, 0.5 to 1.0) for FIFO, SPJF, PSPJF, and SPRPT. Panels: (a) Exponential service times, queue chosen by least loaded updated; (b) Exponential service times, queue chosen by shortest queue; (c) Weibull service times, queue chosen by least loaded updated; (d) Weibull service times, queue chosen by shortest queue.]
Figure 3: Exponential predictions with exponential and Weibull service times, two choice supermarket model, with various queue scheduling policies.
[Figure 4: average time in system versus λ (arrival rate, 0.5 to 1.0); legend: SRPT, SPRPT α = 0.2, SPRPT α = 0.5, SPRPT α = 0.8. Panels: (a) Exponential service times, queue chosen by least loaded updated; (b) Exponential service times, queue chosen by shortest queue; (c) Weibull service times, queue chosen by least loaded updated; (d) Weibull service times, queue chosen by shortest queue.]
Figure 4: α-predictions with exponential and Weibull service times, two choice supermarket model, using SPRPT.
We also look at the case of PSPJF in Figure 5. Performance is somewhat worse than for SPRPT, and the effect of increasing α is somewhat smaller. Here, we find that joining the shortest queue generally does better than joining the least loaded queue. Again, this is consistent with our results for the exponential model. Overall the picture remains very similar.
For completeness we also provide results using SPJF in Figure 6. Here SPJF generally performs worse than SPRPT and PSPJF; however, its performance degrades even more slowly as α increases. In these experiments, using least loaded always performed better than choosing the shortest queue.
Finally, we consider the case of (α, β)-predictions. Here we present an example of α = 0.5 with β = 0.1, 0.2, and 0.3, comparing also with the results from α-predictions when α = 0.5. Recall that in this setting, with probability β a job’s predicted service time is the reversal of its service time in the cumulative distribution function, so that jobs with very large service times might be mistakenly predicted as having very small service times (and vice versa). The remaining jobs have predictions uniform over [(1 − α)x, (1 + α)x] when the true service time is x. The results for SPRPT are given in Figure 7, for PSPJF in Figure 8, and for SPJF in Figure 9. The primary takeaway is again that performance is quite robust to mispredictions. Even when β = 0.3, performance in all cases is significantly better than for the standard approach of choosing the shorter of two queues and using FIFO queueing without knowledge of service times. We also see now familiar trends. The effects of misprediction are more significant for the heavy-tailed service times, and when mispredictions are sufficiently frequent, it becomes better to choose a queue according to the shortest queue rather than according to the least loaded updated policy. Also, in some cases PSPJF can outperform SPRPT.
4.2.1 Fairness Issues
As the general problem of utilizing predictions for queue scheduling is relatively new, we have focused here on examining expected response time. We point out, however, that there are novel problems regarding questions such as fairness in the setting where predictions are used. For example, a standard notion of fairness involves considering job slowdown, i.e., the ratio T(x)/x between a job’s response time T(x) and its size x, and the mean conditional slowdown E[T(x)]/x [4, 30, 31]. Not surprisingly, we find that when using predictions, even when using scheduling methods based on SRPT, which is often fair or limited
-
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
1.0
averagetim
ein
system
SJFPSPJF α = 0.2PSPJF α = 0.5PSPJF α = 0.8
(a) Exponential service times,queue chosen by least
loadedupdated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
1.0
averagetim
ein
system
SJFPSPJF α = 0.2PSPJF α = 0.5PSPJF α = 0.8
(b) Exponential service times,queue chosen by shortest queue
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
1.0
averagetim
ein
system
SJFPSPJF α = 0.2PSPJF α = 0.5PSPJF α = 0.8
(c) Weibull service times, queue
chosen by least loaded updated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
1.0
averagetim
ein
system
SJFPSPJF α = 0.2PSPJF α = 0.5PSPJF α = 0.8
(d) Weibull service times, queuechosen by shortest queue
Figure 5: α-predictions with exponential and Weibull service
times, two choice supermarket model, using PSPJF.
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
SJFSPJF α = 0.2SPJF α = 0.5SPJF α = 0.8
(a) Exponential service times,
queue chosen by least loadedupdated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
SJFSPJF α = 0.2SPJF α = 0.5SPJF α = 0.8
(b) Exponential service times,
queue chosen by shortest queue
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1
2
3
4
5
6
1
averagetim
ein
system
SJFSPJF α = 0.2SPJF α = 0.5SPJF α = 0.8
(c) Weibull service times, queue
chosen by least loaded updated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1
2
3
4
5
6
1
averagetim
ein
system
SJFSPJF α = 0.2SPJF α = 0.5SPJF α = 0.8
(d) Weibull service times, queue
chosen by shortest queue
Figure 6: α-predictions with exponential and Weibull service
times, two choice supermarket model, using SPJF.
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
1.0
averagetim
ein
system
SPRPT (α, β) = (0.5, 0.0)SPRPT (α, β) = (0.5, 0.1)SPRPT (α, β) =
(0.5, 0.2)SPRPT (α, β) = (0.5, 0.3)
(a) Exponential service times,queue chosen by least
loadedupdated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
1.0
averagetim
ein
system
SPRPT (α, β) = (0.5, 0.0)SPRPT (α, β) = (0.5, 0.1)SPRPT (α, β) =
(0.5, 0.2)SPRPT (α, β) = (0.5, 0.3)
(b) Exponential service times,queue chosen by shortest queue
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
SPRPT (α, β) = (0.5, 0.0)SPRPT (α, β) = (0.5, 0.1)SPRPT (α, β) =
(0.5, 0.2)SPRPT (α, β) = (0.5, 0.3)
(c) Weibull service times, queue
chosen by least loaded updated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
SPRPT (α, β) = (0.5, 0.0)SPRPT (α, β) = (0.5, 0.1)SPRPT (α, β) =
(0.5, 0.2)SPRPT (α, β) = (0.5, 0.3)
(d) Weibull service times, queuechosen by shortest queue
Figure 7: (α, β)-predictions with exponential and Weibull
service times, two choice supermarket model, usingSPRPT.
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
PSPJF (α, β) = (0.5, 0.0)PSPJF (α, β) = (0.5, 0.1)PSPJF (α, β) =
(0.5, 0.2)PSPJF (α, β) = (0.5, 0.3)
(a) Exponential service times,queue chosen by least loaded
updated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
PSPJF (α, β) = (0.5, 0.0)PSPJF (α, β) = (0.5, 0.1)PSPJF (α, β) =
(0.5, 0.2)PSPJF (α, β) = (0.5, 0.3)
(b) Exponential service times,queue chosen by shortest queue
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
PSPJF (α, β) = (0.5, 0.0)PSPJF (α, β) = (0.5, 0.1)PSPJF (α, β) =
(0.5, 0.2)PSPJF (α, β) = (0.5, 0.3)
(c) Weibull service times, queuechosen by least loaded
updated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
PSPJF (α, β) = (0.5, 0.0)PSPJF (α, β) = (0.5, 0.1)PSPJF (α, β) =
(0.5, 0.2)PSPJF (α, β) = (0.5, 0.3)
(d) Weibull service times, queuechosen by shortest queue
Figure 8: (α, β)-predictions with exponential and Weibull
service times, two choice supermarket model, usingPSPJF.
-
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
SPJF (α, β) = (0.5, 0.0)SPJF (α, β) = (0.5, 0.1)SPJF (α, β) =
(0.5, 0.2)SPJF (α, β) = (0.5, 0.3)
(a) Exponential service times,
queue chosen by least loadedupdated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
1.0
averagetim
ein
system
SPJF (α, β) = (0.5, 0.0)SPJF (α, β) = (0.5, 0.1)SPJF (α, β) =
(0.5, 0.2)SPJF (α, β) = (0.5, 0.3)
(b) Exponential service times,
queue chosen by shortest queue
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1
2
3
4
5
6
7
8
9
1
averagetim
ein
system
SPJF (α, β) = (0.5, 0.0)SPJF (α, β) = (0.5, 0.1)SPJF (α, β) =
(0.5, 0.2)SPJF (α, β) = (0.5, 0.3)
(c) Weibull service times, queue
chosen by least loaded updated
0.5 0.6 0.7 0.8 0.9 1.0
λ: arrival rate
1
2
3
4
5
6
7
8
9
1
averagetim
ein
system
SPJF (α, β) = (0.5, 0.0)SPJF (α, β) = (0.5, 0.1)SPJF (α, β) =
(0.5, 0.2)SPJF (α, β) = (0.5, 0.3)
(d) Weibull service times, queue
chosen by shortest queue
Figure 9: (α, β)-predictions with exponential and Weibull
service times, two choice supermarket model, usingSPJF.
in its unfairness (see the discussion in [30]), the
proposedvariations using prediction have very poor fairness.
This occurs for multiple reasons. Most significantly,even very
short jobs can be caught waiting for jobs witha remaining predicted
service of zero. This leads tooccasional large values of T (x)/x
for small jobs, skewingthe fairness. Also, in cases where small
jobs obtain largepredictions, the value of T (x)/x can again be
high underhigh load.
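The first failure mode can be seen in a two-job snapshot. The sketch below assumes that SPRPT ranks jobs by predicted remaining time, max(prediction − service received, 0), always serves the smallest, and that no further jobs arrive; the `Job` class and `sprpt_pick` helper are hypothetical illustrations, not code from our simulator.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    size: float             # true service time (unknown to the scheduler)
    predicted: float        # predicted service time
    attained: float = 0.0   # service received so far

    def predicted_remaining(self) -> float:
        # Clamped at zero: an underestimated job that outlives its prediction
        # keeps a remaining-time estimate of 0 until it actually finishes.
        return max(self.predicted - self.attained, 0.0)


def sprpt_pick(jobs):
    """Job served next under SPRPT: smallest predicted remaining time."""
    return min(jobs, key=Job.predicted_remaining)


# A long job predicted at 1.0 has already received 1.5 units of service.
big = Job("big", size=10.0, predicted=1.0, attained=1.5)
# A short job with an accurate prediction arrives behind it.
small = Job("small", size=0.1, predicted=0.1)

assert sprpt_pick([small, big]) is big  # estimate 0.0 beats 0.1
wait = big.size - big.attained          # small job waits 8.5 time units
slowdown = (wait + small.size) / small.size
print(f"slowdown of the short job: {slowdown:.1f}")  # about 86
```

The short job is treated correctly by its own prediction; its large T(x)/x comes entirely from the clamped estimate of the job ahead of it.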
The two problems suggest different solutions. In the first case, T(x) is artificially high; this suggests that strong fairness results require policies that do something beyond letting jobs with predicted remaining service time zero continue. Some addition of processor sharing, preemption, or modifying the prediction when the predicted remaining service time reaches zero should therefore improve fairness. In the second case, x is small when the predicted time y is large; this suggests that alternative definitions of fairness, based on the prediction y as well as the actual service time x, should be considered.
We provide some preliminary results on fairness in Appendix B, and leave additional study for further work.
5 Real-World Traces
We now consider how the scheduling policies deal with real-world traces by Amvrosiadis et al. [2], who developed a system that predicts job size in large real-world clusters. For each job, we have the submission time and its real and predicted runtime in seconds (for multi-task jobs, sizes were obtained by summing the size (resp. predicted size) of each individual task); we only take into account the jobs that completed successfully. For consistency with the synthetic traces, we normalize job size to obtain a mean of 1. We have three traces: Google (385,072 jobs), Trinity (18,872 jobs), and Twosigma (265,029 jobs). In Figure 10, we divide each trace into 100 periods and plot the cumulative size of jobs submitted in each period, normalized such that the mean value per period is 1. The job submission patterns are very different: most of the load for the Twosigma dataset is concentrated around the beginning of the trace; the Trinity dataset has lower load spikes along the whole trace; finally, the Google dataset has a relatively more uniform load pattern.³
Figure 10: Cumulative weight of jobs submitted per period. [Weight of jobs submitted in each of 100 periods, normalized to mean 1, for Twosigma, Trinity, and Google.]
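The per-period weights plotted in Figure 10 can be computed along these lines. This is a sketch under our own assumption that the 100 periods are equal-length slices of the trace's submission-time range; `period_weights` is a hypothetical helper, and sizes may be raw or already normalized, since the final rescaling removes the units.

```python
def period_weights(submit_times, sizes, num_periods=100):
    """Total job size submitted in each equal-length period, rescaled to mean 1."""
    t0, t1 = min(submit_times), max(submit_times)
    span = (t1 - t0) or 1.0  # guard against a trace with a single timestamp
    weights = [0.0] * num_periods
    for t, s in zip(submit_times, sizes):
        # The last submission falls exactly on t1; keep it in the final period.
        k = min(int((t - t0) / span * num_periods), num_periods - 1)
        weights[k] += s
    mean = sum(weights) / num_periods
    return [w / mean for w in weights]


print(period_weights([0, 1, 2, 3], [3, 1, 0, 0], num_periods=2))  # [2.0, 0.0]
```

A value of 2.0 means that period received twice the average amount of work; values near 1 throughout correspond to the flatter Google pattern.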
To obtain an average system load of λ, we scale job submission times such that the average interval between jobs is 1/(qλ), where q = 100 is the number of queues in the simulation. Since there is a degree of randomness due to the random choice of queues, we repeated the experiment 5 times for each setting, with negligible differences in the end results. Results shown here represent the average of the 5 runs.
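The normalization described above (mean job size 1, then a mean inter-submission gap of 1/(qλ)) can be sketched as follows; with unit mean size, q queues then see an offered load of about λ each. Assumptions: submission times are sorted, predictions are divided by the same constant as true sizes (preserving each job's prediction-to-size ratio), and `rescale_trace` is a hypothetical name.

```python
def rescale_trace(submit_times, sizes, predictions, lam, q=100):
    """Normalize sizes to mean 1 and stretch time so arrivals average q * lam per unit time."""
    if len(submit_times) < 2:
        raise ValueError("need at least two jobs to define an inter-arrival gap")
    mean_size = sum(sizes) / len(sizes)
    sizes = [s / mean_size for s in sizes]
    predictions = [p / mean_size for p in predictions]  # assumed: same factor
    t0 = submit_times[0]
    mean_gap = (submit_times[-1] - t0) / (len(submit_times) - 1)
    factor = 1.0 / (q * lam * mean_gap)  # new mean gap becomes 1 / (q * lam)
    times = [(t - t0) * factor for t in submit_times]
    return times, sizes, predictions


times, sizes, preds = rescale_trace([0, 10, 20, 30], [2, 4, 6, 8], [1, 4, 9, 8], lam=0.5)
print(times)  # mean gap is now 1 / (100 * 0.5) = 0.02
print(sizes)  # [0.4, 0.8, 1.2, 1.6]
```

Scaling time rather than resampling arrivals keeps the burst structure of each trace, which is why the load spikes of Figure 10 carry over to the simulations.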
³ Amvrosiadis et al. discuss a fourth trace (Mustang). However, we have found that job size prediction in this case is not good enough to be useful for scheduling, and in our experiments scheduling policies that are not based on predicted job size outperform those that are. For this reason, we do not include results from that dataset in this paper.
Figure 11: Real-world datasets: heatmaps of job size distribution versus estimation. [Number of jobs (log color scale) by log10(size) and log10(estimation). Panels: (a) Google; (b) Trinity; (c) Twosigma.]
Figure 12: Google dataset: mean response time with various queue choice methods. [Average time in system versus average system load λ (0.5 to 1.0) for FIFO, SPJF, PSPJF, and SPRPT. Panels: (a) shortest queue; (b) least loaded updated; (c) selfish queue selection.]
In Figure 11, we show heatmaps covering the joint distribution of real and predicted job sizes. In what we believe is a common pattern in real-world systems, the job size distribution is heavy tailed, with a few large jobs representing a large share of the overall system load; job sizes span several orders of magnitude.
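The heavy-tail observation can be checked directly on a trace by measuring how much of the total work the largest jobs carry and how many orders of magnitude the sizes span. A minimal sketch; both helper names are hypothetical, and the 1% cutoff is an arbitrary illustrative choice rather than a threshold used in our analysis.

```python
import math


def load_share_of_largest(sizes, fraction=0.01):
    """Share of total work contributed by the largest `fraction` of jobs."""
    xs = sorted(sizes, reverse=True)
    k = max(1, round(fraction * len(xs)))  # always include at least one job
    return sum(xs[:k]) / sum(xs)


def orders_of_magnitude(sizes):
    """log10 of the ratio between the largest and smallest positive job size."""
    positive = [s for s in sizes if s > 0]
    return math.log10(max(positive) / min(positive))


toy = [1.0] * 99 + [99.0]          # one huge job among 100
print(load_share_of_largest(toy))   # 0.5: 1% of jobs carry half the work
print(orders_of_magnitude(toy))     # about 2
```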
In Figure 12, we show results for the Google dataset. The variation of system load over time yields higher absolute values of the mean response time compared to the synthetic datasets seen before; this phenomenon is stronger in the other datasets because they have larger load spikes, as seen in Figure 10. As for some of the synthetic workloads seen before, we observe that here too shortest queue performs better than least loaded updated. Interestingly, the selfish strategy performs essentially as well as shortest queue, showing that in this scenario the "price of anarchy" (i.e., the performance cost of letting each job's owner selfishly choose their queue) is negligible; analogous considerations hold for the other datasets we consider in this section. Unsurprisingly, the size-based scheduling policies largely outperform FIFO scheduling in this heavy-tailed workload, and, again confirming results for synthetic workloads, PSPJF is preferable to the other policies due to its better performance when large jobs are underestimated.
In the other two datasets (Figures 13 and 14), results are qualitatively similar, even though the larger load spikes further increase the average time spent in the system. The gap between FIFO and the other policies is even larger, while PSPJF remains the best-performing policy. The differences among the size-based policies are small compared to the large advantage gained by choosing any policy based on predicted job size.
Figure 13: Trinity dataset: mean response time with various queue choice methods. [Average time in system versus average system load λ (0.5 to 1.0) for FIFO, SPJF, PSPJF, and SPRPT. Panels: (a) shortest queue; (b) least loaded updated; (c) selfish queue selection.]
Figure 14: Twosigma dataset: mean response time with various queue choice methods. [Same axes, curves, and panels as Figure 13.]
6 Conclusion
We have considered (primarily through simulation) the supermarket model in the setting where service times are predicted. As a starting point, however, we considered the baseline where service times are known. Our results show that in the "standard" supermarket model (exponential service times, Poisson arrivals), as well as more generally, even though the power of two choices provides tremendous gains over a single choice, there remain substantial further performance gains to be achieved by making use of known service times. In particular, using a service-aware scheduling policy such as SRPT can yield significant performance gains under high loads. This immediately raises natural theoretical questions, such as deriving equations for the supermarket model using least loaded server selection with shortest job first or shortest remaining processing time, which would extend the recent work of [11]. However, our more important direction is to introduce the idea of using predicted service times in this setting. Our simulation-based study suggests that the power of two choices maintains most of its power even when using predictions. We also find some interesting, potentially counterintuitive effects. For example, when predictions are sufficiently inaccurate, performance is better when using queue lengths rather than the predicted load to choose a queue, even when using the predictions to schedule within the queue. We view our results as showing that the use of predicted service times in large-scale distributed systems can be quite promising in terms of improving performance.
Our work highlights many open practical questions on how to optimize these kinds of systems when using predictions, as well as many open theoretical questions regarding how to analyze these kinds of systems. For example, suitable mechanisms for managing jobs that exceed their predicted service time offer further potential for important improvements. Perhaps most interesting is developing appropriate theories of fairness when using predictions. A short job predicted to have a long service time may face long delays before service; how to achieve a suitable notion of fairness when using predictions clearly merits further study.
References
[1] Reza Aghajani, Xingjie Li, and Kavita Ramanan. Mean-field dynamics of load-balancing networks with general service distributions. arXiv preprint arXiv:1512.05056, 2015.
[2] George Amvrosiadis, Jun Woo Park, Gregory R. Ganger, Garth A. Gibson, Elisabeth Baseman, and Nathan DeBardeleben. On the diversity of cluster workloads and its impact on research results. In Proc. of the 2018 USENIX Annual Technical Conference, 2018.
[3] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. Balanced allocations. SIAM J. Comput., 29(1):180–200, 1999.
[4] Nikhil Bansal and Mor Harchol-Balter. Analysis of SRPT scheduling: investigating unfairness. In Proc. of the Joint International Conference on Measurements and Modeling of Computer Systems, pages 279–290, 2001.
[5] René Bekker, Sem C. Borst, Onno J. Boxma, and Offer Kella. Queues with workload-dependent arrival and service rates. Queueing Systems, 46(3–4):537–556, 2004.
[6] Maury Bramson, Yi Lu, and Balaji Prabhakar. Randomized load balancing with general service time distributions. In ACM SIGMETRICS Performance Evaluation Review, volume 38, pages 275–286, 2010.
[7] Maury Bramson, Yi Lu, and Balaji Prabhakar. Asymptotic independence of queues under randomized load balancing. Queueing Systems, 71(3):247–292, 2012.
[8] Matteo Dell’Amico, Damiano Carra, and Pietro Michiardi. PSBS: Practical size-based scheduling. IEEE Transactions on Computers, 65(7):2199–2212, 2015.
[9] Mor Harchol-Balter. Task assignment with unknown duration. J. ACM, 49(2):260–288, 2002.
[10] Mor Harchol-Balter. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press, 2013.
[11] Tim Hellemans and Benny Van Houdt. On the power-of-d-choices with least loaded server selection. POMACS, 2(2):27:1–27:22, 2018.
[12] Tim Hellemans, Tejas Bodas, and Benny Van Houdt. Performance analysis of workload dependent load balancing policies. POMACS, 3(2):35:1–35:35, 2019.
[13] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation algorithms. In International Conference on Learning Representations, 2019.
[14] Richard M. Karp, Michael Luby, and Friedhelm Meyer auf der Heide. Efficient PRAM simulation on a distributed memory machine. Algorithmica, 16(4/5):517–542, 1996.
[15] Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proc. of the 2018 International Conference on Management of Data, pages 489–504. ACM, 2018.
[16] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In Proc. of the 35th International Conference on Machine Learning, pages 3302–3311, 2018.
[17] Raymond Marie. Calculating equilibrium probabilities for λ(n)/C_k/1/N queues. ACM SIGMETRICS Performance Evaluation Review, 9(2):117–125, 1980.
[18] Michael Mitzenmacher. Studying balanced allocations with differential equations. Combinatorics, Probability & Computing, 8(5):473–482, 1999.
[19] Michael Mitzenmacher. How useful is old information? IEEE Trans. Parallel Distrib. Syst., 11(1):6–20, 2000.
[20] Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094–1104, 2001.
[21] Michael Mitzenmacher. Scheduling with predictions and the price of misprediction. arXiv preprint arXiv:1902.00732, 2019.
[22] Michael Mitzenmacher. A model for learned Bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems, pages 462–471, 2018.
[23] Michael Mitzenmacher and Eli Upfal. Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, 2005.
[24] Michael Mitzenmacher and Berthold Vöcking. The asymptotics of selecting the shortest of two, improved. In Proc. of the 37th Annual Allerton Conference on Communication, Control, and Computing, 1999.
[25] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems, pages 9684–9693, 2018.
[26] Ziv Scully and Mor Harchol-Balter. SOAP bubbles: robust scheduling under adversarial noise. In Proc. of the 56th Annual Allerton Conference on Communication, Control, and Computing, 2018.
[27] Ziv Scully, Mor Harchol-Balter, and Alan Scheller-Wolf. SOAP: One clean analysis of all age-based scheduling policies. In Proc. of the ACM on Measurement and Analysis of Computing Systems, 2018.
[28] Berthold Vöcking. How asymmetry helps load balancing. Journal of the ACM, 50(4):568–589, 2003.
[29] Nikita Dmitrievna Vvedenskaya, Roland L’vovich Dobrushin, and Fridrikh Izrailevich Karpelevich. Queueing system with selection of the shortest of two queues: An asymptotic approach. Problemy Peredachi Informatsii, 32(1):20–34, 1996.
[30] Adam Wierman. Fairness and scheduling in single server queues. Surveys in Operations Research and Management Science, 16(1):39–48, 2011.
[31] Adam Wierman and Mor Harchol-Balter. Classifying scheduling policies with respect to unfairness in an M/GI/1. In Proc. of the International Conference on Measurements and Modeling of Computer Systems, pages 238–249, 2003.
[32] Adam Wierman and Misja Nuyens. Scheduling despite inexact job-size information. Performance Evaluation Review, 36(1):25–36, 2008.
A Additional Results
A.1 Toward Developing Equations for Limiting Behavior Previous work has shown that, in the limiting supermarket model where the number of queues goes to infinity, individual queues can be treated as independent, both when choosing the shortest queue and when choosing the least loaded queue [7]. This connection plays a key role in the analysis of the supermarket model when choosing the least loaded queue with FIFO scheduling in [11], which yields the stationary distribution of the load of individual queues in the limit as the number of queues goes to infinity. (See also [12] on this issue; this is sometimes referred to as cavity process analysis.) We can conclude that in equilibrium, at each queue considered in isolation, the least loaded variant of the supermarket model has a load-dependent arrival process, given by a Poisson process of rate λ(x) when the queue has service load x. (See, for example, [5, 17] for more on queues with load-dependent arrival processes; note that here the arrival rate depends on the workload, not the total number of jobs in the queue.) The least loaded variant of the supermarket model when using other scheduling schemes, such as SJF, PSJF, and SRPT, would similarly have the same load-dependent arrival process: all of these schemes are work-conserving, so in equilibrium the workload distribution would be the same regardless of the scheduling scheme. Hence we could develop formulae for quantities such as the expected response time in equilibrium in the supermarket model using the least loaded queue and SRPT, if we can develop an analysis of a single queue using SRPT with a load-dependent arrival process (and similarly for other scheduling schemes). We are not aware of any such analysis in the literature; this is a natural and tantalizing open question.
Figure 15: Comparing methods of choosing a queue when using predicted service times, for α-predictions with α = 0.5, and for (α, β)-predictions with α = 0.5, β = 0.2. All queues use SPRPT within the queue; in the figure, SPRPT means each job chooses the queue with smallest predicted remaining work, SELFISH-P means each job chooses the queue that minimizes its own waiting time according to predictions, and MIN-ADD-P means each job chooses the queue that minimizes the additional waiting time added according to predictions. [Average time in system versus arrival rate λ. Panels: (a) exponential service times, α = 0.5; (b) Weibull service times, α = 0.5; (c) exponential service times, α = 0.5, β = 0.2; (d) Weibull service times, α = 0.5, β = 0.2.]
We note that the supermarket model when jobs choose the shortest queue also, as far as we know, has not been analyzed for SJF, PSJF, and SRPT. Here the arrival process at a queue in equilibrium can be given by a Poisson process of rate λ(n) when the queue has n jobs. Again, if we can develop an analysis of a single queue using SRPT with a queue-length-dependent arrival process, we can use this to analyze the supermarket model using SRPT (and similarly for other scheduling schemes).
A.2 Choosing a Queue with Predictions As before, we consider methods for choosing a queue beyond the queue with the (predicted) least load. We consider placing a job so that it minimizes the additional predicted waiting time, based on the predicted waiting times of all jobs. Alternatively, if control is not centralized, we might consider selfish jobs that seek only to minimize their own predicted waiting time when choosing a queue.
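The three queue-choice rules compared in Figure 15 can be contrasted on a snapshot of predicted remaining times. This is a minimal sketch under our own simplifying assumptions: each queue is a list of its jobs' predicted remaining times, a new job with prediction p is served after every job whose predicted remaining time is at most p (SPRPT order, ties to existing jobs) and delays each remaining job by p, and future arrivals are ignored. The helper names are hypothetical.

```python
def split_at(queue, p):
    """Predicted remaining times of jobs served before / after a new job with prediction p."""
    ahead = [r for r in queue if r <= p]
    behind = [r for r in queue if r > p]
    return ahead, behind


def least_loaded(queues, p):
    # Smallest total predicted remaining work (p is the same for every queue).
    return min(range(len(queues)), key=lambda i: sum(queues[i]))


def selfish(queues, p):
    # SELFISH-P: minimize the new job's own predicted waiting time.
    return min(range(len(queues)), key=lambda i: sum(split_at(queues[i], p)[0]))


def min_additional(queues, p):
    # MIN-ADD-P: new job's own wait plus the delay p it imposes on each job behind it.
    def added(i):
        ahead, behind = split_at(queues[i], p)
        return sum(ahead) + p * len(behind)
    return min(range(len(queues)), key=added)


queues = [[1.0, 5.0], [0.5, 0.5, 0.5]]
p = 2.0
print(least_loaded(queues, p), selfish(queues, p), min_additional(queues, p))  # 1 0 1
```

In this snapshot the selfish job joins queue 0 because only 1.0 unit of work is ahead of it, ignoring that it pushes back the job predicted at 5.0; this is the kind of externality that MIN-ADD-P accounts for.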
Our results, in Figure 15, focus on two representative examples: α-predictions with α = 0.5, and (α, β)-predictions with α = 0.5, β = 0.2. Again, choosing a queue to minimize the additional predicted waiting time in these situations does yield a small improvement over least loaded updated with SPRPT, and selfish jobs have a significant negative effect.
Finally, as discussed earlier, we note that there is a significant difference between the Least Loaded Updated and Least Loaded Total policies. Up to this point, we have used "least loaded" to refer to Least Loaded Updated, where the predicted service time at the queue is recomputed after each departure and arrival. In contrast, Least Loaded Total tracks a single predicted service time for the queue that is updated on arrival but not at departure (unless a queue empties, in which case the service is reset to 0). While theoretically appealing (as it reduces the state space of the system), Least Loaded Total generally performs significantly worse than Least Loaded Updated. Figure 16 provides a representative example, in the setting of α-predictions with α = 0.5. We see that FIFO, in particular, does quite poorly under Least Loaded Total, and in all cases the gap in performance increases notably with the load. Our other experiments show that the gap in performance also increases significantly as the predictions become more inaccurate; with exponential predictions, α-predictions with higher α, or (α, β)-predictions with α = 0.5 and β > 0, our simulations show even larger gains from using Least Loaded Updated. While it may be useful to consider Least Loaded Total as an approach toward obtaining theoretical results, it does not appear to be the policy we wish to aim for.
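The difference between the two bookkeeping rules shows up as soon as a job outlives its prediction. The sketch below rests on our own assumptions about details the text leaves open: both load estimates drain at unit rate while the queue is busy, jobs are served FIFO, Least Loaded Total adds each prediction on arrival, is never corrected at departures, and resets only when the queue empties, and Least Loaded Updated is recomputed as the sum of max(prediction − service received, 0) over the jobs present. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    predicted: float        # predicted service time
    actual: float           # true service time
    attained: float = 0.0   # service received so far


def llu_load(jobs):
    """Least Loaded Updated (assumed form): predicted remaining work of jobs present."""
    return sum(max(j.predicted - j.attained, 0.0) for j in jobs)


class LLTCounter:
    """Least Loaded Total (assumed form): one scalar per queue."""

    def __init__(self):
        self.value = 0.0

    def on_arrival(self, job):
        self.value += job.predicted

    def on_busy_time(self, dt):
        self.value = max(self.value - dt, 0.0)

    def on_empty(self):
        self.value = 0.0


def serve_fifo(jobs, counter, duration):
    """Serve jobs FIFO for `duration` time units; departures do not touch `counter`."""
    remaining = duration
    while jobs and remaining > 0:
        head = jobs[0]
        need = head.actual - head.attained
        step = min(need, remaining)
        counter.on_busy_time(step)
        remaining -= step
        if step == need:
            head.attained = head.actual
            jobs.pop(0)  # departure: LLT is deliberately not corrected here
        else:
            head.attained += step
    if not jobs:
        counter.on_empty()


# Job A is underestimated (predicted 1, takes 3); job B is predicted exactly.
jobs = [Job(predicted=1.0, actual=3.0), Job(predicted=2.0, actual=2.0)]
llt = LLTCounter()
for j in jobs:
    llt.on_arrival(j)
serve_fifo(jobs, llt, duration=3.0)
print(llu_load(jobs), llt.value)  # 2.0 0.0: LLT thinks the queue is idle
```

Under these assumptions the total-based counter reports an idle queue that still holds job B, so new arrivals would be steered toward it; errors of this kind compound as predictions get worse.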
Figure 16: Comparing variations of Least Loaded, for α-predictions with α = 0.5. [Average time in system versus arrival rate λ for FIFO, SPJF, and SPRPT under Least Loaded Total (LLT) and Least Loaded Updated (LLU). Panels: (a) exponential service times; (b) Weibull service times.]
B Fairness
As described in Section 4.2.1, fairness is often defined in terms of making jobs spend an amount of time in the system that is roughly proportional to their size [30]; this is captured by the slowdown concept, where a job's slowdown is its response time divided by its size. Here, we show results for the real-world datasets of Section 5, with the average system load set to λ = 0.9.
B.1 Slowdown Distribution A first definition of fairness can be having predictable slowdowns: for example, minimizing the variance of the per-job slowdowns, or minimizing the slowdown at a given percentile. To facilitate these evaluations, in Figures 17 to 19 we show the empirical cumulative distribution functions (CDFs) of the slowdown values observed in our experiments.
Figure 17: Google dataset: CDF of slowdown. [Empirical CDF of slowdown (log scale) for FIFO, SPJF, PSPJF, and SPRPT. Panels: (a) shortest queue; (b) least loaded queue; (c) selfish queue selection.]
Figure 18: Trinity dataset: CDF of slowdown. [Same curves and panels as Figure 17.]
Figure 19: Twosigma dataset: CDF of slowdown. [Same curves and panels as Figure 17.]
Figure 20: Google dataset: mean conditional slowdown. [Mean conditional slowdown (log scale) versus job size for FIFO, SPJF, PSPJF, and SPRPT. Panels: (a) shortest queue; (b) least loaded queue; (c) selfish queue selection.]
Figure 21: Trinity dataset: mean conditional slowdown. [Same curves and panels as Figure 20.]
Because system load changes over time, slowdown distributions are unequal in essentially all cases that we observe, in particular for Trinity, due to the large load the system experiences around the beginning of the trace. The overall pattern we see here, however, is that the slowdown distribution becomes less unequal with policies that perform better; PSPJF, which is the policy that performs best in terms of mean response time, also has the least variability in terms of slowdown. Intuitively, extreme slowdown values are caused by "clogged" queues, so by optimizing mean response time, extreme slowdown cases are also minimized.
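The curves in Figures 17 to 19, and the two candidate fairness targets mentioned above (slowdown variance and a tail percentile), can be computed from per-job response times and sizes along these lines. A minimal sketch; the helper names and the nearest-rank percentile definition are our own choices, not necessarily those used for the figures.

```python
import math
import statistics


def slowdowns(response_times, sizes):
    """Per-job slowdown: response time divided by size."""
    return [t / x for t, x in zip(response_times, sizes)]


def empirical_cdf(values):
    """Sorted (value, i/n) step points of the empirical CDF, ready to plot."""
    xs = sorted(values)
    n = len(xs)
    return [(v, (i + 1) / n) for i, v in enumerate(xs)]


def percentile(values, q):
    """Nearest-rank q-th percentile, for 0 < q <= 100."""
    xs = sorted(values)
    k = max(math.ceil(q / 100 * len(xs)) - 1, 0)
    return xs[k]


sd = slowdowns([2.0, 3.0, 10.0, 4.0], [1.0, 1.0, 2.0, 4.0])  # [2.0, 3.0, 5.0, 1.0]
print(empirical_cdf(sd))  # [(1.0, 0.25), (2.0, 0.5), (3.0, 0.75), (5.0, 1.0)]
print(percentile(sd, 50), statistics.variance(sd))
```

A flatter, further-left CDF corresponds to the "less unequal" slowdown distributions discussed above; on a log-scale x-axis, as in the figures, heavy right tails show up as long plateaus near 1.0.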
B.2 Mean Conditional Slowdown A second definition of fairness involves mean conditional slowdown, that is, the expected value of slowdown for jobs having a given size: for a job of size x, the mean conditional slowdown is E[T(x)]/x, where T(x) is the response time for jobs of size x [4, 30, 31]. To evaluate it empirically in our experiments, we follow the approach of Dell’Amico et al. [8]: we bin jobs by size into 50 bins, each containing the same number of jobs, and plot the average size and slowdown of each bin in Figures 20 to 22, on the X and Y axes respectively.
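The binning procedure just described can be sketched as follows. Assumptions: jobs are sorted by true size, bins hold equal job counts up to rounding, and each bin reports its mean size and its mean per-job slowdown; `binned_conditional_slowdown` is a hypothetical name, not the code of [8].

```python
def binned_conditional_slowdown(sizes, response_times, num_bins=50):
    """(mean size, mean slowdown) for each of num_bins equal-count size bins."""
    pairs = sorted(zip(sizes, response_times))  # order jobs by size
    n = len(pairs)
    points = []
    for b in range(num_bins):
        lo, hi = b * n // num_bins, (b + 1) * n // num_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # fewer jobs than bins
        mean_size = sum(x for x, _ in chunk) / len(chunk)
        mean_slowdown = sum(t / x for x, t in chunk) / len(chunk)
        points.append((mean_size, mean_slowdown))
    return points


print(binned_conditional_slowdown([1, 2, 3, 4], [1, 4, 3, 8], num_bins=2))
# [(1.5, 1.5), (3.5, 1.5)]
```

Equal-count bins keep every point on the curve backed by the same number of jobs, which matters for heavy-tailed traces where equal-width bins would leave the large-size bins nearly empty.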
Once again, and confirming existing research [30, 8], we observe that the best-performing policies empirically result in better fairness. While one may imagine that size-based policies that give priority to smaller jobs would penalize large ones compared to a policy like FIFO, this is only true to some extent: for Google (Figure 20), size-based policies appear to always be preferable, even for larger jobs; in Trinity (Figure 21), only the very largest jobs are penalized; only in the Twosigma dataset do large jobs have appreciably lower mean conditional slowdown when size-based policies are used.
In Figure 21, we notice a discontinuity for jobs of size 2. This can be explained by Figure 11, which shows that for the Trinity dataset there is a set of jobs of size 2 whose size is systematically underestimated; this explains the drop in mean conditional slowdown.
For Twosigma (Figure 22), the results for all the size-based scheduling algorithms look superimposed. We explain this, once again, with the data from Figure 11, which shows that job size estimations tend to be clustered around a few well-separated values; hence, the details of the scheduling algorithm have only limited impact, since in most cases they will all schedule the job with the smallest estimated size.
Figure 22: Twosigma dataset: mean conditional slowdown. [Mean conditional slowdown (log scale) versus job size for FIFO, SPJF, PSPJF, and SPRPT. Panels: (a) shortest queue; (b) least loaded queue; (c) selfish queue selection.]