The Supermarket Model with Known and Predicted Service Times

Michael Mitzenmacher∗  Matteo Dell’Amico†

∗School of Engineering and Applied Sciences, Harvard University. [email protected]. This work was supported in part by NSF grants CCF-1563710 and CCF-1535795.
†[email protected]

    Abstract

The supermarket model refers to a system with a large number of queues, where arriving customers choose d queues at random and join the queue with the fewest customers. The supermarket model demonstrates the power of even small amounts of choice, as compared to simply joining a queue chosen uniformly at random, for load balancing systems. In this work we perform simulation-based studies to consider variations where service times for a customer are predicted, as might be done in modern settings using machine learning techniques or related mechanisms. Our primary takeaway is that using even seemingly weak predictions of service times can yield significant benefits over blind First In First Out queueing in this context. However, some care must be taken when using predicted service time information to both choose a queue and order elements for service within a queue; while in many cases using the information for both choosing and ordering is beneficial, in many of our simulation settings we find that simply using the number of jobs to choose a queue is better when using predicted service times to order jobs in a queue. Although this study is simulation based, it leaves many natural theoretical open questions for future work.

    1 Introduction

The success of machine learning has opened up new opportunities in terms of improving the efficiency of a wide array of processes. In this paper, we consider opportunities for using machine learning predictions in a specific setting: queueing in large distributed systems using “the power of two choices”. This also leads us to consider variants of these systems that do not use predictions as a starting point and that appear not to have been previously studied. While our study here is simulation-based, with both synthetic and real data sets, it leads to several new open theoretical questions.

We start with key background. In queueing settings, the supermarket model (also described as the power of two choices, or balanced allocations) is typically described in the following way. Suppose we have a system of n First In, First Out (FIFO) queues. Jobs¹ arrive to the system as a Poisson process of rate λn, and service times are independent and exponentially distributed with mean 1. If each job selects a random queue on arrival, then via Poisson splitting [23, Section 8.4.2] each queue acts as a standard M/M/1 queue, and in equilibrium the fraction of queues with at least i jobs is λ^i. Note that we consider here the tails of the queue length distribution, as it makes for easier comparisons. If each job selects two random queues on arrival, and chooses to wait at the queue with fewer customers (breaking ties randomly), then in the limiting system as n grows to infinity, in equilibrium the fraction of queues with at least i jobs is λ^(2^i − 1). That is, the tails decrease doubly exponentially in i, instead of singly exponentially. In practice, even for moderate values of n (say in the large hundreds), one obtains performance close to this mean field limit; this can be proven based on appropriate concentration bounds. More generally, for d choices where d is an integer constant greater than 1, the fraction of queues with at least i jobs falls like λ^((d^i − 1)/(d − 1)) [20, 29]. While there are many variations on the supermarket model and its analysis (see, e.g., [1, 6, 24, 28]), here we focus on this standard, simple formulation, although we allow for general service distributions instead of just the exponential distribution.
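As a quick numerical illustration of these tail formulas (a small Python sketch of our own, not taken from the paper), the following compares the single-choice equilibrium tail λ^i with the d-choice tail λ^((d^i − 1)/(d − 1)) for d = 2:

```python
# Mean-field equilibrium tails for the supermarket model:
# fraction of queues with at least i jobs.
lam = 0.9   # arrival rate per queue (must be < 1)

def tail_one_choice(i, lam=lam):
    return lam ** i                          # M/M/1: lambda^i

def tail_d_choices(i, d=2, lam=lam):
    return lam ** ((d ** i - 1) / (d - 1))   # lambda^((d^i - 1)/(d - 1))

for i in range(1, 7):
    print(i, tail_one_choice(i), tail_d_choices(i))
```

Even at λ = 0.9, the two-choice tail at i = 6 is λ^63 ≈ 1.3 × 10⁻³, versus λ^6 ≈ 0.53 with a single choice.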

As stated previously, here we study variations of the supermarket model where service times are predicted. To describe our work and goals, we start by considering the baseline where service times are known.

The analysis for the basic supermarket model described above assumes that service times are exponentially distributed but specific job service times are not known. (Extensions of the analysis to more general distributions are known [1, 6].) As such, an incoming job uses only the number of jobs at each chosen queue to decide which queue to join. As both a theoretical question and for possible practical implementations, it seems worthwhile to know what further improvement is

¹In this paper we use jobs instead of the more specific term customers, as the model applies to a variety of load-balancing settings.


possible if service times of the jobs were known.

Recently, Hellemans and Van Houdt proved results in the supermarket model setting where job reservations are made at d randomly chosen queues, and once the first reservation reaches the point of obtaining service, the other reservations are canceled. This corresponds to choosing the least loaded (in terms of total remaining service time²) of d queues using FIFO queues. Their work applies to general service distributions; for the class of phase-type service distributions, they are able to express the limiting behavior of the system in terms of delayed differential equations [11]. Their results, including theorems regarding the system behavior as well as simulations, show that using service time information can lead to significant improvements in the average time a job spends in the system. (The subsequent work [12] examines several additional variations.) Because of space limitations, we leave discussion of the challenges of extending these results beyond FIFO queues to Appendix A.1.

However, when the service times are known, there are two possible ways to potentially improve performance. First, as above, one can use the service times when selecting a queue, by choosing the least loaded queue. Second, one can order the jobs using a strategy other than FIFO; the natural strategies to minimize the average time in the system (response time) are shortest job first (SJF), preemptive shortest job first (PSJF), and shortest remaining processing time (SRPT). Here shortest job first assumes no preemption and always schedules the job with the smallest service time when a job completes, preemptive shortest job first allows preemption so that a job with a smaller service time can preempt the running job, and shortest remaining processing time allows preemption but is based on the remaining processing time instead of the total service time for a job. Note that here we assume a preempted job does not need to start from the beginning and can later continue service where it left off. Also, while apparently somewhat less natural, PSJF allows job priorities to be assigned on arrival to a queue without the need for updating, unlike SRPT.
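To make the distinctions concrete, here is a minimal Python sketch (our illustration, not the authors' simulator) of the priority keys the three size-based disciplines use; the job representation and helper names are our own assumptions:

```python
import heapq

def priority(job, policy):
    """Key the queue's heap orders by; the smallest key is served next."""
    if policy in ("SJF", "PSJF"):
        return job["size"]        # total service time, fixed on arrival
    if policy == "SRPT":
        return job["remaining"]   # remaining work, updated as the job runs
    raise ValueError(f"unknown policy: {policy}")

def serve_next(queue_heap):
    """Pop the highest-priority waiting job (None if the queue is empty)."""
    return heapq.heappop(queue_heap)[2] if queue_heap else None

def enqueue(queue_heap, job, policy):
    heapq.heappush(queue_heap, (priority(job, policy), id(job), job))

def should_preempt(running, arrival, policy):
    """PSJF and SRPT preempt when the arrival has a strictly smaller key; SJF never does."""
    if policy == "SJF" or running is None:
        return False
    return priority(arrival, policy) < priority(running, policy)
```

In such a sketch the only difference between PSJF and SRPT is whether the key is the original size or the remaining work; SJF uses the same key as PSJF but never preempts.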

In the setting of a single queue, Mitzenmacher has recently considered the setting where service times are predicted rather than known exactly [21]. In this model, the jobs have a joint service-predicted service density function g(x, y), where x is the true service time and y is the predicted service time. He provides formulae for the average response time using the corresponding strategies shortest predicted job first (SPJF), preemptive shortest predicted job first (PSPJF), and shortest predicted remaining processing time (SPRPT). Simulation results suggest that in the single queue setting even weak predictors can greatly improve performance over FIFO queues. However, using the power of two choices already provides great improvements in systems with multiple queues. It is therefore natural to consider whether predictions would still provide significant performance gains in the supermarket model.

²We use service time and processing time interchangeably in this paper; both terms have been used historically.

    The contributions of this paper include the following.

• For the case of known service times, we provide a simulation study with synthetic traces showing the potential gains when using SJF, PSJF, and SRPT queues in the supermarket model, providing an appropriate baseline.

• We similarly examine through simulations the benefits when only predicted information is available, using FIFO, SPJF, PSPJF, and SPRPT queues. Here we use both synthetic and real-world data sets.

• We identify somewhat counterintuitive behaviors; for example, we find many cases where choosing the predicted least loaded queue performs worse than simply choosing the shortest queue.

• We provide a number of open questions related to the analysis and use of these systems.

    2 Additional Related Work

The power of two choices was first analyzed in the discrete settings of hashing, modeled as balls and bins processes [3, 14, 18]. It was subsequently analyzed in the setting of queueing systems, in particular in the mean field limit (also referred to as the fluid limit) as the number of queues grows to infinity [20, 29].

Ordering jobs by service time has been studied extensively in single queues. The text [10] provides an excellent introduction to the analysis of standard approaches such as SJF and SRPT in the single queue setting.

Our work falls into a recent line of work that aims to use machine learning predictions to improve traditional algorithms. For example, Lykouris and Vassilvitskii [16] show how to use prediction advice from machine learning algorithms to improve online algorithms for caching in a way that provides provable performance guarantees, using the framework of competitive analysis. Other recent works with this theme include the development of learned Bloom filters [15, 22] and heavy hitter algorithms that use predictions [13]. One prior work in this vein has specifically looked at scheduling with predictions in the setting of a fixed collection of jobs, and considers variants of shortest predicted processing time that yield good performance in terms of the competitive ratio, with the performance depending on the accuracy of the predictions [25].

In scheduling of queues, some works have looked at the effects of using imprecise information, including for load balancing in multiple queue settings. For example, Mitzenmacher considers using old load information to place jobs (in the context of the power of two choices) [19]. The TAGS strategy takes an approach to utilizing multiple queues when no information exists about the service time: jobs that run for more than some threshold at the first queue are canceled and passed to the second queue, and so on [9].

For single queues, Wierman and Nuyens look at variations of SRPT and SJF with inexact job sizes, bounding the performance gap based on bounds on how inexact the estimates can be [32]. Dell’Amico, Carra, and Michiardi note that such bounds may be impractical, as outliers in estimating job sizes occur frequently; they empirically study scheduling policies for queueing systems with estimated sizes [8]. We note that [8] points out there are natural methods to estimate job size, such as by running a small portion of the code in a coding job; we expect this or other inputs would be features in a machine learning formulation. Recent work by Scully and Harchol-Balter has considered scheduling policies that are based on the amount of service received, where the scheduler only knows the service received approximately, subject to adversarial noise, and the goal is to develop robust policies [26]. Also, for single queues, many prediction-based policies appear to fit within the more general framework of SOAP policies presented by Scully et al. [27].

Our work differs from these past works in providing a model specifically geared toward studying performance with machine-learning based predictions in the context of the supermarket model.

    3 Known Service Times

3.1 Scheduling Beyond FIFO  To begin, we note again that the work of [11] shows that for the supermarket model with d choices for constant d, known service times (independently chosen from a given service time distribution), and FIFO scheduling, the equations for the stationary distribution can be determined, when the queue is chosen according to the least loaded policy. However, there do not appear to have been previous studies of scheduling schemes within each queue that make use of the service times, including shortest job first (SJF), preemptive shortest job first (PSJF), and shortest remaining processing time (SRPT).

While in this paper we are primarily interested in the performance of the supermarket model with predicted service times, as these variations do not appear to have been studied, we provide results as a baseline for our later results.

In the simulation experiments we present, we simulate 1000 initially empty queues over 10000 units of time, and take the average response time for all jobs that terminate after time 1000 and before time 10000. We then take the average of this value over 100 simulations. Waiting for the first 1000 time units allows the system to approach the stationary distribution. Variations of the supermarket model have a limiting equilibrium distribution as the number of queues goes to infinity [6], and in practice we find 1000 queues provides an accurate estimate of the limiting behavior. In the experiments we focus on two example service distributions: exponential with mean 1, and a Weibull distribution with cumulative distribution 1 − e^(−√(2x)). (The Weibull distribution is more heavy-tailed, but also has mean 1. We have also done experiments with a more heavy-tailed Weibull distribution with cumulative distribution 1 − e^(−(6x)^(1/3)); the general trends are similar for this distribution as for the Weibull distribution we discuss.) Arrivals are Poisson with arrival rate λ; we focus on results with λ ≥ 0.5, as for smaller arrival rates all our proposed schemes perform very well and it becomes difficult to see performance differences. Unless otherwise noted, in the simulations each job chooses d = 2 queues at random. While we have done simulations for larger d values, and at a high level there are similar trends, studying the detailed effects of larger d across the many variations we study is left for future work.
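The two service distributions can be sampled directly by inverting the stated CDFs; a minimal Python sketch (ours, not the authors' simulator) under that reading:

```python
import math
import random

def sample_exponential():
    """Exponential service time with mean 1."""
    return random.expovariate(1.0)

def sample_weibull_heavy():
    """Weibull service time with CDF 1 - exp(-sqrt(2x)); mean 1, shape 1/2.
    Inverse-transform sampling via the survival function: x = (ln(1/U))^2 / 2."""
    u = random.random()
    return (math.log(1.0 / u)) ** 2 / 2.0

# Quick sanity check of the means (both should be close to 1).
n = 100_000
print(sum(sample_exponential() for _ in range(n)) / n)
print(sum(sample_weibull_heavy() for _ in range(n)) / n)
```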

Figure 1(a) shows the results where the least loaded queue is chosen (ties broken randomly), while Figure 1(b) shows the results where the shortest queue is chosen, for exponential service times. Figures 1(c) and 1(d) present the results for the Weibull distributed service times. Generally, we see that using the known service times to order jobs at the queue is very powerful; indeed, the gain from using SRPT appears larger than the gain from moving from shortest queue to least loaded, and similarly the gain from using SJF and PSJF is larger under high enough loads.

As the charts make it somewhat more difficult to see some important details, we present numerical results for exponentially distributed service times in Table 1 to mark some key points. While generally the benefits from using the service times to both choose the queue and order the queue are complementary, this is not always the case. We see that using least loaded rather than shortest queue when using PSJF can increase the average time in the system under suitably high load. (This also occurs with the Weibull distribution under sufficiently high loads.)

[Figure 1 plots average time in system versus arrival rate λ (0.5 to 1.0) for FIFO, SJF, PSJF, and SRPT. Panels: (a) exponential service times, queue chosen by least loaded; (b) exponential service times, queue chosen by shortest queue; (c) Weibull service times, queue chosen by least loaded; (d) Weibull service times, queue chosen by shortest queue.]

Figure 1: Exponential and Weibull service times, two choice supermarket model, with various queue scheduling policies.

              Shortest queue                        Least loaded
λ       FIFO    SJF     PSJF    SRPT        FIFO    SJF     PSJF    SRPT
0.5     1.2658  1.2585  1.1669  1.1337      1.1510  1.1460  1.1462  1.0973
0.6     1.4078  1.3857  1.2527  1.2020      1.2401  1.2280  1.2289  1.1518
0.7     1.6148  1.5567  1.3726  1.2962      1.3749  1.3467  1.3490  1.2307
0.8     1.9485  1.7997  1.5542  1.4367      1.5975  1.5297  1.5371  1.3533
0.9     2.6168  2.2054  1.8850  1.6873      2.0534  1.8634  1.8915  1.5783
0.95    3.3923  2.5903  2.2248  1.9408      2.5852  2.1999  2.2685  1.8096
0.98    4.5384  3.0618  2.6721  2.2614      3.3798  2.6197  2.7807  2.1038
0.99    5.4855  3.3856  2.8596  2.4903      4.0451  2.8514  3.1696  2.3176

Table 1: Average time in system when choosing the shortest queue compared with choosing the least loaded queue, for exponential service times.

We also see that using PSJF can give worse performance than using SJF; however, this does not happen in our experiments with the Weibull distribution, where the ability of preemption to help avoid waiting for long-running jobs appears to be more helpful. While it is known that PSJF can behave worse than SJF, these examples highlight that the interactions when using service time information in multiple choice systems must be treated carefully.

3.2 Choosing a Queue with Exact Information  Given the improvements possible using known service times in the supermarket model, we now consider methods for choosing a queue beyond the queue with the least load, in the setting without predictions. Given full information about the service times of jobs at each queue, a job could be placed so that it minimizes the additional waiting time. The additional waiting time when placing an arriving job is the sum of the remaining service times of all jobs in the queue that will remain ahead of the arriving job, summed with the product of the service time of the arriving job and the number of jobs it will be placed ahead of. Equivalently, we can consider the total waiting time for each queue before and after the arriving job would be placed (ignoring the possibility of future jobs), and place the job in the queue that leads to the smallest increase.

Alternatively, if control is not centralized, we might consider selfish jobs, which seek only to minimize their own waiting time when choosing a queue. In this case the arriving job will consider the sum of the remaining service times of all jobs that will be ahead of it for each available queue choice.

Our results, given in Figure 2, show that choosing a queue to minimize the additional waiting time in these situations does yield a small improvement over least loaded SRPT, as might be expected. Because the additional improvement is small, we expect in many systems it may not be worthwhile to implement this modification, even if expected waiting time is the primary performance metric. Our results also show that while selfish jobs have a significant negative effect, the overall average service time still remains smaller than the standard supermarket model when choosing the shorter of two FIFO queues.
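To make the two placement rules concrete, here is a small Python sketch (our illustration, not the authors' simulator) that computes, for one candidate queue whose jobs are ordered by SRPT (as in Figure 2), the selfish waiting cost and the additional total waiting time an arriving job would add; the list of remaining service times and the arriving job's size are hypothetical inputs.

```python
def placement_costs(remaining, s):
    """Costs of placing a job of size s in an SRPT queue whose current jobs
    have the given remaining service times.

    Returns (selfish, min_add):
      selfish  - waiting time the arriving job itself would experience
                 (sum of remaining times of the jobs that stay ahead of it);
      min_add  - total extra waiting added to the system: the job's own wait
                 plus s added to the wait of every job it is placed ahead of.
    """
    ahead = [r for r in remaining if r <= s]   # jobs served before the new one
    behind = len(remaining) - len(ahead)       # jobs the new one jumps ahead of
    selfish = sum(ahead)
    min_add = selfish + s * behind
    return selfish, min_add

# A job chooses, among its d sampled queues, the one minimizing either cost.
queues = [[0.3, 2.0], [1.5]]                   # remaining times at two sampled queues
s = 0.8
best_selfish = min(range(len(queues)), key=lambda i: placement_costs(queues[i], s)[0])
best_minadd = min(range(len(queues)), key=lambda i: placement_costs(queues[i], s)[1])
print(best_selfish, best_minadd)
```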

    4 Predicted Service Times

In many settings, it may be unreasonable to expect to obtain exact service times, but predicted service times may be available.

[Figure 2 plots average time in system versus arrival rate λ for the SRPT, SELFISH, and MIN-ADD queue choice methods: (a) exponential service times; (b) Weibull service times.]

Figure 2: Comparing methods of choosing a queue. All queues use SRPT within the queue; in the figure, SRPT means each job chooses the queue with smallest remaining work, SELFISH means each job chooses the queue that minimizes its waiting time, and MIN-ADD means each job chooses the queue that minimizes the additional waiting time added.

Indeed, with advances in machine learning techniques, we expect that in many settings some type of prediction will be available. As noted in [21], in the context of scheduling within a single queue, one would expect that even weak predictors may be very useful, since ordering jobs correctly most of the time will produce significant gains. As we have seen, however, even without predictions the question of whether using load information for both choosing a queue and for ordering within a queue provides complementary gains is not always clear. Naturally, the same question will arise again when using predicted service times.

4.1 The Prediction Model  In what follows, we utilize a simple model used in [21], namely that there is a continuous joint distribution g(x, y) for the actual service time x and predicted service time y.

With this model, there remains an issue of how to describe the predicted remaining service time. Suppose that the original predicted service time for a job is y, but the actual service time is x > y. If the amount of service is being tracked, and the service received has been t, then as the remaining service time is x − t, it is natural to use y − t as the predicted remaining service time. Of course, at some point it becomes the case that t > y, and the predicted remaining service time will be negative, which seems unsuitable.

We use (y − t)⁺ = max(y − t, 0) as the predicted remaining service time. We recognize (as noted in [21]) that this is problematic; clearly the predicted remaining service time should be positive, and ideally would be a function f(y, t) of the initial prediction and the time served thus far. However, determining the appropriate function would appear to require some knowledge of the joint distribution g(x, y); our aim here is to explore simple, general approaches (such as choosing the shortest of two queues and using SRPT) that are agnostic to the underlying distribution g. In many situations, it may be computationally undesirable to utilize knowledge of g, or g may be not known or changing over time. We therefore leave the question of how to optimize the estimate of the predicted remaining time to achieve the best performance in this context as future work.

We consider various models for predictions (some of which were used in [21]). The models are intended to be exemplary; they do not match a specific real-world setting, and indeed it would be difficult to consider the range of possible real-world predictions. Rather, they are meant to show generally that even moderately accurate predictions can yield strong performance, and to show that a variety of interesting behaviors can occur under this framework.

In one model, which we refer to as exponential predictions, a job with actual service time x has a predicted service time that is exponentially distributed with mean x. This model is not meant to accurately represent a specific situation, but is potentially useful for theoretical analysis in that the corresponding density g(x, y) is easy to write down, and it highlights how even noisy predictions can perform well. Also, exponential service times are a standard first consideration in queueing theory. In another model, which we refer to as α-predictions, a job with service time x has a predicted service time that is uniform over [(1 − α)x, (1 + α)x], for a scale parameter 0 ≤ α ≤ 1. Again, this is a simple model that captures inaccurate estimates naturally. Finally, we introduce a model that we dub (α, β)-predictions, which makes use of the following notion of a reversal. For a service distribution with cumulative distribution function S(x), the reversal of x is S⁻¹(1 − S(x)). For example, if x is the value that is at the 70th percentile of the distribution, the reversal is the value at the 30th percentile of the distribution. For an (α, β)-prediction, when the service time is x, with probability β we return the reversal of x, and with all remaining probability the predicted service time is uniform over [(1 − α)x, (1 + α)x]. We use this model to represent cases where severe mispredictions are possible, so that jobs with very large service times might be mistakenly predicted as having very small service times (and vice versa). We might expect such mispredictions could be potentially very problematic when scheduling jobs according to their predicted service times.
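The three prediction models are straightforward to state in code; the following Python sketch (an illustration under our reading of the definitions, not the authors' code) generates a predicted time y for a true service time x, using the heavy-tailed Weibull example as the distribution for the reversal, and also includes the (y − t)⁺ rule for predicted remaining time from above:

```python
import math
import random

def predict_exponential(x):
    """Exponential predictions: y is exponentially distributed with mean x."""
    return random.expovariate(1.0 / x)

def predict_alpha(x, alpha):
    """alpha-predictions: y uniform over [(1 - alpha) x, (1 + alpha) x]."""
    return random.uniform((1 - alpha) * x, (1 + alpha) * x)

# CDF and inverse CDF of the Weibull service distribution 1 - exp(-sqrt(2x)).
def weibull_cdf(x):
    return 1.0 - math.exp(-math.sqrt(2.0 * x))

def weibull_inv_cdf(p):
    return math.log(1.0 - p) ** 2 / 2.0

def predict_alpha_beta(x, alpha, beta, cdf=weibull_cdf, inv_cdf=weibull_inv_cdf):
    """(alpha, beta)-predictions: with probability beta return the reversal
    S^{-1}(1 - S(x)); otherwise fall back to an alpha-prediction."""
    if random.random() < beta:
        return inv_cdf(1.0 - cdf(x))
    return predict_alpha(x, alpha)

def predicted_remaining(y, t):
    """Predicted remaining service time after t units of service: (y - t)^+."""
    return max(y - t, 0.0)
```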

There further remains the question of how to account for the predicted workload at a queue. We discuss several variations.

1. Least Loaded Total: One could simply treat the predicted service times as actual service times, and track the total predicted service time remaining at a queue. That is, when a new job arrives at a queue, the predicted service time for the job is added to the total, and the total predicted service time reduces at a rate of one unit per unit time when a job is in the system (with a lower bound of 0); when the queue empties, the total predicted service time is reset to 0. An advantage of this approach is that in implementation the queue state can be represented by a single number. The disadvantage is that when a job's predicted service time differs greatly from the real service time, this approach does not correspondingly update when that job completes.

2. Least Loaded Updated: Here one updates the queue state both on a job arrival and a job completion; when a job completes, the predicted service time at the queue is recomputed as the sum of the predicted service times of the remaining jobs. With small additional complexity, the accuracy of the predicted work at the queue improves substantially.

3. Shortest Queue: One can always simply use the number of jobs rather than the predicted service time to choose the queue.

We note Least Loaded Updated performs much better than Least Loaded Total, and focus on it for the rest of the paper. More on the comparison is given in Appendix A.2.
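A minimal Python sketch of the three bookkeeping variants, assuming each queue keeps the predicted remaining times of its jobs (the class and field names are ours, not from the paper):

```python
class QueueState:
    """Tracks one queue under the three load-accounting variants discussed above."""

    def __init__(self):
        self.jobs = []             # predicted remaining times of jobs in the queue
        self.total_estimate = 0.0  # running scalar used by Least Loaded Total

    # --- scores used to pick among the d sampled queues (smaller is better) ---
    def score_least_loaded_total(self):
        return self.total_estimate        # single number, never recomputed on departures

    def score_least_loaded_updated(self):
        return sum(self.jobs)             # recomputed from the current predictions

    def score_shortest_queue(self):
        return len(self.jobs)             # ignore predictions entirely

    # --- events ---
    def on_arrival(self, predicted):
        self.jobs.append(predicted)
        self.total_estimate += predicted

    def on_elapsed(self, dt):
        # Least Loaded Total drains at one unit of work per unit time, floored at 0.
        if self.jobs:
            self.total_estimate = max(self.total_estimate - dt, 0.0)

    def on_departure(self, predicted):
        self.jobs.remove(predicted)
        if not self.jobs:                 # queue emptied: reset the running total
            self.total_estimate = 0.0
```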

4.2 Scheduling with Predictions  We begin as before by considering the effect of the choice of scheduling procedure within a queue, by examining results for FIFO, shortest predicted job first (SPJF), preemptive shortest predicted job first (PSPJF), and shortest predicted remaining processing time (SPRPT) in various settings. Our figures consider the least loaded updated and shortest queue variations described above (as the least loaded total variation generally performs significantly worse, and so we do not generally consider this model, although for comparison purposes we present some results for it in the next subsection). We again consider exponential and Weibull distributed service times as previously described.

Our first results for exponential predictions, shown in Figure 3, already show two key points: predicted service times can work quite well, but there are also surprising and interesting behaviors. First, choosing the shortest queue generally performs better than choosing the least loaded according to the predicted service times of jobs in the queue; for this set of experiments, only with Weibull distributed service times and FIFO service does choosing the queue based on the predicted load in the queue perform better than using just the number of jobs. That is, when using strategies within the queue that utilize the predicted information, it is worse to use the predicted load to choose the queue. Hence, even in this very simple case, we see that using predicted information for multiple subtasks (choosing a queue, and balancing within a queue) can lead to worse performance than simply using the information for one of the subtasks.

Second, PSPJF performs better than SPRPT on the Weibull distribution. On reflection, this seems reasonable from first principles; a long job that is incorrectly predicted to have a small remaining processing time can lead to increased waiting times for many jobs under SPRPT, but preempting based on the initial prediction of the job time ameliorates this effect.

We now examine results in the setting of α-predictions. We first look at the case of SPRPT; results for other schemes have similar characteristics. We compare SRPT (no prediction) with SPRPT for α = 0.2, 0.5, and 0.8, both using the least loaded updated and shortest queue policies. The results appear in Figure 4. The primary takeaway is that again using predictions offers what is arguably surprisingly little loss in performance, even at large values of α. Here, we find that least loaded does better than shortest queue for small values of α, but for α = 0.8 and high arrival rates shortest queue can perform slightly better. This is consistent with our results for the exponential model; under large error, using predictions both to choose a queue and within a queue can lead to over-using the predictions.

[Figure 3 plots average time in system versus arrival rate λ for FIFO, SPJF, PSPJF, and SPRPT under exponential predictions. Panels: (a) exponential service times, queue chosen by least loaded updated; (b) exponential service times, queue chosen by shortest queue; (c) Weibull service times, queue chosen by least loaded updated; (d) Weibull service times, queue chosen by shortest queue.]

Figure 3: Exponential predictions with exponential and Weibull service times, two choice supermarket model, with various queue scheduling policies.

[Figure 4 plots average time in system versus arrival rate λ for SRPT and for SPRPT with α = 0.2, 0.5, 0.8. Panels: (a) exponential service times, least loaded updated; (b) exponential service times, shortest queue; (c) Weibull service times, least loaded updated; (d) Weibull service times, shortest queue.]

Figure 4: α-predictions with exponential and Weibull service times, two choice supermarket model, using SPRPT.

We also look at the case of PSPJF in Figure 5. Performance is somewhat worse than for SPRPT, and the effect of increasing α is somewhat smaller. Here, we find that joining the shortest queue generally does better than joining the least loaded queue. Again, this is consistent with our results for the exponential model. Overall the picture remains very similar.

For completeness we also provide results using SPJF in Figure 6. Here SPJF generally performs worse than SPRPT and PSPJF; however, its performance degrades even more slowly as α increases. In these experiments, using least loaded always performed better than choosing the shortest queue.

Finally, we consider the case of (α, β)-predictions. Here we present an example of α = 0.5 with β = 0.1, 0.2, and 0.3, comparing also with the results from α-prediction when α = 0.5. Recall that in this setting with probability β a job's service time is replaced by its reversal in the cumulative distribution function, so that jobs with very large service times might be mistakenly predicted as having very small service times (and vice versa). The remaining jobs have predictions uniform over [(1 − α)x, (1 + α)x] when the true service time is x. The results for SPRPT are given in Figure 7, for PSPJF in Figure 8, and for SPJF in Figure 9. The primary takeaway is again that performance is quite robust to mispredictions. Even when β = 0.3, performance in all cases is significantly better than for the standard approach of choosing the shortest of two queues and using FIFO queueing without knowledge of service times. We also see now familiar trends. The effects of misprediction are more significant for the heavy-tailed service times, and when mispredictions are sufficiently frequent, it becomes better to choose a queue according to the shortest queue rather than according to the least loaded updated policy. Also, in some cases PSPJF can outperform SPRPT.

4.2.1 Fairness Issues  As the general problem of utilizing predictions for queue scheduling is relatively new, we have focused here on examining expected response time. We point out, however, that there are novel problems regarding questions such as fairness in the setting where predictions are used. For example, a standard notion of fairness involves considering job slowdown, i.e., the ratio T(x)/x between a job's response time T(x) and its size x, and the mean conditional slowdown E[T(x)]/x [4, 30, 31].

[Figure 5 plots average time in system versus arrival rate λ for SJF and for PSPJF with α = 0.2, 0.5, 0.8. Panels: (a) exponential service times, least loaded updated; (b) exponential service times, shortest queue; (c) Weibull service times, least loaded updated; (d) Weibull service times, shortest queue.]

Figure 5: α-predictions with exponential and Weibull service times, two choice supermarket model, using PSPJF.

[Figure 6 plots average time in system versus arrival rate λ for SJF and for SPJF with α = 0.2, 0.5, 0.8. Panels: (a) exponential service times, least loaded updated; (b) exponential service times, shortest queue; (c) Weibull service times, least loaded updated; (d) Weibull service times, shortest queue.]

Figure 6: α-predictions with exponential and Weibull service times, two choice supermarket model, using SPJF.

[Figure 7 plots average time in system versus arrival rate λ for SPRPT with (α, β) = (0.5, 0.0), (0.5, 0.1), (0.5, 0.2), (0.5, 0.3). Panels: (a) exponential service times, least loaded updated; (b) exponential service times, shortest queue; (c) Weibull service times, least loaded updated; (d) Weibull service times, shortest queue.]

Figure 7: (α, β)-predictions with exponential and Weibull service times, two choice supermarket model, using SPRPT.

[Figure 8 plots average time in system versus arrival rate λ for PSPJF with (α, β) = (0.5, 0.0), (0.5, 0.1), (0.5, 0.2), (0.5, 0.3). Panels: (a) exponential service times, least loaded updated; (b) exponential service times, shortest queue; (c) Weibull service times, least loaded updated; (d) Weibull service times, shortest queue.]

Figure 8: (α, β)-predictions with exponential and Weibull service times, two choice supermarket model, using PSPJF.

[Figure 9 plots average time in system versus arrival rate λ for SPJF with (α, β) = (0.5, 0.0), (0.5, 0.1), (0.5, 0.2), (0.5, 0.3). Panels: (a) exponential service times, least loaded updated; (b) exponential service times, shortest queue; (c) Weibull service times, least loaded updated; (d) Weibull service times, shortest queue.]

Figure 9: (α, β)-predictions with exponential and Weibull service times, two choice supermarket model, using SPJF.

Not surprisingly, we find that when using predictions, even when using scheduling methods based on SRPT, which is often fair or limited in its unfairness (see the discussion in [30]), the proposed variations using prediction have very poor fairness.

This occurs for multiple reasons. Most significantly, even very short jobs can be caught waiting for jobs with a remaining predicted service time of zero. This leads to occasional large values of T(x)/x for small jobs, skewing the fairness. Also, in cases where small jobs obtain large predictions, the value of T(x)/x can again be high under high load.
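For concreteness, the slowdown statistics we refer to can be computed from simulation output as in the following small Python sketch (our illustration; the list of (size, response_time) pairs is a hypothetical simulation log):

```python
def slowdowns(records):
    """Per-job slowdown T(x)/x from (size, response_time) pairs."""
    return [t / x for x, t in records]

def mean_conditional_slowdown(records, n_bins=20):
    """Crude empirical view of E[T(x)]/x: sort jobs by size, split into
    equal-count bins, and report (mean size, mean response time / mean size)."""
    pairs = sorted(records)                  # sort by size
    k = max(len(pairs) // n_bins, 1)
    out = []
    for i in range(0, len(pairs), k):
        chunk = pairs[i:i + k]
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        mean_t = sum(t for _, t in chunk) / len(chunk)
        out.append((mean_x, mean_t / mean_x))
    return out
```

The occasional huge values of T(x)/x for tiny jobs described above show up directly in the per-job slowdowns computed this way.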

The two problems suggest different solutions. In the first case, T(x) is artificially high; this suggests strong fairness results require policies that do something beyond letting jobs with predicted remaining service time zero continue. Some addition of processor sharing, preemption, or modifying the prediction when the predicted remaining service time reaches zero should therefore improve fairness. In the second case, x is small when the predicted time y is large; this suggests that alternative definitions of fairness, based on the prediction y as well as the actual service time x, should be considered.

We provide some preliminary results on fairness in Appendix B, and leave additional study for further work.

    5 Real-World Traces

We now consider how the scheduling policies deal with real-world traces from Amvrosiadis et al. [2], who developed a system that predicts job size in large real-world clusters. For each job, we have the submission time and its real and predicted runtime in seconds (for multi-task jobs, their sizes were obtained by summing the size (resp. predicted size) of each individual task); we only take into account the jobs that completed successfully. For consistency with the synthetic traces, we normalize job size to obtain a mean of 1. We have three traces: Google (385,072 jobs), Trinity (18,872 jobs), and Twosigma (265,029 jobs).

[Figure 10 plots, for each of 100 periods, the total weight of jobs submitted in that period (normalized so the mean per period is 1) for the Twosigma, Trinity, and Google traces.]

Figure 10: Cumulative weight of jobs submitted per period.

In Figure 10, we divide each trace into 100 periods, and plot the cumulative size of jobs submitted in each period, normalized such that the mean value per period is 1. The traces have very different job submission patterns: most of the load for the Twosigma dataset is concentrated around the beginning of the trace; the Trinity dataset has lower load spikes along the whole trace; finally, the Google dataset has a relatively more uniform load pattern.³

To obtain an average system load of λ, we scale job submission times such that the average interval between jobs is 1/(qλ), where q = 100 is the number of queues in the simulation. Since there is a degree of randomness due to the random choice of queues, we repeated the experiment 5 times for each setting, with negligible differences in the end results. Results shown here represent the average of the 5 runs.
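A small Python sketch (ours; the tuple layout is an assumption) of the preprocessing described above: normalizing job sizes to mean 1 and rescaling submission times so that the average inter-arrival time is 1/(qλ). Whether predicted sizes are scaled by the same factor as real sizes is also our assumption.

```python
def prepare_trace(jobs, lam, q=100):
    """jobs: list of (submission_time, size, predicted_size) tuples, already
    restricted to successfully completed jobs. Returns a trace whose sizes
    have mean 1 and whose average inter-arrival time is 1 / (q * lam)."""
    jobs = sorted(jobs)                                    # sort by submission time
    mean_size = sum(s for _, s, _ in jobs) / len(jobs)
    t0, t_end = jobs[0][0], jobs[-1][0]
    mean_gap = (t_end - t0) / (len(jobs) - 1)              # current average spacing
    time_scale = (1.0 / (q * lam)) / mean_gap
    return [((t - t0) * time_scale, s / mean_size, p / mean_size)
            for t, s, p in jobs]
```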

³Amvrosiadis et al. discuss a fourth trace (Mustang). However, we have found that job size prediction in this case is not good enough to be useful for scheduling, and in our experiments scheduling policies that are not based on predicted job size outperform those that are. For this reason, we do not include results from that dataset in this paper.

[Figure 11 shows, for each trace, a heatmap of log10(estimated size) versus log10(size), with the number of jobs as intensity: (a) Google; (b) Trinity; (c) Twosigma.]

Figure 11: Real-world datasets: heatmaps of job size distribution versus estimation.

[Figure 12 plots average time in system versus average system load λ for FIFO, SPJF, PSPJF, and SPRPT on the Google trace: (a) shortest queue; (b) least loaded updated; (c) selfish queue selection.]

Figure 12: Google dataset: mean response time with various queue choice methods.

In Figure 11, we show heatmaps covering the joint distribution of real and predicted job sizes. In what we believe is a common pattern in real-world systems, the job size distribution is heavy tailed, with a few large jobs representing a large amount of the overall system load; job sizes are distributed across several orders of magnitude.

In Figure 12, we show results for the Google dataset. The variation of system load over time yields higher absolute numbers for the mean response time compared to the synthetic workloads seen before; this phenomenon is stronger in the other datasets because they have larger load spikes, as seen in Figure 10. Similarly to some cases for the synthetic workloads, we observe that here too shortest queue performs better than least loaded updated. Interestingly, the selfish strategy instead performs essentially as well as shortest queue, showing that in this scenario the “price of anarchy” (i.e., the performance cost due to letting each job's owner selfishly choose their queue) is negligible; analogous considerations hold for the other datasets we consider in this section. Unsurprisingly, the size-based scheduling policies largely outperform FIFO scheduling on this heavy-tailed workload, and, again confirming the results for synthetic workloads, PSPJF is preferable to the other policies due to its better performance when large jobs are underestimated.

In the other two datasets (Figures 13 and 14) results are qualitatively similar, even though the larger load spikes further increase the average time spent in the system. The difference between FIFO and the other policies is even larger, while PSPJF remains the best-performing policy; the differences among the size-based policies are small compared to the large advantage gained by choosing a policy based on predicted job size.

    6 Conclusion

We have considered (primarily through simulation) the supermarket model in the setting where service times are predicted. As a starting point, however, we considered the baseline where service times are known. Our results show that in the “standard” supermarket model (exponential service times, Poisson arrivals), as well as more generally, even though the power of two choices provides tremendous gains over a single choice, there remain substantial further performance gains to be achieved when one makes use of known service times. In particular, using a service-aware scheduling policy such as SRPT can yield significant performance gains under high loads.

[Figure 13 plots average time in system versus average system load λ for FIFO, SPJF, PSPJF, and SPRPT on the Trinity trace: (a) shortest queue; (b) least loaded updated; (c) selfish queue selection.]

Figure 13: Trinity dataset: mean response time with various queue choice methods.

[Figure 14 plots average time in system versus average system load λ for FIFO, SPJF, PSPJF, and SPRPT on the Twosigma trace: (a) shortest queue; (b) least loaded updated; (c) selfish queue selection.]

Figure 14: Twosigma dataset: mean response time with various queue choice methods.

This immediately raises natural theoretical questions, such as deriving equations for the supermarket model using least loaded server selection with shortest job first or shortest remaining processing time, which would extend the recent work of [11].

However, our more important direction is to introduce the idea of using predicted service times in this setting. Our simulation-based study suggests that the power of two choices maintains most of its power even when using predictions. We also find some interesting, potentially counterintuitive effects. For example, when predictions are sufficiently inaccurate, performance is better when using queue lengths rather than the predicted load when choosing a queue, even when using the predictions to schedule within the queue. We view our results as showing that the use of predicted service times in large-scale distributed systems can be quite promising in terms of improving performance.

Our work highlights many open practical questions on how to optimize these kinds of systems when using predictions, as well as many open theoretical questions regarding how to analyze them. For example, suitable mechanisms for managing jobs that exceed their predicted service time offer further potential for important improvements. Perhaps most interesting is developing appropriate theories of fairness when using predictions. A short job predicted to have a long service time may face long delays before service; how to achieve a suitable notion of fairness when using predictions clearly merits further study.

    References

[1] Reza Aghajani, Xingjie Li, and Kavita Ramanan. Mean-field dynamics of load-balancing networks with general service distributions. arXiv preprint arXiv:1512.05056, 2015.

[2] George Amvrosiadis, Jun Woo Park, Gregory R. Ganger, Garth A. Gibson, Elisabeth Baseman, and Nathan DeBardeleben. On the diversity of cluster workloads and its impact on research results. In Proc. of the 2018 USENIX Annual Technical Conference, 2018.

[3] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. Balanced allocations. SIAM J. Comput., 29(1):180–200, 1999.

[4] Nikhil Bansal and Mor Harchol-Balter. Analysis of SRPT scheduling: investigating unfairness. In Proc. of the Joint International Conference on Measurements and Modeling of Computer Systems, pages 279–290, 2001.

[5] René Bekker, Sem C. Borst, Onno J. Boxma, and Offer Kella. Queues with workload-dependent arrival and service rates. Queueing Systems, 46(3-4):537–556, 2004.

[6] Maury Bramson, Yi Lu, and Balaji Prabhakar. Randomized load balancing with general service time distributions. In ACM SIGMETRICS Performance Evaluation Review, volume 38, pages 275–286, 2010.

[7] Maury Bramson, Yi Lu, and Balaji Prabhakar. Asymptotic independence of queues under randomized load balancing. Queueing Systems, 71(3):247–292, 2012.


[8] Matteo Dell’Amico, Damiano Carra, and Pietro Michiardi. PSBS: Practical size-based scheduling. IEEE Transactions on Computers, 65(7):2199–2212, 2015.

[9] Mor Harchol-Balter. Task assignment with unknown duration. J. ACM, 49(2):260–288, 2002.

[10] Mor Harchol-Balter. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press, 2013.

[11] Tim Hellemans and Benny Van Houdt. On the power-of-d-choices with least loaded server selection. POMACS, 2(2):27:1–27:22, 2018.

[12] Tim Hellemans, Tejas Bodas, and Benny Van Houdt. Performance analysis of workload dependent load balancing policies. POMACS, 3(2):35:1–35:35, 2019.

[13] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation algorithms. In International Conference on Learning Representations, 2019.

[14] Richard M. Karp, Michael Luby, and Friedhelm Meyer auf der Heide. Efficient PRAM simulation on a distributed memory machine. Algorithmica, 16(4/5):517–542, 1996.

[15] Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proc. of the 2018 International Conference on Management of Data, pages 489–504. ACM, 2018.

[16] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In Proc. of the 35th International Conference on Machine Learning, pages 3302–3311, 2018.

[17] Raymond Marie. Calculating equilibrium probabilities for λ(n)/Ck/1/N queues. ACM SIGMETRICS Performance Evaluation Review, 9(2):117–125, 1980.

[18] Michael Mitzenmacher. Studying balanced allocations with differential equations. Combinatorics, Probability & Computing, 8(5):473–482, 1999.

[19] Michael Mitzenmacher. How useful is old information? IEEE Trans. Parallel Distrib. Syst., 11(1):6–20, 2000.

[20] Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094–1104, 2001.

[21] Michael Mitzenmacher. Scheduling with predictions and the price of misprediction. arXiv preprint arXiv:1902.00732, 2019.

[22] Michael Mitzenmacher. A model for learned Bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems, pages 462–471, 2018.

[23] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.

[24] Michael Mitzenmacher and Berthold Vöcking. The asymptotics of selecting the shortest of two, improved. In Proc. of the 37th Annual Allerton Conference on Communication, Control, and Computing, 1999.

[25] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems, pages 9684–9693, 2018.

[26] Ziv Scully and Mor Harchol-Balter. SOAP bubbles: robust scheduling under adversarial noise. In Proc. of the 56th Annual Allerton Conference on Communication, Control, and Computing, 2018.

[27] Ziv Scully, Mor Harchol-Balter, and Alan Scheller-Wolf. SOAP: One clean analysis of all age-based scheduling policies. In Proc. of the ACM on Measurement and Analysis of Computing Systems, 2018.

[28] Berthold Vöcking. How asymmetry helps load balancing. Journal of the ACM, 50(4):568–589, 2003.

[29] Nikita Dmitrievna Vvedenskaya, Roland L’vovich Dobrushin, and Fridrikh Izrailevich Karpelevich. Queueing system with selection of the shortest of two queues: An asymptotic approach. Problemy Peredachi Informatsii, 32(1):20–34, 1996.

[30] Adam Wierman. Fairness and scheduling in single server queues. Surveys in Operations Research and Management Science, 16(1):39–48, 2011.

[31] Adam Wierman and Mor Harchol-Balter. Classifying scheduling policies with respect to unfairness in an M/GI/1. In Proc. of the International Conference on Measurements and Modeling of Computer Systems, pages 238–249, 2003.

[32] Adam Wierman and Misja Nuyens. Scheduling despite inexact job-size information. Performance Evaluation Review, 36(1):25–36, 2008.

    A Additional Results

A.1 Toward Developing Equations for Limiting Behavior Previous work has shown that, in the limiting supermarket model where the number of queues goes to infinity, individual queues can be treated as independent, both when choosing the shortest queue and when choosing the least loaded [7]. This connection plays a key role in the analysis of the supermarket model when choosing the least loaded queue with FIFO scheduling in [11], which yields the stationary distribution for the queue load for individual queues in the limit as n goes to infinity. (See also [12] on this issue; this is sometimes referred to as cavity process analysis.) We can conclude that in equilibrium, at each queue considered in isolation, the least loaded variant of the supermarket model has a load-dependent arrival process, given by a Poisson process of rate λ(x) when the queue has service load x. (See, for example, [5, 17] for more on queues with load-dependent arrival processes; note here the arrival rate depends on the workload, not the total number of jobs in the queue.) The least loaded variant of the supermarket model when using other scheduling schemes, such as SJF, PSJF, and SRPT, would similarly have the same load-dependent arrival process, as in equilibrium the workload distribution would be the same regardless of the scheduling scheme. Hence we could develop formulae for quantities such as the expected response time in equilibrium in the supermarket model using the least loaded queue and SRPT, if we can develop an analysis of a single queue using SRPT with a load-dependent arrival process (and similarly for other scheduling schemes). We are not aware of any such analysis in the literature; this is a natural and tantalizing open question.

We note that the supermarket model when jobs choose the shortest queue also, as far as we know, has not been analyzed for SJF, PSJF, and SRPT. Here the arrival process at a queue in equilibrium can be given by a Poisson process of rate λ(n) when the queue has n jobs waiting. Again, if we can develop an analysis of a single queue using SRPT with a queue-length-dependent arrival process, we can use this to analyze the supermarket model using SRPT (and similarly for other scheduling schemes).
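While we do not have such an analysis, the single-queue model itself is easy to simulate. The following is a minimal event-driven sketch in Python of a single preemptive-SRPT queue fed by a queue-length-dependent Poisson process; the rate function rate_of_n is a user-supplied stand-in for the unknown equilibrium rate λ(n), and the code is illustrative rather than the simulator used for our experiments.

import math
import random

def simulate_srpt_state_dependent(rate_of_n, mean_size=1.0, horizon=1e6, seed=0):
    """Event-driven simulation of a single preemptive-SRPT queue whose
    Poisson arrival rate depends on the current number of jobs, given by
    rate_of_n(n).  Service times are exponential with mean `mean_size`.
    Returns the average response time of completed jobs."""
    rng = random.Random(seed)
    t = 0.0
    remaining = []       # remaining work of each job in the queue
    arrived = []         # arrival time of each job, aligned with `remaining`
    done, total_response = 0, 0.0

    while t < horizon:
        n = len(remaining)
        rate = rate_of_n(n)
        # Memorylessness lets us redraw the arrival clock at every event.
        next_arrival = rng.expovariate(rate) if rate > 0 else math.inf
        if remaining:
            k = min(range(n), key=remaining.__getitem__)   # SRPT job
            next_departure = remaining[k]
        else:
            next_departure = math.inf
        step = min(next_arrival, next_departure)
        if step == math.inf:
            break                                          # nothing can happen
        t += step
        if remaining:
            remaining[k] -= step                           # serve the SRPT job
        if next_arrival <= next_departure:                 # arrival event
            remaining.append(rng.expovariate(1.0 / mean_size))
            arrived.append(t)
        else:                                              # departure event
            total_response += t - arrived.pop(k)
            remaining.pop(k)
            done += 1
    return total_response / max(done, 1)

# Example with an arbitrary decreasing rate, standing in for λ(n):
# print(simulate_srpt_state_dependent(lambda n: 0.9 * (0.9 ** n)))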

A.2 Choosing a Queue with Predictions As before, we consider methods for choosing a queue beyond the queue with the (predicted) least load. We consider placing a job so that it minimizes the additional predicted waiting time, based on the predicted waiting times for all jobs. Alternatively, if control is not centralized, we might consider selfish jobs, which seek only to minimize their own predicted waiting time when choosing a queue.
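To make the two selection rules concrete, the sketch below gives one plausible scoring of a candidate queue under SPRPT, ignoring future arrivals; the function names and exact formulas are our own illustrative reading, not a specification of the simulator used for our experiments.

def selfish_cost(pending, p):
    """Predicted delay the arriving job itself would see in a queue whose
    jobs have predicted remaining work `pending`, under SPRPT and ignoring
    future arrivals: it waits for jobs with no more remaining work than p."""
    return sum(q for q in pending if q <= p)

def added_cost(pending, p):
    """Predicted waiting time the arriving job adds in total: its own
    predicted delay plus the delay p it imposes on each job it jumps ahead of."""
    return sum(q for q in pending if q <= p) + p * sum(1 for q in pending if q > p)

def choose_queue(sampled, p, rule="min-add"):
    """Pick one of the d sampled queues (lists of predicted remaining work)
    for a job with predicted size p, using SELFISH-P or MIN-ADD-P scoring."""
    score = selfish_cost if rule == "selfish" else added_cost
    return min(range(len(sampled)), key=lambda i: score(sampled[i], p))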

Our results, in Figure 15, focus on two representative examples: α-predictions with α = 0.5, and (α, β)-predictions with α = 0.5, β = 0.2. Again, choosing a queue to minimize the additional predicted waiting time in these situations does yield a small improvement over Least Loaded Updated with SPRPT, while selfish jobs have a significant negative effect.

Figure 15: Comparing methods of choosing a queue when using predicted service times, for α-predictions with α = 0.5, and for (α, β)-predictions with α = 0.5, β = 0.2. All queues use SPRPT within the queue; in the figure, SPRPT means each job chooses the queue with smallest predicted remaining work, SELFISH-P means each job chooses the queue that minimizes its own waiting time according to predictions, and MIN-ADD-P means each job chooses the queue that minimizes the additional waiting time added according to predictions. (Panels: (a) exponential service times, α = 0.5; (b) Weibull service times, α = 0.5; (c) exponential service times, α = 0.5, β = 0.2; (d) Weibull service times, α = 0.5, β = 0.2. Axes: λ, arrival rate, versus average time in system.)

Finally, as discussed earlier, we note that there is a significant difference between the Least Loaded Updated and Least Loaded Total policies. Up to this point, we have used “least loaded” to refer to Least Loaded Updated, where the predicted service time at the queue is recomputed after each departure and arrival. In contrast, Least Loaded Total tracks a single predicted service time for the queue that is updated on arrival but not at departure (unless a queue empties, in which case the tracked service time is reset to 0). While theoretically appealing (as it reduces the state space for the system), Least Loaded Total generally performs significantly worse than Least Loaded Updated. Figure 16 below provides a representative example, in the setting of α-predictions when α = 0.5. We see FIFO, in particular, does quite poorly under Least Loaded Total, and in all cases the gap in performance notably increases with the load. Our other experiments show that the gap in performance also increases significantly as the predictions become more inaccurate; with exponential predictions, α-predictions with higher α, or (α, β)-predictions with α = 0.5 and β > 0, our simulations show even larger gains from using Least Loaded Updated. While it may be useful to consider Least Loaded Total as an approach toward obtaining theoretical results, it does not appear to be the policy we wish to aim for.
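The bookkeeping difference can be illustrated with the following minimal sketch; this reflects the description above, and a full simulator would also need to account for elapsed service on the job currently in progress.

class LeastLoadedUpdated:
    """LLU: the queue's predicted load is recomputed at every arrival and
    departure, so it always reflects the jobs currently present."""
    def __init__(self):
        self.pending = []                    # predicted sizes of queued jobs

    def load(self):
        return sum(self.pending)

    def on_arrival(self, predicted_size):
        self.pending.append(predicted_size)

    def on_departure(self, predicted_size):
        self.pending.remove(predicted_size)  # that job's predicted work leaves


class LeastLoadedTotal:
    """LLT: a single running total grows on each arrival and is reset only
    when the queue empties; departures otherwise do not reduce it."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def load(self):
        return self.total

    def on_arrival(self, predicted_size):
        self.total += predicted_size
        self.count += 1

    def on_departure(self, predicted_size):
        self.count -= 1
        if self.count == 0:
            self.total = 0.0                 # reset only on an empty queue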

    B Fairness

As described in Section 4.2.1, fairness is often defined in terms of making jobs spend an amount of time in the system that is roughly proportional to their size [30]; this is captured by the slowdown concept, where a job’s slowdown is its resident time divided by its size. Here, we show results for the real-world datasets of Section 5, with the average system load set to λ = 0.9.

Figure 16: Comparing variations of Least Loaded, for α-predictions with α = 0.5. (Panels: (a) exponential service times, (b) Weibull service times; curves: FIFO, SPJF, and SPRPT under Least Loaded Total (LLT) and Least Loaded Updated (LLU). Axes: λ, arrival rate, versus average time in system.)

Figure 17: Google dataset: CDF of slowdown. (Panels: (a) shortest queue, (b) least loaded queue, (c) selfish queue selection. Axes: slowdown, log scale, versus empirical CDF. Curves: FIFO, SPJF, PSPJF, SPRPT.)

Figure 18: Trinity dataset: CDF of slowdown. (Same panels, axes, and curves as Figure 17.)

Figure 19: Twosigma dataset: CDF of slowdown. (Same panels, axes, and curves as Figure 17.)

Figure 20: Google dataset: mean conditional slowdown. (Panels: (a) shortest queue, (b) least loaded queue, (c) selfish queue selection. Axes: job size versus mean conditional slowdown, log scale. Curves: FIFO, SPJF, PSPJF, SPRPT.)

Figure 21: Trinity dataset: mean conditional slowdown. (Same panels, axes, and curves as Figure 20.)

B.1 Slowdown Distribution A first definition of fairness can be having predictable slowdowns: for example, minimizing the variance of the per-job slowdowns, or requiring a given value for the x-th percentile. To facilitate these evaluations, in Figures 17 to 19 we show the empirical cumulative distribution functions (CDFs) of the slowdown values observed in our experiments.
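For concreteness, a small sketch, assuming per-job response times and sizes have been recorded during a run (the helper names are ours), of how the slowdown values and their empirical CDF can be computed:

import numpy as np
import matplotlib.pyplot as plt

def slowdowns(response_times, sizes):
    """Per-job slowdown: time spent in the system divided by job size."""
    return np.asarray(response_times, dtype=float) / np.asarray(sizes, dtype=float)

def plot_slowdown_cdf(response_times, sizes, label):
    """Empirical CDF of the slowdown values, on a log-scaled x axis."""
    s = np.sort(slowdowns(response_times, sizes))
    cdf = np.arange(1, len(s) + 1) / len(s)
    plt.semilogx(s, cdf, label=label)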

Because system load changes over time, slowdown distributions are unequal in essentially all cases that we observe, in particular for Trinity, due to the large load the system experiences around the beginning of the trace. The overall pattern we see here, however, is that the slowdown distribution becomes less unequal with policies that perform better; PSPJF, which is the policy that performs best in terms of mean response time, also has the least variability in terms of slowdown. We explain this intuitively by the fact that extreme slowdown values are caused by “clogged” queues: by optimizing mean response time, extreme slowdown cases are also minimized.

B.2 Mean Conditional Slowdown A second definition of fairness involves mean conditional slowdown, that is, the expected slowdown for a job of a given size: for a job of size x, the mean conditional slowdown is E[T(x)]/x, where T(x) is the response time for jobs of size x [4, 30, 31]. To evaluate it empirically in our experiments, we follow the approach of Dell’Amico et al. [8]: we bin jobs by size into 50 bins containing the same number of jobs each, and plot the average size and slowdown of each bin in Figures 20 to 22, on the X and Y axes respectively.
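A minimal sketch of this binning step, assuming arrays of per-job sizes and response times from a run (the helper name is ours):

import numpy as np

def mean_conditional_slowdown(sizes, response_times, n_bins=50):
    """Bin jobs by size into n_bins bins with (nearly) the same number of
    jobs each, and return the average size and average slowdown per bin,
    to be plotted on the X and Y axes respectively."""
    sizes = np.asarray(sizes, dtype=float)
    slowdown = np.asarray(response_times, dtype=float) / sizes
    order = np.argsort(sizes)                 # job indices sorted by size
    bins = np.array_split(order, n_bins)      # equal-count bins
    xs = [sizes[b].mean() for b in bins]
    ys = [slowdown[b].mean() for b in bins]
    return xs, ys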

Once again, and confirming existing research [30, 8], we observe that the best-performing policies empirically result in better fairness. While one may imagine that size-based policies that give priority to smaller jobs would penalize large ones compared to a policy like FIFO, this is only true to a limited extent: for Google (Figure 20), size-based policies appear to always be preferable, also for larger jobs; in Trinity (Figure 21), only the very largest jobs are penalized; only in the Twosigma dataset do large jobs have noticeably lower mean conditional slowdown when size-based policies are used.

In Figure 21, we notice a discontinuity for jobs of size 2. This can be explained by Figure 11, which shows that for the Trinity dataset there is a set of jobs of size 2 whose size is systematically underestimated: this explains the drop in mean conditional slowdown.

For Twosigma (Figure 22), the results for all the size-based scheduling algorithms look superimposed. We explain this, once again, with the data from Figure 11, which shows that job size estimates tend to be clustered around a few well-separated values; hence, the details of the scheduling algorithm have only a limited impact, since in most cases they will all schedule the job with the smallest estimated size.

Figure 22: Twosigma dataset: mean conditional slowdown. (Panels: (a) shortest queue, (b) least loaded queue, (c) selfish queue selection. Axes: job size versus mean conditional slowdown, log scale. Curves: FIFO, SPJF, PSPJF, SPRPT.)
