2168-7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCC.2015.2459711, IEEE Transactions on Cloud Computing

On the Latency and Energy Efficiency of Distributed Storage Systems

Akshay Kumar
Department of ECE
Virginia Tech, Blacksburg, VA
[email protected]

Ravi Tandon
Discovery Analytics Center & Dept. of CS
Virginia Tech, Blacksburg, VA
[email protected]

T. Charles Clancy
Hume Center & Department of ECE
Virginia Tech, Blacksburg, VA
[email protected]

Abstract—The increase in data storage and power consumption at data-centers has made it imperative to design energy efficient Distributed Storage Systems (DSS). The energy efficiency of DSS is strongly influenced not only by the volume of data, frequency of data access and redundancy in data storage, but also by the heterogeneity exhibited by the DSS in these dimensions. To this end, we propose and analyze the energy efficiency of a heterogeneous distributed storage system in which n storage servers (disks) store the data of R distinct classes. Data of class i is encoded using a (n, ki) erasure code and the (random) data retrieval requests can also vary across classes. We show that the energy efficiency of such systems is closely related to the average latency, which motivates us to study the energy efficiency through the lens of average latency. Through this connection, we show that erasure coding serves the dual purpose of reducing latency and increasing energy efficiency. We present a queuing-theoretic analysis of the proposed model and establish upper and lower bounds on the average latency for each data class under various scheduling policies. Through extensive simulations, we present qualitative insights which reveal the impact of coding rate, number of servers, service distribution and number of redundant requests on the average latency and energy efficiency of the DSS.

Index Terms—Erasure Codes, Distributed Storage, Fork-Join Queues, Latency, Energy Efficiency, Multi-class queuing system.

I. INTRODUCTION

Cloud based storage systems are emerging to gain significant prominence due to their highly virtualized infrastructure that presents cost-effective and simple-to-use elastic network resources. The backbone infrastructure of the cloud is comprised of distributed storage systems (DSS), in which the data is stored and accessed from commodity storage disks. Coding the data across distributed disks provides fault tolerance against unexpected disk failures. There has been a recent paradigm shift from classical replication-based codes to erasure codes because they provide higher fault tolerance at the same storage cost [2]. As a result, a number of commercial DSS such as Google Colossus, Windows Azure etc. are transitioning to the use of erasure codes [3]–[5]. Besides providing fault tolerance and minimizing storage cost, another important aspect which deserves equal, if not more, attention is the energy efficiency of DSS.

Over the last decade, the dramatic growth in data usage has led to an enormous increase in the volume of stored (archival) data and the frequency of data access to a DSS [6]. This translates

Parts of this work were presented at the Globecom 2014 conference [1].

to more and more servers being added to the data-center, operating at higher server utilization levels. As a result, the energy consumption of data-centers is increasing steeply and adds to their operational cost. According to [7], the energy consumed by data centers globally increased by 19% in 2012, and storage systems in a large data-center consume up to 40% of the total energy [8]. Hence, there is a need to devise energy efficient data storage schemes. The existing techniques for energy-efficient data storage are based on variants of schemes that involve powering off storage devices [9]–[11].

Energy efficiency is a system-wide property, and while some metrics focus on the energy efficiency of hardware or software components [12], others are based on the usage of physical resources (such as CPU, memory, storage etc.) by the running applications or servers. For the scope of this work, we focus on the data transfer throughput metric [13], which measures energy efficiency as the amount of data processed in the DSS per unit of energy expended across all distributed servers. Therefore, the energy efficiency of DSS is strongly influenced by the volume of data transferred (per request), the frequency of data storage/access requests, the service rate of each server and the degree of redundancy in data storage.

The energy efficiency of a DSS is also closely related to its read/write latency¹. The data stored and accessed from the cloud is broadly classified into two categories [14], [15]:

• Hot-data: this could refer to data which is frequently accessed (i.e., with a higher job request rate). Furthermore, it is desirable to provide higher redundancy/fault tolerance when storing such data.

• Cold-data: this could refer to data which is infrequently accessed or archival data. Such data does not necessarily need to be coded and stored with higher fault tolerance, as it is seldom accessed by the users.

When data is infrequently accessed, as in the case of Cold-data, the average latency is reduced, which also improves the energy efficiency of the DSS [16]. Another case in point is that increasing redundancy, as for Hot-data, improves fault-tolerance but generally results in increased latency [17]. The energy efficiency in this case decreases due to the increase in power consumption, as more servers are involved in processing the same data request. Thus the latency of a DSS is closely tied

¹Here, latency refers to the time taken to process a data request, measured relative to the time at which it enters the DSS. For the scope of this work, we consider latency to be the sum of queuing delay and service time, and assume the other delays to be relatively negligible.


with its energy efficiency. Therefore, in this work, we study the energy efficiency of a DSS through the lens of its average latency.

As mentioned earlier, erasure coded DSS, due to their several merits over replication-based codes, have gained significant prominence in recent times. Therefore, in this work we study the relationship between latency and energy efficiency for such systems. In an erasure coded DSS, the data of each user is stored across n disks (or servers) using a (n, k) optimal Maximum-Distance-Separable (MDS) code. By the property of MDS codes, accessing the data stored at any k out of the n servers suffices to recover the entire data of a user (also referred to as successful completion of the job request² of that user). The processing of job requests in DSS is typically analyzed using Fork-Join (F-J) queues [19], [20]. A (n, k) F-J queue consists of n independently operating queues corresponding to each of the n servers. Every job arriving in the system is split n ways and enters the queues of all n servers simultaneously. A queuing-theoretic latency analysis of the (n, k) F-J system has been done in [17] (also see [21]–[23]). The key finding of these papers is that using erasure coding and sending redundant requests (requests to more than k servers for a (n, k) F-J system) can significantly reduce the latency of a DSS.

However, most of the aforementioned literature considers a homogeneous storage architecture, with no distinction (from the system's perspective) between any two job requests entering the system. That is hardly the case with real DSS [24], [25], wherein, as mentioned earlier (see Hot-data vs. Cold-data), job requests can be classified into one of several classes based on the job arrival rate or fault-tolerance/storage requirements. For instance, Amazon S3 [24] allows its customers to choose from the following storage options: 'Standard Storage', 'Reduced Redundancy Storage', and 'Glacier Storage'. Standard Storage is the most expensive but provides maximum reliability and availability. At the other extreme is the inexpensive Glacier Storage, which stores data with low redundancy and is designed for non-critical and infrequently accessed data. Motivated by this observation, we consider a (n, k1, k2, ..., kR) multi-tenant DSS for R distinct data classes, a generalization of the homogeneous (n, k) DSS in [17]. Data of class i (∀i ∈ {1, 2, ..., R}) is stored across n servers using a (n, ki) erasure (MDS) code. The arrivals³ of job requests of class i are assumed to follow a Poisson distribution with rate λi.

The key contributions of this paper are:

• A multi-tenant DSS is proposed and analyzed through the Fork-Join framework to account for the heterogeneity in job arrival rates and fault-tolerance requirements of different data classes.

²We restrict our attention to read requests because in most practical DSS, such as HDFS [18], Windows Azure [5] etc., the user's data is written only once to the storage nodes but can be retrieved multiple times by the user.

³Job arrivals refer to the time instants at which job requests enter the queues of the servers in the DSS.

• A data-throughput-based energy efficiency metric is defined for the heterogeneous DSS operating under any given scheduling policy. For the special case of a single server and a single data class, we show that the average latency and energy efficiency of the DSS are closely related to each other. Therefore, using a queuing-theoretic approach, we provide lower and upper bounds on the average latency for jobs of class i (∀i ∈ {1, 2, ..., R}) in the proposed F-J framework under various scheduling policies such as First-Come-First-Serve (FCFS), preemptive and non-preemptive priority scheduling.

• We study the impact of varying the code-rate on the latency, energy efficiency and network bandwidth consumed by the DSS. Increasing the code-rate reduces latency and increases energy efficiency. However, this comes at the cost of increased storage space and (write) bandwidth. We also obtain interesting insights from investigating the impact of varying the number of servers and of heavy-tail arrival/service distributions in the DSS.

• Lastly, we study the impact of varying the number of redundant requests (sending requests to more than k servers for a (n, k) MDS code) to the DSS. We observe that sending redundant requests reduces latency and increases energy efficiency. Thus, full redundancy results in minimum latency and maximum energy efficiency for each data-class.

II. RELATED WORK

A number of good MDS codes, such as LINUX RAID-6 and array codes (EVENODD codes, X-code, RDP codes), have been developed to encode the data stored on the cloud (see [26] and references therein). These codes have very low encoding/decoding complexity as they avoid Galois-field arithmetic (unlike the classical Reed-Solomon MDS codes) and involve only XOR operations. However, they are usually applicable only up to two or three disk failures. Also, in the event of disk failure(s), array codes and the recently introduced regenerating codes reduce disk and network I/O respectively. Recently, non-MDS codes such as Tornado, Raptor and LRC codes [27], [28] have been developed for erasure coded storage. Although their fault-tolerance is not as good as that of MDS codes, they achieve higher performance due to lower repair bandwidth and I/O costs.

The latency analysis of an (MDS) erasure coded (n, k) homogeneous DSS has been well investigated in [17], [21], [22], which provide queuing-theoretic bounds on the average latency. A related line of work [23], [29] independently showed that sending requests to multiple servers always reduces the (read) latency. Liang et al. [30] extended the latency analysis to a (n, k, L) DSS, in which n out of a total of L independent servers are used to store the (n, k) MDS code, assuming a "constant+exponential" model for the service time of jobs. The authors in [31], [32] developed load-adaptive algorithms that dynamically vary the job size, coding rate and number of parallel connections to improve the delay-throughput tradeoff of key-value storage systems. These solutions were extended


for heterogeneous services with a mixture of job sizes and coding rates. Recently, Xiang et al. [33] provided a tight upper bound on the average latency, assuming an arbitrary erasure code, multiple file types and a general service time distribution. This was then used to solve a joint latency and storage cost optimization problem by optimizing over the choice of erasure code, the placement of encoded chunks and the choice of scheduling policy.

Data-centers, while configured for peak service demand, end up being highly underutilized. Furthermore, the hardware components in storage systems are not power-proportional, with the idle mode consuming roughly 60% of the busy-mode power [34]. This has resulted in significant research on designing and implementing power-efficient schemes. Most of the current literature focuses on power management via performance scaling (such as DVFS [35], [36]) or low-power states [37]. Recently, Liu et al. [38] investigated the effect of varying various system operations, such as processing speed and system on/off decisions, on the power-delay performance from a queuing-theoretic perspective. This work was extended in [39], wherein a joint speed-scaling and sleep-state management approach was proposed that determines the best low-power state and frequency setting by examining the power consumption and average response time of each pair.

However, the work in [38], [39] does not present a power-delay performance analysis for an erasure coded DSS. It also focuses on power consumption rather than the more relevant metric of energy efficiency of the DSS. Therefore, in this work, we study the relationship between energy efficiency and average latency in an (MDS) erasure coded heterogeneous DSS for different scheduling policies.

III. SYSTEM MODEL

A heterogeneous multi-tenant (n, k1, k2, ..., kR) DSS (shown in Fig. 1) consists of n servers that store the data of R distinct classes. The R classes differ from each other in the fault-tolerance, storage requirements and frequency of access of the stored data. The data of class i (which is assumed to be of size $l_i$) is partitioned into $k_i$ equal-size fragments and then stored across n servers using a (n, ki) Maximum-Distance-Separable (MDS) code. Thus each server stores a $1/k_i$ fraction of the original data. The arrival process for requests of class i is assumed to be Poisson with rate $\lambda_i$. The service time at each server is assumed to follow an exponential distribution with service rate µ (per unit file size) [32]. The effective service rate at any server for jobs of class i is $\mu_i = \frac{k_i \mu}{l_i}$, since each server stores a $1/k_i$ fraction of the data.

Example 1. We now present a representative example to illustrate the system model. Consider a (n, k1, k2) = (3, 2, 1) two-class DSS. Data for the two classes A and B are encoded across n = 3 servers using (3, 2) and (3, 1) MDS codes respectively, as shown in Fig. 2. Let A1 and B1 denote two files of class A and B respectively that need to be coded and stored across the servers. For the (3, 2) MDS code, A1 is split into two sub-files, A11 and A12, of equal size, which are stored on any two servers (servers 1 and 2 in Fig. 2).

[Figure] Fig. 1: System Model. (Jobs of R classes, with arrival rates λ1, λ2, ..., λR, are forked to the schedulers of servers 1 through n.)

[Figure] Fig. 2: MDS codes for data storage in a two-class Fork-Join system. (Server 1: A11, B1; Server 2: A12, B1; Server 3: A11 ⊕ A12, B1; (3, 2) code for class A, (3, 1) code for class B.)

Then the remaining server (i.e., server 3) stores A11 ⊕ A12. Thus each server stores half the size of the original file, and the entire file can be recovered from any two servers. The (3, 1) MDS code for file B1 is a simple replication code in which each server stores a copy of the entire file of class B; the file can thus be recovered by accessing the data from any one server.
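To make Example 1 concrete, the following sketch (our illustration, not code from the paper) implements the (3, 2) XOR parity code and checks that any two servers suffice to recover the file; the function names are hypothetical.

```python
# Sketch of the (3, 2) MDS code from Example 1, using XOR parity.
# Any 2 of the 3 stored fragments suffice to recover the file.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode_3_2(data: bytes) -> list:
    """Split data into halves A11, A12 and store [A11, A12, A11 ^ A12]."""
    assert len(data) % 2 == 0
    a11, a12 = data[: len(data) // 2], data[len(data) // 2 :]
    return [a11, a12, xor_bytes(a11, a12)]

def decode_3_2(fragments: dict) -> bytes:
    """Recover the file from any 2 of the 3 fragments (keyed by server index)."""
    if 0 in fragments and 1 in fragments:
        return fragments[0] + fragments[1]
    if 0 in fragments and 2 in fragments:            # A12 = A11 ^ parity
        return fragments[0] + xor_bytes(fragments[0], fragments[2])
    return xor_bytes(fragments[1], fragments[2]) + fragments[1]  # A11 = A12 ^ parity

file_a1 = b"ABCDEFGH"
servers = encode_3_2(file_a1)
for pair in [(0, 1), (0, 2), (1, 2)]:
    assert decode_3_2({i: servers[i] for i in pair}) == file_a1
```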

The evolution of the system state in this example depends on the local scheduling policy at each server. Although various scheduling policies exist, in this work we consider First-Come-First-Serve (FCFS), preemptive and non-preemptive priority queuing policies at each server. In FCFS scheduling, all data classes have equal priority. At each server, the job that enters the buffer first is served first. In a priority queuing policy, the data classes are assigned different priority levels. A job of a particular class will be served only when there are no outstanding jobs of classes with a higher priority level. A priority queuing policy is further classified as preemptive or non-preemptive based on whether or not the job in the server can be preempted by a job of a higher priority level.

Fig. 3(a)-(c) illustrates the evolution of the system state under the FCFS policy. After server 2 finishes job A1 in Fig. 3(a), B1 enters server 2 and is finished in the next state (Fig. 3(b)) while the other servers still process A1. Since kB = 1, the remaining two copies of B1 immediately exit the system. Finally, in Fig. 3(c), server 1 finishes A1 and, since kA = 2, A1 exits at server 3.

A. Latency and Energy Efficiency


[Figure] Fig. 3: System state evolution: two-class FJ system with FCFS. (Snapshots (a)-(c) of the three server queues as jobs A1 and B1 are served and depart.)

Definition 1. For a {λi, li, µ, (n, k1, k2, ..., kR)} DSS, the average latency of class i under some scheduling policy P is defined as

$$T^i_P = T^i_{s,P} + T^i_{q,P}, \qquad (1)$$

where $T^i_{s,P}$ and $T^i_{q,P}$ are the average service time and waiting time (in queue) for a job of class i, respectively.

For a {λi, li, µ, (n, k1, k2, ..., kR)} DSS operating under scheduling policy P, a relevant metric for measuring the energy efficiency, $E_P$, of the DSS is the data transfer throughput metric [13]. It is defined as the limiting ratio of the amount of data processed, $D_P(t)$, by the DSS to the energy consumed, $E_P(t)$, by the DSS over an infinitely large time duration t. It has units of bits/Joule. Now $D_P(t)$ is simply

$$D_P(t) = \sum_{i=1}^{R} l_i N_i(t), \qquad (2)$$

where $N_i(t)$ is the number of jobs of class i processed by the DSS in a time interval t. In order to determine $E_P(t)$, we model the power consumption of the DSS as follows:

• To reduce power consumption, the servers are equipped with the dynamic voltage/frequency scaling (DVFS) mechanism and low-power states [39]. The DVFS mechanism reduces the operating voltage and CPU processing speed (or frequency) in step to reduce utilization and hence increase power savings.

• The power consumed by a server in any state is the sum of the power consumed by the CPU and the platform, which comprises the chipset, RAM, HDD, fan etc. The power consumed by the CPU and platform in a given state is assumed to be the same across all the n servers.

• The power consumed by a server (CPU and platform) while in the active and low-power state is denoted by $P_{on}$ and $P_{off}$ respectively. A server is in active mode during the busy periods (i.e., when there are outstanding jobs waiting for service). In general, at the end of a busy period, a server remains active for a while and then enters a sequence of low-power states, staying in each for a predetermined amount of time. For ease of analysis, we lump them into a single low-power state with constant CPU power, $C_l$, and constant platform power, $P_l$. After the busy period is over, the server remains in active mode for a duration $d_l$ and then enters the low-power state⁴. When the busy period restarts, the server incurs a wake-up latency $w_l$ during which it consumes active-mode power but is not capable of processing any job requests. Fig. 4 explains this using an example.

• The CPU power during active mode, $C_a$, is proportional to $V^2 f$, where V is the supply voltage and f is the CPU operating frequency⁵ ($f \in [0, 1]$); both are set by the DVFS mechanism. Further, we assume that V is proportional to f [39]. So $C_a = C_0 f^3$, for some maximum power $C_0$. The power consumed by the platform during active mode, $P_a$, is constant.

• $t^{i,j,k}_{busy}$ denotes the duration of time for which the kth server is busy serving the jth job of the ith class.

• $t^{i,j,k}_{idle}$ denotes the duration of the idle period after the kth server finished the jth job of the ith class.

Using the above notation, the active-mode power per server is $P_{on} = C_a + P_a = C_0 f^3 + P_a$. Similarly, $P_{off} = C_l + P_l$. Consider any time duration t of interest during the operation of the DSS. During this period, the total time for which the DSS is in active mode, $t_a$, is the sum total (across all servers) of all busy periods plus the active-mode time before entering the low-power state. Mathematically, we have

$$t_a = \sum_{i=1}^{R} \sum_{j=1}^{N_i(t)} \sum_{k=1}^{n} \left[ t^{i,j,k}_{busy} + \max\left(0,\, t^{i,j,k}_{idle} - d_l\right) \right]. \qquad (3)$$

The total time for which the DSS is in the low-power state, $t_l$, is

$$t_l = nt - t_a. \qquad (4)$$

We now have the following definition of the energy efficiency of a DSS.

Definition 2. For a {λi, li, µ, (n, k1, k2, ..., kR)} DSS, the energy efficiency of the DSS under some scheduling policy P is defined as

$$E_P = \lim_{t \to \infty} \frac{D_P(t)}{E_P(t)} \qquad (5)$$
$$= \lim_{t \to \infty} \frac{\sum_{i=1}^{R} l_i N_i(t)}{P_{on} t_a + P_{off} t_l}, \qquad (6)$$

where (6) follows from (5) using (2). The expressions for $t_a$ and $t_l$ are given in (3) and (4) respectively.
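As an illustration of Definition 2, the sketch below (our addition; the trace layout and function name are hypothetical) evaluates (2)-(4) and (6) from per-server busy/idle durations, using the power-model constants adopted later in Section VI.

```python
# Sketch (ours): evaluating the energy-efficiency metric of Definition 2
# from per-server busy/idle traces, via t_a (3), t_l (4) and E_P (6).
# busy[i][j][k], idle[i][j][k]: durations (s) around the j-th completed
# job of class i at the k-th server; file_sizes[i] is l_i in bits.
def energy_efficiency(busy, idle, file_sizes, n, t, d_l,
                      C0=203.13, Pa=120.0, Cl=15.0, Pl=13.1, f=1.0):
    P_on = C0 * f**3 + Pa                        # active-mode power per server
    P_off = Cl + Pl                              # low-power-state power
    t_a = sum(b + max(0.0, i_dur - d_l)          # eq. (3)
              for cls_b, cls_i in zip(busy, idle)
              for job_b, job_i in zip(cls_b, cls_i)
              for b, i_dur in zip(job_b, job_i))
    t_l = n * t - t_a                            # eq. (4)
    data = sum(l * len(jobs) for l, jobs in zip(file_sizes, busy))  # eq. (2)
    return data / (P_on * t_a + P_off * t_l)     # eq. (6), bits/Joule
```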

Next, in order to highlight the relationship between the average latency and energy efficiency of a DSS, we consider the special case of an M/M/1 system and a single data-class. For tractability of the analysis, here we assume that $d_l$, $w_l$ and $P_{off}$ are all 0. Then from Definition 1 for the average latency⁶,

⁴As a consequence, if the duration of an idle period (the time between the end of a busy period and the start of the next one) is smaller than $d_l$, then the server always remains active.

⁵Due to this, the effective service rate for class i becomes $\mu_i = f k_i \mu / l_i$.

⁶In this special case, the scheduling policy P and class index i are not relevant and hence are dropped from (1).

4

Page 5: On the Latency and Energy Efficiency of Distributed Storage ...tandonr/journal-papers/Latency-DSS-2015.pdf · Index Terms—Erasure Codes, Distributed Storage, Fork-Join Queues,

2168-7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI10.1109/TCC.2015.2459711, IEEE Transactions on Cloud Computing

[Figure] Fig. 4: Variation of total power consumption of DSS across multiple busy periods and idle periods. The switch to idle state happens only for idle periods with duration greater than d_l, but it then results in a wake-up latency of w_l. (Top panel: number of customers in system vs. time; bottom panel: total instantaneous power, annotated with P_on, P_off, d_l, w_l and the 'No service' wake-up interval.)

we have

$$T = T_s + T_q \qquad (7)$$
$$= \frac{1}{\mu'} + \frac{\lambda}{\mu'(\mu' - \lambda)} = \frac{1}{\mu' - \lambda}, \qquad (8)$$

where (8) follows from (7) by noting that for an M/M/1 system, the mean service time is $T_s = \frac{1}{\mu'}$ and the mean waiting time is $T_q = \frac{\lambda}{\mu'(\mu' - \lambda)}$. Here, $\mu' = \frac{\mu f}{l}$ is the effective service rate. The energy efficiency is computed using (6) as

$$E = \lim_{t \to \infty} \frac{l N(t)}{P_{on} t_a + P_{off} t_l} \qquad (9)$$
$$= \lim_{t \to \infty} \frac{l N(t)}{P_{on} \sum_{i=1}^{N(t)} T_{s,i}} \qquad (10)$$
$$= \frac{l}{P_{on} \lim_{t \to \infty} \frac{\sum_{i=1}^{N(t)} T_{s,i}}{N(t)}} \qquad (11)$$
$$= \frac{l}{P_{on} T_s}, \qquad (12)$$

where (10) follows from (9) by noting that $t_a$ is the sum of the service times of each of the N(t) jobs (denoted by $T_{s,i}$ for the ith job) and by neglecting the power consumed when the server is idle, i.e., $P_{off} = 0$. Then (12) follows from (11) by the definition of the average service time. Thus the energy efficiency is inversely related to the average service time of jobs. It is difficult to find a closed-form expression for the energy efficiency of a heterogeneous DSS, but the general trend of inverse proportionality between latency and energy efficiency continues to hold true, as verified through extensive simulations in Section VI. The average latency is also directly related to the average service time⁷. Therefore, we conclude that the energy efficiency and average latency of a DSS are closely related to each other. Henceforth, we focus on the latency analysis of a heterogeneous DSS.

⁷The queuing delay depends on the job arrival rate and the service time, so the latency, which is the sum of queuing delay and service time, directly depends on the service time.
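The closed forms (8) and (12) are easy to check numerically. A minimal sketch (ours; parameter values are illustrative, with µ defined per kilobit as in Section VI):

```python
# Sketch (ours): numerical check of (7)-(12) for a single-class M/M/1
# server, with d_l = w_l = P_off = 0 as assumed in the text.
l = 1.0                    # file size in kilobits (illustrative)
mu, f = 1.0, 1.0           # service rate per kilobit; CPU frequency
lam = 0.5                  # arrival rate; must satisfy lam < mu_eff
mu_eff = mu * f / l        # effective service rate mu'

T_s = 1 / mu_eff                            # mean service time
T_q = lam / (mu_eff * (mu_eff - lam))       # M/M/1 mean waiting time
T = T_s + T_q                               # eq. (7)
assert abs(T - 1 / (mu_eff - lam)) < 1e-12  # eq. (8)

P_on = 203.13 * f**3 + 120.0                # C0*f^3 + Pa
E = l / (P_on * T_s)                        # eq. (12): E ~ 1/T_s
print(f"T = {T:.3f} s, E = {E:.5f} kb/Joule")
```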

IV. PRELIMINARIES

In this section, we first present the analysis of the average service latency in a multi-class single-server system with the FCFS scheduling policy. For the corresponding results in a priority (preemptive/non-preemptive) queuing system, we refer the reader to [40]. To improve the tractability of the latency analysis, the analytical results in this work ignore the impact of the wakeup latency $w_l$, similar to other works in the literature [17], [21]–[23], [29]. We then briefly review the existing results for upper and lower bounds on the average latency of a (n, k) homogeneous Fork-Join system [17].

A. Average Latency in Multi-class Single Server System with FCFS Scheduling

Consider the system model described in Fig. 1 with n = 1 server and the FCFS scheduling policy. The FCFS system can be modeled as an M/G/1 queuing system with net arrival rate $\lambda = \sum_{r=1}^{R} \lambda_r$ and a general service distribution S. The average latency of jobs of class i is the sum of their average waiting time (in queue) and their average service time. Let $S_i$ be a random variable representing the service time for a job of class i in the FCFS system. Then the average service time of jobs of class i is simply the expectation $E[S_i]$. In the FCFS system, the waiting time, $W_{FCFS}$, is the same for jobs of all classes and is given by the Pollaczek-Khinchine (P-K) formula [41] (for an M/G/1 system) as

$$W_{FCFS} = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])}. \qquad (13)$$

Therefore, the average latency for jobs of class i is

$$T^i_{FCFS} = E[S_i] + \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} = E[S_i] + \frac{\lambda \left( V[S] + E[S]^2 \right)}{2(1 - \lambda E[S])}, \qquad (14)$$

where V[·] denotes the variance of a random variable. Now the fraction of jobs of class i, $p_i$, is

$$p_i = \frac{\lambda_i}{\sum_{r=1}^{R} \lambda_r} = \frac{\lambda_i}{\lambda}. \qquad (15)$$

So the probability that S takes on the value of $S_i$ is $p_i$, $\forall i = 1, 2, \cdots, R$. Therefore the probability density function (pdf) of S is given by

$$f_S(s) = \sum_{r=1}^{R} p_r f_{S_r}(s). \qquad (16)$$

Then the mean and the second moment of S are simply

$$E[S] = \sum_{r=1}^{R} p_r E[S_r], \quad E[S^2] = \sum_{r=1}^{R} p_r E[S_r^2]. \qquad (17)$$

Using (15) and (17) in (14), we obtain

$$T^i_{FCFS} = E[S_i] + \frac{\sum_{r=1}^{R} \lambda_r \left( V[S_r] + E[S_r]^2 \right)}{2\left(1 - \sum_{r=1}^{R} \lambda_r E[S_r]\right)}. \qquad (18)$$
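Equation (18) is straightforward to evaluate once the per-class service moments are known. A small sketch (ours), assuming exponential class service times $S_r \sim \text{Exp}(\mu_r)$ and illustrative rates:

```python
# Sketch (ours): per-class FCFS latency (18) for a single server,
# assuming exponential service S_r ~ Exp(mu_r) for each class r.
def t_fcfs(i, lam, mu):
    """lam[r], mu[r]: per-class arrival and service rates; returns T^i_FCFS."""
    ES = [1 / m for m in mu]        # E[S_r]
    VS = [1 / m**2 for m in mu]     # V[S_r]
    rho = sum(a * e for a, e in zip(lam, ES))
    assert rho < 1, "queue must be stable"
    wait = sum(a * (v + e**2) for a, v, e in zip(lam, VS, ES)) / (2 * (1 - rho))
    return ES[i] + wait             # eq. (18)

print(t_fcfs(0, lam=[0.15, 0.5], mu=[1.0, 2.0]))  # illustrative rates
```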


[Figure] Fig. 5: Markov chain for a (3, 2) Fork-Join system. (States are labeled by the number of jobs in the system; arrivals occur at rate λ and services at rates µ, 2µ and 3µ; as the chain grows in two dimensions, duplicate states such as 6 and 6′ appear.)

B. Latency Analysis of Homogeneous DSS

An exact latency analysis of the (n, k) DSS is prohibitively complex because the Markov chain has a state space that is infinite in at least k dimensions. This is exemplified in Fig. 5, which shows the Markov chain evolution for a (3, 2) DSS. Each state is characterized by the number of jobs in the system. The arrival and service rates of jobs are λ and µ respectively. We note that as more jobs enter the system, the Markov chain starts growing in two dimensions, resulting in multiple states with the same number of jobs in the system, such as states 6 and 6′. An exact analysis of the F-J system is thus very complex, so we review existing upper and lower bounds on the average latency of a homogeneous DSS.

1) Lower Bound on Average Latency: In a (n, k) DSS, a job is considered finished when k out of the n servers finish that job. This is equivalent to each job going through k stages sequentially, where the transition from one stage to the next occurs when one of the remaining servers finishes a sub-task of the job [42]. We note that at any stage s, the maximum possible service rate for a job that is not finished yet is $(n - s + 1)\mu'$, where $\mu' = \frac{f k \mu}{l}$. This happens when all the remaining sub-tasks of a job are at the head of their queues. Thus, we can enhance the latency performance in each stage s by approximating it with an M/M/1 system with service rate $(n - s + 1)\mu'$. Then the average latency of the original system (denoted by T) can be lower-bounded as

$$T \geq T_{LB} = \sum_{i=1}^{k} \frac{1}{(n - i + 1)\mu' - \lambda}, \qquad (19)$$

where $T_{LB}$ denotes the lower bound on the average latency of the F-J system.
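A minimal sketch (ours) of the bound (19), where `mu_eff` stands for the effective per-server rate $\mu' = fk\mu/l$ and the example parameters are illustrative:

```python
# Sketch (ours): the lower bound (19) on the average latency of a
# homogeneous (n, k) Fork-Join system.
def t_lb(n, k, lam, mu_eff):
    """mu_eff is the per-server effective rate mu' = f*k*mu/l."""
    assert lam < (n - k + 1) * mu_eff, "slowest stage must be stable"
    return sum(1 / ((n - i + 1) * mu_eff - lam) for i in range(1, k + 1))

print(t_lb(n=10, k=5, lam=0.5, mu_eff=1 / 6))
```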

2) Upper Bound on Average Latency: To upper-bound the performance of the (n, k) F-J system, we degrade its performance by approximating it with the (n, k) Split-Merge (SM) system proposed in [17]. In the (n, k) SM system, after a server finishes a copy of a job, it is blocked and not allowed to accept new jobs until all k copies of the current job are finished. When k copies of a job are finished, the copies of that job at the remaining n − k servers exit the system immediately. The SM system can thus be modeled as an M/G/1 system with arrival rate λ and a service distribution that follows the kth order statistics [43], which is described here for reference.

Let $X_1, X_2, ..., X_n$ be n i.i.d. random variables (rv's). Now if we order the rv's in ascending order to get $X_{1,n} < X_{2,n} < \cdots < X_{k,n} < \cdots < X_{n,n}$, then the distribution of the kth smallest value, $X_{k,n}$, is called the kth order statistics. The pdf of $X_{k,n}$ is given by⁸

$$f_{X_{k,n}}(x) = \binom{n}{k-1,\, 1,\, n-k} F_X(x)^{k-1} (1 - F_X(x))^{n-k} f_X(x), \qquad (20)$$

where $F_X(x)$ and $f_X(x)$ are the cumulative distribution function and pdf of $X_i$ respectively, for all i. The average latency of the F-J system is thus upper-bounded by the average latency of the SM system, $T_{SM}$, as

$$T \leq T_{SM} = \underbrace{E[X_{k,n}]}_{\text{service time}} + \underbrace{\frac{\lambda \left( V[X_{k,n}] + E[X_{k,n}]^2 \right)}{2(1 - \lambda E[X_{k,n}])}}_{\text{waiting time}}, \qquad (21)$$

where the average service time is simply the expectation $E[X_{k,n}]$ and the average waiting time for an M/G/1 system is given by the P-K formula in (13). Now if $X_i$ is exponential with mean $1/\mu'$ (where $\mu' = \frac{f k \mu}{l}$), then the mean and variance of $X_{k,n}$ are given by

$$E[X_{k,n}] = \frac{H^1_{n-k,n}}{\mu'}, \quad V[X_{k,n}] = \frac{H^2_{n-k,n}}{\mu'^2}, \qquad (22)$$

where $H^z_{x,y}$ is a generalized harmonic number of order z defined by

$$H^z_{x,y} = \sum_{j=x+1}^{y} \frac{1}{j^z}, \qquad (23)$$

for positive integers x, y and z.
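The Split-Merge bound (21) follows directly from (22)-(23). A short sketch (ours, with illustrative parameters) for exponential service:

```python
# Sketch (ours): the Split-Merge upper bound (21) for exponential service,
# using the generalized harmonic numbers H^z_{n-k,n} of (22)-(23).
def H(z, x, y):
    return sum(1 / j**z for j in range(x + 1, y + 1))       # eq. (23)

def t_sm(n, k, lam, mu_eff):
    EX = H(1, n - k, n) / mu_eff               # E[X_{k,n}], eq. (22)
    VX = H(2, n - k, n) / mu_eff**2            # V[X_{k,n}], eq. (22)
    assert lam * EX < 1, "SM queue must be stable"
    return EX + lam * (VX + EX**2) / (2 * (1 - lam * EX))   # eq. (21)

print(t_sm(n=10, k=5, lam=0.2, mu_eff=1 / 6))
```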

V. MAIN RESULTS

Section IV-B presented bounds on the average latency for the (n, k) F-J system. To extend the lower-bound result (19) to a heterogeneous FJ system, a naive approach would be to approximate it with a homogeneous FJ system containing jobs of class i only while evaluating the lower bound on the average latency of class i. Thus a naive lower bound on the average latency for jobs of class i is

$$T^i_{naive} \geq \sum_{j=0}^{k_i - 1} \frac{1}{(n - j)\mu_i - \lambda_i}. \qquad (24)$$

This lower bound holds true irrespective of the scheduling policy used in the heterogeneous system. However, it is a loose bound as it ignores the dependency of the response time

⁸The result in (20) can be understood as follows. First select groups of k − 1, 1, and n − k servers out of the n servers in $\binom{n}{k-1,\,1,\,n-k}$ possible ways. Then the pdf of the service time of the singled-out server is simply $f_X(x)$. Now since the $X_i$ are i.i.d. random variables, the probability that the selected k − 1 servers finish their jobs before the singled-out server is $F_X(x)^{k-1}$. Similarly, the probability that the n − k servers finish their jobs after the singled-out server is $(1 - F_X(x))^{n-k}$.


for a job of class i on the jobs of other classes in the system, which compete for service at the same server.

Therefore, through a rigorous latency analysis of various scheduling policies, we next account for this inter-dependency in the average latency of different classes and present lower and upper bounds for the heterogeneous FJ system. To this end, we first define a set of variables for a compact presentation of the results. The operational meaning of these variables will become clear when we present the proofs of the results.

• (n, ki) is the MDS code used to store the data of class i.
• $l_i$ is the file size for class i.
• $\lambda_i$ is the arrival rate for jobs of class i.
• $\mu_i = k_i f \mu / l_i$ is the effective service rate for jobs of class i, where µ is the service rate per unit file size.
• $\rho_i = \frac{\lambda_i}{\mu_i}$ is the server utilization factor for class i.
• $S_i = \sum_{r=1}^{i} \rho_r H^1_{n-k_r,n}$.

A. Main Results

Lemma 1 gives the stability conditions of the heterogeneous DSS for various scheduling policies. The upper and lower bounds on the average latency for various scheduling policies are presented in Theorems 1 and 2 respectively.

Lemma 1. For a (n, k1, k2, ..., kR) Fork-Join system to be stable, the following condition must be satisfied at each node.

• FCFS scheduling:

$$\left( \sum_{r=1}^{R} k_r \lambda_r \right) \left( \sum_{r=1}^{R} \frac{\lambda_r l_r}{k_r} \right) < n f \mu \sum_{r=1}^{R} \lambda_r. \qquad (25)$$

• Preemptive/non-preemptive priority scheduling:

$$\sum_{r=1}^{R} \lambda_r l_r < n f \mu. \qquad (26)$$
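The conditions (25)-(26) can be checked directly; the sketch below (ours) uses the two-class parameters adopted in Section VI.

```python
# Sketch (ours): checking the stability conditions (25)-(26) of Lemma 1.
def stable_fcfs(n, k, lam, l, mu, f=1.0):
    lhs = sum(kr * ar for kr, ar in zip(k, lam)) * \
          sum(ar * lr / kr for kr, ar, lr in zip(k, lam, l))
    return lhs < n * f * mu * sum(lam)                           # eq. (25)

def stable_priority(n, lam, l, mu, f=1.0):
    return sum(ar * lr for ar, lr in zip(lam, l)) < n * f * mu   # eq. (26)

# Two classes, (n, k1, k2) = (10, 5, 5), unit file sizes, mu per kilobit.
print(stable_fcfs(10, [5, 5], [0.15, 0.5], [1.0, 1.0], mu=1 / 6))  # True
print(stable_priority(10, [0.15, 0.5], [1.0, 1.0], mu=1 / 6))      # True
```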

Next, to upper-bound the average latency, we extend the Split-Merge (SM) system (defined in Section IV-B2) to R data classes, keeping the scheduling policy the same as that of the original system. Then, for a given scheduling policy, the upper bound on the average latency is simply the average latency of the corresponding SM system. This in turn is the sum of the average service time and the average waiting time, which can be determined by noting the equivalence between the SM system and an M/G/1 system, as described in Section IV-B2. We thus obtain the following upper bounds on the average latency for different scheduling policies.

Theorem 1. The average latency for job requests of class i in a (n, k1, k2, ..., kR) Fork-Join system is upper-bounded as follows:

• FCFS scheduling:

$$T^i_{\text{FCFS}} \leq \underbrace{\frac{H^1_{n-k_i,n}}{\mu_i}}_{\text{service time}} + \underbrace{\frac{\sum_{r=1}^{R} \lambda_r \left[ H^2_{n-k_r,n} + (H^1_{n-k_r,n})^2 \right] / \mu_r^2}{2(1 - S_R)}}_{\text{waiting time}}. \qquad (27)$$

The bound is valid only when $S_R < 1$.

• Non-preemptive priority scheduling⁹:

$$T^i_{\text{N-PQ}} \leq \frac{H^1_{n-k_i,n}}{\mu_i} + \frac{\sum_{r=1}^{R} \lambda_r \left[ H^2_{n-k_r,n} + (H^1_{n-k_r,n})^2 \right] / \mu_r^2}{2(1 - S_{i-1})(1 - S_i)}. \qquad (28)$$

The bound is valid only when $S_i < 1$.

• Preemptive priority scheduling⁹:

$$T^i_{\text{PQ}} \leq \frac{H^1_{n-k_i,n}}{\mu_i (1 - S_{i-1})} + \frac{\sum_{r=1}^{i} \lambda_r \left[ H^2_{n-k_r,n} + (H^1_{n-k_r,n})^2 \right] / \mu_r^2}{2(1 - S_{i-1})(1 - S_i)}. \qquad (29)$$

The bound is valid only when $S_i < 1$.
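For reference, the sketch below (ours) evaluates the three bounds (27)-(29); class indices are 0-based, classes are ordered by decreasing priority, and H is the generalized harmonic number of (23).

```python
# Sketch (ours): the Theorem 1 upper bounds (27)-(29).
def H(z, x, y):
    return sum(1 / j**z for j in range(x + 1, y + 1))       # eq. (23)

def upper_bounds(i, n, k, lam, mu):
    R = len(k)
    rho = [a / m for a, m in zip(lam, mu)]
    S = lambda j: sum(rho[r] * H(1, n - k[r], n) for r in range(j))  # S_j
    num = lambda rs: sum(lam[r] * (H(2, n - k[r], n) + H(1, n - k[r], n)**2)
                         / mu[r]**2 for r in rs)
    srv = H(1, n - k[i], n) / mu[i]
    t_fcfs = srv + num(range(R)) / (2 * (1 - S(R)))                  # (27)
    t_npq = srv + num(range(R)) / (2 * (1 - S(i)) * (1 - S(i + 1)))  # (28)
    t_pq = srv / (1 - S(i)) + \
           num(range(i + 1)) / (2 * (1 - S(i)) * (1 - S(i + 1)))     # (29)
    return t_fcfs, t_npq, t_pq

# Illustrative: two classes, (10, 5) codes, mu_i = k_i*f*mu/l_i = 5/6.
print(upper_bounds(1, n=10, k=[5, 5], lam=[0.15, 0.5], mu=[5 / 6, 5 / 6]))
```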

We now define an additional set of variables for a compact presentation of the results in Theorem 2.

• Without loss of generality, assume the classes are relabeled such that $k_1 \leq k_2 \leq ... \leq k_R$. Then for class i, we define $c_s$ as

$$c_s = \begin{cases} 0, & 1 \leq s \leq k_1 \\ 1, & k_1 < s \leq k_2 \\ \vdots & \\ i - 1, & k_{i-1} < s \leq k_i \end{cases}. \qquad (30)$$

• At a stage s, let $R^i_s$ denote the set of classes with priority higher than class i that have not yet been finished.
• $t_{s,i} = \frac{\lambda_i}{(n-s+1)\mu_i}$ at stage s for class i.
• $Z^i_s = 1 - \sum_{r \in R^i_s} t_{s,r}$ at stage s for class i.

To obtain a lower bound on the average latency, we enhance the performance of the original system similarly to the process described in Section IV-B2. The processing of a job of class i is modeled as completing $k_i$ sequential stages (or sub-tasks). We then enhance the latency performance for a job of class i in stage s by assuming the maximum possible service rate for it, i.e., $(n - s + 1)\mu_i$. However, at stage s, there may also be unfinished sub-tasks of jobs of other classes, which can be served with the maximum possible service rate $(n - s + 1)\mu_j$, where $j \neq i$. Due to this, we model the performance of each enhanced stage as an M/G/1 system. We thus obtain the following lower bounds on the average latency for different scheduling policies.

Theorem 2. The average latency for job requests of class i in a (n, k1, k2, ..., kR) Fork-Join system is lower-bounded as follows:

⁹Without loss of generality, we set the classes in order of decreasing priority as 1 > 2 > · · · > R.


• FCFS scheduling:

$$T^i_{\text{FCFS}} \geq \underbrace{\sum_{s=1}^{k_i} \frac{t_{s,i}}{\lambda_i}}_{\text{service time}} + \underbrace{\sum_{s=1}^{k_i} \frac{\sum_{r=c_s+1}^{R} t_{s,r}^2/\lambda_r}{1 - \sum_{r=c_s+1}^{R} t_{s,r}}}_{\text{waiting time}}. \qquad (31)$$

• Non-preemptive priority scheduling⁹:

$$T^i_{\text{N-PQ}} \geq \sum_{s=1}^{k_i} \left[ \frac{t_{s,i}}{\lambda_i} + \frac{\sum_{r=c_s+1}^{R} t_{s,r}^2/\lambda_r}{Z^i_s \left( Z^i_s - t_{s,i} \right)} \right]. \qquad (32)$$

• Preemptive priority scheduling⁹:

$$T^i_{\text{PQ}} \geq \sum_{s=1}^{k_i} \left[ \frac{t_{s,i}}{\lambda_i Z^i_s} + \frac{\sum_{r \in R^i_s \cup \{i\}} t_{s,r}^2/\lambda_r}{Z^i_s \left( Z^i_s - t_{s,i} \right)} \right]. \qquad (33)$$
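A sketch (ours) of the FCFS lower bound (31), with $c_s$ computed from (30); the example call uses a two-class setting with $\mu_i = k_i f \mu / l_i$.

```python
# Sketch (ours): the FCFS lower bound (31). Classes are 0-based and
# assumed relabeled so that k[0] <= k[1] <= ... <= k[R-1].
def t_lb_fcfs(i, n, k, lam, mu):
    def t(s, r):                      # t_{s,r} = lam_r / ((n-s+1) mu_r)
        return lam[r] / ((n - s + 1) * mu[r])
    def c(s):                         # c_s: classes finished before stage s
        return sum(1 for kr in k if kr < s)
    total = 0.0
    for s in range(1, k[i] + 1):
        active = range(c(s), len(k))  # classes c_s+1, ..., R (1-based)
        service = t(s, i) / lam[i]
        wait = sum(t(s, r)**2 / lam[r] for r in active) / \
               (1 - sum(t(s, r) for r in active))
        total += service + wait
    return total

print(t_lb_fcfs(0, n=10, k=[5, 5], lam=[0.15, 0.5], mu=[5 / 6, 5 / 6]))
```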

B. Proofs for FCFS scheduling

We now present the proofs for the stability condition and the bounds on the average latency under the FCFS scheduling policy. The proofs of the remaining results are given in [44].

1) Proof of Lemma 1 – FCFS scheduling: Consider any server in the (n, k1, k2, ..., kR) Fork-Join system. Jobs of class r enter the queue with rate $\lambda_r$. Each new job of class r exits the system when $k_r$ sub-tasks of that job are completed. The remaining $n - k_r$ sub-tasks are then cleared from the system. Thus, for each job of class r, a fraction $\frac{n - k_r}{n}$ of the sub-tasks are deleted, and hence the effective arrival rate of jobs of class r at any server is $\lambda_r \left(1 - \frac{n - k_r}{n}\right) = \frac{k_r \lambda_r}{n}$. Thus the overall arrival rate at any server, $\lambda_{eff}$, is

$$\lambda_{eff} = \sum_{r=1}^{R} \frac{k_r \lambda_r}{n}. \qquad (34)$$

Let S denote the service distribution of a single-server FCFS system serving R data classes. Then from (17), the mean service time at a server is

$$E[S] = \sum_{r=1}^{R} p_r E[S_r] = \frac{\sum_{r=1}^{R} \frac{\lambda_r}{\mu_r}}{\sum_{r=1}^{R} \lambda_r}, \qquad (35)$$

where (35) follows from (15) and the assumption that the service time for a job of class r is exponential with rate $\mu_r$. To ensure stability, the net arrival rate must be less than the average service rate at each server. Thus from (34) and (35), the stability condition of each queue is

$$\sum_{r=1}^{R} \frac{k_r \lambda_r}{n} < \left( \frac{\sum_{r=1}^{R} \frac{\lambda_r}{\mu_r}}{\sum_{r=1}^{R} \lambda_r} \right)^{-1}.$$

Since $\mu_r = \frac{f k_r \mu}{l_r}$ and the term $\sum_{r=1}^{R} \lambda_r$ is a constant, with simple algebraic manipulations we arrive at

$$\left( \sum_{r=1}^{R} k_r \lambda_r \right) \left( \sum_{r=1}^{R} \frac{\lambda_r l_r}{k_r} \right) < n f \mu \sum_{r=1}^{R} \lambda_r. \qquad (36)$$

This completes the proof of the stability condition for FCFS scheduling.

2) Proof of Theorem 1 – FCFS scheduling: The FCFS system can be modeled as an M/G/1 queuing system with arrival rate $\lambda = \sum_{r=1}^{R} \lambda_r$ and a general service time distribution S. Then the average latency for a job of class i in the FCFS system is given by (18) as

$$T^i_{FCFS} = E[S_i] + \frac{\sum_{r=1}^{R} \lambda_r \left( V[S_r] + E[S_r]^2 \right)}{2\left(1 - \sum_{r=1}^{R} \lambda_r E[S_r]\right)}.$$

To obtain an upper bound on the average latency, we degrade the FJ system in the following manner. For a job of class i, the servers that have finished processing a sub-task of that job are blocked and do not accept new jobs until $k_i$ sub-tasks of that job have been completed. The sub-tasks at the remaining $n - k_i$ servers then exit the system immediately. Fig. 6 illustrates this process using Example 1. When A1 is finished at server 2, that server is blocked (see Fig. 6(b)) until $k_A = 2$ copies of A1 are finished. Now this performance-degraded system can be modeled as an M/G/1 system where the distribution of the service process, $S_i$, follows the $k_i$th order statistics, as described in Section IV-B2. Now for any class i, the service time at each of the n servers is exponential with mean $1/\mu_i$. Hence from (22), the mean and variance of $S_i$ are

$$E[S_i] = \frac{H^1_{n-k_i,n}}{\mu_i}, \quad V[S_i] = \frac{H^2_{n-k_i,n}}{\mu_i^2}. \qquad (37)$$

Substituting (37) in (18), we get the following upper bound on the average latency:

$$T^i_{\text{FCFS}} \leq \underbrace{\frac{H^1_{n-k_i,n}}{\mu_i}}_{\text{service time}} + \underbrace{\frac{\sum_{r=1}^{R} \lambda_r \left[ H^2_{n-k_r,n} + (H^1_{n-k_r,n})^2 \right] / \mu_r^2}{2(1 - S_R)}}_{\text{waiting time}}, \qquad (38)$$

where $S_R = \sum_{r=1}^{R} \rho_r H^1_{n-k_r,n}$ and $\rho_r = \lambda_r / \mu_r$. This concludes the proof of the upper bound on the average latency for FCFS scheduling.

3) Proof of Theorem 2 – FCFS scheduling: To obtain a lower bound on the average latency of class i, using insights from Section IV-B1, we map the parallel processing in the proposed FJ system to a sequential process consisting of $k_i$ processing stages for the $k_i$ sub-tasks of a job of class i. The transition from one stage to the next occurs when one of the remaining servers finishes a sub-task of the job. Let $c_s$ denote the number of classes that are finished before the start of stage s, as defined in (30).


[Figure] Fig. 6: Enhanced two-class FJ system with FCFS. (Snapshots (a)-(c): a server that finishes a sub-task of a job is blocked until enough copies of that job finish; jobs A1 and B1 depart in turn.)

The processing in each stage s corresponds to a single-server FCFS system with jobs of all but classes $1, 2, \cdots, c_s$. Then, using (14) for the FCFS sub-system at stage s, the average latency for a sub-task of a job of class i in stage s is given by

$$T^i_{FCFS,s} = E[S^s_i] + \frac{\lambda E[(S^s)^2]}{2(1 - \lambda E[S^s])}, \qquad (39)$$

where $S^s$ is a r.v. denoting the service time of any sub-task in stage s and $S^s_i$ denotes the service time of a sub-task of class i in stage s. The moments of $S^s$ and $S^s_i$ are related to each other in the same way as the moments of S and $S_i$ in (17). So we have

$$E[S^s] = \sum_{r=c_s+1}^{R} p_r E[S^s_r], \quad E[(S^s)^2] = \sum_{r=c_s+1}^{R} p_r E[(S^s_r)^2]. \qquad (40)$$

Substituting (40) in (39), we get

$$T^i_{FCFS,s,c_s} = E[S^s_i] + \frac{\sum_{r=c_s+1}^{R} \lambda_r E[(S^s_r)^2]}{2\left(1 - \sum_{r=c_s+1}^{R} \lambda_r E[S^s_r]\right)}. \qquad (41)$$

Now we note that at any stage s, the maximum possible service rate for a job of class j that is not finished yet is $(n - s + 1)\mu_j$. This happens when all the remaining sub-tasks of jobs of class j are at the head of their buffers. Thus, we can enhance the latency performance in each stage s by approximating it with an M/G/1 system with service rate $(n - s + 1)\mu_j$ for jobs of class j. Then the average latency for a sub-task of a job of class i in stage s is lower-bounded as

$$T^i_{FCFS,s,c_s} \geq \frac{1}{(n - s + 1)\mu_i} + \frac{\sum_{r=c_s+1}^{R} \frac{\lambda_r}{((n-s+1)\mu_r)^2}}{1 - \sum_{r=c_s+1}^{R} \frac{\lambda_r}{(n-s+1)\mu_r}}. \qquad (42)$$

Finally, the average latency for class i in this enhanced system is simply $\sum_{s=1}^{k_i} T^i_{FCFS,s,c_s}$. This gives us

$$T^i_{\text{FCFS}} \geq \underbrace{\sum_{s=1}^{k_i} \frac{t_{s,i}}{\lambda_i}}_{\text{service time}} + \underbrace{\sum_{s=1}^{k_i} \frac{\sum_{r=c_s+1}^{R} \frac{t_{s,r}}{(n-s+1)\mu_r}}{1 - \sum_{r=c_s+1}^{R} t_{s,r}}}_{\text{waiting time}},$$

where $t_{s,i} = \frac{\lambda_i}{(n-s+1)\mu_i}$. This concludes the proof of the lower bound on the average latency for FCFS scheduling.

VI. QUANTITATIVE RESULTS AND DISCUSSION

In this section, we use Monte-Carlo simulations of a heterogeneous Fork-Join system to study the impact of varying various system parameters on the average latency of different classes and the energy efficiency of the DSS. For simplicity, the number of data classes is set to 2. Data of class 1 is stored using a (n, k1) = (10, 5) MDS code. Data of class 2 is stored using a (10, k2) MDS code, where k2 is varied from 1 to 10. Arrival rates for the two classes are set as λ1 = 0.15 and λ2 = 0.5. The job size for both classes is set to 1 kilobit. Job requests of both classes are served using full redundancy (i.e., r1 = r2 = n). We set the power consumption parameters using data for the Intel Xeon family of CPUs¹⁰ and associated platform components [39]. Accordingly, we set C0 = 203.13 W, Pa = 120 W, Cl = 15 W, wl = 6 s, and Pl = 13.1 W. The CPU frequency f is set to 1 unless mentioned otherwise.
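The simulations in this section can be reproduced in spirit with a compact event-driven simulator. The sketch below is our own minimal version, not the authors' code: every arrival is forked to all n FCFS queues, a class-i job completes after $k_i$ fragment completions, and the job's remaining queued sub-tasks are discarded (a sub-task already in service is allowed to finish, a slight simplification of the model).

```python
import heapq, random

def fj_sim(n, k, lam, mu_eff, t_end=20000.0, seed=1):
    """Monte-Carlo per-class mean latency in a (n, k_1, ..., k_R)
    Fork-Join DSS with FCFS queues. mu_eff[i] = k_i*f*mu/l_i."""
    rng = random.Random(seed)
    heap, job = [], 0                  # events: (time, kind, srv, cls, job)
    for i in range(len(k)):            # Poisson arrivals per class
        t = rng.expovariate(lam[i])
        while t < t_end:
            heapq.heappush(heap, (t, 0, -1, i, job))
            job, t = job + 1, t + rng.expovariate(lam[i])
    arrive, done = {}, {}
    lat = [[] for _ in k]
    queues = [[] for _ in range(n)]    # per-server FIFO of (cls, job)
    busy = [False] * n

    def start(s, now):                 # begin next live sub-task, if any
        while queues[s]:
            i, j = queues[s].pop(0)
            if done.get(j, 0) < k[i]:  # skip sub-tasks of finished jobs
                heapq.heappush(heap, (now + rng.expovariate(mu_eff[i]), 1, s, i, j))
                busy[s] = True
                return
        busy[s] = False

    while heap:
        now, kind, s, i, j = heapq.heappop(heap)
        if kind == 0:                  # arrival: fork to all n queues
            arrive[j] = now
            for q in range(n):
                queues[q].append((i, j))
                if not busy[q]:
                    start(q, now)
        else:                          # sub-task completion at server s
            done[j] = done.get(j, 0) + 1
            if done[j] == k[i]:        # k_i-th fragment: job departs
                lat[i].append(now - arrive[j])
            start(s, now)
    return [sum(v) / len(v) for v in lat]

mu, f, l1, l2 = 1 / 6, 1.0, 1.0, 1.0   # mu per kilobit, as in the text
print(fj_sim(10, [5, 2], [0.15, 0.5], [5 * f * mu / l1, 2 * f * mu / l2]))
```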

A. Impact of fault-tolerance and service rate

The behavior of the average latency with respect to changes in the fault-tolerance parameter k is governed by two opposing factors.

• Increasing k reduces the number of servers available for serving the next job in the queue, thus resulting in an increase in latency.

• Increasing k increases the effective service rate (kµ) of each server, as each server stores a smaller fraction (l/k) of the job. This decreases the average latency.

Fig. 7 shows the average latency for jobs of class 2 versus k2 for the FCFS system with µ = 1/6 and µ = 1¹¹. The file size for both classes is equal to 1 kb. We note that the average latency increases on increasing k2. This is because µ is large enough that the increase in latency due to the first factor dominates the decrease in latency due to the second factor. We also note that the bounds are somewhat loose at high values of k2 and low values of µ. In particular, the lower bound becomes loose because, at each processing stage of the serial Fork-Join system, the difference between the actual service rate and its bound at the sth stage of processing (i.e., $(n - s + 1)\mu_i$) for jobs of class i increases with an increase in k and a decrease in µ. Similarly, the upper bound becomes worse because the service time lost due to blocking increases significantly at low µ and high k values. This is because the remaining sub-tasks are served very slowly (low µ) and the blocking continues until a large number of sub-tasks (high k) are finished. Finally, as expected, we note that the naive lower bound on the latency of class 2 is loose compared to the proposed lower bound for the FCFS system.

¹⁰We use the power consumption parameters of the "Deeper Sleep" state for our low-power state.

¹¹In our work, we set file sizes to be multiples of 1 kilobit, so µ is defined per kilobit.


[Figure] Fig. 7: Latency of a data-class increases with increase in its code-rate and decreases with increase in service rate. (Average latency of class 2 vs. k2 for µ = 1 and µ = 1/6; curves: Actual-FCFS, Upper Bound-FCFS, Lower Bound-FCFS, Naive Lower Bound-FCFS.)

[Figure] Fig. 8: Tradeoff between energy efficiency of DSS and storage space per file with variation in code-rate. (Energy efficiency (bits/Joule) and storage space per file (kb) vs. k2; (n, k1) = (10, 5), wl = 6 s, dl = 2 s, µ = 1/6, l1 = l2 = 1 kb.)

B. Impact of coding on energy efficiency and storage space

Fig. 8 illustrates the impact of varying the code rate (k/n for an (n, k) MDS code) on the latency, energy efficiency and network bandwidth of the system. At one extreme is the (n, 1) replication code with code rate 1/n, which has minimum latency (see Fig. 7) and maximum energy efficiency. This is because we just wait for any one server to finish the job. For a fixed n, low latency translates to higher data throughput and hence higher energy efficiency. However, the total storage space per file for the (n, 1) code is nl, where l is the file size. Hence the (write) network bandwidth is maximum at k = 1. At the other extreme is the (n, n) code with no fault-tolerance and no storage overhead (storage size is l), but it suffers from high latency and low energy efficiency. This is because we need to wait for all the servers to finish their sub-tasks of a job before the job is completed. Hence latency and throughput suffer, which in turn decreases energy efficiency.
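As a quick check of the storage arithmetic in this paragraph: an (n, k) MDS code stores an l/k fragment on each of the n servers, so the total storage per file is nl/k, which interpolates between nl at k = 1 and l at k = n.

```python
n, l = 10, 1.0  # number of servers, file size in kb
for k in range(1, n + 1):
    storage_per_file = n * l / k  # each of n servers holds an l/k fragment
    print(f"(n,k)=({n},{k}): code rate {k/n:.1f}, storage/file {storage_per_file:.2f} kb")
```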

C. Impact of number of servers in DSS

Fig. 9 shows the impact of increasing the number of servers (n) on the latency and energy efficiency of DSS, while all other system parameters are kept constant. We observe that for low values of n, increasing n increases the energy efficiency. This is because more servers are available to serve the job, which reduces the average latency and thus increases the throughput. The increase in throughput due to lower latency outweighs the increase in energy consumption due to higher n. Hence the

Fig. 9: Energy efficiency increases and attains a maximum as the number of servers is increased, while latency behaves in an inverse fashion. (Plot: energy efficiency of DSS and average latency of class 2 vs. number of servers n; (k1, k2) = (3, 5), (λ1, λ2) = (0.15, 0.5), µ = 1/6.)

overall effect is that energy efficiency increases. However, at high values of n, increasing n results in diminishing returns in latency and throughput. This is because the latency improvement is limited by the effective service rate (kµ/l) and not by the number of servers. At very large n, the energy consumption becomes quite significant. Therefore, the energy efficiency begins to decrease at large n. We thus conclude that there is an optimum value of n that maximizes energy efficiency and has near-minimal latency.
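The latency saturation can be seen in closed form for the service-time component alone. Assuming exponential sub-task times at rate kµ/l, the mean of the k-th order statistic of n such times is (l/(kµ))(H_n − H_{n−k}), which flattens as n grows. The snippet below evaluates this expression; queuing and the power model are omitted, so it illustrates only the diminishing latency returns, not the efficiency peak.

```python
from math import fsum

mu, k, l = 1/6, 5, 1.0  # service rate per kb, MDS parameter, file size (kb)

def mean_service_latency(n):
    # Mean k-th order statistic of n i.i.d. Exp(k*mu/l) sub-task times:
    # (l/(k*mu)) * (H_n - H_{n-k}); the harmonic tail flattens in n.
    return (l / (k * mu)) * fsum(1.0 / (n - i) for i in range(k))

for n in range(6, 21, 2):
    print(f"n={n:2d}  mean service latency = {mean_service_latency(n):.3f} s")
```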

D. Impact of general service time

In most practical DSS, the service times are not exponentially distributed but rather heavy-tailed, which means that there is a significant probability of very large service times. The Pareto distribution has been found to be a good fit for the service time distribution in practical DSS [45], [46]. Its cumulative distribution function is given by

F_S(s) =
\begin{cases}
0 & \text{for } s < s_m \\
1 - \left( \dfrac{s_m}{s} \right)^{\alpha} & \text{for } s \geq s_m
\end{cases}
\qquad (43)

Here α is the shape parameter and sm is the scale parameter. As the value of α decreases, the service times become more heavy-tailed, and the mean becomes infinite for α ≤ 1. Figures 10 and 11 show the impact of the Pareto service distribution on the latency and energy efficiency of DSS for α = 6 and α = 1.1, respectively. At α = 6, the service distribution is not very heavy-tailed, so increasing k2 reduces the latency of jobs of class 2 due to the increase in their effective service rate (k2µf/l2). However, at α = 1.1, the service time distribution becomes very heavy-tailed, so as k2 becomes large, the increase in service time due to waiting for more servers (larger k) outweighs the decrease due to the higher effective service rate. In both cases, we note that energy efficiency behaves inversely to the change in latency. For α = 1.1, we note that as k2 increases from 1 to 10, energy efficiency first increases, reaches a maximum and then decreases for large k2. We conclude that for a heavy-tailed service distribution, there exists an optimal code-rate that yields maximum energy efficiency and minimum latency.
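For simulations, service times with the CDF in (43) can be drawn by inverse-transform sampling: if U is uniform on (0, 1], then S = sm·U^(−1/α) has exactly this distribution. A minimal sketch follows; pinning sm so that the mean matches a target 1/µ (finite only for α > 1) is our own normalization choice, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def pareto_service_times(alpha, s_m, size):
    """Inverse-transform sampling of (43): for U ~ Uniform(0,1],
    S = s_m * U**(-1/alpha) satisfies P(S <= s) = 1 - (s_m/s)**alpha."""
    u = 1.0 - rng.random(size)  # uniform on (0, 1], avoids u = 0
    return s_m * u ** (-1.0 / alpha)

# Assumed normalization: match the mean service time to 1/mu, using
# E[S] = alpha*s_m/(alpha - 1), which is finite only for alpha > 1.
mu, alpha = 3.0, 6.0
s_m = (alpha - 1) / (alpha * mu)
print(pareto_service_times(alpha, s_m, 100_000).mean())  # ~1/mu = 0.333
```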

E. Impact of heavy-tailed arrival distribution

Fig. 12 illustrates the impact of a general (Pareto) inter-arrival time distribution on the latency and energy efficiency of


Fig. 10: A light-tailed general service distribution (Pareto distribution with α = 6) results in monotonically decreasing latency as a function of code-rate; energy efficiency follows an inverse behavior. (Plot: energy efficiency (bits/Joule) and average latency of class 2 (in seconds) vs. k2; (n, k1) = (10, 5), α = 6, µ = 3.)

Fig. 11: A heavy-tailed general service distribution (Pareto distribution with α = 1.1) results in a minimal-latency and maximum-energy-efficiency point as the code-rate is increased. (Plot: energy efficiency (bits/Joule) and average latency of class 2 vs. k2; (n, k1) = (10, 5), α = 1.1, µ = 3, l1 = l2 = 1 kb.)

Fig. 12: A heavy-tailed inter-arrival time distribution (Pareto distribution with α = 1.5) results in monotonically increasing latency (and monotonically decreasing energy efficiency) as the code-rate is increased. (Plot: energy efficiency (bits/Joule) and average latency of class 2 vs. k2; (n, k1) = (10, 5), α = 1.5, µ = 3.)

DSS. We observe that when the inter-arrival distribution becomes heavy-tailed, latency increases (and energy efficiency decreases) with increase in code rate. The heavy-tailed arrival distribution results in occasional very large inter-arrival times, but the average arrival rate remains the same. Since it does not significantly influence the service dynamics, the latency increases with increase in code-rate, similar to the M/M/1 case (in Fig. 7). Since latency increases, energy efficiency decreases with increase in code-rate, similar to the previous results.
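Heavy-tailed inter-arrival times at a fixed average rate can be generated the same way. The sketch below scales sm so that the mean gap stays at 1/λ2 while the tail follows α = 1.5, mirroring the setting of Fig. 12; the normalization is again our own choice.

```python
import numpy as np

rng = np.random.default_rng(2)
lam2, alpha = 0.5, 1.5  # class-2 arrival rate and Pareto shape (as in Fig. 12)

# Assumed normalization: keep the mean inter-arrival time at 1/lam2,
# using E[gap] = alpha*s_m/(alpha - 1), finite for alpha > 1.
s_m = (alpha - 1) / (alpha * lam2)
u = 1.0 - rng.random(100_000)     # uniform on (0, 1]
gaps = s_m * u ** (-1.0 / alpha)  # heavy-tailed inter-arrival gaps
arrivals = np.cumsum(gaps)        # arrival epochs at average rate lam2
print(gaps.mean())                # ~1/lam2 = 2.0
```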

Fig. 13: Sending redundant requests reduces average latency and improves energy efficiency. (Plot: average latency of class 1 and average energy efficiency (bits/Joule) vs. number of redundant requests for class 1, r1, under FCFS; (n, k1, k2) = (10, 4, 3), (λ1, λ2) = (0.8, 0.25), µ = 1/3, (l1, l2) = (10^4, 10^4).)

F. Impact of number of redundant requests

We now explore the impact of varying the number of redundant requests (i.e., sending job requests to more than k servers) on the average latency and energy efficiency of DSS. The behavior of latency is governed by two opposing factors.

• Increasing the number of redundant requests reduces the service time of each job, because more servers simultaneously process the same job. It also increases the energy efficiency, because the servers can process more requests per unit time.

• On the other hand, increasing the number of redundant requests reduces the number of servers available for serving the next job in queue, thus increasing the queue sizes at the servers. This results in a loss of throughput and hence a plausible decrease in energy efficiency.

As it turns out, the first factor dominates the second, resulting in an overall reduction in latency (and an increase in energy efficiency) as the number of redundant requests increases. This behavior can be observed in Fig. 13, which shows the average latency of class 1 and the energy efficiency of DSS for FCFS scheduling. In this figure, the redundancy for class 1, r1, is varied from 4 to 10 and the redundancy of class 2 is set to r2 = 10.
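A minimal sketch of this k-out-of-r completion rule, again for the service-time component only (queuing is omitted, so the second factor above does not appear): fork each job to r servers with exponential sub-task times at rate kµ/l, and finish when any k of the r sub-tasks complete.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, mu = 10, 4, 1/3          # class-1 values from Fig. 13
l, trials = 1.0, 100_000       # illustrative file size (kb) and sample count

# Fork each job to r servers (k <= r <= n); sub-tasks are exponential at
# rate k*mu/l and the job finishes when any k of the r sub-tasks complete.
for r in range(k, n + 1):
    subtasks = rng.exponential(l / (k * mu), size=(trials, r))
    latency = np.sort(subtasks, axis=1)[:, k - 1]  # k-th fastest of r
    print(f"r={r:2d}  mean service latency = {latency.mean():.3f} s")
```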

VII. CONCLUSIONS

In this paper, we proposed a novel multi-tenant DSS model and analyzed the energy efficiency of the system via the lens of system latency. In the proposed heterogeneous DSS, each data class can have a different job arrival rate and job size, and its data can be stored with a different fault-tolerance requirement by coding it with an appropriate (n, k) MDS code. In order to evaluate the impact of various parameters of DSS on its energy efficiency, we defined a data-throughput-based energy efficiency metric for any given scheduling policy. We analytically established that the energy efficiency of DSS is inversely related to the system latency for a special case. This motivated us to further investigate the impact of various parameters on the relationship between the latency and energy efficiency of the DSS. Therefore, using a queuing-theoretic approach, we obtained bounds on the average latency for FCFS,


preemptive and non-preemptive priority queuing policies. We verified the accuracy of the bounds for different settings of system parameters. The bounds, in general, are tight at high values of the service rate µ and low values of k. We also noted that the proposed lower bounds are tighter than a naive lower bound that follows directly from the work in [17].

Using simulations, we investigated the relationship between the average latency of data classes and the energy efficiency of DSS under various settings of system parameters. We found that increasing the code rate reduces the network bandwidth but increases latency and decreases energy efficiency. We also found that there exists an optimal number of servers that maximizes energy efficiency and results in near-minimal latency. We observed that for a heavy-tailed service distribution (which is the case for practical DSS), there exists an optimal code-rate that yields maximum energy efficiency and minimum latency. Lastly, we studied the impact of sending redundant requests on the average latency of a data class and the energy efficiency of DSS. We noted that increasing the redundancy for a data class helps reduce its average latency; as a consequence, the overall latency decreases and the energy efficiency of DSS increases.

REFERENCES

[1] A. Kumar, R. Tandon, and T. Clancy, "On the latency of heterogeneous MDS queue," in IEEE Global Communications Conference (GLOBECOM), Dec 2014, pp. 2375–2380.

[2] H. Weatherspoon and J. Kubiatowicz, "Erasure coding vs. replication: A quantitative comparison," in International Workshop on Peer-to-Peer Systems, 2002, pp. 328–338.

[3] "Colossus, successor to Google File System," July 2010. [Online]. Available: http://goo.gl/cUXcSm

[4] "Saving capacity with HDFS RAID," June 2014. [Online]. Available: http://goo.gl/P5usvs

[5] C. Huang, H. Simitci, Y. Xu et al., "Erasure coding in Windows Azure Storage," in USENIX Conference on Annual Technical Conference, 2012, pp. 2–2.

[6] "Cisco visual networking index: Global mobile data traffic forecast update, 2013–2018," Feb 2014. [Online]. Available: http://goo.gl/ULXROo

[7] Y. Sverdlik, "Global data center energy use to grow by 19% in 2012," Sep. 2011. [Online]. Available: http://goo.gl/Ck1TxB

[8] D. Harnik, D. Naor, and I. Segall, "Low power mode in cloud storage systems," in IEEE International Symposium on Parallel & Distributed Processing, May 2009, pp. 1–8.

[9] D. Colarelli and D. Grunwald, "Massive arrays of idle disks for storage archives," in Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, 2002, pp. 1–11.

[10] A. Verma, R. Koller, L. Useche, and R. Rangaswami, "SRCMap: Energy proportional storage using dynamic consolidation," in Proceedings of the 8th USENIX Conference on File and Storage Technologies, 2010, pp. 20–20.

[11] H. Jo, Y. Kwon, H. Kim, E. Seo, J. Lee, and S. Maeng, "SSD-HDD-hybrid virtual disk in consolidated environments," in Proceedings of the 2009 International Conference on Parallel Processing, 2010, pp. 375–384.

[12] D. Chen, E. Henis, R. I. Kat et al., "Usage centric green performance indicators," SIGMETRICS Perform. Eval. Rev., vol. 39, no. 3, pp. 92–96, Dec. 2011.

[13] G. Schulz, The Green and Virtual Data Centre. New York, NY: CRC/Auerbach, 2009, ch. Measurement, Metrics, and Management of IT Resources.

[14] J. Levandoski, P.-A. Larson, and R. Stoica, "Identifying hot and cold data in main-memory databases," in Proc. of IEEE International Conference on Data Engineering, April 2013, pp. 26–37.

[15] D. Gibson, "Is your data hot, warm, or cold?" 2012. [Online]. Available: http://ibmdatamag.com/2012/06/is-your-big-data-hot-warm-or-cold/

[16] R. D. Strong, "Low-latency techniques for improving system energy efficiency," Ph.D. dissertation, University of California, San Diego, 2013.

[17] G. Joshi, Y. Liu, and E. Soljanin, "On the delay-storage trade-off in content download from coded distributed storage systems," IEEE Journal on Selected Areas in Communications, May 2014.

[18] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in IEEE Symposium on Mass Storage Systems and Technologies, 2010, pp. 1–10.

[19] M. Conway, "A multiprocessor system design," in AFIPS Fall Joint Computing Conference, 1963, pp. 139–146.

[20] E. W. Dijkstra, "Cooperating sequential processes," in Programming Languages, 1968, pp. 43–112.

[21] L. Huang, S. Pawar, H. Zhang, and K. Ramchandran, "Codes can reduce queueing delay in data centers," in Information Theory Proceedings (ISIT), 2012, pp. 2766–2770.

[22] N. B. Shah, K. Lee, and K. Ramchandran, "The MDS queue," arXiv, vol. abs/1211.5405, 2012.

[23] N. Shah, K. Lee, and K. Ramchandran, "When do redundant requests reduce latency?" in Allerton, Oct 2013, pp. 731–738.

[24] "How AWS pricing works," July 2014. [Online]. Available: http://media.amazonwebservices.com/AWS Pricing Overview.pdf

[25] "Google Cloud Storage - pricing." [Online]. Available: https://cloud.google.com/storage/docs/storage-classes

[26] J. Plank, "Erasure codes for storage systems," Dec 2013. [Online]. Available: https://www.usenix.org/system/files/login/articles/10 plank-online.pdf

[27] N. Cao, S. Yu, Z. Yang, W. Lou, and Y. Hou, "LT codes-based secure and reliable cloud storage service," in IEEE INFOCOM, March 2012, pp. 693–701.

[28] S. Aly, Z. Kong, and E. Soljanin, "Raptor codes based distributed storage algorithms for wireless sensor networks," in IEEE International Symposium on Information Theory, July 2008, pp. 2051–2055.

[29] S. Chen, Y. Sun, U. Kozat, L. Huang, P. Sinha, G. Liang, X. Liu, and N. Shroff, "When queueing meets coding: Optimal-latency data retrieving scheme in storage clouds," in IEEE INFOCOM, April 2014, pp. 1042–1050.

[30] G. Liang and U. Kozat, "Use of erasure code for low latency cloud storage," in Allerton, Sept 2014, pp. 576–581.

[31] ——, "Fast cloud: Pushing the envelope on delay performance of cloud storage with coding," IEEE/ACM Transactions on Networking, vol. 22, no. 6, pp. 2012–2025, Dec 2014.

[32] ——, "TOFEC: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes," in Proc. of IEEE INFOCOM, April 2014, pp. 826–834.

[33] Y. Xiang, T. Lan, V. Aggarwal, and Y. F. R. Chen, "Joint latency and cost optimization for erasure-coded data center storage," SIGMETRICS Perform. Eval. Rev., vol. 42, no. 2, pp. 3–14, Sep. 2014.

[34] L. Barroso and U. Holzle, "The case for energy-proportional computing," Computer, vol. 40, no. 12, pp. 33–37, Dec 2007.

[35] D. Snowdon, S. Ruocco, and G. Heiser, "Power management and dynamic voltage scaling: Myths and facts," in Proc. of Workshop on Power Aware Real-time Computing, Sep 2005.

[36] L. L. Andrew, M. Lin, and A. Wierman, "Optimality, fairness, and robustness in speed scaling designs," SIGMETRICS Perform. Eval. Rev., vol. 38, no. 1, pp. 37–48, Jun. 2010.

[37] D. Meisner, B. T. Gold, and T. F. Wenisch, "PowerNap: Eliminating server idle power," SIGARCH Comput. Archit. News, vol. 37, no. 1, pp. 205–216, Mar. 2009.

[38] Y. Liu, S. Draper, and N. S. Kim, "Queuing theoretic analysis of power-performance tradeoff in power-efficient computing," in Conference on Information Sciences and Systems (CISS), March 2013, pp. 1–6.

[39] ——, "SleepScale: Runtime joint speed scaling and sleep states management for power efficient data centers," in IEEE International Symposium on Computer Architecture, June 2014, pp. 313–324.

[40] D. Bertsekas and R. Gallager, Data Networks (2nd Ed.). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1992, ch. Delay Models in Data Networks, pp. 203–206.

[41] H. C. Tijms, A First Course in Stochastic Models. Wiley, 2003.

[42] E. Varki, A. Merchant, and H. Chen, "The M/M/1 fork-join queue with variable sub-tasks." [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.3062

[43] S. Ross, A First Course in Probability. Prentice-Hall Inc., 2002.


[44] A. Kumar, R. Tandon, and T. C. Clancy, "On the latency and energy efficiency of erasure-coded cloud storage systems," arXiv, vol. abs/1405.2833v2, 2015. [Online]. Available: http://arxiv.org/abs/1405.2833v2

[45] M. Crovella and A. Bestavros, "Self-similarity in World Wide Web traffic: Evidence and possible causes," IEEE/ACM Transactions on Networking, vol. 5, no. 6, pp. 835–846, Dec 1997.

[46] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the Internet topology," SIGCOMM Comput. Commun. Rev., vol. 29, no. 4, pp. 251–262, Aug 1999.

Akshay Kumar (S'11) received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Guwahati (IIT Guwahati) in May 2010. He received his M.S. in Electrical Engineering from Virginia Tech in Nov. 2012. He is currently pursuing his Ph.D. in the Bradley Department of Electrical and Computer Engineering, Virginia Tech. His research interests include modeling and analysis of distributed storage systems.

Ravi Tandon (S'03-M'09) received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Kanpur (IIT Kanpur) in May 2004. He received the Ph.D. degree in Electrical and Computer Engineering from the University of Maryland, College Park (UMCP) in June 2010. From July 2010 till July 2012, he was a post-doctoral research associate at Princeton University. Since July 2012, he has been with Virginia Tech, where he is currently a Research Assistant Professor in the Discovery Analytics Center and the Department of Computer Science. His research interests are in the areas of network information theory for wireless networks, information-theoretic security, machine learning and cloud storage systems. Dr. Tandon is a recipient of the Best Paper Award at the Communication Theory Symposium at the 2011 IEEE Global Communications Conference.

T. Charles Clancy (S'02-M'06-SM'10) is an Associate Professor of Electrical and Computer Engineering at Virginia Tech and Director of the Hume Center for National Security and Technology. Prior to joining Virginia Tech in 2010, he served as a senior researcher at the Laboratory for Telecommunications Sciences, a defense research lab at the University of Maryland, where he led research programs in software-defined and cognitive radio. Dr. Clancy received his B.S. in Computer Engineering from the Rose-Hulman Institute of Technology, M.S. in Electrical Engineering from the University of Illinois, and his Ph.D. in Computer Science from the University of Maryland. He is a Senior Member of the IEEE and has over 100 peer-reviewed technical publications. His current research interests include cognitive communications and spectrum security.
