Top Banner
Received August 6, 2018, accepted September 19, 2018, date of publication September 24, 2018, date of current version October 17, 2018. Digital Object Identifier 10.1109/ACCESS.2018.2871821 Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real-Time Tasks in Cloud-Based 5G Networks PENGZE GUO 1 , MING LIU 1,2 , JUN WU 1 , (Member, IEEE), ZHI XUE 1 , AND XIANGJIAN HE 2 , (Senior Member, IEEE) 1 Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China 2 School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia Corresponding author: Zhi Xue ([email protected]) This work was supported by the National Natural Science Foundation of China under Grant 61332010. ABSTRACT Green computing has become a hot issue for both academia and industry. The fifth- generation (5G) mobile networks put forward a high request for energy efficiency and low latency. The cloud radio access network provides efficient resource use, high performance, and high availability for 5G systems. However, hardware and software faults of cloud systems may lead to failure in providing real-time services. Developing fault tolerance technique can efficiently enhance the reliability and availability of real-time cloud services. The core idea of fault-tolerant scheduling algorithm is introducing redundancy to ensure that the tasks can be finished in the case of permanent or transient system failure. Nevertheless, the redundancy incurs extra overhead for cloud systems, which results in considerable energy consumption. In this paper, we focus on the problem of how to reduce the energy consumption when providing fault tolerance. We first propose a novel primary-backup-based fault-tolerant scheduling architecture for real-time tasks in the cloud environment. Based on the architecture, we present an energy-efficient fault-tolerant scheduling algorithm for real-time tasks (EFTR). EFTR adopts a proactive strategy to increase the system processing capacity and employs a rearrangement mechanism to improve the resource utilization. Simulation experiments are conducted on the CloudSim platform to evaluate the feasibility and effectiveness of EFTR. Compared with the existing fault-tolerant scheduling algorithms, EFTR shows excellent performance in energy conservation and task schedulability. INDEX TERMS Energy efficiency, fault tolerance, real-time, scheduling, cloud, 5G. I. INTRODUCTION The increasing requirements of communication quality have promoted the evolution of mobile communication technolo- gies. 5G networks are expected to provide ubiquitous connec- tivity and real-time interaction. It is forecasted that there will be more than 50 billion connected devices in 2020 [1]. The mobile communication data in 2020 will be approximately 1000 times more than that in 2010 [2]. To achieve similar energy consumption, the energy efficiency (usually mea- sured in bits/Joule) should be increased by 100x times [3]. According to the statistics, the power consumption of tra- ditional base stations (BSs) account for 72% of the total power consumption of radio access networks (RAN), but the energy efficiency of BS is only about 50% [4]. BSs usually have excessive computing capacity to deal with the traffic during peak hours. However, the resource utilization is low during off-peak hours. Besides, the ancillary equipments in distributed BSs consume large amount of energy. Spurred by both economic and environmental considerations, green computing has become a research priority in the design of information systems [5]–[7]. Cloud radio access network (C-RAN), which centralizes the baseband processing into cloud data centers, is a candi- date solution for 5G [8]. In C-RAN, baseband unit (BBU) pool is responsible for the signal processing, and remote radio heads (RRHs) are distributed to receive and transmit the data from/to user equipments (UEs). BBU and RRHs are connected via high-speed optical fronthaul. Centralized signal processing considerably reduces the energy consump- tuion of cooling devices and other supporting equipments VOLUME 6, 2018 2169-3536 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 53671
13

Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

Jun 24, 2021

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

Received August 6, 2018, accepted September 19, 2018, date of publication September 24, 2018, date of current version October 17, 2018.

Digital Object Identifier 10.1109/ACCESS.2018.2871821

Energy-Efficient Fault-Tolerant SchedulingAlgorithm for Real-Time Tasks inCloud-Based 5G NetworksPENGZE GUO 1, MING LIU1,2, JUN WU 1, (Member, IEEE), ZHI XUE1,AND XIANGJIAN HE 2, (Senior Member, IEEE)1Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Department of Electronic Engineering,Shanghai Jiao Tong University, Shanghai 200240, China2School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia

Corresponding author: Zhi Xue ([email protected])

This work was supported by the National Natural Science Foundation of China under Grant 61332010.

ABSTRACT Green computing has become a hot issue for both academia and industry. The fifth-generation (5G) mobile networks put forward a high request for energy efficiency and low latency. The cloudradio access network provides efficient resource use, high performance, and high availability for 5G systems.However, hardware and software faults of cloud systems may lead to failure in providing real-time services.Developing fault tolerance technique can efficiently enhance the reliability and availability of real-time cloudservices. The core idea of fault-tolerant scheduling algorithm is introducing redundancy to ensure that thetasks can be finished in the case of permanent or transient system failure. Nevertheless, the redundancyincurs extra overhead for cloud systems, which results in considerable energy consumption. In this paper,we focus on the problem of how to reduce the energy consumption when providing fault tolerance. We firstpropose a novel primary-backup-based fault-tolerant scheduling architecture for real-time tasks in the cloudenvironment. Based on the architecture, we present an energy-efficient fault-tolerant scheduling algorithmfor real-time tasks (EFTR). EFTR adopts a proactive strategy to increase the system processing capacityand employs a rearrangement mechanism to improve the resource utilization. Simulation experiments areconducted on the CloudSim platform to evaluate the feasibility and effectiveness of EFTR. Compared withthe existing fault-tolerant scheduling algorithms, EFTR shows excellent performance in energy conservationand task schedulability.

INDEX TERMS Energy efficiency, fault tolerance, real-time, scheduling, cloud, 5G.

I. INTRODUCTIONThe increasing requirements of communication quality havepromoted the evolution of mobile communication technolo-gies. 5G networks are expected to provide ubiquitous connec-tivity and real-time interaction. It is forecasted that there willbe more than 50 billion connected devices in 2020 [1]. Themobile communication data in 2020 will be approximately1000 times more than that in 2010 [2]. To achieve similarenergy consumption, the energy efficiency (usually mea-sured in bits/Joule) should be increased by 100x times [3].According to the statistics, the power consumption of tra-ditional base stations (BSs) account for 72% of the totalpower consumption of radio access networks (RAN), but theenergy efficiency of BS is only about 50% [4]. BSs usuallyhave excessive computing capacity to deal with the traffic

during peak hours. However, the resource utilization is lowduring off-peak hours. Besides, the ancillary equipments indistributed BSs consume large amount of energy. Spurredby both economic and environmental considerations, greencomputing has become a research priority in the design ofinformation systems [5]–[7].

Cloud radio access network (C-RAN), which centralizesthe baseband processing into cloud data centers, is a candi-date solution for 5G [8]. In C-RAN, baseband unit (BBU)pool is responsible for the signal processing, and remoteradio heads (RRHs) are distributed to receive and transmitthe data from/to user equipments (UEs). BBU and RRHsare connected via high-speed optical fronthaul. Centralizedsignal processing considerably reduces the energy consump-tuion of cooling devices and other supporting equipments

VOLUME 6, 20182169-3536 2018 IEEE. Translations and content mining are permitted for academic research only.

Personal use is also permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

53671

Page 2: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

in BSs. Scalable computing can dynamically adjust the sys-tem resources according to the workload demands in thecloud environment [9]. Centralized deployment can improvethe BS utilization by sharing resources under dynamic trafficload.

As a new paradigm of delivering computing services,cloud computing has the features of dynamic scalabil-ity, measured service and on-demand resource provisioning[10]–[12]. These features largely depend on virtualization.With virtualization, physical hosts can be divided into severalvirtual machines (VMs) [13]. As the mainstay of computingresources, cloud-based BBU pool takes charge of most taskprocessing. It is of paramount importance to improve theenergy efficiency of BBU servers. The elasticity of cloudcomputing raises challenges for C-RAN to increase theresource utilization. In addition, with the tremendous increaseof network traffic, it is another tricky problem of how to meetthe deadline constraints of real-time user requests.

Nowadays more and more real-time services are realizedthrough wireless communication systems, e.g., Internet ofThings (IoT), vehicular networks [14], and video stream-ing [15]. The timeliness of services should be guaranteed.In real-time systems, the computational results should beproduced not only correctly but also timely [16]–[18]. Theconsequences of missing deadlines are different for differentreal-time systems [19]. For hard real-time systems, missingdeadlines can result in catastrophe consequence. While forsoft real-time systems, violation of time constraints usuallyresults in service quality degradation, but the system cancontinue running [20].

In large-scale cloud data centers, node failures are com-mon [21]. Therefore, fault tolerance is a mandatory mecha-nism of wireless networks. Since one computing instance’sfailure may cause some tasks to violate the deadline con-straints, C-RAN should ensure the timeliness of real-timetasks even in case of failure. Fault-tolerant scheduling is aneffective method to increase the system reliability. Primary-backup (PB) model is a popular fault-tolerant scheme. In thePBmodel, each task is duplicated and the two copies (i.e., pri-mary copy and backup copy) are sent to different computingnodes for fault tolerance. Fundamentally, PB model utilizesthe redundancy technology to improve the reliability of thesystem [22].

To the best of our knowledge, no previous work has beendone on dynamic energy-efficient fault-tolerant schedulingfor real-time tasks in cloud-based 5G networks. In this paper,the UEs’ tasks that we concern are independent, aperiodic,and non-preemptive. Both energy conservation and faulttolerance is considered while meeting the real-time require-ments. We first analyze the schedulability of real-time tasksand then try to reduce the energy consumption. In addition,we sufficiently consider the dynamics and elasticity of cloudcomputing, e.g., VM migration and VM creation. When thecurrent system cannot guarantee the timing requirements,new VMs are added. Proactive strategy is adopted to selectproper new VMs. Simulation results show that proactive

strategy and rearrangement mechanism bring substantialimprovement in energy efficiency and tasks guarantee ratio.

The rest of this paper is organized as follows. Related workis described in Section II. Section III presents the systemmodel, including architecture framework, power model, andVM migration. Scheduling criteria are also analyzed here.Section IV describes the EFTR algorithm in detail. Section Vdemonstrates the experiments to evaluate the performance ofEFTR. Finally, we make conclusions in Section VI.

II. RELATED WORKEnergy-efficient techniques of wireless networks have beenstudied from the aspects of mobile devices, communica-tion infrastructures, and cloud data centers. Many researchesare focused on the former two cases [23]–[26]. This workfocuses on energy saving in cloud data centers through taskscheduling.

Since finding the optimal allocation of tasks in unipro-cessor and multiprocessor systems is an NP-complete prob-lem [27], many varieties of heuristics for scheduling taskshave been devised. For scheduling preemptive periodictasks in uniprocessor systems, Liu and Layland [28] pro-posed the Rate-Monotonic (RM) algorithm, which priori-tizes tasks in proportion to their frequency and is proved tobe the optimal fixed-priority algorithm. To precisely judgethe schedulability of tasks with priorities on uniprocessor,Joseph and Pandya [29] put forward the sufficient and nec-essary condition, called the Completion Time Test (CTT).Rate-Monotonic First-Fit (RMFF), which extends the RMalgorithm from uniprocessor to multiprocessor with first-fitbin-packing heuristic, was designed by Dhall and Liu [30].Some works have been done on task scheduling in C-RAN.Xia et al. [7] proposed an iterative coordinate descent algo-rithm to find the scheduling solution for minimizing thenetwork power consumption of downlink C-RAN. Wangand Cen [31] proposed a real-time scheduling algorithm forperiodic preemptive tasks in C-RAN. Zhang et al. [32] putforward the near-far C-RAN (NFC-RAN) architecture com-posed of near edge computing (NEC) and far edge computing(FEC). Task assignment between NEC and FEC is elaboratedto increase the task completion rate. However, fault tolerancehas not been studied in these algorithms.

Bertossi et al. [33] put forward a multiprocessor-based fault-tolerant algorithm FTRMFF using PB model.The FTRMFF algorithm considers both backup overbook-ing and deallocation [34] to reduce system overhead.Guo and Xue [35] proposed the QFTRMFF algorithm, whichstrives to optimize the QoS levels of tasks after the FTRMFFscheduling. Unfortunately, these works are designed forhomogenous systems and not suitable for heterogeneoussystems. The computing resources in C-RAN have vari-ous processing capabilities [36]. In addition, the tasks inabove works are preemptive, i.e., a task can take the placeof another executing task if their execution time overlaps.However, we consider non-preemptive tasks in this paper.Besides, some scheduling algorithms consider tasks with

53672 VOLUME 6, 2018

Page 3: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

FIGURE 1. Structure of C-RAN.

inter-dependence [37]–[39], but we focus on independenttasks. Dependent tasks can be transformed to independenttasks by setting new start times and deadlines [40].

The aforementioned methods fall into the category of staticscheduling. 5G systems allow users to transmit and receivedata in a timely manner and require on-line scheduling[41]–[43]. Such dynamic processing cases raise higherdemands on scheduling since tasks are independent and nopriori knowledge about the upcoming tasks is given [44].Luo et al. [45] proposedDYFARS,which leverages PBmodelto provide fault tolerance and enhances the reliability withoutadditional costs. Zhu et al. [46] proposed a QoS-aware fault-tolerant scheduling algorithm (QAFT) to increase the QoSlevels of real-time tasks. QAFT reduces system overheadby advancing primary copies and delaying backup copies.However, above fault-tolerant algorithms do not considervirtualization, which is the fundamental technique of cloudcomputing.

Recently, Wang et al. [47] put forward a fault-tolerantelastic scheduling algorithm for real-time tasks in clouds

called FESTAL. FESTAL takes virtualization into account,and uses backup overlapping to realize high system utiliza-tion. However, the FESTAL algorithm fails to take energysaving into account. Nonetheless, it provides a generalmethod for task scheduling in the fault-tolerant context.

III. SCHEDULING MODELA. ARCHITECTURE FRAMEWORKC-RAN is composed of BBU pool, RRHs, and fronthaullinks, as shown in Fig. 1. After receiving requests from UEs,RRHs send pre-processed baseband signals to BBU pool forfurther processing. Each RRH is connected with a BBU poolvia fronthaul link. BBU pool takes over most of the signalprocessing previously done in BSs. BBU pool consists ofcomputing servers or physical hosts. Different from tradi-tional distributed system, UEs’ tasks are executed by VMsrather than by physical hosts. Each host contains several VMswhich are responsible for executing tasks.

Fig. 2 shows the architecture framework of fault-tolerantscheduling in the BBU pool. Multiple users submit their tasksto the system. When a new task arrives, firstly it is sent toGlobal Scheduler. After analyzing the information gatheredfrom all computing nodes, Global Scheduler makes decisionsaccording to the scheduling algorithm and sends the primaryand backup copies of the task to different VMs. Then theprimary copy is executed if theVM is idle, or waits in the localqueue if the VM is busy. When the primary copy is finishedsuccessfully, the backup copy is deleted and the resourceoccupied by the backup copy is reclaimed. Local Scheduleris in charge of rearranging the order of the local queueif any backup copy is deallocated from the VM. ResourceManager decides how VMs should be added or migrated ifthe current processing capacity is unable to meet the timingrequirements.

The power consumption of C-RAN is typically composedof three parts: RRH power consumption, fronthaul power

FIGURE 2. Fault-tolerant scheduling architecture of BBU pool.

VOLUME 6, 2018 53673

Page 4: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

consumption, and BBU pool power consumption [48]. Thiswork focuses on energy saving in virtualized BBU poolthrough task scheduling.

B. SCHEDULING CRITERIAIn the BBU pool, each task from UE is sent to a physical hostand executed by aVMon the host. The k-th VMon j-th host hjis denoted by vmjk .Cj andCjk , which aremeasured byMillionInstructions per Second (MIPS), denote the processing capac-ity of hj and vmjk respectively. In this paper, primary-backupmodel is employed to realize fault tolerance. In this model,each task ti has two copies: primary copy tPi and backupcopy tBi . The VMs that accommodate tPi and tBi are denotedby vm(tPi ) and vm(t

Bi ) respectively. t

Pi is executed before tBi .

When tPi is finished, the task is executed successfully andtBi is removed from vm(tBi ). Task ti arrives at ai and mustfinish before its deadline di. li is the length of the task and ismeasured byMillion Instructions (MI). The execution time oftask ti on vmjk is denoted by ejk (ti) = li/Cjk . sPi and f

Pi denote

the start time and finish time of tPi . sBi and f Bi denote the start

time and finish time of tBi . The start and finish times of copiesare decided by the scheduling algorithm. The backup copyhas two statuses: active and passive. A passive backup copy isexecuted onlywhen the system encounters failure, whereas anactive backup copy is always executed even without systemfaults. The status of tBi is decided by:

st(tBi ) =

{active, if f Pi > sBi ;passive, otherwise.

(1)

Fig. 3 gives an example. The horizontal axis representstime. tB1 and tB2 adopt passive backup scheme while tB3 adoptsactive backup scheme because f P1 < sB1 , f

P2 < sB2 and

f P3 > sB3 . The active backup copy tB3 is composed of redundant

part tBR3 (shaded area) and non-redundant part tBN3 (unshadedarea). The redundant part tBR3 overlaps with the primarycopy tP3 in the time axis.

FIGURE 3. Illustration of primary and backup copies.

It is assumed that if the host host(tPi ) that accommo-dates tPi encounters failure, all VMs on the host, includingvm(tPi ), break down. The fault-tolerant scheduling algorithmshould guarantee that if a primary copy fails, its corespondingbackup copy can still finish before deadlines. The proposedalgorithm can tolerate at most one fault at any point of time.If tolerating multiple faults at one time instant is required,we can divide the hosts into several isolated groups, and applythe algorithm to each group [49].

PB approach is accomplished by introducing redundancy,i.e., the computing resource occupied by backup copies.Except for resource reclaiming, backup-backup (BB) over-lapping [33] is adopted in this work to reduce system over-head. BB overlapping allows backup copies on the sameVM to share the same time interval, thus reducing the VMsneeded. Fig. 3 illustrates BB overlapping between tB1 and tB2 .tB1 and tB2 are both passive backup copies. If host h1 fails, tB1is invoked. There is no conflict between tB1 and tB2 becausewhen tP2 finishes at f2, tB2 is deallocated.The overlapping criteria and scheduling principles are ana-

lyzed below. It is assumed that ti is a newly arrived task. Allother copies have been scheduled before its arrival.

1) PRIMARY COPYThe only criterion for scheduling primary copy is that nooverlapping is allowed. Because if two primary copies over-lap, there must be execution time conflict; if a primary copyoverlaps with a backup copy tBj , they also have conflict whenhost(tPj ) fails and t

Bj gets invoked.

2) BACKUP COPYTheorem 1: Backup copy cannot be scheduled on the

host where the corresponding primary copy is located,i.e., host(tBi ) 6= host(tPi ).

Proof: Prove by contradiction. Suppose that host(tBi ) =host(tPi ). Regardless of whether vm(tBi ) = vm(tPi ) or not,if host(tPi ) fails, both vm(t

Pi ) and vm(t

Bi ) break down. Thus,

tBi cannot be invoked and fault tolerance is infeasible. �Theorem 2: Backup copy cannot overlap any primary

copy.Proof: Prove by contradiction. Suppose that tBi overlaps

a primary copy tPj . When host(tPi ) fails, tBi must execute.No matter when the failure happens, tBi is bound to executethe entire task, thus incurring time conflict between tBi and tPj .Therefore, tBi cannot overlap tPj . �Theorem 3: Backup copy cannot overlap backup copy if

their primary copies are on the same host, i.e., tBi cannotoverlap tBj if host(t

Pi ) = host(tPj ).

Proof: Prove by contradiction. Suppose that tBi overlapstBj and host(tPi ) = host(tPj ). When host(tPi ) fails, both tPiand tPj fail. tBi and tBj must execute, and execution timeconflict between tBi and tBj is inevitable. Therefore, tBi cannotoverlap tBj . �Theorem 4: Redundant part of active backup copy cannot

overlap any backup copy, i.e., tBRi cannot overlap tBj .

53674 VOLUME 6, 2018

Page 5: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

Proof: Prove by contradiction. Suppose that tBRi over-laps tBj . When host(tPj ) fails, t

Bj is invoked. But tBRi always

executes, thus resulting in time conflict between tBRi and tBj .Therefore, tBRi cannot overlap tBj . �Theorem 5: Active backup copies cannot overlap with

each other.Proof: Prove by contradiction. Suppose that tBi over-

laps tBj , and both tBi and tBj are active. Then there must beoverlapping between tBRi and tBj , or between t

Bi and tBRj , which

violates Theorem 4. Therefore, tBRi cannot overlap tBj . �

C. POWER MODELSPECpower benchmark [50] measures the power and per-formance characteristics of server-class computer equipment.For most servers, the data reflects linear relationship betweenpower consumption and processor utilization. Therefore,the power model raised by Beloglazov et al. [51] is adopted inthis paper. The power consumption is defined in the followingequation:

P(t) = α · Pmax + (1− α) · Pmax · u(t), (2)

where Pmax is the power consumed when the server is 100%loaded; α is the fraction of power consumed when the serveris idle; and u(t) is the CPU utilization which varies with timedue to different workloads.

Suppose there are N VMs on host hj, the CPU utilizationof hj is:

u(t) =N∑k=1

Cjk (t)Cj

, (3)

where Cjk (t) equals Cjk if vmjk is busy, or 0 if vmjk is idle.The idle power α · Pmax includes the power consumed

by disk, memory, network interface, etc. For example, forPowerEdge R710 (Intel Xeon X5570, 16 cores × 2.93 GHz,8 GB), the idle power is 65 W and 100% active power is220 W. The fraction of power consumed when the server isidle is 29.55%. We analyzed the data on SPECpower bench-mark and calculated the ratios between idle power and 100%active power (i.e., the power consumption when the serveris idle divided by the power consumption when the server isfully utilized). α is set to the average ratio 30% in this paper.Thus, the energy consumption for each host over a period oftime is defined as:

E =∫ t2

t1P(t)dt = 0.3Pmax(t2 − t1)+ 0.7Pmax

∫ t2

t1u(t)dt.

(4)

The energy consumption of vmjk to execute primary copytPi is:

Ejk (tPi ) = 0.7PmaxCjkCj

liCjk= 0.7Pmax

liCj. (5)

The energy consumption of vmjk to execute backup copytBi is:

Ejk (tBi )=

0.7PmaxCjkCj

(f Pi − sBi ), if st(tBi ) = active;

0, otherwise.(6)

D. VIRTUAL MACHINE MIGRATIONLive migration of virtual machines refers to the technique ofmoving running VMs between physical hosts with negligibledowntime [52]. An important motivation of VM migration isto consolidate the computing resource, thus increasing systemutilization. The existing algorithms for dynamic VM con-solidation which aim to reduce energy consumption rarelyconsider the cost of live migration.

It is assumed that the data of VMs is stored on networkattached storage (NAS) and only memory migration is con-sidered in this paper. Pre-copy [53] is a widely used approachfor VMmigration. In this approach, all memory is transmittedfrom the source host to the destination host at the first stage,and dirty pages of memory are iteratively transferred until thememory dirtying rate exceeds a threshold or the remainingdirty memory is small enough. The performance of VM livemigrationmainly depends on thememory size, memory dirty-ing rate, and network transmission rate. According to [54],the total network traffic (Megabyte) of migration is:

Vmig =n∑i=0

Vi =n∑i=0

Vmemλi = Vmem1− λn+1

1− λ, (7)

where n is the total number of iterations, Vi is the datatransferred at each round, Vmem is the size of VM mem-ory, and λ is the ratio of memory dirtying rate to networktransmission rate. In this paper, Vmig is set to the typicalvalue 1.3Vmem.Some works have been done on the cost of VM migration

[54]–[56]. The results show that the energy consumptionindicates linear relationship with the data volume transferred.Liu et al. [54] found that the energy consumption of livemigration is largely independent of the data transmissionrate in a wired network. They concluded that the energyconsumption increases linearly with the network traffic ofVM migration and gave the energy cost (Joule) model:

Emig = 0.512Vmig + 20.165. (8)

In order to meet the requirement of fault tolerance, thereare some constraints of VM migration [47]. We briefly listthe constraints here:

• VM migration should not cause any task’s primary andbackup copies to be located on the same host;

• VM migration should not cause two primary copiesto be located on the same host if their backup copiesoverlap.

Actually, these constraints are essentially identical to the firstand third restricted conditions for backup copies.

VOLUME 6, 2018 53675

Page 6: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

FIGURE 4. Illustration of backup schedulability test.

IV. ENERGY-EFFICIENT FAULT-TOLERANTSCHEDULING ALGORITHMA. SCHEDULABILITY TESTSuppose that ti is a newly arrived task. Before scheduling,we need to perform the schedulability test to check whethertPi and tBi are schedulable on some VMs.To check schedulability of tPi on vmjk , we need to permute

the primary and backup copies on vmjk with the increasing oftheir start times. Suppose N primary and backup copies areassigned on vmjk and the time slots occupied by them withininterval [ai, di] are: [s1, f1], [s2, f2], · · · , [sN , fN ]. sn and fnare the start and finish times of the n-th copy (1 ≤ n ≤ N ),and ai ≤ s1 ≤ s2 ≤ · · · ≤ sN < fN ≤ di. Sinceprimary copy should not overlap any copy, these time slotscannot be utilized by tPi . For the purpose of computationalcompleteness, we extend the time slots to [s0, f0], [s1, f1], [s2,f2], · · · , [sN , fN ], [sN+1, fN+1] where s0 = −∞, f0 = ai,sN+1 = di and fN+1 = +∞. We need to scan these time slotsfrom left to right to locate the minimum index n (0 ≤ n ≤ N )satisfying the following condition:

sn+1 − fn ≥ ejk (ti). (9)

If such index n exists, then tPi is schedulable on vmjk and theearliest finish time of tPi on vmjk is fn + ejk (ti).

To check schedulability of tBi , BB overlapping should beconsidered. The unavailable time slots for tBi include all partsof primary copies, redundant parts of active backup copies,and backup parts located before f Pi . Fig. 4 gives an exampleabout schedulability test for backup copy tBi . The schedula-bility of tBi is checked on vm21. Suppose that tPi is assignedto vm11 and four backup copies and one primary copy arelocated on vm21. tB1 and tB4 are passive and tB2 and tB3 are active.The shaded areas are not available for tBi because redundantparts of active backup copies cannot overlap any copy. Thebackup parts located before f Pi (i.e., [sB1 , f

B1 ] and [sB2 , f

Pi ]) are

not available for tBi either. Because if tBi occupies these timeslots, its status must be active, incurring tBRi overlaps tB1 andtB2 , which violates the scheduling principle. Besides, the timeslot occupied by tB4 is available because it is passive. Thearea occupied by tP5 is unavailable because no overlappingis allowed for primary copy. So the unavailable time slots fortBi on vm21 are [sB1 , f

B1 ], [sB2 , f

Pi ], [s

B3 , f

P3 ] and [sP5 , f

B5 ].

Without loss of generality, suppose the unavailable timeslots for tBi on vmjk within interval [sPi , di] are [s1, f1], [s2,f2], · · · , [sN , fN ] and sPi ≤ s1 ≤ s2 ≤ · · · ≤ sN < fN ≤ di.For the purpose of computational completeness, we extendthe time slots to [s0, f0], [s1, f1], [s2, f2], · · · , [sN , fN ], [sN+1,fN+1] where s0 = −∞, f0 = sPi , sN+1 = di and fN+1 = +∞.To find the latest start time of tBi , these time slots should bescanned from right to left to seek the maximum index n (0 ≤n ≤ N ) satisfying condition (9). If such index n exists, then tBiis schedulable on vmjk and the latest start time of tBi on vmjkis sn+1 − ejk (ti).We summarize the above process into function

Schedulability(t∗i , vmjk ). In short, function Schedulability(tPi , vmjk ) returns the earliest finish time of tPi if tPi isschedulable on vmjk , and returns +∞ otherwise. FunctionSchedulability(tBi , vmjk ) returns the latest start time of tBi iftBi is schedulable on vmjk , and returns −∞ otherwise.

B. REARRANGEMENT MECHANISMIn this work, we take the idea in [57] to adjust the executionsequence of copies. In [57], the backup copy is deallocatedafter its corresponding primary copy finishes and the idletime slot left by the backup copy can be utilized by theprimary copies located on the same VM. This rearrangementmechanism helps advance the start time of primary copies andreduce the redundant parts of active backup copies. In orderto further reduce the system overhead introduced by backupcopies, we make improvements on the mechanism by rear-ranging the backup copies. The pseudocode of rearrangementis shown in Algorithm 1. When a primary copy finishes andthe corresponding backup copy is deleted, the rearrangementprocess gets invoked on the VMwhich deallocates the backupcopy. Firstly, all primary copies waiting on the VM arechecked if they can move forward (see lines 1-13). Then allbackup copies are checked if they can move backward (seelines 14-26). It should be noted that such change of executionorder does not violate the scheduling constraints listed inSection III.An example of rearrangement is shown in Fig. 5. When tP1

finishes at f P1 , tB1 is deallocated from vm21. The idle time inter-val left by tB1 is utilized by tP3 . After checking schedulability,tP3 moves forward. Besides, the status of tB3 becomes passive

53676 VOLUME 6, 2018

Page 7: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

Algorithm 1 Pseudocode of Rearrangement

1 foreach tPi on vmjk do2 startTimeOrigin← sPi ;3 finishTimeOrigin← f Pi ;4 Deallocate tPi from vmjk ;5 finishTimeNew← Schedulability(tPi , vmjk );6 Allocate tPi on vmjk ;7 if finishTimeNew < finishTimeOrigin then8 sPi ← finishTimeNew− ejk (ti);9 f Pi ← finishTimeNew;

10 Update the status of tBi according to (1);11 else12 sPi ← startTimeOrigin;13 f Pi ← finishTimeOrigin;

14 foreach tBi on vmjk do15 startTimeOrigin← sBi ;16 finishTimeOrigin← f Bi ;17 Deallocate tBi from vmjk ;18 startTimeNew← Schedulability(tBi , vmjk );19 Allocate tBi on vmjk ;20 if startTimeNew > startTimeOrigin then21 sPi ← startTimeNew;22 f Pi ← startTimeNew+ ejk (ti);23 Update the status of tBi according to (1);24 else25 sPi ← startTimeOrigin;26 f Pi ← finishTimeOrigin;

due to the earlier finish time of tP3 . Then tB2 moves backward

utilizing the time interval left by tP3 . This mechanism bringstwo benefits. Firstly, primary copies get the chance to fin-ish earlier. Without the arrangement mechanism, tP3 cannotstart execute before f B2 . Secondly, the energy consumption ofredundant parts is reduced. tB3 does not consume energy afterits status changes from active to passive, and tB2 consumes lessenergy.

C. PRIMARY AND BACKUP SCHEDULINGFor primary copies, they should be executed as early aspossible. While for backup copies, they should be scheduledas late as possible. More precisely, primary copies should beassigned to VMs with earliest finish time (EFT) and backupcopies should be assigned toVMswith latest start time (LST).Besides, both primary and backup copies must comply withthe policy of minimum energy cost (MEC). In this work,MEC has higher priority than EFT and LST.

The pseudocode of primary and backup scheduling algo-rithm is presented in Algorithm 2. Suppose ti is a newlyarrived task. The system firstly checks the schedulability ofthe primary copy tPi on all VMs (see lines 1-5). If tPi passesthe schedulability test, the energy consumption of tPi on eachVM is calculated (see lines 6-7). If assigning tPi to a VM can

FIGURE 5. Illustration of rearrangement. (a) Before tP1 finishes. (b) After

tP1 finishes, tP

3 moves forward. (c) After tP3 moves forward, tB

2 movesbackward.

decrease the energy consumption or advance the finish timeon the premise of equal energy consumption, tPi is assignedto the VM (see lines 8-11). If it fails to find a VM to accom-modate tPi , then function ScaleUp(tPi ) (described below) iscalled to scale up the system by creating new VMs. If thesystem expansion remains unable to satisfy the schedulingrequirements, ti will be given up (see lines 12-17). If tPi issuccessfully allocated, then the algorithm manages to sched-ule the backup copy tBi . All VMs except for those on h(tPi )are checked if tBi can be executed on them while meetingthe real-time requirement. The VM where tBi gets the lateststart time is selected (see lines 18-23). If assigning tBi toexisting VMs fails, ScaleUp(tBi ) is invoked. If the systemcan accommodate tBi after calling ScaleUp(tBi ), then tBi isallocated to the new VM; else ti is rejected (see lines 24-29).The pseudocode of function ScaleUp(t∗i ) is shown in

Algorithm 3. ScaleUp(t∗i ) is triggered by existing system’sfailing to allocate primary or backup copy t∗i . Supposethere are N types of VM templates that can be deployedon the hosts. Their processing capacities (in MIPS) arec1, c2, · · · , cN with c1 < c2 < · · · < cN , and memories

VOLUME 6, 2018 53677

Page 8: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

Algorithm 2 Pseudocode of Primary and BackupScheduling

1 foreach new task ti do2 f Pi ←+∞; E(tPi )←+∞;3 foreach host hj do4 foreach VM vmjk on hj do5 EFT ← Schedulability(tPi , vmjk );6 if EFT 6= +∞ then7 Calculate Ejk (tPi ) according to (5);8 if E(tPi ) > Ejk (tPi ) || (E(t

Pi ) == Ejk (tPi )

&& f Pi > EFT ) then9 vm(tPi )← vmjk ;

10 f Pi ← EFT ;11 E(tPi )← Ejk (tPi );

12 if vm(tPi ) == Null then13 vmnew← ScaleUp(tPi );14 if vmnew 6= Null then15 vm(tPi )← vmnew;16 else17 Reject ti;

18 foreach host hj 6= h(tPi ) do19 foreach VM vmjk on hj do20 LST ← Schedulability(tBi , vmjk );21 if LST 6= −∞ && sBi < LST then22 vm(tBi )← vmjk ;23 sBi ← LST ;

24 if vm(tBi ) == Null then25 vmnew← ScaleUp(tBi );26 if vmnew 6= Null then27 vm(tBi )← vmnew;28 else29 Reject ti;

(in MB) are m1,m2, · · · ,mN with m1 < m2 < · · · < mN . rjis the remaining processing capacity of host hj. tVM denotesthe time for creating a VM, and thost denotes the boot time ofa physical host. poj is the time when host hj gets powered on.BW is the bandwidth of the network between different hosts.ScaleUp(t∗i ) strives to create new VMs to raise the system’sprocessing capability. Running hosts are checked first. TheVM template with minimum processing capacity that satis-fies both timing requirement and processing capacity con-straint is selected as a candidate. If t∗i is primary copy and theexpected finish time exceeds the average of ai and di, the newVM’s processing capacity is increased by one level (seelines 1-7). Because if f Pi > (ai+ di)/2, tBi is very likely to beactive. Besides, creating VMs with excessively low process-ing capacities makes little sense for subsequent tasks. Thisproactive strategy is a trade-off between decreasing energy

Algorithm 3 Pseudocode of Function ScaleUp(t∗i )

1 foreach running host hj do2 Find the minimum processing capacity ck that

satisfies ai + tVM + li/ck ≤ di && rj ≥ ck ;3 if such ck exists then4 if t∗i is primary copy &&

ai + tVM + li/ck > (ai + di)/2 then5 k ← k 6= N?(k + 1) : N ;

6 Create vmnew with processing capacity ck on hj;7 Return vmnew;

8 foreach turning on host hj do9 Find the minimum processing capacity ck that

satisfies poj + thost + tVM + li/ck ≤ di && rj ≥ ck ;10 if such ck exists then11 if t∗i is primary copy &&

poj + thost + tVM + li/ck > (ai + di)/2 then12 k ← k 6= N?(k + 1) : N ;

13 Create vmnew with processing capacity ck on hj;14 Return vmnew;

15 foreach host hj do16 cjmin← minimum VM processing capacity on hj;17 mjmin← minimum VM memory on hj;18 Find the minimum processing capacity ck that

satisfies ai + 1.3mjmin/BW + tVM + li/ck ≤ di &&rj + c

jmin ≥ ck ;

19 if such ck exists then20 foreach VM vmjmin with processing capacity

cjmin on hj do21 foreach host hk 6= hj do22 if rk ≥ c

jmin then

23 Migrate vmjmin from hj to hk andcreate vmnew with processingcapacity ck on hj;

24 Return vmnew;

25 Find the minimum processing capacity ck that satisfiesai + thost + tVM + li/ck ≤ di;

26 if such ck exists then27 if t∗i is primary copy &&

ai + thost + tVM + li/ck > (ai + di)/2 then28 k ← k 6= N?(k + 1) : N ;

29 Turn on a new host and create vmnew withprocessing capacity ck on the new host;

30 Return vmnew;

31 Return Null;

consumption and improving system performance. If no suit-able VM can be allocated to running hosts, then hosts whichare starting up are considered. VM templates are checked if

53678 VOLUME 6, 2018

Page 9: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

they can be pre-allocated to hosts (see lines 8-14). If abovestrategies do not work, then the algorithm checks if migratinga VM from some host and consolidating the spare processingcapability can meet the time and resource requirements (seelines 15-24). If creating a new VM on existing hosts is stillnot feasible, then a new host is turned on and a new VM withsuitable processing capacity is created on it (see lines 25-30).If all attempts to allocate t∗i fail, Null is returned (see line 31).

V. PERFORMANCE EVALUATIONA. PERFORMANCE METRICSIn this section, we evaluate the overall performance of EFTR.We compare it with FESTAL, which is an algorithm recentlyproposed by Wang et al. [47], baseline algorithm NPEFTR(non-proactive EFTR), algorithm NPEFTR (non-proactiveEFTR), and NMEFTR (non-migration EFTR). The algo-rithms for comparison are concisely explained as follows:• FESTAL. It provides a general framework for taskscheduling in the cloud environment and considers bothvirtualization and backup overlapping. Different fromEFTR, FESTAL adopts conservative policy instead ofproactive policy, which means FESTAL creates newVMs with minimum processing capacities satisfyingthe energy and resource constraints. FESTAL does notadopt the rearrangementmechanism. Besides, it does notconsider the energy problem. For ease of comparison,FESTAL is modified in such a way that it uses the samescheduling strategy in Algorithm 2.

• NPEFTR. Different from EFTR, NPEFTR does notadopt the proactive strategy.

• NREFTR. Different from EFTR, NREFTR does notemploy the rearrangement mechanism.

• NMEFTR. Different from EFTR, NMEFTR does notemploy the VM migration technique.

We compare the algorithms based on the following threemetrics:• Guarantee Ratio is defined to be the ratio of the numberof successfully executed tasks over the total number oftasks.

• Energy Consumption is defined as the total power con-sumption.

• VM Count denotes the total number of VMs neededduring the scheduling.

B. EXPERIMENT SETUPSimulation has the advantage of providing repeatable andcontrollable environment. CloudSim [58] is selected as oursimulation platform. CloudSim is an event driven frameworkfor modeling cloud infrastructures and services. User definedpolicies and strategies for managing tasks and resources canbe deployed on the platform. In this paper, three types ofhosts and VMs are available in the cloud data centers. Themaximum power of each host is 200, 250 or 400 W, andtheir corresponding processing capacities are 1000, 1500 and2000 MIPS, respectively. The processing capacities of threetypes of VM templates are 200, 300 and 400 MIPS. The time

needed for creating a VM and turning on a host is 15 and90 seconds respectively.

The characteristics of tasks, including task size, task count,interval time and baseDeadline, are shown in Table 1. Thetask size (MI) is uniformly distributed between 105 and2 × 105. The task arrival rate is in compliance with Pois-son distribution. 1/λ (s) denotes the mean time betweentask arrival. The deadline of each task ti is di = ai + U(baseDeadline, 4baseDeadline).

TABLE 1. Parameters of tasks.

C. PERFORMANCE IMPACT OF TASK COUNTIn this section, we conduct experiments to evaluate the per-formance impact of task count. Task count increases from5000 to 30000 with step 5000, and other variables areconstant.

Fig. 6(a) shows the energy consumption impact of taskcount. With the increase of task count, more energy isconsumed, because longer execution time is needed andmore VMs are created. EFTR consumes least energy whileFESTAL consumes most energy. This result indicates thatproactive and rearrangement policies play important roles insaving energy. The performance difference between EFTRand NPEFTR shows that although the proactive strategyincreases the processing capacities of some new VMs, it doesincrease the energy consumption because less VMs arecreated (see Fig. 6(c)). The comparison between EFTRand NREFTR indicates that the rearrangement mechanismhelps efficiently reduce idle power consumption by uti-lizing the idle time slots left by deleted backup copies.Besides, the energy consumption of NMEFTR indicates thatVM migration can effectively increase the resource utiliza-tion. EFTR outperforms FESTAL by 9.02% on average inenergy conservation.

Fig. 6(b) shows that the guarantee ratio impact of taskcount is relatively stable and the fluctuation is less than 1%.This can be ascribed to the elasticity of cloud system. Whenmore tasks arrive, computing resources can be added from theinfinite resource pool to guarantee that tasks can meet theirdeadlines. When the number of tasks are relatively small,the time delays caused by turning on hosts and creatingVMs have a negative impact on the guarantee ratio. Whenthe number of tasks are large enough, the system reachesa balanced state and less new hosts and VMs are needed.So the guarantee ratio gradually increases to a stable value.Besides, EFTR and NREFTR have higher guarantee ratiosthan other three algorithms. This can be attributed to theproactive strategy adopted by EFTR and NREFTR, whichincreases the system processing capacity to accommodate

VOLUME 6, 2018 53679

Page 10: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

FIGURE 6. Performance impact of task count. (a) Energy consumption impact of task count. (b) Guarantee ratio impact of task count.(c) VM count impact of task count.

FIGURE 7. Performance impact of task arrival rate. (a) Energy consumption impact of arrival rate. (b) Guarantee ratio impact of arrivalrate. (c) VM count impact of arrival rate.

more tasks. EFTR achieves 3% higher guarantee ratio thanFESTAL.

Fig. 6(c) demonstrates that EFTR needs least VMs amongthe five algorithms. Compared with NREFTR, EFTR showsthat the rearrangement mechanism plays an important role inreducing the system overhead by allowing waiting primarycopies to move forward and waiting backup copies to movebackwards. Compared with NPEFTR, EFTR shows that theproactive strategy makes good trade-off between increasingsystem processing capacity and reducing VM count. Theperformance of NMEFTR shows that VMmigration can saveabout 2% VMs for EFTR. EFTR needs 23.5% less VMs thanFESTAL.

D. PERFORMANCE IMPACT OF TASK ARRIVAL RATEParameter intervalTime reflects the task arrival rate. So thesmaller intervalTime is, the more frequently tasks arrive.We vary intervalTime while keeping other parametersunchanged to test the influence of task arrival rate.

From Fig. 7(a), we can observe that the energy consump-tion increases gradually with the decrease of task arrival rate.This is because larger time intervals between two tasks leads

to longer finish times of all tasks, thus increasing the overallenergy consumption. Compared with tasks’ execution times,the change of time interval is relatively small, so the rise inenergy is not obvious. Basically, EFTR needs least energy.The explanation is the same as that in Fig. 6(a). On average,EFTR outperforms FESTAL by 10.43% in terms of energyconservation.

Fig. 7(b) shows that the guarantee ratios of five algorithmskeep slow increasing trend with the decrease of task arrivalrate. The reason is that lower arrival rate means less resourcecompetition and less system load. Furthermore, the cloudsystem has enough time to expand the computing capacitiesby adding new hosts or VMs. EFTR and NREFTR withproactive strategy have higher guarantee ratios than FESTAL,NPEFTR and NMEFTR.

Fig. 7(c) illustrates that the VM count decreases sharplywhen tasks arrive more slowly. When the task load is heavy,the system has to add more VMs to accommodate the tasksand to meet the deadline requirements. When the system loadbecomes light, the system is capable to execute the tasks andless new resources are needed. Owing to the proactive andrearrangement policies, EFTR requires least VMs.

53680 VOLUME 6, 2018

Page 11: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

FIGURE 8. Performance impact of task deadline. (a) Energy consumption impact of deadline. (b) Guarantee ratio impact of deadline.(c) VM count impact of deadline.

FIGURE 9. Energy overhead of VM migration. (a) Overhead impact of task count. (b) Overhead impact of arrival rate. (c) Overhead impactof deadline.

E. PERFORMANCE IMPACT OF TASK DEADLINETask deadline is also a significant factor that affects thealgorithm performance. In this section, we compare the fivealgorithms in terms task deadline.With the increase of param-eter baseDeadline, the tasks have looser deadlines. ParameterbaseDeadline varies from 170 to 350 with step size 30.

As we can see from Fig. 8(a), the energy consumptionbecomes larger when baseDeadline increases. Because moretasks are accepted when deadlines become looser, thus moreVMs are created. When baseDeadline is larger than 230,the energy consumption of all algorithms decreases. It can beexplained that when more tasks can be finished on existingVMs, the finish times of tasks become earlier and the growthof VM count slows down. Moreover, the downward trendof EFTR is more obvious because it employs both proactiveand rearrangement policies to increase the system efficiency.Compared with FESTAL, EFTR conserves energy by 9.89%on average.

Fig. 8(b) shows that the guarantee ratio gets higher withthe increase of deadline. The reason here is obvious – looserdeadlines allow more tasks to finish before their deadlineswith the same system processing capacity. When baseDead-line is big enough, the guarantee ratio gets close to 100%.

Besides, EFTR and NREFTR have higher guarantee ratiosthan FESTAL, NPEFTR and NMEFTR. This is becauseadopting proactive strategy can shorten the execution timesof tasks, thus raising the guarantee ratio.

The slowdown trend of VM count growth is obvi-ous in Fig. 8(c). The VM counts of EFTR, NREFTRand NMEFTR increase slowly and even decrease whenbaseDeadline is large enough. However, the VM counts ofFESTAL and NPEFTR keep increasing. The effect of proac-tive strategy is evident here. Besides, EFTR needs the leastnumber of VMs, which verifies the effectiveness of the rear-rangement mechanism.

F. OVERHEAD OF VM MIGRATIONIn this section, we evaluate the energy overhead of VMmigra-tion in EFTR. As shown in Fig. 9, the overhead caused byVM migration is generally proportional to the number ofVMs. The energy overhead is negligible compared with thetotal energy consumption. However, the positive effect ofVM migration is obvious. From Fig. 6-8, we can see thatwith VMmigration, the algorithm needs about 2% less VMs,consumes 3% less energy, and accepts 2% more tasks. Thedata indicates that employing the VM migration technique in

VOLUME 6, 2018 53681

Page 12: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

cloud BBU pool is efficient in energy conservation and taskprocessing.

VI. CONCLUSIONIn this paper, we propose an energy efficient fault-tolerantscheduling algorithm, called EFTR, for real-time tasks inC-RAN. Fault tolerance is realized based on the primary-backup model. EFTR algorithm dynamically schedulesprimary and backup copies of tasks with timing require-ments to different virtual machines. The scheduling criteriaand backup overlapping constraints are discussed in detail.Schedulability test is designed to check whether the primaryand backup copies are schedulable on some VMs. In order toincrease resource utilization, we employ the rearrangementmechanism to fully utilize the idle time slots. In addition,EFTR inherits the elasticity of cloud computing and adoptsproactive strategy to increase the system processing capacity.These policies significantly improve the system schedulabil-ity and reduce the energy consumption. Through theoreticalanalysis and simulation studies, we show that EFTR outper-forms FESTAL in terms of energy conservation, guaranteeratio and VM count under different workloads. Meanwhile,we notice that our algorithm is not suitable for dependenttasks, which are common in realistic environment. This isour research focus in further studies.

REFERENCES[1] ‘‘More than 50 billion connected devices,’’ Ericsson, Stockholm, Sweden,

White Paper, Feb. 2011.[2] Q. Wang, D. Chen, N. Zhang, Z. Qin, and Z. Qin, ‘‘LACS: A lightweight

label-based access control scheme in IoT-based 5G caching context,’’ IEEEAccess, vol. 5, pp. 4018–4027, 2017.

[3] A. Abrol and R. K. Jha, ‘‘Power optimization in 5G networks: A steptowards GrEEn communication,’’ IEEE Access, vol. 4, pp. 1355–1374,2016.

[4] ‘‘C-RAN: The road towards green RAN, version 2.5,’’ China Mobile Res.Inst., Beijing, China, White Paper, Oct. 2011.

[5] Q. Xu, Z. Su, Q. Zheng, M. Luo, and B. Dong, ‘‘Secure content deliverywith edge nodes to save caching resources for mobile users in green cities,’’IEEE Trans. Ind. Informat., vol. 14, no. 6, pp. 2550–2559, Jun. 2018.

[6] J. Wu, M. Dong, K. Ota, J. Li, and Z. Guan, ‘‘Big data analysis-basedsecure cluster management for optimized control plane in software-definednetworks,’’ IEEE Trans. Netw. Service Manag., vol. 15, no. 1, pp. 27–38,Mar. 2018.

[7] W. Xia, J. Zhang, T. Q. S. Quek, S. Jin, and H. Zhu, ‘‘Energy-efficient taskscheduling and resource allocation in downlink C-RAN,’’ in Proc. IEEEWireless Commun. Netw. Conf. (WCNC), Apr. 2018, pp. 1–6.

[8] J. Luo, Q. Chen, and L. Tang, ‘‘Reducing power consumption by jointsleeping strategy and power control in delay-aware C-RAN,’’ IEEEAccess,vol. 6, pp. 14655–14667, 2018.

[9] Q. Liu, T. Han, N. Ansari, and G. Wu, ‘‘On designing energy-efficientheterogeneous cloud radio access networks,’’ IEEE Trans. Green Commun.Netw., vol. 2, no. 3, pp. 721–734, Sep. 2018.

[10] A. Botta, W. de Donato, V. Persico, and A. Pescapé, ‘‘Integration of Cloudcomputing and Internet of Things: A survey,’’ Future Gener. Comput. Syst.,vol. 56, pp. 684–700, Mar. 2016.

[11] Y. Al-Dhuraibi, F. Paraiso, N. Djarallah, and P. Merle, ‘‘Elasticity in cloudcomputing: State of the art and research challenges,’’ IEEE Trans. Serv.Comput., vol. 11, no. 2, pp. 430–447, Mar./Apr. 2018.

[12] P. M.Mell and T. Grance, ‘‘The NIST definition of cloud computing,’’ Nat.Inst. Standards Technol., Gaithersburg, MD, USA, Tech. Rep., 2011.

[13] F. Zhang, G. Liu, X. Fu, and R. Yahyapour, ‘‘A survey on virtual machinemigration: Challenges, techniques, and open issues,’’ IEEE Commun. Sur-veys Tuts., vol. 20, no. 2, pp. 1206–1243, 2nd Quart., 2018.

[14] Z. Su, Y. Hui, Q. Xu, T. Yang, J. Liu, and Y. Jia, ‘‘An edge caching schemeto distribute content in vehicular networks,’’ IEEE Trans. Veh. Technol.,vol. 67, no. 6, pp. 5346–5356, Jun. 2018.

[15] N. Abbas, H. Hajj, Z. Abbas, K. Jahed, and S. Sharafeddine, ‘‘An optimizedapproach to video traffic splitting in heterogeneous wireless networkswith energy and QoE considerations,’’ J. Netw. Comput. Appl., vol. 83,pp. 72–88, Apr. 2017.

[16] S. Baruah, M. Bertogna, and G. Buttazzo, Multiprocessor Scheduling forReal-Time Systems (Embedded Systems). Springer, 2015.

[17] A. A. Safaei, ‘‘Real-time processing of streaming big data,’’ Real-TimeSyst., vol. 53, no. 1, pp. 1–44, Jan. 2017.

[18] J. A. Stankovic, ‘‘Misconceptions about real-time computing: A seri-ous problem for next-generation systems,’’ Computer, vol. 21, no. 10,pp. 10–19, Oct. 1988.

[19] J. W. S. Liu, Real-Time Systems. Englewood Cliffs, NJ, USA:Prentice-Hall, 2000.

[20] A. Menychtas, D. Kyriazis, and K. Tserpes, ‘‘Real-time reconfigurationfor guaranteeing QoS provisioning levels in Grid environments,’’ FutureGener. Comput. Syst., vol. 25, no. 7, pp. 779–784, Jul. 2009.

[21] S. Sharafeddine, K. Jahed, O. Farhat, and Z. Dawy, ‘‘Failure recovery inwireless content distribution networks with device-to-device cooperation,’’Comput. Netw., vol. 128, pp. 108–122, Dec. 2017.

[22] Z. Su, Q. Xu, J. Luo, H. Pu, Y. Peng, and R. Lu, ‘‘A secure contentcaching scheme for disaster backup in fog computing enabled mobilesocial networks,’’ IEEE Trans. Ind. Informat., to be published.

[23] S. Sharafeddine and A. El Arid, ‘‘An empirical energy model for secureWeb browsing over mobile devices,’’ Secur. Commun. Netw., vol. 5, no. 9,pp. 1037–1048, Sep. 2012.

[24] A. Yadav, O. A. Dobre, and N. Ansari, ‘‘Energy and traffic awarefull-duplex communications for 5G systems,’’ IEEE Access, vol. 5,pp. 11278–11290, 2017.

[25] K. N. R. S. V. Prasad, E. Hossain, and V. K. Bhargava, ‘‘Energy efficiencyin massive MIMO-based 5G networks: Opportunities and challenges,’’IEEE Wireless Commun., vol. 24, no. 3, pp. 86–94, Jun. 2017.

[26] A. Mukherjee, ‘‘Energy efficiency and delay in 5G ultra-reliable low-latency communications system architectures,’’ IEEE Netw., vol. 32, no. 2,pp. 55–61, Mar./Apr. 2018.

[27] M. R. Garey and D. S. Johnson, ‘‘Complexity results for multiprocessorscheduling under resource constraints,’’ SIAM J. Comput., vol. 4, no. 4,pp. 397–411, Dec. 1975.

[28] C. L. Liu and J. W. Layland, ‘‘Scheduling algorithms for multiprogram-ming in a hard-real-time environment,’’ J. ACM, vol. 20, no. 1, pp. 46–61,Jan. 1973.

[29] M. Joseph and P. Pandya, ‘‘Finding response times in a real-time system,’’Comput. J., vol. 29, no. 5, pp. 390–395, May 1986.

[30] S. K. Dhall and C. L. Liu, ‘‘On a real-time scheduling problem,’’ Oper.Res., vol. 26, no. 1, pp. 127–140, Feb. 1978.

[31] K. Wang and Y. Cen, ‘‘Real-time partitioned scheduling in cloud-RANwith hard deadline constraint,’’ in Proc. IEEE Wireless Commun. Netw.Conf., Mar. 2017, pp. 1–6.

[32] L. Zhang, K. Wang, D. Xuan, and K. Yang, ‘‘Optimal task allocation innear-far computing enhanced C-RAN for wireless big data processing,’’IEEE Wireless Commun., vol. 25, no. 1, pp. 50–55, Feb. 2018.

[33] A. A. Bertossi, L. V. Mancini, and F. Rossini, ‘‘Fault-tolerant rate-monotonic first-fit scheduling in hard-real-time systems,’’ IEEE Trans.Parallel Distrib. Syst., vol. 10, no. 9, pp. 934–945, Sep. 1999.

[34] S. Ghosh, R. Melhem, and D. Mosse, ‘‘Fault-tolerance through schedulingof aperiodic tasks in hard real-time multiprocessor systems,’’ IEEE Trans.Parallel Distrib. Syst., vol. 8, no. 3, pp. 272–284, Mar. 1997.

[35] P. Guo and Z. Xue, ‘‘QoS-aware fault-tolerant rate-monotonic first-fitscheduling in real-time systems,’’ in Proc. IEEE 2nd Inf. Technol. Netw.,Electron. Autom. Control Conf., Dec. 2017, pp. 311–315.

[36] Y. Li, T. Jiang, K. Luo, and S. Mao, ‘‘Green heterogeneous cloud radioaccess networks: Potential techniques, performance trade-offs, and chal-lenges,’’ IEEE Commun. Mag., vol. 55, no. 11, pp. 33–39, Nov. 2017.

[37] S. Wang, K. Li, J. Mei, G. Xiao, and K. Li, ‘‘A reliability-aware taskscheduling algorithm based on replication on heterogeneous computingsystems,’’ J. Grid Comput., vol. 15, no. 1, pp. 23–39, Mar. 2017.

[38] Y. Ding, G. Yao, and K. Hao, ‘‘Fault-tolerant elastic scheduling algorithmfor workflow in cloud systems,’’ Inf. Sci., vol. 393, pp. 47–65, Jul. 2017.

[39] G. Xie et al., ‘‘Minimizing redundancy to satisfy reliability requirement fora parallel application on heterogeneous service-oriented systems,’’ IEEETrans. Serv. Comput., to be published.

53682 VOLUME 6, 2018

Page 13: Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real ......tasks even in case of failure. Fault-tolerant scheduling is an effective method to increase the system reliability.

P. Guo et al.: Energy-Efficient Fault-Tolerant Scheduling Algorithm

[40] D. Natale and Stankovic, ‘‘Dynamic end-to-end guarantees in distributedreal time systems,’’ in Proc. Real-Time Syst. Symp. (REAL), Dec. 1994,pp. 216–227.

[41] S. Saha, A. Sarkar, and A. Chakrabarti, ‘‘Scheduling dynamic hard real-time task sets on fully and partially reconfigurable platforms,’’ IEEEEmbedded Syst. Lett., vol. 7, no. 1, pp. 23–26, Mar. 2015.

[42] X. Zhu, J. Wang, H. Guo, D. Zhu, L. T. Yang, and L. Liu, ‘‘Fault-tolerant scheduling for real-time scientific workflows with elastic resourceprovisioning in virtualized clouds,’’ IEEE Trans. Parallel Distrib. Syst.,vol. 27, no. 12, pp. 3501–3517, Dec. 2016.

[43] Y. Li, M. Chen, W. Dai, and M. Qiu, ‘‘Energy optimization with dynamictask scheduling mobile cloud computing,’’ IEEE Syst. J., vol. 11, no. 1,pp. 96–105, Mar. 2017.

[44] X. Qin and H. Jiang, ‘‘A dynamic and reliability-driven scheduling algo-rithm for parallel real-time jobs executing on heterogeneous clusters,’’J. Parallel Distrib. Comput., vol. 65, no. 8, pp. 885–900, Aug. 2005.

[45] W. Luo, J. Li, F. Yang, G. Tu, L. Pang, and L. Shu, ‘‘DYFARS: Boost-ing reliability in fault-tolerant heterogeneous distributed systems throughdynamic scheduling,’’ in Proc. IEEE 8th ACIS Int. Conf. Softw. Eng.Artif. Intell. Netw., Parallel/Distrib. Comput., vol. 1, Jul./Aug. 2007,pp. 640–645.

[46] X. Zhu, X. Qin, andM. Qiu, ‘‘QoS-aware fault-tolerant scheduling for real-time tasks on heterogeneous clusters,’’ IEEE Trans. Comput., vol. 60, no. 6,pp. 800–812, Jun. 2011.

[47] J. Wang, W. Bao, X. Zhu, L. T. Yang, and Y. Xiang, ‘‘FESTAL: Fault-tolerant elastic scheduling algorithm for real-time tasks in virtualizedclouds,’’ IEEE Trans. Comput., vol. 64, no. 9, pp. 2545–2558, Sep. 2015.

[48] T. Sigwele, A. S. Alam, P. Pillai, and Y. F. Hu, ‘‘Energy-efficient cloudradio access networks by cloud based workload consolidation for 5G,’’J. Netw. Comput. Appl., vol. 78, pp. 1–8, Jan. 2017.

[49] G. Manimaran and C. S. R. Murthy, ‘‘A fault-tolerant dynamic schedulingalgorithm for multiprocessor real-time systems and its analysis,’’ IEEETrans. Parallel Distrib. Syst., vol. 9, no. 11, pp. 1137–1152, Nov. 1998.

[50] (2018). Specpower_ssj2008 Results. [Online]. Available: http://www.spec.org/power_ssj2008/results/power_ssj2008.html

[51] A. Beloglazov, J. Abawajy, and R. Buyya, ‘‘Energy-aware resource allo-cation heuristics for efficient management of data centers for cloud com-puting,’’ Future Generat. Comput. Syst., vol. 28, no. 5, pp. 755–768, 2012.

[52] A. Varasteh and M. Goudarzi, ‘‘Server consolidation techniques in virtu-alized data centers: A survey,’’ IEEE Syst. J., vol. 11, no. 2, pp. 772–783,Jun. 2017.

[53] C. Clark et al., ‘‘Live migration of virtual machines,’’ in Proc. 2nd Conf.Symp. Netw. Syst. Design Implement., vol. 2. Berkeley, CA, USA: USENIXAssociation, Jan. 2005, pp. 273–286.

[54] H. Liu, H. Jin, C.-Z. Xu, and X. Liao, ‘‘Performance and energy modelingfor live migration of virtual machines,’’ Cluster Comput., vol. 16, no. 2,pp. 249–264, 2013.

[55] A. Strunk, ‘‘A lightweight model for estimating energy cost of live migra-tion of virtual machines,’’ in Proc. IEEE 6th Int. Conf. Cloud Comput.,Jun./Jul. 2013, pp. 510–517.

[56] V. De Maio, R. Prodan, S. Benedict, and G. Kecskemeti, ‘‘Modellingenergy consumption of network transfers and virtual machine migration,’’Future Gener. Comput. Syst., vol. 56, pp. 388–406, Mar. 2016.

[57] P. Guo and Z. Xue, ‘‘Real-time fault-tolerant scheduling algorithm withrearrangement in cloud systems,’’ in Proc. IEEE 2nd Inf. Technol. Netw.,Electron. Autom. Control Conf., Dec. 2017, pp. 399–402.

[58] R. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya,‘‘CloudSim: A toolkit for modeling and simulation of cloud computingenvironments and evaluation of resource provisioning algorithms,’’ Softw.,Pract. Exper., vol. 41, no. 1, pp. 23–50, 2011.

PENGZE GUO received the B.S. degree from theSchool of Electronic and Information Engineer-ing, Xi’an Jiaotong University, China, in 2013.He is currently pursuing the Ph.D. degree ininformation and communication engineering withShanghai Jiao Tong University. His research inter-ests include real-time systems, fault tolerance, andcloud computing.

MING LIU is currently a joint Ph.D. Student withthe School of Electronic Information and Electri-cal Engineering, Shanghai Jiao Tong University,China, and the Faculty of Engineering and Infor-mation Technologies, University of TechnologySydney, Australia. His research interests includecyber threat intelligence, intrusion detection sys-tems, and scalable data analytics.

JUN WU (S’08–M’12) received the Ph.D. degreein information and telecommunication studiesfrom Waseda University, Japan, in 2011. He wasa Post-Doctoral Researcher with the ResearchInstitute for Secure Systems, National Instituteof Advanced Industrial Science and Technology,Japan, from 2011 to 2012. He was a Researcherwith the Global Information and Telecommuni-cation Institute, Waseda University, from 2011 to2013. He is currently an Associate Professor with

the School of Electronic Information and Electrical Engineering, ShanghaiJiao Tong University, China, where he is also the Vice Director of theNational Engineering Laboratory for Information Content Analysis Technol-ogy. His research interests include the advanced computing, communicationsand security techniques of software-defined networks, information-centricnetworks smart grids, Internet of Things, and fifth generation. He hasauthored over 100 refereed papers in these fields. He is a TPC memberof more than 10 international conferences, including ICC, GLOBECOM,and WINCON. He is the Chair of the IEEE P21451-1-5 Standard WorkingGroup. He has hosted and participated in a lot of research projects includingthe National Natural Science Foundation of China, the National 863 Planand 973 Plan of China, and the Japan Society of the Promotion of ScienceProjects. He is a Guest Editor of the IEEE SENSORS JOURNAL. He is currently anAssociate Editor of the IEEE ACCESS.

ZHI XUE received the Ph.D. degree in com-munication and information systems from theSchool of Electronic Information and Electri-cal Engineering, Shanghai Jiao Tong University,China, in 2001. He is currently a Professor withShanghai Jiao Tong University. His research inter-ests include cyber security and cloud computing.

XIANGJIAN HE (M’99–SM’05) received thePh.D. degree in computing sciences from the Uni-versity of Technology Sydney, Australia, in 1999.Since 1999, he has been with the University ofTechnology Sydney. He is currently a Full Pro-fessor and the Director of the Computer Visionand Pattern Recognition Laboratory, Global BigData Technologies Centre. He is a Co-Leader ofthe Network Security Research Team, Center forReal-Time Information Networks, University ofTechnology Sydney.

VOLUME 6, 2018 53683