IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI)
SYSTEMS, VOL. 20, NO. 7, JULY 2012 1187
Temperature-Aware Idle Time Distribution for Leakage Energy Optimization
Min Bao, Alexandru Andrei, Petru Eles, Member, IEEE, and Zebo Peng, Senior Member, IEEE
Abstract: Large-scale integration with deep sub-micron technologies has led to high power densities and high chip working temperatures. At the same time, leakage energy has become the dominant energy consumption source of circuits due to reduced threshold voltages. Given the close interdependence between temperature and leakage current, temperature has become a major issue to be considered for power-aware system level design techniques. In this paper, we address the issue of leakage energy optimization through temperature aware idle time distribution (ITD). We first propose an offline ITD technique to optimize leakage energy consumption, where only static idle time is distributed. To account for the dynamic slack, we then propose an online ITD technique where both static and dynamic idle time are considered. To improve the efficiency of our ITD techniques, we also propose an analytical temperature analysis approach which is accurate and, yet, sufficiently fast to be used inside the energy optimization loop.

Index Terms: Idle time distribution (ITD), leakage energy optimization, system level design, temperature aware design.
Manuscript received November 12, 2010; revised March 28, 2011; accepted May 10, 2011. Date of publication July 12, 2011; date of current version June 01, 2012.
M. Bao, P. Eles, and Z. Peng are with the Department of Computer and Information Science, Linköping University, Linköping 58183, Sweden (e-mail: [email protected]).
A. Andrei is with the Department of Computer and Information Science, Linköping University, Linköping 58183, Sweden, and also with Ericsson AB, Linköping 58183, Sweden.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2011.2157542

I. INTRODUCTION
A. Background
Technology scaling and the ever-increasing demand for performance have resulted in high power densities in current circuits, which have also led to increased chip temperature. Due to the strong dependence of leakage current on temperature, growing temperature leads to an increase in leakage current and, consequently, energy, which, again, produces higher temperature. Thus, temperature is an important parameter to be taken into consideration for energy optimization.

Energy optimization for embedded systems has been extensively researched. At system level, dynamic voltage selection (DVS) is one of the preferred approaches for reducing the overall energy consumption [1], [2]. This technique exploits the available slack time to achieve energy efficiency by reducing the supply voltage and frequency such that the execution of tasks is stretched within their deadline.

There are two types of slack: 1) static slack, which is due to the fact that, when executing at the highest (nominal) voltage level, tasks finish before their deadlines even when executing their worst case number of cycles (WNC) and 2) dynamic slack, due to the fact that most of the time tasks execute less than their WNC. Offline DVS techniques [3] can only exploit static slack, while online approaches [7], [8] are able to further reduce energy consumption by exploiting the dynamic slack.

However, very often, not all available slack should or can be exploited and certain slack may still exist after DVS. An obvious situation is when the lowest supply voltage is such that, even if selected, a certain slack interval is left. Another reason is the existence of the critical voltage [6]. To achieve the optimal energy efficiency, DVS would not execute a task at a voltage lower than the critical one, since, otherwise, the additional static energy consumed due to the longer execution time is larger than the energy saving due to the lowered voltage. During the available slack interval, the processor remains idle and can be switched to a low power state.

Due to the strong interdependence between leakage power and temperature, different distributions of idle time will lead to different temperature distributions and, consequently, energy consumption. In this paper, we address the issue of optimizing leakage energy consumption through distribution of both static and dynamic slack time.
B. System Level Temperature Modeling
Temperature aware system level design methods rely on the availability of temperature modeling and analysis tools. System level temperature modeling approaches are mostly based on the duality between heat transfer and electrical phenomena. Hotspot [7] is both an architectural level and system level temperature simulator. The basic idea of Hotspot is to build an equivalent circuit of thermal resistances and capacitances capturing both the architecture blocks and the elements of the thermal package. In [8], a similar temperature modeling approach was proposed which speeds up the thermal analysis through dynamic adaptation of the resolution.

However, the temperature analysis time with approaches like the two mentioned above is too long to be affordable inside a temperature aware system level optimization loop. There has been some work on establishing fast system level temperature analysis techniques. They also build on the duality between heat transfer and electrical phenomena and are based on very restrictive assumptions in order to simplify the model. In [9], the authors have assumed that: 1) no cooling layer is present; 2) there is no interdependency between leakage current and temperature; and 3) the whole application executes at a constant voltage. The models in [10] and [11] consider variable voltage levels but maintain the first two limitations above. The most general analytical model is proposed in [12], which considers cooling layers as well as the dependency between leakage and temperature. However, this approach is limited to the case of a unique voltage level throughout the application. In order to support the ITD technique proposed in this paper, we introduce a fast and accurate temperature analysis technique, which eliminates all three limitations mentioned above and can be used inside a temperature aware system level optimization loop.
C. Related Work
Several approaches to system level temperature aware design have been discussed in the literature. Temperature management is utilized to control the temperature of processors for improving system reliability [13]. In [14], the authors proposed a technique for temperature management by scaling the processor speed. In [15], the authors addressed the issue of scheduling and mapping of a set of tasks with real-time constraints on multi-processors for peak temperature minimization. Techniques for task sequencing combined with DVS to reduce the peak temperature of a processor were proposed in [10]. Several approaches aiming at reducing temperature variations or temperature gradients across the chip, e.g., [16], were proposed.

A considerable amount of work has been published on performance optimization under thermal and real-time constraints. Zhang et al. [17] proposed voltage assignment techniques to optimize the performance of a set of periodic tasks working under thermal constraints. In [18], the authors proposed approaches to optimize throughput by task sequencing under thermal constraints. An online speed adaptation technique for homogeneous multi-processors with the target of maximizing total throughput was proposed by Rao et al. in [19]. Temperature aware DVS techniques considering the leakage/temperature dependency were proposed in [2] and [20].

In this paper we address the issue of optimizing leakage energy consumption through distribution of idle time. The only works, to the best of our knowledge, previously addressing this issue are [21] and [22]. In [21], the authors proposed an approach to distribute idle time for applications consisting of one single task executing at a constant given supply voltage. Thus, their approach cannot optimize the distribution of idle time among multiple tasks which also execute at different voltages. The same limitation also holds for [22], where a pattern-based ITD for leakage energy optimization considering one single task was proposed. The pattern-based approach generates a uniform idle time distribution over the whole application and, thus, is not appropriate for ITD among multi-task applications where tasks have different amounts of energy consumption and execute at different voltage levels.
D. Main Contributions
In this paper, we make the following main contributions.¹
1) We propose an offline ITD approach to optimize leakage energy consumption for a set of periodic tasks. Static slack is distributed globally among tasks which are executed at different discrete voltage levels.
2) We propose, based on the offline ITD approach, an online ITD technique where both static and dynamic slack are distributed.
3) We propose a fast and accurate analytical temperature model which eliminates all three limitations mentioned in Section I-B, by considering the following aspects: a) the interdependence between leakage power and temperature; b) multiple cooling layers of the chip; c) non-smooth power consumption generated due to multiple discrete supply voltage levels of the processor.

¹ Preliminary results regarding the offline ITD approach have been published in [23].
E. Paper Organization
In Section II we introduce the power and application models. In Section III we give a motivational example. We formulate the problem in Section IV. In Section V we introduce our analytical thermal model. We then propose the offline ITD approach, which distributes only static slack, in Section VI. Based on the offline ITD approach, we present our online ITD technique in Section VII. Finally, experimental results and conclusions are presented in Sections VIII and IX.
II. PRELIMINARIES
A. Power Model
For dynamic power we use the following equation [24]:

$$P_d = C_{eff} \cdot f \cdot V_{dd}^2$$

where $C_{eff}$, $V_{dd}$, and $f$ denote the effective switched capacitance, supply voltage, and frequency, respectively. The leakage power is expressed as follows [25]:

$$P_{leak} = I_{sr} \cdot T^2 \cdot e^{\frac{\beta \cdot V_{dd} + \gamma}{T}} \cdot V_{dd} \qquad (1)$$

where $I_{sr}$ is the leakage current at a reference temperature, $T$ is the current temperature, and $\beta$ and $\gamma$ are technology dependent coefficients. In Section V-B we will use a piecewise linear approximation of this model, as proposed, for example, in [26]. According to it, the working temperature range $[T_a, T_{max}]$, where $T_a$ and $T_{max}$ are the ambient and the maximal working temperature of the chip, is divided into several sub-ranges. The leakage power inside each sub-range $[T_i, T_{i+1}]$ is modeled by a linear function: $P_{leak} = K_i \cdot T + B_i$, where $K_i$ and $B_i$ are constants characteristic to each interval.
B. Application Model
The application is captured as a task graph $G(\Pi, \Gamma)$. A node $\tau_i \in \Pi$ represents a computational task, while an edge $e \in \Gamma$ indicates the data dependency between two tasks. Each task $\tau_i$ is characterized by the following six-tuple:

$$\tau_i = \langle WNC_i, BNC_i, ENC_i, V_{dd_i}, dl_i, C_{eff_i} \rangle$$

where $WNC_i$, $BNC_i$, and $ENC_i$ are task $\tau_i$'s worst case, best case, and expected number of clock cycles to be executed. The expected number of clock cycles $ENC_i$ is the arithmetic mean value of the probability density function of the actual executed cycles $ANC_i$, i.e., $ENC_i = \sum_{j=BNC_i}^{WNC_i} j \cdot p_i(j)$, where $p_i(j)$ is the probability that a number of $j$ clock cycles are executed by task $\tau_i$. We assume that the probability density functions of the execution cycles of different tasks are independent. $V_{dd_i}$ represents the supply voltage at which the task $\tau_i$ is executed. The supply voltage can be either constant for all tasks, or it can be calculated by a DVS algorithm, e.g., our temperature aware DVS technique proposed in [20]. Further, $dl_i$ and $C_{eff_i}$ represent the deadline and the effective switched capacitance.

The application is mapped and scheduled on a processor which has two power states: active and idle. In the active state the processor can operate at several discrete supply voltage levels. When the processor does not execute any task, it can be put to the idle state, consuming a very small amount of leakage power $P_{idle}$.

TABLE I. MOTIVATIONAL EXAMPLE: APPLICATION PARAMETERS
Fig. 1. Motivational example: static idle time distribution. (a) First ITD. (b) Second ITD.
III. MOTIVATIONAL EXAMPLE
A. Static Idle Time Distribution
Let us consider an application consisting of seven tasks which share a global deadline of 96.85 ms. The worst case workload, $WNC$ (in clock cycles), and average switched capacitance, $C_{eff}$, are given in Table I. The tasks run on a processor with a fixed supply voltage and frequency of 0.6 V and 132 MHz, respectively. The corresponding worst case execution times ($WNET$) are also given in Table I. Based on the performance of this processor, there exists 6 ms of static slack, $t_{sslack}$, in each execution period of this application. Fig. 1 gives two ways of distributing $t_{sslack}$. The first distribution (first ITD), as shown in Fig. 1(a), places the whole $t_{sslack}$ after the last task, while the second distribution (second ITD), in Fig. 1(b), divides the static slack into three segments and places the three idle slots after the execution of three different tasks.

For simplicity, in this example, we ignore both energy and time overhead due to switching between the active and idle mode. The two different ITDs will lead to different temperature and leakage power profiles. The average working temperature of each task, as well as the leakage energy consumption, are shown in Table II. $E^{leak}_{tot}$ is the total leakage energy consumption of the whole application. Comparing $E^{leak}_{tot}$ for the first and second ITD, we can observe that a reduction of around 10% in leakage energy consumption can be achieved.

The leakage energy reduction is due to the modified working temperature of the chip, which has a strong impact on the leakage power. It is also important to mention that the table reflects the steady state (not the startup mode), for which energy minimization is targeted. This means that the starting temperature for $\tau_1$ is identical to the temperature at the end of the previous period.

TABLE II. STATIC ITD: LEAKAGE ENERGY COMPARISON
TABLE III. MOTIVATIONAL EXAMPLE: AN ACTIVATION SCENARIO
B. Dynamic ITD
The ITD approach outlined in the previous section is an offline static one which assumes that tasks execute their WNC and, thus, it only distributes the static slack. However, in reality, most of the time, there are large variations in the number of cycles executed by a task, from one activation to the other, which leads to a large amount of dynamic slack. For the task set introduced in the previous section, let us imagine the activation scenario given in Table III, where the columns $ANC$ and $AET$ contain the actual executed workload (in clock cycles) and the corresponding actual execution time of each task, respectively. $t_{dslack_i}$ represents the dynamic slack generated due to the actual number of cycles executed by task $\tau_i$ (it is the difference between the $WNET$ and $AET$ of the task). For this activation scenario, some of the tasks execute their worst case workload, while the others execute less than their worst case workload and, thus, generate dynamic slack. The total amount of dynamic slack is 18.7 ms.

Fig. 2(a) illustrates the distribution of idle time slots during the above online activation scenario if we use the offline ITD approach, which distributes static slack as illustrated in Fig. 1(b). In this case, the dynamic slack is placed where it is generated ($t_{dslack_i}$ is placed after $\tau_i$ terminates). Table IV shows the corresponding working temperature and leakage energy consumption of each task as well as the total leakage energy consumption, which is 7.98 J. However, leakage energy can be reduced by distributing the dynamic slack more wisely. For example, at run-time, whenever a task terminates, the idle time slot length following this task is calculated by taking into consideration the current time and the current chip temperature. Fig. 2(b) shows the ITD determined in this way. The corresponding total leakage energy consumed, as given in Table IV, is 7.32 J, which means a leakage energy reduction of 8%. This reduction is due to the further lowered working temperature of the energy-hungry tasks, which is achieved by ITD considering both static and dynamic slack.

Fig. 2. Motivational example: ITD. (a) First ITD: execution scenario with only static ITD. (b) Second ITD: execution scenario with both static and dynamic ITD.
TABLE IV. DYNAMIC ITD: LEAKAGE ENERGY COMPARISON
IV. PROBLEM FORMULATION
We consider a set of periodic tasks $(\tau_1, \tau_2, \ldots, \tau_n)$ executed in the order $\tau_1, \tau_2, \ldots, \tau_n$. For each task $\tau_i$, the six-tuple $\langle WNC_i, BNC_i, ENC_i, V_{dd_i}, dl_i, C_{eff_i} \rangle$ is given. Corresponding to the supply voltage $V_{dd_i}$ that task $\tau_i$ is executed at, the worst case execution time $WNET_i$, best case execution time $BNET_i$, and expected execution time $ENET_i$ can be directly calculated.

For each iteration of the application, the total static slack $t_{sslack}$ is constant and computed by (2)

$$t_{sslack} = dl_n - \sum_{i=1}^{n} WNET_i \qquad (2)$$

where $dl_n$ represents the deadline of the last task in the execution order, and $\sum_{i=1}^{n} WNET_i$ is the sum of the worst case execution times of all tasks. The total dynamic slack $t_{dslack}$ for each execution iteration varies due to the execution time variation of tasks. For one iteration, $t_{dslack}$ is calculated as follows:

$$t_{dslack} = \sum_{i=1}^{n} (WNET_i - AET_i)$$

where $AET_i$ represents the actual execution time of task $\tau_i$ in this iteration. $AET_i$ conforms to a distribution with the expected execution time $ENET_i$ as the arithmetic mean value of the probability density function.

The total available slack $t_{slack}$ for one iteration is equal to the sum of the static slack and dynamic slack. During $t_{slack}$ the processor can be switched to idle mode, consuming the power $P_{idle}$. The time and energy overhead for switching the processor to and from the idle state are $t_o$ and $E_o$, respectively. Idle slots can be placed after the execution of any task. The length of an idle slot after task $\tau_i$ is denoted as $t_i$, and the sum of all idle slots $\sum_{i=1}^{n} t_i$ should be equal to the total available idle time $t_{slack}$. Note that the time overhead $t_o$ is included in the slot length $t_i$.

We will formulate the following two ITD problems.
1) ITD with only static slack: Static idle time distribution (SITD).
2) ITD with both static and dynamic slack: Static and dynamic idle time distribution (DITD).
A. ITD With Only Static Slack: SITD
Let us consider the scenario in which each task is always executed with the worst case workload: $ANC_i = WNC_i$. In this scenario, for each iteration, the available slack is constant and known: $t_{slack} = t_{sslack}$, where $t_{sslack}$ is computed by (2).

For one iteration, the total energy consumption of the task set can be expressed as follows:

$$E_{tot} = \sum_{i=1}^{n} E^{d}_i + \sum_{i=1}^{n} E^{leak}_i + \sum_{i=1}^{n} x_i \cdot E_o + E^{idle}$$

where $\sum_{i=1}^{n} E^{d}_i$ and $\sum_{i=1}^{n} E^{leak}_i$ are the total dynamic and leakage energy of all tasks. $\sum_{i=1}^{n} x_i \cdot E_o$ is the total energy overhead when the processor is switched to/from the idle state, where $x_i$ is a binary variable indicating whether task $\tau_i$ is followed or not by an idle slot. $E^{idle}$ is the total energy consumption during the idle time $t_{sslack}$.

The dynamic energy consumption of each task $\tau_i$ is further computed as

$$E^{d}_i = C_{eff_i} \cdot V_{dd_i}^2 \cdot f_i \cdot WNET_i$$

where $V_{dd_i}$ is the supply voltage the task is executed at and $f_i$ is the corresponding frequency. $WNET_i$ represents the worst case execution time of task $\tau_i$. As the supply voltage and $WNET_i$ are constants, the total dynamic energy $\sum_{i=1}^{n} E^{d}_i$ is hence constant and independent of the distribution of idle time. The total energy consumption during idle time is

$$E^{idle} = P_{idle} \cdot t_{sslack}$$

where $P_{idle}$ is the power consumption of the processor in the low power mode and $t_{sslack} = \sum_{i=1}^{n} t_i$. Similar to the dynamic energy, $E^{idle}$ is also fixed and independent of the ITD, as $t_{sslack}$ is constant with given supply voltages.

The leakage energy consumption of each task $\tau_i$ is a function of both temperature and supply voltage

$$E^{leak}_i = \int_{0}^{WNET_i} P_{leak}(V_{dd_i}, T_i(t)) \, dt \qquad (3)$$

where $T_i(t)$ describes the temperature of the processor during execution of task $\tau_i$. With given supply voltages, $T_i(t)$ is influenced by the distribution of idle time slots, so the leakage energy consumption depends on the ITD.

We need to distribute the static slack to minimize the total leakage energy consumption and the energy overheads due to switching: $\sum_{i=1}^{n} (E^{leak}_i + x_i \cdot E_o)$. With given supply voltages and a fixed distribution of idle time slots, the same power pattern is periodically executed on the processor. As the task set is executed for a large number of iterations, the processor temperature converges to a steady state dynamic temperature curve (SSDTC). Once the processor has reached steady state, the SSDTC will repeat periodically.

Our SITD problem can be formulated as follows: given is a set of tasks $(\tau_1, \tau_2, \ldots, \tau_n)$ as defined earlier in this section. The idle time slot length $t_i$ following each task $\tau_i$ and, implicitly, $x_i$ (the binary variable which represents whether task $\tau_i$ is followed by an idle time slot or not) are to be determined such that the objective function (4) is minimized with the constraints (5)-(7) satisfied.
Problem Formulation 1: Static Idle Time Distribution

Minimize:
$$\sum_{i=1}^{n} (E^{leak}_i + x_i \cdot E_o) \qquad (4)$$
subject to:
$$\sum_{i=1}^{n} t_i = t_{sslack} \qquad (5)$$
$$\sum_{k=1}^{i} WNET_k + \sum_{k=1}^{i-1} t_k \le dl_i \quad (1 \le i \le n) \qquad (6)$$
$$T_i(t) \le T_{max} \quad (1 \le i \le n) \qquad (7)$$

$E^{leak}_i$ in (4) represents the steady-state leakage energy consumption. The constraint in (5) requires that the sum of all idle slot lengths should be equal to the total available static slack, where $t_{sslack}$ is calculated by (2). The constraint in (6) guarantees that the deadline of each task is satisfied. Finally, the constraint in (7) requires that the processor temperature throughout the execution of the task set should not exceed the maximal allowable working temperature of the chip $T_{max}$, where $T_i(t)$ describes the processor temperature during execution of task $\tau_i$.
B. ITD With Both Static and Dynamic Slack: DITD

The above problem formulation ignores the execution time variations of tasks at run-time and, implicitly, ignores the dynamic slack. To deal with execution time variation and perform dynamic slack distribution, the idle slot length following the termination of a task should be determined, at run-time, based on the actual time and processor temperature.

Our problem formulation for DITD is as follows: given is a set of periodic tasks $(\tau_1, \tau_2, \ldots, \tau_n)$ as defined earlier in this section. When task $\tau_i$ terminates at time $ts_i$, the idle time slot $t_i$ following task $\tau_i$'s termination is determined such that (8) is minimized, where the leakage term in (8) is the total leakage energy consumption of the remaining tasks $\tau_{i+1}, \ldots, \tau_n$ to be executed within the current iteration.

Problem Formulation 2: Dynamic Idle Time Distribution

Minimize:
(8)
subject to:
(9)
(10)
(11)

The leakage energy consumption of each remaining task $\tau_j$ is estimated corresponding to the case when the expected workload $ENC_j$ is executed. It is calculated according to (3) with the difference that the expected execution time $ENET_j$ is used instead of $WNET_j$ as the upper limit for the integral. The constraint in (9) requires that the sum of all idle slot lengths should be equal to the total available slack, where $ts_i$ is the time the current task terminates. The total available slack is computed with the assumption that all the future tasks $\tau_{i+1}$ to $\tau_n$ are executed with their expected workload $ENC_j$. The deadline of each task is guaranteed by the constraint in (10), where $dl_j$ represents the deadline of task $\tau_j$. Note that the worst case execution time $WNET_j$ is used in (10) in order to guarantee the deadline of each task in the worst case. The constraint in (11) requires, similarly to (7), the processor temperature during execution of the task set to be lower than the maximal allowable working temperature of the chip $T_{max}$.
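The run-time quantities used by the DITD formulation can be sketched as follows. The first helper follows the description of constraint (9): remaining slack is computed assuming expected execution times for all future tasks. The second helper is an assumption added for illustration, not part of the formulation as stated. It bounds the next idle slot so that, in the spirit of constraint (10), all remaining deadlines still hold if every remaining task runs its worst case and no further idle slots are inserted. All names are hypothetical.

```python
def remaining_slack(t_now, deadline_last, enet_remaining):
    """Slack left for the idle slots still to be placed, assuming every
    remaining task runs for its expected execution time."""
    return deadline_last - t_now - sum(enet_remaining)


def max_safe_idle(t_now, wnet_remaining, deadlines_remaining):
    """Largest idle slot that can start at t_now such that each remaining
    task still meets its deadline when all remaining tasks execute their
    worst case and no other idle slot follows.

    Returns infinity when there are no remaining tasks.
    """
    bound = float("inf")
    elapsed = 0.0
    for wnet, deadline in zip(wnet_remaining, deadlines_remaining):
        elapsed += wnet
        bound = min(bound, deadline - t_now - elapsed)
    return max(bound, 0.0)
```

An online scheme would clamp any idle slot length it selects to the value returned by `max_safe_idle`, so that optimizing for the expected case never endangers a deadline in the worst case.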
V. TEMPERATURE ANALYSIS
A. Temperature Model
Thermal Circuit: In order to analyze the thermal behavior, we build an equivalent RC thermal circuit based on the physical parameters of the die and the package. Due to the fact that the application period can safely be considered significantly smaller than the RC time of the heat sink, which, usually, is in the order of minutes [27], the heat sink temperature stays constant after the state corresponding to the SSDTC is reached. For SSDTC estimation, we, hence, can ignore the thermal capacitance (not the thermal resistance!) of the heat sink and build the 2-RC thermal circuit shown in Fig. 3(a). $B_1$ and $B_2$ represent the temperature nodes for the die and the heat spreader, respectively. $P(t)$ stands for the processor power consumption as a function of time. We obtain the values of $R_1$, $C_1$, $R_2$, and $C_2$ from an RC network similar to the one constructed in Hotspot [7]. $R_1$ is calculated as the sum of the thermal resistance of the die and the thermal interface material (TIM), and $C_1$ as the sum of the thermal capacitance of the die and the TIM. $R_2$ is the equivalent thermal resistance from the heat spreader to the ground through the heat sink, and $C_2$ is the equivalent thermal capacitance of the heat spreader layer.

Fig. 3. Thermal circuit. (a) 2-RC thermal circuit. (b) 1-RC thermal circuit.

When the application period is significantly smaller than the RC time of the heat spreader in the 2-RC thermal circuit, the heat spreader temperature stays constant after the SSDTC is reached. In this case, we can simplify the 2-RC to a 1-RC thermal circuit [see Fig. 3(b)].

Temperature Equations: For the 2-RC thermal circuit in Fig. 3(a), we can describe the temperatures of $B_1$ and $B_2$ as follows:
(12)
(13)
where $T_1(t)$ and $T_2(t)$ represent the temperatures at $B_1$ and $B_2$, respectively. The power consumption $P(t)$ is the sum of the dynamic and leakage power, which are dependent on the supply voltage and the temperature.

If, within a time interval, the power consumption stays constant, the temperatures at the beginning and end of the interval can be related by solving (12) and (13), which gives (14) and (15). There, the temperatures of $B_1$ and $B_2$ at the end of the time interval are expressed in terms of their temperatures at the beginning of the interval. The coefficients in (14) and (15) are constants determined by the length of the time interval and by the values of $R_1$, $R_2$, $C_1$, and $C_2$.
(14)
(15)
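To make the behavior of the 2-RC circuit concrete, the sketch below integrates the standard heat balance for two RC nodes under a constant power. This is an illustrative assumption, not a reproduction of (12)-(15): the die node exchanges heat with the spreader node through $R_1$, the spreader node with ambient through $R_2$, and forward Euler integration is used in place of a closed-form solution. Function and parameter names are hypothetical.

```python
def simulate_2rc(power, t1, t2, duration, r1, c1, r2, c2, t_amb, dt=1e-4):
    """Advance die (t1) and heat spreader (t2) temperatures over `duration`
    seconds under constant `power`, using forward Euler integration.

    Assumed heat balance:
        c1 * dT1/dt = power - (T1 - T2) / r1
        c2 * dT2/dt = (T1 - T2) / r1 - (T2 - t_amb) / r2
    The step dt must be much smaller than r1 * c1 for a stable result.
    """
    steps = max(1, round(duration / dt))
    h = duration / steps
    for _ in range(steps):
        q_die_to_spreader = (t1 - t2) / r1
        q_spreader_to_amb = (t2 - t_amb) / r2
        t1, t2 = (
            t1 + h * (power - q_die_to_spreader) / c1,
            t2 + h * (q_die_to_spreader - q_spreader_to_amb) / c2,
        )
    return t1, t2
```

Under constant power the two nodes settle at `t_amb + power * r2` for the spreader and `t_amb + power * (r1 + r2)` for the die, which gives a quick sanity check for any chosen parameter set.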
B. SSDTC Estimation

As an input to the SSDTC calculation we have the voltage levels, calculated by the DVS algorithm, and a given idle time distribution, as illustrated in Fig. 4(a).

In Fig. 4(a), we divide the execution interval of each active state step into several sub-intervals. The total number of sub-intervals is denoted as $m$. Each sub-interval is short enough such that the temperature variation is small and the leakage power can be treated as constant inside the sub-interval.

$P_i$ is the power consumption for each sub-interval. When the processor is in the active state during the $i$th sub-interval, $P_i$ is computed by (16), where $V_{dd_i}$ and $T_i$ are the supply voltage and processor temperature at the start of the $i$th sub-interval

$$P_i = P_d(V_{dd_i}) + P_{leak}(V_{dd_i}, T_i) \qquad (16)$$

$P_d(V_{dd_i})$ represents the dynamic power consumption while $P_{leak}(V_{dd_i}, T_i)$ represents the leakage power consumption based on the piecewise linear leakage model discussed in Section II-A. When the processor is in the idle state during the $i$th sub-interval, the power consumption is $P_i = P_{idle}$.

Fig. 4. Temperature analysis. (a) Voltage pattern. (b) Steady-state dynamic temperature curve.

As shown in Fig. 4(b), we construct the SSDTC by calculating the temperature values $T_0$ to $T_m$. The relationship between the start and end temperature of each sub-interval can be described by applying (14) and (15) to all sub-intervals. Thus, we can establish a linear system of equations as shown by (17)-(20), written in terms of the processor and heat spreader temperatures at the beginning of the $i$th sub-interval
(17)
(18)
(19)
(20)

Due to periodicity, when dynamic steady state is reached, the processor and heat spreader temperature at the beginning of the period should be equal to the temperature at the end of the previous period
(21)

Solving the above linear system (17)-(21), we get the values for $T_0$ to $T_m$ and, hence, obtain the corresponding SSDTC. As this system is a tridiagonal linear system, it can be solved efficiently, e.g., through LU decomposition with only $O(m)$ operations [28]. It should be mentioned that, in fact, two SSDTCs are obtained, one reflecting the temperature of the chip, and the other based on that of the heat spreader.
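The idea behind the SSDTC computation can be illustrated on the simpler 1-RC circuit. The sketch below is a simplification under stated assumptions, not the 2-RC linear system of (17)-(21). It uses a single thermal resistance `r` and capacitance `c` towards a fixed reference temperature `t_ref`, with power in each sub-interval of the form `p_const + k * T_start` (a linearized leakage evaluated at the sub-interval start). Each sub-interval then maps its start temperature to its end temperature by an affine function, and the periodicity condition is solved by composing these maps instead of factorizing a matrix. All names are hypothetical.

```python
import math


def affine_step(duration, p_const, k, r, c, t_ref):
    """Return (a, b) with T_end = a * T_start + b for one sub-interval of a
    1-RC circuit under constant power p_const + k * T_start."""
    decay = math.exp(-duration / (r * c))
    a = decay + (1.0 - decay) * r * k
    b = (1.0 - decay) * (t_ref + r * p_const)
    return a, b


def ssdtc_1rc(intervals, r, c, t_ref):
    """Periodic steady-state temperatures [T_0, ..., T_m] for a list of
    sub-intervals given as (duration, p_const, k); idle slots use k = 0."""
    steps = [affine_step(d, p, k, r, c, t_ref) for d, p, k in intervals]
    # Compose all sub-interval maps into T_m = big_a * T_0 + big_b.
    big_a, big_b = 1.0, 0.0
    for a, b in steps:
        big_a, big_b = a * big_a, a * big_b + b
    if big_a >= 1.0:
        raise ValueError("no periodic steady state (thermal runaway)")
    # Periodicity: T_0 = T_m, hence T_0 = big_b / (1 - big_a).
    temps = [big_b / (1.0 - big_a)]
    for a, b in steps:
        temps.append(a * temps[-1] + b)
    return temps
```

Dropping the periodicity step and starting the forward pass from a given initial temperature yields a transient curve instead of the steady-state one.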
C. Transient Temperature Curve Estimation

The temperature calculated in the previous section (SSDTC) corresponds to the dynamic steady state reached after a sufficient number of iterations have been executed. However, the same technique can be used to calculate any transient temperature curve (TTC), corresponding to an arbitrary time interval, as long as the length of the time interval is significantly smaller than the RC time of the heat sink (which is in the order of minutes). Under this assumption, as discussed earlier in this section, the thermal model in Fig. 3 can be used. The only difference relative to the SSDTC calculation is that (21) is no longer valid. To estimate the TTC, the temperatures of $B_1$ and $B_2$ at the beginning of the time interval are given as input. The temperature values at the end of each sub-interval are then calculated by solving (17)-(20).
VI. ITD WITH ONLY STATIC SLACK (SITD)
In this section we discuss our solutions to the SITD problem, as formulated in Section IV-A, which only considers static slack. We first introduce our approach ignoring the overheads $E_o$ and $t_o$ in Section VI-A. This approach will be used in Section VI-B, where a general SITD technique is presented.

A. SITD Without Overhead (SITDNOH)
Since, in this section, we ignore the overheads $E_o$ and $t_o$, it results from (4) that the cost to be minimized is $\sum_{i=1}^{n} E^{leak}_i$, which is the total leakage energy consumed during task execution.

Assuming that the execution interval of task $\tau_i$ is divided into sub-intervals, the leakage energy consumption of $\tau_i$ is the sum of the leakage energy consumed in each sub-interval
(22)
where the quantities in (22) are the processor SSDTC temperatures at the beginning and end of the $q$th sub-interval and the length of this sub-interval, respectively. The model in (1) is used to compute the leakage power in each sub-interval.

Let us first assume that the chip (as well as the heat spreader) temperature at the termination of each task is known and is independent of the starting temperature of the task. Under this assumption, we can formulate SITDNOH as a convex nonlinear problem shown in (23)-(36), where the objective function to be minimized is the total leakage energy for all tasks. The optimization variables to be calculated are the idle slot lengths $t_i$.

Formulation 1: SITD With No Overheads Consideration

Minimize:
(23)
Subject to:
(24)
(25)
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
(34)
(35)
(36)
Equation (25) requires the sum of idle slot lengths to be equal to the total available idle time: $\sum_{i=1}^{n} t_i = t_{sslack}$. Equation (26) guarantees that the deadline of each task is satisfied. As mentioned above, the processor and heat spreader temperatures at the end of task $\tau_i$ are considered known and are assigned by (27) and (28), respectively, to given constants. The processor temperatures at the beginning and end of the $q$th sub-interval in the execution of task $\tau_i$ are related by (29), similar to (17) and (19) in Section V-B. Equation (30) describes the same relationship for the heat spreader temperature. The processor and heat spreader temperatures at the start of task $\tau_i$ are dependent on the finishing temperature of the previous task $\tau_{i-1}$ and the idle slot $t_{i-1}$ placed after $\tau_{i-1}$. If we assume that all idle slots are significantly shorter than the RC time of the heat spreader, then we can describe the processor temperature behavior during the idle slot by (31) and (33), based on the 1-RC thermal circuit described in Section V-A. $T_{idle}$ is the steady-state temperature that the processor would reach if $P_{idle}$ were consumed for a sufficiently long time and is calculated according to (35), using the sum of the two thermal resistances in Fig. 3(b). Under the same assumption as above, the heat spreader temperature stays constant during the idle slot as shown in (32) and (34).² Equations (31) and (32) calculate the processor and heat spreader temperature at the end of the idle slot following task $\tau_i$ and, implicitly, the starting temperature of $\tau_{i+1}$. Equations (33) and (34) compute the temperature at the start of task $\tau_1$, taking into consideration that this task starts after the idle period following task $\tau_n$ (the task set is executed periodically). Finally, the constraint in (36) requires that the processor temperatures during execution of the task set do not exceed the maximal allowable working temperature of the chip $T_{max}$. The presented formulation is a convex nonlinear problem, and can be solved
can be solvedefficiently in polynomial time [29].SITDNOH Algorithm:
The above formulation is based on the particular assumption that the
temperature at the end of a task is known and fixed. However, in
reality, this is not the case, and the processor and heat spreader
temperatures [(27) and (28)] at the termination of a task depend on
the starting temperature of the task and, implicitly, on the
distribution of the idle time. This makes the above formulation a
non-convex programming problem which is very time consuming to
solve. In order to solve the problem efficiently, we have developed
an iterative heuristic outlined in Fig. 5.

² Idle periods are supposed to be short. If, exceptionally, they are
not significantly shorter than the heat spreader RC time, we use the
2-RC circuit to model the temperature during the idle period in
(31)–(34). This will not affect the convexity of the formulation.

Fig. 5. SITDNOH heuristics.

The heuristic starts with an arbitrary initial ITD, for example, one
in which the entire idle time is placed after the last task. Assuming
this ITD and the given voltage levels, steady-state dynamic
temperature analysis is performed, as described in Section V-B. Given
the obtained SSDTC, the leakage energy consumption corresponding to
the assumed ITD is calculated. From the SSDTC we can also extract the
final processor and heat spreader temperatures for each task.
Assuming these values as the final temperatures in (27) and (28), we
can calculate the idle time using the convex optimization formulated
in (23)–(35).

From the new ITD resulting from the optimization, we calculate a new
SSDTC which provides new temperatures at the end of each task. The
new total leakage energy consumption, corresponding to the updated
ITD, is also calculated. The process is continued assuming the new
end temperatures in (27) and (28), and the convex optimization
produces a new ITD.

The iterations stop when the temperature converges, i.e., when the
end temperatures obtained in two successive iterations differ by less
than a given threshold. However, it can happen that, after a certain
point, additional iterations do not significantly improve the ITD.
Therefore, even if convergence has not yet been reached, the
optimization is stopped if no significant energy reduction has been
achieved between two successive iterations. Our experiments have
shown that at most five iterations are needed with the threshold
settings we have used.
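The iterative scheme of Fig. 5 can be sketched as a fixed-point loop. The following is only an illustration under stated assumptions: `analyze` stands in for the SSDTC-based thermal analysis and `optimize` for the convex optimization (23)–(35); both are hypothetical callbacks, the thresholds `eps_temp` and `eps_energy` are placeholder values, and rejecting an ITD that increases energy is an assumption of this sketch rather than something the text specifies. The two `toy_*` functions are an invented model used only to exercise the loop.

```python
from typing import Callable, List, Sequence, Tuple

Analysis = Tuple[List[float], float]  # (end-of-task temperatures, leakage energy)


def sitdnoh(
    initial_itd: Sequence[float],
    analyze: Callable[[Sequence[float]], Analysis],
    optimize: Callable[[Sequence[float]], List[float]],
    eps_temp: float = 0.5,      # placeholder temperature threshold
    eps_energy: float = 0.01,   # placeholder relative energy threshold
    max_iter: int = 20,
) -> Tuple[List[float], float, int]:
    """Alternate thermal analysis and ITD optimization until convergence."""
    itd = list(initial_itd)
    end_temps, energy = analyze(itd)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Optimize the ITD assuming the current end temperatures are fixed.
        candidate = optimize(end_temps)
        cand_temps, cand_energy = analyze(candidate)
        temp_change = max(abs(a - b) for a, b in zip(cand_temps, end_temps))
        gain = energy - cand_energy
        if gain < 0:
            break  # assumption of this sketch: a worse ITD is rejected
        small_gain = gain < eps_energy * energy
        itd, end_temps, energy = candidate, cand_temps, cand_energy
        # Stop on temperature convergence or insignificant energy reduction.
        if temp_change < eps_temp or small_gain:
            break
    return itd, energy, iterations


TOTAL_IDLE = 10.0  # toy value


def toy_analyze(itd: Sequence[float]) -> Analysis:
    """Invented model: longer idle slots give cooler end temperatures."""
    temps = [80.0 - 2.0 * x for x in itd]
    return temps, sum(t * t for t in temps) / 100.0


def toy_optimize(end_temps: Sequence[float]) -> List[float]:
    """Invented model: give hotter tasks proportionally more idle time."""
    total = sum(end_temps)
    return [TOTAL_IDLE * t / total for t in end_temps]
```

Calling `sitdnoh([0.0, 10.0], toy_analyze, toy_optimize)` starts from an ITD with the entire idle time after the last task, as in the example initial ITD above.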
B. SITD With Overhead (SITDOH)
The approach presented in Section VI-A is based on the assumption
that the time and energy overheads for switching the processor to and
from the idle state are zero, which is not the case in reality. If we
consider the hypothetical case that the end temperature of each task
is known, the problem can be formulated similarly to (23)–(35), with
the main difference that the total energy to be minimized is given in
(4). Based on this formulation, we could solve the SITDOH problem for
the real case, when the end temperatures are not supposed to be
known, similarly to the approach described in Fig. 5. However, the
formulation with the objective function (4) is, due to its binary
variable, a mixed integer convex programming problem which is very
time consuming to solve. We, hence, propose an SITDOH heuristic based
on the SITDNOH approach presented in Section VI-A.

Fig. 6. SITDOH heuristics. (a) Step 1. (b) Step 2.

Our SITDOH heuristic comprises two steps. In the first step, an
optimization of the idle time distribution is performed by
eliminating idle intervals whose lengths are smaller than a certain
threshold limit. In the second step, the ITD is further refined in
order to improve energy efficiency.

A lower bound on the length of an idle slot can be determined by
considering the following two bounds.
1) No idle slot is allowed to be shorter than the total time needed
to switch to/from the idle state.
2) The energy overhead due to switching should be compensated by the
gain due to putting the processor into the idle state. The energy
gain for an idle interval is computed as
(37)
where the quantities involved are the processor temperature as a
function of time during the idle time interval, the supply voltage,
and the leakage power in the active state during the idle time
interval. Thus, in order for the overhead to be compensated, the
energy gain has to outweigh the energy overhead. As the energy gain
depends on the temperature, the threshold length of an idle slot is
not a given constant. Nevertheless, this length will always be larger
than a bound computed with the leakage power at the maximum
temperature at which the processor is allowed to run.
In conclusion, for the first step of the SITDOH heuristic [see
Fig. 6(a)], we use the larger of these two bounds as the threshold on
the length of an idle slot.

The basic idea of the first step is that no idle slot is allowed to
be shorter than this threshold. Thus, after running SITDNOH, the
obtained ITD is checked slot by slot. If a slot length is smaller
than the threshold, this slot will be removed. In order to achieve
this, the particular constraint in (24) corresponding to that slot is
changed such that the slot length is forced to zero. After all slots
have been visited and (24) updated, SITDNOH is performed again. The
obtained ITD is such that all slots which in the previous iteration
have been found shorter than the threshold have disappeared and the
corresponding idle time has been redistributed among other tasks. The
process is repeated until no slot shorter than the threshold has been
identified.

After step 1, we still can be left with slots that are too short to
be energy efficient. There are the following two reasons for this.
1) Due to the fact that the processor is running at a temperature
lower than the maximum allowed, it can happen that the real leakage
power and, hence, the energy gain of a slot are smaller than the ones
considered in step 1.
2) Even if the energy gain outweighs the overhead, which means that
an energy reduction due to the idle slot is obtained, energy
efficiency can, possibly, be improved by eliminating the slot and
distributing the corresponding idle time among other slots.

In the second step [see Fig. 6(b)], we start from the shortest idle
slot and consider eliminating it [by setting the corresponding
constraint in (24)]. If the ITD obtained after applying SITDNOH is
more energy efficient, the new ITD is accepted. The process is
continued as long as, by eliminating a slot, the total energy
consumption is reduced.
VII. ITD WITH DYNAMIC AND STATIC SLACK (DITD)
The above SITD approach determines idle time settings assuming that
tasks always execute their WNC. However, in order to exploit the
dynamic slack, the slot length has to be determined at run-time,
based on the values of the current time and temperature after the
termination of a task. In principle, calculating the appropriate slot
length implies the execution of a temperature aware ITD algorithm
similar to the one described in Section VI-B [with the objective
function and constraints in (8)–(10)]. Running this algorithm online,
after execution of each task, implies a time and energy overhead
which is not acceptable.

To overcome the above problem, we have divided our DITD approach into
an offline and an online phase. In the offline phase, idle time
settings for all tasks are pre-computed, based on possible finishing
times and finishing temperatures of the task. The results are stored
in lookup tables (LUTs), one for each task. In Fig. 7, we show two
such tables. They contain idle time settings for combinations of
possible termination times and finishing temperatures of a task.
A. Online Phase
The online phase is illustrated in Fig. 7. Each time a task
terminates, the length of the idle time slot following its
termination has to be fixed; the online scheme chooses the
appropriate setting from the LUT of that task, depending on the
actual time and temperature sensor reading. If there is no exact
entry in the LUT corresponding to the actual time/temperature, the
entry corresponding to the immediately higher time and closest
temperature value is selected. For example, in Fig. 7, a task
finishes at time 1.35 ms with a temperature of 78 °C. To determine
the appropriate idle time slot length, the LUT of the task is
accessed. As there is no exact entry with 1.35 ms and 78 °C, the
entry corresponding to termination time 1.5 ms and temperature 70 °C
is chosen. Hence, the processor will be switched to the idle state
for 0.5 ms before the next task starts.

Fig. 7. DITD online phase.

We should notice that, according to our temperature model presented
in Section V, the state of the system is defined by both the die and
the heat spreader temperatures. In our LUTs, however, we only
consider the die temperature for taking the decision on the idle
slack. This is due to the following reasons.
1) It is both impractical and potentially expensive to obtain, at
run-time, temperature readings from the heat spreader.
2) The variations of the heat spreader temperature are small compared
to those of the chip. This is due to the fact that the heat
capacitance of the heat spreader is much larger than that of the
chip.
3) Considering also the heat spreader temperature as an additional
dimension in the LUTs would dramatically increase the size of the
tables without significant contribution to energy efficiency.

Thus, when generating the LUTs, we will consider that, at the
termination of a task, the heat spreader has a certain expected
temperature. In Section VII-E we will show how this expected
temperature is calculated.
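The online selection rule of this subsection (immediately higher time, closest temperature) can be sketched as a small lookup function. The table layout, the function name, and the sample grid values other than the (1.5 ms, 70 °C) entry are assumptions made for illustration; clamping a finishing time beyond the last table entry to that entry is also an assumption of this sketch.

```python
import bisect
from typing import List, Sequence


def lookup_idle_time(
    times: Sequence[float],          # termination times, sorted ascending
    temps: Sequence[float],          # finishing temperatures, sorted ascending
    table: Sequence[Sequence[float]],  # table[i][j]: idle time for times[i], temps[j]
    t_now: float,
    temp_now: float,
) -> float:
    """Select the LUT entry for the actual finishing time and temperature."""
    # Immediately higher (or equal) time entry; clamp to the last one.
    i = min(bisect.bisect_left(times, t_now), len(times) - 1)
    # Closest temperature entry; readings outside the interval therefore
    # map to the lowest/highest temperature available in the table.
    j = min(range(len(temps)), key=lambda k: abs(temps[k] - temp_now))
    return table[i][j]


# Hypothetical LUT in the spirit of Fig. 7 (times in ms, temperatures in °C).
example_times: List[float] = [1.0, 1.5, 2.0]
example_temps: List[float] = [70.0, 90.0]
example_table: List[List[float]] = [
    [0.9, 0.7],
    [0.5, 0.4],
    [0.2, 0.1],
]
```

With this hypothetical table, `lookup_idle_time(example_times, example_temps, example_table, 1.35, 78.0)` selects the entry for 1.5 ms and 70 °C and returns an idle slot of 0.5 ms, matching the example in the text.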
B. Offline Phase
In the offline phase, one LUT is generated for each task. The LUT
generation algorithm is illustrated in Fig. 8. The outermost loop
iterates over the set of tasks and successively constructs the table
for each task. The next loop generates entries corresponding to the
various possible finishing temperatures of the task. Finally, the
innermost loop iterates, for each possible finishing temperature,
over all considered termination times of the task.

Fig. 8. DITD offline phase.

The algorithm starts by computing the earliest and latest possible
finishing times, as well as the lowest and highest possible finishing
temperatures, for each task. With a given finishing time and
finishing temperature of a task, the innermost loop performs the
slack distribution step DITDOH, iteratively. We describe the DITDOH
algorithm in Section VII-C. For successive iterations, the finishing
time and temperature will be increased with the time and temperature
quanta, respectively. The calculation of the time and temperature
bounds, as well as the determination of the granularities and number
of entries along the time and temperature dimensions, are presented
in Sections VII-D and VII-E, respectively.
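The loop structure of the offline phase can be sketched as below. The callback `ditdoh` is a hypothetical stand-in for the DITDOH step, the `bounds` layout is invented for illustration, and including both interval end points in each grid is an assumption of this sketch.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Bounds = Tuple[float, float, float, float]  # (EFT, LFT, T_low, T_high)
Lut = Dict[Tuple[float, float], float]      # (time, temperature) -> idle time


def grid(lo: float, hi: float, step: float) -> List[float]:
    """Grid points lo, lo + step, ... that do not exceed hi."""
    n = int(math.floor((hi - lo) / step + 1e-9)) + 1
    return [lo + k * step for k in range(n)]


def build_luts(
    tasks: Iterable[Hashable],
    bounds: Dict[Hashable, Bounds],
    dt: float,       # time quantum
    dtemp: float,    # temperature quantum
    ditdoh: Callable[[Hashable, float, float], float],
) -> Dict[Hashable, Lut]:
    """One LUT per task: loop over tasks, then temperatures, then times."""
    luts: Dict[Hashable, Lut] = {}
    for task in tasks:                              # outermost loop: tasks
        eft, lft, t_low, t_high = bounds[task]
        lut: Lut = {}
        for temp in grid(t_low, t_high, dtemp):     # finishing temperatures
            for time in grid(eft, lft, dt):         # termination times
                lut[(time, temp)] = ditdoh(task, time, temp)
        luts[task] = lut
    return luts
```

For a single task with finishing times between 1.0 and 2.0 ms, temperatures between 70 and 90 degrees, and quanta of 0.5 ms and 20 degrees, the sketch produces a table with 3 × 2 entries.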
C. DITDOH Algorithm
When calculating the actual LUT entries for a task, the ITD algorithm
DITDOH is performed to determine the idle slot length following the
termination of the task, with given termination time and temperature,
based on the problem formulation described in Section IV-B. DITDOH is
similar to SITDOH outlined in Section VI-B. However, unlike the
formulation used in SITDOH [see (23)–(36)], which is based on SSDTC
estimation, the formulation used for DITDOH is based on the
estimation of a transient temperature curve (TTC) described in
Section V-C. Since we do not rely on the fact that successive
iterations of the application are identical and that tasks always
execute with their worst case number of cycles, we do not calculate
an SSDTC corresponding to the dynamic steady state. Instead, we
estimate a TTC.

The formulation used for DITDOH is shown in (38)–(53).
Formulation 2: DITD with No Overheads Consideration
Minimize: (38)
Subject to: (39)–(53)
As mentioned in Section IV-B, the energy is optimized for the case
that the future tasks execute their expected workload which, in
reality, happens with a much higher probability than, e.g., the worst
case (nevertheless, idle time slots are distributed such that, even
in the worst case, deadlines are satisfied). The objective function
(38) to be minimized is the total leakage energy of the further tasks
to be executed in the current iteration. Equation (38) is similar to
(23), with the following two differences.
1) It refers only to the remaining tasks.
2) The execution interval of a task, which is divided into
subintervals, does not correspond to the worst case, but to the
expected case.
The optimization variables to be calculated are the idle slot
lengths. Equation (40) requires that the sum of all idle slot lengths
should be equal to the total available idle time, which depends on
the current task's finishing time. The total available idle time is
calculated based on the assumption that all future tasks are executed
with their expected workload.

Equation (41) guarantees the deadline of the next task to be executed
after the termination of the current task, in the worst case (task
executed with its worst case workload). In order to guarantee that
all future tasks meet their deadlines in the worst case, (42)
requires that the next task finishes before its latest finishing
time, in the worst case. The latest finishing time (see Section
VII-D) is the latest termination time of a task that still allows the
future tasks following it to satisfy their deadlines even if their
worst case workloads are executed. Thus, (41) and (42) guarantee not
only that the deadline of the next task is satisfied in the worst
case but also that this task finishes in time for all the remaining
tasks to be able to meet their deadlines in the worst case. Equation
(43) enforces the deadline of the remaining tasks, considering that
they execute their expected workload. This means that the idle time
following the current task is determined such that it guarantees
deadlines to be satisfied in the worst case but is optimized for the
situation that tasks execute their expected workloads.

Similar to (27) and (28), (44) and (45) specify the processor and
heat spreader temperatures at the finishing of a task. Equation (46)
computes the processor temperature at the beginning of the next task,
similar to (33), starting from the chip temperature at the
termination of the current task. Similarly, (47) computes the heat
spreader temperature at the beginning of the next task, starting
from, as described in Section VII-A, the expected heat spreader
temperature at the termination of the current task. This expected
temperature is pre-calculated, as will be explained in Section VII-E.
Equations (48)–(53) compute the TTC of the processor/heat spreader
based on our TTC estimation method described in Section V-C, using
the processor temperatures at the beginning and end of each
sub-interval during the execution of a task. Finally, throughout the
execution of the future tasks, the processor temperatures should not
exceed the maximal allowable working temperature of the chip, as
imposed by the constraint in (53). The above formulation is a convex
nonlinear problem and can be solved efficiently in polynomial time
[29].
D. Time Bounds and Granularity
In the first step of the algorithm in Fig. 8, the earliest and latest
finishing times for each task are calculated. The earliest finishing
time is calculated based on the situation that all tasks execute
their best case execution time. The latest finishing time is
calculated as the latest termination time of a task that still allows
all following tasks to satisfy their deadlines when they execute
their worst case execution time.

With the time interval between the earliest and latest finishing time
of a task, a straightforward approach to determine the number of
entries along the time dimension would be to allocate the same number
of entries for each task. However, the time interval sizes can differ
very much among tasks, which should be taken into consideration when
deciding on the number of time entries. Therefore, given a total
number of entries along the time dimension, we determine the number
of time entries in each LUT as follows:
The corresponding granularity along the time dimension is the
same for all tasks and is obtained as follows:
E. Temperature Bounds and Granularity
The granularity along the temperature dimension is the same for all
tasks and has been determined experimentally. Our experiments have
shown that values around 15 degrees are appropriate, in the sense
that finer granularities will only marginally improve energy
efficiency.

To determine the number of entries along the temperature dimension,
we need to calculate the temperature interval at the termination of
each task. In fact, it is not needed to determine the bounds of the
temperature interval exactly. A good estimation, such that, at
run-time, temperature readings outside the determined interval will
happen rarely, is sufficient. If a temperature reading exceeds the
upper/lower bound of the interval, the idle time setting
corresponding to the highest/lowest temperature value available in
the LUT will be used. We have developed an estimation technique for
the temperature interval which balances computation complexity and
accuracy of the results.

In order to estimate the temperature bounds, we define the following
two run-time scenarios.
1) Worst Case Execution Scenario: In which the actual execution time
of each task is always equal to its worst case execution time.
2) Best Case Execution Scenario: In which the actual execution time
of each task is always equal to its best case execution time.

In both scenarios, the processor will execute the corresponding
periodic power pattern repeatedly and the processor temperature will
eventually reach the corresponding steady state dynamic temperature
curve (one for the worst case scenario and one for the best case
scenario). From the corresponding SSDTC, we can obtain, for each
task, its finishing temperature. We use the finishing temperature of
a task corresponding to the worst case execution scenario as the
upper bound of the finishing temperature of that task; the finishing
temperature of the task corresponding to the best case execution
scenario will be used as the lower bound.

In order to obtain the worst case curve, we first perform the SITDOH
heuristic (see Fig. 6). Then, temperature analysis (see Section V)
produces the temperature curve for the worst case scenario with the
corresponding idle time distribution generated by SITDOH. The best
case curve is obtained in a similar way, by replacing the worst case
execution time with the best case execution time in the constraint in
(25).

With the upper and lower bounds obtained for each task, the number of
entries along the temperature dimension, for each task, is determined
by the size of the temperature interval and the granularity along the
temperature dimension.

As mentioned in Section VII-A, when generating the LUTs, we consider
that, at the termination of a task, the heat spreader has a certain
expected temperature. In order to obtain these temperatures, we
perform the same procedure as outlined above but, in this case,
considering the expected execution time of each task. We obtain the
temperature curve corresponding to the heat spreader (see Section
V-B), from which we extract the expected temperature of the heat
spreader at the termination of each task.
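The dimensioning of the LUTs described in Sections VII-D and VII-E can be sketched as follows. The exact formulas are not reproduced here, so both functions are assumptions: time entries are allocated to each task in proportion to the size of its finishing-time interval (rounded up), the common time granularity is the summed interval size divided by the total number of entries, and the number of temperature entries counts the grid points from the lower bound up to the upper bound. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def time_entries_per_task(
    intervals: Sequence[Tuple[float, float]],  # (earliest, latest) finishing time
    total_entries: int,
) -> Tuple[List[int], float]:
    """Proportional allocation of time entries and the common granularity."""
    sizes = [lft - eft for eft, lft in intervals]
    total = sum(sizes)
    if total <= 0 or total_entries <= 0:
        raise ValueError("need a positive interval size and entry count")
    # Assumed rule: share proportional to interval size, at least one entry.
    counts = [max(1, math.ceil(total_entries * s / total)) for s in sizes]
    granularity = total / total_entries
    return counts, granularity


def temperature_entries(t_low: float, t_high: float, dtemp: float) -> int:
    """Grid points t_low, t_low + dtemp, ... not exceeding t_high."""
    return int(math.floor((t_high - t_low) / dtemp + 1e-9)) + 1
```

For two tasks with interval sizes 1 and 3 and a budget of 8 time entries, the sketch allocates 2 and 6 entries with a common granularity of 0.5; a temperature interval from 60 to 90 degrees with a granularity of 15 degrees gives 3 entries.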
VIII. EXPERIMENTAL RESULTS
A. Evaluation of the Thermal Model
Experimental Setup: We have evaluated our thermal model considering
platforms with parameter settings based on values from [30], [31],
and [32]. We consider die areas of 6 × 6, 8 × 8, and 10 × 10 mm². The
heat spreader area is five times the die area, and the heat sink area
is between 1.3 and 1.4 times the area of the heat spreader. The
thicknesses of the die and heat spreader are 0.5 and 2 mm,
respectively. The thickness of the heat sink is between 10 and 20 mm.
The coefficients corresponding to the power model in Section II-A are
based on [2] and [24]. For the temperature calculation (see Sections
V-B and V-C) we have considered a piecewise linear leakage model with
three segments, as recommended in [26].

Fig. 9. SSDTC estimation with our approach versus Hotspot.

Accuracy: We first performed a set of experiments to evaluate the
accuracy of our temperature analysis approach proposed in Section V.
We randomly generated 500 periodic voltage patterns corresponding to
applications with periods in the range between 5 and 100 ms. For each
application, considering the coefficients and platform parameters
outlined above, we have computed the SSDTC using the approach
proposed in Section V-B and by using Hotspot simulation. For each
pair of temperature curves obtained, we calculated the maximum
deviation as the largest temperature difference between any
corresponding pairs of points (in absolute value), as well as the
average deviation. Fig. 9 illustrates the results for different
application periods. For applications with a period of 50 ms, for
example, there is no single case with a maximum deviation larger than
2.1 °C, and the average deviation is 0.8 °C. Over all 500
applications, the average and maximum deviation are 0.8 °C and
3.8 °C, respectively. We can observe that the deviation increases
with the increasing period of the application. This is due to the
fact that, with larger periods, accuracy can be slightly affected by
neglecting the thermal capacitance of the heat sink (see Section
V-A).

Computation Time: We have compared the computation time of our SSDTC
generation approach with the time needed by Hotspot. Fig. 9
illustrates the average speedup as the ratio of the two execution
times. The speedup is between 3000 for periods of 5 ms and 20 for
100 ms periods. An increasing period leads to a larger linear system
that has to be solved for SSDTC estimation (see Section V-B), which
explains the shape of the speedup curve in Fig. 9.

The accuracy and speedup of our approach are also dependent on the
length of the sub-interval considered for the temperature analysis
(see Section V-B and Fig. 4). For the experiments throughout this
paper, the length of the sub-interval is 2 ms. This is based on the
observation that reducing the length beyond this limit does not
improve the accuracy significantly.
B. Evaluation of the ITD Heuristics
We have used both generated test applications as well as a real life
example in our experiments to evaluate our DITD approach presented in
Section VII. This, implicitly, also evaluates the SITD approach in
Section VI.

Experimental Setup: We have randomly generated 100 test applications
consisting of 30 to 100 tasks. The workload in the worst case (WNC)
for each task is generated randomly within a given range of clock
cycles, while the workload in the best case is generated within a
second given range of clock cycles. To generate the expected workload
of each task, the following steps are performed.
1) The value of the expected total dynamic idle time is given as an
input; it is the total dynamic slack when all tasks execute their
workload in the expected case.
2) The expected total dynamic idle time is divided into a number of
sub-intervals with equal length.
3) The sub-intervals are allocated among all tasks based on a uniform
distribution; as a result, each task is allocated a number of
sub-intervals.
4) The expected workload of each task is, thus, determined from the
sub-intervals allocated to it and the processor frequency at which
the task is executed.
In order to evaluate our DITD technique, we have considered a
straightforward approach (SFA) for comparison. This SFA scenario
corresponds to the natural execution procedure for the case when no
idle time distribution is performed. Following this approach, tasks
are executed according to a static schedule generated based on the
worst case execution time. According to this schedule, the static
slack is placed at the end of the application, after the last task.
At run-time, when the tasks execute less than their WNC and the
generated dynamic slack is large enough, the processor is put in idle
mode.

We have applied both the DITD and SFA approaches on the same test
application. When we simulate the execution of the test applications,
the actual number of executed clock cycles of a task is generated
using a random number generator according to the beta distribution
[38]. The two parameters of the distribution are determined based on:
1) the expected workload and 2) a given standard deviation of the
executed clock cycles of the task. The Hotspot system [7] is used to
simulate the sensor readings which track the temperature behavior of
the platform during the execution of a test application.

In our experiments, the granularity along the time and temperature
dimensions for the LUTs is set to 1.5–2.0 ms and 15–20 degrees,
respectively. It is important to mention that in all our experiments
we have accounted for the time and energy overhead imposed by the
online phase of our DITD. Similarly, we have also taken into
consideration the energy overhead due to the memory access. This
overhead has been calculated based on the energy values given in [33]
and [34]. The energy and time overheads due to power state switching
are set to 0.5 mJ and 0.4 ms, respectively, according to [6].

After applying both the DITD and SFA approaches on a test
application, we compute the corresponding leakage energy reduction
due to our DITD approach compared to the SFA, i.e., the difference
between the leakage energy consumed with the SFA and with the DITD
approach, expressed as a percentage of the former.

Leakage Energy Reduction Versus Slack Time Ratios: We
Slack Time Ratios: We
first performed experiments considering different combinationsof
static and dynamic idle time ratio . The static idle
-
BAO et al.: TEMPERATURE-AWARE IDLE TIME DISTRIBUTION FOR LEAKAGE
ENERGY OPTIMIZATION 1199
Fig. 10. Leakage energy reduction with switching overheads.
Fig. 11. Leakage energy reduction with no switching
overheads.
Fig. 12. Leakage energy reduction with different standard
deviations.
time ratio is computed as: , whereis the deadline of the last
task in execution order. The dy-
namic idle time ratio is calculated as: , where isthe total
dynamic slack when all tasks execute their workloadin the expected
case, as described earlier in this section. Fig. 10shows the
averaged leakage energy reduction over all test ap-plications. The
energy reduction achieved by DITD grows withthe available amount of
static and dynamic slack. Withand , for example, leakage energy can
be reduced with20% by applying our DITD approach.The DITD approach
proposed in this paper achieves leakage
energy reduction due to two main features: 1) it is
temperatureaware, which means that idle time is distributed such
that thetemperature is controlled in order to minimize leakage and
2) itredistributes slack such that the number of idle slots which
aretoo short to switch power state, is minimized. The
followingquestion has to be answered: how much does the
temperatureawareness of our approach contribute to the energy
reduction?In order to answer this question we have repeated the
aboveexperiments considering a hypothetical scenario with
zeroswitching overhead: 0 mJ and 0 ms. The resultsare shown in Fig.
11. Under such a scenario, the processor canbe switched to the low
power state for the duration of the totalidle time (regardless the
length of the individual idle slots).Thus, the energy gains
obtained with DITD compared to SFA,as illustrated in Fig. 11, are
exclusively due to the temperatureawareness of the DITD
approach.From Figs. 10 and 11 one can also observe the efficiency
of
the ITD approach with only static slack (SITD, Section VI).
Fig. 13. Computation time.
The cases where (no dynamic slack) are, in fact, corre-sponding
to those situationswhen only static slack is distributed.Obviously,
in the cases that both and , there is noslack to distribute and,
thus, the energy reduction is zero.Leakage Energy Reduction Versus
Standard Deviation: As
mentioned, for our experiments we have generated workloadsfor
each task according to a beta distribution ,where and are
determined based on the expected work-load and standard deviation
of the executed workload.For the above experiments, the standard
deviation for eachtask is considered to be: . As thestandard
deviation has an influence on the potential leakage re-duction, we
have repeated the above experiments, consideringthree different
settings of , namely,
, and . Fig. 12shows the leakage reduction % by applying our
DITD ap-proach relative to the SFA, with different standard
deviationsettings. We have considered test applications having
static anddynamic ratios of: and . As can be observed,the
efficiency of the DITD approach increases as the standarddeviation
decreases. This is due to the fact that our DITD algo-rithm is
targeted towards optimizing the energy consumption forthe case that
tasks execute the expected number of cycles ENC.When the standard
deviation is smaller, more of the actual ex-ecuted number of clock
cycles are clustering around the ENCand, therefore, our DITD
approach can achieve better leakagereduction.Computation Time: We
have also evaluated the computation
time for the offline phase of our DITD approach. The results
aregiven in Fig. 13.MPEG2 Decoder: We have applied our DITD
approach to a
real-life application, namely anMPEG2 decoder, which consistsof
34 tasks.3 We have considered a platform with the size of thechip,
heat spreader, and heat sink of 8 8 mm , 18 18 mm ,and 22 22 mm ,
respectively. The thickness of the chip, heatspreader, and the heat
sink is 0.5, 2, and 15 mm, respectively.The execution time
distribution of the tasks has been obtainedfrom simulations on the
MPARM platform [35]. We consideredthe following two overhead
settings: 1) 0.5 mJ,0.4 ms and 2) 1.0 mJ, 0.8 ms. The leakage
energyreduction by applying our DITD approach relative to the
SFAapproach is 32.5% and 40.8%, respectively.
³ http://ffmpeg.mplayerhq.hu/

IX. CONCLUSION
We first proposed a static temperature aware ITD approach for leakage
energy optimization where only static slack is considered. In order
to consider both static and dynamic slack, we then proposed a dynamic
temperature aware ITD approach, which consists of an offline and an
online step. The experiments have demonstrated that considerable
energy reduction can be achieved by our temperature aware ITD
approaches. In order to efficiently perform temperature analysis
inside our optimization loop for idle time distribution, we have also
proposed a fast and accurate system level temperature analysis
approach.
REFERENCES
[1] A. Andrei, P. Eles, and Z. Peng, "Energy optimization of
multiprocessor systems on chip by voltage selection," IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 3, pp. 262–275,
Mar. 2007.
[2] Y. Liu, H. Yang, R. Dick, H. Wang, and L. Shang, "Thermal vs
energy optimization for DVFS-enabled processors in embedded systems,"
in Proc. ISQED, 2007, pp. 204–209.
[3] T. Ishihara and H. Yasuura, "Voltage scheduling problem for
dynamically variable voltage processors," in Proc. ISLPED, 1998,
pp. 197–202.
[4] A. Andrei, P. Eles, Z. Peng, M. Schmitz, and B. M. Al-Hashimi,
"Quasi-static voltage scaling for energy minimization with time
constraints," in Proc. DATE, 2005, pp. 514–519.
[5] C. Xian, Y. H. Lu, and Z. Li, "Dynamic voltage scaling for
multitasking real-time systems with uncertain execution time," IEEE
Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 8,
pp. 1467–1478, Aug. 2008.
[6] R. Jejurikar, C. Pereira, and R. Gupta, "Leakage aware dynamic
voltage scaling for realtime embedded systems," in Proc. DAC, 2004,
pp. 275–280.
[7] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron,
and M. Stan, "Hotspot: A compact thermal modeling methodology for
early-stage VLSI design," IEEE Trans. Very Large Scale Integr. (VLSI)
Syst., vol. 14, no. 5, pp. 501–513, May 2006.
[8] Y. Yang, Z. P. Gu, R. P. Dick, and L. Shang, "Isac: Integrated
space and time adaptive chip-package thermal analysis," IEEE Trans.
Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 1,
pp. 86–99, Jan. 2007.
[9] S. Wang and R. Bettatin, "Delay analysis in temp.-constrained
hard real-time systems with general task arrivals," in Proc. RTSS,
2006, pp. 323–334.
[10] R. Jayaseelan and T. Mitra, "A hybrid local-global approach for
multi-core thermal management," in Proc. ICCAD, 2008, pp. 618–623.
[11] S. Zhang and K. S. Chatha, "System-level thermal aware design of
applications with uncertain execution time," in Proc. ICCAD, 2008,
pp. 242–249.
[12] R. Rao and S. Vrudhula, "Fast and accurate prediction of the
steady-state throughput of multicore processors under thermal
constraints," IEEE Trans. Comput.-Aided Design Integr. Circuits
Syst., vol. 28, no. 10, pp. 1559–1572, Oct. 2009.
[13] D. Brooks, R. P. Dick, R. Joseph, and L. Shang, Power,
thermal,and reliability modeling in nanometer-scale
microprocessors, IEEEMicro, vol. 27, no. 3, pp. 4962, May 2007.
[14] N. Bansal, T. Kimbrel, and K. Pruhs, "Speed scaling to manage
energy and temperature," J. ACM, vol. 54, no. 1, pp. 1–39, 2007.
[15] T. Chantem, R. Dick, and X. Hu, "Temperature-aware
scheduling and assignment for hard real-time applications on mpsocs,"
in Proc. DATE, 2008, pp. 288–293.
[16] Y. Ge, P. Malani, and Q. Qiu, "Distributed task migration
for thermal management in many-core systems," in Proc. DAC, 2010, pp.
579–584.
[17] S. Zhang and K. S. Chatha, "Approximation algorithm for
the temperature-aware scheduling problem," in Proc. ICCAD, 2007,
pp. 281–288.
[18] S. Zhang and K. Chatha, "Thermal aware task sequencing on
embedded processors," in Proc. DAC, 2010, pp. 585–590.
[19] R. Rao and S. Vrudhula, "Efficient online computation of
core speeds to maximize the throughput of thermally constrained
multi-core processors," in Proc. ICCAD, 2008, pp. 537–542.
[20] M. Bao, A. Andrei, P. Eles, and Z. Peng,
"Temperature-aware voltage selection for energy optimization," in
Proc. DATE, 2008, pp. 1083–1086.
[21] L. Yuan, S. Leventhal, and G. Qu, "Temperature-aware leakage
minimization technique for real-time systems," in Proc. ICCAD,
2006, pp. 761–764.
[22] C. Yang, J. Chen, L. Thiele, and T. Kuo, "Energy-efficient
real-time task scheduling with temperature-dependent leakage," in
Proc. DATE, 2010, pp. 9–14.
[23] M. Bao, A. Andrei, P. Eles, and Z. Peng, "Temperature-aware
idle time distribution for energy optimization with dynamic voltage
scaling," in Proc. DATE, 2010, pp. 21–26.
[24] S. Martin, K. Flautner, T. Mudge, and D. Blaauw, "Combined
dynamic voltage scaling and adaptive body biasing for lower power
microprocessors under dynamic workloads," in Proc. ICCAD, 2002,
pp. 721–725.
[25] W. P. Liao, L. He, and K. M. Lepak, "Temperature and supply
voltage aware performance and power modeling at micro-architecture
level," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol.
24, no. 7, pp. 1042–1053, Jul. 2005.
[26] Y. Liu, R. Dick, L. Shang, and H. Yang, "Accurate
temperature-dependent integrated circuit leakage power estimation
is easy," in Proc. DATE, 2007, pp. 1–6.
[27] R. Rao and S. Vrudhula, "Performance optimal processor
throttling under thermal constraints," in Proc. CASES, 2007, pp.
257–266.
[28] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P.
Flannery, Numerical Recipes 3rd Edition: The Art of Scientific
Computing. Cambridge, U.K.: Cambridge Univ. Press, 2007.
[29] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial
Algorithms in Convex Programming. Philadelphia, PA: Society
for Industrial and Applied Mathematics, 1987.
[30] IBM, "Powerpc 970 mp thermal considerations," Appl. Note.
[31] Intel, "Intel Core 2 Duo Mobile Processors on 65-nm process for
embedded applications: Thermal design guide."
[32] Intel, "Intel Core 2 Duo Mobile Processors on 45-nm process for
embedded applications: Thermal design guide."
[33] S. Hsu et al., "A 4.5-GHz 130-nm 32-KB L0 cache with a
leakage-tolerant self reverse-bias bitline scheme," IEEE J.
Solid-State Circuits, vol. 38, no. 5, pp. 755–761, May 2003.
[34] A. Macii, E. Macii, and M. Poncino, "Improving the
efficiency of memory partitioning by address clustering," in Proc.
DATE, 2003, pp. 18–23.
[35] L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, and M.
Poncino, "Systemc cosimulation and emulation of multiprocessor soc
designs," Comput., vol. 36, pp. 53–59, 2003.
Min Bao received the M.S. degree in computer engineering from
the University of Electronic Science and Technology, China, in 2007.
She is currently pursuing the Ph.D. degree at Linköping
University. Her research interests include embedded systems,
low-power and temperature-aware design, and software/hardware
co-design.
Alexandru Andrei received the M.S. degree in computer
engineering from Politehnica University, Timisoara, Romania, in
2001 and the Ph.D. degree from Linköping University, Linköping,
Sweden, in 2007. He is with Ericsson, Linköping, Sweden. His
research interests include embedded systems architectures and
design, low-power design, real-time systems, and hardware/software
codesign.
Petru Eles (M'99) is a Professor of embedded computer systems
with the Department of Computer and Information Science (IDA),
Linköping University, Linköping, Sweden. His current research
interests include embedded systems, real-time systems, electronic
design automation, cyber-physical systems, hardware/software
codesign, low power system design, fault-tolerant systems, and
design for test. He has published a large number of technical papers
in these areas and co-authored several books.
Zebo Peng (M'91–SM'02) received the Ph.D. degree in computer
science from Linköping University, Linköping, Sweden, in
1987. Currently, he is a Professor of computer systems and Director
of the Embedded Systems Laboratory, Linköping University. His
research interests include design and test of embedded systems, SoC
testing, hardware/software co-design, and real-time systems. He has
published over 250 technical papers and co-authored several books in
these areas.