Chapter 4
High-Level Power Estimation
This chapter presents a conceptual framework suitable for achieving accurate and efficient estimation of power dissipation for the hardware part of embedded systems. For our analysis, the systems have been described in VHDL at the behavioral and Register Transfer levels. The goal is to provide to the designer the capability of analyzing and comparing different solutions in the architectural design space, before the synthesis task. The analytical power model is hierarchical, considering the different components of the target system architecture, mainly the data-path, the memory, the control logic and the embedded core processor. The most relevant aspect of the proposed approach is to be quite general, since it considers a general SOC architecture, suitable for many industrial applications, as well as their single components that typically compose the hardware-side of an embedded system. The model parameters taken into account in our analysis only concern the I/O behavior of the different hardware modules, and they do not refer to the internal structure of the modules. Experimental results have been obtained by applying the proposed power model to some industrial case studies and benchmark circuits.
4.1. Introduction
Power estimation methodologies should be provided at several abstraction levels in the
HW/SW co-design flow. Circuit and logic-level power estimation techniques are no longer
sufficient, due to the high complexity and high integration levels of digital systems. Accurate
low-level estimation techniques present some limitations due to the need to cope with circuit
complexity in an acceptable design time. Moreover, low-level estimation techniques can be
applied only during the last design phases, when a circuit or logic-level description is already
available. However, a re-design process at these levels could be very expensive and time
consuming.
Hence, high-level power estimation is the key to the early determination of the power
budget for embedded systems, since it is unfeasible to synthesize every design solution down to
the gate, circuit and layout levels in a reasonable time. The goal is to meet the design
turn-around time, while exploring the architectural design space widely, and to re-target the
architectural design choices early. Accuracy and efficiency of a high-level analysis should
contribute to meeting the power requirements, avoiding a costly re-design process. In general, the
relative accuracy in high-level power estimation is considered much more important than the
absolute accuracy, the main goal being the comparison among different design alternatives
[102].
This chapter is devoted to the definition of a power assessment framework for the HW part of
digital embedded systems ([53], [54], [55], [57]). This work focuses on the HW components
described at the highest levels of abstraction (behavioral or architectural) expressed in VHDL.
Up to now, most of the high-level descriptions are specified in a hardware description
language, such as Verilog or VHDL, along with other graphical formalisms suitable for
describing the functional behavior at the system-level, such as timing diagrams, State
Transition Graphs for Finite State Machines, Statecharts, etc. [76], [134]. In particular, VHDL
has become the de-facto standard in the European design community for the hardware
description and for the most part of the commercial design entry, synthesis and simulation
tools.
The main advantage of VHDL is related to the possibility of specifying the system behavior
by using a mixed description at different abstraction levels: behavioral, Register-Transfer and
structural. Therefore, VHDL provides high flexibility during both the design description and
the simulation phases. Furthermore, VHDL supports a hierarchical design approach, where
the descriptions of the elements composing the hierarchy, properly connected, perform the
global functionality. The hierarchical approach also makes it possible to use a mixed
description composed of behavioral, Register-Transfer and structural parts at the different
hierarchical levels.
Other advantages of VHDL are related to the possibility of easily specifying both the data-
path and the control-path of the system and to support the modular design approach. Hence,
VHDL allows the designer to re-use existing components. In fact, VHDL supports the
definition of functions and procedures, to decompose a complex description into smaller and
simpler functional units. These functional units can be organized as independent files, which
can be compiled and verified separately, thus supporting the definition of a library of
re-usable cells and macro-cells. Finally, VHDL also provides complete independence with
respect to the technology used, as well as the mapping between a given entity and different
architectures, through the configuration approach.
The aim of this chapter is to provide a conceptual analysis framework for accurate and
efficient estimation of power dissipation for the HW-bound part of embedded systems
described at the behavioral and RT levels. The availability of a power analysis tool at these
levels of abstraction is of paramount importance to obtain early estimation results, while
maintaining an acceptable accuracy. In fact, the behavioral and RT-level descriptions, based
on VHDL, are the design entry point for the majority of embedded systems and IC designs.
The HW partition is usually the most complicated part to estimate at a high level with
an acceptable precision, due to its heterogeneous nature. Nevertheless, the most relevant
aspect of the proposed approach is to be quite general, since it considers a general SOC
architecture, suitable for many industrial applications, as well as their single sub-parts, that
typically constitute the HW-side of an embedded system. The model parameters considered in
our analysis only concern the I/O behavior of the different HW modules, and they do not refer
to the internal structure of the modules.
The most important added value is the introduction of a third dimension, power, into the
speed versus area design space, where the architectural design exploration is usually carried
out. The main goal is to provide the designer with the capability of analyzing and comparing
different solutions in the architectural design space, before the synthesis task. In fact, the
relative power figures can be used to guide the designer in exploring the relative impact of the
different design alternatives on the quality of the final design rather than to provide absolute
power data.
In the proposed approach, the analysis is based on a probabilistic estimation of the switching
activity. The proposed model accurately accounts for both the switching activity and the
physical capacitance for all the parts composing the embedded system architecture.
Experimental results derived from the application of the proposed model to a set of
benchmark and industrial circuits have shown the effectiveness of the approach as a relative
power indicator.
The chapter is organized as follows. The discussion starts by presenting the most significant
research works related to high-level power estimation in Section 4.2. Then the power
estimation model we are focusing on is introduced in Section 4.3, while the details related to
the different components of the target system architecture are described in Sections
4.4 to 4.7. Implementation aspects of the proposed model and experimental results
obtained from some benchmark circuits and industrial case studies are reported in Section 4.8.
Finally Section 4.9 contains some concluding remarks along with some considerations on
future developments of the research.
4.2. Previous Work on High-Level Power Estimation
General surveys of power estimation techniques at different abstraction levels can be found in
[49], [35], [137], [165], [102], [118]. Power estimators operating at the different levels of the
design flow are the fundamental elements for a power-oriented design methodology to
provide a feedback on the quality of a particular design solution. This approach implies, at
each level, a loop for the exploration of various design alternatives. Up to now, several power
estimation models have been proposed in the literature at the gate, circuit and layout levels. At
these levels, the state-of-the-art can be considered mature enough and most of the EDA
vendors provide effective power analysis tools.
However, the application of such tools is not reasonable for designs including millions of
transistors. Moreover, it is more and more important to be able to estimate power figures
during the early design stages, to avoid costly re-design processes. For high-level power
estimation, the relative accuracy is much more important than the absolute accuracy, since the
goal of the analysis is to know whether one alternative is better than another one. The aim is
the wide exploration of the architectural design space. Despite the increasing interest in
design exploration and estimation at the highest levels of abstraction, few papers
addressing the power estimation problem at high level had been published until recently.
State-of-the-art surveys of the high-level power estimation techniques have been presented in [118], [102].
According to the survey of Landman in [102], high-level power estimation techniques related
to the HW part of a system can be classified as behavioral-level and architectural-level
techniques. At higher abstraction levels, power exploration tools at the SW-level and system-
level can be used to identify power metrics associated with both the SW- and HW-bound parts
to guide the system-level partitioning, as shown in Chapter 3. Moving up the abstraction
levels, the estimation process becomes much more difficult, since knowledge of the
design characteristics and of the typical activity of the hardware resources is very limited.
In general, power models at the highest levels do not provide a high degree of absolute
accuracy, the goal being limited to capture general trends.
4.2.1. Behavioral-Level Power Estimation
Typical approaches at the algorithmic or behavioral level assume some architectural
styles or templates in order to obtain power estimates based on the exploration of a limited set
of design solutions. Essentially, the behavioral approaches differ in the strategy adopted for
the activity prediction: the behavioral methods can be classified as static and dynamic activity
prediction techniques [102]. The goal of the former techniques is the estimation of the access
frequency of different HW resources, by statically analyzing the behavioral description of the
functions to be implemented [125], [33]. The latter techniques are based on a dynamic
profiling to determine the activation frequencies of various resources and the memory
accesses [96], [158].
Mehra and Rabaey [125] have developed a power estimation strategy based on a static
profiling of the Control Data Flow Graph (CDFG) representing the design behavior. The
analysis has been carried out in the context of the HYPER-LP high-level synthesis system
targeting DSP-oriented applications. The power dissipated by some HW resources, such as
data-path modules, has been analytically estimated from the CDFG. Conversely, for other
modules, such as interconnects and controllers, for which the power information available at
the behavioral-level is not sufficient, statistical models were built to estimate power based on
a stochastic study on several ASICs. Basically, the power associated to a generic hardware
resource has been estimated as:
$$P = \frac{1}{2}\, N_a\, C_a\, V_{dd}^{2}\, f_s \qquad (1)$$
where Na is the number of resource accesses over the computational period, Ca the average
capacitance switched per access, Vdd the supply voltage and fs the sampling frequency. The
capacitance estimates have been obtained by the empirical characterization of fixed-activity
models of the different HW resources. The number of resource accesses has been analytically
calculated from the algorithm for the execution units, the registers and the memories, while it
has been determined statistically from benchmarks for the interconnections and the control logic. Then,
the estimation models have been included into an exploration tool that, given the CDFG
description of an algorithm and a library of hardware modules, explores the space of the
available solutions for different values of clock periods and supply voltages. The results have
been compared with an architectural-level power estimator, called Stochastical Power
Analysis (SPA) [97], on 23 different chips, showing an average error of approximately 20%.
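As a minimal sketch of how Eq. (1) is evaluated for a single resource, the following Python fragment computes the power from the four quantities defined above. The function name and all numeric values are hypothetical and serve only as an illustration; they are not taken from [125].

```python
def resource_power(n_accesses, c_access_farads, vdd_volts, f_sample_hz):
    """Average power of one hardware resource, Eq. (1): P = 1/2 * Na * Ca * Vdd^2 * fs."""
    return 0.5 * n_accesses * c_access_farads * vdd_volts ** 2 * f_sample_hz


# Hypothetical resources: accesses per computational period and capacitance per access.
multiplier = resource_power(n_accesses=4, c_access_farads=20e-12,
                            vdd_volts=3.3, f_sample_hz=1e6)
adder = resource_power(n_accesses=6, c_access_farads=2e-12,
                       vdd_volts=3.3, f_sample_hz=1e6)

print(f"multiplier: {multiplier * 1e6:.1f} uW, adder: {adder * 1e6:.1f} uW")
```

Because the supply voltage enters quadratically, halving Vdd in this sketch reduces the estimate to one quarter, which is why an exploration over supply voltages is worthwhile.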
Dynamic activity prediction at the behavioral-level is based on a dynamic profiling to
determine the activation frequencies of various resources. During the simulation of a user-
supplied set of input patterns, the activities related to the frequency of various types of
operations and memory accesses are gathered. These access frequencies are then plugged
into a model similar to those used in the static approach. Examples of the dynamic approaches
are the Profile-Driven Synthesis System (PDSS) [96], that receives as input a behavioral sub-
set of VHDL, and the Power-Profiler approach described in [158].
The main advantage of dynamic versus static approaches is a higher accuracy, since data
dependencies are taken into account, whereas the main disadvantages are the lower
speed and the need for a set of user-supplied typical input patterns.
4.2.2. Architectural-Level Power Estimation
According to the taxonomy contained in [102], there are two classes of techniques operating
at the architectural or RT level: the analytical and the empirical techniques.
The analytical methods aim at relating the power consumption to the physical capacitances
and the switching activities of the design nets. Analytical techniques are in turn composed of
complexity-based models and activity-based models. The former models consider the design
complexity of each part of the design, in terms of equivalent gates, as a measure of the
physical capacitance, while the latter models exploit the concept of entropy, derived from the
information theory, as a measure of the average transition activity in a circuit.
More specifically, in complexity-based models ([131], [114]), the number n of equivalent
gates contained in each design function is first specified in a macro-module library; the
power estimates are then obtained by multiplying n by the average power consumed by each
equivalent gate. The main advantage of the complexity-based techniques is
that they require as inputs only a few design parameters, such as memory size or count of
equivalent gates. Nevertheless, they do not model circuit activity accurately, since an overall
fixed activity factor is typically assumed. As a matter of fact, activity factors vary with block
functionality and with the data being processed.
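The complexity-based estimate described above reduces to a product per block. The sketch below illustrates it in Python; the block names, gate counts and per-gate power are hypothetical placeholders, not values from [131] or [114].

```python
def complexity_based_power(gate_counts, power_per_gate_uw):
    """Per-block estimate: equivalent gate count n times average power per equivalent gate."""
    return {block: n * power_per_gate_uw for block, n in gate_counts.items()}


# Hypothetical macro-module library entries (equivalent gates per block).
gate_counts = {"adder16": 180, "mult16x16": 2900, "control": 450}

# A single per-gate figure implies one fixed activity factor for every block,
# which is exactly the limitation noted in the text.
per_block = complexity_based_power(gate_counts, power_per_gate_uw=0.5)
total_uw = sum(per_block.values())

print(per_block, total_uw)
```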
In activity-based models, the average power is estimated as the product of the area,
considered as a measure of the average node capacitance, and the entropy, considered as a
measure of the average activity in a circuit [138], [141], [119], [116], [113]. In the
information theory context, the entropy represents the quantity of information carried by a
random variable or process. Here, the basic idea is to try to associate the power of a block
with the amount of computational work it performs and the entropy is considered as the
metric for measuring the computational work. In these approaches, the logic functionality of a
module is known, though there are no notions about the implementation structure of the
modules.
In [141], the average power dissipated by a module is considered as proportional to the
product of the average capacitance, C, and average switching activity, S. Then, the average
circuit area, A, is used as a measure of C and the average entropy, H, as a measure of the
activity S:
Pave ∝ C × S ∝ A × H (2)
The correlation existing between the entropy of a signal x and its switching activity ESW (x) is
approximately given by: H(x) ≈ 2 ESW (x). Thus S can be approximated with H, where H
indicates the average value of H (x) over all the nodes of the module. Najm proposed in [138]
a simple formula to derive H from the input/output behavior of a module. The area A of the
block is also expressed in terms of the number of primary inputs and the entropy associated to
the primary outputs.
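To make the relation H(x) ≈ 2 ESW(x) concrete, the following sketch compares the two quantities for a single binary signal. It assumes the signal is temporally independent, so that its toggle probability is 2p(1 - p); this assumption and the function names are illustrative and are not part of [138] or [141].

```python
import math


def bit_entropy(p):
    """Shannon entropy, in bits, of a binary signal that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def switching_activity(p):
    """Toggle probability per cycle, assuming consecutive values are independent."""
    return 2.0 * p * (1.0 - p)


for p in (0.1, 0.3, 0.5):
    print(p, round(bit_entropy(p), 3), round(2.0 * switching_activity(p), 3))
```

Under this assumption the two sides coincide at p = 0.5 and the entropy is larger elsewhere, so using H in place of S tends to overestimate the activity of strongly biased signals.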
Main drawbacks of the entropy-based approaches are related to their limited accuracy, since
no timing information enters in the above expressions (i.e. no glitching power is accounted
for). Furthermore, the implicit assumption is that the capacitance is distributed uniformly over
all the circuit nodes. As for complexity-based models, the main advantage of the entropy-
based methods is that they require a very limited amount of information as input.
The empirical methods are based on power measurements of existing implementations; a
macro-modeling approach is then used to derive models from these measurements. The empirical
methods can be sub-divided into fixed-activity models and activity-sensitive models. The
former models disregard the influence of data-activity on power, while the latter consider the
effects of statistics related to data and instructions activity on power.
An example of the application of the fixed-activity macro-modeling strategy is the Power
Factor Approximation (or PFA) method presented in [151], while some examples of activity-
sensitive models are the ESP tool described in [160] and the SPA tool proposed in [99], [100],
[101]. The SPA approach, based on activity profiling and RT-level VHDL simulation, uses
two different types of activity models: one for the data-path and one for the control path. The
data-path model is referred to as the Dual Bit Type (or DBT) model, whereas the control model
is indicated as the Activity-Based Control (or ABC) model. In the DBT model, the basic
assumption is that fixed-point two's-complement data streams are characterized by two
distinct activity regions: the random activity of the least significant bits (LSBs) and the
correlated activity of the most significant bits (MSBs). The data bits (LSBs) exhibit activity
similar to uniformly distributed white noise, while the activity of the sign bits (MSBs) depends
on the sign transition probability, which is related to the temporal data stream correlation.
Thus different coefficients, derived empirically, are used to characterize the capacitance
switched in the data and sign regions. The ABC model for the power consumption of the
control paths has been presented in [101], using three implementation styles: the ROM-based
controller, the PLA-based controller and the random logic controller.
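The two activity regions assumed by the DBT model can be observed on synthetic data. The sketch below is not the SPA tool: it generates a hypothetical, temporally correlated 16-bit stream (a first-order autoregressive Gaussian process, chosen here only as a convenient test signal) and measures the toggle rate of each bit.

```python
import random


def bit_toggle_rates(samples, width=16):
    """Per-bit toggle probability of a stream of two's-complement words (bit 0 = LSB)."""
    mask = (1 << width) - 1
    toggles = [0] * width
    for prev, cur in zip(samples, samples[1:]):
        diff = (prev ^ cur) & mask  # bits that changed between consecutive words
        for bit in range(width):
            toggles[bit] += (diff >> bit) & 1
    return [t / (len(samples) - 1) for t in toggles]


def correlated_stream(n, rho=0.95, noise_std=1000.0, width=16, seed=0):
    """Hypothetical test signal: autoregressive Gaussian data, rounded and clipped."""
    rng = random.Random(seed)
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, noise_std)
        out.append(max(lo, min(hi, round(x))))
    return out


rates = bit_toggle_rates(correlated_stream(20000))
print("LSB toggle rate:", round(rates[0], 3))
print("sign-bit toggle rate:", round(rates[15], 3))
```

With a strongly correlated stream, the low-order bits toggle about half of the time, like white noise, while the sign bit toggles only when the data changes sign, which is the behavior the DBT coefficients are meant to capture.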
Other macro-modeling approaches have recently been proposed to estimate the circuit power
on a cycle-by-cycle basis. Addressing this need, a cycle-accurate power macro-modeling
approach has been introduced in [191] and [83].
In general, the main advantage of empirical models is that they have a strong correlation to
real implementations.
4.2.3. Pattern-Dependency
A common characteristic of the power estimation problem at the different abstraction levels is
that the average power is strongly related to the switching activity of the circuit nodes. This
is expressed in [137] by stating that power estimation is a pattern-dependent
process. In more detail, the input pattern-dependency of the power estimation approaches can
be classified as strong or weak.
The typical methods for power estimation based on extensive circuit simulation have been
indicated in [137] as strongly pattern-dependent processes. The main advantages of these simulation
techniques derive from their accuracy and wide applicability. However, to obtain a complete
and accurate power estimation, the designer should provide a comprehensive amount of input
patterns to be simulated, thus making this approach very time consuming and computationally
expensive. Therefore the simulation approach is almost impossible to apply to most of the
designs, due to their increasing complexity.
To avoid the need of a large amount of input patterns, the weakly pattern-dependent
approaches [137] require input probabilities. In this case, the estimation results will depend on
the probabilities supplied by the designer, reflecting the typical behavior of the input signals.
Both probabilistic techniques and statistical techniques operating at low levels of abstraction
have been presented in Section 2.5. Probabilistic techniques suitable for combinational
circuits require user-supplied input probabilities to solve the pattern dependency problem,
while statistical techniques use randomly generated input patterns to simulate the circuit
repeatedly, and then use statistical mean estimation techniques to stop the simulation once a
criterion on the closeness to the average power is met.
Other approaches for reducing the power simulation time are based on the compaction of the
long stream composed of the typical input vectors by using probabilistic automata [121],
[122], [123]. The basic idea is to define a Stochastic State Machine (SSM), which captures the
relevant statistical properties of the given input stream and then to excite this machine with a
reduced number of random inputs so that the output sequence of the machine is statistically
equivalent to the initial one. The significant statistical properties are signal and transition
probabilities and first-order spatio-temporal correlation among bits and across consecutive
time frames. Main goal of these compaction techniques is to reduce the length of the input
stream used for power characterization by one to four orders of magnitude, while preserving
an acceptable level of accuracy (i.e. approximately 5%).
In general, the methods proposed for high-level power estimation have not yet achieved the
maturity necessary to enable their use within current industrial CAD environments. Our work
is an attempt to fill such a gap, aiming at providing a high-level power model, based on
VHDL descriptions, to cover the heterogeneous modules composing the basic architecture of
embedded systems.
4.3. The High-Level Power Estimation Model for the HW Part
The power model for the HW-bound part is based on the VHDL description of the system at
the behavioral and RT levels. The analysis is based on the probabilistic estimation of the
internal nodes switching activity. The proposed approach is based on the following general
assumptions:
• the supply and ground voltage levels in the ASIC are fixed, although it is worth noting the
impact of supply voltage reduction on power;
• the design style is based on synchronous sequential circuits;
• the data transfer occurs at the register-to-register level;
• the Zero Delay Model (ZDM) has been adopted, thus ignoring the contribution of glitches
and hazards to power.
The inputs for the estimation are as follows:
• the ASIC specification consisting of a hierarchical VHDL description implementing the
target system architecture introduced in Chapter 3;
• the allocation library composed of the available components implementing the macro-
modules (such as adders, multipliers, etc.) and the basic modules (such as registers,
multiplexers, logic gates, I/O pads, etc.). Every component model includes the description
of the logic behavior, the input capacitance, the area and the power characteristics;
• the technological parameters such as frequency, power supply, derating factors
(accounting for the variations in process, voltage and temperature), etc.;
• the switching behavior of the ASIC primary I/Os.
The power model is an analytical model, which attempts to relate the average power
dissipation of the VHDL descriptions to the physical capacitance and the switching activity of
the nets. The estimation approach is hierarchical: at the highest hierarchical level, ad-hoc
analytical power models for each part of the target system architecture are proposed; these
models are in turn based on a macro-module library, at the lowest hierarchical levels.
Furthermore, to avoid the need of a huge amount of input patterns, our approach is weakly
pattern-dependent, requiring user-supplied input probabilities, reflecting the typical input
behavior and derived from the system-level specification.
In the proposed single ASIC architecture, the total average power dissipated, PAVE, is given
by:
$$P_{AVE} = P_{IO} + P_{CORE} \qquad (3)$$
where PIO and PCORE are the average power dissipated by the I/O nets and the core internal
nets, respectively. The value of PAVE can be multiplied by the derating factor, δ, taking into
account the effects of the variations of the fabrication process and the operating conditions
(voltage and temperature) on the power values contained in the target library.
The power model of the core logic is based on the models of the different components of the
target system architecture, therefore the PCORE term can be in turn expressed as:
$$P_{CORE} = P_{DP} + P_{MEM} + P_{CNTR} + P_{PROC} \qquad (4)$$
where the single terms represent the average power dissipated by the data-path, the memory,
the control logic and the embedded core processor. The power models related to the single
terms in the above equations will be detailed in the following sections, except for the PPROC
term, that is considered to be part of the power dissipated by the SW-bound part, as detailed in
Chapter 3.
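The hierarchical composition of Eqs. (3) and (4), including the optional derating factor δ, can be sketched as follows. The function name and the numeric values are hypothetical; PPROC defaults to zero here because it is accounted for in the SW-bound part.

```python
def total_average_power(p_io, p_dp, p_mem, p_cntr, p_proc=0.0, derating=1.0):
    """Combine the component estimates as in Eqs. (3) and (4), scaled by delta."""
    p_core = p_dp + p_mem + p_cntr + p_proc  # Eq. (4)
    return derating * (p_io + p_core)        # Eq. (3) multiplied by the derating factor


# Hypothetical component estimates in mW and a hypothetical worst-case derating factor.
p_ave_mw = total_average_power(p_io=12.0, p_dp=30.0, p_mem=18.0, p_cntr=5.0, derating=1.2)
print(p_ave_mw)
```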
4.4. PIO Estimation
Although a pre-synthesis analysis is performed, we assume the knowledge of the ASIC
interface in terms of primary I/O pads characteristics and related switching activity from the
system-level specifications. The set S of input, output and bi-directional nets of the ASIC can
be partitioned into N sets, such as: S = { s1, s2, ..., sk, ..., sN} , where the k-th set sk is composed
of the same type tk of I/O pads. Considering for example a set of output pads, the average
power of the set sk can be estimated as:
$$P_{s_k} = \sum_{i=1}^{n_k} P_i(C_i)\, TR_i \qquad (5)$$
where:
• nk is the number of output pads in the set sk;
• Pi (Ci) is the average power consumption per MHz of the i-th output pad in sk. The value
of Pi is computed as a function of the output load Ci at a given reference frequency f0. This
value is tabulated in the selected library (such as in [14]) as (Pf0 / f0), expressed in
[µW/MHz] as a function of Ci expressed in [pF]; Ci is the output load of the i-th output pad
expressed in [pF], derived from the system-level specifications. Note that the previous
equation is valid in a range of f0;
• TRi is the toggle rate of the i-th output pad, derived from the system-level specifications.
Similarly, the average power of the input pads can be computed, depending on the estimated
internal standard loads and input ramptime.
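Eq. (5) can be sketched for one set of output pads as follows. The characterization table, the linear interpolation between its points and the clamping outside its range are assumptions made for this illustration; a real library such as [14] defines its own tabulation and lookup rules.

```python
from bisect import bisect_right


def power_per_mhz(load_pf, table):
    """Linearly interpolate a table of (load in pF, power in uW/MHz) points."""
    loads = [c for c, _ in table]
    if load_pf <= loads[0]:
        return table[0][1]   # assumption: clamp below the characterized range
    if load_pf >= loads[-1]:
        return table[-1][1]  # assumption: clamp above the characterized range
    j = bisect_right(loads, load_pf)
    (c0, p0), (c1, p1) = table[j - 1], table[j]
    return p0 + (p1 - p0) * (load_pf - c0) / (c1 - c0)


def pad_set_power_uw(pads, table):
    """Eq. (5): sum over the pads of P_i(C_i) * TR_i, with TR_i expressed in MHz."""
    return sum(power_per_mhz(c_pf, table) * tr_mhz for c_pf, tr_mhz in pads)


# Hypothetical characterization of one output pad type and two pads of that type.
PAD_TABLE = [(10.0, 60.0), (30.0, 170.0), (50.0, 280.0)]
pads = [(20.0, 5.0), (50.0, 1.0)]  # (output load in pF, toggle rate in MHz)

print(pad_set_power_uw(pads, PAD_TABLE))
```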
4.5. PDP Estimation
The average power dissipated by the data-path can be expressed as:
$$P_{DP} = P_{REG} + P_{MUX} + P_{FU} \qquad (6)$$
where the single terms represent the average power dissipated by the registers, the
multiplexers and the functional units.
4.5.1. PREG Estimation
The preliminary step is the estimation of the number of required registers and, consequently,
the values of the toggle rate TRi for each of them. According to the abstraction level, such data
are either directly available from the RT-level description or the live variable analysis can be
applied to the behavioral-level specifications.
When an RT-level description is available, the number of required registers can be directly
derived from the analysis of the VHDL source code, while the corresponding switching
activity can be deduced by propagating the switching activity from the primary inputs
through the circuit architecture specified at the RT-level. Based on a pre-characterized module
library, we can then obtain an estimate of the capacitive loads driven by each register.
The analysis of a behavioral-level description is more complex, since the scheduling and
allocation passes are not yet performed. Therefore, the live variable analysis [136] has been
applied to the behavioral-level VHDL code to estimate the number of required registers and
the maximum switching activity of each register.
The algorithm examines the life of a variable over a set of VHDL code statements and it is
similar to the one proposed in [136], for the computation of the lifetime of a variable in terms
of its definition and use over a selected set of VHDL code statements. New passes have been
added to the algorithm proposed in [136], to derive information concerning the registers
switching activity. The proposed algorithm can be summarized as follows:
1. Compute the lifetimes of all the variables in the given VHDL code, composed of S
statements. A variable vj is said to live over a set of sequential code statements
{i, i + 1, i + 2, ..., i + n}, when the variable is written in statement i and it is last accessed
in statement (i + n). When a variable is written in a statement (i + k) in the set, but last
used in the same statement (i + k) of the next iteration, it is assumed to live over the entire
set;
2. Represent the lifetime of each variable as a vertical line from statement i through
statement (i + n) in the column j reserved for the corresponding variable vj;
3. Determine the maximum number N of overlapping lifetimes, computing the maximum
number of vertical lines intersecting with any horizontal cut-line;
4. Estimate the minimum number N of sets of registers necessary to implement the code by
using register sharing. Register sharing has to be applied whenever a group of variables,
with the same bit-width bi, can be mapped to the same register. The total number of
registers is given by $\sum_{i=1}^{N} b_i$;
5. Select a possible mapping of variables into registers by using register sharing;
6. Compute the number wi of writes to the variables mapped to the same set of registers;
7. Estimate the switching activity αi of each set of registers by dividing wi by the number of statements S:
αi = wi / S; hence, TRi = αi fCLK.
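Steps 3 and 5 of the algorithm above can be sketched in Python as follows. The sketch assumes that the lifetimes of step 1 are already available as (statement written, statement last accessed) pairs, and it does not handle loop-carried variables; the four-statement fragment and all names are hypothetical.

```python
def max_overlap(lifetimes, n_statements):
    """Step 3: largest number of lifetimes reaching or crossing any horizontal cut-line."""
    return max(
        sum(1 for first, last in lifetimes.values() if first <= s <= last)
        for s in range(1, n_statements + 1)
    )


def sharing_is_valid(mapping, lifetimes):
    """Step 5 check: variables sharing a register must have disjoint lifetimes."""
    for variables in mapping.values():
        spans = sorted(lifetimes[v] for v in variables)
        for (_, last), (first, _) in zip(spans, spans[1:]):
            if first <= last:  # the next lifetime starts before the previous one ends
                return False
    return True


# Hypothetical fragment: variable -> (statement written, statement last accessed).
lifetimes = {"t1": (1, 3), "t2": (2, 3), "t3": (3, 4), "t4": (4, 4)}
mapping = {"r1": ["t1", "t4"], "r2": ["t2"], "r3": ["t3"]}

n_min = max_overlap(lifetimes, n_statements=4)
print(n_min, sharing_is_valid(mapping, lifetimes))
```

In this fragment three lifetimes meet at statement 3, so at least three registers are needed, and the mapping that lets t1 and t4 share a register reaches that minimum.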
Figure 1 shows an application example of this algorithm, representing the differential
equation example reported in [136]. The bold dotted line at statement 7 represents the
horizontal cut-line with the maximum number (N = 9) of vertical lines reaching or crossing it.
Thus, using register sharing, the VHDL statements can be implemented with a minimum of 9
registers. A possible mapping of variables into registers is shown in Table 1.
[Chart: one vertical lifetime line per variable column u1, u2, u3, u4, u5, u6, u, x, y, y1, dx, a, drawn against the statements below]
 1. while (x<a) loop
 2. u1 := u*dx
 3. u2 := 5*x
 4. u3 := 3*y
 5. y1 := i*dx
 6. x := x+dx
 7. u4 := u1*u2
 8. u5 := dx*u3
 9. y := y+y1
10. u6 := u-u4
11. u := u6-u5
12. end loop
Figure 1: Live variable analysis for power estimation
Register ri   Variables mapped to ri   Write number wi   Switching act. αi   Bit-width bi
r1            u1, u4                   2                 1/6                 1
r2            u2, u5                   2                 1/6                 1
r3            u3                       1                 1/12                1
r4            u6, y1                   2                 1/6                 1
r5            u                        1                 1/12                1
r6            x                        1                 1/12                1
r7            y                        1                 1/12                1
r8            dx                       1                 1/12                1
r9            a                        1                 1/12                1
Table 1. Results of live variable analysis applied to the differential equation example.
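The switching activities in Table 1 follow from step 7 with S = 12 statements. The sketch below recomputes them from the write counts of the table and derives the toggle rates; the clock frequency is a hypothetical value, since none is given for this example.

```python
from fractions import Fraction

S = 12  # number of statements in the loop of Figure 1

# Register -> number of writes w_i, copied from Table 1.
writes = {"r1": 2, "r2": 2, "r3": 1, "r4": 2, "r5": 1,
          "r6": 1, "r7": 1, "r8": 1, "r9": 1}

# Step 7: alpha_i = w_i / S, kept as exact fractions for comparison with the table.
alpha = {reg: Fraction(w, S) for reg, w in writes.items()}

F_CLK_MHZ = 50  # hypothetical clock frequency, not taken from the text
toggle_rate_mhz = {reg: float(a) * F_CLK_MHZ for reg, a in alpha.items()}

for reg in writes:
    print(reg, alpha[reg], round(toggle_rate_mhz[reg], 2))
```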
Based on this approach, it is possible to quickly explore the architectural design space. For
example, we can evaluate the influence of register sharing on power, by adopting
different criteria in the choice of which and how many variables to map onto the same
register. This method enables us to estimate the number of resources allocated and
consequently the related power consumption and area occupancy.
Regarding PREG, it is worth noting that latches and flip-flops consume power not only during
output transitions, but also at every clock edge in the internal clock buffers depicted in
Figure 2, even when the data stored in the register does not change. Thus, our analytical
model of registers takes into account both the switching and the non-switching power, the
latter being due to the internal clock buffers. The non-switching power dissipated by internal
clock buffers accounts for approximately 30% of the average power of the registers, as
reported for example in [105] for a cell-based CMOS 3.3 V technology. Note that, as depicted
in Figure 2, the internal clock buffers are independent of the output load; thus the
non-switching power of latches and flip-flops is load-independent, but it depends on the
clock input ramp time.
[Figure 2 (schematics not reproduced): each model has data input D, clock input CP and output Q.]
Figure 2: The latch and D flip/flop models for power estimation
Let the set of registers, S, be composed of N sets, such that S = {s1, s2, ..., sk, ..., sN},
where the k-th set sk is composed of registers of the same type tk. Globally, the average
power of the registers can be estimated as:
$P_{REG} = \sum_{k=1}^{N} (P_{s_k} + P_{NS_k})$ (7)
where:
• Psk is the average power of each set sk;
• PNSk is the average non-switching power dissipated by the internal clock buffers of the
registers in the set sk, that is, the average power dissipated by the internal clock buffers
when there are no output transitions.
Note that the measured average power Psk, tabulated in the target library, also includes the
power dissipated by the internal clock buffers during clock edges corresponding to output
transitions. Hence the estimated value of Psk should account for a toggle rate given by
TRsk, while the estimated value of PNSk should consider a toggle rate of (fCLK - TRsk).
The estimated values of Psk and PNSk, for the k-th set sk, are respectively given by:
$P_{s_k} = \sum_{i=1}^{n_k} P_i(C_i)\, TR_i$ (8)
$P_{NS_k} = P_{0k} \sum_{i=1}^{n_k} (f_{CLK} - TR_i)$ (9)
where:
• nk is the estimated number of registers in the set sk;
• Pi (Ci) is the average power consumption per MHz of the i-th register in sk. The value of
Pi has been computed by running SPICE simulations, at a given reference frequency f0, for
different output standard loads (representing both load cells and interconnections) and
clock input ramp times. Thus Pi is tabulated in our allocation library in [µW/MHz] as a
function of the output load Ci, expressed in equivalent standard loads, and of the input
ramp time, expressed in [ns];
• P0k is the non-switching power consumption per MHz of a single register of type tk. The
value of P0k expressed in [µW/MHz] has been computed running SPICE simulations, at a
given reference frequency f0, as a function of the clock input ramptime;
• Ci is the estimated output load of the i-th register in the set sk expressed in equivalent
standard loads;
• TRi is the estimated toggle rate of the i-th register in the set sk, obtained by using the live
variable analysis.
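The register model of equations (7) to (9) can be sketched in Python as follows. This is only
an illustration under stated assumptions: the library look-up P_i(C_i) is replaced by a
hypothetical linear function (dff_power_per_mhz), the dependence on the clock input ramp time
is ignored, and all numeric values are invented for the example rather than taken from a real
library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Register:
    load: float         # C_i, in equivalent standard loads
    toggle_rate: float  # TR_i, in MHz


@dataclass
class RegisterSet:
    power_per_mhz: Callable[[float], float]  # P_i(C_i), in uW/MHz
    p0: float                                # P_0k, in uW/MHz
    registers: List[Register]


def switching_power(reg_set: RegisterSet) -> float:
    """Equation (8): sum of P_i(C_i) * TR_i over the registers of the set."""
    return sum(reg_set.power_per_mhz(r.load) * r.toggle_rate
               for r in reg_set.registers)


def non_switching_power(reg_set: RegisterSet, f_clk: float) -> float:
    """Equation (9): P_0k * sum of (f_CLK - TR_i) over the registers of the set."""
    return reg_set.p0 * sum(f_clk - r.toggle_rate for r in reg_set.registers)


def register_power(sets: List[RegisterSet], f_clk: float) -> float:
    """Equation (7): total over all register sets, in uW."""
    return sum(switching_power(s) + non_switching_power(s, f_clk) for s in sets)


def dff_power_per_mhz(load: float) -> float:
    """Hypothetical library entry: power per MHz grows linearly with the load."""
    return 0.5 + 0.1 * load


example_set = RegisterSet(
    power_per_mhz=dff_power_per_mhz,
    p0=0.2,  # hypothetical non-switching power per MHz
    registers=[Register(load=2.0, toggle_rate=10.0),
               Register(load=2.0, toggle_rate=20.0)],
)
p_reg = register_power([example_set], f_clk=100.0)
print(f"P_REG = {p_reg:.2f} uW")
```

With these invented figures the non-switching term is the larger contribution, which is the
situation the separate PNSk term is meant to capture for registers with low toggle rates.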
4.5.2. PMUX Estimation
To estimate the size and number of multiplexers from the VHDL code, it is necessary to
determine the number of paths in the data-path. The approach also relies on the power model
of a 2-input non-inverting multiplexer, which is based on both the static signal probability
of the selection net and the switching activities of the input nets. However, as for the
registers, a different approach has been used depending on the abstraction level of the
original VHDL description.
Concerning the RT-level VHDL descriptions, the architecture is already defined and thus it is
easy to derive the number and size of the multiplexers in the circuit, as well as the related
capacitive loads. To estimate the switching activity of the multiplexer outputs, we
propagate the signal probability from the primary inputs through the given architecture.
Concerning behavioral VHDL descriptions, we need to consider a possible architecture onto
which the behavioral description can be mapped. The analysis of the design paths and the
related notation are similar to those used in [136]; however, in the proposed approach we
also consider the paths from primary inputs to internal registers and from internal registers
to primary outputs. A path from the source component S to the target component T is
represented as T < S. Note that all memory accesses require the use of intermediate registers.
Given the target architecture represented in Figure 1, the possible paths can be classified in
the following categories:
1. primary input to register (R < I);
2. register to primary output (O < R);
3. register to register (R < R);
4. register to functional unit (U < R);
5. functional unit to register (R < U);
6. register to memory (M < R);
7. memory to register (R < M).
The algorithm used to determine the possible paths in the data-path could be easily derived
from the algorithm described in [136], but considering also the possible paths of the
categories 1 and 2.
The number and size of the selectors used as input to each register are obtained by counting
the paths that have the given register as destination. The number and sizes of the
multiplexers used as input to each functional unit are evaluated similarly. The analysis is
based on some assumptions regarding the resource allocation. For example, if we decide to
allocate a single operator to execute the operations of the same type, we need to compute all
the possible paths that have the given operator as destination. The situation is different if
several operators are allocated to execute the same operations. In this case we assume an
average usage of the operators: we divide the sum of all paths with the same operator type as
destination among all the allocated operators of this type, and we distribute the average
number of paths over all the inputs of each operator. In this way, we can estimate the size
s(ti) of the multiplexer for each input of the allocated operators of type ti as:
$s(t_i) = \frac{w(t_i)}{2\, n(t_i)}$ (10)
where w(ti) is the number of write accesses to the operators of type ti (i.e. the number of
paths with ti as destination) and n(ti) is the number of operators of type ti that we suppose
to allocate.
[Figure 3 (schematic not reproduced): data inputs A and B, selection input S, output Z.]
Figure 3: The 2-input non-inverting multiplexer model for power estimation
Once the size and number of multiplexers have been computed, we derive the switching
activity of the output node of each multiplexer, given the model of the two-input
non-inverting multiplexer depicted in Figure 3. A simplified model for the maximum switching
activity of the output Z of a 2-input non-inverting multiplexer is:
$\alpha_Z = \alpha_A (1 - p^1_S) + \alpha_B\, p^1_S$ (11)
where:
• αA is the switching activity of input A;
• αB is the switching activity of input B;
• $p^1_S$ is the static signal probability of the selection net S.
By recursively applying the previous equation, we can obtain the switching activity of
multiplexers with larger sizes, such as the 4-input non-inverting multiplexer:
$\alpha_Z = [\alpha_A\, p^1_{S1} + \alpha_B (1 - p^1_{S1})]\, p^1_{S2} + [\alpha_C\, p^1_{S1} + \alpha_D (1 - p^1_{S1})] (1 - p^1_{S2})$ (12)
where A, B, C, D are the data inputs, and S1 and S2 are the selection inputs.
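The recursive use of the 2-input model can be sketched in Python as follows. The sketch
applies equation (11) level by level to a balanced multiplexer tree; it uses the input
ordering of equation (11), in which the first input of each pair is weighted by one minus the
selection probability, so the labelling of the data inputs may differ from the one printed in
equation (12). The function names and the numeric values are hypothetical.

```python
from typing import List, Sequence


def mux2_activity(alpha_a: float, alpha_b: float, p_sel: float) -> float:
    """Equation (11): output activity of a 2-input multiplexer."""
    return alpha_a * (1.0 - p_sel) + alpha_b * p_sel


def mux_tree_activity(alphas: Sequence[float], p_sels: Sequence[float]) -> float:
    """Apply equation (11) recursively to a balanced tree of 2-input multiplexers.

    alphas : switching activities of the 2**len(p_sels) data inputs
    p_sels : static probabilities of the selection nets, first tree level first
    """
    if len(alphas) != 2 ** len(p_sels):
        raise ValueError("need 2**len(p_sels) data inputs")
    level: List[float] = list(alphas)
    for p_sel in p_sels:
        # Pair adjacent inputs; every pair at this level shares one selection net.
        level = [mux2_activity(level[i], level[i + 1], p_sel)
                 for i in range(0, len(level), 2)]
    return level[0]


# 4-input multiplexer with equiprobable selection nets (illustrative values).
alpha_z = mux_tree_activity([0.1, 0.3, 0.5, 0.7], [0.5, 0.5])
print(alpha_z)
```

With equiprobable selection nets the result is simply the mean of the input activities,
which gives a quick sanity check for the recursion.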
Globally, the average power dissipated by the multiplexers can be estimated as:
$P_{MUX} = \sum_{i=1}^{N} P_i$ (13)
where N is the estimated number of multiplexers and Pi is the average power of each
multiplexer.
The value of Pi for the i-th multiplexer is given by: Pi = Pti (Ci) TRi, where Pti is the
average power consumption per MHz of a 2-input non-inverting multiplexer and TRi is the
toggle rate of the output of the i-th multiplexer.
4.5.3. PFU Estimation
For the estimation of the average power of the functional units, we use complexity-based
analytical models [102], where the complexity of each functional unit is described in terms
of equivalent gates. To estimate the number of equivalent gates necessary to implement
a given function of the data-path, we use a library of macro-modules such as adders,
multipliers, etc. The library should include the estimated number of logic gates for each
macro-module, depending on the number of operands and the bit-width of each operand. Once
the number of equivalent gates for each macro-module has been evaluated, the estimated
power dissipated by the functional units can be expressed as:
$P_{FU} = \sum_{i=1}^{N} P_i$ (14)
where N is the number of macro-modules, and Pi is the power of the i-th macro-module given
by:
$P_i = n_i\, P_{TECH}\, TR_i$ (15)
where PTECH is a technological parameter expressed in [µW/(gate MHz)]; ni is the estimated
number of logic gates in the i-th macro-module; TRi is the toggle rate of the output net of
the i-th macro-module.
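A minimal Python sketch of equations (14) and (15) is given below. The gate counts, toggle
rates and the value of P_TECH are hypothetical placeholders, not figures from the macro-module
library or from the target technology.

```python
from typing import Iterable, Tuple


def macro_module_power(gates: int, p_tech: float, toggle_rate: float) -> float:
    """Equation (15): P_i = n_i * P_TECH * TR_i, in uW."""
    return gates * p_tech * toggle_rate


def functional_unit_power(modules: Iterable[Tuple[int, float]],
                          p_tech: float) -> float:
    """Equation (14): sum of the macro-module powers.

    modules : (equivalent gate count n_i, output toggle rate TR_i in MHz) pairs
    p_tech  : technology parameter in uW/(gate MHz)
    """
    return sum(macro_module_power(n, p_tech, tr) for n, tr in modules)


# Hypothetical data-path: one adder (200 gates) and one multiplier (1500 gates).
example_modules = [(200, 10.0), (1500, 5.0)]
p_fu = functional_unit_power(example_modules, p_tech=0.01)
print(f"P_FU = {p_fu:.1f} uW")
```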
4.6. PMEM Estimation
A power dissipation model for a memory cell, at a low level of abstraction, has been proposed
in [175] as:
$P_{memcell} = \frac{2^k}{2} (c_{int}\, l_{column} + 2^{n-k}\, C_{tr})\, V_{dd}\, V_{swing}\, f_{clk}$ (16)
where $2^k$ is the number of cells in a row, cint is the wire capacitance per unit length,
lcolumn is the memory column length, $2^{n-k}$ is the number of cells in a column, Ctr is the
minimum size drain capacitance, and Vswing is the bit line voltage swing.
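Equation (16) can be evaluated directly, as in the following Python sketch. The function name
and the choice of SI units are assumptions made for the example; the source does not state
the units of the individual parameters.

```python
def memcell_power(n: int, k: int, c_int: float, l_column: float,
                  c_tr: float, v_dd: float, v_swing: float,
                  f_clk: float) -> float:
    """Equation (16), assuming SI units (F/m, m, F, V, V, Hz -> W).

    2**k is the number of cells in a row and 2**(n-k) the number of
    cells in a column.
    """
    cells_per_row = 2 ** k
    cells_per_column = 2 ** (n - k)
    column_capacitance = c_int * l_column + cells_per_column * c_tr
    return (cells_per_row / 2) * column_capacitance * v_dd * v_swing * f_clk


# Unit-valued parameters make the structure of the formula easy to check by hand:
# (2**2 / 2) * (1*1 + 2**2 * 1) * 1 * 1 * 1 = 2 * 5 = 10
print(memcell_power(n=4, k=2, c_int=1.0, l_column=1.0, c_tr=1.0,
                    v_dd=1.0, v_swing=1.0, f_clk=1.0))
```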
Considering a fully CMOS single port static RAM, at a high-level of abstraction, we assume
to have in the target library the information related to the power consumption of a single
memory cell Pcell and of a single memory output buffer.
The average power dissipation during a read access to a single row of the array, composed of
n rows and m columns, is proportional to the inverse of the read access time ta and to the sum
of the average power dissipated by the row decoder, the m memory cells composing the i-th
row and the output buffers.
In particular, the power dissipated by the row decoder can be estimated with a
complexity-based model, where the number of equivalent gates is proportional to the product
($n \times \log_2 n$) and the load capacitance is the word line capacitance.
4.7. PCNTR Estimation
This section describes the contribution to the power consumption due to the control part of
the target system architecture, described as a set of FSMs represented by STGs. The proposed
model for the power dissipation of an FSM is a probabilistic model, where we approximate the
average switching activities of the FSM nodes by using the switching probabilities (or
transition probabilities) derived by modeling the FSM as a Markov chain (as described
in Chapter 2). Given a typical implementation of an FSM, composed of a combinational circuit
and a set of state registers, as depicted in Figure 4, we consider the different contributions
to the global average power:
$P_{CNTR} = P_{IN} + P_{STATE\_REG} + P_{COMB} + P_{OUT}$ (17)
where:
• PIN is the average power dissipated by the primary inputs PI;
• PSTATE_REG is the average power dissipated by the state registers;
• PCOMB is the average power dissipated by the combinational logic;
• POUT is the average power dissipated by the primary outputs.
[Figure 4 (block diagram not reproduced): the Combinational Logic block receives the Primary
Inputs and the Present State Bit Lines and drives the Primary Outputs and the Next State Bit
Lines; the Registers block, clocked by CLK, stores the state.]
Figure 4: The Finite State Machine model for power estimation.
The power estimation models dealing with each term of the above equation are described in
the following. As basic assumptions, we suppose that the FSM description is available in the
form of a State Transition Graph (STG), where each state is represented symbolically and
nothing is known about the structure of the combinational logic implementing the next state
and output functions. The input static signal probabilities and the input switching activity
factors are supposed to be given by the system-level specifications, being derived by
simulating the FSM at a high abstraction level or by direct knowledge of the typical input
behavior. Furthermore, we assume a Zero Delay Model for the logic gates and synchronous
primary inputs. Under these assumptions, we can ignore the effects of glitches and hazards on
the state bit lines; therefore the switching activities of the present and next state bit
lines are equal.
Given the FSM description and the input probabilities, the first step of our estimation consists
of the computation of the total state transition probabilities for each edge in the graph, by
modeling the FSM as a Markov chain and following the same method shown in Chapter 2.
The second step consists of finding a state assignment that minimizes the power dissipation,
by applying one of the state assignment methods described in Chapter 7.
4.7.1. Switching Activity Estimation for the State Bit Lines
The switching activity of the state bit lines depends on both the state encoding and the
total state transition probabilities between each pair of states in the STG [185].
Let us generalize the concept of state transition probability to transitions occurring
between two disjoint sub-sets of states, Si and Sj, contained in the set of states
S = {s1, s2, ..., sns}, as defined in [185]:
$TP(S_i \leftrightarrow S_j) = \sum_{s_i \in S_i} \sum_{s_j \in S_j} (P_{ij} + P_{ji})$ (18)
Let bi be the i-th bit (1 ≤ i ≤ nvar) of the state code (called state bit) and nvar the
number of state bits ($\lceil \log_2 n_s \rceil \le n_{var} \le n_s$). We consider the two
sub-sets of states in which the i-th state bit assumes the value one and zero respectively.
The switching activity αbi of the state bit line bi is given by [185]:
$\alpha_{b_i} = TP(States(b_i = 1) \leftrightarrow States(b_i = 0))$ (19)
4.7.2. Switching Activity Estimation for the Primary Outputs
Considering a Moore-type FSM, the switching activity of the primary outputs can be defined
similarly to the switching activity of the state bit lines, depending on both the given output
encoding and the total state transition probabilities. In fact, in a Moore-type FSM, the total
state transition probabilities Pij between the two states si and sj are equal to the total
transition probabilities between the corresponding outputs oi and oj, where the output row
vector oi (i = 1, 2, ..., ns) is composed of the nO primary outputs:
$(y_i^1, \dots, y_i^l, \dots, y_i^{n_O})$.
Let us define the transition probability of the transitions occurring between two disjoint
sub-sets of outputs, Oi and Oj, contained in the set of the outputs O = {o1, o2, ..., ons}, as:
$TP(O_i \leftrightarrow O_j) = \sum_{o_i \in O_i} \sum_{o_j \in O_j} (P_{ij} + P_{ji})$ (20)
Let ym be the m-th output bit (1 ≤ m ≤ nO) and nO the number of primary outputs. We consider
the two sets of outputs in which the m-th output bit assumes the value one and zero
respectively. The switching activity αym of the primary output ym is given by:
$\alpha_{y_m} = TP(Outputs(y_m = 1) \leftrightarrow Outputs(y_m = 0))$ (21)
As an example, we consider the same Moore-type FSM considered in Chapter 2, where a state
encoding has been fixed. Figure 5 shows the total transition probabilities associated with
each edge. The figure also shows the steady state probabilities, the state codes and the
output codes associated with each node.
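[Figure 5 (state transition graph not reproduced). Recoverable annotations:
 nodes: st1, st2, st3, st4;
 total transition probabilities: P11 = 1/58, P12 = 3/58, P21 = 3/58, P23 = 9/58,
 P33 = 3/29, P34 = 9/29, P42 = 9/58, P43 = 9/58;
 steady state probabilities: P1 = 2/29, P2 = 6/29, P3 = 12/29, P4 = 9/29;
 node labels (state code/output code): 00/00, 01/11, 11/01, 10/10, which correspond to
 st1, st2, st3 and st4 respectively if the labels are matched against the state and output
 sets used in the switching activities that follow.]
Figure 5: Steady state probabilities and total transition probabilities for the example FSM with encoded states.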
The switching activities of the state bit lines are given by:
αb1 = TP (S1 = {s3, s4} ↔ S2 = {s1, s2}) = 9/29
αb2 = TP (S3 = {s2, s3} ↔ S4 = {s1, s4}) = 21/29
The switching activities associated to the outputs are:
αy1 = TP (O1 = {o2, o4} ↔ O2 = {o1, o3}) = 21/29
αy2 = TP (O3 = {o2, o3} ↔ O4 = {o1, o4}) = 21/29
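The four values above can be reproduced with a short Python sketch of equations (18) to (21).
The transition probabilities are those annotated in Figure 5; the assignment of state and
output codes to st1..st4 is an assumption inferred from the sets S1..S4 and O1..O4 used above,
and the function names are hypothetical.

```python
from fractions import Fraction as F

# Total transition probabilities P_ij annotated in Figure 5 (missing edges are 0).
P = {(1, 1): F(1, 58), (1, 2): F(3, 58), (2, 1): F(3, 58), (2, 3): F(9, 58),
     (3, 3): F(3, 29), (3, 4): F(9, 29), (4, 2): F(9, 58), (4, 3): F(9, 58)}

# Codes per state, written as "b1 b2" and "y1 y2" (inferred assignment).
STATE_CODE = {1: "00", 2: "01", 3: "11", 4: "10"}
OUTPUT_CODE = {1: "00", 2: "11", 3: "01", 4: "10"}


def tp(set_a, set_b):
    """Equations (18)/(20): total transition probability between two disjoint sets."""
    return sum(P.get((i, j), F(0)) + P.get((j, i), F(0))
               for i in set_a for j in set_b)


def bit_activity(codes, bit):
    """Equations (19)/(21): activity of one code bit (0 = leftmost bit)."""
    ones = {s for s, code in codes.items() if code[bit] == "1"}
    zeros = set(codes) - ones
    return tp(ones, zeros)


print("alpha_b1 =", bit_activity(STATE_CODE, 0))   # 9/29
print("alpha_b2 =", bit_activity(STATE_CODE, 1))   # 21/29
print("alpha_y1 =", bit_activity(OUTPUT_CODE, 0))  # 21/29
print("alpha_y2 =", bit_activity(OUTPUT_CODE, 1))  # 21/29
```

Self-loops such as P11 and P33 never contribute, since both end points of a self-loop lie in
the same sub-set.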
4.7.3. PIN Estimation
As mentioned before, let us assume that the input static signal probabilities and the input
switching activity factors are given by the system-level specifications.
The average power dissipated by the k-th primary input belonging to the set
PI = {x1, x2, ..., xk, ..., xnI} depends on the switching activity factor αxk and the input
load capacitance Cxk, the latter being proportional to the number of literals, nlitxk, that
the k-th primary input is driving in the combinational part, and to the estimated capacitance
Clit due to each literal [185]. Therefore, the average power PIN can be estimated as:
$P_{IN} = \sum_{x_k \in PI} P_{x_k}(C_{x_k})\, TR_{x_k}$ (22)
where: Cxk = nlitxk Clit; TRxk = αxk fCLK and Pxk (Cxk) is the average power consumption per
MHz of the buffer cell driving the k-th input.
4.7.4. PSTATE_REG Estimation
The average power dissipated by the state registers, PSTATE_REG, can be derived by using the
switching activity αbi of the i-th state bit line bi, where 1 ≤ i ≤ nvar and the corresponding
toggle rate is TRbi = αbi fCLK. The term PSTATE_REG accounts for the switching and
non-switching power of the state registers:
$P_{STATE\_REG} = \sum_{i=1}^{n_{var}} (P_i + P_{NS_i})$ (23)
where nvar is the number of state registers and Pi and PNSi are the average switching and
non-switching power dissipated by each state register. As before, the switching power Pi also
includes the power dissipated by the internal clock buffers during clock edges corresponding
to output transitions. Hence the terms Pi should account for a toggle rate given by TRbi,
while the terms PNSi should consider a toggle rate of (fCLK - TRbi).
The estimated values of Pi and PNSi are respectively given by:
$P_i = P_{t_i}(C_i)\, TR_{b_i}$ (24)
$P_{NS_i} = P_{0i} (f_{CLK} - TR_{b_i})$ (25)
where:
• Pti is the average power consumption per MHz of the i-th register of type ti as a function of
the load capacitance Ci and the input ramptime;
• P0i is the non-switching power consumption per MHz of a single register of type ti;
• Ci = nlitbi Clit is proportional to the number of literals, nlitbi, that the i-th state bit line is
driving in the combinational part, and the estimated capacitance Clit due to each literal,
expressed in equivalent standard loads.
4.7.5. PCOMB Estimation
The average power dissipated by the combinational logic PCOMB has been estimated by
considering the 2-level logic implementation, before the minimization step. The i-th state bit
line bi (where 1 ≤ i ≤ nvar) can be expressed in canonical form as the sum of Nbi minterms
($N_{b_i} \le 2^{n_{lit}}$, where nlit is the number of literals and $2^{n_{lit}}$ is the
maximum number of minterms). Similarly, the m-th output bit ym (1 ≤ m ≤ nO) can be expressed
in canonical form as the sum of Nym minterms ($N_{y_m} \le 2^{n_{lit}}$).
Let us assume that a single AND gate represents the generic minterm; hence the maximum
number of AND gates in the AND-plane is $2^{n_{lit}}$, while in general
$n_{AND} \le 2^{n_{lit}}$. Given the probabilistic model of the switching activity of the
generic nlit-input AND gate, the static probability associated with the output node is the
product of the static probabilities of the inputs:
$p^1_{AND} = \prod_{i=1}^{n_{lit}} p^1_i$ (26)
Assuming that the inputs are spatio-temporally independent, the switching activity of the
output node for the nlit-input AND gate is given by:
$\alpha_{AND} = 2 \left(1 - \prod_{i=1}^{n_{lit}} p^1_i\right) \prod_{i=1}^{n_{lit}} p^1_i$ (27)
To compute the switching activity of the output node of the generic AND gate, we need the
static probabilities of primary inputs and state bit lines. Given the vector of the steady
state probabilities P = {P1, ..., Pk, ..., Pns}T (obtained by solving the Chapman-Kolmogorov
equations, as shown in Chapter 2), the probability that the state bit line bi is equal to 1
corresponds to the sum of the steady state probabilities of the set of states SI with this
bit equal to 1:
$p^1_{b_i} = \sum_{k \in S_I} P_k$ (28)
In this case, we can derive an upper bound for the estimated power of the AND-plane:
$P_{COMB} = \sum_{i=1}^{n_{AND}} P_i(C_i)\, TR_i$ (29)
where:
• Pi (Ci) is the average power consumption per MHz of the i-th nlit-input AND gate;
• Ci is the capacitance driven by the i-th nlit-input AND gate;
• TRi = αi fCLK is the toggle rate of the i-th nlit-input AND gate (derived by using the
switching activity model of the nlit-input AND gate).
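Equations (26) to (28) can be sketched in Python as follows, reusing the steady state
probabilities of the example FSM of Figure 5. The state codes are the inferred assignment
st1 = 00, st2 = 01, st3 = 11, st4 = 10 (an assumption), and a complemented literal is assumed
to enter a minterm with probability one minus that of the uncomplemented signal. All function
names are hypothetical.

```python
from fractions import Fraction as F
from math import prod

# Steady state probabilities P_k of the example FSM (Figure 5).
STEADY_STATE = {1: F(2, 29), 2: F(6, 29), 3: F(12, 29), 4: F(9, 29)}
# Inferred state codes, written as "b1 b2".
STATE_CODE = {1: "00", 2: "01", 3: "11", 4: "10"}


def bit_one_probability(codes, steady_state, bit):
    """Equation (28): static probability that a state bit is 1 (0 = leftmost bit)."""
    return sum(steady_state[s] for s, code in codes.items() if code[bit] == "1")


def and_static_probability(input_probs):
    """Equation (26): product of the static probabilities of the inputs."""
    return prod(input_probs)


def and_switching_activity(input_probs):
    """Equation (27): 2 * (1 - p) * p, with p the output static probability."""
    p = and_static_probability(input_probs)
    return 2 * (1 - p) * p


p_b1 = bit_one_probability(STATE_CODE, STEADY_STATE, 0)  # P3 + P4
p_b2 = bit_one_probability(STATE_CODE, STEADY_STATE, 1)  # P2 + P3

# Minterm b1 AND (NOT b2): the complemented literal contributes 1 - p_b2.
alpha_minterm = and_switching_activity([p_b1, 1 - p_b2])
print(p_b1, p_b2, alpha_minterm)
```

Multiplying the resulting activity by fCLK gives the toggle rate TRi used in equation (29).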
4.7.6. POUT Estimation
POUT is the average power dissipated by the OR-plane, which is composed of nvar Nbi-input OR
gates corresponding to the state bit lines, driving the input capacitance of the state
registers, and nO Nym-input OR gates corresponding to the primary outputs, driving the output
load capacitances.
Therefore, the upper bound for the power of the OR-plane is composed of two terms. The first
term is proportional to the switching activity factors αbi of the state bit lines bi, while
the second term is proportional to the switching activity factors αyi of the primary outputs:
$P_{OUT} = \sum_{i=1}^{n_{var}} P_i(C_{IN\_REG})\, TR_{b_i} + \sum_{i=1}^{n_O} P_i(C_{y_i})\, TR_{y_i}$ (30)
where:
• Pi (CIN_REG) is the average power consumption per MHz of the i-th Nbi-input OR gate
driving the i-th state bit line;
• CIN_REG is the input capacitance of each state register;
• TRbi = αbi fCLK is the toggle rate of the i-th state bit line bi;
• Pi (Cyi) is the average power consumption per MHz of the i-th Nyi-input OR gate driving
the i-th primary output;
• Cyi is the output load capacitance of the i-th primary output;
• TRyi = αyi fCLK is the toggle rate of the i-th primary output.
4.8. Implementation and Experimental Results
The high-level power estimation model presented in the previous sections has been
implemented in the program vhdl2pow, written in C, which receives as input the
VHDL description of the different sub-parts of the target system architecture. The
vhdl2pow program is composed of three main modules, each handling a different type of
VHDL description:
• the DP-RT module, for data-path VHDL descriptions at the RT-level;
• the DP-BEH module, for data-path VHDL descriptions at the behavioral-level;
• the CNTR module, for VHDL descriptions of the control part.
The first step of all the modules is the lexical and syntactical analysis of the VHDL
code, implemented in a parsing function. The Lex and Yacc programs have been used to
perform the static analysis of the VHDL source code, based on the lexical and syntactical
rules. The result of the parsing step is an intermediate structure, containing all the
information necessary for the subsequent semantic analysis step.
The semantic analysis is executed by visiting the syntactic tree that represents the
intermediate structure in order to recognize the constructs. An example of VHDL source code
and the corresponding syntactic tree is depicted in Figure 6. The semantic analysis is
carried out in different ways depending on the type of circuit description, since each
analysis is tied to a particular description style.
At the RT-level, the architecture is well defined, since the scheduling and allocation steps
have already been performed. The analysis aims at identifying the architectural model
composed of logic modules (such as adders, multipliers, etc.) and their interconnections. The
data structure modeling the circuit architecture is used to propagate the switching
activity and to calculate the load capacitances of the nodes. The final step is to estimate
all the parameters in the power model associated with the architectural description.
Concerning the behavioral-level descriptions, two different estimation approaches can be
adopted. The first approach derives an estimate of the switching activity and capacitive loads
directly from the information obtained during the live variable analysis, while the second
approach derives an architectural model corresponding to the behavioral-level model. In the
first approach, the semantic analysis is strictly connected to the results of the live variable
analysis, to obtain an estimate of the resources necessary to implement the circuit
functionality. In the second approach, the performed operations are quite similar to those
performed during high-level synthesis. In practice it is necessary to make some assumptions
about the scheduling of the operations and a possible resource allocation. Then the same
analysis techniques adopted for the RT-level descriptions can be applied to the corresponding
architectural description.
Given the complexity and large number of constructs provided by the VHDL language, our
analysis can accept as input only a sub-set of VHDL constructs, summarized in Table 2. This
imposes some limitations on the VHDL modeling style; however, the sub-set has been
derived by taking into consideration some industrial case studies developed at the R&D Labs
of SGS-Thomson and a set of standard benchmark circuits.
(a)
comb : process (current_state, IN00, IN10, IN01, IN11) begin
  case CURRENT_STATE is
    when ST1 =>
      Y_OUT <= "00";
      if( IN00 or IN01 or IN10 ) then
        NX_ST <= ST2;
        Y_OUT <= "11";
      else
        NX_ST <= ST1;
        Y_OUT <= "00";
      end if;
    when ST2 =>
      Y_OUT <= "11";
      if( IN00 or (IN01 and IN10) ) then
        NX_ST <= ST3;
        Y_OUT <= "01";
      else
        NX_ST <= ST1;
        Y_OUT <= "00";
      end if;
  end case;
end process;
(b)
[Syntactic tree not reproduced: the root "process" node carries the sensitivity list
(currentstate, IN00, IN01, IN11, IN10) and a "case" node on currentstate; the ST1 branch
contains the assignment Y_OUT <= "00" and an "if" node whose condition is the "or" of the
inputs and whose "then" and "else" sub-trees hold the NX_ST and Y_OUT assignments.]
Figure 6: An example of (a) VHDL source code and (b) the related syntactic tree
Constructs                Considered                                           Not Considered
Declarations              constant, function, variable                         file, procedure, generic
Types                     integer, bit, boolean, std_logic, std_logic_vector   real, access, character, record, physical
Logic Operators           and, or, nand, nor, xor, not
Relational Operators      =, /=, <, >, <=, >=
Arithmetic Operators      +, -, *, /                                           **, rem, abs, mod
Sequential Instructions   case, if, while, wait                                exit, next, for, null
Attributes                high, low, left, right, range, length
Table 2: VHDL constructs considered in vhdl2pow
For the control logic, the vhdl2pow program provides a set of routines to extract a state
transition table from the behavioral VHDL description and to convert it to the BLIF format
(Berkeley Logic Interchange Format), a description formalism for sequential circuits adopted
in the OCTTOOLS package. An example of the VHDL description of a Moore-type and a Mealy-type
FSM accepted by vhdl2pow is reported in Figure 3. The description is based on the usage of a
single process to describe both the state and output behavior.
A set of experiments has been conducted by applying the power estimation model
implemented in vhdl2pow to several industrial case studies and standard benchmarks.
All the measures have been derived by using the HCMOS6 technology (0.35 µm, 3.3 V),
supplied by SGS-Thomson Microelectronics, at the target operating frequency of 100 MHz.
To verify the accuracy of the proposed model, the obtained results have been compared with
the estimates obtained after synthesis with the Synopsys tools targeting the HCMOS6
technology. For the synthesis of RT-level descriptions we used the Design Compiler tool,
while for behavioral-level descriptions we used the Behavioral Compiler tool.
The estimation results obtained by the proposed methodology at the pre-synthesis level have
been compared with the results derived by using the Synopsys Design Power tool, based on the
synthesized gate-level netlist and the probabilistic approach to propagate the switching
activity of internal nodes. The Synopsys estimates are expected to be more accurate than the
vhdl2pow estimates, since they are derived from a lower-level description, the synthesized
gate-level netlist. Notice that both methods are based on a Zero Delay Model.
(a)
comb:process (current_state, IN00, IN01, IN10, IN11)
begin
  case CURRENT_STATE is
    when ST1 =>
      Y_OUT_temp <= "00";
      if ( IN00 or IN01 or IN10 ) then
        NEXT_STATE <= ST2; Y_OUT_temp <= "11";
      else
        NEXT_STATE <= ST1; Y_OUT_temp <= "00";
      end if;
    when ST2 =>
      Y_OUT_temp <= "11";
      if ( IN00 or IN01 or IN10 ) then
        NEXT_STATE <= ST3; Y_OUT_temp <= "01";
      else
        NEXT_STATE <= ST1; Y_OUT_temp <= "00";
      end if;
    when ST3 =>
      Y_OUT_temp <= "01";
      if ( IN11 or IN01 or IN10 ) then
        NEXT_STATE <= ST4; Y_OUT_temp <= "10";
      else
        NEXT_STATE <= ST3; Y_OUT_temp <= "01";
      end if;
    when ST4 =>
      Y_OUT_temp <= "10";
      if ( IN00 or IN01 ) then
        NEXT_STATE <= ST3; Y_OUT_temp <= "01";
      else
        NEXT_STATE <= ST2; Y_OUT_temp <= "11";
      end if;
  end case;
end process;
(b)
combo: PROCESS(state, key_response, rc_response, lock_err, .....)
BEGIN
  CASE state IS
    WHEN IDLE =>
      d_alarm_temp <= "00"; unlock_temp <= '0';
      start_timer_temp <= '0'; reset_alarm_temp <= '0';
      IF (rc_response = '1') THEN
        nxt_state <= CAR_LOCKING; gas_temp <= '0'; lock_car_temp <= '1';
      ELSIF (key_response = "00") THEN
        nxt_state <= GAS_LOCKED; gas_temp <= '1'; lock_car_temp <= '0';
      ELSE
        nxt_state <= state; gas_temp <= '0'; lock_car_temp <= '0';
      END IF;
    WHEN CAR_LOCKING =>
      gas_temp <= '0'; unlock_temp <= '0';
      start_timer_temp <= '0'; reset_alarm_temp <= '0';
      IF (lock_err = '1') THEN
        nxt_state <= IDLE; d_alarm_temp <= "01"; lock_car_temp <= '0';
      ELSIF (lock_ok = '1') THEN
        nxt_state <= CAR_LOCKED; d_alarm_temp <= "10"; lock_car_temp <= '0';
      ELSE
        nxt_state <= state; d_alarm_temp <= "00"; lock_car_temp <= '1';
      END IF;
    .
    .
    .
  END CASE;
END PROCESS combo;
Figure 3: Example of VHDL description of a Moore-type FSM (a) and Mealy-type FSM (b).
The selected set of data-path dominated systems is composed of:
• mem_int, a VHDL RT-level description of a circuit to interface the main memory and
the system bus;
• gcd, a VHDL behavioral description to compute the greatest common divisor of two
8-bit numbers;
• diffeq, a VHDL behavioral description of a circuit for the numerical resolution of
second order linear differential equations with constant coefficients; diffeq-t has
been optimized for speed, while diffeq-a has been optimized for area;
• ellipf, a VHDL behavioral description of an elliptic filter of the fifth order;
• dhrc, a VHDL behavioral description of the algorithm to solve the partial differential
equations that describe heat propagation.
All the above mentioned case studies described at the behavioral-level have been derived
from the High Level Synthesis Design Repository, University of California at Irvine (1995).
The selected set of control dominated systems is composed of:
• pace, a VHDL description of an embedded controller of a pacemaker, for which the
power dissipation is a very significant design constraint;
• cerbero, a VHDL description of an embedded controller for an anti-theft system for
automotive applications;
• a set of 43 FSMs derived from the MCNC-91 benchmark suite. First we applied the
area-oriented state assignment program NOVA [189] to the selected benchmarks, then the
encoded FSMs have been synthesized by the Synopsys Design Compiler tool targeting the
HCMOS6 technology.
Tables 4 to 7 summarize the results. First, let us discuss the results related to the case
studies reported in Table 4. For the sequential power, the proposed model provides an average
percentage error of 13.23% with respect to the Design Power estimates. For the combinational
power, the average percentage error is equal to 19.37%. Overall, the average percentage
error on the total power figures is 13.95%. Let us now discuss the results related to the
MCNC FSM benchmark set, reported in Tables 5 to 7. Regarding the sequential power, the
proposed model shows an average percentage error of 13.03% (ranging from 0.01% to
33.96%) with respect to the Design Power estimates. Regarding the combinational and total
power, the average percentage error is equal to 9.09% and 8.43% respectively. Globally, the
relative accuracy of our results compared with the Design Power results is considered
satisfactory at this level of abstraction.
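The averages quoted for the case studies are means of the absolute percentage errors. As a
minimal check, the following Python sketch recomputes them from the "Perc. Error" rows of
Table 4, copied as printed (including their signs); the helper names are hypothetical.

```python
# "Perc. Error" entries of Table 4, in row order (mem_int ... cerbero).
SEQUENTIAL = [-5.7, 9.76, 40.46, 12.82, 3.75, 12.29, 11.86, 10.83, 20.63, 13.05, 4.35]
COMBINATIONAL = [-13.57, -7.74, -13.88, -8.29, 15.26, -8.2, -27.93, -70.74,
                 23.3, -9.9, -14.29]
TOTAL = [-8.36, -8.77, -27.36, -9.72, 9.25, -9.56, -22.55, -17.04, 21.96, 8.88, -9.96]


def percent_error(estimate: float, reference: float) -> float:
    """Signed percentage error of an estimate against a reference value."""
    return 100.0 * (estimate - reference) / reference


def mean_abs_error(errors) -> float:
    """Average of the absolute percentage errors."""
    return sum(abs(e) for e in errors) / len(errors)


for name, errors in (("sequential", SEQUENTIAL),
                     ("combinational", COMBINATIONAL),
                     ("total", TOTAL)):
    print(f"{name}: {mean_abs_error(errors):.2f}%")

# Example of a single entry: mem_int sequential power, vhdl2pow vs. Design Power.
print(f"{percent_error(3.97, 4.21):.2f}%")
```

Averaging absolute values prevents over-estimates and under-estimates from cancelling each
other.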
CIRCUIT    Estimate        Total        Total          Total
                           Sequential   Combinational  Power [mW]
                           Power [mW]   Power [mW]
mem_int    Design Power    4.21         2.137          6.3478
           vhdl2pow RT     3.97         1.847          5.817
           Perc. Error     -5.7%        -13.57%        -8.36%
gcd        Design Power    0.922        0.8953         1.8173
           vhdl2pow RT     0.832        0.826          1.658
           Perc. Error     9.76%        -7.74%         -8.77%
gcd        Design Power    0.922        0.8953         1.8173
           vhdl2pow BEH    0.549        0.771          1.32
           Perc. Error     40.46%       -13.88%        -27.36%
diffeq-t   Design Power    13.72        29.68          43.4
           vhdl2pow RT     11.961       27.22          39.181
           Perc. Error     12.82%       -8.29%         -9.72%
diffeq-t   Design Power    13.72        29.68          43.4
           vhdl2pow BEH    13.205       34.21          47.415
           Perc. Error     3.75%        15.26%         9.25%
diffeq-a   Design Power    10.14        20.12          30.26
           vhdl2pow RT     8.894        18.474         27.368
           Perc. Error     12.29%       -8.2%          -9.56%
diffeq-a   Design Power    10.14        20.12          30.26
           vhdl2pow BEH    8.937        14.5           23.437
           Perc. Error     11.86%       -27.93%        -22.55%
ellipf     Design Power    16.25        1.8773         18.1273
           vhdl2pow BEH    14.49        0.55           15.038
           Perc. Error     10.83%       -70.74%        -17.04%
dhrc       Design Power    4.78         4.42           9.2
           vhdl2pow BEH    5.766        5.454          11.22
           Perc. Error     20.63%       23.3%          21.96%
pace       Design Power    0.735        1.01           1.745
           vhdl2pow CNTR   0.8309       0.9141         1.9
           Perc. Error     13.05%       -9.9%          8.88%
cerbero    Design Power    0.092        0.139          0.231
           vhdl2pow CNTR   0.088        0.120          0.208
           Perc. Error     4.35%        -14.29%        -9.96%
Ave. |Perc. Error|         13.23%       19.37%         13.95%

Table 4: Power estimation comparison results for some case studies.
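The error figures in Table 4 can be checked with a short script. The sketch below assumes that the percentage error is the vhdl2pow estimate minus the Design Power reference, divided by the reference. This definition is not stated in the text, but it reproduces the mem_int row. The function names are illustrative and are not part of vhdl2pow.

```python
def percentage_error(reference, estimate):
    """Signed error of `estimate` relative to `reference`, in percent.

    Assumed definition: (estimate - reference) / reference * 100.
    """
    return (estimate - reference) / reference * 100.0


def mean_abs_percentage_error(pairs):
    """Average of |percentage error| over (reference, estimate) pairs."""
    errors = [abs(percentage_error(ref, est)) for ref, est in pairs]
    return sum(errors) / len(errors)


# mem_int row of Table 4: (Design Power, vhdl2pow RT), both in mW
mem_int = {
    "sequential": (4.21, 3.97),
    "combinational": (2.137, 1.847),
    "total": (6.3478, 5.817),
}

for name, (ref, est) in mem_int.items():
    print(f"{name}: {percentage_error(ref, est):.2f}%")
```

Run as is, the loop prints -5.70%, -13.57% and -8.36%, which agree with the Perc. Error entries for mem_int. Applying `mean_abs_percentage_error` to one column of (Design Power, vhdl2pow) pairs gives a figure of the same kind as the "Ave. |Perc. Error|" row.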
CIRCUIT   PI  PO  ns  Estimate      Total        Total          Total
                                    Sequential   Combinational  Power [µW]
                                    Power [µW]   Power [µW]
bbara     4   2   10  Design Power  188.00       430.19         618.19
                      vhdl2pow      214.98       411.03         626.01
                      Perc. Error   14.35%       -4.45%         1.26%
bbsse     7   7   16  Design Power  266.00       850.10         1116.10
                      vhdl2pow      269.26       903.02         1172.28
                      Perc. Error   1.23%        6.23%          5.03%
bbtas     2   2   6   Design Power  180.00       203.29         383.29
                      vhdl2pow      194.34       221.40         415.74
                      Perc. Error   7.97%        8.91%          8.47%
bbtasmod  2   2   9   Design Power  218.00       272.09         490.09
                      vhdl2pow      238.24       241.59         479.83
                      Perc. Error   9.28%        -11.21%        -2.09%
beecount  3   4   7   Design Power  168          329.95         497.95
                      vhdl2pow      175.87       331.81         507.68
                      Perc. Error   4.68%        0.56%          1.95%
cse       7   7   16  Design Power  212          728.52         940.52
                      vhdl2pow      271.88       544.11         815.99
                      Perc. Error   28.25%       -25.31%        -13.24%
dk14      3   5   7   Design Power  239.00       935.90         1174.90
                      vhdl2pow      217.33       995.32         1212.65
                      Perc. Error   -9.07%       6.35%          3.21%
dk15      3   5   4   Design Power  146.00       802.15         948.15
                      vhdl2pow      149.82       783.91         933.73
                      Perc. Error   2.62%        -2.27%         -1.52%
dk16      2   3   27  Design Power  443          2197.4         2640.4
                      vhdl2pow      376.63       1767.62        2144.25
                      Perc. Error   -14.98%      -19.56%        -18.79%
dk17      2   3   8   Design Power  429.00       659.90         1088.90
                      vhdl2pow      483.89       567.63         1051.52
                      Perc. Error   12.79%       -13.98%        -3.43%
dk27      1   2   7   Design Power  211.00       376.80         587.80
                      vhdl2pow      204.82       387.36         592.18
                      Perc. Error   -2.93%       2.80%          0.75%
dk512     1   3   15  Design Power  331.00       629.88         960.88
                      vhdl2pow      330.95       655.96         986.91
                      Perc. Error   -0.02%       4.14%          2.71%
donfile   2   1   24  Design Power  360.00       1063.20        1423.20
                      vhdl2pow      326.00       1052.87        1378.87
                      Perc. Error   -9.44%       -0.97%         -3.11%
ex1       9   19  20  Design Power  351          1365.2         1716.2
                      vhdl2pow      316.05       1933.49        2249.54
                      Perc. Error   -9.96%       41.63%         31.08%
ex2       2   2   19  Design Power  445          1562.9         2007.9
                      vhdl2pow      339.3        1533.78        1873.08
                      Perc. Error   -23.75%      -1.86%         -6.71%

Table 5: Power estimation comparison results for some MCNC FSM benchmarks (a).
CIRCUIT   PI  PO  ns  Estimate      Total        Total          Total
                                    Sequential   Combinational  Power [µW]
                                    Power [µW]   Power [µW]
ex3       2   2   10  Design Power  317          777.3          1094.3
                      vhdl2pow      263.1        568.45         831.55
                      Perc. Error   -17.00%      -26.87%        -24.01%
ex4       6   9   14  Design Power  408          1211.8         1619.8
                      vhdl2pow      274.96       1205.72        1480.68
                      Perc. Error   -32.61%      -0.50%         -8.59%
ex5       2   2   9   Design Power  260.00       662.45         922.45
                      vhdl2pow      272.82       638.52         911.34
                      Perc. Error   4.93%        -3.61%         -1.20%
ex6       5   8   8   Design Power  214.00       1139.80        1353.80
                      vhdl2pow      208.87       1277.16        1486.03
                      Perc. Error   -2.40%       12.05%         9.77%
ex7       2   2   10  Design Power  253.00       681.57         934.57
                      vhdl2pow      266.72       706.38         973.10
                      Perc. Error   5.42%        3.64%          4.12%
keyb      7   2   19  Design Power  284          1158           1442
                      vhdl2pow      299.13       946.69         1245.82
                      Perc. Error   5.33%        -18.25%        -13.60%
kirkman   12  6   16  Design Power  281          1259.8         1540.8
                      vhdl2pow      220.73       1222.75        1443.48
                      Perc. Error   -21.45%      -2.94%         -6.32%
lion      2   1   4   Design Power  99.00        138.34         237.34
                      vhdl2pow      119.25       157.42         276.67
                      Perc. Error   20.45%       13.79%         16.57%
lion9     2   1   9   Design Power  231.00       388.59         619.59
                      vhdl2pow      229.41       316.68         546.09
                      Perc. Error   -0.69%       -18.51%        -11.86%
mark1     5   16  15  Design Power  358          1014.5         1372.5
                      vhdl2pow      259.57       1001.77        1261.34
                      Perc. Error   -27.49%      -1.25%         -8.10%
mc        3   5   4   Design Power  122.00       313.55         435.55
                      vhdl2pow      130.20       315.65         445.85
                      Perc. Error   6.72%        0.67%          2.36%
modulo12  1   1   12  Design Power  239.00       302.48         541.48
                      vhdl2pow      279.16       305.55         584.71
                      Perc. Error   16.80%       1.01%          7.98%
opus      5   6   10  Design Power  306          803            1109
                      vhdl2pow      267.04       748.75         1015.79
                      Perc. Error   -12.73%      -6.76%         -8.40%
planet    7   19  48  Design Power  706.00       5014.60        5720.60
                      vhdl2pow      515.69       5145.60        5661.29
                      Perc. Error   -26.96%      2.61%          -1.04%
pma       8   8   24  Design Power  467          1976.8         2443.8
                      vhdl2pow      308.4        2047.79        2356.19
                      Perc. Error   -33.96%      3.59%          -3.58%

Table 6: Power estimation comparison results for some MCNC FSM benchmarks (b).
CIRCUIT   PI  PO  ns  Estimate      Total        Total          Total
                                    Sequential   Combinational  Power [µW]
                                    Power [µW]   Power [µW]
s1        8   6   20  Design Power  454          3430.4         3884.4
                      vhdl2pow      343.72       3027.44        3371.16
                      Perc. Error   -24.29%      -11.75%        -13.21%
s1a       8   6   20  Design Power  387          2605.4         2992.4
                      vhdl2pow      343.72       2591.54        2935.26
                      Perc. Error   -11.18%      -0.53%         -1.91%
s27       4   1   6   Design Power  161          244.06         405.06
                      vhdl2pow      203.09       293.16         496.25
                      Perc. Error   26.14%       20.12%         22.51%
s8        4   1   5   Design Power  194.00       427.74         621.74
                      vhdl2pow      191.37       374.71         566.08
                      Perc. Error   -1.36%       -12.40%        -8.95%
sand      11  9   32  Design Power  579          5292           5871
                      vhdl2pow      531.64       4112           4643.64
                      Perc. Error   -8.18%       -22.30%        -20.91%
shiftreg  1   1   8   Design Power  187.00       150.74         337.74
                      vhdl2pow      187.19       155.26         342.45
                      Perc. Error   0.10%        3.00%          1.39%
sse       7   7   16  Design Power  306          973.3          1279.3
                      vhdl2pow      266.77       926.37         1193.14
                      Perc. Error   -12.82%      -4.82%         -6.73%
styr      9   10  30  Design Power  423          3074           3497
                      vhdl2pow      307.26       2968.76        3276.02
                      Perc. Error   -27.36%      -3.42%         -6.32%
tav       4   4   4   Design Power  150          281.02         431.02
                      vhdl2pow      175          325.71         500.71
                      Perc. Error   16.67%       15.90%         16.17%
tbk       6   3   32  Design Power  377          5122.9         5499.9
                      vhdl2pow      387.16       4981.34        5368.5
                      Perc. Error   2.69%        -2.76%         -2.39%
tma       7   6   20  Design Power  390          1998.8         2388.8
                      vhdl2pow      289.38       1663.66        1953.04
                      Perc. Error   -25.80%      -16.77%        -18.24%
train11   2   1   11  Design Power  222.00       337.33         559.33
                      vhdl2pow      223.87       332.28         556.15
                      Perc. Error   0.84%        -1.50%         -0.57%
train4    2   1   4   Design Power  104.00       159.05         263.05
                      vhdl2pow      121.37       173.71         295.08
                      Perc. Error   16.70%       9.22%          12.18%
Ave. |Perc. Error|                  13.03%       9.09%          8.43%

Table 7: Power estimation comparison results for some MCNC FSM benchmarks (c).
4.9. Summary
This chapter addressed the problem of high-level power estimation for the HW-bound part of
embedded systems. High-level power estimation is a key topic in the design flow of an
electronic system, since it allows power trends to be determined during the earliest stages
of the design flow. The goal is to explore the architectural design space from the power
perspective. In this chapter, an estimation model has been proposed for both the data-path
and the control-path of the overall system. The input to the proposed model is the high-level
system specification described in VHDL at the behavioral and RT levels of abstraction. The
high-level power model is composed of a set of sub-models, each describing the power
behavior of one of the heterogeneous parts composing the HW part of the target system
architecture. The most relevant feature of the proposed approach is its generality: it
considers a general SOC architecture as well as the individual components that typically
constitute the HW side of an embedded system. The model has been implemented in the
vhdl2pow program and applied to evaluate the power of several industrial case studies and
benchmark circuits. The results obtained by vhdl2pow have then been compared with those
obtained by a commercial power estimation tool. The accuracy of the results is satisfactory
at this level of abstraction for the proposed model to be used as a relative power indicator.