FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
RTL Guidelines for Static Power Reduction
Ciro de Moura Monteiro
Mestrado Integrado em Engenharia Eletrotécnica e de Computadores
Synopsys Supervisor: Hélder Silva
FEUP Supervisor: José Carlos Alves
July 27, 2016
Nowadays, with the growth of portable devices powered by batteries of limited capacity, good power management is important to guarantee the longest possible operating time. Dynamic power consumption was once one of the major considerations in low power circuit design, but today some dynamic power reduction techniques are applied automatically by the tools.
Integrated circuits can be produced with ever more transistors, and with ever smaller transistors. However, this also brings the problem of increased leakage currents. One possible approach to reducing the static power consumed by the leakage currents of these circuits is known as power gating.
Power gating consists of using transistors as switches to turn the supply of parts of an integrated circuit on and off. Header transistors or footer transistors can be used for this purpose, each with its advantages and disadvantages.
Abstract
In today’s world, we are witnessing a growth in battery operated portable devices, which require smart power choices due to their limited battery life. Dynamic power consumption has been a major consideration when designing power aware devices, but some dynamic power savings are already introduced automatically by the design tools.
Current technology has evolved towards smaller transistors, which enables building chips with higher transistor density. With this technology new problems arise, such as higher leakage currents. These may or may not be resolved by power reduction techniques, as not all of them are leakage oriented. One possible solution to this problem is the power gating technique.
Power gating consists of using switching transistors to control the power supply of certain areas of the circuit. This power reduction technique allows the use of header or footer transistors, each with its benefits and disadvantages.
Acknowledgements
I would like to leave a word of thanks...
In particular to Hélder Silva and Athul Stripad, for accompanying me every week and helping with some important decisions.
To Professor José Carlos Alves, for supervising this work and helping me make decisions.
To Nelson Eira, for his patience in explaining how the scripts used by the implementation environment work.
To Synopsys, for the opportunity to carry out this dissertation project in a corporate environment.
To my family, for helping me grow.
To Susana Carvalho, for helping me choose my specialisation, which I came to enjoy.
To Inês Teixeira and Gabriel Ribeiro, for helping me with my English.
To all my other friends, for their patience in putting up with me.
And finally to FEUP and its professors, for helping me in my education.
Ciro de Moura Monteiro
“Laugh and the world laughs with you.
Snore and you sleep alone”
List of Tables
5.1 Power during activity.
5.2 Power consumption related to current implementation, during activity.
5.3 Power consumption during full simulation.
5.4 Power consumption for a full simulation, write traffic.
5.5 Power consumption for a full simulation, read traffic.
5.6 Relative power consumption for a full write simulation.
Symbols and Abbreviations
CAD Computer-Aided Design
CPF Common Power Format
CTS Clock Tree Synthesis
DC Design Compiler
DFT Compiler Design-for-test Compiler
DMA Direct Memory Access
DRC Design Rule Check
DVE Debugging and Visualisation Environment
DVFS Dynamic Voltage and Frequency Scaling
DVS Dynamic Voltage Scaling
EDA Electronic Design Automation
eDMA Embedded DMA
FET Field Effect Transistor
GIDL Gate Induced Drain Leakage
HDL Hardware Description Language
IC Integrated Circuit
IEEE Institute of Electrical and Electronics Engineers
IoE Internet of Everything
IoT Internet of Things
IP Intellectual Property
IR drop Voltage drop due to energy losses in a resistive path
MOS Metal–Oxide–Semiconductor
MTCMOS Multi-Threshold CMOS
MVSIM Multi-Voltage Simulation
NLP Native Low Power
NMOS N channel MOSFET
PG Power and Ground
PMU Power Management Unit
PST Power State Table
PVT Process, Voltage, Temperature
QoR Quality of Results
RCE Regression Control Environment
RTL Register Transfer Level
SAIF Switching Activity Interchange Format
SDC Synopsys Design Constraints
SI International System of Units (Système international d’unités)
SI2 Silicon Integration Initiative
SoC System on a Chip
SPEF Standard Parasitic Exchange Format
TCL Tool Command Language
UPF Unified Power Format
UVM Universal Verification Methodology
VC LP Verification Compiler Low Power
VCD IEEE Standard 1364-1995, Value Change Dump
VCS Verilog Compiled code Simulator
VHDL VHSIC Hardware Description Language
VHSIC Very High Speed Integrated Circuit
VIP Verification IP/Verification Link Partner
VLSI Very-Large-Scale Integration
VPD VCD Plus
VTB Verilog Test Bench
WWW World Wide Web
Concepts
Clock Gating Technique to reduce clock activity
Power Density Power consumed per unit area
Power Gating Cutting power using a switch
Power Island Island that is kept ON inside a power domain that is OFF
Shadow Registers Registers used to store data in sleep mode
VDD Positive voltage rail
VSS Negative voltage rail/Reference voltage
Chapter 1
Introduction
Energy efficiency is a very important aspect of electronic circuit design nowadays. Just a few
decades ago, designers focused only on having a working chip, and power consumption was
not a primary design concern. Requirements for portability, mobility or battery operation
were very infrequent, and the early CMOS technologies for digital electronics were sufficiently
constrained in terms of power consumption. Then, as technology evolved, transistor size
decreased, making it possible to fit more transistors on one die, and so power consumption gained
importance. Nonetheless, only dynamic power seemed to matter, because leakage is negligible
for CMOS technologies above 130nm [4]. Nowadays, power is more important than ever,
especially for battery operated and mobile devices.
Power hungry chips tend to have high power density, which generates a lot of heat for a small
dissipation area. This raises power dissipation issues, requires expensive cooling systems and may
cause chip lifetime reduction.
Due to environmental concerns, there is an interest in reducing the power consumption of
devices, in an effort to reduce pollution and the power wasted without activity. Systems may have
most of their modules idling, consuming power without executing any operation. Chips should
idle efficiently, to reduce the energy consumed during their operation.
In recent years, portability and mobility have gained growing importance, and in that
field power consumption is paramount. For example, it is a significant inconvenience
for users if portable equipment has to be charged constantly. Not only do batteries have little
autonomy without power management, they also have a short lifetime and have to be replaced.
Efficient power management makes better use of the batteries’ energy, making them last longer
and avoiding the hassle of replacing or charging them constantly.
To reduce power consumption in digital integrated VLSI systems, various considerations must
be kept in mind. Dynamic power used to be the major concern in power aware designs, but as
technology nodes shrink and transistor density increases, static power consumption has gained
increasing importance due to transistor leakage. Besides, most circuits today already implement
techniques to reduce dynamic power consumption, such as clock gating. Clock gating
reduces the activity in the system, and consequently the dynamic power consumption (as can be
deduced from equation 2.4).
Today’s techniques to reduce power consumption include using higher threshold transistors in
non-critical paths of the design, as well as high-K gate dielectrics, well biasing
and bigger transistors. These techniques are applied at a very low level, do not save much
power and are expensive, because they require extra masks for the different transistors.
Sometimes it may be essential to use low Vt or ultra low Vt transistors to achieve the very high
speeds that industry demands nowadays. These techniques are also very time consuming, since
they may require the designer’s attention to single gates.
Before the 90nm technology node became available, designers used to simply migrate chips
to smaller geometries to reduce power. This took advantage of lower supply voltages as well
as lower capacitance. Migrating a 180nm chip to 130nm would cut power almost in
half. However, although voltage decreases, current increases. At 90nm, the increase in current is
more significant than the decrease in voltage, and this results in higher power consumption than
expected [7].
The importance of static power has increased significantly as technology size decreases. Once
negligible, static consumption at the deep sub-micron level can reach almost 50% of the total
power consumption [8]. A very useful and effective technique to reduce static power consumption
is power gating, which consists of shutting down inactive parts of a circuit. Implementing
this technique is very time consuming, because the power needs of each part must be identified
and the result must be tested.
Power gating was created to reduce static power at the block level. By cutting off the supply to
the module when power gating is applied, power consumption theoretically drops to zero. Major
dynamic and static power reductions can be achieved by addressing power at the RTL (Register
Transfer Level) and system level [9] [10]. This provides a higher abstraction level to the designer,
and allows the RTL designer to produce power aware circuits, which would otherwise only be
implemented at the back-end.
1.1 Context
This dissertation is being developed in the scope of the Master in Electrical and Computers
Engineering of the Faculty of Engineering of the University of Porto. This work was proposed
This work uses UPF because Synopsys tools offer compatibility with it and some power intent
is already specified in UPF. Some further UPF explanation can be found in section 2.2.5.
2.2.5 UPF
UPF is the IEEE (Institute of Electrical and Electronics Engineers) standard for the design and
verification of low power integrated circuits, under the standard number 1801. It was originally
created in an effort to provide an open, portable power specification standard, and was approved
in 2007 as an Accellera standard. In the same year, Accellera donated it to the IEEE. The first
version of IEEE Std 1801, which is the second version of UPF, was released in 2009 [5].
Since IEEE Std 1801 is an open standard, it gives EDA tool providers the ability to imple-
ment its latest features. The standard is already supported by a large number of EDA companies.
Synopsys tools already support a large subset of the commands in UPF, as well as some UPF-like
power intent commands that are not part of the standard [15].
UPF focuses on controlling the voltage and current applied to the transistors. The technology
used for the switches is normally assumed to be CMOS, but other technologies can also be used.
Due to its abstraction level, UPF can be applied with any of the three HDLs: VHDL, Verilog or
SystemVerilog [5].
UPF supports design hierarchy, which makes it suitable for reusing power intent across
configurations. The UPF hierarchy depends on the hierarchy of the RTL modules, which can be a
downside when the RTL was not designed with power gating in mind.
2.2 Power Gating 21
Figure 2.13: Companies involved in IEEE P1801 working group. Source [3].
The current active version of the standard is IEEE Std 1801-2015, approved on 8 December
2015 by the IEEE-SA Standards Board [16].
2.2.5.1 Concepts
When defining power intent with UPF, a few concepts must be learnt for better understanding
of its structure. This section explains the major concepts used in UPF for power gating.
Modules are put together in power domains according to their power specifications. If
two modules turn off at the same time and use the same voltage, they can be put together in
the same power domain.
Ports are connection points between adjacent levels of hierarchy, connected together using
nets. UPF assumes a more abstract model of the design hierarchy, using its commands to change
the scope within the hierarchy levels. Ports have a HighConn side, visible to the parent instance,
and a LowConn side, visible to the instance itself.
Power domains are collections of instances that are powered in the same way. Child instances
are included in the same power domain as their parents. A power domain does not need to be
contiguous: instances in the same power domain can be placed in different locations.
In the example in figure 2.14, both modules A and B have the same power requirements,
so they have been put together in power domain PD_A. Module C has a different
power requirement from A and B, so it belongs to a different power domain.
22 Related Work
Figure 2.14: Example of power domains.
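The grouping rule described here can be sketched in a few lines of Python. This is only an illustration: the voltages and the control signal name sleep_ab are assumed values, and real power domains are declared in UPF rather than computed.

```python
from collections import defaultdict

# Hypothetical module descriptions: (supply voltage in volts, shutdown control).
# None marks a module that is never switched off.
modules = {
    "A": (1.0, "sleep_ab"),
    "B": (1.0, "sleep_ab"),
    "C": (0.8, None),
}

def group_into_power_domains(modules):
    """Group modules that share both voltage and shutdown behaviour."""
    groups = defaultdict(list)
    for name, requirement in modules.items():
        groups[requirement].append(name)
    # Name the domains PD_A, PD_B, ... in order of first appearance.
    return {f"PD_{chr(ord('A') + i)}": sorted(members)
            for i, members in enumerate(groups.values())}

domains = group_into_power_domains(modules)
print(domains)
```

Modules A and B share a requirement and land in one domain, while C gets its own, which matches the arrangement of figure 2.14.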
Supply ports are connections for supply nets on hierarchical boundaries. Supply sets represent
a collection of supply nets. Supply switches control supply connections between supply ports.
2.2.5.2 Scope
The scope is the level of the design hierarchy where the UPF commands are executed. Defining a
scope is particularly useful for reusable power intent. The set_scope command changes
the current scope, and signals are then looked up in the current scope. It is possible to write UPF
in which the current scope is always the root scope, but then small changes in hierarchy imply
changing all of the UPF, whereas in a reusable UPF only the scope needs to be changed.
2.2.5.3 Power domains
A concept introduced with power intent is the power domain. When a design is power aware,
modules belong to power domains. A power domain defines a set of rules for the modules that
belong to it. A design can have several power domains, each of which has its own independent set
of rules. A power domain can be switched off, or have a defined voltage. Power domains from the
same design can be in different states independently from each other.
This is the power domain definition given in the standard:
"power domain: A collection of instances that are treated as a group for power-management
purposes. The instances of a power domain typically, but do not always,
share a primary supply set. A power domain can also have additional supplies,
including retention and isolation supplies." [16]
The other components defined in the IEEE 1801 standard are usually associated with a power
domain. That applies to the retention strategies, isolation strategies, power switches and level
shifters.
Power domains are characterised by their power availability. A power domain that is not
switchable, and always remains powered, is said to be an always-on power domain. Power domains
may also be characterised in relation to other power domains: if power domain PD_A is
on when power domain PD_B is off, PD_A is said to be relatively always on in
relation to PD_B.
Three supply set handles are usually created with the power domain: primary, default_retention
and default_isolation. Extra supply set handles can be created with the -supply argument.
The power domain’s default_retention and default_isolation supply set handles are usually
associated with an always-on supply set from the top power domain.
2.2.5.4 Isolation strategies
Outputs of powered off logic cannot be directly connected to inputs of active logic: since their
values are unpredictable, they can cause incorrect readings and lead to unwanted behaviour.
Isolation cells exist to address this issue. They are placed on the border of the power domain and
are responsible for clamping the cell output. They can also be used together with level shifters in
a multi-voltage design.
Figure 2.15: Isolation cell between power domains
There are three types of isolation cells, according to their functionality: they can clamp to "0",
"1" or the last value. A simple AND gate can be used to clamp the signal to "0", and an OR
gate can be used to clamp it to "1". To clamp the signal to the last value before power down, a
more complex cell is used, consisting of a latch to keep state and a multiplexer, as can be seen in
figure 2.16.
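The three clamp behaviours can be illustrated with a small behavioural model in Python. It is a sketch under assumptions: the isolation enable is taken to be active high, signals are plain 0/1 integers, and the function and class names are made up for this example rather than taken from any library.

```python
def clamp_low(data, iso_enable):
    """Clamp-to-0 cell: an AND gate fed with the inverted isolation enable."""
    return data & (1 - iso_enable)

def clamp_high(data, iso_enable):
    """Clamp-to-1 cell: an OR gate fed with the isolation enable."""
    return data | iso_enable

class ClampLast:
    """Clamp-to-last-value cell: a latch keeps the value, a mux selects it."""

    def __init__(self):
        self.latched = 0

    def output(self, data, iso_enable):
        if not iso_enable:
            self.latched = data   # latch is transparent while isolation is off
            return data           # mux passes the live signal
        return self.latched       # mux selects the held value

cell = ClampLast()
before = cell.output(1, 0)   # domain powered, signal passes through
during = cell.output(0, 1)   # domain gated, input value no longer matters
print(before, during)
```

With isolation enabled, none of the three outputs depends on the value coming from the gated domain, which is the property the receiving logic relies on.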
Isolation cells are placed using the UPF command set_isolation. Depending on the
version of the standard used by the tools, it may be necessary to define an isolation control.
This is true for IEEE Std 1801-2009. In newer versions of the standard, the
set_isolation_control command has been superseded and all isolation information can be
defined in a single set_isolation command.
Figure 2.16: State Retention Isolation. Source: [4]
Isolation cells can either be inserted at the output of the gated power domain, or at the input
of the power domain that is connected to it. In case this second power domain is less on than the
first one, there may be no need for the insertion of the isolation cells. Both types of isolation can
coexist in the same design.
Isolation is only needed at either the input of the on power domain or the output of the power
gated one. Having isolation on both creates redundant isolation, inserting more cells than
are necessary for operation.
Isolating the inputs of a power domain from the outputs of a less active one is a way of ensuring
that all the signals are isolated. Leaving nets from a power gated module without isolation may
cause incorrect behaviour of the system, as well as sneak paths for current to leak. When isolating
inputs, it is necessary to make sure that the synthesis tools insert no always-on cells before the
isolation cells.
Another option is to isolate the outputs from the power domain that is to be powered off.
Nevertheless, this would insert isolation cells in all the output ports of that power domain, some
of which may be connected to itself or a less on power domain.
Outputs from modules that connect to the same power domain do not need to be isolated:
although isolation cells are typically small, they introduce delays in the data-path. Manually
selecting each port that should or should not be isolated is possible, but impractical for large
designs. UPF already accounts for this: using the -diff_supply_only switch when creating the
isolation rule prevents tools from inserting isolation cells for nets connected to the same supply
set. This, however, also prevents the insertion of isolation cells for output ports with
heterogeneous fan-out, that is, ports that connect to both another power domain and their own.
As with the -diff_supply_only option, it is also possible to specify a source and/or
sink filter. With this filter, the isolation rule is only applied to nets that come from one of the
specified source supply sets and enter one of the specified sink supply sets. This is very useful
when isolating designs that have several power domains.
Figure 2.17: Power domain with heterogeneous fan-out
Using -diff_supply_only will, however, fail to create isolation cells on a domain port with
heterogeneous fan-out, like the one in figure 2.17, resulting in a warning message. This case is a
good example of where isolation could be placed on the input of the ON power domain. It could also
be placed on the output of the OFF power domain, but the input of the second OFF power domain
does not need to be isolated, as it has the same power needs as the first one.
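The behaviour described for heterogeneous fan-out can be modelled as a simple decision function. The sketch below only mirrors the description in this section and is not a statement of how any particular tool implements the option; the function names and the supply set names SS_OFF and SS_ON are hypothetical.

```python
def output_isolation_inserted(source_supply, sink_supplies, diff_supply_only):
    """Model of an output-port isolation rule, as described in the text.

    Returns True when a cell is placed on the port. With diff_supply_only,
    the port is skipped as soon as one sink shares the source supply, which
    is why a port with heterogeneous fan-out ends up without isolation.
    """
    if not diff_supply_only:
        return True
    return all(sink != source_supply for sink in sink_supplies)

def input_isolation_needed(source_supply, sink_supply):
    """Per-sink decision when the strategy is moved to the receiving inputs."""
    return sink_supply != source_supply

# Figure 2.17: one OFF domain output drives an ON domain and a second
# domain that shares the supply of the source.
sinks = ["SS_ON", "SS_OFF"]
on_port = output_isolation_inserted("SS_OFF", sinks, diff_supply_only=True)
per_input = [input_isolation_needed("SS_OFF", s) for s in sinks]
print(on_port, per_input)
```

In this model the output rule inserts nothing, while the per-input rule isolates only the input of the ON domain, which is the placement suggested for this case.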
2.2.5.5 Supply sets
Supply sets are an aggregation of supply functions that together provide a complete power
source [16]. Supply sets provide a higher level of abstraction to the designer, replacing the need
of creating individual supply nets and supply ports. Supply sets have their implicit supply nets,
such as power, ground and well biasing. Supply sets provide the needed supply nets for modules
to operate. Explicitly created supply nets can be associated with an existing supply set via the
-function argument of create_supply_set command.
A power domain can have several supply set handles, which are then associated to supply sets.
Supply sets are usually associated with power domain’s supply set handles.
2.2.5.6 Retention strategies
When powering off some designs, there may be a need to keep some state. To keep state, some
registers need their value to be preserved when the module is turned off. There are several possible
approaches to achieve this: retention registers, power islands or external memory.
Retention registers are made of two registers: the main register, for normal operation, and the
shadow register. Shadow registers are less leaky but produce a big area overhead.
Figure 2.18: Retention register. Source: [4]
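The save and restore sequence of a retention register can be sketched as a behavioural model. The class and method names below are illustrative assumptions and do not correspond to UPF commands or library cells; None stands for the unknown value of a register whose supply has been removed.

```python
class RetentionRegister:
    """Main register plus an always-on shadow register (behavioural model)."""

    def __init__(self):
        self.main = 0      # lives in the switchable domain
        self.shadow = 0    # kept alive by the retention supply

    def write(self, value):
        self.main = value

    def save(self):
        self.shadow = self.main   # copy state before power down

    def power_down(self):
        self.main = None          # value is lost (unknown) while gated

    def restore(self):
        self.main = self.shadow   # copy state back after power up


reg = RetentionRegister()
reg.write(0xA5)
reg.save()
reg.power_down()
lost = reg.main
reg.restore()
print(lost, hex(reg.main))
```

Skipping the save step before power_down would restore a stale value, which is why the save and restore controls must be sequenced by the power management logic.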
Another way to retain state is to keep the modules that contain the registers needed to keep
state in a different, always-on power domain. This technique is named power islands, because
those modules sit in a different always-on domain inside a powered off domain. This
adds some complexity to the design, since the back-end designers will need to pull the supply rails
to a module inside a powered off domain. This does not cause a considerable area increase, if any,
but it is not advisable for large areas with low activity, since we would be wasting an opportunity
to reduce leakage.
Retention may be one of the power gating components with the greatest impact. Retention registers
can create a huge area overhead if not planned carefully. The need for full state retention or only
partial retention should be taken into consideration for area optimisation and restore time
reduction. If the system is able to recover from a power down with only partial state retention,
this becomes an attractive solution given the time and size overhead of the registers.
Low standby voltage is also a possibility, but this solution increases testing complexity since
it will require a multi-voltage design, as well as a library with cells able to operate on the specified
voltage range, from standby voltage to normal operation voltage.
2.2.5.7 Level Shifters
In a multi-voltage design, communication between modules that operate at different voltages
may cause reading errors or even damage the circuitry. To ensure correct operation,
level shifters must be inserted between those modules. Level shifters are gates responsible for
shifting logical signals across different voltages, and they have a low voltage side and a high
voltage side.
Level shifters can be of two types, low to high or high to low. As the name suggests, high to
low level shifters, shift from the high voltage to the low voltage and low to high ones shift from
low to high voltage.
2.2.5.8 Enable Level Shifters
In multi-voltage designs, ports on the boundary of power domains may need both level shifting
and isolation. In order to have lower area overhead, a single cell called an enable level shifter
can be used instead of the isolation cell and level shifter.
Figure 2.19: Enable Level Shifter Example
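The choice between an isolation cell, a level shifter and an enable level shifter for one signal can be summarised as a decision function. This is a sketch of the reasoning in the last three sections; the function name, its arguments and the returned labels are assumptions made for illustration.

```python
def boundary_cell(source_v, sink_v, source_off_while_sink_on):
    """Pick the cell for one signal crossing from a source to a sink domain.

    Returns (cell, shift_direction); either element may be None.
    """
    direction = None
    if source_v < sink_v:
        direction = "low_to_high"
    elif source_v > sink_v:
        direction = "high_to_low"

    if direction and source_off_while_sink_on:
        return "enable_level_shifter", direction   # one cell does both jobs
    if direction:
        return "level_shifter", direction
    if source_off_while_sink_on:
        return "isolation_cell", None
    return None, None

# A 0.8 V domain that can be gated drives a 1.0 V domain that stays on.
print(boundary_cell(0.8, 1.0, source_off_while_sink_on=True))
```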
2.2.5.9 Power Switch
The power switch is usually implemented in CMOS technology and consists of a transistor
between the power supply and the power input pins of the standard cells. The switch can be either
NMOS (footer switch) or PMOS (header switch).
Liberty libraries may have several different switch cells. Switch cells in the library may contain
several switches and are usually defined by their type. Switch types can be coarse grain or fine
grain. DC will select a switch able to carry the current needed in the on state. To force DC to
select a specific switch cell, the designer can mark all other switches as dont_use or dont_touch
and recompile the library.
Most switch related decisions are made by the back-end designer, so tampering with the library
may not be a good option. A single switch will usually not be enough to supply an entire power
domain, leaving to the back-end team the decision of selecting coarse grain switching or fine grain
switching and grid or array topology.
Switch cells have an output acknowledge port. The acknowledge port is usually connected to
the PMU to indicate that the power is now stable, or has been removed. This signal is
very important to avoid incorrect behaviour. If the PMU transitioned state based on a timer
instead, small manufacturing process variations, which affect wake up and shutdown times, could
make it transition into an operative state before the power domain was actually operational, or
spend more time than necessary waiting for power up.
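The difference between a timer based and an acknowledge based wake up can be shown with a toy model. It is a minimal sketch, assuming that time is counted in clock cycles and that the rail needs at least one cycle to ramp; the function names and cycle counts are hypothetical.

```python
def wake_with_timer(actual_ramp_cycles, timer_cycles):
    """PMU releases the domain after a fixed count, regardless of the rail."""
    return {
        "released_at": timer_cycles,
        "too_early": timer_cycles < actual_ramp_cycles,
        "wasted_cycles": max(0, timer_cycles - actual_ramp_cycles),
    }

def wake_with_ack(actual_ramp_cycles):
    """PMU waits for the acknowledge, which rises once the rail is stable."""
    cycle = 0
    ack = False
    while not ack:
        cycle += 1
        ack = cycle >= actual_ramp_cycles   # switch cell drives the ack
    return {"released_at": cycle, "too_early": False, "wasted_cycles": 0}

# The same 10-cycle timer applied to a slow and a fast die.
slow_die = wake_with_timer(actual_ramp_cycles=12, timer_cycles=10)
fast_die = wake_with_timer(actual_ramp_cycles=8, timer_cycles=10)
print(slow_die, fast_die, wake_with_ack(12))
```

The fixed timer is either too short for the slow die or wasteful for the fast one, while the acknowledge based version tracks the actual ramp time in both cases.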
At the back-end phase, the decision on which switch topology goes into the design is also made.
Most designs use coarse grain power switching, because the reduced implementation complexity of
fine grain switching does not compensate for its increase in area.
It is up to the back-end engineer to introduce delays between the switches in order to avoid
large inrush currents, since this kind of analysis cannot be performed at the synthesis level.
Figure 2.20: Header switch cell
Figure 2.20 represents a PMOS header switch cell. VDD is the input voltage
from the power rail. VVDD is the virtual supply that feeds the power domain to
be gated. The sleep signal controls the virtual supply rail. The acknowledge
port reports the power state back to the power management unit, with the help of a buffer.
2.2.5.10 Cell Location
UPF provides the option of defining the physical location for cell insertion. This is a somewhat
important decision, since it affects layout complexity. The decision is taken at the RTL level,
but it is important that the power architect is aware of the back-end flow so as not to complicate
the implementation. The cell location is defined by the -location argument present in the UPF
cell insertion commands.
Cells can be inserted in the power domain they belong to, in the parent domain or even in both.
When working on IP that is to be integrated in other designs, it is useful to place the cells in
the power domain they belong to, since putting them outside creates an area overhead in the parent
design relative to the predicted area of the IP. If the cells are inside the IP, its area
estimation already accounts for them. Cells located inside the IP also provide a more abstract
model to the designer who is going to integrate the IP: there is no need to worry about power
intent, since everything is already implemented inside the IP, which reduces verification and
implementation times.
UPF related cells inside the power domain may, however, lead to a more complex back-end
implementation. Isolation cells inside a gated power domain require an extra PG pin connection to
an always-on net to power the cell, since the primary power net will be shut off. This means an
extra power rail has to be pulled inside the power domain at the layout stage of the design.
Inserting cells in the parent power domain may be a good option for internal power domains.
In that case no extra supply rail needs to be pulled inside the power domain, since the parent
domain is more on than the domain the isolated signals come from.
2.2.5.11 Input vs Output Strategies
When creating isolation, level shifter or enable level shifter strategies, it is possible to choose
whether the strategy applies to the power domain inputs, outputs or both. This is quite an
important decision, since it may avoid unisolated paths or redundant strategies.
As described in the isolation section (2.2.5.4), using the -diff_supply_only true switch
when defining an isolation strategy will not insert cells if the output has heterogeneous fan-out.
Instead, if that happens to be the case, it is better to define the strategy for the input port of the
active power domain, given it is the only power domain needing isolation or level shifting for that
signal.
Figure 2.21: Isolation on input of heterogeneous fan-out
Figure 2.21 is a good example of where the strategy should be applied to the input. However,
in figure 2.22 the opposite is true. Since both receiving power domains require isolation, because
they are active when the output of the first power domain is corrupt, it would be better to isolate
only the output of the first power domain. Figure 2.22 represents an example of redundant
isolation, which creates unnecessary cells.
Figure 2.23 displays a situation where output isolation is the best option. The
output port connects to two power domains, and using input isolation would create an unnecessary
extra cell.
2.2.5.12 Power State Table
Figure 2.22: Redundant isolation
Figure 2.23: Output isolation
The power state table is a very important component to help verification. The power state table
has no physical implementation: it is only a table that defines all possible voltages that
can be applied to the power domains. If, during simulation, a power domain enters a state
that is not defined in the power state table, it is said to be in an illegal state and will trigger an
error, causing the simulation to fail.
The power state table (PST) can contain several possible states and several supply sets. The
power architect should write all possible states for the power domains in the power state table,
although it is also possible to have several power state tables in the same design. Having several
power state tables allows unrelated power domains to operate independently. All related power
domains should be included in the same table to catch bugs in the power intent.
Values in the power state table are real numbers and define the voltage applied to the supply net.
A zero in the PST does not mean the net is off; it means the defined voltage is zero. A ground net
defined as 0 is therefore ON. Gated nets are defined as "OFF" in the power state table.
The example in table 2.2 defines the possible states of two power domains, PDA and PDB.
This table has three possible states: PS_ALL_ON, PS_ALL_OFF and PS_LP_1.
Table 2.2: Example of PST

State         PDA.primary        PDB.primary
              power   ground     power   ground
PS_ALL_ON     1.0     0.0        0.8     0.0
PS_ALL_OFF    OFF     0.0        OFF     0.0
PS_LP_1       1.0     0.0        OFF     0.0
In the PS_ALL_ON state, both power domains are on, PDA at 1.0V and PDB at 0.8V. When
analysing the PST, tools will notice this and check whether level shifters have been inserted on
connections between the two power domains.
The PS_ALL_OFF state is usually present in every PST; designs without it risk missing states in the
power up or power down sequence [7]. For this particular design, it can be seen that header
switching was chosen, since the gated supply net is the power net.
The last state, PS_LP_1, has one power domain active, PDA, and the other one power gated. This
means isolation cells are needed on signals from PDB to PDA. As the two power domains operate at
different voltages, enable level shifters should be used instead of an isolation cell followed by a
level shifter.
From this power state table, it is possible to see that PDA cannot be turned OFF while PDB is ON.
Such a situation violates the power state table and causes the simulation to fail with an illegal
state.
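The legality check described above can be illustrated with a minimal Python sketch. It is not how a
simulator implements the check; it only encodes the power columns of table 2.2 (the ground columns
are 0.0 in every state) and looks up a pair of supply values. The names PST, OFF, lookup_state and
needs_level_shifter are hypothetical and were chosen for illustration.

```python
# Illustrative model of table 2.2; names are hypothetical, not tool APIs.
OFF = "OFF"  # marker for a gated supply net, as written in the PST

# Power-net value of each domain's primary supply in every legal state.
PST = {
    "PS_ALL_ON":  {"PDA": 1.0, "PDB": 0.8},
    "PS_ALL_OFF": {"PDA": OFF, "PDB": OFF},
    "PS_LP_1":    {"PDA": 1.0, "PDB": OFF},
}


def lookup_state(pda, pdb):
    """Return the name of the matching PST state, or None if illegal."""
    for name, supplies in PST.items():
        if supplies["PDA"] == pda and supplies["PDB"] == pdb:
            return name
    return None


def needs_level_shifter(state):
    """True when both domains are on at different voltages."""
    pda, pdb = PST[state]["PDA"], PST[state]["PDB"]
    return pda != OFF and pdb != OFF and pda != pdb


print(lookup_state(1.0, OFF))   # PS_LP_1
print(lookup_state(OFF, 0.8))   # None: PDA off while PDB is on is illegal
```

The second lookup returns None because no row of the table gates PDA while PDB is powered, which
is the illegal situation described in the text.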
Chapter 3
Design Flow
This chapter summarises the design flow used in hardware simulation, verification and synthesis, and
introduces the changes needed for a power aware flow.
UPF files are part of the design source. While HDL files are used to specify logic intent, UPF
files are used to specify power intent. UPF files are refined as they move down the flow, and the
information they carry grows with each refinement. They are inputs to the simulation, synthesis,
formal verification and place and route tools. The output of each tool is a new UPF file that should
be formally verified against the original one. This process is illustrated in figure 3.1.
Figure 3.1: UPF tool flow. Source: [5]
UPF files are created at the RTL level of the design and are synthesised with the HDL files for
logical verification. They are then refined to suit the needs of the subsequent phases. In the final
phase, power analysis, timing analysis, validation and functional verification are performed to
ensure that the UPF did not affect the expected logical behaviour of the circuit. The original power
intent is kept from the start so that it can be formally verified against the successive refinements,
ensuring consistency of power intent throughout development. This original power intent is
referred to as the golden UPF.
Figure 3.2: Design flow for multi-voltage, power gated designs. Source: [4]
3.1 Flow Without Power
Since the Synopsys team in which this project is being developed is a front-end team, the flow does
not reach the place and route phase. RCE (Regression Control Environment) is the tool responsible
for building the working environment. RCE uses CoreConsultant, and CoreConsultant uses CoreBuilder.
CoreBuilder is responsible for preparing files for compilation according to the defined
configuration, which means removing pragmas and ifdefs so that the output RTL matches the
configuration. CoreBuilder receives a TCL script as input, which it uses to build the intended
configuration.
VCS (Verilog Compiled code Simulator) is a functional verification tool, responsible for verifying
the RTL against the test bench. VCS performs both compile-time and run-time verification. In this
project, the test bench was developed in SystemVerilog and the design intent in Verilog. When
developing tests for the test bench, wave dumps can be enabled by adding the corresponding
command to the test file. VCS recognises that command and generates a VPD (VCD plus) dump file
with the waveforms produced by the simulation activity. VPD is a file format used with Synopsys
tools; the IEEE standard format for wave dumps is VCD (Value Change Dump). VPD files can easily
be converted to VCD with the vpd2vcd tool if other industry tools need to be used.
To help understand undefined port states, a tool called Xprop can be run together with VCS.
Xprop propagates unknown port states across the design. Unknown port states are easier to debug
at the RTL level because the descriptions are closer to the design intent. Xprop is useful to find
the origin of the unknown signal, reducing debugging time.
Wave analysis is a good last resort to catch design errors and incorrect protocol implementations
that may have escaped the tests, and to find the signal or sequence responsible for a test failure.
DVE (Debugging and Visualisation Environment) is used to visualise the waves. DVE allows the
designer to view code, and points to the source of a signal when it is double clicked. Another
useful feature is hierarchy visualisation, which shows the location of each module as well as its
parent and child instances. In DVE it is also possible to visualise schematics and trace back
signals, which has been very useful to find the source of incorrect behaviour.
After inspecting the simulation results, it is necessary to generate a SAIF (Switching Activity
Interchange Format) file with the activity. The SAIF file will later be used by DC to map names
on the netlist, which is essential for PrimeTime to perform power analysis. To get the SAIF file
from the simulation, the VPD dump file from VCS must first be converted to VCD with the vpd2vcd
tool, using the +includemda switch to include multidimensional arrays. It is possible to select a
particular interval for power analysis, but any time interval will work since the file will only be
used for name mapping. The VCD file is then post processed with the vcdpost utility, which ensures
unique identifier codes for nets and registers.
After obtaining the post processed VCD file, running it through vcd2saif generates the SAIF
file. The switches -top and -instance are used to define the top module and instance. This
is particularly useful for removing test bench instances and test modules from the activity file, as
they are not synthesised and therefore not necessary for the process.
3.1.1 Synthesis
CoreTools are also used to generate the workspace used by the synthesis tools. The tool used to
perform synthesis is Design Compiler (DC). Synthesis consists of generating a netlist from the
Verilog logical description of the circuit. It maps the Verilog functions to standard cells from the
given libraries, resulting in a functional netlist able to perform the intended operations.
Synthesis tools are driven by TCL scripts, written in advance to guide the synthesis process and
provide options for optimisation. The SAIF file extracted from the simulation is now added to the
workspace so that DC can use it for name mapping. Name mapping is an optional activity that needs
to be added to the existing scripts. It consists of creating a new file containing a map of names
between the RTL code from the simulation and the netlist generated by DC. This name map will later
be used by PrimeTime to annotate activity from the simulation onto the netlist during power
analysis.
To perform synthesis, DC needs to be provided with libraries. The type of library used in this case
is Liberty, a library standard used in the VLSI industry to describe standard cells. Liberty defines
power pins and logical pins, timing performance, and cell power consumption and function. A Liberty
library may be characterised for one or several operating conditions, which include process
variation, temperature and voltage.
Power Compiler is an integrated extension of DC used to minimise power consumption. Power
Compiler also allows for concurrent timing, area and power optimisation [17], and uses multi-corner
multi-mode optimisation.
DFT (Design-for-test) Compiler is responsible for the last stage, the insertion of the scan cells.
DFT Compiler also tries to repair DRC (design rule check) violations at the gate level. There is
also some optimisation of area and timing at this phase.
Formality is used to formally verify the equivalence between the RTL logical intent and the
synthesised netlist. Formal equivalence checking is used in the EDA industry to validate the
behavioural equality of two representations of the same circuit. In this case, Formality compares
the Verilog logical intent against the gate level netlist generated by synthesis.
3.2 UPF flow
The UPF flow works similarly to the usual design flow, but with increased complexity. Extra tools
are needed to analyse the power intent and apply new signal constraints, such as power down
corruption. This added complexity increases simulation, development and testing times.
For power aware simulation, VCS needs to be run in MVSIM (Multi-Voltage Simulation) NLP (Native
Low Power) mode. Normal simulation assumes that an always on, constant voltage is supplied to the
chip, which is not true if power gating or multi-voltage is implemented in the design. To simulate
the effect of power gating, MVSIM corrupts the signals of a domain while it is in low power mode.
Logical outputs now also depend on the supply state: the lower the voltage, the slower the signal
propagation. Since DVS (Dynamic Voltage Scaling) was not implemented due to the complexity of the
IP, different voltages were not simulated.
MVSIM checks that power states transition correctly and compares them with the power state table
to ensure that there are no illegal transitions. The correct implementation of the power control
sequence is also checked, which helps catch low power bugs early in the design. Isolation and
retention strategies are also checked to ensure their correct behaviour and implementation.
DVE has enhanced signal visualisation for power aware simulation wave dumps. Signals corrupted by
power down are displayed differently, so that they are easy to identify and are not confused with
signals in an unknown state due to logical errors.
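The effect of power down corruption and isolation described in this section can be pictured with a
small Python sketch. This is only a conceptual model under stated assumptions, not the MVSIM
implementation: the names X, domain_output and isolate are hypothetical, and a simple
clamp-to-constant isolation cell is assumed.

```python
# Conceptual model only; these names are hypothetical, not simulator APIs.
X = "x"  # stands for a corrupted or unknown logic value


def domain_output(value, powered):
    """Value seen on an output of a power-gated domain."""
    # While the supply is gated, the simulator corrupts the signal.
    return value if powered else X


def isolate(value, iso_enable, clamp):
    """Isolation cell that clamps its output while iso_enable is set."""
    return clamp if iso_enable else value


# Domain powered down without isolation: corruption reaches the receiver.
print(isolate(domain_output(1, powered=False), iso_enable=False, clamp=0))  # x
# Same situation with isolation enabled: the receiver sees the clamp value.
print(isolate(domain_output(1, powered=False), iso_enable=True, clamp=0))   # 0
```

The first case is the kind of situation a power aware simulation is meant to expose: a corrupted
value escaping a gated domain because isolation was not enabled in time.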
3.2.1 Voltage Aware Synthesis
Complexity also increases for voltage aware synthesis. A library with a low power kit is needed in
order to map all the new cells introduced by the UPF files. DC has to insert power switches,
isolation cells, level shifters and retention registers according to the power intent. Those cells
are usually marked for area optimisation only, so that DC does not replace them with normal cells.
Before synthesis, it is important to check that the library that is going to be used contains the
cells necessary for power gating implementation. Some vendors may mark them as "don’t use" or
"don’t touch", in which case synthesis tools will ignore those cells and introduce GTECH (Generic
Technology) cells.
GTECH cells are part of a generic library used to map cells that are not available to DC from other
libraries. GTECH cells cannot go into production and should not be present in final designs. Their
characterisation is generic and differs greatly from silicon, which leads to incorrect power
estimations.
Depending on their location, power gating cells may need to have dual supply rails, one is the
power domain supply and the other one the always on supply, in order to ensure always on cells
remain powered during low power mode.
Power aware synthesis requires the power net voltage in order to select cells from the library.
Liberty cells are designed to operate at a designated voltage, so to select which cells it will use,
DC checks the UPF files for the voltage defined for each power domain. In a multi-voltage design, DC
will insert cells from different libraries for the different power domains according to their
defined voltage. DVFS (Dynamic Voltage and Frequency Scaling) designs require cells capable of
operating across the voltage range used in the dynamic scaling.
DC will sometimes flatten the hierarchy to perform optimisations. This is not necessarily a problem,
but it may make power analysis more difficult. When analysing the consumption of a specific
wrapper, it is desirable to keep the hierarchy as defined in the logical intent. To achieve this,
it is necessary to force DC to keep the hierarchy with the -keep_hierarchy switch on invocation.
3.3 Power Analysis
After synthesis, it is important to analyse the synthesis reports for violations and errors. The
clock tree has to be declared as an ideal network, as it is a high fan-out network and will not be
optimised at this design stage. The clock tree is declared in a script that serves as input for DC,
and will not be synthesised.
PrimeTime PX is an extension of PrimeTime for power analysis, and is the tool used for power
analysis at the netlist level. PrimeTime executes normal TCL commands, which allows it to run an
already written script instead of requiring every command to be typed in manually.
Running pt_shell at the command line will execute PrimeTime. The easiest way to obtain several
results from different parts of the same simulation is to create a TCL script and execute it with
PrimeTime, since the flow is the same and only the input activity file changes. To run a script
with PrimeTime, the -f <script file> switch is added at invocation.
First, it is necessary to load the libraries used in the synthesis. To do that it is enough to set
the target_library, link_library and search_path variables.
PrimeTime must then enter PX mode, which enables power analysis. To do so, the
power_enable_analysis variable is set to true. Next it is necessary to set the power analysis
mode. In the scope of this work, average power is the relevant one. Average power is helpful to
analyse energy consumption, which is especially useful to estimate battery life.
Next it is necessary to pass the netlist to PrimeTime, using the command read_verilog
<path to netlist>. PrimeTime also needs to know which design it is working with, which is done by
entering the current_design <top instance> command.
Since libraries may contain several corner cases, it is necessary to specify the one that will be
used for the power analysis. With this information, PrimeTime is able to select cell consumptions
and timing corners for a given voltage and temperature.
Next it is necessary to read the power intent. The power intent comes from the UPF file used
during simulation. These files have also been imported by the synthesis tools to insert the special
cells necessary for power gating. The root scope in which the power intent will be executed is the
current design, defined already.
Since wire parasitic effects cannot be ignored, they must also be taken into account when performing
power analysis. The wire parasitics depend mostly on the back-end implementation, but synthesis
results provide an estimation of their effect earlier in the design. Parasitics are read from a
Standard Parasitic Exchange Format (SPEF) file, which is an IEEE standard for representing the
parasitic data of wires in the ASIC development flow [18].
Design constraints are loaded from the SDC (Synopsys Design Constraints) file and analysed. The SDC
file defines timing constraints and domain voltage definitions. It contains the clock definitions,
as well as some other networks that are defined as ideal, since they will later be optimised at the
place and route phase, during clock tree synthesis (CTS).
After this process, running update_timing will instruct PrimeTime to take the input files and
configuration previously defined and start analysing the design. Activity has not been provided yet.
It is possible to do a power analysis based on a specific expected operation of the design. This
expected operation comes from a simulation, by providing both the name map file, created during
synthesis and the switching activity file, either a VCD or a SAIF file.
Two important reports are the power report and the switching activity report. The power report
provides an estimation of power consumption broken down by module, which is a good way of checking
the power budget and the power savings across different implementations.
The switching activity report can be used for debugging. If something is not correctly
implemented, anomalous results will appear in the switching activity report. The switching activity
report shows the design activity according to logic type.
This type of power analysis will not take the clock tree into consideration as it has not yet
been synthesised. At this stage, clocks are considered to be ideal networks. Clock tree synthesis
will provide better optimisation for high fan-out networks, such as the clock tree. CTS is also
important to minimise clock skew and ensure proper clock distribution and a balanced clock tree.
Chapter 4
Implementation
The design used in this implementation is proprietary and confidential. For that reason, some
explanations are reduced to the minimum necessary for understanding the implementation choices.
The implementation focuses mainly on power gating, using IEEE Std 1801. The tools used for this
implementation support the IEEE Std 1801-2009 version of this standard. This is not the latest
version; the currently active version of the standard is IEEE Std 1801-2015.
This chapter explains how power reduction was implemented on the design, as well as how the
guidelines were defined. These guidelines provide some pointers for an easier power gating
implementation, since it can become a difficult job and adds complexity to design testing.
4.1 Steps taken
The specific sub-design studied is part of a bigger design that communicates with it. Figure 4.1
shows a summary of the IP. The module used for this implementation is represented as eDMA (Embedded
Direct Memory Access) and is composed of a couple of sub modules. In summary, it is divided into
two channels and some arbitration logic. The write channel generates traffic directed mainly to the
core module, while the read channel generates traffic mainly for the application module. The eDMA,
as the name indicates, is a feature for direct memory access that frees the core processor to do
other tasks while it sends information from the memory to the application. There is also traffic
directly from the core module to the application and vice versa. Arbitration logic is responsible
for selecting the source of traffic to the receiving modules.
There is some common logic used by both channels and register configuration. The arbitration
logic must be kept powered on even when the remaining blocks in the eDMA are not being used.
Common logic must be powered on when there is an access to write or read from its configuration
registers, and be kept on until the eDMA is disabled.
The design already has some power intent implemented. It consists of a switchable power domain
(PD_VMAIN_SW), an always on power domain (PD_VAUX) and some power islands for state retention.
The entry of the IP into the low power state involves fairly complex negotiations, but this is
outside the scope of this work. For a flawless integration of new power intent into a design that
already has some implemented, the system must be extensively tested to avoid power down bugs.
Figure 4.1: Basic representation of the module used.
As a first approach, it was decided to create a new power domain to gate the less active modules
of the eDMA. Modules that were not used during core to application and application to core traffic
were selected and put inside that power domain. This requires some knowledge of the design, some
trial and error and some traffic analysis. The eDMA module is already part of the switchable
PD_VMAIN_SW power domain, making it necessary to test the full design instead of the eDMA module in
isolation.
One of the output signals from the eDMA is necessary for the configuration of the application,
and its value may change during run-time. This means that the isolation value cannot be stuck at
either "0" or "1" through simple AND or OR isolation, as the signal may change and cause
inconsistencies between the isolation value and the actual value. This would cause collisions on
transactions. To address the problem, isolation latch cells were used. However, the available
libraries do not contain those cells, which resulted in the insertion of GTECH cells.
In an effort to avoid the insertion of GTECH cells, and given the power consumption of the module
that requires latching isolation cells, it was decided to remove this module from the power domain.
Its power consumption is relatively low, so the impact of removing it from the power domain is
negligible.
Upon inspection, it was possible to conclude that the eDMA block had no necessity of saving
its state when entering low power. This is because the software already configures registers each
time the module is reactivated.
The control signals for the power domain need to be driven by a power management unit. Since there
is no need for register retention, this power management unit is also simpler. The power
management unit is a design module and is explained in more detail in section 4.3.
Table 4.1: eDMA power state table
             PD_VDMA_SW
State        power  ground
PS_ALL_ON    0.8    0
PS_ALL_OFF   OFF    0
The PST for this implementation (table 4.1) is a simple two state table. The PD_VDMA_SW power
domain is either on, at 0.8V, or gated off.
After some analysis, it was possible to conclude that the modules belonging to the read channel
were not necessary for write channel traffic formation and vice versa. On a more aggressive power
saving approach, it was decided to create new power domains for each channel.
There are three different power requirements, making it a good choice to use three power
domains to further reduce static power consumption during read and write operations.
It was possible to take advantage of signals already present in the design, which enable the read
and write channels independently, to control their PMUs. The PMUs work in parallel to control the
power domains independently from each other. This is further explained in section 4.4, as it is the
implementation with the best results.
4.2 Obstacles
Large IP designs take a very long time to simulate and even longer to synthesise. Even small errors
in the process can cost a lot of time. Each modification requires the design to be tested again to
ensure functionality remains as expected, so even small improvements and features require a new
simulation and, since the design changes, a new synthesis. Synthesis with power, for such big
designs, may even take a day or two.
It took some time to understand that the tools installed for synthesis do not yet support the
IEEE 1801-2013 standard, only the older IEEE 1801-2009. This caused DC to not recognise isolation
cells, because some of the newer commands are not yet supported. The current active standard is
IEEE 1801-2015, but the industry always takes some time to support new standards, and the version
supported by the available tools is IEEE Std. 1801-2009.
Since the libraries do not have isolation latch cells, it was necessary to remove the module from
the power domain, because synthesis would not introduce isolation cells on that particular path.
When cells are not present in a library, synthesis tools introduce GTECH (Generic Technology)
cells. These cells belong to a generic technology library and cause incorrect power and area
estimations. Avoiding the insertion of GTECH cells provides a more accurate power estimation,
since parameters are usually well defined in a technology library.
4.3 Power Management Unit
The power management unit (PMU) is a very important component in a design with power gating. The
PMU is responsible for controlling all the power related components in the design. Because no state
retention was necessary in the scope of this project, the power management unit is a very simple
state machine with eight states, which can be reused in most power gating implementations as long
as there are no state retention registers. For a design that needs retention during power off, two
states, save and restore, have to be added to the state machine. If power islands are used for
state retention, however, this state machine is sufficient.
This power management unit interacts with the clock and reset control block, which is able to
provide different clock and reset signals. The PMU controls the clock, reset, power switch and
isolation enable signals. The clock and reset control block was modified to have extra clocks and
extra reset signals for the power domains created. The control block provides a synchronous reset
on request and an acknowledge signal that the PMU uses to change state. It also provides the clock
signal when requested. Since this block is a non-synthesisable test module, the client has to
implement it so that it supports these control signals and provides the correct clocks and resets.
The state machine starts in the idle state, assuming the chip has a power on reset. In the idle
state, isolation is enabled, the power switch is disabled and there is no clock. The power domain
controlled by the PMU starts in the off state, with isolation enabled to prevent the propagation of
unknown signals. The clock request signal is also disabled to save dynamic power. The enable signal
is responsible for triggering the wake up process. In this design, the enable signal is an OR
combination of several signals that are asserted when an operation from the module is required. In
a different design, this could be an internal signal from another module, requesting the gated
design to wake up.
In the wake_up state, the power domain is powered up, but reset is kept low and isolation remains
enabled. There is no clock signal either. This state waits for confirmation from the power switch,
ensuring power is stable.
When the power is stable, the PMU enters the deisolate state. This state takes only one clock
cycle and its main function is to release the isolation. It also requests the clock signal from the
clock and reset block, since the clock takes one clock cycle to arrive.
The clk state is where the power domain gets its clock, and reset is released. When receiving
an acknowledge signal from the clock and reset control block, the state changes into active.
The active state is where the power domain is fully on and working as if there was no other
logic than the one described in the logical intent. The PMU remains in this state until the enable
signal is deasserted.
When the enable signal is deasserted, the PMU changes into the gate_clk state, where it gates the
clock and asks for a reset, so that the design goes into a known state.
After a clock cycle, it enters the isolate state. In this state, isolation is enabled and, after
one clock cycle, the PMU enters the gate_power state.
The gate_power state turns off the power switch and waits for an acknowledgement from the switch
confirming power removal. After receiving the acknowledgement, it transitions back into the idle
state, waiting for a new enable signal.
If the power domain is needed while the PMU is in the gate_clk or isolate state, the PMU jumps to
the counterpart state, to prevent the domain from entering low power and causing a large time
overhead.
Figure 4.2: Function State Machine. States: IDLE, WAKE_UP, DEISOLATE, CLK, ACTIVE, GATE_CLK,
ISOLATE and GATE_POWER. Transition labels: enable, !enable, pwr_ack, !pwr_ack, rst_ack and
!main_rst_n.
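The state sequence described in this section can be sketched as a small behavioural model in
Python. This is an illustration under stated assumptions, not the RTL of the PMU: the function and
variable names are hypothetical, the reset outputs are left out, the "counterpart" states are
assumed to be CLK for gate_clk and DEISOLATE for isolate, and !main_rst_n is assumed to return the
machine to IDLE.

```python
# Sketch of the eight-state PMU; transition details marked "assumed" are
# inferred from the text and figure 4.2, not taken from the real RTL.

# Outputs per state: (power switch on, isolation enabled, clock requested).
OUTPUTS = {
    "IDLE":       (False, True,  False),
    "WAKE_UP":    (True,  True,  False),
    "DEISOLATE":  (True,  False, True),
    "CLK":        (True,  False, True),
    "ACTIVE":     (True,  False, True),
    "GATE_CLK":   (True,  False, False),
    "ISOLATE":    (True,  True,  False),
    "GATE_POWER": (False, True,  False),
}


def next_state(state, enable, pwr_ack, rst_ack, main_rst_n=True):
    """Return the PMU state for the next clock cycle."""
    if not main_rst_n:                      # assumed: reset returns to IDLE
        return "IDLE"
    if state == "IDLE":
        return "WAKE_UP" if enable else "IDLE"
    if state == "WAKE_UP":                  # wait until power is stable
        return "DEISOLATE" if pwr_ack else "WAKE_UP"
    if state == "DEISOLATE":                # single-cycle state
        return "CLK"
    if state == "CLK":                      # wait for clock/reset block ack
        return "ACTIVE" if rst_ack else "CLK"
    if state == "ACTIVE":
        return "ACTIVE" if enable else "GATE_CLK"
    if state == "GATE_CLK":                 # assumed counterpart: CLK
        return "CLK" if enable else "ISOLATE"
    if state == "ISOLATE":                  # assumed counterpart: DEISOLATE
        return "DEISOLATE" if enable else "GATE_POWER"
    if state == "GATE_POWER":               # wait until power is removed
        return "GATE_POWER" if pwr_ack else "IDLE"
    raise ValueError(f"unknown state: {state}")


def run(inputs, state="IDLE"):
    """Apply (enable, pwr_ack, rst_ack) tuples; return the visited states."""
    visited = [state]
    for enable, pwr_ack, rst_ack in inputs:
        state = next_state(state, enable, pwr_ack, rst_ack)
        visited.append(state)
    return visited


# One full wake up and power down cycle, one tuple per clock cycle.
cycle = [(1, 0, 0), (1, 1, 0), (1, 1, 0), (1, 1, 1),
         (0, 1, 1), (0, 1, 1), (0, 1, 1), (0, 0, 0)]
print(run(cycle))
```

Running the example walks through all eight states once and ends back in IDLE. Asserting enable
again while the model is in GATE_CLK or ISOLATE sends it back towards ACTIVE instead of completing
the power down, which is the early exit behaviour described in the text.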
An improved version of the PMU has been implemented, but not fully tested due to the traffic
profiles used by the system. The improved PMU contains a timer in the ACTIVE state. This timer
prevents the block from entering low power unless there has been no activity for the duration of
its defined timeout. The objective is to filter out sequential traffic, preventing the system from
constantly shutting down and powering back up at each consecutive transaction. The timer is
configurable by software.
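The filtering effect of the timer can be illustrated with a short Python sketch. The exact
behaviour of the real timer is not documented here, so the sketch assumes that the PMU leaves the
ACTIVE state once the enable signal has been low for a given number of consecutive clock cycles.
The function name and the example trace are hypothetical.

```python
# Sketch of the inactivity filter; the exact timeout semantics are assumed.

def cycle_leaving_active(enable_trace, timeout):
    """Return the cycle at which the PMU would leave ACTIVE, or None.

    enable_trace holds one enable sample per clock cycle. The PMU is
    assumed to leave ACTIVE once enable has been low for `timeout`
    consecutive cycles.
    """
    idle_cycles = 0
    for cycle, enable in enumerate(enable_trace):
        if enable:
            idle_cycles = 0          # activity restarts the timer
        else:
            idle_cycles += 1
            if idle_cycles >= timeout:
                return cycle
    return None


# Two transactions separated by a two-cycle gap, then a long idle period.
trace = [1, 0, 0, 1, 0, 0, 0, 0]
print(cycle_leaving_active(trace, timeout=3))  # 6
print(cycle_leaving_active(trace, timeout=1))  # 1
```

With a timeout of one cycle, the power down sequence would start in the gap between the two
transactions. With a timeout of three cycles the gap is filtered out and the domain is only gated
during the longer idle period.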
4.4 Final Result
Since the IP power intent is either fully on, or in a low power mode that gates the whole system,
it is possible to reduce static power consumption even further by gating modules that may not be
used when the IP is not in the power down mode.
The final implementation consists of three power domains, one for each power requirement:
PD_VDMA_RD_SW, PD_VDMA_WR_SW and PD_VDMA_SW. The first power domain includes the modules that
make up the read channel, the second one includes the modules from the write channel and the last
one is composed of the common logic and configuration registers.
A module was created for power management. It consists of a simple eight state state machine.
This module is replicated three times, once for each power domain. The only difference between the
instances is the signals used to trigger the transition of the state machine to the on state. The
three PMUs work in parallel, controlling the power domains independently.
Table 4.2: eDMA power state table
             PD_VDMA_SW      PD_VDMA_RD_SW   PD_VDMA_WR_SW
State        power  ground   power  ground   power  ground
PS_ALL_ON    0.8    0        0.8    0        0.8    0
PS_ALL_OFF   OFF    0        OFF    0        OFF    0
PS_WR_ON     0.8    0        OFF    0        0.8    0
PS_RD_ON     0.8    0        0.8    0        OFF    0
Table 4.2 represents the power state table defined in the implementation. It is composed of four
states: PS_ALL_ON, PS_ALL_OFF, PS_WR_ON and PS_RD_ON. Each of these power states represents a
possible state the power domains could be in. The defined operating voltage is 0.8V because it is
the voltage at which the cells of the library used in the implementation operate.
From the PST it is possible to observe that PD_VDMA_SW can never be gated off while either of the
other two power domains is active. This comes from the fact that PD_VDMA_SW contains common logic
necessary for both read and write traffic operations, as well as configuration registers.
The power states PS_WR_ON and PS_RD_ON are the states used during exclusive write or read
operations, respectively. Those are the power states that take advantage of the different power
needs of the read and write channels and grant some extra power savings.
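The observation about PD_VDMA_SW can be checked mechanically. The Python sketch below encodes only
the power columns of table 4.2 (every ground column is 0) and verifies that no state powers a
channel while the common logic is gated. It is an illustration, not part of the tool flow, and the
helper names are hypothetical.

```python
# Power columns of table 4.2; names are illustrative, not tool APIs.
OFF = "OFF"
COMMON, READ, WRITE = "PD_VDMA_SW", "PD_VDMA_RD_SW", "PD_VDMA_WR_SW"

PST = {
    "PS_ALL_ON":  {COMMON: 0.8, READ: 0.8, WRITE: 0.8},
    "PS_ALL_OFF": {COMMON: OFF, READ: OFF, WRITE: OFF},
    "PS_WR_ON":   {COMMON: 0.8, READ: OFF, WRITE: 0.8},
    "PS_RD_ON":   {COMMON: 0.8, READ: 0.8, WRITE: OFF},
}


def gated_domains(state):
    """List the domains that are gated off in the given state."""
    return [domain for domain, supply in PST[state].items() if supply == OFF]


def common_on_whenever_channel_on(pst):
    """Check that no state powers a channel while the common logic is off."""
    for supplies in pst.values():
        channel_on = supplies[READ] != OFF or supplies[WRITE] != OFF
        if channel_on and supplies[COMMON] == OFF:
            return False
    return True


print(gated_domains("PS_WR_ON"))            # ['PD_VDMA_RD_SW']
print(common_on_whenever_channel_on(PST))   # True
```

The gated_domains helper also makes the source of the extra savings visible: in PS_WR_ON only the
read channel domain is gated, and in PS_RD_ON only the write channel domain.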
4.4.1 Power Architecture
All the power domains created depend on PD_VMAIN_SW, the main power domain of the IP, since all
their supply sets are connected to the PD_VMAIN_SW primary supply set.
As it is possible to observe from figure 4.3, each power domain has its own switch, controlled
from its PMU. The pm_en_sw signals are outputs from those power management units, responsi-
ble for the control of the power switch.
4.5 Verification
Testing is a very important and time consuming task in the VLSI industry. To ensure the correct
implementation of power gating in this design, it had to be tested. For the IP, one of the
verification solutions is a test bench named VTB (Verification Test Bench), based on the Universal
Verification Methodology (UVM). UVM is an Accellera standard that enables reuse of verification
environments and Verification IP (VIP) [19]. UVM is implemented on top of SystemVerilog (IEEE Std.
1800), an IEEE standard hardware design, specification and verification language that is commonly
used for verification. SystemVerilog is very similar to the Verilog HDL, but it also has some
object oriented properties.
Figure 4.3: Block representation of the power domains
VTB has a set of tests, used to exercise and test different interfaces and functions of the IP.
Some of the tests are used to exercise the eDMA block, with different traffic profiles.
Although there are already many tests, none of them takes power into account. To ensure the full
functionality of the block, it is necessary to test the entry into low power mode as well as the
exit sequence from it. For that, the conditions to enter low power must be met during the test. The
correct behaviour of the remaining system must also be tested while the module is turned off.
In order to test the correct behaviour of the implementation, two already existing tests were
modified. One of the tests generates traffic from the eDMA block, both read and write traffic. The
other one sends generic traffic between the core and the application. With those tests, three new
tests were created. The difference between these three tests is the type of traffic generated by the
eDMA. The first test will generate read traffic, the second write traffic and the last one both read
and write traffic.
The test sequence is the same for all of the three tests. The eDMA starts powered off, then it
wakes up because of the configuration process. After the initial configuration, the test will force
generic (core/application) traffic in parallel with eDMA traffic. The eDMA traffic will be write
traffic, when it is generated in the eDMA write channel, read traffic, when generated in the eDMA
read channel, or both, depending on the test used.
After the transactions are complete, the test disables the eDMA block by deasserting an
internal enable signal that is accessible to software in a chip implementation. It then proceeds
to send generic traffic, which helps to test the chip functionality with the eDMA module in low
power mode. At this stage the eDMA has been turned off by hardware via the PMU, since it detected
no activity. When the generic transactions finish, confirming proper low power operation, it is
necessary to test whether the eDMA is able to recover from low power and work as expected. This
also helps to test the power-up sequence and the reset of the modules. Since the eDMA operation
consists of accessing memory, as its name indicates, the test needs to program its registers,
specifying the amount of data it should transfer and its location in memory. Due to
confidentiality, it is not possible to go into eDMA configuration details.
The differences between the three tests provide a basis to analyse power savings for each type
of traffic. The results will be a good comparative measure for the validation of the three power
domains solution.
When writing tests for power gating, it is important to also have a top power domain at the
root scope that contains the whole design. This power domain will emulate all power components
placed outside of the IP. If the power architect decides to use an external switch, it should be
placed in this power domain. The top power domain is optional and is not synthesised.
Since IP is normally bought by other companies to integrate in their designs or SoCs (System
on a Chip), the top power domain can also be useful to simulate the behaviour of the client's
power intent. When specifying power intent, it is important that it integrates with the power
intent implemented at the chip level.
Chapter 5
Results
The IP used in this project is a highly configurable design that supports different data-path
sizes, different numbers of channels for a given instance and other parameters. The results were
obtained using a single configuration, well defined throughout the process. From this case study
it was possible to establish some base guidelines for power gating implementation at the RTL stage
of the design process. These guidelines have the purpose of simplifying power gating
implementation when the power architect has little knowledge of the design to be optimised.
It is important to write a clean and correct power intent, since it should be understandable by
the back-end team for proper implementation.
5.1 Power Reduction Outcome
The technology node chosen for this implementation was 28 nm. This technology node is
not the smallest one available, but it is currently being used by the industry. From the available
libraries, it was the only one that contained standard cells for power gating implementation.
Due to IP complexity and the time necessary to run the whole power characterisation flow, it
was mandatory to select a single corner for power analysis. The chosen corner was 125 °C from
the 28 nm technology node library. The available temperature corners were 125 °C, 0 °C and
-40 °C, with 125 °C being the one with the worst leakage.
This implementation proved to be effective in reducing static power consumption. It also had a
considerable impact on total power consumption, mainly due to clock gating, since dynamic power
accounts for the larger share of the total power consumption.
It was not possible to perform a good area overhead evaluation. This comes from the fact that,
in the flow used for this implementation, area varies in each synthesis run, even when using the
exact same design.
In one of the synthesis runs, it was possible to observe an increase of 2% of the total design
area, compared to the original design with no power gating. This is not a significant area
increase, but as stated before it is also not a very accurate measurement.
Since clock gating was implemented together with power gating, the area impact is smaller. Due
to register optimisation, clock gating is a technique that reduces area when implemented.
Power reports at this level of the ASIC design flow are not accurate, but they are a reference
for the design power trend and a good indication of possible savings.
During activity there is a negligible increase in power consumption, due to the extra logic from
power gating and the power management units. The following tables reproduce the results from
power analysis. Current is the design before power gating implementation on the eDMA module,
1 PD is the solution with a single power domain for the eDMA module and 3 PD is the final
solution with independent channel power gating.
Table 5.1: Power during activity.
Full traffic, power in µW.

Design   Block  Dynamic   Static    Total
Current  IP     3.29E-02  9.81E-03  0.19
Current  DMA    4.18E-04  1.30E-03  3.11E-02
1 PD     IP     3.33E-02  9.95E-03  0.196
1 PD     DMA    0.00062   0.00142   0.0317
3 PD     IP     0.0333    0.00995   0.196
3 PD     DMA    0.00062   0.00142   0.0317
Table 5.2: Power consumption related to current implementation, during activity.
Full traffic, power as a percentage of the Current design.

Design   Block  Dynamic   Static    Total
1 PD     IP     101%      101%      103%
1 PD     DMA    148%      109%      102%
3 PD     IP     101%      101%      103%
3 PD     DMA    148%      109%      102%
As can be seen from the relative values presented in table 5.2, the increase in power
consumption for both the single power domain and the triple power domain solutions is about 3%,
which is an acceptable price to pay for the reduction provided during periods of no activity. The
results for the read and write simulations are very similar, because the logic is similar and the
time interval chosen to extract activity had similar traffic characteristics.
Results are different for the full simulation, where it is clearly noticeable that the three
power domains solution is much more effective at cutting down power consumption, especially if
there is only read or only write traffic. Table 5.6 presents the relative power consumption of
both solutions. It is noticeable that the one power domain solution ended up increasing static
power consumption (102% for the IP and 110% for the DMA). This is because, when a single channel
is used, both channels are powered on and remain on until no channel is needed, whereas with three
power domains one channel remains powered off unless there is traffic in both directions.
Table 5.3: Power consumption during full simulation.
Full traffic, power in µW.

Design   Block  Dynamic   Static    Total
Current  IP     2.04E-02  9.75E-03  0.123
Current  DMA    4.52E-04  1.29E-03  1.89E-02
1 PD     IP     1.85E-02  9.89E-03  0.113
1 PD     DMA    0.00063   0.00142   0.0176
3 PD     IP     1.59E-02  8.51E-03  9.37E-02
3 PD     DMA    5.00E-04  5.00E-04  8.80E-03
Table 5.4: Power consumption for a full simulation, write traffic.
Write traffic, power in µW.

Design   Block  Dynamic   Static    Total
Current  IP     1.80E-02  9.72E-03  0.107
Current  DMA    4.35E-04  1.28E-03  1.62E-02
1 PD     IP     1.71E-02  9.87E-03  0.103
1 PD     DMA    0.00061   0.00141   0.0161
3 PD     IP     0.0153    0.00839   0.0868
3 PD     DMA    0.0005    0.00038   0.0059
Table 5.5: Power consumption for a full simulation, read traffic.
Read traffic, power in µW.

Design   Block  Dynamic   Static    Total
Current  IP     1.81E-02  9.72E-03  0.108
Current  DMA    4.36E-04  1.28E-03  1.64E-02
1 PD     IP     1.80E-02  9.87E-03  0.109
1 PD     DMA    6.20E-04  1.41E-03  1.70E-02
3 PD     IP     0.0155    0.0084    0.0885
3 PD     DMA    0.0005    0.00039   0.0066
Table 5.6: Relative power consumption for a full write simulation.
Write traffic, power as a percentage of the Current design.

Design   Block  Dynamic   Static    Total
1 PD     IP     95%       102%      96%
1 PD     DMA    140%      110%      99%
3 PD     IP     85%       86%       81%
3 PD     DMA    115%      30%       36%
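The relative values in tables 5.2 and 5.6 can be checked by dividing each absolute value by the
corresponding value of the Current design and rounding to the nearest percent. The short Python
sketch below does this for the numbers copied from tables 5.1 and 5.4. The function and variable
names are illustrative only and are not part of the flow used in this work.

```python
# Absolute power values (µW) copied from tables 5.1 and 5.4.
# Each entry is (dynamic, static, total).
TABLE_5_1 = {
    ("Current", "IP"): (3.29e-02, 9.81e-03, 0.19),
    ("Current", "DMA"): (4.18e-04, 1.30e-03, 3.11e-02),
    ("1 PD", "IP"): (3.33e-02, 9.95e-03, 0.196),
    ("1 PD", "DMA"): (0.00062, 0.00142, 0.0317),
    ("3 PD", "IP"): (0.0333, 0.00995, 0.196),
    ("3 PD", "DMA"): (0.00062, 0.00142, 0.0317),
}
TABLE_5_4 = {
    ("Current", "IP"): (1.80e-02, 9.72e-03, 0.107),
    ("Current", "DMA"): (4.35e-04, 1.28e-03, 1.62e-02),
    ("1 PD", "IP"): (1.71e-02, 9.87e-03, 0.103),
    ("1 PD", "DMA"): (0.00061, 0.00141, 0.0161),
    ("3 PD", "IP"): (0.0153, 0.00839, 0.0868),
    ("3 PD", "DMA"): (0.0005, 0.00038, 0.0059),
}


def relative_to_current(table):
    """Express every non-baseline row as rounded percentages of the Current row."""
    result = {}
    for (design, block), values in table.items():
        if design == "Current":
            continue  # the baseline is 100% by definition
        baseline = table[("Current", block)]
        result[(design, block)] = tuple(
            round(100 * value / base) for value, base in zip(values, baseline)
        )
    return result


activity = relative_to_current(TABLE_5_1)   # compare with table 5.2
write_sim = relative_to_current(TABLE_5_4)  # compare with table 5.6
```

Applying the same function to the read traffic values of table 5.5 would give the relative
figures for the read simulation, which are not tabulated here.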
5.2 Guidelines
From all the simulations and analysis made during the development of this dissertation, it was
possible to define some guidelines, useful for future power gating implementations and automation
of the power gating process.
First it is necessary to identify the modules that can be turned off. This can be achieved
by analysing the modules' activity from the waveforms provided by a previous simulation. If large
periods of inactivity are detected, the module becomes a good candidate for power gating.
A preliminary power analysis reveals the current power consumption of the modules.
By analysing the module's leakage power consumption, it is possible to conclude whether it is
worth creating a power domain for it or not. If the module has high leakage power, but at the
same time also has high activity, it is worth looking into it and partitioning it according to the
activity requirements of its inner logic.
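As an illustration of this selection step, the sketch below classifies a module from a
per-cycle activity trace and a leakage figure. It treats a module with long idle periods as a
power gating candidate and a leaky but busy module as a candidate for partitioning. The trace
format, the function names and both thresholds are assumptions made for the example. They are not
values used in this work.

```python
from itertools import groupby


def idle_stats(activity):
    """Return (idle_fraction, longest_idle_run) for a per-cycle 0/1 activity trace."""
    if not activity:
        return 0.0, 0
    idle_cycles = sum(1 for cycle in activity if not cycle)
    longest = max(
        (sum(1 for _ in run)
         for active, run in groupby(activity, key=bool) if not active),
        default=0,
    )
    return idle_cycles / len(activity), longest


def classify(activity, leakage_uw, min_idle_run=100, min_leakage_uw=1e-4):
    """Classify a module as 'skip', 'power-gate' or 'partition'.

    min_idle_run and min_leakage_uw are hypothetical thresholds.
    """
    if leakage_uw < min_leakage_uw:
        return "skip"        # leakage too low to justify a power domain
    _, longest_idle = idle_stats(activity)
    if longest_idle >= min_idle_run:
        return "power-gate"  # long idle period: good candidate
    return "partition"       # leaky but busy: inspect the inner logic


mostly_idle = [1] * 10 + [0] * 200 + [1] * 10
always_busy = [1] * 220
```

For instance, `classify(mostly_idle, 1.3e-3)` selects the module for power gating, while
`classify(always_busy, 1.3e-3)` suggests partitioning it instead.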
After selecting the candidates, it is important to group them. In a design, modules are related
to each other, which means that the activity of some modules depends on the activity of another
module. If that module is not performing any function, all of the related modules can also be
powered off. Grouping them into a single power domain reduces the amount of extra logic created
by power gating. A good way to group modules together is to create a wrapper. Although not
necessary, it eases the implementation.
If it is not possible to create a wrapper due to the logical hierarchy or design complexity, it
is still possible to implement power gating. Instead of consisting of the wrapper, the power
domain will consist of the independent modules. However, this may raise complications in the
back-end phase if a disjoint power domain has to be created. For new designs it is helpful to take
power hierarchy into consideration when designing the logical hierarchy of the system.
Having the modules selected, the next phase will be implementation. To implement power
gating it is necessary to understand the basics of the design it will be implemented on. If the
design requires state retention, a retention strategy has to be defined.
After deciding whether state retention is needed, it is necessary to define the signal that will
enable the power domain to turn on. Several power domains can also be an option. In that case, an
enable signal has to be selected for each of them. The enable signal can be a simple signal from
the parent module or a logical function of several signals. If two power domains have the same
enable signal and the same voltage, they can be grouped together, because they have the same
power requirements. The enable signal should be active during the whole activity phase of the
power domain.
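The grouping rule can be sketched in a few lines of Python: power domains are keyed by their
(enable signal, voltage) pair, and domains that share a key are merged. The domain names, signal
names and voltages below are hypothetical and serve only to illustrate the rule.

```python
from collections import defaultdict


def group_power_domains(domains):
    """Merge power domains that share both enable signal and voltage.

    domains maps a domain name to an (enable_signal, voltage) pair.
    Returns one sorted list of domain names per resulting group.
    """
    groups = defaultdict(list)
    for name, (enable, voltage) in domains.items():
        groups[(enable, voltage)].append(name)
    return [sorted(names) for names in groups.values()]


# Hypothetical candidates: PD_A and PD_B share enable signal and voltage.
candidates = {
    "PD_A": ("en_dma", 0.9),
    "PD_B": ("en_dma", 0.9),
    "PD_C": ("en_rd", 0.9),
    "PD_D": ("en_dma", 1.1),
}
merged = group_power_domains(candidates)
```

Here PD_A and PD_B end up in the same group, while PD_C (different enable signal) and PD_D
(different voltage) remain separate.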
Now that all the decisions are made, it is helpful to create a power intent diagram, since it
makes writing the UPF code easier.
5.3 Alternatives
Another power management unit can be designed, but some considerations should be kept in
mind. The wake up/power down and isolation signal sequences are important to avoid behavioural
errors. The power should only be removed after signals are isolated to avoid sampling of corrupted
signals. Isolation should also only be removed after power is stable, for the same reason.
The clock gating and restoration may depend on the design architecture, but removing the
clock when going into low power will reduce dynamic power.
When using retention, the PMU flow may differ from the one presented above in 4.2. Flows with
full state retention may not require a reset, but resetting them anyway could be a good measure
to ensure no logic is corrupted. However, it is important to respect the save/restore sequence.
The reset signal should be applied before restoration, otherwise data would be lost. The save
operation should also be applied before the reset signal. The isolation should also only be
lifted when the restore operation is concluded.
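The ordering constraints described in this section can be written down as pairs of events and
checked against a candidate PMU sequence. The sketch below is a minimal checker under the
assumption that each event occurs at most once per power cycle. The event names are hypothetical,
and events that are absent from a sequence are simply not checked.

```python
# (first, second): 'first' must happen before 'second'.
BASE_RULES = [
    ("isolate", "power_off"),    # remove power only after isolation
    ("power_on", "deisolate"),   # lift isolation only after power is back
]
RETENTION_RULES = [
    ("save", "reset"),           # save before the reset is applied
    ("reset", "restore"),        # reset before restoration
    ("restore", "deisolate"),    # lift isolation only after restore
]


def violations(sequence, retention=False):
    """Return the list of (first, second) rules that the sequence breaks."""
    rules = BASE_RULES + (RETENTION_RULES if retention else [])
    position = {event: index for index, event in enumerate(sequence)}
    broken = []
    for first, second in rules:
        if first in position and second in position:
            if position[first] > position[second]:
                broken.append((first, second))
    return broken


good = ["isolate", "power_off", "power_on", "deisolate"]
bad = ["power_off", "isolate", "power_on", "deisolate"]
good_ret = ["isolate", "save", "power_off", "power_on",
            "reset", "restore", "deisolate"]
bad_ret = ["isolate", "save", "power_off", "power_on",
           "restore", "reset", "deisolate"]
```

A sequence that removes power before isolating, such as `bad`, is reported as breaking the
first rule, and `bad_ret` is reported because restoration happens before the reset.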
Chapter 6
Conclusion
Reducing power consumption is a concern that has been growing over time. It is important
to consider power in digital circuit design, given the problems introduced in this document and
considering the pollution generated by some power stations.
Power aware implementations are important for saving energy, especially on battery operated
devices. Technology advancements come with advantages and disadvantages. Smaller transistors
can be operated at lower voltages, effectively lowering dynamic power consumption, which is
highly voltage dependent. They also allow more complex circuits with more logic at lower prices.
Although dynamic power gets lower, static power increases due to the lower threshold voltages of
smaller transistors.
The EDA industry also develops power saving solutions that can be applied to the design at a
high level of abstraction and provide good trade-offs. These power saving solutions are valuable
since reducing power consumption also reduces heat dissipation, decreasing the need for cooling
systems and consequently lowering device cost. It even reduces the thermal stress of components,
increasing their life and reducing thermal related effects on transistors.
The PMU is very versatile and could be used for other retention-less implementations, by
identifying and selecting a good enable signal.
The amount of power the power architect will be able to save when implementing power gating
depends on the approach used. A more aggressive approach is able to save more power, but
requires a deep understanding of the design itself and is more complex. The more complex the
implementation is, the longer it takes to implement, test and debug.
6.1 Future Work
This section presents possible future work to help automate power gating implementation. The
idea is to create a power partitioning tool composed of several scripts with well defined
functions. This tool would be based on the guidelines studied in this dissertation and apply them
automatically to a system, with reduced human interaction.
One of the scripts has to be capable of reading a Verilog module and analysing its processes. If
the module is composed of several processes independent from each other, the script will extract
these processes and promote them into new modules, so that they can be used in the power
intent.
Another important script receives several modules as inputs and places them inside a wrapper
module. The wrapper module will contain only inputs and outputs that are connected to modules
outside of itself, reducing the overall number of ports, and therefore simplifying isolation and level
shifter strategies.
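A minimal sketch of the port selection performed by such a wrapper script is given below. It
assumes that the connectivity is available as a mapping from each net to the set of modules it
touches, which is a simplification chosen for the example. The net and module names are
hypothetical.

```python
def wrapper_ports(nets, group):
    """Split the nets touching a module group into wrapper ports and internal nets.

    nets maps a net name to the set of module names connected to it.
    group is the set of modules to be placed inside the wrapper.
    """
    group = set(group)
    ports, internal = [], []
    for net, modules in nets.items():
        if not modules & group:
            continue              # net does not touch the wrapper at all
        if modules - group:
            ports.append(net)     # crosses the wrapper boundary: needs a port
        else:
            internal.append(net)  # stays inside: no port, no isolation
    return sorted(ports), sorted(internal)


# Hypothetical connectivity: rd_ch and wr_ch are wrapped together.
example_nets = {
    "cfg_bus": {"core", "rd_ch", "wr_ch"},
    "rd_wr_sync": {"rd_ch", "wr_ch"},
    "rd_data": {"rd_ch", "core"},
    "core_irq": {"core", "app"},
}
ports, internal = wrapper_ports(example_nets, {"rd_ch", "wr_ch"})
```

In this example `rd_wr_sync` becomes internal to the wrapper, so only `cfg_bus` and `rd_data`
would need ports and, consequently, isolation or level shifter cells.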
The last script from this power partitioning tool evaluates switching activity and selects the
modules that are good candidates for power gating. Then, with that information, it would
construct the power intent for those modules. The script should allow the user to supply the
enable signal as well as the retention registers, since those two constraints require some
knowledge of the design itself.
Another important piece of future work is to characterise the savings provided by the addition
of the timer to the PMU's state machine. This will require the implementation of power gating
in a new module that has subsequent transaction requirements, with help from the guidelines
deduced during this dissertation.
References
[1] Advanced Low Power Techniques, May 2016. URL: http://www.synopsys.com/Solutions/EndSolutions/advanced-lowpower/verification-lowpower/Pages/advanced-low-power-techniques.aspx.
[2] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand. Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits. Proceedings of the IEEE, 91(2):305–327, Feb 2003. doi:10.1109/JPROC.2002.808156.
[3] Sushma Honnavara-Prasad. System level power with IEEE 1801, 2015. URL: http://systempower.org/wp-content/uploads/2015/04/1801_Sushma.pdf.
[4] Michael Keating, David Flynn, Rob Aitken, Alan Gibbons, and Kaijian Shi. Low Power Methodology Manual: For System-on-Chip Design. Springer Publishing Company, Incorporated, 2007.
[5] IEEE P1801 Working Group. IEEE standard for design and verification of low-power integrated circuits. IEEE Std 1801-2013 (Revision of IEEE Std 1801-2009), pages 1–348, May 2013. doi:10.1109/IEEESTD.2013.6521327.
[6] M.C. Schneider and C. Galup-Montoro. CMOS Analog Design Using All-Region MOSFET Modeling. Cambridge University Press, 2010. URL: https://books.google.com/books?id=SDPG0Lz39HcC.
[7] S. Jadcherla. Verification Methodology Manual for Low Power. Synopsys, 2009. URL: https://books.google.pt/books?id=qz2NYgEACAAJ.
[8] T. Hattori. Challenges for low-power embedded SoC's. In VLSI Design, Automation and Test, 2007. VLSI-DAT 2007. International Symposium on, pages 1–4, April 2007. doi:10.1109/VDAT.2007.373214.
[9] F. Bin Muslim, A. Qamar, and L. Lavagno. Low power methodology for an ASIC design flow based on high-level synthesis. In Software, Telecommunications and Computer Networks (SoftCOM), 2015 23rd International Conference on, pages 11–15, Sept 2015. doi:10.1109/SOFTCOM.2015.7314103.
[10] A. Mathur and Qi Wang. Power reduction techniques and flows at RTL and system level. In VLSI Design, 2009 22nd International Conference on, pages 28–29, Jan 2009. doi:10.1109/VLSI.Design.2009.113.
[11] N.S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J.S. Hu, M.J. Irwin, M. Kandemir, and V. Narayanan. Leakage current: Moore's law meets static power. Computer, 36(12):68–75, Dec 2003. doi:10.1109/MC.2003.1250885.
[12] A.S. Sedra and K.C. Smith. Microelectronic Circuits: International edition. OUP USA, 2010. URL: https://books.google.pt/books?id=KuGCRAAACAAJ.
[13] Farzan Fallah and Massoud Pedram. Standby and active leakage current control and minimization in CMOS VLSI circuits. IEICE Transactions on Electronics, 88(4):509–519, 2005.
[14] S. Carver, A. Mathur, L. Sharma, P. Subbarao, S. Urish, and Qi Wang. Low-power design using the Si2 common power format. IEEE Design & Test of Computers, 29(2):62–70, April 2012. URL: http://dx.doi.org/10.1109/MDT.2012.2183574.
[15] V. Gourisetty, H. Mahmoodi, V. Melikyan, E. Babayan, R. Goldman, K. Holcomb, and T. Wood. Low power design flow based on unified power format and Synopsys tool chain. In Interdisciplinary Engineering Design Education Conference (IEDEC), 2013 3rd, pages 28–31, March 2013. doi:10.1109/IEDEC.2013.6526754.
[16] IEEE standard for design and verification of low-power, energy-aware electronic systems. IEEE Std 1801-2015 (Revision of IEEE Std 1801-2013), pages 1–515, March 2016. doi:10.1109/IEEESTD.2016.7445797.
[17] Power Optimization in Design Compiler, June 2016. URL: http://www.synopsys.com/Tools/Implementation/RTLSynthesis/Pages/PowerCompiler.aspx.
[18] IEEE standard for integrated circuit (IC) open library architecture (OLA). IEEE Std 1481-2009, pages c1–658, 2009. doi:10.1109/IEEESTD.2009.5430852.
[19] Universal verification methodology, June 2016. URL: http://www.accellera.org/community/uvm/.