FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO RTL Guidelines for Static Power Reduction Ciro de Moura Monteiro Mestrado Integrado em Engenharia Eletrotécnica e de Computadores Synopsys Supervisor: Hélder Silva FEUP Supervisor: José Carlos Alves July 27, 2016

FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

RTL Guidelines for Static Power Reduction

Ciro de Moura Monteiro

Mestrado Integrado em Engenharia Eletrotécnica e de Computadores

Synopsys Supervisor: Hélder Silva

FEUP Supervisor: José Carlos Alves

July 27, 2016

© Ciro de Moura Monteiro, 2015

Summary

Nowadays, with the growth of portable devices powered by limited-capacity batteries, good power management is important to guarantee the longest possible operating time. Dynamic power consumption was once one of the main considerations in low-power circuit design, but today some dynamic power reduction techniques are applied automatically by the tools.

Integrated circuits can be produced with ever more transistors, and with ever smaller ones. However, this also brings the problem of increased leakage currents. One possible approach to reducing the static power consumed by the leakage currents of these circuits is called power gating.

Power gating consists of using transistors as switches to turn the power supply of parts of an integrated circuit on and off. Header or footer transistors can be used for this purpose, each with its advantages and disadvantages.


Abstract

In today’s world, we are witnessing a growth in battery-operated portable devices, which require smart power choices due to their limited battery life. Dynamic power consumption has been a major consideration when designing power-aware devices, but some dynamic power savings are already introduced automatically by the design tools.

Current technology has evolved towards smaller transistors, which enables building chips with higher transistor density. With this technology, new problems arise, such as higher leakage currents. These may or may not be resolved by power reduction techniques, as not all of them are leakage oriented. One possible solution for this problem is the power gating technique.

Power gating consists of using switching transistors to control the power supply of certain areas of the circuit. This power reduction technique allows the use of header or footer transistors, each with its benefits and disadvantages.


Acknowledgements

I would like to express my thanks...

In particular to Hélder Silva and Athul Stripad, for following my work every week and helping with some important decisions.

To Professor José Carlos Alves, for supervising this work and helping me make decisions.

To Nelson Eira, for his patience in explaining how the scripts used by the implementation environment work.

To Synopsys, for the opportunity to carry out this dissertation project in a company environment.

To my family, for helping me grow.

To Susana Carvalho, for helping me choose my specialisation, which I came to enjoy.

To Inês Teixeira and Gabriel Ribeiro, for helping me with my English.

To all my other friends, for their patience in putting up with me.

And finally to FEUP and its professors, for helping with my education.

Ciro de Moura Monteiro


“Laugh and the world laughs with you. Snore and you sleep alone”

Anthony Burgess


Contents

1 Introduction
    1.1 Context
    1.2 Motivation
    1.3 Objectives
    1.4 Power Gating
    1.5 EDA Team Organisation
    1.6 Structure

2 Related Work
    2.1 Power Consumption
        2.1.1 Power and Energy
        2.1.2 Dynamic Power
        2.1.3 Static Power
    2.2 Power Gating
        2.2.1 Consideration
        2.2.2 Header vs Footer Switching
        2.2.3 Fine Grain vs Coarse Grain
        2.2.4 Power Intent Languages
        2.2.5 UPF

3 Design Flow
    3.1 Flow Without Power
        3.1.1 Synthesis
    3.2 UPF flow
        3.2.1 Voltage Aware Synthesis
    3.3 Power Analysis

4 Implementation
    4.1 Steps taken
    4.2 Obstacles
    4.3 Power Management Unit
    4.4 Final Result
        4.4.1 Power Architecture
    4.5 Verification

5 Results
    5.1 Power Reduction Outcome
    5.2 Guidelines
    5.3 Alternatives

6 Conclusion
    6.1 Future Work

References


List of Figures

2.1 High power consumption
2.2 Low power consumption
2.3 Power consumption on a CMOS inverter. Source: [1]
2.4 Inverter
2.5 Clock Gating Cell
2.6 Summary of leakage currents of deep-submicrometer transistors. Source: [2]
2.7 Sub-threshold leakage path in a CMOS inverter
2.8 Header switching
2.9 Footer switching
2.10 Fine grain and cell
2.11 Fine grain and cell with isolation clamp transistor
2.12 Fine grain header switching and cell with isolation clamp transistor
2.13 Companies involved in IEEE P1801 working group. Source: [3]
2.14 Example of power domains
2.15 Isolation cell between power domains
2.16 State Retention Isolation. Source: [4]
2.17 Power domain with heterogeneous fan-out
2.18 Retention register. Source: [4]
2.19 Enable Level Shifter Example
2.20 Header switch cell
2.21 Isolation on input of heterogeneous fan-out
2.22 Redundant isolation
2.23 Output isolation

3.1 UPF tool flow. Source: [5]
3.2 Design flow for multi-voltage, power gated designs. Source: [4]

4.1 Basic representation of the module used
4.2 Function State Machine
4.3 Block representation of the power domains


List of Tables

2.1 Main parameter for the seven-metal-layer 90-nm CMOS technology node. Source: [6]
2.2 Example of PST

4.1 eDMA power state table
4.2 eDMA power state table

5.1 Power during activity
5.2 Power consumption related to current implementation, during activity
5.3 Power consumption during full simulation
5.4 Power consumption for a full simulation, write traffic
5.5 Power consumption for a full simulation, read traffic
5.6 Relative power consumption for a full write simulation


Symbols and Abbreviations

CAD           Computer-Aided Design
CPF           Common Power Format
CTS           Clock Tree Synthesis
DC            Design Compiler
DFT Compiler  Design-for-test Compiler
DMA           Direct Memory Access
DRC           Design Rule Check
DVE           Debugging and Visualisation Environment
DVFS          Dynamic Voltage and Frequency Scaling
DVS           Dynamic Voltage Scaling
EDA           Electronic Design Automation
eDMA          Embedded DMA
FET           Field Effect Transistor
GIDL          Gate Induced Drain Leakage
HDL           Hardware Description Language
IC            Integrated Circuit
IEEE          Institute of Electrical and Electronics Engineers
IoE           Internet of Everything
IoT           Internet of Things
IP            Intellectual Property
IR drop       Voltage drop due to energy losses in a resistive path
MOS           Metal–Oxide–Semiconductor
MTCMOS        Multi-Threshold CMOS
MVSIM         Multi-Voltage Simulation
NLP           Native Low Power
NMOS          N channel MOSFET
PG            Power and Ground
PMU           Power Management Unit
PST           Power State Table
PVT           Process, Voltage, Temperature
QoR           Quality of Results
RCE           Regression Control Environment
RTL           Register Transfer Level
SAIF          Switching Activity Interchange Format
SDC           Synopsys Design Constraints
SI            International System of Units (Système international d’unités)
SI2           Silicon Integration Initiative
SoC           System on a Chip
SPEF          Standard Parasitic Exchange Format
TCL           Tool Command Language
UPF           Unified Power Format
UVM           Universal Verification Methodology
VC LP         Verification Compiler Low Power
VCD           Value Change Dump (IEEE Standard 1364-1995)
VCS           Verilog Compiled code Simulator
VHDL          VHSIC Hardware Description Language
VHSIC         Very High Speed Integrated Circuit
VIP           Verification IP/Verification Link Partner
VLSI          Very-Large-Scale Integration
VPD           VCD Plus
VTB           Verilog Test Bench
WWW           World Wide Web


Concepts

Clock Gating      Technique to reduce clock activity
Power Density     Power consumed per unit area
Power Gating      Cutting power using a switch
Power Island      Island that is kept ON inside a power domain that is OFF
Shadow Registers  Registers used to store data in sleep mode
VDD               Positive voltage rail
VSS               Negative voltage rail/Reference voltage


Chapter 1

Introduction

Energy efficiency is now a central aspect of electronic circuit design. Just a few decades ago, designers focused only on having a working chip, and power consumption was not a primary design concern. Requirements for portability, mobility or battery operation were rare, and the early CMOS technologies for digital electronics were sufficiently frugal in terms of power consumption. Then, as technology evolved, transistor size decreased, making it possible to fit more transistors on one die, and power consumption gained importance. Even so, only dynamic power seemed to matter, because leakage is negligible for CMOS technologies above 130 nm [4]. Nowadays, power is more important than ever, especially for battery-operated and mobile devices.

Power-hungry chips tend to have high power density, which generates a lot of heat in a small dissipation area. This raises heat dissipation issues, requires expensive cooling systems and may reduce chip lifetime.

Due to environmental concerns, there is interest in reducing the power consumption of devices, in an effort to reduce pollution and the power wasted during inactivity. Systems may have most of their modules idling, consuming power without executing any operation. Chips should idle efficiently, to reduce the energy consumed during their operation.

In recent years, portability and mobility have gained growing importance, and in that field power consumption is of paramount importance. For example, it is a significant inconvenience for users if portable equipment has to be charged constantly. Not only do batteries have little autonomy without power management, they also have a short lifetime and have to be replaced. Efficient power management makes better use of the batteries’ energy, making them last longer and avoiding the hassle of replacing or charging them constantly.

To reduce power consumption in digital integrated VLSI systems, various considerations must be kept in mind. Dynamic power used to be the major concern in power-aware designs, but as technology nodes shrink and transistor density increases, leakage-related power consumption has gained increasing importance. Besides, most circuits today already implement techniques to reduce dynamic power consumption, such as clock gating. Clock gating effectively reduces the activity in the system, consequently reducing dynamic power consumption (as can be deduced from equation 2.4).
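The effect of clock gating on dynamic power can be illustrated with a short numerical sketch. It assumes the commonly used switching-power model P = α·C·V²·f, which may not match the exact form of equation 2.4, and it treats clock gating simply as a lower effective activity factor α. The function name and all numerical values are hypothetical and chosen only for illustration.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power in watts, assuming the model P = alpha * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF of switched capacitance, 1.0 V supply, 500 MHz clock.
ungated = dynamic_power(activity=0.20, capacitance_f=1e-9, vdd_v=1.0, freq_hz=500e6)

# Clock gating is modelled here as a reduction of the effective activity factor.
gated = dynamic_power(activity=0.05, capacitance_f=1e-9, vdd_v=1.0, freq_hz=500e6)

saving = 1.0 - gated / ungated
print(f"ungated: {ungated:.3f} W, gated: {gated:.3f} W, saving: {saving:.0%}")
```

Under this model the saving depends only on the ratio of the two activity factors, which is why clock gating does nothing for the static component of power.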

Today’s techniques to reduce power consumption include using higher-threshold transistors in non-critical paths of the design. That includes using high-k dielectric gate oxide, well biasing and bigger transistors. These techniques are applied at a very low level, do not save much power, and become expensive because they require extra masks for the different transistors. Sometimes it may be essential to use low-Vt or ultra-low-Vt transistors to achieve the very high speeds that industry demands nowadays. These techniques are also very time consuming, since they may require the designer’s attention to single gates.

Before the 90 nm technology node became available, designers used to simply migrate chips to smaller geometries to reduce power. This would take advantage of lower supply voltages as well as lower capacitance. Migrating a 180 nm chip to 130 nm would cut power almost in half. However, although the voltage decreases, the current increases. At 90 nm, the increase in current is more significant than the decrease in voltage, which results in higher power consumption than expected [7].

The importance of static power has increased significantly as technology size decreases. Once negligible, static consumption at the deep sub-micron level can reach almost 50% of the total power consumption [8]. A very useful and effective technique to reduce static power consumption is power gating. Power gating consists of shutting down inactive parts of a circuit. Implementing this technique is very time consuming, because the power needs of each part must be identified and the result must be tested.

Power gating was created to reduce static power at the block level. When power gating is applied, the supply to the module is cut off and its power consumption theoretically drops to zero. Major dynamic and static power reductions can be achieved by addressing power at the RTL (Register Transfer Level) and system level [9] [10]. This provides a higher abstraction level to the designer and allows the RTL designer to produce power-aware circuits, something otherwise only implemented at the back-end.

1.1 Context

This dissertation was developed in the scope of the Master in Electrical and Computers Engineering at the Faculty of Engineering of the University of Porto. The work was proposed by Synopsys® following a previous contact during a summer job between July and September 2015.

1.2 Motivation

Every day, a great number of portable devices are in development. Smartphones and smartwatches are two good examples of technological evolution. They are evidence of the growth of the IoT (Internet of Things), also named IoE (Internet of Everything) by some, given its extent. These devices are small and most of them are battery operated, which requires smart power choices.

Transistor size has been decreasing during the past few years, allowing smaller devices with higher transistor density. This also lowers the operating supply voltage, which translates to a quadratic reduction in power consumption ($P(t) = V(t)^2/R$). On the other hand, leakage power also increases with smaller transistors, due to the lower threshold voltage and the increasing number of transistors per die. Besides, smaller transistors also allow faster transitions, which increases dynamic power consumption.
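The quadratic dependence on supply voltage can be checked with a small worked example based on the relation P = V²/R quoted above. The two supply voltages and the load resistance are hypothetical values picked for illustration, not figures from this work.

```python
def resistive_power(voltage_v, resistance_ohm):
    """Power dissipated in a resistive load: P = V^2 / R."""
    return voltage_v ** 2 / resistance_ohm


# Hypothetical supply voltages before and after scaling, same load resistance.
p_before = resistive_power(voltage_v=1.8, resistance_ohm=100.0)
p_after = resistive_power(voltage_v=1.2, resistance_ohm=100.0)

# The power ratio equals the square of the voltage ratio.
ratio = p_after / p_before
print(f"power ratio after scaling: {ratio:.3f}")
```

Reducing the supply to two thirds of its original value leaves less than half of the original power in this model, because the ratio is (1.2/1.8)².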

When designing digital circuits, reducing power consumption has been a major concern in the last few years. Designers often implement several methodologies to reduce dynamic power consumption, like clock gating, but as technology improves and gets smaller, static energy consumption has been gaining importance and becoming a bigger slice of the total power consumption.

Even when a circuit is not active, it consumes energy. This is because transistors are not perfect and permit small currents to flow despite being in the logical off state. With the increase in the number of transistors in a single chip, the sum of these small currents becomes significant. This problem is known as leakage, because unwanted current is leaking through the transistors.
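The way individually small currents add up can be sketched numerically. The model below is a deliberate simplification that assumes every transistor leaks the same current and that total static power is the sum of those currents multiplied by the supply voltage. The per-transistor leakage of 1 nA and the 1.0 V supply are hypothetical values, not measurements.

```python
def total_leakage_power(n_transistors, leak_current_a, vdd_v):
    """Static power in watts, assuming identical leakage in every transistor."""
    return n_transistors * leak_current_a * vdd_v


# Hypothetical per-transistor leakage of 1 nA at a 1.0 V supply.
for n in (1_000, 1_000_000, 1_000_000_000):
    power_w = total_leakage_power(n, leak_current_a=1e-9, vdd_v=1.0)
    print(f"{n:>13,} transistors -> {power_w:.6f} W")
```

In this simplified model the static power grows linearly with the transistor count, so a current that is irrelevant for a thousand transistors becomes a noticeable budget item for a billion.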

For this reason, leakage current becomes an important matter as technology advances: the decreasing size of transistors leads to a smaller threshold voltage, which increases leakage. At the same time, switching currents decrease due to smaller capacitance, and therefore static power consumption increases relative to dynamic power consumption. Switching off the inactive parts of a circuit can be a solution to the leakage problem. This method is called power gating.

Power gating is already implemented in several designs and has proved effective, but it is not yet automatic and there are no guidelines defined for high-level static power savings.

1.3 Objectives

The goal of this dissertation is to evaluate strategies for implementing power gating mechanisms, applied to a high-speed intellectual property (IP) module, and to characterise the power savings and the impact on the circuit design, resulting in a set of design guidelines for introducing power gating in future designs. Different approaches are studied and their impact on the design is evaluated.

Since implementing power gating on an IP block designed without taking power into account can be very troublesome and slow, the need emerged for guidelines that could be followed to reduce the implementation time. These guidelines could also be useful in the future for automating the process.

Another objective is to evaluate the best way to use the tools to implement power gating in the RTL design, as well as to check the functionality before and after power gating is implemented.

1.4 Power Gating

Power gating consists of using transistors as switches to control the supply of power to selected parts of a circuit, according to their activity. Its purpose is to reduce static power consumption. As almost every improvement in digital circuit design is a trade-off, power gating decreases power consumption in exchange for a small area increase as well as increased design complexity.

A key concept in power gating is the power domain. Power domains are composed of circuit blocks that share similar activity requirements. Each power domain can be controlled by a different signal, allowing some power domains to be active while others are sleeping, providing functionality while saving power. When a power domain is in sleep mode, its registers lose their values, which may become a problem if state needs to be kept during sleep mode. To solve this issue, retention registers can be used. Retention registers save the value of important registers and are discussed in later chapters.

Connecting circuits that are not active to active circuits may cause incorrect values to be read. To avoid this, isolation cells are used at the connections between inactive and active blocks.
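The interplay between power domains, retention registers and isolation cells can be pictured with a toy behavioural model. This is only a conceptual sketch in software, not a description of how the cells are implemented in hardware or specified in a power intent language. The class, its method names, the register names and the clamp value of 0 are all hypothetical.

```python
class PowerDomain:
    """Toy behavioural model of a switchable power domain (illustrative only)."""

    def __init__(self, name, registers, retained=(), clamp_value=0):
        self.name = name
        self.powered = True
        self.registers = dict(registers)   # live register values
        self.retained = set(retained)      # registers backed by retention storage
        self.clamp_value = clamp_value     # value driven by the isolation cells
        self._shadow = {}                  # retention storage, assumed always powered

    def power_off(self):
        # Save the retained registers, then model the loss of all live values.
        self._shadow = {r: self.registers[r] for r in self.retained}
        self.registers = {r: None for r in self.registers}
        self.powered = False

    def power_on(self, reset_value=0):
        # Retained registers are restored; the others come back at reset.
        self.registers = {
            r: self._shadow.get(r, reset_value) for r in self.registers
        }
        self.powered = True

    def output(self, reg):
        # Isolation: an active neighbour sees a known clamp value while the
        # domain is off, never the undefined contents of a powered-down register.
        return self.registers[reg] if self.powered else self.clamp_value


dma = PowerDomain("dma", {"ctrl": 0x5, "count": 42}, retained=["ctrl"])
dma.power_off()
print(dma.output("count"))  # value seen by an active block during sleep
dma.power_on()
print(dma.registers)        # state after wake-up
```

After the wake-up, only the register listed as retained keeps its previous value, which shows why the choice of which registers to retain is a functional decision as well as a power decision.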

Static power consumption increases as transistors get smaller, and has been gaining importance in power-aware designs. A good solution to this issue is power gating, which effectively cuts off the leakage currents. The challenge when adopting a power gating strategy is to decide which modules should be power controlled and when to power them on and off, according to the required functional specifications and the design speed/area trade-offs. Even though there are other techniques, the focus of this work is mainly on power gating.

There are description languages that allow power gating to be integrated in hardware design with the help of EDA (Electronic Design Automation) and verification tools. These languages are known as power intent description languages, because they are used to describe the mechanisms that control power to the modules.

1.5 EDA Team Organisation

The design of an integrated electronic circuit is a very complex task that requires close cooperation among different teams with competences in diverse areas. It is common for EDA teams to be organised in small sub-teams, and the burden of a project is often divided between them. Usually there is a front-end team, responsible for RTL design. There is also a team for RTL verification, responsible for writing test benches and verifying that the RTL implements the logical intent. The back-end team is responsible for place and route as well as layout.

Power gating is usually implemented at the front-end, but it has a large impact on the back-end process. Although there are other techniques for static power reduction, implemented at the back-end stage of the process, they are not as effective as cutting the supply voltage to the design. Power gating increases the design effort for both the front-end and back-end teams, so some decisions should be taken together.

Power intent specification is made at the RTL level, allowing verification of the logical operation with RTL simulations. Simulations at the netlist or lower level would greatly increase implementation and verification times. These simulations should also be performed, but only after obtaining good results from the RTL simulation.


This work involves RTL design and verification of the design, as well as synthesis, which is necessary for power analysis and is usually not done by the front-end team.

1.6 Structure

This document is divided into six chapters: Introduction (1), Related Work (2), Design Flow (3), Implementation (4), Results (5) and Conclusion (6).

The Introduction chapter (1), as its name states, is an introduction to the work developed throughout this master’s dissertation.

The Related Work chapter (2) presents a bibliographic review and a study of the current state of the art, introducing the major concepts used in power gating.

The Design Flow chapter (3) explains the differences between a traditional digital CMOS circuit design flow and a power-oriented one.

The Implementation chapter (4) explains the implementation decisions taken during development, as well as the final result.

The Results chapter (5) contains the results obtained from the implementation.

Conclusion (6) is the final chapter, where the final conclusions about this work are drawn and possible future improvements are described.


Chapter 2

Related Work

In today’s world, with the growth of the VLSI industry and the portability of devices, the number of gates inside a single IC has been increasing. This allows more logic in the same die, or even smaller dies, but causes increasing power consumption and, consequently, can raise thermal and energy problems.

One concern that has been gaining importance is static power consumption. Static power is consumed even when the circuit is in an idle state, that is, when it has no activity.

Power dissipation causes heat, which can be harmful to chips, reducing their lifespan and affecting performance, but reducing the heat may require expensive cooling systems that raise the product’s market cost.

As most devices these days are battery operated, improving battery life is a major concern. Batteries hold limited charge and, given the power consumption of today’s electronic devices, they usually do not last very long. As such, new batteries must be bought constantly. Non-rechargeable batteries have a big environmental impact, and rechargeable ones last a limited number of charging cycles. To reduce the impact of batteries, better energy efficiency is needed.

This chapter presents a study of power-related problems in CMOS digital circuits and some techniques to avoid them, focusing mainly on power gating, a technique for static power reduction.

2.1 Power Consumption

With the growth of mobile devices and applications, as well as environmental concerns, power consumption is becoming an important criterion in electronic system design. Synopsys' EDA tools provide various solutions for power-aware design, some of them automated. At the same time, new techniques with better trade-offs and better power savings keep emerging and becoming available in the market. Implementing them therefore becomes a market advantage for both Synopsys and its customers.

Energy losses are converted into heat, which can be harmful to electronic components, so these devices must be kept at a low temperature. One way of guaranteeing this is to use cooling systems. Reducing power consumption reduces heat dissipation, which improves the system in many respects and may make it possible to drop the cooling system, which would otherwise raise the price of the product.

Current tools already apply clock gating and other power saving techniques automatically, but leakage power is gaining importance and an efficient way to reduce it is needed.

CMOS digital circuits used to have negligible static power losses, as noted by [11] in 2003:

"Historically, complementary metal-oxide semi-conductor technology has dissipated

much less power than earlier technologies such as transistor-transistor and emitter-

coupled logic. In fact, when not switching, CMOS transistors lost negligible power.

However, the power they consume has increased dramatically with increases in device

speed and chip density."

The power consumption of digital CMOS circuits is given by equation 2.1, which can be divided into dynamic and static power consumption, as discussed in the subsections below (2.1.2 and 2.1.3). The first term is the dynamic power consumption and the second the static power consumption. P represents the total power consumption, A is the fraction of gates switching, C the total capacitive load of all gates and f the clock frequency. Ileak is the leakage current and V the supply voltage.

$$P = A C V^{2} f + V I_{leak} \qquad (2.1)$$

Source: [11]
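As a minimal sketch of how equation 2.1 splits into its two terms, the following Python snippet evaluates both for a made-up operating point. The function name and all numeric values are assumptions chosen for illustration, not figures from [11].

```python
def total_power(activity, capacitance, voltage, frequency, leakage_current):
    """Return (dynamic, static, total) power in watts, following equation 2.1."""
    dynamic = activity * capacitance * voltage ** 2 * frequency  # A * C * V^2 * f
    static = voltage * leakage_current                           # V * Ileak
    return dynamic, static, dynamic + static


# Hypothetical operating point, used only to exercise the formula.
dynamic, static, total = total_power(
    activity=0.1,          # A: 10% of the gates switch each cycle
    capacitance=1e-9,      # C: 1 nF of total capacitive load
    voltage=1.2,           # V: supply voltage in volts
    frequency=100e6,       # f: 100 MHz clock
    leakage_current=5e-3,  # Ileak: 5 mA
)
print(f"dynamic = {dynamic * 1e3:.1f} mW")
print(f"static  = {static * 1e3:.1f} mW")
print(f"static share of total = {static / total:.1%}")
```

Changing only the leakage current shows how the static term can grow to a large share of the total without any change in switching activity.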

2.1.1 Power and Energy

Energy and power are two important but different concepts, especially for portable devices. For these types of devices, battery life is a big concern, as is heat dissipation, because there is usually not enough space to implement an efficient cooling system, if any.

The SI unit of power is the watt (W), and power represents the amount of energy transferred per unit of time. Instantaneous power is given by:

$$P(t) = V(t) \times I(t) \qquad (2.2)$$

Energy is what a system converts into work or heat. Batteries provide the energy a circuit needs to execute a given function. The SI unit of energy is the joule (J), and energy is usually measured over a time interval. It can be calculated as the integral of power over that interval:

$$E = \int_{0}^{T} P(t)\,dt \qquad (2.3)$$

From this it follows that energy is the power used over a given time interval.


Over the same period, a circuit that demands more power will consume more energy than a low power one. Power consumption is especially important for battery operated devices, where a lower power design results in longer operating times. As can be seen in figures 2.1 and 2.2, both graphs represent the same energy, since the areas under the two curves are equal. If two battery operated circuits had power consumptions similar to the ones in the figures, the first would have a shorter operating time due to its higher power consumption.

Figure 2.1: High power consumption (power versus time)

Figure 2.2: Low power consumption (power versus time)
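The relation between the two figures can be sketched numerically. The snippet below integrates two constant power profiles with the trapezoidal rule, as a discrete stand-in for equation 2.3, and shows that equal energy can correspond to very different operating times. The profiles, durations and helper names are assumptions made up for this illustration.

```python
def energy_joules(power_watts, duration_seconds, steps=1000):
    """Approximate E = integral of P(t) dt over [0, T] with the trapezoidal rule."""
    dt = duration_seconds / steps
    total = 0.0
    for i in range(steps):
        t0, t1 = i * dt, (i + 1) * dt
        total += 0.5 * (power_watts(t0) + power_watts(t1)) * dt
    return total


def high_power(t):
    return 2.0  # hypothetical circuit drawing a constant 2 W


def low_power(t):
    return 0.5  # hypothetical circuit drawing a constant 0.5 W


# Same area under the curve: 2 W for 10 s versus 0.5 W for 40 s.
e_high = energy_joules(high_power, 10.0)
e_low = energy_joules(low_power, 40.0)
print(f"high power profile: {e_high:.1f} J in 10 s")
print(f"low power profile:  {e_low:.1f} J in 40 s")


def runtime_seconds(battery_energy, average_power):
    """Operating time of a battery with a fixed energy budget."""
    return battery_energy / average_power


# With the same (assumed) 20 J budget, the low power design runs four times longer.
runtime_high = runtime_seconds(20.0, 2.0)
runtime_low = runtime_seconds(20.0, 0.5)
```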

Static CMOS logic cells are built from networks of NMOS and PMOS transistors, relying on their ability to work as digital switches. Transistors are, however, not ideal. Their gates are capacitive, which makes the inputs of logic gates capacitive. They are also non-ideal switches, since they have a non-zero ON resistance and a finite OFF resistance. Together these are called parasitic impedances. In CMOS gates, parasitic impedances consume dynamic power when the gate switches state, through the charging and discharging of the parasitic capacitances, and static power when it is not switching, due to the finite OFF resistance and non-zero ON resistance.

In figure 2.3, it is possible to observe the paths of static and dynamic power consumption in a CMOS inverter. Logic gates have input and output capacitances due to the parasitic effects of transistors. Pstatic is the static power consumption of the inverter, where V is the supply voltage and Ileak the leakage current. Isc is the short circuit current during a state transition, and C represents the output capacitance of the gate together with the input capacitance of the next gate. Iswitch is the switching current, and fswitch represents the effective switching frequency, calculated from the clock frequency (fclk) and the activity (A).


Figure 2.3: Power consumption on a CMOS inverter. Source: [1]

2.1.2 Dynamic Power

Dynamic power consumption arises from the constant charging and discharging of parasitic capacitances at the outputs of the millions of gates inside an integrated circuit. Transistors are not ideal and have parasitic capacitances, which are charged and discharged according to the output of the logic gates. If the logic state is one, the capacitance is charged; if the logic state is zero, it is discharged. The transitions between zero and one are what consume dynamic power, which was once the most important factor in power consumption.

Figure 2.4: Inverter (with parasitic input and output capacitances Cin and Cout)

Figure 2.4 shows a representation of an inverter with its parasitic input and output capacitances. These parasitic capacitances are the cause of dynamic power consumption and of gate delays. Another parasitic effect comes from the wire interconnections: the longer the wire, the larger its capacitance.

In digital CMOS circuits, fan-out is the ability of a logic gate to drive the inputs of other logic gates. Clock distribution generates large networks, because all synchronous logic requires a reference clock signal. Because the clock reaches so much of the design, it requires complex routing and buffering. The clock signal is distributed through a complex tree of buffers so that it can drive all the gate inputs while keeping the required timing, with reduced skew across the system. Large clock trees are normally the biggest consumers of dynamic power. That is why the main focus in reducing dynamic power consumption is the reduction of clock frequency, as well as the reduction of clock cycles delivered to inactive parts of the system through clock gating. Most systems nowadays implement clock gating.

Figure 2.5: Clock Gating Cell (elements shown: Latch, Enable, Clock)

Dynamic power consumption can be calculated from expression 2.4. The power loss is caused by charging and discharging the gates' capacitive loads. A is the fraction of gates actively switching, C the total capacitive load of the module, f the frequency and V the voltage [11]. As can be observed, dynamic power losses depend on the number of active gates, capacitance, frequency and voltage.

The number of active gates depends on the needs of the system, with the clock tree being one of the biggest contributors. Activity can be reduced by removing the clock signal from logic that is currently not needed. This technique is known as clock gating.
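The behaviour of a clock gating cell can be sketched in a few lines of Python. The model assumes the common structure suggested by the labels in figure 2.5: a latch that is transparent while the clock is low, followed by an AND of the clock with the latched enable. That structure, the function name and the sample waveforms are assumptions for illustration, not a description of a specific library cell.

```python
def gated_clock(samples):
    """Behavioural model of a latch-based clock gating cell.

    samples: iterable of (clock, enable) pairs, each 0 or 1.
    The latch follows enable only while the clock is low, so the value
    used while the clock is high cannot change and cause glitches.
    """
    latched_enable = 0
    output = []
    for clock, enable in samples:
        if clock == 0:
            latched_enable = enable  # latch transparent: capture enable
        output.append(clock & latched_enable)
    return output


# Four clock cycles (low phase, high phase); enable is high in cycles 2 and 3.
clock = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [0, 0, 1, 1, 1, 1, 0, 0]
gated = gated_clock(zip(clock, enable))
print(gated)

# Only the pulses that reach the gated logic cost switching activity there.
pulses_in = sum(clock)
pulses_out = sum(gated)
print(f"{pulses_out} of {pulses_in} clock pulses delivered")
```

Because the enable is sampled only during the low phase, an enable that drops while the clock is high does not truncate the current pulse.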

Voltage is the most important factor because dynamic power depends on it quadratically: halving the voltage reduces dynamic power to a quarter. However, the frequency the system is able to achieve also depends on voltage (2.5), so reducing voltage can be detrimental to high-speed interfaces. A technique named Multi-Voltage can be used to keep different areas of a chip operating at different voltages, according to their needs.

$$P_{dynamic} = A C V^{2} f \qquad (2.4)$$

$$f \propto \frac{(V - V_{th})^{\alpha}}{V} \qquad (2.5)$$

Source: [11]
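The trade-off between equations 2.4 and 2.5 can be illustrated with a short calculation. The threshold voltage (0.4 V) and the exponent alpha (1.3) below are assumed values picked only to make the example concrete. They are not taken from [11], and the function names are hypothetical.

```python
def dynamic_power(activity, capacitance, voltage, frequency):
    """Equation 2.4: P_dynamic = A * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


def relative_max_frequency(voltage, threshold_voltage, alpha):
    """Right-hand side of equation 2.5, up to a constant of proportionality."""
    return (voltage - threshold_voltage) ** alpha / voltage


# Halving the supply at a fixed clock frequency cuts dynamic power to a quarter.
p_full = dynamic_power(0.1, 1e-9, 1.2, 100e6)
p_half = dynamic_power(0.1, 1e-9, 0.6, 100e6)
power_ratio = p_half / p_full

# The achievable frequency also drops (assumed Vth = 0.4 V, alpha = 1.3).
freq_ratio = (relative_max_frequency(0.6, 0.4, 1.3)
              / relative_max_frequency(1.2, 0.4, 1.3))

print(f"dynamic power at V/2: {power_ratio:.2f} of the original")
print(f"achievable frequency at V/2: {freq_ratio:.2f} of the original")
```

Under these assumptions the achievable frequency falls by more than half, which is why a lower supply is usually confined to the parts of the chip that can tolerate it.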

2.1.3 Static Power

Transistors are not ideal digital switches and conduct small currents even when the gate voltage is below the threshold voltage. These small currents cause power consumption, more precisely static power consumption. Static power is the power consumed when the gates are not switching. Once negligible, static power has gained importance as it has increased over the last few years.

Static power consumption can sometimes reach up to 50% of the total power consumption of a chip. This comes from the ever increasing number of transistors in each die, as well as the use of transistors with lower threshold voltages. In an effort to reduce power consumption, new techniques have been developed, which can be combined for better efficiency.


On a CMOS gate, there are four main leakage sources: sub-threshold leakage, gate leakage, gate induced drain leakage (GIDL) and reverse bias junction leakage. Sub-threshold leakage (ISUB) is the current that flows from drain to source when the transistor is operating in the weak inversion region. Gate induced drain leakage (IGIDL) is the current induced by a high field effect in the drain, caused by a high VDG. Reverse bias junction leakage (IREV) is caused by minority carrier drift and the generation of electron/hole pairs in the depletion regions. Gate leakage (IGATE) is the current that flows through the gate oxide to the substrate due to gate oxide tunnelling and hot carrier injection [4]. Gate leakage can be reduced by using materials with a higher dielectric constant for the gate oxide.

Figure 2.6: Summary of leakage currents of deep-submicrometer transistors. Source: [2]

In figure 2.6, it is possible to observe the leakage paths in a MOS transistor. I1 is the reverse bias pn junction leakage, I2 the subthreshold leakage, I3 the oxide tunnelling current, I4 the gate current due to hot-carrier injection, I5 the GIDL and I6 the channel punchthrough current.

According to [11], the two major components of static power consumption are gate leakage and sub-threshold leakage. Sub-threshold leakage is a weak inversion current across the device. Some devices can be designed to work in sub-threshold mode, but that is outside the scope of this thesis.

Figure 2.7: Sub-threshold leakage path in a CMOS inverter (labels: VDD, Isub)

Static power consumption depends only on voltage and current (2.6). Reducing either one is effective in reducing static power consumption. Since reducing voltage can limit the achievable frequency, as seen in 2.1.2, reducing current is the preferred approach. To achieve this, one can implement power gating, in which transistors with a higher threshold voltage and lower leakage current switch the module power rails on and off.

$$P_{static} = V I_{leak} \qquad (2.6)$$

Leakage current can be approximated as a combination of sub-threshold and gate-oxide leakage:

$$I_{leak} = I_{sub} + I_{ox} \qquad (2.7)$$

Gate leakage is the current that flows through the gate oxide, due to the quantum-mechanical

tunneling of electrons, as described by [6]:

"For oxide thicknesses below 4 nm, high current leakages through the oxide can occur

due to the quantum-mechanical tunneling of electrons. The gate leakage current can

not only negatively affect the device performance but also significantly increase the

standby power consumption of a chip."

When a transistor works as a digital switch, it either conducts or cuts off the signal. More specifically, a MOSFET enters the cut-off state when its gate-to-source voltage is below the transistor's threshold voltage. Nonetheless, since transistors are not ideal switches, they have a finite resistance in the off state, which means that a small current still flows and a small amount of power is consumed. This effect is called sub-threshold leakage because, as the name suggests, it occurs while the gate voltage is below the threshold.

The drain-bulk and source-bulk junctions contribute their reverse currents. Two important mechanisms contribute to bulk current: gate induced drain leakage (GIDL) and impact ionisation. For advanced technologies, impact ionisation is no longer important because the supply voltage is of the same order as, or lower than, the band-gap of silicon, so the carriers are no longer able to create electron-hole pairs [6].

From table 2.1 one can observe that p-MOS transistors are less leaky than n-MOS transistors of the same size, but are not capable of carrying the same amount of current in saturation mode.

Table 2.1: Main parameter for the seven-metal-layer 90-nm CMOS technology node. Source: [6]

Parameter             Logic (low power)
                      n-MOS     p-MOS
Supply voltage (V)    1.2
Drawn gate (nm)       90
tox (nm)              1.5
VT (mV)               420       -400
IDsat (mA/µm)         1.0       0.5
Ioff (nA/µm)          15        6


According to [12], the sub-threshold conduction current, for short channel MOSFETs can be

calculated with the following equation:

$$I_D = I_S\, e^{\frac{V_{GS}}{n V_T}} \qquad (2.8)$$

Source: [12]

In 2.8, IS is a constant, VT is the thermal voltage at room temperature and n is a constant whose value depends on the material and structure of the device. Although this current is small, the resulting power dissipation becomes a problem on chips with billions of transistors. Other authors consider a more complex, parameter-dependent equation (2.9), in which W and L are the gate width and length respectively, and the other parameters are technology parameters.

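$$I_{SUB} = \frac{W}{L}\, \mu\, v_{th}^{2}\, C_{sth}\, e^{\frac{V_{GS} - V_T + \eta V_{DS}}{n v_{th}}} \left(1 - e^{\frac{-V_{DS}}{v_{th}}}\right) \qquad (2.9)$$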

Source: [13] [4]

As seen in equation 2.9, sub-threshold leakage depends exponentially on VGS and VT. As technology scales down VDD and VT to lower dynamic power consumption, static power consumption increases.
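To make the exponential dependence concrete, the sketch below evaluates equation 2.9 for two threshold voltages and compares the resulting leakage. In this snippet `v_t` is read as the threshold voltage and `v_thermal` as the thermal voltage, which is an interpretation of the symbols in 2.9. The value n = 1.5, the thermal voltage of 25.85 mV and the unit prefactor parameters are placeholder assumptions. Only the ratio is meaningful, since the prefactor cancels out.

```python
import math


def subthreshold_current(w_over_l, mobility, c_sth, v_gs, v_t, v_ds,
                         eta, n, v_thermal):
    """Evaluate equation 2.9 for one transistor."""
    prefactor = w_over_l * mobility * v_thermal ** 2 * c_sth
    gate_term = math.exp((v_gs - v_t + eta * v_ds) / (n * v_thermal))
    drain_term = 1.0 - math.exp(-v_ds / v_thermal)
    return prefactor * gate_term * drain_term


# Placeholder technology parameters (assumptions; they cancel in the ratio).
common = dict(w_over_l=1.0, mobility=1.0, c_sth=1.0, v_gs=0.0, v_ds=1.2,
              eta=0.0, n=1.5, v_thermal=0.02585)

leak_high_vt = subthreshold_current(v_t=0.42, **common)  # VT = 420 mV
leak_low_vt = subthreshold_current(v_t=0.32, **common)   # VT lowered by 100 mV
ratio = leak_low_vt / leak_high_vt
print(f"lowering VT by 100 mV multiplies leakage by about {ratio:.1f}")
```

With these assumed values, a 100 mV reduction in threshold voltage raises the off-state (VGS = 0) current by more than an order of magnitude.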

2.2 Power Gating

This section presents power gating: its definition, important concepts, how it appeared, the languages used to implement it, who made it happen and how the two current standards were created.

Power gating consists of placing a switch between the supply rails and the supply ports of the cells. When the module is not in use, the switch is turned off, cutting power to the module and avoiding its static power consumption. With ideal digital switches, power gating would completely cut off the leakage current consumed by the module. Since the switches are usually implemented in CMOS technology, power gating instead reduces the leakage current of the whole module to the leakage current of the switching transistors.

Power gating is implemented with transistors connected between the supply and the module, known as header switching (figure 2.8), between the module and ground, known as footer switching (figure 2.9), or both. Each approach has its advantages and disadvantages.

Implementing power gating is a trade-off: it increases area, and it adds dynamic power due to switching between the powered on and powered off states. So, in order to implement power gating, one must be aware of this trade-off and make sure the implementation is beneficial. If a module is constantly switching between the on and off states, the increase in dynamic power can exceed the decrease in static power, making the technique harmful rather than beneficial.
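This trade-off can be expressed as a break-even idle time. The sketch below uses a deliberately simplified model, which is an assumption and not a method from the cited works: each power-down and power-up cycle costs a fixed energy, and while gated the module saves the difference between its normal leakage power and the residual leakage of the switches. All names and numbers are hypothetical.

```python
def break_even_time(switch_energy, leakage_on, leakage_gated):
    """Idle time (s) above which one power gating cycle saves energy."""
    saved_power = leakage_on - leakage_gated
    if saved_power <= 0:
        raise ValueError("gating must reduce leakage to ever break even")
    return switch_energy / saved_power


def net_energy_saved(idle_time, switch_energy, leakage_on, leakage_gated):
    """Energy saved (J) by gating for idle_time; negative means a net loss."""
    return (leakage_on - leakage_gated) * idle_time - switch_energy


# Hypothetical module: 5 mW leakage when on, 1 mW when gated,
# and 2 uJ spent on each off/on transition.
t_be = break_even_time(2e-6, 5e-3, 1e-3)
short_idle = net_energy_saved(1e-4, 2e-6, 5e-3, 1e-3)  # idle for 0.1 ms
long_idle = net_energy_saved(1e-2, 2e-6, 5e-3, 1e-3)   # idle for 10 ms

print(f"break-even idle time: {t_be * 1e3:.2f} ms")
print(f"net saving for 0.1 ms idle: {short_idle * 1e6:.2f} uJ")
print(f"net saving for 10 ms idle:  {long_idle * 1e6:.2f} uJ")
```

A module whose idle periods are typically shorter than the break-even time is a poor candidate for power gating under this model.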

Shutting down inactive parts of a system may result in a loss of state. To avoid this problem, low leakage registers are used to retain state. These registers can introduce significant area overhead because they are implemented using bigger transistors.


When applying power gating to a design, special care must be taken to avoid changing existing critical paths and creating new ones. Extra logic such as isolation cells and level shifters adds delay to the data path. Paths that barely meet their timing constraints may violate them once the extra logic is added. Powered off logic also takes time to power back on, which may cause performance issues.

Power gating can be implemented at the RTL level of abstraction using power intent languages such as Common Power Format (CPF) or Unified Power Format (UPF), which are explained further in section 2.2.4.

Power switching can be driven either by hardware or by software. A hardware implementation requires more area for the control circuitry. A software implementation requires software development as well as an interface.

2.2.1 Consideration

Some considerations must be kept in mind when implementing power gating. These include the dynamic power consumed by the extra circuitry and the static power consumed by the always-on logic and by the power switching transistors. The area cost should also be kept in mind, because higher Vth transistors occupy a bigger area. The retention strategy is also important, because retention cells can cause a big area overhead.

A module that is switched off cannot be directly connected to an always-on module, because its outputs float, so isolation cells are required.

A single transistor may not be able to supply a full module, as its width may not be enough to carry all the current the module needs. Waking up the circuit too fast may cause a large inrush current, which could damage some tracks. The voltage drop across the switching fabric must be carefully analysed to ensure proper operation of the module.

When a block is power gated, its registers lose their values. In cases where state must be retained, always-on retention registers are the solution. These registers are usually implemented with lower leakage, lower voltage retention cells. However, this comes with a time penalty when restoring the values back to the main registers at wake-up time, raises dynamic power consumption and increases area.

2.2.2 Header vs Footer Switching

The switching transistors used for power gating can be placed between the power supply and the power domain supply pins, or between ground and the power domain ground pin. These are known as header and footer switching, respectively. Each of these implementations has its advantages and disadvantages.

A single transistor is not able to carry enough current to power a large power domain. For

that reason, several transistors are used in parallel, which are usually staged in time to avoid large

inrush currents [7].
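The benefit of staging the switches can be estimated with a first-order calculation. Charging the capacitance of a power domain to the supply voltage moves a charge Q = C·V, so the average current is I = C·V / t, where t is the ramp time. The capacitance and ramp times below are hypothetical values chosen for illustration, and the model ignores the shape of the current waveform.

```python
def average_inrush_current(domain_capacitance, supply_voltage, ramp_time):
    """Average current (A) needed to charge C to V in ramp_time: I = C * V / t."""
    return domain_capacitance * supply_voltage / ramp_time


# Hypothetical power domain, for illustration only.
capacitance = 10e-9  # 10 nF of total capacitance on the switched rail
voltage = 1.2        # supply voltage in volts

# All switches turned on at once: the rail ramps in about 10 ns.
fast = average_inrush_current(capacitance, voltage, ramp_time=10e-9)

# Switches staged in time: the rail ramps over 1 us.
staged = average_inrush_current(capacitance, voltage, ramp_time=1e-6)

print(f"simultaneous turn-on: {fast:.2f} A on average")
print(f"staged turn-on:       {staged * 1e3:.1f} mA on average")
```

The same charge is delivered in both cases; staging only spreads it over a longer time, at the cost of a longer wake-up latency.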


The sleep transistor efficiency is the ratio between the current in the ON state and in the OFF state (ION/IOFF). The total leakage of the switching fabric is highly dependent on this efficiency, because enough transistors are needed to deliver the required ON state current [4].
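A rough sizing exercise shows why this ratio matters. The sketch below takes the per-micrometre currents of Table 2.1 and uses the saturation current IDsat as a crude stand-in for the ON state current, which is a simplifying assumption: a real sleep transistor is sized for a voltage drop target rather than for saturation current. The 0.5 A load current and the function name are hypothetical.

```python
def size_switch(load_current, i_on_per_um, i_off_per_um):
    """Return (total width in um, fabric leakage in A, ION/IOFF efficiency)."""
    width_um = load_current / i_on_per_um  # width needed to carry the load
    leakage = width_um * i_off_per_um      # leakage of that width when off
    efficiency = i_on_per_um / i_off_per_um
    return width_um, leakage, efficiency


load = 0.5  # hypothetical module current in amperes

# Per-micrometre values from Table 2.1 (IDsat in A/um, Ioff in A/um).
nmos_width, nmos_leak, nmos_eff = size_switch(load, 1.0e-3, 15e-9)
pmos_width, pmos_leak, pmos_eff = size_switch(load, 0.5e-3, 6e-9)

print(f"n-MOS footer: {nmos_width:.0f} um wide, {nmos_leak * 1e6:.1f} uA leakage")
print(f"p-MOS header: {pmos_width:.0f} um wide, {pmos_leak * 1e6:.1f} uA leakage")
```

Under these assumptions the p-MOS fabric needs twice the width but leaks less in total, because its ION/IOFF ratio is higher.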

Header switching is typically implemented using PMOS transistors to switch VDD. PMOS transistors are less leaky than their NMOS counterparts of the same size, but they provide a lower drive current when active. Header switches turn off the supply voltage, allowing isolation cell outputs to be clamped to "0" simply, using a single transistor. This type of isolation should, however, only be used to close timing constraints, because if it fails it causes hard to detect stuck-at faults. Since system level signals are usually referenced to ground ("0"), switching VDD is more convenient.

Figure 2.8: Header switching (labels: VDD, Load)

Footer switching is typically implemented with NMOS transistors, which can drive a larger current than PMOS transistors of the same size and therefore have a smaller area cost in the design. NMOS transistors typically have a higher switching performance than PMOS transistors [4], allowing greater energy savings and a lower area impact in the same design. However, footer transistors switch VSS, making the system more sensitive to reference noise, which may become a problem.

Figure 2.9: Footer switching (labels: VDD, Load)

Some academic authors use both header and footer switching. However, the two series switches cause a more significant voltage drop, which in turn increases gate delay [4]. This also creates a bigger area overhead, since two series switches perform work for which one would be enough.

2.2.3 Fine Grain vs Coarse Grain

The power switch implementation can be either fine grain or coarse grain. Fine grain power

switches are part of the standard cell. It is required that the library contains standard cells with the

switches attached to them. Coarse grain power switching on the other hand can be implemented

with the addition of some special cells for power gating.

The decision of which implementation to use should be discussed with the back-end team, since the biggest impact will be felt after synthesis. Usually, the chosen implementation is coarse grain power switching, since it creates less area overhead, making it the better option even with the increased design effort.

2.2.3.1 Fine Grain Switching

Since the switch has to be able to provide the worst case current necessary for the cell to operate without performance loss, the area overhead can be considerable [4]. Fine grain cells may also include a pull-up or pull-down transistor for isolation. Since the power switch is already inside the standard cell, it is possible to use the traditional design flow.

Cells used for fine grain switching, with the embedded switching transistor, are called Multi-Threshold CMOS (MTCMOS) cells. MTCMOS cells contain the usual supply connections, inputs and outputs, plus an additional input for the sleep signal. They are usually implemented using footer switching, due to its higher switching performance, which also creates less area overhead than header switching. Even with footer switching, the area overhead can get close to four times the size of the original cell [4].

Figure 2.10: Fine grain AND cell (labels: VDD, Sleep)

2.2.3.2 Fine Grain Advantages

According to [4] fine grain switching has some advantages:


Figure 2.11: Fine grain AND cell with isolation clamp transistor (labels: VDD, Sleep)

• Not sensitive to ground noise injection, because of the short virtual power nets;

• Small wake-up latency and in-rush current, due to the small capacitance of the virtual power nets;

• Built-in clamp transistors keep outputs in known states and eliminate wake-up crowbar currents;

• The timing impact of the voltage drop across the switch and the clamp behaviour are easy to characterise, since they are inside the cell;

• Can be easily analysed and synthesised by conventional ASIC tools and flows, since MTCMOS cells are basically normal standard cells.

2.2.3.3 Fine Grain Disadvantages

The authors also name some disadvantages of fine grain switching:

• Considerable area overhead, with increases of up to three times the size of the original cell;

• Requires a special library with MTCMOS cells;

• Significant buffering and routing resources for sleep control distribution.

2.2.3.4 Coarse Grain Switching

In coarse grain power switching, a collection of switches is used to gate a collection of blocks of cells. Sizing the switch network is harder than in fine grain switching, since the activity cannot be estimated. Coarse grain power switching, however, introduces a significantly smaller area overhead [4].

Because the area penalty of fine grain power switching is not worth the savings in design effort, coarse grain switching became the industry's preferred method [4].


Figure 2.12: Fine grain header switching AND cell with isolation clamp transistor (labels: VDD, Sleep)

2.2.3.5 Coarse Grain Advantages

Coarse grain power switching is more widely accepted in the EDA industry for power gating implementation, mainly because of area constraints. The main advantages of coarse grain switching, as explained in [4], are:

• Since sleep transistors can share charge, they are less sensitive to PVT (process, voltage, temperature) variations and introduce less voltage drop variation;

• Significantly smaller area than fine grain switching;

• The number of sleep transistors can be optimised for voltage drop and speed targets;

• Existing standard cell libraries can be used, with a few extra special cells.

2.2.3.6 Coarse Grain Disadvantages

Coarse grain switching also brings some disadvantages, as stated in [4]:

• Requires a complex power network;

• The power network is hard to synthesise and requires static and dynamic voltage drop analysis;

• Requires wake-up in-rush current control;

• Bigger wake-up latency;

• Power analysis is more complex;

• The design flow is more complex.


2.2.4 Power Intent Languages

Power information is not supported by normal Hardware Description Languages (HDL), so it has to be described somewhere else. This is where power intent files come in. Written in a different language, they specify how the module should behave in terms of power. These files are written independently of the HDL and later formally verified by the tools. Power intent languages describe power specifications at the RTL level, and this high level of abstraction allows better power savings.

Two standards, developed by different companies, are currently defined for power intent: Unified Power Format (UPF) and Common Power Format (CPF).

CPF is a Silicon Integration Initiative (Si2) standard for low power and has some interoperability with the IEEE 1801 low power standard [14]. UPF, as it is commonly known, is the IEEE 1801 Standard for Design and Verification of Low-Power Integrated Circuits. UPF is based on the Tool Command Language (Tcl) [5]. Some other languages that are not standards may also be used by some companies. These languages, however, are not widely used, since developers prefer standards in an effort to unify development for faster and easier integration.

CPF 2.0 is a widely adopted low-power intent format, approved as an Si2 standard by the Low Power Coalition. It allows some interoperability with IEEE 1801-2009 (UPF). CPF supports hierarchical low-power flow, output and bidirectional virtual ports, isolation strategies, level-shifting, retention strategies and more.

This work uses UPF because Synopsys tools offer compatibility with it and some power intent

is already specified in UPF. Some further UPF explanation can be found in section 2.2.5.

2.2.5 UPF

UPF is the IEEE (Institute of Electrical and Electronics Engineers) standard for the design and verification of low power integrated circuits, under standard number 1801. It was originally created in an effort to provide an open, portable power specification standard, and was approved in 2007 as an Accellera standard. In the same year, Accellera donated it to the IEEE. The first version of IEEE Std 1801, which is the second version of UPF, was released in 2009 [5].

Since IEEE Std 1801 is an open standard, it gives EDA tool providers the ability to imple-

ment its latest features. The standard is already supported by a large number of EDA companies.

Synopsys tools already support a large subset of the commands in UPF, as well as some UPF-like

power intent commands that are not part of the standard [15].

UPF focuses on controlling the voltage and current applied to the transistors. The technology used for the switches is normally assumed to be CMOS, but other technologies can also be used. Due to its abstraction level, UPF can be applied with any of the three HDLs: VHDL, Verilog or SystemVerilog [5].

UPF supports a design hierarchy, which is advisable for reusing power intent across configurations. The UPF hierarchy depends on the hierarchy of the RTL modules, which can be a downside when the RTL was not designed with power gating in mind.


Figure 2.13: Companies involved in IEEE P1801 working group. Source [3].

The current active version of the standard is IEEE Std 1801-2015, approved on 8 December

2015 by the IEEE-SA Standards Board [16].

2.2.5.1 Concepts

When defining power intent with UPF, a few concepts must be learnt for better understanding

of its structure. This section explains the major concepts used in UPF for power gating.

Modules are grouped into power domains according to their power specifications. If two modules turn off at the same time and use the same voltage, they can be put together in the same power domain.

Ports are connection points between adjacent levels of hierarchy, connected together using nets. UPF assumes a more abstract model of the design hierarchy, using its commands to change the scope within the hierarchy levels. Ports have a HighConn side, visible to the parent instance, and a LowConn side, visible to the instance itself.

Power domains are collections of instances that are powered in the same way. Child instances are included in the same power domain as their parents. A power domain does not need to be contiguous, which means that instances in the same power domain can be placed in different locations. In the example in figure 2.14, modules A and B have the same power requirements, so they have been put together in power domain PD_A. Module C has different power requirements from A and B, so it belongs to a different power domain.

Figure 2.14: Example of power domains (modules A, B and C; power domains PD_A and PD_B).
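The rule that child instances share the power domain of their parents can be pictured as a lookup that walks up the design hierarchy. The snippet below is a plain Python illustration of that idea, not UPF syntax or tool behaviour. The child instance names (`A/fifo`, `C/alu`) are hypothetical, and the assignment of module C to PD_B is assumed from the labels of figure 2.14.

```python
def power_domain_of(instance, parent_of, domain_of):
    """Return the power domain of an instance.

    Walks up the hierarchy until an instance with an explicit
    domain assignment is found, mirroring the inheritance rule.
    """
    current = instance
    while current is not None:
        if current in domain_of:
            return domain_of[current]
        current = parent_of.get(current)
    raise KeyError(f"no power domain found for {instance!r}")


# Hypothetical hierarchy based on figure 2.14, with two invented child instances.
parent_of = {
    "A": None, "B": None, "C": None,
    "A/fifo": "A",
    "C/alu": "C",
}

# Explicit assignments: A and B share PD_A, C is assumed to be in PD_B.
domain_of = {"A": "PD_A", "B": "PD_A", "C": "PD_B"}

for name in ("A", "B", "A/fifo", "C/alu"):
    print(name, "->", power_domain_of(name, parent_of, domain_of))
```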

Supply ports are connections for supply nets on hierarchical boundaries. Supply sets represent

a collection of supply nets. Supply switches control supply connections between supply ports.

2.2.5.2 Scope

The scope is the level of the design hierarchy in which the UPF commands are executed. Defining a scope is particularly useful for reusable power intent. The set_scope command changes the current scope, and signals are then referenced from the current scope. It is possible to write UPF in which the current scope is always the root scope, but then small changes in the hierarchy imply changing all of the UPF, whereas in a reusable UPF only the scope would need to be changed.

2.2.5.3 Power domains

A concept introduced with power intent is the power domain. When a design is power aware,

modules belong to power domains. A power domain defines a set of rules for the modules that

belong to it. A design can have several power domains, each of which has its own independent set

of rules. A power domain can be switched off, or have a defined voltage. Power domains from the

same design can be in different states independently from each other.

This is the power domain definition present in the standard:

"power domain: A collection of instances that are treated as a group for power-

management purposes. The instances of a power domain typically, but do not always,

share a primary supply set. A power domain can also have additional supplies, in-

cluding retention and isolation supplies." [16]

The other components defined in the IEEE 1801 standard are usually associated with a power

domain. That applies to the retention strategies, isolation strategies, power switches and level

shifters.


Power domains are characterised by their power availability. A power domain that is not switchable, and remains always powered, is said to be an always on power domain. Power domains may also be characterised in relation to other power domains: if power domain PD_A is on whenever power domain PD_B is on, and may remain on while PD_B is off, PD_A is said to be relatively always on in relation to PD_B.

Three supply set handles are usually created with the power domain: primary, default_retention and default_isolation. Extra supply set handles can also be created with the -supply argument. The power domain's supply set handles default_retention and default_isolation are usually associated with an always on supply set from the top power domain.

2.2.5.4 Isolation strategies

Outputs of powered off logic cannot be connected directly to the inputs of active logic: since their values are unpredictable, they can cause incorrect readings and lead to unwanted behaviour. Isolation cells exist to address this issue. They are placed on the border of the power domain and are responsible for clamping the cell output to a known value. They can also be used together with level shifters in a multi-voltage design.

Figure 2.15: Isolation cell between power domains

There are three types of isolation cells, according to the value they clamp to: "0", "1" or the last value. A simple AND gate can be used to clamp the signal to "0", and an OR gate can be used to clamp it to "1". To clamp the signal to the last value before power down, a more complex cell is used, consisting of a latch to keep state and a multiplexer, as can be seen in figure 2.16.
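The three clamp behaviours can be illustrated with a small behavioural model. This is only a sketch in Python, not UPF or a library cell description: the names clamp_low, clamp_high and ClampLast are hypothetical, the isolation enable is assumed to be active high, and None is used here to stand for the unpredictable value driven by a powered off domain.

```python
from dataclasses import dataclass
from typing import Optional

Bit = Optional[int]  # 0, 1, or None for a corrupted (powered off) driver


def clamp_low(signal: Bit, iso_enable: bool) -> Bit:
    """AND-style isolation: the output is forced to 0 while isolation is enabled."""
    return 0 if iso_enable else signal


def clamp_high(signal: Bit, iso_enable: bool) -> Bit:
    """OR-style isolation: the output is forced to 1 while isolation is enabled."""
    return 1 if iso_enable else signal


@dataclass
class ClampLast:
    """Latch plus multiplexer: holds the value seen just before isolation."""
    held: Bit = 0  # assumed reset value of the latch

    def output(self, signal: Bit, iso_enable: bool) -> Bit:
        if not iso_enable:
            self.held = signal  # latch is transparent and tracks the live signal
            return signal       # multiplexer selects the live signal
        return self.held        # multiplexer selects the latched value


if __name__ == "__main__":
    cell = ClampLast()
    cell.output(1, iso_enable=False)   # normal operation, source drives 1
    # The source domain is now powered off and its output is corrupted (None).
    print(clamp_low(None, True), clamp_high(None, True), cell.output(None, True))
```

In all three cases the receiving domain never observes the corrupted value as long as the isolation enable is asserted before the source domain is powered off.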

Figure 2.16: State Retention Isolation. Source: [4]

Isolation cells are placed using the UPF command set_isolation. Depending on the version of the standard used by the tools, it may be necessary to define an isolation control separately. This is true for IEEE Std 1801-2009. In the newer versions of the standard, the set_isolation_control command has been superseded and all isolation information can be defined in a single set_isolation command.

Isolation cells can either be inserted at the output of the gated power domain, or at the input

of the power domain that is connected to it. In case this second power domain is less on than the

first one, there may be no need for the insertion of the isolation cells. Both types of isolation can

coexist in the same design.

Isolation is only needed at either the input of the on power domain or the output of the power gated one. Having isolation on both creates redundant isolation, inserting more cells than are necessary for the operation.

Isolating the inputs of a power domain from the outputs of a less active one is a way of ensuring that all the signals are isolated. Leaving nets from a power gated module without isolation may cause incorrect behaviour of the system, as well as sneak paths through which current can leak. When isolating inputs, it is necessary to make sure that the synthesis tools insert no always on cells before the isolation cells.

Another option is to isolate the outputs from the power domain that is to be powered off.

Nevertheless, this would insert isolation cells in all the output ports of that power domain, some

of which may be connected to itself or a less on power domain.

Outputs from modules that connect to the same power domain do not need to be isolated; although isolation cells are typically small, they introduce delays in the data-path. Manually selecting each port that should or should not be isolated is possible, but impractical for large designs. UPF already accounts for this: using the -diff_supply_only switch when creating the isolation rule prevents tools from inserting isolation cells on nets whose driver and receiver are connected to the same supply set. This, however, also prevents the insertion of isolation cells on output ports with heterogeneous fan-out, that is, ports that connect both to another power domain and to the domain itself.


As with the -diff_supply_only switch, it is also possible to specify a source and/or sink filter. With this filter, the isolation rule is applied only to nets that come from one of the specified source supply sets and enter one of the specified sink supply sets. This is very useful when isolating designs that have several power domains.

Figure 2.17: Power domain with heterogeneous fan-out

Using -diff_supply_only will, however, fail to create isolation cells on a domain port with heterogeneous fan-out, like the one in figure 2.17, resulting in a warning message. This case is a good example of where isolation could be placed on the input of the ON power domain. It could also be placed on the output of the OFF power domain, but the input of the second OFF power domain does not need to be isolated, as it has the same power needs as the first one.

2.2.5.5 Supply sets

Supply sets are an aggregation of supply functions that together provide a complete power

source [16]. Supply sets provide a higher level of abstraction to the designer, replacing the need

of creating individual supply nets and supply ports. Supply sets have their implicit supply nets,

such as power, ground and well biasing. Supply sets provide the needed supply nets for modules

to operate. Explicitly created supply nets can be associated with an existing supply set via the -function argument of the create_supply_set command.

A power domain can have several supply set handles, each of which is then associated with a supply set.

2.2.5.6 Retention strategies

When powering off some designs, there may be a need to keep some state. To keep state, some registers need their value to be preserved when the module is turned off. There are several possible approaches to achieve this: retention registers, power islands or external memory. Retention registers are made of two registers: the main register, used for normal operation, and the shadow register, which holds the value during power down. Shadow registers are less leaky but add a large area overhead.

Figure 2.18: Retention register. Source: [4]

Another way to retain state is to keep the modules which contain the registers needed to keep state in a different, always on power domain. This technique is named power islands, because those modules form an always on domain inside a powered off domain. This adds some complexity to the design, since the back-end designers will need to pull the always on supply rails to a module inside a powered off domain. This does not cause a considerable area increase, if any, but it is not advisable for large areas with low activity, since we would be wasting an opportunity to reduce leakage.

Retention may be one of the power gating components with the greatest impact. Retention registers can create a huge area overhead if not planned carefully. The need for full state retention or only partial retention should be taken into consideration for area optimisation and restore time reduction. If the system is able to recover from a power down with only partial state retention, this becomes an attractive solution, given the time overhead and size of the retention registers.

Low standby voltage is also a possibility, but this solution increases testing complexity since

it will require a multi-voltage design, as well as a library with cells able to operate on the specified

voltage range, from standby voltage to normal operation voltage.

2.2.5.7 Level Shifters

In a multi-voltage design, communication between modules that operate at different voltages may cause reading errors or even damage the circuitry. Level shifters are cells responsible for shifting logical signals between different voltages. If two power domains use different voltages, level shifters must be inserted between them to ensure the correct functionality of the system. Level shifters have a low voltage side and a high voltage side.

Level shifters can be of two types, low to high or high to low. As the names suggest, high to low level shifters shift from the high voltage to the low voltage, and low to high ones shift from the low voltage to the high voltage.
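The choice between the two types follows directly from the voltages on each side of the crossing. A minimal Python sketch, assuming a hypothetical helper named level_shifter_type and example voltages chosen only for illustration:

```python
from typing import Optional


def level_shifter_type(source_v: float, sink_v: float) -> Optional[str]:
    """Return the kind of level shifter needed on a signal going from a
    domain supplied at source_v to a domain supplied at sink_v.

    Returns None when both domains use the same voltage, in which case
    no level shifter is required.
    """
    if source_v < sink_v:
        return "low to high"
    if source_v > sink_v:
        return "high to low"
    return None


if __name__ == "__main__":
    # Hypothetical example: one domain at 0.8 V, another at 1.0 V.
    print(level_shifter_type(0.8, 1.0))  # signal leaving the 0.8 V domain
    print(level_shifter_type(1.0, 0.8))  # signal leaving the 1.0 V domain
    print(level_shifter_type(1.0, 1.0))  # same voltage on both sides
```

Real tools make this decision per supply pair and usually apply a voltage difference threshold; the sketch ignores both refinements.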


2.2.5.8 Enable Level Shifters

In multi-voltage designs, ports on the boundary of power domains may need both level shifting and isolation. In order to have a lower area overhead, a single cell called an enable level shifter can be used instead of a separate isolation cell and level shifter.

Figure 2.19: Enable Level Shifter Example

2.2.5.9 Power Switch

The power switch is usually implemented in CMOS technology and consists of a transistor between the power supply and the power input pins of the standard cells. The switch can be either NMOS (footer switch) or PMOS (header switch).

Liberty libraries may have several different switch cells. Switch cells in the library may contain several switches and are usually defined by their type. Switch types can be coarse grain or fine grain. DC will select a switch able to carry the current needed in the on state. To force DC to select a specific switch cell, the designer can mark all other switches as dont_use or dont_touch and recompile the library.

Most switch related decisions are made by the back-end designer, so tampering with the library

may not be a good option. A single switch will usually not be enough to supply an entire power

domain, leaving to the back-end team the decision of selecting coarse grain switching or fine grain

switching and grid or array topology.

Switch cells have an output acknowledge port. The acknowledge port is usually connected to the PMU to indicate that the power is now stable, or has been removed. This signal is very important to avoid incorrect behaviour. If the PMU transitioned between states based on a timer instead, small manufacturing process variations, which affect wake up and shutdown times, could make it transition into an operative state before the power domain was actually operational, or spend more time than necessary waiting for power up.
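The difference between a timer based and an acknowledge based PMU can be sketched numerically. The following Python model is only an illustration: the function names and the ramp times are hypothetical, and the acknowledge is assumed to rise exactly when the supply is stable.

```python
from typing import Tuple


def timer_based_wakeup(ramp_time_ns: int, timer_ns: int) -> Tuple[bool, int]:
    """PMU leaves the wake up state when a fixed timer expires.

    Returns (premature, wasted_ns): premature is True when the PMU resumes
    before the supply is stable, wasted_ns is the time spent waiting after
    the supply was already stable.
    """
    premature = timer_ns < ramp_time_ns
    wasted_ns = max(0, timer_ns - ramp_time_ns)
    return premature, wasted_ns


def ack_based_wakeup(ramp_time_ns: int) -> Tuple[bool, int]:
    """PMU waits for the switch acknowledge, which tracks the real ramp,
    so it is never premature and wastes no time in this idealised model."""
    return False, 0


if __name__ == "__main__":
    # Hypothetical ramp times of three dies affected by process variation,
    # against a timer fixed at the nominal value of 100 ns.
    for ramp in (80, 100, 120):
        print(ramp, timer_based_wakeup(ramp, timer_ns=100), ack_based_wakeup(ramp))
```

With a fixed timer, the slow die resumes before its supply is stable and the fast die waits longer than needed; the acknowledge based PMU adapts to each die.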


The switch topology that goes into the design is also decided at the back-end phase. Most designs use coarse grain power switching, because the reduced implementation complexity of fine grain switching does not compensate for its increase in area.

It is up to the back-end engineer to introduce delays between the switches in order to avoid large inrush currents, since this kind of analysis cannot be performed at the synthesis level.

Figure 2.20: Header switch cell (ports: VDD, VVDD, Sleep, ACK)

Figure 2.20 represents a PMOS header switch cell. VDD represents the input voltage from the power rail. VVDD is the virtual supply that feeds the power domain to be gated. The sleep signal is responsible for controlling the virtual supply rail. The acknowledge port reports the power state back to the power management unit, with the help of a buffer.

2.2.5.10 Cell Location

UPF provides the option of defining the physical location for cell insertion. This is an important decision, since it affects layout complexity. The decision is taken at the RTL level, but it is important that the power architect is aware of the back-end flow so as not to complicate the implementation. The cell location is defined by the -location argument present in the UPF cell insertion commands.

Cells can be inserted in the power domain they belong to, in the parent domain or even in both. When working on IP that is to be integrated in other designs, it is useful to place the cells in the power domain they belong to, since putting them outside creates an area overhead in the parent design relative to the predicted area of the IP. If the cells are inside the IP, its area estimate already accounts for them. Cells located inside the IP also provide a more abstract model to the designer who is going to integrate the IP: there is no need to worry about power intent, since everything is already implemented inside the IP, which reduces verification and implementation times.

UPF related cells inside the power domain may, however, lead to a more complex back-end implementation. Isolation cells inside a gated power domain require an extra PG pin connection to an always on net to power the cell, since the primary power net will be shut off. This means that an extra power rail has to be pulled inside the power domain at the layout stage of the design.


Inserting cells in the parent power domain may be a good option for internal power domains. In that case no extra supply rail needs to be pulled inside the power domain, since the parent domain is more on than the domain the isolated signals are coming from.

2.2.5.11 Input vs Output Strategies

When creating isolation, level shifter or enable level shifter strategies, it is possible to choose whether the strategy applies to the power domain inputs, outputs or both. This is quite an important decision, since it may avoid unisolated paths or redundant strategies.

As described in the isolation section (2.2.5.4), using the -diff_supply_only true switch

when defining an isolation strategy will not insert cells if the output has heterogeneous fan-out.

Instead, if that happens to be the case, it is better to define the strategy for the input port of the

active power domain, given it is the only power domain needing isolation or level shifting for that

signal.

Figure 2.21: Isolation on input of heterogeneous fan-out

Figure 2.21 is a good example of where the strategy should be applied to the input. In figure 2.22 the opposite is true: both receiving power domains require isolation, because they are active when the output of the first power domain is corrupt, so isolating each of their inputs is redundant and creates unnecessary cells. It would be better to isolate just the output of the first power domain.

Figure 2.23 displays a situation in which output isolation is the best option. The output port connects to two power domains, and using input isolation would create an unnecessary extra cell.
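The reasoning behind figures 2.21 to 2.23 can be condensed into a simple rule of thumb. The Python sketch below is a simplification made for illustration, not a tool algorithm: choose_isolation_side is a hypothetical name, and each sink is described only by whether it can be active while the driving domain is off.

```python
from typing import Sequence


def choose_isolation_side(sink_needs_isolation: Sequence[bool]) -> str:
    """Suggest where to isolate one output port of a gated power domain.

    sink_needs_isolation holds one flag per domain in the fan-out of the
    port: True if that domain can be on while the driving domain is off.
    """
    needing = sum(sink_needs_isolation)
    if needing == 0:
        return "none"      # every sink is off whenever the driver is off
    if needing == len(sink_needs_isolation):
        return "output"    # one cell at the driver output covers every sink
    return "input"         # heterogeneous fan-out: isolate only where needed


if __name__ == "__main__":
    print(choose_isolation_side([True, False]))  # fan-out as in figure 2.21
    print(choose_isolation_side([True, True]))   # fan-out as in figure 2.23
```

Applying input isolation to the second case would place one cell per sink, which is the redundant arrangement shown in figure 2.22.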

2.2.5.12 Power State Table

The power state table is a very important component to help verification. The power state table has no physical implementation: it is only a table that defines all the legal combinations of voltages that can be applied to the power domains. If, during simulation, a power domain enters a state that is not defined in the power state table, it is said to be in an illegal state and will trigger an error, causing the simulation to fail.

Figure 2.22: Redundant isolation

Figure 2.23: Output isolation

The power state table (PST) can contain several possible states and several supply sets. The power architect should write all possible states for the power domains in the power state table, although it is also possible to have several power state tables in the same design. Having several power state tables allows unrelated power domains to operate independently. All related power domains should be included in the same table to catch bugs in the power intent.

Values in the power state table are real numbers and define the voltage applied to the supply net. A zero in the PST does not mean the net is off; it means the defined voltage is zero. A ground net defined as 0 is therefore ON. Gated nets are defined in the power state table as "OFF".

The example in table 2.2 defines the possible states of two power domains, PDA and PDB. This table has three possible states: PS_ALL_ON, PS_ALL_OFF and PS_LP_1.


Table 2.2: Example of PST

State        PDA.primary power   PDA.primary ground   PDB.primary power   PDB.primary ground
PS_ALL_ON    1.0                 0.0                  0.8                 0.0
PS_ALL_OFF   OFF                 0.0                  OFF                 0.0
PS_LP_1      1.0                 0.0                  OFF                 0.0

In the PS_ALL_ON state, both power domains are on, PDA at 1.0V and PDB at 0.8V. When analysing the PST, tools will notice this and check whether level shifters have been inserted on connections between the two power domains.

The PS_ALL_OFF state is usually present in every PST; designs without it risk missing states in the power up or power down sequence [7]. It is possible to observe that header switching was chosen for this particular design, since the supply net that is gated is the power one.

The last state, PS_LP_1, has one power domain active, PDA, and the other one power gated.

This means there need to be isolation cells from PDB to PDA. As the two power domains operate

at different voltages, enable level shifters should be used instead of both an isolation cell and a

level shifter.

From this power state table, it is possible to see that PDA cannot be turned OFF while PDB is ON. Such a situation violates the power state table and will cause the simulation to fail with an illegal state.
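The checks described for table 2.2 can be reproduced with a short Python sketch. It is a simplified model, not the algorithm of any tool: only the power net of each primary supply set is represented (ground is assumed to be 0.0 in every state, as in the table), and the names PST, find_state and crossing_requirements are hypothetical.

```python
from typing import Dict, Optional, Union

OFF = "OFF"
Supply = Union[float, str]

# Power net of PDA.primary and PDB.primary in each state of table 2.2.
PST: Dict[str, Dict[str, Supply]] = {
    "PS_ALL_ON":  {"PDA": 1.0, "PDB": 0.8},
    "PS_ALL_OFF": {"PDA": OFF, "PDB": OFF},
    "PS_LP_1":    {"PDA": 1.0, "PDB": OFF},
}


def find_state(observed: Dict[str, Supply]) -> Optional[str]:
    """Return the PST state matching the observed supplies, or None if the
    combination is not in the table, which makes it an illegal state."""
    for name, supplies in PST.items():
        if supplies == observed:
            return name
    return None


def crossing_requirements(src: str, dst: str) -> str:
    """Derive which cell a signal from domain src to domain dst needs."""
    states = PST.values()
    # Isolation: in some state the driver is off while the receiver is on.
    needs_iso = any(s[src] == OFF and s[dst] != OFF for s in states)
    # Level shifting: in some state both are on at different voltages.
    needs_ls = any(
        s[src] != OFF and s[dst] != OFF and s[src] != s[dst] for s in states
    )
    if needs_iso and needs_ls:
        return "enable level shifter"
    if needs_iso:
        return "isolation cell"
    if needs_ls:
        return "level shifter"
    return "none"


if __name__ == "__main__":
    print(find_state({"PDA": 1.0, "PDB": OFF}))  # a state listed in the table
    print(find_state({"PDA": OFF, "PDB": 0.8}))  # PDA off while PDB is on
    print(crossing_requirements("PDB", "PDA"))
    print(crossing_requirements("PDA", "PDB"))
```

Signals from PDB to PDA need both isolation and level shifting, so an enable level shifter is suggested, while signals from PDA to PDB only need a level shifter because PDA is never off while PDB is on.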


Chapter 3

Design Flow

This chapter summarises the design flow used in hardware simulation, verification and synthe-

sis and it introduces the necessary differences for a power aware flow.

UPF files are part of the design source. While HDL files are used to specify logic intent, UPF

files are used to specify power intent. UPF files are refined as they go down in the flow, and their

information grows as they get refined. They are inputs to the simulation tools, synthesis tools, formal verification tools and place and route tools; the output of each tool is a new UPF file that should be formally verified against the original one. This process is illustrated in figure 3.1.

Figure 3.1: UPF tool flow. Source: [5]

UPF files are created at the RTL level of the design and are synthesised with the HDL files for logical verification. They are then refined to better suit the needs of the successive phases. In the final phase, power analysis, timing analysis, validation and functional verification are performed to ensure that UPF did not affect the expected logical behaviour of the circuit. The original power intent is kept from the start in order to be formally verified against the successive refinements, ensuring consistency of power intent throughout the development. This original power intent is referred to as the golden UPF.

Figure 3.2: Design flow for multi-voltage, power gated designs. Source: [4]

3.1 Flow Without Power

Since the team at Synopsys in which this project is being developed is a front-end team, the flow does not reach the place and route phase. RCE (Regression Control Environment) is the tool responsible for building the working environment. RCE uses CoreConsultant, and CoreConsultant uses CoreBuilder. CoreBuilder is responsible for preparing files for compilation according to the defined configuration, which means removing pragmas and ifdefs so that the output RTL lines up with the configuration. CoreBuilder receives a TCL script as input, which it uses to build the intended configuration.

VCS (Verilog Compiled code Simulator) is a functional verification tool, responsible for verifying the RTL against the test bench. VCS performs both compile and run-time verification. In this project, the test bench has been developed in SystemVerilog and the design in Verilog. When developing tests for the test bench, it is possible to enable wave dumps by adding the command to do so in the test file. VCS recognises that command and generates a VPD (VCD plus) dump file with the waveforms generated from the simulation activity. VPD is a file format used by Synopsys tools; the IEEE standard format for wave dumps is VCD (Value Change Dump). VPD files can easily be converted to VCD with the vpd2vcd tool if there is a need to use other industry tools.

To help understand undefined port states, a tool called Xprop can be run together with VCS.

Xprop propagates unknown port states across the design. Unknown port states are easier to debug

at the RTL level because the descriptions are closer to the design intent. Xprop is useful to find

the origin of the unknown signal, reducing debugging time.

Wave analysis is a good last resort to catch design errors and incorrect protocol implementations that may have escaped the tests, and to find the signal or sequence responsible for a test failure. DVE (Debugging and Visualisation Environment) is used to visualise the waves. DVE allows the designer to view code, and points to the source of a signal when double clicking on it. Another useful feature is hierarchy visualisation, as it is possible to view the location of the modules, as well as parent and child instances. In DVE it is also possible to visualise schematics and trace back signals, which has been very useful to find the source of incorrect behaviour.

After inspecting the simulation results, it is necessary to generate a SAIF (Switching Activity Interchange Format) file with the activity. The SAIF file will later be used by DC to map names on the netlist, which is essential for PrimeTime to perform power analysis. To get the SAIF file from the simulation, it is necessary to convert the VPD dump file from VCS to VCD with the vpd2vcd tool, using the +includemda switch to include multidimensional arrays. It is possible to select a time interval, but any interval will work, since the file will only be used for name mapping. Then the VCD file is post processed with the vcdpost utility. This ensures unique identifier codes for nets and registers.

After obtaining the post processed VCD file, running it through vcd2saif generates the SAIF

file. The switches -top and -instance are used to define the top module and instance. This

is particularly useful for removing test bench instances and test modules from the activity file, as

they are not synthesised and therefore not necessary for the process.

3.1.1 Synthesis

CoreTools are also used to generate the workspace used by the synthesis tools. The tool used to perform synthesis is Design Compiler (DC). Synthesis consists of generating a netlist based on the Verilog logical description of the circuit. Synthesis maps the Verilog functions to standard cells from the given libraries, resulting in a functional netlist able to perform the intended operations.


Synthesis tools are driven by TCL scripts, previously written to guide the synthesis process

and provide options for optimisation. The SAIF file extracted from the simulation is now added to

the workspace in order to be used by DC for name mapping. Name mapping is an optional activity

that needs to be added to the existing scripts and consists in creating a new file containing a map

of names between the RTL code from the simulation and the netlist generated by DC. This name map will later be used by PrimeTime to annotate activity from the simulation to the netlist during power analysis.

To perform synthesis, DC needs to be provided with libraries. The type of library used in this case is Liberty. Liberty is a library standard used in the VLSI industry to describe standard cells. Liberty defines power pins and logical pins, timing performance, as well as cell power consumption and function. Liberty libraries may have one or several operating conditions for which their cell attributes are characterised. Operating conditions include process variation, temperature and voltage.

Power Compiler is an integrated extension of DC used to minimise power consumption. Power Compiler also allows for concurrent timing, area and power optimisation [17]. Power Compiler uses multi-corner multi-mode optimisation.

DFT (Design-for-test) Compiler is responsible for the last stage, the insertion of the scan cells. DFT Compiler also tries to repair DRC (design rule check) violations at the gate level. There is also some optimisation of area and timing at this phase.

Formality is used to formally verify the equivalence between the RTL logical intent and the

synthesised netlist. Formal equivalence checking is used in the EDA industry to validate the

behavioural equality between two representations of the same circuit. Formality is used in this case to compare the Verilog logical intent against the gate level netlist generated by synthesis.

3.2 UPF flow

The UPF flow works similarly to the usual design flow, but with increased complexity. Extra

tools are needed to analyse the power intent and apply new signal constraints, such as power down

corruption. This added complexity increases simulation, development and testing times.

For power aware simulation, VCS needs to be run in MVSIM (Multi-Voltage Simulation)

NLP (Native Low Power) mode. Normal simulation assumes that an always on constant voltage is

provided to the chip, which is not true if there is power gating or multi-voltage implemented on the

design. To simulate the effect of power gating, MVSIM corrupts the signals of a power domain while it is powered down. Logical outputs are now also dependent on the supply state: the lower the voltage, the slower the signal propagation. Since DVS (Dynamic Voltage Scaling) was not implemented due to the IP complexity, different voltages were not simulated.

MVSIM checks for the correct transition of power states and compares them with the power state table to ensure that there are no illegal transitions. The correct implementation of the power control sequence is also checked, which helps catch low power bugs early in the design. Isolation and retention strategies are also checked to ensure their correct behaviour and implementation.


DVE has enhanced signal visualisation for power aware simulation wave dumps. Corrupted

signals due to power down will be displayed differently for easy identification and not to be con-

fused with signals in unknown state due to logical errors.

3.2.1 Voltage Aware Synthesis

Complexity also increases for voltage aware synthesis. A library with a low power kit is needed in order to map all the new cells introduced by the UPF files. DC has to insert power switches,

isolation cells, level shifters and retention registers according to the power intent. Those cells

usually are marked only for area optimisation, so that DC does not replace them with normal cells.

Before synthesis, it is important to check the library that is going to be used for the presence

of the cells necessary for power gating implementation. Some vendors may mark them as "don’t

use" or "don’t touch", in which case, synthesis tools will ignore those cells and introduce GTECH

(General Technology) cells.

GTECH cells are part of a generic library used to map cells that are not available for DC

from other libraries. GTECH cells can not go into production and should not be present on final

designs. GTECH cells have generic characterisation, translating into incorrect power estimations

due to their big difference from silicon.

Depending on their location, power gating cells may need to have dual supply rails, one is the

power domain supply and the other one the always on supply, in order to ensure always on cells

remain powered during low power mode.

Power aware synthesis requires the power net voltage in order to select the cells from the library. Liberty cells are designed to operate at a designated voltage, so in order to select which cells it will use, DC checks the UPF files for the defined voltage of each power domain. In the case of a multi-voltage design, DC will insert cells from different libraries for the different power domains, according to their defined voltage. DVFS (Dynamic Voltage and Frequency Scaling) designs require cells capable of operating over the voltage range used in the dynamic scaling.

DC will sometimes flatten the hierarchy to perform optimisations, which is not necessarily a problem, but may make power analysis more difficult. When analysing the consumption of a specific wrapper, it is desirable to keep the hierarchy as defined in the logical intent. To achieve this, it is necessary to force DC to keep the hierarchy with the -keep_hierarchy switch on invocation.

3.3 Power Analysis

After synthesis, it is important to analyse the synthesis reports for violations and errors. The clock tree has to be declared as an ideal network, as it is a high fan-out network and will not be optimised at this design stage. The clock tree is declared in a script that serves as input for DC, and will not be synthesised.

PrimeTime PX is an extension of PrimeTime for power analysis, and is the tool used for power analysis at the netlist level. PrimeTime accepts standard TCL commands, so an already written script can be executed instead of typing every command manually.


Running pt_shell at the command line starts PrimeTime. The easiest way to obtain several results from different parts of the same simulation is to create a TCL script and execute it with PrimeTime, since the flow is the same and only the input activity file changes. To run a script with PrimeTime, the -f <script file> switch is added at invocation.

First, it is necessary to load the libraries used in the synthesis. To do that it is enough to set the

target_library, link_library and search_path. The

PrimeTime must then enter PX mode, which enables power analysis. This is done by setting the power_enable_analysis variable to true. Next, the power analysis mode must be set. In the scope of this work, average power is the relevant mode. Average power is helpful for analysing energy consumption, which is especially useful for estimating battery life.

Next, the netlist is passed to PrimeTime using the command read_verilog <path to netlist>. PrimeTime also needs to know which design it is working with, which is specified with the current_design <top instance> command.

Since libraries may contain several corners, it is necessary to specify the one that will be used for the power analysis. With this information, PrimeTime is able to select cell consumptions and timing corners for a given voltage and temperature.

Next it is necessary to read the power intent. The power intent comes from the UPF file used

during simulation. These files have also been imported by the synthesis tools to insert the special

cells necessary for power gating. The root scope in which the power intent will be executed is the

current design, defined already.

Since the parasitic effects of wires cannot be ignored, they must also be taken into account when performing power analysis. Wire parasitics depend mostly on the back-end implementation, but synthesis results provide an estimate of their effect earlier in the design. Parasitics are read from a Standard Parasitic Exchange Format (SPEF) file, which is an IEEE standard for representing parasitic data of wires in the ASIC development flow [18].

Design constraints are loaded from the SDC (Synopsys Design Constraints) file, and analysed. The SDC file defines timing constraints and domain voltage definitions. It contains the clock definitions, as well as some other networks that are defined as ideal, since they will later be optimised at the place and route phase, during clock tree synthesis (CTS).

After this process, running update_timing will instruct PrimeTime to take the input files and configuration previously defined and start analysing the design. Activity has not been provided yet.

It is possible to do a power analysis based on a specific expected operation of the design. This

expected operation comes from a simulation, by providing both the name map file, created during

synthesis and the switching activity file, either a VCD or a SAIF file.

Two important reports are the power report and the switching activity report. The power report provides an estimate of power consumption, broken down by module, which is a good way of checking the power budget and the power savings across different implementations.
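The flow described above can be captured in a small generator, so that one pt_shell script is produced per activity file. The following Python sketch is illustrative only and is not the script used in this work: the function name, its parameters and the file names are hypothetical, the library setup is passed in as opaque lines, the name map file is left out, and the link_design, update_power and -strip_path steps are assumptions about a typical PrimeTime PX run rather than steps stated in the text.

```python
from pathlib import Path


def build_ptpx_script(netlist, top, upf, spef, sdc, activity_file,
                      strip_path, library_setup=()):
    """Return the text of a pt_shell script for one averaged power run.

    All arguments are plain strings (hypothetical paths and names).
    library_setup holds the tool-specific library variable settings.
    """
    # VCD and SAIF activity files are read with different commands.
    suffix = Path(activity_file).suffix.lower()
    if suffix == ".vcd":
        read_activity = f"read_vcd {activity_file} -strip_path {strip_path}"
    elif suffix == ".saif":
        read_activity = f"read_saif {activity_file} -strip_path {strip_path}"
    else:
        raise ValueError(f"unsupported activity file: {activity_file}")

    lines = [
        *library_setup,
        "set power_enable_analysis true",    # enter PX mode
        "set power_analysis_mode averaged",  # average power analysis
        f"read_verilog {netlist}",
        f"current_design {top}",
        "link_design",                       # assumed linking step
        f"load_upf {upf}",                   # power intent
        f"read_parasitics {spef}",           # SPEF wire parasitics
        f"read_sdc {sdc}",                   # clocks and ideal networks
        "update_timing",
        read_activity,                       # the only line that changes per run
        "update_power",                      # assumed power update step
        "report_power -hierarchy",
        "report_switching_activity",
        "exit",
    ]
    return "\n".join(lines) + "\n"
```

Each generated script could then be written to a file and run with pt_shell -f <script file>, changing only the activity file between runs.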

The switching activity report can be used for debugging. If something is not correctly imple-

mented, weird results will appear in the switching activity report. The switching activity report


shows the design activity according to logic type.

This type of power analysis will not take the clock tree into consideration as it has not yet

been synthesised. At this stage, clocks are considered to be ideal networks. Clock tree synthesis

will provide better optimisation for high fan-out networks, such as the clock tree. CTS is also

important to minimise clock skew and ensure proper clock distribution and a balanced clock tree.


Chapter 4

Implementation

The design used in this implementation is proprietary and confidential. For that reason, some explanations are reduced to the minimum necessary for understanding the implementation choices.

The implementation focuses mainly on power gating, using IEEE Std 1801. The tools used for this implementation support the IEEE Std 1801-2009 version of this standard. This is not the latest version of the standard; the current active version is IEEE Std 1801-2015.

This chapter explains how power reduction was implemented on the design, as well as how the guidelines were defined. These guidelines provide some pointers for an easier power gating implementation, since it can become a difficult job and adds complexity to design testing.

4.1 Steps taken

The specific sub-design studied is part of a bigger design that communicates with it. Figure 4.1 shows a summary of the IP. The module used for this implementation is represented as eDMA (Embedded Direct Memory Access), and is composed of a couple of sub-modules. In summary, it is divided into two channels and some arbitration logic. The write channel generates traffic directed mainly to the core module, while the read channel generates traffic mainly for the application module. The eDMA, as the name states, is a feature for direct memory access that frees the core processor to do other tasks while the eDMA sends information from the memory to the application. There is also traffic directly from the core module to the application and vice versa. The arbitration logic is responsible for selecting the source of traffic to the receiving modules.

There is some common logic used by both channels and by the register configuration. The arbitration logic must be kept powered on even when the remaining blocks in the eDMA are not being used. The common logic must be powered on when there is an access to write or read its configuration registers, and must be kept on until the eDMA is disabled.

The design already has some power intent implemented. It consists of a switchable power domain (PD_VMAIN_SW), an always on power domain (PD_VAUX) and some power islands for state retention. The entry of the IP into the low power state involves fairly complex negotiations, which are outside the scope of this work. For a flawless integration of new power intent into a design that already has some implemented, the system must be extensively tested to avoid power down bugs.

Figure 4.1: Basic representation of the module used. (Diagram labels: Core, Application, eDMA with Read and Write channels, Data paths.)

As a first approach, it was decided to create a new power domain to gate the less active modules of the eDMA. Modules that were not used during core to application and application to core traffic were selected and put inside that power domain. This requires some knowledge of the design, some trial and error, and some traffic analysis. The eDMA module is already part of the switchable PD_VMAIN_SW power domain, making it necessary to test the full design instead of isolating the eDMA module.

One of the output signals from the eDMA is necessary for the configuration of the application, and its value may change during run-time. This means that the isolation value cannot be fixed at either "0" or "1" via simple AND or OR isolation, as the signal may change and cause inconsistencies between the isolation value and the actual value, which would cause collisions on transactions. To address the problem, isolation latch cells were used. However, the available libraries do not contain those cells, which resulted in the insertion of GTECH cells.
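The difference between the two isolation styles can be illustrated with a small behavioural model. This is a sketch in Python, not the library cells or the UPF strategy used in the design: the function and class names are hypothetical, and a powered-down source is modelled as None to stand for a corrupted value.

```python
def clamp_isolation(value, iso_en, clamp):
    """AND/OR style isolation: the output is forced to a fixed clamp value
    (0 or 1) while isolation is enabled, regardless of the source."""
    return clamp if iso_en else value


class LatchIsolation:
    """Latch style isolation: the output holds the last value seen before
    isolation was enabled."""

    def __init__(self, initial=0):
        self.held = initial

    def output(self, value, iso_en):
        if not iso_en:
            # Transparent while the source domain is powered.
            self.held = value
        return self.held


# A configuration bit that was 1 when the domain was switched off.
latch = LatchIsolation()
latch.output(1, iso_en=False)                             # domain active
held = latch.output(None, iso_en=True)                    # domain off: still 1
clamped = clamp_isolation(None, iso_en=True, clamp=0)     # domain off: forced to 0
```

With the clamp, the receiving side sees 0 although the last configured value was 1, which is the inconsistency described above; the latch keeps the last value.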

In an effort to avoid the insertion of GTECH cells, and considering the power consumption of the module that requires latching isolation cells, it was decided to remove this module from the power domain. Its power consumption is relatively low, so the impact of removing it from the power domain is negligible.

Upon inspection, it was possible to conclude that the eDMA block had no necessity of saving

its state when entering low power. This is because the software already configures registers each

time the module is reactivated.

The control signals for the power domain need to be driven by a power management unit. Since there is no need for register retention, this power management unit is also simpler. The power management unit is a design module and is explained in more detail in section 4.3.


Table 4.1: eDMA power state table

State         PD_VDMA_SW
              power    ground
PS_ALL_ON     0.8      0
PS_ALL_OFF    OFF      0

The PST for this implementation (table 4.1) is a simple two state table. The PD_VDMA_SW power domain is either on, at 0.8 V, or gated off.

After some analysis, it was possible to conclude that the modules belonging to the read channel

were not necessary for write channel traffic formation and vice versa. On a more aggressive power

saving approach, it was decided to create new power domains for each channel.

There are three different power requirements, making it a good choice to use three power

domains to further reduce static power consumption during read and write operations.

It was possible to take advantage of signals already present in the design that enable the read and write channels independently, and use them to control their PMUs. The PMUs work in parallel to control the power domains independently from each other. This is further explained in section 4.4, as it is the implementation with the best results.

4.2 Obstacles

Large IP designs take very long to simulate and even more to synthesise. Even small errors in

the process can cost a lot of time. Since each modification requires the test of the design to ensure

functionality remains as expected, even small improvements and features require new simulation

and, since the design changes, a new synthesis is also required. Synthesis with power, for such big

designs may even take a day or two.

It took some time to understand that the tools installed for synthesis do not yet support the IEEE 1801-2013 standard, but only the older IEEE 1801-2009. This caused DC to not recognise isolation cells, due to some new commands not yet being supported. The current active standard is IEEE 1801-2015, but the industry always takes some time to support new standards, and the version supported by the available tools is IEEE Std 1801-2009.

Since the libraries do not have isolation latch cells, it was necessary to remove the module from the power domain, because synthesis would not introduce isolation cells on that particular path. When cells are not present in a library, synthesis tools introduce GTECH (Generic Technology) cells. These cells belong to a generic technology library and cause incorrect power and area estimations. Avoiding the insertion of GTECH cells provides a more accurate power estimation, since parameters are usually well defined in a technology library.


4.3 Power Management Unit

The power management unit (PMU) is a very important component in a design with power gating. The PMU is responsible for controlling all the power related components in the design. Because no state retention was necessary in the scope of this project, the power management unit is a very simple state machine with eight states that can be reused in most power gating implementations, as long as there are no state retention registers. For a design that requires state retention during power off, two states, save and restore, have to be added to the state machine. If power islands are used for state retention, however, this state machine is sufficient.

This power management unit interacts with the clock and reset control block, that has the

ability of providing different clock and reset signals. It controls clock, reset, power switch and

isolation enable signals. The clock and reset control block was modified to have extra clocks and

extra reset signals for the power domains created. The control block provides synchronous reset

on request and an acknowledge signal that the PMU uses to change state. It also provides the clock

signal when requested, but since this is a test, non-synthesisable module, it has to be implemented

by the client to support these control signals and provide the correct clocks and reset.

The state machine starts in the idle state, assuming the chip has a power on reset. The idle state has isolation enabled, the power switch disabled and no clock. The power domain controlled by the PMU starts in the off state, with isolation enabled to prevent the propagation of unknown signals. The clock request signal is also disabled to save dynamic power. The enable signal is responsible for triggering the wake up process. In this design, the enable signal is an OR combination of several signals that are asserted when an operation from the module is required. In a different design, this could be an internal signal from another module, requesting the gated design to wake up.

In the wake_up state, the power domain is powered up, but reset is kept low and isolation remains enabled. There is no clock signal either. This state waits for confirmation from the power switch, ensuring that power is stable.

When the power is stable, the PMU enters the deisolate state. This state takes only one clock cycle and its main function is to release the isolation. It also requests the clock signal from the clock and reset block, since the clock takes one cycle to arrive.

The clk state is where the power domain gets its clock, and reset is released. When receiving

an acknowledge signal from the clock and reset control block, the state changes into active.

The active state is where the power domain is fully on and working as if there was no other

logic than the one described in the logical intent. The PMU remains in this state until the enable

signal is deasserted.

When the enable signal is deasserted, the PMU changes into the gate_clk state where it gates

the clock and asks for a reset, this way the design goes into a known state.

After a clock cycle, it enters into the isolate state. In this state, isolation is enabled and after

one clock cycle, it enters the gate_power state.

The gate_power state turns off the power switch and waits for an acknowledgement from the

switch confirming power removal. After receiving the acknowledgement, it transits back into the


idle state, waiting for a new enable signal.

If the power domain is needed while the PMU is in the gate_clk or isolate state, the PMU will jump into its counterpart state, to prevent the domain from entering low power and causing a large time overhead.

Figure 4.2: Function State Machine. (States: IDLE, WAKE_UP, DEISOLATE, CLK, ACTIVE, GATE_CLK, ISOLATE, GATE_POWER. Transition labels: enable, pwr_ack, rst_ack, !enable, !pwr_ack, !main_rst_n.)
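The state sequence described in this section can be summarised as a behavioural model. The Python sketch below is an illustration, not the RTL of the PMU: the function and type names are hypothetical, the counterpart states are assumed to be CLK for GATE_CLK and DEISOLATE for ISOLATE, !main_rst_n is assumed to force the IDLE state, and the reset output is left out because the text does not give its value in every state.

```python
from enum import Enum, auto
from typing import NamedTuple


class State(Enum):
    IDLE = auto()
    WAKE_UP = auto()
    DEISOLATE = auto()
    CLK = auto()
    ACTIVE = auto()
    GATE_CLK = auto()
    ISOLATE = auto()
    GATE_POWER = auto()


class Outputs(NamedTuple):
    power_on: bool   # power switch enable
    isolate: bool    # isolation enable
    clk_req: bool    # clock request to the clock and reset block


# Control outputs per state, as described in the text.
OUTPUTS = {
    State.IDLE:       Outputs(False, True,  False),
    State.WAKE_UP:    Outputs(True,  True,  False),
    State.DEISOLATE:  Outputs(True,  False, True),
    State.CLK:        Outputs(True,  False, True),
    State.ACTIVE:     Outputs(True,  False, True),
    State.GATE_CLK:   Outputs(True,  False, False),
    State.ISOLATE:    Outputs(True,  True,  False),
    State.GATE_POWER: Outputs(False, True,  False),
}


def next_state(state, enable, pwr_ack, rst_ack, main_rst_n=True):
    """Return the state after one clock cycle."""
    if not main_rst_n:
        return State.IDLE            # assumed target of the !main_rst_n arc
    if state is State.IDLE:
        return State.WAKE_UP if enable else State.IDLE
    if state is State.WAKE_UP:
        return State.DEISOLATE if pwr_ack else State.WAKE_UP
    if state is State.DEISOLATE:
        return State.CLK             # single-cycle state
    if state is State.CLK:
        return State.ACTIVE if rst_ack else State.CLK
    if state is State.ACTIVE:
        return State.ACTIVE if enable else State.GATE_CLK
    if state is State.GATE_CLK:
        # Early wake-up jumps back to the assumed counterpart state.
        return State.CLK if enable else State.ISOLATE
    if state is State.ISOLATE:
        return State.DEISOLATE if enable else State.GATE_POWER
    # GATE_POWER: wait until the switch confirms power removal.
    return State.GATE_POWER if pwr_ack else State.IDLE
```

A property worth checking on such a model is that isolation is enabled in every state in which the power switch is off, so that unknown values never propagate out of the gated domain.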

An improved version of the PMU has been implemented, but not fully tested due to the traffic profiles used by the system. The improved PMU contains a timer in the ACTIVE state. This timer prevents the block from entering low power unless there has been no activity during its defined timeout. The objective is to filter out sequential traffic, preventing the system from constantly shutting down and powering back up at each consecutive transaction. The timer is configurable by software.
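The timer behaviour can be sketched as a simple cycle counter. The Python model below is an assumption about how such a filter could work, not the implemented PMU: the class name and the timeout_cycles parameter are hypothetical.

```python
class IdleTimer:
    """Allows power-down only after timeout_cycles consecutive idle cycles."""

    def __init__(self, timeout_cycles):
        self.timeout_cycles = timeout_cycles  # software-configurable value
        self.idle_cycles = 0

    def tick(self, enable):
        """Advance one clock cycle; return True when power-down may start."""
        if enable:
            # Any activity restarts the count.
            self.idle_cycles = 0
            return False
        self.idle_cycles += 1
        return self.idle_cycles >= self.timeout_cycles


timer = IdleTimer(timeout_cycles=3)
# Two back-to-back transactions separated by a two-cycle gap.
decisions = [timer.tick(e) for e in (1, 0, 0, 1, 0, 0, 0)]
```

In this example the two-cycle gap between transactions is shorter than the timeout, so the power-down sequence is requested only after the final three idle cycles.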

4.4 Final Result

Since the IP power intent is either fully on or in a low power mode that gates the whole system, it is possible to reduce static power consumption even further by gating modules that may not be used when the IP is not in the power down mode.

The final implementation consists of three power domains, one for each power requirement: PD_VDMA_RD_SW, PD_VDMA_WR_SW and PD_VDMA_SW. The first power domain includes the modules that make up the read channel, the second includes the modules of the write channel, and the last is composed of the common logic and configuration registers.

A module was created for power management. It consists of a simple state machine with eight states. This module is instantiated three times, once for each power domain. The instances differ only in the signals used to trigger the transition of the state machine to the on state. The three PMUs work in parallel, controlling the power domains independently.

Table 4.2: eDMA power state table

State         PD_VDMA_SW         PD_VDMA_RD_SW      PD_VDMA_WR_SW
              power    ground    power    ground    power    ground
PS_ALL_ON     0.8      0         0.8      0         0.8      0
PS_ALL_OFF    OFF      0         OFF      0         OFF      0
PS_WR_ON      0.8      0         OFF      0         0.8      0
PS_RD_ON      0.8      0         0.8      0         OFF      0

Table 4.2 represents the power state table defined in the implementation. It is composed of four states: PS_ALL_ON, PS_ALL_OFF, PS_WR_ON and PS_RD_ON. Each of these power states represents a possible state the power domains could be in. The defined operating voltage is 0.8 V because it is the voltage at which the cells from the library used in the implementation operate.

From the PST it is possible to observe that PD_VDMA_SW could never be gated off when

either of the other two power domains are active. This comes from the fact that PD_VDMA_SW

contains common logic necessary for both read and write traffic operations, as well as configura-

tion registers.

The power states PS_WR_ON and PS_RD_ON are the states used during exclusive write or read operations, respectively. Those are the power states that take advantage of the different power needs of the read and write channels and grant some extra power savings.
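The power state table can also be written down as data and checked mechanically. The Python sketch below encodes table 4.2 and verifies the rule discussed above; it is an illustration and not part of the UPF used in the design, with None standing for a gated (OFF) supply and the function names being hypothetical.

```python
DOMAINS = ("PD_VDMA_SW", "PD_VDMA_RD_SW", "PD_VDMA_WR_SW")

# Supply voltage per domain for each state of table 4.2 (None means OFF).
PST = {
    "PS_ALL_ON":  (0.8,  0.8,  0.8),
    "PS_ALL_OFF": (None, None, None),
    "PS_WR_ON":   (0.8,  None, 0.8),
    "PS_RD_ON":   (0.8,  0.8,  None),
}


def legal_state(supplies):
    """Return the name of the PST state matching the supplies, or None."""
    for name, row in PST.items():
        if row == tuple(supplies):
            return name
    return None


def common_logic_rule_holds(pst):
    """PD_VDMA_SW must be on whenever a channel domain is on."""
    return all(
        common is not None or (read is None and write is None)
        for common, read, write in pst.values()
    )
```

For instance, a combination with the read channel powered and the common logic gated does not match any row of the table and would be reported as an illegal state.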

4.4.1 Power Architecture

All the power domains created depend on PD_VMAIN_SW, the main power domain of the IP, since all supply sets are connected to the PD_VMAIN_SW primary supply set.

As can be seen in figure 4.3, each power domain has its own switch, controlled by its PMU. The pm_en_sw signals are outputs of those power management units and control the power switches.

4.5 Verification

Testing is a very important and time consuming task in the VLSI industry. To ensure the correct implementation of power gating in this design, it had to be tested. For the IP, one of the verification solutions is a test bench named VTB (Verification Test Bench), based on the Universal Verification Methodology (UVM). UVM is an Accellera standard that enables the reuse of verification environments and Verification IP (VIP) [19]. UVM is implemented on top of SystemVerilog (IEEE Std. 1800), an IEEE standard hardware design, specification and verification language that is commonly used for verification. SystemVerilog is very similar to the Verilog HDL, but it also has some object oriented properties.

Figure 4.3: Block representation of the power domains. (Diagram labels: PD_VMAIN_SW containing PD_VDMA_SW, PD_VDMA_RD_SW and PD_VDMA_WR_SW; switch controls pm_en_sw, pm_en_rd_sw and pm_en_wr_sw.)

VTB has a set of tests, used to exercise and test different interfaces and functions of the IP.

Some of the tests are used to exercise the eDMA block, with different traffic profiles.

Although there are already many tests, none of them takes power into account. To ensure the full functionality of the block, it is necessary to test the entry into low power mode as well as the exit sequence from it. For that, the conditions to enter low power must be met during the test. The correct behaviour of the remaining system must also be tested when the module is turned off.

In order to test the correct behaviour of the implementation, two already existing tests were

modified. One of the tests generates traffic from the eDMA block, both read and write traffic. The

other one sends generic traffic between the core and the application. With those tests, three new

tests were created. The difference between these three tests is the type of traffic generated by the

eDMA. The first test will generate read traffic, the second write traffic and the last one both read

and write traffic.

The test sequence is the same for all of the three tests. The eDMA starts powered off, then it

wakes up because of the configuration process. After the initial configuration, the test will force

generic (core/application) traffic in parallel with eDMA traffic. The eDMA traffic will be write

traffic, when it is generated in the eDMA write channel, read traffic, when generated in the eDMA

read channel, or both, depending on the test used.

After the transactions are complete, the test disables the eDMA block by deasserting an internal enable signal that is accessible to software in the chip implementation. It then proceeds to send generic traffic, which helps to test the chip functionality with the eDMA module in low power mode. At this stage the eDMA has been turned off by hardware via the PMU, since it detected no activity. When the generic transactions finish, confirming proper low power operation, it is necessary to test whether the eDMA is able to recover from low power and work as expected. This also helps to test the correct power up sequence and the correct reset of the modules. Since the eDMA operation consists, as its name states, of accessing memory, the test needs to program its registers, specifying the amount of data it should transfer and its location in memory. Due to confidentiality, it is not possible to go into eDMA configuration details.

The differences between the three tests make it possible to analyse the power savings for each type of traffic. The results are a good comparative measure for validating the three power domain solution.

When writing tests for power gating, it is important to also have a top power domain at the

root scope that contains all of the design. This power domain will emulate all power components

placed outside of the IP. If the power architect decides to use an external switch, it should put it in

this power domain. The top power domain is optional and is not synthesised.

Since IP is normally bought by other companies to integrate into their designs or SoCs (System on a Chip), the top power domain can also be useful to simulate the behaviour of the client's power intent. When specifying power intent, it is important that it integrates with the power intent implemented at the chip level.


Chapter 5

Results

The IP used in this project is a highly configurable design that supports different data-path sizes, different numbers of channels for a given instance and other parameters. The results were obtained using a single configuration, kept fixed throughout the process. From this case study it was possible to establish some base guidelines for power gating implementation at the RTL stage of the design process. These guidelines have the purpose of simplifying power gating implementation when the power architect has little knowledge of the design to be optimised.

It is important to make a clean and correct power intent, since it should be understandable by

the back-end team for proper implementation.

5.1 Power Reduction Outcome

The technology node chosen for this implementation was 28nm. This technology node is

not the smallest one available, but it is currently being used by the industry. From the available

libraries, it was the only one that contained standard cells for power gating implementation.

Due to the complexity of the IP and the time necessary to run the whole power characterisation flow, it was mandatory to select a single corner for power analysis. The chosen corner was 125 °C from the 28nm technology node library. The available temperature corners were 125 °C, 0 °C and -40 °C, with 125 °C being the one with the worst leakage.

This implementation proved to be efficient in reducing static power consumption, and it also

had quite an impact in total power consumption, mainly due to clock gating, since dynamic power

consumption has a bigger slice of the total power consumption.

It was not possible to perform a good area overhead evaluation. This comes from the fact that,

in the flow used for this implementation, area will vary in each synthesis, even using the exact

same design.

In one of the synthesis runs, it was possible to observe an increase of 2% in the total design area, compared to the original design with no power gating. This is not a significant area increase, but as stated before it is also not a very accurate measurement.


Since clock gating has also been implemented together with power gating, the area impact

is smaller. Due to register optimisation, clock gating is a technology that reduces area when

implemented.

Power reports at this level of the ASIC design flow are not accurate, but they are a reference for the power trend of the design, and therefore a good indication of possible savings.

During activity there is a negligible increase in power consumption, due to extra logic from

power gating and the power management units. The following tables reproduce the results from

power analysis. Current is the current design before power gating implementation on the eDMA

module. 1 PD is the solution with a single power domain for the eDMA module and 3 PD is the

final solution with independent channel power gating.

Table 5.1: Power during activity.

Full Traffic
Power (µW)        Dynamic     Static      Total
Current   IP      3.29E-02    9.81E-03    0.19
          DMA     4.18E-04    1.30E-03    3.11E-02
1 PD      IP      3.33E-02    9.95E-03    0.196
          DMA     0.00062     0.00142     0.0317
3 PD      IP      0.0333      0.00995     0.196
          DMA     0.00062     0.00142     0.0317

Table 5.2: Power consumption relative to the current implementation, during activity.

Full Traffic
Power (relative to Current)   Dynamic   Static   Total
1 PD      IP                  101%      101%     103%
          DMA                 148%      109%     102%
3 PD      IP                  101%      101%     103%
          DMA                 148%      109%     102%

As it can be seen from the relative values presented on table 5.2, the increase in power con-

sumption for both single power domain and triple power domain is about 3%, which is not a bad

price to pay for the reduction provided during no activity. The results for the read and write sim-

ulation are very similar, due to the fact that logic is similar and the time interval chosen to extract

activity had similar traffic characteristics.

Results are different for the full simulation, where it is clearly noticeable that the three power domain solution is much more effective at cutting down power consumption, especially if there is only read or write traffic. Table 5.6 presents the relative power consumption of both solutions. It is noticeable that the one power domain solution ended up increasing power consumption. This is because, when a single channel is used, both channels are powered on and remain on until no channel is needed, whereas with three power domains one channel remains powered off unless there is traffic in both directions.


Table 5.3: Power consumption during full simulation.

Full Traffic
Power (µW)        Dynamic     Static      Total
Current   IP      2.04E-02    9.75E-03    0.123
          DMA     4.52E-04    1.29E-03    1.89E-02
1 PD      IP      1.85E-02    9.89E-03    0.113
          DMA     0.00063     0.00142     0.0176
3 PD      IP      1.59E-02    8.51E-03    9.37E-02
          DMA     5.00E-04    5.00E-04    8.80E-03

Table 5.4: Power consumption for a full simulation, write traffic.

Write Traffic
Power (µW)        Dynamic     Static      Total
Current   IP      1.80E-02    9.72E-03    0.107
          DMA     4.35E-04    1.28E-03    1.62E-02
1 PD      IP      1.71E-02    9.87E-03    0.103
          DMA     0.00061     0.00141     0.0161
3 PD      IP      0.0153      0.00839     0.0868
          DMA     0.0005      0.00038     0.0059

Table 5.5: Power consumption for a full simulation, read traffic.

Read Traffic
Power (µW)        Dynamic     Static      Total
Current   IP      1.81E-02    9.72E-03    0.108
          DMA     4.36E-04    1.28E-03    1.64E-02
1 PD      IP      1.80E-02    9.87E-03    0.109
          DMA     6.20E-04    1.41E-03    1.70E-02
3 PD      IP      0.0155      0.0084      0.0885
          DMA     0.0005      0.00039     0.0066

Table 5.6: Relative power consumption for a full write simulation.

Write Traffic
Power (relative to Current)   Dynamic   Static   Total
1 PD      IP                  95%       102%     96%
          DMA                 140%      110%     99%
3 PD      IP                  85%       86%      81%
          DMA                 115%      30%      36%
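The relative values in table 5.6 follow directly from the absolute values in table 5.4: each entry is divided by the corresponding entry of the current implementation. The short Python sketch below reproduces that arithmetic; the data structure and the function name are only illustrative.

```python
# (Dynamic, Static, Total) values copied from table 5.4 (write traffic).
TABLE_5_4 = {
    ("Current", "IP"):  (1.80e-02, 9.72e-03, 0.107),
    ("Current", "DMA"): (4.35e-04, 1.28e-03, 1.62e-02),
    ("1 PD", "IP"):     (1.71e-02, 9.87e-03, 0.103),
    ("1 PD", "DMA"):    (0.00061, 0.00141, 0.0161),
    ("3 PD", "IP"):     (0.0153, 0.00839, 0.0868),
    ("3 PD", "DMA"):    (0.0005, 0.00038, 0.0059),
}


def relative(table, solution, scope):
    """Power of a solution as a rounded percentage of the current design."""
    baseline = table[("Current", scope)]
    values = table[(solution, scope)]
    return tuple(round(100 * v / b) for v, b in zip(values, baseline))


dma_3pd = relative(TABLE_5_4, "3 PD", "DMA")  # (Dynamic %, Static %, Total %)
```

For the eDMA with three power domains this gives 115%, 30% and 36% for dynamic, static and total power, matching the last row of table 5.6.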

5.2 Guidelines

From the simulations and analyses carried out during the development of this dissertation, it was possible to define some guidelines that are useful for future power gating implementations and for automating the power gating process.

First, it is necessary to identify the modules that can be turned off. This can be achieved by analysing the modules' activity from the waves provided by a previous simulation. If large periods of inactivity are detected, the module becomes a good candidate for power gating.
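The activity analysis can be sketched as a scan for long idle windows in a per-sample activity trace. The Python sketch below assumes the trace has already been extracted from the simulation waves as a list of 0/1 flags (the extraction itself is not shown), and the names idle_windows, idle_fraction and min_idle are hypothetical.

```python
def idle_windows(activity, min_idle):
    """Return (start, length) for every idle run of at least min_idle samples.

    activity: sequence of 0/1 flags, one per sample, where 1 means the module
    toggled in that sample (assumed to be extracted from the waves beforehand).
    """
    windows = []
    start = None
    for index, active in enumerate(activity):
        if not active and start is None:
            start = index  # an idle run begins here
        elif active and start is not None:
            if index - start >= min_idle:
                windows.append((start, index - start))
            start = None  # the idle run ended
    # Close a run that lasts until the end of the trace.
    if start is not None and len(activity) - start >= min_idle:
        windows.append((start, len(activity) - start))
    return windows


def idle_fraction(activity):
    """Fraction of samples in which the module was idle."""
    if not activity:
        return 0.0
    return sum(1 for active in activity if not active) / len(activity)
```

A module whose trace contains many long idle windows spends most of its time leaking without doing useful work, which is what makes it a candidate.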

A preliminary power analysis shows how much power each module currently consumes. By analysing a module's leakage power consumption, it is possible to conclude whether it is worth creating a power domain for it. If the module has high leakage power but is also highly active, it is worth looking into it and partitioning it according to the activity requirements of its inner logic.
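The decision described above can be written as a small rule. This is only a sketch: the two thresholds are hypothetical parameters that would have to be tuned per design and technology, and the three labels are illustrative.

```python
def classify_module(leakage_uw, idle_fraction, leakage_threshold_uw, idle_threshold):
    """Classify a module for power gating (thresholds are assumed, design-specific).

    Returns:
      "skip"      - leakage too low to justify a dedicated power domain
      "gate"      - high leakage and mostly idle: good power gating candidate
      "partition" - high leakage but busy: inspect its inner logic and split it
    """
    if leakage_uw < leakage_threshold_uw:
        return "skip"
    if idle_fraction >= idle_threshold:
        return "gate"
    return "partition"


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(classify_module(9.7e-3, 0.8, leakage_threshold_uw=1e-3, idle_threshold=0.5))
```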

After selecting the candidates, it is important to group them. In a design, modules are related to each other, which means that the activity of some modules depends on the activity of another module; if that module is not performing any function, all the related modules can also be powered off. Grouping them into a single power domain reduces the amount of extra logic created by power gating. A good way to group modules together is to create a wrapper. Although not necessary, it eases the implementation.
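One way to picture the grouping step is to follow each module's activity dependency up to the module that ultimately controls it, and to place everything with the same controlling module in one group. The sketch below assumes the dependencies are known and given as a dictionary; the module names in the example are made up.

```python
def group_by_root(depends_on):
    """Group modules by the module whose activity ultimately controls them.

    depends_on: dict mapping a module name to the module its activity depends
    on, or to None when it depends on no other module.
    """
    def root(module):
        seen = set()
        while depends_on.get(module) is not None:
            if module in seen:
                raise ValueError(f"dependency cycle at {module}")
            seen.add(module)
            module = depends_on[module]
        return module

    groups = {}
    for module in depends_on:
        groups.setdefault(root(module), []).append(module)
    return groups


if __name__ == "__main__":
    # Hypothetical design: two FIFOs are only active when dma_ctrl is active.
    deps = {"dma_ctrl": None, "read_fifo": "dma_ctrl", "write_fifo": "dma_ctrl", "uart": None}
    print(group_by_root(deps))
```

Each resulting group is a candidate for a single power domain, and therefore for a single wrapper.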

If it is not possible to create a wrapper due to the logical hierarchy or design complexity, it is still possible to implement power gating. Instead of consisting of the wrapper, the power domain will consist of the independent modules. However, this may raise complications in the back-end phase if a disjoint power domain has to be created. For new designs, it is helpful to take the power hierarchy into consideration when designing the logical hierarchy of the system.

With the modules selected, the next phase is implementation. To implement power gating, it is necessary to understand the basics of the design it will be implemented on. If the design requires state retention, a retention strategy has to be defined.

After deciding whether state retention is needed, it is necessary to define the signal that will enable the power domain to turn on. Several power domains can also be an option; in that case, an enable signal has to be selected for each of them. The enable signal can be a simple signal from the parent module or a logical function of several signals. If two power domains have the same enable signal and the same voltage, they can be grouped together, because they have the same power requirements. The enable signal should be active during the whole activity phase of the power domain.
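The merging rule for power domains can be sketched as grouping by the pair (enable signal, voltage). The domain and signal names below are hypothetical, and the voltages are example values only.

```python
from collections import defaultdict


def merge_power_domains(domains):
    """Merge power domains that share both enable signal and voltage.

    domains: dict mapping a domain name to an (enable_signal, voltage) tuple.
    Returns a list of groups, each a sorted list of domain names.
    """
    merged = defaultdict(list)
    for name, (enable, voltage) in domains.items():
        merged[(enable, voltage)].append(name)
    return [sorted(names) for names in merged.values()]


if __name__ == "__main__":
    # Hypothetical domains: PD_RD and PD_FIFO share enable and voltage.
    domains = {
        "PD_RD": ("rd_en", 0.9),
        "PD_WR": ("wr_en", 0.9),
        "PD_FIFO": ("rd_en", 0.9),
        "PD_IO": ("rd_en", 1.8),
    }
    print(merge_power_domains(domains))
```

Note that PD_IO stays separate in this example even though it shares the enable signal, because its voltage differs.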

Once all the decisions are made, it is helpful to create a power intent diagram, which makes writing the UPF code easier.

5.3 Alternatives

Another power management unit can be designed, but some considerations should be kept in mind. The wake-up/power-down and isolation signal sequences are important to avoid behavioural errors. Power should only be removed after the signals are isolated, to avoid sampling corrupted signals. For the same reason, isolation should only be removed after power is stable.

Clock gating and restoration may depend on the design architecture, but removing the clock when going into low power will reduce dynamic power.
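The two ordering rules for isolation and power can be expressed as a small checker over a list of PMU events. This is a sketch under stated assumptions: the event names ("isolate", "power_off", "power_stable", "deisolate") are invented for illustration, any other event is ignored, and the domain is assumed to start powered and stable.

```python
def check_sequence(events):
    """Return the ordering violations found in a list of PMU event names."""
    violations = []
    isolated = False
    power_stable = True  # assumed initial state: powered on and stable
    for event in events:
        if event == "isolate":
            isolated = True
        elif event == "power_off":
            if not isolated:
                violations.append("power removed before isolation")
            power_stable = False
        elif event == "power_stable":
            power_stable = True
        elif event == "deisolate":
            if not power_stable:
                violations.append("isolation removed before power is stable")
            isolated = False
    return violations


if __name__ == "__main__":
    good = ["clock_off", "isolate", "power_off", "power_on", "power_stable", "deisolate", "clock_on"]
    bad = ["power_off", "isolate", "power_on", "deisolate"]
    print(check_sequence(good))
    print(check_sequence(bad))
```

Clock events are deliberately left unchecked here, since their placement may depend on the design architecture.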


When using retention, the PMU flow may differ from the one presented above (4.2). Flows with full state retention may not require a reset, but resetting them anyway can be a good measure to ensure no logic is corrupted. However, it is important to respect the save/restore sequence. The reset signal should be applied before restoration; otherwise, the restored data would be lost. The save operation should also be applied before the reset signal. Finally, the isolation should only be lifted when the restore operation is concluded.
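The save/restore ordering constraints can be captured as pairs of events that must appear in a fixed relative order. The sketch below is illustrative only: the event names are hypothetical and each event is assumed to occur at most once per power cycle.

```python
# Each pair (first, second) means: first must happen before second.
REQUIRED_ORDER = [
    ("save", "reset"),         # state is saved before the reset is applied
    ("reset", "restore"),      # reset comes before restoration, or data is lost
    ("restore", "deisolate"),  # isolation is lifted only after the restore
]


def retention_order_violations(events):
    """Return a message for every required ordering that the event list breaks."""
    position = {event: index for index, event in enumerate(events)}
    violations = []
    for first, second in REQUIRED_ORDER:
        if first in position and second in position and position[first] > position[second]:
            violations.append(f"{first} must come before {second}")
    return violations


if __name__ == "__main__":
    flow = ["isolate", "save", "power_off", "power_on", "reset", "restore", "deisolate"]
    print(retention_order_violations(flow))
```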


Chapter 6

Conclusion

Reducing power consumption is a concern that has been growing over time. It is important to consider power in digital circuit design, given the problems introduced in this document and considering the pollution generated by some power stations.

Power-aware implementations are important for saving energy, especially on battery-operated devices. Technology advancements come with advantages and disadvantages. Smaller transistors can be operated at lower voltages, effectively lowering dynamic power consumption, which is highly voltage dependent. They also allow more complex circuits with more logic at lower prices. However, although dynamic power gets lower, static power increases due to the lower threshold voltages of smaller transistors.

The EDA industry also develops power saving solutions that can be applied to the design at a high level of abstraction and provide good trade-offs. These power saving solutions are valuable because reducing power consumption also reduces heat dissipation, decreasing the need for cooling systems and consequently lowering device cost. It also reduces the thermal stress on components, increasing their lifetime and reducing thermal effects on transistors.

The PMU is very versatile and could be used for other retention-less implementations, by identifying and selecting a good enable signal.

The amount of power the power architect is able to save when implementing power gating depends on the approach used. A more aggressive approach can save more power, but it requires a deep understanding of the design itself and is more complex; the more complex the implementation, the longer it takes to implement, test and debug.

6.1 Future Work

This section presents possible future work to help automate power gating implementation. The idea is to create a power partitioning tool composed of several scripts with well-defined functions. This tool would be based on the guidelines studied in this dissertation and would apply them automatically to a system, with reduced human interaction.


One of the scripts has to be capable of reading a Verilog module and analysing its processes. If the module is composed of several processes that are independent of each other, the script will extract these processes and promote them into new modules, so that they can be used in the power intent.

Another important script receives several modules as inputs and places them inside a wrapper module. The wrapper module will only contain the inputs and outputs that are connected to modules outside of it, reducing the overall number of ports and therefore simplifying the isolation and level shifter strategies.
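The port reduction performed by such a wrapper script can be sketched as follows: a net needs a wrapper port only if it connects at least one wrapped module to at least one module outside the wrapper. The netlist representation (a dictionary from net name to the set of attached modules) and all names are assumptions made for this example.

```python
def wrapper_ports(nets, wrapped):
    """Return the sorted names of nets that must become wrapper ports.

    nets: dict mapping a net name to the set of module names attached to it.
    wrapped: iterable with the names of the modules placed inside the wrapper.
    """
    wrapped = set(wrapped)
    ports = []
    for net, modules in nets.items():
        inside = modules & wrapped
        outside = modules - wrapped
        if inside and outside:  # the net crosses the wrapper boundary
            ports.append(net)
    return sorted(ports)


if __name__ == "__main__":
    # Hypothetical netlist: fifo_data stays internal to the wrapper.
    nets = {
        "req": {"cpu", "dma_rd"},
        "fifo_data": {"dma_rd", "dma_fifo"},
        "irq": {"dma_fifo", "cpu"},
        "clk_div": {"cpu", "timer"},
    }
    print(wrapper_ports(nets, ["dma_rd", "dma_fifo"]))
```

Only the boundary-crossing nets would then need isolation or level shifting.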

The last script of this power partitioning tool evaluates switching activity and selects the modules that are good candidates for power gating. Then, with that information, it constructs the power intent for those modules. The script should allow the user to specify the enable signal as well as the retention registers, since those two constraints require some knowledge of the design itself.

Another important piece of future work is to characterise the savings provided by the addition of the timer to the PMU's state machine. This will require implementing power gating in a new module that has subsequent transaction requirements, with help from the guidelines deduced during this dissertation.


References

[1] Advanced Low Power Techniques, May 2016. URL: http://www.synopsys.com/Solutions/EndSolutions/advanced-lowpower/verification-lowpower/Pages/advanced-low-power-techniques.aspx.

[2] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand. Leakage current mechanisms andleakage reduction techniques in deep-submicrometer cmos circuits. Proceedings of the IEEE,91(2):305–327, Feb 2003. doi:10.1109/JPROC.2002.808156.

[3] Sushma Honnavara-Prasad. System level power with ieee1801, 2015. URL: http://systempower.org/wp-content/uploads/2015/04/1801_Sushma.pdf.

[4] Michael Keating, David Flynn, Rob Aitken, Alan Gibbons, and Kaijian Shi. Low PowerMethodology Manual: For System-on-Chip Design. Springer Publishing Company, Incorpo-rated, 2007.

[5] IEEE P1801 Working Group. Ieee standard for design and verification of low-power inte-grated circuits. IEEE Std 1801-2013 (Revision of IEEE Std 1801-2009), pages 1–348, May2013. doi:10.1109/IEEESTD.2013.6521327.

[6] M.C. Schneider and C. Galup-Montoro. CMOS Analog Design Using All-Region MOSFETModeling. Cambridge University Press, 2010. URL: https://books.google.com/books?id=SDPG0Lz39HcC.

[7] S. Jadcherla. Verification Methodology Manual for Low Power. Synopsys, 2009. URL:https://books.google.pt/books?id=qz2NYgEACAAJ.

[8] T. Hattori. Challenges for low-power embedded soc’s. In VLSI Design, Automation andTest, 2007. VLSI-DAT 2007. International Symposium on, pages 1–4, April 2007. doi:10.1109/VDAT.2007.373214.

[9] F. Bin Muslim, A. Qamar, and L. Lavagno. Low power methodology for an asic design flowbased on high-level synthesis. In Software, Telecommunications and Computer Networks(SoftCOM), 2015 23rd International Conference on, pages 11–15, Sept 2015. doi:10.1109/SOFTCOM.2015.7314103.

[10] A. Mathur and Qi Wang. Power reduction techniques and flows at rtl and system level.In VLSI Design, 2009 22nd International Conference on, pages 28–29, Jan 2009. doi:10.1109/VLSI.Design.2009.113.

[11] N.S. Kim, T. Austin, D. Baauw, T. Mudge, K. Flautner, J.S. Hu, M.J. Irwin, M. Kandemir,and V. Narayanan. Leakage current: Moore’s law meets static power. Computer, 36(12):68–75, Dec 2003. doi:10.1109/MC.2003.1250885.

57

Page 78: RTL Guidelines for Static Power Reduction · 2019. 7. 13. · CPF Common Power Format CTS Clock Tree Synthesis DC Design Compiler DFT Compiler Design-for-test Compiler DMA Direct

58 REFERENCES

[12] A.S. Sedra and K.C. Smith. Microelectronic Circuits: International edition. OUP USA,2010. URL: https://books.google.pt/books?id=KuGCRAAACAAJ.

[13] Farzan Fallah and Massoud Pedram. Standby and active leakage current control and mini-mization in cmos vlsi circuits. IEICE transactions on electronics, 88(4):509–519, 2005.

[14] S. Carver, A. Mathur, L. Sharma, P. Subbarao, S. Urish, and Qi Wang. Low-power designusing the si2 common power format. IEEE Design &amp; Test of Computers, 29(2):62– 70, 2012/04/. low-power design;common power format standard;CPF standard;IC de-sign;power consumption;power domain;power node;interoperability;IEEE1801 low-powerstandard;SoC design;. URL: http://dx.doi.org/10.1109/MDT.2012.2183574.

[15] V. Gourisetty, H. Mahmoodi, V. Melikyan, E. Babayan, R. Goldman, K. Holcomb, andT. Wood. Low power design flow based on unified power format and synopsys tool chain.In Interdisciplinary Engineering Design Education Conference (IEDEC), 2013 3rd, pages28–31, March 2013. doi:10.1109/IEDEC.2013.6526754.

[16] Ieee standard for design and verification of low-power, energy-aware electronic systems.IEEE Std 1801-2015 (Revision of IEEE Std 1801-2013), pages 1–515, March 2016. doi:10.1109/IEEESTD.2016.7445797.

[17] Power Optimization in Design Compiler, June 2016. URL: http://www.synopsys.com/Tools/Implementation/RTLSynthesis/Pages/PowerCompiler.aspx.

[18] Ieee standard for integrated circuit (ic) open library architecture (ola). IEEE Std 1481-2009,pages c1–658, 2009. doi:10.1109/IEEESTD.2009.5430852.

[19] Universal verification methodology, June 2016. URL: http://www.accellera.org/community/uvm/.