Power Grid Analysis in VLSI Designs


A Thesis

Submitted for the Degree of

Master of Science (Engineering) in the Faculty of Engineering

By Kalpesh Shah

Supercomputer Education and Research Centre

Indian Institute of Science Bangalore – 560012

March 2007


Acknowledgements

My sincere gratitude to both my guides, Prof. S K Nandy and Dr. Vish Visvanathan. Prof. Nandy, thank you for your guidance right from the start of the MS curriculum till the end. I would not have dreamt of the final chapters had it not been for your timely guidance. To Vish, thank you for bearing with me and guiding me from the beginning till the end, despite your busy schedule at the office. You are the one who encouraged me from enrolling in this program till the end. Thank you for your valuable inputs and comments on the material. My sincere thanks to IISc, and specifically to the SERC staff who helped me through various administrative work.

To my colleagues and managers at Texas Instruments, thank you for your cooperation; you are a team I am proud of. Thanks for your support and the camaraderie. A special thanks to Harinath for approving my MS program and to Venugopal Puvvada, my manager when most of this work happened. Discussions with him made this work relevant to multi-million gate designs and helped it find real application.

Thanks to the many friends with whom I discussed topics similar to my research throughout this period: Ananth, Gokul, Mallik, Suravi, Saby, Bram, Ashish, Aishwarya and Sumedha. A special thanks to Anjana Ghose for all that you did for me while I was not in Bangalore.

Thanks to my family for having stood behind me like a rock. To my parents, thanks for your support and affection; your unrelenting persistence helped me complete the last step. To Pratiksha, thank you for being my invisible strength. Your constant reassuring presence and confidence in me drove me to this point in the journey. To Bhavesh and Deepti, thank you for being my saviors at times of load at home. Without you folks, this thesis would not have materialized. And finally, thanks to little Harsh, who came into this world halfway through my MS, and Darsh, who saw my MS from the age of one: you kept giving me unasked-for but much-needed breaks and made everything so lively.


Table of Contents

Acknowledgements
Abstract
1 Introduction
  1.1 Motivation
    1.1.1 Power Estimation
    1.1.2 Power Supply Noise
    1.1.3 MTCMOS Analysis
  1.2 Terms
  1.3 Thesis outline and Contribution
2 Toggle Activity Estimation
  2.1 Overview
  2.2 Toggle Activity Estimation
  2.3 Multi-million gate solution
    2.3.1 Deriving automatic toggle frequency values
    2.3.2 Hierarchical Modeling
  2.4 Validation and Results
  2.5 Summary
3 Power Estimation
  3.1 Overview
  3.2 Current approaches to Power Analysis
  3.3 Power analysis Tools
    3.3.1 Power Compiler [67]
    3.3.2 Power Mill (or Nano Sim) [4][68]
    3.3.3 Prime Power [66]
    3.3.4 Other Tools
  3.4 Validation Flow
    3.4.1 Netlist Setup
    3.4.2 Vector Generation
    3.4.3 Interconnect setup
  3.5 Validation and Results
  3.6 Power estimation applications
    3.6.1 Average power/ground bus currents
    3.6.2 Average power dissipation
    3.6.3 Electro migration failures
    3.6.4 Power Routing
    3.6.5 Gate Oxide Integrity Analysis
  3.7 Summary
4 Power Supply Noise Analysis
  4.1 Overview
  4.2 Cell Characterization
    4.2.1 Current Characterization Methodology
    4.2.2 Current Characterization Flow
  4.3 Power Grid network modeling
    4.3.1 Power Grid Current Waveform Modeling
  4.4 Complete Flow
    4.4.1 Timing Information Generation
    4.4.2 Power Grid Generator
    4.4.3 SPICE Simulation
  4.5 Validation and Results
    4.5.1 Peak Power Results
    4.5.2 Peak Dynamic IR Drop Results
  4.6 Summary
5 Power Up Analysis
  5.1 Switched PG Networks
  5.2 Switch Network Analysis
    5.2.1 Switch Characterization
    5.2.2 Current or Switch Prediction
  5.3 Results and Analysis
  5.4 Summary
6 Conclusion
  6.1 Summary
  6.2 Scope of Future Work
7 References
Appendix A Sample SDC file
Appendix B Sample SPEF Format
Appendix C Power Waveforms Analysis
Appendix D Current Characterization – sample spice deck
Appendix E Waveform transformation example


Table of Figures

Figure 1.1 Power Dissipation in CMOS designs
Figure 1.2 Power Density trend in CMOS designs
Figure 1.3 Leakage and Dynamic Power Dissipation [2]
Figure 1.4 Schematic of Power Grid in CMOS designs
Figure 1.5 Normalized delay and normalized delay to voltage ratio
Figure 1.6 Total power break up into leakage and active
Figure 2.1 Schematic of logic circuit 1
Figure 2.2 Schematic of Logic Circuit 2
Figure 2.3 Gated clock example
Figure 2.4 Gate Level Netlist for 'simple' design
Figure 2.5 Timing Arcs in extracted model of 'simple' design
Figure 3.1 Venn diagram of Power Components
Figure 3.2 Power Estimation in Design Stages
Figure 3.3 Power Estimation Validation Flow
Figure 3.4 Legends for Validation Flow
Figure 4.1 Voltage over time representation at an internal design node
Figure 4.2 Schematic circuit for instantaneous voltage drop analysis
Figure 4.3 Inverter waveforms measured at different nodes
Figure 4.4 Transition time vs. peak power for Inverter
Figure 4.5 Transition time vs. peak power for NAND gate
Figure 4.6 Load vs. peak power for AND gate
Figure 4.7 Load vs. peak power for OR gate
Figure 4.8 State Dependency on cell switching
Figure 4.9 Cell Characterization Flow
Figure 4.10 Power Grid Modeling
Figure 4.11 Peak IR drop Computation Flow
Figure 4.12 Prime Time flow for arrival time computation
Figure 4.13 Power Grid Generation Flow
Figure 4.14 PSN waveform of Proposed Method
Figure 4.15 PSN Reference Waveform
Figure 5.1 Gated Power Supply ([74])
Figure 5.2 Layout of 1M gate with switch network
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output
Figure 5.4 Typical PG network with Power Switches
Figure 5.5 Schematic Switch network Analysis Flow
Figure 5.6 Analysis model of Virtual Power Network
Figure 5.7 Infinitesimal Time Division for Current Prediction
Figure 5.8 Reduced Switch Network for validation
Figure 5.9 Voltage Ramp up over Time for various nodes
Figure 5.10 Current comparison over time
Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Figure 3 1GHz, Peak: 838.2 uW
Figure 4 1MHz base Waveform, 830.4 uW
Figure 5 100MHz Transformation, 830.4 uW
Figure 6 1GHz Transformation for 1MHz, 830.4 uW


List of Tables

Table 1.1 Consolidation of ITRS2003 Predictions
Table 1.2 Generic Term Definitions
Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation
Table 3.1 Power Modeling for CMOS gates
Table 3.2 ISCAS89 circuit description
Table 3.3 Runtime comparison between vector less and SPICE
Table 3.4 Clock Power vs. Total Power
Table 3.5 Power Estimation across various tools
Table 4.1 Comparison of Peak power Dissipation
Table 4.2 Comparison of percentage peak instantaneous IR drop
Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
Table 5.1 Switch Prediction by proposed algorithm
Table 5.2 Voltage Prediction
Table 5.3 Power Up analysis - Runtime Comparison


Abstract

Power has become an important design closure parameter in today's ultra-deep submicron digital designs. The impact of increasing power spans multiple disciplines: power supply design, power converter and voltage regulator design, system, board and package thermal analysis, power grid design and signal integrity analysis, and power minimization itself. This work focuses on the challenges that increasing power poses for power grid design and analysis.

Challenges arising from smaller geometries and higher power are well-researched topics, yet considerable scope for further work remains. Traditionally, designs go through average IR drop analysis, which depends heavily on the estimation of current consumption. This work proposes a vectorless probabilistic toggle estimation method that extends one of the approaches proposed in the literature. The toggles computed with this approach are then used to estimate the power of the ISCAS89 benchmark circuits, which provides insight into the quality of the generated toggles. The power estimation work is further extended to compare against various state-of-the-art methodologies, i.e. SPICE-based power estimation, logic-simulation-based power estimation, and commercially available tools. Finally, we arrive at a recommended flow that can be chosen according to design need and schedule.

Today's design complexity (high frequencies, high logic densities, and multi-level clock and power gating) has forced the design community to look beyond average IR drop. High switching activity induces power supply fluctuations at the cells in a design, which is known as instantaneous IR drop. However, there is no good methodology in place to analyze this phenomenon. Ad hoc decoupling planning and on-chip intrinsic decoupling capacitance help to contain this noise, but they offer no guarantee. This work also applies the average toggle computation approach to instantaneous IR drop analysis of designs. Instantaneous IR drop is also known as dynamic IR drop or power supply noise. We propose a cell characterization methodology for standard cells. The characterized data is used to build a power grid model of the design. Finally, the power network is solved to compute the instantaneous IR drop.

Leakage power minimization has forced design teams to adopt complex power gating, with multi-level MTCMOS usage in the power grid. This poses an additional analysis challenge for the power grid in terms of ON/OFF sequencing and the noise it injects. This work explains the state of the art and highlights some of the issues and trade-offs in using MTCMOS logic. It further suggests a simple approach to quickly assess the impact of MTCMOS gates on the power grid in terms of peak currents and IR drop. The suggested approach also helps in MTCMOS gate optimization, and the overhead of leakage optimization can be computed early using it.


1 Introduction

1.1 Motivation

The VLSI industry is facing one of the biggest challenges in its evolution: power integrity closure, the next after the crosstalk-induced integrity issues of the previous decade. Power dissipation has increased phenomenally over the years, as shown in Figure 1.1, giving rise to this challenge. Figure 1.2 shows the increase in power density caused by aggressive scaling, which packs ever more components into a unit area.

[Figure: power in watts (log scale, 0.1 to 100,000) versus year (1971 to 2008) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors, with points labelled 500 W, 1.5 kW, 5 kW and 18 kW.]

Figure 1.1 Power Dissipation in CMOS designs


[Figure: power density in W/cm² (log scale, 1 to 10,000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with reference levels marked for a hot plate, a nuclear reactor and a rocket nozzle.]

Figure 1.2 Power Density trend in CMOS designs

Table 1.1 below consolidates the ITRS 2003 [1] predictions on power and its impact on design, along with the operating voltages.

Year                  2003          2004 (90 nm)  2005          2006          2007 (65 nm)  2008          2009          2010 (45 nm)  2012
Vdd, High Perf (V)    1.2           1.2           1.1           1.1           1.1           1             1             1             0.9
Vdd, Low Power (V)    1             0.9           0.9           0.9           0.8           0.8           0.8           0.7           0.7
High Perf Power (W)   149           158           167           180           189           200           210           218           240
Battery Operated (W)  2.1           2.2           2.3           2.4           2.5           2.6           2.7           2.8           3
PG Pads               1700          1800          2000          2100          2200          2300          2400          2400          2600

Table 1.1 Consolidation of ITRS2003 Predictions


Further, Figure 1.3 shows that both the leakage and the dynamic components of power are continuously increasing, with leakage dominating dynamic power in newer technology nodes [2]. The next sections describe how these trends give rise to challenges in power grid analysis and lead to the work done.

Figure 1.3 Leakage and Dynamic Power Dissipation [2]


1.1.1 Power Estimation

One of the challenges in power integrity analysis is to accurately predict the power dissipation, both average and peak, of a design. Power estimation is required for package thermal analysis, power minimization, and power grid design.

The earliest techniques proposed for estimating power dissipation were based on strongly pattern-dependent circuit simulation, e.g. SPICE or fast SPICE simulators [3-6]. Besides being strongly pattern-dependent, these techniques are too slow to be used on modern very large-scale integrated (VLSI) circuits, for which high power dissipation is a major problem.

In order to improve computational efficiency, other simulation-based techniques were proposed using various kinds of timing, switch-level, and logic simulation [7-9]. In these approaches, lookup tables are obtained by electrical simulation of the basic library elements, and the collected data are then used during gate-level simulation. These techniques generally assume that the power supply and ground voltages are fixed, and only the supply current waveform is estimated. While they are indeed more efficient than traditional circuit simulation, at the cost of some loss in accuracy, they remain strongly pattern-dependent and are still too slow for modern multi-million gate designs, where the whole chip cannot be simulated together.

In order to overcome the shortcomings of simulation-based techniques, research has focused on probabilistic and statistical techniques for toggle estimation. The use of probabilities to estimate power was first proposed in [11]. In that work, a zero-delay model was assumed so that the transition probabilities could be estimated from signal probabilities. A probabilistic power estimation approach that does compute the toggle power and does not make the zero-delay or temporal independence assumptions, called probabilistic simulation, was proposed in a few papers. In this technique, the use of probabilities was expanded to allow the specification of probability waveforms. This approach assumed spatial independence and was not restricted to synchronous circuits.
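The zero-delay estimate described above can be illustrated with a minimal sketch. It assumes spatially independent gate inputs and temporally independent values in successive clock cycles, so that a node with signal probability p toggles with probability 2p(1 - p) per cycle. The function names, the example values, and the power expression 0.5 * C * Vdd^2 * f * (toggle probability) are illustrative assumptions and are not taken from [11].

```python
def and_prob(input_probs):
    """Signal probability of an AND output, assuming independent inputs."""
    p = 1.0
    for x in input_probs:
        p *= x
    return p


def or_prob(input_probs):
    """Signal probability of an OR output, assuming independent inputs."""
    q = 1.0
    for x in input_probs:
        q *= 1.0 - x
    return 1.0 - q


def transition_prob(p):
    """Probability that a node changes value between two consecutive cycles.

    Assumption: values in successive cycles are independent (zero-delay,
    temporal independence), so P(0->1) + P(1->0) = 2 * p * (1 - p).
    """
    return 2.0 * p * (1.0 - p)


def avg_switching_power(p, c_load, vdd, f_clk):
    """Average switching power of one node: 0.5 * C * Vdd^2 * f * toggle probability."""
    return 0.5 * c_load * vdd ** 2 * f_clk * transition_prob(p)


if __name__ == "__main__":
    p_out = and_prob([0.5, 0.5])          # two-input AND with equiprobable inputs
    print(p_out, transition_prob(p_out))
    # Hypothetical node: 10 fF load, 1.2 V supply, 100 MHz clock
    print(avg_switching_power(p_out, 10e-15, 1.2, 100e6))
```

Because the model ignores gate delays, it cannot account for glitches, which is the limitation that the probabilistic simulation and transition density approaches were proposed to address.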

Another probabilistic approach was proposed by Farid N. Najm [12], which introduced the transition density measure of circuit activity. An algorithm was also presented for propagating the transition density into the circuit. This approach does not make a zero-delay assumption and makes only the spatial independence assumption. As a result of this independence assumption, the computed density values are insensitive to the internal circuit delays.
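As a hedged illustration of density propagation, the sketch below applies the commonly cited rule D(y) = sum over i of P(dy/dx_i) * D(x_i), where dy/dx_i is the Boolean difference of the output with respect to input i. It assumes spatially independent inputs and evaluates the Boolean difference by exhaustive enumeration, which is only practical for small gates. The function names and the example values are illustrative and not taken from [12].

```python
from itertools import product


def boolean_difference_prob(func, probs, i):
    """P(dy/dx_i = 1): probability that toggling input i toggles the output.

    func  : gate function taking 0/1 arguments and returning 0/1
    probs : signal probabilities of the inputs (assumed independent)
    """
    n = len(probs)
    others = [k for k in range(n) if k != i]
    total = 0.0
    for bits in product((0, 1), repeat=len(others)):
        assign = [0] * n
        weight = 1.0
        for k, b in zip(others, bits):
            assign[k] = b
            weight *= probs[k] if b else 1.0 - probs[k]
        assign[i] = 0
        y0 = func(*assign)
        assign[i] = 1
        y1 = func(*assign)
        if y0 != y1:          # output is sensitive to input i for this pattern
            total += weight
    return total


def output_density(func, probs, densities):
    """Transition density at a gate output from its input probabilities and densities."""
    return sum(
        boolean_difference_prob(func, probs, i) * densities[i]
        for i in range(len(probs))
    )


if __name__ == "__main__":
    and2 = lambda a, b: a & b
    xor2 = lambda a, b: a ^ b
    # Hypothetical inputs: equiprobable, with densities of 2 and 4 transitions per unit time
    print(output_density(and2, [0.5, 0.5], [2.0, 4.0]))
    print(output_density(xor2, [0.5, 0.5], [2.0, 4.0]))
```

An XOR gate passes every input transition to its output, so its output density is the sum of the input densities, whereas an AND gate passes a transition on one input only when the other input is 1.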

Yet another probabilistic approach was presented by A. Ghosh et al. in [13], where Binary Decision Diagrams (BDDs) were used to take into account internal node correlations and toggle power, at the cost of increased computation. This approach can become computationally expensive. In addition, recent literature describes more accurate toggle estimation methods based on Bayesian networks [14-16], but these are limited in their ability to handle high gate count designs. All of the above probabilistic and statistical techniques are applicable only to combinational circuits. They require the user to specify information on the activity at the latch outputs.

This work addresses the toggle computation problem, or pattern dependence problem, for multi-million gate designs by extending Najm's approach [12]. Using this approach, average power estimation has been performed at various stages of the design.

1.1.2 Power Supply Noise

With the phenomenal rise in switching speed in VLSI circuits, the probability of a large number of cells switching within a short period of time increases. A large number of simultaneous switching events in a short period of time can cause a considerable amount of noise in the power supply network of a circuit. Power supply noise is the decrease in voltage seen across the power and ground nodes of a cell. A schematic of the power network grid is shown in Figure 1.4. The parasitic resistance R of the power distribution network is responsible for the resistive noise, which is the IR voltage drop in the PG network. Apart from R, on-chip decoupling capacitance also plays a big role. The switching noise in the power distribution network must be contained to a tolerable level to ensure the reliability and performance of a circuit.
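The resistive component of this noise can be illustrated with a minimal sketch. It assumes a single power rail fed from one pad, with a series resistance per segment and a constant current drawn at each node. It ignores the ground rail, decoupling capacitance and inductance, so it shows only the static IR drop. All names and values are hypothetical.

```python
def rail_voltages(vdd, seg_res, node_currents):
    """Node voltages along a rail fed from one end.

    vdd           : pad voltage in volts
    seg_res       : resistance of each rail segment in ohms, pad side first
    node_currents : current drawn at the node after each segment, in amperes
    """
    voltages = []
    v = vdd
    downstream = sum(node_currents)   # current flowing through the first segment
    for r, i_node in zip(seg_res, node_currents):
        v -= r * downstream           # IR drop across this segment
        voltages.append(v)
        downstream -= i_node          # this node's current does not flow further
    return voltages


if __name__ == "__main__":
    # Hypothetical rail: 1.2 V pad, three 0.1 ohm segments, 10 mA drawn at each node
    for k, v in enumerate(rail_voltages(1.2, [0.1, 0.1, 0.1], [0.01, 0.01, 0.01])):
        print(f"node {k}: {v:.4f} V")
```

The node farthest from the pad sees the largest drop, because every upstream segment carries the current of all nodes beyond it.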

Figure 1.4 Schematic of Power Grid in CMOS designs

Excessive voltage drops manifest themselves as glitches on the PG buses and cause:

• Erroneous logic signals


• Degradation in switching speeds

• Reduction in Noise Margin and Driving Capability of the gates

According to a study on the Pentium® 4 [26], power supply noise can reduce clock frequency by 6.5% at the 130 nm node and by 8% at the 90 nm node. All of this is handled through various margins in the design flow, as no efficient solutions are available to address the dynamic voltage drop problem.

There is some work done to estimate peak power as well as decoupling capacitor in this regard.

In [27], a pattern-independent, linear time algorithm is described that estimates the maximum

current waveforms at various contact points in the circuit. The algorithm is first demonstrated

for simple gate delay and current models. The expression for modeling the delays and current

waveforms for a general gate is derived and the way to extend the algorithm under more

general models is also described. The authors improved the work in [28]. In [29] measures of

peak power are proposed in the context of sequential circuits, and a procedure is presented to

obtain lower bounds on these measures, as well as providing the actual input vectors that attain

such bounds. Automatic generation of a functional vector loop for near-worst case power

consumption is attained. Paper [30] presents a statistical method for estimating the peak

power dissipation in VLSI circuits. The method is based on the theory of extreme order

statistics and its application to the probabilistic distributions of the cycle-by-cycle power

consumption, the maximum-likelihood estimation, and the Monte-Carlo simulation. It can be

used to predict the maximum power of a VLSI circuit in the set of constrained input vector

pairs as well as the complete set of all possible input vector pairs. The simulation-based nature

of the method avoids the limitations of a gate-level delay model and a gate-level circuit

structure. Also, the method produces maximum power estimates to satisfy user-specified error


and confidence levels. Experimental results show that this method typically produces maximum power estimates within 5% of the actual value, at a 90% confidence level, while simulating fewer than 2500 input vectors. Another technique, described in [31], computes the peak power of a design while maintaining the accuracy of the current waveform. It breaks each logic gate into nodes and models the various currents in terms of these nodes, which are evaluated quickly during logic simulation to measure power. However, because it is based on logic simulation, it is extremely difficult to scale.

Chen and Ling [36] proposed an approach to estimate the power supply noise based on an

integrated package-level and chip-level power bus model. Chang, Gupta, and Breuer [37]

proposed an analytical model to estimate the ground bounce caused by the switching in the

internal circuitry for sub-micron VLSI circuits. Jiang, Cheng, and Deng [38] proposed a

Genetic Algorithm-based approach that considered the dependence of switching noise on input

patterns under a distributed RC model of the PG network. Zhao, Roy, and Koh proposed an

event-driven simulation based approach to calculate the worst case power supply noise under a

distributed RLC model [39].

Several challenges remain in this area, where very little work has been done.

First, analyzing Power Ground (PG) noise requires worst-case vectors, with which the parasitic network of the chip is simulated. This approach needs a large amount of data and memory, and today's SPICE simulators cannot handle such complexity in terms of either runtime or capacity. Moreover, determining the worst-case vectors is almost never straightforward.


Second, today's designs have a huge PG network, and the voltages seen at the various nodes of this network vary. The resultant voltage across the power-ground bus of a macro impacts its delay, as shown in Figure 1.5. Note that delay is non-linear at low voltages. Further, the ratio of change in delay to change in voltage is even more non-linear than the delay itself. This is very important to designers, as it can cause delay issues or design failures. Because delay depends so strongly on voltage, dynamic V drop in the PG network is fast becoming a critical concern for chip designers [41][59-60].

[Figure 1.5 plot. X-axis: Voltage, from 1.2 down to 0.8 in steps of 0.05. Y-axis: normalized delay and normalized delay2voltage. Curves: Rise Delay, Fall Delay, risedelay2voltage_change, falldelay2voltage_change.]

Figure 1.5 Normalized delay and normalized delay to voltage ratio

The third aspect of the PG noise problem is that it is an iterative phenomenon [41]. When the voltage across a cell decreases due to a sudden rise in switching activity, the cell delays change, and hence so does the amount of simultaneous switching. This in turn can reduce or increase the dynamic noise: reduce, in the sense that simultaneous switching may decrease altogether; increase, because a hot spot can move from one part of the design to another. Handling this is not a trivial task from an analysis perspective.


Fourth, design methodologies today expect the analysis to meet predefined PG noise targets. In reality, any voltage drop is acceptable as long as the required timing goals are met. However, this is not done today due to a lack of analysis data.

Fifth, it has been found that devices often fail on testers due to excessive simultaneous switching during SCAN testing. This creates serious testability issues, so dynamic V drop needs to be analyzed not only for the functional mode but also for other modes such as test.

This work addresses the dynamic PG noise problem, which is also described as the dynamic V drop problem in some literature. Based on the issues mentioned above, the goal is to address the dynamic V drop problem with a runtime efficient enough for today's multi-million gate designs. A further goal is to evaluate the impact of dynamic V drop on timing.

1.1.3 MTCMOS Analysis

Leakage power accounts for more than half of the total power in today's ultra sub-micron designs, as shown in Figure 1.6 below.


Figure 1.6 Total power break up into leakage and active

Leakage power control and power network integrity have become key areas of interest for today's power-sensitive designs. Commenting on the power consumption problem at the 2002 International Electron Devices Meeting, Intel chairman Andrew Grove cited off-state current leakage in particular as a limiting factor in future microprocessor integration [72].

Designers have come up with innovative ways to reduce leakage power using various techniques: reducing the device power supply and frequency of operation [73], multi-Vt transistor usage [74-79], controlling input states [74], memory leakage reduction [75], reverse body bias [76], and transistor stacks [77]. A detailed study of the sources of leakage power and of reduction techniques can be found in [82].

Of the several techniques available to reduce leakage, a gated power supply using power switches is one of the most promising. Power switches consist of several PMOS transistors and their controlling signals, and are used to dynamically switch the power supply to a specific region of the chip on or off. This work studies the challenges associated with using power switches and proposes a fast analysis technique to estimate the peak currents that flow while the logic powers up.

1.2 Terms

Generic terms used in this report are described below.

ASIC Acronym for Application Specific Integrated Circuits. A custom or semi

custom integrated circuit, such as a cell or gate array, created for a specific

application. The complexity of ASICs typically requires significant use of

CAD techniques.

Block Also known as functional block or module. Any block within the design

hierarchy instantiated one or more times that will be laid out separately is

referred to as a block module. Block modules are defined divisions of a chip

based on functionality and can be worked on independently of other

functional blocks.

Netlist A description of the circuit. The description can be a gate-level or Register-

Transfer level (RTL) one. It can also be in different languages like Verilog

or VHDL or SPICE.

Physical Design A portion of a chip or circuit corresponding to a block module that is laid

out separately using a Physical Design tool. It is also referred to as a

physical block, layout region, or layout block.

RTL Acronym for Register Transfer Level

Characterization Electrical analysis performed for the purpose of determining typical device

performance characteristics and/or parametric limits.


CMOS Acronym for Complementary Metal Oxide Semiconductor. A MOS

technology in which both P-channel and N-channel devices are fabricated

on the same die.

Die A single square or rectangular piece of silicon into which a specific

semiconductor circuit has been diffused.

Electromigration Particle migration in aluminum or copper thin-film or polysilicon

conductors at grain boundaries as a result of high current densities.

Electromigration can lead to either an open circuit condition in a conductor

or a short between adjacent connectors.

Interconnect The metallization connecting two or more active elements on the surface of

a die; also, the wires connecting the die to the package leads.

Timing Window A timing window specifies, for each circuit node, the interval in which a transition is anticipated. For a single clock domain, the interval can lie within a clock period. There can be more than one interval, or overlapping intervals, depending on the complexity of the paths converging on the node.

Table 1.2 Generic Term Definitions

1.3 Thesis outline and Contribution

There are 3 distinct problems addressed in this work.

First, average power estimation using probabilistic toggle estimation for multi-million gate designs. Unless specified by the user, the approach calculates switching probabilities as well as switching rates at the different nodes in the circuit (including primary inputs). We studied the switching activity calculation methods available in the extensive literature and enhanced one of the techniques to meet multi-million gate design needs. This work helps in average dynamic power estimation and also addresses the challenges of toggle estimation, which has varied applications such as peak power estimation, power supply noise analysis and reliability analysis.

Second, dynamic power supply noise estimation. Here, a prototype flow is developed in conjunction with the PrimeTime STA flow and SPICE to measure power supply noise. The work describes a gate characterization methodology that involves a one-time SPICE simulation, and how the PG network is modeled using the characterized data.

The third problem addressed is power grid analysis where MTCMOS gates are inserted. The work focuses on MTCMOS analysis challenges and on the key factors to consider when a group of logic turns ON from the OFF state. Here, a flow is developed to estimate peak currents or to optimize MTCMOS resistance and switches.

We restrict our scope to CMOS circuits mapped onto a predefined cell library, and we follow a two-step paradigm: library modeling, followed by analysis of the design using the modeled information. Library modeling involves describing the cells and their functional, structural or electrical behavior as needed for block or design analysis; it happens once for all designs. Electrical behavior is modeled through characterization using a circuit simulator (e.g. SPICE [3]).

The document is organized as follows. The toggle estimation problem is addressed in Chapter 2. Chapter 3 describes the various power estimation techniques and tools available in industry and compares their power numbers with those obtained from the above toggle estimation method. Chapter 4 describes power supply noise estimation and Chapter 5 describes MTCMOS power-up analysis. Finally, a list of publications is given at the end for further reference.


2 Toggle Activity Estimation

2.1 Overview

In CMOS technologies, chip components draw power supply current only during a logic transition, if we ignore the small leakage current. The current is also proportional to the supply voltage seen by the cell or macro. While this is considered an attractive low-power feature of these technologies, it makes power estimation and voltage drop highly dependent on the switching activity inside these circuits [11][97]. A more active circuit consumes more current and hence contributes a higher voltage drop. The activity of a circuit is obtained by running simulation patterns and analyzing the data, which leads to a serious pattern-dependence problem. Often, the power of a functional block needs to be estimated when the rest of the chip has not yet been designed, or even completely specified. In such a case, very little may be known about the inputs to this functional block, and complete and specific information about its inputs is impossible to obtain.

This motivates the pattern-independent toggle activity estimation problem, often referred to as the vectorless approach. Since the vectorless approach does not require patterns, it is also called 'static', whereas the vector-based approach is called 'dynamic'. Table 2.1 compares these two approaches.

STATIC | DYNAMIC
Uses probabilistic approach as described in [12] or zero delay simulation based approach. | Uses logic simulation to generate switching activity or SPICE simulation to calculate power.
Vector-less approach. | Vector based approach. Hence quality is as good as input vectors. Imagine number of patterns possible for 100 inputs block.
Many times gives upper bound. | Gives accurate result.
Modeling of certain element (hard macro/complex block) is difficult. | Since it is vector based, functional models can be used during simulation.
Very fast (few minutes-hours). | Very slow (few days-weeks).
Lot of research into products for average power estimation. | Can give instantaneous power.
Synopsys has: Power Compiler | Synopsys has: Power Mill (Nano Sim)

Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation

This work describes the approach used for toggle frequency estimation and its limitations. It then proposes solutions to these limitations, which make the approach usable for large designs.

A few terms are defined below to clarify the discussion.

Transition Density: If a logic signal x(t) makes n(T) transitions in a time interval of length T, then the transition density of x(t) is defined as:

D(x) = n(T)/T where T is very large (ideally infinite)

For large T, D(x) becomes a time-invariant quantity and hence there is no need to account for temporal correlation.

Toggle Frequency: If a node x toggles n(T) times over a time interval of length T, then the toggle frequency F(x) is defined as:

F(x) = n(T)/(2*T) where T is very large (ideally infinite)

For example, if a node is switching at 20 MHz, it is expected to switch 2 times in 50 ns. The toggle frequency can be converted to transition density or switching activity by the following equation:

Toggle density = # of transitions/Period = Switching Activity

All three terms mentioned above are used interchangeably in this document.

Note that the toggle frequency of a node has no direct relation to the clock domain(s) in which the node (or logic) exists. We have used the clock domain frequency as an upper bound on the toggle frequency calculated by our approach.

Signal Probability: The signal probability P(x) at a node x is defined as the average fraction of clock periods in which the steady state value of x is logic high.
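The definitions of transition density and toggle frequency can be illustrated with a small sketch. The function names and the sampled waveform are hypothetical and only serve to show the arithmetic; a finite observation window stands in for the ideally infinite T.

```python
def count_transitions(samples):
    """Count logic transitions in a sampled 0/1 waveform (hypothetical helper)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def transition_density(n_transitions, interval_s):
    """D(x) = n(T)/T, with a finite window T standing in for the infinite limit."""
    return n_transitions / interval_s


def toggle_frequency(n_transitions, interval_s):
    """F(x) = n(T)/(2*T): two transitions make one full toggle cycle."""
    return n_transitions / (2.0 * interval_s)


# The example from the text: 2 transitions observed in 50 ns.
n = 2
T = 50e-9
print(transition_density(n, T))  # transitions per second
print(toggle_frequency(n, T))    # about 20 MHz, as stated in the text
```

With these definitions the transition density is always twice the toggle frequency, which is why the terms can be converted into one another.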

2.2 Toggle Activity Estimation

This section gives an overview of Farid Najm's work.

The Boolean difference of the output is computed with respect to each input pin. For a function y (the output) that depends on x (one of the inputs), it is defined as:

$$\frac{dy}{dx} = y|_{x=1} \oplus y|_{x=0} \qquad (1)$$

It was shown in [5] that, if the inputs x_i to a Boolean logic function are (spatially) independent, then the density of its output y is given by:

$$D(y) = \sum_{i=1}^{n} P\left(\frac{dy}{dx_i}\right) D(x_i) \qquad (2)$$

Equation (2) assumes that all inputs are independent. This can lead to inaccuracy where primary inputs diverge and then reconverge towards primary outputs, since the reconverging signals are not really spatially independent. However, at the boundary of a block, the primary inputs can be considered largely independent, and hence the approach can be made more accurate if the Boolean difference of the whole block is computed.

Given the signal probability and toggle density values at the primary inputs of a logic circuit, a single pass over the circuit, using (2), gives the density at every node. Note that apart from estimating the toggle density at each output node, we also need to calculate output signal probabilities in order to estimate the toggle density of the subsequent logic. This is simple for a two-input AND gate with independent inputs A and B:

P(Y) = P(A)*P(B)

and for a NAND gate:

P(Y) = 1 - P(A)*P(B)
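As an illustration of equations (1) and (2), the following minimal sketch evaluates the Boolean difference of a small gate by truth-table enumeration and propagates densities to its output. It assumes independent inputs, as equation (2) does; the function names are hypothetical and the exhaustive enumeration is only practical for gates with a few inputs.

```python
from itertools import product


def probability_of_one(fn, probs):
    """P(fn = 1) for independent inputs, where probs[i] = P(x_i = 1)."""
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        if fn(*bits):
            weight = 1.0
            for bit, p in zip(bits, probs):
                weight *= p if bit else 1.0 - p
            total += weight
    return total


def boolean_difference(fn, i):
    """Equation (1): dy/dx_i = fn with x_i = 1, XOR fn with x_i = 0."""
    def diff(*bits):
        hi = bits[:i] + (1,) + bits[i + 1:]
        lo = bits[:i] + (0,) + bits[i + 1:]
        return bool(fn(*hi)) != bool(fn(*lo))
    return diff


def output_density(fn, probs, densities):
    """Equation (2): D(y) = sum over i of P(dy/dx_i) * D(x_i)."""
    return sum(
        probability_of_one(boolean_difference(fn, i), probs) * densities[i]
        for i in range(len(probs))
    )


def and2(a, b):
    return a & b


def nand2(a, b):
    return 1 - (a & b)


def xor2(a, b):
    return a ^ b


probs = [0.5, 0.5]          # assumed input signal probabilities
densities = [10.0, 20.0]    # assumed input densities, arbitrary units
print(probability_of_one(and2, probs), output_density(and2, probs, densities))
print(probability_of_one(nand2, probs), output_density(xor2, probs, densities))
```

For the AND gate, dy/dA reduces to B, so the output density is P(B)*D(A) + P(A)*D(B). For the XOR gate, the Boolean difference is always 1, so the input densities simply add.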

2.3 Multi-million gate solution

The above approach gives good results for designs that are small, can be analyzed flat, and are dominated by combinational logic. However, it is not always possible to run flat because of logistical concerns: blocks are designed first, the rest of the design is done hierarchically, or the design contains reusable IPs that have no netlist. The approach described in the previous section was extended to handle such requirements.

We also came across several issues while applying this approach to some large designs (>5M gates) and implementing the tool, Toggle Frequency Calculator. In this section, we discuss the solutions that address each of these problems in detail.

2.3.1 Deriving automatic toggle frequency values

1 Primary Input Handling

The toggle rate at a primary input is not known. Since primary inputs are driven externally, there is no easy way to predict their toggle rate. The same is true for the primary input signal probability. Consider Figure 2.1 and Figure 2.2.

Figure 2.1 Schematic of logic circuit 1


Figure 2.2 Schematic of Logic Circuit 2

In the above circuits, the inputs Clk or D going to the block can be primary inputs. Unless the user gives the toggle rate, it is very difficult to compute. We used static timing analysis [24][25] specifications to derive these inputs. They are:

Input Delay Specification – A constraint that specifies the minimum or maximum

amount of delay from a clock edge to the arrival of a signal at a

specified input port. Input delay specification is with respect to a clock

that triggers events on that signal.

Clock specification – specifies the characteristics of a clock, including the clock

name, source period and waveform.

Mode Specifications – specifies the constant values applied on certain port or pins

to drive timing analysis in a specific mode. This means that these pins

or ports are not toggling during the analysis. It also specifies the

constant value to which the port or pin is tied to.

For clock inputs, we used the toggle rate specified as per the clock specification.

For non-clock inputs, we used the clock specified on the Input Delay specification.

For constant ports, we used 0 toggle rate and static probability based on constant value

tied i.e. if it is constant 0, static probability is 0 else it is 1.


A sample SDC file with the above commands is shown in Appendix A. Note that an SDC file is a collection of commands in Tcl format, so we have shown only the commands that are primarily required.
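The three rules for primary inputs can be sketched as a small lookup. This is only an illustration, not the Toggle Frequency Calculator implementation: the dictionaries stand in for parsed clock, input delay and mode specifications, all names are hypothetical, a non-clock input is assumed to inherit the toggle rate of its reference clock, and the static probability of 0.5 for toggling inputs is an assumption that the text does not state.

```python
def primary_input_activity(port, clock_periods_s, input_delay_clock, constants):
    """Return (toggle_rate_hz, static_probability) for one primary input.

    clock_periods_s   : clock name -> period in seconds (clock specification)
    input_delay_clock : port name -> reference clock name (input delay specification)
    constants         : port name -> 0 or 1 (mode specification)
    """
    if port in constants:
        # Constant ports do not toggle; probability follows the tied value.
        return 0.0, float(constants[port])
    if port in clock_periods_s:
        # A clock makes 2 transitions per period, so F = 2/(2*T) = 1/T.
        return 1.0 / clock_periods_s[port], 0.5  # 0.5 is an assumption
    if port in input_delay_clock:
        # Assumption: the input inherits the rate of its reference clock.
        ref_clock = input_delay_clock[port]
        return 1.0 / clock_periods_s[ref_clock], 0.5  # 0.5 is an assumption
    raise KeyError("no constraint found for port " + port)


clock_periods_s = {"clk": 10e-9}
input_delay_clock = {"d": "clk"}
constants = {"test_mode": 0}

for name in ("clk", "d", "test_mode"):
    print(name, primary_input_activity(name, clock_periods_s, input_delay_clock, constants))
```

Checking the constant case first reflects that a mode specification overrides any clock that would otherwise reach the port.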

2 Sequential element modeling (e.g. flip-flops, latches)

Sequential elements do not switch directly when an input switches. Hence, we cannot apply equations (1) and (2).

We used the following formula to compute the toggle frequency at the output of sequential cells. Note that by sequential cells we mean latches and basic flip-flops, not complex macros, which are dealt with separately.

Qout = min(DataInput, clock/2)

The upper bound of clock/2 is required since we identified certain cases where the data input toggles more than clock/2. This is explained below. Where the data input does not toggle more than clock/2, the output cannot toggle more than the data input. The above equation takes care of both facts.

3 Boolean gates that did not reflect realistic scenarios: exor/exnor gates, mux

For these gates, equations (1) and (2) can compute a toggle rate higher than the clock toggle rate, and the value can grow even further if there are more such gates in the transitive fan-out. We found that this does not happen on actual designs, and in many cases it was not the intended behavior. We identified such cells as exceptions and clipped their toggle rate to half of the clock toggle rate.

In similar fashion, we identified mux cells as exceptions and assigned the maximum toggle rate of all inputs as the output toggle rate.
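The special-case rules for sequential cells, exor/exnor gates and muxes can be sketched as below. The function names and the example rates are hypothetical; the sketch only restates the min/clip/max rules given in the text.

```python
def flop_output_toggle(data_toggle_hz, clock_toggle_hz):
    """Qout = min(DataInput, clock/2) for latches and basic flip-flops."""
    return min(data_toggle_hz, clock_toggle_hz / 2.0)


def xor_output_toggle(estimated_toggle_hz, clock_toggle_hz):
    """Clip the rate computed from equation (2) to half the clock toggle rate."""
    return min(estimated_toggle_hz, clock_toggle_hz / 2.0)


def mux_output_toggle(input_toggles_hz):
    """Assign the maximum toggle rate of all mux inputs to the output."""
    return max(input_toggles_hz)


clock_hz = 100e6

# Data toggling faster than clock/2 is bounded; slower data passes through.
print(flop_output_toggle(80e6, clock_hz))
print(flop_output_toggle(30e6, clock_hz))

# For an XOR, equation (2) adds the input rates (40 MHz + 30 MHz here),
# which exceeds clock/2 and is therefore clipped.
print(xor_output_toggle(40e6 + 30e6, clock_hz))

print(mux_output_toggle([10e6, 25e6, 5e6]))
```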


4 Complex loop handling

Loops were handled by breaking them: we broke each loop at the first point where we found it forming.

5 Unconnected inputs going into logic

This was handled by reverse tracking the first sequential cell encountered in the

transitive fan out of unconnected inputs. This algorithm gives the clock controlling the

toggle rate down the line.

If the unconnected inputs are clocks, we assigned the worst toggle rate of the block

itself.

6 Gated clocks or generated clocks

A gated clock is a clock signal that can be modified by logic within the design, such as a clock that can be turned off to save power. The schematic of a gated clock is shown in Figure 2.3.

Figure 2.3 Gated clock example

We made the gated elements transparent for toggle propagation. A clock gating cell is

handled like a buffer.

7 Design Constraints: guidelines for realistic, usable toggle activity estimation

Some care is still needed despite all the above solutions. For example, toggle estimation must be done based on the targeted application, which drives certain inputs used in items 1-6 above. In the implementation, we kept certain hooks to give this control to the user.

2.3.2 Hierarchical Modeling

1. A huge portion of the design is occupied by memories; however, calculating memory output switching activity is not straightforward.

2. Complex functionalities: hard macros.

3. Multi-million gate designs cannot afford flat analysis due to cycle time and the inherent limitations of probabilistic approaches. We needed to devise a method for hierarchical analysis that models sub-blocks and uses them as black boxes.

We used the timing modeling approach to handle (1), (2) and (3).

All standard library components are presently modeled in a Liberty file [69]. Static timing analysis tools can generate a similar Liberty file for blocks after completing the analysis [25]. This file has the following information:

• Input-pin-to-output-pin timing arcs

• Setup and hold constraints for the data input and clock input

• Output timing with respect to either an input pin or the related clock

We derive the output toggle frequency f(out) as below.

In case of an input-to-output timing arc:

f(out) = maximum(all controlling input toggle rate)

In case of a clock-to-output timing arc:

f(out) = average switching activity of clock domain

Figure 2.4 shows the gate level netlist of a design called 'simple'. Figure 2.5 shows the timing arcs that are extracted by PrimeTime, a leading industry timing analysis tool [25]. The timing arc information is used to compute the output toggle rate as explained below.

Figure 2.4 Gate Level Netlist for 'simple' design


Figure 2.5 Timing Arcs in extracted model of 'simple' design

There are combinational arcs from i3 to out2 and from i1 to out2. Hence, the output toggle rate at out2 is controlled by the same clock as i3 or i1. In this case, we assign the maximum of the i3 and i1 toggle rates to the output pin. The other timing arc is clk2->out1. In this case, out1 is assigned the average switching activity of clk2.

Thus, using timing model information, we generate the output toggle rates of memories, complex hard macros or blocks.
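The arc-based rule can be sketched for the 'simple' design as follows. The data structures and the numeric rates are hypothetical, and the behavior when one output has both combinational and clock arcs (taking the largest candidate) is an assumption, since the text does not cover that case.

```python
def block_output_toggle(arcs, input_toggle_hz, clock_domain_activity_hz):
    """Derive f(out) for one output pin of a black-box block.

    arcs : list of (source_pin, kind), kind is "comb" for an input-to-output
           arc or "clock" for a clock-to-output arc.
    """
    candidates = []
    for source_pin, kind in arcs:
        if kind == "comb":
            # Input-to-output arc: the controlling input's toggle rate.
            candidates.append(input_toggle_hz[source_pin])
        elif kind == "clock":
            # Clock-to-output arc: average switching activity of the domain.
            candidates.append(clock_domain_activity_hz[source_pin])
        else:
            raise ValueError("unknown arc kind: " + kind)
    # Assumption: with several arcs, the largest candidate is used.
    return max(candidates)


# Timing arcs of the 'simple' design as described in the text.
arcs_by_output = {
    "out2": [("i3", "comb"), ("i1", "comb")],
    "out1": [("clk2", "clock")],
}
input_toggle_hz = {"i1": 10e6, "i3": 25e6}      # assumed values
clock_domain_activity_hz = {"clk2": 5e6}        # assumed value

for out_pin, arcs in arcs_by_output.items():
    print(out_pin, block_output_toggle(arcs, input_toggle_hz, clock_domain_activity_hz))
```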

2.4 Validation and Results

The above changes were incorporated into executable code and applied to ISCAS89 circuits. The results were compared through power estimation, as discussed in the next chapter.


2.5 Summary

In this work, we address real issues faced by large designs. Automatic toggle generation eases usability and improves accuracy. Hierarchical analysis supports hierarchical design, which is a common methodology for handling design complexity.


3 Power Estimation

3.1 Overview

Accurate power estimates are necessary at various stages of the design in order to make correct architectural, implementation and cost tradeoffs [61]. Architectural tradeoffs are made at a higher level and involve software or instruction-level power modeling, or high-level activity numbers for different blocks, to guide implementation tradeoffs. Weighted averages are often used to identify the best cost options [62-65]. Once the design is converted to a structural netlist and physical design starts, power estimation mainly drives package design, PG network design and lower-level power minimization. In this case, power dissipation is described as below.

P = (A*C*V^2*f) + (τ*A*V*Ishort) + (V*Ileak)

Where

A = activity factor: specifies the amount of switching at the various internal nodes of the design. Note that 'f' is the clock frequency, which is readily available for most designs. The activity factor specifies how much a node toggles per 'f' transitions of the clock. It can be derived from simulation patterns of the logic.

C = capacitance: interconnect load capacitance or wire capacitance

V = dynamic voltage: voltage at which the logic operates

f = frequency: clock frequency at which the logic operates

Ishort = short-circuit current during switching: during a transition in CMOS logic, both the NMOS and the PMOS are ON for a moment. During this time, current finds a direct path from power supply to ground. This is called short-circuit current. It depends on the input transition duration of the CMOS gate.

τ = duration of short-circuit current

Ileak = leakage current [72-80][32]
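As a worked illustration, the three terms of the power expression can be evaluated separately. The sketch transcribes the expression exactly as written above; the function name and all numeric values are assumed for illustration and do not come from any design in this work.

```python
def power_components(A, C, V, f, tau, I_short, I_leak):
    """Evaluate the three terms of P = (A*C*V^2*f) + (tau*A*V*Ishort) + (V*Ileak)."""
    switching = A * C * V ** 2 * f       # charging and discharging of load capacitance
    short_circuit = tau * A * V * I_short  # term as written in the text
    leakage = V * I_leak                 # static leakage
    return switching, short_circuit, leakage


# Assumed example values.
A = 0.2          # activity factor
C = 1e-9         # total switched capacitance in farads
V = 1.2          # supply voltage in volts
f = 100e6        # clock frequency in hertz
tau = 50e-12     # short-circuit duration in seconds
I_short = 1e-3   # short-circuit current in amperes
I_leak = 5e-3    # leakage current in amperes

switching, short_circuit, leakage = power_components(A, C, V, f, tau, I_short, I_leak)
print(switching, short_circuit, leakage, switching + short_circuit + leakage)
```

Because the switching term scales with V squared while the leakage term scales with V, a reduction in supply voltage lowers the switching term faster than the leakage term in this expression.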

Figure 3.1 shows the various components of power and their relationship or contribution to the total power estimate.

[Figure 3.1 content:
Switching power (70-80%): power dissipated by the charging and discharging of the load capacitance, VDD^2 * Σ over all Cell(i) of (Cload(i) * TR(i)).
Static (leakage) power (5%): power dissipated by a gate when it is not switching, Σ over all Cell(i) of PCellLeakage(i).
Short circuit power: power dissipated by a momentary short circuit between the P and N transistors of a gate during switching.
Cell internal switching power: can vary based on macro size.
Internal Power.
Dynamic power consists of switching power and short circuit power.
ASIC flow characterizes libraries for average and leakage power.]

Figure 3.1 Venn diagram of Power Components


In this work, the above power components and their computation are studied extensively. To address the problem in a systematic manner, power estimation is simplified by the following assumptions, which are acceptable given the global analysis that we are considering.

Power supply and ground voltage levels throughout the chip are fixed, so that power can be computed simply by estimating the current drawn by every sub-circuit at a given fixed power supply voltage. Note that this does not mean that different blocks cannot be at different voltage levels. This assumption allows library components to be pre-characterized at the required voltage points.

The circuit is built of logic gates and latches or reusable IPs, and has the popular and well-

structured design style of a synchronous sequential circuit. In other words, it consists of flops

driven by a common clock and combinational logic blocks whose inputs (outputs) are derived

from flop outputs (inputs). It is also assumed that the flops are edge-triggered and, with the use

of CMOS design technology, the circuit draws no steady-state supply current. This allows

breaking down average power dissipation of the circuit into 2 components

• The power consumed by the flops

• The power consumed by the combinational logic blocks.

This chapter is organized as follows. The next section explains cell-based power analysis further. The section after it briefly introduces the tools used for comparison against power estimation based on the toggle computation described in the previous chapter. Validation and results are described last.


3.2 Current approaches to Power Analysis

Cell based power estimation consists of cell characterization and logic simulation or activity

estimation. The characterization phase entails a set of electrical simulations of each library cell

for all possible input transitions and for a wide range of fanin and fanout conditions. Timing

and power information obtained in this way is used to construct lookup tables for the basic

library elements [46][69].

The total leakage power of a circuit is derived by summing the leakage power of the design's constituent library cells:

$$P_{leakageTotal} = \sum_{\forall Cell(i)} P_{CellLeakage}(i) \qquad (3)$$

where PCellLeakage(i) is the leakage power dissipation of each cell. Technology library developers annotate the library cells with the approximate total leakage power dissipated by each cell. There is usually a single static power number per library cell, but sometimes leakage power depends on the logical state of the cell. In this case, the library cell is annotated with state-dependent static power.

A cell's internal power is the sum of the internal power of all of the cell's inputs and outputs as modeled in the technology library:

$$P_{Internal} = \sum_{\forall Pin(i)} E_i \cdot A(i) \cdot f(i) \qquad (4)$$

where Ei is the internal energy of each pin. In practice, the internal energy of a pin is characterized in the technology library and can be accessed by a simple table look-up. Depending on the required accuracy, different look-up tables can be provided by the library designers, as explained in Table 3.1.

Lookup Table | Pin Direction | Indices
One-dimensional | Input/Output | Input transition OR output load capacitance
Two-dimensional | Output | Input transition and output load capacitance
Three-dimensional | Output | Input transition and output load capacitance of the two outputs that have equal or opposite logic values

Table 3.1 Power Modeling for CMOS gates

The switching power is calculated in the following way:

$$P_{switching} = V_{DD}^2 \sum_{\forall Cell(i)} C_{load}(i) \cdot A(i) \cdot f(i) \qquad (5)$$

where Cload(i) is the capacitive load of net i. Without any physical information, the load capacitance Cload(i) is calculated using the wire load model of the net and the fanout of the driving pin. Usually, this approach achieves relative accuracy.
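Equations (3), (4) and (5) are plain sums over cells, pins and nets, which the following sketch makes explicit. The record layout (dictionary keys) and the numbers are hypothetical; in a real flow the leakage and internal energy values would come from library look-up tables and the loads from a wire load model or extracted parasitics.

```python
def leakage_power(cells):
    """Equation (3): sum of the per-cell leakage power annotations."""
    return sum(cell["leakage_w"] for cell in cells)


def internal_power(pins):
    """Equation (4): sum over pins of E_i * A(i) * f(i)."""
    return sum(pin["energy_j"] * pin["activity"] * pin["freq_hz"] for pin in pins)


def switching_power(nets, vdd):
    """Equation (5): VDD^2 * sum over nets of Cload(i) * A(i) * f(i)."""
    return vdd ** 2 * sum(
        net["cload_f"] * net["activity"] * net["freq_hz"] for net in nets
    )


# Assumed example data for a two-cell circuit.
cells = [{"leakage_w": 2e-9}, {"leakage_w": 3e-9}]
pins = [
    {"energy_j": 1e-15, "activity": 0.5, "freq_hz": 100e6},
    {"energy_j": 2e-15, "activity": 0.25, "freq_hz": 100e6},
]
nets = [
    {"cload_f": 10e-15, "activity": 0.5, "freq_hz": 100e6},
    {"cload_f": 20e-15, "activity": 0.25, "freq_hz": 100e6},
]

total = leakage_power(cells) + internal_power(pins) + switching_power(nets, vdd=1.2)
print(total)
```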

Apart from the approaches mentioned above, the following factors are also important for

accurate power estimation.


1. Temperature dependency of power. Power consumption in CMOS depends on mobility, threshold voltage and doping concentration. These factors are temperature dependent; hence power also varies with temperature.

2. Voltage dependency of power. The voltage dependency of power is well known (P=C*V*V*f), and it holds for CMOS technology as well. If we model the CMOS component as a capacitor, it is clear that power varies with the supply voltage.

3. Power increases with the frequency of operation. In fact, many designs nowadays have different modes of operation: a high frequency mode when the device is operational and a low frequency mode when the device is in standby. The impact of frequency on power estimation has already been discussed in the previous section.

4. Nowadays, most designs have a significant share of flops or registers. According to one statistic, flops make up around 40-50% of the logic of a design. If all the flops are clocked throughout the operation, the clock network consumes almost 50% of total power. It is sometimes helpful to analyze the power consumption of the clock network. This work analyzes the clock power contribution to total power.

5. The process corner also impacts currents and power consumption. This is especially true for leakage power. A typical VLSI process has a leakage power variation of the order of 4-6 from the worst process corner to the best.


Based on the power sensitivity and tool study in this section, we propose a power estimation flow for a typical design cycle, as shown in Figure 3.2 below. Note that the power analysis varies from RTL design to pre-layout netlist to post-layout netlist.

[Figure 3.2 flow diagram. Design stages: Architecture, RTL, Unplaced Netlist, Placed Netlist, Detailed Route Over, RC SPICE Netlist. Estimation steps: Power Estimation (spreadsheet); Toggle Frequency Calculator; Forward SAIF* or Frequency Constraints; Logic Simulation; Power Estimation in Power Compiler (wire load, global SPEF, detailed SPEF); PIF File Generation; PrimePower; NanoSim. Legend: Recommended, Least Preferred.
* SAIF - Switching Activity File based approach]

Figure 3.2 Power Estimation in Design Stages

3.3 Power analysis Tools

3.3.1 Power Compiler: [67]

Formerly known as Design Power, Power Compiler is currently the most widely used Synopsys tool. Typically used during synthesis, Power Compiler performs power optimization as well as power estimation. The tool uses static algorithms to calculate switching activity at various circuit nodes and to propagate it through the design. It is a known fact that Power Compiler cannot estimate switching activity well for sequential cells. It should also be noted that most ASIC vendors base their cell power modeling on the Synopsys Liberty syntax, so it is important that single-cell power estimates stay close to the Power Compiler number. The Synopsys Reference Manual on Power Compiler [18] gives the basic power calculation theory and a description of the terms used in its tools.

We used Power Compiler in two modes.

In the first mode, Power Compiler served as the complete solution for power estimation. We generated the input switching activity from our vectors and supplied it to Power Compiler, which propagated the switching activity based on switching probability and then calculated power. In this mode the tool used its own assignment method for sequential cells; we accepted this because our aim was to verify the default switching activity propagation algorithm of Power Compiler.

In the second mode, Power Compiler served only as a power calculation engine. We generated the switching activity at all nodes using the methodology defined in Chapter 3 and passed it to the power calculation engine. As mentioned earlier, the power calculation engine is quite accurate, so the resulting power estimates let us evaluate how accurately other methods determine switching activity.

3.3.2 Power Mill (or Nano Sim) [4][68]

Power Mill (currently known as Nano Sim) is a Synopsys tool with a fast SPICE engine at its core. It was found to correlate well with SPICE for two of the single-cell circuits and one small design. Power Mill is a dynamic, simulation-based tool and hence requires patterns for simulation.

We used Power Mill to calculate average and peak power. The main reason was the runtime advantage of Power Mill compared to SPICE. Note that Power Mill accepts a SPICE netlist as input, so switching between Power Mill and SPICE is transparent, if needed.

3.3.3 Prime Power [66]

Prime Power is another offering in the Synopsys power portfolio. It is a dynamic, vector-based solution. The key difference is that Power Mill is a SPICE-based tool whereas Prime Power is a logic-simulation-based tool. In other words, Power Mill is tuned for accuracy and analog-style designs, whereas Prime Power is tuned for digital, specifically ASIC-style, designs with reasonably good accuracy. Prime Power has a PLI interface to leading industry simulators, e.g. VCS, Modelsim, Verilog etc. If one call/command is instantiated while doing logic verification with these simulators, the PLI dumps binary files, which Prime Power can then use for power estimation. Note that Prime Power can also perform peak power analysis.

We used Prime Power for both average and peak power analysis. The simulator interface used was VCS.

3.3.4 Other Tools

This project used VTRAN to convert vectors to SPICE stimulus. VTRAN is one of the Synopsys offerings and is a generic translator of vectors from one format to another. It supports all major industry formats as well as the internal formats of many prominent ASIC/EDA vendors.

VCS was used for logic simulation. There is no specific reason for choosing this simulator except that it is a Synopsys offering and therefore works with Prime Power without major hurdles.

A few TI internal programs were used to set up an automated flow. They are listed below.

1. genFuncTDL – An internal utility to generate random vectors with a specified clock rate.

2. SimOut – A test constraint validation environment.

3. SDFAligner – For translating SDF from one simulator's format to another simulator's compatible format.

4. SigProbGen – For converting vectors to input switching activity, and a probability calculator.

5. DREPGEN – For generating data compatible with TFC.

6. A translator from ASCII benchmark data to Verilog netlist and SPICE netlist.

3.4 Validation Flow

The validation flow diagram, including data management, is shown in Figure 3.3, and its color and shape conventions are given in Figure 3.4. Some of the key steps are described below.

[Figure 3.3: validation flow diagram. Blocks: ISCAS89 Circuits, Translator (Verilog), Translator (SPICE), Verilog Netlist, Spice Netlist, DC Scripts, SDF, GENFUNCTDL, Random TDL, SIGPROBGEN, Switching Activity File, DREPGEN, DREP File + Data, User Freq File, TFC, Power Estimation, Test Bench, VTRAN, VTRAN cmd, PWL File, SMOUT, VCS_PIF, PIF, Full VCD, CFG, CMD, PrimePower, PowerMill, Power, Comparison and Report.]

Figure 3.3 Power Estimation Validation Flow

• White: Third party tools
• Green: Automatically generated data or written translator
• Grey: TI tools
• Default: Standard inputs/outputs
• Blue: Final output
• Ellipse: Data file(s)
• Rhombus: Process block(s)

Figure 3.4 Legends for Validation Flow


3.4.1 Netlist Setup:

Standard industry benchmark circuits, ISCAS89, are used for the validation. Circuit complexity ranges from 14 gates to 22000 gates. Detailed statistics of the circuits are given in Table 3.2. [71]

To make the validation complete, two single-cell circuits are added for ‘micro’ level validation.

The ISCAS89 benchmark circuits were mapped to a 130nm technology for analysis. No optimization or synthesis was used while mapping the circuits; instead, a predetermined set of cells was used:

• 2-, 3- and 4-input AND/NAND gates

• 2-, 3- and 4-input OR/NOR gates

• Buffers and inverters

• 2- and 3-input ex-or and ex-nor gates

• Flops

3.4.2 Vector Generation

Random vectors were generated for all the ISCAS89 circuits. The number of vectors was based on circuit complexity and gate count, varying from approximately 4 to 38000 vectors. The same set of vectors is used for logic simulation, for SPICE simulation, and for deriving the switching activity and static probabilities of the input pins.

3.4.3 Interconnect setup

All the circuits are treated as synthesized Verilog netlists, and hence parasitic information was not available. To make the comparison more realistic, no-load modes were used in Power Compiler and in the SPICE simulation. The logic simulation was based on SDF generated from Synopsys.


The complete data from the different tools is shown in Table 3.5. Table 3.2 describes the circuits used for benchmarking. Table 3.3 compares run time between the dynamic method and the modified toggle computation method for some of the big design blocks. Table 3.4 shows the power estimate for the clock network against the total power estimate. All power data is dynamic power in uW.

• The power numbers mainly reflect cell internal power and the switching power due to gate input capacitances only, as no interconnects were assumed.

• All experiments were done at the nominal operating point, i.e. nominal process, 25 C temperature and 1.2 V (nominal voltage).

• Clock network power is around 50% of total dynamic power for many circuits, but this is not true in all cases (Table 3.4).

• Run time reduction with the static approach ranges from several hundred times to more than 1000 times (Table 3.3).

• Power reported by Prime Power is optimistic relative to PowerMill in many cases. This was not expected and we are looking into it.

• TFC is within 30% of the PowerMill reported power. However, there are certain exceptions where it reports 30% optimistic power or >50% pessimistic power.

• Power Compiler is >50% pessimistic in most of the cases.

Design Name   IN    OUT   Flops   Boolean (gates+inv)
s111          8     1     0       8
s1196         14    14    18      388+141
s1238         14    14    18      428+80
s13207        31    121   669     2573+5378
s13207_1      62    152   638     2573+5378
s1423         17    5     74      490+167
s1488         8     19    6       550+103
s1494         8     19    6       558+89
s15850        14    87    597     3448+6324
s15850_1      77    150   534     3448+6324
s208_1        10    1     8       66+38
s27           4     1     3       8+2
s298          3     6     14      75+44
s344          9     11    15      101+59
s349          9     11    15      104+57
s35932        35    320   1728    12204+3861
s382          3     6     21      99+59
s38417        28    106   1636    8709+13470
s38584        12    278   1452    11448+7805
s38584_1      38    304   1426    11448+7805
s386          7     7     6       118+41
s4            2     1     1       0
s400          3     6     21      106+58
s420_1        18    1     16      140+78
s444          3     6     21      119+62
s5            2     1     0       1+0
s510          19    7     6       179+32
s526          3     6     21      141+52
s526n         3     6     21      140+54
s5378         35    49    179     1004+1775
s641          35    24    19      107+272
s713          35    23    19      139+254
s820          18    19    5       256+33
s832          18    19    5       262+25
s838_1        34    1     32      288+158
s9234         19    22    228     2027+3570
s9234_1       36    39    211     2027+3570
s953          16    23    29      311+84

Table 3.2 ISCAS89 circuit description

Design      TFC + Power Compiler runtime (min)   PowerMill runtime (CPU hr)
S13207      3                                    23
S13207_1    3                                    24
S15850      3                                    25
S15850_1    3                                    26
S35932      6                                    250
S38417      6                                    189
S38584      7                                    205
S38584_1    7                                    212

Table 3.3 Runtime comparison between vector less and SPICE

Design Name   CLK Power (uW)   Total Power (uW)   %CLK/Total
s4            2.13             3.35               63.6
s27           6.39             10.91              58.61
s208_1        17.05            30.43              56.04
s298          29.84            54.12              55.14
s344          31.97            61.11              52.32
s349          31.97            61.14              52.29
s382          47.04            91.73              51.28
s386          12.79            32.28              39.62
s400          47.04            94.51              49.77
s420_1        34.1             53.75              63.46
s444          44.76            84.83              52.77
s510          12.79            29.43              43.46
s526n         44.76            85.94              52.08
s526          44.76            85.89              52.11
s641          40.5             117.38             34.5
s713          40.5             123.07             32.91
s820          10.66            72.29              14.74
s832          10.66            72.5               14.7
s838_1        68.21            99.96              68.24
s953          61.81            102.37             60.38
s1494         12.79            158.7              8.06
s1488         12.79            158.24             8.08
s1423         157.73           356.1              44.29
s1238         38.37            150.51             25.49
s1196         38.37            151.17             25.38
s5378         381.55           751.75             50.75
s9234_1       449.75           891.59             50.44
s9234         485.99           632.35             76.85
s13207_1      1359.9           1908.3             71.26
s13207        1426             1718               83
s15850        1272.5           1971.3             64.55
s15850_1      1138.2           2630.3             43.27
s38417        3289.1           4659.3             70.59
s35932        3450.5           9654               35.74
s38584_1      2920.7           8339.6             35.02
s38584        2966.3           8057.2             36.82

Table 3.4 Clock Power vs. Total Power

Design     Power     Proposed  Prime    Power     %new      %power     %new       %prime
Name       Compiler  Approach  Power    Mill      power/    compiler/  approach/  power/
                                                  power     PowerMill  PowerMill  PowerMill
                                                  compiler
s111       5.5       2.23      0        2.87      -59.42    91.62      -22.24     -100
s4         3.72      3.35      2.93     2.79      -9.95     33.43      20.16      4.95
s5         2.49      1.34      0.47     1.72      -46.12    44.66      -22.05     -72.61
s27        12.69     10.91     10.03    9.36      -14.01    35.54      16.55      7.14
s208_1     44.91     30.43     22.4     29.03     -32.25    54.7       4.81       -22.84
s298       67.33     54.12     40.05    41.42     -19.62    62.57      30.67      -3.31
s344       85.24     61.11     56.55    65.7      -28.31    29.74      -6.99      -13.93
s349       86.48     61.14     56.66    65.86     -29.3     31.31      -7.16      -13.97
s382       83.57     91.73     52.75    53.15     9.76      57.25      72.6       -0.75
s386       75.15     32.28     42.78    48.46     -57.05    55.07      -33.4      -11.73
s400       83.96     94.51     52.77    53.3      12.58     57.51      77.32      -1
s420_1     70.19     53.75     45.6     44.12     -23.43    59.11      21.83      3.37
s444       83.79     84.83     52.9     53.64     1.24      56.22      58.15      -1.38
s510       64.68     29.43     18.23    47.43     -54.51    36.36      -37.96     -61.57
s526n      85.2      85.94     53.54    53.89     0.87      58.1       59.48      -0.65
s526       85.41     85.89     53.67    54.08     0.57      57.93      58.83      -0.75
s641       159.77    117.38    72.37    93.34     -26.53    71.17      25.76      -22.46
s713       162.62    123.07    74.51    96.57     -24.32    68.41      27.44      -22.84
s820       119.02    72.29     47.96    73        -39.27    63.04      -0.98      -34.3
s832       119.18    72.5      48.03    73.34     -39.17    62.51      -1.14      -34.51
s838_1     126.27    99.96     93.41    75.78     -20.84    66.63      31.91      23.27
s953       159.75    102.37    85.98    88.5      -35.92    80.51      15.67      -2.85
s1494      187.71    158.7     98.28    136.47    -15.45    37.54      16.29      -27.99
s1488      203.99    158.24    98.16    135.83    -22.42    50.18      16.5       -27.73
s1423      406.56    356.1     244.9    278.03    -12.41    46.23      28.08      -11.92
s1238      302.45    150.51    128.2    151.55    -50.24    99.57      -0.69      -15.41
s1196      296.7     151.17    126.5    151.13    -49.05    96.33      0.03       -16.3
s5378      1041.2    751.75    584.3    688.62    -27.8     51.2       9.17       -15.15
s9234_1    1480.6    891.59    704.7    812.36    -39.78    82.26      9.75       -13.25
s9234      1300.4    632.35    508.2    472.82    -51.37    175.03     33.74      7.48
s13207_1   2853      1908.3    1533     1677.46   -33.11    70.08      13.76      -8.61
s13207     2572      1718      1436     1418.89   -33.2     81.27      21.08      1.21
s15850     2640.3    1971.3    1400     1361.52   -25.34    93.92      44.79      2.83
s15850_1   3272.6    2630.3    1539     1945.25   -19.63    68.24      35.22      -20.88
s38417     7654.6    4659.3    4352     4688.74   -39.13    63.26      -0.63      -7.18
s35932     17606     9654      6789     8513.75   -45.17    106.79     13.39      -20.26
s38584_1   12031.7   8339.6    5630     6738.36   -30.69    78.56      23.76      -16.45
s38584     10951.4   8057.2    4261     6235.13   -26.43    75.64      29.22      -31.66

Table 3.5 Power Estimation across various tools
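The percentage columns of Table 3.5 appear to be signed relative deviations, (estimate - reference) / reference x 100. This formula is inferred from the tabulated values and is not stated explicitly in the text. The Python sketch below applies it to two rows copied from the table; for these rows it reproduces the tabulated percentages to within rounding. The function name is hypothetical.

```python
def percent_deviation(estimate, reference):
    """Signed deviation of an estimate from a reference, in percent."""
    return (estimate - reference) / reference * 100.0

# Dynamic power in uW, copied from Table 3.5:
# (Power Compiler, proposed approach, Prime Power, PowerMill)
rows = {
    "s9234":  (1300.4, 632.35, 508.2, 472.82),
    "s13207": (2572.0, 1718.0, 1436.0, 1418.89),
}

for name, (pc, new, pp, pm) in rows.items():
    print(
        f"{name:8s}"
        f" new/PC {percent_deviation(new, pc):7.2f}"
        f" PC/PM {percent_deviation(pc, pm):7.2f}"
        f" new/PM {percent_deviation(new, pm):7.2f}"
        f" PP/PM {percent_deviation(pp, pm):7.2f}"
    )
```

A positive value means the estimate is pessimistic (higher) relative to the reference, and a negative value means it is optimistic.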

3.6 Power estimation applications

Once power estimation has been done, the data can be used in a post-processing step to investigate various circuit properties. Note that some of these are applications of the average toggle calculation method described above.

3.6.1 Average power/ground bus currents

Consider the problem of computing the average current in the power or ground bus branches. This can be solved using toggle densities and the average power consumption of each library cell. We can approximate the average power of each cell from its toggle density and model the power or ground network as distributed or lumped R and C. By simulating this power network in SPICE, one can estimate the average power/ground bus currents. [31]

3.6.2 Average power dissipation

As a direct consequence of the power estimation described above, it should be clear that the

analysis gives overall average power dissipation, summing over all circuit nodes.

3.6.3 Electromigration failures

Electromigration [93][94] is a major reliability problem caused by the transport of atoms in a metal line due to electron flow. Under persistent current stress, this can deform the metal, leading to either short or open circuits. Electromigration failure depends on the average and root mean square (RMS) current densities in the metal leads. The average current in each metal lead can be estimated by the method described in this chapter, and thus potential electromigration problems can be addressed in either the power network or the signal leads.

3.6.4 Power Routing

It has been noticed that inaccurate power estimation is normally the root cause of ‘over design’ of the power network. With an accurate power number, it is possible to use a dense power grid on one block and a light power grid on another, which also reduces the overall IR drop problem.

3.6.5 Gate Oxide Integrity Analysis

Reduction in gate oxide thickness in submicron technologies has resulted in an increased electric field at the gate oxide. An excessive electric field (> 5MV/cm) can damage the gate oxide and reduce its Time Dependent Dielectric Breakdown (TDDB) strength. The excessive electric field is caused by undershoot and overshoot at the gate terminal. A high duty cycle of overshoot/undershoot will result in permanent failure of the transistors. The Failure in Time (FIT) rate represents the probability of device failure in 10 years of operation. In this regard, the duty cycle of signal input pins is measured based on toggle density.

3.7 Summary

Based on our validation flow and the analysis of results, a good power number can be estimated with minimum run time, as shown in Table 3.3. However, the toggle frequency calculation method has certain limitations: it is based on probabilistic algorithms, it has no timing information, and it does not perform any logic simulation. Some ‘power’ designers may want good accuracy at the cost of run time. We have proposed a power estimation flow that caters to the needs of such ‘power’ users as well as normal users.

4 Power Supply Noise Analysis

4.1 Overview

Figure 4.1 below gives a representative voltage waveform at an internal node of a digital design during operation. The fluctuations arise from switching CMOS logic and from inductances in the power supply, package and interconnect.

[Figure 4.1: plot of Voltage versus Time, annotated with Max Voltage, Time Average IR Drop, Min Voltage and 'Increases Propagation Delay'.]

Figure 4.1 Voltage over time representation at an internal design node

The dips in voltage are due to sudden changes in current during logic switching, since inductance adds di/dt noise. Apart from that, CMOS currents during logic switching are higher than the average currents used for average IR drop analysis. This causes an additional i(t)*R drop, where R is the resistance of the Power Grid. The total drop seen at the current sink is:

deltaV = L(di/dt) + i(t)*R
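As an illustration of the relation above, the following Python sketch evaluates deltaV sample by sample for a triangular switching pulse, using a backward finite difference for di/dt. The inductance, resistance and current values are hypothetical and chosen only so that the units work out (nH * mA/ns = mV, mA * ohm = mV).

```python
def supply_drop(times_ns, currents_ma, l_nh, r_ohm):
    """Return deltaV (in mV) at each sample: L * di/dt + i(t) * R.

    di/dt is the backward finite difference between adjacent samples.
    Units: ns, mA, nH, ohm -> nH * mA/ns = mV and mA * ohm = mV.
    """
    drops = []
    for k, (t, i) in enumerate(zip(times_ns, currents_ma)):
        if k == 0:
            didt = 0.0  # no earlier sample to difference against
        else:
            didt = (i - currents_ma[k - 1]) / (t - times_ns[k - 1])
        drops.append(l_nh * didt + i * r_ohm)
    return drops

# Hypothetical triangular switching pulse: 0 -> 10 mA -> 0 over 0.2 ns.
t = [0.0, 0.1, 0.2, 0.3]
i = [0.0, 10.0, 0.0, 0.0]
dv = supply_drop(t, i, l_nh=0.5, r_ohm=2.0)
print([round(v, 1) for v in dv])
```

On the rising edge of the pulse both terms add, while on the falling edge the inductive term changes sign, which corresponds to the overshoot above the average level seen in a waveform like Figure 4.1.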

The most popular technique to control this IR drop is to insert decoupling capacitors in the design. Figure 4.2 shows an electrical representation of the inductance and the dynamic switching of a cell that cause power supply noise, and of the decoupling capacitors that help meet this instantaneous need.

[Figure 4.2: schematic. Component labels: Cell, Idd, Iss, Vdd Pin, Vss Pin, Vdd Net, Vss Net, Lpd, Rpd, Cpd, Lps, Rps, Cps, Rnd, Cnd, Rns, Cns, Cdecap, Vdd, Vss.]

Figure 4.2 Schematic circuit for instantaneous voltage drop analysis

This work focuses on computing the instantaneous IR drop (deltaV), or the actual voltage (Vdd-deltaV), at each cell’s Power/Ground ports. Vdd is an ideal voltage source here and is constant over time. As before, our approach is focused on cell-based designs. The next section explains the cell characterization and modeling needed for block-level analysis. Using this characterization, we build a power grid network that can be simulated; this is discussed in Section 4.3. Section 4.4 explains the prototype flow we developed, and the chapter ends with validation results and conclusions.

4.2 Cell Characterization

Definition: Cell characterization is the process through which data is prepared for every cell for use in the design. The process involves SPICE characterization as well as post-processing of the data. The characterization and its usage need to be in complete alignment.

4.2.1 Current Characterization Methodology

For instantaneous Power Grid analysis, we analyzed cell peak current waveforms. Figure 4.3 shows the transient waveforms of an inverter cell simulated at 250MHz (VDD is the power pin and VSS is the ground pin). It shows the voltage waveforms at the primary input and primary output of the inverter (VA, VY), the current waveforms at the VDD and VSS ports (IRVDD, IRVSS), and the voltage waveforms at the VDD and VSS ports (VVDD_INV1, VVSS_INV1).

Note that the current waveforms at VDD and VSS are similar except for one difference: transition direction. The current waveform at VDD when the output is charging is the same as the current waveform at VSS when the output is discharging, and vice versa. This holds for the inverter here, but it can vary if the cell is not properly balanced. In any case, the amount of charge supplied/discharged will be constant, since it is governed by the load connected at the output.

Figure 4.3 Inverter waveforms measured at different nodes

Annotations in Figure 4.3:

• Output is rising. This alignment is preserved for better results during current waveform generation. The same is true for output falling.

• Output is rising. There is notable symmetry for rise/fall. This helps us to characterize only one current and do the analysis at the Power/Ground network.

In this work, we have maintained the temporal relationship between the Power and Ground current waveforms and decoupled the simulations, i.e. the two networks are simulated separately and the IR drop results are merged.

We performed simulations and arrived at the following conclusions.

• The shape of the current waveform remains the same across different frequencies if the same patterns are used. Note that the overall simulation time decreases as frequency increases for the same set of patterns. This is not a surprise, as the load being charged and discharged during each transition is the same for the same slew and the same set of patterns. For a CMOS gate, the shape of the current waveform remains the same up to very high frequencies (period ~= 3 times the 0-100% slew). (Appendix C)

• The slew or transition time (used interchangeably) plays a big role in determining the peak power of cells. When the slew decreases, the current spike becomes narrower and its peak increases. Figure 4.4 and Figure 4.5 show the peak power variation for different input transition times. Note the variation of ~2x for the inverter and ~1.5x for the 2-input NAND gate.

Figure 4.4 transition time vs. peak power for Inverter

Figure 4.5 Transition time vs. peak power for nand gate

• Peak power varies with output load. The change is as expected, since the load capacitance together with the MOS resistance produces an exponential voltage ramp. The peak depends largely on the MOS ON resistance and on the initial voltage. Figure 4.6 and Figure 4.7 plot the variation for an AND gate and an OR gate. Note that the variation is only ~1-3% across a wide range of loads.

Figure 4.6 Load vs. peak power for AND gate

Figure 4.7 Load vs. Peak power for OR gate

• For cell characterization, pattern dependency is not critical. This is expected, as most cells are 1-2 levels of logic where each pattern activates/deactivates most of the transistors. However, as cells become larger, some logic may not be activated during switching. In that case, it is important to choose useful patterns for cell current characterization.

• For cell characterization, transition direction matters for a given power supply. This means that output rise and fall transitions must both be captured during characterization and applied appropriately during use (Figure 4.3). In our case, we capture the rise and fall transitions together and use them for analysis, making the proposed approach direction independent.

Figure 4.8 State Dependency on cell switching

We also established a few corollaries that will be used later in the discussion.

1. Slew impacts the short-circuit current of the device. For a multi-stage block, slew impacts the first stage the most, and the overall current waveform is unaffected by this change. The impact grows from low to high as the number of design stages decreases.

2. Glitches or hazardous transitions can contribute to the peak current demand of the circuit. Modeling glitches in non-SPICE analysis is not trivial. It is desirable that glitches be reduced by robust design practices. In this work, it is assumed that there are no glitches in the design.

3. The temporal correlation between different inputs strongly influences the characterization data. This is due to simultaneous switching. We have used the least affecting combination, i.e. 0 skew between multiple inputs, in our analysis; this is also the worst case. (Figure 4.8)

4.2.2 Current Characterization Flow

Current source generation involves determining the time-variant current waveform for each cell. This is the current waveform seen at the VDD pin of the cell when the cell output is rising or falling. The flow is shown in Figure 4.9, and a sample SPICE deck is shown in Appendix D. The PERL program that takes its input from the SPICE simulation has the following options. In our case, we used the last option with 75ps as the sampling interval.

1. full – The whole current data available in the punch file is output in two-column format: the first column gives the simulation time and the second column gives the current value at that time instance.

2. fixed – The total simulation time is divided into 8192 points, and the current value at each of these 8192 time-values is obtained either directly, if available, or by interpolation.

3. Interval filtered – An interval in picoseconds is specified, and the program derives from it the time-values at which data is expected. Again, the current data at these time-values is obtained directly, if available, or by interpolation.
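The 'interval filtered' option can be sketched as a uniform resampling with linear interpolation. The Python below is only an illustration of that idea under stated assumptions: it is not the PERL program used in this work, the function name is hypothetical, linear interpolation is assumed (the text does not name the interpolation method), and the input samples are made-up values.

```python
def resample_interval(times_ps, currents, interval_ps):
    """Resample (time, current) pairs on a uniform grid by linear interpolation.

    times_ps must be strictly increasing and hold at least two points.
    Returns a list of (t, i) tuples starting at times_ps[0] and spaced
    interval_ps apart, stopping at the last available time.
    """
    samples = []
    seg = 0  # index of the segment [times_ps[seg], times_ps[seg + 1]]
    n = 0
    while True:
        t = times_ps[0] + n * interval_ps
        if t > times_ps[-1]:
            break
        # Advance to the segment that contains t.
        while seg < len(times_ps) - 2 and times_ps[seg + 1] < t:
            seg += 1
        t0, t1 = times_ps[seg], times_ps[seg + 1]
        i0, i1 = currents[seg], currents[seg + 1]
        samples.append((t, i0 + (i1 - i0) * (t - t0) / (t1 - t0)))
        n += 1
    return samples

# Hypothetical non-uniform simulator output (ps, mA).
raw_t = [0.0, 50.0, 100.0, 300.0]
raw_i = [0.0, 4.0, 8.0, 0.0]
resampled = resample_interval(raw_t, raw_i, 75.0)
for t, i in resampled:
    print(f"{t:6.1f} ps  {i:5.2f} mA")
```

Note that a coarse interval can miss the true peak of a narrow spike (here the 8 mA sample at 100 ps falls between grid points), which is the accuracy versus data-size trade-off behind the choice of sampling interval.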

[Figure 4.9: flow diagram. Cell SPICE Deck -> SPICE simulation @ 10 MHz -> Perl processing to sample VDD currents.]

Figure 4.9 Cell Characterization Flow

Using the above methodology, we characterized all the cells instantiated in the ISCAS89 circuits.

4.3 Power Grid network modeling

This section describes how the Power Grid network is built using the cell characterization data. The Power Grid presents resistance, capacitance and inductance to the switching logic. Figure 4.10 shows the schematic of a typical power grid. [45] The power and ground supply pins are modeled as ideal voltage sources. Methodologies, however, vary widely in terms of current source modeling and capacitance estimation [50][51][52][53]. This work also focuses on current source modeling, which is described in the next subsection.

[Figure 4.10: power grid mesh schematic with the annotation 'Each such arm represents resistance…'.]

Figure 4.10 Power Grid Modeling

Once the power grid is determined along with the capacitance and current source distribution, it can be represented as a matrix data structure and solved for the voltages at the desired nodes, specifically the nodes where cell components are connected, as below.

Y * V = I

where V is the voltage value at each node, Y is the admittance (the inverse of the resistance) of the PG segments, and I is the current that we have characterized.

Or, in the time and frequency domains respectively:

v(t) = Z * i(t) ( Z = R – jW for power network )

V(w) = Z(w) * I(w)
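To make the matrix formulation concrete, the following Python sketch solves a purely resistive (DC) version of the problem by nodal analysis on a one-dimensional rail fed from an ideal source at one end. This is a simplified assumption for illustration: the work itself uses a two-dimensional mesh with capacitance and solves it in SPICE. The function names, rail topology, segment resistance and sink currents are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def rail_voltages(vdd, r_seg, sink_currents):
    """Node voltages on a resistive rail fed from an ideal source at one end."""
    n = len(sink_currents)
    g = 1.0 / r_seg
    y = [[0.0] * n for _ in range(n)]      # conductance (admittance) matrix
    rhs = [-i for i in sink_currents]      # current drawn out of each node
    for k in range(n):
        y[k][k] += g                       # segment towards the source
        if k == 0:
            rhs[k] += g * vdd              # ideal source folded into the RHS
        else:
            y[k][k - 1] -= g
        if k < n - 1:
            y[k][k] += g                   # segment away from the source
            y[k][k + 1] -= g
    return solve_linear(y, rhs)

# Three cells, each sinking 10 mA, on a rail with 1 ohm per segment.
v = rail_voltages(vdd=1.2, r_seg=1.0, sink_currents=[0.01, 0.01, 0.01])
print([round(x, 3) for x in v])
```

The node farthest from the source sees the largest drop because every upstream segment carries the current of all cells downstream of it.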

In our work, we computed the resistances and capacitances from technology data for the 130nm node. A sample program was written to build the mesh structure shown in Figure 4.10 for the VDD network, and VSS was taken as ideal ground. This is not an issue, since all the VSS network elements can be lumped into the VDD network. After determining the Power Grid current waveforms, we solved the network through SPICE simulations.

4.3.1 Power Grid Current Waveform Modeling

Power Grid current waveform modeling involves the following steps:

1. Compute the toggle frequency of each instance in the design, as proposed in Chapter 2.

2. Using the characterized current data for the cell, transform the current data to the toggle frequency computed above.

3. Compute the input arrival time of each instance in the design using Static Timing Analysis. Compute the shift required in the current waveform with reference to the clock edge. For simplicity, we have assumed 0 skew for the clock network.

4. Hook up the current sources and solve the PG network.

5. Determine the PG model simulation time.

These are explained further below.

1 Read the characterized data.

The characterized data was transformed from the time domain to the frequency domain. Sampling is done at a fixed frequency (much higher than common design frequency values), 1000/75 ~ 13.33 GHz, and the pairs [t, i(t)] are stored.

I(t) = i(0)d(0) + i(Ts)d(Ts) + i(2*Ts)d(2*Ts) + … (N samples)

where,

‘Ts’ is the sampling interval, in this case 75 ps (a sampling frequency of 13.33 GHz)

i(t) is the current value at time ‘t’

d(t) = 1 when t = n*Ts, else 0; n ranges over 0,…,N-1

For computational efficiency N may be chosen as a power of 2, i.e. N = 2 ** m (m is an integer).

The discrete Fourier transform of the samples is then computed:

I[k] = sum over n = 0,…,N-1 of i[n] * exp(-j*2*pi*k*n/N), k = 0,…,N-1
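The sampling and transform step can be sketched as below, assuming the standard discrete Fourier transform and its inverse. The direct O(N^2) sum is used for clarity; a practical implementation would use an FFT, which is where choosing N as a power of 2 helps. The sample values and function names are hypothetical.

```python
import cmath

def dft(samples):
    """Discrete Fourier transform: I[k] = sum_n i[n] * exp(-2j*pi*k*n/N)."""
    n_samples = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n, x in enumerate(samples))
        for k in range(n_samples)
    ]

def inverse_dft(spectrum):
    """Inverse transform, returning real-valued time samples."""
    n_samples = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * cmath.pi * k * n / n_samples)
             for k, c in enumerate(spectrum)) / n_samples).real
        for n in range(n_samples)
    ]

TS_PS = 75.0  # sampling interval used in this work
# Hypothetical triangular current spike in mA, zero-padded to N = 8.
i_samples = [0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0]
spectrum = dft(i_samples)
recovered = inverse_dft(spectrum)

bin_spacing_ghz = 1e3 / (len(i_samples) * TS_PS)  # 1 / (N * Ts)
print(f"DC term: {spectrum[0].real:.1f}, bin spacing: {bin_spacing_ghz:.3f} GHz")
print([round(x, 6) for x in recovered])
```

The DC term equals the sum of the samples, so it is proportional to the charge delivered per transition, and the inverse transform recovers the original samples.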

2 Model the current waveform for each Boolean gate at the computed toggle frequency.

• A compression factor (M) is defined to meet the target frequency of the cell under consideration.

M = target frequency / cell characterization frequency (10MHz in this work)

• The transformation preserves the base of the current transients. This would not have been possible in the time domain while scaling frequency, hence the need for the frequency domain transformation. Appendix E shows the waveforms generated after transformation from the 1 MHz waveform. As can be seen, the 1GHz waveform is not as expected. This is not an issue since, apart from clock cells, cells are not expected to switch at a 1 GHz average toggle frequency. Besides, this can be handled by characterizing clock cells at a higher frequency.

• The current data is compressed by the compression factor.

• When the data was transformed to the frequency domain and the spectrum was examined, the notable point was a sizeable set of lower frequency components, signifying the approximate triangles of the SPICE waveform, while most of the medium to high frequency components were zero, signifying the zero or low-leakage portion of the power waveform.

3 Attach the current waveform at the PG node where this cell’s power or ground pin is connected.

4 Compute the total simulation time

• If all instances in the design are applied with their respective waveforms, the matrix solver gives the peak voltage drop value from 0 to LCM (period of all gates).

• Computing the least common multiple (LCM) is computationally intensive for most designs. Even if it is computed, the resulting simulation time is prohibitively high, and so is the memory requirement.

• In practice we use a smaller number to keep the simulation time low and the data more realistic. We computed the simulation time as below.

Tstop = f(minimum toggle frequency, max delay)

= Time period of the minimum frequency cell + maximum delay of all cell outputs

= 2000 ns (for a minimum frequency of 1 MHz and 1000 ns as the worst delay)

5 Establishing temporal relationship

77

Timing analysis is performed and, based on the input arrival times, the current waveforms are shifted along the time axis. The purpose of timing analysis is to establish the temporal correlation between the nodes of the design: even when two or more nodes have the same toggle frequency, their instances should not all switch simultaneously unless the timing requires it. In this work, we have chosen to work with toggle frequency and delay instead of timing windows [28][45], for the following reasons.

• Not all circuit nodes switch in every clock cycle. Average activity computation establishes the relative amount of switching among the nodes. This is possible because activity estimation techniques take circuit functionality into account. The average switching activity of most nodes is commonly taken to be 20% of the controlling clock frequency. In certain solutions, the average switching activity of non-clock signals is assumed to be only 10%.

• The timing window method uses classical path sensitization to identify the interval of switching. Because STA inherently assumes that all activity on a path finishes within one clock period (unless a multi-cycle path is specified explicitly), the timing intervals of all nodes lie within a single clock period. This makes the whole pseudo-dynamic simulation approach pessimistic (see results).

• During timing analysis, we collected two sets of data: first, the sensitization edge of the node, i.e. whether the node is rising or falling at that time, and second, the delay of the node from its reference node.

Definition: Reference nodes are those nodes that can be considered zero-delay nodes. All flip-flop outputs are treated as reference nodes in our analysis. When the input clock to a flip-flop has some propagation delay associated with it, the reference node carries that delay.

Any node toggling at a frequency higher than 1 MHz will have some repetition in its current signature. For example, a node switching at 50 MHz (20 ns period) will have 50 repetitions of its current signature in a 1000 ns simulation.

By changing the minimum frequency, we can change the simulation time considerably. For example, by raising the minimum frequency to 50 MHz, all current sources toggling below 50 MHz either do not contribute to the dynamic V-drop analysis or contribute only an average current, and the simulation time can drop to only 20 ns. In all our analysis we have assumed 1 MHz as the minimum frequency.

The number of points in the piecewise linear current waveform depends on the sampling resolution applied as the first step after reading the characterized data. Increasing or decreasing the sampling frequency trades accuracy against runtime. In our analysis, we have assumed a sampling interval of 75 ps.

The clock network toggles all the time. In addition, many designs aim for small insertion delays and near-zero skew. This makes the clock network one of the largest contributors to both total current and peak current.

4.4 Complete Flow

Cell characterization and PG network modeling are shown in Figure 4.11. We take the Verilog netlist as input and calculate the average toggle frequency of each circuit node using a simulation-less approach. The frequency constraints are user-supplied conditions that drive the frequency calculation of any node. Alternatively, frequency constraints can be generated from logic simulation or functional patterns. The SDC file contains the timing constraints of the design and is used in toggle activity calculation as well as timing analysis. The timing information consists of the maximum delay over the paths converging at each node and the sensitization edge along that path. Current signatures for each block (library macros as well as hierarchical blocks) are generated from the current models, timing information and activity estimation. This document explains each of the processing steps (toggle calculation, timing measurement, current signature generation and block modeling) in detail. Once the current signatures are hooked to the parasitic PG network, a transient simulation is performed to measure the V-drop at each macro node and to generate the dynamic transient current waveform at the power-ground pins. The V-drop data is then fed to the timing analysis engine to analyze the impact of V-drop on timing.

[Figure 4.11: flow diagram with blocks Netlist, Frequency Constraints, SDC; Current Char, PWL Generator, Timing Analysis; RLC netlist with current sources; SPICE Simulation; Peak Dynamic Power/Supply Noise]

Figure 4.11 Peak IR drop Computation Flow


The next sections explain the Timing Information Generation, Power Grid Generator and SPICE simulation steps in detail.

4.4.1 Timing Information Generation

Timing information was generated using PrimeTime. PrimeTime requires the Verilog netlist, SDC and SPEF (Standard Parasitic Exchange Format) files as input. We also wrote a Tcl script (PrimeTime supports the Tcl command language) to extract arrival time information for all nodes of the circuit. The PrimeTime flow is shown in Figure 4.12 below. The sample SDC file [24][25] and SPEF file used are shown in Appendix A and B.

[Figure 4.12: flow diagram with inputs SDC File, Verilog Netlist, SPEF; PrimeTime Arrival Time Computation; output Timing Report]

Figure 4.12 PrimeTime flow for arrival time computation

4.4.2 Power Grid Generator

The Power Grid Generator flow is expanded further below in Figure 4.13.

[Figure 4.13: flow diagram. Cell flow: Cell Char at a fixed frequency (10 MHz in our work). Analysis flow: Perl code (processes various inputs); Toggle Frequency Calculator; Timing Report (delay information); MATLAB program (compression factor M computed, M-based compression in frequency domain); Perl code (PG mesh generation, current PWL hookup); PG Network]

Figure 4.13 Power Grid Generation Flow

A Perl program combines, for every node, the toggle frequency obtained from the Toggle Frequency Calculator (TFC) with the delay value of that node. The output file containing this information for all the cells is passed to MATLAB.

The MATLAB program takes two inputs. The first is the current data at the prototype frequency for all the gates. The second is a file containing the delay and average activity information for all the cells of the circuit whose power data is to be obtained. Depending on the activity, the prototype current data is compressed, and the compressed data is then shifted by an amount equal to the delay at that node. The same procedure is repeated for all the cells, and the resulting current data for all the cells is stored in a file.
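The compress-and-shift step can be illustrated with a small Python sketch. It is not the MATLAB program used in this work: it rescales the time axis directly instead of working in the frequency domain, and it assumes that the compression factor M is the ratio of the node toggle frequency to the prototype frequency. The function name and the (time, current) point format are also assumptions.

```python
def compress_and_shift(pwl, f_proto_mhz, f_node_mhz, delay_ns):
    """Rescale a prototype PWL current waveform to a node's toggle frequency
    and shift it by the node delay.

    pwl is a list of (time_ns, current_mA) points characterized at f_proto_mhz.
    """
    # Assumed definition: M = node frequency / prototype frequency.
    m = f_node_mhz / f_proto_mhz
    # Compress the time axis by M, then shift by the node's arrival delay.
    return [(t / m + delay_ns, i) for t, i in pwl]
```

For example, a triangular prototype pulse spanning 100 ns at 10 MHz becomes a 20 ns pulse for a 50 MHz node, starting at the node delay. Current amplitudes are left unchanged in this sketch.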


Based on the generated current signatures, a new PG network is created, and all the macro instances are then replaced with their corresponding current signatures. In our analysis, we used a PG network with a uniform power grid and an ideal ground. We did not perform any actual power routing but attached the current sources at random locations. The result is compared against a simulation with the actual SPICE circuits of all macros placed in the same PG network at the same locations.

4.4.3 SPICE Simulation

Each cell is now replaced by a current source driven by its corresponding PWL data. The package R, L and C are attached to the top-level power pins, and SPICE simulation is performed. The voltage at each node of the power mesh is recorded. The IR drop for each cell is calculated using CODAC (Characterization & Optimization of Digital & Analog Circuits), a TI internal program, which subtracts the minimum voltage observed at each node from the supply voltage to give the peak dynamic IR drop at that node. This is done for all the nodes of the circuit. The same CODAC program can be used to calculate the average dynamic IR drop at each node of the circuit.
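The post-processing described above reduces to two subtractions per node. The sketch below is only an illustration and does not reproduce CODAC, whose interface is not described here; it assumes uniformly spaced voltage samples for one node, and the function name is hypothetical.

```python
def ir_drop_metrics(vdd, node_voltages):
    """Return (peak_drop, average_drop) in volts for one power-mesh node.

    node_voltages holds the simulated node voltage at uniformly spaced
    time points (an assumption of this sketch).
    """
    peak_drop = vdd - min(node_voltages)
    average_drop = vdd - sum(node_voltages) / len(node_voltages)
    return peak_drop, average_drop
```

Dividing either value by the supply voltage gives a percentage drop of the kind reported in the results tables.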


In this work, we have made the following simplifications:

• The power grid is modeled as an n × m mesh. The resistance of each arm of the mesh is derived from the Ohm/um figure of the metal. We also assumed two such arms in parallel to account for the multi-layer chip scenario.

• A matrix solver was not developed as part of this work. Instead, we used available SPICE simulators.
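A minimal sketch of how such a mesh could be emitted as SPICE resistor lines is given below. The node naming scheme (n<row>_<col>), the function names and the example numbers are assumptions for illustration; the Perl mesh generator used in this work is not reproduced here.

```python
def arm_resistance(ohm_per_um, arm_length_um, parallel_arms=2):
    """Resistance of one mesh arm, with identical arms assumed in parallel."""
    return ohm_per_um * arm_length_um / parallel_arms


def mesh_resistors(n_rows, n_cols, r_arm):
    """Return SPICE resistor lines for a uniform n_rows x n_cols mesh."""
    lines = []
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:  # horizontal arm to the right neighbour
                lines.append(f"R{len(lines)} n{r}_{c} n{r}_{c + 1} {r_arm:g}")
            if r + 1 < n_rows:  # vertical arm to the neighbour below
                lines.append(f"R{len(lines)} n{r}_{c} n{r + 1}_{c} {r_arm:g}")
    return lines
```

An n × m mesh built this way contains n(m - 1) horizontal and (n - 1)m vertical arms, so the netlist size grows linearly with the number of mesh nodes.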


We executed the flow as explained in the previous section. Instead of 1 MHz, we used 10 MHz for characterization in order to reduce the amount of data. The cell data was still sampled at 13.33 GHz.

4.5.1 Peak Power Results

Three small circuits were studied to stabilize the above approach:

• TWOAND: two AND gates connected one after the other.

• ANDOR: one AND gate followed by one OR gate.

• 2AND-1OR: two AND gates at the first level, whose outputs drive an OR gate that produces the final output.

The peak power for these three circuits was obtained both with the approach described in this report and with SPICE simulation. The results of the average switching activity approach and of SPICE for input frequencies of 100 MHz and 500 MHz are given in Table 4.1.

Peak Power (Watts)

Frequency   TWOAND                      AND-OR                      2AND-1OR
            SPICE        Our Approach   SPICE        Our Approach   SPICE        Our Approach
100 MHz     0.0016817    0.0016         0.0009409    0.0008421      0.0019253    0.0019
500 MHz     0.00168113   0.0016         0.0009410    0.00086539     0.00192531   0.0018

Table 4.1 Comparison of Peak power Dissipation

4.5.2 Peak Dynamic IR Drop Results

To determine the peak dynamic IR drop, three circuits were used initially.

• 100 Inverter Chain: a chain of 100 inverters in which the output of each inverter drives the input of the next. The delay of the chain is longer than the clock period of operation.

• 32 Bit Shift Register: a 32-bit serial/parallel shift register. Depending on the input and the selection criteria, the input is shifted in a serial or parallel manner.

• 16 Bit Adder: a 16-bit binary adder. ‘Carry Forward’ logic is used for addition.

The following points were taken into account while generating the netlists for these circuits.

• Package RLC is added to each power pad.

• An ideal voltage source is attached to each power pad.

• A uniform mesh structure is used and all leaf cells are placed randomly on it.

• A reduced interconnect network, obtained by driving point admittance estimation, was used for power as well as signal lines.

• No existing decoupling capacitors were estimated.

The peak dynamic IR drop was obtained using the Average Activity approach, the Timing Window approach and SPICE simulation. The results are shown in Table 4.2.

Circuit                 %Drop in           %Drop in Timing    SPICE
                        average activity   Window Approach    %Drop
100 Inverter Chain      1.65               6                  1
32 Bit Shift Register   17.5               40                 12
16 Bit Adder            31                 NA                 19.16

Table 4.2 Comparison of percentage peak instantaneous IR drop

The Average Activity method is clearly more accurate than the Timing Window method: for every circuit in Table 4.2 its result lies closer to the SPICE value. To check the performance of this approach, the Average Activity method was applied to a few industry-standard benchmark circuits. Table 4.3 below compares the maximum dynamic IR drop in each circuit obtained using average switching activity and using PowerMill. PowerMill is a SPICE-based transient analysis tool offered by Synopsys; it is now called NanoSim.

Circuit   %V-drop using avg activity   %V-drop in PowerMill   %Error
s27       4.5                          5.8                    -22.4138
s344      6.3                          6.6                    -4.54545
s349      6.2                          7.5                    -17.3333
s444      8.6                          13.3                   -35.3383
s1238     13.4                         13.3                   0.75188
s298      12.5                         15                     -16.6667

Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
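The %Error column of Table 4.3 is consistent with a relative error taken against the reference simulator value. The following Python sketch recomputes it; the function name is an assumption, and the data are copied from the table.

```python
def percent_error(estimate, reference):
    """Relative error of the estimate with respect to the reference, in percent."""
    return 100.0 * (estimate - reference) / reference


# (circuit, %V-drop with average activity, %V-drop in the reference simulator)
TABLE_4_3 = [
    ("s27", 4.5, 5.8),
    ("s344", 6.3, 6.6),
    ("s349", 6.2, 7.5),
    ("s444", 8.6, 13.3),
    ("s1238", 13.4, 13.3),
    ("s298", 12.5, 15.0),
]

ERRORS = {name: percent_error(est, ref) for name, est, ref in TABLE_4_3}
```

A negative value means that the average activity approach reports a smaller drop than the reference simulation.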

The power supply noise (PSN) waveform obtained with the average activity approach and the reference waveform from SPICE simulation of the actual logic are shown in Figure 4.14 and Figure 4.15 below.

Figure 4.14 PSN waveform of Proposed Method

Figure 4.15 PSN Reference Waveform


4.6 Summary

We proposed a novel PG network modeling technique. The approach involves average switching activity calculation, transient current characterization of the basic Boolean gates of the library, derivation of the PG network model, and transient simulation of that model using a vectorless approach. The required results are derived from this simulation. Further, our global average switching activity calculation allows the global timing impact of global voltage drop to be considered without extra runtime. This reduces the need for local maximum voltage drop analysis on timing [26]. Our approach also provides a detailed voltage drop profile across the chip or block, and based on this profile decoupling capacitance can be placed at the locations that need it. The results were validated against a dynamic fast SPICE simulator (NanoSim) and show that the average switching rate calculation gives results close to those of dynamic vector-based analysis. An additional advantage is that average switching activity also gives an accurate analysis of the average V-drop. Hence the approach we suggest gives both average and dynamic PG noise results simultaneously.

The approach is scalable to multimillion-gate designs by using the technique proposed by Blaauw et al. [55]. This work can be extended further to understand decap sensitivity, and to skew the analysis toward a particular end target, e.g. PG grid robustness or Monte Carlo based analysis for higher accuracy and coverage.


5 Power Up Analysis

One of the popular techniques to reduce leakage is to use a gated power supply [74, 79, 80]. Shekhar [74] has highlighted a technique called ‘sleep transistor’ and the challenges associated with it. This technique gates the power supply through a high-threshold transistor when the supply is not required, as shown in Figure 5.1. The ‘sleep transistor’, also known as a ‘power switch’, turns off the power supply when a portion of the chip is idle and thus saves leakage current. Apart from design challenges, the technique poses additional design analysis challenges, as listed below.

Figure 5.1 Gated Power Supply ([74])

1. When the power supply turns on from the off state, a large capacitive load is charged, causing a large current surge and hence Power Supply Noise (PSN). This noise can couple onto signal lines, causing a state change or delay change. It can also remain within the supply network and cause a large dynamic IR drop that in turn affects circuit performance. The goal is to predict the surge and control it.

2. The transistor in series with the supply acts as a large resistor in the normal mode of operation, causing additional IR drop, which in turn degrades performance. The IR drop across the transistor can be as high as 5-20 mV. The goal is to perform an average IR drop analysis to assess the impact of the switch.

3. Optimization of switches to get the best leakage improvement. The optimization has area penalty, IR drop and Power Supply Noise as cost parameters. For example, a low number of switches gives good leakage improvement but high IR drop and Power Supply Noise.

4. When the power supply goes down, all sequential logic in the virtual power domain loses its state. This puts an extra constraint on overall system behavior. There is also a technique in which the state is preserved through ‘retention flops’ [2, 81]. This technique needs extra power routing to save state, as well as control logic. The timing analysis needs to capture the mode switching.

5. Placement and routing of extra signals, special cells (such as retention flops) and the virtual power network.

6. Trade-off between leakage and the number of power switches.

7. Power routing closes immediately after floor planning, and the switches need to be placed by that time. It is therefore important to have an early power up analysis flow that computes the optimal number of switches meeting the peak current surge as well as the IR drop and leakage requirements.

Often, PSN is a non-negotiable parameter, and the design-planning goal is to identify the total number of switches that limits PSN to a user-defined level. This chapter describes an analytical method to determine the optimum number of power switches and the power up glitch. Section 5.1 elaborates on switched PG networks and the PSN problem. Section 5.2 outlines the approach to analyze such networks. Section 5.3 correlates the results we have achieved with SPICE and discusses the efficiency of the algorithm.

5.1 Switched PG Networks

Power Supply Noise is a widely acknowledged research domain in today’s high performance designs, and various analysis techniques have been proposed in the literature [26-31]. However, there is little awareness of the Power Supply Noise caused by turning on power domains when a gated power supply is used. Figure 5.2 shows the switch network for a 1M-gate design, and Figure 5.3 shows the current glitch and voltage ramp at an arbitrary switch output. Note that the current surge can persist for a considerable amount of time, affecting the performance of the blocks that are ‘on’.


Figure 5.2 Layout of 1M gate with switch network

Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output

A typical PG network with power switches can be represented as shown in Figure 5.4. Some of the characteristics of this network are [87]:

• There are two kinds of domains: a golden domain on the non-gated power supply, and multiple virtual domains on the switched power supply.

• The virtual domains are not connected to one another. Each is connected to the golden domain through a switch network.

• The switch network of a given domain consists of one or more kinds of switches.

• Switch networks are not shared across virtual power domains.

• Random logic is connected to the golden domain as well as to all virtual domains.

• Control logic can turn any one or more virtual domains on or off at any time.

• Further, a switch network consists of a parallel network, a sequential network, or a combination of both. A parallel configuration turns all switches on simultaneously, whereas a sequential configuration turns the switches on one by one, each after some delay.

[Figure 5.4: schematic showing the off-chip power supply, the non-gated power network, the switch control logic, the switch network, the virtual power network (VDD SW) and the logic networks, with a zoomed view of N switches (labels SW1, SW2, SW3 and D1) in parallel configuration and sequential configuration]

Figure 5.4 Typical PG network with Power Switches

When the power supply is ‘off’ and the virtual network is disconnected, the only current that flows is leakage current. Leakage improves when the leakage current of the virtual logic is significantly higher than the leakage of the switch network. When the switches are turned on, i.e. when the power supply is connected to the virtual power network, the loads in the virtual power network start charging. These loads include interconnect capacitances, gate capacitances and the circuit diffusion/diode capacitances. The current drawn by these capacitances depends on the ability of the switch network to supply charge in a given time. Because the virtual power domain demands current quickly, L*di/dt noise is injected into the circuit and can affect the normal functioning of the golden power domain. Note that although the load is predominantly capacitive, the peak current is still limited by the saturation current of the switches, which produces the current profile seen in Figure 5.3.

5.2 Switch Network Analysis

Switch Network Analysis (SNA) early in design planning covers the choice of switch network topology, the identification of the switches to be used, the total system timing for turning power domains on and off, and the total power supply noise contributed by a switch network. A sequential configuration allows the delay to be configured so that the peak current at any point in time meets the system noise specification; this creates a trade-off between the total time the system requires to turn the virtual network on or off and the noise criteria. This information should be passed to the placement and routing tools for physical design. Further, the noise contribution of the switch network comes from the maximum current surge it causes, and the optimization variables are the total number of switches of each type in the network and the delay.

The following assumptions are made to keep the analysis simple; the solution can be extended to relax them.

• The delay between any two consecutive switches is the same.

• Two types of switches exist in the network.

• The voltage at every node in the virtual power network has the same value at any time instant during power ON, provided there is zero static IR drop.

• The switch network is sequential. A parallel configuration is essentially one BIG switch, with all its transistors lumped into the characteristic of a single MOS device.

The high-level flow of the analysis is shown in the block diagram of Figure 5.5.

[Figure 5.5: block diagram with blocks Switch IV Characterization; Current prediction that charges capacitive load; Determination of required parameters]

Figure 5.5 Schematic Switch network Analysis Flow

5.2.1 Switch Characterization

Switch IV characterization captures the current sourced through the switch for different voltages between the golden and virtual power ports of the switch. This is achieved using transient SPICE simulation of the switch. The data is stored in value-pair (voltage, current) format for further processing.

Switch characterization also involves measurement of the switch ON resistance. This is the resistance the switches offer during normal operation, i.e. when the switches are turned ON and the virtual power network is connected to the golden power network. It is measured by placing a 10 mV source across the switch and measuring the current. This resistance value is later used for average IR drop analysis across the switch.

Note that the first characterization, the IV characterization, is also a resistance characterization. Because this resistance varies with the voltage across the switch, it is also called non-linear resistance characterization.

5.2.2 Current or Switch Prediction

Current prediction is based on a simplified extracted model of the block under consideration, shown in Figure 5.6. The switch network is modeled with its detailed connectivity and timing, whereas the logic connected to the virtual domain is modeled as a capacitive load. The current through the switch is predicted over infinitesimally small time intervals. The capacitor charge-voltage relation is applied as follows:

I = dq/dt, i.e. dq = I dt (1)

But dq = C dv (2)

Hence dv = I dt / C (3)

[Figure 5.6: schematic with VDD feeding the extracted switch network, whose output Vout drives the total load capacitance Cload]

Figure 5.6 Analysis model of Virtual Power Network

Equation 3 forms the basis of Algorithm 1, described in the next section. The delay between two consecutive switches is used to predict the charge supplied by the switches to the virtual power network domain. The IV table of the switch is used to predict the current by further dividing the delay into infinitesimally small time intervals, as shown in Figure 5.7. From the initial voltage and the charge supplied, the voltage at the instant when the next switch starts turning on is derived. This process continues until either all switches are turned on or the specified voltage level is reached. If all the switches are turned on but the voltage is still lower than the ideal voltage (golden VDD), the same method continues in order to predict the maximum current surge. The predicted number of switches is then used to predict the static IR drop across the switch network, as explained in Algorithm 2. This is another important parameter, but it will not be discussed further in this chapter.

Figure 5.7 Infinitesimal Time Division for Current Prediction

Parameters that can be analyzed through this setup include:

• The total number of switches required to reach a required voltage value.

• Alternatively, the voltage value that can be reached with a given number of switches.

• The maximum current surge that will occur for a given number of switches.

• The impact of the delay between consecutive switches as they turn on.

• The IR drop across the switch network.

5.2.2.1 Algorithm for Power Switch Network Analysis:

Initialize the load voltage to zero and the charging current to zero.

{

For each infinitesimally small time period, predict the current from the IV table of the switch type, based on the voltage at the lumped load.

Compute the actual current from the number of switches turned on at that instant of time.

Track the current at VDD: if the new current is greater than the recorded maximum surge current, assign the new current to the maximum surge current.

Calculate the rise in voltage over the infinitesimally small time period using equation (3).

Continue until either all the switches are turned on or the desired voltage level is reached.

}

Print the maximum surge current and the voltage level reached after turning on the specific number of switches required by the user.
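The steps above can be sketched in Python as follows. This is a simplified illustration rather than the implementation used in this work: it assumes a single switch type, a lumped load capacitance, a uniform delay between consecutive switches, an IV table given as sorted (voltage across switch, current) pairs, and it keeps integrating after all switches are on until the target voltage is reached. All function and parameter names are hypothetical.

```python
from bisect import bisect_right


def interp_current(iv_table, v_across):
    """Linearly interpolate the switch current, clamping at the table ends."""
    volts = [v for v, _ in iv_table]
    if v_across <= volts[0]:
        return iv_table[0][1]
    if v_across >= volts[-1]:
        return iv_table[-1][1]
    k = bisect_right(volts, v_across)
    (v0, i0), (v1, i1) = iv_table[k - 1], iv_table[k]
    return i0 + (i1 - i0) * (v_across - v0) / (v1 - v0)


def power_up(iv_table, vdd, c_load, n_switches, delay, steps_per_delay,
             v_target, max_steps=1_000_000):
    """Forward-Euler integration of dv = I * dt / C for a sequential network."""
    dt = delay / steps_per_delay
    v_load = 0.0          # load voltage starts at zero
    peak_current = 0.0    # maximum surge current seen so far
    peak_switch = 0       # number of switches on when the peak occurred
    n_on = 0
    for step in range(max_steps):
        if v_load >= v_target:
            break
        # One more switch turns on after every `delay` interval.
        n_on = min(n_switches, step // steps_per_delay + 1)
        i_total = n_on * interp_current(iv_table, vdd - v_load)
        if i_total > peak_current:
            peak_current, peak_switch = i_total, n_on
        v_load += i_total * dt / c_load  # equation (3)
    return {"switches_on": n_on, "v_load": v_load,
            "peak_current": peak_current, "peak_switch": peak_switch}
```

Calling `power_up` with a characterized IV table, the extracted load capacitance and a target voltage returns the number of switches that were on when the target was reached, together with the peak current and the switch count at which it occurred. The time step is set through `steps_per_delay`, so accuracy can be traded against runtime.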

The above algorithm was developed for the case where the delay between any two consecutive switches in the sequential switch network is the same. However, it can be extended to scenarios with differing delays. In that case, timing information from Static Timing Analysis or simulation is needed.

5.2.2.2 Algorithm for Static IR drop analysis across power switches:

{

Read the switch characterization data; for static IR drop, read the ON channel resistance (RON).

Determine the total number of switches (N) required to reach the desired voltage level, which is specified by the user, using the “Algorithm for Power Switch Network Analysis”.

The effective resistance of the N switches predicted above is RON/N.

Compute the average power consumption (Pavg) of the switched, i.e. virtual, power network using any of the methods described in this work (methods outside this work can also be used).

Compute the average current consumption of the virtual power network: Iavg = Pavg/VDD.

The static IR drop across the switch network is Iavg*RON/N.

}
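The static IR drop calculation is short enough to be written out directly. The Python sketch below assumes SI units and identical switches in parallel; the function names and the example numbers in the usage note are illustrative assumptions, not measured data.

```python
def on_resistance(v_test, i_measured):
    """Switch ON resistance from the small test voltage placed across it."""
    return v_test / i_measured


def static_ir_drop(r_on, n_switches, p_avg, vdd):
    """Static IR drop (V) across N identical parallel switches."""
    r_effective = r_on / n_switches   # RON / N
    i_avg = p_avg / vdd               # Iavg = Pavg / VDD
    return i_avg * r_effective
```

For instance, with a hypothetical RON of 100 ohm, 1000 switches, an average power of 0.12 W and a 1.2 V supply, the effective resistance is 0.1 ohm and the drop is about 10 mV.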

5.3 Results and Analysis

The traditional approach to studying the above would be a full-fledged SPICE simulation that includes the virtual power network and the switch network, with each switch turned on after some delay. Note that this involves thousands of switches in the switch network and a million or more gates in the virtual network. Such a simulation would take weeks even with the fast SPICE simulators available on the market, and it is possible only very late in the design cycle.

Alternatively, we can reduce the virtual power network by modeling the interconnect load and gate capacitance as a large distributed capacitance, and the on-channel transistor resistance as an effective resistance in series with each distributed C. This reduces the number of active elements, and the reduced power network can be simulated using SPICE (Figure 5.8). This approach improves simulation time by orders of magnitude, but the run time is still days. It can be applied during design planning or after detailed design is complete.

Figure 5.8 Reduced Switch Network for validation

The technique presented in the last section is static in nature; it reduces the runtime to a few minutes and correlates very well with the techniques described above. The algorithms were evaluated with switches designed in TI’s 90 nm node. All the results below are for a block of 1M equivalent gates. 1M gates could not be simulated in SPICE together with the switches, so the simplified model described in the previous paragraph was employed to obtain SPICE-accuracy data while keeping the switch network intact. We employed a switch network with two kinds of switches for this analysis [87]. The first kind of switch took the virtual domain up to a specific voltage level, and the second kind, with higher capacity, was then turned on sequentially to measure the current surge.

Table 5.1 shows the predicted number of switches for a given voltage. When the number of switches is large, the algorithm agrees with SPICE-based simulation to within 1%, whereas when the number of switches is small the inaccuracy is within 10%. In other words, the predicted number is within 1-10% of the actual number. This table also shows the predicted current surge and the number of the switch whose turn-on causes the maximum peak. Essentially, along with the surge, we predict the switch at which the maximum surge occurs. This helps to further optimize the second type of switch network. Table 5.2 shows the voltage predicted for a given number of switches.

The advantage of the whole solution comes from the very large run time improvement, which enables early analysis and trade-offs in the design (Table 5.3). The runtime gain clearly outweighs the small inaccuracy in switch or voltage prediction. Note that the runtime does not include the switch IV characterization time, since that is a one-time effort. In static analysis, we can quickly dump much more information as needed to understand particular behavior for trade-off analysis. We can also predict the time-domain behavior of voltage and current using the approach described in this work. Figure 5.9 compares the predicted voltage over time with that at a few arbitrary nodes simulated in SPICE. Figure 5.10 compares the predicted current over time with the current measured at VDD. The match is good considering that the analysis is targeted at early trade-off analysis.

Vdesired (mV)   Actual       Switches by   Current      Current Surge
                #Switches    Algorithm     Surge (mA)   after #switches
20              380          403           950          123
69              760          771           881          114
271             1560         1554          749          100
583             2340         2328          467          97
869             2964         2971          266          81
1170            4368         4308          24           43

Table 5.1 Switch Prediction by proposed algorithm

# Switches   Simulated      Voltage by       Surge          Surge Current    %Error in
             Voltage (mV)   Algorithm (mV)   Current (mA)   after switch #   voltages
780          63             70.54            892            101              11
1560         280            273.53           784            94               -0.2
2340         587            589.26           546            78               0.38
3120         926            927.7            263            64               0.18

Table 5.2 Voltage Prediction

No. of switches   Simulation Time (in days)   Algorithm Runtime (in minutes)
780               ~1.5                        < 1
1560              ~4                          < 1
2340              ~5                          < 1
2940              ~6                          < 1

Table 5.3 Power Up analysis - Runtime Comparison

[Figure 5.9: plot of Voltage in mV (axis from 0 to 1400) versus Time; curves: Predicted, SPICE@node1, SPICE@node2]

Figure 5.9 Voltage Ramp up over Time for various nodes

[Figure 5.10: plot of Current in mA (axis from 0 to 1000) versus Time; curves: Predicted, SPICE]

Figure 5.10 Current comparison over time


5.4 Summary

There are various techniques to reduce the leakage power of a design; the ‘gated power supply’, also called ‘sleep transistor’ or ‘switched power network’, is one of the efficient ones. The analysis techniques described in this work provide quick data for architecture-level decisions when the ‘switched network’ technique is used. The runtime is a few seconds, so the design team can run many iterations to arrive at the optimum number of switches. The analytical method for calculating the total number of switches is fast because it involves only a one-time SPICE simulation (the IV characterization of the switch); the rest of the analysis is static. Using the same method, we have also analyzed the ‘power on glitch’ of the design, which contributes to Power Supply Noise during power up. All the results closely match SPICE simulation.


6 Conclusion

6.1 Summary

This thesis discussed the power grid analysis challenges faced by CMOS technology. For a robust power grid, designs need to go through the following analyses:

• Accurate Power Estimation

• Instantaneous IR drop analysis and decap planning

• Power Up analysis for designs using MTCMOS for leakage reduction

The key results of this work can be summarized as follows:

1. Successfully implemented a hierarchical probabilistic toggle computation approach that is applicable to multi-million gate designs while maintaining the desired accuracy.

2. Discussed power dissipation in cell-based CMOS design. A flow is proposed for power estimation at various design stages that can improve the accuracy of estimation. The flow also helps the user make runtime versus accuracy trade-offs.

3. Proposed a cell characterization methodology for instantaneous IR drop analysis as well as Power Up analysis for MTCMOS.

4. Discussed a prototype flow developed for instantaneous IR drop estimation based on the average toggle rate computed by the toggle methodology proposed in this work. This flow estimates instantaneous as well as average IR drop numbers in the same simulation.

5. Proposed Power Up analysis for MTCMOS-based digital designs. The methodology is validated using a prototype flow and gives a very large runtime improvement compared to SPICE. The methodology also helps in MTCMOS gate optimization.

6.2 Scope of Future Work

The analysis approaches proposed in this work help in robust power grid analysis. Several extensions are possible.

First, the power estimation proposed in this work relies on a gate-level netlist. RTL-level power estimation would help the block designer make power tradeoffs early in the design, such as MTCMOS or multi-Vt usage as proposed in [17].

Second, it is possible to improve the correlation between pre-layout and post-layout power numbers. One reason they differ is the clock tree expansion and buffer insertion performed during placement and routing to meet timing constraints. Early estimation techniques can be developed to estimate the additional cell count and so better correlate power numbers across design stages.

Third, the amount of characterization data stored for each cell is very large, and a typical ASIC technology contains 2000-4000 cells. The data can be reduced if we store only the current signatures during transitions and use them to model the current source in block-level analysis. This would also eliminate the need for the frequency domain transform performed here. Techniques used in some commercial tools, in conjunction with the analysis approach presented in this work, can help with data reduction.

Fourth, we have not gone into the details of decoupling capacitance for instantaneous IR drop analysis in this work. The work can be extended to study the various decoupling capacitors in detail: intrinsic ones due to NWELL, non-switching gates and RAMs, as well as intentional ones distributed by the user. Decoupling capacitor estimation, characterization and what-if impact analysis on instantaneous IR drop are important areas for further research.

Fifth, the MTCMOS analysis approach proposed in this work is useful early in design planning for making efficient tradeoffs between the number of MTCMOS switches and the noise tolerance levels in the design. In this work, we have modeled the switched power network as a lumped capacitance. This does not model the time domain behavior of the PG network due to PG resistance. A more accurate approach can be developed that models the PG network as a distributed RC once placement and power routing are done. We believe this would give a quick and accurate analysis of the actual network compared to SPICE-like simulations.
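To make the lumped-capacitance simplification concrete, the sketch below treats the switched network as a single capacitor charged through N identical parallel switches. Modeling each switch as a fixed on-resistance is an assumption made here for illustration (the real switch IV characteristic is nonlinear), and the function names and values are hypothetical. A distributed RC model would replace the single time constant with one response per grid node.

```python
import math


def effective_resistance(r_on, n_switches):
    """Parallel combination of n identical switches, each with resistance r_on."""
    return r_on / n_switches


def rail_voltage(t, vdd, r_on, n_switches, c_lumped):
    """Virtual rail voltage at time t for a lumped RC charged from 0 V."""
    tau = effective_resistance(r_on, n_switches) * c_lumped
    return vdd * (1.0 - math.exp(-t / tau))


def peak_rush_current(vdd, r_on, n_switches):
    """Rush current at t = 0, when the full supply sits across the switches."""
    return vdd / effective_resistance(r_on, n_switches)


def time_to_fraction(fraction, r_on, n_switches, c_lumped):
    """Time for the rail to reach the given fraction (0..1) of the supply."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    tau = effective_resistance(r_on, n_switches) * c_lumped
    return -tau * math.log(1.0 - fraction)


# Assumed values: 1.2 V supply, 100 ohm per switch, 10 switches, 1 nF lumped load.
i_peak = peak_rush_current(1.2, 100.0, 10)
t_95 = time_to_fraction(0.95, 100.0, 10, 1e-9)
```

In this simplified model, doubling the number of switches halves the wake-up time and doubles the peak rush current, which is the tradeoff between switch count and noise tolerance described above.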


7 References

1. Semiconductor Industry Assoc., International Technology Roadmap for Semiconductors, 2003 Update -

http://public.itrs.net/Files/2003ITRS/Home2003.htm

2. Nam Sung Kim, David Blaauw et al, “Leakage Current: Moore’s Law Meets Static Power”, IEEE Computer, Dec 2003.

3. The SPICE Home Page, http://bwrc.eecs.berkeley.edu/Classes/IcBook/SPICE/

4. Rabe, D.; Jochens, G.; Kruse, L.; Nebel, W., “Power-simulation of cell based ASICs: accuracy- and performance trade-offs”, Proceedings of Design automation and test in Europe, Feb 1998

5. F. Najm, “A survey of power estimation techniques in VLSI circuits,” IEEE Trans. VLSI System., vol. 2, pp. 446–455, Dec. 1994.

6. C. Y. Tsui, M. Pedram, and A. Despain, “Efficient estimation of dynamic power dissipation under a real delay model,” in Proc. IEEE Int.

Conf. Computer-Aided Design, 1993, pp. 224–228

7. B. J. George et al., “Power analysis and characterization for semi custom design,” in Proc. Int. Workshop Low Power Design, 1994, pp.

215–218.

8. J.-Y. Lin et al., “A cell-based power estimation in CMOS combinational circuits,” in Proc. IEEE Int. Conf. Computer-Aided Design,

1994, pp. 304–309.

9. H. Sarin and A. McNelly, “A power modeling and characterization method for logic simulation,” in Proc. IEEE Custom Integrated

Circuits Conf., 1995, pp. 363–366.

10. Synopsys’ Design Power, (http://www.synopsys.com/products/power/power.html)

11. N. Weste and K. Eshraghian, “Principles of CMOS VLSI Design”, VLSI Systems Series, Addison-Wesley, 1985.

12. Najm, F.N, “Transition Density, a stochastic measure of Activity in Digital Circuits”, DAC, pp. 644-649, June 1991.

13. Ghosh, A.; Devadas, S.; Keutzer, K.; White, J, “Estimation of average switching activity in combinational and sequential circuits”, DAC,

pp. 253-259, June 1992

14. S. Bhanja, N. Ranganathan, “Dependency Preserving Probabilistic Modeling of Switching Activity using Bayesian Networks”, 38th

Design Automation Conference, pp. 209-214, 2001.

15. HUGIN API reference manual. Version 5.3. http://www.hugin.com

16. David Heckerman, “A tutorial on learning with Bayesian Networks”, ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf, March 1995.

17. Agarwal, A.; Mukhopadhyay, S.; Raychowdhury, A.; Roy, K.; Kim, C.H, “Leakage power analysis and reduction in nanoscale circuits”,

IEEE Micro, Volume 26, Issue 2, pp. 68-80, March 2006.

18. Keshavarzi, A.; Tschanz, J.W.; Narendra, S.; De, V.; Daasch, W.R.; Roy, K.; Sachdev, M; Hawkins, C.F, “Leakage and process variation

effects in current testing on future CMOS circuits”, IEEE Design & Test of Computers, Volume 9, Issue 5, pp. 36-43, Sept 2002.

19. Dresig, F. Lanches, P. Rettig, O., et al, “Simulation and reduction of CMOS power dissipation at logic level”, Design Automation,

1993, with the European Event in ASIC Design. Proceedings, pp. 341-246, Feb 1993.

20. An-Chang Deng; Yan-Chyuan Shiau; Loh, K.-H, “Time domain current waveform simulation of CMOS circuits”, IEEE international conference on Computer aided design 1988, pp. 208-211, Nov 1988.


21. F.N. Najm, R.Burch, P. Yang, and I.N. Hajj. “Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits”. IEEE

Transactions on CAD, 9(4):439-450, April 1990.

22. Randal L. Schwartz, Tom Phoenix and brian d foy, “Learning Perl”, 4th Edition, O’Reilly & Associates, ISBN 0596101058

23. Matlab Tutorial, http://www.math.ufl.edu/help/matlab-tutorial/

24. Synopsys, Inc, “Using the Synopsys® Design Constraints Format”, Application Note, Sept 2005.

25. Himanshu Bhatnagar, “Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and Primetime”, 2nd

Edition, Kluwer Academic Publishers, ISBN: 0792376447.

26. Martin Saint-Laurent, Swaminathan, "Impact of Power Supply Noise on Timing In High Frequency Microprocessors", IEEE Trans on

Advanced Packaging, pp. 135-144, Feb 2004

27. Kriplani, H.; Najm, F.; Hajj, I, “Improved Delay and Current Models for Estimating Maximum Currents in CMOS VLSI Circuits”,

ISCAS 94, pp. 435-438, June 1994.

28. Kriplani, H.; Najm, F.N.; Hajj, I.N, “Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI Circuits: Algorithms, Signal Correlations, and Their Resolution”, IEEE Trans on CAD of Integrated Circuits and Systems, pp. 998-1012, Aug 1995.

29. Hsiao, M.S.; Rudnick, E.M.; Patel, J.H., “Peak Power Estimation of VLSI Circuits: New Peak Power Measures”, IEEE Trans on VLSI

Systems, pp. 435-439, Aug 2000

30. Qing Wu; Qinru Qiu; Pedram, M, “Estimation of Peak Power Dissipation in VLSI Circuits Using the Limiting Distributions of Extreme

Order Statistics”, IEEE Trans on CAD of integrated Circuits and Systems, pp. 942-956, Aug 2001.

31. Bogliolo, A.; Benini, L.; De Micheli, G.; Riccò, B., “Gate-level power and current simulation of CMOS integrated circuits”, IEEE Trans on Very Large Scale Integration (VLSI) Systems, pp. 473-488, Dec 1997

32. Anantha Chandrakasan’s Home Page: http://www-mtl.mit.edu/~anantha/publications.html,

http://www.fetchbook.info/search_Anantha_Chandrakasan/searchBy_Author.html

33. FFT Tutorial, http://www.ele.uri.edu/~hansenj/projects/ele436/fft.pdf

34. Jeff Tranter and Paul Raines, “Tcl/Tk in Nutshell”, O’Reilly Associates, ISBN 1565924339

35. Alan V. Oppenheim, Ronald W. Schafer, John R. Buck, “Discrete Time Signal Processing“, 2nd Edition, Prentice Hall, ISBN 0137549202

36. Chen, H.H.; Ling, D.D, “Power Supply Analysis Methodology for Deep-Submicron VLSI Chip Design”, DAC, pp. 638-643, June 1997.

37. Yi-Shing Chang; Gupta, S.K.; Breuer, M.A, “Analysis of Ground Bounce in Deep-Submicron Circuits”, VLSI Test Symposium, pp. 110-116, May 1997

38. Yi-Min Jiang; Kwang-Ting Cheng; An-Chang Deng, “Estimation of Maximum Power Supply Noise for Deep Sub-Micron Designs”,

International sym on low power electronics and design, pp. 233-238, Aug 1998.

39. Zhao, S.; Roy, K.; Koh, C.-K, “Estimation of Inductive and Resistive Switching Noise on Power Supply Network in Deep Sub-Micron

CMOS Circuits”, International conference on Computer Design, pp. 65-72, Sept 2000.

40. S. Bobba, I.N. Hajj, “Maximum voltage variation in the power distribution network of VLSI circuits with RLC Models,” Proc of ISLPED, Aug 2001


41. Bai, G.; Bobba, S.; Hajj, I.N, "Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay in VLSI Circuits", DAC, pp. 295-300, 2001.

42. G. Steele, et al., “Full-Chip Verification Methods for DSM Power Distribution Systems,” Proc. Of DAC, pp. 744-749, 1998

43. R. Chaudhry, D. Blaauw, R. Panda and T. Edwards, “Current Signature Compression For IR-Drop Analysis,” Proc. Design Automation

Conference, pp. 162-167, 2000

44. S. Bobba and I. N. Hajj, “Estimation of maximum current envelope for power bus analysis and design,” Proc. of ISPD, pp 141-146, Apr

1998

45. Rishi Bhooshan (TI) et al., “A Unique Method For Dynamic Voltage Drop Analysis and Decoupling Capacitance Estimation”, VDAT 2003

46. Cirit, M.A., “Characterizing a VLSI standard cell library”, Digital Object Identifier 10.1109/CICC, pp.25.7.2-25.7.4, May 1991

47. Debnath, S.P.; Sukumar, J.; Udaykumar, H, “A methodology for fast vector based power supply and substrate noise analyses”,

International conference on VLSI Design, pp. 808-811, Jan 2005.

48. Dalal, A.; Lev, L.; Mitra, S.; “Design of an efficient power distribution network for the UltraSPARC-I microprocessor”, IEEE conference

on Computer Design: VLSI in computers and processors, pp. 118-123, Oct 1995

49. Chen, H.H.; Schuster, S.E., “On-chip decoupling capacitor optimization for high-performance VLSI design”, VLSI Technology, Systems and Applications, pp. 99-103, June 1995.

50. Larsson, P, “Power supply noise in future IC's: a crystal ball reading”, Custom Integrated Circuits, pp. 467-474, May 1999.

51. Sotman, M.; Popovich, M.; Kolodny, A.; Friedman, E, “Leveraging symbiotic on-die decoupling capacitance”, Electrical Performance of

Electronic Packaging, pp. 111-114, Oct 2005

52. Larsson, P, “Resonance and damping in CMOS circuits with on-chip decoupling capacitance”, IEEE Transactions on Circuits and

Systems-I, vol 45, pp. 849-858, Aug 1998

53. Larsson, P, “Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitance,” IEEE Journal of Solid State Circuits,

vol 32, pp 574-576, Apr 1997

54. Chaudhry, R.; Panda, R.; Edwards, T.; Blaauw, D, “Design and analysis of power distribution networks with accurate RLC models”,

International conference on VLSI Design, pp. 151-155, Jan 2000

55. Min Zhao; Panda, R.V.; Sapatnekar, S.S.; Edwards, T.; Chaudhry, R.; Blaauw, D, “Hierarchical analysis of power distribution networks”,

DAC, pp. 150-155, June 2000

56. IBM Methodology for Power Supply Noise - http://www.research.ibm.com/da/nova.html

57. R. Heald et. al, “Implementation of a 3rd Generation Sparc V9 64b Microprocessor”, Proc IEEE ISSCC, pp. 412-413, 2000

58. Yi-Min Jiang; Kwang-Ting Cheng, “Analysis of Performance Impact Caused by Power Supply Noise in Deep Submicron Devices”, DAC, June 1999

59. Apache Design Solutions, “Reshaping Nanometer Flows with Physical Power Integrity”, http://www.apache-da.com, White Paper, May

2003.

60. Anthony Ralston, Philip Rabinowitz, “A First course in Numerical Analysis”, 2nd Edition, Dover Publications, ISBN 048641454X.

61. Kalpesh Shah, “SNUG 2006 Panel Discussion”


62. H. Mehta, R.M.Owens, M.J.Irwin, “Energy Characterization Based on Clustering,” 33rd Design Automation Conference, June 1996.

63. D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework for Architectural-Level Power Analysis and Optimizations,” Proc of

International Symposium on Computer Architecture, pp. 83-94, June 2000

64. V. Tiwari, S. Malik, and A. Wolfe, “Power Analysis of Embedded Software: A First Step toward software power minimization,” IEEE Trans VLSI Systems, vol 2, no. 4, pp 437-445, 1994

65. E. Macii, M. Pedram and F. Somenzi, “High Level Power Modeling and Estimation,” IEEE Transactions on Computer Aided Design of

Integrated Circuits and Systems, vol 17, November 1998.

66. Synopsys Prime Power - http://www.synopsys.com/products/power/primepower_ds.pdf

67. Synopsys Power Compiler - http://www.synopsys.com/products/power/power_ds.pdf

68. Synopsys Nanosim - http://www.synopsys.com/products/mixedsignal/nanosim/nanosim.html

69. Synopsys Liberty Format - http://www.synopsys.com/partners/tapin/lib_info.html

70. M. Horowitz and R. Gonzalez, “Energy dissipation in general purpose Microprocessors”, IEEE JSSC, vol 31, Sept 1996.

71. Brglez, F. Bryan, D. Kozminski, K. , “Combinational profiles of sequential benchmark circuits”, ISCAS, vol 3, pp. 1929-1934, May

1989.

72. R. Wilson and D. Lammers, “Grove Calls Leakage Chip Designers’ Top Problem,” EE Times, 13 Dec 2002;

www.eetimes.com/story/OEG20021213S0040.

73. Intel SpeedStep™ technology, http://www.intel.com

74. Y.Ye, S Borkar, V. De, “A New Technique for Standby Leakage Reduction in High-Performance Circuits,” 1998 Symposium on VLSI

Circuits, June 1998.

75. M. Powell et al., “Reducing Leakage in a High Performance Deep-Submicron Instruction Cache,” IEEE Trans. VLSI, Feb 2001, pp 77-89

76. Ali K., Charles H. et al., “ Effect of reverse body bias for low power CMOS circuits”

77. Kaushik R, Mark C.J., Dinesh S., “leakage control with efficient use of transistor stacks in single threshold CMOS”

78. Shekhar Borkar, “Low Power Design Challenges for the Decade”, 2001.

79. Kumagai, K.; Iwaki, H.; Yoshida, H.; Suzuki, H.; Yamada, T.; Kurosawa, S.; “A Novel Powering Down Scheme for low Vt CMOS

Circuits”, 1998 Symposium on , 11-13 June 1998. Pages:44 – 45

80. Mutoh, S.; Douseki, T.; Matsuya, Y.; Aoki, T.; Yamada, J., “1V high-speed digital circuit technology with 0.5μm multi-threshold CMOS”, IEEE ASIC Conference, 1993.

81. Akamatsu, H.; Iwata, T.; Yamamoto, H.; Hirata, T.; Yamauchi, H.; Kotani, H.; Matsuzawa, A.; “A low power data holding circuit with

an intermittent power supply scheme for sub-1V MT-CMOS LSIs”, VLSI Circuits, 1996. Digest of Technical Papers., 1996 Symposium

on , 13-15 June 1996 Pages:14 – 15

82. Ye, Y.; Borkar, S.; De, V. , “A new technique for standby leakage reduction in high-performance circuits”, Symposium on VLSI Circuits,

June 1998. Page(s): 40-41

83. Das, K.K.; Joshi, R.V.; Chuang, C.T.; Cook, P.W.; Brown, R.B., “New digital circuit techniques for total standby leakage reduction in

Nano-scale SOI technology”, pp. 309-312, ISSCC, Sept 2003.

84. Wenxin Wang; Anis, M.; Areibi, S, “Fast techniques for standby leakage reduction in MTCMOS circuits”, ISOCC, pp. 21-24, Sept 2004


85. Fei Li; Lei He; Saluja, K.K.; “Estimation of maximum power-up current”, DAC, pp. 51-56, Jan 2002

86. Calhoun, B.H.; Honore, F.A.; Chandrakasan, A.P, “A leakage reduction methodology for distributed MTCMOS”, JSSC, pp. 818-826,

May 2004

87. Royannez, P.; Mair, H.; Dahan, F.; Wagner, M.; Streeter, M.; Bouetel, L.; Blasquez, J.; Clasen, H.; Semino, G.; Dong, J.; Scott, D.; Pitts,

B.; Raibaut, C.; Uming Ko, “90nm Low Leakage SoC Design Techniques for Wireless Applications”, ISSCC, pp. 138-139, Feb 2005.

88. R. Heald, et al., “Implementation of a 3rd Generation SPARC V9 64b Microprocessor,” Proc. IEEE ISSCC, pp 412-413, 2000

89. P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, “High Performance Microprocessor Design,” IEEE Journal of Solid

State Circuits, vol 33, no 5, pp. 676-686, Apr 1998.

90. J. Darnauer, D. Chengson, B. Schmidt, and E. Priest, “Electrical Evaluation of Flip-Chip package Alternatives for Next Generation

Microprocessor,“ Electronic Components and Technology Conference, pp. 666-673, 1998

91. S. Borkar, “Low Power Design Challenges for the Decade,” Proc. of ISLPED, 2000

92. V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel and F. Baez, “Reducing Power in High performance Microprocessors,” Proc. of

Design Automations Conference, 1997

93. Wachnik, R.A.; Filippi, R.G.; Shaw, T.M.; Lin, P.C, “Practical benefits of the electromigration short-length effect, including a new design

rule methodology and an electromigration resistant power grid with enhanced wireability”, Sym on VLSI Technology, pp. 220-221, June

2000.

94. J. Kitchin, “Statistical Electromigration Budgeting for Reliable Design and Verification in a 300-MHz Microprocessor”, Symposium on

VLSI Circuits Digests, pp. 115-116, 1995

95. T .H. Cormen, C. E. Leiserson, R. L. Rivest “Introduction to Algorithms”, PHI

96. Chapra, S.C, Canale R P “Numerical Methods for Engineers” 3rd Ed., McGraw-Hill 1998.

97. Rabaey, “Digital Integrated Circuits Design”, Pearson Education, Second Edition, 2003


Appendix A Sample SDC file

create_clock -period <value> [get_ports clk]
set_input_delay <value> -clock clk [get_ports IN*]
set_case_analysis 0 [get_ports {*reset* *scan_mode*}]
report_timing <file name>


Appendix B Sample SPEF Format

*SPEF "IEEE 1481-1997"
*DESIGN "s27"
*DATE "Mon Dec 13 10:05:00 1999"
*VENDOR "TI"
*PROGRAM "vlog2spef"
*VERSION "1.0"
*DESIGN_FLOW "Dummy From Verilog"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER []
*T_UNIT 1 NS
*C_UNIT 1 PF
*R_UNIT 1 KOHM
*L_UNIT 1e-3 UH

*PORTS
G17 O *L 0.1
G3 I *S 0.1 0.1
G2 I *S 0.1 0.1
G1 I *S 0.1 0.1
G0 I *S 0.1 0.1
PREZ I *S 0.1 0.1
CLK I *S 0.1 0.1

*D_NET G17 0.1
*CONN
*I IV110_1:Y O *L 0.1 *D IV110
*P G17 O *L 0.1
*CAP
0 G17 0.1
1 IV110_1:Y 0.1
2 G17:0 0.1
*RES
0 G17 G17:0 0.1
1 IV110_1:Y G17:0 0.1
*END

*D_NET G3 0.1
*CONN
*I OR210_1:A I *L 0.1 *D OR210
*P G3 I *L 0.1
*CAP
0 G3 0.1
1 OR210_1:A 0.1
2 G3:0 0.1
*RES
0 G3 G3:0 0.1
1 OR210_1:A G3:0 0.1
*END

*D_NET G2 0.1
*CONN
*I NO210_3:A I *L 0.1 *D NO210
*P G2 I *L 0.1
*CAP
0 G2 0.1
1 NO210_3:A 0.1
2 G2:0 0.1
*RES
0 G2 G2:0 0.1
1 NO210_3:A G2:0 0.1
*END

*D_NET G1 0.1
*CONN
*I NO210_2:A I *L 0.1 *D NO210
*P G1 I *L 0.1
*CAP
0 G1 0.1
1 NO210_2:A 0.1
2 G1:0 0.1
*RES
0 G1 G1:0 0.1
1 NO210_2:A G1:0 0.1
*END

*D_NET G0 0.1
*CONN
*I IV110_0:A I *L 0.1 *D IV110
*P G0 I *L 0.1
*CAP
0 G0 0.1
1 IV110_0:A 0.1
2 G0:0 0.1
*RES
0 G0 G0:0 0.1
1 IV110_0:A G0:0 0.1
*END

*D_NET PREZ 0.1
*CONN
*I DTP10J_0:PREZ I *L 0.1 *D DTP10J
*I DTP10J_1:PREZ I *L 0.1 *D DTP10J
*I DTP10J_2:PREZ I *L 0.1 *D DTP10J
*P PREZ I *L 0.1
*CAP
0 PREZ 0.1
1 DTP10J_0:PREZ 0.1
2 DTP10J_1:PREZ 0.1
3 DTP10J_2:PREZ 0.1
4 PREZ:0 0.1
*RES
0 PREZ PREZ:0 0.1
1 DTP10J_0:PREZ PREZ:0 0.1
2 DTP10J_1:PREZ PREZ:0 0.1
3 DTP10J_2:PREZ PREZ:0 0.1
*END

*D_NET CLK 0.1
*CONN
*I DTP10J_0:CLK I *L 0.1 *D DTP10J
*I DTP10J_1:CLK I *L 0.1 *D DTP10J
*I DTP10J_2:CLK I *L 0.1 *D DTP10J
*P CLK I *L 0.1
*CAP
0 CLK 0.1
1 DTP10J_0:CLK 0.1
2 DTP10J_1:CLK 0.1
3 DTP10J_2:CLK 0.1
4 CLK:0 0.1
*RES
0 CLK CLK:0 0.1
1 DTP10J_0:CLK CLK:0 0.1
2 DTP10J_1:CLK CLK:0 0.1
3 DTP10J_2:CLK CLK:0 0.1
*END

*D_NET G10 0.1
*CONN
*I DTP10J_0:D I *L 0.1 *D DTP10J
*I NO210_0:Y O *L 0.1 *D NO210
*CAP
0 DTP10J_0:D 0.1
1 NO210_0:Y 0.1
2 G10:0 0.1
*RES
0 DTP10J_0:D G10:0 0.1
1 NO210_0:Y G10:0 0.1
*END

*D_NET G5 0.1
*CONN
*I DTP10J_0:Q O *L 0.1 *D DTP10J
*I NO210_1:A I *L 0.1 *D NO210
*CAP
0 DTP10J_0:Q 0.1
1 NO210_1:A 0.1
2 G5:0 0.1
*RES
0 DTP10J_0:Q G5:0 0.1
1 NO210_1:A G5:0 0.1
*END

*D_NET G6 0.1
*CONN
*I DTP10J_1:Q O *L 0.1 *D DTP10J
*I AN210_0:B I *L 0.1 *D AN210
*CAP
0 DTP10J_1:Q 0.1
1 AN210_0:B 0.1
2 G6:0 0.1
*RES
0 DTP10J_1:Q G6:0 0.1
1 AN210_0:B G6:0 0.1
*END


Appendix C Power Waveforms Analysis

AND gate power waveforms at different frequency points. Note that the waveform shape and peak values match across the frequency range.

Figure 1 1MHz, Peak: 838.9 uW

Figure 2 100MHz, Peak: 840.7 uW

Figure 3 1GHz, Peak: 838.2 uW
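One way to read the matching peaks is that each switching event produces the same power pulse regardless of how often it repeats, so the peak stays constant while the average power scales with frequency. The sketch below illustrates this under the assumption that successive pulses do not overlap; the sample values and function names are made up for illustration and are not the data behind the figures.

```python
def pulse_energy(samples, dt):
    """Energy of one switching pulse (rectangle-rule integral of power)."""
    return sum(samples) * dt


def peak_power(samples):
    """Peak of the pulse; independent of the repetition frequency."""
    return max(samples)


def average_power(samples, dt, frequency):
    """Average power when the pulse repeats once per period."""
    period = 1.0 / frequency
    if len(samples) * dt > period:
        raise ValueError("pulses would overlap at this frequency")
    return pulse_energy(samples, dt) * frequency


# Hypothetical pulse: power samples in watts, taken every 10 ps.
pulse = [0.0, 400e-6, 800e-6, 400e-6, 0.0]
step = 10e-12
avg_1mhz = average_power(pulse, step, 1e6)
avg_1ghz = average_power(pulse, step, 1e9)
```

Here `peak_power(pulse)` is the same at both frequencies, while `avg_1ghz` is a thousand times `avg_1mhz`.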


Appendix D Current Characterization – sample spice deck

*
*epic tech="voltage 1.2v"
*epic "vdd 0 1.2 0.01"
*epic "vss 0 0 0.01"
*epic "invoke spice3 %input %output"
* spice options
.inc /user/kalpu/cloc/autochar/userware/spice_options noprint
* temperature = 25
.temp 25
.inc ../user_data/models_strong noprint
*.inc /db/pdk/1233c035a/current/models/current/tis/model.paths.strong noprint
.inc /user/kalpu/cloc/autochar/subckt/sr40/an210h noprint
PVDD 1.2
vvdd vdd 0 PVDD
RVDD VDD VDD_inv1 1000
RVSS VSS_inv1 0 1000
xinv1 A B Y VSS_inv1 vdd_inv1 an210h
*10 MHz
VA A 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
*Vb B 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
Vb B 0 PVDD
Pslew 0.01n
pload 50ff
CY Y 0 pload
.tran 0.01ns 250ns
.MEASURE TR AVGPWR AVG P(Vvdd) FROM=20ns TO=60ns
.punch tr V(Vdd_inv1 vss_inv1)
.punch tr I(VVDD)
.punch tr I(rvdd)
.punch tr V(A B Y)
*.punch tr I(rvdd rvss)
.end


Appendix E Waveform transformation example

Figure 4 1MHz base Waveform, 830.4uW

Figure 5 100MHz Transformation, 830.4 uW


Figure 6 1GHz Transformation for 1MHz, 830.4uW
