© 2014 ANSYS, Inc. 6/23/2014. Methods for Achieving RTL to Gate Power Consistency. Design Automation Conference 2014

Methods for Achieving RTL to Gate Power Consistency

Jun 21, 2015

ANSYS Inc.

Consistency between RTL and signoff power numbers is necessary in enabling early low power design decisions with confidence. A modeling and characterization approach that takes into account physical design parameters is required to ensure this consistency. This presentation covers factors that affect RTL power accuracy and how PowerArtist™ PACE™ technology models physical effects to deliver predictable RTL power accuracy for sub-20nm designs. Learn more on our website: https://bit.ly/10Rpcxu
Transcript
Page 1: Methods for Achieving RTL to Gate Power Consistency

Methods for Achieving RTL to Gate Power Consistency

Design Automation Conference 2014

Page 2: Methods for Achieving RTL to Gate Power Consistency

PowerArtist™: RTL Design-for-Power Platform

• Power Analysis and Debug
• Automated Power Reduction: Original RTL → Low-Power RTL
• Links with Physical

[Diagram labels: Physical, Power, RTL Power, PACE, RPM]

Page 3: Methods for Achieving RTL to Gate Power Consistency

Objectives of RTL Power Analysis

• Power trade-off analysis using relative accuracy

• Sign off power with absolute accuracy

• Analysis driven power reduction

[Figure: Total Power Savings Available (normalized, 0.00 to 1.00) and Cumulative Area Overhead (normalized, 0.00 to 1.00) vs. # RTL Changes (Design Effort, 1 to 291). Annotations: "Maximum acceptable area impact", "Maximum possible power savings", "Only 5 changes gave 50% saving".]

Page 4: Methods for Achieving RTL to Gate Power Consistency

RTL Power: Inputs for PowerArtist

• RTL (VHDL, Verilog, SystemVerilog)

module PA (
  ...
  always @ (posedge clk) begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
  ...
endmodule

• Power domains (UPF / CPF), e.g. Vdd1, Vdd2
• Capacitance model (WLM / PACE)
• Activity (FSDB / VCD / SAIF)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)

[Diagram: the inputs above feed RTL Power Analysis; the example RTL is shown as register, and, and mux cells driven by clk.]

Page 5: Methods for Achieving RTL to Gate Power Consistency

Factors Affecting RTL Power Accuracy

• Synthesis Modeling: Inferencing, Multi-VT, Cell Selection, Microarchitecture, Algorithmic
• RTL Models: Activity Propagation, Timing, Power Computation
• Physical Models: Clock Tree, Wire Cap, Transition Time
• Low Power Structures: Voltage / Power Domains, CPF / UPF

NOTE: Algorithmic and Low Power structures are not configured for accuracy

Page 6: Methods for Achieving RTL to Gate Power Consistency

Synthesis Modeling Aspects for RTL Power

Inferencing:
• Keep optimization settings consistent with synthesis
• Enable DesignWare flow (if DW components are present)

Multi-VT:
• Apply consistent multi-VT settings from synthesis

Cell Selection:
• Fine-tune cell selection based on synthesis netlist
• Apply boundary conditions based on load / frequency

Microarchitecture:
• Apply microarchitectures for macros (e.g. adders, multipliers)

Page 7: Methods for Achieving RTL to Gate Power Consistency

Synthesis Modeling Aspects in PowerArtist

Constant Multipliers

b = 8'b11000100;
assign z = a * b;

[Diagram: the constant multiplier is modeled with a CSA.]

Chains of Adders

assign z = a + b + c + d;

[Diagram: a, b, c enter a CSA, d joins in a second CSA, and a final adder produces the result; shown against a chain of three adders.]

Look-Up Table Optimization

case (address)
  8'd0 : data = {32'd0};
  8'd1 : data = {32'd12};
endcase

[Diagram: address → OR plane → data. Optimized and-or plane by sharing common logic.]

• Cell mapping to basic 2-input cells
• Un-encoded mux modeled using AOIs
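
The constant-multiplier and adder-chain slides can be illustrated with a small behavioral sketch. It is only an arithmetic illustration in Python, not PowerArtist's internal model, and all function names are hypothetical. It shows that multiplying by the slide's constant 8'b11000100 needs one shifted partial product per set bit, and that a carry-save adder (CSA) reduces three operands to two so that a + b + c + d needs only one carry-propagate addition at the end.

```python
def shift_add_terms(constant: int) -> list[int]:
    """Return the positions of the set bits in a non-negative constant."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(a: int, constant: int) -> int:
    """Multiply using only shifts and adds: one partial product per set bit."""
    return sum(a << shift for shift in shift_add_terms(constant))


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression: reduce three non-negative operands to (sum word, carry word)."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def add_four(a: int, b: int, c: int, d: int) -> int:
    """a + b + c + d with two CSA stages and a single carry-propagate add."""
    s1, c1 = carry_save_add(a, b, c)
    s2, c2 = carry_save_add(s1, c1, d)
    return s2 + c2


B = 0b11000100  # the constant from the slide

if __name__ == "__main__":
    print("set bit positions:", shift_add_terms(B))
    print("5 * B via shift-add:", multiply_by_constant(5, B))
    print("1 + 2 + 3 + 4 via CSA:", add_four(1, 2, 3, 4))
```

Because the constant has only three set bits, the multiplier collapses to three shifted copies of `a`, which is why a generic multiplier model would overestimate the logic for this expression.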

Page 8: Methods for Achieving RTL to Gate Power Consistency

RTL Power Accuracy Using Wire Load Models

Mobile SoC Case Study

– Large difference seen with simple wire load models
– Clock and Combo power show the largest difference
– Total power shows 40% difference wrt gate level

** Note: GATE considered to be most accurate

[Figure: RTL Wire Load Models vs. Gate Level (Different Power Categories). Series: RTL WLM and GATE (Power in Watts, 0.000 to 0.120) and %diff (% Difference, -100% to 100%). %diff data labels: 28.8%, 11.0%, -9.2%, 69.2%, 41.2%, 32.3%, 40.2%.]

Page 9: Methods for Achieving RTL to Gate Power Consistency

Physical Aspects Modeling for Power

Clock Tree:
• Modeling clock tree
• Balanced and Clock Mesh topology

Wire Cap:
• Accurately model post-layout wire capacitance
• Model capacitance profile for different types of nets

Transition Time:
• Accurately model slew for realistic power
• Both clock and logic nets

Page 10: Methods for Achieving RTL to Gate Power Consistency

Physical Modeling: Clock Tree

• RTL clock power accuracy requirements

– Understand clock gating methodology

– Understand clock tree topology and buffering

• Difficult for RTL designers to get data from backend team

[Figures: Clock Mesh Topology; Balanced Clock Tree]

Page 11: Methods for Achieving RTL to Gate Power Consistency

Physical Modeling: Wire Cap

Traditional Wire Load Models

• Not available in some vendor libraries; often not calibrated
• Custom WLMs not portable across blocks and designs
• Simplistic modeling results in poor accuracy

[Figure: 40nm, 45k nets with fanout 1. WLM assigns 1fF for all nets vs. SPEF that varies 0.2fF to >129fF.]
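
To see why a single wire-load value hurts power accuracy, here is a minimal Python sketch that applies the standard switching-power relation (0.5 × C × Vdd² × toggles per second) to a handful of nets. Only the 1 fF WLM value and the 0.2 fF and 129 fF end points come from the slide; the other capacitances, the supply voltage, and the toggle rate are made-up illustration values, and the function name is hypothetical.

```python
def switching_power_watts(cap_farads: float, vdd_volts: float,
                          toggles_per_second: float) -> float:
    """Average switching power of one net: 0.5 * C * Vdd^2 * toggle rate."""
    return 0.5 * cap_farads * vdd_volts ** 2 * toggles_per_second


FF = 1e-15           # one femtofarad in farads
VDD = 0.9            # assumed supply voltage (V), not from the slides
TOGGLE_RATE = 100e6  # assumed toggles per second, not from the slides

# Hypothetical extracted (SPEF-like) caps in fF for four fanout-1 nets.
# 0.2 and 129.0 are the range end points quoted on the slide.
spef_caps_ff = [0.2, 1.0, 5.0, 129.0]
WLM_CAP_FF = 1.0     # the WLM value quoted on the slide

spef_power = sum(switching_power_watts(c * FF, VDD, TOGGLE_RATE)
                 for c in spef_caps_ff)
wlm_power = sum(switching_power_watts(WLM_CAP_FF * FF, VDD, TOGGLE_RATE)
                for _ in spef_caps_ff)
error_pct = (wlm_power - spef_power) / spef_power * 100.0

if __name__ == "__main__":
    print(f"SPEF-based power: {spef_power:.3e} W")
    print(f"WLM-based power:  {wlm_power:.3e} W")
    print(f"WLM error vs. SPEF: {error_pct:.1f}%")
```

With these made-up nets, the few long wires dominate the total, so the flat 1 fF assumption underestimates switching power; the size of the error on a real design depends on its actual capacitance distribution and activity.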

Page 12: Methods for Achieving RTL to Gate Power Consistency

PACE™ for RTL Power Accuracy

PACE applies from RTL to Pre-layout Power

• Clock tree models

– Determine buffer and CG cells per inferred clock tree

– Supports both balanced clock tree as well as clock mesh

• Wire capacitance models

– Granular, power-oriented vs. traditional WLMs

module PA (
  ...
  always @ (posedge clk)
  begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
  ...
endmodule

Bridge the RTL ↔ Implementation Gap

[Diagram: a Representative Layout feeds PowerArtist Calibration (PACE), which produces Statistical Models: Wire Cap and Clock. The models bring RTL Power closer to Post-Layout Power. Gap factors listed: Clock distribution, Parasitics, Multiple Vt, Low-power structures.]
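
The slides do not describe how PACE builds its statistical models, so the following Python sketch only illustrates the general idea of calibrating a capacitance model from a representative layout: group extracted nets by fanout and use the mean capacitance per group in place of a single wire-load value. The data, the bucketing by fanout, and every function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def calibrate_cap_model(extracted_nets):
    """Build {fanout: mean capacitance in fF} from (fanout, cap_fF) pairs."""
    buckets = defaultdict(list)
    for fanout, cap_ff in extracted_nets:
        buckets[fanout].append(cap_ff)
    return {fanout: mean(caps) for fanout, caps in buckets.items()}


def estimate_cap(model, fanout):
    """Return the calibrated cap, falling back to the nearest calibrated fanout."""
    if fanout in model:
        return model[fanout]
    nearest = min(model, key=lambda known: abs(known - fanout))
    return model[nearest]


# Hypothetical (fanout, capacitance in fF) pairs from a reference layout.
layout_nets = [(1, 0.5), (1, 1.5), (2, 2.0), (2, 4.0), (4, 9.0)]
model = calibrate_cap_model(layout_nets)

if __name__ == "__main__":
    for fanout in (1, 2, 4, 5):
        print(f"fanout {fanout}: {estimate_cap(model, fanout)} fF")
```

A production model would need more dimensions than fanout (net type, block size, clock versus logic nets), which matches the slide's description of the models as granular and power-oriented.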

Page 13: Methods for Achieving RTL to Gate Power Consistency

RTL Power Accuracy Using PACE Cap Models

Mobile SoC Case Study

– Tighter correlation seen with PACE Cap models
– Register and Combo power are within +/-20%
– Total power shows <5% difference wrt gate level

** Note: GATE considered to be most accurate

[Figure: PACE Cap Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap, and GATE (Power in Watts, 0.000 to 0.120) and %diff (% Difference, -100% to 100%). %diff data labels: -13.4%, 5.1%, -9.2%, 22.8%, 8.1%, -37.4%, 3.0%.]

Page 14: Methods for Achieving RTL to Gate Power Consistency

RTL Power Accuracy Using PACE Cap + Clock Models

Mobile SoC Case Study

– Best correlation seen with PACE Cap + Clock models
– Overall correlation is within +/-15%

** Note: GATE considered to be most accurate

[Figure: PACE Cap+Clk Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap+Clock, and GATE (Power in Watts, 0.000 to 0.120), plus %diff w/ PACE and %diff w/ WLM (% Difference, -100.0% to 100.0%). Extracted %diff data labels: -13.4%, 9.9%, -9.2%, -12.8%, -9.0%, -13.6%, -9.4%.]

Page 15: Methods for Achieving RTL to Gate Power Consistency

RTL Power Accuracy Using PACE Cap + Clock Models

Mobile SoC Blocks Case Study

– Total power difference with WLM is greater than +/-30%
– With PACE models, it is within +/-20%

** Note: GATE considered to be most accurate

[Figure: Total Power Comparison for Design 1, Design 2, and Design 3. Series: RTL WLM, RTL PACE, and GATE (Power in Watts, 0.000 to 0.120).]

Page 16: Methods for Achieving RTL to Gate Power Consistency

RTL Power Accuracy Using PACE Cap + Clock Models

Mobile SoC Blocks Case Study

– Total power difference with WLM is greater than +/-30%
– With PACE models, it is within +/-20%
– Clock power with PACE is within +/-20% as well

** Note: GATE considered to be most accurate

[Figure: Clock Power wrt RTL PACE vs. GATE for Design 1, Design 2, and Design 3. Series: GATE and RTL PACE (Power in Watts, 0.00E+00 to 8.00E-02) and %diff (0.0% to 25.0%). %diff data labels: 15.5%, 19.0%, 20.7%.]

Page 17: Methods for Achieving RTL to Gate Power Consistency

Nvidia Case Study: RTL Power Accuracy

Column key: RTL = RTL PowerArtist; PT-PX = post-synthesis PT-PX. Dyn, Leak, and Total are average dynamic, leakage, and total power in mW. The % columns compare RTL PowerArtist vs. post-synthesis PT-PX. "DW instances" counts black-boxed DW instances.

DESIGN                           Instances  DW instances RTL Dyn  RTL Leak  RTL Total  PT-PX Dyn  PT-PX Leak  PT-PX Total  % Dyn    % Leak   % Total
PR                               580320     0            82.524   114.210   196.735    92.900     111.734     204.635      12.57%   -2.17%   4.02%
TD                               268993     0            89.209   38.713    127.923    101.755    35.089      136.844      14.06%   -9.36%   6.97%
TTM                              158407     14           64.828   21.353    86.181     63.583     20.212      83.795       -1.92%   -5.34%   -2.77%
TTF                              134152     64           47.850   14.874    62.724     32.563     13.431      45.995       -31.95%  -9.70%   -26.67%
SMI                              1137155    101          145.497  201.661   347.158    125.133    135.635     260.768      -14.00%  -32.74%  -24.88%
SRF                              509095     24           263.894  75.515    339.409    258.332    73.897      332.229      -2.11%   -2.14%   -2.12%
Average Power overall designs                            115.634  77.721    193.355    112.378    65.000      177.378      -2.82%   -16.37%  -8.26%
Average Power excluding SMI/TTF                          125.114  62.448    187.562    129.143    60.233      189.376      3.22%    -3.55%   0.97%
Average Power PR/TD only                                 85.867   76.462    162.329    97.328     73.412      170.739      13.35%   -3.99%   5.18%

• Power correlation performed for 6 designs, 130K - 1.13M instances
• In general, very good average power correlation observed (SMI and TTF have DWs)
• 8-16 tests being run across the blocks

** Source: Nvidia-Apache Webinar, July 2013 (Miki)
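
The percentage columns in the table can be reproduced from the power columns. The Python sketch below assumes the column-group order of the extracted slide (RTL PowerArtist first, post-synthesis PT-PX second); under that assumption each percentage equals (second - first) / first. The dictionary and function names are hypothetical; the numbers are copied from the table.

```python
# Average power in mW as (dynamic, leakage, total), copied from the table.
RTL_POWERARTIST = {
    "PR": (82.524, 114.210, 196.735),
    "TD": (89.209, 38.713, 127.923),
    "TTM": (64.828, 21.353, 86.181),
    "TTF": (47.850, 14.874, 62.724),
    "SMI": (145.497, 201.661, 347.158),
    "SRF": (263.894, 75.515, 339.409),
}
PT_PX = {
    "PR": (92.900, 111.734, 204.635),
    "TD": (101.755, 35.089, 136.844),
    "TTM": (63.583, 20.212, 83.795),
    "TTF": (32.563, 13.431, 45.995),
    "SMI": (125.133, 135.635, 260.768),
    "SRF": (258.332, 73.897, 332.229),
}


def pct_diff(first: float, second: float) -> float:
    """Percent difference of `second` relative to `first`."""
    return (second - first) / first * 100.0


def design_diffs(design: str) -> tuple[float, ...]:
    """(dynamic, leakage, total) percent differences for one design."""
    pairs = zip(RTL_POWERARTIST[design], PT_PX[design])
    return tuple(round(pct_diff(a, b), 2) for a, b in pairs)


def average_dynamic(designs) -> float:
    """Mean of the first group's dynamic power over the given designs."""
    return sum(RTL_POWERARTIST[d][0] for d in designs) / len(designs)


if __name__ == "__main__":
    for name in RTL_POWERARTIST:
        print(name, design_diffs(name))
    print("avg dynamic, all designs:", round(average_dynamic(list(RTL_POWERARTIST)), 3))
```

The same averaging, restricted to PR, TD, TTM, and SRF or to PR and TD only, identifies which of the three unlabeled average rows belongs to which caption.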

Page 18: Methods for Achieving RTL to Gate Power Consistency

Summary

• RTL power analysis enables early design trade-offs with high power impact

• PowerArtist provides predictable RTL power accuracy wrt GATE

• PowerArtist has advanced synthesis and physical modeling techniques

• PowerArtist PACE modeling is proven across designs

• Use PowerArtist for RTL power sign-off with absolute accuracy