6 Steps to Low Power RTL Design
Jonghyun Lim, Sr Application Engineer
© 2015 ANSYS, Inc. October 14, 2016 ANSYS Confidential
Early Power Decisions, High Impact
[Figure: power reduction potential (100%, 50%, 0%) across RTL Design, Logic Synthesis, Physical Design, and Timing Closure, moving from Large Impact to Small Impact]
RTL Design-for-Power
• Power-Performance-Area Trade-offs
• Voltage / Power Domain Planning
• Block-level Clock and Data Gating
• Eliminate Redundant Activity
Low Power Implementation
• Power Switch Sizing / Placement
• Clock Gater Cloning / Decloning
• Multi-Vt Optimization
• Power Integrity Verification
RTL Design-for-Power Methodology
Six Steps to Low-Power RTL Design
• Perform design trade-offs
• Profile design activity/power
• Check power vs. budget
• Debug power hotspots
• Guided Reduction
• Track power via regressions
[Figure panels: power (W, 0.00E+00 to 6.00E-02) for Version 1 and Version 2 in Typ and Idle modes; residual receive activity in transmit mode (TRANSMIT MODE, RECEIVE MODE); Peak Power = 391 mW, Average power = 239 mW; Enabled Clock, Inactive Data; RTL Power Regression Flow]
Six Steps to Low-Power RTL Design
1. Perform design trade-offs
2. Profile design activity
3. Check power versus budget
4. Debug power hotspots
5. Reduce power at RTL
6. Track power via regressions
1. Perform Design Trade-Offs at RTL
Quick Design Iterations
[Figure: design flow from Design Specification to RTL Design to Gate-Level Design to Layout, annotated with iteration times of ~20 hours and ~22 mins]
Effective Design-for-Power
• RTL Power: Power-per-Function (Adder, Register, Mux)
• Gate-level Power: Power-per-Gate
Explore Power of Micro-Architectures
• Power efficiency across different architectures
• Validate across multiple modes of operation
[Figure: bar chart of power (W, 0.00E+00 to 6.00E-02) by category (Register, Combinational, Switching, Clock, Memory, Inferred Buffer) for Version 1 and Version 2 in Typ and Idle modes; callouts: Typical mode, Arch #1; Typical mode, Arch #2; Idle mode, Arch #1; Idle mode, Arch #2]
Ref: Architectural Exploration: Area-Power tradeoff in 802.11a transmitter design, MIT
http://csg.csail.mit.edu/6.375/6_375_2007_www/handouts/lectures/L05-BS2-802dot11a.ppt
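The comparison across architectures and modes can be sketched in a few lines of Python. This is only an illustration, assuming per-category power has already been exported from a power analysis run; the numbers, the `POWER` table, the mode weights, and the helper names are all hypothetical and are not taken from the referenced study.

```python
# Hypothetical per-category power (W) for two candidate architectures.
# All values are made up for illustration only.
POWER = {
    "arch1": {
        "typical": {"register": 0.020, "combinational": 0.015, "clock": 0.012, "memory": 0.008},
        "idle": {"register": 0.004, "combinational": 0.001, "clock": 0.010, "memory": 0.002},
    },
    "arch2": {
        "typical": {"register": 0.014, "combinational": 0.010, "clock": 0.010, "memory": 0.006},
        "idle": {"register": 0.006, "combinational": 0.002, "clock": 0.014, "memory": 0.003},
    },
}


def total_power(breakdown):
    """Sum the per-category power of one architecture in one mode."""
    return sum(breakdown.values())


def weighted_power(arch, mode_weights):
    """Average power of an architecture given the fraction of time spent in each mode."""
    return sum(w * total_power(POWER[arch][mode]) for mode, w in mode_weights.items())


def pick_architecture(mode_weights):
    """Return the architecture with the lowest mode-weighted power."""
    return min(POWER, key=lambda arch: weighted_power(arch, mode_weights))
```

With these made-up numbers, the preferred architecture changes with the mode mix, which is why a trade-off should be validated in more than one mode of operation.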
2. Profile Design Activity/Power
Conventional signal activity viewer
• Difficult to validate activity coverage
• Difficult to analyze activity per hierarchy
Design-level Activity Viewer
• Identify power-critical windows
• Qualify vectors per mode
• Identify wasted activity
[Figure: design-level activity over time, annotated with Reset Activity, Pipeline Fill Activity, Redundant Pipeline Activity, and Optimal Time Interval]
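As a rough sketch of what "wasted activity" and "power-critical windows" mean in practice, the Python below works on per-cycle samples. It assumes the activity has already been extracted from simulation into plain lists; the function names and the data layout are hypothetical, not a tool interface.

```python
def redundant_clock_cycles(clock_active, data):
    """Return the cycle indices where the clock was active but the
    registered data kept the same value as in the previous cycle."""
    redundant = []
    for i in range(1, len(data)):
        if clock_active[i] and data[i] == data[i - 1]:
            redundant.append(i)
    return redundant


def activity_by_window(toggles, window):
    """Sum per-cycle toggle counts into fixed-size windows so that
    busy and quiet intervals stand out."""
    return [sum(toggles[i:i + window]) for i in range(0, len(toggles), window)]


def busiest_window(toggles, window):
    """Index of the window with the highest total activity."""
    totals = activity_by_window(toggles, window)
    return max(range(len(totals)), key=lambda i: totals[i])
```

Cycles returned by `redundant_clock_cycles` are candidates for clock gating, while the window totals help pick a representative interval of a long vector for power analysis.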
Best Practices for Analyzing Design Activity
• Per mode: Transmit Mode, Receive Mode
• Per hierarchy: flop clock pins per hierarchy (redundant clock activity?)
• Per net category: Memory, Clock, Register (redundant clock activity?)
Identify Power-Critical Cycles for Grid Integrity
[Figure: chip and package Vdd network; clock/power gating causes di/dt events, which produce an L di/dt voltage drop]
• Di/dt event not at the same time as the peak
Automatic Cycle Selection on GPU Core
11GB FSDB, 632K cycles, 3.3M instances
• RTL provides high performance for M+ cycles
• Can identify Peak and dP/dt power-critical cycles
• Can directly interface to power grid integrity tool
• Peak = 6X Average Power
Frame: DIDT, Start time: 0.0817704, Finish time: 0.0817706
Frame: CYCLE_POWER, Start time: 0.0806005, Finish time: 0.0806007
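The idea of selecting a peak-power cycle and a separate dP/dt cycle can be illustrated with a minimal Python sketch. It assumes a per-cycle power trace is available as a list of numbers; the function names are hypothetical and the selection criteria are simplified compared to what a production tool would do.

```python
def peak_cycle(power):
    """Index of the cycle with the highest power."""
    return max(range(len(power)), key=lambda i: power[i])


def max_dpdt_cycle(power):
    """Index i (i >= 1) of the cycle with the largest absolute
    power change relative to cycle i - 1."""
    return max(range(1, len(power)), key=lambda i: abs(power[i] - power[i - 1]))


def peak_to_average(power):
    """Ratio of peak power to average power over the trace."""
    return max(power) / (sum(power) / len(power))
```

For a trace such as `[1.0, 1.0, 4.0, 5.0, 6.0, 4.0, 2.0, 1.0]`, the steepest change occurs at cycle 2 while the peak occurs at cycle 4, so the two frames handed to grid integrity analysis are different.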
3. Check Power vs. Budget, Early
• Get early visibility into power: Average, Peak, Power Waveform
• Guide power-related design decisions early: grid, package, decap
• Avoid schedule and price impact from late surprises
Time-based Power
• Peak Power and Time
• Waveforms per Hierarchy, Category
Average Power
• Power by Hierarchy, Category, Mode
• Power by Clock, Power Domains
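A budget check of this kind can be sketched in Python as follows. The sketch assumes per-block power waveforms in mW are available as lists; the block names, budgets, and helper names are hypothetical.

```python
def summarize(waveform_mw):
    """Average power, peak power, and the sample index of the peak."""
    peak = max(waveform_mw)
    return {
        "average": sum(waveform_mw) / len(waveform_mw),
        "peak": peak,
        "peak_index": waveform_mw.index(peak),
    }


def over_budget(waveforms_mw, budgets_mw):
    """Names of blocks whose average power exceeds their budget, sorted."""
    return sorted(
        name
        for name, waveform in waveforms_mw.items()
        if summarize(waveform)["average"] > budgets_mw[name]
    )
```

Running this per hierarchy, per mode, and per power domain gives the early "average, peak, waveform" view described above, before any gate-level netlist exists.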
Slice and Dice Power
• Power by Hierarchy
• Power by Category
• Power by Clock
• Peak Power and Time: by hierarchy, by category
Model Physical Effects for Accurate RTL Power
Models can bridge RTL ↔ Implementation Gap
RTL:
    module PA (...
      always @ (posedge clk)
      begin
        dout <= din1;
      end
      assign out = sel ? dout : din2;
      ...
    endmodule
• RTL Power: WLMs, Clock modeling, Inferencing
• Physical effects: Clock distribution, Parasitics, Multiple Vt, Low-power structures
Pre-Layout Power Budgeting
• Representative Layout, Characterization
• Cap, Clock & Cell models
• Pre/Post-Layout Power
• RTL vs Gates Total Power: within 15%
• RTL vs Gates Clock Power: within 15%
4. Identify and Debug Power Hotspots
Graphical Debug
• Quickly spot power anomalies: Where?
• Interactively identify root cause: Why? When?
Custom Query-based Debug
• Automate custom power reduction beyond standard tool reports
• Tcl interface to power database
Power Efficiency Metrics
• Identify inefficiency at different abstractions: hierarchy, clock, instance
• Cycle-accurate metrics include Clock Gating Efficiency
Foundation Technology: Accurate Power Engine
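To make the Clock Gating Efficiency metric concrete, here is a small Python sketch. It assumes one common definition, namely the fraction of cycles in which a register's clock is shut off, weighted by register bit width at the design level; the exact definition used by a given tool may differ, and the function names are hypothetical.

```python
def clock_gating_efficiency(enable):
    """Fraction of cycles in which the gated clock is shut off,
    given a per-cycle clock-gate enable trace (truthy = clock runs)."""
    if not enable:
        raise ValueError("empty enable trace")
    return sum(1 for e in enable if not e) / len(enable)


def design_cge(registers):
    """Bit-weighted average efficiency over (bit_width, enable_trace) pairs.
    An ungated register can be modeled with an always-on enable trace."""
    total_bits = sum(width for width, _ in registers)
    weighted = sum(width * clock_gating_efficiency(trace) for width, trace in registers)
    return weighted / total_bits
```

A wide register with an always-on clock pulls the design-level number down sharply, which is why the metric is useful for ranking hierarchies, clocks, and instances by inefficiency.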
Visually Debug Power for Anomalies
[Figure labels: Inactive Data, Active Clock; Identify Block-level Clock Gating Enable]
Power bugs
• Power incorrect, functionally correct
• Large power savings
Designers spot ‘bugs’
• Browse by absolute power
• Browse by relative power
• Cross probe to schematics, RTL
Automate Custom Reports with Tcl Interface
• Glitchy ALUs: ALUs with either input not coming directly from registers
• List of ungated registers: RTL file and line, power consumed, bit width
• Clock enable efficiency: Per clock gate, with downstream power the gate controls
• ALU mux selects: List of signals that constitute the mux select for ALU data
Collections-based, industry-standard
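The "list of ungated registers" report can be mimicked with a short Python sketch. This is a stand-in for a collections-based query, not the tool's Tcl interface: the `Register` record, its fields, and the report format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Register:
    name: str
    rtl_file: str
    line: int
    bit_width: int
    power_mw: float
    clock_gate: Optional[str]  # name of the gating cell, None if ungated


def ungated_registers(registers):
    """Registers without a clock gate, highest power first."""
    ungated = [r for r in registers if r.clock_gate is None]
    return sorted(ungated, key=lambda r: r.power_mw, reverse=True)


def format_report(registers):
    """One line per ungated register: RTL file and line, name, width, power."""
    return [
        f"{r.rtl_file}:{r.line} {r.name} width={r.bit_width} power={r.power_mw:.2f}mW"
        for r in ungated_registers(registers)
    ]
```

Sorting by power keeps the designer's attention on the few registers where adding a clock gate is most likely to pay off.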
Review Power Efficiency Using Metrics
By Hierarchical Instance
By Flop / Latch
By Clock
5. Reduce Power Early at RTL
Block-level Clock Gating
• Clock Active, Data Inactive
Block-level Data Gating
• Clock Inactive, Data Active
Wasted Activity per Mode
• Per port, Per clock, Per hierarchy
• Redundant activity in read mode

1.1 Clock Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode   Instance
Cycles     Cycles   Name   Name   Name
-------------------------------------------------------
200        201      CLKA   read   top.core1.t1.dpmem.m1
-------------------------------------------------------

1.2 Input and Redundant Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode   Instance
Toggles    Toggles  Name   Name   Name
-------------------------------------------------------
1          1        AB[8]  read   top.core1.t1.dpmem.m1
-------------------------------------------------------

Block-level Clock and Data Gating
• Leverage quasi-static signals as coarse clock gating enables
[Figure labels: sel (t-1), sel, 0, 1, data]
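One way to look for quasi-static signals that could serve as coarse clock gating enables is to count toggles per signal over a representative vector. The Python below is a minimal sketch under that assumption; the signal names, the toggle threshold, and the helper names are hypothetical.

```python
def toggle_count(trace):
    """Number of value changes between consecutive samples."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


def quasi_static_signals(traces, max_toggles=2):
    """Names of signals that toggle at most max_toggles times, sorted.
    These are candidates for coarse, block-level gating enables."""
    return sorted(name for name, trace in traces.items() if toggle_count(trace) <= max_toggles)


def wasted_clock_cycles(clock_enabled, mode_active):
    """Cycles in which the block was clocked although its mode was inactive."""
    return sum(1 for clk, mode in zip(clock_enabled, mode_active) if clk and not mode)
```

A mode signal that rarely changes and whose inactive phase coincides with many clocked cycles is a natural candidate to gate the clock of the whole block; whether it is functionally safe still has to be confirmed by the designer.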
Leverage Guided Reduction Techniques
Clock / Clock Gating
• Increase CG coverage
• Improve CG efficiency
• Stability and observability
Control Logic and Datapath
• Eliminate redundant activity
• Use don’t care conditions
• Isolate datapath operators
Memory Subsystem
• Eliminate redundant access
• Split wide memories
• Exercise sleep modes
Ensure Reduction is Analysis-Driven
Maximize Reduction, Minimize Iterations
[Figure: predicted power savings (normalized, 0.00 to 1.00) vs. # RTL changes (design effort, 1 to 291)]
• Top 5 RTL changes account for 50% of identified power savings
• Prioritize high impact reductions
• Minimize design impact
• Saving estimates include added and removed logic, changed activity
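The prioritization behind such a curve can be sketched in Python: rank the candidate RTL changes by predicted savings and see how few are needed to reach a target share of the total. The function names and the input format (a flat list of predicted savings per change) are assumptions made for illustration.

```python
def cumulative_curve(savings):
    """Normalized cumulative savings after applying the changes
    in order of decreasing predicted savings."""
    ranked = sorted(savings, reverse=True)
    total = sum(ranked)
    curve, running = [], 0.0
    for s in ranked:
        running += s
        curve.append(running / total)
    return curve


def changes_for_target(savings, target_fraction):
    """Smallest number of top-ranked changes whose combined savings
    reach target_fraction of all identified savings."""
    ranked = sorted(savings, reverse=True)
    total = sum(ranked)
    running = 0.0
    for count, s in enumerate(ranked, start=1):
        running += s
        if running >= target_fraction * total:
            return count
    return len(ranked)
```

When the curve rises steeply at the start, a handful of changes captures most of the benefit, which limits both design impact and the number of iterations.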
6. Track Power via Regressions
• Monitor power creep across design development cycle
• Tcl interface to database enables custom queries
• Detailed tracking across design hierarchy, clock, supply
• Utility tracks change in power across two versions
Sample report
Typical Regression Framework
• 30+ blocks in a typical SoC
• 2+ vectors per block
• Vectors written for power: idle, active
• Daily block-level runs
• Weekly chip-level runs
• Track power change, reduction opportunities
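A utility that tracks the change in power across two versions could look roughly like the Python below. It is a sketch, assuming each regression run produces a mapping from block name to average power in mW; the block names, the 5% threshold, and the function names are hypothetical.

```python
def power_delta(old_mw, new_mw):
    """Per-block change (new minus old) in mW; blocks missing from
    one version are treated as consuming 0 mW in that version."""
    blocks = sorted(set(old_mw) | set(new_mw))
    return {b: new_mw.get(b, 0.0) - old_mw.get(b, 0.0) for b in blocks}


def flag_power_creep(old_mw, new_mw, threshold_pct=5.0):
    """Blocks whose power grew by more than threshold_pct,
    returned as sorted (block, percent_change) pairs."""
    flagged = []
    for block, base in old_mw.items():
        if base > 0 and block in new_mw:
            change_pct = 100.0 * (new_mw[block] - base) / base
            if change_pct > threshold_pct:
                flagged.append((block, round(change_pct, 1)))
    return sorted(flagged)
```

Run daily at block level and weekly at chip level, a check like this surfaces power creep soon after the RTL change that caused it.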
Thank You