Page 1
Switching Activity Analysis and Pre-Layout Activity Prediction for FPGAs
Jason H. Anderson and Farid N. Najm
ACM/IEEE Int’l Workshop on System Level Interconnect Prediction (SLIP), Monterey, CA, April 5-6, 2003
Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
Page 2
Motivation
• Today’s largest FPGAs are “hot”: they consume watts of power
• Xilinx Virtex-II CLB: 5.9 µW/MHz [Shang02]
• Modest design: 2500 CLBs, 100 MHz → 1.5 W
• Optimize FPGA power consumption: reduce cooling/packaging costs, new apps, better reliability
• Characterize, then optimize
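The 1.5 W figure follows directly from the per-CLB number. A minimal arithmetic check, assuming every CLB dissipates the quoted 5.9 µW/MHz (the variable names are illustrative only):

```python
# Back-of-the-envelope check of the "modest design" power figure.
UW_PER_MHZ_PER_CLB = 5.9   # Virtex-II CLB figure quoted from [Shang02]
NUM_CLBS = 2500            # design size from the slide
CLOCK_MHZ = 100            # clock frequency from the slide

# microwatts -> watts
power_watts = UW_PER_MHZ_PER_CLB * NUM_CLBS * CLOCK_MHZ * 1e-6
print(f"{power_watts:.3f} W")
```

The result is about 1.475 W, which the slide rounds to 1.5 W.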
Page 3
FPGA Power Dissipation
• Power breakdown:
  – Majority is dynamic
  – Interconnect dominates:
    • Xilinx Virtex-II: 50-70% of power dissipated in interconnect [Shang02]; similar results: [Poon02, Kusse98]
  – Average dynamic power:

$$P_{avg} = \frac{1}{2} \sum_{i \in nets} C_i \cdot f_i \cdot V^2$$

    where C_i is the capacitance of net i, f_i is its toggle rate (switching activity), and V is the supply voltage
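The average dynamic power expression can be evaluated as a simple sum over nets. The sketch below assumes each net is described by a (capacitance, toggle rate) pair. The function name and the three example nets are hypothetical and chosen only for illustration.

```python
def average_dynamic_power(nets, vdd):
    """Return 0.5 * V^2 * sum(C_i * f_i) in watts.

    nets: iterable of (capacitance_in_farads, toggle_rate_in_hz) pairs.
    vdd:  supply voltage in volts.
    """
    return 0.5 * vdd ** 2 * sum(c * f for c, f in nets)


# Hypothetical nets: (capacitance, toggle rate). Not measured data.
example_nets = [(2e-12, 10e6), (5e-12, 25e6), (1e-12, 50e6)]
p_avg = average_dynamic_power(example_nets, vdd=1.5)
print(f"{p_avg * 1e6:.2f} uW")
```

Because power is linear in each f_i, any error in a net's estimated toggle rate carries straight through to the power estimate, weighted by that net's capacitance.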
Page 4
Switching Activity
• Different views:
  – Zero delay activity: all delays are zero
  – Logic delay activity: logic delays only
  – Routed delay activity: both logic and routing delays
• Delays introduce glitches: spurious transitions that consume power
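[Figure: circuit with inputs i0 and i1 and output z, with delays labeled dly = 1 and dly = 2; waveforms of i0, i1, and z show a glitch on z]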
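A small discrete-time sketch shows how unequal delays create a glitch that a zero delay simulation never sees. The gate type in the figure is not recoverable from the text, so a 2-input AND gate is assumed here, with the two delays applied to the input paths. All names are illustrative.

```python
def delayed(signal, delay):
    """Shift a sampled waveform right by `delay` steps, holding its initial value."""
    return [signal[0]] * delay + signal[:len(signal) - delay]


def and_output(i0, i1, dly0=0, dly1=0):
    """Output of an AND gate (assumed gate type) whose inputs arrive after dly0 and dly1."""
    return [a & b for a, b in zip(delayed(i0, dly0), delayed(i1, dly1))]


def count_transitions(waveform):
    """Number of value changes between consecutive samples."""
    return sum(1 for a, b in zip(waveform, waveform[1:]) if a != b)


# i0 rises and i1 falls at the same time step.
i0 = [0, 1, 1, 1, 1, 1]
i1 = [1, 0, 0, 0, 0, 0]

z_zero_delay = and_output(i0, i1)                # stays at 0 throughout
z_with_delays = and_output(i0, i1, dly0=1, dly1=2)  # pulses to 1 for one step

print(count_transitions(z_zero_delay), count_transitions(z_with_delays))
```

With zero delays the output never moves. With delays of 1 and 2 the rising input arrives first, so the output pulses high for one time step: two spurious transitions that both consume power.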
Page 5
Motivation
• Activity analysis (part 1 of this work):
  – Study extent of activity change due to glitches
  – FPGA delays are dominated by interconnect → severe glitching in this technology?
  – Low-power CAD is based on zero delay activities → valid for FPGAs?
Page 6
Motivation
• State-of-the-art FPGAs can implement complex systems with millions of gates
  – Design teams, not just individuals
  – Increasingly long design cycle
• Early, high-level power estimation: minimize design time & cost
Page 7
Motivation
• Layout is the most time-consuming part of the FPGA CAD flow
• Pre-layout power estimation requires:
  – Net capacitance prediction
  – Net activity prediction (part 2 of this work)
Page 8
Activity Analysis
• Simulation-based approach
• Map 14 circuits into Xilinx Virtex-II
• Simulate with zeroed delays, logic delays, and routed delays
• 2 vector sets: high or low input activity
  – High (low) activity vector set: each input has a 50% (25%) probability of toggling between vectors
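One way to build such vector sets is to flip each input independently with the stated probability between consecutive vectors. The generator below is a sketch under that assumption. The function names, the seed, and the vector counts are hypothetical, and the slides do not say how the actual vectors were produced.

```python
import random


def make_vectors(num_inputs, num_vectors, toggle_prob, seed=0):
    """Generate input vectors where each bit toggles with probability toggle_prob."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(num_inputs)]
    vectors = [current[:]]
    for _ in range(num_vectors - 1):
        # XOR with True flips the bit; XOR with False leaves it unchanged.
        current = [bit ^ (rng.random() < toggle_prob) for bit in current]
        vectors.append(current[:])
    return vectors


def toggle_rate(vectors):
    """Fraction of (input, step) pairs in which the input changed value."""
    toggles = sum(a != b
                  for prev, nxt in zip(vectors, vectors[1:])
                  for a, b in zip(prev, nxt))
    return toggles / (len(vectors[0]) * (len(vectors) - 1))


high_set = make_vectors(num_inputs=8, num_vectors=5000, toggle_prob=0.50)
low_set = make_vectors(num_inputs=8, num_vectors=5000, toggle_prob=0.25)
print(toggle_rate(high_set), toggle_rate(low_set))
```

The measured toggle rates should land close to 0.50 and 0.25, which gives a quick sanity check on a generated vector set.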
Page 9
Activity Analysis Flow
[Figure: flow diagram. HDL circuit → HDL synthesis (Synplify Pro) → technology mapping → placement and routing, with the label "Xilinx tools" alongside. The mapped design and the simulation vectors feed zero or logic delay simulation (Synopsys VSS); the routed design feeds routed delay simulation (Synopsys VSS). Each simulation produces switching activity data.]
Page 10
Effect of Glitching on Transition Count
High activity vector set results:
[Figure: bar chart of % increase in number of transitions vs. zero delay simulation (y-axis 0 to 160) for circuits C3540, apex2, ex5p, ex1010, spla, C2670, pdc, alu4, seq, apex4, pair, cps, dalu, misex3. Series: logic delay simulation (avg: 28.8%), routed delay simulation (avg: 72.5%).]
Page 11
Activity Analysis
• Substantial activity increase when routing delays are accounted for
  – Accounting for logic delays is not enough: interconnect dominates delay
  – High activity vector set:
    • Activity increase, zero → logic: avg 28%, max 84%
    • Activity increase, logic → routing: avg 34%, max 61%
Page 12
Effect of Glitching on Transition Count
Low activity vector set results:
[Figure: bar chart of % increase in number of transitions vs. zero delay simulation (y-axis 0 to 160) for circuits C3540, apex2, ex5p, ex1010, spla, C2670, pdc, alu4, seq, apex4, pair, cps, dalu, misex3. Series: logic delay simulation (avg: 19.7%), routed delay simulation (avg: 42.4%).]
Page 13
Activity Analysis
• Low activity vector set: glitching is 1/2 to 2/3 as severe as with the high activity vector set
  – Fewer inputs switch simultaneously → fewer simultaneous transitions on different paths to a net
  – Activity increase, zero → logic: avg 20%, max 66%
  – Activity increase, logic → routing: avg 19%, max 35%
Page 14
Effect of Delay Optimization
• Previous results: placement and routing (P & R) run without performance constraints
• Timing-driven P & R may lead to smaller interconnect delays → less glitching?
Page 15
Effect of Delay Optimization
[Figure: bar chart of % increase in number of transitions, routed delay sim. vs. zero delay sim. (y-axis 0 to 160), for circuits C3540, alu4, spla. Series: unconstrained P & R, timing-driven P & R.]
• Glitching reduction from timing-driven P & R is not substantial
Page 16
Activity Prediction
• Problem difficulty:
  – How “hard” is the prediction problem?
  – What degree of accuracy can be expected?
• Gauge “noise” in the prediction problem using a specially-designed circuit
Page 17
Problem Difficulty
• Regular circuit:
• Has structural & functional regularity
[Figure: regular circuit built from LUT0, LUT1, ..., LUT127, driven by primary inputs (i0, i1, i2, i3, i4, ..., i127) and producing primary outputs o0, o1, ..., o127; each LUT implements a 4-input AND function]
Page 18
Problem Difficulty
• Implement “regular” circuit in Virtex-II
• Analyze activity increase on LUT output signals from zero to routed delay sim.
• Variability in increase (across LUTs) due to delays known only after layout:
  – routing delays
  – different input-to-output LUT delays
• Represents noise we cannot predict
Page 19
Problem Difficulty
[Figure: plot of % increase in activity (zero delay sim. to routed delay sim.), y-axis 0 to 160, for each LUT output signal, x-axis 0 to 120. Series: high activity vector set simulation, low activity vector set simulation.]
Page 20
Problem Difficulty
• Variability in activity increase is significant:
  – 0-40% (low activity vector set)
  – 0-100% (high activity vector set)
• Accurate pre-layout activity prediction for FPGAs is a difficult problem
Page 21
Activity Prediction
• Predict net glitching using zero (or logic) delay activity, circuit properties
• Idea: glitches propagated or generated
[Figure: LUT with input a and output z, with waveforms of a and z illustrating a propagated glitch and a generated glitch]
Abstract view:

$$predict(z) = \alpha \cdot gen(z) + \beta \cdot prop(z) + \phi$$
Page 22
Generated Glitches
• FPGA logic elements are uniform, have equal drive capability
• Buffered routing switches → connection delay approx. fanout independent
• Predict pre-layout path delay using path length (# of LUTs)
• Unequal path delays lead to glitches
Page 23
Generated Glitches
• Let PL(x) = set of path lengths to node x
• Define # of path lengths introduced by node y:

$$IPL(y) = \min_{x_i \in inputs(y)} \{ |PL(y)| - |PL(x_i)| \}$$

• Example: node y with inputs a and b, where PL(a) = {5,6} and PL(b) = {4,5}: PL(y) = {5,6,7}, IPL(y) = 1
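The example on this slide can be reproduced in a few lines. The sketch assumes that passing through node y adds one LUT to every incoming path length, which is consistent with the slide's numbers. The function names are illustrative.

```python
def output_path_lengths(input_pls):
    """PL(y): every path length reaching an input of y, extended by one LUT (assumed)."""
    return {length + 1 for pl in input_pls for length in pl}


def introduced_path_lengths(pl_y, input_pls):
    """IPL(y) = min over inputs x_i of |PL(y)| - |PL(x_i)|."""
    return min(len(pl_y) - len(pl) for pl in input_pls)


pl_a = {5, 6}
pl_b = {4, 5}
pl_y = output_path_lengths([pl_a, pl_b])
ipl_y = introduced_path_lengths(pl_y, [pl_a, pl_b])
print(sorted(pl_y), ipl_y)
```

Here PL(y) has three distinct lengths while each input has two, so node y introduces one new path length. That new length is the structural signature of a point where paths of unequal delay reconverge.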
Page 24
Generated Glitches
• Depth term included since glitching likely to be worse for “deeper” nodes

$$gen(y) = IPL(y) + \gamma \cdot depth(y)$$

where depth(y) is the depth of the node driving net y
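As a small illustration, the generate term is a weighted sum of the two structural quantities. The value of γ and the depth used below are hypothetical: the slides neither give a tuned γ nor say how depth is measured.

```python
def gen(ipl_y, depth_y, gamma):
    """Generated-glitch term: IPL(y) + gamma * depth(y)."""
    return ipl_y + gamma * depth_y


# Hypothetical inputs: IPL(y) = 1, depth(y) = 7, gamma = 0.5.
gen_y = gen(ipl_y=1, depth_y=7, gamma=0.5)
print(gen_y)
```

A larger γ lets depth dominate the term, so deep nodes are predicted to glitch more even when they introduce no new path lengths.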
Page 25
Propagated Glitches
• Propagate term uses notions of Boolean difference & static probability
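• Consider logic function: $y = f(x_1, x_2, \ldots, x_n)$
• Boolean difference of y w.r.t. $x_i$:

$$\frac{\partial y}{\partial x_i} = f_{x_i} \oplus f_{\bar{x}_i}$$

  where $f_{x_i}$ is the function f(…) with xi replaced by 1 and $f_{\bar{x}_i}$ is the function f(…) with xi replaced by 0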
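The Boolean difference can be evaluated by brute force for a small function. The sketch below uses a 4-input AND as the example function and lists the input assignments for which the difference with respect to the first input equals 1. Function names are illustrative.

```python
from itertools import product


def boolean_difference(f, i):
    """Return a function giving f(xi=1) XOR f(xi=0) for a full input assignment."""
    def diff(bits):
        hi = list(bits)
        lo = list(bits)
        hi[i] = 1   # cofactor with x_i replaced by 1
        lo[i] = 0   # cofactor with x_i replaced by 0
        return f(*hi) ^ f(*lo)
    return diff


def and4(a, b, c, d):
    """Example function: 4-input AND."""
    return a & b & c & d


d_and4_x0 = boolean_difference(and4, 0)
sensitizing = [bits for bits in product((0, 1), repeat=4) if d_and4_x0(bits)]
print(sensitizing)
```

For the AND function the difference is 1 only when the other three inputs are all 1, regardless of the value of the first input itself.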
Page 26
Propagated Glitches
• Key: $\frac{\partial y}{\partial x_i} = 1$ → transition on xi will cause transition on y
• Static probability: fraction of time logic signal is in “1” state
• $P\left(\frac{\partial y}{\partial x_i}\right)$: probability a transition on xi will result in transition on y
  – Relevant to whether a glitch on xi will become a glitch on y
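Given static probabilities for the inputs, the probability that the Boolean difference equals 1 can be computed by enumerating the other inputs. The sketch below assumes the inputs are statistically independent, which is a simplification not stated on the slides. The names are illustrative.

```python
from itertools import product


def prob_boolean_difference(f, static_probs, i):
    """P(dy/dx_i = 1), assuming independent inputs with the given static probabilities."""
    n = len(static_probs)
    others = [k for k in range(n) if k != i]
    total = 0.0
    for values in product((0, 1), repeat=len(others)):
        bits = [0] * n
        weight = 1.0
        for k, v in zip(others, values):
            bits[k] = v
            # Probability that input k takes value v.
            weight *= static_probs[k] if v else 1.0 - static_probs[k]
        bits[i] = 1
        hi = f(*bits)
        bits[i] = 0
        lo = f(*bits)
        if hi ^ lo:
            total += weight
    return total


def and4(a, b, c, d):
    """Example function: 4-input AND."""
    return a & b & c & d


p_x0 = prob_boolean_difference(and4, [0.5, 0.5, 0.5, 0.5], 0)
print(p_x0)
```

For a 4-input AND with all static probabilities at 0.5, a transition on one input reaches the output only when the other three are 1, which happens with probability 0.5³ = 0.125.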
Page 27
Propagated Glitches
• za(xi) = zero delay activity of xi
– replace with logic dly activity (if available)
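
$$prop(y) = \frac{\sum_{x_i \in inputs(y)} P\left(\frac{\partial y}{\partial x_i}\right) \cdot predict(x_i) \cdot za(x_i)}{\sum_{x_i \in inputs(y)} P\left(\frac{\partial y}{\partial x_i}\right) \cdot za(x_i)}$$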
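The propagate term is a weighted average of the inputs' predict values, with each input weighted by its Boolean-difference probability times its zero delay activity. A minimal sketch follows. The tuple layout, the zero-denominator fallback, and the example numbers are assumptions made for illustration.

```python
def prop(inputs):
    """Propagated-glitch term for a node y.

    inputs: list of (p_diff, predict_xi, za_xi) tuples, one per input x_i of y,
            where p_diff = P(dy/dx_i), predict_xi = predict(x_i), za_xi = za(x_i).
    """
    denominator = sum(p * za for p, _, za in inputs)
    if denominator == 0:
        return 0.0  # assumption: no active, sensitized input means nothing propagates
    numerator = sum(p * pred * za for p, pred, za in inputs)
    return numerator / denominator


# Hypothetical node with two inputs; the first is more active and has predict = 2.0.
prop_y = prop([(0.125, 2.0, 0.4), (0.125, 0.0, 0.1)])
print(prop_y)
```

In this example the more active input carries four times the weight of the other, so prop(y) lands near that input's predict value.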
Page 28
Experimental Methodology
• Divide 14 circuits into 2 groups: characterization circuits and test circuits
1) Tune model for specific CAD flow & device using characterization circuits
2) Apply model to predict activity in test circuits
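The slides do not say how the model is tuned. One plausible approach, shown here purely as an assumption, is an ordinary least-squares fit of α, β, and φ in predict = α·gen + β·prop + φ against activity increases observed on the characterization circuits. The sample data below is synthetic, and γ is taken as already fixed inside gen.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_model(samples):
    """Least-squares fit (assumed tuning method) of (alpha, beta, phi).

    samples: list of (gen_value, prop_value, observed_increase) per net.
    """
    rows = [(g, p, 1.0) for g, p, _ in samples]
    targets = [t for _, _, t in samples]
    # Normal equations: (A^T A) x = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_linear(ata, atb)


# Synthetic samples generated from alpha = 2, beta = 3, phi = 1 (not real data).
samples = [(1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8), (0, 0, 1)]
alpha, beta, phi = fit_model(samples)
print(alpha, beta, phi)
```

The fitted coefficients would then be frozen and applied unchanged to the test circuits, matching the two-step split described on this slide.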
Page 29
Experimental Methodology
• Two prediction scenarios:
  – Predict routed delay activity from zero delay activity
  – Predict routed delay activity from logic delay activity
• Static probability, zero/logic activity extracted from simulation
  – Parameters can also be computed using probabilistic approaches
Page 30
Model Tuning
[Figure: scatter plot of % increase in activity (zero delay sim. to routed delay sim.), y-axis 0 to 700, vs. predict function value, x-axis 50 to 400]
• High activity vector set simulation of characterization circuits
Page 31
Results
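Prediction from zero delay activity data:
[Figure: bar chart of % error (y-axis 0 to 90) for circuits alu4, seq, apex4, pair, cps, dalu, misex3. Series: zero delay activity mean absolute error, predicted activity mean absolute error.]
• Mean error reduced by a factor of 2 for most circuits
• Error of zero delay activity and of predicted activity:

$$err = \sum_{n \in nets} \frac{|ra(n) - za(n)|}{ra(n)} \qquad err = \sum_{n \in nets} \frac{|ra(n) - pa(n)|}{ra(n)}$$

  where za(n) = zero delay activity, ra(n) = routed delay activity, pa(n) = predicted activity of net n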
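The error measure compares an activity estimate against routed delay activity, net by net. The sketch below assumes the per-net relative errors are averaged over nets and reported as a percentage, and that nets with zero routed activity are skipped. Both choices are assumptions, and the activity values are made up.

```python
def mean_absolute_error_pct(routed, estimate):
    """Mean of |ra(n) - est(n)| / ra(n) over nets, in percent.

    routed, estimate: dicts mapping net name -> activity.
    Nets with zero routed activity are skipped (assumption) to avoid dividing by zero.
    """
    errors = [abs(routed[n] - estimate[n]) / routed[n]
              for n in routed if routed[n] > 0]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical activities for three nets.
routed_act = {"n1": 0.5, "n2": 0.2, "n3": 0.4}
zero_delay_act = {"n1": 0.25, "n2": 0.2, "n3": 0.3}

err_zero = mean_absolute_error_pct(routed_act, zero_delay_act)
print(f"{err_zero:.1f}%")
```

The same function applies to predicted activity by passing pa(n) values as the estimate, which makes the two bar series directly comparable.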
Page 32
Results
Prediction from logic delay activity data:
[Figure: bar chart of % error (y-axis 0 to 90) for circuits alu4, seq, apex4, pair, cps, dalu, misex3. Series: logic delay activity mean absolute error, predicted activity mean absolute error.]
Page 33
Results
Error histogram for zero delay activity, predicted activity:
[Figure: histogram of % of all nets vs. % error (x-axis -100 to 100). Series: zero delay activity, predicted activity.]
Page 34
Results
Error histogram for logic delay activity, predicted activity:
[Figure: histogram of % of all nets vs. % error (x-axis -100 to 100). Series: logic delay activity, predicted activity.]
Page 35
Results Summary
• Mean absolute error in activity reduced by factor of 2 for many circuits
• Zero/logic delay activities have one-sided error bias
  – Will consistently underestimate power
• Prediction model: one-sided error bias is eliminated
  – Better avg. power estimates
Page 36
Summary
• Switching activity analysis:
  – Differences between zero, logic, routed delay activity can be significant
  – Glitching severity depends on input activity
• Pre-layout activity prediction:
  – A difficult problem
  – Demonstrated prediction approach based on circuit structure/functionality
  – Mean activity error reduced, one-sided error bias eliminated