Slide 2 Techniques for VLSI Circuit Optimization Considering
Process Variations Mahalingam Venkataraman Department of Computer
Science and Engineering University of South Florida, Tampa, FL,
33620 Chair: Prof. Babu Joseph Major Professor: Prof. Nagarajan
Ranganathan Committee Members: Prof. Srinivas Katkoori Prof. Hao
Zheng Prof. Justin E. Harlow Prof. Kandethody Ramachandran Prof.
Sanjuktha Bhanja 1 Mahalingam Venkataraman, PhD Defense Date:
3/23/2009 Slide 3 Outline of Presentation: Introduction, Motivation
and Contributions; Variation Aware Gate Sizing; Variation Aware Timing
based Placement; Variation Aware Buffer Insertion and Driver Sizing;
Dynamic Clock Stretching; Conclusions; Publications. Slide 4 VLSI
Circuit Complexity Technology scaling: improved multi-functional
features and enhanced performance; increased failure probability.
[Figure: transistor count by processor: Northwood 55 million, Prescott
125 million, Yonah 151 million, Wolfdale 410 million. Source: Intel]
Slide 5 Nanometer Dimensions [Figure: length scale from 1 m through
10 cm, 1 cm, 1 mm, 100 µm and 10 µm down to 100 nm, ending at a 65 nm
transistor. Sources: Intel; Spektrum der Wissenschaften. Courtesy:
Sill, PGPEE 2008] Slide 6 Process Variations Process
variations, in general, refer to the difference between the
intended and obtained values of voltage and process parameters
before and after fabrication of the circuit. The variations are more
pronounced in the nanometer era due to the limitations of fabrication
equipment and the lithography process. Process variations in the
nanometer era have an impact on the failure probability and hence the
timing yield of integrated circuits. Slide 7 VLSI Circuit Optimization
Circuit optimization in the nanometer era is formally defined as
the process of designing circuits with the best possible power, delay
and noise parameters. Common methods: transistor/gate sizing, wire
sizing, incremental placement, multiple supply and threshold voltages,
buffer insertion. The relationships among the parameters are
conflicting: circuits with optimal power can have poor performance
and/or noise values. Process variations have made the relationships
among the conflicting optimization objectives more complex and hence
more difficult to optimize. Slide 8 Motivation: Dissertation
Research Corner-based circuit optimization that ignores variation
effects can negatively impact timing yield. Worst-case consideration
of variations guarantees good yield, but can lead to severe
over-design. In this context, there is a strong need to re-invent
circuit optimization techniques from a statistical perspective. The
methodology has to consider multiple conflicting objectives, model
variation effects without assumptions regarding distributions, and
be efficient enough to handle large circuits. Hence, in this
dissertation, we model and develop novel statistical and runtime
variation aware solutions for circuit optimization considering
process variations. Slide 9 Statistical Timing Analysis [Figure:
static timing analysis maps element delay (min/max) to circuit delay;
statistical timing analysis maps element delay as PDF/CDF to circuit
delay as PDF/CDF] Variation awareness in VLSI started with PDF/CDF
propagation in timing analysis. Circuit optimization frameworks were
then built on top of the SSTA engine to optimize performance
considering variations. Slide 10 Mathematical Programming based Circuit
Optimization SSTA-based iterative circuit optimization requires a
number of complicated operations at each node and hence incurs a
prohibitive runtime [Schmidt, EJOR 2000, Karkowski, ICSS 1995].
Hence, the authors in [Mani, DAC 2005, Mani, ICCD 2004] proposed
stochastic mathematical programming based circuit optimization.
Mathematical programs are fast and have the capability to handle
large circuits. Several circuit optimization problems, like gate
sizing, buffer insertion and placement, have well-defined
mathematical programming formulations. The stochastic programming
technique is reasonably fast, but can be conservative in terms of
yield and hence gives smaller savings in area or power [Buckley, IJFSS
1990]. Slide 11 Fuzzy Mathematical Programming (FMP) FMP is a
special case of mathematical programming with fuzzy variables in
constraints or objective functions. Variations are modelled as
fuzzy numbers. Similar to stochastic programming, fuzzy programming
involves a relaxation step. FMP has been used to model uncertainty
in scheduling, binding, testing, robotics, pattern matching and
artificial intelligence. A fuzzy number (linear, trapezoidal or
non-linear) is defined as a number whose precise value is somewhat
uncertain. Slide 12 Motivation: Fuzzy Programming The author, in
[Buckley, IJFSS 1990], highlighted that fuzzy programming
guarantees solutions better than or at least as good as stochastic
programming and showed this using Monte-Carlo simulations. The
bound constraints in fuzzy programming allow the FMP to search for
the optimal value instead of averaging a list of close-to-optimal
values as in stochastic programming. Fuzzy programming also handles
the variation parameter in the objective function, as opposed to in
the constraints as in stochastic programming. Hence, we planned to use
fuzzy programming based modeling and solution for uncertainty aware
VLSI circuit optimization. Slide 13 Motivation: Dynamic Clock
Stretching The proposed statistical design methods (fuzzy or
stochastic) are quite effective in the presence of variations,
incurring reasonable overheads. However, when no variations occur
in the critical paths, the overheads still remain.
To avoid this, we investigate a completely different approach to
handle process variations. A dynamic delay detection and clock
stretching technique is proposed to combat the effects of process
variations. Slide 14 Contributions of This Dissertation Major
Contributions: Modeling Variations in Process Parameters using Fuzzy
Membership Functions; Fuzzy Linear Programming Formulation for
Variation Aware Gate Sizing; Fuzzy and Stochastic Nonlinear
Programming Formulation for Variation Aware Timing Based Placement;
Fuzzy Piece-Wise Linear Programming Formulation for Variation Aware
Buffer Insertion and Driver Sizing at Logic and Layout Level; A
Process Variation Tolerant Circuit Design using Dynamic Clock
Stretching. Slide 15 Outline of Presentation: Introduction, Motivation
and Contributions; Variation Aware Gate Sizing (Gate Sizing, Previous
works, Fuzzy LP Formulation, Fuzzy Gate Sizing Algorithm, Experimental
Results); Variation Aware Timing based Placement; Variation Aware
Buffer Insertion and Driver Sizing; Dynamic Clock Stretching;
Conclusions; Publications. Slide 16
Variation Aware Gate Sizing (VA-GS) Gate sizing is one of the
simplest, yet most effective, techniques for improving the
power/performance trade-off in VLSI circuits. Increasing the size of a
gate increases both performance and power consumption. The problem of
gate sizing is well suited to be formulated as a mathematical
programming problem. In this work, we formulate variation aware gate
sizing as a fuzzy linear programming problem, maximizing timing yield
with power and delay as constraints. Slide 17 Previous Work: (VA-GS)
Statistical Static Timing Analysis (SSTA) methods [Hashimoto, ISPD
2000], [Devgan, ICCAD 03], [Blaauw, DAC 04]: continuous functions are
propagated instead of discrete values, using max and add operations on
continuous functions. Penalty based circuit optimization
[Visweswariah et al., DAC 02] used penalty functions in constraints to
avoid building a wall of timing critical paths. Stochastic
optimization using chance constrained programming [Mani, ICCD 04,
Mani, DAC 05] models uncertainty in delay using probabilistic
constraints. Slide 18 Previous Works Slide 19 Variation Aware Gate
Sizing Outline Step
1: Formulation of linear models for gate delay and dynamic power as
functions of gate sizes. Step 2: Modeling process variation in gate
delay coefficients by treating them as triangular fuzzy numbers.
Step 3: Formulating and solving the LP for deterministic gate
sizing with the variation parameters set to the worst and typical
cases, which yields the bounds for the fuzzy formulation. Step 4: The bound
values generated above are used to convert fuzzy formulation into a
corresponding crisp formulation using symmetric relaxation. Step 5:
The crisp optimization problem is then solved through a commercial
nonlinear optimization solver. 18 Slide 20 Step 1: Power and Timing
Models The power consumption of a gate is fitted as a linear
function of the gate size ($s_i$) only. The linear approximation for
gate delay is adopted from [Berkelaar, EDAC 90], where a, b, c are
constant coefficients from SPICE simulations, fo(i) is the fan-out of
gate i, and $s_i$ is the size of gate i. The delay equation describes
gate delay ($d_i$) as a function of gate size ($s_i$) and the sizes of
its fan-out gates. Slide 21 Step 2: Modeling Variations The variations in
gate length and oxide thickness are translated to coefficients b
and c in the delay equation. The actual physical variability of
these coefficients is unknown, but they closely approximate gate
length and oxide thickness [Mani, ICCD 04]. The fuzzy coefficients
are modeled as triangular fuzzy numbers of the form
$(b_i, b_i - g_i, b_i + g_i)$ and $(c_i, c_i - h_i, c_i + h_i)$, where
the coefficients $g_i$ and $h_i$ represent the maximum variations.
Slide 22 Step 3:
Deterministic Gate Sizing: LP Formulation In this work, we use a
delay constrained power minimization formulation for gate sizing.
The deterministic version of the gate sizing optimization problem
minimizes $\sum_i P_i$ subject to $D_p \le T_{spec}$ for every path p,
where $P_i$ is the power consumption of gate i, $D_p$ is the delay of
path p and $T_{spec}$ is the required timing specification of the
circuit. The variations in delay are transferred to the
coefficients b and c in the delay equation. Slide 23 Step 3:
Pre-Processing for Creating Crisp Problem The deterministic LP
problem is solved with gate delay set to the worst case (wc_sizing).
Next, the deterministic LP problem is also solved with gate delay set
to the nominal case (nc_sizing). The solutions to these
optimizations represent the lower and upper bound values for the
variation aware fuzzy gate sizing problem. Slide 24 Step 4: Variation
Aware Fuzzy Gate Sizing Using these bound values from the
pre-processing step and a variation parameter ($\lambda$), the fuzzy
linear programming problem is converted to a crisp programming
problem. The solution to the crisp problem lies between the bound
values and represents an overall degree of satisfaction of the
variation parameters and the objectives of the optimization problem.
Slide 25 Step 4: Crisp Nonlinear VA-GS Problem The crisp problem for
VA-GS is expressed in terms of $\lambda$, nc_sizing and wc_sizing,
where $\lambda$ is the variation parameter, nc_sizing and wc_sizing
are the values of the objective functions from the deterministic
pre-processing optimizations, and $\lambda$ varies from 0 to 1. The
crisp problem maximizes the variation resistance (robustness), bounds
the power value and satisfies the delay constraints in an optimal
fashion. Slide 26
Reducing Computations in VA-GS The path based delay constraints in
the gate sizing problem are converted to node based constraints. This
translation only introduces a sub-optimality of 1-2%, but
increases the feasibility of optimizing large circuits, since the
number of paths is exponential. The spatial correlation of the process
variations can be handled by processing the circuit as n smaller
regions. Gates in a specific region are assumed to have the same
range of variation values [Singh, DAC 05]. Slide 27 Step 5: VA-GS
Simulation VA-GS was tested on ITC99 circuits, with the models written
in the AMPL mathematical programming language format and solved with
KNITRO, a commercial non-linear optimization solver. A variation of
25% in gate delay was assumed in accordance with [Nassif, ISSCC 2000].
Slide 28 Experimental
Results The variation aware fuzzy gate sizing approach provides an
average improvement of 18% compared to DWC and 9% compared to
stochastic gate sizing, without compromising on timing yield. Slide
29 Spatial Correlations We use a grid based correlation model. The
approach divides the design into a number of regions; gates in the
same or nearby regions are highly correlated compared to
gates in different regions. A pre-processing step incorporates the
correlation effects into coefficients $b_i$ and $c_i$ in accordance
with the locations of the gate and its fan-out gates in the chip.
Slide 30 VA-GS with Spatial Correlations The spatial correlation of
the process variations was handled by clustering the circuit into n
smaller regions, similar to [Singh, DAC 05]. The variation aware gate
sizing with spatial correlations had an extra 3% improvement due to
reduced pessimism in modeling variations. Slide 31 Monte-Carlo
simulation The solution of the fuzzy technique is verified for
timing yield using Monte-Carlo simulation. We generated 10,000
copies of each benchmark circuit with random gate delay coefficients
and fixed gate sizes taken from the solution of the fuzzy approach. The
delay coefficients corresponding to gate length and oxide thickness
were treated as random numbers within the nominal case to worst
case range. The timing yield is defined as the fraction of the random
circuits whose delay is less than the Tspec value. The proposed fuzzy
approach indicates a timing yield of 99% for the ITC benchmark
circuits. Slide 32 Outline of Presentation: Introduction, Motivation
and Contributions; Variation Aware Gate Sizing; Variation Aware Timing
based Placement (Timing Based Placement (TBP), Previous works, Problem
Formulation, Variation Aware Fuzzy TBP, Stochastic TBP, Experimental
Results); Variation Aware Buffer Insertion and Driver Sizing; Dynamic
Clock Stretching; Conclusions; Publications. Slide 33 Timing Based
Placement (TBP) Incremental placement for delay improvement is a
crucial step in the post-layout timing convergence flow. TBP
makes small changes to the cell locations, after wire length
driven standard cell placement, with the objective of improving
worst negative slack. Previous work on timing driven placement
[Choi, ICCAD 03] has shown significant improvements (up to 20%)
in worst negative slack. Slide 34 Variation Aware TBP The
objective of timing based placement is to find optimal locations of
cells in a critical sub-circuit such that the critical delay of the
circuit is minimized. The timing based placement technique requires
a nonlinear programming approach, as net delay has a quadratic
dependence on net length. We propose two new solutions for variation
aware timing based placement: (i) a fuzzy nonlinear programming based
solution and (ii) a stochastic chance constrained programming based
solution. Slide 35 Previous works on TBP Timing driven
placement can be categorized into net-based [Ren, ISPD 04] and path
based [Chowdhary, DAC 05] approaches. The net-based approach
translates the timing requirements into sensitivity coefficients of
timing critical nets and performs a weighted wire length
minimization. Hence, modeling the effects of process variations in
these net-based approaches is not straightforward. The path based
approaches hold an accurate timing view and minimize critical path
delay more directly by involving path delay constraints in the
optimization problem. Slide 36 Previous works on TBP A problem
with the path based approach is its high computational complexity,
due to the exponential number of paths in the circuit. However, path
based delay constraints can be transformed into node-based arrival
time constraints [Chowdhary, DAC 05] to improve the feasibility of
optimizing large circuits. The transformation only introduces a
sub-optimality of 1-2% [Mani, ICCD 04]. Hence, we model process
variations in a node based formulation of timing based placement to
maximize yield, with delay and placement location constraints.
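The saving from node-based constraints can be seen on a small example.
The sketch below is illustrative only: the "ladder" circuit, the gate
delays and all function names are assumptions, not taken from the
dissertation. It counts path constraints against edge (node-based)
constraints and checks that, for fixed delays, propagating arrival times
gives the same worst delay as enumerating every path. The 1-2%
sub-optimality quoted above concerns the optimization itself, which is
not modeled here.

```python
from itertools import product

def build_ladder(stages):
    """Hypothetical circuit: two parallel gates per level, each gate
    feeding both gates of the next level."""
    edges = []
    for k in range(stages - 1):
        for a, b in product(range(2), repeat=2):
            edges.append(((k, a), (k + 1, b)))
    return edges

def node_based_worst_delay(stages, edges, delay):
    """One constraint per edge: arr[v] >= arr[u] + delay[v]."""
    arr = {(0, a): delay[(0, a)] for a in range(2)}
    for k in range(1, stages):
        for b in range(2):
            v = (k, b)
            arr[v] = max(arr[u] for (u, w) in edges if w == v) + delay[v]
    return max(arr[(stages - 1, b)] for b in range(2))

def path_based_worst_delay(stages, delay):
    """One constraint per path: enumerate all 2**stages paths."""
    paths = list(product(range(2), repeat=stages))
    worst = max(sum(delay[(k, a)] for k, a in enumerate(p)) for p in paths)
    return worst, len(paths)

stages = 10
edges = build_ladder(stages)
# Arbitrary illustrative gate delays.
delay = {(k, a): 1.0 + 0.1 * ((3 * k + a) % 4)
         for k in range(stages) for a in range(2)}

worst_node = node_based_worst_delay(stages, edges, delay)
worst_path, n_paths = path_based_worst_delay(stages, delay)
print(n_paths, "path constraints vs", len(edges), "edge constraints")
print(worst_node, worst_path)
```

For ten levels the path view needs 1024 constraints while the node view
needs 36, and the gap doubles with every added level.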
Slide 37 Taxonomy VA-TBP Slide 38 Location Constraints and HPWL
The variables leftx, rightx, lowery and uppery are defined for
every net. For every cell at location (x, y) connected to the net, the
following constraints are required: $leftx \le x \le rightx$ and
$lowery \le y \le uppery$. The half perimeter wire length (HPWL) of
the net is then given by $(rightx - leftx) + (uppery - lowery)$.
[Figure: bounding box of a net with edges leftx, rightx, lowery and uppery]
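As a concrete illustration of the bounding-box variables, the sketch
below computes HPWL for one net directly from cell coordinates. The
function names and the sample coordinates are made up for the example.
In the optimization itself leftx, rightx, lowery and uppery are
variables bounded by inequalities rather than computed with min and max.

```python
def net_bounding_box(cells):
    """Return (leftx, rightx, lowery, uppery) for the cells on one net."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return min(xs), max(xs), min(ys), max(ys)

def hpwl(cells):
    """Half perimeter wire length: width plus height of the bounding box."""
    leftx, rightx, lowery, uppery = net_bounding_box(cells)
    return (rightx - leftx) + (uppery - lowery)

# Three cells on a hypothetical net.
cells = [(1.0, 2.0), (4.0, 6.0), (3.0, 1.0)]
print(net_bounding_box(cells))  # (1.0, 4.0, 1.0, 6.0)
print(hpwl(cells))              # 8.0
```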
37 Slide 39 Variation Aware Fuzzy TBP Outline Step 1: Formulation
of linear model for gate delay and nonlinear model for interconnect
delay. Step 2: Modeling process variation in delay coefficients by
treating them as triangular fuzzy numbers. Step 3: Estimate
critical cells and calculate move distance. Step 4: Formulating and
solving the NLP for TBP with the variation parameters set to the
worst and typical cases, which yields the bounds for the fuzzy formulation.
Step 5: The bound values generated above are used to convert fuzzy
formulation into a corresponding crisp formulation using symmetric
relaxation. Step 6: The crisp optimization problem is then solved
through a commercial nonlinear optimization solver. 38 Slide 40
Step 1: Gate and Interconnect Delay Models We model gate delay as a
linear function of gate size ($s_i$) and capacitance ($C_{pi}$). In
timing based placement, the gate size ($s_i$) does not change; only
the load seen by the gate changes, due to the change in interconnect
length. The interconnect delay is modeled as a quadratic function
of the net length. Hence, in this work, we
model timing based placement as a nonlinear programming problem to
maximize timing yield with delay and location constraints. Slide
41 Step 2: Modeling Variations Similar to variation aware gate
sizing, we model the uncertainty in delay using the coefficients of
the gate and interconnect delay equations. The coefficients $A_1$,
$A_2$ and $K_D$ are assumed to vary and are modelled as triangular
fuzzy numbers of the form $(A_1, A_1 - VA_1, A_1 + VA_1)$,
$(A_2, A_2 - VA_2, A_2 + VA_2)$ and $(K_D, K_D - VK_D, K_D + VK_D)$
respectively. Slide 42 Step 3: Pre-Processing The gates and
interconnects in the most critical path and the adjacency graph within
three levels are considered for incremental timing based placement.
The allowed movable distances of the critical cells are set in
proportion to the criticality and the available free space of the
cluster.
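The pre-processing step can be sketched as a breadth-first search from
the critical path. This is a minimal sketch under assumptions: the
netlist is given as an undirected adjacency dictionary, and the move
budget is a plain product of criticality and free space, which is only
one way of being "in proportion to" both. The function names, the
`scale` parameter and the toy netlist are hypothetical.

```python
from collections import deque

def cells_within_levels(adjacency, critical_path, max_levels=3):
    """Return {cell: level} for cells at most max_levels hops away
    from any cell on the critical path (level 0 = on the path)."""
    level = {c: 0 for c in critical_path}
    queue = deque(critical_path)
    while queue:
        u = queue.popleft()
        if level[u] == max_levels:
            continue  # do not expand beyond the allowed depth
        for v in adjacency.get(u, ()):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def move_budget(criticality, free_space, scale=1.0):
    """Assumed form: allowed move distance grows with both factors."""
    return scale * criticality * free_space

# Toy netlist: a chain a-b-c-d-e-f with only "a" on the critical path.
adjacency = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
selected = cells_within_levels(adjacency, ["a"])
print(selected)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(move_budget(criticality=0.5, free_space=4.0))  # 2.0
```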
Slide 43 Step 4: Deterministic TBP Formulation The deterministic
version of the incremental timing based placement problem is built
from arrival time constraints. The HPWL and location constraints are
not shown here as they are not affected by process variations. Here,
arr is the arrival time variable of gates and nets and Tspec is the
required timing specification of the circuit. The problem is
formulated to maximize the timing specification (a proxy for worst
negative slack) with node based required arrival time constraints.
Slide 44 Step 4: TBP at Nominal and Worst Case Corner The deterministic
TBP problem is solved with gate and net delay set to worst case
values (wc_tbp). Next, the deterministic TBP problem is also solved
with gate delay set to the nominal case (nc_tbp). The solutions to
these optimizations represent the lower and upper bound values for the
variation aware incremental timing based placement problem. Slide
45 Step 5: Crisp TBP Formulation Using these bound values from the
pre-processing step and a variation parameter ($\lambda$), the
uncertain nonlinear programming problem is converted to a crisp
nonlinear problem. The problem aims to maximize variation resistance
($\lambda$) and maintains the timing specification between the bound
values (wc_tbp and nc_tbp). Slide 46 Stochastic Timing Based
Placement The stochastic formulation is cast as a robust
mathematical program, which captures variation effects on the
constraints using the mean and variance of the uncertain
parameters. The stochastic chance constrained programming technique
models uncertainty in delay using probabilistic constraints.
Slide 47 Probabilistic Constraints The uncertain arrival time
constraints are modeled as probabilistic constraints, where the
probability at which each constraint has to be met corresponds to
the timing yield of the circuit. The probabilistic constraints are
relaxed to an equivalent formulation in terms of the mean, cumulative
distribution and standard deviation. Slide 48 Stochastic TBP In the
resultant stochastic TBP problem, the standard deviation ($\sigma$) of
each uncertain term is scaled by the inverse CDF value of the
distribution. In accordance with previous work [Prekopa, Kluwer
95], an inverse CDF value of 3 is used for a timing yield of 99.7%.
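The relaxation of a probabilistic constraint to a mean-plus-k-sigma
bound can be checked numerically. The sketch below assumes a normally
distributed path delay with made-up mean and standard deviation; the
variable names are illustrative. It shows that a multiplier of 3 covers
about 99.73% of outcomes when both tails are counted and about 99.87%
for the one-sided "delay <= bound" case, then confirms the one-sided
figure by sampling.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()           # zero mean, unit variance
k = 3.0                             # inverse CDF value used on the slide

one_sided = std_normal.cdf(k)       # P(delay <= mean + 3*sigma)
two_sided = 2.0 * one_sided - 1.0   # P(|delay - mean| <= 3*sigma)
print(round(one_sided, 5), round(two_sided, 5))  # 0.99865 0.9973

# Deterministic equivalent of P(delay <= t_bound) >= yield target:
mean_delay, sigma_delay = 10.0, 0.5  # hypothetical values
t_bound = mean_delay + k * sigma_delay

# Monte-Carlo check of the fraction of samples meeting the bound.
rng = random.Random(0)
samples = 100_000
met = sum(rng.gauss(mean_delay, sigma_delay) <= t_bound
          for _ in range(samples))
empirical_yield = met / samples
print(t_bound, empirical_yield)
```

Going the other way, `NormalDist().inv_cdf(p)` returns the multiplier
needed for a one-sided yield target p under the same normality
assumption.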
Slide 49 Step 6: VA-TBP Simulation VA-TBP was tested on ITC99
benchmark circuits. The KNITRO solver, available through NEOS, is used
for both formulations, which are described in AMPL format. Slide 50
Experimental Results The variation aware fuzzy placement approach
provides an average improvement of 12% compared to DWC, and the
stochastic placement methodology provides 10% compared to DWC. Slide 51
Outline of Presentation: Introduction, Motivation and Contributions;
Variation Aware Gate Sizing; Variation Aware Timing based Placement;
Variation Aware Buffer Insertion and Driver Sizing (Buffer Insertion
and Driver Sizing (BIDS), Deterministic BIDS, Logic Level BIDS, Logic
versus Layout Level BIDS, Experimental Results); Dynamic Clock
Stretching; Conclusions; Publications. Slide 52 Buffer Insertion and
Driver Sizing (BIDS)
The impact of interconnect driven performance optimization is
increasing in the nanometer era. In prior buffer insertion
techniques, wires are divided into smaller segments, which brings
the wire delay to almost linear in terms of its length. It has also
been pointed out in [Saxena, TCAD 04] that 35% of the total
standard logic cells in a circuit will be buffers at the 65nm
technology level. Further, several works have pointed out that
buffer insertion coupled with driver sizing, in the optimization
phase, can reduce the number of buffers inserted. Slide 53
Previous Works: Buffer Insertion Buffer insertion techniques:
Net-based: net-based ordering mechanisms may lead to sub-optimal
over-buffering due to a lack of global view. Path-based: path
based buffer insertion algorithms abstract a whole path as a net
and often result in over-buffering in the paths that are considered
first. Network-based: network-based buffer insertion techniques
consider a whole circuit as input and insert buffers with a global
view. Slide 54 Logic Level Variation Aware BIDS We formulate the
buffer insertion and driver sizing problem at the logic level as a
piece-wise linear program with variations modeled as fuzzy numbers.
Piece-wise linear constraints are used for modeling buffer
insertion when multiple buffers are to be inserted in a net
segment. A look-up table based approximation is used for net length
modeling at the logic level. The number of buffers and the gate sizes
are used as a proxy for dynamic power consumption during BIDS. Slide 55
Gate and Interconnect Delay Models Similar to several LP based
sizing works, we model gate delay as a function of gate size ($s_i$)
and downstream capacitance ($C_{load}$). The interconnect delay, on
the other hand, can only be modeled as a quadratic function of the net
length. To reduce complexity, we model the BIDS problem with
piece-wise linear delay constraints. Slide 56 Logic Level Net
Length Estimation Accurate modeling of the interconnect length at
the logic level is crucial to optimization at this level. In this
work, we estimate wire length using a fast and accurate lookup
table based estimation. Previous works have used Rent's rule to
derive upper bounds for interconnection lengths. Rent's rule,
however, does not hold true at all levels of the partition hierarchy
in the nanometer era. Hence, we use a table based methodology with the
number of cells/interconnects and the fan-out count of each cell as
the address for look-up. Slide 57 Logic Level Net Length Estimation
The look-up table is created from layout-level wire length results
of sample benchmark circuits. MCNC benchmark circuits with gate
complexity ranging from 500 to 10000 gates were used for estimation.
Interconnects with the same fan-out count are grouped and the average
net length for each fan-out count is calculated. For each fan-out
count, nets are averaged again based on gate count in the second
dimension. A maximum fan-out of 20 is assumed, and all nets with a
fan-out count above 20 are treated as fan-out 20. Slide 58
Deterministic-BIDS The BIDS problem is formulated to minimize buffer
and gate cost with piece-wise required time constraints. Slide 59
Modeling Variations The variations in delay are modeled through the
coefficients in the required time constraints. The coefficients cb1,
cb2 and cg1 are modeled as fuzzy numbers with worst case values
(cb1-vb1), (cb2-vb2) and (cg1-vg1), where vb1, vb2 and vg1 are the
maximum variation values. Slide 60 Pre-Processing (Deterministic
Optimizations) The fuzzy mechanism starts with a set of
pre-processing optimization steps with the varying parameters set
to worst case and nominal case values. Here, we model uncertainty
due to process variations as an imprecision in the delay
improvement due to BIDS. The coefficients cg1, cb1 and cb2 are
modeled as triangular fuzzy numbers. The deterministic problem is
first solved with the variation coefficients set to cg1-vg1, cb1-vb1
and cb2-vb2 (Obj_wc), and then with cg1, cb1 and cb2 set to their mean
values (Obj_nc). Slide 61 Conversion to Crisp Formulation Obj_wc
and Obj_nc from the deterministic-BIDS are the worst case and
nominal case objective values. With these pre-processed
objective (Obj) values and a variation resistance parameter
(lambda), the fuzzy problem is converted to a crisp problem.
Slide 62 Experimental
Setup The simulation flow for the fuzzy-BIDS is shown in Figure.
Fuzzy-BIDS was tested on ITC 99 benchmark circuits mapped to a user
defined technology library. The models are written in the AMPL
mathematical programming language format and solved with the KNITRO
interior point non-linear optimization solver.
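The conversion from a fuzzy problem to a crisp one (worst-case and
nominal-case objective values plus a variation resistance parameter
lambda) can be illustrated on a toy one-variable problem. Everything in
the sketch is an assumption made for illustration: a single gate of size
s with delay a - (b - lambda*v)*s, cost equal to s, a cost bound that
slides linearly from Obj_wc at lambda = 0 to Obj_nc at lambda = 1, and
bisection in place of the AMPL/KNITRO flow. It is one common form of
symmetric relaxation, not the dissertation's exact crisp constraints.

```python
def min_cost(lam, a=10.0, b=1.0, v=0.25, t_spec=6.0):
    """Smallest size s meeting a - (b - lam*v)*s <= t_spec.
    lam = 0 uses the nominal coefficient b, lam = 1 the worst case b - v."""
    return (a - t_spec) / (b - lam * v)

def max_variation_resistance(tol=1e-9):
    """Largest lam for which the delay constraint (robust to a fraction
    lam of the variation) and the sliding cost bound both hold."""
    obj_nc = min_cost(0.0)  # nominal-case objective (pre-processing)
    obj_wc = min_cost(1.0)  # worst-case objective (pre-processing)

    def feasible(lam):
        cost_bound = obj_wc - lam * (obj_wc - obj_nc)
        return min_cost(lam) <= cost_bound

    lo, hi = 0.0, 1.0       # feasible at 0, infeasible at 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo, obj_nc, obj_wc

lam_star, obj_nc, obj_wc = max_variation_resistance()
print(round(lam_star, 4), round(obj_nc, 4), round(obj_wc, 4))
print(round(min_cost(lam_star), 4))  # cost of the compromise solution
```

The resulting cost lies between the nominal-case and worst-case
objective values, which is the behaviour the slides describe for the
crisp solution.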
Slide 63 Experimental Results The variation aware logic level
fuzzy-BIDS approach provides an average improvement of 35% in the
number of buffers and gate cost required to meet performance and
yield targets. Slide 64 Layout Level Variation Aware BIDS The
variation aware buffer insertion at the layout level is formulated
to optimize variation resistance with delay and cost (number of
buffers and gate sizes) as constraints. Layout level buffer
insertion, however, restricts the candidate buffer locations to avoid
repeating the place and route step. Candidate buffer locations are
generated by dividing the routed wires into channels. Sparse channels
were preferred as candidates over denser ones. An incremental
legalization step is performed after the layout level buffer insertion
to remove overlaps. Slide 65 Layout Level BIDS Simulation The benchmark
circuits for layout level BIDS were placed and routed using the
Cadence Encounter design tool to estimate actual wire lengths. Similar
to the logic level simulations, the layout level AMPL models were
solved with the KNITRO nonlinear programming (NLP) solver. The AMPL
models were rebuilt for the layout level with worst-case, nominal-case
and fuzzy modeling. Slide 66 Logic Level versus Layout Level BIDS
The cost function (number of buffers plus gate size increments)
comparing logic and layout level BIDS for various benchmarks is
shown in Figure. The average difference (among all benchmarks) in
buffer plus gate cost between logic and layout level simulations is
within 10%. Slide 67 Outline of Presentation: Introduction, Motivation
and Contributions; Variation Aware Gate Sizing; Variation Aware Timing
based Placement; Variation Aware Buffer Insertion and Driver Sizing;
Dynamic Clock Stretching (Motivation, Related Work, Proposed
Methodology, Example Evaluation, Experimental Results); Conclusions;
Publications. Slide 68
Dynamic Clock Stretching: Motivation Statistical optimization
methods (fuzzy, stochastic) have been effective in improving the
yield/cost tradeoffs for circuits in the nanometer era. However,
statistical design methods over-consume power/delay even in the
absence of variations. Hence, solutions that can dynamically detect
delay due to variations and perform corrective/preventive action are
becoming necessary. Here, we propose a dynamic delay detection and
clock stretching technique to prevent timing violations. Slide 69
Related Work: Dynamic Error Correction using Adaptive Voltage
Scaling (RAZOR: Dan Ernst et al., MICRO 2003) The methodology uses a
shadow latch to capture delayed transitions and generates an error
signal, which is sent to the voltage controller. The technique uses
current timing failures to prevent them from happening in later
cycles. Slide 70 Related Work: Critical Path Isolation Based
Variation Tolerance [Ghosh, ICCAD 2003] The methodology isolates
critical paths and evaluates the data in two cycles whenever critical
paths are activated. It works well on special designs with few
critical paths, but incurs delay overhead on random designs.
Slide 71 Related Work: Other Methods Adaptive voltage scaling based
on critical path duplication [Burd, ISSCC 2000]. Clock phase
adjustment based on a dynamic delay buffer cell [Semiao, DDECS 2008].
The dynamic delay buffer and critical path duplication methods do not
consider spatial correlation. The dynamic delay buffer design
considers variations in process parameters and ignores temperature and
voltage variations. Slide 72 Basic Idea Irrespective of the
variations occurring (P, V or T), we would like to investigate
solutions at the circuit level to combat variations with significantly
lower overheads. Identify and capture the delay due to process
variations early in the clock period. Employ a delay detection
circuit to identify whether a transition is delayed in the critical
paths. Delay the clock (or select a delayed clock) in the event that
the arrival of a signal is delayed due to process variations.
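The basic idea can be captured in a small behavioural model. The sketch
below is not a circuit description. It assumes a 50% duty-cycle clock,
ignores setup time, places the monitored net halfway along the critical
path (so its transition arrives at half the path delay), and uses the
approximately 10% stretch margin mentioned on the later slides. The
function names are hypothetical.

```python
def select_capture_time(midpoint_arrival, period, stretch=0.10):
    """Choose the normal or the delayed clock for the destination
    flip-flop. A transition that reaches the monitored net after the
    negative clock edge (period / 2) is treated as late."""
    late = midpoint_arrival > period / 2
    capture_time = period * (1.0 + stretch) if late else period
    return capture_time, late

def captures_correctly(path_delay, period, stretch=0.10):
    """Return (data captured in time, clock was stretched)."""
    midpoint_arrival = path_delay / 2  # monitored net is halfway
    capture_time, stretched = select_capture_time(
        midpoint_arrival, period, stretch)
    return path_delay <= capture_time, stretched

period = 10.0
for path_delay in (9.0, 10.6, 11.5):
    ok, stretched = captures_correctly(path_delay, period)
    print(path_delay, "captured" if ok else "timing failure",
          "(stretched)" if stretched else "(normal clock)")
```

A path that is slightly slower than the clock period is rescued by the
stretched clock, while a path slower than the stretched period still
fails, which is why the stretch range is a design parameter.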
Slide 73 Proposed Work: Dynamic Clock Stretching for Variation
Tolerance Slide 74 Proposed Work: Discussion An important
pre-processing step is the identification of critical
locations (interconnects) halfway along the critical path. In the
presence (absence) of variations, the transitions occur after
(before) the negative edge of the clock. The positive level
triggered latch, shown in Figure, captures the value on the
critical interconnect at the positive level of the clock. Slide
75 Proposed Work: Discussion If the transition is delayed due to
process variations, then the inputs to the XOR gate will be
different. The multiplexor selects the normal (undelayed) or
delayed clock for the destination flip-flop based on the value of
the XOR gate output. In the proposed approach, the delayed clock can
be dynamically selected in case the signal propagation in the data
path is delayed due to process variations. Slide 76 Proposed
Work: Discussion The delay detection and clock stretching logic
(CSL) is added to the critical and near critical paths that can
potentially have timing failure due to process variations Unlike
voltage or frequency scaling, the proposed methodology can provide
immediate activation and enable prevention of timing failures Since
the detection circuit monitors data transitions on critical
interconnects, the methodology is independent of the type of
process variation (PVT). 75 Slide 77 Example Circuit Evaluation A
chain of inverters in between two flip-flops stages is chosen as
the example circuit. In this circuit, all interconnects in the path
switch making the net halfway in the path, the necessary critical
interconnect. Next, we show the simulation snapshots for the
example circuit simulation 76 Slide 78 Simulation Snapshot of
Example Circuit 77 Slide 79 Simulation Snapshot of Example Circuit
Slide 80 Simulation Snapshot of Example Circuit
Slide 81 Short Paths and Pipelined Critical Paths
In the context of clock stretching, the issues of short paths and of consecutive pipelined critical paths have to be addressed.
In nanometer designs, short paths are usually rare due to the multiple objectives of power, performance and yield.
In addition, this work uses only a small clock-stretching margin (approximately 10%), which minimizes the possibility of short-path failures.
Second, in pipelined circuits, if a critical path is followed by another critical path in the next pipeline stage, the CSL methodology can cause timing failures, because the delayed clock reduces the data capture time available in the subsequent pipeline stage.
Slide 82 Monte-Carlo Flow for Timing Yield Estimation
The simulation flow for timing yield estimation is shown in the figure.
A simple C program was developed to estimate timing yield using the place-and-route and timing analysis reports.
Slide 83 Timing Yield Results on Benchmarks
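A timing-yield figure of this kind can be estimated by Monte-Carlo sampling. The sketch below is a hedged Python illustration, not the C program mentioned in the presentation. It assumes each critical-path delay varies independently as a Gaussian around its nominal value, and it models clock stretching simply as a longer capture time available to every path. The function name `estimate_timing_yield`, the parameters `sigma_fraction` and `stretch_fraction`, and the sample delays are all hypothetical.

```python
import random


def estimate_timing_yield(nominal_delays, clock_period, sigma_fraction=0.05,
                          stretch_fraction=0.0, trials=10_000, seed=0):
    """Return the fraction of sampled chips on which every path delay
    fits within the (optionally stretched) capture time."""
    rng = random.Random(seed)
    capture = clock_period * (1.0 + stretch_fraction)
    passing = 0
    for _ in range(trials):
        # Sample every path on each trial (no short-circuit) so that runs
        # with the same seed see identical delay samples.
        sampled = [rng.gauss(d, sigma_fraction * d) for d in nominal_delays]
        if max(sampled) <= capture:
            passing += 1
    return passing / trials


if __name__ == "__main__":
    paths = [9.5, 9.0, 8.0]  # illustrative nominal delays, arbitrary units
    print(estimate_timing_yield(paths, clock_period=10.0))
    print(estimate_timing_yield(paths, clock_period=10.0, stretch_fraction=0.10))
```

Because both calls use the same seed, they evaluate the same sampled delays, so any difference in the two estimates comes only from the stretch.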
The number of critical paths can be reduced by incremental sizing or placement to lower the CSL overhead.
Slide 84 Clock Stretch Range Versus Timing Yield
The graph shows the impact of the clock stretch range on timing yield.
A stretch of 10% is a good choice, balancing the objective of high timing yield against the risk of short-path failures.
Slide 85 Conclusions
In this research, we have proposed solutions for improving timing yield under variations without significant over-design.
Fuzzy modeling is shown to model variations effectively in linear, nonlinear and piecewise-linear circuit optimization problems.
Hence, the algorithms and circuit optimization methods proposed in this dissertation research represent significant additions to VLSI CAD tools in the context of variation-aware design.
Slide 86 Conclusions
The proposed circuit-level technique dynamically detects signal delays caused by variations and stretches the clock to add the required extra slack.
This method is expected to make a significant impact in industry and is a fundamentally different approach from previous work.
Slide 87 Acknowledgements
Semiconductor
Research Corporation contract 2007-HJ-1596
NSF Computing Research Infrastructure grant CNS-0551621
Slide 88 Publications
1. V. Mahalingam, N. Ranganathan and J.E. Harlow, "Fuzzy Optimization Approach for Gate Sizing in the Presence of Process Variations", IEEE Transactions on VLSI Systems, 16(8), Pages 975-984, Aug 2008
2. V. Mahalingam and N. Ranganathan, "Timing Based Placement Considering Uncertainty due to Process Variations", Accepted for Publication (Feb 2009) in IEEE Transactions on VLSI Systems
3. V. Mahalingam and N. Ranganathan, "Improving Accuracy in Mitchell's Logarithmic Multiplication using Operand Decomposition", IEEE Transactions on Computers, 55(12), Pages 1523-1535, Dec 2006
4. V. Mahalingam, K. Bhattacharya, N. Ranganathan, H. Chakravarthula, R. Murphy and K. Pratt, "An Efficient VLSI Architecture for Accurate Computation of Lucas-Kanade based Optical Flow", Accepted for Publication (Sep 2008) in IEEE Transactions on VLSI Systems
5. N. Ranganathan, U. Gupta and V. Mahalingam, "Simultaneous Optimization of Total Power, Crosstalk Noise, and Delay Under Uncertainty", Great Lakes Symposium on VLSI (GLSVLSI), Pages 171-176, May 2008
6. V. Mahalingam and N. Ranganathan, "A Fuzzy Optimization Approach for Process Variation Aware Buffer Insertion and Driver Sizing", IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Pages 329-334, Apr 2008
7. V. Mahalingam and N. Ranganathan, "Variation Aware Timing based Placement using Fuzzy Programming", IEEE International Symposium on Quality Electronic Design (ISQED), Pages 327-332, Mar 2007
8. V. Mahalingam, N. Ranganathan and J.E. Harlow, "A Novel Approach for Variation Aware Power Minimization during Gate Sizing", IEEE International Symposium on Low Power Electronics and Design (ISLPED), Pages 174-179, Oct 2006
9. V. Mahalingam and N. Ranganathan, "Variation Aware Circuit-Wise Buffer Insertion and Driver Sizing at the Logic Level", Submitted to Design Automation Conference (DAC), 2009
10. V. Mahalingam, N. Ranganathan, N. Ahmed and H. Towfique, "A Variation Aware Circuit Design using Dynamic Clock Stretching", Submitted to IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2009