EE 5324 – VLSI Design II
Kia Bazargan
University of Minnesota
Part VIII: Timing Issues
Spring 2006 EE 5324 - VLSI Design II - © Kia Bazargan
Dec 13, 2015

References and Copyright
• Textbooks referenced
  - [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective”, Prentice Hall, 1996.
• Slides used (modified by Kia when necessary)
  - [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]: http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html

Why Deal With Timing?
• Clock
  - Makes sure signals are settled before being written
  - Controls the order of operations
• Problem?
  - Physical implementation of the circuit ≠ what we planned
  - Why?
    o Wires incur delay on signals
    o Clock edge might arrive too early or too late
• Challenges
  - Clock routing
  - Synchronization protocols

Clock Skew
• Clock signal
  - Connects to all registers/flip-flops
  - Connects to all pre-charge/evaluate transistors of dynamic logic
  - Huge fanout → large capacitive load
  - Routed to all parts of the chip
  - Huge capacitance of the clock net itself
  - Example: Alpha processor: 3.24 nF (40% chip C)
• Clock skew
  - Clock net has huge RC
  - Signal arrival time depends on the distance of the destination from the source
  - Not the “same” clock signal for different destinations
• Why important?
  - Timing violated
  - Larger chips → even worse

Clock Wire Delay
[Figure: clock driver with source resistance Rs driving a distributed RC wire (resistance r, capacitance c) terminated by load capacitance CL]
r = 0.07 Ω/□, c = 0.04 fF/μm² (Tungsten wire)
[©Prentice Hall]
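To get a feel for the numbers above, here is a rough sketch of the wire's own contribution to the clock delay. It assumes r is a sheet resistance (Ω/□) and c an area capacitance (fF/μm²), uses the usual 0.38·R·C estimate for the 50% delay of a distributed RC line, and ignores Rs and CL. The function name and the 1 cm wire length are illustrative choices, not values from the slide.

```python
def distributed_rc_delay_ns(length_um, r_sheet_ohm=0.07, c_area_ff_per_um2=0.04):
    """Approximate 50% delay (ns) of a distributed RC wire: t ~= 0.38 * R * C.

    Assumption: R = r * L / W and C = c * L * W, so R * C = r * c * L^2
    and the wire width W cancels out. Rs and CL are ignored.
    """
    rc_seconds = r_sheet_ohm * (c_area_ff_per_um2 * 1e-15) * length_um ** 2
    return 0.38 * rc_seconds * 1e9


if __name__ == "__main__":
    # Hypothetical 1 cm (10,000 um) clock wire
    print(f"{distributed_rc_delay_ns(10_000):.4f} ns")
    # Doubling the length quadruples the delay (quadratic in L)
    print(f"{distributed_rc_delay_ns(20_000):.4f} ns")
```

The quadratic dependence on length is the reason a long unbuffered clock wire reaches distant registers noticeably later than nearby ones.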
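
Reference Circuit: Pipelined Datapath
• We use this circuit to analyze the problem
[Figure: pipelined datapath between In and Out built from combinational logic blocks CL1, CL2, CL3 and registers R1, R2, R3; the registers are clocked by local clocks φ', φ'', φ''' arriving at times t_φ', t_φ'', t_φ''']
• t_i: interconnect delay
• t_l,min, t_l,max: minimum and maximum delay of the combinational logic
• t_r,min, t_r,max: minimum and maximum propagation delay of a register
• Skew: δ = t_φ'' - t_φ'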

Skew in Single-Phase Edge-Triggered Clocking
• Race between clock and data
[Figure: R1 clocked by φ' at time t_φ'; R2 clocked by φ'' at time t_φ'' = t_φ' + δ; new data reaches R2 no earlier than t_φ' + t_r,min + t_l,min + t_i]
t_φ'' = t_φ' + δ ≤ t_φ' + t_r,min + t_l,min + t_i
δ ≤ t_r,min + t_l,min + t_i (skew bound)
[Rab96] p513

Skew in Single-Phase Edge-Triggered Clocking
• Data stable before clock applied
[Figure: R1 clocked by φ' at time t_φ'; R2 samples on the next edge of φ'' at time t_φ'' + T = t_φ' + δ + T; data is stable at R2 by t_φ' + t_r,max + t_l,max + t_i]
t_φ'' + T ≥ t_φ' + t_r,max + t_l,max + t_i
T ≥ t_r,max + t_l,max + t_i - δ (clock period bound)
[Rab96] p513
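The skew bound and the clock period bound can be checked together for one pipeline stage. The following is a minimal sketch with hypothetical delay values in picoseconds; the class and function names are illustrative and do not come from [Rab96].

```python
from dataclasses import dataclass


@dataclass
class StageTiming:
    """Delays of one register-to-register stage, in picoseconds (made-up values)."""
    t_r_min: int
    t_r_max: int
    t_l_min: int
    t_l_max: int
    t_i: int


def max_skew(s):
    """Skew bound: delta <= t_r,min + t_l,min + t_i."""
    return s.t_r_min + s.t_l_min + s.t_i


def min_period(s, delta):
    """Clock period bound: T >= t_r,max + t_l,max + t_i - delta."""
    return s.t_r_max + s.t_l_max + s.t_i - delta


def timing_ok(s, delta, period):
    """True only if both the skew bound and the period bound hold."""
    return delta <= max_skew(s) and period >= min_period(s, delta)


if __name__ == "__main__":
    stage = StageTiming(t_r_min=200, t_r_max=400, t_l_min=500, t_l_max=3000, t_i=100)
    print(max_skew(stage))           # largest tolerable skew
    print(min_period(stage, 300))    # positive skew shortens the minimum period
    print(min_period(stage, -300))   # negative skew lengthens it
    print(timing_ok(stage, 1000, 10**9))  # skew bound violated: no period helps
```

Note that the period does not appear in the skew bound, so a race caused by too much positive skew cannot be fixed by slowing the clock.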

Clock Signal Direction
• Same direction as data: δ > 0
  - Skew constraint (bound) must be strictly controlled
  - (-) If the constraint is not met, even reducing the clock frequency would not help!
  - (+) Positive skew increases throughput: the minimum clock period shrinks by δ (see “clock period bound”)
    o Not worth it: high risk
• Opposite direction to data: δ < 0
  - Skew constraint always met
  - Throughput decreases: the minimum clock period grows by |δ|

Skew in Two-Phase Master-Slave Clocking
[Figure: pipelined datapath starting at In with logic blocks CL1, CL2, CL3, master latches M1, M2, M3 and slave latches S1, S2, S3, clocked by the two (skewed) clock phases]

Two-Phase Clock Timing
[Figure: waveforms of φ1, φ2 and the skewed phase φ1' over one clock period T, marking the phase widths T_φ1 and T_φ2, the non-overlap times T_φ12 and T_φ21, and the clock overlap δ - T_φ12; annotations: "new data applied to CL2", "previous data latched into M2"]
t_min > δ - T_φ12
t_max < T + δ - T_φ12
[©Prentice Hall]

Two-Phase vs. Single-Phase
• Comparing the skew bounds, T_φ12 acts as a buffer for the skew
  - Skew can always be countered by increasing T_φ12
• Performance
  - Increasing T_φ12 could mean longer clock periods
• Positive vs. negative skew
  - Same as single-phase
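The trade-off between the non-overlap time and the clock period can be made concrete with the two-phase bounds t_min > δ - T_φ12 and t_max < T + δ - T_φ12. This is a small sketch with made-up numbers in picoseconds; it assumes t_min and t_max are the minimum and maximum register-plus-logic-plus-interconnect delays of a stage, and the function names are illustrative.

```python
def min_nonoverlap(delta, t_min):
    """Value that T_phi12 must exceed so that t_min > delta - T_phi12 (never negative)."""
    return max(0, delta - t_min)


def min_period_two_phase(delta, t_max, t12):
    """Value that T must exceed so that t_max < T + delta - T_phi12."""
    return t_max - delta + t12


if __name__ == "__main__":
    t_min, t_max = 800, 3500          # hypothetical stage delays (ps)
    delta = 1200                      # skew larger than t_min
    print(min_nonoverlap(delta, t_min))              # non-overlap needed to absorb it
    print(min_period_two_phase(delta, t_max, 500))   # period bound with T_phi12 = 500
    print(min_period_two_phase(delta, t_max, 900))   # a larger T_phi12 costs period
```

Both bounds are strict inequalities, so the returned values are limits to stay above, not values to use directly.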

How to Counter Clock Skew Problems?
• Routing the clock in the opposite direction of data
  - Local solution only, not always an option (see below)
• Controlling the non-overlap periods of the clock
  - Only for 2-phase clocks
  - Could decrease clock frequency
• Perform the routing of the clock such that skew is minimum
[Figure: registers (Reg) and logic between In and Out; some register-to-register paths see positive skew, others negative skew]
[©Prentice Hall]

Clock Routing
• H-Tree Network
  - [Figure: H-tree distributing CLOCK]
  - Observe: only relative skew is important
• Comb-Tree Network
  - [Figure: main clock driver feeding secondary clock drivers, each serving the modules of a local area]
  - Reduces absolute delay
  - Makes power-down easier
  - Sensitive to variations in buffer delay
[©Prentice Hall]
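The H-tree works because every leaf sits at the same wire distance from the root, so the relative skew is (ideally) zero even though the absolute delay is not. The sketch below builds an idealized H-tree recursively and reports the root-to-leaf wire length of each leaf; the geometry (each H half the size of its parent) and the function name are illustrative assumptions.

```python
def h_tree_leaves(x, y, half, levels, wire=0.0):
    """Return (leaf_x, leaf_y, wire_length_from_root) for every leaf of an H-tree.

    Each level draws one 'H' centred at (x, y): a horizontal bar of
    half-length `half`, then a vertical bar of half-length `half` at each
    end. The four tips become the centres of the next, half-sized H.
    """
    if levels == 0:
        return [(x, y, wire)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            # centre -> tip costs one horizontal plus one vertical segment
            leaves += h_tree_leaves(x + dx, y + dy, half / 2, levels - 1,
                                    wire + 2 * half)
    return leaves


if __name__ == "__main__":
    leaves = h_tree_leaves(0.0, 0.0, 4.0, levels=3)
    lengths = {w for _, _, w in leaves}
    print(len(leaves), "leaves, distinct wire lengths:", lengths)
```

In a real layout, load imbalance and process variation reintroduce some skew, which is why the slide stresses relative rather than absolute delay.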

Example: DEC Alpha 21164
• Clock frequency: 300 MHz; 9.3 million transistors
• Total clock load: 3.75 nF
• Power in clock distribution network: 20 W (40% of the total!)
• Uses two-level clock distribution
  - Single 6-stage driver at center
  - Secondary buffers drive left and right side
• Clock grid in metal3 and metal4
[©Prentice Hall]

DEC Alpha 21164: Clock Drivers
[Figure: clock drivers of the DEC Alpha 21164]
[©Prentice Hall]

DEC Alpha 21164: Clock Skew
[Figure: clock skew of the DEC Alpha 21164]
[©Prentice Hall]

Self-Timed and Asynchronous Circuits
• Functions of clock in synchronous designs
  - Acts as completion signal (data stable before latched)
  - Ensures correct ordering of events
  - Based on worst-case delay of the circuit
• Truly asynchronous design
  - Completion is ensured by careful timing analysis
  - Ordering of events is implicit in logic
  - Very risky
• Self-timed design
  - Completion ensured by completion signal
  - Ordering imposed by handshaking protocol
  - “Local” solution to the timing problem
  - Based on average delay of the circuit
[©Prentice Hall]

Example of Self-Timed Pipeline (Handshaking)
• “Start” and “done” signals ensure physical timing constraints are met
• Acknowledge/Request (aka handshaking protocol) ensures correct ordering of the operations
[Figure: three-stage pipeline from In with registers R1, R2, R3 and logic blocks CL1, CL2, CL3 (delays tCL1, tCL2, tCL3); each logic block has "start" and "done" signals connected to a handshake (HS) module, and neighbouring HS modules exchange Req and Ack]

Self-Timed Circuits: Advantages and Disadvantages
• Advantages over synchronous:
  - Timing signals generated locally
    o No clock routing problems
    o Saving in power consumption of the clock net
  - Potential increase in performance
    o Separate physical and logical ordering mechanism
    o Self-timed: average, synchronous: worst-case
  - Robust to variations (manufacturing + environment)
• Disadvantage:
  - Larger area
    o Redundancy
    o Control circuit (handshaking)

Completion Signal Generation Methods
• Delay module method
  - Mimic the delay of the logic circuit using a separate delay element
  - Not much area overhead
  - Not aggressive in obtaining average speed
  - Used in memories (internal timing)
  - [Figure: logic network from In to Out, with a delay module in parallel that turns "start" into "done"]
• Dual-rail computation
  - Use redundant signal representation
  - Denote 1, 0, “in transition”

B              B0  B1
In transition  0   0
0              0   1
1              1   0
Illegal        1   1

Completion Signal Generation: Redundant Code
[Figure: two precharged gates between Vdd and their pull-down networks (PDN), driven by the true and complemented inputs In1, In2 and controlled by Start; the outputs B0 and B1 are combined to generate Done]
[©Prentice Hall]

Redundant Signal Coding (cont.)
• When “start” is low
  - Circuit precharged
  - (B0, B1) in the “transition” state
• When “start” is high
  - ONLY ONE of the pull-down networks evaluates
  - Only one of the B0, B1 signals goes high
• “Done” defined as the OR of B0 and B1
  - For an N-bit word, all “done” signals must be combined → more area, more delay
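The dual-rail code and its completion signal can be modelled in a few lines. This sketch follows the (B0, B1) table given earlier (0 → (0, 1), 1 → (1, 0), precharged state (0, 0)). It assumes that the per-bit "done" signals of a word are combined with an AND, which is one natural reading of "must be combined"; the function names are illustrative.

```python
IN_TRANSITION = (0, 0)   # precharged state while "start" is low


def encode(bit):
    """Dual-rail code (B0, B1): logic 0 -> (0, 1), logic 1 -> (1, 0)."""
    return (1, 0) if bit else (0, 1)


def done(b0, b1):
    """Per-bit completion: OR of the two rails; (1, 1) is illegal."""
    if (b0, b1) == (1, 1):
        raise ValueError("illegal dual-rail state (1, 1)")
    return b0 | b1


def word_done(rails):
    """Assumed N-bit completion: every bit must have left the transition state."""
    return int(all(done(b0, b1) for b0, b1 in rails))


if __name__ == "__main__":
    print(word_done([encode(1), encode(0), encode(1)]))       # all bits resolved
    print(word_done([encode(1), IN_TRANSITION, encode(1)]))   # one bit still evaluating
```

The per-bit OR plus the word-level combination is the extra area and delay that the slide mentions.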

Example: Self-Timed Adder
[Figure: (a) Differential carry generation: two precharged carry chains controlled by Start, one built from the propagate (P0-P3) and generate (G0-G3) signals, the other from the propagate (P0-P3) and kill (K0-K3) signals, producing the carries C0-C4 and their complements. (b) Completion signal: Done derived from Start and the true and complemented carries C1-C4.]
[©Prentice Hall]

Example: Self-Timed Adder (cont.)
• Dual evaluation network used only for the carry chain (critical path)
• Using K (kill) instead of G (generate) inverts the function
• “Done” evaluation assumed to be slower than sum evaluation
• Example:
  - Self-timed: 0.23 nsec/bit, 3300 μm²
  - Synchronous: same delay, less area
  - BUT, actual performance of self-timed substantially better (average vs. worst-case delays)
  - Self-timed: O(log N) delay, similar to tree-structured synchronous
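The average-versus-worst-case argument can be illustrated with a small Monte Carlo experiment. It measures how far the longest carry actually travels when adding random operands. The sketch assumes uniformly random operands and a carry-in of 0; the function names, trial count and seed are arbitrary choices.

```python
import math
import random


def longest_carry_chain(a, b, n_bits):
    """Longest distance (in bit positions) any carry travels when adding a + b."""
    longest = run = 0
    for i in range(n_bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:                       # generate: a new carry starts here
            run = 1
        elif ai ^ bi:                     # propagate: an existing carry moves on
            run = run + 1 if run else 0
        else:                             # kill: any carry stops here
            run = 0
        longest = max(longest, run)
    return longest


def average_chain(n_bits, trials=2000, seed=0):
    """Average longest carry chain over random operand pairs."""
    rng = random.Random(seed)
    total = sum(
        longest_carry_chain(rng.getrandbits(n_bits), rng.getrandbits(n_bits), n_bits)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, "bits: average chain", round(average_chain(n), 2),
              "| log2(N) =", math.log2(n), "| worst case =", n)
```

A synchronous ripple adder must budget for the worst case of N positions, while a self-timed one signals completion as soon as the actual, typically much shorter, chain has settled.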

Handshaking
• Protocol for the logical ordering of operations
  - Avoid races
  - Avoid hazards
• Extra hardware to implement
  - State machine
  - Queues possible
• Exact protocol depends on:
  - Architecture
  - Environment
  - Must accommodate:
    o New data available (sender)
    o Request computation (sender)
    o Acknowledge receipt (receiver)
    o Ready for new computation (receiver)

Four-Phase Handshaking
[Figure: sender-receiver configuration: Sender and Receiver connected by Data, Req and Ack]
[Figure: timing diagram of Data, Req and Ack over Cycle 1 and Cycle 2, distinguishing the sender's actions from the receiver's actions]
[©Prentice Hall]

Event Logic: the Muller C-element
(a) Schematic: [Figure: C-element with inputs A, B and output F]
(b) Truth table:

A  B  Fn+1
0  0  0
0  1  Fn
1  0  Fn
1  1  1

[Figure: static implementation (with an S-R latch, output Q) and dynamic implementation of the C-element]
[©Prentice Hall]
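The truth table translates directly into a small behavioural model: the output follows the inputs when they agree and holds its previous value otherwise. The class name and the initial output value of 0 are assumptions made for this sketch.

```python
class CElement:
    """Behavioural model of a Muller C-element (initial output assumed 0)."""

    def __init__(self, initial=0):
        self.f = initial

    def update(self, a, b):
        """Apply inputs A, B and return F(n+1)."""
        if a == b:          # inputs agree: output copies them
            self.f = a
        return self.f       # inputs differ: output holds F(n)


if __name__ == "__main__":
    c = CElement()
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(f"A={a} B={b} -> F={c.update(a, b)}")
```

Because the output changes only after both inputs have changed, the C-element acts as an AND on events, which is what makes it useful in handshaking circuits.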

Two-Phase Handshaking Implementation
[Figure: implementation with Sender Logic, Receiver Logic and a C-element (C); signals: Data, Data Ready, Data Accepted, Req, Ack]
[Figure: timing diagram of Data, Req and Ack over cycle 1 and cycle 2, distinguishing the sender's actions from the receiver's actions]
• “Edge-sensitive” to HS signals
• Sequence of events:
  0  Data Ready (DR) = 1
  1  Receiver: “ready for new data” (Ack)
  2  Sender: “new data ready” (DR) → Req
  3  Receiver: “done, ready for new data” (Ack)
[©Prentice Hall]
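Being "edge-sensitive" means that every transition on Req or Ack, rising or falling, counts as an event. The following sketch traces a generic two-phase (transition-signalling) transfer. It is a protocol-level illustration, not a model of the circuit on the slide, and the function name is hypothetical.

```python
def two_phase_transfer(items):
    """Trace Req/Ack levels while sending items with a two-phase handshake."""
    req = ack = 0
    received, trace = [], []
    for item in items:
        data = item              # sender places the data on the bus first
        req ^= 1                 # sender event: a Req transition means "new data ready"
        trace.append(("Req", req))
        received.append(data)    # receiver latches the data after seeing the Req event
        ack ^= 1                 # receiver event: an Ack transition means "done, ready for new data"
        trace.append(("Ack", ack))
    return received, trace


if __name__ == "__main__":
    got, events = two_phase_transfer(["a", "b"])
    print(got)
    for name, level in events:
        print(name, "->", level)
```

Each transfer costs exactly two events, one on Req and one on Ack, and the signals do not return to zero in between.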
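
Example: Self-Timed FIFO
[Figure: three-stage FIFO from In to Out with registers R1, R2, R3, register enables (En), Done signals and C-elements (C) linking Reqi/Acki at the input to Reqo/Acko at the output]
• Event sequences listed on the slide:
  - Reqi, En1, Done1, En2, Done2, En3, Reqo
  - Acki, Reqi, En1, Done1, En2, Done2
  - Acki, Reqi, En1, Done1
  - Acki, Reqi
[©Prentice Hall]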

Asynchronous Systems
• Outside world usually asynchronous
• Synchronization usually by polling
• Perfect synchronization impossible
  - Input may be sampled at a transition
[Figure: asynchronous system (input frequency f_in) connected through a synchronization block to a synchronous system (clock frequency f_φ)]
[©Prentice Hall]

A Simple Synchronizer
[Figure: synchronizer latch with input Vin and output Vout]
• Data sampled on falling edge of clock
• Latch will eventually resolve the signal value, but ... this might take infinite time!

System Level Synchronization
[Figure: PC board with a crystal-based clock generator supplying a reference clock to Chip 1 and Chip 2; each chip has its own clock generator producing the local clocks for its logic (φ1', φ2' on one chip, φ1'', φ2'' on the other); the chips exchange I/O data]
[©Prentice Hall]

Skew of Local Clocks vs Reference
[Figure: waveforms of the local clocks φ' and φ'' relative to the reference clock]
(a) Skew of local clock signals with respect to the reference clock.
(b) Local clock signals as produced by PLL-based clock generator.
[©Prentice Hall]

Phase-Locked Loop Based Clock Generator
[Figure: PLL block diagram: the phase detector compares the reference clock with the local clock fed back through a divide-by-N block and drives the charge pump with Up/Down signals; the loop filter output Vcontr controls the VCO, whose output passes through clock decode & buffer to produce the local clocks φ1, φ2, ...]
• Acts also as clock multiplier
[©Prentice Hall]

To Probe Further ...
• Clock skew visualization (cool animations!!)
  - P. J. Restle, “Technical Visualizations in VLSI Design”, Design Automation Conference, pp. 494-499, 2001.
• Asynchronous FIFO design (system-level comm)
  - T. Chelcea and S. Nowick, “Robust Interfaces for Mixed-Timing Systems with Application to Latency-Insensitive Protocols”, Design Automation Conference, pp. 21-26, 2001.