8/14/2019 Oral Vladimir Slides
1/53
Computer Systems Laboratory
Stanford University
Design of High-Speed Links:
A look at Modern VLSI Design
Vladimir Stojanović
Chip design is changing
Best systems trade off circuits, architecture and system issues
Becoming constrained by power
- Not so much by area/density
Pentium 4: 125M transistors, 850 mW/mm2, 90 nm tech, 103 W, 3.4 GHz
Pentium: 3M transistors, 30 mW/mm2, 0.6 um tech, 4 W, 0.1 GHz
Power-performance system optimization
Complex, many levels of hierarchy and variables
Individual components
- Flops & latches (power and timing critical)
System level, VLSI blocks and circuits
- Physical (Vdd, Vth, Sizing)
- Logic
- uArchitecture (parallelism, pipelining)
Interfaces (Digital, Analog and Mixed-Signal)
[Figure: chip partitioned into blocks with separate supply and threshold voltages (Vdd1/Vth1 to Vdd5/Vth5), flop-to-flop logic paths (D Q, Clk, Logic A, Logic B), and a Transmitter, Channel, Receiver interface]
Look at sub-problem: links
Transmitter - Channel - Receiver
Seems pretty simple
Challenging multi-disciplinary area
- Circuits
- Communications
- Optimization
What makes it challenging
Now, the bandwidth limit is in wires
[Figure: high-speed link chip driving > 2 GHz signals]
New link design
Dealing with bandwidth-limited channels
- This is an old research area
- Textbooks on digital communications
- Think modems, DSL
But can't directly apply their solutions
- Standard approach requires high-speed A/Ds and digital signal processing
- 20 GS/s A/Ds are expensive
(Un)fortunately, need to rethink issues
Outline
Show system level optimization for links
Create a framework to evaluate trade-offs
Background on high-speed links
High-speed link modeling
System level optimization
Practical implementation issues
Current / future work
Backplane environment
Line attenuation
Reflections from stubs (vias)
[Figure: backplane signal path: on-chip parasitics (termination resistance and device loading capacitance), package, package via, line card trace, line card via, backplane connector, backplane via, backplane trace]
Backplane channel
Loss is variable
- Same backplane
- Different lengths
- Different stubs (top vs. bottom)
Attenuation is large
- More than 30 dB @ 3 GHz
- But is that bad?
Required signal amplitude set by noise
[Figure: attenuation [dB] vs. frequency [GHz], 0 to 10 GHz, for 9" FR4, 26" FR4, 9" FR4 with via stub, and 26" FR4 with via stub]
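To put the 30 dB figure in perspective, the short sketch below converts attenuation in dB into an amplitude ratio. The 1 V transmit swing is an assumed illustrative value, not a number from the slides.

```python
def db_to_amplitude_ratio(attenuation_db):
    """Amplitude ratio (output/input) for a given attenuation in dB."""
    return 10 ** (-attenuation_db / 20)

# Assumed transmit swing for illustration only.
tx_swing_v = 1.0
rx_swing_v = tx_swing_v * db_to_amplitude_ratio(30)
print(f"received swing: {rx_swing_v * 1e3:.1f} mV")
```

Whether a swing of a few tens of millivolts is "bad" then depends on how it compares with the noise at the receiver, which is the point the slide makes.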
Inter-symbol interference (ISI)
Channel is low pass
- Our nice short pulse gets spread out
Dispersion: short latency (skin effect, dielectric loss)
Reflections: long latency (impedance mismatches: connectors, via stubs, device parasitics, package)
[Figure: pulse response vs. time [ns], 0 to 3 ns, Tsymbol = 160 ps]
ISI
Middle sample is corrupted by 0.2 trailing ISI (from the previous symbol) and 0.1 leading ISI (from the next symbol), resulting in 0.3 total ISI
As a result, the middle symbol is detected in error
[Figure: amplitude vs. symbol time; the middle sample is marked "Error!"]
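The arithmetic on this slide can be reproduced with a symbol-spaced pulse response. In the sketch below, the 0.1 pre-cursor and 0.2 post-cursor values come from the slide; the main-cursor value of 0.5 and the mid-level threshold are assumptions chosen only to make the error visible.

```python
import numpy as np

def received_samples(bits, pulse, cursor):
    """Symbol-spaced receiver samples: the bit sequence convolved with the pulse response."""
    full = np.convolve(bits, pulse)
    return full[cursor:cursor + len(bits)]

# [pre-cursor, main cursor (assumed), post-cursor]
pulse = np.array([0.1, 0.5, 0.2])
bits = np.array([1, 0, 1])           # a 0 surrounded by 1s
threshold = pulse[1] / 2             # assumed slicer level, half the main cursor

rx = received_samples(bits, pulse, cursor=1)
middle = rx[1]                       # 0.2 trailing + 0.1 leading ISI
detected = int(middle > threshold)   # the transmitted bit was 0
```

With these assumed values the middle sample sits above the threshold even though a 0 was sent, which is the error marked in the figure.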
The right sub-system model
Need accurate models
- To relate the power/complexity to performance
Main system impairments
- Interference
- Various noise sources
  - Voltage (thermal, supply, offsets, quantization noise)
  - Timing (jitter, offset)
Problem with current models
Worst case analysis
- Can be too pessimistic
- If probability of worst case is very small
Gaussian distributions
- Work well near the mean
- Often way off at the tails
- e.g. ISI distribution is bounded
Use direct noise and interference statistics
Effect of timing noise
Need to map from time to voltage
The effect is going to depend on the size of the jitter, the input sequence, and the channel
[Figure: ideal sampling vs. jittered sampling; voltage noise appears when the receiver clock is off]
Example: Effect of transmitter jitter
Decompose output into ideal and noise
- Noise is a pair of pulses at the front and end of the symbol
- Width of each pulse is equal to the jitter
- Approximate with deltas on bandlimited channels
[Figure: jittered pulse decomposition: symbol b_k with transmit-jittered edges near kT and (k+1)T is split into an ideal pulse from kT to (k+1)T plus narrow noise pulses at the two edges]
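The decomposition can be checked numerically. The sketch below builds an ideal rectangular symbol and a jittered one, subtracts them, and measures the area of the two narrow noise pulses. The 160 ps symbol time appears earlier in the deck; the jitter values, the time grid, and the sign convention (positive jitter means a late edge) are assumptions for illustration.

```python
import numpy as np

def rect(t, start, stop):
    """Unit-amplitude rectangular pulse on [start, stop)."""
    return ((t >= start) & (t < stop)).astype(float)

T = 160e-12                       # symbol time
dt = 0.1e-12                      # simulation grid (assumed)
t = np.arange(-T, 3 * T, dt)

b = 1.0                           # symbol amplitude
eps_lead, eps_trail = 5e-12, -3e-12   # assumed edge jitter, positive = late

ideal = b * rect(t, 0.0, T)
jittered = b * rect(t, eps_lead, T + eps_trail)
noise = jittered - ideal          # two narrow pulses, one per edge

# Pulse areas: these are the weights of the approximating deltas.
lead_area = noise[t < T / 2].sum() * dt     # about -b * eps_lead
trail_area = noise[t >= T / 2].sum() * dt   # about +b * eps_trail
```

On a bandlimited channel the exact shape of these narrow pulses is filtered away, so only their areas matter, which is why replacing them with deltas of weight roughly b times the jitter is a reasonable approximation.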
Jitter effect on voltage noise
Transmitter jitter
- High frequency (cycle-to-cycle) jitter is bad
  - Changes the energy (area) of the symbol
  - No correlation of noise sources that sum
- Low frequency jitter is less bad
  - Effectively shifts waveform
  - Correlated noise gives partial cancellation
Receive jitter
- Modeled by shift of transmit sequence
- Same as low frequency transmitter jitter
Bandwidth of the jitter is critical
- It sets the magnitude of the noise created
Jitter source from PLL clocks
Noise sources
- Reference clock phase noise
- VCO supply noise
- Clock buffer supply noise
M. Mansuri and C-K.K. Yang, "Jitter optimization based on phase-locked loop design parameters," IEEE Journal of Solid-State Circuits, Nov. 2002
[Figure: PLL block diagram: RefClk, phase detector (Kpd), charge pump (Icp), loop filter (R, C), VCO (Kvco/s), clock buffer, divide-by-N feedback]
[Figure: noise transfer functions [dB] vs. frequency [Hz], 10^5 to 10^10 Hz, from input clock, from VCO supply, and from clock buffer supply]
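The shapes of these transfer functions follow from a linear loop model. The sketch below uses the textbook second-order charge-pump PLL (not necessarily the exact model behind the plot) to compute the reference-to-output and VCO-to-output phase transfer magnitudes. All component values are assumptions picked for illustration; the clock buffer sits outside the loop, so its noise is not shaped by the loop in this simple model and is omitted.

```python
import numpy as np

def pll_noise_transfer(freq_hz, icp, kvco, n_div, r, c):
    """Return (|H_ref|, |H_vco|) for a second-order charge-pump PLL.

    H_ref: reference phase to output phase, normalized by the divide ratio.
    H_vco: phase disturbance injected at the VCO to output phase.
    kvco is in rad/s per volt.
    """
    s = 1j * 2 * np.pi * np.asarray(freq_hz, dtype=float)
    # Open loop: charge pump (Icp / 2pi), filter (R + 1/sC), VCO (Kvco / s), divider (1/N)
    loop_gain = (icp / (2 * np.pi)) * (r + 1 / (s * c)) * (kvco / s) / n_div
    h_ref = loop_gain / (1 + loop_gain)   # low-pass: tracks the reference
    h_vco = 1 / (1 + loop_gain)           # high-pass: loop corrects slow VCO noise
    return np.abs(h_ref), np.abs(h_vco)

# Assumed example values.
freqs = np.logspace(5, 10, 6)
h_ref, h_vco = pll_noise_transfer(freqs, icp=100e-6, kvco=2 * np.pi * 1e9,
                                  n_div=10, r=10e3, c=100e-12)
```

The low-pass and high-pass behavior is what makes loop bandwidth a trade-off: a wide loop suppresses more VCO noise but passes more reference noise.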
2x Oversampled bang-bang CDR
Generate early/late from d[n], d[n-1], e[n]
- Simple 1st order loop, cancels receiver setup time
Now need jitter on data Clk, not PLL output
- Base linear PLL jitter
- Add non-linear phase selector noise from CDR
[Figure: data samples d[n-1], d[n] and edge sample e[n]; the case shown is e[n] late]
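A minimal sketch of the early/late decision and the first-order phase update is shown below. It assumes the common convention that the edge sample e[n] is taken between d[n-1] and d[n], so an edge sample equal to d[n] means the clock sampled after the transition (late). The function names and the unit step size are hypothetical.

```python
def bang_bang_pd(d_prev, edge, d_curr):
    """Early/late vote: +1 = clock late, -1 = clock early, 0 = no transition."""
    if d_prev == d_curr:
        return 0                      # no data transition, no timing information
    return 1 if edge == d_curr else -1

def update_phase(phase, vote, step=1):
    """First-order loop: move the sampling phase against the detected error."""
    return phase - step * vote

phase = 10
for d_prev, edge, d_curr in [(0, 1, 1), (1, 1, 1), (1, 1, 0), (0, 1, 1)]:
    phase = update_phase(phase, bang_bang_pd(d_prev, edge, d_curr))
```

Because the detector only reports the sign of the timing error, the loop dithers around lock, which is the non-linear phase selector noise the slide refers to.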
Bang-bang CDR model
Model CDR loop as a state machine
- Markov chain
Gives the probability distribution of phase
- Which is the CDR jitter distribution
A.E. Payzin, "Analysis of a Digital Bit Synchronizer," IEEE Transactions on Communications, April 1983.
[Figure: log10 steady-state probability vs. phase count]
Outline
Show system level optimization for links
Create a framework to evaluate trade-offs
Background on high-speed links
High-speed link modeling
System level optimization
- Limits: what is the capacity of these links?
- Improving today's baseband signaling
Practical implementation issues
Current / future work
Capacity with link-specific noise
Effective noise from phase noise
- Proportional to signal energy
- Decreases expected gains
Still, capacity much higher than data rates in today's links
[Figure: capacity [Gb/s], 0 to 140, vs. log10(clipping probability), -25 to 0, for NELCO and FR4 channels; curves for thermal noise, thermal noise and LC PLL phase noise, thermal noise and ring PLL phase noise]
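Why signal-proportional noise reduces the expected gains can be seen from the Shannon formula for a single flat band. In the sketch below, `phase_noise_fraction` is a hypothetical parameter standing for the ratio of phase-noise-induced noise power to signal power; the numeric values are assumptions.

```python
import math

def capacity_per_hz(signal_power, thermal_noise, phase_noise_fraction):
    """Shannon capacity (bits/s/Hz) when part of the noise scales with the signal."""
    noise = thermal_noise + phase_noise_fraction * signal_power
    return math.log2(1 + signal_power / noise)

# With thermal noise only, capacity keeps growing with signal power.
thermal_only = [capacity_per_hz(p, 1.0, 0.0) for p in (1e2, 1e4, 1e6)]

# With signal-proportional noise, SNR can never exceed 1 / phase_noise_fraction.
with_phase_noise = [capacity_per_hz(p, 1.0, 0.01) for p in (1e2, 1e4, 1e6)]
ceiling = math.log2(1 + 1 / 0.01)
```

Raising transmit power therefore stops helping once the phase-noise term dominates, which matches the gap between the thermal-only and PLL-limited curves.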
Today's links
Exclusively baseband
Biggest problem is ISI
- Starting to use equalization
- Thinking about multi-level modulation
Constrained by speed and power
- Large number of links on a chip
Model links to find efficient implementations
Baseband links - removing ISI
Transmit and Receive Equalization
- Changes signal to correct for ISI
- Often easier to work at transmitter
- DACs easier than ADCs
[Figure: linear transmit equalizer (TxData, causal taps, anticausal taps) driving the channel into a decision-feedback equalizer (sampled data, deadband, feedback taps, tap select logic)]
J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal of Solid-State Circuits, Dec. 2003.
Transmit equalization headroom constraint
Transmit DAC has limited voltage headroom
- Peak power constraint
Unknown target signal levels
- Hard to formulate error or objective function
- Amplitude of equalized signal depends on the channel
Need to tune the equalizer and receive comparator levels
[Figure: attenuation [dB] vs. frequency [GHz], 0 to 2.5 GHz, equalized vs. unequalized]
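The headroom constraint can be illustrated with a small transmit FIR. The sketch below fits the taps by ordinary least squares (a simple stand-in, not the optimization developed on the following slides) and then rescales them so that the sum of tap magnitudes equals 1, which models the peak limit of the transmit DAC. The pulse response values are assumed.

```python
import numpy as np

def tx_equalize(pulse, n_taps, cursor):
    """Least-squares transmit FIR, rescaled to the peak constraint sum(|w|) = 1.

    Returns the taps and the equalized symbol-spaced pulse response.
    """
    n_out = len(pulse) + n_taps - 1
    # Convolution matrix: column j is the pulse response delayed by j symbols.
    P = np.zeros((n_out, n_taps))
    for j in range(n_taps):
        P[j:j + len(pulse), j] = pulse
    target = np.zeros(n_out)
    target[cursor] = 1.0                       # ideal: a single unit sample
    w, *_ = np.linalg.lstsq(P, target, rcond=None)
    w = w / np.sum(np.abs(w))                  # enforce the peak constraint
    return w, P @ w

# Assumed symbol-spaced pulse response: one pre-cursor, main, three post-cursors.
pulse = np.array([0.1, 0.6, 0.3, 0.15, 0.05])
w, equalized = tx_equalize(pulse, n_taps=3, cursor=2)
main_cursor = equalized[2]
```

After rescaling, the main-cursor amplitude is whatever the channel leaves over, so the receiver cannot assume a fixed target level. That is why the slide says the comparator levels have to be tuned together with the equalizer.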
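Optimization example: Power constrained linear precoding
Add variable gain to amplify to known target level
- Formulate the objective function from error
\[ \mathrm{MSE}(w, g) = E_a \left( 1 - 2 g\, w^T P \mathbf{1} + g^2\, w^T P P^T w \right) + g^2 \sigma^2 \]
SINR is not concave in w in general
- Change objective to quasiconcave
\[ \mathrm{SINR}_{unbiased}(w) = \frac{E_a \left( w^T P \mathbf{1} \right)^2}{E_a\, w^T P \left( I - \mathbf{1}\mathbf{1}^T \right) \left( I - \mathbf{1}\mathbf{1}^T \right)^T P^T w + \sigma^2} \]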
Optimal linear precoding
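Minimize BER
- Residual dispersion into peak distortion
- Reflections into mean distortion
- Includes all link-specific noise sources
\[ \underset{w}{\text{maximize}} \quad \frac{0.5\, d_{min}\, w^T P \mathbf{1} - V_{peak}^{PD}(w) - \text{offset}}{\left( E_a\, w^T P \left( I - \mathbf{1}\mathbf{1}^T - I_{PD} \right) \left( I - \mathbf{1}\mathbf{1}^T - I_{PD} \right)^T P^T w + \sigma^2 \right)^{1/2}} \quad \text{s.t.} \quad \| w \|_1 \le 1 \]
\[ \sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma_{thermal}^2 \]
Still, does this objective really relate to link performance?
- Need to look at noise and interference distributions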
Including feedback equalization
Feedback equalization (DFE)
- Subtracts error from input
- No attenuation
Problem with DFE
- Need to know interfering bits
- ISI must be causal
Problem: latency in the decision circuit
- Receive latency + DAC settling < bit time
Can increase allowable time by loop unrolling
- Receive next bit before the previous is resolved
[Figure: amplitude vs. symbol time, feedback equalization]
1 bit loop unrolling
Instead of subtracting the error
- Move the slicer level to include the noise
- Slice for each possible level, since previous value unknown
K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990
[Figure: 2PAM signal constellation split by the previous symbol; two slicers on input x[n] produce d[n] | d[n-1] = 1 and d[n] | d[n-1] = 0, and a mux driven by the previous decision d[n-1] (flop clocked by dClk) selects between them]
33/53
33
Residual error
Cannot correct all the ISI
Equalizers are finite length
EQ coefficients quantized
ISI-noise enhancement tradeoff
The error affects both voltage and timing
Need accurate distribution of this error
- Random data
- Standard textbook methods for distribution of the sum of weighted random variables
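One such textbook method is repeated convolution: for independent, equiprobable +1/-1 data, each residual ISI tap contributes a two-point distribution, and the distribution of the sum is the convolution of these. The sketch below does this on a uniform voltage grid; the tap values and grid step are assumed.

```python
import numpy as np

def isi_pmf(taps, step):
    """PMF of sum_k a_k * taps[k] for independent a_k in {-1, +1}, on a grid of spacing `step`."""
    pmf = np.array([1.0])
    lowest = 0                                  # grid index of the lowest value so far
    for h in taps:
        k = int(round(abs(h) / step))           # tap magnitude in grid steps
        kernel = np.zeros(2 * k + 1)
        kernel[0] += 0.5                        # data symbol -1
        kernel[-1] += 0.5                       # data symbol +1
        pmf = np.convolve(pmf, kernel)
        lowest -= k
    values = (lowest + np.arange(len(pmf))) * step
    return values, pmf

# Assumed residual ISI taps in volts, on a 5 mV grid.
values, pmf = isi_pmf([0.02, -0.01, 0.005], step=0.005)
worst_case = values[-1]          # equals the sum of tap magnitudes
p_worst_case = pmf[-1]           # probability of hitting the worst case
```

The result is bounded by the sum of tap magnitudes, and the probability of actually reaching that bound halves with every additional tap.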
Comparison with Gaussian model
Gaussian model only good down to 10^-3 probability
Way pessimistic for much lower probabilities
[Figure: cumulative ISI distribution: log10 probability [cdf] vs. residual ISI [mV]; annotation: 40 mV error @ 10^-10, 25% of eye height]
[Figure: impact on CDR phase: log10 steady-state phase probability vs. phase count; annotations: 4% Tsymbol, error @ 10^-10, 9% Tsymbol]
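The reason the Gaussian model turns pessimistic deep in the tail is that the true ISI is bounded while a Gaussian is not. The sketch below compares the exact tail probability, found by enumerating all data patterns, with a Gaussian of the same variance. The tap values are assumed, and enumeration is only practical for a handful of taps.

```python
import itertools
import math
import numpy as np

def exact_tail(taps, x):
    """P(ISI > x) by enumerating every +1/-1 data pattern."""
    taps = np.asarray(taps, dtype=float)
    patterns = list(itertools.product((-1, 1), repeat=len(taps)))
    hits = sum(1 for signs in patterns if np.dot(signs, taps) > x)
    return hits / len(patterns)

def gaussian_tail(taps, x):
    """P(ISI > x) for a zero-mean Gaussian with the same variance as the ISI."""
    sigma = math.sqrt(sum(h * h for h in taps))
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2)))

# Assumed residual ISI taps in volts.
taps = [0.02, 0.015, 0.01, 0.008, 0.005, 0.004, 0.003, 0.002]
worst = sum(abs(h) for h in taps)

beyond_bound_exact = exact_tail(taps, 1.01 * worst)      # no pattern can get here
beyond_bound_gauss = gaussian_tail(taps, 1.01 * worst)   # Gaussian still predicts errors
```

Any margin budgeted from the Gaussian tail past the true bound is margin that the real link does not need.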
BER contours
Voltage margin
- Min. distance between the receiver threshold and contours with same BER
[Figure: BER contours (log10 BER from -30 to -5) of margin [mV] vs. time [ps], 0 to 160 ps, for 5 tap Tx Eq and for 5 tap Tx Eq + 1 tap DFE]
Pulse amplitude modulation
Binary (NRZ)
- 1 bit / symbol
- Symbol rate = bit rate
PAM4
- 2 bits / symbol
- Symbol rate = bit rate/2
[Figure: NRZ levels 1, 0 and PAM4 levels 10, 11, 01, 00 from top to bottom]
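A small encoder makes the rate relationship concrete. The bit-pair ordering below follows the level labels on the slide (10, 11, 01, 00 from top to bottom, a Gray code); the numeric levels +3, +1, -1, -3 are an assumed normalization.

```python
# Gray-coded PAM4 mapping; level values are an assumed normalization.
GRAY_PAM4 = {(1, 0): 3, (1, 1): 1, (0, 1): -1, (0, 0): -3}

def pam2_encode(bits):
    """Binary (NRZ): one symbol per bit."""
    return [1 if b else -1 for b in bits]

def pam4_encode(bits):
    """PAM4: one symbol per pair of bits."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

bits = [1, 0, 1, 1, 0, 1, 0, 0]
nrz_symbols = pam2_encode(bits)     # 8 symbols for 8 bits
pam4_symbols = pam4_encode(bits)    # 4 symbols for 8 bits
```

With Gray coding, adjacent levels differ in one bit, so the most likely symbol error (to a neighboring level) costs a single bit error.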
Outline
Show system level optimization for links
Create a framework to evaluate trade-offs
Background on high-speed links
High-speed link modeling
System level optimization
Practical implementation issues
- Low-cost adaptation
- Dual-mode link (hardware re-use)
Current / future work
Fully adaptive dual-mode link
Reconfigurable dual-mode PAM2/PAM4 link
Adaptive equalization
- Transmit and receive equalization
- DFE with loop unrolling
[Figure labels: TX, RX, PLL]
PAM2/PAM4, 2-10 Gb/s, 0.13 um, 40 mW/Gb/s
Equalizer loop
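Scale the equalizer: output Tx constraint
Dual-loop adaptive algorithm
\[ w_{n+1} = w_n + step_w \cdot \mathrm{sign}(e_n) \cdot \mathrm{sign}(x_n) \]
Data level reference loop
\[ dLev_{n+1} = dLev_n - step_{dataLev} \cdot \mathrm{sign}(e_n), \quad x_n > 0 \]
[Figure: eye diagrams for the initial eye (dLev_init, peak-to-peak error_init), mid-way equalized (dLev_mid) and equalized (dLev_end); signals x_n, Sign(x_n), Sign(e_n)]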
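The two sign-sign updates and the transmit scaling step can be sketched as single-step functions. The sketch assumes the error is defined as target minus received sample (e = dLev * data - received), which is one convention that makes both updates move in a sensible direction; the slide does not spell out the definition. The function names and step sizes are hypothetical.

```python
import numpy as np

def update_taps(w, e, x, step_w):
    """Sign-sign LMS step for the equalizer taps; x holds the data aligned with each tap."""
    return w + step_w * np.sign(e) * np.sign(x)

def scale_to_peak(w, peak=1.0):
    """Rescale taps so the sum of magnitudes stays within the transmitter's peak limit."""
    total = np.sum(np.abs(w))
    return w * (peak / total) if total > peak else w

def update_dlev(dlev, e, x_cursor, step_dlev):
    """Data-level loop: adjust the reference level only on positive data."""
    if x_cursor > 0:
        return dlev - step_dlev * np.sign(e)
    return dlev

# One illustrative update with assumed values.
w = np.array([0.0, 1.0, 0.0])
x = np.array([1, -1, 1])
w_next = scale_to_peak(update_taps(w, e=0.3, x=x, step_w=0.01))
dlev_next = update_dlev(0.5, e=0.3, x_cursor=1, step_dlev=0.01)
```

Only the signs of the error and data are needed, so each loop reduces to comparators and up/down counters, which is what keeps the adaptation low-cost.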
Dual loop convergence: 4 tap example
Hard to estimate analytically
Experimental results show
- Both loops are stable within a wide range: 0.1-10x of relative speeds
[Figure: tap weight [mV] vs. number of updates for main tap, pre1, post1 and post2; dLev [mV] vs. number of updates. PAM2, 5Gb/s, 4 taps Tx equalization]
Hardware re-use: Dual-mode receiver
PAM4
[Figure: dual-mode receiver schematic; labels: input x, thresholds thresh (+), 0 and thresh (-), outputs lsb(+), msb and lsb(-), D Q flops clocked by dClk, muxes controlled by prDFE enable]
Hardware re-use: Dual-mode receiver
PAM4
PAM2
[Figure: dual-mode receiver schematic; labels: input x, thresholds thresh (+), 0 and thresh (-), outputs lsb(+), msb and lsb(-), D Q flops clocked by dClk, muxes controlled by prDFE enable]
Hardware re-use: Dual-mode receiver
PAM4
PAM2 with loop-unrolled DFE tap
- Leverage multi-level properties of signals in loop-unrolling
[Figure: dual-mode receiver schematic; labels: input x, thresholds thresh (+), 0 and thresh (-), outputs lsb(+), msb and lsb(-), D Q flops clocked by dClk, muxes controlled by prDFE enable]
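The hardware re-use idea can be illustrated behaviorally: the same three slicers serve PAM4 detection and, in PAM2 mode, provide the two speculative decisions of a loop-unrolled DFE tap. This is a sketch of the idea, not a description of the actual circuit. The Gray mapping follows the PAM4 slide earlier in the deck, bits are represented as 0/1, and the function names are hypothetical.

```python
def slicers(x, thresh):
    """The three shared comparators: above +thresh, above 0, above -thresh."""
    return x > thresh, x > 0.0, x > -thresh

def receive_pam4(x, thresh):
    """PAM4 mode: middle slicer gives the msb, outer slicers give the lsb."""
    hi, mid, lo = slicers(x, thresh)
    msb = int(mid)
    lsb = int(not hi) if mid else int(lo)    # Gray code: 10, 11, 01, 00 top to bottom
    return msb, lsb

def receive_pam2_prdfe(x, alpha, prev_bit):
    """PAM2 mode with a loop-unrolled DFE tap: outer slicers set at +/- alpha."""
    hi, _, lo = slicers(x, alpha)
    return int(hi) if prev_bit else int(lo)

pam4_bits = [receive_pam4(x, thresh=0.5) for x in (0.8, 0.2, -0.2, -0.8)]
```

In PAM2 mode the outer thresholds are set to the post-cursor ISI amplitude rather than to the PAM4 decision levels, so no extra comparators are needed for the unrolled tap.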
Improvements with loop-unrolling
Signal as seen by the receiver (on-chip scope)
[Figure: (a) unequalized pulse response, [V] vs. [ps], 0 to 4000 ps; (b) fully transmit equalized vs. transmit equalized with one tap DFE, [V] vs. [ps]; eye diagram, [mV] vs. [ps], shaded by log10(voltage probability distribution)]
Model and measurements
[Figure: log10(BER) vs. voltage margin [mV], -80 to 80 mV; PAM4, 3 taps of transmit equalization, 5Gb/s]
Outline
Show system level optimization for links
Create a framework to evaluate trade-offs
Background on high-speed links
High-speed link modeling
System level optimization
Practical implementation issues
Current / future work
- Bridging the gap to link capacity
Bridging the gap: Multi-tone link
Challenge: balancing the inter-symbol and inter-channel interference
- Microwave filter techniques
- Custom signal processing
[Figure: multi-tone data rates with thermal noise: # bits/Hz vs. frequency [GHz], 0 to 14 GHz; Nelco 64Gb/s, FR4 38Gb/s]
[Figure: multi-tone transceiver: streams data0, data1, ..., dataN, each with its own number of levels per frequency band, shaped by LPF/BPF filters and mixed with e^{j w_1 t} ... e^{j w_N t} at the transmitter, then mixed down and filtered (BPF, LPF) at the receiver]
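How a bits/Hz profile turns into an aggregate data rate can be sketched with the standard gap-approximation bit loading used in multi-tone systems. The per-tone SNR values, the tone bandwidth and the SNR gap below are assumed, and the sketch takes one symbol per second per hertz of tone bandwidth; the figures quoted on the slide come from the author's channel models, not from this calculation.

```python
import math

def bits_per_tone(snr_db, gap_db=0.0):
    """Whole bits per symbol a tone can carry, using the SNR-gap approximation."""
    snr = 10 ** ((snr_db - gap_db) / 10)
    return math.floor(math.log2(1 + snr))

def multitone_rate(snr_db_per_tone, tone_bandwidth_hz, gap_db=0.0):
    """Aggregate rate in bit/s, assuming one symbol per second per hertz on each tone."""
    total_bits = sum(bits_per_tone(s, gap_db) for s in snr_db_per_tone)
    return total_bits * tone_bandwidth_hz

# Assumed SNR profile that falls with frequency, as on a lossy backplane.
snr_profile_db = [30, 20, 10]
rate = multitone_rate(snr_profile_db, tone_bandwidth_hz=1e9)
```

Low-loss bands carry many bits per symbol and high-loss bands carry few, which is how a multi-tone link gets closer to capacity than a single baseband channel.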
Conclusions
Links are a nice example of system-level optimization
- Need accurate models
- Global tradeoff: off-chip communication with on-chip computation
ISI is large in baseband links
- Can't completely compensate (at least not with reasonable area/power)
Power constrained transmitter
- PAM4 and simple DFE are attractive solutions
Implemented practical, low-cost algorithms
Still, far from the capacity of these links
- Looking into multi-tone to bridge the gap
Acknowledgments
Prof. Mark Horowitz and Prof. Vojin Oklobdzija
Prof. Stephen Boyd, Prof. Joseph Kahn, Prof. Thomas Lee
My mother Nada, my wife Ivana, kids Marija and Marko, my sister
Tamara, Maurizio and my whole family
Rambus and MARCO IFC for support
Jared Zerbe, Andrew Ho, Fred Chen and everybody in Rambus XG team
MH group - especially Elad Alon and Amir Amirkhany
Dr. George Ginis and Prof. John Cioffi
Dejan Markovic and Prof. Borivoje Nikolic
Prof. Michael Flynn, Prof. Ken Yang
Marianne Marx, Teresa, Penny, Taru, Deborah, Pamela
My friends Svjetlana, Danijela and Dejan