Oral Vladimir Slides

May 30, 2018

kedarkul
  • 8/14/2019 Oral Vladimir Slides

    1/53

    Computer Systems Laboratory

    Stanford University

    Design of High-Speed Links:

    A look at Modern VLSI Design

    Vladimir Stojanović

  • 8/14/2019 Oral Vladimir Slides

    2/53

    Chip design is changing

    Best systems trade off circuits, architecture, and system issues

    Becoming constrained by power
    - Not so much by area/density

    Pentium 4: 125M transistors, 850 mW/mm^2, 90 nm tech, 103 W, 3.4 GHz

    Pentium: 3M transistors, 30 mW/mm^2, 0.6 um tech, 4 W, 0.1 GHz

  • 8/14/2019 Oral Vladimir Slides

    3/53


    Power-performance system optimization

    Complex, many levels of hierarchy and variables

  • 8/14/2019 Oral Vladimir Slides

    4/53

    Power-performance system optimization

    Complex, many levels of hierarchy and variables

    Individual components
    - Flops & latches (power and timing critical)

    [Figure: two flip-flops (D, Q, Clk) with a Logic block between them]

  • 8/14/2019 Oral Vladimir Slides

    5/53

    Power-performance system optimization

    Complex, many levels of hierarchy and variables

    Individual components
    - Flops & latches (power and timing critical)

    System level, VLSI blocks and circuits
    - Physical (Vdd, Vth, Sizing)
    - Logic
    - uArchitecture (parallelism, pipelining)

    [Figure: blocks labeled Vdd1/Vth1 through Vdd5/Vth5; flip-flop (D, Q, Clk) and logic diagrams built from Logic, Logic A, and Logic B blocks]

  • 8/14/2019 Oral Vladimir Slides

    6/53

    Power-performance system optimization

    Complex, many levels of hierarchy and variables

    Individual components
    - Flops & latches (power and timing critical)

    System level, VLSI blocks and circuits
    - Physical (Vdd, Vth, Sizing)
    - Logic
    - uArchitecture (parallelism, pipelining)

    Interfaces (Digital, Analog and Mixed-Signal)

    [Figure: blocks labeled Vdd1/Vth1 through Vdd5/Vth5; Transmitter, Channel, Receiver; flip-flop (D, Q, Clk) and logic diagrams built from Logic, Logic A, and Logic B blocks]

  • 8/14/2019 Oral Vladimir Slides

    7/53

    Look at sub-problem: links

    Seems pretty simple:

    [Figure: Transmitter, Channel, Receiver]

    Challenging multi-disciplinary area
    - Circuits
    - Communications
    - Optimization

  • 8/14/2019 Oral Vladimir Slides

    8/53

    What makes it challenging

    Now, the bandwidth limit is in wires

    [Figure: high speed link chip, > 2 GHz signals]

  • 8/14/2019 Oral Vladimir Slides

    9/53

    New link design

    Dealing with bandwidth limited channels
    - This is an old research area
    - Textbooks on digital communications
    - Think modems, DSL

    But can't directly apply their solutions
    - Standard approach requires high-speed A/Ds and digital signal processing
    - 20 GS/s A/Ds are expensive

    (Un)fortunately need to rethink issues

  • 8/14/2019 Oral Vladimir Slides

    10/53

    Outline

    Show system level optimization for links
    - Create a framework to evaluate trade-offs

    Background on high-speed links

    High-speed link modeling

    System level optimization

    Practical implementation issues

    Current / future work

  • 8/14/2019 Oral Vladimir Slides

    11/53

    Backplane environment

    Line attenuation

    Reflections from stubs (vias)

    [Figure: backplane signal path, with labels: on-chip parasitic (termination resistance and device loading capacitance), package, package via, line card trace, line card via, back plane connector, back plane trace, backplane via]

  • 8/14/2019 Oral Vladimir Slides

    12/53

    Backplane channel

    Loss is variable
    - Same backplane
    - Different lengths
    - Different stubs (top vs. bottom)

    Attenuation is large
    - > 30 dB @ 3 GHz
    - But is that bad?

    Required signal amplitude set by noise

    [Figure: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for four channels: 9" FR4; 9" FR4, via stub; 26" FR4; 26" FR4, via stub]

  • 8/14/2019 Oral Vladimir Slides

    13/53

    Inter-symbol interference (ISI)

    Channel is low pass
    - Our nice short pulse gets spread out

    Dispersion: short latency (skin-effect, dielectric loss)

    Reflections: long latency (impedance mismatches: connectors, via stubs, device parasitics, package)

    [Figure: pulse response (0 to 1) vs. time [ns] (0 to 3), Tsymbol = 160 ps]

  • 8/14/2019 Oral Vladimir Slides

    14/53

    ISI

    Middle sample is corrupted by 0.2 trailing ISI (from the previous symbol) and 0.1 leading ISI (from the next symbol), resulting in 0.3 total ISI

    As a result, middle symbol is detected in error

    [Figure: amplitude (0 to 1) vs. symbol time (0 to 18); the middle sample is marked "Error!"]
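The ISI arithmetic on this slide can be reproduced with a short sketch. The 0.1 leading and 0.2 trailing values come from the slide; the main-cursor height of 0.5, the 1-0-1 bit pattern, the 0/1 signaling, and the mid-level threshold are assumptions chosen only so that the example shows an error.

```python
def received_sample(bits, pulse, n, precursors=1):
    """Sample at symbol n: sum over neighbours of bit value times pulse tap."""
    total = 0.0
    for idx, tap in enumerate(pulse):
        k = idx - precursors      # k < 0: leading ISI, k > 0: trailing ISI
        m = n - k                 # index of the symbol that contributes this tap
        if 0 <= m < len(bits):
            total += bits[m] * tap
    return total

# [leading tap, main cursor (assumed), trailing tap]
pulse = [0.1, 0.5, 0.2]
bits = [1, 0, 1]                  # assumed pattern: the middle symbol is a 0

middle = received_sample(bits, pulse, n=1)   # 0.2 trailing + 0.1 leading
threshold = pulse[1] / 2                     # assumed mid-level slicer
detected = 1 if middle > threshold else 0    # 0 was sent, so 1 is an error
```

With these assumed values the middle sample sits at 0.3, above the 0.25 threshold, so a transmitted 0 is read as a 1.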

  • 8/14/2019 Oral Vladimir Slides

    15/53

    The right sub-system model

    Need accurate models
    - To relate the power/complexity to performance

    Main system impairments
    - Interference
    - Various noise sources
      - Voltage (thermal, supply, offsets, quantization noise)
      - Timing (jitter, offset)

  • 8/14/2019 Oral Vladimir Slides

    16/53

    Problem with current models

    Worst case analysis
    - Can be too pessimistic
    - If probability of worst case very small

    Gaussian distributions
    - Works well near mean
    - Often way off at tails, e.g. ISI distribution is bounded

    Use direct noise and interference statistics

  • 8/14/2019 Oral Vladimir Slides

    17/53

    Effect of timing noise

    Need to map from time to voltage

    The effect is going to depend on the size of the jitter, the input sequence, and the channel

    [Figure: ideal sampling vs. jittered sampling; the difference is the voltage noise when the receiver clock is off]

  • 8/14/2019 Oral Vladimir Slides

    18/53

    Example: Effect of transmitter jitter

    Decompose output into ideal and noise
    - Noise terms are pulses at front and end of symbol
    - Width of pulse is equal to jitter
    - Approximate with deltas on bandlimited channels

    [Figure: jittered pulse decomposition. Symbol $b_k$ with edges at $kT + \epsilon^{TX}_k$ and $(k+1)T + \epsilon^{TX}_{k+1}$ equals the ideal pulse from $kT$ to $(k+1)T$ plus narrow noise pulses of height $b_k$ at the two edges]
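The delta approximation can be checked numerically with a small sketch. It assumes a single-pole low-pass channel with time constant `tau` (a stand-in, not a backplane model) and hypothetical jitter values; `eps_start` and `eps_end` are the edge displacements of one symbol of height `b` that nominally spans 0 to `T`.

```python
import math

def step_response(t, tau):
    """Step response of the assumed single-pole low-pass channel."""
    return 1.0 - math.exp(-t / tau) if t >= 0 else 0.0

def impulse_response(t, tau):
    """Impulse response of the same channel (derivative of the step response)."""
    return math.exp(-t / tau) / tau if t >= 0 else 0.0

def jitter_noise_exact(t, b, T, eps_start, eps_end, tau):
    """Jittered pulse minus ideal pulse, both passed through the channel."""
    jittered = b * (step_response(t - eps_start, tau)
                    - step_response(t - T - eps_end, tau))
    ideal = b * (step_response(t, tau) - step_response(t - T, tau))
    return jittered - ideal

def jitter_noise_delta(t, b, T, eps_start, eps_end, tau):
    """Same noise with each edge pulse replaced by a delta of area b * eps."""
    return b * (-eps_start * impulse_response(t, tau)
                + eps_end * impulse_response(t - T, tau))

T, tau = 160e-12, 100e-12             # 160 ps symbol from the slides; tau assumed
eps_start, eps_end = 2e-12, -3e-12    # hypothetical edge jitter
t_sample = 1.5 * T
exact = jitter_noise_exact(t_sample, 1.0, T, eps_start, eps_end, tau)
approx = jitter_noise_delta(t_sample, 1.0, T, eps_start, eps_end, tau)
```

For jitter that is small compared with the channel time constant, the two results agree to within a few percent, which is the sense in which the narrow pulses behave like deltas on a bandlimited channel.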

  • 8/14/2019 Oral Vladimir Slides

    19/53

    Jitter effect on voltage noise

    Transmitter jitter
    - High frequency (cycle-to-cycle) jitter is bad
      - Changes the energy (area) of the symbol
      - No correlation of noise sources that sum
    - Low frequency jitter is less bad
      - Effectively shifts waveform
      - Correlated noise gives partial cancellation

    Receive jitter
    - Modeled by shift of transmit sequence
    - Same as low frequency transmitter jitter

    Bandwidth of the jitter is critical
    - It sets the magnitude of the noise created

  • 8/14/2019 Oral Vladimir Slides

    20/53

    Jitter source from PLL clocks

    Noise sources
    - Reference clock phase noise
    - VCO supply noise
    - Clock buffer supply noise

    [Figure: PLL block diagram: RefClk, phase detector (Kpd), charge pump (Icp), loop filter (R, C), VCO (Kvco/s), clock buffer, divide-by-N feedback]

    [Figure: noise transfer functions [dB] (-30 to 10) vs. frequency [Hz] (10^5 to 10^10), from input clock, from VCO supply, and from clock buffer supply]

    M. Mansuri and C-K.K. Yang, "Jitter optimization based on phase-locked loop design parameters," IEEE Journal of Solid-State Circuits, Nov. 2002

  • 8/14/2019 Oral Vladimir Slides

    21/53

    2x Oversampled bang-bang CDR

    Generate early/late from $d_n$, $d_{n-1}$, $e_n$
    - Simple 1st order loop, cancels receiver setup time

    Now need jitter on data Clk, not PLL output
    - Base linear PLL jitter
    - Add non-linear phase selector noise from CDR

    [Figure: data samples $d_{n-1}$, $d_n$ and edge sample $e_n$; the case shown is $e_n$ late]
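A minimal sketch of the early/late decision, assuming the usual convention that the edge sample is taken between the previous and current data samples. The function name and the -1/0/+1 encoding are illustrative choices, not taken from the slides.

```python
def bang_bang_pd(d_prev, e, d_curr):
    """Early/late vote from two data samples and the edge sample between them.

    Returns 0 when there is no data transition (no timing information),
    -1 when the sampling clock looks early, +1 when it looks late.
    """
    if d_prev == d_curr:
        return 0
    # Edge sample still equals the old bit: it was taken before the
    # transition, so the clock is early. Otherwise the clock is late.
    return -1 if e == d_prev else +1

votes = [bang_bang_pd(0, 0, 1), bang_bang_pd(0, 1, 1), bang_bang_pd(1, 1, 1)]
```

A first-order loop would simply step the sampling phase by one position in the direction of each non-zero vote.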

  • 8/14/2019 Oral Vladimir Slides

    22/53

    Bang-bang CDR model

    Model CDR loop as a state machine
    - Markov chain

    Gives the probability distribution of phase
    - Which is the CDR jitter distribution

    [Figure: log10 steady-state probability (0 to -15) vs. phase count (0 to 250)]

    A.E. Payzin, "Analysis of a Digital Bit Synchronizer," IEEE Transactions on Communications, April 1983.
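One way to picture the state-machine model is a birth-death Markov chain over phase positions, sketched below. The Gaussian early/late vote probability, the 64 phase positions, and the parameter names are assumptions made for illustration; the model behind the slide's plot may differ.

```python
import math

def steady_state_phase_pmf(num_phases, lock_phase, sigma_phases):
    """Steady-state phase distribution of a one-step-at-a-time CDR loop.

    At phase i the loop steps up when the detector votes "early" and down
    when it votes "late". The data transition density scales both step
    probabilities equally, so it cancels out of the steady state.
    """
    scale = sigma_phases * math.sqrt(2.0)

    def p_early(i):   # assumed: Gaussian edge jitter around the lock phase
        return 0.5 * math.erfc((i - lock_phase) / scale)

    def p_late(i):
        return 0.5 * math.erfc((lock_phase - i) / scale)

    # Detailed balance for a birth-death chain:
    #   pi[i + 1] * P(down from i + 1) = pi[i] * P(up from i)
    log_w = [0.0]
    for i in range(num_phases - 1):
        log_w.append(log_w[-1] + math.log(p_early(i)) - math.log(p_late(i + 1)))

    peak = max(log_w)
    weights = [math.exp(v - peak) for v in log_w]   # log domain avoids overflow
    total = sum(weights)
    return [w / total for w in weights]

pmf = steady_state_phase_pmf(num_phases=64, lock_phase=32, sigma_phases=3.0)
log10_pmf = [math.log10(p) if p > 0 else float("-inf") for p in pmf]
```

Plotting `log10_pmf` against the phase index gives a curve of the same kind as the slide's figure: a peak at the lock phase with tails that fall off much faster than the vote probabilities themselves.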

  • 8/14/2019 Oral Vladimir Slides

    23/53

    Outline

    Show system level optimization for links
    - Create a framework to evaluate trade-offs

    Background on high-speed links

    High-speed link modeling

    System level optimization
    - Limits: What is the capacity of these links?
    - Improving today's baseband signaling

    Practical implementation issues

    Current / future work

  • 8/14/2019 Oral Vladimir Slides

    24/53

  • 8/14/2019 Oral Vladimir Slides

    25/53

    Capacity with link-specific noise

    Effective noise from phase noise
    - Proportional to signal energy
    - Decreases expected gains

    Still, capacity much higher than data rates in today's links

    [Figure: two panels, NELCO and FR4. Capacity [Gb/s] (0 to 140) vs. log10(clipping probability) (-25 to 0), for thermal noise only, thermal noise and LC PLL phase noise, and thermal noise and ring PLL phase noise]
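Why signal-proportional noise limits the gain can be seen in a small sketch. It sums Shannon capacity over frequency bins and adds a noise term equal to `jitter_ratio` times the received signal power; the attenuation profile, noise floor, and ratio are hypothetical numbers, not values from the slides.

```python
import math

def capacity_bps(signal_psd, thermal_psd, jitter_ratio, bin_hz):
    """Sum of per-bin Shannon capacities with signal-proportional noise."""
    total = 0.0
    for s in signal_psd:
        sinr = s / (thermal_psd + jitter_ratio * s)
        total += bin_hz * math.log2(1.0 + sinr)
    return total

attenuation_db = [5, 10, 15, 20, 30, 40]        # hypothetical loss per 1 GHz bin
rx_psd = [10 ** (-a / 10) for a in attenuation_db]
bin_hz = 1e9
thermal = 1e-6                                  # hypothetical noise floor
jitter_ratio = 1e-3                             # hypothetical phase-noise ratio

c_thermal = capacity_bps(rx_psd, thermal, 0.0, bin_hz)
c_with_jitter = capacity_bps(rx_psd, thermal, jitter_ratio, bin_hz)
ceiling = len(rx_psd) * bin_hz * math.log2(1.0 + 1.0 / jitter_ratio)
```

Because the extra noise grows with the signal, the per-bin SINR can never exceed `1 / jitter_ratio`, so raising transmit power stops helping once the phase-noise term dominates.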

  • 8/14/2019 Oral Vladimir Slides

    26/53

    Today's links

    Exclusively baseband

    Biggest problem is ISI
    - Starting to use equalization
    - Thinking about multi-level modulation

    Constrained by speed and power

    Large number of links on a chip

    Model links to find efficient implementations

  • 8/14/2019 Oral Vladimir Slides

    27/53

    Baseband links - removing ISI

    Transmit and Receive Equalization
    - Changes signal to correct for ISI
    - Often easier to work at transmitter
      - DACs easier than ADCs

    [Figure: linear transmit equalizer (TxData, causal taps, anticausal taps) driving the channel; decision-feedback equalizer (sampled data, deadband, feedback taps, tap sel logic)]

    J. Zerbe et al., "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal of Solid-State Circuits, Dec. 2003.

  • 8/14/2019 Oral Vladimir Slides

    28/53

    Transmit equalization headroom constraint

    Transmit DAC has limited voltage headroom
    - Peak power constraint

    Unknown target signal levels
    - Amplitude of equalized signal depends on the channel
    - Hard to formulate error or objective function

    Need to tune the equalizer and receive comparator levels

    [Figure: attenuation [dB] (0 to -25) vs. frequency [GHz] (0 to 2.5), equalized and unequalized]
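The headroom effect can be illustrated with a two-tap sketch. The channel pulse `[1.0, 0.5]` and the tap values are hypothetical, and the peak constraint is modeled as the sum of absolute tap weights not exceeding the DAC swing.

```python
def convolve(a, b):
    """Plain discrete convolution of two short tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def scale_to_peak(taps, peak=1.0):
    """Scale taps so the worst-case transmit amplitude equals the DAC swing."""
    total = sum(abs(t) for t in taps)
    return [t * peak / total for t in taps]

channel = [1.0, 0.5]            # hypothetical pulse: main cursor, one post-cursor
raw_taps = [1.0, -0.5]          # cancels the first post-cursor for this channel
taps = scale_to_peak(raw_taps)  # enforce the peak constraint
equalized = convolve(taps, channel)
main_level = equalized[0]
```

Here the first post-cursor is cancelled, but the main cursor drops to two thirds of the unequalized level. A channel with a larger post-cursor would leave an even smaller level, which is why the receiver cannot know its target level in advance.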

  • 8/14/2019 Oral Vladimir Slides

    29/53

    Optimization example: Power constrained linear precoding

    Add variable gain to amplify to known target level
    - Formulate the objective function from error

    $$\mathrm{MSE}(w, g) = E_a\left(1 - 2g\,w^T P \mathbf{1} + g^2\,w^T P P^T w\right) + g^2\sigma^2$$

    SINR is not concave in $w$ in general
    - Change objective to quasiconcave

    $$\mathrm{SINR}_{unbiased}(w) = \frac{E_a\,(w^T P \mathbf{1})^2}{E_a\,w^T P\,(I - \mathbf{1}\mathbf{1}^T)(I - \mathbf{1}\mathbf{1}^T)^T P^T w + \sigma^2}$$

  • 8/14/2019 Oral Vladimir Slides

    30/53

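    Optimal linear precoding

    Minimize BER
    - Residual dispersion into peak distortion
    - Reflections into mean distortion
    - Includes all link-specific noise sources

    $$\underset{w}{\text{maximize}} \quad \frac{0.5\,d_{min}\,w^T P \mathbf{1} - V^{PD}_{peak}(w^T P I_{PD}) - \text{offset}}{\left(E_a\,w^T P\,(I - I_{PD} - \mathbf{1}\mathbf{1}^T)(I - I_{PD} - \mathbf{1}\mathbf{1}^T)^T P^T w + \sigma^2\right)^{1/2}} \quad \text{s.t.} \quad \|w\|_1 \le 1$$

    $$\sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma^2_{thermal}$$

    Still, does this objective really relate to link performance?
    - Need to look at noise and interference distributions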

  • 8/14/2019 Oral Vladimir Slides

    31/53

    Including feedback equalization

    Feedback equalization (DFE)
    - Subtracts error from input
    - No attenuation

    Problem with DFE
    - Need to know interfering bits
    - ISI must be causal

    Problem - latency in the decision circuit
    - Receive latency + DAC settling < bit time

    Can increase allowable time by loop unrolling
    - Receive next bit before the previous is resolved

    [Figure: amplitude (0 to 1) vs. symbol time (0 to 18), with the region handled by feedback equalization marked]
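A one-tap DFE can be sketched in a few lines. The example assumes +1/-1 signaling and a hypothetical channel with a main cursor of 1.0 and a single trailing tap of 0.6; the function names are illustrative.

```python
def channel_output(symbols, main, post):
    """Received samples for a pulse with a main cursor and one trailing tap."""
    out, prev = [], 0.0
    for s in symbols:
        out.append(main * s + post * prev)
        prev = s
    return out

def dfe_detect(samples, post):
    """One-tap DFE: subtract the trailing ISI of the previous decision."""
    decisions, corrected = [], []
    prev = 0                      # no previous decision before the first symbol
    for x in samples:
        y = x - post * prev       # feedback subtraction
        d = 1 if y >= 0 else -1   # slice at zero
        corrected.append(y)
        decisions.append(d)
        prev = d
    return decisions, corrected

symbols = [1, -1, -1, 1, 1, -1]
samples = channel_output(symbols, main=1.0, post=0.6)
decisions, corrected = dfe_detect(samples, post=0.6)
margin_before = min(abs(x) for x in samples)     # about 0.4 without feedback
margin_after = min(abs(y) for y in corrected)    # about 1.0 with feedback
```

The feedback path has to finish its subtraction before the next sample is sliced, which is the latency problem the slide raises.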

  • 8/14/2019 Oral Vladimir Slides

    32/53

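    1 bit loop unrolling

    Instead of subtracting the error
    - Move the slicer level to include the noise
    - Slice for each possible level, since previous value unknown

    [Figure: 2PAM signal constellation with one tap of trailing ISI $\alpha$: received levels $1+\alpha$, $1-\alpha$, $-1+\alpha$, $-1-\alpha$; slicer thresholds at $+\alpha$ and $-\alpha$ instead of 0]

    [Figure: input $x_n$ feeds two slicers clocked by dClk, giving $d_n \mid d_{n-1}=1$ and $d_n \mid d_{n-1}=0$; the previous decision $d_{n-1}$ selects between them]

    K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990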
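A sketch of one-bit loop unrolling, assuming +1/-1 signaling and a single trailing ISI tap called `alpha` here. The sample values are what a hypothetical channel with main cursor 1.0 and trailing tap 0.6 would produce for the symbols +1, -1, -1, +1, +1, -1.

```python
def unrolled_dfe_detect(samples, alpha, first_prev=-1):
    """Slice every sample at both candidate thresholds, then pick one.

    The two comparisons do not depend on the previous decision, so they can
    run in parallel; only the final selection waits for it.
    """
    decisions = []
    prev = first_prev                                 # assumed starting state
    for x in samples:
        d_if_prev_high = 1 if x >= alpha else -1      # threshold moved to +alpha
        d_if_prev_low = 1 if x >= -alpha else -1      # threshold moved to -alpha
        d = d_if_prev_high if prev == 1 else d_if_prev_low
        decisions.append(d)
        prev = d
    return decisions

samples = [1.0, -0.4, -1.6, 0.4, 1.6, -0.4]
decisions = unrolled_dfe_detect(samples, alpha=0.6)
```

The decisions match what a conventional one-tap DFE would give, but the critical path is now a multiplexer select instead of a subtraction followed by a comparison.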

  • 8/14/2019 Oral Vladimir Slides

    33/53

    Residual error

    Cannot correct all the ISI
    - Equalizers are finite length
    - EQ coefficients quantized
    - ISI-noise enhancement tradeoff

    The error affects both voltage and timing

    Need accurate distribution of this error
    - Random data
    - Standard textbook methods for distribution of the sum of weighted random variables
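For random, independent +1/-1 data, the distribution of the residual ISI is the convolution of one two-point distribution per residual tap. The sketch below builds it and compares its tail with a Gaussian of the same variance; the residual tap values (in mV) are hypothetical.

```python
import math
from collections import defaultdict

def isi_pmf(residual_taps):
    """PMF of sum(tap * symbol) for independent, equiprobable +1/-1 symbols."""
    pmf = {0.0: 1.0}
    for tap in residual_taps:
        nxt = defaultdict(float)
        for value, prob in pmf.items():
            nxt[round(value + tap, 9)] += 0.5 * prob   # symbol = +1
            nxt[round(value - tap, 9)] += 0.5 * prob   # symbol = -1
        pmf = dict(nxt)
    return pmf

def tail_probability(pmf, level):
    """P(ISI >= level) from the exact distribution."""
    return sum(p for v, p in pmf.items() if v >= level)

def gaussian_tail(level, sigma):
    """P(X >= level) for a zero-mean Gaussian with the given sigma."""
    return 0.5 * math.erfc(level / (sigma * math.sqrt(2.0)))

taps_mv = [8.0, 5.0, 3.0, 2.0, 1.0, 1.0]            # hypothetical residual taps
pmf = isi_pmf(taps_mv)
worst_case = sum(abs(t) for t in taps_mv)           # 20 mV: the distribution is bounded
sigma = math.sqrt(sum(t * t for t in taps_mv))
p_exact = tail_probability(pmf, 25.0)               # nothing beyond the worst case
p_gauss = gaussian_tail(25.0, sigma)                # the Gaussian still assigns mass there
```

The exact distribution has no mass beyond the worst case, while the Gaussian fit keeps a non-zero tail there, which is the kind of pessimism at low probabilities that a Gaussian model introduces.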

  • 8/14/2019 Oral Vladimir Slides

    34/53

    Comparison with Gaussian model

    [Figure, panel "Cumulative ISI distribution": log10 probability (0 to -10) vs. residual ISI [mV] (0 to 100). Annotations: 40 mV error @ 10^-10, 25% of eye height]

    [Figure, panel "Impact on CDR phase": log10 steady-state phase probability (0 to -10) vs. phase count (80 to 180). Annotations: 4% Tsymbol, error @ 10^-10, 9% Tsymbol]

    Gaussian model only good down to 10^-3 probability
    - Way pessimistic for much lower probabilities

  • 8/14/2019 Oral Vladimir Slides

    35/53

    BER contours

    Voltage margin
    - Min. distance between the receiver threshold and contours with same BER

    [Figure: two panels of BER contours, "5 tap Tx Eq" and "5 tap Tx Eq + 1 tap DFE". Margin [mV] (-150 to 150) vs. time [ps] (0 to 160); contour scale from -5 to -30]

  • 8/14/2019 Oral Vladimir Slides

    36/53

    Pulse amplitude modulation

    Binary (NRZ)
    - 1 bit / symbol
    - Symbol rate = bit rate

    PAM4
    - 2 bits / symbol
    - Symbol rate = bit rate/2

    [Figure: PAM4 levels labeled 10, 11, 01, 00; binary levels labeled 1, 0]
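The bit-to-symbol relationship can be written out as a short sketch. The label order 10, 11, 01, 00 follows the figure; mapping it top to bottom onto the normalized levels +1, +1/3, -1/3, -1 is an assumption.

```python
# Assumed level assignment, top to bottom: 10, 11, 01, 00
PAM4_LEVELS = {(1, 0): 1.0, (1, 1): 1.0 / 3.0, (0, 1): -1.0 / 3.0, (0, 0): -1.0}

def pam4_encode(bits):
    """Map consecutive bit pairs (msb, lsb) to PAM4 levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def symbol_rate(bit_rate, bits_per_symbol):
    """Symbol rate needed to carry a given bit rate."""
    return bit_rate / bits_per_symbol

symbols = pam4_encode([1, 0, 0, 0, 1, 1])
pam2_rate = symbol_rate(10e9, 1)   # 10 Gb/s binary: 10 GS/s
pam4_rate = symbol_rate(10e9, 2)   # 10 Gb/s PAM4: 5 GS/s
```

With this ordering, adjacent levels differ in only one bit, so a slicing error between neighbouring levels corrupts a single bit.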

  • 8/14/2019 Oral Vladimir Slides

    37/53

  • 8/14/2019 Oral Vladimir Slides

    38/53

  • 8/14/2019 Oral Vladimir Slides

    39/53

    Outline

    Show system level optimization for links
    - Create a framework to evaluate trade-offs

    Background on high-speed links

    High-speed link modeling

    System level optimization

    Practical implementation issues
    - Low-cost adaptation
    - Dual-mode link (hardware re-use)

    Current / future work

  • 8/14/2019 Oral Vladimir Slides

    40/53

    Fully adaptive dual-mode link

    Reconfigurable dual-mode PAM2/PAM4 link

    Adaptive equalization
    - Transmit and receive equalization
    - DFE with loop unrolling

    PAM2/PAM4, 2-10Gb/s, 0.13µm, 40mW/Gb

    [Figure: chip with TX, RX, and PLL blocks labeled]

  • 8/14/2019 Oral Vladimir Slides

    41/53

  • 8/14/2019 Oral Vladimir Slides

    42/53

    Dual-loop adaptive algorithm

    Equalizer loop
    - Scale the equalizer output (Tx constraint)

    $$w_{n+1} = w_n + step_w \cdot sign(e_n) \cdot sign(x_n)$$

    Data level reference loop

    $$dLev_{n+1} = dLev_n - step_{dataLev} \cdot sign(e_n), \quad x_n > 0$$

    [Figure: three eyes, initial (dLev_init, peak-to-peak error error_init), mid-way equalized (dLev_mid), and equalized (dLev_end); signals $x_n$, $sign(x_n)$, $sign(e_n)$]
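The interplay of the two loops can be explored with a toy simulation. Everything specific here is assumed for illustration: the channel pulse `(1.0, 0.5)`, a single adapted post-cursor tap, the error defined as `dLev * sign(x) - x`, the use of the known previous data bit in place of the sign of the earlier sample, and the main tap set to `1 - |w_post|` to model the Tx peak constraint.

```python
import random

def sign(v):
    return 1 if v >= 0 else -1

def adapt_dual_loop(channel=(1.0, 0.5), steps=20000,
                    step_w=0.001, step_dlev=0.001, seed=1):
    """Sign-sign update of one Tx post-cursor tap plus a data-level loop."""
    rng = random.Random(seed)
    w_post, d_lev = 0.0, 0.3            # assumed starting point
    d1 = d2 = 1                         # previous two data bits
    for _ in range(steps):
        d0 = rng.choice((-1, 1))
        w_main = 1.0 - abs(w_post)      # scale so that |w_main| + |w_post| = 1
        tx_now = w_main * d0 + w_post * d1
        tx_prev = w_main * d1 + w_post * d2
        x = channel[0] * tx_now + channel[1] * tx_prev
        e = d_lev * sign(x) - x         # assumed error definition
        w_post += step_w * sign(e) * d1             # equalizer loop
        w_post = max(-0.5, min(0.5, w_post))        # keep the main tap dominant
        if x > 0:
            d_lev -= step_dlev * sign(e)            # data level reference loop
        d2, d1 = d1, d0
    return 1.0 - abs(w_post), w_post, d_lev

w_main, w_post, d_lev = adapt_dual_loop()
residual_post = 1.0 * w_post + 0.5 * w_main   # first post-cursor after equalization
```

In this toy setup the unequalized post-cursor is 0.5, and the loops settle with a smaller residual and a negative post-cursor tap. Because only signs are used, the result wanders inside a dead zone rather than converging to a single point.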

  • 8/14/2019 Oral Vladimir Slides

    43/53

    Dual loop convergence - 4 tap example

    Hard to estimate analytically

    Experimental results show
    - Both loops are stable within a wide range (0.1 - 10x) of relative speeds

    [Figure: tap weight [mV] (-400 to 1000) vs. number of updates (0 to 200), for main tap, pre1, post1, post2]

    [Figure: dLev [mV] (0 to 100) vs. number of updates (0 to 200)]

    PAM2, 5Gb/s, 4 taps Tx Equalization

  • 8/14/2019 Oral Vladimir Slides

    44/53

    Hardware re-use: Dual-mode receiver

    PAM4

    [Figure: dual-mode receiver schematic: input x; slicers at thresh (+), 0, and thresh (-) producing lsb(+), msb, and lsb(-); D flip-flops clocked by dClk; 2:1 multiplexers controlled by prDFE enable]

  • 8/14/2019 Oral Vladimir Slides

    45/53

    Hardware re-use: Dual-mode receiver

    PAM4

    PAM2

    [Figure: the same dual-mode receiver schematic (slicers at thresh (+), 0, and thresh (-); outputs lsb(+), msb, lsb(-); dClk; prDFE enable multiplexers), with the PAM2 path through the slicer at 0 marked]

  • 8/14/2019 Oral Vladimir Slides

    46/53

    Hardware re-use: Dual-mode receiver

    PAM4

    PAM2 with loop-unrolled DFE tap
    - Leverage multi-level properties of signals in loop-unrolling

    [Figure: the same dual-mode receiver schematic, with the thresh(+) and thresh(-) slicers marked for the loop-unrolled PAM2 mode (prDFE enable)]
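The hardware re-use idea can be sketched as one function that drives three shared comparators in different modes. The mode names, the bit mapping (levels top to bottom labeled 10, 11, 01, 00), and the 0/1 encoding of the previous bit are assumptions made for illustration.

```python
def dual_mode_slice(x, thresh, mode, prev_bit=0):
    """Decode one sample using three comparators shared between modes."""
    above_plus = x > thresh       # slicer at thresh(+)
    above_zero = x > 0.0          # slicer at 0
    above_minus = x > -thresh     # slicer at thresh(-)
    if mode == "PAM4":
        msb = int(above_zero)
        # lsb comes from the outer slicer on the same side as the msb
        lsb = int(not above_plus) if msb else int(above_minus)
        return (msb, lsb)
    if mode == "PAM2":
        return int(above_zero)
    if mode == "PAM2_prDFE":
        # Loop unrolling: the outer slicers act as the two shifted thresholds
        return int(above_plus) if prev_bit else int(above_minus)
    raise ValueError(mode)

pam4_bits = [dual_mode_slice(x, 2.0 / 3.0, "PAM4") for x in (0.9, 0.3, -0.3, -0.9)]
after_one = dual_mode_slice(0.4, 0.6, "PAM2_prDFE", prev_bit=1)
after_zero = dual_mode_slice(0.4, 0.6, "PAM2_prDFE", prev_bit=0)
```

The same two outer comparators that resolve the PAM4 lsb serve as the two candidate slicers of the loop-unrolled PAM2 receiver, so the extra DFE tap needs only a change of threshold values and a multiplexer select.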

  • 8/14/2019 Oral Vladimir Slides

    47/53

    Improvements with loop-unrolling

    Signal as seen by the receiver (on-chip scope)

    [Figure (a): unequalized, [V] (0 to 0.4) vs. [ps] (0 to 4000)]

    [Figure (b): fully transmit equalized, and transmit equalized with one tap DFE, [V] (0 to 0.25) vs. [ps] (0 to 4000)]

    [Figure: [mV] (-100 to 200) vs. [ps] (0 to 200); scale log10(voltage probability distribution) from -5 to -3]

  • 8/14/2019 Oral Vladimir Slides

    48/53

    Model and measurements

    [Figure: log10(BER) (0 to -14) vs. voltage margin [mV] (-80 to 80)]

    PAM4, 3 taps of transmit equalization, 5Gb/s

  • 8/14/2019 Oral Vladimir Slides

    49/53

    Outline

    Show system level optimization for links
    - Create a framework to evaluate trade-offs

    Background on high-speed links

    High-speed link modeling

    System level optimization

    Practical implementation issues

    Current / future work
    - Bridging the gap to link capacity

  • 8/14/2019 Oral Vladimir Slides

    50/53

    Bridging the gap: Multi-tone link

    [Figure: multi-tone data rates with thermal noise. # bits/Hz (0 to 10) vs. frequency [GHz] (0 to 14). Nelco: 64Gb/s; FR4: 38Gb/s]

  • 8/14/2019 Oral Vladimir Slides

    51/53

    Bridging the gap: Multi-tone link

    Challenge: balancing the inter-symbol and inter-channel interference
    - Microwave filter techniques
    - Custom signal processing

    [Figure: # levels per tone vs. f for data0, data1, ..., dataN]

    [Figure: multi-tone transceiver: data0 through an LPF, data1 ... dataN mixed with $e^{j\omega_1 t}$ ... $e^{j\omega_N t}$ and passed through BPFs on the transmit side; BPF, mixing, and LPF per tone on the receive side]

    [Figure: multi-tone data rates with thermal noise. # bits/Hz (0 to 8) vs. frequency [GHz] (0 to 14). Nelco: 64Gb/s; FR4: 38Gb/s]

  • 8/14/2019 Oral Vladimir Slides

    52/53

    Conclusions

    Links are a nice example of system-level optimization
    - Need accurate models
    - Global tradeoff: off-chip communication with on-chip computation

    ISI is large in baseband links
    - Can't completely compensate (at least not with reasonable area/power)

    Power constrained transmitter
    - PAM4 and simple DFE are attractive solutions

    Implemented practical, low-cost algorithms

    Still, far from the capacity of these links
    - Looking into multi-tone to bridge the gap

  • 8/14/2019 Oral Vladimir Slides

    53/53

    Acknowledgments

    Prof. Mark Horowitz and Prof. Vojin Oklobdzija

    Prof. Stephen Boyd, Prof. Joseph Kahn, Prof. Thomas Lee

    My mother Nada, my wife Ivana, kids Marija and Marko, my sister

    Tamara, Maurizio and my whole family

    Rambus and MARCO IFC for support

    Jared Zerbe, Andrew Ho, Fred Chen and everybody in Rambus XG team

    MH group - especially Elad Alon and Amir Amirkhany

    Dr. George Ginis and Prof. John Cioffi

    Dejan Markovic and Prof. Borivoje Nikolic

    Prof. Michael Flynn, Prof. Ken Yang

    Marianne Marx, Teresa, Penny, Taru, Deborah, Pamela

    My friends Svjetlana, Danijela and Dejan