ISSCC 2010 / SESSION 8 / HIGH-SPEED WIRELINE TRANSCEIVERS / 8.6

2010 IEEE International Solid-State Circuits Conference

8.6 A Fractional-Sampling-Rate ADC-Based CDR with Feedforward Architecture in 65nm CMOS

Oleksiy Tyshchenko¹, Ali Sheikholeslami¹, Hirotaka Tamura², Yasumoto Tomita², Hisakatsu Yamaguchi², Masaya Kibune², Takuji Yamamoto²

¹University of Toronto, Toronto, Canada
²Fujitsu Laboratories, Kawasaki, Japan

ADC-based CDRs take digital samples of the received signal to recover the clock and data. Digital representation of the signal allows for extensive channel equalization in the digital domain. Recently reported ADC-based CDRs sample the signal at 1× or 2× the baud rate. The 1× CDR aligns the sampling clock with the signal using a phase-tracking feedback loop [1-2], which requires a voltage-controlled oscillator or phase interpolator, both analog circuits, to adjust the phase of the sampling clock. To eliminate these analog circuits (and their phase control) in favor of an all-digital implementation, a blind-sampling ADC-based CDR (top of Fig. 8.6.1) samples the received signal at 2× without phase locking to the signal. The CDR then interpolates between the blind samples to obtain a new set of samples in order to recover the phase and data [3-4]. The doubling of the sampling rate, however, increases the ADC power consumption or, equivalently, reduces the maximum baud rate due to the conversion-rate limitations of ADCs.

This paper presents a new fractional-sampling-rate (FSR) CDR architecture, shown in Fig. 8.6.1, that samples the received signal blindly at a fractional rate of 1.45×, hence reducing the ADC power per Gb/s of data rate by 27.3% compared to the 2× architecture. This architecture uses a digital phase detector (PD) that estimates the data phase directly from the blind digital samples, thus eliminating the need for interpolation. This PD enables data recovery in a feed-forward path, further simplifying the CDR architecture. Measurements of a test chip fabricated in 65nm CMOS confirm that the FSR CDR successfully recovers data with BER < 10^-13 at 6.875Gb/s from samples taken at 10GS/s.
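The rates quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that ADC power scales linearly with conversion rate, which is the reading under which the 27.3% figure follows from the 1.45× ratio. The variable names are ours, not the paper's.

```python
from fractions import Fraction

# Figures quoted in the text.
baud_rate = Fraction(6875, 1000)   # data rate in Gb/s
sample_rate = Fraction(10)         # aggregate ADC rate in GS/s

# Samples per unit interval (UI) and sampling interval (SI) in UI.
samples_per_ui = sample_rate / baud_rate   # 16/11, about 1.45
si_in_ui = 1 / samples_per_ui              # 11/16 UI

# Assumption: ADC power is proportional to conversion rate, so the
# saving relative to a 2x blind sampler is 1 - samples_per_ui / 2.
saving_vs_2x = 1 - samples_per_ui / 2      # 3/11

print(samples_per_ui, round(float(samples_per_ui), 2))
print(si_in_ui)
print(f"{float(saving_vs_2x):.1%}")
```

Exact fractions are used so that the 16/11 and 11/16 ratios appear without rounding noise.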

A block diagram of the CDR architecture is shown in Fig. 8.6.2. We blindly sample a 6.875Gb/s signal with four time-interleaved 2.5GS/s 5-bit flash ADCs for a total sampling rate of 10GS/s, corresponding to 1.45 samples per unit interval (UI). This sampling rate makes the sampling interval (SI) equal to 11/16 UI, which causes the sampling instants to span the full duration of a UI. A 4:16 DeMUX then feeds 16 samples at a time, corresponding to 11 UIs, to the digital CDR. The PD estimates the instantaneous zero-crossing phase, φX[1:16], for every UI, using a scheme we describe later. We use φX to recover the average zero-crossing phase, φAVG, in two steps. First, the phase subtractor generates the phase error, φERR, with a modulo subtraction of φAVG from φX, bounding φERR within [-0.5; 0.5) UI. Then, φERR is fed into a third-order low-pass filter to recover φAVG. The filter consists of three discrete-time integrators with programmable gains, K1, K2, and K3, that control the CDR's jitter-tracking bandwidth. The data decision block picks one sliced sample per UI as the recovered data by comparing φX[n] and φAVG, and marks duplicate samples, present in some UIs due to the FSR, as invalid samples. We remove these invalid samples from the data-decision vector, Ŝ[1:16], with a vector compactor (described later), which outputs 11 data bits, D[1:11]. For measurement purposes, we retime the recovered data from the blind-sampling clock domain to the baud-rate clock domain, fB/16, using a FIFO.
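The modulo subtraction that bounds the phase error can be sketched as follows. This is a behavioral illustration, not the fabricated logic: `phase_error` and `track` are hypothetical names, and `track` deliberately uses a single integrator as a stand-in, because the text does not spell out how the three integrators of the third-order filter are interconnected.

```python
import math
from fractions import Fraction

def phase_error(phi_x, phi_avg):
    """Wrap (phi_x - phi_avg) into [-0.5, 0.5) UI (modulo subtraction)."""
    diff = phi_x - phi_avg
    return diff - math.floor(diff + Fraction(1, 2))

def track(phi_x_seq, gain=Fraction(1, 8)):
    """Single-integrator stand-in for the loop filter (an assumption).

    Accumulates a fraction of each wrapped error and keeps the
    running phase estimate inside [0, 1) UI.
    """
    phi_avg = Fraction(0)
    for phi_x in phi_x_seq:
        phi_avg = (phi_avg + gain * phase_error(phi_x, phi_avg)) % 1
    return phi_avg

# Phases near the UI boundary take the short way around the wrap.
print(phase_error(Fraction(9, 10), Fraction(1, 10)))   # -1/5
print(phase_error(Fraction(1, 10), Fraction(9, 10)))   # 1/5
```

The wrap matters because phase is circular: a zero crossing at 0.9 UI is only 0.2 UI away from an average at 0.1 UI, not 0.8 UI.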

The PD, shown in Fig. 8.6.3, consists of an average-transition-slope calculator and a data-phase calculator. From the 16 samples at its input, the PD linearly estimates φX for every pair of adjacent samples with opposite polarities. This linear phase estimation proves sufficient when there is enough ISI in the channel. Otherwise, an anti-aliasing filter needs to precede the sampling ADC. The PD uses the transition slope between the samples to estimate the phase. As shown in Fig. 8.6.3, due to the FSR some slopes lead to small errors in φX, while others lead to larger errors. To calculate a running average of slopes, we select only those slopes that lead to low φX error. When two transitions occur around one sample (top waveform in Fig. 8.6.3), only the transition with the higher slope contributes to the average (S[n] to S[n+1]). When such a comparison is impossible (bottom waveform in Fig. 8.6.3), a slope contributes to the average only if both its samples exceed a threshold level, VTH, that is extracted from sample magnitudes. Since the time between adjacent samples is constant, finding the slope simplifies to a sum of sample magnitudes; the slope calculator thus outputs (|S[n]|+|S[n+1]|)AVG. The phase calculator, shown in Fig. 8.6.3, estimates the zero-crossing time, φZC[n] (in units of SI), as the ratio |S[n]|/(|S[n]|+|S[n+1]|). To maintain low circuit complexity, the accuracy of φZC[n] is limited to 2 bits. For transitions with low-error slopes, we use an instantaneous sum in the 2-bit φZC[n] calculation, while for transitions with high-error slopes we use the average sum. We then convert φZC[n] from SI to UI using φX[n]=TS[n]+SI·φZC[n], where TS[n] is the time stamp, that is, the sample's position within the UI. Our choice of sampling rate causes TS[n] to repeat every 16 samples. Since φZC[n] is only 2-bit accurate, we convert φZC[n] to φX[n] using a selector with constant inputs, as shown in Fig. 8.6.3.
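A minimal sketch of the phase calculation for the low-error case (instantaneous sum) is given below. Several details are assumptions: the 2-bit code is taken as the floor of four times the ratio, the selector constants are taken as the bin midpoints SI·1/8, SI·3/8, SI·5/8 and SI·7/8 that label Fig. 8.6.3, and the sample values 5 and -2 are illustrative. The function names are hypothetical.

```python
from fractions import Fraction

SI = Fraction(11, 16)  # sampling interval in UI, from the text

def time_stamp(n):
    """Position of sample n (1-based) within its UI; repeats every 16."""
    return ((n - 1) * SI) % 1

def phi_zc_code(s_n, s_next):
    """2-bit zero-crossing code for a pair of opposite-polarity samples.

    Assumes at least one sample is non-zero. The quantization rule
    (floor of 4 * ratio, clamped to 3) is an assumption.
    """
    ratio = Fraction(abs(s_n), abs(s_n) + abs(s_next))
    return min(int(ratio * 4), 3)

def phi_x(n, s_n, s_next):
    """Zero-crossing phase in UI via a selector with constant inputs.

    The result is left unwrapped and may exceed 1 UI.
    """
    code = phi_zc_code(s_n, s_next)
    return time_stamp(n) + SI * Fraction(2 * code + 1, 8)

# Illustrative pair: S[n] = 5, S[n+1] = -2 at block position n = 1.
print(phi_zc_code(5, -2))   # 2, i.e. binary 10
print(phi_x(1, 5, -2))      # 55/128
```

With only four possible codes per sample position, the multiplication SI·φZC[n] never has to be computed at run time, which is why a selector with constant inputs suffices.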

The data decision block, shown in Fig. 8.6.2, picks one sliced sample per UI by comparing φX[n], φAVG, and TS[n]. This block also marks the duplicate samples by setting their valid flags (VF[n]) to '0'. To remove these duplicates, whose positions are unknown a priori, we use the vector compactor presented in Fig. 8.6.4. It accepts 16 sliced samples, Ŝ[n], with their VF[n] and produces 11 data bits, D[k], such that every UI corresponds to a single data bit. The compactor consists of an array of conditional data selectors, which pass data bits either from the left or from top to bottom according to the state of the enable signals. The rows with VF[n]='0' (shaded) pass the data from top to bottom. As a result, the output data vector is free of duplicate samples. To reduce area and power, we eliminate the cells that only pass data from top to bottom. With this, the compactor reduces to only 33 cells instead of a full 176-cell array, resulting in a single-cycle compaction.
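At the behavioral level, the compactor drops every sliced sample whose valid flag is '0' while preserving order. The sketch below models only that input-output behavior, not the selector-cell array of Fig. 8.6.4; the function name and the example bit patterns are made up for illustration.

```python
def compact(sliced, valid):
    """Keep the sliced samples whose valid flag is 1, in order."""
    if len(sliced) != len(valid):
        raise ValueError("sliced and valid must have the same length")
    return [s for s, v in zip(sliced, valid) if v]

# Illustrative 16-sample block with 5 samples flagged as duplicates.
sliced = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
valid  = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

data = compact(sliced, valid)
print(len(data), data)

# Size of the full selector array mentioned in the text: 16 rows x 11 columns.
print(16 * 11)   # 176
```

In hardware the same result must be produced in a single cycle with positions that change from block to block, which is what motivates the selector array rather than a sequential filter like this one.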

To experimentally verify the ADC functionality at the FSR, we sampled a 6.875Gb/s 2^7-1 PRBS signal at 10GS/s and captured the DeMUXed ADC samples. We then assigned a constant TS[n] to every channel of the DeMUX according to the channel number, n. Finally, we arranged the samples along the time axis in ascending order of their TS[n]. Figure 8.6.5 presents the resulting measured eye diagram reconstructed from 516800 ADC samples. For every vertical slice of the eye diagram, we annotate the DeMUX channel number and the corresponding time stamp. An open eye at the ADC output confirms that the ADC is functional and that error-free data recovery is possible in the FSR CDR.
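The channel-to-time-stamp assignment used for this reconstruction follows directly from SI = 11/16 UI. The sketch below assumes that DeMUX channel n carries the n-th sample of each 16-sample block and that channel 1 is taken as the time origin; `fold` is a hypothetical helper, not part of the measurement setup.

```python
from fractions import Fraction

SI = Fraction(11, 16)  # sampling interval in UI, from the text

def time_stamp(n):
    """Time stamp of DeMUX channel n (1..16) within the UI."""
    return ((n - 1) * SI) % 1

# Channels sorted by ascending time stamp.
channels_by_ts = sorted(range(1, 17), key=time_stamp)
print(channels_by_ts)
# [1, 4, 7, 10, 13, 16, 3, 6, 9, 12, 15, 2, 5, 8, 11, 14]

def fold(samples):
    """Pair each captured sample with its time stamp and sort by it.

    `samples` is a flat list in capture order; its length is assumed
    to be a multiple of 16 so that index i maps to channel (i % 16) + 1.
    """
    pairs = [(time_stamp(i % 16 + 1), s) for i, s in enumerate(samples)]
    return sorted(pairs, key=lambda p: p[0])
```

Because 11 and 16 are coprime, the 16 channels land on 16 distinct time stamps spaced 1/16 UI apart, which is why the folded samples cover the whole UI.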

Figure 8.6.6 presents the simulated and measured jitter tolerances of the FSR CDR. Both simulations and measurements were performed with a 6.875Gb/s 2^7-1 PRBS input and a sampling rate of 10GS/s. We used an event-driven model [5] in Simulink to simulate the CDR. Our simulations show that the CDR tolerates up to 0.5UIpp of sinusoidal jitter at high frequencies (simulated for 2×10^5 UIs, no random jitter at TX and RX). To validate our simulated results, we fabricated and characterized the FSR CDR in 65nm CMOS. The inset in Fig. 8.6.6 presents a measured eye diagram at the receiver input. In addition to the 16ps (0.11UIpp) of jitter already present because of the setup, we applied sinusoidal jitter from 50kHz to 8MHz (the range was limited by the available equipment) to measure the jitter tolerance of the CDR. We generated the receiver input with a Centellax PRBS board and recorded the jitter tolerance at BER = 10^-12. Our measurements closely match the simulation results and confirm that the FSR CDR tolerates 0.3UIpp of high-frequency sinusoidal jitter. The CDR tolerates up to 49MHz (0.98%) of frequency offset (with BER < 10^-12) between the transmitter and receiver beyond the nominal offset due to the FSR.
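The unit conversions behind the quoted jitter and offset figures can be checked as below. One point is an assumption on our part: the text does not say which frequency the 0.98% refers to, and the sketch uses the 5 GHz sampling clock labeled in Fig. 8.6.2 because that reading reproduces the quoted percentage.

```python
baud_rate_hz = 6.875e9
ui_ps = 1e12 / baud_rate_hz            # one unit interval in picoseconds

# Setup jitter: 16 ps expressed in UI.
setup_jitter_ui = 16.0 / ui_ps

# Frequency offset as a fraction of an assumed 5 GHz reference clock.
reference_clock_hz = 5e9               # assumption, see lead-in
offset_fraction = 49e6 / reference_clock_hz

print(f"UI = {ui_ps:.2f} ps")
print(f"setup jitter = {setup_jitter_ui:.2f} UI")
print(f"offset = {offset_fraction:.2%}")
```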

Figure 8.6.7 shows a die photo of the test chip. The ADC, DeMUX, and clock divider are custom-designed analog blocks, while the CDR and test structures were synthesized. The CDR consists of 75644 gates and consumes 58.8mW, while the ADC consumes 116.4mW. The FSR CDR reduces the ADC power by 27.3%, in comparison with a 2× feed-forward architecture, at the cost of doubling the gate count; however, the power per Gb/s of data rate and the total receiver area are reduced by 12.5%. The receiver occupies 0.3683mm².
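As a sanity check on these figures, the sketch below computes the combined ADC and CDR power per Gb/s and sums the block dimensions labeled on the die photo in Fig. 8.6.7. It does not attempt to reproduce the 12.5% comparison, since the corresponding 2× design figures are not given in the text; the dictionary keys are our own labels.

```python
# Power figures quoted in the text (mW) and the data rate (Gb/s).
adc_power_mw = 116.4
cdr_power_mw = 58.8
data_rate_gbps = 6.875

total_power_mw = adc_power_mw + cdr_power_mw
power_per_gbps = total_power_mw / data_rate_gbps   # mW per Gb/s, i.e. pJ/bit

# Block dimensions in micrometers, as labeled in Fig. 8.6.7.
blocks_um = {
    "flash ADCs": (400, 490),
    "4:16 DeMUX": (60, 490),
    "input buffers": (50, 60),
    "CDR": (430, 270),
    "bias gen. and clock div.": (170, 140),
}
area_mm2 = sum(w * h for w, h in blocks_um.values()) / 1e6

print(f"total power = {total_power_mw:.1f} mW")
print(f"power per Gb/s = {power_per_gbps:.2f} mW/(Gb/s)")
print(f"receiver area = {area_mm2:.4f} mm^2")
```

The summed block areas agree with the 0.3683mm² receiver area stated above.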

Acknowledgment:
We thank Chihiro Sannomiya for assistance with test-chip design and verification.

References:
[1] O. Agazzi et al., “A 90nm CMOS DSP MLSD Transceiver with Integrated AFE for Electronic Dispersion Compensation of Multi-mode Optical Fibers at 10Gb/s,” ISSCC Dig. of Tech. Papers, pp. 232-233, Feb. 2008.
[2] M. Harwood et al., “A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery,” ISSCC Dig. of Tech. Papers, pp. 436-437, Feb. 2007.
[3] F.M. Gardner, “Interpolation in digital modems – Part I: Fundamentals,” IEEE Tran. on Communications, Vol. 41, Issue 3, pp. 501-507, Mar. 1993.
[4] M. Spurbeck et al., “Interpolated timing recovery for hard disk drive read channels,” IEEE Intern. Conf. on Communications, Vol. 3, pp. 1618-1624, Jun. 1997.
[5] M. van Ierssel et al., “Event-Driven Modeling of CDR Jitter Induced by Power-Supply Noise, Finite Decision-Circuit Bandwidth, and Channel ISI,” IEEE Tran. on Circuits and Systems I, Vol. 55, Issue 5, pp. 1306-1315, Jun. 2008.

978-1-4244-6034-2/10/$26.00 ©2010 IEEE


ISSCC 2010 / February 9, 2010 / 11:15 AM

Figure 8.6.1: Blind-Sampling ADC-Based CDR Architectures. Panels: "Previous Work: 2× Interpolating Feedback CDR" and "This Work: Fractional-Sampling-Rate (FSR) Feed-Forward CDR".

Figure 8.6.2: Fractional-Sampling-Rate (FSR) CDR Block Diagram. Labels: RX input at 6.875 Gb/s; 5-bit 2.5GS/s ADCs; 5 GHz 2-phase sampling clock; 2.5 GHz 4-phase clock; PD; phase subtractor (mod); low-pass filter with stages K1·z^-1/(1 - z^-1), K2·z^-1/(1 - z^-1), K3·z^-1/(1 - z^-1); data decision; vector compactor; FIFO with fB/16 retiming clock; DOUT.

Figure 8.6.3: Phase-Detector Implementation. Blocks: average-transition-slope calculator and data-phase calculator. Selector inputs for the 2-bit φZC[n] codes 00, 01, 10, 11: TS[n] + SI·1/8, TS[n] + SI·3/8, TS[n] + SI·5/8, TS[n] + SI·7/8, with SI = 11/16 UI.

Figure 8.6.4: Vector-Compactor Implementation. This block accepts 16 sliced samples, S[n], with their valid flags, VF[n] (on the left), and produces 11 data bits, D[k] (at the bottom).

Figure 8.6.5: Measured Eye Diagram at ADC Output (reconstructed from 5×10^5 ADC samples). Axes: Time, UI versus ADC Sample Value. DeMUX channels n = 1, 4, 7, 10, 13, 16, 3, 6, 9, 12, 15, 2, 5, 8, 11, 14 correspond to time stamps TS[n] = 0/16, 1/16, 2/16, ..., 15/16, respectively.

Figure 8.6.6: Measured and Simulated Jitter Tolerance. Axes: Jitter Frequency, Hz versus Jitter Amplitude, UIpp. Inset: Sample Input Eye Diagram (no sinusoidal jitter), 500mV, 16ps (0.11UI).


ISSCC 2010 PAPER CONTINUATIONS

Figure 8.6.7: Die Photograph. Die dimension label: 1900µm. Labeled blocks: four-channel 2.5GS/s flash ADCs (400×490 µm²), 4:16 DeMUX (60×490 µm²), input buffers (50×60 µm²), CDR (430×270 µm²), bias generator and clock divider (170×140 µm²), synthesized logic, test structures, and output buffers.

Process: 65 nm CMOS
Supply: 1.2 V
Data Rate: 6.875 Gb/s
ADC Power: 116.4 mW
Digital Power: 58.8 mW
Receiver Area: 0.3683 mm²
