
Transceiver Modelling for High-Speed Serial Links

by

Alif Zaman

A thesis submitted in conformity with the requirements

for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

© Copyright 2017 by Alif Zaman

Abstract

Transceiver Modelling for High-Speed Serial Links

Alif Zaman

Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

2017

This thesis deals with evaluating the transceiver circuitry employed in high-speed serial links. The contributions of the thesis can be divided into two segments: object-oriented programming based simulation and step response based modelling of transceiver circuitry. In the object-oriented programming based simulation, each circuit block is treated as a circuit object capable of independently simulating its own behaviour through encapsulated properties and methods. The proposed object-oriented scheme incorporates conventional time-step based analysis into event-driven simulation in order to support the evaluation of asynchronous circuitry, while maintaining a simulation speed comparable to that of the event-driven scheme. The thesis then focuses on step response based modelling of equalizers and clock and data recovery (CDR) systems to capture their circuit-level nonlinearity during simulation. It demonstrates how to generate Spectre-like eye diagrams for equalizers and how to describe transistor switching transients and clocking frequency saturation effects for CDRs.


Acknowledgements

First of all, I would like to thank my supervisor, Professor Ali Sheikholeslami, from the bottom of my heart. Because of his assistance, encouragement, and guidance on many occasions during my studies, I am able to graduate. Without his gracious support, I cannot think of any easy way I could have reached this stage.

I would also like to thank Professor Tony Chan Carusone, Professor Antonio Liscidini, and Professor Raymond Kwong for serving on my thesis defense committee and providing useful feedback. Their thoughtful feedback has enriched the thesis.

In addition, I would like to thank the Fujitsu group, particularly Hirotaka Tamura, for patiently listening and consistently providing feedback during the project development phase. I cannot help but thank Samira and Farhad for the time we spent together doing assignments, discussing circuits, and taking part in various other activities. Along with Farhad, I also thank Josh for helping me edit the thesis and for sharing useful knowledge and discussion. I would also like to thank all the other graduate students, whose names are not mentioned here, for their useful technical discussions, suggestions, assistance, and occasional encouragement during my graduate studies.

Finally, I would like to thank my family members, especially my mom, for their encouragement and support from Calgary during my studies, and the Creator who made it happen.


Contents

1 Introduction

1.1 Motivation

1.2 Thesis Objective

1.3 Thesis Outline

2 Background

2.1 Signal Integrity Overview

2.2 Serial Link Transceiver Architecture

2.2.1 Equalization Overview

2.2.2 Clock Recovery Overview

2.3 Performance Evaluation Criteria

2.3.1 Bit-Error-Rate (BER) Eye Diagrams

2.3.2 Jitter Tolerance

2.4 Analog-Mixed Signal (AMS) Simulation Overview

2.4.1 Time-Step Based Simulation

2.4.2 Event-Driven Simulation

2.5 Modelling for Continuous Time Component Blocks

2.5.1 Ordinary Differential Equation (ODE) Based Modelling

2.5.2 Pulse Response Based Modelling

2.5.3 Step Response Based Modelling

2.5.4 Symbolic Expression Based Modelling

2.6 Summary

3 Proposed Simulation Method for Analog-Mixed Signal Analysis

3.1 Object-Oriented (OO) Modelling Based Simulation

3.1.1 Abstraction for OO Simulation

3.1.2 Operating Principle of OO Simulation

3.2 Description of Circuit Objects

3.2.1 Properties and Methods of Circuit Objects

3.2.2 Processed Data Formats of Circuit Objects

3.3 Performance Evaluation of OO Simulation in Case Studies

3.3.1 Example Case Study

3.3.2 Object Order Sensitivity

3.3.3 Feedback Loop Situation

3.3.4 Incorporating Parallelism

3.3.5 Simulation Speed Performance

3.4 Summary

4 Proposed Modelling for Equalizer Circuitry

4.1 Feed Forward Equalizer (FFE) Modelling

4.1.1 FFE Implementation

4.1.2 FFE Modelling for OO Simulation

4.1.3 FFE Modelling Testcase

4.2 Continuous Time Linear Equalizer (CTLE) Modelling

4.2.1 CTLE Implementation

4.2.2 CTLE Modelling for OO Simulation

4.2.3 CTLE Modelling Test Cases

4.3 Decision Feedback Equalizer (DFE) Modelling

4.3.1 DFE Implementation

4.3.2 DFE Modelling for OO Simulation

4.3.3 DFE Modelling Test Case

4.4 Summary

5 Proposed Modelling for Clock and Data Recovery (CDR) System

5.1 CDR Functional Overview

5.2 CDR Component-level Modelling

5.2.1 Phase Detector (PD) Modelling

5.2.2 Loop Filter (LF) and Voltage Controlled Oscillator (VCO) Modelling

5.3 Putting it Altogether

5.4 Performance Evaluation for the Proposed Modelling Scheme

5.5 Summary

6 Conclusion and Future Work

6.1 Thesis Contribution

6.2 Future Work

Bibliography

List of Figures

2.1 A schematic of a typical channel construction

2.2 Generic transceiver architecture for high-speed serial links

2.3 Idealistic concept of equalization to compensate for channel attenuation

2.4 Concept of clock recovery unit at the receiver

2.5 Concept of generating a bit error rate (BER) eye diagram

2.6 Asymptotic jitter tolerance plot

2.7 Comparison between constant and variable time-step based simulation schemes

2.8 Typical time-step based transient simulation flow chart [1]

2.9 Concept of event driven simulation: (a) block diagram, (b) operation [2]

2.10 Demonstration of Kirchhoff current law (KCL)

2.11 Continuous time waveform formation using pulse-response based modelling

2.12 Continuous time waveform formation using step-response based technique

2.13 Symbolic expression based modelling overview [3, 4]

3.1 Comparative study between calculating vti and tvc,i

3.2 Analog-mixed signal system abstraction for object-oriented (OO) simulation

3.3 Relationship between signal time and simulation time

3.4 Definition of a circuit object, cktObj

3.5 Processed data format comparison for the cases of discrete time and continuous time objects

3.6 Simulation test case study for OO simulation

3.7 Effect of circuit object placement order in activation list for OO simulation

3.8 Schematic of a system with a feedback loop

3.9 Effects on object placement ordering in OO simulation for system with feedback loop

3.10 Hierarchical representation for a system with a feedback loop

3.11 Serial processing to parallel processing conversion for OO simulation

3.12 Parallel processing demonstration under restricted resource environment for OO simulation

3.13 Speed performance result for the OO simulation

4.1 Architectural overview of typical channel equalization system

4.2 Basic architecture of a symbol-spaced feed forward equalizer (FFE)

4.3 Cursor extraction from channel pulse response

4.4 Circuit-level overview of a 3-tap source series terminated based single-ended FFE

4.5 Look-up table (LUT) based nonlinearity modelling for FFE

4.6 Channel waveform construction based on FFE outputs

4.7 FFE simulation testbench and waveform reconstruction process

4.8 Bode plot of a channel accompanied by its ideal equalizer and realistic continuous time linear equalizer (CTLE)

4.9 Circuit-level overview of single-ended CTLE

4.10 Representing CTLE for OO simulation

4.11 Plot of CTLE gain response, vOut/vIn

4.12 Extracted step responses for modelling a CTLE (considering the effect of channel)

4.13 Step response extraction process for CTLE

4.14 Continuous time waveform formation for a CTLE (considering the effect of channel)

4.15 CTLE modelling performance evaluation

4.16 CTLE modelling performance evaluation due to an FFE

4.17 Basic architecture of a decision feedback equalizer

4.18 Pulse response due to 2-tap decision feedback equalizer (DFE)

4.19 Circuit-level overview of differentially ended DFE

4.20 Modifying DFE model to capture the finite adder bandwidth

4.21 DFE modelling performance evaluation with respect to FFE and CTLE

5.1 Architectural overview of the clock and data recovery (CDR) system

5.2 Modelling overview of binary phase detector (PD)

5.3 Modelling overview of linear PD

5.4 Modeling overview of charge pump based loop filter (LF) and voltage controlled oscillator (VCO)

5.5 CDR open loop step response

5.6 Demonstration of CDR clock transition calculation

5.7 Test case block diagram for linear PD based CDR

5.8 Proposed modeling measurement accuracy validation with respect to time-step based simulation measurement

List of Tables

3.1 Major properties of circuit objects

3.2 Major methods of circuit objects

3.3 Description of object-specific properties for the selected object-oriented simulation case

3.4 Simulation-specific properties of all objects for the selected object-oriented simulation case

3.5 Explanation of simulation steps for the selected object-oriented simulation case (for simulation time 0 - 4)

3.6 Simulation time break down for the case of 10,000k bits (where 1k = 1,000)

4.1 Eye diagram measurements for feed forward equalizer (FFE) test-case

4.2 Eye diagram measurements for continuous time linear equalizer (CTLE) test-case

4.3 Eye diagram measurements for CTLE test-case due to FFE

4.4 Eye diagram measurements for decision feedback equalizer (DFE) test-case

List of Algorithms

3.1 Pseudo-code of running the top-level object-oriented (OO) simulation

3.2 Pseudo-code for a circuit object, cktObj

3.3 Hierarchical representation template for OO simulation

3.4 Pseudo-code script to deal with feedback system in OO simulation

4.1 Modelling template of feed forward equalizer for running OO simulation

4.2 Modeling template of continuous time linear equalizer for running OO simulation

4.3 Modeling template of decision feedback equalizer for running OO simulation

5.1 Pseudo-code of binary phase detector logical functions

5.2 Pseudo-code of linear phase detector logical functions

5.3 Modeling template of clock and data recovery for running OO simulation

Acronyms

2D 2-dimension

3D 3-dimension

AMS analog-mixed signal

ASIC application specific integrated circuit

BER bit error rate

BERT bit error rate tester

CDF cumulative distribution function

CDR clock and data recovery

Ch channel

CID consecutive identical digits

Clk clock

CTLE continuous time linear equalizer

dB decibel

DCD duty cycle distortion

DFE decision feedback equalizer

DFF data flip-flop

EQ equalizer

FFE feed forward equalizer

FIR finite impulse response

Gbps giga ($10^9$) bits-per-second

GUI graphical user interface


IC integrated chip

IoT internet of things

ISI inter-symbol interference

KCL Kirchhoff current law

LF loop filter

LPF low pass filter

LTI linear time-invariant

LUT look up table

MNA modified nodal analysis

NMOS n-type metal-oxide semiconductor

NR Newton-Raphson’s method

ODE ordinary differential equation

OO object-oriented

PAM pulse amplitude modulation

PD phase detector

PDF probability density function

PFD partial fraction decomposition

PLL phase locked loop

PMOS p-type metal-oxide semiconductor

PRBS pseudo random bit stream

PVT process-voltage-temperature

RBG random bit-stream generator

RC resistive-capacitive

Rx receiver

SE single-ended

SSC spread spectrum clocking

SST source series terminated

Tx transmitter

UI unit interval

VCO voltage controlled oscillator


Chapter 1

Introduction

With the advances in computational technologies, the demand for high-speed data communication is continuously increasing. Communication speed needs to increase in order to cope with the demands of day-to-day internet applications and global socio-economic progress [5, 6]. Cloud computing, online marketing, electronic messaging, internet telephony, remote file sharing, social networking, and video broadcasting are a few notable present-day user applications. In the near future, a greater set of applications related to the internet of things (IoT) will collectively increase the demand for high-speed communication even further [7]. All these newly developed applications rely on high-speed data connectivity and, as such, pose major challenges to the operating feasibility of existing data communication systems.

Overcoming the data connectivity bottlenecks will require the clever integration of multiple innovative engineering solutions. For instance, silicon technologies have greatly enhanced the data processing capabilities of integrated chips (ICs) inside computers and other electronic devices [8, 9]. To achieve high productivity, the ICs must often communicate among themselves at high data rates. Over time, semiconductor ICs have become miniaturized, but the pin sizes of IC packages have remained nearly constant; hence, the data rate through each pin, that is, the serialized high-speed data rate, needs to increase to keep up with the chip-to-chip communication demand [10, 11]. Communication through serial links raises a number of signal integrity issues, and resolving these issues is often not cost-effective or practical without major engineering interventions. These interventions must meet a low power budget and take into account implementation form factors, technological feasibility, and other system design specifications [12].

1.1 Motivation

As the data rate through a serial link increases, the data quality suffers from signal integrity issues such as signal attenuation, dispersion, and reflection [13]. Due to these channel imperfections, the received signal may look quite different from the transmitted signal and may not be recognizable without further signal processing. To compensate for the link imperfections, additional transceiver circuitry such as equalizers and clock recovery units is employed. Equalizers compensate for the signal attenuation, and clock recovery units produce a sampling clock with an optimal sampling phase so as to reduce the bit error rate (BER).

The implementation of the transceiver circuitry depends heavily on the design specifications, which are determined by analyzing the serial link (channel) characteristics and the system operating environment [14]. When the channel attenuates the signal heavily, the received signal level may fall below the noise level. Under such circumstances, the equalizer design may become complicated, requiring area overhead and additional power. High-speed transceivers are often implemented in newer and faster sub-micron device technologies in order to integrate them into larger application specific integrated circuit (ASIC) systems. Designs in these newer technologies must take a number of considerations into account, such as nonlinear circuit devices, random device mismatches, low supply voltage, and simultaneous switching power supply noise [15, 16]. These factors may significantly limit the transceiver performance and may lead to product failure.

Evaluating the above situations during the transceiver design phase can help avoid potential product failures. Transistor-level simulation tools, such as SPICE simulators, are not suitable for this evaluation due to their prohibitively long simulation times and other issues related to their computational processes [17]. Most currently available signal integrity tools can only simulate situations associated with channel imperfections based on linear circuit models. Tools such as event-driven simulators may be able to replicate transistor-level nonlinearity, but their simulation schemes are often restricted to system-specific evaluation purposes [2–4, 18–22]. Commercial tools such as LinkLab have the capacity to replicate such behaviour, but their models are proprietary, which prevents further modification [15].

The work proposed in this thesis focuses on a computationally optimized simulation scheme that can be exploited to achieve SPICE-level accuracy while completing the simulation in a reasonable time. The proposed scheme addresses the computationally intensive nature of the time-step based SPICE simulator by identifying the underlying repetitive factors in transceiver circuitry simulations. In addition, the proposed method demonstrates a way to integrate various types of transceiver circuitry, such as equalizers and clock recovery units, into a single simulation environment. Throughout the process, the proposed work keeps the number of computations as low as possible to achieve a high simulation speed.

1.2 Thesis Objective

We present a novel object-oriented simulation scheme for both equalizers and clock recovery circuit systems. The main objectives of the thesis are as follows:

• Investigate conventional time-step based and event-driven simulation schemes to come up with

a computationally efficient, but feature-rich simulator using an object-oriented programming ap-

proach.

• Propose equalizer models for running in an object-oriented simulation environment and generate

Spectre-like eye diagrams through capturing transistor-level nonlinearity.

• Propose a generic clock and data recovery (CDR) modelling scheme, which can be used to represent

both linear and binary phase detector based CDRs.


1.3 Thesis Outline

The remainder of the thesis is organized as follows.

• Chapter 2 describes the background behind serial link operation, its performance evaluation met-

rics, and its simulation strategies for high data rate transmission purposes.

• Chapter 3 describes the basics of the proposed object-oriented simulation scheme.

• Chapter 4 presents our proposed modelling schemes for equalizers to capture transistor-level non-

linearity.

• Chapter 5 presents our proposed generic modelling schemes for both linear and binary phase

detector based CDR systems.

• Chapter 6 summarizes the thesis contributions and highlights the future directions for this project.

Chapter 2

Background

Transceiver circuits recover the transmitted data at the receiver end. Usually, transceiver circuit blocks are placed both before and after the channel used for high-speed data transmission. As the data rate increases, various channel-related imperfections affect the transmitted signal, often making it unrecognizable at the receiver side. The transceiver circuit blocks compensate for the effects of the channel and attempt to make data transmission nearly seamless from the perspectives of both the transmitter and the receiver. However, due to non-ideal circuit blocks, the signal at the receiver end may contain residual inter-symbol interference (ISI) and distortion, which in turn result in a non-zero BER. Understanding the reasons behind the errors requires a detailed analysis of the transceiver circuitry and the channel as a whole. This chapter provides background information about the transceiver architecture, its implementation, and its modelling for validation purposes at the system level.

The remainder of this chapter is organized as follows. A basic overview of signal integrity is provided in Section 2.1. Section 2.2 provides a functional overview of the serial link transceiver circuit architecture. The evaluation of different parts of the transceiver circuitry is covered in Section 2.3. Once the relationship between analog-mixed signal (AMS) systems and transceiver systems is established, Section 2.4 provides an overview of the problems associated with different kinds of AMS simulation schemes. Section 2.5 presents a brief summary of various modelling schemes for the continuous-time circuit blocks used in simulations. Finally, Section 2.6 concludes the chapter.

2.1 Signal Integrity Overview

Designs for high data-rate communication are driven mainly by signal integrity considerations. As the data transmission rate increases, the signal integrity of a channel is adversely affected, which in turn increases the BER. These signal integrity issues arise mostly from the channel frequency characteristics and semiconductor device nonlinearity.

Figure 2.1 provides a schematic view of a typical channel considered for high data-rate transmission [15, 23, 24]. Everything between generating the input signal (shown as signal in) and receiving the output signal (shown as signal out) using circuit devices is considered part of the channel. After the input signal is generated at the transmitter chip, it first traverses the pad inside the chip and then the bond-wire and package pad before entering the metallic conductor. When the signal is received at the chip on the other end, the same process takes place in reverse order, from the conductor to the chip pad. The signal leaving the transmitter chip often travels through several connectors, such as vias and bond-wires. In addition, unwanted signal fragments may appear from neighbouring data transmission lines.

Figure 2.1: A schematic of a typical channel construction (diagram labels: signal in, signal out, crosstalk, chip, package, pad, bond-wire, via, conductor; transmitter side and receiver side)

A communication channel such as the one in Figure 2.1 suffers from three major problems: signal attenuation, reflection, and crosstalk [15]. Signal attenuation occurs mostly due to conductor and dielectric loss. Signal reflection is caused by impedance discontinuities in the channel. An impedance discontinuity exists wherever the signal path changes from one material layer to another, such as at the interfaces between pads, bond-wires, and vias. Crosstalk is caused by neighbouring data transmission channels. As product form factors get smaller over time, the transmission system often suffers from crosstalk at various locations. Depending on its location and strength, crosstalk may limit the signal transmission speed.
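To make the effect of an impedance discontinuity concrete, the following minimal Python sketch evaluates the standard transmission-line voltage reflection coefficient, (Z2 - Z1)/(Z2 + Z1), at an interface between two sections. The function name and the impedance values (a 50 ohm conductor meeting a hypothetical 65 ohm via) are illustrative assumptions and are not taken from the thesis.

```python
def reflection_coefficient(z_from, z_to):
    """Voltage reflection coefficient seen by a wave travelling from a
    section with impedance z_from into a section with impedance z_to."""
    return (z_to - z_from) / (z_to + z_from)


if __name__ == "__main__":
    z_line = 50.0  # assumed characteristic impedance of the conductor, in ohms
    z_via = 65.0   # hypothetical impedance of a via, in ohms
    gamma = reflection_coefficient(z_line, z_via)
    print(f"reflected fraction of incident voltage: {gamma:.3f}")
    # A matched interface (equal impedances) reflects nothing.
    print(f"matched interface: {reflection_coefficient(z_line, z_line):.3f}")
```

The larger the mismatch between the two sections, the larger the reflected fraction, which is why each pad, bond-wire, and via interface in the channel can contribute to signal reflection.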

Another source of signal integrity issues is semiconductor device nonlinearity. As the data transmission frequency goes up, smaller sub-micron semiconductor devices are used to design transceiver circuitry because of their higher transit frequencies and lower power consumption. Designing transceivers with smaller devices also allows for integration with other large ASIC systems such as microprocessors and memory. However, smaller devices can cause a wide variety of other signal integrity issues, such as nonlinear transistor characteristics, random device mismatches, process variation, and power supply noise. For instance, device-level nonlinearity, process variation, and random device mismatches degrade the linearity of transceiver filter operations. Power supply noise, which is mostly due to digital system integration, raises the noise level during data transmission and increases the BER.

2.2 Serial Link Transceiver Architecture

Depending on the data transmission rate and channel characteristics, the transceiver architecture must

vary in complexity to counteract the unwanted noise and interference. An objective of the transceiver

operation is to minimize the probability of data transmission error while maintaining low power consump-

tion and a small footprint. Having a low BER is certainly desirable to achieve high data transmission

efficiency. Low power consumption is important in order to keep the overall system power within the

budget.

Typically, a transceiver for a high-speed serial link has two major sub-systems: the equalizer and the clock recovery unit. The equalizer sub-system can be implemented at the transmitter, at the receiver, or at both ends, depending on the channel attenuation level. If the channel is highly attenuating, equalization is performed at both ends. The clock recovery unit, which is essential for providing a clock that samples the signal at the optimum location, is implemented at the receiver.

Figure 2.2: Generic transceiver architecture for high-speed serial links (diagram blocks: Source, TxEQ, Ch, RxEQ, Sink, and Clock Recovery; clocks: TxClk and RxClk)

Figure 2.2 shows a typical construction of a transceiver. The source generates data synchronized to the clock at the transmitter (marked as TxClk). The input data is then equalized by the equalizer at the transmitter (shown as TxEQ) before being transmitted through the channel (shown as Ch). Once the signal arrives at the receiver, the equalizer at the receiver (shown as RxEQ) further equalizes it to prepare it for sampling. The clock recovery unit generates a clock signal (marked as RxClk) for sampling the transmitted data. It determines the optimal sampling phase by analyzing previously detected samples from the RxEQ. It is worth mentioning that both equalizers, depending on their architectures, may require clocking in order to perform equalization.

Conceptual details of the equalizer and clock recovery units are described in Sections 2.2.1 and 2.2.2, respectively.

2.2.1 Equalization Overview

Equalization inverts the transmitted symbol distortion caused in a channel at high frequency [25]. By

nature, any channel behaves like a passive low pass filter. For wire-line communication, the frequency

content of a transmitted bit-stream usually ranges from 0 Hz all the way up to the Nyquist frequency,

fN (i.e. fN = fBit-Rate/2). When a random bit-stream is transmitted through a channel at a high rate,

the higher frequency content of the transmitted bits is attenuated and delayed compared to that at

lower frequencies. This low pass filtering behaviour of the channel introduces ISI, because each

transmitted bit is affected by both previously and subsequently transmitted bits. The task of an equalizer is to

reduce the amount of ISI to an acceptable minimum level so that transmitted symbols can be exactly

reconstructed after sampling.

Equalizers for any high-speed serial link can be realized using two main approaches: frequency domain and time domain. Both approaches are shown in Figure 2.3. In all sub-figures, the vertical axes represent amplitude, while the horizontal axis is either frequency, ω, (in sub-figure (a)) or time, t, (in sub-figure (b)). From a frequency domain perspective, the task of an equalizer is to amplify as well as to advance the higher frequency components of input signals relative to lower frequency components. In the ideal case, the resultant equalized response becomes flat at AdB, so that the received information is identical to the transmitted information. From a time domain perspective, the task of an equalizer is to minimize ISI by subtracting a replica of it. When transmitted symbols contain ISI (when t > TSymbol), the equalizer should directly subtract the ISI so that the resultant output contains only the transmitted symbols.

[Figure 2.3 panels: (a) Frequency domain perspective: Channel, Equalizer, and Output responses, amplitude AdB versus ω; (b) Time domain perspective: Signal with ISI, Replica ISI, and Signal without ISI, versus t with TSymbol marked]

Figure 2.3: Idealistic concept of equalization to compensate for channel attenuation.
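The time domain view can be illustrated with a minimal sketch. It assumes a hypothetical three-tap pulse response sampled once per symbol (the values in `PULSE` are invented for illustration) and rebuilds the ISI replica from earlier decisions, which is a decision-feedback style of subtraction chosen here for brevity; it is not presented as the equalizer architecture used in this thesis.

```python
# Hypothetical pulse response sampled once per symbol:
# main cursor followed by two post-cursor ISI taps (illustrative values only).
PULSE = [1.0, 0.4, 0.2]

def channel(symbols, pulse=PULSE):
    """Superpose one scaled pulse response per transmitted symbol."""
    out = [0.0] * len(symbols)
    for k, s in enumerate(symbols):
        for j, p in enumerate(pulse):
            if k + j < len(out):
                out[k + j] += s * p
    return out

def subtract_isi(received, pulse=PULSE):
    """Subtract the ISI replica built from earlier decisions, then slice."""
    decisions, cleaned = [], []
    for k, r in enumerate(received):
        replica = sum(pulse[j] * decisions[k - j]
                      for j in range(1, len(pulse)) if k - j >= 0)
        y = r - replica                      # signal without post-cursor ISI
        cleaned.append(y)
        decisions.append(1.0 if y >= 0 else -1.0)
    return cleaned, decisions

tx = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
rx = channel(tx)                  # signal with ISI
cleaned, decided = subtract_isi(rx)
```

With correct decisions, `cleaned` matches the transmitted symbols scaled by the main cursor, which mirrors the "signal without ISI" panel of Figure 2.3(b).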

Both realization approaches are employed in both continuous time and discrete time systems. Con-

tinuous time equalizer filters are implemented using passive circuit elements, such as resistors, capacitors,

and inductors. Even though continuous time equalizers can work with highly attenuated signals, they

are hard to tune and tend to amplify unwanted noise. In contrast, discrete time equalizers require

synchronous delays and logic elements such as data flip-flops (DFFs) interfaced with digital-to-analog

converter circuitry. Discrete time equalizers are highly programmable and less susceptible to external noise,

but cannot work with small signal amplitudes and can also suffer under jittery clock conditions. Since

each of the equalization schemes has its own drawbacks, both types of equalizers are often utilized

together to counteract these problems.

2.2.2 Clock Recovery Overview

Once the equalizer has minimized ISI to an acceptable level for sampling, the work of the clock recovery unit at the transceiver begins. The goal of the clock recovery unit is to detect the most suitable sampling location from the equalized but jittery signal in order to generate a clean data eye at the receiver end (shown in Figure 2.4). When the transmitted signal is properly equalized, the equalizer not only increases the signal amplitude at the sampling location but also removes the jitter associated with ISI. However, an equalized signal is often left with additional jitter, which can be composed of both deterministic jitter and random jitter [26]. Random jitter is present due to various unknown noise sources, such as power supply interruption, sudden system activity increase, and ambient heat profile. This jitter tends to be unbounded in nature. Sources of deterministic jitter are related to the data transmission system, such as spread spectrum clocking (SSC), ISI, and duty cycle distortion (DCD). Deterministic jitter is bounded in nature and can be tracked. Clock recovery is needed to generate a clean clock signal for data reconstruction by filtering out this jitter-related corruption.

[Figure 2.4 panels: Jittery Eye; Jitter PDF (Deterministic ∗ Random = Overall); Recovered Clock with Edge and Data sampling phases; Clean Eye]

Figure 2.4: Concept of clock recovery unit at the receiver

Since the prime objective of clock recovery is to filter out unwanted jitter present in the transmitted

signal, it works as a mean estimator or low pass filter with respect to incoming signal edges. The low

pass filter implemented inside the clock recovery system is often referred to as a loop filter (LF). The

LF synchronizes the edge sampling phase of the internal clock source with respect to the midpoint of

the incoming data edges, as shown in Figure 2.4. Once the midpoint of the incoming data edges has been

detected, the data sampling phase can be determined by applying a constant phase shift, which is

usually π radians for square wave clocks. Because the LF operates at a much lower frequency than the

signal frequency, it can be implemented using continuous time or discrete time or using both types of

circuit components.
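The "mean estimator" view of the loop filter can be sketched as follows. This is a minimal illustration under stated assumptions: the loop filter is reduced to a first-order exponential moving average with a hypothetical gain `alpha`, edge times are synthetic, and the function name `recover_sampling_phase` is invented here. It is not the CDR model developed later in the thesis.

```python
import random

def recover_sampling_phase(edge_times, ui, alpha=0.05):
    """Track the mean edge phase (in UI) with a first-order low-pass filter."""
    phase = 0.0                      # estimated edge phase relative to the local clock
    for t in edge_times:
        # Phase error between this edge and the estimate, wrapped to [-0.5, 0.5) UI.
        err = ((t / ui - phase + 0.5) % 1.0) - 0.5
        phase += alpha * err         # low-pass update: small alpha = low cutoff
    edge_phase = phase % 1.0
    data_phase = (edge_phase + 0.5) % 1.0   # constant shift of pi radians (0.5 UI)
    return edge_phase, data_phase

random.seed(1)
UI = 100e-12                         # assumed unit interval
TRUE_OFFSET = 0.3                    # assumed mean edge position, in UI
edges = [(n + TRUE_OFFSET + random.gauss(0.0, 0.02)) * UI for n in range(2000)]
edge_phase, data_phase = recover_sampling_phase(edges, UI)
```

A smaller `alpha` averages over more edges and therefore rejects more jitter, at the cost of a longer settling time, which is the trade-off discussed for the LF cutoff frequency.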

The cutoff frequency of the LF in a clock recovery system has the crucial role of defining the boundary above which jitter frequencies are rejected. Conversely, jitter at frequencies below the cutoff passes through the LF, resulting in jitter in the recovered data output. In reality, most unwanted jitter, both deterministic and random, is of high frequency, which justifies the adoption of low pass filtering behaviour for LF designs [26]. However, a loop filter with a very low cutoff frequency delays lock acquisition to the incoming data and limits the tolerable signal frequency offset range. Hence, the LF cutoff frequency must be chosen for each data transceiver application, depending on the nature of the jitter probability density function (PDF) and the system design specifications.

2.3 Performance Evaluation Criteria

Data transmission through a serial link is negatively affected by the error rate in the received data. A

high data transmission error rate increases activities associated with received data error management,

such as repeated data transmission, forward error correction, and a high density of parity bits [25]. A

high error rate not only increases overall system power consumption, but also can limit the effective

data transmission rate. In order to achieve a high data transmission rate while reducing the associated

operating power requirements, it is essential to maximize data transmission efficiency. Modern high

data-rate (10 Gbps or more) transmission applications therefore have stringent low bit error rate ($10^{-12}$ to $10^{-16}$) requirements [14].

Implementing systems that transmit at high data-rates while maintaining an ultra-low BER is a major engineering challenge. The short time-to-market and the high manufacturing cost of modern sub-micron devices do not permit iterative manufacturing and testing in a laboratory environment. Instead, it is preferable to perform system-level verification tests via simulations. Simulator-based verification facilitates the design and evaluation of multiple types of transceiver architectures at the pre-manufacturing stage. Two popular system verification approaches, BER eye diagrams and jitter tolerance, are discussed below.

2.3.1 Bit-Error-Rate (BER) Eye Diagrams

A BER eye diagram is generated at the post-equalization stage in order to observe the available eye

opening at a given BER. The opening at a given BER refers to a 2D enclosed area that represents both the

allowable sampling clock jitter and the threshold offset, as shown in Figure 2.5. BER eye diagrams are

useful in determining the design specifications related to horizontal and vertical eye openings necessary

for sampling slicers.

3D Eye Diagrams

Generating a BER eye diagram begins with developing a 3D eye diagram. A 3D eye diagram is developed

like a regular eye diagram by estimating the likelihood of recorded transient traces that are overlaid

over a constant multiple of unit intervals (UIs). Usually, the number of UIs for overlapping is chosen as 2.

Figure 2.5a shows an example 3D eye diagram with an overlapping interval of 2 UIs.

As in a regular eye diagram, the horizontal axis represents time and the vertical axis represents signal

amplitude. Colour information denotes the PDF of the 3D eye diagram, indicating how likely it is that a

recorded transient trace would go through certain regions. The origin, O(0, 0), marked on the 3D eye

diagram is at the center of the eye and represents the ideal sampling location.
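As a rough sketch of how such a 3D eye diagram could be accumulated, the snippet below folds a sampled waveform every 2 UIs and counts trace hits per (amplitude, time) bin, then normalizes each time column into a PDF. The waveform generator (a crude single-pole channel plus Gaussian noise), the bin counts, and the name `eye_histogram` are all assumptions made for illustration.

```python
import random

def eye_histogram(samples, samples_per_ui, n_ui=2, n_amp_bins=16,
                  v_min=-1.5, v_max=1.5):
    """Fold the waveform every n_ui UIs; return pdf[amplitude_bin][time_bin]."""
    width = n_ui * samples_per_ui
    hist = [[0] * width for _ in range(n_amp_bins)]
    for i, v in enumerate(samples):
        col = i % width                                   # time position inside the eye
        row = int((v - v_min) / (v_max - v_min) * n_amp_bins)
        row = min(max(row, 0), n_amp_bins - 1)            # clamp out-of-range amplitudes
        hist[row][col] += 1
    totals = [sum(hist[r][c] for r in range(n_amp_bins)) for c in range(width)]
    return [[hist[r][c] / totals[c] if totals[c] else 0.0
             for c in range(width)] for r in range(n_amp_bins)]

random.seed(0)
SPU = 8                                                   # samples per UI (assumed)
bits = [random.choice((-1.0, 1.0)) for _ in range(500)]
wave, y = [], 0.0
for b in bits:
    for _ in range(SPU):
        y += 0.5 * (b - y)                                # crude single-pole channel
        wave.append(y + random.gauss(0.0, 0.03))
pdf = eye_histogram(wave, SPU)
```

Each entry of `pdf` plays the role of the colour value in Figure 2.5a: the likelihood that a trace passes through that amplitude bin at that time offset.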

Timing and Amplitude Margins

[Figure 2.5 panels: (a) 3D Eye Diagram, amplitude versus time (UI), colour scale PDF from low to high, origin O(0, 0) marked; (b) Timing Margin, log10 BER versus time (UI); (c) Amplitude Margin, amplitude versus log10 BER; (d) 3D BER Eye, amplitude versus time (UI), colour scale log10 BER from low to high; (e) BER Contour, amplitude versus time (UI)]

Figure 2.5: Concept of generating a bit error rate (BER) eye diagram

Timing and amplitude margin plots are used to measure the horizontal and vertical eye openings available at a given BER. Example plots for timing as well as amplitude margins are shown in Figures 2.5b and 2.5c. As mentioned earlier, incoming signals at the receiver suffer from various sources of noise, which can be categorized as either deterministic or random noise. Effects of both deterministic and random noise are visible in the margin plots: deterministic noise causes flat regions, whereas random noise leads to gradually declining margins as the BER is reduced logarithmically. A timing margin plot can be generated from the eye diagram by integrating the measured PDF along the zero-crossing level, shown in Figure 2.5a as the horizontal dotted line passing through the origin. Similarly, amplitude margin plots can be obtained by integrating the PDF vertically while sampling at the origin.
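The shape of a timing margin plot can be reproduced with a small analytical sketch. It swaps the measured PDF for an assumed jitter model: deterministic jitter as two equally likely offsets (a dual-Dirac model) and random jitter as a Gaussian, with all numeric values (`dj_pp`, `rj_rms`, `transition_density`) chosen only for illustration.

```python
import math

def q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def timing_ber(t, dj_pp=0.2, rj_rms=0.02, transition_density=0.5):
    """BER at sampling offset t (UI from the eye centre), zero-threshold slicer."""
    ber = 0.0
    # Left edge errs when it lands after t; right edge errs when it lands before t.
    for edge, sign in ((-0.5, 1.0), (0.5, -1.0)):
        for dirac in (-dj_pp / 2.0, dj_pp / 2.0):
            ber += transition_density * 0.5 * q(sign * (t - (edge + dirac)) / rj_rms)
    return ber

def timing_margin(target_ber, step=0.001):
    """Width (UI) of the sampling region whose BER stays at or below target_ber."""
    points = [k * step - 0.5 for k in range(int(1.0 / step) + 1)]
    return step * sum(1 for t in points if timing_ber(t) <= target_ber)

margin_1e6 = timing_margin(1e-6)
margin_1e12 = timing_margin(1e-12)
```

In this model the BER sits on a plateau near the nominal edge positions (the deterministic contribution) and then falls off with a Gaussian tail (the random contribution), so `margin_1e12` comes out smaller than `margin_1e6`.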

Even though both timing and amplitude margin plots infer how much eye opening can be obtained

at a specific BER, they have their own drawbacks due to their underlying assumptions. Timing margin

plots assume that the data is sliced exactly at the zero-crossing level. Similarly, amplitude

margin plots provide the amount of vertical opening under the assumption that the sampling clock (or

recovered clock) has no jitter. In reality, the data slicer and the sampling clock are never perfect, so

amplitude and timing margins cannot be considered independent from each other.

BER Eye Diagram and its Contour Map

When the 3D eye diagram is integrated over its 2D time-amplitude plane, the resultant 3D plot contains

cumulative distribution function (CDF) information of the overlapped transient traces. This 3D CDF

plot is also called a 3D BER eye (Figure 2.5d). It can be observed based on the logarithmically scaled

colour bar in Figure 2.5d: as sampling time and slicer threshold shift away from the origin, O(0, 0),

over the 2D plane, the probability of error increases. The white space located in the center of the BER

plot represents log 0 = −∞, since no transient trace passes through the region.

From the 3D BER eye plot, the BER contour can be interpolated (or extrapolated) using regional 3D slope information at a user specified BER value (Figure 2.5e). In the BER contour plot, each contour outlines an area that represents both the acceptable sampling clock jitter and the acceptable slicer threshold variation. As expected from the figure, lower BER contours cover a smaller area. In order for the data transmission system to meet specific BER requirements, receiver recovered clock jitter as well as slicer threshold nonlinear variations must be jointly limited within the specified BER contour boundary. It can be inferred from the contour properties that timing and voltage margins are not considered separately, which makes the BER contour the most suitable of these plots for serial link verification.

2.3.2 Jitter Tolerance

Even though BER eye diagrams can provide information about the best timing margins for sampling,

they can be used only to evaluate the actual allowable high frequency jitter for a clock recovery system.

Certain low frequency jitter components, such as SSC-related jitter, can have a timing spread of more than 1 UI,

which can easily lead to closed eye diagrams and zero timing margins. Evaluating a serial link under

such circumstances requires a jitter tolerance test for the clock recovery unit. A jitter

tolerance test can also be used to analyze the jitter frequency characteristic of the clock recovery unit,

particularly of the LF. As explained in Section 2.2.2, the LF defines the key jitter cutoff frequency, which

is not only responsible for output jitter in recovered data but also relevant to data synchronization lock

time.

[Figure 2.6 axes: log10 Amplitude versus log10 Frequency; slope of −40 dB/decade from 0 Hz up to fj, then a flat limit of ≤ 1 UI for f ≥ fj]

Figure 2.6: Asymptotic jitter tolerance plot

Figure 2.6 shows an asymptotic plot of jitter tolerance amplitude varying with input jitter frequency. The horizontal axis shows the jitter modulation frequency of the transmitted data, whereas the vertical axis depicts jitter amplitude or maximum jitter width. Both horizontal and vertical axes need to be scaled logarithmically in order to capture a wide range of values. Initially, transmitted data is usually modulated with sinusoidal jitter at a given frequency. Then the amplitude of the sinusoidal jitter is increased in order to search for the transition amplitude, up to which point the clock recovery system can track without making any detection error. The searching procedure needs to take into account the initial synchronization lock acquisition period and the associated detection errors that the clock recovery system would make. As can be observed from the figure, at low input jitter frequencies (0 Hz < f < fj), the jitter amplitude tolerance of the recovery system increases as the frequency decreases, since the LF allows low frequency jitter to pass through the system. At high input jitter frequencies (when f ≥ fj), jitter amplitude tolerance reaches a constant limit, ≤ 1 UI, as the jitter frequency goes beyond the trackable limit of the LF. At low jitter frequencies, the jitter tolerance amplitude generally changes at a constant rate on the logarithmic axes. The slope varies depending on the clock recovery system architecture. For second-order phase tracking based clock recovery systems, the slope is −40 dB/decade, as marked in Figure 2.6. (Details on generating a jitter tolerance plot can be found in Appendix A of [27].)
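The asymptotic shape described above can be written down directly. The following is a sketch of the idealized two-segment curve only, assuming a corner frequency `f_j`, a flat high-frequency limit of 1 UI, and amplitude expressed in dB as 20·log10; the function name and default values are illustrative and not taken from [27].

```python
import math

def jitter_tolerance_ui(f, f_j, high_freq_limit_ui=1.0, slope_db_per_decade=-40.0):
    """Asymptotic jitter tolerance amplitude (UI) at jitter frequency f (Hz)."""
    if f >= f_j:
        return high_freq_limit_ui                 # flat region beyond the trackable limit
    # Below the corner, tolerance rises as f falls: slope * log10(f / f_j) > 0 dB.
    gain_db = slope_db_per_decade * math.log10(f / f_j)
    return high_freq_limit_ui * 10.0 ** (gain_db / 20.0)

F_J = 1e6                                          # assumed corner frequency
one_decade_below = jitter_tolerance_ui(1e5, F_J)   # 40 dB above the flat limit
above_corner = jitter_tolerance_ui(1e7, F_J)
```

One decade below the corner the tolerated amplitude is 40 dB (a factor of 100) above the flat limit, consistent with a −40 dB/decade slope.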

2.4 Analog-Mixed Signal (AMS) Simulation Overview

The objective of AMS simulation is to simulate discrete time components alongside continuous time

components. Simulating continuous time circuit components, such as resistive-capacitive filters, passive

equalizers, and amplifiers, is highly demanding of computational resources due to the sophisticated mod-

elling systems required. Performing top-level verification of a reasonably sized AMS system of current

applications is therefore usually impractical, if only continuous time models for all circuit blocks are

employed. Since analog circuitry requires careful performance evaluation due to its complex behaviour,

modelling analog circuits with continuous time modelling is justifiable. In contrast, because

simulations of digital systems are usually performed to ensure logical correctness, computationally light

discrete time modelling is sufficient. AMS simulators therefore have great significance for various top-

level verification processes.

AMS simulators are mainly used for system-level transient simulation purposes. There are two ways


to perform AMS simulations: time-step based and event-driven simulations. Working principles for

most commercial AMS simulators follow time-step based simulation due to its close resemblance to and

easy adaptation of continuous time circuitry simulation schemes. Recently, event-driven simulation has

become increasingly popular in research communities due to its speed advantages over time-step based

simulation. Details for both schemes are described below.

2.4.1 Time-Step Based Simulation

Time-step based simulation is specifically designed for simulating analog circuit components to capture

their continuous time and nonlinear behaviour with high accuracy. The basic concept of time-step based

simulation is to calculate the output of the next time step based on the input as well as the output of the

current step. A time-step based simulator therefore usually employs circuit components modelled with

an ordinary differential equation (ODE) based scheme (Section 2.5.1). In time-step based simulations,

outputs of continuous-time circuit components are evaluated at discrete time-steps. Accuracy of the

continuous-time output calculations depends on the granularity of the chosen time-steps. Conducting a

simulation with smaller time-steps increases simulation time, whereas larger time-steps reduce accuracy

and can lead to convergence instability [18, 22, 28]. Therefore, picking the right time-steps is

critical for this type of simulation scheme.

Constant and Variable Time-Step Based Simulations

Two types of time-step selection schemes are available: constant time-step and variable time-step. Figure

2.7 shows both selection schemes using two transient plots. In both plots, the dark line represents a

continuous-time waveform of a circuit net with the horizontal axis denoting time and the vertical axis

denoting amplitude. Thin vertical lines connecting the waveform and the horizontal axis indicate the

time points, when the amplitudes are measured.

[Figure 2.7 panels: Constant Time Step and Variable Time Step, each plotting amplitude versus time]

Figure 2.7: Comparison between constant and variable time-step based simulation schemes

In the case of constant time-steps, simulation time-steps are kept constant for the entire simulation period. The constant time-step is usually defined by the user, but can be automatically determined by the simulator at the initial phase based on defined circuit component properties. The sampling rate associated with the simulation time-step must be greater than or equal to twice the maximum circuit system operating frequency in order to perform exact reconstruction without aliasing, according to the Nyquist-Shannon sampling theorem. In practice, the sampling rate should be about 10 to 15 times the maximum operating frequency, because outputs at the non-measured time points are usually interpolated based on the neighbouring measured time points [29]. This suggested sampling rate is sufficient in most cases to maintain the expected accuracy during the interpolation process.

Having a constant time-step based simulation scheme is inefficient, since the simulator is blindly

picking time points regardless of observable activity. Variable time-step based simulations are therefore

preferable, since the simulation time-steps are picked based on circuit activity, specifically based on the

slope of the signal amplitude. Whenever the signal has steeper slopes, smaller time-steps are picked,

and whenever the signal has flatter slopes, larger time-steps are picked.

Figure 2.8 shows how a typical variable time-step based transient simulation is conducted. At the

initial step (shown as t <- 0), the initial value of each circuit node (shown as v(0) <- v0) is determined

based on the given system. The next step is to update the time by a small time step, ∆t, to evaluate the

system using the defined equation, f(v, t). If the evaluation is not successful at achieving convergence,

the system output is recalculated by running another iteration, v <- v + ∆v. In the case of variable time-

steps, the time step, ∆t, is then tested for time stamp acceptance. If the acceptance test fails, a new and usually

smaller time step, ∆t, is picked. Finally, once the time stamp acceptance test passes, the simulator can

progress further along the time axis. This tri-iterative loop continues, until the simulation finishes.
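The three nested loops can be sketched for a single node. The example below is illustrative only: it assumes a hypothetical nonlinear node equation `rhs(v)`, uses backward Euler for the implicit update, Newton iterations for convergence, and a simple predictor-versus-solution difference as the acceptance test. A production simulator's acceptance criterion is more elaborate, and this is not the flow chart of [1] reproduced line by line.

```python
TAU = 1e-9   # assumed time constant of the illustrative node

def rhs(v):
    """dv/dt of a hypothetical nonlinear node driven towards 1 V."""
    return (1.0 - v - 0.2 * v ** 3) / TAU

def drhs(v):
    """Derivative of rhs with respect to v."""
    return (-1.0 - 0.6 * v ** 2) / TAU

def simulate(t_stop=10e-9, dt=1e-12, lte_tol=1e-3, abs_tol=1e-12):
    t, v = 0.0, 0.0                                   # t <- 0, v(0) <- v0
    times, values = [t], [v]
    while t < t_stop - 1e-18:                         # loop 1: advance along the time axis
        dt = min(dt, t_stop - t)
        while True:                                   # loop 2: time-step acceptance
            v_new = v
            for _ in range(50):                       # loop 3: Newton convergence
                g = v_new - v - dt * rhs(v_new)       # backward Euler residual
                dv = -g / (1.0 - dt * drhs(v_new))
                v_new += dv                           # v <- v + dv
                if abs(dv) < abs_tol:
                    break
            err = abs(v_new - (v + dt * rhs(v)))      # compare with explicit predictor
            if err <= lte_tol:
                break                                 # accept this time stamp
            dt *= 0.5                                 # reject: retry with a smaller step
        t, v = t + dt, v_new
        times.append(t)
        values.append(v)
        if err < lte_tol / 4.0:
            dt *= 2.0                                 # quiet signal: allow a larger step
    return times, values

times, values = simulate()
steps = [b - a for a, b in zip(times, times[1:])]
```

The resulting `steps` are short while the node voltage changes quickly and grow as it settles, which is the behaviour sketched in the variable time-step panel of Figure 2.7.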

Issues with Variable Time-Step Based Simulation

This circuit activity monitoring based variable time-step simulation scheme improves performance for

smaller circuit systems, but its performance diminishes as circuit systems get bigger due to the increased

probability of circuit activity at any given time. For instance, consider a circuit with N possible nets,

whose amplitudes vary independently over continuous time. Time points for each net, $Net_i$, can

be described as a time set, $T_{Net_i}$ (where i = 1, 2, 3, . . . N).

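$$
\begin{aligned}
T_{Net_1} &= \{0, t_{11}, t_{12}, t_{13}, \ldots, t_{Stop}\} \\
T_{Net_2} &= \{0, t_{21}, t_{22}, t_{23}, \ldots, t_{Stop}\} \\
T_{Net_3} &= \{0, t_{31}, t_{32}, t_{33}, \ldots, t_{Stop}\} \\
&\;\;\vdots \\
T_{Net_N} &= \{0, t_{N1}, t_{N2}, t_{N3}, \ldots, t_{Stop}\}
\end{aligned}
\tag{2.1}
$$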

Time points under each time set, $T_{Net_i}$, are organized in ascending order, $0 < t_{i1} < t_{i2} < t_{i3} < \cdots < t_{Stop}$, and they are selected in such a way that the amplitude variation of the net between successive time points is maintained at a constant level. Here, 0 and $t_{Stop}$ represent the initial and stop times for the simulator respectively. When the circuit is evaluated using a time-step based simulator, all time points from all N nets are combined. The top-level time set, $T_{Nets}$, which represents all time points during a simulation, can be expressed as the union of the individual time sets.

$$
T_{Nets} = T_{Net_1} \cup T_{Net_2} \cup T_{Net_3} \cup \cdots \cup T_{Net_N} \tag{2.2}
$$

As can be observed from the relation above, even if most circuit nets do not necessarily have observable activities, all circuit nets need to be evaluated, since the time-step based simulation scheme has no defined way to isolate the nets without observable activities. Hence, as circuit size grows, the time point density increases in TNets due to the increased probability of circuit activities, decreasing the benefits of variable time-step based simulations.

Figure 2.8: Typical time-step based transient simulation flow chart [1]
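The cost of merging per-net time sets can be seen with a toy example. The three nets and their time points below are invented for illustration; the function simply forms the union of Equation 2.2.

```python
def merged_time_points(per_net_times):
    """Union of the per-net time sets (Equation 2.2), in ascending order."""
    merged = set()
    for times in per_net_times.values():
        merged |= set(times)
    return sorted(merged)

# Hypothetical time sets; every net shares the start (0.0) and stop (10.0) times.
nets = {
    "net1": [0.0, 1.0, 2.0, 10.0],
    "net2": [0.0, 0.5, 1.0, 7.5, 10.0],
    "net3": [0.0, 4.0, 10.0],
}
t_nets = merged_time_points(nets)

# A time-step based simulator evaluates every net at every merged time point,
# whereas each net on its own would only need its own time points.
evaluations = len(t_nets) * len(nets)
needed = sum(len(times) for times in nets.values())
```

Here 21 net evaluations are performed although only 12 are needed, and the gap widens as more nets with unrelated activity are added.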

2.4.2 Event-Driven Simulation

Event-driven programming was initially developed for graphical user interface (GUI) based application

software [30]. Once any GUI based application is launched, the software executes commands whenever

a user interacts with it and sleeps otherwise. This brings significant efficiency in software operation. A

similar idea was adopted for simulating digital circuit systems, because digital circuitry remains at one

of its amplitude levels associated with a binary logical state of either logic 0 or 1 for the majority of the time,

although it is built using transistors, as in analog circuitry. Whenever any digital circuit block needs

to be evaluated, events are scheduled in the event queue and the simulator executes circuit blocks in

the ascending order of event time stamps. This type of simulator is optimized not only along the time

axis like variable time-steps, but also across system space. The time axis and system space optimization

property is referred to as spatiotemporal optimization. Owing to this property, event-driven simulation

provides significant speed advantages and computational efficiency in comparison with time-step based

simulation, even for large-scale systems.
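A minimal event queue illustrates the scheduling idea. The sketch assumes each block is fully described by a first event time and a fixed period, which is a simplification for illustration; the block names and the function `run_event_queue` are hypothetical.

```python
import heapq

def run_event_queue(blocks, t_stop):
    """Execute blocks in ascending time-stamp order until t_stop.

    blocks maps a block name to (first_event_time, period).
    """
    queue = [(t0, name) for name, (t0, _) in blocks.items()]
    heapq.heapify(queue)
    log = []
    while queue:
        t, name = heapq.heappop(queue)        # earliest pending event
        if t > t_stop:
            break
        log.append((t, name))                 # "evaluate" the block at time t
        period = blocks[name][1]
        heapq.heappush(queue, (t + period, name))   # schedule its next event
    return log

blocks = {"tx_clk": (0.0, 1.0), "rx_clk": (0.5, 1.0), "slow_ctrl": (0.0, 4.0)}
log = run_event_queue(blocks, t_stop=4.0)
```

The slow block is visited only twice in this run, while the two clocked blocks are visited at each of their own transitions; no block is evaluated at a time stamp that belongs only to another block.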

As in digital systems, any high-speed serial link communication involves a digital source at the transmitter and a digital sink at the receiver (Figure 2.2). The digital source, which is synchronous with the transmitter clock, usually generates a random discrete signal stream mimicking the properties of the transition density of actual sources. Similarly, the digital sink, synchronized with the receiver clock, samples the transmitted digital signal. Figure 2.9 shows an example case, where a simple transceiver operation is simulated. In the example, the event scheduler receives the next transition events from TX and CDR using their corresponding event routines. The task of the event scheduler is to sort the transition events in ascending order and activate the circuit blocks based on that order. Only the channel and equalizers are analog, and a number of methods have been proposed to handle these analog blocks within an event-driven scheme to a certain extent. As a whole, event-driven simulation is becoming a nearly ideal candidate for system-level verification of high-speed communication over serial links.

Figure 2.9: Concept of event driven simulation: (a) block diagram, (b) operation [2]

Drawbacks of event-driven simulations arise from its requirement of predicting future events for

execution. For synchronous systems, future events can be predicted using the local clock transition

interval and clock operations are relatively independent from external effects. For instance, if a system

has a binary phase detector (PD)-based CDR, future transition events of its VCO can be calculated in

advance based on the current state of the binary PD. Synchronous circuit blocks, which are connected to

such CDRs, can simply refer to the time-events of the CDR. However, this look-ahead (non-causal) scheduling cannot

be applied to asynchronous circuit blocks, such as a slicer or a linear PD. Since these asynchronous circuit

blocks work based on zero-crossing detection and the zero-crossing detection requires time-step based

waveforms, events associated with zero-crossing cannot be estimated without going over the time-step

based waveforms.
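To make the dependency on sampled waveforms concrete, the sketch below locates zero-crossings by linear interpolation between neighbouring time points. It assumes the waveform is already available as parallel lists of times and values; the function name is illustrative.

```python
def zero_crossings(times, values):
    """Return interpolated times at which the sampled waveform crosses zero."""
    crossings = []
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 == 0.0:
            crossings.append(t0)                          # sample lies exactly on zero
        elif v0 * v1 < 0.0:                               # sign change inside the interval
            crossings.append(t0 + (t1 - t0) * (-v0) / (v1 - v0))
    return crossings

times = [0.0, 1.0, 2.0, 3.0]
values = [-1.0, 1.0, 0.5, -0.5]
edges = zero_crossings(times, values)
```

The crossing times only become known after the waveform samples on both sides of each crossing have been computed, which is why such events cannot be placed in the event queue ahead of time.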

2.5 Modelling for Continuous Time Component Blocks

The key challenges of conducting an AMS simulation come from modelling continuous time circuit

blocks. Modelling a continuous time circuit block is always challenging, because its output varies both

in time and amplitude. Since computation can only be performed in discrete time, time points for continuous

time output calculations are determined based on the system defining properties. For example, if an

operational amplifier is tested in transient mode for continuous time input amplification, time points

should be placed as closely as possible. In that case, it is preferable to adopt a state-space based modelling

scheme in a time-step based simulation environment. However, if a switched-capacitor based circuit needs to

be tested, it is usually necessary to observe outputs at each clock transition. Under such circumstances,

it is preferable to adopt a continuous time modelling scheme, which can be used to calculate the output

at any given time point. Here, four popular and recently developed modelling schemes applicable to

continuous time circuitry are presented from the perspective of transceiver operation.

2.5.1 Ordinary Differential Equation (ODE) Based Modelling

ODE-based modelling for continuous-time circuitry is the most popular scheme implemented in major

commercial simulators due to its versatility in device-level nonlinear modelling and generalizability in

any circuit analysis. This modelling scheme is mainly adopted in time-step based simulation schemes,

as mentioned earlier in Section 2.4.1. Modelling in ODE-based schemes is usually based on modified

nodal analysis (MNA), because most nonlinear circuit devices are modelled using controlled current

sources [1, 28]. A generic form of the equation for a node can be written based on Kirchhoff's current

law (KCL), as shown in Figure 2.10.

Figure 2.10 shows all possible circuit branches connected to a node. Each circuit branch contains a two-terminal load, which is shown as a rectangular box. Current through each branch is quantified as $i_n(v,t) + K_n \cdot \frac{dq_n(v,t)}{dt}$, where $K_n$ represents the respective gain coefficient and $n$ denotes the branch index.

[Figure 2.10 diagram: a node with three branches carrying the currents $i_1(v,t) + K_1 \cdot \frac{dq_1(v,t)}{dt}$, $i_2(v,t) + K_2 \cdot \frac{dq_2(v,t)}{dt}$, and $i_3(v,t) + K_3 \cdot \frac{dq_3(v,t)}{dt}$]

Figure 2.10: Demonstration of Kirchhoff current law (KCL)

Based on this, the following holds:

$$
f_{Node}(v,t) = \underbrace{i_1(v,t) + i_2(v,t) + i_3(v,t) + \cdots}_{\text{For static components}} + \underbrace{K_1 \cdot \frac{dq_1(v,t)}{dt} + K_2 \cdot \frac{dq_2(v,t)}{dt} + K_3 \cdot \frac{dq_3(v,t)}{dt} + \cdots}_{\text{For dynamic components}} = 0 \tag{2.3}
$$

Rearranging the KCL Equation 2.3 allows us to observe two major groups of terms: one for static components, such as resistors, voltage sources, and current sources, and the other for dynamic components, which can be capacitive or inductive. Each term in the equation is considered as a function of voltage, v, and time, t, to show device level nonlinearity and time variations. Certain devices, such as transistors, which have more than two terminals, are broken down into multiple equivalent two-terminal devices. Such a KCL equation is formed for every circuit node, which gives a system of equations of the form $A\vec{x} - \vec{b} = 0$. The matrix, $A$, contains values of resistance, capacitance, inductance, and controlled source gain coefficients, which are known. The vector $\vec{b}$ is formed from the known currents and voltages of independent sources. All unknown quantities related to node voltages and net currents are accumulated in the vector $\vec{x}$. Before solving for the unknowns, the expression for $f_{Node}(v,t)$ is further simplified by applying a finite difference approximation: for any derivative of a quantity $y$, $dy/dt \approx (y(t+\Delta t) - y(t))/\Delta t$, where $\Delta t$ represents an appropriately selected time-step based on system convergence requirements. This allows us to linearize the system before solving it algebraically.
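As a small worked illustration (not taken from the source), consider a hypothetical node with voltage $v$ that is driven from a source $v_{in}$ through a resistor $R$ and loaded by a capacitor $C$ to ground. In the notation of Equation 2.3, the static term is $i_1 = (v - v_{in})/R$ and the dynamic term has $q_1 = C\,v$ with $K_1 = 1$. Applying the finite difference approximation quoted above turns the differential equation into an algebraic update:

```latex
\begin{aligned}
f_{Node}(v,t) &= \frac{v(t) - v_{in}(t)}{R} + \frac{d\,\bigl(C\,v(t)\bigr)}{dt} = 0 \\
\frac{v(t) - v_{in}(t)}{R} + C\,\frac{v(t+\Delta t) - v(t)}{\Delta t} &\approx 0 \\
v(t+\Delta t) &\approx v(t) + \frac{\Delta t}{RC}\,\bigl(v_{in}(t) - v(t)\bigr)
\end{aligned}
```

The last line shows why the choice of $\Delta t$ matters: the update only tracks the true response well when $\Delta t$ is small compared with the time constant $RC$.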

The system of equations can now be solved directly by inverting the matrix, or iteratively through

making an initial guess. Even though the direct inversion method can be adopted for smaller circuit

systems, the iterative approach is usually preferred to maximize the processing capability for solving

large circuit systems. Among various iterative solving methods, the Newton-Raphson (NR) method is

one of the most commonly applied methods. Equation 2.4 describes the iterative approach of the NR

method:

$$
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{2.4}
$$

$$
e_{n+1} = (x_{n+1} - x_n) \le \mathrm{Max}\{\mathrm{RelTol}(x_{n+1}),\ \mathrm{AbsTol}\} \tag{2.5}
$$

As can be seen from Equation 2.4, a function, $f(x_n)$, can be solved if there exists a non-zero first-order derivative, $f'(x_n) \neq 0$. Since the circuit system is usually nonlinear and dynamic in nature, its system of equations satisfies the required conditions for the NR method. After obtaining the new value $x_{n+1}$ from $x_n$, the error for the new value, $e_{n+1}$, can be calculated, as shown in Equation 2.5. For every new value of $x_n$, the corresponding error $e_{n+1}$ is estimated, and the error value generally decreases as the number of iterations is increased. It is also worth mentioning that during every new iteration, new operating points for all nonlinear circuit components need to be obtained, meaning that the system of equations changes for every new error value, $e_{n+1}$.

The process of iterations continues until the error value drops below a preset simulation error limit.

The preset error limit is the maximum of the relative error (referred to as RelTol) and the absolute error (referred to

as AbsTol), as shown in Equation 2.5. Relative error is defined as a function of the new value, $x_{n+1}$,

through the relation $\mathrm{RelTol}(x_{n+1}) = |x_{n+1} - x_n| / \mathrm{Min}\{x_{n+1}, x_n\}$. When the value is much larger than

zero ($x_n \gg 0$), the relative error dominates the error calculation. However, for values close to zero,

it is not possible to achieve convergence using the NR method without the absolute tolerance. For systems of equations, multiple

error values would need to be calculated for multiple circuit nodes. In such cases, the maximum of all

calculated error values is used for error limit comparison.
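The iteration of Equation 2.4 with a combined relative and absolute stopping test can be sketched for a single unknown. Two points are assumptions made for this illustration: the tolerance is formed in the common SPICE-like way, `max(rel_tol * max(|x_new|, |x|), abs_tol)`, rather than with the RelTol relation quoted above, and the circuit (a 1 V source, a 1 kΩ resistor, and an ideal diode with invented parameters) is hypothetical.

```python
import math

def newton_raphson(f, df, x0, rel_tol=1e-3, abs_tol=1e-9, max_iter=100):
    """Iterate x <- x - f(x)/f'(x) until the update falls below the tolerance."""
    x = x0
    for n in range(1, max_iter + 1):
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("f'(x) is zero, so Equation 2.4 cannot be applied")
        x_new = x - f(x) / slope
        error = abs(x_new - x)
        # Assumed SPICE-like limit: relative part scales with the value,
        # absolute part keeps the test meaningful near zero.
        limit = max(rel_tol * max(abs(x_new), abs(x)), abs_tol)
        x = x_new
        if error <= limit:
            return x, n
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical diode circuit: KCL at the diode node, unknown is the diode voltage.
VS, R, IS, VT = 1.0, 1e3, 1e-14, 0.025

def f(v):
    return (VS - v) / R - IS * (math.exp(v / VT) - 1.0)

def df(v):
    return -1.0 / R - IS / VT * math.exp(v / VT)

v_d, iterations = newton_raphson(f, df, x0=0.6)
```

Starting from 0.6 V, the iteration settles near 0.61 V in a few steps; a starting guess far from the solution would need more iterations or could fail, which is the initial-guess sensitivity discussed in the text.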

It can be inferred from the above discussion that the ODE based modelling scheme allows us to use

any arbitrary input signal, since the output for each circuit node is calculated at every time point. It can

also be clearly seen that the calculation scheme is significantly computationally intensive. As the system

size grows, the system of equations (which is a square matrix in order to have a unique solution set) grows

and the matrix inversion cost increases steeply (on the order of N^3 for direct inversion of a dense N × N matrix). Determining the initial guess, x0, can be

troublesome, because the system needs to intelligently pick the values to ensure convergence; otherwise,

the initial guess needs to be provided manually for each circuit node by the user. Convergence failure

can also arise from not selecting the proper time step, ∆t. Like the case of larger time steps leading to

larger errors, certain continuous functions, such as tanh(x), which do not always have finite derivative,

may often cause convergence failure due to improper time point selection.


2.5.2 Pulse Response Based Modelling

Modelling continuous time circuit behaviour using pulse responses requires representing the transmitted
binary signal as a sum of time-shifted input pulses, each scaled by its transmitted symbol. This is one
of the continuous time component modelling techniques whose operating principles closely resemble
the event-driven simulation scheme. The modelling scheme is also employed to generate statistical
eye diagrams, a technique that creates eye diagrams with integrated statistical PDF information
without running time-consuming transient simulations [31]. The rest of the section explains the core
concept of how to generate a continuous-time output waveform from the recorded pulse response.

Figure 2.11a depicts an example pulse response, p(t), that can be acquired from a simulation or laboratory
environment. Its input is a rectangular pulse of unit amplitude, Π(t), which is defined as 1 within a pulse
duration of one bit period, Tb, and 0 otherwise (Equation 2.6). From the recorded pulse response of
indefinite duration, a significant segment, pExt(t), can be extracted, which is mostly non-zero within the
chosen range [0, tExt] but approximately zero otherwise. The collected samples outside the range do not
contribute noticeably to any calculation and are therefore treated as zero. This allows us to write the
pulse response, p(t), as presented in Equation 2.7:

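\[
\Pi(t) =
\begin{cases}
1 & \text{if } 0 \le t \le T_b \\
0 & \text{otherwise}
\end{cases}
\tag{2.6}
\]

\[
p(t) =
\begin{cases}
p_{Ext}(t) & \text{if } 0 \le t \le t_{Ext} \\
0 & \text{otherwise}
\end{cases}
\tag{2.7}
\]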

In order to estimate the continuous time output, y(t), using the pulse response, any bit-stream input,

x(t), (Figure 2.11b), can be written as follows:

\[
x(t) = \lim_{N \to \infty} \sum_{i=1}^{N} b_i \cdot \Pi(t - iT_b)
\tag{2.8}
\]

where bi ∈ {A−1, A+1} and i = 1, 2, 3, . . . , N. A−1 and A+1 represent the amplitudes of the two binary
logic states, 0 and 1, respectively. Using x(t), the continuous time output, y(t), can be determined
through convolution with the impulse response of the continuous time system, c(t).

\[
y(t) = x(t) * c(t)
     = \lim_{N \to \infty} \underbrace{\sum_{i=1}^{N} b_i \cdot p(t - iT_b)}_{\text{Simulation length dependent, } O(N^2)}
\tag{2.9}
\]

where the pulse response is defined in relation to the impulse response, c(t), as p(t) = Π(t) ∗ c(t).
The top plot of Figure 2.11b shows an arbitrary binary bit-stream waveform with sharp transitions in a
continuous time-frame. The figure demonstrates how the shifted versions of the pulse responses (shown
in the middle section) are summed to generate the desired output response, y(t) (shown in the bottom
section).

[Figure 2.11: Continuous time waveform formation using pulse-response based modelling.
(a) Pulse response extraction: the extracted segment, pExt(t), of p(t) over [0, tExt].
(b) Waveform formation: random bit-stream, x(t); shifted pulse responses, bi · pExt(t − iTb); summed response, y(t).]

As can be seen in Equation 2.9, the summation must be executed over all transmitted bits throughout
the entire simulation. Because each of the N bit intervals requires a sum of up to N terms, the
complexity of the implemented algorithm for an N-bit long simulation grows as O(N²). In other words,
simulation time grows quadratically, even before computational storage requirements are considered,
which is undesirable. To bring the complexity down to O(N), the definition of the pulse response, p(t),
presented in Equation 2.7, is exploited to limit the number of transmitted bits to be summed to a
constant, k. In that case, the equation for calculating the continuous-time output, y(t), becomes,

\[
y(t) = \lim_{N \to \infty} \underbrace{\sum_{i=N-k+1}^{N} b_i \cdot p_{Ext}(t - iT_b)}_{\text{Simulation length independent, } O(N)}
\tag{2.10}
\]
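The difference between Equations 2.9 and 2.10 can be illustrated with a short sketch. It is written in Python for convenience and is not taken from the thesis; the first-order low-pass system, the values of `TB`, `TAU`, and `T_EXT`, and the function names are all assumptions chosen for illustration.

```python
import math

TB = 1.0       # bit period, arbitrary units (assumed)
TAU = 0.4      # time constant of an assumed first-order low-pass system
T_EXT = 8.0    # extraction window; p(t) is treated as zero beyond it
K = math.ceil(T_EXT / TB) + 1   # bits whose pulse response can still be non-zero


def pulse_response(t):
    """Pulse response of the assumed low-pass, truncated to [0, T_EXT] as in Eq. 2.7."""
    if t < 0.0 or t > T_EXT:
        return 0.0
    if t <= TB:
        return 1.0 - math.exp(-t / TAU)
    return (1.0 - math.exp(-TB / TAU)) * math.exp(-(t - TB) / TAU)


def output_full(bits, t):
    """Eq. 2.9: sum over every transmitted bit (bit i starts at i * TB, i = 1..N)."""
    return sum(b * pulse_response(t - i * TB) for i, b in enumerate(bits, start=1))


def output_windowed(bits, t, k):
    """Eq. 2.10: sum over only the k most recent bits."""
    n = min(len(bits), int(t // TB))   # index of the latest bit that has started
    first = max(1, n - k + 1)
    return sum(bits[i - 1] * pulse_response(t - i * TB) for i in range(first, n + 1))


bits = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0] * 10    # b_i for i = 1..60
y_full = output_full(bits, 42.3)
y_windowed = output_windowed(bits, 42.3, K)      # same value from at most K terms
```

Both functions return the same value because every bit outside the window contributes exactly zero once p(t) is truncated; only the amount of work per output sample differs.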

One of the major concerns regarding Equation 2.10 is that the summation needs to be executed at
every fixed bit duration, Tb. In reality, a transmitted bit-stream often contains various effects, such as
clock jitter and amplitude variation caused by equalizers, such as a feed forward equalizer (FFE), so
the bit-stream cannot always be treated as a linear superposition of identical pulses. Capturing such
behaviour requires pulse responses of various amplitudes as well as durations, which can make the
algorithm very complex.

2.5.3 Step Response Based Modelling

Similar to pulse response based modelling, another continuous time modelling technique for event-driven
simulation is step response based modelling. In step response based modelling, a continuous time
waveform is estimated using the collected step response instead of the pulse response. A key advantage
of step response based modelling over pulse response based modelling is that the summation needs to be
executed only when a transition occurs. Since the summation happens only at transitions of the
transmitted bit-stream, the number of calculations is always less than or, in the worst case, equal to
that of pulse response based simulation, assuming the time vectors for both cases are of the same
length. The rest of the section presents how to apply the step response to calculate the continuous time
waveform with the aid of Figure 2.12.
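The link between the two models can be made explicit with a short derivation. It is offered as a sketch under assumptions that the text does not state in this form: the system is linear, time-invariant, and causal, s(t) = u(t) ∗ c(t) denotes its step response, and the input is zero before the first bit (written here as b_0 = 0).

```latex
\begin{align*}
\Pi(t) &= u(t) - u(t - T_b)
  \;\Longrightarrow\;
  p(t) = \Pi(t) * c(t) = s(t) - s(t - T_b), \\
\sum_{i=1}^{N} b_i \, p(t - iT_b)
  &= \sum_{i=1}^{N} (b_i - b_{i-1}) \, s(t - iT_b) \;-\; b_N \, s\bigl(t - (N+1)T_b\bigr).
\end{align*}
```

For t < (N+1)T_b the last term is zero, and every term with b_i = b_{i−1} vanishes, which is why only the transitions contribute to the sum.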

The step response, s(t), is recorded with a unit step, u(t), applied as the input to the continuous time
system of interest. As can be seen from Figure 2.12a, a significant segment of the step response,
sExt(t), can be extracted within the time range [0, tExt], as in the case of the pulse response, p(t).
Outside this range, the step response, s(t), is 0 at the initial stage (when t < 0), and beyond the time
range (when t > tExt) it can be considered a constant, s∞. The expression for the step response, s(t), is
given in Equation 2.12.


[Figure 2.12: Continuous time waveform formation using step-response based technique.
(a) Step response extraction: the extracted segment, sExt(t), of s(t) over [0, tExt], settling to s∞.
(b) Waveform formation: random bit-stream, x(t), with transitions at t1, t2, t3, . . . ; shifted step responses, (αi − αi−1) · sExt(t − ti); summed step response, y(t).]


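\[
u(t) =
\begin{cases}
1 & \text{if } t \ge 0 \\
0 & \text{otherwise}
\end{cases}
\tag{2.11}
\]

\[
s(t) =
\begin{cases}
s_{Ext}(t) & \text{if } 0 \le t \le t_{Ext} \\
s_{\infty} & \text{if } t > t_{Ext} \\
0 & \text{otherwise}
\end{cases}
\tag{2.12}
\]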

Calculating the continuous time output, y(t), involves convolving a random bit-stream, x(t), with the
continuous time system impulse response, c(t). In this case, the random bit-stream, x(t), needs to be
defined in terms of transition states, αi, as in Equation 2.13. The transition state, αi, occurring at the
transition time, ti, is defined as αi ∈ {A−1, A+1} with αi ≠ αi−1, where i = 1, 2, 3, . . . , N.

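\[
x(t) = \lim_{N \to \infty} \sum_{i=1}^{N} (\alpha_i - \alpha_{i-1}) \cdot u(t - t_i)
\tag{2.13}
\]

\[
y(t) = x(t) * c(t)
     = \lim_{N \to \infty} \underbrace{\sum_{i=1}^{N} (\alpha_i - \alpha_{i-1}) \cdot s(t - t_i)}_{\text{Simulation length dependent, } O(N^2)}
\tag{2.14}
\]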

Here, the step response, s(t), is defined with respect to the continuous time impulse response, c(t),
through s(t) = u(t) ∗ c(t). Applying this relationship during the convolution yields the expression for
the continuous time output, y(t), presented in Equation 2.14. Similar to the case of the pulse response,
implementing the equation directly leads to quadratic algorithmic complexity, O(N²). In order to bring
the complexity down to O(N), the expression for the step response, s(t), described in Equation 2.12, is
utilized. The resultant expression for the continuous time output, y(t), becomes,

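\[
y(t) = \lim_{N \to \infty} \underbrace{\sum_{i=N-k+1}^{N} (\alpha_i - \alpha_{i-1}) \cdot s_{Ext}(t - t_i)}_{\text{Simulation length independent, } O(N)}
     + \lim_{N \to \infty} \underbrace{\sum_{i=1}^{N-k} (\alpha_i - \alpha_{i-1}) \cdot s_{\infty}}_{\text{Constant, } O(N)}
\tag{2.15}
\]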

As can be seen from Equation 2.15, there are two sub-expressions. The first expression always requires
k summations at each calculation stage, which indicates its simulation length independence and linear
computational complexity, O(N). The second expression is a scalar constant, which needs to be updated
at every transition. Its computational complexity is also linear, O(N), due to its simulation length
dependency, but its cost is negligible in comparison to the first expression. Overall, this way of
calculating responses for continuous-time systems has great speed advantages due to its computational
simplicity.
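The two sub-expressions of Equation 2.15 map naturally onto a short window of recent transitions plus one running scalar. The sketch below is an illustrative Python rendering, not the thesis implementation; the first-order system, the constants `TAU`, `T_EXT`, and `S_INF`, and the class name `StepAccumulator` are assumptions made for the example.

```python
import math
from collections import deque

TAU = 0.3      # time constant of an assumed first-order low-pass system
T_EXT = 5.0    # beyond this the step response is treated as settled
S_INF = 1.0    # settled value s_inf of the assumed step response


def step_response(t):
    """Eq. 2.12 for the assumed system: 0 before the step, s_inf after T_EXT."""
    if t < 0.0:
        return 0.0
    if t > T_EXT:
        return S_INF
    return 1.0 - math.exp(-t / TAU)


def output_direct(transitions, alpha0, t):
    """Eq. 2.14: revisit every transition (t_i, alpha_i) for each output sample."""
    y, prev = 0.0, alpha0
    for t_i, alpha_i in transitions:
        y += (alpha_i - prev) * step_response(t - t_i)
        prev = alpha_i
    return y


class StepAccumulator:
    """Eq. 2.15: a short window of recent transitions plus one settled scalar."""

    def __init__(self, alpha0):
        self.prev = alpha0
        self.window = deque()   # (t_i, alpha_i - alpha_{i-1}) not yet settled
        self.settled = 0.0      # running sum of settled (alpha_i - alpha_{i-1}) * s_inf

    def add_transition(self, t_i, alpha_i):
        self.window.append((t_i, alpha_i - self.prev))
        self.prev = alpha_i

    def value(self, t):
        """Output at time t; t must not decrease between calls."""
        while self.window and t - self.window[0][0] > T_EXT:
            _, delta = self.window.popleft()
            self.settled += delta * S_INF
        return self.settled + sum(d * step_response(t - t_i) for t_i, d in self.window)
```

Transitions are expected to be added in time order as the simulation reaches them. The window then never holds more entries than fit inside T_EXT, while `output_direct` revisits the whole history for every sample.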


2.5.4 Symbolic Expression Based Modelling

Symbolic expression based modelling refers to describing the output responses of continuous time
circuitry using continuous time algebraic functions. Such algebraic functions can be sin x, tan x, e^x,
log x, polynomial expressions, or combinations of them. Outputs from transceiver circuitry have also been
modelled using such symbolic expressions, as proposed by Jang et al. [3, 4]. Figure 2.13
shows the key concepts behind the modelling scheme.

Jang et al. proposed the s-domain generic expression, $\sum_i b_i/(s+a_i)^{m_i}$, where i represents a
positive index (Figure 2.13a) [3]. In the expression, $b_i$, $a_i$, and $m_i$ denote a coefficient, a complex
pole (placed in the left half plane), and the number of repetitions of the pole, respectively. Figure 2.13a
shows the derivation process behind the s-domain generic expression. As Jang et al. suggested, all major
continuous time waveforms, such as $c \cdot u(t)$, $c \cdot t\,u(t)$, $c \cdot e^{-at}u(t)$, $c \cdot t\,e^{-at}u(t)$, and
similar functions (where c represents coefficients), can be represented as linear combinations of the
time-domain expression $t^{m_i-1}e^{-a_i t}u(t)$, whose Laplace transform is the earlier mentioned generic
expression. The generic expression is similar to the rational fitting function in [32], except that the
expression of [3] has $t^{m_i-1}$ in the time domain to represent the repeated poles. Hence, the generic
expression can be determined from the rationally fitted s-domain function of a linear time-invariant
(LTI) system after approximating closely placed poles as repeated poles.

Once the transfer function of the system, $\sum_i b_i/(s+a_i)^{m_i}$, is determined, its time-domain response
due to a step input, $\sum_i c_i t^{m_i-1} e^{-a_i t} u(t)$, is determined through partial fraction decomposition
(PFD). Figure 2.13b shows how the determined step response is handled. If a low pass filter (LPF) with
one pole at $\omega_p$ is fed with a step input, the output of the LPF can be calculated as the sum of two
functions: $c_1 \cdot u(t)$, which is due to $c_1/s$, and $-c_2 \cdot e^{-\omega_p t}u(t)$, which is due to
$-c_2/(s + \omega_p)$. The scheme suggests that these two terms can be represented as two corresponding sets,
$\{0, c_1, 1\}$ and $\{\omega_p, -c_2, 1\}$, instead of calculating a complete exponentially decaying waveform
with finite time-steps. If the continuous-time output due to an arbitrary binary input signal (similar to
x(t) shown in Figure 2.12b) needs to be calculated, only these two sets need to be updated every time a
step is applied.
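A compact way to picture the coefficient sets is to store each term as a triple and evaluate the waveform only when a sample is actually needed. The following Python sketch is illustrative and is not the implementation of [3, 4]; the triple ordering (a, c, m), the unity coefficients, and the pole frequency `OMEGA_P` are assumptions.

```python
import math


def evaluate(terms, t):
    """Sum c * t**(m - 1) * exp(-a * t) * u(t) over (a, c, m) coefficient sets."""
    if t < 0.0:
        return 0.0
    return sum(c * t ** (m - 1) * math.exp(-a * t) for a, c, m in terms)


OMEGA_P = 2.0 * math.pi * 1e9      # assumed pole frequency in rad/s

# Step response of a single-pole low-pass filter with c1 = c2 = 1:
# the sets {0, c1, 1} and {omega_p, -c2, 1} described in the text.
lpf_step = [(0.0, 1.0, 1), (OMEGA_P, -1.0, 1)]

y_at_one_tau = evaluate(lpf_step, 1.0 / OMEGA_P)   # equals 1 - exp(-1)
```

Storing the sets costs almost nothing, but every output sample still requires one exponential per set, which is the cost the following paragraph refers to.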

Even though the symbolic expression based modelling scheme has led to a great deal of calculation
reduction in simplified case studies, the scheme has a number of drawbacks. In a realistic transceiver
case, the channel is the most complicated linear system, and modelling a high-speed channel typically
involves fitting about 60 to 100 poles. Under such circumstances, this symbolic modelling scheme needs
to adopt a similar number of coefficient sets, and for every step response, all of them need to be updated.
In addition, when the transceiver system contains continuous time filters, such as CTLE, the system
becomes nonlinear. To handle that case, a Volterra series-based modelling scheme was proposed [4], which
increases the complexity and the number of coefficient sets by orders of magnitude. Finally, maintaining
exponential equations allows us to avoid complex calculations only during event transitions; if the
system needs to generate an eye diagram, the output waveform still has to be calculated. Calculating
exponential terms is not as simple as performing addition and multiplication in a typical microprocessor.
Instead, exponential calculation usually involves a Taylor series expansion, which means a microprocessor
needs multiple instruction cycles to calculate each exponential term. As a whole, the symbolic scheme is
therefore not necessarily promising for complex systems.


(a) Derivation process of the modelling scheme [3]

(b) Waveform calculation [4]

Figure 2.13: Symbolic expression based modelling overview [3,4]


2.6 Summary

This chapter has discussed the background necessary to proceed to the next chapters, which deal with
equalizer and CDR modelling and their optimized validation procedures. Signal integrity, in terms
of high-speed data transmission performance evaluation, is covered from both the signal attenuation and
timing uncertainty perspectives. Then, computational processing schemes applicable to transceiver
circuitry evaluation in long simulations are covered. Throughout the discussion, critical analysis lays
the groundwork for further developments in transceiver circuitry modelling.

Chapter 3

Proposed Simulation Method for

Analog-Mixed Signal Analysis

To perform various system-level verifications for any analog-mixed signal (AMS) system, it is desirable

to run such AMS simulations in a computationally efficient way while maintaining high modularity. This

would yield short simulation run time and allow for observation and study of the top-level behaviour

of the systems. A number of tests, such as BER contour generation, jitter tolerance plotting, and

circuit design parameter optimization, require fast simulations. In addition to speed, high modularity

is essential, as it enables different types of system-level verification tests with consistent performance

results.

The event-driven simulation scheme has demonstrated promising speed performance in AMS systems.
During event-driven simulation, an event scheduler determines the next events based on the requests
from the circuit blocks and simulates only the circuit blocks relevant to those time events (as
described in Section 2.4.2). Due to this capability, an event-driven simulation scheme provides a
significant simulation speed advantage. However, the circuit blocks need to be able to predict their next
events for evaluation. If the circuit blocks are asynchronous in nature, they cannot predict their next
events. Under such circumstances, event-driven simulation fails to evaluate the AMS system.

The inability of event-driven simulation to incorporate asynchronous circuit blocks is explained
using Figure 3.1. This figure demonstrates two calculation scenarios: (a) calculating vti when t = ti,
and (b) calculating multiple tvc,i’s when v = vc (where i = 1, 2, 3, . . . ). In each case, the horizontal
axis represents time, t, and the vertical axis depicts the continuous time amplitude, v. As the top plot
shows, the continuous time outputs, vti’s, can be represented as a function of time, vti = f(t = ti). This
functional relationship allows any output state to be calculated at any given time and hence, the circuit
block associated with the relationship can be incorporated into the event-driven simulation scheme. In
contrast, the bottom plot presents the scenario of calculating the transition time points for specific
events, tvc,i’s, at which the output reaches vc. Because the output may reach vc at many time points, the
relationship between tvc,i and vc cannot be represented as a function and hence, the circuit blocks
operating based on tvc,i cannot be incorporated into event-driven simulation.

Calculating transition points, tvc,i’s, has a number of applications, such as generating jitter PDFs
and modelling asynchronous circuit modules. The transition points, tvc,i’s, are usually detected by
scanning a finely time-stepped continuous waveform. Hence, detection of the transition points, tvc,i’s,
is currently available only in the time-step based simulation scheme. In time-step based simulation, the
simulator needs to repeatedly select different time points around each tvc,i until it reaches that tvc,i
within the specified error bound (as explained in Section 2.4.1). However, the process of detecting
transition points, tvc,i’s, is usually time consuming and computationally inefficient, as discussed in
detail in Section 3.3.1.

[Figure 3.1: Comparative study between calculating vti and tvc,i.
(a) Calculating vti when t = ti. (b) Calculating tvc,i’s when v = vc.]
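The repeated time-point selection around a crossing can be pictured with a simple bracketing search. The sketch below uses bisection purely as a stand-in, since the text does not specify how a time-step based simulator refines its time points; the function name `find_crossing`, the tolerance, and the exponential test waveform are assumptions made for illustration.

```python
import math


def find_crossing(v, v_c, t_lo, t_hi, t_tol=1e-12):
    """Bisect [t_lo, t_hi] until the crossing v(t) = v_c is bracketed within t_tol."""
    f_lo = v(t_lo) - v_c
    if f_lo * (v(t_hi) - v_c) > 0.0:
        raise ValueError("no sign change: the crossing is not bracketed")
    evaluations = 2
    while t_hi - t_lo > t_tol:
        t_mid = 0.5 * (t_lo + t_hi)
        f_mid = v(t_mid) - v_c
        evaluations += 1
        if f_lo * f_mid <= 0.0:
            t_hi = t_mid                    # crossing lies in the lower half
        else:
            t_lo, f_lo = t_mid, f_mid       # crossing lies in the upper half
    return 0.5 * (t_lo + t_hi), evaluations


# Assumed waveform: a rising exponential that crosses v_c = 0.5 at t = ln 2.
t_cross, evaluations = find_crossing(lambda t: 1.0 - math.exp(-t), 0.5, 0.0, 2.0)
```

Even for this single smooth crossing, dozens of waveform evaluations are needed to pin down one transition time, which hints at why the process becomes expensive over a long simulation.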

This chapter presents a new way of running AMS simulations that addresses the aforementioned
issue of calculating transition points, tvc,i’s. The new AMS simulation scheme can be realized in a
standard object-oriented (OO) programming environment, with the added benefits of high computational
efficiency and modularity. The rest of this chapter deals with how the proposed simulation method can be
utilized to achieve the desired benefits. Section 3.1 discusses the concept behind introducing OO
simulation for AMS system analyses. Next, Section 3.2 discusses the key considerations for modelling such
AMS systems and how the interactions occur among different varieties of circuit systems. Finally,
Section 3.3 explains the detailed process of operating the proposed simulation scheme and examines its
performance under various circumstances.

3.1 Object-Oriented (OO) Modelling Based Simulation

The concept of OO programming originated from the notion of imitating real-life object-to-object
interaction [33]. In OO programming, each object can be designed to have its own properties and methods.
Properties of an object are internal fields or variables that store the state information of the
object for later use without direct external intervention. Similarly, methods of an object are
internal functions that describe object-specific algorithms or routines to perform various
activities, such as output calculation, object-to-object interaction, and output visualization. Another
noteworthy feature of OO programming is that new classes can be derived from similar or parent classes
to describe their behaviour with minimal effort. These attributes make OO programming an attractive
choice for modelling a wide variety of systems, such as AMS circuitry.

3.1.1 Abstraction for OO Simulation

Any AMS system, such as transceiver circuitry, can be easily represented in an OO programming based
simulation (in short, OO simulation) scheme. All circuit components in an AMS system can be classified
as either discrete time or continuous time components. Output states generated by discrete time
components change only during their corresponding discrete time events, whereas outputs from continuous
time circuit components vary continuously with their inputs over time. During the proposed OO
simulation, all components, regardless of their types, are treated independently and calculate their
outputs using their own algorithms. Outputs are calculated based on the input information received
through object-to-object interactions. This allows different circuit components to have individual
time-steps and evaluation processes. It is particularly beneficial for continuous time components that
are not directly related to each other, because their time points no longer have to be selected from a
common union of time points, unlike in time-step based simulators.

Figure 3.2 explains how the OO simulator can be abstracted for an AMS system. Three discrete time

components, D1, D2, and D3, and two continuous time components, C1 and C2 are shown in Figure

3.2a as representative components of an AMS system. D1 and D2 work as two independent sources for

the representative AMS system, while C1 receives input from the discrete time source D1 and feeds its

output to the discrete time component D3. D3 receives inputs from C1 and D2. Lastly, C2 receives

input from D2 and D3.

The AMS system presented in Figure 3.2a can be translated for OO simulation as shown in Figure 3.2b.
In Figure 3.2b, there are two types of blocks: circuit objects and a host platform.
Circuit components D1, D2, D3, C1, and C2 are represented as circuit objects objD1, objD2, objD3,
objC1, and objC2, respectively. Depending on their operation type, circuit objects can be of the
same class or of different classes. During the OO simulation, all circuit objects maintain the same
input-output relationships as the representative AMS system. Each solid arrow points in the direction in
which the processed outputs of a circuit object flow. The task of the host platform is to ensure the flow
of processed outputs between circuit objects until the simulation is completed. The interaction between
the host and each circuit object is shown using a dotted arrow.

3.1.2 Operating Principle of OO Simulation

In the proposed OO simulation scheme, the simulation is conducted primarily by establishing
interactions among circuit objects. All circuit objects are designed to handle their individual simulation
processes. Hence, at the top level the proposed OO simulation scheme only needs to coordinate the
circuit objects until the individual processing time of each circuit object reaches the user-defined stop
time.

During the simulation, the top-level simulation coordinator (or the host platform) activates all the
circuit objects to process the input information received from their preceding objects. As indicated in
Algorithm 3.1, the OO simulation involves two phases: the initialization phase and the process phase. In
the initialization phase, all circuit objects, cktObj’s, are processed to evaluate their initial output
states (when t = 0). Once all the objects are initialized, the circuit objects are processed in the order
of the activation queue to perform their internal simulations. In this phase, each object can decide to
proceed with generating outputs (if sufficient information is received), defer the process to a later
simulation time-frame, or declare the completed state. The completed state is reached when the output
timing of a circuit object reaches the simulation stop time, tStop_.

[Figure 3.2: Analog-mixed signal system abstraction for object-oriented (OO) simulation.
(a) Abstract view: components D1, D2, D3, C1, and C2.
(b) Object-oriented view: circuit objects objD1, objD2, objD3, objC1, and objC2, and the host.]

Algorithm 3.1 Pseudo-code of running the top-level object-oriented (OO) simulation

function run()
    % Initialization Phase
    - Initialize all cktObj's at t = 0

    % Process Phase
    while all cktObj's have not reached tStop
        - Partially process cktObj 1
        - Partially process cktObj 2
        - Partially process cktObj 3
        ...
        - Partially process cktObj N
    end
    % Note: All these partial-process incrementally lead the transient analysis
    % to reach at the simulation stop time tStop
end

Since circuit objects are designed to calculate their own outputs, the simulation coordinator does not
determine the timing of the circuit object outputs. As a result, at a given simulation step (referred to
as simulation time), different circuit objects may have progressed to different points on their own time
axes (referred to as signal time). Figure 3.3 illustrates such a scenario with an example. The
figure employs three components, C1, C2, and C3, whose outputs are evaluated at time points t1, t2,
and t3, respectively, where t1 > t2 > t3. These time points are shown on the signal time axis, which is
measured in seconds (s). These outputs are calculated at the n-th iteration on the simulation time axis.
Because circuit objects can stand at different signal times, transition events happening earlier in
signal time can be detected later in simulation time. This flexibility to split the signal time axis
among circuit objects allows the OO simulation scheme to incorporate asynchronous circuit objects while
retaining a spatiotemporal optimization similar to that of event-driven simulation.

[Figure 3.3: Relationship between signal time and simulation time. Components C1, C2, and C3 are
evaluated at signal times t1, t2, and t3 (in seconds) during the n-th simulation-time iteration.]

Because circuit objects are not activated in the ascending order of their time events, they can
be enlisted in the activation queue of the simulation coordinator in any order. A random ordering in the
activation queue only causes some circuit objects to be activated prematurely, and circuit objects have
the option to defer their processes. Also, such wrong activations happen only in the initial simulation
time-frame and diminish gradually as the simulation progresses. In fact, the total number of wrong
activations depends solely on the enlisting order of the circuit objects and is independent
of the actual simulation stop time (see Section 3.3.2 for proof). Hence, the OO simulation performance
does not suffer from a random order of circuit objects in the activation queue.
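The run loop of Algorithm 3.1 and the effect of queue ordering can be tried out on a toy chain of three objects. The sketch below is a hypothetical Python illustration, not the thesis code; the class `CktObj`, its attributes, the integer signal time, and the inverter behaviour are all assumptions chosen to keep the example small and exact.

```python
class CktObj:
    """Hypothetical minimal circuit object; integer signal time keeps the example exact."""

    def __init__(self, in_port=None):
        self.in_port = in_port   # upstream object, None for a source
        self.last_out = []       # (time, state) pairs from the latest activation
        self.t = 0               # signal time reached by this object
        self.deferred = 0        # activations that found no new input

    def process(self):
        if self.in_port is None:                       # source: advance by one bit
            self.t += 1
            self.last_out = [(self.t, self.t % 2)]
            return
        new = [(t, s) for t, s in self.in_port.last_out if t > self.t]
        if not new:                                    # premature activation: defer
            self.deferred += 1
            return
        self.last_out = [(t, 1 - s) for t, s in new]   # behave as an inverter
        self.t = new[-1][0]


def run(queue, t_stop):
    """Activate every unfinished object once per pass until all reach t_stop."""
    passes = 0
    while any(obj.t < t_stop for obj in queue):
        for obj in queue:
            if obj.t < t_stop:
                obj.process()
        passes += 1
    return passes, sum(obj.deferred for obj in queue)


def build_chain():
    """Source feeding two inverters in series."""
    src = CktObj()
    inv1 = CktObj(src)
    inv2 = CktObj(inv1)
    return src, inv1, inv2


src, inv1, inv2 = build_chain()
passes, deferred = run([inv2, inv1, src], t_stop=100)   # worst-case queue order
```

In this toy chain the reversed queue costs three deferred activations at the very start of the run, and that count stays the same when `t_stop` is changed. During the run the three objects also sit at different signal times within the same pass.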

3.2 Description of Circuit Objects

Dealing with circuit objects for simulation purposes is like managing a supply chain for a manufacturing
process. Each circuit object of Figure 3.2b can be visualized as a unit that processes a received
information packet and then sends the processed output packet to its destination. In order to function as
an independent simulation unit, each circuit object needs to have its own internal properties and
individual methods for generating outputs.

[Figure 3.4: Definition of a circuit object, cktObj. The cktObj block contains properties and methods;
it receives inputs from its input objects and sends outputs to its output objects.]

Figure 3.4 depicts the definition of such a circuit object unit with the capability to run an independent
simulation process. In the figure, the block marked as cktObj defines the circuit object of interest.
Any circuit object like cktObj can have multiple input sources and multiple output sinks, which are
themselves circuit objects like cktObj. Each circuit object has a number of specialized properties and
methods, which are described in Section 3.2.1. The 3D cubic boxes represent the data generated by
circuit objects, which are described in Section 3.2.2.

3.2.1 Properties and Methods of Circuit Objects

In order to utilize the modularity of OO programming, the properties and methods of circuit objects
need to be designed in a standardized format. Standardization in function design (listed under methods)
and variable naming (listed under properties) establishes a common interface for object-to-object
interaction and makes it possible to develop new circuit objects through inheritance. A common interface
becomes essential when it comes to simulating a wide variety of circuit system combinations. Inheritance
makes modelling more user-friendly, because a new circuit object can adopt the properties and methods of
a closely related circuit object.

Algorithm 3.2 presents pseudo-code for a generic circuit object, cktObj. Here, the circuit object, cktObj,
is inherited from a parent circuit object, parentObj, in order to obtain all of its properties and methods
without repeating their declarations inside the cktObj code block. If cktObj has no parent, it inherits
from handle, a class that creates new objects in dynamic memory space and gives access to that exact
object during assignment operations (details can be found in [34]). The pseudo-code block has two main
segments: properties and methods. The properties are used for storing various types of information, such
as intermediate processed states, received inputs, generated outputs, and object characteristics. The
methods are used for describing activities at various stages, such as the object construction stage,
initialization stage, input-receiving (and output-sending) stage, and processing stage. Thanks to
inheritance, certain methods can be overridden in a derived class, which is useful in designing new
circuit objects. Inside the pseudo-code, some properties and methods are shown under their respective
sections.

Algorithm 3.2 Pseudo-code for a circuit object, cktObj

% Pseudo-code for cktObj (inherited from parentObj)
classdef cktObj < parentObj

    properties
        % Internal properties are described here
        objTyp_
        inPort_
        lastOut_<Time>
        lastOut_<State>
        lastIn_<Time>
        lastIn_<State>
        evtHost_
        ... other properties (not mentioned here)
    end

    methods
        % Simulation related functions are described here
        function cktObj(input arguments)   % Constructor
        function init()                    % Initializer
        function receive()                 % Input receiver
        function isComplete()              % Answerer for completeness
        function process()                 % Output calculator
        ... other methods (not mentioned here)
    end

end
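For readers who want something executable, the skeleton of Algorithm 3.2 can be mirrored in Python. This is only an analogue written for illustration and not the thesis code: the member names are adapted to Python conventions, and the `Gain` subclass, the `t_done` bookkeeping, and the stand-in host and upstream objects are assumptions.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class CktObj(ABC):
    """Python analogue of the cktObj skeleton; member names are adapted, not the thesis code."""

    def __init__(self, obj_typ, in_port=None, evt_host=None):
        self.obj_typ = obj_typ            # "discrete", "continuous", or "scope"
        self.in_port = in_port            # handle to the upstream object
        self.evt_host = evt_host          # top-level coordinator (holds t_stop)
        self.last_out_time, self.last_out_state = [], []
        self.last_in_time, self.last_in_state = [], []
        self.t_done = 0.0                 # signal time reached so far

    def init(self):
        """Initializer: publish the output state at t = 0."""
        self.last_out_time, self.last_out_state = [0.0], [0.0]

    def receive(self):
        """Input receiver: append the upstream output package to the input buffer."""
        self.last_in_time += self.in_port.last_out_time
        self.last_in_state += self.in_port.last_out_state

    def is_complete(self):
        """Report whether this object has reached the simulation stop time."""
        return self.t_done >= self.evt_host.t_stop

    @abstractmethod
    def process(self):
        """Output calculator: each derived class supplies its own algorithm."""


class Gain(CktObj):
    """Derived object: inherits everything and overrides only process()."""

    def __init__(self, gain, in_port, evt_host):
        super().__init__("continuous", in_port, evt_host)
        self.gain = gain

    def process(self):
        self.last_out_time = self.last_in_time
        self.last_out_state = [self.gain * s for s in self.last_in_state]
        if self.last_out_time:
            self.t_done = self.last_out_time[-1]
        self.last_in_time, self.last_in_state = [], []   # inputs consumed


host = SimpleNamespace(t_stop=2.0)
upstream = SimpleNamespace(last_out_time=[1.0, 2.0], last_out_state=[0.5, -0.5])
amp = Gain(2.0, in_port=upstream, evt_host=host)
amp.init()
amp.receive()
amp.process()
```

The derived class adds one parameter and one method body; the buffers and the completeness check come from the parent, which is the kind of reuse the standardized interface is meant to enable.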

Table 3.1 describes the key considerations for the major properties.


Table 3.1: Major properties of circuit objects

Name Description

objTyp_ ¹
    It is used to describe the type of a circuit object for a number of object-to-object interaction
    purposes. For example, discrete type circuit objects generate discrete time outputs, whereas
    continuous type circuit objects generate continuous time outputs. Other circuit objects, like
    simulation measurement scopes, generate no output. Circuit objects with other types of output, such
    as a symbolic type, can be introduced to the simulation scheme simply by appending a new type of
    circuit object. Using the type makes it possible to perform compatibility checks or modified actions
    at the output receiving end. This property is kept constant throughout the lifespan of the object.

    % For discrete type
    objTyp_ = ObjTyp.discrete
    % For continuous type
    objTyp_ = ObjTyp.continuous
    % For measurement scope type
    objTyp_ = ObjTyp.scope

inPort_
    It is used to store the information related to the input circuit object (such as a handle to the
    input circuit object). Object information of the input circuitry is required for various interaction
    purposes, such as collecting processed output packets, requesting a certain object to hold at a
    stage, and navigating to the other circuit objects in the system chain. This property is defined
    during the circuit object initialization and, once defined, is kept constant throughout the lifespan
    of the object.

lastOut_<Time> and lastOut_<State>
    These properties work as temporary storage for the most recently processed output package.
    lastOut_<Time> stores the timing information and lastOut_<State> stores the calculated output state
    information. Their lengths must be the same under all conditions. After every process phase, old
    processed outputs are replaced by the newly processed outputs, which is why they are only temporary
    storage. Because of that, the processed outputs need to be saved at the receiving ends immediately
    after each phase.

    % Storage format for processed information
    last_<Time>  = [t1, t2, t3, ..., tN]; % Size: N x 1
    last_<State> = [s1, s2, s3, ..., sN]; % Size: N x 1

¹ Property names end with an underscore (_).



lastIn_<Time> and lastIn_<State>
    These two properties work as storage for the unprocessed information package received from input
    circuit objects. lastIn_<Time> stores the timing information and lastIn_<State> stores the received
    input state information. They must be strictly the same size, as in the case of lastOut_<Time> and
    lastOut_<State>. Whenever new input information is received, it is appended to the old input
    information, and after each processing phase, the unwanted information is trimmed off. The appending
    and trimming actions help to reduce the demand for computational memory and result in constant
    computational speed throughout the simulation period. These properties are kept protected to prevent
    potential external corruption.

    % Storage format for appended received information
    lastIn_<Time>  = [lastIn_<Time>,  inPort_.lastOut_<Time> ];
    lastIn_<State> = [lastIn_<State>, inPort_.lastOut_<State>];
    %                |------Old------|----------New-----------|

evtHost It is used to store the object information of the top-level simulation coordinator. The coordinator can be considered as a host that requests events from circuit objects; in short, the event host. Besides coordinating, it holds global information, such as the simulation stop time and the order of pulse amplitude modulation (PAM).

% All sharable properties can be defined like this.

% For example,

evtHost .tStop % Simulation stop time

evtHost .pamOrder % Simulation PAM order

% ...

End of Table

Table 3.1 presents only the common and standard properties within the circuit objects. Besides these

properties, each circuit object can have specific properties. For instance, the circuit object PRBS has a

property called order , which represents the number of flip-flops in a PRBS, such as 3, 7, 15, and 31.

Within the circuit object, every property can have its own setter and getter methods to control its behaviour, such as specifying acceptable inputs, future changeability, and access restrictions. This kind of control, or encapsulation, helps to prevent unwanted circuit behaviours during simulation.
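The encapsulation described above can be illustrated with a short sketch. It is written in Python rather than in the MATLAB-style notation of the thesis, and the class name `PRBS`, the write-once rule, and the list of allowed orders are assumptions made only for illustration; the thesis does not state how its setters are implemented.

```python
class PRBS:
    """Toy circuit object with one validated, write-once property (illustrative only)."""

    ALLOWED_ORDERS = (3, 7, 15, 31)  # the example orders quoted in the text

    def __init__(self, order):
        self._order = None
        self.order = order  # routed through the setter below

    @property
    def order(self):
        """Getter: read access is unrestricted."""
        return self._order

    @order.setter
    def order(self, value):
        # Future changeability: the order is frozen once it has been defined.
        if self._order is not None:
            raise AttributeError("order is fixed after construction")
        # Acceptable inputs: reject anything outside the supported set.
        if value not in self.ALLOWED_ORDERS:
            raise ValueError(f"unsupported PRBS order: {value}")
        self._order = value


prbs = PRBS(7)
```

Here an invalid order is rejected at construction, and a later attempt to overwrite `prbs.order` raises an error instead of silently changing the circuit.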

We now present the list of major methods for modelling circuit objects in Table 3.2.


Table 3.2: Major methods of circuit objects

Name Description

Constructor This method is responsible for not only creating the object itself but also

performing its first-stage initialization. It takes the name of the circuit

object it is defined for. All essential properties, like inPort and object

specific parameters, are defined at this phase. It is invoked when the ob-

ject is declared in the top-level script. Pseudo-code for the constructor

is described below.

classdef cktObj < handle
    methods
        function obj = cktObj(varargin) % Constructor
            % - Define input ports
            % - Initialize object specific parameters
        end
    end
end

init()² This method is called at the initialization phase performed by the simulation coordinator. This phase is the second-stage initialization; it is added to reduce the top-level scripting and thereby to facilitate a user-friendly coding style. At this phase, all remaining internal properties are defined so that the circuit object is ready to enter the processing phase. Certain internal properties often depend on the initial output states of the inPort . Hence, the method must be invoked only after the init() of the inPort has completed. If the circuit object has multiple input sources, its init() can be called either once or multiple times by the corresponding methods of the chosen input sources. Pseudo-code for the init() is described below.

function init() % Initializer

- Initialize any remaining undefined property

- Calculate initial output (at t = 0)

end

² Method names end with parentheses.




receive() This method is used for receiving processed information from the inPort object. Every time the inPort has completed its processing, the method is invoked to append the processed information to the previously received information in the properties lastIn <Time> and lastIn <State>. Hence, calling receive() repeatedly makes these properties grow without bound, which may lead to higher memory usage and lower computational speed, unless the vectors are trimmed through processing at the same rate as they grow. Pseudo-code for the receive() is described below.

function receive() % Input receiver

- Append to the previously stored input information

end

isComplete() This method is used for providing feedback to the system coordinator about the simulation status of the circuit object. It returns logical true if the signal time of the circuit object has reached or exceeded the simulation stop time, tStop ; otherwise, it returns false to indicate that processing is incomplete. Pseudo-code for the isComplete() is described below.

function isComplete() % Completeness answerer

return max(last <Time>) ≥ evtHost .tStop

end

Note that the completion status is false from the initial stage until the processing is completed, and true from then on. The behaviour thus resembles a unit step function, which is well suited to computation schemes based on branch prediction.




process() The core functionality of the OO simulation is performed by this method. It is invoked by the simulation coordinator either directly or through helper circuit objects (with state holding capability). Holding on to a state becomes necessary when the simulation process for the object cannot be performed due to the lack of available input information. The helper circuit object with state holding capability is developed in order to reduce repetitive coding. The algorithm specific to the circuit object is described inside the method process(). Once the method is completed, the processed information is saved in the fields lastOut <Time> and lastOut <State>. Pseudo-code for the method process() is described below.

function process() % Output calculator

- Identify next time event

if not enough input information received

- Hold on to the state

return

end

- Process input information

- Save the processed information

- Discard unnecessary input information

- Notify about process completion

end

As can be seen, the method process() trims off unnecessary input information from the properties lastIn <Time> and lastIn <State>. This keeps their sizes within manageable limits and avoids the potential slowdown of the simulation highlighted earlier. Once the method process() is completed, it can either return to the main routine run() or initiate the process() methods of other circuit objects.

End of Table
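The interplay of the methods in Table 3.2 can be sketched in a few lines. The following is a simplified Python illustration, not the thesis's MATLAB implementation: the class names `CircuitObject`, `ToggleSource`, and `Inverter`, the `listeners` list, and the `compute()` hook are hypothetical, the init() phase is replaced by placeholder outputs at t = 0, and `t_stop` stands in for the stop time held by the event host.

```python
class CircuitObject:
    """Minimal circuit object: buffers input events and emits one output per process()."""

    def __init__(self, in_port=None, t_stop=3.0):
        self.in_port = in_port                      # upstream object, fixed after construction
        self.t_stop = t_stop                        # stands in for the event host's stop time
        self.last_in_time, self.last_in_state = [], []
        self.last_out_time, self.last_out_state = [0.0], [0]   # placeholder output at t = 0
        self.listeners = []                         # downstream objects to notify
        if in_port is not None:
            in_port.listeners.append(self)

    def receive(self):
        # Append the upstream packet to whatever is still unprocessed.
        self.last_in_time += self.in_port.last_out_time
        self.last_in_state += self.in_port.last_out_state

    def is_complete(self):
        return max(self.last_out_time) >= self.t_stop

    def compute(self, state):
        return state                                # object-specific algorithm goes here

    def notify(self):
        for obj in self.listeners:
            obj.receive()

    def process(self):
        if not self.last_in_time:
            return False                            # hold: not enough input information
        t = self.last_in_time.pop(0)                # consuming the packet also trims the buffer
        s = self.last_in_state.pop(0)
        self.last_out_time, self.last_out_state = [t], [self.compute(s)]
        self.notify()
        return True


class ToggleSource(CircuitObject):
    """Input-less object that toggles its state once per period."""

    def __init__(self, period, t_stop):
        super().__init__(None, t_stop)
        self.period = period

    def process(self):
        if self.is_complete():
            return False
        t = self.last_out_time[-1] + self.period
        s = 1 - self.last_out_state[-1]
        self.last_out_time, self.last_out_state = [t], [s]
        self.notify()
        return True


class Inverter(CircuitObject):
    def compute(self, state):
        return 1 - state


src = ToggleSource(period=1.0, t_stop=3.0)
inv = Inverter(src, t_stop=3.0)
log = []
while not (src.is_complete() and inv.is_complete()):
    for obj in (inv, src):                          # deliberately listed "backwards"
        if obj.process() and obj is inv:
            log.append((inv.last_out_time[0], inv.last_out_state[0]))
```

Because the inverter is listed before its source, its first process() call holds; afterwards it consumes exactly one packet per call, so its input buffers never grow beyond one entry.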

As with the properties, Table 3.2 only includes the major methods of a circuit object. These major methods are relevant to simulation interactions. As can be observed from the table, the Constructor and init() are each called once for initialization purposes during the simulation. Because of that, their contributions to the simulation run time are mostly negligible and hence their efficiency is irrelevant to simulation speed. The other three methods, receive(), isComplete(), and process(), are called numerous times. The simulation length, set by the tStop , defines how many times these methods are called. Hence, simulation speed is directly affected by the efficiencies of these three methods, because the task of the top-level simulation coordinator (which is function run(), as described by Algorithm 3.1) has been significantly simplified. Among the three methods, the process() is the most computationally intensive, and the overall elapsed time is mostly dominated by the process() methods of all circuit objects.

Besides these major methods, a number of other methods can be defined for various purposes. For instance, if saving the output of a circuit object is enabled, the output can be displayed in a graphical window using the method plot(). In this case, the plot() defined specifically for the circuit object overloads the built-in plot() function.

3.2.2 Processed Data Formats of Circuit Objects

One of the major concerns in circuit simulations is the co-simulation of the continuous time circuit

components with discrete time circuit components in event-driven mode. This section discusses the

format of the processed data, which can be used to establish the interactions among such circuit objects

(shown as 3D cubic box in Figure 3.4). Figure 3.5 presents an equivalent relationship between discrete

time and continuous time circuit objects for OO simulation purposes. Details about each processed data

format are presented in the following.

Discrete Time Objects

As explained in Section 2.4, the output of a discrete time component comprises discrete output states at their corresponding discrete time events. Accordingly, the top part of Figure 3.5 presents how the output of a discrete time component can be presented as two equally sized sequences: a time sequence, $T_D = (t_i)_{i=0}^{N}$, where $t_0 = 0$, and a state sequence, $S_D = (s_i)_{i=0}^{N}$. At each time event, $t_i$, the discrete time component generates a discrete state, $s_i$, and its initial state, $s_0$, is defined at the time event, $t_0$. In the proposed OO simulation scheme, the state sequence, $S_D$, is designed such that consecutive states are usually not allowed to be equal, $s_i \neq s_{i-1}$, where $i = 1, 2, \ldots, N$. This is to minimize unnecessary object-to-object interactions and achieve the speed advantages of event-driven simulation.
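As a small illustration of this format, the sketch below reduces a regularly sampled state sequence to the event form, keeping an entry only where the state changes. It is a Python sketch written for this explanation; the function name `to_events` is hypothetical and not part of the thesis code.

```python
def to_events(times, states):
    """Keep only the samples at which the state differs from the previous kept state."""
    ev_t, ev_s = [times[0]], [states[0]]      # the initial state s0 at t0 is always kept
    for t, s in zip(times[1:], states[1:]):
        if s != ev_s[-1]:                     # enforce s_i != s_(i-1)
            ev_t.append(t)
            ev_s.append(s)
    return ev_t, ev_s


T_D, S_D = to_events([0, 1, 2, 3, 4, 5], [0, 0, 1, 1, 1, 0])
```

Six samples collapse to three events, so a downstream object is notified three times instead of six.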

For Discrete Time Objects

Time, $T_D = (0, t_1, t_2, \ldots, t_i, \ldots, t_N)$

State, $S_D = (s_0, s_1, s_2, \ldots, s_i, \ldots, s_N)$

For Continuous Time Objects

Time, $T_C = \left(0, (t_{k,1})_{k=1}^{K_1}, (t_{k,2})_{k=1}^{K_2}, \ldots, (t_{k,i})_{k=1}^{K_i}, \ldots, (t_{k,N})_{k=1}^{K_N}\right)$

Waveform, $W_C = \left(w_0, (w_{k,1})_{k=1}^{K_1}, (w_{k,2})_{k=1}^{K_2}, \ldots, \underbrace{(w_{k,i})_{k=1}^{K_i}}_{i\text{-th wavelet, } w_i}, \ldots, (w_{k,N})_{k=1}^{K_N}\right)$

here, $(t_{k,i})_{k=1}^{K_i} = (t_{1,i}, t_{2,i}, \ldots, t_{K_i,i})$, $t_{k,i} \in (t_{i-1}, t_i]$, $t_{k,i} = t_{k-1,i} + \underbrace{\Delta t_{k,i}}_{\text{time step}}$

Figure 3.5: Processed data format comparison for the cases of discrete time and continuous time objects


Regarding the time sequence, $T_D$, all consecutive time events, $t_i$, are generated in such a way that they are strictly increasing, $0 < t_1 < t_2 < \cdots < t_N$, during the simulation. This is in contrast with the situation in conventional time-step based simulation, where, if the operation of any discrete time component depends on the transition points of a continuous time component as an input source, the simulator often has to move back and forth along the time axis to find the transition points, which causes certain calculations to be performed repetitively (see [1] for details). In the proposed scheme, the increasing order of the time sequence elements, $t_i - t_{i-1} > 0$ for $i = 1, \ldots, N$, can be maintained throughout the simulation, because circuit objects can be at different signal times simultaneously at any given simulation time.

Continuous Time Objects

For continuous time components, the continuous output waveform needs to be represented with multiple discrete time outputs spaced at reasonable time steps. Continuous-like discrete time output has a number of applications in transceiver simulations, such as eye diagram generation, jitter measurement, and asynchronous circuit block simulation. In order to support such continuous-like outputs in event-driven simulation mode, a continuous time waveform, $W_C$, recorded for an interval, $[0, t_N]$, can be visualized as multiple wavelets, which are recorded for smaller intervals, $0, (0, t_1], (t_1, t_2], \ldots, (t_{N-1}, t_N]$. Figure 3.5 shows how such continuous time wavelets with relevant intervals can be aligned with their respective discrete time outputs. As in the case of discrete time objects, the initial output, $w_0$, for continuous time objects, recorded at $t = 0$, is a scalar. For the rest of the cases, any wavelet, $w_i$, which itself is considered as a sub-sequence, $(w_{k,i})_{k=1}^{K_i}$, recorded at the corresponding time sub-sequence, $(t_{k,i})_{k=1}^{K_i}$, is comparable to the respective discrete output event, $s_i$, at $t_i$. Discrete time points for any $i$-th wavelet, $t_{k,i}$, are selected from the respective interval, $(t_{i-1}, t_i]$. Inside the time sub-sequence, each element, $t_{k,i}$, is incremented from the previous element, $t_{k-1,i}$, by its respective time step, $\Delta t_{k,i}$.

Depending on the simulation requirements, time steps can be picked as constant or variable. A constant time step can be useful for generating an eye diagram, because the generation process involves sampling at a fixed time step. The time step, $\Delta t_{k,i}$, can be kept constant within a time sub-sequence, but might need to be varied over the entire simulation period, because not all time intervals, $(t_{i-1}, t_i]$, are exact multiples of the initially chosen time step. A variable time step can be useful for detecting transition points, such as zero-crossings, to improve simulation speed. Regardless, all time points in $(t_{k,i})_{k=1}^{K_i}$ must be selected within the given interval, $(t_{i-1}, t_i]$, to avoid non-causality effects. This segmentation process enables running continuous time components in event-driven mode.
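One possible way to pick the time points of a wavelet is sketched below in Python. The function name `wavelet_times` and the policy of shrinking the step so that the last point lands exactly on $t_i$ are assumptions for illustration; the thesis only requires that all points lie in $(t_{i-1}, t_i]$.

```python
import math


def wavelet_times(t_prev, t_i, dt):
    """Return time points in (t_prev, t_i] with a constant step of at most dt.

    The step is reduced slightly when the interval is not an exact multiple
    of dt, so that the final point coincides with the event time t_i.
    """
    if t_i <= t_prev or dt <= 0:
        raise ValueError("need t_i > t_prev and dt > 0")
    # Number of steps; the small tolerance guards against rounding in the ratio.
    k = max(1, math.ceil((t_i - t_prev) / dt - 1e-12))
    step = (t_i - t_prev) / k
    # t_prev itself is excluded (open end); t_i is included (closed end).
    return [t_prev + (j + 1) * step for j in range(k - 1)] + [t_i]


first = wavelet_times(0.0, 1.0, 0.3)    # interval is not a multiple of 0.3
second = wavelet_times(1.0, 2.0, 0.5)   # interval is an exact multiple of 0.5
```

In the first call the requested step of 0.3 is reduced to 0.25, which is the kind of per-interval adjustment described above; in the second call the step is used unchanged.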

3.3 Performance Evaluation of OO Simulation in Case Studies

We evaluate the performance of OO simulation through examples in this section. Section 3.3.1 studies the simulation steps in detail for a simple example case. Section 3.3.2 analyzes this example case in terms of the order in which circuit objects are processed, for various combinations. Section 3.3.3 discusses the situation in which the system has feedback. Finally, Section 3.3.4 explains how the proposed OO simulation can be implemented in a parallel processing environment.


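[Figure 3.6(a), block-level schematic: Clock and PRBS (forming the random bit-stream generator on the transmitter side), followed by the Channel and the Slicer (receiver side), with signals VClk, VRBG, VCh, and VRx.]

[Figure 3.6(b), output waveforms with added markings: VClk, VRBG, VCh, and VRx plotted against signal time, t. Time events t1 to t5 and t1a to t5a are marked, together with the simulation times (1, 2, 3, . . .) at which the outputs are generated.]

Figure 3.6: Simulation test case study for OO simulation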


3.3.1 Example Case Study

Figure 3.6 shows the block-level testbench of a simple transceiver circuit used for the evaluation and its corresponding output waveforms generated from the simulation, with added markings for future reference. The transceiver employs three major blocks: a random bit-stream generator (RBG) (at the transmitter side), a data transmission channel, and a slicer (at the receiver side). The RBG consists of a PRBS generator fed by a clock, producing a synchronous binary output. In the waveform plot, the horizontal axes for all cases represent time, t. The second plot from the top shows a sample output waveform from the RBG, VRBG, which is generated at the transitions (both rising and falling) of the clock, VClk. The third plot depicts the output waveform of the channel, VCh, calculated at a sample rate much higher than the clock frequency to demonstrate its continuous nature. The task of the slicer is to produce binary data corresponding to its continuous input. The fourth plot shows the binary output of the slicer, VRx.

During the OO simulation, all major blocks are modelled as independent objects. Table 3.3 presents

object-specific properties, which are required during the object construction.

Table 3.3: Description of object-specific properties for the selected object-oriented simulation case

Object Property Value

RBG objTyp It is defined as ObjTyp.discrete by default, since the rep-

resentative component is of discrete type.

inPort It is kept empty, [], since the component has no input source.

clkPeriod It defines the period of the internal clock. Since the PRBS flip-flops are configured to operate on both clock edges, it is set to twice the UI.

prbsState It defines the output binary states of the PRBS flip-flops. It is

actually a vector and its length is determined by the number

of the flip-flops. Elements in the vector are updated at every

clock phase based on the PRBS polynomial expression.

Channel objTyp It is defined as ObjTyp.continuous , because slicer operation

requires continuous time wavelets.

inPort It contains the handle information related to the object RBG.

modelInfo It is utilized to describe the channel model. Its content depends on how the channel is modelled for simulation. For instance, it contains step response amplitude and timing information, if the object is modelled based on step responses.




Slicer objTyp It is defined as ObjTyp.discrete , because the slicer output contains only the transition events of the channel output.

inPort It contains the handle information of the object channel.

threshold It defines the assigned threshold to make the binary decision.

Here, its value is set at the mid-point of the VClk amplitude

range.

End of Table

The above table does not cover certain properties, such as evtHost , lastIn <Time>, and lastIn <State>, which are discussed earlier in Table 3.1. Simulation-specific methods (except isComplete()) for all objects are presented as follows.

Table 3.4: Simulation-specific methods of all objects for the selected object-oriented simulation case

Object Method Routine

RBG init() This method defines the properties lastOut <Time> and

lastOut <State> to represent its initial output.

receive() Since the object does not have any input source, this method

does not exist.

process() This method generates the output: it calculates the PRBS output with respect to the clock transitions. Its routine can be described as follows.

function process()

while no PRBS transition is detected

- Use clkPeriod to define the next clock ...

transition

- Perform the PRBS operation

- Update the prbsState

end

- Update lastOut <Time> and lastOut <State>

- Notify the Channel to receive its output

end




Channel init() It calculates the initial output at t = 0 based on the initial

state of the RBG.

receive() This method receives the processed output from the RBG ob-

ject and appends them to the properties lastIn <Time> and

lastIn <State>.

process() Its task is to calculate the continuous time output.

function process()

if no input information is available

- hold on to the state

return

end

- Calculate its continuous time output

- Store the output at lastOut <Time> and ...

lastOut <State>

- Notify the Slicer to receive its output

end

Slicer init() It defines the initial state of the slicer based on the initial

output of the channel.

receive() This method receives continuous time output from the channel

and then identifies the slicing location on the continuous time

waveform.

function receive()

- Receive continuous time output from channel

% Pre-process

- Detects the threshold-crossing locations, t i's

- Assigns output states for all t i's

- Append t i's and its states to lastOut <Time> ...

and lastOut <State>

end




process() Task of this method is to generate time events based on the

detected crossing locations.

function process()

if lastOut <Time> is empty

return

end

- Generate a time event using the first crossing

- Remove the first crossing

end

End of Table
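The RBG process() routine in Table 3.4 keeps clocking the PRBS until the output actually changes, so that only transitions become time events. A minimal Python sketch of that loop is given below. The choice of PRBS7 with the polynomial x^7 + x^6 + 1, the all-ones seed, the initial output of 1, and the names `RBG`, `ui`, and `tick` are assumptions for illustration, not details taken from the thesis.

```python
class RBG:
    """Random bit-stream generator that emits an event only on output transitions."""

    def __init__(self, ui=1.0, seed=0x7F):
        if seed & 0x7F == 0:
            raise ValueError("seed must be non-zero, otherwise the PRBS never toggles")
        self.ui = ui              # one bit per UI (clock period = 2 UI, both edges used)
        self.state = seed & 0x7F  # 7-bit PRBS register
        self.tick = 0             # number of clock transitions so far
        self.out = 1              # assumed initial output at t = 0

    def process(self):
        """Advance the clock until the PRBS output changes; return (time, state)."""
        while True:
            self.tick += 1
            # PRBS7 update, x^7 + x^6 + 1: XOR of the two most significant bits.
            newbit = ((self.state >> 6) ^ (self.state >> 5)) & 1
            self.state = ((self.state << 1) | newbit) & 0x7F
            if newbit != self.out:            # a transition: this is a time event
                self.out = newbit
                return self.tick * self.ui, newbit


rbg = RBG(ui=1.0)
events = [rbg.process() for _ in range(4)]
```

With these assumptions the first four events fall at t = 1, 7, 8, and 13 UI; the clock edges in between produce no event and therefore no interaction with the channel.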

As can be seen, the receive() methods of all objects perform the task of receiving input (if an input source is available), except for the slicer, in which case the method also performs some pre-processing to simplify the task of the slicer process(). It is natural for the slicer to save only the detected threshold-crossings instead of the entire continuous time waveform from the channel, since this reduces the memory requirement for slicer operation.
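The pre-processing step of the slicer can be sketched as follows. This Python function is an illustration only: the name `threshold_crossings`, the use of linear interpolation between samples, and the convention of passing the last sample of the previous wavelet are assumptions, since the thesis does not specify how the crossing instants are located.

```python
def threshold_crossings(times, wave, threshold, prev_t, prev_w):
    """Return (crossing_time, new_state) pairs for one received wavelet.

    prev_t, prev_w: last sample of the previous wavelet, so that a crossing
    between two wavelets is not missed.
    """
    events = []
    t0, w0 = prev_t, prev_w
    for t1, w1 in zip(times, wave):
        if (w0 > threshold) != (w1 > threshold):
            # Linear interpolation between the two samples around the crossing.
            frac = (threshold - w0) / (w1 - w0)
            events.append((t0 + frac * (t1 - t0), 1 if w1 > threshold else 0))
        t0, w0 = t1, w1
    return events


crossings = threshold_crossings(
    times=[1.0, 2.0, 3.0, 4.0],
    wave=[0.2, 0.8, 0.6, 0.0],
    threshold=0.5,
    prev_t=0.0,
    prev_w=0.0,
)
```

Only two (time, state) pairs are kept from the four received samples, which is the memory saving referred to above.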

Table 3.5 explains step-by-step how the simulation is conducted in OO mode. As mentioned in Section 3.1.2, the order in which the objects are listed does not matter for running OO simulation. Therefore, let us assume the objects are listed in the following order: a. slicer, b. channel, and c. RBG. Based on Algorithm 3.1, the simulation steps are described as follows:

Table 3.5: Explanation of simulation steps for the selected object-oriented simulation case (for simulation time 0 - 4)

Simulation Time Action

0³ At the initial step, all circuit objects are initialized. Since the RBG has no input source, its method init() performs the initialization independently first. Next, the channel is initialized by its init() based on the initial state of the RBG. Later, the slicer performs its initialization similarly, based on the initial output of the channel at t = 0.

³ Simulation time 0 represents the initialization phase.




1 This step proceeds as follows.

a Slicer: Since no input is available for processing, its process() declares a hold state.

b Channel: Since no input is available for processing, its process() declares a hold state.

c RBG: Its process() generates the first output at time event, $t_1$.

2 This step proceeds as follows.

a Slicer: Since no input is available for processing, its process() still remains at the hold state.

b Channel: Its process() produces the continuous time output sequence for $(t_{k,1})_{k=1}^{K_1}$, where $t_{k,1} \in (0, t_1]$.

c RBG: Its process() generates the output at time event, $t_2$.

3 This step proceeds as follows.

a Slicer: Its process() goes over the received waveform recorded at $(t_{k,1})_{k=1}^{K_1}$, but detects no transition at the defined threshold, and hence no output is generated.

b Channel: Its process() produces the continuous time output sequence for $(t_{k,2})_{k=1}^{K_2}$, where $t_{k,2} \in (t_1, t_2]$.

c RBG: Its process() generates the output at time event, $t_3$.

4 This step proceeds as follows.

a Slicer: Its process() detects the first transition event at $t_{1a}$.

b Channel: Its process() produces the continuous time output sequence for $(t_{k,3})_{k=1}^{K_3}$, where $t_{k,3} \in (t_2, t_3]$.

c RBG: Its process() generates the output at time event, $t_4$.

End of Table

As can be observed, at simulation time 4 all circuit objects are processing their received input information and generating outputs. This pattern of processing repeats for all circuit objects at all future simulation times, 5, 6, 7, . . . , until the process() methods of all the circuit objects stop processing. When a circuit object has processed up to the end of the signal time, it stops processing; this scenario is similar to that of holding states. In this case, the RBG finishes processing first, then the channel, and lastly the slicer. When all process() methods stop processing, the OO simulator terminates completely.

In addition to the previous observation, it is worth mentioning that time events like $t_{2a}$ generated by the slicer occur slightly before the time event $t_3$ generated by the RBG along the signal time axis. In a conventional AMS simulation scheme, the simulator finds the time event $t_{2a}$ by iteratively guessing multiple time points around it on the signal time axis. During this search, the simulator repeatedly generates and discards other time events, such as $t_3$, which can lead to long simulation times. In OO simulation mode, the time event $t_{2a}$ is detected much later than the time event $t_3$ on the simulation time axis. This is possible because it is assumed here that the event $t_{2a}$ does not shift the event $t_3$ along the signal time axis. In essence, once events are generated by a circuit object, they are not discarded; instead, how far each circuit object can progress is controlled. Removing the discarding policy, through the signal time flexibility of circuit objects at the individual level, helps the OO simulator to achieve higher computational efficiency and thereby increased simulation speed.

Here, the example case is chosen to be simple for convenience of explanation. Realistic examples have complexities associated with branching and feedback loops, which lead to differing processing frequencies among objects. In those cases, certain circuit objects often need to be in hold states at intermediate simulation steps. According to the pseudo-code (presented in Table 3.2), the method process() of any circuit object wastes negligible time whenever the circuit object enters a hold state. Since the process() has the dominant effect on simulation time, additional hold states would not noticeably prolong the overall simulation time.

3.3.2 Object Order Sensitivity

This section analyzes the example described earlier in terms of its sensitivity to the order in which circuit objects are listed in the top-level simulation coordinator. Figure 3.7 depicts three possible orders in which the circuit objects from the example case can be listed, and their effects on running simulations for N transitions. In each figure, the horizontal axis represents the simulation time progressing from left to right. The vertical axis represents the order in which each circuit object is activated. Combining the two axes forms a matrix, in which each cell represents a time event, $t_i$, (or a time sequence, $(t_{k,i})_{k=1}^{K_i}$, where $t_{k,i} \in (t_{i-1}, t_i]$, for continuous time component representation), generated by the corresponding circuit object. Here, X represents a situation in which no output is generated, but the circuit object still has to spend time processing its inputs. On the other hand, H and C represent hold and completed states respectively, in which negligible time is spent.

The Case 1 scenario (shown in Figure 3.7a) is re-drawn from the object sequence used for describing the simulation steps in Table 3.5, where the object activation order is: a. slicer, b. channel, and c. RBG. Because of the activation order, the channel has to be held once and the slicer has to be held twice due to the lack of available input information. Afterward, the channel and the slicer never have to be held, because both objects have access to sufficient input information. In Case 2, the hold state of the slicer at simulation time 2 can be avoided, since the input information at $(t_{k,1})_{k=1}^{K_1}$ is available to generate the initial slicer state, X. This is because the channel is activated before the slicer. In Case 3, the RBG is placed first, followed by the channel and the slicer. This eliminates all hold states in this example case.
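The hold counts of the three cases can be reproduced with a small bookkeeping model, sketched below in Python. It is an abstraction written for this explanation, not the thesis's simulator: each object is reduced to a counter of waiting input packets, every process() call consumes one packet and passes one downstream, and the function name `run_chain` is hypothetical.

```python
def run_chain(order, n_events):
    """Return (simulation steps, hold counts) for RBG -> Channel -> Slicer."""
    upstream = {"RBG": None, "Channel": "RBG", "Slicer": "Channel"}
    downstream = {"RBG": "Channel", "Channel": "Slicer", "Slicer": None}
    queue = {name: 0 for name in order}   # packets waiting at each object's input
    done = {name: 0 for name in order}    # packets processed so far
    holds = {name: 0 for name in order}
    steps = 0
    while any(done[name] < n_events for name in order):
        steps += 1
        for name in order:                # activation order within one step
            if done[name] == n_events:
                continue                  # completed state (C)
            if upstream[name] is not None:
                if queue[name] == 0:
                    holds[name] += 1      # hold state (H)
                    continue
                queue[name] -= 1
            done[name] += 1
            if downstream[name] is not None:
                queue[downstream[name]] += 1
    return steps, holds


N = 5
case1 = run_chain(["Slicer", "Channel", "RBG"], N)
case2 = run_chain(["Channel", "Slicer", "RBG"], N)
case3 = run_chain(["RBG", "Channel", "Slicer"], N)
```

Under these assumptions the three orders need N + 2, N + 1, and N simulation steps respectively, and the hold counts do not grow with N.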


(a) Case 1 scenario (example case)

Simulation time | 1 | 2 | 3 | 4 | ··· | N | N+1 | N+2
a. Slicer | H | H | X | $t_{1a}$ | ··· | $t_{(N-3)a}$ | $t_{(N-2)a}$ | $t_{(N-1)a}$
b. Channel | H | $(t_{k,1})_{k=1}^{K_1}$ | $(t_{k,2})_{k=1}^{K_2}$ | $(t_{k,3})_{k=1}^{K_3}$ | ··· | $(t_{k,N-1})_{k=1}^{K_{N-1}}$ | $(t_{k,N})_{k=1}^{K_N}$ | C
c. RBG | $t_1$ | $t_2$ | $t_3$ | $t_4$ | ··· | $t_N$ | C | C

(b) Case 2 scenario

Simulation time | 1 | 2 | 3 | 4 | ··· | N | N+1
a. Channel | H | $(t_{k,1})_{k=1}^{K_1}$ | $(t_{k,2})_{k=1}^{K_2}$ | $(t_{k,3})_{k=1}^{K_3}$ | ··· | $(t_{k,N-1})_{k=1}^{K_{N-1}}$ | $(t_{k,N})_{k=1}^{K_N}$
b. Slicer | H | X | $t_{1a}$ | $t_{2a}$ | ··· | $t_{(N-2)a}$ | $t_{(N-1)a}$
c. RBG | $t_1$ | $t_2$ | $t_3$ | $t_4$ | ··· | $t_N$ | C

(c) Case 3 scenario

Simulation time | 1 | 2 | 3 | 4 | ··· | N
a. RBG | $t_1$ | $t_2$ | $t_3$ | $t_4$ | ··· | $t_N$
b. Channel | $(t_{k,1})_{k=1}^{K_1}$ | $(t_{k,2})_{k=1}^{K_2}$ | $(t_{k,3})_{k=1}^{K_3}$ | $(t_{k,4})_{k=1}^{K_4}$ | ··· | $(t_{k,N})_{k=1}^{K_N}$
c. Slicer | X | $t_{1a}$ | $t_{2a}$ | $t_{3a}$ | ··· | $t_{(N-1)a}$

Legend: H: Hold State, C: Completed, X: No Output

Figure 3.7: Effect of circuit object placement order in activation list for OO simulation

In the example case, the clock is cascaded with the circuit object PRBS as a part of the circuit object RBG. With this integration, circuit objects are held only as a result of their placement positions. If the clock is treated as an independent circuit object, additional hold states will be required. The number of additional hold states depends on the maximum number of consecutive identical digits (CID) of the PRBS. Once the PRBS has transmitted its maximum CID, circuit objects later in the chain no longer enter hold states. The only way to increase the number of hold states further is to keep increasing the CID, which reduces the number of generated events. Overall, the number of hold states is independent of the length of the simulation.
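The maximum CID of a PRBS is bounded by its order, which is why the number of additional hold states stays bounded. The Python sketch below checks this for one case; the choice of PRBS7 with the polynomial x^7 + x^6 + 1, the all-ones seed, and the function names are assumptions for illustration only.

```python
def prbs7_bits(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1) from a non-zero seed."""
    state = seed & 0x7F
    bits = []
    for _ in range(n_bits):
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        bits.append(newbit)
    return bits


def max_cid(bits):
    """Length of the longest run of consecutive identical digits."""
    longest = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


PERIOD = 2**7 - 1                      # 127 bits for a maximal-length 7-bit register
bits = prbs7_bits(2 * PERIOD)          # two periods, so wrap-around runs are covered
longest_run = max_cid(bits)
```

The longest run equals the PRBS order (7 here), however long the simulation runs, because the sequence simply repeats every 127 bits.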

3.3.3 Feedback Loop Situation

This section analyzes the circumstances in which feedback loops exist in the OO simulation environment. The feedback loop is a commonly used structure in clock synchronization as well as in equalizer coefficient adaptation schemes. Simulating feedback loops is important for various purposes, such as studying the top-level functional accuracy, the impact on the neighbouring circuitry, and the feedback loop stability during transients. Because the OO simulation scheme is primarily developed with a focus on feed-forward architectures, setting up the feedback loop simulation depends strongly on the nature of the loop delay.

[Figure: Signal In enters C1, whose output feeds C2, which produces Signal Out; C3 sits in the feedback path.]

Figure 3.8: Schematic of a system with a feedback loop

Figure 3.8 provides a schematic representation of a typical feedback system, which comprises three major components: C1, C2, and C3. The input signal is processed by C1 based on the feedback from C3. The output of C1 is processed by C2 to generate the output of the system, which in turn is fed to C3. We use this example here to study its possible implementations and to evaluate the situations that can arise under the OO simulation environment. For convenience of explanation, all blocks are considered to be discrete time.

Dealing with Components Through Object Ordering

One approach is to deal with the components employed in the feedback system through object ordering. Section 3.3.2 shows that object ordering is not a concern for systems that involve a feed-forward architecture. However, if a system with a feedback architecture is handled without modification for OO simulation, the number of hold states increases. Increasing the number of hold states leads to accumulation of input information awaiting processing and thereby demands more memory. The effect on the number of hold states during OO simulation of the system with a feedback loop (shown in Figure 3.8) is illustrated in Figure 3.9.


(a) Case 1 scenario (preferred)

Simulation time | N | N+1 | N+2
C1 | $t_i$ | $t_{i+1}$ | $t_{i+2}$
C2 | $t_i$ | $t_{i+1}$ | $t_{i+2}$
C3 | $t_i$ | $t_{i+1}$ | $t_{i+2}$

(b) Case 2 scenario (not preferred)

Simulation time runs from N to N+8; the events $t_i$, $t_{i+1}$, $t_{i+2}$ of C3, C2, and C1 (listed in this order) are spread sparsely over these steps.

Figure 3.9: Effects of object placement ordering in OO simulation for a system with a feedback loop

The above figure presents two possible scenarios of object placement ordering during the simulation of the feedback system. In the ordering shown in Figure 3.9a, all information transactions occur at the right time and hence no hold state is visible in the sub-figure. Once C1 has processed the time event $t_i$, C2 can start its processing for $t_i$, and upon receiving the output at $t_i$ from C2, C3 can process its output at $t_i$. This situation repeats in the subsequent simulation steps. If the object ordering is reversed (first C3, then C2, and last C1), the information transactions become sparse along the simulation step axis, as demonstrated in Figure 3.9b. Hence, when dealing with a feedback system, the Case 1 object ordering is preferred over that of Case 2, and the object ordering should be enforced during the initial phase. It is worth mentioning that all components in the feedback system must produce output events at every simulation step, even when there is no output transition; otherwise, the feedback system cannot be simulated in the OO environment.
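The effect of the two orderings can be illustrated with a simple token-passing model, sketched below in Python. It is an assumption-laden abstraction rather than the thesis's simulator: each component needs exactly one input packet per event, C1 starts with the initial (t = 0) output of C3, the external input signal is always available, and the name `run_loop` is hypothetical. The exact spacing shown in Figure 3.9b may depend on details that this model does not capture.

```python
def run_loop(order, n_events):
    """Return (simulation steps, hold counts) for the loop C1 -> C2 -> C3 -> C1."""
    feeds = {"C1": "C2", "C2": "C3", "C3": "C1"}   # receiver of each object's output
    queue = {"C1": 1, "C2": 0, "C3": 0}            # C1 holds C3's initial output
    done = {name: 0 for name in order}
    holds = {name: 0 for name in order}
    steps = 0
    while any(done[name] < n_events for name in order):
        steps += 1
        for name in order:
            if done[name] == n_events:
                continue                           # completed
            if queue[name] == 0:
                holds[name] += 1                   # hold: waiting for the loop
                continue
            queue[name] -= 1
            done[name] += 1
            queue[feeds[name]] += 1
    return steps, holds


forward_steps, forward_holds = run_loop(["C1", "C2", "C3"], 4)
reverse_steps, reverse_holds = run_loop(["C3", "C2", "C1"], 4)
```

In this model the forward ordering completes one event per simulation step without any hold state, whereas the reverse ordering needs roughly twice as many steps and holds each object repeatedly.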

Integrating into a Single Object

Another approach to implement the example feedback system is to integrate all the components into a single object. If the updates due to the feedback loop take place in continuous time, as in CDR systems, it is preferable to describe the entire system inside one circuit object. Under such circumstances, the representative object can be built in a bottom-up approach.

Figure 3.10 shows the hierarchy extraction procedure for feedback system modelling and Algorithm

3.3 shows how to capture the bottom-up hierarchy at the code level. As can be observed from the

abstraction process, feedback path gets hidden from the external OO simulator. Hence, hold state

issues associated with object ordering can be avoided. This way of abstraction process also enhances the

coding level comprehensibility due to having a controlled and organized developments. However, because

of describing the entire feedback system using one circuit object, it is not possible to see the outputs at


[Figure 3.10 diagram: blocks C1, C2, and C3 between Signal In and Signal Out, grouped into the nested objects obj_C1, obj_C1C2, and obj_C1C2C3]

Figure 3.10: Hierarchical representation for a system with a feedback loop

Algorithm 3.3 Hierarchical representation template for OO simulation

    % In obj_C1.m file
    classdef obj_C1 < handle % Starting as root
        - properties and methods are described here
    end

    % In obj_C1C2.m file
    classdef obj_C1C2 < obj_C1 % Inheriting from obj_C1
        - properties and methods that are not described in obj_C1 go here
    end

    % In obj_C1C2C3.m file
    classdef obj_C1C2C3 < obj_C1C2 % Inheriting from obj_C1C2
        - properties and methods that are described neither in obj_C1
          nor in obj_C1C2 go here
    end


the intermediate stages, such as the output of C3 in Figure 3.10. To overcome this problem, a circuit object representing C3 can always be added at the output of the integrated object, obj_C1C2C3.

Handling at the Script Level

The third approach to dealing with the feedback loop is to handle it through top-level scripting. This scheme can be applied if the feedback loop operates in discrete time and updates with a long loop delay (comparable to multiple UIs). In this case, the feedback loop is first broken at a junction so that the modelled system behaves like a feed-forward system. The feedback-related calculations are then performed at the script level and applied during the feedback updating phase. This analysis scheme is particularly useful in the early phase of algorithm development.

Algorithm 3.4 depicts the pseudo-code of a script that can mimic the feedback loop. At the initial stage, the circuit system under test is defined without closing the feedback loop. The approach also requires defining the feedback update time, feedbackTime, and the simulation stop time, simStopTime, denoted here as $\Delta t$ and $T$ respectively. The two variables are related by $T \approx N \Delta t$, where $N \gg 1$ represents the number of feedback update intervals set for observation. The next step is to create a loop over the intervals $[0, \Delta t]$, $[\Delta t, 2\Delta t]$, $[2\Delta t, 3\Delta t]$, ..., $[(N-1)\Delta t, N\Delta t]$. Overlaps at the transition points $i\Delta t$, where $i = 1, 2, \ldots, (N-1)$, are acceptable, since the outputs of the individual simulations are not merged. During each loop interval, the feed-forward system is simulated and new feedback coefficients are then determined from the simulation outputs.

Algorithm 3.4 Pseudo-code script to deal with feedback system in OO simulation

    % Initialize the environment
    - Define the circuit system for test
    - Define simulation stop time => simStopTime
    - Define feedback update time => feedbackTime

    % Looping to mimic feedback
    t = 0;
    - Initialize feedback coefficient (at t = 0)
    while t < simStopTime
        t = t + feedbackTime;

        % Task during each loop
        - run simulation until t
        - Acquire simulation output
        - Update feedback coefficient
    end

It is possible to run the OO simulation over any such interval because the final states of all circuit objects from the immediately preceding simulation can be applied as the initial states of the current simulation. Only certain initial conditions related to the feedback loop need to be modified to reflect the new feedback coefficients. This scheme is not realistic for most time-step based simulations, mainly because continuing a simulation from previously saved results is not supported. Even if the feature is supported by certain time-step based simulators, the newly updated coefficients might not be introduced safely without causing numerical instability.
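
The script-level loop of Algorithm 3.4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the thesis implementation: the class `FeedForwardSystem`, its `run_until` method, the first-order toy dynamics, and the proportional coefficient update with `loop_gain` are all hypothetical stand-ins for the circuit system and the feedback calculation. What the sketch does preserve is the structure of the scheme: the loop is broken, the object keeps its final state between runs, and the coefficient is changed only between intervals.

```python
class FeedForwardSystem:
    """Hypothetical stand-in for the circuit system with the loop opened."""

    def __init__(self):
        self.time = 0.0         # current simulation time
        self.state = 0.0        # internal state carried from run to run
        self.coefficient = 0.0  # feedback coefficient applied at the break

    def run_until(self, stop_time, step=0.01):
        """Continue from the saved state up to stop_time and return the output."""
        while self.time < stop_time - 1e-12:
            # Toy first-order settling toward (1 - coefficient): an assumption.
            self.state += step * ((1.0 - self.coefficient) - self.state)
            self.time += step
        return self.state


def simulate_with_script_feedback(sim_stop_time, feedback_time, loop_gain=0.5):
    """Mimic the feedback loop at the script level, one interval at a time."""
    system = FeedForwardSystem()
    history = []
    t = 0.0
    while t < sim_stop_time - 1e-12:
        t += feedback_time
        output = system.run_until(t)             # run simulation until t
        system.coefficient = loop_gain * output  # update feedback coefficient
        history.append((t, output, system.coefficient))
    return history


if __name__ == "__main__":
    for t, output, coefficient in simulate_with_script_feedback(10.0, 1.0):
        print(f"t = {t:4.1f}  output = {output:.4f}  coefficient = {coefficient:.4f}")
```

Because each call to `run_until` resumes from the stored state, no waveform merging is needed across intervals, which mirrors the remark above about overlapping transition points.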


3.3.4 Incorporating Parallelism

Parallel computation platforms have become a de facto standard in recent years because of the number of arithmetic calculations they can perform per cycle. Incorporating parallelism into the proposed OO simulation scheme can therefore be beneficial on modern computational platforms. The ability to identify the inter-dependency among circuit components, together with the component-specific time axes, makes the OO simulator feasible to implement in a parallel computation environment. The parallel processing scheme for the OO simulator is explained below using Figures 3.11 and 3.12.

Figure 3.11 describes how to realize the parallel computation structure embedded inside the OO simulation scheme. The horizontal axis represents time and the vertical axis indicates the index of a processor core. P(·) symbolizes a process, a fragment of the software that can be run on a processor core. In the OO simulation case, a process can be the top-level script (referred to as P(S)) or any circuit object specific method. For this example, let us assume the circuit system consists of N circuit components referred to as C1, C2, C3, ..., CN, and that their processes are described as P(C1), P(C2), P(C3), ..., P(CN) respectively. Knowledge of how the circuit components are connected is not essential for conducting the OO simulation (as established in Section 3.3.2). Hence, the connectivity among the circuit components is not displayed here.

The figure shows two computational processing cases, serial and parallel, on the same time axis for any

[Figure 3.11 diagram: processor core index (vertical axis) versus time (horizontal axis). Serial processing: Core 1 runs P(S), P(C1), P(C2), P(C3), ..., P(CN) in sequence. Parallel processing: Core 1 runs P(S) and then P(C1), while Cores 2 to N run P(C2) to P(CN) at the same time.]

Figure 3.11: Serial processing to parallel processing conversion for OO simulation


[Figure 3.12 diagram: processor core index (Core 1 to Core 4) versus time. Core 1 runs P(S); the processes P(C1), P(C2), ..., P(CN) are then distributed across the four cores and evaluated in parallel.]

Figure 3.12: Parallel processing demonstration under restricted resource environment for OO simulation

given simulation step. The top part of the figure depicts the situation in which the OO simulation is evaluated using only one core, Core 1. The first time slot is used for the process associated with the top-level script, P(S), from which all processes related to circuit components, P(Ci), are launched. If the computation platform has at least as many cores as there are processes associated with circuit components (number of cores ≥ N), all processes P(Ci) can be launched at the same time once P(S) has been evaluated. The gray arrows in the bottom part of the figure show the parallelism achieved with N processing cores. As can be observed, when processing cores are abundant, the duration of a simulation step is limited only by the evaluation time of the longest process.

As the circuit system becomes larger, the number of processes associated with circuit components usually increases. The number of processes then becomes much larger than the number of processing cores available for parallel computation, which is the typical situation for current processors. Figure 3.12 depicts the case of a 4-core processor evaluating the previously described N-component circuit system, where it is assumed that $N \gg 4$. As can be observed from the figure, P(S) is performed first, and then all P(Ci)'s are evaluated using the 4 cores in parallel.
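
The restricted-resource case of Figure 3.12 can be sketched with a fixed-size worker pool. The sketch below is illustrative only: `process_component` is a hypothetical stand-in for P(Ci), the components are assumed to be independent within one simulation step, and a thread pool is used purely to keep the example self-contained. In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so a process pool or a compiled implementation would be needed to obtain an actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def process_component(index):
    """Hypothetical stand-in for P(Ci): one simulation step of component Ci."""
    workload = sum(k * k for k in range(1000 * index))  # dummy computation
    return index, workload


def run_simulation_step(num_components, num_cores=4):
    """P(S): launch every P(Ci) on a pool limited to num_cores workers."""
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        # map() returns results in component order even though the
        # evaluations overlap in time on the available workers.
        return list(pool.map(process_component, range(1, num_components + 1)))


if __name__ == "__main__":
    for index, workload in run_simulation_step(num_components=10):
        print(f"P(C{index}) finished, result = {workload}")
```

With more components than workers, the pool simply queues the remaining processes and starts each one as soon as a worker becomes free.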

3.3.5 Simulation Speed Performance

For the simulation performance analysis, we study a test case related to FFE operation, as shown in Figure 3.13. The transmitter side consists of an RBG (shown as source) followed by an FFE, triggered by a clock source (shown as TxClk). Once equalized by the FFE, the transmitted signal is sent through the attenuating channel (shown as Ch). The continuous time output of the channel is then analyzed at the scopes to generate the eye diagram and to measure jitter. The detailed modelling process of the FFE circuitry and its various settings are described in Section 4.1.

We implement this test case in C++ on a Linux computer with an Intel Core i7 processor. The source and the FFE, along with their clock, are implemented as one discrete time circuit object. The channel is implemented as a continuous time object, and its maximum output time step is set to 0.01 UI (at least 100 discrete time points per UI), because its output is fed to an eye scope that calculates the eye diagram with exactly 0.01 UI resolution. The output of the channel is also fed to a



[Figure 3.13 diagram: (a) Block diagram: Source, FFE (clocked by TxClk), Ch, Eye Scope, and Jitter Monitor. (b) Simulation speed result: simulation time versus number of bits on logarithmic axes (1k = 1000), with a trend line through the points 10k bits: 0.46 s, 100k bits: 4.40 s, 1,000k bits: 43.85 s, 10,000k bits: 440.88 s.]

Figure 3.13: Speed performance result for the OO simulation


jitter monitor block, whose task is to measure the total jitter present in the system. Jitter measurement is performed in real time by identifying the zero-crossing points.

Once an executable program is compiled from the C++ source, it is executed for different simulation lengths. Figure 3.13b plots the simulation run time against the number of transmitted bits on logarithmic axes. As the number of bits increases, the simulation time increases linearly at a rate of ∼44 s per million bits. Linear scaling is highly desirable in algorithmic performance, since it allows the run time to be predicted as the simulation length is modified. Table 3.6 shows the simulation time broken down at the individual circuit object level for the case of 10,000k bits. As can be seen, the most time-consuming component, the channel, takes about 93.6% of the entire simulation time. Because of this, the example case is not considered worth implementing in a parallel processing environment. However, if the system contains multiple components that are similar to the channel in terms of calculation effort, the simulation speed can be increased through parallelism.
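
The scaling claim can be checked with a few lines of arithmetic on the run times annotated in Figure 3.13b and the breakdown in Table 3.6. The helper names below are illustrative. The last function gives a simple bound, assuming the channel object cannot itself be split across cores: even with unlimited cores, one run cannot finish faster than its longest process.

```python
# Run times read from Figure 3.13b: number of bits -> seconds.
RUN_TIMES = {10_000: 0.46, 100_000: 4.40, 1_000_000: 43.85, 10_000_000: 440.88}

# Per-object times from Table 3.6 (10,000k-bit case), in seconds.
BREAKDOWN = {"Transmitter": 0.6, "Channel": 412.7, "Receiver": 27.5}


def seconds_per_million_bits(run_times):
    """Normalize each run time to a rate; a flat rate means linear scaling."""
    return {bits: seconds / bits * 1e6 for bits, seconds in run_times.items()}


def share_of_total(breakdown, component):
    """Fraction of the total simulation time spent in one component."""
    return breakdown[component] / sum(breakdown.values())


def best_case_parallel_speedup(breakdown):
    """Upper bound on speed-up when the longest process cannot be split."""
    return sum(breakdown.values()) / max(breakdown.values())


if __name__ == "__main__":
    for bits, rate in seconds_per_million_bits(RUN_TIMES).items():
        print(f"{bits:>10} bits: {rate:.2f} s per million bits")
    print(f"Channel share: {share_of_total(BREAKDOWN, 'Channel'):.3f}")
    print(f"Best-case speed-up: {best_case_parallel_speedup(BREAKDOWN):.3f}")
```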

Table 3.6: Simulation time breakdown for the case of 10,000k bits (where 1k = 1,000)

Component      Simulation Time
Transmitter    0.6 s
Channel        412.7 s
Receiver       27.5 s
Altogether     440.8 s

3.4 Summary

This chapter presents a novel scheme to simulate an AMS circuit system. The proposed scheme addresses the incompatibility of conventional event-driven simulators with asynchronous circuitry by incorporating continuous time output calculation for circuit objects. Continuous time output calculation is performed primarily in time-step based simulations, but a time-step based simulator has slow evaluation speed due to its inherent time axis unionization process. Hence, the proposed simulation scheme gives each circuit component independence in selecting its own time points during the evaluation process. This allows outputs to be calculated at any given time point whenever activity is detected. In order to implement the scheme, a relationship between discrete time and continuous time circuit component modelling has been established. Various studies have then been performed to analyze the effectiveness of the proposed simulation scheme. The concept of the proposed simulation scheme has been applied in modelling equalizers and CDR circuitry.

Chapter 4

Proposed Modelling for Equalizer Circuitry

This chapter discusses the proposed modelling concept for equalizer circuitry. As explained in Section 2.2.1, the task of an equalizer is to compensate for channel attenuation. The primary purpose of the proposed modelling is to speed up the simulation process while maintaining accuracy comparable to that of conventional SPICE simulators. Conducting simulations at higher speed and accuracy is required in order to perform verification analyses, such as generating BER contours (Section 2.3.1).

Three major types of equalizers are analyzed here: feed forward equalizer (FFE), continuous time linear equalizer (CTLE), and decision feedback equalizer (DFE). Figure 4.1 represents an example architecture consisting of these three types of equalizers: an FFE at the transmitter side, and a CTLE and a DFE at the receiver. The CTLE is implemented using passive resistive-capacitive circuit elements for continuous time operation, while the FFE and DFE operate in discrete time using local clock sources. Hence, the figure shows two clock signals, TxClk and RxClk, which are local to the transmitter and the receiver respectively, but have a uni-directional synchronous relationship. The figure shows the synchronous clock relationship only implicitly, because the clock recovery system is not discussed in this chapter.

The performance of these equalizers often suffers from unavoidable nonlinearities once they are implemented using real-life circuit elements. The nonlinear effects of the equalizers may arise from various


[Figure 4.1 diagram: Source, FFE, Ch, CTLE, DFE, and Sink connected in series, with the clock signals TxClk and RxClk linked by clock synchronization]

Figure 4.1: Architectural overview of typical channel equalization system


sources, such as nonlinear operation, finite bandwidth, mismatches, and process-voltage-temperature (PVT) variations of transistors and other devices. These nonlinearities limit the data communication speed through the transmission channel, but their effects are often not visible in the linear behavioural models of the respective equalizers. Hence, such nonlinearity must be captured as realistically as possible to increase the accuracy of transceiver performance verification analyses.

This chapter deals with how to capture nonlinear equalizer behaviour in the models. The proposed modelling schemes for the FFE, CTLE, and DFE are presented in Sections 4.1, 4.2, and 4.3 respectively. Each section begins by introducing an equivalent linear model for the corresponding equalizer, then discusses the circuit-level implementation, and finally explains the modelling procedure to capture the transistor-level nonlinearity. Each proposed model is evaluated by generating an eye diagram (or multiple eye diagrams, depending on the case), which is then overlaid on the one generated by Spectre for comparison.

4.1 Feed Forward Equalizer (FFE) Modelling

Like any feed-forward control system, an FFE equalizes the input signal directly using prior knowledge of the channel attenuation. An FFE can be designed to eliminate both pre-cursor and post-cursor ISI. The FFE, implemented at the transmitter side, receives its input signal from the data transmission source, which is synchronous to the local clock.

Figure 4.2 shows a block-level architecture of an FFE with $M$ pre-taps and $N$ post-taps. Signal In represents the input bit-stream that needs to be transmitted. During FFE operation, multiple delayed versions of the input bit-stream are combined using the defined tap weights, $w_i$, where $i = -M, \ldots, -1, 0, 1, \ldots, N$. The $z^{-1}$ block represents a delay that is usually set to 1 UI. The input signal and all the delayed signals are first multiplied by their respective tap weights and then summed to produce the equalized output.

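[Figure 4.2 diagram: Signal In passes through a chain of $z^{-1}$ delay blocks; the taps are weighted by $w_{-M}, \ldots, w_{-1}, w_0, w_1, \ldots, w_N$ and summed to form the Equalized Output]

Figure 4.2: Basic architecture of a symbol-spaced feed forward equalizer (FFE)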

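Based on this, the FFE transfer function in the z-domain, $F_{FFE}(z)$, can be written as follows:

$$F_{FFE}(z) = \Big( \underbrace{w_{-M} \cdot z^{M} + \cdots + w_{-1} \cdot z}_{M \text{ pre-taps}} + w_0 + \underbrace{w_1 \cdot z^{-1} + \cdots + w_N \cdot z^{-N}}_{N \text{ post-taps}} \Big) \cdot z^{-M} \tag{4.1}$$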
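
In the time domain, Equation 4.1 is a finite impulse response filter: after multiplying out the $z^{-M}$ factor, tap $w_i$ acts on the input delayed by $i + M$ UI. The following sketch shows this directly. The function name and the list layout of the weights are assumptions made for illustration, and the weights in the usage example are those quoted for the 3-tap test case in Section 4.1.3.

```python
def ffe_output(symbols, weights):
    """Symbol-spaced FFE as an FIR filter.

    weights is ordered [w_-M, ..., w_-1, w_0, w_1, ..., w_N]. Because of the
    z^-M latency in Equation 4.1, weights[k] multiplies the input delayed by
    k UI, so the main tap w_0 sees the input delayed by M UI.
    """
    output = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, weight in enumerate(weights):
            if n - k >= 0:  # samples before the start of the stream are zero
                acc += weight * symbols[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    taps = [-2 / 20, 15 / 20, -3 / 20]  # 1 pre-tap, main tap, 1 post-tap
    print(ffe_output([1, 0, 0, 0], taps))      # impulse input returns the taps
    print(ffe_output([1, 1, -1, -1, 1], taps))  # +1/-1 encoded bit-stream
```

Feeding a single unit sample through the filter returns the tap weights in order, which is a convenient sanity check for the indexing.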


4.1.1 FFE Implementation

The FFE operation is usually based on zero forcing equalization, the notion of which is to force the ISI to zero. Tap weights for the FFE are calculated from the pulse response of the intended channel. The pulse response can be collected from simulation or measured by applying a rectangular pulse of one-bit duration, $T_b$, to the channel. The calculation procedure for the FFE tap weights is explained below.

Figure 4.3 shows an example pulse response, $p(t)$, obtained by applying an input pulse, $\Pi(t)$. The dark colored stems superimposed on $p(t)$ mark the extracted cursors, $c_i$, where $i = \ldots, -1, 0, 1, 2, \ldots$.

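[Figure 4.3 diagram: an input pulse $\Pi(t)$ of unit height and duration $T_b$ applied to the channel, and the resulting pulse response $p(t)$ with peak $A_{peak}$, sampled at intervals of $T_b$ to give the cursors $\ldots, c_{-1}, c_0, c_1, c_2, \ldots$]

Figure 4.3: Cursor extraction from channel pulse response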

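We can represent the cursors $c_i$ as a column vector, $\vec{C}$, as follows,

$$\vec{C} = \big[\, \underbrace{c_{-\infty} \;\cdots\; c_{-1}}_{P_1 \text{ pre-cursor ISI}} \;\; c_0 \;\; \underbrace{c_1 \;\; c_2 \;\cdots\; c_{\infty}}_{P_2 \text{ post-cursor ISI}} \,\big]^T$$

In $\vec{C}$, $c_0$ is considered the main cursor, as it has the largest amplitude of all the cursors.

From Equation 4.1, the FFE tap weights can be extracted in vector format, $\vec{W}_{FFE}$, as follows,

$$\vec{W}_{FFE} = \big[\, w_{-M} \;\cdots\; w_{-1} \;\; w_0 \;\; w_1 \;\cdots\; w_N \,\big]^T$$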


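Applying the convolution between $\vec{C}$ and $\vec{W}_{FFE}$, the desired channel output response with peak amplitude $A_{peak}$ can be formulated as follows,

$$\vec{C} * \vec{W}_{FFE} = \big[\, \underbrace{0 \;\; 0 \;\cdots\; 0}_{M + P_1 \text{ zeros}} \;\; A_{peak} \;\; \underbrace{0 \;\; 0 \;\cdots\; 0}_{N + P_2 \text{ zeros}} \,\big]^T \tag{4.2}$$

where

$$\vec{C} * \vec{W}_{FFE} =
\begin{bmatrix}
c_{-\infty} &        &        &        &             \\
\vdots      & \ddots &        &        &             \\
c_{-1}      & \ddots & \ddots &        &             \\
c_{0}       & c_{-1} & \ddots & \ddots &             \\
c_{1}       & c_{0}  & c_{-1} & \ddots & c_{-\infty} \\
c_{2}       & c_{1}  & c_{0}  & \ddots & \vdots      \\
\vdots      & c_{2}  & c_{1}  & \ddots & c_{-1}      \\
c_{\infty}  & \vdots & c_{2}  & \ddots & c_{0}       \\
            & \ddots & \vdots & \ddots & c_{1}       \\
            &        & \ddots & \ddots & c_{2}       \\
            &        &        & \ddots & \vdots      \\
            &        &        &        & c_{\infty}
\end{bmatrix}
\cdot
\begin{bmatrix}
w_{-M} \\ \vdots \\ w_{-1} \\ w_0 \\ w_1 \\ \vdots \\ w_N
\end{bmatrix}$$

Here, the convolution matrix has $(P_1 + P_2 + M + N + 1)$ rows and $(M + N + 1)$ columns. Solving Equation 4.2 for $\vec{W}_{FFE}$ yields the FFE tap weights of the defined size. $A_{peak}$ is taken as 1 here for simplicity.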
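
One way to solve Equation 4.2 numerically is sketched below. Since the system has more rows than unknowns, the sketch solves it in the least-squares sense through the normal equations; this is an assumption about the solution method, which the text does not specify, and all function names and the cursor values in the usage example are hypothetical.

```python
def convolution_matrix(cursors, n_taps):
    """Matrix whose k-th column is the cursor vector shifted down by k rows."""
    n_rows = len(cursors) + n_taps - 1
    return [[cursors[r - k] if 0 <= r - k < len(cursors) else 0.0
             for k in range(n_taps)] for r in range(n_rows)]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        partial = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - partial) / aug[r][r]
    return x


def zero_forcing_taps(cursors, p1, m, n, a_peak=1.0):
    """Least-squares solution of Equation 4.2 (p1 pre-cursors, m pre-taps, n post-taps)."""
    n_taps = m + n + 1
    a = convolution_matrix(cursors, n_taps)
    target = [0.0] * len(a)
    target[m + p1] = a_peak  # M + P1 zeros precede the peak
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_taps)]
           for i in range(n_taps)]
    atb = [sum(row[i] * t for row, t in zip(a, target)) for i in range(n_taps)]
    return solve_linear(ata, atb)


def equalized_response(cursors, taps):
    """Convolution of the cursors with the tap weights."""
    return [sum(x * w for x, w in zip(row, taps))
            for row in convolution_matrix(cursors, len(taps))]


if __name__ == "__main__":
    cursors = [0.1, 1.0, 0.3]  # hypothetical c_-1, c_0, c_1
    taps = zero_forcing_taps(cursors, p1=1, m=1, n=1)
    print("taps:", [round(w, 4) for w in taps])
    print("equalized:", [round(v, 4) for v in equalized_response(cursors, taps)])
```

With a finite number of taps the residual ISI is reduced rather than removed exactly, which is why the equalized response in the example is close to, but not identical to, the ideal target vector.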

An example circuit of a 3-tap FFE implemented at the transmitter end is shown in Figure 4.4. The example circuit is a source series terminated (SST) FFE implemented for a single-ended data transmission application. The circuit consists of two major segments: the digital logic circuitry and the slices of the transmit driver. The digital logic circuitry encodes the data with delay elements, such as $z^{-1}$ and $z^{-2}$, and with the polarities of the tap weights, $\mathrm{sgn}(w_i)$. The task of the slices is to drive the encoded transmit signals with strengths representing the tap weight magnitudes, $|w_i|$. The widths of the PMOS and NMOS transistors, $W_{Pi}$ and $W_{Ni}$, are designed such that their resistances represent the respective tap weights. In order to minimize reflection through the channel, the net impedance looking from the output of the slices toward supply or ground should be set to the characteristic impedance of the channel. Even though the example is shown for the 3-tap case (involving 1 pre-tap and 1 post-tap), the number of taps can be extended to represent $M$ pre-taps and $N$ post-taps. (For the detailed design procedure of the transmit driver, refer to [35,36].)


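[Figure 4.4 diagram: Data In feeds the digital FFE logic circuitry, which produces $\mathrm{sgn}(w_{-1})$, $\mathrm{sgn}(w_0) \cdot z^{-1}$, and $\mathrm{sgn}(w_1) \cdot z^{-2}$; these drive the slices of the FFE transmit driver, annotated with $W_{P1} \propto |w_1|$, $W_{N1} \propto |w_1|$, and $R_{T1} \propto 1/|w_1|$, whose outputs combine into Equalized Out]

Figure 4.4: Circuit-level overview of a 3-tap source series terminated based single-ended FFE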

4.1.2 FFE Modelling for OO Simulation

In OO simulation, the FFE implemented at the transmitter end is treated as a discrete time object, ObjTyp.discrete. Algorithm 4.1 presents the FFE template object for running OO simulation. It has two input sources for operation, a clock and an RBG; both sources are of discrete type. The routines of its constructor and methods (init(), receive(), and process()) are programmed following the criteria described in Section 3.2.1.

In reality, a fabricated FFE behaves nonlinearly, and this has various undesirable effects, such as sampling threshold shift, increased jitter, and asymmetric signal transition shapes. These effects can originate from a wide variety of sources, such as the FFE implementation architecture, local clock jitter, and power supply noise. The primary focus of this section is how to capture the nonlinearity associated with the FFE implementation architecture.

Architectural nonlinearity in the example circuit (shown in Figure 4.4) is due to nonlinear transistor operation. Because the equivalent resistance across the drain-source region depends on the voltage difference, the tap weights realized by the equivalent resistances vary during FFE operation. Hence, no closed-form algebraic equation is available. To overcome this problem, a look-up table (LUT) based calculation scheme is proposed. In general, if an FFE has $n$ taps, it can have $2^n$

[Figure 4.5 diagram: a random binary bit-stream (... 0110100 ...) is mapped to 2-tap FFE symbolic states (..., −1, +3, +1, −3, ..., shown for 0110), which select among the 2-tap FFE output states $A_{+3}$, $A_{+1}$, $A_{-1}$, and $A_{-3}$]

Figure 4.5: Look-up table (LUT) based nonlinearity modelling for FFE


Algorithm 4.1 Modelling template of feed forward equalizer for running OO simulation

    classdef FFE < handle
        properties
            objTyp = ObjTyp.discrete   - Discrete object Type
            clkPort                    - Clock object information
            inPort                     - Input RBG object information
            % Other internal properties not shown here
        end

        methods
            % Constructor called from the top-level script
            function obj = FFE()
                - Construct the FFE object
                - Receive and verify all input information
            end

            % Method init() triggered by the input object inPort for initial processing
            function init(obj)
                - Define remaining uninitialized internal variables
                - Calculate output at time, t = 0
                - Notify its outputs receiving objects
            end

            % Method receive() triggered by input object inPort
            function receive(obj)
                - Collect the output from the inPort at t_i > 0
                - Append the collected information v(t_i) with previous information
                    - New collection v(t_i) ≠ v(t_{i-1}) and t_i > t_{i-1}
            end

            % Method process() triggered by clock object clkPort
            function process(obj)
                if processing is completed
                    return
                end
                - Determine the next transition, t_j
                if Max collected input information timing < t_j
                    - Hold on the state
                    return
                end
                - Calculate the FFE output at t_j
                - Notify to its output receiving objects
                - Discard unnecessary inPort outputs from the collection
            end

            % Other internal methods not shown here
        end
    end


[Figure 4.6 diagram: FFE output, $x_{FFE}(t)$, and channel response, $y_{Ch}(t)$, plotted against time $t$, with the vertical axes marked from $A_{-7}$ to $A_{+7}$]

Figure 4.6: Channel waveform construction based on FFE outputs

possible output states. From the simulation, all possible output states can be recorded at steady state as $A_{FFE} = \{A_{-(2^n-1)}, \ldots, A_{-3}, A_{-1}, A_{+1}, A_{+3}, \ldots, A_{+(2^n-1)}\}$, and one of these states is selected based on the calculated symbolic FFE state. Figure 4.5 shows an example case for a 2-tap FFE. From the received bit-stream, the symbolic FFE states are calculated, and each symbolic state is then replaced by its corresponding FFE amplitude.
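
The two-stage mapping (bits to symbolic states, symbolic states to recorded amplitudes) can be sketched for the 2-tap case as follows. The encoding $2d_k - d_{k-1}$ with $d = \pm 1$ is an assumption chosen because it reproduces the state sequence shown in Figure 4.5 for the bits 0110; the thesis may define the states differently. The amplitudes in `EXAMPLE_LUT` are hypothetical placeholders, not values recorded from any simulation.

```python
def symbolic_states(bits, initial_previous_bit=0):
    """Map a bit-stream to 2-tap symbolic states in {-3, -1, +1, +3}.

    Assumed encoding: state = 2 * d[k] - d[k-1], where d is +1 for a 1 bit
    and -1 for a 0 bit.
    """
    states = []
    previous = 1 if initial_previous_bit else -1
    for bit in bits:
        current = 1 if bit else -1
        states.append(2 * current - previous)
        previous = current
    return states


# Hypothetical steady-state amplitudes A_-3, A_-1, A_+1, A_+3 in volts.
# In the proposed scheme these would be recorded from a circuit simulation.
EXAMPLE_LUT = {-3: 0.25, -1: 0.42, +1: 0.58, +3: 0.75}


def ffe_amplitudes(bits, lut, initial_previous_bit=0):
    """Replace each symbolic state by its recorded amplitude."""
    return [lut[state] for state in symbolic_states(bits, initial_previous_bit)]


if __name__ == "__main__":
    bits = [0, 1, 1, 0]
    print(symbolic_states(bits))
    print(ffe_amplitudes(bits, EXAMPLE_LUT))
```

Because the table is indexed by state rather than computed from tap weights, the stored amplitudes do not need to be evenly spaced, which is how the nonlinearity enters the model.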

Figure 4.6 shows a channel response together with the FFE output that drives it. In both plots, the horizontal axis represents time, $t$, and the vertical axis represents the amplitudes of the FFE transition states, marked as $A_{-7}, A_{-5}, \ldots, A_{+7}$. The FFE employed here is a 3-tap FFE and hence has $2^3 = 8$ possible states. Its equalization gain is set according to the channel attenuation. Based on the FFE output states, the channel response is calculated using Equation 2.15. Since we are only interested in the shape of the channel response, the signal attenuation at 0 Hz is taken to be 0 dB.

4.1.3 FFE Modelling Testcase

In order to evaluate the accuracy of the proposed modelling technique, a test-case of a LUT-based FFE

followed by a channel was created. Figure 4.7 shows a block diagram and compares the output eye

diagrams of the test-case. The test includes a source, a LUT-based FFE, a channel (shown as Ch), and

an eye diagram generator (shown as eye scope). The channel selected for the test was a 4-inch FR4

channel having an attenuation of ∼ 5 dB at the Nyquist frequency, fNyquist = 4 GHz. The objective of

the test is to compare the eye diagrams generated by the Spectre simulation and the proposed modelling

scheme.

For equalization, a 3-tap FFE was chosen and the tap weights were determined to be $\vec{W}_{FFE} = [-2/20,\ 15/20,\ -3/20]$ using Equation 4.2. Based on these tap weights, the example FFE shown in Figure 4.4 was implemented at the transistor level in the Cadence environment. All 8 possible steady-state amplitudes of the 3-tap FFE were found to be 0.236, 0.264, 0.280, 0.321, 0.633, 0.682, 0.702, and 0.737 (all in volts). However, these amplitudes did not match the amplitudes calculated from the initially designed tap weights. During the test, a PRBS7 sequence was used as the source. The step response due to an FFE output transition was collected from the Spectre simulation, with the transmitter driver termination, the channel, and the receiver input termination set to ∼50 Ω. The channel output was calculated by applying



[Figure 4.7 diagram: (a) Block diagram: Source, FFE (clocked by TxClk), Ch, and Eye Scope. (b) FFE eye diagram comparison: eye diagrams from the proposed scheme and from Spectre, each spanning −UI to +UI.]

Figure 4.7: FFE simulation testbench and waveform reconstruction process


the FFE steady-state amplitudes in the step response based modelling scheme, and an eye diagram was then generated. As can be seen from the figure, the eye diagrams from Spectre and from the proposed scheme are almost identical, validating the accuracy of the proposed modelling scheme. Eye diagram measurements for the Spectre and proposed modelling cases are shown in Table 4.1.

Table 4.1: Eye diagram measurements for feed forward equalizer (FFE) test-case

Eye Measurements            From Spectre    From Proposed Scheme
Horizontal eye opening      0.78 UIpp       0.79 UIpp
Vertical eye opening        334 mVpp        339 mVpp
Relative simulation time    6000×           1×

The above modelling scheme was incorporated into the OO simulation scheme, and the speed performance of the simulation was measured. The proposed scheme took ∼44 s to generate an eye diagram based on 1 million transmitted bits. The detailed speed performance is discussed in Section 3.3.5.

4.2 Continuous Time Linear Equalizer (CTLE) Modelling

The CTLE operates by flattening the frequency response of the overall data transmission link. Figure 4.8 shows the frequency response of a typical channel along with the frequency responses of an ideal equalizer and a real CTLE for this channel. An ideal equalizer is the inverse of the low-pass channel, so that it compensates for the channel attenuation. Since signal amplification at higher frequencies also amplifies unwanted noise, real CTLEs are band-limited.

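[Figure 4.8 diagram: magnitude $|H(\omega)|$ versus $\omega$ for the channel response $C(s)$, the ideal equalizer $1/C(s)$, and the real CTLE $H_{CTLE}(s)$, with $A_{peak}$, $\omega_{z1}$, $\omega_{p1}$, and $\omega_{p2}$ marked]

Figure 4.8: Bode plot of a channel accompanied by its ideal equalizer and realistic continuous time linear equalizer (CTLE)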

Here, $A_{peak}$ defines the low-frequency gain of the CTLE. The real CTLE response can be formulated by including one zero at $\omega_{z1}$ and two poles at $\omega_{p1}$ and $\omega_{p2}$ in its transfer function, $H_{CTLE}(s)$, which can be described as follows,

$$H_{CTLE}(s) \approx \frac{1}{C(s)} = K \cdot \frac{s + \omega_{z1}}{(s + \omega_{p1})(s + \omega_{p2})} \tag{4.3}$$

where $C(s)$ is the channel transfer function and $K$ is the gain factor, defined as $K = A_{peak} \cdot \left| \frac{\omega_{p1}\,\omega_{p2}}{\omega_{z1}} \right|$. This transfer function can provide a gain slope of up to 20 dB/dec between $\omega_{z1}$ and $\omega_{p1}$. To achieve higher gain and more advanced equalization, more zeros and poles can be incorporated into the transfer function.
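
Equation 4.3 can be evaluated numerically to confirm that the gain factor $K$ makes the low-frequency gain equal to $A_{peak}$ and that the gain rises between the zero and the first pole. The function name and the corner frequencies in the usage example are hypothetical and serve only as an illustration.

```python
import math


def ctle_gain(omega, a_peak, wz1, wp1, wp2):
    """Magnitude of H_CTLE(j*omega) from Equation 4.3 (rad/s throughout)."""
    k = a_peak * abs(wp1 * wp2 / wz1)  # gain factor K
    s = 1j * omega
    return abs(k * (s + wz1) / ((s + wp1) * (s + wp2)))


if __name__ == "__main__":
    # Hypothetical corner frequencies: zero at 1 GHz, poles at 4 GHz and 10 GHz.
    wz1 = 2 * math.pi * 1e9
    wp1 = 2 * math.pi * 4e9
    wp2 = 2 * math.pi * 10e9
    for f_ghz in (0.0, 0.5, 1.0, 2.0, 4.0, 10.0, 40.0):
        gain = ctle_gain(2 * math.pi * f_ghz * 1e9, 0.5, wz1, wp1, wp2)
        print(f"{f_ghz:5.1f} GHz: gain = {gain:.3f}")
```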

4.2.1 CTLE Implementation

For high-speed wire-line applications, a CTLE is usually implemented using passive resistive-capacitive circuit components. An example of a CTLE circuit system is shown in Figure 4.9. Each stage of the block diagram can be represented as a generic differential buffer block with an impedance transfer function, $Z(s)$, which is defined for each stage according to its functionality. The input of the example CTLE is single-ended, while its output is differential. Because the received signal is single-ended, the input terminal $V_{in-}$ of the gain stage is connected to a reference voltage, $V_{ref}$, while the other input terminal, $V_{in+}$, is connected to the channel-attenuated signal, $V_{in}$.

The task of the gain stage is to provide high-frequency gain, while the amplification stages provide the amplification required for sampling. The impedance transfer function, $Z(s)$, plays a major role in defining the output characteristics of each stage. For the amplification stages, $Z(s)$ is set to 0 (shorted), while for the gain stage, $Z(s)$ is formed by a parallel combination of a resistor, $R_z$, and a capacitor, $C_z$. Applying these definitions, the zero, $\omega_{z1}$, and the poles, $\omega_{p1}$ and $\omega_{p2}$, can be found as,

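$$\omega_{z1} = \frac{1}{R_z C_z}, \qquad \omega_{p1} = \frac{1}{R_z C_z}\left(1 + \frac{g_{m1} \cdot g_{m2}}{g_{m1} + g_{m2}} \cdot R_z\right), \qquad \omega_{p2} = \frac{1}{R_L C_L}$$

where $g_{m1}$ and $g_{m2}$ represent the transconductances of transistors M1 and M2 respectively.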
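
As a quick numerical illustration of these expressions, the sketch below computes the zero and pole frequencies from component values. The function name and all component values are hypothetical, and `c_l` denotes the load capacitance $C_L$ that appears in the expression for $\omega_{p2}$.

```python
import math


def ctle_corner_frequencies(r_z, c_z, r_l, c_l, gm1, gm2):
    """Return (wz1, wp1, wp2) in rad/s from the gain-stage component values."""
    wz1 = 1.0 / (r_z * c_z)
    gm_series = gm1 * gm2 / (gm1 + gm2)  # equals gm / 2 when gm1 == gm2
    wp1 = wz1 * (1.0 + gm_series * r_z)
    wp2 = 1.0 / (r_l * c_l)
    return wz1, wp1, wp2


if __name__ == "__main__":
    # Hypothetical values: 1 kOhm, 100 fF, 500 Ohm, 20 fF, 4 mS, 4 mS.
    wz1, wp1, wp2 = ctle_corner_frequencies(1e3, 100e-15, 500.0, 20e-15, 4e-3, 4e-3)
    for name, w in (("wz1", wz1), ("wp1", wp1), ("wp2", wp2)):
        print(f"{name} = {w:.3e} rad/s ({w / (2 * math.pi) / 1e9:.2f} GHz)")
```

The ratio $\omega_{p1}/\omega_{z1} = 1 + \frac{g_{m1} g_{m2}}{g_{m1} + g_{m2}} R_z$ sets how far the first pole sits above the zero, and therefore how much high-frequency boost the gain stage can provide.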

4.2.2 CTLE Modelling for OO Simulation

Since a CTLE is usually modelled to observe its eye diagram, its object type is chosen as continuous, ObjTyp.continuous. It can be incorporated into the OO simulation as an independent filter or as part of a cascaded continuous time filter. Because the goal here is to model its nonlinearity using the step response based scheme, for its computational speed advantages, any preceding continuous time filters need to


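[Figure 4.9 diagram: (a) CTLE block diagram: the single-ended input $V_{in}$ and the reference $V_{ref}$ drive the gain stage, followed by amplifying stages that produce the differential output $V_{out}$. (b) Generic schematic for all stages: a differential pair M1 and M2 with inputs $V_{in+}$ and $V_{in-}$, load resistances $R_L$, the impedance $Z(s)$ between the sources, and two tail currents of $I_{ss}/2$. (c) Definition of $Z(s)$: a parallel combination of $R_z$ and $C_z$ for the gain stage, and a short for the amplifying stages.]

Figure 4.9: Circuit-level overview of single-ended CTLE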


be cascaded. Figure 4.10 shows such an example, where the CTLE is cascaded with the channel (shown as Ch). Here, the source is a discrete type object, such as an RBG or FFE, whereas the sink can be any object, such as a measurement scope. Algorithm 4.2 depicts the pseudo-code for CTLE operation in the OO simulation environment.


[Figure 4.10 diagram: Source (clocked by Clk), Ch, CTLE, and Sink in series, with Ch and CTLE grouped as cascading filters]

Figure 4.10: Representing CTLE for OO simulation

Even though a CTLE is supposed to be linear, a CTLE implemented at the circuit level shows noticeable nonlinearity. This nonlinearity often results in deformed and asymmetric eye diagrams, which lead to high jitter as well as a shifted sampling threshold reference. These effects can be taken into account in the CTLE model by considering the gain nonlinearity and the system memory, which are described as follows.

Gain Nonlinearity

Gain nonlinearity refers to the variation in output signal gain with input signal amplitude. Ideally, the gain of the CTLE is constant, but nonlinearity is observable in its circuit-level implementation. Figure 4.11 shows the DC gain plot of a differential-buffer-like CTLE. Here, the DC gain is defined as $\Delta V_{Out}/\Delta V_{In}$, where $\Delta V_{In} = V_{i2} - V_{i1}$, $\Delta V_{Out} = V_{o2} - V_{o1}$, and the amplitudes $V_{i1}$ and $V_{i2}$ can vary independently within the CTLE input signal range. As the input signal, $\Delta V_{In}$, increases due to $V_{i1}$ and $V_{i2}$, the output signal, $\Delta V_{Out}$, increases with variable gain until it saturates.

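[Figure 4.11 diagram: $v_{Out}$ versus $v_{In}$, with the input levels $V_{i1}$ and $V_{i2}$ spanning $\Delta V_{In}$ and the corresponding output levels $V_{o1}$ and $V_{o2}$ spanning $\Delta V_{Out}$]

Figure 4.11: Plot of CTLE gain response, vOut/vIn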


Algorithm 4.2 Modelling template of continuous time linear equalizer for running OO simulation

    classdef CTLE < handle
        properties
            objTyp = ObjTyp.continuous   - Continuous object Type
            inPort                       - Input circuit object information
            % Other internal properties not shown here
        end

        methods
            % Constructor called from the top-level script
            function obj = CTLE
                - Construct the CTLE object
                - Receive and verify all required inputs
            end

            % Method init() triggered by the input object inPort
            function init(obj)
                - Define additional uninitialized internal variables
                - Calculate the output, y(t = 0)
                - Notify its outputs receiving objects for collection
            end

            % Method receive() triggered by the object inPort
            function receive(obj)
                - Collect outputs from the input object inPort at t_{i+1}
                - call its process() method
            end

            % Method process() called from the receive() method
            function process(obj)
                - Generate time vector, (t_i, t_{i+1}]
                - Calculate the output, (y(t_i), y(t_{i+1})]
                - Notify its output receiving objects for collection
                - Discard unnecessary input information
            end
        end
    end


When a CTLE is modelled with an RBG as its source, which has only two possible output states, this gain nonlinearity does not need to be considered, because the two states do not contribute to the shape of the output eye diagram. However, if the CTLE is driven by an FFE, the amplitudes of the FFE output states need to be recalculated, since they change due to the amplitude-dependent CTLE gain. For the single-ended CTLE, the FFE amplitudes become asymmetric with respect to their center, which leads to an asymmetric eye diagram.

System Memory

Besides gain nonlinearity, the CTLE exhibits system memory: its effective transfer function changes with the previously transmitted bits (or transition sequence). This phenomenon is evident from the collected step responses shown in Figure 4.12, where the constant s∞ represents the steady-state height of the step responses. Because the CTLE transfer function changes, the collected step responses show variation in peaking. These step responses are collected by reversing the continuous time waveform construction process (described in Section 2.5.3), as illustrated in Figure 4.13. First, two continuous time waveforms, y1(t) and y2(t), due to the input transition sequences α1α2α3 . . . αi−1αi and α1α2α3 . . . αi−1 respectively, are recorded from a SPICE simulator. Subtracting y2(t) from y1(t) yields the step response due to the transition αi. The extraction region for the step response, 0 ≤ t ≤ tExt, is then determined by observing the variation in y1(t) − y2(t).

[Figure 4.12 plots: (a) Falling edge case (inverted for visualization), −sF(t) versus t with axis levels 0 and −s∞; (b) Rising edge case, sR(t) versus t with axis levels 0 and s∞; both extracted over 0 ≤ t ≤ tExt]

Figure 4.12: Extracted step responses for modelling a CTLE (considering the effect of channel)

[Figure 4.13 plots: y1(t), waveform due to transition sequence α1α2 . . . αi−1αi; y2(t), waveform due to transition sequence α1α2 . . . αi−1; y1(t) − y2(t), calculated step response, with the extracted region spanning 0 to tExt]

Figure 4.13: Step response extraction process for CTLE
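The subtraction step of the extraction can be sketched as follows. This is only an illustration on synthetic data: the waveforms are generated from an assumed first-order step rather than recorded from SPICE, and the names `extract_step_response`, `n_transition`, and `n_ext` (sample indices standing in for the transition time and tExt) are hypothetical.

```python
import math

def extract_step_response(y1, y2, n_transition, n_ext):
    """Return y1 - y2 over the extraction window that starts at the last transition.

    y1, y2       : waveforms on the same uniform time grid
    n_transition : sample index of the final transition (alpha_i)
    n_ext        : window length in samples (plays the role of tExt)
    """
    if len(y1) != len(y2):
        raise ValueError("waveforms must share the same time grid")
    stop = n_transition + n_ext + 1
    return [a - b for a, b in zip(y1[n_transition:stop], y2[n_transition:stop])]

# Synthetic stand-ins for the two recorded waveforms (assumed, not measured).
n_total, n_transition, n_ext, tau = 200, 50, 100, 10.0
y2 = [0.3 * math.sin(0.05 * k) for k in range(n_total)]          # response to alpha_1..alpha_{i-1}
y1 = [y2[k] + ((1.0 - math.exp(-(k - n_transition) / tau)) if k >= n_transition else 0.0)
      for k in range(n_total)]                                   # same, plus transition alpha_i

step = extract_step_response(y1, y2, n_transition, n_ext)
print(len(step), step[0], step[-1])
```

In this synthetic case the common history cancels exactly; with a real nonlinear circuit the difference would instead carry the history-dependent peaking that the scheme aims to capture.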

Here, we propose modelling the CTLE using step responses, sF(t) and sR(t), collected from shorter SPICE simulations. Figure 4.14 shows the construction process of the CTLE continuous time waveform, yCTLE(t). The top plot shows a random bit-stream, x(t), which has transitions at ti (where i = 1, 2, . . . ). For each transition at ti, a step response is selected based on the approximate output at t′i = ti + ∆t, where ∆t denotes a constant time offset. The approximate output at t′i does not exactly follow the CTLE output, yCTLE(t), since it does not take into account the transitions happening after ti. ∆t is determined during the step response extraction by observing where the variation among the step responses is visually largest. During the waveform construction, the step response is determined either by selecting the closest one among the collected step responses or by interpolating between them.
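The construction described above can be sketched in a few lines. This is a simplified illustration, not the implementation used in this work: it assumes a uniform sample grid, a library stored as (approximate output, step samples) pairs, nearest-key selection without interpolation, and steps that are simply scaled by the transition polarity. All names (`build_waveform`, `hold_extend`, `n_delta`) are hypothetical.

```python
def hold_extend(step, k):
    """Sample k of a step response, held at its final value past the end."""
    if k < 0:
        return 0.0
    return step[k] if k < len(step) else step[-1]

def build_waveform(transitions, library, n_samples, n_delta, y_initial=0.0):
    """Superimpose step responses selected per transition.

    transitions : list of (sample_index, polarity) pairs in time order
    library     : list of (approx_output_key, step_samples) pairs
    n_delta     : offset in samples, playing the role of the constant delta-t
    """
    y = [y_initial] * n_samples
    for n_i, polarity in transitions:
        # Approximate output at t'_i = t_i + delta-t; later transitions are
        # not yet included, exactly as noted in the text.
        probe = min(n_i + n_delta, n_samples - 1)
        approx = y[probe]
        # Select the collected step response recorded under the closest condition.
        _, step = min(library, key=lambda entry: abs(entry[0] - approx))
        for k in range(n_i, n_samples):
            y[k] += polarity * hold_extend(step, k - n_i)
    return y

# Two made-up step responses with different peaking.
library = [(0.0, [0.0, 0.6, 1.0]), (1.0, [0.0, 0.8, 1.2, 1.0])]
y_ctle = build_waveform([(0, +1), (5, -1)], library, n_samples=10, n_delta=1)
print(y_ctle)
```

The nonlinearity is evaluated once per transition rather than once per time step, which is where the speed benefit discussed next comes from.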

The proposed modelling method for the CTLE offers several key benefits compared to other modelling methods. The example single-ended CTLE has asymmetric rising and falling edges, leading to a deformed eye diagram. This can easily be taken into consideration in the scheme by using two different sets of step responses. Another key advantage is that the nonlinearity conditions of the CTLE need to be determined only at the transition events instead of at every chosen time point. Thus, unlike time-step based ODE modelling, the proposed scheme avoids numerous repetitive calculations at every time step. In addition, the proposed modelling scheme captures the memory effect using the approximate output of the CTLE, which provides the flexibility to include frequency offset related effects, such as data-rate variations and random and deterministic jitter. Other memory effect modelling schemes, such as bit-pattern dependent modelling (proposed by Ren, J. et al. [19]), cannot be applied under such frequency offset environments.

[Figure 4.14 plots: random bit-stream x(t) with levels −1 and +1; CTLE output yCTLE(t) with levels A−1 and A+1; time axis t marking the transition at ti and the point t′i, separated by ∆t]

Figure 4.14: Continuous time waveform formation for a CTLE (considering the effect of channel)

4.2.3 CTLE Modelling Test Cases

In order to evaluate the accuracy of the proposed CTLE modelling scheme, two test cases were considered. The first case tests a CTLE by itself, and the second case tests the CTLE together with an FFE at a higher data rate. The two cases are described with the help of Figures 4.15 and 4.16, respectively. The objective is to compare the accuracy of the eye diagrams generated in both test cases.

Test Case 1: CTLE Operation

The first test case focuses on the effectiveness of the proposed system memory consideration in the CTLE modelling. Hence, the test case includes a binary clocked source at the transmitter, a channel (shown as Ch), and a CTLE at the receiver. Because the goal is to study the eye diagram at the CTLE output, an eye scope is included after the CTLE. Here, the source is a PRBS7 generator producing a binary signal at 8 Gbps. The channel is a 4-inch FR4 channel with an attenuation of ∼5 dB at the Nyquist frequency, fNyquist = 4 GHz. The CTLE was designed to have one zero, ωz1 = −3.769 × 10^10 rad/s, and two poles, ωp1 = 0.6 · fNyquist and ωp2 = 3 · ωp1, to provide the desired boost.

Based on the design specification, a CTLE circuit is implemented at the transistor level. In order to realize the zero, ωz1, the resistor, Rz, and the capacitor, Cz, are chosen to be approximately 1.672 kΩ and 41.3 fF, respectively. The two poles, ωp1 and ωp2, arise from circuit parasitics and load capacitance. The common mode at the input terminal, Vin, of the CTLE is set to 750 mV and the reference terminal, Vref, is set to 740 mV. Due to the offset between the input common mode and the reference voltage, the eye diagram generated at the CTLE differential output through Spectre simulation becomes asymmetric. (The asymmetry in the differential signal is introduced in order to counteract the asymmetry introduced by the FFE, which is covered in the next test case.)



[Figure 4.15 panels: (a) Block diagram: Clk, Source, Ch, CTLE, EyeScope; (b) Eye diagram comparison]

Figure 4.15: CTLE modelling performance evaluation


Following the proposed modelling scheme, the CTLE is then modelled. Since the CTLE in this test is driven by a binary source, gain nonlinearity is not taken into account. In order to consider the CTLE system memory, 16 different rising and falling edges are recorded from Spectre simulations. The extracted step responses are then used to construct the continuous time waveform, segments of which are overlaid on top of each other to generate the eye diagram. As can be observed from the figure, the eye diagram from the proposed scheme closely matches the one from the Spectre simulation. Table 4.2 shows the eye diagram measurements for both the Spectre and the proposed modelling scheme cases.

Table 4.2: Eye diagram measurements for continuous time linear equalizer (CTLE) test-case

Measurement                  Spectre       Proposed scheme
Horizontal eye opening       0.78 UI       0.78 UI
Vertical eye opening         301 mV ppd    300 mV ppd
Relative simulation time¹    2000 X        1 X

Test Case 2: FFE-CTLE Joint Operation

The aim of this test case is to demonstrate how to describe the CTLE nonlinearity due to a multi-level source, such as an FFE. Figure 4.16 shows the test case block diagram and the overlapped eye diagrams for comparison. The test-bench is similar to that of Test Case 1, except that the transmitted signal is pre-equalized by an FFE. The local clock at the transmitter (shown as TxClk) triggers both the source and the FFE.

Here, the data transmission takes place at 16 Gbps and the channel has ∼20 dB attenuation (at fNyquist = 8 GHz). Before the transmission, the input bit-stream is equalized by the same 3-tap FFE described in Section 4.1.3. However, the FFE steady-state transition levels vary from their originally recorded values due to the CTLE gain nonlinearity. The newly recorded FFE transition levels are −0.570, −0.559, −0.547, −0.483, 0.412, 0.512, 0.541, and 0.570 (all in Volts). Even though the FFE has 8 different steady states, there are only 14 possible transitions (instead of 8 × (8 − 1) = 56). For each transition case, 8 different CTLE step responses are considered. The step responses are applied following the proposed scheme to generate the continuous time waveform, which is then used to generate an eye diagram. The eye diagrams generated from Spectre and from the proposed scheme are overlapped and, as can be observed, they match. The measurements related to these diagrams are presented in Table 4.3.
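One way to arrive at the count of 14 transitions is sketched below. It assumes that each FFE steady state is set by the three most recent bits, so that a new bit shifts the window by one position; this reading is an assumption made for illustration, and the function name `ffe_state_transitions` is hypothetical.

```python
from itertools import product

def ffe_state_transitions(n_taps=3):
    """Enumerate steady states and the distinct state changes of an n-tap FFE.

    Assumption: a steady state is the window of the n_taps most recent bits,
    and each new bit shifts that window by one position.
    """
    states = list(product((0, 1), repeat=n_taps))
    transitions = set()
    for state in states:
        for new_bit in (0, 1):
            next_state = state[1:] + (new_bit,)
            if next_state != state:          # all-zeros and all-ones can stay put
                transitions.add((state, next_state))
    return states, transitions

states, transitions = ffe_state_transitions(3)
print(len(states), len(transitions))
```

Under this assumption each of the 8 states has two successors, giving 16 candidate moves, and the two self-loops (all zeros and all ones) are not transitions, which leaves 14.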

Table 4.3: Eye diagram measurements for CTLE test-case due to FFE



Vertical eye opening 498 mV ppd 501 mV


¹ Approximated from the speed measurements of the FFE test-case, described in Section 3.3.5



[Figure 4.16 panel: (a) Block diagram: TxClk, Source, FFE, Ch, CTLE, EyeScope]

Figure 4.16: CTLE modelling performance evaluation due to an FFE


4.3 Decision Feedback Equalizer (DFE) Modelling

The DFE subtracts residual ISI from the channel-attenuated signal directly in the time domain, and the equalization is usually performed right before data sampling. The residual ISI is determined from the previously detected bits; hence, unlike the FFE, this equalizer cannot remove pre-cursor ISI. In addition, the DFE depends on the local clock to calculate the residual ISI, so its performance depends on the amount of jitter present in the clock.

Figure 4.17 shows the architecture of a DFE. The DFE consists of three key components: an adder, a slicer, and a feedback filter. The task of the adder is to subtract the residual ISI, ei, calculated in discrete time (where ei = e(t = ti) and i = 1, 2, 3, . . . ), from the continuous time input signal, x(t). The discrete time residual ISI, ei, is held for the bit duration, Tb, until a new residual ISI, ei+1, is available. The continuous time output from the adder, y(t), is then sampled by the slicer to determine the transmitted bits, yi, with respect to an assigned threshold. Each yi can be any of the valid transmitted bits, yi ∈ {−1, +1}, for a given threshold of 0. The feedback filter calculates the residual ISI, ei, from the previously decided bits, yi, based on the defined tap weights, wi, where i = 1, 2, . . . , N.

[Figure 4.17 block diagram: Signal In, x(t); Adder; Equalized Output, yDFE(t); Slicer; Decided Bits, yi; Feedback Filter with delays z−1 and tap weights w1, w2, . . . , wN producing Residual ISI, ei]

Figure 4.17: Basic architecture of a decision feedback equalizer
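The adder, slicer, and feedback filter described above can be sketched in discrete time as follows. This is a simplified illustration that works on one sample per bit and ignores the continuous time behaviour of the adder; the function `dfe_decide` and the example channel (main cursor 1.0, one post-cursor of 0.6) are assumptions, not the circuit studied here.

```python
def dfe_decide(samples, tap_weights):
    """Discrete-time DFE: subtract residual ISI, then slice at threshold 0.

    samples     : input values x(t_i) at the sampling instants
    tap_weights : [w1, ..., wN], applied to the N most recent decisions
    """
    history = [0] * len(tap_weights)       # most recent decision first
    decisions, equalized = [], []
    for x_i in samples:
        e_i = sum(w * y for w, y in zip(tap_weights, history))   # residual ISI
        y_eq = x_i - e_i                                         # adder output
        bit = 1 if y_eq >= 0 else -1                             # slicer
        equalized.append(y_eq)
        decisions.append(bit)
        history = [bit] + history[:-1]                           # z^-1 delay line
    return decisions, equalized

# Hypothetical channel: x_i = b_i + 0.6 * b_{i-1}
bits = [1, -1, -1, 1, 1, -1]
received = [b + 0.6 * prev for b, prev in zip(bits, [0] + bits[:-1])]
decisions, equalized = dfe_decide(received, tap_weights=[0.6])
print(decisions, equalized)
```

Because the feedback uses only past decisions, the sketch can cancel the post-cursor term but has no means of addressing pre-cursor ISI, consistent with the limitation noted above.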

4.3.1 DFE Implementation

The concept of residual ISI subtraction during the DFE operation is explained using Figure 4.18. The DFE pulse response, pDFE(t), is superimposed on top of the intended channel response, c(t), which is recorded for the input pulse, Π(t). Here, the example DFE cancels only the two ISI cursors, c1 and c2, that follow the main cursor, c0, based on the clock sampling phase. As can be observed, pDFE(t) contains sharp edges, because the DFE removes the ISI only at the sampling locations due to its discrete time feedback filter. For high-speed transceiver operation, a slicer is usually implemented using a sampling latch, which requires a sufficient sampling aperture (both before and after the sampling clock edge) to function properly. To ensure proper sampling latch operation, the outputs from the filter should be made available at the furthest point from the sampling phase; hence, each discontinuity in pDFE(t) appears around the midpoint between two neighbouring cursors.

[Figure 4.18 plots: Input Pulse, Π(t), of unit height from t = 0 to Tb; Channel and DFE Response versus t, showing Channel, c(t), with peak Apeak and cursors c0, c1, c2, and DFE, pDFE(t)]

Figure 4.18: Pulse response due to 2-tap decision feedback equalizer (DFE)

At the circuit level, a DFE employs an analog adder accompanied by a digitally clocked slicer and a feedback FIR filter. The example DFE used for the modelling study both receives and provides differential signals in order to comply with the differential output of the CTLE presented earlier. Figure 4.19 provides a circuit-level overview of the DFE of interest. The DFE adder is realized using multiple gain blocks with a shared resistive load, RL, in order to perform current-mode summation. Each gain block is a differential transistor pair, M1 and M2, biased with a current source, Iss,i, which is set proportional to the corresponding tap weight, wi, where i = 0, 1, 2, . . . , N. The slicer is designed using a DFF with high input sensitivity in order to achieve greater amplification for a small equalized signal. The output of the DFF is then fed to the digital FIR logic to determine the polarity and delay of the gain stage, Ai. Here, the rising edge of the clock is the sampling phase for the DFF and the falling edge is used for the digital FIR logic operation.

4.3.2 DFE Modelling for OO Simulation

Depending on the simulation requirements, a DFE can be considered as either a discrete time circuit object, ObjTyp.discrete, or a continuous time circuit object, ObjTyp.continuous. If the simulation objective is only to acquire the recovered bits, modelling the DFE as a discrete time object is usually sufficient. However, if generating eye diagrams is the ultimate goal, the DFE needs to be considered as a continuous time circuit object. During the simulation, a DFE accepts two inputs: a clock source, which is of discrete type, and a signal source for equalization. If the DFE is modelled using an adder with linear gain and infinite bandwidth, the signal source should be a continuous time circuit object such as a channel or CTLE. In contrast, if a realistic adder is incorporated, the DFE modelling scheme becomes similar to that of the CTLE. For step response based modelling, any continuous time filter along the signal source path needs to be cascaded. Algorithm 4.3 presents the pseudo-code for a DFE as a continuous time circuit object. Like an FFE, the DFE operates synchronously with the local clock. Hence, the methods for the DFE are programmed similarly to those of the FFE.

[Figure 4.19 panels: (a) Top-level: Signal In, gain blocks A0, A1, . . . , AN with shared loads RL, Equalized Signal, DFF, Clock, Digital FIR Logic, Decided Binary Bits; (b) Gain Block Description: symbol view (Vin, Iout, Ai) and schematic view (differential pair M1, M2 with Iss,i ∝ Ai)]

Figure 4.19: Circuit-level overview of differentially ended DFE

Algorithm 4.3 Modeling template of decision feedback equalizer for running OO simulation

classdef DFE < handle
    properties
        objTyp = ObjTyp.continuous  % Continuous object type
        clkPort                     % Clock object information
        inPort                      % Data source object information
        % Other internal properties not shown here
    end

    methods
        % Constructor called from the top-level script
        function obj = DFE
            - Construct the circuit object DFE
            - Receive and verify all required input information
        end

        % Method init() triggered by data source object, inPort
        function init(obj)
            - Define additional uninitialized internal variables
            - Calculate the output at t = 0
            - Notify its output receiving objects for initial output collection
        end

        % Method receive() triggered by data source object, inPort
        function receive(obj)
            - Collect outputs from object pointed by inPort at (t_{i-1}, t_i]
            - Append the outputs to previously stored information
        end

        % Method process() triggered by clock object, clkPort
        function process(obj)
            if processing is completed
                return
            end
            - Get the clock transition, t_j
            if maximum collected output timing of inPort < t_j
                - Hold on to the state at t_j
                return
            end
            - Determine next processing time range, (t_{j-1}, t_j]
            - Calculate the output for the range
            - Notify its output receiving objects for output collection
            - Discard unnecessary input information
        end
    end
end

Output calculation for the DFE is similar to that of the CTLE due to algorithmic similarity. Figure 4.20 shows how to modify the DFE block diagram in order to capture the finite bandwidth property for step response based modelling. Figure 4.20(a) shows a DFE top-level block diagram with a bandlimited adder, where an LPF with transfer function HAdder(s) is incorporated after the adder block. In Figure 4.20(b), the HAdder(s) block is shifted before the adder, which results in two HAdder(s) blocks: one along the signal path and another next to the FFIR(z) block. The HAdder(s) block along the signal path needs to be cascaded with its preceding continuous time filter blocks, such as the channel and CTLE. The other HAdder(s) block is applied to convert the discrete time output of the feedback filter, FFIR(z). This allows the adder to be considered an ideal summer, since its inputs are already bandlimited.

[Figure 4.20 panels: (a) Conventional model: Signal In, x(t); Bandlimited Adder (Adder followed by HAdder(s)); Slicer; Decided Bits, yi; FFIR(z). (b) Considering adder bandwidth before the addition: Signal In, x(t); HAdder(s) on the signal path; Adder; Slicer; Decided Bits, yi; FFIR(z) followed by HAdder(s)]

Figure 4.20: Modifying DFE model to capture the finite adder bandwidth
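The block shift relies on the linearity of HAdder(s). As a brief sketch, let h_Adder(t) denote the impulse response of HAdder(s) and e(t) the held feedback signal (both symbols are introduced here for illustration); the convolution then distributes over the subtraction:

```latex
\begin{align}
y(t) &= h_{\mathrm{Adder}}(t) * \bigl[x(t) - e(t)\bigr] \\
     &= \bigl[h_{\mathrm{Adder}}(t) * x(t)\bigr] - \bigl[h_{\mathrm{Adder}}(t) * e(t)\bigr],
\qquad e(t) = e_i \quad \text{for } t_i \le t < t_{i+1}.
\end{align}
```

The first term is the signal path filtered together with the channel and CTLE, and the second term is the filtered feedback path, so the adder itself only needs to perform an ideal subtraction.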

As with the CTLE, gain nonlinearity and system memory need to be taken into account for the DFE adder. First, as part of the gain nonlinearity modelling, all DFE steady states and their state transitions due to the input signal source and the feedback filter are identified. In order to capture system memory, multiple step responses are recorded with their associated intermediate output information. In this case, one set of such information is associated with the cascaded continuous time filters along the signal path and the other set is related to the HAdder(s) following the FFIR(z). The collected information is then applied to construct the continuous time DFE output. Because the proposed modelling scheme for the DFE closely resembles that of the CTLE, the aforementioned modelling advantages are also retained.

4.3.3 DFE Modelling Test Case

Based on the proposed DFE modelling scheme, a test case is designed to evaluate the modelling accuracy. Figure 4.21 depicts the testbench and the eye diagram comparison. The block diagram is similar to the typical top-level equalization architecture (Figure 4.1). As can be inferred from the testbench, all equalizers (FFE, CTLE, and DFE) are considered during this test. Here, the data transmission is set at 16 Gbps and the selected channel (shown as Ch) is 4-inch FR4 with an insertion loss of ∼20 dB at the Nyquist frequency, fNyquist = 8 GHz. An eye scope is added to observe the eye diagram after the combined equalization of the FFE, CTLE, and DFE. The receiver clock (shown as RxClk) is created by delaying the transmitter clock (shown as TxClk) instead of incorporating a clock synchronization scheme, because the objective is only to capture the equalizer nonlinearity in the eye diagram.

[Figure 4.21 panel: (a) Block diagram: Source, FFE, Ch, CTLE, DFE, EyeScope; TxClk, DelayBuffer, RxClk]

Figure 4.21: DFE modelling performance evaluation with respect to FFE and CTLE

In order to equalize the ∼20 dB channel attenuation, the gain is distributed among all equalizers. For the test, the same 3-tap FFE and first-order CTLE are employed, which are described in Sections 4.1.3 and 4.2.3, respectively. After the FFE-CTLE equalization, a 1-tap DFE is employed to eliminate the remaining ISI. The tap weights for the DFE are set to the vector WDFE = [9/12, −3/12], which provides an additional ∼6 dB of gain. Considering all equalizers, an eye diagram is generated using outputs from the Spectre simulation.

In order to achieve the Spectre-like eye diagram, the DFE step responses and their associated parameters were extracted. Here, the 3-tap FFE and the 1-tap DFE together contribute 2^3 × 2^1 = 16 possible steady states, which were extracted from the Spectre simulation as −0.654, −0.652, −0.650, −0.635, −0.310, −0.309, −0.308, −0.300, 0.273, 0.300, 0.304, 0.307, 0.588, 0.638, 0.645, and 0.650 (all in Volts). For the 16 possible states, 22 different types of step responses are identified and, for each type, 4 different step responses were extracted. Applying the step responses, an eye diagram for the DFE was constructed and compared with that generated from the Spectre simulation. As can be seen from the figure, the two eye diagrams overlap closely in nearly all eye diagram features. Table 4.4 presents the eye diagram measurements for both cases.

Table 4.4: Eye diagram measurements for decision feedback equalizer (DFE) test-case



Vertical eye opening 470 mV ppd 440 mV ppd


4.4 Summary

This chapter discusses the modelling of three types of equalizers, FFE, CTLE, and DFE, in an integrated environment. For each equalizer, the relationship between linear step response based modelling and nonlinear behaviour is established with some modifications. With these modifications, the step response based modelling is able to generate eye diagrams that closely match the eye diagrams generated from Spectre. The proposed modelling method not only generates accurate eye diagrams but also shows potential for simulation speed-up, which was measured for the FFE modelling case.


Chapter 5

Proposed Modelling for Clock and

Data Recovery (CDR) System

This chapter introduces the proposed modelling concept for the CDR system, which can be evaluated in the OO simulation environment. The primary task of a CDR is to determine the optimal sampling location from the received signal, which is assumed to be reasonably equalized. As explained in Section 2.2.2, the CDR determines the optimal sampling location by averaging the past zero-crossing time points. Evaluating the effectiveness of the CDR in identifying the optimal sampling location requires long transient simulations, because its behaviour deviates from the equivalent linear system model in several ways. One such evaluation based on long transient simulations is the jitter tolerance test, which is described in Section 2.3.2. It is essential to understand the CDR lock acquisition behaviour, as well as the functionality of other circuitry that depends on the CDR clock, such as the DFE, under various input jitter conditions.

In order to perform such long transient simulations, it is preferable to adopt computationally light models for high simulation speed, while maintaining reasonable measurement accuracy. The conventional time-step based simulation scheme resolves zero-crossing locations only to within its user-defined time-step; hence, reasonable accuracy can be achieved by decreasing the simulation time-step size. However, as the time-step size is reduced, the simulation speed drops accordingly, making the scheme infeasible for such long transient simulations. In the event-driven simulation scheme, zero-crossing locations are estimated directly from the defined CDR model; therefore, the accuracy of the zero-crossing points depends on the model development considerations. Since the event-driven simulation scheme is incompatible with asynchronous circuit systems (discussed in Chapter 3), only a CDR that operates in discrete time, or that can be represented by an equivalent discrete time system, can be incorporated in the environment. Consequently, simulating a CDR with asynchronous circuitry is not feasible in the event-driven environment.

The proposed modelling addresses the aforementioned issues by capturing such asynchronous CDR behaviour in OO simulation, which is functionally similar to event-driven simulation. The model also maintains a low computational profile. Section 5.1 provides an overview of a CDR to highlight the functional disagreement with the equivalent linear modelling scheme. Next, Section 5.2 describes the proposed modelling scheme at the CDR component level. The component-level descriptions are put together to form the complete CDR model, which is presented in Section 5.3. Section 5.4 validates the proposed modelling scheme against a time-step based model. Finally, Section 5.5 draws the conclusion regarding the


modelling concept.

5.1 CDR Functional Overview

A CDR contains three major components: a phase detector (PD), a loop filter (LF), and a voltage controlled oscillator (VCO). The block diagram and linear model of the CDR are shown in Figure 5.1. The PD determines whether the clock edges of the CDR (referred to as the recovered clock) are early or late with respect to the received data edges (of Signal In, as shown in the diagram). Upon detecting a data transition, the PD raises either the early or the late flag (shown as E/L); otherwise, it maintains a no action state, in which no adjustment takes place. The output of the PD (early, late, or no action), which is a discrete time signal, is then low pass filtered by the LF to determine its average. The average of the early-late pulse modulated signal determines whether the VCO should increase or decrease the output clock frequency. The recovered data sampled from the input signal is also available from the PD. It is usually implemented using the data sampling DFFs.

[Figure 5.1 panels: (a) Block diagram: Signal In, PD, E/L, LF, VCO, Recovered Clock, Recovered Data (PD: Phase Detector; LF: Loop Filter; VCO: Voltage Controlled Oscillator; E/L: Early or Late); (b) Linear model: ΦIn(s), phase detector gain KPD, VPD, HLF(s), VLF, KVCO/s, ΦOut(s)]

Figure 5.1: Architectural overview of the clock and data recovery (CDR) system

In the linear model, the PD is viewed as a summer that calculates the phase difference between the input signal, ΦIn(s), and the recovered clock output, ΦOut(s), multiplied by a gain factor, KPD. The value of KPD depends on various factors, such as data transition density, PD architecture, and ambient noise. The output of the PD, VPD, is then low pass filtered with transfer function HLF(s), which is a behavioural representation of the LF. Finally, the HLF(s) output, VLF, is integrated with a gain factor of KVCO to generate the recovered clock output, ΦOut(s).
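Written out, the three blocks of the linear model give the following relations. The closed-loop expression in the last line is not stated in the text; it is the standard consequence of closing the loop around these blocks and is shown here only as a sketch.

```latex
\begin{align}
V_{PD}(s) &= K_{PD}\,\bigl[\Phi_{In}(s) - \Phi_{Out}(s)\bigr], \\
V_{LF}(s) &= H_{LF}(s)\,V_{PD}(s), \\
\Phi_{Out}(s) &= \frac{K_{VCO}}{s}\,V_{LF}(s), \\
\frac{\Phi_{Out}(s)}{\Phi_{In}(s)} &= \frac{K_{PD}\,H_{LF}(s)\,K_{VCO}}{s + K_{PD}\,H_{LF}(s)\,K_{VCO}}.
\end{align}
```

This form treats the PD output as a continuous quantity proportional to the phase error, which is the assumption that the discrete early/late behaviour discussed next does not satisfy.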

The linear model is employed during the initial design phase to determine parameters for various circuit components, such as resistors, capacitors, and charge pump switches. This model helps to initiate the CDR design process based on given constraints, such as loop filter bandwidth, feedback loop stability, and maximum transient overshoot. However, the model does not capture the actual behaviour of CDR operation, which can easily lead to an implementation failure. Typically, the PD implemented in a CDR for high-speed applications provides discrete time outputs (early, late, or no action) at designated clock or data transitions, whereas the PD inside the linear model provides the phase difference in continuous time. Every time the PD detects a data transition, it raises a flag causing the VCO frequency to increase or decrease by a defined constant amount. If the VCO frequency offset requested by the PD is set so high (or low) to meet the loop filter bandwidth specification that the correction from the PD to reverse the offset arrives too late, the CDR adds a wrong sample (or loses a sample). This CDR lock slipping event, which is due to the discreteness of the PD, is not always visible with the continuous time PD of the linear model. Hence, it is essential to perform a transient simulation of the CDR that takes into account the discrete time behaviour of the PD.

5.2 CDR Component-level Modelling

The OO simulation scheme behaves similarly to event-driven simulation to achieve a speed advantage. During the OO simulation, the CDR is treated as a clock transition event generator. The resultant outputs are discrete time clock events calculated based on the received input signal. However, the CDR architecture has a feedback loop, which involves updating in continuous time; hence, it is modelled as one circuit object (as explained in Section 3.3.3).

Inside the CDR circuit object, the event calculation process is divided into two major segments. The former segment determines the PD output transitions, and the latter deals with the LF and VCO operations. Here, the outputs from the PD and VCO are of discrete type, while the LF provides a continuous time output. Since the goal is to determine the clock transition events associated with the CDR, the LF is cascaded with the VCO in order to hide its continuous time information. Cascading continuous time filters avoids implementing ODE based algorithms, which are computationally intensive.

5.2.1 Phase Detector (PD) Modelling

The objective of the PD is to generate information related to the phase difference between incoming data transitions and CDR clock edges. Two types of PD are most commonly found in high-speed applications: the binary PD and the linear PD. Their detailed modelling procedures and behavioural functions are described as follows.


Binary PD

The binary PD works on the principle of identifying the sign of the phase difference between the data edge and the clock edge. Figure 5.2 provides the implementation overview of an example binary PD. As can be seen from the schematic, the binary PD has three DFFs, marked as DFF1, DFF2, and DFF3. Both DFF1 and DFF3 are rising edge-triggered DFFs connected serially, providing sampled outputs D1 and D2. DFF2 is falling edge-triggered; it samples the input signal directly and provides the edge output E.

Based on the sampled values of D1, D2, and E, the PD logic determines whether the CDR clock is early or late for a given transition. The binary PD raises the early flag if the edge occurs before the transition, D1 ⊕ E. If the edge occurs after the transition, D2 ⊕ E, the late flag is raised. The PD decision is upheld only when a data transition is present, D1 ⊕ D2. Both early and late cases are shown in the waveform view. As can be observed, the PD output is synchronous to the CDR clock transitions, because the PD output state remains fixed until the next clock transition arrives. This makes it feasible to implement in a conventional event-driven simulation scheme, because clock transitions are always known from the CDR modelling side. According to the binary PD logic, a state transition diagram is constructed for OO simulation and is shown at the bottom of the figure. Following the state diagram, the pseudo-code of the binary PD logical function is given in Algorithm 5.1.

Algorithm 5.1 Pseudo-code of binary phase detector logical functions

function calculate_binaryPD
    % Update D1, D2, and E
    if Rising edge detected
        - Update D1 and D2
    else % Falling edge detected
        - Update E
    end
    % Raise the appropriate flag
    if D1 ≠ D2 % Data transition occurred
        if D1 ≠ E
            - Raise the flag: Early (-1)
        else % D2 ≠ E is true
            - Raise the flag: Late (+1)
        end
    else % No data transition found
        - Raise the flag: No Transition (0)
    end
end
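The logic of Algorithm 5.1 can be exercised with a small runnable sketch. The class below is an illustration only: the name `BinaryPD`, the method `on_clock_edge`, and the choice to pass the data level at each clock edge are assumptions, and D2 is modelled as the previous value of D1, following the serial DFF1-DFF3 connection described above.

```python
class BinaryPD:
    """Behavioural sketch of the binary phase detector flags."""
    EARLY, LATE, NO_ACTION = -1, +1, 0

    def __init__(self):
        self.d1 = 0   # current data sample (DFF1, rising edge)
        self.d2 = 0   # previous data sample (DFF3, rising edge)
        self.e = 0    # edge sample (DFF2, falling edge)

    def on_clock_edge(self, rising, data_level):
        """Update the samples for one clock edge and return the PD flag."""
        if rising:
            self.d2 = self.d1        # DFF3 takes the old D1
            self.d1 = data_level     # DFF1 samples the data
        else:
            self.e = data_level      # DFF2 samples the data at the falling edge
        return self.flag()

    def flag(self):
        if self.d1 != self.d2:                       # data transition occurred
            return self.EARLY if self.d1 != self.e else self.LATE
        return self.NO_ACTION                        # no data transition

# Early case: the edge sample still holds the old data value.
pd_early = BinaryPD()
pd_early.on_clock_edge(True, 0)
pd_early.on_clock_edge(False, 0)
early_flag = pd_early.on_clock_edge(True, 1)

# Late case: the edge sample already holds the new data value.
pd_late = BinaryPD()
pd_late.on_clock_edge(True, 0)
pd_late.on_clock_edge(False, 1)
late_flag = pd_late.on_clock_edge(True, 1)
pd_late.on_clock_edge(False, 1)
idle_flag = pd_late.on_clock_edge(True, 1)           # no transition this bit
print(early_flag, late_flag, idle_flag)
```

Since the flag only changes at clock edges, every output event of this model coincides with a known clock transition, which is why the binary PD fits a conventional event-driven scheme.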

Linear PD

Unlike the binary PD case, linear PD provides more precise phase difference information through its

pulse width modulation. The widths of the generated pulses are linearly proportional to phase difference

between clock and data edges. Figure 5.3 shows implementation overview of an example linear PD for

high-speed application. As can be seen from the schematic, the linear PD consists of two serially con-

nected DFFs, DFF1 and DFF2. They are rising and falling edge triggered respectively, and their sampled

outputs are, D1 and D2. The linear PD logic determines early or late based on the sampled output, D1

and D2, and input data D0. Dark routing highlights the signal path for generating asynchronous PD

output signal.

The linear PD logic analyzes D0, D1, and D2 to generate a pulse related to phase difference and


Data In

Clock

DFF1

DFF2

DFF3

D1

D2

E BinaryPD

Logic

PDOut

(a) Schematic view

Early Case

D2 E D1

Late Case

D2 E D1

Data1

0

Clock1

0

PD+1

0−1

Early: −1, Late: +1, No Action: 0

(b) Early/late case waveform

0start

−1

+1

AB

C

B

A

C

C

A

B

A = D1⊕D2

B = (D1⊕D2) · (D1⊕ E)

C = (D1⊕D2) · (D2⊕ E)

(c) State transition

Figure 5.2: Modelling overview of binary phase detector (PD)


[Figure 5.3: Modelling overview of linear PD. (a) Schematic view: Data in (D0), DFF1 (output D1), DFF2 (output D2), Clock, Linear PD Logic, Data out, and PD out. (b) Early/late waveform view: Data, Clock, and PD output (+1, 0, −1) over 1 UI for the early and late cases; Early: −1, Late: +1, No Action: 0. (c) State transition among the states 0 (start), +1, and −1, driven by the data transition condition T.]


a pulse of half a UI width for reference. Example PD output waveforms for both early and late cases

are shown in the middle sub-figure. Subtracting the two pulses provides both the sign and the magnitude of the phase difference for a specific data edge. As can be seen, the early case generates a net negative pulse, whereas the late case generates a net positive pulse. The waveform also shows that the linear PD output contains additional transitions related to data edges (marked with dotted vertical lines).

In order to capture the linear PD behaviour, a state transition diagram is formed. Typically, once a data transition, T, is detected, the PD state follows the path 0 → +1 → −1 → 0 → · · ·. Unexpected data transitions may take place due to a large frequency offset; to cover them, two additional paths, +1 → 0 and −1 → +1, are drawn. If no data transition is detected, T̄, the PD remains at the 0 state. Based on the state transition diagram, the pseudo-code for the linear PD is written in Algorithm 5.2. The algorithm should be called whenever any data or clock transition occurs.

Algorithm 5.2 Pseudo-code of linear phase detector logical functions

function calculate_linearPD
    % Update D0, D1, and D2
    if Data transition detected
        - Update D0
    elseif Rising edge detected
        - Update D1
    else % Falling edge detected
        - Update D2
    end
    % Raise the appropriate flag
    - Determine the state using (D0 ≠ D1) - (D1 ≠ D2)
end
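Algorithm 5.2 can be sketched in Python in the same spirit. The names `LinearPD` and `update()` and the string event labels are hypothetical; the wiring (DFF1 samples D0 on the rising edge, DFF2 samples D1 on the falling edge) is assumed from the serially connected DFFs described for Figure 5.3.

```python
class LinearPD:
    """Linear phase detector, transcribed from Algorithm 5.2."""

    def __init__(self):
        self.d0 = 0  # input data
        self.d1 = 0  # DFF1 output (rising-edge sample of D0)
        self.d2 = 0  # DFF2 output (falling-edge sample of D1)

    def update(self, event, data_bit=None):
        """Process one event ("data", "rising", or "falling").

        Returns the PD state: +1, 0, or -1.
        """
        if event == "data":
            if data_bit is None:
                raise ValueError("a data event needs the new data bit")
            self.d0 = data_bit
        elif event == "rising":
            self.d1 = self.d0   # assumed: DFF1 samples the input data
        elif event == "falling":
            self.d2 = self.d1   # assumed: DFF2 samples the DFF1 output
        else:
            raise ValueError("unknown event: %r" % (event,))

        # State = (D0 != D1) - (D1 != D2)
        return int(self.d0 != self.d1) - int(self.d1 != self.d2)
```

Feeding a data transition, then a rising edge, then a falling edge reproduces the typical 0 → +1 → −1 → 0 path, while two data transitions without an intervening rising edge give the +1 → 0 path.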

The asynchronous behaviour of the linear PD is due to the additional output transitions related to data edges. These transitions are not directly predictable from the CDR modelling point of view. When the linear PD based CDR arrives at its lock position, the falling edge of the clock tends to align with the unpredictable data edges. At this phase, the CDR clock edges sometimes come slightly earlier and other times slightly later than the data edges. Because of this unpredictable nature of the linear PD, the system cannot be directly simulated in event-driven simulation. The work in [20] attempted to deal with the unpredictable transition events for a linear PD based phase locked loop (PLL) by isolating the PLL related transitions from the actual event-driven simulation time axis. However, the scheme becomes complicated if a similar system needs to be developed for the CDR case, since the system accepts transition events related to a random bit-stream. To deal with this situation, the OO simulation with event scheduling flexibility is proposed. Details on how such asynchronous events are handled are explained earlier in Section 3.3.1.

5.2.2 Loop Filter (LF) and Voltage Controlled Oscillator (VCO) Modelling

Based on the discrete time outputs generated by the PD, the LF and VCO jointly estimate the desired clock transitions for the CDR. Figure 5.4 shows the schematic and block diagram overview of a charge pump based LF connected to a VCO. A CDR with a charge pump based loop filter behaves like a type 2 PLL [37,38]. Because the CDR needs to generate a clock whose phase is controllable for alignment with the incoming data edges under a reasonable frequency offset, the type 2 PLL mechanism is adopted to track a ramp input with zero steady-state error. Whenever the PD raises the late flag (or positive pulse), the up switch is closed to initiate current flow, and the down switch is enabled in response to the early flag (or negative pulse). The no action state can be realized by opening or closing both switches together. Depending on the current direction, charge is either pumped into or out of the loop filter, which is formed using two capacitors, CP and CA, and a resistor, RP.

The proposed modelling scheme for the combined LF and VCO involves calculating clock transitions based on the PD discrete time outputs, vPD,i (where i = 1, 2, 3, . . . , N as N → ∞). The received discrete time PD outputs can be visualized in continuous-time form, vPD(t), as follows,

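\begin{align}
v_{PD}(t) &= u(t - t_1) - u(t - t_2) - u(t - t_3) + u(t - t_4) + \cdots \nonumber \\
          &= \lim_{N \to \infty} \sum_{i=1}^{N} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot u(t - t_i) \tag{5.1}
\end{align}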

Here, vPD,i = vPD(t = ti), and vPD,i ∈ {−1, 0, 1}. The values −1, 0, and 1 symbolically represent the early, no action, and late states respectively. Using the continuous time PD output, vPD(t), the CDR clock output phase, φOut(t), can be determined from its convolution with the open-loop CDR impulse response, hCDR(t).

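\begin{align}
\varphi_{Out}(t) &= v_{PD}(t) * h_{CDR}(t) \nonumber \\
                 &= \lim_{N \to \infty} \sum_{i=1}^{N} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot \phi(t - t_i) \tag{5.2}
\end{align}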

Here, ϕ(t) is the open loop CDR clock phase response in continuous time due to a unit step, u(t), from the PD and is defined as ϕ(t) = hCDR(t) ∗ u(t). Both φOut(t) and ϕ(t) take non-negative real values starting from 0. Determining the zero-crossing or clock transition locations involves solving for the time points, tnπ, at which φOut(t) = nπ (here, n = 0, 1, 2, 3, . . .). If even n is assigned to the rising edge transitions, odd n gives the falling edges of the clock. To gain insight into ϕ(t), the open loop CDR transfer function, HCDR(s), is determined from the block diagram as follows.

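\begin{equation}
H_{CDR}(s) = \left. \frac{\Phi_{out}}{\Phi_{in}}(s) \right|_{Open} = \frac{I_P K_{VCO}}{s^2 (C_P + C_A)} \cdot \frac{1 + s R_P C_P}{1 + s R_P \cdot \dfrac{C_P C_A}{C_P + C_A}} \tag{5.3}
\end{equation}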


[Figure 5.4: Modeling overview of charge pump based loop filter (LF) and voltage controlled oscillator (VCO). (a) Schematic view: charge pump with Up and Down switches and current sources IP, loop filter formed by RP, CP, and CA, and the VCO. (b) Block diagram: charge pump (Up/Down, IP), loop filter HLF(s), and VCO KVCO/s producing Clock out, where $H_{LF}(s) = \frac{1}{s(C_P + C_A)} \cdot \frac{1 + s R_P C_P}{1 + s R_P \cdot \frac{C_P C_A}{C_P + C_A}}$.]


[Figure 5.5: CDR open loop step response. ϕ(t) is plotted against t, with ϕExt(t) over 0 ≤ t ≤ tExt and ϕ∞(t) for t > tExt.]

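Taking the inverse Laplace transform of the response to a unit step yields the desired open loop CDR clock phase output, ϕ(t).

\begin{align}
\phi(t) &= \mathcal{L}^{-1}\left\{ \frac{1}{s} \cdot H_{CDR}(s) \right\} \nonumber \\
        &= K \left[ \frac{1}{2} t^2 - \frac{a - b}{ab}\, t + \frac{a - b}{ab^2} \left( 1 - e^{-bt} \right) \right] u(t) \tag{5.4}
\end{align}

where $K = \dfrac{I_P K_{VCO}}{C_P + C_A}$, $a = \dfrac{1}{R_P C_P}$, and $b = \dfrac{C_P + C_A}{R_P C_P C_A}$.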
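Equations 5.2 and 5.4 can be evaluated numerically as in the following Python sketch. The function names `step_phase()` and `output_phase()` and the representation of the PD output as a list of (t_i, v_PD,i) pairs are assumptions made for illustration; the formulas themselves are taken directly from the two equations, with the PD level before the first transition assumed to be 0.

```python
import math


def step_phase(t, ip, kvco, rp, cp, ca):
    """Open loop phase response to a unit PD step (Equation 5.4)."""
    if t < 0.0:
        return 0.0                       # u(t) = 0 for t < 0
    k = ip * kvco / (cp + ca)
    a = 1.0 / (rp * cp)
    b = (cp + ca) / (rp * cp * ca)
    c1 = (a - b) / (a * b)
    # -expm1(-b*t) equals 1 - exp(-b*t) but is more accurate for small b*t.
    return k * (0.5 * t * t - c1 * t + (c1 / b) * (-math.expm1(-b * t)))


def output_phase(t, transitions, **params):
    """Superpose step responses weighted by PD level changes (Equation 5.2).

    transitions: list of (t_i, v_pd_i) pairs in ascending time order.
    """
    phase = 0.0
    previous_level = 0                   # assumed PD level before the first event
    for t_i, level in transitions:
        phase += (level - previous_level) * step_phase(t - t_i, **params)
        previous_level = level
    return phase
```

A single late pulse that starts at 1 ns and ends at 2 ns would be passed as `[(1e-9, 1), (2e-9, 0)]`; its contribution at time t is then `step_phase(t - 1e-9) - step_phase(t - 2e-9)`.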

As can be observed from Equation 5.4, ϕ(t) does not settle to a constant as time progresses but grows quadratically, as t². This is also observable in Figure 5.5. Such a ϕ(t) is undesirable from the perspective of the continuous time modelling technique based on the step response, s(t) (described in Section 2.5.3), since s(t) becomes constant for t > tExt.

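Like Equation 2.12, ϕ(t) can be re-written following Figure 5.5 as,

\begin{equation}
\phi(t) =
\begin{cases}
\phi_{Ext}(t) & 0 \le t \le t_{Ext} \\
\phi_{\infty}(t) & t > t_{Ext} \\
0 & t < 0
\end{cases}
\tag{5.5}
\end{equation}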

In the above equation, ϕExt(t) arises from the charge pump components (0 ≤ t ≤ tExt). In reality, the expression for ϕExt(t) will not be as simple as the expression for ϕ(t) presented in Equation 5.4. The real expression becomes more complicated due to other circuit-level nonlinearity, such as the poles associated with switches, transistor nonlinearity, and device mismatches. Hence, once ϕExt(t) has been determined from the SPICE simulator, it is preferable to treat it as a LUT during the OO simulation in order to avoid repetitive complex calculations.
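The piecewise definition of Equation 5.5 with a LUT for the extracted region can be sketched as below. The function name `make_step_response()`, the linear interpolation between LUT samples, and the requirement that the LUT starts at t = 0 with strictly increasing time points are assumptions of this example, not details given in the thesis.

```python
from bisect import bisect_right


def make_step_response(lut_t, lut_phi, quad):
    """Build phi(t) following Equation 5.5.

    lut_t, lut_phi: extracted samples of phi_Ext, with lut_t[0] == 0 and
                    lut_t strictly increasing (assumed); t_Ext = lut_t[-1].
    quad:           (A, B, C) of phi_inf(t) = A*t^2 + B*t + C.
    """
    a, b, c = quad
    t_ext = lut_t[-1]
    last = len(lut_t) - 1

    def phi(t):
        if t < 0.0:
            return 0.0                          # before the step
        if t > t_ext:
            return (a * t + b) * t + c          # steady-state quadratic
        j = bisect_right(lut_t, t) - 1          # LUT interval containing t
        if j >= last:
            return lut_phi[last]                # exactly at t_Ext
        frac = (t - lut_t[j]) / (lut_t[j + 1] - lut_t[j])
        return lut_phi[j] + frac * (lut_phi[j + 1] - lut_phi[j])

    return phi
```

Because the LUT lookup is a binary search, each evaluation costs O(log M) for M stored samples, independent of how the samples were originally produced.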

Outside the extracted region, t > tExt, since ϕ(t) grows quadratically with time, the steady-state response can be expressed as ϕ∞(t) = At² + Bt + C, where A, B, and C are polynomial coefficients. A real VCO does not increase its oscillation frequency linearly as its input voltage increases. Usually, the VCO saturates as the input voltage is increased, and there is also a certain delay before the VCO frequency responds. Depending on the range over which the VCO frequency saturation is to be captured, the values of A, B, and C can be fitted using the least-squares method.
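A least-squares fit of A, B, and C can be sketched in plain Python as follows. The function name `fit_quadratic()`, the time normalization used to keep the normal equations well scaled, and the use of Cramer's rule for the 3 × 3 solve are choices made for this illustration; in practice any least-squares routine could be substituted.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = A*t^2 + B*t + C; returns (A, B, C)."""
    # Normalize time so that powers of t (e.g. nanoseconds^4) stay well scaled.
    scale = max(abs(t) for t in ts) or 1.0
    xs = [t / scale for t in ts]

    # Power sums for the normal equations.
    s = [sum(x ** p for x in xs) for p in range(5)]
    r = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]

    def det3(q):
        return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
                - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
                + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))

    d = det3(m)
    if d == 0.0:
        raise ValueError("need at least three distinct time points")

    coeffs = []
    for col in range(3):                 # Cramer's rule, one unknown at a time
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]
        coeffs.append(det3(mc) / d)

    a_s, b_s, c = coeffs
    return a_s / scale ** 2, b_s / scale, c   # undo the time normalization
```

The samples passed in would be the tail of the extracted step response, restricted to the range over which the saturation behaviour is to be captured.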

After applying the new definition of ϕ(t) (described in Equation 5.5), CDR clock phase output,

φOut(t), can be expressed as follows,

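\begin{align}
\varphi_{Out}(t) &= \lim_{N \to \infty} \sum_{i=1}^{N} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot \phi(t - t_i) \nonumber \\
 &= \lim_{N \to \infty} \sum_{i=N-k+1}^{N} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot \phi_{Ext}(t - t_i)
  + \underbrace{\lim_{N \to \infty} \sum_{i=1}^{N-k} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot \phi_{\infty}(t - t_i)}_{\varphi_{Const}(t),\ \text{simulation length dependent},\ O(N^2)} \tag{5.6}
\end{align}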

Unlike Equation 2.15, the second segment, marked as φConst(t), still remains a function of time, t. If this segment remains time-varying, the performance of the implemented algorithm degrades as O(N²) as the simulation length increases, which is clearly not desirable. In order to achieve linear computational performance, O(N), like that of the step response based modelling scheme, the time-varying nature of φConst(t) needs to be handled algebraically such that it can be calculated at any given time. Since the steady-state ϕ∞(t) can be described by the quadratic expression ϕ∞(t) = At² + Bt + C, plugging the expression into φConst(t) gives,

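\begin{align}
\varphi_{Const}(t) &= \lim_{N \to \infty} \sum_{i=1}^{N-k} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot \phi_{\infty}(t - t_i) \nonumber \\
 &= \lim_{N \to \infty} \sum_{i=1}^{N-k} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot \left[ A (t - t_i)^2 + B (t - t_i) + C \right] \nonumber \\
 &= \lim_{N \to \infty} \sum_{i=1}^{N-k} \left( v_{PD,i} - v_{PD,i-1} \right) \cdot
 \begin{bmatrix} A & B & C \end{bmatrix} \cdot
 \underbrace{\begin{bmatrix} 1 & -2 t_i & t_i^2 \\ 0 & 1 & -t_i \\ 0 & 0 & 1 \end{bmatrix}}_{3 \times 3\ \text{constant matrix},\ O(N)} \cdot
 \begin{bmatrix} t^2 \\ t \\ 1 \end{bmatrix} \tag{5.7}
\end{align}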

As can be observed from the new expression for φConst(t) in Equation 5.7, the summation over i = 1, 2, . . . , N − k (as N → ∞) only incorporates the discrete time information associated with the PD output transitions, the ti's. The summed matrix does not include the actual continuous time variable, t, which spans [0, tStop], where tStop is the simulation stop time.

As in the step response based modelling, the first expression in Equation 5.6 involves only k summation terms, which keeps it independent of the simulation length and gives O(N) overall. The second expression, φConst(t), has been modified so that only the 3 × 3 matrix composed of the ti's needs to be updated. Hence, the second expression has characteristics similar to those of the step response case, and its computational complexity is also O(N).
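The bookkeeping implied by Equation 5.7 can be sketched as follows. The class name `ConstPhaseAccumulator` and its methods are hypothetical. As a simplification, the sketch pre-multiplies each 3 × 3 matrix by [A B C], so only the three running coefficients of t², t, and 1 are stored; storing the summed 3 × 3 matrix instead is equivalent.

```python
class ConstPhaseAccumulator:
    """Running form of phi_Const(t) = P*t^2 + Q*t + R (from Equation 5.7)."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c   # coefficients of phi_inf(t)
        self.p = 0.0                       # accumulated coefficient of t^2
        self.q = 0.0                       # accumulated coefficient of t
        self.r = 0.0                       # accumulated constant term

    def retire(self, t_i, dv):
        """Fold a PD transition at t_i with dv = v_i - v_(i-1) into the sums.

        Called once per transition when it leaves the extracted window,
        so the total update cost is O(N) for N transitions.
        """
        self.p += dv * self.a
        self.q += dv * (self.b - 2.0 * self.a * t_i)
        self.r += dv * (self.a * t_i * t_i - self.b * t_i + self.c)

    def value(self, t):
        """Evaluate phi_Const(t) in O(1), independent of the history length."""
        return (self.p * t + self.q) * t + self.r
```

Evaluating `value(t)` after any number of `retire()` calls gives the same result as the direct summation in the first line of Equation 5.7, up to floating-point rounding; for long simulations with large t the accumulated terms can partially cancel, so the numerical precision of this form would need to be checked.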


5.3 Putting It All Together

Since the task of the CDR is to generate clock transitions synchronized to the optimal sampling location at the receiver end, the CDR is modelled as a discrete time circuit object, ObjTyp.discrete, in the OO simulation environment. The CDR accepts only one input, associated with a binary input source such as the RBG. Depending on the CDR architecture and test objectives, the input source for the CDR can be a discrete time or a continuous time circuit object. If the selected CDR architecture employs a linear PD, the CDR input source should be of discrete type, since certain PD transitions originate from the transitions of the input source. Sometimes the sampling correctness of the DFFs of the PD is important for the simulation; under such circumstances, the CDR can be modelled to accept input from a continuous time circuit object. Algorithm 5.3 depicts a generic template for coding both linear and binary PD based CDRs. Here, the linear PD based CDR operation is explained for the OO simulation because its inherent asynchronous behaviour makes it incompatible with the event-driven simulation environment. Later, a brief discussion on incorporating the binary PD is also provided.

Determining the CDR clock transitions, which is performed inside the method receive(), requires simultaneously dealing with the PD output and the LF-VCO discrete time transitions. Figure 5.6 is used here to explain the proposed clock output transition determination scheme for the linear PD based CDR. After receiving newly processed information, ti+1, from the CDR input circuit object, the CDR method receive() appends the information to the previously received information, ti; this creates an analysis window, [ti, ti+1), within which the CDR algorithm can determine its output transitions. Let us assume that N + 1 CDR clock transitions occur within the analysis window, [ti, ti+1). At first, the linear PD output is determined at the time point ti, which is associated with the data transition. Taking into account the PD output transitions and the previous clock transition, tj−1 (where tj−1 < ti), the proposed scheme defines a new analysis sub-window, [ti, tj + δt). In the new analysis sub-window, tj represents the new clock transition to be detected and δt defines a small offset necessary to determine the new transition. Within the newly defined window, the continuous time output of the LF-VCO is calculated (mostly around time tj), and from this output the new transition, tj, is detected through interpolation at nπ. At this point, the PD output state is updated again at tj. A similar continuous time output calculation for the LF-VCO is performed for the new sub-window, [tj, tj+1 + δt), to detect the next clock transition at tj+1. After that, the PD output state is updated again. This PD state updating and new clock transition detection continue until the final sub-window, [tj+N, ti+1), has been reached. The final sub-window, [tj+N, ti+1), appears after detecting the last clock transition, tj+N, and analyzing it does not provide any new transitions. It is worth mentioning that the PD will not change its state after the tj+1 time event until ti+1, and hence the final sub-window can be selected as [tj+1, ti+1) to detect the remaining CDR clock transitions.
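The detection of a clock transition by interpolation at nπ can be sketched as follows. The function name `next_clock_transition()`, the uniform marching step `dt`, and the assumption that the phase increases monotonically inside the sub-window are simplifications for illustration; the thesis only states that the LF-VCO output is calculated mostly around tj and then interpolated.

```python
import math


def next_clock_transition(phase, t_start, n, dt, t_limit):
    """Locate the time at which phase(t) crosses n*pi.

    phase:   callable returning the CDR clock phase at time t
    t_start: start of the analysis sub-window
    n:       index of the crossing (even: rising edge, odd: falling edge)
    dt:      marching step (an assumption of this sketch)
    t_limit: marching stops once the left sample reaches this time
    Returns the interpolated crossing time, or None if none is found.
    """
    target = n * math.pi
    t0, p0 = t_start, phase(t_start)
    while t0 < t_limit:
        t1 = t0 + dt
        p1 = phase(t1)
        if p0 < target <= p1:
            # Linear interpolation between the two bracketing samples.
            return t0 + (target - p0) * (t1 - t0) / (p1 - p0)
        t0, p0 = t1, p1
    return None
```

For a free-running 10 GHz clock, `phase = lambda t: 2 * math.pi * 10e9 * t`, the first crossing at π is found near 50 ps, which is the first falling edge under the even/odd convention stated earlier.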

As can be observed, the proposed modelling scheme takes advantage of the event scheduling flexibility of the OO simulation technique. This flexibility allows the events tj, tj+1, tj+2, . . . , tj+N to be scheduled after the data transition event, ti+1, has been estimated, even though ti+1 occurs later in signal time space. Maintaining the ascending order of time events, tj < tj+1 < tj+2 < · · · < tj+N < ti+1, is not necessary, since the data transition event, ti+1, is not going to change due to any variation in the CDR operation. Besides, the event scheduling flexibility avoids the otherwise inevitable repeated recalculation of the entire system in order to detect the clock transitions. The proposed modelling scheme can also be applied to a binary PD based CDR. In that case, the PD outputs related to data transitions do not take place; otherwise, the modelling for both


Algorithm 5.3 Modeling template of clock and data recovery for running OO simulation

classdef CDR < handle
    properties
        objTyp = ObjTyp.discrete   - Discrete object type
        inPort                     - Input object
        % Other internal properties not shown here
    end

    methods
        % Constructor called from the top-level script
        function obj = CDR
            - Construct the CDR object
            - Receive and verify all required inputs
        end

        % Method init() triggered by the object inPort
        function init(obj)
            - Define additional internal input variables
            - Calculate output at time, t = 0
            - Set the PD state = 0
            - Enlist itself to the event host
        end

        % Method receive() triggered by the object inPort
        function receive(obj)
            - Collect outputs from inPort at t_i+1
            % Determine all possible clock transitions ≤ t_i+1
            while true
                - Determine next clock transition, t_next
                if t_next ≥ t_i+1
                    break
                end
                - Accept t_next into the clock transition list
                - Update PD state at t_next
            end
            - Update PD state at t_i+1 % Not applicable for Binary PD
            - Erase unnecessary PD states from the PD transition list
        end

        % Method process() triggered from the event host
        function process(obj)
            if (process() is completed) || (clock transition list is empty)
                return
            end
            - Pass the first transition from the accepted transition list
            - Notify the next circuit object
        end
    end
end


[Figure 5.6: Demonstration of CDR clock transition calculation. The data eye (one or multiple UIs) between 2 consecutive transitions, ti and ti+1, is shown together with the intermediate calculation steps (tj + δt, tj+1 + δt, . . .) and the reconstructed PD and clock outputs, with clock transitions tj−1, tj, tj+1, · · · , tj+N.]


linear and binary PD based CDRs share the same modelling procedure. This also indicates that more advanced mixed signaling schemes can be described using the proposed CDR modelling concept.
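The control flow of receive() and process() in Algorithm 5.3 can be sketched in Python as below. This is only a skeleton under stated assumptions: the class name `CdrSketch` and the two callbacks `next_transition` and `update_pd` are hypothetical stand-ins for the LF-VCO transition search and the PD state update, and the event host is omitted entirely.

```python
class CdrSketch:
    """Skeleton of the receive()/process() loop of Algorithm 5.3."""

    def __init__(self, next_transition, update_pd):
        self.next_transition = next_transition  # callable(t_prev) -> next clock edge
        self.update_pd = update_pd              # callable(t, kind) -> None
        self.t_clock = 0.0                      # last accepted clock transition
        self.pending = []                       # accepted transitions not yet processed

    def receive(self, t_data):
        """Handle a new data transition at t_data (t_i+1 in the text)."""
        # Determine all clock transitions that occur before t_data.
        while True:
            t_next = self.next_transition(self.t_clock)
            if t_next >= t_data:
                break
            self.pending.append(t_next)
            self.t_clock = t_next
            self.update_pd(t_next, "clock")
        # Update the PD state at the data transition
        # (not applicable for a binary PD).
        self.update_pd(t_data, "data")

    def process(self):
        """Hand the oldest accepted clock transition to the next object."""
        if not self.pending:
            return None
        return self.pending.pop(0)
```

With a free-running stand-in such as `next_transition = lambda t: t + 50e-12`, a data transition at 230 ps causes four clock transitions to be accepted before the PD is updated at the data edge. In the actual scheme the next transition depends on the PD state, so the callback would consult the step response model instead.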

5.4 Performance Evaluation for the Proposed Modelling Scheme

The proposed step response based modelling scheme for CDR clock transitions in the OO simulation environment is compared here with the conventional time-step based modelling scheme. The objective of the comparison test is to evaluate its modelling accuracy against the conventional scheme before performing additional modifications to capture the CDR nonlinearity due to charge pump assisted LF-VCO saturation as well as DFF regeneration effects from the PD. A linear PD based CDR is selected for the test case.


[Figure 5.7: Test case block diagram for linear PD based CDR. Blocks: Clk, Source, Ch, CDR, BERT.]

Figure 5.7 shows the testbench employed to verify the accuracy of the proposed modelling scheme. Based on the block diagram, the transmitter end employs only a clock (marked as Clk) synchronous source, which generates a PRBS7 bit-stream (marked as Source) at 10 Gbps. The source output is then passed through the channel (shown as Ch). Here, the channel only adds a controlled delay, ∆t, without introducing any attenuation, since the test case is not designed to include any jitter effect due to an equalizer. The receiver side comprises only the linear PD based CDR followed by a bit error rate tester (BERT) for PRBS7. The CDR is designed with a loop filter bandwidth of ωLF = 20π × 10^6 rad/s and a phase margin of 53.1°. Based on the system specification, the CDR circuit design parameters are determined as IP = 20 µA, CP = 156.3 pF, RP = 1.6 kΩ, CA = 15.6 pF, KVCO = 3.1416 G·rad/(s·V) (considering the input VCO range to be 2 V), and KPD = 0.5/2π (following the linear CDR model).

Using the design parameters, the testbench (shown in Figure 5.7) was set up in both the time-step based and OO simulation environments. Both types of simulations were conducted for 10^4 bits over a duration of 1 µs, and outputs from various circuit nets of the system were collected for comparison. Key observable segments of the collected outputs are shown in the top sub-figure of Figure 5.8. Outputs for both the input data from the source and the CDR sampled data are shown under unlocked and locked conditions. The first row shows the input bit-stream generated by the PRBS7 source with a delay of 0.3 UI. The next row shows the CDR clock transitions collected from the time-step based simulation. The CDR clock transitions align well with tπ, t2π, t3π, t4π, t5π, . . . , marked on the CDR phase output, φOut(t), calculated using Equations 5.6 and 5.7. The PD output transitions are also shown between the two sub-plots of the CDR clock and φOut(t) to confirm the unlocked and locked situations. Applying the same equations used for φOut(t), the LF output, vLF(t), which controls the VCO frequency, can be reconstructed; the reconstructed output is shown in the bottom sub-figure for the complete 1 µs duration. As can be seen from the alignment markings on the figures,


[Figure 5.8: Proposed modeling measurement accuracy validation with respect to time-step based simulation measurement. (a) Clock transition demonstration: Data, CDR Clock, PD (+1, 0, −1), and φOut(t) with the markers t1π, t2π, t3π, t4π, t5π, · · · for the unlocked situation (range: 0 − 1.025 ns) and the locked situation (range: 0.999 µs − 1 µs). (b) Calculated VCO control, vLF(t), versus time, t, from 0 to 1 µs, showing the steady state of the LF output.]


outputs from both environments match without any distinguishable difference.
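The stimulus of this test case can be sketched as follows. At 10 Gbps one UI is 100 ps, so 10^4 bits span 1 µs, consistent with the simulation length above. The thesis does not state which PRBS7 polynomial or seed was used; the sketch assumes the commonly used x^7 + x^6 + 1 generator with an all-ones seed, and the helper names `prbs7()` and `transition_times()` are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (assumed polynomial x^7 + x^6 + 1)."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of the two tap bits
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F       # shift in the new bit
    return bits


def transition_times(bits, ui, delay, initial=0):
    """Convert a bit list into (time, new_level) data transition events.

    ui:      unit interval in seconds (100e-12 for 10 Gbps)
    delay:   channel delay in seconds (0.3 UI in the test case)
    initial: assumed line level before the first bit
    """
    events = []
    level = initial
    for i, bit in enumerate(bits):
        if bit != level:
            events.append((i * ui + delay, bit))
            level = bit
    return events
```

Only the transition events, not every bit, would be passed to a discrete time CDR object, which is what makes the linear PD case event driven on the data side.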

5.5 Summary

This chapter deals with CDR modelling on the basis of system-level nonlinearity. In the beginning, it is shown how the equivalent linear model deviates from the realistic behaviour of the CDR. Later, a new step response based CDR modelling concept is proposed, and it is explained how the modelling can be useful for capturing the nonlinearity of the CDR arising from LF-VCO saturation as well as DFF regeneration effects from the PD. Finally, the proposed model is compared with the conventional time-step model of the CDR for its calculation accuracy. It has also been shown how to model a linear PD based CDR, which is currently not feasible in the conventional event-driven simulation environment. However, the proposed modelling still needs to be verified against a realistic CDR implemented with transistor-level circuitry to demonstrate its true nonlinearity capturing capability; this is noted as part of the future work.

Chapter 6

Conclusion and Future Work

This chapter summarizes the overall thesis contributions as well as the future work that remains to be completed. Section 6.1 summarizes the three contributions presented in Chapters 3 to 5. Future work for each contribution is discussed in Section 6.2.

6.1 Thesis Contribution

This thesis deals with modelling transceiver circuitry and simulating it in a computationally efficient environment while accurately capturing circuit-level nonlinearity. The contributions of the thesis are summarized as follows.

• OO Simulation: The OO simulation scheme has been developed based on the notion of the conventional event-driven simulation platform to support operations for asynchronous circuitry. The proposed scheme addresses the incompatibility issue by introducing event scheduling flexibility. Even though asynchronous circuit operation is supported in the time-step based simulation environment, that simulation scheme is computationally inefficient, time consuming, and often unreliable due to convergence instability. The proposed OO simulator also improves the simulation speed for continuous time circuitry by calculating only the minimum time points required to describe the continuous time outputs. Since the circuit objects used in the simulation scheme are designed with initializing methods that calculate the initial conditions, the simulator does not have to randomly guess or deal with incorrect user-defined initial conditions. Thus the proposed system addresses the problem of convergence instability of a large circuit system by introducing generalized initialization methods.

• Equalizer Modelling: Even though equalization is mostly based on a linear transfer function (either from a continuous or a discrete time perspective), equalizers implemented at the transistor level rarely retain exact linearity. Since performing system-level simulation in a SPICE simulator with in-depth transistor-level information is not feasible for time consuming BER related studies, the modified step response based modelling has been proposed to deal with the transistor related nonlinearity. During the continuous time waveform formation, the proposed modelling technique determines the step response based on the current output states and transition patterns. This alteration in step response based modelling makes it possible to capture a number of nonlinearity factors,

such as transistor transconductance, output impedance, and input capacitance variations. The high modelling accuracy facilitates generating Spectre-like eye diagrams at the equalizer outputs. The modified step response based modelling not only achieves high accuracy but also maintains a high simulation speed that scales linearly (∼ 44 s for 1 million bits). Essentially, the proposed modelling scheme eliminates numerous repetitive, computationally intensive calculations by utilizing the key transistor-level information during the simulation, while maintaining a low processor memory footprint.

• CDR Modelling: The success of step response based modelling for equalizers is exploited for CDR nonlinearity modelling during its clock transition calculations. Conventional linear models of a CDR suffer from poor modelling accuracy for CDRs designed for high-speed applications because they assume continuity of the PD output calculation. Event-driven modelling improves the modelling accuracy by adopting clock phase-to-phase PD updates, but fails to incorporate the linear PD based CDR due to its asynchronous nature. The proposed modelling scheme addresses the issue of the asynchronous PD by utilizing the event scheduling flexibility offered by the OO simulator. In addition, the modelling technique sheds light on capturing the circuit-level nonlinearity of the CDR arising from the charge pump based LF-VCO saturation effect as well as the DFF regeneration of the PD. The nonlinearity capture using the proposed step response based modelling technique has not been performed due to time constraints.

6.2 Future Work

During the course of the thesis work, transceiver circuitry modelling has mainly been studied to seek computationally efficient ways to capture the nonlinearity with reasonable accuracy. The following potential studies can be conducted in order to further improve the circuit simulation performance.

• OO Simulation: The OO simulation scheme has been used to demonstrate how to incorporate asynchronous circuitry into an event-driven type simulation environment using its event scheduling flexibility. However, the proposed technique also sheds light on how to simulate multiple continuous time circuit systems with more computational efficiency than conventional time-step based simulators. This computational efficiency can be achieved by de-unionizing the time axis to select an individual activity time axis for each respective continuous time component. To prove the computational efficiency, a new example case with a reasonably large circuit system, which might involve multiple transceiver circuits connected either serially or in parallel, needs to be developed.

Even though incorporating event scheduling flexibility has enabled the aforementioned features, it can potentially reduce the simulation speed drastically due to excessive memory requirements in long simulation cases. As indicated at the end of Section 3.3.2, if a circuit object cannot process its input at the rate at which it receives it, the pending input information can overload the memory. Under such circumstances, the event scheduler can be made adaptive to prioritize certain processing events by communicating with the respective circuit objects.

• Equalizer Modelling: Modified step response based equalizer circuitry modelling has shown how to generate Spectre-like eye diagrams without utilizing a real transistor model. The technique

has also been used to demonstrate the potential simulation speed using the case of FFE based modelling in a C++ environment (presented as part of the OO simulation performance in Section 3.3.5). However, the true simulation speed still needs to be observed for the CTLE and DFE cases, although their simulation speed is expected to differ only slightly, given their similar calculation scheme and only slightly more complex step response scheme. A complete simulation speed characterization helps to establish the novelty of the proposed method compared to other available modelling schemes.

The proposed modelling scheme performs with excellent accuracy when the continuous time output from the equalizer contains residual ISI. Circuit nonlinearity related to residual ISI occurs in high-speed operation, but circuits designed for low-speed applications do not usually have residual ISI. This situation can be dealt with by introducing data-pattern dependent step response models, as proposed by Ren et al. [19]. To increase the range of operation of the step response based modelling technique, this data-pattern dependent scheme can be integrated with the proposed scheme.

• CDR Modelling: Step response based modelling for the CDR has been demonstrated and compared with the conventional model for validity. However, more work needs to be performed. The next phase of the task would be to implement a transistor-level CDR system applicable to high-speed operation. The architecture of the CDR should be selected such that the circuit has reasonably visible nonlinearity, as in the case of equalizers. The last phase would be to adopt a multiple step response based model for the CDR to represent its nonlinearity. Similar to the earlier case, the speed performance evaluation should also be conducted in a C++ environment.

Bibliography

[1] Cadence Design Systems Inc., SpectreHDL Reference.

[2] M. Van Ierssel, H. Yamaguchi, A. Sheikholeslami, H. Tamura, and W. W. Walker, “Event-driven modeling of CDR jitter induced by power-supply noise, finite decision-circuit bandwidth, and channel ISI,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 55, no. 5, pp. 1306–1315, 2008.

[3] J.-E. Jang, M.-J. Park, D. Lee, and J. Kim, “True event-driven simulation of analog/mixed-signal behaviors in SystemVerilog: A decision-feedback equalizing (DFE) receiver example,” in Custom Integrated Circuits Conference (CICC), 2012 IEEE, pp. 1–4, IEEE, 2012.

[4] J.-E. Jang, S.-J. Yang, and J. Kim, “Event-driven simulation of Volterra series models in SystemVerilog,” in Custom Integrated Circuits Conference (CICC), 2013 IEEE, pp. 1–4, IEEE, 2013.

[5] T. Flew, New media: An introduction. Oxford University Press, 2007.

[6] A. M. Odlyzko, “Internet traffic growth: Sources and implications,” in ITCom 2003, pp. 1–15,

International Society for Optics and Photonics, 2003.

[7] L. Atzori, A. Iera, and G. Morabito, “The internet of things: A survey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.

[8] G. E. Moore, “No exponential is forever: but “forever” can be delayed! [semiconductor industry],” in Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC. 2003 IEEE International, pp. 20–23, IEEE, 2003.

[9] L. Wilson, “International technology roadmap for semiconductors (ITRS),” Semiconductor Industry Association, 2013.

[10] A. Kagi, J. R. Goodman, and D. Burger, “Memory bandwidth limitations of future microprocessors,” in Computer Architecture, 1996 23rd Annual International Symposium on, pp. 78–78, IEEE, 1996.

[11] C. A. Palesko and E. J. Vardaman, “Cost comparison for flip chip, gold wire bond, and copper

wire bond packaging,” in 2010 Proceedings 60th Electronic Components and Technology Conference

(ECTC), pp. 10–13, IEEE, 2010.

[12] D. C. Kilper, G. Atkinson, S. K. Korotky, S. Goyal, P. Vetter, D. Suvakovic, and O. Blume, “Power trends in communication networks,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 17, no. 2, pp. 275–284, 2011.

[13] J. Fan, X. Ye, J. Kim, B. Archambeault, and A. Orlandi, “Signal integrity design for high-speed digital circuits: progress and directions,” IEEE Trans. Electromagn. Compat., vol. 52, no. 2, pp. 392–400, 2010.

[14] T. Palkert, “A review of current standards activities for high speed physical layers,” in Fifth International Workshop on System-on-Chip for Real-Time Applications (IWSOC’05), pp. 495–499, IEEE, 2005.

[15] R. Kollipara, B. Chia, F. Lambrecht, C. Yuan, J. Zerbe, G. Patel, T. Cohen, and B. Kirk, “Practical design considerations for 10 to 25 Gbps copper backplane serial links,” 2006.

[16] M. P. Li, M. Shimanouchi, and H. Wu, “Advancements in high-speed link modeling and simulation (an invited paper for CICC 2013),” in Custom Integrated Circuits Conference (CICC), 2013 IEEE, pp. 1–8, IEEE, 2013.

[17] K. Nichols, T. Kazmierski, M. Zwolinski, and A. Brown, “Overview of SPICE-like circuit simulation algorithms,” IEE Proceedings-Circuits, Devices and Systems, vol. 141, no. 4, pp. 242–250, 1994.

[18] R. Cottrell, “Event-driven behavioural simulation of analogue transfer functions,” in Proceedings of

the conference on European design automation, pp. 240–243, IEEE Computer Society Press, 1990.

[19] J. Ren and K. S. Oh, “Multiple edge responses for fast and accurate system simulations,” Advanced

Packaging, IEEE Transactions on, vol. 31, no. 4, pp. 741–748, 2008.

[20] M.-J. Park, H. Kim, M. Lee, and J. Kim, “Fast and accurate event-driven simulation of mixed-signal

systems with data supplementation,” in Custom Integrated Circuits Conference (CICC), 2011 IEEE,

pp. 1–4, IEEE, 2011.

[21] J. Jang and J. Kim, “PPV-based modeling and event-driven simulation of injection-locked oscillators

in SystemVerilog,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 62, no. 8,

pp. 1908–1917, 2015.

[22] P. Maffezzoni, L. Codecasa, and D. D’Amore, “Event-driven time-domain simulation of closed-loop

switched circuits,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions

on, vol. 25, no. 11, pp. 2413–2426, 2006.

[23] G. Balamurugan, B. Casper, J. E. Jaussi, M. Mansuri, F. O’Mahony, and J. Kennedy, “Modeling

and analysis of high-speed I/O links,” IEEE Transactions on Advanced Packaging, vol. 32, no. 2,

pp. 237–247, 2009.

[24] H.-H. Chuang, W.-D. Guo, Y.-H. Lin, H.-S. Chen, Y.-C. Lu, Y.-S. Cheng, M.-Z. Hong, C.-H. Yu,

W.-C. Cheng, Y.-P. Chou, et al., “Signal/power integrity modeling of high-speed memory modules

using chip-package-board coanalysis,” IEEE Transactions on Electromagnetic Compatibility, vol. 52,

no. 2, pp. 381–391, 2010.

[25] J. G. Proakis and M. Salehi, Digital Communications. McGraw-Hill Education, 2007.

[26] A. Kuo, T. Farahmand, N. Ou, S. Tabatabaei, and A. Ivanov, “Jitter models and measurement

methods for high-speed serial interconnects,” in Test Conference, 2004. Proceedings. ITC 2004.

International, pp. 1295–1302, IEEE, 2004.

[27] M. H. Van Ierssel, Circuit Techniques for High-Speed Serial and Backplane Signaling. PhD thesis,

University of Toronto, 2007.

[28] F. N. Najm, Circuit simulation. John Wiley & Sons, 2010.

[29] S. Joeres, H.-W. Groh, and S. Heinen, “Event driven analog modeling of RF frontends,” in Behavioral

Modeling and Simulation Workshop, 2007. BMAS 2007. IEEE International, pp. 46–51, IEEE, 2007.

[30] S. Ferg, “Event-driven programming: introduction, tutorial, history.”

http://eventdrivenpgm.sourceforge.net/, 2006. Online; accessed 10 July 2016.

[31] A. Sanders, M. Resso, and J. D’Ambrosia, “Channel compliance testing utilizing novel statistical

eye methodology,” DesignCon 2004, 2004.

[32] B. Gustavsen and A. Semlyen, “Rational approximation of frequency domain responses by vector

fitting,” IEEE Transactions on Power Delivery, vol. 14, no. 3, pp. 1052–1061, 1999.

[33] E. Kindler and I. Krivy, “Object-oriented simulation of systems with sophisticated control,”

International Journal of General Systems, vol. 40, no. 3, pp. 313–343, 2011.

[34] The MathWorks, Inc., “Handle class.” http://www.mathworks.com/help/matlab/ref/handle-class.html,

2016. Accessed: 2016-07-17.

[35] M. Kossel, C. Menolfi, J. Weiss, P. Buchmann, G. Von Bueren, L. Rodoni, C. Morf, T. Toifl, and

M. Schmatz, “A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with 16

dB return loss over 10 GHz bandwidth,” Solid-State Circuits, IEEE Journal of, vol. 43, no. 12,

pp. 2905–2920, 2008.

[36] C. Menolfi, J. Hertle, T. Toifl, T. Morf, D. Gardellini, M. Braendli, P. Buchmann, and M. Kossel,

“A 28Gb/s source-series terminated TX in 32nm CMOS SOI,” in Solid-State Circuits Conference Digest

of Technical Papers (ISSCC), 2012 IEEE International, pp. 334–336, IEEE, 2012.

[37] T. C. Carusone, D. A. Johns, and K. Martin, Analog integrated circuit design. John Wiley & Sons,

2012.

[38] B. Razavi, Design of analog CMOS integrated circuits. McGraw-Hill Education, 2001.
