by Alif Zaman - tspace.library.utoronto.ca · Alif Zaman Master of Applied Science Graduate Department of Electrical and Computer Engineering University of Toronto 2017 This thesis
Transceiver Modelling for High-Speed Serial Links
by
Alif Zaman
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering, University of Toronto
Chapter 2. Background

Re-arrangement in KCL Equation 2.3 allows us to observe two major representations: one is of static
components, such as resistors, voltages, and current sources, and the other is for dynamic components,
which can be capacitive or inductive. Each term in the equation is considered as a function of voltage, v,
and time, t, to show device level nonlinearity and time variations. Certain devices, such as transistors,
which have more than two terminals, are broken down into equivalent multiple two-terminal devices.
Writing such a KCL equation for every circuit node gives a system of equations of the form
$A\vec{x} - \vec{b} = 0$. The matrix, $A$, contains the known values of resistance, capacitance, inductance,
and controlled source gain coefficients. The vector, $\vec{b}$, is formed from the known currents and
voltages of the independent sources. All unknown quantities related to node voltages and net currents
are collected in the vector, $\vec{x}$. Before solving for the unknowns, the expression for $f_{Node}(v, t)$ is
further simplified by applying a finite difference approximation: any derivative of a quantity y is replaced
by dy/dt ≈ (y(t+∆t) − y(t))/∆t, where ∆t represents appropriately selected time-steps based on system
convergence requirements. This allows us to linearize the system before solving it algebraically.
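As a small illustration of the finite difference step, the sketch below applies the forward difference to a single-node RC circuit driven by a constant current source. The circuit, its component values, and the function name `simulate_rc` are assumptions chosen for illustration and are not taken from the thesis.

```python
def simulate_rc(r_ohm, c_farad, i_src, dt, n_steps, v0=0.0):
    """Forward-difference solution of C*dv/dt + v/R - I = 0 (hypothetical RC node)."""
    v = v0
    trace = [v]
    for _ in range(n_steps):
        dv_dt = (i_src - v / r_ohm) / c_farad  # KCL rearranged for dv/dt
        v = v + dt * dv_dt                     # dv/dt ~ (v(t+dt) - v(t)) / dt
        trace.append(v)
    return trace


# 1 kOhm and 1 nF (time constant 1 us), 1 mA source, 10 ns step, 10 time constants
trace = simulate_rc(1e3, 1e-9, 1e-3, 10e-9, 1000)
print(trace[-1])  # settles close to I*R = 1 V
```

The time step here is one hundredth of the RC time constant; a step that is large relative to the time constant would make this explicit update inaccurate or unstable, which is one way the time-step sensitivity discussed in this section shows up.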
The system of equations can now be solved directly by inverting the matrix, or iteratively through
making an initial guess. Even though the direct inversion method can be adopted for smaller circuit
systems, the iterative approach is usually preferred to maximize the processing capability for solving
large circuit systems. Among various iterative solving methods, Newton-Raphson’s method (NR) is
one of the most commonly applied methods. Equation 2.4 describes the iterative approach of the NR
method:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{2.4}$$

$$e_{n+1} = (x_{n+1} - x_n) \le \mathrm{Max}\{\mathrm{RelTol}(x_{n+1}),\ \mathrm{AbsTol}\} \tag{2.5}$$
As can be seen from Equation 2.4, a function, $f(x_n)$, can be solved if its first-order derivative is
non-zero, $f'(x_n) \neq 0$. Since the circuit system is usually nonlinear and dynamic in nature, its
system of equations satisfies the required conditions for the NR method. After obtaining the new value,
$x_{n+1}$, from $x_n$, the error for the new value, $e_{n+1}$, can be calculated, as shown in Equation
2.5. For every new value of $x_n$, the corresponding error, $e_{n+1}$, is estimated, and the error value
goes down as the number of iterations increases. It is also worth mentioning that during every new
iteration, new operating points for all nonlinear circuit components need to be obtained, meaning that
the system of equations changes for every new error value, $e_{n+1}$.
The process of iterations continues until the error value drops below a preset simulation error limit.
The preset error limit is the maximum of the relative error (referred to as RelTol) and the absolute error
(referred to as AbsTol), as shown in Equation 2.5. Relative error is defined as a function of the new value,
$x_{n+1}$, through the relation $\mathrm{RelTol}(x_{n+1}) = |x_{n+1} - x_n| / \mathrm{Min}\{x_{n+1}, x_n\}$.
When the value is much larger than zero ($x_n \gg 0$), the relative error dominates the error calculation.
When the value is close to zero, however, a purely relative limit shrinks towards zero as well, so without
the absolute tolerance it is not possible to achieve convergence using the NR method. For systems of
equations, multiple error values need to be calculated for multiple circuit nodes. In such cases, the
maximum of all calculated error values is used for the error limit comparison.
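The iteration and its stopping rule can be sketched for a single unknown as follows. This is a minimal example under stated assumptions: the stopping limit uses a common SPICE-like form, max(reltol · max(|x_new|, |x_old|), abstol), rather than the exact RelTol definition given above, and the diode test circuit, its parameter values, and the function name `newton_raphson` are hypothetical.

```python
import math


def newton_raphson(f, df, x0, reltol=1e-3, abstol=1e-12, max_iter=100):
    """Scalar Newton-Raphson with a max(relative, absolute) stopping limit."""
    x = x0
    for n in range(1, max_iter + 1):
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("f'(x_n) = 0: the NR update is undefined")
        x_new = x - f(x) / slope                    # update of Equation 2.4
        err = abs(x_new - x)                        # error of the new value
        # Assumed SPICE-like limit, not the thesis's exact RelTol definition
        limit = max(reltol * max(abs(x_new), abs(x)), abstol)
        x = x_new
        if err <= limit:
            return x, n
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical test circuit: 1 V source, 1 kOhm resistor, diode to ground
V_S, R, I_S, V_T = 1.0, 1e3, 1e-14, 0.025852


def f(v):
    """KCL at the diode node: resistor current minus diode current."""
    return (V_S - v) / R - I_S * (math.exp(v / V_T) - 1.0)


def df(v):
    """Analytic first derivative of f."""
    return -1.0 / R - (I_S / V_T) * math.exp(v / V_T)


v_node, n_iter = newton_raphson(f, df, x0=0.6, reltol=1e-6)
print(f"v = {v_node:.4f} V after {n_iter} iterations")
```

For a system of equations, the same check would be applied per node and the largest error compared against the limit, as described above.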
It can be inferred from the above discussion that the ODE based modelling scheme allows us to use
any arbitrary input signal, since the output for each circuit node is calculated at every time point. It can
also be clearly seen that the calculation scheme is computationally intensive. As the system size grows,
the system of equations (which is a square matrix in order to have a unique solution set) grows, and the
cost of matrix inversion increases steeply (roughly cubically with the number of unknowns for dense
direct methods). Determining the initial guess, $x_0$, can be troublesome, because the system needs to
intelligently pick the values to ensure convergence; otherwise, the initial guess needs to be provided
manually for each circuit node by the user. Convergence failure can also arise from not selecting the
proper time step, ∆t. Just as larger time steps lead to larger errors, certain continuous functions, such as
tanh(x), whose derivative approaches zero in saturation, may often cause convergence failure due to
improper time point selection.
2.5.2 Pulse Response Based Modelling
Modelling continuous time circuit behaviour using pulse responses requires representing the transmitted
binary signal as a sum of input pulses, each multiplied by its transmitted symbol. This is one
of the continuous time component modelling techniques whose operating principles closely resemble
the event-driven simulation scheme. The modelling scheme is also employed to generate statistical
eye diagrams, a technique which can be used to create eye diagrams with integrated statistical PDF
information without running time-consuming transient simulations [31]. The rest of the section explains
the core concept of how to generate a continuous-time output waveform from the recorded pulse
response.
Figure 2.11a depicts an example pulse response, p(t), that can be acquired from simulation or labora-
tory environment. Its input is a rectangular pulse of unit amplitude Π(t). Here, a rectangular unit pulse,
Π(t), is defined as 1 within the pulse duration of a bit period, Tb. From the recorded pulse response of
indefinite duration, a conspicuous segment of pulse response, pExt(t), can be extracted, which is mostly
non-zero within the chosen range [0, tExt], but approximately zero otherwise. The collected samples
outside the range do not contribute noticeably to any calculation and hence, they are considered to be
zero. This allows us to write the pulse response, p(t), as presented in Equation 2.7:
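$$\Pi(t) = \begin{cases} 1 & \text{if } 0 \le t \le T_b \\ 0 & \text{otherwise} \end{cases} \tag{2.6}$$

$$p(t) = \begin{cases} p_{Ext}(t) & \text{if } 0 \le t \le t_{Ext} \\ 0 & \text{otherwise} \end{cases} \tag{2.7}$$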
In order to estimate the continuous time output, y(t), using the pulse response, any bit-stream input,
x(t), (Figure 2.11b), can be written as follows:
$$x(t) = \lim_{N \to \infty} \sum_{i=1}^{N} b_i \cdot \Pi(t - iT_b) \tag{2.8}$$

where $b_i \in \{A_{-1}, A_{+1}\}$ and $i = 1, 2, 3, \ldots, N$. $A_{-1}$ and $A_{+1}$ represent the amplitudes of the two binary
logic states 0 and 1 respectively. Using x(t), the continuous time output, y(t), can be determined through
convolution with the impulse response of a continuous time system, c(t).
$$y(t) = x(t) * c(t) = \underbrace{\lim_{N \to \infty} \sum_{i=1}^{N} b_i \cdot p(t - iT_b)}_{\text{Simulation length dependent, } O(N^2)} \tag{2.9}$$
where the pulse response is defined in relation to the impulse response, c(t), as p(t) = Π(t) ∗ c(t).
The top plot of Figure 2.11b shows an arbitrary binary bit-stream waveform with sharp transitions in a
continuous time-frame. The figure demonstrates how the shifted versions of the pulse responses (shown
in the middle section) are summed to generate the desired output response, y(t) (shown in the bottom
section).

[Figure 2.11: Continuous time waveform formation using pulse-response based modelling. Panels: (a) Pulse response extraction, showing p(t) with the extracted segment pExt(t) over [0, tExt]; (b) Waveform formation, showing the random bit-stream, x(t), the shifted pulse responses, bi · pExt(t − iTb), and the summed output, y(t).]
As can be seen in Equation 2.9, the summation must be executed over all transmitted bits throughout
the entire simulation. As a result, the complexity of the implemented algorithm for an N-bit
long simulation grows with O(N^2). In other words, simulation time grows quadratically, even before
computational storage requirements are considered, which is undesirable. To bring the complexity to O(N),
the definition of the pulse response, p(t), presented in Equation 2.7, is exploited to reduce the number of
transmitted bits to be summed to a constant, k. In that case, the equation for calculating the continuous-time
output, y(t), becomes,
$$y(t) = \underbrace{\lim_{N \to \infty} \sum_{i=N-k+1}^{N} b_i \cdot p_{Ext}(t - iT_b)}_{\text{Simulation length independent, } O(N)} \tag{2.10}$$
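The difference between the full sum of Equation 2.9 and the windowed sum of Equation 2.10 can be sketched on sampled data. The sketch assumes a pulse response sampled at `spb` samples per bit, bits indexed from 0 rather than 1, and symbol amplitudes of ±1; the function names and sample values are hypothetical.

```python
import math


def output_full(bits, p_ext, spb):
    """Equation 2.9 style: every output sample loops over all N bits."""
    n_out = len(bits) * spb
    y = [0.0] * n_out
    for n in range(n_out):
        for i, b in enumerate(bits):
            m = n - i * spb                  # sample index inside the shifted pulse
            if 0 <= m < len(p_ext):          # p(t) is zero outside [0, tExt]
                y[n] += b * p_ext[m]
    return y


def output_windowed(bits, p_ext, spb):
    """Equation 2.10 style: only the k most recent bits are summed."""
    k = math.ceil(len(p_ext) / spb)          # bits spanned by the extracted segment
    n_out = len(bits) * spb
    y = [0.0] * n_out
    for n in range(n_out):
        newest = n // spb                    # latest bit that has started by sample n
        for i in range(max(0, newest - k + 1), newest + 1):
            m = n - i * spb
            if m < len(p_ext):
                y[n] += bits[i] * p_ext[m]
    return y


# Hypothetical sampled pulse response (2.5 bit periods long) and symbol stream
p_ext = [0.0, 0.2, 0.6, 0.9, 1.0, 0.7, 0.4, 0.2, 0.1, 0.05]
bits = [+1, -1, -1, +1, +1, -1, +1, -1]
y_full = output_full(bits, p_ext, spb=4)
y_win = output_windowed(bits, p_ext, spb=4)
print(max(abs(a - b) for a, b in zip(y_full, y_win)))  # the two sums agree
```

The inner loop of `output_windowed` runs at most k times regardless of how many bits have been transmitted, which is what makes the per-sample cost independent of the simulation length.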
One of the major concerns regarding Equation 2.10 is that the summation needs to be executed at
every fixed bit duration, Tb. In reality, a transmitted bit-stream often contains various effects, such as
clock jitter and amplitude variation due to equalization, for example from a feed forward equalizer (FFE),
so the system cannot always be treated as linear. Capturing such behaviour requires pulse
responses of various amplitudes as well as durations, and this can make the algorithm very complex.
2.5.3 Step Response Based Modelling
Similar to pulse response based modelling, another continuous time modelling technique for event-driven
simulation is step response based modelling. In step response based modelling, a continuous time
waveform is estimated using the collected step response instead of the pulse response. A key advantage
of step response based modelling over pulse response based modelling is that the summation needs to be
executed only when a transition occurs. Since the algorithmic summation happens only at the transitions
of the transmitted bit-stream, the number of calculations is always less than or, in the worst case,
equal to that of pulse response based simulation, assuming the time vectors for both cases are of the same
length. The rest of the section presents how to apply the step response to calculate the continuous time
waveform with the aid of Figure 2.12.
The step response, s(t), is recorded by applying the unit step, u(t), as input to the continuous time
system of interest. As can be noticed from Figure 2.12a, a conspicuous segment of the step response,
sExt(t), can be extracted within the time range, [0, tExt], as in the case of the pulse response, p(t). Outside
this range, the step response, s(t), is 0 at the initial stage (when t < 0), and beyond the time range
(when t > tExt) it can be considered as a constant, s∞. The expression for the step response, s(t), is described
in Equation 2.12.
[Figure 2.12: Continuous time waveform formation using step-response based technique. Panels: (a) Step response extraction, showing s(t) with the extracted segment sExt(t) over [0, tExt] and the final value s∞; (b) Waveform formation, showing the random bit-stream, x(t), with transitions at t1, t2, t3, the shifted step responses, (αi − αi−1) · sExt(t − ti), and the summed step response, y(t).]
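$$u(t) = \begin{cases} 1 & \text{if } t \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.11}$$

$$s(t) = \begin{cases} s_{Ext}(t) & \text{if } 0 \le t \le t_{Ext} \\ s_{\infty} & \text{if } t > t_{Ext} \\ 0 & \text{otherwise} \end{cases} \tag{2.12}$$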
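A minimal sketch of how the step response of Equation 2.12 can be superposed is given below. It follows the shifted-step form labelled in Figure 2.12b, (αi − αi−1) · s(t − ti), works on integer sample indices, and assumes that the system has settled at the initial level before the first transition. The function names and sample values are hypothetical.

```python
def step_sample(m, s_ext, s_inf):
    """Sampled form of Equation 2.12: 0 before the step, s_ext inside, s_inf after."""
    if m < 0:
        return 0.0
    if m < len(s_ext):
        return s_ext[m]
    return s_inf


def waveform(n_samples, a_init, transitions, s_ext, s_inf):
    """Sum one scaled, shifted step response per transition.

    transitions is a list of (sample_index, new_level) pairs in increasing order.
    """
    y = []
    for n in range(n_samples):
        value = a_init * s_inf               # assumed settled initial level
        prev = a_init
        for n_i, alpha_i in transitions:
            value += (alpha_i - prev) * step_sample(n - n_i, s_ext, s_inf)
            prev = alpha_i
        y.append(value)
    return y


# Hypothetical step response that settles to 1.0 within five samples
s_ext = [0.0, 0.4, 0.7, 0.9, 1.0]
y = waveform(20, a_init=-1.0, transitions=[(2, +1.0), (10, -1.0)],
             s_ext=s_ext, s_inf=1.0)
print(y[3], y[9], y[19])  # rising edge in progress, settled high, settled low
```

The inner loop runs once per transition rather than once per bit, so long runs of identical bits add no work.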
Calculating the continuous time output, y(t), involves convolving a random bit-stream, x(t), with the
impulse response of the continuous time system, c(t). In this case, the random bit-stream,
x(t), needs to be defined in terms of transition states, αi, as defined by Equation 2.13. Each transition
state, αi, occurring at transition time, ti, is defined as $\alpha_i \in \{A_{-1}, A_{+1}\}$, but $\alpha_i \neq \alpha_{i-1}$, where
Figure 3.5: Processed data format comparison for the cases of discrete time and continuous time objects
Chapter 3. Proposed Simulation Method for Analog-Mixed Signal Analysis 41
Regarding the time sequence, TD, all consecutive time events, ti's, are generated in such a way
that they are progressively increasing, 0 < t1 < t2 < · · · < tN, during the simulation. This is in
contrast with the situation in conventional time-step based simulation, where if the operation of any
discrete time component depends on the transition points of a continuous time component as an input
source, the simulator often has to move back and forth along the time axis to find the transition points,
which causes certain calculations to be performed repeatedly (see [1] for details). In the proposed
scheme, the progressively increasing order of the time sequence elements, $(t_i - t_{i-1})_{i=1}^{N} > 0$, can be
maintained throughout the simulation, because circuit objects can be at different signal times
simultaneously at any given simulation time.
Continuous Time Objects
For continuous time components, the continuous output waveform needs to be represented with multiple
discrete time outputs spaced at reasonable time-steps. Continuous-like discrete time output has a number
of applications in transceiver simulations, such as eye diagram generation, jitter measurement, and
asynchronous circuit block simulation. In order to support such continuous-like outputs in event-driven
simulation mode, a continuous time waveform, $W_C$, recorded for an interval, $[0, t_N]$, can be visualized
as multiple wavelets, which are recorded for smaller intervals, $0, (0, t_1], (t_1, t_2], \ldots, (t_{N-1}, t_N]$. Figure
3.5 shows how such continuous time wavelets with relevant intervals can be aligned with their respective
discrete time outputs. As in the case of discrete time objects, the initial output, $w_0$, of a continuous time
object, recorded at t = 0, is a scalar. For the rest of the cases, any wavelet, $w_i$, which is itself
a sub-sequence, $(w_{k,i})_{k=1}^{K_i}$, recorded at the corresponding time sub-sequence, $(t_{k,i})_{k=1}^{K_i}$, is comparable to
the respective discrete output event, $s_i$, at $t_i$. Discrete time points for any i-th wavelet, $t_{k,i}$'s, are selected
from the respective interval, $(t_{i-1}, t_i]$. Inside the time sub-sequence, each element, $t_{k,i}$, is incremented
from the previous element, $t_{k-1,i}$, by its respective time step, $\Delta t_{k,i}$.
Depending on the simulation requirements, time steps can be picked as constant or variable. A
constant time step can be useful for generating an eye diagram, because the generation process involves
sampling at a fixed time step. The time step, $\Delta t_{k,i}$, can be kept constant within a time sub-sequence, but
might need to be varied over the entire simulation period, because the time intervals, $(t_{i-1}, t_i]$, might not
all be exact multiples of the initially chosen time step. A variable time step can be useful for detecting
transition points, such as zero-crossings, to improve simulation speed. Regardless, all time points
in $(t_{k,i})_{k=1}^{K_i}$ must be selected within the given interval, $(t_{i-1}, t_i]$, to avoid non-causality effects. This
segmentation process enables running continuous time components in event-driven mode.
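One possible way to generate the time sub-sequence of a wavelet is sketched below. It uses a constant step inside the interval and shortens the last step so that the final point lands exactly on the event, which is only one interpretation of the constant-step case described above. Times are expressed as integer picoseconds to avoid rounding issues; the function name `wavelet_times` and the values are hypothetical.

```python
def wavelet_times(t_prev, t_i, dt):
    """Time points t_{k,i} inside (t_prev, t_i], constant step dt, ending at t_i."""
    if not t_i > t_prev:
        raise ValueError("time events must be strictly increasing")
    times = []
    k = 1
    while t_prev + k * dt < t_i:
        times.append(t_prev + k * dt)    # regular points at the chosen step
        k += 1
    times.append(t_i)                    # last point lands exactly on the event
    return times


print(wavelet_times(0, 100, 30))    # [30, 60, 90, 100]
print(wavelet_times(100, 190, 30))  # [130, 160, 190]
```

No point is ever placed at or before `t_prev` or beyond `t_i`, so each wavelet only uses information from its own interval.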
3.3 Performance Evaluation of OO Simulation in Case Studies
We evaluate the performance of OO simulation through examples in this section. Section 3.3.1 studies
the simulation steps in detail for a simple example case. Section 3.3.2 analyzes this example case in
terms of the order in which circuit objects are processed, for various combinations. Section 3.3.3
discusses the situation in which the system has feedback. Finally, Section 3.3.4 explains how the
proposed OO simulation can be implemented in a parallel processing environment.
[Figure 3.6: Simulation test case study for OO simulation. Panels: (a) Block-level schematic, with a random bit-stream generator (clock and PRBS) at the transmitter, a channel, and a slicer at the receiver, and the signals VClk, VRBG, VCh, and VRx; (b) Output waveforms with added markings, plotting VClk, VRBG, VCh, and VRx against signal time, t, with the time events t1 to t5 and t1a to t5a and the simulation times at which they are generated.]
3.3.1 Example Case Study
Figure 3.6 shows the block-level testbench of a simple transceiver circuit used for the evaluation and its
corresponding output waveforms generated from the simulation, with added markings for future reference.
The transceiver employs three major blocks: a random bit-stream generator (RBG) (at the transmitter
side), a data transmission channel, and a slicer (at the receiver side). The RBG consists of a PRBS
generator fed by a clock, producing a synchronous binary output. In the waveform plot, the horizontal
axes for all cases represent time, t. The second plot from the top shows a sample output waveform from
the RBG, VRBG, which is generated at the transitions (both rising and falling) of the clock, VClk. The third
plot depicts the output waveform of the channel, VCh, calculated at a sample rate much higher than the
clock frequency to demonstrate its continuous nature. The task of the slicer is to produce binary data
corresponding to its continuous input. The fourth plot shows the binary output of the slicer, VRx.
During the OO simulation, all major blocks are modelled as independent objects. Table 3.3 presents
object-specific properties, which are required during the object construction.
Table 3.3: Description of object-specific properties for the selected object-oriented simulation case

| Object | Property | Value |
|---|---|---|
| RBG | objTyp | It is defined as ObjTyp.discrete by default, since the representative component is of discrete type. |
| | inPort | It is kept empty, [], since the component has no input source. |
| | clkPeriod | It defines the period of the internal clock. Since the PRBS flip-flops are configured for both-edge operation, it is set to twice a UI. |
| | prbsState | It defines the output binary states of the PRBS flip-flops. It is a vector whose length is determined by the number of flip-flops. Elements in the vector are updated at every clock phase based on the PRBS polynomial expression. |
| Channel | objTyp | It is defined as ObjTyp.continuous, because slicer operation requires continuous time wavelets. |
| | inPort | It contains the handle information related to the object RBG. |
| | modelInfo | It is used to describe the channel model. Its content depends on how the channel is modelled for simulation. For instance, it contains step response amplitude and timing information if the object is modelled based on step responses. |
| Slicer | objTyp | It is defined as ObjTyp.discrete, because the slicer outputs only contain transition events of the channel. |
| | inPort | It contains the handle information of the object channel. |
| | threshold | It defines the assigned threshold to make the binary decision. Here, its value is set at the mid-point of the VClk amplitude range. |
The above table does not cover certain properties, such as evtHost, lastIn <Time>,
and lastIn <State>. These properties are discussed earlier in Table 3.1. Simulation-specific methods
(except isComplete()) for all objects are presented as follows.
Table 3.4: Simulation-specific properties of all objects for the selected object-oriented simulation case

Object: RBG

Method init(): This method defines the properties lastOut <Time> and lastOut <State> to represent its initial output.

Method receive(): Since the object does not have any input source, this method does not exist.

Method process(): This method is responsible for generating the output. It calculates the PRBS output with respect to the clock transitions. Its routine can be described as follows.

    function process()
        while no PRBS transition is detected
            - Use clkPeriod to define the next clock transition
            - Perform the PRBS operation
            - Update the prbsState
        end
        - Update lastOut <Time> and lastOut <State>
        - Notify the Channel to receive its output
    end

Object: Channel

Method init(): It calculates the initial output at t = 0 based on the initial state of the RBG.

Method receive(): This method receives the processed output from the RBG object and appends it to the properties lastIn <Time> and lastIn <State>.

Method process(): Its task is to calculate the continuous time output.

    function process()
        if no input information is available
            - Hold on to the state
            return
        end
        - Calculate its continuous time output
        - Store the output at lastOut <Time> and lastOut <State>
        - Notify the Slicer to receive its output
    end

Object: Slicer

Method init(): It defines the initial state of the slicer based on the initial output of the channel.

Method receive(): This method receives the continuous time output from the channel and then identifies the slicing locations on the continuous time waveform.

    function receive()
        - Receive continuous time output from channel
        % Pre-process
        - Detect the threshold-crossing locations, t_i's
        - Assign output states for all t_i's
        - Append t_i's and their states to lastOut <Time> and lastOut <State>
    end

Method process(): The task of this method is to generate time events based on the detected crossing locations.

    function process()
        if lastOut <Time> is empty
            return
        end
        - Generate a time event using the first crossing
        - Remove the first crossing
    end
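The RBG routine can be sketched in Python as follows. This is a sketch under stated assumptions rather than the implementation used in this work: the PRBS polynomial is assumed to be x^7 + x^6 + 1 (PRBS7), time is kept in integer units, the class and attribute names are hypothetical Python counterparts of the properties in Table 3.3, and the notification of the channel is omitted.

```python
class RBG:
    """Hypothetical RBG object: a PRBS7 generator clocked on both clock edges."""

    def __init__(self, ui, seed=0x7F):
        if (seed & 0x7F) == 0:
            raise ValueError("an all-zero PRBS state never toggles")
        self.clk_period = 2 * ui             # both-edge operation: one bit per half period
        self.prbs_state = seed & 0x7F        # states of the 7 flip-flops
        self.clock_time = 0                  # signal time of the latest clock edge
        # init(): initial output at t = 0
        self.last_out_time = [0]
        self.last_out_state = [self.prbs_state & 1]

    def _shift(self):
        """One PRBS step for the assumed polynomial x^7 + x^6 + 1."""
        feedback = ((self.prbs_state >> 6) ^ (self.prbs_state >> 5)) & 1
        self.prbs_state = ((self.prbs_state << 1) | feedback) & 0x7F
        return self.prbs_state & 1

    def process(self):
        """Step through clock edges until the output toggles, then record that event."""
        previous = self.last_out_state[-1]
        while True:
            self.clock_time += self.clk_period // 2   # next clock transition
            bit = self._shift()
            if bit != previous:
                break
        self.last_out_time.append(self.clock_time)
        self.last_out_state.append(bit)
        return self.clock_time, bit          # a full object would notify the Channel here


rbg = RBG(ui=100)
events = [rbg.process() for _ in range(5)]
print(events)  # (signal time, new state) pairs, one per output transition
```

Each call to `process()` produces exactly one time event, no matter how many clock edges pass without a transition, which is how a run of identical bits collapses into a single event.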
As can be seen, the receive() methods of all objects perform the task of receiving input (if an input source is
available), except for the slicer, in which case the method also performs some pre-processing to simplify
the task of the slicer process(). It is intuitive for the slicer to save only the detected threshold-crossings
instead of the entire continuous time waveform from the channel, since this reduces the memory requirement
for slicer operation.
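The pre-processing step of the slicer can be sketched as a threshold-crossing search over one received wavelet. Locating the crossing by linear interpolation between neighbouring samples is an assumption made for this sketch, as are the function name `detect_crossings` and the sample values.

```python
def detect_crossings(prev_point, times, values, threshold):
    """Find threshold crossings in one wavelet by linear interpolation.

    prev_point is the (time, value) sample that closed the previous wavelet.
    Returns a list of (crossing_time, new_state) pairs, new_state being 0 or 1.
    """
    crossings = []
    t0, v0 = prev_point
    for t1, v1 in zip(times, values):
        below0 = v0 < threshold
        below1 = v1 < threshold
        if below0 != below1:                       # the two samples straddle the threshold
            frac = (threshold - v0) / (v1 - v0)
            crossings.append((t0 + frac * (t1 - t0), 0 if below1 else 1))
        t0, v0 = t1, v1
    return crossings


# Hypothetical wavelet: rises through 0 between t = 1 and 2, falls between t = 3 and 4
found = detect_crossings((0.0, -1.0), [1.0, 2.0, 3.0, 4.0], [-0.5, 0.5, 1.0, -1.0], 0.0)
print(found)  # [(1.5, 1), (3.5, 0)]
```

Only the crossing times and states are returned, so the wavelet itself can be dropped once it has been scanned.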
Table 3.5 explains step-by-step how the simulation is conducted in OO mode. As mentioned in
Section 3.1.2, the order in which objects are listed does not matter for running OO simulation. Therefore,
let us assume the objects are listed in the following order: a. slicer, b. channel, and c. RBG. Based on
Algorithm 3.1, the simulation steps are described as follows:
Table 3.5: Explanation of simulation steps for the selected object-oriented simulation case (for simulation time 0 - 4)

| Simulation Time | Object | Action |
|---|---|---|
| 0 (initialization phase) | All | At the initial step, all circuit objects are initialized. Since the RBG has no input source, its method init() performs the initialization independently at first. Next, the channel is initialized by its init() based on the initial state of the RBG. Later, the slicer performs its initialization similarly, based on the initial output at t = 0 of the channel. |
| 1 | a. Slicer | Since no input is available for processing, its process() declares a hold state. |
| | b. Channel | Since no input is available for processing, its process() declares a hold state. |
| | c. RBG | Its process() generates the first output at time event, $t_1$. |
| 2 | a. Slicer | Since no input is available for processing, its process() still remains at the hold state. |
| | b. Channel | Its process() produces the continuous time output sequence for $(t_{k,1})_{k=1}^{K_1}$, where $t_{k,1} \in (0, t_1]$. |
| | c. RBG | Its process() generates the output at time event, $t_2$. |
| 3 | a. Slicer | Its process() goes over the received waveform recorded at $(t_{k,1})_{k=1}^{K_1}$, but detects no transition at the defined threshold, and hence no output is generated. |
| | b. Channel | Its process() produces the continuous time output sequence for $(t_{k,2})_{k=1}^{K_2}$, where $t_{k,2} \in (t_1, t_2]$. |
| | c. RBG | Its process() generates the output at time event, $t_3$. |
| 4 | a. Slicer | Its process() detects the first transition event at $t_{1a}$. |
| | b. Channel | Its process() produces the continuous time output sequence for $(t_{k,3})_{k=1}^{K_3}$, where $t_{k,3} \in (t_2, t_3]$. |
| | c. RBG | Its process() generates the output at time event, $t_4$. |
As can be observed, at simulation time 4 all circuit objects are processing their received input
information and generating outputs. This pattern of processing for all circuit objects repeats at
all future simulation times, 5, 6, 7, . . . , until the process() methods of all the circuit objects stop
processing. When a circuit object has processed up to the end of the signal time, it stops
processing, and this scenario is similar to that of a hold state. In this case, the RBG finishes processing
first, then the channel does, and lastly the slicer does. When all process() methods stop processing,
the OO simulator terminates completely.
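The activation pattern described above can be reproduced with a highly simplified model of the simulation coordinator. The sketch assumes that every processed input produces exactly one item for the next object, it does not model the signal content, and it is not the Algorithm 3.1 referred to in the text. The function name `run_oo` and the status codes ("P" for processed, "H" for hold, "C" for completed) are hypothetical.

```python
def run_oo(order, n_events):
    """Round-robin activation of a feed-forward chain; returns a status trace per object."""
    upstream = {"RBG": None, "Channel": "RBG", "Slicer": "Channel"}
    downstream = {"RBG": "Channel", "Channel": "Slicer", "Slicer": None}
    pending = {name: 0 for name in upstream}     # inputs received but not yet processed
    done = {name: False for name in upstream}
    trace = {name: [] for name in upstream}
    emitted = 0                                  # events generated by the source so far
    while not all(done.values()):                # one pass corresponds to one simulation time
        for name in order:
            src = upstream[name]
            if done[name]:
                status = "C"
            elif src is None and emitted < n_events:
                emitted += 1
                status = "P"
            elif src is not None and pending[name] > 0:
                pending[name] -= 1
                status = "P"
            elif src is None or done[src]:
                done[name] = True
                status = "C"
            else:
                status = "H"
            if status == "P" and downstream[name] is not None:
                pending[downstream[name]] += 1   # notify the next object
            trace[name].append(status)
    return trace


trace = run_oo(["Slicer", "Channel", "RBG"], n_events=4)
for name in ("Slicer", "Channel", "RBG"):
    print(f"{name:8s}", " ".join(trace[name]))
```

With four events and the order slicer, channel, RBG, the slicer holds twice and the channel once before all three objects process at every simulation time, and the objects complete in the order RBG, channel, slicer.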
In addition to the previous observation, it is worth mentioning that time events like t2a generated by the
slicer occur slightly before the time event, t3, generated by the RBG on the signal time axis. In a
conventional AMS simulation scheme, the simulator finds the time event, t2a, by iteratively guessing multiple
time points around it on the signal time axis. During this search, the simulator repeatedly generates
and discards other time events, such as t3, which can lead to long simulation times. In OO simulation
mode, the time event, t2a, is detected much later than the time event, t3, on the simulation time axis.
This is possible because it is assumed here that the event, t2a, does not cause any shifting of the event,
t3, along the signal time axis. In essence, once events are generated by any circuit object, they are not
discarded; instead, what is controlled is how far each circuit object can progress. Removing the discarding
policy through the signal time flexibility of individual circuit objects helps the OO simulator achieve
higher computational efficiency and thereby increased simulation speed.
Here, the example case is chosen to be simple for convenience of explanation. Realistic examples have
complexities associated with branching and feedback loops, which lead to differing processing frequencies.
In those cases, certain circuit objects often need to be in hold states at intermediate simulation steps.
According to the pseudo-code (presented in Table 3.2), the method process() of any circuit object
spends negligible time whenever the circuit object enters a hold state. Since process() has
the dominant effect on simulation time, additional hold states would not noticeably prolong the overall
simulation time.
3.3.2 Object Order Sensitivity
This section analyzes the example described earlier in terms of its sensitivity to the order in which circuit
objects are listed in the top-level simulation coordinator. Figure 3.7 depicts three possible orders in which
the circuit objects from the example case can be listed, and their effects on running simulations for N
transitions. In each figure, the horizontal axis represents the simulation time progressing
from left to right. The vertical axis represents the order in which each circuit object is activated.
Combining the two axes forms a matrix, in which each cell represents a time event, $t_i$ (or a time
sequence, $(t_{k,i})_{k=1}^{K_i}$, where $t_{k,i} \in (t_{i-1}, t_i]$, for continuous time component representation), generated
by the corresponding circuit object. Here, X represents a situation in which no output is generated, but the
circuit object still has to spend time processing its inputs. On the other hand, H and C represent
hold and completed states respectively, in which negligible time is spent.
The Case 1 scenario (shown in Figure 3.7a) is re-drawn from the object sequence used for describing the
simulation steps in Table 3.5, where the object activation order is: a. slicer, b. channel, and c. RBG.
Because of the activation order, the channel has to be held once and the slicer has to be held twice
due to the lack of available input information. Afterward, the channel and the slicer never have to be held,
because both objects have access to sufficient input information. In Case 2, the hold state of the slicer
at simulation time 2 can be avoided, since the input information at $(t_{k,1})_{k=1}^{K_1}$ is available to
generate the initial slicer state, X. This is because the channel is activated before the slicer. In Case 3,
the RBG is placed first, followed by the channel and the slicer. This eliminates all hold states in this
example case.
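The hold counts for the three cases can be checked with a compact, self-contained sketch. It relies on a simplified activation model in which every processed input yields exactly one item for the next object; that model and the function name `count_holds` are assumptions made for illustration.

```python
def count_holds(order, n_events):
    """Count hold states per object for the chain RBG -> Channel -> Slicer."""
    chain = ["RBG", "Channel", "Slicer"]          # signal flow of the example case
    pending = {name: 0 for name in chain}
    holds = {name: 0 for name in chain}
    finished = set()
    remaining = n_events
    while len(finished) < len(chain):
        for name in order:
            if name in finished:
                continue
            pos = chain.index(name)
            if pos == 0 and remaining > 0:
                remaining -= 1                    # source generates one event
            elif pos > 0 and pending[name] > 0:
                pending[name] -= 1                # consume one received input
            elif pos == 0 or chain[pos - 1] in finished:
                finished.add(name)                # nothing left to process
                continue
            else:
                holds[name] += 1                  # input not available yet
                continue
            if pos + 1 < len(chain):
                pending[chain[pos + 1]] += 1      # hand the result downstream
    return holds


cases = {
    "Case 1": ["Slicer", "Channel", "RBG"],
    "Case 2": ["Channel", "Slicer", "RBG"],
    "Case 3": ["RBG", "Channel", "Slicer"],
}
for label, order in cases.items():
    print(label, count_holds(order, n_events=20))
```

Under this model, Case 1 gives one hold for the channel and two for the slicer, Case 2 gives one each, and Case 3 gives none, and the counts do not change when `n_events` is increased.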
[Figure 3.7: Effect of circuit object placement order in activation list for OO simulation. Panels: (a) Case 1 scenario (example case), with activation order a. Slicer, b. Channel, c. RBG; (b) Case 2 scenario, with activation order a. Channel, b. Slicer, c. RBG; (c) Case 3 scenario, with activation order a. RBG, b. Channel, c. Slicer. Horizontal axis: Simulation Time. Legend: H: Hold State, C: Completed, X: No Output.]
In the example case, the clock is cascaded with the circuit object PRBS as a part of the circuit
object RBG. This integration means that circuit objects are held based only on the placement positions of
the circuit objects. If the clock is treated as an independent circuit object, additional hold states will
be required. The number of additional hold states depends on the maximum number of consecutive
identical digits (CID) of the PRBS. Once the PRBS has reached the point at which it has transmitted
its maximum CID, circuit objects later in the chain do not enter further hold states. The only way to increase
the number of hold states is to consistently increase the CID, which reduces the number of generated events.
Overall, the number of hold states is independent of the length of the simulation.
3.3.3 Feedback Loop Situation
This section analyzes the circumstances in which feedback loops exist in the OO simulation environment.
A feedback loop is a commonly used structure in clock synchronization as well as in equalizer coefficient
adaptation schemes. Simulating feedback loops is important for various purposes, such as studying
the top-level functional accuracy, the impact on the neighbouring circuitry, and feedback loop stability in
transient time. Because the OO simulation scheme is primarily developed with a focus on feed-forward
architecture, setting up the feedback loop simulation depends heavily on the nature of the loop delay.
[Figure 3.8: Schematic of a system with a feedback loop. The signal input passes through C1 and C2 to the signal output, and C3 sits in the feedback path.]
Figure 3.8 provides a schematic representation of a typical feedback system, which comprises three
major components: C1, C2, and C3. The input signal is processed by C1 based on the feedback from C3.
The output of C1 is processed by C2 to generate the output of the system, which in turn is fed to C3.
We use this example here to study its possible implementation and to evaluate its potential situations
under the OO simulation environment. For convenience of explanation, all blocks are considered to be
discrete time.
Dealing with Components Through Object Ordering
One approach is to deal with the components employed in the feedback system through object ordering.
Section 3.3.2 shows that object ordering is not a concern for systems that involve feed-forward
architecture. However, if a system with feedback architecture is handled without modification in OO
simulation, the number of hold states increases. Increasing the number of hold states leads to accumulation
of input information for processing and thereby demands more memory. The effect on the number of hold
states during OO simulation of the system with a feedback loop (shown in Figure 3.8) is illustrated in Figure 3.9.
Figure 3.9: Effects of object placement ordering in OO simulation for a system with a feedback loop
(a) Case 1 scenario (preferred): objects ordered C1, C2, C3; the events ti, ti+1, ti+2 are processed in consecutive simulation steps N, N + 1, N + 2.
(b) Case 2 scenario (not preferred): objects ordered C3, C2, C1; the events ti, ti+1, ti+2 are spread over simulation steps N to N + 8.
Figure 3.9 presents two possible scenarios of object placement ordering during the simulation
of the feedback system. In the ordering shown in Figure 3.9a, all information transactions occur at
the right time and hence no hold state is visible in the sub-figure. Once C1 has processed the time
event ti, C2 can start processing ti, and upon receiving the output at ti from C2,
C3 can process its output at ti. This situation repeats in the subsequent simulation steps. If the
object ordering is reversed (first C3, then C2, and last C1), the information transactions
become sparse along the simulation step axis, as demonstrated in Figure 3.9b. Hence, when dealing
with a feedback system, Case 1 object ordering is preferred over Case 2, and the object ordering
should be enforced during the initial phase. It is worth mentioning that all components in the feedback
system must produce an output event at every simulation step, even when there is no output transition;
otherwise, the feedback system cannot be simulated in the OO environment.
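The ordering argument can be checked with a toy scheduler. The following Python sketch is not the thesis simulator: it assumes that the simulator visits the objects round-robin in placement order, that C1 needs the C3 output of the previous event before it can process the next one, and that an object which cannot proceed enters a hold state. The function name count_hold_states is hypothetical.

```python
def count_hold_states(order, n_events):
    """Visit the objects round-robin in `order` until all have processed n_events.

    Returns (hold_states, passes). Toy model, assuming a one-event loop delay.
    """
    done = {"C1": 0, "C2": 0, "C3": 0}  # events already processed per object
    holds = 0
    passes = 0
    while min(done.values()) < n_events:
        passes += 1
        for name in order:
            i = done[name]  # index of the next event this object wants
            if i >= n_events:
                continue  # this object has finished
            if name == "C1":
                ready = done["C3"] >= i      # feedback from event i-1 (none for i = 0)
            elif name == "C2":
                ready = done["C1"] >= i + 1  # needs C1 output for event i
            else:
                ready = done["C2"] >= i + 1  # needs C2 output for event i
            if ready:
                done[name] += 1
            else:
                holds += 1  # input not yet available: hold state
    return holds, passes


case1 = count_hold_states(["C1", "C2", "C3"], 3)  # preferred ordering
case2 = count_hold_states(["C3", "C2", "C1"], 3)  # reversed ordering
print(case1, case2)
```

Under these assumptions, the Case 1 ordering finishes the three events without any hold state, while the reversed ordering needs more passes and accumulates hold states.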
Integrating into a Single Object
Another approach to implement the example feedback system is to integrate all the components
into a single object. If the updates due to the feedback loop take place in continuous time, as in the case
of CDR systems, it is preferable to describe the entire system inside one circuit object. Under such
circumstances, the representative object can be built using a bottom-up approach.
Figure 3.10 shows the hierarchy extraction procedure for feedback system modelling and Algorithm
3.3 shows how to capture the bottom-up hierarchy at the code level. As can be observed from the
abstraction process, the feedback path is hidden from the external OO simulator. Hence, hold state
issues associated with object ordering can be avoided. This abstraction process also improves
code-level comprehensibility because development is controlled and organized. However, because
the entire feedback system is described using one circuit object, it is not possible to see the outputs at
Figure 3.10: Hierarchical representation for a system with a feedback loop (obj_C1 encloses C1, obj_C1C2 encloses C1 and C2, and obj_C1C2C3 encloses the complete loop including C3, between Signal In and Signal Out)
Algorithm 3.3 Hierarchical representation template for OO simulation
% In obj_C1.m file
classdef obj_C1 < handle % Starting as root
    - properties and methods are described here
end

% In obj_C1C2.m file
classdef obj_C1C2 < obj_C1 % Inheriting from obj_C1
    - properties and methods that are not described in obj_C1 go here
end

% In obj_C1C2C3.m file
classdef obj_C1C2C3 < obj_C1C2 % Inheriting from obj_C1C2
    - properties and methods that are described neither in obj_C1 nor in obj_C1C2 go here
end
the intermediate stages, such as the outputs from C3 of Figure 3.10. To overcome this problem, it is always
possible to add a circuit object related to C3 at the output of the integrated object, obj_C1C2C3.
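The bottom-up hierarchy of Algorithm 3.3 can be mirrored in any object-oriented language. The following is a minimal Python sketch, not the thesis code: the class names are hypothetical, and the arithmetic inside each block (a subtraction in C1, a gain of 0.5 in C2, and a one-sample feedback in C3) is purely illustrative. It shows how the feedback path stays internal to the outermost object.

```python
class ObjC1:
    """Root object: C1 combines the input with the stored feedback value."""

    def __init__(self):
        self.feedback = 0.0  # value supplied by C3, initially zero

    def c1(self, x):
        return x - self.feedback


class ObjC1C2(ObjC1):
    """Adds only what ObjC1 does not describe: the C2 stage."""

    def c2(self, x):
        return 0.5 * self.c1(x)


class ObjC1C2C3(ObjC1C2):
    """Adds the C3 stage and closes the loop inside the object."""

    def process(self, x):
        y = self.c2(x)
        self.feedback = y  # C3: feeds the output back for the next sample
        return y


loop = ObjC1C2C3()
outputs = [loop.process(1.0) for _ in range(3)]
print(outputs)  # [0.5, 0.25, 0.375]
```

An external scheduler only ever calls process(), so it sees a feed-forward object and never has to order C1, C2, and C3 itself.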
Handling at the Script Level
The third approach is to handle the feedback loop through top-level scripting. This scheme can be applied if the
feedback loop operates in discrete time and updates with a long loop delay (comparable to multiple
UIs). In this case, the feedback loop is first broken at a junction so that
the modelled system behaves like a feed-forward system. Later, at the script level, the feedback related
calculations are performed and then applied during the feedback updating phase. This analysis scheme
is particularly useful in the early phase of algorithmic development.
Algorithm 3.4 depicts the pseudo-code of the script, which can mimic the feedback loop. At the
initial stage, the circuit system under test is defined without closing the feedback loop. The approach
also requires defining the feedback update time, feedbackTime, and the simulation stop time, simStopTime.
The scripting terms feedbackTime and simStopTime are denoted here as ∆t and T respectively. The
relationship between the two variables can be written as T ≈ N∆t, where N ≫ 1 represents the number
of feedback update intervals set for observation. The next step is a loop that generates the intervals
[0, ∆t], [∆t, 2∆t], [2∆t, 3∆t], . . . , [(N − 1)∆t, N∆t]. Here, it is acceptable to have overlaps at the
transition points, i∆t, where i = 1, 2, . . . , (N − 1), since outputs from each simulation are not
merged. During each loop interval, the simulation of the feed-forward system is conducted and then new
feedback coefficients are determined based on the simulation outputs.
Algorithm 3.4 Pseudo-code script to deal with feedback system in OO simulation
% Initialize the environment
- Define the circuit system for test
- Define simulation stop time => simStopTime
- Define feedback update time => feedbackTime

% Looping to mimic feedback
t = 0;
- Initialize feedback coefficient (at t = 0)
while t < simStopTime
    t = t + feedbackTime;

    % Task during each loop
    - run simulation until t
    - Acquire simulation output
    - Update feedback coefficient
end
It is possible to run the OO simulation for any such interval because the final states of all circuit objects
from the immediately preceding simulation can be applied as initial states for the current simulation. Only
certain initial conditions related to the feedback loop need to be modified to reflect the new feedback
coefficients. This scheme is not realistic for most time-step based simulators, mainly because
continuing a simulation from previously saved results is usually not supported. Even if the feature is
supported by certain time-step based simulators, the newly updated coefficients might not be
applied safely without causing numerical instability.
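The script-level loop of Algorithm 3.4, including the hand-over of final states between intervals, can be sketched as follows. This Python example is only an illustration under stated assumptions: run_feedforward is a hypothetical stand-in for one open-loop OO simulation run (here a first-order low-pass filter), time is counted in integer steps, and the coefficient update is a simple proportional rule chosen for the example.

```python
def run_feedforward(state, coeff, n_steps, x=1.0):
    """Toy open-loop run: first-order low-pass that resumes from `state`.

    Returns (final_state, output). Hypothetical stand-in for the simulator.
    """
    y = state
    for _ in range(n_steps):
        y = 0.5 * y + 0.5 * coeff * x
    return y, y


def simulate_with_script_feedback(sim_stop_steps, feedback_steps,
                                  target=2.0, mu=0.5):
    """Mimic a slow feedback loop by updating the coefficient between runs."""
    t = 0
    coeff = 0.0   # feedback coefficient at t = 0
    state = 0.0   # final state of one run is the initial state of the next
    history = []
    while t < sim_stop_steps:
        t += feedback_steps
        state, out = run_feedforward(state, coeff, feedback_steps)  # run until t
        coeff += mu * (target - out)                                # update feedback
        history.append((t, coeff))
    return history


for t, coeff in simulate_with_script_feedback(200, 20):
    print(t, round(coeff, 4))
```

The important design point is that only the coefficient changes between intervals; the filter state is carried over untouched, which is what allows each interval to continue from the previous one.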
3.3.4 Incorporating Parallelism
Parallel computation platforms have become a de facto standard in recent years because of the number of arithmetic
operations they can perform per cycle. Incorporating parallelism into the proposed
OO simulation scheme can therefore be beneficial on modern computational platforms. The ability to identify the
inter-dependency among circuit components, as well as to maintain circuit component specific time axes,
makes the OO simulator feasible to implement in a parallel computation environment. The parallel processing
scheme for the OO simulator is explained using Figures 3.11 and 3.12 as follows.
Figure 3.11 describes how to realize the parallel computation structure embedded inside the OO
simulation scheme. The horizontal axis represents time and the vertical axis indicates the index of the
processor cores. P(·) symbolizes a process, a fragment of the software that can be run on a processor
core. In the OO simulation case, a process can be the top-level script (referred to as P(S))
or any circuit object specific method. For this example, let us assume the circuit system consists
of N circuit components referred to as C1, C2, C3, . . . , CN , and their processes are denoted
P(C1), P(C2), P(C3), . . . , P(CN ) respectively. The knowledge of how the circuit components are
connected is not essential from the perspective of conducting the OO simulation (as established in
Section 3.3.2). Hence, the connectivity among the circuit components is not displayed here.
The figure shows two computational processing cases, serial and parallel, on the same time axis for any
Figure 3.11: Serial processing to parallel processing conversion for OO simulation (top, serial processing: Core 1 runs P(S), P(C1), P(C2), P(C3), . . . , P(CN ) in sequence; bottom, parallel processing: Core 1 runs P(S), after which Cores 1 to N run P(C1) to P(CN ) concurrently; vertical axis: processor core index; horizontal axis: time)
Figure 3.12: Parallel processing demonstration under restricted resource environment for OO simulation (Core 1 runs P(S), after which P(C1) to P(CN ) are distributed over Cores 1 to 4; vertical axis: processor core index; horizontal axis: time)
given simulation step. The top part of the figure depicts the situation in which the OO simulation is evaluated using
only one core, Core 1. The first time slot is used to perform the process associated with the top-level script,
P(S), from which all processes related to circuit components, P(Ci), are launched. If the computation
platform has access to at least as many cores as there are processes associated with circuit components
(number of cores ≥ N), all processes, P(Ci), can be launched at the same time upon evaluation of the
process P(S). The gray arrows in the bottom part of the figure demonstrate the parallelism performed
by N processing cores. As can be observed, with abundant processing cores the duration of a parallel simulation step is
limited only by the evaluation time of the longest process among all the processes.
As the circuit system becomes larger, the number of processes associated with circuit components
usually increases. Under such circumstances, the number of processes is much larger than the number of
processing cores available for parallel computation, which is the typical situation for current processors.
Figure 3.12 depicts the case of a 4-core processor evaluating the previously described N-component
circuit system, where it is assumed that N ≫ 4. As can be observed from the figure, P(S) is
performed first, and then all P(Ci)'s are evaluated using the 4 cores in parallel.
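The timing picture of Figures 3.11 and 3.12 can be approximated with a small scheduling model. The Python sketch below is an assumption-laden illustration, not the thesis implementation: it assumes that P(S) runs first, that each P(Ci) is then handed to whichever core becomes idle earliest (greedy list scheduling), and that the process durations are known. The function name step_time and the durations are hypothetical.

```python
import heapq


def step_time(script_time, process_times, n_cores):
    """Duration of one simulation step: P(S) first, then the P(Ci) on n_cores."""
    core_free_at = [0.0] * n_cores      # time at which each core becomes idle
    heapq.heapify(core_free_at)
    for duration in process_times:
        start = heapq.heappop(core_free_at)             # earliest idle core
        heapq.heappush(core_free_at, start + duration)  # core busy until then
    return script_time + max(core_free_at)


durations = [3, 1, 2, 2, 1, 3, 2]       # hypothetical P(C1)..P(C7) run times
print(step_time(1, durations, 1))               # serial: sum of all processes
print(step_time(1, durations, len(durations)))  # abundant cores: longest process
print(step_time(1, durations, 4))               # restricted to 4 cores
```

With one core the step time is the sum of all processes, with N cores it is set by the longest process, and with 4 cores it falls in between, matching the qualitative behaviour described above.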
3.3.5 Simulation Speed Performance
For the simulation performance analysis, we study a test case related to FFE operation as shown in
Figure 3.13. The transmitter side consists of an RBG (shown as source) followed by an FFE, and is triggered
by a clock source (shown as TxClk). Once equalized by the FFE, the transmitted signal is sent
through the attenuating channel (shown as Ch). The continuous time output from the channel is then
analyzed at the scopes to generate the eye diagram and to measure jitter. The detailed modelling process of
the FFE circuitry and its various settings are described in Section 4.1.
We implement this test case in C++ on a Linux computer with an Intel Core i7 processor.
The source and the FFE along with their clock are implemented as one discrete time circuit object. The
channel is implemented here as a continuous time object and its output time resolution is limited
to at most 0.01 UI (at least 100 discrete time points per UI), because its output is fed to an eye scope that
calculates the eye diagram with exactly 0.01 UI resolution. The output of the channel is also fed to a
Figure 3.13: Speed performance result for the OO simulation
(a) Block diagram: the transmitter (Source and FFE, clocked by TxClk) drives the channel (Ch), whose output feeds the Eye Scope and the Jitter Monitor in the receiver.
(b) Simulation speed result: simulation time versus number of bits, both on logarithmic axes, with a trend line through the measured points listed below (1k = 1000).

Number of bits    Simulation time
10k               0.46 s
100k              4.40 s
1,000k            43.85 s
10,000k           440.88 s
jitter monitor block, whose task is to measure the total jitter present in the system. Jitter measurement
is performed in real time by identifying the zero-crossing points.
Once an executable program is compiled from the C++ source, the program is executed for different simulation
lengths. Figure 3.13b plots the simulation run time against the number of transmitted bits
on a log scale. As the number of bits increases, the simulation time increases linearly at a rate of
∼ 44 s per million bits. Linear scaling is highly desirable, since it allows
the run time to be predicted as the simulation length is modified. Table 3.6 shows the simulation time broken down at the
individual circuit object level for the case of 10,000k bits. As can be seen, the most time consuming
component, the channel, takes about 93.7% of the entire simulation time. Because of this, the example case is
not considered worthwhile to implement in a parallel processing environment. However, if the system
contains multiple components that are similar to the channel in terms of calculation effort, it is possible
to increase simulation speed through parallelism.
Table 3.6: Simulation time breakdown for the case of 10,000k bits (where 1k = 1,000)

Component      Simulation Time
Transmitter    0.6 s
Channel        412.7 s
Receiver       27.5 s
Altogether     440.8 s
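The linear trend quoted above can be reproduced from the four measured points of Figure 3.13b. The Python sketch below fits a straight line through the origin by least squares; the choice of a through-origin fit and the function names are assumptions made for illustration, not the method used to draw the trend line in the thesis.

```python
# Measured run times read from Figure 3.13b: (number of bits, seconds).
MEASUREMENTS = [
    (10_000, 0.46),
    (100_000, 4.40),
    (1_000_000, 43.85),
    (10_000_000, 440.88),
]


def seconds_per_million_bits(points):
    """Least-squares slope of a line through the origin, scaled to 1e6 bits."""
    num = sum(bits * secs for bits, secs in points)
    den = sum(bits * bits for bits, _ in points)
    return 1e6 * num / den


def predict_seconds(n_bits, rate_per_million):
    """Extrapolate the run time for n_bits, assuming the linear trend holds."""
    return rate_per_million * n_bits / 1e6


rate = seconds_per_million_bits(MEASUREMENTS)
print(round(rate, 2))                           # close to 44 s per million bits
print(round(predict_seconds(50_000_000, rate)))  # extrapolated estimate in seconds
```

Any extrapolation beyond the measured range assumes that the linear behaviour continues, which the measurements suggest but do not guarantee.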
3.4 Summary
This chapter presents a novel scheme to simulate an AMS circuit system. The proposed scheme addresses
the incompatibility with asynchronous circuitry that exists in conventional event-driven simulators
by incorporating continuous time output calculation for circuit objects. Continuous time output
calculation is performed primarily in time-step based simulators, but a time-step based simulator has
slow evaluation speed due to its inherent time axis unionization process. Hence, the proposed simulation
scheme lets each circuit component select its time points independently during
the evaluation process. This helps to calculate outputs at any given time point whenever any activity is
detected. In order to implement the scheme, a relationship between discrete time and continuous time
circuit component modelling has been established. Various studies have then been performed to analyze the
effectiveness of the proposed simulation scheme. The concept of the proposed simulation scheme has been
applied in modelling equalizers and CDR circuitry.
Chapter 4
Proposed Modelling for Equalizer Circuitry
This chapter discusses the proposed modelling concept for equalizer circuitry. As explained in Section
2.2.1, the task of an equalizer is to compensate for channel attenuation. The primary purpose of the proposed
modelling is to speed up the simulation process while maintaining accuracy comparable to that of
conventional SPICE simulators. Conducting simulations at higher speed and accuracy is required in
order to perform verification analyses, such as generating BER contours (Section 2.3.1).
Three major types of equalizers are analyzed here: feed forward equalizer (FFE), continuous time
linear equalizer (CTLE), and decision feedback equalizer (DFE). Figure 4.1 represents an example architecture
consisting of these three types of equalizers: an FFE at the transmitter side, and a CTLE and a DFE
at the receiver side. The CTLE is implemented using passive resistive-capacitive circuit elements for
continuous time operation, while the FFE and DFE operate in discrete time using local clock sources.
Hence, the figure shows two clock signals, TxClk and RxClk, which are local to the transmitter and the
receiver respectively, but have a uni-directional synchronous relationship. The figure shows the
synchronous clock relationship only implicitly, because the clock recovery system is not discussed in this chapter.
The performance of these equalizers often suffers from unavoidable nonlinearities once they are implemented
using real-life circuit elements. The nonlinear effects of the equalizers may arise from various
Figure 4.1: Architectural overview of typical channel equalization system (transmitter: Source and FFE, clocked by TxClk; channel: Ch; receiver: CTLE, DFE, and Sink, clocked by RxClk; TxClk and RxClk are related through clock synchronization)
sources, such as nonlinear operations, finite bandwidth, mismatches, and process-voltage-temperature
(PVT) variations of transistors and other devices. These nonlinearities limit the data communication speed
through the transmission channel, but their effects are often not visible in the linear behavioural models of
the respective equalizers. Hence, such nonlinearity must be captured as realistically as possible to
increase the accuracy of transceiver performance verification analyses.
This chapter deals with how to capture nonlinear equalizer behaviour in the models. The proposed
modelling schemes for the FFE, CTLE, and DFE are presented in Sections 4.1, 4.2, and 4.3 respectively. Each
section begins by introducing an equivalent linear model for the corresponding equalizer, then discusses
the circuit-level implementation, and finally explains the modelling procedure to capture the transistor-level
nonlinearity. Each proposed model is evaluated by generating an eye diagram (or multiple
eye diagrams depending on the case), which is then overlaid on the one generated by Spectre for
comparison purposes.
4.1 Feed Forward Equalizer (FFE) Modelling
Like any feed-forward control system, an FFE equalizes the input signal directly using prior
knowledge of the channel attenuation. An FFE can be designed to eliminate both pre-cursor and post-cursor
ISI. The FFE, implemented at the transmitter side, receives its input signal from the data transmission
source, which is synchronous to the local clock.
Figure 4.2 shows a block-level architecture of an FFE with M pre-taps and N post-taps. Signal
In represents an input bit-stream that needs to be transmitted. During the FFE operation,
multiple delayed versions of the input bit-stream are added with the defined tap weights, wi, where
i = −M, . . . , −1, 0, 1, . . . , N . The z^−1 block represents a delay that is usually set to 1 UI. All
the delayed signals and the input signal are first multiplied by their respective tap weights and then
summed to produce the equalized output.
Figure 4.2: Basic architecture of a symbol-spaced feed forward equalizer (FFE) (Signal In passes through a chain of z^−1 delay blocks; the tapped signals are weighted by w−M, . . . , w−1, w0, w1, . . . , wN and summed to form the Equalized Output)
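Based on this, the FFE transfer function in the z-domain, F_FFE(z), can be written as follows:
\[
F_{FFE}(z) = \Big( \underbrace{w_{-M} \cdot z^{M} + \cdots + w_{-1} \cdot z}_{M\ \text{pre-taps}} + w_{0} + \underbrace{w_{1} \cdot z^{-1} + \cdots + w_{N} \cdot z^{-N}}_{N\ \text{post-taps}} \Big) \cdot z^{-M} \tag{4.1}
\]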
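Multiplying out the factor z^−M in Equation 4.1 gives a causal FIR filter whose coefficients are the tap weights in the order w−M, . . . , wN. A minimal Python sketch of this linear FFE follows; the function name ffe_output is hypothetical, the input symbols are assumed to be ±1, and samples before the start of the stream are assumed to be zero. The example weights are the ones quoted later in Section 4.1.3.

```python
def ffe_output(symbols, weights):
    """Symbol-spaced FFE as a causal FIR filter.

    weights = [w_-M, ..., w_0, ..., w_N]; weights[k] multiplies the input
    delayed by k symbols, so the main cursor appears M symbols late.
    """
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, w in enumerate(weights):
            if n - k >= 0:               # samples before the stream are zero
                acc += w * symbols[n - k]
        out.append(acc)
    return out


taps = [-2 / 20, 15 / 20, -3 / 20]       # 1 pre-tap, main tap, 1 post-tap
print(ffe_output([+1, +1, -1, -1, +1], taps))
```

This captures only the ideal linear behaviour; the nonlinear circuit-level behaviour is what the LUT-based model of Section 4.1.2 is meant to address.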
4.1.1 FFE Implementation
The FFE operation is usually based on zero forcing equalization, whose aim
is to force the ISI to zero. Tap weights for the FFE are calculated from the pulse response of the intended
channel. The pulse response can be collected from simulation or measured by applying a rectangular
pulse of one-bit duration, Tb, to the channel. The calculation procedure for the FFE tap weights is explained
below.
Figure 4.3 shows an example pulse response, p(t), obtained by applying an input pulse, Π(t). The dark colored
stems, superimposed on p(t), mark the extracted cursors, ci, where i = . . . , −1, 0, 1, 2, . . . .
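Figure 4.3: Cursor extraction from channel pulse response (an input pulse, Π(t), of unit height and duration Tb is applied to the channel; the pulse response, p(t), with peak Apeak, is sampled at intervals of Tb to give the cursors . . . , c−1, c0, c1, c2, . . .)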
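We can represent the ci's as a column vector, $\vec{C}$, as follows,
\[
\vec{C} = \big[\, \underbrace{c_{-\infty} \ \cdots \ c_{-1}}_{P_1\ \text{pre-cursor ISI}} \ \ c_{0} \ \ \underbrace{c_{1} \ \ c_{2} \ \cdots \ c_{\infty}}_{P_2\ \text{post-cursor ISI}} \,\big]^{T}
\]
In $\vec{C}$, c0 is considered the main cursor, as it has the greatest height among all the cursors.
From Equation 4.1, the FFE tap weights can be written in vector format, $\vec{W}_{FFE}$, as follows,
\[
\vec{W}_{FFE} = \big[\, w_{-M} \ \cdots \ w_{-1} \ \ w_{0} \ \ w_{1} \ \cdots \ w_{N} \,\big]^{T}
\]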
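Applying the convolution between $\vec{C}$ and $\vec{W}_{FFE}$, the desired channel output response with peak
amplitude, Apeak, can be formulated as follows,
\[
\vec{C} * \vec{W}_{FFE} = \big[\, \underbrace{0 \ \ 0 \ \cdots \ 0}_{M + P_1\ \text{zeros}} \ \ A_{peak} \ \ \underbrace{0 \ \ 0 \ \cdots \ 0}_{N + P_2\ \text{zeros}} \,\big]^{T} \tag{4.2}
\]
where
\[
\vec{C} * \vec{W}_{FFE} =
\begin{bmatrix}
c_{-\infty} & & & \\
\vdots & c_{-\infty} & & \\
c_{-1} & \vdots & \ddots & \\
c_{0} & c_{-1} & & c_{-\infty} \\
c_{1} & c_{0} & \ddots & \vdots \\
c_{2} & c_{1} & & c_{-1} \\
\vdots & c_{2} & \ddots & c_{0} \\
c_{\infty} & \vdots & & c_{1} \\
 & c_{\infty} & \ddots & c_{2} \\
 & & & \vdots \\
 & & & c_{\infty}
\end{bmatrix}
\cdot
\begin{bmatrix}
w_{-M} \\ \vdots \\ w_{-1} \\ w_{0} \\ w_{1} \\ \vdots \\ w_{N}
\end{bmatrix}
\]
Here, the number of rows and columns of the convolution matrix is (P1 + P2 + M + N + 1) and (M + N + 1)
respectively. Solving Equation 4.2 for $\vec{W}_{FFE}$ yields the FFE tap weights of the defined size. Apeak
is taken as 1 here for simplicity.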
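Equation 4.2 can be set up and solved numerically for a small example. Because the system has more rows than unknowns, the Python sketch below solves it in the least-squares sense through the normal equations; that solver choice is an assumption, since the text does not state how the system is solved. The cursor values and all function names are hypothetical.

```python
def conv_matrix(cursors, n_taps):
    """Toeplitz matrix whose product with the tap vector is the convolution."""
    rows = len(cursors) + n_taps - 1
    return [[cursors[r - c] if 0 <= r - c < len(cursors) else 0.0
             for c in range(n_taps)] for r in range(rows)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def zero_forcing_taps(cursors, n_pre_cursors, n_pre_taps, n_post_taps, a_peak=1.0):
    """Least-squares solution of Equation 4.2 (assumed solver)."""
    n_taps = n_pre_taps + n_post_taps + 1
    a = conv_matrix(cursors, n_taps)
    target = [0.0] * len(a)
    target[n_pre_cursors + n_pre_taps] = a_peak      # position of the main cursor
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_taps)]
           for i in range(n_taps)]
    atd = [sum(row[i] * d for row, d in zip(a, target)) for i in range(n_taps)]
    return solve(ata, atd)


def equalized_response(cursors, taps):
    """Cursors of the channel after equalization (matrix times tap vector)."""
    return [sum(c * w for c, w in zip(row, taps))
            for row in conv_matrix(cursors, len(taps))]


cursors = [0.1, 1.0, 0.3]                 # hypothetical c_-1, c_0, c_1
taps = zero_forcing_taps(cursors, n_pre_cursors=1, n_pre_taps=1, n_post_taps=1)
print([round(w, 3) for w in taps])
print([round(v, 3) for v in equalized_response(cursors, taps)])
```

With a finite number of taps the residual ISI is reduced rather than removed entirely, which is why the equalized response contains small non-zero values away from the main cursor.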
An example circuit of a 3-tap FFE implemented at the transmitter end is shown in Figure 4.4. The
example circuit is a source series terminated (SST) FFE implemented for a single-ended data transmission
application. The circuit consists of two major segments: digital logic circuitry and slices of the transmit
driver. The digital logic circuitry encodes the data with delay elements, such as z^−1 and z^−2, and the
polarities of the tap weights, sgn(wi). The task of the slices is to drive the encoded transmit signals
representing the tap weight magnitudes, |wi|. The widths of the PMOS and NMOS transistors, WPi and
WNi, are designed such that their resistances represent the respective tap weights. In order to minimize
reflection through the channel, the net impedance looking from the output of the slices toward supply or
ground should be set to the characteristic impedance of the channel. Even though the example is shown
for the 3-tap case (involving 1 pre-tap and 1 post-tap), the number of taps can be extended to represent M
pre-taps and N post-taps. (For the detailed design procedure of the transmit driver, refer to [35,36].)
Figure 4.4: Circuit-level overview of a 3-tap source series terminated based single-ended FFE (digital FFE logic circuitry: Data In is encoded into sgn(w−1), sgn(w0) · z^−1, and sgn(w1) · z^−2; slices of the FFE transmit driver: sized per tap, for example WP1 ∝ |w1|, WN1 ∝ |w1|, and RT1 ∝ 1/|w1|, driving the Equalized Out node)
4.1.2 FFE Modelling for OO Simulation
In OO simulation, the FFE implemented at the transmitter end is considered a discrete time object,
ObjTyp.discrete. Algorithm 4.1 presents the FFE template object for running OO simulation. It
has two input sources for operation, a clock and an RBG; both sources are of discrete type. The routines of its
constructor and methods (init(), receive(), and process()) are programmed following the criteria
described in Section 3.2.1.
In reality, a fabricated FFE behaves nonlinearly, which has various undesirable effects, such as
sampling threshold shift, jitter increase, and signal transition shape asymmetry. These effects can be generated
by a wide variety of sources, such as the FFE implementation architecture, local clock jitter, and power
supply noise. The primary focus of this section is to discuss how to capture the nonlinearity
associated with the FFE implementation architecture.
Architectural nonlinearity in the example circuit (shown in Figure 4.4) is due to the nonlinear transistor
operation. Because the equivalent resistance across the drain-source region depends on the voltage
difference, the tap weights realized from the equivalent resistances vary during the FFE operation.
Hence, no closed-form algebraic equation is available. To overcome this problem, a look-up table
(LUT) based calculation scheme is proposed. In general, if an FFE has n taps, it can have 2^n
Figure 4.5: Look-up table (LUT) based nonlinearity modelling for FFE (a random binary bit-stream, . . . 0110100 . . . , is converted into 2-tap FFE symbolic states, . . . , −1, +3, +1, −3, . . . (shown for 0110), which are then mapped to the 2-tap FFE output states A+3, A+1, A−1, A−3)
Algorithm 4.1 Modelling template of feed forward equalizer for running OO simulation
classdef FFE < handle
    properties
        objTyp = ObjTyp.discrete   % Discrete object type
        clkPort                    % Clock object information
        inPort                     % Input RBG object information
        % Other internal properties not shown here
    end

    methods
        % Constructor called from the top-level script
        function obj = FFE()
            - Construct the FFE object
            - Receive and verify all input information
        end

        % Method init() triggered by the input object inPort for initial processing
        function init(obj)
            - Define remaining uninitialized internal variables
            - Calculate output at time, t = 0
            - Notify its output receiving objects
        end

        % Method receive() triggered by input object inPort
        function receive(obj)
            - Collect the output from the inPort at t_i > 0
            - Append the collected information v(t_i) to the previous information
                - New collection: v(t_i) ≠ v(t_i-1) and t_i > t_i-1
        end

        % Method process() triggered by clock object clkPort
        function process(obj)
            if processing is completed
                return
            end
            - Determine the next transition, t_j
            if max collected input information timing < t_j
                - Hold on the state
                return
            end
            - Calculate the FFE output at t_j
            - Notify its output receiving objects
            - Discard unnecessary inPort outputs from the collection
        end

        % Other internal methods not shown here
    end
end
Figure 4.6: Channel waveform construction based on FFE outputs (top: FFE output, xFFE(t), versus t; bottom: channel response, yCh(t), versus t; both vertical axes are marked with the FFE states A−7, . . . , A+7)
possible output states. From the simulation, all possible output states can be recorded at
the steady state as AFFE = {A−(2^n−1), . . . , A−3, A−1, A+1, A+3, . . . , A+(2^n−1)}, and one of these states is
selected based on the calculated symbolic FFE state. Figure 4.5 shows an example case for a 2-tap FFE.
From the received bit-stream, the symbolic FFE states are calculated, and each symbolic
state is then replaced by its corresponding FFE amplitude.
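The two-step mapping of Figure 4.5, from bits to symbolic states and from symbolic states to recorded amplitudes, can be sketched in Python as follows. Two assumptions are made loudly here: the symbolic state is computed as 2·d[n] − d[n−1] with d = ±1, a weighting chosen only because it reproduces the sequence −1, +3, +1, −3 shown for the bits 0110 when the preceding bit is 0; and the amplitudes in the look-up table are hypothetical placeholders, not values from the thesis.

```python
# Hypothetical steady-state amplitudes in volts, indexed by symbolic state.
AMPLITUDE_LUT = {+3: 0.70, +1: 0.55, -1: 0.45, -3: 0.30}


def symbolic_states(bits, previous_bit=0):
    """2-tap example: state = 2*d[n] - d[n-1], with bits mapped to d = +/-1."""
    states = []
    d_prev = 2 * previous_bit - 1
    for b in bits:
        d = 2 * b - 1
        states.append(2 * d - d_prev)
        d_prev = d
    return states


def ffe_output_levels(bits, lut, previous_bit=0):
    """Replace each symbolic state by its recorded steady-state amplitude."""
    return [lut[s] for s in symbolic_states(bits, previous_bit)]


print(symbolic_states([0, 1, 1, 0]))                   # [-1, 3, 1, -3]
print(ffe_output_levels([0, 1, 1, 0], AMPLITUDE_LUT))  # [0.45, 0.7, 0.55, 0.3]
```

Because the table entries come from circuit simulation rather than from the tap weights, the levels need not be evenly spaced, which is how the nonlinearity enters the model.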
Figure 4.6 shows a channel response together with the FFE output that drives the channel. In both plots, the horizontal
axis represents time, t, and the vertical axis represents the amplitudes of the FFE transition states, marked as
A−7, A−5, . . . , A+7. The FFE employed here is a 3-tap FFE and hence has 2^3 = 8 possible states. Its
equalization gain is set according to the channel attenuation. Based on the FFE output states, the
channel response is calculated using Equation 2.15. Since we are only interested in the shape of the
channel response, the signal attenuation at 0 Hz is taken to be 0 dB.
4.1.3 FFE Modelling Testcase
In order to evaluate the accuracy of the proposed modelling technique, a test case of a LUT-based FFE
followed by a channel was created. Figure 4.7 shows a block diagram and compares the output eye
diagrams of the test case. The test includes a source, a LUT-based FFE, a channel (shown as Ch), and
an eye diagram generator (shown as eye scope). The channel selected for the test was a 4-inch FR4
channel with an attenuation of ∼ 5 dB at the Nyquist frequency, fNyquist = 4 GHz. The objective of
the test is to compare the eye diagrams generated by the Spectre simulation and the proposed modelling
scheme.
For equalization, a 3-tap FFE was chosen and the tap weights were determined to be
$\vec{W}_{FFE}$ = [−2/20, 15/20, −3/20] using Equation 4.2. Based on the tap weights, the example FFE shown
in Figure 4.4 was implemented at the transistor level in the Cadence environment. All 8 possible steady-state
amplitudes of the 3-tap FFE were found: 0.236, 0.264, 0.280, 0.321, 0.633, 0.682, 0.702, and 0.737
(all in volts). However, these amplitudes did not match the amplitudes calculated from the initially
designed tap weights. During the test, a PRBS7 sequence was used as the source. The step response due to an FFE
output transition was collected from the Spectre simulation with the transmitter input termination of
the driver, the channel, and the receiver input termination set to ∼ 50 Ω. Channel output was calculated by applying
Figure 4.7: FFE simulation testbench and waveform reconstruction process
(a) Block diagram: Source and FFE (clocked by TxClk) in the transmitter, followed by the channel (Ch) and the Eye Scope in the receiver.
(b) FFE eye diagram comparison: eye diagrams from the proposed scheme and from Spectre, each plotted from −UI to +UI.
the FFE steady-state amplitudes in the step response based modelling scheme, and then an eye diagram was
generated. As can be seen from the figure, the eye diagrams from Spectre and from the proposed scheme
are almost identical, validating the accuracy of the proposed modelling scheme. Eye diagram related
measurements for the Spectre and proposed modelling cases are shown in Table 4.1.
Eye Measurements            From Spectre    From Proposed Scheme
Horizontal eye opening      0.78 UIpp       0.79 UIpp
Vertical eye opening        334 mVpp        339 mVpp
Relative simulation time    6000 X          1 X
The above modelling scheme was incorporated into the OO simulation scheme and the speed performance
of the simulation scheme was measured. The proposed scheme took ∼ 44 s to generate an eye
diagram based on 1 million transmitted bits. The detailed speed performance is discussed in Section 3.3.5.
4.2 Continuous Time Linear Equalizer (CTLE) Modelling
The concept of CTLE operation is based on flattening the frequency response of the overall data transmission
link. Figure 4.8 shows the frequency response of a typical channel along with the frequency responses
of an ideal equalizer and a real CTLE for this channel. An ideal equalizer is the inverse of the low-pass
filtering channel, so that it compensates for the channel attenuation. Since signal amplification at higher
frequencies increases unwanted noise, real CTLEs are bandlimited.
Figure 4.8: Bode plot of a channel accompanied by its ideal equalizer and realistic continuous time linear equalizer (CTLE) (|H(ω)| versus ω, showing the channel response, C(s), the ideal equalizer, 1/C(s), and the real CTLE, HCTLE(s), with Apeak, ωz1, ωp1, and ωp2 marked)
Here, Apeak defines the low-frequency gain of the CTLE. The real CTLE response can be formulated
by including one zero at ωz1 and two poles at ωp1 and ωp2 in its transfer function, HCTLE(s),
which can be described as follows,
\[
H_{CTLE}(s) \approx \frac{1}{C(s)} = K \cdot \frac{s + \omega_{z1}}{(s + \omega_{p1})(s + \omega_{p2})} \tag{4.3}
\]
where C(s) is the channel transfer function and K is the gain factor, defined as
\[
K = A_{peak} \cdot \left| \frac{\omega_{p1}\, \omega_{p2}}{\omega_{z1}} \right|
\]
This transfer function can provide a gain slope of up to 20 dB/dec between ωz1 and ωp1. To achieve higher gain and
more advanced equalization, more zeros and poles can be incorporated into the transfer function.
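Equation 4.3 can be evaluated directly on the jω axis to check the gain at any frequency. In the Python sketch below, the function name ctle_gain and the zero and pole frequencies are hypothetical values picked for illustration; only the formula itself is taken from the text.

```python
import math


def ctle_gain(omega, a_peak, w_z1, w_p1, w_p2):
    """|H_CTLE(j*omega)| from Equation 4.3 with K = A_peak*|w_p1*w_p2/w_z1|."""
    k = a_peak * abs(w_p1 * w_p2 / w_z1)
    s = complex(0.0, omega)
    return abs(k * (s + w_z1) / ((s + w_p1) * (s + w_p2)))


# Hypothetical corner frequencies: zero at 1 GHz, poles at 4 GHz and 10 GHz.
W_Z1 = 2 * math.pi * 1e9
W_P1 = 2 * math.pi * 4e9
W_P2 = 2 * math.pi * 10e9

for f_ghz in (0.0, 1.0, 4.0, 10.0, 40.0):
    gain = ctle_gain(2 * math.pi * f_ghz * 1e9, 1.0, W_Z1, W_P1, W_P2)
    print(f_ghz, "GHz:", round(20 * math.log10(gain), 2), "dB")
```

With this choice of K, the gain at ω = 0 equals Apeak, the gain rises between the zero and the first pole, and it rolls off above the second pole, which is the bandlimited behaviour described for a real CTLE.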
4.2.1 CTLE Implementation
For high-speed wire-line applications, a CTLE is usually implemented using passive resistive-capacitive
circuit components. An example of a CTLE circuit system is shown in Figure 4.9. Each stage of
the block diagram can be represented as a generic differential buffer block with an impedance transfer
function, Z(s). Each Z(s) is defined for the specific stage according to the stage functionality. The input of
the example CTLE is single-ended, while its output is differential. Because the received signal is single-ended,
the input terminal, Vin−, of the gain stage is connected to a reference voltage, Vref , while the other input
terminal, Vin+, is connected to the channel attenuated signal, Vin.
The task of the gain stage is to achieve high frequency gain, while the amplification stages
provide the amplification required for sampling. The impedance transfer function, Z(s), plays a major
role in defining the output characteristics of each stage. For the amplification stages, the Z(s)'s are set to
0 (shorted), while for the gain stage, Z(s) is formed using a parallel combination of a resistor, Rz,
and a capacitor, Cz. Applying this definition, the zero, ωz1, and the poles, ωp1 and ωp2, can be found as,
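\[
\omega_{z1} = \frac{1}{R_z C_z}, \qquad
\omega_{p1} = \frac{1}{R_z C_z} \left( 1 + \frac{g_{m1} \cdot g_{m2}}{g_{m1} + g_{m2}} \cdot R_z \right), \qquad
\omega_{p2} = \frac{1}{R_L C_L}
\]
where gm1 and gm2 represent the transconductances of the M1 and M2 transistors respectively.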
4.2.2 CTLE Modelling for OO Simulation
Since a CTLE is usually modelled to observe its eye diagram, its object type is chosen as the continuous type,
ObjTyp.Continuous. It can be incorporated as an independent filter or as part of a cascaded continuous
time filter for OO simulation. Because the goal here is to model its nonlinearity using the step response
based scheme, for its computational speed advantages, any preceding continuous time filters need to
Figure 4.9: Circuit-level overview of single-ended CTLE
(a) CTLE block diagram: a gain stage, with inputs Vin (single-ended signal) and Vref, followed by amplifying stages that produce the differential output Vout.
(b) Generic schematic for all stages: differential pair M1 and M2 with inputs Vin+ and Vin−, load resistances RL, the impedance Z(s), and the tail current Iss.
(c) Definition of Z(s): Rz in parallel with Cz for the gain stage; a short for the amplifying stage.
be cascaded. Figure 4.10 shows such an example case, where the CTLE is cascaded with the channel
(shown as Ch). Here, the source is considered discrete type object, such as RBG and FFE, where the
sink can be any object, such as any measurement scope. Algorithm 4.2 depicts the pseudo-code for
CTLE operation under OO simulation environment.
Figure 4.10: Representing CTLE for OO simulation. The transmitter contains the Source clocked by Clk; the receiver contains Ch and CTLE (the cascading filters) followed by the Sink.
Even though a CTLE is supposed to be functionally linear, a CTLE implemented at the circuit level
shows noticeable nonlinearity. This nonlinearity often produces deformed and asymmetric
eye diagrams, which result in high jitter as well as a shifted sampling threshold reference. These effects can
be taken into account in the CTLE model by considering the gain nonlinearity as well as the system memory,
which are described as follows.
Gain Nonlinearity
Gain nonlinearity is the variation in output signal gain with input signal amplitude.
Ideally, the gain of the CTLE is considered constant, but the nonlinearity is observable in
its circuit-level implementation. Figure 4.11 shows the DC gain plot of a differential-buffer-like CTLE.
Here, the DC gain is defined as ∆VOut/∆VIn, where ∆VIn = Vi2 − Vi1, ∆VOut = Vo2 − Vo1, and the
amplitudes, Vi1 and Vi2, can vary independently within the CTLE input signal range. As the input
signal, ∆VIn, increases due to Vi1 and Vi2, the output signal, ∆VOut, increases with variable gain until it
saturates.
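The amplitude-dependent gain defined above can be sketched numerically. The snippet below is only an illustration: it assumes a tanh-shaped DC transfer curve as a stand-in for the circuit, and the parameters `v_sat` and `small_signal_gain` are hypothetical.

```python
import math

def ctle_dc_transfer(v_in, v_sat=0.6, small_signal_gain=2.0):
    """Assumed saturating DC transfer curve (tanh); a stand-in, not the actual circuit."""
    return v_sat * math.tanh(small_signal_gain * v_in / v_sat)

def dc_gain(v_i1, v_i2):
    """DC gain, Delta V_Out / Delta V_In, between two input amplitudes."""
    v_o1 = ctle_dc_transfer(v_i1)
    v_o2 = ctle_dc_transfer(v_i2)
    return (v_o2 - v_o1) / (v_i2 - v_i1)

# A small input swing sees nearly the full small-signal gain,
# while a large swing sees a compressed gain because the output saturates.
gain_small_swing = dc_gain(-0.01, 0.01)
gain_large_swing = dc_gain(-0.5, 0.5)
print(gain_small_swing, gain_large_swing)
```

Under this assumed curve, the gain falls as the input swing grows, which is the behaviour the DC gain plot is meant to capture.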
Figure 4.11: Plot of CTLE gain response, vOut/vIn (horizontal axis: vIn, with ∆VIn spanning Vi1 to Vi2; vertical axis: vOut, with ∆VOut spanning Vo1 to Vo2)
Algorithm 4.2 Modeling template of continuous time linear equalizer for running OO simulation
classdef CTLE < handle
    properties
        objTyp = ObjTyp.continuous  - Continuous object type
        inPort                      - Input circuit object information
        % Other internal properties not shown here
    end
    methods
        % Constructor called from the top-level script
        function obj = CTLE
            - Construct the CTLE object
            - Receive and verify all required inputs
        end
        % Method init() triggered by the input object inPort
        function init(obj)
            - Define additional uninitialized internal variables
            - Calculate the output, y(t = 0)
            - Notify its output receiving objects for collection
        end
        % Method receive() triggered by the object inPort
        function receive(obj)
            - Collect outputs from the input object inPort at t_{i+1}
            - Call its process() method
        end
        % Method process() called from the method receive()
        function process(obj)
            - Generate time vector, (t_i, t_{i+1}]
            - Calculate the output, (y(t_i), y(t_{i+1})]
            - Notify its output receiving objects for collection
            - Discard unnecessary input information
        end
        % Other internal methods not shown here
    end
end
When a CTLE is modelled with an RBG as its source, which has only two possible output states, this
gain nonlinearity does not need to be considered, because these two states do not contribute to the
shape of the output eye diagram. However, if the CTLE is driven by an FFE, the amplitudes of
the FFE output states need to be recalculated, since these amplitudes change due to the amplitude-dependent
CTLE gain. For the single-ended CTLE, the FFE amplitudes become asymmetric with respect to their
center, which leads to an asymmetric eye diagram.
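To make the recalculation of the FFE levels concrete, the sketch below passes a symmetric set of multi-level amplitudes through an assumed saturating transfer curve with a small input offset. The tanh curve, the offset `v_offset`, and the level values are all hypothetical; they only illustrate why symmetric input levels come out asymmetric.

```python
import math

def ctle_dc_transfer(v_in, v_offset=0.01, v_sat=0.6, gain=2.0):
    """Assumed tanh transfer curve with an input-referred offset (illustrative only)."""
    return v_sat * math.tanh(gain * (v_in - v_offset) / v_sat)

# Hypothetical, perfectly symmetric FFE steady-state levels (in Volts)
ffe_levels = [-0.3, -0.2, -0.1, -0.05, 0.05, 0.1, 0.2, 0.3]

# Recalculated levels at the CTLE output
ctle_levels = [ctle_dc_transfer(v) for v in ffe_levels]

# Sum of each mirrored pair: zero for a symmetric set, non-zero once the offset acts
asymmetry = [lo + hi for lo, hi in zip(ctle_levels, reversed(ctle_levels))]
print(ctle_levels)
print(asymmetry)
```

In this sketch, every mirrored pair sums to a negative value, so the recalculated levels are shifted relative to their center even though the input levels were symmetric.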
System Memory
Besides gain nonlinearity, the CTLE system exhibits memory. Because of system memory, the CTLE changes
its system transfer function based on the previously transmitted bits (or transition sequence). This
phenomenon is evident from the collected step responses shown in Figure 4.12. The constant, s∞,
represents the steady-state height of the step responses. Since the CTLE transfer function changes, the
collected step responses show variation in peaking. These step responses are collected by reversing
the continuous time waveform construction process (described in Section 2.5.3). Figure 4.13 explains the
reversing process. First, two continuous time waveforms, y1(t) and y2(t), due to the input signal transitions
α1α2α3 . . . αi−1αi and α1α2α3 . . . αi−1 respectively, are recorded from a SPICE simulator. Subtracting
y2(t) from y1(t) yields the step response due to transmitting the transition, αi. The extraction region for
the step response is determined by observing the variation in y1(t) − y2(t).
Figure 4.12: Extracted step responses for modelling a CTLE (considering the effect of channel). (a) Falling edge case, −sF(t), inverted for visualization, with steady-state level −s∞; (b) rising edge case, sR(t), with steady-state level s∞. Both are plotted against t over the extraction window, 0 ≤ t ≤ tExt.
Figure 4.13: Step response extraction process for CTLE. The panels show the waveform due to the transition sequence α1α2 . . . αi−1αi, y1(t); the waveform due to the transition sequence α1α2 . . . αi−1, y2(t); and the calculated step response, y1(t) − y2(t), with the extracted region spanning 0 to tExt.
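The subtraction step can be sketched as follows. A SPICE run is not available here, so a first-order system with an assumed time constant `TAU` stands in for the simulator; the transition times and amplitudes are hypothetical as well.

```python
import math

TAU = 20e-12  # assumed time constant of the stand-in system (not from the thesis)

def unit_step_response(t):
    """Step response of the stand-in first-order system."""
    return 1.0 - math.exp(-t / TAU) if t >= 0 else 0.0

def waveform(t, transitions):
    """Stand-in for a recorded waveform; transitions is a list of (t_k, alpha_k)."""
    return sum(alpha * unit_step_response(t - tk) for tk, alpha in transitions)

# y1(t) is driven by alpha_1 ... alpha_i, y2(t) by alpha_1 ... alpha_{i-1}
seq_full = [(0.0, +2.0), (125e-12, -2.0), (250e-12, +2.0)]
seq_short = seq_full[:-1]

t_i, alpha_i = seq_full[-1]
times = [t_i + k * 5e-12 for k in range(40)]  # samples inside the extraction region

# Step response due to the last transition, obtained by subtraction
extracted = [waveform(t, seq_full) - waveform(t, seq_short) for t in times]
```

Because the stand-in system is linear, the extracted response here is the same for every preceding sequence. In the circuit-level CTLE, repeating the extraction for different preceding sequences is what exposes the variation in peaking.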
Here, we propose modelling the CTLE using step responses, sF(t) and sR(t), collected from shorter
SPICE simulations. Figure 4.14 shows the construction process of the CTLE continuous time waveform,
yCTLE(t). The top plot shows a random bit-stream, x(t), which has transitions at ti (where i = 1, 2, . . . ).
For each transition at ti, a step response is determined based on the approximate output at t′i = ti + ∆t,
where ∆t denotes a constant time offset. The approximate output at t′i does not exactly follow the CTLE
output, yCTLE(t), since it does not take into account the transitions happening after ti. The offset, ∆t, is determined
during the step response extraction by observing where the variation among the step responses is visually
largest. During the waveform construction, the step response is determined either by selecting the closest
step response among the collected ones or by interpolating between them.
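The construction described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the step response family (`make_step`), the three-entry `LIBRARY`, and the rule that the approximate output at t′i is built only from transitions before ti are all hypothetical choices, and only the closest-match selection (not interpolation) is shown.

```python
import math

def make_step(peaking, tau=30e-12):
    """Hypothetical unit step response: first-order settling plus a decaying overshoot."""
    def s(t):
        if t < 0:
            return 0.0
        return 1.0 - math.exp(-t / tau) + peaking * (t / tau) * math.exp(-t / tau)
    return s

# Library of (approximate output recorded at extraction, step response); values are made up
LIBRARY = [(-0.4, make_step(0.30)), (0.0, make_step(0.20)), (0.4, make_step(0.10))]

def select_step(y_approx):
    """Pick the collected step response whose recorded output is closest to y_approx."""
    return min(LIBRARY, key=lambda entry: abs(entry[0] - y_approx))[1]

def build_waveform(transitions, y0, dt_offset):
    """transitions: list of (t_i, alpha_i). Returns a callable y(t)."""
    chosen = []  # (t_i, alpha_i, selected step response)
    for t_i, alpha_i in transitions:
        t_probe = t_i + dt_offset
        # Approximate output at t'_i, ignoring this and all later transitions
        y_approx = y0 + sum(a * s(t_probe - tk) for tk, a, s in chosen)
        chosen.append((t_i, alpha_i, select_step(y_approx)))
    return lambda t: y0 + sum(a * s(t - tk) for tk, a, s in chosen)

y_ctle = build_waveform(
    [(0.0, 0.8), (100e-12, -0.8), (300e-12, 0.8)], y0=-0.4, dt_offset=20e-12
)
print(y_ctle(50e-12), y_ctle(5e-9))
```

The nonlinearity is consulted once per transition (inside `select_step`), not once per time point, which is the property the scheme relies on for speed.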
The proposed modelling method for CTLE offers several key benefits compared to other modelling
methods. The example single-ended CTLE has asymmetric rising and falling edges, leading to a deformed
eye diagram. This can easily be taken into consideration in the scheme by using two different sets of step
responses. Another key advantage is that the nonlinearity conditions of the CTLE need to be determined
only at the transition events instead of at every chosen time point. Thus, unlike time-step based ODE modelling,
the proposed modelling scheme avoids numerous repetitive calculations at every time step.
In addition, the proposed modelling scheme considers the memory effect using the approximate output of
the CTLE, which provides the flexibility to include frequency offset related activities, such as data-rate
variations and random and deterministic jitter. Other memory effect modelling schemes, such as bit-pattern
dependent modelling (proposed by Ren, J. et al. [19]), cannot be applied under such frequency offset
environments.
Figure 4.14: Continuous time waveform formation for a CTLE (considering the effect of channel). The top plot shows x(t), switching between −1 and +1 with a transition at ti; the bottom plot shows yCTLE(t), between the levels A−1 and A+1, with t′i following ti by the offset ∆t.
4.2.3 CTLE Modelling Test Cases
In order to evaluate the accuracy of the proposed CTLE modelling scheme, two test cases were considered.
The first case involves testing a CTLE by itself, and the second case tests the CTLE along
with an FFE at a higher data rate. The two cases are described with the help of Figures 4.15 and 4.16
respectively. The objective is to compare the accuracy of the eye diagrams generated during both test cases.
Test Case 1: CTLE Operation
The first test case focuses on the effectiveness of the proposed system memory consideration during the CTLE
modelling. Hence, the test case includes a binary clocked source at the transmitter, a channel (shown as
Ch), and a CTLE at the receiver. Because the goal is to study the eye diagram at the CTLE output,
an eye scope is included after the CTLE. Here, the source is a PRBS7 generator producing a binary signal at
8 Gbps. The channel is a 4-inch FR4 channel having an attenuation of ∼5 dB at the Nyquist frequency,
fNyquist = 4 GHz. The CTLE was designed to have a zero, ωz1 = −3.769 × 10^10 rad/s, and two poles,
ωp1 = 0.6 · fNyquist and ωp2 = 3 · ωp1, to provide the desired boost.
Based on the design specification, a CTLE circuit is implemented at the transistor level. In order to
realize the zero, ωz1, the resistor, Rz, and the capacitor, Cz, are chosen to be approximately 1.672 kΩ and
41.3 fF respectively. The two poles, ωp1 and ωp2, appear due to circuit parasitics and the load capacitance.
The common mode at the input terminal, Vin, of the CTLE is set to 750 mV and the reference terminal,
Vref, is set to 740 mV. Due to the offset between the input common mode and the reference voltages, the
eye diagram generated at the CTLE differential output through Spectre simulation becomes asymmetric.
(The asymmetry in the differential signal is introduced in order to counteract the asymmetry initiated
by the FFE, which is covered in the next test case.)
Following the proposed modelling scheme, the CTLE is then modelled. During the test, since the
CTLE response is calculated due to a binary source, gain nonlinearity is not taken into account. In order
to consider the CTLE system memory, 16 different rising and falling edges are recorded from Spectre
simulations. The extracted step responses are then used to construct the continuous time waveform,
segments of which are later overlaid on top of each other to generate the eye diagram. As can be observed from
the figure, the eye diagram from the proposed scheme closely matches the one from
the Spectre simulation. Table 4.2 shows the eye diagram measurements for both the Spectre and the proposed
modelling scheme cases.
Table 4.2: Eye diagram measurements for continuous time linear equalizer (CTLE) test-case
| Eye Measurements | From Spectre | From Proposed Scheme |
|---|---|---|
| Horizontal eye opening | 0.78 UI | 0.78 UI |
| Vertical eye opening | 301 mVppd | 300 mVppd |
| Relative simulation time¹ | 2000× | 1× |
Test Case 2: FFE-CTLE Joint Operation
The aim of this test case is to demonstrate how to describe the CTLE nonlinearity due to a multi-level
source, such as an FFE. Figure 4.16 shows the test case block diagram and the overlapped eye diagrams
for comparison. The test-bench of this case is quite similar to that of Test Case 1, except that the
transmitted signal is pre-equalized by an FFE. The local clock at the transmitter (shown as TxClk)
triggers both the source and the FFE.
Here, the data transmission takes place at 16 Gbps and the channel has ∼20 dB attenuation (at
fNyquist = 8 GHz). Before the transmission, the input bit-stream is equalized by the same 3-tap FFE
described in Section 4.1.3. However, the FFE steady-state transition levels vary from their originally
recorded values due to the CTLE gain nonlinearity. The newly recorded FFE transition levels are
−0.570, −0.559, −0.547, −0.483, 0.412, 0.512, 0.541, and 0.570 (all measured in Volts). Even though
the FFE has 8 different steady states, there are only 14 possible different transitions (instead of
8 × (8 − 1) = 56). For each transition case, 8 different CTLE step responses are considered. The step responses
are applied following the proposed scheme to generate the continuous time waveform, which is then used
to generate an eye diagram. The eye diagrams generated from both Spectre and the proposed scheme are
overlapped and, as can be observed, the two eye diagrams match. The measurements related to these
diagrams are presented in Table 4.3.
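One plausible way to arrive at the count of 14 is sketched below. It assumes that each FFE steady state is set by three consecutive bits (a 3-bit shift register), so that a state can only move to a state sharing its two most recent bits; this accounting is an assumption made here for illustration and is not spelled out in the text.

```python
from itertools import product

# Each state is (b[n-2], b[n-1], b[n]) for a 3-tap FFE: 2**3 = 8 states
states = list(product((-1, +1), repeat=3))

# A new bit shifts in, so each state has two successors; drop the cases
# where the state does not change (all bits equal), since no transition occurs.
transitions = {
    (state, state[1:] + (new_bit,))
    for state in states
    for new_bit in (-1, +1)
    if state[1:] + (new_bit,) != state
}

print(len(states), len(transitions))  # 8 states, 14 transitions
```

Under this assumption, the 8 × 2 = 16 shift-register moves include two that leave the state unchanged (all −1 and all +1), leaving 14 distinct transitions.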
Table 4.3: Eye diagram measurements for CTLE test-case due to FFE
| Eye Measurements | From Spectre | From Proposed Scheme |
|---|---|---|
| Horizontal eye opening | 0.71 UIpp | 0.69 UIpp |
| Vertical eye opening | 498 mVppd | 501 mVppd |
| Relative simulation time¹ | 2000× | 1× |
¹ Approximated from the speed measurements of the FFE test-case, described in Section 3.3.5.
Figure 4.16: CTLE modelling performance evaluation due to an FFE. (a) Block diagram: Source and FFE at the transmitter (both clocked by TxClk), followed by Ch, and the CTLE and EyeScope at the receiver; (b) eye diagram comparison between the proposed scheme and Spectre, each plotted from −UI to +UI.
4.3 Decision Feedback Equalizer (DFE) Modelling
The DFE operation involves subtracting residual ISI from the channel attenuated signal directly in the time
domain, and the equalization is usually performed right before data sampling. The residual ISI is determined
based on the previously detected bits and hence, unlike an FFE, this equalizer cannot remove pre-cursor ISI.
Besides, the DFE depends on the local clock supply to calculate the residual ISI and hence its
performance depends on the amount of jitter present in the clock.
Figure 4.17 shows the architecture of a DFE. The DFE consists of three key components: an adder,
a slicer, and a feedback filter. The task of the adder is to subtract the residual ISI, ei, calculated in discrete time
(where ei = e(t = ti) and i = 1, 2, 3, . . . ), from the continuous time input signal, x(t). The discrete
time residual ISI, ei, is held for the bit-duration, Tb, until a new residual ISI, ei+1, is available. The
continuous time output from the adder, y(t), is then sampled by the slicer to determine the transmitted
bits, yi, with respect to an assigned threshold. Each yi can be any of the valid transmitted bits, yi ∈ {−1, +1},
for a given threshold of 0. The feedback filter calculates the residual ISI, ei, using the previously
decided bits, yi’s, as inputs, based on the defined tap weights, wi’s, where i = 1, 2, . . . , N.
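The adder, slicer, and feedback filter loop can be sketched in discrete time as follows. This is a minimal illustration that only looks at the samples seen by the slicer; the function name, the cursor values, and the bit pattern are hypothetical, and the feedback taps are simply set equal to the post-cursors.

```python
def dfe_equalize(samples, weights, threshold=0.0):
    """samples: input x at the slicer instants; weights: [w1, ..., wN]."""
    history = [0] * len(weights)  # y_{i-1}, y_{i-2}, ... (0 means no decision yet)
    decisions, equalized = [], []
    for x in samples:
        e = sum(w * y for w, y in zip(weights, history))  # residual ISI, e_i
        y_eq = x - e                                       # adder
        bit = 1 if y_eq > threshold else -1                # slicer
        equalized.append(y_eq)
        decisions.append(bit)
        history = [bit] + history[:-1]                     # shift the delay line
    return decisions, equalized

# Hypothetical channel cursors: main cursor c0 and two post-cursors c1, c2
cursors = [1.0, 0.7, 0.4]
bits = [1, -1, -1, 1, 1, -1, 1, -1]
samples = [
    sum(c * bits[i - k] for k, c in enumerate(cursors) if i - k >= 0)
    for i in range(len(bits))
]

raw = [1 if x > 0 else -1 for x in samples]                 # slicing without a DFE
decisions, equalized = dfe_equalize(samples, weights=cursors[1:])
print(raw == bits, decisions == bits)
```

With these made-up cursors, the post-cursor ISI is large enough to flip a bit when the samples are sliced directly, while the DFE loop recovers the transmitted pattern as long as its earlier decisions are correct.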
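Figure 4.17: Basic architecture of a decision feedback equalizer. The signal in, x(t), enters the adder, whose equalized output, yDFE(t), feeds the slicer that produces the decided bits, yi. The feedback filter passes the decided bits through a chain of z−1 delays weighted by w1, w2, . . . , wN to form the residual ISI, ei, which is returned to the adder.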
4.3.1 DFE Implementation
The concept of residual ISI subtraction during the DFE operation is explained using Figure 4.18. The DFE
pulse response, pDFE(t), is superimposed on top of the intended channel response, c(t), which is recorded
due to the input pulse, Π(t). Here, the example DFE only cancels the two ISI cursors, c1 and c2, that follow
the main cursor, c0, based on the clock sampling phase. As can be observed, pDFE(t) contains
sharp edges, because the DFE removes the ISI only at the sampling locations due to its discrete time
feedback filter. For high-speed transceiver operation, a slicer is usually implemented using a sampling
latch, which requires sufficient sampling aperture (both before and after the sampling clock edge) to
function properly. To ensure proper sampling latch operation, the outputs from the filter should be made
available at the furthest point from the sampling phase; hence, each discontinuity in pDFE(t) appears around
the middle point between two neighbouring cursors.
Figure 4.18: Pulse response due to 2-tap decision feedback equalizer (DFE). The top plot shows the input pulse, Π(t), of unit height between t = 0 and t = Tb; the bottom plot shows the channel response, c(t), with peak Apeak and cursors c0, c1, and c2, overlaid with the DFE response, pDFE(t).
At the circuit level, a DFE employs an analog adder accompanied by a digitally clocked slicer and a
feedback FIR filter. The example DFE used for the modelling study both receives and provides differential
signals in order to comply with the differential output of the CTLE presented earlier. Figure 4.19
provides a circuit-level overview of the DFE of interest. The DFE adder is realized using multiple gain
blocks with a shared resistive load, RL, in order to perform current-mode summation. Each gain block is
a differential transistor pair, M1 and M2, biased with a current source, Iss,i, which is set proportional
to the corresponding tap weight, wi, where i = 0, 1, 2, . . . , N. The slicer is designed using a DFF with
high input sensitivity in order to achieve greater amplification for a low equalized signal. The output of the
DFF is then fed to the digital FIR logic to determine the polarity and delay of each gain stage, Ai. Here,
the rising edge of the clock is considered as the sampling phase for the DFF and the falling edge is used for
the digital FIR logic operation.
4.3.2 DFE Modelling for OO Simulation
Depending on the simulation requirements, a DFE can be considered as either a discrete time circuit
object, ObjTyp.discrete, or a continuous time circuit object, ObjTyp.continuous. If the simulation
objective is only to acquire the recovered bits, modelling the DFE as a discrete time object is usually
sufficient. However, if generating eye diagrams is the ultimate goal, the DFE needs to be considered as a
continuous time circuit object. During the simulation, a DFE accepts two inputs: a clock source, which
is of discrete type, and a signal source for equalization. If the DFE is modelled using an adder with linear
gain and infinite bandwidth, the signal source should be a continuous time circuit object, such as a channel
or a CTLE. In contrast, if a realistic adder is incorporated, the DFE modelling scheme becomes similar to
that of the CTLE. For step response based modelling, any continuous time filter along the signal source
path needs to be cascaded. Algorithm 4.3 presents the pseudo-code for a DFE as a continuous time
circuit object. Like an FFE, the DFE also operates synchronously with the local clock. Hence, the methods
for the DFE are programmed similarly to those of the FFE.
Figure 4.19: Circuit-level overview of differentially ended DFE. (a) Top-level: gain blocks A0, A1, . . . , AN sharing the load resistors, RL, followed by the DFF slicer and the digital FIR logic, with the signal in, clock, equalized signal, and decided binary bits marked; (b) gain block description: symbol view (Vin to Iout through Ai) and schematic view (differential pair, M1 and M2, with tail current Iss,i ∝ Ai).
Algorithm 4.3 Modeling template of decision feedback equalizer for running OO simulation
classdef DFE < handle
    properties
        objTyp = ObjTyp.continuous  - Continuous object type
        clkPort                     - Clock object information
        inPort                      - Data source object information
        % Other internal properties not shown here
    end
    methods
        % Constructor called from the top level script
        function obj = DFE
            - Construct the circuit object DFE
            - Receive and verify all required input information
        end
        % Method init() triggered by data source object, inPort
        function init(obj)
            - Define additional uninitialized internal variables
            - Calculate the output at t = 0
            - Notify its output receiving objects for initial output collection
        end
        % Method receive() triggered by data source object, inPort
        function receive(obj)
            - Collect outputs from object pointed by inPort at (t_{i-1}, t_i]
            - Append the outputs to previously stored information
        end
        % Method process() triggered by clock object, clkPort
        function process(obj)
            if processing is completed
                return
            end
            - Get the clock transition, t_j
            if max collected output timing of inPort < t_j
                - Hold on to the state at t_j
                return
            end
            - Determine next processing time range, (t_{j-1}, t_j]
            - Calculate the output for the range
            - Notify its output receiving objects for output collection
            - Discard unnecessary input information
        end
        % Other internal methods not shown here
    end
end
Output calculation for the DFE is similar to that of the CTLE due to algorithmic similarity. Figure
4.20 shows how to modify the DFE block diagram in order to capture the finite bandwidth property
for step response based modelling. Figure 4.20(a) shows a DFE top-level block diagram with a bandlimited
adder, where an LPF with transfer function, HAdder(s), is incorporated after the adder block. In Figure 4.20(b),
the HAdder(s) block is shifted before the adder, which results in two HAdder(s) blocks: one along the signal path and
another following the FFIR(z) block. The HAdder(s) block along the signal path needs to be cascaded with
its preceding continuous time filter blocks, such as the channel and the CTLE. The other HAdder(s) block is applied
to convert the discrete time output of the feedback filter, FFIR(z), into a bandlimited continuous time input
for the adder. This allows the adder to be considered as an ideal summer, since its inputs are bandlimited.
Figure 4.20: Modifying DFE model to capture the finite adder bandwidth. (a) Conventional model: the signal in, x(t), enters a bandlimited adder (an adder followed by HAdder(s)), then the slicer producing the decided bits, yi, which are fed back through FFIR(z); (b) considering adder bandwidth before the addition: one HAdder(s) on the signal path and one after FFIR(z), both ahead of the adder.
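The block move relies on the adder LPF being linear. As a brief check, writing $h_{Adder}(t)$ for the impulse response of HAdder(s) and $e(t)$ for the held output of the feedback filter (both symbols are introduced here only for illustration), convolution distributes over the subtraction:

$$
y_{DFE}(t) = h_{Adder}(t) * \big[x(t) - e(t)\big] = h_{Adder}(t) * x(t) \; - \; h_{Adder}(t) * e(t)
$$

Filtering the two adder inputs separately therefore gives the same output as filtering their difference, as long as the LPF is treated as linear. Any remaining adder nonlinearity has to be captured separately, for instance through the step responses.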
As with the CTLE, gain nonlinearity and system memory need to be taken into account for the DFE adder.
Initially, as part of the gain nonlinearity modelling, all DFE steady-states and their state transitions are
identified due to the input signal source as well as the feedback filter. In order to capture system memory,
multiple step responses are recorded with their associated intermediate output information. In this case,
one set of such information is associated with the cascaded continuous time filters along the signal path
and the other set is related to the HAdder(s) following the FFIR(z). The collected information is then
applied to construct the continuous time DFE output. Because the proposed modelling scheme for the
DFE shares great similarity with that of the CTLE, the aforementioned modelling advantages are also
retained.
4.3.3 DFE Modelling Test Case
Based on the proposed DFE modelling scheme, a test case is designed to evaluate the modelling accuracy.
Figure 4.21 depicts the testbench and the eye diagram comparison. The block diagram is quite similar
to the typical top-level equalization architecture (Figure 4.1). As can be inferred from the testbench, all
equalizers (FFE, CTLE, and DFE) are considered during this test. Here, the data transmission is set at
16 Gbps and the selected channel (shown as Ch) is 4-inch FR4 with an insertion loss of ∼20 dB at the
Nyquist frequency, fNyquist = 8 GHz. An eye scope is added to observe the eye diagram after the combined
equalization of the FFE, CTLE, and DFE. The receiver clock (shown as RxClk) is created by delaying the
transmitter clock (shown as TxClk) instead of incorporating a clock synchronization scheme, because
the objective is only to capture the equalizer nonlinearity in the eye diagram.
Figure 4.21: DFE modelling performance evaluation with respect to FFE and CTLE. (a) Block diagram: Source and FFE at the transmitter (clocked by TxClk), followed by Ch, and the CTLE, DFE, and EyeScope at the receiver, with RxClk derived from TxClk through a delay buffer; (b) eye diagram comparison between the proposed scheme and Spectre, each plotted from −UI to +UI.
In order to equalize the ∼20 dB channel attenuation, the gain is distributed among all equalizers. For the
test, the same 3-tap FFE and first order CTLE are employed, which are described in Sections 4.1.3 and
4.2.3 respectively. After the FFE-CTLE equalization, a 1-tap DFE is employed to eliminate the remaining
ISI. The tap weights for the DFE are set as the vector WDFE = [9/12, −3/12], which provides an additional ∼6 dB
gain. Considering all equalizers, an eye diagram is generated using the outputs from the Spectre simulation.
In order to achieve the Spectre-like eye diagram, the DFE step responses and their associated
parameters were extracted. Here, the 3-tap FFE and the 1-tap DFE together contribute 2^3 × 2^1 = 16 possible
steady-states, which were extracted from the Spectre simulation as −0.654, −0.652, −0.650, −0.635, −0.310,
−0.309, −0.308, −0.300, 0.273, 0.300, 0.304, 0.307, 0.588, 0.638, 0.645, and 0.650 (all measured in Volts).
For the 16 possible states, 22 different types of step responses are identified and, for each type, 4 different step
responses were extracted. Applying the step responses, an eye diagram for the DFE was constructed and
compared with that generated from the Spectre simulation. As can be seen from the figure, the two
eye diagrams overlap closely in nearly all eye diagram features. Table 4.4 presents eye diagram
Unlike Equation 2.15, the second segment, marked as φConst(t), still remains a function of time,
t. If this segment remains time-varying, the performance of the implemented algorithm will degrade as
O(N^2) as the simulation time length increases, which is clearly not desirable. In order to achieve linear
computational performance, O(N), like that of the step response based modelling scheme, the time-varying
nature of φConst(t) needs to be handled algebraically such that it can be calculated for any given time
space. Since the steady-state ϕ∞(t) can be described with a quadratic expression, ϕ∞(t) = At^2 + Bt + C,
plugging the expression into φConst(t) leads to the following,
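$$
\begin{aligned}
\varphi_{Const}(t) &= \lim_{N\to\infty} \sum_{i=1}^{N-k} (v_{PD,i} - v_{PD,i-1}) \cdot \phi_{\infty}(t - t_i) \\
&= \lim_{N\to\infty} \sum_{i=1}^{N-k} (v_{PD,i} - v_{PD,i-1}) \cdot \left[ A(t - t_i)^2 + B(t - t_i) + C \right] \\
&= \lim_{N\to\infty} \sum_{i=1}^{N-k} (v_{PD,i} - v_{PD,i-1}) \cdot
\begin{bmatrix} A & B & C \end{bmatrix} \cdot
\underbrace{\begin{bmatrix} 1 & -2t_i & t_i^2 \\ 0 & 1 & -t_i \\ 0 & 0 & 1 \end{bmatrix}}_{3 \times 3 \text{ Constant Matrix, } O(N)} \cdot
\begin{bmatrix} t^2 \\ t \\ 1 \end{bmatrix}
\end{aligned}
\tag{5.7}
$$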
As can be observed from the new expression for φConst(t) in Equation 5.7, the summation over
i = 1, 2, . . . , N−k (as N → ∞) only incorporates the discrete time information associated with the PD output
transitions, ti’s. The summed terms do not include the actual continuous time variable, t, whose range
can be described as [0, tStop], where tStop indicates the simulation stop time.
As in the case of the step response based modelling, the first expression in Equation 5.6 involves k
summations, which also facilitates simulation length independence, O(N). The second expression,
φConst(t), has been modified so that only the 3 × 3 matrix comprising ti needs to be updated. Hence,
the second expression has characteristics similar to those of the step response case and its
computational complexity is also O(N).
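The benefit of the factored form can be sketched with three running sums. The class below is a minimal illustration, not the thesis implementation; the class name, the coefficient values, and the transition data are hypothetical. Each new PD transition updates the sums once, and the contribution at any time t is then evaluated without revisiting earlier transitions.

```python
class QuadraticTailAccumulator:
    """Running sums for sum_i dv_i * [A (t - t_i)^2 + B (t - t_i) + C]."""

    def __init__(self, A, B, C):
        self.A, self.B, self.C = A, B, C
        self.s0 = 0.0  # sum of dv_i
        self.s1 = 0.0  # sum of dv_i * t_i
        self.s2 = 0.0  # sum of dv_i * t_i**2

    def add_transition(self, t_i, dv):
        """Constant-time update for one transition, dv = v_PD,i - v_PD,i-1."""
        self.s0 += dv
        self.s1 += dv * t_i
        self.s2 += dv * t_i * t_i

    def evaluate(self, t):
        """Constant-time evaluation at an arbitrary time t."""
        A, B, C = self.A, self.B, self.C
        return (A * (self.s0 * t * t - 2.0 * self.s1 * t + self.s2)
                + B * (self.s0 * t - self.s1)
                + C * self.s0)

# Hypothetical coefficients and transitions, in arbitrary units
A, B, C = 0.5, -1.0, 2.0
events = [(0.5, 1.0), (1.25, -1.0), (2.0, 1.0), (4.5, -1.0)]

acc = QuadraticTailAccumulator(A, B, C)
for t_i, dv in events:
    acc.add_transition(t_i, dv)

t = 7.0
direct = sum(dv * (A * (t - t_i) ** 2 + B * (t - t_i) + C) for t_i, dv in events)
print(acc.evaluate(t), direct)
```

The direct summation grows with the number of past transitions for every evaluated time point, whereas the accumulator keeps the per-point cost fixed, which is the source of the O(N) overall behaviour.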
5.3 Putting It All Together
Since the task of the CDR is to generate clock transitions synchronized to the optimal sampling location at
the receiver end, the CDR is modelled as a discrete time circuit object, ObjTyp.discrete, in the OO
simulation environment. The CDR accepts only one input, associated with binary input sources like an RBG.
Depending on the CDR architecture and test objectives, the input source for the CDR can be a discrete time or
a continuous time circuit object. If the selected CDR architecture employs a linear PD, the CDR input
source should be of discrete type, since certain PD transitions originate from the transitions of the
input source. Sometimes the sampling correctness of the DFFs of the PD is important for the simulation and,
under such circumstances, the CDR can be modelled to accept input from a continuous time circuit object.
Algorithm 5.3 depicts a generic template for coding both linear and binary PD based CDRs. Here, linear
PD based CDR operation is explained for OO simulation purposes, because its inherently asynchronous
behaviour causes incompatibility with the event-driven simulation environment. Later, a brief discussion on
incorporating a binary PD is also provided.
Determining the CDR clock transitions, which is performed inside the method receive(), requires
simultaneously dealing with the PD output and the LF-VCO discrete time transitions. Figure 5.6 is employed
here to explain the proposed CDR clock output transition determination scheme for the linear PD based
CDR. After receiving newly processed information, ti+1, from the CDR input circuit object, the CDR
method receive() appends the information to the previously received information, ti. This creates an
analysis window, [ti, ti+1), within which the CDR algorithm can determine its output transitions. Let
us assume that N + 1 CDR clock transitions occur within the analysis window, [ti, ti+1). At first, the
linear PD output is determined at the time point, ti, which is associated with the data transition.
Taking into account the PD output transitions and the previous clock transition, tj−1 (where tj−1 < ti),
the proposed scheme defines a new analysis sub-window, [ti, tj + δt). In the new analysis sub-window,
tj represents the new clock transition to be detected and δt defines a small offset necessary to determine
the new transition. Within the newly defined window, the continuous time output of the LF-VCO is calculated
(mostly around time tj) and, from the output, the new transition, tj, is detected through interpolation
at nπ. At this point, the PD output state is updated again at tj. A similar continuous time output calculation
for the LF-VCO is performed for the new sub-window, [tj, tj+1 + δt), to detect the next clock transition at tj+1.
After that, the PD output state is updated again. This PD state updating and new clock transition
detection continue until the final sub-window, [tj+N, ti+1), has been reached. The final sub-window,
[tj+N, ti+1), appears after detecting the last clock transition, tj+N, and analyzing this window does not
provide any new transitions. It is worth mentioning that the PD will not change its state after the tj+1 time
event until ti+1 and hence the final sub-window can be selected as [tj+1, ti+1) to detect the remaining
CDR clock transitions.
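The detection of a clock transition by interpolation at nπ can be sketched as follows. This is a minimal illustration that assumes the LF-VCO output is available as sampled phase values within the sub-window and that linear interpolation is adequate; the function name and the sample data are hypothetical.

```python
import math

def find_clock_transition(times, phases, n):
    """Linearly interpolate the first upward crossing of phase = n * pi.

    times and phases are samples of the LF-VCO output phase within one
    analysis sub-window. Returns None if the phase does not cross n * pi.
    """
    target = n * math.pi
    points = list(zip(times, phases))
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if p0 < target <= p1:
            return t0 + (target - p0) * (t1 - t0) / (p1 - p0)
    return None

# Hypothetical sub-window: a 10 GHz clock phase sampled every 7 ps
f_clk = 10e9
times = [k * 7e-12 for k in range(20)]
phases = [2.0 * math.pi * f_clk * t for t in times]

t_j = find_clock_transition(times, phases, n=1)       # crossing of pi
t_j_next = find_clock_transition(times, phases, n=2)  # crossing of 2*pi
print(t_j, t_j_next)
```

In this example the phase is a straight line, so the interpolated crossings fall at half and one full clock period. With a curved phase trajectory, the accuracy depends on how densely the output is calculated around tj, which is why the calculation concentrates on that region.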
As can be observed, the proposed modelling scheme takes advantage of the event scheduling flexibility
of the OO simulation technique. The event scheduling flexibility allows the
events, tj, tj+1, tj+2, . . . , tj+N, to be scheduled after estimating the data transition event, ti+1, even though
the data transition event, ti+1, occurs later in signal time space. Maintaining the ascending order of time
events, tj < tj+1 < tj+2 < · · · < tj+N < ti+1, is not necessary, since the data transition event, ti+1, is
not going to change due to any variation in the CDR operation. Besides, the event scheduling flexibility
avoids the otherwise inevitable repeated calculation of the entire system in order to detect the
clock transitions. The proposed modelling scheme can also be applied to a binary PD based CDR. In that
case, the PD outputs related to data transitions do not take place; otherwise, linear and binary PD based
CDRs share the same modelling procedure. This also indicates that more advanced
mixed signaling schemes can be described using the proposed CDR modelling concept.
Algorithm 5.3 Modeling template of clock and data recovery for running OO simulation
classdef CDR < handle
    properties
        objTyp = ObjTyp.discrete  - Discrete object type
        inPort                    - Input object
        % Other internal properties not shown here
    end
    methods
        % Constructor called from the top-level script
        function obj = CDR
            - Construct the CDR object
            - Receive and verify all required inputs
        end
        % Method init() triggered by the object inPort
        function init(obj)
            - Define additional internal input variables
            - Calculate output at time, t = 0
            - Set the PD state = 0
            - Enlist itself to the event host
        end
        % Method receive() triggered by the object inPort
        function receive(obj)
            - Collect outputs from inPort at t_{i+1}
            % Determine all possible clock transitions ≤ t_{i+1}
            while true
                - Determine next clock transition, t_next
                if t_next ≥ t_{i+1}
                    break
                end
                - Accept t_next into the clock transition list
                - Update PD state at t_next
            end
            - Update PD state at t_{i+1} % Not applicable for binary PD
            - Erase unnecessary PD states from the PD transition list
        end
        % Method process() triggered from the event host
        function process(obj)
            if (process() is completed) || (clock transition list is empty)
                return
            end
            - Pass the first transition from the accepted transition list
            - Notify the next circuit object
        end
        % Other internal methods not shown here
    end
end
Figure 5.6: Demonstration of CDR clock transition calculation. The data eye (one or multiple UIs) spans two consecutive data transitions, ti and ti+1; the intermediate calculation steps extend to tj + δt, tj+1 + δt, . . . ; the reconstructed outputs show the PD output and the clock output, with clock transitions at tj−1, tj, tj+1, . . . , tj+N.
5.4 Performance Evaluation for the Proposed Modelling Scheme
The proposed step response based modelling scheme for CDR clock transitions in the OO simulation
environment is compared here with the conventional time-step based modelling scheme. The objective of
the comparison is to evaluate its modelling accuracy against the conventional scheme before making
additional modifications to capture the CDR nonlinearity due to charge pump assisted LF-VCO
saturation as well as DFF regeneration effects from the PD. A linear PD based CDR is selected for the
test case.
Figure 5.7: Test case block diagram for linear PD based CDR. (Blocks: transmitter with Clk and Source; channel, Ch; receiver with CDR and BERT.)
Figure 5.7 shows the testbench employed here to verify the accuracy of the proposed modelling
scheme. Based on the block diagram, the transmitter end employs only a clock (marked as Clk)
synchronous source, which generates a PRBS7 bit-stream (marked as Source) at 10 Gbps. The source
output is then passed through the channel (shown as Ch). Here, the channel only adds a controlled delay, ∆t,
without introducing any attenuation, since the test case is not designed to include any jitter effect due to
an equalizer. The receiver side comprises only the linear PD based CDR followed by a bit error rate tester
(BERT) for PRBS7. The CDR is designed with a loop filter bandwidth of ω_LF = 20π × 10^6 rad/s
and a phase margin of 53.1°. Based on the system specification, the CDR circuit design parameters are
determined as I_P = 20 µA, C_P = 156.3 pF, R_P = 1.6 kΩ, C_A = 15.6 pF, K_VCO = 3.1416 Grad/(s·V) (considering the input VCO range to be 2 V), and K_PD = 0.5/2π (following the linear CDR model).
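As a rough illustration of the source block and the quoted numbers, the sketch below generates a PRBS7 pattern and derives the unit interval and the VCO tuning range from the stated parameters. The generator polynomial x^7 + x^6 + 1 and the all-ones seed are assumptions (the thesis does not state which PRBS7 implementation it uses), and the function name prbs7 is hypothetical.

```python
import math


def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 pattern with a 7-bit LFSR.

    Assumes the common polynomial x^7 + x^6 + 1 (taps on stages 7 and 6).
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


BIT_RATE = 10e9            # 10 Gbps, from the test case
UI = 1.0 / BIT_RATE        # unit interval in seconds (100 ps)

K_VCO = 3.1416e9           # rad/(s*V), from the design parameters
V_RANGE = 2.0              # V, stated VCO input range
vco_range_hz = K_VCO * V_RANGE / (2.0 * math.pi)  # close to 1 GHz

pattern = prbs7(127)       # one full period of the pattern
print(UI, vco_range_hz, sum(pattern))
```

A maximal-length 7-bit sequence repeats every 127 bits and contains 64 ones per period, which gives a quick sanity check on the generator.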
Using the design parameters, the testbench (shown in Figure 5.7) was set up in both the time-step based
and OO simulation environments. Both types of simulation were conducted for 10^4 bits over a duration
of 1 µs, and outputs from various circuit nets of the system were collected for comparison. Key observable
segments of the collected outputs are shown in the top sub-figure of Figure 5.8. Both the input data from the
source and the sampled data outputs of the CDR are shown under unlocked and locked conditions. The first row shows
the input bit-stream generated by the PRBS7 source with a delay of 0.3 UI. The next row shows the CDR clock
transitions collected from the time-step based simulation. The CDR clock transitions align closely with
t_π, t_2π, t_3π, t_4π, t_5π, . . . , marked on the CDR phase output, φ_Out(t), calculated using Equations
5.6 and 5.7. The PD output transitions are also shown between the two sub-plots of the CDR clock and φ_Out(t)
to confirm the unlocked and locked situations. Applying the same equations as for φ_Out(t), the LF output,
v_LF(t), which controls the VCO frequency, can be reconstructed; the reconstructed output is shown in the
bottom sub-figure for the complete 1 µs duration. As can be seen from the alignment markings on the figures,
outputs from both environments match without any distinguishable difference.
Figure 5.8: Proposed modeling measurement accuracy validation with respect to time-step based simulation measurement. (a) Clock transition demonstration: Data, CDR Clock, PD, and φ_Out(t) with markers t_π, t_2π, t_3π, t_4π, t_5π, . . . , for the unlocked situation (range: 0 to 1.025 ns) and the locked situation (range: 0.999 µs to 1 µs). (b) Calculated VCO control, v_LF(t), versus time, t, from 0 to 1 µs, showing the steady state of the LF output.
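The alignment of clock transitions with the t_π, t_2π, . . . markings on φ_Out(t) can be illustrated with a small sketch that locates the times at which a sampled phase crosses successive multiples of π. This does not use Equations 5.6 and 5.7; it assumes a non-decreasing sampled phase and linear interpolation between samples, and the function name phase_crossings is hypothetical.

```python
import math


def phase_crossings(times, phases):
    """Return the times at which a non-decreasing sampled phase crosses k*pi."""
    crossings = []
    # First multiple of pi strictly above the starting phase.
    k = math.floor(phases[0] / math.pi) + 1
    for i in range(1, len(times)):
        p0, p1 = phases[i - 1], phases[i]
        while k * math.pi <= p1:
            # Linear interpolation inside the segment [times[i-1], times[i]].
            frac = (k * math.pi - p0) / (p1 - p0)
            crossings.append(times[i - 1] + frac * (times[i] - times[i - 1]))
            k += 1
    return crossings


# Example with arbitrary time units: the phase reaches 1.5*pi, then 2.9*pi.
times = [0.0, 1.0, 2.0]
phases = [0.0, 1.5 * math.pi, 2.9 * math.pi]
print(phase_crossings(times, phases))
```

Each returned time corresponds to one clock transition, so comparing this list against the transitions reported by another simulator is one way to check the alignment described above.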
5.5 Summary
This chapter deals with CDR modelling on the basis of system-level nonlinearity. First, it is shown
how the equivalent linear model deviates from the realistic behaviour of the CDR. Then, a
new step response based CDR modelling concept is proposed, and it is discussed how the modelling can be
useful for capturing the nonlinearity of the CDR arising from LF-VCO saturation as well as
DFF regeneration effects from the PD. Finally, the proposed modelling is compared with the
conventional time-step model of the CDR for its calculation accuracy. It is also shown how
to model a linear PD based CDR, which is currently not feasible in a conventional event-driven
simulation environment. However, the proposed modelling still needs to be verified against a realistic
CDR implemented with transistor-level circuitry to demonstrate its true nonlinearity capturing
capability; this is identified as part of the future work.
Chapter 6
Conclusion and Future Work
This chapter summarizes the overall thesis contributions as well as the future work that remains to
be completed. Section 6.1 summarizes the three contributions presented in Chapters 3 to 5.
Future work for each contribution is discussed in Section 6.2.
6.1 Thesis Contribution
This thesis deals with modelling transceiver circuitry and simulating it in a computationally efficient
environment while accurately capturing circuit-level nonlinearity. The contributions of the thesis are
summarized as follows.
• OO Simulation: The OO simulation scheme has been developed based on the notion of the
conventional event-driven simulation platform to support asynchronous circuitry. The
proposed scheme addresses the incompatibility issue by introducing event scheduling
flexibility. Even though asynchronous circuit operation is supported in a time-step based simulation
environment, that simulation scheme is computationally inefficient, time consuming, and often
unreliable due to convergence instability. The proposed OO simulator also improves the simulation
speed for continuous time circuitry by calculating only the minimum
time points required to describe the continuous time outputs. Since the circuit objects used in the simulation
scheme are designed with initializing methods that calculate the initial conditions, the simulator does
not have to randomly guess or deal with incorrect user-defined initial conditions. Thus the
proposed system addresses the problem of convergence instability in a large circuit system by
introducing generalized initialization methods.
• Equalizer Modelling: Even though equalization is mostly based on a linear transfer function
(from either a continuous or a discrete time perspective), equalizers implemented at the transistor level rarely
retain exact linearity. Since performing system-level simulation in a SPICE simulator with
in-depth transistor-level information is not feasible for time consuming BER related studies, the
modified step response based modelling has been proposed to deal with the transistor related
nonlinearity. During continuous time waveform formation, the proposed modelling technique
determines the step response based on the current output states and transition patterns. This
alteration in step response based modelling allows a number of nonlinearity factors to be captured,
such as transistor transconductance, output impedance, and input capacitance variations. This
high modelling accuracy facilitates generating Spectre-like eye diagrams at the equalizer outputs.
The modified step response based modelling not only achieves high accuracy but also maintains
high and linear simulation speed (approximately 44 s for 1 million bits). Essentially, the proposed modelling
scheme eliminates numerous repetitive, computationally intensive calculations by utilizing the
key transistor-level information during the simulation, while maintaining a low processor memory
footprint.
• CDR Modelling: The success of step response based modelling for equalizers is exploited for CDR
nonlinearity modelling during its clock transition calculations. Conventional linear models of
a CDR suffer from poor modelling accuracy for CDRs designed for high-speed applications because
they assume that the PD output is calculated continuously. Event-driven modelling improves the modelling
accuracy by adopting clock phase-to-phase PD updates, but fails to incorporate linear PD
based CDRs due to their asynchronous nature. The proposed modelling scheme addresses the issue of
the asynchronous PD by utilizing the event scheduling flexibility offered by the OO simulator.
In addition, the modelling technique sheds light on capturing the circuit-level nonlinearity of the
CDR that appears due to the charge pump based LF-VCO saturation effect as well as DFF regeneration
in the PD. The nonlinearity capture using the proposed step response based modelling technique
was not performed due to time constraints.
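The idea behind the equalizer contribution, building the continuous time waveform from step responses that are selected per transition, can be sketched as follows. This is a simplified illustration and not the thesis implementation: the first-order step response, the choice of selecting only on transition direction (rather than on output state and transition pattern), and all names (step_response, equalizer_output, tau_rise, tau_fall) are assumptions.

```python
import math


def step_response(t, tau):
    """Unit step response of a first-order stage (assumed shape)."""
    return 0.0 if t < 0 else 1.0 - math.exp(-t / tau)


def equalizer_output(bits, ui, tau_rise, tau_fall, t):
    """Evaluate the output at time t by superposing one step per transition.

    The step response is selected per transition: tau_rise for 0 -> 1 and
    tau_fall for 1 -> 0, a hypothetical stand-in for circuit nonlinearity.
    """
    level = [1.0 if b else -1.0 for b in bits]
    y = level[0]
    for k in range(1, len(bits)):
        delta = level[k] - level[k - 1]
        if delta == 0.0:
            continue  # no transition at this bit boundary
        tau = tau_rise if delta > 0 else tau_fall
        y += delta * step_response(t - k * ui, tau)
    return y


# Example: a single rising edge at t = 1 UI, evaluated long after settling.
print(equalizer_output([0, 1], ui=1.0, tau_rise=0.1, tau_fall=0.2, t=10.0))
```

Because the output is only evaluated at requested time points and each transition contributes one closed-form term, no time-stepping of the circuit equations is needed in this kind of model.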
6.2 Future Work
During the course of the thesis work, transceiver circuitry modelling has been studied mainly to find
computationally efficient ways to capture the nonlinearity with reasonable accuracy. The following
studies can be conducted to further improve the circuit simulation performance.
• OO Simulation: The OO simulation scheme has been used to demonstrate how to incorporate
asynchronous circuitry in an event-driven type of simulation environment using its event scheduling
flexibility. However, the proposed technique also sheds light on how to simulate multiple continuous
time circuit systems with higher computational efficiency than conventional time-step based
simulators. This computational efficiency can be achieved by de-unionizing the time
axis, that is, by selecting an individual time axis for each continuous time component. To prove
the computational efficiency, a new example case with a reasonably large circuit system, which might
involve multiple transceiver circuits connected either serially or in parallel, needs to be developed.
Even though incorporating event scheduling flexibility has enabled the aforementioned
features, it can potentially reduce the simulation speed drastically due to excessive memory
requirements in long simulation cases. As indicated at the end of Section 3.3.2, if a circuit object
cannot process its inputs at the rate at which it receives them, the input information can
overload the memory. Under such circumstances, the event scheduler can be made adaptive to
prioritize certain processing events by communicating with the respective circuit objects to
deal with the situation.
• Equalizer Modelling: Modified step response based equalizer circuitry modelling has shown
how to generate Spectre-like eye diagrams without utilizing a real transistor model. The technique
has also been used to demonstrate the potential simulation speed using the case of FFE based
modelling in a C++ environment (presented as part of the OO simulation performance in Section 3.3.5).
However, the true simulation speed still needs to be observed for the CTLE and DFE cases,
although it is expected to differ only slightly, since they use a similar
calculation scheme with a slightly more complex step response scheme. Having complete simulation speed
results would help to establish the merit of the proposed method compared to other available
modelling schemes.
The proposed modelling scheme performs with excellent accuracy when the continuous time output
from the equalizer contains residual ISI. Circuit nonlinearity related to residual ISI occurs in
high-speed operation, but circuits designed for low-speed applications do not usually have residual
ISI. This situation can be dealt with by introducing data-pattern dependent step response models, as
proposed by Ren et al. [19]. To increase the range of operation of the step response based modelling
technique, this data-pattern dependent scheme can be integrated with the proposed scheme.
• CDR Modelling: Step response based modelling for the CDR has been demonstrated and compared
with the conventional model for validity. However, more work needs to be performed. The next
phase of the task would be to implement a transistor-level CDR system applicable to high-speed
operation. The architecture of the CDR should be selected such that the circuit has reasonably visible
nonlinearity, as in the case of the equalizers. The last phase would be to adopt a multiple step response
based model for the CDR to represent its nonlinearity. Similar to the earlier case, the speed performance
evaluation should also be conducted in a C++ environment.
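The potential benefit of de-unionizing the time axis, mentioned under the OO simulation item above, can be illustrated by counting component evaluations. The sketch assumes, as a simplification, that a shared time axis forces every component to be evaluated at every point in the union of all required time points; the function name evaluation_counts and the component names are hypothetical.

```python
def evaluation_counts(axes):
    """Compare evaluation counts for a shared versus individual time axes.

    axes -- dict mapping a component name to the time points it needs
    Returns (shared, individual) evaluation counts.
    """
    union = set()
    for points in axes.values():
        union.update(points)
    # Shared axis: every component is evaluated at every union point.
    shared = len(union) * len(axes)
    # Individual axes: each component is evaluated only where it needs to be.
    individual = sum(len(set(points)) for points in axes.values())
    return shared, individual


# Example: one fast component and one slow component (integer time ticks).
axes = {"fast": range(0, 100), "slow": range(0, 100, 25)}
print(evaluation_counts(axes))  # (200, 104)
```

The gap between the two counts grows with the number of components and with the mismatch between their time scales, which is the situation a large multi-transceiver example would exercise.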
Bibliography
[1] Cadence Design Systems Inc., SpectreHDL Reference.
[2] M. Van Ierssel, H. Yamaguchi, A. Sheikholeslami, H. Tamura, and W. W. Walker, “Event-driven
modeling of CDR jitter induced by power-supply noise, finite decision-circuit bandwidth, and channel
ISI,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 5, pp. 1306–1315,
2008.
[3] J.-E. Jang, M.-J. Park, D. Lee, and J. Kim, “True event-driven simulation of analog/mixed-signal
behaviors in SystemVerilog: A decision-feedback equalizing (DFE) receiver example,” in IEEE Custom
Integrated Circuits Conference (CICC), pp. 1–4, 2012.
[4] J.-E. Jang, S.-J. Yang, and J. Kim, “Event-driven simulation of Volterra series models in
SystemVerilog,” in IEEE Custom Integrated Circuits Conference (CICC), pp. 1–4, 2013.
[5] T. Flew, New Media: An Introduction. Oxford University Press, 2007.
[6] A. M. Odlyzko, “Internet traffic growth: Sources and implications,” in ITCom 2003, pp. 1–15,
International Society for Optics and Photonics, 2003.
[7] L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Computer Networks, vol. 54,
no. 15, pp. 2787–2805, 2010.
[8] G. E. Moore, “No exponential is forever: but ‘forever’ can be delayed! [semiconductor industry],” in