Refined error analysis in second order Σ∆
modulation with constant inputs
C. Sinan Gunturk¹ and Nguyen T. Thao²
¹Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012.
e-mail: [email protected]
²Department of Electrical Engineering, City College and Graduate School, City University of New York, Convent
Avenue at 138th Street, New York, NY 10031. e-mail: [email protected]
This work has been supported in part by the National Science Foundation Grants DMS 97-29992, DMS 02-19053,
DMS 02-19072, CCR 02-09431 and the Francis Robbins Upton fellowship at Princeton University.
August 26, 2003 DRAFT
Truong-Thao Nguyen
IEEE Transactions on Information theory, submitted in March 2001
Abstract
Although the technique of sigma-delta (Σ∆) modulation is well established in practice for performing
high resolution analog-to-digital conversion, theoretical analysis of the error between the input signal and
the reconstructed signal has remained partial. For modulators of order higher than 1, the only rigorous
error analysis currently available that matches practical and numerical simulation results is only applicable
to a very special configuration, namely, the standard and ideal k-bit k-loop Σ∆ modulator. Moreover, the
error measure involves averaging over time as well as possibly over the input value. At the second order, it
is known in practice that the mean-squared error decays with the oversampling ratio λ at the rate O(λ^{-5}).
In this paper, we introduce two new fundamental results in this analysis for constant input signals. We
first establish a framework of analysis that is applicable to all second order modulators provided that the
built-in quantizer has uniformly spaced output levels, and that the noise transfer function has its two
zeros at the zero frequency. In particular, this includes the one-bit case, a rigorous and deterministic
analysis of which is still not available. This generalization has been possible thanks to the discovery of
the mathematical tiling property of the state variables of such modulators. The second aspect of our
contribution is to perform an instantaneous error analysis that avoids infinite time-averaging. Until now,
only an O(λ^{-4}) type error bound was known to hold in this setting. Under our generalized framework, we
provide two types of squared-error estimates: one that is statistically averaged over the input and another
that is valid for almost every input (in the sense of Lebesgue measure). In both cases, we improve the
error bound to O(λ^{-4.5}), up to a logarithmic factor, for a general class of modulators including some
specific ones that are covered in this paper in detail. In the particular case of the standard and ideal 2-bit
double-loop configuration, our methods provide a (previously unavailable) instantaneous error bound of
O(λ^{-5}), again up to a logarithmic factor.
Keywords
A/D conversion, Σ∆ modulation, quantization, piecewise affine transformation, tiling, uniform
distribution, discrepancy, exponential sums.
Contents
I Introduction
II Equations of the second order modulator
  II-A Feedback equations and an equivalent system
  II-B The quantizer
  II-C State-space equations
  II-D Basic error analysis
  II-E Nonlinear functions T
III Invariant tiles under constant inputs
  III-A Experimental observation of tiling
  III-B Mathematical justifications of tiling
  III-C The single invariant tile case and its fundamental consequence
  III-D Further developments on the single tile case
IV Thorough study of three particular configurations
  IV-A Linear T and 2-bit quantizer: the L2 system
  IV-B Linear T and 1-bit quantizer: the L1 system
  IV-C A new rule: Quadratic T and 1-bit quantizer: the Q1 system
  IV-D Boundedness, regularity and tiling
V The Main Theorem
VI Discussion and further remarks
Appendix
  A Tools from the theory of uniform distribution
  B Invariant set Γx for the L1 system
  C Invariant set Γx for the Q1 system
  D Proof of Proposition IV.1
  E On the analysis of the quadratic scheme: zero-centroid setting of C(x)
I. Introduction
In the current state of the art of circuit design, high resolution A/D and D/A conversion
is achieved by oversampling the input signal and transforming it into a sequence of coarsely
quantized values which are selected from a small alphabet consisting of as few as two
symbols. An approximation of the input is then obtained by extracting the in-band content
of the quantized signal via appropriate filtering. Sigma-delta (Σ∆) modulation is a widely
used method for this purpose (see [1], [19], [14]), owing its success largely to its robustness
against circuit imperfections and ease of implementation.
The simplest version of Σ∆ modulation is the single-loop (first order) version originally
introduced in [15], which involves an integrator, a single-bit quantizer and a negative
feedback from the quantizer output into the integrator input. The system equations are
    y[n] = y[n-1] + x_0[n] - q[n-1],
    q[n] = Q(y[n]).    (1)
The block diagram of Figure 1 symbolizes this system. Here, x0[n] = X0(n/λ), n ∈ Z,
denotes the sequence of samples of the continuous-time input signal X0(t) sampled at λ
times per time-unit. We shall normalize the time-unit so that the spacing between the
Nyquist-rate samples is equal to one time-unit; thus λ is also equal to the oversampling
rate. Throughout the paper, λ will be assumed to take integer values. The signals y[n] and
q[n] = Q(y[n]) denote the quantizer input and the output, respectively. In this system, the
quantizer Q is one-bit; i.e., it outputs values from a discrete set consisting of two values,
although multi-bit quantizers are also used in practice, especially for higher order systems.
Unless otherwise specified, the input is assumed to be in [−1/2, 1/2] when the quantization
step size is normalized to 1.
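As an illustration of the recursion (1), the following Python sketch iterates the first order loop for a given input sequence. The function names, the initial condition y[−1] = 0 and the convention q[−1] = Q(y[−1]) are assumptions made for this example; the text does not specify them.

```python
def one_bit_quantizer(y):
    """One-bit quantizer: returns +1/2 for y >= 0 and -1/2 otherwise."""
    return 0.5 if y >= 0 else -0.5


def first_order_sigma_delta(x0, y_init=0.0):
    """Iterate y[n] = y[n-1] + x0[n] - q[n-1], q[n] = Q(y[n]) as in (1).

    y_init stands for y[-1]; q[-1] is taken to be Q(y[-1]) (an assumption).
    Returns the lists (y, q).
    """
    y_prev = y_init
    q_prev = one_bit_quantizer(y_prev)
    ys, qs = [], []
    for x in x0:
        y = y_prev + x - q_prev
        q = one_bit_quantizer(y)
        ys.append(y)
        qs.append(q)
        y_prev, q_prev = y, q
    return ys, qs


if __name__ == "__main__":
    x = 0.3  # constant input in [-1/2, 1/2]
    ys, qs = first_order_sigma_delta([x] * 1000)
    print("average of q:", sum(qs) / len(qs))
    print("max |y|:", max(abs(v) for v in ys))
```

Telescoping the first line of (1) shows that the running average of q[n] differs from a constant input x by a term of size O(1/N) as long as y[n] stays bounded, which is the mechanism behind the Cλ^{-1} sup-norm bound discussed below.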
It is one of the primary objectives of the theory to understand, as a function of λ, the
behavior of the error between the input signal and the approximations given by
    x_\lambda[n] := (\phi_\lambda * q)[n] := \sum_k \phi_\lambda[k] \, q[n-k]    (2)
for suitable lowpass filters φλ whose number of taps typically grows linearly in λ, thus
spanning a uniformly bounded duration of real implementation time. Various norms can
Fig. 1. Block diagram of classical first order Σ∆ modulation.
be considered for measuring the error signal
    e_\lambda[n] := x_0[n] - x_\lambda[n],    (3)
such as the supremum norm, defined by
    \|e_\lambda\|_\infty := \sup_n |e_\lambda[n]|,    (4)
or an (infinite) time-averaged squared norm, defined by
    \|e_\lambda\|_{\mathrm{av}}^2 := \limsup_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} |e_\lambda[n]|^2.    (5)
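The norms (4) and (5) are defined over all of Z, so a simulation can only evaluate finite-window surrogates of them. The sketch below does this for the first order system (1) with a constant input. The length-λ moving average used as φλ, the zero initial condition and all function names are assumptions chosen for simplicity; this is not the reconstruction filter analyzed in the paper.

```python
def first_order_bits(x, num_samples):
    """Output bits q[n] of the first order loop (1) for a constant input x."""
    y, q = 0.0, 0.5  # assumed initial conditions: y[-1] = 0, q[-1] = Q(0)
    bits = []
    for _ in range(num_samples):
        y = y + x - q
        q = 0.5 if y >= 0 else -0.5
        bits.append(q)
    return bits


def error_signal(x, bits, lam):
    """e[n] = x - (phi * q)[n] for the length-lam moving average phi.

    Only positions where the filter window lies fully inside `bits` are kept.
    """
    window_sum = sum(bits[:lam])
    errors = [x - window_sum / lam]
    for n in range(lam, len(bits)):
        window_sum += bits[n] - bits[n - lam]
        errors.append(x - window_sum / lam)
    return errors


def sup_norm(errors):
    """Finite-window surrogate of the sup-norm (4)."""
    return max(abs(v) for v in errors)


def time_averaged_squared_norm(errors):
    """Finite-window surrogate of the time-averaged squared norm (5)."""
    return sum(v * v for v in errors) / len(errors)


if __name__ == "__main__":
    x = 2 ** 0.5 - 1.2
    for lam in (16, 64, 256):
        e = error_signal(x, first_order_bits(x, 20000), lam)
        print(lam, sup_norm(e), time_averaged_squared_norm(e))
```

On any finite window the averaged squared error is bounded by the squared maximum, which is the finite analogue of the inequality between the two norms used later in the text.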
For continuous-time approximations of the input, one considers smooth interpolations
Xλ(t) of the sequence xλ[n] at the original time scale, i.e., in the sense that Xλ(n/λ) =
xλ[n]. Analogous norms can be defined for the corresponding continuous-time error signal
Eλ(t) := X0(t)− Xλ(t). Note that in this case one would also have eλ[n] = Eλ(n/λ).
A priori, the error decay in λ depends on the reconstruction filter, the error measure
and the input signal. The effect of each of these factors is important and interesting in
its own right; however, one may safely claim that the effect of the reconstruction filter is
understood better than the other two. For the discussion below the reader may assume
that the filters are ideal low-pass; what follows is usually valid for a wide range of filters,
though sometimes small modifications may also be necessary.
Let us consider the sup-norm first. In the case of constant inputs x0[n] = x, the sup-norm
of the error eλ := eλ,x has been known for a long time to be bounded by Cλ^{-1} (see,
e.g., [6]), where C does not depend on x; this bound in the same form has been extended
to the case of arbitrary bandlimited functions as well [4]. Neither of these bounds is
sharp, however. For constant inputs, one in fact has ‖eλ,x‖∞ ≤ C(x)λ^{-2+ε} for almost
every x (in the sense of Lebesgue measure) where ε > 0 may be arbitrarily small [2],
[8], [10]. Here, the constant C(x) depends on some fine arithmetical properties of x (in
the sense of Diophantine approximations) and is quite irregular (for instance, it is not
square integrable in x on any non-zero interval). For arbitrary bandlimited functions, a
corresponding improvement in the exponent of λ has been found only for the instantaneous
error; for each ε > 0 and each time instant t, one has |Eλ(t)| ≤ C(X0′(t), ε)λ^{-4/3+ε} [8],
[10].
It is clear that ‖eλ‖²_av ≤ ‖eλ‖²_∞; therefore all upper bounds for the squared sup-norm
apply to the time-averaged squared norm as well. In the first order case with constant
inputs, it turns out that these two norms behave somewhat similarly in the sense that
time-averaging does not yield any significant gain in the exponent of λ and that ‖eλ,x‖²_av ≥
c(x)λ^{-4} for infinitely many λ [24]. This is not the case for the higher order schemes
(which will be defined shortly below) and there still remains a large discrepancy – natural
or artificial – between the best known exponents of λ in the bounds for these two error
norms.
To provide more insight on the size of the error signal, let us now look at the effect
of statistical averaging over the values of the constant input. Various mixed-type error
norms can be considered depending on how one incorporates the mathematical expectation
(taken over the input space) into the norm definition. It is known in the case of
uniformly distributed inputs x that the mean (expected) time-averaged squared error (or
equivalently, mean-squared time-averaged error) defined by
    E(\|e_{\lambda,x}\|_{\mathrm{av}}^2) := \int_{-1/2}^{1/2} \|e_{\lambda,x}\|_{\mathrm{av}}^2 \, dx    (6)
is bounded by Cλ^{-3} both from above and from below [6]. Note that the sup-norm estimate
Cλ^{-1}, which is uniform for all values of x, would yield the suboptimal estimate Cλ^{-2} for
this quantity; on the other hand, we also see that the constant C(x) in the improved
sup-norm estimate C(x)λ^{-2+ε} cannot be square integrable with respect to x, for otherwise
it would yield an impossible Cλ^{-4+ε} type estimate. These remarks apply to the mean-
TABLE I
A comparison of previously known error estimates for Σ∆ modulation and contribution of this paper.

Columns: (1) order k = 1, 1 bit; (2) order k = 2, 1 bit; (3) order k ≥ 3, 1 bit; (4) order k ≥ 2, k-loop, k bits.

Error measure | (1) | (2) | (3) | (4)

Time-varying inputs x0[·]:
squared sup-norm [4]: ‖eλ‖²_∞ | Cλ^{-2} | Cλ^{-4} | Cλ^{-2k} | Cλ^{-2k}
squared instantaneous [9][10]: |eλ(t)|² | C(x0′(t), ε)λ^{-8/3+ε} | | |

Constant inputs x:
squared sup-norm [2][9][10]: ‖eλ,x‖²_∞ | C(x)λ^{-4+ε}, a.e. x | | |
input-averaged squared sup-norm [9]: ∫_{-1/2}^{1/2} ‖eλ,x‖²_∞ dx | Cλ^{-3} log² λ | | |
time-averaged squared error [12][13][3]: ‖eλ,x‖²_av | C(x)λ^{-4+ε}, a.e. x | C(x, ε)λ^{-5+ε}, a.e. x [24] | | Cλ^{-2k-1}
input and time-averaged squared error [6]: ∫_{-1/2}^{1/2} ‖eλ,x‖²_av dx | Cλ^{-3} | | |
input-averaged instantaneous squared [this paper]: ∫_{-a}^{a} |eλ,x[n]|² dx, a < 1/2 | | Cλ^{-4.5} log² λ | |
instantaneous squared [this paper]: |eλ,x[n]|², |x| ≤ a < 1/2 | | C(x, n)λ^{-4.5} log^{7/2+δ} λ, a.e. x | |
squared sup-norm
    E(\|e_{\lambda,x}\|_\infty^2) := \int_{-1/2}^{1/2} \|e_{\lambda,x}\|_\infty^2 \, dx    (7)
as well. This norm is stronger (i.e. larger) than E(‖eλ,x‖²_av), but it turns out that it obeys
the only slightly worse upper bound Cλ^{-3} log² λ [9]. These results are summarized in the
“k = 1” (first order) column of Table I in terms of squared norms.
More complicated multi-loop systems incorporate multiple integrators and
feedbacks, and achieve better system performance as λ is increased [12], [13], [3]. For
a k-th order system, the corresponding system equations involve a k-th order difference
equation, which may be presented in the prototypical form
    \Delta^k y[n] = x_0[n] - q[n],    (8)
where ∆^k denotes the k-fold composition of the standard difference operator ∆ defined by
∆y[n] = y[n]−y[n−1], and q[n] again represents the Σ∆ quantized output signal (possibly
up to a shift in time). In the case of a k-th order stable Σ∆ modulator (one for which
y[n] is bounded), it is proved in [4] that ‖eλ‖²_∞ ≤ Cλ^{-2k}, where the constant C is uniform
over the input. The authors of [4] also give the first infinite family of arbitrarily high order single-bit
schemes that are unconditionally stable for arbitrary bounded inputs. Since the sup-norm
is the strongest norm among all the norms we consider, an O(λ^{-2k}) type estimate applies,
in particular, to all the mean-squared norms; however, similar to the first order case, this
does not necessarily reflect the optimal behavior of these systems. Indeed, it is known for
the multi-bit multi-loop configuration with constant inputs that ‖eλ‖²_av has an O(λ^{-2k-1})
type decay [13]. However, note that the analysis of [13] is restricted to only a special case
of Σ∆ modulation with a fixed uniform k-bit quantizer¹ for a k-th order scheme. While
using this multi-bit quantizer avoids overloading and eases the analysis of the quantization
error significantly, it is clearly a non-ideal setup for large k, since one of the appeals of
Σ∆ modulation is its capability of working with single-bit quantizers, producing one bit
per input sample.
In this paper, we allow more general quantizers, including single-bit quantizers. As one
important contribution, we analyze Σ∆ schemes for which no better estimate than the one
provided by [4] could previously be given. This includes the remaining “1-bit” columns in
Table I. Under our generalized framework, we provide two types of squared instantaneous
error estimates. The first one involves statistical averaging over the input and is uniform
in time, measured by the sup-norm in time of the mean-squared instantaneous error
    \| E(e_{\lambda,x}^2[\cdot]) \|_\infty := \sup_n \int_I |e_{\lambda,x}[n]|^2 \, dx,    (9)
¹By this, we mean that the quantizer has uniformly spaced 2^k output values and each threshold level is the
midpoint of an interval defined by these output values.
where I is an interval of input values that may be restricted by the Σ∆ scheme. For the
convenience and simplicity of notation, we shall use the shortcut MSE(λ) (for the generic
“Mean Squared Error”) for this particular error measure. Note that this measure satisfies
the “sandwich” inequality
    \int_I \|e_{\lambda,x}\|_{\mathrm{av}}^2 \, dx \le \mathrm{MSE}(\lambda) \le \int_I \|e_{\lambda,x}\|_\infty^2 \, dx.    (10)
For the (single-bit) schemes we shall consider, the best available bound on MSE(λ) is
O(λ^{-4}), which is provided by the bound in [4] for the much stronger sup-norm. We will
show in this paper that MSE(λ) obeys an O(λ^{-4.5}) type bound, up to a logarithmic factor.
The second error bound we shall provide will be directly on the instantaneous error. We
will show that for almost every input x, |eλ,x[n]|² also obeys the upper bound O(λ^{-4.5}) up
to a logarithmic factor. To the best of our knowledge, there have been no other improved
estimates for these schemes yet (however, also see [24]).
For the two-bit configuration, our methods will produce an O(λ^{-5}) type bound, again
up to a logarithmic factor. This result provides us with a rate estimate that matches the
estimate for the time-averaged squared error norm; however, note that due to (10), neither
of these results implies (i.e., is stronger than) the other one.
Due to the increased complexity of the analysis, we shall restrict this paper to second-order
systems with constant inputs. Our methods, however, are not limited to second-order
schemes, but extend to a large class of arbitrary order modulators [24]. We also believe that
the new techniques we introduce will prove to be very useful for time-varying inputs.
The paper is organized as follows. In Section II, we derive the basic equations and
formulas for the time evolution of signals in second order Σ∆ modulators. In particular,
we express the reconstruction error in terms of the state vector of the system. At this
point, the main obstacle against pushing the derivations further is the absence of an
explicit expression of any of the node signals of the Σ∆ modulator, including its output
and its state vector, which is basically due to the nonlinear recursion embedded in the
Σ∆ modulator. A first contribution of this paper is the introduction and the exploitation
of a new remarkable property of Σ∆ modulators which, in principle, enables an explicit
derivation of its output and its state-vector sequence. This property, which we call the
tiling property, refers to the fact that the state vector remains in a set Γx that tiles the
Fig. 2. Block diagram of classical second order Σ∆ modulation.
space by Z² translations. We give the exact definition of this property in Section III,
show experimental evidence of it, give mathematical justifications, and derive from the
knowledge of Γx an explicit expression of the state vector in terms of x and n. While the
existence of the tile Γx is clearly demonstrated by experiment and proved mathematically
(completely for the cases considered in this paper, and with less precision in full
generality), detailed parametrizations of these tiles are not known in general. Further
knowledge on these parameters requires explicit analyses of given configurations. We
study in Section IV three different configurations for which a thorough analysis has been
feasible. Part of this analysis involves the study of geometric regularity of the invariant sets
as carried out in Section IV-D, which will turn out to be important for the improved error
estimates mentioned above that we derive in Section V. These estimates depend heavily
on the general machinery of the theory of uniform distribution [16], [5]. Appendix-A is
specially dedicated to the basic elements of this theory as utilized in this paper. We
conclude the paper with further remarks and future research directions (Section VI).
II. Equations of the second order modulator
A. Feedback equations and an equivalent system
The generic architecture of a classical second order Σ∆ modulator is shown in Figure
2.² It can be easily derived from the block diagram that this system satisfies the second
order difference equation
    \Delta^2 y[n] = x_0[n] - (\alpha + \beta) q[n-1] + \beta q[n-2]
                  = x_0[n] - (\alpha + \beta\Delta) q[n-1].    (11)
²There are more general configurations that contain extra feedbacks from the quantizer input as well [19].
In the standard case of the double-loop configuration studied in [13] where α = β = 1,
this equation can be rewritten as a difference equation for the quantizer error y[n]− q[n]:
    \Delta^2 (y[n] - q[n]) = x_0[n] - q[n].    (12)
In the general case, the direct signal analysis of the system of Figure 2 is difficult due to
the complicated action of the feedback. We shall first derive an equivalent diagram of this
system which yields simpler feedback mechanisms. Consider the change of input variable
defined by the difference equation
    x_0[n] = \gamma \Delta^2 x_1[n] + (\alpha + \beta\Delta) x_1[n-1]    (13)
where γ is a parameter to be chosen at our disposal. Next, define the auxiliary variables
u1[n] and u2[n] to satisfy the difference equations
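    \Delta u_2[n] = u_1[n]; \qquad \Delta u_1[n] = x_1[n] - q[n].    (14)
Then, by successively applying (14), (11) and (13), it follows that
    \Delta^2 (\alpha u_2[n-1] + \beta u_1[n-1] + \gamma x_1[n]) = (\alpha + \beta\Delta)(x_1[n-1] - q[n-1]) + \gamma \Delta^2 x_1[n]
        = (\alpha + \beta\Delta) x_1[n-1] + \Delta^2 y[n] - x_0[n] + \gamma \Delta^2 x_1[n]
        = \Delta^2 y[n].    (15)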
Assuming that the initial conditions for x1[n] have been picked (arbitrarily, or by some
criterion), the initial conditions for the sequences u1[n] and u2[n] can now be chosen so
that (15) implies
y[n] = βu1[n−1] + αu2[n−1] + γx1[n]
= T (u1[n−1], u2[n−1], x1[n]) , (16)
where
T (u1, u2, x) := βu1 + αu2 + γx. (17)
Since
q[n] = Q(y[n]), (18)
Fig. 3. Alternative representation of the pure feedback process of second order Σ∆ modulation.
Fig. 4. Global signal processing chain of Σ∆ modulation: the first block is characterized by the difference
equation of (13), the second block represents the system of Figure 3 and the third block represents the
convolution operation of (2). The first two blocks combined together generalize the original second order
Σ∆ system of Figure 2.
the signal y[n] can now be thought of as resulting from x1[n] through a new dynamical
system shown in Figure 3. In this system, the feedback-loop simply carries the input-
output difference x1[n] − q[n] at every instant and the remainder of the system uses this
value to produce the next signal value y[n+1] to be quantized.
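The equivalence argument of (13) through (18) can be checked numerically. The sketch below simulates the system defined by (14), (16) and (18) for an arbitrary input x1[n], forms x0[n] from x1[n] through (13), and measures how well the resulting y[n] and q[n] satisfy the original difference equation (11). The one-bit quantizer, the zero initial state, the random test input and the function names are assumptions made for illustration only.

```python
import random


def simulate_equivalent_system(x1, alpha, beta, gamma):
    """Simulate (14), (16), (18) with a one-bit quantizer and zero initial state.

    Returns the lists y[n] and q[n].
    """
    u1 = u2 = 0.0
    ys, qs = [], []
    for x in x1:
        y = beta * u1 + alpha * u2 + gamma * x   # (16)-(17): y[n] = T(u[n-1], x1[n])
        q = 0.5 if y >= 0 else -0.5              # (18) with the one-bit Q of (20)
        u1 = u1 + x - q                          # (14): Delta u1[n] = x1[n] - q[n]
        u2 = u2 + u1                             # (14): Delta u2[n] = u1[n]
        ys.append(y)
        qs.append(q)
    return ys, qs


def max_residual_of_eq_11(x1, alpha, beta, gamma):
    """Largest |Delta^2 y[n] - (x0[n] - (alpha+beta) q[n-1] + beta q[n-2])| over n >= 2,
    where x0[n] is computed from x1[n] through (13)."""
    ys, qs = simulate_equivalent_system(x1, alpha, beta, gamma)
    worst = 0.0
    for n in range(2, len(x1)):
        d2x1 = x1[n] - 2 * x1[n - 1] + x1[n - 2]
        x0 = gamma * d2x1 + (alpha + beta) * x1[n - 1] - beta * x1[n - 2]  # (13)
        d2y = ys[n] - 2 * ys[n - 1] + ys[n - 2]
        rhs = x0 - (alpha + beta) * qs[n - 1] + beta * qs[n - 2]           # (11)
        worst = max(worst, abs(d2y - rhs))
    return worst


if __name__ == "__main__":
    random.seed(0)
    x1 = [random.uniform(-0.4, 0.4) for _ in range(300)]
    print(max_residual_of_eq_11(x1, alpha=1.0, beta=1.0, gamma=1.0))
```

The residual is zero up to floating point rounding for any choice of α, β and γ, which reflects the fact that (15) is an algebraic identity and does not rely on stability.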
Given the construction of this dynamical system, the complete Σ∆ modulation process
can then be equivalently described as the transformation of x0[n] into xλ[n] through the
sequence of equations (13), (16), (18) and (2). Thus, the signal processing of Σ∆ modu-
lation based on the architecture in Figure 2 can be represented by the block diagram of
Figure 4, where the block labeled “dynamical system” symbolizes the system of Figure
3. The operator G is basically a recursive filter that transforms x0[n] into x1[n] through
the difference equation (13). Note that in this setup the operator G and the signal x1[n]
appear by mathematical construction and do not necessarily exist physically in an actual
implementation given by Figure 2. We also deduce that the effect of γ as a parameter in
G is cancelled out in the system of Figure 3. For the realizability of this equivalence, we
will assume that the parameters α and β have been chosen such that the operator G is
stable for some γ.
Let us note at this stage that while the Σ∆ modulator described by Figure 2 is favorable
for the efficiency of its circuit implementation, it is also a legitimate option to switch to
the slightly less efficient Σ∆ modulator scheme described solely by Figure 3 (i.e. without
the pre-filter G) when circuit implementation is not the primary concern. In this case,
γ would be an additional parameter of design. In fact there would be a whole range of
flexibility in the choice of T if nonlinear functions are also allowed. We shall return to this
issue in Section II-E.
B. The quantizer
We assume in this paper that the quantizer Q is uniform of step size 1, in the sense
that its output values are of the type i − 1/2, where i = i0, i0 + 1, ..., i1. The quantization
intervals I^i, which satisfy Q(I^i) = i − 1/2, are defined by
    I^i := \begin{cases} (-\infty, i_0), & i = i_0, \\ [i-1, i), & i_0 < i < i_1, \\ [i_1 - 1, +\infty), & i = i_1. \end{cases}    (19)
We call the quantizer k-bit if i1 − i0 + 1 = 2^k. In the particular one-bit case, we assume
that i0 = 0, so that the quantizer mapping reduces to
    Q(y) = \begin{cases} -\tfrac{1}{2}, & \text{if } y < 0, \\ +\tfrac{1}{2}, & \text{if } y \ge 0. \end{cases}    (20)
We call the quantizer infinite if i0 = −∞ and i1 = +∞.
We say that the quantizer is overloaded if |y − Q(y)| > 1/2. Note that the infinite quantizer
is never overloaded.
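The quantizer family described by (19) and (20) can be written compactly in code. The following Python sketch is one possible rendering; the factory function `make_quantizer` and the helper `is_overloaded` are names introduced here for illustration.

```python
import math


def make_quantizer(i0, i1):
    """Uniform quantizer of step size 1 with output levels i - 1/2, i = i0..i1.

    Follows (19): y in [i-1, i) maps to i - 1/2, with the two outermost
    intervals extended to infinity. Pass -math.inf / math.inf for the
    infinite quantizer.
    """
    def quantize(y):
        i = math.floor(y) + 1          # index of the unit interval [i-1, i) containing y
        i = max(i0, min(i1, i))        # saturate at the extreme levels
        return i - 0.5
    return quantize


def is_overloaded(quantize, y):
    """True when |y - Q(y)| > 1/2."""
    return abs(y - quantize(y)) > 0.5


if __name__ == "__main__":
    one_bit = make_quantizer(0, 1)              # levels -1/2, +1/2, as in (20)
    two_bit = make_quantizer(-1, 2)             # levels -3/2, -1/2, 1/2, 3/2
    infinite = make_quantizer(-math.inf, math.inf)
    for y in (-0.3, 0.0, 1.2, 2.7):
        print(y, one_bit(y), two_bit(y), infinite(y), is_overloaded(one_bit, y))
```

With i0 = 0 and i1 = 1 the saturation step reproduces the sign-type rule (20), and with infinite bounds the saturation never acts, so the overload condition cannot occur.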
C. State-space equations
At every instant, u1[n] and u2[n] constitute the state variables of the system. We will
use the shorthand notation
    u[n] = \begin{pmatrix} u_1[n] \\ u_2[n] \end{pmatrix}    (21)
to represent the vector state of the system. The full recursive system equations of the
block diagram of Figure 3 are then
    q[n] = Q(T(u[n-1], x_1[n])),
    u[n] = A u[n-1] + (x_1[n] - q[n]) e,    (22)
where
    A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad \text{and} \quad e = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.    (23)
For each real number ξ, we define a partition {Ω^i_ξ}, i = i0, . . . , i1, of R² by setting
Ω^i_ξ = {u : T(u, ξ) ∈ I^i}. Let M^i_ξ : R² → R² denote the affine transformation defined by
    M^i_\xi(u) := A u + (\xi - i + \tfrac{1}{2}) e,    (24)
and M_ξ : R² → R² denote the piecewise affine transformation defined by
    M_\xi(u) = M^i_\xi(u), \quad \text{if } u \in \Omega^i_\xi.    (25)
With this notation, the recursive equations of (22) can be rewritten as
    u[n] = M_{x_1[n]}(u[n-1]).    (26)
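To make the state-space form (22) through (26) concrete, the following Python sketch implements one application of the piecewise affine map and iterates it. The linear rule with α = β = γ = 1, the infinite quantizer, the zero initial state and all function names are choices made for this example, not requirements of the text.

```python
import math

A = ((1, 0), (1, 1))   # matrix A of (23)
E = (1, 1)             # vector e of (23)


def linear_T(u, x, alpha=1.0, beta=1.0, gamma=1.0):
    """Linear rule (17): T(u1, u2, x) = beta*u1 + alpha*u2 + gamma*x."""
    return beta * u[0] + alpha * u[1] + gamma * x


def infinite_quantizer(y):
    """Infinite uniform quantizer: y in [i-1, i) maps to i - 1/2."""
    return math.floor(y) + 0.5


def apply_map(u, x, T=linear_T, Q=infinite_quantizer):
    """One step of (22): returns the new state M_x(u) and the output q."""
    q = Q(T(u, x))
    d = x - q
    new_u1 = A[0][0] * u[0] + A[0][1] * u[1] + d * E[0]
    new_u2 = A[1][0] * u[0] + A[1][1] * u[1] + d * E[1]
    return (new_u1, new_u2), q


def orbit(u0, x, steps):
    """Iterate u[n] = M_x(u[n-1]) for a constant input x, as in (26)."""
    states, u = [], u0
    for _ in range(steps):
        u, _ = apply_map(u, x)
        states.append(u)
    return states


if __name__ == "__main__":
    x = 2 ** 0.5 - 1.2
    states = orbit((0.0, 0.0), x, 2000)
    print("range of u1:", min(s[0] for s in states), max(s[0] for s in states))
    print("range of u2:", min(s[1] for s in states), max(s[1] for s in states))
```

For this particular choice of T and Q one has u2[n] = y[n] − q[n], so u2[n] stays in [−1/2, 1/2) and u1[n] = u2[n] − u2[n−1] stays in (−1, 1); other rules or finite quantizers require their own stability analysis.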
D. Basic error analysis
Suppose we use the system of Figure 2 and we would like to compute an approximation
of the input x0[n] via the convolution xλ = φλ ∗ q. Since the overall result is equivalently
described by the signal processing chain of Figure 4, it is natural to consider Φλ of the
form Φλ = G^{-1}Hλ to remove the pre-filtering effect of G. With this choice, all we need
to satisfy is that Hλ is a suitable reconstruction filter for the system of Figure 3 for the
input x1[n]. Indeed, if we know that x1 − hλ ∗ q is small, then
    x_0 - \phi_\lambda * q = g^{-1} * x_1 - g^{-1} * h_\lambda * q = g^{-1} * (x_1 - h_\lambda * q)    (27)
will also be small since g^{-1} is a (causal) FIR filter of at most 3 taps as defined in (13).
For the error analysis, it therefore suffices to consider the system of Figure 3 only.
Now the error signal x1 − hλ ∗ q for the system of Figure 3 can be written as
    x_1 - h_\lambda * q = x_1 - h_\lambda * x_1 + h_\lambda * (x_1 - q)
                        = e^{[1]}_\lambda + e^{[2]}_\lambda,    (28)
where the first error component
    e^{[1]}_\lambda := x_1 - h_\lambda * x_1    (29)
is a signal that does not depend on the quantization procedure and can be made arbitrarily
small (in fact, even zero) by choosing hλ suitably, and the second error component
    e^{[2]}_\lambda := h_\lambda * (x_1 - q)    (30)
corresponds to the in-band portion of the “quantization error” signal x1[n] − q[n]. It is
this second error component that constitutes the center of interest of Σ∆ error analysis
since unlike the first one it is highly nonlinear in the input.
In the particular case when the input is a constant signal x1[n] = x (as will be the case
for the rest of the paper) we do not even have to worry about the first error component e^{[1]}_λ
since we can eliminate it completely by restricting hλ to filters that satisfy Σ_n hλ[n] = 1.
It therefore causes no ambiguity to denote e^{[2]}_λ by eλ. Substituting ∆²u2[n] = x1[n] − q[n]
from (14) and changing the order of convolution and differencing yields the formula
    e_\lambda = \Delta^2 h_\lambda * u_2.    (31)
When the dynamical system associated with the map Mx is stable (see Section IV), a
basic decay estimate immediately follows. Indeed, stability implies that u2 is bounded,
which leads to the simple error bound
    \|e_\lambda\|_\infty \le \|\Delta^2 h_\lambda\|_1 \, \|u_2\|_\infty.    (32)
This basic estimate cannot provide any decay rate better than O(λ^{-2}). To see this,
recall first that the number of taps of hλ was required to grow linearly in λ; this implies
that ‖∆²hλ‖₁ ≥ c/λ² for some constant c > 0. Indeed, let h be an L-tap filter with
h[n] = 0 for n < 0 and n ≥ L, normalized so that Σ_n h[n] = 1. Define s[n] = ½(n − (L−3)/2)².
Using ∆²s[n] = 1, and applying summation by parts twice, one obtains
    1 = \sum_n h[n] \, \Delta^2 s[n] = \sum_n s[n-2] \, \Delta^2 h[n] \le \Big( \max_{0 \le n \le L+1} |s[n-2]| \Big) \sum_{n=0}^{L+1} |\Delta^2 h[n]| \le \frac{(L+1)^2}{8} \|\Delta^2 h\|_1,
which proves the claim.
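The lower bound ‖∆²h‖₁ ≥ 8/(L+1)² obtained from this computation can be spot-checked numerically. The sketch below draws random L-tap filters, rescales them so that their taps sum to 1, and compares the two sides. The random sampling scheme and the function names are assumptions made for this illustration; the check is a sanity test, not a proof.

```python
import random


def second_difference(h):
    """Backward second difference of a filter supported on 0..L-1.

    Returns Delta^2 h[n] = h[n] - 2 h[n-1] + h[n-2] for n = 0..L+1.
    """
    padded = [0.0, 0.0] + list(h) + [0.0, 0.0]   # padded[j] = h[j-2]
    return [padded[n + 2] - 2 * padded[n + 1] + padded[n]
            for n in range(len(h) + 2)]


def l1_norm(seq):
    return sum(abs(v) for v in seq)


def random_unit_sum_filter(num_taps, rng):
    """Random filter with taps of mixed sign, rescaled to sum to 1 (hypothetical helper)."""
    while True:
        taps = [rng.uniform(-0.5, 1.0) for _ in range(num_taps)]
        total = sum(taps)
        if abs(total) > 1e-3:          # avoid dividing by a nearly zero sum
            return [t / total for t in taps]


def lower_bound(num_taps):
    """Right-hand side 8 / (L+1)^2 of the bound derived in the text."""
    return 8.0 / (num_taps + 1) ** 2


if __name__ == "__main__":
    rng = random.Random(1)
    for L in (5, 16, 64):
        h = random_unit_sum_filter(L, rng)
        print(L, l1_norm(second_difference(h)), lower_bound(L))
```

Random filters typically exceed the bound by a wide margin; the bound becomes informative for smooth filters such as those constructed next.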
On the other hand, the upper bound O(λ^{-2}) is easily achieved by imposing some smoothness
on hλ. Let ρλ be the rectangular filter of length λ given by ρλ[n] = 1/λ if
0 ≤ n < λ and ρλ[n] = 0 otherwise. Consider
    h_\lambda = \rho_\lambda * \rho_\lambda * \tilde{h}_\lambda    (33)
where h̃λ is any filter with linearly growing number of taps in λ that satisfies Σ_n h̃λ[n] =
1 with ‖h̃λ‖₁ ≤ C for some absolute constant C. Clearly, we have Σ_n hλ[n] = 1 as well.
Note that ∆ρλ = (1/λ)(δ0 − δλ), where δa denotes the sequence defined by δa[n] = δ[n−a].
This implies ∆²hλ = (1/λ²)(δ0 − 2δλ + δ2λ) ∗ h̃λ, and we obtain
    \|\Delta^2 h_\lambda\|_1 \le \frac{1}{\lambda^2} \|\delta_0 - 2\delta_\lambda + \delta_{2\lambda}\|_1 \, \|\tilde{h}_\lambda\|_1 \le \frac{4C}{\lambda^2};    (34)
therefore (32) implies ‖eλ‖∞ = O(λ^{-2}).
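The identities behind (34) can be verified exactly in the simplest case, where the extra factor in (33) is the unit impulse δ0 so that hλ = ρλ ∗ ρλ. The sketch below uses exact rational arithmetic to confirm that the taps sum to 1 and that ∆²hλ consists of the three impulses 1/λ², −2/λ², 1/λ² at positions 0, λ and 2λ. The function names are assumptions made for this example.

```python
from fractions import Fraction


def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def rect_filter(lam):
    """Rectangular filter rho_lambda: lam taps, each equal to 1/lam."""
    return [Fraction(1, lam)] * lam


def second_difference(h):
    """Delta^2 h[n] = h[n] - 2 h[n-1] + h[n-2] for n = 0..len(h)+1."""
    padded = [Fraction(0)] * 2 + list(h) + [Fraction(0)] * 2
    return [padded[n + 2] - 2 * padded[n + 1] + padded[n]
            for n in range(len(h) + 2)]


def sinc2_filter(lam):
    """rho_lambda * rho_lambda, i.e. (33) with the third factor equal to delta_0."""
    return convolve(rect_filter(lam), rect_filter(lam))


if __name__ == "__main__":
    lam = 8
    d2 = second_difference(sinc2_filter(lam))
    print([str(v) for v in d2 if v != 0])     # the three nonzero impulses
    print(sum(abs(v) for v in d2))            # l1 norm, equal to 4 / lam^2
```

In this case the bound (34) holds with equality for C = 1, so the l1 norm of ∆²hλ is exactly 4/λ².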
The simplest choice for h̃λ would be δ0. In this case, hλ is the triangular filter, i.e.,
the second order discrete B-spline, more commonly known in the circuit community as the
sinc² filter due to its frequency domain representation. It has been found, however, that
the error decays faster than O(λ^{-2}) with the choice h̃λ = ρλ, in which case hλ is the sinc³
filter. To explain this phenomenon, we return to the exact error expression (31), which
now yields the decomposition
    e_\lambda = \frac{1}{\lambda^2} (\delta_0 - 2\delta_\lambda + \delta_{2\lambda}) * (\rho_\lambda * u_2) = \frac{1}{\lambda^2} (\delta_0 - 2\delta_\lambda + \delta_{2\lambda}) * u_\lambda,    (35)
where uλ is the sequence of running averages defined by
    u_\lambda[n] := (\rho_\lambda * u_2)[n] = \frac{1}{\lambda} \sum_{m=0}^{\lambda-1} u_2[n-m].    (36)
When λ is large, it is expected that the signal uλ[n] will vary more slowly than u2[n] due
to the long time averaging. In fact, if some form of “central limit theorem” could be shown
to hold for u2[n], this would force uλ[n] to be mostly concentrated about a mean value.
Note that (δ0 − 2δλ + δ2λ) is a difference operator which would bring out the residual
value around this mean when convolved with uλ. Therefore, eλ would at most vary as this
residual value (up to the multiplicative factor 1/λ²). This additional cancellation provides an
intuitive justification for using the sinc³ filter. The quantification of this idea, which will
be essential in the derivation of our improved estimates mentioned in Section I, constitutes
much of the rest of this paper. Qualitatively, these results may be viewed as originating
from the ergodicity of the mappings Mx with respect to the Lebesgue measure on certain
invariant sets. On the other hand, quantitative results will depend heavily on fine analytic
and algebraic properties of these invariant sets, which will be stated in Section IV.
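The comparison between the sinc² and sinc³ reconstructions can be explored with a short simulation. The sketch below runs the second order system (22) with the rule T(u1, u2, x) = u1 + u2 + x and the infinite quantizer, then measures the largest reconstruction error over the region where the filter fully overlaps the bit stream. The choice of modulator, the zero initial state and the function names are assumptions made for this example. Because u2[n] stays in [−1/2, 1/2) for this configuration, the basic bound (32) guarantees an error of at most 2/λ² for both filters; how much smaller the sinc³ error turns out to be is left for the reader to observe.

```python
import math


def second_order_bits(x, num_samples):
    """Bits of (22) with T(u, x) = u1 + u2 + x and the infinite quantizer."""
    u1 = u2 = 0.0   # assumed zero initial state
    bits = []
    for _ in range(num_samples):
        q = math.floor(u1 + u2 + x) + 0.5
        u1 = u1 + x - q
        u2 = u2 + u1
        bits.append(q)
    return bits


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def sinc_filter(lam, order):
    """order-fold convolution of the length-lam rectangular filter
    (order 2 gives the sinc^2 filter, order 3 the sinc^3 filter)."""
    rect = [1.0 / lam] * lam
    h = rect
    for _ in range(order - 1):
        h = convolve(h, rect)
    return h


def sup_error(x, bits, h):
    """Largest |x - (h * q)[n]| over positions where h fully overlaps bits."""
    worst = 0.0
    for n in range(len(h) - 1, len(bits)):
        estimate = sum(h[k] * bits[n - k] for k in range(len(h)))
        worst = max(worst, abs(x - estimate))
    return worst


if __name__ == "__main__":
    x = 2 ** 0.5 - 1.2
    bits = second_order_bits(x, 4000)
    for lam in (8, 16, 32):
        e2 = sup_error(x, bits, sinc_filter(lam, 2))
        e3 = sup_error(x, bits, sinc_filter(lam, 3))
        print(lam, e2, e3, 2.0 / lam ** 2)
```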
It is known that the sinc³ filter can be further improved by some coefficient modifications,
yielding smaller multiplicative coefficients in the error [12]. However, in this paper
we shall stick to the sinc³ filter, as it is simple to implement, and it captures the essence
of our methods best. Let us briefly mention here that nonlinear reconstruction has been
demonstrated to yield faster error decays [25][22] but is not used in practice for complexity
reasons.
E. Nonlinear functions T
The error decay guaranteed by the inequality of (32) actually gives us more informa-
tion than just an error bound. Note that it assumes no condition on the nature of the
functions T and Q except that they have been designed to ensure that u is bounded.
Theoretically, the constraint that T be a linear function is artificial as a design criterion
since the composed operator Q ∘ T is in any case nonlinear because of Q. Moreover, note
that it is globally the composed operator Q ∘ T that differentiates the stability properties
of one modulator from another. We will see in the next section that relaxing the linearity
of the function T will enable us to uncover general properties of the dynamical systems
that are analytically unreachable with linear functions T . Also, although the feasibility of
nonlinear functions T with regard to analog circuit implementation is still an unanswered
question, a prototype of one-bit second order modulator with a quadratic function T was
numerically demonstrated in [23] to have superior performance to the one-bit linear-T
modulators. This prototype will be introduced in the next section.
III. Invariant tiles under constant inputs
The error relations (35) and (36) require a refined analysis of the state vector u[n]. The
fundamental difficulty is that the sequence u[n] is not known explicitly in terms of x1[n].
As can be seen in (26), u[n] is only recursively determined in terms of u[n−1] with x1[n] as
a varying parameter. The scope of this paper is the error analysis under constant inputs
x1[n] = x, ∀n. In this situation, u[n] recursively depends on u[n−1] through the fixed
mapping Mx, i.e.,
u[n] = Mx(u[n−1]). (37)
The key to the analysis lies in the study of the map Mx.
A. Experimental observation of tiling
We start with the description of a particular experiment that led to the discovery of a
remarkable property of the maps Mx. For various second order Σ∆ modulators, we plot
in black in Figure 5 several consecutive iterates u[n] of a fixed initial condition u[0] under
the map Mx, where x is a fixed constant input. In these plots, x is chosen to be irrational;
we will return to this issue later. For each modulator, one can observe in this plot that
the state points remain in (and fill out) a certain deterministic set Γ := Γx. However,
there is more to this set in that in every case its integer (Z2) translations appear to tile
the plane. We highlight this fact in the figure by representing the translates of the points
u[n] by (1, 0) and (1, 1) in two gray tones, respectively. Formally, we say that a set Γ is
a tile when for each point v ∈ R2, there is a unique point v′ ∈ Γ such that v − v′ ∈ Z2.
This is equivalent to the fact that the family $\{\Gamma + k\}_{k\in\mathbb{Z}^2}$ forms a partition of $\mathbb{R}^2$.
Since the initial observation of this phenomenon [9], it has been systematically confirmed
on any stable second order modulator employing a quantizer with uniformly spaced output
levels as assumed in this paper. In the cases of Figures 5(a) through 5(d), the standard
linear rule T (u1, u2, x) = u1 +u2 +x is used with different versions of the quantizer. Figure
5(a) is the case of the non-overloaded ideal quantizer (infinite quantizer). In Figure 5(b)
we use the 3-level quantizer introduced in [26], which employs $\{-1, 0, 1\}$ as the output levels
and $\{\tfrac12, -\tfrac12\}$ as the threshold values. Figure 5(c) is the standard one-bit quantizer case. In
Figure 5(d) we use an infinite quantizer whose threshold at 0 has been shifted by $+\tfrac13$.
Figure 5(e) shows the case of a different linear rule $T(u_1, u_2, x) = u_1 + \tfrac12 u_2 + x$ with the
regular infinite quantizer. Finally, Figure 5(f) shows the case of the following “semi-linear”
rule introduced in [23]:
T (u1, u2, x) = (9− 6|x|)u1 + (6− 12|x|)u2 + (10− 4|x|)x. (38)
Fig. 5. Representation in black of several consecutive state points of various second order Σ∆ modulators
(panels (a) through (f)) with an irrational constant input x ≈ 1/4. The copies in gray are the translated
versions of the state points by (1, 0) and (1, 1), respectively.
Formally speaking, these experiments demonstrate the existence of a tile Γx that contains
the forward trajectory $U = \{u[n]\}_{n\ge 0}$. Now, the recursive relation (37) implies that
Mx(U) ⊂ U. Since U appears to be dense in Γx, one is then tempted to conjecture that
Mx(Γx) ⊂ Γx. Once proved, this result implies that if the initial state u[0] ∈ Γx,
then the whole forward trajectory remains in Γx. However, the real significance
of this result lies in the tiling property, as will be explained in Section III-C.
B. Mathematical justifications of tiling
Some mathematical justifications of the above tiling conjecture have been recently pro-
vided in [24] under the assumption of stability. We call the dynamical system defined by
a map M on Rm positively stable if there exists a bounded set Γ0 satisfying M(Γ0) ⊂ Γ0.
We call such a set Γ0 positively invariant. Certainly the existence of such a set ensures
stable operation of the modulator and there has been interest in finding such sets [21]. We
are interested in invariant sets that are also tiles. It turns out that if a positively invariant
set can be found for M, then consecutive iterations of this set under M converge to an
attractor set that is a tile, up to a multiplicity. In the following theorem we summarize
the results obtained in [24] in this direction:
Theorem III.1 ([24]) Given a finite measurable partition $\{\Omega_i\}_i$ of $\mathbb{R}^m$, a collection of
integer vectors $e_i \in \mathbb{Z}^m$ and an irrational constant $x \in \mathbb{R}$, consider the piecewise affine
map M on $\mathbb{R}^m$ defined by
$$M(u) := Au + (x + \tfrac12)\,e - e_i, \quad \text{if } u \in \Omega_i, \tag{39}$$
where A is the m × m lower triangular matrix of 1's and $e = (1, \dots, 1) \in \mathbb{Z}^m$. If there
exists a bounded set Γ0 of positive measure that is positively invariant under M, then
the set
$$\Gamma := \bigcap_{k\ge 0} M^k(\Gamma_0) \tag{40}$$
is invariant by M (i.e., M(Γ) = Γ) and is equal (up to a set of measure zero) to the disjoint
union of a finite and non-empty collection of tiles.
A number of remarks are in order. First, note that the mapping defined in (23)-(25) is
indeed of the form of (39). Second, not only does this theorem state the existence of an
invariant set Γ, but (40) shows that Γ is an attractor of M within the region of stability Γ0.
Next, note that this theorem is valid in any dimension m and under general conditions
on the partition $\{\Omega_i\}_i$ as well as the integer translations $e_i$, as long as the overall map M
is positively stable. However, under these general assumptions, the conclusion is only that
Γ is composed of one or more tiles, all up to a set of measure zero. Indeed, an example
given in [24] shows that a map of the type (25) may yield an invariant set composed of two
tiles. Let us note that this example required the use of a particular nonlinear thresholding
function T . The exact conditions on T to yield a single tile are not currently known. From
our experience (including, for example, the experiments of Figure 5), we believe that all
stable Σ∆ modulators using a linear thresholding function T yield a single invariant tile,
at least at the second order and including the case of rational input constants x. However,
care must be taken in the definition of the invariant set Γ when x is a rational number;
the statement is on the existence of a tile Γ that is invariant under M and it may not
necessarily be the case that Γ can be found as an attractor as in (40) or the closure of any
trajectory. It remains a general conjecture that linear thresholding functions enjoy these
properties. In Section IV, we will give the proof of these properties on three particular
configurations of second order Σ∆ modulation. But before performing this analysis, we
would like to show why the single tile case is of crucial importance.
C. The single invariant tile case and its fundamental consequence
From now on we only consider Σ∆ modulators for which the invariant set Γx is a single
tile for each x. We shall see in this case that it is possible to find an explicit expression
for u[n] in terms of n and Γx. To keep the discussion simple we shall restrict ourselves to
second order modulators; the generalization to higher order modulators is routine.
We first introduce some notation. Let Γ be an arbitrary tile in $\mathbb{R}^2$. By definition, the
collection of sets $\{\Gamma + k\}_{k\in\mathbb{Z}^2}$ forms a partition of $\mathbb{R}^2$. This implies that for each $u \in \mathbb{R}^2$,
there exists a unique point in Γ, denoted $\langle u\rangle_\Gamma$, such that
$$\langle u\rangle_\Gamma - u \in \mathbb{Z}^2. \tag{41}$$
In other words, $u \mapsto \langle u\rangle_\Gamma$ is the unique map from $\mathbb{R}^2$ to Γ that satisfies
$$\forall u \in \Gamma, \quad \langle u\rangle_\Gamma = u, \tag{42}$$
and
$$\forall u \in \mathbb{R}^2,\ \forall k \in \mathbb{Z}^2, \quad \langle u + k\rangle_\Gamma = \langle u\rangle_\Gamma. \tag{43}$$
Fig. 6. Modulo operations (axes u1, u2): (a) illustration of $u[n] = \langle v[n]\rangle_\Gamma$ from (48); (b) comparison
between $\langle u\rangle_\Gamma$ and $\langle u\rangle$ (illustration of (52)). In both figures, 2D points are marked using the symbol '×'.
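In the simple case where $\Gamma = [0, 1)^2$, we will use the standard notation $\langle v\rangle$ to denote
$\langle v\rangle_{[0,1)^2}$, where
$$\langle v\rangle = \begin{pmatrix} \langle v_1\rangle \\ \langle v_2\rangle \end{pmatrix}. \tag{44}$$
Here $\langle v\rangle := v - \lfloor v\rfloor$ denotes the fractional part of a real number v, and $\lfloor v\rfloor$ denotes the
greatest integer less than or equal to v.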
We return to the sequence u[n] of the Σ∆ state vector which remains in Γx for all n.
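From (25)-(26), we can write
$$u[n] = \big(A\,u[n-1] + (x + \tfrac12)\,e\big) - i\,e, \tag{45}$$
where $i \in \mathbb{Z}$. Since $-ie \in \mathbb{Z}^2$, we obtain via (42) and (43) that
$$u[n] = \big\langle A\,u[n-1] + (x + \tfrac12)\,e \big\rangle_{\Gamma_x}. \tag{46}$$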
At the same time, let us artificially build a closely related sequence v[n], which we recur-
sively define by
$$v[n] = A\,v[n-1] + (x + \tfrac12)\,e \tag{47}$$
with the initial state v[0] = u[0]. We have the following property:
Proposition III.2: For all n, we have
u[n] = 〈v[n]〉Γx . (48)
Proof: Since 〈v〉Γx − v ∈ Z2 and A is a matrix with all integer coefficients, we have
A〈v〉Γx −Av = A(〈v〉Γx − v) ∈ Z2. It follows from (43) that
〈A 〈v〉Γx + w〉Γx = 〈Av + w〉Γx (49)
for any w ∈ R2. The proposition is then proved by induction. For n = 0, we have
u[0] = 〈u[0]〉Γx = 〈v[0]〉Γx . Suppose (48) holds for k = n − 1, i.e. u[n−1] = 〈v[n−1]〉Γx .
Then, by successively applying (46), (49) and (47), we obtain
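$$u[n] = \big\langle A\langle v[n-1]\rangle_{\Gamma_x} + (x + \tfrac12)\,e \big\rangle_{\Gamma_x}
= \big\langle A\,v[n-1] + (x + \tfrac12)\,e \big\rangle_{\Gamma_x}
= \langle v[n]\rangle_{\Gamma_x}.$$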
The power of this result lies in the fact that there is an explicit functional expression
for v[n] which can be obtained by simply iterating (47) forwards and backwards:
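$$\forall n \in \mathbb{Z}, \quad v[n] = \begin{pmatrix} v_1[n] \\ v_2[n] \end{pmatrix}
= \begin{pmatrix} u_1[0] + n\,(x + \tfrac12) \\ n\,u_1[0] + u_2[0] + \tfrac12 n(n+1)\,(x + \tfrac12) \end{pmatrix}. \tag{50}$$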
Thus, under the assumption that the tile Γx is known, the combination of (48) and (50)
provides an explicit expression of u[n] in terms of n.
We define u[n] for n < 0 by (48). It follows from the invariance of Γx under Mx that
this definition is consistent with (37) for all n ∈ Z.
Figure 6(a) gives a graphical example of explicit determination of the sequence u[n]
from the knowledge of the tile, via the preliminary calculation of the sequence v[n] from
(50).
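As a quick sanity check of (50), the following Python sketch iterates (47) forwards and backwards in exact rational arithmetic and compares the result with the closed form. The function names, the sample initial state and the rational sample input are illustrative assumptions and not part of the original analysis; (50) itself does not require x to be rational.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def step_forward(v, x):
    """One application of (47): v -> A v + (x + 1/2) e, with A = [[1, 0], [1, 1]]."""
    v1, v2 = v
    c = x + HALF
    return (v1 + c, v1 + v2 + c)

def step_backward(v, x):
    """Inverse of step_forward: v -> A^{-1} (v - (x + 1/2) e)."""
    v1, v2 = v
    c = x + HALF
    w1, w2 = v1 - c, v2 - c
    return (w1, w2 - w1)

def v_closed_form(u0, x, n):
    """Right hand side of (50) for any integer n (positive, zero or negative)."""
    u1, u2 = u0
    c = x + HALF
    return (u1 + n * c, n * u1 + u2 + Fraction(n * (n + 1), 2) * c)

def v_iterated(u0, x, n):
    """v[n] obtained by iterating (47) |n| times from v[0] = u0."""
    v = u0
    for _ in range(abs(n)):
        v = step_forward(v, x) if n > 0 else step_backward(v, x)
    return v
```

Calling `v_iterated(u0, x, n)` and `v_closed_form(u0, x, n)` with the same arguments should return identical pairs of `Fraction` values for every integer n.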
D. Further developments on the single tile case
A remaining major difficulty of analysis is that the expression (48) for u[n] depends on
the knowledge of the invariant set Γx. Not only can this set be complex, as in some of
the examples in Figure 5, but explicit expressions are also, in general, not easily obtainable.
Nevertheless, an analysis of u[n] is still possible, thanks to a particular decomposition of
〈·〉Γx into simpler components. This is based on the following lemma:
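Lemma III.3: Let Γ and Γ′ be two sets that tile the plane with $\mathbb{Z}^2$ translations. For
each $k \in \mathbb{Z}^2$, let us define the set $\Pi_k := \{u : \langle u\rangle_\Gamma \in \Gamma' - k\}$. Then
(i) the family $\{\Pi_k\}_{k\in\mathbb{Z}^2}$ forms a partition of $\mathbb{R}^2$,
(ii) $\langle u\rangle_{\Gamma'} - \langle u\rangle_\Gamma = k$ when $u \in \Pi_k$.
Proof: Since Γ′ tiles the plane with $\mathbb{Z}^2$ translations, for any $u \in \mathbb{R}^2$, there exists a
unique $k \in \mathbb{Z}^2$ such that $\langle u\rangle_\Gamma \in \Gamma' - k$. This proves part (i). Now, consider any given
$k \in \mathbb{Z}^2$ and any $u \in \Pi_k$. By definition, $\langle u\rangle_\Gamma \in \Gamma' - k$, and we have $\langle u\rangle_\Gamma + k \in \Gamma'$. Since
$\langle u\rangle_\Gamma + k$ differs from u by an element in $\mathbb{Z}^2$, and itself lies in Γ′, it must indeed be equal
to $\langle u\rangle_{\Gamma'}$, i.e., $\langle u\rangle_{\Gamma'} - \langle u\rangle_\Gamma = k$.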
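Lemma III.3 actually leads to the following explicit relation
$$\langle u\rangle_{\Gamma'} = \langle u\rangle_\Gamma + \sum_{k\in\mathbb{Z}^2} \chi_{\Pi_k}(u)\,k,$$
where $\chi_A$ stands for the characteristic function of the set A. Note that $\langle u\rangle_\Gamma$ always belongs
to Γ. Hence $u \in \Pi_k$ if and only if $\langle u\rangle_\Gamma \in \Gamma_k := \Gamma \cap (\Gamma' - k)$. We can then also write
$$\langle u\rangle_{\Gamma'} = \langle u\rangle_\Gamma + \sum_{k\in\mathbb{Z}^2} \chi_{\Gamma_k}(\langle u\rangle_\Gamma)\,k. \tag{51}$$
Of particular interest will be the case where $\Gamma' = \Gamma_x$ and $\Gamma = [0, 1)^2$. This yields
$$\langle u\rangle_{\Gamma_x} = \langle u\rangle + \sum_{k\in\mathbb{Z}^2} \chi_{S_k}(\langle u\rangle)\,k, \tag{52}$$
where $S_k := [0, 1)^2 \cap (\Gamma_x - k)$. This is the decomposition of the function $\langle\cdot\rangle_{\Gamma_x}$ mentioned
earlier.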
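To make the decomposition (52) concrete, here is a minimal Python sketch for one explicit tile. As an assumption made purely for illustration, we take Γ to be the parallelogram τ([−1/2, 1/2)²) with τ(w1, w2) = (w2 − w1, w2), which is the tile described in Section IV-A; all function names are hypothetical. The sketch computes ⟨u⟩_Γ, the standard reduction ⟨u⟩, and the integer vector k for which ⟨u⟩ ∈ S_k.

```python
from fractions import Fraction
import math

HALF = Fraction(1, 2)

def frac(t):
    """Fractional part <t> = t - floor(t)."""
    return t - math.floor(t)

def mod_unit_square(u):
    """<u>: the representative of u in [0, 1)^2."""
    return (frac(u[0]), frac(u[1]))

def in_gamma(u):
    """Membership in the example tile tau([-1/2, 1/2)^2), tau(w1, w2) = (w2 - w1, w2)."""
    u1, u2 = u
    return -HALF <= u2 - u1 < HALF and -HALF <= u2 < HALF

def mod_gamma(u):
    """<u>_Gamma: pull back by tau (an involution), reduce into [-1/2, 1/2)^2, push forward."""
    w1, w2 = u[1] - u[0], u[1]
    w1, w2 = frac(w1 + HALF) - HALF, frac(w2 + HALF) - HALF
    return (w2 - w1, w2)

def shift_label(u):
    """The integer vector k with <u>_Gamma = <u> + k, i.e. the index k of the set S_k containing <u>."""
    p, g = mod_unit_square(u), mod_gamma(u)
    return (int(g[0] - p[0]), int(g[1] - p[1]))
```

For this particular tile only the four shifts (0, 0), (−1, 0), (−1, −1) and (0, −1) occur, so the sum in (52) has at most four non-zero terms.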
Another useful property is the following:
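Proposition III.4: Let Γ and Γ′ be two (Lebesgue) measurable sets that tile the plane
with the $\mathbb{Z}^2$ lattice. Then $\langle\cdot\rangle_\Gamma$ as a mapping from Γ′ to Γ is a (Lebesgue) measure preserving
bijection whose inverse is given by $\langle\cdot\rangle_{\Gamma'}$. If $F : \mathbb{R}^2 \to \mathbb{R}$ is any $\mathbb{Z}^2$-periodic locally integrable
function, then
$$\int_\Gamma F(u)\,du = \int_{\Gamma'} F(u)\,du. \tag{53}$$
Proof: Bijectivity is clear. On the other hand, if $u \in \Gamma'$, then $\langle\langle u\rangle_\Gamma\rangle_{\Gamma'} = u$, since each
of these mappings shifts its argument by an element of $\mathbb{Z}^2$, and the resulting point lies in
Γ′. Hence $\langle\cdot\rangle_{\Gamma'}$ inverts $\langle\cdot\rangle_\Gamma$. Now, for any $k \in \mathbb{Z}^2$, let us define $\Gamma_k := \Gamma \cap (\Gamma' - k)$ and
$\Gamma'_k := \Gamma' \cap (\Gamma - k)$. It follows easily from the tiling assumption that the families $\{\Gamma_k\}_{k\in\mathbb{Z}^2}$ and
$\{\Gamma'_k\}_{k\in\mathbb{Z}^2}$ form partitions of Γ and Γ′, respectively. It is also easy to see that $\Gamma'_k = \Gamma_{-k} - k$.
Now, for a measurable set $A \subset \Gamma$, (51) implies that $\langle A\rangle_{\Gamma'} = \bigcup_{k\in\mathbb{Z}^2} ((A \cap \Gamma_k) + k)$. This is
a disjoint union, and it follows that
$$|\langle A\rangle_{\Gamma'}| = \sum_{k\in\mathbb{Z}^2} |(A \cap \Gamma_k) + k| = \sum_{k\in\mathbb{Z}^2} |A \cap \Gamma_k| = |A|.$$
Hence $\langle\cdot\rangle_{\Gamma'}$ preserves measure on Γ, and by symmetry $\langle\cdot\rangle_\Gamma$ preserves measure on Γ′.
Since F is $\mathbb{Z}^2$-periodic, we have $F(u) = F(\langle u\rangle_{\Gamma'})$. Using this and the measure preserving
property of $\langle\cdot\rangle_{\Gamma'}$, we get
$$\int_\Gamma F(u)\,du = \int_\Gamma F(\langle u\rangle_{\Gamma'})\,du = \int_{\Gamma'} F(v)\,dv. \tag{54}$$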
We conclude this section with a word on the dynamics of Mx on the invariant set Γx.
Consider the mapping $\langle M_x\rangle : [0, 1)^2 \to [0, 1)^2$ naturally defined by $\langle M_x\rangle(u) = \langle M_x(u)\rangle$.
It can be easily checked that the mappings $M_x|_{\Gamma_x}$ and $\langle M_x\rangle$ are related to each other via
$$M_x|_{\Gamma_x} = \langle\cdot\rangle_{\Gamma_x} \circ \langle M_x\rangle \circ \langle\cdot\rangle.$$
It is well known that when x is irrational, $\langle M_x\rangle$ is ergodic with respect to the Lebesgue
measure (see e.g. [20]). Since both $\langle\cdot\rangle_{\Gamma_x} : [0, 1)^2 \to \Gamma_x$ and $\langle\cdot\rangle : \Gamma_x \to [0, 1)^2$ are measure
preserving, it follows that $M_x$ (and also $M_x^{-1}$) is ergodic on Γx with respect to the Lebesgue
measure as well. This is the ergodicity property that was mentioned at the end of Section
II-D.
IV. Thorough study of three particular configurations
The purpose of this section is three-fold. First, we would like to give some concrete
examples of invariant tiles Γx in some practical configurations. Recall from the previous
section that a general criterion regarding when the invariant sets reduce to single tiles is not
available yet and also that our signal analysis machinery is dependent on this condition.
Second, we will see in these examples that the tiling phenomenon is not restricted to irrational
inputs but that it applies to rational inputs as well. Third, we would like to extract some
common analytical features of these invariant sets which will later be crucial in the error
analysis of Section V.
A. Linear T and 2-bit quantizer: the L2 system
The Σ∆ configuration for which derivations are the easiest is the standard 2-bit double-
loop configuration previously studied in [12]. As explained in Section II-A, this corresponds
to the case where (α, β) = (1, 1). We set γ = 1, so that y[n] satisfies the relation
y[n] = u1[n−1] + u2[n−1] + x[n]. (55)
With this choice of coefficients, the modulator satisfies a unique property that we describe
here. One can easily derive from Figure 3 that
u2[n] = u1[n−1] + u2[n−1] + x[n]− q[n].
Because of (55), this implies that
u2[n] = y[n]− q[n]
which is the quantizer error (up to the sign). First, assume that the quantizer Q is infinite
as defined in Section II-B. This implies that $-\tfrac12 \le y[n] - q[n] < \tfrac12$ for all n. Therefore
the point $(u_2[n-1], u_2[n])$ belongs to $[-\tfrac12, \tfrac12)^2$ regardless of the input sequence $(x[n])_{n\in\mathbb{N}}$.
Correspondingly, the couple $(u_1[n], u_2[n])$ belongs to the image Γ of the set $[-\tfrac12, \tfrac12)^2$ under
the bijection $\tau : (w_1, w_2) \mapsto (w_2 - w_1, w_2)$. We depict this set in Figure 7. Γx := Γ clearly
remains invariant under Mx for all x, and its $\mathbb{Z}^2$ translations tile the plane; the latter
follows easily from the observation that this is already true for the set $[-\tfrac12, \tfrac12)^2$ and that
the matrix representing τ is integer-valued with determinant ±1.
Now, if $-\tfrac12 \le x[n] \le \tfrac12$, then as a consequence of (55), y[n] always remains in the
interval (−2, 2). Hence the infinite quantizer can as well be replaced with the two-bit
quantizer with output values {−1.5, −0.5, 0.5, 1.5} to produce an equivalent system. For
the corresponding threshold values {−1, 0, 1} and for x = 0, the level sets of the function
T are also drawn in Figure 7. Note that there are 4 regions in Γ determined by these lines,
and these are represented by the 2 bits of the quantizer output.
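The claims of this subsection can be checked numerically with the short Python sketch below, written in exact rational arithmetic. Two points are assumptions on our part: the infinite quantizer is taken to have integer thresholds and half-integer output levels (consistent with the two-bit levels quoted above), and the update of u1 is inferred from the form (45), namely u1[n] = u1[n−1] + x − q[n]. All function names are illustrative. The sketch also compares each state with ⟨v[n]⟩_Γ, as in Proposition III.2.

```python
from fractions import Fraction
import math

HALF = Fraction(1, 2)

def quantize(y):
    """Infinite uniform quantizer, assumed: thresholds at integers, levels at half-integers."""
    return math.floor(y) + HALF

def l2_step(u, x):
    """One step of the L2 map with constant input x; returns (next state, y, q)."""
    u1, u2 = u
    y = u1 + u2 + x                      # (55) with x[n] = x
    q = quantize(y)
    return (u1 + x - q, y - q), y, q     # second component is u2[n] = y[n] - q[n]

def in_gamma(u):
    """Membership in Gamma = tau([-1/2, 1/2)^2), tau(w1, w2) = (w2 - w1, w2)."""
    u1, u2 = u
    return -HALF <= u2 - u1 < HALF and -HALF <= u2 < HALF

def mod_gamma(u):
    """<u>_Gamma for the tile above (tau is its own inverse)."""
    w1, w2 = u[1] - u[0] + HALF, u[1] + HALF
    w1, w2 = w1 - math.floor(w1) - HALF, w2 - math.floor(w2) - HALF
    return (w2 - w1, w2)

def v_closed_form(u0, x, n):
    """The auxiliary sequence v[n] of (50)."""
    c = x + HALF
    return (u0[0] + n * c, n * u0[0] + u0[1] + Fraction(n * (n + 1), 2) * c)

def simulate(u0, x, steps):
    """List of (u[n], y[n], q[n]) for n = 1, ..., steps."""
    out, u = [], u0
    for _ in range(steps):
        u, y, q = l2_step(u, x)
        out.append((u, y, q))
    return out
```

Starting from any u[0] in Γ with |x| ≤ 1/2, every state returned by `simulate` should satisfy `in_gamma`, every y[n] should lie in (−2, 2), and `mod_gamma(v_closed_form(u0, x, n))` should reproduce u[n] exactly.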
Fig. 7. The invariant set Γ for the dynamical systems Mx considered in Section IV-A (axes u1, u2; regions
labeled A, B, C, D). The level sets T = 1, T = 0 and T = −1 of T(·) are drawn for x = 0.
B. Linear T and 1-bit quantizer: the L1 system
To analyze properties that are likely to be representative of the general case of practical
second order Σ∆ modulators, it is important to consider at least one configuration where
T is linear, but $(\alpha, \beta) \ne (1, 1)$ and the quantizer Q is only one-bit as defined in (20).
Unfortunately, in this situation the invariant sets of Mx suddenly become complicated
and very difficult to identify. Until now, this identification has been possible only in the
particular case where $(\alpha, \beta, \gamma) = (\tfrac12, 1, 0)$, the quantizer Q is one-bit and x is limited to
the interval $[-\tfrac16, \tfrac16]$. In this situation, the space is partitioned by the line $u_1 + \tfrac12 u_2 = 0$
into two half-spaces denoted by $\Omega_x^0$ and $\Omega_x^1$, and Mx is an affine transformation on each of
these half-spaces. As x varies, these mappings exhibit invariant sets that depend on x in
a non-trivial way. Consider the partition of the interval $(0, \tfrac16)$ as
$$\left(0, \tfrac16\right) = \cdots \cup [\alpha_{k+1}, \alpha_k) \cup \cdots \cup [\alpha_2, \alpha_1),$$
where $\alpha_k = \tfrac12 (4k^2 - 1)^{-1}$, $k \ge 1$, and for each $x \in (0, \tfrac16)$, let $k = k_x$ be the unique integer
such that $x \in [\alpha_{k+1}, \alpha_k)$. We show in Appendix-B that the connected set Γx enclosed
in the polygon shown in Figure 8(a) (where the portion of the boundary represented in
mixed line is excluded) is an invariant set for Mx, and its $\mathbb{Z}^2$ translations tile the plane.
The exact definition of the vertices of Γx is given in Appendix-B, Table II. Note that the
total number of vertices is equal to 4k + 6, which increases indefinitely as x approaches 0.
We add that $\Gamma_0$ and $\Gamma_{1/6}$ are obtained via the limits of $P_x$, as $x \to 0$ and $x \to \tfrac16$. Together
with the symmetry $\Gamma_x = -\Gamma_{-x}$, which is a mere consequence of the relation
$$T(-u, -x) = -T(u, x), \tag{56}$$
we obtain the parameterization of all Γx in the range $[-\tfrac16, \tfrac16]$.
We also note that the polygonal boundary of Γx has bounded perimeter for all
$x \in [-\tfrac16, \tfrac16]$.
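The bookkeeping for the index k_x and the vertex count can be sketched in a few lines of Python. The function names are hypothetical, and the sketch reads the formula for α_k as 1/(2(4k² − 1)), which gives α_1 = 1/6 in agreement with the right end of the partitioned interval.

```python
from fractions import Fraction

def alpha(k):
    """alpha_k = 1 / (2 (4 k^2 - 1)) for k >= 1."""
    return Fraction(1, 2 * (4 * k * k - 1))

def k_of_x(x):
    """The unique integer k >= 1 with alpha(k + 1) <= x < alpha(k), for 0 < x < 1/6."""
    if not (0 < x < Fraction(1, 6)):
        raise ValueError("x must lie in the open interval (0, 1/6)")
    k = 1
    while x < alpha(k + 1):
        k += 1
    return k

def vertex_count(x):
    """Number of vertices 4 k + 6 of the polygon bounding the invariant set for this x."""
    return 4 * k_of_x(x) + 6
```

For instance, `k_of_x(Fraction(1, 200))` returns 5, the value of k used in Figure 8, and the vertex count grows without bound as x decreases to 0.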
C. A new rule: Quadratic T and 1-bit quantizer: the Q1 system
To extract further potential properties of the dynamical system of Figure 3 in the one-
bit case, it is interesting to explore the case where the linearity of T is relaxed, given the
limited available results with linear T . Such a question was previously studied in [23] (see
also [4] for a piecewise linear choice of T ).
It turns out that remarkably simple invariant sets are obtained (yet still using a one-bit
quantizer) by considering the quadratic function T defined by
$$T(u_1, u_2, x) = C(x) + (6 - 4x)\,u_1 + (4 - 8x)\,u_2 + 4(x + u_1)^2, \quad x \ge 0, \tag{57}$$
where C(x) is an arbitrary function of x. We present in Appendix-C the reasoning behind
this particular choice of T . For x < 0, T is defined by the symmetry relation (56). The
invariant sets of the resulting dynamical systems have the property that the boundary of
each of them is a piecewise quadratic curve with four pieces. An example of these sets is
depicted in Figure 9 for x = 0.24, and for a particular choice of C(x). The invariant set
Γx is the region bounded by the quadratic curves that connect the points P1, P2, P3, P4,
where the piece of curve that joins P1 to P2 is to be excluded. We show in Appendix-C
that Γx is an invariant set for Mx, and its Z2 translations tile the plane.
Although the relevance of such a system to analog circuit implementation is still to be
evaluated, this quadratic function T is still worth considering, as it gives us a first
situation where the invariant set of the dynamical system of Figure 3 is entirely available
analytically. It also gives us what we believe to be the simplest configuration of invariant
sets theoretically achievable by one-bit second order modulators.
Fig. 8. The invariant set for the dynamical system given in Section IV-B for a generic k value (in this
figure k = 5): (a) detailed description; (b) tiling demonstration.
Fig. 9. The invariant set for the dynamical system given in Section IV-C (x = 0.24).
One remark concerns the robustness of the implementation. It was shown in the work
of Yılmaz in [26] that second order Σ∆ modulation is robust against small functional
perturbations of (what we refer to here as) Q ∘ T, in the sense that stability of the state
variable u is ensured as long as the boundary separating the partition $\{\Omega_x^i\}_{i=0,1}$ stays in a
particular region. It also follows from this work that the quadratic function T presented
in this paper is robust in this sense, at least in a range of inputs x. This increases the
chances of the implementability of this quadratic function in real circuitry.
D. Boundedness, regularity and tiling
The invariant sets of the dynamical systems given above possess three properties which
turn out to be crucial in the estimates we shall prove in this paper. These properties
are uniform boundedness in x, regularity of the boundary and tiling, as summarized in
Proposition IV.1. Before we proceed to the statement of the proposition, let us define the
regularity class of sets $\mathcal{M}_b$ to be the collection of sets $H \subset [0, 1)^d$ for which
$$|\{u \in H^c : \mathrm{dist}(u, H) < \varepsilon\}| \le b(\varepsilon) \quad \text{and} \quad |\{u \in H : \mathrm{dist}(u, H^c) < \varepsilon\}| \le b(\varepsilon)$$
for every ε > 0, where $b : (0, \infty) \to (0, \infty)$ is a monotonically increasing function such
that $\lim_{\varepsilon\to 0^+} b(\varepsilon) = 0$. Here |A| denotes the (Lebesgue) measure of the set A, and $A^c$
denotes the complement of A. Every Jordan measurable set (i.e. a set whose boundary
has Lebesgue measure zero) belongs to such a class $\mathcal{M}_b$ for some b.
Proposition IV.1: For each of the one-parameter families of dynamical systems
$D = (M_x,\ x \in [-\tfrac12, \tfrac12])$ given in Sections IV-A, IV-B, and IV-C, there exists a subinterval
I = I(D) of $[-\tfrac12, \tfrac12]$ such that for each x ∈ I, the map Mx possesses an invariant set Γx
with the following properties:
1. Uniform boundedness in x: There exists a positive constant M0 such that
$$\sup_{x\in I}\ \sup_{u\in\Gamma_x} |u| \le M_0.$$
2. Regularity of the boundary: There exists a positive constant C0 such that $\Gamma_x \in \mathcal{M}_b$ for
all x ∈ I, where $b(\varepsilon) = C_0\,\varepsilon$.
3. Tiling: For each x ∈ I, the set Γx is a tile congruent to $[0, 1)^2$ modulo translations by
vectors in $\mathbb{Z}^2$. I.e., the translates of Γx by the integer lattice tile the plane:
$$\Gamma_x + \mathbb{Z}^2 = \mathbb{R}^2, \quad \text{and} \quad (\Gamma_x + k) \cap \Gamma_x = \emptyset \ \text{ if } k \in \mathbb{Z}^2 \setminus \{0\}.$$
We say that Γx is a tiling invariant set, or equivalently an invariant tile.
The proof of this proposition is given in Appendix-D.
V. The Main Theorem
We shall continue to use the notation L2, L1, and Q1 to denote the one-parameter
dynamical systems $D = (M_x,\ x \in [-\tfrac12, \tfrac12])$ for the 2-bit linear, 1-bit linear, and the 1-bit
quadratic schemes given in Sections IV-A, IV-B, and IV-C, respectively. For each of these
second order Σ∆ schemes, we assume that for each x ∈ I(D), the initial condition u[0] is
chosen from the invariant set Γx of the associated dynamical system Mx, and the sequence
u[n] is defined for all $n \in \mathbb{Z}$ as described in Section III-C. Recall that the subscript x in
$e_{\lambda,x}[n]$ denotes the dependence of the error $e_\lambda[n] = h_\lambda * (x_1[n] - q[n])$ on the value x of
the constant input signal $x_1[n]$, and $h_\lambda$ is the sinc³ filter defined in Section II-D. For each
family D ∈ {L2, L1, Q1}, we set, as in Section I,
$$\mathrm{MSE}(\lambda; D) = \sup_n \int_I |e_{\lambda,x}[n]|^2\,dx, \tag{58}$$
where I = I(D) is as defined in Proposition IV.1. The following theorem lists our improved
estimates for these second order schemes:
Theorem V.1: Let D be a second order Σ∆ modulation scheme that satisfies the properties
listed in Proposition IV.1, in particular any of the schemes L2, L1 or Q1. Then the
following estimates hold:
(a) The mean square error defined by (58) satisfies
$$\mathrm{MSE}(\lambda; D) \le C\,\lambda^{-9/2} \log^2 \lambda \tag{59}$$
for all λ > 0, where C = C(D) is a constant that depends only on the scheme D.
(b) For almost every x ∈ I, and all $n \in \mathbb{Z}$,
$$|e_{\lambda,x}[n]|^2 \le C\,\lambda^{-9/2} \log^4 \lambda \tag{60}$$
for all λ > 0, where C = C(D, x, n) does not depend on λ, but otherwise may depend on
the scheme D, input x and the time point n.
(c) For D = L2, the same estimates in (a) and (b) hold with 9/2 replaced by 5.
Before we proceed to the proof of the theorem, let us list some further qualitative
observations. Equation (35) states that
$$e_{\lambda,x}[n] = \frac{1}{\lambda^2}\big(u_\lambda[n] - 2u_\lambda[n-\lambda] + u_\lambda[n-2\lambda]\big). \tag{61}$$
As a reminder from (36), $u_\lambda[n]$ is qualitatively the average of the discrete sequence $u_2[m]$
over the time interval (n − λ, n]. This expression immediately suggests that
$e_{\lambda,x}[n] = o(\lambda^{-2})$. To see this, denote by $F_2$ the mapping that takes $u = (u_1, u_2)$ to $u_2$. When x is
irrational, for almost all u[0] and for all n, the Ergodic Theorem yields
$$\lim_{\lambda\to\infty} u_\lambda[n] = \lim_{\lambda\to\infty} \frac{1}{\lambda}\sum_{j=0}^{\lambda-1} F_2(M_x^{-j} u[n]) = \int_{\Gamma_x} F_2(w)\,dw = \int_{\Gamma_x} w_2\,dw_1 dw_2. \tag{62}$$
Now, it is a simple exercise to show that
$$u_\lambda[n] - 2u_\lambda[n-\lambda] + u_\lambda[n-2\lambda] = 3\big(u_\lambda[n] - 2u_{2\lambda}[n] + u_{3\lambda}[n]\big); \tag{63}$$
therefore (61) and (62) together imply that $e_{\lambda,x}[n] = o(\lambda^{-2})$. Note that this argument
does not provide us with any information about the improvement on the exponent of λ.
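The identity (63) is easy to confirm numerically. The sketch below assumes that u_λ[n] is exactly the moving average of a sequence over the window n − λ < m ≤ n, as in (62) and (64); the function names and the test sequence are illustrative only. Writing a, b, c for the sums over the three consecutive windows of length λ ending at n, both sides of (63) reduce to (a − 2b + c)/λ.

```python
from fractions import Fraction

def moving_average(seq, n, lam):
    """u_lambda[n]: average of seq[m] over the window n - lam < m <= n."""
    return sum(seq[m] for m in range(n - lam + 1, n + 1)) / Fraction(lam)

def second_difference(seq, n, lam):
    """Left hand side of (63)."""
    return (moving_average(seq, n, lam)
            - 2 * moving_average(seq, n - lam, lam)
            + moving_average(seq, n - 2 * lam, lam))

def three_scale_combination(seq, n, lam):
    """Right hand side of (63)."""
    return 3 * (moving_average(seq, n, lam)
                - 2 * moving_average(seq, n, 2 * lam)
                + moving_average(seq, n, 3 * lam))
```

Both functions require n ≥ 3λ − 1 when `seq` is a Python list indexed from 0, so that all three windows stay inside the list.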
The proof of Theorem V.1 below will heavily use techniques from the theory of uniform
distribution. Appendix-A contains the definitions and the tools that we shall employ in
the proof.
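Proof of Theorem V.1: Let us define a residual sequence $r_\lambda$ by
$$r_\lambda[n] := u_\lambda[n] - \int_{\Gamma_x} w_2\,dw_1 dw_2 = \frac{1}{\lambda}\sum_{n-\lambda<m\le n} F_2(u[m]) - \int_{\Gamma_x} F_2(w)\,dw. \tag{64}$$
Since $u_\lambda[n]$ and $r_\lambda[n]$ differ by an absolute constant, we can replace $u_\lambda[n]$ in (63) by $r_\lambda[n]$.
When combined with (61), this yields
$$|e_{\lambda,x}[n]| = \frac{3}{\lambda^2}\,\big|r_\lambda[n] - 2r_{2\lambda}[n] + r_{3\lambda}[n]\big| \le \frac{3}{\lambda^2}\big(|r_\lambda[n]| + |2r_{2\lambda}[n]| + |r_{3\lambda}[n]|\big), \tag{65}$$
and by Cauchy-Schwarz,
$$|e_{\lambda,x}[n]|^2 \le \frac{C}{\lambda^4}\big(|r_\lambda[n]|^2 + |r_{2\lambda}[n]|^2 + |r_{3\lambda}[n]|^2\big); \tag{66}$$
therefore it suffices, for each time point n, to estimate $|r_\lambda[n]|$ for general λ.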
Let us first consider the rather simple case D = L2. Note that the invariant set Γx given
in Figure 7 is such that the ordinate of any point in Γx always lies in $[-\tfrac12, \tfrac12)$. Also, the
sequence v[n] defined in Section III-C satisfies $u_2[n] - v_2[n] \in \mathbb{Z}$. Therefore
$$u_2[n] = \langle v_2[n]\rangle_{[-\frac12, \frac12)} = \langle v_2[n] + \tfrac12\rangle - \tfrac12. \tag{67}$$
Since in this case $\int_{\Gamma_x} w_2\,dw_1 dw_2 = 0$, (64) becomes
$$r_\lambda[n] = \frac{1}{\lambda}\sum_{n-\lambda<m\le n}\big(\langle v_2[m] + \tfrac12\rangle - \tfrac12\big) = \frac{1}{\lambda}\sum_{n-\lambda<m\le n}\langle v_2[m] + \tfrac12\rangle - \int_0^1 w\,dw. \tag{68}$$
Let $D_{(n-\lambda,n]}(\langle v_2\rangle)$ denote the discrepancy (Appendix-A) of the λ consecutive sequence
elements $\{\langle v_2[m]\rangle;\ n-\lambda < m \le n\}$. Koksma's inequality (Appendix-A) can be used to
bound $|r_\lambda[n]|$:
$$|r_\lambda[n]| \le D_{(n-\lambda,n]}(\langle v_2\rangle), \tag{69}$$
where we have used the invariance of discrepancy under translations of the torus $\mathbb{T} = [0, 1)$
in the equality $D_{(n-\lambda,n]}(\langle v_2 + \tfrac12\rangle) = D_{(n-\lambda,n]}(\langle v_2\rangle)$. The estimate (69) therefore reduces the
problem for the case D = L2 to estimating the λ-term discrepancy of the sequence $\langle v_2\rangle$.
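A small numerical illustration of (68) and (69) is sketched below in Python. It relies on assumptions that should be kept in mind: the discrepancy is taken to be the one-dimensional star discrepancy, computed by its classical closed formula for sorted points, which may differ from the definition (A9) used in Appendix-A; since the star discrepancy is anchored at 0 and not exactly translation invariant, it is evaluated directly on the shifted points ⟨v2[m] + 1/2⟩. Floating point arithmetic and the function names are likewise our own choices.

```python
import math

def star_discrepancy_1d(points):
    """Star discrepancy of points in [0, 1): 1/(2N) + max_i |x_(i) - (2i - 1)/(2N)|."""
    xs = sorted(points)
    n = len(xs)
    return 1 / (2 * n) + max(abs(x - (2 * i + 1) / (2 * n)) for i, x in enumerate(xs))

def v2(n, u0, x):
    """Second component of v[n] from (50)."""
    u1, u2 = u0
    return n * u1 + u2 + 0.5 * n * (n + 1) * (x + 0.5)

def residual_and_discrepancy(n, lam, u0, x):
    """Return (|r_lambda[n]|, star discrepancy) for the points <v2[m] + 1/2>, n - lam < m <= n."""
    pts = [(v2(m, u0, x) + 0.5) % 1.0 for m in range(n - lam + 1, n + 1)]
    r = sum(pts) / lam - 0.5          # (68): sample mean minus the integral of w over [0, 1)
    return abs(r), star_discrepancy_1d(pts)
```

For an irrational input such as x = √2/6 one can call `residual_and_discrepancy(2000, lam, (0.2, 0.1), x)` for increasing `lam` and observe that the residual never exceeds the discrepancy, as Koksma's inequality predicts for a function of total variation 1.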
The general case for D, which includes D ∈ {L1, Q1}, is more difficult because it is no
longer possible to obtain an expression of $u_2[n]$ as simple as in (67). Initially, the expression
(64) suggests the need for some two-dimensional version of Koksma's inequality, defined
on an arbitrary set (in our case Γx); however, the setup for the so-called Koksma-Hlawka
inequality [5, Theorem 1.14] is the unit cube $[0, 1)^d$. Using (48), the $\mathbb{Z}^2$-periodicity of
$\langle\cdot\rangle_{\Gamma_x}$, and Proposition III.4, we can transform the expression (64) into
$$r_\lambda[n] = \frac{1}{\lambda}\sum_{n-\lambda<m\le n} F_2\big(\langle\langle v[m]\rangle\rangle_{\Gamma_x}\big) - \int_{[0,1)^2} F_2\big(\langle w\rangle_{\Gamma_x}\big)\,dw, \tag{70}$$
and attempt to use the Koksma-Hlawka inequality for the sequence $\langle v[m]\rangle$ and the function
$f = F_2 \circ \langle\cdot\rangle_{\Gamma_x}$. At first, this attempt also appears to be defeated because the Koksma-Hlawka
inequality holds for functions that are of bounded variation in the sense of Hardy and
Krause (see [5, p.10] for the definition), which is a more restrictive class than the usual
functional class $BV([0, 1)^d)$ when d ≥ 2, and which does not necessarily contain $F_2 \circ \langle\cdot\rangle_{\Gamma_x}$
due to the geometry of Γx.
We overcome this difficulty with the following procedure. By applying (52) on both
$\langle\langle v[m]\rangle\rangle_{\Gamma_x}$ and $\langle w\rangle_{\Gamma_x}$, and by using the linearity of $F_2$, we first obtain
$$r_\lambda[n] = \left(\frac{1}{\lambda}\sum_{n-\lambda<m\le n} F_2(\langle v[m]\rangle) - \int_{[0,1)^2} F_2(w)\,dw\right)
+ \sum_{k\in\mathbb{Z}^2}\left(\frac{1}{\lambda}\sum_{n-\lambda<m\le n}\chi_{S_k}(\langle v[m]\rangle) - \int_{[0,1)^2}\chi_{S_k}(w)\,dw\right) F_2(k). \tag{71}$$
(Note that we have replaced $\langle w\rangle$ with w since $w \in [0, 1)^2$.) It is now possible to apply
the Koksma-Hlawka inequality to the first term on the right hand side. This gives
$$\left|\frac{1}{\lambda}\sum_{n-\lambda<m\le n} F_2(\langle v[m]\rangle) - \int_{[0,1)^2} F_2(w)\,dw\right| \le C_0\,D_{(n-\lambda,n]}(\langle v\rangle), \tag{72}$$
where $C_0 = \mathrm{Var}_{HK}(F_2)$ is the variation of $F_2$ in the sense of Hardy and Krause, and
$D_{(n-\lambda,n]}(\langle v\rangle)$ is the two-dimensional discrepancy (Appendix-A) of $\{\langle v[m]\rangle : n-\lambda < m \le n\}$.
On the other hand, it is still true that the functions $\chi_{S_k}$ are not necessarily of bounded
variation in the sense of Hardy and Krause. However, the notion of discrepancy with
respect to a given subset can then be invoked (Appendix-A). Indeed, by definition, we have
$$\left|\frac{1}{\lambda}\sum_{n-\lambda<m\le n}\chi_{S_k}(\langle v[m]\rangle) - \int_{[0,1)^2}\chi_{S_k}(w)\,dw\right| = D_{(n-\lambda,n]}(\langle v\rangle, S_k). \tag{73}$$
We now make use of the regularity of the sets Γx in order to estimate these quantities.
From Proposition IV.1 (Property 2) and Theorem A.4, it follows that
$$\sup_{x\in I}\ \sup_{k\in\mathbb{Z}^2} D_{(n-\lambda,n]}(\langle v\rangle, S_k) \le C_1\,D_{(n-\lambda,n]}(\langle v\rangle)^{1/2} \tag{74}$$
for some constant $C_1$, where I = I(D), as defined in Proposition IV.1. Note also that
$S_k = \emptyset$ as soon as $|k| > M_0$, where $M_0$ is some absolute constant that only depends on the
system D, and the range I. We can therefore limit the summation over k in (71) to the
set $K = \{k \in \mathbb{Z}^2 : |k| \le M_0\}$, whose cardinality #K does not exceed $M_0^2$. We have also
$|F_2(k)| \le |k| \le M_0$ for all k ∈ K. With (72) and (74), we finally obtain the analogous
bound for $|r_\lambda[n]|$
$$|r_\lambda[n]| \le C_0\,D_{(n-\lambda,n]}(\langle v\rangle) + M_0 C_1 \sum_{k\in K} D_{(n-\lambda,n]}(\langle v\rangle)^{1/2}
\le (C_0 + C_1 M_0^3)\,D_{(n-\lambda,n]}(\langle v\rangle)^{1/2}, \tag{75}$$
where we have used the fact that discrepancy is always between 0 and 1 when merging the
terms with different powers. Therefore the problem in the general case is also reduced to
estimating the λ-term discrepancy, but this time of the two-dimensional point sequence
$\langle v\rangle$. The following lemma addresses this issue:
Lemma V.2: The following estimates hold:
(a) For all λ > 0 and $n \in \mathbb{Z}$,
$$\int_{-1/2}^{1/2} \big[D_{(n-\lambda,n]}(\langle v\rangle)\big]^2\,dx \le C\,\lambda^{-1}\log^4 \lambda, \tag{76}$$
where C is an absolute constant.
(b) For almost every x ∈ I, all λ > 0 and $n \in \mathbb{Z}$,
$$D_{(n-\lambda,n]}(\langle v\rangle) \le C\,\lambda^{-1/2}\log^{7/2+\delta} \lambda, \tag{77}$$
where C = C(x, n) does not depend on λ, but otherwise may depend on the input x and
the time point n, and δ is some fixed small positive number.
(c) If v is replaced by $v_2$, then $\log^4 \lambda$ can be replaced by $\log^2 \lambda$ in (a) and $\log^{7/2+\delta} \lambda$ can
be replaced by $\log^{5/2+\delta} \lambda$ in (b).
The proof of this lemma is independent of the rest of the proof of Theorem V.1 and is
presented separately at the end of this section.
Lemma V.2 is essentially all that is needed to complete the proof of Theorem V.1:
(a) We square and integrate both sides of the inequality (75) and apply Cauchy-Schwarz
followed by Lemma V.2(a) to obtain
$$\int_I |r_\lambda[n]|^2\,dx \le C\int_I D_{(n-\lambda,n]}(\langle v\rangle)\,dx
\le C\,|I|^{1/2}\left(\int_I \big[D_{(n-\lambda,n]}(\langle v\rangle)\big]^2\,dx\right)^{1/2}
\le C\,\lambda^{-1/2}\log^2 \lambda. \tag{78}$$
Note that C does not depend on n. Therefore this result together with (66) implies (59).
(b) In this case, we simply apply (66), (75) and Lemma V.2(b) to obtain (60).
(c) For the mean-square error, we square and integrate both sides of (69) and apply (66)
and Lemma V.2(c). For the instantaneous error, we simply apply (66), (69) and Lemma
V.2(c) to obtain the desired estimate.
Proof of Lemma V.2:
(a) Define, for $k = (k_1, k_2) \in \mathbb{Z}^2$,
$$S_{(a,b]}(k, x) := \frac{1}{b-a}\sum_{a<m\le b} e^{2\pi i\,k\cdot\langle v[m]\rangle}, \tag{79}$$
where the dependence on x becomes explicit if the formula (50) is inserted in this expression.
Using the periodicity of the exponential function, one can rewrite $S_{(a,b]}$ as
$$S_{(a,b]}(k, x) = \frac{1}{b-a}\sum_{a<m\le b} c_m\,e^{2\pi i d_m x}, \tag{80}$$
where $c_m = e^{2\pi i[u_1[0]k_1 + (u_2[0] + m u_1[0])k_2]}$ and $d_m = mk_1 + \tfrac12 m(m+1)k_2$.
Note that $|c_m| = 1$ and $d_m \in \mathbb{Z}$ for all m. Since $d_m$ is a quadratic polynomial in m, it can
attain any given value at most twice. Hence, if $S_{(a,b]}(k, x)$ is rewritten as a trigonometric
polynomial in x with distinct frequencies, the amplitude of each frequency will be bounded
by 2/(b−a), since $\max_{m,l} |c_m + c_l| \le 2$. Also, there will be at most b−a distinct frequencies.
Thus, using Parseval's theorem, one easily bounds $\|S_{(a,b]}(k, \cdot)\|_{L^2(\mathbb{T})}$ by
$$\|S_{(a,b]}(k, \cdot)\|_{L^2(\mathbb{T})} \le \frac{2}{\sqrt{b-a}} \tag{81}$$
uniformly in k. Now, for any positive integer K, the Erdős-Turán-Koksma inequality (Theorem
A.5) yields the estimate
$$D_{(a,b]}(\langle v\rangle) \le C\left(\frac{1}{K} + \sum_{0<\|k\|_\infty\le K}\frac{1}{r(k)}\,\big|S_{(a,b]}(k, x)\big|\right), \tag{82}$$
which, upon taking the square, using the (Cauchy-Schwarz) inequality (y+z)2 ≤ 2(y2+z2)
and integrating gives∫ 1/2
−1/2
D2(a,b](〈v〉) dx ≤ 2C
1
K2+
∑0<‖k‖∞,‖l‖∞≤K
1
r(k)r(l)
∫ 1/2
−1/2
|S(a,b](k, x)||S(a,b](l, x)| dx
.
(83)
The integral expression on the right hand side can be bounded by $4/(b-a)$ using the Cauchy-Schwarz inequality and (81). On the other hand, one has
\[
\sum_{0 < \|k\|_\infty \le K} \frac{1}{r(k)} = 4 \Big( \sum_{k_1=1}^{K} \sum_{k_2=1}^{K} \frac{1}{k_1 k_2} + \sum_{k_1=1}^{K} \frac{1}{k_1} \Big) \le C' \log^2 K, \tag{84}
\]
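so that (83) reduces to, for $a = n - \lambda$ and $b = n$,
\[
\int_{-1/2}^{1/2} \big[ D_{(n-\lambda,n]}(\langle v \rangle) \big]^2 dx \le C'' \inf_{K \ge 1} \Big( \frac{1}{K^2} + \frac{1}{\lambda} \log^4 K \Big) \le C \lambda^{-1} \log^4 \lambda, \tag{85}
\]
where it suffices to choose $K \sim \lambda^{1/2}$ at the last step.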
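As a numerical sanity check of two ingredients of part (a), the following Python sketch evaluates the squared $L^2(\mathbb{T})$ norm of the trigonometric polynomial in (80) through Parseval's identity and compares the two sides of the identity in (84). The function names and the initial state values `u1_0` and `u2_0` are illustrative assumptions, not quantities taken from the paper.

```python
import cmath
from collections import defaultdict
from fractions import Fraction


def s_l2_norm_squared(k1, k2, a, b, u1_0=0.3, u2_0=0.1):
    """Squared L2(T) norm of S_(a,b](k, .) in (80), computed via Parseval."""
    coeffs = defaultdict(complex)
    for m in range(a + 1, b + 1):
        # d_m = m*k1 + m(m+1)/2 * k2 is an integer frequency.
        d_m = m * k1 + (m * (m + 1) // 2) * k2
        c_m = cmath.exp(2j * cmath.pi * (u1_0 * k1 + (u2_0 + m * u1_0) * k2))
        # Terms sharing a frequency are merged before squaring.
        coeffs[d_m] += c_m / (b - a)
    return sum(abs(c) ** 2 for c in coeffs.values())


def reciprocal_r_sum(K):
    """Left-hand side of (84): sum of 1/r(k) over 0 < ||k||_inf <= K in Z^2."""
    total = Fraction(0)
    for k1 in range(-K, K + 1):
        for k2 in range(-K, K + 1):
            if (k1, k2) != (0, 0):
                total += Fraction(1, max(1, abs(k1)) * max(1, abs(k2)))
    return total


def reciprocal_r_closed_form(K):
    """Middle expression of (84), written with the harmonic number H_K."""
    h = sum(Fraction(1, j) for j in range(1, K + 1))
    return 4 * (h * h + h)


if __name__ == "__main__":
    a, b = 0, 64
    worst = max(
        s_l2_norm_squared(k1, k2, a, b)
        for k1 in range(-3, 4)
        for k2 in range(-3, 4)
        if (k1, k2) != (0, 0)
    )
    print("largest squared norm:", worst, "bound 4/(b-a):", 4 / (b - a))
    print("identity (84) holds for K=6:",
          reciprocal_r_sum(6) == reciprocal_r_closed_form(6))
```

The grouping by frequency mirrors the argument in the text: since $d_m$ takes each value at most twice, at most two coefficients $c_m$ are merged per frequency.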
(b) The proof of this result may seem somewhat unexpected since it is actually derived
from the input-averaged estimate. However, the technique we shall use in our proof is
well-known in the metric theory of discrepancy [5, §1.6.1].
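Let $D_\Lambda$ denote the discrepancy of a given sequence $w_m$ over the set of indices $m \in \Lambda$ and $\#\Lambda$ denote the cardinality of $\Lambda$. A crucial aspect of the method is that the function $\Lambda \mapsto \#\Lambda \, D_\Lambda$ is sub-additive, i.e., for $\Lambda_1 \cap \Lambda_2 = \emptyset$, we have
\[
\#(\Lambda_1 \cup \Lambda_2) \, D_{\Lambda_1 \cup \Lambda_2} \le \#\Lambda_1 \, D_{\Lambda_1} + \#\Lambda_2 \, D_{\Lambda_2}, \tag{86}
\]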
which follows straightforwardly from the definition of discrepancy given by (A9).
Denote by $\Lambda^j_m$ the collection of all dyadic subintervals $\Lambda \subset [0, 2^m)$ with $|\Lambda| = 2^j$. For example, $\Lambda^2_3 = \{[0, 4), [4, 8)\}$. Note that $\#\Lambda^j_m = 2^{m-j}$.
It is clear by considering the binary expansion of any given $\lambda \in [0, 2^m)$ that one can write $[0, \lambda)$ as a disjoint union of at most $m$ dyadic intervals. Let us call the collection of these intervals $J_\lambda$. Hence we have $J_\lambda \subset \bigcup_{j=0}^{m-1} \Lambda^j_m$, $\#J_\lambda \le m$ and $[0, \lambda) = \bigcup_{\Lambda \in J_\lambda} \Lambda$.
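The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration that assumes $\lambda$ is a non-negative integer; the function name `dyadic_decomposition` is hypothetical and chosen only for this example.

```python
def dyadic_decomposition(lam, m):
    """Split [0, lam) into disjoint dyadic intervals [s, s + 2**j), 0 <= j < m.

    Each returned pair (s, e) satisfies e - s = 2**j with s a multiple of 2**j,
    so the interval belongs to the collection of dyadic subintervals of [0, 2**m).
    """
    if not 0 <= lam < 2 ** m:
        raise ValueError("lam must satisfy 0 <= lam < 2**m")
    pieces, start = [], 0
    for j in range(m - 1, -1, -1):      # scan binary digits, most significant first
        if lam & (1 << j):              # digit j of lam is 1: take a block of length 2**j
            pieces.append((start, start + (1 << j)))
            start += 1 << j
    return pieces


if __name__ == "__main__":
    # 13 = 1101 in binary, so three dyadic pieces are needed.
    print(dyadic_decomposition(13, 4))
```

Since one interval is produced per nonzero binary digit of $\lambda$, the count never exceeds $m$, which is the property $\#J_\lambda \le m$ used in (88).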
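Fix $n$. Since $(n - \lambda, n] = n - [0, \lambda)$, we have
\[
\lambda \, D_{(n-\lambda,n]}(\langle v \rangle) \le \sum_{\Lambda \in J_\lambda} |\Lambda| \, D_{n-\Lambda}(\langle v \rangle), \tag{87}
\]
so that by Cauchy-Schwarz, we get
\[
\lambda^2 \big[ D_{(n-\lambda,n]}(\langle v \rangle) \big]^2 \le (\#J_\lambda) \sum_{\Lambda \in J_\lambda} |\Lambda|^2 \big[ D_{n-\Lambda}(\langle v \rangle) \big]^2 \le m \, \Psi_m(x), \tag{88}
\]
where we define $\Psi_m(x)$ to be the function
\[
\Psi_m(x) := \sum_{j=0}^{m-1} \sum_{\Lambda \in \Lambda^j_m} |\Lambda|^2 \big[ D_{n-\Lambda}(\langle v \rangle) \big]^2, \qquad m \ge 1. \tag{89}
\]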
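Now, note that Lemma V.2(a) implies
\[
\int_{-1/2}^{1/2} \Psi_m(x) \, dx = \sum_{j=0}^{m-1} \sum_{\Lambda \in \Lambda^j_m} |\Lambda|^2 \int_{-1/2}^{1/2} \big[ D_{n-\Lambda}(\langle v \rangle) \big]^2 dx
\le C_1 \sum_{j=0}^{m-1} \sum_{\Lambda \in \Lambda^j_m} |\Lambda| \log^4 |\Lambda|
\le C_2 \, 2^m m^5. \tag{90}
\]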
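Therefore, for an arbitrary positive number $\delta > 0$, we obtain
\[
\sum_{m=1}^{\infty} \int_{-1/2}^{1/2} \frac{\Psi_m(x)}{2^m m^{6+\delta}} \, dx < \infty. \tag{91}
\]
Now, as we show next, a standard Borel-Cantelli argument yields the bound
\[
\Psi_m(x) \le C(x) \, 2^m m^{6+\delta}, \quad \text{for all } m, \text{ and almost every } x. \tag{92}
\]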
To see this, let $E_m := \{ x \in [-\frac{1}{2}, \frac{1}{2}] : \Psi_m(x) \ge 2^m m^{6+\delta} \}$ with measure $|E_m|$. Since we have
\[
|E_m| = \int_{E_m} 1 \, dx \le \int_{-1/2}^{1/2} \frac{\Psi_m(x)}{2^m m^{6+\delta}} \, dx, \tag{93}
\]
it follows that $\sum_m |E_m| < \infty$. Hence the set
\[
\bigcap_{l=1}^{\infty} \bigcup_{m \ge l} E_m = \big\{ x \in [-\tfrac{1}{2}, \tfrac{1}{2}] : \Psi_m(x) \ge 2^m m^{6+\delta} \text{ for infinitely many } m \big\}
\]
has measure zero. This means that for almost every $x \in [-\frac{1}{2}, \frac{1}{2}]$, one has $\Psi_m(x) \le 2^m m^{6+\delta}$ for all but finitely many $m$. For each $x$, we remove this finite set of unwanted values of $m$ by multiplying the upper bound by a suitable constant $C(x)$. This proves (92). (One can extend this argument to show that $\Psi_m(x) = o(2^m m^{6+\delta})$ almost everywhere; see [5, p. 154].)
Now, for each $\lambda > 0$, there exists a unique $m$ such that $2^{m-1} \le \lambda < 2^m$. Then (88), together with $\Psi_m(x) \le C(x) 2^m m^{6+\delta}$ almost everywhere, implies that
\[
\big[ D_{(n-\lambda,n]}(\langle v \rangle) \big]^2 \le C(x, n) \, \lambda^{-1} \log^{7+\delta} \lambda, \quad \text{a.e. } x, \tag{94}
\]
where we have also restored the possible dependence of the constant $C(x)$ on $n$ which was fixed at the beginning of the proof.
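The arithmetic behind the last step can be spelled out as follows. This is a sketch under the additional assumption $\lambda \ge 2$ (so that $\log \lambda$ is bounded away from zero), with constants that are not optimized; $C'(x)$ is a name introduced here only for the intermediate constant.

```latex
\begin{align*}
\lambda^2 \big[ D_{(n-\lambda,n]}(\langle v \rangle) \big]^2
  &\le m \, \Psi_m(x) \le C(x) \, 2^m m^{7+\delta}
  && \text{by (88) and (92)} \\
  &\le 2 \, C(x) \, \lambda \, (1 + \log_2 \lambda)^{7+\delta}
  && \text{since } 2^{m-1} \le \lambda \text{ gives } 2^m \le 2\lambda,\ m \le 1 + \log_2 \lambda \\
  &\le C'(x) \, \lambda \, \log^{7+\delta} \lambda
  && \text{for } \lambda \ge 2 .
\end{align*}
```

Dividing both sides by $\lambda^2$ gives the form stated in (94).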
(c) These inequalities are proved exactly in the same manner as in (a) and (b), however
using the one dimensional Erdos-Turan inequality (Theorem A.3) instead.
VI. Discussion and further remarks
What has fundamentally enabled our analysis of the Σ∆ modulators in this paper is
the tiling property of the invariant sets of the associated dynamical systems. The tiling
property allowed us to find an explicit expression of the error signal for constant inputs.
In this paper, we have concentrated on upper bounds for the instantaneous error of the
modulator in two cases: in the mean and almost surely, when the constant input comes
from a uniform distribution. In both cases, we have derived bounds in the form of $\lambda^{-4.5}$
(modulo logarithmic factors) under the general regularity conditions of Proposition IV.1.
Apart from the L2 case, what kept us from achieving the experimentally observed generic
decay rate $\lambda^{-5}$ was the lack of a more customized discrepancy estimate than what is
implied by Theorem A.4. It would be interesting to improve this machinery and further
close this gap.
The constants appearing in the error bounds that we have derived in this paper are unfortunately only implicit. While it is very desirable for practical implementations to know
explicit (and perhaps tight) constants, at this stage we do not know if the functional
forms of these error bounds reflect the accurate order of magnitude of the norms we have
considered. Therefore, we have not focused on the values of constants in this paper.
It turns out [24] that some of these problems are eliminated if the time-averaged square
error measure is used instead; it then becomes possible via the tools of ergodic theory to
extract a more refined form of the error decay rate in λ. This constitutes a generalization
of the work in [7], [12] to a much more general set-up of Σ∆ quantization schemes.
We note that the analysis of this paper can be straightforwardly generalized to higher
order Σ∆ modulators with constant input once the tiling property (with single invariant
tiles) is established and these tiles satisfy the properties listed in Proposition IV.1 (in
fact, it is possible to relax the regularity conditions stated in there via the weaker general
conditions of Theorem A.4). We leave the details of this generalized analysis to the reader.
In parallel, a substantial topic of investigation is a better understanding of the tiling
phenomenon, and in particular, how the constant input theory can be generalized to time-
varying inputs. This is not easy, however, since there is yet no scheme apart from L2 in
which the invariant sets Γx do not vary with x. Understanding this dependence will prove
to be crucial in improving the error estimates for second and higher-order Σ∆ modulators
for time-varying inputs.
Appendix
A. Tools from the theory of uniform distribution
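Let $\{w_n\}_{n=1}^{\infty}$ be a sequence of points in $[0, 1)$ identified with the 1-torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. The sequence $\{w_n\}$ is said to be uniformly distributed (in short, u.d.) if
\[
\lim_{N \to \infty} \frac{\#\{1 \le n \le N : w_n \in I\}}{N} = |I| \tag{A1}
\]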
for every arc $I$ in $\mathbb{T}$. Define the $N$-term discrepancy of the sequence $\{w_n\}$ as
\[
D_N := D_N(w) := \sup_{I \in \mathcal{I}} \left| \frac{\#\{1 \le n \le N : w_n \in I\}}{N} - |I| \right|, \tag{A2}
\]
where $\mathcal{I}$ denotes the set of all intervals in $[0, 1)$ considered as the 1-torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. It is an elementary result that $\{w_n\}$ is u.d. if and only if $D_N(w) \to 0$ as $N \to \infty$. Equivalent characterizations of uniform distribution are given by Weyl's criterion:
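Theorem A.1 (Weyl)
\begin{align}
\{w_n\} \text{ is u.d.} &\iff \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k w_n} \to 0 \text{ for each nonzero } k \in \mathbb{Z}, \tag{A3} \\
&\iff \frac{1}{N} \sum_{n=1}^{N} f(w_n) \to \int_{\mathbb{T}} f(w) \, dw \text{ for every Riemann-integrable (or, equivalently, continuous) } f \text{ on } \mathbb{T}. \tag{A4}
\end{align}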
These are “qualitative” statements. The relation between how well a sequence is distributed and how fast (A3) and (A4) converge is studied in the “quantitative” theory.
The second Weyl criterion is especially relevant to numerical integration. Fundamental
quantitative measures in the theory are the following:
Theorem A.2 (Koksma's inequality [16]) Given any function $f : [0, 1] \to \mathbb{R}$ that is of bounded variation and a finite sequence of points $w_1, \ldots, w_N$ in $[0, 1)$,
\[
\Big| \frac{1}{N} \sum_{n=1}^{N} f(w_n) - \int_0^1 f(w) \, dw \Big| \le \mathrm{Var}(f) \, D_N(w), \tag{A5}
\]
where $\mathrm{Var}(f)$ denotes the total variation of $f$.
Theorem A.3 (Erdos-Turan inequality [16])
\[
D_N(w) \le C \inf_{K \ge 1} \Big( \frac{1}{K} + \sum_{k=1}^{K} \frac{1}{k} \Big| \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k w_n} \Big| \Big) \tag{A6}
\]
for some absolute constant $C$.
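To make the one-dimensional definitions concrete, here is a small Python sketch that computes the discrepancy (A2) of a finite point set and checks Koksma's inequality (A5) on an example. It relies on the standard closed form of the discrepancy for sorted points, stated in the code comment; the test sequence of fractional parts of $n\sqrt{2}$, the test function $f(w) = w^2$ and all function names are illustrative assumptions, not choices made in the paper.

```python
import math


def discrepancy(points):
    """Discrepancy (A2) of a finite list of points in [0, 1)."""
    xs = sorted(points)
    n = len(xs)
    # For sorted points x_1 <= ... <= x_N:
    #   D_N = 1/N + max_i (i/N - x_i) - min_i (i/N - x_i),  i = 1..N.
    offsets = [(i + 1) / n - x for i, x in enumerate(xs)]
    return 1.0 / n + max(offsets) - min(offsets)


def sample_mean_error(f, exact_integral, points):
    """Left-hand side of Koksma's inequality (A5)."""
    return abs(sum(f(w) for w in points) / len(points) - exact_integral)


def kronecker_points(alpha, n_points):
    """Fractional parts of n * alpha for n = 1..n_points (illustrative sequence)."""
    return [(n * alpha) % 1.0 for n in range(1, n_points + 1)]


points = kronecker_points(math.sqrt(2), 1000)
d_n = discrepancy(points)
# f(w) = w^2 is monotone on [0, 1], so Var(f) = f(1) - f(0) = 1.
err = sample_mean_error(lambda w: w * w, 1.0 / 3.0, points)
print(f"D_N = {d_n:.5f}, integration error = {err:.5f}")
```

With $\mathrm{Var}(f) = 1$, inequality (A5) predicts that the printed integration error does not exceed the printed discrepancy.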
The theory of uniform distribution generalizes naturally to higher dimensions, however with some added complexity. Let $\{w_n\}$ be a sequence in $[0, 1)^d$ identified with $\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$. For a measurable subset $H$ of $[0, 1)^d$, define
\[
D_N(w; H) := \left| \frac{\#\{1 \le n \le N : w_n \in H\}}{N} - |H| \right|, \tag{A7}
\]
where $|H|$ denotes the $d$-dimensional Lebesgue measure of $H$. Let $\mathcal{I}^d$ denote the set of all intervals (i.e., the set of all rectangles whose sides are parallel to the axes) in $\mathbb{T}^d$. The discrepancy $D_N$ is by definition
\[
D_N(w) = \sup_{H \in \mathcal{I}^d} D_N(w; H). \tag{A8}
\]
The sequence $\{w_n\}$ is said to be u.d. if the condition $\lim_{N \to \infty} D_N(w; H) = 0$ holds for every $H \in \mathcal{I}^d$. Again, this is equivalent to $\lim_{N \to \infty} D_N = 0$. Weyl's criterion naturally extends using multidimensional versions of (A3) and (A4).
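If a finite index set $\Lambda$ replaces the set of indices $\{1, \ldots, N\}$, then we shall use the notation $D_\Lambda(w)$ to denote the discrepancy of the points $w_n$, $n \in \Lambda$, i.e.
\[
D_\Lambda := D_\Lambda(w) := \sup_{I \in \mathcal{I}} \left| \frac{\#\{n \in \Lambda : w_n \in I\}}{\#\Lambda} - |I| \right|. \tag{A9}
\]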
A definition of discrepancy exists also for arbitrary non-negative Borel measures $\mu$ on $[0, 1)^d$. The discrepancy of $\mu$ with respect to the set $H \subset [0, 1)^d$, denoted by $D(\mu; H)$, is defined to be $|\mu(H) - |H||$. Similarly, one has the definition
\[
D(\mu) := \sup_{H \in \mathcal{I}^d} D(\mu; H) \tag{A10}
\]
for the discrepancy of $\mu$. By definition, $D_N(w; H) = D(\mu_N; H)$, where the measure $\mu_N$ is defined by $\mu_N(A) := \frac{1}{N} \sum_{n=1}^{N} \chi_A(w_n)$ for $A \subset \mathbb{T}^d$.
If the supremum in (A10) is taken instead over all convex subsets of $\mathbb{T}^d$, then this quantity defines the isotropic discrepancy $J(\mu)$. Clearly, one has $D(\mu) \le J(\mu)$; on the other hand, an inequality in the reverse direction exists only in a weaker sense: $J(\mu) \le C_d \, D(\mu)^{1/d}$, where $C_d$ is a constant that depends only on the dimension $d$. The following theorem [18, p. 173] (see also [17]) gives a discrepancy estimate for sets in the larger family of Jordan-measurable sets. Let $\mathcal{M}_b$ denote the class of sets defined in Section IV-D.
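Theorem A.4 (Niederreiter, Wills) Let $b : (0, \infty) \to (0, \infty)$ be monotonically increasing such that $b(\varepsilon) \ge \varepsilon$ for all $\varepsilon > 0$, and $\lim_{\varepsilon \to 0^+} b(\varepsilon) = 0$. Then, for every $H \in \mathcal{M}_b$, one has
\[
D(\mu; H) \le 4 \, b\big( 2\sqrt{d} \, D(\mu)^{1/d} \big). \tag{A11}
\]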
A multidimensional version of Koksma’s inequality (called the Koksma-Hlawka inequal-
ity) holds for functions of bounded variation in the sense of Hardy and Krause. We will
not go into the details but refer to [16], [5] only. On the other hand, a generalization of
the Erdos-Turan inequality is simpler to state and is given by the following:
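Theorem A.5 (Erdos-Turan-Koksma inequality [16])
\[
D_N(w) \le C_d \inf_{K \ge 1} \Big( \frac{1}{K} + \sum_{0 < \|k\|_\infty \le K} \frac{1}{r(k)} \Big| \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k \cdot w_n} \Big| \Big) \tag{A12}
\]
for some absolute constant $C_d$, where $r(k) := \prod_{i=1}^{d} \max\{1, |k_i|\}$ for $k = (k_1, \ldots, k_d) \in \mathbb{Z}^d$.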
B. Invariant set Γx for the L1 system
In this section, we show by inspection that the set $\Gamma_x$ described in Section IV-B is invariant under $M_x$. We will not, however, describe the process by which the invariant set can be found initially, as our current method is limited to practical inspection.
Consider an integer $k \ge 1$ and $x \in [\alpha_{k+1}, \alpha_k)$. Consider then the two points $Q_T$ and $P_T$ of the thresholding line $u_1 + \frac{1}{2} u_2 = 0$, defined in the first part of Table II. Let us define the following points
\begin{align}
Q_0 &:= M^1_x(Q_T) = M_x(Q_T), & Q'_0 &:= M^0_x(Q_T) \ne M_x(Q_T), \nonumber \\
P_0 &:= M^1_x(P_T) = M_x(P_T), & P'_0 &:= M^0_x(P_T) \ne M_x(P_T). \tag{B1}
\end{align}
Then, for $j \ge 0$, let us recursively define
\begin{align}
Q_{j+1} &:= M_x(Q_j), & Q'_{j+1} &:= M_x(Q'_j), \nonumber \\
P_{j+1} &:= M_x(P_j), & P'_{j+1} &:= M_x(P'_j). \tag{B2}
\end{align}
Thanks to (B1), (B2) and the definition of $M_x$ in (25), one can easily establish the second part of Table II. To derive the third part of the table, one first has to note the following properties. Because of (24), it is trivial that $M^0_x(P) - M^1_x(P) = (1, 1)$ for any point $P$. It then results from (B1) that $Q'_0 - Q_0 = (1, 1)$ and $P'_0 - P_0 = (1, 1)$. Next, if two points $P$ and $P'$ are such that $P' - P \in \mathbb{Z}^2$, it is easy to see from (25) and (24) that $M_x(P') - M_x(P) \in \mathbb{Z}^2$. Thus, one recursively obtains from (B2) that $Q'_j - Q_j \in \mathbb{Z}^2$ and $P'_j - P_j \in \mathbb{Z}^2$. We derive these integer difference vectors for a certain number of indices $j$ and show the results in the third part of the table.
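The second part of Table II can be checked mechanically with exact rational arithmetic, as in the Python sketch below. It rests on an assumption about (24) that is not restated in this appendix: the two branches are taken to be $M^0_x(u_1, u_2) = (u_1 + x + \frac12,\ u_2 + u_1 + x + \frac12)$ and $M^1_x(u_1, u_2) = (u_1 + x - \frac12,\ u_2 + u_1 + x - \frac12)$, which is consistent with $M^0_x(P) - M^1_x(P) = (1, 1)$. The branch applied to each point follows the Domain column of the table, so the check is purely algebraic and does not test whether $x$ lies in $[\alpha_{k+1}, \alpha_k)$. All function names are hypothetical.

```python
from fractions import Fraction as F

HALF = F(1, 2)


def m0(p, x):
    """Assumed branch of M_x on Omega^0_x (see the lead-in)."""
    u1 = p[0] + x + HALF
    return (u1, p[1] + u1)


def m1(p, x):
    """Assumed branch of M_x on Omega^1_x (see the lead-in)."""
    u1 = p[0] + x - HALF
    return (u1, p[1] + u1)


def table_points(k, x):
    """Closed-form coordinates from the first two parts of Table II."""
    e = F(1, 4 * (2 * k - 1))
    pts = {
        "QT": (-F(1, 4) + e + x * (k - 3), HALF - 2 * e - 2 * x * (k - 3)),
        "PT": (-F(3, 2) * x + F(1, 4), 3 * x - HALF),
        "Q0": (-F(3, 4) + e + x * (k - 2), -F(1, 4) - e - x * (k - 4)),
        "Q1": (-F(1, 4) + e + x * (k - 1), 3 * x - HALF),
        "Q2": (F(1, 4) + e + x * k, -F(1, 4) + e + x * (k + 3)),
        "Q3": (-F(1, 4) + e + x * (k + 1), -HALF + 2 * e + 2 * x * (k + 2)),
    }
    for j in range(0, k + 1):
        pts[f"P{2 * j}"] = (-F(1, 4) + x * (2 * j - HALF),
                            -F(3, 4) + x * (2 * j * j + F(5, 2)))
    for j in range(1, k + 1):
        pts[f"P{2 * j - 1}"] = (F(1, 4) + x * (2 * j - F(3, 2)),
                                -HALF + x * (2 * j * j - 2 * j + 3))
    return pts


def check_table(k, x):
    """True if the tabulated points satisfy (B1) and (B2) under the assumed maps."""
    t = table_points(k, x)
    ok = m1(t["QT"], x) == t["Q0"] and m1(t["PT"], x) == t["P0"]      # (B1)
    ok &= m0(t["Q0"], x) == t["Q1"] and m0(t["Q1"], x) == t["Q2"]     # (B2)
    ok &= m1(t["Q2"], x) == t["Q3"]
    for i in range(2 * k):
        step = m0 if i % 2 == 0 else m1   # even-indexed P_i are listed in Omega^0_x
        ok &= step(t[f"P{i}"], x) == t[f"P{i + 1}"]
    return ok


if __name__ == "__main__":
    print(check_table(3, F(1, 10)))
```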
Let us denote by $R_1 R_2 \cdots R_n$ the set enclosed by the polygon with vertices $R_1, R_2, \ldots, R_n$, and by $[R_1, R_2, \ldots, R_n]$ the union of the segments $[R_{i-1}, R_i]$ for $i = 2, \ldots, n$. We define
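TABLE II
Characterization of the vertices of the invariant set of system L1.
\begin{tabular}{lllll}
Point & Domain & Abscissa & Ordinate & Range \\
\hline
$Q_T$ & $\Omega^1_x \cap \Omega^0_x$ & $-\frac{1}{4} + \frac{1}{4(2k-1)} + x(k-3)$ & $\frac{1}{2} - \frac{1}{2(2k-1)} - 2x(k-3)$ & \\
$P_T$ & $\Omega^1_x \cap \Omega^0_x$ & $-\frac{3}{2}x + \frac{1}{4}$ & $3x - \frac{1}{2}$ & \\
\hline
$Q_0$ & $\Omega^0_x$ & $-\frac{3}{4} + \frac{1}{4(2k-1)} + x(k-2)$ & $-\frac{1}{4} - \frac{1}{4(2k-1)} - x(k-4)$ & \\
$Q_1$ & $\Omega^0_x$ & $-\frac{1}{4} + \frac{1}{4(2k-1)} + x(k-1)$ & $3x - \frac{1}{2}$ & \\
$Q_2$ & $\Omega^1_x$ & $\frac{1}{4} + \frac{1}{4(2k-1)} + xk$ & $-\frac{1}{4} + \frac{1}{4(2k-1)} + x(k+3)$ & \\
$Q_3$ & & $-\frac{1}{4} + \frac{1}{4(2k-1)} + x(k+1)$ & $-\frac{1}{2} + \frac{1}{2(2k-1)} + 2x(k+2)$ & \\
$P_{2j}$ & $\Omega^0_x$ & $-\frac{1}{4} + x(2j - \frac{1}{2})$ & $-\frac{3}{4} + x(2j^2 + \frac{5}{2})$ & $j = 0, \ldots, k$ \\
$P_{2j-1}$ & $\Omega^1_x$ & $\frac{1}{4} + x(2j - \frac{3}{2})$ & $-\frac{1}{2} + x(2j^2 - 2j + 3)$ & $j = 1, \ldots, k$ \\
\hline
$Q'_0$ & $\Omega^1_x$ & \multicolumn{2}{l}{$Q_0 + (1, 1)$} & \\
$Q'_1$ & $\Omega^1_x$ & \multicolumn{2}{l}{$Q_1 + (0, 1)$} & \\
$Q'_2$ & $\Omega^0_x$ & \multicolumn{2}{l}{$Q_2 - (1, 0)$} & \\
$Q'_3$ & & \multicolumn{2}{l}{$Q_3$} & \\
$P'_0$ & $\Omega^0_x$ & \multicolumn{2}{l}{$P_0 - (1, 1)$} & \\
$P'_{2j}$ & $\Omega^0_x$ & \multicolumn{2}{l}{$P_{2j} - (0, 1)$} & $j = 0, \ldots, k-1$ \\
$P'_{2j-1}$ & $\Omega^1_x$ & \multicolumn{2}{l}{$P_{2j-1} - (0, 1)$} & $j = 1, \ldots, k-1$ \\
$P'_{2k-1}$ & $\Omega^0_x$ & \multicolumn{2}{l}{$P_{2k-1} - (1, 0)$} & \\
$P'_{2k}$ & & \multicolumn{2}{l}{$P_{2k}$} & \\
\end{tabular}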
the set
\[
\Gamma_x := (S^0_x - B^0_x) \cup (S^1_x - B^1_x),
\]
where
\begin{align*}
S^0_x &:= Q_T P'_{2k-4} \cdots P'_4 P'_2 Q'_2 P'_{2k-1} Q_0 P_0 P_2 P_4 \cdots P_{2k-2} Q_1 P_T, \\
S^1_x &:= Q_T P'_{2k-2} Q'_1 P'_1 P'_3 \cdots P'_{2k-3} Q'_0 P'_0 Q_2 P_{2k-1} \cdots P_3 P_1 P_T, \\
B^0_x &:= [Q_0, P'_{2k-1}, Q'_2, P'_2, P'_4, \cdots, P'_{2k-4}, Q_T, P_T], \\
B^1_x &:= [Q_T, P'_{2k-2}, Q'_1, P'_1, P'_3, \cdots, P'_{2k-3}, Q'_0, P'_0].
\end{align*}
The above sets can be recognized in Figure 8(a). They are also highlighted in Figure 10(a) where $S^0_x$ and $S^1_x$ are represented by shaded areas, and $B^0_x$ and $B^1_x$ are represented by a dashed line and a mixed line, respectively. The set $\Gamma_x$ is basically formed by removing from $S^0_x \cup S^1_x$ the upper boundary shown in mixed line in Figure 8(a). Note from the definition of $B^0_x$ that we are also removing the inner segment $[Q_T, P_T]$ from $S^0_x$. However, this inner segment still remains in $\Gamma_x$ because it is part of $S^1_x$. Now, note that $S^0_x - B^0_x \subset \Omega^0_x$ and $S^1_x - B^1_x \subset \Omega^1_x$. Therefore
\begin{align}
M_x(\Gamma_x) &= M^0_x(S^0_x - B^0_x) \cup M^1_x(S^1_x - B^1_x) \nonumber \\
&= \big( M^0_x(S^0_x) - M^0_x(B^0_x) \big) \cup \big( M^1_x(S^1_x) - M^1_x(B^1_x) \big). \tag{B3}
\end{align}
In the last equality, we have used the fact that $M^0_x$ and $M^1_x$ are injective. Let us derive $M^0_x(S^0_x)$. Since $M^0_x$ is affine, $M^0_x(S^0_x)$ is simply the polygonal set whose vertices are obtained by transforming those of $S^0_x$ through $M^0_x$. Now, except for $Q_T$ and $P_T$, all the vertices of $S^0_x$ belong to $\Omega^0_x$. Their images by $M^0_x$ and by $M_x$ are therefore the same. Their images through $M_x$ are then trivially obtained from (B2). Meanwhile, the explicit transformation of $Q_T$ and $P_T$ through $M^0_x$ is obtained from (B1). By applying similar reasoning to $M^1_x(S^1_x)$, $M^0_x(B^0_x)$ and $M^1_x(B^1_x)$ we then find
\begin{align*}
M^0_x(S^0_x) &= Q'_0 P'_{2k-3} \cdots P'_5 P'_3 Q'_3 P'_{2k} Q_1 P_1 P_3 \cdots P_{2k-1} Q_2 P'_0, \\
M^1_x(S^1_x) &= Q_0 P'_{2k-1} Q'_2 P'_2 P'_4 \cdots P'_{2k-2} Q'_1 P'_1 Q_3 P_{2k} \cdots P_4 P_2 P_0, \\
M^0_x(B^0_x) &= [Q_1, P'_{2k}, Q'_3, P'_3, P'_5, \cdots, P'_{2k-3}, Q'_0, P'_0], \\
M^1_x(B^1_x) &= [Q_0, P'_{2k-1}, Q'_2, P'_2, P'_4, \cdots, P'_{2k-2}, Q'_1, P'_1].
\end{align*}
These sets can also be recognized in Figure 8(a) and are highlighted in Figure 10(b). By using (B3), one can see by inspection that $M_x(\Gamma_x) = \Gamma_x$.
Fig. 10. A schematic diagram of the action of $M_x$ on $\Gamma_x$: (a) before the mapping; (b) after the mapping.
C. Invariant set Γx for the Q1 system
We briefly describe here the construction principle of the function of (57). The basic idea is to find a change of coordinates such that in the new coordinate system the dynamical system becomes somewhat “simpler”. Denote by $\Phi_x$ the bijection defining the change of coordinates, and $\tilde{M}_x$ the transformation in the new coordinate system, given by
\[
\tilde{M}_x = \Phi_x M_x \Phi_x^{-1}.
\]
Let the pieces of $\tilde{M}_x$ on $\tilde{\Omega}^0_x := \Phi_x(\Omega^0_x)$ and $\tilde{\Omega}^1_x := \Phi_x(\Omega^1_x)$ be denoted by $\tilde{M}^0_x$ and $\tilde{M}^1_x$, respectively. It turns out that it is possible to find $\Phi_x$ which reduces one of $\tilde{M}^0_x$ or $\tilde{M}^1_x$ to a pure translation, while keeping the other one still affine. Assuming $x > 0$, it is actually interesting (and more intuitive) to reduce $\tilde{M}^1_x$ to a translation, since in this case the state variable $u[n]$ stays in $\Omega^1_x$ more frequently than $\Omega^0_x$. This can be realized by setting
\[
\Phi_x(u_1, u_2) := \Big( u_1, \; u_2 + \frac{1}{2a_x} \big( u_1 - \frac{a_x}{2} \big)^2 - c_x \Big), \tag{C1}
\]
where
\[
a_x := \frac{1}{2} - x > 0 \tag{C2}
\]
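and $c_x$ is an arbitrary constant that may depend on $x$. We denote $\Phi_x(u)$ also by $\tilde{u} := (\tilde{u}_1, \tilde{u}_2)$. Then $\tilde{M}^1_x$ is given by
\[
\tilde{M}^1_x \tilde{u} = \tilde{u} + (x - \tfrac{1}{2}) f,
\]
and $\tilde{M}^0_x$ by
\[
\tilde{M}^0_x \tilde{u} = A_x \tilde{u} + (x + \tfrac{1}{2}) g_x,
\]
where
\[
f = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
A_x = \begin{pmatrix} 1 & 0 \\ \frac{1}{a_x} & 1 \end{pmatrix}, \quad \text{and} \quad
g_x = \begin{pmatrix} 1 \\ \frac{1}{2a_x} \end{pmatrix}.
\]
Note that the description of $\tilde{M}_x$ is independent of the constant $c_x$ and that $\tilde{M}^1_x$ is simply the translation along the $\tilde{u}_1$-axis by the negative constant $-a_x$.
Fig. 11. The invariant set in the $(\tilde{u}_1, \tilde{u}_2)$ domain ($x = 0.24$). Panel (a) shows $\tilde{\Gamma}^0_x$ and $\tilde{\Gamma}^1_x$; panel (b) shows $\tilde{M}^0_x(\tilde{\Gamma}^0_x)$ and $\tilde{M}^1_x(\tilde{\Gamma}^1_x)$.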
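The claimed form of the conjugated maps can be verified with exact rational arithmetic. The Python sketch below composes $\Phi_x$, the two branches of $M_x$ and $\Phi_x^{-1}$ and compares the result with the affine expressions above. As before, the branch formulas `m0` and `m1` are an assumption about (24), which is not restated in this appendix, chosen to be consistent with $M^0_x(P) - M^1_x(P) = (1, 1)$; the function names are hypothetical.

```python
from fractions import Fraction as F

HALF = F(1, 2)


def phi(u, x, c):
    """Change of coordinates (C1)."""
    a = HALF - x
    return (u[0], u[1] + (u[0] - a / 2) ** 2 / (2 * a) - c)


def phi_inv(v, x, c):
    """Inverse of (C1)."""
    a = HALF - x
    return (v[0], v[1] - (v[0] - a / 2) ** 2 / (2 * a) + c)


def m0(u, x):
    """Assumed branch of M_x on Omega^0_x."""
    u1 = u[0] + x + HALF
    return (u1, u[1] + u1)


def m1(u, x):
    """Assumed branch of M_x on Omega^1_x."""
    u1 = u[0] + x - HALF
    return (u1, u[1] + u1)


def m0_tilde(v, x):
    """A_x v + (x + 1/2) g_x with A_x = [[1, 0], [1/a_x, 1]], g_x = (1, 1/(2 a_x))."""
    a, s = HALF - x, x + HALF
    return (v[0] + s, v[0] / a + v[1] + s / (2 * a))


def m1_tilde(v, x):
    """v + (x - 1/2) f with f = (1, 0)."""
    return (v[0] + x - HALF, v[1])


def conjugation_holds(v, x, c):
    """Check Phi M Phi^{-1} against the stated affine forms at the point v."""
    lhs0 = phi(m0(phi_inv(v, x, c), x), x, c)
    lhs1 = phi(m1(phi_inv(v, x, c), x), x, c)
    return lhs0 == m0_tilde(v, x) and lhs1 == m1_tilde(v, x)


if __name__ == "__main__":
    print(conjugation_holds((F(1, 3), F(2, 5)), F(6, 25), F(7, 9)))
```

Running the check with different values of `c` also illustrates the remark that the description of $\tilde{M}_x$ does not depend on $c_x$.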
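The final ingredient is the specification of the partition $\{\tilde{\Omega}^0_x, \tilde{\Omega}^1_x\}$, or equivalently the partition $\{\Omega^0_x, \Omega^1_x\}$. This is done with the help of eight characteristic points of the mappings $\tilde{M}^0_x$ and $\tilde{M}^1_x$, denoted by $\tilde{P}_1, \tilde{P}_2, \tilde{P}_3, \tilde{P}_4$ and $\tilde{Q}_1, \tilde{Q}_2, \tilde{Q}_3, \tilde{Q}_4$ (see Figure 11). These points are defined by
\[
\tilde{P}_1 = (\tfrac{1}{2} + \tfrac{1}{2} a_x, 1), \quad
\tilde{P}_2 = (-\tfrac{1}{2} - \tfrac{1}{2} a_x, 1), \quad
\tilde{P}_3 = (-\tfrac{1}{2} + \tfrac{1}{2} a_x, 0), \quad
\tilde{P}_4 = (\tfrac{1}{2} - \tfrac{1}{2} a_x, 0) \tag{C3}
\]
and
\[
\tilde{Q}_1 = \tilde{P}_1 - (a_x, 0), \quad
\tilde{Q}_2 = \tilde{P}_2 + (a_x, 0), \quad
\tilde{Q}_3 = \tilde{P}_3 + (a_x, 0), \quad
\tilde{Q}_4 = \tilde{P}_4 - (a_x, 0). \tag{C4}
\]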
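Consider the set $\tilde{\Gamma}_x := \tilde{\Gamma}^0_x \cup \tilde{\Gamma}^1_x$ where
\[
\tilde{\Gamma}^0_x := \tilde{Q}_3 \tilde{Q}_2 \tilde{P}_2 \tilde{P}_3 - [\tilde{Q}_3, \tilde{Q}_2, \tilde{P}_2]
\quad \text{and} \quad
\tilde{\Gamma}^1_x := \tilde{P}_4 \tilde{P}_1 \tilde{Q}_2 \tilde{Q}_3 - [\tilde{P}_1, \tilde{Q}_2].
\]
In these definitions, we imply that $\tilde{\Gamma}^0_x$ is formed by taking the parallelogram $\tilde{Q}_3 \tilde{Q}_2 \tilde{P}_2 \tilde{P}_3$ without the two boundary segments $[\tilde{Q}_3, \tilde{Q}_2]$ and $[\tilde{Q}_2, \tilde{P}_2]$, and $\tilde{\Gamma}^1_x$ is formed in a similar manner. These two sets are illustrated in Figure 11(a). Note from the figure that $\tilde{\Gamma}_x$ is then simply the trapezoid $\tilde{P}_4 \tilde{P}_1 \tilde{P}_2 \tilde{P}_3$ from which the upper boundary segment $[\tilde{P}_1, \tilde{P}_2]$ has been removed. Now, one can easily check that
\[
\tilde{M}^0_x(\tilde{\Gamma}^0_x) = \tilde{P}_1 \tilde{Q}_1 \tilde{Q}_4 \tilde{P}_4 - [\tilde{P}_1, \tilde{Q}_1, \tilde{Q}_4]
\quad \text{and} \quad
\tilde{M}^1_x(\tilde{\Gamma}^1_x) = \tilde{Q}_4 \tilde{Q}_1 \tilde{P}_2 \tilde{P}_3 - [\tilde{Q}_1, \tilde{P}_2].
\]
These sets are illustrated in Figure 11(b). One can easily see that $\tilde{M}^0_x(\tilde{\Gamma}^0_x) \cup \tilde{M}^1_x(\tilde{\Gamma}^1_x) = \tilde{\Gamma}_x$.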
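The vertex bookkeeping behind these two image formulas can be reproduced with a short Python sketch that applies the affine maps stated earlier in this appendix to the points of (C3) and (C4), using exact fractions. The function names are hypothetical, and the sketch only checks vertices; it does not by itself prove the set identity.

```python
from fractions import Fraction as F

HALF = F(1, 2)


def vertices(x):
    """Points of (C3) and (C4) in the transformed coordinates."""
    a = HALF - x
    p = {1: (HALF + a / 2, F(1)), 2: (-HALF - a / 2, F(1)),
         3: (-HALF + a / 2, F(0)), 4: (HALF - a / 2, F(0))}
    q = {1: (p[1][0] - a, p[1][1]), 2: (p[2][0] + a, p[2][1]),
         3: (p[3][0] + a, p[3][1]), 4: (p[4][0] - a, p[4][1])}
    return p, q


def m0_tilde(v, x):
    """A_x v + (x + 1/2) g_x, as stated in the text."""
    a, s = HALF - x, x + HALF
    return (v[0] + s, v[0] / a + v[1] + s / (2 * a))


def m1_tilde(v, x):
    """Translation by (x - 1/2, 0) = (-a_x, 0)."""
    return (v[0] + x - HALF, v[1])


def images_match(x):
    """Compare vertex images with the polygons listed for the two image sets."""
    p, q = vertices(x)
    img0 = [m0_tilde(v, x) for v in (q[3], q[2], p[2], p[3])]
    img1 = [m1_tilde(v, x) for v in (p[4], p[1], q[2], q[3])]
    return img0 == [p[1], q[1], q[4], p[4]] and img1 == [q[4], q[1], p[2], p[3]]


if __name__ == "__main__":
    print(images_match(F(6, 25)))   # x = 0.24, the value used in Fig. 11
```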
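If we choose the straight line passing through $\tilde{Q}_2$ and $\tilde{Q}_3$ as the boundary between $\tilde{\Omega}^0_x$ and $\tilde{\Omega}^1_x$, i.e.,
\[
\tilde{\Omega}^1_x = \big\{ (\tilde{u}_1, \tilde{u}_2) : \tilde{u}_1 + a_x (\tilde{u}_2 - \tfrac{3}{2}) + \tfrac{1}{2} \ge 0 \big\},
\]
then we ensure that $\tilde{\Gamma}^0_x \subset \tilde{\Omega}^0_x$ and $\tilde{\Gamma}^1_x \subset \tilde{\Omega}^1_x$. In this situation, we have $\tilde{M}_x(\tilde{\Gamma}_x) = \tilde{M}^0_x(\tilde{\Gamma}^0_x) \cup \tilde{M}^1_x(\tilde{\Gamma}^1_x) = \tilde{\Gamma}_x$. Back to the original space, the set $\Gamma_x := \Phi_x^{-1}(\tilde{\Gamma}_x)$ then satisfies $M_x(\Gamma_x) = \Gamma_x$. Because of the quadratic nature of $\Phi_x$, it is clear that $\Gamma_x$ has a boundary composed of four parabolic pieces. This is illustrated in Figure 9. Because the boundary segment $[\tilde{P}_1, \tilde{P}_2]$ is excluded from $\tilde{\Gamma}_x$, the boundary parabola passing through $P_1$ and $P_2$ is excluded from $\Gamma_x$.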
To find the expression for $T$ back in the original system of coordinates, we substitute the expressions for $\tilde{u}_1$, $\tilde{u}_2$ and $a_x$ from (C1) and (C2). Then the resulting function $T$, up to a scaling factor, is given by the expression (57) with
\[
C(x) = 8 (x - \tfrac{1}{2}) c_x - 3x^2 + 11x - \tfrac{7}{4}. \tag{C5}
\]
In this paper, the choice of $C(x)$ does not matter. However, when dealing with time-varying inputs, it is shown in [23] that it is interesting to choose $C(x)$ so that the centroid of $\Gamma_x$ is located at $(0, 0)$ regardless of $x$. In this situation, it is indeed numerically shown that the resulting modulator becomes superior in performance to the one-bit linear-T second order modulators. Since $\Gamma_x$ is here entirely known analytically, such a value of $C(x)$ is easy to derive. We show in Appendix E that this is achieved when $C(x) = (7 - 3x)x - \frac{1}{12}$ for $x \in [0, \frac{1}{2})$.
D. Proof of Proposition IV.1
Proof: Property 1 can be checked in a straightforward manner using the explicit parametric descriptions of the invariant sets given in the respective sections and in the Appendix. Property 2 is a consequence of the fact that each of the invariant sets possesses a boundary that is composed of a finite number of smooth curves, totaling a finite perimeter. Hence the $\varepsilon$-neighborhood of each $\partial\Gamma_x$ covers an area that decreases as $O(\varepsilon)$ as $\varepsilon \to 0$. The uniformity of the constants $M_0$ and $C_0$ is guaranteed by the choice of the intervals $I(D)$. In particular, these intervals can be chosen to be $[-\frac{1}{2}, \frac{1}{2}]$, $[-\frac{1}{6}, \frac{1}{6}]$ and an arbitrary closed subinterval of $(-\frac{1}{2}, \frac{1}{2})$, respectively. Let us prove Property 3.
The L2 system: This was shown in Section IV-A.
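The Q1 system: The invariant set is $\Gamma_x = \Phi_x^{-1}(\tilde{P}_1 \tilde{P}_2 \tilde{P}_3 \tilde{P}_4)$ where the points $\tilde{P}_i$ are given in (C3) and $\Phi_x$ is given in (C1). From (C3), we derive the equations of the four linear boundaries of the trapezoid $\tilde{P}_1 \tilde{P}_2 \tilde{P}_3 \tilde{P}_4$ and obtain
\begin{align*}
(\tilde{P}_1 \tilde{P}_2) &: \tilde{u}_2 = 1, &
(\tilde{P}_2 \tilde{P}_3) &: \tilde{u}_2 = -\tfrac{1}{a_x} \tilde{u}_1 - \tfrac{1}{2a_x} + \tfrac{1}{2}, \\
(\tilde{P}_3 \tilde{P}_4) &: \tilde{u}_2 = 0, &
(\tilde{P}_1 \tilde{P}_4) &: \tilde{u}_2 = +\tfrac{1}{a_x} \tilde{u}_1 - \tfrac{1}{2a_x} + \tfrac{1}{2}.
\end{align*}
By writing $(\tilde{u}_1, \tilde{u}_2) = \Phi_x(u_1, u_2)$ and applying (C1), we derive the equations of the four parabolic boundaries of $\Phi_x^{-1}(\tilde{P}_1 \tilde{P}_2 \tilde{P}_3 \tilde{P}_4)$ and obtain
\begin{align*}
(P_1 P_2) &: u_2 = -\tfrac{1}{2a_x} (u_1 - \tfrac{a_x}{2})^2 + c_x + 1, \\
(P_2 P_3) &: u_2 = -\tfrac{1}{2a_x} (u_1 - \tfrac{a_x}{2})^2 + c_x - \tfrac{1}{a_x} u_1 - \tfrac{1}{2a_x} + \tfrac{1}{2}, \\
(P_3 P_4) &: u_2 = -\tfrac{1}{2a_x} (u_1 - \tfrac{a_x}{2})^2 + c_x, \\
(P_1 P_4) &: u_2 = -\tfrac{1}{2a_x} (u_1 - \tfrac{a_x}{2})^2 + c_x + \tfrac{1}{a_x} u_1 - \tfrac{1}{2a_x} + \tfrac{1}{2}.
\end{align*}
One can then easily check that the four above parabolas satisfy the following relations:
\begin{align}
(P_1 P_4) &= (P_1 P_2) + (1, 0), \nonumber \\
(P_3 P_4) &= (P_1 P_2) - (0, 1), \tag{D1} \\
(P_2 P_3) &= (P_1 P_2) - (1, 1). \nonumber
\end{align}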
This can be also graphically seen in Figure 9. This is sufficient to prove Property 3 for the
Q1 system. A graphical representation of the tiling property is shown in the same figure.
The L1 system: Proving the tiling property of the invariant set $\Gamma_x$ described in Section IV-B is a tedious process. Here, we will only point out boundary relations similar to (D1). Given $n$ vertices $R_1, R_2, \cdots, R_n$, let us use the notation $[R_1, R_2, \cdots, R_n]$ to designate the union of the segments $[R_{i-1}, R_i]$ for $i = 2, \cdots, n$. By using Table II, one can see that
\begin{align*}
[P_{2k-1}, Q_2] &= [P'_{2k-1}, Q'_2] + (1, 0), \\
[P_2, P_4, \cdots, P_{2k-2}, Q_1, P_1, P_3, \cdots, P_{2k-3}] &= [P'_2, P'_4, \cdots, P'_{2k-2}, Q'_1, P'_1, P'_3, \cdots, P'_{2k-3}] - (0, 1), \\
[Q_0, P_0] &= [Q'_0, P'_0] - (1, 1).
\end{align*}
We illustrate these three relations by the three arrows in Figure 8(b).
E. On the analysis of the quadratic scheme: zero-centroid setting of C(x)
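Let us call $G_x = (u_{1,x}, u_{2,x})$ the centroid point of $\Gamma_x$ and write $u = (u_1, u_2)$ in general. We have
\[
G_x = \int_{u \in \Gamma_x} u \, du = \int_{\tilde{u} \in \Phi(\Gamma_x)} \Phi^{-1}(\tilde{u}) \, d\tilde{u}.
\]
In the last equality, we have used the fact that $d\tilde{u} = du$ since the transformation $\Phi$ from (C1) conserves measure. We know from Section IV-C that $\Phi(\Gamma_x)$ is the trapezoid $\tilde{P}_1 \tilde{P}_2 \tilde{P}_3 \tilde{P}_4$. From (C2) and the explicit coordinates of its vertices given in (C3) and (C4), we have
\[
\Phi(\Gamma_x) = \big\{ (\tilde{u}_1, \tilde{u}_2) : 0 \le \tilde{u}_2 \le 1 \text{ and } |\tilde{u}_1| \le \tfrac{1}{2} + a_x (\tilde{u}_2 - \tfrac{1}{2}) \big\}.
\]
From (C1), one easily derives that
\[
\Phi^{-1}(\tilde{u}) = \Big( \tilde{u}_1, \; \tilde{u}_2 - \frac{1}{2a_x} \big( \tilde{u}_1 - \frac{a_x}{2} \big)^2 + c_x \Big).
\]
Consequently, the first component of $G_x$ is $u_{1,x} = \int_{\tilde{u} \in \Phi(\Gamma_x)} \tilde{u}_1 \, d\tilde{u}$. Since $\Phi(\Gamma_x)$ is clearly symmetrical with respect to the $\tilde{u}_2$-axis, we already have $u_{1,x} = 0$. The second component of $G_x$ is equal to
\[
u_{2,x} = \int_{\tilde{u}_2 = 0}^{1} \int_{\tilde{u}_1 = -\frac{1}{2} - a_x (\tilde{u}_2 - \frac{1}{2})}^{\frac{1}{2} + a_x (\tilde{u}_2 - \frac{1}{2})}
\Big( \tilde{u}_2 - \frac{1}{2a_x} \big( \tilde{u}_1 - \frac{a_x}{2} \big)^2 + c_x \Big) \, d\tilde{u}_1 \, d\tilde{u}_2
= \frac{1}{2} - \frac{1}{24 a_x} + c_x.
\]
The component $u_{2,x}$ will then be systematically equal to 0 by choosing $c_x = \frac{1}{24 a_x} - \frac{1}{2}$. Using (C2) and (C5), this implies that $C(x) = (7 - 3x)x - \frac{1}{12}$.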
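The value of the double integral and the resulting expression for $C(x)$ can be confirmed exactly with rational arithmetic. In the Python sketch below, the inner integral over $\tilde{u}_1$ is carried out in closed form (a step worked out for this illustration, not quoted from the paper), and the outer integral over $\tilde{u}_2$ is evaluated with Simpson's rule, which is exact here because the integrand is a cubic polynomial in $\tilde{u}_2$. Function names are hypothetical.

```python
from fractions import Fraction as F

HALF = F(1, 2)


def centroid_u2(x, c):
    """Second centroid component of Gamma_x for a given constant c_x = c."""
    a = HALF - x

    def inner(t):
        # Half-width of the trapezoid at height t, then the exact integral over
        # s in [-h, h] of  t + c - (s - a/2)^2 / (2a).
        h = HALF + a * (t - HALF)
        return 2 * h * (t + c) - (2 * h ** 3 / 3 + a * a * h / 2) / (2 * a)

    # Simpson's rule on [0, 1]; exact for polynomials of degree <= 3.
    return (inner(F(0)) + 4 * inner(HALF) + inner(F(1))) / 6


def c_of_x(x, c):
    """Right-hand side of (C5)."""
    return 8 * (x - HALF) * c - 3 * x * x + 11 * x - F(7, 4)


def zero_centroid_c(x):
    """The choice c_x = 1/(24 a_x) - 1/2."""
    return 1 / (24 * (HALF - x)) - HALF


if __name__ == "__main__":
    x = F(6, 25)
    c = zero_centroid_c(x)
    print("centroid with zero-centroid c_x:", centroid_u2(x, c))
    print("C(x) from (C5):", c_of_x(x, c), " closed form:", (7 - 3 * x) * x - F(1, 12))
```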
Acknowledgments
The authors would like to thank Ingrid Daubechies, Ron DeVore, Ozgur Yılmaz and Jade Vinson for interesting discussions on the topic of this paper, and the anonymous referees for their helpful comments and suggestions on the presentation. S.G. would like to thank the Institute for Advanced Study and the Courant Institute for their hospitality during the writing stage of this work.
References
[1] J. C. Candy and G. C. Temes, Eds., Oversampling Delta-Sigma Data Converters: Theory, Design and
Simulation, IEEE Press, 1992.
[2] W. Chou, T. H. Meng, and R. M. Gray, “Time Domain Analysis of Sigma Delta Modulation,” Proceedings
ICASSP-90, Int. Conf. on Acoustics, Speech and Signal Processing, vol. 3, pp. 1751–1754, Albuquerque, NM,
April 1990.
[3] W. Chou, P. W. Wong, and R. M. Gray, “Multistage sigma-delta modulation,” IEEE Trans. Inform. Theory,
vol. 35, pp. 784–796, July 1989.
[4] I. Daubechies and R. A. DeVore, “Reconstructing a bandlimited function from very coarsely quantized data:
A family of stable sigma-delta modulators of arbitrary order,” Ann. of Math., vol. 158, no. 2, pp. 643–674,
Sept. 2003.
[5] M. Drmota and R. F. Tichy, Sequences, Discrepancies and Applications, Springer, 1997.
[6] R. M. Gray, “Spectral analysis of quantization noise in a single-loop sigma-delta modulator with dc input,”
IEEE Transactions on Communications, vol. 37, pp. 588–599, June 1989.
[7] R. M. Gray, W. Chou, and P.-W. Wong, “Quantization noise in single-loop sigma-delta modulation with
sinusoidal input,” IEEE Transactions on Communications, vol. COM-37, pp. 956–968, Sept. 1989.
[8] C. S. Gunturk, “Improved error estimates for first order sigma-delta modulation,” in Proceedings of the
International Workshop on Sampling Theory and Applications, SampTA’99, August 1999, pp. 171–176, Loen,
Norway.
[9] C. S. Gunturk, Harmonic Analysis of Two Problems in Signal Quantization and Compression, Ph.D. thesis,
Princeton University, 2000.
[10] C. S. Gunturk, “Approximating a Bandlimited Function Using Very Coarsely Quantized Data: Improved
Error Estimates in Sigma-Delta Modulation”, J. Amer. Math. Soc., posted on August 1, 2003, PII S 0894-
0347(03)00436-3 (to appear in print).
[11] C. S. Gunturk, J. Lagarias, and V. Vaishampayan, “On the robustness of single loop sigma-delta modulation,”
IEEE Transactions on Information Theory, July 2001.
[12] N. He, F. Kuhlmann, and A. Buzo, “Double-loop sigma-delta modulation with dc input,” IEEE Trans.
Commun, vol. 38, pp. 487–495, April 1990.
[13] N. He, F. Kuhlmann, and A. Buzo, “Multi-loop sigma-delta quantization,” IEEE Trans. Inform. Theory,
vol. 38, pp. 1015–1028, May 1992.
[14] D. F. Hoschele, Jr., Analog-to-Digital and Digital-to-Analog Conversion Techniques, John Wiley & Sons,
1994.
[15] H. Inose and Y. Yasuda, “A unity bit coding method by negative feedback,” Proceedings of the IEEE, vol.
51, pp. 1524–1535, Nov 1963.
[16] L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences, Wiley, 1974.
[17] M. Laczkovich, “Discrepancy estimates for sets with small boundary,” Stud. Sci. Math. Hung., vol. 30, pp.
105–109, 1995.
[18] H. Niederreiter, “Application of diophantine approximations to numerical integration,” in Diophantine
Approximation and its Applications, C. F. Osgood, Ed., pp. 129–199. Academic Press, 1973.
[19] S. R. Norsworthy, R. Schreier, and G. C. Temes, Eds., Delta-Sigma Data Converters: Theory, Design and
Simulation, IEEE Press, 1996.
[20] W. Parry, Topics in Ergodic Theory, Cambridge University Press, 1981.
[21] R. Schreier, M. V. Goodson, and B. Zhang, “An algorithm for computing convex positively invariant sets for
delta-sigma modulators,” IEEE Trans. on Circuits and Systems, I, vol. 44, pp. 38–44, January 1997.
[22] N. T. Thao, “Vector Quantization Analysis of Σ∆ Modulation,” IEEE Transactions on Signal Processing,
vol.44, pp.808-817, Apr. 1996.
[23] N. T. Thao, “MSE behavior and centroid function of mth order asymptotic Σ∆ modulators,” IEEE Trans.
Circuits and Systems II, 2001, submitted.
[24] N. T. Thao and C. S. Gunturk, “Ergodic Dynamics in Σ∆ Quantization: Tiling Invariant Sets and Spectral
Analysis of Error,” arXiv:math.DS/0308223, http://arXiv.org/abs/math.DS/0308223.
[25] N. T. Thao and M. Vetterli, “Deterministic Analysis of Oversampled A/D Conversion and Decoding Im-
provement Based on Consistent Estimates,” IEEE Transactions on Signal Processing, vol.42, pp.519-531,
Mar. 1994.
[26] O. Yılmaz, “Stability analysis for several sigma-delta methods of coarse quantization of bandlimited func-
tions,” Constructive Approximation, 18 (2002), no. 4, 599–623.