Statistical inference for nonparametric GARCH models
Alexander Meister∗ Jens-Peter Krei߆
May 15, 2015

∗Institut für Mathematik, Universität Rostock, D-18051 Rostock, Germany, email: alexander.meister@uni-rostock.de
†Institut für Mathematische Stochastik, TU Braunschweig, D-38092 Braunschweig, Germany, email: [email protected]
Abstract
We consider extensions of the famous GARCH(1, 1) model where the recursive equa-
tion for the volatilities is not specified by a parametric link but by a smooth autoregression
function. Our goal is to estimate this function under nonparametric constraints when the
volatilities are observed with multiplicative innovation errors. We construct an estimation
procedure whose risk attains the usual convergence rates for bivariate nonparametric regres-
sion estimation. Furthermore, those rates are shown to be optimal in the minimax sense.
Numerical simulations are provided for a parametric submodel.
Keywords: autoregression; financial time series; inference for stochastic processes; minimax rates; non-
parametric regression.
AMS subject classification 2010: 62M10; 62G08.
1 Introduction
During the last decades GARCH time series have become a famous and widely studied model
for the analysis of financial data, e.g. for the investigation of stock market indices. Since the
development of the basic ARCH processes by Engle (1982) and their generalization to GARCH
time series in Bollerslev (1986) a considerable amount of literature deals with statistical infer-
ence – in particular parameter estimation – for those models. Early interest goes back to Lee
and Hansen (1994), who consider a quasi-maximum likelihood estimator for the GARCH(1, 1)
parameters, and to Lumsdaine (1996) who proves consistency and asymptotic normality of such
methods. In a related setting, Berkes et al. (2003) and Berkes and Horvath (2004) establish
The result (3.3) can be viewed as a pseudo estimation equation since the left hand side is accessible from the data while the right hand side contains the target curve m in its representation. Unfortunately, the right hand side also depends on the unobserved $X_{j-k+1}$. However, the following lemma shows that this dependence is of minor importance for large k.
Lemma 3.1 We assume Condition A and that $j \ge k > 1$. Let $f : [0,\infty)^2 \to [0,\infty)$ or $f : \mathbb{R}^2 \to \mathbb{R}$ be a continuously differentiable function with bounded partial derivative with respect to x. Fixing an arbitrary deterministic $x_0 \ge 0$, we have
$$\big|f^{[k]}(X_{j-k+1}, Y_{j-k+1}, \dots, Y_j) - f^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)\big| \;\le\; \|f_x\|_\infty^{k}\, |X_{j-k+1} - x_0|\,.$$
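The bound follows by iterating the Lipschitz property of $f$ in its first argument along the recursion behind the notation (3.1), i.e. $f^{[k]}(x, y_{j-k+1}, \dots, y_j) = f\big(f^{[k-1]}(x, y_{j-k+1}, \dots, y_{j-1}),\, y_j\big)$; a brief sketch of the induction, written out here for convenience:
$$\big|f^{[k]}(X_{j-k+1}, Y_{j-k+1}, \dots, Y_j) - f^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)\big| \;\le\; \|f_x\|_\infty\,\big|f^{[k-1]}(X_{j-k+1}, Y_{j-k+1}, \dots, Y_{j-1}) - f^{[k-1]}(x_0, Y_{j-k+1}, \dots, Y_{j-1})\big| \;\le\; \dots \;\le\; \|f_x\|_\infty^{k-1}\,\big|f(X_{j-k+1}, Y_{j-k+1}) - f(x_0, Y_{j-k+1})\big| \;\le\; \|f_x\|_\infty^{k}\,|X_{j-k+1} - x_0|\,.$$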
We recall Condition (A1), which guarantees that the assumptions on f in Lemma 3.1 are satisfied by m and which provides in addition that $\|m_x\|_\infty \le c_1 < 1$. Combining (3.2) and Lemma 3.1, we realize that $m^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)$ can be employed as a proxy of $X_{j+1}$. Concretely, we have
$$E\big|X_{j+1} - m^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)\big|^2 \;\le\; \|m_x\|_\infty^{2k}\, E|X_1 - x_0|^2\,, \qquad (3.4)$$
where the strict stationarity of the sequence $(X_n)_n$ has been used in the last step. Furthermore, as an immediate consequence of (3.2) and (3.4), we deduce that
Thus, suggesting $m^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)$ for an arbitrary fixed deterministic $x_0 \ge 0$ as a proxy of the right hand side of (3.3) seems reasonable. The left hand side of this equality represents the best least squares approximation of $Y_{j+1}$ among all measurable functions based on $Y_{j-k+1}, \dots, Y_j$.
That motivates us to consider that $g \in G$ which minimizes the (nonlinear) contrast functional
$$\Phi_n(g) \;:=\; \frac{1}{n-K-1}\sum_{j=K}^{n-2}\Big\{\big|Y_{j+1} - g^{[K]}(x_0, Y_{j-K+1}, \dots, Y_j)\big|^2 \;+\; \big|Y_{j+2} - g^{[K+1]}(x_0, Y_{j-K+1}, \dots, Y_{j+1})\big|^2\Big\}\,,$$
as the estimator $\hat m = \hat m(Y_1, \dots, Y_n)$ of m, where we fix some finite collection G of candidates for the true regression function m. Moreover, the parameter $K \le n-2$ remains to be chosen. We use a two-step adaptation of $g^{[j]}$ to $Y_{j+1}$, i.e. for $j = K$ and $j = K+1$, in order to reconstruct g from $g^{[j]}$; this will be made precise in Section 4. In particular, Lemma 4.1 will clarify the merit of this double least squares procedure.
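For illustration only, a minimal numerical sketch of this minimization; the anchor $x_0 = 0$ and the small linear candidate family G below are placeholders and not part of the paper's formal construction:

```python
import numpy as np

def iterate_g(g, x0, y_block):
    """Compute g^[k](x0, y_1, ..., y_k) via the recursion x <- g(x, y), cf. the notation (3.1)."""
    x = x0
    for y in y_block:
        x = g(x, y)
    return x

def contrast(g, Y, K, x0=0.0):
    """Empirical contrast Phi_n(g) with the two least-squares terms per index j."""
    n = len(Y)                                 # Y[0], ..., Y[n-1] stand for Y_1, ..., Y_n
    total = 0.0
    for j in range(K, n - 1):                  # the paper's j = K, ..., n-2
        pred1 = iterate_g(g, x0, Y[j - K:j])   # g^[K](x0, Y_{j-K+1}, ..., Y_j)
        pred2 = g(pred1, Y[j])                 # g^[K+1](x0, Y_{j-K+1}, ..., Y_{j+1})
        total += (Y[j] - pred1) ** 2 + (Y[j + 1] - pred2) ** 2
    return total / (n - K - 1)

def estimate(G, Y, K, x0=0.0):
    """Minimize Phi_n over the finite candidate set G."""
    return min(G, key=lambda g: contrast(g, Y, K, x0))

# illustrative candidate set: linear GARCH(1,1)-type links on a small parameter grid
G = [lambda x, y, a=a, b=b: 0.5 + a * x + b * y
     for a in np.linspace(0.1, 0.4, 4) for b in np.linspace(0.1, 0.3, 3)]
```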
3.2 Selection of the approximation set G
Now we focus on the problem of how to choose the approximation set G. In addition to Condition A, we impose
Condition B
The restriction of the true regression function m to the domain $I := [0, R''] \times [0, R'' R_\varepsilon]$ can be continued to a function on $[-1, R''+1] \times [-1, R'' R_\varepsilon + 1]$ which is $\lfloor\beta\rfloor$-fold continuously differentiable, and all partial derivatives of m with order $\le \lfloor\beta\rfloor$ are bounded by a uniform constant $c_M$ on the enlarged domain. Therein we write $R'' := \alpha_0''/(1-c_1)$ with $c_1$ as in Condition (A1). Moreover, for non-integer β, the Hölder condition
$$\Big|\frac{\partial^{\lfloor\beta\rfloor} m}{(\partial x)^{k}(\partial y)^{\lfloor\beta\rfloor - k}}(x_1, y_1) \;-\; \frac{\partial^{\lfloor\beta\rfloor} m}{(\partial x)^{k}(\partial y)^{\lfloor\beta\rfloor - k}}(x_2, y_2)\Big| \;\le\; c_M\,\big|(x_1, y_1) - (x_2, y_2)\big|^{\beta - \lfloor\beta\rfloor}\,,$$
is satisfied for all $k = 0, \dots, \lfloor\beta\rfloor$ and all $(x_1, y_1), (x_2, y_2)$ located in $[-1, R''+1] \times [-1, R'' R_\varepsilon + 1]$. We assume that β > 2.
Note that, by Condition A and Proposition 2.1(a), we have RX ≤ R′′ so that the smoothness
region of m contains the entire support of the distribution of (X1, Y1). Condition B represents
classical smoothness assumptions on m where β describes the smoothness level of the function
m. All admitted regression functions, i.e. those functions which satisfy all constraints imposed on m in Conditions A and B, are collected in the function class
$$\mathcal{M} \;=\; \mathcal{M}\big(c_1, \alpha_0', \alpha_1', \beta_1', \alpha_0'', \alpha_1'', \beta_1'', \beta, c_M\big)\,.$$
Contrarily, the distribution of the error $\varepsilon_1$ is assumed to be unknown but fixed. By $\|\cdot\|_2$ we denote the Hilbert space norm
$$\|m\|_2^2 \;:=\; \int m^2(x,y)\, dP_{X_1,Y_1}(x,y)\,,$$
for any measurable function $m : [0,\infty)^2 \to \mathbb{R}$ for which the above expression is finite, with respect to the stationary distribution $P_{X_1,Y_1}$ of $(X_1, Y_1)$ when m is the true regression function.
Now let us formulate our stipulations on the approximation space G which are required to show
the asymptotic results in Section 4.
Condition C
We assume that G consists of finitely many measurable mappings, either $g : [0,\infty)^2 \to [0,\infty)$ or $g : \mathbb{R}^2 \to \mathbb{R}$. In addition, we assume that each $g \in G$ is continuously differentiable and satisfies $\|g_x\|_\infty \le c_1$ with the constant $c_1$ as in Condition A, $\|g_y\|_\infty \le c_G$ as well as $\sup_{(x,y) \in [0,R''] \times [0,R'' R_\varepsilon]} |g(x,y)| \le c_G$ for some constant $c_G > c_M$. Furthermore, we impose that any $m \in \mathcal{M}$ is well-approximable in G; more specifically,
$$\sup_{m \in \mathcal{M}}\, \min_{g \in G}\, \|m - g\|_2^2 \;=\; O\big(n^{-\beta/(1+\beta)}\big)\,. \qquad (3.6)$$
On the other hand, the cardinality of G is restricted as follows,
$$\#G \;\le\; \exp\big\{c_N \cdot n^{1/(1+\beta)}\big\}\,,$$
for some finite constant $c_N$. Therein β and $R''$ are as in Condition B.
We provide
Lemma 3.2 For cG sufficiently large, there always exists some set G such that Condition C is
satisfied.
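For orientation, a standard way to realize such an approximation set (a sketch based on classical covering-number bounds for Hölder balls; not necessarily the construction used in the proof of Lemma 3.2) is to take G as an $\varepsilon_n$-net of $\mathcal{M}$ in the supremum norm on I:
$$\varepsilon_n \;\asymp\; n^{-\beta/(2(1+\beta))}\,, \qquad \sup_{m\in\mathcal{M}}\min_{g\in G}\|m - g\|_2^2 \;\le\; \sup_{m\in\mathcal{M}}\min_{g\in G}\|m - g\|_\infty^2 \;\lesssim\; \varepsilon_n^2 \;=\; n^{-\beta/(1+\beta)}\,, \qquad \log \#G \;\asymp\; \varepsilon_n^{-2/\beta} \;=\; n^{1/(1+\beta)}\,,$$
since the metric entropy of a Hölder ball with smoothness β on a bounded bivariate domain is of order $\varepsilon^{-2/\beta}$ (Kolmogorov–Tikhomirov); the net members would additionally have to be arranged to satisfy the derivative constraints $\|g_x\|_\infty \le c_1$ and $\|g_y\|_\infty \le c_G$.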
The cover provided in Lemma 3.2 depends on the smoothness level β in Condition C, so that the estimator $\hat m$ depends on β. Therefore the estimator is denoted by $\hat m_\beta$ while the index is suppressed elsewhere. That motivates us to consider a cross-validation selector for β as used in the construction of our estimator. The integrated squared error of our estimator is given by
$$\|\hat m_\beta - m\|_2^2 \;=\; \|\hat m_\beta\|_2^2 \;-\; 2\int\!\!\int \hat m_\beta(x,y)\, m(x,y)\, dP_{(X,Y)}(x,y) \;+\; \|m\|_2^2\,. \qquad (3.7)$$
While the last term in (3.7) does not depend on the estimator, we have to mimic the first two expressions. For that purpose we introduce $0 < n_1 < n_2 < n$ and base our estimator only on the dataset $Y_1, \dots, Y_{n_1}$. Then $\|\hat m_\beta\|_2^2$ can be estimated by
$$\frac{1}{n - n_2}\sum_{j=n_2+1}^{n} \big|\hat m_\beta^{[j-n_1]}(x_0, Y_{n_1+1}, \dots, Y_j)\big|^2\,,$$
in the notation (3.1), and the second term in (3.7) is estimated by
$$\frac{2}{n - n_2 - 1}\sum_{j=n_2+1}^{n-1} Y_{j+1}\cdot \hat m_\beta^{[j-n_1]}(x_0, Y_{n_1+1}, \dots, Y_j)\,.$$
Replacing the first two terms in (3.7) by these empirical proxies and omitting the third one provides an empirically accessible version of (3.7). This version can be minimized over some discrete grid with respect to β instead of the true integrated squared error. Then the minimizing value is suggested as $\hat\beta$. Still, we have to leave the question of adaptivity open for future research.
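A minimal sketch of this selection step; the splitting points, the grid of smoothness levels and the fitting routine `fit` are hypothetical placeholders introduced only for this illustration:

```python
def iterate(g, x0, y_block):
    """g^[k](x0, y_1, ..., y_k) via the recursion x <- g(x, y), cf. the notation (3.1)."""
    x = x0
    for y in y_block:
        x = g(x, y)
    return x

def cv_criterion(m_hat, Y, n1, n2, x0=0.0):
    """Empirical proxy for the first two terms of (3.7)."""
    n = len(Y)                                    # Y[0], ..., Y[n-1] stand for Y_1, ..., Y_n
    preds = {j: iterate(m_hat, x0, Y[n1:j])       # m_hat^[j-n1](x0, Y_{n1+1}, ..., Y_j)
             for j in range(n2 + 1, n + 1)}
    sq_term = sum(p ** 2 for p in preds.values()) / (n - n2)
    cross_term = 2.0 * sum(Y[j] * preds[j]        # Y[j] equals Y_{j+1} in 0-based indexing
                           for j in range(n2 + 1, n)) / (n - n2 - 1)
    return sq_term - cross_term

def select_beta(beta_grid, fit, Y, n1, n2):
    """Pick beta minimizing the criterion; fit(Y_train, beta) is a hypothetical routine
    returning the estimator m_hat_beta computed from Y_1, ..., Y_{n1} only."""
    fits = {b: fit(Y[:n1], b) for b in beta_grid}
    return min(beta_grid, key=lambda b: cv_criterion(fits[b], Y, n1, n2))
```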
Finally we mention that the estimator derived in this section can also be applied to parametric
approaches although we do not focus on such models in the framework of the current paper. In
the simulation section we will consider such a parametric subclass.
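For concreteness, a minimal simulation sketch of the data-generating mechanism with a linear GARCH(1,1)-type link $m(x,y) = \alpha_0 + \alpha_1 x + \beta_1 y$; the parameter values and the bounded uniform innovations are purely illustrative assumptions, chosen so that a contraction condition of the form in Condition (A1) holds, and are not the design of the paper's simulation study:

```python
import numpy as np

def simulate(m, n, eps_sampler, x0=1.0, burn_in=500, seed=0):
    """Draw (X_t, Y_t), t = 1, ..., n, from X_{t+1} = m(X_t, Y_t), Y_t = X_t * eps_t."""
    rng = np.random.default_rng(seed)
    N = n + burn_in
    X = np.empty(N)
    Y = np.empty(N)
    X[0] = x0
    for t in range(N):
        Y[t] = X[t] * eps_sampler(rng)
        if t + 1 < N:
            X[t + 1] = m(X[t], Y[t])
    return X[burn_in:], Y[burn_in:]

# linear GARCH(1,1)-type link; the parameter values are illustrative only
alpha0, alpha1, beta1 = 0.5, 0.3, 0.2
m_lin = lambda x, y: alpha0 + alpha1 * x + beta1 * y

# nonnegative, bounded innovations on [0, 2], so that alpha1 + beta1 * R_eps = 0.7 < 1
X, Y = simulate(m_lin, n=2000, eps_sampler=lambda rng: rng.uniform(0.0, 2.0))
```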
4 Asymptotic properties
In this section we investigate the asymptotic quality of our estimator $\hat m$ as defined in Section 3. In order to evaluate this quality we consider the mean integrated squared error $E\|\hat m - m\|_2^2$ (MISE). In the following Subsection 4.1 we deduce upper bounds on the convergence rates of the MISE when the sample size n tends to infinity, while, in Subsection 4.2, we show that those rates are optimal in the minimax sense with respect to any estimator based on the given observations. Note that we consider the MISE uniformly over the function class $m \in \mathcal{M}$ while the error density $f_\varepsilon$ is viewed as unknown but fixed; in particular, it does not change in n.
4.1 Convergence rates – upper bounds
We fix some arbitrary g ∈ G. Then we study the expectation of the functional Φn(g). Exploiting
(3.3) and the strict stationarity of (Yn)n, we obtain that
$$\begin{aligned}
E\,\Phi_n(g) \;=\;& E\big|Y_{K+1} - g^{[K]}(x_0, Y_1, \dots, Y_K)\big|^2 \;+\; E\big|Y_{K+2} - g\big(g^{[K]}(x_0, Y_1, \dots, Y_K),\, Y_{K+1}\big)\big|^2 \\
=\;& E\,\mathrm{var}(Y_{K+1} \mid Y_1, \dots, Y_K) \;+\; E\,\mathrm{var}(Y_{K+2} \mid Y_1, \dots, Y_{K+1}) \;+\; E|W_K - G_K(g)|^2 \;+\; E|W_{K+1} - G_{K+1}(g)|^2\,,
\end{aligned}$$
where we define
$$W_j := E\big(m^{[j]}(X_1, Y_1, \dots, Y_j) \mid Y_1, \dots, Y_j\big)\,, \qquad G_j(g) := g^{[j]}(x_0, Y_1, \dots, Y_j)\,, \quad g \in G\,,$$
for $j = K, K+1$. For any $g_1, g_2 \in G$ we deduce that
As a side result of this inequality we derive by induction that all random variables Zt+n,n,
t, n ≥ 0, are located in L∞(Ω,A, P ), i.e. the Banach space consisting of all random variables
X on (Ω,A, P ) with ess sup|X| < ∞. Moreover, Condition (A1) and the nonnegativity of the
εn provide that Zt+n,n ≥ 0 almost surely for all t, n ≥ 0 also by induction. Since ‖mx‖∞ +
‖my‖∞Rε ≤ c1 < 1 and (6.1) holds true we easily conclude by the triangle inequality that
(Zt+n,n)n≥0 are Cauchy sequences in L∞(Ω,A, P ) for all integer t ≥ 0. By the geometric
summation formula, we also deduce that
$$\operatorname{ess\,sup} Z_{t+n,n} \;\le\; \alpha_0''/(1-c_1)\,, \qquad (6.2)$$
for all t, n ≥ 0. The completeness of L∞(Ω,A, P ) implies convergence of each (Zt+n,n)n≥0
to some Z∗t ∈ L∞(Ω,A, P ) with respect to the underlying essential supremum norm. The
nonnegative random variables on (Ω,A, P ) which are bounded from above by α′′0/(1−c1) almost
surely form a closed subset of L∞(Ω,A, P ) as L∞(Ω,A, P )-convergence implies convergence
in distribution. We conclude that (6.2) carries over to the limit variables $Z^*_t$, so that $Z^*_t \in \big[0, \alpha_0''/(1-c_1)\big]$ almost surely for all $t \ge 0$. Since $L_\infty(\Omega,\mathcal{A},P)$-convergence also implies convergence in probability, the inequality (6.1) also yields that
$$P\big[|Z_{t+n,n} - Z^*_t| > \delta\big] \;\le\; \limsup_{m\to\infty} P\big[|Z_{t+n,n} - Z_{t+m,m}| > \delta/2\big] \;\le\; 2\delta^{-1}\sum_{m=n}^{\infty} \operatorname{ess\,sup}\,|Z_{t+m,m} - Z_{t+m+1,m+1}| \;\le\; 2\delta^{-1}\alpha_0''\, c_1^{n}/(1-c_1)\,,$$
by Markov's inequality for any δ > 0, so that the sum of the above probabilities taken over all $n \in \mathbb{N}$ is finite. Hence, $(Z_{t+n,n})_n$ converges to $Z^*_t$ almost surely as well. By the recursive definition of
the Zt,s and the joint initial values Zt,0 = 0 we realize that, for any fixed n, the random variables
Zt+n,n, t ≥ 0, are identically distributed. We conclude that all limit variables Z∗t possess the same
distribution. As m is continuous and we have $Z_{t+n+1,n+1} = m(Z_{t+n+1,n}, \varepsilon_{-t-1})$, the established almost sure convergence implies that
$$Z^*_t \;=\; m(Z^*_{t+1}, \varepsilon_{-t-1}) \;=\; m\big(Z^*_{t+1},\, Z^*_{t+1}\varepsilon_{-t-1}\big)\,,$$
almost surely for all t ≥ 0. The variable Zt+1+n,n is measurable in the σ-algebra generated by
Zt+1+n,0, ε−t−2, ε−t−3, . . . , ε−t−n−1. Hence, an appropriate version of the almost sure limit Z∗t+1
is measurable in the σ-algebra generated by all εs, s ≤ −t − 2, and therefore independent of
$\varepsilon_{-t-1}$. We conclude that, by putting $X_0 := Z^*_0$, the induced sequence $(X_n)_n$, which is defined according to the recursive relation (1.1), is strictly stationary and causal since $X_0, \varepsilon_0, \varepsilon_1, \dots$ are independent, so that all marginal distributions of $(X_n)_{n-s \in T}$, with any finite $T \subseteq \mathbb{N}_0$, do not depend on the integer shift $s \ge 0$. Also all $X_n$ have the same distribution as the $Z^*_t$, so that
Xn ∈ [0, α′′0/(1− c1)] holds true almost surely. Uniqueness of the stationary distribution can be
seen as follows: We have Zt+n,n = m[n](0, ε−n−t, . . . , ε−1−t) for all t and n in the notation (3.1).
Now let $(X_n)_n$ be a strictly stationary and causal time series where $X_1$ is independent of all $\varepsilon_s$, integer s. Again, by the mean value theorem and induction we obtain that
$$\big|Z_{t+n,n} - m^{[n]}(X_1, \varepsilon_{-n-t}, \dots, \varepsilon_{-1-t})\big| \;\le\; \alpha_0''\, c_1^{n}\,,$$
so that the stochastic process $\big(m^{[n]}(X_1, \varepsilon_{-n-t}, \dots, \varepsilon_{-1-t})\big)_n$ converges to the random variable $Z^*_t$ almost surely and, hence, in distribution. As we claim that $(X_n)_n$ is a strictly stationary and causal solution of (1.1), all the random variables $m^{[n]}(X_1, \varepsilon_{-n-t}, \dots, \varepsilon_{-1-t})$ have the same distribution as $X_1$. Therefore the distributions of $X_1$ and $Z^*_t$ coincide, so that the distribution of $Z^*_t$ is indeed the unique strictly stationary distribution when m is the autoregression function and $f_\varepsilon$ is the error density.
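A small numerical illustration of the contraction used in this uniqueness argument (the link and the innovation law below are our own illustrative choices, not taken from the paper): two iterations of the recursion driven by the same innovations but started at different values coalesce at the geometric rate $c_1^n$.

```python
import numpy as np

def iterate_from(m, x_start, eps_seq):
    """Iterate x <- m(x, x * eps) along a fixed innovation path."""
    x = x_start
    for e in eps_seq:
        x = m(x, x * e)
    return x

rng = np.random.default_rng(1)
m_lin = lambda x, y: 0.5 + 0.3 * x + 0.2 * y     # contracting link, c_1 = 0.7 for eps in [0, 2]
eps_path = rng.uniform(0.0, 2.0, size=40)

for n in (5, 10, 20, 40):
    gap = abs(iterate_from(m_lin, 0.0, eps_path[:n]) - iterate_from(m_lin, 10.0, eps_path[:n]))
    print(n, gap)                                # bounded by 0.7**n * 10, as in the proof
```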
(b) Since $\{X_1 \ge L_X\} = \bigcap_{n\in\mathbb{N}}\{X_1 \ge L_X - 1/n\}$, we have $X_1 \ge L_X$ almost surely. The monotonicity constraints in Condition (A3) imply that $X_2 = m(X_1, \varepsilon_1) \ge m(L_X, 0)$ a.s. The strict stationarity of the process $(X_n)_n$ provides that $X_1 \ge m(L_X, 0)$ a.s. holds true as well. By the definition of $L_X$ it follows that $L_X \ge m(L_X, 0)$. Now we assume that $L_X > m(L_X, 0)$. As m is continuous there exists some $\rho > 0$ with $L_X > m(L_X + \rho, \rho)$. Thus,
$$P[X_1 \le L_X + \rho,\ \varepsilon_1 \le \rho] \;\le\; P\big[X_2 = m(X_1, \varepsilon_1) \le m(L_X + \rho, \rho)\big] \;\le\; P[X_1 < L_X] \;=\; 0\,,$$
due to Condition (A3) and the definition of $L_X$. On the other hand we have
$$P[X_1 \le L_X + \rho,\ \varepsilon_1 \le \rho] \;=\; P[X_1 \le L_X + \rho]\cdot P[\varepsilon_1 \le \rho] \;=\; 0\,,$$
so that we obtain a contradiction to Condition (A4) and the definition of LX . We conclude that
LX = m(LX , 0). If another solution x ≥ 0 existed we would have
by the mean value theorem, where $\xi_K$ denotes some specific value between $X_{K+1}$ and $G_K(g)$. In order to determine that minimum only the values $\lambda \in \big\{0,\ 1,\ |m(X_{K+1}, Y_{K+1}) - g(X_{K+1}, Y_{K+1})|/|X_{K+1} - G_K(g)|\big\}$ have to be considered, where the last point has to be taken into account only if $|m(X_{K+1}, Y_{K+1}) - g(X_{K+1}, Y_{K+1})| \le |X_{K+1} - G_K(g)|$. In all of those cases we derive that the above term is bounded from below by $|m(X_{K+1}, Y_{K+1}) - g(G_K(g), Y_{K+1})|^2/2$. Then, taking the expectation of these random variables yields the desired result.
Proof of Theorem 4.1: By Condition C we have $\|g\|_2^2 \le c_G^2$ for all $g \in G$. As $\hat m \in G$ holds true by the definition of $\hat m$, we have $\|\hat m\|_2^2 \le c_G^2$ almost surely. Condition B yields that $\|m\|_2^2 \le c_M^2$ for the true regression function m since |m| is bounded by $c_M$ on the entire support of $(X_1, Y_1)$. Therefore, we have $\|\hat m - m\|_2 \le c_M + c_G$ almost surely and
$$E\|\hat m - m\|_2^2 \;=\; \int_{0}^{\infty} P\big[\|\hat m - m\|_2^2 > \delta\big]\,d\delta \;\le\; \delta_n + \int_{\delta_n}^{(c_M + c_G)^2} P\big[\|\hat m - m\|_2^2 > \delta\big]\,d\delta\,, \qquad (6.7)$$
where $\delta_n := c_\delta\cdot n^{-\beta/(\beta+1)}$ for some constant $c_\delta > 1$ still to be chosen. Clearly we have $0 < \delta_n < (c_M + c_G)^2$ for n sufficiently large. By the definition of $\hat m$ we derive that $\Phi_n(\hat m) \le \Phi_n(g_0)$ holds true almost surely. Writing $\Delta_n(g) := \Phi_n(g) - E\,\Phi_n(g)$, equation (4.2) provides that $\|\hat m - m\|_2^2 > \delta$ implies the existence of some $g \in G$ with $\|m - g\|_2^2 > \delta$ such that
$$\Delta_n(g) - \Delta_n(g_0) \;\le\; -\tfrac{1}{4}\|m - g\|_2^2 \;-\; \tfrac{1}{4}\delta \;+\; \frac{4}{(1-c_1)^2}\,\|m - g_0\|_2^2 \;+\; R_K\,.$$
Condition C yields that
$$\sup_{m\in\mathcal{M}}\|m - g_0\|_2^2 \;=\; O\big(n^{-\beta/(1+\beta)}\big)\,.$$
Therefore $c_\delta$ can be selected sufficiently large (uniformly for all $m\in\mathcal{M}$) such that
$$|\Delta_n(g) - \Delta_n(g_0)| \;\ge\; \tfrac{1}{4}\|m - g\|_2^2\,,$$
for all δ > δn. Therein, note that, by (4.3) and the imposed choice of K, the term RK is
asymptotically negligible uniformly with respect to m. Hence, we have shown that
$$P\big[\|\hat m - m\|_2^2 > \delta\big] \;\le\; P\Big[\exists\, g \in G:\ \|m-g\|_2^2 > \delta\,,\ \big|\Delta_n(g)-\Delta_n(g_0)\big| \ge \tfrac{1}{4}\|m-g\|_2^2\Big] \;\le\; P\Big[\exists\, g \in G:\ \big|\Delta_n(g)-\Delta_n(g_0)\big| \ge \tfrac{1}{4}\max\big\{\delta,\ \|m-g\|_2^2\big\}\Big]\,,$$
holds true for all $\delta \in \big(\delta_n, (c_G + c_M)^2\big)$ and for all $n > N$ with some N which does not depend on m but only on the function class $\mathcal{M}$. By the representation
where the finite positive constants $d_1$ and $d_2$ only depend on $c_1$, $R''$ and $R_\varepsilon$. Applying Theorem 1 from Merlevede et al. (2011) to (6.8) yields that
$$\begin{aligned}
P\big[\|\hat m - m\|_2^2 > \delta\big] \;\le\; 4\,(\#G)\cdot\Big(\ & (n-K-1)\exp\big\{-D_1 (n-K+1)^{\gamma}\big\} \;+\; \exp\big\{-D_2 (n-K-1)^{2/3}\big\} \\
& +\; \exp\Big\{-\frac{D_2}{3 d_1}(n-K-1)\,\delta\Big\} \;+\; \exp\Big\{-\frac{D_2}{3 d_2}(n-K-1)\, c_1^{-2K}\,\delta^2\Big\} \\
& +\; \exp\big\{-D_3 (n-K-1)\big\}\cdot \exp\Big(D_4\, \frac{(n-K-1)^{\gamma(1-\gamma)}}{\log^{\gamma}(n-K-1)}\Big)\Big)\,,
\end{aligned} \qquad (6.17)$$
for all $\delta \in (\delta_n, (c_G + c_M)^2)$ and $n > \max\{4, N\}$, where the positive finite constants $D_1, \dots, D_4$ only depend on $c_1$, $x_0$, $R''$, $R_\varepsilon$, $c_G$ and $\gamma_2$. Considering the upper bound on the cardinality of G provided by Condition C, we apply the inequality (6.17) to (6.7) and obtain that
$$\begin{aligned}
\sup_{m\in\mathcal{M}} E\|\hat m - m\|_2^2 \;\le\;\; & \delta_n \;+\; 4(c_M + c_G)^2 (n-K-1)\exp\big\{c_N n^{1/(1+\beta)} - D_1 (n-K+1)^{\gamma}\big\} \\
& +\; 4(c_M + c_G)^2\cdot\exp\big\{c_N n^{1/(1+\beta)} - D_2 (n-K+1)^{2/3}\big\} \\
& +\; 4(c_M + c_G)^2\cdot\exp\big\{c_N n^{1/(1+\beta)} - D_3 (n-K-1)\big\}\cdot\exp\Big(D_4\, \frac{(n-K-1)^{\gamma(1-\gamma)}}{\log^{\gamma}(n-K-1)}\Big) \\
& +\; 4(c_M + c_G)^2\cdot\exp\Big\{c_N n^{1/(1+\beta)} - \frac{D_2}{3 d_2}(n-K-1)\, c_1^{-2K}\,\delta_n^2\Big\} \\
& +\; \frac{12\, d_1}{D_2 (n-K-1)}\exp\Big\{c_N n^{1/(1+\beta)} - \frac{D_2}{3 d_1}(n-K-1)\,\delta_n\Big\}\,,
\end{aligned}$$
for all $n > \max\{4, N\}$. Due to $\delta_n = c_\delta\cdot n^{-\beta/(1+\beta)}$ with $c_\delta$ sufficiently large, the selection of K and the fact that γ can be chosen arbitrarily close to 1 (select $\gamma_2$ large enough) while β > 1 by Condition C, the first addend $\delta_n$ dominates asymptotically in the above inequality, which finally leads to the desired rate.
Proof of Lemma 4.2: The derivative of $m_{(x)}(y) := m(x,y)$ with respect to y is bounded from below by $c_M'\alpha_0'$ for any $m \in \mathcal{M}'$ and $x \ge \alpha_0'$, so that the functions $m_{(x)} : [0,\infty) \to [m(x,0),\infty)$, $x \ge \alpha_0'$, increase strictly monotonically and, hence, are invertible. For any $c \ge 0$ we have $\{(x,z) : m_{(x)}^{-1}(z) \le c\} = \{(x,z) : m(x,c) - z \ge 0\}$, so that Borel measurability of the function $(x,z) \mapsto m_{(x)}^{-1}(z)$ follows from that of m. By Fubini's theorem, it follows from there that
$$P[X_1 \le z] \;=\; \int P\big[m(x, \varepsilon_1) \le z\big]\, dP_{X_1}(x) \;=\; \int P\big[\varepsilon_1 \le m_{(x)}^{-1}(z)\big]\, dP_{X_1}(x)
\;=\; \int \int_{t=0}^{m_{(x)}^{-1}(z)} f_\varepsilon(t)\, dt\, dP_{X_1}(x) \;=\; \int_{s=0}^{z}\Big(\int \frac{f_\varepsilon\big(m_{(x)}^{-1}(s)\big)}{m_y\big(x,\, m_{(x)}^{-1}(s)\big)}\, dP_{X_1}(x)\Big)\, ds\,,$$
for all $z \ge 0$, so that $P_{X_1}$ is absolutely continuous and has the Lebesgue density
$$f_X(z) \;=\; E\,\frac{f_\varepsilon\big(m_{(X_1)}^{-1}(z)\big)}{m_y\big(X_1,\, m_{(X_1)}^{-1}(z)\big)}\,,$$
which is bounded from above by $\|f_\varepsilon\|_\infty/(c_M'\alpha_0')$ as $X_1 \ge \alpha_0'$ holds true almost surely. Moreover the support of $f_X$ is included in $[\alpha_0', R'']$ with $R''$ as in Condition B. Therefore we have
$$\|f\|_2^2 \;=\; \int\!\!\int f^2(x, xt)\, f_X(x) f_\varepsilon(t)\, dx\, dt \;\le\; \int\!\!\int_{[\alpha_0', R'']\times[0, R_\varepsilon]} f^2(x, xt)\, dx\, dt\cdot \|f_\varepsilon\|_\infty^2/(c_M'\alpha_0') \;\le\; \int\!\!\int_{[\alpha_0', R'']\times[0, R'' R_\varepsilon]} f^2(x, y)\, dx\, dy\cdot \|f_\varepsilon\|_\infty^2/\big(c_M'(\alpha_0')^2\big)\,,$$
so that $\|\cdot\|_2$ is dominated by $\|\cdot\|_\lambda$ multiplied by a uniform constant.
Proof of Lemma 4.3: It follows immediately that
$$Q\big((a, b]\big) \;\ge\; c\cdot\lambda\big((a, b]\big)\,,$$
for all left-open intervals $(a, b] \subseteq I$, since
$$Q\big((a, b]\big) \;\ge\; Q\big([a + 1/m,\, b]\big) \;\ge\; c\cdot\lambda\big([a + 1/m,\, b]\big) \;\to\; c\,(b - a) \;=\; c\,\lambda\big((a, b]\big)\,,$$
as m→∞ for all (a, b] ⊆ I. Let F denote the collection of all unions of finitely many intervals
(a, b] ⊆ I. One can verify that F forms a ring of subsets of I which is closed with respect to
the relative complement related to I. Moreover, F can equivalently be defined by the collection
of all unions of finitely many intervals (a, b] ⊆ I which are pairwise disjoint. For all f ∈ F we
conclude that
$$Q(f) \;=\; Q\Big(\bigcup_{k=1}^{n}(a_k, b_k]\Big) \;=\; \sum_{k=1}^{n} Q\big((a_k, b_k]\big) \;\ge\; c\cdot\sum_{k=1}^{n} \lambda\big((a_k, b_k]\big) \;=\; c\cdot\lambda(f)\,,$$
where the intervals (ak, bk] are assumed to be pairwise disjoint. The σ-field generated by the
ring F equals the Borel σ-field of I as F contains all left-open intervals of I. As Q and λ are
finite measures on that Borel σ-field their restrictions to the domain F are finite pre-measures.
By Carathéodory's extension theorem, Q and λ fulfil
$$Q(g) \;=\; \inf\Big\{\sum_{j=1}^{\infty} Q(f_j) \;:\; g \subseteq \bigcup_{j\in\mathbb{N}} f_j\,,\ f_j \in F\Big\}\,, \qquad
\lambda(g) \;=\; \inf\Big\{\sum_{j=1}^{\infty} \lambda(f_j) \;:\; g \subseteq \bigcup_{j\in\mathbb{N}} f_j\,,\ f_j \in F\Big\}\,.$$
Since $\sum_{j=1}^{\infty} Q(f_j) \ge c\cdot\sum_{j=1}^{\infty}\lambda(f_j)$, the outer measure representation of Q and λ yields that $Q(g) \ge c\cdot\lambda(g)$ for all Borel subsets g of I. Clearly, for all elementary non-negative measurable mappings φ on I, $\varphi = \sum_{k=1}^{n}\varphi_k\cdot 1_{g_k}$, for some real numbers $\varphi_k \ge 0$ and pairwise disjoint Borel subsets $g_k$, $k = 1,\dots,n$, of I, it follows that
$$\int \varphi(x)\, dQ(x) \;\ge\; c\cdot\int \varphi(x)\, d\lambda(x)\,.$$
This inequality extends to all non-negative measurable functions ϕ by the monotone conver-
gence theorem as any such function ϕ is pointwise approximable by a pointwise monotonically
increasing sequence of elementary non-negative measurable mappings on the domain I.
Now we assume that the Lebesgue density q of Q on I exists. Then we put φ equal to the indicator function of the preimage $q^{-1}((-\infty, c)) \subseteq I$, so that
$$\int (q(x) - c)\,\varphi(x)\, d\lambda(x) \;=\; \int \varphi(x)\, dQ(x) \;-\; c\cdot\int \varphi(x)\, d\lambda(x) \;\ge\; 0\,.$$
As the function $(q - c)\varphi$ is non-positive, it follows that, for Lebesgue almost all $x \in I$, we have $q(x) = c$ or $\varphi(x) = 0$. Thus, for Lebesgue almost all $x \in I$ we have $q(x) \ge c$.
Proof of Theorem 4.2: The statistical experiment under which one observes the data $\bar X_1, \bar Y_1, \dots, \bar Y_n$ with $\bar X_1 = \log X_1$ and $\bar Y_j = \log Y_j$ from model (1.1) is more informative than the model under consideration in which only $Y_1, \dots, Y_n$ are accessible. Hence, as we are proving a lower bound, we may assume that $\log X_1$ is additionally observed and that the $\bar Y_j$ are recorded instead of the $Y_j$.
First let us verify that, for all binary matrices θ, the functions $m_\theta$ from (4.6) lie in $\mathcal{M}$. The support of the function $m_\theta - m_G$ is included in
$$\Big\{(x, y) \in \mathbb{R}^2 \;:\; x \in \alpha_0/(1-\alpha_1) + \tfrac{1}{16} R_\varepsilon \beta_1'\alpha_0\cdot[1, 2]\,,\ 0 \le y \le R_\varepsilon x/2\Big\}\,,$$
as a subset. Thus we derive that
$$\|(m_\theta)_x - (m_G)_x\|_\infty \;\le\; 8 c_H b_n^{1-\beta}\cdot\Big\{\frac{8}{R_\varepsilon\beta_1'\alpha_0}\,\|H_x\|_\infty + \frac{4}{\alpha_0}\,\|H_y\|_\infty\Big\}\,, \qquad
\|(m_\theta)_y - (m_G)_y\|_\infty \;\le\; \frac{8 c_H R_\varepsilon}{\alpha_0}\,\|H_y\|_\infty\cdot b_n^{1-\beta}\,, \qquad (6.18)$$
while $(m_G)_x \equiv \alpha_1$ and $(m_G)_y \equiv \beta_1$. By Condition D we have $\alpha_1 + \beta_1 R_\varepsilon < c_1$, so that Condition (A1) holds true for all $m_\theta$ for $c_H > 0$ sufficiently small. All functions $m_\theta$ coincide with $m_G$ on their restriction to
$$\big[0,\ \alpha_0/(1-\alpha_1) + \tfrac{1}{16} R_\varepsilon\beta_1'\alpha_0\big] \times [0, \infty)\,, \qquad (6.19)$$
while, for $x > \alpha_0/(1-\alpha_1) + R_\varepsilon\beta_1'\alpha_0/16$, the upper bound
$$\|m_\theta - m_G\|_\infty \;\le\; c_H b_n^{\beta}\,\|H\|_\infty\,,$$
for $c_H$ small enough, suffices to verify the envelope Condition (A2). By (6.18) and the small $c_H$ we also establish that
$$\inf_{x \ge 0,\, y \ge 0} (m_\theta)_x \;\ge\; \alpha_1/2 \;>\; 0\,, \qquad \inf_{x \ge 0,\, y \ge 0} (m_\theta)_y \;\ge\; \beta_1/2 \;>\; 0\,,$$
for all θ, which guarantees the monotonicity Condition (A3). Moreover, the above inequalities guarantee Condition C' so that the stationary distribution of $(X_n)_n$ under any $m_\theta$ has a bounded Lebesgue density $f_{X,\theta}$ by Lemma 4.2. Also, as $c_H$ is sufficiently small with respect to all other constants, we can verify the smoothness constraint in Condition B for all admitted values of θ.
As we have shown that
$$\mathcal{M}_P \;:=\; \big\{m_\theta \,:\, \theta \in \Theta_n\big\} \;\subseteq\; \mathcal{M}\,,$$
where $\Theta_n$ denotes the set of all binary $(b_n \times b_n)$-matrices, we follow a usual strategy of minimax theory and put an a-priori distribution on $\mathcal{M}_P$. Concretely, we assume that all components of the binary matrix θ are independent and satisfy $P[\theta_{j,k} = 1] = 1/2$, and we bound the minimax risk from below by the corresponding Bayesian risk. However, we face the problem that the loss function depends on the stationary distribution of the data and, hence, on $m_\theta$. As all $m_\theta$ equal $m_G$ on the interval (6.19), we have $\alpha_0/(1-\alpha_1) = m_\theta\big(\alpha_0/(1-\alpha_1),\, 0\big)$. By Proposition 2.1(b), the left endpoint of the stationary distribution under the regression function $m_\theta$ equals $\alpha_0/(1-\alpha_1)$ for all $\theta \in \Theta_n$, so that
$$J \;:=\; J(m_\theta) \;=\; \big(\alpha_0/(1-\alpha_1) + R_\varepsilon\beta_1'\alpha_0/16,\ \alpha_0/(1-\alpha_1) + R_\varepsilon\beta_1'\alpha_0/8\big]\,,$$
for all θ ∈ Θn. The closure of this interval contains the whole support of mθ−mG for all θ ∈ Θn.
Then, (4.5) yields that
$$\|\hat m_n - m_\theta\|_2^2 \;\ge\; \mu_6\, r_\varepsilon\cdot\int_{0}^{R_\varepsilon/2}\!\int_{J} \big|\hat m_n(x, xt) - m_\theta(x, xt)\big|^2\, dx\, dt\,,$$
where the $\|\cdot\|_2$-norm depends on θ. We introduce the intervals
$$I_{j,k} \;:=\; i_{j,k} + \big(-R_\varepsilon\beta_1'\alpha_0/(64 b_n),\ R_\varepsilon\beta_1'\alpha_0/(64 b_n)\big)\times\big(-R_\varepsilon/(8 b_n),\ R_\varepsilon/(8 b_n)\big)\,, \qquad j, k = 0, \dots, b_n - 1\,,$$
which are pairwise disjoint and subsets of $[0, R_\varepsilon/2]\times J$. We deduce that