Statistical inference for nonparametric GARCH models
Alexander Meister∗ Jens-Peter Krei߆
May 15, 2015

∗Institut für Mathematik, Universität Rostock, D-18051 Rostock, Germany, email: alexander.meister@uni-rostock.de
†Institut für Mathematische Stochastik, TU Braunschweig, D-38092 Braunschweig, Germany, email: [email protected]
Abstract
We consider extensions of the famous GARCH(1, 1) model where the recursive equa-
tion for the volatilities is not specified by a parametric link but by a smooth autoregression
function. Our goal is to estimate this function under nonparametric constraints when the
volatilities are observed with multiplicative innovation errors. We construct an estimation
procedure whose risk attains the usual convergence rates for bivariate nonparametric regres-
sion estimation. Furthermore, those rates are shown to be optimal in the minimax sense.
Numerical simulations are provided for a parametric submodel.
Keywords: autoregression; financial time series; inference for stochastic processes; minimax rates; non-
parametric regression.
AMS subject classification 2010: 62M10; 62G08.
1 Introduction
During the last decades GARCH time series have become a famous and widely studied model
for the analysis of financial data, e.g. for the investigation of stock market indices. Since the
development of the basic ARCH processes by Engle (1982) and their generalization to GARCH
time series in Bollerslev (1986) a considerable amount of literature deals with statistical infer-
ence – in particular parameter estimation – for those models. Early interest goes back to Lee
and Hansen (1994), who consider a quasi-maximum likelihood estimator for the GARCH(1, 1)
parameters, and to Lumsdaine (1996) who proves consistency and asymptotic normality of such
methods. In a related setting, Berkes et al. (2003) and Berkes and Horvath (2004) establish
The result (3.3) can be viewed as a pseudo estimation equation since the left hand side is accessible from the data while the right hand side contains the target curve m in its representation. Unfortunately, the right hand side also depends on the unobserved $X_{j-k+1}$. However, the following lemma shows that this dependence is of minor importance for large k.
Lemma 3.1 We assume Condition A and that $j \ge k > 1$. Let $f : [0,\infty)^2 \to [0,\infty)$ or $f : \mathbb{R}^2 \to \mathbb{R}$ be a continuously differentiable function with bounded partial derivative with respect to x. Fixing an arbitrary deterministic $x_0 \ge 0$, we have
$$\big|f^{[k]}(X_{j-k+1}, Y_{j-k+1}, \dots, Y_j) - f^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)\big| \;\le\; \|f_x\|_\infty^{k}\, |X_{j-k+1} - x_0|\,.$$
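The bound follows by iterating the Lipschitz property of $f$ in its first argument along the recursion behind the notation (3.1), i.e. $f^{[k]}(x, y_{j-k+1}, \dots, y_j) = f\big(f^{[k-1]}(x, y_{j-k+1}, \dots, y_{j-1}),\, y_j\big)$; a brief sketch of the induction, written out here for convenience:
$$\big|f^{[k]}(X_{j-k+1}, Y_{j-k+1}, \dots, Y_j) - f^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)\big| \;\le\; \|f_x\|_\infty\,\big|f^{[k-1]}(X_{j-k+1}, Y_{j-k+1}, \dots, Y_{j-1}) - f^{[k-1]}(x_0, Y_{j-k+1}, \dots, Y_{j-1})\big| \;\le\; \dots \;\le\; \|f_x\|_\infty^{k-1}\,\big|f(X_{j-k+1}, Y_{j-k+1}) - f(x_0, Y_{j-k+1})\big| \;\le\; \|f_x\|_\infty^{k}\,|X_{j-k+1} - x_0|\,.$$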
We recall Condition (A1), which guarantees that the assumptions on f in Lemma 3.1 are satisfied by m and which provides in addition that $\|m_x\|_\infty \le c_1 < 1$. Combining (3.2) and Lemma 3.1, we realize that $m^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)$ can be employed as a proxy of $X_{j+1}$. Concretely, we have
$$E\big|X_{j+1} - m^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)\big|^2 \;\le\; \|m_x\|_\infty^{2k}\, E|X_1 - x_0|^2\,, \qquad (3.4)$$
where the strict stationarity of the sequence $(X_n)_n$ has been used in the last step. Furthermore, as an immediate consequence of (3.2) and (3.4), we deduce that
Thus, suggesting $m^{[k]}(x_0, Y_{j-k+1}, \dots, Y_j)$ for an arbitrary fixed deterministic $x_0 \ge 0$ as a proxy of the right hand side of (3.3) seems reasonable. The left hand side of this equality represents the best least squares approximation of $Y_{j+1}$ among all measurable functions based on $Y_{j-k+1}, \dots, Y_j$.
That motivates us to consider that $g \in G$ which minimizes the (nonlinear) contrast functional
$$\Phi_n(g) \;:=\; \frac{1}{n-K-1}\sum_{j=K}^{n-2}\Big\{\big|Y_{j+1} - g^{[K]}(x_0, Y_{j-K+1}, \dots, Y_j)\big|^2 \;+\; \big|Y_{j+2} - g^{[K+1]}(x_0, Y_{j-K+1}, \dots, Y_{j+1})\big|^2\Big\}\,,$$
as the estimator $\hat m = \hat m(Y_1, \dots, Y_n)$ of m, where we fix some finite collection G of candidates for the true regression function m. Moreover, the parameter $K \le n-2$ remains to be chosen. We use a two-step adaptation of $g^{[j]}$ to $Y_{j+1}$, i.e. for $j = K$ and $j = K+1$, in order to reconstruct g from $g^{[j]}$; this will be made precise in Section 4. In particular, Lemma 4.1 will clarify the merit of this double least squares procedure.
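For illustration only, a minimal numerical sketch of this minimization; the anchor $x_0 = 0$ and the small linear candidate family G below are placeholders and not part of the paper's formal construction:

```python
import numpy as np

def iterate_g(g, x0, y_block):
    """Compute g^[k](x0, y_1, ..., y_k) via the recursion x <- g(x, y), cf. the notation (3.1)."""
    x = x0
    for y in y_block:
        x = g(x, y)
    return x

def contrast(g, Y, K, x0=0.0):
    """Empirical contrast Phi_n(g) with the two least-squares terms per index j."""
    n = len(Y)                                 # Y[0], ..., Y[n-1] stand for Y_1, ..., Y_n
    total = 0.0
    for j in range(K, n - 1):                  # the paper's j = K, ..., n-2
        pred1 = iterate_g(g, x0, Y[j - K:j])   # g^[K](x0, Y_{j-K+1}, ..., Y_j)
        pred2 = g(pred1, Y[j])                 # g^[K+1](x0, Y_{j-K+1}, ..., Y_{j+1})
        total += (Y[j] - pred1) ** 2 + (Y[j + 1] - pred2) ** 2
    return total / (n - K - 1)

def estimate(G, Y, K, x0=0.0):
    """Minimize Phi_n over the finite candidate set G."""
    return min(G, key=lambda g: contrast(g, Y, K, x0))

# illustrative candidate set: linear GARCH(1,1)-type links on a small parameter grid
G = [lambda x, y, a=a, b=b: 0.5 + a * x + b * y
     for a in np.linspace(0.1, 0.4, 4) for b in np.linspace(0.1, 0.3, 3)]
```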
3.2 Selection of the approximation set G
Now we focus on the problem of how to choose the approximation set G. In addition to Condition A, we impose
Condition B
The restriction of the true regression function m to the domain $I := [0, R''] \times [0, R'' R_\varepsilon]$ can be continued to a function on $[-1, R''+1] \times [-1, R'' R_\varepsilon + 1]$ which is $\lfloor\beta\rfloor$-fold continuously differentiable, and all partial derivatives of m with order $\le \lfloor\beta\rfloor$ are bounded by a uniform constant $c_M$ on the enlarged domain. Therein we write $R'' := \alpha_0''/(1-c_1)$ with $c_1$ as in Condition (A1). Moreover, for non-integer β, the Hölder condition
$$\Big|\frac{\partial^{\lfloor\beta\rfloor} m}{(\partial x)^{k}(\partial y)^{\lfloor\beta\rfloor - k}}(x_1, y_1) \;-\; \frac{\partial^{\lfloor\beta\rfloor} m}{(\partial x)^{k}(\partial y)^{\lfloor\beta\rfloor - k}}(x_2, y_2)\Big| \;\le\; c_M\,\big|(x_1, y_1) - (x_2, y_2)\big|^{\beta - \lfloor\beta\rfloor}\,,$$
is satisfied for all $k = 0, \dots, \lfloor\beta\rfloor$ and all $(x_1, y_1), (x_2, y_2)$ located in $[-1, R''+1] \times [-1, R'' R_\varepsilon + 1]$. We assume that β > 2.
Note that, by Condition A and Proposition 2.1(a), we have RX ≤ R′′ so that the smoothness
region of m contains the entire support of the distribution of (X1, Y1). Condition B represents
classical smoothness assumptions on m where β describes the smoothness level of the function
m. All admitted regression functions, i.e. those functions which satisfy all constraints imposed on m in Conditions A and B, are collected in the function class
$$\mathcal{M} \;=\; \mathcal{M}\big(c_1, \alpha_0', \alpha_1', \beta_1', \alpha_0'', \alpha_1'', \beta_1'', \beta, c_M\big)\,.$$
Contrarily, the distribution of the error $\varepsilon_1$ is assumed to be unknown but fixed. By $\|\cdot\|_2$ we denote the Hilbert space norm
$$\|m\|_2^2 \;:=\; \int m^2(x,y)\, dP_{X_1,Y_1}(x,y)\,,$$
for any measurable function $m : [0,\infty)^2 \to \mathbb{R}$ for which the above expression is finite, with respect to the stationary distribution $P_{X_1,Y_1}$ of $(X_1, Y_1)$ when m is the true regression function.
Now let us formulate our stipulations on the approximation space G which are required to show
the asymptotic results in Section 4.
Condition C
We assume that G consists of finitely many measurable mappings, either $g : [0,\infty)^2 \to [0,\infty)$ or $g : \mathbb{R}^2 \to \mathbb{R}$. In addition, we assume that each $g \in G$ is continuously differentiable and satisfies $\|g_x\|_\infty \le c_1$ with the constant $c_1$ as in Condition A, $\|g_y\|_\infty \le c_G$ as well as $\sup_{(x,y) \in [0,R''] \times [0,R'' R_\varepsilon]} |g(x,y)| \le c_G$ for some constant $c_G > c_M$. Furthermore, we impose that any $m \in \mathcal{M}$ is well-approximable in G; more specifically,
$$\sup_{m \in \mathcal{M}}\, \min_{g \in G}\, \|m - g\|_2^2 \;=\; O\big(n^{-\beta/(1+\beta)}\big)\,. \qquad (3.6)$$
On the other hand, the cardinality of G is restricted as follows,
$$\#G \;\le\; \exp\big\{c_N \cdot n^{1/(1+\beta)}\big\}\,,$$
for some finite constant $c_N$. Therein β and $R''$ are as in Condition B.
We provide
Lemma 3.2 For cG sufficiently large, there always exists some set G such that Condition C is
satisfied.
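For orientation, a standard way to realize such an approximation set (a sketch based on classical covering-number bounds for Hölder balls; not necessarily the construction used in the proof of Lemma 3.2) is to take G as an $\varepsilon_n$-net of $\mathcal{M}$ in the supremum norm on I:
$$\varepsilon_n \;\asymp\; n^{-\beta/(2(1+\beta))}\,, \qquad \sup_{m\in\mathcal{M}}\min_{g\in G}\|m - g\|_2^2 \;\le\; \sup_{m\in\mathcal{M}}\min_{g\in G}\|m - g\|_\infty^2 \;\lesssim\; \varepsilon_n^2 \;=\; n^{-\beta/(1+\beta)}\,, \qquad \log \#G \;\asymp\; \varepsilon_n^{-2/\beta} \;=\; n^{1/(1+\beta)}\,,$$
since the metric entropy of a Hölder ball with smoothness β on a bounded bivariate domain is of order $\varepsilon^{-2/\beta}$ (Kolmogorov–Tikhomirov); the net members would additionally have to be arranged to satisfy the derivative constraints $\|g_x\|_\infty \le c_1$ and $\|g_y\|_\infty \le c_G$.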
The cover provided in Lemma 3.2 depends on the smoothness level β in Condition C, so that the estimator $\hat m$ depends on β. Therefore the estimator is denoted by $\hat m_\beta$ while the index is suppressed elsewhere. That motivates us to consider a cross-validation selector for β as used in the construction of our estimator. The integrated squared error of our estimator is given by
$$\|\hat m_\beta - m\|_2^2 \;=\; \|\hat m_\beta\|_2^2 \;-\; 2\int\!\!\int \hat m_\beta(x,y)\, m(x,y)\, dP_{(X,Y)}(x,y) \;+\; \|m\|_2^2\,. \qquad (3.7)$$
While the last term in (3.7) does not depend on the estimator, we have to mimic the first two expressions. For that purpose we introduce $0 < n_1 < n_2 < n$ and base our estimator only on the dataset $Y_1, \dots, Y_{n_1}$. Then $\|\hat m_\beta\|_2^2$ can be estimated by
$$\frac{1}{n - n_2}\sum_{j=n_2+1}^{n} \big|\hat m_\beta^{[j-n_1]}(x_0, Y_{n_1+1}, \dots, Y_j)\big|^2\,,$$
in the notation (3.1), and the second term in (3.7) is estimated by
$$\frac{2}{n - n_2 - 1}\sum_{j=n_2+1}^{n-1} Y_{j+1}\cdot \hat m_\beta^{[j-n_1]}(x_0, Y_{n_1+1}, \dots, Y_j)\,.$$
Replacing the first two terms in (3.7) by these empirical proxies and omitting the third one provides an empirically accessible version of (3.7). This version can be minimized over some discrete grid with respect to β instead of the true integrated squared error. Then the minimizing value is suggested as $\hat\beta$. Still, we have to leave the question of adaptivity open for future research.
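A minimal sketch of this selection step; the splitting points, the grid of smoothness levels and the fitting routine `fit` are hypothetical placeholders introduced only for this illustration:

```python
def iterate(g, x0, y_block):
    """g^[k](x0, y_1, ..., y_k) via the recursion x <- g(x, y), cf. the notation (3.1)."""
    x = x0
    for y in y_block:
        x = g(x, y)
    return x

def cv_criterion(m_hat, Y, n1, n2, x0=0.0):
    """Empirical proxy for the first two terms of (3.7)."""
    n = len(Y)                                    # Y[0], ..., Y[n-1] stand for Y_1, ..., Y_n
    preds = {j: iterate(m_hat, x0, Y[n1:j])       # m_hat^[j-n1](x0, Y_{n1+1}, ..., Y_j)
             for j in range(n2 + 1, n + 1)}
    sq_term = sum(p ** 2 for p in preds.values()) / (n - n2)
    cross_term = 2.0 * sum(Y[j] * preds[j]        # Y[j] equals Y_{j+1} in 0-based indexing
                           for j in range(n2 + 1, n)) / (n - n2 - 1)
    return sq_term - cross_term

def select_beta(beta_grid, fit, Y, n1, n2):
    """Pick beta minimizing the criterion; fit(Y_train, beta) is a hypothetical routine
    returning the estimator m_hat_beta computed from Y_1, ..., Y_{n1} only."""
    fits = {b: fit(Y[:n1], b) for b in beta_grid}
    return min(beta_grid, key=lambda b: cv_criterion(fits[b], Y, n1, n2))
```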
Finally we mention that the estimator derived in this section can also be applied to parametric
approaches although we do not focus on such models in the framework of the current paper. In
the simulation section we will consider such a parametric subclass.
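For concreteness, a minimal simulation sketch of the data-generating mechanism with a linear GARCH(1,1)-type link $m(x,y) = \alpha_0 + \alpha_1 x + \beta_1 y$; the parameter values and the bounded uniform innovations are purely illustrative assumptions, chosen so that a contraction condition of the form in Condition (A1) holds, and are not the design of the paper's simulation study:

```python
import numpy as np

def simulate(m, n, eps_sampler, x0=1.0, burn_in=500, seed=0):
    """Draw (X_t, Y_t), t = 1, ..., n, from X_{t+1} = m(X_t, Y_t), Y_t = X_t * eps_t."""
    rng = np.random.default_rng(seed)
    N = n + burn_in
    X = np.empty(N)
    Y = np.empty(N)
    X[0] = x0
    for t in range(N):
        Y[t] = X[t] * eps_sampler(rng)
        if t + 1 < N:
            X[t + 1] = m(X[t], Y[t])
    return X[burn_in:], Y[burn_in:]

# linear GARCH(1,1)-type link; the parameter values are illustrative only
alpha0, alpha1, beta1 = 0.5, 0.3, 0.2
m_lin = lambda x, y: alpha0 + alpha1 * x + beta1 * y

# nonnegative, bounded innovations on [0, 2], so that alpha1 + beta1 * R_eps = 0.7 < 1
X, Y = simulate(m_lin, n=2000, eps_sampler=lambda rng: rng.uniform(0.0, 2.0))
```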
4 Asymptotic properties
In this section we investigate the asymptotic quality of our estimator $\hat m$ as defined in Section 3. In order to evaluate this quality we consider the mean integrated squared error $E\|\hat m - m\|_2^2$ (MISE). In the following Subsection 4.1 we deduce upper bounds on the convergence rates of the MISE when the sample size n tends to infinity, while, in Subsection 4.2, we show that those rates are optimal in the minimax sense with respect to any estimator based on the given observations. Note that we consider the MISE uniformly over the function class $m \in \mathcal{M}$ while the error density $f_\varepsilon$ is viewed as unknown but fixed; in particular, it does not change in n.
4.1 Convergence rates – upper bounds
We fix some arbitrary g ∈ G. Then we study the expectation of the functional Φn(g). Exploiting
(3.3) and the strict stationarity of (Yn)n, we obtain that
$$\begin{aligned}
E\,\Phi_n(g) \;=\;& E\big|Y_{K+1} - g^{[K]}(x_0, Y_1, \dots, Y_K)\big|^2 \;+\; E\big|Y_{K+2} - g\big(g^{[K]}(x_0, Y_1, \dots, Y_K),\, Y_{K+1}\big)\big|^2 \\
=\;& E\,\mathrm{var}(Y_{K+1} \mid Y_1, \dots, Y_K) \;+\; E\,\mathrm{var}(Y_{K+2} \mid Y_1, \dots, Y_{K+1}) \;+\; E|W_K - G_K(g)|^2 \;+\; E|W_{K+1} - G_{K+1}(g)|^2\,,
\end{aligned}$$
where we define
$$W_j := E\big(m^{[j]}(X_1, Y_1, \dots, Y_j) \mid Y_1, \dots, Y_j\big)\,, \qquad G_j(g) := g^{[j]}(x_0, Y_1, \dots, Y_j)\,, \quad g \in G\,,$$
for $j = K, K+1$. For any $g_1, g_2 \in G$ we deduce that
As a side result of this inequality we derive by induction that all random variables Zt+n,n,
t, n ≥ 0, are located in L∞(Ω,A, P ), i.e. the Banach space consisting of all random variables
X on (Ω,A, P ) with ess sup|X| < ∞. Moreover, Condition (A1) and the nonnegativity of the
εn provide that Zt+n,n ≥ 0 almost surely for all t, n ≥ 0 also by induction. Since ‖mx‖∞ +
‖my‖∞Rε ≤ c1 < 1 and (6.1) holds true we easily conclude by the triangle inequality that
(Zt+n,n)n≥0 are Cauchy sequences in L∞(Ω,A, P ) for all integer t ≥ 0. By the geometric
summation formula, we also deduce that
$$\operatorname{ess\,sup} Z_{t+n,n} \;\le\; \alpha_0''/(1-c_1)\,, \qquad (6.2)$$
for all t, n ≥ 0. The completeness of L∞(Ω,A, P ) implies convergence of each (Zt+n,n)n≥0
to some Z∗t ∈ L∞(Ω,A, P ) with respect to the underlying essential supremum norm. The
nonnegative random variables on (Ω,A, P ) which are bounded from above by α′′0/(1−c1) almost
surely form a closed subset of L∞(Ω,A, P ) as L∞(Ω,A, P )-convergence implies convergence
in distribution. We conclude that (6.2) carries over to the limit variables $Z^*_t$, so that $Z^*_t \in \big[0, \alpha_0''/(1-c_1)\big]$ almost surely for all $t \ge 0$. Since $L_\infty(\Omega,\mathcal{A},P)$-convergence also implies convergence in probability, the inequality (6.1) also yields that
$$P\big[|Z_{t+n,n} - Z^*_t| > \delta\big] \;\le\; \limsup_{m\to\infty} P\big[|Z_{t+n,n} - Z_{t+m,m}| > \delta/2\big] \;\le\; 2\delta^{-1}\sum_{m=n}^{\infty} \operatorname{ess\,sup}\,|Z_{t+m,m} - Z_{t+m+1,m+1}| \;\le\; 2\delta^{-1}\alpha_0''\, c_1^{n}/(1-c_1)\,,$$
by Markov's inequality for any δ > 0, so that the sum of the above probabilities taken over all $n \in \mathbb{N}$ is finite. Hence, $(Z_{t+n,n})_n$ converges to $Z^*_t$ almost surely as well. By the recursive definition of
the Zt,s and the joint initial values Zt,0 = 0 we realize that, for any fixed n, the random variables
Zt+n,n, t ≥ 0, are identically distributed. We conclude that all limit variables Z∗t possess the same
distribution. As m is continuous and we have $Z_{t+n+1,n+1} = m(Z_{t+n+1,n}, \varepsilon_{-t-1})$, the established almost sure convergence implies that
$$Z^*_t \;=\; m(Z^*_{t+1}, \varepsilon_{-t-1}) \;=\; m\big(Z^*_{t+1},\, Z^*_{t+1}\varepsilon_{-t-1}\big)\,,$$
almost surely for all t ≥ 0. The variable Zt+1+n,n is measurable in the σ-algebra generated by
Zt+1+n,0, ε−t−2, ε−t−3, . . . , ε−t−n−1. Hence, an appropriate version of the almost sure limit Z∗t+1
is measurable in the σ-algebra generated by all εs, s ≤ −t − 2, and therefore independent of
$\varepsilon_{-t-1}$. We conclude that, by putting $X_0 := Z^*_0$, the induced sequence $(X_n)_n$, which is defined according to the recursive relation (1.1), is strictly stationary and causal since $X_0, \varepsilon_0, \varepsilon_1, \dots$ are independent, so that all marginal distributions of $(X_n)_{n-s \in T}$, with any finite $T \subseteq \mathbb{N}_0$, do not depend on the integer shift $s \ge 0$. Also all $X_n$ have the same distribution as the $Z^*_t$, so that
Xn ∈ [0, α′′0/(1− c1)] holds true almost surely. Uniqueness of the stationary distribution can be
seen as follows: We have Zt+n,n = m[n](0, ε−n−t, . . . , ε−1−t) for all t and n in the notation (3.1).
Now let $(X_n)_n$ be a strictly stationary and causal time series where $X_1$ is independent of all $\varepsilon_s$, integer s. Again, by the mean value theorem and induction we obtain that
$$\big|Z_{t+n,n} - m^{[n]}(X_1, \varepsilon_{-n-t}, \dots, \varepsilon_{-1-t})\big| \;\le\; \alpha_0''\, c_1^{n}\,,$$
so that the stochastic process $\big(m^{[n]}(X_1, \varepsilon_{-n-t}, \dots, \varepsilon_{-1-t})\big)_n$ converges to the random variable $Z^*_t$ almost surely and, hence, in distribution. As we claim that $(X_n)_n$ is a strictly stationary and causal solution of (1.1), all the random variables $m^{[n]}(X_1, \varepsilon_{-n-t}, \dots, \varepsilon_{-1-t})$ have the same distribution as $X_1$. Therefore the distributions of $X_1$ and $Z^*_t$ coincide, so that the distribution of $Z^*_t$ is indeed the unique strictly stationary distribution when m is the autoregression function and $f_\varepsilon$ is the error density.
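A small numerical illustration of the contraction used in this uniqueness argument (the link and the innovation law below are our own illustrative choices, not taken from the paper): two iterations of the recursion driven by the same innovations but started at different values coalesce at the geometric rate $c_1^n$.

```python
import numpy as np

def iterate_from(m, x_start, eps_seq):
    """Iterate x <- m(x, x * eps) along a fixed innovation path."""
    x = x_start
    for e in eps_seq:
        x = m(x, x * e)
    return x

rng = np.random.default_rng(1)
m_lin = lambda x, y: 0.5 + 0.3 * x + 0.2 * y     # contracting link, c_1 = 0.7 for eps in [0, 2]
eps_path = rng.uniform(0.0, 2.0, size=40)

for n in (5, 10, 20, 40):
    gap = abs(iterate_from(m_lin, 0.0, eps_path[:n]) - iterate_from(m_lin, 10.0, eps_path[:n]))
    print(n, gap)                                # bounded by 0.7**n * 10, as in the proof
```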
(b) Since $\{X_1 \ge L_X\} = \bigcap_{n\in\mathbb{N}}\{X_1 \ge L_X - 1/n\}$, we have $X_1 \ge L_X$ almost surely. The monotonicity constraints in Condition (A3) imply that $X_2 = m(X_1, \varepsilon_1) \ge m(L_X, 0)$ a.s. The strict stationarity of the process $(X_n)_n$ provides that $X_1 \ge m(L_X, 0)$ a.s. holds true as well. By the definition of $L_X$ it follows that $L_X \ge m(L_X, 0)$. Now we assume that $L_X > m(L_X, 0)$. As m is continuous there exists some $\rho > 0$ with $L_X > m(L_X + \rho, \rho)$. Thus,
$$P[X_1 \le L_X + \rho,\ \varepsilon_1 \le \rho] \;\le\; P\big[X_2 = m(X_1, \varepsilon_1) \le m(L_X + \rho, \rho)\big] \;\le\; P[X_1 < L_X] \;=\; 0\,,$$
due to Condition (A3) and the definition of $L_X$. On the other hand we have
$$P[X_1 \le L_X + \rho,\ \varepsilon_1 \le \rho] \;=\; P[X_1 \le L_X + \rho]\cdot P[\varepsilon_1 \le \rho] \;=\; 0\,,$$
so that we obtain a contradiction to Condition (A4) and the definition of LX . We conclude that
LX = m(LX , 0). If another solution x ≥ 0 existed we would have
by the mean value theorem, where $\xi_K$ denotes some specific value between $X_{K+1}$ and $G_K(g)$. In order to determine that minimum only the values $\lambda \in \big\{0,\ 1,\ |m(X_{K+1}, Y_{K+1}) - g(X_{K+1}, Y_{K+1})|/|X_{K+1} - G_K(g)|\big\}$ have to be considered, where the last point has to be taken into account only if $|m(X_{K+1}, Y_{K+1}) - g(X_{K+1}, Y_{K+1})| \le |X_{K+1} - G_K(g)|$. In all of those cases we derive that the above term is bounded from below by $|m(X_{K+1}, Y_{K+1}) - g(G_K(g), Y_{K+1})|^2/2$. Then, taking the expectation of these random variables yields the desired result.
Proof of Theorem 4.1: By Condition C we have $\|g\|_2^2 \le c_G^2$ for all $g \in G$. As $\hat m \in G$ holds true by the definition of $\hat m$, we have $\|\hat m\|_2^2 \le c_G^2$ almost surely. Condition B yields that $\|m\|_2^2 \le c_M^2$ for the true regression function m since |m| is bounded by $c_M$ on the entire support of $(X_1, Y_1)$. Therefore, we have $\|\hat m - m\|_2 \le c_M + c_G$ almost surely and
$$E\|\hat m - m\|_2^2 \;=\; \int_{0}^{\infty} P\big[\|\hat m - m\|_2^2 > \delta\big]\,d\delta \;\le\; \delta_n + \int_{\delta_n}^{(c_M + c_G)^2} P\big[\|\hat m - m\|_2^2 > \delta\big]\,d\delta\,, \qquad (6.7)$$
where $\delta_n := c_\delta\cdot n^{-\beta/(\beta+1)}$ for some constant $c_\delta > 1$ still to be chosen. Clearly we have $0 < \delta_n < (c_M + c_G)^2$ for n sufficiently large. By the definition of $\hat m$ we derive that $\Phi_n(\hat m) \le \Phi_n(g_0)$ holds true almost surely. Writing $\Delta_n(g) := \Phi_n(g) - E\,\Phi_n(g)$, equation (4.2) provides that $\|\hat m - m\|_2^2 > \delta$ implies the existence of some $g \in G$ with $\|m - g\|_2^2 > \delta$ such that
$$\Delta_n(g) - \Delta_n(g_0) \;\le\; -\tfrac{1}{4}\|m - g\|_2^2 \;-\; \tfrac{1}{4}\delta \;+\; \frac{4}{(1-c_1)^2}\,\|m - g_0\|_2^2 \;+\; R_K\,.$$
Condition C yields that
$$\sup_{m\in\mathcal{M}}\|m - g_0\|_2^2 \;=\; O\big(n^{-\beta/(1+\beta)}\big)\,.$$
Therefore $c_\delta$ can be selected sufficiently large (uniformly for all $m\in\mathcal{M}$) such that
$$|\Delta_n(g) - \Delta_n(g_0)| \;\ge\; \tfrac{1}{4}\|m - g\|_2^2\,,$$
for all δ > δn. Therein, note that, by (4.3) and the imposed choice of K, the term RK is
asymptotically negligible uniformly with respect to m. Hence, we have shown that
$$P\big[\|\hat m - m\|_2^2 > \delta\big] \;\le\; P\Big[\exists\, g \in G:\ \|m-g\|_2^2 > \delta\,,\ \big|\Delta_n(g)-\Delta_n(g_0)\big| \ge \tfrac{1}{4}\|m-g\|_2^2\Big] \;\le\; P\Big[\exists\, g \in G:\ \big|\Delta_n(g)-\Delta_n(g_0)\big| \ge \tfrac{1}{4}\max\big\{\delta,\ \|m-g\|_2^2\big\}\Big]\,,$$
holds true for all $\delta \in \big(\delta_n, (c_G + c_M)^2\big)$ and for all $n > N$ with some N which does not depend on m but only on the function class $\mathcal{M}$. By the representation
where the finite positive constants $d_1$ and $d_2$ only depend on $c_1$, $R''$ and $R_\varepsilon$. Applying Theorem 1 from Merlevede et al. (2011) to (6.8) yields that
$$\begin{aligned}
P\big[\|\hat m - m\|_2^2 > \delta\big] \;\le\; 4\,(\#G)\cdot\Big(\ & (n-K-1)\exp\big\{-D_1 (n-K+1)^{\gamma}\big\} \;+\; \exp\big\{-D_2 (n-K-1)^{2/3}\big\} \\
& +\; \exp\Big\{-\frac{D_2}{3 d_1}(n-K-1)\,\delta\Big\} \;+\; \exp\Big\{-\frac{D_2}{3 d_2}(n-K-1)\, c_1^{-2K}\,\delta^2\Big\} \\
& +\; \exp\big\{-D_3 (n-K-1)\big\}\cdot \exp\Big(D_4\, \frac{(n-K-1)^{\gamma(1-\gamma)}}{\log^{\gamma}(n-K-1)}\Big)\Big)\,,
\end{aligned} \qquad (6.17)$$
for all $\delta \in (\delta_n, (c_G + c_M)^2)$ and $n > \max\{4, N\}$, where the positive finite constants $D_1, \dots, D_4$ only depend on $c_1$, $x_0$, $R''$, $R_\varepsilon$, $c_G$ and $\gamma_2$. Considering the upper bound on the cardinality of G provided by Condition C, we apply the inequality (6.17) to (6.7) and obtain that
$$\begin{aligned}
\sup_{m\in\mathcal{M}} E\|\hat m - m\|_2^2 \;\le\;\; & \delta_n \;+\; 4(c_M + c_G)^2 (n-K-1)\exp\big\{c_N n^{1/(1+\beta)} - D_1 (n-K+1)^{\gamma}\big\} \\
& +\; 4(c_M + c_G)^2\cdot\exp\big\{c_N n^{1/(1+\beta)} - D_2 (n-K+1)^{2/3}\big\} \\
& +\; 4(c_M + c_G)^2\cdot\exp\big\{c_N n^{1/(1+\beta)} - D_3 (n-K-1)\big\}\cdot\exp\Big(D_4\, \frac{(n-K-1)^{\gamma(1-\gamma)}}{\log^{\gamma}(n-K-1)}\Big) \\
& +\; 4(c_M + c_G)^2\cdot\exp\Big\{c_N n^{1/(1+\beta)} - \frac{D_2}{3 d_2}(n-K-1)\, c_1^{-2K}\,\delta_n^2\Big\} \\
& +\; \frac{12\, d_1}{D_2 (n-K-1)}\exp\Big\{c_N n^{1/(1+\beta)} - \frac{D_2}{3 d_1}(n-K-1)\,\delta_n\Big\}\,,
\end{aligned}$$
for all $n > \max\{4, N\}$. Due to $\delta_n = c_\delta\cdot n^{-\beta/(1+\beta)}$ with $c_\delta$ sufficiently large, the selection of K and the fact that γ can be chosen arbitrarily close to 1 (select $\gamma_2$ large enough) while β > 1 by Condition C, the first addend $\delta_n$ dominates asymptotically in the above inequality, which finally leads to the desired rate.
Proof of Lemma 4.2: The derivative of $m_{(x)}(y) := m(x,y)$ with respect to y is bounded from below by $c_M'\alpha_0'$ for any $m \in \mathcal{M}'$ and $x \ge \alpha_0'$, so that the functions $m_{(x)} : [0,\infty) \to [m(x,0),\infty)$, $x \ge \alpha_0'$, increase strictly monotonically and, hence, are invertible. For any $c \ge 0$ we have $\{(x,z) : m_{(x)}^{-1}(z) \le c\} = \{(x,z) : m(x,c) - z \ge 0\}$, so that Borel measurability of the function $(x,z) \mapsto m_{(x)}^{-1}(z)$ follows from that of m. By Fubini's theorem, it follows from there that
$$P[X_1 \le z] \;=\; \int P\big[m(x, \varepsilon_1) \le z\big]\, dP_{X_1}(x) \;=\; \int P\big[\varepsilon_1 \le m_{(x)}^{-1}(z)\big]\, dP_{X_1}(x)
\;=\; \int \int_{t=0}^{m_{(x)}^{-1}(z)} f_\varepsilon(t)\, dt\, dP_{X_1}(x) \;=\; \int_{s=0}^{z}\Big(\int \frac{f_\varepsilon\big(m_{(x)}^{-1}(s)\big)}{m_y\big(x,\, m_{(x)}^{-1}(s)\big)}\, dP_{X_1}(x)\Big)\, ds\,,$$
for all $z \ge 0$, so that $P_{X_1}$ is absolutely continuous and has the Lebesgue density
$$f_X(z) \;=\; E\,\frac{f_\varepsilon\big(m_{(X_1)}^{-1}(z)\big)}{m_y\big(X_1,\, m_{(X_1)}^{-1}(z)\big)}\,,$$
which is bounded from above by $\|f_\varepsilon\|_\infty/(c_M'\alpha_0')$ as $X_1 \ge \alpha_0'$ holds true almost surely. Moreover the support of $f_X$ is included in $[\alpha_0', R'']$ with $R''$ as in Condition B. Therefore we have
$$\|f\|_2^2 \;=\; \int\!\!\int f^2(x, xt)\, f_X(x) f_\varepsilon(t)\, dx\, dt \;\le\; \int\!\!\int_{[\alpha_0', R'']\times[0, R_\varepsilon]} f^2(x, xt)\, dx\, dt\cdot \|f_\varepsilon\|_\infty^2/(c_M'\alpha_0') \;\le\; \int\!\!\int_{[\alpha_0', R'']\times[0, R'' R_\varepsilon]} f^2(x, y)\, dx\, dy\cdot \|f_\varepsilon\|_\infty^2/\big(c_M'(\alpha_0')^2\big)\,,$$
so that $\|\cdot\|_2$ is dominated by $\|\cdot\|_\lambda$ multiplied by a uniform constant.
Proof of Lemma 4.3: It follows immediately that
$$Q\big((a, b]\big) \;\ge\; c\cdot\lambda\big((a, b]\big)\,,$$
for all left-open intervals $(a, b] \subseteq I$, since
$$Q\big((a, b]\big) \;\ge\; Q\big([a + 1/m,\, b]\big) \;\ge\; c\cdot\lambda\big([a + 1/m,\, b]\big) \;\to\; c\,(b - a) \;=\; c\,\lambda\big((a, b]\big)\,,$$
as m→∞ for all (a, b] ⊆ I. Let F denote the collection of all unions of finitely many intervals
(a, b] ⊆ I. One can verify that F forms a ring of subsets of I which is closed with respect to
the relative complement related to I. Moreover, F can equivalently be defined by the collection
of all unions of finitely many intervals (a, b] ⊆ I which are pairwise disjoint. For all f ∈ F we
conclude that
$$Q(f) \;=\; Q\Big(\bigcup_{k=1}^{n}(a_k, b_k]\Big) \;=\; \sum_{k=1}^{n} Q\big((a_k, b_k]\big) \;\ge\; c\cdot\sum_{k=1}^{n} \lambda\big((a_k, b_k]\big) \;=\; c\cdot\lambda(f)\,,$$
where the intervals (ak, bk] are assumed to be pairwise disjoint. The σ-field generated by the
ring F equals the Borel σ-field of I as F contains all left-open intervals of I. As Q and λ are
finite measures on that Borel σ-field their restrictions to the domain F are finite pre-measures.
By Carathéodory's extension theorem, Q and λ fulfil
$$Q(g) \;=\; \inf\Big\{\sum_{j=1}^{\infty} Q(f_j) \;:\; g \subseteq \bigcup_{j\in\mathbb{N}} f_j\,,\ f_j \in F\Big\}\,, \qquad
\lambda(g) \;=\; \inf\Big\{\sum_{j=1}^{\infty} \lambda(f_j) \;:\; g \subseteq \bigcup_{j\in\mathbb{N}} f_j\,,\ f_j \in F\Big\}\,.$$
Since $\sum_{j=1}^{\infty} Q(f_j) \ge c\cdot\sum_{j=1}^{\infty}\lambda(f_j)$, the outer measure representation of Q and λ yields that $Q(g) \ge c\cdot\lambda(g)$ for all Borel subsets g of I. Clearly, for all elementary non-negative measurable mappings φ on I, $\varphi = \sum_{k=1}^{n}\varphi_k\cdot 1_{g_k}$, for some real numbers $\varphi_k \ge 0$ and pairwise disjoint Borel subsets $g_k$, $k = 1,\dots,n$, of I, it follows that
$$\int \varphi(x)\, dQ(x) \;\ge\; c\cdot\int \varphi(x)\, d\lambda(x)\,.$$
This inequality extends to all non-negative measurable functions ϕ by the monotone conver-
gence theorem as any such function ϕ is pointwise approximable by a pointwise monotonically
increasing sequence of elementary non-negative measurable mappings on the domain I.
Now we assume that the Lebesgue density q of Q on I exists. Then we put φ equal to the indicator function of the preimage $q^{-1}((-\infty, c)) \subseteq I$, so that
$$\int (q(x) - c)\,\varphi(x)\, d\lambda(x) \;=\; \int \varphi(x)\, dQ(x) \;-\; c\cdot\int \varphi(x)\, d\lambda(x) \;\ge\; 0\,.$$
As the function $(q - c)\varphi$ is non-positive, it follows that, for Lebesgue almost all $x \in I$, we have $q(x) = c$ or $\varphi(x) = 0$. Thus, for Lebesgue almost all $x \in I$ we have $q(x) \ge c$.
Proof of Theorem 4.2: The statistical experiment under which one observes the data $\bar X_1, \bar Y_1, \dots, \bar Y_n$ with $\bar X_1 = \log X_1$ and $\bar Y_j = \log Y_j$ from model (1.1) is more informative than the model under consideration in which only $Y_1, \dots, Y_n$ are accessible. Hence, as we are proving a lower bound, we may assume that $\log X_1$ is additionally observed and that the $\bar Y_j$ are recorded instead of the $Y_j$.
First let us verify that, for all binary matrices θ, the functions $m_\theta$ from (4.6) lie in $\mathcal{M}$. The support of the function $m_\theta - m_G$ is included in
$$\Big\{(x, y) \in \mathbb{R}^2 \;:\; x \in \alpha_0/(1-\alpha_1) + \tfrac{1}{16} R_\varepsilon \beta_1'\alpha_0\cdot[1, 2]\,,\ 0 \le y \le R_\varepsilon x/2\Big\}\,,$$
as a subset. Thus we derive that
$$\|(m_\theta)_x - (m_G)_x\|_\infty \;\le\; 8 c_H b_n^{1-\beta}\cdot\Big\{\frac{8}{R_\varepsilon\beta_1'\alpha_0}\,\|H_x\|_\infty + \frac{4}{\alpha_0}\,\|H_y\|_\infty\Big\}\,, \qquad
\|(m_\theta)_y - (m_G)_y\|_\infty \;\le\; \frac{8 c_H R_\varepsilon}{\alpha_0}\,\|H_y\|_\infty\cdot b_n^{1-\beta}\,, \qquad (6.18)$$
while $(m_G)_x \equiv \alpha_1$ and $(m_G)_y \equiv \beta_1$. By Condition D we have $\alpha_1 + \beta_1 R_\varepsilon < c_1$, so that Condition (A1) holds true for all $m_\theta$ for $c_H > 0$ sufficiently small. All functions $m_\theta$ coincide with $m_G$ on their restriction to
$$\big[0,\ \alpha_0/(1-\alpha_1) + \tfrac{1}{16} R_\varepsilon\beta_1'\alpha_0\big] \times [0, \infty)\,, \qquad (6.19)$$
while, for $x > \alpha_0/(1-\alpha_1) + R_\varepsilon\beta_1'\alpha_0/16$, the upper bound
$$\|m_\theta - m_G\|_\infty \;\le\; c_H b_n^{\beta}\,\|H\|_\infty\,,$$
for $c_H$ small enough, suffices to verify the envelope Condition (A2). By (6.18) and the small $c_H$ we also establish that
$$\inf_{x \ge 0,\, y \ge 0} (m_\theta)_x \;\ge\; \alpha_1/2 \;>\; 0\,, \qquad \inf_{x \ge 0,\, y \ge 0} (m_\theta)_y \;\ge\; \beta_1/2 \;>\; 0\,,$$
for all θ, which guarantees the monotonicity Condition (A3). Moreover, the above inequalities guarantee Condition C' so that the stationary distribution of $(X_n)_n$ under any $m_\theta$ has a bounded Lebesgue density $f_{X,\theta}$ by Lemma 4.2. Also, as $c_H$ is sufficiently small with respect to all other constants, we can verify the smoothness constraint in Condition B for all admitted values of θ.
As we have shown that
$$\mathcal{M}_P \;:=\; \big\{m_\theta \,:\, \theta \in \Theta_n\big\} \;\subseteq\; \mathcal{M}\,,$$
where $\Theta_n$ denotes the set of all binary $(b_n \times b_n)$-matrices, we follow a usual strategy of minimax theory and put an a-priori distribution on $\mathcal{M}_P$. Concretely, we assume that all components of the binary matrix θ are independent and satisfy $P[\theta_{j,k} = 1] = 1/2$, and we bound the minimax risk from below by the corresponding Bayesian risk. However, we face the problem that the loss function depends on the stationary distribution of the data and, hence, on $m_\theta$. As all $m_\theta$ equal $m_G$ on the interval (6.19), we have $\alpha_0/(1-\alpha_1) = m_\theta\big(\alpha_0/(1-\alpha_1),\, 0\big)$. By Proposition 2.1(b), the left endpoint of the stationary distribution under the regression function $m_\theta$ equals $\alpha_0/(1-\alpha_1)$ for all $\theta \in \Theta_n$, so that
$$J \;:=\; J(m_\theta) \;=\; \big(\alpha_0/(1-\alpha_1) + R_\varepsilon\beta_1'\alpha_0/16,\ \alpha_0/(1-\alpha_1) + R_\varepsilon\beta_1'\alpha_0/8\big]\,,$$
for all θ ∈ Θn. The closure of this interval contains the whole support of mθ−mG for all θ ∈ Θn.
Then, (4.5) yields that
$$\|\hat m_n - m_\theta\|_2^2 \;\ge\; \mu_6\, r_\varepsilon\cdot\int_{0}^{R_\varepsilon/2}\!\int_{J} \big|\hat m_n(x, xt) - m_\theta(x, xt)\big|^2\, dx\, dt\,,$$
where the $\|\cdot\|_2$-norm depends on θ. We introduce the intervals
$$I_{j,k} \;:=\; i_{j,k} + \big(-R_\varepsilon\beta_1'\alpha_0/(64 b_n),\ R_\varepsilon\beta_1'\alpha_0/(64 b_n)\big)\times\big(-R_\varepsilon/(8 b_n),\ R_\varepsilon/(8 b_n)\big)\,, \qquad j, k = 0, \dots, b_n - 1\,,$$
which are pairwise disjoint and subsets of $[0, R_\varepsilon/2]\times J$. We deduce that