Difference based estimators and infill statistics

José R. León · Carenne Ludeña

Abstract Infill statistics, that is, statistical inference based on very dense observations over a fixed domain, has become of late a subject of growing importance. On the other hand, it is a known phenomenon that in many cases infill statistics do not provide optimal rates. The degree of sub-optimality is related to how much parameter-related information is lost because of dense sampling, which in turn is related to sample path regularity. In the stationary Gaussian case this is determined by the large value behaviour of the spectral density and its derivatives. Moreover, many interesting non stationary examples such as non linear functionals of stationary Gaussian processes or diffusion processes driven by a stationary increment Gaussian process can also be seen to depend on the large value behaviour of the spectral density of the underlying process. In this article we discuss several examples in a unified frequency domain approach providing a general framework relating sample path regularity to estimation rates. This includes examples such as volatility estimation for diffusions and fractional diffusions, multifractals and non-linear functions of Gaussian processes. As a final example we include the problem of estimation in the presence of an additive white noise, known as the nugget effect or micro-structure error.

Keywords Gaussian processes, diffusions and fractional diffusions, increasing domain asymptotics, infill statistics, microergodicity, multifractals, non-linear functionals of Gaussian processes, nugget effect, spatial statistics, spectral analysis, volatility

Mathematics Subject Classification (2000) 62F12 · 62M15 · 62M30

The authors would like to thank Proyecto LOCTI (Ministerio de Ciencia Tecnología e Innovación de Venezuela) “Estudio del transporte de contaminantes en el Lago de Valencia”.

José R. León, E-mail: [email protected], Escuela de Matemáticas, Facultad de Ciencias UCV, Caracas, Venezuela.
Carenne Ludeña, E-mail: [email protected], Escuela de Matemáticas, Facultad de Ciencias UCV, Caracas, Venezuela.
Consider an $L^2$ process $X_t$, with $t \in R^d$, whose law depends on a certain parameter θ ∈ Θ but that is only observed over a bounded domain $D \subset R^d$. Because observations are restricted to the bounded area D, statistical inference on θ must be based on progressively more densely sampled points as the sample size increases over the fixed domain. This is referred to as infill statistics, following Cressie (1993), or fixed domain asymptotics (Stein, 1999). As is well known (see for example Stein, 1999; Zhang and Zimmerman, 2005; Chan and Wood, 2004 and references therein), this restriction may yield suboptimal rates.
We assume we observe $X_{t_i}$ with the multi-index $t_i$ defined by $t_i := (i_1\Delta, \ldots, i_d\Delta)$, $1 \le i_j \le n$, $j = 1, \ldots, d$, so the total number of observations is $n^d$. Purely infill asymptotics are characterized by n∆ → C as n → ∞ for a certain positive constant C.
Without loss of generality assume C = 1. Even under very mild regularity assumptions on X, less information is available as ∆ → 0, as follows from the underlying regularity of the sample paths. However, despite the growing literature in the infill setting, there seems to be no general rule assessing estimation performance based on model assumptions.
With this in mind, in this article we consider a series of one dimensional examples of infill based statistical inference (we also include higher dimensional generalizations whenever available), based on q-order differences in a frequency domain framework. Differences are important in infill statistics because they provide a method for “de-regularizing” the original data, hence increasing the amount of information available. Our main goal is to provide insight into what the main issues are, what results can be expected and what may be interesting problems for further development. To this end we hope to illustrate the intimate yet not thoroughly exploited relationship between spatial statistics and increment or variations based statistics, including recent very powerful tools based on Malliavin calculus (see for example Nualart and Ortiz-Latorre, 2008; Tudor and Viens, 2009, and the references therein).
Let $P_\theta$ be the law of $X_t$, t ∈ D. Consider a certain function of the parameter, h = h(θ). Following Stein (1999), Section 6.2, a necessary although not sufficient condition for consistently estimating h is that it is microergodic. Given the class of probability models $\{P_\theta, \theta \in \Theta\}$ and a function $h : \Theta \to R^q$, h is defined to be microergodic if $h(\theta_1) \ne h(\theta_2) \Rightarrow P_{\theta_1} \perp P_{\theta_2}$.
In general, as will be discussed in more detail in Section 3 and in the example below,
any function h that requires precise information on the low frequencies requires precise
information on the full range of values of X (which is not observable over an a priori
fixed domain) and hence will not be consistently estimable in a purely infill framework
and will not be microergodic.
To develop this idea we shall dwell on the following example proposed by Stein (1999).
Example 1 Estimating the variance of the sample mean (Stein, 1999, pp. 144-148): let Z(t) be a centered stationary Gaussian process with spectral density f = f(θ). Assume that we split $D = [0, 1]^d$ into $n^d$ sub-cubes of side 1/n and set $t_i$ to be the center of each sub-cube. Further assume we only observe the process Z at the points $t_i$. Consider the quantity of interest to be the (non-observable) random variable $I(Z) = \int Z(t)\,dt$. A natural choice is of course to approximate I(Z) by the observable $Z_n = n^{-d}\sum_i Z(t_i)$ and study the approximation error of this procedure, which is measured by its variance
$$V_n := Var(I(Z) - Z_n) = \int f(w)\,g_n(w)\,dw,$$
$$g_n(w) = \Big(1 - \prod_{j=1}^{d}\frac{2n\sin(w_j/(2n))}{w_j}\Big)^2 \prod_{j=1}^{d}\frac{\sin^2(w_j/2)}{n^2\sin^2(w_j/(2n))}.$$
The problem is then assessing the rate of convergence of $V_n$ and constructing consistent estimators for the correctly normalized variance.
Both problems will depend on the smoothness of the process Z. If $f(w) \sim \theta|w|^{-4}$ as |w| → ∞ then
$$n^4 V_n \to \frac{1}{144}\int w^2\sin^2(w/2)\,f(w)\,dw. \qquad (1)$$
Moreover, since the limiting quantity depends on f over all its range it is not consistently estimable based on the observations. However, if f is less regular, i.e. $f(w) \sim \theta|w|^{-2p}$ as |w| → ∞ for d = 1, 2 and d/2 < p < 2, then $n^{2p}\,Var(I(Z) - Z_n) \to C(\theta)$, where C(θ) is the optimal variance (see Theorems 2 and 4 in Sections 5.2 and 5.3 in Stein, 1999), which only depends on the high frequency behaviour of f and whence can be estimated, as developed below in detail in the one dimensional case.
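The pointwise limit behind (1) can be checked numerically. The following is a minimal sketch for d = 1 only; the function names `g_n` and `limit_weight` are ours, and the comparison relies on the expansion $n^4 g_n(w) \to w^2\sin^2(w/2)/144$ for fixed w, which follows from the formula for $g_n$ above.

```python
import numpy as np

def g_n(w, n):
    """Weight g_n(w) of Example 1 in dimension d = 1."""
    x = w / (2.0 * n)
    bias = (1.0 - 2.0 * n * np.sin(x) / w) ** 2
    ratio = np.sin(w / 2.0) ** 2 / (n ** 2 * np.sin(x) ** 2)
    return bias * ratio

def limit_weight(w):
    """Pointwise limit of n^4 g_n(w), the integrand weight appearing in (1)."""
    return w ** 2 * np.sin(w / 2.0) ** 2 / 144.0

w = np.linspace(0.5, 5.0, 10)   # frequencies away from the zeros of sin(w/2)
n = 200
rel_err = np.max(np.abs(n ** 4 * g_n(w, n) / limit_weight(w) - 1.0))
print(rel_err)
```

The relative error shrinks as n grows with w fixed, which is the pointwise statement only; the convergence of the integral in (1) additionally needs the tail condition on f.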
Set d = 1. Consistent estimation of the parameter θ for $f(w) \sim \theta|w|^{-2p}$ as |w| → ∞ can be achieved by considering successive applications of the difference operator $\Delta_n Z(t) := Z(t + 1/n) - Z(t)$. Indeed, following Stein (1999), page 167, the statistic
$$\frac{1}{C_p}\sum_i \big[(\Delta_n)^p Z(t_i)\big]^2$$
is seen to be a consistent estimator of θ with $C_p = \int_{-\infty}^{\infty}|w|^{-2p}(2\sin(w/2))^{2p}\,dw$, which only depends on the behaviour of f for large values of w. The trick here is that using successive increments introduces sin(w/2n) instead of sin(w/2) and thus only the large-value behaviour of f is relevant, provided it is of the stated form.
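As an illustration of the case p = 1, the statistic can be tried on a simulated path. This is a sketch under stated assumptions, not the construction in Stein (1999): we take a stationary O.U. process with covariance $\sigma^2 e^{-\nu|t|}$ (our choice), use the convention $K(t) = \int e^{iwt}f(w)\,dw$ so that $f(w) \sim \theta|w|^{-2}$ with $\theta = \sigma^2\nu/\pi$, and use $C_1 = 2\pi$. The helper names are hypothetical.

```python
import numpy as np

def pth_difference(z, p):
    """Apply the first-difference operator p times to a regularly sampled series."""
    return np.diff(z, n=p)

def simulate_ou(n, sigma2, nu, rng):
    """Exact simulation of a stationary O.U. process at t = 0, 1/n, ..., 1."""
    a = np.exp(-nu / n)                           # one-step autocorrelation
    innov_sd = np.sqrt(sigma2 * (1.0 - a * a))    # innovation standard deviation
    z = np.empty(n + 1)
    z[0] = rng.normal(0.0, np.sqrt(sigma2))       # start in the stationary law
    eps = rng.normal(0.0, innov_sd, size=n)
    for i in range(n):
        z[i + 1] = a * z[i] + eps[i]
    return z

rng = np.random.default_rng(0)
n, sigma2, nu = 20000, 1.5, 2.0
z = simulate_ou(n, sigma2, nu, rng)

C1 = 2.0 * np.pi                                  # integral of |w|^-2 (2 sin(w/2))^2
theta_hat = np.sum(pth_difference(z, 1) ** 2) / C1
theta_true = sigma2 * nu / np.pi                  # tail constant under our convention
print(theta_hat, theta_true)
```

Only the product $\sigma^2\nu$ enters the limit, so the two parameters of the covariance are not separately recovered by this statistic.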
We remark that in a non strictly infill domain (if Cn := ∆/n → ∞) it is always possible to construct estimators of the variance of the sample mean which attain $\sqrt{C_n}$ rates under general ergodicity conditions.
The above discussion introduces several points we will develop throughout the article. The first is that estimating rough signals is easier than estimating regular signals in an infill setting, whence the need to de-regularize. The second is that differencing the observed series provides estimators which depend only on high frequency behaviour in many interesting cases, as in the example discussed above. The third is that any estimation problem that requires knowledge of the low frequencies will in general not be consistent, so that the process will not be microergodic. The fourth is that whether the infill setting provides the same estimation rates as the classical ergodic setting, i.e. whether the Fisher information matrix is asymptotically equivalent to the conditional information matrix (see Section 2 below for more details), can be determined in many cases based on whether the parameters of interest can be estimated using sufficiently high order differences of the original observations. In this article we assume the sampling strategy is regular and follow a spectral approach in order to establish asymptotic results. Regularly spaced observations (perhaps with missing observations) are a standard restriction for stationary or stationary increment processes and typical preprocessing techniques require converting the irregular sample into a regular one (see for example Cressie, 1993, Section 2.4). In the non stationary case on the other hand, for example when estimating volatilities for diffusions or fractional diffusions, there has been much more interest in developing estimators based on non regular samples. For completeness' sake a brief discussion on non regular samples will also be included.
Most of the examples discussed in this article are closely related to work the authors
have developed in diverse frameworks dealing with statistical inference for densely sam-
pled data based on q-order differences. However, the main thread follows the results
of Stein (1999) or more recently of Zhang and Zimmerman (2005) in their thorough
review of maximum likelihood estimation in the infill case, comparing infill to increas-
ing domain statistics. The article is organized as follows. In Section 2 we discuss ML
estimation and its relation to microergodicity. In Section 3 we review certain known results for infill estimation of stationary and stationary increment Gaussian processes and state the main result of the article: namely, a characterization of the convergence rates of successive differences according to the tail behaviour of regularly varying spectral densities. Although this result is not strictly speaking new, we believe it is important for practitioners as it establishes obtainable rates in a unified framework. In Section 4 we generalize these results to a certain class of non-stationary processes which can be written as Y = XW with W a stationary, or stationary increment, Gaussian process and discuss several examples including volatility estimation, non linear functions of Gaussian processes and applications to multifractals. In Section 5 we consider an application
to fields and finally in Section 6 we discuss certain results concerning inference in the
presence of a “nugget effect”, which can be written as Y = X + Z with X possibly
non stationary and Z a white noise process. Many of the examples can be extended to
t ∈ Rd, but we will restrict our attention to the case d = 1, indicating whenever higher
dimensional results are available.
2 Maximum likelihood estimation
In this section we give a quick overview of maximum likelihood estimation in the fixed
domain setting and relate microergodicity to the properties of the MLE.
Let $X_t = X_t(\theta)$ be a spatial process and consider the problem of estimating θ based on the likelihood $L_n$. Let $l_n$ stand for the log-likelihood and $I_n$ for the information matrix. In an increasing domain setting (fixed ∆), $I_n^{-1/2}\,\partial l_n(\theta)/\partial\theta \to N(0, I)$ under certain general regularity conditions, including that the diagonal terms of $I_n^{-1}$ tend to 0.
On the other hand assume, quite generally, that $X_t$ is observed over a series of domains $D_1 \subset \ldots \subset D_n \subset \ldots \subset D$. Let $F_n = \sigma\{X_t, t \in D_n\}$. Then $\{\partial l_n(\theta)/\partial\theta, F_n, n \ge 1\}$ is a martingale under certain general conditions (see Prop. 6.1 in Hall and Heyde, 1980, as cited by Zhang and Zimmerman, 2005). Let $\mathcal{I}_n$ be the conditional information matrix given $F_n$; then $\mathcal{I}_n^{-1/2}\,\partial l_n(\theta)/\partial\theta \to N(0, I)$. Hence the maximum likelihood estimators will have the same asymptotic properties if $I_n$ and $\mathcal{I}_n$ are asymptotically equivalent.
Since the MLE cannot be consistent for any θ that is not microergodic, equality of the asymptotic properties in both settings implies microergodicity. In the stationary Gaussian case, as we will see below, microergodicity is equivalent to certain conditions on the covariance kernel or the spectral density.
Non sufficiency of microergodicity for consistency is discussed in Zhang and Zimmerman (2005), pp. 924-925. The authors develop an example in which they consider a two dimensional parameter φ = (θ, ν) such that $P_{\phi_1} \perp P_{\phi_2}$ for $\nu_1 = \nu_2$ as long as $\theta_1 \ne \theta_2$ ($P_{\phi_1} \equiv P_{\phi_2}$ if and only if $\theta_1 = \theta_2$). Set $\phi_0 = (\theta_0, \nu_0)$ to be the true value of the parameter. The authors show that the MLE based on very dense observations consistently estimates $\theta_0$. However, the MLE $\hat\nu$ of $\nu_0$ maximizes $P_{(\theta_0,\nu)}/P_{(\theta_0,\nu_0)}$. That is, $\hat\nu$ is actually the maximum likelihood estimator of $\nu_0$ when $\theta_0$ is known and thus it is typically not a consistent estimator of $\nu_0$.
As an example of the above construction they consider the O.U. process $X_t$, solution of
$$dX_t = -\nu X_t\,dt + \sqrt{2\theta}\,dW_t,$$
for $W_t$ a Brownian motion.
Then the MLE $\hat\theta$ is consistent but $\hat\nu = \hat\nu(\hat\theta) = \int X_t\,dX_t / \int X_t^2\,dt$ is not a consistent estimate of ν.
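The contrast between the two parameters can be seen in a small Monte Carlo experiment. The sketch below is ours and rests on several assumptions: the domain is [0, 1], θ is estimated by half the quadratic variation rather than by the exact MLE, and the drift is estimated by the usual discretized least-squares ratio written with the sign convention $-\sum X\,\Delta X / \sum X^2\Delta$. The spread across replications is measured by the interquartile range.

```python
import numpy as np

def ou_paths(n_fine, theta, nu, reps, rng):
    """Exact stationary O.U. paths on [0, 1]; each row is one replication."""
    dt = 1.0 / n_fine
    a = np.exp(-nu * dt)
    sd = np.sqrt((theta / nu) * (1.0 - a * a))    # stationary variance is theta / nu
    x = np.empty((reps, n_fine + 1))
    x[:, 0] = rng.normal(0.0, np.sqrt(theta / nu), size=reps)
    for k in range(n_fine):
        x[:, k + 1] = a * x[:, k] + rng.normal(0.0, sd, size=reps)
    return x

def estimates(x):
    """Discretized volatility and drift estimators from paths observed on [0, 1]."""
    dx = np.diff(x, axis=1)
    dt = 1.0 / dx.shape[1]
    theta_hat = np.sum(dx ** 2, axis=1) / 2.0     # quadratic variation equals 2 * theta
    nu_hat = -np.sum(x[:, :-1] * dx, axis=1) / (np.sum(x[:, :-1] ** 2, axis=1) * dt)
    return theta_hat, nu_hat

def iqr(v):
    """Interquartile range, a robust measure of spread."""
    q75, q25 = np.percentile(v, [75, 25])
    return q75 - q25

rng = np.random.default_rng(1)
paths = ou_paths(1600, theta=1.0, nu=1.0, reps=400, rng=rng)
spread = {}
for step in (16, 1):                              # n = 100 and n = 1600 on the same paths
    th, nu_est = estimates(paths[:, ::step])
    spread[1600 // step] = (iqr(th), iqr(nu_est))
print(spread)
```

In runs of this kind the spread of the θ estimates shrinks as the grid is refined, while the spread of the drift estimates stays of the same order, in line with the non-consistency discussed above.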
In the next sections we will review microergodicity for stationary and stationary incre-
ment Gaussian processes.
3 Stationary Gaussian processes
Following the results of Chapter 4 in Stein (1999) two Gaussian processes with continu-
ous mean functions and continuous positive covariance kernels are always either equiv-
alent or orthogonal. We shall see that this property allows establishing both necessary
and sufficient conditions over a certain class of covariance kernels for the equivalence
of the underlying measures and whence establishing necessary and sufficient conditions
for microergodicity.
More precisely, consider two zero mean stationary Gaussian measures $P_{\theta_j}$, j = 1, 2, defined by the corresponding covariance kernels $K_j$ with spectral densities $f_j$. If one of the spectral densities possesses a Laplace transform in a neighbourhood of the origin then both measures are either equivalent or orthogonal on any bounded region. Also, if for $j_0 \in \{1, 2\}$, $f_{j_0}$ is such that $f_{j_0}(w) \sim |\phi(w)|^2$ as |w| → ∞, where φ is the Fourier transform of a square integrable function with bounded support (for example if $f_{j_0}(w) \sim |w|^{-2\alpha}$, α > 1, as |w| → ∞), then for any bounded region R both measures will be equivalent over R if there exists C = C(R) such that
$$\int_{|w|>C}\frac{(f_1(w) - f_2(w))^2}{f_{j_0}^2(w)}\,dw < \infty. \qquad (2)$$
Condition (2) states that the measures will be equivalent over a bounded domain whenever the high frequency behaviour of the spectral density of the difference of both processes is negligible relative to the tail behaviour of the spectral density $f_{j_0}$ of one of them. The converse result is not true, but as Stein (1999) points out a reasonable conjecture is that if (2) does not hold then it is possible to find some bounded region R′ such that the measures will not be equivalent over it, and whence be orthogonal.
If for $j_0 \in \{1, 2\}$ there exists a positive even integer α such that $f_{j_0}(w) \sim |w|^{-\alpha}$, then in the one-dimensional case, and for any finite T, both measures will be equivalent over [0, T] if and only if $k(t) := K_1(t) - K_2(t)$ is almost everywhere α times differentiable ($k^{(\alpha-1)}$ exists and is absolutely continuous over (−T, T)) and
$$\int_0^T \big(k^{(\alpha)}(t)\big)^2 (T - t)\,dt < \infty$$
(Theorems 13 and 14 in Ibragimov and Rozanov, 1978, as cited by Stein, 1999, p. 122). Moreover, as Stein (1999) goes on to remark, under the stated condition $K_1$ does not have α derivatives at t = 0. Hence, the difference must be more regular than each covariance function. Typical examples include
1. Consider the covariance kernels $K_1(t) = e^{-|t|}$ and $K_2(t) = (1 - |t|)^+$. The first has spectral density $f_1(w) = c/(1 + |w|^2)$, which satisfies the stated condition for α = 2, and the difference has an absolutely continuous derivative over (−1, 1) but it is not continuously differentiable at t = 1.
2. Let $P_{\theta,\nu}$ be the centered Gaussian measure defined by $K(t) = \theta e^{-\nu|t|}$. Then h(θ, ν) = θν is microergodic but g(θ, ν) = θ is not.
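A short computation shows why only the product θν is visible at high frequency in item 2. It is a sketch assuming regular sampling at lag 1/n on [0, 1] and writing $\Delta X_{t_i} = X_{t_i} - X_{t_{i-1}}$:

```latex
E(\Delta X_{t_i})^2 = 2\bigl(K(0) - K(1/n)\bigr)
                    = 2\theta\bigl(1 - e^{-\nu/n}\bigr)
                    = \frac{2\theta\nu}{n} + O(n^{-2}),
\qquad
\sum_{i=1}^{n} E(\Delta X_{t_i})^2 \longrightarrow 2\theta\nu .
```

Two parameter pairs with the same product θν therefore give the same limiting sum of squared increments, so this statistic alone cannot tell θ from ν.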
Results in this section allow for the following concluding remark. In the Gaussian case h is microergodic if
$$P_{h(\theta)} \equiv P_{h(\theta')} \iff \theta = \theta'.$$
If for all θ ∈ Θ the spectral density $f_\theta$ exists and is assumed to be regularly varying of order α = α(θ) (see Definition 32), then h is microergodic if and only if $h(\theta) \ne h(\nu) \iff k^{(\alpha)}_{\theta,\nu}$ is not square integrable over some small interval containing 0, where $k_{\theta,\nu} = K_\theta - K_\nu$ is the difference of the associated covariance kernels.
3.1 Convergence rates for estimators based on differences
In this Section we look in more detail at the case d = 1 and $f(w, \theta) \sim C(\gamma)L(w, \theta)w^{-(1+\gamma)}$ as w → ∞, with L a slowly varying function at w = 0 with exponent zero, 0 < γ, and θ the parameter of interest. An example of this kind of asymptotics was originally developed in Guyon and Leon (1989) (see also Ortega, 1990 for an early generalization considering non regular sampling schemes), assuming $X_t$ to be a centered stationary Gaussian process with spectral density f(w, θ) and covariance
$$K(t, \theta) = 1 - L(|t|, \theta)|t|^{\gamma} \qquad (3)$$
for 0 < γ < 2.
Recall we are assuming d = 1 and set n = 1/∆. Define also $\Delta X_{t_i} = X_{t_i} - X_{t_{i-1}}$ and set $\Delta^p X_{t_i} = \Delta^{p-1}\Delta X_{t_i}$.
By direct calculation
$$E(\Delta^p X_{t_i})^2 = \int f(w, \theta)\,(2\sin(w/(2n)))^{2p}\,dw. \qquad (4)$$
Proceeding as in Example 1, distinct rates will be obtained according to the tail behaviour of the spectral density f.
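Identity (4) can be checked numerically in a case where both sides are explicit. The sketch below assumes the covariance $K(t) = e^{-|t|}$ with spectral density $f(w) = 1/(\pi(1 + w^2))$ under the convention $K(t) = \int e^{iwt}f(w)\,dw$; the truncation point `w_max` and step `dw` are our choices, and the truncated tail limits the agreement to a fraction of a percent.

```python
import numpy as np

def spectral_second_moment(n, p, w_max=5000.0, dw=0.01):
    """Trapezoid approximation of the right-hand side of (4) for f(w) = 1/(pi (1 + w^2))."""
    w = np.arange(0.0, w_max + dw, dw)
    y = (2.0 * np.sin(w / (2.0 * n))) ** (2 * p) / (np.pi * (1.0 + w ** 2))
    half = dw * (np.sum(y) - 0.5 * (y[0] + y[-1]))
    return 2.0 * half                          # the integrand is even in w

def covariance_second_moment(n, p):
    """Same moment from K(t) = exp(-|t|), expanding the p-th difference (p = 1 or 2)."""
    K = lambda t: np.exp(-abs(t))
    if p == 1:
        return 2.0 * (K(0.0) - K(1.0 / n))
    if p == 2:
        return 6.0 * K(0.0) - 8.0 * K(1.0 / n) + 2.0 * K(2.0 / n)
    raise ValueError("only p = 1, 2 are expanded here")

n = 10
for p in (1, 2):
    print(p, spectral_second_moment(n, p), covariance_second_moment(n, p))
```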
Define the statistic
$$Z_n(p) := \frac{1}{n}\sum_i \frac{(\Delta^p X_{t_i})^2}{E(\Delta^p X_{t_i})^2}.$$
The next proposition characterizes convergence rates of $Z_n(p)$ to 1 according to the relative values of γ and p. Previous versions of this result can be found in Guyon and Leon (1989) and Chan and Wood (2000) for stationary Gaussian processes with covariance $C(h) = 1 - L(h)|h|^{\gamma}$, 0 < γ < 2, the latter for d ≤ 2. Our treatment relies on a spectral approach which can then be readily generalized to other non stationary frameworks.
Proposition 31 Assume that d = 1 and X is a stationary process with spectral density f(w, θ) such that
$$\lim_{w\to\infty}\frac{f(w, \theta)\,w^{1+\gamma}}{L(1/w, \theta)} = C(\gamma),$$
with L(w, θ) a slowly varying function at w = 0 and 0 < γ. Then we have,
– If γ ≥ 2p then
$$n^{2p}E(\Delta^p X_{t_i})^2 \to 2^{2p}\int f(u)\,u^{2p}\,du,$$
and $Var(Z_n(p)) = O(1)$.
– If γ < 2p then
$$n^{\gamma}E(\Delta^p X_{t_i})^2 \to C(\gamma)\int L(1/w, \theta)\,w^{-(1+\gamma)}(2\sin(w/2))^{2p}\,dw$$
and
  – If γ > (4p − 1)/2 then $n\,Var(Z_n(p)) = O(n^{2\gamma-(4p-1)})$.
  – If γ = 2p − 1/2 then $Var(Z_n(p)) = O(\log(n)/n)$.
  – If γ < 2p − 1/2 then $Var(Z_n(p)) = O(1/n)$.
Proposition 31 can be restated as follows, relating $Z_n(p)$ to the original statistical problem:
1. Case γ ≥ 2p:
$$n^{2p}E(\Delta^p X_{t_i})^2 \to 2^{2p}\int f(u, \theta)\,u^{2p}\,du,$$
and this limit cannot be consistently estimated using the empirical mean function.
2. Case γ < 2p:
$$n^{\gamma}E(\Delta^p X_{t_i})^2 \to C(\gamma)\int L(1/w, \theta)\,w^{-(1+\gamma)}(2\sin(w/2))^{2p}\,dw,$$
and
– Case 2p − 1/2 ≤ γ < 2p: $\int L(1/w, \theta)\,w^{-(1+\gamma)}(2\sin(w/2))^{2p}\,dw$ can be consistently estimated using its empirical mean function but only suboptimal rates are obtained and the limiting variance of the estimator is not estimable (it depends on unobservable low frequencies).
– Case γ < 2p − 1/2: $\int L(1/w, \theta)\,w^{-(1+\gamma)}(2\sin(w/2))^{2p}\,dw$ can be consistently estimated using its empirical mean function, rates are optimal and the variance of the estimator can also be estimated.
Whence, estimation based on regular signals will attain $\sqrt{n}$ rates in an infill framework if the parameters to be estimated can be obtained from $\int L(1/w, \theta)\,w^{-(1+\gamma)}(2\sin(w/2))^{2p}\,dw$ (for example using contrasts) for large enough p, i.e. p(γ) > [(1 + 2γ)/4] + 1.
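For a given γ the order of differencing can be read off the condition γ < 2p − 1/2 of Proposition 31. The helper below is a hypothetical convenience of ours, not part of the original results; it returns the smallest integer p satisfying that inequality, so any larger p also qualifies.

```python
import math

def minimal_difference_order(gamma):
    """Smallest integer p >= 1 with gamma < 2p - 1/2."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # gamma < 2p - 1/2 is equivalent to p > (1 + 2 gamma) / 4
    return math.floor((1.0 + 2.0 * gamma) / 4.0) + 1

for gamma in (0.5, 1.5, 1.7, 3.5):
    p = minimal_difference_order(gamma)
    print(gamma, p, gamma < 2 * p - 0.5)
```

For instance, first order differences suffice when γ < 3/2, while 3/2 ≤ γ < 7/2 calls for second order differences.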
Some examples:
Example 2 X a stationary Gaussian process with covariance function $K(t) = 1 - L(|t|, \theta)|t|^{\gamma}$. In this case we have the following Corollary of Proposition 31:
Corollary 31 Assume that d = 1 and X is a stationary Gaussian process with covariance function $K(t) = 1 - L(|t|, \theta)|t|^{\gamma}$, L(t) a slowly varying function at t = 0 and 0 < γ < 2. Assume additionally L has a continuous derivative such that L′(1/t) = L(1/t)(1 + o(1)). Then the results of Proposition 31 follow with p = 1.
Example 3 A Gaussian process X with spectral density f which is homogeneous with parameter α, $f(\lambda) = C|\lambda|^{-\alpha}$. Then
$$n^{(-\alpha-1)/2}(\Delta X_k)_{1\le k\le n} \overset{d}{=} (X_k)_{1\le k\le n},$$
where the differences in the r.h.s. do not depend on n. However, if α is not known then estimation of C will not be efficient.
Example 4 O.U.: $f(\lambda) = 2K\theta/(\theta^2 + \lambda^2)$. In this case the parameter vector (K, θ) cannot be efficiently estimated. This is because we must necessarily look at differences in order to get optimal rates, and differencing makes it impossible to consistently estimate θ alone. Recall, $K(u) = Ke^{-\theta|u|}$.
Example 5 The Cauchy class (Gneiting and Schlather, 2004): Consider a Gaussian process with correlation $c(h) = (1 + h^{2\alpha})^{-\beta/(2\alpha)}$, where 0 < α < 1 and 0 < β. Then $\lim_{w\to\infty} f(w)/w^{-2\alpha-1} = 1$ and α is estimable at $\sqrt{n}$ rates if p is chosen such that α < p − 1/4. Estimating β is not possible in a purely infill scenario.
Remark 31 Corollary 31 can be obtained under the more general, although more technical, conditions required in order to apply Proposition 32 in the proof of Corollary 31: assume K is an even covariance kernel with m + 1 integrable derivatives and such that if r = K(t) − K(0), then
– $r^{(m)} \in RV^0_{\alpha}$ (see Definition 32) for some 0 ≤ α < 1
– $g(t) := r^{(m)}(-t)$ and g′(t) be absolutely continuous and for some $x_0$ and δ > 0
– $\sup_{x\le x_0} t^2|g^{(2)}(t)/r(t)| < \infty$
– $\sup_{x\ge x_0} t^{1+\delta}|g'(t)| < \infty$
Remark 32 Under certain additional conditions (for example if f is homogeneous of order α), the results in Proposition 31 can be extended to processes with stationary q increments. We will develop the case of fBm as an example.
Consider the representation of fBm
$$X(t) = c_H\int_{-\infty}^{\infty}(e^{-it\lambda} - 1)\frac{1}{|\lambda|^{H+1/2}}\,dW(\lambda). \qquad (5)$$
Setting $t = t_j$ we obtain the representation
$$\Delta X_{t_j} = c_H\int_{-\infty}^{\infty}e^{-i(j/n)\lambda}(1 - e^{-i\lambda/n})\frac{1}{|\lambda|^{H+1/2}}\,dW(\lambda),$$
which is exactly of the form stated above with γ = 2H.
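The effect of the order of differencing can be seen on simulated fBm increments. The sketch below is ours and makes the following assumptions: increments are generated at unit lag by a Cholesky factorization of the fractional Gaussian noise covariance (by self-similarity the normalized statistic does not depend on the lag), the normalizing constants $E(\Delta X)^2 = 1$ and $E(\Delta^2 X)^2 = 4 - 2^{2H}$ are the unit-lag values, and averages are taken over the available differences. We take H = 0.85, so that γ = 2H exceeds 3/2.

```python
import numpy as np

def fgn_cholesky(n, H):
    """Cholesky factor of the unit-lag fractional Gaussian noise covariance matrix."""
    k = np.arange(n, dtype=float)
    r = 0.5 * ((k + 1.0) ** (2 * H) - 2.0 * k ** (2 * H) + np.abs(k - 1.0) ** (2 * H))
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.linalg.cholesky(r[idx])

def z_stats(increments, H):
    """Z_n(1) and Z_n(2) for rows of unit-variance fBm increments."""
    z1 = np.mean(increments ** 2, axis=1)
    second = np.diff(increments, axis=1)              # second order differences of fBm
    z2 = np.mean(second ** 2, axis=1) / (4.0 - 2.0 ** (2 * H))
    return z1, z2

rng = np.random.default_rng(2)
n, H, reps = 256, 0.85, 400
chol = fgn_cholesky(n, H)
incr = rng.standard_normal((reps, n)) @ chol.T        # each row has the fGn covariance
z1, z2 = z_stats(incr, H)
print(z1.mean(), z1.var(), z2.mean(), z2.var())
```

Both statistics are centered at 1, but in this regime the Monte Carlo variance of $Z_n(2)$ is markedly smaller than that of $Z_n(1)$, as the rates in Proposition 31 suggest.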
Proof of Proposition 31:
We start by setting p = 1, which renders the main argument easier to follow. Set $\sigma_n^2 := E(\Delta X_{t_1})^2$ so that
$$Z_n(1) = \frac{1}{n}\sum_i \frac{(\Delta X_{t_i})^2}{\sigma_n^2}.$$
By construction $Z_i := \Delta X_{t_i}/(\sigma_n^2)^{1/2}$ is a collection of standard stationary Gaussian variables with covariance $\rho(j) = [K((j+1)/n) - 2K(j/n) + K((j-1)/n)]/\sigma_n^2$ and $Z_n = \frac{1}{n}\sum_i Z_i^2$. Hence, standard results for Gaussian processes yield
$$Var(Z_n(1)) = \frac{2}{n^2}\sum_{|j|\le n-1}(n - |j|)\rho^2(j) = \frac{2}{n\sigma_n^2}\int |f(u)|\,|f(v)|\sin^2(u/2n)\sin^2(v/2n)F_n(u - v)\,du\,dv,$$
where $F_n(w) = \frac{1}{n}\frac{\sin^2(nw/2)}{\sin^2(w/2)}$ is the Fejér kernel and we have used that
$$\rho(j) = \frac{2}{\sigma_n^2}\int_{-\infty}^{\infty}e^{iwj/n}(1 - \cos(w/n))f(w)\,dw \qquad (6)$$
for j ≠ 0.
Rates of convergence depend on whether the dominating term in the above integral
corresponds to the small or large values of |u−v|, or more precisely if we consider Fn(t)
or Fn(nt). Indeed, assume γ > 3/2 which is equivalent to K being two times differen-
tiable with square integrable second derivative. Then the dominant term corresponds
to small values of |u− v|. More precisely,∫f(u)f(v) sin2(u/2n) sin2(v/2n)Fn(u− v)dudv
where σ2n and the proof of 31 may be continued to obtain the stated results.
The rates established in Proposition 42 will be the same as those obtained in Proposition 31. In many examples encountered in this setting θ is assumed to be known and the parameter of interest is ν. Consistent, optimal rate estimation of this parameter is possible if ν can be recovered from I(p) for large enough p.
If σ(·) is random then the above discussion suggests that the correct centering term is not the expectation $E(Z_n(p))$ but rather I(p) (or its discretization $\frac{1}{n}\sum_j \sigma^{2p}(t_j, \nu)$), which is random. If σ and W are independent, then $I(p) = \lim_{n\to\infty}E_{\sigma}(Z_n(p))$. Although it is generally not true that both processes are independent, in many situations they are asymptotically so, so that $E_{\sigma}(Z_n(p)) \to I(p)$ anyway. Estimation rates for ν will then depend on whether I(p) exists for large enough p. In the following subsections we include several examples where this is so.
4.1 Estimating volatility for diffusion and fractional diffusion processes
There exist quite a number of results addressing the problem of estimating the diffusion coefficient or volatility using contrasts or non parametric approaches for diffusion processes (see for example Genon-Catalot & Jacod, 1993 for one of the first results in this direction). Consider the model
$$dX_t = b(t, X_t)\,dt + \sigma(\theta, t, X_t)\,dW_t,$$
where b and σ are assumed to satisfy the necessary conditions to assure the existence of a strong solution $X_t$. Assume the number of observations is n = T/∆. In Genon-Catalot & Jacod (1993) the authors obtain $\sqrt{n}$ rates and show that the estimation problem satisfies the LAMN property, and in particular is consistent. Namely, consider
$$Z_n = \frac{1}{n}\sum_{i=1}^{n}\Big|\frac{\Delta X_i}{\Delta^{1/2}}\Big|^2 \sim \frac{1}{n}\sum_i\Big|\frac{\sigma(\theta, i\Delta, X_{i\Delta})\,\Delta W_i}{\Delta^{1/2}}\Big|^2.$$
Then, conditionally on X, $\sqrt{n}\big(Z_n - \frac{1}{T}\int_0^T\sigma^2(\theta, t, X_t)\,dt\big)$ converges stably to a Gaussian r.v. with conditional variance $\Gamma^2 = \frac{1}{T}\int_0^T\sigma^4(\theta, t, X_t)\,dt$, where $\int_0^T\sigma^2(\theta, t, X_t)\,dt$ is known as the integrated volatility. Results for contrasts follow from here.
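The convergence of $Z_n$ to the normalized integrated volatility can be illustrated on a toy model. The sketch below is ours and assumes a deterministic volatility $\sigma(t) = 1 + 0.5\sin(2\pi t)$, a linear drift b(t, x) = −x, T = 1 and an Euler scheme for the simulation; none of these choices come from the cited references.

```python
import numpy as np

def sigma(t):
    """Hypothetical deterministic volatility used only for this illustration."""
    return 1.0 + 0.5 * np.sin(2.0 * np.pi * t)

def euler_path(n, T, rng, x0=0.0):
    """Euler scheme for dX = -X dt + sigma(t) dW on a regular grid with n steps."""
    dt = T / n
    dw = rng.normal(0.0, np.sqrt(dt), size=n)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] - x[i] * dt + sigma(i * dt) * dw[i]
    return x

rng = np.random.default_rng(3)
n, T = 20000, 1.0
x = euler_path(n, T, rng)
delta = T / n
z_n = np.mean((np.diff(x) / np.sqrt(delta)) ** 2)   # mean of normalized squared increments
integrated = 1.0 + 0.125                            # (1/T) * integral of sigma(t)^2 on [0, 1]
print(z_n, integrated)
```

The drift contributes only at order ∆ to the squared increments, which is why it does not appear in the limit.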
Although in general the variance is random, if $\sigma = \sigma(\theta, X_t)$ does not depend on t (and the solution is assumed to be stationary and ergodic), then $\Gamma^2$ converges to $E(\sigma^4(\theta, X_0))$ as |T| → ∞.
On the other hand, consider the problem of estimating α and β in the model
$$dX_t = b(X_t, \alpha)\,dt + \sigma(X_t, \beta)\,dW_t,$$
which is assumed to have a stationary solution. The best possible rates (see for example Sorensen, 2004) are $\sqrt{n}$ for β and $\sqrt{n\Delta}$ for α so that, in this case, completely infill statistics are not possible. Intuitively, this result is not surprising since the covariance of $\Delta X_i$ does not depend on α. In fact, the existence of an absolutely continuous change of measure taking the distribution of X to the zero drift distribution assures that estimation of α cannot be microergodic, and whence cannot be consistent.
Estimating the integrated volatility or the volatility associated parameter θ for diffusion processes has been less studied in the case of non regular samples for high frequency data, because of the technical difficulties associated with the characterization of the limiting behaviour of the estimators. This problem is discussed in depth in Hayashi et al. (2011), where deterministic, random but independent, and data dependent non regular sampling schemes are considered.
It is also possible to consider diffusions driven by fractional Brownian motion (fBm). That is, solutions to
$$dX_t = b(X_t, \alpha)\,dt + \sigma(X_t, \beta)\,dW^H_t,$$
where $W^H_t$ is a fBm with Hurst parameter 0 < H < 1.
There are several ways to define integration with respect to fBm. If H > 1/2 a pathwise approach is possible for sufficiently regular σ (see Dai and Heyde, 1996; Lin, 1995; Zahle, 1998). Existence and pathwise properties of the solution $X_t$ are shown in Nualart and Rascanu (2002). Based on these properties and the results in Section 4, it follows that it is possible to construct efficient estimators for β based on $Z_n(2) \sim \frac{1}{n}\sum_j \sigma^2(X_{t_j}, \beta)[\Delta^2 W^H_{t_j}]^2$ (Leon and Ludena, 2007) for all 1/2 < H < 1. Analogous results for estimators based on first order differences only hold for 1/2 < H < 3/4, that is 1 < γ < 3/2, as follows from equation (5). The proof is based on studying the variance of
$$Z^m_n(p) := \frac{1}{n}\,g_{t_j}(W^H_{s_1}, \ldots, W^H_{s_m})[\Delta^2 W^H_{t_j}]^2,$$
where $g_t = \sigma(X^m_t)$ and $X^m_t$ is an approximation of the solution $X_t$ which depends only on a finite collection $W^H_{s_k}$, k = 1, . . . , m, and then showing
$$E\big(Z^m_n(p) - Z_n(p)\big)^2 \to 0.$$
Consistent estimation of α is again not possible in a strictly infill scenario. In fact, set $X_t = h(B^H_t)$ where h satisfies the O.D.E.
$$dh(t) = \sigma(h(t))\,dt, \qquad h(0) = x_0,$$
and consider $B^{H,1}_t = B^H_t - y(t)$ where y(t) solves the random O.D.E.
$$dy(t) = \frac{b(h(B^H_t), t)}{\sigma(h(B^H_t))}\,dt$$
with y(0) = 0. With this notation $X_t = h(B^{H,1}_t + y(t))$ and solves the equation
$$X_t = x_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(X_s)\,dB^{H,1}_s.$$
The above stochastic integral is well defined as $B^{H,1}$ has the same pathwise properties as $B^H$. By Theorem 6.1 in Decreusefond and Ustunel (1999) there exists a probability measure $P_1$ which is absolutely continuous with respect to P and such that the law of $B^{H,1}$ under $P_1$ is the same as the law of $B^H$ under P, as long as y satisfies certain bounds (Theorem 4 in Leon and Ludena, 2007; Lemma 9 in Gloter and Hoffmann, 2004). Namely, that there exists δ such that
$$E\Big(\exp\Big\{C\int_0^1\Big(\int_{|s-u|<\delta}\frac{y'(u) - y'(s)}{(u - s)^{H+1/2}}\,du\Big)^2 ds\Big\}\Big) < \infty.$$
As for the diffusion case, the absolutely continuous change of measure to the zero drift case assures that estimation of α is not microergodic.
For H < 1/2 the pathwise approach is no longer possible and generally in this case the solution is interpreted in the sense of Decreusefond and Ustunel (1999). The latter also includes the case H > 1/2, but since, except for deterministic $\sigma(\beta, X_t)$, both solutions do not coincide, we have preferred to stick to the former in this case. Existence and properties of a strong solution have also been developed based on this definition for H < 1/2 (see Coutin, 2007 for a thorough review). A number of results dealing with parameter estimation of diffusions driven by fBm have appeared recently. However, most of them are related to establishing consistency results for the MLE when T → ∞ (see Tudor and Viens, 2009 and the references therein) for all 0 < H < 1. As above, purely infill statistics in this domain will be consistent only for volatility related parameters. A generalization of the Girsanov absolutely continuous change of measure (see Theorem 1 in Tudor and Viens, 2009) once again assures that the problem of estimating drift related parameters cannot be microergodic.
Another closely related example was developed in Barndorff-Nielsen et al. (2009). In this article the authors consider processes
$$X_t = X_0 + \int \phi_s\,dG_s,$$
where G is a centered stationary increment Gaussian process with
$$R(t) := Var(G_{t+s} - G_s) = L_0(t)\,t^{\gamma},$$
0 < γ < 2 and $L_0$ a slowly varying function at zero. Here the integral is defined in a pathwise sense for any φ with bounded q variation, q < 1/(1 − γ/2) (see Barndorff-Nielsen et al., 2009 for details on the construction). The authors show convergence in probability for quadratic variations, and higher order powers, of the increments of the process X as well as a conditional CLT.
4.2 Estimating volatility for γ unknown
A special case of volatility estimation occurs when $\Delta X_{t_i} = \sigma(t_i, \nu)(1 + o_P(1))\Delta W_{t_i}$,
with W a stationary process with spectral density f(w, γ) such that
\[ \lim_{w\to\infty} \frac{f(w,\gamma)\, w^{1+\gamma}}{L(1/w)} = C(\gamma), \]
with L(w) a slowly varying function at w = 0 and γ unknown. Since by Proposition
42 if 0 < γ < 2p − 1/2 then
\[ n^{\gamma} E(\Delta^p X_{t_i})^2 \to \left( \int_0^T \sigma^{2p}(t,\nu)\,dt \right) \left( \int L(1/w)\, w^{-(1+\gamma)} (2\sin(w/2))^{2p}\,dw \right) \]
and the variance of the mean squared p-order differences will be O(1/n), a reasonable
strategy is a two step procedure: first estimate γ based on the logarithm of the said mean
squared p-order differences, and then use this estimated value in order to estimate the
parameter ν.
More precisely, recalling that ∆ is the sampling lag, consider estimators Zn(p, j) based on
observations sampled at lags j∆ for j = 1, . . . ,m and consider
\[ \hat{\gamma} = \sum_j \alpha_j Z_n(p, j), \]
with the sequence αj satisfying
– $\sum_j \alpha_j = 0$
– $\sum_j \alpha_j \log(j) = 1$.
Then, from Proposition 42, $\hat{\gamma}$ is seen to be a consistent estimator of γ with O(1/n)
variance whenever 0 < γ < 2p − 1/2.
The estimated parameter $\hat{\gamma}$ can then be used to build estimators for ν based on
$\int_0^T \sigma^{2p}(t, \nu)\,dt$. However, estimation of ν will achieve $O(\sqrt{n}/\log(n))$ rates instead of
$\sqrt{n}$. The interested reader is encouraged to review Istas and Lang (1997), Zhu and
Taqqu (2006), Berzin et al. (2012), Chan and Wood (2000) and Zhu and Stein (2002),
the latter two dealing with extensions of these ideas to fields.
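The first step of the two step procedure can be illustrated with a short sketch. It is not the authors' implementation: it assumes that Zn(p, j) is the logarithm of the mean squared p-order difference at lag j (in sampling steps), and it picks one admissible set of weights, namely the least squares slope weights of a regression on log(j), which satisfy both constraints on αj. The function names and the random walk used as test data (for which γ = 1) are hypothetical choices made for illustration only.

```python
import math
import random


def contrast_weights(m):
    """Weights alpha_1..alpha_m with sum(alpha) = 0 and sum(alpha * log j) = 1 (m >= 2)."""
    logs = [math.log(j) for j in range(1, m + 1)]
    mean_log = sum(logs) / m
    centered = [value - mean_log for value in logs]
    scale = sum(c * c for c in centered)
    return [c / scale for c in centered]


def log_mean_square_difference(x, p, j):
    """Assumed form of Z_n(p, j): log of the mean squared p-th order difference at lag j."""
    coeffs = [(-1) ** (p - k) * math.comb(p, k) for k in range(p + 1)]
    count = len(x) - p * j
    total = 0.0
    for i in range(count):
        diff = sum(c * x[i + k * j] for k, c in enumerate(coeffs))
        total += diff * diff
    return math.log(total / count)


def estimate_gamma(x, p=1, m=4):
    """Contrast of log mean squared differences over lags 1..m."""
    alphas = contrast_weights(m)
    return sum(a * log_mean_square_difference(x, p, j)
               for j, a in zip(range(1, m + 1), alphas))


def simulate_random_walk(n, seed=0):
    """Gaussian random walk, a discretized Brownian motion (gamma = 1)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(x[-1] + rng.gauss(0.0, 1.0))
    return x
```

Because the weights sum to zero, any term of the form γ log(∆) + log C common to all lags cancels, while the second constraint leaves exactly γ. Calling `estimate_gamma(simulate_random_walk(20000), p=1, m=4)` should therefore return a value close to 1.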
4.3 Non linear functions of stationary Gaussian processes
The next example, closely related to volatility estimation, deals with infill statistics
based on the increments of a nonlinear function of a stationary Gaussian process. Let
Xt be a stationary Gaussian process with covariance function $\gamma(t, \theta) = 1 - |t|^{\gamma(\theta)} L(t, \theta)$
as above (H = γ/2). Assume that instead of Xi we observe Yi = G(Xi), with G a certain
nonlinear function, and that we are interested in estimating θ based on the increments
of the indirect observations, that is, ∆Yi ∼ G′(Xi)∆Xi. This scenario has been studied
in detail in Chan and Wood (2004), where log based estimators of the exponent γ are
considered. The authors go on to consider two dimensional isotropic fields and estimators
of γ built on quadratic variations of spatial increments, following their previous
work in Chan and Wood (2000).
Based on Proposition 31 consistency results can then be obtained for estimators based
on quadratic variations of p order differences under certain regularity conditions on
G, namely that $G' \in C^4(\mathbb{R})$ with polynomial tails (see for example Theorem 1 in
Leon and Ludena (2004)). The basic idea is to show a.s. convergence to zero of the
conditional variance, given the process Xt, of
\[ Z_n(p) = \frac{1}{n} \sum_{i=1}^{n} [G'(X_{t_i})]^2 \left[ \Delta^p X_i - E_{X_{t_i}}(\Delta^p X_i) \right]^2, \]
based on the results developed in Section 3.1 and the asymptotic independence of the
process and its increments in the Gaussian case. By construction, Zn is centered and
the conditional variance will depend only on the conditional covariance structure of
the p order increments. By the asymptotic independence the conditional covariance
converges to the non conditional covariance. Conditional Central Limit Theorems will
also hold in this case.
4.4 Multifractals
Another interesting framework for comparing infill and increasing domain statistics is
the estimation of the scale function in multifractal models. Recall that $\Delta X_i = X_{t_{i+1}} - X_{t_i}$
and that $\Delta = t_{i+1} - t_i$ stands for the fixed sampling lag. If we define the scale function ζ(q)
as
\[ \zeta(q) := -\lim_{\Delta \to 0} \frac{\log E((\Delta X_i)^q)}{\log(\Delta)}, \]
the process X is considered a multifractal if ζ(q) is not linear. Two very popular examples
are Multiplicative cascades (MC), introduced in Mandelbrot (1974) (see Ossiander and
Waymire, 2000 for a thorough account of estimation procedures), and Multifractal
Random Measures (MRM), introduced in Bacry et al. (2003), or more generally a Multifractal
Random Walk process (MRW) (Bacry et al., 2001, 2003; Ludena, 2008). It
is assumed that $\Delta = 2^{-m}$ and $L = 2^{\xi m}$, with ξ ≥ 0, and that the total number of observations
is n = L/∆. As for the estimation of a random volatility coefficient, or non
linear functionals of Gaussian processes, the difference operator satisfies $\Delta X_{t_i} \sim Y_{t_i} \Delta W_{t_i,\theta}$,
where the r.v. $\Delta W_{t_i,\theta}$ are conditionally independent and independent of the variables
$Y_{t_i}$. However, the latter exhibit a very complex dependence structure, unlike the previously
considered diffusion like examples. As mentioned, the parameter of interest in this
setting is the scaling function ζ(q), but the natural empirical mean-based estimator is
biased by a non-observable random term and it is necessary to subtract an appropriate
centering term in order to eliminate the bias. For MC this can be achieved by considering
the difference of the square increments for two successive scales, which amounts
to eliminating the conditional expectation. This new unbiased estimator will
still have a random variance in the purely infill framework though. On the other hand,
in the increasing domain framework, the asymptotic variance of the estimator will be
deterministic. For MRM or MRW the above estimation procedure is no longer useful
in the purely infill scenario. In the increasing domain case, because of the deterministic
limiting variance, $\sqrt{n\Delta}$ rates and a Central Limit Theorem can be obtained under quite
general assumptions (Ludena and Soulier, 2012).
Very succinctly, MC are constructed as follows.
Consider a collection $\{W_r,\ r \in \{0,1\}^m,\ m \ge 1\}$ of independent random variables with
common law W such that $E(W) = 1$ and $E(W \log_2 W) < 1$. For each m ≥ 1 consider
the random measure defined by
\[ \lambda_m(I) = 2^{-m} \sum_{r \in \{0,1\}^m \cap I} \prod_{i=1}^{m} W_{r|i}, \]
for any Borel subset I of [0, 1], with each $r = (r_1, \ldots, r_m) \in \{0,1\}^m$ identified with
the real number $\sum_{i=1}^{m} r_i 2^{-i}$. Here r|i stands for the restriction of r to its first i
components. It can be seen (see Kahane and Peyriere (1976), Ossiander and Waymire
(2000) for details on the construction and main results) that there exists a random
measure $\lambda_\infty$ such that
\[ P(\lambda_m \Rightarrow \lambda_\infty \text{ as } m \to \infty) = 1, \]
where ⇒ stands for vague convergence. The limiting measure verifies $E(\lambda_\infty([0, 1])) = 1$
under the stated assumptions for the r.v. W.
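The construction of the cascade measure at a finite level m can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the log-normal weight W = exp(σZ − σ²/2), with Z standard normal and σ = 0.5, is an assumed example law. It satisfies E(W) = 1 and E(W log_2 W) = σ²/(2 log 2) < 1.

```python
import math
import random


def cascade_cell_masses(m, sample_w):
    """Masses that lambda_m assigns to the 2**m dyadic cells of [0, 1], left to right.

    sample_w is a zero-argument callable returning one independent draw of W.
    """
    products = [1.0]  # running product of weights along each path r|1, ..., r|i
    for _ in range(m):
        # Every cell splits in two and each child draws its own independent weight.
        products = [parent * sample_w() for parent in products for _ in range(2)]
    return [2.0 ** (-m) * prod for prod in products]


def lognormal_weight(rng, sigma=0.5):
    """Assumed example law for W: log-normal with mean one."""
    return math.exp(sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2)
```

For instance, `cascade_cell_masses(6, lambda: lognormal_weight(random.Random(0)))` would reuse the same seed at every draw, so in practice a single generator should be created once and captured by the callable. With the degenerate choice W ≡ 1 the measure reduces to Lebesgue measure on the dyadic cells.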
Moreover, let q > 1 be such that $E(W^q) < \infty$ and set $\psi(q) = \log_2(E(W^q))$. Consider
the sequence
\[ \lambda_{m,q}(I) = 2^{-m(1-q+\psi(q))} \sum_{r \in \{0,1\}^m \cap I} \prod_{i=1}^{m} W^q_{r|i}. \]
Let $q_0$ be the largest value of q such that
\[ q\psi'(q) < \psi(q) + 1. \qquad (7) \]
Then if $q < q_0$, Proposition 2.2 in Ossiander and Waymire (2000) yields the existence
of a certain random measure $\lambda_{\infty,q}$ such that $\lambda_{m,q} \Rightarrow \lambda_{\infty,q}$. The limiting measure
satisfies $E(\lambda_{\infty,q}([0, 1])) = 1$ under the stated assumptions for the r.v. W.
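As a worked illustration of condition (7), consider the assumed example of a log-normal weight W = exp(σZ − σ²/2) with Z standard normal, which is not taken from the text. In that case E(W^q) = exp(σ²(q² − q)/2), so ψ(q) = σ²(q² − q)/(2 log 2), and condition (7) reduces to σ²q² < 2 log 2, giving the critical value √(2 log 2)/σ. The sketch below, with hypothetical function names, locates the same boundary numerically by bisection.

```python
import math


def psi_lognormal(q, sigma):
    """psi(q) = log2 E(W**q) for the assumed law W = exp(sigma*Z - sigma**2/2)."""
    return sigma ** 2 * (q * q - q) / (2.0 * math.log(2.0))


def psi_prime_lognormal(q, sigma):
    """Derivative of psi_lognormal with respect to q."""
    return sigma ** 2 * (2.0 * q - 1.0) / (2.0 * math.log(2.0))


def condition_7_holds(q, sigma):
    """Check q * psi'(q) < psi(q) + 1."""
    return q * psi_prime_lognormal(q, sigma) < psi_lognormal(q, sigma) + 1.0


def critical_q(sigma, upper=1e3, tol=1e-10):
    """Bisection for the boundary of condition (7) on (1, upper).

    Assumes the condition holds at q = 1 and fails at q = upper.
    """
    lo, hi = 1.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if condition_7_holds(mid, sigma):
            lo = mid
        else:
            hi = mid
    return lo
```

For σ = 0.5 the bisection agrees with the closed form √(2 log 2)/σ, so in this assumed example a smaller σ (weights closer to one) enlarges the range of admissible q.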
Also, as follows from Proposition 2.1 in Ossiander and Waymire (2000), there exists
a sequence of i.d. random variables aj , such that