

Journal of Multivariate Analysis 116 (2013) 422–439


Least squares estimators for discretely observed stochastic processes driven by small Lévy noises

Hongwei Long a,∗, Yasutaka Shimizu b, Wei Sun c

a Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431-0991, USA
b Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560-8531, Japan
c Department of Mathematics and Statistics, Concordia University, Montreal, Quebec H3G 1M8, Canada

∗ Corresponding author. E-mail address: [email protected] (H. Long).

Article info

Article history: Received 21 May 2012; available online 25 January 2013. doi:10.1016/j.jmva.2013.01.012

AMS 2010 subject classifications: primary 62F12, 62M05; secondary 60G52, 60J75.

Keywords: Asymptotic distribution of LSE; Consistency of LSE; Discrete observations; Least squares method; Stochastic processes; Parameter estimation; Small Lévy noises

Abstract

We study the problem of parameter estimation for discretely observed stochastic processes driven by additive small Lévy noises. We do not impose any moment condition on the driving Lévy process. Under certain regularity conditions on the drift function, we obtain consistency and the rate of convergence of the least squares estimator (LSE) of the drift parameter when a small dispersion coefficient $\varepsilon \to 0$ and $n \to \infty$ simultaneously. The asymptotic distribution of the LSE in our general setting is shown to be the convolution of a normal distribution and a distribution related to the jump part of the Lévy process. Moreover, we briefly remark that our methodology can be easily extended to the more general case of semi-martingale noises.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Let $(\Omega, \mathcal{F}, P)$ be a basic probability space equipped with a right continuous and increasing family of $\sigma$-algebras $(\mathcal{F}_t, t \ge 0)$. Let $(L_t, t \ge 0)$ be an $\mathbb{R}^d$-valued Lévy process, which is given by
$$L_t = at + \sigma B_t + \int_0^t\!\int_{|z|\le 1} z\,\tilde N(ds,dz) + \int_0^t\!\int_{|z|>1} z\,N(ds,dz), \qquad (1.1)$$
where $a = (a_1,\ldots,a_d) \in \mathbb{R}^d$, $\sigma = (\sigma_{ij})_{d\times r}$ is a $d\times r$ real-valued matrix, $B_t = (B_t^1,\ldots,B_t^r)$ is an $r$-dimensional standard Brownian motion, $N(ds,dz)$ is an independent Poisson random measure on $\mathbb{R}_+\times(\mathbb{R}^d\setminus\{0\})$ with characteristic measure $dt\,\nu(dz)$, and $\tilde N(ds,dz) = N(ds,dz) - \nu(dz)\,ds$ is a martingale measure. Here we assume that $\nu(dz)$ is a Lévy measure on $\mathbb{R}^d\setminus\{0\}$ satisfying $\int_{\mathbb{R}^d\setminus\{0\}}(|z|^2\wedge 1)\,\nu(dz) < \infty$ with $|z| = \sqrt{\sum_{i=1}^d z_i^2}$. The stochastic process $X = (X_t, t\ge 0)$, starting from $x_0\in\mathbb{R}^d$, is defined as the unique strong solution to the following stochastic differential equation (SDE)
$$dX_t = b(X_t,\theta)\,dt + \varepsilon\,dL_t, \quad t\in[0,1]; \qquad X_0 = x_0, \qquad (1.2)$$
where $\theta\in\Theta = \bar\Theta_0$ (the closure of $\Theta_0$) with $\Theta_0$ being an open bounded convex subset of $\mathbb{R}^p$, and $b = (b_1,\ldots,b_d): \mathbb{R}^d\times\Theta\to\mathbb{R}^d$ is a known function. Without loss of generality, we assume that $\varepsilon\in(0,1]$. The regularity conditions on $b$ will be provided in Section 2. Assume that this process is observed at regularly spaced time points $t_k = k/n$, $k = 1,2,\ldots,n$. The only unknown quantity in SDE (1.2) is the parameter $\theta$. Let $\theta_0\in\Theta_0$ be the true value of the parameter $\theta$. The purpose of this paper is to study the least squares estimator of the true value $\theta_0$ based on the sampling data $(X_{t_k})_{k=1}^n$ with small dispersion $\varepsilon$ and large sample size $n$.

In the case of diffusion processes driven by Brownian motion, a popular method is maximum likelihood estimation (MLE) based on the Girsanov density when the processes can be observed continuously (see Prakasa Rao [31], Liptser and Shiryaev [19], Kutoyants [16], and Bishwal [2]). When a diffusion process is observed only at discrete times, in most cases the transition density, and hence the likelihood function of the observations, is not explicitly computable. In order to overcome this difficulty, some approximate likelihood methods have been proposed by Lo [20], Pedersen [27,28], Poulsen [29], and Aït-Sahalia [1]. For a comprehensive review of MLE and other related methods, we refer to Sørensen [37]. The least squares estimator (LSE) is asymptotically equivalent to the MLE. For the LSE, the convergence in probability was proved in Dorogovcev [5] and Le Breton [18], the strong consistency was studied in Kasonga [12], and the asymptotic distribution was studied in Prakasa Rao [30]. For a more recent comprehensive discussion, we refer to Prakasa Rao [31], Kutoyants [16], Bishwal [2] and the references therein.

The parametric estimation problems for diffusion processes with jumps based on discrete observations have been studied by Shimizu and Yoshida [35] and Shimizu [33] via the quasi-maximum likelihood. They established consistency and asymptotic normality for the proposed estimators. Moreover, Ogihara and Yoshida [26] showed some stronger results than the ones by Shimizu and Yoshida [35], and also investigated an adaptive Bayes-type estimator with its asymptotic properties. The driving jump processes considered in Shimizu and Yoshida [35], Shimizu [33] and Ogihara and Yoshida [26] include a large class of Lévy processes such as compound Poisson processes, gamma, inverse Gaussian, variance gamma, normal inverse Gaussian or some generalized tempered stable processes. Masuda [24] dealt with the consistency and asymptotic normality of the TFE (trajectory-fitting estimator) and LSE when the driving process is a zero-mean adapted process (including Lévy processes) with finite moments. The parametric estimation for Lévy-driven Ornstein–Uhlenbeck processes was also studied by Brockwell et al. [3], Spiliopoulos [39], and Valdivieso et al. [46]. However, the aforementioned papers were unable to cover an important class of driving Lévy processes, namely $\alpha$-stable Lévy motions with $\alpha\in(0,2)$. Recently, Hu and Long [9,10] have started the study of parameter estimation for Ornstein–Uhlenbeck processes driven by $\alpha$-stable Lévy motions. They obtained some new asymptotic results on the proposed TFE and LSE under continuous or discrete observations, which are different from the classical cases where asymptotic distributions are normal. Fasen [6] extended the results of Hu and Long [10] to multivariate Ornstein–Uhlenbeck processes driven by $\alpha$-stable Lévy motions. Masuda [25] proposed a self-weighted least absolute deviation estimator for discretely observed ergodic Ornstein–Uhlenbeck processes driven by symmetric Lévy processes.

The asymptotic theory of parametric estimation for diffusion processes with small white noise based on continuous-time observations has been well developed (see, e.g., Kutoyants [14,15], Yoshida [48,50], Uchida and Yoshida [44]). There have been many applications of small noise asymptotics to mathematical finance; see, for example, Yoshida [49], Takahashi [40], Kunitomo and Takahashi [13], Takahashi and Yoshida [41], and Uchida and Yoshida [45]. From a practical point of view in parametric inference, it is more realistic and interesting to consider asymptotic estimation for diffusion processes with small noise based on discrete observations. Substantial progress has been made in this direction. Genon-Catalot [7] and Laredo [17] studied the efficient estimation of drift parameters of small diffusions from discrete observations when $\varepsilon\to0$ and $n\to\infty$. Sørensen [36] used martingale estimating functions to establish consistency and asymptotic normality of the estimators of drift and diffusion coefficient parameters when $\varepsilon\to0$ and $n$ is fixed. Sørensen and Uchida [38] and Gloter and Sørensen [8] used a contrast function to study the efficient estimation of unknown parameters in both drift and diffusion coefficient functions. Uchida [42,43] used the martingale estimating function approach to study estimation of drift parameters for small diffusions under weaker conditions. Thus, in the cases of small diffusions, the asymptotic distributions of the estimators are normal under suitable conditions on $\varepsilon$ and $n$.

Long [21] studied the parameter estimation problem for discretely observed one-dimensional Ornstein–Uhlenbeck processes with small Lévy noises. In that paper, the drift function is linear in both $x$ and $\theta$ ($b(x,\theta) = -\theta x$), and the driving Lévy process is $L_t = aB_t + bZ_t$, where $a$ and $b$ are known constants, $(B_t, t\ge0)$ is the standard Brownian motion and $Z_t$ is an $\alpha$-stable Lévy motion independent of $(B_t, t\ge0)$. The consistency and rate of convergence of the least squares estimator were established, and the asymptotic distribution of the LSE was shown to be the convolution of a normal distribution and a stable distribution. In a similar framework, Long [22] discussed the statistical estimation of the drift parameter for a class of SDEs with the special drift function $b(x,\theta) = \theta b(x)$. Ma [23] extended the results of Long [21] to the case when the driving noise is a general Lévy process. However, all the drift functions discussed in Long [21,22] and Ma [23] are linear in $\theta$, which restricts the applicability of their models and results. In this paper, we allow the drift function $b(x,\theta)$ to be nonlinear in both $x$ and $\theta$, and the driving noise to be a general Lévy process. We are interested in estimating the drift parameter in SDE (1.2) based on discrete observations $(X_{t_i})_{i=1}^n$ when $\varepsilon\to0$ and $n\to\infty$. We shall use the least squares method to obtain an asymptotically consistent estimator.

Consider the following contrast function
$$\Psi_{n,\varepsilon}(\theta) = \sum_{k=1}^n \frac{|X_{t_k} - X_{t_{k-1}} - b(X_{t_{k-1}},\theta)\,\Delta t_{k-1}|^2}{\varepsilon^2\,\Delta t_{k-1}},$$


where $\Delta t_{k-1} = t_k - t_{k-1} = 1/n$. Then the LSE $\hat\theta_{n,\varepsilon}$ is defined as
$$\hat\theta_{n,\varepsilon} := \arg\min_{\theta\in\Theta}\Psi_{n,\varepsilon}(\theta).$$
Since minimizing $\Psi_{n,\varepsilon}(\theta)$ is equivalent to minimizing
$$\Phi_{n,\varepsilon}(\theta) := \varepsilon^2\left(\Psi_{n,\varepsilon}(\theta) - \Psi_{n,\varepsilon}(\theta_0)\right),$$
we may write the LSE as
$$\hat\theta_{n,\varepsilon} = \arg\min_{\theta\in\Theta}\Phi_{n,\varepsilon}(\theta).$$
We shall use this fact later for convenience in the proofs.

In the nonlinear case, it is generally very difficult or impossible to obtain an explicit formula for the least squares estimator $\hat\theta_{n,\varepsilon}$. However, we can use some nice criteria in statistical inference (see Chapter 5 of van der Vaart [47], and Shimizu [34] for a more general criterion) to establish the consistency of the LSE as well as its asymptotic behavior (asymptotic distribution and rate of convergence). In this paper, we consider the asymptotics of the LSE $\hat\theta_{n,\varepsilon}$ with high frequency ($n\to\infty$) and small dispersion ($\varepsilon\to0$). Our goal is to prove that $\hat\theta_{n,\varepsilon}\to\theta_0$ in probability and to establish its rate of convergence and asymptotic distribution. We obtain some new asymptotic distributions for the LSE in our general setting, which are convolutions of a normal distribution and a distribution related to the jump part of the driving Lévy process. Some similar but more general results are also established when the driving Lévy process is replaced by a general semi-martingale.
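To make the procedure concrete, here is a minimal R sketch (our illustration, not the authors' code) that computes the LSE by numerically minimizing the contrast $\Psi_{n,\varepsilon}$ for a one-dimensional model; the function names are hypothetical, and `b` is assumed to be vectorized in its first argument.

```r
## Minimal LSE sketch for a 1-d SDE observed at t_k = k/n.
## X: numeric vector (X_{t_0}, ..., X_{t_n}); b: drift function b(x, theta).
lse <- function(X, b, theta_init, eps) {
  n  <- length(X) - 1
  dt <- 1 / n
  Psi <- function(theta) {
    resid <- diff(X) - b(X[1:n], theta) * dt  # X_{t_k} - X_{t_{k-1}} - b(X_{t_{k-1}}, theta) dt
    sum(resid^2) / (eps^2 * dt)               # contrast Psi_{n,eps}(theta)
  }
  nlm(Psi, theta_init)$estimate               # numerical minimizer over theta
}
```

Note that the factor $\varepsilon^2\Delta t_{k-1}$ enters $\Psi_{n,\varepsilon}$ only as a constant, so it does not affect the minimizer; it is kept above only to match the display.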

The paper is organized as follows. In Section 2, we state our main results with some remarks and examples. We establish the consistency of the LSE $\hat\theta_{n,\varepsilon}$ and give its asymptotic distribution, which is a natural extension of the classical small-diffusion case. All the proofs are given in Section 3. In Section 4, we discuss the extension of the main results of Section 2 to the general case when the driving noise is a semi-martingale. Some simulation studies are provided in Section 5.

2. Main results

2.1. Notation and assumptions

Let $X^0 = (X_t^0, t\ge0)$ be the solution to the underlying ordinary differential equation (ODE) under the true value of the drift parameter:
$$dX_t^0 = b(X_t^0,\theta_0)\,dt, \qquad X_0^0 = x_0.$$
For a multi-index $m = (m_1,\ldots,m_k)$, we define a derivative operator in $z\in\mathbb{R}^k$ as $\partial_z^m := \partial_{z_1}^{m_1}\cdots\partial_{z_k}^{m_k}$, where $\partial_{z_i}^{m_i} := \partial^{m_i}/\partial z_i^{m_i}$. Let $C^{k,l}(\mathbb{R}^d\times\Theta;\mathbb{R}^q)$ be the space of all functions $f:\mathbb{R}^d\times\Theta\to\mathbb{R}^q$ which are $k$ and $l$ times continuously differentiable with respect to $x$ and $\theta$, respectively. Moreover, $C^{k,l}_\uparrow(\mathbb{R}^d\times\Theta;\mathbb{R}^q)$ is the class of $f\in C^{k,l}(\mathbb{R}^d\times\Theta;\mathbb{R}^q)$ satisfying $\sup_{\theta\in\Theta}|\partial_\theta^\alpha\partial_x^\beta f(x,\theta)| \le C(1+|x|)^\lambda$ for universal positive constants $C$ and $\lambda$, where $\alpha = (\alpha_1,\ldots,\alpha_p)$ and $\beta = (\beta_1,\ldots,\beta_d)$ are multi-indices with $0\le\sum_{i=1}^p\alpha_i\le l$ and $0\le\sum_{i=1}^d\beta_i\le k$, respectively.

We introduce the following set of assumptions.

(A1) There exists a constant $K>0$ such that
$$|b(x,\theta)-b(y,\theta)| \le K|x-y|; \qquad |b(x,\theta)| \le K(1+|x|)$$
for each $x,y\in\mathbb{R}^d$ and $\theta\in\Theta$.
(A2) $b(\cdot,\cdot)\in C^{2,3}_\uparrow(\mathbb{R}^d\times\Theta;\mathbb{R}^d)$.
(A3) $\theta\neq\theta_0 \Leftrightarrow b(X_t^0,\theta)\neq b(X_t^0,\theta_0)$ for at least one value of $t\in[0,1]$.
(A4) $I(\theta_0) = (I^{ij}(\theta_0))_{1\le i,j\le p}$ is positive definite, where
$$I^{ij}(\theta) = \int_0^1 (\partial_{\theta_i}b)^T(X_s^0,\theta)\,\partial_{\theta_j}b(X_s^0,\theta)\,ds.$$

It is well known that SDE (1.2) has a unique strong solution under (A1). For convenience, we shall use $C$ to denote a generic constant whose value may vary from place to place. For a matrix $A$, we define $|A|^2 = \mathrm{tr}(AA^T)$, where $A^T$ is the transpose of $A$. In particular, $|\sigma|^2 = \sum_{i=1}^d\sum_{j=1}^r \sigma_{ij}^2$.

2.2. Asymptotic behavior of LSE

The consistency of our estimator $\hat\theta_{n,\varepsilon}$ is given as follows.

Theorem 2.1. Under conditions (A1)–(A3), we have
$$\hat\theta_{n,\varepsilon} \xrightarrow{P_{\theta_0}} \theta_0$$
as $\varepsilon\to0$ and $n\to\infty$.


The next theorem gives the asymptotic distribution of $\hat\theta_{n,\varepsilon}$. As is easily seen, our result includes the case of Sørensen and Uchida [38] as a special case.

Theorem 2.2. Under conditions (A1)–(A4), we have
$$\varepsilon^{-1}(\hat\theta_{n,\varepsilon}-\theta_0) \xrightarrow{P_{\theta_0}} I^{-1}(\theta_0)\,S(\theta_0), \qquad (2.1)$$
as $\varepsilon\to0$, $n\to\infty$ and $n\varepsilon\to\infty$, where
$$S(\theta_0) := \left(\int_0^1(\partial_{\theta_1}b)^T(X_s^0,\theta_0)\,dL_s,\ \ldots,\ \int_0^1(\partial_{\theta_p}b)^T(X_s^0,\theta_0)\,dL_s\right)^T.$$

Remark 2.3. One of our main contributions is that we no longer require any high-order moment conditions on $X$ as in, e.g., Sørensen and Uchida [38] and others, which makes our results applicable to many practical models.

Remark 2.4. In general, the limiting distribution on the right-hand side of (2.1) is a convolution of a normal distribution and a distribution related to the jump part of the Lévy process. In particular, if the driving Lévy process $L$ is a linear combination of a standard Brownian motion and an $\alpha$-stable motion, the limiting distribution becomes the convolution of a normal distribution and a stable distribution.

Remark 2.5. When $d=1$ and $b(x,\theta) = -\theta x$, i.e., when SDE (1.2) is linear and driven by a general Lévy process, Theorem 2.2 reduces to Theorem 1.1 of Ma [23]. When the driving Lévy process is a linear combination of a standard Brownian motion and an $\alpha$-stable motion, Theorem 2.2 was discussed in Long [21] and Ma [23].

Example 2.6. We consider a one-dimensional stochastic process in (1.2) with drift function $b(x,\theta) = \theta_1 + \theta_2 x$. We assume that the true value $\theta_0 = (\theta_1^0, \theta_2^0)$ of $\theta = (\theta_1,\theta_2)$ belongs to $\Theta_0 = (c_1,c_2)\times(c_3,c_4)\subset\mathbb{R}^2$ with $c_1<c_2$ and $c_3<c_4$. Then, $X^0$ satisfies the following ODE
$$dX_t^0 = (\theta_1^0 + \theta_2^0 X_t^0)\,dt, \qquad X_0^0 = x_0. \qquad (2.2)$$
The explicit solution is given by $X_t^0 = e^{\theta_2^0 t}x_0 + \theta_1^0(e^{\theta_2^0 t}-1)/\theta_2^0$ when $\theta_2^0\neq0$, and $X_t^0 = x_0 + \theta_1^0 t$ when $\theta_2^0 = 0$. The LSE $\hat\theta_{n,\varepsilon} = (\hat\theta_{n,\varepsilon,1}, \hat\theta_{n,\varepsilon,2})^T$ of $\theta_0$ is given by
$$\hat\theta_{n,\varepsilon,1} = (X_1 - X_0) - \hat\theta_{n,\varepsilon,2}\cdot\frac{1}{n}\sum_{k=1}^n X_{t_{k-1}},$$
$$\hat\theta_{n,\varepsilon,2} = \frac{\displaystyle\sum_{k=1}^n (X_{t_k}-X_{t_{k-1}})X_{t_{k-1}} - (X_1-X_0)\cdot\frac{1}{n}\sum_{k=1}^n X_{t_{k-1}}}{\displaystyle\frac{1}{n}\sum_{k=1}^n X_{t_{k-1}}^2 - \left(\frac{1}{n}\sum_{k=1}^n X_{t_{k-1}}\right)^2}.$$
Note that $\partial_{\theta_1}b(x,\theta) = 1$ and $\partial_{\theta_2}b(x,\theta) = x$. In this case, the limiting random vector in Theorem 2.2 is $I^{-1}(\theta_0)\left(\int_0^1 dL_s,\ \int_0^1 X_s^0\,dL_s\right)^T$, where
$$I(\theta_0) = \begin{pmatrix} \int_0^1 ds & \int_0^1 X_s^0\,ds \\ \int_0^1 X_s^0\,ds & \int_0^1 (X_s^0)^2\,ds \end{pmatrix}.$$
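As a concrete illustration (our own sketch, with hypothetical names), the two closed-form expressions above can be evaluated directly from a discretely observed path:

```r
## Closed-form LSE of Example 2.6 for the drift b(x, theta) = theta1 + theta2 * x.
## X is assumed to be the vector (X_{t_0}, ..., X_{t_n}) with t_k = k/n.
lse_linear <- function(X) {
  n   <- length(X) - 1
  Xp  <- X[1:n]                  # X_{t_{k-1}}, k = 1, ..., n
  dX  <- diff(X)                 # increments X_{t_k} - X_{t_{k-1}}
  m1  <- mean(Xp)                # (1/n) sum of X_{t_{k-1}}
  th2 <- (sum(dX * Xp) - (X[n + 1] - X[1]) * m1) / (mean(Xp^2) - m1^2)
  th1 <- (X[n + 1] - X[1]) - th2 * m1
  c(theta1 = th1, theta2 = th2)
}
```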

Example 2.7. We consider a one-dimensional stochastic process in (1.2) with drift function $b(x,\theta) = \sqrt{\theta+x^2}$. We assume that the true value $\theta_0$ of $\theta$ belongs to $\Theta_0 = (c_1,c_2)\subset\mathbb{R}$ with $0<c_1<c_2<\infty$. Then, $X^0$ satisfies the following ODE
$$dX_t^0 = \sqrt{\theta_0+(X_t^0)^2}\,dt, \qquad X_0^0 = x_0.$$
The explicit solution is given by
$$X_t^0 = \frac{\left(x_0+\sqrt{\theta_0+x_0^2}\right)^2 e^{2t} - \theta_0}{2\left(x_0+\sqrt{\theta_0+x_0^2}\right)e^t}.$$
It is easy to verify that the LSE $\hat\theta_{n,\varepsilon}$ of $\theta$ is a solution to the following nonlinear equation
$$\sum_{k=1}^n \frac{X_{t_k} - X_{t_{k-1}}}{\sqrt{\theta + X_{t_{k-1}}^2}} = 1.$$


Since it is impossible to get an explicit expression for $\hat\theta_{n,\varepsilon}$, we solve the above equation numerically (e.g. by using Newton's method). Note that $\partial_\theta b(x,\theta) = \frac{1}{2\sqrt{\theta+x^2}}$. It is clear that the limiting random variable in Theorem 2.2 is $I^{-1}(\theta_0)\int_0^1 \frac{1}{2\sqrt{\theta_0+(X_s^0)^2}}\,dL_s$, where $I(\theta_0) = \int_0^1 \frac{ds}{4(\theta_0+(X_s^0)^2)}$. In particular, we assume that $L_t = aB_t + \sigma Z_t$, where $B_t$ is the standard Brownian motion and $Z_t$ is a standard $\alpha$-stable Lévy motion independent of $B_t$. Let us denote by $N$ a random variable with the standard normal distribution and by $U$ a random variable with the standard $\alpha$-stable distribution $S_\alpha(1,\beta,0)$, where $\alpha\in(0,2)$ is the index of stability and $\beta\in[-1,1]$ is the skewness parameter. By using the self-similarity and a time change, we can easily show that the limiting random variable in Theorem 2.2 has distribution given by
$$aI^{-\frac12}(\theta_0)\,N + \sigma I^{-1}(\theta_0)\left(\int_0^1\left(\frac{1}{2\sqrt{\theta_0+(X_s^0)^2}}\right)^\alpha ds\right)^{1/\alpha} U.$$
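For the numerical step, the following short R sketch (ours; the bracketing interval is an assumption and must contain a sign change) solves the estimating equation by root finding, which may be used in place of Newton's method:

```r
## Solve sum_k (X_{t_k} - X_{t_{k-1}}) / sqrt(theta + X_{t_{k-1}}^2) = 1 for theta.
lse_sqrt <- function(X, lower = 1e-6, upper = 100) {
  n  <- length(X) - 1
  Xp <- X[1:n]
  dX <- diff(X)
  g  <- function(theta) sum(dX / sqrt(theta + Xp^2)) - 1
  uniroot(g, lower = lower, upper = upper)$root
}
```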

Example 2.8. We consider a two-dimensional stochastic process in (1.2) with drift function $b(x,\theta) = C + Ax$, where $C = (c_1,c_2)^T$, $A = (A_{ij})_{1\le i,j\le2}$ and $x = (x_1,x_2)^T$. We assume that the eigenvalues of $A$ have positive real parts. We want to estimate $\theta = (\theta_1,\ldots,\theta_6)^T = (c_1, A_{11}, A_{12}, c_2, A_{21}, A_{22})^T\in\Theta\subset\mathbb{R}^6$, whose true value is $\theta_0 = (c_1^0, A_{11}^0, A_{12}^0, c_2^0, A_{21}^0, A_{22}^0)^T$. Then $X_t^0$ satisfies the following ODE
$$dX_t^0 = (C_0 + A_0 X_t^0)\,dt, \qquad X_0^0 = x_0.$$
The explicit solution is given by $X_t^0 = e^{A_0 t}x_0 + \int_0^t e^{A_0(t-s)}C_0\,ds$. After some basic calculation, we find that the LSE $\hat\theta_{n,\varepsilon} = (\hat\theta_{n,\varepsilon,i})_{1\le i\le6}$ is given by
$$\begin{pmatrix}\hat\theta_{n,\varepsilon,1}\\ \hat\theta_{n,\varepsilon,2}\\ \hat\theta_{n,\varepsilon,3}\end{pmatrix} = \Lambda_n^{-1}\begin{pmatrix} n\sum_{k=1}^n Y_k^{(1)} \\ n\sum_{k=1}^n Y_k^{(1)}X^{(1)}_{t_{k-1}} \\ n\sum_{k=1}^n Y_k^{(1)}X^{(2)}_{t_{k-1}}\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}\hat\theta_{n,\varepsilon,4}\\ \hat\theta_{n,\varepsilon,5}\\ \hat\theta_{n,\varepsilon,6}\end{pmatrix} = \Lambda_n^{-1}\begin{pmatrix} n\sum_{k=1}^n Y_k^{(2)} \\ n\sum_{k=1}^n Y_k^{(2)}X^{(1)}_{t_{k-1}} \\ n\sum_{k=1}^n Y_k^{(2)}X^{(2)}_{t_{k-1}}\end{pmatrix},$$
where $X^{(i)}_{t_{k-1}}$ ($i=1,2$) are the components of $X_{t_{k-1}}$, $Y^{(i)}_k$ ($i=1,2$) are the components of $Y_k = X_{t_k} - X_{t_{k-1}}$, and
$$\Lambda_n = \begin{pmatrix} n & \sum_{k=1}^n X^{(1)}_{t_{k-1}} & \sum_{k=1}^n X^{(2)}_{t_{k-1}} \\ \sum_{k=1}^n X^{(1)}_{t_{k-1}} & \sum_{k=1}^n \left(X^{(1)}_{t_{k-1}}\right)^2 & \sum_{k=1}^n X^{(1)}_{t_{k-1}}X^{(2)}_{t_{k-1}} \\ \sum_{k=1}^n X^{(2)}_{t_{k-1}} & \sum_{k=1}^n X^{(1)}_{t_{k-1}}X^{(2)}_{t_{k-1}} & \sum_{k=1}^n \left(X^{(2)}_{t_{k-1}}\right)^2 \end{pmatrix}.$$
Since it is easy and straightforward to compute the partial derivatives $\partial_{\theta_i}b(x,\theta)$, $1\le i\le6$, and the limiting random vector in Theorem 2.2, we omit the details here.
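For completeness, here is an illustrative sketch (ours; names hypothetical) assembling $\Lambda_n$ and the two right-hand vectors from an $(n+1)\times2$ matrix of observations and solving the two linear systems:

```r
## Explicit LSE of Example 2.8; X is an (n + 1) x 2 matrix with rows X_{t_k}.
lse_affine2d <- function(X) {
  n  <- nrow(X) - 1
  Xp <- X[1:n, , drop = FALSE]        # X_{t_{k-1}}
  Y  <- diff(X)                       # Y_k = X_{t_k} - X_{t_{k-1}}, row-wise
  s1 <- sum(Xp[, 1]); s2 <- sum(Xp[, 2]); s12 <- sum(Xp[, 1] * Xp[, 2])
  Lambda <- matrix(c(n,  s1,              s2,
                     s1, sum(Xp[, 1]^2),  s12,
                     s2, s12,             sum(Xp[, 2]^2)),
                   nrow = 3, byrow = TRUE)
  rhs <- function(i) n * c(sum(Y[, i]),
                           sum(Y[, i] * Xp[, 1]),
                           sum(Y[, i] * Xp[, 2]))
  c(solve(Lambda, rhs(1)), solve(Lambda, rhs(2)))  # (c1, A11, A12, c2, A21, A22)
}
```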

3. Proofs

3.1. Proof of Theorem 2.1

We first establish some preliminary lemmas. In the sequel, we shall use the notation
$$Y_t^{n,\varepsilon} := X_{[nt]/n}$$
for the stochastic process $X$ defined by (1.2), where $[nt]$ denotes the integer part of $nt$.

Lemma 3.1. The sequence $Y_t^{n,\varepsilon}$ converges to the deterministic process $X_t^0$ uniformly on compacts in probability as $\varepsilon\to0$ and $n\to\infty$.

Proof. Note that
$$X_t - X_t^0 = \int_0^t \left(b(X_s,\theta_0) - b(X_s^0,\theta_0)\right)ds + \varepsilon L_t.$$


By the Lipschitz condition on $b(\cdot)$ in (A1) and the Cauchy–Schwarz inequality, we find that
$$|X_t - X_t^0|^2 \le 2\left|\int_0^t (b(X_s,\theta_0)-b(X_s^0,\theta_0))\,ds\right|^2 + 2\varepsilon^2|L_t|^2 \le 2t\int_0^t |b(X_s,\theta_0)-b(X_s^0,\theta_0)|^2\,ds + 2\varepsilon^2\sup_{0\le s\le t}|L_s|^2$$
$$\le 2K^2 t\int_0^t |X_s-X_s^0|^2\,ds + 2\varepsilon^2\sup_{0\le s\le t}|L_s|^2.$$
By Gronwall's inequality, it follows that
$$|X_t-X_t^0|^2 \le 2\varepsilon^2 e^{2K^2t^2}\sup_{0\le s\le t}|L_s|^2,$$
and consequently
$$\sup_{0\le t\le T}|X_t - X_t^0| \le \sqrt{2}\,\varepsilon\,e^{K^2T^2}\sup_{0\le t\le T}|L_t|, \qquad (3.1)$$
which goes to zero in probability as $\varepsilon\to0$ for each $T>0$. Since $[nt]/n\to t$ as $n\to\infty$, we conclude that the statement holds.

Lemma 3.2. Let $\tau_m^{n,\varepsilon} = \inf\{t\ge0: |X_t^0|\ge m \text{ or } |Y_t^{n,\varepsilon}|\ge m\}$. Then, $\tau_m^{n,\varepsilon}\to\infty$ a.s. uniformly in $n$ and $\varepsilon$ as $m\to\infty$.

Proof. Note that
$$X_t = x_0 + \int_0^t b(X_s,\theta_0)\,ds + \varepsilon L_t.$$
By the linear growth condition on $b$ and the Cauchy–Schwarz inequality, we get
$$|X_t|^2 \le 2(|x_0|+\varepsilon|L_t|)^2 + 2\left|\int_0^t b(X_s,\theta_0)\,ds\right|^2 \le 2\left(|x_0|+\varepsilon\sup_{0\le s\le t}|L_s|\right)^2 + 2t\int_0^t |b(X_s,\theta_0)|^2\,ds$$
$$\le 2\left(|x_0|+\varepsilon\sup_{0\le s\le t}|L_s|\right)^2 + 2K^2 t\int_0^t (1+|X_s|)^2\,ds \le 2\left(|x_0|+\varepsilon\sup_{0\le s\le t}|L_s|\right)^2 + 4K^2t^2 + 4K^2 t\int_0^t |X_s|^2\,ds.$$
Gronwall's inequality yields that
$$|X_t|^2 \le \left[2\left(|x_0|+\varepsilon\sup_{0\le s\le t}|L_s|\right)^2 + 4K^2t^2\right]e^{4K^2t^2}$$
and
$$|X_t| \le \left[\sqrt{2}\left(|x_0|+\varepsilon\sup_{0\le s\le t}|L_s|\right) + 2Kt\right]e^{2K^2t^2}.$$
Thus, since $\varepsilon\in(0,1]$, it follows that
$$|Y_t^{n,\varepsilon}| = |X_{[nt]/n}| \le \left[\sqrt{2}\left(|x_0|+\sup_{0\le s\le t}|L_s|\right) + 2Kt\right]e^{2K^2t^2},$$
which is almost surely finite. Therefore the proof is complete.

We shall use $\nabla_x f(x,\theta) = (\partial_{x_1}f(x,\theta),\ldots,\partial_{x_d}f(x,\theta))^T$ to denote the gradient operator of $f(x,\theta)$ with respect to $x$.

Lemma 3.3. Let $f\in C^{1,1}_\uparrow(\mathbb{R}^d\times\Theta;\mathbb{R})$. Assume (A1)–(A2). Then, we have
$$\frac1n\sum_{k=1}^n f(X_{t_{k-1}},\theta) \xrightarrow{P_{\theta_0}} \int_0^1 f(X_s^0,\theta)\,ds$$
as $\varepsilon\to0$ and $n\to\infty$, uniformly in $\theta\in\Theta$.


Proof. By the differentiability of the function $f(x,\theta)$ and Lemma 3.1, we find that
$$\sup_{\theta\in\Theta}\left|\frac1n\sum_{k=1}^n f(X_{t_{k-1}},\theta) - \int_0^1 f(X_s^0,\theta)\,ds\right| = \sup_{\theta\in\Theta}\left|\int_0^1 f(Y_s^{n,\varepsilon},\theta)\,ds - \int_0^1 f(X_s^0,\theta)\,ds\right|$$
$$\le \sup_{\theta\in\Theta}\int_0^1 |f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)|\,ds \le \sup_{\theta\in\Theta}\int_0^1\left|\int_0^1 (\nabla_x f)^T\!\left(X_s^0+u(Y_s^{n,\varepsilon}-X_s^0),\theta\right)\cdot(Y_s^{n,\varepsilon}-X_s^0)\,du\right|ds$$
$$\le \int_0^1\!\int_0^1 \sup_{\theta\in\Theta}\left|\nabla_x f\!\left(X_s^0+u(Y_s^{n,\varepsilon}-X_s^0),\theta\right)\right|du\ |Y_s^{n,\varepsilon}-X_s^0|\,ds \le \int_0^1 C(1+|X_s^0|+|Y_s^{n,\varepsilon}|)^\lambda\,|Y_s^{n,\varepsilon}-X_s^0|\,ds$$
$$\le C\left(1+\sup_{0\le s\le1}|X_s^0|+\sup_{0\le s\le1}|X_s|\right)^\lambda \sup_{0\le s\le1}|Y_s^{n,\varepsilon}-X_s^0| \xrightarrow{P_{\theta_0}} 0$$
as $\varepsilon\to0$ and $n\to\infty$.

Lemma 3.4. Let $f\in C^{1,1}_\uparrow(\mathbb{R}^d\times\Theta;\mathbb{R})$. Assume (A1)–(A2). Then, we have that for each $1\le i\le d$ and each $\theta\in\Theta$,
$$\sum_{k=1}^n f(X_{t_{k-1}},\theta)(L_{t_k}^i - L_{t_{k-1}}^i) \xrightarrow{P_{\theta_0}} \int_0^1 f(X_s^0,\theta)\,dL_s^i$$
as $\varepsilon\to0$ and $n\to\infty$, where
$$L_t^i = a_i t + \sum_{j=1}^r \sigma_{ij}B_t^j + \int_0^t\!\int_{|z|\le1} z_i\,\tilde N(ds,dz) + \int_0^t\!\int_{|z|>1} z_i\,N(ds,dz)$$
is the $i$-th component of $L_t$.

Proof. Note that
$$\sum_{k=1}^n f(X_{t_{k-1}},\theta)(L_{t_k}^i - L_{t_{k-1}}^i) = \int_0^1 f(Y_s^{n,\varepsilon},\theta)\,dL_s^i.$$
Let $\tilde L_t^i = L_t^i - \int_0^t\!\int_{|z|>1} z_i\,N(ds,dz)$. Then, we have the following decomposition
$$\int_0^1 f(Y_s^{n,\varepsilon},\theta)\,dL_s^i - \int_0^1 f(X_s^0,\theta)\,dL_s^i = \int_0^1\!\int_{|z|>1}\left(f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right)z_i\,N(ds,dz) + \int_0^1 \left(f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right)d\tilde L_s^i.$$
Similar to the proof of Lemma 3.3, we have
$$\left|\int_0^1\!\int_{|z|>1}\left(f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right)z_i\,N(ds,dz)\right| \le \int_0^1\!\int_{|z|>1}|f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)|\,|z_i|\,N(ds,dz)$$
$$\le \int_0^1\!\int_{|z|>1} C(1+|X_s^0|+|Y_s^{n,\varepsilon}|)^\lambda\,|Y_s^{n,\varepsilon}-X_s^0|\,|z_i|\,N(ds,dz)$$
$$\le C\left(1+\sup_{0\le s\le1}|X_s^0|+\sup_{0\le s\le1}|X_s|\right)^\lambda \sup_{0\le s\le1}|Y_s^{n,\varepsilon}-X_s^0| \times \int_0^1\!\int_{|z|>1}|z_i|\,N(ds,dz),$$


which converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$ by Lemma 3.1. By using the stopping time $\tau_m^{n,\varepsilon}$, Lemma 3.1, the Markov inequality and dominated convergence, we find that for any given $\eta>0$ and some fixed $m$,
$$P\left(\left|\int_0^1 \left(f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right)1_{\{s\le\tau_m^{n,\varepsilon}\}}\,d\tilde L_s^i\right| > \eta\right) \le \frac{|a_i|}{\eta}\int_0^1 E\left|f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right|1_{\{s\le\tau_m^{n,\varepsilon}\}}\,ds$$
$$+ \frac{\sqrt{\sum_{j=1}^r \sigma_{ij}^2}}{\eta}\left(\int_0^1 E\left|f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right|^2 1_{\{s\le\tau_m^{n,\varepsilon}\}}\,ds\right)^{1/2}$$
$$+ \frac1\eta\left(\int_0^1 E\left|f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right|^2 1_{\{s\le\tau_m^{n,\varepsilon}\}}\,ds \cdot \int_{|z|\le1}|z_i|^2\,\nu(dz)\right)^{1/2}, \qquad (3.2)$$
which goes to zero as $\varepsilon\to0$ and $n\to\infty$. Then, we have
$$P\left(\left|\int_0^1 \left(f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right)d\tilde L_s^i\right| > \eta\right) \le P(\tau_m^{n,\varepsilon}<1) + P\left(\left|\int_0^1 \left(f(Y_s^{n,\varepsilon},\theta)-f(X_s^0,\theta)\right)1_{\{s\le\tau_m^{n,\varepsilon}\}}\,d\tilde L_s^i\right| > \eta\right),$$
which converges to zero as $\varepsilon\to0$ and $n\to\infty$ by Lemma 3.2 and (3.2). This completes the proof.

Lemma 3.5. Let $f\in C^{1,1}_\uparrow(\mathbb{R}^d\times\Theta;\mathbb{R})$. Assume (A1)–(A2). Then, we have that for $1\le i\le d$,
$$\sum_{k=1}^n f(X_{t_{k-1}},\theta)\left(X_{t_k}^i - X_{t_{k-1}}^i - b_i(X_{t_{k-1}},\theta_0)\Delta t_{k-1}\right) \xrightarrow{P_{\theta_0}} 0$$
as $\varepsilon\to0$ and $n\to\infty$, uniformly in $\theta\in\Theta$, where $X_t^i$ and $b_i$ are the $i$-th components of $X_t$ and $b$, respectively.

Proof. Note that
$$X_{t_k}^i = X_{t_{k-1}}^i + \int_{t_{k-1}}^{t_k} b_i(X_s,\theta_0)\,ds + \varepsilon(L_{t_k}^i - L_{t_{k-1}}^i).$$
It is easy to see that
$$\sum_{k=1}^n f(X_{t_{k-1}},\theta)\left(X_{t_k}^i - X_{t_{k-1}}^i - b_i(X_{t_{k-1}},\theta_0)\Delta t_{k-1}\right) = \sum_{k=1}^n\int_{t_{k-1}}^{t_k} f(X_{t_{k-1}},\theta)\left(b_i(X_s,\theta_0)-b_i(X_{t_{k-1}},\theta_0)\right)ds + \varepsilon\sum_{k=1}^n f(X_{t_{k-1}},\theta)(L_{t_k}^i-L_{t_{k-1}}^i)$$
$$= \int_0^1 f(Y_s^{n,\varepsilon},\theta)\left(b_i(X_s,\theta_0)-b_i(Y_s^{n,\varepsilon},\theta_0)\right)ds + \varepsilon\int_0^1 f(Y_s^{n,\varepsilon},\theta)\,dL_s^i.$$
By the given condition on $f$ and the Lipschitz condition on $b$, we have
$$\sup_{\theta\in\Theta}\left|\int_0^1 f(Y_s^{n,\varepsilon},\theta)\left(b_i(X_s,\theta_0)-b_i(Y_s^{n,\varepsilon},\theta_0)\right)ds\right| \le \int_0^1 \sup_{\theta\in\Theta}|f(Y_s^{n,\varepsilon},\theta)|\cdot K|X_s-Y_s^{n,\varepsilon}|\,ds$$
$$\le KC\int_0^1 (1+|Y_s^{n,\varepsilon}|)^\lambda\left(|X_s-X_s^0|+|Y_s^{n,\varepsilon}-X_s^0|\right)ds \le KC\left(1+\sup_{0\le t\le1}|X_t|\right)^\lambda\left(\sup_{0\le s\le1}|X_s-X_s^0|+\sup_{0\le s\le1}|Y_s^{n,\varepsilon}-X_s^0|\right),$$
which converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$ by Lemma 3.1. Next, using the decomposition of $L_t$, we have
$$\sup_{\theta\in\Theta}\left|\varepsilon\int_0^1 f(Y_s^{n,\varepsilon},\theta)\,dL_s^i\right| \le \varepsilon\sup_{\theta\in\Theta}\left|a_i\int_0^1 f(Y_s^{n,\varepsilon},\theta)\,ds\right| + \varepsilon\sup_{\theta\in\Theta}\left|\int_0^1 f(Y_s^{n,\varepsilon},\theta)\sum_{j=1}^r\sigma_{ij}\,dB_s^j\right|$$
$$+ \varepsilon\sup_{\theta\in\Theta}\left|\int_0^1\!\int_{|z|\le1} f(Y_s^{n,\varepsilon},\theta)z_i\,\tilde N(ds,dz)\right| + \varepsilon\sup_{\theta\in\Theta}\left|\int_0^1\!\int_{|z|>1} f(Y_s^{n,\varepsilon},\theta)z_i\,N(ds,dz)\right|.$$


It is clear that
$$\varepsilon\sup_{\theta\in\Theta}\left|a_i\int_0^1 f(Y_s^{n,\varepsilon},\theta)\,ds\right| \le \varepsilon|a_i|C\int_0^1 (1+|Y_s^{n,\varepsilon}|)^\lambda\,ds \le \varepsilon|a_i|C\left(1+\sup_{0\le s\le1}|X_s|\right)^\lambda,$$
which converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$, and
$$\varepsilon\sup_{\theta\in\Theta}\left|\int_0^1\!\int_{|z|>1} f(Y_s^{n,\varepsilon},\theta)z_i\,N(ds,dz)\right| \le \varepsilon\int_0^1\!\int_{|z|>1}\sup_{\theta\in\Theta}|f(Y_s^{n,\varepsilon},\theta)|\cdot|z_i|\,N(ds,dz)$$
$$\le \varepsilon\int_0^1\!\int_{|z|>1} C(1+|Y_s^{n,\varepsilon}|)^\lambda\,|z_i|\,N(ds,dz) \le \varepsilon C\left(1+\sup_{0\le s\le1}|X_s|\right)^\lambda\int_0^1\!\int_{|z|>1}|z_i|\,N(ds,dz),$$
which converges to zero in probability. Note that
$$P\left(\varepsilon\sup_{\theta\in\Theta}\left|\int_0^1 f(Y_s^{n,\varepsilon},\theta)\sum_{j=1}^r\sigma_{ij}\,dB_s^j\right| > \eta\right) \le P(\tau_m^{n,\varepsilon}<1) + P\left(\varepsilon\sup_{\theta\in\Theta}\left|\int_0^1 f(Y_s^{n,\varepsilon},\theta)1_{\{s\le\tau_m^{n,\varepsilon}\}}\sum_{j=1}^r\sigma_{ij}\,dB_s^j\right| > \eta\right). \qquad (3.3)$$
Let
$$u_{n,\varepsilon}^i(\theta) = \varepsilon\int_0^1 f(Y_s^{n,\varepsilon},\theta)1_{\{s\le\tau_m^{n,\varepsilon}\}}\sum_{j=1}^r\sigma_{ij}\,dB_s^j, \qquad 1\le i\le d.$$

We want to prove that $u_{n,\varepsilon}^i(\theta)\to0$ in probability as $\varepsilon\to0$ and $n\to\infty$, uniformly in $\theta\in\Theta$. It suffices to show the pointwise convergence and the tightness of the sequence $u_{n,\varepsilon}^i(\cdot)$. For the pointwise convergence, by the Chebyshev inequality and Itô's isometry, we have
$$P(|u_{n,\varepsilon}^i(\theta)|>\eta) \le \varepsilon^2\eta^{-2}E\left|\int_0^1 f(Y_s^{n,\varepsilon},\theta)1_{\{s\le\tau_m^{n,\varepsilon}\}}\sum_{j=1}^r\sigma_{ij}\,dB_s^j\right|^2 \le \left(\sum_{j=1}^r\sigma_{ij}^2\right)\varepsilon^2\eta^{-2}\int_0^1 E|f(Y_s^{n,\varepsilon},\theta)|^2 1_{\{s\le\tau_m^{n,\varepsilon}\}}\,ds$$
$$\le \left(\sum_{j=1}^r\sigma_{ij}^2\right)\varepsilon^2\eta^{-2}\int_0^1 E\,C^2(1+|Y_s^{n,\varepsilon}|)^{2\lambda}1_{\{s\le\tau_m^{n,\varepsilon}\}}\,ds \le \left(\sum_{j=1}^r\sigma_{ij}^2\right)\varepsilon^2\eta^{-2}C^2(1+m)^{2\lambda}, \qquad (3.4)$$
which converges to zero as $\varepsilon\to0$ and $n\to\infty$ with fixed $m$. For the tightness of $u_{n,\varepsilon}^i(\cdot)$, by using Theorem 20 in Appendix I of Ibragimov and Has'minskii [11], it is enough to prove the following two inequalities:
$$E\left[|u_{n,\varepsilon}^i(\theta)|^{2q}\right] \le C, \qquad (3.5)$$
$$E\left[|u_{n,\varepsilon}^i(\theta_2)-u_{n,\varepsilon}^i(\theta_1)|^{2q}\right] \le C|\theta_2-\theta_1|^{2q} \qquad (3.6)$$
for $\theta,\theta_1,\theta_2\in\Theta$, where $2q>p$. The proof of (3.5) is very similar to the moment estimates in (3.4), replacing Itô's isometry with the Burkholder–Davis–Gundy inequality, so we omit the details here. For (3.6), by using Taylor's formula and the Burkholder–Davis–Gundy inequality, we have

$$E\left[|u_{n,\varepsilon}^i(\theta_2)-u_{n,\varepsilon}^i(\theta_1)|^{2q}\right] \le \varepsilon^{2q}C_q\left(\sum_{j=1}^r\sigma_{ij}^2\right)^q E\left(\int_0^1 \left(f(Y_s^{n,\varepsilon},\theta_2)-f(Y_s^{n,\varepsilon},\theta_1)\right)^2 1_{\{s\le\tau_m^{n,\varepsilon}\}}\,ds\right)^q$$
$$\le \varepsilon^{2q}C_q\left(\sum_{j=1}^r\sigma_{ij}^2\right)^q E\left(\int_0^1\!\int_0^1 |\theta_2-\theta_1|^2\left|\nabla_\theta f(Y_s^{n,\varepsilon},\theta_1+v(\theta_2-\theta_1))\right|^2 1_{\{s\le\tau_m^{n,\varepsilon}\}}\,dv\,ds\right)^q$$


$$\le \varepsilon^{2q}C_q\left(\sum_{j=1}^r\sigma_{ij}^2\right)^q C^{2q}|\theta_2-\theta_1|^{2q}\,E\left(\int_0^1 (1+|Y_s^{n,\varepsilon}|)^{2\lambda}1_{\{s\le\tau_m^{n,\varepsilon}\}}\,ds\right)^q \le \varepsilon^{2q}C_q\left(\sum_{j=1}^r\sigma_{ij}^2\right)^q C^{2q}(1+m)^{2\lambda q}|\theta_2-\theta_1|^{2q}.$$

Combining (3.3) and the above arguments, we have that $\varepsilon\sup_{\theta\in\Theta}\left|\int_0^1 f(Y_s^{n,\varepsilon},\theta)\sum_{j=1}^r\sigma_{ij}\,dB_s^j\right|$ converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$. Similarly, we can prove that $\varepsilon\sup_{\theta\in\Theta}\left|\int_0^1\!\int_{|z|\le1} f(Y_s^{n,\varepsilon},\theta)z_i\,\tilde N(ds,dz)\right|$ converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$. Therefore, the proof is complete.

Now we are in a position to prove Theorem 2.1.

Proof of Theorem 2.1. Note that
$$\Phi_{n,\varepsilon}(\theta) = -2\sum_{k=1}^n \left(b(X_{t_{k-1}},\theta)-b(X_{t_{k-1}},\theta_0)\right)^T\left(X_{t_k}-X_{t_{k-1}}-n^{-1}b(X_{t_{k-1}},\theta_0)\right) + \frac1n\sum_{k=1}^n \left|b(X_{t_{k-1}},\theta)-b(X_{t_{k-1}},\theta_0)\right|^2 := \Phi_{n,\varepsilon}^{(1)}(\theta) + \Phi_{n,\varepsilon}^{(2)}(\theta).$$
Applying Lemma 3.5 with $f(x,\theta) = b_i(x,\theta)-b_i(x,\theta_0)$ ($1\le i\le d$), we have $\sup_{\theta\in\Theta}|\Phi_{n,\varepsilon}^{(1)}(\theta)| \xrightarrow{P_{\theta_0}} 0$ as $\varepsilon\to0$ and $n\to\infty$. By using Lemma 3.3 with $f(x,\theta) = |b(x,\theta)-b(x,\theta_0)|^2$, we find $\sup_{\theta\in\Theta}|\Phi_{n,\varepsilon}^{(2)}(\theta)-F(\theta)| \xrightarrow{P_{\theta_0}} 0$ as $\varepsilon\to0$ and $n\to\infty$, where $F(\theta) = \int_0^1 |b(X_t^0,\theta)-b(X_t^0,\theta_0)|^2\,dt$. Thus, combining the previous arguments, we have
$$\sup_{\theta\in\Theta}|\Phi_{n,\varepsilon}(\theta)-F(\theta)| \xrightarrow{P_{\theta_0}} 0$$
as $\varepsilon\to0$ and $n\to\infty$; moreover, (A3) and the continuity of $X^0$ yield that
$$\inf_{|\theta-\theta_0|>\delta} F(\theta) > F(\theta_0) = 0$$
for each $\delta>0$. Therefore, by Theorem 5.9 of van der Vaart [47], we have the desired consistency, i.e., $\hat\theta_{n,\varepsilon} \xrightarrow{P_{\theta_0}} \theta_0$ as $\varepsilon\to0$ and $n\to\infty$. This completes the proof.

3.2. Proof of Theorem 2.2

Note that
$$\nabla_\theta\Phi_{n,\varepsilon}(\theta) = -2\sum_{k=1}^n (\nabla_\theta b)^T(X_{t_{k-1}},\theta)\left(X_{t_k}-X_{t_{k-1}}-b(X_{t_{k-1}},\theta)\Delta t_{k-1}\right).$$
Let $G_{n,\varepsilon}(\theta) = (G_{n,\varepsilon}^1,\ldots,G_{n,\varepsilon}^p)^T$ with
$$G_{n,\varepsilon}^i(\theta) = \sum_{k=1}^n (\partial_{\theta_i}b)^T(X_{t_{k-1}},\theta)\left(X_{t_k}-X_{t_{k-1}}-b(X_{t_{k-1}},\theta)\Delta t_{k-1}\right), \qquad i=1,\ldots,p,$$
and let $K_{n,\varepsilon}(\theta) = \nabla_\theta G_{n,\varepsilon}(\theta)$, which is a $p\times p$ matrix consisting of elements $K_{n,\varepsilon}^{ij}(\theta) = \partial_{\theta_j}G_{n,\varepsilon}^i(\theta)$, $1\le i,j\le p$. Moreover, we introduce the following function
$$K^{ij}(\theta) = \int_0^1 (\partial_{\theta_j}\partial_{\theta_i}b)^T(X_s^0,\theta)\left(b(X_s^0,\theta_0)-b(X_s^0,\theta)\right)ds - I^{ij}(\theta), \qquad 1\le i,j\le p.$$
Then we define the matrix function $K(\theta) = (K^{ij}(\theta))_{1\le i,j\le p}$. Before proving Theorem 2.2, we prepare some preliminary results.

Lemma 3.6. Assume (A1)–(A2). Then, we have that for each $i=1,\ldots,p$,
$$\varepsilon^{-1}G_{n,\varepsilon}^i(\theta_0) \xrightarrow{P_{\theta_0}} \int_0^1 (\partial_{\theta_i}b)^T(X_s^0,\theta_0)\,dL_s$$
as $\varepsilon\to0$, $n\to\infty$ and $n\varepsilon\to\infty$.


Proof. Note that for $1\le i\le p$,
$$\varepsilon^{-1}G_{n,\varepsilon}^i(\theta_0) = \varepsilon^{-1}\sum_{k=1}^n (\partial_{\theta_i}b)^T(X_{t_{k-1}},\theta_0)\left(X_{t_k}-X_{t_{k-1}}-b(X_{t_{k-1}},\theta_0)\Delta t_{k-1}\right)$$
$$= \varepsilon^{-1}\sum_{k=1}^n (\partial_{\theta_i}b)^T(X_{t_{k-1}},\theta_0)\int_{t_{k-1}}^{t_k}\left(b(X_s,\theta_0)-b(X_{t_{k-1}},\theta_0)\right)ds + \sum_{k=1}^n (\partial_{\theta_i}b)^T(X_{t_{k-1}},\theta_0)(L_{t_k}-L_{t_{k-1}}) := H_{n,\varepsilon}^{(1)}(\theta_0) + H_{n,\varepsilon}^{(2)}(\theta_0).$$
By using Lemma 3.4 with $f(x,\theta) = \partial_{\theta_i}b_j(x,\theta)$ ($1\le i\le p$, $1\le j\le d$) and $\theta=\theta_0$, we have
$$H_{n,\varepsilon}^{(2)}(\theta_0) = \int_0^1 (\partial_{\theta_i}b)^T(Y_s^{n,\varepsilon},\theta_0)\,dL_s \xrightarrow{P_{\theta_0}} \int_0^1 (\partial_{\theta_i}b)^T(X_s^0,\theta_0)\,dL_s$$
as $\varepsilon\to0$ and $n\to\infty$. It suffices to prove that $H_{n,\varepsilon}^{(1)}(\theta_0)$ converges to zero in probability. For $H_{n,\varepsilon}^{(1)}(\theta_0)$, we need a delicate estimate for the process $X_t$. For $s\in[t_{k-1},t_k]$, we have
$$X_s - X_{t_{k-1}} = \int_{t_{k-1}}^s \left(b(X_u,\theta_0)-b(X_{t_{k-1}},\theta_0)\right)du + b(X_{t_{k-1}},\theta_0)(s-t_{k-1}) + \varepsilon(L_s-L_{t_{k-1}}).$$

n,ε(θ0), we need some delicateestimate for the process Xt . For s ∈ [tk−1, tk], we have

Xs − Xtk−1 =

s

tk−1

(b(Xu, θ0) − b(Xtk−1 , θ0))du + b(Xtk−1 , θ0)(s − tk−1) + ε(Ls − Ltk−1).

By the Lipschitz condition on b and the Cauchy–Schwarz inequality, we find that

|Xs − Xtk−1 |2

≤ 2

s

tk−1

(b(Xu, θ0) − b(Xtk−1 , θ0))du

2

+ 2|b(Xtk−1 , θ0)|(s − tk−1) + ε|Ls − Ltk−1 |

2≤ 2K 2n−1

s

tk−1

|Xu − Xtk−1 |2du + 2

n−1

|b(Xtk−1 , θ0)| + ε suptk−1≤s≤tk

|Ls − Ltk−1 |

2

.

By Gronwall’s inequality, we get

|Xs − Xtk−1 |2

≤ 2

n−1

|b(Xtk−1 , θ0)| + ε suptk−1≤s≤tk

|Ls − Ltk−1 |

2

e2K2n−1(s−tk−1).

It further follows that

suptk−1≤s≤tk

|Xs − Xtk−1 | ≤√2

n−1

|b(Xtk−1 , θ0)| + ε suptk−1≤s≤tk

|Ls − Ltk−1 |

eK

2/n2 . (3.7)

Thus, by the Lipschitz condition on $b$ and (3.7), we get
$$|H_{n,\varepsilon}^{(1)}(\theta_0)| \le \varepsilon^{-1}\sum_{k=1}^n |\partial_{\theta_i}b(X_{t_{k-1}},\theta_0)|\cdot\left|\int_{t_{k-1}}^{t_k}\left(b(X_s,\theta_0)-b(X_{t_{k-1}},\theta_0)\right)ds\right| \le \varepsilon^{-1}\sum_{k=1}^n |\partial_{\theta_i}b(X_{t_{k-1}},\theta_0)|\int_{t_{k-1}}^{t_k} K|X_s-X_{t_{k-1}}|\,ds$$
$$\le (n\varepsilon)^{-1}K\sum_{k=1}^n |\partial_{\theta_i}b(X_{t_{k-1}},\theta_0)|\sup_{t_{k-1}\le s\le t_k}|X_s-X_{t_{k-1}}|$$
$$\le \frac{\sqrt{2}Ke^{K^2/n^2}}{n\varepsilon}\cdot\frac1n\sum_{k=1}^n |\partial_{\theta_i}b(X_{t_{k-1}},\theta_0)|\cdot|b(X_{t_{k-1}},\theta_0)| + \frac{\sqrt{2}Ke^{K^2/n^2}}{n}\sum_{k=1}^n |\partial_{\theta_i}b(X_{t_{k-1}},\theta_0)|\sup_{t_{k-1}\le s\le t_k}|L_s-L_{t_{k-1}}|$$
$$:= H_{n,\varepsilon}^{(1,1)}(\theta_0) + H_{n,\varepsilon}^{(1,2)}(\theta_0).$$

It is easy to see that $H_{n,\varepsilon}^{(1,1)}(\theta_0)$ converges to zero in probability as $n\varepsilon\to\infty$, since
$$\frac1n\sum_{k=1}^n |\partial_{\theta_i}b(X_{t_{k-1}},\theta_0)|\cdot|b(X_{t_{k-1}},\theta_0)| \le CK\left(1+\sup_{0\le s\le1}|X_s|\right)^{\lambda+1} < \infty \quad \text{a.s.}$$


(cf. (3.1)). By using the basic fact that
$$\frac1n\sum_{k=1}^n \sup_{t_{k-1}\le s\le t_k}|L_s-L_{t_{k-1}}| = o_P(1),$$
we find that
$$H_{n,\varepsilon}^{(1,2)}(\theta_0) \le \sqrt{2}Ke^{K^2/n^2}\,C\left(1+\sup_{0\le s\le1}|X_s|\right)^\lambda\cdot\frac1n\sum_{k=1}^n\sup_{t_{k-1}\le s\le t_k}|L_s-L_{t_{k-1}}|,$$
which converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$. Therefore the proof is complete.

Lemma 3.7. Assume (A1)–(A4). Then, we have
$$\sup_{\theta\in\Theta}|K_{n,\varepsilon}(\theta)-K(\theta)| \xrightarrow{P_{\theta_0}} 0$$
as $\varepsilon\to0$ and $n\to\infty$.

Proof. It suffices to prove that for $1\le i,j\le p$,
$$\sup_{\theta\in\Theta}|K_{n,\varepsilon}^{ij}(\theta)-K^{ij}(\theta)| \xrightarrow{P_{\theta_0}} 0$$
as $\varepsilon\to0$ and $n\to\infty$. Note that
$$K_{n,\varepsilon}^{ij}(\theta) = \partial_{\theta_j}G_{n,\varepsilon}^i(\theta) = \sum_{k=1}^n (\partial_{\theta_j}\partial_{\theta_i}b)^T(X_{t_{k-1}},\theta)\left(X_{t_k}-X_{t_{k-1}}-b(X_{t_{k-1}},\theta_0)\Delta t_{k-1}\right)$$
$$+ \frac1n\sum_{k=1}^n\left[(\partial_{\theta_j}\partial_{\theta_i}b)^T(X_{t_{k-1}},\theta)\left(b(X_{t_{k-1}},\theta_0)-b(X_{t_{k-1}},\theta)\right) - (\partial_{\theta_i}b)^T(X_{t_{k-1}},\theta)\,\partial_{\theta_j}b(X_{t_{k-1}},\theta)\right] := K_{n,\varepsilon}^{ij,(1)}(\theta) + K_{n,\varepsilon}^{ij,(2)}(\theta).$$
By using Lemma 3.5 with $f(x,\theta) = \partial_{\theta_j}\partial_{\theta_i}b_l(x,\theta)$ ($1\le i,j\le p$, $1\le l\le d$), we have that $\sup_{\theta\in\Theta}|K_{n,\varepsilon}^{ij,(1)}(\theta)|$ converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$. By using Lemma 3.3 with $f(x,\theta) = (\partial_{\theta_j}\partial_{\theta_i}b)^T(x,\theta)(b(x,\theta_0)-b(x,\theta)) - (\partial_{\theta_i}b)^T(x,\theta)\,\partial_{\theta_j}b(x,\theta)$, it follows that $\sup_{\theta\in\Theta}|K_{n,\varepsilon}^{ij,(2)}(\theta)-K^{ij}(\theta)|$ converges to zero in probability as $\varepsilon\to0$ and $n\to\infty$. Thus, the proof is complete.

Finally we are ready to prove Theorem 2.2.

Proof of Theorem 2.2. The proof ideas mainly follow Uchida [42]. Let $B(\theta_0;\rho) = \{\theta: |\theta-\theta_0|\le\rho\}$ for $\rho>0$. Then, by the consistency of $\hat\theta_{n,\varepsilon}$, there exists a sequence $\eta_{n,\varepsilon}\to0$ as $\varepsilon\to0$ and $n\to\infty$ such that $B(\theta_0;\eta_{n,\varepsilon})\subset\Theta_0$ and $P_{\theta_0}[\hat\theta_{n,\varepsilon}\in B(\theta_0;\eta_{n,\varepsilon})]\to1$. When $\hat\theta_{n,\varepsilon}\in B(\theta_0;\eta_{n,\varepsilon})$, it follows by Taylor's formula that
$$D_{n,\varepsilon}S_{n,\varepsilon} = \varepsilon^{-1}G_{n,\varepsilon}(\hat\theta_{n,\varepsilon}) - \varepsilon^{-1}G_{n,\varepsilon}(\theta_0),$$
where $D_{n,\varepsilon} = \int_0^1 K_{n,\varepsilon}(\theta_0+u(\hat\theta_{n,\varepsilon}-\theta_0))\,du$ and $S_{n,\varepsilon} = \varepsilon^{-1}(\hat\theta_{n,\varepsilon}-\theta_0)$, since $B(\theta_0;\eta_{n,\varepsilon})$ is a convex subset of $\Theta_0$. We have
$$|D_{n,\varepsilon} - K_{n,\varepsilon}(\theta_0)|\,1_{\{\hat\theta_{n,\varepsilon}\in B(\theta_0;\eta_{n,\varepsilon})\}} \le \sup_{\theta\in B(\theta_0;\eta_{n,\varepsilon})}|K_{n,\varepsilon}(\theta)-K_{n,\varepsilon}(\theta_0)|$$
$$\le \sup_{\theta\in B(\theta_0;\eta_{n,\varepsilon})}|K_{n,\varepsilon}(\theta)-K(\theta)| + \sup_{\theta\in B(\theta_0;\eta_{n,\varepsilon})}|K(\theta)-K(\theta_0)| + |K_{n,\varepsilon}(\theta_0)-K(\theta_0)|.$$
Consequently, it follows from Lemma 3.7 that
$$D_{n,\varepsilon} \xrightarrow{P_{\theta_0}} K(\theta_0), \qquad \varepsilon\to0,\ n\to\infty.$$
Note that $K(\theta)$ is continuous with respect to $\theta$. Since $-K(\theta_0) = I(\theta_0)$ is positive definite, there exists a positive constant $\delta>0$ such that $\inf_{|w|=1}|K(\theta_0)w| > 2\delta$. For such a $\delta>0$, there exist $\varepsilon(\delta)>0$ and $N(\delta)\in\mathbb{N}$ such that for any $\varepsilon\in(0,\varepsilon(\delta))$ and $n>N(\delta)$, $B(\theta_0;\eta_{n,\varepsilon})\subset\Theta_0$ and $|K(\theta)-K(\theta_0)|<\delta/2$ for $\theta\in B(\theta_0;\eta_{n,\varepsilon})$. For such $\delta>0$, let
$$\Gamma_{n,\varepsilon} = \left\{\sup_{|\theta-\theta_0|<\eta_{n,\varepsilon}}|K_{n,\varepsilon}(\theta)-K(\theta_0)| < \frac{\delta}{2},\ \hat\theta_{n,\varepsilon}\in B(\theta_0;\eta_{n,\varepsilon})\right\}.$$


Then, for any $\varepsilon\in(0,\varepsilon(\delta))$ and $n>N(\delta)$, we have, on $\Gamma_{n,\varepsilon}$,
$$\sup_{|w|=1}|(D_{n,\varepsilon}-K(\theta_0))w| \le \sup_{|w|=1}\left|\left(D_{n,\varepsilon}-\int_0^1 K(\theta_0+u(\hat\theta_{n,\varepsilon}-\theta_0))\,du\right)w\right| + \sup_{|w|=1}\left|\left(\int_0^1 K(\theta_0+u(\hat\theta_{n,\varepsilon}-\theta_0))\,du - K(\theta_0)\right)w\right|$$
$$\le \sup_{|\theta-\theta_0|\le\eta_{n,\varepsilon}}|K_{n,\varepsilon}(\theta)-K(\theta)| + \frac{\delta}{2} < \delta.$$
Thus, on $\Gamma_{n,\varepsilon}$,
$$\inf_{|w|=1}|D_{n,\varepsilon}w| \ge \inf_{|w|=1}|K(\theta_0)w| - \sup_{|w|=1}|(D_{n,\varepsilon}-K(\theta_0))w| > 2\delta - \delta = \delta > 0.$$
Hence, letting
$$\mathcal{D}_{n,\varepsilon} = \left\{D_{n,\varepsilon}\ \text{is invertible},\ \hat\theta_{n,\varepsilon}\in B(\theta_0;\eta_{n,\varepsilon})\right\},$$
we see that $P_{\theta_0}[\mathcal{D}_{n,\varepsilon}] \ge P_{\theta_0}[\Gamma_{n,\varepsilon}] \to 1$ as $\varepsilon\to0$ and $n\to\infty$ by Lemma 3.7. Now set
$$U_{n,\varepsilon} = D_{n,\varepsilon}1_{\mathcal{D}_{n,\varepsilon}} + I_{p\times p}1_{\mathcal{D}_{n,\varepsilon}^c},$$
where $I_{p\times p}$ is the identity matrix. Then it is easy to see that
$$|U_{n,\varepsilon} - K(\theta_0)| \le |D_{n,\varepsilon}-K(\theta_0)|1_{\mathcal{D}_{n,\varepsilon}} + |I_{p\times p}-K(\theta_0)|1_{\mathcal{D}_{n,\varepsilon}^c} \xrightarrow{P_{\theta_0}} 0,$$
since $P_{\theta_0}[\mathcal{D}_{n,\varepsilon}]\to1$. Thus, by Lemma 3.6 and the fact that $G_{n,\varepsilon}(\hat\theta_{n,\varepsilon}) = 0$ on $\mathcal{D}_{n,\varepsilon}$ (since $\hat\theta_{n,\varepsilon}$ is then an interior minimizer of $\Phi_{n,\varepsilon}$), we obtain that
$$S_{n,\varepsilon} = U_{n,\varepsilon}^{-1}D_{n,\varepsilon}S_{n,\varepsilon}1_{\mathcal{D}_{n,\varepsilon}} + S_{n,\varepsilon}1_{\mathcal{D}_{n,\varepsilon}^c} = U_{n,\varepsilon}^{-1}\left(-\varepsilon^{-1}G_{n,\varepsilon}(\theta_0)\right)1_{\mathcal{D}_{n,\varepsilon}} + S_{n,\varepsilon}1_{\mathcal{D}_{n,\varepsilon}^c}$$
$$\xrightarrow{P_{\theta_0}} (I(\theta_0))^{-1}\left(\int_0^1(\partial_{\theta_1}b)^T(X_s^0,\theta_0)\,dL_s,\ \ldots,\ \int_0^1(\partial_{\theta_p}b)^T(X_s^0,\theta_0)\,dL_s\right)^T$$
as $\varepsilon\to0$, $n\to\infty$ and $n\varepsilon\to\infty$. This completes the proof.

4. Generalization to semi-martingale noises

In this section, we discuss the extension of our main results in Section 2 to the general case when the driving noise is a semi-martingale. Let $Q_t = Q_0 + M_t + A_t$ be a semi-martingale, where $M_t$ is a local martingale and $A_t$ is a finite variation process. Then, we can replace the driving Lévy process $L_t$ in the SDE (1.2) by the semi-martingale $Q_t$ to get
$$dX_t = b(X_t,\theta)\,dt + \varepsilon\,dQ_t, \quad t\in[0,1]; \qquad X_0 = x_0. \qquad (4.1)$$
All the related notation for the LSE of $\theta$ discussed in the Introduction and Section 2 remains the same. We are interested in the consistency and asymptotic behavior of the LSE $\hat\theta_{n,\varepsilon}$ under the general model (4.1).

We state the new results as follows.

Theorem 4.1. Under conditions (A1)–(A3), we have
$$\hat\theta_{n,\varepsilon} \xrightarrow{P_{\theta_0}} \theta_0$$
as $\varepsilon\to0$ and $n\to\infty$.

Theorem 4.2. Under conditions (A1)–(A4), we have
$$\varepsilon^{-1}(\hat\theta_{n,\varepsilon}-\theta_0) \xrightarrow{P_{\theta_0}} I^{-1}(\theta_0)\,S(\theta_0),$$
as $\varepsilon\to0$, $n\to\infty$ and $n\varepsilon\to\infty$, where
$$S(\theta_0) := \left(\int_0^1(\partial_{\theta_1}b)^T(X_s^0,\theta_0)\,dQ_s,\ \ldots,\ \int_0^1(\partial_{\theta_p}b)^T(X_s^0,\theta_0)\,dQ_s\right)^T.$$


Remark 4.3. Since the proofs of Theorems 4.1 and 4.2 are similar to those of Theorems 2.1 and 2.2 given in Section 3, we mention only the necessary modifications. Since $M_t$ is a local martingale, we can use the standard localization procedure to make $M$ and $[M,M]$ bounded up to stopping times $T_m$, $m=1,2,\ldots$, with $\lim_{m\to\infty}T_m = \infty$ almost surely. For example, we can define $T_m = \inf\{t>0: [M,M]_t > m \text{ or } \int_0^t |dA_s| > m\}$. Then, we can modify the definition of $\tau_m^{n,\varepsilon}$ by
$$\tau_m^{n,\varepsilon} = \inf\{t\ge0: |X_t^0|\ge m \text{ or } |Y_t^{n,\varepsilon}|\ge m\}\wedge T_m.$$
Lemma 3.2 still holds, i.e., $\tau_m^{n,\varepsilon}\to\infty$ a.s. uniformly in $n$ and $\varepsilon$ as $m\to\infty$. When the proofs are based on pathwise arguments, they can be carried over to the semi-martingale noise case easily. When the proofs are based on the Markov inequality (or the Chebyshev inequality), Itô's isometry and the Burkholder–Davis–Gundy inequality (cf. Theorem 54 and the remark on page 175 in Chapter IV of Protter [32]), we can apply the modified stopping times $\tau_m^{n,\varepsilon}$ to the stochastic integrals with respect to the local martingale $M_t$. Thus all the proofs remain valid under the modifications described above. We omit the details here.

5. Simulations

Consider a 2-dimensional model for (1.2) with
$$b(x,\theta) = \left(\sqrt{\theta_1 + x_1^2 + x_2^2},\ \frac{-\theta_2 x_2}{\sqrt{1+x_1^2+x_2^2}}\right)^T, \qquad L_t = \begin{pmatrix} V_t^{\delta,\gamma} + B_t \\ S_t^\alpha \end{pmatrix}, \qquad (5.1)$$
where $B$ is the standard Brownian motion, $S^\alpha$ is a standard symmetric $\alpha$-stable process $S_\alpha(1,0,0)$, and $V^{\delta,\gamma}$ is a variance gamma process with Lévy density
$$p_V(z) = \frac{\delta}{|z|}e^{-\gamma|z|}, \qquad z\in\mathbb{R},\ \delta,\gamma>0.$$

In this example, we find that our LSE of $\theta$, say $\hat\theta = (\hat\theta_{n,\varepsilon,1}, \hat\theta_{n,\varepsilon,2})$, satisfies
$$\sum_{k=1}^n \frac{X_{t_k}^{(1)} - X_{t_{k-1}}^{(1)}}{\sqrt{\hat\theta_{n,\varepsilon,1} + (X_{t_{k-1}}^{(1)})^2 + (X_{t_{k-1}}^{(2)})^2}} = 1; \qquad \hat\theta_{n,\varepsilon,2} = -\frac{\displaystyle\sum_{k=1}^n \frac{\left(X_{t_k}^{(2)}-X_{t_{k-1}}^{(2)}\right)X_{t_{k-1}}^{(2)}}{\sqrt{1+(X_{t_{k-1}}^{(1)})^2+(X_{t_{k-1}}^{(2)})^2}}}{\displaystyle n^{-1}\sum_{k=1}^n \frac{(X_{t_{k-1}}^{(2)})^2}{1+(X_{t_{k-1}}^{(1)})^2+(X_{t_{k-1}}^{(2)})^2}}. \qquad (5.2)$$

In the sequel, we set the values of the parameters as
$$(X_0^{(1)}, X_0^{(2)}) = (1,1), \qquad (\theta_1,\theta_2) = (2,1), \qquad (\delta,\gamma,\alpha) = (5,3,3/2).$$

Then both $X^{(1)}$ and $X^{(2)}$ are infinite activity jump processes, but the jump part of $X^{(1)}$ is of bounded variation, while that of $X^{(2)}$ is of unbounded variation. A sample path of $X = (X_t^{(1)}, X_t^{(2)})_{t\in[0,1]}$ with $\varepsilon = 0.3$ is given in Fig. 1.

In each experiment, we generate a discrete sample $(X_{t_k})_{k=0,1,\ldots,n}$ and compute $\hat\theta$ from the sample. This procedure is iterated 10,000 times, and the mean and the standard deviation of the 10,000 sampled estimators are computed in each case of $(\varepsilon, n)$. To solve the nonlinear equation in (5.2) numerically, we used the nlm function in R. On generating discrete samples of Lévy processes, see, e.g., Cont and Tankov [4], Section II.6 and the references therein; alternatively, one can find random number generators in the yuima package of R, a package for simulating SDEs with jumps; see https://r-forge.r-project.org/projects/yuima/. For example, we use rstable to generate random samples from $\alpha$-stable distributions. The results are shown in Tables 1–4. From those tables, we can observe that the consistency result holds true as $\varepsilon\to0$. We also note that the size of $n$ is often less important in practice for estimating the drift parameter than the size of $\varepsilon$, although $n\to\infty$ is necessary in theory. This is intuitively clear because the accuracy of drift estimation generally depends strongly on the terminal time $T$ of the observations: the larger $T$ becomes, the more accurately $\theta$ is estimated. Here, however, the terminal time $T=1$ is fixed. Note that, in the small noise model, letting $\varepsilon\to0$ corresponds to observing the process from a macro point of view, which corresponds to the case $T\to\infty$ in some sense. Therefore, increasing $n$ under fixed $\varepsilon$ does not improve the bias of the estimators, which is reduced only as $\varepsilon\to0$. In general, a large $n$ can decrease the standard error (standard deviation) of the estimates, but the effect seems small in this example. A reproduction sketch is given below.
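The following self-contained R sketch reproduces a single experiment under stated assumptions; it is our illustration, not the authors' code. The variance gamma increments are generated as differences of gamma increments (a gamma process with Lévy density $\delta z^{-1}e^{-\gamma z}$ on $(0,\infty)$ has Gamma($\delta\Delta t$, rate $\gamma$) distributed increments), the symmetric $\alpha$-stable increments use the Chambers–Mallows–Stuck representation in place of `rstable`, and the SDE is discretized by an Euler scheme.

```r
## One simulation experiment for Model (5.1) (illustrative sketch).
simulate_and_estimate <- function(n = 3000, eps = 0.1, theta = c(2, 1),
                                  delta = 5, gam = 3, alpha = 1.5, x0 = c(1, 1)) {
  dt <- 1 / n
  ## Symmetric alpha-stable variates (Chambers-Mallows-Stuck, beta = 0).
  rstable_sym <- function(m, alpha) {
    V <- runif(m, -pi / 2, pi / 2); W <- rexp(m)
    sin(alpha * V) / cos(V)^(1 / alpha) *
      (cos((1 - alpha) * V) / W)^((1 - alpha) / alpha)
  }
  ## Noise increments over [t_{k-1}, t_k].
  dL1 <- rgamma(n, shape = delta * dt, rate = gam) -
         rgamma(n, shape = delta * dt, rate = gam) +   # variance gamma part
         rnorm(n, sd = sqrt(dt))                       # Brownian part
  dL2 <- dt^(1 / alpha) * rstable_sym(n, alpha)        # stable part (self-similarity)
  ## Euler scheme for the SDE (1.2) with drift as in (5.1).
  X <- matrix(0, n + 1, 2); X[1, ] <- x0
  for (k in 1:n) {
    x  <- X[k, ]; r2 <- 1 + x[1]^2 + x[2]^2
    bx <- c(sqrt(theta[1] + x[1]^2 + x[2]^2), -theta[2] * x[2] / sqrt(r2))
    X[k + 1, ] <- x + bx * dt + eps * c(dL1[k], dL2[k])
  }
  ## LSE via (5.2): theta1 solves a nonlinear equation, theta2 is explicit.
  Xp <- X[1:n, ]; Y <- diff(X); r2 <- 1 + Xp[, 1]^2 + Xp[, 2]^2
  g   <- function(th) sum(Y[, 1] / sqrt(th + Xp[, 1]^2 + Xp[, 2]^2)) - 1
  th1 <- uniroot(g, lower = 1e-4, upper = 50)$root
  th2 <- -sum(Y[, 2] * Xp[, 2] / sqrt(r2)) / mean(Xp[, 2]^2 / r2)
  c(th1, th2)
}
```

Iterating this function 10,000 times and averaging the returned estimates gives Monte Carlo summaries of the type reported in Tables 1–4 (up to the choice of random number generators).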

Comparing the standard deviations of $\hat\theta_{n,\varepsilon,1}$ and $\hat\theta_{n,\varepsilon,2}$, the former seems to be estimated more "stably" than the latter. This is because "big" jumps of $X^{(1)}$ are less frequent than those of $X^{(2)}$. If $\varepsilon$ is small enough, the path of $X^{(1)}$ is almost similar to the deterministic curve of $(X^0)^{(1)}$ since "big" jumps do not occur so frequently. However, $X^{(2)}$ can have more "big" jumps that are not ignorable even if $\varepsilon$ is "small", which makes the estimator fluctuate.

To observe the asymptotic distribution of $\hat\theta$, we shall compare the above example, say Model A (non-Gaussian noise), with a 2-dimensional process with the same drift $b$ as in (5.1), but where the driving noise $L$ is a 2-dimensional Brownian motion, say Model B (Gaussian noise).



Fig. 1. A sample path of Model (5.1) with (θ1, θ2, δ, γ , α) = (2, 1, 5, 3, 3/2) and ε = 0.3.

Table 1. Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.3 and (δ, γ, α) = (5, 3, 3/2).

                 n = 500     n = 1000    n = 3000    True
θ̂_{n,ε,1}        2.30885     2.31618     2.29381     2.0
                 (1.8770)    (1.8248)    (1.7926)
θ̂_{n,ε,2}        1.54087     1.50664     1.52753     1.0
                 (2.8493)    (2.8685)    (2.7667)

Table 2. Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.1 and (δ, γ, α) = (5, 3, 3/2).

                 n = 500     n = 1000    n = 3000    True
θ̂_{n,ε,1}        2.03134     2.02699     2.02389     2.0
                 (0.5829)    (0.5836)    (0.5833)
θ̂_{n,ε,2}        1.10165     1.09839     1.09709     1.0
                 (1.2024)    (1.1212)    (1.0971)

Table 3. Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.05 and (δ, γ, α) = (5, 3, 3/2).

                 n = 500     n = 1000    n = 3000    True
θ̂_{n,ε,1}        2.00583     2.00599     2.01071     2.0
                 (0.2961)    (0.2951)    (0.2913)
θ̂_{n,ε,2}        1.04883     1.05963     1.04438     1.0
                 (1.4364)    (1.3026)    (0.6773)

Figs. 2 and 3 respectively show (normal) QQ-plots for 10,000 iterated samples of $\varepsilon^{-1}(\hat\theta_{n,\varepsilon,i}-\theta_i)$ ($i=1,2$) in Model A with $(\varepsilon,n) = (0.01,3000)$, and Figs. 4 and 5 are those for Model B with $(\varepsilon,n) = (0.01,3000)$. In Model B, the (marginal) asymptotic distributions of $\varepsilon^{-1}(\hat\theta_{n,\varepsilon,i}-\theta_i)$ ($i=1,2$) must theoretically be normal, which is supported by Figs. 4 and 5. On the other hand, the tails of the corresponding distributions in Model A should be heavier than those of normal distributions due


Table 4. Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.01 and (δ, γ, α) = (5, 3, 3/2).

                 n = 500     n = 1000    n = 3000    True
θ̂_{n,ε,1}        2.00051     2.00061     2.00108     2.0
                 (0.0583)    (0.0583)    (0.0578)
θ̂_{n,ε,2}        1.00308     1.00572     0.99958     1.0
                 (0.2626)    (0.1454)    (0.1371)

Fig. 2. Normal QQ-plot for 10,000 iterated samples of ε−1(θn,ε,1 − θ1) in Model A (non-Gaussian); (ε, n) = (0.01, 3000).

Fig. 3. Normal QQ-plot for 10,000 iterated samples of ε−1(θn,ε,2 − θ2) in Model A (non-Gaussian); (ε, n) = (0.01, 3000).

to jump activities, and we can observe those facts from Figs. 2 and 3. We can also observe that the asymptotic distribution of $\varepsilon^{-1}(\hat\theta_{n,\varepsilon,2}-\theta_2)$ has much heavier tails than that of $\varepsilon^{-1}(\hat\theta_{n,\varepsilon,1}-\theta_1)$ because of the high frequency of jumps in $X^{(2)}$. These facts are consistent with the theory.


Fig. 4. Normal QQ-plot for 10,000 iterated samples of ε−1(θn,ε,1 − θ1) in Model B (Gaussian); (ε, n) = (0.01, 3000).

Fig. 5. Normal QQ-plot for 10,000 iterated samples of ε−1(θn,ε,2 − θ2) in Model B (Gaussian); (ε, n) = (0.01, 3000).

Acknowledgments

The authors are grateful to the anonymous referees for suggesting the addition of the sections on semi-martingale noise and simulations (Sections 4 and 5). This research was supported by JSPS KAKENHI Grant Number 24740061, Japan Science and Technology Agency, CREST (the 2nd author), and NSERC Grant Number 311945-2008 (the 3rd author).

References

[1] Y. Aït-Sahalia, Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach, Econometrica 70 (2002) 223–262.
[2] J.P.N. Bishwal, Parameter Estimation in Stochastic Differential Equations, Lecture Notes in Mathematics, Vol. 1923, Springer-Verlag, Berlin, Heidelberg, New York, 2008.
[3] P.J. Brockwell, R.A. Davis, Y. Yang, Estimation for non-negative Lévy-driven Ornstein–Uhlenbeck processes, J. Appl. Probab. 44 (2007) 977–989.
[4] R. Cont, P. Tankov, Financial Modelling with Jump Processes, Chapman & Hall/CRC, Boca Raton, FL, 2004.
[5] A.Ja. Dorogovcev, The consistency of an estimate of a parameter of a stochastic differential equation, Theory Probab. Math. Stat. 10 (1976) 73–82.
[6] V. Fasen, Statistical estimation of multivariate Ornstein–Uhlenbeck processes and applications to co-integration, J. Econometrics 172 (2013) 325–337.
[7] V. Genon-Catalot, Maximum contrast estimation for diffusion processes from discrete observations, Statistics 21 (1990) 99–116.
[8] A. Gloter, M. Sørensen, Estimation for stochastic differential equations with a small diffusion coefficient, Stochastic Process. Appl. 119 (2009) 679–699.
[9] Y. Hu, H. Long, Parameter estimation for Ornstein–Uhlenbeck processes driven by α-stable Lévy motions, Commun. Stoch. Anal. 1 (2007) 175–192.
[10] Y. Hu, H. Long, Least squares estimator for Ornstein–Uhlenbeck processes driven by α-stable motions, Stochastic Process. Appl. 119 (2009) 2465–2480.
[11] I.A. Ibragimov, R.Z. Has'minskii, Statistical Estimation: Asymptotic Theory, Springer-Verlag, New York, Berlin, 1981.
[12] R.A. Kasonga, The consistency of a nonlinear least squares estimator for diffusion processes, Stochastic Process. Appl. 30 (1988) 263–275.
[13] N. Kunitomo, A. Takahashi, The asymptotic expansion approach to the valuation of interest rate contingent claims, Math. Finance 11 (2001) 117–151.
[14] Yu.A. Kutoyants, Parameter Estimation for Stochastic Processes, Heldermann, Berlin, 1984.
[15] Yu.A. Kutoyants, Identification of Dynamical Systems with Small Noise, Kluwer, Dordrecht, 1994.
[16] Yu.A. Kutoyants, Statistical Inference for Ergodic Diffusion Processes, Springer-Verlag, London, Berlin, Heidelberg, 2004.
[17] C.F. Laredo, A sufficient condition for asymptotic sufficiency of incomplete observations of a diffusion process, Ann. Statist. 18 (1990) 1158–1171.
[18] A. Le Breton, On continuous and discrete sampling for parameter estimation in diffusion type processes, Math. Program. Studies 5 (1976) 124–144.
[19] R.S. Liptser, A.N. Shiryaev, Statistics of Random Processes: II Applications, Second Edition, Applications of Mathematics, Springer-Verlag, Berlin, Heidelberg, New York, 2001.
[20] A.W. Lo, Maximum likelihood estimation of generalized Ito processes with discretely sampled data, Econometric Theory 4 (1988) 231–247.
[21] H. Long, Least squares estimator for discretely observed Ornstein–Uhlenbeck processes with small Lévy noises, Statist. Probab. Lett. 79 (2009) 2076–2085.
[22] H. Long, Parameter estimation for a class of stochastic differential equations driven by small stable noises from discrete observations, Acta Math. Scient. 30B (2010) 645–663.
[23] C. Ma, A note on "Least squares estimator for discretely observed Ornstein–Uhlenbeck processes with small Lévy noises", Statist. Probab. Lett. 80 (2010) 1528–1531.
[24] H. Masuda, Simple estimators for parametric Markovian trend of ergodic processes based on sampled data, J. Japan Statist. Soc. 35 (2005) 147–170.
[25] H. Masuda, Approximate self-weighted LAD estimation of discretely observed ergodic Ornstein–Uhlenbeck processes, Electron. J. Stat. 4 (2010) 525–565.
[26] T. Ogihara, N. Yoshida, Quasi-likelihood analysis for the stochastic differential equation with jumps, Stat. Inference Stoch. Process. 14 (2011) 189–229.
[27] A.R. Pedersen, A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations, Scand. J. Statist. 22 (1995) 55–71.
[28] A.R. Pedersen, Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes, Bernoulli 1 (1995) 257–279.
[29] R. Poulsen, Approximate maximum likelihood estimation of discretely observed diffusion processes, Tech. Report 29, Centre for Analytical Finance, University of Aarhus, 1999.
[30] B.L.S. Prakasa Rao, Asymptotic theory for nonlinear least squares estimator for diffusion processes, Math. Operationsforsch. Statist. Ser. Statist. 14 (1983) 195–209.
[31] B.L.S. Prakasa Rao, Statistical Inference for Diffusion Type Processes, Oxford University Press, Arnold, London, New York, 1999.
[32] P. Protter, Stochastic Integration and Differential Equations, Springer-Verlag, Berlin, Heidelberg, New York, 1990.
[33] Y. Shimizu, M-estimation for discretely observed ergodic diffusion processes with infinite jumps, Stat. Inference Stoch. Process. 9 (2006) 179–225.
[34] Y. Shimizu, Quadratic type contrast functions for discretely observed non-ergodic diffusion processes, Research Report Series 09-04, Division of Mathematical Science, Osaka University, 2010.
[35] Y. Shimizu, N. Yoshida, Estimation of parameters for diffusion processes with jumps from discrete observations, Stat. Inference Stoch. Process. 9 (2006) 227–277.
[36] M. Sørensen, Small dispersion asymptotics for diffusion martingale estimating functions, Preprint No. 2000-2, Department of Statistics and Operations Research, University of Copenhagen, Copenhagen, 2000.
[37] H. Sørensen, Parameter inference for diffusion processes observed at discrete points in time: a survey, Internat. Statist. Rev. 72 (2004) 337–354.
[38] M. Sørensen, M. Uchida, Small diffusion asymptotics for discretely sampled stochastic differential equations, Bernoulli 9 (2003) 1051–1069.
[39] K. Spiliopoulos, Method of moments estimation of Ornstein–Uhlenbeck processes driven by general Lévy process, Preprint, University of Maryland, 2008.
[40] A. Takahashi, An asymptotic expansion approach to pricing contingent claims, Asia-Pacific Financial Markets 6 (1999) 115–151.
[41] A. Takahashi, N. Yoshida, An asymptotic expansion scheme for optimal investment problems, Stat. Inference Stoch. Process. 7 (2004) 153–188.
[42] M. Uchida, Estimation for discretely observed small diffusions based on approximate martingale estimating functions, Scand. J. Statist. 31 (2004) 553–566.
[43] M. Uchida, Approximate martingale estimating functions for stochastic differential equations with small noises, Stochastic Process. Appl. 118 (2008) 1706–1721.
[44] M. Uchida, N. Yoshida, Information criteria for small diffusions via the theory of Malliavin–Watanabe, Stat. Inference Stoch. Process. 7 (2004) 35–67.
[45] M. Uchida, N. Yoshida, Asymptotic expansion for small diffusions applied to option pricing, Stat. Inference Stoch. Process. 7 (2004) 189–223.
[46] L. Valdivieso, W. Schoutens, F. Tuerlinckx, Maximum likelihood estimation in processes of Ornstein–Uhlenbeck type, Stat. Inference Stoch. Process. 12 (2009) 1–19.
[47] A.W. van der Vaart, Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic Mathematics, Vol. 3, Cambridge University Press, 1998.
[48] N. Yoshida, Asymptotic expansion of maximum likelihood estimators for small diffusions via the theory of Malliavin–Watanabe, Probab. Theory Relat. Fields 92 (1992) 275–311.
[49] N. Yoshida, Asymptotic expansion for statistics related to small diffusions, J. Japan Statist. Soc. 22 (1992) 139–159.
[50] N. Yoshida, Conditional expansions and their applications, Stochastic Process. Appl. 107 (2003) 53–81.