Rates of Convergence and Central Limit Theorem for Empirical Processes of Stationary Mixing Sequences By Bin Yu* Statistics Department University of California, Berkeley Technical Report No. 260 June 1990 *Research supported in part by NSF grants MC84-03239, DMS-9001710 Department of Statistics University of California Berkeley, California

Rates of Convergence and Central Limit Theorem for Empirical Processes of Stationary Mixing Sequences

By

Bin Yu*

Statistics Department, University of California, Berkeley

Technical Report No. 260, June 1990

*Research supported in part by NSF grants MC84-03239, DMS-9001710

Department of Statistics, University of California, Berkeley, California


Rates of Convergence and Central Limit Theorems for Empirical Processes of Stationary Mixing Sequences

Bin Yu*

Statistics Department, University of California, Berkeley

June, 1990

ABSTRACT

An intuitive approach is used to obtain rates of convergence and a CLT for empirical processes of stationary mixing sequences with not too slow a $\beta$ ($\phi$) mixing rate. This includes as a special case Markov chains with an exponentially decaying $\phi$-mixing rate $\phi_n = O(\theta^n)$ for some $\theta < 1$.

The main techniques are to divide the sequence into blocks of equal size and construct a new sequence with the same structure as the original sequence but with independent blocks. Then the rates of convergence and the CLT for the second sequence are obtained using the standard symmetrization technique and a chaining argument, and these results can be extended to the original sequence.

The metric entropy conditions imposed are similar to those in the independent case. A $\beta$ (or $\phi$) mixing rate condition related to the metric entropy conditions is also needed, although for simplicity the measurability difficulties are ignored. In the case of V-C classes, the uniform convergence result holds for any strictly stationary sequence with $\beta$ (or $\phi$) mixing rate $\beta_n$ (or $\phi_n$) $= o(n^{-d})$ for some $d > 0$, while the CLT requires the rate $\beta_n$ (or $\phi_n$) $= o(n^{-d})$ for some $d > 3$. As by-products of these results, several useful inequalities about the original and constructed sequences are obtained.

Research supported in part by NSF grants MC84-03239, DMS-8701426 and DMS-9001710.

Abbreviated title. Limit theorem for mixing sequences.

AMS 1980 subject classifications. Primary 60F05, 60F17; Secondary 60G10.

Key words and phrases. Empirical process, index class, blocking, independence, mixing, rates of convergence, Central Limit Theorem, metric entropy, V-C class, symmetrization.


1. Introduction

There has been a lot of research work on empirical processes indexed by classes of functions since Vapnik & Cervonenkis (1971) showed that uniform convergence holds for the empirical process indexed by a V-C class in the i.i.d. case. Many papers followed. Examples include Dudley (1978), Dudley & Philipp (1983), Gine and Zinn (1984), Le Cam (1983) and Pollard (1982). However, most of this work concentrates on the independent case.

In this paper, uniform convergence, rates of convergence, and a Central Limit Theorem for the empirical process in the i.i.d. case are extended to the weakly dependent case, that is, to mixing sequences. Conditions on the mixing rate of the sequence and entropy constraints on the index class are imposed. The main technique used is the construction of an independent block (IB) sequence, which enables us to employ the symmetrization technique used in the i.i.d. case, and Bennett's inequality in a chaining argument.

Two closely related works in the same direction as this paper are Philipp (1986) and Massart (1988). In terms of the entropy conditions used, their works are very similar to each other, and there is some overlap of results too. The difference between their approach and mine lies in the entropy conditions on the index class. My approach can be viewed as an extension of Vapnik-Cervonenkis-type theory to weakly dependent (mixing) sequences, since I adopt the Pollard (1984) approach and impose random entropy conditions, while both Philipp and Massart use metric entropy with inclusion. In this context we note the remark of Massart (1988), who stated that "we don't know whether the above weakly dependent framework [as in Massart (1988)] could support a general 'Vapnik-Cervonenkis type theory' or not".

When an algebraic decay of the covering number with inclusion is assumed, Philipp (1986) gives the strong invariance principle with rate $(\log n)^{-\lambda}$ under very weak conditions on the strong mixing rate. Under the same entropy conditions, but stronger conditions on the strong mixing rate, Massart obtains a faster rate in the strong invariance principle. Moreover, if the sequence has an exponentially decaying mixing rate, the algebraic decay of the covering number with inclusion condition can be relaxed (Massart (1988)) to the log of that number (i.e., the metric entropy with inclusion) in order to have an invariance principle with rate $(\log n)^{-\lambda}$ for some $\lambda > 0$. Massart (1988) also relaxes the constraint on the index class of sets in Philipp (1986) to an index class of functions with constant envelope function. Another related work is Levental (1989) on martingale difference sequences.

The results here are restricted to rates of convergence (uniform convergence) and the CLT. I do not consider invariance principles. The results obtained here are under possibly weaker mixing conditions and cover a large variety of V-C classes. There are some V-C classes of sets that are not known to satisfy the metric entropy with inclusion conditions of Philipp and Massart. For technical reasons, only $\beta$-mixing (completely regular) sequences are considered here, but we hope that any examples of strong ($\alpha$-) mixing sequences with fast enough mixing rate will have a good chance to be completely regular.

The rest of the paper is organized as follows. Section 2 gives preliminaries on mixing sequences and definitions. Section 3 includes the main results, that is, rates of convergence of empirical processes for general index classes (Theorem 3.1) and an equicontinuity lemma for index classes with constant envelope functions (Theorem 3.6). Section 4 contains the proofs of Theorems 3.1 and 3.6, while section 5 contains the rest of the proofs. Moreover, the construction of an independent block sequence is described in detail in section 4; it is the cornerstone of the proofs.

2. Preliminaries

This section contains the preliminary material on mixing sequences and metric entropies. The size of the index class of the empirical process can be regulated through metric entropy conditions related to the empirical $L^1$ norm, and through covering integrals. The algebraic decay condition on the covering number of an index class is also introduced.

From now on, all the necessary measurability requirements are assumed. The reader is referred to Dudley and Philipp (1983) for a careful treatment of this problem.

Let $X = (X_i)_{i \ge 1}$ be a strictly stationary real-valued sequence with distribution $P$, which implies that the $X_i$ ($i \ge 1$) all have the same distribution $P$. For the sequence $X$, let

$\sigma_l = \sigma(X_1, X_2, \ldots, X_l)$

and

$\sigma'_{l+k} = \sigma(X_{l+k}, X_{l+k+1}, \ldots)$.

Many kinds of mixing conditions exist in the literature. The weakest among those most commonly used is called strong mixing or $\alpha$-mixing.

2.1 Definition For any sequence $X$, the $\alpha$-mixing coefficient $\alpha_k$ is defined as follows:

$\alpha_k(X) = \sup\{\, E\,|P(B \mid \sigma_l) - P(B)| : B \in \sigma'_{l+k},\ l \ge 1 \,\}$.

Other mixing conditions are the following.

2.2 Definition For any sequence $X$, the $\beta$-mixing coefficient $\beta_k$ is defined as follows:

$\beta_k(X) = E \sup\{\, |P(B \mid \sigma_l) - P(B)| : B \in \sigma'_{l+k},\ l \ge 1 \,\}$.

2.3 Definition For any sequence $X$, the $\phi$-mixing coefficient $\phi_k$ is defined as follows:

$\phi_k(X) = \sup\{\, |P(B \mid A) - P(B)| : A \in \sigma_l,\ B \in \sigma'_{l+k},\ l \ge 1 \,\}$.

Moreover, we can define mixing rate constants for $c = \alpha$-, $\beta$-, $\phi$-mixing:

$r_c = \sup\{\, r \in \mathbb{R}^+ : (c_n n^{r}) \text{ is a bounded sequence} \,\}$

and

$\theta_c = \sup\{\, 0 < \theta \le 1 : (c_n \theta^{-n}) \text{ is a bounded sequence} \,\}$.

The three mixing coefficients are ordered as follows, see Philipp (1986):

$\alpha_k \le \beta_k \le \phi_k$. (2.0)

Note that the stationary sequence $(f(X_i) : i = 1, 2, \ldots)$, for a measurable function $f$, has $\alpha$-mixing and $\phi$-mixing rates bounded by the corresponding rates of the original sequence, since the $\sigma$-field of $f(X)$ is contained in the $\sigma$-field of $X$ for any $f$, that is,

$\alpha_k \ge \alpha_k(f)$ and $\phi_k \ge \phi_k(f)$.

Therefore, if the sequence $(X_i)$ satisfies an $\alpha$ or $\phi$ condition, then so does the sequence $(f(X_i))$.

For examples of mixing sequences, see Athreya and Pantula (1986), Ibragimov and Rosanov (1978), Mokkadem (1988), Pham and Tran (1985) and Withers (1981). In particular, a Markov chain is $\phi$-mixing with $\theta_\phi < 1$ under some regularity condition (Doob (1953)).
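As a small illustration of geometric decay, the sketch below computes mixing coefficients for a stationary two-state Markov chain. It is not taken from the report: the function name `two_state_mixing` and the parameters `p`, `q` are hypothetical, and it assumes that, by the Markov property, the suprema in Definitions 2.2 and 2.3 reduce to total variation distances between the rows of the $k$-step transition matrix and the stationary law (averaged over the starting state for $\beta_k$, maximised for $\phi_k$).

```python
def two_state_mixing(p, q, k_max):
    """Return (beta, phi): lists of mixing coefficients for lags 1..k_max.

    Assumed chain: states {0, 1}, P(0 -> 1) = p, P(1 -> 0) = q, started from
    its stationary law pi = (q, p) / (p + q).
    """
    pi = (q / (p + q), p / (p + q))
    step = ((1.0 - p, p), (q, 1.0 - q))
    power = step  # holds the k-step transition matrix, starting with k = 1
    beta, phi = [], []
    for _ in range(k_max):
        # Total variation distance between row i of the k-step matrix and pi.
        tv = [0.5 * sum(abs(power[i][j] - pi[j]) for j in range(2))
              for i in range(2)]
        beta.append(sum(pi[i] * tv[i] for i in range(2)))  # average start state
        phi.append(max(tv))                                # worst start state
        # Multiply by the one-step matrix to move to the next lag.
        power = tuple(
            tuple(sum(power[i][m] * step[m][j] for m in range(2))
                  for j in range(2))
            for i in range(2)
        )
    return beta, phi
```

For this chain both sequences shrink by the factor $|1 - p - q|$ at each lag, which is the exponential decay that $\theta_\phi < 1$ describes.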

We now introduce some constants which will be used frequently later. For any $\delta$ in $(0, 1]$, denote $A_\alpha(n, \delta) = \sum_{k \le n} \alpha_k^{\delta/(2+\delta)}$ and $A_\phi(n) = \sum_{k \le n} \phi_k^{1/2}$. Let $A_\alpha(\infty, 1)$ be denoted by $A_\alpha$ and $A_\phi(\infty)$ by $A_\phi$.

We take Pollard's linear functional notation and use $P$ instead of $E$ to denote expectations. Hence, $P f = \int f \, dP = E f(X_1) = E f(X_n)$ for all $n \ge 1$. Moreover, we use $\mathbb{P}$ to denote "a probability measure on a (sometimes unspecified) measurable space; miscellaneous random variables live on this space." (Pollard (1984)).

It is known that if the mixing rate of a sequence tends to zero fast enough, the variance of the sum of a function of each of $n$ successive observations is $O(n)$. This is necessary for the CLT to hold. The following more general lemma is from Dehling (1983).

2.4 Lemma Suppose that $f$ is a measurable function, $X$ a strictly stationary sequence, and $P f = 0$. Then

i)

$\mathbb{P}\big(\sum_{i=1}^{n} f(X_i)\big)^2 \le n\,(1 + 2A_\alpha(n, \delta))\,(P|f|^{2+\delta})^{2/(2+\delta)}$, (2.1)

$\mathbb{P}\big(\sum_{i=1}^{n} f(X_i)\big)^2 \le n\,(1 + A_\phi(n))\,(P|f|^2)$. (2.2)

ii) If $f$ is bounded by a constant $M$, then

$\mathbb{P}\big(\sum_{i=1}^{n} f(X_i)\big)^2 \le n\,\big(1 + 2\sum_{k=1}^{n} \alpha_k\big)\,M^2$, (2.3)

$\mathbb{P}\big(\sum_{i=1}^{n} f(X_i)\big)^2 \le n\,(1 + 2A_\alpha)\,M^{4/3}\,(P|f|)^{2/3}$, (2.4)

and

$\mathbb{P}\big(\sum_{i=1}^{n} f(X_i)\big)^2 \le n\,(1 + 2A_\phi)\,M\,P|f|$. (2.5)

Proof: Part (i) follows from Lemma 3.2 and Lemma 3.5 of Dehling (1983). (2.4) and (2.5) of (ii) are consequences of (i). As for (2.3), it is a consequence of Lemma 3.1 of Dehling (1983).
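The step from (i) to (2.4) and (2.5) can be spelled out as follows. This is a sketch rather than the author's own derivation: it assumes that (2.1) carries the exponent $2/(2+\delta)$, that $A_\alpha(n, 1) \le A_\alpha$ and $A_\phi(n) \le A_\phi$, and it uses only the bound $|f| \le M$.

```latex
% Assumption: |f| <= M, so |f|^3 <= M^2 |f| and |f|^2 <= M |f|.
\begin{align*}
(P|f|^{3})^{2/3} &\le (M^{2}\,P|f|)^{2/3} = M^{4/3}\,(P|f|)^{2/3}
  && \text{(take } \delta = 1 \text{ in (2.1) to reach (2.4))},\\
P|f|^{2} &\le M\,P|f|
  && \text{(insert in (2.2) to reach (2.5))}.
\end{align*}
```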

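2.5 Corollary Let $r'_\alpha = \min(1, r_\alpha)$ and $\delta_\alpha = 1$ if $r_\alpha = 1$, $\delta_\alpha = 0$ otherwise. If $|f| \le M$, then

$\mathbb{P}\big(\sum_{i=1}^{n} f(X_i)\big)^2 = O\big((\log n)^{\delta_\alpha}\, n^{2 - r'_\alpha}\big)$. (2.6)

Proof. The proof is completed by (2.3) and the convergence property of the algebraic series $\sum_k k^{-r}$. □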
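The three regimes behind (2.6) can be made explicit. The display below is an illustrative sketch, assuming for simplicity that $\alpha_k \le C k^{-r}$ for a constant $C$ (a hypothetical normalisation that is not part of the report) and reading (2.3) with the partial sum $\sum_{k \le n} \alpha_k$.

```latex
\[
\sum_{k=1}^{n} k^{-r} =
\begin{cases}
O(1), & r > 1,\\
O(\log n), & r = 1,\\
O(n^{1-r}), & 0 < r < 1,
\end{cases}
\qquad\text{so}\qquad
n\Big(1 + 2C\sum_{k=1}^{n} k^{-r}\Big) M^{2}
= O\big((\log n)^{\delta}\, n^{2 - \min(1, r)}\big),
\]
% with \delta = 1 if r = 1 and \delta = 0 otherwise.
```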

We define the $\beta$-mixing and $\phi$-mixing coefficients for any probability measure $Q$ on a product measure space $(\Omega_1 \times \Omega_2, \Sigma_1 \times \Sigma_2)$ as follows.

2.6 Definition Suppose that $Q_1$ and $Q_2$ are the marginal probability measures of $Q$ on $(\Omega_1, \Sigma_1)$ and $(\Omega_2, \Sigma_2)$. Then we define

$\beta(\Sigma_1, \Sigma_2, Q) = E \sup\{\, |Q(B \mid \Sigma_1) - Q_2(B)| : B \in \Sigma_2 \,\}$,

$\phi(\Sigma_1, \Sigma_2, Q) = \sup\{\, |Q(B \mid A) - Q_2(B)| : A \in \Sigma_1,\ B \in \Sigma_2 \,\}$.

2.7 Lemma Suppose that $h(x, y)$ is a measurable function with bound $M_h$, and $P$ is the product measure $Q_1 \times Q_2$. Then we have

$|Q h - P h| \le M_h\, \beta(\Sigma_1, \Sigma_2, Q) \le M_h\, \phi(\Sigma_1, \Sigma_2, Q)$.

Proof. Since the bounded measurable function $h(x, y)$ can be approximated by bounded linear combinations of simple functions of the form $I_{A \times B}$ with $A \in \Sigma_1$ and $B \in \Sigma_2$, it is enough to prove the result for $h$ of such form.

Let $h = \sum_i a_i I_{A_i \times B_i}$, where the $A_i$'s are chosen to be disjoint, and put $M_h = \max_i |a_i|$. Moreover, write $\eta_i = Q(B_i \mid \Sigma_1) - Q_2(B_i)$. Since $\sum_i I(A_i) \le 1$, it follows that:

$|Q h - P h| = \big|\sum_i a_i \int I_{A_i}\,\big(Q(B_i \mid \Sigma_1) - Q_2(B_i)\big)\, dQ_1\big|$

$\le M_h \sum_i \int \sup_j |\eta_j|\; I_{A_i}\, dQ_1$

$\le M_h \int \sup_j |\eta_j|\, dQ_1$

$\le M_h\, \beta(\Sigma_1, \Sigma_2, Q)$.

The proof is then completed using the fact that the $\beta$-mixing coefficient is bounded by the $\phi$-mixing coefficient. □

By induction and Lemma 2.7, we have

2.8 Corollary Let $m \ge 1$, and suppose that $h$ is a bounded measurable function on a product probability space $(\prod_{i=1}^{m} \Omega_i, \prod_{i=1}^{m} \Sigma_i)$. Let $Q$ be a probability measure on the product space with marginal measures $Q_i$ on $(\Omega_i, \Sigma_i)$, and let $Q^{i+1}$ be the marginal measure of $Q$ on $(\prod_{j=1}^{i+1} \Omega_j, \prod_{j=1}^{i+1} \Sigma_j)$, $i = 1, \ldots, m-1$. Write

$\beta(Q) = \sup_{1 \le i \le m-1} \beta\big(\prod_{j=1}^{i} \Sigma_j,\ \Sigma_{i+1},\ Q^{i+1}\big)$,

define $\phi(Q)$ similarly, and let $P = \prod_{i=1}^{m} Q_i$. Then

$|Q h - P h| \le (m-1)\, M_h\, \beta(Q) \le (m-1)\, M_h\, \phi(Q)$.

Remark: Corollary 2.8 is the key to connecting the mixing sequence and the independent block sequence (see section 4 for details).

Before we introduce the definitions of metric entropy and covering integral, we need some notation.

For any mixing sequence $X_1, X_2, \ldots$, denote by $P_n$ the empirical measure of the first $n$ observations:

$P_n f = \frac{1}{n} \sum_{i=1}^{n} f(X_i)$.

For any Borel-measurable function $f$, define the empirical $P$-bridge $E_n$ by

$E_n f = \sqrt{n}\,(P_n f - P f)$.

For any measurable family $\mathcal{F}$ of functions, we call $F$ an envelope function of $\mathcal{F}$ if $|f| \le F$ for all $f$ in $\mathcal{F}$.

Since we are interested in the uniform performance of the empirical measure, intuition suggests that $\mathcal{F}$ can't contain too many functions. One measure of the size of $\mathcal{F}$ is the covering number or metric entropy. Along the lines of Pollard (1984), we define the covering number as follows.

2.9 Definition (covering number) The covering number $N(\varepsilon, d, \mathcal{F})$ related to a semi-metric $d$ on $\mathcal{F}$ is defined as

$N(\varepsilon, d, \mathcal{F}) = \min\{\, m : \text{there are } g_1, \ldots, g_m \in L^1(P) \text{ such that } \min_{1 \le j \le m} d(f, g_j) \le \varepsilon \text{ for any } f \text{ in } \mathcal{F} \,\}$.

$\log N(\varepsilon, d, \mathcal{F})$ is called the metric entropy at $\varepsilon$. See Kolmogorov and Tihomirov (1959).

If we take the following random $L^1$ semi-metric $\rho_{1,n}$ as $d$,

$\rho_{1,n}(f, g) = P_n |f - g|$,

then the related covering number is random. Denote by $\rho_1$ the $L^1$ semi-metric $\rho_1(f, g) = P|f - g|$. A metric related to $\rho_1$ is $\rho_{1/2}(f, g) = (P|f - g|)^{1/2}$. The last metric is convenient when using a restricted chaining argument (Le Cam's "square-root-trick" (Pollard (1984))) to prove the CLT for $\phi$-mixing sequences. Moreover, a "cube-root-trick" and $\rho_{1/3}(f, g) = (P|f - g|)^{1/3}$ are needed to deal with $\alpha$-mixing sequences. The covering numbers for $\rho_{1/2}$ and $\rho_{1/3}$ are $N(\varepsilon, \rho_{1/2}, \mathcal{F})$ and $N(\varepsilon, \rho_{1/3}, \mathcal{F})$, and the corresponding covering integrals are

$J(\delta, \rho_{1/2}, \mathcal{F}) = \int_0^{\delta} \{2 \log(N(\varepsilon, \rho_{1/2}, \mathcal{F})^2 / \varepsilon)\}^{1/2}\, d\varepsilon$

and

$J(\delta, \rho_{1/3}, \mathcal{F}) = \int_0^{\delta} \{2 \log(N(\varepsilon, \rho_{1/3}, \mathcal{F})^2 / \varepsilon)\}^{1/2}\, d\varepsilon$.

Conditions on the rate at which the covering numbers tend to infinity as $\varepsilon$ tends to zero, and finiteness of the covering integrals, are restrictions on the size of the family $\mathcal{F}$.
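To make the covering number concrete, the sketch below builds an $\varepsilon$-cover greedily under the empirical $L^1$ semi-metric $P_n|f - g|$. It is an illustration rather than part of the report: the helper names and the toy class of half-line indicators are hypothetical, and because the centers are restricted to members of the class, the size of the cover is only an upper bound on the covering number of Definition 2.9.

```python
def empirical_l1(f_vals, g_vals):
    """Empirical L1 distance (1/n) * sum_i |f(x_i) - g(x_i)| from sample values."""
    return sum(abs(a - b) for a, b in zip(f_vals, g_vals)) / len(f_vals)


def greedy_cover(functions, eps):
    """Return indices of centers of an eps-cover of `functions`.

    Each function is given by its values at the sample points. A function
    becomes a new center only if it is farther than eps from every existing
    center, so every function ends up within eps of some center.
    """
    centers = []
    for idx, f_vals in enumerate(functions):
        if all(empirical_l1(f_vals, functions[c]) > eps for c in centers):
            centers.append(idx)
    return centers


# Hypothetical toy class: indicators of half-lines (-inf, t] evaluated on a
# fixed sample of 20 points.
sample = list(range(20))
thresholds = list(range(0, 20, 2))
half_lines = [[1.0 if x <= t else 0.0 for x in sample] for t in thresholds]
cover = greedy_cover(half_lines, 0.25)
```

Shrinking `eps` forces more centers, which is the growth that the algebraic decay conditions introduced later in this section control.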

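The following algebraic decay conditions on covering numbers will be imposed in the later theorems, and they are known to be satisfied by V-C classes (Dudley (1978)).

$N(\varepsilon, \rho_1, \mathcal{F}) = O(\varepsilon^{-w})$ for some $w > 0$, as $\varepsilon \to 0$. (2.7)

$N(\varepsilon, \rho_{1,n}, \mathcal{F}) = O(\varepsilon^{-w})$ for some $w > 0$, as $\varepsilon \to 0$. (2.8)

It follows that when (2.7) is satisfied, as $\varepsilon \to 0$, we have

$N(\varepsilon, \rho_{1/2}, \mathcal{F}) \le N(\varepsilon^2, \rho_1, \mathcal{F}) = O(\varepsilon^{-2w})$

and

$N(\varepsilon, \rho_{1/3}, \mathcal{F}) \le N(\varepsilon^3, \rho_1, \mathcal{F}) = O(\varepsilon^{-3w})$.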

3. Limit theorems: the main results

This section includes the statements of the main results, i.e., the rates of convergence,

uniform convergence, and the CLT for the empirical process of a stationary mixing

sequence. The main results are Theorem 3.1 (the rates of convergence) and Theorem 3.6

(equicontinuity lemma). The proofs are left to sections 4 and 5.

For the uniform convergence theorem, we first restrict ourselves to bounded index classes. We then find proper conditions to ensure that the law of large numbers holds for the envelope function, which is in turn the condition necessary to generalize the uniform convergence result to classes having a non-constant envelope function. Unfortunately, a constant envelope function is assumed for the CLT (Theorem 3.10) to hold, although I believe that if we truncate $X$ properly, this requirement is not necessary. Stronger mixing assumptions might be needed for the CLT to hold for classes with non-constant envelope function.

For any $\alpha$-mixing rate constant $r_\alpha > 0$, recall that $r'_\alpha = \min(1, r_\alpha)$, and set $\delta_\alpha = 1$ if $r_\alpha = 1$ and $\delta_\alpha = 0$ otherwise.

3.1 Theorem (Rates of convergence) Suppose that $X$ is a strictly stationary sequence. Assume the necessary measurability requirements are satisfied and let $\mathcal{F}_M$ be an index class with constant envelope function $M$. Suppose further that there exist integer pairs $(a_n, \mu_n)$

where i ,,=[n/2a,,], such that for any b,,=O(1), we have (logn) a,,

Ia fia, = o(l), and

logN1 (c b , pI,FM) = op ( (logn =a 'a b) (3.1)

Then we have

$\mathbb{P}\{\sup_{f \in \mathcal{F}_M} |P_n f - P f| > \varepsilon b_n\} \to 0$ as $n \to \infty$. □

See section 4 for proof.

Remark: If we replace condition (3.1) by a similar $\beta$-mixing condition with $\delta_\beta$ and $r'_\beta$ similarly defined, Theorem 3.1 still holds.

3.2 Corollary Suppose that $\mathcal{F}$ is an index class satisfying (2.8), i.e.,

$N(\varepsilon, \rho_{1,n}, \mathcal{F}) = O(\varepsilon^{-w})$ for some $w > 0$, as $\varepsilon \to 0$,

and that the necessary measurability requirements are assumed.

i) IfO < rg . 1, then for any &o such that (1-r)12 (l+r ) < 8S < 1/2, we have

PtSUPf F IPff -Pf I ,>£ & 0 as n, -4c,

i.e.,

SUpfeF IPfJ-Pf I = ( Gn.ii) if rp >0, then for0< So < 1/2 and e> 0, we have

P(supfeFIP f-Pf Il>eE8 ) 0 as n oo,

.e.,

SUpf 6F IPnff-Pf l=op ( G)

iii) If 0O < 1, then for any 81 > 1 and £ > 0, we have

P SUPfeF IPJf -Pf I > e(0 as n o,

i.e.,

SUpf.F IP,f-Pf I =op .

See section 5 for the proof.

Remark

1) As a straightforward consequence of Theorem 3.1, for a class F satisfying (2.8),

$\mathbb{P}\{\sup_{f \in \mathcal{F}} |P_n f - P f| > \varepsilon\} \to 0$ as $n \to \infty$,

provided that r > 0.

2) V-C classes satisfy (2.8), so Corollary 3.2 holds for V-C classes.

3.3 Theorem (Uniform convergence for bounded families) Suppose that $X$ is a strictly stationary sequence. Assume the necessary measurability requirements are satisfied and let $\mathcal{F}_M$ be an index class with constant envelope function $M$. For any given $\varepsilon > 0$, we have

$\mathbb{P}\{\sup_{f \in \mathcal{F}_M} |P_n f - P f| > \varepsilon\} \to 0$ as $n \to \infty$

under any of the following three conditions.

i)ra=l and r >0and

logNI(,P1,.FM )=o° (o (3.2)

ii) r,> 1, rp>0, and

logIV (r£, p I,n, FM, )op (n ).(3.3)

iii) For some positive constant ao < 1,

log IV I (e, P n,, FM)- op ( n a (3.4)

and either 1) O<ra<ao and r> ao ', or 2) l>r >ao and rp>O. O See section S

for proof.

3.4 Definition (metrically transitive) A strictly stationary sequence $Y = (Y_1, \ldots)$ is called metrically transitive if all invariant sets of its shift transformation $T$ (i.e., $T Y_i = Y_{i+1}$) have probability 0 or 1. □

We refer to Doob (1953) for more discussions on this concept.

3.5 Theorem (Uniform convergence for general families) Consider an index class of functions $\mathcal{F}$ with general envelope function $F$ under the random entropy and mixing conditions of Theorem 3.3 plus one of the following constraints:

i) $(F(X_i),\ i = 1, 2, \ldots)$ with $F \in L^1(P)$ is metrically transitive.

ii) $A_\alpha(n, \delta) = \sum_{k=1}^{n} \alpha_k^{\delta/(2+\delta)} = o(n)$ and $F \in L^{2+\delta}(P)$ for some $\delta$ in $(0, 1]$.

iii) $r_\phi > 0$ and $F \in L^2(P)$.

We have

$\mathbb{P}\{\sup_{f \in \mathcal{F}} |P_n f - P f| > \varepsilon\} \to 0$ as $n \to \infty$. □

See section 5 for proof.

3.6 Theorem (Equicontinuity for a mixing sequence) Suppose that $X$ is a strictly stationary sequence and that the necessary measurability requirements are satisfied. Let $\mathcal{F}_M$ be a class of functions with constant envelope function $M$.

a) Suppose that the sequence is $\phi$-mixing with $A_\phi < \infty$ or $r_\phi > 2$, and the covering integral $J(\delta, \rho_{1/2}, \mathcal{F}_M)$ is finite for each $\delta > 0$, and there exist integer pairs $(a_n, \mu_n)$ with

l=[fl/2afl], aRnl411,,1 = o(l) such that

logN (,pl,n I, FM ) ° ( 42 ) for all T1 > GO, (3.S)

and

logN (T1 a. Pi,,3 FM ) = oP (a,,2) for all T1 > O. (36)

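Then, for each $\eta > 0$ and $\varepsilon > 0$ there exists a $\delta > 0$ for which

$\limsup_{n \to \infty} \mathbb{P}\{\sup_{[\delta]} |E_n(f - g)| > \eta\} < \varepsilon$,

where $[\delta] = \{(f, g) : f, g \in \mathcal{F}_M,\ \rho_{1/2}(f, g) < \delta\}$.

b) Suppose that the sequence is $\beta$-mixing with $A_\alpha < \infty$ or $r_\alpha > 3$, and the covering integral $J(\delta, \rho_{1/3}, \mathcal{F}_M)$ is finite for each $\delta > 0$, and there exist integer pairs $(a_n, \mu_n)$ with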

1, =[nI2aa n1/6 va <n3/10, RJ. ia = o(l) such that

logN( . P , FM ) = Op ( ) forall T1 >O, (3.7)

and

logN (f(a )'3/2 PI,n FM) = O0 (-S ) for all 11 > 0. (3.8)

Then, for each $\eta > 0$ and $\varepsilon > 0$ there exists a $\delta > 0$ for which

$\limsup_{n \to \infty} \mathbb{P}\{\sup_{[\delta]} |E_n(f - g)| > \eta\} < \varepsilon$,

where $[\delta] = \{(f, g) : f, g \in \mathcal{F}_M,\ \rho_{1/3}(f, g) < \delta\}$. □ See section 4 for proof.

To clarify the sense in which the empirical process indexed by $\mathcal{F}$ converges to the $P$-bridge, we introduce the subspace $C(\mathcal{F}, P)$ of the space $\mathcal{X}$ of real bounded functionals on $\mathcal{F}$ (Pollard (1984)). For a detailed discussion and remarks on the relevant measurability problem, see Pollard (1984, Appendix C) and references cited therein.

3.7 Definition $C(\mathcal{F}, P)$ is the set of all functionals $x(\cdot)$ on $\mathcal{F}$ which are uniformly continuous with respect to the $L^1(P)$ seminorm $\rho_1$ on $\mathcal{F}$. That is, for each functional $x(\cdot)$ and $\varepsilon > 0$ there should exist a $\delta > 0$ such that $|x(f) - x(g)| < \varepsilon$ whenever $\rho_1(f, g) < \delta$. Define $\mathcal{B}^P$ as the smallest $\sigma$-field on $\mathcal{X}$ which contains all the closed balls with centers in $C(\mathcal{F}, P)$, and makes all the finite-dimensional projections measurable.

The following theorem is a slightly different version of Theorem 21 on page 157 of Pollard (1984). We cite it here without proof.

3.8 Proposition Let $\mathcal{F}$ be a subset of $L^2(P)$ with envelope $F$ and let $X$ be a strictly stationary sequence. Suppose that the covering number $N(\varepsilon, \rho_{1/2}, \mathcal{F})$ (or $N(\varepsilon, \rho_{1/3}, \mathcal{F})$) is finite for all $\varepsilon > 0$, and that

i) for any finite number of functions $f_1, \ldots, f_m$ in $\mathcal{F}$, $(E_n f_j,\ j = 1, 2, \ldots, m)$ converges weakly to the multivariate normal with finite covariance matrix $(\sigma_{ij})$, where

$\sigma_{ij} = \sigma_{ji} = P f_i(X_1) f_j(X_1) + 2 \sum_{k=1}^{\infty} P f_i(X_1) f_j(X_{k+1})$

$= P f_i(X_1) f_j(X_1) + 2 \sum_{k=1}^{\infty} P f_j(X_1) f_i(X_{k+1})$;

ii) for each $\eta > 0$ and $\varepsilon > 0$ there exists a $\delta > 0$ for which

$\limsup_{n \to \infty} \mathbb{P}\{\sup_{[\delta]} |E_n(f - g)| > \eta\} < \varepsilon$,

where $[\delta] = \{(f, g) : f, g \in \mathcal{F} \text{ and } \rho_{1/2}(f, g) < \delta\}$ (or $[\delta] = \{(f, g) : f, g \in \mathcal{F} \text{ and } \rho_{1/3}(f, g) < \delta\}$).

Then $E_n \to E_P$ as random elements of $\mathcal{X}$. The limit $P$-bridge process $E_P$ is a tight, gaussian random element of $\mathcal{X}$ whose sample paths all belong to $C(\mathcal{F}, P)$, and $E_P(f_j)$ has a normal distribution with mean zero and variance $\sigma_{jj}$. □

Remark:

1) Pollard defined his $C(\mathcal{F}, P)$ in terms of the $L^2(P)$ metric $\rho_2$. In our case, we should define it in terms of $d = \rho_{1/2}$ or $\rho_{1/3}$. However, since $\rho_{1/2}$ and $\rho_{1/3}$ are equivalent to $\rho_1$ in terms of generating the space in Definition 3.7, we take $\rho_1$ as the metric and note that the corresponding space is contained in the space defined in terms of the $L^2(P)$ metric when the envelope function of $\mathcal{F}$ is in $L^2(P)$.

2) The "pointwise boundedness" and "total boundedness" assumptions in Pollard (1984) are ensured by our envelope function assumption and our finite covering number assumption for the metrics $\rho_{1/2}$ or $\rho_{1/3}$. Therefore, although the $[\delta]$ here is defined in terms of the semi-metric $\rho_{1/2}$ or $\rho_{1/3}$ instead of the $L^2(P)$-metric $\rho_2(f, g) = (P|f - g|^2)^{1/2}$ as in Pollard (1984), it doesn't make any difference to the proof.

We say that the CLT holds for the empirical process indexed by $\mathcal{F}$ if the conclusion of Proposition 3.8 holds. Now let us cite a finite dimensional CLT from Dehling (1983).

3.9 Proposition (Dehling (1983)) Suppose that $X$ is a strictly stationary real (vector) sequence and that

i) $A_\phi < \infty$ (or $r_\phi > 2$), and finite second moments exist; or

ii) $A_\alpha < \infty$ (or $r_\alpha > 3$), and finite third moments exist.

Then the CLT holds for this sequence. □

3.10 Theorem (CLT for a mixing sequence) Let $\mathcal{F}_M$ be a class of functions with constant envelope function $M$ and let $X$ be a strictly stationary sequence such that the necessary measurability requirements are satisfied. Assume that the following random entropy condi-

tion holds for some b in (0,1) and every £ > 0:

logN 4-n PI,FM )=op (n a).

Then, under either a) or b) below, the CLT holds for the empirical process indexed by FM.

3+2ora) the sequence is +-mixing with A <co (r*>2), r p> 12

a

ao0 1/4, and the covering


integal J (8,plp,FM ) is finite for each 8 >0 and

b) The seuec is f-mixing, A <o (r>3), rp>3a4 < 2/1 1and the covering

integralJ (8,pl3, FM ) is finite for each 8 >O. 0

See section 5 for proof.

Remark: The upper bound requirement on $a_0$ is purely a result of the blocking technique we shall employ in the proof. A refined chaining argument might be able to bring the upper bound closer to 1/2, which is the condition used in the independent case (Le Cam (1983), also Pollard (1984)). Presumably, a stronger mixing condition would be needed to increase the upper bound. Note that even when $\theta_\phi < 1$ is assumed, Theorem 3.10 in the present form couldn't increase the upper bound on $a_0$ further from 1/4.

Observe that the two metric entropy conditions for both uniform convergence and the equicontinuity lemma are met by index classes satisfying (2.8), which correspond practically to $a_0 = 0$. Hence, by (b) of the above theorem, we have

3.11 Theorem Suppose that $\mathcal{F}_M$ is an index class satisfying (2.8) and $X$ is a strictly stationary sequence. Then under either

i) $J(\delta, \rho_{1/2}, \mathcal{F}_M) < \infty$ for each $\delta > 0$, and $r_\phi > 2$ and $r_\beta > 3$; or

ii) $J(\delta, \rho_{1/3}, \mathcal{F}_M) < \infty$ for each $\delta > 0$, and $r_\alpha > 3$ and $r_\beta > 7/3$,

the CLT holds for the empirical processes indexed by $\mathcal{F}_M$. □

Since V-C classes satisfy (2.8), we get

3.12 Corollary If $\mathcal{F}_M$ is a V-C class, and $X$ is a strictly stationary sequence with $r_\alpha > 3$ and $r_\beta > 7/3$, then the CLT holds for the empirical processes indexed by $\mathcal{F}_M$. □

Remark: In the case of an index class satisfying (2.8) where both $J(\delta, \rho_{1/2}, \mathcal{F}_M)$ and $J(\delta, \rho_{1/3}, \mathcal{F}_M)$ are finite, condition (a) in Theorem 3.10 requires $r_\phi > 2$ and $r_\beta > 3$, which implies $r_\alpha > 3$ and $r_\beta > 7/3$. Therefore, (b) gives stronger results. However, (a) covers the case when $a_0$ is between 2/11 and 1/4. Of course, this fine distinction on the exponent of the random metric entropy may be of little importance, since we don't know if there exists such a class with exponent in this range.

4. Proofs I

We include in this section the proofs of the key results of this paper (Theorem 3.1 and Theorem 3.6). First, I explain how a blocking technique enables us to get the exponential inequality, which is the key step in proving our results. Then the rigorous proofs are given in the form of a few lemmas after constructing an independent block (IB) sequence. Some useful inequalities relating the original sequence to the IB sequence are also presented here (Lemma 4.2).

After observing that all the results we want are either in terms of distributions, or can be interpreted in terms of probabilities, we construct an IB sequence from the original stationary mixing sequence such that the IB sequence is very close in distribution to the mixing sequence. We then transfer the problem to the IB sequence, to which the standard techniques of the independent case can be applied. Symmetrization is used for the IB sequence in the proofs of both Theorem 3.1 and Theorem 3.6. Moreover, in the proof of Theorem 3.6, we also need to use a restricted chaining argument which requires an exponential inequality. This inequality can be obtained for the IB sequence via Bennett's inequality only because of the independence of the IB sequence. If the mixing rate tends to zero fast enough, the same upper bound obtained using the techniques of the independent case plus an $o(1)$ quantity becomes the upper bound for the original mixing sequence.

We divide the $n$-sequence $X_n = (X_1, X_2, \ldots, X_n)$ into blocks of length $a_n$, one after the other. We eliminate every other block and work with the remaining odd blocks. Depending on the mixing and metric entropy conditions to be assumed, we choose $a_n$ large enough that the odd $a_n$-blocks are "almost" independent, but at the same time not so large that the odd $a_n$-blocks together cease to behave similarly to the original mixing sequence. Then we construct an independent sequence of blocks where each block has the same distribution as one of the $a_n$-blocks of the original sequence.

More precisely, for any integer pair $(a_n, \mu_n)$ with $\mu_n = [n / 2a_n]$, we divide the strictly stationary $n$-sequence $X_n = (X_1, X_2, \ldots, X_n)$ into $2\mu_n$ blocks of length $a_n$ and the remainder block of length $n - 2\mu_n a_n$. Denote the indices in the blocks alternately by $H$'s and $T$'s, and denote the indices in the remainder block by $R$. These indices depend on $n$, but for simplicity we suppress $n$. That is,

$H_1 = \{i : 1 \le i \le a_n\}$,

$T_1 = \{i : a_n + 1 \le i \le 2a_n\}$.

Generally, for $1 \le j \le \mu_n$,

$H_j = \{i : 2(j-1)a_n + 1 \le i \le (2j-1)a_n\}$,

$T_j = \{i : (2j-1)a_n + 1 \le i \le (2j)a_n\}$.

Denote the random variables that correspond to the $H_j$ and $T_j$ indices as

$X(H_j) = \{X_i,\ i \in H_j\}$,

$X(T_j) = \{X_i,\ i \in T_j\}$.

Further, let the whole sequence of $H$-blocks be denoted by $X_{a_n} = \{X(H_j) : j = 1, 2, \ldots, \mu_n\}$.

Now, we take a sequence of identically distributed independent blocks $\{\Xi(H_j) : j = 1, \ldots, \mu_n\}$, where $\Xi(H_j) = \{\xi_i : i \in H_j\}$, such that the sequence is independent of $X_n$ and each block has the same distribution as a block from the original sequence: distribution($\Xi(H_j)$) = distribution($X(H_j)$) = distribution($X(H_1)$). We call this constructed sequence the independent block $a_n$-sequence (IB sequence). Denote the IB sequence as $\Xi_{a_n}$. Because of the mixing condition, we can relate $X_{a_n}$ and $\Xi_{a_n}$ in the following way.
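The index bookkeeping of the blocking scheme can be sketched in a few lines. This is an illustration only: the function name `block_indices` is hypothetical, indices are 1-based as in the text, and the number of block pairs is taken to be the integer part of $n / (2a_n)$.

```python
def block_indices(n, a_n):
    """Split 1..n into alternating H and T blocks of length a_n plus remainder R.

    Returns (H, T, R), where H[j-1] and T[j-1] are the j-th H-block and
    T-block and R is the remainder block, all as lists of 1-based indices.
    """
    mu_n = n // (2 * a_n)  # number of (H, T) block pairs
    H = [list(range(2 * (j - 1) * a_n + 1, (2 * j - 1) * a_n + 1))
         for j in range(1, mu_n + 1)]
    T = [list(range((2 * j - 1) * a_n + 1, 2 * j * a_n + 1))
         for j in range(1, mu_n + 1)]
    R = list(range(2 * mu_n * a_n + 1, n + 1))
    return H, T, R
```

With `n = 23` and `a_n = 3`, this yields three H-blocks, three T-blocks, and a remainder block of five indices; only the H-blocks are kept when the IB sequence is formed.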

4.1 Lemma Let the distributions of $X_{a_n}$ and $\Xi_{a_n}$ be $Q$ and $\tilde{Q}$ respectively. For any measurable function $h$ on $\mathbb{R}^{\mu_n a_n}$ with bound $M$,

$|Q h(X_{a_n}) - \tilde{Q} h(\Xi_{a_n})| \le M\,(\mu_n - 1)\,\beta_{a_n}$.

Proof. This is a direct application of Corollary 2.8. In the corollary, take $Q$ = the probability distribution of the $a_n$-sequence, with $\Omega_i = \mathbb{R}^{a_n}$, $\Sigma_i$ = the product Borel $\sigma$-field on $\mathbb{R}^{a_n}$, and $m = \mu_n$. Then $P$ in the corollary equals the probability distribution of the IB $a_n$-sequence, i.e., $\tilde{Q}$. Notice that $\beta(Q) \le \beta_{a_n}$. □
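The trade-off in the choice of the block length can be explored numerically. The sketch below evaluates the bound of Lemma 4.1, read here as $M(\mu_n - 1)\beta_{a_n}$, under the purely illustrative assumption that $\beta_k = k^{-r}$; the function name `ib_error_bound` and the polynomial rate are hypothetical and not part of the report.

```python
def ib_error_bound(n, a_n, r, M=1.0):
    """Return (mu_n, bound) with bound = M * (mu_n - 1) * beta_{a_n}.

    Assumes a polynomial mixing rate beta_k = k ** (-r), for illustration only.
    """
    mu_n = n // (2 * a_n)        # number of H-blocks
    beta = float(a_n) ** (-r)    # assumed beta-mixing coefficient at lag a_n
    return mu_n, M * (mu_n - 1) * beta
```

Larger blocks make the coefficient at lag $a_n$ smaller but leave fewer independent blocks, so for a given rate `r` the bound shrinks only if the block length grows fast enough relative to `n`.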

Remark:

1) This is the key lemma, and it is used throughout the subsequent proofs. Different functions $h$ are used in the application of this lemma; in particular, $h$ is often taken to be an indicator function.

2) The $\beta$-mixing (or $\phi$-mixing) condition is required for both the uniform convergence result and the CLT, because in our approach this lemma is crucial in connecting the original sequence with the IB sequence. We are not able to obtain this lemma under $\alpha$-mixing conditions. If Lemma 4.1 held for $\alpha$-mixing, all the main results could then be obtained under $\alpha$-mixing conditions, which are weaker than the ones we use now.

Recall that the index class with an envelope function $F$ is denoted by $\mathcal{F}$. For simplicity, we assume $P f = 0$ for all $f$ in $\mathcal{F}$. Then the empirical measure on $\{f(X_i) : i = 1, 2, \ldots, n\}$ is

$P_n f = \frac{1}{n} \sum_{i=1}^{n} f(X_i)$.

Correspondingly, the empirical $P$-bridge is

$E_n f = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} f(X_i)$.

For the original sequence X, we write

Yj.f (xa,, f=£ (Xi ) and YIjjf (Xl,I = f (Xi )ieHj ie Tj

For the constructed [B sequence _, define

Zj(, a) f() j),axianddenote P,f=-f Z11J

Associated with this empirical measure (note that it is not a probability measure if a,, >0)

are two random semi-metrics

1 AplN.,(f,g)=I IYf-g 1,nfI


and

1~LP46 (f tg)= IlDzjf-s 1

We shall compare these two semi-metrics with the $L^1$ empirical metric

$\rho_{1,n}(f, g) = P_n |f - g|$.

Similarly, we have the pseudo empirical process for the IB sequence

1 AnE

n J.Zj 'f

The L I random covering numbers corresponding to Pi, , . pi,,, and pi, are denoted by

N (e, Pi,,,. F), N (e, Plg. F), and N (e, Plm F),

respectively.

For simplicity, from now on, $(\mu_n, a_n)$ is always an integer pair satisfying $n/2 - a_n \le \mu_n a_n \le n/2$. Therefore, $\mu_n \to \infty$, $a_n \to \infty$, $a_n = o(n)$, and $\mu_n a_n = O(n)$. Moreover, the IB sequence is implicitly assumed to be defined in terms of a pair of integers $(\mu_n, a_n)$.
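The index bookkeeping behind the pair $(\mu_n, a_n)$ can be sketched in a few lines. The sketch below is illustrative only: it assumes that the blocks $H_1, T_1, \ldots, H_{\mu_n}, T_{\mu_n}$ alternate, each of length $a_n$, and are followed by a remainder block $R$; the function name `block_layout` is hypothetical and not part of the paper.

```python
def block_layout(n, a):
    """Split indices 0..n-1 into alternating blocks H_1, T_1, ..., H_mu, T_mu
    of length a, followed by a remainder block R (assumed layout)."""
    if a < 1 or 2 * a > n:
        raise ValueError("need 1 <= a <= n/2")
    mu = n // (2 * a)  # largest mu with 2 * mu * a <= n
    H = [list(range(2 * j * a, (2 * j + 1) * a)) for j in range(mu)]
    T = [list(range((2 * j + 1) * a, (2 * j + 2) * a)) for j in range(mu)]
    R = list(range(2 * mu * a, n))  # fewer than 2a leftover indices
    return mu, H, T, R


mu, H, T, R = block_layout(20, 3)
print(mu, H[0], T[0], R)  # 3 [0, 1, 2] [3, 4, 5] [18, 19]
```

With this choice of $\mu_n$ the constraint $n/2 - a_n \le \mu_n a_n \le n/2$ holds automatically, and the remainder block has fewer than $2a_n$ elements.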

The following lemma allows us to replace $P_n$ by $\tilde{P}_{\mu_n}$ with only an error of order $\mu_n \beta_{a_n}$.

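4.2 Lemma Suppose that $F \equiv M$, and $b_n = O(1)$ as $n \to \infty$.

i) If $\mu_n b_n \to \infty$, then

$$P\Big(\sup_{f \in \mathcal{F}} |P_n f| > \varepsilon b_n\Big) \le 2 P\Big(\sup_{f \in \mathcal{F}} |\tilde{P}_{\mu_n} f| > \frac{\varepsilon b_n}{4}\Big) + 2 \mu_n \beta_{a_n}.$$

ii) If $a_n = o(\sqrt{n})$, then

$$P\Big(\sup_{f,g \in \mathcal{F}_\delta} |E_n (f-g)| > \varepsilon\Big) \le 2 P\Big(\sup_{f,g \in \mathcal{F}_\delta} |\tilde{E}_{\mu_n} (f-g)| > \frac{\varepsilon}{4}\Big) + 2 \mu_n \beta_{a_n},$$

where $\mathcal{F}_\delta$ is any subset of $\mathcal{F}$.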

Proof:

i) Note that the normalized sum of $f$ over the remainder block $R$ is uniformly bounded by $M (2 a_n) n^{-1} = O(\mu_n^{-1})$, which tends to zero faster than $b_n$ since $\mu_n b_n \to \infty$, and $X_{a_n}$ has the same distribution as $X_{1,a_n} = \{X_i: i \in T_j \text{ for } 1 \le j \le \mu_n\}$. Therefore, for $n$ sufficiently large, we have

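$$P\Big(\sup_{f \in \mathcal{F}} |P_n f| > \varepsilon b_n\Big)$$

$$\le P\Big(\sup_{f \in \mathcal{F}} \Big|\frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n}) + \frac{1}{n} \sum_{j=1}^{\mu_n} Y_{1,j,f}(X_{1,a_n})\Big| > \frac{\varepsilon b_n}{2}\Big)$$

$$\le P\Big(\sup_{f \in \mathcal{F}} \Big|\frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n})\Big| > \frac{\varepsilon b_n}{4}\Big) + P\Big(\sup_{f \in \mathcal{F}} \Big|\frac{1}{n} \sum_{j=1}^{\mu_n} Y_{1,j,f}(X_{1,a_n})\Big| > \frac{\varepsilon b_n}{4}\Big)$$

$$= 2 P\Big(\sup_{f \in \mathcal{F}} \Big|\frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n})\Big| > \frac{\varepsilon b_n}{4}\Big). \qquad (4.4)$$

Taking for $h$ the indicator function of the event

$$\Big\{\sup_{f \in \mathcal{F}} \Big|\frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n})\Big| > \frac{\varepsilon b_n}{4}\Big\},$$

Lemma 4.1 gives the following bound on the left-hand side of (4.4):

$$2 P\Big(\sup_{f \in \mathcal{F}} \Big|\frac{1}{n} \sum_{j=1}^{\mu_n} Z_{j,f}(\Xi_{a_n})\Big| > \frac{\varepsilon b_n}{4}\Big) + 2 \mu_n \beta_{a_n}.$$

ii) This is similar to i). Note that $a_n = o(\sqrt{n})$ is assumed instead of $a_n = o(n)$ to eliminate the remainder block, which is bounded by $2 a_n M / \sqrt{n}$. □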

Since the IB sequence consists of i.i.d. blocks, we can use the standard symmetrization technique in the independent case to get the uniform convergence (rates of convergence) for the pseudo-empirical measure $\tilde{P}_{\mu_n}$ of the IB sequence. The entropy conditions needed will be in terms of the IB sequence. Therefore, we need to relate the entropy conditions on the original sequence to those on the IB sequence. Because the entropy conditions are random, i.e., they can be stated in terms of probability, we can easily transfer the entropy condition about the original sequence to the IB sequence by Lemma 4.1. Recall that $r'_\alpha = \min(r_\alpha, 1)$, and $\delta_\alpha = 1$ if $r_\alpha = 1$, and $\delta_\alpha = 0$ otherwise.

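4.3 Lemma If $\mu_n \beta_{a_n} = o(1)$, then for $n$ sufficiently large, the following hold.

i) For any constant sequence $b_n$, if $\log N(\varepsilon, \rho_{1,n}, \mathcal{F}) = o_p(b_n)$, then

$$\log N(\varepsilon, \tilde{\rho}_{1,\mu_n}, \mathcal{F}) = o_p(b_n). \qquad (4.5)$$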

ii) If F= M and ra,>O b,, =0 (1) and (log n a4 p,, b,2 -4 oo, ten

P(SUPf F IPg f I >.ebn Ia

e 2bexp4

- 2b, a, (log n) +logN,( 8 F)}. (4.6)

iii) Under the assumptions in ii) and assume further that

logN b,, P ,,. F op ( (logn -)4ua,s,CL b,2), (4.7)

then, for any $\varepsilon > 0$, and $n$ sufficiently large,

$$P\Big(\sup_{f \in \mathcal{F}} |\tilde{P}_{\mu_n} f| > \varepsilon b_n\Big) < \varepsilon.$$

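Proof: i) By the triangle inequality, we have

$$\rho_{1,\mu_n}(f,g) = \frac{1}{n} \sum_{j=1}^{\mu_n} |Y_{j,f-g}| \le \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - g(X_i)| = \rho_{1,n}(f,g).$$

Thus,

$$N(\varepsilon, \rho_{1,\mu_n}, \mathcal{F}) \le N(\varepsilon, \rho_{1,n}, \mathcal{F}).$$

This together with the assumption in i) implies

$$\log N(\varepsilon, \rho_{1,\mu_n}, \mathcal{F}) = o_p(b_n).$$

Then, taking $h$ in Lemma 4.1 to be the indicator function of the event $\{\log N(\varepsilon, \rho_{1,\mu_n}, \mathcal{F}) / b_n > \varepsilon\}$, we obtain the conclusion of (i).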

ii) This can be obtained by the standard symmetrization technique for the i.i.d. case, since the $Z$'s are i.i.d. for any fixed $f$ in $\mathcal{F}_M$, and bounded by $M a_n$.

By Chebyshev's inequality and Corollary 2.5, there is a constant $C$ such that

var (Zjf ). C M2a. `(log n)6.

Therefore,

P ( I Pp,, f I >e bn I a)

22 CM2 (log )<1-b =1/2

e2b,, n (logn)aGa. p, b 2

smce (log nR) a,"a1 b_, co as n -- oo.

Using the same arguments as on pages 14-15 of Pollard (1984), we get (4.6).

iii) This is straightforward from ii) and the symmetrization lemma in Pollard (1984). □
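The triangle-inequality step in part i) of the proof, $\rho_{1,\mu_n}(f,g) \le \rho_{1,n}(f,g)$, can be checked numerically. The following is a minimal sketch under stated assumptions: the blocks $H_j$ are taken to start at indices $2(j-1)a_n$ (an assumed alternating layout), an i.i.d. Gaussian sample stands in for the stationary sequence, and the names `rho_block` and `rho_emp` are hypothetical.

```python
import math
import random


def rho_block(fx, gx, a):
    """Blockwise semi-metric: (1/n) * sum over H-blocks of |sum of (f - g)|."""
    n = len(fx)
    mu = n // (2 * a)
    total = 0.0
    for j in range(mu):
        start = 2 * j * a  # first index of H_j under the assumed layout
        total += abs(sum(fx[i] - gx[i] for i in range(start, start + a)))
    return total / n


def rho_emp(fx, gx):
    """L1 empirical metric P_n |f - g|."""
    return sum(abs(u - v) for u, v in zip(fx, gx)) / len(fx)


random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
fx = [math.sin(x) for x in xs]  # values f(X_i)
gx = [math.cos(x) for x in xs]  # values g(X_i)
for a in (1, 5, 20):
    print(a, rho_block(fx, gx, a) <= rho_emp(fx, gx))
```

Because the inequality holds pointwise for every pair $(f,g)$, any $\varepsilon$-net for $\rho_{1,n}$ is also an $\varepsilon$-net for $\rho_{1,\mu_n}$, which is the covering-number comparison used in the proof.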

Proof of Theorem 3.1 (rates of convergence):

Combining (i) of Lemma 4.2, and (i) and (iii) of Lemma 4.3, we obtain Theorem 3.1. □

Now we turn to the proof of Theorem 3.6, the equicontinuity lemma. We denote by $\mathcal{F}_M$ our index class with a constant envelope function $M$ and define the empirical process for the IB sequence as

$$\tilde{E}_{\mu_n} f = \frac{1}{\sqrt{n}} \sum_{j=1}^{\mu_n} (Z_{j,f} - a_n Pf) = \frac{1}{\sqrt{n}} \sum_{j=1}^{\mu_n} Z_{j,f},$$

since $Pf = 0$ is assumed.

Recall that there are two semi-metrics related to $\rho_1(f,g) = P|f-g|$. They are $\rho_{1/2}(f,g) = \sqrt{P|f-g|}$ and $\rho_{1/3}(f,g) = (P|f-g|)^{1/3}$. Their corresponding covering integrals are $J(\gamma, \rho_{1/2}, \mathcal{F})$ and $J(\gamma, \rho_{1/3}, \mathcal{F})$. $\rho_{1/2}$ and $\rho_{1/3}$ are used because they are closely related to the $L^1$-empirical random semi-metric $\rho_{1,n}(f,g) = P_n|f-g|$, in terms of which the original sequence and the IB sequence behave similarly.

Besides symmetrization, we need a restricted chaining argument used by Le Cam (Pollard (1984)) to chain down the process $E_n$ to a link of size $O(a_n/\sqrt{n})$ (Lemma 4.5). The link is measured in terms of the square-root metric $\rho_{1/2}$ or the cube-root metric $\rho_{1/3}$, depending on the type of mixing conditions assumed. The other chaining argument used by Pollard (1984) will not work here since we are not able to show the uniform convergence of the sum of squares of functions on the blocks to its expectation. However, in the independent case where the "block" contains only one observation, this convergence is a trivial application of the uniform convergence result for a different index class $\{(f-g)^2: f, g \in \mathcal{F}\}$.

Next we cite the restricted chaining lemma from Pollard (1984). Although Pollard (1984) used the $L^2(P)$ metric, his argument is general enough to cover our case.

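4.4 Lemma (Restricted Chaining) Let $\{Z(t): t \in T\}$ be a stochastic process that satisfies the exponential inequality

$$P(|Z(t) - Z(s)| > \eta) \le 2 \exp\Big(-\frac{1}{2} \frac{\eta^2}{D^2 \delta^2}\Big) \quad \text{if } d(s,t) < \delta \qquad (4.8)$$

for every $\eta > 0$ and $\delta > 0$ with $\delta \ge \alpha \eta^{1/2}$, for some constant $\alpha$.

Suppose that $T$ has a finite covering integral $J(\cdot)$. Let $T(\alpha)$ be an $\alpha$-net (containing $N(\alpha)$ points: $T(\alpha) = \{t_1, t_2, \ldots, t_N: \min_{i \le N} d(t, t_i) < \alpha \text{ for all } t \in T\}$); let $t_\alpha$ be the closest point in $T(\alpha)$ to $t$; and let $[\delta]$ denote the set of pairs $(s,t)$ with $d(s,t) < \delta$. Given $\varepsilon > 0$ and $\gamma > 0$, there exists a $\delta > 0$, depending on $\varepsilon$, $\gamma$ and $J(\cdot)$, for which

$$P\Big(\sup_{[\delta]} |Z(t) - Z(s)| > 5\gamma\Big) \le 2\varepsilon + P\Big(\sup_{T} |Z(t) - Z(t_\alpha)| > \gamma\Big) \qquad (4.9)$$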

provided aS-5- £, and yS 144 and J(a)5min(--, 1 .3 12D' D

4.5 Lemma Suppose that the covering integral $J(\delta, d, \mathcal{F}_M)$ corresponding to a metric $d$ on $\mathcal{F}_M$ is finite, and that there are integer pairs $(a_n, \mu_n)$ such that $a_n \mu_n = O(n)$. Assume that the IB sequence satisfies

$$\mathrm{Var}\Big(\sum_{j=1}^{\mu_n} Z_{j,f-g}\Big) \le A M n\, d^2(f,g). \qquad (4.10)$$

Then, for any given $\varepsilon > 0$, $\eta > 0$, and sufficiently large $n$,

$$P\Big(\sup_{[\delta]} |\tilde{E}_{\mu_n} h| > 5\eta\Big) \le 2\varepsilon + P\Big(\sup_{H_n(d)} |\tilde{E}_{\mu_n} h| > \eta\Big) \qquad (4.11)$$

whereH,,(d)=(ffg:f,g eFMafdd2(f,g)5 K- )andK= 2MAwher d ):5K~~~~4BFadK=81(1/2)

Proof: For any fixed n, by the restricted chaining lemma (Lemma 4.4), we only have to

show that condition (4.8) holds for our process ER with index set T = FM, for every posi-

tive 6 and n such that 6> a 4i with a2=K -, and K=2AMVn- B-'(112)Let h -f-g, and note that tlfi M and Pf = 0. Moreover, for any fixed f and g, Zj,are i.i.d for j = 1, ..., ,,, and bounded by 2 a,, M.

By Bennett's inequality, for every $\delta$ and $\eta$ as above,

P( IE hI>T)=P( IX(Zjf.-8 I >rnp4) (4.12)j-l

2exp (-12(fAlM g2i B ( 2 a M Tl4)2 nA Md2(f,g)) n AMd2( g

where $B(x) = 2x^{-2}[(1+x)\log(1+x) - x]$ for $x > 0$ is a decreasing function of $x$ with range $(0,1)$.
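The properties of $B$ used here (it decreases from 1 to 0, so $B^{-1}(1/2)$ is well defined) are easy to check numerically. The sketch below is an illustration, not part of the original argument; the function names and the bisection routine are hypothetical helpers.

```python
import math


def bennett_B(x):
    """B(x) = 2 x^{-2} [(1 + x) log(1 + x) - x] for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return 2.0 * ((1.0 + x) * math.log1p(x) - x) / (x * x)


def bennett_B_inverse(y, lo=1e-9, hi=1e9, iters=200):
    """Solve B(x) = y for 0 < y < 1 by bisection on a log scale.
    Relies on B being decreasing on (0, infinity)."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in (0, 1)")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if bennett_B(mid) > y:
            lo = mid  # B is still too large, so the root lies to the right
        else:
            hi = mid
    return math.sqrt(lo * hi)


x_half = bennett_B_inverse(0.5)
print(x_half, bennett_B(x_half))
```

The value $B^{-1}(1/2)$ lies between 4 and 4.2, so the constant appearing in the chaining condition is a moderate absolute number.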

If we restrict f and g to be close, i.e., d2 (f ,g)<6, then the left hand side of (4.12) is

bounded by

2exp{-J-( , j2 )B2a Mr1)42nAM 2 n AM62

2 M2 A2a.=2exp B(kA

sep-4 A M 82

since A< S s!l B-1 (1/2), that is, 62> 2AM =1a2nfor a=Ka By

lemma 4.4, the proof is complete. O

4.6 Corollary

i) If $J(\gamma, \rho_{1/2}, \mathcal{F}_M) < \infty$ and $A_\phi < \infty$ ($r_\phi > 2$), then (4.11) holds for $d = \rho_{1/2}$.

ii) If $J(\gamma, \rho_{1/3}, \mathcal{F}_M) < \infty$ and $A_\alpha < \infty$ ($r_\alpha > 3$), then (4.11) holds for $d = \rho_{1/3}$.

Proof: We only need to check condition (4.10).

i) When $A_\phi < \infty$, (4.10) holds for $d = \rho_{1/2}$ by (2.5) of Lemma 2.4.

ii) When $A_\alpha < \infty$, (4.10) holds for $d = \rho_{1/3}$ by (2.4) of Lemma 2.4. □

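Now the task of proving the equicontinuity lemma becomes that of bounding the tail probability of the supremum of $\tilde{E}_{\mu_n}$ over a smaller neighborhood $H_n(d)$. We will replace this neighborhood of size $a_n/\sqrt{n}$ by a random neighborhood $H'_n(d)$ (where $d$ is either $\rho_{1/2}$ or $\rho_{1/3}$), since on this random neighborhood the symmetrization technique can be used to obtain a bound on the tail probability.

Depending on the metric we choose, the neighborhood $H_n(d)$ will have a different size in terms of $\rho_1$. That is,

$$H_n(\rho_{1/2}) = \{f-g: f, g \in \mathcal{F}_M \text{ and } \rho_{1/2}^2(f,g) \le O(a_n/\sqrt{n})\}$$

$$= \{f-g: f, g \in \mathcal{F}_M \text{ and } \rho_1(f,g) \le O(a_n/\sqrt{n})\}$$

and the corresponding random neighborhood is

$$H'_n(\rho_{1/2}) = \{f-g: f, g \in \mathcal{F}_M \text{ and } \tilde{\rho}_{1,\mu_n}(f,g) \le O(a_n/\sqrt{n})\}.$$

On the other hand,

$$H_n(\rho_{1/3}) = \{f-g: f, g \in \mathcal{F}_M \text{ and } \rho_{1/3}^2(f,g) \le O(a_n/\sqrt{n})\}$$

$$= \{f-g: f, g \in \mathcal{F}_M \text{ and } \rho_1(f,g) \le O((a_n/\sqrt{n})^{3/2})\}$$

with the corresponding random neighborhood

$$H'_n(\rho_{1/3}) = \{f-g: f, g \in \mathcal{F}_M \text{ and } \tilde{\rho}_{1,\mu_n}(f,g) \le O((a_n/\sqrt{n})^{3/2})\}.$$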

Lemma 4.7 and its corollary (4.8) below make sure that the replacement of $H_n(d)$ by $H'_n(d)$ (where $d = \rho_{1/2}$ or $\rho_{1/3}$) is legitimate.

4.7 Lemma Suppose the strong mixing rate constant $r_\alpha > 1$, and let

$$H_M = \{h: h = |f-g|,\ f, g \in \mathcal{F}_M\}.$$

i) If there are integer pairs $(a_n, \mu_n)$ such that $a_n \to \infty$, $a_n \mu_n = O(n)$ and $\mu_n \beta_{a_n} = o(1)$,

and the random entropy satisfies

logN ( to,P, w FM ) =OP ( a.2)

then, as n -+ oo, we have

P(suPHM IP,h a,,

ii) If there are integer pairs (a,,, .,,) such that a,,2/4I -oco, a,, p.,, = 0(n) and P, =

o(l), and the random entropy satisfies

logN ((; )3/2. PlM, FM )=o; (;;-)9

then

P(supH,. I P 1,p. h I > £(V; )112) _+ O.

Proof: We use Lemma 4.3 for $H_M$. For $r_\alpha > 1$, it is sufficient to show that

$$\log N(\varepsilon b_n, \rho_{1,n}, H_M) = o_p(\mu_n b_n^2), \qquad (4.13)$$

where $b_n = a_n/\sqrt{n}$ for (i) and $b_n = (a_n/\sqrt{n})^{3/2}$ for (ii).

Observe that the covering number of $H_M$, i.e. $N(2\varepsilon, \rho_{1,n}, H_M)$, is bounded by the square of the covering number of $\mathcal{F}_M$, i.e. $N(\varepsilon, \rho_{1,n}, \mathcal{F}_M)$. The last statement can be seen to be true because, for any $\varepsilon > 0$ and $h = |f-g| \in H_M$, there are net points $f_1$ and $g_1$ such that $\rho_{1,n}(f, f_1) < \varepsilon$ and $\rho_{1,n}(g, g_1) < \varepsilon$. Let $h_1 = |f_1 - g_1|$, and note that

$$|h - h_1| = \big| |f-g| - |f_1-g_1| \big| \le |f - g - f_1 + g_1| \le |f - f_1| + |g - g_1|.$$

Therefore, the log of the covering number of $H_M$ is bounded by 2 times the log of the covering number of $\mathcal{F}_M$.
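The covering-number comparison above can be illustrated on a small finite class. The sketch below is only an illustration under stated assumptions: functions are represented by their values at sample points, the class consists of threshold indicators (so $M = 1$), and the greedy net construction and all function names are hypothetical choices, not taken from the paper.

```python
import random


def l1_emp(u, v):
    """L1 empirical distance between two functions given by their sample values."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def greedy_net(funcs, eps):
    """Greedy eps-net: every function ends up within eps of some net point."""
    net = []
    for f in funcs:
        if all(l1_emp(f, c) >= eps for c in net):
            net.append(f)
    return net


def abs_diff(u, v):
    return [abs(a - b) for a, b in zip(u, v)]


random.seed(1)
xs = [random.random() for _ in range(50)]
thresholds = [k / 30 for k in range(31)]
F_M = [[1.0 if x <= t else 0.0 for x in xs] for t in thresholds]

eps = 0.1
net_F = greedy_net(F_M, eps)
# Candidate net for H_M: all |c1 - c2| with c1, c2 in the net for F_M.
net_H = [abs_diff(c1, c2) for c1 in net_F for c2 in net_F]

worst = 0.0
for f in F_M:
    for g in F_M:
        h = abs_diff(f, g)
        worst = max(worst, min(l1_emp(h, c) for c in net_H))

print(len(net_F), len(net_H), worst < 2 * eps)
```

The candidate net for $H_M$ has exactly $N^2$ points when the net for $\mathcal{F}_M$ has $N$ points, and every $h = |f-g|$ lies within $2\varepsilon$ of it, which is the bound used in the proof.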

i) Take $b_n = a_n/\sqrt{n}$; then (4.13) is satisfied by assumption (3.5).

ii) Take $b_n = (a_n/\sqrt{n})^{3/2}$; then (4.13) is satisfied by assumption (3.7). □

4.8 Corollary For any $\varepsilon > 0$ and $\eta > 0$:

i) Under the assumptions in Lemma 4.7 (i), we have

$$P\Big(\sup_{H_n(d)} |\tilde{E}_{\mu_n}(h)| > \eta\Big) \le P\Big(\sup_{H'_n(d)} |\tilde{E}_{\mu_n}(h)| > \eta\Big) + \varepsilon,$$

where $d = \rho_{1/2}$.

ii) Under the assumptions in Lemma 4.7 (ii), we have the conclusion of i) for $d = \rho_{1/3}$.

Proof: i) Let A,, = (supH1 (p IP ,h I >11 ) and B,, = (SuPHM IIh II<. T)en,

by lemma 4.7 (i), P (Bc ) - 0 as n tends to infinity. Hence, it is enough to show that, on

B,,, Hn (P1/2) is contained in H,,'(p1,2). For any h = f -g in H,, (P i?) and on B,,

pl(f,g)=P Ih I <K ^- and

P-,^ Ih I"=I n,(Zj,,I a a.

This implies

I n-l I zj,A I fn t,zj,lh I1 1

I+1+,, a,, n1P Ih I

<a.l n+1/2P hI

< (I +K /2) a. 14-n

$= O(a_n/\sqrt{n})$.

ii) This is similar to i). □

Now let us deal with the random neighborhoods $H'_n(\rho_{1/2})$ and $H'_n(\rho_{1/3})$ by symmetrization.

For any Ti > 0, let t1,t2.t, tN be a S-net in of H.,'(d) (d = p11or Pin3)

and where N = N ( Fn pj ,,,Hn,(d)) By Hoeffding's inequality, we have

P( I I,) exp(- 8 n2/4 Zj). (4.14)n =

Zt 2 ""n8m

Since ti is on the --net of H,'(d), there is a h = f-g in H,,'(d) such that

Pl,jL (ti ,h )< G By the triangle inequality,

I Zjj 1: - I ZjJ -Z I + I Zjn J J n j=l n jail

=PI'P (ti,h )+Pj' (f ,g)

ll + a,1l

0 (a. 14 ) if a,, -4 o .

Therefore, Z anM- ;IZi, I = 0(4n a2).j-l n j=l

Combining the last bound with (4.13), we have

psm tzi argument.i Poalad(95expae 12)),

By the symmetrization argument in Pollard (1984, page 15),


;An 1- 2

P(SU4PH.'@@(Po I i} 1,2 _

SN2in-piAngHti(plf2))maxiP(lI; ajZj,,,ltI .,>

<N( I,;,,s, H.,*(pj7))excp(-O( 4)a

.exp(logN ( 1,Pg, HM )-O( 2T))

Sexp(21ogN( 2vin Plg FM )°( a))

That is

P ( SUPH4 (P,,I2 ,; ZCi 2jf1>aI,j=1

Sexp(2logN (- FM )-°( )) (4.15)

Similarly, for d = p 1/3 when a,, n lt, we get

1: (Y Z

2 "a~~~~L5~~~~~/

.exp(logN ( 2 p;;P,, H.'(pl/3)-O ( 52)

2exp(logN(IV n3(.16S2enc(log (2 1,,Plm FM )-O( s ) ). (4. 16)

Now we are ready to prove the equicontinuity lemma (Theorem 3.6) for $\mathcal{F}_M$.

Proof of Theorem 3.6 (equicontinuity):

i) Under the assumption $a_n = o(n^{1/4}) = o(\sqrt{n})$ and $\mu_n \beta_{a_n} = o(1)$, we take $\mathcal{F}_\delta = [\delta] = \{f-g: f, g \in \mathcal{F}_M \text{ and } \rho_{1/2}(f,g) < \delta\}$ in Lemma 4.2 (ii). For any $\varepsilon > 0$, when $n$ is sufficiently large, we have

$$P\Big(\sup_{[\delta]} |E_n(f-g)| > \eta\Big) \le P\Big(\sup_{[\delta]} |\tilde{E}_{\mu_n}(f-g)| > \eta/2\Big) + \varepsilon.$$

Moreover, since $J(\gamma, \rho_{1/2}, \mathcal{F}_M) < \infty$ and $A_\phi < \infty$ ($r_\phi > 2$), by Corollary 4.6 (i), we get

$$P\Big(\sup_{[\delta]} |\tilde{E}_{\mu_n}(f-g)| > \eta/2\Big) \le P\Big(\sup_{H_n(\rho_{1/2})} |\tilde{E}_{\mu_n}(f-g)| > \eta/10\Big) + \varepsilon.$$

Since the random entropy condition in Lemma 4.7 (i) is assumed (i.e., condition (3.6)), we have

$$P\Big(\sup_{H_n(\rho_{1/2})} |\tilde{E}_{\mu_n}(f-g)| > \eta/10\Big) \le P\Big(\sup_{H'_n(\rho_{1/2})} |\tilde{E}_{\mu_n}(f-g)| > \eta/10\Big) + \varepsilon.$$

By (4.15) and assumption (3.5), for $n$ large, on a set of probability $1 - \varepsilon$, we get

$$P\Big(\sup_{H'_n(\rho_{1/2})} |\tilde{E}_{\mu_n}(f-g)| > \eta/10 \,\Big|\, \Xi_{a_n}\Big) < \varepsilon.$$

Integrating out $\Xi_{a_n}$, we get

$$P\Big(\sup_{H'_n(\rho_{1/2})} |\tilde{E}_{\mu_n}(f-g)| > \eta/10\Big) < 2\varepsilon.$$

Putting all the above inequalities together, we finally get

$$P\Big(\sup_{[\delta]} |E_n(f-g)| > \eta\Big) < 5\varepsilon.$$

As for (ii), similarly, we can use Lemma 4.2 (i), Corollary 4.6 (ii), Lemma 4.7 (ii), (4.16) and condition (3.7). □

5. Proof II

In this section we present the proofs of the results in Section 3 except those of the two main theorems. These results are direct consequences of the two main theorems by choosing $\mu_n$ and $a_n$ optimally according to the more specific entropy conditions assumed. The proofs are arranged in the same order as the results appeared in Section 3.

Proof of Corollary 3.2:

Note that logNI(c b,2,p ,,,,G )=O(log(n)) by Lemma 2.11 . If we take b,, =n +1/2 or

(logn)614N, the results follow from the remark after Theorem 3.1 and our hypothesis with

a,, = n and b,, =n1'+ for (i) and (ii), and a,, = 2 log(n), b, ( (@n for (iii). 0

Proof of Theorem 3.3 (uniform convergence for bounded families): In Theorem 3.1, take $b_n \equiv 1$. The only task left is to find the optimal integer pair $(\mu_n, a_n)$ satisfying (3.1) and $\mu_n \beta_{a_n} = o(1)$. We take $a_n$ of order $n^x$. Then $\mu_n$ is of order $n^{1-x}$. Note that $\mu_n \beta_{a_n} = o(1)$ is equivalent to $x > (1 + r_\beta)^{-1}$. It suffices to show that (3.1) holds for (i), (ii) and (iii).

i) Since $r_\alpha = 1$, we can take any $x$ such that $(1 + r_\beta)^{-1} < x < 1$. In this case, (3.1) is the same as assumption (3.2).

ii) When $r_\alpha > 1$, choose $x$ as in (i), but note that (3.1) is the same as assumption (3.3) in this case.

iii) 1) When $0 < r_\alpha < \alpha_0$, we take $x = (1 - \alpha_0)/(1 - r_\alpha)$. This implies that $a_n^{r'_\alpha} \mu_n = O(n^{r_\alpha x + (1 - x)}) = O(n^{\alpha_0})$, which means that (3.1) can be inferred from assumption (3.4).

2) In this case, (3.1) holds for any $x$ in $(0,1)$. Take $x$ as in i) to ensure that $\mu_n \beta_{a_n} = o(1)$. □

Before we prove the uniform convergence theorem for an index class with non-constant envelope function, we first recall some results from Doob (1953) and from using the Chebyshev inequality on the law of large numbers for strictly stationary sequences.

5.1 Proposition Suppose that $X$ is strictly stationary with the stationary distribution $P$, and that $F$ is a measurable function. Then under any of the following three conditions, we have

$$\frac{1}{n} \sum_{i=1}^{n} F(X_i) \to PF \quad \text{in probability.}$$

i) $X$ is metrically transitive, and $F \in L^1(P)$.

ii) A&(n,8)= Z oij = o(n) and F e L2+(P ) for some 6 in (0, 1].k-1

iii) $r_\phi > 0$ and $F \in L^2(P)$. □

Proof of Theorem 3.5 (Uniform convergence for general families):

Case a: Assume $F \equiv M$. This case is covered by Theorem 3.1 if we take $b_n \equiv 1$.

Case b: For any given $M$, take $\mathcal{F}_M = \{f_M = f\,1(F \le M): f \in \mathcal{F}\}$. Then,

$$|P_n f - Pf| \le |P_n f_M - P f_M| + P_n(F\,1(F > M)) + P(F\,1(F > M)).$$

This implies

$$\sup_{\mathcal{F}} |P_n f - Pf| \le \sup_{\mathcal{F}_M} |P_n f_M - P f_M| + P_n(F\,1(F > M)) + P(F\,1(F > M)).$$

For any fixed $\varepsilon > 0$, take $M$ such that $P(F\,1(F > M)) < \varepsilon$. Then by the assumption on $F$ and Proposition 5.1, the law of large numbers holds for $F\,1(F > M)$ with this $M$. Therefore, both $P_n(F\,1(F > M))$ and $P(F\,1(F > M))$ can be bounded by $3P(F\,1(F > M))$ in probability when $n$ is sufficiently large. Note that for this fixed $M$ the supremum over $\mathcal{F}_M$ tends to 0 in probability by part a) and the fact that the covering number of $\mathcal{F}_M$ is bounded by that of $\mathcal{F}$. □
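The truncation inequality used in Case b can be checked on a toy example. The sketch below makes several simplifying assumptions that are not in the paper: the sample is i.i.d. standard exponential (a trivially stationary sequence), the class contains the single function $f(x) = x$ with envelope $F(x) = x$, and the function name is hypothetical. For this choice $Pf = 1$ and $P(F\,1(F > M)) = (M+1)e^{-M}$ in closed form.

```python
import math
import random


def truncation_terms(sample, M):
    """Return (lhs, rhs) of
    |P_n f - P f| <= |P_n f_M - P f_M| + P_n F 1(F > M) + P F 1(F > M)
    for f(x) = F(x) = x under the standard exponential distribution."""
    n = len(sample)
    Pn_f = sum(sample) / n
    Pn_fM = sum(x for x in sample if x <= M) / n
    Pn_tail = sum(x for x in sample if x > M) / n
    P_f = 1.0
    P_tail = (M + 1.0) * math.exp(-M)  # integral of x e^{-x} over (M, infinity)
    P_fM = P_f - P_tail
    lhs = abs(Pn_f - P_f)
    rhs = abs(Pn_fM - P_fM) + Pn_tail + P_tail
    return lhs, rhs


random.seed(2)
sample = [random.expovariate(1.0) for _ in range(1000)]
for M in (1.0, 3.0, 6.0):
    lhs, rhs = truncation_terms(sample, M)
    print(M, lhs <= rhs)
```

As $M$ grows the two tail terms shrink, which is why the proof only needs uniform convergence over the bounded class $\mathcal{F}_M$ for one fixed, sufficiently large $M$.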

Proof of Theorem 3.10 (CLT for a stationary mixing sequence):

By Proposition 3.9 and the mixing assumptions, the finite dimensional CLT holds for $X$. According to Proposition 3.8, we only have to check the equicontinuity condition. By Theorem 3.6 (equicontinuity), we need to find an integer pair $(\mu_n, a_n)$ to satisfy either condition (i) or (ii) of Theorem 3.6, and be such that $\mu_n \beta_{a_n} = o(1)$.

Take $a_n = [n^x]$. Then $\mu_n \beta_{a_n} = o(1)$ is equivalent to $r_\beta > q(x) = 1/x - 1$.
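The equivalence between $\mu_n \beta_{a_n} = o(1)$ and $r_\beta > q(x)$ is a one-line exponent count, sketched below. The sketch assumes a purely polynomial rate $\beta_k \approx k^{-r_\beta}$ together with $a_n \approx n^x$ and $\mu_n \approx n^{1-x}$; the function names are hypothetical.

```python
def q(x):
    """q(x) = 1/x - 1, the threshold for the beta-mixing rate constant."""
    return 1.0 / x - 1.0


def mixing_exponent(x, r_beta):
    """Exponent e with mu_n * beta_{a_n} of order n^e when a_n ~ n^x,
    mu_n ~ n^(1 - x) and beta_k ~ k^(-r_beta) (assumed polynomial rate)."""
    return (1.0 - x) - x * r_beta


for x in (0.2, 0.25, 0.4, 0.5):
    for r_beta in (0.5, 1.7, 2.9, 5.3):
        vanishes = mixing_exponent(x, r_beta) < 0
        print(x, r_beta, vanishes, r_beta > q(x))
```

In every row the last two columns agree: the product tends to zero exactly when $x(1 + r_\beta) > 1$, i.e., when $r_\beta > 1/x - 1$.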

a) Take x = 1/4-aa/2. Then/ la,, =nl~Z =n%. But a,,=n °wn° since a 1/4.

So the two random entropy conditions in Theorem 3.6 i) are implied by the random

entropy conditions assumed here. Further, ,, Pa =o(1), because r > 1- 2 ao

q( 114-a1o2) = q (x ) is assumed.


b) Since rp 7+4aO =(-3-2 ao) -1, by the continuity of function q(y) = y 1, for3-4ao 10 5

co0<2/11, there is a positive and small 8 such that r >q(3/10-2ca05 -6) = q(x) with x =

3/10- 2acrS-6, and

1/6< 1/6+ao/3<x a 632 8< 32 /10.105 10 5

So 3/4 - 5 x/2 > ao and 3 x - 1/2 > Cb Hence the two random entropy conditions in

Theorem 3.6 b) are implied by the random entropy condition assumed here. 0

Acknowledgements: This research is based on part of the author's Ph.D. dissertation submitted to the University of California at Berkeley. The author is deeply indebted to Prof. Lucien Le Cam for suggesting the problem, many valuable discussions and encouragement. The author also wants to express her thanks to Prof. Deborah Nolan for many helpful discussions and suggestions for improvement, and to Prof. Terence Speed for commenting on the drafts. Special thanks are due to Prof. Walter Philipp for pointing out the difference between this work and the related work of his and Massart's.

References

Athreya, K. B. and Pantula, S. G. (1986). Mixing properties of Harris chains and autore-

gressive processes. J. Appl. Prob. 23 880-892.

Dehling, H. (1983). Limit theorems for sums of weakly dependent Banach space valued

random variables. Zeit. fur Wahr. und Ver. Geb. 63 393-432.

Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probab. 6

899-929. (Correction, ibid, 7 (1979) 909-911).

Dudley, R. M. and Philipp, W. (1983). Invariance principles for sums of Banach space

valued random elements and empirical processes. Zeit. fur Wahr. und Ver. Geb. 62

509-552.

Doob, J. L. (1953). Stochastic processes. New York: Wiley.


Gine, E. and Zinn, J. (1984). On the central limit theorem for empirical processes. Ann. Probab. 12 929-989.

Ibragimov, I. A. and Rosanov, Y. A. (1978). Gaussian random processes. New York: Springer-Verlag. Applications of Mathematics 9.

Kolmogorov, A. N. and Tihomirov, V. M. (1959). ε-entropy and ε-capacity of sets in functional spaces. Uspehi Mat. Nauk. 14 3-86 (Amer. Math. Soc. Transl. Ser. 2 17 277-364).

Le Cam, L. M. (1984). A remark on empirical measures. In Bickel, P., Doksum, K., and Hodges, J. (editors), Festschrift for E. L. Lehmann, 305-327. Belmont, CA: Wadsworth.

Levental, S. (1989). A uniform CLT for uniformly bounded families of martingale differences. J. of Theoretical Probability 2 271-287.

Massart, P. (1988). Invariant principles for empirical processes: the weakly dependent case. Ph.D. thesis, University of Paris.

Mokkadem, A. (1988). Mixing properties of ARMA processes. Stochastic processes and

their applications 29 309-315.

Pham, T. D. and Tran, L. T. (1985). Some mixing properties of time series models. Sto-

chastic processes and their applications 19 297-303.

Philipp, W. (1986). Invariance principles for independent and weakly dependent random variables. Dependence in Probability and Statistics: A survey of recent results. 225-268, E. Eberlein and M. S. Taqqu eds., Birkhäuser.

Pollard, D. (1982). A central limit theorem for empirical processes. J. of Australian Mathematical Society (Series A) 33 235-248.

Pollard, D. (1984). Convergence of stochastic processes. New York: Springer-Verlag.

Vapnik, V. N. and Cervonenkis, A. Ya. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its applications 16 264-280.


Withers, C. S. (1981). Conditions for linear processes to be strong-mixing. Zeit. fur

Wahr. und Ver. Geb. 57 477-480.

DEPARTMENT OF STATISTICS

UNIVERSITY OF WISCONSIN

MADISON, WISCONSIN 53706