Rates of Convergence and Central Limit Theorem for Empirical Processes of Stationary Mixing Sequences
By
Bin Yu*
Statistics Department, University of California, Berkeley
Technical Report No. 260, June 1990
*Research supported in part by NSF grants MC84-03239, DMS-9001710
Department of Statistics, University of California, Berkeley, California
ABSTRACT
An intuitive approach is used to obtain rates of convergence and a CLT for empirical processes of stationary mixing sequences with a not too slow $\beta$ (or $\phi$) mixing rate. This includes as a special case Markov chains with an exponentially decaying $\phi$-mixing rate $\phi_n = O(\theta^n)$ for some $\theta < 1$.
The main techniques are to divide the sequence into blocks of equal size and construct a new sequence with the same structure as the original sequence but with independent blocks. Then the rates of convergence and the CLT for the second sequence are obtained using the standard symmetrization technique and a chaining argument, and these results can be extended to the original sequence.
The metric entropy conditions imposed are similar to those in the independent case. A $\beta$ (or $\phi$) mixing rate condition related to the metric entropy conditions is also needed, although for simplicity the measurability difficulties are ignored. In the case of V-C classes, the uniform convergence result holds for any strictly stationary sequence with $\beta$ (or $\phi$) mixing rate $\beta_n$ (or $\phi_n$) $= o(n^{-d})$ for some $d > 0$, while the CLT requires the rate $\beta_n$ (or $\phi_n$) $= o(n^{-d})$ for some $d > 3$. As by-products of these results, several useful inequalities about the original and constructed sequences are obtained.
Research supported in part by NSF grants MC84-03239, DMS-8701426 and DMS-9001710.
Abbreviated title. Limit theorem for mixing sequences.
AMS 1980 subject classifications. Primary 60F05, 60F17; Secondary 60G10.
Key words and phrases. Empirical process, index class, blocking, independence, mixing, rates of convergence, Central Limit Theorem, metric entropy, V-C class, symmetrization.
1. Introduction
There has been a lot of research work on empirical processes indexed by classes of functions since Vapnik & Cervonenkis (1971) showed that uniform convergence holds for the empirical process indexed by a V-C class in the i.i.d. case. Many papers followed. Examples include Dudley (1978), Dudley & Philipp (1983), Gine and Zinn (1984), Le Cam (1983) and Pollard (1982). However, most of this work concentrates on the independent case.
In this paper, uniform convergence, rates of convergence, and a Central Limit Theorem for the empirical process in the i.i.d. case are extended to the weakly dependent case, that is, to mixing sequences. Conditions on the mixing rate of the sequence and entropy constraints on the index class are imposed. The main technique used is the construction of an independent block (IB) sequence, which enables us to employ the symmetrization technique used in the i.i.d. case, and Bennett's inequality in a chaining argument.
Two closely related works in the same direction as this paper are Philipp (1986) and Massart (1988). In terms of the entropy conditions used, their works are very similar to each other, and there is some overlap of results too. The difference between their approach and mine lies in the entropy conditions on the index class. My approach can be viewed as an extension of Vapnik-Cervonenkis-type theory to weakly dependent (mixing) sequences, since I adopt the Pollard (1984) approach and impose random entropy conditions, while both Philipp and Massart use metric entropy with inclusion. In this context we note the remark of Massart (1988), who stated that "we don't know whether the above weakly dependent framework [as in Massart (1988)] could support a general 'Vapnik-Cervonenkis type theory' or not".
When an algebraic decay of the covering number with inclusion is assumed, Philipp (1986) gives the strong invariance principle with rate $(\log n)^{-\lambda}$ under very weak conditions on the strong mixing rate. Under the same entropy conditions, but stronger conditions on the strong mixing rate, Massart obtains a faster rate in the strong invariance principle. Moreover, if the sequence has an exponentially decaying mixing rate, the algebraic decay condition on the covering number with inclusion can be relaxed (Massart (1988)) to a condition on the log of that number (i.e., the metric entropy with inclusion) in order to have an invariance principle with rate $(\log n)^{-\lambda}$ for some $\lambda > 0$. Massart (1988) also relaxes the constraint on the index class of sets in Philipp (1986) to an index class of functions with constant envelope function. Another related work is Levental (1989) on martingale difference sequences.
The results here are restricted to rates of convergence (uniform convergence) and the CLT. I do not consider invariance principles. The results obtained here hold under possibly weaker mixing conditions and cover a large variety of V-C classes. There are some V-C classes of sets that are not known to satisfy the metric entropy with inclusion conditions of Philipp and Massart. For technical reasons, only $\beta$-mixing (completely regular) sequences are considered here, but we hope that any examples of strong ($\alpha$-) mixing sequences with fast enough mixing rate will have a good chance to be completely regular.
The rest of the paper is organized as follows. Section 2 gives preliminaries on mixing sequences and definitions. Section 3 includes the main results, that is, rates of convergence of empirical processes for general index classes (Theorem 3.1) and an equicontinuity lemma for index classes with constant envelope functions (Theorem 3.6). Section 4 contains the proofs of Theorems 3.1 and 3.6, while Section 5 contains the rest of the proofs. Moreover, the construction of an independent block sequence is described in detail in Section 4; it is the cornerstone of the proofs.
2. Preliminaries
This section contains the preliminary material on mixing sequences and metric entropies. The size of the index class of the empirical process can be regulated through metric entropy conditions related to the empirical $L^1$ norm, and through covering integrals. The algebraic decay condition on the covering number of an index class is also introduced.
From now on, all the necessary measurability requirements are assumed. The reader is referred to Dudley and Philipp (1983) for a careful treatment of this problem.
Let $X = (X_i)_{i \ge 1}$ be a strictly stationary real-valued sequence with distribution $P$, which implies that the $X_i$ ($i \ge 1$) all have the same distribution $P$. For the sequence $X$, let
$$\sigma_l = \sigma(X_1, X_2, \ldots, X_l)$$
and
$$\sigma'_{l+k} = \sigma(X_{l+k}, X_{l+k+1}, \ldots).$$
Many kinds of mixing conditions exist in the literature. The weakest among those most commonly used is called strong mixing or $\alpha$-mixing.
2.1 Definition For any sequence $X$, the $\alpha$-mixing coefficient $\alpha_k$ is defined as follows:
$$\alpha_k(X) = \sup\{ E\,|P(B \mid \sigma_l) - P(B)| : B \in \sigma'_{l+k},\ l \ge 1 \}.$$
Other mixing coefficients are the following.
2.2 Definition For any sequence $X$, the $\beta$-mixing coefficient $\beta_k$ is defined as follows:
$$\beta_k(X) = E \sup\{ |P(B \mid \sigma_l) - P(B)| : B \in \sigma'_{l+k},\ l \ge 1 \}.$$
2.3 Definition For any sequence $X$, the $\phi$-mixing coefficient $\phi_k$ is defined as follows:
$$\phi_k(X) = \sup\{ |P(B \mid A) - P(B)| : A \in \sigma_l,\ B \in \sigma'_{l+k},\ l \ge 1 \}.$$
Moreover, we can define mixing rate constants for $c = \alpha, \beta, \phi$-mixing:
$$r_c = \sup\{ r \in \mathbb{R}^+ : (c_n n^r) \text{ is a bounded sequence} \}$$
and
$$\theta_c = \sup\{ 0 < \theta \le 1 : (c_n \theta^{-n}) \text{ is a bounded sequence} \}.$$
The three mixing coefficients are ordered as follows, see Philipp (1986):
$$\alpha_k \le \beta_k \le \phi_k. \qquad (2.0)$$
Note that the stationary sequence $(f(X_i) : i = 1, 2, \ldots)$, for a measurable function $f$, has $\alpha$-mixing and $\phi$-mixing rates bounded by the corresponding rates of the original sequence, since the $\sigma$-field of $f(X)$ is contained in the $\sigma$-field of $X$ for any $f$; that is,
$$\alpha_k \ge \alpha_k(f) \quad \text{and} \quad \phi_k \ge \phi_k(f).$$
Therefore, if the sequence $(X_i)$ satisfies an $\alpha$ or $\phi$ condition, then so does the sequence $(f(X_i))$.
For examples of mixing sequences, see Athreya and Pantula (1986), Ibragimov and Rosanov (1978), Mokkadem (1988), Pham and Tran (1985) and Withers (1981). In particular, a Markov chain is $\phi$-mixing with $\theta_\phi < 1$ under some regularity conditions (Doob (1953)).
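As a concrete illustration of the Markov chain case, the following minimal sketch computes $\beta_k$ and $\phi_k$ for a stationary two-state chain. It assumes (this reduction is not stated in the text) that for a stationary Markov chain both coefficients depend only on the total variation distance between the rows of the $k$-step transition matrix and the stationary law; the function names and the transition probabilities `p`, `q` are hypothetical choices made for the example.

```python
def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def two_state_mixing(p, q, k_max):
    """Return lists (beta, phi) of mixing coefficients for lags 1..k_max.

    The chain has transition matrix [[1-p, p], [q, 1-q]] and is started
    from its stationary law pi = (q, p) / (p + q).
    Assumption (not from the paper): for a stationary Markov chain the
    coefficients reduce to total variation distances between the rows of
    the k-step transition matrix and pi.
    """
    pi = [q / (p + q), p / (p + q)]
    step = [[1.0 - p, p], [q, 1.0 - q]]
    power = step
    beta, phi = [], []
    for _ in range(k_max):
        # total variation distance between row x of P^k and pi
        tv = [0.5 * sum(abs(power[x][y] - pi[y]) for y in range(2))
              for x in range(2)]
        beta.append(pi[0] * tv[0] + pi[1] * tv[1])   # average over the start
        phi.append(max(tv))                          # worst-case start
        power = mat_mul(power, step)
    return beta, phi


beta, phi = two_state_mixing(p=0.3, q=0.2, k_max=8)
for k, (b, f) in enumerate(zip(beta, phi), start=1):
    print(k, round(b, 6), round(f, 6))
```

In this example both coefficients shrink by the factor $|1 - p - q|$ at each lag, which is the exponential decay referred to in the text, and $\beta_k \le \phi_k$ as in (2.0).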
We now introduce some constants which will be used frequently later. For any $\delta$ in $(0, 1]$, denote
$$A_\alpha(n, \delta) = \sum_{k=1}^{n} \alpha_k^{\delta/(2+\delta)}, \qquad A_\phi(n) = \sum_{k=1}^{n} \phi_k^{1/2}.$$
Let $A_\alpha(\infty, 1)$ be denoted by $A_\alpha$ and $A_\phi(\infty)$ by $A_\phi$.
We take Pollard's linear functional notation and use $P$ instead of $E$ to denote expectations. Hence, $Pf = \int f\, dP = E f(X_1) = E f(X_n)$ for all $n \ge 1$. Moreover, we use $\mathbb{P}$ to denote "a probability measure on a (sometimes unspecified) measurable space; miscellaneous random variables live on this space." (Pollard (1984)).
It is known that if the mixing rate of the sequence tends to zero fast enough, the variance of the sum of a function of each of $n$ successive observations is $O(n)$. This is necessary for the CLT to hold. The following more general lemma is from Dehling (1983).
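2.4 Lemma Suppose that $f$ is a measurable function, $X$ a strictly stationary sequence, and $Pf = 0$. Then
i)
$$P\Big(\sum_{i=1}^{n} f(X_i)\Big)^2 \le n\,\big(1 + 2A_\alpha(n,\delta)\big)\,\big(P|f|^{2+\delta}\big)^{2/(2+\delta)}, \qquad (2.1)$$
$$P\Big(\sum_{i=1}^{n} f(X_i)\Big)^2 \le n\,\big(1 + A_\phi(n)\big)\,\big(P|f|^2\big). \qquad (2.2)$$
ii) If $f$ is bounded by a constant $M$, then
$$P\Big(\sum_{i=1}^{n} f(X_i)\Big)^2 \le n\,\Big(1 + 20\sum_{k=1}^{n} \alpha_k\Big) M^2, \qquad (2.3)$$
$$P\Big(\sum_{i=1}^{n} f(X_i)\Big)^2 \le n\,(1 + 2A_\alpha)\,M^{4/3}\,(P|f|)^{2/3}, \qquad (2.4)$$
and
$$P\Big(\sum_{i=1}^{n} f(X_i)\Big)^2 \le n\,(1 + 2A_\phi)\,M\,P|f|. \qquad (2.5)$$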
Proof: Part (i) follows from Lemma 3.2 and Lemma 3.5 of Dehling (1983). (2.4) and (2.5) of (ii) are consequences of (i). As for (2.3), it is a consequence of Lemma 3.1 of Dehling (1983).
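2.5 Corollary Let $r'_\alpha = \min(1, r_\alpha)$ and $\delta_\alpha = 1$ if $r_\alpha = 1$, $\delta_\alpha = 0$ otherwise. If $|f| \le M$, then
$$P\Big(\sum_{i=1}^{n} f(X_i)\Big)^2 = O\big((\log n)^{\delta_\alpha}\, n^{2 - r'_\alpha}\big). \qquad (2.6)$$
Proof. The proof is completed by (2.3) and the convergence property of the algebraic series $\sum_k k^{-r}$. □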
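The three regimes behind (2.6) can be checked numerically. The sketch below takes the hypothetical model $\alpha_k = k^{-r}$ (an assumption made only for illustration) and compares $n \sum_{k \le n} \alpha_k$, the mixing part of the bound (2.3), with the order $(\log n)^{\delta_\alpha} n^{2 - r'_\alpha}$; the ratio should stay bounded for $r < 1$, $r = 1$ and $r > 1$.

```python
import math


def partial_sum(r, n):
    """Return sum_{k=1}^{n} k**(-r), the model for the sum of alpha_k."""
    return sum(k ** (-r) for k in range(1, n + 1))


def variance_order(r, n):
    """Order (log n)^delta * n^(2 - r') appearing in (2.6)."""
    r_prime = min(1.0, r)
    delta = 1 if r == 1 else 0
    return (math.log(n) ** delta) * n ** (2.0 - r_prime)


for r in (0.5, 1.0, 2.0):
    n = 10_000
    # n * partial_sum models the mixing part of the bound in (2.3)
    ratio = n * partial_sum(r, n) / variance_order(r, n)
    print(r, round(ratio, 3))
```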
We define the $\beta$-mixing and $\phi$-mixing coefficients for any probability measure $Q$ on a product measure space $(\Omega_1 \times \Omega_2, \Sigma_1 \times \Sigma_2)$ as follows.
2.6 Definition Suppose that $Q_1$ and $Q_2$ are the marginal probability measures of $Q$ on $(\Omega_1, \Sigma_1)$ and $(\Omega_2, \Sigma_2)$. Then we define
$$\beta(\Sigma_1, \Sigma_2, Q) = E \sup\{ |Q(B \mid \Sigma_1) - Q_2(B)| : B \in \Sigma_2 \},$$
$$\phi(\Sigma_1, \Sigma_2, Q) = \sup\{ |Q(B \mid A) - Q_2(B)| : A \in \Sigma_1,\ B \in \Sigma_2 \}.$$
2.7 Lemma Suppose that $h(x, y)$ is a measurable function with bound $M_h$, and $P$ the product measure $Q_1 \times Q_2$. Then we have
$$|Qh - Ph| \le M_h\, \beta(\Sigma_1, \Sigma_2, Q) \le M_h\, \phi(\Sigma_1, \Sigma_2, Q).$$
Proof. Since the bounded measurable function $h(x, y)$ can be approximated by bounded linear combinations of simple functions of the form $1_{A \times B}$ with $A \in \Sigma_1$ and $B \in \Sigma_2$, it is enough to prove the result for $h$ of such form.
Let $h = \sum_i a_i 1_{A_i \times B_i}$, where the $A_i$'s are chosen to be disjoint, and put $M_h = \max_i |a_i|$. Moreover, write $\eta_i = Q(B_i \mid \Sigma_1) - Q_2(B_i)$. Since $\sum_i 1_{A_i} \le 1$, it follows that
$$|Qh - Ph| = \Big| \sum_i a_i \int 1_{A_i} \big( Q(B_i \mid \Sigma_1) - Q_2(B_i) \big)\, dQ_1 \Big|$$
$$\le M_h \sum_i \int \sup_k |\eta_k|\, 1_{A_i}\, dQ_1$$
$$\le M_h \int \sup_k |\eta_k|\, dQ_1$$
$$\le M_h\, \beta(\Sigma_1, \Sigma_2, Q).$$
The proof is then completed using the fact that the $\beta$-mixing coefficient is bounded by the $\phi$-mixing coefficient. □
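The inequality of Lemma 2.7 can be checked by hand on a finite product space. The sketch below is only an illustration under stated assumptions: the joint probabilities in `joint` are made up, the supremum over $B$ is computed as a total variation distance (valid on a finite space), and the test functions take values in $[0, 1]$, which covers the indicator functions to which the lemma is applied later (so $M_h = 1$).

```python
import random


def marginals(joint):
    """Marginal laws Q_1 (rows) and Q_2 (columns) of a joint pmf."""
    q1 = [sum(row) for row in joint]
    q2 = [sum(joint[x][y] for x in range(len(joint)))
          for y in range(len(joint[0]))]
    return q1, q2


def beta_coefficient(joint):
    """beta(Sigma_1, Sigma_2, Q) for a joint pmf on a finite product space.

    joint[x][y] = Q({(x, y)}).  On a finite space the supremum over B of
    |Q(B | x) - Q_2(B)| equals the total variation distance.
    """
    q1, q2 = marginals(joint)
    total = 0.0
    for x, row in enumerate(joint):
        if q1[x] == 0:
            continue
        tv = 0.5 * sum(abs(row[y] / q1[x] - q2[y]) for y in range(len(row)))
        total += q1[x] * tv
    return total


def gap(joint, h):
    """|Q h - P h| where P is the product of the marginals of Q."""
    q1, q2 = marginals(joint)
    cells = [(x, y) for x in range(len(joint)) for y in range(len(joint[0]))]
    qh = sum(joint[x][y] * h[x][y] for x, y in cells)
    ph = sum(q1[x] * q2[y] * h[x][y] for x, y in cells)
    return abs(qh - ph)


random.seed(0)
joint = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # a dependent pair
bound = beta_coefficient(joint)                     # M_h = 1 below
worst = 0.0
for _ in range(1000):
    h = [[random.random() for _ in range(3)] for _ in range(2)]  # 0 <= h <= 1
    worst = max(worst, gap(joint, h))
print(worst, bound)
```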
By induction and Lemma 2.7, we have
2.8 Corollary Let $m \ge 1$, and suppose that $h$ is a bounded measurable function on a product probability space $(\prod_{i=1}^{m} \Omega_i, \prod_{i=1}^{m} \Sigma_i)$. Let $Q$ be a probability measure on the product space with marginal measures $Q_i$ on $(\Omega_i, \Sigma_i)$, and let $Q^{i+1}$ be the marginal measure of $Q$ on $(\prod_{j=1}^{i+1} \Omega_j, \prod_{j=1}^{i+1} \Sigma_j)$, $i = 1, \ldots, m-1$. Write
$$\beta(Q) = \sup_{1 \le i \le m-1} \beta\Big(\prod_{j=1}^{i} \Sigma_j,\ \Sigma_{i+1},\ Q^{i+1}\Big),$$
define $\phi(Q)$ similarly, and let $P = \prod_{i=1}^{m} Q_i$. Then
$$|Qh - Ph| \le (m-1)\, M_h\, \beta(Q) \le (m-1)\, M_h\, \phi(Q).$$
Remark: Corollary 2.8 is the key to connecting the mixing sequence and the independent block sequence (see Section 4 for details).
Before we introduce the definitions of metric entropy and covering integral, we need some notation.
For any mixing sequence $X_1, X_2, \ldots$, denote by $P_n$ the empirical measure of the first $n$ observations:
$$P_n f = \frac{1}{n} \sum_{i=1}^{n} f(X_i).$$
For any Borel-measurable function $f$, define the empirical $P$-bridge $E_n$ by
$$E_n f = \sqrt{n}\,(P_n f - P f).$$
For any measurable family $\mathcal{F}$ of functions, we call $F$ an envelope function of $\mathcal{F}$ if $|f| \le F$ for all $f$ in $\mathcal{F}$.
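To fix ideas, here is a minimal sketch of $P_n f$ and $E_n f$ for a class of half-line indicators, whose envelope is the constant 1. The data values and the assumption that the marginal law is uniform on $[0, 1]$ (so that $P f_t = t$) are hypothetical and serve only the example.

```python
import math


def empirical_measure(f, xs):
    """P_n f = (1/n) * sum of f(X_i) over the first n observations."""
    return sum(f(x) for x in xs) / len(xs)


def empirical_bridge(f, xs, pf):
    """E_n f = sqrt(n) * (P_n f - P f), with P f supplied by the caller."""
    return math.sqrt(len(xs)) * (empirical_measure(f, xs) - pf)


def half_line(t):
    """Indicator f_t(x) = 1{x <= t}; its envelope is the constant 1."""
    return lambda x: 1.0 if x <= t else 0.0


# Hypothetical data, treated as draws with uniform marginal on [0, 1],
# so that P f_t = t for 0 <= t <= 1.
xs = [0.10, 0.40, 0.45, 0.80]
grid = [j / 100 for j in range(101)]
sup_dev = max(abs(empirical_measure(half_line(t), xs) - t) for t in grid)
print(sup_dev, empirical_bridge(half_line(0.5), xs, 0.5))
```

The quantity `sup_dev` is the uniform deviation $\sup_f |P_n f - P f|$ over the grid of thresholds, which is the object studied in the uniform convergence results.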
Since we are interested in the uniform performance of the empirical measure, intuition suggests that $\mathcal{F}$ can't contain too many functions. One measure of the size of $\mathcal{F}$ is the covering number or metric entropy. Along the lines of Pollard (1984), we define the covering number as follows.
2.9 Definition (covering number) The covering number $N(\epsilon, d, \mathcal{F})$ related to a semi-metric $d$ on $\mathcal{F}$ is defined as
$$N(\epsilon, d, \mathcal{F}) = \min\{ m : \text{there are } g_1, \ldots, g_m \in L^1(P) \text{ such that } \min_{1 \le j \le m} d(f, g_j) \le \epsilon \text{ for any } f \text{ in } \mathcal{F} \}.$$
$\log N(\epsilon, d, \mathcal{F})$ is called the metric entropy at $\epsilon$. See Kolmogorov and Tihomirov (1959).
If we take the following random $L^1$-semimetric $\rho_{1,n}$ as $d$,
$$\rho_{1,n}(f, g) = P_n(|f - g|),$$
then the related covering number is random. Denote by $\rho_1$ the $L^1$ semi-metric $\rho_1(f, g) = P|f - g|$. A metric related to $\rho_1$ is $\rho_{1/2}(f, g) = (P|f - g|)^{1/2}$. The last metric is convenient when using a restricted chaining argument (Le Cam's "square-root trick" (Pollard (1984))) to prove the CLT for $\phi$-mixing sequences. Moreover, a "cube-root trick" and $\rho_{1/3}(f, g) = (P|f - g|)^{1/3}$ are needed to deal with $\alpha$-mixing sequences. The covering numbers for $\rho_{1/2}$ and $\rho_{1/3}$ are $N(\epsilon, \rho_{1/2}, \mathcal{F})$ and $N(\epsilon, \rho_{1/3}, \mathcal{F})$, and the corresponding covering integrals are
$$J(\delta, \rho_{1/2}, \mathcal{F}) = \int_0^\delta \{ 2 \log( N(\epsilon, \rho_{1/2}, \mathcal{F})^2 / \epsilon ) \}^{1/2}\, d\epsilon$$
and
$$J(\delta, \rho_{1/3}, \mathcal{F}) = \int_0^\delta \{ 2 \log( N(\epsilon, \rho_{1/3}, \mathcal{F})^2 / \epsilon ) \}^{1/2}\, d\epsilon.$$
Conditions on the rate at which the covering numbers tend to infinity as $\epsilon$ tends to zero, and finiteness of covering integrals, are restrictions on the size of the family $\mathcal{F}$.
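A random covering number under $\rho_{1,n}$ can be bounded from above by building an $\epsilon$-net greedily. The sketch below is an illustration only: it restricts the centers to members of the family (Definition 2.9 allows any $L^1$ functions, so the true covering number can be smaller), and the sample points and the class of half-line indicators are hypothetical.

```python
def rho_1n(f_vals, g_vals):
    """Empirical L1 semimetric P_n|f - g| from function values on the sample."""
    return sum(abs(a - b) for a, b in zip(f_vals, g_vals)) / len(f_vals)


def greedy_cover_size(family_vals, eps):
    """Size of a greedily built eps-net; an upper bound on N(eps, rho_1n, F).

    Centers are taken from the family itself, which is a restriction
    compared with Definition 2.9 (centers there may be any L1 functions).
    """
    centers = []
    for vals in family_vals:
        # keep vals as a new center only if no existing center is within eps
        if all(rho_1n(vals, c) > eps for c in centers):
            centers.append(vals)
    return len(centers)


# Half-line indicators f_t(x) = 1{x <= t} evaluated on a fixed sample.
sample = [i / 50 for i in range(50)]
thresholds = [j / 200 for j in range(201)]
family = [[1.0 if x <= t else 0.0 for x in sample] for t in thresholds]
for eps in (0.2, 0.1, 0.05):
    print(eps, greedy_cover_size(family, eps))
```

For this class the net size grows roughly like $1/\epsilon$, that is, an algebraic rate in $\epsilon$.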
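The following algebraic decay conditions on covering numbers will be imposed in the later theorems, and they are known to be satisfied by V-C classes (Dudley (1978)):
$$N(\epsilon, \rho_1, \mathcal{F}) = O(\epsilon^{-w}) \text{ for some } w > 0, \text{ as } \epsilon \to 0. \qquad (2.7)$$
$$N(\epsilon, \rho_{1,n}, \mathcal{F}) = O(\epsilon^{-w}) \text{ for some } w > 0, \text{ as } \epsilon \to 0. \qquad (2.8)$$
It follows that when (2.7) is satisfied, as $\epsilon \to 0$, we have
$$N(\epsilon, \rho_{1/2}, \mathcal{F}) \le N(\epsilon^2, \rho_1, \mathcal{F}) = O(\epsilon^{-2w})$$
and
$$N(\epsilon, \rho_{1/3}, \mathcal{F}) \le N(\epsilon^3, \rho_1, \mathcal{F}) = O(\epsilon^{-3w}).$$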
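Under algebraic decay the covering integral is finite, which can be seen numerically. The sketch below evaluates $\int_0^\delta \{2 \log(N(\epsilon)^2/\epsilon)\}^{1/2} d\epsilon$ by the midpoint rule for the model $N(\epsilon) = \epsilon^{-w}$; taking the constant in the $O(\cdot)$ equal to 1 and restricting to $\delta \le 1$ are assumptions made for the example. In this model the integral over $[0, 1]$ equals $\sqrt{2(2w+1)}\,\Gamma(3/2)$, which gives a check on the numerical value.

```python
import math


def covering_integral(delta, w, const=1.0, steps=100_000):
    """Midpoint-rule value of int_0^delta {2 log(N(e)^2 / e)}^(1/2) de
    for the polynomial model N(e) = const * e**(-w), with delta <= 1."""
    h = delta / steps
    total = 0.0
    for i in range(steps):
        e = (i + 0.5) * h                      # midpoint of the i-th cell
        n_e = const * e ** (-w)                # model covering number
        total += math.sqrt(2.0 * math.log(n_e * n_e / e)) * h
    return total


w = 1.0
numeric = covering_integral(1.0, w)
closed_form = math.sqrt(2.0 * (2.0 * w + 1.0)) * math.sqrt(math.pi) / 2.0
print(numeric, closed_form)
```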
3. Limit theorems: the main results
This section includes the statements of the main results, i.e., the rates of convergence, uniform convergence, and the CLT for the empirical process of a stationary mixing sequence. The main results are Theorem 3.1 (the rates of convergence) and Theorem 3.6 (equicontinuity lemma). The proofs are left to Sections 4 and 5.
For the uniform convergence theorem, we first restrict ourselves to bounded index classes. We then find proper conditions to ensure that the law of large numbers holds for the envelope function, which is in turn the condition necessary to generalize the uniform convergence result to classes having a non-constant envelope function. Unfortunately, a constant envelope function is assumed for the CLT (Theorem 3.10) to hold, although I believe that if we truncate $X$ properly, this requirement is not necessary. Stronger mixing assumptions might be needed for the CLT to hold for classes with a non-constant envelope function.
For any $\alpha$-mixing rate constant $r_\alpha > 0$, recall that $r'_\alpha = \min(1, r_\alpha)$, and set $\delta_\alpha = 1$ if $r_\alpha = 1$ and $\delta_\alpha = 0$ otherwise.
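3.1 Theorem (Rates of convergence) Suppose that $X$ is a strictly stationary sequence. Assume the necessary measurability requirements are satisfied and let $\mathcal{F}_M$ be an index class with constant envelope function $M$. Suppose further that there exist integer pairs $(a_n, \mu_n)$, where $\mu_n = [n/2a_n]$, such that for any $b_n = O(1)$, we have $(\log n)^{-\delta_\alpha} a_n^{r'_\alpha} \mu_n b_n^2 \to \infty$, $\mu_n \beta_{a_n} = o(1)$, and
$$\log N(\epsilon b_n, \rho_{1,n}, \mathcal{F}_M) = o_p\big( (\log n)^{-\delta_\alpha} a_n^{r'_\alpha} \mu_n b_n^2 \big). \qquad (3.1)$$
Then we have
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}_M} |P_n f - P f| > \epsilon b_n \Big) \to 0 \quad \text{as } n \to \infty. \quad □$$
See Section 4 for proof.
Remark: If we replace condition (3.1) by a similar $\beta$-mixing condition with $\delta_\beta$ and $r'_\beta$ similarly defined, Theorem 3.1 still holds.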
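3.2 Corollary Suppose that $\mathcal{F}$ is an index class satisfying (2.8), i.e.,
$$N(\epsilon, \rho_{1,n}, \mathcal{F}) = O(\epsilon^{-w}) \text{ for some } w > 0, \text{ as } \epsilon \to 0,$$
and that the necessary measurability requirements are assumed.
i) If $0 < r_\beta \le 1$, then for any $\delta_0$ such that $(1 - r_\beta)/(2(1 + r_\beta)) < \delta_0 < 1/2$ and any $\epsilon > 0$, we have
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}} |P_n f - P f| > \epsilon\, n^{\delta_0}/\sqrt{n} \Big) \to 0 \quad \text{as } n \to \infty,$$
i.e.,
$$\sup_{f \in \mathcal{F}} |P_n f - P f| = o_p\big( n^{\delta_0}/\sqrt{n} \big).$$
ii) If $r_\beta > 1$, then for $0 < \delta_0 < 1/2$ and $\epsilon > 0$, we have
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}} |P_n f - P f| > \epsilon\, n^{\delta_0}/\sqrt{n} \Big) \to 0 \quad \text{as } n \to \infty,$$
i.e.,
$$\sup_{f \in \mathcal{F}} |P_n f - P f| = o_p\big( n^{\delta_0}/\sqrt{n} \big).$$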
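iii) If $\theta_\beta < 1$, then for any $\delta_1 > 1$ and $\epsilon > 0$, we have
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}} |P_n f - P f| > \epsilon\, (\log n)^{\delta_1}/\sqrt{n} \Big) \to 0 \quad \text{as } n \to \infty,$$
i.e.,
$$\sup_{f \in \mathcal{F}} |P_n f - P f| = o_p\big( (\log n)^{\delta_1}/\sqrt{n} \big).$$
See Section 5 for the proof.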
Remark
1) As a straightforward consequence of Theorem 3.1, for a class $\mathcal{F}$ satisfying (2.8),
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}} |P_n f - P f| > \epsilon \Big) \to 0 \quad \text{as } n \to \infty,$$
provided that $r_\beta > 0$.
2) V-C classes satisfy (2.8), so Corollary 3.2 holds for V-C classes.
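The threshold $(1 - r_\beta)/(2(1 + r_\beta))$ in Corollary 3.2 i) can be motivated by a small exponent calculation. The sketch below is a heuristic under simplifying assumptions that are not in the text: block lengths $a_n = n^a$, rates $b_n = n^{\delta_0 - 1/2}$, one common polynomial rate $r = r_\alpha = r_\beta \le 1$, and log factors ignored. It then searches for an exponent $a$ satisfying both $\mu_n \beta_{a_n} \to 0$ and $a_n^{r} \mu_n b_n^2 \to \infty$.

```python
def delta0_threshold(r_beta):
    """Lower limit (1 - r) / (2 (1 + r)) for delta_0 when 0 < r <= 1."""
    return (1.0 - r_beta) / (2.0 * (1.0 + r_beta))


def feasible_block_exponent(delta0, r, grid=999):
    """Search a in (0, 1) with a_n = n**a and b_n = n**(delta0 - 1/2).

    Hypothetical simplifications: one common rate r = r_alpha = r_beta <= 1,
    log factors ignored, beta_k of exact order k**(-r).
    Returns some feasible a, or None if the grid contains none.
    """
    s = 0.5 - delta0
    for i in range(1, grid + 1):
        a = i / (grid + 1)
        decoupled = a * (1.0 + r) > 1.0            # mu_n * beta_{a_n} -> 0
        growing = a * r + 1.0 - a - 2.0 * s > 0.0  # a_n^r mu_n b_n^2 -> infinity
        if decoupled and growing:
            return a
    return None


r = 0.5
print(delta0_threshold(r))
print(feasible_block_exponent(0.20, r))   # delta_0 above the threshold
print(feasible_block_exponent(0.10, r))   # delta_0 below the threshold
```

A feasible block exponent exists exactly when $\delta_0$ exceeds the threshold, which is consistent with the range of $\delta_0$ allowed in part i).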
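3.3 Theorem (Uniform convergence for bounded families) Suppose that $X$ is a strictly stationary sequence. Assume the necessary measurability requirements are satisfied and let $\mathcal{F}_M$ be an index class with constant envelope function $M$. For any given $\epsilon > 0$, we have
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}_M} |P_n f - P f| > \epsilon \Big) \to 0 \quad \text{as } n \to \infty$$
under any of the following three conditions.
i) $r_\alpha = 1$ and $r_\beta > 0$ and
$$\log N(\epsilon, \rho_{1,n}, \mathcal{F}_M) = o_p(n/\log n). \qquad (3.2)$$
ii) $r_\alpha > 1$, $r_\beta > 0$, and
$$\log N(\epsilon, \rho_{1,n}, \mathcal{F}_M) = o_p(n). \qquad (3.3)$$
iii) For some positive constant $a_0 < 1$,
$$\log N(\epsilon, \rho_{1,n}, \mathcal{F}_M) = o_p(n^{a_0}), \qquad (3.4)$$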
and either 1) $0 < r_\alpha < a_0$ and $r_\beta > (a_0 - r_\alpha)/(1 - a_0)$, or 2) $1 > r_\alpha > a_0$ and $r_\beta > 0$. □
See Section 5 for proof.
3.4 Definition (metrically transitive) A strictly stationary sequence $Y = (Y_1, Y_2, \ldots)$ is called metrically transitive if all invariant sets of its shift transformation $T$ (i.e., $T Y_i = Y_{i+1}$) have probability 0 or 1. □
We refer to Doob (1953) for more discussion of this concept.
3.5 Theorem (Uniform convergence for general families) Consider an index class of functions $\mathcal{F}$ with general envelope function $F$ under the random entropy and mixing conditions of Theorem 3.3 plus one of the following constraints:
i) $(F(X_i),\ i = 1, 2, \ldots)$ with $F \in L^1(P)$ is metrically transitive.
ii) $A_\alpha(n, \delta) = \sum_{k=1}^{n} \alpha_k^{\delta/(2+\delta)} = o(n)$ and $F \in L^{2+\delta}(P)$ for some $\delta$ in $(0, 1]$.
iii) $r_\phi > 0$ and $F \in L^2(P)$.
We have
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}} |P_n f - P f| > \epsilon \Big) \to 0 \quad \text{as } n \to \infty. \quad □$$
See Section 5 for proof.
3.6 Theorem (Equicontinuity for a mixing sequence) Suppose that $X$ is a strictly stationary sequence and that the necessary measurability requirements are satisfied. Let $\mathcal{F}_M$ be a class of functions with constant envelope function $M$.
a) Suppose that the sequence is $\phi$-mixing with $A_\phi < \infty$ or $r_\phi > 2$, and the covering integral $J(\delta, \rho_{1/2}, \mathcal{F}_M)$ is finite for each $\delta > 0$, and there exist integer pairs $(a_n, \mu_n)$ with
l=[fl/2afl], aRnl411,,1 = o(l) such that
logN (,pl,n I, FM ) ° ( 42 ) for all T1 > GO, (3.S)
and
logN (T1 a. Pi,,3 FM ) = oP (a,,2) for all T1 > O. (36)
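Then, for each $\eta > 0$ and $\epsilon > 0$ there exists a $\delta > 0$ for which
$$\limsup_{n \to \infty} \mathbb{P}\Big( \sup_{[\delta]} |E_n(f - g)| > \eta \Big) < \epsilon,$$
where $[\delta] = \{ (f, g) : f, g \in \mathcal{F}_M,\ \rho_{1/2}(f, g) < \delta \}$.
b) Suppose that the sequence is $\beta$-mixing with $A_\alpha < \infty$ or $r_\alpha > 3$, and the covering integral $J(\delta, \rho_{1/3}, \mathcal{F}_M)$ is finite for each $\delta > 0$, and there exist integer pairs $(a_n, \mu_n)$ with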
1, =[nI2aa n1/6 va <n3/10, RJ. ia = o(l) such that
logN( . P , FM ) = Op ( ) forall T1 >O, (3.7)
and
logN (f(a )'3/2 PI,n FM) = O0 (-S ) for all 11 > 0. (3.8)
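Then, for each $\eta > 0$ and $\epsilon > 0$ there exists a $\delta > 0$ for which
$$\limsup_{n \to \infty} \mathbb{P}\Big( \sup_{[\delta]} |E_n(f - g)| > \eta \Big) < \epsilon,$$
where $[\delta] = \{ (f, g) : f, g \in \mathcal{F}_M,\ \rho_{1/3}(f, g) < \delta \}$. □
See Section 4 for proof.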
To clarify the sense in which the empirical process indexed by $\mathcal{F}$ converges to the $P$-bridge, we introduce the space $C(\mathcal{F}, P) \subset \mathcal{X}$ of real bounded functionals on $\mathcal{F}$ (Pollard (1984)). For a detailed discussion and remarks on the relevant measurability problem, see Pollard (1984, Appendix C) and references cited therein.
3.7 Definition $C(\mathcal{F}, P)$ is the set of all functionals $x(\cdot)$ on $\mathcal{F}$ which are uniformly continuous with respect to the $L^1(P)$ seminorm $\rho_1$ on $\mathcal{F}$. That is, for each functional $x(\cdot)$ and $\epsilon > 0$ there should exist a $\delta > 0$ such that $|x(f) - x(g)| < \epsilon$ whenever $\rho_1(f, g) < \delta$. Define $\mathcal{B}^P$ as the smallest $\sigma$-field on $\mathcal{X}$ which contains all the closed balls with centers in $C(\mathcal{F}, P)$, and makes all the finite-dimensional projections measurable.
The following theorem is a slightly different version of Theorem 21 on page 157 of Pollard (1984). We cite it here without proof.
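3.8 Proposition Let $\mathcal{F}$ be a subset of $L^2(P)$ with envelope $F$ and let $X$ be a strictly stationary sequence. Suppose that the covering number $N(\epsilon, \rho_{1/2}, \mathcal{F})$ (or $N(\epsilon, \rho_{1/3}, \mathcal{F})$) is finite for all $\epsilon > 0$, and that
i) for any finite number of functions $f_1, \ldots, f_m$ in $\mathcal{F}$, $(E_n f_j : j = 1, 2, \ldots, m)$ converges weakly to the multivariate normal distribution with finite covariance matrix $(\sigma_{ij})$, where
$$\sigma_{ij} = \sigma_{f_i, f_j} = P f_i(X_1) f_j(X_1) + 2 \sum_{k=1}^{\infty} P f_i(X_1) f_j(X_{k+1})$$
$$= P f_i(X_1) f_j(X_1) + 2 \sum_{k=1}^{\infty} P f_j(X_1) f_i(X_{k+1});$$
ii) for each $\eta > 0$ and $\epsilon > 0$ there exists a $\delta > 0$ for which
$$\limsup_{n \to \infty} \mathbb{P}\Big( \sup_{[\delta]} |E_n(f - g)| > \eta \Big) < \epsilon,$$
where
$[\delta] = \{ (f, g) : f, g \in \mathcal{F} \text{ and } \rho_{1/2}(f, g) < \delta \}$ (or $[\delta] = \{ (f, g) : f, g \in \mathcal{F} \text{ and } \rho_{1/3}(f, g) < \delta \}$).
Then $E_n \to E_P$ in distribution as random elements of $\mathcal{X}$. The limit $P$-bridge process $E_P$ is a tight, gaussian random element of $\mathcal{X}$ whose sample paths all belong to $C(\mathcal{F}, P)$, and $E_P(f)$ has a normal distribution with mean zero and variance $\sigma_{f,f}$. □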
Remark:
1) Pollard defined his $C(\mathcal{F}, P)$ in terms of the $L^2(P)$ metric $\rho_2$. In our case, we should define it in terms of $d = \rho_{1/2}$ or $\rho_{1/3}$. However, since $\rho_{1/2}$ and $\rho_{1/3}$ are equivalent to $\rho_1$ in terms of generating the space in Definition 3.7, we take $\rho_1$ as the metric and note that the corresponding space is contained in the metric space defined in terms of the $L^2(P)$ metric when the envelope function of $\mathcal{F}$ is in $L^2(P)$.
2) The "pointwise boundedness" and "total boundedness" assumptions in Pollard (1984) are ensured by our envelope function assumption and our finite covering number assumption for the metrics $\rho_{1/2}$ or $\rho_{1/3}$. Therefore, although the $[\delta]$ here is defined in terms of the semi-metric $\rho_{1/2}$ or $\rho_{1/3}$ instead of the $L^2(P)$-metric $\rho_2(f, g) = (P|f - g|^2)^{1/2}$ as in Pollard (1984), it doesn't make any difference to the proof.
We say that the CLT holds for the empirical process indexed by $\mathcal{F}$ if the conclusion of Proposition 3.8 holds. Now let us cite a finite dimensional CLT from Dehling (1983).
3.9 Proposition (Dehling (1983)) Suppose that $X$ is a strictly stationary real (vector) sequence and that
i) $A_\phi < \infty$ (or $r_\phi > 2$), and finite second moments exist; or
ii) $A_\alpha < \infty$ (or $r_\alpha > 3$) and finite third moments exist.
Then the CLT holds for this sequence. □
3.10 Theorem (CLT for a mixing sequence) Let $\mathcal{F}_M$ be a class of functions with constant envelope function $M$ and let $X$ be a strictly stationary sequence such that the necessary measurability requirements are satisfied. Assume that the following random entropy condition holds for some $a_0$ in $(0, 1)$ and every $\epsilon > 0$:
$$\log N(\epsilon/\sqrt{n}, \rho_{1,n}, \mathcal{F}_M) = o_p(n^{a_0}).$$
Then, under either a) or b) below, the CLT holds for the empirical process indexed by $\mathcal{F}_M$.
3+2ora) the sequence is +-mixing with A <co (r*>2), r p> 12
a
ao0 1/4, and the covering
integal J (8,plp,FM ) is finite for each 8 >0 and
b) The seuec is f-mixing, A <o (r>3), rp>3a4 < 2/1 1and the covering
integralJ (8,pl3, FM ) is finite for each 8 >O. 0
See Section 5 for proof.
Remark: The upper bound requirement on $a_0$ is purely a result of the blocking technique we shall employ in the proof. A refined chaining argument might be able to bring the upper bound closer to 1/2, which is the condition used in the independent case (Le Cam (1983), also Pollard (1984)). Presumably, a stronger mixing condition would be needed to increase the upper bound. Note that even when $\theta_\phi < 1$ is assumed, Theorem 3.10 in the present form couldn't increase the upper bound on $a_0$ further from 1/4.
Observe that the two metric entropy conditions for both uniform convergence and the equicontinuity lemma are met by index classes satisfying (2.8), which correspond practically to $a_0 = 0$. Hence, by (b) of the above theorem, we have
3.11 Theorem If $\mathcal{F}_M$ is an index class satisfying (2.8) and $X$ is a strictly stationary sequence, then under either
i) $J(\delta, \rho_{1/2}, \mathcal{F}_M) < \infty$ for each $\delta > 0$, and $r_\phi > 2$ and $r_\beta > 3$; or
ii) $J(\delta, \rho_{1/3}, \mathcal{F}_M) < \infty$ for each $\delta > 0$, and $r_\alpha > 3$ and $r_\beta > 7/3$,
the CLT holds for the empirical processes indexed by $\mathcal{F}_M$. □
Since V-C classes satisfy (2.8), we get
3.12 Corollary If $\mathcal{F}_M$ is a V-C class, and $X$ is a strictly stationary sequence with $r_\alpha > 3$ and $r_\beta > 7/3$, then the CLT holds for the empirical processes indexed by $\mathcal{F}_M$. □
Remark In the case of an index class satisfying (2.8) where both $J(\delta, \rho_{1/2}, \mathcal{F}_M)$ and $J(\delta, \rho_{1/3}, \mathcal{F}_M)$ are finite, condition (a) in Theorem 3.10 requires $r_\phi > 2$ and $r_\beta > 3$, which implies $r_\alpha > 3$ and $r_\beta > 7/3$. Therefore, (b) gives stronger results. However, (a) covers the case when $a_0$ is between 2/11 and 1/4. Of course, this fine distinction on the exponent of the random metric entropy may be of little importance, since we don't know if there exists such a class with exponent in this range.
4. Proofs I
We include in this section the proofs of the key results of this paper (Theorem 3.1 and Theorem 3.6). First, I explain how a blocking technique enables us to get the exponential inequality, which is the key step in proving our results. Then the rigorous proofs are given in the form of a few lemmas after constructing an independent block (IB) sequence. Some useful inequalities relating the original sequence to the IB sequence are also presented here (Lemma 4.2).
After observing that all the results we want are either in terms of distributions, or can be interpreted in terms of probabilities, we construct an IB sequence from the original stationary mixing sequence such that the IB sequence is very close in distribution to the mixing sequence. We then transfer the problem to the IB sequence, to which the standard techniques of the independent case can be applied. Symmetrization is used for the IB sequence in the proofs of both Theorem 3.1 and Theorem 3.6. Moreover, in the proof of Theorem 3.6, we also need to use a restricted chaining argument which requires some exponential inequality. This inequality can be obtained for the IB sequence via Bennett's inequality only because of the independence of the IB sequence. If the mixing rate tends to zero fast enough, the same upper bound obtained using the techniques of the independent case, plus an $o(1)$ quantity, becomes the upper bound for the original mixing sequence.
We divide the $n$-sequence $X_n = (X_1, X_2, \ldots, X_n)$ into blocks of length $a_n$, one after the other. We eliminate every other block and work with the remaining odd blocks. Depending on the mixing and metric entropy conditions to be assumed, we choose $a_n$ large enough that the odd $a_n$-blocks are "almost" independent, but at the same time not so large that the odd $a_n$-blocks together fail to behave similarly to the original mixing sequence. Then we construct an independent sequence of blocks where each block has the same distribution as one of the $a_n$-blocks of the original sequence.
More precisely, for any integer pair $(a_n, \mu_n)$ with $\mu_n = [n/2a_n]$, we divide the strictly stationary $n$-sequence $X_n = (X_1, X_2, \ldots, X_n)$ into $2\mu_n$ blocks of length $a_n$ and a remainder block of length $n - 2\mu_n a_n$. Denote the indices in the blocks alternately by $H$'s and $T$'s, and denote the indices in the remainder block by $R$. These indices depend on $n$, but for simplicity we suppress $n$. That is,
$$H_1 = \{ i : 1 \le i \le a_n \},$$
$$T_1 = \{ i : a_n + 1 \le i \le 2a_n \}.$$
Generally, for $1 \le j \le \mu_n$,
$$H_j = \{ i : 2(j-1)a_n + 1 \le i \le (2j-1)a_n \},$$
$$T_j = \{ i : (2j-1)a_n + 1 \le i \le (2j)a_n \}.$$
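The index sets can be written out directly. The following minimal sketch (the function name `block_indices` is a hypothetical choice) returns the $H$-blocks, the $T$-blocks and the remainder block $R$ for given $n$ and $a_n$, using 1-based indices as in the definitions of $H_j$ and $T_j$.

```python
def block_indices(n, a_n):
    """Split {1, ..., n} into alternating H- and T-blocks of length a_n
    plus a remainder block R, following the definitions of H_j and T_j."""
    mu_n = n // (2 * a_n)                      # mu_n = [n / 2 a_n]
    h_blocks, t_blocks = [], []
    for j in range(1, mu_n + 1):
        h_start = 2 * (j - 1) * a_n + 1        # first index of H_j
        t_start = (2 * j - 1) * a_n + 1        # first index of T_j
        h_blocks.append(list(range(h_start, h_start + a_n)))
        t_blocks.append(list(range(t_start, t_start + a_n)))
    remainder = list(range(2 * mu_n * a_n + 1, n + 1))
    return h_blocks, t_blocks, remainder


h_blocks, t_blocks, remainder = block_indices(n=23, a_n=3)
print(h_blocks)
print(t_blocks)
print(remainder)
```

The remainder block always has fewer than $2a_n$ indices, which is why its contribution is negligible once $a_n$ is small relative to $n$.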
Denote the random variables that correspond to the $H_j$ and $T_j$ indices as
$$X(H_j) = \{ X_i,\ i \in H_j \},$$
$$X(T_j) = \{ X_i,\ i \in T_j \}.$$
Further, let the whole sequence of $H$-blocks be denoted by $X_{a_n} = \{ X(H_j) : j = 1, 2, \ldots, \mu_n \}$.
Now, we take a sequence of identically distributed independent blocks $\{ \Xi(H_j) : j = 1, \ldots, \mu_n \}$, where $\Xi(H_j) = \{ \xi_i : i \in H_j \}$, such that the sequence is independent of $X_n$ and each block has the same distribution as a block from the original sequence:
$$\text{distribution}(\Xi(H_j)) = \text{distribution}(X(H_j)) = \text{distribution}(X(H_1)).$$
We call this constructed sequence the independent block $a_n$-sequence (IB sequence). Denote the IB sequence by $\Xi_{a_n}$. Because of the mixing condition, we can relate $X_{a_n}$ and $\Xi_{a_n}$ in the following way.
4.1 Lemma Let the distributions of $X_{a_n}$ and $\Xi_{a_n}$ be $Q$ and $\tilde{Q}$ respectively. For any measurable function $h$ on $\mathbb{R}^{\mu_n a_n}$ with bound $M$,
$$|Q h(X_{a_n}) - \tilde{Q} h(\Xi_{a_n})| \le M (\mu_n - 1) \beta_{a_n}.$$
Proof: This is a direct application of Corollary 2.8. In the corollary, take $Q$ = the probability distribution of the $a_n$-sequence with $\Omega_j = \mathbb{R}^{a_n}$, $\Sigma_j$ = the product Borel $\sigma$-field on $\mathbb{R}^{a_n}$, and $m = \mu_n$. Then $P$ in the corollary equals the probability distribution of the IB $a_n$-sequence, i.e., $\tilde{Q}$. Notice that $\beta(Q) \le \beta_{a_n}$. □
Remark:
1) This is the key lemma, and it is used throughout the subsequent proofs. Different functions $h$ are used in the applications of this lemma; in particular, $h$ is often taken to be an indicator function.
2) The $\beta$-mixing (or $\phi$-mixing) condition is required for both the uniform convergence result and the CLT, because in our approach this lemma is crucial in connecting the original sequence with the IB sequence. We are not able to obtain this lemma under $\alpha$-mixing conditions. If Lemma 4.1 held for $\alpha$-mixing, all the main results could then be obtained under $\alpha$-mixing conditions, which are weaker than the ones we use now.
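When the law of a single block can be simulated, an IB sequence is easy to generate: each block is drawn afresh from the stationary law, independently of the others. The sketch below does this for a stationary two-state Markov chain; the chain, its transition probabilities `p` and `q`, and the function names are hypothetical choices made for illustration and are not part of the construction in the text, which is purely distributional.

```python
import random


def stationary_block(p, q, length, rng):
    """One block of a stationary two-state chain with P(0->1)=p, P(1->0)=q."""
    x = 1 if rng.random() < p / (p + q) else 0      # draw X_1 from pi
    block = [x]
    for _ in range(length - 1):
        if x == 0:
            x = 1 if rng.random() < p else 0
        else:
            x = 0 if rng.random() < q else 1
        block.append(x)
    return block


def independent_blocks(p, q, a_n, mu_n, rng):
    """mu_n independent blocks, each distributed like X(H_1)."""
    return [stationary_block(p, q, a_n, rng) for _ in range(mu_n)]


rng = random.Random(1)
ib = independent_blocks(p=0.3, q=0.2, a_n=50, mu_n=200, rng=rng)
mean = sum(sum(b) for b in ib) / (50 * 200)
print(len(ib), len(ib[0]), mean)
```

Restarting every block from the stationary law is what makes the blocks independent while keeping the within-block dependence of the original chain.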
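Recall that the index class with an envelope function $F$ is denoted by $\mathcal{F}$. For simplicity, we assume $Pf = 0$ for all $f$ in $\mathcal{F}$. Then the empirical measure on $\{ f(X_i) : i = 1, 2, \ldots, n \}$ is
$$P_n f = \frac{1}{n} \sum_{i=1}^{n} f(X_i).$$
Correspondingly, the empirical $P$-bridge is
$$E_n f = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} f(X_i).$$
For the original sequence $X$, we write
$$Y_{j,f}(X_{a_n}) = \sum_{i \in H_j} f(X_i) \quad \text{and} \quad Y'_{j,f}(X'_{a_n}) = \sum_{i \in T_j} f(X_i).$$
For the constructed IB sequence $\Xi_{a_n}$, define
$$Z_{j,f}(\Xi_{a_n}) = \sum_{i \in H_j} f(\xi_i)$$
and denote
$$\tilde{P}_{\mu_n} f = \frac{1}{n} \sum_{j=1}^{\mu_n} Z_{j,f}.$$
Associated with this empirical measure (note that it is not a probability measure if $a_n > 0$) are two random semi-metrics
$$\bar{\rho}_{1,\mu_n}(f, g) = \frac{1}{n} \sum_{j=1}^{\mu_n} |Y_{j,f-g}|$$
and
$$\tilde{\rho}_{1,\mu_n}(f, g) = \frac{1}{n} \sum_{j=1}^{\mu_n} |Z_{j,f-g}|.$$
We shall compare these two semi-metrics with the $L^1$ empirical metric
$$\rho_{1,n}(f, g) = P_n |f - g|.$$
Similarly, we have the pseudo empirical process for the IB sequence
$$\tilde{E}_{\mu_n} f = \frac{1}{\sqrt{n}} \sum_{j=1}^{\mu_n} Z_{j,f}.$$
The $L^1$ random covering numbers corresponding to $\rho_{1,n}$, $\bar{\rho}_{1,\mu_n}$ and $\tilde{\rho}_{1,\mu_n}$ are denoted by
$N(\epsilon, \rho_{1,n}, \mathcal{F})$, $N(\epsilon, \bar{\rho}_{1,\mu_n}, \mathcal{F})$, and $N(\epsilon, \tilde{\rho}_{1,\mu_n}, \mathcal{F})$,
respectively.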
For simplicity, from now on, $(\mu_n, a_n)$ is always an integer pair satisfying $n/2 - a_n \le \mu_n a_n \le n/2$. Therefore, $\mu_n \to \infty$, $a_n \to \infty$, $a_n = o(n)$, and $\mu_n a_n = O(n)$. Moreover, the IB sequence is implicitly assumed to be defined in terms of a pair of integers $(\mu_n, a_n)$.
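The trade-off in choosing $a_n$ can be seen by tabulating $\mu_n \beta_{a_n}$, the quantity that measures the cost of passing to independent blocks. The sketch below uses two hypothetical mixing rates, $\beta_k = 2^{-k}$ and $\beta_k = k^{-3}$, and two hypothetical block length rules (logarithmic and $n^{0.4}$); none of these specific choices comes from the text.

```python
import math


def block_plan(n, a_n):
    """Return (mu_n, remainder length) for block length a_n, mu_n = [n / 2 a_n]."""
    mu_n = n // (2 * a_n)
    return mu_n, n - 2 * mu_n * a_n


def coupling_error(n, a_n, beta):
    """The quantity mu_n * beta(a_n) that controls the cost of blocking."""
    mu_n, _ = block_plan(n, a_n)
    return mu_n * beta(a_n)


def geometric_beta(k):        # hypothetical rate beta_k = 2**(-k)
    return 0.5 ** k


def polynomial_beta(k):       # hypothetical rate beta_k = k**(-3)
    return float(k) ** -3


for n in (10**3, 10**4, 10**5):
    a_geo = math.ceil(2 * math.log2(n))      # logarithmic block length
    a_pol = math.ceil(n ** 0.4)              # polynomial block length
    print(n, coupling_error(n, a_geo, geometric_beta),
          coupling_error(n, a_pol, polynomial_beta))
```

With an exponentially decaying rate, blocks of logarithmic length already make $\mu_n \beta_{a_n}$ vanish, whereas a polynomial rate requires polynomially long blocks.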
The following lemma allows us to replace $P_n$ by $\tilde{P}_{\mu_n}$ with only an error of order $\mu_n \beta_{a_n}$.
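4.2 Lemma Suppose that $F = M$, and $b_n = O(1)$ as $n \to \infty$.
i) If $\mu_n b_n \to \infty$, then
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}} |P_n f| > \epsilon b_n \Big) \le 2\, \mathbb{P}\Big( \sup_{f \in \mathcal{F}} |\tilde{P}_{\mu_n} f| > \frac{\epsilon b_n}{4} \Big) + 2 \mu_n \beta_{a_n}.$$
ii) If $a_n = o(\sqrt{n})$, then
$$\mathbb{P}\Big( \sup_{f, g \in \mathcal{F}_\delta} |E_n(f - g)| > \eta \Big) \le 2\, \mathbb{P}\Big( \sup_{f, g \in \mathcal{F}_\delta} |\tilde{E}_{\mu_n}(f - g)| > \frac{\eta}{4} \Big) + 2 \mu_n \beta_{a_n},$$
where $\mathcal{F}_\delta$ is any subset of $\mathcal{F}$.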
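Proof.
i) Note that the sum of $f$ over the remainder block $R$ is uniformly bounded by $M (2a_n) n^{-1} = O(\mu_n^{-1})$, which tends to zero faster than $b_n$ since $\mu_n b_n \to \infty$, and that $X_{a_n}$ has the same distribution as $X'_{a_n} = \{ X_i : i \in T_j \text{ for } 1 \le j \le \mu_n \}$. Therefore, for $n$ sufficiently large, we have
$$\mathbb{P}\Big( \sup_{f \in \mathcal{F}} |P_n f| > \epsilon b_n \Big) \le \mathbb{P}\Big( \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n}) + \frac{1}{n} \sum_{j=1}^{\mu_n} Y'_{j,f}(X'_{a_n}) \Big| > \frac{\epsilon b_n}{2} \Big)$$
$$\le \mathbb{P}\Big( \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n}) \Big| > \frac{\epsilon b_n}{4} \Big) + \mathbb{P}\Big( \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{j=1}^{\mu_n} Y'_{j,f}(X'_{a_n}) \Big| > \frac{\epsilon b_n}{4} \Big)$$
$$= 2\, \mathbb{P}\Big( \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n}) \Big| > \frac{\epsilon b_n}{4} \Big). \qquad (4.4)$$
Taking for $h$ the indicator function of the event
$$\Big\{ \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{j=1}^{\mu_n} Y_{j,f}(X_{a_n}) \Big| > \frac{\epsilon b_n}{4} \Big\},$$
Lemma 4.1 gives the following bound on the left-hand side of (4.4):
$$2\, \mathbb{P}\Big( \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{j=1}^{\mu_n} Z_{j,f}(\Xi_{a_n}) \Big| > \frac{\epsilon b_n}{4} \Big) + 2 \mu_n \beta_{a_n}.$$
ii) This is similar to i). Note that $a_n = o(\sqrt{n})$ is assumed instead of $a_n = o(n)$ to eliminate the remainder block, which is bounded by $2 a_n M/\sqrt{n}$. □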
Since the IB sequence consists of i.i.d. blocks, we can use the standard symmetrization technique of the independent case to get the uniform convergence (rates of convergence) for the pseudo-empirical measure $\tilde{P}_{\mu_n}$ of the IB sequence. The entropy conditions needed will be in terms of the IB sequence. Therefore, we need to relate the entropy conditions on the original sequence to those on the IB sequence. Because the entropy conditions are random, i.e., they can be stated in terms of probability, we can easily transfer the entropy condition on the original sequence to the IB sequence by Lemma 4.1. Recall that $r'_\alpha = \min(r_\alpha, 1)$, and $\delta_\alpha = 1$ if $r_\alpha = 1$, and $\delta_\alpha = 0$ otherwise.
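4.3 Lemma If $\mu_n \beta_{a_n} = o(1)$, then for $n$ sufficiently large, the following hold.
i) For any constant sequence $b_n$, if $\log N(\epsilon, \rho_{1,n}, \mathcal{F}) = o_p(b_n)$, then
$$\log N(\epsilon, \tilde{\rho}_{1,\mu_n}, \mathcal{F}) = o_p(b_n). \qquad (4.5)$$
ii) If $F = M$ and $r_\alpha > 0$, $b_n = O(1)$ and $(\log n)^{-\delta_\alpha} a_n^{r'_\alpha} \mu_n b_n^2 \to \infty$, then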
P(SUPf F IPg f I >.ebn Ia
e 2bexp4
- 2b, a, (log n) +logN,( 8 F)}. (4.6)
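iii) Under the assumptions in ii), assume further that
$$\log N(\varepsilon b_n, \rho_{1,n}, F) = o_p\big((\log n)^{-\delta_\alpha} a_n^{r'_\alpha}\mu_n b_n^2\big); \qquad (4.7)$$
then, for any $\varepsilon > 0$ and n sufficiently large,
$$P\Big(\sup_{f\in F} |P_n f| > \varepsilon b_n\Big) < \varepsilon.$$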
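Proof: i) By the triangle inequality, we have
$$\tilde\rho_{1,\mu_n}(f,g) = \frac1n\sum_j |Z_{j,f-g}| \le \frac1n\sum_i |f(X_i) - g(X_i)| \le \rho_{1,n}(f,g).$$
Thus,
$$N(\varepsilon, \tilde\rho_{1,\mu_n}, F) \le N(\varepsilon, \rho_{1,n}, F).$$
This together with the assumption in i) implies
$$\log N(\varepsilon, \tilde\rho_{1,\mu_n}, F) = o_p(b_n)$$
for the blocks of the original sequence. Then, taking h in Lemma 4.1 to be the indicator function of the event
$\{\log N(\varepsilon, \tilde\rho_{1,\mu_n}, F)/b_n > \varepsilon\}$, we obtain the conclusion of (i).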
ii) This can be obtained by the standard symmetrization technique for the i.i.d. case, since
the $Z_{j,f}$'s are i.i.d. for any fixed f in $F_M$, and bounded by $Ma_n$.
By Chebyshev's inequality and Corollary 2.5, there is a constant C such that
$$\mathrm{var}(Z_{j,f}) \le C M^2 a_n^{2-r'_\alpha}(\log n)^{\delta_\alpha}.$$
Therefore,
P ( I Pp,, f I >e bn I a)
22 CM2 (log )<1-b =1/2
e2b,, n (logn)aGa. p, b 2
since $(\log n)^{-\delta_\alpha} a_n^{r'_\alpha}\mu_n b_n^2 \to \infty$ as $n \to \infty$.
Using the same arguments as on pages 14-15 of Pollard (1984), we get (4.6).
iii) This is straightforward from ii) and the symmetrization lemma in Pollard (1984). □
Proof of Theorem 3.1 (rates of convergence):
Combining (i) of Lemma 4.2, and (i) and (iii) of Lemma 4.3, we obtain Theorem 3.1. □
Now we turn to the proof of Theorem 3.6, the equicontinuity lemma. We denote as
$F_M$ our index class with a constant envelope function M and define the empirical process
for the IB sequence as
$$\tilde E_{\mu_n} f = \frac{1}{\sqrt n}\sum_{j=1}^{\mu_n}\big(Z_{j,f} - a_n Pf\big) = \frac{1}{\sqrt n}\sum_{j=1}^{\mu_n} Z_{j,f},$$
since $Pf = 0$ is assumed.
Recall that there are two semi-metrics related to $\rho_1(f,g) = P|f-g|$. They are
$\rho_{1/2}(f,g) = \sqrt{P|f-g|}$ and $\rho_{1/3}(f,g) = (P|f-g|)^{1/3}$. Their corresponding covering
integrals are $J(\gamma, \rho_{1/2}, F)$ and $J(\gamma, \rho_{1/3}, F)$. $\rho_{1/2}$ and $\rho_{1/3}$ are used because they are
closely related to the $L^1$-empirical random semi-metric $\rho_{1,n}(f,g) = P_n|f-g|$, in terms of
which the original sequence and the IB sequence behave similarly.
Besides symmetrization, we need a restricted chaining argument used by Le Cam
(Pollard (1984)) to chain down the process $\tilde E_{\mu_n}$ to a link of size $O(a_n/\sqrt n)$ (Lemma 4.5).
The link is measured in terms of the square-root metric $\rho_{1/2}$ or the cube-root metric $\rho_{1/3}$,
depending on the type of mixing conditions assumed. The other chaining argument used
by Pollard (1984) will not work here since we are not able to show the uniform convergence
of the sum of squares of functions on the blocks to its expectation. However, in the
independent case where the "block" contains only one observation, this convergence is a
trivial application of the uniform convergence result for a different index class
$\{(f-g)^2 : f, g \in F\}$.
Next we cite the restricted chaining lemma from Pollard (1984). Although Pollard
(1984) used the L2(P) metric, his argument is general enough to cover our case.
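4.4 Lemma (Restricted Chaining) Let $\{Z(t) : t \in T\}$ be a stochastic process that satisfies
the exponential inequality
$$P(|Z(t) - Z(s)| > \eta) \le 2\exp\Big(-\frac{\eta^2}{2D^2\delta^2}\Big) \quad \text{if } d(s,t) < \delta \qquad (4.8)$$
for every $\eta > 0$ and $\delta > 0$ with $\delta \ge \alpha\sqrt{\eta}$, for some constant $\alpha$.
Suppose that T has a finite covering integral $J(\cdot)$. Let $T(\alpha)$ be an $\alpha$-net (containing $N(\alpha)$
points): $T(\alpha) = \{t_1, t_2, \ldots, t_N : \min_{i \le N} d(t, t_i) < \alpha \text{ for all } t \in T\}$; let $t_\alpha$ be the closest point
in $T(\alpha)$ to t; and let $[\delta]$ denote the set of pairs (s,t) with $d(s,t) < \delta$. Given $\varepsilon > 0$ and $\gamma > 0$,
there exists a $\delta > 0$, depending on $\varepsilon$, $\gamma$ and $J(\cdot)$, for which
$$P\Big(\sup_{[\delta]} |Z(t) - Z(s)| > 5\gamma\Big) \le 2\varepsilon + P\Big(\sup_T |Z(t) - Z(t_\alpha)| > \gamma\Big) \qquad (4.9)$$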
provided aS-5- £, and yS 144 and J(a)5min(--, 1 .3 12D' D
4.5 Lemma Suppose that the covering integral $J(\delta, d, F_M)$ corresponding to a metric d on
$F_M$ is finite, and that there are integer pairs $(a_n, \mu_n)$ such that $a_n\mu_n = O(n)$. Assume that
the IB sequence satisfies
$$\mathrm{Var}\Big(\sum_{j=1}^{\mu_n} Z_{j,f-g}\Big) \le A M n\, d^2(f,g). \qquad (4.10)$$
Then, for any given $\varepsilon > 0$, $\eta > 0$, and sufficiently large n,
$$P\Big(\sup_{[\delta]} |\tilde E_{\mu_n} h| > 5\eta\Big) \le 2\varepsilon + P\Big(\sup_{H_n(d)} |\tilde E_{\mu_n} h| > \eta\Big), \qquad (4.11)$$
where $H_n(d) = \{f - g : f, g \in F_M \text{ and } d^2(f,g) \le K a_n/\sqrt n\}$ and $K = 2/(A\,B^{-1}(1/2))$.
Proof: For any fixed n, by the restricted chaining lemma (Lemma 4.4), we only have to
show that condition (4.8) holds for our process $\tilde E_{\mu_n}$ with index set $T = F_M$, for every positive
$\delta$ and $\eta$ such that $\delta \ge \alpha\sqrt{\eta}$ with $\alpha^2 = K a_n/\sqrt n$ and $K = 2/(A\,B^{-1}(1/2))$.
Let $h = f - g$, and note that $|h| \le 2M$ and $Ph = 0$. Moreover, for any fixed f and g, the $Z_{j,h}$ are i.i.d. for $j = 1, \ldots, \mu_n$, and bounded by $2a_nM$.
By Bennett's inequality, for every $\delta$ and $\eta$ as above,
$$P(|\tilde E_{\mu_n} h| > \eta) = P\Big(\Big|\sum_{j=1}^{\mu_n} Z_{j,f-g}\Big| > \eta\sqrt n\Big) \le 2\exp\Big(-\frac12\,\frac{\eta^2 n}{nAM\,d^2(f,g)}\,B\Big(\frac{2a_nM\,\eta\sqrt n}{nAM\,d^2(f,g)}\Big)\Big), \qquad (4.12)$$
where $B(x) = 2x^{-2}[(1+x)\log(1+x) - x]$ for $x > 0$ is a decreasing function of x with range
(0,1).
If we restrict f and g to be close, i.e., $d^2(f,g) < \delta^2$, then the left hand side of (4.12) is
bounded by
$$2\exp\Big(-\frac{\eta^2}{2AM\delta^2}\,B\Big(\frac{2a_n\eta}{A\delta^2\sqrt n}\Big)\Big) \le 2\exp\Big(-\frac{\eta^2}{4AM\delta^2}\Big),$$
since $2a_n\eta/(A\delta^2\sqrt n) \le B^{-1}(1/2)$, that is, $\delta^2 \ge \alpha^2\eta$ for $\alpha^2 = K a_n/\sqrt n$. This is (4.8) with $D^2 = 2AM$. By
Lemma 4.4, the proof is complete. □
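The function B from Bennett's inequality and the constant $B^{-1}(1/2)$ that enters K can be evaluated numerically. The sketch below is only an illustration: the function names and the bisection bracket are assumptions, and the inverse is computed by plain bisection using the fact, stated above, that B is decreasing.

```python
import math


def bennett_B(x):
    """B(x) = 2 x^{-2} [(1 + x) log(1 + x) - x] for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return 2.0 * ((1.0 + x) * math.log1p(x) - x) / (x * x)


def bennett_B_inverse(y, lo=1e-4, hi=1e6, iters=200):
    """Solve B(x) = y by bisection on [lo, hi], using that B is decreasing."""
    if not (bennett_B(hi) < y < bennett_B(lo)):
        raise ValueError("y is outside the bracketed range of B")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bennett_B(mid) > y:
            lo = mid        # B(mid) too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_half = bennett_B_inverse(0.5)          # numerical value of B^{-1}(1/2)
grid = [0.01 * k for k in range(1, 2001)]
values = [bennett_B(x) for x in grid]
```

On the grid the values stay strictly between 0 and 1 and decrease, in line with the stated range (0,1); `x_half` is the threshold at which the Bennett exponent loses at most a factor 1/2 relative to the Gaussian-type exponent.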
4.6 Corollary
i) If $J(\gamma, \rho_{1/2}, F_M) < \infty$ and $A_* < \infty$ ($r_* > 2$), then (4.11) holds for $d = \rho_{1/2}$.
ii) If $J(\gamma, \rho_{1/3}, F_M) < \infty$ and $A_\alpha < \infty$ ($r_\alpha > 3$), then (4.11) holds for $d = \rho_{1/3}$.
Proof: We only need to check condition (4.10).
i) When $A_* < \infty$, (4.10) holds for $d = \rho_{1/2}$ by (2.5) of Lemma 2.4.
ii) When $A_\alpha < \infty$, (4.10) holds for $d = \rho_{1/3}$ by (2.4) of Lemma 2.4. □
Now the task for proving the equicontinuity lemma becomes that of bounding the tail
probability of the supremum of $\tilde E_{\mu_n}$ over a smaller neighborhood $H_n(d)$. We will replace
this neighborhood of size $a_n/\sqrt n$ by a random neighborhood $H'_n(d)$ (where d is either
$\rho_{1/2}$ or $\rho_{1/3}$), since on this random neighborhood the symmetrization technique can be used
to obtain a bound on the tail probability.
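Depending on the metric we choose, the neighborhood $H_n(d)$ will have a different
size in terms of $\rho_1$. That is,
$$H_n(\rho_{1/2}) = \{f-g : f, g \in F_M \text{ and } \rho_{1/2}^2(f,g) < O(a_n/\sqrt n)\}$$
$$= \{f-g : f, g \in F_M \text{ and } \rho_1(f,g) < O(a_n/\sqrt n)\},$$
and the corresponding random neighborhood is
$$H'_n(\rho_{1/2}) = \{f-g : f, g \in F_M \text{ and } \tilde\rho_{1,\mu_n}(f,g) < O(a_n/\sqrt n)\}.$$
On the other hand,
$$H_n(\rho_{1/3}) = \{f-g : f, g \in F_M \text{ and } \rho_{1/3}^2(f,g) < O(a_n/\sqrt n)\}$$
$$= \{f-g : f, g \in F_M \text{ and } \rho_1(f,g) < O((a_n/\sqrt n)^{3/2})\},$$
with the corresponding random neighborhood
$$H'_n(\rho_{1/3}) = \{f-g : f, g \in F_M \text{ and } \tilde\rho_{1,\mu_n}(f,g) < O((a_n/\sqrt n)^{3/2})\}.$$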
Lemma 4.7 and its corollary (4.8) below make sure that the replacement of $H_n(d)$ by
$H'_n(d)$ (where $d = \rho_{1/2}$ or $\rho_{1/3}$) is legitimate.
4.7 Lemma Suppose the strong mixing rate constant $r_\alpha > 1$, and let
$H_M = \{h : h = |f-g|,\ f, g \in F_M\}$.
i) If there are integer pairs $(a_n, \mu_n)$ such that $a_n \to \infty$, $a_n\mu_n = O(n)$ and $\mu_n\beta_{a_n} = o(1)$,
and the random entropy satisfies
$$\log N\Big(\varepsilon\frac{a_n}{\sqrt n}, \rho_{1,n}, F_M\Big) = o_p(a_n^2),$$
then, as $n \to \infty$, we have
P(suPHM IP,h a,,
ii) If there are integer pairs (a,,, .,,) such that a,,2/4I -oco, a,, p.,, = 0(n) and P, =
o(l), and the random entropy satisfies
logN ((; )3/2. PlM, FM )=o; (;;-)9
then
P(supH,. I P 1,p. h I > £(V; )112) _+ O.
Proof: We use Lemma 4.3 for $H_M$. For $r_\alpha > 1$, it is sufficient to show that
$$\log N(\varepsilon b_n, \rho_{1,n}, H_M) = o_p(n b_n^2), \qquad (4.13)$$
where $b_n = a_n/\sqrt n$ for (i) and $b_n = (a_n/\sqrt n)^{3/2}$ for (ii).
Observe that the covering number of $H_M$, i.e. $N(2\varepsilon, \rho_{1,n}, H_M)$, is bounded by the square
of the covering number of $F_M$, i.e. $N(\varepsilon, \rho_{1,n}, F_M)$. The last statement can be seen to be
true because, for any $\varepsilon > 0$ and any f, g in $F_M$, there are $f_1$ and $g_1$ in an $\varepsilon$-net of $F_M$ such that
$\rho_{1,n}(f, f_1) < \varepsilon$ and $\rho_{1,n}(g, g_1) < \varepsilon$. Writing $h = f - g$ and $h_1 = f_1 - g_1$, so that $|h|$ and $|h_1|$ belong to $H_M$, note that
$$\big||h| - |h_1|\big| \le |h - h_1|$$
and
$$|h - h_1| = |f - g - f_1 + g_1| \le |f - f_1| + |g - g_1|.$$
Therefore, the log of the covering number of $H_M$ is bounded by 2 times the log of the
covering number of $F_M$.
i) Take $b_n = a_n/\sqrt n$, then (4.13) is satisfied by assumption (3.5).
ii) Take $b_n = (a_n/\sqrt n)^{3/2}$, then (4.13) is satisfied by assumption (3.7). □
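The covering-number step in the proof above (a net for $F_M$ of radius $\varepsilon$ yields, through pairs, a net for $H_M$ of radius $2\varepsilon$) can be checked on a toy example. The sketch below is an illustration under stated assumptions: the index class is a hypothetical family of threshold indicators, functions are represented by their values on a simulated sample, distances are empirical $L^1$ distances, and the net is built greedily rather than optimally.

```python
import random


def l1_dist(u, v):
    """Empirical L1 distance between two functions given by their sample values."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def abs_diff(u, v):
    """Sample values of |f - g|."""
    return [abs(a - b) for a, b in zip(u, v)]


def greedy_net(points, eps):
    """Greedy eps-net: every point ends up strictly within eps of some center."""
    centers = []
    for p in points:
        if all(l1_dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


def nearest(p, centers):
    return min(centers, key=lambda c: l1_dist(p, c))


random.seed(0)
sample = [random.random() for _ in range(50)]
# Hypothetical index class: indicators of (-inf, c] for thresholds c on a grid.
family = [[1.0 if x <= k / 40.0 else 0.0 for x in sample] for k in range(41)]

eps = 0.1
centers = greedy_net(family, eps)
pair_net = [abs_diff(c1, c2) for c1 in centers for c2 in centers]

# Largest distance from any |f - g| to the pair built from the nearest centers.
worst_gap = max(
    l1_dist(abs_diff(f, g), abs_diff(nearest(f, centers), nearest(g, centers)))
    for f in family for g in family
)
```

The candidate net for $H_M$ has exactly $N^2$ elements when the net for $F_M$ has N, which is the "square of the covering number" bound; `worst_gap` stays below $2\varepsilon$ by the two triangle inequalities in the proof.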
4.8 Corollary For any $\varepsilon > 0$ and $\eta > 0$:
i) Under the assumptions in Lemma 4.7 (i), we have
$$P\Big(\sup_{H_n(d)} |\tilde E_{\mu_n}(h)| > \eta\Big) \le P\Big(\sup_{H'_n(d)} |\tilde E_{\mu_n}(h)| > \eta\Big) + \varepsilon,$$
where $d = \rho_{1/2}$.
ii) Under the assumptions in Lemma 4.7 (ii), we have the conclusion of i) for $d = \rho_{1/3}$.
Proof: i) Let A,, = (supH1 (p IP ,h I >11 ) and B,, = (SuPHM IIh II<. T)en,
by lemma 4.7 (i), P (Bc ) - 0 as n tends to infinity. Hence, it is enough to show that, on
B,,, Hn (P1/2) is contained in H,,'(p1,2). For any h = f -g in H,, (P i?) and on B,,
pl(f,g)=P Ih I <K ^- and
P-,^ Ih I"=I n,(Zj,,I a a.
This implies
I n-l I zj,A I fn t,zj,lh I1 1
I+1+,, a,, n1P Ih I
<a.l n+1/2P hI
< (I +K /2) a. 14-n
=0 (a.,in).
ii) This is similar to i). □
Now let us deal with the random neighborhoods $H'_n(\rho_{1/2})$ and $H'_n(\rho_{1/3})$ by symmetrization.
For any Ti > 0, let t1,t2.t, tN be a S-net in of H.,'(d) (d = p11or Pin3)
and where N = N ( Fn pj ,,,Hn,(d)) By Hoeffding's inequality, we have
P( I I,) exp(- 8 n2/4 Zj). (4.14)n =
Zt 2 ""n8m
Since ti is on the --net of H,'(d), there is a h = f-g in H,,'(d) such that
Pl,jL (ti ,h )< G By the triangle inequality,
I Zjj 1: - I ZjJ -Z I + I Zjn J J n j=l n jail
=PI'P (ti,h )+Pj' (f ,g)
ll + a,1l
0 (a. 14 ) if a,, -4 o .
Therefore, Z anM- ;IZi, I = 0(4n a2).j-l n j=l
Combining the last bound with (4.13), we have
psm tzi argument.i Poalad(95expae 12)),
By the symmetrization argument in Pollard (1984, page 15),
;An 1- 2
P(SU4PH.'@@(Po I i} 1,2 _
SN2in-piAngHti(plf2))maxiP(lI; ajZj,,,ltI .,>
<N( I,;,,s, H.,*(pj7))excp(-O( 4)a
.exp(logN ( 1,Pg, HM )-O( 2T))
Sexp(21ogN( 2vin Plg FM )°( a))
That is
P ( SUPH4 (P,,I2 ,; ZCi 2jf1>aI,j=1
Sexp(2logN (- FM )-°( )) (4.15)
Similarly, for d = p 1/3 when a,, n lt, we get
1: (Y Z
2 "a~~~~L5~~~~~/
.exp(logN ( 2 p;;P,, H.'(pl/3)-O ( 52)
2exp(logN(IV n3(.16S2enc(log (2 1,,Plm FM )-O( s ) ). (4. 16)
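The Hoeffding step invoked for (4.14) can be illustrated for a sum of fixed numbers with independent random signs. The sketch below does not reproduce the normalization or constants of (4.14); it assumes the standard two-sided form $P(|\sum_j \sigma_j z_j| > t) \le 2\exp(-t^2/(2\sum_j z_j^2))$ and compares it with the exact tail obtained by enumerating all sign patterns. The numbers `z` are hypothetical stand-ins for block sums.

```python
import itertools
import math


def exact_sign_tail(z, t):
    """Exact P(|sum_j s_j z_j| > t) for i.i.d. uniform signs s_j in {-1, +1}."""
    count = 0
    total = 0
    for signs in itertools.product((-1, 1), repeat=len(z)):
        total += 1
        if abs(sum(s * v for s, v in zip(signs, z))) > t:
            count += 1
    return count / total


def hoeffding_bound(z, t):
    """Two-sided Hoeffding bound 2 exp(-t^2 / (2 sum z_j^2)) for signed sums."""
    return 2.0 * math.exp(-t * t / (2.0 * sum(v * v for v in z)))


z = [0.3, 1.2, 0.7, 2.0, 0.5, 1.1, 0.9, 0.4, 1.6, 0.8]   # hypothetical block sums
thresholds = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0]
comparison = [(t, exact_sign_tail(z, t), hoeffding_bound(z, t)) for t in thresholds]
```

Each triple in `comparison` holds a threshold, the exact tail probability and the bound; the bound depends on the data only through $\sum_j z_j^2$, which is why the argument above needs control of the block sums on the random neighborhood.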
Now we are ready to prove the equicontinuity lemma (Theorem 3.6) for $F_M$.
Proof of Theorem 3.6 (equicontinuity):
i) Under the assumption $a_n = o(n^{1/4}) = o(\sqrt n)$ and $\mu_n\beta_{a_n} = o(1)$, we take
$F_\delta = [\delta] = \{f - g : f, g \in F_M \text{ and } \rho_{1/2}(f,g) < \delta\}$ in Lemma 4.2 (ii). For any $\varepsilon > 0$, when n is
sufficiently large, we have
$$P\Big(\sup_{[\delta]} |E_n(f-g)| > \eta\Big) \le P\Big(\sup_{[\delta]} |\tilde E_{\mu_n}(f-g)| > \eta/2\Big) + \varepsilon.$$
Moreover, since $J(\gamma, \rho_{1/2}, F_M) < \infty$ and $A_* < \infty$ ($r_* > 2$), by Corollary 4.6 (i), we get
$$P\Big(\sup_{[\delta]} |\tilde E_{\mu_n}(f-g)| > \eta/2\Big) \le P\Big(\sup_{H_n(\rho_{1/2})} |\tilde E_{\mu_n}(f-g)| > \eta/10\Big) + \varepsilon.$$
Since the random entropy condition in Lemma 4.7 (i) is assumed (i.e., condition (3.6)), we
have
$$P\Big(\sup_{H_n(\rho_{1/2})} |\tilde E_{\mu_n}(f-g)| > \eta/10\Big) \le P\Big(\sup_{H'_n(\rho_{1/2})} |\tilde E_{\mu_n}(f-g)| > \eta/10\Big) + \varepsilon.$$
By (4.15) and assumption (3.5), for n large, on a set of probability $1 - \varepsilon$, we get
$$P\Big(\sup_{H'_n(\rho_{1/2})} |\tilde E_{\mu_n}(f-g)| > \eta/10 \,\Big|\, \Xi_{a_n}\Big) < \varepsilon.$$
Integrating out $\Xi_{a_n}$, we get
$$P\Big(\sup_{H'_n(\rho_{1/2})} |\tilde E_{\mu_n}(f-g)| > \eta/10\Big) < 2\varepsilon.$$
Putting all the above inequalities together, we finally get
$$P\Big(\sup_{[\delta]} |E_n(f-g)| > \eta\Big) < 5\varepsilon.$$
As for (ii), similarly, we can use Lemma 4.2 (i), Corollary 4.6 (ii), Lemma 4.7 (ii), (4.16)
and condition (3.7). □
5. Proof II
In this section we present the proofs of the results in section 3 except those of the
two main theorems. These results are direct consequences of the two main theorems by
choosing $\mu_n$ and $a_n$ optimally according to the more specific entropy conditions assumed.
The proofs are arranged in the same order as the results appear in section 3.
Proof of Corollary 3.2:
Note that logNI(c b,2,p ,,,,G )=O(log(n)) by Lemma 2.11 . If we take b,, =n +1/2 or
(logn)614N, the results follow from the remark after Theorem 3.1 and our hypothesis with
a,, = n and b,, =n1'+ for (i) and (ii), and a,, = 2 log(n), b, ( (@n for (iii). 0
Proof of Theorem 3.3 (uniform convergence for bounded families): In Theorem 3.1,
take $b_n \equiv 1$. The only task left is to find the optimal integer pair $(\mu_n, a_n)$ satisfying (3.1)
and $\mu_n\beta_{a_n} = o(1)$. We take $a_n$ as of order $n^x$. Then $\mu_n$ is of order $n^{1-x}$. Note that
$\mu_n\beta_{a_n} = o(1)$ is equivalent to $x > (1+r_\beta)^{-1}$. It suffices to show that (3.1) holds for (i), (ii)
and (iii).
i) Since $r_\alpha = 1$, we can take any x such that $(1+r_\beta)^{-1} < x < 1$. In this case, (3.1) is the
same as assumption (3.2).
ii) When $r_\alpha > 1$, choose x as in (i), but note that (3.1) is the same as assumption (3.3) in
this case.
iii) 1) When $0 < r_\alpha < \alpha_0$, we take $x = (1-\alpha_0)/(1-r_\alpha)$. This implies that $a_n^{r_\alpha}\mu_n = O(n^{r_\alpha x + 1 - x}) = O(n^{\alpha_0})$,
which means that (3.1) can be inferred from assumption (3.4).
2) In this case, (3.1) holds for any x in (0,1). Take x as in i) to ensure that $\mu_n\beta_{a_n} = o(1)$.
□
Before we prove the uniform convergence theorem for an index class with non-constant
envelope function, we first recall some results from Doob (1953) and from using
the Chebyshev inequality on the law of large numbers for strictly stationary sequences.
5.1 Proposition Suppose that X is strictly stationary with the stationary distribution P, and
that F is a measurable function. Then under any of the following three conditions, we
have
$$\frac1n\sum_{i=1}^{n} F(X_i) \to PF \quad \text{in probability.}$$
i) X is metrically transitive, and $F \in L^1(P)$.
ii) A&(n,8)= Z oij = o(n) and F e L2+(P ) for some 6 in (0, 1].k-1
iii) r#>OandFeL2(P). 0
Proof of Theorem 3.5 (Uniform convergence for general families):
Case a: Assume $F \equiv M$. This case is covered by Theorem 3.1 if we take $b_n \equiv 1$.
Case b: For any given M, take $F_M = \{f_M = f\,I(F \le M) : f \in G\}$. Then,
$$|P_n f - Pf| \le |P_n f_M - Pf_M| + P_n(F\,I(F > M)) + P(F\,I(F > M)).$$
This implies
$$\sup_F |P_n f - Pf| \le \sup_{F_M} |P_n f_M - Pf_M| + P_n(F\,I(F > M)) + P(F\,I(F > M)).$$
For any fixed $\varepsilon > 0$, take M such that $P(F\,I(F > M)) < \varepsilon$. Then by the assumption on F and
Proposition 5.1, the law of large numbers holds for $F\,I(F > M)$ with this M. Therefore, both
$P_n(F\,I(F > M))$ and $P(F\,I(F > M))$ can be bounded by $3P(F\,I(F > M))$ in probability when n is
sufficiently large. Note that for this fixed M the supremum over $F_M$ tends to 0 in probability
by part a) and the fact that the covering number of $F_M$ is bounded by that of F. □
Proof of Theorem 3.10 (CLT for a stationary mixing sequence):
By Proposition 3.9 and the mixing assumptions, the finite dimensional CLT holds for X.
According to Proposition 3.8, we only have to check the equicontinuity condition. By
Theorem 3.6 (equicontinuity), we need to find integer pairs $(\mu_n, a_n)$ that satisfy either
condition (i) or (ii) of Theorem 3.6, and are such that $\mu_n\beta_{a_n} = o(1)$.
Take $a_n = [n^x]$. Then $\mu_n\beta_{a_n} = o(1)$ is equivalent to $r_\beta > q(x) = 1/x - 1$.
a) Take x = 1/4-aa/2. Then/ la,, =nl~Z =n%. But a,,=n °wn° since a 1/4.
So the two random entropy conditions in Theorem 3.6 i) are implied by the random
entropy conditions assumed here. Further, ,, Pa =o(1), because r > 1- 2 ao
q( 114-a1o2) = q (x ) is assumed.
b) Since rp 7+4aO =(-3-2 ao) -1, by the continuity of function q(y) = y 1, for3-4ao 10 5
co0<2/11, there is a positive and small 8 such that r >q(3/10-2ca05 -6) = q(x) with x =
3/10- 2acrS-6, and
1/6< 1/6+ao/3<x a 632 8< 32 /10.105 10 5
So 3/4 - 5 x/2 > ao and 3 x - 1/2 > Cb Hence the two random entropy conditions in
Theorem 3.6 b) are implied by the random entropy condition assumed here. 0
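The role of the threshold $q(x) = 1/x - 1$ in the proof above can be illustrated numerically. The sketch below rests on assumptions that are not part of the proof: the mixing coefficient is taken to be exactly $\beta_k = k^{-r}$, the block length is $a_n = [n^x]$, and the number of block pairs is $\mu_n = [n/(2a_n)]$. The function names are hypothetical.

```python
def q(x):
    """Threshold for the mixing rate: mu_n * beta_{a_n} = o(1) needs r > q(x)."""
    return 1.0 / x - 1.0


def mu_beta(n, x, r):
    """mu_n * beta_{a_n} for a_n = [n^x], mu_n = [n / (2 a_n)], beta_k = k^(-r).

    The polynomial form of beta_k is an assumption made for illustration only.
    """
    a_n = max(1, int(n ** x + 1e-9))   # small offset guards against float rounding
    mu_n = n // (2 * a_n)
    return mu_n * a_n ** (-r)


x = 0.25                                # block length of order n^(1/4)
sizes = (10**4, 10**8, 10**12)
above = [mu_beta(n, x, 3.5) for n in sizes]   # r = 3.5 > q(0.25) = 3
below = [mu_beta(n, x, 2.5) for n in sizes]   # r = 2.5 < q(0.25) = 3
```

With these choices `above` shrinks as n grows while `below` grows, matching the equivalence between $\mu_n\beta_{a_n} = o(1)$ and $r_\beta > q(x)$; smaller x (shorter blocks) raises $q(x)$ and so demands a faster mixing rate.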
Acknowledgements: This research is based on part of the author's Ph.D. dissertation
submitted to the University of California at Berkeley. The author is deeply indebted to
Prof. Lucien Le Cam for suggesting the problem, many valuable discussions and
encouragement. The author also wants to express her thanks to Prof. Deborah Nolan for
many helpful discussions and suggestions for improvement, and to Prof. Terence Speed for
commenting on the drafts. Special thanks are due to Prof. Walter Philipp for pointing out
the difference of this work and the related work of his and Massart's.
References
Athreya, K. B. and Pantula, S. G. (1986). Mixing properties of Harris chains and autore-
gressive processes. J. Appl. Prob. 23 880-892.
Dehling, H. (1983). Limit theorems for sums of weakly dependent Banach space valued
random variables. Zeit. fur Wahr. und Ver. Geb. 63 393-432.
Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probab. 6
899-929. (Correction, ibid, 7 (1979) 909-911).
Dudley, R. M. and Philipp, W. (1983). Invariance principles for sums of Banach space
valued random elements and empirical processes. Zeit. fur Wahr. und Ver. Geb. 62
509-552.
Doob, J. L. (1953). Stochastic processes. New York: Wiley.
Giné, E. and Zinn, J. (1984). On the central limit theorem for empirical processes. Ann.
Probab. 12 929-989.
Ibragimov, I. A. and Rosanov, Y. A. (1978). Gaussian random processes. New York:
Springer-Verlag. Applications of Mathematics 9.
Kolmogorov, A. N. and Tihomirov, V. M. (1959). ε-entropy and ε-capacity of sets in
functional spaces. Uspehi Mat. Nauk. 14 3-86 (Amer. Math. Soc. Transl. Ser. 2 17
277-364).
Le Cam, L. M. (1984). A remark on empirical measures. In Bickel, P., Doksum, K., and
Hodges, J. (editors), Festschrift for E. L. Lehmann, 305-327. Belmont, CA: Wadsworth.
Levental, S. (1989). A uniform CLT for uniformly bounded families of martingale
differences. J. of Theoretical Probability 2 271-287.
Massart, P. (1988). Invariance principles for empirical processes: the weakly dependent
case. Ph.D thesis, University of Paris.
Mokkadem, A. (1988). Mixing properties of ARMA processes. Stochastic processes and
their applications 29 309-315.
Pham, T. D. and Tran, L. T. (1985). Some mixing properties of time series models. Sto-
chastic processes and their applications 19 297-303.
Philipp, W. (1986). Invariance principles for independent and weakly dependent random
variables. Dependence in Probability and Statistics: A survey of recent results.
225-268, E. Eberlein and M. S. Taqqu eds., Birkhäuser.
Pollard, D. (1982). A central limit theorem for empirical processes. J. of Australian
Mathematical Society (Series A) 33 235-248.
Pollard, D. (1984). Convergence of stochastic processes. New York: Springer-Verlag.
Vapnik, V. N. and Cervonenkis, A. Ya. (1971). On the uniform convergence of relative
frequencies of events to their probabilities. Theory of Probability and its applications
16 264-280.
Withers, C. S. (1981). Conditions for linear processes to be strong-mixing. Zeit. fur
Wahr. und Ver. Geb. 57 477-480.
DEPARTMENT OF STATISTICS
UNIVERSITY OF WISCONSIN
MADISON, WISCONSIN 53706