
Advanced Probability

University of Cambridge,

Part III of the Mathematical Tripos

Michaelmas Term 2006

Grégory Miermont¹

¹ CNRS & Laboratoire de Mathématique, Equipe Probabilités, Statistique et Modélisation, Bât. 425, Université Paris-Sud, 91405 Orsay, France


Contents

1 Conditional expectation
  1.1 The discrete case
  1.2 Conditioning with respect to a σ-algebra
    1.2.1 The L2 case
    1.2.2 General case
    1.2.3 Non-negative case
  1.3 Specific properties of conditional expectation
  1.4 Computing a conditional expectation
    1.4.1 Conditional density functions
    1.4.2 The Gaussian case

2 Discrete-time martingales
  2.1 Basic notions
    2.1.1 Stochastic processes, filtrations
    2.1.2 Martingales
    2.1.3 Doob’s stopping times
  2.2 Optional stopping
  2.3 The convergence theorem
  2.4 Lp convergence, p > 1
    2.4.1 A maximal inequality
  2.5 L1 convergence
  2.6 Optional stopping in the UI case
  2.7 Backwards martingales

3 Examples of applications of discrete-time martingales
  3.1 Kolmogorov’s 0–1 law, law of large numbers
  3.2 Branching processes
  3.3 The Radon-Nikodym theorem
  3.4 Product martingales
    3.4.1 Example: consistency of the likelihood ratio test

4 Continuous-parameter processes
  4.1 Theoretical problems
  4.2 Finite marginal distributions, versions
  4.3 The martingale regularization theorem
  4.4 Convergence theorems for continuous-time martingales
  4.5 Kolmogorov’s continuity criterion

5 Weak convergence
  5.1 Definition and characterizations
  5.2 Convergence in distribution
  5.3 Tightness
  5.4 Lévy’s convergence theorem

6 Brownian motion
  6.1 Wiener’s theorem
  6.2 First properties
  6.3 The strong Markov property
  6.4 Martingales and Brownian motion
  6.5 Recurrence and transience properties
  6.6 The Dirichlet problem
  6.7 Donsker’s invariance principle

7 Poisson random measures and processes
  7.1 Poisson random measures
  7.2 Integrals with respect to a Poisson measure
  7.3 Poisson point processes
    7.3.1 Example: the Poisson process
    7.3.2 Example: compound Poisson processes

8 ID laws and Lévy processes
  8.1 The Lévy-Khintchine formula
  8.2 Lévy processes

9 Exercises
  9.1 Conditional expectation
  9.2 Discrete-time martingales
  9.3 Continuous-time processes
  9.4 Weak convergence
  9.5 Brownian motion
  9.6 Poisson measures, ID laws and Lévy processes


Chapter 1

Conditional expectation

1.1 The discrete case

Let (Ω, F, P) be a probability space. If A, B ∈ F are two events such that P(B) > 0, we define the conditional probability of A given B by the formula

P(A|B) = P(A ∩ B) / P(B).

We interpret this quantity as the probability of the event A given the fact that B is realized. The fact that

P(A|B) = P(B|A) P(A) / P(B)

is called Bayes’ rule. More generally, if X ∈ L1(Ω, F, P) is an integrable random variable, we define

E[X|B] = E[X 1B] / P(B),

the conditional expectation of X given B.

Example. Toss a fair die (probability space Ω = {1, 2, 3, 4, 5, 6} and P({ω}) = 1/6 for ω ∈ Ω) and let A = {the result is even}, B = {the result is less than or equal to 2}. Then P(A|B) = 1/2, P(B|A) = 1/3. If X = ω is the result, then E[X|A] = 4, E[X|B] = 3/2.

Let (Bi, i ∈ I) be a countable collection of disjoint events such that Ω = ⋃_{i∈I} Bi, and let G = σ(Bi, i ∈ I). If X ∈ L1(Ω, F, P), we define a random variable

X′ = ∑_{i∈I} E[X|Bi] 1Bi,

with the convention that E[X|Bi] = 0 if P(Bi) = 0.

The random variable X′ is integrable, since

E[|X′|] = ∑_{i∈I} P(Bi) |E[X|Bi]| = ∑_{i∈I} P(Bi) |E[X 1Bi]| / P(Bi) ≤ E[|X|].

Moreover, it is straightforward to check:


1. X ′ is G-measurable, and

2. for every B ∈ G, E[1BX′] = E[1BX].

Example. If X ∈ L1(Ω, F, P) and Y is a random variable with values in a countable set E, the above construction gives, by letting By = {Y = y}, y ∈ E (which partition Ω into σ(Y)-measurable events), a random variable

E[X|Y] = ∑_{y∈E} E[X|Y = y] 1{Y=y}.

Notice that the value taken by E[X|Y = y] when P(Y = y) = 0, which we have fixed to 0, is actually irrelevant to the definition of E[X|Y], since a random variable is always defined up to a set of zero measure. It is important to keep in mind that conditional expectations are a priori only defined up to a zero-measure set.
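To make the construction above concrete, here is a small Python sketch (my own illustration, not part of the notes; the helper name cond_exp is arbitrary) that builds E[X|Y] atom by atom for the fair-die example, using the convention E[X|Y = y] = 0 on null atoms.

```python
# Fair die: Omega = {1,...,6}, P({omega}) = 1/6.
omega = list(range(1, 7))
prob = {w: 1.0 / 6 for w in omega}

X = {w: w for w in omega}        # X(omega) = result of the die
Y = {w: w % 2 for w in omega}    # Y(omega) = 1 if the result is odd, 0 if even

def cond_exp(X, Y, prob):
    """Return E[X|Y] as a function on Omega, built atom by atom.

    On each atom {Y = y} with positive probability, E[X|Y] equals the
    P-weighted average of X over that atom; on null atoms we use the
    convention E[X|Y = y] = 0, as in the notes.
    """
    cond = {}
    for y in set(Y.values()):
        atom = [w for w in prob if Y[w] == y]
        p_atom = sum(prob[w] for w in atom)
        e = sum(prob[w] * X[w] for w in atom) / p_atom if p_atom > 0 else 0.0
        for w in atom:
            cond[w] = e
    return cond

E_X_given_Y = cond_exp(X, Y, prob)
print(E_X_given_Y)   # 4 on the even atom, 3 on the odd atom
# Characteristic property E[1_B X] = E[1_B E[X|Y]] for B = {Y = 0}:
B = [w for w in omega if Y[w] == 0]
print(sum(prob[w] * X[w] for w in B), sum(prob[w] * E_X_given_Y[w] for w in B))
```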

1.2 Conditioning with respect to a σ-algebra

We are now going to define the conditional expectation given a sub-σ-algebra of our probability space, by using the properties 1. and 2. of the previous paragraph. The definition is due to Kolmogorov.

Theorem 1.2.1 Let G ⊂ F be a sub-σ-algebra, and X ∈ L1(Ω, F, P). Then there exists a random variable X′ with E[|X′|] < ∞ such that the following two characteristic properties are verified:

1. X ′ is G-measurable

2. for every B ∈ G, E[1BX′] = E[1BX].

Moreover, if X′′ is another such random variable, then X′ = X′′ a.s. We denote by E[X|G] ∈ L1(Ω, G, P) the class of the random variable X′. It is called the conditional expectation of X given G.

In other words, E[X|G] is the unique element of L1(Ω, G, P) such that E[1BX] = E[1B E[X|G]] for every B ∈ G. Equivalently, an approximation argument allows one to replace 2. in the statement by:

2’. For every bounded G-measurable random variable Z, E[ZX ′] = E[ZX].

Proof of the uniqueness. Suppose X′ and X′′ satisfy the two conditions of the statement. Then B = {X′ > X′′} ∈ G, and therefore

0 = E[1B(X − X)] = E[1B(X′ − X′′)],

which shows X′ ≤ X′′ a.s.; the reverse inequality is obtained by symmetry.

The existence will need two intermediate steps.


1.2.1 The L2 case

We first consider L2 variables. Suppose that X ∈ L2(Ω, F, P) and let G ⊂ F be a sub-σ-algebra. Notice that L2(Ω, G, P) is a closed vector subspace of the Hilbert space L2(Ω, F, P). Therefore, there exists a unique random variable X′ ∈ L2(Ω, G, P) such that E[Z(X − X′)] = 0 for every Z ∈ L2(Ω, G, P); namely, X′ is the orthogonal projection of X onto L2(Ω, G, P). This shows the previous theorem in the case X ∈ L2, and in fact E[·|G] : L2 → L2 is the orthogonal projector onto L2(Ω, G, P), and hence is linear.

It follows from the uniqueness statement that the conditional expectation has the following nice interpretation in the L2 case: E[X|G] is the G-measurable random variable that best approximates X. It is useful to keep this intuitive idea even in the general L1 case, although the word “approximates” becomes fuzzier.

Notice that X′ := E[X|G] ≥ 0 a.s. whenever X ≥ 0, since (note that {X′ < 0} ∈ G)

E[X 1{X′<0}] = E[X′ 1{X′<0}],

and the left-hand side is non-negative while the right-hand side is non-positive, entailing P(X′ < 0) = 0. Moreover, it holds that E[E[X|G]] = E[X], because it is the scalar product of X against the constant function 1 ∈ L2(Ω, G, P).

1.2.2 General case

Now let X ≥ 0 be any non-negative random variable (not necessarily integrable). Then X ∧ n is in L2 for every n ∈ N, and X ∧ n increases to X pointwise. Therefore, the sequence E[X ∧ n|G] is (a.s.) increasing, because X ∧ n − X ∧ (n − 1) ≥ 0 and by linearity of E[·|G] on L2. It therefore increases a.s. to a limit which we denote by E[X|G]. Notice that E[E[X ∧ n|G]] = E[X ∧ n], so that by the monotone convergence theorem, E[E[X|G]] = E[X]. In particular, if X is integrable, then so is E[X|G].

Proof of existence in Theorem 1.2.1. Let X ∈ L1, and write X = X+ − X− (where X+ = X ∨ 0 and X− = (−X) ∨ 0). Then X+, X− are non-negative integrable random variables, so E[X+|G] and E[X−|G] are finite a.s. and we may define

E[X|G] = E[X+|G] − E[X−|G].

Now, let B ∈ G. Then E[(X+ ∧ n)1B] = E[E[X+ ∧ n|G]1B] by definition. The monotone convergence theorem allows us to pass to the limit (all integrated random variables are non-negative), and we obtain E[X+ 1B] = E[E[X+|G]1B]. The same holds for X−, and by subtracting we see that E[X|G] indeed satisfies the characteristic properties 1., 2.

The following properties are immediate consequences of the previous theorem and its proof.

Proposition 1.2.1 Let G ⊂ F be a σ-algebra and X, Y ∈ L1(Ω,F , P ). Then

1. E[E[X|G]] = E[X]

2. If X is G-measurable, then E[X|G] = X.


3. If X is independent of G, then E[X|G] = E[X].

4. If a, b ∈ R then E[aX + bY |G] = aE[X|G] + bE[Y |G] (linearity).

5. If X ≥ 0 then E[X|G] ≥ 0 (positiveness).

6. |E[X|G]| ≤ E[|X| |G], so that E[|E[X|G]|] ≤ E[|X|].

Important remark. Notice that all statements concerning conditional expectation are about L1 variables, which are only defined up to a set of zero probability, and hence are a.s. statements. This is of crucial importance and recalls the fact, encountered before, that E[X|Y = y] can be assigned an arbitrary value whenever P(Y = y) = 0.

1.2.3 Non-negative case

In the course of proving the last theorem, we actually built an object E[X|G] as the a.s. increasing limit of E[X ∧ n|G] for any non-negative random variable X, not necessarily integrable. This random variable enjoys properties similar to those of the L1 case, and we state them in the same way as in Theorem 1.2.1.

Theorem 1.2.2 Let G ⊂ F be a sub-σ-algebra, and X ≥ 0 a non-negative random variable. Then there exists a random variable X′ ≥ 0 such that

1. X ′ is G-measurable, and

2. for every non-negative G-measurable random variable Z, E[ZX ′] = E[ZX].

Moreover, if X′′ is another such r.v., then X′ = X′′ a.s. We denote by E[X|G] the class of X′ up to a.s. equality.

Proof. Any r.v. in the class of E[X|G] = lim sup_n E[X ∧ n|G] trivially satisfies 1. It also satisfies 2., since if Z is a non-negative G-measurable random variable, we have, by passing to the (increasing) limit in

E[(X ∧ n)(Z ∧ n)] = E[E[X ∧ n|G](Z ∧ n)],

that E[XZ] = E[E[X|G]Z].

Uniqueness. If X′, X′′ are non-negative and satisfy properties 1. and 2., then for any a < b ∈ Q+, letting B = {X′ ≤ a < b ≤ X′′} ∈ G, we obtain

b P(B) ≤ E[X′′ 1B] = E[X 1B] = E[X′ 1B] ≤ a P(B),

which entails P(B) = 0, so that P(X′ < X′′) = 0 by taking the countable union over a < b ∈ Q+. Similarly, P(X′ > X′′) = 0.

The reader is invited to formulate and prove analogues of the properties of Proposition 1.2.1 for non-negative variables, and in particular, that if 0 ≤ X ≤ Y then 0 ≤ E[X|G] ≤ E[Y|G] a.s. The conditional expectation enjoys the following properties, which match those of the classical expectation.


Proposition 1.2.2 Let G ⊂ F be a σ-algebra.

1. If (Xn, n ≥ 0) is an increasing sequence of non-negative random variables with limit X, then (conditional monotone convergence theorem)

E[Xn|G] ↑ E[X|G] a.s. as n → ∞.

2. If (Xn, n ≥ 0) is a sequence of non-negative random variables, then (conditional Fatou theorem)

E[lim inf_{n→∞} Xn | G] ≤ lim inf_{n→∞} E[Xn|G] a.s.

3. If (Xn, n ≥ 0) is a sequence of random variables converging a.s. to X, and if there exists Y ∈ L1(Ω, F, P) such that sup_n |Xn| ≤ Y a.s., then (conditional dominated convergence theorem)

lim_{n→∞} E[Xn|G] = E[X|G], a.s. and in L1.

4. If ϕ : R → (−∞, ∞] is a convex function and X ∈ L1(Ω, F, P), and either ϕ is non-negative or ϕ(X) ∈ L1(Ω, F, P), then (conditional Jensen inequality)

E[ϕ(X)|G] ≥ ϕ(E[X|G]) a.s.

5. If 1 ≤ p < ∞ and X ∈ Lp(Ω, F, P), then

‖E[X|G]‖p ≤ ‖X‖p.

In particular, the linear operator X ↦ E[X|G] from Lp(Ω, F, P) to Lp(Ω, G, P) is continuous.

Proof. 1. Let X′ be the increasing limit of E[Xn|G]. Let Z be a non-negative G-measurable random variable; then E[Z E[Xn|G]] = E[Z Xn], which by taking an increasing limit gives E[Z X′] = E[Z X], so X′ = E[X|G].

2. We have E[inf_{k≥n} Xk | G] ≤ inf_{k≥n} E[Xk|G] for every n by monotonicity of the conditional expectation, and the result is obtained by passing to the limit and using 1.

3. Applying 2. to the non-negative random variables Y − Xn and Y + Xn, we get that E[Y − X|G] ≤ E[Y|G] − lim sup_n E[Xn|G] and that E[Y + X|G] ≤ E[Y|G] + lim inf_n E[Xn|G], giving the a.s. result. The L1 result is a consequence of the (ordinary) dominated convergence theorem, since |E[Xn|G]| ≤ E[|Xn| |G] ≤ E[Y|G] ∈ L1 a.s.

4. A convex function ϕ is the superior envelope of its affine minorants, i.e.

ϕ(x) = sup{ax + b : a, b ∈ R, ∀y, ay + b ≤ ϕ(y)} = sup{ax + b : a, b ∈ Q, ∀y, ay + b ≤ ϕ(y)}.

The result is then a consequence of the linearity of the conditional expectation and the fact that Q is countable (this last fact is needed because conditional expectation is defined only a.s.).

5. One deduces from 4. and the previous proposition that ‖E[X|G]‖p^p = E[|E[X|G]|^p] ≤ E[E[|X|^p | G]] = E[|X|^p] = ‖X‖p^p, if 1 ≤ p < ∞ and X ∈ Lp(Ω, F, P). Thus

‖E[X|G]‖p ≤ ‖X‖p.


1.3 Specific properties of conditional expectation

The “information contained in G” can be factored out of the conditional expectation:

Proposition 1.3.1 Let G ⊂ F be a σ-algebra, and let X, Y be real random variables such that either X, Y are non-negative or X, XY ∈ L1(Ω, F, P). Then, if Y is G-measurable, we have

E[Y X|G] = Y E[X|G].

Proof. Let Z be a non-negative G-measurable random variable. Then, if X, Y are non-negative, E[ZYX] = E[ZY E[X|G]] since ZY is non-negative, and the result follows by uniqueness. If X, XY are integrable, the same result follows by writing X = X+ − X−, Y = Y+ − Y−.

One has the Tower property (restricting the information)

Proposition 1.3.2 Let G1 ⊂ G2 ⊂ F be σ-algebras. Then for every random variable X which is positive or integrable,

E[E[X|G2]|G1] = E[X|G1].

Proof. For a positive bounded G1-measurable Z, Z is G2-measurable as well, so that E[Z E[E[X|G2]|G1]] = E[Z E[X|G2]] = E[ZX] = E[Z E[X|G1]], hence the result.

Proposition 1.3.3 Let G1, G2 be two sub-σ-algebras of F, and let X be a positive or integrable random variable. Then, if G2 is independent of σ(X, G1), E[X|G1 ∨ G2] = E[X|G1].

Proof. Let A ∈ G1, B ∈ G2. Then

E[1A∩B E[X|G1 ∨ G2]] = E[1A 1B X] = E[1B E[X 1A|G2]] = P(B) E[X 1A] = P(B) E[1A E[X|G1]] = E[1A∩B E[X|G1]],

where we have used the independence property at the third and last steps. The proof is then completed by the monotone class theorem.

Proposition 1.3.4 Let X, Y be random variables and G be a sub-σ-algebra of F such that Y is G-measurable and X is independent of G. Then for any non-negative measurable function f,

E[f(X, Y)|G] = ∫ P(X ∈ dx) f(x, Y),

where P(X ∈ dx) denotes the law of X.

Proof. For any non-negative G-measurable random variable Z, we have that X is independent of (Y, Z), so that the law P((X, Y, Z) ∈ dx dy dz) equals the product P(X ∈ dx) P((Y, Z) ∈ dy dz) of the law of X with the law of (Y, Z). Hence,

E[Z f(X, Y)] = ∫ z f(x, y) P(X ∈ dx) P((Y, Z) ∈ dy dz) = ∫ P(X ∈ dx) E[Z f(x, Y)] = E[Z ∫ P(X ∈ dx) f(x, Y)],

where we used Fubini’s theorem in two places. This shows the result.


1.4 Computing a conditional expectation

We give two concrete and important examples of computation of conditional expectations.

1.4.1 Conditional density functions

Suppose X, Y have values in Rm and Rn respectively, and that the law of (X, Y) has a density: P((X, Y) ∈ dx dy) = fX,Y(x, y) dx dy. Let fY(y) = ∫_{Rm} fX,Y(x, y) dx, y ∈ Rn, be the density of Y. Then for every non-negative measurable h : Rm → R and g : Rn → R, we have

E[h(X) g(Y)] = ∫_{Rm×Rn} h(x) g(y) fX,Y(x, y) dx dy
             = ∫_{Rn} g(y) fY(y) dy ∫_{Rm} h(x) (fX,Y(x, y)/fY(y)) 1{fY(y)>0} dx
             = E[ϕ(Y) g(Y)],

so E[h(X)|Y] = ϕ(Y), where

ϕ(y) = (1/fY(y)) ∫_{Rm} h(x) fX,Y(x, y) dx  if fY(y) > 0,

and ϕ(y) = 0 otherwise. We interpret this result by saying that

E[h(X)|Y] = ∫_{Rm} h(x) ν(Y, dx),

where ν(y, dx) = fY(y)^{-1} fX,Y(x, y) 1{fY(y)>0} dx = fX|Y(x|y) dx. The measure ν(y, dx) is called the conditional distribution given Y = y, and fX|Y(x|y) is the conditional density function of X given Y = y. Notice that this function of x, y is defined only up to a zero-measure set.

1.4.2 The Gaussian case

Let (X, Y) be a Gaussian vector in R2. Take X′ = aY + b with a, b such that Cov(X, Y) = a Var Y and a E[Y] + b = E[X]. In this case, Cov(Y, X − X′) = 0, hence X − X′ is independent of σ(Y) by properties of Gaussian vectors. Moreover, X − X′ is centered, so for every B ∈ σ(Y), one has E[1B X] = E[1B X′], hence X′ = E[X|Y].
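As a quick numerical sanity check of this formula (my own sketch, not part of the notes), one can simulate a Gaussian vector (X, Y), form a and b from the empirical moments, and verify that X − (aY + b) is centred and uncorrelated with Y; the mean and covariance chosen below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a Gaussian vector (X, Y); the mean and covariance are arbitrary choices.
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.5]])
X, Y = rng.multivariate_normal(mean, cov, size=200_000).T

# a, b as in the notes: Cov(X, Y) = a Var(Y) and a E[Y] + b = E[X].
a = np.cov(X, Y)[0, 1] / np.var(Y)
b = X.mean() - a * Y.mean()
Xp = a * Y + b                      # candidate for E[X|Y]

residual = X - Xp
print("Cov(Y, X - X') ~", np.cov(Y, residual)[0, 1])    # ~ 0
print("E[X - X'] ~", residual.mean())                   # ~ 0
print("theoretical a, b:", cov[0, 1] / cov[1, 1],
      mean[0] - cov[0, 1] / cov[1, 1] * mean[1])
```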


Chapter 2

Discrete-time martingales

Before we entirely focus on discrete-time martingales, we start with a general discussion on stochastic processes, which includes both discrete and continuous-time processes.

2.1 Basic notions

2.1.1 Stochastic processes, filtrations

Let (Ω, F, P) be a probability space. For a measurable state space (E, E) and a subset I ⊂ R of “times”, or “epochs”, an E-valued stochastic process indexed by I is a collection (Xt, t ∈ I) of random variables. Most of the processes we will consider take values in R, Rd, or C, endowed with their Borel σ-algebras.

A filtration is a collection (Ft, t ∈ I) of sub-σ-algebras of F which is increasing (s ≤ t =⇒ Fs ⊆ Ft). Once a filtration is given, we call (Ω, F, (Ft)t∈I, P) a filtered probability space. A process (Xt, t ∈ I) is adapted to the filtration (Ft, t ∈ I) if Xt is Ft-measurable for every t.

The intuitive idea is that Ft is the quantity of information available up to time t (the present). To give an informal example, if we are interested in the evolution of the stock market, we can take Ft to be the past history of the stock prices (or only some of them) up to time t.

We will let F∞ = ∨_{t∈I} Ft ⊆ F be the information available at the end of time.

Example. To every process (Xt, t ∈ I), one associates its natural filtration

FXt = σ(Xs, s ≤ t), t ∈ I.

Every process is adapted to its natural filtration, and FX is the smallest filtration to which X is adapted: FXt contains all the measurable events depending on (Xs, s ≤ t).

Last, a real-valued process (Xt, t ∈ I) is said to be integrable if E[|Xt|] < ∞ for all t ∈ I.


2.1.2 Martingales

Definition 2.1.1 Let (Ω, F, (Ft)t∈I, P) be a filtered probability space. An R-valued adapted integrable process (Xt, t ∈ I) is:

• a martingale if for every s ≤ t, E[Xt|Fs] = Xs.

• a supermartingale if for every s ≤ t, E[Xt|Fs] ≤ Xs.

• a submartingale if for every s ≤ t, E[Xt|Fs] ≥ Xs.

Notice that a (super-, sub-)martingale remains a (super-, sub-)martingale with respect to its natural filtration, by the tower property of conditional expectation.

2.1.3 Doob’s stopping times

Definition 2.1.2 Let (Ω, F, (Ft)t∈I, P) be a filtered probability space. A stopping time (with respect to this space) is a random variable T : Ω → I ∪ {∞} such that {T ≤ t} ∈ Ft for every t ∈ I.

For example, constant times are (trivial) stopping times. If I = Z+, the random variable n1A + ∞1Ac is a stopping time if A ∈ Fn (with the convention 0 · ∞ = 0). The intuitive idea behind this definition is that T is a time at which a decision can be taken (given the information we have). For example, for a meteorologist having the weather information up to the present time, the “first day of 2006 when the temperature is above 23°C” is a stopping time, but not the “last day of 2006 when the temperature is above 23°C”.

Example. If I ⊂ Z+, the definition can be replaced by {T = n} ∈ Fn for all n ∈ I. When I is a subset of the integers, we will denote the time by letters n, m, k rather than t, s, r (so n ≥ 0 means n ∈ Z+). Particularly important instances of stopping times in this case are the first entrance times. Let (Xn, n ≥ 0) be an adapted process and let A ∈ E. The first entrance time in A is

TA = inf{n ∈ Z+ : Xn ∈ A} ∈ Z+ ∪ {∞}.

It is a stopping time since

{TA ≤ n} = ⋃_{0≤m≤n} Xm^{-1}(A).

On the contrary, the last exit time before some fixed N,

LA = sup{n ∈ {0, 1, . . . , N} : Xn ∈ A} ∈ Z+ ∪ {∞},

is in general not a stopping time.
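For concreteness, here is a small Python sketch (mine, with hypothetical function names, not part of the notes) computing the first entrance time of a simple random walk into a set A. Deciding whether TA ≤ n only requires X0, . . . , Xn, which is exactly the stopping-time property; the last exit time, by contrast, would require knowledge of the whole path.

```python
import random

def first_entrance_time(path, A):
    """T_A = inf{n >= 0 : X_n in A}, with inf(empty set) = +infinity.

    Deciding whether T_A <= n only uses path[0], ..., path[n]: this is
    the stopping-time property of first entrance times.
    """
    for n, x in enumerate(path):
        if x in A:
            return n
    return float("inf")

random.seed(1)
walk = [0]
for _ in range(100):
    walk.append(walk[-1] + random.choice([-1, 1]))

print(first_entrance_time(walk, A={3}))    # first time the walk hits level 3
```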

As an immediate consequence of the definition, one gets:

Proposition 2.1.1 Let S, T, (Tn, n ∈ N) be stopping times (with respect to some filtered probability space). Then S ∧ T, S ∨ T, inf_n Tn, sup_n Tn, lim inf_n Tn, lim sup_n Tn are stopping times.


Definition 2.1.3 Let T be a stopping time with respect to some filtered probability space (Ω, F, (Ft)t∈I, P). We define FT, the σ-algebra of events before time T, by

FT = {A ∈ F∞ : A ∩ {T ≤ t} ∈ Ft for every t ∈ I}.

The reader is invited to check that this indeed defines a σ-algebra, which is interpreted as the collection of events that are measurable with respect to the information available at time T: “4 days before the first day (T) in 2005 when the temperature is above 23°C, the temperature was below 10°C” is in FT. If S, T are stopping times, one checks that

S ≤ T =⇒ FS ⊆ FT . (2.1)

Now suppose that I is countable. If (Xt, t ∈ I) is adapted and T is a stopping time, we let XT 1{T<∞} be equal to XT(ω)(ω) if T(ω) < ∞, and to 0 otherwise. It is a random variable, as the composition of (ω, t) ↦ Xt(ω) and ω ↦ (ω, T(ω)), which are measurable (why?). We also let XT = (XT∧t, t ∈ I), and call it the process X stopped at T.

Proposition 2.1.2 Under these hypotheses,

1. XT1T<∞ is FT -measurable,

2. the process XT is adapted,

3. if moreover I = Z+ and X is integrable, then XT is integrable.

Proof. 1. Let A ∈ E. Then {XT ∈ A} ∩ {T ≤ t} = ⋃_{s∈I, s≤t} ({Xs ∈ A} ∩ {T = s}). Then notice that {T = s} = {T ≤ s} \ ⋃_{u<s} {T ≤ u} ∈ Fs.

2. For every t ∈ I, XT∧t is FT∧t-measurable, hence Ft-measurable since T ∧ t ≤ t, by (2.1).

3. If I = Z+ and X is integrable, E[|XT∧n|] = ∑_{m<n} E[|Xm| 1{T=m}] + E[|Xn| 1{T≥n}] ≤ (n + 1) sup_{0≤m≤n} E[|Xm|].

From now on until the end of the section (except in the paragraph on backwards martingales), we will suppose that E = R and I = Z+ (discrete-time processes).

2.2 Discrete-time martingales: optional stopping

We consider a filtered probability space (Ω, F, (Fn), P). All the above terminology (stopping times, adapted processes, and so on) will be with respect to this space.

We first introduce the so-called ‘martingale transform’, which is sometimes called the ‘discrete stochastic integral’ with respect to a (super-, sub-)martingale X. We say that a process (Cn, n ≥ 1) is previsible if Cn is Fn−1-measurable for every n ≥ 1. A previsible process can be interpreted as a strategy: one bets at time n only with the knowledge accumulated up to time n − 1.


If (Xn, n ≥ 0) is adapted and (Cn, n ≥ 1) is previsible, we define an adapted process C · X by

(C · X)n = ∑_{k=1}^{n} Ck (Xk − Xk−1).

We can interpret this new process as follows: if Xn is a certain amount of money at time n and Cn is the bet of a player at time n, then (C · X)n is the total winnings of the player at time n.
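A minimal Python sketch of the martingale transform (my own illustration, not from the notes): the strategy C below looks only at past increments of X, so it is previsible; the names and the particular strategy are arbitrary.

```python
import random

def martingale_transform(C, X):
    """Return the path of (C . X)_n = sum_{k=1}^n C_k (X_k - X_{k-1}).

    C[k] is the bet placed at time k using only information up to time
    k - 1 (previsibility); X is the underlying adapted process.
    """
    out = [0.0]
    for k in range(1, len(X)):
        out.append(out[-1] + C[k] * (X[k] - X[k - 1]))
    return out

random.seed(0)
N = 20
X = [0]
for _ in range(N):
    X.append(X[-1] + random.choice([-1, 1]))

# A previsible strategy: bet 1 if the previous step went up, 0 otherwise.
C = [0] * (N + 1)
for k in range(2, N + 1):
    C[k] = 1 if X[k - 1] > X[k - 2] else 0

print(list(zip(X, martingale_transform(C, X))))
```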

Proposition 2.2.1 In this setting, if X is a martingale and C is bounded, then C · X is a martingale. If X is a supermartingale (resp. submartingale) and C is bounded with Cn ≥ 0 for every n ≥ 1, then C · X is a supermartingale (resp. submartingale).

Proof. Suppose X is a martingale. Since C is bounded, the process C · X is trivially integrable. Since Cn+1 is Fn-measurable,

E[(C ·X)n+1 − (C ·X)n|Fn] = E[Cn+1(Xn+1 −Xn)|Fn] = Cn+1E[Xn+1 −Xn|Fn] = 0.

The (super-, sub-)martingale cases are similar.

Theorem 2.2.1 (Optional stopping) Let (Xn, n ≥ 0) be a martingale (resp. super-, sub-martingale).

(i) If T is a stopping time, then XT is also a martingale (resp. super-, sub-martingale).

(ii) If S ≤ T are bounded stopping times, then E[XT|FS] = XS (resp. E[XT|FS] ≤ XS, E[XT|FS] ≥ XS).

(iii) If S ≤ T are bounded stopping times, then E[XT] = E[XS] (resp. E[XT] ≤ E[XS], E[XT] ≥ E[XS]).

Proof. (i) Let Cn = 1{n≤T}; then C is a previsible non-negative bounded process, and it is immediate that C · X = XT − X0. The first result follows from Proposition 2.2.1.

(ii) If now S, T are bounded stopping times with S ≤ T, and A ∈ FS, we define Cn = 1A 1{S<n≤T}. Then C is a non-negative bounded previsible process, since A ∩ {S < n} = A ∩ {S ≤ n − 1} ∈ Fn−1 and {n ≤ T} = {n − 1 < T} ∈ Fn−1. Moreover, XS, XT are integrable since S, T are bounded, and (C · X)K = 1A(XT − XS) as soon as K ≥ T a.s. Since C · X is a martingale, E[(C · X)K] = E[(C · X)0] = 0, so E[1A(XT − XS)] = 0 for every A ∈ FS, which entails E[XT|FS] = XS.

(iii) Follows by taking expectations in (ii).

Notice that the last two statements are not true in general if the stopping times are not bounded. For example, if (Yn, n ≥ 1) are independent random variables taking values ±1 with probability 1/2 each, then Xn = ∑_{1≤i≤n} Yi is a martingale. If T = inf{n ≥ 0 : Xn = 1}, then it is classical that T < ∞ a.s., but of course E[XT] = 1 > 0 = E[X0]. However, for non-negative supermartingales, Fatou’s lemma entails:

Proposition 2.2.2 Suppose X is a non-negative supermartingale. Then for any stopping time T which is a.s. finite, we have E[XT] ≤ E[X0].


Beware that this ≤ sign should not in general be turned into an = sign, even if X is a martingale! The very same proposition is actually true without the assumption that P(T < ∞) = 1, by the martingale convergence theorem 2.3.1 below.
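The counterexample above is easy to observe numerically (an illustration of mine, not part of the notes): the stopping time T = inf{n : Xn = 1} is a.s. finite but unbounded (it even has infinite mean), and XT = 1 by definition, so E[XT] = 1 ≠ 0 = E[X0].

```python
import random

def hitting_time_of_one(max_steps=10**6):
    """Run a simple symmetric random walk until T = inf{n : X_n = 1}.

    Returns T, or None if level 1 is not reached before the safety cap;
    T is a.s. finite but has infinite mean, hence the cap.
    """
    x = 0
    for n in range(1, max_steps + 1):
        x += random.choice([-1, 1])
        if x == 1:
            return n
    return None

random.seed(42)
times = [hitting_time_of_one() for _ in range(1000)]
reached = [t for t in times if t is not None]
# On {T < infinity}, X_T = 1 by definition, so E[X_T] = 1 while E[X_0] = 0.
print("fraction of walks reaching 1 before the cap:", len(reached) / len(times))
print("median and max of T among those:", sorted(reached)[len(reached) // 2], max(reached))
```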

2.3 Discrete-time martingales: the convergence theorem

The martingale convergence theorem is the most important result in this chapter.

Theorem 2.3.1 (Martingale convergence theorem) If X is a supermartingale which is bounded in L1(Ω, F, P), i.e. such that sup_n E[|Xn|] < ∞, then Xn converges a.s. towards an a.s. finite limit X∞.

An easy and important corollary is the following.

Corollary 2.3.1 A non-negative supermartingale converges a.s. towards an a.s. finite limit.

Indeed, for a non-negative supermartingale, E[|Xn|] = E[Xn] ≤ E[X0] <∞.

The proof of Theorem 2.3.1 relies on an estimate of the number of upcrossings of a supermartingale between two levels a < b. If (xn, n ≥ 0) is a real sequence and a < b are two real numbers, we define two integer-valued sequences Sk(x), Tk(x), k ≥ 1, recursively as follows. Let T0(x) = 0 and, for k ≥ 0, let

Sk+1(x) = inf{n ≥ Tk(x) : xn < a},  Tk+1(x) = inf{n ≥ Sk+1(x) : xn > b},

with the usual convention inf ∅ = ∞. The number Nn([a, b], x) = sup{k > 0 : Tk(x) ≤ n} is the number of upcrossings of x between a and b before time n, which increases as n → ∞ to the total number of upcrossings N([a, b], x) = sup{k > 0 : Tk(x) < ∞}. The key is the following simple analytic lemma:

Lemma 2.3.1 A real sequence x converges (in [−∞, ∞]) if and only if N([a, b], x) < ∞ for all rationals a < b.

Proof. If there exist rationals a < b such that N([a, b], x) = ∞, then lim inf_n xn ≤ a < b ≤ lim sup_n xn, so that x does not converge. If x does not converge, then lim inf_n xn < lim sup_n xn, so by taking two rationals a < b in between we get the converse statement.

Theorem 2.3.2 (Doob’s upcrossing lemma) Let X be a supermartingale, and let a < b be two reals. Then for every n ≥ 0,

(b− a)E[Nn([a, b], X)] ≤ E[(Xn − a)−].


Proof. It is immediate by induction that Sk = Sk(X), Tk = Tk(X) defined as above are stopping times. Define a previsible process C, taking only the values 0 and 1, by

Cn = ∑_{k≥1} 1{Sk < n ≤ Tk}.

It is indeed previsible since {Sk < n ≤ Tk} = {Sk ≤ n − 1} ∩ {Tk ≤ n − 1}^c ∈ Fn−1. Now, letting Nn = Nn([a, b], X), we have

(C · X)_n = ∑_{i=1}^{N_n} (X_{T_i} − X_{S_i}) + (X_n − X_{S_{N_n+1}}) 1{S_{N_n+1} ≤ n} ≥ (b − a) N_n + (X_n − a) 1{X_n ≤ a} ≥ (b − a) N_n − (X_n − a)^−.

Since C is a non-negative bounded previsible process, C · X is a supermartingale, so finally

(b − a) E[Nn] − E[(Xn − a)^−] ≤ E[(C · X)n] ≤ 0,

hence the result.

Proof of Theorem 2.3.1. Since (x + y)^− ≤ |x| + |y|, we get from Theorem 2.3.2 that E[Nn] ≤ (b − a)^{−1} (E[|Xn|] + |a|), and since Nn increases to N = N([a, b], X) we get by monotone convergence E[N] ≤ (b − a)^{−1} (sup_n E[|Xn|] + |a|). In particular, N([a, b], X) < ∞ a.s. for every a < b ∈ Q, so

P( ⋂_{a<b∈Q} {N([a, b], X) < ∞} ) = 1.

Hence the a.s. convergence to some X∞, possibly infinite.

Now Fatou’s lemma gives E[|X∞|] ≤ lim inf_n E[|Xn|] < ∞ by hypothesis, hence |X∞| < ∞ a.s.

Exercise. In fact, it is clear from Theorem 2.3.2 that the weaker condition sup_n E[Xn^−] < ∞ suffices for a.s. convergence; prove that it actually implies boundedness in L1 (provided X is a supermartingale, of course).

2.4 Doob’s inequalities and Lp convergence, p > 1

2.4.1 A maximal inequality

Proposition 2.4.1 Let X be a submartingale. Then, letting X̄n = sup_{0≤k≤n} Xk, for every c > 0 and n ≥ 0,

c P(X̄n ≥ c) ≤ E[Xn 1{X̄n ≥ c}] ≤ E[Xn^+].

Proof. Letting T = inf{k ≥ 0 : Xk ≥ c}, we obtain by optional stopping that

E[Xn] ≥ E[XT∧n] = E[Xn 1{T>n}] + E[XT 1{T≤n}] ≥ E[Xn 1{T>n}] + c P(T ≤ n).

Since {T ≤ n} = {X̄n ≥ c}, the conclusion follows.


Theorem 2.4.1 (Doob’s Lp inequality) Let p > 1 and let X be a martingale. Then, letting X*n = sup_{0≤k≤n} |Xk|, we have

‖X*n‖p ≤ (p/(p − 1)) ‖Xn‖p.

Proof. Since x ↦ |x| is convex, the process (|Xn|, n ≥ 0) is a non-negative submartingale. Applying Proposition 2.4.1 and Hölder’s inequality shows that

E[(X*n)^p] = ∫_0^∞ dx p x^{p−1} P(X*n ≥ x)
           ≤ ∫_0^∞ dx p x^{p−2} E[|Xn| 1{X*n ≥ x}]
           = p E[ |Xn| ∫_0^{X*n} dx x^{p−2} ]
           = (p/(p − 1)) E[|Xn| (X*n)^{p−1}]
           ≤ (p/(p − 1)) ‖Xn‖p ‖X*n‖p^{p−1},

which yields the result.
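A numerical sanity check of Doob’s Lp inequality for p = 2 (my own sketch, not from the notes), using the simple random walk as the martingale: the simulation compares E[(X*n)^2] with 4 E[Xn^2].

```python
import numpy as np

rng = np.random.default_rng(7)

n, trials = 200, 20_000
steps = rng.choice([-1.0, 1.0], size=(trials, n))
X = np.cumsum(steps, axis=1)                 # martingale X_1, ..., X_n (X_0 = 0)
X_star = np.max(np.abs(X), axis=1)           # X*_n = sup_{k <= n} |X_k|

p = 2.0
lhs = np.mean(X_star ** p)                                    # E[(X*_n)^p]
rhs = (p / (p - 1)) ** p * np.mean(np.abs(X[:, -1]) ** p)     # (p/(p-1))^p E[|X_n|^p]
print(lhs, "<=", rhs)                        # Doob's L^2 inequality: lhs <= rhs = 4n
```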

Theorem 2.4.2 Let X be a martingale and p > 1. Then the following statements are equivalent:

1. X is bounded in Lp(Ω,F , P ): supn≥0 ‖Xn‖p <∞

2. X converges a.s. and in Lp to a random variable X∞

3. There exists some Z ∈ Lp(Ω,F , P ) such that

Xn = E[Z|Fn].

Proof. 1. =⇒ 2. Suppose X is bounded in Lp; then in particular it is bounded in L1, so it converges a.s. to some finite X∞ by Theorem 2.3.1. Moreover, X∞ ∈ Lp by an easy application of Fatou’s lemma. Next, Doob’s inequality ‖X*n‖p ≤ C‖Xn‖p ≤ C′ < ∞ entails ‖X*∞‖p < ∞ by monotone convergence, where X*∞ = sup_{n≥0} |Xn| is the monotone limit of X*n. Since X*∞ ≥ sup_{n∈N} |Xn|, we have |Xn − X∞| ≤ 2X*∞ ∈ Lp, and dominated convergence entails that Xn converges to X∞ in Lp.

2. =⇒ 3. Since conditional expectation is continuous as a linear operator on Lp spaces (Proposition 1.2.2), if Xn → X∞ in Lp we have, for fixed n and m ≥ n, Xn = E[Xm|Fn] → E[X∞|Fn] as m → ∞, so that Xn = E[X∞|Fn] and Z = X∞ is a suitable choice.

3. =⇒ 1. This is immediate by the conditional Jensen inequality.

A martingale which has the form given in 3. is said to be closed (in Lp). Notice that in this case, X∞ = E[Z|F∞], where F∞ = ∨_{n≥0} Fn. Indeed, ⋃_{n≥0} Fn is a π-system that generates F∞, and moreover, if B ∈ FN is an element of this π-system, then E[1B E[Z|F∞]] = E[1B Z] = E[1B Xm] → E[1B X∞] as m → ∞ (m ≥ N). Since X∞ = lim sup_n Xn is F∞-measurable, this gives the result.

Therefore, for p > 1, the map Z ∈ Lp(Ω, F∞, P) ↦ (E[Z|Fn], n ≥ 0) is a bijection between Lp(Ω, F∞, P) and the set of martingales that are bounded in Lp.


2.5 Uniform integrability and convergence in L1

The case of L1 convergence is a little different from that of Lp for p > 1, as one needs to suppose uniform integrability rather than mere boundedness in L1. Notice that uniform integrability follows from boundedness in Lp for p > 1.

Theorem 2.5.1 Let X be a martingale. The following statements are equivalent:

1. (Xn, n ≥ 0) is uniformly integrable

2. Xn converges a.s. and in L1(Ω,F , P ) to a limit X∞

3. There exists Z ∈ L1(Ω,F , P ) so that Xn = E[Z|Fn], n ≥ 0.

Proof. 1. =⇒ 2. Suppose X is uniformly integrable; then it is bounded in L1, so by Theorem 2.3.1 it converges a.s. By properties of uniform integrability, it then converges in L1.

2. =⇒ 3. This follows the same proof as above: X∞ = Z is a suitable choice.

3. =⇒ 1. This is a straightforward consequence of the fact that the family

{E[Z|G] : G is a sub-σ-algebra of F}

is U.I.; see example sheet 1.

As above, we then have E[Z|F∞] = X∞, and this theorem says that there is a one-to-one correspondence between U.I. martingales and L1(Ω,F∞, P ).

Exercise. Show that if X is a U.I. supermartingale (resp. submartingale), then Xn converges a.s. and in L1 to a limit X∞, so that E[X∞|Fn] ≤ Xn (resp. ≥) for every n.

2.6 Optional stopping in the case of U.I. martingales

We give an improved version of the optional stopping theorem, in which the boundedness condition on the stopping times is lifted and replaced by a uniform integrability condition on the martingale. Since U.I. martingales have a well-defined limit X∞, we unambiguously let XT = XT 1{T<∞} + X∞ 1{T=∞} for any stopping time T.

Theorem 2.6.1 Let X be a U.I. martingale, and S, T be two stopping times with S ≤ T. Then E[XT|FS] = XS.

Proof. We check that XT ∈ L1: indeed, since |Xn| ≤ E[|X∞| | Fn],

E[|XT|] = ∑_{n=0}^{∞} E[|Xn| 1{T=n}] + E[|X∞| 1{T=∞}] ≤ ∑_{n∈Z+∪{∞}} E[|X∞| 1{T=n}] = E[|X∞|].

Next, if B ∈ FT,

E[1B X∞] = ∑_{n∈Z+∪{∞}} E[1B 1{T=n} X∞] = ∑_{n∈Z+∪{∞}} E[1B 1{T=n} Xn] = E[1B XT],

so that XT = E[X∞|FT]. Finally, E[XT|FS] = E[E[X∞|FT]|FS] = E[X∞|FS] = XS, by the tower property.


2.7 Backwards martingales

Backwards martingales are martingales whose time-set is Z−. More precisely, given a filtration · · · ⊆ G−2 ⊆ G−1 ⊆ G0, a process (Xn, n ≤ 0) is a backwards martingale if E[Xn+1|Gn] = Xn, as in the usual definition. They are somehow nicer than forward martingales, as they are automatically U.I., since X0 ∈ L1 and E[X0|Gn] = Xn for every n ≤ 0. Adapting Doob’s upcrossing theorem is a simple exercise: if Nm([a, b], X) is the number of upcrossings of a backwards martingale from a to b between times −m and 0, one has, considering the (forward) supermartingale (X−m+k, 0 ≤ k ≤ m), that

(b − a) E[Nm([a, b], X)] ≤ E[(X0 − a)^−].

As m → ∞, Nm([a, b], X) increases to the total number of upcrossings of X from a to b, and this allows one to conclude that Xn converges a.s. as n → −∞ to a G−∞-measurable random variable X−∞, where G−∞ = ⋂_{n≤0} Gn. We have proved:

Theorem 2.7.1 Let X be a backwards martingale. Then Xn converges a.s. and in L1 as n → −∞ to the random variable X−∞ = E[X0|G−∞].

Moreover, if X0 ∈ Lp for some p > 1, then X is bounded in Lp and converges in Lp as n → −∞.


Chapter 3

Examples of applications of discrete-time martingales

3.1 Kolmogorov’s 0–1 law, law of large numbers

Let (Yn, n ≥ 1) be a sequence of independent random variables.

Theorem 3.1.1 (Kolmogorov’s 0–1 law) The tail σ-algebra G∞ = ⋂_{n≥0} Gn, where Gn = σ(Ym, m ≥ n), is trivial: every A ∈ G∞ has probability 0 or 1.

Proof. Let Fn = σ(Y1, . . . , Yn), n ≥ 1, and let A ∈ G∞. Then E[1A|Fn] = P(A), since Fn is independent of Gn+1, hence of G∞. Therefore, the martingale convergence theorem gives E[1A|F∞] = 1A = P(A) a.s., since G∞ ⊂ F∞. Hence P(A) ∈ {0, 1}.

Suppose now that the Yi are real-valued i.i.d. random variables in L1. Let Sn = ∑_{k=1}^{n} Yk, n ≥ 0, be the associated random walk.

Theorem 3.1.2 (LLN) A.s. as n → ∞,

Sn/n → E[Y1].

Proof. Let Hn = σ(Sn, Sn+1, . . .) = σ(Sn, Yn+1, Yn+2, . . .). We have E[Sn|Hn+1] = Sn+1 − E[Yn+1|Sn+1]. Now, by symmetry we have E[Yn+1|Sn+1] = E[Yk|Sn+1] for every 1 ≤ k ≤ n + 1, so that it equals (n + 1)^{−1} E[Sn+1|Sn+1] = Sn+1/(n + 1). Finally, E[Sn/n|Hn+1] = Sn+1/(n + 1), so that (S−n/(−n), n ≤ −1) is a backwards martingale with respect to its natural filtration. Therefore, Sn/n converges a.s. and in L1 to a limit which is a.s. constant by Kolmogorov’s 0–1 law, so it must be equal to its mean value: E[S1|H∞] = E[S1] = E[Y1].
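A short numerical illustration of the theorem (mine, with an arbitrary choice of distribution, not part of the notes): Sn/n for an i.i.d. exponential sample concentrates around E[Y1].

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.exponential(scale=2.0, size=1_000_000)   # i.i.d. sample with E[Y_1] = 2
S = np.cumsum(Y)                                 # S_n = Y_1 + ... + Y_n
for n in (10, 1_000, 1_000_000):
    print(n, S[n - 1] / n)                       # S_n / n approaches E[Y_1] = 2
```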

3.2 Branching processes

Let µ be a probability distribution on Z+, and consider a Markov process (Zn, n ≥ 0) in Z+ whose one-step transitions are determined by the following rule: given Zn = z, take z independent random variables Y1, . . . , Yz with law µ, and let Zn+1 have the same distribution as Y1 + · · · + Yz. In particular, 0 is an absorbing state for this process. This can be interpreted as follows: Zn is the number of individuals present in a population, and at each time, each individual dies after giving birth to a µ-distributed number of sons, independently of the others. Notice that E[Zn+1|Fn] = E[Zn+1|Zn] = mZn, where (Fn, n ≥ 0) is the natural filtration and m is the mean of µ, m = ∑_z z µ(z). Therefore, supposing m ∈ (0, ∞), we have:

Proposition 3.2.1 The process (m−nZn, n ≥ 0) is a non-negative martingale.

Notice that the fact that the martingale converges a.s. to a finite value immediately implies that when m < 1, there exists some n such that Zn = 0, i.e. the population becomes extinct in finite time: indeed, if Zn ≥ 1 for every n, then m^{−n} Zn ≥ m^{−n} → ∞, contradicting the a.s. finiteness of the limit. One also guesses that when m > 1, Zn should be of order m^n, so that the population should grow explosively, at least with positive probability. It is standard to show the following.

Exercise. Let ϕ(s) = ∑_{z∈Z+} µ(z) s^z be the generating function of µ; we suppose µ(1) < 1. Show that if Z0 = 1, then the generating function of Zn is the n-fold composition of ϕ with itself. Show that the probability q of eventual extinction of the population satisfies ϕ(q) = q, and that q < 1 ⇐⇒ m > 1. As a hint, ϕ is a convex function such that ϕ′(1) = m.

Notice that, still supposing Z0 = 1, the martingale (Mn = Zn/m^n, n ≥ 0) cannot be U.I. when m ≤ 1, since it converges to 0 a.s., so that E[M∞] < E[M0]. This leaves open the question whether P(M∞ > 0) > 0 in the case m > 1. We are going to address this problem in a particular case.

Proposition 3.2.2 Suppose m > 1, Z0 = 1 and σ² = Var(µ) < ∞. Then the martingale M is bounded in L2, and hence converges a.s. and in L2 to a variable M∞ with E[M∞] = 1; in particular, P(M∞ > 0) > 0.

Proof. We compute E[Z_{n+1}^2 | Fn] = Zn^2 m^2 + Zn σ^2. This shows that E[M_{n+1}^2] = E[Mn^2] + σ^2 m^{−n−2}, and therefore, since (m^{−n}, n ≥ 0) is summable, M is bounded in L2 (this summability is actually equivalent to m > 1).

Exercise. Show that under these hypotheses, the events {M∞ > 0} and {limn Zn = ∞} are equal up to an event of zero probability.
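To make Propositions 3.2.1 and 3.2.2 concrete, here is a small simulation sketch (my own, not part of the notes, with an arbitrary Poisson offspring law, for which m and σ² are both finite): it runs a Galton-Watson process and tracks Mn = Zn/m^n, whose empirical mean stays near 1 and whose limit is positive with positive probability when m > 1.

```python
import numpy as np

rng = np.random.default_rng(5)

def galton_watson(m, generations, z0=1):
    """Simulate Z_0, ..., Z_generations with Poisson(m) offspring law.

    Given Z_n = z, the next generation is a sum of z i.i.d. Poisson(m)
    variables, i.e. a Poisson(m z) variable.
    """
    Z = [z0]
    for _ in range(generations):
        Z.append(int(rng.poisson(m * Z[-1])) if Z[-1] > 0 else 0)
    return Z

m, gens = 1.5, 25
runs = [galton_watson(m, gens) for _ in range(10_000)]
M_final = [Z[-1] / m ** gens for Z in runs]              # M_n = Z_n / m^n

print("E[M_n] ~", sum(M_final) / len(M_final))           # ~ 1 (martingale started at 1)
print("P(M_n > 0) ~", sum(x > 0 for x in M_final) / len(M_final))
```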

3.3 A martingale approach to the Radon-Nikodym theorem

We begin with the following general remark. Let (Ω, F, (Fn), P) be a filtered probability space with F∞ = F, and let Q be a finite non-negative measure on (Ω, F). Let Pn and Qn denote the restrictions of P and Q to the measurable space (Ω, Fn). Suppose that for every n, Qn has a density Mn with respect to Pn, namely Qn(dω) = Mn(ω) Pn(dω), where Mn is an Fn-measurable non-negative function. We also sometimes write Mn = dQn/dPn.


Then it is immediate that (Mn, n ≥ 0) is a martingale with respect to the filtered space (Ω, F, (Fn), P). Indeed, E[Mn] = Q(Ω) < ∞, and for A ∈ Fn,

EP[Mn+1 1A] = EPn+1[Mn+1 1A] = Qn+1(A) = Qn(A) = EPn[Mn 1A] = EP[Mn 1A],

where EP, EPn denote expectations with respect to the measures P, Pn. A natural problem is to wonder whether the identity Qn = Mn · Pn passes to the limit Q = M∞ · P as n → ∞, where M∞ is the a.s. limit of the non-negative martingale M.

Proposition 3.3.1 Under these hypotheses, there exists a non-negative random variable X := dQ/dP such that Q = X · P if and only if (Mn, n ≥ 0) is U.I.

Proof. If M is U.I., then we can pass to the limit in E[Mm 1A] = Q(A) for A ∈ Fn as m → ∞, to obtain E[M∞ 1A] = Q(A) for every A ∈ ⋃_{n≥0} Fn. Since this last set is a π-system that generates F∞ = F, we obtain M∞ · P = Q by the uniqueness theorem for measures.

Conversely, if Q = X · P, then for A ∈ Fn we have Q(A) = E[Mn 1A] = E[X 1A], so that Mn = E[X|Fn], which shows that M is U.I.

The Radon-Nikodym theorem (in a particular case) reads as follows.

Theorem 3.3.1 (Radon-Nikodym) Let (Ω, F) be a measurable space such that F is separable, i.e. generated by a countable collection of events (Fk, k ≥ 1). Let P be a probability measure on (Ω, F) and Q be a finite non-negative measure on (Ω, F). Then the following statements are equivalent.

(i) Q is absolutely continuous with respect to P , namely

∀A ∈ F , P (A) = 0 =⇒ Q(A) = 0.

(ii) ∀ ε > 0,∃ δ > 0,∀A ∈ F , P (A) ≤ δ =⇒ Q(A) ≤ ε.

(iii) There exists a non-negative random variable X such that Q = X · P .

The separability condition on F can actually be lifted; see Williams’ book for the proof in the general case.

Proof. That (iii) implies (i) is straightforward.

If (ii) is not satisfied, then we can find a sequence Bn of events and an ε > 0 such that P(Bn) < 2^{−n} but Q(Bn) ≥ ε. By the Borel-Cantelli lemma, P(lim sup Bn) = 0, while Q(lim sup Bn), as the decreasing limit of Q(⋃_{k≥n} Bk) as n → ∞, must be ≥ lim sup_n Q(Bn) ≥ ε. Hence (i) does not hold for the set A = lim sup Bn. So (i) implies (ii).

Let us now assume (ii). Let (Fn) be the filtration in which Fn is the σ-algebra generated by the events F1, . . . , Fn. Notice that any event of Fn is a disjoint union of non-empty “atoms” of the form

⋂_{1≤i≤n} Gi,

where either Gi = Fi or Gi = Fi^c. We let An be the set of atoms of Fn. Let

Mn(ω) = ∑_{A∈An} (Q(A)/P(A)) 1A(ω),

with the convention that 0/0 = 0. Then it is easy to check that Mn is a density for Qn with respect to Pn, where Pn, Qn denote the restrictions to Fn as above. Indeed, if A ∈ An,

Qn(A) = (Q(A)/P(A)) P(A) = EPn[Mn 1A].

Therefore, (Mn, n ≥ 0) is a non-negative (Fn, n ≥ 0)-martingale, and Mn(ω) converges a.s. towards a limit M∞(ω). Moreover, the last proposition tells us that it suffices to show that (Mn) is U.I. to conclude the proof.

But note that we have E[Mn 1{Mn≥a}] = Q(Mn ≥ a). So for ε > 0 fixed, P(Mn ≥ a) ≤ E[Mn]/a = Q(Ω)/a < δ for all n as soon as a is large enough, with δ given by (ii), and this entails Q(Mn ≥ a) ≤ ε for every n. Hence the result.

Example. Let Ω = [0, 1) be endowed with its Borel σ-field, which is generated by the intervals Ik,j = [j2^{−k}, (j + 1)2^{−k}), k ≥ 0, 0 ≤ j ≤ 2^k − 1. The intervals Ik,j, 0 ≤ j ≤ 2^k − 1, are called the dyadic intervals of depth k; they span a σ-algebra which we call Fk. We let λ(dω) be the Lebesgue measure on [0, 1). Let ν be a finite non-negative measure on [0, 1), and

Mn(ω) = 2^n ∑_{j=0}^{2^n−1} 1_{In,j}(ω) ν(In,j).

We then obtain from the previous theorem that if ν is absolutely continuous with respect to λ, then ν = f · λ for some non-negative measurable f. We then see that, for λ-almost every x, if Ik(x) = [2^{−k}⌊2^k x⌋, 2^{−k}(⌊2^k x⌋ + 1)) denotes the dyadic interval of depth k containing x,

2^k ∫_{Ik(x)} f(y) λ(dy) → f(x) as k → ∞.

This is a particular case of the Lebesgue differentiation theorem.
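A numerical sketch of this dyadic approximation (my own illustration, not part of the notes, with the arbitrary density f(x) = 3x² on [0, 1)): the dyadic averages 2^k ν(Ik(x)) converge to f(x).

```python
def dyadic_average(x, k):
    """M_k(x) = 2^k * nu(I_k(x)) for the dyadic interval I_k(x) containing x.

    Here nu = f . lambda with f(y) = 3 y^2 on [0, 1), so that
    nu([a, b)) = b^3 - a^3 exactly.
    """
    j = int(2 ** k * x)
    a, b = j * 2.0 ** (-k), (j + 1) * 2.0 ** (-k)
    return 2 ** k * (b ** 3 - a ** 3)

x = 0.3
print([round(dyadic_average(x, k), 4) for k in (2, 5, 10, 15)], "->", 3 * x ** 2)
```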

3.4 Product martingales and likelihood ratio tests

Theorem 3.4.1 (Kakutani’s theorem) Let (Yn, n ≥ 1) be a sequence of independent non-negative random variables with mean 1. Let Fn = σ(Y1, . . . , Yn). Then Xn = ∏_{1≤k≤n} Yk, n ≥ 0, is an (Fn, n ≥ 0)-martingale, which converges a.s. to some X∞ ≥ 0. Letting an = E[√Yn], the following statements are equivalent:

1. X is U.I.

2. E[X∞] = 1

3. P (X∞ > 0) > 0


4. ∏_n an > 0.

Proof. The fact that X is a (non-negative) martingale follows from the fact that E[Xn+1|Fn] = Xn E[Yn+1|Fn] = Xn E[Yn+1] = Xn. For the same reason, the process

Mn = ∏_{k=1}^{n} (√Yk / ak), n ≥ 0,

is a non-negative martingale with mean E[Mn] = 1, and E[Mn^2] = ∏_{k=1}^{n} ak^{−2}. Thus, M is bounded in L2 if and only if ∏_n an > 0 (notice that an ∈ (0, 1], e.g. by the Schwarz inequality E[1 · √Yn] ≤ √E[Yn]).

Now, with the standard notation X*n = sup_{0≤k≤n} Xk, and using Doob’s L2 inequality,

E[X*n] ≤ E[(M*n)^2] ≤ 4 E[Mn^2],

which shows that if M is bounded in L2, then X*∞ is integrable, hence X is U.I. since it is dominated by X*∞. We thus have obtained 4. =⇒ 1. =⇒ 2. =⇒ 3., where the second implication comes from the optional stopping theorem for U.I. martingales, and the implication 2. =⇒ 3. is trivial. On the other hand, if ∏_n an = 0, then since Mn converges a.s. to some M∞ ≥ 0, √Xn = Mn ∏_{1≤k≤n} ak converges to 0, so that 3. does not hold. So 3. =⇒ 4., hence the result.

As an example of application of this theorem, consider a σ-finite measured space(E, E , λ) and let Ω = EN,F = E⊗N be the product measurable space. We let Xn(ω) =ωn, n ≥ 1, and Fn = σ(X1, . . . , Xn). One says that X is the canonical (E-valued)process.

Now suppose given two families of probability measures (µn, n ≥ 1) and (νn, n ≥1) that admit densities dµn = fndλ, dνn = gndλ with respect to λ. We suppose thatfn(x)gn(x) > 0 for every n, x. Let P =

⊗n≥1 µn, resp. Q =

⊗n≥1 νn denote the measures

on (Ω,F) under which (Xn, n ≥ 1) is a sequence of independent random variables withrespective laws µn (resp. νn). In particular, if A =

∏ni=1Ai×EN is a measurable rectangle

in Fn,

Q(A) =

∫En

n∏i=1

gi(xi)

fi(xi)

n∏i=1

fi(xi)dxi = EP (Mn 1A),

where EP denotes expectation with respect to P , and

Mn =n∏i=1

gi(Xi)

fi(Xi).

Since measurable rectangles of Fn form a π-system that span Fn, the probability Q|Fn isabsolutely continuous with respect to P |Fn , with density Mn, so that (Mn, n ≥ 1) is a non-negative martingale with respect to the filtered space (Ω,F , (Fn, n ≥ 0), P ). Kakutani’stheorem then shows that M converges a.s. and in L1 to its limit M∞ if and only if∏

n≥1

∫E

√fn(x)gn(x)λ(dx) > 0 ⇐⇒

∑n≥1

∫E

(√fn(x)−

√gn(x)

)2

λ(dx).


In this case, one has Q(A) = EP[M∞ 1A] for every measurable rectangle of F, and Q is absolutely continuous with respect to P with density M∞. In the opposite case, M∞ = 0 a.s., so Proposition 3.3.1 shows that Q and P are carried by two disjoint measurable sets.

3.4.1 Example: consistency of the likelihood ratio test

In the case where µn = µ and νn = ν for every n, we see that M∞ = 0 a.s. if (and only if) µ ≠ ν. This is called the consistency of the likelihood ratio test in statistics. Let us recall the background for the application of this test. Suppose given an i.i.d. sample X1, X2, . . . , Xn with an unknown common distribution. Suppose one wants to test the hypothesis (H0) that this distribution is P against the hypothesis (H1) that it is Q, where P and Q have everywhere positive densities f, g with respect to some common σ-finite measure λ (for example, a normal distribution and a Cauchy distribution). Letting Mn = ∏_{1≤i≤n} g(Xi)/f(Xi), we use the test 1{Mn≤1} for acceptance of H0 against H1. Then, under H0, M∞ = 0 a.s., so the probability of rejection P(Mn > 1) converges to 0. Similarly, under H1, M∞ = +∞ a.s., so the probability of rejection goes to 1.
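A simulation sketch of this consistency statement (mine, not from the notes), taking f to be the N(0,1) density and g the standard Cauchy density as in the parenthetical example: under H0 the log-likelihood ratio drifts to −∞, so the rejection probability P(Mn > 1) tends to 0.

```python
import numpy as np

rng = np.random.default_rng(11)

def rejection_rate(n, trials=2000):
    """Fraction of size-n samples with M_n > 1 when H0 holds (X_i ~ N(0,1)).

    M_n = prod_i g(X_i)/f(X_i), with f the N(0,1) density and g the
    standard Cauchy density; log M_n is used for numerical stability.
    """
    X = rng.standard_normal(size=(trials, n))
    log_f = -0.5 * X ** 2 - 0.5 * np.log(2 * np.pi)
    log_g = -np.log(np.pi * (1.0 + X ** 2))
    logM = (log_g - log_f).sum(axis=1)
    return float((logM > 0).mean())

print([rejection_rate(n) for n in (1, 10, 100, 1000)])   # tends to 0 as n grows
```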


Chapter 4

Continuous-parameter stochastic processes

In this chapter, we will consider the case where processes are indexed by a real interval I ⊂ R with non-empty interior; in many cases I will be R+. This makes the whole study more involved, as we now explain. In all that follows, the state space E is assumed to be a metric space, usually E = R or E = Rd endowed with the Euclidean norm.

4.1 Theoretical problems when dealing with continuous-time processes

Although the definitions of filtrations, adapted processes, stopping times, martingales, super- and sub-martingales are unchanged compared to the discrete case (see the beginning of Chapter 2), the use of continuous time induces important measurability problems. Indeed, there is no reason why an adapted process (ω, t) ↦ Xt(ω) should be a measurable map defined on Ω × I, nor even why the sample path t ↦ Xt(ω) should be measurable for a fixed ω. In particular, stopped processes like XT 1{T<∞} for a stopping time T have no reason to be random variables.

Even worse, there are in general “very few” stopping times: for example, first entrance times inf{t ≥ 0 : Xt ∈ A} for measurable (or even open or closed) subsets A of the state space E need not be stopping times.

This is the reason why we impose a priori requirements on the regularity of the random processes under consideration. A quite natural requirement is that they are continuous processes, i.e. that t ↦ Xt(ω) is continuous for a.e. ω, because a continuous function is determined by its values on a countable dense subset of I. More generally, we will consider processes that are right-continuous and admit left limits everywhere, a.s.; such processes are called cadlag, and are also determined by the values they take on a countable dense subset of I (the notation cadlag stands for the French ‘continu à droite, limite à gauche’).

We let C(I, E), D(I, E) denote the spaces of continuous and cadlag functions from I to E; we consider these sets as measurable spaces by endowing them with the product σ-algebra that makes the projections πt : X ↦ Xt measurable for every t ∈ I. Usually, we will consider processes with values in R, or sometimes Rd for some d ≥ 1 in the chapter on Brownian motion. The following proposition holds, of which (ii) is an analogue of 1., 2. in Proposition 2.1.2.

Proposition 4.1.1 Let (Ω, F, (Ft, t ∈ I), P) be a filtered probability space, and let (Xt, t ∈ I) be an adapted process with values in E.

(i) Suppose X is continuous (i.e. (Xt(ω), t ∈ I) ∈ C(I, E) for every ω). If A is a closed set and inf I > −∞, then the random time

TA = inf{t ∈ I : Xt ∈ A}

is a stopping time.

(ii) Let T be a stopping time, and suppose X is cadlag. Then XT 1{T<∞} : ω ↦ XT(ω)(ω) 1{T(ω)<∞} is an FT-measurable random variable. Moreover, the stopped process XT = (XT∧t, t ≥ 0) is adapted.

Proof. For (i), notice that if A is closed and X is continuous, then for every t ∈ I,

{TA ≤ t} = { inf_{s∈I∩Q, s≤t} d(Xs, A) = 0 },

where d(x, A) = inf_{y∈A} d(x, y) is the distance from x to the set A. Indeed, if Xs ∈ A for some s ≤ t, then for qn converging to s in Q ∩ I ∩ (−∞, t], Xqn converges to Xs, so that d(Xqn, A) converges to 0. Conversely, if there exist qn ∈ Q ∩ I ∩ (−∞, t] such that d(Xqn, A) converges to 0, then since inf I > −∞ we can extract a subsequence and assume qn converges to some s ∈ I ∩ (−∞, t], and this s has to satisfy d(Xs, A) = 0 by continuity of X. Since A is closed, this implies Xs ∈ A, so that TA ≤ t.

For (ii), first note that a random variable Z is FT-measurable if Z 1{T≤t} is Ft-measurable for every t ∈ I, by approximating Z by finite sums of the form ∑ αi 1Ai with Ai ∈ FT.

Notice also that if T is a stopping time then, letting ⌈x⌉ denote the smallest n ∈ Z+ with n ≥ x, Tn = 2^{−n} ⌈2^n T⌉ is also a stopping time, with Tn ≥ T, which decreases to T as n → ∞ (Tn = ∞ if T = ∞). Indeed, {Tn ≤ t} = {T ≤ 2^{−n} ⌊2^n t⌋} ∈ Ft (notice that ⌈x⌉ ≤ y if and only if x ≤ ⌊y⌋, where ⌊x⌋ is the largest n ∈ Z+ with n ≤ x). Moreover, Tn takes values in the set D*_n = {k 2^{−n} : k ∈ Z+} ∪ {∞} of dyadic numbers of level n (or ∞).

Therefore, XT 1{T<∞} 1{T≤t} = Xt 1{T=t} + XT 1{T<t}, which by the cadlag property is equal to

Xt 1{T=t} + lim_{n→∞} X_{Tn∧t} 1{T<t}.

The variables Xt 1{T=t} and X_{Tn∧t} 1{T<t} are Ft-measurable, because

X_{Tn∧t} = ∑_{d∈D*_n, d≤t} Xd 1{Tn=d} + Xt 1{t<Tn},

hence the result. For the statement on the stopped process, notice that for every t, XT∧t is FT∧t-measurable, hence Ft-measurable.

It turns out that (i) does not hold in general for cadlag processes, although it is a very subtle problem to find counterexamples. See Rogers and Williams’ book, Chapters II.74 and II.75. In particular, Lemma 75.1 therein shows that TA is a stopping time if A is compact and X is an adapted cadlag process, whenever the filtration (Ft, t ∈ I) satisfies the so-called “usual conditions”; see Section 4.3 for the definition of these conditions.

You may check as an exercise that the times TA, for open sets A and cadlag processes X, are stopping times with respect to the filtration (Ft+, t ∈ I), where

Ft+ = ⋂_{s>t} Fs.

Somehow, the filtration (Ft+) foresees what will happen ‘just after’ t.

4.2 Finite marginal distributions, versions

We now discuss the notion of the law of a process. If (Xt, t ∈ I) is a stochastic process, we can consider it as a random variable with values in the set E^I of maps f : I → E, where this last space is endowed with the product σ-algebra (the smallest σ-algebra that makes the projections f ∈ E^I ↦ f(t) measurable for every t ∈ I). It is then natural to consider the image measure µ of the probability P under the process X as the law of X. However, this measure is difficult to manipulate, and the quantities of true interest are the following simpler objects.

Definition 4.2.1 Let (Xt, t ∈ I) be a process. For every finite J ⊂ I, the finite marginal distribution of X indexed by J is the law µJ of the E^J-valued random variable (Xt, t ∈ J).

It is a nice fact that the finite marginal distributions {µJ : J ⊂ I, #J < ∞} uniquely characterize the law µ of the process (Xt, t ∈ I) as defined above. Indeed, by definition, if X and Y are cadlag processes having the same finite marginal laws, then their distributions agree on the π-system of finite “rectangles” of the form ∏_{s∈J} As × ∏_{t∈I\J} E for finite J ⊂ I, which generates the product σ-algebra; hence the distributions under consideration are equal. Notice that this uniqueness result does not imply the existence of a process with given marginal distributions.

The problem with (finite marginal) laws of processes is that they are powerless in dealing with properties of processes that involve more than countably many times, such as continuity or cadlag properties of the process. For example, if X is a continuous process, there are (many!) non-continuous processes that have the same finite marginal distributions as X: the finite marginal distributions just do not ‘see’ the sample path properties of the process. This motivates the following definition.

Definition 4.2.2 If X and X′ are two processes defined on some common probability space (Ω, F, P), we say that X′ is a version of X if for every t, Xt = X′t a.s.

In particular, two versions X and X′ of the same process share the same finite marginal distributions; however, this does not imply that, for a.e. ω, Xt(ω) = X′t(ω) for every t. This becomes true if both X and X′ are a priori known to be cadlag, for instance.


Example. To explain these very abstract notions, suppose we want to find a process (Xt, 0 ≤ t ≤ 1) whose finite marginal laws are Dirac masses at 0, namely

µJ((0, . . . , 0)) = P(Xs = 0 for all s ∈ J) = 1 (with #J zeros)

for every finite J ⊂ [0, 1]. Of course, the process Xt = 0, 0 ≤ t ≤ 1, satisfies this. However, the process X′t = 1{U}(t), 0 ≤ t ≤ 1, where U is a uniform random variable on [0, 1], is a version of X, and therefore has the same law as X. But of course it is not continuous, and P(X′t = 0 for all t ∈ [0, 1]) = 0. We thus want to consider it as a ‘bad’ version of the zero process. This example motivates the following way of dealing with processes: when considering a process whose finite marginal distributions are known, we first try to find the most regular version of the process that we can before studying it.

We will discuss two ‘regularization theorems’ in this course, the martingale regular-ization theorem and Kolmogorov’s continuity criterion, which are instances of situationswhen there exists a regular (continuous or cadlag) version of the stochastic process underconsideration.

4.3 The martingale regularization theorem

We consider here a martingale (Xt, t ≥ 0) on some filtered probability space (Ω,F , (Ft, t ≥0), P ). We let N be the set of events in F with probability 0,

Ft+ =⋂s>t

Fs, t ≥ 0,

and Ft = Ft+ ∪N .

Theorem 4.3.1 Let (Xt, t ≥ 0) be a martingale. Then there exists a cadlag process X

which is a martingale with respect to the filtered probability space (Ω,F , (Ft, t ≥ 0), P ),

so that for every t ≥ 0, Xt = E[Xt|Ft] a.s. If Ft = Ft for every t ≥ 0, X is therefore acadlag version of X.

We say that (Ft, t ≥ 0) satisfies the usual conditions if Ft = Ft for every t, that is,N ⊆ F0 and Ft+ = Ft (a filtration satisfying this last condition for t ≥ 0 is called right-continuous, notice that (Ft+, t ≥ 0) is right-continuous for every filtration (Ft, t ≥ 0)). Asa corollary of Theorem 4.3.1, in the case when the filtration satisfies the usual conditions, amartingale admits a cadlag version so there is “little to lose” to consider that martingalesare cadlag.

Lemma 4.3.1 A function f : Q+ → R admits a left and a right (finite) limit at everyt ∈ R+ if and only if for every rationals a < b and bounded I ⊂ Q, f is bounded on I andthe number

N([a, b], I, f) = sup

n ≥ 0 :

∃ 0 ≤ s1 < t1 < . . . < sn < tn, all in I ,f(si) < a, f(ti) > b, 1 ≤ i ≤ n

a upcrossings of f from a to b is finite.

Page 33: AdPr2006

4.3. THE MARTINGALE REGULARIZATION THEOREM 33

Proof of Theorem 4.3.1. We first show that X is bounded on bounded subsetsof Q+. Indeed, if I is such a subset and J = a1, . . . , ak is a finite subset of I witha1 < . . . < an, then Ml = Xal

, 1 ≤ l ≤ k is a martingale. Doob’s maximal inequalityapplied to the submartingale |M | then shows that

cP (M∗k > c) = cP (max

1≤l≤k|Xal

| > c) ≤ E[|Xak|] ≤ E[|XK |]

for any K > sup I. Therefore, taking a monotone limit over finite J ⊂ I with union I,we have

cP (supt∈I

|Xt| > c) ≤ E[|XK |].

This shows that P (supt∈I |Xt| <∞) = 1 by letting c→∞.

Let I still be a bounded subset of R+, and a < b ∈ Q+. By definition, we haveN([a, b], I,X) = sup

J⊂I,finiteN([a, b], J,X). So let J ⊂ I be a finite subset of the form

a1, a2, . . . , ak as above, and again let Ml = Xal, 1 ≤ l ≤ k. Doob’s upcrossing lemma

for this martingale gives

(b− a)E[N([a, b], J,X)] ≤ E[(Xak− a)−] ≤ E[(XK − a)−],

for any K ≥ sup I, because ((Xt − a)−, t ≥ 0) is a submartingale due to the convexity ofx 7→ (x − a)−. Taking the supremum over J shows that N([a, b], I,X) is a.s. bounded,because E[|XK |] <∞. This shows by letting K →∞ along integers, that N([a, b], I,X)is finite for every bounded subset I of Q+, and every a < b rationals, for every ω in anevent Ω0 with probability 1. Therefore, we can define

Xt(ω) = lims∈Q+,s>t

Xs(ω) , ω ∈ Ω0

and Xt(ω) = 0 for every t for ω /∈ Ω0. The process X thus obtained then is indeed

adapted to the filtration (Ft, t ≥ 0). It remains to show that X is an (Ft)-martingale,

satisfies E[Xt|Ft] = Xt, and is cadlag.

First, check that if X remains an (Ft ∨ N , t ∈ I)-martingale, because E[X|G ∨ N ] =E[X|G] in L1(Ω,G ∨N , P ) for any integrable X and sub-σ-algebra G ∈ F . Thus, we maysuppose that N ⊂ Ft for every t. Let s < t ∈ R+, and sn, n ≥ 0 be a (strictly) decreasing

sequence of rationals that converges to s, with s0 < t. Then Xs = limXsn = limE[Xt|Fsn ]by definition for ω ∈ Ω0. Now, the process (Mn = Xs−n , n ≤ 0) is a backwards martingalewith respect to the filtration (Gn = Fs−n , n ≤ 0). The backwards martingale convergence

theorem thus shows that Xs = E[Xt|Fs+], and therefore Xt = E[Xt|Ft]. Moreover,taking a rational sequence (tn) decreasing to t and using again the backwards martingale

convergence theorem, (Xtn) converges to Xt in L1, so that Xs = E[Xt|Fs] for every s ≤ t.

The only thing that remains to prove is the cadlag property. If t ∈ R+ and if Xs(ω)

does not converge to Xt(ω) as s ↓ t, then |Xt − Xs| > ε for some ε > 0 and for infinitely

many s > t, so that if ω ∈ Ω0, |Xt −Xu| > ε/2 for an infinite number of rationals u > t,

contradicting ω ∈ Ω0. The argument for showing that X has left limits is similar.

From now on, when considering martingales in continuous time, we will always taketheir cadlag version, provided the underlying filtration satisfies the usual hypotheses.

Page 34: AdPr2006

34 CHAPTER 4. CONTINUOUS-PARAMETER PROCESSES

4.4 Doob’s inequalities and convergence theorems for

martingales in continuous time

Considering cadlag martingales makes it straightforward to generalize to the continuouscase the inequalities of section 2.4, by density arguments. We leave to the reader to showthe following theorems which are analog to the discrete-time case.

Proposition 4.4.1 (A.s. convergence) Let (Xt, t ≥ 0) be a cadlag martingale whichis bounded in L1. Then Xt converges as t→∞ a.s. to an (a.s.) finite limit X∞.

To prove this, notice that convergence of Xt as t → ∞ to a (possibly infinite) limitis equivalent to the fact that the number of upcrossings of X from below a to above bover the time interval R+ is finite for every a < b rationals. However, by the cadlagproperty, it suffices to restrict our attention to the countable time set Q+ rather thanR+. Indeed, for each upcrossing of X from a to b between times s < t say, we can findrationals s′ > s, t′ > t as close to s, t as wanted so that X accomplishes an upcrossingfrom a to b between times s′, t′, and this implies that N(X,R+, [a, b]) = N(X,Q+, [a, b])(possibly infinite). Then, use similar arguments as those used in the first part of the proofof Theorem 4.3.1.

Proposition 4.4.2 (Doob’s inequalities) If (Xt, t ≥ 0) is a cadlag martingale andX∗t = sup0≤s≤t |Xs|, then for every c > 0, t ≥ 0,

cP (X∗t ≥ c) ≤ E[|Xt|].

Moreover, if p > 1 then

‖X∗t ‖p ≤

p

p− 1‖Xt‖p.

To prove this, notice that X∗t = sups∈t∪([0,t]∩Q) |Xs| by the cadlag property.

Proposition 4.4.3 (Lp convergence) (i) If X is a cadlag martingale and p > 1 thensupt≥0 ‖Xt‖p < ∞ if and only if X converges a.s. and in Lp to its limit X∞, and this ifand only if X is closed in Lp, i.e. there exists Z ∈ Lp so that E[Z|Ft] = Xt for every t,a.s. (one can then take Z = X∞).

(ii) If X is a cadlag martingale then X is U.I. if and only if X converges a.s. and inL1 to its limit X∞, and this if and only if X is closed (in L1).

Proposition 4.4.4 (Optional stopping) Let X be a cadlag U.I. martingale. Then forevery stopping times S ≤ T , one has E[XT |FS] = XS a.s.

Proof. Let Tn be the stopping time 2−nd2nT e as defined in the proof of Proposition4.1.1. The right-continuity of paths of X shows that XTn converges to XT a.s. Moreover,Tn takes values in the countable set D∗

n of dyadic rationals of level n (and ∞), so that

E[X∞|FTn ] =∑d∈D∗n

E[1Tn=dX∞|FTn ] =∑d∈D∗n

1Tn=dE[X∞|Fd]

Page 35: AdPr2006

4.5. KOLMOGOROV’S CONTINUITY CRITERION 35

(you should check this carefully). Now, since Xt converges to X∞ in L1, Xd = E[Xt|Fd] =E[X∞|Fd] a.s., and E[X∞|FTn ] = XTn . Passing to the limit as n → ∞ and using thebackwards martingale convergence theorem, we obtain E[X∞|F ′

T ] = XT where F ′T =⋂

n≥1FTn , and therefore E[X∞|FT ] = XT by the tower property, sinceXT is FT -measurable.The theorem then follows as in Theorem 2.6.1.

4.5 Kolmogorov’s continuity criterion

Theorem 4.5.1 (Kolmogorov’s continuity criterion) Let (Xt, 0 ≤ t ≤ 1) be a stochas-tic process with real values. Suppose there exist p > 0, c > 0, ε > 0 so that for everys, t ≥ 0,

E[|Xt −Xs|p] ≤ c|t− s|1+ε.

Then, there exists a modification X of X which is a.s. continuous (and even α-Holdercontinuous for any α ∈ (0, ε/p)).

Proof. Let Dn = k · 2−n, 0 ≤ k ≤ 2n denote the dyadic numbers of [0, 1] with level n,so Dn increases as n increases. Then letting α ∈ (0, ε/p), Markov’s inequality gives for0 ≤ k < 2n,

P (|Xk2−n−X(k+1)2−n| > 2−nα) ≤ 2npαE[|Xk2−n−X(k+1)2−n|p] ≤ 2npα2−n−nε ≤ 2−n2−(ε−pα)n.

Summing over Dn we obtain

P

(sup

0≤k<2n

|Xk2−n −X(k+1)2−n| > 2−nα)≤ 2−n(ε−pα),

which is summable. Therefore, the Borel-Cantelli lemma shows that for a.a. ω, thereexists Nω so that if n ≥ Nω, the supremum under consideration is ≤ 2−nα. Otherwisesaid, a.s.,

supn≥0

supk∈0,...,2n−1

|Xk2−n −X(k+1)2−n|2−nα

≤M(ω) <∞.

We claim that this implies that for every s, t ∈ D =⋃n≥0Dn, |Xs−Xt| ≤M ′(ω)|t− s|α,

for some M ′(ω) <∞ a.s. Indeed, if s, t ∈ D, s < t, and if r is the least integer such thatt − s > 2−r−1 we can write [s, t) as a disjoint unions of intervals of the form [r, r + 2−n)with r ∈ Dn and n > r, in such a way that for every n > r, at most two of these intervalshave length 2−n. This entails that

|Xs −Xt| ≤ 2∑n≥r+1

M(ω)2−nα ≤ 2(1− 2−α)−1M(ω)2−(r+1)α ≤M ′(ω)|t− s|α

where M ′(ω) < ∞ a.s. Therefore, the process (Xt, t ∈ D) is a.s. uniformly continuous(and even α-Holder continuous). Since D is an everywhere dense set in [0, 1], the latter

process a.s. admits a unique continuous extension X on [0, 1], which is also α-Holder

continuous (it is consistently defined by Xt = limnXtn , where (tn, n ≥ 0) is any D-valuedsequence converging to t). On the exceptional set where (Xd, d ∈ D) is not uniformly

Page 36: AdPr2006

36 CHAPTER 4. CONTINUOUS-PARAMETER PROCESSES

continuous, we let Xt = 0, 0 ≤ t ≤ 1, so X is continuous. It remains to show that X is aversion of X. To this end, we estimate by Fatou’s lemma

E[|Xt − Xt|p] ≤ lim infn

E[|Xt −Xtn|p],

where (tn, n ≥ 0) is any D-valued sequence converging to t. But since E[|Xt −Xtn|p] ≤c|t− tn|1+ε, this converges to 0 as n→∞. Therefore, Xt = Xt a.s. for every t.

The nice thing about this criterion is that is depends only on a control on the two-dimensional marginal distributions of the stochastic process.

In fact, the very same proof can give the following alternative

Corollary 4.5.1 Let (Xd, d ∈ D) be a stochastic process indexed by the set D of dyadicnumbers in [0, 1]. Assume that there exist c, p, ε > 0 so that for every s, t ∈ D,

E[|Xs −Xt|p] ≤ c|s− t|1+ε ,

thenalmost-surely, the process (Xd, d ∈ [0, 1]) has an extension (Xt, t ∈ [0, 1]) that iscontinuous, and even Holder-continuous of any index α ∈ (0, ε/p).

Page 37: AdPr2006

Chapter 5

Weak convergence

5.1 Definition and characterizations

Let (M,d) be a metric space, endowed with its Borel σ-algebra. All measures in thischapter will be measures on such a measurable space. Let (µn, n ≥ 0) be a sequenceof probability measures on M . We say that µn converges weakly to the non-negativemeasure µ if for every continuous bounded function f : M → R, one has µn(f) → µ(f).Notice that in this case, µ is automatically a probability measure since µ(1) = 1, andthe definition actually still makes sense if we suppose that µn (resp. µ) are just finitenon-negative measures on M .

Examples. Let (xn, n ≥ 0) be a M -valued sequence that converges to x. Then δxn

converges weakly to δx, where δa is the Dirac mass at a. This is just saying that f(xn) →f(x) for continuous functions.

Let M = [0, 1] and µn = n−1∑

0≤k≤n−1 δk/n. Then µn(f) is the Riemann sum

n−1∑

0≤k≤n−1 f(k/n), which converges to∫ 1

0f(x)dx if f is continuous, which shows that

µn converges weakly to Lebesgue’s measure on [0, 1].

In this two cases, notice that it is not true that µn(A) converges to µ(A) for everyBorel set A convergence. This ‘pointwise convergence’ is stronger, but much more rigidthan weak convergence. For example, δxn does not converge in that sense to δx unlessxn = x eventually. See e.g. Chapter III in Stroock’s book for a discussion on the variousexisting notions of convergence for measures.

Theorem 5.1.1 Let (µn, n ≥ 0) be a sequence of probability distributions. The followingassertions are equivalent:

1. µn converges weakly to µ

2. For every open subset G of M , lim infn µn(G) ≥ µ(G) (‘open sets can lose mass’)

3. For every closed subset F of M , lim supn µn(F ) ≤ µ(F ) (‘closed sets can gain mass’)

4. For every Borel subset A in M with µ(∂A) = 0, limn µn(A) = µ(A). (‘mass is lostor gained through the boundary’)

37

Page 38: AdPr2006

38 CHAPTER 5. WEAK CONVERGENCE

Proof. 1. =⇒ 2. Let G be an open subset with nonempty complement Gc. The distancefunction d(x,Gc) is continuous and positive if and only if x ∈ G. Let fM = 1∧(Md(x,Gc)).Then fM increases to 1G as M ↑ ∞. Now, µn(fM) ≤ µn(G) converges to µ(fM), so thatlim infn µn(G) ≥ µ(fM) for every M , and by monotone convergence letting M ↑ ∞, onegets the result.

2. ⇐⇒ 3. is obvious by taking complementary sets.

2.,3. =⇒ 4. Let A and A respectively denote the interior and the closure of A. Sinceµ(∂A) = µ(A \ A) = 0, we obtain µ(A) = µ(A) = µ(A).

lim supn

µn(A) ≤ µ(A) ≤ lim infn

µn(A),

and since A ⊂ A, this gives the result.

4. =⇒ 1. Let f : M → R+ be a continuous bounded non-negative function, then usingFubini’s theorem,∫

M

f(x)µn(dx) =

∫M

µn(dx)

∫ ∞

0

1t≤f(x)dt =

∫ K

0

µn(f ≥ t)dt,

where K is any upper bound for f . Now f ≥ t := x : f(x) ≥ t is a closed subsetof M , whose boundary is included in f = t, because f > t is open and included inf ≥ t, and their difference is f = t. However, there can be at most a countable setof numbers t such that µ(f = t) > 0, because

t : µ(f = t) > 0 =⋃n≥1

t : µ(f = t) ≥ n−1,

and the n-th set on the right-hand side has at most n elements. Therefore, for Lebesgue-almost all t, µ(∂f ≥ t) = 0 and therefore, 4. and dominated convergence over thefinite interval [0, K], where the integrated quantities are bounded by 1, show that µn(f)

converges to∫ K

0µ(f ≥ t)dt = µ(f). The case of functions taking values of both signs

is immediate.

As a consequence, one obtains the following important criterion for weak convergenceof measures on R. Recall that the distribution function of a non-negative finite measureµ on R is the cadlag function defined by Fµ(x) = µ((−∞, x]), x ∈ R.

Proposition 5.1.1 Let µn, n ≥ 0, µ be probability measures on R. Then the followingstatements are equivalent:

1. µn converges weakly to µ

2. for every x ∈ R such that Fµ is continuous at x, Fµn(x) converges to Fµ(x) asn→∞.

Proof. The continuity of Fµ at x exactly says that µ(∂Ax) = 0 where Ax = (−∞, x], so1. =⇒ 2. is immediate by Theorem 5.1.1.

Page 39: AdPr2006

5.2. CONVERGENCE IN DISTRIBUTION 39

Conversely, let G be an open subset of R, which we write as a countable union⋃k(ak, bk) of disjoint open intervals. Then

µn(G) =∑k

µn((ak, bk)) , (5.1)

while for every k and ak < a′ < b′ < bk. ,

µn((ak, bk)) = Fµn(bk−)− Fµn(ak) ≥ Fµn(b′)− Fµn(a′).

If we take a′, b′ to be continuity points of Fµ, we then obtain lim infn µn((ak, bk)) ≥Fµ(b

′) − Fµ(a′). Letting a′ ↓ ak, b

′ ↑ bk along continuity points of Fµ (such pointsalways form a dense set in R) gives lim infn µn((ak, bk)) ≥ µ((ak, bk)). On the other hand,applying Fatou’s lemma to (5.1) yields lim infn µn(G) ≥

∑k lim infn µn((ak, bk)), whence

lim infn µn(G) ≥ µ(G).

5.2 Convergence in distribution for random variables

If (Xn, n ≥ 0) is a sequence of random variables with values in a metric space (M,d), anddefined on possibly different probability spaces (Ωn,Fn, Pn), we say that Xn convergesin distribution to a random variable X on (Ω,F , P ) if the law of Xn converges weaklyto that of X. Otherwise said, Xn converges in distribution to X if for every continuousbounded function f , E[f(Xn)] converges to E[f(X)].

The two following examples are the probabilistic counterpart of the examples discussedin the beginning of the previous section.

Examples. If (xn) is a sequence in M that converges to x, then xn converges as n→∞to x in distribution, if the xn, n ≥ 0 and x are considered as random variables!

If U is a uniform random variable on [0, 1) and Un = n−1bnUc, we see that Un has lawµn and converges in distribution to U .

In the two cases we just discussed, the variables under consideration even convergea.s., which directly entails convergence in distribution, see the example sheets.

The notion of convergence in distribution is related to the other notions of convergencefor random variables as follows. See the Example sheet 3 for the proof.

Proposition 5.2.1 1. If (Xn, n ≥ 1) is a sequence of random variables that converges inprobability to some random variable X, then Xn converges in distribution to X.

2. If (Xn, n ≥ 1) is a sequence of random variables that converges in distribution tosome constant random variable c, then Xn converges to c in probability.

Using Proposition 5.1.1, we can discuss the following

Page 40: AdPr2006

40 CHAPTER 5. WEAK CONVERGENCE

Example: the central limit theorem. The central limit theorem says that if (Xn, n ≥1) is a sequence of iid random variables in L2 with m = E[X1] and σ2 = Var (X1), thenfor every a < b in R,

P

(a ≤ Sn −mn

σ√n

≤ b

)→n→∞

1√2π

∫ b

a

e−x2/2dx,

where Sn = X1 + . . . + Xn. This is exactly saying that (Sn − mn)/(σ√n) converges in

distribution as n→∞ to a Gaussian N (0, 1) random variable.

5.3 Tightness

Definition 5.3.1 Let µi, i ∈ I be a family of probability measures on M . This familyis said to be tight if for every ε > 0, there exists a compact subset K ⊂M such that

supi∈I

µi(M \K) ≤ ε,

i.e. most of the mass of µi is contained in K, uniformly in i ∈ I.

Proposition 5.3.1 (Prokhorov’s theorem) Suppose that the sequence of probabilitymeasures (µn, n ≥ 0) on M is tight. Then there exists a subsequence (µnk

, k ≥ 0) alongwhich µn converges weakly to some limiting µ.

The proof is considerably eased when M = R, which we will suppose. For the generalcase, see Billingsley’s book Convergence of Probability Measures. Notice that in particular,if (µn, n ≥ 0) is a sequence of probability measures on a compact space, then there existsa subsequence µnk

weakly converging to some µ.

Proof. Let Fn be the distribution function of µn. Then it is easy by a diagonal extractionargument to find an extraction (nk, k ≥ 0) and a non-decreasing function F : Q → [0, 1]such that Fnk

(r) → F (r) as k → ∞ for every rational r. The function F is extended onR as a cadlag non-decreasing function by the formula F (x) = limr↓x,r∈Q F (r). It is thenelementary by a monotonicity argument to show that Fnk

(x) → F (x) for every x whichis a continuity point of F .

To conclude, we must check that F is the distribution function of some measure µ.But the tightness shows that for every ε > 0, there exists A > 0 such that Fn(A) ≥ 1− εand Fn(−A) ≤ ε for every n. By further choosing A so that F is continuous at A and−A, we see that F (A) ≥ 1 − ε and F (−A) ≤ ε, whence F has limits 0 and 1 at −∞and +∞. By a standard corollary of Caratheodory’s theorem, there exists a probabilitymeasure µ having F as its distribution function.

Remark. The fact that Fn converges up to extraction to a function F , which need not bea probability distribution function unless the tightness hypothesis is verified, is a particularcase of Helly’s theorem, and says that up to extraction, a family of probability laws µnconverges vaguely to a possibly defective measure µ (i.e. of mass ≤ 1), i.e. µn(f) → µ(f)

Page 41: AdPr2006

5.4. LEVY’S CONVERGENCE THEOREM 41

for every f with compact support. The problem that could appear is that some of the massof the µn’s could ‘go to infinity’, for example δn converges vaguely to the zero measure asn→∞, and does not converge weakly. This phenomenon of mass ‘going away’ is exactlywhat Prokhorov’s theorem prevents from happening.

In many situations, showing that a sequence of random variables Xn converges indistribution to a limiting X with law µ is done in two steps. One first shows that thesequence (µn, n ≥ 0) of laws of the Xn form a tight sequence. Then, one shows that thelimit of µn along any subsequence cannot be other than µ. This will be illustrated in thenext section.

5.4 Levy’s convergence theorem

In this section, we let d be a positive integer and consider only random variables withvalues in the states space Rd.

Recall that the characteristic function of an Rd-valued random variable X is the func-tion ΦX : Rd → C defined by ΦX(λ) = E[exp(i〈λ,X〉)]. It is a continuous function onRd, such that ΦX(0) = 1. Moreover, it induces an injective mapping from (distributionsof) random variables to complex-valued functions defined on Rd, in the sense that tworandom variables with distinct distributions have distinct characteristic functions.

The following theorem is extremely useful in practice.

Theorem 5.4.1 (Levy’s convergence theorem) Let (Xn, n ≥ 0) be a sequence of ran-dom variables.

(i) If Xn converges in distribution to a random variable X, then ΦXn(λ) converges toΦX(λ) for every λ ∈ Rd.

(ii) If ΦXn(λ) converges to Ψ(λ) for every λ ∈ Rd, where Ψ is a function which iscontinuous at 0, then Ψ is a characteristic function, i.e. there exists a random variable Xsuch that Ψ = ΦX , and moreover, Xn converges in distribution to X.

Corollary 5.4.1 If (Xn, n ≥ 0), X are random variables in Rd, then Xn converges indistribution to X as n→∞ if and only if ΦXn converges to ΦX pointwise.

The proof of (i) in Levy’s theorem is immediate since the function x 7→ exp(iλ · x) iscontinuous and bounded from Rd to C. For the proof of (ii), we will need to show thatthe hypotheses imply the tightness of the sequence of laws of Xn, n ≥ 0. To this end, thefollowing bound is very useful.

Lemma 5.4.1 Let X be a random variable with values in Rd. Then for any norm ‖ · ‖on Rd there exists a constant C > 0 (depending on d and on the choice of the norm) suchthat

P (‖X‖ ≥ K) ≤ CKd

∫[−K−1,K−1]d

(1−<ΦX(u))du.

Page 42: AdPr2006

42 CHAPTER 5. WEAK CONVERGENCE

Proof. Let µ be the distribution of X. Using Fubini’s theorem and a simple recursion,it is easy to check that

1

λd

∫[−λ,λ]d

(1−<ΦX(u))du = 2d∫

Rd

µ(dx)

(1−

d∏i=1

sin(λxi)

λxi

).

Now, the continuous function sinc : t ∈ R 7→ t−1 sin t is such that there exists 0 < c < 1such that |sinc t| ≤ c for every t ≥ 1, so that f : u ∈ Rd 7→

∏di=1 sinui/ui is such that

|f(u)| ≤ c as soon as ‖u‖∞ ≥ 1. Therefore, 1 − f is a non-negative continuous functionwhich is ≥ 1 − c when ‖u‖∞ ≥ 1. Letting C = 2d(1 − c)−1 entails that C(1 − f(u)) ≥1‖u‖∞≥1. Putting things together, one gets the result for the norm ‖·‖∞, and the generalresult follows from the equivalence of norms in finite-dimensional vector spaces.

Proof of Levy’s theorem. Suppose ΦXn converges pointwise to a limit Ψ that iscontinuous at 0. Then, |1−<ΦXn| being bounded above by 2, fixing ε > 0, the dominatedconvergence theorem shows that for any K > 0

limnKd

∫[−K−1,K−1]d

(1−<ΦXn(u))du = Kd

∫[−K−1,K−1]d

(1−<Ψ(u))du.

By taking K large enough, we can make this limiting value < ε/(2Cd), because Ψ iscontinuous at 0, and it follows by the lemma that for every n large enough, P (|Xn| ≥K) ≤ ε. Up to increasing K, this then holds for every n, showing tightness of the familyof laws of the Xn. Therefore, up to extracting a subsequence, we see from Prokhorov’stheorem thatXn converges in distribution to a limitingX, so that ΦXn converges pointwiseto ΦX along this subsequence (by part (i)). This is possible only if ΦX = Ψ, showingthat Ψ is a characteristic function. Moreover, this shows that the law of X is the onlypossible probability measure which is the weak limit of the laws of the Xn along somesubsequence, so Xn must converge to X in distribution.

More precisely, if Xn did not converge in distribution to X, we could find a continuousbounded f , some ε > 0 and a subsequence Xnk

such that for all k,

|E[f(Xnk)]− E[f(X)]| > ε (5.2)

But since the laws of (Xnk, k ≥ 0) are tight, we could find a further subsequence along

which Xnkconverges in distribution to some X ′, which by (i) would satisfy ΦX′ = Ψ = ΦX

and thus have same distribution as X, contradicting (5.2).

Page 43: AdPr2006

Chapter 6

Introduction to Brownian motion

6.1 History up to Wiener’s theorem

This chapter is devoted to the construction and some properties of one of probabilitytheory’s most fundamental objects. Brownian motion earned its name after R. Brown,who observed around 1827 that tiny particles of pollen in water have an extremely erraticmotion. It was observed by Physicists that this was due to a important number of randomshocks undertaken by the particles from the (much smaller) water molecules in motionin the liquid. A. Einstein established in 1905 the first mathematical basis for Brownianmotion, by showing that it must be an isotropic Gaussian process. The first rigorousmathematical construction of Brownian motion is due to N. Wiener in 1923, using Fouriertheory.

In order to motivate the introduction of this object, we first begin by a “microscopical”depiction of Brownian motion. Suppose (Xn, n ≥ 0) is a sequence of Rd valued randomvariables with mean 0 and covariance matrix σ2Id, which is the identity matrix in ddimensions, for some σ2 > 0. Namely, if X1 = (X1

1 , . . . , Xd1 ),

E[X i1] = 0 , E[X i

1Xj1 ] = σ2δij , 1 ≤ i, j ≤ d.

We interpret Xn as the spatial displacement resulting from the shocks due to watermolecules during the n-th time interval, and the fact that the covariance matrix is scalarstands for an isotropy assumption (no direction of space is priviledged).

From this, we let Sn = X1 + . . . + Xn and we embed this discrete-time process intocontinuous time by letting

B(n)t = n−1/2S[nt] , t ≥ 0.

Let | · | be the Euclidean norm on Rd and for t > 0 and X, y ∈ Rd, define

pt(x) =1

(2πt)d/2exp

(−|x|

2

2t

),

which is the density of the Gaussian distribution N (0, tId) with mean 0 and covariancematrix tId. By convention, the Gaussian law N (m, 0) is the Dirac mass at m.

43

Page 44: AdPr2006

44 CHAPTER 6. BROWNIAN MOTION

Proposition 6.1.1 Let 0 ≤ t1 < t2 < . . . < tk. Then the finite marginal distributions ofB(n) with respect to times t1, . . . , tk converge weakly as n→∞. More precisely, if F is abounded continuous function, and letting x0 = 0, t0 = 0,

E[F (B

(n)t1 , . . . , B

(n)tk

)]→n→∞

∫(Rd)k

F (x1, . . . , xk)∏

1≤i≤k

pσ2(ti−ti−1)(xi − xi−1)dxi.

Otherwise said, (B(n)t1 , . . . , B

(n)tk

) converges in distribution to (G1, G2, . . . , Gk), which is arandom vector whose law is characterized by the fact that (G1, G2 − G1, . . . , Gk − Gk−1)are independent centered Gaussian random variables with respective covariance matricesσ2(ti − ti−1)Id.

Proof. With the notations of the theorem, we first check that (B(n)t1 , B

(n)t2 −B

(n)t1 , . . . , B

(n)tk−

B(n)tk−1

) is a sequence of independent random variables. Indeed, one has for 1 ≤ i ≤ k,

B(n)ti −B

(n)ti−1

=1√n

[nti]∑j=[nti−1]+1

Xj,

and the independence follows by the fact that (Xj, j ≥ 0) is an i.i.d. family. Even better,we have the identity in distribution for the i-th increment

B(n)ti −B

(n)ti−1

d=

√[nti]− [nti−1]√

n

1√[nti]− [nti−1]

[nti]−[nti−1]∑j=1

Xj,

and the central limit theorem shows that this converges in distribution to a Gaussian lawN (0, σ2(ti− ti−1)Id). Summing up our study, and introducing characteristic functions, wehave shown that for every ξ = (ξj, 1 ≤ j ≤ k),

E

[exp

(ik∏j=1

ξj(B(n)tj −B

(n)tj−1

)]=

k∏j=1

E[exp

(iξj(B

(n)tj −B

(n)tj−1

)]→n→∞

k∏j=1

E[exp (iξj(Gj −Gj−1)]

= E

[exp

(ik∏j=1

ξi(Gj −Gj−1)

)],

where G1, . . . , Gk is distributed as in the statement of the proposition. By Levy’s conver-gence theorem we deduce that increments of B(n) between times ti converge to incrementsof the sequence Gi, which is easily equivalent to the statement.

This gives the clue that B(n) should converge to a process B whose increments areindependent and Gaussian with covariances dictated by the above formula. This will beset in a rigorous way at the end of this section, with Donsker’s invariance theorem.

Page 45: AdPr2006

6.1. WIENER’S THEOREM 45

Definition 6.1.1 A Rd-valued stochastic process (Bt, t ≥ 0) is called a standard Brownianmotion if it is a continuous process, that satisfies the following conditions:

(i) B0 = 0 a.s.,

(ii) for every 0 = t0 ≤ t1 ≤ t2 ≤ . . . ≤ tk, the increments (Bt1−Bt0 , Bt2−Bt1 , . . . , Btk−Btk−1

) are independent, and

(iii) for every t, s ≥ 0, the law of Bt+s − Bt is Gaussian with mean 0 and covariancesId.

The term “standard” refers to the fact that B1 is normalized to have variance Id, andthe choice B0 = 0.

The characteristic properties (i), (ii), (iii) exactly amount to say that the finite-dimensional marginals of a Brownian motion are given by the formula of Proposition6.1.1. Therefore the law of the Brownian motion is uniquely determined. We now showWiener’s theorem that Brownian motion exists!

Theorem 6.1.1 (Wiener) There exists a Brownian motion on some probability space.

Proof. We will first prove the theorem in dimension d = 1 and construct a process(Bt, 0 ≤ t ≤ 1) satisfying the properties of a Brownian motion.

Let D0 = 0, 1, Dn = k2−n, 0 ≤ k ≤ 2n for n ≥ 1, and D =⋃n≥0Dn be the set of

dyadic rational numbers in [0, 1]. On some probability space (Ω,F , P ), let (Zd, d ∈ D) bea collection of independent random variables all having a Gaussian distribution N (0, 1)with mean 0 and variance 1. We are first going to construct the process (Bd, d ∈ D) sothat Bd is a linear combination of the Zd′ ’s for every d.

It is a well-known and important fact that if random variables X1, X2, . . . are linearcombinations of independent centered Gaussian random variables, then X1, X2, . . . are in-dependent if and only if they are pairwise uncorrelated, namely Cov (Xi, Xj) = E[XiXj] =0 for every i 6= j.

We set B0 = 0 and Bd = Z1. Inductively, given (Bd, d ∈ Dn−1), we build (Bd, d ∈ Dn)in such a way that

• (Bd, d ∈ Dn) satisfies (i), (ii), (iii) in the definition of the Brownian motion (wherethe instants under consideration are taken in Dn).

• the random variables (Zd, d ∈ D \Dn) are independent of (Bd, d ∈ Dn).

To this end, take d ∈ Dn \Dn−1, and let d− = d−2−n and d+ = d+2−n so that d−, d+

are consecutive dyadic numbers in Dn−1. Then write

Bd =Bd− +Bd+

2+

Zd2(n+1)/2

.

Then Bd−Bd− = (Bd+−Bd−)/2+Zd/2(n+1)/2 and Bd+−Bd = (Bd+−Bd−)/2−Zd/2(n+1)/2.

Now notice that Nd := (Bd+ − Bd−)/2 and N ′d := Zd/2

(n+1)/2 are by the inductionhypothesis two independent centered Gaussian random variables with variance 2−n−1.From this, one deduces Cov (Nd + N ′

d, Nd − N ′d) = Var (Nd) − Var (N ′

d) = 0, so that

Page 46: AdPr2006

46 CHAPTER 6. BROWNIAN MOTION

the increments Bd − Bd− and Bd+ − Bd are independent with variance 2−n, as shouldbe. Moreover, these increments are independent of the increments Bd′+2−n−1 − Bd′ ford′ ∈ Dn−1, d

′ 6= d− and of Zd′ , d′ ∈ Dn \ Dn−1, d

′ 6= d so they are independent of theincrements Bd′′+2−n −Bd′′ for d′′ ∈ Dn, d

′′ /∈ d−, d. This allows the induction argumentto proceed one step further.

Thus, we have a process (Bd, d ∈ D) satisfying the properties of Brownian motion.Note that Bt −B − s has same

Let s ≤ t ∈ D, and notice that for every p > 0, since Bt−Bs has same law as√t− sN ,

where N is a standard Gaussian random variable,

E[|Bt −Bs|p] = |t− s|p/2E[|N |p].

Since a Gaussian random variable admits moments of all orders, it follows from Corollary4.5.1 that (Bd, d ∈ D) a.s. admits a continuous continuation (Bt, 0 ≤ t ≤ 1).

Up to modifying B on the exceptional event where such an extension does not exist,replacing it by the 0 function for instance, we see that B can be supposed to be continuousfor every ω.

We now check that (Bt, t ∈ [0, 1]) thus constructed has the properties of Brownianmotion. Let 0 = t0 < t1 < . . . < tk, and let 0 = tn0 < tn1 < . . . < tnk be dyadic numberssuch that tni converges to ti as n→∞. Then by continuity, (Btn1

, . . . , Btnk) converges a.s.

to (Bt1 , . . . , Btk) as n→∞, while on the other hand, (Btnj−Btnj−1,1≤j≤k) are independent

Gaussian random variables with variances (tnj − tnj−1, 1 ≤ j ≤ k), so it is not difficultusing Lvy’s theorem to see that this converges in distribution to independent Gaussianrandom variables with respective variances tj − tj−1, which thus is the distribution of(Btj −Btj−1

, 1 ≤ j ≤ k), as wanted.

It is now easy to construct a Brownian motion indexed by R+: simply take independentstandard Brownian motions (Bi

t, 0 ≤ t ≤ 1), i ≥ 0 as we just constructed, and let

Bt =

btc−1∑i=0

Bi1 +B

btct−btc , t ≥ 0 .

It is easy to check that this has the wanted properties.

Finally, it is straightforward to build a Brownian motion in Rd, by taking d independentcopies B1, . . . , Bd of B and checking that ((B1

t , . . . , Bdt ), t ≥ 0) is a Brownian motion in

Rd.

Let ΩW = C(R+,Rd) be the ‘Wiener space’ of continuous functions, endowed with theproduct σ-algebraW (or the Borel σ-algebra associated with the compact-open topology).Let Xt(w) = w(t), t ≥ 0 denote the canonical process (w ∈ ΩW ).

Proposition 6.1.2 (Wiener’s measure) There exists a unique measure W0(dw) on(ΩW ,W), such that (Xt, t ≥ 0) is a standard Brownian motion on (ΩW ,W ,W0(dw)).

Proof. Let (Bt, t ≥ 0) be a standard Brownian motion defined on some probability space(Ω,F , P ) The distribution of B, i.e. the image measure of P by the random variable

Page 47: AdPr2006

6.2. FIRST PROPERTIES 47

B : Ω → ΩW , is a measure W0(dw) satisfying the conditions of the statement. Uniquenessis obvious because such a measure is determined by the finite-dimensional marginals ofBrownian motion.

For x ∈ Rd we also let Wx(dw) to be the image measure of W by (wt, t ≥ 0) 7→(x + wt, t ≥ 0). A (continuous) process with law Wx(dw) is called a Brownian motionstarted at x.

We let (FBt , t ≥ 0) be the natural filtration of (Bt, t ≥ 0).

Notice that Kolmogorov’s continuity lemma shows that a standard Brownian motionis also a.s. locally Holder continuous with any exponent α < 1/2, since it is for everyα < 1/2− 1/p for some integer p.

6.2 First properties

The first few following basic (and fundamental) invariance properties of Brownian motionare left as an exercise.

Proposition 6.2.1 Let B be a standard Brownian motion in Rd.

1. If U ∈ O(n) is an orthogonal matrix, then UB = (UBt, t ≥ 0) is again a Brownianmotion. In particular, −B is a Brownian motion.

2. If λ > 0 then (λ−1/2Bλt, t ≥ 0) is a standard Brownian motion (scaling property)

3. For every t ≥ 0, the shifted process (Bt+s − Bt, s ≥ 0) is a Brownian motion inde-pendent of FB

t (simple Markov property).

We now turn to less trivial path properties of Brownian motion. We begin with

Theorem 6.2.1 (Blumenthal’s 0− 1 law) Let B be a standard Brownian motion. Theσ-algebra FB

0+ =⋂ε>0FB

ε is trivial, i.e. constituted of the events of probability 0 or 1.

Proof. Let 0 < t1 < t2 < . . . < tk and A ∈ FB0+. Then if F is continuous bounded

function (Rd)k → R, we have by continuity of B and the dominated convergence theorem,

E[1AF (Bt1 , . . . , Btk)] = limε↓0

E[1AF (B(ε)t1−ε, . . . , B

(ε)tk−ε)],

where Bε = (Bt+ε−Bε , t ≥ 0). On the other hand, since A is FBε -measurable for any ε > 0,

the simple Markov property shows that this is equal to

P (A) limε↓0

E[F (Bt1−ε, . . . , Btk−ε)],

which is P (A)E[F (Bt1 , . . . , Btk)], using again dominated convergence and continuity of Band F . This entails that F0+ is independent of σ(Bs, s ≥ 0) = FB

∞. However, FB∞ contains

FB0+, so that the latter σ-algebra is independent of itself, and P (A) = P (A∩A) = P (A)2,

entailing the result.

Page 48: AdPr2006

48 CHAPTER 6. BROWNIAN MOTION

Proposition 6.2.2 (i) For d = 1 and t ≥ 0, let St = sup0≤s≤tBs and It = inf0≤s≤tBs

(these are random variables because B is continuous). Then almost-surely, for everyε > 0, one has

Sε > 0 and Iε < 0.

In particular, there exists a zero of B in any interval of the form (0, ε), ε > 0.

(ii) A.s.,supt≥0

Bt = − inft≥0

Bt = +∞.

(iii) Let C be an open cone in Rd with non-empty interior and origin at 0, i.e. a setof the form tu : t > 0, u ∈ A, where A is an non-empty open subset of the unit sphereof Rd. If

HC = inft > 0 : Bt ∈ Cis the first hitting time of C, then HC = 0 a.s.

Proof. (i) The probability that Bt > 0 is 1/2 for every t, so P (St > 0) ≥ 1/2, and there-fore if tn, n ≥ 0 is any sequence decreasing to 0, P (lim supnBtn > 0) ≥ lim supn P (Btn >0) = 1/2. Since the event lim supnBtn > 0 is in F0+, Blumenthal’s law shows that itsprobability must be 1. The same is true for the infimum by considering the Brownianmotion −B.

(ii) Let S∞ = supt≥0Bt. By scaling invariance, for every λ > 0, λS∞ = supt≥0 λBt hassame law as supt≥0Bλ2t = S∞. This is possible only if either S∞ ∈ 0,∞ a.s., however,it cannot be 0 by (i).

(iii) The cone C is invariant by multiplication by a positive scalar, so that P (Bt ∈ C)is the same as P (B1 ∈ C) for every t by the scaling invariance of Brownian motion. Now,if C has nonempty interior, it is straightforward to check that P (B1 ∈ C) > 0, and oneconcludes similarly as above. Details are left to the reader.

6.3 The strong Markov property

We now want to prove an important analog of the simple Markov property, where de-terministic times are replaced by stopping times. To begin with, we extend a little thedefinition of Brownian motion, by allowing it to start from a random location, and byworking with filtrations that are larger with the natural filtration of standard Brownianmotions.

We say that B is a Brownian motion (started at B0) if (Bt −B0, t ≥ 0) is a standardBrownian motion which is independent of B0. Otherwise said, it is the same as thedefinition as a standard Brownian motion, except that we do not require that B0 = 0. Ifwe want to express this on the Wiener space with the Wiener measure, we have for everymeasurable functional F : ΩW → R+,

E[F (Bt, t ≥ 0)] = E[F (Bt −B0 +B0, t ≥ 0)],

and since (Bt −B0) has law Wx, this is∫Rd

P (B0 ∈ dx)

∫ΩW

W0(dw)F (x+ w(t), t ≥ 0) =

∫Rd

P (B0 ∈ dx)Wx(F ) = E[WB0(F )],

Page 49: AdPr2006

6.3. THE STRONG MARKOV PROPERTY 49

where as above, Wx is the image of W0 by the translation w 7→ x + w, and WB0(F )is the random variable ω 7→ WB0(ω)(F ). Using Proposition 1.3.4 actually shows thatE[F (B)|B0] = WB0(F ).

Let (Ft, t ≥ 0) be a filtration. We say that a Brownian motion B is an (Ft)-Brownianmotion if B is adapted to (Ft), and if B(t) = (Bt+s − Bt, s ≥ 0) is independent of Ft forevery t ≥ 0. For instance, if (Ft) is the natural filtration of a 2-dimensional Brownianmotion (B1

t , B2t , t ≥ 0), then (B1

t , t ≥ 0) is an (Ft)-Brownian motion. If B′ is a standardBrownian motion and X is a random variable independent of B′, then B = (X+B′

t, t ≥ 0)is a Brownian motion (started at B0 = X), and it is an (FB

t ) = (σ(X) ∨ FB′t )-Brownian

motion. A Brownian motion is always an (FBt )-Brownian motion. If B is a standard

Brownian motion, then the completed filtration Ft = FBt ∨N (N being the set of events

of probability 0) can be shown to be right-continuous, i.e. Ft+ = Ft for every t ≥ 0, andB is an (Ft)-Brownian motion.

Let (Bt, t ≥ 0) be an (Ft)-Brownian motion in Rd and T be an (Ft)-stopping time.

We let B(T )t = BT+t −BT for every t ≥ 0 on the event T <∞, and 0 otherwise. Then

Theorem 6.3.1 (Strong Markov property) Conditionally on T < ∞, the processB(T ) is a standard Brownian motion, which is independent of FT . Otherwise said, condi-tionally given FT and T <∞, the process (BT+t, t ≥ 0) is an (FT+t)-Brownian motionstarted at BT .

Proof. Suppose first that T <∞ a.s. Let A ∈ FT , and consider times t1 < t2 < . . . < tk.We want to show that for every bounded continuous function F on (Rd)k,

E[1AF (B(T )t1 , . . . , B

(T )tk

)] = P (A)E[F (Bt1 , . . . , Btk)]. (6.1)

Indeed, taking A = Ω entails that B(T ) is a Brownian motion, while letting A vary in FTentails the independence of (B

(T )t1 , . . . , B

(T )tk

) and FT for every t1, . . . , tk, hence of B(T ) andFT .

Now, suppose first that T takes its values in a countable subset E of R+. Then

E[1AF (B(T )t1 , . . . , B

(T )tk

)] =∑s∈E

E[1A∩T=sF (B(s)t1 , . . . , B

(s)tk

)]

=∑s∈E

P (A ∩ T = s)E[F (Bt1 , . . . , Btk)],

where we used the simple Markov property and the fact that A ∩ T = s ∈ Fs bydefinition. Back to the general case, we can apply this result to the stopping time Tn =2−nd2nT e. Since Tn ≥ T , it holds that FT ⊂ FTn so that we obtain for A ∈ FT

E[1AF (B(Tn)t1 , . . . , B

(Tn)tk

)] = P (A)E[F (Bt1 , . . . , Btn)]. (6.2)

Now, by a.s. continuity of B, it holds that B(Tn)t converges a.s. to B

(T )t as n → ∞, for

every t ≥ 0. Since F is bounded, the dominated convergence theorem allows to pass tothe limit in (6.2), obtaining (6.1).

Finally, if P (T = ∞) > 0, check that (6.1) remains true when replacing A by A∩T <∞, and divide by P (T <∞).

Page 50: AdPr2006

50 CHAPTER 6. BROWNIAN MOTION

An important example of application of the strong Markov property is the so-calledreflection principle. Recall that St = sup0≤s≤tBs.

Theorem 6.3.2 (Reflection principle) Let (Bt, t ≥ 0) be an (Ft)-Brownian motionstarted at 0, and T be an (Ft)-stopping time. Then, the process

Bt = Bt1t≤T + (2BT −Bt)1t>T , t ≥ 0

is also an (Ft)-Brownian motion started at 0.

Proof. By the strong Markov property, the processes (Bt, 0 ≤ t ≤ T ) and B(T ) areindependent. Moreover, B(T ) is a standard Brownian motion, and hence has same lawas −B(T ). Therefore, the pair ((Bt, 0 ≤ t ≤ T ), B(T )) has same law as ((Bt, 0 ≤ t ≤T ),−B(T )). On the other hand, the trajectory B is a measurable G((Bt, 0 ≤ t ≤ T ), B(T )),where G(X,Y ) is the concatenation of the paths X, Y . The conclusion follows from the

fact that G((Bt, 0 ≤ t ≤ T ),−B(T )) = B.

Corollary 6.3.1 (Sometimes also called the reflection principle) Let 0 < b anda ≤ b, then for every t ≥ 0,

P (St ≥ b, Bt ≤ a) = P (Bt ≥ 2b− a).

Proof. Let Tx = inft ≥ 0 : Bt ≥ x be the first entrance time of Bt in [x,∞) forx > 0. Then Tx is an (FB

t )-stopping time for every x by (i), Proposition 4.1.1. Noticethat Tx <∞ a.s. since S∞ = ∞ a.s., where S∞ = limt→∞ St.

Now by continuity of B, BTx = x for every x. By the reflection principle applied toT = Tb, we obtain (with the definition B given in the statement of the reflection principle)

P (St ≥ b, Bt ≤ a) = P (Tb ≤ t, 2b−Bt ≥ 2b− a) = P (Tb ≤ t, Bt ≥ 2b− a),

since 2b − Bt = Bt as soon as t ≥ Tb. On the other hand, the event Bt ≥ 2b − a is

contained in Tb ≤ t since 2b− a ≥ b. Therefore, we obtain P (St ≥ b, Bt ≤ a) = P (Bt ≥2b− a), and the result follows since B is a Brownian motion.

Notice also that the probability under consideration is equal to P (St > b,Bt < a) =P (Bt > 2b− a), i.e. the inequalities can be strict or not. Indeed, for the right-hand side,this is due to the fact that the distribution of Bt is non-atomic, and for the left-hand side,this boils down to showing that for every x,

Tx = inft ≥ 0 : Bt > x , a.s.,

which is a straightforward consequence of the strong Markov property at time Tx, com-bined with Proposition 6.2.2.

Corollary 6.3.2 The random variable St has the same law as |Bt|, for every fixed t ≥ 0.Moreover, for every x > 0, the random time Tx has same law as (x/B1)

2.

Proof. As a ↑ b, the probability P (St ≥ b, Bt ≤ a) converges to P (St ≥ b, Bt ≤ b), andthis is equal to P (Bt ≥ b) by Corollary 6.3.1. Therefore,

P (St ≥ b) = P (St ≥ b, Bt ≤ b) + P (Bt ≥ b) = 2P (Bt ≥ b) = P (|Bt| ≥ b),

because Bt ≥ b ⊂ St ≥ b, and this gives the result. We leave the computation of thedistribution of Tx as an exercise.

Page 51: AdPr2006

6.4. MARTINGALES AND BROWNIAN MOTION 51

6.4 Some martingales associated to Brownian motion

One of the nice features of Brownian motion is that there are a tremendous amount ofmartingales that are associated with it.

Proposition 6.4.1 Let (Bt, t ≥ 0) be an (Ft)-Brownian motion.

(i) If d = 1 and B0 ∈ L1, the process (Bt, t ≥ 0) is a (Ft)-martingale.

(ii) If d = 1 and B0 ∈ L2, the process (B2t − t, t ≥ 0) is a (Ft)-martingale.

(iii) In any dimension, let u = (u1, . . . , ud) ∈ Cd. If E[exp(〈u,B0〉)] <∞, the processM = (exp(〈u,Bt〉 − tu2/2), t ≥ 0) is also a (Ft)-martingale for every u ∈ Cd, where u2 isa notation for

∑di=1 u

2i .

Notice that in (iii), we are dealing with C-valued processes. The definition of E[X|G]the conditional expectation for a random variable X ∈ L1(C) is E[<X|G]+iE[=X|G], andwe say that an integrable process (Xt, t ≥ 0) with values in C, and adapted to a filtration(Ft), is a martingale if its real and imaginary parts are. Notice that the hypothesis on B0

in (iii) is automatically satisfied whenever u = iv with v ∈ R is purely imaginary.

Proof. (i) If s ≤ t, E[Bt − Bs|Fs] = E[B(s)t−s] = 0, where B

(s)u = Bu+s − Bs has mean 0

and is independent of Fs, by the simple Markov property. The integrability of the processis obvious by hypothesis on B0.

(ii) Integrability is an easy exercise using that Bt−B0 is independent of B0. We have,for s ≤ t, B2

t = (Bt−Bs)2 + 2Bs(Bt−Bs) +B2

s . Taking conditional expectation given Fsand using the simple Markov property gives that E[B2

t ] = (t− s) +B2s , hence the result.

(iii) Integrability comes from the fact that E[exp(λBt)] = exp(tλ2/2) whenever B is astandard Brownian motion, and the fact that

E[exp(〈u,Bt〉)] = E[exp(〈u, (Bt−B0+B0)〉)] = E[exp(〈u,Bt−B0〉)]E[exp(〈u,B0〉)] <∞.

For s ≤ t, Mt = exp(i〈u, (Bt−Bs)〉+i〈u,Bs〉+t|u|2/2). We use the Markov property again,and the fact that E[exp(i〈u,Bt −Bs〉)] = exp(−(t− s)|u|2/2), which is the characteristicfunction of a Gaussian law with mean 0 and variance |u|2.

From this, one can show that

Proposition 6.4.2 Let (Bt, t ≥ 0) be a standard Brownian motion and Tx = inft ≥ 0 :Bt = x. Then for x, y > 0, one has

P (T−y < Tx) =x

x+ y, E[Tx ∧ T−y] = xy.

Proposition 6.4.3 Let (Bt, t ≥ 0) be a (Ft)-Brownian motion. Let f(t, x) : R+×Rd → Cbe continuously differentiable in the variable t and twice continuously differentiable in x,and suppose that f and its derivatives of all order are bounded. Then,

Mt = f(t, Bt)− f(0, B0)−∫ t

0

ds

(∂

∂t+

1

2∆

)f(s, Bs) , t ≥ 0

is a (Ft)-martingale, where ∆ =∑d

i=1∂2

∂x2i

is the Laplacian operator acting on the spatial

coordinate of f .

Page 52: AdPr2006

52 CHAPTER 6. BROWNIAN MOTION

This is the first symptom of the famous Ito formula, which says what this martingaleactually is.

Proof. Integrability is trivial from the boundedness of f , as well as adptedness, since Mt

is a function of (Bs, 0 ≤ s ≤ t). Let s, t ≥ 0. We estimate

E[Mt+s|Ft] = Mt + E

[f(t+ s, Bt+s)− f(t, Bt)−

∫ t+s

t

du

(∂

∂t+

2

)f(u,Bu)

∣∣∣∣Ft] .On the one hand, E[f(t+s, Bt+s)−f(t, Bt)|Ft] = E[f(t+s, Bt+s−Bt+Bt)|Ft]−f(t, Bt),and since Bt+s − Bt is independent of Ft with law N (0, s), using Proposition 1.3.4, thisis equal to ∫

Rf(t+ s, Bt + x)p(s, x)dx− f(t, Bt), (6.3)

where p(s, x) = (2πs)−d/2 exp(−|x|2/(2s)) is the probability density function for N (0, s).On the other hand, if we let L = ∂/∂t+ ∆/2,

E

[∫ t+s

t

duLf(u,Bu)

∣∣∣∣Ft] = E

[∫ s

0

duLf(u+ t, Bt+s −Bt +Bt)

∣∣∣∣Ft] .This expression is of the form E[F ((B(t), Bt))|Ft], where F is measurable and B

(t)s =

Bt+s − Bt, s ≥ 0 is independent of Bt by the simple Markov property, and has lawW0(dw), the Wiener measure. If (Xt, t ≥ 0) is the canonical process Xt(w) = wt, thenthis last expression rewrites, by Proposition 1.3.4,

E

[∫ t+s

t

Lf(u,Bu)du

∣∣∣∣Ft] =

∫ΩW

W0(dw)

∫ s

0

duLf(u+ t,Xs +Bt)

=

∫ s

0

du

∫ΩW

W0(dw)Lf(u,Xs +Bt)

=

∫ s

0

du

∫Rd

dx p(s, x)Lf(u, x+Bt),

where f(s, x) = f(s+t, x), and we made a use of Fubini’s theorem. Next, the boundedness

of Lf entails that this is equal to

limε↓0

∫ s

ε

du

∫Rd

dx p(s, x)Lf(u, x+Bt).

From the expression for L, we can split this into two parts. Using integration by parts,∫R

dx

∫ s

ε

du p(u, x)∂f

∂t(u, x+Bt)

=

∫R

dx p(s, x)f(s, x+Bt)−∫

Rdx p(ε, x)f(ε, x+Bt)

−∫

Rd

dx

∫ s

ε

du∂p

∂t(u, x) f(t+ u, x+Bt).

Page 53: AdPr2006

6.5. RECURRENCE AND TRANSIENCE PROPERTIES 53

Similarly, integrating by parts twice yields∫ s

ε

du

∫R

dx p(u, x)∆

2f(u+ t, x+Bt) =

∫ s

ε

du

∫R

dx∆

2p(u, x) f(u+ t, x+Bt).

Now, p(t, x) satisfies the heat equation (∂t − ∆/2)p = 0. Therefore, the integral termscancel each other, and it remains

E

[∫ t+s

t

duLf(u,Bu)

∣∣∣∣Ft] =

∫Rd

dx p(s, x)f(s, x+Bt)− limε↓0

∫Rd

dx p(ε, x)f(ε, x+Bt),

which by dominated convergence is exactly (6.3). This shows that E[Mt+s −Mt|Ft] = 0.

6.5 Recurrence and transience properties of Brown-

ian motion

From this section on, we are going to introduce a bit of extra notation. We will supposethat the reference measurable space (Ω,F) on which (Bt, t ≥ 0) is defined endowed withprobability measures Px, x ∈ Rd such that under Px, (Bt−x, t ≥ 0) is a standard Brownianmotion. A possibility is to choose the Wiener space and endow it with the measures Wx,so that the canonical process (Xt, t ≥ 0) is a Brownian motion started at x under Wx.We let Ex be the expectation associated with Px. In the sequel, B(x, r), B(x, r) willrespectively denote the open and closed Euclidean balls with center x and radius r, in Rd

for some d ≥ 1.

Theorem 6.5.1 (i) If d = 1, Brownian motion is point-recurrent in the sense that underP0 (or any Py, y ∈ R),

a.s., t ≥ 0 : Bt = x is unbounded for every x ∈ R.

(ii) If d = 2, Brownian motion is neighborhood-recurrent, in the sense that for everyx, under Px,

a.s., t ≥ 0 : |Bt| ≤ ε is unbounded for every x ∈ Rd, ε > 0.

However, points are polar in the sense that for every x ∈ Rd,

P0(Hx = ∞) = 1 , where Hx = inft > 0 : Bt = x

is the hitting time of x.

(iii) If d ≥ 3, Brownian motion is transient, in the sense that a.s. under P0, |Bt| → ∞as t→∞.

Proof. (i) is a consequence of (ii) in Proposition 6.2.2.

For (ii), let 0 < ε < R be real numbers and f be a C∞ function, which is bounded withall its derivatives, and that coincides with x 7→ log |x| on Dε,R = x ∈ R2 : ε ≤ |x| ≤ R.

Page 54: AdPr2006

54 CHAPTER 6. BROWNIAN MOTION

Then one can check that ∆f = 0 on the interior of Dε,R, and therefore, if we let S =inft ≥ 0 : |Bt| = ε and T = inft ≥ 0 : |Bt| = R, then S, T,H = S ∧ T are stoppingtimes, and from Proposition 6.4.3, the stopped process (log |Bt∧H |, t ≥ 0) is a (bounded)martingale. If ε < |x| < R, we thus obtain that Ex[log |BH |] = log |x|. Since H ≤ T <∞a.s. (Brownian motion is unbounded a.s.), and since |BS| = ε, |BT | = R on the event thatS <∞, T <∞, the left-hand side is (log ε)Px(S < T ) + (logR)Px(S > T ). Therefore,

Px(S < T ) =logR− log |x|logR− log ε

. (6.4)

Letting ε→ 0 shows that the probability of hitting 0 before hitting the boundary of theball with radius R is 0, and therefore, letting R→∞, the probability of hitting 0 (startingfrom x 6= 0) is 0. The announced result (for x 6= 0) is then obtained by translation. Wethus have that P0(Hx <∞) = 0 for every x 6= 0. Next, we have

P0(∃t ≥ a : Bt = 0) = P0(∃s ≥ 0 : Bs+a −Ba +Ba = 0),

and the Markov property at time a shows that this is

P0(∃t ≥ a : Bt = 0) =

∫R2

P0(Ba ∈ dy)P0(∃s ≥ 0 : Bs + y = 0)

=

∫R2

P0(Ba ∈ dy)Py(∃s ≥ 0 : Bs = 0) = 0,

because the law of Ba under P0 is a Gaussian law that does not charge the point 0 (wehave been using the notation P (X ∈ dx) for the law of the random variable X).

On the other hand, letting R → ∞ first in (6.4), we get that the probability ofhitting the ball with center 0 and radius ε is 1 for every ε, starting from any point:Px(∃t ≥ 0 : |Bt| ≥ ε) = 1. Thus, for every n ∈ Z+, a similar application of the Markovproperty at time n gives

Px(∃t ≥ n : |Bt| ≤ ε) =

∫R2

Px(Bn ∈ dy)Py(∃t ≥ 0 : |Bt| ≤ ε) = 1.

Hence the result.

For (iii) Since the three first components of a Brownian motion in Rd form a Brownianmotion, it is clearly sufficient to treat the case d = 3. So assume d = 3. Let f be a C∞function with all derivatives that are bounded, and that coincides with x 7→ 1/|x| on Dε,R,which is defined as previously but for d = 3. Then ∆f = 0 on the interior of Dε,R, andthe same argument as above shows that for x ∈ Dε,R, defining S, T as above,

Px(S < T ) =|x|−1 −R−1

ε−1 −R−1.

This converges to ε/|x| as R → ∞, which is thus the probability of ever visiting B(0, ε)when starting from x (with |x| ≥ ε). Define two sequences of stopping times Sk, Tk, k ≥ 1by S1 = inft ≥ 0 : |Bt| ≤ ε, and

Tk = inft ≥ Sk : |Bt| ≥ 2ε , Sk+1 = inft ≥ Sk : |Bt| ≥ 2ε.

Page 55: AdPr2006

6.6. THE DIRICHLET PROBLEM 55

If Sk is finite, we get that Tk is also finite, because Brownian motion is an a.s. unboundedprocess, so Sk < ∞ = Tk < ∞ up to a zero-probability event. The strong Markovproperty at time Tk gives

Px(Sk+1 <∞|Sk <∞) = Px(Sk+1 <∞|Tk <∞)

= Px(∃s ≥ Tk : |Bs −BTk+BTk

| ≤ ε |Tk <∞)

=

∫R3

Px(BTk∈ dy|Tk <∞)Py(∃s ≥ 0 : |Bs| ≤ ε),

where Px(BTk∈ dy|Tk <∞) is the law of BTk

under the probability measure Px(A|Tk <∞), A ∈ F . Since |BTk

| = 2ε on the event Tk < ∞, we have that the last probabilityis ε/|y| = 1/2. Finally, we obtain by induction that Px(Sk < ∞) ≤ Px(S1 < ∞)2−k+1,and the Borel-Cantelli lemma entails that a.s., Sk = ∞ for some k. Therefore, Brownianmotion in dimension 3 a.s. eventually leaves the ball of radius ε for good, and lettingε = n→∞ along Z+ gives the result.

Remark. If B(x, ε) is the Euclidean ball of center x and radius ε, notice that the propertyof (ii) implies the fact that t ≥ 0 : Bt ∈ B(x, ε) is unbounded for every x ∈ R2 and everyε > 0, almost surely (indeed, one can cover R2 by a countable union of balls of a fixedradius). In particular, the trajectory of a 2-dimensional Brownian motion is everywheredense. On the other hand, it will a.s. never hit a fixed countable family of points (exceptmaybe at time 0), like the points with rational coordinates!

6.6 Brownian motion and the Dirichlet problem

Let D be a connected open subset of Rd for some d ≥ 2. We will say that D is a domain.Let ∂D be the boundary of D. We denote by ∆ the Laplacian on Rd. Suppose givena measurable function g : ∂D → R. A solution of the Dirichlet problem with boundarycondition g on D is a function u : D → R of class C2(D) ∩ C(D), such that

∆u = 0 on Du|∂D = g.

(6.5)

A solution of the Dirichlet problem is the mathematical counterpart of the followingphysical problem: given an object made of homogeneous material, such that the tempera-ture g(y) is imposed at point y of its boundary, the solution u(x) of the Dirichlet problemgives the temperature at the point x in the object when equilibium is attained.

As we will see, it is possible to give a probabilistic resolution of the Dirichlet problemwith the help of Brownian motion. This is essentially due to Kakutani. We let Ex bethe law of the Brownian motion in Rd started at x. In the remaining of the section, letT = inft ≥ 0 : Bt /∈ D be the first exit time from D. It is a stopping time, as it is thefirst entrance time in the closed set Dc. We will assume that the domain D is such thatP (T <∞) = 1 to avoid complications. Hence BT is a well-defined random variable.

In the sequel, | · | is the euclidean norm on Rd. The goal of this section is to prove thefollowing

Page 56: AdPr2006

56 CHAPTER 6. BROWNIAN MOTION

Theorem 6.6.1 Suppose that g ∈ C(∂D,R) is bounded, and assume that D satisfies alocal exterior cone condition (l.e.c.c.), i.e. for every y ∈ ∂D, there exists a nonemptyopen convex cone with origin at y such that C ∩ B(y, r) ⊂ Dc for some r > 0. Then thefunction

u : x 7→ Ex [g(BT )]

is the unique bounded solution of the Dirichlet problem (6.5).

In particular, if D is bounded and satisfies the l.e.c.c., then u is the unique solutionof the Dirichlet problem.

We start with a uniqueness statement.

Proposition 6.6.1 Let g be a bounded function in C(∂D,R). Set

u(x) = Ex [g(BT )] .

If v is a bounded solution of the Dirichlet problem, then v = u.

In particular, we obtain uniqueness when D is bounded. Notice that we do not makeany assumption on the regularity of D here besides the fact that T <∞ a.s.

Proof. Let v be a bounded solution of the Dirichlet problem. For every N ≥ 1, introducethe reduced set DN = x ∈ D : |x| < N and d(x, ∂D) > 1/N. Notice it is an open set,but which need not be connected. We let TN be the first exit time of DN . By Proposition6.4.3, the process

Mt = vN(Bt)− vN(B0) +

∫ t

0

−1

2∆vN(Bs)ds , t ≥ 0

is a martingale, where vN is a C2 function that coincides with v on DN , and whichis bounded with all its partial derivatives (this may look innocent, but the fact thatsuch a function exists is highly non-trivial, the use of such a function could be avoidedby a stopped analog of Proposition 6.4.3). Moreover, the martingale stopped at TN isMt∧TN

= v(Bt∧TN) − v(B0), because ∆v = 0 inside D, and it is bounded (because v is

bounded on DN), hence uniformly integrable. By optional stopping at TN , we get thatfor every x ∈ DN ,

0 = E_x[M_{T_N}] = E_x[v(B_{T_N})] − v(x).    (6.6)

Now, as N → ∞, B_{T_N} converges to B_T a.s. by continuity of paths and the fact that T < ∞ a.s. Since v is bounded, we can use dominated convergence as N → ∞, and get that for every x ∈ D,

v(x) = Ex[v(BT )] = Ex[g(BT )],

hence the result.

For every x ∈ R^d and r > 0, let σ_{x,r} be the uniform probability measure on the sphere S_{x,r} = {y ∈ R^d : |y − x| = r}. It is the unique probability measure on S_{x,r} that is invariant under isometries of S_{x,r}. We say that a locally bounded measurable function h : D → R is harmonic on D if for every x ∈ D and every r > 0 such that the closed ball B(x, r) with center x and radius r is contained in D,

h(x) = ∫_{S_{x,r}} h(y) σ_{x,r}(dy).
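As a quick numerical illustration of this mean value property (this sketch is not part of the original notes; the harmonic function, centre and radius are arbitrary choices), one can average a harmonic function over uniform points of a circle:

```python
import numpy as np

rng = np.random.default_rng(1)

# h(x) = x1^2 - x2^2 is harmonic on R^2; the sphere average over S_{x,r}
# should equal h(x), up to Monte Carlo noise.
h = lambda p: p[..., 0] ** 2 - p[..., 1] ** 2

x, r = np.array([1.0, 2.0]), 0.7
theta = rng.uniform(0.0, 2 * np.pi, size=100_000)     # uniform angles on the circle
points = x + r * np.stack([np.cos(theta), np.sin(theta)], axis=-1)
print("sphere average:", h(points).mean(), " h(x):", h(x))
```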


Proposition 6.6.2 Let h be harmonic on a domain D. Then h ∈ C^∞(D, R), and ∆h = 0 on D.

Proof. Let x ∈ D and ε > 0 be such that B(x, ε) ⊂ D. Then let ϕ be a non-negative C^∞ function on R_+ with non-empty compact support contained in [0, ε). We have, for 0 < r < ε,

h(x) = ∫_{S_{0,r}} h(x + y) σ_{0,r}(dy).

Multiplying by ϕ(r) r^{d−1} and integrating in r gives

c h(x) = ∫_{R^d} ϕ(|z|) h(x + z) dz,

where c > 0 is some constant; here we have used the fact that
∫_{R^d} f(x) dx = C ∫_{R_+} r^{d−1} dr ∫_{S_{0,r}} f(y) σ_{0,r}(dy)

for some C > 0. Therefore, c h(x) = ∫_{R^d} ϕ(|z − x|) h(z) dz, and by differentiation under the integral sign we easily get that h is C^∞.

Next, by translation we may suppose that 0 ∈ D and show only that ∆h(0) = 0. We may apply Taylor's formula to h, obtaining, as x → 0,

h(x) = h(0) + ⟨∇h(0), x⟩ + (1/2) ∑_{i=1}^d x_i² (∂²h/∂x_i²)(0) + (1/2) ∑_{i≠j} x_i x_j (∂²h/∂x_i∂x_j)(0) + o(|x|²).

Now, integration over S_{0,r} for r small enough yields
∫_{S_{0,r}} h(x) σ_{0,r}(dx) = h(0) + C_r ∆h(0) + o(r²),
where C_r = (1/2) ∫_{S_{0,r}} x_1² σ_{0,r}(dx); as the reader may check, all the other integrals up to second order vanish by symmetry. Since the left-hand side is h(0), we obtain ∆h(0) = 0.

Therefore, harmonic functions are solutions of certain Dirichlet problems.

Proposition 6.6.3 Let g be a bounded measurable function on ∂D, and let T = inf{t ≥ 0 : B_t ∉ D}. Then the function h : x ∈ D ↦ E_x[g(B_T)] is harmonic on D, and hence ∆h = 0 on D.

Proof. For all Borel subsets A_1, ..., A_k of R^d and times t_1 < ... < t_k, the map

x ↦ P_x(B_{t_1} ∈ A_1, ..., B_{t_k} ∈ A_k)

is measurable by Fubini's theorem, once one has written the explicit formula for this probability. Therefore, by the monotone class theorem, x ↦ E_x[F] is measurable for every integrable random variable F which is measurable with respect to the product σ-algebra on C(R_+, R^d). Moreover, h is bounded by assumption.

Now, let S = inf{t ≥ 0 : |B_t − x| ≥ r} be the first exit time of B from the ball of center x and radius r. Then by (ii), Proposition 6.2.2, S < ∞ a.s. By the strong Markov property,

B̃ = (B_{S+t}, t ≥ 0) is an (F_{S+t})-Brownian motion started at B_S. Moreover, the first hitting time of ∂D for B̃ is T̃ = T − S, and B̃_{T̃} = B_T, so that
E_x[g(B_T)] = E_x[g(B̃_{T̃})] = ∫_{R^d} P_x(B_S ∈ dy) E_y[g(B_T) 1_{T<∞}],
and we recognize ∫ P_x(B_S ∈ dy) h(y) in the last expression.

Since B starts from x under P_x, the rotation invariance of Brownian motion shows that B_S − x has a distribution on the sphere of center 0 and radius r which is invariant under the orthogonal group, so we conclude that the distribution of B_S is the uniform measure on the sphere of center x and radius r, and therefore that h is harmonic on D.

It remains to understand whether the function u of Theorem 6.6.1 is actually a solution of the Dirichlet problem. Indeed, it is not the case in general that u(x) has limit g(y) as x ∈ D, x → y, and the reason is that some points of ∂D may be 'invisible' to Brownian motion. The reader can convince himself, for example, that if D = B(0, 1) \ {0} is the open ball of R² with center 0 and radius 1, whose origin has been removed, and if g = 1_{{0}}, then no solution of the Dirichlet problem with boundary constraint g exists. The probabilistic reason for that is that Brownian motion does not see the boundary point 0. This is the reason why we have to make regularity assumptions on D in the theorem.

Proof of Theorem 6.6.1.

It remains to prove that under the l.e.c.c., u extends continuously to the boundary, i.e. u(x) converges to g(y) as x ∈ D converges to y ∈ ∂D. In order to do that, we need a preliminary lemma. Recall that T is the first exit time of D for the Brownian path.

Lemma 6.6.1 Let D be a domain satisfying the l.e.c.c., and let y ∈ ∂D. Then for every η > 0, P_x(T < η) → 1 as x ∈ D → y.

Proof. Let C_y = y + C be a nonempty open convex cone with origin at y such that C_y ⊂ D^c (we leave as an exercise the case where only a neighbourhood of this cone around y is contained in D^c). It is an elementary geometric fact that for every η > 0 small enough, there exist δ > 0 and a nonempty open convex cone C′ with origin at 0 such that x + (C′ \ B(0, η)) ⊆ C_y for every x ∈ B(y, δ). Now by (iii) in Proposition 6.2.2, if H^ε_{C′} = inf{t > 0 : B_t ∈ C′ \ B(0, ε)}, then P_0(H^ε_{C′} < η) ↑ P_0(H_{C′} < η) = 1 as ε ↓ 0.

Since hitting x + (C′ \ B(0, η)) implies hitting C_y and therefore leaving D, we obtain, after translating by x, that for every η, ε′ > 0, P_x(T < η) can be made ≥ 1 − ε′ for x belonging to a sufficiently small δ-neighborhood of y in D.

We can now finish the proof of Theorem 6.6.1. Let y ∈ ∂D; we want to estimate the quantity E_x[g(B_T)] − g(y). For η, δ > 0, let

A_{η,δ} = { sup_{0≤t≤η} |B_t − x| ≥ δ/2 }.


This event decreases to ∅ as η ↓ 0 because B has continuous paths. Now, for any δ, η > 0,

E_x[|g(B_T) − g(y)|] = E_x[|g(B_T) − g(y)| ; {T ≤ η} ∩ A^c_{η,δ}] + E_x[|g(B_T) − g(y)| ; {T ≤ η} ∩ A_{η,δ}] + E_x[|g(B_T) − g(y)| ; T > η].

Fix ε > 0. We are going to show that each of these three quantities can be made < ε/3 for x close enough to y. Since g is continuous at y, there is δ > 0 such that |y − z| ≤ δ with z ∈ ∂D implies |g(y) − g(z)| < ε/3. Moreover, on the event {T ≤ η} ∩ A^c_{η,δ} we know that |B_T − x| < δ/2, and thus |B_T − y| ≤ δ as soon as |x − y| ≤ δ/2. Therefore, for every η > 0, the first quantity is less than ε/3 for x ∈ B(y, δ/2).

Next, if M is an upper bound for |g|, the second quantity is bounded by 2M P(A_{η,δ}). Hence, by now choosing η small enough, this is < ε/3.

Finally, with δ, η fixed as above, the third quantity is bounded by 2M P_x(T ≥ η). By the previous lemma, this is < ε/3 as soon as x ∈ B(y, α) ∩ D for some α > 0. Therefore, for any x ∈ B(y, α ∧ δ/2) ∩ D, |u(x) − g(y)| < ε. This entails the result.

Corollary 6.6.1 A function u : D → R is harmonic in D if and only if it is in C²(D, R) and satisfies ∆u = 0.

Proof. Let u be of class C²(D) with zero Laplacian, and let x ∈ D. Let ε be such that B(x, ε) ⊆ D, and notice that u|_{B(x,ε)} is a solution of the Dirichlet problem on B(x, ε) with boundary values u|_{∂B(x,ε)}. Then B(x, ε) satisfies the l.e.c.c., so that u|_{B(x,ε)} is the unique such solution, which is also given by the harmonic function of Theorem 6.6.1. Therefore, u is harmonic on D.

6.7 Donsker’s invariance principle

The following theorem completes the description of Brownian motion as a 'limit' of centered random walks as depicted at the beginning of the chapter, and strengthens the convergence of finite-dimensional marginal distributions into convergence in distribution. We endow C([0, 1], R) with the supremum norm, and recall (see the exercises on continuous-time processes) that the product σ-algebra associated with it coincides with the Borel σ-algebra associated with this norm. We say that a function F : C([0, 1]) → R is continuous if it is continuous with respect to this norm.

Theorem 6.7.1 (Donsker's invariance principle) Let (X_n, n ≥ 1) be a sequence of R-valued integrable independent random variables with common law µ, such that
∫ x µ(dx) = 0  and  ∫ x² µ(dx) = σ² ∈ (0, ∞).

Let S_0 = 0 and S_n = X_1 + ... + X_n, and define a continuous process that interpolates linearly between the values of S, namely
S_t = (1 − {t}) S_{[t]} + {t} S_{[t]+1} ,   t ≥ 0,


where [t] denotes the integer part of t and {t} = t − [t]. Then S^{[N]} := ((σ²N)^{−1/2} S_{Nt}, 0 ≤ t ≤ 1) converges in distribution to a standard Brownian motion between times 0 and 1, i.e. for every bounded continuous function F : C([0, 1]) → R,

E[F(S^{[N]})] → E_0[F(B)]  as N → ∞.

Notice that this is much stronger than what Proposition 6.1.1 says. Despite the slight difference of framework between these two results (one uses a cadlag continuous-time version of the random walk, the other an interpolated continuous version), Donsker's invariance principle is stronger. For instance, one can infer from this theorem that the random variable N^{−1/2} sup_{0≤n≤N} S_n converges to sup_{0≤t≤1} B_t in distribution, because f ↦ sup f is a continuous operation on C([0, 1], R). Proposition 6.1.1 would be powerless to address this issue.
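The convergence of the maximum functional is easy to illustrate by simulation. The Python sketch below is not part of the original notes; it uses ±1 steps (so σ² = 1) and compares a few empirical quantiles of N^{−1/2} max_{0≤n≤N} S_n with those of |N(0, 1)|, which is the law of sup_{0≤t≤1} B_t by the reflection principle.

```python
import numpy as np

rng = np.random.default_rng(2)

N, n_samples = 2000, 2000
steps = rng.choice([-1.0, 1.0], size=(n_samples, N))    # centered walk, variance 1
walks = np.cumsum(steps, axis=1)
scaled_max = np.maximum(walks.max(axis=1), 0.0) / np.sqrt(N)   # include S_0 = 0

gauss = np.abs(rng.standard_normal(200_000))            # law of sup_{[0,1]} B
for q in (0.25, 0.5, 0.75, 0.9):
    print(q, round(np.quantile(scaled_max, q), 3), round(np.quantile(gauss, q), 3))
```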

The proof we give here is an elegant demonstration that makes use of a coupling of the random walk with the Brownian motion, called the Skorokhod embedding theorem. It is however specific to dimension d = 1. Suppose we are given a Brownian motion (B_t, t ≥ 0) on some probability space (Ω, F, P).

Let µ_+(dx) = P(X_1 ∈ dx) 1_{x≥0} and µ_−(dy) = P(−X_1 ∈ dy) 1_{y≥0} define two non-negative measures. Assume that (Ω, F, P) is a rich enough probability space so that we can further define on it, independently of (B_t, t ≥ 0), a sequence of independent identically distributed R²-valued random variables ((Y_n, Z_n), n ≥ 1) with distribution
P((Y_n, Z_n) ∈ dx dy) = C (x + y) µ_+(dx) µ_−(dy),
where C > 0 is the appropriate normalizing constant that makes this expression a probability measure.

Next, let F_0 = σ{(Y_n, Z_n), n ≥ 1} and F_t = F_0 ∨ F^B_t, so that (F_t, t ≥ 0) is a filtration such that B is an (F_t)-Brownian motion. We define a sequence of random times by T_0 = 0, T_1 = inf{t ≥ 0 : B_t ∈ {Y_1, −Z_1}}, and recursively,
T_n = inf{t ≥ T_{n−1} : B_t − B_{T_{n−1}} ∈ {Y_n, −Z_n}}.

By (ii) in Proposition 6.2.2, these times are a.s. finite, and they are stopping times with respect to the filtration (F_t). We claim that

Lemma 6.7.1 The sequence (B_{T_n}, n ≥ 0) has the same law as (S_n, n ≥ 0). Moreover, the interarrival times (T_n − T_{n−1}, n ≥ 1) form an independent sequence of random variables with the same distribution and expectation E[T_1] = σ².

Proof. By repeated application of the Markov property at the times T_n, n ≥ 1, and the fact that the (Y_n, Z_n), n ≥ 1, are independent with the same distribution, we obtain that the processes (B_{t+T_{n−1}} − B_{T_{n−1}}, 0 ≤ t ≤ T_n − T_{n−1}) are independent with the same distribution. The fact that the differences B_{T_n} − B_{T_{n−1}}, n ≥ 1, and T_n − T_{n−1}, n ≥ 1, form sequences of independent and identically distributed random variables follows from this observation.

It therefore remains to check that B_{T_1} has the same law as X_1 and that E[T_1] = σ². Remember from Proposition 6.4.2 that, given (Y_1, Z_1), the probability that B_{T_1} = Y_1 is Z_1/(Y_1 + Z_1), as follows from the optional stopping theorem. Therefore, for every non-negative measurable function f, by first conditioning on (Y_1, Z_1), we get

E[f(B_{T_1})] = E[ f(Y_1) Z_1/(Y_1 + Z_1) + f(−Z_1) Y_1/(Y_1 + Z_1) ]
= ∫_{R_+×R_+} C (x + y) µ_+(dx) µ_−(dy) ( f(x) y/(x + y) + f(−y) x/(x + y) )
= C′ ∫_{R_+} ( f(x) µ_+(dx) + f(−x) µ_−(dx) ) = C′ E[f(X_1)],

for C′ = C ∫ x µ_+(dx), which can only be equal to 1 (take f = 1). Here we have used the fact that ∫ x µ_+(dx) = ∫ x µ_−(dx), which amounts to saying that X_1 is centered.

For E[T_1], recall from Proposition 6.4.2 that E[inf{t ≥ 0 : B_t ∈ {x, −y}}] = xy, so by a similar conditioning argument as above,

E[T_1] = ∫_{R_+×R_+} C (x + y) x y µ_+(dx) µ_−(dy) = σ²,
where we again used that C ∫ x µ_+(dx) = 1.
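In the simplest case X_1 = ±1 with probability 1/2 each, µ_+ and µ_− are point masses at 1, (Y_1, Z_1) = (1, 1) a.s., and T_1 is the exit time of Brownian motion from (−1, 1); the lemma then predicts P(B_{T_1} = 1) = 1/2 and E[T_1] = 1. The following Python sketch (not part of the original notes; the time step and sample size are arbitrary) checks this by an Euler simulation.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_embedding(dt=1e-3):
    """Simulate B until it leaves (-1, 1); return (exit point, exit time)."""
    b, t = 0.0, 0.0
    while -1.0 < b < 1.0:
        b += np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1.0 if b >= 1.0 else -1.0), t

samples = [one_embedding() for _ in range(1000)]
exits = np.array([s[0] for s in samples])
times = np.array([s[1] for s in samples])
print("P(B_{T_1} = 1) ~", (exits == 1.0).mean(), "   E[T_1] ~", times.mean())
```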

Proof of Donsker’s invariance principle. We suppose given a Brownian motionB. For N ≥ 1, define B

(N)t = N1/2BN−1t, t ≥ 0, which is a Brownian motion by scaling

invariance. Perform the Skorokhod embedding construction on B(N) to obtain variablesT

(N)n , n ≥ 0. Then, let S

(N)n = B

(N)

T(N)n

. Then by Lemma 6.7.1, S(N)n , n ≥ 0 is a random

walk with same law as Sn, n ≥ 0. We interpolate linearly between integers to obtain acontinuous process S

(N)t , t ≥ 0. Finally, let S

(N)t = (σ2N)−1/2S

(N)Nt , t ≥ 0 and T

(N)n =

N−1T(N)n .

We are going to show that the supremum norm of (B_t − S̃^{(N)}_t, 0 ≤ t ≤ 1) converges to 0 in probability.

By the law of large numbers, T_n/n converges a.s. to σ² as n → ∞. Thus, by a monotonicity argument, N^{−1} sup_{0≤n≤N} |T_n − σ²n| converges to 0 a.s. as N → ∞. As a consequence, this supremum converges to 0 in probability, meaning that for every δ > 0,
P( sup_{0≤n≤N} |T̃^{(N)}_n − n/N| ≥ δ ) → 0  as N → ∞.

On the other hand, for every t ∈ [n/N, (n+1)/N], there exists some u ∈ [T̃^{(N)}_n, T̃^{(N)}_{n+1}] with B_u = S̃^{(N)}_t, because S̃^{(N)}_{n/N} = B_{T̃^{(N)}_n} for every n and by the intermediate value theorem, S̃^{(N)} and B being continuous. Therefore, the event {sup_{0≤t≤1} |S̃^{(N)}_t − B_t| > ε} is contained in the union K_N ∪ L_N, where

in the union KN ∪ LN , where

KN =

sup

0≤n≤N|T (N)n − n/N | > δ

and

LN = ∃ t ∈ [0, 1], ∃u ∈ [t− δ − 1/N, t+ δ + 1/N ] : |Bt −Bu| > ε.


We already know that P(K_N) → 0 as N → ∞. For L_N, since B is a.s. uniformly continuous on [0, 1], by taking δ small enough and then N large enough, we can make P(L_N) as small as wanted. Therefore, we have shown that

P( ‖S̃^{(N)} − B‖_∞ > ε ) → 0  as N → ∞.
Therefore, (S̃^{(N)}_t, 0 ≤ t ≤ 1) converges in probability, for the uniform norm, to (B_t, 0 ≤ t ≤ 1), which entails convergence in distribution by Proposition 5.2.1. This concludes the proof.


Chapter 7

Poisson random measures and Poisson processes

7.1 Poisson random measures

Let (E, E) be a measurable space, and let µ be a non-negative σ-finite measure on (E, E). We denote by E* the set of σ-finite atomic measures on E, i.e. the set of σ-finite measures taking values in Z_+ ∪ {∞} (in fact, we will only consider measures that can be put in the form ∑_{i∈I} δ_{x_i} with I countable and x_i ∈ E, i ∈ I). The set E* is endowed with the σ-algebra E* = σ(X_A, A ∈ E), where X_A(m) = m(A) for m ∈ E* and A ∈ E. Otherwise said, for every A ∈ E, the mapping m ↦ m(A) from E* to Z_+ ∪ {∞} is measurable with respect to E*. For λ > 0 we denote by P(λ) the Poisson distribution with parameter λ, which assigns mass e^{−λ} λ^n/n! to the integer n.

Definition 7.1.1 A Poisson measure on (E, E) with intensity µ is a random variable M with values in E*, such that if (A_k, k ≥ 1) is a sequence of disjoint sets in E with µ(A_k) < ∞ for every k,

(i) the random variables M(Ak), k ≥ 1 are independent, and

(ii) the law of M(Ak) is P(µ(Ak)) for k ≥ 1.

Notice that properties (i) and (ii) completely characterize the law of the random variable M. Indeed, notice that events which are either empty or of the form
{m ∈ E* : m(A_1) = i_1, ..., m(A_k) = i_k},
with pairwise disjoint A_1, ..., A_k ∈ E, µ(A_j) < ∞, 1 ≤ j ≤ k, and where (i_1, ..., i_k) are integers, form a π-system that generates E*. If now M is a Poisson random measure with intensity µ, on some probability space (Ω, F, P), then
P(M(A_1) = i_1, ..., M(A_k) = i_k) = ∏_{j=1}^k e^{−µ(A_j)} µ(A_j)^{i_j} / i_j!.

Hence the uniqueness of the law of a random measure satisfying (i), (ii). Existence is stated in the next


Proposition 7.1.1 For every σ-finite non-negative measure µ on (E, E), there exists a Poisson random measure on (E, E) with intensity µ.

Proof. Suppose first that λ = µ(E) < ∞. We let N be a Poisson random variable with parameter λ, and X_1, X_2, ... be independent random variables, independent of N, with law µ/µ(E). Finally, we let M_ω = ∑_{i=1}^{N(ω)} δ_{X_i(ω)}.

Now, if N is Poisson with parameter λ and (Y_i, i ≥ 1) are independent and independent of N, with P(Y_i = j) = p_j, 1 ≤ j ≤ k, then the variables ∑_{i=1}^N 1_{Y_i=j}, 1 ≤ j ≤ k, are independent with respective laws P(p_j λ), 1 ≤ j ≤ k. It follows that M is a Poisson measure with intensity µ: for disjoint A_1, ..., A_k in E with finite µ-measures, we let Y_i = j whenever X_i ∈ A_j, defining independent random variables in {1, ..., k} with P(Y_i = j) = µ(A_j)/µ(E), so that M(A_j), 1 ≤ j ≤ k, are independent P(µ(E) · µ(A_j)/µ(E)) = P(µ(A_j)) random variables.

In the general case, since µ is σ-finite, there is a partition of E into measurable sets E_k, k ≥ 1, that are disjoint and have finite µ-measure. We can construct independent Poisson measures M_k on E_k with intensity µ(· ∩ E_k), for k ≥ 1. We claim that
M(A) = ∑_{k≥1} M_k(A ∩ E_k) ,   A ∈ E,
defines a Poisson random measure with intensity µ. This is an easy consequence of the property that if Z_1, Z_2, ... are independent Poisson variables with respective parameters λ_1, λ_2, ..., then the sum Z_1 + Z_2 + ... is Poisson with parameter λ_1 + λ_2 + ... (with the convention that P(∞) is a Dirac mass at ∞).
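The finite-intensity construction used in this proof is easy to implement. The Python sketch below is not part of the original notes; it takes E = [0,1]² with µ = λ · Lebesgue for an arbitrary λ, and checks on two disjoint half-squares that the counts are (approximately) independent Poisson variables, as required by (i) and (ii).

```python
import numpy as np

rng = np.random.default_rng(4)

lam = 5.0   # total mass mu(E) on E = [0,1]^2

def sample_poisson_measure():
    n = rng.poisson(lam)                        # M(E) ~ Poisson(mu(E))
    return rng.uniform(0.0, 1.0, size=(n, 2))   # atoms, i.i.d. with law mu/mu(E)

# Disjoint sets A = [0,1/2) x [0,1] and B = [1/2,1] x [0,1]:
# M(A), M(B) should be independent Poisson(lam/2).
counts = np.array([[(pts[:, 0] < 0.5).sum(), (pts[:, 0] >= 0.5).sum()]
                   for pts in (sample_poisson_measure() for _ in range(20_000))])
print("means:", counts.mean(axis=0), "(expected", lam / 2, ")")
print("empirical covariance:", np.cov(counts.T)[0, 1], "(expected 0)")
```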

From the construction, we obtain the following important property of Poisson random measures:

Proposition 7.1.2 Let M be a Poisson random measure on E with intensity µ, and let A ∈ E be such that µ(A) < ∞. Then M(A) has law P(µ(A)), and given M(A) = k, the restriction M|_A has the same law as ∑_{i=1}^k δ_{X_i}, where (X_1, X_2, ..., X_k) are independent with law µ(· ∩ A)/µ(A). Moreover, if A, B ∈ E are disjoint, then the restrictions M|_A, M|_B are independent. Last, any Poisson random measure can be written in the form M(dx) = ∑_{i∈I} δ_{x_i}(dx), where I is a countable index set and the x_i, i ∈ I, are random variables.

7.2 Integrals with respect to a Poisson measure

Proposition 7.2.1 Let M be a Poisson random measure on E with intensity µ. Then for every measurable f : E → R_+, the quantity
M(f) := ∫_E f(x) M(dx)
defines a random variable, and
E[exp(−M(f))] = exp( −∫_E µ(dx) (1 − exp(−f(x))) ).


Moreover, if f : E → R is measurable and in L¹(µ), then f ∈ L¹(M) a.s., ∫_E f(x) M(dx) defines a random variable, and
E[exp(iM(f))] = exp( ∫_E µ(dx) (exp(if(x)) − 1) ).

The first formula is sometimes called the Laplace functional formula, or Campbell's formula. Notice that by replacing f by af, differentiating the formula with respect to a and letting a ↓ 0, one gets the first moment formula
E[M(f)] = ∫_E f(x) µ(dx),
whenever f ≥ 0 or f is integrable w.r.t. µ (in that case, consider first f_+, f_−). Similarly,
Var M(f) = ∫_E f(x)² µ(dx)
(for this, first notice that the restrictions of M to {f ≥ 0} and {f < 0} are independent).
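Both the Laplace functional formula and the first moment formula are easy to verify by simulation. The following Python sketch is not part of the original notes; the choices E = [0,1], µ = λ · Lebesgue and f(x) = x² are arbitrary, and the right-hand side integral is approximated by a midpoint rule.

```python
import numpy as np

rng = np.random.default_rng(5)

lam, n_sim = 4.0, 50_000
f = lambda x: x ** 2

# M(f) = sum of f over the atoms of a Poisson random measure with intensity lam*dx on [0,1]
Mf = np.array([f(rng.uniform(size=rng.poisson(lam))).sum() for _ in range(n_sim)])

u = (np.arange(2000) + 0.5) / 2000          # midpoint grid on [0,1]
rhs = np.exp(-lam * np.mean(1.0 - np.exp(-f(u))))   # exp(-int mu(dx)(1-e^{-f}))
print("Laplace functional:", np.exp(-Mf).mean(), "vs", rhs)
print("first moment:", Mf.mean(), "vs", lam * np.mean(f(u)))
```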

Proof. Let E_n, n ≥ 0, be a measurable partition of E into sets with finite µ-measure. First assume that f = 1_A for A ∈ E with µ(A) < ∞. Then M(A) is a random variable by definition of M, and this extends to any A ∈ E by considering A ∩ E_n, n ≥ 0, and summation. Since any measurable non-negative function is the increasing limit of finite linear combinations of such indicator functions, we obtain that M(f) is a random variable, as a limit of random variables. Moreover, a similar argument shows that M(f 1_{E_n}), n ≥ 0, are independent random variables.

Next, assume f ≥ 0. The number N_n of atoms of M that fall in E_n has law P(µ(E_n)), and given N_n = k, the atoms can be supposed to be independent random variables with law µ(· ∩ E_n)/µ(E_n). Therefore,

E[exp(−M(f 1_{E_n}))] = ∑_{k=0}^∞ e^{−µ(E_n)} (µ(E_n)^k / k!) ( ∫_{E_n} (µ(dx)/µ(E_n)) e^{−f(x)} )^k = exp( −∫_{E_n} µ(dx) (1 − exp(−f(x))) ).
From the independence of the variables M(f 1_{E_n}), we can then take products over n ≥ 0 (i.e. apply monotone convergence) and obtain the desired formula.

From this, we obtain the first moment formula for functions f ≥ 0. If f is a measurable function from E to R, applying the result to |f| shows that if f ∈ L¹(µ), then M(|f|) < ∞ a.s., so M(f) is well-defined for almost every ω, and defines a random variable as it is equal to M(f_+) − M(f_−).

The last formula of the theorem, in the case where f ∈ L¹(µ), follows by the same kind of argument: first, we establish the formula for f 1_{E_n} in place of f. Then, to obtain the result, we must show that ∫_{A_n} µ(dx)(e^{if(x)} − 1) converges to ∫_E µ(dx)(e^{if(x)} − 1), where A_n = E_0 ∪ ··· ∪ E_n. But |e^{if(x)} − 1| ≤ |f(x)|, whence the function under consideration is integrable with respect to µ, giving the result (|∫_{E\A_n} g(x) µ(dx)| ≤ ∫_{E\A_n} |g(x)| µ(dx) decreases to 0 whenever g is integrable).


7.3 Poisson point processes

We now show how Poisson random measures can be used to define certain stochastic processes. Let (E, E) be a measurable space, and consider a σ-finite measure G on (E, E). Let µ be the product measure dt ⊗ G(dx) on R_+ × E, where dt is the Lebesgue measure on (R_+, B(R_+)). Otherwise said, µ is the unique measure such that µ([0, t] × A) = t G(A) for t ≥ 0 and A ∈ E.

A Levy process (X_t, t ≥ 0) (with values in R) is a process with independent and stationary increments, i.e. such that for every 0 = t_0 ≤ t_1 ≤ ... ≤ t_k, the random variables (X_{t_i} − X_{t_{i−1}}, 1 ≤ i ≤ k) are independent, with respective laws those of X_{t_i − t_{i−1}}, 1 ≤ i ≤ k. Equivalently, X is a Levy process if and only if X^{(t)} = (X_{t+s} − X_t, s ≥ 0) has the same law as X and is independent of F^X_t = σ(X_s, 0 ≤ s ≤ t) for every t ≥ 0 (simple Markov property).

Proposition 7.3.1 A Poisson random measure M whose intensity µ is of the above form is called a Poisson point process. If f is a measurable G-integrable function on E, then the process
N^f_t = ∫_{[0,t]×E} f(x) M(ds, dx) ,   t ≥ 0,

is a Levy process. Moreover, the process
M^f_t = ∫_{[0,t]×E} f(x) M(ds, dx) − t ∫_E f(x) G(dx) ,   t ≥ 0,

is a martingale with respect to the filtration F_t = σ(M([0, s] × A), s ≤ t, A ∈ E), t ≥ 0. If moreover f ∈ L²(G), the process
(M^f_t)² − t ∫_E f(x)² G(dx) ,   t ≥ 0,
is an (F_t)-martingale.

Proof. For s ≤ t, we have N^f_t − N^f_s = ∫_{(s,t]×E} f(x) M(du, dx). Moreover, it is easy to check that M(du, dx) 1_{u∈(s,t]} has the same law as the image of M(du, dx) 1_{u∈(0,t−s]} under (u, x) ↦ (s + u, x) from R_+ × E to itself, and is independent of M(du, dx) 1_{u∈[0,s]}. We obtain that N^f has stationary and independent increments. The fact that M^f is a martingale is a straightforward consequence of the first moment formula and the simple Markov property. The last statement comes from writing (M^f_t)² = (M^f_t − M^f_s + M^f_s)² and expanding, then using the variance formula and the simple Markov property.
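For a fixed time t, the law of N^f_t is easy to simulate directly from the atoms of M on [0, t] × E. The Python sketch below is not part of the original notes; it takes E = R, G the standard Gaussian law (a finite measure, so the number of atoms is a.s. finite) and f(x) = x², all arbitrary choices, and checks the moment identities E[N^f_t] = t ∫ f dG and E[M^f_t] = 0.

```python
import numpy as np

rng = np.random.default_rng(6)

t, G_mass = 3.0, 1.0          # G = N(0,1), total mass 1
f = lambda x: x ** 2
int_f_dG = 1.0                # int x^2 G(dx) = 1 for the standard Gaussian

def Nf_t():
    n = rng.poisson(t * G_mass)        # number of atoms of M in [0,t] x R
    marks = rng.standard_normal(n)     # their E-components, i.i.d. with law G/G(E)
    return f(marks).sum()

samples = np.array([Nf_t() for _ in range(50_000)])
print("E[N^f_t] ~", samples.mean(), "  vs  t * int f dG =", t * int_f_dG)
print("E[M^f_t] ~", samples.mean() - t * int_f_dG, "  (should be close to 0)")
```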

7.3.1 Example: the Poisson process

Let X_1, X_2, ... be a sequence of independent exponential random variables with parameter θ, and define 0 = T_0 ≤ T_1 ≤ ... by T_n = X_1 + ... + X_n. We let
N^θ_t = ∑_{n=1}^∞ 1_{T_n ≤ t} ,   t ≥ 0,


be the cadlag process that counts the number of times T_n that are ≤ t. The process (N^θ_t, t ≥ 0) is called the (homogeneous) Poisson process with intensity θ. This is the so-called Markovian description of the Poisson process, which is a jump-hold Markov process. The following alternative description makes use of Poisson random measures. We give the statement without proof; it can be found in textbooks, or makes a good exercise (first notice that with both definitions, N^θ is a process with stationary and independent increments).

Proposition 7.3.2 Let θ > 0, and let M be a Poisson random measure with intensity θ dt on R_+. Then the process
N^θ_t = M([0, t]) ,   t ≥ 0,
is a Poisson process with intensity θ.

The set of atoms of the measure M itself is sometimes also called a Poisson (point) process with intensity θ.
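The two descriptions are easy to compare by simulation at a fixed time t: counting the partial sums of i.i.d. Exp(θ) interarrival times that fall in [0, t] should give the same law as a Poisson(θt) count. The Python sketch below is not part of the original notes; θ, t and the truncation at 50 interarrival times (far more than are ever needed here) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

theta, t, n_sim = 2.0, 5.0, 50_000

# (i) jump-hold description: N_t = #{n : T_n <= t}, T_n = sum of Exp(theta) variables
jump_counts = np.array([
    np.searchsorted(np.cumsum(rng.exponential(1.0 / theta, size=50)), t)
    for _ in range(n_sim)
])
# (ii) Poisson-measure description: N_t = M([0,t]) ~ Poisson(theta * t)
measure_counts = rng.poisson(theta * t, size=n_sim)

print("means:", jump_counts.mean(), measure_counts.mean(), "(expected", theta * t, ")")
print("variances:", jump_counts.var(), measure_counts.var())
```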

7.3.2 Example: compound Poisson processes

A compound Poisson process with intensity ν is a process of the form
N^ν_t = ∫_{[0,t]×R} x M(ds, dx) ,   t ≥ 0,

where M is a Poisson random measure with intensity dt ⊗ ν(dx) and ν is a finite measure on R. Alternatively, if we write M in the form ∑_{i∈I} δ_{(t_i, x_i)}, for every t ≥ 0 we can write X_t = x_i whenever t = t_i, and X_t = 0 otherwise. As an exercise, one can prove that this is a.s. well-defined, i.e. that a.s., for every t ≥ 0, the set {i ∈ I : t_i = t} has at most one element. With this notation, we can write

N^ν_t = ∑_{0≤s≤t} X_s ,   t ≥ 0
(notice that a.s. there are only finitely many times s ∈ [0, t] such that X_s ≠ 0, so the sum is meaningful).

There is a Markov jump-hold description of these processes as well: if N^θ is a Poisson process with parameter θ = ν(R), with jump times 0 < T_1 < T_2 < ..., and if Y_1, Y_2, ... is a sequence of i.i.d. random variables, independent of N^θ and with law ν/θ, then the process
∑_{n≥1} Y_n 1_{T_n ≤ t} ,   t ≥ 0,

is a compound Poisson process with intensity ν. This comes from the following marking property of Poisson measures: suppose we have a description of a Poisson random measure M(dx) with intensity µ as ∑_{i∈I} δ_{X_i}(dx), where (X_i, i ∈ I) is a countable family of random variables. If (Y_i, i ∈ I) is a family of i.i.d. random variables with common law ρ (a probability measure), independent of M, then M′ = ∑_{i∈I} δ_{(X_i, Y_i)} is a Poisson random measure with intensity the product measure µ ⊗ ρ.

We let CP(ν) be the law of N^ν_1; it is called the compound Poisson distribution with intensity ν. It can be written in the form
CP(ν) = ∑_{n≥0} e^{−ν(R)} ν^{*n} / n! ,

where ν^{*n} is the n-fold convolution of the measure ν. Recall that the convolution of two finite measures µ, ν on R is the unique measure µ ∗ ν characterized by
µ ∗ ν(A) = ∫∫ 1_A(x + y) µ(dx) ν(dy) ,   A ∈ B_R,

and that if µ, ν are probability measures, then µ ∗ ν is the law of the sum of two independent random variables with respective laws µ and ν. The characteristic function of CP(ν) is given by
Φ_{CP(ν)}(u) = exp(−ν(R)(1 − Φ_{ν/ν(R)}(u))),
where Φ_{ν/ν(R)} is the characteristic function of ν/ν(R).
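This characteristic function formula is easy to check empirically. The Python sketch below is not part of the original notes; it takes ν = λ · N(0, 1) (so ν/ν(R) is the standard Gaussian, with characteristic function e^{−u²/2}), λ and u being arbitrary choices, and compares the empirical characteristic function of a compound Poisson sample with the formula.

```python
import numpy as np

rng = np.random.default_rng(8)

lam, u, n_sim = 3.0, 1.3, 100_000

# N^nu_1 = sum of Poisson(lam) many i.i.d. standard Gaussian jumps
samples = np.array([rng.standard_normal(rng.poisson(lam)).sum() for _ in range(n_sim)])

empirical = np.exp(1j * u * samples).mean()
exact = np.exp(-lam * (1.0 - np.exp(-u ** 2 / 2.0)))
print("empirical char. function:", empirical, "   formula:", exact)
```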


Chapter 8

Infinitely divisible laws and Levy processes

In this chapter, we consider only random variables and processes with values in R.

8.1 Infinitely divisible laws and Levy-Khintchine formula

Definition 8.1.1 Let µ be a probability measure on (R, B_R). We say that µ is infinitely divisible (ID) if for every n ≥ 1 there exists a probability distribution µ_n such that if X_1, ..., X_n are independent with law µ_n, then their sum X_1 + ... + X_n has law µ.

Otherwise said, for every n there exists µ_n such that µ_n^{*n} = µ, where ∗ stands for the convolution operation on measures. Yet otherwise said, the characteristic function Φ of µ is such that for every n ≥ 1 there exists another characteristic function Φ_n with Φ_n^n = Φ. We stress that what is problematic is not the existence of a function whose n-th power is Φ, but really that this function be a characteristic function.

To start with, let us mention examples of ID laws. Constant random variables are ID. The Gaussian N(m, σ²) is the convolution of n laws N(m/n, σ²/n), so it is ID. The Poisson law P(λ) is also ID, as the convolution of n laws P(λ/n). More generally, a compound Poisson law CP(ν) is ID, as the n-th convolution power of CP(ν/n).

It is a bit harder to see, but nonetheless true, that exponential and geometric distributions are ID. However, the uniform distribution on [0, 1], or the Bernoulli distribution with parameter p ∈ (0, 1), are not ID. Suppose indeed that an ID law µ has support contained in [−M, M] for some M > 0. Then the support of µ_n is bounded by M/n, so its variance is ≤ M²/n², which shows that the variance of µ is ≤ M²/n for every n; hence µ is a Dirac mass.

The main goal of this chapter is to give a structural theorem for ID laws, the Levy-Khintchine formula. Say that a triple (a, q,Π) is a Levy triple if

• a ∈ R,

• q ≥ 0,


• Π is a σ-finite measure on R such that Π({0}) = 0 and ∫ (x² ∧ 1) Π(dx) < ∞.

In particular, Π({x : |x| > ε}) < ∞ for every ε > 0.

Theorem 8.1.1 (Levy-Khintchine formula) Let µ be an ID law. Then there exists a unique Levy triple (a, q, Π) such that, if Φ is the characteristic function of µ, then Φ(u) = e^{ψ(u)}, where ψ is the characteristic exponent given by
ψ(u) = iau − (q/2)u² + ∫_R (e^{iux} − 1 − iux 1_{|x|<1}) Π(dx).

We recover the constant laws for q = 0 and Π = 0, the normal laws for Π = 0, and (up to the drift coming from the compensating term iux 1_{|x|<1}) the compound Poisson laws for q = 0 and Π(dx) = ν(dx) a finite measure.

Lemma 8.1.1 The characteristic function Φ of an ID law never vanishes, and therefore the characteristic exponent ψ with ψ(0) = 0 is well-defined and unique.

Proof. If µ is ID, then Φ = Φ_n^n for all n, where Φ_n is the characteristic function of some law µ_n. Therefore, |Φ| = |Φ_n|^n and, taking logarithms, we see that |Φ_n| converges pointwise to 1_{Φ≠0} as n → ∞. However, Φ is continuous and takes the value 1 at 0, so it is non-zero in a neighborhood of 0, and 1_{Φ≠0} equals 1 (hence is continuous) in a neighborhood of 0. By Levy's convergence theorem, this shows that µ_n converges weakly to some distribution, which has no choice but to be δ_0. In particular, Φ never vanishes.

To conclude, it is a standard topology exercise that a continuous function f : R → C that never vanishes and such that f(0) = 1 can be uniquely lifted into a continuous function g : R → C with g(0) = 0, so that e^g = f.

As a corollary, notice that Φ_n, the n-th 'root' of Φ, can itself be written in the form e^{ψ_n} for a unique continuous ψ_n satisfying ψ_n(0) = 0, so that ψ_n = ψ/n. It also entails the uniqueness of µ_n such that µ_n^{*n} = µ.

Lemma 8.1.2 An ID law is the weak limit of compound Poisson laws.

Proof. Let Φ_n be the characteristic function of µ_n, as defined above. Since (1 − (1 − Φ_n))^n = Φ and Φ_n → 1 pointwise, we obtain that −n(1 − Φ_n) → ψ pointwise, taking the complex logarithm in a neighborhood of 1. In fact, this convergence even holds uniformly on compact neighborhoods of 0, a fact that we will need later on. Exponentiating gives exp(−n(1 − Φ_n)) → Φ. But on the left-hand side we recognize the characteristic function of a compound Poisson law with intensity nµ_n.

Proof of the Levy-Khintchine formula. We must now prove that the limit ψ of −n(1 − Φ_n) has the form given in the statement of the theorem. First of all, we make a technical modification of the statement, replacing the 1_{|x|<1} in the statement by a continuous function h such that 1_{|x|<1} ≤ h ≤ 1_{|x|≤2}. This will just modify the value of a in the statement.


Let η_n(dx) = (1 ∧ x²) n µ_n(dx), which defines a sequence of measures with finite total mass. Suppose we know that the sequence (η_n, n ≥ 1) is tight and (η_n(R), n ≥ 1) is bounded, and let η be the limit of η_n along some subsequence n_k. Then

∫_R (e^{iux} − 1) n µ_n(dx) = ∫_R (e^{iux} − 1) η_n(dx)/(x² ∧ 1)    (8.1)
= ∫_R ((e^{iux} − 1 − iux h(x))/(x² ∧ 1)) η_n(dx) + iu ∫_R (x h(x)/(x² ∧ 1)) η_n(dx)
= ∫_R Θ(u, x) η_n(dx) + iu a_n ,

where
Θ(u, x) = (e^{iux} − 1 − iux h(x))/(x² ∧ 1) if x ≠ 0 ,   Θ(u, 0) = −u²/2,
and a_n = ∫_R (x h(x)/(x² ∧ 1)) η_n(dx). Now, for each fixed u, Θ(u, ·) is a continuous bounded function, and therefore, along the subsequence n_k, ∫_R Θ(u, x) η_n(dx) converges to ∫_R Θ(u, x) η(dx).

Since the left-hand side in (8.1) converges to ψ(u), this implies that a_{n_k} converges to some a ∈ R. Therefore, if q = η({0}), we obtain that
ψ(u) = iua − (q/2)u² + ∫_R (e^{iux} − 1 − iux h(x)) Π(dx),
where Π(dx) = 1_{x≠0} (x² ∧ 1)^{−1} η(dx) is a σ-finite measure that integrates x² ∧ 1 and does not charge 0. Hence the result.

So, let us prove that (η_n, n ≥ 1) is tight and that the total masses are bounded. First, x² 1_{|x|≤1} ≤ C(1 − cos x) for some C > 0, so
η_n(|x| ≤ 1) = ∫_R x² 1_{|x|≤1} n µ_n(dx) ≤ C ∫_R (1 − cos x) n µ_n(dx),

which converges to −C Re ψ(1) as n → ∞. Second, adapting Lemma 5.4.1, since η_n 1_{|x|≥1} = n µ_n 1_{|x|≥1}, for some C > 0 and every K ≥ 1,
η_n(|x| ≥ K) ≤ CK ∫_{|x|≤K^{−1}} n(1 − Re Φ_n(x)) dx  →_{n→∞}  −CK ∫_{−K^{−1}}^{K^{−1}} Re ψ(x) dx,
where the limit can be taken because the convergence of the integrand is uniform on compact neighborhoods of 0, as stressed in the proof of Lemma 8.1.2. Now the limit can be made as small as wanted for K large enough, because ψ is continuous and ψ(0) = 0. This entails the result.

The uniqueness statement will be proved in the next section.

8.2 Levy processes

In this section, all the Levy processes under consideration start at X_0 = 0. Levy processes are closely related to ID laws: indeed, if X is a Levy process, then the random variable X_1 can be written as a sum of i.i.d. variables

X_1 = ∑_{k=1}^n (X_{k/n} − X_{(k−1)/n}),

hence is ID. In fact, (laws of) cadlag Levy processes are in one-to-one correspondence with ID laws, as we show in this section. The first point we address is the fact that the mapping X ↦ X_1 is injective from the set of (laws of) cadlag Levy processes to the set of ID laws.

Proposition 8.2.1 Let µ be an ID law. Then there exists at most one cadlag Levy process (X_t, t ≥ 0) such that X_1 has law µ. Moreover, if µ has Levy triple (a, q, Π) with associated characteristic exponent
ψ(u) = iau − (q/2)u² + ∫_R (e^{iux} − 1 − iux 1_{|x|<1}) Π(dx),
then the law of such a process X is entirely characterized by the formula
E[exp(iuX_t)] = exp(tψ(u)).

Proof. If X is as in the statement, then for n ≥ 1, X_{1/n} must have ψ/n as characteristic exponent, by uniqueness of the characteristic exponent of the n-th root of an ID law. From this we deduce easily that E[exp(iuX_t)] = exp(tψ(u)) for every t ∈ Q_+ and u ∈ R. Since X is cadlag, we deduce the result for every t ∈ R_+ by approximating t by 2^{−n}⌈2^n t⌉. Therefore, the one-dimensional marginal distributions of X are uniquely determined by µ. It is then easy to check that the finite-dimensional marginal distributions of Levy processes are in turn determined by their one-dimensional marginal distributions, because the increments (X_{t_j} − X_{t_{j−1}}, 1 ≤ j ≤ k), for any 0 = t_0 ≤ t_1 ≤ ... ≤ t_k, are independent with respective laws those of (X_{t_j − t_{j−1}}, 1 ≤ j ≤ k). Hence the result.

The next theorem is a kind of converse to this result, and gives an explicit construction of 'the' cadlag Levy process whose law at time 1 is a given ID law µ. Let (a, q, Π) be a Levy triple associated to an ID law µ. Consider a Poisson random measure M on R_+ × R with intensity dt ⊗ Π(dx), and let Δ_t = x if M has an atom of the form (t, x), and Δ_t = 0 otherwise. For any n ≥ 1, consider the martingale

Y^n_t = ∫_{[0,t]} ∫_R 1_{n^{−1}≤|y|<1} y M(ds, dy) − t ∫ y 1_{n^{−1}≤|y|<1} Π(dy) ,   t ≥ 0,

associated by Proposition 7.3.1 with the Poisson measure M(dt, dx) 1_{n^{−1}≤|x|<1}. Notice also that this last measure a.s. has finitely many atoms in [0, t] × R for every t, because Π(dx) 1_{|x|>n^{−1}} is a finite measure by the assumption on Π, so that

Y^n_t = ∑_{0≤s≤t} Δ_s 1_{n^{−1}≤|Δ_s|<1} − t ∫ y 1_{n^{−1}≤|y|<1} Π(dy) ,   t ≥ 0.    (8.2)

Independently of M , let Bt be a standard Brownian motion. Finally notice that

Y^0_t = ∑_{0≤s≤t} Δ_s 1_{|Δ_s|≥1} ,   t ≥ 0    (8.3)

is a compound Poisson process with intensity Π(dx) 1_{|x|>1}. We let F_t be the σ-algebra generated by {B_s, Y^0_s, Y^n_s, n ≥ 1; 0 ≤ s ≤ t}.


Theorem 8.2.1 (Levy-Ito's theorem) Let µ be an ID law with Levy triple (a, q, Π), and let B, Y^0, Y^n, n ≥ 1, denote the processes associated with this triple as explained above. Then there exists a cadlag square-integrable (F_t)-martingale Y^∞ such that for every t ≥ 0,
E[ sup_{0≤s≤t} |Y^n_s − Y^∞_s|² ] → 0  as n → ∞.

Moreover, the process
X_t = at + √q B_t + Y^0_t + Y^∞_t ,   t ≥ 0,

is a Levy process such that X1 has distribution µ.

This theorem, which is extremely useful in practice, gives an explicit construction of any cadlag Levy process out of four independent ingredients: a deterministic drift, a Brownian motion, and a jump part made of a compound Poisson process and a compensated cadlag L² martingale. The compensation by a drift in the formula defining Y^n is crucial, because the identity function is in general not in L¹(Π), so that ∫_{[0,t]×[−1,1]} x M(ds, dx) is in general ill-defined.
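The construction is easy to mimic numerically at a single time t, if one replaces the L² limit Y^∞ by the process Y^n for a fixed small truncation level. The Python sketch below is not part of the original notes; the triple a = 0.5, q = 1 and Π(dx) = 1_{0<|x|≤2} |x|^{−3/2} dx, as well as the truncation level ε, are arbitrary (hypothetical) choices; here the compensator vanishes because Π is symmetric.

```python
import numpy as np

rng = np.random.default_rng(9)

a, q, t, eps = 0.5, 1.0, 1.0, 1e-3   # hypothetical Levy triple and truncation level

def sample_jumps(lo, hi):
    """Atoms of M in [0,t] x {lo <= |x| < hi} for Pi(dx) = |x|^{-3/2} dx on 0 < |x| <= 2."""
    hi = min(hi, 2.0)
    mass = 4.0 * (lo ** -0.5 - hi ** -0.5)           # Pi({lo <= |x| < hi})
    n = rng.poisson(t * mass)                        # number of atoms in [0,t] x band
    u = rng.uniform(size=n)
    r = (lo ** -0.5 - u * (lo ** -0.5 - hi ** -0.5)) ** -2.0   # inverse-cdf sampling of |x|
    return r * rng.choice([-1.0, 1.0], size=n)       # random signs (Pi is symmetric)

big = sample_jumps(1.0, 2.0).sum()                   # Y^0_t: jumps with |x| >= 1
small = sample_jumps(eps, 1.0)                       # jumps with eps <= |x| < 1
compensator = 0.0                                    # int y 1_{eps<=|y|<1} Pi(dy) = 0 by symmetry
Y_inf = small.sum() - t * compensator                # approximation of Y^infinity_t
X_t = a * t + np.sqrt(q) * np.sqrt(t) * rng.standard_normal() + big + Y_inf
print("one sample of X_t:", X_t)
```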

Proof. For every n > m > 0, the process Y^n − Y^m is a cadlag martingale, and Doob's L² inequality gives
E[ sup_{0≤s≤t} |Y^n_s − Y^m_s|² ] ≤ 4 E[|Y^n_t − Y^m_t|²] = 4t ∫_R y² 1_{n^{−1}≤|y|<m^{−1}} Π(dy) ≤ 4t ∫_R y² 1_{0<|y|<m^{−1}} Π(dy),

where we used the last statement of Proposition 7.3.1 for the second equality. Since ∫ y² 1_{0<|y|<1} Π(dy) < ∞, this can be made as small as wanted for m large enough. In particular, for every t, (Y^n_t, n ≥ 1) is a Cauchy sequence in L², and thus converges to a limit Y^∞_t in L². The process (Y^∞_t, t ≥ 0) then defines a martingale, as is checked by passing to the limit as n → ∞ in E[Y^n_t | F_s] = Y^n_s. Moreover, by passing to the limit as n → ∞, we obtain that E[sup_{0≤s≤t} |Y^∞_s − Y^m_s|²] → 0 as m → ∞, for every t ≥ 0. By extracting along a subsequence, we may assume that the convergence is almost sure, so that Y^∞ is the a.s. uniform limit over compacts of cadlag processes, hence is also a cadlag process (in fact, admits a cadlag version).

Therefore, the process X defined in the statement is indeed a cadlag process, and it is easy to show that it is a Levy process, being a pointwise L² limit of Levy processes. The last thing that remains to be proved is that X_1 has law µ. But from the independence of the components used to build X, we obtain that if X^n_t = at + √q B_t + Y^0_t + Y^n_t, then

E[exp(iuX^n_1)] = exp( iua − (q/2)u² + ∫_R (e^{iuy} − 1) 1_{|y|≥1} Π(dy) + ∫_R (e^{iuy} − 1 − iuy) 1_{n^{−1}≤|y|<1} Π(dy) ).

By passing to the limit as n → ∞, we obtain that X_1 has the characteristic function associated to µ.


Proof of the uniqueness in Theorem 8.1.1. Let µ be an ID law with Levy triple (a, q, Π), and let X be the unique cadlag Levy process such that X_1 has law µ, given by Proposition 8.2.1. Then Theorem 8.2.1 shows that the jumps of X between times 0 and t, i.e. the process (Δ_s, 0 ≤ s ≤ t) defined by Δ_s = X_s − X_{s−}, are the atoms of a Poisson random measure M with intensity tΠ on R. This intensity is determined by the law of M through the first moment formula tΠ(A) = E[M(A)] of Proposition 7.2.1. Then, by defining Y^0 and Y^n by the formulas (8.2), (8.3), and letting Y^∞ = lim_n Y^n along a subsequence along which the limit is almost sure uniformly on compacts, we obtain that X − Y^0 − Y^∞ is a (scaled) Brownian motion with drift, with the same law as B̃ = (at + √q B_t, t ≥ 0). We can recover a as the expectation of B̃_1, and q as its variance. Finally, we see that µ uniquely determines its Levy triple.


Chapter 9

Exercises

Warmup

The exercises of this section are designed to help remind you of basic concepts of probability theory (random variables, expectation, classical probability distributions, Borel-Cantelli lemmas). The last one is a longer exercise that contains the basic results on uniform integrability that are needed in this course.

Exercise 9.0.1
Remind yourself what the following classical discrete distributions are: Bernoulli with parameter p ∈ [0, 1], binomial with parameters (n, p) ∈ N × [0, 1], geometric with parameter p ∈ [0, 1], Poisson with parameter λ ≥ 0.

Do so with the following classical distributions on R: uniform on [a, b], exponential with mean θ^{−1}, gamma with (positive) parameters (a, θ) (mean a/θ, variance a/θ²), beta with (positive) parameters (a, b), Gaussian with mean m and variance σ², Cauchy with parameter a.

Exercise 9.0.2
Compute the distribution of 1/N², where N is a standard Gaussian N(0, 1) random variable. What is the distribution of N/N′, where N, N′ are two independent such random variables?

Exercise 9.0.3
Show that for any countable set I and I-indexed family (X_i, i ∈ I) of non-negative random variables, sup_{i∈I} E[X_i] ≤ E[sup_{i∈I} X_i]. Show that these two quantities are equal if for every i, j ∈ I there exists some k ∈ I such that X_i ∨ X_j ≤ X_k.

Exercise 9.0.4
Fix α > 0, and let (Z_n, n ≥ 1) be a sequence of independent random variables with values in {0, 1}, whose laws are characterized by
P(Z_n = 1) = 1/n^α = 1 − P(Z_n = 0).

Show that Z_n converges to 0 in L¹. Show that lim sup_n Z_n is 0 a.s. if α > 1 and 1 a.s. if α ≤ 1.


Exercise 9.0.5
Let (X_n, n ≥ 1) be a sequence of independent exponential random variables with mean 1. Show that lim sup_n (log n)^{−1} X_n = 1 a.s.

Exercise 9.0.6
Let N be an N(0, 1) random variable. Show that
P(N > x) ≤ (1/(x√(2π))) exp(−x²/2).
Show in fact that as x → ∞,
P(N > x) = (1/(x√(2π))) exp(−x²/2)(1 + o(1)).
Let (Y_n, n ≥ 1) be a sequence of independent such Gaussian variables. Show that lim sup_n (2 log n)^{−1/2} Y_n = 1 a.s.

Exercise 9.0.7 The basics of uniform integrability
Let (E, A, µ) be a measure space with µ(E) < ∞. If f is a measurable non-negative function, we let µ(f) be shorthand for ∫_E f dµ.

A family of R-valued functions (f_i, i ∈ I) in L¹(E, A, µ) is said to be uniformly integrable (U.I. for short) if the following holds:
sup_{i∈I} µ(|f_i| 1_{|f_i|>a}) → 0  as a → ∞.

You may think of (E,A, µ) and the fi as being a probability space and random variables.

1. Show that a U.I. family is bounded in L¹(E, A, µ). Show that the converse is not true.

2. Show that a finite family of integrable functions is U.I.

3. Let G : R_+ → R_+ be a measurable function such that lim_{x→∞} x^{−1} G(x) = +∞. Show that for every C > 0, the family
{f ∈ L¹(E, A, µ) : µ(G(|f|)) ≤ C}
is U.I. Deduce that a family of measurable functions that is bounded in L^p(E, A, µ) for some p > 1 is U.I.

4. (Harder) Show that the converse is true: if (f_i, i ∈ I) is a U.I. family, then there exists a function G as in 3. such that (f_i, i ∈ I) is included in a set of the form of the previous displayed expression. (Hint: consider an increasing positive sequence (a_n, n ≥ 0) such that sup_{i∈I} µ(|f_i| 1_{|f_i|≥a_n}) ≤ 2^{−n} for every n.)

5. Let (f_i, i ∈ I) be a family that is bounded in L¹(E, A, µ). Show that (i) and (ii) below are equivalent:

(i) (f_i, i ∈ I) is U.I.

(ii) ∀ ε > 0, ∃ δ > 0 s.t. ∀ A ∈ A, µ(A) < δ ⟹ sup_{i∈I} µ(|f_i| 1_A) < ε.

6. Show that if (f_i, i ∈ I) and (g_j, j ∈ J) are two U.I. families, then (f_i + g_j, i ∈ I, j ∈ J) is also U.I.


7. Let (f_n, n ≥ 0) be a sequence of L¹ functions that converges in measure to a measurable function f, i.e. for every ε > 0,
µ(|f − f_n| > ε) → 0  as n → ∞.
Show that (f_n, n ≥ 0) converges in L¹ to f if and only if (f_n, n ≥ 0) is U.I. Hint: for the necessary condition, you might find it useful to consider sets such as {|f − f_n| > 1}, {ε < |f − f_n| ≤ 1} and {|f − f_n| ≤ ε}.

Remark. This shows that a sequence of random variables converging in probability (or a.s.) to some other random variable has an 'upgraded' L¹ convergence if and only if it is uniformly integrable.

9.1 Conditional expectation

Exercise 9.1.1
Let X, Y be two random variables in L¹ such that
E[X | Y] = Y  and  E[Y | X] = X.
Show that X = Y a.s. As a hint, you may want to consider quantities like E[(X − Y) 1_{X>c, Y≤c}] + E[(X − Y) 1_{X≤c, Y≤c}].

Exercise 9.1.2
Let X, Y be two independent Bernoulli random variables with parameter p ∈ (0, 1). Let Z = 1_{X+Y=0}. Compute E[X | Z], E[Y | Z].

Exercise 9.1.3
Let X ≥ 0 be a random variable on a probability space (Ω, F, P), and let G ⊆ F be a sub-σ-algebra. Show that {X > 0} ⊆ {E[X | G] > 0}, up to an event of zero probability. Show that {E[X | G] > 0} is actually the smallest G-measurable event containing {X > 0}, up to zero-probability events.

Exercise 9.1.4
Check that the sum Z of two independent exponential random variables X, Y with parameter θ > 0 (mean 1/θ) has a gamma distribution with parameter (2, θ), whose density with respect to Lebesgue measure is θ² x exp(−θx) 1_{x≥0}. Show that for every non-negative measurable h,
E[h(X) | Z] = (1/Z) ∫_0^Z h(u) du.

Conversely, let Z be a random variable with a Γ(2, θ) distribution, and suppose X is a random variable whose conditional distribution given Z is uniform on [0, Z]: namely, for every Borel non-negative function h, E[h(X) | Z] = Z^{−1} ∫_0^Z h(x) dx a.s. Show that X and Z − X are independent, with exponential law.


Exercise 9.1.5
Suppose given a, b > 0, and let X, Y be two random variables with values in Z_+ and R_+ respectively, whose joint distribution is characterized by the formula
P(X = n, Y ≤ t) = b ∫_0^t ((ay)^n / n!) exp(−(a + b)y) dy.

Let n ∈ Z_+ and let h : R_+ → R_+ be a measurable function; compute E[h(Y) | X = n]. Then compute E[Y/(X + 1)], E[1_{X=n} | Y] and E[X | Y].

Exercise 9.1.6
Let (X, Y_1, ..., Y_n) be a random vector with components in L². Show that the best approximation of X in the L² norm by an affine combination of the (Y_i, 1 ≤ i ≤ n), say of the form λ_0 + ∑_{i=1}^n λ_i (Y_i − E[Y_i]), is given by λ_0 = E[X] and any solution (λ_1, ..., λ_n) of the linear system
Cov(X, Y_j) = ∑_{i=1}^n λ_i Cov(Y_i, Y_j) ,   1 ≤ j ≤ n.

This affine combination is called the linear regression of X with respect to (Y1, . . . , Yn).

If (X, Y_1, ..., Y_n) is a Gaussian random vector, show that E[X | Y_1, ..., Y_n] equals the linear regression of X with respect to (Y_1, ..., Y_n).

Exercise 9.1.7
Let X ∈ L¹(Ω, F, P). Show that the family
{E[X | G] : G is a sub-σ-algebra of F}
is uniformly integrable.

Exercise 9.1.8 Conditional independence
Let G ⊆ F be a sub-σ-algebra. Two random variables X, Y are said to be independent conditionally on G if for every non-negative measurable f, g,
E[f(X) g(Y) | G] = E[f(X) | G] E[g(Y) | G].
When are two random variables independent conditionally on {∅, Ω}? On F?

1. Show that X, Y are independent conditionally on G if and only if for every non-negative G-measurable random variable Z and all non-negative measurable functions f, g,
E[f(X) g(Y) Z] = E[f(X) Z E[g(Y) | G]],
and that this holds if and only if for every measurable non-negative g,
E[g(Y) | G ∨ σ(X)] = E[g(Y) | G].

Comment on the case G = {∅, Ω}.
2. Suppose given three random variables X, Y, Z with a positive joint density p(x, y, z). Suppose X, Y are independent conditionally on σ(Z). Show that there exist measurable positive functions r, s such that p(x, y, z) = q(z) r(x, z) s(y, z), where q is the density of Z, and conversely.


9.2 Discrete-time martingales

Exercise 9.2.1
Let (X_n, n ≥ 0) be an integrable process with values in a countable subset E ⊂ R. Show that X is a martingale with respect to its natural filtration if and only if for every n and every i_0, ..., i_n ∈ E,
E[X_{n+1} | X_0 = i_0, ..., X_n = i_n] = i_n.

Exercise 9.2.2
Let (X_n, n ≥ 1) be a sequence of independent random variables with respective laws given by
P(X_n = −n²) = 1/n² ,   P(X_n = n²/(n² − 1)) = 1 − 1/n².

Let S_n = X_1 + ... + X_n. Show that S_n/n → 1 a.s. as n → ∞, and deduce that (S_n, n ≥ 0) is a martingale which converges to +∞.

Exercise 9.2.3
Let (Ω, F, (F_n), P) be a filtered probability space. Let A ∈ F_n for some n, and let m, m′ ≥ n. Show that m 1_A + m′ 1_{A^c} is a stopping time.
Show that a process (X_n, n ≥ 0) adapted to some filtered probability space is a martingale if and only if it is integrable and, for every bounded stopping time T, E[X_T] = E[X_0].

Exercise 9.2.4
Let X be a martingale (resp. supermartingale) on some filtered probability space, and let T be an a.s. finite stopping time. Prove that E[X_T] = E[X_0] (resp. E[X_T] ≤ E[X_0]) if either one of the following conditions holds:

1. X is bounded (∃ M > 0 : ∀ n ≥ 0, |X_n| ≤ M a.s.).

2. X has bounded increments (∃ M > 0 : ∀ n ≥ 0, |X_{n+1} − X_n| ≤ M a.s.) and E[T] < ∞.

Exercise 9.2.5
Let (X_n, n ≥ 0) be a non-negative supermartingale. Show the maximal inequality, for a > 0:
a P( max_{0≤k≤n} X_k ≥ a ) ≤ E[X_0].

Exercise 9.2.6
Let T be an (F_n, n ≥ 0)-stopping time such that for some integer N > 0 and ε > 0,
P(T ≤ N + n | F_n) ≥ ε ,   for every n ≥ 0.
Show that E[T] < ∞. Hint: find bounds for P(T > kN).


Exercise 9.2.7
Your winnings per unit stake on game n are ε_n, where (ε_n, n ≥ 1) is a sequence of independent random variables with
P(ε_n = 1) = p ,   P(ε_n = −1) = 1 − p = q,
where p ∈ (1/2, 1). Your stake C_n on game n must lie between 0 and Z_{n−1}, where Z_{n−1} is your fortune at time n − 1. Your objective is to maximize the expected 'interest rate' E[log(Z_N/Z_0)], where N is a given integer representing the length of the game, and Z_0, your fortune at time 0, is a given constant. Let F_n = σ{ε_1, ..., ε_n}. Show that if C is any previsible strategy, that is, if C_n is F_{n−1}-measurable for all n, then log Z_n − nα is a supermartingale, where α denotes the entropy
α = p log p + q log q + log 2,
so that E[log(Z_N/Z_0)] ≤ Nα, but that, for a certain strategy, log Z_n − nα is a martingale. What is the best strategy?

Exercise 9.2.8 Polya's urn
Consider an urn that initially contains two balls, one black, one white. One picks at random one of the balls with equal probability, checks the color, replaces the ball in the urn and adds another ball of the same color. Then the procedure is repeated. After step n, n + 2 balls are in the urn, of which B_n + 1 are black and n + 1 − B_n are white.

1. Show that ((n + 2)^{−1}(B_n + 1), n ≥ 0) is a martingale with respect to a certain filtration you should indicate. Show that it converges a.s. and in L^p for all p ≥ 1 to a [0, 1]-valued random variable X_∞.

2. Show that for every k, the process
(B_n + 1)(B_n + 2)···(B_n + k) / ((n + 2)(n + 3)···(n + k + 1)) ,   n ≥ 1,
is a martingale. Deduce the value of E[X_∞^k], and finally the law of X_∞.

3. Re-obtain this result by directly showing that P(B_n = k) = (n + 1)^{−1} for every n ≥ 1 and 0 ≤ k ≤ n. As a hint, let Y_i be the indicator that the i-th picked ball is black, and compute P(Y_i = a_i, 1 ≤ i ≤ n) for any (a_i, 1 ≤ i ≤ n) ∈ {0, 1}^n.

4. Show that for 0 < θ < 1, (N^θ_n, n ≥ 0) is a martingale, where
N^θ_n = ((n + 1)! / (B_n! (n − B_n)!)) θ^{B_n} (1 − θ)^{n − B_n}.
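(For illustration only — this simulation does not replace the martingale arguments asked for above — the Python sketch below, which is not part of the original notes, simulates Polya's urn and shows that the empirical distribution of (B_n + 1)/(n + 2) for large n looks uniform on [0, 1]; the horizon and sample size are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(10)

def polya(n_steps):
    """Run Polya's urn for n_steps draws; return the final black proportion (B_n+1)/(n+2)."""
    black, total = 1, 2
    for _ in range(n_steps):
        if rng.random() < black / total:
            black += 1
        total += 1
    return black / total

samples = np.array([polya(400) for _ in range(2000)])
print("empirical deciles:", np.quantile(samples, np.linspace(0.1, 0.9, 9)).round(2))
```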

Exercise 9.2.9 Bayes' urn
Let U be a uniform random variable on [0, 1], and conditionally on U, let X_1, X_2, ... be independent Bernoulli random variables with parameter U. Let B_n = ∑_{i=1}^n X_i. Show that for every n, (B_1, ..., B_n) has the same law as the sequence (B_1, ..., B_n) in the previous exercise. Show that N^θ_n is a conditional density function of U given B_1, ..., B_n.

Exercise 9.2.10 Monkey typing ABRACADABRA
A monkey types a text at random on a keyboard, so that each new letter is picked uniformly at random among the 26 letters of the roman alphabet. Let X_n be the n-th letter of the monkey's masterpiece, and let T be the first time when the monkey has typed the exact word ABRACADABRA:
T = inf{n ≥ 0 : (X_{n−10}, X_{n−9}, ..., X_n) = (A, B, R, A, C, A, D, A, B, R, A)}.

Show that E[T] < ∞. The goal is to give the exact value of E[T]. For this, suppose that just before each time n, a player P_n comes and bets 1 gold coin (GC) that X_n will be A. If he loses, he leaves the game; if he wins, he earns 26 GC, which he entirely stakes on X_{n+1} being B. If he loses, he leaves; else he earns 26² GC, which he bets on X_{n+2} being R, and so on. Show that
E[T] = 26^{11} + 26^4 + 26.

(Hint: use Exercise 9.2.4.) Why is that larger than the average first time the monkey has typed ABRACADABRI?

Exercise 9.2.11
Let (X_n, n ≥ 0) be a sequence of [0, 1]-valued random variables which satisfies the following property. First, X_0 = a a.s. for some a ∈ (0, 1), and for n ≥ 0,
P(X_{n+1} = X_n/2 | F_n) = 1 − X_n ,   P(X_{n+1} = (1 + X_n)/2 | F_n) = X_n,
where F_n = σ{X_k, 0 ≤ k ≤ n}. Here we have denoted P(A | G) = E[1_A | G].

1. Prove that (Xn, n ≥ 0) is a martingale that converges in Lp for every p ≥ 1.

2. Check that E[(X_{n+1} − X_n)²] = E[X_n(1 − X_n)]/4. Then determine E[X_∞(1 − X_∞)] and deduce the law of X_∞.

Exercise 9.2.12
Let (X_n, n ≥ 0) be a martingale in L². Show that its increments (X_{n+1} − X_n, n ≥ 0) are pairwise orthogonal. Conclude that X is bounded in L² if and only if
∑_{n≥0} E[(X_{n+1} − X_n)²] < ∞,
and that X_n converges in L² in this case, without using the L² convergence theorem for martingales.

Exercise 9.2.13 Wald's identity
Let (X_n, n ≥ 1) be a sequence of independent and identically distributed real integrable random variables, which are not a.s. 0. We let S_n = X_1 + ... + X_n be the associated random walk, and recall that (S_n − nE[X_1], n ≥ 0) is a martingale. Let T be an (F_n)-stopping time.

1. Show that
E[|S_{T∧n} − S_T|] ≤ ∑_{k=n+1}^∞ E[|X_k| 1_{T≥k}] ≤ E[|X_1|] E[T 1_{T≥n+1}].


Deduce that if E[T] < ∞, then S_{T∧n} converges to S_T in L¹. Deduce that if E[T] < ∞, then E[S_T] = E[X_1] E[T].

2. Suppose E[X_1] = 0 and T_a = inf{n ≥ 0 : S_n > a} for some a > 0. Show that E[T_a] = ∞.

3. Let now a < 0 < b and T_{a,b} = inf{n ≥ 0 : S_n < a or S_n > b}. Assume that E[X_1] ≠ 0. By discussing separately the cases where X_1 is bounded or not, prove that E[T_{a,b}] < ∞ and that E[S_{T_{a,b}}] = E[X_1] E[T_{a,b}].

4. Assume that E[X_1] = 0. Show that E[T_{a,b}] < ∞. Hint: consider again separately the cases when X_1 is bounded and unbounded. In the bounded case, think how far (S_n², n ≥ 0) is from being a martingale.

Exercise 9.2.14 The gambler's ruin
Let 0 < K < N be integers. Consider a sequence of independent random variables (X_n, n ≥ 1) with P(X_n = 1) = p = 1 − P(X_n = −1), where p ∈ (0, 1/2) ∪ (1/2, 1). Let S_n = K + X_1 + ... + X_n and define
T_0 = inf{n ≥ 1 : S_n = 0} ,   T_N = inf{n ≥ 1 : S_n = N}.
Show that T := T_0 ∧ T_N is a.s. finite (and in fact has finite expectation). Then show that, letting q = 1 − p,
M_n = (q/p)^{S_n} ,   N_n = S_n − (p − q)n ,   n ≥ 0,

defines two martingales with respect to the natural filtration of (S_n, n ≥ 1). Compute P(T_0 < T_N) and E[S_T], E[T].

What happens to this exercise if p = 1/2?

Exercise 9.2.15 Azuma-Hoeffding inequality
1. Let Y be a random variable taking values in [−c, c] for some c > 0 and such that E[Y] = 0. Show that for every θ ∈ R,
E[e^{θY}] ≤ cosh(θc) ≤ exp(θ²c²/2).

As a hint, the convexity of z ↦ e^{zθ} entails that
e^{yθ} ≤ ((y + c)/(2c)) e^{cθ} + ((c − y)/(2c)) e^{−cθ}.

Also, state and prove a conditional version of this fact.

2. Let M be a martingale with M_0 = 0, such that there exists a sequence (c_n, n ≥ 1) of positive real numbers with |M_n − M_{n−1}| ≤ c_n for every n. Show that for x ≥ 0,
P( sup_{0≤k≤n} M_k ≥ x ) ≤ exp( −x² / (2 ∑_{k=1}^n c_k²) ).

As a hint, notice that (e^{θM_n}, n ≥ 0) is a submartingale, and optimize over θ.


Exercise 9.2.16 A discrete Girsanov theorem
Let Ω be the space of real-valued sequences (ω_n, n ≥ 0) such that lim sup_n ω_n = +∞ and lim inf_n ω_n = −∞. We say that such sequences oscillate. Let F_n = σ{X_k, 0 ≤ k ≤ n}, where X_k(ω) = ω_k is the k-th projection, and F = F_∞. Show that p = 1/2 is the only real in (0, 1) such that there exists a probability measure P_p on (Ω, F) that makes (X_n, n ≥ 0) a simple random walk with step distributions
P_p(X_1 = 1) = p = 1 − P_p(X_1 = −1).

Let P_{p,n} be the unique probability measure on (Ω, F_n) that makes (X_k, 0 ≤ k ≤ n) a simple random walk with these step distributions. If p ∈ (0, 1) \ {1/2}, identify the martingale
M_n = dP_{p,n} / dP_{1/2,n}.

Find a finite stopping time T such that E1/2[MT ] < 1.

Exercise 9.2.17
Let f : [0, 1] → R be a Lipschitz function, i.e. |f(x) − f(y)| ≤ K|x − y| for some K > 0 and every x, y. Let f_n be the function obtained by interpolating linearly between the values of f taken at numbers of the form k2^{−n}, 0 ≤ k ≤ 2^n, and let M_n = f_n′.

1. Show that Mn is a martingale in some filtration.

2. Deduce that there exists an integrable function g : [0, 1] → R such that f(x) = f(0) + ∫_0^x g(y) dy for almost every 0 ≤ x ≤ 1.

Exercise 9.2.18 Doob's decomposition of submartingales
Let (X_n, n ≥ 0) be a submartingale.

1. Show that there exist a unique martingale (M_n) and a unique previsible process (A_n, n ≥ 0) such that A_0 = 0, A is increasing and X = M + A.

2. Show that M, A are bounded in L¹ if and only if X is, and that A_∞ < ∞ a.s. in this case (and even that E[A_∞] < ∞), where A_∞ is the increasing limit of A_n as n → ∞.

Exercise 9.2.19
Let (X_n, n ≥ 0) be a U.I. submartingale.

1. Show that if X = M + A is the Doob decomposition of X, then M is U.I.

2. Show that for every pair of stopping times S, T , with S ≤ T ,

E[XT |FS] ≥ XS

Exercise 9.2.20 Quadratic variation
Let (X_n, n ≥ 0) be a square-integrable martingale.

1. Show that there exists a unique increasing previsible process starting at 0, which we denote by (⟨X⟩_n, n ≥ 0), such that (X_n² − ⟨X⟩_n, n ≥ 0) is a martingale.

2. Let C be a bounded previsible process. Compute ⟨C · X⟩.

3. Let T be a stopping time; show that ⟨X^T⟩ = ⟨X⟩^T.

4. (Harder) Show that ⟨X⟩_∞ < ∞ implies that X_n converges as n → ∞, up to a zero-probability event. Is the converse true? Show that it is when sup_{n≥0} |X_{n+1} − X_n| ≤ K a.s. for some K > 0.


9.3 Continuous-time processes

Exercise 9.3.1 Gaussian processes
A real-valued process (X_t, t ≥ 0) is called a Gaussian process if for every t_1 < t_2 < ... < t_k, the random vector (X_{t_1}, ..., X_{t_k}) is a Gaussian random vector. Show that the law of a Gaussian process is uniquely characterized by the numbers E[X_t], t ≥ 0, and Cov(X_s, X_t) for s, t ≥ 0.

Exercise 9.3.2
Let T be an exponential random variable with parameter λ > 0. Define
Z_t = 0 if t < T, 1 if t ≥ T ;   F_t = σ{Z_s, 0 ≤ s ≤ t} ;   M_t = 1 − e^{λt} if t < T, 1 if t ≥ T.
Show that E[|M_t|] < ∞ for every t ≥ 0, and that E[M_t 1_{T>r}] = E[M_s 1_{T>r}] for every r ≤ s ≤ t. Deduce that (M_t, t ≥ 0) is a cadlag (F_t)-martingale.

Is M bounded in L1? Is it uniformly integrable? Is MT− in L1?

Exercise 9.3.3 Hazard function
Let T be a random variable in (0, ∞) that admits a strictly positive continuous density f on (0, ∞). Let F(t) = P(T ≤ t). Let
A_t = ∫_0^t f(s) ds / (1 − F(s)) ,   t ≥ 0,
be the hazard function of T. Show that A_T has the law of an exponential random variable with parameter 1. As a hint, consider the distribution function P(A_T ≤ t), t ≥ 0, and write it in terms of the inverse function A^{−1}.

By letting Z_t = 1_{t≥T}, t ≥ 0, and F_t = σ{Z_s, 0 ≤ s ≤ t}, prove that (Z_t − A_{T∧t}, t ≥ 0) is a cadlag martingale with respect to (F_t, t ≥ 0).

The next exercises are designed to (hopefully) help those of you who want to gain better insight into the nature of filtrations and events related to continuous-time processes.

Exercise 9.3.4
Let C1 be the product σ-algebra on Ω = C([0, 1], R), i.e. the smallest σ-algebra that makes the coordinate maps Xt : ω ↦ ω(t), 0 ≤ t ≤ 1, measurable.

Let C2 be the (more natural?) Borel σ-algebra on C([0, 1], R), when endowed with the uniform norm and the associated topology.

Show that C1 = C2.

Exercise 9.3.5
Let I be a nonempty real interval. Let Ω = R^I be the set of all functions defined on I, which is endowed with the product σ-algebra F, i.e. the smallest σ-algebra with respect to which Xt : ω ↦ ω(t) is measurable for every t. Show that

G = ⋃_{J ≺ I} σ(Xs, s ∈ J)


is a σ-algebra, where J ≺ I stands for 'J ⊂ I and J is countable'. Deduce that G = F. Show that the set

{ω ∈ Ω : s ↦ Xs(ω) is continuous}

is not measurable with respect to F .

9.4 Weak convergence

Exercise 9.4.1
Let (Xn, n ≥ 1) be a sequence of independent random variables with uniform distribution on [0, 1]. Let Mn = max(X1, . . . , Xn). Show that n(1 − Mn) converges in distribution as n → ∞, and determine the limit law.
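
A simulation sketch (sample sizes arbitrary): compare the empirical distribution of n(1 − Mn) with the exponential distribution of parameter 1, which is the natural candidate limit here.

import math, random

random.seed(3)
n, N = 500, 20_000
vals = [n * (1.0 - max(random.random() for _ in range(n))) for _ in range(N)]
for x in (0.5, 1.0, 2.0):
    emp = sum(1 for v in vals if v <= x) / N
    print(x, round(emp, 3), round(1.0 - math.exp(-x), 3))   # empirical CDF vs Exp(1) CDF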

Exercise 9.4.2
Let (Xn, n ∈ N ∪ {∞}) be random variables defined on some probability space (Ω, F, P), with values in a metric space (M, d).

1. Suppose that Xn → X∞ a.s. as n → ∞. Show that Xn converges to X∞ in distribution.

2. Suppose that Xn converges in probability to X∞. Show that Xn converges in distribution to X∞.

Hint: use the fact that (Xn, n ≥ 0) converges in probability to X∞ if and only if for every subsequence extracted from (Xn, n ≥ 0), there exists a further subsequence converging a.s. to X∞.

3. Show that if Xn converges in distribution to a constant X∞ = c, then Xn converges in probability to c.

Exercise 9.4.3
Suppose given sequences (Xn, n ≥ 0), (Yn, n ≥ 0) of real-valued random variables, and two extra random variables X, Y, such that Xn, Yn respectively converge in distribution to X, Y. Is it true that (Xn, Yn) converges in distribution to (X, Y)? Show that this is true in the following cases:

1. For every n, Xn and Yn are independent, as well as X and Y .

2. Y is a.s. constant (Hint: use 3. in the previous exercise).

Exercise 9.4.4
Let m be a probability measure on R. Define, for every n ≥ 0,

mn(dx) = Σ_{k∈Z} m([k 2^{−n}, (k + 1) 2^{−n})) δ_{k 2^{−n}}(dx),

where δz(dx) denotes the Dirac mass at z. Show that mn converges weakly to m.
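
A numerical illustration (the choices of m, test function and truncation are arbitrary): take m the standard Gaussian law and the bounded continuous function f(x) = cos x, for which m(f) = e^{−1/2}; the dyadic approximations mn(f) are computed below, truncating the sum to |k 2^{−n}| ≤ 10, which is harmless for this m.

import math

def Phi(x):                              # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mn_of_f(n, f, cutoff=10.0):
    step = 2.0 ** (-n)
    total, k = 0.0, int(-cutoff / step)
    while k * step < cutoff:
        total += (Phi((k + 1) * step) - Phi(k * step)) * f(k * step)
        k += 1
    return total

for n in (1, 3, 6):
    print(n, round(mn_of_f(n, math.cos), 5))
print("m(f) =", round(math.exp(-0.5), 5))    # exact Gaussian value of E[cos X]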

Exercise 9.4.5
1. Let (Xn, n ≥ 1) be independent exponential random variables with mean 1. Define


Sn = X1 + · · · + Xn, and determine without computation the limit of P(Sn ≤ n) as n → ∞ (Hint: which theorem could be useful here?).

2. Determine, also without computation, the limit of exp(−n) Σ_{k=0}^{n} n^k / k!.

Hint: recall that the Poisson law with parameter λ > 0 is the probability distribution on Z+ that puts mass e^{−λ} λ^n / n! on the integer n. If X, Y are independent random variables with Poisson laws of parameters λ and µ, then X + Y has a Poisson law with parameter λ + µ. Using this, make the formula look like question 1.
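
Both quantities can be evaluated numerically (a sketch; n and the sample size are arbitrary, and the Poisson partial sum is computed through logarithms to avoid overflow). The printed values suggest what the common limit should be.

import math, random

def poisson_partial_sum(n):
    # exp(-n) * sum_{k=0}^{n} n^k / k!, via log-weights
    return sum(math.exp(-n + k * math.log(n) - math.lgamma(k + 1)) for k in range(n + 1))

random.seed(4)
n, N = 200, 20_000
hits = sum(1 for _ in range(N) if sum(random.expovariate(1.0) for _ in range(n)) <= n)
print("P(S_n <= n)        ~", hits / N)
print("exp(-n) sum n^k/k! =", round(poisson_partial_sum(n), 4))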

Exercise 9.4.6
Let (Yn, n ≥ 0) be a sequence of random variables such that Yn follows a Gaussian N(mn, σn^2) law, and suppose that Yn converges weakly to some Y as n → ∞. Show that there exist m ∈ R and σ^2 ≥ 0 such that mn → m, σn^2 → σ^2, and that Y is Gaussian N(m, σ^2).

Hint: Use characteristic functions, and first show that the variance converges.

Exercise 9.4.7
Let d ≥ 1.

1. Show that a finite family of probability measures on Rd is tight.

2. Assuming Prokhorov's theorem for probability measures on R^d, show that if (µn, n ≥ 0) is a sequence of non-negative measures on R^d which is tight (for every ε > 0 there is a compact K ⊂ R^d such that sup_{n≥0} µn(R^d \ K) < ε) and such that

sup_{n≥0} µn(R^d) < ∞,

then there exists a subsequence (nk) along which µn converges weakly to a limit µ (i.e. µ_{nk}(f) converges to µ(f) for every bounded continuous f).

9.5 Brownian motion

Exercise 9.5.1
Recall that a Gaussian process (Xt, t ≥ 0) in R^d is a process such that for every t1 < t2 < . . . < tk ∈ R+, the vector (X_{t1}, . . . , X_{tk}) is a Gaussian random vector. Show that the (standard) Brownian motion in R^d is the unique Gaussian process (Bt, t ≥ 0) with E[Bt] = 0 for every t ≥ 0 and Cov(Bs, Bt) = (s ∧ t) Id for every s, t ≥ 0.

Exercise 9.5.2
Let B be a standard real-valued Brownian motion.

1. Show that a.s.,

lim sup_{t↓0} Bt/√t = +∞,   lim inf_{t↓0} Bt/√t = −∞.

2. Show that Bn/n → 0 a.s. as n → ∞. Then show that a.s., for n large enough, sup_{t∈[n,n+1]} |Bt − Bn| ≤ √n, and conclude that Bt/t → 0 a.s. as t → ∞.


3. Show that the process

B′t = t B_{1/t} if t > 0, and B′0 = 0,

is a standard Brownian motion (Hint: Use Exercise 9.5.1).

4. Use this to show that

lim sup_{t→∞} Bt/√t = +∞,   lim inf_{t→∞} Bt/√t = −∞.

Exercise 9.5.3 Around hitting times
Let (Bt, t ≥ 0) be a standard real-valued Brownian motion.

1. Let Tx = inf{t ≥ 0 : Bt = x} for x ∈ R. Prove that Tx has the same distribution as (x/B1)^2, and compute its probability distribution function.

2. For x, y > 0, show that

P(T_{−y} < Tx) = x/(x + y),   E[Tx ∧ T_{−y}] = xy.

3. Show that if 0 < x < y, the random variable Ty − Tx has the same law as T_{y−x}, and is independent of F_{Tx} (where (Ft, t ≥ 0) is the natural filtration of B).

Hint: the three questions are independent.
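
The identities of question 2 have exact analogues for the simple symmetric random walk with integer levels (gambler's ruin), which can be checked by simulation; this discrete sketch is only an illustration, not a substitute for the Brownian argument.

import random

def ruin_run(x, y):
    # walk from 0 until it hits x > 0 or -y < 0; return (hit -y first?, duration)
    pos, steps = 0, 0
    while -y < pos < x:
        pos += 1 if random.random() < 0.5 else -1
        steps += 1
    return pos == -y, steps

random.seed(5)
x, y, N = 3, 5, 50_000
runs = [ruin_run(x, y) for _ in range(N)]
print(sum(1 for hit, _ in runs if hit) / N, "vs x/(x+y) =", x / (x + y))
print(sum(s for _, s in runs) / N, "vs xy =", x * y)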

Exercise 9.5.4
Let (Bt, t ≥ 0) be a standard real-valued Brownian motion. Compute the joint distribution of (Bt, sup_{0≤s≤t} Bs) for t ≥ 0.

Exercise 9.5.5
Let (Bt, t ≥ 0) be a standard Brownian motion, and let 0 ≤ a < b.

1. Compute the mean and variance of

Xn := Σ_{k=1}^{2^n} ( B_{a+k(b−a)2^{−n}} − B_{a+(k−1)(b−a)2^{−n}} )^2.

2. Show that Xn converges a.s. and give its limit.

3. Deduce that a.s. there exists no interval [a, b] with a < b such that B is Hölder continuous with exponent α > 1/2 on [a, b], i.e. sup_{a≤s<t≤b} |Bt − Bs| / |t − s|^α < ∞.
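
A simulation sketch of questions 1-2 (the interval and depth are arbitrary): starting from a single Gaussian increment over [a, b] and refining it dyadically with Brownian-bridge splits, the sums of squared increments Xn should settle near b − a.

import math, random

random.seed(6)
a, b = 0.5, 2.0
incs, L = [random.gauss(0.0, math.sqrt(b - a))], b - a   # one increment over [a, b]
for n in range(1, 13):
    new = []
    for d in incs:
        # split one increment over a length-L interval into its two halves
        g = random.gauss(0.0, math.sqrt(L / 4.0))
        new.extend((d / 2.0 + g, d / 2.0 - g))
    incs, L = new, L / 2.0
    if n in (4, 8, 12):
        print(n, round(sum(d * d for d in incs), 4))      # X_n
print("b - a =", b - a)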

Exercise 9.5.6
Let (Bt, t ≥ 0) be a standard Brownian motion. Define G1 = sup{t ≤ 1 : Bt = 0} and D1 = inf{t ≥ 1 : Bt = 0}.

1. Are these random variables stopping times? Show that G1 has the same distribution as D1^{−1}.

2. By applying the Markov property at time 1, compute the law of D1. Deduce that of G1 (it is called the arcsine law).
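
A simulation sketch (the Brownian path is approximated by a scaled simple random walk, which slightly biases G1): the empirical distribution of the last zero before time 1 is compared with the arcsine distribution function (2/π) arcsin √x.

import math, random

random.seed(7)
m, N = 1000, 5_000         # mesh of the walk approximation, number of paths
samples = []
for _ in range(N):
    S, last_zero = 0, 0
    for k in range(1, m + 1):
        S += 1 if random.random() < 0.5 else -1
        if S == 0:
            last_zero = k
    samples.append(last_zero / m)
for x in (0.1, 0.5, 0.9):
    emp = sum(1 for g in samples if g <= x) / N
    print(x, round(emp, 3), round(2.0 / math.pi * math.asin(math.sqrt(x)), 3))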


Exercise 9.5.7
Let (Bt, t ≥ 0) be a standard Brownian motion, and let (Ft, t ≥ 0) be its natural filtration. Determine all the polynomials f(t, x) of degree less than or equal to 3 in x such that (f(t, Bt), t ≥ 0) is a martingale.

Exercise 9.5.8
Let (Bt, t ≥ 0) be a standard Brownian motion in R^3. We let Rt = 1/|Bt|.

1. Show that (Rt, t ≥ 1) is bounded in L2.

2. Show that E[Rt] → 0 as t→∞.

3. Show that (Rt, t ≥ 1) is a supermartingale. Deduce that |Bt| → ∞ as t → ∞, a.s.

Exercise 9.5.9 Zeros of Brownian motion
Let (Bt, t ≥ 0) be a standard real-valued Brownian motion. Let Z = {t ≥ 0 : Bt = 0} be the set of zeros of B.

1. Show that it is closed, unbounded and has zero Lebesgue measure a.s.

2. By using the stopping times Dq = inf{t ≥ q : Bt = 0} for q ∈ Q+, show that Z has no isolated point a.s.

Exercise 9.5.10
Let W0(dw) denote Wiener's measure on Ω0 = {w ∈ C([0, 1]) : w(0) = 0}, and define a new probability measure W0^{(a)} on Ω0 by

dW0^{(a)} / dW0 (w) = exp( a w(1) − a^2/2 ).

1. Show that under W0^{(a)}, the canonical process Xt : w ↦ w(t) remains Gaussian, and give its distribution.

2. Show that W0({f ∈ Ω0 : ‖f‖∞ < ε}) > 0 for every ε > 0, where ‖f‖∞ = sup_{0≤t≤1} |f(t)|.

3. Show that for every non-empty open set U ⊂ Ω0, one has W0(U) > 0. Hint: First note that any such U contains the ε-neighborhood of a function f which is piecewise linear, for some ε > 0.

Exercise 9.5.11 Brownian bridge
Let (Bt, 0 ≤ t ≤ 1) be a standard Brownian motion. For any y ∈ R, we let (Z^y_t = yt + (Bt − tB1), 0 ≤ t ≤ 1), and call it the Brownian bridge from 0 to y. Let W^y_0 be the law of (Z^y_t, 0 ≤ t ≤ 1) on C([0, 1]). Show that for any non-negative measurable function F : C([0, 1]) → R+, letting f(y) = W^y_0(F), we have

E[F(B) | B1] = f(B1)   a.s.

Hint: find a simple argument entailing that B1 is independent of the process (Bt − tB1, 0 ≤ t ≤ 1).

Explain why we can interpret W^y_0 as the law of a Brownian motion 'conditioned to hit y at time 1'.


Exercise 9.5.12
Show that the Dirichlet problem on D = B(0, 1) \ {0} in R^d, with boundary conditions g(x) = 0 for |x| = 1 and g(x) = 1 for x = 0, has no solution for d ≥ 2.

Exercise 9.5.13 Dirichlet problem in the upper-half plane
Let H = {(x, y) ∈ R^2 : y > 0}. Let (Bt, t ≥ 0) be a Brownian motion started from x under the probability measure Px, and let T = inf{t ≥ 0 : Bt ∉ H}.

1. Determine the law of BT under Px whenever x ∈ H.

2. Show that if u is a bounded continuous function on the closure of H which is harmonic on H, then

u(x, y) = ∫_R u(z, 0) (1/π) · y / ((x − z)^2 + y^2) dz.

9.6 Poisson measures, ID laws and Levy processes

Exercise 9.6.1
Prove that the Poisson law with parameter λ > 0 is the weak limit of the binomial law with parameters (n, λ/n) as n → ∞.

A factory makes 500,000 light bulbs in a day. On average, 4 of these are defective. Estimate the probability that on some given day, 2 of the produced light bulbs were defective.
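
A numerical illustration (reading the question as 'exactly 2 defective bulbs', which is one possible interpretation): the Poisson(4) approximation gives e^{−4} 4^2/2! which is about 0.147, compared below with the exact binomial value.

import math

n, lam, k = 500_000, 4.0, 2
p = lam / n
poisson = math.exp(-lam) * lam ** k / math.factorial(k)
binom = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
print(round(poisson, 6), round(binom, 6))   # both close to 0.1465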

Exercise 9.6.2 The bus paradox
Why do we always feel we are waiting a very long time before buses arrive? This exercise gives an indication of why... well, if buses arrive according to a Poisson process.

1. Suppose buses have been circulating in the city day and night forever, the counterpart being that drivers do not follow a timetable. Rather, the times of arrival of buses at a given bus-stop are the atoms of a Poisson measure on R with intensity θ dt, where dt is Lebesgue measure on R. A customer arrives at a fixed time t at the bus-stop. Let S, T be the two consecutive atoms of the Poisson measure satisfying S < t < T. Show that the average time E[T − S] that elapses between the arrival of the last bus before time t and that of the first bus after time t is 2/θ. Explain why this is twice the average time between consecutive buses. Can you see why this is so?

2. Suppose that buses start circulating at time 0, so that arrivals of buses at the bus-stop are now the jump times of a Poisson process with intensity θ on R+. If the customer arrives at time t, show that the average elapsed time between the bus before his arrival (time S) and the bus after it (time T) is θ^{−1}(2 − e^{−θt}) (with the convention S = 0 if no atom has fallen in [0, t]).
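
A Monte Carlo sketch for question 1 (rate and window are arbitrary; the window around t is taken large enough that buses exist on both sides of t with overwhelming probability):

import random

random.seed(8)
theta, t, W, N = 0.5, 0.0, 40.0, 20_000   # rate, customer's arrival time, half-window, runs
total = 0.0
for _ in range(N):
    # atoms of a rate-theta Poisson process on [t - W, t + W]
    s, atoms = t - W, []
    while True:
        s += random.expovariate(theta)
        if s > t + W:
            break
        atoms.append(s)
    S = max(a for a in atoms if a < t)    # last bus before t
    T = min(a for a in atoms if a > t)    # first bus after t
    total += T - S
print(total / N, "vs 2/theta =", 2.0 / theta)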

Exercise 9.6.3
Prove Proposition 7.3.2.

Exercise 9.6.4
Check the marking property of Poisson random measures: if M(dx) = Σ_{i∈I} δ_{xi}(dx) is


a Poisson random measure on (E, E) with intensity µ, and if (yi, i ∈ I) are i.i.d. random variables with law ν on some measurable space (F, F), independent of M, then Σ_{i∈I} δ_{(xi, yi)}(dx dy) is a Poisson random measure on E × F with intensity µ ⊗ ν.

Exercise 9.6.5 Brownian motion and the Cauchy process
Let (Bt = (B^1_t, B^2_t), t ≥ 0) be a standard Brownian motion in R^2 (i.e. B0 = 0). Recall that the Cauchy law with parameter a > 0 has probability distribution function a/(π(a^2 + x^2)), x ∈ R. We let

Ca = inf{t ≥ 0 : B^2_t = −a},   a ≥ 0.

Prove that the process (B^1_{Ca}, a ≥ 0) is a Levy process such that B^1_{Ca} follows a Cauchy law with parameter a for every a > 0. Does this remind you of a previous exercise?
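
A Monte Carlo sketch of the one-dimensional marginal, using the representation of the hitting time from Exercise 9.5.3 (Ca has the law of (a/N)^2 for a standard Gaussian N) together with the independence of the two coordinates; the parameter and sample size are arbitrary.

import math, random

random.seed(9)
a, N = 2.0, 100_000
def sample():
    z, zp = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    Ca = (a / z) ** 2              # hitting time, via Exercise 9.5.3
    return math.sqrt(Ca) * zp      # B^1 evaluated at the independent time Ca
vals = [sample() for _ in range(N)]
for x in (-2.0, 0.0, 3.0):
    emp = sum(1 for v in vals if v <= x) / N
    print(x, round(emp, 3), round(0.5 + math.atan(x / a) / math.pi, 3))   # Cauchy(a) CDF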


Index

cadlag, 29

Blumenthal's 0-1 law, 47
Branching process, 23
Brownian motion, 45
    (Ft)-Brownian motion, 48
    finite marginal distributions, 45
    standard, 45

Central limit theorem, 40
Characteristic exponent, 70
Compound Poisson distribution, 68
Compound Poisson process, 67
Conditional convergence theorems, 9
Conditional density functions, 11
Conditional expectation
    discrete case, 5
    for L1 random variables, 6
    for non-negative random variables, 8
Conditional Jensen inequality, 9
Convergence in distribution, 39

Dirichlet problem, 55
Donsker's invariance principle, 59
Doob's Lp inequality, 19, 34
Doob's maximal inequality, 18
Doob's upcrossing lemma, 17

Exterior cone condition, 55

Filtration, 13
    filtered space, 13
    natural filtration, 13
Finite marginal distributions, 31
First entrance time, 14, 30
First hitting times for Brownian motion, 50, 51

Harmonic function, 56

Infinitely divisible distribution, 69
Intensity measure, 63

Kakutani's product-martingales theorem, 26
Kolmogorov's 0-1 law, 23
Kolmogorov's continuity criterion, 35

Levy process, 66, 71
Levy triple, 69
Levy's convergence theorem, 41
Levy-Ito theorem, 73
Levy-Khintchine formula, 70
Laplace functional, 65
Last exit time, 14
Law of large numbers, 23
Likelihood ratio test, 28

Martingale, 14
    backwards, 21
    closed, 19
    complex-valued, 51
    convergence theorem
        almost-sure, 17, 34
        for backwards martingales, 21
        in L1, 20, 34
        in Lp, p > 1, 19, 34
    regularization, 32
    uniformly integrable, 20
Martingale transform, 15

Optional stopping
    for discrete-time martingales, 16
    for uniformly integrable martingales, 20, 34

Poisson point process, 66
Poisson random measure, 63
Prokhorov's theorem, 40

Radon-Nikodym theorem, 25
Recurrence, 53
Reflection principle, 49


Scaling property of Brownian motion, 47
Separable σ-algebra, 25
Simple Markov property
    for Brownian motion, 47
    for Levy processes, 66
Skorokhod's embedding, 60
Stochastic process, 13
    adapted, 13
    continuous-time, 29
    discrete-time, 15
    integrable, 13
    previsible, 15
    stopped process, 15
Stopping time
    definition, 14
    measurable events before T, 15
Strong Markov property, 49
Submartingale, 14
Supermartingale, 14

Taking out 'what is known', 10
Tightness, 40
Tower property, 10
Transience, 53

Versions of a process, 31

Weak convergence, 37
Wiener space, 46
Wiener's measure, 46, 48
Wiener's theorem, 45