
Stochastic Processes

– Lecture Notes –

Marcin Pitera, Jagiellonian University

November 29, 2020

Contents

1 Introduction & Preliminaries
  1.1 Conditional expectation
  1.2 Useful classical theorems

2 Introduction to stochastic processes
  2.1 Basic definitions and properties
  2.2 Continuous modifications and Kolmogorov’s theorems
  2.3 Filtrations and adaptiveness
  2.4 Stopping times
  2.5 Martingales

3 Important examples of stochastic processes
  3.1 Brownian motion
  3.2 Poisson Process
  3.3 Markov processes

4 Introduction to stochastic Ito calculus
  4.1 Ito integral of an elementary process
  4.2 Extending Ito integral to L2 space

A Appendix
  A.1 Exemplary list of questions (and scope) for the exam
  A.2 Notation


1 Introduction & Preliminaries

In this course we assume knowledge of basic concepts, techniques, and theorems from Analysis, Measure Theory, Probability Theory, and Statistics. Before we begin, we quickly outline some basics related to conditional expectations and martingale theory that will be relevant for this course.

If not stated otherwise, we assume that (Ω, F, P) is the underlying probability space. Also, we assume that all random variables introduced throughout this course are integrable. If not stated otherwise, all the inequalities should be understood in the almost-sure sense.

These lecture notes are partly based on Polish lecture notes prepared by Prof. Szymon Peszat. I would like to thank Prof. Peszat for his authorisation to share the materials.

1.1 Conditional expectation

Let G be a sub σ–algebra of F, i.e. a σ–algebra of subsets of Ω such that G ⊆ F, and let X : Ω → R be an integrable random variable.

Definition 1.1. A conditional expectation of X with respect to G is a random variable Y such that

1) Y is G-measurable;

2) For any A ∈ G we get E[1AX] = E[1AY ].

While for an integrable random variable the conditional expectation always exists, it might be non-unique, i.e. more than one random variable might satisfy the properties from Definition 1.1. Nevertheless, it is defined uniquely up to a set of zero measure.1 For transparency, we use the symbol E[X|G] to denote the conditional expectation (keeping in mind that we operate on the space L1(Ω, F, P) of equivalence classes of random variables, rather than on the space of all random variables, and all the inequalities are understood in the almost-sure sense).

Next, we define a conditional expectation of X with respect to another random variable Y .

Definition 1.2. Let X and Y be random variables. We denote by E[X|Y] the conditional expectation of X with respect to Y given by

E[X|Y ] := E[X|σ(Y )],

where σ(Y ) is a σ–algebra generated by Y .

From Definition 1.1 we know that E[X|Y] is σ(Y)-measurable. Thus, from the Doob–Dynkin lemma it follows that

E[X|Y ] = g(Y ),

for some Borel measurable function g : R→ R. From now on, we use the convention

E[X|Y = y] := g(y), y ∈ R.

Before we outline basic properties of conditional expectation, let us present two examples of conditional expectations.

1That is, if Y and V satisfy all the properties from Definition 1.1, then P[Y = V ] = 1.


Example 1.3. Let G = σ(A1, A2, . . . , An), where {Ai}i=1,...,n is a finite partition of Ω.2 Then, for any integrable random variable X we get the representation

E[X|G](ω) = (1/P[Ai]) E[1_{Ai} X] if ω ∈ Ai for which P[Ai] > 0, and E[X|G](ω) = 0 if ω ∈ Ai for which P[Ai] = 0.    (1.1)

Note that in Equation (1.1) we can substitute 0 with any real number, for any i ∈ {1, . . . , n} such that P[Ai] = 0. Also, using the convention 0/0 = 0, we can rewrite (1.1) as

E[X|G] = Σ_{i=1}^{n} ( E[1_{Ai} X] / P[Ai] ) 1_{Ai}.

Proof. We divide the proof of (1.1) into a few steps.

1) First of all, let us show that

G = { ⋃_{i∈I} Ai : I ⊆ {1, 2, . . . , n} }.    (1.2)

Let G̃ denote the RHS of (1.2). It is easy to check that G̃ is a σ–algebra; because G̃ contains A1, . . . , An, we get that G ⊆ G̃. On the other hand, we know that for any I ⊆ {1, . . . , n} we get ⋃_{i∈I} Ai ∈ G, so G̃ ⊆ G. Thus, (1.2) is proved.

2) Next, we show that a random variable Y is G–measurable if and only if Y is constant on each Ai. On the contrary, let us assume that there exist Y and i0 such that Y is G–measurable and Y is not constant on Ai0. Then, there exists x ∈ R such that for B = (−∞, x] we get

Y^{−1}(B) ∩ Ai0 ≠ ∅ and Y^{−1}(B^c) ∩ Ai0 ≠ ∅.

This contradicts (1.2): Y^{−1}(B) belongs to G, so by (1.2) it is a union of some of the Ai’s, while Y^{−1}(B) ∩ Ai0 is a non-empty proper subset of Ai0.

3) Finally, we are now ready to prove (1.1). From the previous steps we know that

E[X|G] = Σ_{i=1}^{n} ai 1_{Ai},

for some sequence of real numbers a1, . . . , an. Now, using Definition 1.1 and noting that for i ≠ j we have 1_{Ai} 1_{Aj} = 1_{Ai ∩ Aj} = 0, we get

E[1_{Ai} X] = E[ 1_{Ai} ( Σ_{j=1}^{n} aj 1_{Aj} ) ] = E[1_{Ai} ai] = P[Ai] ai,

which concludes the proof of (1.1).

Example 1.4. Let Y be a random variable that takes finitely many values y1, . . . , yn – all with positive probability. Then, for any integrable random variable X we get

E[X|Y] = Σ_{i=1}^{n} ( E[1_{Y=yi} X] / P[Y = yi] ) 1_{Y=yi}.    (1.3)

Alternatively, we can rewrite (1.3) as

E[X|Y = y] = (1/P[Y=yi]) E[1_{Y=yi} X] if y = yi for some i ∈ {1, . . . , n}, and E[X|Y = y] = 0 if y ∉ {y1, . . . , yn}.

2 A collection of sets such that Ai ∩ Aj = ∅ for i ≠ j, and Ω = A1 ∪ . . . ∪ An.


Proof. We know that

σ(Y) = σ(A1, . . . , An),

where Ai := {Y = yi} for i = 1, . . . , n. Moreover, it is easy to note that A1, . . . , An is a partition of Ω. Thus, the claim follows from the proof of Example 1.3.
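Formula (1.3) is easy to check numerically. Below is a small Monte Carlo sketch (not part of the original notes; it assumes NumPy, and the distribution of Y and the model for X are hypothetical choices made purely for illustration): the empirical ratio E[1_{Y=yi} X] / P[Y = yi] should approach the conditional mean of X on {Y = yi}.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Y takes finitely many values, each with positive probability.
y_vals = [1.0, 2.0, 3.0]
Y = rng.choice(y_vals, size=n, p=[0.2, 0.3, 0.5])
# X is integrable, with E[X | Y = y] = y**2 (the added noise is independent, mean zero).
X = Y**2 + rng.normal(0.0, 1.0, size=n)

# Formula (1.3): E[X | Y = y_i] = E[1_{Y = y_i} X] / P[Y = y_i]
cond_exp = {}
for y in y_vals:
    indicator = (Y == y).astype(float)
    cond_exp[y] = np.mean(indicator * X) / np.mean(indicator)

for y in y_vals:
    print(y, round(cond_exp[y], 2))  # should be close to y**2
```

Here the ratio coincides with the sample mean of X over the group {Y = yi}, which is exactly the intuition behind conditioning on a finitely-valued random variable.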

For brevity, we state all the properties/theorems in this section without proofs. We start with some basic properties of conditional expectation.

Theorem 1.5 (Basic properties). Let X and Y be integrable random variables and let G be a sub σ-algebra of F. Then,

1) If X is G–measurable, then E[X|G] = X;

2) If X is G–measurable and XY is integrable, then E[XY |G] = XE[Y |G];

3) If X is independent of G,3 then E[X|G] = E[X];

4) If X is independent of G, and H is a sub σ–algebra of F , then E[X|σ(G,H)] = E[X|H];

5) If G = {Ω, ∅}, then E[X|G] = E[X];

6) For any a, b ∈ R we get E[aX + bY |G] = aE[X|G] + bE[Y |G];

7) If H is sub σ-field of G, then E[X|H] = E[E[X|G] |H];

8) If X ≥ 0, then E[X|G] ≥ 0;

9) If X ≥ Y , then E[X|G] ≥ E[Y |G].

1.2 Useful classical theorems

Next, we present a couple of classical results transferred from the static to the conditional case. For brevity, we omit most of the proofs – they are very similar to the unconditional ones that can be found in any good differential and integral calculus book.

Theorem 1.6 (Conditional monotone convergence theorem). Let (Xn) be a non-negative non-decreasing sequence of random variables that converges (almost-surely) to X. Then, for any sub σ-algebra G (of F) we get

E[Xn|G] → E[X|G] almost surely, as n → ∞.

Theorem 1.7 (Conditional dominated convergence theorem). Let (Xn) be a sequence of random variables that converges (almost-surely) to X, and let Y be an integrable random variable such that for any n ∈ N we get |Xn| ≤ Y. Then, the random variable X is integrable and for any sub σ-algebra G (of F) we get

lim_{n→∞} E[Xn|G] = E[X|G].

Theorem 1.8 (Conditional Fatou lemma). Let (Xn) be a non-negative sequence of integrable random variables. Then, for any sub σ–algebra G (of F) we get

E[ lim inf_{n→∞} Xn | G ] ≤ lim inf_{n→∞} E[Xn|G].

3That is σ(X) and G are independent: for any A ∈ σ(X) and B ∈ G, we get P[A ∩B] = P[A]P[B].


Finally, let us note that if X is square integrable (i.e. X ∈ L2(Ω, F, P)), then the conditional expectation with respect to a sub σ–algebra G could be seen as the orthogonal projection from L2(Ω, F, P) to L2(Ω, G, P).4 This implies that E[X|G] is the best least-squares G–measurable predictor (that is also square integrable). In particular, if G relates to knowledge about a system (e.g. the stock market) and X is a random variable corresponding to some system-related process (e.g. a stock price), then E[X|G] might be treated as the best guess about X given the information G; see Theorem 1.9.

Theorem 1.9 (Conditional expectation as the best least-square predictor). Let X be a square integrable random variable and let G be a sub σ-algebra of F. Then, for any square integrable G–measurable random variable Z we get

E[(X − E[X|G])²] ≤ E[(X − Z)²].    (1.4)

Proof. Let Z be square integrable and G-measurable. Expanding the square and dropping the non-negative term E[(E[X|G] − Z)²], we get

E[(X − Z)²] = E[((X − E[X|G]) + (E[X|G] − Z))²]
            ≥ E[(X − E[X|G])²] + 2 E[(X − E[X|G])(E[X|G] − Z)].

Next, using the tower property and the fact that E[X|G] − Z is G-measurable, we have

E[(X − E[X|G])(E[X|G] − Z)] = E[ E[(X − E[X|G])(E[X|G] − Z) | G] ]
                            = E[ (E[X|G] − Z) E[X − E[X|G] | G] ]
                            = E[ (E[X|G] − Z)(E[X|G] − E[X|G]) ]
                            = 0,

which concludes the proof.
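Theorem 1.9 can be illustrated by simulation. The following is a sketch (not from the notes; NumPy assumed, and the model X = Y + noise is a hypothetical choice, for which E[X|Y] = Y): the conditional expectation attains a smaller mean-squared error than other σ(Y)-measurable predictors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

Y = rng.normal(size=n)
X = Y + rng.normal(size=n)  # noise independent of Y, so E[X | Y] = Y

# Mean-squared errors of several sigma(Y)-measurable predictors Z = g(Y).
mse_cond = np.mean((X - Y) ** 2)        # Z = E[X | Y] = Y
mse_scaled = np.mean((X - 2 * Y) ** 2)  # Z = 2 Y
mse_const = np.mean(X ** 2)             # Z = 0

print(mse_cond, mse_scaled, mse_const)  # ~1.0, ~2.0, ~2.0
```

The theoretical values are E[(X − Y)²] = 1 and E[(X − 2Y)²] = E[X²] = 2, in line with (1.4).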

2 Introduction to stochastic processes

In this section we use T to denote time. In the discrete case T is typically associated with the set of days or years, e.g. T = {1, 2, . . . , T} for some fixed T ∈ N, or T = N, or T = Z. In the continuous case T is usually linked to some fixed interval of R, e.g. T = [0, T] for some fixed T ∈ R+, or T = R+, or T = R.

2.1 Basic definitions and properties

First, we introduce the definition of a stochastic process.

Definition 2.1. A collection of random variables X = (Xt)t∈T defined on the same probability space (Ω, F, P) and indexed by time is called a stochastic process.

It should be noted that a stochastic process could be seen as a function X : T × Ω → R. We call X a vector stochastic process if it is a collection of random vectors indexed by time, i.e. when the output is a random vector. For brevity we will always use the term stochastic process, even if we talk about random vectors rather than random variables. For any fixed ω ∈ Ω, one can see (Xt(ω))t∈T as a function of time – a specific realisation of the stochastic process.

4 It could be shown that L2(Ω, F, P) is a Hilbert space (with inner product 〈X, Y〉 = EXY), where we identify random variables which agree almost-surely, and L2(Ω, G, P) is its linear subspace.


Definition 2.2. Let X be a stochastic process. A sample path (trajectory, realisation) of X corresponding to ω ∈ Ω is the function t → Xt(ω), where t ∈ T.

Usually, we require additional properties from stochastic processes, like the continuity of sample paths. Some of them are summarised in Definition 2.3. For brevity we present the properties in the continuous time setting. Note that while some of them might be extended directly to the discrete time case (e.g. integrability), others cannot (e.g. continuity).

Definition 2.3. Let X be a continuous-time stochastic process. We say that X is

1) measurable if the map X : T × Ω → R is B(T) × F–measurable.

2) continuous (resp. left continuous, right continuous) if all sample paths of X are continuous (resp. left continuous, right continuous).

3) continuous in probability (or stochastically continuous) if for any t ∈ T we get Xs → Xt in probability whenever s → t, i.e. when

∀ε>0 ∃δ>0 : P[ |Xt − Xs| ≥ ε ] ≤ ε, for s ∈ T such that |t − s| ≤ δ;

4) cadlag if X is right continuous and all sample paths have left limits for any t ∈ T.a

5) integrable if E|Xt| <∞ for any t ∈ T.

6) square-integrable if E|Xt|2 <∞ for any t ∈ T.

7) p-power integrable if E|Xt|p <∞ for p ∈ N and any t ∈ T.

a fr. continue à droite, limites à gauche

Now, we want to identify processes for which almost all trajectories are the same.

Definition 2.4. Let X, Y be two stochastic processes defined on the same probability space. We say that

1) Y is indistinguishable from X if P[Xt = Yt , ∀t ∈ T] = 1.

2) Y is a modification of X if for any t ∈ T we get P[Xt = Yt] = 1.

3) Y has the same finite-dimensional distributions as X if for any n ∈ N, finite collection of time points (t1, . . . , tn) ∈ Tⁿ, and A ∈ B(Rⁿ), we get

P[(Xt1 , . . . , Xtn) ∈ A] = P[(Yt1 , . . . , Ytn) ∈ A].

Sometimes, instead of the term modification we say that X is a version of Y. Note that if X and Y are modifications of each other and are (a.s.) continuous, they are indistinguishable. Let us also show an example of two processes which are modifications of each other but are not indistinguishable.

Example 2.5. Let T = [0, 1] and let Ω = [0, 1] be a standard probability space. Let X be such that Xt ≡ 0 for all t ∈ T and let Y be given by

Yt(ω) = 1 if t = ω, and Yt(ω) = 0 if t ≠ ω.

Then, for any t ∈ T we get P[Xt = Yt] = P[Ω \ {t}] = 1, and P[Xt = Yt, ∀t ∈ T] = 0.

At the end of this subsection let us define three properties that are often used to characterise certain classes of stochastic processes (most of these properties can be translated into the discrete-time framework).

Definition 2.6. Let X be a continuous-time stochastic process. We say that X

1) is stationary if for any time-shift h ∈ T the processes (Xt)t∈T and (Xt+h)t∈T have the same finite-dimensional distributions.

2) is weakly stationary if for any time-shift h ∈ T the processes (Xt)t∈T and (Xt+h)t∈T have finite-dimensional distributions with the same (finite) first and second moments.

3) has independent increments if for any finite set of time-points t1 ≤ t2 ≤ . . . ≤ tn (from T), the incremental random variables

Xt2 −Xt1 , Xt3 −Xt2 , . . . , Xtn −Xtn−1

are independent.

Moreover, if the distribution of the increment Xt − Xs depends only on t − s, then we say that X has independent and stationary increments.

Note that while property 1) is stricter and we often require it for theoretical purposes, property 2) is much easier to check and/or verify. It is especially important in signal processing. Note that weak stationarity implies that the mean value of the process must be the same at any time-point, and that the auto-covariance function, given by

CX(t, s) := Cov(Xt, Xs),

where t, s ∈ T, depends only on the difference between the time-points t and s.
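As an illustration of weak stationarity (a sketch, not from the notes; NumPy assumed, and the moving-average model below is a hypothetical choice), the empirical auto-covariance computed across many sample paths should depend only on the lag |t − s|:

```python
import numpy as np

rng = np.random.default_rng(2)
paths, T = 50_000, 20

# MA(1) process X_t = e_{t+1} + 0.5 e_t with i.i.d. standard normal e's:
# weakly stationary, with C_X(t, s) = 1.25 if t = s, 0.5 if |t - s| = 1, 0 otherwise.
eps = rng.normal(size=(paths, T + 1))
X = eps[:, 1:] + 0.5 * eps[:, :-1]

def autocov(t, s):
    # empirical Cov(X_t, X_s) across sample paths
    return np.mean((X[:, t] - X[:, t].mean()) * (X[:, s] - X[:, s].mean()))

print(autocov(3, 3), autocov(12, 12))  # both ~ 1.25
print(autocov(3, 4), autocov(10, 11))  # both ~ 0.5
print(autocov(2, 9))                   # ~ 0.0
```

Different pairs (t, s) with the same lag give (up to sampling error) the same value of C_X, exactly as weak stationarity requires.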

2.2 Continuous modifications and Kolmogorov’s theorems

We show that if a process has a continuous modification, then it must be continuous in probability.

Proposition 2.7. Let X be a stochastic process for which there exists a continuous modification. Then X is continuous in probability.

Proof. Let X̃ be a continuous modification of X. Let us fix t ∈ T and ε > 0. Let

An := {ω ∈ Ω : there exists s ∈ T such that |t − s| ≤ 1/n and |X̃t(ω) − X̃s(ω)| ≥ ε}.

From the continuity of X̃ we know that An is measurable, since it could be expressed as

An = {ω ∈ Ω : there exists s ∈ T ∩ Q such that |t − s| ≤ 1/n and |X̃t(ω) − X̃s(ω)| ≥ ε}.

Also, we know that (An)n∈N is a decreasing sequence5 such that P[⋂_{n∈N} An] = 0. Thus, there exists n0 ∈ N such that

P[An0] ≤ ε.    (2.1)

Next, because X̃ is a modification of X and X̃t − X̃s = (X̃t − Xt) + (Xt − Xs) + (Xs − X̃s), we get

P[|X̃t − X̃s| ≥ ε] = P[|Xt − Xs| ≥ ε].    (2.2)

Combining (2.1) with (2.2), for any δ < 1/n0 and any s ∈ T such that |t − s| ≤ δ we get

P[|Xt − Xs| ≥ ε] = P[|X̃t − X̃s| ≥ ε] ≤ P[An0] ≤ ε,

which concludes the proof.

From Proposition 2.7 we know that any continuous process is continuous in probability. It should also be noted that any modification of a process that is continuous in probability is itself continuous in probability. Let us now show an example of a process which is continuous in probability but does not have a continuous modification.

Example 2.8. Let T = [0, 1] and let Z ∼ U [0, 1]. Let X = (Xt)t∈T be given by

Xt(ω) = 1[0,Z(ω))(t),

for t ∈ [0, 1] and ω ∈ Ω. One can easily check that there exists no continuous modification of X, but X is continuous in probability.
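For the process in Example 2.8 one can compute directly that, for ε ∈ (0, 1], P[|Xt − Xs| ≥ ε] = P[Z ∈ (s ∧ t, s ∨ t]] = |t − s|, which tends to 0 as s → t; this is exactly continuity in probability. A quick simulation sketch (NumPy assumed, parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.uniform(size=500_000)  # Z ~ U[0, 1]

def X(t):
    # X_t = 1_{[0, Z)}(t): the path jumps from 1 to 0 at the random time Z
    return (t < Z).astype(float)

# |X_t - X_s| >= eps (for eps in (0, 1]) exactly when Z lies between s and t,
# so P[|X_t - X_s| >= eps] = |t - s|.
t, s = 0.40, 0.41
p = np.mean(np.abs(X(t) - X(s)) >= 0.5)
print(p)  # ~ |t - s| = 0.01
```

The same computation shows why no continuous modification exists: every sample path of X has a genuine jump at Z, and a modification can only change the path on a null set for each fixed t.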

Before we state (without proof) two simplified versions of Kolmogorov’s continuity theorem, which is very useful in the theory of stochastic processes, let us recall the concept of Hölder continuity.

Definition 2.9. We say that a function f : I → R is Hölder continuous on I = [a, b] with exponent α ∈ (0, 1) if there exists a positive constant C ∈ R such that for any x, y ∈ I we get

|f(x) − f(y)| ≤ C |x − y|^α.

Of course any Hölder continuous function is continuous. Also, if f is Hölder continuous with exponent α ∈ (0, 1), then it is Hölder continuous with exponent γ, for any γ ≤ α.

Theorem 2.10 (Kolmogorov’s continuity theorem 1). Let T = [a, b] and let X be a stochastic process. If there exist constants p, K, ε > 0 such that for any t, s ∈ T we get

E|Xt − Xs|^p ≤ K |t − s|^{1+ε},    (2.3)

then X has a modification which is Hölder continuous with any exponent α ∈ (0, ε/p).

While the condition T = [a, b] might look restrictive, Theorem 2.10 is often used to prove the existence of continuous modifications. Theorem 2.11 is a direct consequence of Theorem 2.10.

Theorem 2.11 (Kolmogorov’s continuity theorem 2). Let T = R+ and let X be a stochastic process. Let us assume that for any T ∈ T there exist constants p, K, ε > 0 such that (2.3) is satisfied for s, t ≤ T. Then, there exists a continuous modification of X.

5i.e. An ⊇ An+1 for any n ∈ N
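As a preview of Section 3.1: for Brownian motion the increment Bt − Bs is N(0, t − s), so E|Bt − Bs|⁴ = 3|t − s|², and (2.3) holds with p = 4, K = 3, ε = 1; Theorem 2.11 then yields a continuous modification. The moment identity behind this can be checked by simulation (a sketch, not from the notes; NumPy assumed, sample sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Increments of Brownian motion: B_t - B_s ~ N(0, t - s), and
# E|B_t - B_s|^4 = 3 (t - s)^2, i.e. condition (2.3) with p = 4, K = 3, eps = 1.
results = {}
for dt in (0.1, 0.5, 1.0):
    incr = rng.normal(0.0, np.sqrt(dt), size=n)
    results[dt] = np.mean(incr ** 4)

for dt, m4 in results.items():
    print(dt, m4, 3 * dt ** 2)  # empirical 4th moment vs 3 (t - s)^2
```

With p = 4 and ε = 1, Theorem 2.10 gives Hölder continuity of the modification for any exponent α < 1/4; taking larger p pushes the admissible exponents towards 1/2.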


We are now ready to provide the second (type of) Kolmogorov’s theorem – the extension theorem. Given finite-dimensional distributions for all finite collections of time-points, we want to check if there exists a stochastic process which admits those distributions. To formulate the result, we need to formalise the concept of finite-dimensional distributions.

Definition 2.12. Let X be a stochastic process. The mapping PX : BT → [0, 1], which characterises the finite-dimensional distributions of X, is given by

PX[A] := P[ω ∈ Ω : (Xt1(ω), . . . , Xtn(ω)) ∈ Γ],

where A = {x ∈ R^T : (xt1, . . . , xtn) ∈ Γ} is a cylinder set,a and BT is the family of all cylinder sets on T.

a we call A ⊆ R^T a cylinder set if A = {x ∈ R^T : (xt1, . . . , xtn) ∈ Γ}, where n ∈ N, the increasing sequence (ti)i=1,...,n is a sequence of (finite) time-points of T, and Γ is a Borel-measurable set on Rⁿ.

Note that while BT is always an algebra, it is not necessarily a σ-algebra. Given the auxiliary mapping PX defined on the cylinder sets BT, we want to be able to (uniquely) extend it to a probability measure on B(R^T) in order to get the distribution of the corresponding stochastic process. In other words, we want to check if the mapping PX can be used to characterise X. This can be done if the mapping satisfies the consistency property. Before we state the main result, we introduce some additional notation.

Given T, let T denote the set of all finite subsets of T, partially ordered by inclusion. Let us assume we are given a family (PI)I∈T of probability measures PI defined on (R^I, B(R^I)). Then, for finite sets I1 ⊂ I2 and A ∈ B(R^{I1}) we define

C_{I2,I1}(A) := {ω ∈ R^{I2} : (ω(t); t ∈ I1) ∈ A},
C_{I1}(A) := {ω ∈ R^T : (ω(t); t ∈ I1) ∈ A}.

Note that C_{I2,I1} : B(R^{I1}) → B(R^{I2}) and C_{I1} : B(R^{I1}) → B(R^T). We say that a family (PI)I∈T is consistent if for any finite I1 ⊂ I2 we get

P_{I1}[A] = P_{I2}[C_{I2,I1}(A)]

for all A ∈ B(R^{I1}). Now, we are ready to state the main theorem.

Theorem 2.13 (Kolmogorov’s extension theorem). Let (PI)I∈T be a consistent family of measures. Then, there exists a unique probability measure P defined on (R^T, B(R^T)) such that

P[C_I(A)] = P_I[A],

for any I ∈ T and A ∈ B(R^I).

Note that the consistency property from Theorem 2.13 is very natural: we simply want to preserve the distribution if we exclude some time points from I2 and restrict ourselves to I1. The theorem states that, given a stochastic process X and the corresponding mapping PX, we can extend this mapping to a probability measure on R^T (defined on the σ-algebra of Borel sets on R^T).

2.3 Filtrations and adaptiveness

We start with basic definitions.


Definition 2.14. A filtration is a non-decreasing family of sub σ–algebras of F indexed by time, i.e. a family F := (Ft)t∈T such that

Fs ⊆ Ft,

for s ≤ t, where t, s ∈ T.

Usually Ft corresponds to our knowledge about the system up to time t. For brevity, we call the probability space (Ω, F, P) endowed with a filtration F a filtered probability space. In continuous time, to guarantee smoothness, a right-continuity axiom is usually imposed on the filtration.

Definition 2.15. Let F be a (continuous time) filtration. We say that F is a right-continuous filtration if for any t ∈ T we get

Ft = Ft+,

where Ft+ := ⋂_{s>t, s∈T} Fs.

In the theory of stochastic processes, filtered probability spaces are often assumed to satisfy the usual conditions. This simply means that the filtration is right-continuous and complete.6

As stated before, the filtration represents our overall knowledge about a system, while the stochastic process is the realisation of some specific system-related process. To formalise this intuition, we need to introduce the concept of adaptiveness.

Definition 2.16. A process X is said to be adapted to filtration F (or F-adapted) if Xt isFt–measurable for any t ∈ T.

Of course, our overall knowledge about the system might be embedded into the process, which is reflected in the next definition.

Definition 2.17. Let X be a stochastic process. We say that FX := (FXt)t∈T, where

FXt := σ(Xs : s ≤ t, s ∈ T),

is the filtration generated by the stochastic process X.

In other words, the filtration generated by a stochastic process X is such that, for any t ∈ T, the σ-algebra FXt is the smallest σ–algebra such that Xs is FXt–measurable for any s ∈ T with s ≤ t. It should be noted that filtrations generated by right-continuous processes are in general not necessarily right-continuous.7 The opposite implication is also not true, i.e. the filtration generated by a process which is not right-continuous could be right-continuous.8 Nevertheless, under some additional assumptions (e.g. the Feller property) it is the case.

Now, knowing the concept of filtration we can define a progressively measurable process.

6 i.e. both F and Ft, for t ∈ T, contain all P-null sets; recall that a probability space (Ω, Σ, P) is called complete if and only if for any S ⊆ Ω such that S ⊆ N for some zero-measure set N (N ∈ Σ and P[N] = 0) we get S ∈ Σ.
7 Let X = (Xt)t∈[0,1] be given by Xt = tZ, for some fixed random variable Z ∼ U[0, 1].
8 Let Z be a strictly positive random variable and let X = (Xt)t∈R+ be such that Xt = tZ for t ∈ [0, 1] ∪ (2, +∞) and Xt = 0 otherwise.


Definition 2.18. A process X defined on a filtered probability space is called progressively measurable if it is F-adapted and for any t ∈ T the map (T ∩ (−∞, t]) × Ω → R defined by (s, ω) → Xs(ω) is B(T ∩ (−∞, t]) × Ft–measurable.

Intuitively speaking, we want the stochastic process terminated at time t ∈ T to be measurable with respect to the information up to time t. As we show in the next example, the concepts of measurability and progressive measurability are not equivalent.

Example 2.19. Let (Ω, F, P) be a standard probability space with Ω = [0, 1].9 Let

A := σ({N ⊂ [0, 1] : #N < ∞})

denote the σ–algebra of countable sets (and their complements). For the time horizon T = [0, +∞) we define the filtration F = (Ft)t∈T by setting

Ft := A for t ∈ [0, 1), and Ft := F for t ∈ [1, ∞).

Next, we define a stochastic process X = (Xt)t∈T by setting

Xt(ω) := 1_∆(t, ω) = 1 if t = ω and t ≤ 1/2, and Xt(ω) := 0 otherwise,

for t ∈ T and ω ∈ Ω, where ∆ := {(t, t) : t ∈ [0, 1/2]} is a subset of T × Ω.

It is easy to observe that X is a measurable process, since ∆ ∈ B([0, 1]) × F. It is also F-adapted, since for any t ≥ 0 and A ∈ B(R) we get

X_t^{−1}(A) =
  ∅             if 0 ∉ A and 1 ∉ A, or if t > 1/2 and 0 ∉ A;
  {t}           if t ∈ [0, 1/2], 0 ∉ A, and 1 ∈ A;
  [0, 1] \ {t}  if t ∈ [0, 1/2], 0 ∈ A, and 1 ∉ A;
  [0, 1]        if t ∈ [0, 1/2], 0 ∈ A, and 1 ∈ A, or if t > 1/2 and 0 ∈ A;

and consequently X_t^{−1}(A) ∈ Ft.

Now, we show that X is not progressively measurable. Let us fix T = 1/2 and show that

∆ ∉ B([0, 1/2]) × F1/2,    (2.4)

i.e. ∆ ∉ B([0, 1/2]) × A. If (2.4) were not true, then we would get

∆ ∈ σ(An : n ∈ N) × σ(Dn : n ∈ N),

for some sequences (An)n∈N and (Dn)n∈N of elements of B([0, 1/2]) and A, respectively. Thus, from the Fubini theorem and the definition of ∆, we immediately get that for any t ∈ [0, 1/2]

{t} = {ω ∈ Ω : (t, ω) ∈ ∆} ∈ σ(Dn : n ∈ N).10

9 i.e. ([0, 1], B([0, 1]), L), where L is the standard Lebesgue measure on [0, 1].
10 It might be easier to understand if one draws ∆ in the square [0, 1]².


Setting D := ⋃_{n∈N} Dn and noting that

σ(Dn : n ∈ N) ⊆ {A ⊆ Ω : A = Γ or A = Γ ∪ (Ω \ D), where Γ ⊆ D},

we get that for any t ∈ [0, 1/2] we have {t} ⊆ D, which in turn implies

[0, 1/2] ⊆ D.

This contradicts the fact that the union of the Dn’s must be countable.

It is interesting to note that the stochastic process Y ≡ 0 is a progressively measurable modification of X, because

P[Xt = 0] = P[ω ∈ Ω : Xt(ω) = 0] ≥ P[Ω \ {t}] = 1.

At the end of this section we define the property known as predictability.

Definition 2.20. Let PT denote the σ-algebra on T × Ω that is generated by the left-continuous and adapted stochastic processes, i.e. the smallest σ-algebra that contains the sets

A × (s, t], s, t ∈ T, s < t, A ∈ Fs.

Then,

• PT is called the predictable σ-algebra.

• a stochastic process X is said to be predictable if it is PT–measurable.

Sometimes, it is also useful to define the optional σ-algebra, denoted by OT, i.e. the σ-algebra on T × Ω that is generated by the right-continuous (cadlag) and adapted stochastic processes. We call a stochastic process optional if it is OT–measurable.

2.4 Stopping times

Assuming that the stochastic process reflects the value of a game (or the price of a stock), we might be interested in defining a stopping condition, e.g. when we win (or lose) a particular amount of money (the stock price passes a pre-specified threshold). This intuitive concept is embedded into the definition of stopping times (Markov moments). First, let us give a definition of a random time.

Definition 2.21. Let (Ω, F, F, P) be a filtered probability space. A random variable τ with values in the time set T ∪ {+∞}, i.e.

τ : Ω → T ∪ {+∞},

is called a random time. We call τ a stopping time (or F–stopping time, or Markov moment) if

{τ ≤ t} ∈ Ft

for any t ∈ T.

Intuitively speaking, a random time is a stopping time if we can decide whether the event {τ ≤ t} occurred (or not) based only on the information available up to time t ∈ T. This definition can be


used to define a stopping rule which influences our control of the process (e.g. stop the game if your reward is bigger than a pre-specified number). Moreover, if not stated (explicitly) otherwise, we assume that all considered stopping times are finite, i.e. they satisfy the condition

P[τ = +∞] = 0.

We do that mainly for technical reasons, and to present the statements of the results more transparently. Also, we use the convention

inf ∅ = +∞,

i.e. if a stopping condition is not satisfied, then the value of the corresponding stopping time will be equal to +∞. To better understand the concept of stopping times, let us provide a few examples.

Example 2.22 (First entry time). Let T = N and let X be an F-adapted stochastic process. Then, for any B ∈ B(R) the random variable

τB := inf{t ∈ T : Xt ∈ B}

is a stopping time. It could be interpreted as the first time of entry of X into B.

Proof. For any t ∈ T we get

{τB ≤ t} = ⋃_{s∈T: s≤t} {Xs ∈ B}.

As X is F–adapted, we know that {Xs ∈ B} ∈ Fs and consequently {Xs ∈ B} ∈ Ft for s ≤ t. As the union is countable, the claim follows.
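The first entry time is easy to simulate for a simple random walk (a sketch, not from the notes; NumPy assumed, and the entry level and path length are illustrative choices). The stopping-time property corresponds to the fact that the event {τB ≤ t} is decided by the path up to time t alone:

```python
import numpy as np

rng = np.random.default_rng(5)

def first_entry_time(path, B):
    """tau_B = inf{t : X_t in B}, with the convention inf(emptyset) = +inf."""
    hits = [t for t, x in enumerate(path) if B(x)]
    return hits[0] if hits else float("inf")

# Simple random walk started at 0; B = [5, +inf).
steps = rng.choice([-1, 1], size=200)
walk = np.concatenate(([0], np.cumsum(steps)))

tau = first_entry_time(walk, lambda x: x >= 5)
print(tau)
```

Note that `first_entry_time(walk[:t + 1], ...) <= t` agrees with `tau <= t` for every t: whether the walk has entered B by time t is determined by the observations up to t, which is exactly the condition {τB ≤ t} ∈ Ft.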

One could also ask whether the last hitting time is a stopping time, i.e.

ρ := sup{t ∈ T : Xt ∈ B}.

Unfortunately, it is not (apart from some degenerate cases). Intuitively speaking, we would need to know the future in order to determine if X returns to B (the proof is left as a simple exercise). Another natural question is whether Example 2.22 could be translated to the continuous time setting. We show that it is indeed the case, under some additional technical conditions. Before we present the example, let us recall some basic facts related to stopping times.

First, we show that for stopping times there are additional types of Ft-measurable events.

Lemma 2.23. Let F be a filtration and let τ be a stopping time. Then, for any t ∈ T the events {τ > t}, {τ < t}, and {τ = t} belong to Ft.

Proof. For any t ∈ T we get {τ ≤ t} ∈ Ft and consequently {τ ≤ t}^c ∈ Ft (as Ft is a σ–algebra). Thus,

{τ > t} = {τ ≤ t}^c ∈ Ft.

Next, we get

{τ < t} = ⋃_{s<t, s∈Q∩T} {τ ≤ s} ∈ Ft,

where Q is the set of rational numbers. Finally,

{τ = t} = {τ ≤ t} \ {τ < t} ∈ Ft.


Unfortunately, in continuous time we usually cannot replace the condition {τ ≤ t} ∈ Ft with {τ < t} ∈ Ft or {τ = t} ∈ Ft. Nevertheless, if we add some additional technical assumptions, then the definitions might be equivalent.

Lemma 2.24. Let T = R+ and let F be right-continuous. Then, a random time τ is a stopping time if and only if for any t ∈ T we get

{τ < t} ∈ Ft.    (2.5)

Proof. The first part of the proof (the “only if” implication) is given in Lemma 2.23. Let us assume that τ is such that (2.5) is satisfied. Then for any t ∈ T we get

{τ ≤ t} = ⋂_{s>t, s∈Q∩T} {τ < s} ∈ Ft+.

Since Ft+ = Ft, the claim follows.

Finally, we need to know that if (τn)n∈N is a sequence of stopping times, then so is the supremum of all τn’s. Under right-continuity assumptions, the same is true for the infimum, limes inferior, and limes superior. Instead of writing sup_{n∈N} τn we use the notation sup τn (noting that the supremum is taken ω-wise with respect to n, rather than with respect to ω).

Lemma 2.25. Let (τn)n∈N be a sequence of F–stopping times. Then, sup τn is a stopping time.Moreover, if F is right-continuous, then inf τn, lim sup τn, and lim inf τn are stopping times.

Proof. It is easy to note that sup τn is a random time. Moreover, for any t ∈ T we get

{sup τn ≤ t} = ⋂_{n∈N} {τn ≤ t} ∈ Ft.

Now, let us assume that F is right-continuous. Then, it is enough to consider the strict inequalities; see Lemma 2.24. Taking the complement of {inf τn < t}, we get

{inf τn ≥ t} = ⋂_{n∈N} {τn ≥ t} = ⋂_{n∈N} {τn < t}^c ∈ Ft,

so that {inf τn < t} ∈ Ft. Next, we have

{lim sup τn < t} = ⋃_{k=1}^{∞} ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {τm ≤ t − 1/k} ∈ Ft,

{lim inf τn < t} = ⋃_{k=1}^{∞} ⋂_{n=1}^{∞} ⋃_{m=n}^{∞} {τm < t − 1/k} ∈ Ft,

which, due to Lemma 2.24, concludes the proof.

We are now finally ready to formulate the analogue of Example 2.22 in continuous time.

Example 2.26. Let T = R+ and let X be an F-adapted stochastic process. Assume that X and F are right-continuous. Then, for any open (or closed) subset B ∈ B(R) the random variable

τB := inf{t ∈ T : Xt ∈ B}

is a stopping time.


Proof. Let B be an open subset of R. Due to Lemma 2.24 it is enough to operate on strict inequalities (and their complements). We know that for any t ∈ T we get

{τB ≥ t} = {Xs ∈ Bc, s ∈ T, s < t}.

Now, because X is right-continuous we get

{Xs ∈ Bc, s ∈ T, s < t} = ⋂_{s<t, s∈T∩Q} {Xs ∈ Bc} ∈ Ft,

which concludes the proof for open subsets. Now, let us assume that B is a closed subset of R. For ε > 0 let Bε denote the ε-hull of B, i.e. the set of points whose distance from B is strictly smaller than ε. Since Bε is open, we know that τBε is a stopping time. Noting that

τB = lim_{ε→0+} τBε

and using Lemma 2.25 we get that τB is a stopping time, which concludes the proof.
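On a discretized sample path, the hitting time τB = inf{t : Xt ∈ B} can be computed directly. The following sketch (in Python; the random-walk path and the set B are illustrative assumptions, not part of the notes) makes the definition concrete:

```python
import random

def hitting_time(path, in_B):
    """First index t with path[t] in B, i.e. inf{t : X_t in B}; None if never hit."""
    for t, x in enumerate(path):
        if in_B(x):
            return t
    return None  # the path never enters B on this finite horizon

# Illustrative example: a symmetric random walk and the open set B = (3, +inf).
random.seed(0)
path = [0]
for _ in range(100):
    path.append(path[-1] + random.choice([-1, 1]))

tau = hitting_time(path, lambda x: x > 3)
if tau is not None:
    # At tau the path is in B; strictly before tau it is not.
    assert path[tau] > 3 and all(x <= 3 for x in path[:tau])
```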

If the time is discrete, we can reformulate the definition of a stopping time using equality instead of inequality.

Proposition 2.27. Let F be a discrete-time filtration. Then τ is an F–stopping time if and only if

{τ = t} ∈ Ft

for any t ∈ T.

Proof. If τ is a stopping time, then

{τ = t} = {τ ≤ t} \ ⋃_{s∈T, s<t} {τ ≤ s},

which proves our claim. On the other hand, we get

{τ ≤ t} = ⋃_{s≤t, s∈T} {τ = s},

so the converse implication is also true.

Finally, let us show that one can perform some basic operations on stopping times.

Proposition 2.28. Let F be a filtration and let τ, ρ be two F–stopping times. Then,

1) If τ and ρ are non-negative, then τ + ρ is a stopping time.11

2) τ ∧ ρ := min{τ, ρ} is a stopping time.

3) τ ∨ ρ := max{τ, ρ} is a stopping time.

11 Assuming that τ + ρ has values in T.


Proof. To prove 1) it is enough to note that

{τ + ρ > t} = [{τ = 0} ∩ {ρ > t}] ∪ [⋃_{s∈T∩Q∩[0,+∞)} {τ > s} ∩ {ρ > t − s}].

Next, to prove 2) and 3) it is enough to note that

{τ ∧ ρ ≤ t} = {τ ≤ t} ∪ {ρ ≤ t}

and

{τ ∨ ρ ≤ t} = {τ ≤ t} ∩ {ρ ≤ t}.

Now, we outline some concepts and notation related to stopping times. For simplicity, from now on we assume that the considered stochastic processes are progressively measurable. As we have mentioned, stopping times might be used to stop a process or to define a random sample from the process. Also, we can define the σ–algebra which collects the information up to a random time.

Definition 2.29. Let τ be a (finite) F–stopping time and let X be a progressively F-measurable stochastic process.

1) A random time sample from the stochastic process X picked at τ, denoted by Xτ, is given by

Xτ(ω) := Xτ(ω)(ω), for ω ∈ Ω.

2) A process stopped at τ, denoted Xτ = (Xτt)t∈T, is given by

Xτt := Xt∧τ, for t ∈ T.

3) The σ–algebra at a stopping time τ, denoted by Fτ, is given by

Fτ := {A ∈ F : A ∩ {τ ≤ t} ∈ Ft, for any t ∈ T}.
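Items 1) and 2) of Definition 2.29 are easy to illustrate on a discrete path. A minimal sketch (the path and the stopping time are illustrative; τ is taken deterministic for simplicity):

```python
def stopped_path(path, tau):
    """X^tau_t := X_{t ∧ tau}: the path frozen from time tau onwards."""
    return [path[min(t, tau)] for t in range(len(path))]

path = [0, 1, 2, 1, 2, 3, 4]
tau = 3  # a fixed time standing in for a stopping time

x_tau = path[tau]                  # the random time sample X_tau (item 1)
stopped = stopped_path(path, tau)  # the stopped process X^tau (item 2)

# After tau, the stopped path is constant and equal to X_tau.
assert stopped == [0, 1, 2, 1, 1, 1, 1]
assert stopped[-1] == x_tau
```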

Please note that in general the function Xτ might not be measurable at all (and consequently not Fτ–measurable). Nevertheless, under progressive measurability it is. Also, we need to show that Fτ is in fact a σ–algebra.

Proposition 2.30. Let τ be a (finite) F–stopping time. Then, the family of sets Fτ is a σ–algebra and τ is Fτ–measurable. Moreover, if the stochastic process X is progressively F-measurable, then Xτ is Fτ–measurable (and Xt∧τ is Ft–measurable).

Proof. From the definition of τ we get Ω ∈ Fτ, as {τ ≤ t} ∈ Ft for any t ∈ T. Now, let us assume that A ∈ Fτ. Then, for any t ∈ T we get

Ac ∩ {τ ≤ t} = {τ ≤ t} \ (A ∩ {τ ≤ t}) ∈ Ft,

and consequently Ac ∈ Fτ. Next, for a sequence (An)n∈N, where An ∈ Fτ, we get

(⋃_{n∈N} An) ∩ {τ ≤ t} = ⋃_{n∈N} (An ∩ {τ ≤ t}) ∈ Ft

for any t ∈ T, which implies ⋃_{n∈N} An ∈ Fτ. This shows that Fτ is indeed a σ–algebra.

Now, to prove that τ is Fτ–measurable, it is enough to show that for any s ∈ R the event {τ ≤ s} belongs to Fτ. We can assume that s ∈ T (since τ has values in T). Then, for As = {τ ≤ s} we get

As ∩ {τ ≤ t} = {τ ≤ t ∧ s} ∈ Ft∧s ⊆ Ft

for any t ∈ T, which concludes this part of the proof. Finally, we need to show that for Γ ∈ B(R) and any t ∈ T we get

{Xτ ∈ Γ} ∩ {τ ≤ t} ∈ Ft.

Let us fix t ∈ T. Noting that τ ∧ t is a stopping time, we get that the map Z : Ω → Ω × [0, t] given by Z(ω) = (ω, τ(ω) ∧ t) is F–measurable (as its marginals are measurable).12 From the progressive measurability of X, we know that the map W : Ω × [0, t] → R given by W(ω, s) = Xs(ω) is Ft ⊗ B([0, t])–measurable. Consequently, the map V : Ω → R given by

V(ω) = W(Z(ω)) = Xτ(ω)∧t(ω)

is Ft–measurable. Now, we note that

{Xτ ∈ Γ} ∩ {τ ≤ t} = {Xτ∧t ∈ Γ} ∩ {τ ≤ t} = V−1(Γ) ∩ {τ ≤ t} ∈ Ft

for any t ∈ T (as the intersection of two measurable events), which concludes the proof.

If the time is discrete, then progressive measurability is equivalent to measurability and adaptedness, so the claim follows immediately (the proof is left as a simple exercise).

2.5 Martingales

We begin this section with a general definition of the martingale property.

Definition 2.31. Let X be an F-adapted integrable stochastic process. We say that

1) X is a martingale with respect to F if for any s, t ∈ T, such that s ≤ t we get

E[Xt|Fs] = Xs;

2) X is a submartingale with respect to F if for any s, t ∈ T, such that s ≤ t we get

E[Xt|Fs] ≥ Xs;

3) X is a supermartingale with respect to F if for any s, t ∈ T, such that s ≤ t we get

E[Xt|Fs] ≤ Xs.

12 Note that the map preserves Ft–measurability as well.


Sometimes, for brevity, we say that X is an F-martingale (resp. F-submartingale, F-supermartingale) or, if the underlying filtration is known, simply a martingale (resp. submartingale, supermartingale). Using the tower property, one can easily prove the following proposition.

Proposition 2.32. Let (Ω, F, F, P) be a filtered probability space and let X be an F-adapted integrable stochastic process. Then

1) If X is a martingale, then E[Xt] = E[Xs] for t, s ∈ T;

2) If X is a submartingale, then E[Xt] ≥ E[Xs] for t, s ∈ T such that t ≥ s;

3) If X is a supermartingale, then E[Xt] ≤ E[Xs] for t, s ∈ T such that t ≥ s.

Given any integrable random variable, it is easy to define a corresponding regular martingale by simply taking conditional expectations.

Definition 2.33. Let X be a martingale. We call X a regular martingale if there exists a random variable η such that for any t ∈ T we get

Xt = E[η|Ft].

If the time horizon is finite, then any martingale is regular. We leave the proof as an exercise.

Proposition 2.34. Let T = [0, T] for some T ∈ R+13 and let X be a martingale. Then Xt = E[XT |Ft] for any t ∈ T.

Of course, there exist martingales which are not regular. The simplest example is a random walk (with respect to its natural filtration). Finally, let us recall that Jensen's inequality can be useful to show the (sub)martingale property.

Proposition 2.35. Let X be a martingale and let f : R → R be a convex function. Then the process (f(Xt))t∈T is a submartingale (assuming it is integrable).

Proof. The proof follows easily from the conditional Jensen's inequality. Indeed, for any t, s ∈ T such that t > s we get

f(Xs) = f(E[Xt|Fs]) ≤ E[f(Xt)|Fs].
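Propositions 2.32 and 2.35 can be checked exactly for a symmetric random walk (an illustrative sketch, not part of the notes; exact arithmetic via fractions): E[Xn] = 0 for every n (constant mean of a martingale), while for the convex function f(x) = x² we get E[Xn²] = n, non-decreasing, as a submartingale requires.

```python
from fractions import Fraction

def rw_distribution(n):
    """Exact distribution of a symmetric ±1 random walk X_n started at 0."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for x, p in dist.items():
            for step in (-1, 1):
                new[x + step] = new.get(x + step, Fraction(0)) + p / 2
        dist = new
    return dist

for n in range(1, 8):
    dist = rw_distribution(n)
    mean = sum(x * p for x, p in dist.items())
    second = sum(x * x * p for x, p in dist.items())
    assert mean == 0    # martingale: E[X_n] = E[X_0] = 0 (Proposition 2.32)
    assert second == n  # f(x) = x^2: E[X_n^2] = n, increasing (Proposition 2.35)
```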

We state the next results to show the interactions between stopping times and martingales. For simplicity, we present the proofs only in discrete time. The first result shows that the martingale property is preserved under stopping.

Theorem 2.36. Let T be discrete and let X be an F–martingale (resp. F-submartingale, F-supermartingale). Then for any F–stopping time τ the stopped process Xτ is an F–martingale (resp. F-submartingale, F-supermartingale).

Proof. For simplicity, let T = N. We only show the proof for a submartingale (the other cases follow easily). As the time is discrete, it is enough to check the submartingale condition

E[Xτn+1|Fn] ≥ Xτn,

13 or T = {1, 2, . . . , T} for T ∈ N.


for all n ∈ N. First, we know that for any n ∈ N we get

Xn∧τ = ∑_{k=1}^{n−1} 1{τ=k}Xk + 1{τ≥n}Xn,

hence the stopped process Xτ is integrable. Second, we get

X(n+1)∧τ − Xn∧τ = 1{τ>n}(Xn+1 − Xn), (2.6)

and consequently

Xn∧τ = X1 + ∑_{k=1}^{n−1} (X(k+1)∧τ − Xk∧τ) = X1 + ∑_{k=1}^{n−1} 1{τ>k}(Xk+1 − Xk).

Consequently, the process Xτ is F–adapted. Finally, noting that {τ > n} = {τ ≤ n}c ∈ Fn and using (2.6), we get

E[X(n+1)∧τ |Fn] = E[Xn∧τ + X(n+1)∧τ − Xn∧τ |Fn]
= E[Xn∧τ + 1{τ>n}(Xn+1 − Xn)|Fn]
= Xn∧τ + 1{τ>n}E[Xn+1 − Xn|Fn]
≥ Xn∧τ,

which concludes the proof.
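Theorem 2.36 can be verified exactly for a small example (an illustrative sketch, not part of the notes): enumerate all equally likely ±1 paths of a symmetric random walk, stop at the first visit to level 2, and check that the stopped process still has constant (zero) expectation.

```python
from itertools import product

def stopped_value(steps, level, n):
    """Return X_{n∧τ}, where X is the walk with increments `steps`,
    X_0 = 0, and τ = inf{t : X_t = level}."""
    x = 0
    for s in steps[:n]:
        if x == level:   # the path is frozen from τ onwards
            return x
        x += s
    return x

N = 10  # horizon: 2^10 equally likely paths
for n in range(N + 1):
    # Integer sum over all paths; zero sum <=> E[X_{n∧τ}] = 0 = E[X_0] exactly.
    total = sum(stopped_value(steps, 2, n) for steps in product((-1, 1), repeat=N))
    assert total == 0
```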

In fact, the martingale inequalities remain true at stopping times, which is a consequence of Doob's optional sampling theorem. Below, we present a simplified discrete-time version.

Theorem 2.37. Let T be discrete and let X be an F–martingale (resp. F-submartingale or F-supermartingale). Let τ1 ≤ τ2 be two bounded F–stopping times. Then, we get

E[Xτ2 |Fτ1] = Xτ1 (resp. ≥ or ≤).

Proof. For simplicity, let us assume that T = N and show the proof for a submartingale. From Proposition 2.30 we know that for i = 1, 2 the random variable Xτi is Fτi–measurable, and Fτi is a proper σ–algebra.14 Let N ∈ N be such that τ2 ≤ N (such a number exists since τ2 is bounded). We know that Xτ1 and Xτ2 are integrable since for i = 1, 2 we get

E|Xτi| ≤ ∑_{n=1}^{N} E|Xn| < ∞.

We need to show that for any A ∈ Fτ1 we get

E[1A Xτ1] ≤ E[1A Xτ2].

Noting that

A = ⋃_{k=1}^{N} Ak,

where Ak := {τ1 = k} ∩ A, and the Ak's are Fτ1–measurable, it is enough to show that

E[1Ak Xτ1] ≤ E[1Ak Xτ2]

for any k = 1, 2, . . . , N. Let us fix k ∈ N (k ≤ N) and let

L(n) := E[1Ak Xn∧τ2], for n = k, k + 1, . . . , N.

If we show that the function L is non-decreasing, then we get (note that τ1 ≤ τ2)

E[1Ak Xτ1] = E[1Ak Xτ1∧τ2] = E[1Ak Xk∧τ2] = L(k) ≤ L(N) = E[1Ak XN∧τ2] = E[1Ak Xτ2],

which will conclude the proof. For any n = k, . . . , N − 1 we get

L(n + 1) − L(n) = E[1Ak (X(n+1)∧τ2 − Xn∧τ2)] = E[1{Ak∩{τ2>n}} (Xn+1 − Xn)].

Since X is a submartingale, it is sufficient to show that for any such n we get

Ak ∩ {τ2 > n} ∈ Fn.

From the definition of τ2 we know that {τ2 > n} ∈ Fn. Also, we know that

Ak = A ∩ {τ1 = k} = (A ∩ {τ1 ≤ k}) \ (A ∩ {τ1 ≤ k − 1}).

As A ∈ Fτ1, we know that A ∩ {τ1 ≤ k} ∈ Fk and A ∩ {τ1 ≤ k − 1} ∈ Fk−1. This concludes the proof, as k − 1 < k ≤ n and consequently Fk−1 ⊆ Fk ⊆ Fn.

14 Note that time is discrete, so it is enough for X to be measurable and adapted.

Now, let us consider the continuous-time setting. The analogues of Theorem 2.36 and Theorem 2.37 are true for cadlag martingales. Before we explicitly state the results, let us provide a theorem which states that for stochastically continuous martingales a cadlag modification always exists. As mentioned, for brevity we state the results without proofs.15

Theorem 2.38. Let T = R+ and let the stochastic process X be a stochastically continuous martingale (resp. supermartingale, submartingale). Then, there exists a cadlag modification of X.

Theorem 2.39. Let T = R+ and let the stochastic process X be a cadlag martingale (resp. submartingale, supermartingale). Then, for any stopping time τ, the stopped process Xτ is a martingale (resp. submartingale, supermartingale).

Theorem 2.40. Let T = R+ and let the stochastic process X be a cadlag martingale (resp. submartingale, supermartingale). Let τ1 ≤ τ2 be two bounded stopping times. Then, we get

E[Xτ2 |Fτ1] = Xτ1 (resp. ≥, ≤).

2.5.1 Doob-Meyer decomposition

We start this section with the discrete-time analogue of the Doob-Meyer decomposition, where it can be presented in a much simpler form (the so-called Doob decomposition).

15 See e.g. O. Kallenberg, Foundations of Modern Probability, Springer, New York 2002 and references therein for the proofs.


Theorem 2.41 (Doob decomposition). Let T = N and let X be a submartingale (resp. supermartingale). Then we have a (unique) decomposition

Xt = Mt + At,

for t ≥ 0, where M = (Mt)t∈T is a martingale, and A = (At)t∈T is a predictable increasing (resp. decreasing) process starting from zero.16

Proof. Let X be a submartingale. Let A and M be given by

At = ∑_{k=1}^{t} (E[Xk|Fk−1] − Xk−1),

Mt = X0 + ∑_{k=1}^{t} (Xk − E[Xk|Fk−1]),

where A0 = 0 and M0 = X0. One can easily see that for any t ∈ N we get

Xt = Mt + At.

Also, since X is a submartingale, we get that A is increasing and starts from 0. Moreover, for any t ∈ N we get

E[Mt − Mt−1|Ft−1] = E[Xt − E[Xt|Ft−1] | Ft−1] = 0,

so M is a martingale. The proof of uniqueness is left as a simple exercise.
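The formulas from the proof translate directly into code. A sketch (the example process and its closed-form one-step conditional expectation are assumptions specific to this illustration): for X = S², with S a symmetric ±1 walk, E[Xk|Fk−1] = Xk−1 + 1, so the compensator is At = t and Mt = S²t − t.

```python
def doob_decomposition(x_path, cond_exp_next):
    """Doob decomposition X_t = M_t + A_t, following the proof of Theorem 2.41.

    cond_exp_next(history) must return E[X_k | F_{k-1}] given X_0 .. X_{k-1}.
    """
    A = [0.0]
    M = [x_path[0]]
    for k in range(1, len(x_path)):
        predicted = cond_exp_next(x_path[:k])
        A.append(A[-1] + (predicted - x_path[k - 1]))  # predictable, increasing part
        M.append(M[-1] + (x_path[k] - predicted))      # martingale part
    return M, A

# Example: S a symmetric walk, X = S^2; then E[X_k | F_{k-1}] = X_{k-1} + 1.
s_path = [0, 1, 0, -1, -2, -1]
x_path = [s * s for s in s_path]
M, A = doob_decomposition(x_path, lambda hist: hist[-1] + 1)

assert A == [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # A_t = t, increasing from zero
assert all(m + a == x for m, a, x in zip(M, A, x_path))  # X = M + A exactly
```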

Without the monotonicity assumption imposed on A, the Doob decomposition can in fact be made for any integrable and adapted discrete-time stochastic process, and it is almost surely unique.

Now, we want to show that a similar theorem is true in continuous time. Before we state the main theorem of this section, we need to recall the concept of uniform integrability.

Definition 2.42. Let X = {Xα}α∈I be a family of random variables indexed by I. We say that the family X is uniformly integrable if

∀ε>0 ∃M>0 : sup_{α∈I} E[1{|Xα|>M}|Xα|] < ε.

Next, let us formally define the classes of (D) and (DL) processes. To ease the notation, for any s ∈ T ∪ {+∞} let Σs denote the family of all stopping times (with respect to the underlying filtration F) with values less than or equal to s, i.e.

Σs := {τ : τ ≤ s}.

Definition 2.43. Let X be a right-continuous submartingale. We say that

• X is of class (D) if the family {Xτ : τ ∈ Σ∞} is uniformly integrable,

• X is of class (DL) if for any t ∈ T the family {Xτ : τ ∈ Σt} is uniformly integrable.

16 In discrete time predictability corresponds to the fact that At is Ft−1–measurable.


One might look at the class (D) and (DL) properties as generalisations of the concept of uniform integrability to stochastic processes. While Definition 2.43 might look strange, it is in fact a generalisation of an intuitive property: from Theorem 2.40 we see that for a cadlag martingale X, a time index t ∈ T, and a bounded stopping time τ ≤ t, we get

E[Xt|Fτ] = Xτ. (2.7)

As the set of conditional expectations of a given random variable (with respect to various sub σ-fields) is uniformly integrable, the class (DL) property is satisfied. Indeed, we get the following result.

Proposition 2.44. Let X be a cadlag martingale. Then X is of class (DL).

Proof. Let us fix t ∈ T. As (2.7) holds for any τ such that τ ≤ t, it is enough to prove that the family of conditional expectations

{Y : Y = E[Xt|B], where B is a sub σ–algebra of F}

is uniformly integrable. Let us fix ε > 0. Since |Xt| is integrable, we know that there exists δ > 0 such that for any A ∈ F satisfying P[A] ≤ δ we get

E[1A|Xt|] < ε.

By the conditional Jensen's inequality, for any B and Y = E[Xt|B] we get

E[|Y|] = E[|E[Xt|B]|] ≤ E[E[|Xt| | B]] = E[|Xt|],

which implies (using Markov's inequality)

P[|Y| > a] ≤ (1/a) E[|Y|] ≤ (1/a) E[|Xt|]

for any a > 0. Thus, setting a := E[|Xt|]/δ we get

P[A] ≤ δ

for A := {|Y| > a}. Consequently, noting that A ∈ B, we get

E[1A|Y|] ≤ E[1A|E[Xt | B]|] ≤ E[1A E[|Xt| | B]] = E[E[1A|Xt| | B]] = E[1A|Xt|] < ε.

As for the fixed t ∈ T the choice of a was independent of B, we conclude the proof.

Finally, we state (without proof) a simplified version of the Doob-Meyer decomposition theorem. Note that the statement is more general than its martingale counterpart, i.e. it covers submartingale stochastic processes.


Theorem 2.45 (Doob-Meyer decomposition). Let X be a cadlag submartingale of class (DL). Then, we have a unique decomposition

Xt = Mt + At,

for t ≥ 0, where M = (Mt)t∈T is a martingale, and A = (At)t∈T is a predictable increasing process starting from zero.

The decomposition is often used to define the quadratic variation of X. We know that if X is a square-integrable cadlag martingale, then X² is a submartingale of class (DL). The proof of this fact is left as an exercise (use Jensen's inequality!). Consequently, we can decompose X² using the Doob-Meyer decomposition, and the predictable part gives the quadratic variation of X.17

Definition 2.46. Let X be a square-integrable cadlag martingale. The quadratic variation of X is the stochastic process ⟨X,X⟩ = (⟨X,X⟩t)t∈T defined by

⟨X,X⟩t := X²t − Mt,

where (Mt)t∈T is the martingale part of the decomposition of X² from Theorem 2.45.
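For the Brownian motion introduced below, one can check that ⟨W,W⟩t = t. A quick numerical illustration (the grid, sample size, and tolerance are ad hoc assumptions): the sum of squared increments over a fine partition of [0, t] is close to t.

```python
import math
import random

random.seed(42)
t, n = 1.0, 100_000
dt = t / n

# Simulate Brownian increments over the partition and sum their squares.
qv = sum(random.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n))

# The discretized quadratic variation converges to <W, W>_t = t in L^2.
assert abs(qv - t) < 0.05
```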

3 Important examples of stochastic processes

In this section we introduce some important examples of stochastic processes. If not stated otherwise, we assume that T = R+, i.e. time is continuous and we have a starting point.

Recall that in Definition 2.6 we introduced properties related to process stationarity and independence of increments. Using concepts related to those definitions, we can define certain important families of stochastic processes. Before we focus on specific examples, let us provide a general definition of a Levy process.

Definition 3.1. Let T = R+. We say that a stochastic process X is a Levy process if

1) X0 = 0;

2) X is stochastically continuous;

3) X has independent and stationary increments.

The class of Levy processes is very important from both the theoretical and the practical point of view. It can be viewed as the continuous-time analogue of a random walk. In particular, note that for any Levy process the increments over different (disjoint) time intervals of the same length must be independent and must have the same distribution. This class of processes satisfies many useful properties, such as the Markov property (we will formally state this result later). Also, there always exists a cadlag modification of a Levy process. Let us state this fact for future reference (without proof; see e.g. Kinney's Theorem).

Proposition 3.2. Let X be a Levy process. Then, there exists a cadlag modification of X.

17 Why is it called quadratic variation and why is it interesting? We will need it later to integrate by parts and to get the stochastic change of variables formula, known as Ito's lemma.


Usually, we will additionally require the increments to have a particular distribution, e.g. Gaussian or Poisson with time-dependent parameters. To prove the existence of a given stochastic process from its finite-dimensional distributions, we utilise Kolmogorov's existence theorem. Then, Kolmogorov's continuity theorem is used to pick a proper modification. To illustrate this, we show how it can be done for the Brownian motion.

3.1 Brownian motion

In this section we define and show basic properties of the Brownian motion, sometimes referred to as the Wiener process. This process is the central object of stochastic analysis and will later be used in the definition of the Ito integral. We start with the definition.

Definition 3.3. Let T = R+. We call W = (Wt)t∈T a (standard) Brownian motion (or Wiener process) if W satisfies the following properties:

1. W0 = 0;

2. W is (a.s.) continuous;

3. W has independent increments;a

4. Wt − Ws ∼ N(0, √(t − s)) for any 0 ≤ s ≤ t, i.e. the increment is Gaussian with mean zero and variance t − s.

a i.e. for 0 ≤ t1 < . . . < tn < ∞ the random variables Wt1, Wt2 − Wt1, . . . , Wtn − Wtn−1 are independent.

Straight from the definition we see that the Brownian motion is an example of a Levy process, where the increments are Gaussian with constant (zero) mean and time-varying variance. Note that we require the increments to be stationary, not the process itself. Before we show that the Brownian motion exists, let us outline its basic properties; note that for Gaussian vectors the first two moments characterise the whole distribution. We leave the proof as an exercise.

Proposition 3.4. Let W be a Brownian motion. Then

1. W is a Gaussian process;18

2. E[Wt] = 0, for t ∈ T;

3. CW(t, s) = Cov(Wt, Ws) = t ∧ s, for t, s ∈ T.19

It should be noted that the Brownian motion can be alternatively defined using the properties given in Proposition 3.4, i.e. any continuous Gaussian process starting at zero for which the first two moments satisfy the above conditions is a Brownian motion.
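The moment conditions of Proposition 3.4 are easy to sanity-check by Monte Carlo simulation (an illustrative sketch; sample sizes and tolerances are ad hoc assumptions):

```python
import math
import random

random.seed(7)

def bm_sample(times):
    """Sample (W_t) at the given increasing times via independent Gaussian increments."""
    w, prev, out = 0.0, 0.0, []
    for t in times:
        w += random.gauss(0.0, math.sqrt(t - prev))
        prev = t
        out.append(w)
    return out

s, t, n = 0.5, 2.0, 200_000
samples = [bm_sample([s, t]) for _ in range(n)]
mean_t = sum(w[1] for w in samples) / n
cov_st = sum(w[0] * w[1] for w in samples) / n  # E[W_s W_t]; the means are ~0

assert abs(mean_t) < 0.02               # E[W_t] = 0
assert abs(cov_st - min(s, t)) < 0.02   # Cov(W_s, W_t) = s ∧ t = 0.5
```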

Assuming the probability space is rich enough, we can show that the Brownian motion indeed exists. This is the statement of the next theorem.

Theorem 3.5. Let T = R+ and let (Ω,F) = (R^T, B(R^T)). Then there exists a probability measure P defined on (Ω,F) and a stochastic process W such that W is a Brownian motion.

18 i.e. for any 0 ≤ t1 < . . . < tn < ∞ the random vector (Wt1, Wt2, . . . , Wtn) is a multivariate Gaussian vector.
19 where for random variables X and Y we set Cov[X, Y] := E[(X − E[X])(Y − E[Y])].


Proof. For brevity, we show only the outline of the proof. We use I = {t1, t2, . . . , tn} ⊆ T to denote a (any) finite set of time-points from T such that 0 ≤ t1 < t2 < . . . < tn, where n ∈ N. Moreover, let µI denote the Gaussian measure on (Rⁿ, B(Rⁿ)) with zero mean vector and n × n covariance matrix given by

MI := [ti ∧ tj]i,j∈I.20

For any I = {t1, t2, . . . , tn} and Γ ∈ B(Rⁿ) we set

C(I, Γ) := {ω ∈ Ω : (ω(t1), . . . , ω(tn)) ∈ Γ}.

We use FI := {C(I, Γ) : Γ ∈ B(Rⁿ)} to denote the σ–algebra of all cylinder sets with predefined time indices I and we use PI to denote the corresponding probability measure on (Ω, FI) given by

PI[C(I, Γ)] = µI(Γ), Γ ∈ B(Rⁿ).

Then, it can be shown that the family of measures (PI)I is consistent (the proof is left as an exercise).

Thus, using Kolmogorov's extension Theorem 2.13 we know that there exists a probability measure P on (Ω, F) such that for any I = {t1, . . . , tn} and Γ ∈ B(Rⁿ) we get

P[C(I, Γ)] = PI[C(I, Γ)].

On (Ω, F, P) we define the stochastic process W = (Wt)t∈T by setting

Wt(ω) := ω(t), for t ∈ T and ω ∈ Ω.

It can be shown that the process W is Gaussian, starts from zero, and its auto-covariance function coincides with the auto-covariance function of the Wiener process (given in Proposition 3.4).

Consequently, to conclude the proof we only need to show that there exists a continuous modification of the stochastic process W. Noting that for any t, s ∈ T and p ∈ N we get

E[(|Wt − Ws| / √|t − s|)^{2p}] = E|X|^{2p},

where X is a standard Gaussian random variable, and setting Kp := E|X|^{2p}, we get the property

E|Wt − Ws|^{2p} = Kp|t − s|^p, t, s ∈ T.21 (3.1)

Consequently, setting ε = p − 1 and using Kolmogorov's continuity Theorem 2.11 we conclude that W has a continuous modification. The continuous modification is the standard Brownian motion, which concludes the proof.

One can also provide a more constructive proof of existence, e.g. by taking Haar functions on [0, 1] and showing convergence of a properly defined series of independent Gaussian random variables. An exemplary construction may be found in the literature under the name Levy-Ciesielski construction. It should be noted that this construction only requires the probability space to be rich enough to contain a sequence of independent standard Gaussian random variables. We are now ready to show some interesting properties of the Brownian motion.

Next, we outline some basic results about the Brownian motion. Let us start with basic transformations under which the transformed process is still a Brownian motion.

20 The proof that MI is indeed a proper covariance matrix for any I is left as an exercise.
21 Note that all moments of the standard Gaussian random variable exist, so Kp is finite for any p ∈ N.


Proposition 3.6 (Transformations of Brownian motion). Let W be a Brownian motion. Then, the following transformations preserve the Brownian motion:

• Time-homogeneity, i.e. for any s > 0 the process (Wt+s − Ws)t≥0 is a Brownian motion;

• Positive scaling, i.e. for any c > 0 the process (cWt/c²)t≥0 is a Brownian motion;

• Time-inversion, i.e. the process X given by X0 = 0 and Xt = tW1/t (for t > 0) is a Brownian motion;

• Path-inversion, i.e. the process (−Wt)t≥0 is a Brownian motion.
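A necessary condition for the scaling property is easy to check numerically: the scaled process cWt/c² must again have variance t at time t. A quick Monte Carlo sketch (parameters and tolerance are illustrative, and this checks only the variance, not the full law):

```python
import math
import random

random.seed(1)
c, t, n = 3.0, 0.4, 200_000

# Sample c * W_{t/c^2}: a Gaussian with standard deviation c * sqrt(t / c^2).
samples = [c * random.gauss(0.0, math.sqrt(t / c**2)) for _ in range(n)]
var = sum(x * x for x in samples) / n

assert abs(var - t) < 0.02  # Var(c * W_{t/c^2}) = c^2 * (t / c^2) = t
```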

The proof of Proposition 3.6 is very simple and is left as an exercise. Next, we show properties of the Brownian motion related to the continuity of its sample paths.

Proposition 3.7 (Continuity properties). Let W be a Brownian motion. Then

1. W is Holder continuous for any exponent α < 1/2;22

2. W is not Holder continuous with exponent α = 1/2 on any (time) subinterval;

3. Almost all paths of W are nowhere differentiable;

4. W has infinite total variation on any (time) sub-interval.

Proof. Let W be a Brownian motion.

1) Using Equation (3.1) from the proof of Theorem 3.5 we know that

E|Wt − Ws|^{2p} = Kp|t − s|^p, t, s ∈ T,

for any p ∈ N. Noting that, with εp := p − 1,

εp/(2p) = (p − 1)/(2p) → 1/2 for p → ∞,

and using Theorem 2.10, we know that we can find a continuous modification of W which is Holder continuous for any exponent α < 1/2.

2) Let BC[a,b] denote the set of all paths which are Holder continuous with α = 1/2 on the time interval [a, b] with constant C, i.e.

BC[a,b] := {ω ∈ Ω : ∀ s,t∈[a,b] |Wt(ω) − Ws(ω)| ≤ C|t − s|^{1/2}}.

Let us show that for any C ∈ N and a, b ∈ Q (a < b) we get P[BC[a,b]] = 0. Indeed, noting that W has stationary and independent increments (so that we may shift [a, b] to [0, b − a]), for any n ∈ N we get

P[BC[a,b]] ≤ P[ |W(i/n)(b−a) − W((i−1)/n)(b−a)| ≤ C√((b − a)/n), for i = 1, 2, . . . , n ]

= ∏_{i=1}^{n} P[ |W(i/n)(b−a) − W((i−1)/n)(b−a)| ≤ C√((b − a)/n) ]

= P[ |W(b−a)/n| ≤ C√((b − a)/n) ]^n

= P[|W1| ≤ C]^n = aⁿ.

Since aⁿ → 0 (as n → ∞, because a := P[|W1| ≤ C] < 1), we conclude that P[BC[a,b]] = 0. Noting that to prove the initial statement we only need to consider time intervals with rational endpoints, and

P[ ⋃_{C∈N} ⋃_{a<b, a,b∈Q} BC[a,b] ] ≤ ∑_{C∈N} ∑_{a<b, a,b∈Q} P[BC[a,b]] = 0,

we conclude this part of the proof.

22 i.e. there exists such a modification.

3) For brevity, we only outline the proof.23 Suppose that W is differentiable at some point s ∈ T. Then, it can be shown that there exist ε > 0 and K ∈ N such that for any t ∈ [s, s + ε) we get

|Wt − Ws| ≤ K(t − s). (3.2)

Next, it can be shown that inequality (3.2) implies that for sufficiently big n ∈ N there exists i ∈ N such that for k = 0, 1, 2 we get

|W(i+k+1)/n − W(i+k)/n| ≤ 7K/n. (3.3)

Now, let AM denote the set of all paths such that there exists at least one time-point in [0, M) at which W is differentiable. Using property (3.3) we know that

AM ⊂ ⋃_{K=1}^{∞} ⋃_{n0=1}^{∞} ⋂_{n=n0}^{∞} ⋃_{i=0}^{Mn−4} ⋂_{k=0}^{2} { |W(i+k+1)/n − W(i+k)/n| ≤ 7K/n }.

Now, noting that W has stationary and independent increments, and exploiting the fact that √n W1/n ∼ N(0, 1), we get

P[ ⋂_{n=n0}^{∞} ⋃_{i=0}^{Mn−4} ⋂_{k=0}^{2} { |W(i+k+1)/n − W(i+k)/n| ≤ 7K/n } ]
≤ lim inf_{n→∞} Mn P[ |W1/n| < 7K/n ]³
≤ lim inf_{n→∞} Mn P[ |W1| < 7K/√n ]³
≤ lim inf_{n→∞} Mn ( (2/√(2π)) ∫₀^{7K/√n} e^{−x²/2} dx )³
≤ lim inf_{n→∞} Mn ( (2/√(2π)) ∫₀^{7K/√n} 1 dx )³
≤ lim inf_{n→∞} Mn ( (2/√(2π)) · 7K/√n )³
= lim inf_{n→∞} C/√n,

where the constant C := M(14K/√(2π))³ is independent of i and n. Noting that

lim inf_{n→∞} C/√n = 0,

and using the (countable) subadditivity of the probability measure, we conclude that P[AM] = 0.

23 See e.g. H.P. McKean, Stochastic Integrals, for the full proof.

4) A function f : R → R has finite total variation on the interval [a, b] if

sup_{t∈P[a,b]} ∑_{i=1}^{n−1} |f(ti+1) − f(ti)| < ∞,

where P[a, b] is the set of all finite partitions of [a, b] and t = (t1, . . . , tn). Using similar arguments as before, it is enough to show that for a, b ∈ Q+ (a < b) and C ∈ N we get

lim_{n→∞} P[ ∑_{i=1}^{n} |W(i/n)(b−a) − W((i−1)/n)(b−a)| ≤ C ] = 0.

For any fixed n ∈ N we set Xn := ∑_{i=1}^{n} |W(i/n)(b−a) − W((i−1)/n)(b−a)|. Recalling that for a standard normal random variable Z we get E[|Z|] = √(2/π), we get

E[Xn] = ∑_{i=1}^{n} √((b − a)/n) E[|W1|] = √(2(b − a)n/π),

Var[Xn] = ∑_{i=1}^{n} Var[ |W(i/n)(b−a) − W((i−1)/n)(b−a)| ] = n Var[ |W(b−a)/n| ] = (b − a) Var|W1|.

Chebyshev's inequality states that for any non-constant Y ∈ L² and any r > 0 we get

P[|Y − E[Y]| ≥ r] ≤ Var[Y]/r².

Thus, setting Y = −Xn and considering only the positive part of Y − E[Y], we get

P[Xn ≤ E[Xn] − r] ≤ Var[Xn]/r².

Next, noting that E[Xn] > 0 and setting r = E[Xn]/2, we get

P[Xn ≤ E[Xn]/2] ≤ 4(b − a) Var|W1| / E[Xn]² = 2π Var|W1| / n.

Now, noting that E[Xn] → ∞ (as n → ∞), for any C ∈ R+ there exists n0 ∈ N such that for any n ≥ n0 we get

P[Xn ≤ C] ≤ P[Xn ≤ E[Xn]/2] ≤ 2π Var|W1| / n.

Since 2π Var|W1|/n → 0 (as n → ∞), the proof is complete.
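The blow-up of the discretized absolute variation Xn from the proof above, with mean √(2(b − a)n/π) → ∞, is visible even in a small simulation (an illustrative sketch; grid sizes are arbitrary):

```python
import math
import random

random.seed(5)

def abs_variation(t, n):
    """X_n = sum of |increments| of W over the equidistant n-partition of [0, t]."""
    dt = t / n
    return sum(abs(random.gauss(0.0, math.sqrt(dt))) for _ in range(n))

# E[X_n] grows like sqrt(n): refining the mesh 100-fold roughly multiplies X_n by 10.
x_coarse = abs_variation(1.0, 100)
x_fine = abs_variation(1.0, 10_000)
assert x_fine > 5 * x_coarse  # the variation blows up as the mesh is refined
```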

We conclude this section with basic characterisations of the Brownian motion related to its martingale properties.

Proposition 3.8. Let W be a Brownian motion. Then,

1. The process (Wt) is a martingale (w.r.t. the filtration generated by W).

2. The process (W²t − t) is a martingale (w.r.t. the filtration generated by W).

3. Let F be a filtration and let Z be a process such that

- Z0 = 0;
- Z is square-integrable and continuous;
- (Zt) is an F–martingale;
- (Z²t − t) is an F–martingale.

Then, Z is a Brownian motion.
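A necessary condition from item 2 can be checked by simulation: E[W²t − t] must equal E[W²0 − 0] = 0 for every t (a sketch only; it verifies the constant mean, not the full martingale property, and sample sizes/tolerances are ad hoc):

```python
import math
import random

random.seed(11)
n = 200_000
for t in (0.5, 1.0, 2.0):
    # Sample mean of W_t^2 - t over n independent copies of W_t ~ N(0, sqrt(t)).
    m = sum(random.gauss(0.0, math.sqrt(t)) ** 2 - t for _ in range(n)) / n
    assert abs(m) < 0.05  # E[W_t^2 - t] = 0 for every t
```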

3.2 Poisson Process

Now, we introduce the second important stochastic process – the Poisson process. It is anotherexample of a Levy process which could be constructed given it’s finite-dimensional distribution.Before we state the definition, let us recall what is a Poisson random variable. We say that randomvariable X has a Poisson distribution with intensity parameter λ > 0 if for k = 0, 1, 2, . . . we get

P[X = k] =λk

k!e−λ.

For brevity, we use the notation X ∼ P(λ) to denote the Poisson random variable with intensityλ > 0. We are now ready to define the Poisson process.

Definition 3.9. Let T = R+. We call N = (Nt)t∈T a Poisson process with intensity λ > 0if

1) N0 = 0;

2) N is stochastically continuous;


3) N has independent and stationary increments;

4) Nt − Ns ∼ P(λ(t − s)), for any 0 ≤ s ≤ t.

Directly from the definition we get that every Poisson process is a Levy process. Using Kolmogorov's theorems one can show that such a process exists. Nevertheless, we provide an alternative, renewal-theory based representation of the Poisson process which is more constructive. Also, for simplicity we work with the cadlag modification of the process. Let us start with the definition of the renewal process.

Definition 3.10. Let T = R+. We call N = (Nt)t∈T a renewal process if

Nt = max{n ∈ N : Sn ≤ t},

where S0 = 0, Sn = X1 + . . . + Xn for n ≥ 1, and (Xn)n∈N is a sequence of non-negative independent and identically distributed random variables.

We usually associate Sn with the time of the n-th arrival of some predefined event (e.g. arrival of a particle, malfunction of a machine, etc.), while Xn is associated with the in-between (inter)arrival time. With that interpretation, for any t ∈ T, the random variable Nt tells us how many times the event occurred in the time interval [0, t]. Note that we assume that we are dealing with a recurrent-event process where the inter-event times are independent and identically distributed.

The key property of a renewal process is the distribution of its inter-arrival times. To ensure the so-called Markov property, we would like our process to forget about the past; this is the property characterising exponential random variables. This class of renewal processes coincides with the class of Poisson processes, as explained in the next theorem.

Theorem 3.11. Let T = R+. The stochastic process N = (Nt)t∈T is a Poisson process with intensity λ > 0 if and only if it is a renewal process with inter-arrival times (Xn)n∈N having an exponential distribution with parameter λ > 0.
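The renewal representation of Theorem 3.11 gives a direct simulation recipe: accumulate exponential inter-arrival times and count arrivals in [0, t]. A minimal sketch (parameters and tolerance are illustrative assumptions), checked against E[Nt] = λt:

```python
import random

random.seed(3)

def poisson_count(lam, t):
    """N_t via the renewal representation: count exponential interarrivals in [0, t]."""
    s, n = 0.0, 0
    while True:
        s += random.expovariate(lam)  # exponential with rate lam
        if s > t:
            return n
        n += 1

lam, t, trials = 2.0, 3.0, 50_000
mean = sum(poisson_count(lam, t) for _ in range(trials)) / trials
assert abs(mean - lam * t) < 0.1  # E[N_t] = lambda * t = 6
```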

Proof. First, assume that N is a Poisson process with intensity λ > 0. We define a sequence ofrandom variables (Xn)n∈N be setting

X1 := inft ∈ T : Nt = 1Xn := inft ∈ T : Nt = n − inft ∈ T : Nt = n− 1.

Setting Sn := X1 + . . . + Xn, noting that Sn = inft ∈ T : Nt = n, and using the fact that N isalmost-surely non-negative, for any k ∈ N we get

{Nt = k} = {S1 ≤ t} ∩ . . . ∩ {Sk ≤ t} ∩ {Sk+1 > t} = {max{n ∈ N : Sn ≤ t} = k}.

Consequently, we get Nt = max{n ∈ N : Sn ≤ t}.

Now, we want to show that the sequence (Xn) is independent and exponentially distributed with parameter λ > 0. For any t1 ≥ 0 we get

P[X1 > t1] = P[Nt1 = 0] = e−λt1 ,


so X1 is exponentially distributed with parameter λ > 0. Now, for any t1, t2 ≥ 0 we get

P[X2 > t2 | X1 = t1] = P[S2 > t2 + t1 | X1 = t1] = P[Nt1+t2 = 1 | X1 = t1] = P[Nt2+t1 − Nt1 = 0 | X1 = t1].

Noting that Nt2+t1 − Nt1 is independent of σ(Ns, s ≤ t1) and that {X1 = t1} is σ(Ns, s ≤ t1)–measurable, we get

P[X2 > t2|X1 = t1] = P[Nt2+t1 −Nt1 = 0] = e−λt2 .

This shows that X2 is in fact independent of X1 and has exponential distribution with parameter λ. The same reasoning could be applied recursively to show that Xn is independent of Xn−1, Xn−2, . . ., and X1, which concludes this part of the proof.24

Now let us assume that N is a renewal process with exponential inter-arrival times (Xk)k∈N, i.e.

Nt = max{n ∈ N : Sn ≤ t},

where Sn := X1 + . . . + Xn. We use (Ft)t∈T to denote the natural filtration generated by the process (Nt)t∈T. For brevity, we only outline the key steps of the proof:

1) For any s ∈ T we set

τs := inf{t ≥ 0 : Ns+t − Ns > 0}.

Noting that for any n ∈ N we get {Ns = n} = {Sn ≤ s} ∩ {Sn + Xn+1 > s} ∈ Fs, for any t ∈ T we get

P[τs > t | Fs] = P[ ⋃n∈N {τs > t} ∩ {Ns = n} | Fs ]
= P[ ⋃n∈N {Ns+t = n} ∩ {Ns = n} | Fs ]
= Σn∈N 1{Ns=n} P[Sn + Xn+1 > t + s | Fs]
= Σn∈N 1{Ns=n} P[Xn+1 > (s − Sn) + t | Fs]
= Σn∈N 1{Ns=n} P[Xn+1 > t]
= Σn∈N 1{Ns=n} e−λt
= e−λt,

where in the second-to-last equality we have used the memoryless property and the fact that the event {Xn+1 > (s − Sn)}, conditioned on {Ns = n}, is Fs-measurable. Consequently, we get that for any s, t ∈ T with t > s, the random variable given by

min{Nt − Ns, 1}

is independent of Fs, and is a Bernoulli random variable with probability of success equal to 1 − e−λ(t−s).

24See Billingsley, Probability and Measure for the full formal proof.


2) Next, for any s, t ∈ T with t > s, and n ∈ N, we consider the equidistant partition of [s, t] into n + 1 time-points, say t^n_1, . . . , t^n_{n+1}, and for i = 1, 2, . . . , n we set

ξ^n_i := min{N_{t^n_{i+1}} − N_{t^n_i}, 1}.

3) We define the associated random variable Zn given by

Zn := Σ_{i=1}^n ξ^n_i,

and note that Zn ∼ B(n, 1 − e−λ(t−s)/n), i.e. Zn has the Binomial distribution with n trials and probability of success equal to 1 − e−λ(t−s)/n.

4) We have

lim_{n→∞} Zn = Nt − Ns

almost surely, as there cannot be infinitely many jumps on a finite interval, and the size of each jump is equal to 1 almost surely. Consequently, as each Zn is independent of Fs, we get that Nt − Ns is independent of Fs. In particular, this implies that the increments of N are independent.

5) For n → ∞ we get that n(1 − e−λ(t−s)/n) → λ(t − s), i.e. the number of trials multiplied by the success probability of the underlying Bernoulli variables goes to a constant. Thus, using the Poisson limit theorem we know that the sequence of random variables (Zn)n∈N converges weakly to the Poisson random variable with intensity parameter λ(t − s). This shows that the increments are stationary and have the Poisson distribution, which concludes the proof.

To calculate the distribution of Nt more directly, one could note that for any k ∈ N the random variable Sk has the Gamma density gk : R → R+ given by25

gk(r) = λ(λr)^{k−1}/(k−1)! · e−λr, if r > 0,
gk(r) = 0, otherwise.

Thus, for k ∈ N we get

P[Nt = k] = P[max{n ∈ N : Sn ≤ t} = k]
= P[Sk ≤ t < Sk + Xk+1]
= ∬_{r ≤ t ≤ r+z} gk(r) g1(z) dr dz
= ∫_0^t [ ∫_{t−r}^∞ g1(z) dz ] gk(r) dr
= ∫_0^t e−λ(t−r) gk(r) dr
= (λ^k/(k−1)!) e−λt ∫_0^t r^{k−1} dr
= ((λt)^k/k!) e−λt,

which shows that Nt ∼ P(λt).
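The identity Nt ∼ P(λt) is easy to sanity-check by simulation: generate the exponential arrival times Sn, count those below t, and compare the empirical frequencies with (λt)^k e−λt/k!. A rough sketch (parameter choices and names are ours):

```python
import math
import random

rng = random.Random(42)
lam, t, trials = 1.5, 2.0, 50000

counts = {}
for _ in range(trials):
    s, n = 0.0, 0
    while True:
        s += rng.expovariate(lam)   # next inter-arrival time X_{n+1}
        if s > t:
            break
        n += 1
    counts[n] = counts.get(n, 0) + 1

# compare empirical P[N_t = k] with the Poisson pmf (lam*t)^k e^{-lam*t}/k!
for k in range(5):
    emp = counts.get(k, 0) / trials
    pmf = (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)
    print(k, round(emp, 3), round(pmf, 3))
```

With 50000 trials the empirical frequencies agree with the Poisson pmf to within Monte Carlo noise.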

25The proof of this fact is left as an exercise. It can be proved by induction: one can directly calculate the convolution of the increments.


As we have mentioned in the proof, the jumps of the (cadlag) Poisson process are almost surely finite and equal to one.

Definition 3.12. Let X = (Xt)t∈T be a cadlag stochastic process. We use

∆Xt := Xt −Xt− ,

to denote the jump of X at t ∈ T.

Also, the intensity parameter λ > 0 determines the mean number of signals per unit of time. These two facts are presented in the next proposition; the proof is left as an exercise.

Proposition 3.13. Let N = (Nt)t∈T be a Poisson process. Then

1. P[∆Nt ∈ {0, 1}] = 1, for any t ∈ T;

2. lim_{t→+∞} Nt/t = λ.
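Property 2 is the long-run behaviour of the counting process: over a long horizon the number of events per unit of time settles at λ. A quick illustrative check (a simulation only, not a proof; parameters are ours):

```python
import random

rng = random.Random(7)
lam, horizon = 0.5, 20000.0

# simulate arrival times up to the horizon and count them
s, n = 0.0, 0
while True:
    s += rng.expovariate(lam)   # exponential inter-arrival time
    if s > horizon:
        break
    n += 1

rate = n / horizon
print(rate)  # should settle near lam = 0.5
```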

In fact, for any (cadlag) Levy process it is interesting to measure the size of jumps and their intensity. Let B0 denote the family of Borel sets of R whose closures do not contain 0.

Definition 3.14. Let X be a (cadlag) Levy process. Then,

1) the Poisson random measure (or jump measure) of X is the map ΠX : T × B0 × Ω → N given by

ΠX(t, U)(ω) = Σ_{s : 0 < s ≤ t} 1_{∆Xs ∈ U}(ω);

2) the Levy measure of X is the map νX : B0 → R given by

νX(U) = E[ΠX(1, U)].

The map ΠX measures how many jumps of size from U occurred on the time interval [0, t] for a given path, while νX measures the average frequency of U-jumps in the unit interval [0, 1]. It should be noted that ΠX is finite for all U ∈ B0 and t ∈ T, which follows from the cadlag property.

Proposition 3.15. Let X be a (cadlag) Levy process. Then, ΠX(t, U) is finite for any t ∈ T and U ∈ B0.

Proof. Fix U ∈ B0 and set T1 := inf{t > 0 : ∆Xt ∈ U}. By the right-continuity of X we know that

lim_{t→0+} Xt = X0 = 0.

Noting that we can find ε > 0 such that |u| > ε for any u ∈ U, and exploiting the right-continuity of X, we get that T1 > 0 (a.s.). Next, we inductively define

Tn+1 := inf{t > 0 : ∆Xt ∈ U and t > Tn}.

Using the above argument we know that Tn+1 > Tn. To conclude the proof, it is enough to show that Tn → ∞ (a.s.) as n → ∞. On the contrary, let us assume that there exists a subset of positive measure, say D, such that for any ω ∈ D we get Tn(ω) → Tω, where Tω < ∞. Consequently, for all ω ∈ D we know that

lim_{t→Tω−} Xt(ω)

does not exist, which contradicts the cadlag property.


Finally, we state the proposition which shows the connection between the Poisson process, the Poisson random measure, and the Levy measure.

Proposition 3.16. Let X be a (cadlag) Levy process. Then, for any fixed U ∈ B0 the process N^{X,U} = (N^{X,U}_t)t∈T given by

N^{X,U}_t(ω) := ΠX(t, U)(ω)

is a Poisson process with intensity parameter λ = νX(U).

It should be noted that if X is itself a Poisson process with intensity λ > 0, and 1 ∈ U, then we get ΠX(t, U) = Xt, and νX(U) = λ.

3.3 Markov processes

In the previous sections we considered two examples of processes: the Brownian motion and the Poisson process. Those processes were constructed in such a way that the most up-to-date value of the process was the only important information when we wanted to say something about the future dynamics. This property is known in the literature as the Markov property. For brevity, we only present the results in continuous time (the discrete-time case should have been discussed in another course).

Definition 3.17. Let T = R+ and let X = (Xt)t∈T be an F-adapted process. We say that X has the Markov property if for any A ∈ B(R) and s, t ∈ T such that t > s, we get

P[Xt ∈ A | Fs] = P[Xt ∈ A | Xs].

We say that X is a Markov process with respect to the filtration F if it satisfies the Markov property with respect to this filtration. We are now ready to formally define the Markov process.

Definition 3.18. Let T = R+ and let X = (Xt)t∈T be a stochastic process. We say that X is a Markov process if it satisfies the Markov property with respect to its natural filtration.

Typically, Markov processes are defined in terms of their transition functions. Before introducing the formal definition let us define and discuss some underlying concepts. We start with the definition of a transition kernel. While the definition could be easily extended to stochastic processes taking values in any measurable space, for brevity we only consider the univariate real case (R, B(R)).

Definition 3.19. Let t, s ∈ T be such that t > s. A probability kernel (from s to t) is a map

Ps,t : R × B(R) → R+ ∪ {+∞}

such that

• for each x ∈ R the map A 7→ Ps,t(x,A) is a measure;

• for each A ∈ B(R) the map x 7→ Ps,t(x,A) is measurable;

• for all x ∈ R we get Ps,t(x,R) = 1.


The idea of a probability kernel is pretty straightforward. For any two time-points s and t we want to assess the probability that the process which at time s was at state x is, at time t, in the set A. In other words, we want to check how the process which started at point x (at time s) behaves at time t. For completeness, for any t ∈ T we also define the probability kernel Pt,t, which is simply given by

Pt,t(x,A) := δx(A).

In order to recover information about the evolution of the Markov process we need a whole collection of probability kernels for all possible time-points. For brevity we only consider the homogeneous case, i.e. any kernel Ps,t (from a collection of probability kernels (Ps,t)s,t∈T: s<t) should depend on time only through the increment t − s. In other words, for any t > 0 and h ≥ 0 we want to have P0,t ≡ Ph,t+h. In that case it is enough to consider the incremental family of kernels (Pt)t∈T given e.g. by Pt := P0,t. From now on we restrict ourselves to the homogeneous case and use the simplified notation.26

Before we define the transition function we need to introduce basic notation related to probability kernels. Let C(R) denote the space of all bounded and measurable functions f : R → R. Given a transition kernel Pt and f ∈ C(R) we define the associated integral for any x ∈ R by setting

Ptf(x) := ∫R f(y) Pt(x, dy).27

Also, any two kernels, say Pt and Ps, could be combined. For any f ∈ C(R) and x ∈ R we write

PtPsf(x) := ∫R ∫R f(z) Ps(y, dz) Pt(x, dy),

and use PtPs to denote the associated probability kernel. Also, note that Ptf ∈ C(R). We are now finally ready to define the transition function.

Definition 3.20. A homogeneous transition function P on (R, B(R)) is a collection of probability kernels (Pt)t∈T such that for any t, s ∈ T we get

PtPs = Pt+s. (3.4)

The identity (3.4) is sometimes referred to as the Chapman-Kolmogorov equation. It is required in order to guarantee the time-consistency of the process evolution, which could be expressed in terms of the conditional expectation via the tower rule property. Before we explain this in detail let us show how the transition function P could be linked to a stochastic process.
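As a concrete example (a standard fact, stated here only for illustration), the heat kernel of the Brownian motion satisfies (3.4): composing two Gaussian kernels gives again a Gaussian kernel, with the variances adding up.

```latex
% Transition kernel of the standard Brownian motion:
%   P_t(x, A) = \int_A p_t(x, y)\,\mathrm{d}y,
% with the Gaussian density
\[
  p_t(x, y) = \frac{1}{\sqrt{2\pi t}}
    \exp\!\Big( -\frac{(y - x)^2}{2t} \Big).
\]
% Chapman--Kolmogorov reduces to the convolution identity
\[
  \int_{\mathbb{R}} p_t(x, y)\, p_s(y, z)\, \mathrm{d}y = p_{t+s}(x, z),
\]
% i.e. sums of independent Gaussians add their variances.
```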

Definition 3.21. Let X be an F-adapted stochastic process. We say that X is Markov with transition function P if for any f ∈ C(R) and t, s ∈ T, such that t > s, we get

E[f(Xt)|Fs] = Pt−sf(Xs). (3.5)

26In fact, this assumption is not very restrictive - given a time-inhomogeneous Markov process we can simply look at the space-time process ((t,Xt))t∈T, which will be homogeneous.

27Note that Pt(x, ·) is a measure, so this is well defined.


Note that the Chapman-Kolmogorov property is indeed required in Definition 3.21. Using the tower property, for time-points t, s, u ∈ T, we immediately get

Pt+sf(Xu) = E[f(Xt+s+u) | Fu]
= E[E[f(Xt+s+u) | Ft+u] | Fu]
= E[Psf(Xt+u) | Fu]
= PtPsf(Xu).

It can be shown that both the Brownian motion and the Poisson process are Markov with appropriate transition functions. In fact, given any Levy process X we can define its transition function P simply by setting

Pt(x,A) := P[Xt+s ∈ A | Xs = x],

for all t ∈ T, x ∈ R, and A ∈ B(R) (and where s > 0). Note that the RHS does not depend on s because X has stationary increments. Nevertheless, in practice it is usually not possible to write down transition functions explicitly. Instead, we define the processes in terms of solutions to differential equations, where the infinitesimal generator is provided instead of the transition function.

Next, it is important to know if we can construct a process with any pre-defined transition function. Such a construction is indeed possible and the proof follows from the Kolmogorov extension theorem. For simplicity, we assume that we know the initial state of the process (at time 0) and that it is deterministic. It could be shown that similar results are true if we assume that the initial distribution of the process is known.

Theorem 3.22. Let T = R+ and let Ω = RT be the space of functions ω : T → R. Let X = (Xt)t∈T denote the coordinate stochastic process, where Xt : Ω → R is given by

Xt(ω) = ω(t),

for any function ω ∈ Ω. Let F = (Ft)t∈T denote the natural filtration of X, and let F := σ(Xt, t ∈ T). Then, for any transition function P and starting point x ∈ R there exists a unique probability measure P on (Ω,F) under which X is Markov with transition function P and initial state x.

Also, it should be noted that if we have two processes with the same initial state and the same transition function, then they have the same finite-dimensional distributions. The proof of this fact is left as an exercise. Finally, we need to show that if a process is Markov with a transition function, then it is indeed a Markov process.

Proposition 3.23. Let X be Markov with transition function P . Then X is a Markov process.

Proof. The proof is straightforward. Let us fix t, s ∈ T such that t > s and A ∈ B(R). Let f : R → R be the indicator function of the set A, i.e. f(x) = 1x∈A. We know that f ∈ C(R). Moreover, using property (3.5) we get

P[Xt ∈ A | Fs] = E[1Xt∈A|Fs] = E[f(Xt)|Fs] = Pt−sf(Xs) = Pt−s(Xs, A).

Similarly,

P[Xt ∈ A | Xs] = E[E[1Xt∈A|Fs]|Xs] = E[Pt−s(Xs, A)|Xs] = Pt−s(Xs, A),

which concludes the proof.

In fact, for Borel spaces, the reverse implication is also true, i.e. for any Markov process there exists an associated transition probability kernel. However, the detailed explanation of this fact is out of the scope of this lecture.


4 Introduction to stochastic Ito calculus

The main goal of this section is to define the stochastic integral with respect to the Wiener process W, i.e. to give a meaning to the quantity

∫_0^t Xs dWs, t ≥ 0. (4.1)

An initial thought might be to use the standard (non-stochastic) framework and define (4.1) for every ω using the Lebesgue-Stieltjes integral. Unfortunately, we have shown in Proposition 3.7 that the Wiener process has infinite total variation on any time interval, so this cannot be done and a special construction must be used. While the construction in many places resembles the construction of the classical Lebesgue integral, there are a few key differences. The main one is connected to the fact that we cannot take an arbitrary point from a predefined partition and then take the limit (shrinking the size of all partitions). In fact, we will simply take the left end-point, which will lead to a good definition.28 Let us start with the definition of a simple process, which might be seen as the main building block of the Ito integral.
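The choice of evaluation point really matters: for ∫_0^t Ws dWs the left-endpoint sums converge to (Wt² − t)/2, while sums using the average of the two endpoint values converge to Wt²/2 (one discretisation of the Stratonovich integral mentioned in footnote 28). A quick illustrative sketch under an Euler discretisation (parameters are ours):

```python
import math
import random


def brownian_path(n, t, rng):
    """Wiener path sampled on a grid of n steps over [0, t]."""
    dt = t / n
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


rng = random.Random(1)
n, t = 200000, 1.0
w = brownian_path(n, t, rng)

# Riemann-type sums for int_0^t W dW with different evaluation points
left = sum(w[i] * (w[i + 1] - w[i]) for i in range(n))                    # Ito
mid = sum(0.5 * (w[i] + w[i + 1]) * (w[i + 1] - w[i]) for i in range(n))  # Stratonovich

print(round(left, 4), round((w[n] ** 2 - t) / 2, 4))  # close to each other
print(round(mid, 4), round(w[n] ** 2 / 2, 4))         # close to each other
```

The two sums differ by half the quadratic variation of the path, which is exactly the extra "−t/2" term of the Ito integral.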

In this section we assume that time is continuous, i.e. T = R+, and F corresponds to the filtration generated by a fixed Wiener process W = (Wt)t∈T. Also, for simplicity we focus on defining the integrals on time intervals [0, t). This could be easily extended to any sub-intervals.

4.1 Ito integral of an elementary process

The elementary processes are a simple extension of the elementary functions.

Definition 4.1. We call a stochastic process ξ = (ξt)t∈T an elementary (predictable) process if ξ could be represented as

ξt = Z 1{0}(t) + Σ_{k=0}^{n−1} Zk 1(tk,tk+1](t), t ≥ 0, (4.2)

where Z is F0–measurable, n ∈ N, the finite sequence of time points t0, t1, . . . , tn is strictly increasing with t0 = 0, and Zk is a square-integrable Ftk–measurable random variable for each k = 0, 1, . . . , n − 1. We denote by P the class of all elementary processes.

Now, we define the Ito integral of an elementary process. Given ξ ∈ P, we first define the infinite-time integral I(ξ) by setting

I(ξ) := ∫_0^∞ ξs dWs := Σ_{k=0}^{n−1} Zk(Wtk+1 − Wtk).

Next, we define the integral of ξ at time t by simply setting

It(ξ) := ∫_0^t ξs dWs := I(1[0,t]ξ).29

Note that the integral (It(ξ))t∈T is a stochastic process itself. One could easily check that these definitions are well stated in the sense that they do not depend on the representation (4.2), and that the

28One could take e.g. the midpoint as well. Then, we would obtain a different integral, referred to as the Stratonovich integral, the most common alternative to the Ito integral, used frequently in physics.

29where (1[0,t]ξ)s = 1[0,t](s)ξs for any s ∈ T.


linear combination of elementary processes is an elementary process.30 Let us now collect the basic properties of the integral that we have just defined. They are outlined in Theorem 4.2.

Theorem 4.2. Let ξ, ν ∈ P. Then

1. For any α, β ∈ R and t ≥ 0 we get It(αξ + βν) = αIt(ξ) + βIt(ν).

2. For any t > s ≥ 0 we get It(ξ) = Is(ξ) + Is,t(ξ), where Is,t(ξ) = I(1[s,t]ξ).

3. The process (It(ξ))t∈T has continuous paths.

4. The process (It(ξ))t∈T is a square-integrable martingale (wrt. F). Moreover, for any t ≥ 0 we get

E[It(ξ)] = 0,

E[It(ξ)²] = E[∫_0^t ξs² ds]. (4.3)

Equality (4.3) is often referred to as the Ito isometry.

Proof. The proof of 1) and 2) is left as an exercise. Let us now show 3). Noting that a (finite) sum of continuous stochastic processes is a continuous stochastic process, it is enough to note that

Z1(Wt2 − Wt1) (4.4)

is continuous, where Z1 is an Ft1-measurable random variable and t2 > t1 ≥ 0. This follows directly from the path continuity of the Wiener process. Indeed, setting

ξt = Z1 1(t1,t2](t)

we simply get

It(ξ) = 0, for t ∈ [0, t1],
It(ξ) = Z1(Wt − Wt1), for t ∈ (t1, t2),
It(ξ) = Z1(Wt2 − Wt1), for t ∈ [t2, ∞). (4.5)

Now, we prove 4). Again, because the sum of square-integrable martingales is a square-integrable martingale, we can consider the simplified process as in (4.4) and the integral process (It(ξ))t∈T given in (4.5). First, we note that the stochastic process (It(ξ))t∈T is adapted, since W is adapted and Z1 is Ft1–measurable. Second, it is integrable, since for any t ∈ T the process could be seen as a product of two square-integrable random variables. Thus, we need to show the martingale property

E[It(ξ) | Fs] = Is(ξ),

for all t > s ≥ 0. Let us consider three cases: 0 ≤ s ≤ t1, t1 < s ≤ t2, and s > t2.

30The proof is left as an exercise. Simply note that any two partitions admit a common refinement.


1) Case 1: 0 ≤ s ≤ t1. In this case we know that Is(ξ) = 0. If t ≤ t1 the proof is straightforward as both sides are equal to 0. For t ∈ (t1, t2), using the fact that Fs ⊆ Ft1, we get

E[It(ξ) | Fs] = E[Z1(Wt − Wt1) | Fs]
= E[E[Z1(Wt − Wt1) | Ft1] | Fs]
= E[Z1 E[Wt − Wt1 | Ft1] | Fs]
= E[Z1 · 0 | Fs]
= 0.

The proof for t ∈ [t2, ∞) is similar.

2) Case 2: t1 < s ≤ t2. Let t ∈ (s, t2]. Because Z1 is Fs–measurable we get

E[It(ξ) | Fs] = E[Z1(Wt − Wt1) | Fs]
= Z1 E[Wt − Wt1 | Fs]
= Z1 E[(Wt − Ws) + (Ws − Wt1) | Fs]
= Z1(Ws − Wt1)
= Is(ξ).

The proof for t ∈ [t2, ∞) is similar.

3) Case 3: s > t2. We get It(ξ) = Is(ξ).

This concludes the proof of the martingale property. Let us now show that for any t ≥ 0 we get

E[It(ξ)] = 0 (4.6)

Again, we can utilise representation (4.4). For t ≤ t1 we get It(ξ) = 0, which implies (4.6). On the other hand, if t > t1 then we get

E[It(ξ)] = E[E[It(ξ) | Ft1 ]] = E[Z1E[Wt∧t2 −Wt1 | Ft1 ]] = 0.

Thus, to conclude the proof we need to show

E[It(ξ)²] = E[∫_0^t ξs² ds]. (4.7)

Here, we cannot simply use (4.6), as the transform x 7→ x² is not linear. Let us assume that ξ ∈ P and that it could be expressed as in (4.2), i.e. we have

ξt = Z 1{0}(t) + Σ_{k=0}^{n−1} Zk 1(tk,tk+1](t).

Let t ≥ 0 and let M := {0, 1, . . . , n − 1}. We consider three cases: t = 0, t ≥ tn, and t ∈ (0, tn).

1) Case 1: t = 0. We get It(ξ) = 0 so (4.7) is satisfied.


2) Case 2: t ≥ tn. We have

It(ξ) = I(ξ) = Σ_{k=0}^{n−1} Zk(Wtk+1 − Wtk).

Consequently,

E[It(ξ)²] = Σ_{j,k∈M} E[ZjZk(Wtj+1 − Wtj)(Wtk+1 − Wtk)]
= Σ_{k∈M} E[Zk²(Wtk+1 − Wtk)²] + 2 Σ_{j>k, j,k∈M} E[ZjZk(Wtj+1 − Wtj)(Wtk+1 − Wtk)].

Now, for any fixed j, k ∈ M such that j > k we get

E[ZjZk(Wtj+1 − Wtj)(Wtk+1 − Wtk)]
= E[E[ZjZk(Wtj+1 − Wtj)(Wtk+1 − Wtk) | Ftj]]
= E[ZjZk(Wtk+1 − Wtk) E[Wtj+1 − Wtj | Ftj]]
= E[ZjZk(Wtk+1 − Wtk) E[Wtj+1 − Wtj]]
= 0. (4.8)

On the other hand, using Proposition 3.8, for any t > s we get

E[(Wt − Ws)² | Fs] = E[Wt² − 2WtWs + Ws² | Fs]
= E[Wt² − Ws² | Fs]
= E[Wt² − t | Fs] + t − Ws²
= Ws² − s + t − Ws²
= t − s,

so that

Σ_{k∈M} E[Zk²(Wtk+1 − Wtk)²]
= Σ_{k∈M} E[E[Zk²(Wtk+1 − Wtk)² | Ftk]]
= Σ_{k∈M} E[Zk² E[(Wtk+1 − Wtk)² | Ftk]]
= Σ_{k∈M} E[Zk²(tk+1 − tk)]. (4.9)

Finally, noting that for any t ≥ tn we get

∫_0^t ξs² ds = ∫_0^t [ Σ_{k=0}^{n−1} Zk² 1(tk,tk+1](s) ] ds
= Σ_{k=0}^{n−1} Zk² ∫_0^t 1(tk,tk+1](s) ds
= Σ_{k=0}^{n−1} Zk²(tk+1 − tk). (4.10)

Combining (4.8), (4.9), and (4.10) we get

E[It(ξ)²] = E[∫_0^t ξs² ds],

which concludes this part of the proof.


3) Case 3: t ∈ (0, tn). We know that there exists j ∈ {0, . . . , n − 1} such that t ∈ (tj, tj+1]. We get

E[It(ξ)²] = E[(Itj(ξ) + Itj,t(ξ))²] = E[Itj(ξ)²] + 2 E[Itj(ξ) Itj,t(ξ)] + E[Itj,t(ξ)²]. (4.11)

Following all the steps of the previous case we get

E[Itj(ξ)²] = E[∫_0^{tj} ξs² ds]. (4.12)

Next, using the fact that (It(ξ))t∈T is a martingale we get

E[Itj(ξ) Itj,t(ξ)] = E[E[Itj(ξ) Itj,t(ξ) | Ftj]]
= E[Itj(ξ) E[Itj,t(ξ) | Ftj]]
= E[Itj(ξ) E[It(ξ) − Itj(ξ) | Ftj]]
= E[Itj(ξ) (Itj(ξ) − Itj(ξ))]
= 0. (4.13)

Finally, using the fact that t ∈ (tj, tj+1], as in the previous case, we get

E[Itj,t(ξ)²] = E[Zj²(Wt − Wtj)²]
= E[E[Zj²(Wt − Wtj)² | Ftj]]
= E[Zj² E[(Wt − Wtj)² | Ftj]]
= E[Zj²(t − tj)]
= E[∫_{tj}^t ξs² ds]. (4.14)

Combining (4.12), (4.13), and (4.14) with (4.11) we conclude the proof.
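Both identities in (4.3) can also be checked numerically for a concrete elementary process, e.g. taking Zk = Wtk on a small grid (our illustrative choice; parameter values are ours):

```python
import math
import random

rng = random.Random(3)
grid = [0.0, 0.3, 0.7, 1.0]   # t_0 < t_1 < ... < t_n, here n = 3
trials = 40000

sum_I = sum_I2 = sum_q = 0.0
for _ in range(trials):
    # sample the Wiener process on the grid
    w = [0.0]
    for a, b in zip(grid, grid[1:]):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(b - a)))
    # elementary integrand: Z_k = W_{t_k} on (t_k, t_{k+1}]
    I = sum(w[k] * (w[k + 1] - w[k]) for k in range(len(grid) - 1))
    q = sum(w[k] ** 2 * (b - a) for k, (a, b) in enumerate(zip(grid, grid[1:])))
    sum_I += I
    sum_I2 += I * I
    sum_q += q

mean_I = sum_I / trials   # ~ E[I_t(xi)] = 0
lhs = sum_I2 / trials     # ~ E[I_t(xi)^2]
rhs = sum_q / trials      # ~ E[int_0^t xi_s^2 ds] = 0.3*0.4 + 0.7*0.3 = 0.33
print(round(mean_I, 2), round(lhs, 2), round(rhs, 2))
```

The two Monte Carlo estimates of the left- and right-hand sides of (4.3) agree, and the sample mean of the integral is near zero, as the martingale property predicts.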

4.2 Extending Ito integral to L2 space.

In this section we outline how to extend the Ito integral from the space P to the space of square-integrable processes, i.e. processes of class L2. For brevity, we skip the proofs.

Definition 4.3. We say that the stochastic process X is of class L2 (i.e. X ∈ L2) if X is adapted, measurable, and for any t > 0 we get

E[∫_0^t Xs² ds] < ∞.

On L2, for any t ∈ T we define the seminorm

‖X‖L2,t := (E[∫_0^t Xs² ds])^{1/2}.


The associated metric is given by

ρL2(X, Y) := Σ_{n=1}^∞ 2^{−n}(‖X − Y‖L2,n ∧ 1), X, Y ∈ L2.

It can be shown that ρL2 is indeed a metric and that (L2, ρL2) is a Polish space. Nevertheless, the proof of this is out of the scope of this lecture. Now, we need to show that we can approximate any process from L2 using processes from P. This is the statement of Proposition 4.4.

Proposition 4.4. The class P of elementary processes is dense in L2, i.e. for any X ∈ L2 there exists a sequence (ξn)∞n=1 of processes such that ξn ∈ P (for n ∈ N) and ξn → X (in ρL2).

The proof of Proposition 4.4 is typically split into several parts. First, we show that P is dense in the subspace of bounded processes with non-zero values on some finite time interval. Then, we extend the reasoning to continuous processes, progressively measurable processes, and finally to measurable and adapted processes (in the space L2). The last extension is typically done using the Dellacherie-Meyer theorem, which states that any adapted and measurable process has a progressively measurable modification.
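For a continuous process the approximating sequence can simply freeze the path at the left endpoint of each subinterval of a grid. A small illustration with X = W, estimating the squared seminorm ‖X − ξn‖²L2,1 by Monte Carlo (the discretisation, parameters, and names are ours; for this choice the exact value is 1/(2n)):

```python
import math
import random


def seminorm_sq(n_pieces, n_paths, rng, steps=512):
    """Monte Carlo estimate of E[ int_0^1 (W_s - xi^n_s)^2 ds ], where
    xi^n freezes W at the left endpoint of each of n_pieces subintervals."""
    block = steps // n_pieces
    dt = 1.0 / steps
    total = 0.0
    for _ in range(n_paths):
        w, frozen, acc = 0.0, 0.0, 0.0
        for i in range(steps):
            if i % block == 0:
                frozen = w            # xi^n_s := W_{t_k} on (t_k, t_{k+1}]
            acc += (w - frozen) ** 2 * dt
            w += rng.gauss(0.0, math.sqrt(dt))
        total += acc
    return total / n_paths


rng = random.Random(5)
ests = {n: seminorm_sq(n, 1000, rng) for n in (4, 8, 16)}
for n, est in ests.items():
    print(n, round(est, 3))  # theoretical value 1/(2n)
```

The estimates halve each time the grid is refined, illustrating the convergence ξn → X that Proposition 4.4 guarantees in general.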

Next, it can be shown that for any X ∈ L2 the Ito integrals of the approximating sequence from P converge in the mean-square sense.

Proposition 4.5. Let X ∈ L2 and let (ξn)∞n=1 be a sequence of processes from P such that ξn → X (in ρL2). Then,

1) for any t ∈ T, the sequence of random variables (It(ξn))n∈N converges in L2(Ω,Ft,P);

2) if (νn)∞n=1 is a sequence of processes from P such that νn → X (in ρL2), then for any t ∈ T we get

lim_{n→∞} It(ξn) = lim_{n→∞} It(νn), in L2(Ω,Ft,P).

The proof of Proposition 4.5 is a simple consequence of the Ito isometry combined with the fact that (L2, ρL2) is a Polish space. This proposition allows us to properly define the Ito integral for any process from L2.

Definition 4.6. Let X ∈ L2. For any t ∈ T we define the Ito integral of X at time t as the L2–limit of the sequence (It(ξn))∞n=1, where (ξn)∞n=1 is such that ξn ∈ P and

ξn → X (in ρL2).

We denote the limit by It(X) or ∫_0^t Xs dWs.

In the next theorem we outline the basic properties of the Ito integral. Note that most of the properties are a direct extension of the properties given in Theorem 4.2.

Theorem 4.7. Let X,Y ∈ L2. Then

1. For any α, β ∈ R and t ≥ 0 we get It(αX + βY) = αIt(X) + βIt(Y).

2. The process (It(X))t∈T has continuous paths starting from 0 (i.e. I0(X) = 0).


3. The process (It(X))t∈T is a square-integrable martingale (wrt. F). Moreover, for any ∞ > t ≥ k ≥ 0 we get

E[∫_0^t Xs dWs] = 0,

E[(∫_0^t Xs dWs)²] = E[∫_0^t Xs² ds],

E[(∫_0^t Xs dWs)(∫_0^t Ys dWs)] = E[∫_0^t Xs Ys ds],

E[(∫_0^t Xs dWs − ∫_0^k Xs dWs)² | Fk] = E[∫_k^t Xs² ds | Fk].

As before, we call the second equality in 3. the Ito isometry.
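As a sanity check of the moment identities in point 3 for a non-elementary integrand, take X = W on [0, 1]: then E[I1(W)] = 0 and, by the Ito isometry, E[I1(W)²] = ∫_0^1 E[Ws²] ds = 1/2. A quick Monte Carlo sketch (discretisation and parameters are ours):

```python
import math
import random

rng = random.Random(11)
steps, paths = 256, 8000
dt = 1.0 / steps

ints = []
for _ in range(paths):
    w, ito = 0.0, 0.0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito += w * dw    # left-endpoint (Ito) increment W_s dW_s
        w += dw
    ints.append(ito)

mean = sum(ints) / paths
second = sum(x * x for x in ints) / paths
print(round(mean, 2), round(second, 2))  # ~ 0 and ~ 0.5
```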

Final remark: The typical next step in Ito calculus is to extend the Ito integral from the space L2 to the space P2, i.e. the space of measurable and adapted processes such that

P[∫_0^t Xs² ds < ∞, for any t ∈ T] = 1.

In the introduction to stochastic analysis we continue the integral construction. We also define the class of Ito processes and show why the above definitions are useful, especially in mathematical finance (e.g. in the Black-Scholes framework).


A Appendix

A.1 Exemplary list of questions (and scope) for the exam

1) Conditional expectation – definition, basic properties, examples.

2) Conditional expectation as the best least-square predictor.

3) Definition of the stochastic processes and basic (continuity) properties.

4) Stochastic process modifications, finite-dimensional distributions, and stationarity.

5) Properties of continuous modifications, Holder continuity, and Kolmogorov’s continuity theorems.

6) Cylinder sets, consistency property, and Kolmogorov's extension theorem.

7) Filtrations – basic definitions, properties, and examples.

8) Process measurability and progressive measurability. Examples.

9) Stopping times in continuous time – definition and basic properties.

10) First entry time, and different types of events constructed using stopping times.

11) Operations on stopping times (sup, +, etc.), and different characterisations of stopping times.

12) Stochastic process random time sample, stopped process, and stopping sigma algebra.

13) Martingales – definition, and basic properties.

14) Regular martingales, and concave transforms of martingales

15) Stopped processes and martingale property.

16) Optional sampling theorem in discrete-time.

17) Doob-Meyer decomposition (in discrete and continuous time).

18) Uniform integrability, (D) and (DL) property, (DL) property for cadlag martingales.

19) Levy processes – definition and examples.

20) Brownian Motion – definition, and basic properties

21) Existence of the Brownian motion.

22) Transformations of Brownian motion.

23) Continuity properties for Brownian motion (Holder continuity, Path differentiability, Total variation).

24) Martingale characterisation of Brownian motion.

25) Poisson Process - definition and basic properties.

26) Two definition of the Poisson process and their equivalence.

27) Process jumps, Poisson random measure, and Levy measure.

28) Markov processes – definition, and examples.

29) Probability kernels, transition functions, and construction of Markov processes.

30) Elementary (predictable) processes and their Ito integral.

31) Basic properties of Ito integral.

32) Ito integral on L2 space.


A.2 Notation

(Ω,F,P)  The probability space. If not stated otherwise, we assume that it will always be the underlying space.
L0(Ω,F,P)  The set of all (a.s. identified) random variables on (Ω,F,P).
L1(Ω,F,P)  The set of all (a.s. identified) integrable RVs on (Ω,F,P).
Lp(Ω,F,P)  The set of all (a.s. identified) p-integrable RVs on (Ω,F,P).
σ(X)  The σ-algebra generated by X.
1A(·)  Characteristic (indicator) function of the set A.
δx(·)  Dirac measure at point x.
Ac  Complement of the set A, i.e. Ω \ A.
BT  The set of all cylinder sets on T.
T  The set of all finite subsets of T.
⟨X,X⟩  The quadratic variation of X.
Σs  The set of all stopping times with values up to time s, i.e. {τ : τ ≤ s}.
CX  Auto-covariance function of the process X, i.e. CX(t, s) = Cov(Xt, Xs).
B0  The family of Borel sets U ⊂ R whose closure does not contain 0.
C(E)  The space of all bounded and measurable functions f : E → R.
Pt  Transition (time-homogeneous) kernel at time t.
P  The family of all elementary (predictable) processes.
L2  The family of all adapted, measurable stochastic processes such that for any t > 0 we get E[∫_0^t Xs² ds] < ∞.
I(X)  ∫_0^∞ Xs dWs, i.e. the stochastic Ito integral of the process X on [0,∞).
It(X)  ∫_0^t Xs dWs.
Is,t(X)  ∫_s^t Xu dWu.
RV  Random variable.
SP  Stochastic process.
BM  Brownian Motion (Wiener process).
PP  Poisson process.