Transcript
Page 1: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

ACTL2002/ACTL5101 Probability and Statistics

© Katja Ignatieva

School of Risk and Actuarial Studies, Australian School of Business

University of New South Wales

[email protected]

Week 5 Video Lecture Notes

Probability: Week 1 Week 2 Week 3 Week 4

Estimation: Week 5 Week 6 Review

Hypothesis testing: Week 7 Week 8 Week 9

Linear regression: Week 10 Week 11 Week 12

Video lectures: Week 1 VL Week 2 VL Week 3 VL Week 4 VL

Page 2: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: one degree of freedom

Special sampling distributions & sample mean and variance

Special Sampling Distributions: chi-squared distribution
- Chi-squared distribution: one degree of freedom
- Chi-squared distribution: n degrees of freedom

Special Sampling Distributions: student-t distribution
- Jacobian technique and William Gosset (t-distribution)

Special Sampling Distributions: Snecdor’s F distribution
- Jacobian technique and Snecdor’s F distribution

Distribution of sample mean/variance
- Background
- Fundamental sampling distributions

Page 3: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: one degree of freedom

Chi-squared distribution: one degree of freedom

Sampling from a normal distribution; independent and identically distributed (i.i.d.) random values.

Suppose Z ∼ N(0, 1); then

Y = Z² ∼ χ²(1)

has a chi-squared distribution with one degree of freedom.

Distribution characteristics:

f_Y(y) = 1/√(2πy) · exp(−y/2);

F_Y(y) = F_Z(√y) − F_Z(−√y) = 2 · F_Z(√y) − 1;

E[Y] = E[Z²] = 1;

Var(Y) = E[Y²] − (E[Y])² = E[Z⁴] − (E[Z²])² = 3 − 1 = 2.

Prove: see next slides.

802/827

Page 4: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: one degree of freedom

Prove that Z² has a chi-squared distribution with one degree of freedom (using the p.d.f.), with Z a standard normal r.v.

Proof: using the CDF technique (seen last week). Consider:

F_Y(y) = Pr(Z² ≤ y) = Pr(−√y ≤ Z ≤ √y)
       = ∫_{−√y}^{√y} 1/√(2π) · e^{−z²/2} dz
       = 2 · ∫_0^{√y} 1/√(2π) · e^{−z²/2} dz
     (*) = 2 · ∫_0^{y} 1/√(2π) · (1/2) · w^{−1/2} · e^{−w/2} dw.

* using the change of variable z = √w, so that dz = (1/2) · w^{−1/2} dw.

Proof continues on next slide.

803/827

Page 5: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: one degree of freedom

Proof (cont.).

F_Y(y) = ∫_0^y 1/√(2π) · w^{−1/2} · e^{−w/2} dw.

Differentiating to get the p.d.f. gives:

∂F_Y(y)/∂y = f_Y(y) (**) = 1/√(2π) · y^{−1/2} · e^{−y/2} = 1/(2^{1/2} · Γ(1/2)) · y^{(1−2)/2} · e^{−y/2},

** using differentiation of the integral: ∂(∫_a^b f(x) dx)/∂b = f(b),

which is the density of a χ²(1) distributed random variable (see F&T pages 164–169 for tabulated values of the c.d.f.).

Note: Y_i ∼ χ²(1), which is the Gamma(1/2, 1/2) distribution ⇒ M_Y(t) = ((1/2)/(1/2 − t))^{1/2} = (1 − 2·t)^{−1/2}.

804/827

Page 6: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: n degrees of freedom

Special sampling distributions & sample mean and variance

Special Sampling Distributions: chi-squared distributionChi-squared distribution: one degree of freedomChi-squared distribution: n degrees of freedom

Special Sampling Distributions: student-t distributionJacobian technique and William Gosset (t-distribution)

Special Sampling Distributions: Snecdor’s F distributionJacobian technique and Snecdor’s F distribution

Distribution of sample mean/varianceBackgroundFundamental sampling distributions

Page 7: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: n degrees of freedom

Chi-squared distribution: n degrees of freedom

Let Z_i, i = 1, ..., n be i.i.d. N(0, 1); then X = Σ_{i=1}^n Z_i² has a chi-squared distribution with n d.f.: X ∼ χ²(n).

Distribution properties:

f_X(x) = 1/(2^{n/2} · Γ(n/2)) · x^{(n−2)/2} · e^{−x/2}, if x > 0,

and zero otherwise. Parameter constraints: n = 1, 2, ...

E[X] = E[Σ_{i=1}^n Y_i] (*) = n · E[Y_i] = n

Var(X) = Var(Σ_{i=1}^n Y_i) (*) = n · Var(Y_i) = 2 · n

M_X(t) = M_{Σ_{i=1}^n Y_i}(t) = (M_{Y_i}(t))^n (*) = (1 − 2·t)^{−n/2}, t < 1/2.

Prove: * use i = 1, ..., n i.i.d. Y_i ∼ χ²(1).

805/827
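This representation of χ²(n) as a sum of squared standard normals can be checked numerically. A minimal simulation sketch (not in the original slides, which rely on the tabulated F&T values), assuming NumPy and SciPy are available:

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n, n_sims = 5, 100_000

# simulate X = sum of n squared standard normals
z = rng.standard_normal(size=(n_sims, n))
x = (z ** 2).sum(axis=1)

# simulated moments versus the chi-squared(n) values E[X] = n, Var(X) = 2n
print(x.mean(), x.var())

# Kolmogorov-Smirnov comparison against the chi-squared(n) c.d.f.
print(stats.kstest(x, stats.chi2(df=n).cdf))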

Page 8: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: n degrees of freedom

Alternative proof: Recall the p.d.f. of Y:

f_Y(y) = 1/(√2 · Γ(1/2)) · y^{−1/2} · e^{−y/2}.

Recall X ∼ Gamma(n, λ), with p.d.f.:

f_X(x) = λ^n · x^{n−1} · e^{−λ·x} / Γ(n), if x ≥ 0, and zero otherwise.

For independent Y₁, Y₂, ..., Yₙ ∼ χ²(1),

Y₁ + Y₂ + ... + Yₙ ∼ Gamma(n/2, 1/2), which is the χ²(n) distribution,

since the sum of i.i.d. Gamma random variables Gamma(α_i, λ) is also a Gamma random variable, namely Gamma(Σ_{i=1}^n α_i, λ) (see lecture week 2).

See F&T pages 164–169 for tabulated values of the c.d.f.

806/827

Page 9: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: chi-squared distribution

Chi-squared distribution: n degrees of freedom

Chi-squared probability/cumulative density function

[Figure: χ² p.d.f. (left) and c.d.f. (right); f_X(x) and F_X(x) plotted against x for n = 1, 2, 3, 5, 10, 25.]

807/827
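The figure can be reproduced with a few lines of plotting code; a sketch assuming NumPy, SciPy and Matplotlib (none of which appear in the original slides):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0.01, 30, 500)
fig, (ax_pdf, ax_cdf) = plt.subplots(1, 2, figsize=(10, 4))
for n in [1, 2, 3, 5, 10, 25]:
    ax_pdf.plot(x, stats.chi2(df=n).pdf(x), label=f"n={n}")   # chi-squared p.d.f.
    ax_cdf.plot(x, stats.chi2(df=n).cdf(x), label=f"n={n}")   # chi-squared c.d.f.
ax_pdf.set(title="chi-squared p.d.f.", xlabel="x", ylabel="f_X(x)", ylim=(0, 0.5))
ax_cdf.set(title="chi-squared c.d.f.", xlabel="x", ylabel="F_X(x)", ylim=(0, 1))
ax_pdf.legend()
ax_cdf.legend()
plt.tight_layout()
plt.show()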

Page 10: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Special sampling distributions & sample mean and variance

Special Sampling Distributions: chi-squared distributionChi-squared distribution: one degree of freedomChi-squared distribution: n degrees of freedom

Special Sampling Distributions: student-t distributionJacobian technique and William Gosset (t-distribution)

Special Sampling Distributions: Snecdor’s F distributionJacobian technique and Snecdor’s F distribution

Distribution of sample mean/varianceBackgroundFundamental sampling distributions

Page 11: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Jacobian technique and William Gosset

As an illustration of the Jacobian transformation technique, consider deriving the t-distribution (see exercises 4.111, 4.112 and 7.30 in W+(7ed)).

The t-distribution was discovered by William Gosset in 1908. Gosset was a statistician employed by the Guinness brewing company.

Suppose Z ∼ N(0, 1) and V ∼ χ²(r) = Σ_{i=1}^r Z_i², where the Z_i, i = 1, ..., r are i.i.d. and Z, V are independent.

Then, the random variable:

T = Z / √(V/r)

has a t-distribution with r degrees of freedom.

808/827
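The defining ratio can also be checked by simulation; a minimal sketch assuming NumPy and SciPy:

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
r, n_sims = 5, 100_000

z = rng.standard_normal(n_sims)          # Z ~ N(0, 1)
v = rng.chisquare(df=r, size=n_sims)     # V ~ chi-squared(r), independent of Z
t = z / np.sqrt(v / r)

# compare with the Student-t(r) distribution; Var(T) = r/(r-2) for r > 2
print(stats.kstest(t, stats.t(df=r).cdf))
print(t.var(), r / (r - 2))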

Page 12: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Jacobian transformation technique procedure

Recall the procedure to find the joint density of U₁ = g₁(X₁, X₂) and U₂ = g₂(X₁, X₂):

1. Find u₁ = g₁(x₁, x₂) and u₂ = g₂(x₁, x₂).
2. Determine h(u₁, u₂) = g⁻¹(u₁, u₂).
3. Find the absolute value of the Jacobian of the transformation.
4. Multiply that with the joint density of X₁, X₂ evaluated in h₁(u₁, u₂), h₂(u₁, u₂).

809/827
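Step 3 for the t-distribution transformation worked out on the following slides can be checked symbolically; a sketch assuming SymPy is available (variable names mirror the slides):

import sympy as sp

s, t, r = sp.symbols("s t r", positive=True)

# inverse transformation of (v, z) -> (s, t): v = h1(s, t) = s, z = h2(s, t) = t*sqrt(s/r)
h1 = s
h2 = t * sp.sqrt(s / r)

# Jacobian matrix of (h1, h2) with respect to (s, t) and its determinant
J = sp.Matrix([[sp.diff(h1, s), sp.diff(h1, t)],
               [sp.diff(h2, s), sp.diff(h2, t)]])
print(sp.simplify(J.det()))   # sqrt(s)/sqrt(r), i.e. sqrt(s/r), as derived on the slides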

Page 13: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Proof:

Note the p.d.f.'s:

f_V(v) = v^{r/2−1} / (2^{r/2} · Γ(r/2)) · e^{−v/2}, if 0 ≤ v < ∞;
f_Z(z) = 1/√(2π) · e^{−z²/2}, if −∞ < z < ∞.

1. Define the variables:

s = g₁(z, v) = v and t = g₂(z, v) = z / √(v/r).

2. So that this forms a one-to-one transformation with inverse:

v = h₁(s, t) = s and z = h₂(s, t) = t · √(s/r).

810/827

Page 14: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

3. The Jacobian is:

J(s, t) = det [ ∂h₁(s, t)/∂s  ∂h₁(s, t)/∂t ; ∂h₂(s, t)/∂s  ∂h₂(s, t)/∂t ]
        = det [ 1  0 ; (1/2) · t · s^{−1/2}/√r  √(s/r) ]
        = √(s/r).

Note that the support is:

0 < v < ∞ and −∞ < z < ∞;
0 < s < ∞ and −∞ < t < ∞.

811/827

Page 15: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Since Z and V are independent, their joint density can be written as:

f_{Z,V}(z, v) = f_Z(z) · f_V(v) = 1/√(2π) · e^{−z²/2} · 1/(Γ(r/2) · 2^{r/2}) · v^{r/2−1} · e^{−v/2}.

4. Using the Jacobian transformation formula above, the joint density of (S, T) is given by:

f_{S,T}(s, t) = √(s/r) · 1/√(2π) · e^{−(t·√(s/r))²/2} · 1/(Γ(r/2) · 2^{r/2}) · s^{r/2−1} · e^{−s/2}
             = 1/(√(2π) · Γ(r/2) · 2^{r/2}) · s^{(r+1)/2−1} · 1/√r · exp(−(s/2) · (1 + t²/r)).

5. Therefore, the marginal density of T is given by:

f_T(t) = ∫_0^∞ f_{S,T}(s, t) ds

(continues on next slide).

812/827

Page 16: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Making the transformation:

w = (s/2) · (1 + t²/r) ⇔ s = 2w / (1 + t²/r),

so that:

dw = (1/2) · (1 + t²/r) ds ⇔ ds = (2 / (1 + t²/r)) dw.

So that we have:

f_T(t) = ∫_0^∞ 1/(√(2π) · Γ(r/2) · 2^{r/2}) · s^{(r+1)/2−1} · 1/√r · exp(−(s/2) · (1 + t²/r)) ds
       = ∫_0^∞ 1/(√(2π) · Γ(r/2) · 2^{r/2}) · (2w/(1 + t²/r))^{(r+1)/2−1} · 1/√r · exp(−w) · (2/(1 + t²/r)) dw.

813/827

Page 17: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Simplifying:

f_T(t) = ∫_0^∞ 1/(√(2πr) · Γ(r/2) · 2^{r/2}) · (2/(1 + t²/r))^{(r+1)/2−1} · (2/(1 + t²/r)) · w^{(r+1)/2−1} · e^{−w} dw
       = 1/(√(πr) · Γ(r/2) · 2^{(r+1)/2}) · (2/(1 + t²/r))^{(r+1)/2} · ∫_0^∞ w^{(r+1)/2−1} · e^{−w} dw
     (*) = 1/√(πr) · Γ((r+1)/2)/Γ(r/2) · (1/(1 + t²/r))^{(r+1)/2}, for −∞ < t < ∞,

* using the Gamma function: ∫_0^∞ x^{α−1} · exp(−x) dx = Γ(α).

This is the standard form of the t-distribution (see F&T page 163 for tabulated values of the c.d.f.).

814/827

Page 18: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: student-t distribution

Jacobian technique and William Gosset (t-distribution)

Student-t probability/cumulative density function

[Figure: Student-t p.d.f. (left) and c.d.f. (right); f_X(x) and F_X(x) plotted against x for r = 1, 2, 3, 5, 10, 25.]

815/827

Page 19: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: Snecdor’s F distribution

Jacobian technique and Snecdor’s F distribution

Special sampling distributions & sample mean and variance

Special Sampling Distributions: chi-squared distributionChi-squared distribution: one degree of freedomChi-squared distribution: n degrees of freedom

Special Sampling Distributions: student-t distributionJacobian technique and William Gosset (t-distribution)

Special Sampling Distributions: Snecdor’s F distributionJacobian technique and Snecdor’s F distribution

Distribution of sample mean/varianceBackgroundFundamental sampling distributions

Page 20: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: Snecdor’s F distribution

Jacobian technique and Snecdor’s F distribution

Snecdor’s F distribution

Suppose U ∼ χ²(n₁) and V ∼ χ²(n₂) are two independent chi-squared distributed random variables.

Then, the random variable:

F = (U/n₁) / (V/n₂)

has an F distribution with n₁ and n₂ degrees of freedom.

See F&T pages 170–174 for tabulated values of the c.d.f.

Prove: Use the Jacobian technique.

1. Define variables: f = (u/n₁)/(v/n₂), g = v;
2. Inverse transformation: v = g and u = f · g · n₁/n₂.

816/827
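The ratio construction can be checked by simulation as well; a minimal sketch assuming NumPy and SciPy:

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
n1, n2, n_sims = 4, 8, 100_000

u = rng.chisquare(df=n1, size=n_sims)   # U ~ chi-squared(n1)
v = rng.chisquare(df=n2, size=n_sims)   # V ~ chi-squared(n2), independent of U
f = (u / n1) / (v / n2)

# compare with the F(n1, n2) distribution; E[F] = n2/(n2-2) for n2 > 2
print(stats.kstest(f, stats.f(dfn=n1, dfd=n2).cdf))
print(f.mean(), n2 / (n2 - 2))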

Page 21: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: Snecdor’s F distribution

Jacobian technique and Snecdor’s F distribution

Snecdor’s F distribution

3. Jacobian of the transformation:

J(f, g) = det [ ∂v/∂f  ∂v/∂g ; ∂u/∂f  ∂u/∂g ] = det [ 0  1 ; g · n₁/n₂  f · n₁/n₂ ] = −g · n₁/n₂.

Absolute value of the Jacobian: |J(f, g)| = g · n₁/n₂.

4. Multiply the absolute value of the Jacobian by the joint density (joint density, using independence: f_{U,V}(u, v) = f_U(u) · f_V(v)):

f_{U,V}(u, v) = f_U(u) · f_V(v)
             = u^{(n₁−2)/2} / (2^{n₁/2} · Γ(n₁/2)) · exp(−u/2) · v^{(n₂−2)/2} / (2^{n₂/2} · Γ(n₂/2)) · exp(−v/2).

Continues on the next slide.

817/827

Page 22: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: Snecdor’s F distribution

Jacobian technique and Snecdor’s F distribution

Snecdor’s F distribution

(Cont.) Joint density of F and G (using u = f · g · n₁/n₂ and v = g):

f_{F,G}(f, g) = (n₁ · g / n₂) · (f · n₁ · g / n₂)^{(n₁−2)/2} / (2^{n₁/2} · Γ(n₁/2)) · exp(−f · n₁ · g / (2 · n₂)) · g^{(n₂−2)/2} / (2^{n₂/2} · Γ(n₂/2)) · exp(−g/2).

5. The marginal of F is obtained by integrating over all possible values of G:

f_F(f) = ∫_0^∞ f_{F,G}(f, g) dg
       = func(f) · ∫_0^∞ g^{(n₁+n₂−2)/2} · exp(−g · (1/2 + f · n₁/(2 · n₂))) dg,

where func(f) = n₁ · (f · n₁)^{(n₁−2)/2} / (2^{(n₁+n₂)/2} · n₂^{n₁/2} · Γ(n₁/2) · Γ(n₂/2)).

818/827

Page 23: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: Snecdor’s F distribution

Jacobian technique and Snecdor’s F distribution

Continues:

f_F(f) (*) = func(f) · (2 · n₂/(n₂ + f · n₁))^{(n₁+n₂−2)/2+1} · ∫_0^∞ x^{(n₁+n₂−2)/2} · exp(−x) dx
       (**) = func(f) · (2 · n₂/(n₂ + f · n₁))^{(n₁+n₂)/2} · Γ((n₁ + n₂)/2)
      (***) = n₁^{n₁/2} · n₂^{n₂/2} · Γ((n₁ + n₂)/2) / (Γ(n₁/2) · Γ(n₂/2)) · f^{n₁/2−1} / (n₂ + f · n₁)^{(n₁+n₂)/2}.

* using the transformation x = g · (1/2 + f·n₁/(2·n₂)), thus g = (2·n₂/(n₂ + f·n₁)) · x and dx = ((n₂ + f·n₁)/(2·n₂)) dg, thus dg = ((n₂ + f·n₁)/(2·n₂))^{−1} dx.

** using the Gamma function: Γ(α) = ∫_0^∞ x^{α−1} · exp(−x) dx.

*** using func(f) = n₁ · (f · n₁)^{(n₁−2)/2} / (2^{(n₁+n₂)/2} · n₂^{n₁/2} · Γ(n₂/2) · Γ(n₁/2)).

819/827

Page 24: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Special Sampling Distributions: Snecdor’s F distribution

Jacobian technique and Snecdor’s F distribution

Snecdor’s F probability density function

[Figure: Snecdor’s F p.d.f. (left) and c.d.f. (right); f_X(x) and F_X(x) plotted against x for (n₁, n₂) = (2, 2), (2, 4), (2, 6), (2, 10), (10, 2), (10, 10).]

820/827

Page 25: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Background

Special sampling distributions & sample mean and variance

Special Sampling Distributions: chi-squared distributionChi-squared distribution: one degree of freedomChi-squared distribution: n degrees of freedom

Special Sampling Distributions: student-t distributionJacobian technique and William Gosset (t-distribution)

Special Sampling Distributions: Snecdor’s F distributionJacobian technique and Snecdor’s F distribution

Distribution of sample mean/varianceBackgroundFundamental sampling distributions

Page 26: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Background

Properties of the sample mean and sample variance

Suppose you select values randomly from a population.

Assume they are selected with replacement or, alternatively, from a large population.

These outcomes (x₁, ..., xₙ) are random variables, all with the same distribution and independent.

Suppose X₁, X₂, ..., Xₙ are n independent r.v. with identical distribution. Define the sample mean by:

X̄ = (1/n) · Σ_{k=1}^n X_k,

and recall the sample variance:

S² = 1/(n − 1) · Σ_{k=1}^n (X_k − X̄)².

821/827

Page 27: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Fundamental sampling distributions

Special sampling distributions & sample mean and variance

Special Sampling Distributions: chi-squared distributionChi-squared distribution: one degree of freedomChi-squared distribution: n degrees of freedom

Special Sampling Distributions: student-t distributionJacobian technique and William Gosset (t-distribution)

Special Sampling Distributions: Snecdor’s F distributionJacobian technique and Snecdor’s F distribution

Distribution of sample mean/varianceBackgroundFundamental sampling distributions

Page 28: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Fundamental sampling distributions

Fundamental sampling distributions

Sampling distributions for i.i.d. normal samples, i.e., X_i ∼ N(µ, σ²).

In the next slides we will prove the following important properties:

- X̄ ∼ N(µ, σ²/n): sample mean using the known population variance.
- T = (X̄ − µ)/(S/√n) ∼ t_{n−1}: sample mean using the sample variance.
- (n − 1) · S²/σ² ∼ χ²(n − 1): sample variance using the population variance.
- X̄ and S² are independent (proof given in Exercise 13.93 of W+(7ed)).

A simulation check of the first three properties is sketched below.

822/827
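A minimal simulation sketch of these properties, assuming NumPy and SciPy (not part of the original slides):

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
mu, sigma, n, n_sims = 2.0, 3.0, 10, 50_000

x = rng.normal(mu, sigma, size=(n_sims, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)               # sample variance with divisor n - 1

# X-bar ~ N(mu, sigma^2/n)
print(stats.kstest(xbar, stats.norm(mu, sigma / np.sqrt(n)).cdf))

# T = (X-bar - mu)/(S/sqrt(n)) ~ t_{n-1}
t_stat = (xbar - mu) / np.sqrt(s2 / n)
print(stats.kstest(t_stat, stats.t(df=n - 1).cdf))

# (n - 1) S^2 / sigma^2 ~ chi-squared(n - 1)
q = (n - 1) * s2 / sigma ** 2
print(stats.kstest(q, stats.chi2(df=n - 1).cdf))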

Page 29: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Fundamental sampling distributions

Distribution of sample mean (known σ2)

Prove that the distribution of the sample mean given known variance is N(µ, σ²/n).

We have X₁, ..., Xₙ i.i.d. normally distributed variables.

We defined the sample mean by: X̄ = Σ_{i=1}^n X_i/n.

Use the MGF-technique to find the distribution of X̄:

M_{X̄}(t) = M_{Σ_{i=1}^n X_i/n}(t) = (M_{X_i}(t/n))^n = [exp(µ · t/n + (1/2) · σ² · (t/n)²)]^n = exp(µ · t + (1/2) · (σ²/n) · t²),

which is the m.g.f. of a normal distribution with mean µ and variance σ²/n.

823/827

Page 30: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Fundamental sampling distributions

Distribution of sample mean (unknown σ2)

The distribution of the sample mean given unknown (population) variance is given by:

(X̄ − µ)/(S/√n) ∼ t_{n−1}.

Proof:

(X̄ − µ)/(S/√n) = [(X̄ − µ)/(σ/√n)] / √(S²/σ²) (*) ∼ Z / √(χ²_{n−1}/(n − 1)) ∼ t_{n−1},

where Z ∼ N(0, 1) is a standard normal r.v.

* Using (n − 1) · S²/σ² ∼ χ²_{n−1} (prove: see next slides).

824/827

Page 31: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Fundamental sampling distributions

Distribution of sample variance

Prove that the distribution of the sample variance is given by:

(n − 1) · S²/σ² ∼ χ²_{n−1}.

First note that:

(n − 1) · S²/σ² = Σ_{i=1}^n (X_i − X̄)² / σ²,

and second note that:

Σ_{i=1}^n (X_i − µ)² / σ² = Σ_{i=1}^n ((X_i − µ)/σ)² = Σ_{i=1}^n Z_i² ∼ χ²_n,

where the Z_i ∼ N(0, 1), i = 1, ..., n, are i.i.d. standard normal r.v.

825/827

Page 32: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Fundamental sampling distributions

We have:

Σ_{i=1}^n (X_i − µ)²/σ²   [= Σ_{i=1}^n Z_i² ∼ χ²_n]
  = Σ_{i=1}^n ((X_i − X̄) + (X̄ − µ))² / σ²
  (*) = Σ_{i=1}^n (X_i − X̄)²/σ² + Σ_{i=1}^n (X̄ − µ)²/σ²
  = Σ_{i=1}^n (X_i − X̄)²/σ² + ((X̄ − µ)/(σ/√n))²   [the last term is Z² ∼ χ²₁].

Hence, the first term on the right is χ²_{n−1} (using the gamma sum property/MGF-technique).

* Using 2 · (X̄ − µ) · Σ_{i=1}^n (X_i − X̄) = 0, since Σ_{i=1}^n (X_i − X̄) = 0.

826/827

Page 33: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

Distribution of sample mean/variance

Fundamental sampling distributions

Fundamental sampling distributions

We have now proven the following important properties:

- X̄ ∼ N(µ, σ²/n)
- T = (X̄ − µ)/(S/√n) ∼ t_{n−1}
- (n − 1) · S²/σ² ∼ χ²(n − 1)

We will use this for:

- confidence intervals for the population mean and variance;
- testing the population mean and variance;
- parameter uncertainty of a linear regression model.

Notice, when applying the CLT, we no longer need the X_i to be normally distributed.

827/827

Page 34: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

ACTL2002/ACTL5101 Probability and Statistics

© Katja Ignatieva

School of Risk and Actuarial Studies, Australian School of Business

University of New South Wales

[email protected]

Week 5

Probability: Week 1 Week 2 Week 3 Week 4

Estimation: Week 6 Review

Hypothesis testing: Week 7 Week 8 Week 9

Linear regression: Week 10 Week 11 Week 12

Video lectures: Week 1 VL Week 2 VL Week 3 VL Week 4 VL Week 5 VL

Page 35: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

1001/1074

Page 36: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Last four weeks

Introduction to probability;

Moments: (non)-central moments, mean, variance (standarddeviation), skewness & kurtosis;

Special univariate distributions (discrete & continuous);

Joint distributions;

Dependence of multivariate distributions

Functions of random variables

1002/1074

Page 37: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

This week

Parameter estimation:

- Method of Moments;

- Maximum Likelihood method;

- Bayesian estimator.

Convergence (almost surely, probability, & distribution);

Application (important theorems):

- Law of large numbers;

- Central limit theorem.

1003/1074

Page 38: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Parameter estimation

Definition of an estimator

Limit theorems & parameter estimators

Parameter estimation
- Definition of an estimator

Estimator I: the method of moments
- The method of moments
- Example & exercise

Estimator II: maximum likelihood estimator
- Maximum likelihood estimation
- Example & exercise
- Sampling distribution and the bootstrap

Estimator III: Bayesian estimator
- Introduction
- Bayesian estimation
- Example & exercise

Convergence of series
- Chebyshev’s Inequality
- Convergence concepts
- Application of strong convergency: Law of Large Numbers
- Application of weak convergency: Central Limit Theorem
- Application of convergence in distribution: Normal Approximation to the Binomial
- Application of convergence in distribution: Normal Approximation to the Poisson

Summary
- Summary

Page 39: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Parameter estimation

Definition of an estimator

Definition of an Estimator

Problem of statistical estimation: a population has some characteristics that can be described by a r.v. X with density fX(·|θ).

The density has an unknown parameter (or set of parameters) θ.

We observe values of the random sample X₁, X₂, ..., Xₙ from the population fX(·|θ). Denote these observed sample values by x₁, x₂, ..., xₙ.

We then estimate the parameter (or some function of the parameter) based on this random sample.

1004/1074

Page 40: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Parameter estimation

Definition of an estimator

Definition of an Estimator

Any statistic, i.e., a function T(X₁, X₂, ..., Xₙ), that is a function of observable random variables and whose values are used to estimate τ(θ), where τ(·) is some function of the parameter θ, is called an estimator of τ(θ).

A value θ̂ of the statistic evaluated at the observed sample values x₁, x₂, ..., xₙ is called a (point) estimate.

For example:

T(X₁, X₂, ..., Xₙ) = X̄ₙ = (1/n) · Σ_{j=1}^n X_j is an estimator;

θ̂ = 0.23 is a point estimate.

Note θ can be a vector; the estimator is then a set of equations.

1005/1074

Page 41: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator I: the method of moments

The method of moments

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 42: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator I: the method of moments

The method of moments

The Method of Moments

Example of estimator: Method of Moments (MME).

Let X₁, X₂, ..., Xₙ be a random sample from the population with density fX(·|θ), which we will assume has k parameters, say θ = [θ₁, θ₂, ..., θ_k]ᵀ.

The method of moments procedure for estimating τ(θ) is:

1. Equate (the first) k sample moments to the corresponding k population moments;
2. Equate the k population moments to the parameters of the distribution;
3. Solve the resulting system of simultaneous equations.

The method of moments point estimates (θ̂) are the values of the estimator evaluated at the data set.

1006/1074

Page 43: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator I: the method of moments

The method of moments

The Method of Moments

Denote the sample moments by:

m₁ = (1/n) · Σ_{j=1}^n x_j,  m₂ = (1/n) · Σ_{j=1}^n x_j², ...,  m_k = (1/n) · Σ_{j=1}^n x_j^k,

and the population moments by:

µ₁(θ₁, θ₂, ..., θ_k) = E[X],  µ₂(θ₁, θ₂, ..., θ_k) = E[X²], ...,  µ_k(θ₁, θ₂, ..., θ_k) = E[X^k].

The system of equations to solve for (θ₁, θ₂, ..., θ_k) is given by:

m_j = µ_j(θ₁, θ₂, ..., θ_k), for j = 1, 2, ..., k.

Solving this provides us with the point estimate θ̂.

1007/1074

Page 44: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator I: the method of moments

Example & exercise

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 45: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator I: the method of moments

Example & exercise

Example: MME & Binomial distribution

Suppose X₁, X₂, ..., Xₙ is a random sample from a Bin(n, p) distribution, with known parameter n.

Question: Use the method of moments to find a point estimator of θ = p.

1. Solution: Equate the population moment to the sample moment:

E[X] = (1/n) · Σ_{j=1}^n x_j = x̄.

2. Equate the population moment to the parameter (use week 2):

E[X] = n · p.

3. Then the method of moments estimator is (i.e., solving it):

x̄ = n · p ⇒ p̂ = x̄/n.

1008/1074

Page 46: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator I: the method of moments

Example & exercise

Exercise: MME & Normal distribution

Suppose X₁, X₂, ..., Xₙ is a random sample from a N(µ, σ²) distribution.

Question: Use the method of moments to find point estimators of µ and σ².

1. Solution: Equate the population moments to the sample moments:

E[X] (population moment) = (1/n) · Σ_{j=1}^n x_j = x̄ (sample moment);

E[X²] (population moment) = (1/n) · Σ_{j=1}^n x_j² (sample moment).

1009/1074

Page 47: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator I: the method of moments

Example & exercise

Exercise: MME & Normal distribution

2. Equate the population moments to the parameters (use week 2):

E[X] = µ and E[X²] = Var(X) + E[X]² = σ² + µ².

3. The method of moments estimators are:

µ̂ = E[X] = x̄

σ̂² = E[X²] − (E[X])² = (1/n) · Σ_{j=1}^n x_j² − x̄² = (1/n) · Σ_{j=1}^n (x_j − x̄)² (*) = ((n − 1)/n) · s²,

* using s² = Σ_{j=1}^n (x_j − x̄)² / (n − 1), the sample variance.

Note: E[σ̂²] ≠ σ² (biased estimator), more on this next week.

1010/1074
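A minimal numerical sketch of these method of moments estimators, assuming NumPy (the data here are simulated for illustration):

import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.normal(loc=2.0, scale=3.0, size=1000)   # stand-in data with mu = 2, sigma = 3

m1 = x.mean()            # first sample moment
m2 = (x ** 2).mean()     # second sample moment

mu_mme = m1
sigma2_mme = m2 - m1 ** 2    # = (1/n) * sum((x_j - xbar)^2), the biased variance estimator

print(mu_mme, sigma2_mme)
print(np.var(x, ddof=0))     # identical to sigma2_mme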

Page 48: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 49: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

Maximum Likelihood function

Another example (mostly used) of an estimator is themaximum likelihood estimator.

First, we need to define the likelihood function.

If x₁, x₂, ..., xₙ are drawn from a population with a parameter θ (where θ could be a vector of parameters), then the likelihood function is given by:

L(θ; x₁, x₂, ..., xₙ) = f_{X₁,X₂,...,Xₙ}(x₁, x₂, ..., xₙ),

where f_{X₁,X₂,...,Xₙ}(x₁, x₂, ..., xₙ) is the joint probability density of the random variables X₁, X₂, ..., Xₙ.

1011/1074

Page 50: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

Maximum Likelihood Estimation

Let L(θ) = L(θ; x₁, x₂, ..., xₙ) be the likelihood function for X₁, X₂, ..., Xₙ.

The set of parameters θ̂ = θ̂(x₁, x₂, ..., xₙ) (note: a function of the observed values) that maximizes L(θ) is the maximum likelihood estimate of θ.

The random variable θ̂(X₁, X₂, ..., Xₙ) is called the maximum likelihood estimator.

When X₁, X₂, ..., Xₙ is a random sample from fX(x|θ), then the likelihood function is (using the i.i.d. property):

L(θ; x₁, x₂, ..., xₙ) = Π_{j=1}^n fX(x_j|θ),

which is just the product of the densities evaluated at each of the observations in the random sample.

1012/1074

Page 51: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

Maximum Likelihood Estimation

If the likelihood function contains k parameters so that:

L(θ₁, θ₂, ..., θ_k; x) = fX(x₁|θ) · fX(x₂|θ) · ... · fX(xₙ|θ),

then (under certain regularity conditions) the point where the likelihood is a maximum is a solution of the k equations:

∂L(θ₁, θ₂, ..., θ_k; x)/∂θ₁ = 0,  ∂L(θ; x)/∂θ₂ = 0, ...,  ∂L(θ; x)/∂θ_k = 0.

Normally, the solutions to this system of equations give the global maximum, but to make sure you should usually check the second derivative (or Hessian) conditions and the boundary conditions for a global maximum.

1013/1074

Page 52: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

Maximum Likelihood Estimation

Consider the case of estimating two variables, say θ₁ and θ₂.

Define the gradient vector:

D(L) = [ ∂L/∂θ₁ , ∂L/∂θ₂ ]ᵀ

and define the Hessian matrix:

H(L) = [ ∂²L/∂θ₁²  ∂²L/∂θ₁∂θ₂ ; ∂²L/∂θ₁∂θ₂  ∂²L/∂θ₂² ].

1014/1074

Page 53: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

Maximum Likelihood Estimation

From calculus we know that the maximizing choice of θ₁ and θ₂ should satisfy not only:

D(L) = 0,

but also that H should be negative definite, which means:

[h₁ h₂] · [ ∂²L/∂θ₁²  ∂²L/∂θ₁∂θ₂ ; ∂²L/∂θ₁∂θ₂  ∂²L/∂θ₂² ] · [h₁ ; h₂] < 0,

for all [h₁, h₂] ≠ 0.

1015/1074

Page 54: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

Log-Likelihood function

Generally, maximizing the log-likelihood function is easier.

Not surprisingly, we define the log-likelihood function as:

ℓ(θ₁, θ₂, ..., θ_k; x) = log(L(θ₁, θ₂, ..., θ_k; x)) = log( Π_{j=1}^n fX(x_j|θ) ) (*) = Σ_{j=1}^n log(fX(x_j|θ)).

* using log(a · b) = log(a) + log(b).

Maximizing the log-likelihood function gives the same parameter estimates as maximizing the likelihood function, because the log is a monotonically increasing function.

1016/1074

Page 55: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Maximum likelihood estimation

MLE procedure

The general procedure to find the ML estimator is:

1. Determine the likelihood function L(θ₁, θ₂, ..., θ_k; x);
2. Determine the log-likelihood function ℓ(θ₁, θ₂, ..., θ_k; x) = log(L(θ₁, θ₂, ..., θ_k; x));
3. Equate the derivatives of ℓ(θ₁, θ₂, ..., θ_k; x) w.r.t. θ₁, θ₂, ..., θ_k to zero (⇒ global/local minimum/maximum);
4. Check whether the second derivative is negative (maximum) and check the boundary conditions.

1017/1074

Page 56: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 57: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MLE and Poisson

1. Suppose X₁, X₂, ..., Xₙ are i.i.d. and Poisson(λ). The likelihood function is given by:

L(λ; x) = Π_{j=1}^n fX(x_j|θ) = (e^{−λ} λ^{x₁}/x₁!) · (e^{−λ} λ^{x₂}/x₂!) · ... · (e^{−λ} λ^{xₙ}/xₙ!)
        = e^{−λ·n} · (λ^{x₁}/x₁! · λ^{x₂}/x₂! · ... · λ^{xₙ}/xₙ!).

2. So that, taking the log of both sides, we get:

ℓ(λ; x) = −λ · n + log(λ) · Σ_{k=1}^n x_k − Σ_{k=1}^n log(x_k!).

Or, equivalently, using the log-likelihood function directly:

ℓ(λ; x) = Σ_{j=1}^n log(fX(x_j|θ)) = Σ_{j=1}^n ( −λ + x_j · log(λ) − log(x_j!) ).

1018/1074

Page 58: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MLE and Poisson

Now we need to maximize this log-likelihood function with respect to the parameter λ.

3. Taking the first order condition (FOC) with respect to λ we have:

∂ℓ(λ)/∂λ = 0 ⇒ −n + (1/λ) · Σ_{k=1}^n x_k = 0.

This gives the maximum likelihood estimate (MLE):

λ̂ = (1/n) · Σ_{k=1}^n x_k = x̄,

which equals the sample mean.

4. Check the second derivative condition to ensure a global maximum.

1019/1074
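For illustration, the same estimate can be obtained by numerically minimizing the negative log-likelihood, which is how MLEs without a closed form are computed in practice. A sketch assuming NumPy and SciPy (the closed-form answer is simply the sample mean):

import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(seed=6)
x = rng.poisson(lam=3.5, size=500)   # stand-in Poisson data

def neg_loglik(lam):
    # minus the Poisson log-likelihood: -( -n*lam + log(lam)*sum(x) - sum(log(x!)) )
    return -(-lam * x.size + np.log(lam) * x.sum() - special.gammaln(x + 1).sum())

res = optimize.minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded")
print(res.x, x.mean())   # numerical MLE versus the closed-form MLE (sample mean)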

Page 59: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Exercise: MLE and Normal

Suppose X₁, X₂, ..., Xₙ are i.i.d. and Normal(µ, σ²), where both parameters are unknown.

The p.d.f. is given by:

fX(x) = 1/(√(2π) · σ) · exp(−(1/2) · ((x − µ)/σ)²).

1. Thus the likelihood function is given by:

L(µ, σ; x) = Π_{k=1}^n 1/(√(2π) · σ) · exp(−(1/2) · ((x_k − µ)/σ)²).

Question: Find the MLE of µ and σ².

1020/1074

Page 60: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Exercise: MLE and Normal

2. Solution: Its log-likelihood function is:

ℓ(µ, σ; x) = Σ_{k=1}^n log( 1/(√(2π) · σ) · exp(−(1/2) · ((x_k − µ)/σ)²) )
          (*) = −n · log(σ) − (n/2) · log(2π) − 1/(2σ²) · Σ_{k=1}^n (x_k − µ)².

* using log(1/a) = log(a⁻¹) = −log(a), with a = σ, and log(1/√b) = log(b^{−0.5}) = −0.5 · log(b), with b = 2π.

Take the derivatives w.r.t. µ and σ and set them equal to zero.

1021/1074

Page 61: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

3./4. Then, we obtain:

∂ℓ(µ, σ; x)/∂µ = (1/σ²) · Σ_{k=1}^n (x_k − µ) = 0
  ⇒ Σ_{k=1}^n x_k − n · µ = 0
  ⇒ µ̂ = x̄;

∂ℓ(µ, σ; x)/∂σ = −n/σ + Σ_{k=1}^n (x_k − µ)² / σ³ = 0
  ⇒ n = Σ_{k=1}^n (x_k − µ)² / σ²
  ⇒ σ̂² = (1/n) · Σ_{k=1}^n (x_k − x̄)².

See §9.7 and §9.8 of W+(7ed) for further details.

1022/1074

Page 62: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MME & MLE and Gamma

You may not always obtain closed-form solutions for theparameter estimates with the maximum likelihood method.

An example of such problem when estimating the parametersusing MLE is the Gamma distribution.

As we will see in the next slides, using MLE yields oneparameter estimate in closed-form solution; not so for thesecond parameter.

To find the MLE one should then numerically solve for the estimates (!) from a non-linear equation. This can be done by employing an iterative numerical approximation (e.g. Newton-Raphson).

Application: Surrender mortgages, see Excel.

1023/1074

Page 63: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MME & MLE and Gamma

In such cases an initial value may be needed, so another means of estimating may be used first, such as the method of moments; its estimate is then used as the starting value.

Question: Consider X₁, X₂, ..., Xₙ i.i.d. Gamma(λ, α); find the MME of the Gamma distribution.

fX(x) = λ^α/Γ(α) · x^{α−1} · e^{−λ·x};   E[X^r] = Γ(α + r)/(λ^r · Γ(α));
MX(t) = E[e^{tX}] = (λ/(λ − t))^α;   Var(X) = α/λ².

1. Solution: Equate the sample moments to the population moments:

µ₁ = M_X^{(1)}(t)|_{t=0} = E[X] = x̄  and  µ₂ = M_X^{(2)}(t)|_{t=0} = E[X²] = Σ_{i=1}^n x_i²/n.

1024/1074

Page 64: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MME & MLE and Gamma

2. Equate the population moments to the parameters:

µ₁ = α/λ  and  µ₂ = α · (α + 1)/λ² = (α/λ) · ((α + 1)/λ) = µ₁ · (µ₁ + 1/λ).

3. Therefore, the method of moments estimates are given by:

µ₂/µ₁ = µ₁ + 1/λ ⇒ λ̂ = µ₁/(µ₂ − µ₁²)
α̂ = µ₁ · λ̂ ⇒ α̂ = µ₁²/(µ₂ − µ₁²).

So that the estimators are:

λ̂ = x̄/σ̂²  and  α̂ = x̄²/σ̂²,

using (step 1.) µ₁ = x̄ and µ₂ = Σ_{i=1}^n x_i²/n ⇒ µ₂ − µ₁² = Σ_{i=1}^n x_i²/n − x̄² = σ̂².

1025/1074

Page 65: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MME & MLE and Gamma

Question: Find the ML-estimates.

1. Solution: Now, X₁, X₂, ..., Xₙ are i.i.d. and Gamma(λ, α), so the likelihood function is:

L(λ, α; x) = Π_{i=1}^n 1/Γ(α) · λ^α · x_i^{α−1} · e^{−λ·x_i}.

2. The log-likelihood function is then:

ℓ(λ, α; x) = −n · log(Γ(α)) + n · α · log(λ) + (α − 1) · Σ_{i=1}^n log(x_i) − λ · Σ_{i=1}^n x_i.

1026/1074

Page 66: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MME & MLE and Gamma

3. Maximizing this:

∂ℓ(λ, α; x)/∂α = −n · (∂Γ(α)/∂α)/Γ(α) + n · log(λ) + Σ_{i=1}^n log(x_i) = 0

∂ℓ(λ, α; x)/∂λ = n · α/λ − Σ_{i=1}^n x_i = 0.

The second equation is easy to solve:

λ̂ = n · α̂ / Σ_{i=1}^n x_i,

but numerical (iterative) techniques are needed for solving the first equation.

1027/1074
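A minimal sketch of this numerical step, assuming NumPy and SciPy (substituting λ = α/x̄ into the first-order condition for α reduces it to one equation in α; the data here are simulated stand-ins):

import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(seed=7)
alpha_true, lam_true = 2.5, 0.8
x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=2000)   # Gamma(lambda, alpha) data

n, xbar = x.size, x.mean()
mean_log_x = np.log(x).mean()

# substituting lambda = alpha/xbar into d(loglik)/d(alpha) = 0 gives
#   log(alpha) - digamma(alpha) = log(xbar) - mean(log(x))
def score(alpha):
    return np.log(alpha) - special.digamma(alpha) - (np.log(xbar) - mean_log_x)

alpha_mme = xbar ** 2 / x.var()                        # MME as starting/bracketing value
alpha_mle = optimize.brentq(score, 1e-3, 10 * alpha_mme)
lam_mle = alpha_mle / xbar

print(alpha_mme, alpha_mle, lam_mle)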

Page 67: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MLE and Uniform

Suppose X₁, X₂, ..., Xₙ are i.i.d. U[0, θ], i.e., fX(x) = 1/θ for 0 ≤ x ≤ θ, and zero otherwise. Here the range of x depends on the parameter θ.

The likelihood function can be expressed as:

L(θ; x) = (1/θ)^n · Π_{k=1}^n I_{0≤x_k≤θ},

where I_{0≤x_k≤θ} is an indicator function taking the value 1 if x_k ∈ [0, θ] and zero otherwise.

Question: How to find the maximum of this likelihood function?

1028/1074

Page 68: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Example & exercise

Example: MLE and Uniform

[Figure: sketch of L(θ; x) as a function of θ, with the order statistics x₍₁₎, ..., x₍ₙ₎ marked; the likelihood is zero for θ < x₍ₙ₎, jumps to (1/θ)^n at θ = x₍ₙ₎ and decreases thereafter.]

Solution: Non-linearity in the indicator function ⇒ we cannot use calculus to maximize this function, i.e., setting the FOC equal to zero.

You can maximize it by looking at its properties:

- Π_{k=1}^n I_{0≤x_k≤θ} can only take the values 0 and 1; note: it takes the value 0 if θ < x₍ₙ₎ and 1 otherwise!
- (1/θ)^n is a decreasing function of θ;
- Hence, the function is maximized at the lowest value of θ for which Π_{k=1}^n I_{0≤x_k≤θ} = 1, i.e.:

θ̂ = max{x₁, x₂, ..., xₙ} = x₍ₙ₎.

1029/1074

Page 69: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Sampling distribution and the bootstrap

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 70: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Sampling distribution and the bootstrap

Sampling distribution and the bootstrap

We might not only be interested in the point estimate, but in the whole distribution of the MLE estimate (parameter uncertainty!);

However, we have no closed-form solution for the MLE estimates. How to obtain their sampling distribution? Use bootstrapping.

Step 1: Generate k samples from Gamma(λ̂, α̂).

Step 2: Estimate λ, α for each of these k samples using MLE.

Step 3: The empirical joint cumulative distribution function of these k parameter estimates is an approximation to the sampling distribution of the MLE estimates.

Quantification of risk: produce histograms of the estimates. A sketch of this parametric bootstrap is given below.

1030/1074
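A minimal sketch of the three bootstrap steps for the Gamma MLE, assuming NumPy and SciPy (the original slides use Excel; the data here are simulated stand-ins):

import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(seed=8)

def gamma_mle(x):
    # MLE for Gamma(lambda, alpha) as parametrized on the slides (rate lambda, shape alpha)
    xbar, mean_log_x = x.mean(), np.log(x).mean()
    score = lambda a: np.log(a) - special.digamma(a) - (np.log(xbar) - mean_log_x)
    alpha = optimize.brentq(score, 1e-3, 1e3)
    return alpha / xbar, alpha                      # (lambda_hat, alpha_hat)

# fitted parameters from the original data (here: a simulated stand-in sample)
data = rng.gamma(shape=2.0, scale=1 / 0.5, size=300)
lam_hat, alpha_hat = gamma_mle(data)

# parametric bootstrap: resample from the fitted Gamma and re-estimate, k times
k = 250
boot = np.array([gamma_mle(rng.gamma(shape=alpha_hat, scale=1 / lam_hat, size=data.size))
                 for _ in range(k)])

# boot[:, 0] and boot[:, 1] approximate the sampling distributions of lambda_hat and alpha_hat
print(boot.mean(axis=0), boot.std(axis=0))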

Page 71: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator II: maximum likelihood estimator

Sampling distribution and the bootstrap

Sampling distribution and bootstrap, k = 250, see Excel

[Figure: approximated sampling distributions of α̂ (left) and λ̂ (right): empirical c.d.f.s F̂_α(α) and F̂_λ(λ) of the k = 250 bootstrapped MLE estimates, shown for five repetitions of the bootstrap (1st–5th time).]

1031/1074

Page 72: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Introduction

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 73: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Introduction

Introduction

We have seen:

I. Method of moments estimator.
   Idea: the first k moments of the estimated special distribution and of the sample are the same.

II. Maximum likelihood estimator.
   Idea: the probability of the sample, given a class of distributions, is highest with this set of parameters.

Warning: Bayesian estimation is hard to understand, partly due to non-standard notation in Bayesian estimates.

Pure Bayesian interpretation: Suppose you have, a priori, a prior belief about a distribution;

Then you observe data ⇒ more information about the distribution.

1032/1074

Page 74: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Introduction

Example frequentist interpretation: Let Xi ∼ Ber(θ) indicate whether individual i lodges a claim with the insurer:

- Σ_{i=1}^T Xi = Y ∼ Bin(T, θ) is the number of car accidents;
- The probability of an insured having a car accident depends on adverse selection;
- A new insurer does not know the amount of adverse selection in its pool;
- Now, let θ ∈ Θ, with Θ ∼ Beta(a, b), the distribution of the risk among individuals (i.e., representing adverse selection);
- Use this for estimating the parameter ⇒ what is our prior for θ?

This is called empirical Bayes.

Similar idea: Bayesian updating, in case of time-varying parameters:

- Prior: last year's estimated claim distribution;
- Data: this year's claims;
- Posterior: revised estimated claim distribution.

1033/1074

Page 75: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 76: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Notation for Bayesian estimation

Under this approach, we assume that Θ is a random quantity with density π(θ), called the prior density. (This is the usual notation, rather than fΘ(θ).)

A sample X = x (= [x₁, x₂, ..., x_T]ᵀ) is taken from its population, and the prior density is updated using the information drawn from this sample by applying Bayes' rule. This updated prior is called the posterior density, which is the conditional density of Θ given the sample X = x, written π(θ|x) (= fΘ|X(θ|x)).

So we are using a conditional r.v., Θ|X, associated with the multivariate distribution of Θ and the X (look back at lecture notes for week 3).

Use, for example, the mean of the posterior, E[π(θ|x)], as the Bayesian estimator.

1034/1074

Page 77: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Bayesian estimation, theory

First, let us define a loss function L(θ̂; θ), where θ̂ = T is an estimator of τ(θ), with:

L(θ̂; θ) ≥ 0, for every θ̂;
L(θ̂; θ) = 0, when θ̂ = θ.

Interpretation of the loss function: for reasonable functions, a lower value of the loss function ⇒ a better estimator.

Examples of the loss function:

- Mean squared error: L(θ̂, θ) = (θ̂ − θ)² (mostly used);
- Absolute error: L(θ̂, θ) = |θ̂ − θ|.

1035/1074

Page 78: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Bayesian estimation, theory

Next, we define a risk function, the expected loss:

R_θ(θ̂) = E_θ[L(θ̂; θ)] = ∫ L(θ̂(x); θ) · f_{X|Θ}(x|θ) dx.

Note: the estimator is a random variable (e.g. T = θ̂ = X̄, τ(θ) = θ = µ) depending on the observations.

Interpretation of the risk function: the loss function is a random variable ⇒ taking the expectation returns a number, given θ.

Note: R_θ(θ̂) is a function of θ (we only know the prior density).

Define the Bayes risk under the prior π as:

B_π(θ̂) = E_θ[R_θ(θ̂)] = ∫_Θ R_θ(θ̂) · π(θ) dθ.

Goal: minimize the Bayes risk.

1036/1074

Page 79: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Bayesian estimation, theory

Now we can introduce the Bayesian estimator for a given loss function, θ̂_B, for which the following holds:

E_θ[R_{θ̂_B}(θ)] ≤ E_θ[R_{θ̂}(θ)], for any estimator θ̂.

Rewriting (* using reversing the order of integrals; ** using the law of iterated expectations (week 3)) we have:

θ̂_B = argmin_{θ̂} E_θ[ E[L(θ̂; θ) | θ] ] (*) = argmin_{θ̂} E[ E[L(θ̂; θ) | x] ] (**) = argmin_{θ̂} E[ L(θ̂; θ) ].

Interpretation: θ̂_B is the "best estimator" with respect to the loss function L(θ̂; θ).

1037/1074

Page 80: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Bayesian estimation, estimators

Rewriting the Bayes risk we have:

B_π(θ̂) = ∫_Θ R_θ(θ̂) · π(θ) dθ = ∫_Θ ∫ L(θ̂(x), θ) · f_{X|Θ}(x|θ) dx · π(θ) dθ
      (*) = ∫_Θ ∫ L(θ̂(x), θ) · f_X(x) · π(θ|x) dx dθ
     (**) = ∫ [ ∫_Θ L(θ̂(x), θ) · π(θ|x) dθ ] · f_X(x) dx      (the inner integral ≡ r(θ̂|x))
        = ∫ r(θ̂|x) · f_X(x) dx.

Implying: minimizing B_π(θ̂) is equivalent to minimizing r(θ̂|x) for all x.

* using f_{X|Θ}(x|θ) · π(θ) = f_X(x) · π(θ|x) (Bayes' rule / Law of Total Probability) and ** changing the order of integration.

1038/1074

Page 81: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Bayesian estimation, estimators

For the squared error loss function (used in *) we have:

min_{θ̂} B_π(θ̂)
⇔ minimizing r(θ̂|x) for all x ⇒ ∂r(θ̂|x)/∂θ̂ = 0
(*) ⇒ 2 · ∫_Θ (θ − θ̂(x)) · π(θ|x) dθ = 0
⇒ θ̂_B(x) = ∫_Θ θ · π(θ|x) dθ
⇒ θ̂_B(x) = E_{θ|x}[θ].

Interpretation: the Bayesian estimator under the squared error loss function is the expectation of the posterior density, i.e., θ̂_B = E[π(θ|x)]!

One can show that for the absolute error loss function: θ̂_B(x) = median(π(θ|x)).

1039/1074

Page 82: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Bayesian estimation, derivation

The posterior density (i.e., fΘ|X(θ|x)) is derived as:

π(θ|x) (*) = f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ) / ∫ f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ) dθ     (1)
       (**) = f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ) / f_X(x₁, x₂, ..., x_T).

* Using Bayes' formula: Pr(A_i|B) = Pr(B|A_i) · Pr(A_i) / Σ_{j=1}^n Pr(B|A_j) · Pr(A_j), with A₁, ..., A_n a complete partition of Ω.

** Using the LTP: Pr(A) = Σ_{i=1}^n Pr(A|B_i) · Pr(B_i) (where B₁, ..., B_n is a complete partition of Ω, week 1).

Hence, the denominator is the marginal density of X = [x₁, x₂, ..., x_T]ᵀ (= a constant given the observations!).

Note: the events {Θ = θ} play the role of the complete partition of the sample space, with prior weights π(θ).

1040/1074

Page 83: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Bayesian estimation

Bayesian estimation, derivation

Notation: ∝ means "proportional to", i.e., f(x) ∝ g(x) ⇒ f(x) = c · g(x).

We have that the posterior is given by:

π(θ|x) ∝ f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ).     (2)

Either use equation (1) (difficult/tedious integral!) or (2).

Equation (2) can be used to find the posterior density by:

I. Finding c such that c · ∫ f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ) dθ = 1.
II. Finding a (special) distribution that is proportional to f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ). (fastest way, if possible!)

Estimation procedure:

1. Find the posterior density using (1) (difficult/tedious integral!) or (2).
2. Compute the Bayesian estimator (using the posterior) under a given loss function (under the mean squared loss function: take the expectation of the posterior distribution).

1041/1074

Page 84: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

Limit theorems & parameter estimatorsParameter estimation

Definition of an estimator

Estimator I: the method of momentsThe method of momentsExample & exercise

Estimator II: maximum likelihood estimatorMaximum likelihood estimationExample & exerciseSampling distribution and the bootstrap

Estimator III: Bayesian estimatorIntroductionBayesian estimationExample & exercise

Convergence of seriesChebyshev’s InequalityConvergence conceptsApplication of strong convergency: Law of Large NumbersApplication of weak convergency: Central Limit TheoremApplication of convergence in distribution: Normal Approximation to the BinomialApplication of convergence in distribution: Normal Approximation to the Poisson

SummarySummary

Page 85: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

Example Bayesian estimation: Bernoulli-Beta

Let X₁, X₂, ..., X_T be i.i.d. Bernoulli(Θ), i.e., (Xi|Θ = θ) ∼ Bernoulli(θ).

Assume the prior density of Θ is Beta(a, b), so that:

π(θ) = Γ(a + b)/(Γ(a) · Γ(b)) · θ^{a−1} · (1 − θ)^{b−1}.

We know that the conditional density (density conditional on the true value of θ) of our data is given by:

f_{X|Θ}(x|θ) = θ^{x₁}(1 − θ)^{1−x₁} · θ^{x₂}(1 − θ)^{1−x₂} · ... · θ^{x_T}(1 − θ)^{1−x_T}
            = θ^{Σ_{j=1}^T x_j} · (1 − θ)^{T − Σ_{j=1}^T x_j} (*) = θ^s · (1 − θ)^{T−s}.

This is just the likelihood function.

* Simplifying notation, let s = Σ_{j=1}^T x_j.

1042/1074

Page 86: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

1. Easy method: The posterior density, the density of Θ given X = x, using (2) is proportional to:

π(θ|x) ∝ f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ) = Γ(a + b)/(Γ(a) · Γ(b)) · θ^{(a+s)−1} · (1 − θ)^{(b+T−s)−1}.     (3)

I. The posterior density is also solvable by finding c such that:

∫ c · Γ(a + b)/(Γ(a) · Γ(b)) · θ^{(a+s)−1} · (1 − θ)^{(b+T−s)−1} dθ = 1;

the posterior density is then c · f_{X|Θ}(x₁, x₂, ..., x_T|θ) · π(θ).

II. However, we observe that (3) is proportional to the p.d.f. of Ξ ∼ Beta(a + s, b + T − s).

1. Tedious method: To find the posterior density using (1) we first need to find the marginal density of the X (next slide).

1043/1074

Page 87: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

The marginal density of the X (* using the LTP) is given by:

f_X(x) (*) = ∫_0^1 f_{X|Θ}(x|θ) · π(θ) dθ
           = ∫_0^1 Γ(a + b)/(Γ(a) · Γ(b)) · θ^{(a+s)−1} · (1 − θ)^{(b+T−s)−1} dθ
        (**) = Γ(a + b)/(Γ(a) · Γ(b)) · Γ(a + s) · Γ(b + T − s)/Γ(a + b + T).

**: ∫_0^1 x^{α−1} · (1 − x)^{β−1} dx = B(α, β) = Γ(α) · Γ(β)/Γ(α + β).

The posterior density using (1) is then:

π(θ|x) = f_{X|Θ}(x|θ) · π(θ) / f_X(x)
       = [θ^s · (1 − θ)^{T−s} · Γ(a + b)/(Γ(a) · Γ(b)) · θ^{a−1} · (1 − θ)^{b−1}] / [Γ(a + b)/(Γ(a) · Γ(b)) · Γ(a + s) · Γ(b + T − s)/Γ(a + b + T)]
       = Γ(a + b + T)/(Γ(a + s) · Γ(b + T − s)) · θ^{(a+s)−1} · (1 − θ)^{(b+T−s)−1},

1044/1074

Page 88: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

Example Bayesian estimation: Bernoulli-Beta

2. The mean of the r.v. with the above posterior density is then:

θ_B = E[Θ|X = x] = E[Ξ], with Ξ ∼ Beta(a + s, b + T − s), i.e., θ_B = (a + s)/(a + b + T),

which gives the Bayesian estimator of Θ.

We note that we can write the Bayesian estimator as a weighted average of the prior mean (which is a/(a + b)) and the sample mean (which is s/T) as follows:

θ_B = E[Θ|X = x] = [T/(a + b + T)] · (s/T) + [(a + b)/(a + b + T)] · (a/(a + b)),

where T/(a + b + T) is the weight on the sample mean s/T, and (a + b)/(a + b + T) is the weight on the prior mean a/(a + b).

1045/1074
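As a quick illustration of this conjugate update, a minimal Python sketch follows; the prior parameters a, b and the data vector x are hypothetical values chosen for illustration only, not taken from the lecture.

import numpy as np

# Hypothetical prior and data (illustrative values only)
a, b = 2.0, 3.0                          # Beta(a, b) prior on theta
x = np.array([1, 0, 1, 1, 0, 1, 0, 1])   # i.i.d. Bernoulli(theta) observations
T, s = len(x), x.sum()

# Posterior is Beta(a + s, b + T - s); Bayes estimator under squared loss is its mean
post_a, post_b = a + s, b + T - s
theta_bayes = post_a / (post_a + post_b)

# Same estimator written as a weighted average of prior mean and sample mean
prior_mean, sample_mean = a / (a + b), s / T
w_sample = T / (a + b + T)
theta_weighted = w_sample * sample_mean + (1 - w_sample) * prior_mean

print(theta_bayes, theta_weighted)       # identical values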

Page 89: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

Exercise Normal-Normal

Let X_1, X_2, ..., X_T be i.i.d. Normal(Θ, σ_2²), i.e., (X_i | Θ = θ) ∼ Normal(θ, σ_2²).

Assume the prior density of Θ is Normal(m, σ_1²), so that:

π(θ) = 1/(√(2π) · σ_1) · exp(−(θ − m)²/(2 · σ_1²)).

Question: Find the Bayesian estimator for θ.

Solution: We know that the conditional density of our data is given by the likelihood function:

f_{X|Θ}(x|θ) = ∏_{j=1}^T 1/(√(2π) · σ_2) · exp(−(x_j − θ)²/(2 · σ_2²))
             = 1/(√(2π) · σ_2)^T · exp(−∑_{j=1}^T (x_j − θ)²/(2 · σ_2²))
1046/1074

Page 90: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

1. Posterior density:

π(θ|x) ∝ f_{X|Θ}(x|θ) · π(θ) ∝ exp(−∑_{j=1}^T (x_j − θ)²/(2 · σ_2²)) · exp(−(θ − m)²/(2 · σ_1²))

       = exp(−∑_{j=1}^T (x_j − θ)²/(2 · σ_2²) − (θ − m)²/(2 · σ_1²))

       = exp(−[∑_{j=1}^T (x_j² + θ² − 2·θ·x_j)]/(2 · σ_2²) − (θ² + m² − 2·θ·m)/(2 · σ_1²))

       = exp(−[σ_2² · (θ² + m² − 2·θ·m) + σ_1² · ∑_{j=1}^T (x_j² + θ² − 2·θ·x_j)]/(2 · σ_2² · σ_1²))

     (*)∝ exp(−[θ² · (σ_2² + T·σ_1²) − 2·θ·(m·σ_2² + T·x̄·σ_1²)]/(2 · σ_2² · σ_1²))

       = exp(−[θ² − 2·θ·(m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²)]/(2 · σ_2² · σ_1²/(σ_2² + T·σ_1²)))

    (**)∝ exp(−(θ − (m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²))²/(2 · σ_2² · σ_1²/(σ_2² + T·σ_1²)))
1047/1074

Page 91: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Estimator III: Bayesian estimator

Example & exercise

*: exp(−(σ_2² · m² + σ_1² · ∑_{j=1}^T x_j²)/(2 · σ_2² · σ_1²)) and
**: exp(((m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²))²/(2 · σ_2² · σ_1²/(σ_2² + T·σ_1²)))
are constants given x.

1. Thus θ|X is Normally distributed with mean (m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²) and variance σ_2²·σ_1²/(σ_2² + T·σ_1²). Note that we can rewrite these as:

mean: [(1/σ_1²)/(1/σ_1² + T/σ_2²)] · m + [(T/σ_2²)/(1/σ_1² + T/σ_2²)] · x̄, and variance: (1/σ_1² + T/σ_2²)^(−1).

2. The Bayesian estimator under both the mean squared loss function and the absolute error loss function is:

θ_B = [(1/σ_1²)/(1/σ_1² + T/σ_2²)] · m + [(T/σ_2²)/(1/σ_1² + T/σ_2²)] · x̄.

1048/1074
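A minimal Python sketch of this Normal-Normal update, assuming hypothetical values for m, σ_1, σ_2 and simulated data; it simply evaluates the precision-weighted posterior mean and variance derived above.

import numpy as np

# Hypothetical hyperparameters and data (illustrative only)
rng = np.random.default_rng(0)
m, sigma1 = 0.0, 2.0            # prior: Theta ~ Normal(m, sigma1^2)
sigma2 = 1.5                    # known sampling s.d.: X_i | theta ~ Normal(theta, sigma2^2)
x = rng.normal(1.0, sigma2, size=50)
T, xbar = len(x), x.mean()

# Precision-weighted posterior mean and variance (as derived on the slides above)
w_prior = (1 / sigma1**2) / (1 / sigma1**2 + T / sigma2**2)
w_sample = (T / sigma2**2) / (1 / sigma1**2 + T / sigma2**2)
post_mean = w_prior * m + w_sample * xbar       # Bayesian estimator of theta
post_var = 1 / (1 / sigma1**2 + T / sigma2**2)

print(post_mean, post_var)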

Page 92: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Chebyshev’s Inequality


Page 93: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Chebyshev’s Inequality

Chebyshev's Inequality

Chebyshev's inequality states that for any random variable X with mean μ and variance σ², the following probability inequality holds for all ε > 0:

Pr(|X − μ| > ε) ≤ σ²/ε².

Note that this applies to all distributions, hence also non-symmetric ones! This implies that both one-sided probabilities are bounded:

Pr(X − μ > ε) ≤ σ²/ε²  and  Pr(X − μ < −ε) ≤ σ²/ε².

Interesting example: set ε = k·σ, then:

Pr(|X − μ| > k·σ) ≤ 1/k².

This provides us with an upper bound on the probability that X deviates more than k standard deviations from its mean.
1049/1074

Page 94: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Chebyshev’s Inequality

Application: Chebyshev’s Inequality

The distribution of fire insurance claims does not follow any particular named distribution.

We do know that the mean claim size in the portfolio is $50 million with a standard deviation of $150 million.

Question: What is an upper bound for the probability that the claim size is larger than $500 million?

Solution: With μ = 50, σ = 150 and k = 3 (so that μ + k·σ = 500), we have:

Pr(X − μ > k·σ) ≤ Pr(|X − μ| > k·σ) = Pr(|X − 50| > 3 · 150) ≤ 1/k² = 1/9.

Thus, Pr(X > 500) ≤ 1/9.
1050/1074
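A small Python sketch of this bound; the Chebyshev bound itself needs only μ and σ, while the empirical check below assumes one hypothetical claim-size model (a lognormal matched to the stated mean and standard deviation), since the true distribution is unspecified.

import numpy as np

mu, sigma = 50.0, 150.0          # mean and s.d. of claim size (in $ million)
threshold = 500.0
k = (threshold - mu) / sigma     # = 3 standard deviations above the mean
bound = 1 / k**2                 # Chebyshev upper bound, about 0.111

# Empirical check under a hypothetical lognormal model matched to (mu, sigma)
s2 = np.log(1 + sigma**2 / mu**2)
m = np.log(mu) - s2 / 2
rng = np.random.default_rng(1)
claims = rng.lognormal(mean=m, sigma=np.sqrt(s2), size=1_000_000)
print(bound, (claims > threshold).mean())   # empirical tail probability sits below the bound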

Page 95: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Convergence concepts


Page 96: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Convergence concepts

Convergence concepts

Suppose X_1, X_2, ... form a sequence of r.v.'s. Example: X_i is the sample variance using the first i observations.

X_n is said to converge almost surely (a.s.) to the random variable X as n→∞ if and only if:

Pr(ω : X_n(ω) → X(ω), as n→∞) = 1,

and we write X_n →a.s. X, as n→∞.

Sometimes called strong convergence. It means that beyond some point in the sequence the difference |X_n(ω) − X(ω)| will always be less than any given positive ε, but that point is random (it depends on ω).

OPTIONAL: Also expressed as: Pr(|X_n(ω) − X(ω)| > ε, i.o.) = 0, where i.o. stands for infinitely often: Pr(A_n i.o.) = Pr(lim sup_n A_n).

Applications: Law of large numbers, Monte Carlo integration.
1051/1074

Page 97: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Convergence concepts

X_n converges in probability to the random variable X as n→∞ if and only if, for every ε > 0,

Pr(|X_n − X| > ε) → 0, as n→∞,

and we write X_n →p X, as n→∞.

Difference between convergence in probability and almost sure convergence: Pr(|X_n − X| > ε) goes to zero, rather than being eventually equal to zero, as n goes to infinity (hence →p is weaker than →a.s.).

1052/1074

Page 98: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Convergence concepts

X_n converges in distribution to the random variable X as n→∞ if and only if, for every x at which F_X is continuous,

F_{X_n}(x) → F_X(x), as n→∞,

and we write X_n →d X, as n→∞. Sometimes called weak convergence.

Convergence of m.g.f.'s implies weak convergence.

Applications (see later in the lecture):

- Central Limit Theorem;
- X_n ∼ Bin(n, p) and X ∼ N(n·p, n·p·(1−p));
- X_n ∼ Poi(λ_n), with λ_n → ∞ and X ∼ N(λ_n, λ_n).

1053/1074

Page 99: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of strong convergency: Law of Large Numbers


Page 100: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of strong convergency: Law of Large Numbers

The Law of Large Numbers

Suppose X_1, X_2, ..., X_n are independent random variables with common mean E[X_k] = μ and common variance Var(X_k) = σ², for k = 1, 2, ..., n. Define the sequence of sample means as:

X̄_n = (1/n) · ∑_{k=1}^n X_k.

Then, according to the law of large numbers, for any ε > 0, we have:

lim_{n→∞} Pr(|X̄_n − μ| > ε) ≤ lim_{n→∞} Var(X̄_n)/ε² = lim_{n→∞} σ²/(n · ε²) = 0.

Proof: special case X_k ∼ N(μ, σ²): X̄_n − μ ∼ N(0, σ²/n), thus when n→∞ we have lim_{n→∞} σ²/n = 0.

General case: when the second moment exists, use Chebyshev's inequality with Var(X̄_n) = σ²/n → 0.
1054/1074

Page 101: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of strong convergency: Law of Large Numbers

The law of large numbers (LLN) is sometimes written as:

Pr(|X̄_n − μ| > ε) → 0, as n→∞.

The result above is sometimes called the (weak) law of large numbers, and sometimes we write X̄_n →p μ, because this is the same concept as convergence in probability to a constant.

However, there is also what we call the (strong) law of large numbers, which simply states that the sample mean converges almost surely to μ:

X̄_n →a.s. μ, as n→∞.

Important result in Probability and Statistics!

Intuitively, the law of large numbers states that the sample mean X̄_n converges to the true value μ.

How accurate the estimate is will depend on: I) how large the sample size is; II) the variance σ².
1055/1074
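A brief simulation sketch of the LLN in Python; the exponential loss distribution and the value μ = 2 are illustrative assumptions only.

import numpy as np

# Running sample mean of i.i.d. draws settles down around the true mean
rng = np.random.default_rng(2)
mu = 2.0
x = rng.exponential(scale=mu, size=100_000)      # any distribution with finite variance works
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, running_mean[n - 1])                # drifts towards mu = 2 as n grows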

Page 102: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of strong convergency: Law of Large Numbers

Application of LLN: Monte Carlo Integration

Suppose we wish to calculate

I(g) = ∫_0^1 g(x) dx,

where elementary techniques of integration will not work.

Using the Monte Carlo method, we generate U[0, 1] variables, say X_1, X_2, ..., X_n, and compute:

I_n(g) = (1/n) · ∑_{k=1}^n g(X_k),

where I_n(g) denotes the approximation of I(g). We have: I_n(g) →a.s. I(g), as n→∞.

Proof: next slide.
1056/1074

Page 103: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of strong convergency: Law of Large Numbers

Proof: Using the law of large numbers, we have I_n(g) = (1/n) · ∑_{k=1}^n g(X_k) →a.s. E[g(X)], which is:

E[g(X)] = ∫_0^1 g(x) · 1 dx = ∫_0^1 g(x) dx = I(g).

Try this in Excel using the integral of the standard normal density. How good is your approximation for 100 (1,000, 10,000, 100,000 and 1,000,000) random numbers?

This method is called Monte Carlo integration.

1057/1074
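For readers who prefer Python to Excel, a minimal sketch of the same experiment: Monte Carlo integration of the standard normal density over [0, 1], compared against Φ(1) − Φ(0) ≈ 0.3413 from the normal tables.

import numpy as np

def phi(x):
    # standard normal density
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(3)
exact = 0.3413                      # Phi(1) - Phi(0), from the normal tables
for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    u = rng.uniform(0.0, 1.0, size=n)
    approx = phi(u).mean()          # I_n(g) = (1/n) * sum of g(X_k)
    print(n, approx, abs(approx - exact))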

Page 104: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of strong convergency: Law of Large Numbers

Application of LLN: Pooling of Risks in Insurance

Individuals may be faced with large and unpredictable losses. Insurance may help reduce the financial consequences of such losses by pooling individual risks. This is based on the LLN.

If X_1, X_2, ..., X_n are the amounts of losses faced by n different individuals, homogeneous enough to have a common distribution, and if these individuals pool together and each agrees to pay:

X̄_n = (1/n) · ∑_{k=1}^n X_k,

then the LLN tells us that the amount each person will end up paying becomes more predictable as the size of the group increases. In effect, this amount will become closer to μ, the average loss each individual expects.

1058/1074
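A short simulation sketch of this pooling effect; the gamma loss distribution and its parameters are hypothetical, chosen only so that the mean individual loss equals μ = 10.

import numpy as np

# The per-person payment becomes less variable as the pool grows
rng = np.random.default_rng(4)
mu = 10.0                                   # expected individual loss
for n in (1, 10, 100, 1_000, 10_000):
    pools = rng.gamma(shape=0.5, scale=mu / 0.5, size=(5_000, n))  # mean mu, skewed losses
    per_person = pools.mean(axis=1)         # what each member pays in each simulated pool
    print(n, per_person.mean().round(2), per_person.std().round(2))  # s.d. shrinks like 1/sqrt(n)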

Page 105: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem


Page 106: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem

Central Limit Theorem

Suppose X_1, X_2, ..., X_n are independent, identically distributed random variables with finite mean μ and finite variance σ². As before, denote the sample mean by X̄_n.

Then, the central limit theorem states:

(X̄_n − μ)/(σ/√n) →d N(0, 1), as n→∞.

This holds for all r.v.'s with finite mean and variance, not only normal r.v.'s!

Proof & rewriting of the CLT: see next slides.

1059/1074

Page 107: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem

Rewriting the Central Limit Theorem

We can write this result as:

lim_{n→∞} Pr((X̄_n − μ)/(σ/√n) ≤ x) = Φ(x),

for all x, where Φ(·) denotes the c.d.f. of a standard normal r.v..

Intuitively, for large n, the random variable:

Z_n = (X̄_n − μ)/(σ/√n)

is approximately standard normally distributed.

The Central Limit Theorem is usually expressed in terms of the standardized sums S_n = ∑_{k=1}^n X_k. Then the CLT applies to the random variable:

Z_n = (S_n − n·μ)/(√n · σ) →d N(0, 1), as n→∞.

1060/1074
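A simulation sketch of this statement, assuming Exp(1) summands (so μ = σ = 1): the empirical quantiles of Z_n approach the standard normal quantiles (−1.96, 0, 1.96) as n grows.

import numpy as np

# CLT with exponential summands, a clearly non-normal distribution
rng = np.random.default_rng(5)
mu, sigma = 1.0, 1.0                       # Exp(1) has mean 1 and s.d. 1
for n in (2, 10, 50, 500):
    x = rng.exponential(scale=1.0, size=(20_000, n))
    z = (x.sum(axis=1) - n * mu) / (np.sqrt(n) * sigma)
    # Empirical 2.5%, 50% and 97.5% quantiles of Z_n; for N(0,1) these are about -1.96, 0, 1.96
    print(n, np.quantile(z, [0.025, 0.5, 0.975]).round(2))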

Page 108: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem

Proof of the Central Limit Theorem

Let X_1, X_2, ... be a sequence of independent r.v.'s with mean μ and variance σ², and denote S_n = ∑_{i=1}^n X_i. Prove that

Z_n = (S_n − n·μ)/(σ·√n)

converges to the standard normal distribution.

General procedure to prove X_n →d X:
1. Find the m.g.f. of X: M_X(t);
2. Find the m.g.f. of X_n: M_{X_n}(t);
3. Take the limit n→∞ of the m.g.f. of X_n: lim_{n→∞} M_{X_n}(t), and rewrite it. This should be equal to M_X(t).

Note: the expansions for log and exp are useful here (see F&T page 2)!

1. Proof: Consider the case with μ = 0 and assume the m.g.f. of X_i exists; then we have: M_Z(t) = exp(t²/2).
1061/1074

Page 109: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem

2. Recall S_n = ∑_{i=1}^n X_i; the m.g.f. of Z_n = S_n/(σ·√n) = ∑_{i=1}^n X_i/(σ·√n) is obtained by:

M_{Z_n}(t) (*)= M_{S_n}(t/(σ·√n)) (**)= (M_{X_i}(t/(σ·√n)))^n

* using M_{a·X}(t) = M_X(a·t); ** using that S_n is the sum of n i.i.d. random variables X_i, thus M_{∑_{i=1}^n X_i}(t) = (M_{X_i}(t))^n.

Note that we only assumed that:

M_{X_i}(t) = f(t, σ²);   E[X_i] = μ;   Var(X_i) = σ² < ∞,

hence this holds for any distribution of X_i with mean μ and finite variance!
1062/1074

Page 110: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem

Note: lim_{n→∞} b·n^(−c) = 0, for b ∈ R and c > 0.

Recall from week 1: 1) An m.g.f. uniquely defines a distribution; 2) The m.g.f. is a function of all moments.

Consider the Taylor series around zero for any M(t):

M(t) = ∑_{i=0}^∞ (t^i/i!) · M^(i)(t)|_{t=0}   (where M^(i)(t)|_{t=0} is the i-th moment)
     = M(0) + t · M^(1)(t)|_{t=0} + (1/2) · t² · M^(2)(t)|_{t=0} + O(t³),

where O(t³) covers all terms c_k · t^k, with c_k ∈ R for k ≥ 3.

We have M(0) = E[e^(0·X)] = 1 and, because we assumed that E[X_i] = 0:

M^(1)_{X_i}(t)|_{t=0} = E[X_i] = 0, and M^(2)_{X_i}(t)|_{t=0} = E[X_i²] = Var(X_i) + (E[X_i])² = σ².

3. Proof continues on next slide.
1063/1074

Page 111: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem

Now we can align the results from the previous two slides:

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} (M_{X_i}(t/(σ·√n)))^n
                     = lim_{n→∞} (∑_{i=0}^∞ ((t/(σ·√n))^i/i!) · M^(i)_{X_i}(t)|_{t=0})^n
                     = lim_{n→∞} (1 + 0 + (1/2)·(t/(σ·√n))²·σ² + O((t/(σ·√n))³))^n

⇒ lim_{n→∞} log(M_{Z_n}(t)) = lim_{n→∞} n · log(1 + (1/2)·(t/(σ·√n))²·σ² + O((1/n)^(3/2)))
                           (*)= lim_{n→∞} n · ((1/2)·(t/√n)² + O((1/n)^(3/2)))
                              = t²/2,

since n · (O((1/n)^(3/2)) + O((1/n)²)) = O((1/n)^(1/2)) → 0, as n→∞.

* using log(1 + a) = ∑_{i=1}^∞ (−1)^(i+1) · a^i/i = a + O(a²), with a = t²/(2n) + O((1/n)^(3/2)).
1064/1074

Page 112: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of weak convergency: Central Limit Theorem

Application of the CLT: An insurer offers builder's risk insurance. It has 400 contracts per year and has offered the product for 9 years. The sample mean of a claim is $10 million and the sample standard deviation is $25 million.

Question: What is the probability that the total claim amount in a year is larger than $5 billion?

Solution: Using the CLT (why is σ ≈ the sample s.d.?):

(X̄_n − μ)/(σ/√n) →d N(0, 1), as n→∞
⇒ X̄_n ∼ N(μ, (σ/√n)²) approximately
⇒ n·X̄_n ∼ N(n·μ, n·σ²) approximately
⇒ 0.9772 = Pr(400·X̄_400 ≤ 400 · 10 million + 2 · 20 · 25 million) = Pr(400·X̄_400 ≤ $5 billion).

Thus, Pr(400·X̄_400 > $5 billion) = 1 − 0.9772 = 0.0228.

1065/1074
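The same calculation as a small Python sketch, using the standard normal tail via math.erfc; the figures are those given in the example above.

import math

n, mean_claim, sd_claim = 400, 10.0, 25.0      # per-claim figures, in $ million
total_mean = n * mean_claim                     # 4,000 ($4 billion)
total_sd = math.sqrt(n) * sd_claim              # 500  ($0.5 billion)
z = (5_000 - total_mean) / total_sd             # threshold of $5 billion -> z = 2

# Standard normal tail: P(Z > z) = 0.5 * erfc(z / sqrt(2))
prob = 0.5 * math.erfc(z / math.sqrt(2))
print(z, round(prob, 4))                        # 2.0, about 0.0228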

Page 113: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Binomial


Page 114: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Binomial

Normal Approximation to the Binomial

From week 2 we know: a Binomial random variable is the sum of Bernoulli random variables. Let X_k ∼ Bernoulli(p). Then:

S = X_1 + X_2 + ... + X_n

has a Binomial(n, p) distribution.

Applying the Central Limit Theorem, S must be approximately normal with mean E[S] = n·p and variance Var(S) = n·p·q (where q = 1 − p), so that approximately, for large n, we have:

(S − n·p)/√(n·p·q) ∼ N(0, 1).

Question: What is the probability that X = 60 if X ∼ Bin(1000, 0.06)? Not in the Binomial tables!

1066/1074

Page 115: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Binomial

In practice, for large n and for p around 0.5 (but in particular when n·p > 5 and n·p·(1 − p) > 5, or n > 30), we can approximate the binomial probabilities with the Normal distribution.

Use μ = n·p and σ² = n·p·(1 − p).

Continuity correction for the binomial: note that a Binomial random variable X takes integer values k = 0, 1, 2, ..., but the Normal distribution is continuous, so for the value:

Pr(X = k),

we use the Normal approximation:

Pr(((k − 1/2) − μ)/σ < Z < ((k + 1/2) − μ)/σ),

and similarly for the probability Pr(X ≤ k).

1067/1074
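A Python sketch answering the earlier question Pr(X = 60) for X ∼ Bin(1000, 0.06), using the normal approximation with the continuity correction, and the exact binomial probability for comparison.

import math

def norm_cdf(x):
    # standard normal c.d.f. via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p, k = 1000, 0.06, 60
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# Normal approximation with continuity correction
approx = norm_cdf((k + 0.5 - mu) / sigma) - norm_cdf((k - 0.5 - mu) / sigma)

# Exact binomial probability
exact = math.comb(n, k) * p**k * (1 - p)**(n - k)
print(round(approx, 4), round(exact, 4))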

Page 116: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Binomial

Normal approximation to Binomial

[Figure: Binomial(n, p) p.m.f. with the approximating Normal p.d.f. overlaid, for (n, p) = (5, 0.1) with N(0.5, 0.45), (10, 0.1) with N(1, 0.9), (30, 0.1) with N(3, 2.7), and (200, 0.1) with N(20, 18).]

1068/1074

Page 117: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Poisson


Page 118: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Poisson

Normal approximation to the Poisson

Approximation of the Poisson by the Normal for large values of λ.

Let X_n be a sequence of Poisson random variables with increasing parameters λ_1, λ_2, ..., such that λ_n → ∞.

We have:

E[X_n] = λ_n,   Var(X_n) = λ_n.

Standardize the random variable (i.e., subtract the mean and divide by the standard deviation):

Z_n = (X_n − E[X_n])/√Var(X_n) = (X_n − λ_n)/√λ_n →d Z ∼ N(0, 1).

Proof: See next slides.
1069/1074

Page 119: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Poisson

1. We have the m.g.f. of Z: M_Z(t) = exp(t²/2).

2. Next, we need to find the m.g.f. of Z_n. We know (week 2):

M_{X_n}(t) = exp(λ_n · (e^t − 1)).

Thus, using the calculation rules for m.g.f.'s, we have:

M_{Z_n}(t) = M_{(X_n − λ_n)/√λ_n}(t) = M_{X_n/√λ_n − √λ_n}(t)
          (*)= exp(−√λ_n · t) · M_{X_n}(t/√λ_n)
            = exp(−√λ_n · t) · exp(λ_n · (e^(t/√λ_n) − 1))
            = exp(−√λ_n · t + λ_n · (e^(t/√λ_n) − 1))

* using M_{a·X+b}(t) = exp(b·t) · M_X(a·t).

1070/1074

Page 120: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Poisson

3. Find the limit of M_{Z_n}(t) and prove that it equals M_Z(t):

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} exp(−√λ_n · t + λ_n · (e^(t/√λ_n) − 1))

⇒ lim_{n→∞} log(M_{Z_n}(t)) = lim_{n→∞} −t·√λ_n + λ_n · (e^(t/√λ_n) − 1)
                           (*)= lim_{n→∞} −t·√λ_n + λ_n · (1 + t/√λ_n + (1/2!)·(t/√λ_n)² + (1/3!)·(t/√λ_n)³ + ... − 1)
                              = lim_{n→∞} (1/2!)·t² + O(1/√λ_n) = t²/2

⇒ lim_{n→∞} M_{Z_n}(t) = exp(t²/2) = M_Z(t).

* using the exponential expansion: e^a = ∑_{i=0}^∞ a^i/i!, with a = t/√λ_n.

1071/1074
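A small Python sketch comparing the exact Poisson c.d.f. with its N(λ, λ) approximation (with a continuity correction) at a point about two standard deviations above the mean; the chosen λ values mirror the figure on the next slide.

import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def poisson_cdf(k, lam):
    # exact Poisson c.d.f. by summing the p.m.f. (fine for moderate lambda)
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

for lam in (1, 10, 100):
    k = int(lam + 2 * math.sqrt(lam))            # a point about 2 s.d.'s above the mean
    approx = norm_cdf((k + 0.5 - lam) / math.sqrt(lam))
    print(lam, k, round(poisson_cdf(k, lam), 4), round(approx, 4))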

Page 121: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Convergence of series

Application of convergence in distribution: Normal Approximation to the Poisson

Normal approximation to Poisson

[Figure: Poisson(λ) p.m.f. with the approximating N(λ, λ) p.d.f. overlaid, for λ = 0.1, 1, 10 and 100.]

1072/1074

Page 122: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Summary

Summary


Page 123: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Summary

Summary

Parameter estimators

Method of moments:
1. Equate (the first) k sample moments to the corresponding k population moments;
2. Equate the k population moments to the parameters of the distribution;
3. Solve the resulting system of simultaneous equations.

Maximum likelihood:
1. Determine the likelihood function L(θ_1, θ_2, ..., θ_k; x);
2. Determine the log-likelihood function ℓ(θ_1, θ_2, ..., θ_k; x) = log(L(θ_1, θ_2, ..., θ_k; x));
3. Equate the derivatives of ℓ(θ_1, θ_2, ..., θ_k; x) w.r.t. θ_1, θ_2, ..., θ_k to zero (⇒ global/local minimum/maximum);
4. Check whether the second derivative is negative (maximum) and check the boundary conditions.

Bayesian:
1. Find the posterior density using (1) (difficult/tedious integral!) or (2);
2. Compute the Bayesian estimator under a given loss function.

1073/1074

Page 124: Week 5 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 5

Summary

Summary

LLN & CLT

Law of large numbers: Let X_1, ..., X_n be independent random variables with equal mean E[X_k] = μ and variance Var(X_k) = σ² for k = 1, ..., n; then for all ε > 0 we have:

Pr(|X̄_n − μ| > ε) → 0, as n→∞.

Central limit theorem: Let X_1, ..., X_n be independent and identically distributed random variables with mean E[X_k] = μ and variance Var(X_k) = σ² for k = 1, ..., n; then:

(X̄_n − μ)/(σ/√n) →d N(0, 1), as n→∞.

1074/1074