ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes
Katja Ignatieva
School of Risk and Actuarial Studies, Australian School of Business, University of New South Wales
Contents:
- Parameter estimation
- Estimator I: the method of moments
- Estimator II: maximum likelihood estimator
- Estimator III: Bayesian estimator
- Convergence of series: Chebyshev's Inequality; convergence concepts; application of strong convergence: Law of Large Numbers; application of weak convergence: Central Limit Theorem; applications of convergence in distribution: Normal Approximation to the Binomial and to the Poisson
- Summary
Parameter estimation

Definition of an Estimator
Problem of statistical estimation: a population has some characteristics that can be described by a r.v. X with density f_X(·|θ).
The density has an unknown parameter (or set of parameters) θ.
We observe values of the random sample X_1, X_2, …, X_n from the population f_X(·|θ). Denote the observed sample values by x_1, x_2, …, x_n.
We then estimate the parameter (or some function of the parameter) based on this random sample.
Any statistic, i.e., a function T(X_1, X_2, …, X_n) of observable random variables whose values are used to estimate τ(θ), where τ(·) is some function of the parameter θ, is called an estimator of τ(θ).
A value θ̂ of the statistic, evaluated at the observed sample values x_1, x_2, …, x_n, is called a (point) estimate.
For example:
- T(X_1, X_2, …, X_n) = X̄_n = (1/n)·∑_{j=1}^n X_j is an estimator;
- θ̂ = 0.23 is a point estimate.
Note that θ can be a vector; the estimator is then a set of equations.
Estimator I: the method of moments

The Method of Moments
An example of an estimator: the method of moments estimator (MME).
Let X_1, X_2, …, X_n be a random sample from the population with density f_X(·|θ), which we assume has k parameters, say θ = [θ_1, θ_2, …, θ_k]^⊤.
The method of moments procedure for estimating τ(θ) is:
1. Equate the first k sample moments to the corresponding k population moments;
2. Equate the k population moments to the parameters of the distribution;
3. Solve the resulting system of simultaneous equations.
The method of moments point estimates θ̂ are the values of the estimator corresponding to the data set.
Example & exercise
Example: MME & Binomial distribution
Suppose X_1, X_2, …, X_n is a random sample from a Bin(m, p) distribution with known number of trials (written m here, to avoid confusion with the sample size n).
Question: Use the method of moments to find a point estimator of θ = p.
1. Solution: equate the population moment to the sample moment:
E[X] = (1/n)·∑_{j=1}^n x_j = x̄.
2. Equate the population moment to the parameter (using week 2 results):
E[X] = m·p.
3. Solving gives the method of moments estimator:
x̄ = m·p  ⇒  p̂ = x̄/m.
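As a quick numerical check, the MME above can be sketched in a few lines of Python (the data are simulated, and m = 10, p = 0.3 are illustrative choices, not values from the notes):

```python
import random

def mme_binomial_p(sample, m):
    """Method of moments estimate of p for Bin(m, p) with known m:
    equate E[X] = m*p to the sample mean, giving p_hat = xbar / m."""
    xbar = sum(sample) / len(sample)
    return xbar / m

random.seed(1)
m, p = 10, 0.3
# simulate each Bin(m, p) draw as a sum of m Bernoulli(p) trials
sample = [sum(random.random() < p for _ in range(m)) for _ in range(5000)]
p_hat = mme_binomial_p(sample, m)
print(round(p_hat, 3))  # close to the true p = 0.3
```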
Exercise: MME & Normal distribution
Suppose X_1, X_2, …, X_n is a random sample from a N(μ, σ²) distribution.
Question: Use the method of moments to find point estimators of μ and σ².
1. Solution: equate the population moments to the sample moments:
E[X] (population moment) = (1/n)·∑_{j=1}^n x_j = x̄ (sample moment);
E[X²] (population moment) = (1/n)·∑_{j=1}^n x_j² (sample moment).
2. Equate the population moments to the parameters (using week 2 results):
E[X] = μ  and  E[X²] = Var(X) + (E[X])² = σ² + μ².
3. Solving gives the method of moments estimators:
μ̂ = x̄  and  σ̂² = (1/n)·∑_{j=1}^n x_j² − x̄² = (1/n)·∑_{j=1}^n (x_j − x̄)².
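The normal MME can be sketched similarly (simulated data; μ = 5 and σ = 2 are illustrative choices):

```python
import random

def mme_normal(sample):
    """MME for N(mu, sigma^2): mu_hat = xbar,
    sigma2_hat = (mean of x^2) - xbar^2."""
    n = len(sample)
    xbar = sum(sample) / n
    m2 = sum(x * x for x in sample) / n
    return xbar, m2 - xbar ** 2

random.seed(2)
sample = [random.gauss(5.0, 2.0) for _ in range(20000)]
mu_hat, s2_hat = mme_normal(sample)
print(round(mu_hat, 2), round(s2_hat, 2))  # near mu = 5, sigma^2 = 4
```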
Estimator II: maximum likelihood estimator

Maximum Likelihood function
Another example of an estimator (the most commonly used one) is the maximum likelihood estimator.
First, we need to define the likelihood function.
If x_1, x_2, …, x_n are drawn from a population with parameter θ (where θ could be a vector of parameters), then the likelihood function is given by:
L(θ; x_1, x_2, …, x_n) = f_{X_1,X_2,…,X_n}(x_1, x_2, …, x_n|θ),
where f_{X_1,X_2,…,X_n}(x_1, x_2, …, x_n) is the joint probability density of the random variables X_1, X_2, …, X_n.
Maximum Likelihood Estimation
Let L(θ) = L(θ; x_1, x_2, …, x_n) be the likelihood function for X_1, X_2, …, X_n.
The set of parameters θ̂ = θ̂(x_1, x_2, …, x_n) (note: a function of the observed values) that maximizes L(θ) is the maximum likelihood estimate of θ.
The random variable θ̂(X_1, X_2, …, X_n) is called the maximum likelihood estimator.
When X_1, X_2, …, X_n is a random sample from f_X(x|θ), the likelihood function is (using the i.i.d. property):
L(θ; x_1, x_2, …, x_n) = ∏_{j=1}^n f_X(x_j|θ),
which is just the product of the densities evaluated at each of the observations in the random sample.
If the likelihood function contains k parameters, so that:
L(θ_1, θ_2, …, θ_k; x) = f_X(x_1|θ) · f_X(x_2|θ) · … · f_X(x_n|θ),
then (under certain regularity conditions) the point where the likelihood is a maximum is a solution of the k equations:
∂L(θ_1, θ_2, …, θ_k; x)/∂θ_1 = 0,  ∂L(θ; x)/∂θ_2 = 0,  …,  ∂L(θ; x)/∂θ_k = 0.
Normally the solutions to this system of equations give the global maximum, but to be sure you should check the second derivative (or Hessian) conditions and the boundary conditions for a global maximum.
Consider the case of estimating two parameters, say θ_1 and θ_2.
Define the gradient vector:
D(L) = [∂L/∂θ_1, ∂L/∂θ_2]^⊤,
and define the Hessian matrix:
H(L) = [ ∂²L/∂θ_1²,  ∂²L/∂θ_1∂θ_2 ;  ∂²L/∂θ_1∂θ_2,  ∂²L/∂θ_2² ].
From calculus we know that the maximizing choice of θ_1 and θ_2 should satisfy not only:
D(L) = 0,
but also that H should be negative definite, which means:
[h_1, h_2] · H(L) · [h_1, h_2]^⊤ < 0,
for all [h_1, h_2] ≠ 0.
Log-Likelihood function
Generally, maximizing the log-likelihood function is easier.
Not surprisingly, we define the log-likelihood function as:
ℓ(θ; x_1, x_2, …, x_n) = log(L(θ; x_1, x_2, …, x_n)).
Maximizing the log-likelihood function gives the same parameter estimates as maximizing the likelihood function, because the log is a monotonically increasing function.
MLE procedure
The general procedure to find the ML estimator is:
1. Determine the likelihood function L(θ_1, θ_2, …, θ_k; x);
2. Take the logarithm to obtain the log-likelihood function ℓ(θ; x);
3. Take the first order conditions ∂ℓ/∂θ_i = 0, for i = 1, …, k, and solve for the parameter estimates;
4. Check the second order conditions to ensure a maximum.
Example: MLE and Poisson
1. Suppose X_1, X_2, …, X_n are i.i.d. Poisson(λ). The likelihood function is given by:
L(λ; x) = ∏_{j=1}^n f_X(x_j|λ) = (e^{−λ}·λ^{x_1}/x_1!) · (e^{−λ}·λ^{x_2}/x_2!) · … · (e^{−λ}·λ^{x_n}/x_n!)
= e^{−λ·n} · (λ^{x_1}/x_1!) · (λ^{x_2}/x_2!) · … · (λ^{x_n}/x_n!).
2. Taking the log of both sides, we get:
ℓ(λ; x) = −λ·n + log(λ)·∑_{k=1}^n x_k − ∑_{k=1}^n log(x_k!).
Or, equivalently, using the log-likelihood function directly:
ℓ(λ; x) = ∑_{j=1}^n log(f_X(x_j|λ)) = ∑_{j=1}^n (−λ + x_j·log(λ) − log(x_j!)).
Now we need to maximize this log-likelihood function with respect to the parameter λ.
3. Taking the first order condition (FOC) with respect to λ, we have:
∂ℓ(λ)/∂λ = 0  ⇒  −n + (1/λ)·∑_{k=1}^n x_k = 0.
This gives the maximum likelihood estimate (MLE):
λ̂ = (1/n)·∑_{k=1}^n x_k = x̄,
which equals the sample mean.
4. Check the second derivative condition to ensure a global maximum.
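A minimal sketch of the Poisson result (simulated data via inverse-transform sampling; λ = 2.5 is an illustrative choice): it checks both that λ̂ = x̄ and that x̄ beats nearby candidate values of λ in log-likelihood:

```python
import random, math

def poisson_loglik(lam, xs):
    """l(lambda; x) = -n*lambda + log(lambda)*sum(x) - sum(log(x!))."""
    n = len(xs)
    return -lam * n + math.log(lam) * sum(xs) - sum(math.lgamma(x + 1) for x in xs)

def sample_poisson(lam, rng):
    # inverse-transform sampling for Poisson(lam)
    u, k, p, cum = rng.random(), 0, math.exp(-lam), math.exp(-lam)
    while u > cum:
        k += 1
        p *= lam / k
        cum += p
    return k

rng = random.Random(3)
xs = [sample_poisson(2.5, rng) for _ in range(4000)]
lam_hat = sum(xs) / len(xs)          # MLE = sample mean
# the sample mean beats nearby candidate values of lambda
best = max([lam_hat - 0.1, lam_hat, lam_hat + 0.1],
           key=lambda l: poisson_loglik(l, xs))
print(round(lam_hat, 2), best == lam_hat)
```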
Exercise: MLE and Normal
Suppose X_1, X_2, …, X_n are i.i.d. Normal(μ, σ²), where both parameters are unknown.
The p.d.f. is given by:
f_X(x) = (1/(√(2π)·σ))·exp(−(1/2)·((x−μ)/σ)²).
1. Thus the likelihood function is given by:
L(μ, σ; x) = ∏_{k=1}^n (1/(√(2π)·σ))·exp(−(1/2)·((x_k−μ)/σ)²).
Question: Find the MLE of μ and σ².
2. Solution: the log-likelihood function is:
ℓ(μ, σ; x) = ∑_{k=1}^n log((1/(√(2π)·σ))·exp(−(1/2)·((x_k−μ)/σ)²))
(*) = −n·log(σ) − (n/2)·log(2π) − (1/(2σ²))·∑_{k=1}^n (x_k − μ)²,
(*) using log(1/a) = log(a^{−1}) = −log(a), with a = σ, and log(1/√b) = log(b^{−0.5}) = −0.5·log(b), with b = 2π.
Take the derivatives w.r.t. μ and σ and set them equal to zero.
3./4. Then we obtain:
∂ℓ(μ, σ; x)/∂μ = (1/σ²)·∑_{k=1}^n (x_k − μ) = 0
⇒ ∑_{k=1}^n x_k − n·μ = 0
⇒ μ̂ = x̄;
∂ℓ(μ, σ; x)/∂σ = −n/σ + (∑_{k=1}^n (x_k − μ)²)/σ³ = 0
⇒ n = (∑_{k=1}^n (x_k − μ)²)/σ²
⇒ σ̂² = (1/n)·∑_{k=1}^n (x_k − x̄)².
See §9.7 and §9.8 of W+ (7th ed.) for further details.
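The closed-form normal MLE can be sanity-checked against small perturbations of the log-likelihood (simulated data; μ = 1 and σ = 3 are illustrative choices):

```python
import random, math

def normal_loglik(mu, sigma, xs):
    # l(mu, sigma; x) = -n*log(sigma) - (n/2)*log(2*pi) - sum((x-mu)^2)/(2*sigma^2)
    n = len(xs)
    return (-n * math.log(sigma) - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

random.seed(4)
xs = [random.gauss(1.0, 3.0) for _ in range(10000)]
n = len(xs)
mu_hat = sum(xs) / n
s2_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # note: divisor n, not n - 1
# the closed-form MLE beats small perturbations in either parameter
base = normal_loglik(mu_hat, math.sqrt(s2_hat), xs)
perturbed = max(normal_loglik(mu_hat + d, math.sqrt(s2_hat) + e, xs)
                for d in (-0.05, 0.0, 0.05) for e in (-0.05, 0.0, 0.05)
                if (d, e) != (0.0, 0.0))
print(base > perturbed)  # True: (mu_hat, s2_hat) is the maximizer
```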
Example: MME & MLE and Gamma
You may not always obtain closed-form solutions for the parameter estimates with the maximum likelihood method.
An example of such a problem is estimating the parameters of the Gamma distribution using MLE.
As we will see on the next slides, MLE yields a closed-form solution for one parameter estimate, but not for the second.
To find the MLE one must then compute the estimates numerically by solving a non-linear equation, which can be done with an iterative numerical approximation (e.g. Newton-Raphson).
Application: surrendering mortgages, see Excel.
In such cases an initial value may be needed, so another means of estimating may be used first, such as the method of moments; that estimate is then used as the starting value.
Question: Consider X_1, X_2, …, X_n i.i.d. Gamma(λ, α); find the MME of the Gamma distribution. Recall:
f_X(x) = (λ^α/Γ(α))·x^{α−1}·e^{−λ·x};  E[X^r] = Γ(α+r)/(λ^r·Γ(α));
M_X(t) = E[e^{tX}] = (λ/(λ−t))^α;  Var(X) = α/λ².
1. Solution: equate sample moments to population moments:
μ_1 = M_X^{(1)}(t)|_{t=0} = E[X] = x̄  and  μ_2 = M_X^{(2)}(t)|_{t=0} = E[X²] = (1/n)·∑_{i=1}^n x_i².
2. Equate the population moments to the parameters:
μ_1 = α/λ  and  μ_2 = α·(α+1)/λ² = (α/λ)·((α+1)/λ) = μ_1·(μ_1 + 1/λ).
3. Therefore, the method of moments estimates are given by:
μ_2/μ_1 = μ_1 + 1/λ̂  ⇒  λ̂ = μ_1/(μ_2 − μ_1²);
α̂ = μ_1·λ̂  ⇒  α̂ = μ_1²/(μ_2 − μ_1²).
So the estimators are:
λ̂ = x̄/σ̂²  and  α̂ = x̄²/σ̂²,
using (step 1) μ_1 = x̄ and μ_2 = (1/n)·∑_{i=1}^n x_i², so that μ_2 − μ_1² = (1/n)·∑_{i=1}^n x_i² − x̄² = σ̂².
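A sketch of the Gamma MME (simulated data via Python's `random.gammavariate`, which uses the shape/scale parametrization, so scale = 1/λ; α = 2 and λ = 0.5 are illustrative choices):

```python
import random

def mme_gamma(xs):
    """MME for Gamma(lambda, alpha) with density (lambda^a/Gamma(a)) x^{a-1} e^{-lambda x}:
    lambda_hat = xbar / s2, alpha_hat = xbar^2 / s2, with s2 the biased sample variance."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum(x * x for x in xs) / n - xbar ** 2
    return xbar / s2, xbar ** 2 / s2

random.seed(5)
alpha, lam = 2.0, 0.5
xs = [random.gammavariate(alpha, 1 / lam) for _ in range(20000)]
lam_hat, alpha_hat = mme_gamma(xs)
print(round(lam_hat, 2), round(alpha_hat, 2))  # near lambda = 0.5, alpha = 2
```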
Question: Find the ML estimates.
1. Solution: X_1, X_2, …, X_n are i.i.d. Gamma(λ, α), so the likelihood function is:
L(λ, α; x) = ∏_{i=1}^n (1/Γ(α))·λ^α·x_i^{α−1}·e^{−λ·x_i}.
2. The log-likelihood function is then:
ℓ(λ, α; x) = −n·log(Γ(α)) + n·α·log(λ) + (α−1)·∑_{i=1}^n log(x_i) − λ·∑_{i=1}^n x_i.
3. Maximizing this:
∂ℓ(λ, α; x)/∂α = −n·(∂Γ(α)/∂α)/Γ(α) + n·log(λ) + ∑_{i=1}^n log(x_i) = 0;
∂ℓ(λ, α; x)/∂λ = n·α/λ − ∑_{i=1}^n x_i = 0.
The second equation is easy to solve:
λ̂ = n·α̂/∑_{i=1}^n x_i,
but numerical (iterative) techniques are needed to solve the first equation.
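The Newton-Raphson step described above can be sketched as follows. Substituting λ = α/x̄ from the second FOC into the first gives log(α) − ψ(α) = log(x̄) − (1/n)·∑ log(x_i), which is solved iteratively for α, started at the MME value. The digamma function ψ is approximated here by a finite difference of `math.lgamma` to stay stdlib-only; a production version would use `scipy.special.digamma`:

```python
import random, math

def digamma(a, h=1e-5):
    # numerical derivative of log Gamma (adequate for this sketch)
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_mle(xs, tol=1e-8):
    """MLE for Gamma(lambda, alpha): lambda_hat = alpha_hat / xbar from the
    second FOC; alpha solves log(alpha) - psi(alpha) = log(xbar) - mean(log x),
    found here by Newton-Raphson started at the MME value."""
    n = len(xs)
    xbar = sum(xs) / n
    c = math.log(xbar) - sum(math.log(x) for x in xs) / n
    s2 = sum(x * x for x in xs) / n - xbar ** 2
    a = xbar ** 2 / s2                      # MME starting value
    for _ in range(100):
        g = math.log(a) - digamma(a) - c
        gprime = 1 / a - (digamma(a + 1e-4) - digamma(a - 1e-4)) / 2e-4
        step = g / gprime
        a -= step
        if abs(step) < tol:
            break
    return a / xbar, a                      # (lambda_hat, alpha_hat)

random.seed(6)
xs = [random.gammavariate(3.0, 2.0) for _ in range(20000)]  # alpha = 3, lambda = 0.5
lam_hat, alpha_hat = gamma_mle(xs)
print(round(lam_hat, 2), round(alpha_hat, 2))
```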
Example: MLE and Uniform
Suppose X_1, X_2, …, X_n are i.i.d. U[0, θ], i.e., f_X(x) = 1/θ for 0 ≤ x ≤ θ, and zero otherwise. Here the range of x depends on the parameter θ.
The likelihood function can be expressed as:
L(θ; x) = (1/θ)^n · ∏_{k=1}^n I_{0≤x_k≤θ},
where I_{0≤x_k≤θ} is an indicator function taking the value 1 if x_k ∈ [0, θ] and zero otherwise.
Question: How do we find the maximum of this likelihood function?
[Figure: the likelihood L(θ; x) as a function of θ for a sample with order statistics x_(1), …, x_(n): it is zero for θ < x_(n), jumps up at θ = x_(n), and decreases thereafter.]
Solution: the non-linearity of the indicator function means we cannot use calculus to maximize this function, i.e., we cannot set the FOC equal to zero.
You can maximize it by looking at its properties:
- ∏_{k=1}^n I_{0≤x_k≤θ} can only take the values 0 and 1; note that it takes the value 0 if θ < x_(n) and 1 otherwise;
- (1/θ)^n is a decreasing function in θ;
- hence the function is maximized at the lowest value of θ for which ∏_{k=1}^n I_{0≤x_k≤θ} = 1, i.e.:
θ̂ = max{x_1, x_2, …, x_n} = x_(n).
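A two-line check of the uniform result (simulated data; θ = 4.2 is an illustrative choice). Note that x_(n) always sits just below the true θ, so this MLE is biased downward:

```python
import random

random.seed(7)
theta = 4.2
xs = [random.uniform(0, theta) for _ in range(1000)]
theta_hat = max(xs)   # MLE for U[0, theta] is the sample maximum x_(n)
print(round(theta_hat, 3))  # slightly below the true theta = 4.2
```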
Sampling distribution and the bootstrap
We might be interested not only in the point estimate, but in the whole distribution of the MLE (parameter uncertainty!).
However, we have no closed-form solution for the MLE estimates. How can we obtain their sampling distribution? Use bootstrapping:
Step 1: Generate k samples from Gamma(λ̂, α̂).
Step 2: Estimate λ and α for each of these k samples using MLE.
Step 3: The empirical joint cumulative distribution function of these k parameter estimates is an approximation to the sampling distribution of the MLE estimates.
Quantification of risk: produce histograms of the estimates.
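The three bootstrap steps can be sketched as follows. To stay self-contained, the re-estimation in step 2 uses the MME in place of the MLE (the bootstrap logic is identical); the fitted values λ̂ = 0.5, α̂ = 2 and the sizes n, k are illustrative:

```python
import random

def mme_gamma(xs):
    # method of moments: lambda_hat = xbar/s2, alpha_hat = xbar^2/s2
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum(x * x for x in xs) / n - xbar ** 2
    return xbar / s2, xbar ** 2 / s2

random.seed(8)
lam_fit, alpha_fit = 0.5, 2.0      # fitted parameters (assumed, for illustration)
n, k = 500, 250                    # sample size and number of bootstrap samples
boot = []
for _ in range(k):
    # Step 1: generate a sample from Gamma(lam_fit, alpha_fit)
    xs = [random.gammavariate(alpha_fit, 1 / lam_fit) for _ in range(n)]
    # Step 2: re-estimate the parameters from the bootstrap sample
    boot.append(mme_gamma(xs))
# Step 3: the empirical distribution of the k estimates approximates the sampling distribution
lam_hats = sorted(b[0] for b in boot)
print(round(lam_hats[12], 3), round(lam_hats[-13], 3))  # ~90% interval for lambda_hat
```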
Sampling distribution and bootstrap, k = 250, see Excel.
[Figure: empirical approximations of the sampling distributions — the empirical c.d.f. F_α̂(α) of the α̂ estimates (left) and F_λ̂(λ) of the λ̂ estimates (right), shown for five repetitions of the bootstrap.]
Estimator III: Bayesian estimator

Introduction
We have seen:
- The method of moments estimator. Idea: the first k moments of the fitted distribution and of the sample are the same.
- The maximum likelihood estimator. Idea: the probability of the sample, given the assumed class of distributions, is highest for this set of parameters.
Warning: Bayesian estimation can be hard to understand, partly due to the non-standard notation used for Bayesian estimates.
Pure Bayesian interpretation: suppose you have, a priori, a belief about a distribution (the prior);
then you observe data ⇒ more information about the distribution.
Example, frequentist interpretation: let X_i ∼ Ber(θ) indicate whether individual i lodges a claim with the insurer:
- ∑_{i=1}^T X_i = Y ∼ Bin(T, θ) is the number of car accidents;
- the probability of an insured having a car accident depends on adverse selection;
- a new insurer does not know the amount of adverse selection in its pool;
- now let θ ∈ Θ, with Θ ∼ Beta(a, b), be the distribution of the risk among individuals (i.e., representing adverse selection);
- use this for estimating the parameter ⇒ what is our prior for θ?
This is called empirical Bayes.
Similar idea: Bayesian updating, in the case of time-varying parameters:
- prior: last year's estimated claim distribution;
- data: this year's claims;
- posterior: revised estimated claim distribution.
Bayesian estimation

Notation for Bayesian estimation
Under this approach, we assume that Θ is a random quantity with density π(θ), called the prior density. (This is the usual notation, rather than f_Θ(θ).)
A sample X = x (= [x_1, x_2, …, x_T]^⊤) is taken from the population, and the prior density is updated using the information drawn from this sample by applying Bayes' rule. This updated prior is called the posterior density: the conditional density of Θ given the sample X = x, written π(θ|x) (= f_{Θ|X}(θ|x)).
So we are using a conditional r.v., Θ|X, associated with the multivariate distribution of Θ and X (look back at the lecture notes for week 3).
Use, for example, the posterior mean E[Θ|X = x] as the Bayesian estimator.
Bayesian estimation, theory
First, let us define a loss function L(θ̂; θ) for an estimator T of τ(θ), with:
L(θ̂; θ) ≥ 0, for every θ̂;
L(θ̂; θ) = 0, when θ̂ = θ.
Interpretation of the loss function: for reasonable loss functions, a lower value of the loss function ⇒ a better estimator.
Example Bayesian estimation: Bernoulli-Beta
Let X_1, X_2, …, X_T be i.i.d. Bernoulli(Θ), i.e., (X_i|Θ = θ) ∼ Bernoulli(θ).
Assume the prior density of Θ is Beta(a, b), so that:
π(θ) = (Γ(a+b)/(Γ(a)·Γ(b))) · θ^{a−1} · (1−θ)^{b−1}.
We know that the conditional density of our data (the density conditional on the true value of θ) is f_{X|Θ}(x|θ) = θ^s·(1−θ)^{T−s}, where s = ∑_{i=1}^T x_i.
1. Combining prior and likelihood via Bayes' rule, the posterior density is Beta(a+s, b+T−s):
π(θ|x) = (Γ(a+b+T)/(Γ(a+s)·Γ(b+T−s))) · θ^{(a+s)−1} · (1−θ)^{(b+T−s)−1}.
2. The mean of the r.v. with the above posterior density is then:
θ̂_B = E[Θ|X = x] = E[Ξ], with Ξ ∼ Beta(a+s, b+T−s), i.e., θ̂_B = (a+s)/(a+b+T),
which gives the Bayesian estimator of Θ.
Note that we can write the Bayesian estimator as a weighted average of the prior mean (which is a/(a+b)) and the sample mean (which is s/T) as follows:
θ̂_B = E[Θ|X = x] = (T/(a+b+T)) · (s/T) + ((a+b)/(a+b+T)) · (a/(a+b)),
where T/(a+b+T) is the weight on the sample mean s/T, and (a+b)/(a+b+T) is the weight on the prior mean a/(a+b).
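A sketch of the Bernoulli-Beta update (simulated data; a = 2, b = 3, θ = 0.7 are illustrative choices), verifying that the posterior mean equals the weighted average of prior mean and sample mean:

```python
import random

random.seed(9)
a, b = 2.0, 3.0                 # Beta(a, b) prior
theta_true, T = 0.7, 200
xs = [1 if random.random() < theta_true else 0 for _ in range(T)]
s = sum(xs)
# posterior is Beta(a + s, b + T - s); its mean is the Bayesian estimator
theta_bayes = (a + s) / (a + b + T)
# equivalently: weighted average of prior mean and sample mean
w_sample = T / (a + b + T)
theta_check = w_sample * (s / T) + (1 - w_sample) * (a / (a + b))
print(round(theta_bayes, 4), abs(theta_bayes - theta_check) < 1e-12)
```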
Exercise: Normal-Normal
Let X_1, X_2, …, X_T be i.i.d. Normal(Θ, σ_2²), i.e., (X_i|Θ = θ) ∼ Normal(θ, σ_2²).
Assume the prior density of Θ is Normal(m, σ_1²), so that:
π(θ) = (1/(√(2π)·σ_1))·exp(−(θ−m)²/(2·σ_1²)).
Question: Find the Bayesian estimator for θ.
Solution: We know that the conditional density of our data is given by the likelihood function:
f_{X|Θ}(x|θ) = ∏_{j=1}^T (1/(√(2π)·σ_2))·exp(−(x_j−θ)²/(2·σ_2²))
= (1/(√(2π)·σ_2)^T)·exp(−∑_{j=1}^T (x_j−θ)²/(2·σ_2²)).
1. Posterior density:
π(θ|x) ∝ f_{X|Θ}(x|θ)·π(θ) ∝ exp(−∑_{j=1}^T (x_j−θ)²/(2·σ_2²)) · exp(−(θ−m)²/(2·σ_1²))
= exp(−∑_{j=1}^T (x_j−θ)²/(2·σ_2²) − (θ−m)²/(2·σ_1²))
= exp(−∑_{j=1}^T (x_j² + θ² − 2·θ·x_j)/(2·σ_2²) − (θ² + m² − 2·θ·m)/(2·σ_1²))
= exp(−[σ_2²·(θ² + m² − 2·θ·m) + σ_1²·∑_{j=1}^T (x_j² + θ² − 2·θ·x_j)]/(2·σ_2²·σ_1²))
(*) ∝ exp(−[θ²·(σ_2² + T·σ_1²) − 2·θ·(m·σ_2² + T·x̄·σ_1²)]/(2·σ_2²·σ_1²))
= exp(−[θ² − 2·θ·(m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²)]/(2·σ_2²·σ_1²/(σ_2² + T·σ_1²)))
(**) ∝ exp(−(θ − (m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²))²/(2·σ_2²·σ_1²/(σ_2² + T·σ_1²))).
(*): the factor exp(−(σ_2²·m² + σ_1²·∑_{j=1}^T x_j²)/(2·σ_2²·σ_1²)) and (**): the factor exp(((m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²))²/(2·σ_2²·σ_1²/(σ_2² + T·σ_1²))) are constants given x.
1. Thus θ|X is normally distributed with mean (m·σ_2² + T·x̄·σ_1²)/(σ_2² + T·σ_1²) and variance σ_2²·σ_1²/(σ_2² + T·σ_1²). Note that we can rewrite this as:
mean: ((1/σ_1²)/(1/σ_1² + T/σ_2²))·m + ((T/σ_2²)/(1/σ_1² + T/σ_2²))·x̄,  and variance: (1/σ_1² + T/σ_2²)^{−1}.
2. The Bayesian estimator under both the mean squared error loss function and the absolute error loss function is:
θ̂_B = ((1/σ_1²)/(1/σ_1² + T/σ_2²))·m + ((T/σ_2²)/(1/σ_1² + T/σ_2²))·x̄.
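A sketch of the Normal-Normal update in its precision-weight form (simulated data; all parameter values are illustrative choices):

```python
import random

random.seed(10)
m, sigma1 = 0.0, 2.0        # prior Normal(m, sigma1^2)
theta, sigma2 = 1.5, 3.0    # data Normal(theta, sigma2^2)
T = 400
xs = [random.gauss(theta, sigma2) for _ in range(T)]
xbar = sum(xs) / T
# precision-weighted posterior mean and variance
w_prior = (1 / sigma1 ** 2) / (1 / sigma1 ** 2 + T / sigma2 ** 2)
post_mean = w_prior * m + (1 - w_prior) * xbar
post_var = 1 / (1 / sigma1 ** 2 + T / sigma2 ** 2)
print(round(post_mean, 3), round(post_var, 5))
```

With a lot of data (T large), the weight on the prior shrinks toward zero and the posterior mean approaches the sample mean.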
Convergence of series

Chebyshev's Inequality
Chebyshev’s InequalityThe Chebyshev’s inequality, states that for any randomvariable X with mean µ and variance σ2, the followingprobability inequality holds for all ε > 0:
Pr (|X − µ| > ε) ≤ σ2
ε2.
Note that this applies to all distributions, hence alsonon-symmetric ones! This implies that:
Pr (X − µ > ε) ≤ σ2
ε2≥ Pr (X − µ < −ε) .
Interesting example: set ε = k · σ then:
Pr (|X − µ| > k · σ) ≤ 1
k2.
This provides us with an upper bound of the probability thatX deviates more than k standard deviations of its mean.1049/1074
Application: Chebyshev’s Inequality
The distribution of fire insurance claims does not have aspecial distribution.
We do know that the mean claim size in the portfolio is $50million with a standard deviation of $150 million.
Question: What is an upper bound for the probability thatthe claim size is larger than $500 million?
Solution: We have:
Pr (X − µ > k · σ) ≤Pr (|X − µ| > k · σ)
= Pr (|X − 50| > k · 150)
≤ 1
k2=
1
9.
Thus, Pr (X > 500) ≤ 1/9.1050/1074
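The bound can be checked empirically for a non-symmetric distribution (an Exponential(1), so μ = σ = 1; the simulation set-up is illustrative): the observed tail frequency sits well below 1/k²:

```python
import random

random.seed(11)
# an asymmetric distribution: exponential with mean 1 (so sigma = 1)
n = 200000
xs = [random.expovariate(1.0) for _ in range(n)]
mu, sigma, k = 1.0, 1.0, 3.0
freq = sum(abs(x - mu) > k * sigma for x in xs) / n
print(round(freq, 4), freq <= 1 / k ** 2)  # empirical tail vs Chebyshev bound 1/9
```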
Convergence concepts
Suppose X_1, X_2, … form a sequence of r.v.'s. Example: X_i is the sample variance using the first i observations.
X_n is said to converge almost surely (a.s.) to the random variable X as n → ∞ if and only if:
Pr(ω : X_n(ω) → X(ω), as n → ∞) = 1,
and we write X_n →^{a.s.} X, as n → ∞.
This is sometimes called strong convergence. It means that beyond some point in the sequence the difference will always be less than some positive ε, but that point depends on the outcome ω and is therefore random.
Application of strong convergence: Law of Large Numbers
The Law of Large Numbers
Suppose X_1, X_2, …, X_n are independent random variables with common mean E[X_k] = μ and common variance Var(X_k) = σ², for k = 1, 2, …, n. Define the sequence of sample means as:
X̄_n = (1/n)·∑_{k=1}^n X_k.
Then, according to the law of large numbers, for any ε > 0, we have:
lim_{n→∞} Pr(|X̄_n − μ| > ε) ≤ lim_{n→∞} Var(X̄_n)/ε² = lim_{n→∞} σ²/(n·ε²) = 0.
Proof, special case X_k ∼ N(μ, σ²): then X̄_n − μ ∼ N(0, σ²/n), and as n → ∞ we have lim_{n→∞} σ²/n = 0.
General case: whenever the second moment exists, use Chebyshev's inequality with Var(X̄_n) = σ²/n → 0.
The law of large numbers (LLN) is sometimes written as:
Pr(|X̄_n − μ| > ε) → 0, as n → ∞.
The result above is sometimes called the weak law of large numbers, and we sometimes write X̄_n →^p μ, because this is the same concept as convergence in probability to a constant.
However, there is also what we call the strong law of large numbers, which states that the sample mean converges almost surely to μ:
X̄_n →^{a.s.} μ, as n → ∞.
This is an important result in probability and statistics!
Intuitively, the law of large numbers states that the sample mean X̄_n converges to the true value μ.
How accurate the estimate is will depend on: I) how large the sample size is; II) the variance σ².
Application of LLN: Monte Carlo Integration
Suppose we wish to calculate:
I(g) = ∫_0^1 g(x) dx,
where elementary techniques of integration will not work.
Using the Monte Carlo method, we generate U[0, 1] variables, say X_1, X_2, …, X_n, and compute:
Î_n(g) = (1/n)·∑_{k=1}^n g(X_k),
where Î_n(g) denotes the approximation of I(g). We have: Î_n(g) →^{a.s.} I(g), as n → ∞.
Proof: next slide.
Proof: using the law of large numbers, we have Î_n(g) = (1/n)·∑_{k=1}^n g(X_k) →^{a.s.} E[g(X)], which is:
E[g(X)] = ∫_0^1 g(x)·1 dx = ∫_0^1 g(x) dx = I(g).
Try this in Excel using the integral of the standard normal density. How good is your approximation for 100 (1,000, 10,000, 100,000 and 1,000,000) random numbers?
This method is called Monte Carlo integration.
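A sketch of the suggested experiment (in Python rather than Excel): Monte Carlo integration of the standard normal density over [0, 1], whose true value is Φ(1) − Φ(0) ≈ 0.3413:

```python
import random, math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

random.seed(12)
n = 200000
approx = sum(phi(random.random()) for _ in range(n)) / n
print(round(approx, 3))  # close to Phi(1) - Phi(0) = 0.3413
```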
Application of LLN: Pooling of Risks in Insurance
Individuals may be faced with large and unpredictable losses. Insurance may help reduce the financial consequences of such losses by pooling individual risks. This is based on the LLN.
Suppose X_1, X_2, …, X_n are the amounts of losses faced by n different individuals who are homogeneous enough to have a common distribution, and these individuals pool together and each agrees to pay:
X̄_n = (1/n)·∑_{k=1}^n X_k.
Then the LLN tells us that the amount each person will end up paying becomes more predictable as the size of the group increases. In effect, this amount will become closer to μ, the average loss each individual expects.
Application of weak convergence: Central Limit Theorem
Central Limit Theorem
Suppose X_1, X_2, …, X_n are independent, identically distributed random variables with finite mean μ and finite variance σ². As before, denote the sample mean by X̄_n.
Then the central limit theorem states:
(X̄_n − μ)/(σ/√n) →^d N(0, 1), as n → ∞.
This holds for all r.v.'s with finite mean and variance, not only normal r.v.'s!
Proof & rewriting of the CLT: see the next slides.
Rewriting the Central Limit Theorem
We can write this result as:
lim_{n→∞} Pr((X̄_n − μ)/(σ/√n) ≤ x) = Φ(x),
for all x, where Φ(·) denotes the c.d.f. of a standard normal r.v.
Intuitively, for large n, the random variable:
Z_n = (X̄_n − μ)/(σ/√n)
is approximately standard normally distributed.
The central limit theorem is usually expressed in terms of the standardized sums S_n = ∑_{k=1}^n X_k. Then the CLT applies to the random variable:
Z_n = (S_n − n·μ)/(√n·σ) →^d N(0, 1), as n → ∞.
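A sketch of the CLT in its standardized-sum form (Exponential(1) summands, so μ = σ = 1; n and the number of replications are illustrative choices): the standardized sums have mean ≈ 0, standard deviation ≈ 1, and roughly symmetric tails even though the summands are heavily skewed:

```python
import random, statistics

random.seed(13)
n, reps = 50, 5000
# summands are Exponential(1): mu = 1, sigma = 1 -- far from normal
zs = [(sum(random.expovariate(1.0) for _ in range(n)) - n * 1.0) / (n ** 0.5 * 1.0)
      for _ in range(reps)]
print(round(statistics.mean(zs), 2), round(statistics.stdev(zs), 2))  # near 0 and 1
frac_below_0 = sum(z <= 0 for z in zs) / reps
print(round(frac_below_0, 2))  # near Phi(0) = 0.5 (slightly above, from the skew at n = 50)
```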
Proof of the Central Limit Theorem
Let X_1, X_2, … be a sequence of independent r.v.'s with mean μ and variance σ², and denote S_n = ∑_{i=1}^n X_i. Prove that:
Z_n = (S_n − n·μ)/(σ·√n)
converges to the standard normal distribution.
General procedure to prove X_n →^d X:
1. Find the m.g.f. of X: M_X(t);
2. Find the m.g.f. of X_n: M_{X_n}(t);
3. Take the limit n → ∞ of the m.g.f. of X_n, lim_{n→∞} M_{X_n}(t), and rewrite it; this should equal M_X(t).
Note: the expansions for log and exp are useful here (see F&T page 2)!
1. Proof: consider the case with μ = 0 and assume the m.g.f. of X_i exists. For Z ∼ N(0, 1) we have: M_Z(t) = exp(t²/2).
2. Recall Sn = Σ_{i=1}^n Xi. The m.g.f. of Zn = Sn/(σ·√n) = Σ_{i=1}^n Xi/(σ·√n) is obtained by:

MZn(t) =* MSn( t/(σ·√n) ) =** ( MXi( t/(σ·√n) ) )^n

* using Ma·X(t) = MX(a·t); ** using that Sn is the sum of n i.i.d. random variables Xi, thus M_{Σ_{i=1}^n Xi}(t) = (MXi(t))^n.

Note that we only assumed that:
MXi(t) = f(t, σ²);
E[Xi] = µ;
Var(Xi) = σ² < ∞,

hence this holds for any distribution of Xi with mean µ and finite variance!
Note: lim_{n→∞} b·n^{−c} = 0, for b ∈ R and c > 0.

Recall from week 1: 1) An m.g.f. uniquely defines a distribution; 2) The m.g.f. is a function of all moments.

Consider the Taylor series around zero for any M(t):

M(t) = Σ_{i=0}^∞ (t^i / i!) · M^(i)(t)|_{t=0}      (where M^(i)(t)|_{t=0} is the i-th moment)
     = M(0) + t·M^(1)(t)|_{t=0} + (1/2)·t²·M^(2)(t)|_{t=0} + O(t³),

where O(t³) covers all terms ck·t^k, with ck ∈ R for k ≥ 3.

We have M(0) = E[e^{0·X}] = 1 and, because we assumed that E[Xi] = 0:

M^(1)_{Xi}(t)|_{t=0} = E[Xi] = 0, and M^(2)_{Xi}(t)|_{t=0} = E[Xi²] = Var(Xi) + (E[Xi])² = σ².
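To make these moment identities concrete, a small numerical sketch (my addition): for a fair ±1 coin flip the m.g.f. is M(t) = cosh(t), and central differences recover M(0) = 1, M^(1)(0) = E[X] = 0, and M^(2)(0) = E[X²] = σ² = 1.

```python
import math

def M(t):
    """m.g.f. of X with Pr(X = 1) = Pr(X = -1) = 1/2: E[e^{tX}] = cosh(t)."""
    return math.cosh(t)

h = 1e-4
M0   = M(0.0)                                 # M(0) = E[e^0] = 1
M1_0 = (M(h) - M(-h)) / (2 * h)               # central difference ~ M'(0) = E[X] = 0
M2_0 = (M(h) - 2 * M(0.0) + M(-h)) / h**2     # ~ M''(0) = E[X^2] = sigma^2 = 1

print(M0, M1_0, M2_0)
```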
3. Proof continues on the next slide.
Now we can combine the results from the previous two slides:

lim_{n→∞} MZn(t) = lim_{n→∞} ( MXi( t/(σ·√n) ) )^n
                 = lim_{n→∞} ( Σ_{i=0}^∞ (t/(σ·√n))^i / i! · M^(i)_{Xi}(t)|_{t=0} )^n
                 = lim_{n→∞} ( 1 + 0 + (1/2)·(t/(σ·√n))²·σ² + O( (1/n)^{3/2} ) )^n

⇒ lim_{n→∞} log(MZn(t)) = lim_{n→∞} n · log( 1 + t²/(2n) + O( (1/n)^{3/2} ) )
                         =* lim_{n→∞} n · ( t²/(2n) + O( (1/n)^{3/2} ) )
                         = t²/2,

since n · ( O((1/n)^{3/2}) + O((1/n)²) ) = O((1/n)^{1/2}) → 0 as n → ∞. Hence lim_{n→∞} MZn(t) = exp(t²/2) = MZ(t), the m.g.f. of a standard normal.

* using log(1 + a) = Σ_{i=1}^∞ (−1)^{i+1}·a^i / i = a + O(a²), with a = t²/(2n) + O((1/n)^{3/2}).
Application of the CLT: An insurer offers builder's risk insurance. It has 400 contracts per year and has offered the product for 9 years. The sample mean of a claim is $10 million and the sample standard deviation is $25 million.

Question: What is the probability that in a year the total claim size is larger than $5 billion?

Solution: Using the CLT (why is σ ≈ the sample s.d.?):

(X̄n − µ) / (σ/√n) →d N(0, 1), as n → ∞
⇒ X̄n ∼ N( µ, (σ/√n)² )   (approximately)
⇒ n·X̄n ∼ N( n·µ, n·σ² )
⇒ 0.9772 = Pr( 400·X̄400 ≤ 400·10 million + 2·20·25 million ).

Thus, Pr( 400·X̄400 > $5 billion ) = 1 − 0.9772 = 0.0228.
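The arithmetic above can be reproduced directly (a sketch of the slide's numbers; µ and σ are the sample estimates from the 9 years of data):

```python
import math

mu, sigma, n = 10e6, 25e6, 400        # sample mean/s.d. per claim; contracts per year
total_mean = n * mu                    # 400 * $10m = $4 billion
total_sd = math.sqrt(n) * sigma        # 20 * $25m  = $500 million
z = (5e9 - total_mean) / total_sd      # standardized threshold: ($5b - $4b)/$0.5b = 2

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p_exceed = 1 - phi(z)
print(f"z = {z:.2f}, Pr(total > $5 billion) = {p_exceed:.4f}")
```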
Application of convergence in distribution: Normal Approximation to the Binomial
Normal Approximation to the Binomial

From week 2 we know: a Binomial random variable is the sum of Bernoulli random variables. Let Xk ∼ Bernoulli(p). Then:

S = X1 + X2 + . . . + Xn

has a Binomial(n, p) distribution.

Applying the Central Limit Theorem, S must be approximately normal with mean E[S] = n·p and variance Var(S) = n·p·q, so that approximately for large n we have:

(S − n·p) / √(n·p·q) ∼ N(0, 1).

Question: What is the probability that X = 60 if X ∼ Bin(1000, 0.06)? Not in the Binomial tables!
In practice, for large n and for p around 0.5 (in particular np > 5 and np(1 − p) > 5, or n > 30), we can approximate the binomial probabilities with the Normal distribution.

Use µ = n·p and σ² = n·p·(1 − p).

Continuity correction for the binomial: note that a Binomial random variable X takes integer values k = 0, 1, 2, . . ., but the Normal distribution is continuous, so for the value:

Pr(X = k),

we require the Normal approximation:

Pr( ((k − 1/2) − µ)/σ < Z < ((k + 1/2) − µ)/σ ),

and similarly for the probability Pr(X ≤ k).
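Applying this to the earlier question Pr(X = 60) for X ∼ Bin(1000, 0.06), a sketch comparing the exact p.m.f. (out of range of standard tables) with the continuity-corrected normal value:

```python
import math

n, p, k = 1000, 0.06, 60
mu = n * p                            # 60
sigma = math.sqrt(n * p * (1 - p))    # sqrt(56.4) ~ 7.51

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Exact binomial probability Pr(X = 60):
exact = math.comb(n, k) * p**k * (1 - p)**(n - k)

# Continuity-corrected normal approximation Pr(k - 1/2 < S < k + 1/2):
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)

print(f"exact {exact:.5f} vs normal approx {approx:.5f}")
```

Both values are about 0.053; the correction matters because without it the normal approximation of a single point mass would be zero.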
Figure (normal approximation to the Binomial): four panels compare the Binomial(n, 0.1) p.m.f. with the approximating normal p.d.f.: n = 5 with N(0.5, 0.45); n = 10 with N(1, 0.9); n = 30 with N(3, 2.7); n = 200 with N(20, 18). The approximation improves as n grows.
Application of convergence in distribution: Normal Approximation to the Poisson
Normal approximation to the Poisson

Approximation of the Poisson by the Normal for large values of λ.

Let Xn be a sequence of Poisson random variables with increasing parameters λ1, λ2, . . . such that λn → ∞.

We have:
E[Xn] = λn
Var(Xn) = λn

Standardize the random variable (i.e., subtract the mean and divide by the standard deviation):

Zn = (Xn − E[Xn]) / √Var(Xn) = (Xn − λn) / √λn →d Z ∼ N(0, 1).

Proof: see next slides.
1. We have the m.g.f. of Z: MZ(t) = exp(t²/2).

2. Next, we need to find the m.g.f. of Zn. We know (week 2):

MXn(t) = exp( λn·(e^t − 1) ).

Thus, using the calculation rules for m.g.f., we have:

MZn(t) = M_{(Xn − λn)/√λn}(t) = M_{Xn/√λn − √λn}(t)
       =* exp(−√λn·t) · MXn( t/√λn )
       = exp(−√λn·t) · exp( λn·(e^{t/√λn} − 1) )
       = exp( −√λn·t + λn·(e^{t/√λn} − 1) )

* using Ma·X+b(t) = exp(b·t) · MX(a·t).
3. Find the limit of MZn(t) and prove that it equals MZ(t):

lim_{n→∞} MZn(t) = lim_{n→∞} exp( −√λn·t + λn·(e^{t/√λn} − 1) )

⇒ lim_{n→∞} log(MZn(t)) = lim_{n→∞} −t·√λn + λn·( e^{t/√λn} − 1 )
                         =* lim_{n→∞} −t·√λn + λn·( 1 + t/√λn + (1/2!)·(t/√λn)² + (1/3!)·(t/√λn)³ + . . . − 1 )
                         = lim_{n→∞} (1/2!)·t² + O( 1/√λn ) = t²/2

⇒ lim_{n→∞} MZn(t) = exp(t²/2) = MZ(t).

* using the exponential expansion: e^a = Σ_{i=0}^∞ a^i/i!, with a = t/√λn.
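A numerical sketch (my addition) of the result just proved: for λ = 100, the exact Poisson cdf is already close to the continuity-corrected normal cdf.

```python
import math

lam, k = 100, 110

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Exact Pr(X <= k) for X ~ Poisson(lam), accumulating p.m.f. terms recursively:
term = math.exp(-lam)        # p(0) = e^{-lam}
cdf = term
for i in range(1, k + 1):
    term *= lam / i          # p(i) = p(i-1) * lam / i
    cdf += term

# Normal approximation with continuity correction: Z = (X - lam)/sqrt(lam).
approx = phi((k + 0.5 - lam) / math.sqrt(lam))

print(f"exact {cdf:.4f} vs normal approx {approx:.4f}")
```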
Figure (normal approximation to the Poisson): four panels compare the Poisson(λ) p.m.f. with the approximating normal p.d.f.: λ = 0.1 with N(0.1, 0.1); λ = 1 with N(1, 1); λ = 10 with N(10, 10); λ = 100 with N(100, 100). The approximation improves as λ grows.
Summary
Parameter estimators

Method of moments:
1. Equate the first k sample moments to the corresponding k population moments;
2. Express the k population moments in terms of the parameters of the distribution;
3. Solve the resulting system of simultaneous equations.

Maximum likelihood:
1. Determine the likelihood function L(θ1, θ2, . . . , θk; x);
2. Determine the log-likelihood function;
3. Set the partial derivatives of the log-likelihood with respect to θ1, θ2, . . . , θk to zero (⇒ global/local minimum/maximum);
4. Check whether the second derivative is negative (maximum) and check the boundary conditions.

Bayesian:
1. Find the posterior density using (1) (difficult/tedious integral!) or (2);
2. Compute the Bayesian estimator under a given loss function.
LLN & CLT

Law of large numbers: Let X1, . . . , Xn be independent random variables with equal mean E[Xk] = µ and variance Var(Xk) = σ² for k = 1, . . . , n. Then for all ε > 0 we have:

Pr( |X̄n − µ| > ε ) → 0, as n → ∞.

Central limit theorem: Let X1, . . . , Xn be independent and identically distributed random variables with mean E[Xk] = µ and variance Var(Xk) = σ² for k = 1, . . . , n, then: