Sampling Distributions of Estimators and Hypothesis Testing

Weerachart T. Kilenthong

Research Institute for Policy Evaluation and Design (RIPED), University of the Thai Chamber of Commerce
Random Variable and Distribution Function

Definition (Random Variable)
A random variable X is a real-valued function defined on the sample space S.

We usually use a random variable to represent an uncertain situation. In fact, we assign a value to each outcome, e.g.,

- Raining: we could assign values to a random variable X representing the raining experiment, i.e., X = 1 if it is raining, and X = 0 otherwise.

To understand a random variable completely, we need to know its distribution function, e.g.,

- Raining: suppose that the probability of raining is p = 0.6. Its distribution function is defined by

    Pr(X = 1) = 0.6,
    Pr(X = 0) = 0.4.

- Income: suppose that household income is distributed as log-normal with parameters (µ = 10, σ² = 100).
Distribution Function from Simulated Data: Bernoulli Distribution

Figure: Observed fractions from 10,000 draws of the Bernoulli distribution with p = 0.6: 0.4006 at x = 0 and 0.5994 at x = 1.
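The figure above can be reproduced with a short simulation. A minimal sketch (the use of numpy and the seed are my assumptions; the slides show no code):

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility
p = 0.6
draws = rng.binomial(n=1, p=p, size=10_000)  # 10,000 Bernoulli(p) draws

# Observed fractions approximate Pr(X = 0) and Pr(X = 1)
print("fraction of 0s:", np.mean(draws == 0))
print("fraction of 1s:", np.mean(draws == 1))
```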
Distribution Function from Simulated Data: CDF of the Log-Normal Distribution

Figure: Empirical cumulative distribution function from 10,000 draws of the log-normal distribution (µ = 10, σ = 10).
Distribution Function from the Population

The question then is: how can we get the distribution of a random variable we are interested in?

Of course, if we can observe the whole population, then we can just calculate the cumulative distribution function

    F(x) = Pr(X ≤ x).

For example, for each level of income x, we can estimate F(x) using the fraction of households whose income is no more than x.

Then, we can get the probability density function (p.d.f.) f(x) by

    f(x) = dF(x)/dx,

which is the slope of the c.d.f. F(x).
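A minimal sketch of this "fraction no greater than x" idea, using simulated log-normal incomes as a stand-in for population data (the numpy calls and parameters are my assumptions, matching the slide's example):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
# Hypothetical "population": log-normal incomes as in the slides (mu=10, sigma=10)
income = rng.lognormal(mean=10, sigma=10, size=10_000)

def ecdf(data, x):
    """Empirical c.d.f.: fraction of observations no greater than x."""
    return np.mean(data <= x)

print(ecdf(income, 20.0))  # estimate of F(20) = Pr(X <= 20)
```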
Distribution Function from Simulated Data: p.d.f. of the Log-Normal Distribution

Figure: Empirical probability density function from 10,000 draws of the log-normal distribution (µ = 10, σ = 10).
Population Mean and Variance

Sometimes it is more convenient to characterize a distribution function by moments or expectations.

The most popular moments are the mean E[X] ≡ µ and the variance Var[X] ≡ σ²:

    µ = E[X] = ∫ x f(x) dx,
    σ² = Var[X] = ∫ (x − E[X])² f(x) dx.

One of the reasons is that (µ, σ²) are sufficient statistics for a Normal distribution. That is, if we know (µ, σ²) of a Normal distribution, then we know the whole distribution. Note: why do we care about the Normal distribution so much?
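As a quick illustration of these integral definitions, a sketch that recovers µ and σ² of the standard Normal by numerical integration (scipy and the choice of N(0, 1) are my additions, not from the slides):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Mean and variance of N(0, 1) computed directly from the integral definitions
mu, _ = quad(lambda x: x * norm.pdf(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu) ** 2 * norm.pdf(x), -np.inf, np.inf)
print(mu, var)  # approximately 0 and 1
```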
Sample and Population

Unfortunately, it is almost impossible to observe the whole population.

Therefore, we need to use estimation and inference.

The observed data we obtain is called a sample. We, of course, need to use the sample to infer the underlying random variable, i.e., the truth. We cannot wait for the population data.

Problem: obviously, a sample is not the same as the population. What can we do?
Estimation of Mean and Variance

We can use a non-parametric method to estimate the whole distribution directly. But it requires a very large sample, which we usually do not have. This approach is the best if you can do it, of course.

Most of the time, though, we will estimate only the mean and the variance.

There are several underlying theories that give us the following estimators for the mean and the standard deviation:

    µ̂ = x̄_n = (Σ_{i=1}^n x_i)/n,
    σ̂ = √( Σ_{i=1}^n (x_i − x̄_n)² / n ).

Note: there is an alternative estimator:

    σ̂ = √( Σ_{i=1}^n (x_i − x̄_n)² / (n − 1) ),

whose square is an unbiased estimator of the variance. But both will be very close when n is large.
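A minimal sketch of the two divisors side by side (numpy and the simulated sample are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x = rng.normal(loc=0.0, scale=1.0, size=50)

mu_hat = x.mean()                         # mean estimator
sigma_hat_n = x.std(ddof=0)               # divides by n
sigma_hat_n1 = x.std(ddof=1)              # divides by n - 1 (unbiased variance)
print(mu_hat, sigma_hat_n, sigma_hat_n1)  # the two sigma estimates are close
```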
Estimators as Random Variables

Question: should we consider an estimator a constant number or a random variable?

Answer: given that we use a sample, which is not the population, we need to treat an estimator as a random variable.

As a result, an estimator itself must have a distribution function, which is the key to statistical analysis.

A statement from a statistical analysis, therefore, needs to incorporate the distribution function of the estimator. Technically, this procedure is called statistical inference.
Distribution of the Mean Estimator or Average

Question: what is the distribution of the average

    µ̂ = (Σ_{i=1}^n x_i)/n?

- Using the law of large numbers, we know that the average converges to the true parameter µ; in particular, E[µ̂] = µ.

Question: what is the distribution of the estimator?

- If we assume that the distribution is Normal (we usually prove this using the central limit theorem), then we can simply ask: what is the variance of the mean estimator?
Standard Error of the Average

We usually estimate the standard deviation of an estimator by the standard error (s.e.).

Question: what is the standard error of the mean estimator of a Normal distribution?

Back to basics: what is a standard error?

- It is the standard deviation of the mean estimator.
- But how can we calculate it?

Statistical programs (e.g., Stata, R) usually calculate the s.e. for you.

Here we will show you how to construct it manually. The purpose is not for you to do it this way, but to show you its meaning.
Simulated Data from the Standard Normal Distribution with (µ = 0, σ² = 1)

We begin by simulating data from a known Normal distribution with (µ = 0, σ² = 1). These parameters are, in this case, the true parameters.

Suppose that we simulate 10,000 observations.

You can see the distribution of each sample set in the following figure.

- It gets closer to the theoretical distribution when the sample size is larger.
Simulated Data from a Normal Distribution with (µ = 0, σ² = 1)

Figure: Distribution of simulated samples from the standard normal distribution for sample sizes n = 30, 100, 500, 1,000, 5,000, and 10,000.
Standard Error of the Average from Simulated Data

We will randomly draw samples of size n = 50, 100, 500, and 1,000 from the 10,000 observations, 1,000 times for each sample size.

Then, we calculate the average, i.e., the mean estimator, for each draw:

    µ̂_n = (Σ_{i=1}^n x_i)/n.

We then get the distribution of the averages µ̂_n from those draws.

This is the distribution of the estimator. Of course, the accuracy depends on the number of draws.
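A minimal sketch of this resampling exercise (numpy, the seed, and the loop structure are my assumptions; the slides describe the procedure only in words):

```python
import numpy as np

rng = np.random.default_rng(seed=4)
population = rng.normal(loc=0.0, scale=1.0, size=10_000)  # simulated "population"

for n in (50, 100, 500, 1000):
    # 1,000 random draws of size n; the spread of the averages
    # estimates the standard error of the mean estimator
    averages = [rng.choice(population, size=n, replace=False).mean()
                for _ in range(1000)]
    print(n, np.std(averages))  # shrinks roughly like 1/sqrt(n)
```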
Standard Error of the Average from Simulated Data

Figure: Distribution of the estimated mean or average from randomized samples.
Standard Error from Theory

Question: do we need to do this bootstrapping all the time?

Answer: we usually do not do it this way. We use probability theory to guide us on how to calculate the standard error (s.e.) of an estimator.

For example, using a central limit theorem, we can show that √n(X̄_n − µ) has the Normal distribution with mean zero and variance σ². Hence, Var[X̄_n] = σ²/n.

As a result, we can approximate the standard error of the average, i.e., the mean estimator, by σ̂/√n.

That is, we replace σ by its estimator σ̂:

    σ̂ = √( Σ_{i=1}^n (x_i − x̄_n)² / (n − 1) ).

As a result, the standardized average (X̄_n − µ)/(σ̂/√n) has the t distribution with n − 1 degrees of freedom.
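A sketch of the theoretical shortcut σ̂/√n, to compare against the resampling approach above (the simulated sample is my assumption):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n = 100
x = rng.normal(loc=0.0, scale=1.0, size=n)

sigma_hat = x.std(ddof=1)           # estimate sigma with the n - 1 divisor
se_theory = sigma_hat / np.sqrt(n)  # s.e. of the average from theory
print(se_theory)                    # close to 1/sqrt(100) = 0.1
```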
Standard Error Is Not the Standard Deviation of the Sample

One may wonder whether the standard error is the standard deviation of the data.

The answer is clearly NO.

To see this further, let us look at the standard error of the standard deviation estimator.
Standard Error of the Standard Deviation σ̂ from Simulated Data

Figure: Distribution of the estimated standard deviation from randomized samples.
Proportion/Fraction as Probability

Another simple but useful estimator is a fraction p̂_i, the ratio of the number of observations in a group, n_i, to the total number of observations, n:

    p̂_i = n_i/n.

This fraction is an efficient estimator of the probability of belonging to group i in the population.

Example: Bernoulli distribution with parameter p; the outcome is either 1 or 0. The estimator for p is

    p̂ = x̄_n = (Σ_{i=1}^n x_i)/n = n_1/n,

where n_1 is the number of observations with x_i = 1.

The key point: approximately, p̂ ∼ N(p, p(1 − p)/n).
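A minimal sketch of the proportion estimator and its standard error (the sample and parameters are my assumptions, chosen to match the simulation on the next slide):

```python
import numpy as np

rng = np.random.default_rng(seed=6)
n, p = 500, 0.75
x = rng.binomial(n=1, p=p, size=n)

p_hat = x.mean()                         # proportion estimator n1/n
se_p = np.sqrt(p_hat * (1 - p_hat) / n)  # s.e. from p(1-p)/n
print(p_hat, se_p)
```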
Proportion/Fraction from Simulation

In theory, the standard error for p̂ is √(p(1 − p)/n).

Figure: Distribution of the estimated p̂ for n = 50, 100, 500, and 1,000; the true parameter is p = 0.75.
Bayes Estimation for a Normal Distribution

Assume that X_1, X_2, ..., X_n is a random sample from N(µ, σ²) and σ² is known.

Assume that the prior is µ ∼ N(µ_0, σ_0²).

The key point: the posterior of µ is also normal, with mean and variance

    µ_1 = (σ²µ_0 + nσ_0² X̄_n)/(σ² + nσ_0²)  and  σ_1² = σ²σ_0²/(σ² + nσ_0²),

where X̄_n is the sample mean and n is the sample size.
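A minimal sketch of these posterior formulas (the prior hyperparameters and the simulated sample are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=7)
sigma2 = 1.0              # known data variance
mu0, sigma2_0 = 0.0, 4.0  # hypothetical prior N(mu0, sigma2_0)
n = 50
x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
xbar = x.mean()

# Posterior mean and variance from the formulas above
mu1 = (sigma2 * mu0 + n * sigma2_0 * xbar) / (sigma2 + n * sigma2_0)
sigma2_1 = (sigma2 * sigma2_0) / (sigma2 + n * sigma2_0)
print(mu1, sigma2_1)  # the posterior concentrates around the data as n grows
```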
Simulation of the Posterior Distribution

Figure: Distribution of the estimated µ_1; the true parameter is µ = 0.
Maximum Likelihood Estimation for a Normal Distribution

Assume that X_1, X_2, ..., X_n is a random sample from N(µ, σ²).

For this example, the likelihood function is

    L = ∏_{i=1}^n φ( (x_i − µ)/σ | µ, σ² ).    (1)

The MLE of θ = (µ, σ²) is θ̂ = (µ̂, σ̂²) = ( x̄_n, Σ_{i=1}^n (x_i − x̄_n)²/n ), and the asymptotic normality result states that, when θ_0 = (µ_0, σ_0²) is the true parameter,

    √n(µ̂ − µ_0) →d N(0, σ_0²),
    √n(σ̂² − σ_0²) →d N(0, 2(σ_0²)²).
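A minimal sketch of the normal MLEs and the standard errors implied by the limit results above (the simulated sample is my assumption):

```python
import numpy as np

rng = np.random.default_rng(seed=8)
n = 1000
x = rng.normal(loc=0.0, scale=1.0, size=n)

# Normal MLEs: the sample mean and the variance with the n divisor
mu_mle = x.mean()
sigma2_mle = np.mean((x - mu_mle) ** 2)

# Asymptotic standard errors implied by the limit results above
se_mu = np.sqrt(sigma2_mle / n)
se_sigma2 = np.sqrt(2 * sigma2_mle ** 2 / n)
print(mu_mle, sigma2_mle, se_mu, se_sigma2)
```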
Simulation of Estimators Using MLE

Figure: Distribution of the estimated µ̂ and σ̂²; the true parameters are µ = 0 and σ² = 1.
Method of Moments Estimation for a Normal Distribution

Assume that X_1, X_2, ..., X_n is a random sample from N(µ, σ²).

The moment conditions for this example are

    (Σ_{i=1}^n x_i)/n = E(X) = µ,    (2)
    (Σ_{i=1}^n x_i²)/n = E(X²) = µ² + σ².    (3)

The MM estimators of θ = (µ, σ²) are θ̂ = (µ̂, σ̂²) = ( x̄_n, Σ_{i=1}^n (x_i − x̄_n)²/n ).

In general, bootstrap (Monte Carlo simulation) methods provide approximations to the sampling distributions of MM estimators.
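A minimal sketch of the MM estimators plus a bootstrap approximation to the sampling distribution, as suggested above (the sample, seed, and number of bootstrap replications are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=9)
x = rng.normal(loc=0.0, scale=1.0, size=500)

# MM estimators from the first two sample moments
mu_mm = x.mean()
sigma2_mm = np.mean(x ** 2) - mu_mm ** 2  # E[X^2] - (E[X])^2

# Bootstrap approximation to the sampling distribution of sigma2_mm
boot = [np.var(rng.choice(x, size=x.size, replace=True)) for _ in range(1000)]
print(sigma2_mm, np.std(boot))  # point estimate and bootstrap s.e.
```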
Simulation of Estimators Using the Method of Moments

Figure: Distribution of the estimated µ̂ and σ̂²; the true parameters are µ = 0 and σ² = 1.
Hypothesis Testing

In general, we apply a statistical analysis to test whether a hypothesis can be rejected or not.

This is a main reason why we need to know the distribution, or the standard error, of an estimator.

We will not go into detail about the many types of tests here. We will simply focus on the basic concepts that are useful for simple analysis.
Null and Alternative Hypothesis

We usually want to test whether a parameter θ is in a set Ω_0 or not:

    H_0 : θ ∈ Ω_0,

which is called the null hypothesis. Consequently, at the same time, the alternative hypothesis is automatically defined as

    H_1 : θ ∈ Ω_1,

where Ω_0 and Ω_1 form a disjoint partition: Ω_0 ∩ Ω_1 = ∅ and Ω_0 ∪ Ω_1 = Ω.

If we knew the true value of θ, it would be very easy to tell which hypothesis is true. But unfortunately we usually do not know the true value. We can at best estimate it.
Test Statistics

To simplify a test further, we usually define a test statistic T = r(X), which is a function of the observables X. Example:

- t-statistic: it is a test statistic for testing whether a parameter is different from zero:

    t = θ̂ / s.e.(θ̂),

  where s.e.(θ̂) is the standard error of the estimator θ̂. If the parameter of interest is the mean, whose estimator is the average, we can then test whether the mean is equal to a constant µ or not, using the following t-stat:

    t = (X̄_n − µ) / (σ̂/√n),

  which has the t distribution with n − 1 degrees of freedom. Note that here θ̂ = X̄_n − µ and s.e.(θ̂) = σ̂/√n.
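A minimal sketch of this t-stat computed by hand and cross-checked against scipy's built-in test (the sample, seed, and µ_0 = 0 are my assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=10)
n = 30
x = rng.normal(loc=0.2, scale=1.0, size=n)

# t-stat for H0: mean equals mu0, built from the formula above
mu0 = 0.0
t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
p_value = 2 * stats.t.sf(np.abs(t), df=n - 1)  # two-sided p-value

print(t, p_value)
print(stats.ttest_1samp(x, popmean=mu0))  # should match
```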
t and Normal Distribution

Figure: Normal and t distributions (df = 3, 5, 20, 30).
Error Type I and Error Type II: Graphical Representation
Figure: Error type I and error type II
Error Type I and Error Type II: Formal Definition
Figure: Error type I and error type II
Significance Level of a Test

We usually want to report how large the Type I error is: the significance level α.

Figure: Significance level.
p-Value

Definition
The p-value is the smallest level α such that we would reject H_0 at level α with the observed data.

Alternatively, the p-value tells us the smallest significance level at which the observed data lead us to reject H_0.

Figure: p-value.
Confidence Interval

Figure: Confidence interval: the daily net return of ABSM from 11/09/2017 to 11/09/2018, with fitted values and the 95% CI.
Comparing Means of Two Distributions: t-Distribution

When we want to compare the means of two distributions with the same variance σ², we test

    H_0 : µ_1 = µ_2.

We can use the following statistic to test the hypothesis:

    U = [ (X̄_n − Ȳ_m) / (σ√(1/n + 1/m)) ] / √[ (S_X² + S_Y²) / (σ²(n + m − 2)) ],

where S_X² = Σ_{i=1}^n (x_i − X̄_n)² and S_Y² = Σ_{j=1}^m (y_j − Ȳ_m)²; note that the unknown σ cancels out.

The key point is that U has the t distribution with n + m − 2 degrees of freedom.
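A minimal sketch of this pooled two-sample statistic, cross-checked against scipy's equal-variance t-test (the two simulated samples are my assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=11)
x = rng.normal(loc=0.0, scale=1.0, size=40)  # sample from distribution 1
y = rng.normal(loc=0.0, scale=1.0, size=60)  # sample from distribution 2
n, m = x.size, y.size

# Pooled two-sample t statistic: S_X^2, S_Y^2 are sums of squared deviations
ssx = np.sum((x - x.mean()) ** 2)
ssy = np.sum((y - y.mean()) ** 2)
u = (x.mean() - y.mean()) / np.sqrt((ssx + ssy) / (n + m - 2) * (1 / n + 1 / m))
p_value = 2 * stats.t.sf(np.abs(u), df=n + m - 2)
print(u, p_value)
print(stats.ttest_ind(x, y, equal_var=True))  # should match
```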
Comparing Means of Two Distributions: Example

Figure: Comparing means of two distributions: kernel density estimates of the daily net returns of ABSM and 1AMSET50 from 11/09/2017 to 11/09/2018.
Comparing Means of Two Distributions: Example

We set up a hypothesis test to compare the means of two funds:

    H_0 : µ_ABSM = µ_1AMSET50,
    H_a : µ_ABSM ≠ µ_1AMSET50.
Comparing Variances of Two Distributions: F-Distribution

We can also compare the variances of two distributions:

    H_0 : σ_1 = σ_2.

We can use the following statistic to test the hypothesis:

    U = [ S_X² / (n − 1) ] / [ S_Y² / (m − 1) ].

The key point is that U has the F distribution with n − 1 and m − 1 degrees of freedom.
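A minimal sketch of the variance-ratio test (the simulated samples and the two-sided p-value construction are my assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=12)
x = rng.normal(loc=0.0, scale=1.0, size=40)
y = rng.normal(loc=0.0, scale=1.5, size=60)
n, m = x.size, y.size

# F statistic: ratio of the two sample variances (sums of squares over df)
u = (np.sum((x - x.mean()) ** 2) / (n - 1)) / (np.sum((y - y.mean()) ** 2) / (m - 1))
p_value = 2 * min(stats.f.cdf(u, n - 1, m - 1), stats.f.sf(u, n - 1, m - 1))
print(u, p_value)  # a small p-value suggests unequal variances
```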
Comparing Variances of Two Distributions: Example

We set up a hypothesis test to compare the variances of two funds:

    H_0 : σ_ABSM = σ_1AMSET50,
    H_a : σ_ABSM ≠ σ_1AMSET50.

Figure: Comparing variances of two distributions: the daily net returns of ABSM and 1AMSET50 from 11/09/2017 to 11/09/2018.
Comparing Two Distributions: χ²-Distribution

We sometimes would like to compare two distributions.

With categorical or discrete data, we can form the null hypothesis as

    H_0 : p_i^j = p_i^0 for i = 1, ..., k and for all j,

where each sample can be categorized into k groups.

The test statistic is

    Q = Σ_{i=1}^k (N_i − N p_i^0)² / (N p_i^0),

where N_i is the number of observations in group i, and Σ_{i=1}^k N_i = N.

The key point: Q ∼ χ²_{k−1}.
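A minimal sketch of the Q statistic, cross-checked against scipy's goodness-of-fit test (the counts and null proportions are hypothetical, not from the slides):

```python
import numpy as np
from scipy import stats

# Hypothetical counts in k = 4 groups and null proportions p0
counts = np.array([30, 45, 15, 10])
p0 = np.array([0.25, 0.40, 0.20, 0.15])
N = counts.sum()

# Q statistic from the formula above; chi-square with k - 1 df under H0
q = np.sum((counts - N * p0) ** 2 / (N * p0))
p_value = stats.chi2.sf(q, df=counts.size - 1)
print(q, p_value)
print(stats.chisquare(counts, f_exp=N * p0))  # should match
```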
Comparing Two Distributions: Example

We set up a hypothesis test to compare the proportions of funds across two categorical variables, type and policy:

    H_0 : p_i^j = p_i^Total,
    H_a : p_i^j ≠ p_i^Total,

where i ∈ {Equity, Fixed, Mixed, Other} and j ∈ {global, local, Total}.