Fundamentals of Mathematical Statistics
Read Wooldridge, Appendix C:
Fundamentals of Mathematical Statistics: Part One . Intensive Course in Mathematics and Statistics . Chairat Aemkulwat
Outline: Fundamentals of Mathematical Statistics
Part One
I. Populations, Parameters, and Random Sampling
II. Finite Sample Properties of Estimators
III. Asymptotic or Large Sample Properties of Estimators

Part Two
IV. General Approaches to Parameter Estimation
V. Interval Estimation and Confidence Intervals

Part Three
VI. Hypothesis Testing
VII. Remarks on Notation
I. Populations, Parameters, and Random Sampling
• Population refers to any well‐defined group of subjects.
• Statistical inference involves learning something about the population from a sample.
• Parameters are constants that determine the directions and strengths of relationships among variables.
• By “learning”, we can mean several things.
– Most important are estimation and hypothesis testing.

Example:
• Suppose our interest is to find the average percentage increase in wage given an additional year of education.
– Population: wage and education of 33 million working people.
– Sample: data on a subset of the population.
Example: Results:
o The return to education is 7.5% ‐ an example of a point estimate.
o The return to education is between 5.6% and 9.4% ‐ an example of an interval estimate.
o Does education affect wage? ‐ an example of hypothesis testing.
Sampling
• Let Y be a random variable representing a population with a probability density function f(y; θ).
• The probability density function (pdf) of Y is assumed to be known except for the value of θ.
– Different values of θ imply different population distributions.
Random Sampling: Definition
• If Y1, …, Yn are independent random variables with a common probability density function f(y; θ), then {Y1, …, Yn} is a random sample from the population represented by f(y; θ).
• We also say the Yi are i.i.d. (independent, identically distributed) random variables from f(y; θ).
Example: a random sample from the normal distribution.
• If Y1, …, Yn are independent random variables with a normal distribution with mean μ and variance σ², then {Y1, …, Yn} is a random sample from the Normal(μ, σ²) population.
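As a quick illustration (not part of the original slides), drawing one realization of such a random sample might look like this in Python with NumPy, using the hypothetical values μ = 2, σ² = 1, n = 10:

```python
import numpy as np

# Hypothetical population parameters and sample size (assumptions for illustration)
mu, sigma, n = 2.0, 1.0, 10

rng = np.random.default_rng(0)
# {Y1, ..., Yn}: i.i.d. draws from the Normal(mu, sigma^2) population
sample = rng.normal(loc=mu, scale=sigma, size=n)
print(sample)
```

Each run with a different seed yields a different realization {y1, …, yn} of the same random sample {Y1, …, Yn}.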
Sampling
Example: working population
• We may obtain a sample of 100 families.
– Note that the data we observe will differ for each different sample. A sample provides a set of numbers, say, {y1, …, yn}.
Example: a random sample from the Bernoulli distribution.
• If Y1, …, Yn are independent random variables, each distributed as Bernoulli(θ), so that
P(Yi = 1) = θ and P(Yi = 0) = 1 - θ,
then {Y1, …, Yn} constitutes a random sample from the Bernoulli(θ) distribution.
• Note that Yi = 1 if passenger i shows up and Yi = 0 otherwise.
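A sketch of this Bernoulli sampling scheme, under an assumed show-up probability θ = 0.85 (the slides do not give a value):

```python
import numpy as np

# Assumed show-up probability and number of passengers (hypothetical values)
theta, n = 0.85, 1000

rng = np.random.default_rng(1)
# Yi = 1 if passenger i shows up, Yi = 0 otherwise; P(Yi = 1) = theta
y = rng.binomial(1, theta, size=n)

print(y[:10])     # first few outcomes
print(y.mean())   # the sample proportion is the natural estimate of theta
```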
II. Finite Sample Properties of Estimators
• A “finite sample” implies a sample of any size, no matter how large or small.
– Small sample properties.
• Asymptotic properties have to do with the behavior of estimators as the sample size grows without bound.
A. Unbiasedness
B. Variance
C. Efficiency
Estimators and Estimates
• Suppose {Y1, …, Yn} is a random sample from a population distribution that depends on an unknown parameter θ.
– An estimator of θ is a rule that assigns a value of θ to each possible outcome of the sample.
– The rule is specified before any sampling is carried out.
• An estimator W of a parameter θ can be expressed as
W = h(Y1, …, Yn)
for some known function h.
• When a particular set of values, say {y1, …, yn}, is plugged into the function h, we obtain the estimate of θ.
Estimators and Estimates: sampling distribution
• The distribution of an estimator is called its sampling distribution.
– It describes the likelihood of various outcomes of W across different random samples.
• The entire sampling distribution of W can be obtained given the probability distribution of the Yi and the function h.
Estimators and Estimates
Example:
• Let {Y1, …, Yn} be a random sample from a population with mean μ. The natural estimator of μ is the average of the random sample:
Ȳ = (1/n)(Y1 + … + Yn)
Ȳ is called the sample average.
• Unlike in Appendix A, where the sample average of a set of numbers was defined as a descriptive statistic, we now view Ȳ as an estimator.
Example C.1: City Unemployment Rates
• For actual data outcomes y1, …, yn, the estimate is the average in the sample:
ȳ = (1/n)(y1 + … + yn)
• Estimator: Ȳ = (1/n)(Y1 + … + Yn)
• Estimate: ȳ = 6.0
– Our estimate of the average city unemployment rate in the U.S. is 6.0%.
Notes:
1) Each sample results in a different estimate.
2) The rule for obtaining the estimate is the same.
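A sketch of the estimator/estimate distinction, using ten made-up city unemployment rates chosen so that the average works out to 6.0 (illustrative numbers, not the actual Table C.1 data):

```python
import numpy as np

# Hypothetical unemployment rates (%) for n = 10 cities (made-up data)
y = np.array([5.1, 6.4, 9.2, 4.1, 7.5, 8.3, 2.6, 3.5, 7.5, 5.8])

# The estimator is the rule "take the sample average"; applying the rule
# to the observed outcomes y1, ..., yn produces the estimate.
estimate = y.mean()
print(round(estimate, 1))  # 6.0
```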
Unbiasedness
Unbiased Estimator: Definition
An estimator W of θ is unbiased if
E(W) = θ
for all possible values of θ.
– Intuitively, if the estimator is unbiased, then its probability distribution has an expected value equal to the parameter it is supposed to be estimating.
• Unbiasedness does not mean that the estimate from a particular sample is equal to θ, or even very close to θ.
• If we could indefinitely draw random samples on Y from the population,
– then averaging the estimates over all random samples would give θ.

Bias of an Estimator: Definition
If W is an estimator of θ, its bias is defined as
Bias(W) = E(W) - θ
• An estimator has a positive bias if E(W) - θ > 0.
• The unbiasedness of an estimator and the size of its bias depend on
– the distribution of Y, and
– the function h.
• We cannot control the distribution of Y, but we can choose the rule h.
• Show: the sample average Ȳ is an unbiased estimator of the population mean μ.
E(Ȳ) = E[(1/n) Σ Yi] = (1/n) Σ E(Yi) = (1/n) Σ μ = (1/n)(nμ) = μ
Unbiasedness
Weaknesses:
(1) Some very good estimators are not unbiased.
(2) Unbiased estimators can be quite poor estimators.
Example: Let W = Y1 (from a random sample of size n, we discard all of the observations except the first). Then E(Y1) = μ, so W is unbiased even though it ignores most of the sample.
• Unbiasedness ensures that the probability distribution of an estimator has a mean value equal to the parameter it is supposed to be estimating.
• Variance shows how spread out the distribution of an estimator is.

The Sampling Variance of Estimators
• The variance of an estimator is a measure of the dispersion in its distribution. It is often called the sampling variance.
• Example: the variance of the sample average from a population.
• Summary: If {Y1, …, Yn} is a random sample from a population with mean μ and variance σ², then
• Ȳ has the same mean μ as the population;
• its sampling variance equals the population variance over the sample size: Var(Ȳ) = σ²/n.

The Sampling Variance of Estimators
• Define the estimator
S² = [1/(n - 1)] Σ (Yi - Ȳ)²
which is usually called the sample variance.
• One can show that the sample variance is an unbiased estimator of σ²: E(S²) = σ².
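A simulation sketch of why the n - 1 divisor matters (assumed values σ² = 1, n = 5): dividing by n - 1 gives E(S²) = σ², while dividing by n is biased downward by the factor (n - 1)/n.

```python
import numpy as np

sigma2, n, reps = 1.0, 5, 200_000  # assumed values for illustration

rng = np.random.default_rng(7)
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

s2 = samples.var(axis=1, ddof=1)   # sample variance S^2 (divide by n - 1)
s2_biased = samples.var(axis=1)    # divide by n instead

print(s2.mean())         # close to sigma^2 = 1.0
print(s2_biased.mean())  # close to (n - 1)/n * sigma^2 = 0.8
```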
The Sampling Variance of Estimators
Suppose W1 and W2 are both unbiased estimators of θ, but W1 is more tightly centered about θ. (See graph!)
This implies that the probability that W1 is farther than any given distance from θ is less than the probability that W2 is farther than the same distance from θ.
Example: For a random sample with mean μ and variance σ², let Y1 be the estimator that uses only the first observation drawn.

Estimator:      Ȳ                  Y1
Unbiasedness:   E(Ȳ) = μ           E(Y1) = μ
Variance:       Var(Ȳ) = σ²/n      Var(Y1) = σ²

If the sample size is n = 10, this implies Var(Y1) is ten times larger than Var(Ȳ).
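The comparison of Ȳ and Y1 above can be checked numerically. A sketch with assumed values μ = 2, σ² = 1, n = 10, estimating both sampling variances across many random samples:

```python
import numpy as np

mu, sigma, n, reps = 2.0, 1.0, 10, 100_000  # assumed values for illustration

rng = np.random.default_rng(3)
samples = rng.normal(mu, sigma, size=(reps, n))

var_ybar = samples.mean(axis=1).var()  # sampling variance of Ybar, ~ sigma^2/n = 0.1
var_y1 = samples[:, 0].var()           # sampling variance of Y1, ~ sigma^2 = 1.0

print(var_ybar, var_y1)  # Var(Y1) is about ten times Var(Ybar)
```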
The Sampling Variance of Estimators
Example: From the simulation in Table C.1:
20 random samples of size 10 (n = 10) were generated from the normal distribution with μ = 2 and σ² = 1.
– y1 ranges from -0.64 to 4.27, with mean 1.89.
– ȳ ranges from 1.16 to 2.58, with mean 1.96.
Which estimator is better?
Relative Efficiency
Relative Efficiency: Definition. If W1 and W2 are two unbiased estimators of θ, then W1 is efficient relative to W2 when Var(W1) ≤ Var(W2) for all θ.
Efficiency
Example:
• For estimating the population mean μ, Var(Ȳ) < Var(Y1) for any value of σ² (when n > 1).
• The estimator Ȳ is efficient relative to Y1 for estimating μ.
• In a certain class of estimators, we can show that the sample average has the smallest variance.
Example:
Show that Ȳ has the smallest variance among all unbiased estimators that are also linear functions of Y1, Y2, …, Yn.
– The assumptions are that the Yi have a common mean and variance, and that they are pairwise uncorrelated.
Efficiency
• If we do not restrict our attention to unbiased estimators, then comparing variances is meaningless.
Example: In estimating the population mean μ, consider the trivial estimator equal to zero:
– its mean is zero: E(0) = 0;
– its variance is zero: Var(0) = 0;
– its bias is Bias(0) = E(0) - μ = -μ.
• So this trivial estimator is a very poor estimator when μ is large, despite having zero variance.
Efficiency
• A measure for comparing estimators that are not necessarily unbiased:
– Mean squared error (MSE)
• If W is an estimator of θ, then
MSE(W) = E[(W - θ)²]
       = E{[W - E(W)] + [E(W) - θ]}²
       = Var(W) + [Bias(W)]²
• The MSE measures how far, on average, the estimator is away from θ. It depends on both the variance and the bias.
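The decomposition MSE = Var + Bias² can be illustrated by comparing Ȳ with the trivial estimator 0 from the example above (assumed values μ = 2, σ² = 1, n = 10):

```python
import numpy as np

mu, sigma, n, reps = 2.0, 1.0, 10, 200_000  # assumed values for illustration

rng = np.random.default_rng(11)
ybar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# MSE of Ybar: unbiased, so MSE = Var(Ybar) = sigma^2/n = 0.1
mse_ybar = np.mean((ybar - mu) ** 2)

# MSE of the trivial estimator 0: Var = 0 and Bias = -mu, so MSE = mu^2 = 4
mse_zero = 0.0 + mu ** 2

print(mse_ybar, mse_zero)  # the zero-variance estimator has the far larger MSE
```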
Problem C.1
C.1 Let Y1, Y2, Y3, and Y4 be independent, identically distributed random variables from a population with mean μ and variance σ². Let
Ȳ = (1/4)(Y1 + Y2 + Y3 + Y4)
denote the average of these four random variables.
(i) What are the expected value and variance of Ȳ in terms of μ and σ²? [ans.]
Problem C.1 continued…
(ii) Now, consider a different estimator of μ:
W = (1/8)Y1 + (1/8)Y2 + (1/4)Y3 + (1/2)Y4
This is an example of a weighted average of the Yi. Show that W is also an unbiased estimator of μ. Find the variance of W. [ans.]
(iii) Based on your answers to parts (i) and (ii), which estimator of μ do you prefer, Ȳ or W? [ans.]
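A simulation sketch of part (ii), under assumed values μ = 2 and σ² = 1: across many samples of size 4, the weighted average W = (1/8)Y1 + (1/8)Y2 + (1/4)Y3 + (1/2)Y4 should average to μ (unbiased), with variance 11σ²/32 ≈ 0.344.

```python
import numpy as np

mu, sigma, reps = 2.0, 1.0, 400_000  # assumed values for illustration

rng = np.random.default_rng(5)
y = rng.normal(mu, sigma, size=(reps, 4))

weights = np.array([1/8, 1/8, 1/4, 1/2])  # weights sum to 1, so W is unbiased
w = y @ weights                           # one W per simulated sample of size 4

print(w.mean())  # ~ mu
print(w.var())   # ~ (1/64 + 1/64 + 1/16 + 1/4) * sigma^2 = 11/32
```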
Problem C.1 (i) continued…
(i) This is just a special case of what we covered in the text, with n = 4:
• E(Ȳ) = (1/4) Σ E(Yi) = (1/4)(4μ) = μ
• Var(Ȳ) = (1/16) Σ Var(Yi) = (1/16)(4σ²) = σ²/4
Problem C.1 (iii)
• (iii) Var(W) = 11σ²/32 and Var(Ȳ) = σ²/4 = 8σ²/32.
• Because 11/32 > 8/32 = 1/4, Var(W) > Var(Ȳ) for any σ² > 0, so Ȳ is preferred to W because each is unbiased.
III. Asymptotic or Large Sample Properties of Estimators
• For estimating a population mean μ:
– One notable feature of Y1 is that it has the same variance for any sample size.
– Ȳ improves in the sense that its variance gets smaller as n gets larger.
• Y1 does not improve in this case.
• We can rule out silly estimators by studying the asymptotic or large sample properties of estimators (n → ∞).
A. Consistency
B. Asy. Normality
• How large is a “large” sample?
– This depends on the underlying population distribution.
– Note that large sample approximations have been known to work well for sample sizes as small as 20 observations (n = 20).
Consistency
• Consistency concerns how far the estimator is likely to be from the parameter it is supposed to be estimating as the sample size increases indefinitely.
• Definition: Consistency
Let Wn be an estimator of θ based on Y1, …, Yn of sample size n. Then Wn is a consistent estimator of θ if, for every ε > 0,
P(|Wn - θ| > ε) → 0 as n → ∞
• Note that we index the estimator by the sample size, n, in stating this definition.
Consistency
1. The distribution of Wn becomes more and more concentrated about θ as the sample size increases (n → ∞).
2. For larger sample sizes, Wn is less and less likely to be very far from θ.
3. When Wn is consistent, we say that θ is the probability limit of Wn, written as
plim(Wn) = θ
4. The conclusion that Ȳn is a consistent estimator of μ is known as the law of large numbers (LLN).
Consistency
• Example: the average of a random sample drawn from a population with mean μ and variance σ².
– The sample average Ȳn is unbiased, and
Var(Ȳn) = σ²/n
– Thus, Var(Ȳn) → 0 as n → ∞, so Ȳn is a consistent estimator of μ.
• Unbiased estimators are not necessarily consistent, but those whose variances shrink to zero as the sample size increases are consistent.
• Formally, if Wn is an unbiased estimator of θ and Var(Wn) → 0 as n → ∞, then plim(Wn) = θ.

Law of Large Numbers
Definition: LLN. Let Y1, …, Yn be i.i.d. random variables with mean μ. Then
plim(Ȳn) = μ
Intuitively, the LLN says that if we are interested in the population average μ, we can get arbitrarily close to μ by choosing a sufficiently large sample.
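A sketch of the LLN in action, under an assumed population with μ = 2 and σ = 1: the sample average typically settles closer to μ as n grows.

```python
import numpy as np

mu, sigma = 2.0, 1.0  # assumed population values for illustration

rng = np.random.default_rng(2)
for n in (10, 1_000, 100_000):
    # one sample average Ybar_n for each sample size n
    ybar_n = rng.normal(mu, sigma, size=n).mean()
    print(n, ybar_n)  # the estimate tends toward mu as n grows
```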
Consistency
Property PLIM.1
Let θ be a parameter and define a new parameter γ = g(θ) for some continuous function g(·). Suppose plim(Wn) = θ. Define an estimator of γ as Gn = g(Wn). Then
plim(Gn) = γ
Alternatively,
plim[g(Wn)] = g[plim(Wn)]
for any continuous function g(·).
• What is a continuous function?
– Note that a continuous function is a “function that can be graphed without lifting your pencil from the paper.”

Property PLIM.2
If plim(Tn) = α and plim(Un) = β, then
1) plim(Tn + Un) = α + β
2) plim(TnUn) = αβ
3) plim(Tn/Un) = α/β, provided that β ≠ 0
Consistency
• Example: two estimators of the population mean μ:
(1) Ȳn = (1/n) Σ Yi, with E(Ȳn) = μ
(2) Y* = [1/(n - 1)] Σ Yi = [n/(n - 1)]Ȳn, with E(Y*) = [n/(n - 1)]μ
• As n → ∞, both Ȳn and Y* are consistent estimators of μ:
(1) plim(Ȳn) = μ, since Ȳn is unbiased and Var(Ȳn) → 0 as n → ∞
(2) plim(Y*) = plim[n/(n - 1)] · plim(Ȳn) = 1 · μ = μ
• Y* is also a consistent estimator, since Y* approaches the value of the parameter as the sample size gets larger and larger, even though it is biased for any finite n.
Consistency
Example: estimating the standard deviation σ of a population with mean μ and variance σ².
• Given the sample variance
Sn² = [1/(n - 1)] Σ (Yi - Ȳn)²
– the sample variance is an unbiased estimator of σ²;
– Sn² is also a consistent estimator of σ².
• Sample standard deviation Sn = √Sn²:
– Sn is not an unbiased estimator of σ. Why? The square root is a nonlinear function, so E(Sn) ≠ √E(Sn²) in general.
– Sn is a consistent estimator of σ, since by PLIM.1
plim Sn = √(plim Sn²) = √σ² = σ
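A simulation sketch (assumed σ = 1): for small n, the sample standard deviation Sn underestimates σ on average, yet for a single very large sample Sn is extremely close to σ.

```python
import numpy as np

sigma = 1.0  # assumed population standard deviation

rng = np.random.default_rng(9)

# Bias for small n: average S_n over many samples of size 5 falls below sigma
s_small = rng.normal(0.0, sigma, size=(200_000, 5)).std(axis=1, ddof=1)
print(s_small.mean())  # noticeably less than 1.0

# Consistency: one very large sample gives S_n very close to sigma
s_large = rng.normal(0.0, sigma, size=1_000_000).std(ddof=1)
print(s_large)  # close to 1.0
```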
Consistency
• Because
plim(Ȳn) = μY and plim(Z̄n) = μZ,
it follows from PLIM.1 and PLIM.2 that
Gn = 100 · (Z̄n - Ȳn)/Ȳn
satisfies plim(Gn) = γ.
• Gn is a consistent estimator of γ. It is just the percentage difference between Z̄n and Ȳn in the sample.
Example:
Yi: annual earnings with a high school education (population mean μY)
Zi: annual earnings with a college education (population mean μZ)
• Let {Y1, …, Yn} and {Z1, …, Zn} be random samples of size n from a population of workers, and suppose we want to estimate the percentage difference in annual earnings between the two groups, which is
γ = 100 · (μZ - μY)/μY
Asymptotic Normality and the Central Limit Theorem
• Consistency is a property of point estimators, as is unbiasedness.
Consistency and distribution
• Consistency does not tell us about the shape of an estimator's distribution for a given sample size.
• Most econometric estimators have distributions that are well approximated by a normal distribution for large samples (n → ∞).
Asymptotic Normality
Definition: Asymptotic Normality
• Let {Zn: n = 1, 2, …} be a sequence of random variables such that for all numbers z,
P(Zn ≤ z) → Φ(z) as n → ∞,
where Φ(z) is the standard normal cumulative distribution function (cdf).
• Intuitively, this property means that the cdf of Zn gets closer and closer to the cdf of the standard normal distribution as the sample size n gets large.

Central Limit Theorem (CLT)
• Definition: CLT
If Yi ~ d(μ, σ²), then
Zn = (Ȳn - μ)/(σ/√n)
has an asymptotic standard normal distribution.
Yi ~ d(μ, σ²) means that {Y1, …, Yn} is a random sample from some distribution with mean μ and variance σ².
• Intuitively, the central limit theorem (CLT) says that the average from a random sample from any population, when standardized, has an asymptotic standard normal distribution.
• The variable Zn is the standardized version of Ȳn: we have subtracted off E(Ȳn) = μ and divided by sd(Ȳn) = σ/√n.
• Here “a” stands for “asymptotically” or “approximately”.
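A simulation sketch of the CLT for a decidedly non-normal population, chosen here for illustration to be Exponential(1) (which has μ = σ = 1): with n = 100, the standardized sample average behaves approximately like a standard normal variable.

```python
import numpy as np

mu = sigma = 1.0  # Exponential(1) has mean 1 and standard deviation 1
n, reps = 100, 200_000

rng = np.random.default_rng(4)
ybar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# Z_n = (Ybar_n - mu) / (sigma / sqrt(n)), the standardized sample average
z = (ybar - mu) / (sigma / np.sqrt(n))

print(z.mean(), z.std())   # approximately 0 and 1
print(np.mean(z <= 1.96))  # approximately Phi(1.96), about 0.975
```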
Asymptotic Normality and the Central Limit Theorem
• If σ is replaced by its estimator Sn, does
(Ȳn - μ)/(Sn/√n)
still have an approximate standard normal distribution for sample size n?
• The exact distribution of (Ȳn - μ)/(Sn/√n) is not the same as that of (Ȳn - μ)/(σ/√n), but the difference is often small enough to be ignored for large n.
Problem C.3
C.3 Let Ȳ denote the sample average from a random sample with mean μ and variance σ². Consider two alternative estimators of μ:
W1 = [(n - 1)/n]Ȳ and W2 = Ȳ/2.
(i) Show that W1 and W2 are both biased estimators of μ and find the biases. What happens to the biases as n → ∞? Comment on any important differences in bias for the two estimators as the sample size gets large. [ans.]
Problem C.3 continued…
(ii) Find the probability limits of W1 and W2. {Hint: Use properties PLIM.1 and PLIM.2; for W1, note that plim[(n - 1)/n] = 1.} Which estimator is consistent? [ans.]
(iii) Find Var(W1) and Var(W2). [ans.]
(iv) Argue that W1 is a better estimator than Ȳ if μ is “close” to zero. (Consider both bias and variance.) [ans.]
Problem C.3 (i)
• E(W1) = [(n − 1)/n]E(Ȳ) = [(n − 1)/n]µ, so Bias(W1) = [(n − 1)/n]µ − µ = −µ/n. As n → ∞, Bias(W1) → 0.
• Similarly, E(W2) = E(Ȳ)/2 = µ/2, so Bias(W2) = µ/2 − µ = −µ/2 for every n.
• The bias in W1 tends to zero as n → ∞, while the bias in W2 is −µ/2 for all n. This is an important difference.
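A small simulation illustrates the two biases; the Normal(2, 1) population and the sample sizes below are illustrative assumptions, not part of the problem:

```python
import random
import statistics

random.seed(0)
mu = 2.0  # illustrative population mean

def average_estimates(n, reps=20000):
    # Monte Carlo approximations of E(W1) and E(W2) for sample size n.
    w1_vals, w2_vals = [], []
    for _ in range(reps):
        ybar = statistics.fmean(random.gauss(mu, 1.0) for _ in range(n))
        w1_vals.append((n - 1) / n * ybar)  # W1 = [(n-1)/n] * Ybar
        w2_vals.append(ybar / 2.0)          # W2 = Ybar / 2
    return statistics.fmean(w1_vals), statistics.fmean(w2_vals)

for n in (5, 50):
    e_w1, e_w2 = average_estimates(n)
    # Theory: Bias(W1) = -mu/n (vanishes), Bias(W2) = -mu/2 (does not).
    print(f"n={n:2d}: Bias(W1) ~ {e_w1 - mu:+.3f}, Bias(W2) ~ {e_w2 - mu:+.3f}")
```

As n grows, the simulated bias of W1 shrinks toward zero while the bias of W2 stays near −µ/2 = −1.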
Problem C.3 (ii)
• plim(W1) = plim[(n − 1)/n] · plim(Ȳ) = 1 · µ = µ.
• plim(W2) = plim(Ȳ)/2 = µ/2.
• Because plim(W1) = µ and plim(W2) = µ/2, W1 is consistent whereas W2 is inconsistent.
Problem C.3 (iii)
• Var(W1) = [(n − 1)/n]²Var(Ȳ) = [(n − 1)²/n³]σ².
• Var(W2) = Var(Ȳ)/4 = σ²/(4n).
Problem C.3 (iv)
• Because Ȳ is unbiased, its mean squared error is simply its variance: MSE(Ȳ) = Var(Ȳ) + [Bias(Ȳ)]² = σ²/n.
• On the other hand, MSE(W1) = Var(W1) + [Bias(W1)]² = [(n − 1)²/n³]σ² + µ²/n².
• Therefore, MSE(W1) is smaller than MSE(Ȳ) for µ close to zero.
• For large n, the difference between the two estimators is trivial.
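The MSE comparison can be checked directly from the closed-form expressions in parts (i) and (iii); the values of µ, σ², and n below are illustrative:

```python
# Closed-form comparison of MSE(W1) and MSE(Ybar).
def mse_ybar(n, sigma2):
    return sigma2 / n  # Ybar is unbiased, so MSE = Var

def mse_w1(n, mu, sigma2):
    var = (n - 1) ** 2 / n ** 3 * sigma2  # Var(W1)
    bias = -mu / n                        # Bias(W1)
    return var + bias ** 2

n, sigma2 = 10, 1.0
for mu in (0.1, 5.0):
    print(f"mu={mu}: MSE(W1)={mse_w1(n, mu, sigma2):.4f}, "
          f"MSE(Ybar)={mse_ybar(n, sigma2):.4f}")
```

For µ near zero the squared-bias term is negligible and W1 wins on variance; for µ far from zero the bias dominates and Ȳ wins.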
IV. General Approaches to Parameter Estimation
• We have studied finite sample and asymptotic properties of estimators: unbiasedness, consistency, and efficiency.
Question: Are there general approaches that produce estimators with good properties?
• Given a parameter θ appearing in a population distribution, there are usually many ways to obtain an unbiased and consistent estimator of θ.
• There are three main methods:
– Method of Moments
– Method of Maximum Likelihood
– Method of Least Squares
Method of Moments
• The method of moments proceeds as follows:
– The parameter θ is shown to be related to some expected value in the distribution of Y, usually E(Y) or E(Y²).
Example: Population mean
• Suppose θ is a function of µ; i.e., θ = g(µ).
• Because the sample average Ȳ is an unbiased and consistent estimator of µ, it is natural to substitute Ȳ for µ; thus g(Ȳ) is the estimator of θ.
– The estimator g(Ȳ) is a consistent estimator of θ. If g(µ) is a linear function of µ, then g(Ȳ) is an unbiased estimator of θ.
• Why is this a method of moments? Because we replace the population moment µ with the sample moment Ȳ.
Method of Moments — Example: Population covariance
• The population covariance between two random variables X and Y is
σXY = E[(X − µX)(Y − µY)]
• The method of moments suggests the following estimator:
SXY = (1/n) Σᵢ (Xi − X̄)(Yi − Ȳ)
1) This is a consistent estimator of σXY.
2) However, it is a biased estimator.
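A quick numerical check of the consistency claim; the joint distribution of (X, Y) below is an illustrative construction with known covariance, not from the text:

```python
import random
import statistics

random.seed(1)

# Construct (X, Y) with known covariance: Y = X + noise, so Cov(X, Y) = Var(X) = 1.
n = 50000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

xbar, ybar = statistics.fmean(x), statistics.fmean(y)
# Method-of-moments estimator: divide by n, not n - 1.
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
print(f"method-of-moments covariance: {s_xy:.3f} (population value: 1)")
```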
Example: Sample covariance
• The sample covariance is
SXY = [1/(n − 1)] Σᵢ (Xi − X̄)(Yi − Ȳ)
1) It can be shown that this is an unbiased estimator of σXY.
2) It is also a consistent estimator of σXY.
Method of Moments — Example: Population correlation
• The population correlation is
ρXY = σXY/(σXσY)
• The method of moments suggests estimating ρXY by RXY = SXY/(SXSY).
This is called the sample correlation coefficient.
Notes:
(1) RXY is a consistent estimator of ρXY. (Why?) Because SXY, SX, and SY are all consistent.
(2) RXY is not an unbiased estimator of ρXY. (Why?)
First, SX and SY are not unbiased estimators of σX and σY.
Second, RXY is a ratio of estimators, so it would not be unbiased in any case.
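The sample correlation coefficient can be computed directly from its method-of-moments pieces; a sketch with an illustrative population whose true correlation is 0.8:

```python
import math
import random

random.seed(2)

# Y = 0.8*X + 0.6*e with X, e independent standard normal, so Corr(X, Y) = 0.8.
n = 50000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + 0.6 * random.gauss(0, 1) for xi in x]

xbar = sum(x) / n
ybar = sum(y) / n
s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
r_xy = s_xy / (s_x * s_y)  # R_XY = S_XY / (S_X * S_Y)
print(f"sample correlation: {r_xy:.3f} (population value: 0.8)")
```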
Maximum Likelihood
• We now turn to the maximum likelihood estimator (MLE).
• Let {Y1, …, Yn} be a random sample from the population distribution f(y; θ).
• The likelihood function, which is a random variable, is defined as
L(θ; Y1, …, Yn) = f(Y1; θ)f(Y2; θ) ⋯ f(Yn; θ)
1. In the discrete case, this is P(Y1 = y1, Y2 = y2, …, Yn = yn) = P(Y1 = y1)P(Y2 = y2) ⋯ P(Yn = yn).
2. Because the sample is random, the joint distribution of {Y1, …, Yn} can be written as the product of the densities f(Y1; θ)f(Y2; θ) ⋯ f(Yn; θ).
• It is often easier to work with the log-likelihood function
log L(θ; Y1, …, Yn) = Σᵢ log f(Yi; θ)
1) It is obtained by taking the natural log of the likelihood function.
2) The log of the product is the sum of the logs.
• The maximum likelihood estimator of θ, call it W, is the value of θ that maximizes the likelihood function.
– Intuitively, out of all possible values of θ, we choose the value that makes the likelihood of the observed values largest.
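For a concrete case, the MLE of a Bernoulli success probability can be found by maximizing the log-likelihood; the data and grid search below are illustrative (for the Bernoulli model the maximizer is known in closed form to be the sample mean, which the search recovers):

```python
import math

# Observed Bernoulli sample (illustrative data): 7 successes in 10 trials.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

def log_likelihood(theta, ys):
    # log L(theta) = sum_i [ y_i*log(theta) + (1 - y_i)*log(1 - theta) ]
    return sum(y * math.log(theta) + (1 - y) * math.log(1 - theta) for y in ys)

# Grid search over candidate values of theta in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=lambda th: log_likelihood(th, data))
print(f"MLE = {mle:.3f}, sample mean = {sum(data) / len(data):.3f}")
```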
• Properties:
1) The MLE is usually consistent and sometimes unbiased.
2) The MLE is generally the most asymptotically efficient estimator (when the population model f(y; θ) is correctly specified):
– asymptotically, it has the smallest variance among all unbiased estimators of θ;
– it is the minimum variance unbiased estimator.
Least Squares
• Least squares estimators are a third kind of estimator.
• The sample mean Ȳ is a least squares estimator of the population mean µ:
– It can be shown that the value of m that makes the sum of squared deviations Σᵢ (Yi − m)² as small as possible is m = Ȳ.
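This least squares characterization of the sample mean is easy to verify numerically; a sketch with illustrative data:

```python
# Check numerically that the sum of squared deviations sum_i (y_i - m)^2
# is minimized at m = ybar.
y = [2.0, 4.0, 7.0, 9.0, 3.0]  # illustrative sample
ybar = sum(y) / len(y)

def ssd(m, ys):
    return sum((yi - m) ** 2 for yi in ys)

# Search a fine grid of candidate values of m.
grid = [i / 100 for i in range(0, 1500)]
m_star = min(grid, key=lambda m: ssd(m, y))
print(f"least squares minimizer: {m_star}, sample mean: {ybar}")
```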
Properties:
1) The LSE is consistent and unbiased.
2) The LSE is generally the most efficient estimator in finite and large samples:
– it has the smallest variance among all linear unbiased estimators of µ.
Method of Moments, Least Squares, and Maximum Likelihood
• The principles of least squares, method of moments, and maximum likelihood often result in the same estimator.
V. Interval Estimation and Confidence Intervals
• A point estimate is the researcher’s best guess at the population value, but it provides no information about how close the estimate is likely to be to the population parameter.
Example:
• On the basis of a random sample of workers, a researcher reports that job training grants increase hourly wage by 6.4%.
• We cannot know how close this estimate is to the population value for a particular sample, because the population value is unknown.
The Nature of Interval Estimation
• Interval estimation comes in when we make statements involving probabilities.
– One way of assessing the uncertainty in an estimator is its sampling standard deviation.
• Interval estimation combines the point estimate and its standard deviation to construct a confidence interval.
– It shows where the population value is likely to lie in relation to the estimate.
Concept of interval estimation:
• Assume {Y1, …, Yn} is a random sample from the Normal(µ, σ²) population, and suppose that the variance σ² is known (for concreteness, σ² = 1).
– The sample average Ȳ then has a normal distribution with mean µ and variance σ²/n; i.e., Ȳ ~ Normal(µ, σ²/n).
• The standardized version of Ȳ has a standard normal distribution:
P(−1.96 < (Ȳ − µ)/(σ/√n) < 1.96) = .95
• Rewriting the event inside the probability gives
P(Ȳ − 1.96σ/√n < µ < Ȳ + 1.96σ/√n) = .95
• The random interval is
(Ȳ − 1.96σ/√n, Ȳ + 1.96σ/√n)
1) The probability that the random interval (Ȳ − 1.96σ/√n, Ȳ + 1.96σ/√n) contains the population mean µ is .95, or 95%.
2) This information allows us to construct an interval estimate of µ by plugging the sample outcome of the average, ȳ, and σ = 1 into the random interval.
• The result is called a 95% confidence interval.
• A shorthand notation is ȳ ± 1.96σ/√n.
Example:
• Given observed sample data {y1, y2, …, yn}, we can compute ȳ. Suppose n = 16, ȳ = 7.3, and σ = 1.
• The 95% confidence interval for µ is 7.3 ± 1.96/√16 = 7.3 ± .49.
• In interval form, this is [6.81, 7.79].
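The calculation above as code (the numbers are those of the example):

```python
import math

# 95% CI for mu with known sigma: ybar +/- 1.96 * sigma / sqrt(n).
n, ybar, sigma = 16, 7.3, 1.0
half_width = 1.96 * sigma / math.sqrt(n)
lo, hi = ybar - half_width, ybar + half_width
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```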
• The meaning of a confidence interval is subtle: we mean that the random interval contains µ with probability .95; there is a 95% chance that the random interval contains µ.
• The random interval is an example of an interval estimator, since its endpoints change with different samples.
Correct interpretation: The random interval contains µ with probability 0.95.
Incorrect interpretation: The probability that µ is in a particular computed interval is 0.95.
• This is because µ is unknown but fixed: it either is or is not in the computed interval.
Example:
• Table C.2 contains calculations for 20 random samples from a Normal(2, 1) distribution with sample size n = 10.
• The interval estimates of µ are ȳ ± .62.
Results:
1) The interval changes with each random sample.
2) 19 of the 20 intervals contain the population value µ = 2.
3) Only for replication number 19 is µ = 2 not in the confidence interval.
4) So 95% of the samples (19 of 20) produced a confidence interval that contains µ.
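The Table C.2 experiment can be replicated on a larger scale; a sketch (the seed and number of replications are arbitrary choices):

```python
import math
import random
import statistics

random.seed(3)

# Draw many samples of size 10 from Normal(2, 1) and count how often
# the interval ybar +/- 1.96/sqrt(10) (about ybar +/- .62) covers mu = 2.
mu, n, reps = 2.0, 10, 10000
half_width = 1.96 / math.sqrt(n)

covered = 0
for _ in range(reps):
    ybar = statistics.fmean(random.gauss(mu, 1.0) for _ in range(n))
    if ybar - half_width < mu < ybar + half_width:
        covered += 1
print(f"coverage: {covered / reps:.3f} (nominal: .95)")
```

Across many replications the empirical coverage settles near the nominal 95%.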
CIs for the Mean from a Normally Distributed Population
• Suppose the variance σ² is known. The 95% confidence interval for µ is ȳ ± 1.96·σ/√n.
• In practice, we rarely know the population variance σ².
• To allow for unknown σ, we can use the estimate S and form Ȳ ± 1.96(S/√n).
• However, this random interval no longer contains µ with probability .95, because the constant σ has been replaced with the random variable S.
• We therefore use the t distribution rather than the standard normal distribution:
(Ȳ − µ)/(S/√n) ~ tn−1
where S is the sample standard deviation of the random sample {Y1, …, Yn}.
– Note that the t distribution arises exactly because σ has been replaced by its estimator S.
• To construct a 95% confidence interval using the t distribution, let c be the 97.5th percentile of the tn−1 distribution, so that
P(−c < T < c) = .95
Critical values are tabulated in Table G.2 in Appendix G.
• Once the critical value c is chosen, the random interval [Ȳ − c·S/√n, Ȳ + c·S/√n] contains µ with probability .95.
Example:
• Let n = 20, so df = n − 1 = 19 and c = 2.093 (see Table G.2 in Appendix G).
• The 95% confidence interval is ȳ ± 2.093(s/√20), where ȳ and s are the values obtained from the sample.
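The same calculation as code; the sample below is illustrative, and only n = 20 and c = 2.093 come from the example:

```python
import math
import statistics

# Exact 95% CI with unknown sigma: ybar +/- c * s / sqrt(n), c = 2.093 for df = 19.
# The data are illustrative, not from the textbook.
y = [7.1, 6.8, 7.9, 7.4, 6.5, 7.2, 7.7, 6.9, 7.3, 7.6,
     7.0, 7.8, 6.6, 7.5, 7.2, 6.7, 7.4, 7.1, 7.3, 7.0]
n = len(y)
ybar = statistics.fmean(y)
s = statistics.stdev(y)  # sample standard deviation (divides by n - 1)
c = 2.093                # 97.5th percentile of t_19, from Table G.2
lo = ybar - c * s / math.sqrt(n)
hi = ybar + c * s / math.sqrt(n)
print(f"ybar={ybar:.3f}, s={s:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```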
• More generally, let c be the 100(1 − α/2)th percentile of the tn−1 distribution.
• A 100(1 − α)% confidence interval is ȳ ± cα/2·s/√n, where cα/2 is known once we choose α and the degrees of freedom n − 1.
• Recall that sd(Ȳ) = σ/√n.
• The quantity s/√n is the point estimate of sd(Ȳ), called the standard error of Ȳ: se(Ȳ) = s/√n.
• A 100(1 − α)% confidence interval can therefore be written as ȳ ± c·se(ȳ).
• The notion of the standard error of an estimate plays an important role in econometrics.
Example C.2: Effect of Job Training Grants on Worker Productivity
• Consider a sample of firms that received job training grants in 1988. The scrap rate is the number of items per 100 produced that are not usable and must be scrapped.
• Assume the change in scrap rates has a normal distribution.
• A 95% confidence interval for the mean change in scrap rates is ȳ ± 2.093·se(ȳ) = [−2.28, −0.02].
• With 95% confidence, the average change in scrap rates in the population is not zero.
Example C.2, continued
• The analysis above has some potentially serious flaws.
• It assumes that any systematic reduction in scrap rates is due to the job training grants.
– But many things can happen over the course of the year to change worker productivity.
• Note that the t distribution approaches the standard normal distribution as the degrees of freedom get large.
• In particular, for α = .05, cα/2 → 1.96 as n → ∞.
A Simple Rule of Thumb for a 95% Confidence Interval
• A rule of thumb for an approximate 95% confidence interval is ȳ ± 2·se(ȳ).
1) It is slightly too big for large sample sizes.
2) It is slightly too small for small sample sizes.
Asymptotic Confidence Intervals for Nonnormal Populations
• For some applications, the population is nonnormal; in some cases, the nonnormal population has no standard distribution.
• This does not matter as long as the sample size is large enough for the central limit theorem to give a good approximation to the distribution of the sample average Ȳ.
• A large sample has the added benefit of producing a narrow confidence interval, because the standard error se(Ȳ) shrinks to zero as the sample size grows.
• For large n, an approximate 95% confidence interval is ȳ ± 1.96·se(ȳ), where 1.96 is the 97.5th percentile of the standard normal distribution.
• Note that the standard normal distribution is used in place of the t distribution because we are relying on asymptotics: as n increases without bound, the t distribution approaches the standard normal distribution.
Example C.3: Race Discrimination in Hiring
• Matched-pairs analysis: each person in a pair interviews for the same job.
• We are interested in the difference θB − θW, where
θB = probability that the black person is offered a job
θW = probability that the white person is offered a job
• Unbiased estimators of θB and θW are B̄ and W̄, the fractions of interviews for which blacks and whites were offered jobs, where
Bi = 1 if the black person gets a job offer from employer i
Wi = 1 if the white person gets a job offer from employer i
• Define a new variable Yi = Bi − Wi. Yi can take three values:
Yi = −1 if the black person did not get the job but the white person did
Yi = 0 if both or neither got the job
Yi = 1 if the white person did not get the job but the black person did
• Then E(Yi) = E(Bi) − E(Wi) = θB − θW.
Example C.3, continued
• Sample size: n = 241.
– 22.4% of blacks were offered jobs, while 35.7% of whites were offered jobs. This is prima facie evidence of discrimination.
• So b̄ = .224 and w̄ = .357, and ȳ = .224 − .357 = −.133.
• The sample standard deviation is s = .482.
• Find an approximate 95% confidence interval for µ = θB − θW.
• A 95% CI for µ = θB − θW is −.133 ± 1.96(.482/√241) ≈ −.133 ± .061, or [−.194, −.072].
• A 99% CI for µ = θB − θW is −.133 ± 2.58(.482/√241) ≈ −.133 ± .080, or [−.213, −.053].
• We are very confident that the population difference is not zero.
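These intervals can be reproduced from the reported n, ȳ, and s:

```python
import math

# Approximate CIs for mu = theta_B - theta_W from Example C.3.
n, ybar, s = 241, -0.133, 0.482
se = s / math.sqrt(n)

ci95 = (ybar - 1.96 * se, ybar + 1.96 * se)
ci99 = (ybar - 2.58 * se, ybar + 2.58 * se)
print(f"se = {se:.3f}")
print(f"95% CI: [{ci95[0]:.3f}, {ci95[1]:.3f}]")
print(f"99% CI: [{ci99[0]:.3f}, {ci99[1]:.3f}]")
```

Both intervals lie entirely below zero, which is what justifies the conclusion above.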
Problem C.7. The new management at a bakery claims that workers are now more productive than they were under the old management, which is why wages have “generally increased.” Let Wib be worker i’s wage under the old management and Wia be worker i’s wage after the change. The difference is Di = Wia − Wib. Assume that the Di are a random sample from a Normal(µ, σ²) distribution.
(i) Using the following data on 15 workers, construct an exact 95% confidence interval for µ.
obs Wb Wa D=Wa-Wb
1 8.3 9.25 0.95
2 9.4 9 -0.4
3 9 9.25 0.25
4 10.5 10 -0.5
5 11.4 12 0.6
6 8.75 9.5 0.75
7 10 10.25 0.25
8 9.5 9.5 0
9 10.8 11.5 0.7
10 12.55 13.1 0.55
11 12 11.5 -0.5
12 8.65 9 0.35
13 7.75 7.75 0
14 11.25 11.5 0.25
15 12.65 13 0.35
mean 10.16667 10.40667 0.24
Problem C.7 (i)
• The average wage increase is d̄ = .24, or 24 cents.
• The sample standard deviation is about s = .451; with n = 15, se(d̄) = .451/√15 ≈ .1164.
• With df = 14, the 97.5th percentile of the t14 distribution is c = 2.145, so the exact 95% confidence interval is .24 ± 2.145(.1164), or about [−.01, .49].
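The confidence interval can be computed directly from the table of wage differences (c = 2.145 is the 97.5th percentile of t14, taken from a standard t table):

```python
import math
import statistics

# Wage differences D_i = W_i^a - W_i^b for the 15 workers in Problem C.7.
d = [0.95, -0.4, 0.25, -0.5, 0.6, 0.75, 0.25, 0.0, 0.7,
     0.55, -0.5, 0.35, 0.0, 0.25, 0.35]
n = len(d)
dbar = statistics.fmean(d)
s = statistics.stdev(d)          # sample standard deviation
se = s / math.sqrt(n)
c = 2.145                        # 97.5th percentile of t_14
lo, hi = dbar - c * se, dbar + c * se
print(f"dbar={dbar:.3f}, s={s:.3f}, se={se:.4f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Because the interval barely includes zero, the data do not rule out a zero average wage change.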
VI. Hypothesis Testing
• We have reviewed how to evaluate point estimators and how to construct confidence intervals.
– Sometimes the question we are interested in has a definite yes-or-no answer:
1) Does a job training program effectively increase average worker productivity?
2) Are blacks discriminated against in hiring?
• Devising methods for answering such questions, using a sample of data, is known as hypothesis testing.
Fundamentals of Hypothesis Testing
• Suppose the reported election results are: Candidate A, 42%, and Candidate B, 58% of the popular vote.
• Candidate A argues that the election was rigged. A consulting agency takes a sample of 100 voters and finds that 53% voted for Candidate A.
Question: How strong is the sample evidence against the officially reported percentage of 42%?
• One way to proceed is to set up a hypothesis test. Let θ be the true proportion of the population voting for Candidate A.
• The null hypothesis is H0: θ = .42.
• The null hypothesis plays a role similar to that of a defendant on trial:
– A defendant is presumed innocent until proven guilty.
– The null hypothesis is presumed true until the data strongly suggest otherwise.
• The alternative hypothesis is that the true proportion voting for Candidate A is above .42: H1: θ > .42.
• To conclude that H1 is true and H0 is false, we need evidence “beyond reasonable doubt.”
– Observing 43 votes out of a sample of 100 is not enough to overturn the original result; such an outcome is within the expected sampling variation.
– How about observing 53 votes out of a sample of 100?
• There are two kinds of mistakes:
1) Rejecting the null hypothesis when it is true: a Type I error.
Example: We reject H0 when the true proportion voting for Candidate A is in fact .42.
2) “Accepting” (failing to reject) the null hypothesis when it is false: a Type II error.
Example: We fail to reject H0 when in fact θ > .42.
• We can compute the probability of making either a Type I or a Type II error.
• Hypothesis testing requires choosing a significance level, denoted by α:
α = P(Reject H0 | H0 is true)
Read: the probability of rejecting the null hypothesis, given that H0 is true.
• The significance level is the probability of committing a Type I error.
• Classical hypothesis testing requires that we specify a significance level for a test.
• Common values for α are .10, .05, and .01; they quantify our tolerance for Type I error.
• α = .05 means the researcher is willing to falsely reject H0 5% of the time.
• Type II error:
– We want to minimize the probability of a Type II error;
– alternatively, we want to maximize the power of the test.
• The power of a test is one minus the probability of a Type II error. Mathematically,
π(θ) = P(Reject H0 | θ) = 1 − P(Type II error),
where θ is the actual value of the parameter.
• We would like the power to equal unity whenever the null hypothesis is false.
Testing Hypotheses about the Mean in a Normal Population
• In order to test a hypothesis, we need to choose a test statistic and a critical value.
• The test statistic T is some function of the random sample.
• When we compute the statistic for a particular outcome, we obtain an outcome of the test statistic, denoted by t.
• Provided that the null hypothesis is true, the critical value c is determined by the distribution of T and the chosen significance level α.
• All rejection rules compare the outcome of the test statistic, t, with the critical value c.
• Testing a hypothesis about the mean µ of a Normal(µ, σ²) population proceeds as follows. The null hypothesis is
H0: µ = µ0,
where µ0 is a value we specify. In the majority of applications, µ0 = 0.
• The rejection rule depends on the nature of the alternative hypothesis. Three alternatives are of interest:
• One-sided alternatives: H1: µ > µ0 or H1: µ < µ0.
• Two-sided alternative: H1: µ ≠ µ0.
– With the two-sided alternative, we are interested in any departure from the null hypothesis.
• For example, consider the one-sided alternative H1: µ > 0.
• The null hypothesis is then effectively H0: µ ≤ 0.
• We reject the null hypothesis when the value of the sample average, ȳ, is sufficiently greater than 0. But how much greater?
• We use the standardized version of Ȳ, with s in place of σ and µ0 in place of µ:
t = (ȳ − µ0)/se(ȳ), where se(ȳ) = s/√n.
• This is called the t statistic. The t statistic measures the distance from ȳ to µ0 relative to the standard error of ȳ.
• Under the null hypothesis, the random variable

T = √n(Ȳ − μ0)/S

has a tn‐1 distribution.
Example of a one‐tailed test:
• Choose the significance level α = .05. The critical value c is chosen so that

P(T > c | H0) = .05,

where c is the 100(1 − α) percentile of the tn‐1 distribution.
• The rejection rule is t > c. This is an example of a one‐tailed test.
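The one‐tailed procedure above can be sketched in a few lines of Python (a minimal sketch, assuming scipy is available; the sample numbers are illustrative, not from the text):

```python
# One-tailed t test: H0: mu = mu0 against H1: mu > mu0.
# Reject H0 when t > c, where c is the 100(1 - alpha) percentile of t(n-1).
import math
from scipy.stats import t as t_dist

def one_tailed_t_test(ybar, s, n, mu0=0.0, alpha=0.05):
    se = s / math.sqrt(n)              # se(ybar) = s / sqrt(n)
    t_stat = (ybar - mu0) / se         # t statistic
    c = t_dist.ppf(1 - alpha, n - 1)   # critical value: 100(1 - alpha) percentile
    return t_stat, c, t_stat > c       # reject H0 if t > c

# Illustrative numbers (hypothetical sample, not from the slides):
t_stat, c, reject = one_tailed_t_test(ybar=1.2, s=4.0, n=25, mu0=0.0)
```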
Example C.4: Effect of Enterprise Zones on Business Investments
• Y denotes the percentage change in investment from the year before to the year after a city became an enterprise zone.
• Assume that Y has a Normal(μ, σ²) distribution.
H0: μ = 0 (Null hypothesis: enterprise zones have no effect)
H1: μ > 0 (Alternative hypothesis: they have a positive effect)
• Suppose that we wish to test H0 at the 5% level. The test statistic is t = ȳ/se(ȳ).
• A sample of 36 cities: α = .05; c = 1.69 (see Table G.2); ȳ = 8.2; s = 23.9; t = 2.06.
• We conclude that, at the 5% significance level, enterprise zones have a positive effect on average investment.
• At the 1% significance level, do enterprise zones have a positive effect?
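The numbers in Example C.4 can be checked directly (a sketch assuming scipy; the comparison at the 1% level answers the question posed above):

```python
# Numerical check of Example C.4: n = 36, ybar = 8.2, s = 23.9.
import math
from scipy.stats import t as t_dist

n, ybar, s = 36, 8.2, 23.9
se = s / math.sqrt(n)            # 23.9 / 6, about 3.98
t_stat = ybar / se               # about 2.06
c05 = t_dist.ppf(0.95, n - 1)    # about 1.69: 5% one-tailed critical value
c01 = t_dist.ppf(0.99, n - 1)    # about 2.44: 1% one-tailed critical value
# t > c05 but t < c01: reject H0 at the 5% level, fail to reject at the 1% level.
```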
• For the null hypothesis and the alternative hypothesis
H0: μ ≥ μ0, H1: μ < μ0.
• The rejection rule is t < −c.
This means we reject H0 only for values of ȳ that lie sufficiently far below μ0.
Testing Hypotheses about the Mean in a Normal Population
Example C.5: Race Discrimination in Hiring
• μ = θB − θW is the difference in the probabilities that Black and White applicants receive job offers; μ is the population mean of the variable Y = B − W, where B and W are binary variables.
• Testing H0: μ = 0 against H1: μ < 0.
• Given n = 241, ȳ = −.133, and se(ȳ) = .48/√241 ≈ .031.
• The t statistic for testing H0: μ = 0 is t = −.133/.031 = −4.29.
• Critical value = −2.58 (one‐sided test; α = .005).
• Since t < −2.58, there is very strong evidence against H0 in favor of H1.
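A quick check of Example C.5 (a sketch assuming scipy; the slides round se(ȳ) to .031 before dividing, which gives −4.29 instead of the unrounded −4.30):

```python
# Numerical check of Example C.5: n = 241, ybar = -0.133, s = 0.48.
import math

n, ybar, s = 241, -0.133, 0.48
se = s / math.sqrt(n)        # about 0.031
t_stat = ybar / se           # about -4.30 (slides use rounded se, giving -4.29)
# t is far below the critical value -2.58, so H0 is rejected decisively.
```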
• For the null hypothesis and the alternative hypothesis,
H0: μ = μ0, H1: μ ≠ μ0.
• The rejection rule is |t| > c.
This gives a two‐tailed test.
Testing Hypotheses about the Mean in a Normal Population
• We have to be careful in obtaining the critical value, c.
• The critical value c (see graph):
– It is the 100(1 − α/2) percentile of a tn‐1 distribution.
– If α = .05, c is the 97.5th percentile of the tn‐1 distribution.
Testing Hypotheses about the Mean in a Normal Population
Example: Let n = 22.
• c = 2.08, the 97.5th percentile of a t21 distribution (see Table G.2).
• Rejection rule: the absolute value of the t statistic must exceed 2.08.
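The two‐tailed critical value for n = 22 can be recomputed rather than read from Table G.2 (a sketch assuming scipy):

```python
# Two-tailed 5% critical value for n = 22: the 97.5th percentile of t(21).
from scipy.stats import t as t_dist

c = t_dist.ppf(0.975, 21)    # about 2.08, matching Table G.2
```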
Testing Hypotheses about the Mean in a Normal Population
• Proper language for hypothesis testing: “We fail to reject H0 in favor of H1 at the 5% significance level.”
• Incorrect wording: “We accept H0 at the 5% significance level.”
Asymptotic Tests for Nonnormal Populations
• If the sample is large enough, we can invoke the central limit theorem.
• Asymptotic theory is based on n increasing without bound.
• Under the null hypothesis,

T = √n(Ȳ − μ0)/S is asymptotically Normal(0,1).

As n gets large, the tn‐1 distribution converges to the standard normal distribution.
Asymptotic Tests for Nonnormal Populations
• Because asymptotic theory is based on n increasing without bound,
– standard normal and t critical values are pretty much the same.
• Suggestions:
– For moderate values of n, say between 30 and 60, it is traditional to use the t distribution.
– For n ≥ 120, the choice between the two distributions is irrelevant.
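The convergence of t critical values to the standard normal value can be seen directly (a sketch assuming scipy):

```python
# 97.5th percentiles of t(n-1) versus Normal(0,1) as n grows.
from scipy.stats import t as t_dist, norm

z = norm.ppf(0.975)              # about 1.96
t30 = t_dist.ppf(0.975, 29)      # n = 30: about 2.05
t120 = t_dist.ppf(0.975, 119)    # n = 120: about 1.98, very close to 1.96
```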
• Note that our chosen significance levels are only approximate.
• When the sample size is large, the actual significance level will be very close to 5%.
Example C.5: Race Discrimination in Hiring
• Given n = 241, ȳ = −.133, and se(ȳ) = .48/√241 ≈ .031.
• The t statistic for testing H0: μ = 0 is t = −.133/.031 = −4.29.
• Critical value = −2.58 (two‐sided test; α = .01).
• Since t < −2.58, there is very strong evidence against H0 in favor of H1.
• μ = θB − θW is the difference in the probabilities that Black and White applicants receive job offers; μ is the population mean of the variable Y = B − W, where B and W are binary variables.
• Testing H0: μ = 0 against H1: μ ≠ 0.
Computing and Using p‐Values
• The traditional requirement of choosing the significance level ahead of time means that different researchers could wind up with different conclusions,
– even though they use the same set of data and the same procedures.
• p‐value of the test
• p‐value of the testIt is the largest significance level at which we fail to reject the null hypothesis.
• p‐value of the testIt is the smallest significance level at which we reject the null hypothesis.
Computing and Using p‐Values
• One‐sided test: Let H0: μ = 0 in a Normal(μ, σ²) population. The test statistic is

T = √n Ȳ/S.

• The observed value of T for our sample is t = 1.52.
• The p‐value is the area to the right of 1.52, which is

p‐value = P(T > 1.52 | H0) = 1 − Φ(1.52) ≈ .065,

where Φ(·) is the standard normal cumulative distribution function (cdf).
Computing and Using p‐Values
• Interpretation: t = 1.52 and p‐value = .065
– The largest significance level at which we could carry out the test and fail to reject H0 is .065.
– The probability that we observe a value of T as large as 1.52 when the null hypothesis is true.
– If we carry out the test at a significance level above .065, we reject the null hypothesis.
– The smallest significance level at which we reject the null hypothesis is .065.
– We would observe a value of T as large as 1.52 due to chance 6.5% of the time.
Computing and Using p‐Values
• Interpretation: t = 2.85 and (n is large) p‐value = 1 − Φ(2.85) = .0022
– If the null hypothesis is true, we observe a value of T as large as 2.85 with probability .002.
– If we carry out the test at a significance level above .002, we reject the null hypothesis.
– The smallest significance level at which we reject the null hypothesis is .002.
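The asymptotic p‐value 1 − Φ(2.85) is one line of Python (a sketch assuming scipy):

```python
# Asymptotic p-value for t = 2.85: the area under Normal(0,1) to the right of 2.85.
from scipy.stats import norm

p = 1 - norm.cdf(2.85)    # equivalently norm.sf(2.85); about .0022
```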
Example C.6: Effect of Training Grants on Worker Productivity (one‐tailed test)
• ȳ is the average change in scrap rates; n = 20. Assume the change in scrap rates has a normal distribution.
• Hypotheses: H0: μ = 0 (training grants have no effect); H1: μ < 0.
• With n = 20 (one‐tailed test), the t statistic is t = −2.13 and the p‐value is P(T < −2.13) = .023.
• If we carry out the test at a significance level above .023, we reject the null hypothesis.
• The smallest significance level at which we reject the null hypothesis is .023.
Example: Training Grants and Worker Productivity (two tails)
[Figure: t19 density with |t| = 2.13 marked; area in each tail = .023; two‐sided p‐value = .023 + .023 = .046]
• If we carry out the test at a significance level above .046, we reject the null hypothesis.
• The smallest significance level at which we reject the null hypothesis is .046.
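Both p‐values for the training‐grant example can be reproduced from the t19 distribution (a sketch assuming scipy):

```python
# One- and two-tailed p-values for t = -2.13 with n = 20 (df = 19).
from scipy.stats import t as t_dist

df = 19
p_one = t_dist.cdf(-2.13, df)    # P(T < -2.13), about .023
p_two = 2 * p_one                # two-sided p-value, about .046
```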
Two‐sided alternative
• Null hypothesis and two‐sided alternative: H0: μ = μ0 against H1: μ ≠ μ0.
• For t testing about population means, the p‐value is

P(|Tn‐1| > |t|) = 2P(Tn‐1 > |t|),

where t is the value of the test statistic and Tn‐1 is a t random variable.
• The p‐value is computed by finding the area to the right of |t| and multiplying the area by two.
Example C.7: Race Discrimination in Hiring
• Null hypothesis and two‐sided alternative: H0: μ = 0 against H1: μ ≠ 0.
• Given n = 241, ȳ = −.133, and se(ȳ) = .48/√241 = .031.
• The t statistic for testing H0: μ = 0 is t = −.133/.031 = −4.29.
• If Z is a standard normal random variable, P(Z < −4.29) ≈ 0.
• There is very strong evidence against H0 in favor of H1.
– Note that the critical value is −2.58 (α = .01).
For a nonnormal distribution, the exact p‐value can be difficult to obtain, but we can find asymptotic p‐values by using the same calculations.
Computing and Using p‐Values
• Rejection rules for the t value: Summary
1) For H1: μ > μ0, the rejection rule is t > c and the p‐value is P(T > t).
2) For H1: μ < μ0, the rejection rule is t < −c and the p‐value is P(T < t).
3) For H1: μ ≠ μ0, the rejection rule is |t| > c and the p‐value is P(|T| > |t|).
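The three p‐value rules can be collected into one helper (a sketch assuming scipy; the function name and argument names are my own):

```python
# p-value for a t test of H0: mu = mu0 under each of the three alternatives.
from scipy.stats import t as t_dist

def t_test_p_value(t_stat, df, alternative):
    if alternative == "greater":           # H1: mu > mu0
        return t_dist.sf(t_stat, df)       # P(T > t)
    if alternative == "less":              # H1: mu < mu0
        return t_dist.cdf(t_stat, df)      # P(T < t)
    return 2 * t_dist.sf(abs(t_stat), df)  # H1: mu != mu0, P(|T| > |t|)
```

For the Example C.6 numbers (t = −2.13, df = 19), `t_test_p_value(-2.13, 19, "less")` gives about .023 and the two‐sided version about .046.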
• Rejection rules for the p‐value: Summary
Choose a significance level, α.
1) We reject H0 at the 100α% level if p‐value < α.
2) We fail to reject H0 at the 100α% level if p‐value ≥ α.
The Relationship between Confidence Interval and Hypothesis Testing
• Confidence intervals and hypothesis testing are linked.
• Assume α = .05. The confidence interval can be used to test two‐sided alternatives. Suppose
H0: μ = μ0, H1: μ ≠ μ0.
• Rejection rule:
– If μ0 does not lie in the confidence interval, we reject the null hypothesis at the 5% level.
– If the hypothesized value of μ, μ0, lies in the confidence interval, we fail to reject the null hypothesis at the 5% level.
• After a confidence interval is constructed, many values of μ0 can be tested.
– Since a confidence interval contains more than one value, there are many null hypotheses that will not be rejected.
Example C.8: Training Grants and Worker Productivity
• A 95% confidence interval for the mean change in scrap rates is [−2.28, −0.02].
• Since zero is excluded from this interval, we reject H0: μ = 0 against H1: μ ≠ 0 at the 5% level.
• If H0: μ = −2, we fail to reject the null hypothesis.
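The duality for Example C.8 amounts to a containment check (a minimal sketch; the helper name is my own):

```python
# A value mu0 is rejected at the 5% level exactly when it lies
# outside the 95% confidence interval.
def reject_at_5pct(mu0, ci):
    lo, hi = ci
    return not (lo <= mu0 <= hi)

ci = (-2.28, -0.02)                  # 95% CI for the mean change in scrap rates
assert reject_at_5pct(0.0, ci)       # H0: mu = 0 is rejected
assert not reject_at_5pct(-2.0, ci)  # H0: mu = -2 is not rejected
```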
• Don’t say: we “accept” the null hypothesis H0: μ = −1.0 at the 5% significance level.
• This is because, in the same set of data, there are usually many hypotheses that cannot be rejected.
• For example,
– it is logically incorrect to say that H0: μ = −1 and H0: μ = −2 are both “accepted.”
– It is possible that neither is rejected.
– Thus, we say “fail to reject.”
Practical Significance and Statistical Significance
• We have covered three ways of summarizing evidence about population parameters: 1) point estimates, 2) confidence intervals, and 3) hypothesis tests.
• In empirical analysis, we should also put emphasis on the magnitudes of the point estimates!
• Statistical significance depends on the size of the test statistic, not on the size of ȳ.
– It depends on the ratio of ȳ to its standard error:

t = ȳ/se(ȳ).

• The test statistic could be large because ȳ is large or because se(ȳ) is small.
Practical Significance and Statistical Significance
• Note that the magnitude and sign of the test statistic determine statistical significance.
• Practical significance depends on the magnitude of ȳ.
– The estimate can be statistically significant without being large, especially when we work with large sample sizes.
Example C.9: Effect of Freeway Width on Commute Time
• Given n = 900, ȳ = −3.6, sample sd = 32.7, and se(ȳ) = 32.7/√900 = 1.09.
• The t statistic for testing H0: μ = 0 is t = −3.6/1.09 = −3.30, with p‐value ≈ .0005.
• Statistical significance: we conclude that the freeway widening had a statistically significant effect on average commute time.
• Practical significance: the estimated reduction in average commute time is only 3.6 minutes.
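The Example C.9 statistics can be checked numerically (a sketch assuming scipy):

```python
# Numerical check of Example C.9: n = 900, ybar = -3.6, s = 32.7.
import math
from scipy.stats import norm

n, ybar, s = 900, -3.6, 32.7
se = s / math.sqrt(n)       # 32.7 / 30 = 1.09
t_stat = ybar / se          # about -3.30
p = norm.cdf(t_stat)        # one-sided asymptotic p-value, well below .001
```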
• Let Y denote the change in commute time, measured in minutes, for commuters before and after a freeway was widened.
• Assume Y ~ Normal(μ, σ²).
• Hypotheses: H0: μ = 0; H1: μ < 0.
VII. Remarks on Notation
• We have been careful to use standard conventions: W denotes an estimator (a random variable) and w denotes an estimate (an outcome of the random variable W).
• Distinguishing between an estimator and an estimate is important for understanding various concepts in
– estimation and
– hypothesis testing.
Remarks on Notation
• In the main text, we use a simpler convention that is widely used in econometrics.
• If θ is a population parameter, the notation θ̂ (“theta hat”) will be used to denote both an estimator and an estimate of θ.
Example:
• If the population parameter is μ, then μ̂ denotes an estimator or estimate of μ.
• If the parameter is σ², then σ̂² denotes an estimator or estimate of σ².
Problem C.6
C.6 You are hired by the governor to study whether a tax on liquor has decreased average liquor consumption in your state. You are able to obtain, for a sample of individuals selected at random, the difference in liquor consumption (in ounces) for the years before and after the tax. For person i, sampled randomly from the population, Yi denotes the change in liquor consumption. Treat these as a random sample from a Normal(μ, σ²) distribution.
(i) The null hypothesis is that there was no change in average liquor consumption. State this formally in terms of μ. [ans.]
(ii) The alternative is that there was a decline in liquor consumption; state the alternative in terms of μ. [ans.]
Problem C.6 continued
(iii) Now, suppose your sample size is n = 900 and you obtain the estimates ȳ = −32.8 and s = 466.4. Calculate the t statistic for testing H0 against H1; obtain the p‐value for the test. (Because of the large sample size, just use the standard normal distribution tabulated in Table G.1.) Do you reject H0 at the 5% level? At the 1% level? [ans.]
(iv) Would you say that the estimated fall in consumption is large in magnitude? Comment on the practical versus statistical significance of this estimate. [ans.]
(v) What has been implicitly assumed in your analysis about other determinants of liquor consumption over the two‐year period in order to infer causality from the tax change to liquor consumption? [ans.]
Problem C.6 (i) (ii)
Yi – the change in liquor consumption; the Yi are a random sample from a Normal(μ, σ²) distribution.
(i) H0: μ = 0.
(ii) H1: μ < 0.
Problem C.6 (iii)
(iii) • The standard error of ȳ is se(ȳ) = s/√n = 466.4/30 = 15.55.
• Therefore, the t statistic for testing H0: μ = 0 is t = ȳ/se(ȳ) = −32.8/15.55 = −2.11.
• We obtain the p‐value as P(Z ≤ −2.11), where Z ~ Normal(0,1).
• These probabilities are in Table G.1: p‐value = .0174.
• (α = .05) Because the p‐value is below .05, we reject H0 against the one‐sided alternative at the 5% level.
• (α = .01) We do not reject at the 1% level because p‐value = .0174 > .01.
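The Problem C.6 (iii) answer can be reproduced in a few lines (a sketch assuming scipy):

```python
# Numerical check of Problem C.6 (iii): n = 900, ybar = -32.8, s = 466.4.
import math
from scipy.stats import norm

n, ybar, s = 900, -32.8, 466.4
se = s / math.sqrt(n)     # 466.4 / 30, about 15.55
t_stat = ybar / se        # about -2.11
p = norm.cdf(t_stat)      # about .0174
# p < .05 but p > .01: reject at the 5% level, not at the 1% level.
```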
Problem C.6 (iv)
(iv) • The estimated reduction, about 33 ounces, does not seem large for an entire year’s consumption.
– If the alcohol is beer, 33 ounces is less than three 12‐ounce cans of beer.
– Even if this is hard liquor, the reduction seems small.
– (On the other hand, when aggregated across the entire population, alcohol distributors might not think the effect is so small.)
Problem C.6 (v)
(v) • The implicit assumption is that other factors that affect liquor consumption – such as income, or changes in price due to transportation costs – are constant over the two years.
Problem C.7
C.7 The new management at a bakery claims that workers are now more productive than they were under old management, which is why wages have “generally increased.” Let Wib be Worker i’s wage under the old management and let Wia be Worker i’s wage after the change. The difference is Di = Wia − Wib. Assume that the Di are a random sample from a Normal(μ, σ²) distribution.
(i) Using the following data on 15 workers, construct an exact 95% confidence interval for μ. [ans.]
obs Wb Wa D=Wa-Wb
1 8.3 9.25 0.95
2 9.4 9 -0.4
3 9 9.25 0.25
4 10.5 10 -0.5
5 11.4 12 0.6
6 8.75 9.5 0.75
7 10 10.25 0.25
8 9.5 9.5 0
9 10.8 11.5 0.7
10 12.55 13.1 0.55
11 12 11.5 -0.5
12 8.65 9 0.35
13 7.75 7.75 0
14 11.25 11.5 0.25
15 12.65 13 0.35
mean 10.16667 10.40667 0.24
Problem C.7 continued
(ii) Formally state the null hypothesis that there has been no change in average wages. In particular, what is E(Di) under H0? If you are hired to examine the validity of the new management’s claim, what is the relevant alternative hypothesis in terms of E(Di)? [ans.]
(iii) Test the null hypothesis from part (ii) against the stated alternative at the 5% and 1% levels. [ans.]
(iv) Obtain the p‐value for the test in part (iii). [ans.]
Problem C.7 (i)
(i) • The average increase in wage is d̄ = .24, or 24 cents.
• The sample standard deviation is about s = .451; with n = 15, se(d̄) = .1164.
• From Table G.2, the 97.5th percentile of the t14 distribution is 2.145.
• So the 95% CI is .24 ± 2.145(.1164), or about −.010 to .490.
Problem C.7 (ii)
(ii) • If μ = E(Di), then H0: μ = 0.
• The alternative is that management’s claim is true: H1: μ > 0.
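The confidence interval in part (i) can be recomputed directly from the D column of the data table (a sketch assuming scipy):

```python
# Exact 95% CI for mu from the 15 wage changes D = Wa - Wb.
import math
import statistics
from scipy.stats import t as t_dist

d = [0.95, -0.4, 0.25, -0.5, 0.6, 0.75, 0.25, 0, 0.7,
     0.55, -0.5, 0.35, 0, 0.25, 0.35]
n = len(d)                            # 15 workers
dbar = statistics.mean(d)             # 0.24
s = statistics.stdev(d)               # about 0.451
se = s / math.sqrt(n)                 # about 0.1164
c = t_dist.ppf(0.975, n - 1)          # about 2.145 (97.5th percentile of t14)
ci = (dbar - c * se, dbar + c * se)   # about (-0.010, 0.490)
```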
Problem C.7 (iv)
(iv) • We obtain the p‐value as P(T > 2.062), where T has the t14 distribution.
• The p‐value obtained from EViews is .029;
– this is half of the p‐value for the two‐sided alternative.
– (Econometrics packages, including EViews, report the p‐value for the two‐sided alternative.)
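The EViews numbers can be reproduced from the summary statistics (a sketch assuming scipy):

```python
# Reproducing the EViews t statistic and p-value for Problem C.7 (iv):
# sample mean 0.24, sample sd 0.450872, n = 15, df = 14.
import math
from scipy.stats import t as t_dist

dbar, s, n = 0.24, 0.450872, 15
t_stat = dbar / (s / math.sqrt(n))   # about 2.0616 (EViews: 2.061595)
p_two = 2 * t_dist.sf(t_stat, 14)    # about .058 (EViews reports .0583)
p_one = p_two / 2                    # about .029, the one-sided p-value
```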
Hypothesis Testing for Di
Date: 05/07/07   Time: 08:03
Sample: 1 15
Included observations: 15
Test of Hypothesis: Mean = 0.000000
Sample Mean = 0.240000
Sample Std. Dev. = 0.450872

Method        Value       Probability
t-statistic   2.061595    0.0583
View / Test of Descriptive Stats / Simple Hypothesis Tests
Problem C.7 (iv)
Good Luck!
FT19, PT15: See you around!