Confidence intervals and hypothesis testing Petter Mostad 2005.10.03.

Confidence intervals (repetition)

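• Assume $\mu$ and $\sigma^2$ are some real numbers, and assume the data $X_1, X_2, \ldots, X_n$ are a random sample from $N(\mu, \sigma^2)$.

– Then $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$

– thus $P(-1.96 \le Z \le 1.96) = 95\%$

– so $P\left(\bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right) = 95\%$

and we say that $\left(\bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right)$ is a confidence interval for $\mu$ with 95% confidence, based on the statistic $\bar{X}$.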
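As a minimal sketch, the 95% interval for the mean with a known standard deviation could be computed as follows. The function name `z_confidence_interval`, the sample values, and the value 2.0 for the known standard deviation are made up for illustration and are not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_confidence_interval(data, sigma, confidence=0.95):
    """Interval for the mean of normal data when sigma is known."""
    n = len(data)
    xbar = fmean(data)
    # Upper quantile of N(0,1); about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical measurements; sigma = 2.0 is an assumed known value.
sample = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8, 10.1, 11.5]
low, high = z_confidence_interval(sample, sigma=2.0)
print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")
```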

Confidence intervals, general idea

• We have a model with an unknown parameter

• We find a ”statistic” (function of the sample) with a known distribution, depending only on the unknown parameter

• This distribution is used to construct an interval with the following property: If you repeat many times selecting a parameter and simulating the statistic, then about (say) 95% of the time, the confidence interval will contain the parameter
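The repeated-sampling property can be checked by simulation. The sketch below assumes normal data with a known standard deviation; the range from which the parameter is drawn, the sample size, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_coverage(sigma=2.0, n=25, repetitions=2000, confidence=0.95, seed=1):
    """Fraction of simulated intervals that contain the parameter."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        mu = rng.uniform(-10.0, 10.0)                      # select a parameter
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # simulate the data
        xbar = fmean(sample)                               # the statistic
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repetitions

coverage = simulate_coverage()
print(f"observed coverage: {coverage:.3f}")
```

The observed fraction should be close to the nominal 95%, with some simulation noise.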

Hypothesis testing

• Selecting the most plausible model for the data, among those suggested

• Example: Assume $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known, but $\mu$ is not; we want to select $\mu$ fitting the data.

• One possibility is to look at the probability of observing the data given different values for μ. (We will return to this)

• Another is to do a hypothesis test

Example

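• We select two alternative hypotheses:

– $H_0: \mu = \mu_0$

– $H_1: \mu \ne \mu_0$

• Use the value of $\bar{X}$ to test $H_0$ versus $H_1$: If $\bar{X}$ is far from $\mu_0$, it will indicate $H_1$.

• Under $H_0$, we know that $P\left(\bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}} \le \mu_0 \le \bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right) = 95\%$

• Reject $H_0$ if $\bar{X}$ is outside $\left(\mu_0 - 1.96\,\dfrac{\sigma}{\sqrt{n}},\ \mu_0 + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right)$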
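A minimal sketch of this rejection rule, assuming the standard deviation is known: the function name `reject_h0` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def reject_h0(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 when sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z_crit * sigma / sqrt(n)
    # Reject when the sample mean falls outside mu0 +/- half_width.
    return not (mu0 - half_width <= xbar <= mu0 + half_width)

# Hypothetical numbers: n = 25 observations, sigma = 2.0, mu0 = 10.0.
far = reject_h0(xbar=10.9, mu0=10.0, sigma=2.0, n=25)   # 2.25 standard errors from mu0
near = reject_h0(xbar=10.5, mu0=10.0, sigma=2.0, n=25)  # 1.25 standard errors from mu0
print(far, near)
```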

General outline for hypothesis testing

• The possible hypotheses are divided into H0, the null hypothesis, and H1, the alternative hypothesis

• A hypothesis can be

– Simple, so that it is possible to compute the probability of data (e.g., $\mu = 3.7$)

– Composite, i.e., a collection of simple hypotheses (e.g., $\mu > 3.7$)

General outline (cont.)

• A test statistic is selected. It must:

– Have a higher probability for ”extreme” values under H1 than under H0

– Have a known distribution under H0 (when simple)

• If the value of the test statistic is ”too extreme”, then H0 is rejected.

• The probability, under H0, of observing the given data or something more extreme is called the p-value. Thus we reject H0 if the p-value is small.

• The threshold below which the p-value leads us to reject H0 is called the significance level.
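For the normal example with a known standard deviation, the two-sided p-value could be computed as in the sketch below. The function name and the numbers are hypothetical; "more extreme" is taken to mean a sample mean at least as far from the hypothesized mean, in either direction.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(xbar, mu0, sigma, n):
    """P(|Z| >= |z_observed|) under H0: mu = mu0, with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: the observed mean is 2.25 standard errors above mu0.
p_value = two_sided_p_value(xbar=10.9, mu0=10.0, sigma=2.0, n=25)
significance_level = 0.05
print(f"p-value = {p_value:.4f}, reject H0: {p_value < significance_level}")
```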

Note:

• There is an asymmetry between H0 and H1: In fact, if the data is inconclusive, we end up not rejecting H0.

• If H0 is true the probability to reject H0 is (say) 5%. That DOES NOT MEAN we are 95% certain that H0 is true!

• How much evidence we have for choosing H1 over H0 depends entirely on how much more probable rejection is if H1 is true.

Errors of types I and II

• The above can be seen as a decision rule for H0 or H1.

• For any such rule we can compute (if both H0 and H1 are simple hypotheses):

                H0 true                          H1 true

Accept H0       P(accept | H0)                   P(accept | H1)
                                                 (TYPE II error, 1 - power)

Reject H0       P(reject | H0)                   P(reject | H1)
                (TYPE I error, significance)

Significance and power

• If H0 is composite, we compute the significance from the simple hypothesis that gives the largest probability of rejecting H0.

• If H1 is composite, we compute a power value for each simple hypothesis. Thus we get a power function.
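A power function can be sketched for the two-sided test of a normal mean with a known standard deviation. This is an illustration under those assumptions; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power(mu, mu0, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) when the true mean is mu, with sigma known."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    delta = (mu - mu0) / (sigma / sqrt(n))  # shift measured in standard errors
    # Rejection happens in the lower tail or in the upper tail.
    return std.cdf(-z_crit - delta) + 1 - std.cdf(z_crit - delta)

# Evaluate the power function at a few simple hypotheses (hypothetical values).
power_at_null = power(10.0, mu0=10.0, sigma=2.0, n=25)  # equals the significance
power_at_11 = power(11.0, mu0=10.0, sigma=2.0, n=25)
power_at_12 = power(12.0, mu0=10.0, sigma=2.0, n=25)
print(power_at_null, power_at_11, power_at_12)
```

At the null value the function returns the significance, and it increases as the true mean moves away from the hypothesized one.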

Example 1: Normal distribution with unknown variance

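• Assume $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$

• Then $\dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$

• Thus $P\left(\bar{X} - t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}}\right) = 1 - \alpha$

• So a confidence interval for $\mu$, with significance $\alpha$, is given by $\left(\bar{X} - t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}}\right)$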
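A minimal sketch of the interval when the variance is unknown, assuming SciPy is available for the t quantile. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev
from scipy import stats

def t_confidence_interval(data, alpha=0.05):
    """Interval for the mean of normal data with unknown variance."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # upper alpha/2 point of t_{n-1}
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data.
sample = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8, 10.1, 11.5]
low, high = t_confidence_interval(sample)
print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")
```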

Example 1 (Hypothesis testing)

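• Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$

• Test statistic under $H_0$: $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$

• Reject $H_0$ if $\bar{X} < \mu_0 - t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}}$ or if $\bar{X} > \mu_0 + t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}}$

• Alternatively, the p-value for the test can be computed (if $\bar{X} > \mu_0$) as the $\alpha$ such that $\bar{X} = \mu_0 + t_{n-1,\alpha/2}\,\dfrac{s}{\sqrt{n}}$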

Example 1 (cont.)

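• Hypotheses: $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$

• Test statistic assuming $\mu = \mu_0$: $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$

• Reject $H_0$ if $\bar{X} > \mu_0 + t_{n-1,\alpha}\,\dfrac{s}{\sqrt{n}}$

• Alternatively, the p-value for the test can be computed as the $\alpha$ such that $\bar{X} = \mu_0 + t_{n-1,\alpha}\,\dfrac{s}{\sqrt{n}}$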
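The one-sample t test, in its two-sided form and in a one-sided form with the alternative that the mean is larger than the hypothesized value, could be sketched as below. SciPy is assumed for the t distribution; the function name, the `alternative` labels, and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev
from scipy import stats

def one_sample_t_test(data, mu0, alternative="two-sided"):
    """Return the t statistic and the p-value for a test about the mean."""
    n = len(data)
    t_stat = (fmean(data) - mu0) / (stdev(data) / sqrt(n))
    if alternative == "two-sided":
        p = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # both tails
    elif alternative == "greater":
        p = stats.t.sf(t_stat, df=n - 1)           # upper tail only
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return t_stat, p

# Hypothetical data and hypothesized mean.
sample = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8, 10.1, 11.5]
t_two, p_two = one_sample_t_test(sample, 9.5)
_, p_greater = one_sample_t_test(sample, 9.5, alternative="greater")
print(t_two, p_two, p_greater)
```

When the sample mean lies above the hypothesized mean, the one-sided p-value is half of the two-sided one.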

Example 1 (cont.)

• Assume that you want to analyze as above the data in some column of an SPSS table.

• Use ”Analyze” => ”Compare means” => ”One-sample T Test”

• The output contains a confidence interval and a test like the one described above.

• You may adjust the confidence level using ”Options…”

Example 2: Differences between means

• Assume $X_1, X_2, \ldots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, Y_2, \ldots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$

• We would like to study the difference $\mu_x - \mu_y$

• Four different cases:

– Matched pairs

– Known population variances

– Unknown but equal population variances

– Unknown and possibly different pop. variances

Known population variances

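• We get $\dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}} \sim N(0,1)$

• Confidence interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm Z_{\alpha/2}\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}$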
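A minimal sketch of the interval for the difference between two means when both population variances are known. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci_known_var(xbar, ybar, var_x, var_y, n_x, n_y, alpha=0.05):
    """Interval for mu_x - mu_y with known population variances."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_x / n_x + var_y / n_y)
    diff = xbar - ybar
    return diff - half_width, diff + half_width

# Hypothetical summary values for the two samples.
low, high = diff_means_ci_known_var(xbar=5.0, ybar=4.0, var_x=4.0, var_y=9.0,
                                    n_x=16, n_y=36)
print(f"({low:.3f}, {high:.3f})")
```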

Unknown but equal population variances

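• We get $\dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}} \sim t_{n_x+n_y-2}$

where $s_p^2 = \dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$

• Confidence interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm t_{n_x+n_y-2,\alpha/2}\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}$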

Hypothesis testing: Unknown but equal population variances

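• Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \ne \mu_y$

• Test statistic (under $H_0$): $\dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}} \sim t_{n_x+n_y-2}$

• Reject $H_0$ if $\dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}} > t_{n_x+n_y-2,\alpha/2}$ or if $\dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}} < -t_{n_x+n_y-2,\alpha/2}$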

”T test with equal variances”
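The equal-variance t test could be sketched as follows, assuming SciPy for the t distribution. The function name and the two data sets are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance
from scipy import stats

def pooled_t_test(x, y):
    """Two-sided test of equal means, assuming equal population variances."""
    n_x, n_y = len(x), len(y)
    # Pooled variance estimate: weighted average of the two sample variances.
    s2_p = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    t_stat = (fmean(x) - fmean(y)) / sqrt(s2_p / n_x + s2_p / n_y)
    df = n_x + n_y - 2
    p = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p

# Hypothetical samples.
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
y = [10.8, 11.6, 10.2, 11.1, 10.9, 11.8, 10.5]
t_stat, df, p = pooled_t_test(x, y)
print(t_stat, df, p)
```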

Unknown and possibly unequal population variances

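• We get (approximately) $\dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} \sim t_{\nu}$

where $\nu = \dfrac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}}$

• Conf. interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm t_{\nu,\alpha/2}\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}$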

Hypothesis test: Unknown and possibly unequal pop. variances

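• Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \ne \mu_y$

• Test statistic (under $H_0$, approximately): $\dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} \sim t_{\nu}$

• Reject $H_0$ if $\dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} > t_{\nu,\alpha/2}$ or if $\dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} < -t_{\nu,\alpha/2}$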

”T test with unequal variances”
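A sketch of the unequal-variance test, with the degrees of freedom computed from the two sample variances and the sample sizes. SciPy is assumed for the t distribution; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance
from scipy import stats

def unequal_var_t_test(x, y):
    """Two-sided test of equal means without assuming equal variances."""
    n_x, n_y = len(x), len(y)
    vx, vy = variance(x) / n_x, variance(y) / n_y  # s_x^2/n_x and s_y^2/n_y
    t_stat = (fmean(x) - fmean(y)) / sqrt(vx + vy)
    # Approximate degrees of freedom; generally not an integer.
    nu = (vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1))
    p = 2 * stats.t.sf(abs(t_stat), nu)
    return t_stat, nu, p

# Hypothetical samples.
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
y = [10.8, 11.6, 10.2, 11.1, 10.9, 11.8, 10.5]
t_stat, nu, p = unequal_var_t_test(x, y)
print(t_stat, nu, p)
```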

Practical examples:

• The heights of the children in a class are measured at age 8 and at age 10. Use the data to find an estimate, with confidence limits, of how much children grow between these ages.

• You want to determine whether a costly operation is generally done more cheaply in France than in Norway. Your data are the actual costs of 10 such operations in Norway and 20 in France.

Example 3: Population proportions

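• Assume $X \sim \mathrm{Bin}(n, \pi)$, so that $p = X/n$ is a frequency.

• Then $\dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$ (approximately, for large n)

• Thus $\dfrac{p - \pi}{\sqrt{p(1-p)/n}} \sim N(0,1)$ (approximately, for large n)

• Thus $P\left(p - Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}} \le \pi \le p + Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}\right) \approx 1 - \alpha$

• Confidence interval for $\pi$: $\left(p - Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}},\ p + Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}\right)$

• Hypotheses: $H_0: \pi = \pi_0$, $H_1: \pi \ne \pi_0$

• Test statistic under $H_0$, for large n: $\dfrac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \sim N(0,1)$

• Reject $H_0$ if $p < \pi_0 - Z_{\alpha/2}\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}$ or if $p > \pi_0 + Z_{\alpha/2}\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}$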
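A minimal sketch of the large-sample interval and test for a population proportion, based on the normal approximation. The function names and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, alpha=0.05):
    """Approximate interval for a proportion from x successes in n trials."""
    p = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p * (1 - p) / n)  # uses the observed frequency
    return p - half_width, p + half_width

def reject_proportion(x, n, prop0, alpha=0.05):
    """Two-sided test of H0: proportion = prop0, for large n."""
    p = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(prop0 * (1 - prop0) / n)  # uses the hypothesized value
    return p < prop0 - half_width or p > prop0 + half_width

# Hypothetical counts: 60 successes in 100 trials, then 55 in 100.
low, high = proportion_ci(60, 100)
reject_60 = reject_proportion(60, 100, prop0=0.5)
reject_55 = reject_proportion(55, 100, prop0=0.5)
print(low, high, reject_60, reject_55)
```

Note that the interval uses the observed frequency in the standard error, while the test uses the hypothesized proportion.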

Example 4: Differences between population proportions

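• Assume $X_1 \sim \mathrm{Bin}(n_1, \pi_1)$ and $X_2 \sim \mathrm{Bin}(n_2, \pi_2)$, so that $p_1 = X_1/n_1$ and $p_2 = X_2/n_2$ are frequencies

• Then $\dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{\pi_1(1-\pi_1)}{n_1} + \dfrac{\pi_2(1-\pi_2)}{n_2}}} \sim N(0,1)$ (approximately)

• Confidence interval for $\pi_1 - \pi_2$: $p_1 - p_2 \pm Z_{\alpha/2}\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$

• Hypotheses: $H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \ne \pi_2$

• Test statistic $\dfrac{p_1 - p_2}{\sqrt{\dfrac{p_0(1-p_0)}{n_1} + \dfrac{p_0(1-p_0)}{n_2}}} \sim N(0,1)$

where $p_0 = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$

• Reject $H_0$ if $\dfrac{|p_1 - p_2|}{\sqrt{\dfrac{p_0(1-p_0)}{n_1} + \dfrac{p_0(1-p_0)}{n_2}}} > Z_{\alpha/2}$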
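The test for equal proportions, using the pooled frequency under the null hypothesis, could be sketched as below. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_test(x1, n1, x2, n2, alpha=0.05):
    """Two-sided large-sample test of equal proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled frequency under H0
    se = sqrt(p0 * (1 - p0) / n1 + p0 * (1 - p0) / n2)
    z = (p1 - p2) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) > z_crit

# Hypothetical counts: 60 of 100 in the first group, 45 of 100 in the second.
z, reject = two_proportion_test(60, 100, 45, 100)
print(f"z = {z:.3f}, reject H0: {reject}")
```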

Example 5: The variance of a normal distribution

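• Assume $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$

• Then $\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$

• Thus $P\left(\chi^2_{n-1,1-\alpha/2} \le \dfrac{(n-1)s^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2}\right) = 1 - \alpha$, where $\chi^2_{n-1,\alpha}$ denotes the value exceeded with probability $\alpha$

• Confidence interval for $\sigma^2$: $\left(\dfrac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}},\ \dfrac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right)$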
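A minimal sketch of the interval for the variance of normal data, assuming SciPy for the chi-square quantiles. The function name and the data are hypothetical. SciPy's `ppf` takes lower-tail probabilities, so the point exceeded with probability alpha/2 is `ppf(1 - alpha / 2)`.

```python
from statistics import variance
from scipy import stats

def variance_ci(data, alpha=0.05):
    """Interval for the variance of normally distributed data."""
    n = len(data)
    s2 = variance(data)  # sample variance (divisor n - 1)
    q_large = stats.chi2.ppf(1 - alpha / 2, df=n - 1)  # exceeded with prob. alpha/2
    q_small = stats.chi2.ppf(alpha / 2, df=n - 1)      # exceeded with prob. 1 - alpha/2
    # Dividing by the larger quantile gives the lower limit.
    return (n - 1) * s2 / q_large, (n - 1) * s2 / q_small

# Hypothetical data.
sample = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8, 10.1, 11.5]
low, high = variance_ci(sample)
print(f"95% confidence interval for the variance: ({low:.3f}, {high:.3f})")
```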

Example 6: Comparing variances for normal distributions

• Assume $X_1, X_2, \ldots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, Y_2, \ldots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$

• We get $\dfrac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \sim F_{n_x-1,\,n_y-1}$

• $F_{n_x-1,\,n_y-1}$ is an F distribution with $n_x - 1$ and $n_y - 1$ degrees of freedom

• We can use this exactly as before to obtain a confidence interval for $\sigma_x^2/\sigma_y^2$ and for testing, for example, if $\sigma_x^2 = \sigma_y^2$

• Note: The assumption of normality is crucial!
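A sketch of an interval for the ratio of two variances, and of the corresponding test of equal variances, assuming normal data and SciPy for the F quantiles. The function name and the data are hypothetical.

```python
from statistics import variance
from scipy import stats

def variance_ratio_ci(x, y, alpha=0.05):
    """Interval for var_x / var_y based on the F distribution."""
    ratio = variance(x) / variance(y)
    df_x, df_y = len(x) - 1, len(y) - 1
    f_high = stats.f.ppf(1 - alpha / 2, df_x, df_y)
    f_low = stats.f.ppf(alpha / 2, df_x, df_y)
    return ratio / f_high, ratio / f_low

# Hypothetical samples.
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
y = [10.8, 11.6, 10.2, 11.1, 10.9, 11.8, 10.5]
low, high = variance_ratio_ci(x, y)
# Equal variances are rejected when the interval does not contain 1.
reject_equal_variances = not (low <= 1.0 <= high)
print(low, high, reject_equal_variances)
```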

Sample size computations

• For a sample from a normal population with known variance, the width of the confidence interval for the mean depends only on the sample size (for a given confidence level).

• So we can compute the necessary sample size to match a required accuracy

• Note: If the variance is unknown, it must somehow be estimated beforehand to do the computation

• Works also for population proportion estimation, giving an inequality for the required sample size
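The sample size computations can be sketched as follows. The interval half-width is required to be at most a chosen value; for proportions, the worst case of a proportion equal to 0.5 is used to bound the unknown variance, which is one common choice and an assumption here. Function names and numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, half_width, alpha=0.05):
    """Smallest n with z * sigma / sqrt(n) <= half_width (sigma known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / half_width) ** 2)

def sample_size_proportion(half_width, alpha=0.05):
    """Smallest n that guarantees the half-width for any proportion.

    Uses p * (1 - p) <= 1/4, so n >= z^2 / (4 * half_width^2) is sufficient.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 / (4 * half_width ** 2))

# Hypothetical requirements.
n_mean = sample_size_mean(sigma=2.0, half_width=0.5)
n_prop = sample_size_proportion(half_width=0.03)
print(n_mean, n_prop)
```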

Power computations

• If you reject H0, you know very little about the evidence for H1 versus H0 unless you study the power of the test.

• The power is the probability of rejecting H0 given that a hypothesis in H1 is true, i.e., 1 minus the probability of a type II error.

• Thus it is a function of the possible hypotheses in H1.

• We would like our tests to have as high power as possible.
