Week 2
a Confidence intervals
b Hypothesis testing for proportions

Ernst Wit / Wim Krijnen

Johann Bernoulli Institute

http://www.math.rug.nl/~ernst

2011-10-31

Estimation

Remember:

Statistical inference is the business of calculating statistics on variables in a sample in order to say something about parameters in a population.

One statistic we like to calculate is an estimator.

Example. In a country, two candidates are involved in a presidential election, Pal and Oba. The country has 10,000 inhabitants, of which 5,600 are in favour of Oba and 4,400 in favour of Pal.

- Parameter: $p = 0.56$ (true fraction of those in favour of Oba)

- Sample: a news organization samples 100 people for their votes.

- Statistic: $\hat{p} = (\text{number in the sample in favour of Oba})/100$ (the estimator).


How good is the estimator?

- Let's assume there are 1,000 news organizations and each performs its own independent poll.

- None of them will be certain to get it right, but their answers will vary around the truth $p = 0.56$:


7.1 Confidence interval

> country<-c(rep(1,5600),rep(0,4400))

> phat <- vector('numeric',1000)

> for (i in 1:1000){

+ phat[i] <- mean(sample(country,size=100,replace=TRUE))}

Quantiles of the 1,000 values of $\hat{p}$:

> quantile(phat,c(0.1,0.9))

10% 90%

0.50 0.62

> quantile(phat,c(0.025,0.975))

2.5% 97.5%

0.46 0.66


The results above suggest

$P(0.50 \le \hat{p} \le 0.62) \approx 0.80,$

$P(0.46 \le \hat{p} \le 0.66) \approx 0.95.$

Since the true $p = 0.56$:

$P(0.50 - p \le \hat{p} - p \le 0.62 - p) = 0.80$

$P(-0.06 \le \hat{p} - p \le 0.06) = 0.80$

The distance between $\hat{p}$ and $p$ is less than 0.06 with 80% certainty. Equivalently,

$P(\hat{p} - 0.06 \le p \le \hat{p} + 0.06) = 0.80$

“The true $p$ lies within the interval $(\hat{p} - 0.06,\ \hat{p} + 0.06)$ with 80% confidence”: an 80% confidence interval.
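The 80% statement can be checked by simulation. The following is a minimal sketch, assuming the same artificial population of 10,000 inhabitants; the seed and the names `covered` and `coverage` are illustrative additions and not part of the original slides.

```r
# Sketch: how often does the interval (phat - 0.06, phat + 0.06) cover the true p?
set.seed(1)                                   # arbitrary seed, for reproducibility only
country <- c(rep(1, 5600), rep(0, 4400))      # 1 = in favour of Oba, 0 = in favour of Pal
p <- mean(country)                            # true fraction, 0.56

# 1,000 independent polls of 100 people each
phat <- replicate(1000, mean(sample(country, size = 100, replace = TRUE)))

# TRUE when the interval around phat contains p (small tolerance for rounding)
covered <- abs(phat - p) <= 0.06 + 1e-9
coverage <- mean(covered)
coverage                                      # should be roughly 0.8
```

Each simulated poll produces its own interval; the fraction of intervals that contain the fixed value $p$ is what "80% confidence" refers to.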


Deriving some theory...

What can we say about the approximate distribution of $\hat{p}$?

1. It is approximately normal.

2. It is approximately unbiased: $E(\hat{p}) = p$

> mean(phat)

[1] 0.56047

3. It has a tractable standard deviation: $SD(\hat{p}) = \sqrt{p(1-p)/n}$

> sqrt(0.56*0.44/100)

[1] 0.04963869

> sd(phat)

[1] 0.04952708


7.2 Confidence interval for population proportion, p

We have seen that (approximately)

$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right),$

and so (approximately)

$\frac{\hat{p} - p}{SE(\hat{p})} \sim N(0, 1),$

where

$SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}.$

From the normal distribution we know, for example, that

$P\left(-1.96 \le \frac{\hat{p} - p}{SE(\hat{p})} \le 1.96\right) \approx 0.95$

- The interval $(\hat{p} - 1.96\,SE(\hat{p}),\ \hat{p} + 1.96\,SE(\hat{p}))$ contains $p$ with approximate probability 0.95.

- This is a 95% confidence interval.


In general

$\alpha/2 = P(Z \le -z^*), \quad 1 - \alpha/2 = P(Z \le z^*)$

zstar <- -qnorm(alpha/2); zstar <- qnorm(1-alpha/2)   # two equivalent ways to obtain z*

[Figure: standard normal density over the z-axis (−4 to 4), with rejection regions of area α/2 in each tail beyond −z_{α/2} and z_{α/2}, and the acceptance region of area 1 − α between them.]

So a $(1-\alpha)100\%$ confidence interval for $p$ is given as

$(\hat{p} - z^* \, SE(\hat{p}),\ \hat{p} + z^* \, SE(\hat{p}))$


What does this mean in practice?

“How does this example really help me, because in practice I have only one sample?”

Well, let's go back to the election example. In one sample we saw

$\hat{p} = 0.58,$

and so we can calculate

$SE(\hat{p}) = \sqrt{\frac{0.58 \times 0.42}{100}} = 0.04935585$

[NOTE: not so different from 0.04952708 from 1,000 samples]

So we can claim that a 95% CI is given by:

$(0.58 - 0.04935 \times 1.96,\ 0.58 + 0.04935 \times 1.96) = (0.483, 0.677)$


Example 7.2

466 out of 1,013 voters rate the president's performance as “good”. Construct a 95% confidence interval for the true proportion.

> n <- 1013; alpha <- 0.05

> phat <- 466/n

> SE <- sqrt(phat*(1-phat)/n)

> zstar <- qnorm(1-alpha/2)

> round(phat + c(-1,1)*zstar*SE,4)

[1] 0.4293 0.4907

This means

The “probability” that the interval covers the true fraction is 0.95.

We say

“We are 95% confident that the true fraction lies between 0.4293 and 0.4907.”


Let R do the work: prop.test

> prop.test(466,1013,conf.level=0.95)

1-sample proportions test with continuity correction

data: 466 out of 1013, null probability 0.5

X-squared = 6.3179, df = 1, p-value = 0.01195

alternative hypothesis: true p is not equal to 0.5

95 percent confidence interval:

0.4290475 0.4912989

sample estimates:

p

0.4600197

Note: Slightly different result from solving Equation 7.1 directly.
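The small discrepancy arises because `prop.test` does not use the interval $\hat{p} \pm z^* \, SE(\hat{p})$: it inverts the score test and, by default, applies a continuity correction. A short sketch comparing the intervals (the variable names `wald`, `score_cc` and `score_nocc` are illustrative):

```r
x <- 466; n <- 1013
phat <- x / n
zstar <- qnorm(0.975)

# Interval from the formula phat +/- z* SE(phat), as in Example 7.2
wald <- phat + c(-1, 1) * zstar * sqrt(phat * (1 - phat) / n)

# Intervals reported by prop.test, with and without continuity correction
score_cc   <- as.numeric(prop.test(x, n, conf.level = 0.95)$conf.int)
score_nocc <- as.numeric(prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int)

round(rbind(wald, score_cc, score_nocc), 4)
```

For a sample of this size the three intervals agree to about two decimals, so the choice matters little here.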


7.5 Confidence intervals for differences of proportions

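Example 7.8: Poll in Week 1: 1,000 interviewed, 560 agree. Week 2: 1,200 interviewed, 570 agree.

Aim: to see if opinion in the population has changed.

Statistic:

$\text{Proportional change} = \hat{p}_2 - \hat{p}_1 = \frac{570}{1200} - \frac{560}{1000} = -0.085.$

Is this really a change?

We use (again) the facts that:

- If $n_1$ and $n_2$ are large, then $\hat{p}_2 - \hat{p}_1$ is approximately normal.

- $E(\hat{p}_2 - \hat{p}_1) = p_2 - p_1$

- $SE(\hat{p}_2 - \hat{p}_1) \approx \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

so that

$\frac{\hat{p}_2 - \hat{p}_1 - (p_2 - p_1)}{SE(\hat{p}_2 - \hat{p}_1)} \sim N(0, 1)$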


Confidence intervals for differences of proportions (2)

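This means that a $(1-\alpha)100\%$ CI for the proportion difference $p_2 - p_1$ is:

$\left(\hat{p}_2 - \hat{p}_1 - z_{\alpha/2} \times SE(\hat{p}_2 - \hat{p}_1),\ \hat{p}_2 - \hat{p}_1 + z_{\alpha/2} \times SE(\hat{p}_2 - \hat{p}_1)\right).$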

In example 7.8:

> prop.test(x=c(560,570),n=c(1000,1200),conf.level=0.95)

data: c(560, 570) out of c(1000, 1200)

X-squared = 15.437, df = 1, p-value = 8.53e-05

95 percent confidence interval: 0.04231207 0.12768793

sample estimates: prop 1 prop 2 0.560 0.475

Conclusion: 0 is not in the CI, so there is a non-zero difference with 95% confidence.
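As a rough check, the interval can also be computed directly from the formula for the difference of two proportions. This sketch (variable names are illustrative) uses the direction $p_2 - p_1$; `prop.test` reports the interval for $p_1 - p_2$ and, by default, widens it with a continuity correction, which is why its endpoints are positive and slightly further apart.

```r
x <- c(560, 570); n <- c(1000, 1200)
phat <- x / n

d  <- phat[2] - phat[1]                      # observed change, Week 2 minus Week 1
se <- sqrt(sum(phat * (1 - phat) / n))       # SE of the difference
ci <- d + c(-1, 1) * qnorm(0.975) * se       # 95% CI for p2 - p1
round(ci, 4)

# prop.test without continuity correction gives the same interval for p1 - p2
ci_r <- as.numeric(prop.test(x = x, n = n, correct = FALSE)$conf.int)
round(ci_r, 4)
```

The interval for $p_2 - p_1$ is simply the mirror image of the one for $p_1 - p_2$, so the conclusion does not depend on the direction chosen.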


7.5.2 Difference in means

Example 7.9: Weight loss under condition placebo (Group 1) or condition drug ephedra (Group 2).

x <- c(0,0,0,2,4,5,13,14,14,14,15,17,17)

y <- c(0,6,7,8,11,13,16,16,16,17,18)

dif <- numeric(0)   # bootstrap differences in means; must exist before the loop

for (i in 1:1000){

x.new <- sample(x,size=length(x),replace=TRUE)

y.new <- sample(y,size=length(y),replace=TRUE)

dif <- c(dif,mean(x.new)-mean(y.new))}

quantile(dif,c(.025,.975))

2.5% 97.5%

-7.610140 2.168007

hist(dif)

[Figure: histogram of dif; x-axis "dif" from −10 to 5, y-axis "Frequency" from 0 to 150.]


General (normal) theory for comparing two means

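Sample 1: $X_1, X_2, \ldots, X_{n_x} \sim N(\mu_1, \sigma_1)$ gives mean $\bar{X}$, variance $s_x^2$

Sample 2: $Y_1, Y_2, \ldots, Y_{n_y} \sim N(\mu_2, \sigma_2)$ gives mean $\bar{Y}$, variance $s_y^2$

Problem: Construct a CI for $\mu_1 - \mu_2$ based on $\bar{X} - \bar{Y}$

$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{SE(\bar{X} - \bar{Y})} \sim t\text{-distribution with } df \text{ degrees of freedom}$

$df = \begin{cases} n_x + n_y - 2 & \text{if } \sigma_1 = \sigma_2 \\ \left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2 \cdot \left(\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}\right)^{-1} & \text{if } \sigma_1 \ne \sigma_2 \end{cases}$

$SE(\bar{X} - \bar{Y}) \approx \begin{cases} \sqrt{s_p^2 (1/n_x + 1/n_y)} & \text{if } \sigma_1 = \sigma_2 \\ \sqrt{s_x^2/n_x + s_y^2/n_y} & \text{if } \sigma_1 \ne \sigma_2 \end{cases}$

where $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$ is the pooled variance.

$(1-\alpha) \cdot 100\%$ C.I. $= (\bar{X} - \bar{Y}) \pm t^* \cdot SE(\bar{X} - \bar{Y})$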
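To make the two cases concrete, the formulas can be evaluated by hand on the weight-loss data of Example 7.9. This is a sketch only; names such as `sp2`, `df_welch` and `ci_pooled` are illustrative, and the results can be compared with the output of `t.test`.

```r
x <- c(0,0,0,2,4,5,13,14,14,14,15,17,17)      # placebo (Group 1)
y <- c(0,6,7,8,11,13,16,16,16,17,18)          # ephedra (Group 2)
nx <- length(x); ny <- length(y)
vx <- var(x) / nx; vy <- var(y) / ny          # s_x^2/n_x and s_y^2/n_y

# Equal variances: pooled variance and nx + ny - 2 degrees of freedom
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
se_pooled <- sqrt(sp2 * (1 / nx + 1 / ny))
df_pooled <- nx + ny - 2

# Unequal variances: SE and degrees of freedom from the second case
se_welch <- sqrt(vx + vy)
df_welch <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

d <- mean(x) - mean(y)
ci_pooled <- d + c(-1, 1) * qt(0.975, df_pooled) * se_pooled
ci_welch  <- d + c(-1, 1) * qt(0.975, df_welch)  * se_welch
round(rbind(ci_pooled, ci_welch), 4)
```

Here $t^*$ is taken as the 0.975 quantile of the t-distribution with the appropriate degrees of freedom, which gives a 95% interval.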


Example 7.9: Weight loss under placebo vs ephedra

> t.test(x,y,var.equal=TRUE,conf.level=0.95)

t = -1.0542, df = 22, p-value = 0.3032 # 13+11-2=22

95 percent confidence interval:

-8.279119 2.698699

> t.test(x,y,var.equal=FALSE,conf.level=0.95)

t = -1.0722, df = 21.99, p-value = 0.2953 # 21.99 ≈ 22

95 percent confidence interval:

-8.187298 2.606878 # output adapted to fit screen

Conclusion: 0 is in the CI, so no difference in average weight loss between the 2 groups is among the values supported with 95% confidence.


7.5.3 Matched samples

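Data: Thickness of shoe sole materials A and B on one foot for each of ten boys.

Aim: a difference in average wear between materials A and B?

> library(MASS)   # assumed source of the shoes data

> t.test(shoes$A,shoes$B)   # unpaired Welch test; the equal-variance option of t.test is spelled var.equal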

...

95 percent CI:

-2.745046 1.925046

sample estimates:

mean of x mean of y

10.63 11.04

[Figure: scatter plot of shoes$B against shoes$A, both axes from 8 to 14.]


Matched pairs should be treated specially

> with(shoes, t.test(A,B,paired=TRUE,conf.level=.95))

Paired t-test

data: A and B

t = -3.3489, df = 9, p-value = 0.008539

95 percent confidence interval:

-0.6869539 -0.1330461

- Conclusion: 0 is not in the CI; nonzero difference in means with 95% confidence.

- Note: Different result from the unpaired confidence interval!

- Paired CI eliminates variability among boys.
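The paired interval is just a one-sample interval on the within-boy differences. A minimal sketch: the vectors `A` and `B` below are the values of the `shoes` data as they appear in the MASS package (an assumption about the data source), and the other names are illustrative.

```r
# Wear of materials A and B for the ten boys (assumed: MASS::shoes)
A <- c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3)
B <- c(14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)

d  <- A - B                       # one difference per boy
n  <- length(d)
se <- sd(d) / sqrt(n)             # SE of the mean difference

tstat <- mean(d) / se             # paired t statistic, df = n - 1
ci <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se
round(c(t = tstat, lower = ci[1], upper = ci[2]), 4)
```

Because differences are taken within each boy, the large boy-to-boy variation in wear cancels out, which is why this interval is so much narrower than the unpaired one.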


Conclusions

- Estimation is one of the foremost statistical inference techniques.

- Estimates are variable: if new data were collected, the estimate would be different.

- Confidence intervals capture the amount of variability of the estimate w.r.t. the true underlying parameter.

- Correct interpretation of a $(1-\alpha)100\%$ CI for a parameter:

In $(1-\alpha)100\%$ of the cases in which a CI would be constructed in the above way, the true parameter value would be contained in it.

- Short-hand:

We are $(1-\alpha)100\%$ confident that the true parameter value lies in the CI.


Hypothesis testing


A Murder Mystery

DATA:

- 6:15 Ernst W. receives 20 min phone call

- 7:06 Ernst W. looks at his watch

- 7:08 Ernst W. bumps into his neighbour

- 7:13 Ernst W. arrives at the party

- 7:29 Ernst W. is discovered with the dead body of the host.

- Pathologist claims that host died between 7:00 and 7:05.

The police arrest W. on suspicion of murder and charge him with:

H0 : Ernst W. is guilty of the murder.

This is their working hypothesis, or the null hypothesis.

W. wants to convince the judge of the alternative hypothesis:

H1 : Ernst W. is not guilty of the murder.


All’s well that ends well

W.'s lawyer suggests the following summary of the data (test statistic):

T = the neighbour saw Ernst W. at home at 7:05.

Ernst W.'s lawyer now argues that:

- IF E.W. was guilty of the murder that took place between 7:00 and 7:05pm,

- THEN it would be impossible that the neighbour saw E.W. at home at 7:05pm.

- BUT the neighbour saw E.W. at home at 7:05.

- SO E.W. cannot be the murderer.

In other words, the “possibility value”, the p-value:

Pr(T happens if H0 is true) = very small.

Therefore the judge rejects the null hypothesis.


What if W. had hired a cheaper lawyer?

Another test-statistic, for instance,

S = W. made phone call at 6:15pm that lasted 20 minutes.

Although true, it does not disprove the null-hypothesis:

- IF E.W. is the murderer,

- THEN he may have made a phone call at 6:15 lasting 20 minutes.

So the p-value,

Pr(S happens if H0 is true) ≠ small.

We cannot reject the null-hypothesis!

[Note: Large p-values do not prove the null-hypothesis.]


Hypothesis Testing: General Theory

1. Key question: Is there evidence in favour of new theory?

2. Null-hypothesis: H0: THE NEW THEORY IS FALSE

3. Alternative hypothesis: H1: THE NEW THEORY IS TRUE

4. Test-statistic: T = summary of evidence in data.

5. Significance level: typically α = 0.05.

6. P-value: P(such an extreme T if H0 is true).

7. Decision:
   - If p-value < significance level, reject H0.
   - If p-value > significance level, do not reject H0.


Statistical hypotheses

The Null-Hypothesis is always of the form:

H0 : parameter = some value.

Alternative Hypothesis is often negation of Null-Hypothesis:

H1 : parameter ≠ some value.

This is called a 2-sided test.

1-sided tests also exist, i.e.,

H1 : parameter > value or parameter < value,

but you are not allowed to choose them after seeing the data.

NOTE: Statistical hypotheses are statements about the population, not about the sample.


The test-statistic, p-value and significance

Test statistic: summary of the data that is informative about the parameter.

There is a main distinction:

- Normal tests

- Non-parametric tests

Significance level: allowable probability of the mistake of rejecting H0 when it is actually true.

P-value: probability, if H0 is true, of a test statistic at least as extreme as the one observed.

Asymmetric decision:

- IF p-value < significance level, then there is sufficient evidence to reject H0 and believe that H1 is true.

- IF p-value > significance level, then there is insufficient evidence to reject H0. Nothing is proven either way.


8.1 Significance test for a population proportion

Example 8.3: Known poverty rate in 2000 is 11.3%. Sample of 50,000 in 2001; 5,850 (11.7%) indicate poverty.

Question: Did rate of poverty increase?

Hypotheses: Test no change of poverty against change of poverty

H0 : p = 0.113

H1 : p ≠ 0.113

Test-statistic:

T = fraction of poor in sample in 2001.

Significance level: α = 0.05.

P-value: (Think hard about this!)

p-value = P(|T − 0.113| > |0.117 − 0.113|)


Calculating the p-value

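Besides being conceptually challenging, we also need to remember a few things from last lecture in order to calculate the p-value.

Remember: For large $n$ the sample proportion $\hat{p}$ satisfies, under H0,

$\frac{\hat{p} - p_0}{SE(\hat{p} \mid H_0)} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0, 1).$

So, with $T = \hat{p}$,

$\begin{aligned} \text{p-value} &= P(|T - 0.113| > |0.117 - 0.113|) \\ &= P\left(\left|\frac{\hat{p} - 0.113}{\sqrt{0.113(1-0.113)/50000}}\right| > \left|\frac{0.117 - 0.113}{\sqrt{0.113(1-0.113)/50000}}\right|\right) \\ &= P(|Z| > 2.825) \\ &= 2P(Z > 2.825) \\ &= 0.0047 \end{aligned}$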
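The same calculation in R, as a short sketch (the names `z` and `pval` are illustrative):

```r
p0 <- 0.113; n <- 50000
phat <- 5850 / n                              # observed fraction, 0.117

# Standardise using the standard error under H0
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided p-value from the standard normal distribution
pval <- 2 * pnorm(-abs(z))
round(c(z = z, p.value = pval), 4)
```

Note that the standard error uses $p_0$, not $\hat{p}$, because the p-value is computed assuming H0 is true.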


Decision

By the way, we can also get the p-value from R:

> prop.test(x=5850,n=50000,p=.113)

1-sample proportions test with continuity correction

data: 5850 out of 50000, null probability 0.113

X-squared = 7.9417, df = 1, p-value = 0.004831

Decision: p-value < 0.05, so we reject H0.

Conclusion: We have significant evidence that poverty increased in the population from 2000 to 2001.

PS. in the book a one-sided test is performed. This is not recommended!


8.5 Two-sample tests of proportion

Example 8.8: Poverty rate (continued). What happened with the fraction of poor from 2001 to 2002?

Sample:
2001: 5850 out of 50000 indicate poverty
2002: 7260 out of 60000 indicate poverty

Hypotheses: H0 : p1 = p2 against H1 : p1 ≠ p2, or equivalently:

H0 : p2 − p1 = 0

H1 : p2 − p1 ≠ 0

Test-statistic: $T = \hat{p}_2 - \hat{p}_1$.

We can use again that, approximately for large $n_1$ and $n_2$:

$\frac{\hat{p}_2 - \hat{p}_1 - (p_2 - p_1)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \sim N(0, 1)$


Did poverty change from 2001 to 2002?

We can calculate the p-value by first principles, or ...

> prop.test(x=c(5850,7260),n=c(50000,60000))

data: c(5850, 7260) out of c(50000, 60000)

X-squared = 4.1187, df = 1, p-value = 0.04241

alternative hypothesis: two.sided

95 percent confidence interval:

-0.0078584975 -0.0001415025

sample estimates:

prop 1 prop 2

0.117 0.121

Decision: p-value < 0.05, so reject H0.

Conclusion: Also from 2001 to 2002 the fraction of poor increased in the population.
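A sketch of the "first principles" route, using the normal approximation for the difference of two proportions with the unpooled standard error (variable names are illustrative). `prop.test` instead uses a chi-squared statistic based on the pooled proportion under H0 plus a continuity correction, so its p-value differs slightly from the one computed here.

```r
x <- c(5850, 7260); n <- c(50000, 60000)
phat <- x / n

d  <- phat[2] - phat[1]                      # observed T = phat2 - phat1
se <- sqrt(sum(phat * (1 - phat) / n))       # unpooled SE of the difference
z  <- d / se                                 # under H0, p2 - p1 = 0
pval <- 2 * pnorm(-abs(z))                   # two-sided p-value

pval_r <- prop.test(x = x, n = n)$p.value    # for comparison
round(c(z = z, p.first.principles = pval, p.prop.test = pval_r), 4)
```

Both p-values fall just below 0.05, so the decision is the same either way, although the evidence is much weaker than for the change from 2000 to 2001.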


One Sample T-Test

1. Key question of Interest:

Is the population mean equal to some pre-specified number, say $\mu_0$?

2. Hypotheses: Null-Hypothesis and Alternative Hypothesis:

H0 : population mean = $\mu_0$

H1 : population mean ≠ $\mu_0$.

3. Data is a random sample from a population.

4. Test-Statistic: T = sample mean

5. P-value: $P(|T - \mu_0| > |\bar{x} - \mu_0|)$, where $\bar{x}$ is the observed sample mean.


IQ OF DRUG OFFENDERS

Question: Is there evidence that the average IQ level of the soft drug offender population is different from 100?

Data: IQ scores on a sample of 15 soft drug offenders.

[Figure: histogram "IQ of 15 soft drugs offenders"; x-axis x from 80 to 150, y-axis Frequency from 0 to 4.]

Hypotheses:

H0 : pop. mean IQ soft drug offenders = 100

H1 : pop. mean IQ soft drug offenders ≠ 100


Test-statistic, p-value and conclusion

Test-statistic: T = average IQ of 15 drug offenders.

P-value: We observe $\bar{x} = 114.6$ and SD = 16.6:

$\begin{aligned} \text{p-value} &= P(|T - 100| > |114.6 - 100|) \\ &= P\left(\frac{|T - 100|}{16.6/\sqrt{15}} > \frac{|114.6 - 100|}{16.6/\sqrt{15}}\right) \\ &= P(|t_{14}| > 3.41) \\ &= 0.0042 \end{aligned}$

... or with R:

> t.test(x,mu=100)

t = 3.41, df = 14, p-value = 0.004228

alternative hypothesis: true mean is not equal to 100

Decision: Since p-value = 0.0042 < 0.05, we reject the null hypothesis H0.

Conclusion: Soft drug users are on average more intelligent than the general population.

Note: (a) conclusion in terms of population (b) one-sided conclusion.
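The raw IQ scores are not listed on the slides, but the test can be reproduced approximately from the reported summary statistics. A sketch under that assumption (names are illustrative; because 114.6 and 16.6 are rounded, the result agrees with `t.test` only to about two decimals):

```r
xbar <- 114.6; s <- 16.6; n <- 15; mu0 <- 100

# t statistic: distance of the sample mean from mu0 in standard errors
tstat <- (xbar - mu0) / (s / sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 = 14 degrees of freedom
pval <- 2 * pt(-abs(tstat), df = n - 1)
round(c(t = tstat, p.value = pval), 4)
```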


Matched samples: paired t-test

Data: Thickness of shoe sole materials A and B on one foot for each of ten boys.

Aim: a difference in average wear between materials A and B?

> t.test(shoes$A,shoes$B)

data: shoes$A and shoes$B

t = -0.3689, df = 17.987,

p-value = 0.7165

[Figure: scatter plot of shoes$B against shoes$A, both axes from 8 to 14.]


Matched pairs should be treated specially

> with(shoes, t.test(A,B,paired=TRUE,conf.level=.95))

Paired t-test

data: A and B

t = -3.3489, df = 9, p-value = 0.008539

95 percent confidence interval:

-0.6869539 -0.1330461

- Decision: p-value < 0.05, so reject H0.

- Conclusion: there is a difference between average wear of the two types of materials.

- Paired t-test eliminates variability among boys.


Non-parametric tests

Have another look at the picture:

[Figure: scatter plot of shoes$B against shoes$A, both axes from 8 to 14, with the line of equality.]

Fact: 8 out of 10 points lie above the line of equality...

Test-statistic: T = number of points above the line


Sign test

Hypotheses:

H0 : Median wear A = Median wear B

H1 : Median wear A ≠ Median wear B

Note: $T \mid H_0 \sim \text{Binomial}(10, 0.5)$.

$\text{p-value} = P(|T - 5| \ge |8 - 5|) = P(T \ge 8) + P(T \le 2) = 0.1094$

- Decision: p-value > 0.05, so do not reject H0.

- Conclusion: there is no evidence in T that there is a difference between average wear of the two types of materials.

- Sign test is LESS powerful than the paired t-test, BUT it makes fewer assumptions.
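The sign-test p-value follows directly from the Binomial(10, 0.5) distribution. A short sketch (names are illustrative), with `binom.test` as a cross-check:

```r
n <- 10; t_obs <- 8                          # 8 of the 10 points lie above the line

# P(T <= 2) + P(T >= 8) under T ~ Binomial(10, 0.5)
pval <- pbinom(2, n, 0.5) + (1 - pbinom(7, n, 0.5))

# The exact binomial test gives the same two-sided p-value
pval_r <- binom.test(t_obs, n, p = 0.5)$p.value
round(c(by.hand = pval, binom.test = pval_r), 4)
```

Only the signs of the differences are used, not their sizes, which is why the test needs fewer assumptions but detects the difference less readily than the paired t-test.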


Conclusions

Hypothesis testing checks to what extent the data can be explained by a status quo assumption, or whether that assumption needs to be rejected because the data would otherwise be too unusual.

- Terminology: hypotheses, test-statistic, p-value, significance level.

- Interpretation: be careful when interpreting p-values!

- Tests: most common tests are based on the normal distribution. Non-parametric tests avoid making normal assumptions, but are less powerful if these assumptions are actually true.
