Hypothesis testing
Summer Program
Brian Healy
Last class
Study design
– What is sampling variability?
– How does our sample affect the questions we can answer?
Basics of probability
Central limit theorem
Sample mean
What are we doing today?
Rare event
p-value
Hypothesis test
t-distribution / sample standard deviation
Big picture
Last week we saw that we can estimate the population mean with the sample mean, and that the central limit theorem tells us the distribution of the sample mean.
Now we are going to consider testing whether the population mean is equal to a hypothesized value, using our sample mean. The statement that the mean equals this hypothesized value is called the null hypothesis. This test allows us to compare our sample to a value in a statistically meaningful way.
Null hypothesis
We set up the null hypothesis so that we can reject it: the test is designed to disprove the null.
Stating the null hypothesis is the first and most important step in any problem, and it requires knowledge of the problem.
Notation: H0
H0: My mother can run a 5 minute mile.
– Not: My mother cannot run a 5 minute mile.
H0: The probability of heads on the coin is 0.5.
– Not: The probability is not 0.5.
Alternative hypothesis
Notation: HA or H1
Has two characteristics:
– Must cover all values not included in the null
– Must contain the value that we think is going to happen
HA: My mother runs a mile slower than 5 minutes.
HA: The probability of heads is not 0.5.
Hypothesis test
Definition: a statistical test of a null hypothesis
Completed under the assumption that the null is true (conditional probability)
Always want to disprove the null hypothesis
– Ex. H0: Mom's mean time <= 5:00, HA: Mom's mean time > 5:00 (one-sided)
– Alternatively: H0: Probability of heads = 0.5, HA: Probability of heads != 0.5 (two-sided)
The most important step is properly defining the null and alternative hypotheses.
How do we test this hypothesis?
Take a sample
As we have discussed, we want to think carefully about how to collect the sample so that we limit bias and confounding and allow the results to be generalized to the proper population.
From this sample, we can compute a summary statistic and compare it to the null hypothesis
– Mean (t-test, linear regression)
– Median (Wilcoxon tests, quantile regression)
What does this have to do with the CLT?
To test a hypothesis, we take a sample and find the sample mean
– Ex. Have my mom run a mile 10 times, or flip the coin 20 times
– Determining the proper sample size is next class
Under the null hypothesis, we know the population mean
We sometimes may know the population variance
Under these conditions, the CLT tells us that the distribution of the sample mean is (approximately) normal with known mean and variance
Distribution of test statistic
Under the null hypothesis, we know that the distribution of $\bar{x}$ is (approximately) normal with mean $\mu_0$ and standard deviation $\sigma/\sqrt{n}$.
Now, we want to find the probability, under the null, of observing our sample mean or a value more extreme (the p-value), to judge whether the data are consistent with the null hypothesis.
Have we observed a rare event? Is it rare enough to reconsider the null?
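A small simulation can make the claim about the sampling distribution concrete. This is a sketch under assumed values, not data from the course: the population is taken to be exponential with mean 2 (a skewed, clearly non-normal shape chosen only for illustration), with n = 35 and 10,000 simulated samples.

```r
# Sketch: simulate the sampling distribution of the sample mean.
# Assumed (illustrative) population: exponential with mean 2, so sigma = 2.
set.seed(1)
n     <- 35      # sample size
mu    <- 2       # population mean of the assumed population
sigma <- 2       # population SD (an exponential's SD equals its mean)
reps  <- 10000   # number of simulated samples

# Draw many samples of size n and keep each sample mean
xbars <- replicate(reps, mean(rexp(n, rate = 1 / mu)))

# Compare the simulated mean and SD of xbar with the CLT values mu and sigma/sqrt(n)
c(mean_of_xbars = mean(xbars), sd_of_xbars = sd(xbars),
  clt_mean = mu, clt_sd = sigma / sqrt(n))
```

Even though individual observations are skewed, the simulated sample means center on mu with spread close to sigma/sqrt(n).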
What is a rare event?
My mom claims that she runs a mile in 5 minutes.
I think she can't. How can I test this?
What happens if she ran a mile in
– 5:15 minutes?
– 6 minutes?
– 10 minutes?
What if she ran 5 separate miles at 10 minutes on average?
What is a rare event?
You play a game against a friend. In this game, you win a dollar if the coin is heads and you lose a dollar if the coin is tails.
What is the null hypothesis?
What if the coin landed on tails 2 consecutive times?
What if the coin landed on tails 10 consecutive times?
At what point would you start to get suspicious?
We want to know if the event we observed could have happened simply by chance or if something else is more likely going on.
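The intuition about consecutive tails can be made numerical: if the coin is fair and tosses are independent, the chance of k tails in a row is 0.5^k. The short R snippet below just tabulates that formula.

```r
# Probability that a fair coin lands tails k times in a row: 0.5^k
k <- 1:10
p_all_tails <- 0.5^k
names(p_all_tails) <- paste0(k, "_tails")
signif(p_all_tails, 3)
# 2 tails in a row: 0.25; 10 tails in a row: 1/1024, about 0.001
```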
P-value
Tells you how rare the event is
Definition: given a null hypothesis, the probability of the observed value or something more extreme
P(event or something more extreme | H0 is true)
Ex. Coin toss problem
– Null hypothesis: P(tails) = 0.5
– Sample: 9 out of 10 tails
– P(9 or more tails | H0 is true) = P(9 tails | H0 is true) + P(10 tails | H0 is true) = 0.011
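The 0.011 in the coin example can be checked with the binomial functions in R's stats package (loaded by default). The three lines compute the same upper-tail probability in different ways: summing point probabilities, using the upper tail of the binomial CDF, and reading the p-value from an exact one-sided binomial test.

```r
# P(9 or more tails in 10 tosses | P(tails) = 0.5)
p_direct <- dbinom(9, size = 10, prob = 0.5) + dbinom(10, size = 10, prob = 0.5)
p_upper  <- pbinom(8, size = 10, prob = 0.5, lower.tail = FALSE)  # P(X > 8)
p_exact  <- binom.test(9, 10, p = 0.5, alternative = "greater")$p.value
c(p_direct, p_upper, p_exact)   # all equal 11/1024, about 0.0107
```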
Alpha level: type I error
Definition: the probability of rejecting the null hypothesis when the null hypothesis is in fact true (rejection probability).
Usually 0.05 or 0.1, but set by the investigator
Compare the p-value to the alpha level to determine whether you have a significant result. This value defines how rare an event needs to be for us to say that the event did not occur by chance.
It is called an error because, when the null hypothesis is true, the conclusion that the result was not due to chance is wrong; this happens with probability alpha (e.g. 5% of the time when alpha = 0.05).
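The meaning of alpha can be checked by simulation: generate many data sets for which the null hypothesis is true and count how often a test at level alpha rejects. This is a sketch under assumed settings: a two-sided z-test with known sigma, with mu0 = 8.5, sigma = 2.6 and n = 35 taken from the pregnancy example later in these notes.

```r
# Sketch: simulated type I error rate of a two-sided z-test at alpha = 0.05.
# Every data set is generated with H0 TRUE (mean = mu0), so every rejection
# is a type I error.
set.seed(2)
alpha <- 0.05
mu0   <- 8.5; sigma <- 2.6; n <- 35
reps  <- 10000

reject <- replicate(reps, {
  x <- rnorm(n, mean = mu0, sd = sigma)        # sample drawn under H0
  z <- (mean(x) - mu0) / (sigma / sqrt(n))      # test statistic
  abs(z) > qnorm(1 - alpha / 2)                 # two-sided rejection rule
})
mean(reject)   # proportion of false rejections, close to alpha
```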
One-sided or two-sided
Steps for hypothesis testing
1) State null and alternative hypotheses
2) State type of test and alpha level
3) Determine and calculate appropriate test statistic
4) Calculate p-value
5) Decide whether to reject or not reject the null hypothesis
• NEVER accept null
6) Write conclusion
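Steps 3 to 5 can be sketched as a small R function for the case where the population standard deviation is known (a one-sample z-test). The helper name one_sample_z_test() is hypothetical, written for these notes rather than taken from base R or a package; steps 1, 2 and 6 still have to be done by the analyst.

```r
# Sketch of steps 3-5 for a one-sample z-test (population SD sigma assumed known).
# one_sample_z_test() is a hypothetical helper, not a base R function.
one_sample_z_test <- function(xbar, mu0, sigma, n, alpha = 0.05,
                              alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sigma / sqrt(n))          # step 3: test statistic
  p <- switch(alternative,                         # step 4: p-value
              two.sided = 2 * pnorm(-abs(z)),
              greater   = pnorm(z, lower.tail = FALSE),
              less      = pnorm(z))
  list(z = z, p.value = p, reject = p < alpha)     # step 5: decision
}

# Usage with the New Bedford numbers that follow: xbar = 9.74, mu0 = 8.5, sigma = 2.6, n = 35
res <- one_sample_z_test(9.74, 8.5, 2.6, 35)
res$z        # about 2.82
res$p.value  # about 0.0048
```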
Example
A study in New Bedford looked at pregnant teens to see how long after conception each young woman arrived at the physician's office for her first visit, and the amount of time between the first visit and the second visit.
Questions: Do teens from a low-income area arrive at a clinic later than the average woman? Is there more time between the first and second visit among these teens?
It is known that the average amount of time from conception until a woman first visits her doctor is 8.5 weeks (this number is an estimate because it is difficult to know exactly when conception occurred) and the average amount of time from first visit to second visit is 4.3 weeks.
For the moment, let's assume that we know the population standard deviations for each of these are 2.6 weeks and 2.2 weeks, respectively.
We have collected a sample of 35 pregnant teens and we would like to know if they take longer to get their first visit than the average woman.
Sample data
As with all of the data sets from now on, the data are on the BIO232 website.
Let's determine the mean for this sample and compare it to the hypothesized value.
preg <- read.table("preg.dat", header = TRUE)
first <- preg[, 1]
mean(first)  # this is the sample mean
[1] 9.74
So the sample mean is clearly not equal to the population mean (8.5 weeks), but is it sufficiently different to say that these girls are different from the population?
Steps for hypothesis testing
1) Null: $\mu = 8.5$ weeks, Alternative: $\mu \neq 8.5$ weeks
2) One-sample normal (z) test, alpha = 0.05
3) $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{9.74 - 8.5}{2.6/\sqrt{35}} = 2.82$
4) Area in upper tail = 0.0024, p-value = 0.0048
5) Reject null
6) Conclusion: There is a difference in the amount of time from conception to the first visit to a physician. The time is longer for the pregnant teens.
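Picture
Here is a picture
[Figure: null distribution of the sample mean centered at 8.5; the area above the observed mean 9.74 is 0.0024, and the mirror-image lower tail also has area 0.0024]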
Normal hypothesis test in R
To complete a normal hypothesis test in R, you can use the pnorm command with the mean set to the null value and the standard deviation set to the standard error of the sample mean, $\sigma/\sqrt{n}$. Remember, by default pnorm provides the area in the lower tail.
For the previous problem, to get the appropriate 2-sided p-value, use
2 * (1 - pnorm(9.74, mean = 8.5, sd = 2.6 / sqrt(35)))
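Because pnorm() returns lower-tail areas, an upper-tail area needs either 1 - pnorm(...) or the lower.tail = FALSE argument. The sketch below shows that working on the original scale (weeks, with the standard error as sd) and standardizing to z first give the same two-sided p-value for the pregnancy example.

```r
# Two equivalent ways to get the two-sided p-value for the pregnancy example.
se <- 2.6 / sqrt(35)                       # standard error of the sample mean

# (a) Original scale (weeks): null mean 8.5, sd = standard error
p_raw <- 2 * pnorm(9.74, mean = 8.5, sd = se, lower.tail = FALSE)

# (b) Standardize first, then use the standard normal
z <- (9.74 - 8.5) / se
p_std <- 2 * pnorm(z, lower.tail = FALSE)

c(p_raw = p_raw, p_std = p_std)            # identical, about 0.0048
```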
Another way to look at the test
Given a specific alpha level, you can find the cut-off beyond which the null hypothesis would be rejected for every more extreme value.
The region of more extreme values is called the rejection region.
For our present problem, the cut-off for the rejection region (upper-tail area 0.025) would be
$$z = \frac{\text{cut} - 8.5}{2.6/\sqrt{35}} = 1.96 \quad\Longrightarrow\quad \text{cut} = 8.5 + 1.96\,\frac{2.6}{\sqrt{35}} = 9.36$$
[Figure: null distribution of the sample mean centered at 8.5, with upper-tail area 0.025 beyond the cut-off 9.36]
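The cut-off can be computed with qnorm(), the inverse of pnorm(). This sketch also computes the matching lower cut-off, since a two-sided test at alpha = 0.05 puts 0.025 in each tail.

```r
# Cut-offs of the rejection region for the pregnancy example (alpha = 0.05, two-sided)
se <- 2.6 / sqrt(35)                               # standard error of the sample mean
upper_cut <- 8.5 + qnorm(0.975) * se               # 8.5 + 1.96 * SE
lower_cut <- 8.5 - qnorm(0.975) * se               # matching lower cut-off
upper_direct <- qnorm(0.975, mean = 8.5, sd = se)  # same upper cut-off in one call
round(c(lower = lower_cut, upper = upper_cut, upper_direct = upper_direct), 2)
# lower 7.64, upper 9.36: the observed mean 9.74 falls in the rejection region
```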
Practice
Here are the times my mom ran in the 10 trials. Test the null hypothesis that she can run a 9:00 mile on average.
mom <- c(9.5, 10, 8.75, 9, 11.2, 8.65, 9.6, 10.2, 8.8, 9.8)
What are the null and alternative hypotheses?
What do you conclude?
What would have happened if we had completed a two-sided test?
Comparison of one-sided and two-sided tests
For a symmetric test statistic such as z or t, the two-sided p-value is twice the one-sided p-value (when the observed effect is in the direction of the one-sided alternative).
The two-sided test is more conservative because the rejection region is split between the high and low sides. For the one-sided test, the rejection region is only on the side of interest.
Two-sided tests are the most common in the literature, even though investigators usually know the direction of the effect they are interested in detecting.
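Using the pregnancy z statistic, the sketch below shows the factor of two between one-sided and two-sided p-values, and the smaller critical value (1.645 vs 1.96 at alpha = 0.05) that makes the one-sided test less conservative in the direction of interest.

```r
# One-sided vs two-sided p-values for the pregnancy z statistic
z <- (9.74 - 8.5) / (2.6 / sqrt(35))
p_one <- pnorm(z, lower.tail = FALSE)     # HA: mu > 8.5
p_two <- 2 * pnorm(-abs(z))               # HA: mu != 8.5
c(one_sided = p_one, two_sided = p_two, ratio = p_two / p_one)  # ratio is 2

# Critical values at alpha = 0.05
c(one_sided = qnorm(0.95), two_sided = qnorm(0.975))           # about 1.645 and 1.960
```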
Wait a minute
Up to now, we have assumed that we know the population variance (is this a good assumption?)
How could we estimate the population variance?
– Sample variance!
– Is the sample variance exactly equal to the population variance?
– How can we account for the additional uncertainty?
Now, we need to do a little math
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
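The formula can be checked against R's built-in var(), which also divides by n - 1. As example data this sketch reuses the mom running times from the practice slide.

```r
# Sample variance from the formula vs R's var()
mom <- c(9.5, 10, 8.75, 9, 11.2, 8.65, 9.6, 10.2, 8.8, 9.8)
n <- length(mom)
s2_formula <- sum((mom - mean(mom))^2) / (n - 1)   # divide by n - 1, not n
c(formula = s2_formula, var = var(mom), sd = sqrt(s2_formula))
```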
t-distribution
Assume $X_i$ are iid normal.
Normal distribution:
$$U = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
Chi-square distribution (proof of this is given in Casella and Berger and in Inference I):
$$V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
t-distribution: ratio of a normal ($U$) and the square root of a chi-square ($V$) divided by its degrees of freedom, where $U$ and $V$ are independent because $\bar{X}$ and $S^2$ are independent for normal data:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}} = \frac{U}{\sqrt{V/(n-1)}} \sim t_{n-1}$$
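A simulation illustrates why replacing sigma with S matters. This sketch assumes standard normal data with a small sample size (n = 5, chosen for illustration): the statistic with S in the denominator exceeds the normal cut-off 1.96 far more often than 5% of the time, but exceeds the t cut-off on n - 1 = 4 degrees of freedom about 5% of the time.

```r
# Sketch: simulate T = (xbar - mu) / (S / sqrt(n)) for normal data with n = 5
set.seed(3)
n <- 5; mu <- 0; sigma <- 1
reps <- 20000
tstat <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))       # S replaces the unknown sigma
})
c(beyond_1.96  = mean(abs(tstat) > qnorm(0.975)),          # well above 0.05
  beyond_t_cut = mean(abs(tstat) > qt(0.975, df = n - 1)))  # close to 0.05
```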
t-distribution
Heavier tails than the normal distribution
– Accounts for the additional variability from estimating the variance
– Tails are heavier with fewer degrees of freedom (dof)
As dof goes to infinity, the t distribution approaches the normal distribution
We can use the t-distribution test statistic just as we used the normal test statistic previously
Remember the assumption of an underlying normal distribution
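The heavier tails show up directly in the critical values: the 97.5th percentile of the t distribution (the two-sided alpha = 0.05 cut-off) is larger for small dof and shrinks toward the normal value 1.96 as dof grows. The dof values below are arbitrary choices for illustration.

```r
# 97.5th percentiles (two-sided alpha = 0.05 critical values) of t and normal
dof  <- c(2, 5, 10, 30, 100)
crit <- c(setNames(qt(0.975, df = dof), paste0("t_", dof)),
          normal = qnorm(0.975))
round(crit, 3)
# t_2 4.303, t_5 2.571, t_10 2.228, t_30 2.042, t_100 1.984, normal 1.960
```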
Example
We can use a t-test to test the second null hypothesis about our pregnant teens, namely that the time from the first visit to the second visit is the same as in the general population.
First, we need to ensure that the underlying distribution is approximately normal.
[Figure: histogram of second (x-axis: second; y-axis: Frequency)]
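Besides the histogram, a normal Q-Q plot and the Shapiro-Wilk test (base R's qqnorm(), qqline() and shapiro.test()) are common checks of approximate normality; they are additions here, not part of the original slide. check_normality() is a hypothetical helper name used only for this sketch.

```r
# Sketch: visual and formal checks of approximate normality before a t-test.
# check_normality() is a hypothetical helper, not a base R function.
check_normality <- function(x, main = deparse(substitute(x))) {
  op <- par(mfrow = c(1, 2))              # two plots side by side
  on.exit(par(op))                        # restore graphics settings afterwards
  hist(x, main = paste("Histogram of", main), xlab = main)
  qqnorm(x, main = paste("Normal Q-Q plot of", main))
  qqline(x)                               # points near this line suggest normality
  shapiro.test(x)                         # small p-value suggests non-normality
}

# Usage, assuming `second` holds the 35 first-to-second-visit times:
# check_normality(second)
```

With small samples these checks have little power, so the plots are usually more informative than the test's p-value.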
Steps for hypothesis testing
1) Null: $\mu = 4.3$ weeks, Alternative: $\mu \neq 4.3$ weeks
2) One-sample hypothesis t-test, alpha = 0.05
3) $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{5.97 - 4.8}{2.04/\sqrt{35}} = 3.4 \sim t_{34}$
4) p-value = 0.0017
5) Reject null
6) Conclusion: There is a difference in the amount of time from the first visit to the second visit. The time is longer for the pregnant teens.
One sample t-test in R
To complete a t-test in R, use
> second <- preg[, 2]   # assumed: second column holds first-to-second-visit times
> t.test(second, mu = 4.8)

	One Sample t-test

data:  second
t = 3.4035, df = 34, p-value = 0.00172
alternative hypothesis: true mean is not equal to 4.8
95 percent confidence interval:
 5.271960 6.670897
sample estimates:
mean of x
 5.971429
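The t statistic and p-value can be reproduced from summary statistics, which shows what t.test() is computing. The inputs are the rounded values from the worked computation and the R output (x̄ = 5.97, s = 2.04, n = 35, hypothesized mean 4.8 as in the t.test() call), so the result differs slightly from the printed 3.4035. t_from_summary() is a hypothetical helper, not a base R function.

```r
# Sketch: one-sample t statistic and two-sided p-value from summary statistics.
# t_from_summary() is a hypothetical helper, not a base R function.
t_from_summary <- function(xbar, s, n, mu0) {
  tstat <- (xbar - mu0) / (s / sqrt(n))        # (xbar - mu0) / (s / sqrt(n))
  p     <- 2 * pt(-abs(tstat), df = n - 1)     # two-sided p-value on n - 1 df
  c(t = tstat, df = n - 1, p.value = p)
}

res_t <- t_from_summary(xbar = 5.97, s = 2.04, n = 35, mu0 = 4.8)
res_t   # t about 3.39, df 34, p about 0.0018 (inputs are rounded)
```

With the raw data loaded, t.test(second, mu = 4.8)$statistic returns the unrounded statistic.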
Practice
Using the class data set, test the following hypotheses:
– The average age of an incoming student to the biostat program is 25. Is the mean age of this year's class significantly different? Is there anything we need to consider in this analysis?
– The average height of an incoming student is 71 inches. Is the mean height of this year's class significantly shorter?
More practice
The TV watching habits of my seventh grade classes are shown in the dataset TV.dat from the course website. The gender and age of the students are given as well. How did my students' TV watching habits compare to the national average of 4 hours/day for 7th graders? Use an alpha level of 0.01.