1. Hypotheses In statistics, a hypothesis proposes a model for
the world. Then we look at the data. If the data are consistent
with that model, we have no reason to disbelieve the hypothesis.
Data consistent with the model lend support to the hypothesis, but
do not prove it. But if the data are inconsistent with the model,
we need to decide whether they are inconsistent enough
to disbelieve the model. If they are inconsistent enough, we can
reject the model.
2. Hypothesis Testing Think about the logic of jury trials: To
prove someone is guilty, we start by assuming they are innocent. We
retain that hypothesis until the facts make it unlikely beyond a
reasonable doubt. Then, and only then, we reject the hypothesis of
innocence and declare the person guilty.
3. Hypotheses (cont.) The
statistical twist is that we can quantify our level of doubt. We
can use the model proposed by our hypothesis to calculate the
probability that the event we've witnessed could happen. That's just
the probability we're looking for: it quantifies exactly how surprised
we are to see our results. This probability is called a P-value.
4.
Our Problem Suppose we tossed a coin 100 times and obtained
38 heads and 62 tails. Is the coin biased toward tails? There is no
way to say yes or no with 100% certainty. But we can evaluate the
strength of the evidence for the hypothesis that the coin is biased.
5.
Hypotheses (cont.)
Null hypothesis, H0: established fact, no change of parameters, a
statement that we expect the data to contradict (status quo).
Alternative hypothesis, HA: new conjecture, change of parameters,
your claim, a statement that needs strong support from the data
before we can claim it.
Our problem: testing a hypothesis about p = proportion of times the
coin turns up tails (in the long run).
H0: coin is fair, p = 0.5 (or p ≤ 0.5)
HA: coin is biased toward tails, p > 0.5
6. Ex: A
statistics professor wants to see if more than 80% of her students
enjoyed taking her class. At the end of the term, she takes a
random sample of students from her large class and asks, in an
anonymous survey, if the students enjoyed taking her class. Which
set of hypotheses should she test?
A. H0: p < 0.80, HA: p > 0.80
B. H0: p = 0.80, HA: p > 0.80
C. H0: p > 0.80, HA: p = 0.80
D. H0: p = 0.80, HA: p < 0.80
7. Ex: An online catalog
company wants on-time delivery for 90% of the orders they ship.
They have been shipping orders via UPS and FedEx but will switch to
a new, cheaper delivery service (ShipFast) unless there is evidence
that this service cannot meet the 90% on-time goal. As a test the
company sends a random sample of orders via ShipFast, and then
makes follow-up phone calls to see if these orders arrived on time.
Which hypotheses should they test?
A. H0: p < 0.90, HA: p > 0.90
B. H0: p = 0.90, HA: p > 0.90
C. H0: p > 0.90, HA: p = 0.90
D. H0: p = 0.90, HA: p < 0.90
8. Hypotheses (cont.) When the data
are consistent with the model from the null hypothesis, the P-value
is high and we are unable to reject the null hypothesis. In that
case, we have to retain the null hypothesis we started with. We
can't claim to have proved it; instead we fail to reject the null
hypothesis when the data are consistent with the null hypothesis
model and in line with what we would expect from natural sampling
variability. If the P-value is low enough, we'll reject the null
hypothesis, since what we observed would be very unlikely were the
null model true. Assume that the null hypothesis H0 is true and
uphold it unless the data speak strongly against it.
9. Testing
Hypotheses The null hypothesis, which we denote H0, specifies a
population model parameter of interest and proposes a value for
that parameter. We want to compare our data to what we would expect
given that H0 is true. We can do this by finding out how many
standard deviations away from the proposed value we are. We then
ask how likely it is to get results like we did if the null
hypothesis were true. 10. The Reasoning of Hypothesis Testing 1.
Hypotheses The null hypothesis: To perform a hypothesis test, we
must first translate our question of interest into a statement
about model parameters. In general, we have H0: parameter =
hypothesized value. The alternative hypothesis: The alternative
hypothesis, HA, contains the values of the parameter we accept if
we reject the null. 11. The Reasoning of Hypothesis Testing (cont.)
2. Model The test about proportions is called a one-proportion
z-test.
12. One-Proportion z-Test The conditions for the
one-proportion z-test are the same as for the one-proportion
z-interval. We test the hypothesis H0: p = p0 using the statistic
$z = \frac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$.
When the conditions are met and the
null hypothesis is true, this statistic follows the standard Normal
model, so we can use that model to obtain a P-value.
13. The
Reasoning of Hypothesis Testing (cont.) 3. Mechanics Under
mechanics we place the actual calculation of our test statistic
from the data. Different tests will have different formulas and
different test statistics. Usually, the mechanics are handled by a
statistics program or calculator, but it's good to know the
formulas. 14. The Reasoning of Hypothesis Testing (cont.) 3.
Mechanics If the difference between what we have observed and what
is expected under the null hypothesis H0 is statistically
significant (large enough), then we reject H0 in favor of HA.
15. Our Coin Problem
$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.62 - 0.5}{\sqrt{(0.5)(0.5)/100}} = 2.4$,
where $\hat{p} = 62/100 = 0.62$ is the observed proportion of tails
and p0 is the H0 value of the parameter, in our case p0 = 0.5.
16. The Reasoning of Hypothesis Testing (cont.) 3.
Mechanics continued The ultimate goal of the calculation is to
obtain a P-value. The P-value is the probability that the observed
statistic value (or an even more extreme value) could occur if the
null model were correct. If the P-value is small enough, we'll
reject the null hypothesis. Note: The P-value is a conditional
probability: it's the probability that the observed results could have
happened if the null hypothesis is true. 17. The Reasoning of
Hypothesis Testing P-value: the probability that the test statistic
takes the observed or a more extreme value when the null hypothesis
H0 is true. Our Problem: P-value = P(z > 2.4) = 0.0082. For a fair
coin, the probability of seeing 62 or more tails in 100 tosses is
less than 0.01 (1%). The smaller the P-value, the stronger the evidence
against H0 (that is, in favor of HA). So we reject the null
hypothesis that this is a fair coin and support the alternative
that it is biased towards tails. 18. Just Checking 1. An allergy
drug has been tested and found to give relief to 75% of the
patients in a large clinical trial. Now the scientists want to see
if the new improved version works even better. What would the null
hypothesis and alternative hypothesis be? 2. The new drug is tested
and the P-value is 0.0001. What would you conclude about the new
drug?
19. P-value info (Ch 21) We can use an alpha level (α) to set
a threshold on our P-value. The alpha level is also called the
significance level. If our P-value is less than our alpha level, we
reject the null hypothesis. If our P-value is greater than our
alpha level, we fail to reject the null hypothesis. By setting a
threshold for our P-value, we define arbitrarily what counts as a
rare event. When the P-value falls below the threshold, we say that
the results are statistically significant. Alpha levels are
represented using the symbol α. Typically we use α = 0.1, 0.05, or
0.01. When in doubt, we use α = 0.05. The choice partially depends
on the importance of the claim being made. The more
important the claim or the higher the stakes, the lower an alpha level
you would use.
20. Statistically Significant (Ch 21) When we get a
P-value below our alpha level (let's assume 0.05), we can say we
reject the null hypothesis at the 5% level of significance.
Sometimes, statistical significance doesn't mean the difference is
important in the context of the situation. On the other hand,
sometimes an important difference may turn out not to be
statistically significant. Sometimes a larger sample size can fix
this.
21. Statistically Significant (Ch 21) It may make you
uncomfortable to choose between reject and fail to reject. If your
P-value falls just slightly above your alpha level, you're not
allowed to reject the null hypothesis (you fail to reject the null).
Yet a P-value just
barely below the alpha level leads to rejection. When you decide to
declare a verdict, it is a good idea to report the P-value as an
indication of the strength of the evidence.
22. The Reasoning of
Hypothesis Testing (cont.) 4. Conclusion/Decision The
conclusion/decision in a hypothesis test is always a statement
about the null hypothesis. The conclusion must state either "Reject
H0" or "Fail to reject H0" (uphold H0). And, as always, the conclusion
should be stated in context. 23. The Reasoning of Hypothesis
Testing (cont.) 4. Conclusion Your conclusion about the null
hypothesis should never be the end of a testing procedure. Often
there are actions to take or policies to change. 24. Alternative
Hypotheses There are three possible alternative hypotheses:
HA: parameter < hypothesized value
HA: parameter ≠ hypothesized value
HA: parameter > hypothesized value
25. Alternative Hypotheses
(cont.) HA: parameter ≠ value is known as a two-sided alternative
because we are equally interested in deviations on either side of
the null hypothesis value. For two-sided alternatives, the P-value
is the probability of deviating in either direction from the null
hypothesis value.
26. Alternative Hypotheses (cont.) The other two
alternative hypotheses are called one-sided alternatives. A
one-sided alternative focuses on deviations from the null
hypothesis value in only one direction. Thus, the P-value for
one-sided alternatives is the probability of deviating only in the
direction of the alternative away from the null hypothesis value.
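
As a minimal sketch of how the choice of alternative changes the P-value, the following Python computes the one-proportion z statistic for the coin problem from the earlier slides (62 tails in 100 tosses, p0 = 0.5) and the P-value under each of the three alternatives. The function names `normal_cdf`, `z_statistic`, and `p_value` are illustrative choices, not part of the original slides, and the standard Normal probabilities are computed from `math.erfc` rather than read from a table.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def z_statistic(successes, n, p0):
    """One-proportion z statistic: (p_hat - p0) / sqrt(p0 * q0 / n)."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)  # SD of p_hat when H0 is true
    return (p_hat - p0) / sd


def p_value(z, alternative):
    """P-value for HA 'less', 'greater', or 'two-sided' (names assumed)."""
    if alternative == "less":
        return normal_cdf(z)                 # lower tail only
    if alternative == "greater":
        return normal_cdf(-z)                # upper tail only
    if alternative == "two-sided":
        return 2 * normal_cdf(-abs(z))       # both tails
    raise ValueError(f"unknown alternative: {alternative}")


# Coin problem: 62 tails in 100 tosses, H0: p = 0.5
z = z_statistic(62, 100, 0.5)
for alt in ("less", "greater", "two-sided"):
    print(f"HA {alt}: z = {z:.2f}, P-value = {p_value(z, alt):.4f}")
```

With these inputs the "greater" P-value comes out near 0.0082, in line with the value quoted for the coin problem, and the two-sided P-value is twice that because deviations in both directions count.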
27. Alternative Hypotheses (cont.) 28. Critical Values for
Hypothesis Testing Just like we used critical values in confidence
intervals, we will use them with alpha levels. If our z-score is
more extreme than the critical value, then we will have a P-value
smaller than our alpha level. 29. Just Checking cont. 3. A bank is
testing a new method for getting delinquent customers to pay their
past-due credit card bills. The standard way was to send a letter
(costing about $0.40 each) asking the customer to pay. That worked
30% of the time. They want to test a new method that involves
sending a video tape to the customer encouraging them to contact
the bank and set up a payment plan. Developing and sending the
video costs about $10.00 per customer. What is the parameter of
interest? What are the null and alternative hypotheses? 30. Just
Checking cont. 4. The bank sets up an experiment to test the
effectiveness of the video tape. They mail it out to several
randomly selected delinquent customers and keep track of how many
actually do contact the bank to arrange payments. The bank's
statistician calculates a P-value of 0.003. What does this P-value
suggest about the video tape? 31. 5. Some people are concerned that
new tougher standards and high-stakes tests may drive up the high
school dropout rate. The National Center for Education Statistics
reported that the high school dropout rate for the year 2004 was
10.3%. One school district, whose dropout rate has always been very
close to the national average, reports that 210 of their 1782
students dropped out last year. Is their experience evidence that
the dropout rate is increasing? 32. 6. In a study of 11,000 car
crashes, it was found that 5720 of them occurred within 5 miles of
home. Is this significant evidence to show that more than 50% of
car crashes occur within 5 miles of home? 33. Confidence Intervals
and Hypothesis Tests Confidence intervals and hypothesis tests are
built on the same calculations with the same assumptions and
conditions. Our conclusion about the null should be consistent with
whether or not the proportion in the claim falls within the
confidence interval. A 95% confidence interval corresponds with a
two-sided hypothesis test with α = 5%.
34. Confidence Levels and
Hypothesis Testing A confidence interval with a confidence level of
C% corresponds to a two-sided hypothesis test with an α level of
(100 - C)%. A confidence interval with a confidence level of C%
corresponds to a one-sided hypothesis test with an α level of
½(100 - C)%. Think
about it: A one-sided test with α = 5% corresponds to a confidence
interval with 5% on each side, giving a 90% confidence level.
35.
Example: Is Euro a fair coin? Soon after the Euro was introduced as
currency in Europe, it was widely reported that someone had spun a
Euro 250 times and gotten heads 140 times. a. Estimate the true
proportion of heads using a 95% confidence interval. (Remember to
check conditions.)
CI: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.56 \pm 1.96 \sqrt{\frac{(0.56)(0.44)}{250}} = 0.56 \pm 0.062$
CI: (0.498, 0.622)
b. Does your confidence interval provide evidence that
the coin is unfair when spun? Explain. c. What is the significance
level? 36. Just Checking 7. An experiment to test the fairness of a
roulette wheel gives a z-score of 0.62. What would you conclude? 8.
We encountered a bank that wondered if it could get more customers
to make payments on delinquent balances by sending them a DVD
urging them to set up a payment plan. Well, the bank just got back
the results on their tests of this strategy. A 90% confidence
interval for the success rate is (0.29, 0.45). Their old
send-a-letter method had worked 30% of the time. Can you reject the
null hypothesis that the proportion is still 30% at α = 0.05? Explain.
9. Given the confidence interval the bank found in their trial of
DVDs, what would you recommend that they do? Should they scrap the
DVD strategy? 37. Errors in Hypothesis Testing Even with our
careful analysis and lots of evidence, we can make an incorrect
decision. Two ways we can make mistakes with hypothesis testing:
Type I: null hypothesis is true, but we reject it. (HOT) Type II:
null hypothesis is false, but we fail to reject it. (HAT) Which
error is more serious depends on the situation. 38. Type I Error-
HOT In medical terms, this would be a false positive. A healthy
person is incorrectly diagnosed with a disease. In jury terms, this
would mean an innocent person is convicted.
39. Type II Error- HAT
In medical terms, this would be a false negative. An infected
person goes undiagnosed. In jury terms, this would mean a guilty
person is not convicted.
40. Type I and II Errors
41. Just Checking
continued 10. Remember our bank? It is looking for evidence that
the costlier DVD strategy produces a higher success rate than the
letters it has been sending. Explain what a Type I error is in this
context and what the consequences would be for the bank.
11. What's a Type II error in the bank experiment context, and what
would the consequences be? 42. Example: Spam Filter 12. Suppose a
spam filter uses a point system to score each email based on
sender, subject, and keywords. The higher the point total, the more
likely that the message is spam. We can think of the filter's
decision as a hypothesis test. The null hypothesis is that the
email is a real message. A high point score would be evidence that
it is junk, so the filter would reject the null hypothesis and
classify it as spam.
a. When the filter allows spam to slip through
into your inbox, which kind of error is this?
b. Which kind of
error is it when a real message gets classified as junk?
c. If the
filter has a default cutoff score of 50, but you reset it to 60,
is that analogous to choosing a higher or lower value of α for a
hypothesis test? 43. Probability of Errors To reject H0, the
P-value must fall below α. When H0 is true, that happens with
probability exactly α, so when you choose the level α, you are
setting the probability of a Type I error to α. When H0 is false and
we fail to reject it, we have made a Type II error. We assign the
letter β to the probability of this mistake.
44. Reducing Errors We can reduce
α to lower the chance of a Type I error, but that will have the
effect of raising β. The only way to reduce both Type I and
Type II errors simultaneously is to increase our sample size, which
will reduce our standard deviations.
45. What Can Go Wrong?
Don't interpret the P-value as the probability that H0 is true.
Don't believe too strongly in arbitrary alpha levels.
Don't confuse practical and statistical significance.
Don't forget that in spite of all your care, you might make a wrong
decision.
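
To illustrate the warning about practical versus statistical significance, here is a small sketch under assumed numbers: a hypothetical observed proportion of 0.51 tested against p0 = 0.5 (HA: p > 0.5) at several sample sizes. The proportion, the sample sizes, and the function name `upper_tail_p_value` are illustrative assumptions, not data from the slides.

```python
import math


def upper_tail_p_value(p_hat, p0, n):
    """One-sided (HA: p > p0) P-value from the one-proportion z-test."""
    sd = math.sqrt(p0 * (1 - p0) / n)          # SD of p_hat under H0
    z = (p_hat - p0) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z), standard Normal


alpha = 0.05       # assumed significance level
results = {}
for n in (100, 10_000, 1_000_000):
    # Same hypothetical observed proportion each time; only n changes.
    p = upper_tail_p_value(0.51, 0.5, n)
    results[n] = p
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"n = {n:>9,}: P-value = {p:.4g} -> {decision}")
```

The same one-point difference from 0.5 goes from unremarkable at n = 100 to statistically significant at the larger sample sizes purely because n grows. Whether a difference that small matters in practice is a separate question that the P-value does not answer.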