Introduction to Hypothesis Testing AP Statistics Chap 11-1.


Statistical Dilemma


AT&T believes the average telephone bill in Columbus, Georgia is $42.05 per month.

They take a sample of 100 bills and find that the average value of the sample is $55.57.

What does it mean?


Hypothesis Testing

Population

Now select a random sample

Sample

Compare the sample results to currently accepted facts/thoughts, e.g., the currently accepted mean age is 50 and the sample mean is 20.

Conclusion: Mean age is lower than thought.

How strong is the evidence?

What is a Hypothesis?

• A hypothesis is a theory proposed to explain an observation. In statistics it is stated as a claim about a population parameter:

– population mean

– population proportion


Example: The mean monthly cell phone bill of this city is μ = $42

Example: The proportion of adults in this city with cell phones is p = .68

The Null Hypothesis, H0

• States the currently accepted fact

Example: The average number of TV sets in U.S. homes is at least three (H0: μ ≥ 3)

• Is always about a population parameter, not about a sample statistic

Correct: H0: μ ≥ 3

Incorrect: H0: x̄ ≥ 3

The Null Hypothesis, H0

• Assume that the null hypothesis is true until there is sufficient evidence to reject it.

– Similar to the notion of innocent until proven guilty

• Always contains “=”, “≤” or “≥” sign

• May or may not be rejected

– Never proven true or false


The Alternative Hypothesis, HA

• Is generally the hypothesis that is believed by the researcher based on the sample.

• Challenges H0

• Is the opposite of the null hypothesis

– e.g.: The average number of TV sets in U.S. homes is less than 3 (HA: μ < 3)

• Never contains the “=”, “≤” or “≥” sign

• Stated as “≠”, “>” or “<”


Reason for Rejecting H0

Sampling Distribution of the Statistic x̄ (centered at μ = 50 if H0 is true; observed sample mean x̄ = 20)

If it is unlikely that we would get a sample mean of this value (x̄ = 20) if in fact μ = 50 were the population mean, then we reject the null hypothesis that μ = 50.

Level of Significance, α

• Defines unlikely values of the sample statistic if the null hypothesis is true

– Defines the rejection region of the sampling distribution

• Is designated by α (level of significance)

– Typical values are .01, .05, or .10

• Is selected by the researcher at the beginning


Level of Significance and the Rejection Region

Level of significance = α

H0: μ = 50, HA: μ < 50: Lower tail test (area α in the lower tail)

H0: μ = 50, HA: μ > 50: Upper tail test (area α in the upper tail)

H0: μ = 50, HA: μ ≠ 50: Two tailed test (area α/2 in each tail)

In each figure, the rejection region is shaded.
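The three rejection regions can be sketched numerically. This is a minimal illustration, assuming the test statistic is a z-score whose sampling distribution is standard normal when H0 is true; the function name and the tail labels are hypothetical, chosen only for this example.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the z critical value(s) that bound the rejection region.

    tail is "lower", "upper", or "two" (labels invented for this sketch).
    Assumes a standard normal sampling distribution under H0.
    """
    z = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        return (z.inv_cdf(alpha),)          # area alpha in the lower tail
    if tail == "upper":
        return (z.inv_cdf(1 - alpha),)      # area alpha in the upper tail
    if tail == "two":
        half = alpha / 2                    # area alpha/2 in each tail
        return (z.inv_cdf(half), z.inv_cdf(1 - half))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


for tail in ("lower", "upper", "two"):
    print(tail, [round(c, 3) for c in critical_values(0.05, tail)])
# lower [-1.645]
# upper [1.645]
# two [-1.96, 1.96]
```

With α = .05, a lower tail test rejects H0 when z falls below about -1.645, while a two tailed test needs z beyond about ±1.96 because α is split between the tails.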

p-Value Approach to Testing

• p-value: Probability of obtaining a test statistic at least as extreme ( ≤ or ≥ ) as the observed sample value, given H0 is true

– Also called observed level of significance


p-Value Approach to Testing

• Obtain the p-value from a computer randomization model

• Compare the p-value with α

– If p-value < α, reject H0

– If p-value ≥ α, do not reject H0


Interpreting the p-value…

p-value from 0 to .01: Overwhelming Evidence (Highly Significant)

p-value from .01 to .05: Strong Evidence (Significant)

p-value from .05 to .10: Weak Evidence (Not Significant)

p-value above .10: No Evidence (Not Significant)
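The decision rule and the evidence scale can be written as two small functions. This is only a sketch: the function names are hypothetical, and how a p-value exactly equal to .01, .05, or .10 is labeled is an assumption, since the scale does not say which side each boundary belongs to.

```python
def decide(p_value, alpha):
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


def interpret_p_value(p_value):
    """Map a p-value to the evidence labels on the 0 / .01 / .05 / .10 scale.

    Boundary handling (strict '<') is an assumption of this sketch.
    """
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "overwhelming evidence (highly significant)"
    if p_value < 0.05:
        return "strong evidence (significant)"
    if p_value < 0.10:
        return "weak evidence (not significant)"
    return "no evidence (not significant)"


print(decide(0.03, alpha=0.05), "|", interpret_p_value(0.03))
# reject H0 | strong evidence (significant)
```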

Pictures were taken of 25 owners and their purebred dogs, selected at random from dog parks. Study participants were shown a picture of an owner together with pictures of two dogs (the owner’s dog and another random dog from the study) and asked to choose which dog most resembled the owner. Of the 25 owners, 16 were paired with the correct dog. Is this convincing evidence that dogs tend to resemble their owners or just the results of random chance?

How extreme is a p̂ of .64 if the results are due to random chance?

Dogs and Owners

Distribution of sample proportions

p-value = .238 for a two-tailed test
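A randomization distribution like this one can be approximated with a short simulation. The sketch below assumes that under H0 each participant picks the correct dog with probability 0.5, and it counts results at least as far from 12.5 matches as the observed 16. The function names, the number of repetitions, and the seed are hypothetical choices, so the simulated value will not match .238 exactly; the exact binomial calculation gives roughly .23.

```python
import random
from math import comb

N_OWNERS = 25   # owners in the study
N_CORRECT = 16  # owners matched with the correct dog (p-hat = .64)


def exact_two_tailed_p(n, k):
    """Exact two-tailed p-value under H0: p = 0.5 (binomial model)."""
    distance = abs(k - n / 2)
    # Count outcomes at least as far from n/2 as the observed count.
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= distance)
    return extreme / 2 ** n


def simulated_two_tailed_p(n, k, reps=10_000, seed=1):
    """Approximate the same p-value by simulating coin-flip guesses."""
    rng = random.Random(seed)
    distance = abs(k - n / 2)
    extreme = 0
    for _ in range(reps):
        matches = sum(rng.random() < 0.5 for _ in range(n))
        if abs(matches - n / 2) >= distance:
            extreme += 1
    return extreme / reps


print("exact:", round(exact_two_tailed_p(N_OWNERS, N_CORRECT), 4))
print("simulated:", simulated_two_tailed_p(N_OWNERS, N_CORRECT))
```

Either way the p-value is well above .10, so 16 correct matches out of 25 is not convincing evidence that dogs resemble their owners.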

Do men and women have different views on divorce? A May 2010 Gallup poll of U.S. citizens over the age of 18 asked participants if they view divorce as “morally acceptable”. Of the 1029 adults surveyed, 71% of men and 67% of women responded ‘yes’.

What does the survey indicate? Men and women may differ in opinion.

What is the no change hypothesis? Men and women do not differ in opinion.

H0: pM - pW = 0

Ha: pM - pW ≠ 0

Attitude Toward Divorce

Attitude Toward Divorce

Is there sufficient evidence that men and women differ?

Researchers trained a sample of male college students to tap their fingers at a rapid rate. The sample was then divided at random into two groups of ten students each. Each student drank the equivalent of about two cups of coffee, which included about 200 mg of caffeine for the students in one group but was decaffeinated coffee for the second group. After a two-hour period, each student was tested to measure finger tapping rate (taps per minute). The goal of the experiment was to determine whether caffeine produces an increase in the average tap rate.

What are the null and alternate hypotheses?

Caffeine and Finger Tapping

Hypotheses

H0: μC = μNC

Ha: μC > μNC

Or

H0: μC - μNC = 0

Ha: μC - μNC > 0

Caffeine and Finger Tapping
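One way to test H0: μC = μNC against Ha: μC > μNC is a randomization (permutation) test: shuffle the group labels many times and see how often the shuffled difference in means is at least as large as the observed one. The sketch below is only an illustration. The tap rates are invented for this example and are not the study's data, and the function name, repetition count, and seed are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(treatment, control, reps=10_000, seed=1):
    """One-sided randomization test for Ha: mean(treatment) > mean(control).

    Returns the observed difference in means and the estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps


# Invented tap rates (taps per minute), ten students per group.
caffeine = [251, 248, 253, 249, 247, 252, 250, 246, 254, 250]
no_caffeine = [245, 247, 244, 249, 243, 246, 248, 242, 247, 244]

diff, p = permutation_p_value(caffeine, no_caffeine)
print("observed difference:", diff)
print("estimated p-value:", p)
```

For these made-up numbers the observed difference is 4.5 taps per minute. A small p-value would mean that a difference this large rarely arises from random assignment alone.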

Researchers conducted a study examining the effect of a smile on the leniency of disciplinary action. For each suspect, along with a description of the offense, a picture was provided with either a smile or a neutral facial expression. A leniency score was calculated based on the disciplinary action. The experimenters are testing to see if the average leniency score is higher for smiling students than it is for students with a neutral facial expression.

Smiles and Punishment

What are the null and alternate hypotheses?

H0: μS = μNS

Ha: μS > μNS

Smiles and Punishment

If α = .05, are the results statistically significant?

In a study of the relationship between the type of uniforms worn by professional sports teams and the aggressiveness of the team, the authors considered teams from the National Football League (NFL). Participants with no knowledge of the teams rated the jerseys on characteristics such as timid/aggressive, nice/mean and good/bad. The averages of these responses produced a “malevolence” index with higher scores signifying impressions of more malevolent uniforms. To measure aggressiveness, the authors used the amount of penalties converted to z-scores and averaged for each team over the seasons from 1970-1986. The sample correlation is r = 0.43.

Is there a correlation between uniforms and penalties in the NFL?

What are Ho and Ha?

NFL Uniforms vs Penalties

Hypotheses

H0: ρ = 0

HA: ρ ≠ 0

NFL Uniforms vs Penalties

Lithium vs Placebo

An experiment to investigate the effectiveness of the two drugs desipramine and lithium in the treatment of cocaine addiction was conducted. Subjects (cocaine addicts seeking treatment) were randomly assigned to take one of the treatment drugs or a placebo so that there were 24 patients in each group. The results of the study are summarized in the table below. The question of interest is whether lithium is more effective at preventing relapse than taking an inert pill. State the null and alternative hypotheses.

H0: pL = pN

HA: pL < pN

How would you test these hypotheses?

Type I and Type II Errors

Possible Hypothesis Test Outcomes

                     State of Nature
Decision             H0 True          H0 False
Do Not Reject H0     No Error         Type II Error
Reject H0            Type I Error     No Error
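A Type I error happens when H0 is true but the sample leads us to reject it. A simulation can show how often that occurs. The sketch below assumes a two tailed z-test of H0: μ = 50 with a known population standard deviation and draws every sample from a population where H0 really is true. The values of sigma, n, the repetition count, and the seed are hypothetical, as is the function name.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, mu0=50.0, sigma=10.0, n=25,
                      reps=5_000, seed=1):
    """Estimate how often a two tailed z-test rejects a true H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two tailed cutoff
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        # Sample from a population in which H0 is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / reps


print("estimated Type I error rate:", type_i_error_rate())
```

The estimated rate should land close to α, which is why the level of significance is also the probability of a Type I error that the researcher agrees to accept.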

Practical vs Statistical Significance

A local college offers an SAT preparation course and provides a statistical analysis on its website showing that 95% of students improve their SAT score after taking the $1000 course.

How much would it have to improve your score to make the cost of the course worthwhile?

50 points? 100 points? 300 points?

A statistically significant result says nothing about the size of the difference.
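A quick calculation shows why. The sketch below assumes a one-sided z-test of H0: mean gain = 0 with a known standard deviation. The 5 point average gain, the standard deviation of 100 points, and the sample sizes are hypothetical numbers chosen only to illustrate the idea; they are not from the college's analysis.

```python
from statistics import NormalDist


def upper_tail_p(mean_gain, sigma, n):
    """p-value for H0: mean gain = 0 vs HA: mean gain > 0 (z-test, known sigma)."""
    z = mean_gain / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)


# Same tiny average gain, larger and larger samples.
for n in (25, 400, 10_000):
    print(n, upper_tail_p(mean_gain=5, sigma=100, n=n))
```

The average gain stays at 5 points throughout, yet the p-value drops from about .40 to about .16 to far below .001 as n grows. A large enough sample can make a difference that is too small to matter statistically significant.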
