Stat 217 – Day 27 Chi-square tests (Topic 25)
Dec 21, 2015
The Plan
Exam 2 returned at end of class today
Mean .80 (36/45)
Solutions with commentary online
Discuss in class tomorrow
Today: Chi-square
Tuesday: ANOVA
Wednesday: Begin Regression
Thursday: Regression lab
Previously
One population proportion or mean
Comparing two population proportions or means
Is the difference statistically significant = larger than what we would expect by chance (if no difference in the populations)
Simulation
Normal probability model
Chance = random sampling or random assignment
Next
Comparing more than 2 population proportions or more than 2 population means
Same first question: Is the response variable quantitative or categorical?
Random sampling or random assignment? Same analysis, but it affects the “scope of conclusions”
Activity 25-1 (p. 507) (a)-(f)
Observational study with an independent random sample in each of 1972, 1988, 2004 (explanatory variable) looking at whether people are “very happy” (response variable)
Could the differences in these three sample proportions have arisen by chance (random sampling process) alone?
Activity 25-1
Parameters? Let π_72 represent the proportion of all adult Americans who would have rated their general level of happiness as very happy in 1972
Similarly for π_88 and π_04
(g) Ho: π_72 = π_88 = π_04 (no association between happiness level and year)
Ha: not all 3 equal (is an association)
General strategy? Assume Ho is true: what would we expect to see? Are our observed results surprising?
If Ho is true
What would our segmented bar graph look like in this case?
So our two-way table would be?
“expected counts”
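The expected counts can be reproduced by hand: if Ho is true, every year shares the same overall proportion, so each expected count is (row total × column total) / grand total. A minimal Python sketch using the observed counts from the Minitab handout; the variable names are illustrative, and treating row 1 as the “very happy” row is an assumption about the handout's row coding.

```python
# Observed counts from the Minitab handout.
# Assumption: row 1 = "very happy", row 2 = everyone else.
observed = [
    [486, 498, 419],    # row 1: 1972, 1988, 2004
    [1120, 968, 918],   # row 2: 1972, 1988, 2004
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under Ho, expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

The printed values should agree with the second line of each cell in the Minitab output (for example 511.05 for row 1, 1972).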
Test statistic
Compare the observed counts to these expected counts
(n) Large values are evidence against Ho
How decide what is large?
Chi-square distribution
Minitab output (handout)
Chi-Square Test: 1972, 1988, 2004
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

          1972     1988     2004   Total
    1      486      498      419    1403
        511.05   466.50   425.45
         1.228    2.127    0.098

    2     1120      968      918    3006
       1094.95   999.50   911.55
         0.573    0.993    0.046

Total     1606     1466     1337    4409

Chi-Sq = 5.064, DF = 2, P-Value = 0.079
Contribution of the first cell (row 1, 1972): (486 - 511.05)^2 / 511.05 = 1.228
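Sum of all six cell contributions: 1.228 + 2.127 + 0.098 + 0.573 + 0.993 + 0.046 ≈ 5.06 (Minitab reports Chi-Sq = 5.064; the small difference is rounding)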
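To see where Chi-Sq, DF, and P-Value come from, the whole calculation can be sketched in plain Python. The helper `chi2_sf_even_df` is a hypothetical name introduced here; it uses a closed-form upper-tail formula that is valid only when the degrees of freedom are even (true here, DF = 2), so it is not a general replacement for statistical software.

```python
import math

# Observed counts from the Minitab handout (rows 1 and 2; columns 1972, 1988, 2004).
observed = [[486, 498, 419], [1120, 968, 918]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Test statistic: sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term            # term equals half**k / k!
        term *= half / (k + 1)
    return math.exp(-half) * total


p_value = chi2_sf_even_df(chi_sq, df)
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```

Compare the printed line with the last line of the Minitab output; small differences in the last digit can come from rounding.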
Activity 25-1
(p) With p-value = .079, fail to reject Ho at the 5% level (but would reject at the 10% level!)
(q) You have weak statistical evidence that the population proportions of very happy people were not identical for these three years. Because these were random samples, you are safe in generalizing this conclusion to the populations of all American adults in each year. But this was not a randomized experiment, so you cannot conclude a cause-and-effect relationship.
Activity 25-5 (p. 515)
Two-way table
Chi-square test (output on handout)
But what about a two-sample z-test? Exactly the same results if using a two-sided alternative!
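The equivalence can be checked numerically: for a 2×2 table, the chi-square statistic equals the square of the pooled two-sample z statistic, and the two-sided p-values match. The sketch below does not use the Activity 25-5 data (not reproduced in these notes); as an assumption for illustration it compares only the 1972 and 2004 columns of the happiness table from Activity 25-1.

```python
import math

# Illustration only: 1972 vs 2004 columns of the Activity 25-1 table,
# not the Activity 25-5 data.
x1, n1 = 486, 1606   # row 1 count and sample size, 1972
x2, n2 = 419, 1337   # row 1 count and sample size, 2004

# Two-sample z statistic with the pooled proportion.
p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
p_z = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal p-value

# Chi-square statistic for the corresponding 2x2 table (no continuity correction).
observed = [[x1, x2], [n1 - x1, n2 - x2]]
row_totals = [sum(r) for r in observed]
col_totals = [n1, n2]
total = n1 + n2
chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)
p_chi = math.erfc(math.sqrt(chi_sq / 2))        # chi-square upper tail, df = 1

print(f"z^2 = {z ** 2:.6f}, chi-square = {chi_sq:.6f}")
print(f"two-sided z p-value = {p_z:.6f}, chi-square p-value = {p_chi:.6f}")
```

The match holds only for a two-sided alternative; a one-sided z-test has no chi-square counterpart because squaring z discards the direction of the difference.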
Technical conditions
Independent random samples…
Expected cell counts are all at least 5 (there are some ways to work around this…)
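The expected-count condition is easy to check before trusting the chi-square p-value. A small sketch; the function names and the second, tiny table are hypothetical and only show what a failing check looks like.

```python
def expected_counts(observed):
    """Expected counts under Ho: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def all_expected_at_least(observed, threshold=5):
    """True if every expected cell count meets the threshold (5 by convention)."""
    return all(e >= threshold for row in expected_counts(observed) for e in row)


happiness = [[486, 498, 419], [1120, 968, 918]]   # table from the Minitab handout
tiny = [[3, 1], [1, 3]]                           # made-up table, illustration only

print(all_expected_at_least(happiness))
print(all_expected_at_least(tiny))
```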
To Turn in with Partner
Read background of Activity 25-3
Examine output on handout
What conclusions would you draw: Significance, Causation, Generalizability
For Tuesday
Finish Topic 25
Output on handout (don’t have to learn Minitab)
Notice how the hypothesis statements in the pink boxes differ across the scenarios
Self-check Activity 25-6
Activity 25-2 (p. 511)
What if we have a non-binary response variable? Same thing!
(a) Ho: the population distributions of happiness were the same in all three years (no association between happiness level and year)
Ha: the population distributions were not the same (is an association)
(b) X² = 35.655 (df = 4), p-value = .000
(c) Strong evidence of a change in at least one of these population distributions
Activity 25-2
Where are the differences (descriptively)?
Fewer “not too happy” in 1988 than expected. More “not too happy” in 1972 than expected.
Activity 25-3 (e)-(g)
Can apply to a randomized experiment as well
Ho: The population proportions of potential customers who would leave a tip are the same for each type of card (equivalently, the probability of leaving a tip is the same regardless of the type of card received); no association between type of card and whether or not a tip is left
Ha: not the same (is an association)
Conclusion: There is significant evidence that the type of card affects the likelihood of someone leaving a tip (more left a tip with a joke card than expected)
At least for this waitress, this coffee bar
Activity 25-4
(b) Data collection:
one sample, both variables recorded simultaneously (not independent random samples or randomized experiment)
Ho: no association between happiness level and political inclination in population
Ha: there is an association
Same analysis!