Chi-square, Goodness of fit, and Contingency Tables
Page 1: Chi-square, Goodness of fit, and Contingency Tables.

Chi-square, Goodness of fit, and Contingency Tables

Page 2: Chi-square, Goodness of fit, and Contingency Tables.

What is the χ2 distribution?

The distribution of a sum of squared standard normal deviates; in practice, a distribution of squared differences

Page 3: Chi-square, Goodness of fit, and Contingency Tables.

Useful for detecting categorical differences

Calculate the χ2 test statistic = Σ (observed - expected)^2 / expected, summed over all categories

Degrees of freedom = number of categories -1

Look up the χ2 critical value for that number of degrees of freedom and the chosen alpha value. If the test statistic > table value, the result is significant
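The steps above can be sketched in Python. This is a minimal illustration rather than anything from the original slides: the function name `chi_square_statistic` and the die-roll counts are hypothetical, and 11.07 is the standard tabled critical value for 5 degrees of freedom at alpha = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (observed - expected)^2 / expected over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories - 1
    return statistic, df


# Hypothetical data: 60 rolls of a die, expecting 10 of each face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
statistic, df = chi_square_statistic(observed, expected)

CRITICAL_VALUE = 11.07  # tabled value for df = 5, alpha = 0.05
significant = statistic > CRITICAL_VALUE
print(statistic, df, significant)
```

The critical value is hard-coded here to mirror a table lookup; a statistics library could supply it instead.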

Page 4: Chi-square, Goodness of fit, and Contingency Tables.

1. Two-sided test:

   a. Find the column corresponding to α/2 in the table for upper critical values and reject the null hypothesis if the test statistic is greater than the tabled value.

   b. Use 1 - α/2 in the table for lower critical values and reject the null hypothesis if the test statistic is less than the tabled value.

2. Upper one-sided test: find the column corresponding to α in the table for upper critical values. If the test statistic is greater than the tabled value, reject the null hypothesis.
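The two decision rules can be written out as a short Python sketch. The function names and the example statistic of 4.5 are hypothetical; the critical values are the standard tabled χ2 values for 1 degree of freedom with alpha = 0.05.

```python
def two_sided_reject(statistic, lower_critical, upper_critical):
    """Reject if the statistic falls in either tail."""
    return statistic > upper_critical or statistic < lower_critical


def upper_one_sided_reject(statistic, upper_critical):
    """Reject only if the statistic falls in the upper tail."""
    return statistic > upper_critical


# Tabled chi-square values for 1 degree of freedom, alpha = 0.05.
UPPER_AT_ALPHA = 3.841         # upper tail area alpha = 0.05
UPPER_AT_HALF_ALPHA = 5.024    # upper tail area alpha/2 = 0.025
LOWER_AT_HALF_ALPHA = 0.000982 # lower tail area alpha/2 = 0.025

statistic = 4.5  # hypothetical test statistic
two_sided = two_sided_reject(statistic, LOWER_AT_HALF_ALPHA, UPPER_AT_HALF_ALPHA)
one_sided = upper_one_sided_reject(statistic, UPPER_AT_ALPHA)
print(two_sided, one_sided)
```

With this hypothetical statistic the one-sided rule rejects while the two-sided rule does not, because the two-sided test splits alpha between the tails.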

Page 5: Chi-square, Goodness of fit, and Contingency Tables.

Also useful for model fitting

Assume you have fit a model to some data and have some residual errors left over.

You want to check whether the residuals are normally distributed. Bin them in a histogram.

Estimate the proportion of residuals expected in each bin under a normal distribution, then compare with the observed counts.
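One way to sketch this check in Python, using only the standard library. The residual values, the bin edges, and the function names are all hypothetical, and the normal distribution is assumed to use the sample mean and standard deviation of the residuals.

```python
from statistics import NormalDist, mean, stdev


def bin_counts(residuals, edges):
    """Observed counts for bins (-inf, e1], (e1, e2], ..., (ek, +inf)."""
    counts = [0] * (len(edges) + 1)
    for r in residuals:
        counts[sum(r > e for e in edges)] += 1
    return counts


def normal_expected_counts(residuals, edges):
    """Expected counts per bin if the residuals were normally distributed."""
    dist = NormalDist(mean(residuals), stdev(residuals))
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    n = len(residuals)
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical residuals left over from a model fit.
residuals = [-1.8, -1.2, -0.9, -0.7, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0,
             0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9, 1.1, 1.4, 1.9]
edges = [-0.5, 0.0, 0.5]  # hypothetical bin boundaries

observed = bin_counts(residuals, edges)
expected = normal_expected_counts(residuals, edges)
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(observed, [round(e, 2) for e in expected], round(statistic, 3))
```

When the mean and standard deviation are estimated from the same residuals, the degrees of freedom are usually reduced by the number of estimated parameters, so the simple "categories - 1" rule may overstate them here.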

Page 6: Chi-square, Goodness of fit, and Contingency Tables.

Model Fitting Example

                        Dark Green      Yellow          Total
Observed numbers (O)    53              11              64
Expected numbers (E)    48              16              64
O - E                   5               -5              0
(O - E)^2               25              25
(O - E)^2 / E           25/48 = 0.52    25/16 = 1.56    2.08

Consider a classic genetics experiment. The offspring of a cross between F1 brassicas were 53 dark green and 11 yellow. If the plants are heterozygous for color, a ratio of 3 dark green to 1 yellow would be expected.
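The brassica example can be reproduced with a few lines of Python. This is a sketch, not part of the original slides: the critical value 3.841 is the standard tabled value for 1 degree of freedom at alpha = 0.05, and the p-value shortcut via `math.erfc` holds only for 1 degree of freedom.

```python
import math

observed = [53, 11]   # dark green, yellow
ratio = [3, 1]        # expected 3:1 ratio for a heterozygous cross

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]  # 48 and 16

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For 1 degree of freedom only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

CRITICAL_VALUE = 3.841  # tabled value for df = 1, alpha = 0.05
reject = statistic > CRITICAL_VALUE
print(round(statistic, 2), df, round(p_value, 3), reject)
```

The statistic of about 2.08 is below 3.841, so at alpha = 0.05 these counts do not contradict the 3:1 ratio.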

Page 7: Chi-square, Goodness of fit, and Contingency Tables.

Compound Hypotheses and Directionality

With multiple categories, compound hypotheses are possible

H0: Pr(cat 1) = 0.25, Pr(cat 2) = 0.50 and Pr(cat 3) = 0.25

HA: at least one of the above is not the case

Where there are only 2 categories, a “directional alternative” is possible

Page 8: Chi-square, Goodness of fit, and Contingency Tables.

Directional Alternatives

Only in the case of “dichotomous variables” – two categories, effectively.

Step 1: Check whether the observed trend is in the direction specified by HA. If not, the p-value is > 0.5 by necessity. If so, proceed to step 2.

Step 2: The p-value is half what it would be if HA were non-directional.
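The two-step rule can be sketched as a small Python function. The function name and the example p-value are hypothetical. For a trend in the wrong direction the slides say only that the p-value exceeds 0.5; returning `1 - p/2` is an assumption based on the symmetric one-sided complement.

```python
def directional_p_value(nondirectional_p, in_predicted_direction):
    """Convert a non-directional p-value to a directional one (2 categories only)."""
    if not 0.0 <= nondirectional_p <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    half = nondirectional_p / 2
    if in_predicted_direction:
        return half          # step 2: half the non-directional p-value
    return 1 - half          # assumption: complement, always >= 0.5


p_with_trend = directional_p_value(0.08, in_predicted_direction=True)
p_against_trend = directional_p_value(0.08, in_predicted_direction=False)
print(p_with_trend, p_against_trend)
```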

Page 9: Chi-square, Goodness of fit, and Contingency Tables.

Directional Alternative Example

Two football teams' records are compared against the average number of wins by an NFL team per year, 9.

Team 1 won 14 games this year and several players were caught doping with HGF.

Team 2 won 11 games this year and tested clean.

Is there evidence that doping increased the number of wins by team 1?

Page 10: Chi-square, Goodness of fit, and Contingency Tables.

Contingency Tables

Use the χ2 test statistic as above, but calculate the expected value for each cell in the table from E = (row total) * (column total) / (grand total)

df = 1 for a 2x2 table
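A minimal Python sketch of the expected-value formula follows. The function names and the 2x2 counts are hypothetical; the degrees of freedom use (rows - 1) * (columns - 1), which gives 1 for a 2x2 table.

```python
def expected_counts(table):
    """E = (row total) * (column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def contingency_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2x2 table of counts.
table = [[20, 30],
         [30, 20]]
expected = expected_counts(table)
statistic, df = contingency_chi_square(table)
print(expected, statistic, df)
```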

Page 11: Chi-square, Goodness of fit, and Contingency Tables.

2x2 Contingency Tables

Can indicate either

Two independent samples with a dichotomous observed variable

One sample with two dichotomous observed variables

                Female    Male    Total
HIV test        9         8       17
No HIV test     52        51      103
Total           61        59      120

Page 12: Chi-square, Goodness of fit, and Contingency Tables.

Relation to Independence of data

You can interpret contingency tables in terms of conditional probabilities

Pr(HIV test | female)= 9/61

Pr(female | HIV test) = 9/17

Test becomes H0: Likelihood of taking an HIV test is independent of sex

                Female    Male    Total
HIV test        9         8       17
No HIV test     52        51      103
Total           61        59      120
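The conditional probabilities and the independence test for the HIV-test table can be worked through in Python. This is a sketch using the counts from the table; the variable names are illustrative, and 3.841 is the standard tabled critical value for 1 degree of freedom at alpha = 0.05.

```python
# Counts from the table: rows = HIV test / no HIV test, columns = female / male.
table = [[9, 8],
         [52, 51]]

row_totals = [sum(row) for row in table]          # 17, 103
col_totals = [sum(col) for col in zip(*table)]    # 61, 59
grand_total = sum(row_totals)                     # 120

p_test_given_female = table[0][0] / col_totals[0]   # 9/61
p_test_given_male = table[0][1] / col_totals[1]     # 8/59
p_female_given_test = table[0][0] / row_totals[0]   # 9/17
p_test = row_totals[0] / grand_total                # 17/120

# Under independence the conditional rates should both be close to p_test.
statistic = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / grand_total
        statistic += (table[i][j] - e) ** 2 / e

CRITICAL_VALUE = 3.841  # tabled value for df = 1, alpha = 0.05
reject = statistic > CRITICAL_VALUE
print(round(p_test_given_female, 3), round(p_test_given_male, 3),
      round(statistic, 3), reject)
```

The two conditional rates are nearly equal and the statistic is far below the critical value, so these counts give no reason to reject independence of HIV testing and sex.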

Page 13: Chi-square, Goodness of fit, and Contingency Tables.

r x k contingency tables

Same as above, but degrees of freedom = (r-1)*(k-1), where r is the number of rows and k is the number of columns.

Page 14: Chi-square, Goodness of fit, and Contingency Tables.

Corrections to the Chi-Squared Test

It is a requirement that a chi-squared test be applied to discrete data.

Counts are appropriate; continuous measurements are not. However, the chi-squared distribution itself is continuous, and assuming continuity for discrete counts distorts the p-value and may make false positives more likely.

Frank Yates proposed a correction to the chi-squared formula: subtract 0.5 from each absolute difference |observed - expected| before squaring. This tends to increase the p-value and makes the test more conservative, so false positives become less likely. However, the test may now be *too* conservative.
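Yates's continuity correction subtracts 0.5 from each |observed - expected| before squaring. A minimal Python sketch, applied to the brassica counts from the model fitting example (the function names are hypothetical):

```python
def chi_square(observed, expected):
    """Uncorrected statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def yates_chi_square(observed, expected):
    """Yates-corrected statistic: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


observed = [53, 11]  # dark green, yellow
expected = [48, 16]  # 3:1 ratio of 64 plants

uncorrected = chi_square(observed, expected)
corrected = yates_chi_square(observed, expected)
print(round(uncorrected, 4), round(corrected, 4))
```

The corrected statistic is smaller than the uncorrected one, which is why the corrected test yields a larger p-value and is more conservative.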

Additionally, the chi-squared test should not be used when the expected count in a cell is < 5. At times it is not inappropriate to pad an empty cell with a small value, though, as one can only assume the result would be more significant with no value there.