Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health Chi-Square Test Statistic

Chi-Square Test Statistic

Apr 04, 2023


Khang Minh
Transcript
Page 1: Chi-Square Test Statistic

Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health

Chi-Square Test Statistic

Page 2: Chi-Square Test Statistic

Measurement

What are the 4 levels of measurement discussed?

– 1. Nominal or Classificatory Scale
  • Gender, ethnic background
– 2. Ordinal or Ranking Scale
  • Hardness of rocks, beauty, military ranks
– 3. Interval Scale
  • Celsius or Fahrenheit
– 4. Ratio Scale
  • Speed, height, mass or weight

Page 3: Chi-Square Test Statistic

Question 1

A chimpanzee was trained to make same-different judgments about pairs of stimuli. For pictures of objects the animal was correct on 23 out of 24 trials. Did the chimp perform significantly better than chance?

Page 4: Chi-Square Test Statistic

Question 2

A fruit fly mutant, Cyclops eyes, is hypothesized to be autosomal dominant. A cross of heterozygous males and females produced 44 wild-type males, 60 wild-type females, 110 cyclops males and 150 cyclops females. Do these data support or reject the hypothesis?

Page 5: Chi-Square Test Statistic

Question 3

A genetic engineer was attempting to cross a tiger and a cheetah. She predicted that the observed traits would occur in the following phenotypic ratio: 4 stripes only : 3 spots only : 9 both stripes and spots. When the cross was performed she found 50 with stripes only, 41 with spots only and 85 with both. According to these data, did she get the predicted outcome?

Page 6: Chi-Square Test Statistic

Question 4

A KUVA poll surveyed a simple random sample of 1000 veterinary students. Respondents were classified by gender (male or female) and by occupational preference (Vet practitioners, Sales & Technical support, or Lecturers and DLD officers). Results are shown in the table below. Do the men's occupational preferences differ significantly from the women's preferences?

          Vet practitioners   Sales & Technical support   Lecturers and DLD officers   Total
Male      150                 200                         50                           400
Female    300                 250                         50                           600
Total     450                 450                         100                          1000

Page 7: Chi-Square Test Statistic

Question 5

Given the following table, test the null hypothesis that there is no relationship between social class (high, middle, low) and stress category (high, low).

              High Class   Middle Class   Low Class   Total
High Stress   13           49             168         230
Low Stress    55           90             246         391
Total         68           139            414         621

Page 8: Chi-Square Test Statistic

Chi-Square (χ²) Test

Page 9: Chi-Square Test Statistic

Chi-Square (χ²) Test

The chi-square test is a statistical test used to examine differences with categorical variables. There are a number of features of the social world that we characterize through categorical variables: religion, political preference, etc. To examine hypotheses using such variables, use the chi-square test.

Page 10: Chi-Square Test Statistic

When using the chi-square test, the researcher needs a clear idea of what is being investigated.

It is customary to define the object of the research by writing a hypothesis.

Chi-square is then used to decide whether the data give grounds to reject the hypothesis. The test can support or contradict a hypothesis, but it cannot prove one.

Chi-Square (χ²) Test

Page 11: Chi-Square Test Statistic

χ² Test Requirements

Prior to using the chi-square test, there are certain requirements that must be met.

The data must be in the form of frequencies counted in each of a set of categories. Percentages cannot be used. The total number observed must exceed 20.

Page 12: Chi-Square Test Statistic

The expected frequency under the null hypothesis (H0) in any one category must not normally be less than 5.

All the observations must be independent of each other. In other words, one observation must not have an influence upon another observation.
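A minimal sketch of how the countable requirements above might be checked before running the test. The function name `check_chi_square_requirements`, its parameters and the sample counts are illustrative assumptions, not part of the original slides. Independence of observations depends on the study design and cannot be verified from the counts, so it is not checked here.

```python
def check_chi_square_requirements(observed, expected, min_total=20, min_expected=5):
    """Return a list of problems found; an empty list means the checks pass.

    The default thresholds follow the rules of thumb stated above.
    """
    problems = []
    # Frequencies must be non-negative whole-number counts, not percentages.
    if any(o < 0 or o != int(o) for o in observed):
        problems.append("observed values must be whole-number counts")
    # The total number observed must exceed the minimum.
    if sum(observed) <= min_total:
        problems.append("total number observed must exceed %d" % min_total)
    # No expected frequency should fall below the minimum.
    if any(e < min_expected for e in expected):
        problems.append("an expected frequency is below %d" % min_expected)
    return problems


# Sample data (assumed for illustration): 60 observations in 6 categories.
large_sample = check_chi_square_requirements([16, 5, 9, 7, 6, 17], [10] * 6)

# Sample data (assumed): 25 observations, two expected counts below 5.
small_sample = check_chi_square_requirements([3, 7, 5, 10], [3.2, 6.8, 4.8, 10.2])

print(large_sample)
print(small_sample)
```

An empty list for the first call and a single message for the second suggests the second data set would be better handled by an exact test.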

χ² Test Requirements

         D+    D-    Total
E-
E+
Total

Page 13: Chi-Square Test Statistic

Hypothesis

The hypothesis is the most important part of a research project. It states exactly what the researcher is trying to establish. It must be written in a clear and concise way so that other people can easily understand the aims of the research project.

Page 14: Chi-Square Test Statistic

The chi-square test is used in two similar but distinct circumstances:

For estimating how closely an observed distribution matches an expected distribution - we'll refer to this as the goodness-of-fit test

For estimating whether two random variables are independent

Page 15: Chi-Square Test Statistic

Test of Goodness-of-Fit

Page 16: Chi-Square Test Statistic

The Goodness-of-Fit Test

One of the more interesting goodness-of-fit applications of the chi-square test is to examine issues of fairness and cheating in games of chance, such as cards, dice, and roulette. Since such games usually involve wagering, there is significant incentive for people to try to rig the games and allegations of missing cards, "loaded" dice, and "sticky" roulette wheels are all too common.

Page 17: Chi-Square Test Statistic

So how can the goodness-of-fit test be used to examine cheating in gambling?

Take the example of dice. Most dice used in wagering have six sides, with each side having a value of one, two, three, four, five, or six.

The Goodness-of-Fit Test

Page 18: Chi-Square Test Statistic

• If the die being used is fair, then the chance of any particular number coming up is the same: 1 in 6.

• However, if the die is loaded, then certain numbers will have a greater likelihood of appearing, while others will have a lower likelihood.

The Goodness-of-Fit Test

Page 19: Chi-Square Test Statistic

The Goodness-of-Fit Test

Value on Blue Die   Observed Frequency   Expected Frequency
1                   16                   10
2                   5                    10
3                   9                    10
4                   7                    10
5                   6                    10
6                   17                   10
Total               60                   60

Page 20: Chi-Square Test Statistic

At first glance, this table would appear to be strong evidence that the blue die was, indeed, loaded. There are more 1's and 6's than expected, and fewer of the other numbers. However, it's possible that such differences occurred by chance. The chi-square statistic can be used to estimate the likelihood that the values observed on the blue die occurred by chance.

Page 21: Chi-Square Test Statistic

The key idea of the chi-square test is a comparison of observed and expected values

How many of something were expected and how many were observed in some process?

In this case, we would expect 10 of each number to have appeared, and we observed the values shown in the Observed Frequency column.

Page 22: Chi-Square Test Statistic

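With these sets of figures, we calculate the chi-square statistic as follows:

χ² = Σi (Oi − Ei)² / Ei

where Oi is the observed frequency and Ei is the expected frequency in category i.

Using this formula with the values in the table above gives us a value of 13.6.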
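As a sketch, the value of 13.6 can be reproduced by summing (O − E)² / E over the six faces of the die. The function name `chi_square_statistic` is an illustrative choice, not something taken from the slides.

```python
observed = [16, 5, 9, 7, 6, 17]   # rolls of the blue die, faces 1 to 6
expected = [10] * 6               # 60 rolls spread evenly over 6 faces


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 1))  # 13.6
```

The individual terms are 3.6, 2.5, 0.1, 0.9, 1.6 and 4.9, so the faces 1 and 6 contribute most of the total.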

Page 23: Chi-Square Test Statistic

To determine the significance level we need to know the "degrees of freedom". In the case of the chi-square goodness-of-fit test, the number of degrees of freedom is equal to the number of terms used in calculating chi-square minus one. There were 6 terms in the chi-square for this problem; therefore, the number of degrees of freedom is 5.

Page 24: Chi-Square Test Statistic

We then compare the value calculated in the formula above to a standard set of tables.

The value returned from the table is P = 0.02

Page 25: Chi-Square Test Statistic

We interpret this as meaning that if the die were fair (not loaded), the chance of getting a χ² statistic as large as or larger than the one calculated above would be less than 2.0%.

In other words, rolls like these would be very unlikely to come from a fair die.
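Instead of a printed table, the tail probability can be computed directly. The sketch below uses closed-form series for the chi-square upper tail that hold for whole-number degrees of freedom; the helper name `chi2_sf` is an assumption made for this example, and only the standard library is used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


degrees_of_freedom = 6 - 1            # six faces minus one
p_value = chi2_sf(13.6, degrees_of_freedom)
print(round(p_value, 3))
```

For the die example this gives a probability a little under 0.02, in line with the tabled value quoted above.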

Page 26: Chi-Square Test Statistic

To recap the steps used in calculating a goodness-of-fit test with chi-square:

• Establish hypotheses.
• Calculate chi-square statistic. Doing so requires knowing:
  – The number of observations
  – Expected values
  – Observed values
• Assess significance level. Doing so requires knowing the number of degrees of freedom.
• Finally, decide whether or not to reject the null hypothesis.

Page 27: Chi-Square Test Statistic

Test of Independence

Page 28: Chi-Square Test Statistic

Testing Independence

The other primary use of the chi-square test is to examine whether two variables are independent or not.

What does it mean to be independent, in this sense?

It means that the two factors are not related.

Page 29: Chi-Square Test Statistic

Typically in social science research, we're interested in finding factors that are related - education and income, occupation and prestige, age and voting behavior. In this case, the chi-square can be used to assess whether two variables are independent or not.

Page 30: Chi-Square Test Statistic

More generally, we say that variable Y is "not correlated with" or "independent of" the variable X if more of one is not associated with more of another.

If two categorical variables are correlated their values tend to move together, either in the same direction or in the opposite.

Page 31: Chi-Square Test Statistic

• For example, we want to know whether boys or girls get into “Gigs” trouble more often in college.

• There are many ways we might show information related to this question, but perhaps the most frequent and easiest to comprehend method is in a table.

Testing Independence: Example

Page 32: Chi-Square Test Statistic

“Gigs” Trouble No Trouble Total

Boys 46 71 117

Girls 37 83 120

Total 83 154 237

To examine statistically whether boys got in trouble with “Gigs” more often than girls, we need to frame the question in terms of hypotheses.

Page 33: Chi-Square Test Statistic

Establish Hypotheses

The null hypothesis is that the two variables are independent - or, in this particular case, that the likelihood of getting in “Gigs” trouble is the same for boys and girls. The alternative hypothesis to be tested is that the likelihood of getting in “Gigs” trouble is not the same for boys and girls.

Page 34: Chi-Square Test Statistic

It is important to keep in mind that the chi-square test only tests whether two variables are independent. It cannot address questions of which is greater or less. Using the chi-square test, we cannot evaluate directly the hypothesis that boys get in “Gigs” trouble more than girls; rather, the test (strictly speaking) can only test whether the two variables are independent or not.

Cautionary Note

Page 35: Chi-Square Test Statistic

Calculate the expected value for each cell of the table

As with the goodness-of-fit example described earlier, the key idea of the chi-square test for independence is a comparison of observed and expected values. How many of something were expected and how many were observed in some process? In the case of tabular data, however, we usually do not know what the distribution should look like (as we did with rolls of dice). Rather, in this use of the chi-square test, expected values are calculated based on the row and column totals from the table.

Page 36: Chi-Square Test Statistic

The expected value for each cell of the table can be calculated using the following formula:

Expected value = (row total × column total) / total number of observations

Page 37: Chi-Square Test Statistic

For example, in the table comparing the numbers of boys and girls in trouble, the expected count for the number of boys who got in trouble is:

(117 × 83) / 237 = 40.97

Page 38: Chi-Square Test Statistic

The first step, then, in calculating the chi-square statistic in a test for independence is generating the expected value for each cell of the table.

Observed counts, with expected values in parentheses:

         “Gigs” Trouble: Yes   “Gigs” Trouble: No   Total
Boys     46 (40.97)            71 (76.03)           117
Girls    37 (42.03)            83 (77.97)           120
Total    83                    154                  237
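The expected values can be generated for every cell at once from the row and column totals. This is a sketch only; the function name `expected_counts` and the list-of-rows layout are assumptions made for the example.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[46, 71],   # boys:  trouble, no trouble
            [37, 83]]   # girls: trouble, no trouble

expected = expected_counts(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

Each row of expected values adds up to the observed row total (117 and 120), which is a quick check that the calculation is right.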

Page 39: Chi-Square Test Statistic

Calculate Chi-square statistic

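With these sets of figures, we calculate the chi-square statistic as follows:

χ² = Σ (O − E)² / E

Summed over the four cells with unrounded expected values, this is 0.616 + 0.332 + 0.601 + 0.324 ≈ 1.87.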
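A short sketch of the same calculation for the boys/girls table, computing each expected value from the margins and adding up (O − E)² / E cell by cell. Variable names are illustrative.

```python
observed = [[46, 71],   # boys:  trouble, no trouble
            [37, 83]]   # girls: trouble, no trouble

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count for this cell
        chi2 += (o - e) ** 2 / e

print(round(chi2, 2))  # 1.87
```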

Page 40: Chi-Square Test Statistic

Assess significance level

Lastly, to determine the significance level we need to know the "degrees of freedom." In the case of the chi-square test of independence, the number of degrees of freedom is equal to (the number of columns in the table minus one) multiplied by (the number of rows in the table minus one).

Page 41: Chi-Square Test Statistic

In this table, there were two rows and two columns. Therefore, the number of degrees of freedom is:

(2 – 1) * (2 – 1) = 1

We then compare the value calculated in the formula above to a standard set of tables.

Page 42: Chi-Square Test Statistic

The value returned from the table is

P ≈ 0.2 (the exact tail probability is about 0.17)

Because this is well above 0.05, we cannot reject the null hypothesis, and we conclude that boys are not significantly more likely to get in “Gigs” trouble than girls.
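For one degree of freedom the chi-square tail probability has a simple closed form, erfc(√(χ²/2)), so the decision can be sketched without a printed table. The function names and the 0.05 threshold are assumptions for this example; the statistic of about 1.873 is the value obtained for the boys/girls table.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """(rows - 1) * (columns - 1) for a test of independence."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi2):
    """Upper-tail probability for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


df = degrees_of_freedom(2, 2)
p = p_value_df1(1.873)        # statistic for the boys/girls table
reject_null = p < 0.05        # 0.05 is an assumed significance level

print(df, round(p, 2), reject_null)
```

The probability comes out near 0.17, so the null hypothesis of independence is not rejected at the 5% level.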

Page 43: Chi-Square Test Statistic

To recap the steps used in calculating a test of independence with chi-square:

• Establish hypotheses.
• Calculate expected values for each cell of the table.
• Calculate chi-square statistic. Doing so requires knowing:
  – The number of observations
  – Observed values
• Assess significance level. Doing so requires knowing the number of degrees of freedom.
• Finally, decide whether or not to reject the null hypothesis.

Page 44: Chi-Square Test Statistic

Fisher’s Exact Test

Fisher’s Exact Test is a test for independence in a 2 X 2 table

It is most useful when the total sample size and the expected values are small

The test holds the marginal totals fixed and computes the hypergeometric probability

Useful when a cell value is less than 5 (the usual rule of thumb refers to the expected value)

Page 45: Chi-Square Test Statistic

Hypergeometric distribution

Example: 2x2 table with cell counts a, b, c, d. Assuming marginal totals are fixed:

M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d.
For convenience assume N1 < N2, M1 < M2.
Possible values of a are: 0, 1, …, min(M1, N1)

The cell count “a” follows a hypergeometric distribution, with N = a + b + c + d = N1 + N2 = M1 + M2:

– Pr (x = a) = N1! N2! M1! M2! / [N! a! b! c! d!]
– Mean (x) = M1 N1 / N
– Var (x) = M1 M2 N1 N2 / [N² (N − 1)]

Fisher's exact test is based on this hypergeometric distribution.
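The factorial expression for Pr(x = a) is algebraically the same as C(M1, a) · C(M2, N1 − a) / C(N, N1), which is easier to evaluate. The sketch below uses that form; the function names and the margins in the usage lines (M1 = 10, M2 = 15, N1 = 8) are illustrative assumptions.

```python
from math import comb


def hypergeom_pmf(a, m1, m2, n1):
    """P(X = a) for the top-left cell, given row totals m1, m2 and first column total n1."""
    return comb(m1, a) * comb(m2, n1 - a) / comb(m1 + m2, n1)


def hypergeom_mean(m1, m2, n1):
    """Mean = M1 * N1 / N."""
    return m1 * n1 / (m1 + m2)


def hypergeom_var(m1, m2, n1):
    """Variance = M1 * M2 * N1 * N2 / (N^2 * (N - 1))."""
    n = m1 + m2
    n2 = n - n1
    return m1 * m2 * n1 * n2 / (n ** 2 * (n - 1))


# Illustrative margins: rows 10 and 15, first column 8 (so N = 25).
m1, m2, n1 = 10, 15, 8
pmf = [hypergeom_pmf(a, m1, m2, n1) for a in range(min(m1, n1) + 1)]

total_probability = sum(pmf)                       # should be 1
mean_from_pmf = sum(a * p for a, p in enumerate(pmf))

print(round(total_probability, 6), hypergeom_mean(m1, m2, n1), hypergeom_var(m1, m2, n1))
```

Summing a · Pr(x = a) over all possible a reproduces the mean formula, which is a useful consistency check.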

Page 46: Chi-Square Test Statistic

Fisher’s Exact Test Example

Is HIV Infection related to History of STDs in Sub Saharan African Countries? Test at 5% level.

               HIV Infection
Hx of STDs     Yes    No    Total
Yes            3      7     10
No             5      10    15
Total          8      17    25

Page 47: Chi-Square Test Statistic

               HIV Infection
Hx of STDs     Yes    No    Total
Yes            3      7     10
No             5      10    15
Total          8      17    25

               HIV Infection
Hx of STDs     Yes        No          Total
Yes            3 (a)      7 (b)       10 (a+b)
No             5 (c)      10 (d)      15 (c+d)
Total          8 (a+c)    17 (b+d)    25 (n)

Page 48: Chi-Square Test Statistic

Fisher’s Exact Test

p = (10! 15! 8! 17!) / (3! 7! 5! 10! 25!)

p = (6.80 × 10^37) / (2.04 × 10^38)

p = 0.33

Page 49: Chi-Square Test Statistic

Hypergeometric probability

Probability of observing this specific table given fixed marginal totals is

Pr (3,7,5,10) = 10!15!8!17!/[25!3!7!5!10!] = 0.33

Note that the above is not the p-value. Why? It is the probability of a single table, not the cumulative (tail) probability.

Tail prob. = sum of the probabilities of the tables with a = 3, 2, 1, 0

Page 50: Chi-Square Test Statistic

a = 2

               HIV Infection
Hx of STDs     Yes    No    Total
Yes            2      8     10
No             6      9     15
Total          8      17    25

a = 1

               HIV Infection
Hx of STDs     Yes    No    Total
Yes            1      9     10
No             7      8     15
Total          8      17    25

a = 0

               HIV Infection
Hx of STDs     Yes    No    Total
Yes            0      10    10
No             8      7     15
Total          8      17    25

Page 51: Chi-Square Test Statistic

Hypergeometric probability

Pr (2,8,6,9) = 10!15!8!17!/[25!2!8!6!9!] = 0.2082

Pr (1,9,7,8) = 10!15!8!17!/[25!1!9!7!8!] = 0.0595

Pr (0,10,8,7) = 10!15!8!17!/[25!0!10!8!7!] = 0.0059

Tail prob = 0.3332 + 0.2082 + 0.0595 + 0.0059 = 0.6068
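The four table probabilities and their sum can be checked with a few lines of code. This sketch writes the factorial formula in the equivalent binomial-coefficient form; the function name `table_probability` is an assumption for the example.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of a 2x2 table (a, b / c, d) with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


# Tables with a = 3, 2, 1, 0; margins fixed at rows (10, 15) and columns (8, 17).
tables = [(3, 7, 5, 10), (2, 8, 6, 9), (1, 9, 7, 8), (0, 10, 8, 7)]
probabilities = [table_probability(*t) for t in tables]
tail = sum(probabilities)

for t, p in zip(tables, probabilities):
    print(t, round(p, 4))
print("tail probability:", round(tail, 3))
```

The tail probability is about 0.607, far above the 5% level stated in the example, so these data give no grounds to reject independence.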

Page 52: Chi-Square Test Statistic

Pearson chi-square test vs Yates correction

Pearson chi-square test

χ² = Σi (Oi − Ei)² / Ei

follows a chi-square distribution with

df = (r − 1)(c − 1) if Ei ≥ 5

Yates correction for a more accurate p-value

χ² = Σi (|Oi − Ei| − 0.5)² / Ei

when Oi and Ei are close to each other
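To see the size of the correction, the sketch below computes both statistics for a 2x2 table, using the boys/girls counts (46, 71, 37, 83) as sample data. The function name `chi_square_2x2` and its `yates` flag are assumptions made for this example; the corrected branch applies the formula exactly as written above.

```python
def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a contingency table, optionally with Yates correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count
            diff = abs(o - e)
            if yates:
                diff -= 0.5                          # continuity correction
            statistic += diff ** 2 / e
    return statistic


observed = [[46, 71],
            [37, 83]]

pearson = chi_square_2x2(observed)
corrected = chi_square_2x2(observed, yates=True)
print(round(pearson, 2), round(corrected, 2))
```

The corrected statistic is smaller than the uncorrected one, so the Yates version gives a larger, more conservative p-value.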