Chi-square (χ²)
Fenster
Dec 31, 2015
Chi-Square χ²
Tests of Statistical Significance for Nominal Level Data (Note: can also be used for ordinal level data).
Chi-Square is an elegant and beautiful test. The assumptions required to use the test are very weak. That is to say, we do not have to make many assumptions about how the data are distributed.
We ask the following question: Are the frequencies empirically obtained (the OBSERVED frequencies) significantly different from those that would have been EXPECTED under some general set of assumptions?
Assumptions to use the Chi-Square Test:
- Samples are randomly selected from the population.
- EXPECTED frequencies (to be defined later) are greater than 5 in every cell. But even this assumption can be modified with the use of Yates' correction.
WE DO NOT NEED TO ASSUME NORMALITY!
This may not surprise you. After all, the concept of a normal distribution has no meaning for nominal level data, and chi-square is a test for nominal level data. Chi-square is so popular because of this weak set of assumptions.
H₀ for chi-square: If there were no relationship between the dependent and independent variables, the column percentages would not change as we move across levels of the INDEPENDENT variable.
Note: We covered this earlier in the course. We said we had no relationship between two variables if the column percentages do not change across levels of the independent variable.
We can compute an EXPECTED set of frequencies from the MARGINAL totals of the dependent and independent variables.
To calculate the EXPECTED frequency for a cell, we multiply its row total by its column total and divide by the grand total.
$$\text{Expected frequency} = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Grand total}}$$
OBSERVED frequencies are those frequencies that are empirically obtained. Those are the frequencies that are given to us.
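The row-total-times-column-total rule can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` and the small table `toy_observed` are made up here and do not come from the slides.

```python
def expected_frequencies(observed):
    """Return the expected count for every cell of a two-way table.

    `observed` is a list of rows; each row is a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count = (row total) * (column total) / (grand total)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table, used only to exercise the function.
toy_observed = [[10, 20],
                [30, 40]]
toy_expected = expected_frequencies(toy_observed)
print(toy_expected)
```

Note that the expected counts keep the same row and column totals as the observed counts; only the distribution inside the table changes.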
$$\chi^2 = \sum \frac{(\text{Observed frequency} - \text{Expected frequency})^2}{\text{Expected frequency}}$$
Usually this formula is written
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The larger the difference between observed and expected frequencies, the larger the value of χ².
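As a rough sketch, the sum of (O - E)² / E can be computed cell by cell. The function name `chi_square` and both toy tables below are assumptions for illustration; the two observed tables share the same marginal totals, so they share the same expected counts.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all cells of two equally shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical expected counts for a 2 x 2 table (rows 30/70, columns 40/60).
toy_expected = [[12.0, 18.0], [28.0, 42.0]]

# Observed counts close to the expected counts give a small statistic...
small_gap = chi_square([[10, 20], [30, 40]], toy_expected)
# ...and observed counts far from them give a larger one.
large_gap = chi_square([[20, 10], [20, 50]], toy_expected)
print(small_gap, large_gap)
```

The second table departs from the expected counts by 8 in each cell instead of 2, which is why its statistic comes out larger.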
If you look at a chi-square table, you will see many different χ² distributions. Which one should you use? You use the χ² distribution with the appropriate number of degrees of freedom.
For χ², the degrees of freedom are given by the following formula: df = (r-1) × (c-1), where r is the number of rows and c is the number of columns.
That is to say:
(1) We take the number of rows we have and subtract one.
(2) We take the number of columns we have and subtract one.
(3) We then multiply the numbers we get from the first two parts.
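The three steps above amount to a one-line calculation. A minimal sketch, with the hypothetical helper name `degrees_of_freedom`:

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) * (c - 1) for an r-by-c table of counts."""
    return (n_rows - 1) * (n_cols - 1)


df_2x2 = degrees_of_freedom(2, 2)
df_3x3 = degrees_of_freedom(3, 3)
print(df_2x2, df_3x3)
```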
Logic of the χ² test:
We do not expect observed and expected frequencies to be EXACTLY the same. Observed and expected values can vary simply because of sampling variability.
However, if the value of χ² turns out to be larger than that expected by chance, we shall be in a position to reject the null hypothesis.
EXAMPLE: Let us say one was interested in investigating the relationship between gender and opinions on accountability.
Our null hypothesis is that gender makes no difference in attitudes towards accountability. Our research hypothesis is that gender makes a difference in attitudes towards accountability.
Opinion on Accountability                     Gender
                                              Male    Female    Row Totals
Accountability good for educational system    126     99        225
Accountability bad for educational system     71      162       233
Col. Totals                                   197     261       458
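Since the null hypothesis is stated in terms of column percentages, it can help to compute them from the frequencies in the table. The sketch below uses the counts from the table; the dictionary layout and variable names are illustrative choices, not part of the original example.

```python
# Observed frequencies from the gender-by-opinion table.
observed = {
    "good": {"male": 126, "female": 99},
    "bad": {"male": 71, "female": 162},
}

genders = ("male", "female")

# Column totals: how many respondents of each gender there are.
col_totals = {g: sum(observed[op][g] for op in observed) for g in genders}

# Column percentages: share of each gender holding each opinion.
col_pct = {
    op: {g: 100 * observed[op][g] / col_totals[g] for g in genders}
    for op in observed
}

for op in observed:
    print(op, {g: round(col_pct[op][g], 1) for g in genders})
```

Roughly 64% of males but only about 38% of females say accountability is good. The chi-square test asks whether a gap of that size could plausibly arise from sampling variability alone.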
It is important to note that the numbers in each cell are actual frequencies rather than percentages.
Let us go through our six-step hypothesis testing method in this case.
Step 1: State the null hypothesis
H₀: Gender does not make a difference when predicting attitudes towards educational accountability.
Step 2: State the research hypothesis
H₁: Gender does make a difference when predicting attitudes towards educational accountability.
Step 3: Select a significance level: Let's choose α = .01
Step 4: Collect and summarize the sample data.
Calculation of χ²: Compute the EXPECTED FREQUENCIES for EACH CELL.
Computing the EXPECTED FREQUENCIES
Cell a: males who believe that accountability is good for the educational system
(197)(225) / 458 = 96.8
Cell b: females who believe that accountability is good for the educational system
(261)(225) / 458 = 128.2
Cell c: males who believe that accountability is bad for the educational system
(197)(233) / 458 = 100.2
Cell d: females who believe that accountability is bad for the educational system
(261)(233) / 458 = 132.8
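The four calculations above can be reproduced from the marginal totals alone. The following is a small sketch; the dictionary keys and variable names are illustrative, and the rounding to one decimal place mirrors the figures shown above.

```python
# Marginal totals from the gender-by-opinion table.
row_totals = {"good": 225, "bad": 233}
col_totals = {"male": 197, "female": 261}
grand_total = 458

# Expected count for each cell: (row total * column total) / grand total.
expected = {
    (opinion, gender): row_totals[opinion] * col_totals[gender] / grand_total
    for opinion in row_totals
    for gender in col_totals
}

# Round to one decimal place, as in the hand calculation.
rounded = {cell: round(value, 1) for cell, value in expected.items()}

for cell, value in rounded.items():
    print(cell, value)
```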
Set up a chi-square table
Cell     f observed   f expected   f obs - f exp   (f obs - f exp)²   (f obs - f exp)² / f exp
A        126          96.8          29.2           852.64             8.808
B        99           128.2        -29.2           852.64             6.651
C        71           100.2        -29.2           852.64             8.509
D        162          132.8         29.2           852.64             6.420
Total    458          458            0                                30.388
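The table can be rebuilt programmatically as a check on the arithmetic. This sketch first follows the hand calculation (expected counts rounded to one decimal, contributions rounded to three) and then repeats the sum with unrounded expected counts; the variable names are assumptions for illustration.

```python
labels = ["A", "B", "C", "D"]
observed = [126, 99, 71, 162]
expected_rounded = [96.8, 128.2, 100.2, 132.8]  # as used in the hand calculation

# Per-cell contribution (O - E)^2 / E, rounded to three decimals like the table.
contributions = [
    round((o - e) ** 2 / e, 3) for o, e in zip(observed, expected_rounded)
]
chi2_hand = sum(contributions)

# Same statistic without rounding the expected counts.
row_totals = [225, 225, 233, 233]  # row total for cells A, B, C, D
col_totals = [197, 261, 197, 261]  # column total for cells A, B, C, D
expected_exact = [r * c / 458 for r, c in zip(row_totals, col_totals)]
chi2_exact = sum((o - e) ** 2 / e for o, e in zip(observed, expected_exact))

for label, value in zip(labels, contributions):
    print(label, value)
print("hand total:", round(chi2_hand, 3))
print("unrounded total:", round(chi2_exact, 3))
```

Carrying full precision gives a slightly larger statistic (about 30.43) than the hand-calculated 30.388. The difference comes only from rounding the expected counts and does not affect the conclusion.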
Step 5: Obtain the sampling distribution. Look at a chi-square table. We will use the chi-square test with 1 degree of freedom. Why one? df = (r-1) × (c-1)
We have 2 rows and 2 columns, so we get df = (2-1) × (2-1) = 1 × 1 = 1
With our choice of α = .01, we get a χ² critical of 6.635 (found in chi-square table, p. 566)
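If a printed table is not at hand, the critical value for 1 degree of freedom can be approximated with the Python standard library. This is not the lookup method used in the example: it relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable, so the shortcut below works only for df = 1. The function name is hypothetical.

```python
from statistics import NormalDist


def chi_square_critical_df1(alpha):
    """Upper-tail critical value of chi-square with 1 df (df = 1 only).

    For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)), so the critical value is
    the square of the two-sided standard normal critical value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


critical_01 = chi_square_critical_df1(0.01)
critical_05 = chi_square_critical_df1(0.05)
print(round(critical_01, 3), round(critical_05, 3))
```

For tables with more than one degree of freedom, a printed chi-square table or a statistics package is still needed.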
If we find a χ² greater than or equal to 6.635, we reject the null hypothesis and conclude that gender does make a difference when predicting attitudes towards educational accountability.
If we find a χ² less than 6.635, we fail to reject the null hypothesis: we do not have enough evidence to conclude that gender makes a difference when predicting attitudes towards educational accountability.
Note: All χ² tests are one-tailed tests. Chi-square can only tell you whether a relationship is significant. Chi-square cannot tell you anything about the DIRECTIONALITY of the relationship. You must inspect the column percentages as you move across categories of the independent variable to determine DIRECTIONALITY.
Another way to determine DIRECTIONALITY is to look at the RESIDUALS. (You can instruct SPSS to present the residuals in your output file. RESIDUALS are simply the OBSERVED cell count minus the EXPECTED value.)
If the RESIDUALS are NEGATIVE, you are getting fewer OBSERVED cases than EXPECTED in a CELL.
If the RESIDUALS are POSITIVE, you are getting more OBSERVED cases than EXPECTED in a CELL.
To determine DIRECTIONALITY, look at the SIGN changes of the RESIDUALS as you move across categories of the independent variable.
Let us assume that the RESIDUALS start out NEGATIVE and end up POSITIVE. This would imply that the independent variable is related to the dependent variable.
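To make the sign pattern concrete, the unstandardized residuals for the gender example can be computed directly as observed minus expected. This is a sketch only: the row and column order (good/bad by male/female) and the names are illustrative choices.

```python
# Rows: accountability good, accountability bad. Columns: male, female.
observed = [[126, 99],
            [71, 162]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Residual = observed count minus expected count for each cell.
residuals = [
    [observed[i][j] - row_totals[i] * col_totals[j] / grand_total
     for j in range(2)]
    for i in range(2)
]

# "+" marks more observed cases than expected, "-" marks fewer.
signs = [["+" if r > 0 else "-" for r in row] for row in residuals]
print(signs)
```

Reading across the "good" row, the residual is positive for males and negative for females: more males, and fewer females, endorse accountability than would be expected if gender made no difference.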
Step 6: Make a decision:
χ² observed = 30.388 and χ² critical = 6.635
Decision: REJECT H₀: χ² observed is greater than χ² critical
We easily reject the null hypothesis and conclude that gender does make a difference when predicting attitudes towards educational accountability.
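The decision rule is a simple comparison, sketched below with the two values from the example. The p-value line is an addition not present in the original example, and it is valid only for 1 degree of freedom, where the upper-tail probability of χ² equals erfc(sqrt(x / 2)).

```python
import math

chi2_observed = 30.388  # from the chi-square table above
chi2_critical = 6.635   # alpha = .01, df = 1

# Reject H0 when the observed statistic reaches the critical value.
reject_null = chi2_observed >= chi2_critical

# Upper-tail probability for df = 1 only (an added illustration).
p_value = math.erfc(math.sqrt(chi2_observed / 2))

print("Reject H0" if reject_null else "Fail to reject H0")
print("p-value below alpha:", p_value < 0.01)
```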
Two points to note in this example.
We had one degree of freedom. By one degree of freedom we mean that only one number in the table is actually free to vary.
Assume we know the row and column totals. Once we know one number in a 2 × 2 table, we can find the other three.
If I knew the row and column totals, there is only one cell that is free to vary.
a      b      a+b
c      d      c+d
a+c    b+d    a+b+c+d
WE ONLY NEED TO KNOW ONE CELL TO KNOW THE ENTIRE TABLE. THIS IS WHY WE HAD ONE DEGREE OF FREEDOM.
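A short sketch shows how the other three cells follow from cell a once the row and column totals are known. The function name `fill_two_by_two` is hypothetical; the numbers are those of the gender example.

```python
def fill_two_by_two(a, row_totals, col_totals):
    """Recover a full 2 x 2 table from cell a and the marginal totals."""
    b = row_totals[0] - a  # first row must sum to its row total
    c = col_totals[0] - a  # first column must sum to its column total
    d = row_totals[1] - c  # second row must sum to its row total
    return [[a, b], [c, d]]


table = fill_two_by_two(126, row_totals=(225, 233), col_totals=(197, 261))
print(table)
```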
In our example, f observed - f expected has the same magnitude in every cell (-29.2 or 29.2) because the table had only one degree of freedom. If a table has more than one degree of freedom, f observed - f expected does not necessarily have the same magnitude in every cell (and generally will not).
How many cells do we need to know in a 3 × 3 table? The formula tells us the answer is (r-1)(c-1):
(3-1)(3-1) = (2) × (2) = 4
Let us see how we get df to equal 4.
a        b        c        a+b+c
d        e        f        d+e+f
g        h        i        g+h+i
a+d+g    b+e+h    c+f+i    a+b+c+d+e+f+g+h+i
Let us say we knew cell a. Could we know all the other cells in the table? Not this time. Let's say we know cells a and b. If we knew cells a and b, then we could find cell c, but we would not know any other cells. Only if we know four cells (a, b, d, and e) would we be able to find the other five cells. This is why we have four degrees of freedom in a 3 × 3 table.
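The same bookkeeping can be sketched for the 3 × 3 case: given cells a, b, d, and e plus the marginal totals, the remaining five cells are forced. The function name and the example numbers are hypothetical.

```python
def fill_three_by_three(a, b, d, e, row_totals, col_totals):
    """Recover a full 3 x 3 table from four cells and the marginal totals."""
    c = row_totals[0] - a - b  # completes the first row
    f = row_totals[1] - d - e  # completes the second row
    g = col_totals[0] - a - d  # completes the first column
    h = col_totals[1] - b - e  # completes the second column
    i = row_totals[2] - g - h  # completes the third row
    return [[a, b, c], [d, e, f], [g, h, i]]


# Hypothetical table with row totals 6, 15, 24 and column totals 12, 15, 18.
table = fill_three_by_three(1, 2, 4, 5,
                            row_totals=(6, 15, 24),
                            col_totals=(12, 15, 18))
print(table)
```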
One other point about chi-square. Chi-square can tell you whether a relationship is significant. Chi-square can also tell you what cells are most important in determining the significance of the relationship. In our example we find that all cells contribute to the significance of the relationship.
Set up a chi-square table
Cell     f observed   f expected   f obs - f exp   (f obs - f exp)²   (f obs - f exp)² / f exp
A        126          96.8          29.2           852.64             8.808
B        99           128.2        -29.2           852.64             6.651
C        71           100.2        -29.2           852.64             8.509
D        162          132.8         29.2           852.64             6.420
Total    458          458            0                                30.388
Three of our cells (A, B, and C) have individual χ² contributions greater than the critical value (6.635) needed to establish statistical significance for the entire relationship. Since χ² contributions cannot be negative, we can determine whether part of our relationship drives the entire relationship to statistical significance.
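A quick sketch of that comparison, using the per-cell values from the chi-square table and the critical value from Step 5. The share-of-total calculation is an added illustration, not part of the original example.

```python
# Per-cell contributions (f obs - f exp)^2 / f exp from the table.
contributions = {"A": 8.808, "B": 6.651, "C": 8.509, "D": 6.420}
chi2_critical = 6.635  # alpha = .01, df = 1

# Cells whose contribution alone exceeds the critical value.
exceeds = [cell for cell, value in contributions.items()
           if value > chi2_critical]

# Each cell's share of the total statistic, in percent.
total = sum(contributions.values())
shares = {cell: 100 * value / total for cell, value in contributions.items()}

print(exceeds)
print({cell: round(share, 1) for cell, share in shares.items()})
```

No single cell dominates here: each contributes between roughly a fifth and a third of the total, which is consistent with the remark that all cells contribute to the significance of the relationship.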
SPSS Command Syntax for Crosstabs
Note: You can get EXPECTED frequencies in SPSS by going into ANALYZE > DESCRIPTIVE STATISTICS > CROSSTABS and clicking on CROSSTABS.
- Dependent variable goes into the row box.
- Independent variable goes into the column box.
- Click on Cells.
- Click on EXPECTED. (Also click on UNSTANDARDIZED under Residuals.) Click Continue.
- Click on the STATISTICS box.
- Click on Chi-Square. You may also want to click on the Contingency Coefficient and lambda.