Chi-Square Test of significance for proportions FETP India
Mar 31, 2015
Competency to be gained from this lecture
Test the statistical significance of proportions using the relevant Chi-square test
Key elements
• Principles of the Chi-square
• Comparison of a proportion with a hypothesized value
• Chi-square for 2x2 tables
• Chi-square for m x n tables
• Testing dose-response with Chi-square
Analyzing quantitative and qualitative data
                    Quantitative data,   Quantitative data,    Qualitative data
                    normal               non-normal
Summary statistics  Mean                 Median                Proportions
Statistical tests   T-test, F-test       Non-parametric test   Chi-square, Fisher exact test
Chi-square: Principle
• The Chi-square test examines whether a series of observed numbers (O) in various categories is consistent with the numbers expected (E) in those categories under a specific hypothesis (the Null hypothesis)
O = Observed value, E = Expected value
$$\chi^2 = \sum \frac{(O-E)^2}{E} \geq 0$$
Principle
How the Chi-square works in practice
• χ² = 0 when every observed value is equal to the expected value
• As soon as an observed value differs from the expected value, the χ² exceeds zero
• The value of the χ² is compared with a tabulated value
• If the calculated value of χ² exceeds the tabulated value under the column p = 0.05 for the relevant degrees of freedom, the null hypothesis is rejected
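The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the function name `chi_square` and the counts are hypothetical, and 3.84 is the tabulated value for 1 degree of freedom at p = 0.05.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, chosen only to illustrate the behaviour.
same = chi_square([30, 70], [30, 70])      # every O equals E
shifted = chi_square([40, 60], [30, 70])   # O differs from E

CRITICAL_5PCT_1DF = 3.84  # tabulated value, 1 degree of freedom, p = 0.05
print(same, round(shifted, 2), shifted > CRITICAL_5PCT_1DF)
```

The statistic is exactly zero when observed and expected counts agree and grows as they move apart.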
Chi-square table: percentage points of the χ² distribution
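Where no printed table is at hand, a few tabulated values can be checked with the Python standard library. The sketch below relies on two closed forms that hold only for 1 and 2 degrees of freedom; the function names are hypothetical, and other degrees of freedom would need a statistics package.

```python
import math

def p_value_1df(x):
    """Upper-tail probability of the chi-square distribution, 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def p_value_2df(x):
    """Upper-tail probability of the chi-square distribution, 2 degrees of freedom."""
    return math.exp(-x / 2)

# Tabulated values for p = 0.05 and p = 0.01
for x in (3.84, 6.63):
    print("1 df:", x, round(p_value_1df(x), 3))
for x in (5.99, 9.21):
    print("2 df:", x, round(p_value_2df(x), 3))
```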
Use of Chi-square to compare a proportion with a hypothesized value
• The reported coverage for measles in a sub-center is 80%
• The chief medical officer of the district suspects that this coverage could be overestimated
• Validation survey with 80 children selected using simple random sampling: 56/80 (70%) vaccinated
Hypothesized value
Chi-square calculations
          Vaccinated  Non-vaccinated  Total
Expected  64          16              80
Observed  56          24              80

$$\chi^2 = \frac{(56-64)^2}{64} + \frac{(24-16)^2}{16} = \frac{(-8)^2}{64} + \frac{(8)^2}{16} = \frac{64}{64} + \frac{64}{16} = 1 + 4 = 5$$

$$\chi^2 = \sum \frac{(O-E)^2}{E} \geq 0$$
Interpretation of the Chi-square
• The calculated value of χ² (i.e., 5) with 1 degree of freedom exceeds the table value (3.84) at the 5% level
• Hence, the medical officer rejects the null hypothesis that the coverage is 80%
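The measles example can be reproduced as follows. This is a sketch using only the numbers given in the lecture (80 children, 80% reported coverage, 56 vaccinated); the variable names are illustrative, and the p-value uses the closed form for 1 degree of freedom.

```python
import math

n = 80                   # children in the validation survey
hypothesized = 0.80      # reported coverage (null hypothesis)
observed = [56, 24]      # vaccinated, non-vaccinated
expected = [n * hypothesized, n * (1 - hypothesized)]  # 64 and 16

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = math.erfc(math.sqrt(chi2 / 2))   # upper tail, 1 degree of freedom
reject = chi2 > 3.84                 # table value at the 5% level

print(round(chi2, 2), round(p, 3), reject)
```

The p-value falls between 0.02 and 0.03, which agrees with the conclusion drawn from the table value.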
Use of Chi-square to compare proportions between two samples
• Cholera outbreak affecting a village
• Cases clustered around a pond
• Hypothesis-generating interviews suggest that many case-patients washed their utensils in the pond
• The investigator compares those who washed their utensils with the others in terms of cholera incidence
2x2 tables
Incidence of diarrhea (cholera) among persons who washed utensils in a pond and others, South 24 Parganas, West Bengal, India, 2006

                       Sick  Not sick  Total
Washed utensils        50    6         56
Did not wash utensils  8     49        57
Total                  58    55        113
Chi-square to test the difference in the two proportions
• The proportion of persons affected by cholera differs between the exposed and unexposed groups
• Three steps to test whether this difference is significant:
1. Calculate expected values
2. Compare observed and expected to calculate the Chi-square
3. Compare the Chi-square with the tabulated value
Step 1: Calculate the expected values (1/2)
• 51% of the population became sick
• If the cholera occurred at random, these proportions apply to the two groups, exposed and unexposed
Step 1: Calculate the expected values (2/2)
• 51% of the 56 who washed utensils (about 28) should have been sick
• All other numbers can be deduced by subtraction (one degree of freedom)
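As a sketch of Step 1, the expected counts can be computed from the row and column totals. The cell counts (50, 6, 8, 49) are the observed values used in this lecture's cholera calculation; the function name `expected_counts` is hypothetical. The unrounded expected values differ slightly from the whole numbers used on the slide.

```python
def expected_counts(table):
    """Expected cell counts if exposure and illness were unrelated:
    row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: washed utensils, did not wash; columns: sick, not sick
cholera = [[50, 6], [8, 49]]
expected = expected_counts(cholera)
for row in expected:
    print([round(e, 1) for e in row])
```

Once one expected cell is fixed, the other three follow from the margins, which is what "one degree of freedom" means for a 2x2 table.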
Step 2: Compare observed and expected values
$$\chi^2 = \frac{(50-28)^2}{28} + \frac{(6-28)^2}{28} + \frac{(8-30)^2}{30} + \frac{(49-27)^2}{27} = \frac{(22)^2}{28} + \frac{(-22)^2}{28} + \frac{(-22)^2}{30} + \frac{(22)^2}{27} = 68.6$$

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$
Step 3: Interpretation of the Chi-square
• The calculated value of χ² (i.e., 68.6) with 1 degree of freedom exceeds the table value (3.84) at the 5% level
• Hence, we reject the Null hypothesis that the attack rate of cholera is equal in the exposed and unexposed groups
• This may suggest washing utensils in the pond is a source of infection if other elements of the investigation also support the hypothesis
Simpler Chi-square formula
$$\chi^2 = \frac{N(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)}$$
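Application of simpler Chi-square formula to the cholera example
$$\chi^2 = \frac{113(50 \times 49 - 6 \times 8)^2}{(50+8)(6+49)(50+6)(8+49)} = 64$$
This is lower than the 68.6 obtained in Step 2 because the expected values in Step 2 were rounded to whole numbers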
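A short sketch can confirm that the simpler formula and the generic sum over (O - E)²/E agree when the expected values are not rounded. The function names are hypothetical; the counts are those of the cholera example.

```python
def chi_square_shortcut(a, b, c, d):
    """2x2 shortcut: N(ad - bc)^2 / ((a+c)(b+d)(a+b)(c+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def chi_square_generic(a, b, c, d):
    """Generic sum of (O - E)^2 / E with unrounded expected values."""
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    observed = [[a, b], [c, d]]
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            total += (observed[i][j] - e) ** 2 / e
    return total

shortcut = chi_square_shortcut(50, 6, 8, 49)
generic = chi_square_generic(50, 6, 8, 49)
print(round(shortcut, 2), round(generic, 2))
```

Both routes give about 64.0, so the two formulas are interchangeable for a 2x2 table.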
Corrected Chi-square formula
Uncorrected:
$$\chi^2 = \frac{N(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)}$$
Corrected:
$$\chi^2 = \frac{N\left(|ad-bc| - N/2\right)^2}{(a+c)(b+d)(a+b)(c+d)}$$
Note that the corrected value is smaller than the uncorrected value, which tends to exaggerate the significance of a difference
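The effect of the correction on the cholera example can be sketched as below. The function name is hypothetical, and flooring the corrected difference at zero is an assumption (a common convention) that the slide does not state.

```python
def chi_square_2x2(a, b, c, d, corrected=False):
    """2x2 Chi-square, optionally with the continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if corrected:
        # Assumption: never let the corrected difference go below zero.
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

uncorrected = chi_square_2x2(50, 6, 8, 49)
corrected = chi_square_2x2(50, 6, 8, 49, corrected=True)
print(round(uncorrected, 2), round(corrected, 2))
```

With these counts the correction lowers the statistic from about 64.0 to about 61.1; the conclusion is unchanged because both values are far above 3.84.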
Example of a 4x2 table
           Cataract
           Present  Absent  Total
Hindu      10       90      100
Muslim     4        46      50
Christian  3        22      25
Others     1        9       10
Total      18       167     185
Degrees of freedom = (Number of rows - 1) x (Number of columns - 1) = (4-1) x (2-1) = 3 x 1 = 3
N x N tables
Calculation of the Chi-square for a 4x2 table
           Cataract
           Present  Absent  Total
Hindu      10       90      100
Muslim     4        46      50
Christian  3        22      25
Others     1        9       10
Total      18       167     185
• Overall prevalence of cataract = 18/185 = 9.7%
• Apply the 9.7% proportion to all groups to calculate expected values
• Use the generic formula
$$\chi^2 = \sum \frac{(O-E)^2}{E}$$
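The calculation for the 4x2 cataract table can be sketched as follows. The function name `chi_square_table` is hypothetical; the counts are those of the table above, and 7.81 is the tabulated value for 3 degrees of freedom at p = 0.05.

```python
def chi_square_table(table):
    """Chi-square and degrees of freedom for an m x n table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: Hindu, Muslim, Christian, Others; columns: cataract present, absent
cataract = [[10, 90], [4, 46], [3, 22], [1, 9]]
chi2, df = chi_square_table(cataract)
print(round(chi2, 2), df, chi2 > 7.81)
```

For these counts the statistic is about 0.33 with 3 degrees of freedom, well below 7.81, so the Null hypothesis is not rejected.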
Interpretation of a Chi-square for an m x n table
• The Chi-square tests the overall Null hypothesis that all frequencies are distributed at random
• If the Null hypothesis is rejected, it means the distribution is heterogeneous
• It is not possible to:
✗ "Attribute" the difference to a particular group
✗ Regroup categories according to differences observed and test with a 2x2 table (i.e., post-hoc analysis)
✗ Test with multiple 2x2 tables (i.e., multiple comparisons)
Key rule about Chi-square
• Chi-square test should be applied on qualitative data set out in the form of frequencies
• Chi-square test should not be done on:
✗ Percentages
✗ Rates
✗ Ratios
✗ Mean values
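A small sketch shows why the rule matters: the statistic depends on the number of subjects, which percentages hide. The first call uses the measles counts from this lecture; the "larger survey" is a hypothetical example with the same percentages.

```python
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Measles survey from this lecture, entered as counts (n = 80)
from_counts = chi_square([56, 24], [64, 16])

# Same survey entered as percentages: not a valid input
from_percentages = chi_square([70, 30], [80, 20])

# Hypothetical survey four times larger with the same percentages
from_larger_survey = chi_square([224, 96], [256, 64])

print(from_counts, from_percentages, from_larger_survey)
```

The same 70% versus 80% comparison yields 5.0 with 80 children and 20.0 with 320, while feeding in percentages gives 6.25, a value that corresponds to neither survey.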
Limitations to the use of Chi-square
• When the sample size is small (N < 30, or an expected value < 5), exact tests (e.g., Fisher exact test) are preferred and calculated with a computer
• When several expected cell frequencies are less than one, it is better to amalgamate rows / columns
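These conditions can be checked before running the test. The sketch below uses the thresholds quoted above (N < 30, expected value < 5); the function name and the return layout are hypothetical, and the two tables are the cataract and cholera examples from this lecture.

```python
def small_sample_check(table, min_n=30, min_expected=5):
    """Return (N, smallest expected count, whether an exact test is preferable)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    smallest = min(r * c / n for r in row_totals for c in col_totals)
    return n, smallest, (n < min_n or smallest < min_expected)

cataract = [[10, 90], [4, 46], [3, 22], [1, 9]]
cholera = [[50, 6], [8, 49]]

print(small_sample_check(cataract))
print(small_sample_check(cholera))
```

The cataract table has an expected count below 1 in the "Others" row, so it is flagged; the cholera table is not.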
Testing a dose-response relationship with a Chi-square
• Overall m x n Chi-square
  Tests the null hypothesis that the odds ratios do not differ
  No particular conditions needed
  Overall test, easy to compute
  Rough conclusions
• Chi-square for trend
Dose-response
Exposure to injections with reusable needles and acute hepatitis B, Thiruvananthapuram, Kerala, India, 1992

Potential risk factors  Cases (N=160)  Controls (N=160)  Odds ratio  95% confidence interval
None                    51             120               1.0         N/A
Single injection        41             25                3.9         2.0-7.3
Multiple injections     29             7                 9.8         3.8-26

Overall 3x2 Chi-square: 42, 2 degrees of freedom, p<0.00001
Heterogeneous exposure categories
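The odds ratios and the overall 3x2 Chi-square can be recomputed from the three rows of the table. This is a sketch with illustrative variable names; the crude odds ratios it produces may differ in the last digit from the published ones, and the p-value uses the closed form for 2 degrees of freedom.

```python
import math

# (cases, controls) per exposure category, from the table above
rows = {
    "None": (51, 120),
    "Single injection": (41, 25),
    "Multiple injections": (29, 7),
}

ref_cases, ref_controls = rows["None"]
odds_ratios = {
    name: (cases * ref_controls) / (controls * ref_cases)
    for name, (cases, controls) in rows.items()
}

table = list(rows.values())
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(table))
    for j in range(2)
)
p = math.exp(-chi2 / 2)   # upper tail, 2 degrees of freedom

print({k: round(v, 1) for k, v in odds_ratios.items()})
print(round(chi2, 1), p < 0.00001)
```

The result is about 42.2, in line with the value reported on the slide.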
Testing a dose-response relationship with a Chi-square
• Overall m x n Chi-square
• Chi-square for trend
  Tests for a linear trend for the increase of the odds ratios with increased levels of exposure
  Requires equal intervals in the exposure categories
  Can be calculated on a computer
  Refined conclusions
Odds of typhoid according to raw onion consumption, Darjeeling, West Bengal, India, 2005-2006

Weekly raw onion servings  Cases (n=123)  Controls (n=123)  Odds ratio
                           #     %        #     %           OR    95% CI
1-2                        19    23       28    34          1     N/A
3-4                        25    31       27    61          1.4   0.6-3.3
5+                         38    46       8     5           7     2.5-21

Chi-square for trend: 16.8; P-value: 0.0004
Homogeneous exposure categories
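One common way to compute a Chi-square for trend is the Cochran-Armitage form, sketched below with equally spaced scores for the three exposure levels. The slide does not say which formula or software produced its figure, so the choice of formula, the scores, and the function name are all assumptions; the counts are read from the table above.

```python
def chi_square_trend(cases, controls, scores):
    """Chi-square for linear trend (1 degree of freedom), Cochran-Armitage form."""
    totals = [a + b for a, b in zip(cases, controls)]
    n = sum(totals)
    p = sum(cases) / n                      # overall proportion of cases
    # Score-weighted excess of cases over what the overall proportion predicts
    t = sum(x * (a - m * p) for x, a, m in zip(scores, cases, totals))
    sx = sum(m * x for m, x in zip(totals, scores))
    sxx = sum(m * x * x for m, x in zip(totals, scores))
    variance = p * (1 - p) * (sxx - sx * sx / n)
    return t * t / variance

cases = [19, 25, 38]       # 1-2, 3-4, 5+ servings per week
controls = [28, 27, 8]
chi2_trend = chi_square_trend(cases, controls, scores=[0, 1, 2])
print(round(chi2_trend, 2))
```

Under these assumptions the statistic is about 16.75, close to the 16.8 reported on the slide. Shifting the scores (e.g., 1, 2, 3) leaves the result unchanged; unequal spacing would not.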
Take home messages
• The Chi-square compares expected and observed counts
• The Chi-square can compare an actual proportion with a theoretical one
• The Chi-square is the basic test to compare proportions in epidemiological 2x2 tables
• The Chi-square can be used also as a global test for m x n tables
• Specific Chi-squares can be used to test for dose-response effect