“Chi-Square Statistics” By Namrata Khemka
Table of Contents
1. What is Chi-Square?
2. When and why is Chi-Square used?
3. Limitations/Restrictions of Chi-Square
4. Examples
5. References
What is “Chi Square”?
• Invented by Pearson
• Test for “goodness of fit”
• Tests for independence of variables
• Non-parametric test
Parametric vs. Non-Parametric Data
Parametric data
1. Numerical scores
2. Scores can be manipulated (for example, averaged)
3. Example
• Average height of people in 10 cities
Non-parametric data
1. Nominal data
2. Scores are not manipulated
3. Example
• How many people are over 6 ft and how many are under 6 ft in 2 cities
What is “Chi Square”?
• Invented by Pearson
• Test for “goodness of fit”
• Tests for independence of variables
• Non-parametric test
• Analyzes categorical or measurement data
• SPSS or Excel
Goodness of Fit
1. Null Hypothesis
2. Observed frequency
3. Expected frequencies
4. Good Fit
5. Poor Fit
6. Sum of observed frequencies = sum of expected frequencies.
Computational Steps
• Scenario
Scenario:
• A movie theater owner would like to know which factors influence people's choice of movie.
• A sample of 50 people was asked which of the following factors was most important to them.
• Each person could choose one of the following:
1. Actors
2. Directors
3. Time the movie is playing
4. Genre
Question
• Do any of these factors play a greater role than the others?
Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis
Null Hypothesis
• There is no difference in the importance of these 4 factors in determining which movie is selected
Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis
• Observed Frequencies
• Expected Frequencies
• p-value
Interpret the Results
• Since p < 0.05, we reject the null hypothesis.
• Therefore, some of the factors are mentioned more often than others in response to movie selection.
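The goodness-of-fit steps above can be sketched in a few lines of Python. The slides do not report the actual survey counts, so the `observed` values below are hypothetical and chosen only to add up to the 50 respondents; the critical value 7.815 is the standard chi-square value for df = 3 at the 0.05 threshold.

```python
# Hypothetical counts for illustration only; the slides do not report the survey data.
observed = {"Actors": 20, "Directors": 5, "Time": 10, "Genre": 15}

n = sum(observed.values())        # 50 respondents
expected = n / len(observed)      # 12.5 per factor if all factors matter equally

# Chi-square: sum over categories of (observed - expected)^2 / expected
chi_square = sum((fo - expected) ** 2 / expected for fo in observed.values())
df = len(observed) - 1            # number of categories minus one

CRITICAL_0_05_DF3 = 7.815         # critical value for df = 3, threshold 0.05
reject_null = chi_square > CRITICAL_0_05_DF3

print(f"chi-square = {chi_square:.2f}, df = {df}, reject null: {reject_null}")
```

With these made-up counts the statistic exceeds the critical value, which mirrors the conclusion on the slide; different counts could of course lead to the opposite decision.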
Test of Independence
• Examines the extent to which two variables are related
• Example
Scenario:
• University of Calgary is interested in determining whether or not there is a relationship between educational level and the number of flights taken each year.
• 150 travelers in the airport were interviewed and the results are:
Scenario - Continued
                       2 or fewer flights a year   More than 2 flights a year
University Student     53                          22
High School Student    37                          38
Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis
Null Hypothesis
• The educational level of the travelers and the number of flights are independent of one another.
Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis
• Observed Frequencies
• Expected Frequencies
• p-value
Interpret the Results
• Since p < 0.05, we reject the null hypothesis.
• These 2 variables are not independent of one another.
• Thus, the educational level of travelers and the number of flights they take are related.
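The "Expected Frequencies" step for a test of independence is usually computed as (row total × column total) / grand total for each cell. A minimal sketch using the counts from the scenario; the variable names are illustrative:

```python
# Observed counts from the scenario table.
observed = [
    [53, 22],   # University Student: 2 or fewer flights, more than 2 flights
    [37, 38],   # High School Student
]

row_totals = [sum(row) for row in observed]          # [75, 75]
col_totals = [sum(col) for col in zip(*observed)]    # [90, 60]
n = sum(row_totals)                                  # 150 travelers

# Expected count in each cell if the two variables were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(expected)
```

Each row has the same total here, so both rows get the same expected counts (45 and 30).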
Requirements and Limitations
• Random sampling
• Data must be in raw frequencies
• Independence of observations
• Size of the expected frequencies
• Collapsing values
Collapsing Values
         Leather Shoes   Sandals   Boots   Runners
Men      18              5         12      16
Women    20              19        6       10
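Collapsing means merging categories so that each cell has a large enough expected frequency. The sketch below merges two columns of the shoe table by adding their counts. Which columns to merge is an assumption made here purely for illustration (the slides do not say), and `collapse_columns` is a hypothetical helper, not a library function.

```python
def collapse_columns(table, labels, i, j, new_label):
    """Merge columns i and j (with i < j) by summing their counts in each row."""
    merged_labels = labels[:i] + [new_label] + labels[i + 1:j] + labels[j + 1:]
    merged_table = [
        row[:i] + [row[i] + row[j]] + row[i + 1:j] + row[j + 1:]
        for row in table
    ]
    return merged_table, merged_labels


labels = ["Leather Shoes", "Sandals", "Boots", "Runners"]
table = [
    [18, 5, 12, 16],   # Men
    [20, 19, 6, 10],   # Women
]

# Assumption for illustration: combine "Sandals" and "Boots" into one category.
collapsed, collapsed_labels = collapse_columns(table, labels, 1, 2, "Sandals or Boots")

print(collapsed_labels)
print(collapsed)
```

Collapsing reduces the number of columns, so the degrees of freedom of the test drop as well.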
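Calculation - Details
• $f_o - f_e$ (observed frequency minus expected frequency)
• $(f_o - f_e)^2$
• $(f_o - f_e)^2 / f_e$
• Chi-square: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
• Calculate the degrees of freedom: $df = (R - 1)(C - 1)$, where $R$ is the number of rows and $C$ the number of columns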
Calculation - $f_o - f_e$
                       2 or fewer flights a year   More than 2 flights a year
University Student     8                           -8
High School Student    -8                          8
Calculation – $(f_o - f_e)^2$
                       2 or fewer flights a year   More than 2 flights a year
University Student     64                          64
High School Student    64                          64
Calculation – $(f_o - f_e)^2 / f_e$
                       2 or fewer flights a year   More than 2 flights a year
University Student     1.42                        2.13
High School Student    1.42                        2.13
Calculation – Continued
• Chi-square: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 7.1111$
• Degrees of freedom: $df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
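The whole calculation for the flights example can be reproduced with a short script. This is a sketch using only the Python standard library; the p-value line relies on the identity that, for 1 degree of freedom only, the upper-tail probability of the chi-square distribution equals `erfc(sqrt(x / 2))`. For other degrees of freedom a statistics package or a distribution table would be needed.

```python
import math

observed = [
    [53, 22],   # University Student
    [37, 38],   # High School Student
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency per cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum of (fo - fe)^2 / fe over all cells.
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail p-value; this shortcut is valid only when df == 1.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.4f}, df = {df}, p = {p_value:.4f}")
```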
Distribution Table
df   0.9     0.1     0.05    0.025   0.01
1    0.016   2.706   3.841   5.024   6.635
2    0.211   4.605   5.991   7.378   9.210
3    0.584   6.251   7.815   9.348   11.345
4    1.064   7.779   9.488   11.143  13.277
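Reading the table amounts to looking up the critical value for the given degrees of freedom and threshold, then checking whether the computed chi-square exceeds it. A minimal sketch with the table values copied from above; `reject_null` is a hypothetical helper name:

```python
# Critical values from the distribution table, keyed by df and then by threshold.
CRITICAL_VALUES = {
    1: {0.9: 0.016, 0.1: 2.706, 0.05: 3.841, 0.025: 5.024, 0.01: 6.635},
    2: {0.9: 0.211, 0.1: 4.605, 0.05: 5.991, 0.025: 7.378, 0.01: 9.210},
    3: {0.9: 0.584, 0.1: 6.251, 0.05: 7.815, 0.025: 9.348, 0.01: 11.345},
    4: {0.9: 1.064, 0.1: 7.779, 0.05: 9.488, 0.025: 11.143, 0.01: 13.277},
}


def reject_null(chi_square, df, threshold=0.05):
    """Return True if chi_square exceeds the critical value for df and threshold."""
    return chi_square > CRITICAL_VALUES[df][threshold]


# Flights example: chi-square = 7.1111 with df = 1.
print(reject_null(7.1111, df=1))
```

For the flights example the statistic 7.1111 is compared with 3.841, the entry for df = 1 at the 0.05 threshold.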
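Interpretation
• Chi-square = 7.1111 is greater than the critical value 3.841 (df = 1, threshold 0.05), so we reject the null hypothesis.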
Conclusion
• What is chi-square
• When should chi-square be used
• Limitations of Chi-square
• Examples
• Resources
References
• www.ling.upenn.edu/courses/Summer_2002/ling102/chisq.html
• Statistical techniques in business and economics by Lind, Marchal and Mason
• Statistics for the behavioral sciences by Frederick J. Gravetter and Larry B. Wallnau
Questions???