CHAPTER 14
• Chi-square Goodness of Fit Test (“GOF”)
• Chi-square Test for Homogeneity
• Chi-square Test for Independence
• Symbol for chi-square is χ²
All three tests are DONE THE SAME WAY
What’s the difference?
• Goodness of Fit: how well does the observed data match the expected? One sample, one categorical variable; the observed and expected counts make 2 rows or 2 columns.
• Homogeneity: more than one sample is taken with one categorical variable in mind.
• (2+ samples, 1 categorical variable)
• Independence/Association: only one sample is taken and it is classified by two categorical variables.
• (1 sample, 2 categorical variables)
Chi-Square Curve
It is not a NORMAL CURVE!
It is always skewed to the right, and it gets less skewed as the degrees of freedom increase.
Is your die fair? One more time.
Roll your die 60 times.
Write down the number for every roll.
Chi-Square GOF Test
• If your die is fair you would expect to get 10 of each number in 60 rolls.
• In this test we compare the EXPECTED results vs the OBSERVED results.
Hypotheses
• Ho: The proportion of each number that occurs on my die is 1/6.
• Ha: At least one number on my die occurs with a proportion different from 1/6.
• There are no symbols for the chi-square hypotheses. However, the test is always one-sided (right-tailed), even though the word “different” is used.
Normal condition for χ²
• At least 80% of the expected cells are greater than or equal to 5.
(not observed cells, expected cells!)
Formula
χ² = Σ (O – E)² / E, where O is an observed count and E is the matching expected count
Degrees of Freedom (df)
• For all chi-square tests use the following:
• df = (r – 1)(c – 1)
• r is the number of rows and c is the number of columns
• For a GOF test the observed and expected counts make 2 rows, so df = (number of categories – 1)
Calculator steps
TI-83+ Calculator
Type the observed counts in L1 and the expected counts in L2. Then click on the L3 heading, type the formula (L1 – L2)²/L2, and press ENTER. Then quit out to the main screen.
Then hit 2nd and STAT (LIST), go to MATH, and choose sum(.
Find the sum of L3; your answer is the chi-square statistic.
Calculator steps
After you get the sum you need to obtain the p-value.
Hit 2nd and VARS (DISTR), choose χ²cdf(, and enter: χ² statistic, upper bound (a very large number), df
This will give you the p-value.
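The list steps above (a column of (O – E)²/E, its sum, then the area to the right of the statistic) can be sketched in Python. This is only an illustration: the die tallies below are hypothetical, not class data, and chi2_sf is a helper name made up here to stand in for the calculator's χ²cdf when df is a whole number.

```python
import math

def chi2_sf(x, df):
    """Area to the right of x under a chi-square curve with whole-number df."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)              # df = 2 starts from e^(-x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))   # df = 1 starts from erfc(sqrt(x/2))
    while a < df / 2.0:                       # each pass adds two more df
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q

# Hypothetical tallies for 60 rolls of one die (faces 1 to 6); NOT class data.
observed = [8, 12, 9, 11, 13, 7]
expected = [10] * 6                           # fair die: 60 * (1/6) for each face

# Same idea as the L3 column: (L1 - L2)^2 / L2, followed by sum(L3).
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)
df = len(observed) - 1
p_value = chi2_sf(chi_sq, df)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
```

With real data, only the observed and expected lists change.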
Calculator steps
TI-84 calculator does most of the work for you
Make sure you have typed your observed counts in L1 and expected counts in L2
Demographics
• Rancho is approximately 53.6% Hispanic, 29.2% Asian, 12.7% white, and 4.5% other (data as of the 2012-2013 school year).
• Do Mr. Pines’ AP Stats classes reflect this diversity? Run the appropriate test, verify your requirements, and write a conclusion.
Demographics
Ho: The diversity in Mr. Pines’ classes is the same as Rancho’s diversity.
Ha: The diversity in Mr. Pines’ classes is different from Rancho’s diversity.
Assumptions: We have an independent random sample of the ethnicities of 159 students. We can assume that there have been at least 1590 students in Mr. Pines’ classes. 100% of our expected cells are 5 or more.
Chi-square GOF test
χ² = 12.38
P-value = .0062
df = 3
This p-value is low enough to reject Ho at the 1% level.
This is strong evidence to suggest that the diversity in Mr. Pines’ classes may be different from Rancho’s diversity.
2015
Is there a difference?
• Do boys and girls prefer different types of social media?
• Please choose your favorite of the three below.
One hundred fifty-seven students were surveyed.
Snapchat
Girls 36 25 22
Boys 19 36 19
We took a sample of 83 girls and a sample of 74 boys.
One hundred fifty-seven students were surveyed (expected counts in parentheses).
Snapchat
Girls 36 (29.076) 25 (32.248) 22 (21.675)
Boys 19 (25.924) 36 (28.752) 19 (19.325)
Hypotheses for Chi-Square Test for Homogeneity
Ho: There is no difference between boys and girls in social media preference
Ha: There is a difference between boys and girls in social media preference
OR
Ho: The proportions of boys and girls who prefer each type of social media are the same
Ha: The proportions of boys and girls who prefer each type of social media are different
Ho: There is no difference between boys and girls in social media preference
Ha: There is a difference between boys and girls in social media preference
Assumptions: We have two independent random samples of students’ social media preferences (83 girls and 74 boys). There are obviously more than 830 girls and 740 boys in the population of social media users we sampled from. 100% of our expected cells are 5 or more.
Chi-square test of homogeneity
χ² = 6.96
P-value = .0307
df = 2
This p-value is low enough to reject Ho at the 5% level.
This is evidence to suggest that there may be a difference between boys and girls in social media preference.
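As a check on the numbers above, here is a minimal Python sketch of the same two-way table work: expected counts from (row total)(column total)/grand total, then the chi-square statistic, df, and p-value. The variable names are made up for this sketch, and the e^(–χ²/2) shortcut for the p-value only works because df = 2 here.

```python
import math

# Observed counts from the survey: rows are girls and boys,
# columns are the three social media choices in the order shown in the table.
observed = [
    [36, 25, 22],  # girls (n = 83)
    [19, 36, 19],  # boys  (n = 74)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total)(column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut that is valid only for df = 2: area to the right is e^(-x/2).
p_value = math.exp(-chi_sq / 2)

print([[round(e, 3) for e in row] for row in expected])
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The expected counts, statistic, and p-value it prints should agree with the values on the slide.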
Referrals vs Days of week
The table shows the number of students referred for disciplinary reasons to the principal’s office, broken down by day of the week.
Monday Tuesday Wednesday Thursday Friday
12 5 9 4 15
Are referrals related to the day of the week?
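One way to set this up is a GOF test. The sketch below assumes the null hypothesis is that referrals are equally likely on each of the five days (the slide does not state the hypotheses, so that is an assumption), which gives an expected count of 45/5 = 9 per day. The p-value shortcut used here only works for df = 4.

```python
import math

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
observed = [12, 5, 9, 4, 15]

total = sum(observed)                        # 45 referrals in all
expected = [total / len(days)] * len(days)   # 9 per day if the day does not matter

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(days) - 1

# Shortcut that is valid only for df = 4: area to the right is e^(-x/2) * (1 + x/2).
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
```

All expected counts are 9, so the expected-count condition is met; you still need to decide whether the referrals can be treated as an independent random sample.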
BIRTHDAYS
• Are Mr. Pines’ students’ birth months distributed in proportion to the number of days in each month?
• We can run a chi-square GOF test based on the number of days in each month.
n = 153
Ho: Mr. Pines’ students’ birth months are in proportion to the number of days in each month.
Ha: Mr. Pines’ students’ birth months are not in proportion to the number of days in each month.
We have an independent sample of 153 students’ birth months. The number of all births is obviously more than 10 times our sample. 100% of our expected counts are 5 or more.
Chi-square GOF Test
χ² = 9.81
P-value = .5475
df = 11
This p-value is too high to reject Ho.
Based on this sample, there is NOT enough evidence to suggest that Mr. Pines’ students’ birth months are out of proportion to the number of days in each month.
C/O 2015 per 1 only
BIRTHDAYS
Month O
Jan 9
Feb 7
Mar 12
Apr 11
May 14
Jun 16
Jul 20
Aug 12
Sep 11
Oct 12
Nov 12
Dec 17
To figure out the expected counts we need to think about the number of days in each month.
Jan 31 July 31
Feb 28 Aug 31
Mar 31 Sep 30
Apr 30 Oct 31
May 31 Nov 30
June 30 Dec 31
Total = 365 days
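Here is a rough Python sketch of the expected-count idea for the n = 153 sample: each month's expected count is n × (days in the month)/365. It assumes the twelve observed counts are listed in January to December order, and chi2_sf is a helper name made up for this sketch (it plays the role of χ²cdf for whole-number df).

```python
import math

def chi2_sf(x, df):
    """Area to the right of x under a chi-square curve with whole-number df."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0:                       # each pass adds two more df
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Observed birth months, ASSUMED to be listed in Jan..Dec order.
observed = [9, 7, 12, 11, 14, 16, 20, 12, 11, 12, 12, 17]

n = sum(observed)
expected = [n * d / sum(days_in_month) for d in days_in_month]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(months) - 1
p_value = chi2_sf(chi_sq, df)

for m, o, e in zip(months, observed, expected):
    print(f"{m}: observed {o}, expected {e:.3f}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
```

Under that month-order assumption the statistic and p-value match the χ² = 9.81 and P-value = .5475 reported for this sample.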
n = 129
H0: Births are uniformly distributed by the # of days in each month
Ha: Births are not uniformly distributed by the # of days in each month
CONDITIONS
Month O E
Jan 12 10.956
Feb 5 9.896
Mar 12 10.956
Apr 12 10.603
May 13 10.956
Jun 7 10.603
Jul 9 10.956
Aug 14 10.956
Sep 11 10.603
Oct 11 10.956
Nov 8 10.603
Dec 15 10.956
We have an independent sample of 129 students’ birth months. The number of all births is obviously more than 10 times our sample. 100% of our expected counts are 5 or more.
df = 11
χ² = 7.75
P-value = .7355
This p-value is too high to reject Ho.
There is not enough evidence to suggest that births are not uniformly distributed by the number of days in each month.
C/O 2014
Chi-Square Test for Homogeneity
• Data is given in a 2-way table
• Expected counts are found by using a matrix on your calculator or by computing (ROW TOTAL)(COLUMN TOTAL)/GRAND TOTAL
• Conditions and df are the same as the GOF test
Is there a difference?
• Do boys and girls prefer different video game consoles?
• Please choose your favorite console out of the 3.
Hypotheses for Chi-Square Test for Homogeneity
Remember, Ha always means different!
Ho: There is no difference between boys and girls in video game console preference
Ha: There is a difference between boys and girls in video game console preference
OR
Ho: The proportions of boys and girls who prefer each type of console are the same
Ha: The proportions of boys and girls who prefer each type of console are different
Why is this a Homogeneity Test?
• Two samples were taken separately
• Boys console preference
• Girls console preference
• There is ONE category of interest.
Two-Way Table (2013)
Two-Way Table (2014)
Hit 2nd MATRIX, go to EDIT
Set the appropriate matrix size
Enter the observed counts in the matrix
Quit to the home screen, then go to the STAT TESTS menu and choose χ²-Test
It should already be set up if you used matrices A and B
You need the expected counts, so after you run the test go back to 2nd MATRIX. Use NAMES and go down to matrix B; the calculator generates the expected counts when it runs the test.
You will most likely have to scroll to the right to see all of the expected counts
REMEMBER! EXPECTED COUNTS MUST BE ON YOUR PAPER!
College Students’ Drinking
In 1987, a random sample of undergraduate students at Rutgers University was sent a questionnaire that asked about their alcohol drinking habits. Here are the results displayed in a two-way table.
Chi-Square Test for Independence
There was one sample taken and then the data were broken down into different categories.
When only one sample is taken we are doing a Chi-Square Test for Independence/Association.
Hypotheses
This is a chi-square test for Independence/Association; you have a few options for writing the hypotheses.
Ho: There is no association between students’ residence type and drinking habits.
Ha: There is an association between students’ residence type and drinking habits.
OR
Ho: Student drinking habits and residence type are independent.
Ha: Student drinking habits and residence type are not independent.
Full Moon
Some people believe that a full moon elicits unusual behavior in people. The table shows the number of arrests made in a small town during the weeks of six full moons and six other randomly selected weeks in the same year. Is there evidence of a difference in the types of illegal activity that take place?
This is a chi-square test for Homogeneity
Thanks to: Grace Montgomery
Thanks to: Amy Nguyen
Testing M&M’s
• The Mars company has always claimed that the color distribution of their M&M’s follows these proportions:
Brown Red Yellow Green Orange Blue
13% 13% 14% 16% 20% 24%
Check the M&M’s that were given to you. How many of each color do you have? We will run a Chi-Square GOF test to see if their claim is accurate.
Do not eat your M&M’s until we have all observed and expected counts completed!
Hypotheses
• Ho: My bag of M&M’s follows the same color distribution as the Mars company claims.
• Ha: My bag of M&M’s follows a different color distribution than the Mars company claims.
Assumptions/Conditions
• ___% of expected counts ≥ 5
• My bag of M&M’s can be considered an independent random sample of M&M’s
M&M Combined Results
Colors Brown Red Yellow Green Orange Blue
Claim % 13% 13% 14% 16% 20% 24%
Expected 728.78 728.78 784.84 896.96 1121.2 1345.44
Observed 606 586 803 1101 1335 1175
There were a total of 5606 M&M’s sampled.
We have a chi-square statistic of 157.85 with df = 5, which gives a P-value of approximately 0.
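The combined results can be reproduced with a short Python sketch: expected count = (total sampled) × (claimed proportion), then sum (O – E)²/E over the six colors. The variable names are made up for illustration.

```python
colors = ["Brown", "Red", "Yellow", "Green", "Orange", "Blue"]
claimed = [0.13, 0.13, 0.14, 0.16, 0.20, 0.24]   # Mars company claim
observed = [606, 586, 803, 1101, 1335, 1175]     # combined class counts

n = sum(observed)                                # total M&M's sampled
expected = [n * p for p in claimed]

# Contribution of each color to the chi-square statistic
contributions = {
    color: (o - e) ** 2 / e
    for color, o, e in zip(colors, observed, expected)
}
chi_sq = sum(contributions.values())
df = len(colors) - 1
largest = max(contributions, key=contributions.get)

for color, e in zip(colors, expected):
    print(f"{color}: expected {e:.2f}, contribution {contributions[color]:.2f}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, largest contribution: {largest}")
```

Looking at the individual contributions shows which colors are furthest from the claim; a statistic this large with df = 5 puts the p-value far below any usual significance level.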
Mr. Pines Poker Chips
• 44 white chips = 20pts
• 5 blue chips = 30pts
• 1 red chip = 50pts
There have been 135 attempts at randomly choosing poker chips out of the bag. A White chip has been pulled 301 times, a Blue chip 47 times, and the Red chip 16 times. Has this followed the expected probabilities?
Run a chi-square GOF Test.
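A possible setup for this GOF test, sketched in Python. Two assumptions are made here that the slide does not confirm: each pull is one chip drawn from the full bag of 50 (so the expected proportions are 44/50, 5/50, and 1/50), and the test uses the 364 chips actually counted (301 + 47 + 16), which does not match the 135 attempts mentioned above. The p-value shortcut only works for df = 2.

```python
import math

chips_in_bag = {"white": 44, "blue": 5, "red": 1}
pulled = {"white": 301, "blue": 47, "red": 16}

bag_total = sum(chips_in_bag.values())   # 50 chips in the bag
n = sum(pulled.values())                 # 364 chips pulled in all

# Expected count = n * (share of the bag), ASSUMING one chip per pull from a full bag.
expected = {color: n * count / bag_total for color, count in chips_in_bag.items()}

chi_sq = sum((pulled[c] - expected[c]) ** 2 / expected[c] for c in chips_in_bag)
df = len(chips_in_bag) - 1

# Shortcut that is valid only for df = 2: area to the right is e^(-x/2).
p_value = math.exp(-chi_sq / 2)

print({c: round(e, 2) for c, e in expected.items()})
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.5f}")
```

Check the expected counts against the “5 or more” condition before writing your conclusion.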
Baseball Bats
• There have been some major bat changes for the 2011 season. Aluminum baseball bats have been regulated so that they meet certain safety standards. After 5 games this season, Coach Pines has noticed significant reductions in power numbers such as 2B’s, 3B’s, and HR’s. Of course he would like to test his hypothesis.
Baseball Bats
• Run a chi-square two-way table test to see if there is an association between the power numbers and types of bats.
• Also run a 2-Prop Z Test between the types of bats used.
• If these are done correctly, z² = χ²
Hypotheses
• Ho: There is no association between type of bats and extra base hits
• Ha: There is an association between type of bats and extra base hits
Assumptions/Conditions
• E: All expected counts ≥ 5
• S: We have a random sample of 23 schools’ hitting stats for the first 5 games of the 2010 and 2011 baseball seasons
• I: We can assume that each team’s stats are independent of the other teams’ stats
Observed and Expected Counts
2010 (BESR Bats) 2011 (BBCOR Bats)
Singles 672 (695.26) 703 (679.74)
Extra Base Hits 313 (289.74) 260 (283.26)
• χ² = 5.35
• P-value = .0207
• This p-value is low enough to reject Ho at the 5% level.
• There is evidence to suggest that there may be an association between the types of bats and extra base hits
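To see the z² = χ² connection with these counts, here is a hedged Python sketch that runs both calculations. It assumes the 2-Prop Z Test compares the proportion of hits that went for extra bases in each season and uses the pooled proportion in the standard error; the variable names are made up for illustration. The erfc shortcut for the p-value only applies because df = 1.

```python
import math

# Observed counts: rows are singles and extra base hits,
# columns are 2010 (BESR bats) and 2011 (BBCOR bats).
observed = [
    [672, 703],  # singles
    [313, 260],  # extra base hits
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total)(column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Shortcut that is valid only for df = 1: area to the right is erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

# 2-Prop Z Test on the proportion of hits that were extra base hits (pooled SE).
x1, x2 = observed[1]
n1, n2 = col_totals
pooled = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

print(f"chi-square = {chi_sq:.2f}, p-value = {p_value:.4f}")
print(f"z = {z:.3f}, z squared = {z ** 2:.2f}")
```

The two-sided p-value from the z test is the same as the chi-square p-value, which is why the two tests agree for a 2 × 2 table.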
Tootsie Pop Wrappers
We are interested in whether or not the designs on the wrappers of Tootsie Roll Pops are independent of the flavor of the pop.
Hypotheses
This is a chi-square test for Independence/Association; you have a few options for writing the hypotheses.
Ho: There is no association between pop flavor and designs on the wrapper.
Ha: There is an association between pop flavor and designs on the wrapper.
OR
Ho: Pop flavor and wrapper designs are independent.
Ha: Pop flavor and wrapper designs are not independent.