AP Statistics Name: ____________________________
Chapter 11 – Inference for Distributions of Categorical Data
Date: ____________________ Per: _____
Section 11.1 Introduction: The Chi-Square Statistic
ACTIVITY: The Candy Man Can pg. 678
Mars, Incorporated, which is headquartered in McLean, Virginia,
makes milk chocolate candies. Here’s what the company’s Consumer
Affairs Department says about the color distribution of its
M&M’S Milk Chocolate Candies:
On average, M&M’S Milk Chocolate Candies will contain 13
percent browns, 13 percent reds, 14 percent yellows, 16
percent greens, 20 percent oranges, and 24 percent blues.
The purpose of this activity is to determine whether the
company’s claim is believable.
1. Our class will take a random sample of M&M’s from a large
bag. As a class, count the number of M&M’s Chocolate Candies of
each color. Make a table that summarizes these observed counts.
Color:           Blue    Orange   Green   Yellow   Red     Brown   Total
Observed Counts: ______  ______   ______  ______   ______  ______  ______
2. As a class, write down hypotheses for a significance
test.
H0:
Ha:
3. Let’s suppose that the M&M’S claimed distribution is correct.
If it is correct, how many of each color would we expect to get
in our sample?
Expected Counts: Brown: _______ Yellow:_______ Orange:_______
Green:_______ Blue:_______ Red:_______
Use the table to calculate the test statistic.
Color    Observed   Expected   (Observed − Expected)   (Observed − Expected)²   (Observed − Expected)²/Expected
Brown
Yellow
Orange
Green
Blue
Red
Add up all the numbers in the last column.
This is our test statistic:
4. What value would we get for the test statistic if our sample
was very close to what is expected?
5. What value would we get for the test statistic if our sample
was very far from what is expected?
Section 11.1 Day 1 Big Ideas: Chi-Square Goodness of Fit Test
Hypotheses:
Chi-Square G.O.F. Formula:
Expected Values
Degrees of Freedom and P-Value
Example: Six-Sided Die
Maya made a 6-sided die in her ceramics class and rolled it 90
times to test if each side was equally likely to show up. The table
summarizes the outcomes of her 90 rolls.
(a) State the hypotheses that Maya should test.
(b) Calculate the expected count for each of the possible
outcomes.
(c) Calculate the value of the chi-square test statistic.
(d) Which degrees of freedom should you use?
(e) Use Table C to find the P-value. What conclusion would you
make?
In Summary… Please Read:
A Chi-Square Goodness of Fit Test is applied when you have one
categorical variable from a single population. It is used to
determine whether the sample data are consistent with a
hypothesized distribution. This is an efficient method when we need
to look at multiple levels of a categorical variable to see how all
of them differ from the null or claimed values, rather than look at
just one level of the categorical variable like we did in previous
chapters for population parameters. In this chapter, we are
conducting tests about distributions of categorical variables.
This approach consists of four steps: (1) state the hypotheses,
(2) formulate an analysis plan, (3) analyze the sample data, and
(4) interpret the results. Although this 4-step process might seem
repetitive to you, it is important to follow it as you will be
graded on these four steps on the AP® exam.
Comparing Observed and Expected Counts: The Chi-Square Statistic (pg. 680)
Let’s begin by stating hypotheses. In a chi-square test for
goodness of fit, the null hypothesis should state a claim about the
distribution of a single categorical variable in the population of
interest. In the case of the M&M Activity, the categorical
variable we are measuring is color and the population of interest
is all M&M’S Milk Chocolate Candies. The appropriate null
hypothesis is:
“The company’s stated color distribution for all M&M’S Milk
Chocolate Candies is correct.”
In Chapters 9 and 10, the alternative hypotheses could be
one-sided or two-sided, depending on the question we were trying to
answer. However, in this chapter, there will be no one-sided tests.
The alternative hypothesis will always be “the null hypothesis is
not correct” (in context, of course!). In other words, the
alternative hypothesis in a chi-square test for goodness of fit is
that the categorical variable does not have the specified
distribution. The alternative hypothesis is mutually exclusive with
the null hypothesis. The appropriate alternative hypothesis for the
M&M’S Activity is:
“The company’s stated color distribution for all M&M’S Milk
Chocolate Candies is not correct.”
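As you can already tell, we choose to state the hypotheses for a
chi-square goodness of fit test in words. We can certainly do so in
symbols as well, but it gets a little more complicated. Here is
what it would look like in symbols:
H0: p_blue = 0.24, p_orange = 0.20, p_green = 0.16, p_yellow = 0.14,
p_red = 0.13, p_brown = 0.13
Ha: At least two of the p_i’s are incorrect
where p_color = the true proportion of all M&M’S Milk Chocolate
Candies that are that color.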
Notice that if one of the proportions is incorrect, at least one
other proportion must also be incorrect, since the proportions have
to add up to 1. This is why it is best to state your chi-square
hypotheses in words.
The big idea of a chi-square hypothesis test for goodness of fit
is to compare the observed counts with the expected counts assuming
that the null hypothesis is true (you always assume this when
performing a significance test). The more the observed counts
differ from the expected counts, the more evidence we have against
the null hypothesis in favor of the alternative hypothesis!
Example 1: Return of the M&M’S ® Candies (Calculating
Expected Counts)
Jerome’s class collected data from a random sample of 60
M&M’S ® Milk Chocolate Candies. Calculate the expected counts
for each color if Mars, Incorporated’s claim is true: “On average,
M&M’S Milk Chocolate Candies will contain 13 percent browns,
13 percent reds, 14 percent yellows, 16 percent greens, 20
percent oranges, and 24 percent blues.”
This is pretty straightforward already, but to calculate the
expected counts you must multiply the sample size n by the
hypothesized proportion p_i for each category i, as stated in the
null hypothesis: expected count = n × p_i.
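If you want to check this arithmetic by computer, here is a minimal Python sketch (Python is our choice for illustration; in class we use the calculator). It simply multiplies n = 60 by each claimed proportion. The names `claimed` and `expected_counts` are made up for this example.

```python
# Claimed milk chocolate color distribution (Mars, Inc.), used as the null hypothesis
claimed = {
    "brown": 0.13,
    "red": 0.13,
    "yellow": 0.14,
    "green": 0.16,
    "orange": 0.20,
    "blue": 0.24,
}

def expected_counts(n, proportions):
    """Expected count for each category = n * p_i (do NOT round to whole numbers)."""
    return {color: n * p for color, p in proportions.items()}

expected = expected_counts(60, claimed)
for color, count in expected.items():
    print(f"{color:>6}: {count:.1f}")
```

The expected counts come out to 7.8 brown, 7.8 red, 8.4 yellow, 9.6 green, 12.0 orange, and 14.4 blue. They add up to the sample size, 60, and most of them are not whole numbers.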
Notice anything familiar from Chapter 6 yet? The number of
M&M’S® of a specific color in a random sample of 60 candies is
a binomial random variable, so its mean, n × p_i, is exactly the
expected count for that color.
Keep in mind that the expected counts are not likely to be whole
numbers. Some students mistakenly round the expected counts,
believing they must be integers. Doing so can cost you points on
the AP® exam, because it shows a misunderstanding of what expected
counts represent. Remember that expected counts represent “the
average number of observations in a given category in many, many
random samples”.
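To see what “the average number of observations in many, many random samples” means, here is a rough Python simulation sketch. It assumes each of the 60 candies is independently a random color with the claimed probabilities (sampling with replacement, which is how we model drawing from a very large bag), counts the blues in each of 10,000 simulated samples, and averages those counts. It also computes the binomial mean directly from the binomial probability formula. The function names are hypothetical. Both numbers should land near 60 × 0.24 = 14.4, which is not a whole number.

```python
import math
import random

# Claimed milk chocolate color distribution from Mars, Inc.
colors = ["brown", "red", "yellow", "green", "orange", "blue"]
claimed = [0.13, 0.13, 0.14, 0.16, 0.20, 0.24]

def average_blue_count(n=60, reps=10_000, seed=2):
    """Average number of blue candies over `reps` simulated random samples of size n.
    Each candy's color is drawn independently with the claimed probabilities."""
    rng = random.Random(seed)
    total_blue = 0
    for _ in range(reps):
        sample = rng.choices(colors, weights=claimed, k=n)
        total_blue += sample.count("blue")
    return total_blue / reps

def binomial_mean(n, p):
    """Mean of a binomial(n, p) random variable, computed term by term from
    P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return sum(k * math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

print(f"binomial mean:           {binomial_mean(60, 0.24):.3f}")
print(f"simulated average blues: {average_blue_count():.3f}")
```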
Let’s continue seeing if the data provide convincing evidence
for the alternative hypothesis. As previously stated, if the
observed counts are far from the expected counts, then that is the
evidence we are seeking against the null hypothesis and in favor of
the alternative. It is generally a good idea to use relative
frequency bar graphs or frequency bar graphs (when sample sizes are
the same) to compare the expected distribution against the observed
distribution. The following figure is a side-by-side bar graph
comparing the observed and expected counts.
We see some fairly large differences between the observed values
and the expected values for several colors. Is this due to chance?
Or is it due to the fact that Mars, Inc.’s claim is false? You must
calculate the chi-square test statistic to find out.
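For reference, the chi-square statistic that measures how far the observed counts are from the expected counts is written below. Each category contributes one term: a category whose observed count equals its expected count contributes 0, and a category with a large gap relative to its expected count contributes a lot. So χ² is 0 for a sample that matches the expected counts exactly and grows as the sample moves away from them.

```latex
\chi^2 = \sum_{\text{all categories}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}
```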
The Chi-Square Distributions and P-values
An important condition you must always check is that your
expected counts are all at least 5 in order to use a chi-square
distribution.
There is a different chi-square distribution for each possible
value of the degrees of freedom. For a goodness of fit test,
df = (number of categories) − 1; for example, the six M&M’S colors
give df = 6 − 1 = 5. As the degrees of freedom (df) increase, the
density curve becomes less right-skewed and larger values of
chi-square become more probable. Two other useful facts: the mean
of a chi-square distribution is equal to its degrees of freedom,
and for df > 2, the mode (peak) of the chi-square distribution is
at df − 2.
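If you want to convince yourself that the mean of a chi-square distribution equals its degrees of freedom, here is a small Python simulation sketch. It relies on a fact not covered in these notes: a chi-square distribution with df degrees of freedom is the distribution of the sum of df squared independent standard Normal values. The function name `simulate_chi_square` is our own.

```python
import random
import statistics

def simulate_chi_square(df, reps=50_000, seed=3):
    """Simulated values from a chi-square distribution with `df` degrees of freedom,
    each built as the sum of df squared standard Normal values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(reps)]

for df in (3, 5, 8):
    draws = simulate_chi_square(df)
    print(f"df = {df}: mean of simulated values = {statistics.mean(draws):.2f}")
```

Each simulated mean should come out close to its df (about 3, 5, and 8).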
Unlike many of the significance tests we have learned so far, the
P-value for a chi-square test is always found by calculating the
area to the RIGHT of the observed chi-square statistic. To get the
P-value from a chi-square distribution, you can use Table C or
technology:
· Table C gives you an interval of possible P-values based on
the specified degrees of freedom. First, determine the degrees of
freedom, then go to the row in Table C labeled with those degrees
of freedom. Scan the table from left to right until you determine
between which two critical values your particular chi-square value
falls. Then, look at the P-values at the top of the table to
determine between which two P-values the true P-value falls.
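· Technology is much more precise! Press 2ND VARS (the DISTR menu)
and choose χ²cdf(lower: __, upper: ___, df: __). Enter your
chi-square statistic as the lower bound, a very large number such
as 1E99 as the upper bound, and your degrees of freedom to get a
very precise P-value.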
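Table C and χ²cdf are finding the same right-tail area. As a sketch of that area calculation, here is a Python function for whole-number df. It uses closed-form formulas for the upper tail, which is not necessarily the method the calculator uses internally, and the name `chi_square_upper_tail` is hypothetical. Plugging in the Table C critical values from the 0.05 column should give areas very close to 0.05.

```python
import math

def chi_square_upper_tail(x, df):
    """Area to the RIGHT of x under the chi-square curve with `df`
    degrees of freedom (df must be a whole number >= 1)."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0  # the whole curve lies to the right of 0
    half = x / 2
    if df % 2 == 0:
        # Even df: e^(-x/2) * [1 + (x/2) + (x/2)^2/2! + ... + (x/2)^(df/2 - 1)/(df/2 - 1)!]
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * e^(-x/2) *
    #         [x^(1/2)/1 + x^(3/2)/(1*3) + ... + x^((df-2)/2)/(1*3*...*(df-2))]
    # (the bracketed sum is empty when df = 1)
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total

# Table C critical values from the 0.05 column for df = 1, 2, 4, 5
for df, critical in [(1, 3.841), (2, 5.991), (4, 9.488), (5, 11.070)]:
    p_value = chi_square_upper_tail(critical, df)
    print(f"df = {df}, chi-square = {critical}: P-value = {p_value:.4f}")
```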
We recommend using the calculator command because it is much
faster than what your great-grandpa might have done in
Ancient-Placement Statistics ®.
And yes, in case you were wondering, we will be using the
specific “Chi-Square Test” calculator function next week when we do
the 4-step process for a Chi-Square GOF Test.
Mars, Inc., reports that their M&M’S® Peanut Chocolate
Candies are produced according to the following color distribution:
23% each of blue and orange, 15% each of green and yellow, and 12%
each of red and brown. Joey bought a randomly selected bag of
Peanut Chocolate Candies and counted the colors of the candies in
his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2
brown.
1. State appropriate hypotheses for testing the company’s claim
about the color distribution of M&M’S Peanut Chocolate Candies.
2. Calculate the expected count for each color, assuming that
the company’s claim is true. Show work.
3. Calculate the chi-square statistic for Joey’s sample. Show
formula and your work.
4. Confirm that the expected counts are large enough to use a
chi-square distribution. Which distribution (specify the degrees of
freedom) should we use?
5. Sketch a graph like Figure 11.4 on page 685 that shows the
P-value.
6. Use Table C to find the P-value. Then use your calculator’s
χ²cdf command.
7. What conclusion would you draw about the company’s claimed
color distribution for M&M’S® Peanut Chocolate Candies? Justify
your answer.
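Once you have worked problems 2 through 4 by hand, you can check your arithmetic with this Python sketch. It uses Joey's counts and the claimed Peanut proportions from the problem, checks that every expected count is at least 5, and computes the chi-square statistic and df = 6 − 1 = 5. It deliberately stops short of the P-value; use Table C or χ²cdf for that, as in problem 6. Variable names are our own.

```python
# Claimed Peanut color distribution (Mars, Inc.) and Joey's observed counts
claimed = {"blue": 0.23, "orange": 0.23, "green": 0.15,
           "yellow": 0.15, "red": 0.12, "brown": 0.12}
observed = {"blue": 12, "orange": 7, "green": 13,
            "yellow": 4, "red": 8, "brown": 2}

n = sum(observed.values())                            # sample size
expected = {c: n * p for c, p in claimed.items()}     # n * p_i, not rounded

# Large Counts condition: every expected count must be at least 5
large_counts_ok = all(e >= 5 for e in expected.values())

# Chi-square statistic: sum of (Observed - Expected)^2 / Expected
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in claimed)
df = len(claimed) - 1                                 # number of categories - 1

print(f"n = {n}")
for c in claimed:
    print(f"{c:>6}: observed {observed[c]:>2}, expected {expected[c]:.2f}")
print("All expected counts at least 5:", large_counts_ok)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```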
χ² = Σ (Observed − Expected)² / Expected