wghsconti.weebly.com

Feb 13, 2021

dariahiddleston
AP Statistics Name: ____________________________

Chapter 11 – Inference for Distributions of Categorical Data    Date: ____________________ Per: _____

Section 11.1 Introduction: The Chi-Square Statistic

ACTIVITY: The Candy Man Can pg. 678

Mars, Incorporated, which is headquartered in McLean, Virginia, makes milk chocolate candies. Here’s what the company’s Consumer Affairs Department says about the color distribution of its M&M’S Milk Chocolate Candies:

On average, M&M’S Milk Chocolate Candies will contain 13 percent of Browns, 13 percent of Reds, 14 percent yellows, 16 percent greens, 20 percent oranges, and 24 percent blues.

The purpose of this activity is to determine whether the company’s claim is believable.

1. Our class will take a random sample of M&M’S from a large bag. As a class, count the number of M&M’S Milk Chocolate Candies of each color. Make a table that summarizes these observed counts.

Color:             Blue     Orange   Green    Yellow   Red      Brown    Total
Observed Counts:   _____    _____    _____    _____    _____    _____    _____

2. As a class, write down hypotheses for a significance test.

H0:

Ha:

3. Let’s suppose that the company’s claimed distribution for M&M’S is correct. If it is, how many of each color would we expect to get in our sample?

Expected Counts: Brown: _______ Yellow:_______ Orange:_______ Green:_______ Blue:_______ Red:_______

Use the table to calculate the test statistic.

 

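Color   | Observed | Expected | (Observed - Expected) | (Observed - Expected)² | (Observed - Expected)² / Expected
Brown   |          |          |                       |                        |
Yellow  |          |          |                       |                        |
Orange  |          |          |                       |                        |
Green   |          |          |                       |                        |
Blue    |          |          |                       |                        |
Red     |          |          |                       |                        |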

Add up all the numbers in the last column.

This is our test statistic:

4. What value would we get for the test statistic if our sample was very close to what is expected?

5. What value would we get for the test statistic if our sample was very far from what is expected?

Section 11.1 Day 1 Big Ideas: Chi-Square Goodness of Fit Test

Hypotheses:

Chi-Square G.O.F. Formula:

Expected Values

Degrees of Freedom and P-Value

Example: Six-Sided Die

Maya made a 6-sided die in her ceramics class and rolled it 90 times to test if each side was equally likely to show up. The table summarizes the outcomes of her 90 rolls.

(a) State the hypotheses that Maya should test.

(b) Calculate the expected count for each of the possible outcomes.

(c) Calculate the value of the chi-square test statistic.

(d) Which degrees of freedom should you use?

(e) Use Table C to find the P-value. What conclusion would you make?

In Summary… Please Read:

A Chi-Square Goodness of Fit Test is applied when you have one categorical variable from a single population. It is used to determine whether the sample data are consistent with a hypothesized distribution. This is an efficient method when we need to look at multiple levels of a categorical variable to see how all of them differ from the null or claimed values, rather than look at just one level of the categorical variable like we did in previous chapters for population parameters. In this chapter, we are conducting tests about distributions of categorical variables.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze the sample data, and (4) interpret the results. Although this 4-step process might seem repetitive to you, it is important to follow it as you will be graded on these four steps on the AP® exam.

Comparing Observed and Expected Counts: The Chi-Square Statistic (pg. 680)

Let’s begin by stating hypotheses. In a chi-square test for goodness of fit, the null hypothesis should state a claim about the distribution of a single categorical variable in the population of interest. In the case of the M&M’S Activity, the categorical variable we are measuring is color and the population of interest is all M&M’S Milk Chocolate Candies. The appropriate null hypothesis is:

“The company’s stated color distribution for all M&M’S Milk Chocolate Candies is correct.”

In Chapters 9 and 10, the alternative hypotheses could be one-sided or two-sided, depending on the question we were trying to answer. However, in this chapter, there will be no one-sided tests. The alternative hypothesis will always be “the null hypothesis is not correct” (in context, of course!). In other words, the alternative hypothesis in a chi-square test for goodness of fit is that the categorical variable does not have the specified distribution. The alternative hypothesis is mutually exclusive from the null hypothesis. The appropriate alternative hypothesis for the M&M’S Activity is:

“The company’s stated color distribution for all M&M’S Milk Chocolate Candies is not correct.”

As you can already tell, we choose to state the hypotheses for a chi-square goodness of fit test in words. We can certainly do so in symbols as well, but it gets a little more complicated. Here is what it would look like in symbols:
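For the M&M’S Activity, the symbolic version can be sketched as follows. The subscripted p notation is an assumption about how the handout writes the proportions; the values come straight from the company’s claim quoted earlier.

```latex
\begin{aligned}
H_0 &: p_{\text{brown}} = 0.13,\ p_{\text{red}} = 0.13,\ p_{\text{yellow}} = 0.14,\
       p_{\text{green}} = 0.16,\ p_{\text{orange}} = 0.20,\ p_{\text{blue}} = 0.24 \\
H_a &: \text{At least two of these proportions are incorrect.}
\end{aligned}
```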

Notice that if one of the proportions is incorrect, at least one other proportion must also be incorrect, since they have to add up to 1. This is why it is best to state your chi-square hypotheses in words.

The big idea of a chi-square hypothesis test for goodness of fit is to compare the observed counts with the expected counts assuming that the null hypothesis is true (you always assume this when performing a significance test). The more the observed counts differ from the expected counts, the more evidence we have against the null hypothesis in favor of the alternative hypothesis!

Example 1: Return of the M&M’S® Candies (Calculating Expected Counts)

Jerome’s class collected data from a random sample of 60 M&M’S® Milk Chocolate Candies. Calculate the expected counts for each color if Mars, Incorporated’s claim is true: “On average, M&M’S Milk Chocolate Candies will contain 13 percent of Browns, 13 percent of Reds, 14 percent yellows, 16 percent greens, 20 percent oranges, and 24 percent blues”.

To calculate the expected counts, multiply the sample size n by the expected proportion for each level i of the categorical variable, as stated in the null hypothesis.
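As a quick check of this arithmetic, here is a minimal Python sketch for the sample of 60 candies. The function name `expected_counts` and the dictionary layout are illustrative choices, not part of the handout.

```python
# Claimed color proportions for M&M'S Milk Chocolate Candies (from the company's statement).
claimed = {
    "brown": 0.13,
    "red": 0.13,
    "yellow": 0.14,
    "green": 0.16,
    "orange": 0.20,
    "blue": 0.24,
}


def expected_counts(n, proportions):
    """Return the expected count n * p_i for each category."""
    return {color: n * p for color, p in proportions.items()}


expected = expected_counts(60, claimed)
for color, count in expected.items():
    # Expected counts are averages, so they are reported without rounding to integers.
    print(f"{color:>6}: {count:.1f}")
```

The expected counts add up to the sample size, which is a handy way to catch arithmetic slips.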

Notice anything familiar from Chapter 6 yet? The number of M&M’S® of a specific color in a random sample of 60 candies is a binomial random variable, so its expected value is np.

Keep in mind that the expected counts are not likely to be whole numbers. Some students mistakenly round the expected counts, believing they must be integers. Doing so will cost you points on the AP® exam because it shows a misunderstanding of what expected counts represent. Remember that an expected count represents “the average number of observations in a given category in many, many random samples”.

Let’s continue by seeing whether the data provide convincing evidence for the alternative hypothesis. As previously stated, if the observed counts are far from the expected counts, then that is the evidence we are seeking in favor of the alternative and against the null hypothesis. It is generally a good idea to use relative frequency bar graphs or frequency bar graphs (when sample sizes are the same) to compare the expected distribution against the observed distribution. The following figure is a side-by-side bar graph comparing the observed and expected counts.

We see some fairly large differences between the observed values and the expected values for several colors. Is this due to chance? Or is it due to the fact that Mars, Inc.’s claim is false? You must calculate the chi-square test statistic to find out.
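The statistic adds up (Observed - Expected)² / Expected over all categories, exactly as in the last column of the activity table. The sketch below shows that calculation in Python. The observed counts in it are HYPOTHETICAL numbers invented for illustration; they are not the counts Jerome’s class collected.

```python
def chi_square_statistic(observed, expected):
    """Sum (observed - expected)^2 / expected over all categories."""
    return sum((observed[c] - expected[c]) ** 2 / expected[c] for c in expected)


# Expected counts for a sample of 60 under the company's claimed proportions.
claimed = {"brown": 0.13, "red": 0.13, "yellow": 0.14,
           "green": 0.16, "orange": 0.20, "blue": 0.24}
expected = {color: 60 * p for color, p in claimed.items()}

# HYPOTHETICAL observed counts, made up for this illustration only (they sum to 60).
observed = {"brown": 5, "red": 11, "yellow": 14,
            "green": 10, "orange": 8, "blue": 12}

statistic = chi_square_statistic(observed, expected)
print(f"chi-square statistic = {statistic:.4f}")
```

Each term is divided by its expected count, so the same gap of a few candies matters more in a category where few candies were expected.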

The Chi-Square Distributions and P-values

An important condition you must always check is that your expected counts are all at least 5 in order to use a chi-square distribution.
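A minimal sketch of that check, using the claimed M&M’S Milk Chocolate proportions. The helper name `all_expected_counts_large` is hypothetical. With 60 candies the smallest expected count is 60(0.13) = 7.8, so the condition holds; with only 30 candies it would be 3.9, and the condition would fail.

```python
def all_expected_counts_large(n, proportions, minimum=5):
    """Return True when every expected count n * p_i is at least `minimum`."""
    return all(n * p >= minimum for p in proportions.values())


claimed = {"brown": 0.13, "red": 0.13, "yellow": 0.14,
           "green": 0.16, "orange": 0.20, "blue": 0.24}

ok_with_60 = all_expected_counts_large(60, claimed)
ok_with_30 = all_expected_counts_large(30, claimed)
print(ok_with_60, ok_with_30)
```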

There is a different chi-square distribution for each possible value of the degrees of freedom. As the degrees of freedom (df) increase, the density curve becomes less right-skewed and larger values of chi-square become more probable. Two other interesting facts: the mean of a chi-square distribution is equal to its degrees of freedom, and for df > 2, the mode (peak) of the distribution is at df – 2.
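These two facts can be checked numerically. The sketch below uses the standard chi-square density formula and a simple grid approximation; the function names, grid step, and upper limit are arbitrary choices for illustration, so the results are approximate.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom (x > 0)."""
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))


def approx_mean_and_mode(df, upper=200.0, step=0.01):
    """Approximate the mean (Riemann sum) and the peak location on an even grid."""
    xs = [step * i for i in range(1, int(upper / step) + 1)]
    densities = [chi2_pdf(x, df) for x in xs]
    mean = sum(x * d for x, d in zip(xs, densities)) * step
    mode = xs[densities.index(max(densities))]
    return mean, mode


for df in (3, 5, 8):
    mean, mode = approx_mean_and_mode(df)
    print(f"df = {df}: mean is about {mean:.2f}, peak is near {mode:.2f}")
```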

Unlike other significance tests we have learned so far, the P-value for a chi-square test is always found by calculating the area to the RIGHT of the observed chi-square statistic. To get the P-value from a chi-square distribution, you can use Table C or technology:

· Table C gives you an interval of possible P-values based on the specified degrees of freedom. First, determine the degrees of freedom, then go to the row in Table C labeled with those degrees of freedom. Scan the table from left to right until you determine between which two critical values your particular chi-square value falls. Then, look at the P-values at the top of the table to determine between which two P-values the true P-value falls.

· Technology is much more precise! Press 2ND VARS and choose χ²cdf(lower: __, upper: ___, df: __), then type in the corresponding values to determine a very precise P-value.
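For readers working away from the calculator, the same right-tail area can be sketched in Python using only the standard library. The function name `chi2_upper_tail` is hypothetical, and it assumes df is a positive whole number. It relies on the closed-form expressions for the chi-square tail area for even and odd df. If SciPy is installed, `scipy.stats.chi2.sf(x, df)` computes the same quantity.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom.

    Assumes df is a positive integer.
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum of y^k / k! for k = 0 .. df/2 - 1
        total = sum(y ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum of y^(k - 1/2) / Gamma(k + 1/2)
    # for k = 1 .. (df - 1)/2
    total = sum(y ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


# 11.07 is the Table C critical value for df = 5 with tail probability 0.05,
# so the area to its right should come out very close to 0.05.
p_value = chi2_upper_tail(11.07, 5)
print(f"P-value = {p_value:.4f}")
```

Because the P-value is always the area to the right, the observed chi-square statistic plays the role of the lower bound in the calculator command, and the function above takes only that value and the degrees of freedom.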

We recommend using the calculator command because it is much faster than what your great-grandpa might have done in Ancient-Placement Statistics ®.

And yes, in case you were wondering, we will be using the specific “Chi-Square Test” calculator function next week when we do the 4-step process for a Chi-Square GOF Test.

Mars, Inc., reports that their M&M’S® Peanut Chocolate Candies are produced according to the following color distribution: 23% each of blue and orange, 15% each of green and yellow, and 12% each of red and brown. Joey bought a randomly selected bag of Peanut Chocolate Candies and counted the colors of the candies in his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2 brown.

1. State appropriate hypotheses for testing the company’s claim about the color distribution of M&M’S Peanut Chocolate Candies.

2. Calculate the expected count for each color, assuming that the company’s claim is true. Show work.

3. Calculate the chi-square statistic for Joey’s sample. Show formula and your work.

4. Confirm that the expected counts are large enough to use a chi-square distribution. Which distribution (specify the degrees of freedom) should we use?

5. Sketch a graph like Figure 11.4 on page 685 that shows the P-value.

6. Use Table C to find the P-value. Then use your calculator’s χ²cdf command.

7. What conclusion would you draw about the company’s claimed color distribution for M&M’S® Peanut Chocolate Candies? Justify your answer.

Chi-square test statistic: $\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$