Chapter 12: The Analysis of Categorical Data and Goodness-of-Fit Test
Page 1:

Chapter 12: The Analysis of Categorical Data and Goodness-of-Fit Test

Page 2:

Chi-Square Tests for Univariate Categorical Data

Page 3:

• Univariate categorical data are most conveniently summarized in a one-way frequency table:

            Cash   Credit   Exchange   Refused
Frequency     34       18         31        17

Page 4:

Notation

k = number of categories of a categorical variable
π1 = true proportion for Category 1
π2 = true proportion for Category 2
...
πk = true proportion for Category k
(Note: π1 + π2 + ... + πk = 1)

The hypotheses to be tested have the form

H0: π1 = hypothesized proportion for Category 1
    π2 = hypothesized proportion for Category 2
    ...
    πk = hypothesized proportion for Category k

Ha: H0 is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.

Page 5:

Example

• A number of psychological studies have considered the relationship between various deviant behaviors and other variables, such as lunar phase. An article focused on the existence of any relationship between date of patient admission for specified treatment and patient’s birthday. Admission date was partitioned into four categories according to how close it was to the patient’s birthday:

Page 6:

1. Within 7 days of birthday

2. Between 8 and 30 days, inclusive, from the birthday

3. Between 31 and 90 days, inclusive, from the birthday

4. More than 90 days from the birthday

Page 7:

• Let π1, π2, π3, and π4 denote the true proportions in categories 1, 2, 3, and 4, respectively. If there is no relationship between admission date and birthday, then, because there are 15 days included in the first category (from 7 days before the patient’s birthday to 7 days after, including, of course, the birthday itself), the proportion for each category is its number of days divided by 365:

Page 8:

π1 = 15/365 = .041
π2 = 46/365 = .126
π3 = 120/365 = .329
π4 = 184/365 = .504
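As a quick check on the hypothesized proportions for this example, the sketch below recounts the days in each category from the category boundaries and divides by 365. The variable names are illustrative, and a 365-day year is assumed (leap years ignored).

```python
# Days in each birthday-distance category, counted on both sides of the birthday.
DAYS_IN_YEAR = 365  # assumption: leap years are ignored

days = {
    1: 2 * 7 + 1,          # within 7 days, plus the birthday itself -> 15
    2: 2 * (30 - 8 + 1),   # 8 to 30 days away, inclusive -> 46
    3: 2 * (90 - 31 + 1),  # 31 to 90 days away, inclusive -> 120
}
days[4] = DAYS_IN_YEAR - sum(days.values())  # all remaining days -> 184

# Hypothesized proportion for each category, rounded to three decimals.
proportions = {cat: round(d / DAYS_IN_YEAR, 3) for cat, d in sorted(days.items())}
print(proportions)
```

The rounded proportions add up to 1, as the notation requires.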

The hypotheses of interest are then

H0: π1 = .041, π2 = .126, π3 = .329, π4 = .504

Ha: H0 is not true

Page 9:

• The cited article gave data for n = 200 patients admitted for alcoholism treatment. If H0 is true, the expected counts are

(expected count for Category 1) = n(hypothesized proportion for Category 1) = 200(.041) = 8.2
(expected count for Category 2) = n(hypothesized proportion for Category 2) = 200(.126) = 25.2
(expected count for Category 3) = n(hypothesized proportion for Category 3) = 200(.329) = 65.8
(expected count for Category 4) = n(hypothesized proportion for Category 4) = 200(.504) = 100.8
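The same arithmetic can be sketched in a few lines of Python. The list names are illustrative. Each expected count is the sample size times the hypothesized proportion for that category.

```python
n = 200                                      # patients in the sample
hypothesized = [0.041, 0.126, 0.329, 0.504]  # proportions under H0, categories 1 to 4

# Expected count for each category, rounded to one decimal as on the slide.
expected = [round(n * p, 1) for p in hypothesized]

for category, count in enumerate(expected, start=1):
    print(f"Category {category}: expected count = {count}")
```

Because the hypothesized proportions sum to 1, the expected counts sum to n.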

Page 10:

Category      1      2      3      4
Observed     11     24     69     96
Expected    8.2   25.2   65.8  100.8

Page 11:

The goodness-of-fit statistic, X², results from first computing the quantity

$$\frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

The X² statistic is the sum of these quantities for all k cells:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
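A minimal sketch of this statistic as a Python function follows. The function name `chi_square_statistic` is hypothetical, chosen for illustration. It is applied here to the observed and expected counts from the admission-date table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [11, 24, 69, 96]          # observed counts, categories 1 to 4
expected = [8.2, 25.2, 65.8, 100.8]  # expected counts under H0

x2 = chi_square_statistic(observed, expected)
print(f"X^2 = {x2:.3f}")
```

Large values of the statistic indicate observed counts far from what H0 predicts. A value of 0 would mean every observed count equals its expected count.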

Page 12:

Example

• We use the data from the previous example to test the hypothesis that admission date is unrelated to birthday. We use a .05 significance level and the nine-step hypothesis-testing procedure.

Page 13:

1. Let π1, π2, π3, and π4 denote the proportions of all admissions for treatment of alcoholism falling in the four categories.

2. H0: π1=.041, π2=.126, π3=.329, π4=.504

3. Ha: H0 is not true.

4. Significance level: α = .05

Page 14:

5. Test Statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

6. Assumptions: The expected cell counts (from Example 12.1) are 8.2, 25.2, 65.8, and 100.8, all of which are greater than 5. The article did not indicate how the patients were selected. We can proceed with the chi-square test if it is reasonable to assume that the 200 patients in the sample can be regarded as a random sample of patients admitted for treatment of alcoholism.

7. Calculation:

$$X^2 = \frac{(11 - 8.2)^2}{8.2} + \frac{(24 - 25.2)^2}{25.2} + \frac{(69 - 65.8)^2}{65.8} + \frac{(96 - 100.8)^2}{100.8} = .96 + .06 + .16 + .23 = 1.41$$

Page 15:

8. P-value: The P-value is based on a chi-square distribution with df = 4 – 1 = 3. The computed value of X² is smaller than 6.25 (the smallest entry in the df = 3 column), so P-value > .10.

9. Conclusion: Because P-value > α, H0 cannot be rejected. There is not sufficient evidence to conclude that date admitted for treatment and birthday are related.
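The table lookup in step 8 only bounds the P-value. As a rough cross-check, the sketch below computes the upper-tail area directly using only the Python standard library. The helper `chi2_sf` is hypothetical (not from the slides) and relies on the closed-form tail expressions that exist for integer degrees of freedom. In practice a statistics library such as SciPy (`scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(chi-square with integer df >= 1 exceeds x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of (x/2)^j / j! for j = 0 .. df/2 - 1.
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite half-integer sum.
    m = (df - 1) // 2
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


x2 = 1.41     # value from step 7 (sum of the rounded terms)
df = 4 - 1    # number of categories minus 1
alpha = 0.05

p_value = chi2_sf(x2, df)
print(f"P-value = {p_value:.2f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

The computed P-value is roughly .7, consistent with the table-based bound P-value > .10 and with the conclusion in step 9. Summing the unrounded terms gives a statistic of about 1.40 rather than 1.41, which does not change the conclusion.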

Page 16:

Example

• Does the color of a car influence the chance that it will be stolen? The following information was reported for a random sample of 830 stolen vehicles: 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. We use the X² goodness-of-fit test and a significance level of .01 to test the hypothesis that the color proportions for stolen cars are identical to the population color proportions.

Page 17:

• Suppose that it is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors. If these same population color proportions hold for stolen cars, the expected counts are:

Page 18:

• Expected for white = 830(0.15) = 124.5

• Expected for blue = 830(0.15) = 124.5

• Expected for red = 830(0.35) = 290.5

• Expected for black = 830(0.30) = 249.0

• Expected for other = 830(0.05) = 41.5

Page 19:

Observed and Expected Counts

Category   Color   Observed Count   Expected Count
1          White   140              124.5
2          Blue    100              124.5
3          Red     270              290.5
4          Black   230              249.0
5          Other    90               41.5
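The expected counts in this table, and the large-sample condition that the test relies on, can be checked with a short sketch. The dictionary names are illustrative, and the threshold follows the slides' wording that all expected counts should be greater than 5.

```python
n = 830  # stolen vehicles in the sample

# Population color proportions, taken as known in this example.
population_share = {"White": 0.15, "Blue": 0.15, "Red": 0.35, "Black": 0.30, "Other": 0.05}

# Expected count for each color if stolen cars follow the population proportions.
expected = {color: round(n * share, 1) for color, share in population_share.items()}

# Large-sample condition: every expected count should be greater than 5.
too_small = [color for color, count in expected.items() if count <= 5]

print(expected)
print("Sample size condition met" if not too_small else f"Too small: {too_small}")
```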

Page 20:

1. Let π1, π2,…, π5 denote the true proportions of stolen cars that fall into the five color categories.

2. H0: π1=.15, π2=.15, π3=.35, π4=.30, π5=.05

3. Ha: H0 is not true

4. Significance level: α = .01

Page 21:

5. Test Statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

6. Assumptions: The sample was a random sample of stolen vehicles. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.

7. Calculation:

$$X^2 = \frac{(140 - 124.5)^2}{124.5} + \frac{(100 - 124.5)^2}{124.5} + \frac{(270 - 290.5)^2}{290.5} + \frac{(230 - 249)^2}{249} + \frac{(90 - 41.5)^2}{41.5} = 1.93 + 4.82 + 1.45 + 1.45 + 56.68 = 66.33$$

Page 22:

8. P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with df = 5 – 1 = 4. The computed value of X² is larger than 18.46, the largest entry in the df = 4 column, so P-value < .001.

9. Conclusion: Because P-value ≤ α, H0 is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.
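The nine steps for the car-color example can be reproduced with the sketch below, which also prints each color's contribution to X². The variable names are illustrative. The P-value line uses the closed form of the chi-square upper tail that holds specifically for df = 4, so it is not a general-purpose routine.

```python
import math

observed = {"White": 140, "Blue": 100, "Red": 270, "Black": 230, "Other": 90}
expected = {"White": 124.5, "Blue": 124.5, "Red": 290.5, "Black": 249.0, "Other": 41.5}

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = {c: (observed[c] - expected[c]) ** 2 / expected[c] for c in observed}
x2 = sum(contributions.values())
df = len(observed) - 1  # 5 categories -> df = 4

# For df = 4 only, the upper-tail area is exp(-x/2) * (1 + x/2).
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)
alpha = 0.01

for color, term in contributions.items():
    print(f"{color:<6} {term:6.2f}")
print(f"X^2 = {x2:.2f}, df = {df}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

In this sample the "Other" category accounts for most of the statistic, which suggests where the stolen-car proportions depart most from the population proportions.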