Chapter 12: The Analysis of Categorical Data and Goodness-of-Fit Test


Chi-Square Tests for Univariate Categorical Data

• One-way frequency table: univariate categorical data are most conveniently summarized in a one-way frequency table, which lists each category together with its frequency. For example:

             Cash   Credit   Exchange   Refused
Frequency     34      18        31         17

Notation

k = number of categories of a categorical variable

π1 = true proportion for Category 1

π2 = true proportion for Category 2

...

πk = true proportion for Category k

(Note: π1 + π2 + ... + πk = 1)

The hypotheses to be tested have the form

H0: π1 = hypothesized proportion for Category 1

    π2 = hypothesized proportion for Category 2

    ...

    πk = hypothesized proportion for Category k

Ha: H0 is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.

Example

• A number of psychological studies have considered the relationship between various deviant behaviors and other variables, such as lunar phase. An article focused on the existence of any relationship between date of patient admission for specified treatment and patient’s birthday. Admission date was partitioned into four categories according to how close it was to the patient’s birthday:

1. Within 7 days of birthday

2. Between 8 and 30 days, inclusive, from the birthday

3. Between 31 and 90 days, inclusive, from the birthday

4. More than 90 days from the birthday

• Let π1, π2, π3, and π4 denote the true proportions in categories 1, 2, 3, and 4, respectively. If there is no relationship between admission date and birthday, then, because there are 15 days included in the first category (from 7 days before the patient’s birthday to 7 days after, including, of course, the birthday itself),

π1 = 15/365 = .041

π2 = 46/365 = .126

π3 = 120/365 = .329

π4 = 184/365 = .504
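The day counts behind these four proportions can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and a 365-day year is assumed, as in the text.

```python
# Number of days in each category, counting days before and after the birthday.
# Assumes a 365-day year, as in the text; the names here are illustrative.
days = {
    1: 2 * 7 + 1,          # within 7 days, plus the birthday itself -> 15
    2: 2 * (30 - 8 + 1),   # 8 to 30 days, inclusive, on either side -> 46
    3: 2 * (90 - 31 + 1),  # 31 to 90 days, inclusive, on either side -> 120
}
days[4] = 365 - sum(days.values())  # all remaining days -> 184

# Hypothesized proportions under "no relationship", rounded to three decimals.
proportions = {cat: round(d / 365, 3) for cat, d in days.items()}
print(proportions)
```

The four day counts add up to 365, so the hypothesized proportions sum to 1 apart from rounding.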

The hypotheses of interest are then

H0: π1 = .041, π2 = .126, π3 = .329, π4 = .504

Ha: H0 is not true

• The cited article gave data for n = 200 patients admitted for alcoholism treatment. If H0 is true, the expected counts are

expected count for Category 1 = n(hypothesized proportion for Category 1) = 200(.041) = 8.2

...

expected count for Category 4 = n(hypothesized proportion for Category 4) = 200(.504) = 100.8

Category       1       2       3       4
Observed      11      24      69      96
Expected     8.2    25.2    65.8   100.8
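The expected counts are simply n times each hypothesized proportion. A minimal sketch in Python, with illustrative variable names:

```python
# Expected count for each category = n * (hypothesized proportion).
n = 200
hypothesized = [0.041, 0.126, 0.329, 0.504]

# Round to one decimal place to match the table.
expected = [round(n * p, 1) for p in hypothesized]
print(expected)
```

Because the hypothesized proportions sum to 1, the expected counts sum to n = 200.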

The goodness-of-fit statistic, X², results from first computing the quantity

$$\frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

for each cell. The X² statistic is the sum of these quantities for all k cells:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
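The definition translates directly into code. The sketch below applies it to the observed and expected counts from the admission-date example; the function name `chi_square_statistic` is illustrative and does not come from any library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts from the admission-date example (n = 200).
observed = [11, 24, 69, 96]
expected = [8.2, 25.2, 65.8, 100.8]

x2 = chi_square_statistic(observed, expected)
print(f"X^2 = {x2:.4f}")
```

Large values of X² arise when observed counts are far from the expected counts, so large values count as evidence against H0.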

Example

• We use the data from the previous example to test the hypothesis that admission date is unrelated to birthday. Let’s use a .05 significance level and the nine-step hypothesis-testing procedure.

1. Let π1, π2, π3, and π4 denote the proportions of all admissions for treatment of alcoholism falling in the four categories.

2. H0: π1=.041, π2=.126, π3=.329, π4=.504

3. Ha: H0 is not true.

4. Significance level: α = .05

5. Test statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

6. Assumptions: The expected cell counts (from Example 12.1) are 8.2, 25.2, 65.8, and 100.8, all of which are greater than 5. The article did not indicate how the patients were selected. We can proceed with the chi-square test if it is reasonable to assume that the 200 patients in the sample can be regarded as a random sample of patients admitted for treatment of alcoholism.

7. Calculation:

$$X^2 = \frac{(11 - 8.2)^2}{8.2} + \frac{(24 - 25.2)^2}{25.2} + \frac{(69 - 65.8)^2}{65.8} + \frac{(96 - 100.8)^2}{100.8} = .96 + .06 + .16 + .23 = 1.41$$

8. P-value: The P-value is based on a chi-square distribution with df = 4 - 1 = 3. The computed value of X² is smaller than 6.25 (the smallest entry in the df = 3 column), so P-value > .10.

9. Conclusion: Because P-value > α, H0 cannot be rejected. There is not sufficient evidence to conclude that date admitted for treatment and birthday are related.
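A chi-square table only bounds the P-value; the upper-tail area itself can also be computed. The sketch below uses only the Python standard library and a standard recurrence for the upper-tail chi-square probability with integer df. The helper name `chi2_sf` is an assumption made for illustration; a statistics package such as SciPy offers an equivalent function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(chi-square with df degrees of freedom > x).

    Illustrative helper for positive integer df. It starts from the
    closed forms for df = 1 or df = 2 and steps up two degrees of
    freedom at a time.
    """
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)             # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # df = 1
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


observed = [11, 24, 69, 96]
expected = [8.2, 25.2, 65.8, 100.8]
alpha = 0.05

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(x2, df)

print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.3f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Summing the unrounded terms gives X² of about 1.40 rather than 1.41, and the P-value is roughly 0.7. That agrees with the table-based bound P-value > .10, so the conclusion is unchanged.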

Example

• Does the color of a car influence the chance that it will be stolen? The following information was reported for a random sample of 830 stolen vehicles: 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. We use the X² goodness-of-fit test and a significance level of .01 to test the hypothesis that the color proportions for stolen cars are identical to the population color proportions.

• Suppose that it is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors. If these same population color proportions hold for stolen cars, the expected counts are:

• Expected for white = 830(0.15) = 124.5

• Expected for blue = 830(0.15) = 124.5

• Expected for red = 830(0.35) = 290.5

• Expected for black = 830(0.30) = 249.0

• Expected for other = 830(0.05) = 41.5

Observed and Expected Counts

Category   Color   Observed Count   Expected Count
1          White        140             124.5
2          Blue         100             124.5
3          Red          270             290.5
4          Black        230             249.0
5          Other         90              41.5

1. Let π1, π2,…, π5 denote the true proportions of stolen cars that fall into the five color categories.

2. H0: π1=.15, π2=.15, π3=.35, π4=.30, π5=.05

3. Ha: H0 is not true

4. Significance level: α = .01

5. Test statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

6. Assumptions: The sample was a random sample of stolen vehicles. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.

7. Calculation:

$$X^2 = \frac{(140 - 124.5)^2}{124.5} + \frac{(100 - 124.5)^2}{124.5} + \frac{(270 - 290.5)^2}{290.5} + \frac{(230 - 249)^2}{249} + \frac{(90 - 41.5)^2}{41.5} = 1.93 + 4.82 + 1.45 + 1.45 + 56.68 = 66.33$$

8. P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with df = 5 - 1 = 4. The computed value of X² is larger than 18.46, the largest value in the df = 4 column, so P-value < .001.

9. Conclusion: Because P-value ≤ α, H0 is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.
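The nine steps for the stolen-car example can be traced in a short script. This is a sketch under stated assumptions: the variable names are illustrative, and the P-value line uses a closed form for the chi-square upper-tail area that is valid only when df = 4.

```python
import math

colors = ["White", "Blue", "Red", "Black", "Other"]
observed = [140, 100, 270, 230, 90]
population = [0.15, 0.15, 0.35, 0.30, 0.05]  # hypothesized proportions (H0)
alpha = 0.01

n = sum(observed)                        # 830 stolen vehicles
expected = [n * p for p in population]

# Large-sample condition used in the text: every expected count exceeds 5.
if not all(e > 5 for e in expected):
    raise ValueError("expected counts too small for the chi-square test")

# Contribution of each cell to X^2, then the statistic itself.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)
df = len(observed) - 1

# Closed form for P(chi-square > x2); valid ONLY for df = 4.
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)

for color, c in zip(colors, contributions):
    print(f"{color:<6}{c:7.2f}")
print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.2e}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Printing the per-cell contributions shows that the "Other" category accounts for most of X²: 90 such cars were observed against an expected 41.5.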
