Statistics for Business and Economics Chapter 9 Categorical Data Analysis

Mar 30, 2015


Tristan Egger


Learning Objectives

1. Explain the χ² Test for Proportions

2. Explain the χ² Test of Independence

3. Solve Hypothesis Testing Problems
   • More Than Two Population Proportions
   • Independence


Data Types

Data
• Quantitative
  – Continuous
  – Discrete
• Qualitative


Qualitative Data

• Qualitative random variables yield responses that classify observations into categories
  – Example: gender (male, female)

• Measurement reflects the number of observations in each category

• Nominal or ordinal scale

• Examples
  – What make of car do you drive?
  – Do you live on-campus or off-campus?


Hypothesis Tests: Qualitative Data

Qualitative Data
• Proportion
  – 1 pop.: Z Test
  – 2 pop.: Z Test
  – More than 2 pop.: χ² Test
• Independence: χ² Test


Chi-Square (χ²) Test for k Proportions



Multinomial Experiment

• n identical trials

• k outcomes to each trial

• Constant outcome probabilities, p1, p2, ..., pk (they sum to 1)

• Independent trials

• Random variables are the counts in each outcome, n1, n2, ..., nk (they sum to n)

• Example: ask 100 people (n) which of 3 candidates (k) they will vote for
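The multinomial setup is easy to simulate. In this minimal sketch the probabilities are borrowed, hypothetically, from the p1 = .2, p2 = .3, p3 = .5 example used with the k-proportions test. The candidate labels "A", "B", "C", the seed, and the helper name `simulate_multinomial` are all illustrative assumptions.

```python
import random
from collections import Counter


def simulate_multinomial(n, outcomes, probs, seed=None):
    """Run n identical, independent trials and return the count n_i of each outcome."""
    rng = random.Random(seed)
    # Every trial has the same k outcomes with the same fixed probabilities.
    draws = rng.choices(outcomes, weights=probs, k=n)
    tally = Counter(draws)
    # Include outcomes that never occurred, with a count of 0.
    return {outcome: tally[outcome] for outcome in outcomes}


# Hypothetical poll: n = 100 voters, k = 3 candidates, assumed probabilities.
counts = simulate_multinomial(100, ["A", "B", "C"], [0.2, 0.3, 0.5], seed=1)
print(counts)
print(sum(counts.values()))  # the counts always sum to n = 100
```

Rerunning with different seeds shows how far the counts n_i typically stray from n·p_i when the probabilities really are .2, .3, and .5. The χ² statistic measures exactly this kind of discrepancy.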


Chi-Square (χ²) Test for k Proportions

• Tests equality (=) of proportions only
  – Example: p1 = .2, p2 = .3, p3 = .5

• One variable with several levels

• Uses one-way contingency table


One-Way Contingency Table

Shows number of observations in k independent groups (outcomes or variable levels)

Outcomes (k = 3):

Candidate              Tom   Bill   Mary   Total
Number of responses     35     20     45     100


Conditions Required for a Valid Test: One-way Table

1. A multinomial experiment has been conducted

2. The sample size n is large: E(ni) is greater than or equal to 5 for every cell


χ² Test for k Proportions: Hypotheses & Statistic

1. Hypotheses
   H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0
   Ha: At least one pi is different from above

2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$

   where n_i is the observed count and E(n_i) = n p_{i,0} is the expected count, with p_{i,0} the hypothesized probability

3. Degrees of Freedom: k − 1, where k is the number of outcomes


χ² Test Basic Idea

1. Compares the observed count to the expected count, assuming the null hypothesis is true

2. The closer the observed count is to the expected count, the less evidence there is against H0
   • Measured by the squared difference relative to the expected count
   • Reject H0 for large values


Finding Critical Value Example

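What is the critical χ² value if k = 3 and α = .05?

df = k − 1 = 2

χ² Table (Portion), upper-tail areas:

DF     .995    …    .95     …    .05
1      …       …    0.004   …    3.841
2      0.010   …    0.103   …    5.991

Critical value: χ²_0.05 = 5.991. Reject H0 if χ² > 5.991; otherwise, do not reject H0.

If n_i = E(n_i) for every cell, χ² = 0, so only large values of χ² lead to rejection.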
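The two critical values used in this chapter can be reproduced without a χ² table. This is not a general routine. It relies on two standard facts. First, a χ² variable with 1 df is the square of a standard normal variable, so χ²_α = z²_{α/2}. Second, with 2 df the upper-tail area beyond x is e^(−x/2), so χ²_α = −2 ln α. The name `chi2_critical` is introduced here for illustration only. For other df, use a full table or a statistics library.

```python
import math
from statistics import NormalDist


def chi2_critical(alpha, df):
    """Upper-tail chi-square critical value, for df = 1 or df = 2 only."""
    if df == 1:
        # Chi-square(1) = Z^2, so the cutoff is the squared two-sided z value.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return z * z
    if df == 2:
        # Chi-square(2) has upper-tail area exp(-x / 2); solve exp(-x / 2) = alpha.
        return -2 * math.log(alpha)
    raise ValueError("this sketch only covers df = 1 and df = 2")


print(round(chi2_critical(0.05, 1), 3))  # 3.841
print(round(chi2_critical(0.05, 2), 3))  # 5.991
```

Passing α = .95 or α = .995 reproduces the other entries in the table portion: 0.004 for df = 1, and 0.103 and 0.010 for df = 2. Those columns are also upper-tail areas.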


χ² Test for k Proportions Example

As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, and 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?


χ² Test for k Proportions Solution

• H0: p1 = p2 = p3 = 1/3
• Ha: At least 1 is different
• α = .05
• n1 = 63, n2 = 45, n3 = 72
• Critical Value(s): χ²_0.05 = 5.991 (df = k − 1 = 2); reject H0 if χ² > 5.991


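χ² Test for k Proportions Solution

Expected counts under H0: E(n_i) = n p_{i,0}, so E(n1) = E(n2) = E(n3) = 180(1/3) = 60

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = 6.3$$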
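This calculation can be written as a small function. The sketch below uses the hypothetical name `chi_square_gof`. It computes E(n_i) = n·p_{i,0} and checks the E(n_i) ≥ 5 condition for a valid one-way test. Raising an error when the condition fails is a design choice of this sketch, not part of the method. It then returns the statistic and k − 1 degrees of freedom. The decision compares the statistic with the tabled 5.991.

```python
def chi_square_gof(observed, p0):
    """Chi-square statistic for H0: p_i = p_i,0 in a one-way table.

    Returns (statistic, degrees of freedom, expected counts).
    """
    if abs(sum(p0) - 1) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in p0]  # E(n_i) = n * p_i,0
    if min(expected) < 5:
        # Validity condition for the one-way table: E(n_i) >= 5 in every cell.
        raise ValueError("an expected count is below 5; the chi-square approximation is doubtful")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected  # df = k - 1


# Performance-evaluation example: 63, 45, 72 of 180 employees; H0: each p_i = 1/3.
stat, df, expected = chi_square_gof([63, 45, 72], [1 / 3, 1 / 3, 1 / 3])
print(round(stat, 2), df)  # 6.3 2
print("reject H0" if stat > 5.991 else "do not reject H0")  # reject H0
```

Calling it with observed counts equal to the expected counts, for example `chi_square_gof([60, 60, 60], [1/3, 1/3, 1/3])`, gives a statistic of 0 (up to rounding). This matches the point that χ² = 0 when n_i = E(n_i).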


χ² Test for k Proportions Solution

Test Statistic: χ² = 6.3

Decision: Reject H0 at α = .05, since 6.3 > 5.991

Conclusion: There is evidence of a difference in proportions


Contingency Tables


• Useful in situations involving multiple population proportions

• Used to classify sample observations according to two or more characteristics

• Also called a cross-classification table.


Contingency Table Example

Left-Handed vs. Gender

Dominant Hand: Left vs. Right

Gender: Male vs. Female

2 categories for each variable, so this is called a 2 × 2 table

Suppose we examine a sample of 300 children


Contingency Table Example (continued)

Sample results organized in a contingency table:

                 Hand Preference
Gender        Left    Right    Total
Female          12      108      120
Male            24      156      180
Total           36      264      300

120 females, 12 were left-handed
180 males, 24 were left-handed
Sample size = n = 300


χ² Test for the Difference Between Two Proportions

• If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males

• The two proportions above should be the same as the proportion of left-handed people overall

H0: π1 = π2 (The proportion of females who are left-handed is equal to the proportion of males who are left-handed)

H1: π1 ≠ π2 (The two proportions are not the same; hand preference is not independent of gender)


The Chi-Square Test Statistic

The chi-square test statistic is:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

• where:
  f_o = observed frequency in a particular cell
  f_e = expected frequency in a particular cell if H0 is true

χ²_STAT for the 2 × 2 case has 1 degree of freedom

(Assumed: each cell in the contingency table has expected frequency of at least 5)


Decision Rule

Decision Rule: If χ²_STAT > χ²_α, reject H0; otherwise, do not reject H0

The test statistic approximately follows a chi-squared distribution with one degree of freedom; χ²_α is the value that cuts off an upper-tail area of α.


Computing the Average Proportion

The average proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n}$$

Here: 120 females, 12 were left-handed; 180 males, 24 were left-handed

$$\bar{p} = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12$$

i.e., of all the children, the proportion of left-handers is 0.12, that is, 12%


Finding Expected Frequencies

• To obtain the expected frequency for left handed females, multiply the average proportion left handed (p) by the total number of females

• To obtain the expected frequency for left handed males, multiply the average proportion left handed (p) by the total number of males

If the two proportions are equal, then

P(Left Handed | Female) = P(Left Handed | Male) = .12

i.e., we would expect (.12)(120) = 14.4 females to be left-handed and (.12)(180) = 21.6 males to be left-handed


Observed vs. Expected Frequencies

                   Hand Preference
Gender       Left               Right               Total
Female       Observed = 12      Observed = 108      120
             Expected = 14.4    Expected = 105.6
Male         Observed = 24      Observed = 156      180
             Expected = 21.6    Expected = 158.4
Total        36                 264                 300


The Chi-Square Test Statistic

The test statistic is:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$
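Here is a short sketch of the same computation. It assumes the data arrive as success counts X_j and group sizes n_j. The hypothetical helper `chi_square_2xc` pools the proportion p̄ = X/n and builds the expected "success" and "failure" frequencies for each group. It then sums (f_o − f_e)²/f_e over all cells. Because the 2 × c case follows the same rules, the function also covers more than two groups, with c − 1 degrees of freedom.

```python
def chi_square_2xc(successes, sizes):
    """Chi-square statistic for H0: pi_1 = ... = pi_c from a 2 x c table.

    Returns (statistic, pooled proportion, degrees of freedom).
    """
    n = sum(sizes)
    p_bar = sum(successes) / n  # overall (pooled) proportion X / n
    stat = 0.0
    for x, n_j in zip(successes, sizes):
        # Two cells per group: successes (expected p_bar * n_j)
        # and failures (expected (1 - p_bar) * n_j).
        for f_o, f_e in ((x, p_bar * n_j), (n_j - x, (1 - p_bar) * n_j)):
            stat += (f_o - f_e) ** 2 / f_e
    return stat, p_bar, len(sizes) - 1  # df = (2 - 1)(c - 1) = c - 1


# Left-handed counts: 12 of 120 females, 24 of 180 males.
stat, p_bar, df = chi_square_2xc([12, 24], [120, 180])
print(round(p_bar, 2), round(stat, 4), df)  # 0.12 0.7576 1
```

The statistic 0.7576 is below the df = 1 critical value 3.841, which matches the decision reached on the slides.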


Decision Rule

Decision Rule: If χ²_STAT > 3.841, reject H0; otherwise, do not reject H0

The test statistic is χ²_STAT = 0.7576; χ²_0.05 with 1 d.f. = 3.841

Here, χ²_STAT = 0.7576 < χ²_0.05 = 3.841, so we do not reject H0 and conclude that there is not sufficient evidence that the two proportions are different at α = 0.05


χ² Test for Differences Among More Than Two Proportions

• Extend the χ² test to the case with more than two independent populations:

H0: π1 = π2 = … = πc

H1: Not all of the πj are equal (j = 1, 2, …, c)


The Chi-Square Test Statistic

The chi-square test statistic is:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

• where:
  f_o = observed frequency in a particular cell of the 2 × c table
  f_e = expected frequency in a particular cell if H0 is true

χ²_STAT for the 2 × c case has (2 − 1)(c − 1) = c − 1 degrees of freedom

(Assumed: each cell in the contingency table has expected frequency of at least 1)


Computing the Overall Proportion

The overall proportion is:

$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n}$$

• Expected cell frequencies for the c categories are calculated as in the 2 × 2 case, and the decision rule is the same:

Decision Rule: If χ²_STAT > χ²_α, reject H0; otherwise, do not reject H0

where χ²_α is from the chi-squared distribution with c − 1 degrees of freedom


The Marascuilo Procedure

• Used when the null hypothesis of equal proportions is rejected

• Enables you to make comparisons between all pairs

• Start with the observed differences, pj – pj’, for all pairs (for j ≠ j’) . . .

• . . .then compare the absolute difference to a calculated critical range
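The slides stop short of giving the critical range. In its usual textbook form, the critical range for groups j and j′ is √(χ²_α) · √(p_j(1 − p_j)/n_j + p_j′(1 − p_j′)/n_j′). Here χ²_α comes from the χ² distribution with c − 1 degrees of freedom. A pair differs significantly when |p_j − p_j′| exceeds its critical range. The sketch below assumes that form, and `marascuilo` is an illustrative name. The counts are hypothetical, not from this chapter. They were chosen so that the overall 2 × c test rejects H0 (χ²_STAT ≈ 18.2 > 5.991), which makes the follow-up comparisons appropriate.

```python
import math


def marascuilo(successes, sizes, chi2_crit):
    """Compare every pair |p_j - p_j'| with its Marascuilo critical range.

    chi2_crit is the upper-tail chi-square critical value with c - 1 df.
    Returns tuples (j, j', |difference|, critical range, significant?).
    """
    props = [x / n for x, n in zip(successes, sizes)]
    results = []
    for j in range(len(props)):
        for k in range(j + 1, len(props)):
            diff = abs(props[j] - props[k])
            # Critical range: sqrt(chi2_crit) times the standard error of the difference.
            se = math.sqrt(props[j] * (1 - props[j]) / sizes[j]
                           + props[k] * (1 - props[k]) / sizes[k])
            crit_range = math.sqrt(chi2_crit) * se
            results.append((j + 1, k + 1, diff, crit_range, diff > crit_range))
    return results


# Hypothetical data: c = 3 groups of 100; 5.991 is the alpha = .05 value with 2 df.
pairs = marascuilo([30, 45, 60], [100, 100, 100], 5.991)
for j, k, diff, cr, significant in pairs:
    print(f"groups {j} vs {k}: |diff| = {diff:.3f}, critical range = {cr:.4f}, significant = {significant}")
```

With these made-up counts, only groups 1 and 3 (proportions .30 and .60) differ significantly. Both adjacent pairs fall just inside their critical ranges.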


χ² Test of Independence



χ² Test of Independence

• Shows if a relationship exists between two qualitative variables
  – One sample is drawn
  – Does not show causality

• Uses two-way contingency table


χ² Test of Independence Contingency Table

Shows number of observations from 1 sample jointly in 2 qualitative variables

                   House Location
House Style       Urban   Rural   Total
Split-Level          63      49     112
Ranch                15      33      48
Total                78      82     160

Rows are the levels of variable 1 (house style); columns are the levels of variable 2 (house location).


Conditions Required for a Valid χ² Test: Independence

1. Multinomial experiment has been conducted

2. The sample size, n, is large: Eij is greater than or equal to 5 for every cell


χ² Test of Independence Hypotheses & Statistic

1. Hypotheses
   • H0: Variables are independent
   • Ha: Variables are related (dependent)

2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}$$

   where n_ij is the observed count and E_ij is the expected count

3. Degrees of Freedom: (r − 1)(c − 1), where r is the number of rows and c is the number of columns


χ² Test of Independence Expected Counts

1. Statistical independence means joint probability equals product of marginal probabilities

2. Compute marginal probabilities and multiply for joint probability

3. Expected count is sample size times joint probability
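The three steps above translate directly into code. This is a minimal sketch that assumes the table is given as a list of rows of observed counts. `expected_counts` is a hypothetical helper name, applied here to this chapter's house style by location data.

```python
def expected_counts(table):
    """Expected counts under independence: E_ij = n * P(row i) * P(column j)."""
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]        # marginal probabilities of the rows
    col_p = [sum(col) / n for col in zip(*table)]  # marginal probabilities of the columns
    # Independence: joint probability = product of marginals; expected count = n * joint.
    return [[n * rp * cp for cp in col_p] for rp in row_p]


# House style (rows: Split-Level, Ranch) by location (columns: Urban, Rural).
houses = [[63, 49], [15, 33]]
for row in expected_counts(houses):
    print([round(e, 1) for e in row])
# [54.6, 57.4]
# [23.4, 24.6]
```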



Expected Count Example

                      Location
House Style      Urban (Obs.)   Rural (Obs.)   Total
Split-Level           63             49          112
Ranch                 15             33           48
Total                 78             82          160

Marginal probability of Split-Level = 112/160
Marginal probability of Urban = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6


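Expected Count Calculation

$$E_{ij} = \frac{R_i C_j}{n}$$

where R_i is the total of row i, C_j is the total of column j, and n is the sample size.

                               House Location
                  Urban                        Rural
House Style    Obs.   Exp.                  Obs.   Exp.                  Total
Split-Level     63    112·78/160 = 54.6      49    112·82/160 = 57.4     112
Ranch           15    48·78/160 = 23.4       33    48·82/160 = 24.6       48
Total           78    78                     82    82                    160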


χ² Test of Independence Example

As a realtor you want to determine if house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

                   House Location
House Style       Urban   Rural   Total
Split-Level          63      49     112
Ranch                15      33      48
Total                78      82     160


χ² Test of Independence Solution

• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 − 1)(2 − 1) = 1
• Critical Value(s): χ²_0.05 = 3.841; reject H0 if χ² > 3.841


χ² Test of Independence Solution

                               House Location
                  Urban                        Rural
House Style    Obs.   Exp.                  Obs.   Exp.                  Total
Split-Level     63    112·78/160 = 54.6      49    112·82/160 = 57.4     112
Ranch           15    48·78/160 = 23.4       33    48·82/160 = 24.6       48
Total           78    78                     82    82                    160

E_ij ≥ 5 in all cells


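χ² Test of Independence Solution

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}} = \frac{(n_{11} - E_{11})^2}{E_{11}} + \frac{(n_{12} - E_{12})^2}{E_{12}} + \cdots + \frac{(n_{22} - E_{22})^2}{E_{22}}$$

$$= \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \cdots + \frac{(33 - 24.6)^2}{24.6} = 8.41$$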
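The full calculation can be checked in a few lines. This sketch uses the hypothetical name `chi_square_independence`. It applies the shortcut E_ij = R_i C_j / n, which equals n times the product of the marginal probabilities. It assumes every expected count is nonzero.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for an r x c table of observed counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_tot[i] * col_tot[j] / n  # E_ij = R_i * C_j / n
            stat += (n_ij - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return stat, df


# House style (rows) by location (columns).
stat, df = chi_square_independence([[63, 49], [15, 33]])
print(round(stat, 2), df)  # 8.41 1
print("reject H0" if stat > 3.841 else "do not reject H0")  # reject H0
```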


χ² Test of Independence Solution

Test Statistic: χ² = 8.41

Decision: Reject H0 at α = .05, since 8.41 > 3.841

Conclusion: There is evidence of a relationship


χ² Test of Independence Thinking Challenge

You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

                 Diet Pepsi
Diet Coke       No    Yes   Total
No              84     32     116
Yes             48    122     170
Total          132    154     286


χ² Test of Independence Solution*

• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 − 1)(2 − 1) = 1
• Critical Value(s): χ²_0.05 = 3.841; reject H0 if χ² > 3.841


χ² Test of Independence Solution*

                               Diet Pepsi
                No                            Yes
Diet Coke    Obs.   Exp.                   Obs.   Exp.                   Total
No            84    116·132/286 = 53.5      32    116·154/286 = 62.5      116
Yes           48    170·132/286 = 78.5     122    170·154/286 = 91.5      170
Total        132    132                    154    154                     286

E_ij ≥ 5 in all cells


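χ² Test of Independence Solution*

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}} = \frac{(n_{11} - E_{11})^2}{E_{11}} + \frac{(n_{12} - E_{12})^2}{E_{12}} + \cdots + \frac{(n_{22} - E_{22})^2}{E_{22}}$$

$$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$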
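The value 54.29 uses expected counts rounded to one decimal place. Carrying full precision gives a slightly smaller statistic, about 54.15. Either value is far above 3.841, so the decision does not change. The sketch below is a quick way to see the rounding effect. The table layout (rows: Diet Coke No/Yes; columns: Diet Pepsi No/Yes) follows the thinking-challenge table.

```python
# Diet Coke (rows: No, Yes) by Diet Pepsi (columns: No, Yes).
table = [[84, 32], [48, 122]]
n = sum(map(sum, table))
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]


def chi_square(expected):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    return sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))


exact = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
rounded = [[round(e, 1) for e in row] for row in exact]  # 53.5, 62.5, 78.5, 91.5

print(round(chi_square(rounded), 2))  # 54.29
print(round(chi_square(exact), 2))    # 54.15
```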


χ² Test of Independence Solution*

Test Statistic: χ² = 54.29

Decision: Reject H0 at α = .05, since 54.29 > 3.841

Conclusion: There is evidence of a relationship


χ² Test of Independence Thinking Challenge 2

There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren’t they competitors?


You Re-Analyze the Data

Low Income
                 Diet Pepsi
Diet Coke       No    Yes   Total
No               4     30      34
Yes             40      2      42
Total           44     32      76

High Income
                 Diet Pepsi
Diet Coke       No    Yes   Total
No              80      2      82
Yes              8    120     128
Total           88    122     210
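One way to make the re-analysis concrete is to run the same 2 × 2 χ² computation inside each income group. The sign of ad − bc then shows the direction of the relationship. A positive sign means buying one brand goes with buying the other. A negative sign means buying one goes with not buying the other. This sketch assumes the 76-consumer table is the low-income group and the 210-consumer table is the high-income group, which is a reading of the slide layout. The two groups add up cell by cell to the original 286-consumer table. The name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]] (E_ij = R_i * C_j / n)."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            stat += (table[i][j] - e) ** 2 / e
    return stat


# Rows: Diet Coke No / Yes; columns: Diet Pepsi No / Yes.
# Assumption: the first re-analysis table is the low-income group.
groups = {
    "All 286 consumers": (84, 32, 48, 122),
    "Low income": (4, 30, 40, 2),
    "High income": (80, 2, 8, 120),
}
for name, (a, b, c, d) in groups.items():
    # ad - bc > 0: buying one brand goes with buying the other (positive association);
    # ad - bc < 0: buying one brand goes with not buying the other (negative association).
    sign = "positive" if a * d - b * c > 0 else "negative"
    print(f"{name}: chi-square = {chi_square_2x2(a, b, c, d):.2f}, {sign} association")
```

Each group shows a strong relationship, well above 3.841. Under the stated reading of the slide, however, the direction flips: negative among low-income consumers and positive among high-income consumers. The pooled table shows only the positive pattern.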


True Relationships*

Figure: the apparent relation between Diet Coke and Diet Pepsi purchases reflects an underlying causal relation with a control or intervening variable (the true cause), which influences both purchases.


Moral of the Story*

© 1984-1994 T/Maker Co.

Numbers don’t think - People do!


Conclusion

1. Explained the χ² Test for Proportions

2. Explained the χ² Test of Independence

3. Solved Hypothesis Testing Problems
   • More Than Two Population Proportions
   • Independence