
Chapter 5

Two-Way Tables: Associations Between Categorical Variables


Associations between variables

• Quantitative variables → correlation [Ch 3] & regression [Ch 4]

• Categorical variables → two-way tables of frequency counts [Ch 5]


Two-Way Table of Counts (R-by-C tables)

Variables:

AGE variable = column variable (3 levels)

EDUCATION variable = row variable (4 levels)

This is a 4-by-3 table


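Marginal Distributions

Row variable (EDUCATION) marginal totals: 27,859 (did not complete HS), 58,077 (completed HS), 44,465 (1-3 yrs college), 44,828 (4+ yrs college)

Column variable (AGE) marginal totals: 37,786 (25-34), 81,435 (35-54), 56,008 (55+)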


Marginal Percents

• Relative frequencies (%s) for each variable separately

• Descriptive purposes only; does not address association

• Illustrative Example (Distribution of education level)

– Statement: Describe the distribution of education levels in the population

– Plan: Calculate marginal percents for row variable “EDUCATION”

marginal percent = (marginal count / table total) × 100%


Marginal Percents Example

Step 3: “Solve”

% not completing HS = 27,859 / 175,230 × 100% = 15.9%

% completing HS = 58,077 / 175,230 × 100% = 33.1%

% with 1-3 yrs college = 44,465 / 175,230 × 100% = 25.4%

% with 4+ yrs college = 44,828 / 175,230 × 100% = 25.6%

(Each numerator is a row total; the denominator is the table total.)
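The marginal percent arithmetic can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the function name `marginal_percents` are made up for this sketch, and the counts are the row totals and table total quoted on the slides.

```python
# Row (EDUCATION) marginal totals as quoted on the slides.
row_totals = {
    "Did not complete HS": 27_859,
    "Completed HS": 58_077,
    "1-3 yrs college": 44_465,
    "4+ yrs college": 44_828,
}

# Table total as printed on the slide. The four row totals add up to
# 175,229; the one-unit gap is assumed here to be rounding in the source table.
table_total = 175_230


def marginal_percents(totals, grand_total):
    """Return each marginal count as a percent of the table total."""
    return {level: 100 * count / grand_total for level, count in totals.items()}


percents = marginal_percents(row_totals, table_total)
for level, pct in percents.items():
    print(f"{level}: {pct:.1f}%")
```

Rounded to one decimal place, the values agree with the percents worked out by hand.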


Marginal Percents (Example)

Step 4: “Conclude”

• 16% did not complete high school

• 33% completed high school

• 25% completed 1 to 3 years of college

• 26% completed 4+ years of college

These are merely descriptive statements.


Association

Use conditional proportions to determine associations

• If the row variable is the explanatory variable → compare conditional row proportions

conditional row proportion = cell count / row total

• If the column variable is the explanatory variable → compare conditional column proportions

conditional column proportion = cell count / column total


Example: Association between AGE & EDUCATION

State: Is AGE associated with EDUCATION level?

Plan: Since AGE is the explanatory variable, calculate conditional column proportions. We do not need to calculate every conditional proportion. (Be selective.) Let us calculate the proportion completing 4+ years of college by AGE.


Example: “Solve” & “Conclude”

Let p̂ represent the proportion completing 4+ years of college.

p̂(25-34) = 11,071 / 37,786 = .2930 = 29%

p̂(35-54) = 23,160 / 81,435 = .2844 = 28%

p̂(55+) = 10,597 / 56,008 = .1892 = 19%

Conclude: As age goes up, the % completing college goes down. There is a negative association between age and college completion.
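As a quick check on the conditional column proportions, here is a minimal Python sketch. The counts are the cell counts and column totals quoted on the slides; the variable names and the helper `conditional_column_proportions` are hypothetical.

```python
# Cell counts for "4+ years of college" and the AGE column totals,
# both taken from the slides.
counts_4plus = {"25-34": 11_071, "35-54": 23_160, "55+": 10_597}
column_totals = {"25-34": 37_786, "35-54": 81_435, "55+": 56_008}


def conditional_column_proportions(cell_counts, col_totals):
    """Divide each cell count by the total of its own column."""
    return {group: cell_counts[group] / col_totals[group] for group in cell_counts}


p_hat = conditional_column_proportions(counts_4plus, column_totals)
for group, p in p_hat.items():
    print(f"p-hat({group}) = {p:.4f}")
```

Because the denominators are column totals rather than the table total, each proportion describes one age group on its own, which is what makes the groups comparable.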


Direction of association

• No association: conditional percents nearly equal at all levels of the explanatory variable

• Positive association: as the explanatory variable rises, conditional percents increase

• Negative association: as the explanatory variable rises, conditional percents decrease
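These rules can be turned into a rough classifier. The sketch below is an assumption-laden illustration, not a formal test: the function name, the "mixed" label, and the `tolerance` cutoff (how close counts as "nearly equal", in percentage points) are all hypothetical choices, and the percents must be listed in increasing order of the explanatory variable.

```python
def direction_of_association(percents, tolerance=1.0):
    """Classify the trend in conditional percents.

    percents  -- conditional percents, ordered by increasing level of
                 the explanatory variable
    tolerance -- hypothetical cutoff (percentage points) below which the
                 percents are treated as "nearly equal"
    """
    if max(percents) - min(percents) <= tolerance:
        return "none"
    steps = [later - earlier for earlier, later in zip(percents, percents[1:])]
    if all(step >= 0 for step in steps):
        return "positive"
    if all(step <= 0 for step in steps):
        return "negative"
    return "mixed"  # percents rise and fall; no single direction


# Percent completing 4+ years of college, by age group (25-34, 35-54, 55+).
print(direction_of_association([29.30, 28.44, 18.92]))
```

Direction only makes sense when the levels of the explanatory variable have a natural order, as age groups do.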


Example: Gender bias?

State: Is ACCEPTANCE into UC Berkeley graduate school (response variable) associated with GENDER (explanatory variable)?

          Accepted   Not accepted   Total
Male         198          162         360
Female        88          112         200
Total        286          274         560

Plan: Since GENDER is the explanatory variable, calculate row percents (acceptance “rates” by gender); compare % accepted by GENDER


Example: “Gender bias?”

Step 3: Solve

          Accepted   Not accepted   Total
Male         198          162         360
Female        88          112         200
Total        286          274         560

% accepted, males = 198 / 360 × 100% = 55%

% accepted, females = 88 / 200 × 100% = 44%

Conclude: positive association between “maleness” and acceptance
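The row percents for this table can be reproduced with a short Python sketch. The nested dictionary layout and the helper name `acceptance_rate` are assumptions made for illustration; the counts come from the table on the slide.

```python
# Admissions counts from the slide, one row per gender.
admissions = {
    "Male": {"accepted": 198, "not_accepted": 162},
    "Female": {"accepted": 88, "not_accepted": 112},
}


def acceptance_rate(row):
    """Conditional row proportion: accepted count divided by the row total."""
    return row["accepted"] / (row["accepted"] + row["not_accepted"])


rates = {gender: acceptance_rate(row) for gender, row in admissions.items()}
for gender, rate in rates.items():
    print(f"{gender}: {rate:.0%} accepted")
```

Row totals are used as denominators because GENDER, the explanatory variable, defines the rows.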


Simpson’s Paradox

Simpson’s Paradox ≡ lurking variable reverses direction of the association

• Lurking variable: MAJOR applied to

– Business school major (240 applicants)

– Art school major (320 applicants)

• State: Does the lurking variable explain the association between maleness and acceptance?

• Plan: Subdivide (“stratify”) the data into subgroups according to the lurking variable MAJOR, then calculate acceptance rates by gender within subgroups


“Gender Bias” Data by MAJOR

All Applicants
          Success   Failure   Total
Male        198       162      360
Female       88       112      200
Total       286       274      560

Business School Applicants
          Success   Failure   Total
Male         18       102      120
Female       24        96      120
Total        42       198      240

Art School Applicants
          Success   Failure   Total
Male        180        60      240
Female       64        16       80
Total       244        76      320
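Before comparing rates, it is worth confirming that the two MAJOR subgroups really do add up to the "All Applicants" table. The Python sketch below does that check; the tuple layout `(success, failure)` and the helper name `combine` are assumptions for illustration.

```python
# (success, failure) counts by gender, taken from the three tables.
business = {"Male": (18, 102), "Female": (24, 96)}
art = {"Male": (180, 60), "Female": (64, 16)}
all_applicants = {"Male": (198, 162), "Female": (88, 112)}


def combine(*strata):
    """Add the success and failure counts cell by cell across subgroups."""
    combined = {}
    for stratum in strata:
        for gender, (success, failure) in stratum.items():
            prev_success, prev_failure = combined.get(gender, (0, 0))
            combined[gender] = (prev_success + success, prev_failure + failure)
    return combined


pooled = combine(business, art)
print("Subgroups add up to the overall table:", pooled == all_applicants)
```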


Business School Applicants

          Success   Failure   Total
Male         18       102      120
Female       24        96      120
Total        42       198      240

proportion accepted, males = 18 / 120 = 0.15

proportion accepted, females = 24 / 120 = 0.20

Conclude: Negative association with maleness


Art School Applicants

          Success   Failure   Total
Male        180        60      240
Female       64        16       80
Total       244        76      320

proportion accepted, males = 180 / 240 = 0.75

proportion accepted, females = 64 / 80 = 0.80

Conclude: Negative association with maleness


Gender Bias Example Conclusion

• Overall: higher acceptance rate for men

• Within Business school: higher acceptance rate for women

• Within Art school: higher acceptance rate for women

• Therefore, the lurking variable (MAJOR) reversed the direction of the association (Simpson’s Paradox)

• Acceptance to grad school at UC Berkeley favored women after “controlling for” MAJOR
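The reversal can be checked numerically. The following Python sketch stratifies by MAJOR, pools the strata, and compares the two views; the data layout `(accepted, applicants)` and all names are hypothetical, while the counts are those from the slides.

```python
# (accepted, applicants) by gender within each MAJOR, from the slides.
strata = {
    "Business": {"Male": (18, 120), "Female": (24, 120)},
    "Art": {"Male": (180, 240), "Female": (64, 80)},
}


def pooled_rates(strata):
    """Acceptance rate by gender after ignoring (pooling over) MAJOR."""
    totals = {}
    for groups in strata.values():
        for gender, (accepted, applicants) in groups.items():
            acc, n = totals.get(gender, (0, 0))
            totals[gender] = (acc + accepted, n + applicants)
    return {gender: acc / n for gender, (acc, n) in totals.items()}


# Acceptance rate by gender within each MAJOR.
within = {
    major: {gender: accepted / applicants
            for gender, (accepted, applicants) in groups.items()}
    for major, groups in strata.items()
}
overall = pooled_rates(strata)

men_favored_overall = overall["Male"] > overall["Female"]
women_favored_within = all(r["Female"] > r["Male"] for r in within.values())
reversal = men_favored_overall and women_favored_within

print("Overall:", {g: f"{r:.0%}" for g, r in overall.items()})
for major, rates in within.items():
    print(f"{major}:", {g: f"{r:.0%}" for g, r in rates.items()})
print("Direction reverses after stratifying:", reversal)
```

The counts also suggest why the reversal happens: 240 of the 360 men applied to the Art school, where most applicants were accepted, while 120 of the 200 women applied to the Business school, where most were rejected.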


HIV vaccine boost (Exercise 5.6)

State: Do the data support that vaccine delivered by EP results in a higher proportion responding?

Plan = ?

Solution = ?

Conclusion = ?


Kidney Stones (Exercise 5.7)

Small Stones
          Open Surgery   Percutaneous
Success        81             234
Failure         6              36

Large Stones
          Open Surgery   Percutaneous
Success       192              55
Failure        71              25

(a) Find the % of kidney stones, combining the data for small and large stones, that were successfully removed for each of the two procedures. Which procedure had the higher overall success rate?

(b) What % of all small kidney stones were successfully removed? What % of all large kidney stones…? Which type of kidney stone is easier to treat?
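One possible way to set up the arithmetic for parts (a) and (b) is sketched below in Python. The data layout and helper names are assumptions for illustration; the counts are those in the two tables, and interpreting the output is left to the reader.

```python
# (success, failure) counts by procedure within each stone size.
stones = {
    "Small": {"Open Surgery": (81, 6), "Percutaneous": (234, 36)},
    "Large": {"Open Surgery": (192, 71), "Percutaneous": (55, 25)},
}


def success_percent(success, failure):
    """Percent of stones successfully removed."""
    return 100 * success / (success + failure)


# (a) Combine small and large stones for each procedure.
by_procedure = {}
for size_table in stones.values():
    for procedure, (success, failure) in size_table.items():
        s, f = by_procedure.get(procedure, (0, 0))
        by_procedure[procedure] = (s + success, f + failure)

# (b) Combine both procedures for each stone size.
by_size = {
    size: (sum(s for s, _ in table.values()), sum(f for _, f in table.values()))
    for size, table in stones.items()
}

for procedure, (s, f) in by_procedure.items():
    print(f"(a) {procedure}: {success_percent(s, f):.1f}% successful")
for size, (s, f) in by_size.items():
    print(f"(b) {size} stones: {success_percent(s, f):.1f}% successful")
```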


Helicopter Evacuation: Lurking Variable / Simpson’s Paradox

X = Helicopter or Road

Y = Survived or Died

Z = Accident Severity