Statistical Literacy: Confounding

2011 1

MILO SCHIELD, Augsburg College

Director, W. M. Keck Statistical Literacy Project

Vice President, National Numeracy Network

US Rep., International Statistical Literacy Project

January 13, 2011

University of Texas San Antonio (UTSA)

Slides at www.StatLit.org/pdf/2011-Schield-UTSA-Confounding-Slides.pdf

2011 2

Statistical Literacy

Statistical literacy is the ability to read and interpret summary statistics in everyday life.

Statistical Literacy studies

(1) the relation between statistical associations and causation, and

(2) the full range of influences on a statistic or on a statistical association. [Take CARE]

2011 3

Take CARE: Context

The influence of factors taken into account by

• data broken out by subgroups in tables and graphs

• averages, ratios and comparisons of averages and ratios

• epidemiological models (e.g., deaths attributed to obesity)

• regression models and

• the study design (e.g., longitudinal vs. cross-sectional; experiment vs. observational study).

The influence of related factors (confounders) not taken into account in the study and not blocked by the study design.

2011 4

Controlling for a confounder can DECREASE an association

MN has 3.8 times as much prison expense as ME

MN has 3.4 times as many inmates as ME

MN has 12% more prison expense per inmate than ME

State   Total    # Inmates   Per Inmate
MN      $184M    4,865       $37,825
ME      $48M     1,424       $33,711
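The arithmetic behind this comparison can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function name `per_inmate` and the variable names are hypothetical, and the totals are the rounded figures from the table, so the recomputed per-inmate values differ slightly from the printed ones. The sketch shows that the ratio of totals factors into the ratio of inmate counts times the ratio of per-inmate expense, and that the per-inmate ratio for MN vs. ME comes out near 1.12.

```python
# Rounded figures from the slide's table (totals in dollars).
mn_total, mn_inmates = 184e6, 4865
me_total, me_inmates = 48e6, 1424


def per_inmate(total, inmates):
    """Expense per inmate: the total controlled for the number of inmates."""
    return total / inmates


total_ratio = mn_total / me_total        # association before controlling
inmate_ratio = mn_inmates / me_inmates   # ratio of the confounder (inmate count)
rate_ratio = per_inmate(mn_total, mn_inmates) / per_inmate(me_total, me_inmates)

# Identity: total ratio = inmate ratio * per-inmate ratio.
assert abs(total_ratio - inmate_ratio * rate_ratio) < 1e-9

print(f"total: {total_ratio:.2f}x, inmates: {inmate_ratio:.2f}x, "
      f"per inmate: {rate_ratio:.2f}x")
```

Most of the 3.8-fold difference in total expense is carried by the 3.4-fold difference in inmate counts, which is why the association shrinks once expense is expressed per inmate.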

2011 5

Controlling for a confounder can NULLIFY an association

MD has 3 times as much prison expense as KS

MD has 3 times as many inmates as KS

MD has the same prison expense per inmate as KS

State   Total    # Inmates   Per Inmate
MD      $481M    21,623      $22,250
KS      $159M    7,148       $22,250

2011 6

Controlling for a confounder can REVERSE an association

CA has 50% more prison expense than NY

CA has almost twice as many inmates as NY

CA has 25% less prison expense per inmate than NY

State   Total    # Inmates   Per Inmate
CA      $2.9B    136K        $21,385
NY      $1.9B    69K         $28,426

2011 7

Controlling for a confounder can INCREASE an association

MN has 27% more prison expense than IA

MN has 18% fewer inmates than IA

MN has 56% more prison expense per inmate than IA

State   Total    # Inmates   Per Inmate
MN      $184M    4,865       $37,825
IA      $144M    5,929       $24,286
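The four prison-expense comparisons (MN vs. ME, MD vs. KS, CA vs. NY, MN vs. IA) can be checked together. The sketch below is an illustration under stated assumptions: the `classify` helper and its tolerance `tol` are hypothetical, not something the slides define, and the inputs are the rounded totals and per-inmate figures as printed. It labels what happens to each state-to-state ratio once expense is controlled for the number of inmates.

```python
# Totals (dollars) and per-inmate expense as printed on the slides,
# ordered (first state, second state).
pairs = {
    "MN vs ME": {"total": (184e6, 48e6), "per_inmate": (37825, 33711)},
    "MD vs KS": {"total": (481e6, 159e6), "per_inmate": (22250, 22250)},
    "CA vs NY": {"total": (2.9e9, 1.9e9), "per_inmate": (21385, 28426)},
    "MN vs IA": {"total": (184e6, 144e6), "per_inmate": (37825, 24286)},
}


def classify(before, after, tol=0.005):
    """Label how a ratio changes once the confounder is controlled for.

    `before` and `after` are ratios of the first state to the second;
    1.0 means no difference. `tol` is a hypothetical threshold for
    treating a ratio as equal to 1.
    """
    if abs(after - 1) <= tol:
        return "nullify"
    if (before - 1) * (after - 1) < 0:
        return "reverse"
    return "increase" if abs(after - 1) > abs(before - 1) else "decrease"


results = {}
for name, d in pairs.items():
    before = d["total"][0] / d["total"][1]
    after = d["per_inmate"][0] / d["per_inmate"][1]
    results[name] = classify(before, after)
    print(f"{name}: {before:.2f}x total, {after:.2f}x per inmate -> {results[name]}")
```

The same confounder (number of inmates) produces all four outcomes, depending only on how unevenly it is distributed between the two states being compared.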

2011 8

Association vs. Causation

[Figure: scatterplot "Season Wins vs. Total Payroll, US Major League Baseball", 1995. X-axis: Total Payroll ($Millions), 10 to 60. Y-axis: Season Wins, 52 to 102. Labeled teams: Yankees, Blue Jays, Indians, Twins, Marlins, Rangers, Mets, Padres, Braves, Orioles, Red Sox, Reds, Expos, Pirates, Tigers.]

2011 9

Adjusting for Land Size: Standardize on Average Lot

[Figure: scatterplot "House Prices (Average Acres = 1.6)" with a best-fit line. X-axis: Land Size (Acres), 0 to 6. Y-axis: 2004AssessMTB, $50,000 to $450,000.]

2011 10

SAT VERBAL SCORES: FLAT

GROUP             1981         2002         CHANGE
White             519 (85%)    527 (65%)    8
Black             412 (9%)     431 (11%)    19
Asian             474 (3%)     501 (10%)    27
Mexican           438 (2%)     446 (4%)     8
Puerto Rican      437 (1%)     455 (3%)     18
American Indian   471 (0%)     479 (1%)     8
ALL Test takers   504 (100%)   504 (100%)   ZERO
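How every group can improve while the overall average stays flat can be sketched with a weighted average. This is an illustration, not the author's calculation: it assumes the percentages in parentheses are each group's share of test takers, and the helper name `weighted_mean` is hypothetical. Because the shares are rounded and the listed 2002 shares sum to only 94%, the recomputed averages do not reproduce the slide's 504 exactly; the point is the contrast between the crude change and the change after standardizing on the 1981 mix.

```python
# (group, 1981 score, 1981 share, 2002 score, 2002 share), from the slide.
# Shares are read as each group's fraction of test takers (an assumption).
rows = [
    ("White",           519, 0.85, 527, 0.65),
    ("Black",           412, 0.09, 431, 0.11),
    ("Asian",           474, 0.03, 501, 0.10),
    ("Mexican",         438, 0.02, 446, 0.04),
    ("Puerto Rican",    437, 0.01, 455, 0.03),
    ("American Indian", 471, 0.00, 479, 0.01),
]


def weighted_mean(scores, weights):
    """Mean of scores weighted by group share, renormalized to the listed groups."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


scores_81 = [r[1] for r in rows]
shares_81 = [r[2] for r in rows]
scores_02 = [r[3] for r in rows]
shares_02 = [r[4] for r in rows]

crude_81 = weighted_mean(scores_81, shares_81)
crude_02 = weighted_mean(scores_02, shares_02)
# 2002 scores weighted by the 1981 mix: the group mix is held fixed.
std_02 = weighted_mean(scores_02, shares_81)

print(f"crude change:        {crude_02 - crude_81:+.1f}")
print(f"standardized change: {std_02 - crude_81:+.1f}")
```

With these inputs the crude change is roughly one point while the standardized change is close to ten points, which is consistent with the slide's message that a shift in the mix of test takers masks the gains within each group.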

2011 11

Multivariate Analysis can be Complex

To simplify, consider cases with

• a binary outcome,

• a binary predictor and

• a binary confounder.

What are the necessary conditions for nullification or a reversal?

See Schield (1999) and Schield and Burnham (2003)

2011 12

City Hospital: Hospital of Death??

Hospital   Total   Died   Death Rate
City       1,000     55   5.50%
Rural      1,000     35   3.50%
Both       2,000     90   4.50%

Condition   Total   Died   Death Rate
Good          800     15   1.90%
Poor        1,200     75   6.30%

2011 13

Can this confounder nullify or reverse this association?

[Figure: bar chart of death rates. By hospital: City 5.5%, Rural 3.5%, overall 4.5%; a gap of 2 percentage points (60% more). By patient condition: Poor health 6.3%, Good health 1.9%; a gap of 4.4 percentage points (230% more).]
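The size comparison in this chart can be reproduced from the counts on the preceding slide. The sketch below is an illustration, not the test given in Schield (1999): the helper name `gap` is hypothetical, and the check encodes only one necessary condition of the kind usually attributed to Cornfield, namely that a confounder's association with the outcome must be at least as large as the association it is supposed to explain. Passing the check does not show that a reversal occurs, only that it is not ruled out. Computed from the raw counts, the gaps are about 57% and 233%, which the chart rounds to 60% and 230%.

```python
# (deaths, patients) from the preceding slide's tables.
by_hospital = {"City": (55, 1000), "Rural": (35, 1000)}
by_condition = {"Poor": (75, 1200), "Good": (15, 800)}


def gap(groups, high, low):
    """Return (difference in percentage points, percent more) for high vs. low."""
    r_high = groups[high][0] / groups[high][1]
    r_low = groups[low][0] / groups[low][1]
    return (r_high - r_low) * 100, (r_high / r_low - 1) * 100


hospital_diff, hospital_pct = gap(by_hospital, "City", "Rural")
condition_diff, condition_pct = gap(by_condition, "Poor", "Good")

# Necessary (not sufficient) size check: the confounder's gap must be
# at least as large as the gap it is supposed to explain.
could_nullify = condition_diff >= hospital_diff and condition_pct >= hospital_pct

print(f"hospital:  {hospital_diff:.1f} pct. pts, {hospital_pct:.0f}% more")
print(f"condition: {condition_diff:.1f} pct. pts, {condition_pct:.0f}% more")
print("confounder large enough to nullify or reverse:", could_nullify)
```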

2011 14

Confounder Reverses; City Hospital is Better

Condition   Hospital   Total   Died   Death Rate
Good        City         100      1   1.00%
            Rural        700     14   2.00%
            Total        800     15   1.90%
Poor        City         900     54   6.00%
            Rural        300     21   7.00%
            Total      1,200     75   6.30%
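The reversal in this table can be verified directly from the counts. The following is a minimal sketch with hypothetical names (`counts`, `crude_rate`, `within`); it computes each hospital's crude death rate and its rate within each patient condition, then checks that City is worse overall yet better in every stratum.

```python
# (patients, deaths) by patient condition and hospital, from the table.
counts = {
    "Good": {"City": (100, 1),  "Rural": (700, 14)},
    "Poor": {"City": (900, 54), "Rural": (300, 21)},
}


def crude_rate(hospital):
    """Death rate for a hospital, ignoring patient condition."""
    patients = sum(counts[c][hospital][0] for c in counts)
    deaths = sum(counts[c][hospital][1] for c in counts)
    return deaths / patients


crude = {h: crude_rate(h) for h in ("City", "Rural")}

# Death rate within each condition (stratum) for each hospital.
within = {
    c: {h: deaths / patients for h, (patients, deaths) in by_hosp.items()}
    for c, by_hosp in counts.items()
}

city_worse_overall = crude["City"] > crude["Rural"]
city_better_in_every_stratum = all(
    within[c]["City"] < within[c]["Rural"] for c in within
)

print("crude:", crude)
print("within strata:", within)
print("reversal:", city_worse_overall and city_better_in_every_stratum)
```

City Hospital treats 900 of its 1,000 patients in poor condition, against 300 of 1,000 at Rural, so its crude rate is pulled toward the high death rate of poor-condition patients even though it does better within each group.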

2011 15

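Two-Group Rates with a Binary Confounder

[Figure: diagram of two-group rates with a binary confounder. Legend: A: Associated; B: confounder; E: effect. Corners labeled (0,0), (1,0), (0,1) and (1,1). Rates labeled Ra, Rb, Rc, Rd. Other point labels: AP, AQ, BP, BQ, XM, XN, XP, XQ.]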

2011 16

Compare Hospital Death Rates

Confounder: Patient Condition

[Figure: line chart "A Confounder can Influence a Difference". X-axis: Percentage who are in "Poor" Condition, 0% to 100%. Y-axis: Death Rate, 0% to 7%.]

2011 17

Standardize on combined confounder percentage

[Figure: line chart "Standardizing Can Reverse A Difference". X-axis: Percentage who are in "Poor" Condition, 0% to 100%. Y-axis: Death Rate, 0% to 7%.]
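Standardizing on the combined confounder percentage amounts to reading each hospital's line at the same point on the horizontal axis. The sketch below is an illustration under stated assumptions: it takes the stratum death rates from the "Confounder Reverses" table, treats each hospital's rate as a straight line between its Good-condition and Poor-condition rates, and uses the pooled share of poor-condition patients (1,200 of 2,000) as the standard. The names `rate_at`, `poor_share` and `combined_poor_share` are hypothetical.

```python
# Stratum death rates from the "Confounder Reverses" table.
rates = {
    "City":  {"Good": 0.01, "Poor": 0.06},
    "Rural": {"Good": 0.02, "Poor": 0.07},
}
# Share of each hospital's patients who are in poor condition.
poor_share = {"City": 900 / 1000, "Rural": 300 / 1000}
# Both hospitals pooled: the standard mix.
combined_poor_share = 1200 / 2000


def rate_at(hospital, share_poor):
    """Death rate a hospital would show if `share_poor` of its patients were poor.

    This is a point on the straight line joining the hospital's
    Good-condition rate (at 0%) and Poor-condition rate (at 100%).
    """
    r = rates[hospital]
    return (1 - share_poor) * r["Good"] + share_poor * r["Poor"]


crude = {h: rate_at(h, poor_share[h]) for h in rates}
standardized = {h: rate_at(h, combined_poor_share) for h in rates}

for h in rates:
    print(f"{h}: crude {crude[h]:.1%}, standardized {standardized[h]:.1%}")
```

At each hospital's own mix the rates are the crude 5.5% (City) and 3.5% (Rural); at the common 60% mix they become 4.0% and 5.0%, so the difference reverses.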

2011 18

Auto Deaths and Airbag Presence

Confounded by Seatbelt Use

[Figure: line chart. X-axis: Percentage who wear Seatbelts, 0% (None) to 100% (All). Y-axis: Death Rate per 10,000 Accidents, 15 to 125. Series: Airbag, No Airbag, Airbag Standardized.]

2011 19

Subscription Renewal Rates by Month

Confounded by Change in Subscription Mix

[Figure: line chart. X-axis: Percentage of Renewals which are Agent, 0% to 100%. Y-axis: Renewal Rate, 10% to 80%. Series: January, February, Standardize. Data labels: 10%, 40%, 46%.]

2011 20

Confounder: Race

2000 NAEP 4th Grade Math

Standardized Scores: LA vs WV

[Figure: line chart. X-axis: Percentage who are White, 0% to 100%. Y-axis: NAEP Scores, 200 to 230. Series: LA, WV, Std. Data labels: 204, 230, 203, 226.]

2011 21

Confounder: Family Structure

Income: US Families by Race & Structure

[Figure: line chart. X-axis: Percentage who are headed by Married Couple, 0% to 100%. Y-axis: Mean Income, $10,000 to $65,000. Series: Black Families, White Families, Population. Data labels: 78%, 82%, 48%.]

2011 22

Control for Mom’s Age

2011 23

Controlling Can Change Statistical Significance

2011 24

Conclusion

Statistical educators must show students how confounders can influence associations and change statistical significance. The failure of educators to do this may be seen as “statistical negligence.”

Schield, Milo (1999). Simpson's Paradox and Cornfield's Conditions. www.StatLit.org/pdf/1999SchieldASA.pdf

Schield, Milo (2006). Presenting Confounding and Standardization Graphically. STATS Magazine, ASA. Fall 2006. pp. 14-18. Draft at www.StatLit.org/pdf/2006SchieldSTATS.pdf

Schield, Milo (2009). Confound Those Speculative Statistics. 2009 ASA Proceedings of the Section on Statistical Education. [CD-ROM] 4255-4266. www.StatLit.org/pdf/2009SchieldASA.pdf
