
Mar 28, 2015

Statistical Models: The Rest of the Story

Scott L. Zeger

Hurley-Dorrier Professor and Chair

Department of Biostatistics

The Johns Hopkins University Bloomberg School of Public Health

What is a model?

What is a statistical model?

Tool for those empirical sciences where signals come embedded in noise

Lens through which to view data to better understand the signal

Tool for quantifying the evidence in data about a particular truth we seek

Empirical science: search for “truth”

Truth for Population → (probability: statistical model) → Observed Value for a Representative Sample

Observed Value for a Representative Sample → (statistical inference) → Truth for Population
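The two directions in the diagram can be made concrete with a small simulation. This is a minimal sketch under assumptions not in the slides: a hypothetical normally distributed population stands in for the "truth", a simple random sample of 400 stands in for the representative sample, and a normal-approximation 95% interval stands in for the inference step. `sample_mean_ci` is a hypothetical helper name.

```python
import random
import statistics

def sample_mean_ci(population, n, rng, z=1.96):
    """Draw a simple random sample of size n; return (mean, lower, upper) of a normal-approximation CI."""
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return mean, mean - z * se, mean + z * se

rng = random.Random(42)

# Hypothetical population; its mean plays the role of the unknown "truth".
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
truth = statistics.fmean(population)

# Statistical inference: use one sample to say where the truth plausibly lies.
estimate, lower, upper = sample_mean_ci(population, 400, rng)
print(f"truth={truth:.2f}  estimate={estimate:.2f}  95% CI=({lower:.2f}, {upper:.2f})")

# Probability model: repeating the sampling shows the interval's long-run coverage.
hits = 0
for _ in range(1000):
    _, lo, hi = sample_mean_ci(population, 400, rng)
    hits += lo <= truth <= hi
coverage = hits / 1000
print(f"coverage over 1000 samples: {coverage:.1%}")
```

Coverage near 95% is a property of the probability model, not of any single sample; the single interval either contains the truth or it does not.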

How to Choose the Best Model

• Minimize the mean squared error
• Minimize the Akaike Information Criterion (AIC)
• Minimize the Bayesian Information Criterion (BIC)
• Maximize the likelihood function
• Cross-validate
• Jackknife
• Bootstrap
• Boost, then bag
• etc.

You cannot choose the best model because there isn’t one

You can choose a useful model based upon prior scientific knowledge

You can explore and report how your scientific findings vary over a set of other useful models

You can average your results across useful models
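One common way to average results across several useful models is to weight each by its Akaike weight, w_i ∝ exp(-ΔAIC_i / 2). The slides do not say which averaging scheme the author uses, so treat this as a sketch of one option; the three estimates and AIC values are hypothetical, and `akaike_weights` and `model_average` are illustrative helper names.

```python
import math

def akaike_weights(aics):
    """Turn AIC values into weights proportional to exp(-delta/2), summing to 1."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def model_average(estimates, aics):
    """Average the models' estimates, weighting each by its Akaike weight."""
    weights = akaike_weights(aics)
    return sum(w * e for w, e in zip(weights, estimates)), weights

# Hypothetical: three useful models estimate the same quantity.
estimates = [1.20, 1.05, 0.90]
aics = [210.0, 211.0, 214.0]

avg, weights = model_average(estimates, aics)
for est, w in zip(estimates, weights):
    print(f"estimate {est:.2f}  weight {w:.3f}")  # report how the finding varies
print(f"model-averaged estimate: {avg:.3f}")
```

Printing each model's estimate next to its weight also covers the advice to report how findings vary over the set of useful models, rather than only the averaged number.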

Causal Model

Smoking → Disease → Dollars

Smoking → Disease → Death

Causal Model

Iraq invasion → Violence → Death

What Do We Know about Smoking and Medical Expenditures

• WHO, the U.S. Surgeon General and IARC say smoking causes 13 major diseases
  – Lung cancer; COPD; atherosclerosis; MI; stroke; …
• In the U.S., most people receive treatment for major chronic diseases (e.g. lung cancer)
• It costs money to treat your disease

What we know LITTLE about

• Whether smoking causes people to use more or fewer medical services to treat smoking-caused diseases
• Whether smoking causes people without a major disease to seek more or fewer medical services
  – “I hate my doctor, she tries to take my cigs away”
  – “I go as often as I can afford; got to watch out for those diseases that can kill me”

Competing Causal Models

Smoking → Disease → Dollars

Smoking → Dollars

Odds Ratios of Lung Cancer/COPD by Pack-years for Current and Former Smokers

Medical Expenditures for Persons with vs without Lung Cancer/COPD

Difference in Average Expenditures by Propensity to Have Disease
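The slide title points to comparing expenditures within groups of people who had a similar propensity to have the disease. A common version of this idea is subclassification on propensity-score quintiles; the sketch below assumes that approach, uses simulated data in which a single covariate `x` drives both disease and spending, and uses the true propensity, where a real analysis would have to estimate it (for example with logistic regression). All names and numbers are hypothetical.

```python
import math
import random
import statistics

def simulate(n, effect, seed):
    """Hypothetical data: covariate x raises both the chance of disease and spending."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        propensity = 1.0 / (1.0 + math.exp(-x))  # P(disease | x), known exactly here by construction
        has_disease = rng.random() < propensity
        cost = 1000.0 * x + effect * (1.0 if has_disease else 0.0) + rng.gauss(0.0, 2000.0)
        rows.append((propensity, has_disease, cost))
    return rows

def mean_difference(rows):
    """Average cost with disease minus average cost without disease."""
    with_d = [cost for _, d, cost in rows if d]
    without_d = [cost for _, d, cost in rows if not d]
    return statistics.fmean(with_d) - statistics.fmean(without_d)

def stratified_difference(rows, n_strata=5):
    """Mean of within-stratum differences, with strata cut at propensity quantiles."""
    ordered = sorted(rows, key=lambda row: row[0])
    size = len(ordered) // n_strata
    diffs = []
    for k in range(n_strata):
        end = (k + 1) * size if k < n_strata - 1 else len(ordered)
        diffs.append(mean_difference(ordered[k * size:end]))
    return statistics.fmean(diffs)  # strata are (nearly) equal-sized, so an unweighted mean

rows = simulate(50_000, effect=5000.0, seed=7)
naive = mean_difference(rows)
adjusted = stratified_difference(rows)
print(f"true effect 5000; naive {naive:.0f}; propensity-stratified {adjusted:.0f}")
```

With only five strata some confounding remains inside each stratum, so the stratified estimate lands close to, but not exactly at, the true effect, while the naive difference is biased upward by roughly 800 in this setup.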

Smoking-Attributable Burden for a Cohort of 60 Million Who Started Smoking Under Age 21, 1954–2000

• Disease, LC/COPD: 43.7 million case-years
• Disease, CHD group: 80.8 million case-years
• Dollars: $1,087 billion
• Deaths: 128.0 million years of life lost (13 million persons)
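A quick check on the deaths row, reading "13 million persons" as the number of smoking-attributable deaths in the cohort (an interpretation; the slide does not label it):

```latex
\frac{128.0 \times 10^{6}\ \text{years of life lost}}{13 \times 10^{6}\ \text{deaths}} \approx 9.8\ \text{years of life lost per death}
```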

Smoking → Disease → Dollars

“Know this”:

$1 trillion ± $0.2 trillion for 10% of the population

???

Estimate well what you can; estimate poorly what you must.

Don’t dilute decent causal estimates with causal speculates (unless you intend to make everything uncertain)
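The warning against diluting decent causal estimates can be quantified. Assuming the precise disease-pathway estimate and a speculative direct-pathway estimate are independent, their sum has standard error sqrt(se_a² + se_b²), so one poorly estimated term dominates. Reading "± 0.2 T" as a 95% half-width (SE ≈ 0.1 trillion) is an assumption, and the speculative term (centered at 0 with SE 1 trillion) is hypothetical.

```python
import math

def combine(est_a, se_a, est_b, se_b, z=1.96):
    """Sum two independent estimates; their standard errors add in quadrature."""
    total = est_a + est_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return total, se, (total - z * se, total + z * se)

# Disease pathway, in trillions of dollars: "± 0.2 T" read as a 95% half-width, so SE ~ 0.1 (assumption).
disease_path = (1.0, 0.1)
# Hypothetical direct smoking -> dollars pathway: sign unknown, so centered at 0 with a large SE.
direct_path = (0.0, 1.0)

total, se, (low, high) = combine(*disease_path, *direct_path)
print(f"disease pathway alone: 1.00 +/- {1.96 * disease_path[1]:.2f} trillion")
print(f"with speculative path: {total:.2f}, 95% CI ({low:.2f}, {high:.2f}) trillion")
```

The combined interval runs from about -0.97 to 2.97 trillion and includes zero, even though the disease pathway alone is pinned down to within about 20%.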

Causal Model

Iraq invasion → Violence → Death

What We Know Well

• 2,237 U.S. soldiers (DoD)

• 99 British soldiers (British Govt)

• 4,027 Iraqi police (News reports compiled by iCasualties.org)

• 28,198–31,800 Iraqi civilians (Iraq Body Count website)

The count includes civilian deaths caused by coalition military action and by military or paramilitary responses to the coalition presence (e.g. insurgent and terrorist attacks). It also includes excess civilian deaths caused by criminal action resulting from the breakdown in law and order which followed the coalition invasion. Compiled from eyewitness reports and news articles.

What We Know Less Well

Iraq invasion → Violence → Death (~30,000 Iraqi deaths)

Other pathways from the invasion to death:

• Lack of sanitation
• Lack of clean water
• Poor nutrition
• Limited access to medical care
• Extreme stress and grief

98,000 (95% CI: 8,000–194,000), excluding Falluja

~20–50% violent
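Assuming the ~20–50% violent share applies to the 98,000 point estimate, the implied number of violent deaths is

```latex
0.20 \times 98{,}000 = 19{,}600
\qquad\text{to}\qquad
0.50 \times 98{,}000 = 49{,}000
```

a range that brackets the ~30,000 figure from the passive counts.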

Summary

• A model defines the boundaries of an analysis and can determine what will be learned from data
  – Just as a lens determines what you will see
• Same model for two problems
  – Separate what can be estimated precisely from what cannot
  – Prior knowledge about pathway
• Too much uncertainty invalidates, whether it should or not

Timing of Proceeds Relative to Smoking-Attributable Expenditures for Major Diseases Only

Smoking-Attributable Fraction of Disease (SAF) and of Dollars (SAFE)
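The slide does not show how SAF is computed. One standard textbook definition of a population attributable fraction for a binary exposure is Levin's formula, SAF = p(RR - 1) / (1 + p(RR - 1)), with p the exposure prevalence and RR the relative risk. The sketch below implements that formula only; it is not necessarily the author's method and says nothing about how SAFE is defined. The prevalence and relative risk in the example are hypothetical.

```python
def attributable_fraction(prevalence, relative_risk):
    """Levin's population attributable fraction for a binary exposure."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical inputs: 25% of the population smokes; smokers have 10x the disease risk.
saf = attributable_fraction(0.25, 10.0)
print(f"attributable fraction: {saf:.3f}")
```

With no excess risk (RR = 1) the fraction is zero, and it approaches one as prevalence and relative risk grow.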