Multiple Imputation (MI) Technique Using a Sequence of Regression Models OJOC Cohort 15 Veronika N. Stiles, BSDH University of Michigan September’2012 BIOSTATISTICS 590

Jan 02, 2016



Basis for Presentation

• This presentation is based on an article by:

  • T.E. Raghunathan
  • J.M. Lepkowski
  • J. Van Hoewyk
  • P. Solenberger

“A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models”

Survey Methodology, June 2001, Vol. 27, No. 1, pp. 85-95


Rationale for Multiple Imputation

• Incomplete data is a common problem

• Allows the use of existing complete-data software once the missing values have been imputed


Basic Definitions

• “Imputation” is the placement of one or more estimated answers into a field of a data record that previously had NO data
• Imputed values are draws from a predictive distribution

Basic Strategy

• Create imputations by fitting a sequence of multiple regressions
• Each regression uses the variable with missing data as the outcome (Y) variable
• Regression models fitted to the complete data are used to predict Y when Y is missing
• Imputed values are drawn from the predictive distributions
• Variables are imputed in a cyclical manner
• The type of regression model varies by imputed variable (an example follows in later slides)


Types of Regression Models Used

1. Linear
2. Logistic
3. Poisson
4. Generalized logit
5. Mixture of the above

Remember! The type of regression model depends on the type of imputed variable!
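
The pairing of variable type and regression model can be written down as a small lookup. This is only a sketch: the names `VarType`, `MODEL_FOR_TYPE` and `choose_model` are made up for illustration, and the entry for mixed variables (a logistic step for zero vs. non-zero followed by a linear step) is our reading of "mixture of the above", not something stated on the slide.

```python
from enum import Enum


class VarType(Enum):
    CONTINUOUS = "continuous"
    BINARY = "binary"
    COUNT = "count"
    CATEGORICAL = "categorical"
    MIXED = "mixed"


# Hypothetical lookup: the first four entries follow the list on the slide.
# The MIXED entry is an assumption about what "mixture of the above" means.
MODEL_FOR_TYPE = {
    VarType.CONTINUOUS: "linear",
    VarType.BINARY: "logistic",
    VarType.COUNT: "Poisson",
    VarType.CATEGORICAL: "generalized logit",
    VarType.MIXED: "logistic (zero vs. non-zero), then linear on the non-zero part",
}


def choose_model(var_type):
    """Return the regression model used to impute a variable of this type."""
    return MODEL_FOR_TYPE[var_type]


# Variables from the cardiac arrest example, typed by us for illustration
for name, var_type in [("log(BMI)", VarType.CONTINUOUS), ("Female", VarType.BINARY)]:
    print(name, "->", choose_model(var_type))
```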


Assumptions in MI Technique

• Population is infinite
• Sample is a simple random sample (SRS)
• Variables are one of the following:
  • Continuous
  • Binary
  • Categorical
  • Counts
  • Mixed


Advantages of Multiple Imputation

+ Method for imputation is known;
+ Analyses are based on the same # of cases;
+ All data provided is used in each analysis;
+ Allows for multiple predictors;
+ Valid point and interval estimates under a general set of conditions are obtained by repeatedly applying the complete-data software


Imputation Method

• Each imputation consists of “rounds”
• Start round 1 with a regression for the variable with the fewest missing values
• Remember! Imputations for missing values in Y are draws from the predictive distribution (use the predicted mean of Y + a random draw from the normal error distribution)
• Then, update X by replacing missing Y with the imputed value
• X = full matrix with all variables (including Y)

Lesion Location   Etiology     Lesion Size   Chronicity
Temporal          Lobectomy    2.72          89.3
Occipital         Stroke       .             36.3
Temporal          Hemorrhage   .             55.3

(“.” denotes a missing value)
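
A minimal sketch of the "predicted mean + random normal draw" step for one continuous Y, using NumPy. The function name `impute_continuous` and the simulated data are made up for illustration. One simplification is assumed: the regression coefficients are held at their least-squares estimates, whereas a fully proper imputation would also draw the coefficients and error variance from their posterior distribution.

```python
import numpy as np


def impute_continuous(y, X, rng):
    """Fill missing entries of y (np.nan) from a linear regression on X.

    Imputed value = predicted mean + a draw from N(0, sigma^2).
    Assumption: coefficients are fixed at their least-squares estimates.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    observed = ~np.isnan(y)

    # Fit the regression on the cases where Y is observed
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    resid = y[observed] - design[observed] @ beta
    dof = observed.sum() - design.shape[1]
    sigma = np.sqrt(resid @ resid / dof)

    # Predicted mean plus a random normal error for each missing Y
    y_imputed = y.copy()
    n_missing = (~observed).sum()
    y_imputed[~observed] = design[~observed] @ beta + rng.normal(0.0, sigma, n_missing)
    return y_imputed


# Simulated data (not from the article): Y depends on one predictor
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)
y[:5] = np.nan  # make the first five values missing
filled = impute_continuous(y, x.reshape(-1, 1), rng)
```

Because of the random error term, running the function twice gives different imputed values for the same missing cells, which is what makes multiple imputation possible.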


Imputation Method

• Move on to the next Y with the fewest missing values
• Repeat MI using the updated X as predictors until all variables have been imputed

Run the process M times to yield M complete datasets. Each dataset has a different set of imputed values, but the same data for the observed values.
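
The loop over variables and the M repetitions can be sketched as follows. This is an illustration under stated assumptions, not the IVEware implementation: it covers round 1 only, treats every variable as continuous, and the names `draw_imputations` and `sequential_imputation` as well as the simulated data are made up.

```python
import numpy as np


def draw_imputations(y, X, rng):
    """Predicted mean plus a normal error draw for the missing entries of y."""
    design = np.column_stack([np.ones(len(y)), X])
    obs = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
    resid = y[obs] - design[obs] @ beta
    sigma = np.sqrt(resid @ resid / max(obs.sum() - design.shape[1], 1))
    return design[~obs] @ beta + rng.normal(0.0, sigma, (~obs).sum())


def sequential_imputation(data, M, rng):
    """Return M completed copies of `data` (2-D float array, np.nan = missing).

    Variables are visited from fewest to most missing values. Each one is
    regressed on the complete columns plus the columns already imputed.
    """
    n_missing = np.isnan(data).sum(axis=0)
    order = [j for j in np.argsort(n_missing, kind="stable") if n_missing[j] > 0]
    complete = [j for j in range(data.shape[1]) if n_missing[j] == 0]

    datasets = []
    for _ in range(M):
        filled = data.copy()
        predictors = list(complete)
        for j in order:
            miss = np.isnan(data[:, j])
            filled[miss, j] = draw_imputations(data[:, j], filled[:, predictors], rng)
            predictors.append(j)  # the updated X now includes this variable
        datasets.append(filled)
    return datasets


# Simulated data (not from the article): column 0 is complete
rng = np.random.default_rng(1)
n = 200
age = rng.normal(58, 10, n)
bmi = 20 + 0.1 * age + rng.normal(0, 2, n)
yrs = 5 + 0.3 * age + rng.normal(0, 5, n)
data = np.column_stack([age, bmi, yrs])
data[:10, 1] = np.nan   # fewer missing values, so imputed first
data[5:40, 2] = np.nan  # more missing values, so imputed second
completed = sequential_imputation(data, M=5, rng=rng)
```

Each of the M arrays in `completed` agrees with the others on the observed cells and differs only in the imputed cells.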


Example Time

Effect of Smoking on Primary Cardiac Arrest (CA)

• Case-control study
• Examine relationship between smoking and CA


Means and Proportions of Key Variables and Percent Missing

                     Control (n = 551)            Cases (n = 347)
Variable             % Missing   Mean (SD)        % Missing   Mean (SD)
Age                  0           58.4 (10.4)      0           59.4 (9.9)
BMI                  8.2         25.8 (4.1)       2.6         26.4 (4.6)
Years Smoked         16.8        24.8 (14.7)      5.4         31.7 (13.8)

                     % Missing   Proportion (%)   % Missing   Proportion (%)
Female               0           23.2             0           19.9
>= High School       0           76.8             0           61.9
Smoking Status       0
  Never Smoked       0           47.2             0           27.3
  Former Smoker      0           42.1             0           38.2
  Current Smoker     0           10.7             0           34.5


Intuitively…

• What variables might predict missing data?
• Could age, education, smoking status predict BMI?
• Could age predict years smoked?
• However, years smoked can only be imputed for current and former smokers!
• Some values may need to be fixed post-MI
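
The restriction on years smoked can be expressed as a simple rule on each record. The sketch below is hypothetical: the field names, the helper functions and the example records are made up, and setting never-smokers to 0 years is our assumption about what a post-MI fix would look like.

```python
NEVER, FORMER, CURRENT = "never", "former", "current"


def needs_imputation(record):
    """Years smoked is imputed only for current/former smokers with a missing value."""
    return record["years_smoked"] is None and record["smoking_status"] in (FORMER, CURRENT)


def fix_never_smokers(record):
    """Post-imputation fix (assumed): a never-smoker has smoked 0 years."""
    if record["smoking_status"] == NEVER:
        return {**record, "years_smoked": 0}
    return record


# Made-up records for illustration
records = [
    {"id": 1, "smoking_status": CURRENT, "years_smoked": None},
    {"id": 2, "smoking_status": NEVER, "years_smoked": None},
    {"id": 3, "smoking_status": FORMER, "years_smoked": 12},
]

to_impute = [r["id"] for r in records if needs_imputation(r)]
fixed = [fix_never_smokers(r) for r in records]
```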


Multiple Imputation Process in CA Study

• Log(BMI) has the fewest missing values, so it is imputed first
• Regress log(BMI) on age, female, education, Years_Smoked, smoking status, and cardiac arrest through a normal linear model
• Cardiac arrest IS included in the imputation model
• Imputed values of log(BMI) (draws from the predictive distribution) are saved to the dataset, replacing the missing values


Multiple Imputation Process in CA Study

• Next, Years Smoked was regressed on all of the variables above + log(BMI) (please note that the regression excludes ‘never-smokers’)

• Imputed values of Years Smoked are saved to the dataset, replacing the missing values

• M = 25 imputations (note: many researchers use M = 5 or 5 < M < 10)

• The original logistic regression model was fit to each MI data set


How were estimates of coefficients and covariance matrices obtained?

• IVEware software performs calculations, using estimates and covariance matrix
• Combines the results from 5-25 regressions
• Combines both within-regression and between-regression error

• IVEware: Imputation and Variance Estimation Software

http://www.isr.umich.edu/src/smp/ive/

• Developed by our own Dr. Raghunathan & researchers at the Survey Methodology Program
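
The slides do not give the combining formulas. The sketch below assumes the standard MI combining rules (Rubin's rules) for a single coefficient: the combined estimate is the mean of the M estimates, and the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The function name `combine_estimates` and the input numbers are made up for illustration, and this is not IVEware code.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Combine M complete-data results for one coefficient (Rubin's rules)."""
    M = len(estimates)
    q_bar = mean(estimates)        # combined point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / M) * between
    return q_bar, total ** 0.5     # estimate and its combined standard error


# Hypothetical results from M = 3 imputed data sets (not from the article)
estimates = [0.010, 0.014, 0.018]
variances = [0.009 ** 2] * 3       # squared standard errors
q_bar, se = combine_estimates(estimates, variances)
```

The combined standard error is larger than any single within-imputation SE whenever the estimates differ across imputed data sets, which reflects the extra uncertainty due to the missing values.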


Complete-Case Analysis vs MI

                              Complete Case (n = 795)   SRMI Method 1 (n = 898)
Predictor Variables           Estimate (SE)             Estimate (SE)
Intercept                     -2.922 (0.791)            -2.61 (0.757)
Age                           0.015 (0.009)             0.015 (0.009)
Female                        -0.007 (0.203)            -0.115 (0.189)
Education                     -0.448 (0.173)            -0.467 (0.166)
BMI                           0.056 (0.018)             0.049 (0.013)
Current Smoker                1.693 (0.569)             2.001 (0.543)
Former Smoker                 0.003 (0.284)             -0.029 (0.262)
Current Smoker x Yrs Smoked   -0.003 (0.015)            -0.008 (0.013)
Former Smoker x Yrs Smoked    0.019 (0.009)             0.014 (0.009)


Results of the Multiple Imputations

• MI standard errors are smaller:
  • due to additional subjects in imputed data

• Modest changes in relationship between smoking and CA

• Years Smoked in Former Smokers is a significant predictor of cardiac arrest in the complete-case analysis, but NOT in the MI analysis (!!!)
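
These statements can be checked against the Complete-Case Analysis vs MI table. The sketch below copies the (estimate, SE) pairs from that table and computes Wald z statistics. The 1.96 cutoff (two-sided 5% level, normal approximation) is our assumption; the slides do not say which test or reference distribution was used.

```python
# (estimate, SE) pairs copied from the Complete-Case Analysis vs MI table
complete_case = {
    "Intercept": (-2.922, 0.791),
    "Age": (0.015, 0.009),
    "Female": (-0.007, 0.203),
    "Education": (-0.448, 0.173),
    "BMI": (0.056, 0.018),
    "Current Smoker": (1.693, 0.569),
    "Former Smoker": (0.003, 0.284),
    "Current Smoker x Yrs Smoked": (-0.003, 0.015),
    "Former Smoker x Yrs Smoked": (0.019, 0.009),
}
srmi = {
    "Intercept": (-2.61, 0.757),
    "Age": (0.015, 0.009),
    "Female": (-0.115, 0.189),
    "Education": (-0.467, 0.166),
    "BMI": (0.049, 0.013),
    "Current Smoker": (2.001, 0.543),
    "Former Smoker": (-0.029, 0.262),
    "Current Smoker x Yrs Smoked": (-0.008, 0.013),
    "Former Smoker x Yrs Smoked": (0.014, 0.009),
}

Z_CRIT = 1.96  # assumed cutoff: two-sided 5% level, normal approximation


def wald_z(estimate, se):
    """Wald statistic: estimate divided by its standard error."""
    return estimate / se


def is_significant(estimate, se, z_crit=Z_CRIT):
    return abs(wald_z(estimate, se)) > z_crit


key = "Former Smoker x Yrs Smoked"
print(key, round(wald_z(*complete_case[key]), 2), round(wald_z(*srmi[key]), 2))
# prints: Former Smoker x Yrs Smoked 2.11 1.56

# No MI standard error exceeds its complete-case counterpart
se_not_larger = all(srmi[k][1] <= complete_case[k][1] for k in complete_case)
```

With the rounded values in the table, the MI standard errors are smaller than or equal to the complete-case ones for every predictor, and the former-smoker interaction crosses the assumed 1.96 cutoff only in the complete-case analysis.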


Additional Variables MI Approach

• Additional variables NOT in the substantive analysis can be used

• Prediction for missing values in each variable borrows strength from all other variables

• In our cardiac arrest example, imputing with the dataset + 50 additional variables gives smaller SEs

• Improved efficiency vs. variables in model only


In Addition…

IVEware performs…

1. Single or multiple imputations
2. Analyses accounting for:
  • Clustering
  • Stratification
  • Weighting

3. Combines information from multiple sources

(+some other functions beyond the scope of this presentation)


Critique

• This article might be too challenging and complicated as an entry-level description of multiple imputation
• Some of the foundational concepts from this article have not been covered thus far in the OJOC program
  • nonignorable missing mechanism

RECOMMENDATION

Start with “Survey Methodology” (2nd edition) by R.M. Groves, F. J. Fowler, Jr., M.P. Couper, J.M. Lepkowski, E. Singer, R. Tourangeau. Wiley Series in Survey Methodology, A John Wiley & Sons, Inc., Publication, 2009, p. 356.


Thank You for Your Attention!