Performing Sensitivity Analyses of Imputed Missing Values
Jenny H. Qin and Mike Singleton
Kentucky CODES, Kentucky Injury Prevention & Research Center, University of Kentucky
29th TRF 2003, Denver, July 14th, 2003
www.kiprc.uky.edu
Sensitivity Analyses on Imputed Values
A sensitivity analysis tests if our study results are sensitive to our assumptions (missing data mechanism), data conditions (missing data rate), and choices (imputation models or number of imputations) made for obtaining the results.
Research Question: What was the relationship between driving under the influence of drugs and/or alcohol, and being killed or hospitalized in a crash, for motorcycle riders in Kentucky in 2001?
Outcome (Dependent Variable): Killed or Hospitalized (K/H)
Results for the Gold Standard

Parameter   OR (95% CI)          Estimate   SE       P
DUI         2.51 (1.58, 3.98)    0.9189     0.2364   0.0001
Speed       1.58 (1.18, 2.10)    0.4546     0.1456   0.0018
Fixed       1.70 (1.24, 2.33)    0.5311     0.1599   0.0009
Head-on     1.70 (1.04, 2.77)    0.5316     0.2486   0.0380
This Gold Standard result is used to compare with all other results.
Conclusion: when other factors are controlled, the odds of being killed or hospitalized for motorcyclists with DUI are about 2.5 times the odds for motorcyclists without DUI.
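The odds ratio and confidence interval in the Gold Standard table follow from the estimate and its standard error in the usual way for a logistic regression coefficient: OR = exp(estimate) and CI = exp(estimate ± z × SE). The Python sketch below is only an illustration of that arithmetic, not the authors' SAS code; the function name `odds_ratio_ci` is made up for this example.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(estimate, se, level=0.95):
    """Return (OR, lower, upper) for a logit coefficient and its standard error."""
    # z is about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )


# Estimate and SE for DUI, taken from the Gold Standard table
or_dui, lower, upper = odds_ratio_ci(0.9189, 0.2364)
print(f"DUI: OR = {or_dui:.2f} ({lower:.2f}, {upper:.2f})")
```

Applying the same function to the Speed, Fixed, and Head-on rows gives a quick consistency check on the table.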
Note: The imputation model does not have to be identical to the analysis model, but it should at least include all of the analysis covariates. You can add any additional variables that are correlated with the variables that have missing values.
• Missing Completely At Random (MCAR)
– DFN: the missing data values are a simple random sample of all data values.
– We simulated this condition by using SAS Proc SurveySelect to pick a random sample from the study data set, then setting DUI = missing for the selected cases.
• Missing At Random (MAR)
– DFN: the probability of missing values on one variable is unrelated to the values of this variable, after controlling for other variables in the analysis.
– We simulated this condition by setting DUI = missing for riders aged 46 or older.
• Not Missing At Random (NMAR)
– DFN: the probability of missing values on one variable is related to the values of this variable even if we control for other variables in the analysis.
– We simulated this condition by setting DUI = missing for uninjured
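As a rough illustration of the three mechanisms, the Python sketch below blanks out DUI in a synthetic data set. It is not the authors' SAS code: the field names `age` and `dui`, the 20% MCAR rate, and the random seed are assumptions, and the NMAR rule (dropping DUI for about half of the riders whose DUI value is 1) is a textbook-style stand-in rather than the rule used in the study.

```python
import random


def mask_mcar(rows, field, rate, rng):
    """MCAR: set `field` to None for a simple random sample of rows."""
    chosen = set(rng.sample(range(len(rows)), round(rate * len(rows))))
    return [
        dict(row, **{field: None}) if i in chosen else dict(row)
        for i, row in enumerate(rows)
    ]


def mask_where(rows, field, predicate):
    """Set `field` to None for every row where predicate(row) is true."""
    return [
        dict(row, **{field: None}) if predicate(row) else dict(row)
        for row in rows
    ]


rng = random.Random(2003)  # arbitrary seed, for repeatability only

# Synthetic riders (hypothetical values, not the Kentucky crash data)
rows = [
    {"age": rng.randint(16, 80), "dui": int(rng.random() < 0.1)}
    for _ in range(1000)
]

# MCAR: missingness is a simple random sample of all cases
mcar = mask_mcar(rows, "dui", 0.20, rng)

# MAR: missingness depends on another observed variable (age), as in the slides
mar = mask_where(rows, "dui", lambda r: r["age"] >= 46)

# NMAR (assumed rule): missingness depends on the value of DUI itself
nmar = mask_where(rows, "dui", lambda r: r["dui"] == 1 and rng.random() < 0.5)

for name, data in (("MCAR", mcar), ("MAR", mar), ("NMAR", nmar)):
    n_missing = sum(r["dui"] is None for r in data)
    print(f"{name}: {n_missing} of {len(data)} DUI values set to missing")
```

Keeping the original `rows` untouched makes it possible to compare each masked data set against the complete-data result afterwards.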
• Even with the simplest imputation model, MI was able to produce results consistent with the Gold Standard when the missing data mechanism was MCAR or MAR, but not when it was NMAR.
• For NMAR, we would estimate the odds ratio of death or hospitalization for riders suspected of DUI to be 1.78 (1.15, 2.76), while our Gold Standard estimates it to be 2.51 (1.58, 3.98).
Point Estimate and 95% CI for DUI with Different Missing Data Mechanisms
Conclusions of SA on Missing Data Rate
• For both missing data mechanisms, the 50% missing case produced the DUI parameter estimate farthest from the Gold Standard estimate, as well as the widest 95% CI. However, for MCAR the difference from the Gold Standard estimate was -7%, whereas for MAR it was 42%. In addition, the 95% CI for 50% MCAR was 19% wider than the Gold Standard 95% CI, whereas for 50% MAR it was 106% wider.
• This shows that the simplest imputation model is not sufficient to handle very high missing data rates.
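Comparisons of this kind (percent difference from the Gold Standard and percent change in CI width) are simple to compute. The Python sketch below applies them to the odds ratios quoted elsewhere in these slides for NMAR with MI and for complete cases without MI. It works on the odds-ratio scale, which is an assumption: the percentages reported above are for the DUI parameter estimate and may have been computed on a different scale, so the numbers are not directly comparable. The helper names are made up for this example.

```python
def percent_difference(value, reference):
    """Signed difference of `value` from `reference`, as a percentage."""
    return 100.0 * (value - reference) / reference


def ci_width_change(ci, reference_ci):
    """Percent change in CI width relative to the reference CI width."""
    width = ci[1] - ci[0]
    reference_width = reference_ci[1] - reference_ci[0]
    return percent_difference(width, reference_width)


# Odds ratios and 95% CIs for DUI as quoted in the slides
gold = {"or": 2.51, "ci": (1.58, 3.98)}
results = {
    "NMAR with MI": {"or": 1.78, "ci": (1.15, 2.76)},
    "Complete cases (no MI)": {"or": 4.2, "ci": (2.1, 8.4)},
}

for label, res in results.items():
    diff = percent_difference(res["or"], gold["or"])
    width = ci_width_change(res["ci"], gold["ci"])
    print(f"{label}: OR differs by {diff:+.0f}%, CI width changes by {width:+.0f}%")
```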
Point Estimate and 95%CI for DUI with Different Missing Data Rates
• In our example, n = 5 to 10 imputations is enough to get good results for a data set with 50% MAR on DUI.
• With no MI (complete cases only), we would conclude that motorcyclists with DUI had 4.2 (2.1, 8.4) times the odds of being killed or hospitalized compared with motorcyclists without DUI. But from the Gold Standard, the OR is 2.5 (1.6, 4.0).
Point Estimate and 95% CI for DUI with Different Imputation Numbers
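One common way to judge how many imputations are enough is Rubin's relative efficiency, 1 / (1 + λ/m), where λ is the fraction of missing information and m is the number of imputations. The Python sketch below tabulates it for a few values of m. The value λ = 0.5 is an illustrative assumption: the fraction of missing information is not the same thing as the missing data rate, and the slides do not report it.

```python
def relative_efficiency(fraction_missing_info, m):
    """Efficiency of m imputations relative to infinitely many (Rubin, 1987)."""
    return 1.0 / (1.0 + fraction_missing_info / m)


lam = 0.5  # assumed fraction of missing information, for illustration only

for m in (3, 5, 10, 20):
    print(f"m = {m:2d}: relative efficiency = {relative_efficiency(lam, m):.3f}")
```

Under this assumption the gain from going beyond 5 to 10 imputations is small, which is in line with the finding on the slide.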
Summary---Answers?
• May I use MI to deal with missing data problems for my data sets?
It seems a good idea to try MI. How well it works depends on the missing data mechanisms of the variables with missing values in your data sets (however, even our results with MI for NMAR were better than with no MI).
• How can I believe that MI will give me better analysis results?
We found that using MI on our example gave us much better analysis results than no MI (complete cases only).
• How can I get better analysis results by using MI?
Understand the relationships between variables in your data sets; know the missing data mechanisms of the variables; determine the percent of missing information; build a reasonable imputation model; use Proc MI options wisely.
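For readers who want to see what happens after the imputations are generated, the Python sketch below shows the standard Rubin's rules for combining one coefficient across m imputed data sets: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. This is a minimal sketch, not the authors' SAS workflow, and the per-imputation estimates and standard errors below are made-up numbers, not results from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Combine per-imputation estimates and SEs using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average squared SE
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical DUI coefficients from m = 5 imputed data sets
estimates = [0.90, 0.95, 0.88, 0.93, 0.91]
standard_errors = [0.24, 0.23, 0.25, 0.24, 0.24]

pooled_estimate, pooled_se = pool_rubin(estimates, standard_errors)
print(f"pooled estimate = {pooled_estimate:.4f}, pooled SE = {pooled_se:.4f}")
```

The pooled standard error is never smaller than the average within-imputation standard error, which is how MI carries the uncertainty about the missing values into the final interval.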
Special thanks to Dr. Mike McGlincy, who gave us helpful suggestions during our study of sensitivity analyses on imputed values and insightful comments on the analysis results.
1. The MI procedure assumes that the data are from a continuous multivariate distribution. It also assumes that the data are from a multivariate normal distribution when the MCMC method is used. According to Schafer’s MI FAQ page, MI tends to be quite forgiving of departures from the normality assumption. For example, when working with binary or ordered categorical variables, it is often acceptable to impute under a normality assumption and then round off the continuous imputed values to the nearest category. Variables whose distributions are heavily skewed may be transformed to approximate normality and then transformed back to their original scale after imputation.
2. Proc MI and Proc MIANALYZE assume that the missing data are Missing At Random (MAR). MCAR is unlikely for real-world crash data sets. NMAR may be shifted toward MAR by using a richer imputation model to help predict missing values, because crash data sets include many related variables that can help predict each other.
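The rounding and back-transformation described in note 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the imputed values below are made up, the categories are taken to be 0 and 1 (as for a binary DUI flag), and a log transform is assumed as the normalizing transformation for a positive, right-skewed variable.

```python
import math


def round_to_category(value, categories=(0, 1)):
    """Map a continuous imputed value to the nearest allowed category."""
    return min(categories, key=lambda c: abs(c - value))


def to_normal_scale(x):
    """Assumed transform for a positive, right-skewed variable."""
    return math.log(x)


def to_original_scale(y):
    """Inverse of to_normal_scale, applied after imputation."""
    return math.exp(y)


# Hypothetical continuous imputations for a binary variable
imputed = [0.12, 0.81, -0.07, 1.20]
rounded = [round_to_category(v) for v in imputed]
print(rounded)

# A value imputed on the log scale is mapped back to the original scale
print(to_original_scale(to_normal_scale(37.5)))
```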