How to Handle Missing Values in Multivariate Data
By Jeff McNeal & Marlen Roberts

Dec 19, 2015

The Missing Data Problem

• Problems with Statistical Inference

• Sample Size & Power

• Biased Results

Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.

Real World Examples

• Respondents in a household survey refuse to report income

• Missing results of manufacturing experiment due to equipment failure

• Voters’ inability to express preference for a political candidate in an opinion poll

Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.

Outline

• Common Assumptions and Missing Data Patterns

• Taxonomy of Methods for Handling Missing Values

• Multiple Imputation

• Maximum Likelihood

• Simulation

Missing Data Patterns

• Not all missing data are created equal

• Missing due to a random process

• Missing due to a non-random process

A Simple Example: Income Survey

Westfall, P., & Henning, K. (2013). Understanding Advanced Statistical Methods (1st ed.). Boca Raton, Florida: CRC Press, Taylor & Francis Group.

Univariate Missing Data Process: MCAR

P.H. Westfall

Multivariate Missing Data Processes: MCAR and MAR

http://support.sas.com/resources/papers/proceedings12/312-2012.pdf

Missing Data Processes: MNAR

http://www.stat.columbia.edu/~gelman/arm/missing.pdf
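
The three missing-data processes named on the preceding slides (MCAR, MAR, MNAR) can be contrasted with a small simulation. This is a minimal sketch, not the authors' example: the income model, the logistic missingness probabilities, and all variable names are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 20_000

# Hypothetical survey: age is always observed, income may be missing.
age = rng.normal(45.0, 12.0, size=n)
income = 20.0 + 0.8 * age + rng.normal(0.0, 10.0, size=n)  # invented model, in $1000s


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def standardize(v):
    return (v - v.mean()) / v.std()


# Probability that income is missing under each (assumed) mechanism.
p_missing = {
    "MCAR": np.full(n, 0.3),                              # unrelated to any variable
    "MAR": sigmoid(1.5 * standardize(age) - 1.0),         # depends only on observed age
    "MNAR": sigmoid(1.5 * standardize(income) - 1.0),     # depends on income itself
}

true_mean = income.mean()
observed_means = {}
for name, p in p_missing.items():
    missing = rng.random(n) < p
    observed_means[name] = income[~missing].mean()
    print(f"{name}: observed-case mean = {observed_means[name]:.2f} "
          f"(full-data mean = {true_mean:.2f})")
```

Under MCAR the observed-case mean stays close to the full-data mean. Under MAR and MNAR it drifts, but only in the MAR case can the drift be corrected using the observed variable (age), because there the missingness depends on nothing unobserved.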

Taxonomy of Missing-Data Methods

• Complete Case Analysis (Listwise Deletion)

• Available Case Analysis (Pairwise Deletion)

• Least Squares on Imputed Data

• Multiple Imputation

• Maximum Likelihood (and Bayes)

Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 19-20). Hoboken, New Jersey: John Wiley & Sons.

Complete Case Analysis (Listwise Deletion)

• Easy to implement

• Works well when MCAR assumption is met

• Wastes a lot of information

http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
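
Listwise deletion is simple enough to show in a few lines. The sketch below uses pandas and a toy data set whose column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data set (values invented for illustration); NaN marks a missing value.
df = pd.DataFrame({
    "age":    [25, 32, 47, 51, 38, 29],
    "income": [40.0, np.nan, 85.0, 90.0, np.nan, 48.0],
    "score":  [3.1, 2.8, np.nan, 4.0, 3.5, 3.0],
})

# Listwise deletion: keep only the rows with no missing value in any column.
complete_cases = df.dropna()

print(f"rows before: {len(df)}, rows after: {len(complete_cases)}")
print(complete_cases)
```

Only 3 of the 18 cells are missing here, yet half of the rows are discarded, which is the information loss the slide refers to.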

Available Case Analysis (Pairwise Deletion)

• Attempts to minimize the loss of data in listwise deletion

• Increases statistical power relative to listwise deletion

• Is usually outperformed by Maximum Likelihood

• Caveat: Can result in non-positive definite covariance matrices

http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
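
The caveat about non-positive definite matrices can be reproduced with a deliberately contrived example. In the sketch below (data invented for illustration), each pair of variables is jointly observed on a different set of rows, so the pairwise correlations are mutually inconsistent. It relies on the fact that pandas `DataFrame.corr()` excludes missing values pair by pair.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Contrived data: every pair of columns overlaps on a different block of three rows.
df = pd.DataFrame({
    "x1": [1, 2, 3,  1, 2, 3,  nan, nan, nan],
    "x2": [1, 2, 3,  nan, nan, nan,  1, 2, 3],
    "x3": [nan, nan, nan,  1, 2, 3,  3, 2, 1],
})

# Pairwise deletion: each correlation uses all rows where both columns are observed.
pairwise_corr = df.corr()
eigenvalues = np.linalg.eigvalsh(pairwise_corr.to_numpy())

print(pairwise_corr)
print("smallest eigenvalue:", eigenvalues.min())
print("rows left under listwise deletion:", len(df.dropna()))
```

The pairwise matrix says x1 is perfectly correlated with both x2 and x3 while x2 and x3 are perfectly negatively correlated, which no single complete data set could produce. The negative eigenvalue is the symptom. Listwise deletion would leave no rows at all in this extreme case.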

Least Squares Imputation Methods

• Unconditional Mean Substitution

• Conditional Mean Imputation based on X

• Conditional Mean Imputation based on X and Y

http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf

Unconditional Mean Substitution

• Just take the sample mean of the observed data and use it for the missing values

• Heavily biases the covariance matrix

• Bias can be corrected, but the inferences (confidence intervals, tests, etc.) remain distorted and over-precise

http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
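
The effect of unconditional mean substitution on variances and covariances is easy to see numerically. The following sketch uses two short invented series; `x` has three missing values and `y` is fully observed.

```python
import numpy as np
import pandas as pd

# Invented data: x is missing for the last three cases, y is fully observed.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, np.nan, np.nan, np.nan])
y = pd.Series([2.1, 3.9, 6.2, 8.1, 9.8, 12.0, 14.1, 15.9])

# Unconditional mean substitution: every missing x gets the observed mean.
x_filled = x.fillna(x.mean())

var_observed = x.var()        # variance of the observed values only
var_filled = x_filled.var()   # variance after filling in the mean
cov_observed = x.cov(y)       # covariance over cases where x is observed
cov_filled = x_filled.cov(y)  # covariance after filling in the mean

print(f"variance:   observed {var_observed:.3f} -> filled {var_filled:.3f}")
print(f"covariance: observed {cov_observed:.3f} -> filled {cov_filled:.3f}")
```

The filled-in cases sit exactly at the mean, so they add sample size without adding spread. Both the variance and the covariance shrink toward zero, and standard errors computed from the filled data look better than they should.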

Conditional Mean Imputation

http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
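
The slide's own illustration did not survive in this transcript. As a stand-in, here is a minimal sketch of conditional mean (regression) imputation for one predictor given another. The variable names `x1` and `x2` and the data are assumptions made for illustration, and the simple linear fit is one possible imputation model, not necessarily the one the authors showed.

```python
import numpy as np

# x1 is fully observed; x2 has missing entries (NaN). Values are invented.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 4.1, 5.9, np.nan, 10.2, np.nan])

observed = ~np.isnan(x2)

# Fit x2 = a + b * x1 by least squares, using the complete cases only.
b, a = np.polyfit(x1[observed], x2[observed], deg=1)

# Replace each missing x2 with its estimated conditional mean given x1.
x2_imputed = x2.copy()
x2_imputed[~observed] = a + b * x1[~observed]

print("imputed x2:", np.round(x2_imputed, 2))
```

Every imputed value lies exactly on the fitted line, so the filled-in data show less scatter than real data would. That understated variability is what multiple imputation addresses by adding random noise to the imputations.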

Multiple Imputation

Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 19-20). Hoboken, New Jersey: John Wiley & Sons.

Steps Involved in Multiple Imputation

• Introduce random variation into the process of imputing missing values

• Generate several data sets, each with different imputed values

• Perform an analysis on each data set

• Combine the results into a single set of parameter estimates, standard errors, and test statistics

http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
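
The four steps above can be sketched end to end on simulated data. This is a deliberately simplified illustration under stated assumptions: the data-generating model, the 40% MCAR missingness, and the helper `impute_once` are all invented here, and the imputation only adds a random residual to a regression prediction. A proper multiple-imputation procedure would also draw the regression parameters themselves at random for each data set.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data (an assumption): y depends on x, and about 40% of x is missing at random.
n = 500
x = rng.normal(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)
x_obs = x.copy()
x_obs[rng.random(n) < 0.4] = np.nan
miss = np.isnan(x_obs)


def impute_once(x_obs, y, miss, rng):
    """Hypothetical helper: regression prediction of x from y plus a random residual."""
    slope, intercept = np.polyfit(y[~miss], x_obs[~miss], deg=1)
    resid = x_obs[~miss] - (intercept + slope * y[~miss])
    sigma = resid.std(ddof=2)
    filled = x_obs.copy()
    noise = rng.normal(0.0, sigma, size=int(miss.sum()))  # step 1: random variation
    filled[miss] = intercept + slope * y[miss] + noise
    return filled


m = 20  # number of imputed data sets
slopes = []
for _ in range(m):
    x_completed = impute_once(x_obs, y, miss, rng)    # step 2: one imputed data set
    slope_y_on_x, _ = np.polyfit(x_completed, y, 1)   # step 3: analyze it
    slopes.append(slope_y_on_x)

pooled_slope = float(np.mean(slopes))                 # step 4: combine (point estimate only)
print(f"pooled slope over {m} imputations: {pooled_slope:.3f}")
print(f"spread across imputations (SD): {np.std(slopes, ddof=1):.3f}")
```

The slope differs from one imputed data set to the next. That spread is the extra uncertainty due to the missing values, and it is what the combining step folds into the final standard error.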

Introducing Randomness into a M.I. Model

http://support.sas.com/resources/papers/proceedings12/312-2012.pdf

Adding Variability to the Imputed Values

http://support.sas.com/resources/papers/proceedings12/312-2012.pdf

Why Do We Want to Add Variability?

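• This is the whole point of multiple imputation: the variation across imputed data sets carries the uncertainty about the missing values into the standard errors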

http://www.stat.columbia.edu/~gelman/arm/missing.pdf

Combining Inferences from Imputed Data

http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
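
The formulas from this slide are not preserved in the transcript. The standard way to combine results is commonly called Rubin's rules: average the m point estimates, then add the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The sketch below implements that; the function name `pool_rubin` and the example numbers are hypothetical.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()              # pooled point estimate
    within = variances.mean()             # average within-imputation variance
    between = estimates.var(ddof=1)       # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, np.sqrt(total)


# Hypothetical results from m = 5 imputed data sets.
estimates = [2.0, 2.2, 1.8, 2.1, 1.9]
std_errors = [0.30, 0.32, 0.29, 0.31, 0.30]

pooled_estimate, pooled_se = pool_rubin(estimates, np.square(std_errors))
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is larger than any of the individual standard errors because it includes the disagreement between imputations, not just the sampling error within each one.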

Simplified Form using a Regression Example

http://www.stat.columbia.edu/~gelman/arm/missing.pdf

Likelihood-Based Inference

https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf

ML with Ignorable Missing Data

https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf

ML with Ignorable Missing Data

https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf
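
The slide content for this topic is not preserved in the transcript, so the following is only an illustrative special case, not the authors' derivation. For bivariate normal data where x is fully observed and y is missing for some cases under an ignorable mechanism, a standard textbook result is that the maximum likelihood estimate of the mean of y equals the complete-case mean of y adjusted by the regression slope times the gap between the full-sample and complete-case means of x. The data and variable names below are invented.

```python
import numpy as np

# Invented data: y is missing whenever x > 3, so missingness depends only on observed x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.0, np.nan, np.nan, np.nan])
obs = ~np.isnan(y)

# Regression of y on x, estimated from the complete cases.
beta, alpha = np.polyfit(x[obs], y[obs], deg=1)

mu_y_cc = y[obs].mean()                                   # complete-case estimate
mu_x_all = x.mean()                                       # uses every observed x
mu_y_ml = mu_y_cc + beta * (mu_x_all - x[obs].mean())     # ML estimate under the assumed model

print(f"complete-case mean of y: {mu_y_cc:.3f}")
print(f"ML estimate of mean of y: {mu_y_ml:.3f}")
```

The complete-case mean only reflects the low-x cases. The ML estimate uses the x values of the incomplete cases as well, which is how likelihood-based methods recover information that listwise deletion throws away, without imputing any individual value.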

Comparison of Methods

Listwise
• Easiest to implement
• Has minimal effect if data are MCAR, or MAR for large sample sizes
• Has a tendency to bias results

Pairwise
• Uses more information than listwise
• Increases statistical power
• Also easy to implement

Multiple Imputation
• Requires no special software once the imputed datasets are generated
• Requires specification of a model
• Requires more assumptions

Maximum Likelihood
• Requires specification of a model for each variable
• Most asymptotically efficient
• Most complex
• You get model comparison statistics (AIC, BIC, etc.)