Practical Missing Data Analysis in SPSS (v17 onwards) Peter T. Donnan Professor of Epidemiology and Biostatistics

Jan 13, 2016


Elaine Booker
Transcript
Page 1: Practical Missing Data Analysis in SPSS (v17 onwards) Peter T. Donnan Professor of Epidemiology and Biostatistics.

Practical Missing Data Analysis in SPSS (v17 onwards)

Peter T. Donnan

Professor of Epidemiology and Biostatistics

Page 2:

Objectives

• How to impute missing values in SPSS, specifically MI

• How to implement analyses with multiple imputed values

• Interpretation of the output

• Practical tips

Page 3:

Example data

From trial of pedometers + advice vs advice vs controls in sedentary elderly women

Follow-up at 3 and 6 months

Main outcome measure of activity from accelerometer counts

210 randomised / 170 at 3 months

Page 4:

Example data – Pedometer trial

Read in data ‘SPSS Study databse.sav’

Main outcome is:

3-month activity – AccelVM2

Baseline activity – AccelVM1a

Trial arm represented by two dummy variables:

Grp1 = Pedometer vs. control

Grp2 = Advice vs. control

Page 5:

Main analysis – Pedometer trial

Regression on 3 months activity adjusting for baseline activity and two dummy variables representing trial arm contrasts
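To make the structure of the main analysis concrete, here is a minimal Python sketch of a complete-case regression of AccelVM2 on AccelVM1a, Grp1 and Grp2. It is an illustration only, not the SPSS procedure: the data are synthetic (invented here, in thousands of counts), the `ols` helper is hypothetical, and only the variable names and the 210 / 170 sample sizes come from the slides.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1)

# Synthetic stand-in for the trial data: 210 women in three equal arms.
rows = []
for i in range(210):
    arm = i % 3                      # 0 = control, 1 = pedometer, 2 = advice
    grp1 = 1 if arm == 1 else 0      # dummy: pedometer vs. control
    grp2 = 1 if arm == 2 else 0      # dummy: advice vs. control
    vm1 = random.gauss(125, 45)      # baseline activity (made-up scale)
    vm2 = 15 + 0.85 * vm1 + 10 * grp1 + 15 * grp2 + random.gauss(0, 30)
    rows.append({"AccelVM1a": vm1, "AccelVM2": vm2, "Grp1": grp1, "Grp2": grp2})

# Blank out 40 follow-up values to mimic drop-out at 3 months.
for r in random.sample(rows, 40):
    r["AccelVM2"] = None

# Complete-case analysis: only rows with an observed outcome enter the model.
complete = [r for r in rows if r["AccelVM2"] is not None]
X = [[1.0, r["AccelVM1a"], r["Grp1"], r["Grp2"]] for r in complete]
y = [r["AccelVM2"] for r in complete]
coef = ols(X, y)  # [constant, AccelVM1a, Grp1, Grp2]

print("n used:", len(complete))
print("coefficients:", [round(c, 3) for c in coef])
```

The point of the sketch is the row filter: any woman without a 3-month value drops out of the model entirely, which is what the multiple imputation steps later try to address.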

Page 6:

Main analysis – Pedometer trial

Note that n = 170, with 40 missing in complete case analysis, and so potential for bias

Page 7:

Missing at Random (MAR)

• Prob(Missing) is 1) independent of unobserved data but 2) dependent on observed data

• Essentially, observed data is a random sample of the full data in each stratum

• MAR is a weaker version of the MCAR (Missing Completely at Random) assumption

• If MAR is assumed, many methods are possible to impute data using observed data.
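The "random sample in each stratum" idea can be illustrated with a small simulation. This is a sketch under invented assumptions: the stratum (a yes/no covariate such as difficulty with stairs), the outcome means and the missingness probabilities are all made up. Missingness depends only on the observed stratum, so the data are MAR; the overall complete-case mean is biased, while the observed mean within each stratum stays close to the full-data mean.

```python
import random

random.seed(0)

def mean(values):
    return sum(values) / len(values)

# Each person: (stratum, outcome, observed?). All numbers are hypothetical.
people = []
for _ in range(20000):
    difficult = random.random() < 0.3             # observed covariate
    y = random.gauss(90 if difficult else 130, 20)
    p_miss = 0.4 if difficult else 0.1            # depends on the stratum only
    observed = random.random() >= p_miss
    people.append((difficult, y, observed))

# Overall: complete cases over-represent the low-missingness stratum.
full_mean = mean([y for _, y, _ in people])
cc_mean = mean([y for _, y, obs in people if obs])

# Within each stratum the observed values are a random sample of all values.
stratum_gap = {}
for d in (True, False):
    full = mean([y for s, y, _ in people if s == d])
    seen = mean([y for s, y, obs in people if s == d and obs])
    stratum_gap[d] = seen - full

print("full-data mean:     ", round(full_mean, 1))
print("complete-case mean: ", round(cc_mean, 1))
print("within-stratum gaps:", {k: round(v, 2) for k, v in stratum_gap.items()})
```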

Page 8:

                                          Completers          Dropped out at        Chi-squared or
                                          (n = 172)           3 months (n = 32)     t-test p-value

Age, Mean (SD)                            77.1 (5.0)          78.5 (5.6)            0.137
Accelerometer VM, Mean (SD)               130695 (47991)      113381 (50444)        0.065
Limb Function, Mean (SD)                  8.69 (2.25)         7.41 (2.86)           0.028
NHS Costs previous 3 months, Mean (SD)    £199.59 (306.74)    £404.29 (1289.54)     0.402
Pedometer Group, N (%)                    58 (85.3%)          10 (14.7%)            0.052
BCI Group, N (%)                          52 (77.6%)          15 (22.4%)
Control Group, N (%)                      62 (92.5%)          5 (7.5%)
Stairs difficult, Yes                     48 (76.2%)          15 (23.8%)            0.033
Stairs difficult, No                      124 (87.9%)         17 (12.1%)

Comparison of completers at 3 months and drop-outs
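As a check on how the categorical comparisons in this table can be computed, the sketch below runs a Pearson chi-squared test (no continuity correction, which is an assumption about the method used) on the counts shown for "Stairs difficult" and for trial arm. The function names are hypothetical, and the p-value helper only covers 1 or 2 degrees of freedom, which is all these two tables need.

```python
import math

def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def chi2_pvalue(chi2, df):
    """Upper-tail p-value; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles df = 1 or 2")

# Columns: completers, dropped out (counts as given in the table).
stairs = [[48, 15],    # stairs difficult: yes
          [124, 17]]   # stairs difficult: no
arms = [[58, 10],      # pedometer group
        [52, 15],      # BCI group
        [62, 5]]       # control group

for name, table in (("Stairs difficult", stairs), ("Trial arm", arms)):
    chi2, df = pearson_chi2(table)
    print(name, "chi2 =", round(chi2, 2), "df =", df,
          "p =", round(chi2_pvalue(chi2, df), 3))
```

With these counts the two p-values agree with the 0.033 and 0.052 reported in the table to three decimal places.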

Page 9:

Execution of MI in SPSS

So assuming MAR we can use the available data to predict missing values in SPSS:

Analyze

Multiple Imputation

Impute Missing Data Values

Page 10:

Execution of MI in SPSS

Enter ALL variables you think are associated with missingness

Note default imputation number = 5

Create new dataset to store results

Note icon indicating procedures that allow MI analysis

Page 11:

Execution of MI in SPSS

Automatic method lets SPSS choose

Custom gives more flexibility

Can include all 2-way interactions

Linear Regression model prediction

Page 12:

Execution of MI in SPSS

List of variables chosen

Define each variable for imputation or predictor or BOTH

N.b. Recommend including the OUTCOME as both predictor and outcome
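To show what "linear regression model prediction" with 5 imputations means in principle, here is a deliberately simplified Python sketch. It is not SPSS's algorithm: it uses a single predictor, fits the regression on the complete cases, and fills each missing outcome with the prediction plus random residual noise, repeated 5 times. A proper MI procedure would also draw the regression parameters afresh for each imputation. The data and the `impute_once` helper are hypothetical.

```python
import random
import statistics

def impute_once(x, y, rng):
    """Fill missing y (None) with a regression prediction plus residual noise."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = statistics.mean(a for a, _ in pairs)
    my = statistics.mean(b for _, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in pairs)
    resid_sd = (rss / (len(pairs) - 2)) ** 0.5
    return [b if b is not None
            else intercept + slope * a + rng.gauss(0, resid_sd)
            for a, b in zip(x, y)]

rng = random.Random(42)

# Synthetic baseline (predictor) and 3-month outcome with some values missing.
vm1 = [rng.gauss(125, 45) for _ in range(60)]
vm2 = [15 + 0.85 * a + rng.gauss(0, 30) for a in vm1]
for i in rng.sample(range(60), 12):
    vm2[i] = None

m = 5  # default number of imputations noted on the slides
imputed_sets = [impute_once(vm1, vm2, rng) for _ in range(m)]

missing_index = vm2.index(None)
print("imputed values for one missing case:",
      [round(s[missing_index], 1) for s in imputed_sets])
```

The 5 filled-in values for the same woman differ from one another; that spread is what later carries the uncertainty about the missing data into the pooled standard errors.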

Page 13:

Output of MI in SPSS

Note main interest in outcome VM2 but other factors with missing values also imputed

Page 14:

Step 2 - Using Imputed datasets in analysis

Note new dataset has IMPUTATION number as first column and contains in order the original dataset (n = 210), IMPUTATION = 0, and concatenated below it a further 5 new datasets (each n = 210) but now with imputed values, IMPUTATION = 1 to 5

Most analyses can now be implemented if the fossil shell spiral symbol is present
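The stacked layout described above can be pictured with a short Python sketch. The tiny dataset, the filled-in values and the column name `Imputation_` are assumptions for illustration; the slides only say that the imputation number is the first column, with 0 for the original data and 1 to 5 for the imputed copies.

```python
# Original data with a missing outcome (None); values are made up.
original = [
    {"id": 1, "AccelVM2": 120.0},
    {"id": 2, "AccelVM2": None},
    {"id": 3, "AccelVM2": 95.5},
]

# Hypothetical filled-in values for the missing case in each of 5 imputations.
fills = [101.2, 98.7, 110.4, 104.9, 93.3]

stacked = []
# Imputation 0 is the original dataset, kept unchanged.
for row in original:
    stacked.append({"Imputation_": 0, **row})
# Imputations 1..5 are full copies with the gaps filled in.
for k, fill in enumerate(fills, start=1):
    for row in original:
        value = row["AccelVM2"] if row["AccelVM2"] is not None else fill
        stacked.append({"Imputation_": k, "id": row["id"], "AccelVM2": value})

# An analysis is then run once per imputation number and the results pooled.
by_imputation = {}
for row in stacked:
    by_imputation.setdefault(row["Imputation_"], []).append(row)

for k, rows in sorted(by_imputation.items()):
    print(k, [r["AccelVM2"] for r in rows])
```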

Page 15:

Repeat Main analysis – Need Pooled Results

Procedure exactly same as before

SPSS will do the pooled analysis if the icon (above) is present in the drop-down menu

Page 16:

Pooled Analysis in SPSS

Results presented for the original data and for each imputed dataset separately

Page 17:

Results of pooled analysis from 5 imputed datasets

Model (Pooled)     B        SE      t        Sig.     Fraction missing

Constant           15607    7808    1.999    0.047    0.173
AccelVM1a          0.852    0.051   16.630   0.000    0.124
Pedometer Group    11310    6131    1.845    0.066    0.138
Advice only        17536    6526    2.687    0.009    0.266

Larger effect sizes in both groups

Greater power gives more significance

Page 18:

Interpretation

Compare pooled results with the original as a form of sensitivity analysis

If results are similar, this suggests the original results are fairly robust

Consider whether MAR is a reasonable assumption

Consider whether you have included all factors (including the outcome) related to the missingness in the imputation model, as this is a crucial assumption

Page 19:

Summary

• SPSS now includes Multiple imputation in its armoury

• Consider assumptions of MI

• Compare results under different assumptions to assess robustness of results

• If MAR assumption o.k. then MI provides results that are less biased than complete case analysis