
Use, misuse, and abuse of statistics

The European Academy of Nursing Science, Year 2, Friday

Geert Verbeke

Interuniversity Institute for Biostatistics and statistical Bioinformatics

[email protected]

http://perswww.kuleuven.be/geert_verbeke

Contents

1 Introduction

2 Errors in statistics: Basic concepts

3 Errors in statistics: Practical implications

4 Quiz

5 Clustered data

6 Missing observations

Bibliography

EANS: Use, misuse, and abuse of statistics i

Chapter 1

Introduction

. Focus of the course

. Course materials


1.1 I will NOT talk about . . .

• Mathematics

• Technical details

• Software

• Algorithms

• . . .


1.2 I will focus on . . .

• Use and misuse of statistics

• Frequently observed errors

• Some misconceptions

• Applications

• Publications

• Intuition

• . . .


1.3 Course material

• Course notes, also available from:

http://perswww.kuleuven.be/geert_verbeke/courses

• Online voting tool ‘Poll Everywhere’:

http://pollev.com/geertverbeke

iOS equipment


Chapter 2

Errors in statistics: Basic concepts

. Introduction

. Two types of errors


2.1 Introduction

• Consider the comparison of weight gains in rats with high (group 1) or low (group 2) protein level diets:

• On average, there is an observed difference of 19g between both groups.

• Formal comparison can be based on the unpaired t-test in which one tests

H0 : µ1 = µ2 versus HA : µ1 ≠ µ2,

where µ1 and µ2 are the means of large populations of rats fed with high or low protein level diets, respectively.


[Figure: a random SAMPLE is drawn from the POPULATION. The sample yields estimates for µ1 and µ2 (120 and 101), which are used for inference and estimation regarding the hypotheses H0 : µ1 = µ2 versus HA : µ1 ≠ µ2 in the population.]


• Result:

There is no significant difference (p = 0.0757) in weight gain

between rats on a high protein level diet,

and rats on a low protein level diet

• The result of any statistical test should be interpreted as evidence in favour or against the null hypothesis, and should not be interpreted as formal proof.

• In our example, maybe a true difference was too small to be detected based on such a small experiment.

• Alternatively, p = 0.001 would only indicate that the observed difference of 19g is unlikely to occur by pure chance, but maybe our sample was indeed the extreme one that happens once every 1000 experiments.


• Hence, whenever statistical tests are used, errors in the conclusions can occur.

• It is therefore important to quantify the errors, and to keep them under control.

• This is the case for our t-test but also for any other test, i.e., each time a p-value is calculated and interpreted:

. (un-)paired t-test

. chi-squared test

. linear regression

. ANOVA

. logistic regression

. . . .


2.2 Two types of errors

                         Reality
Test result      H0 correct      H0 not correct
Accept H0        No error        Type II error
Reject H0        Type I error    No error

• Type I error: H0 is incorrectly rejected

• Type II error: H0 is incorrectly accepted


• The probability of a type I error can easily be controlled by choosing the level of significance α sufficiently small:

P (Type I error) = α = 1%, 5%, or 10%

• The probability of a type II error can only be controlled by conducting sufficiently large experiments:

P (Type II error) = 1 − power =⇒ power calculation, sample size calculation
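• The link between power and sample size can be sketched numerically. The block below is a minimal sketch, assuming a two-sided comparison of two means with equal group sizes, a known common standard deviation, and the normal approximation. The function names, the standard deviation of 25g, and the group sizes are illustrative assumptions, not values from the rat experiment.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_two_groups(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means."""
    se = sd * sqrt(2.0 / n_per_group)          # standard error of the difference
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)

def n_per_group_for_power(delta, sd, power=0.80, alpha=0.05):
    """Smallest group size reaching the requested power (approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (sd * (z_alpha + z_beta) / abs(delta)) ** 2)

# Hypothetical inputs: true difference 19g, assumed SD 25g.
print(power_two_groups(19, 25, n_per_group=10))   # power of a small experiment
print(n_per_group_for_power(19, 25))              # group size for 80% power
```

With these assumed inputs, a small experiment has a type II error probability well above 50%, which illustrates why a non-significant result does not prove the absence of a difference.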


Chapter 3

Errors in statistics: Practical implications

. Multiple testing

. Bonferroni correction

. Tests for baseline differences

. Equivalence tests

. Significance versus relevance

. Examples from biomedical literature


3.1 Multiple testing

• Each time a test is performed, there is probability α of making a type I error

• For example, if α = 0.05, we can expect to incorrectly reject a true null hypothesis in 5 out of 100 tests.

• Implication:

“The more tests one performs, the higher the probability that something is detected by pure chance”

• This problem of multiple testing occurs very frequently in bio-medical sciences, in various settings
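• How fast this probability grows can be sketched as follows, assuming that all null hypotheses are true and that the tests are independent (an assumption made here for simplicity; real outcomes are often correlated). The function name is hypothetical.

```python
def prob_at_least_one_false_positive(k, alpha=0.05):
    """P(at least one type I error) in k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 100):
    print(f"{k:3d} tests: {prob_at_least_one_false_positive(k):.3f}")
```

Under these assumptions, a single test has a 5% chance of a false positive, while with 100 tests at least one false positive is almost certain.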


3.1.1 Example: A classroom experiment

• On entry in the classroom, assign each student at random a seat at the left or at the right side of the classroom

• Compare both sides with respect to 100 aspects including weight, height, age, gender, color of hair, color of eyes, . . .

• It is to be expected that for about 5 of these outcomes, a significant difference is obtained at the 5% level of significance, by pure chance.
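• The number of chance findings in such an experiment can be sketched with the binomial distribution. This is a minimal sketch, assuming the 100 comparisons are independent (in reality, aspects such as weight and height are correlated, so the numbers are only indicative). The helper name binom_pmf is hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k 'significant' results among n true nulls."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tests, alpha = 100, 0.05
expected = n_tests * alpha                                  # average number of false positives
p_at_least_one = 1 - binom_pmf(0, n_tests, alpha)
p_at_least_five = 1 - sum(binom_pmf(k, n_tests, alpha) for k in range(5))

print(expected, p_at_least_one, p_at_least_five)
```

Under this assumption, 5 false positives are expected on average, at least one is almost certain, and five or more occur in a bit more than half of such classrooms.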


3.1.2 Example: Testing many relations

• Amin et al. [1], Table 2:

. 18 tests performed

. only 2 significant results


3.1.3 Example: Subgroup analyses

• Kaplan et al. [2], Table 5:

. Tests based on C.I.’s for odds ratios

. C.I. containing 1 is equivalent to a non-significant test result

. 21 × 3 = 63 tests performed

. only 5 significant results


3.1.4 Example: Searching for the most significant results

• This ‘scientific finding’ was printed in the Belgian newspapers:

• It was even stated that those who wake up before 7.21am have a statistically significant higher stress level during the day than those who wake up after 7.21am.


3.1.5 Conclusion

• Significant results obtained by multiple testing are often overinterpreted

• If the number of tests is reported, the reader knows that such results need to be interpreted with extreme care

• The problem arises when only the significant results are reported, and one does not know how many tests were performed in total

• This leads to reporting results which turn out to be not reproducible

• For example, a new study would not find that students seated on the left are taller than those on the right. Instead, they might weigh more.

• For example, a new experiment might show no difference in stress levels between subjects waking up early and those waking up late. Or maybe the critical wake up time would be 8.12am.


3.2 Bonferroni correction

• Suppose two tests are performed, both at the 5% level of significance.

• The probability that at least one type I error will be made can be shown not to exceed 2 × 0.05 = 0.10:

P (at least 1 type I error) ≤ 2 × 5% = 10%

• In general, if k tests are performed, all at the 5% level of significance, the probability of making at least one type I error can only be shown not to exceed k × 5%

• Hence, the overall type I error rate can be controlled by performing each separate test at the α/k level of significance.


• For example, performing 2 tests at the 2.5% level of significance each implies that the probability of making at least one type I error will not exceed 5%.

• In general, when k tests are performed at the α/k level of significance, one is sure that the overall probability of making at least one type I error will not exceed α.

• This correction of the significance level is called the Bonferroni correction.

• Note that, strictly speaking, the Bonferroni correction is an overcorrection, since the overall type I error rate can only be shown not to exceed 5%, and usually will be smaller than the required 5%.

• In some specific testing situations (e.g., ANOVA analysis), more accurate corrections are available (e.g., Tukey test)
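• Applying the Bonferroni correction to a set of reported p-values can be sketched as follows. The function name and the four p-values are made up for illustration. Multiplying each p-value by k and comparing with α is equivalent to comparing the raw p-value with α/k.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold, adjusted p-values, and decisions."""
    k = len(p_values)
    threshold = alpha / k
    adjusted = [min(1.0, p * k) for p in p_values]      # capped at 1
    significant = [p < threshold for p in p_values]
    return threshold, adjusted, significant

p_values = [0.001, 0.020, 0.049, 0.300]   # hypothetical results of 4 tests
threshold, adjusted, significant = bonferroni(p_values)
print(threshold)     # each test is performed at the alpha / 4 level
print(adjusted)
print(significant)
```

In this made-up example, three tests are significant at the uncorrected 5% level, but only one remains significant after correction. The sketch also shows why exact p-values must be reported: with only "p < 0.05" available, the reader cannot redo this calculation.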


3.3 Examples from the biomedical literature

• Baba et al. [3], p.1202 and p.1203:


• Kellett et al. [4], Table 2 (for example):


In the discussion, R.Roy writes:

Note that the reader cannot perform the Bonferroni correction as the exact p-values have not been reported.


3.4 Tests for baseline differences

• In order to show causal effects, patients are often randomized into 2 or more groups

• This ensures (at least in large studies) that all treatment groups are identical, except for the treatment the patients receive

• In (relatively) small studies, imbalances can still occur by pure chance

• Therefore, one often compares the various groups with respect to important factors which are believed to be strongly related to the outcome of interest.

• This is called testing for baseline differences, as one compares the characteristics of the patients at the start of the study.


• As an example, suppose interest is to compare two oral treatments, A and B, for the treatment of hypertension.

• Suppose the change in diastolic BP is the outcome of interest

• Age is one of the factors believed to be strongly related to BP. Therefore, it is important that both treatment groups have the same age distribution

• Therefore, one often tests for age differences between A and B, e.g., based on the two-sample t-test.

• The hypothesis tested is

H0 : µA = µB versus HA : µA ≠ µB

• Note that H0 and HA express properties of the populations, not the samples


• In the populations, we know that, due to the randomization, µA and µB are identical

• Conclusion:

It makes no sense at all to perform baseline tests in randomized studies

• No matter how small the resulting p-value would be (e.g., < 10⁻⁸) we know that the observed difference in age between groups A and B has occurred purely by chance.

• Note also that testing for baseline differences cannot be used to check whether the randomization was done properly.
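• That baseline tests in a randomized study can only produce type I errors can be illustrated by simulation. The block below is a minimal sketch with made-up settings: 100 patients per trial, ages drawn from a normal distribution with mean 60 and a known SD of 10, a z-test instead of a t-test, and a fixed random seed. All names and numbers are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def baseline_rejection_rate(n_trials=2000, n_patients=100, sd_age=10.0,
                            alpha=0.05, seed=1):
    """Fraction of properly randomized trials with a 'significant' age difference."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half = n_patients // 2
    se = sd_age * sqrt(2.0 / half)        # SE of the difference in mean age
    rejections = 0
    for _ in range(n_trials):
        ages = [rng.gauss(60.0, sd_age) for _ in range(n_patients)]
        rng.shuffle(ages)                 # randomization: first half A, second half B
        diff = fmean(ages[:half]) - fmean(ages[half:])
        if abs(diff) / se > z_crit:
            rejections += 1
    return rejections / n_trials

print(baseline_rejection_rate())          # close to alpha, by construction
```

Since H0 is true in every simulated trial, each rejection is a false positive, and the rejection rate stays near α regardless of how the data look in any single trial.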


3.5 Example from the biomedical literature

Nissen et al. [5], abstract and table 1:

A two-arm randomized study


formal tests at baseline


3.6 Equivalence tests

• Suppose two groups A and B are to be compared, and an unpaired t-test is used to test

H0 : µA = µB versus HA : µA ≠ µB

• In case of a non-significant test result, one often concludes that both groups are identical or equivalent

• An alternative interpretation is that the experiment did not have sufficient power to show an effect which is present.

• Conclusion:

Non-significance should not be interpreted as equivalence


• This can also be seen from the fact that, if the t-test could be used to show equivalence, it would be best to collect data on (extremely) small samples, as this would increase the chance to obtain a non-significant result, due to lack of power.

• Instead, one should reverse H0 and HA:

H0 : |µA − µB| > ∆ versus HA : |µA − µB| ≤ ∆

where ∆ is a pre-specified constant, defining ‘equivalence’

• The result of the equivalence test entirely depends on the choice of ∆

• Therefore, ∆ needs to be specified prior to the data collection
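• One common way to test the reversed hypotheses is the 'two one-sided tests' (TOST) procedure. The course notes do not state which procedure is intended, so the block below is only a sketch of that technique, assuming a normal approximation with known standard error. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def equivalence_p_value(diff, se, delta):
    """p-value for H0: |muA - muB| > delta, via two one-sided z-tests."""
    std_normal = NormalDist()
    p_lower = 1 - std_normal.cdf((diff + delta) / se)   # H0: muA - muB <= -delta
    p_upper = std_normal.cdf((diff - delta) / se)       # H0: muA - muB >= +delta
    return max(p_lower, p_upper)                        # both must be rejected

# Same observed difference (0.5) and margin (delta = 3), different precision.
print(equivalence_p_value(0.5, se=1.0, delta=3.0))   # large study: small p-value
print(equivalence_p_value(0.5, se=5.0, delta=3.0))   # small study: large p-value
```

In contrast to the ordinary t-test, a small and imprecise study (large standard error) cannot lead to a conclusion of equivalence here, so lack of power no longer works in favour of the desired conclusion.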


3.7 Example from the biomedical literature

Shatari et al. [6]:

• Title:


• Table 1:

No significant differences !


• Results and conclusions (abstract):


3.8 Significance versus relevance

• The power to detect some effect increases with the sample size

• This implies that any effect, no matter how small, will, sooner or later, be detected, if the sample is sufficiently large.

• For example, consider an experiment in which hypertensive patients receive some treatment.

• The outcome of interest is change in BP:

BPbefore − BPafter

• Suppose that the observed difference would be 0.1 mmHg.


• A p-value as small as 0.001 would be likely to be obtained, provided that the sample would be sufficiently large.

• Obviously, an average change in BP as small as 0.1 mmHg is not relevant from a clinical point of view.

• Conclusion:

Statistical significance ≠ Clinical relevance
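• The effect of the sample size on the p-value can be sketched for the 0.1 mmHg example. This is a minimal sketch, assuming a one-sample z-test and a standard deviation of 10 mmHg for the individual changes. The SD and the sample sizes are assumptions made for illustration only.

```python
from math import erfc, sqrt

def two_sided_p(mean_change, sd, n):
    """Two-sided p-value of a z-test for H0: mean change = 0."""
    z = abs(mean_change) / (sd / sqrt(n))
    return erfc(z / sqrt(2))      # equals 2 * (1 - Phi(z)), accurate in the tails

for n in (100, 10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_sided_p(0.1, 10.0, n):.3g}")
```

The observed change is the same clinically irrelevant 0.1 mmHg in every row. Only the sample size differs, and with it the p-value moves from clearly non-significant to extremely small.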


• A highly significant effect can be a large effect:

[Figure: 95% C.I. for µ relative to 0; p = 0.0001]

• A highly significant effect can also be a very small effect, but estimated with high precision, due to a large sample size:

[Figure: 95% C.I. for µ relative to 0; p = 0.0001]


• The p-value cannot distinguish between both situations

• It is therefore important not to blindly overinterpret significant results without knowing the size of the effect

• This is another reason why confidence intervals are to be preferred over significance testing
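• The two situations can be reproduced numerically. The block below is a minimal sketch, assuming normally distributed estimates. The two estimates and their standard errors are made up so that both give the same p-value.

```python
from math import erfc, sqrt
from statistics import NormalDist

def summarize(estimate, se, level=0.95):
    """Two-sided p-value for H0: mu = 0 and the confidence interval for mu."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = erfc(abs(estimate) / se / sqrt(2))
    return p_value, (estimate - z_crit * se, estimate + z_crit * se)

large = summarize(estimate=10.0, se=2.57)     # large effect, moderate precision
small = summarize(estimate=0.1, se=0.0257)    # tiny effect, very high precision

print(large)
print(small)
```

Both hypothetical results have a p-value of about 0.0001, yet the confidence intervals show immediately that one effect is a hundred times larger than the other.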


Chapter 4

Quiz

• Online voting tool ‘Poll Everywhere’: http://pollev.com/geertverbeke

iOS equipment


4.1 Question 1

A group of women is subdivided at delivery into ‘Intrathecal analgesia’ or ‘Systemic analgesia’. The table reports means and standard deviations for both groups. Which statement is correct ?

ANSWER: (http://pollev.com/geertverbeke)

A. Correction for multiple testing is needed because three outcomes are tested

B. Correction only needed in case more than one sign. effect is observed

C. Correction only needed in case at least one sign. effect is observed

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/jxsMdbtAvUHGk8G?preview=true


4.2 Question 2

A publication contains the following table with results from 6 different hypothesis tests. Which statement is correct ?


A. Since only 2 tests are sign. no Bonferroni correction is needed

B. Bonferroni correction is needed since at least one test is sign.

C. No Bonferroni correction needed because no test is sign. at 0.05/6 significance level


https://www.polleverywhere.com/multiple_choice_polls/yhANjCO9z4IuhP0?preview=true


4.3 Question 3

Two treatments A and B are compared to placebo with following results. Which treatment is the most promising ? Why ?


A. A because most significant

B. A because smallest confidence interval

C. B because estimated treatment effect larger than for A


https://www.polleverywhere.com/multiple_choice_polls/duBlG7zaaKV4oHq?preview=true


4.4 Question 4

An experiment is set up to compare two treatments, A and B. A difference is observed but is not significant. Which of the following statements is correct with absolute certainty ?


A. The study does not have sufficient power

B. The experiment was too small

C. The variability in the data was too large

D. Maybe there is no difference between A and B in the population

https://www.polleverywhere.com/multiple_choice_polls/wP6yN6F101WdFOP?preview=true


4.5 Question 5

What is a potential danger in a (very) large implementation study ?


A. The null hypothesis will too often be incorrectly rejected

B. Effects too small to be relevant can be highly significant

C. The null hypothesis will too often be correctly accepted


https://www.polleverywhere.com/multiple_choice_polls/YNjQJeqOIs90aKA?preview=true


Chapter 5

Clustered data

. Data set

. Naive analysis

. Correct analysis

. Other examples


5.1 Data set: Washing without water

• Schoonhoven et al. [7]

• Comparison of traditional washing (soap & water) with the use of disposable wash gloves, made of non-woven material, saturated with quickly vaporizing cleaning & caring lotions

• Nursing home residents requiring bathing by nurses

• 56 nursing home wards (±500 residents) randomized:

. Usual Care (UC: traditional bathing)

. Washing without water (WWW)


• Exclusion: In bath or shower > 1 day/week

• Outcome of interest is ‘Completeness of assisted bathing (1/0)’ at 4 weeks after randomization

• Correction for dementia (1/0)

• Other covariates (age, gender, Barthel index, BMI, skin damage, . . . ) explored as well


5.2 Naive analysis

• Logistic regression with factors ‘intervention’ and ‘dementia’

• Results:

Effect                       OR      95% C.I.          p-value
Intervention (WWW vs. UC)    4.739   [3.155; 7.143]    <0.0001
Dementia (NO vs. YES)        1.508   [1.005; 2.268]    0.0475

• Bathing completeness more likely . . .

. . . . in WWW intervention group

. . . . in non-demented residents


5.3 Correct analysis

• Analysis did not account for the variability between wards w.r.t. proportion of residents with complete bathing



• This variability implies that residents from the same ward are more alike than residents from different wards

=⇒ Correlated data

• This correlation should be accounted for in the statistical analysis

=⇒ Mixed models
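• Why ignoring the correlation is harmful for comparisons between wards (such as the randomized intervention) can be sketched with the classical design effect, 1 + (m − 1) × ICC, for clusters of size m. This back-of-the-envelope formula is not the mixed model analysis itself. It assumes equal ward sizes, and the intraclass correlation (ICC) values below are hypothetical, since the ICC of the study is not given here.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for cluster-level comparisons (equal cluster sizes)."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(cluster_size, icc)

n_residents, n_wards = 500, 56             # approximate study dimensions
avg_ward_size = n_residents / n_wards      # about 9 residents per ward

for icc in (0.0, 0.1, 0.3):                # assumed ICC values
    ess = effective_sample_size(n_residents, avg_ward_size, icc)
    print(f"ICC = {icc:.1f}: effective sample size about {ess:.0f}")
```

Under these assumptions, the naive analysis behaves as if there were about 500 independent residents, while a sizeable within-ward correlation leaves far fewer, so naive confidence intervals for the intervention effect are too narrow.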


• Corrected results:

                             Naive                                  Correct
Effect                       OR      95% C.I.          p-value      OR       95% C.I.           p-value
Intervention (WWW vs. UC)    4.739   [3.155; 7.143]    <0.0001      12.821   [4.566; 35.714]    <0.0001
Dementia (NO vs. YES)        1.508   [1.005; 2.268]    0.0475       1.271    [0.883; 1.828]     0.1962

• Conclusion:

Effects of covariates highly affected by correlation within clusters


5.4 Other examples

Clustering =⇒ Correlation

• Residents clustered within wards

• Patients clustered within hospitals

• Ophthalmology studies: Eyes within patients (−→ paired t-test)

• Longitudinal data: Repeated measurements within subjects

• . . .


Chapter 6

Missing observations

. Introduction

. Examples

. How to handle missing data ?


6.1 Introduction

• Complete data sets are rare in practice

• This implies loss of power, but more importantly may also imply biased results

• Problematic case:

Probability for an observation to be missing is related to the observation itself

• How to handle missingness in a data set ?


6.2 Examples

• Consider data from a longitudinal study with 20 subjects, measured at baseline and followed by 6 weekly visits:



• Due to dropout, not all subjects have been followed up to week 6:


• Let us compare various common approaches to handle missingness, when interest is in estimation of the average trend


• Averaging the observed values at each visit:


=⇒

Correct at visits without missing observations

Biased at visits with missing observations


• Averaging the values of the complete cases only:


=⇒

Biased at visits without missing observations



• Averaging after last observation carried forward (LOCF):


=⇒


Distorted association structure (→ p-values)


• Averaging after mean imputation:


=⇒


Distorted association & variance structure (→ p-values)
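• The behaviour of these four approaches can be reproduced on a toy data set. The block below is a minimal sketch with made-up data: 4 subjects and 4 visits instead of the 20 subjects of the example, with the subjects having the highest values dropping out early. All names and values are hypothetical.

```python
from statistics import fmean, stdev

# Hypothetical complete trajectories (visits 0..3), unknown in practice.
full = {"A": [10, 12, 14, 16], "B": [20, 22, 24, 26],
        "C": [30, 32, 34, 36], "D": [40, 42, 44, 46]}
# Number of visits actually observed: high-value subjects drop out early.
n_observed = {"A": 4, "B": 4, "C": 3, "D": 2}
n_visits = 4
observed = {s: full[s][:n_observed[s]] for s in full}

def visit_means(rows):
    """Mean per visit over the subjects that have a value at that visit."""
    return [fmean([r[t] for r in rows if t < len(r)]) for t in range(n_visits)]

true_means = visit_means(list(full.values()))
available = visit_means(list(observed.values()))            # observed values per visit
complete = visit_means([r for r in observed.values() if len(r) == n_visits])
locf_rows = [r + [r[-1]] * (n_visits - len(r)) for r in observed.values()]
locf = visit_means(locf_rows)
# Mean imputation: fill each missing value with that visit's observed mean.
imputed_rows = [r + available[len(r):] for r in observed.values()]

sd_observed_last = stdev([r[-1] for r in observed.values() if len(r) == n_visits])
sd_imputed_last = stdev([r[-1] for r in imputed_rows])

print(true_means, available, complete, locf)
print(sd_observed_last, sd_imputed_last)
```

In this toy example, the available-case means are correct at the first two visits and biased afterwards, the complete-case means are biased from the first visit on, and mean imputation artificially shrinks the standard deviation at the last visit.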


6.3 How to handle missing data ?

• No uniformly best answer:

. Depends on nature of missingness

. Depends on outcome type

. Depends on research question

. Depends on model considered

. . . .

• All methods rely on assumptions about the relation between the probability for an observation to be missing and the observation itself

=⇒ Untestable assumptions


• Multiple imputation (M = 5 imputations):

[Diagram: Observed data → (Imputation) → Imputed 1, . . . , Imputed 5 → (Analysis) → Results 1, . . . , Results 5 → (Combination) → Final results]


• Advantages:

. Correctly accounts for uncertainty about imputed values

. Imputation can be based on observed information (covariates, outcomes)

. Expert opinion

. Various imputation models can be explored (−→ sensitivity analyses)

. Relatively straightforward to implement

• Often, a small number M of imputations is sufficient (M = 3, 5)

• Alternative approaches possible, but less generally applicable and/or more difficult to implement
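• The combination step of multiple imputation is commonly done with Rubin's rules: the final estimate is the average of the M estimates, and its variance adds the average within-imputation variance to the between-imputation variance, inflated by (1 + 1/M). The block below is a minimal sketch of that rule. The function name and the five estimates and standard errors are made up for illustration.

```python
from math import sqrt
from statistics import fmean, variance

def combine_imputations(estimates, std_errors):
    """Pool M estimates and standard errors using Rubin's rules."""
    m = len(estimates)
    pooled = fmean(estimates)
    within = fmean([se ** 2 for se in std_errors])    # average sampling variance
    between = variance(estimates)                     # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)

estimates = [2.1, 1.9, 2.4, 2.0, 2.2]          # hypothetical 'Results 1' to 'Results 5'
std_errors = [0.50, 0.48, 0.52, 0.49, 0.51]
pooled, pooled_se = combine_imputations(estimates, std_errors)
print(pooled, pooled_se)
```

The pooled standard error is larger than each of the five individual standard errors, which is how the uncertainty about the imputed values enters the final results.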


The End !


Bibliography

[1] A.I. Amin, O. Hallbook, A.J. Lee, R. Sexton, B.J. Moran, and R.J. Heald. A 5-cm colonic J pouch colo-anal reconstruction following anterior resection for low rectal cancer results in acceptable evacuation and continence in the long term. Colorectal Disease, 5:33–37, 2003.

[2] S. Kaplan, S. Etlin, I. Novikov, and B. Modan. Occupational risks for the development of brain tumours. American Journal of Industrial Medicine, 31:15–20, 1997.

[3] Y. Baba, J.D. Putzke, N.R. Whaley, Z.K. Wszolek, and R.J. Uitti. Gender and the Parkinson’s disease phenotype. Journal of Neurology, 252:1201–1205, 2005.

[4] K.M. Kellett, D.A. Kellett, and L.A. Nordholm. Effects of an exercise program on sick leave due to back pain. Physical Therapy, 71:283–293, 1991.

[5] S.E. Nissen, E.M. Tuzcu, P. Schoenhagen, et al. Statin therapy, LDL cholesterol, C-reactive protein, and coronary artery disease. The New England Journal of Medicine, 352:29–38, 2005.

[6] T. Shatari, M.A. Clark, T. Yamamoto, A. Menon, C. Keh, J. Alexander-Williams, and M. Keighley. Long strictureplasty is as safe and effective as short strictureplasty in small-bowel Crohn’s disease. Colorectal Disease, 6:438–441, 2004.

[7] L. Schoonhoven, B.G. van Gaal, S. Teerenstra, E. Adang, C. van der Vleuten, and T. van Achterberg. Cost-consequence analysis of “washing without water” for nursing home residents: A cluster randomized trial. International Journal of Nursing Studies, 52:112–120, 2015.

