Use, misuse, and abuse of statistics The European Academy of Nursing Science Year 2, Friday Geert Verbeke Interuniversity Institute for Biostatistics and statistical Bioinformatics [email protected] http://perswww.kuleuven.be/geert_verbeke

Jan 23, 2021


Use, misuse, and abuse of statistics

The European Academy of Nursing Science
Year 2, Friday

Geert Verbeke

Interuniversity Institute for Biostatistics and statistical Bioinformatics

[email protected]

http://perswww.kuleuven.be/geert_verbeke


Contents

1 Introduction
2 Errors in statistics: Basic concepts
3 Errors in statistics: Practical implications
4 Quiz
5 Clustered data
6 Missing observations
Bibliography

EANS: Use, misuse, and abuse of statistics i


Chapter 1

Introduction

. Focus of the course

. Course materials


1.1 I will NOT talk about . . .

• Mathematics

• Technical details

• Software

• Algorithms

• . . .


1.2 I will focus on . . .

• Use and misuse of statistics

• Frequently observed errors

• Some misconceptions

• Applications

• Publications

• Intuition

• . . .


1.3 Course material

• Course notes, also available from:

http://perswww.kuleuven.be/geert_verbeke/courses

• Online voting tool ‘Poll Everywhere’:

http://pollev.com/geertverbeke


Chapter 2

Errors in statistics: Basic concepts

. Introduction

. Two types of errors


2.1 Introduction

• Consider the comparison of weight gains in rats with high (group 1) or low (group 2) protein level diets:

• On average, there is an observed difference of 19g between both groups.

• Formal comparison can be based on the unpaired t-test in which one tests

H0 : µ1 = µ2 versus HA : µ1 ≠ µ2,

where µ1 and µ2 are the means of large populations of rats fed with high or low protein level diets, respectively.


[Figure: a SAMPLE is drawn at RANDOM from the POPULATION; INFERENCE AND ESTIMATION lead from the sample back to the population. Hypotheses to test: H0 : µ1 = µ2 versus HA : µ1 ≠ µ2. Estimates for µ1 and µ2: 120 and 101.]


• Result:

There is no significant difference (p = 0.0757) in weight gain

between rats on a high protein level diet,

and rats on a low protein level diet

• The result of any statistical test should be interpreted as evidence in favour of or against the null hypothesis, and should not be interpreted as formal proof.

• In our example, maybe a true difference was too small to be detected based on such a small experiment.

• Alternatively, p = 0.001 would only indicate that the observed difference of 19g is unlikely to occur by pure chance, but maybe our sample was indeed the extreme one that happens once every 1000 experiments.


• Hence, whenever statistical tests are used, errors in the conclusions can occur.

• It is therefore important to quantify the errors, and to keep them under control.

• This is the case for our t-test but also for any other test, i.e., each time a p-value is calculated and interpreted:

. (un-)paired t-test

. chi-squared test

. linear regression

. ANOVA

. logistic regression

. . . .


2.2 Two types of errors

                         Reality
Test result    H0 correct      H0 not correct
Accept H0      No error        Type II error
Reject H0      Type I error    No error

• Type I error: H0 is incorrectly rejected

• Type II error: H0 is incorrectly accepted


• The probability of a type I error can easily be controlled by choosing the level of significance α sufficiently small:

P (Type I error) = α = 1%, 5%, or 10%

• The probability of a type II error can only be controlled by conducting sufficiently large experiments:

P (Type II error) = 1 − power =⇒ power calculation, sample size calculation
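The two error probabilities can be illustrated with a small simulation. The sketch below is not part of the course material: it assumes normally distributed weight gains with a known common standard deviation of 20g (a made-up value) and uses a two-sample z-test instead of the t-test, so that only the Python standard library is needed. The true difference of 19g is borrowed from the rat example; the group sizes are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def two_sample_z_pvalue(x, y, sigma):
    """Two-sided p-value for H0: mu1 = mu2, assuming a known common SD sigma."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_diff, n, sigma=20.0, alpha=0.05, n_sim=2000, seed=1):
    """Fraction of simulated experiments (n rats per group) in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(100 + true_diff, sigma) for _ in range(n)]
        y = [rng.gauss(100, sigma) for _ in range(n)]
        if two_sample_z_pvalue(x, y, sigma) < alpha:
            rejections += 1
    return rejections / n_sim

# H0 true: every rejection is a type I error, so the rate should be close to alpha.
type_1_rate = rejection_rate(true_diff=0, n=10)
# H0 false: the rejection rate is the power; 1 - power is P(type II error).
power_small = rejection_rate(true_diff=19, n=10)
power_large = rejection_rate(true_diff=19, n=40)

print("type I error rate:", type_1_rate)
print("power, n = 10 per group:", power_small)
print("power, n = 40 per group:", power_large)
```

The type I error rate stays near α whatever the sample size, whereas the power only increases when the experiment is made larger.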


Chapter 3

Errors in statistics: Practical implications

. Multiple testing

. Bonferroni correction

. Tests for baseline differences

. Equivalence tests

. Significance versus relevance

. Examples from biomedical literature


3.1 Multiple testing

• Each time a test is performed, there is a probability α of making a type I error.

• For example, if α = 0.05, we can expect to incorrectly reject a true null hypothesis in 5 out of 100 tests.

• Implication:

“The more tests one performs, the higher the probability that something is detected by pure chance”

• This problem of multiple testing occurs very frequently in biomedical sciences, in various settings.


3.1.1 Example: A classroom experiment

• On entry into the classroom, assign each student at random a seat at the left or at the right side of the classroom

• Compare both sides with respect to 100 aspects including weight, height, age, gender, color of hair, color of eyes, . . .

• It is to be expected that for about 5 of these outcomes, a significant difference is obtained at the 5% level of significance, by pure chance.
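The expected number of chance findings, and the probability of seeing at least a given number of them, can be computed from the binomial distribution. The sketch below assumes that the 100 tests are independent and that every null hypothesis is true; in a real classroom, outcomes such as weight and height are correlated, so the numbers are only indicative. The function name is made up for this illustration.

```python
from math import comb

def prob_at_least(m, n_tests, alpha=0.05):
    """P(at least m significant results) among n_tests independent tests
    when every null hypothesis is true (binomial model)."""
    below = sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(m)
    )
    return 1 - below

n_tests = 100
expected_false_positives = n_tests * 0.05
p_any = prob_at_least(1, n_tests)
p_five_or_more = prob_at_least(5, n_tests)

print("expected number of chance findings:", expected_false_positives)
print("P(at least 1 chance finding):", round(p_any, 4))
print("P(at least 5 chance findings):", round(p_five_or_more, 4))
```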


3.1.2 Example: Testing many relations

• Amin et al. [1], Table 2:

. 18 tests performed

. only 2 significant results


3.1.3 Example: Subgroup analyses

• Kaplan et al. [2], Table 5:

. Tests based on C.I.’s for odds ratios

. C.I. containing 1 is equivalent to a non-significant test result

. 21 × 3 = 63 tests performed

. only 5 significant results


3.1.4 Example: Searching for the most significant results

• This ‘scientific finding’ was printed in the Belgian newspapers:

• It was even stated that those who wake up before 7.21am have a statistically significantly higher stress level during the day than those who wake up after 7.21am.


3.1.5 Conclusion

• Significant results obtained by multiple testing are often overinterpreted

• If the number of tests is reported, the reader knows that such results need to be interpreted with extreme care

• The problem arises when only the significant results are reported, and one does not know how many tests were performed in total

• This leads to reporting results which turn out not to be reproducible

• For example, a new study would not find that students seated on the left are taller than those on the right. Instead, they might weigh more.

• For example, a new experiment might show no difference in stress levels between subjects waking up early and those waking up late. Or maybe the critical wake-up time would be 8.12am.


3.2 Bonferroni correction

• Suppose two tests are performed, both at the 5% level of significance.

• The probability that at least one type I error will be made can be shown not to exceed 2 × 0.05 = 0.10:

P (at least 1 type I error) ≤ 2 × 5% = 10%

• In general, if k tests are performed, all at the 5% level of significance, the probability of making at least one type I error can only be shown not to exceed k × 5%.

• Hence, the overall type I error rate can be controlled at level α by performing each separate test at the α/k level of significance.


• For example, performing 2 tests at the 2.5% level of significance each implies that the probability of making at least one type I error will not exceed 5%.

• In general, when k tests are performed at the α/k level of significance, one is sure that the overall probability of making at least one type I error will not exceed α.

• This correction of the significance level is called the Bonferroni correction.

• Note that, strictly speaking, the Bonferroni correction is an overcorrection, since the overall type I error rate can only be shown not to exceed 5%, and usually will be smaller than the required 5%.

• In some specific testing situations (e.g., ANOVA analysis), more accurate corrections are available (e.g., Tukey test)
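A minimal sketch of the Bonferroni correction, assuming exact p-values are available. The p-values in the example are invented for illustration, and the helper names are not from the course notes. The second function computes the overall type I error rate in the special case of independent tests, which shows why the correction is slightly conservative.

```python
def bonferroni(p_values, alpha=0.05):
    """Return, for each p-value, whether it is significant at the alpha / k level."""
    k = len(p_values)
    threshold = alpha / k
    return [p <= threshold for p in p_values]

def overall_error_if_independent(k, level):
    """P(at least one type I error) for k independent tests, each performed at 'level'."""
    return 1 - (1 - level) ** k

# Hypothetical p-values from k = 4 tests; the Bonferroni threshold is 0.05 / 4 = 0.0125.
p_values = [0.004, 0.030, 0.020, 0.450]
decisions = bonferroni(p_values)
print(decisions)

# Two independent tests: uncorrected versus Bonferroni-corrected overall error rate.
uncorrected = overall_error_if_independent(2, 0.05)   # bounded by 2 x 5% = 10%
corrected = overall_error_if_independent(2, 0.025)    # bounded by 5%
print(uncorrected, corrected)
```

Three of the four hypothetical results are significant at the unadjusted 5% level, but only one survives the correction.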


3.3 Examples from the biomedical literature

• Baba et al. [3], p.1202 and p.1203:


• Kellett et al. [4], Table 2 (for example):


In the discussion, R. Roy writes:

Note that the reader cannot perform the Bonferroni correction as the exact p-values have not been reported.


3.4 Tests for baseline differences

• In order to show causal effects, patients are often randomized into 2 or more groups

• This ensures (at least in large studies) that all treatment groups are identical, except for the treatment the patients receive

• In (relatively) small studies, imbalances can still occur by pure chance

• Therefore, one often compares the various groups with respect to important factors which are believed to be strongly related to the outcome of interest.

• This is called testing for baseline differences, as one compares the characteristics of the patients at the start of the study.


• As an example, suppose interest is in comparing two oral treatments, A and B, for hypertension.

• Suppose the change in diastolic BP is the outcome of interest

• Age is one of the factors believed to be strongly related to BP. Therefore, it is important that both treatment groups have the same age distribution

• Therefore, one often tests for age differences between A and B, e.g., based on the two-sample t-test.

• The hypothesis tested is

H0 : µA = µB versus HA : µA ≠ µB

• Note that H0 and HA express properties of the populations, not the samples


• In the populations, we know that, due to the randomization, µA and µB are identical

• Conclusion:

It makes no sense at all to perform baseline tests in randomized studies

• No matter how small the resulting p-value would be (e.g., < 10^−8), we know that the observed difference in age between groups A and B has occurred purely by chance.

• Note also that testing for baseline differences cannot be used to check whether the randomization was done properly.
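The argument can be checked by simulation: when patients are randomized, H0 : µA = µB holds by construction, so a baseline test comes out 'significant' in about a fraction α of the trials, purely by chance. The sketch below uses made-up numbers (200 patients per trial, ages with mean 60 and SD 10) and a large-sample z-approximation instead of the exact t-test.

```python
import random
from statistics import NormalDist, fmean

def baseline_p_value(ages_a, ages_b):
    """Two-sided p-value for an age difference (large-sample z-approximation)."""
    mean_a, mean_b = fmean(ages_a), fmean(ages_b)
    var_a = sum((a - mean_a) ** 2 for a in ages_a) / (len(ages_a) - 1)
    var_b = sum((b - mean_b) ** 2 for b in ages_b) / (len(ages_b) - 1)
    se = (var_a / len(ages_a) + var_b / len(ages_b)) ** 0.5
    z = (mean_a - mean_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def fraction_significant(n_patients=200, n_trials=2000, alpha=0.05, seed=7):
    """Fraction of randomized trials showing a 'significant' baseline age difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        ages = [rng.gauss(60, 10) for _ in range(n_patients)]
        rng.shuffle(ages)              # random allocation to A and B
        half = n_patients // 2
        if baseline_p_value(ages[:half], ages[half:]) < alpha:
            hits += 1
    return hits / n_trials

fraction = fraction_significant()
print("fraction of trials with a significant baseline difference:", fraction)
```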


3.5 Example from the biomedical literature

Nissen et al. [5], abstract and table 1:

A two-arm randomized study


formal tests at baseline


3.6 Equivalence tests

• Suppose two groups A and B are to be compared, and an unpaired t-test is used to test

H0 : µA = µB versus HA : µA ≠ µB

• In case of a non-significant test result, one often concludes that both groups are identical or equivalent

• An alternative interpretation is that the experiment did not have sufficient power to show an effect which is present.

• Conclusion:

Non-significance should not be interpreted as equivalence


• This can also be seen from the fact that, if the t-test could be used to show equivalence, it would be best to collect data on (extremely) small samples, as this would increase the chance of obtaining a non-significant result, due to lack of power.

• Instead, one should reverse H0 and HA:

H0 : |µA − µB| > ∆ versus HA : |µA − µB| ≤ ∆

where ∆ is a pre-specified constant, defining ‘equivalence’

• The result of the equivalence test entirely depends on the choice of ∆

• Therefore, ∆ needs to be specified prior to the data collection
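One common way to test the reversed hypotheses is the 'two one-sided tests' (TOST) procedure, which the slides do not name; it is used here as an assumption. The sketch works from an estimated difference and its standard error with a normal approximation (a real analysis would typically use the t-distribution), and all numbers are hypothetical.

```python
from statistics import NormalDist

def equivalence_test(diff, se, delta, alpha=0.05):
    """TOST with a normal approximation.

    H0: |muA - muB| > delta   versus   HA: |muA - muB| <= delta.
    Returns the p-value (the larger of the two one-sided p-values)
    and whether equivalence is concluded at level alpha.
    """
    norm = NormalDist()
    p_lower = 1 - norm.cdf((diff + delta) / se)  # against muA - muB <= -delta
    p_upper = norm.cdf((diff - delta) / se)      # against muA - muB >= +delta
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha

# Same observed difference (1.0) and the same pre-specified delta (5.0),
# but a precise study (se = 1.5) versus an imprecise one (se = 4.0).
p_precise, equivalent_precise = equivalence_test(diff=1.0, se=1.5, delta=5.0)
p_imprecise, equivalent_imprecise = equivalence_test(diff=1.0, se=4.0, delta=5.0)

print(p_precise, equivalent_precise)
print(p_imprecise, equivalent_imprecise)
```

In the imprecise study an ordinary test of H0 : µA = µB would be non-significant (z = 0.25), yet equivalence cannot be concluded: small, noisy samples no longer help.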


3.7 Example from the biomedical literature

Shatari et al. [6]:

• Title:


• Table 1:

No significant differences!


• Results and conclusions (abstract):


3.8 Significance versus relevance

• The power to detect some effect increases with the sample size

• This implies that any effect, no matter how small, will, sooner or later, be detected, if the sample is sufficiently large.

• For example, consider an experiment in which hypertensive patients receive some treatment.

• The outcome of interest is change in BP:

BPbefore − BPafter

• Suppose that the observed difference would be 0.1 mmHg.


• A p-value as small as 0.001 would be likely to be obtained, provided that the sample would be sufficiently large.

• Obviously, an average change in BP as small as 0.1 mmHg is not relevant from a clinical point of view.

• Conclusion:

Statistical significance ≠ Clinical relevance
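To see how the sample size drives the p-value, the sketch below computes the two-sided p-value for an observed mean change of 0.1 mmHg at several sample sizes. The standard deviation of 10 mmHg for the individual changes is an assumption made for this illustration, as are the sample sizes; a normal approximation is used.

```python
from statistics import NormalDist

def p_value_for_mean_change(mean_change, sd, n):
    """Two-sided p-value for H0: mean change = 0 (normal approximation)."""
    z = mean_change / (sd / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Observed mean change of 0.1 mmHg, assumed SD of 10 mmHg.
results = {n: p_value_for_mean_change(0.1, 10.0, n) for n in (100, 10_000, 110_000)}
for n, p in results.items():
    print(f"n = {n:>7}: p = {p:.4f}")
```

The observed effect is the same clinically irrelevant 0.1 mmHg in every row; only the sample size changes.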


• A highly significant effect can be a large effect:

[Figure: 95% C.I. for µ shown relative to 0; p = 0.0001]

• A highly significant effect can also be a very small effect, but estimated with high precision, due to a large sample size:

[Figure: 95% C.I. for µ shown relative to 0; p = 0.0001]


• The p-value cannot distinguish between both situations

• It is therefore important not to blindly overinterpret significant results without knowing the size of the effect

• This is another reason why confidence intervals are to be preferred over significance testing
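The two situations can be made concrete with two hypothetical studies that yield the same p-value but very different confidence intervals. The estimates and standard errors below are invented so that both studies have p ≈ 0.0001; a normal approximation is used throughout.

```python
from statistics import NormalDist

def summarize(estimate, se, level=0.95):
    """Two-sided p-value for H0: mu = 0 and the confidence interval for mu."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - norm.cdf(abs(estimate / se)))
    return p_value, (estimate - z_crit * se, estimate + z_crit * se)

# Large effect, moderate precision versus tiny effect, very high precision.
p_large, ci_large = summarize(estimate=10.0, se=2.57)
p_small, ci_small = summarize(estimate=0.1, se=0.0257)

print(f"large effect: p = {p_large:.5f}, 95% C.I. = [{ci_large[0]:.2f}; {ci_large[1]:.2f}]")
print(f"small effect: p = {p_small:.5f}, 95% C.I. = [{ci_small[0]:.3f}; {ci_small[1]:.3f}]")
```

The p-values coincide, while the intervals show at a glance which effect is large and which is negligible.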


Chapter 4

Quiz

• Online voting tool ‘Poll Everywhere’: http://pollev.com/geertverbeke


4.1 Question 1

A group of women is subdivided at delivery into ‘Intrathecal analgesia’ or ‘Systemic analgesia’. The table reports means and standard deviations for both groups. Which statement is correct?

ANSWER: (http://pollev.com/geertverbeke)

A. Correction for multiple testing is needed because three outcomes are tested

B. Correction only needed in case more than one significant effect is observed

C. Correction only needed in case at least one significant effect is observed

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/jxsMdbtAvUHGk8G?preview=true


4.2 Question 2

A publication contains the following table with results from 6 different hypothesis tests. Which statement is correct?

ANSWER: (http://pollev.com/geertverbeke)

A. Since only 2 tests are significant, no Bonferroni correction is needed

B. Bonferroni correction is needed since at least one test is significant

C. No Bonferroni correction needed because no test is significant at the 0.05/6 significance level

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/yhANjCO9z4IuhP0?preview=true


4.3 Question 3

Two treatments A and B are compared to placebo with the following results. Which treatment is the most promising? Why?

ANSWER: (http://pollev.com/geertverbeke)

A. A because most significant

B. A because smallest confidence interval

C. B because estimated treatment effect larger than for A

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/duBlG7zaaKV4oHq?preview=true


4.4 Question 4

An experiment is set up to compare two treatments, A and B. A difference is observed but is not significant. Which of the following statements is correct with absolute certainty?

ANSWER: (http://pollev.com/geertverbeke)

A. The study does not have sufficient power

B. The experiment was too small

C. The variability in the data was too large

D. Maybe there is no difference between A and B in the population

https://www.polleverywhere.com/multiple_choice_polls/wP6yN6F101WdFOP?preview=true


4.5 Question 5

What is a potential danger in a (very) large implementation study?

ANSWER: (http://pollev.com/geertverbeke)

A. The null hypothesis will too often be incorrectly rejected

B. Effects too small to be relevant can be highly significant

C. The null hypothesis will too often be correctly accepted

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/YNjQJeqOIs90aKA?preview=true


Chapter 5

Clustered data

. Data set

. Naive analysis

. Correct analysis

. Other examples


5.1 Data set: Washing without water

• Schoonhoven et al. [7]

• Comparison of traditional washing (soap & water) with the use of disposable wash gloves, made of non-woven material, saturated with quickly vaporizing cleaning & caring lotions

• Nursing home residents requiring bathing by nurses

• 56 nursing home wards (±500 residents) randomized:

. Usual Care (UC: traditional bathing)

. Washing without water (WWW)


• Exclusion: In bath or shower > 1 day/week

• Outcome of interest is ‘Completeness of assisted bathing (1/0)’, 4 weeks after randomization

• Correction for dementia (1/0)

• Other covariates (age, gender, Barthel index, BMI, skin damage, . . . ) explored as well


5.2 Naive analysis

• Logistic regression with factors ‘intervention’ and ‘dementia’

• Results:

Effect                      OR      95% C.I.          p-value
Intervention: WWW vs. UC    4.739   [3.155; 7.143]    <0.0001
Dementia: NO vs. YES        1.508   [1.005; 2.268]    0.0475

• Bathing completeness more likely . . .

. . . . in WWW intervention group

. . . . in non-demented residents


5.3 Correct analysis

• The naive analysis did not account for the variability between wards w.r.t. the proportion of residents with complete bathing

[Figure: between-ward variability in the proportion of residents with complete bathing]


• Variability between wards implies that residents from the same ward are more alike than residents from different wards

=⇒ Correlated data

• This correlation should be accounted for in the statistical analysis

=⇒ Mixed models
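Why the correlation matters can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis of the study: the settings (56 wards of 9 residents, a ward effect with standard deviation 1 on the logit scale, no true intervention effect) and all function names are hypothetical. It applies a naive two-proportion z-test that treats all residents as independent and counts how often the null hypothesis is rejected.

```python
import math
import random
from statistics import NormalDist


def simulate_trial(rng, n_wards=56, ward_size=9, ward_sd=1.0):
    """Simulate one cluster-randomized trial WITHOUT any true treatment effect.

    All settings are hypothetical; wards alternate between the two arms.
    """
    successes, totals = [0, 0], [0, 0]
    for ward in range(n_wards):
        arm = ward % 2
        # Residents of the same ward share one success probability.
        ward_effect = rng.gauss(0.0, ward_sd)
        p = 1.0 / (1.0 + math.exp(-ward_effect))
        for _ in range(ward_size):
            successes[arm] += int(rng.random() < p)
            totals[arm] += 1
    return successes, totals


def naive_p_value(successes, totals):
    """Two-sided two-proportion z-test that ignores the clustering."""
    p_a = successes[0] / totals[0]
    p_b = successes[1] / totals[1]
    pooled = sum(successes) / sum(totals)
    se = math.sqrt(pooled * (1 - pooled) * (1 / totals[0] + 1 / totals[1]))
    if se == 0.0:
        return 1.0
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def naive_rejection_rate(n_trials=1000, alpha=0.05, seed=2021):
    """Fraction of simulated null trials in which the naive test rejects."""
    rng = random.Random(seed)
    rejections = sum(
        naive_p_value(*simulate_trial(rng)) < alpha for _ in range(n_trials)
    )
    return rejections / n_trials


if __name__ == "__main__":
    print(f"Naive type I error rate: {naive_rejection_rate():.3f} (nominal 0.05)")
```

Under these assumptions the naive test rejects far more often than the nominal 5%, because the standard error is computed as if about 500 independent residents had been randomized rather than 56 wards.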


• Corrected results:

                            Naive                                Correct
Effect                      OR      95% C.I.          p-value    OR       95% C.I.           p-value
Intervention: WWW vs. UC    4.739   [3.155; 7.143]    <0.0001    12.821   [4.566; 35.714]    <0.0001
Dementia: NO vs. YES        1.508   [1.005; 2.268]    0.0475     1.271    [0.883; 1.828]     0.1962

• Conclusion:

Effects of covariates highly affected by correlation within clusters
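The change in precision can be made explicit on the log odds ratio scale. The sketch below assumes that the reported intervals are Wald intervals (estimate plus or minus about 1.96 standard errors on the log scale), which the slides do not state; the helper name `wald_summary` is hypothetical. Under that assumption, it recovers the standard error and the p-value from each reported odds ratio and confidence interval.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_summary(odds_ratio, lower, upper):
    """Return (log OR, standard error, two-sided p-value), ASSUMING a Wald interval."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z975)
    z = log_or / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return log_or, se, p


# Odds ratios and confidence limits copied from the results table.
reported = {
    ("Intervention", "naive"): (4.739, 3.155, 7.143),
    ("Intervention", "correct"): (12.821, 4.566, 35.714),
    ("Dementia", "naive"): (1.508, 1.005, 2.268),
    ("Dementia", "correct"): (1.271, 0.883, 1.828),
}

for (effect, analysis), values in reported.items():
    log_or, se, p = wald_summary(*values)
    print(f"{effect:12s} {analysis:8s} log OR = {log_or:.3f}  SE = {se:.3f}  p = {p:.4f}")
```

If the Wald assumption holds, the recovered p-values for dementia are close to the reported 0.0475 and 0.1962, and the standard error of the intervention effect is more than twice as large in the correct analysis as in the naive one.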


5.4 Other examples

Clustering =⇒ Correlation

• Residents clustered within wards

• Patients clustered within hospitals

• Ophthalmology studies: Eyes within patients (−→ paired t-test)

• Longitudinal data: Repeated measurements within subjects

• . . .
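For the ophthalmology case, the sketch below contrasts a paired and an unpaired t statistic. The measurements are made up (hypothetical pressures for one control eye and one treated eye in each of 8 patients) and the function names are assumptions; the point is only that the paired analysis works with within-patient differences, whereas the unpaired one treats the two eyes of a patient as independent.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical data: one control eye and one treated eye per patient.
control = [22, 18, 25, 30, 16, 21, 27, 19]
treated = [20, 17, 22, 28, 15, 19, 24, 17]


def paired_t(x, y):
    """t statistic based on the within-patient differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(x, y):
    """Welch t statistic, treating the two eyes as independent samples."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


print(f"paired t   = {paired_t(control, treated):.2f}")
print(f"unpaired t = {unpaired_t(control, treated):.2f}")
```

With these made-up numbers the mean difference is the same in both analyses, but the paired statistic is much larger because the large between-patient variability cancels out of the differences.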


Chapter 6

Missing observations

. Introduction

. Examples

. How to handle missing data ?


6.1 Introduction

• Complete data sets are rare in practice

• This implies loss of power, but more importantly may also imply biased results

• Problematic case:

Probability for an observation to be missing is related to the observation itself

• How to handle missingness in a data set ?


6.2 Examples

• Consider data from a longitudinal study with 20 subjects, measured at baseline and followed by 6 weekly visits:

[Figure: Complete data]


• Due to dropout, not all subjects have been followed up to week 6:

[Figure: Observed data]

• Let us compare various common approaches to handle missingness, when interest is in estimation of the average trend


• Averaging the observed values at each visit:

[Figure: Observed data, with the average of the observed values at each visit]

=⇒

Correct at visits without missing observations

Biased at visits with missing observations


• Averaging the values of the complete cases only:

[Figure: average of the complete cases at each visit]

=⇒

Biased at visits without missing observations

Biased at visits with missing observations


• Averaging after last observation carried forward (LOCF):

[Figure: LOCF imputation]

=⇒

Biased at visits with missing observations

Distorted association structure (→ p-values)


• Averaging after mean imputation:

[Figure: Mean imputation]

=⇒

Biased at visits with missing observations

Distorted association & variance structure (→ p-values)
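The four approaches can be compared in a small simulation. This is a minimal sketch with a hypothetical data model, not the data shown on the slides: 5000 subjects instead of 20 (to reduce noise), a linear increase of one unit per visit, and dropout with probability 0.3 whenever the value that would have been observed is low. All names and settings are assumptions.

```python
import random
from statistics import mean, pvariance


def simulate(n_subjects=5000, n_visits=7, seed=1):
    """Simulate complete profiles and the same profiles after dropout.

    Hypothetical model: value = subject level + visit number + noise.
    A subject with a low value at a visit drops out (that value and all
    later ones are missing) with probability 0.3.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n_subjects):
        level = rng.gauss(10.0, 2.0)
        y = [level + t + rng.gauss(0.0, 1.0) for t in range(n_visits)]
        obs = list(y)
        for t in range(1, n_visits):
            # Missingness depends on the value that would have been observed.
            if y[t] < 10.0 + t - 1.0 and rng.random() < 0.3:
                obs[t:] = [None] * (n_visits - t)
                break
        full.append(y)
        observed.append(obs)
    return full, observed


def summarize(full, observed):
    """Compare the approaches at baseline and at the last visit."""
    last = len(full[0]) - 1
    completers = [o for o in observed if o[last] is not None]
    available_last = mean(o[last] for o in completers)
    # LOCF: a missing last value is replaced by the last observed value.
    locf_last = [next(v for v in reversed(o) if v is not None) for o in observed]
    # Mean imputation: a missing last value is replaced by the observed mean.
    imputed_last = [available_last if o[last] is None else o[last] for o in observed]
    return {
        "true_base": mean(y[0] for y in full),
        "available_base": mean(o[0] for o in observed),
        "complete_case_base": mean(o[0] for o in completers),
        "true_last": mean(y[last] for y in full),
        "available_last": available_last,
        "locf_last": mean(locf_last),
        "mean_imputed_last": mean(imputed_last),
        "true_var_last": pvariance([y[last] for y in full]),
        "mean_imputed_var_last": pvariance(imputed_last),
    }


if __name__ == "__main__":
    for name, value in summarize(*simulate()).items():
        print(f"{name:22s} {value:7.3f}")
```

In this particular setup, the average of the observed values is correct at baseline but too high at the last visit, the complete-case average is already too high at baseline, the LOCF average is too low at the last visit, and mean imputation reproduces the biased observed average while shrinking the variance.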


6.3 How to handle missing data ?

• No uniformly best answer:

. Depends on nature of missingness

. Depends on outcome type

. Depends on research question

. Depends on model considered

. . . .

• All methods rely on assumptions about the relation between the probability for an observation to be missing and the observation itself

=⇒ Untestable assumptions


• Multiple imputation (M = 5 imputations):

Observed data −→ Imputed 1, . . . , Imputed 5 (Imputation)

Imputed m −→ Results m, for m = 1, . . . , 5 (Analysis)

Results 1, . . . , Results 5 −→ Final results (Combination)


• Advantages:

. Correctly accounts for uncertainty about imputed values

. Imputation can be based on observed information (covariates, outcomes)

. Expert opinion can be incorporated

. Various imputation models can be explored (−→ sensitivity analyses)

. Relatively straightforward to implement

• Often, a small number M of imputations is sufficient (M = 3, 5)

• Alternative approaches possible, but less generally applicable and/or more difficult to implement
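The combination step of multiple imputation is commonly carried out with Rubin's rules, which the slides do not spell out. The sketch below assumes those rules; the function name `combine` and the five estimates and variances are hypothetical. The pooled estimate is the average of the M estimates, and its variance adds the between-imputation variability to the average within-imputation variance.

```python
from statistics import mean, variance


def combine(estimates, variances):
    """Pool M estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from M = 5 imputed data sets.
estimates = [1.20, 1.35, 1.10, 1.28, 1.22]
variances = [0.040, 0.042, 0.039, 0.041, 0.040]

pooled, total = combine(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, standard error = {total ** 0.5:.3f}")
```

The between-imputation term is what accounts for the uncertainty about the imputed values: the pooled variance is always at least as large as the average variance from the individual analyses.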


The End !


Bibliography

[1] A.I. Amin, O. Hallbook, A.J. Lee, R. Sexton, B.J. Moran, and R.J. Heald. A 5-cm colonic J pouch colo-anal reconstruction following anterior resection for low rectal cancer results in acceptable evacuation and continence in the long term. Colorectal Disease, 5:33–37, 2003.

[2] S. Kaplan, S. Etlin, I. Novikov, and B. Modan. Occupational risks for the development of brain tumours. American Journal of Industrial Medicine, 31:15–20, 1997.

[3] Y. Baba, J.D. Putzke, N.R. Whaley, Z.K. Wszolek, and R.J. Uitti. Gender and the Parkinson’s disease phenotype. Journal of Neurology, 252:1201–1205, 2005.

[4] K.M. Kellett, D.A. Kellett, and L.A. Nordholm. Effects of an exercise program on sick leave due to back pain. Physical Therapy, 71:283–293, 1991.

[5] S.E. Nissen, E.M. Tuzcu, P. Schoenhagen, et al. Statin therapy, LDL cholesterol, C-reactive protein, and coronary artery disease. The New England Journal of Medicine, 352:29–38, 2005.

[6] T. Shatari, M.A. Clark, T. Yamamoto, A. Menon, C. Keh, J. Alexander-Williams, and M. Keighley. Long strictureplasty is as safe and effective as short strictureplasty in small-bowel Crohn’s disease. Colorectal Disease, 6:438–441, 2004.

[7] L. Schoonhoven, B.G. van Gaal, S. Teerenstra, E. Adang, C. van der Vleuten, and T. van Achterberg. Cost-consequence analysis of “washing without water” for nursing home residents: A cluster randomized trial. International Journal of Nursing Studies, 52:112–120, 2015.
