Multiplicity
Gaetan Lion
July 2013
Jan 14, 2015
2
Probability of Making a Type I error*
when using a t test with > 2 Samples
*A false positive. Rejecting the null hypothesis when it is true.
[Chart: Prob of Type I Error (Initial Confidence Level 95%). X-axis: # of hypotheses, 1 to 10. Y-axis: 0% to 45%.]
Confidence level: 95%
Unadjusted α value: 5%

Probability of Type I Error
# of hypotheses   Bonferroni logic   Sidak logic
1                 0.05               0.05
2                 0.10               0.10
3                 0.15               0.14
4                 0.20               0.19
5                 0.25               0.23
6                 0.30               0.26
7                 0.35               0.30
8                 0.40               0.34
9                 0.45               0.37
10                0.50               0.40
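
The table above can be reproduced with a short sketch. It assumes the "Bonferroni logic" column is simply the number of hypotheses times α, and the "Sidak logic" column is 1 - (1 - α)^m for m hypotheses; both formulas are inferred from the table values rather than stated on this slide, and the function name is illustrative.

```python
def familywise_error(alpha, m):
    """Return (Bonferroni-logic, Sidak-logic) probability of at least one Type I error."""
    bonferroni = min(1.0, m * alpha)   # additive upper bound, capped at 100%
    sidak = 1.0 - (1.0 - alpha) ** m   # compounding; assumes independent tests
    return bonferroni, sidak


for m in range(1, 11):
    b, s = familywise_error(0.05, m)
    print(f"{m:>2}  {b:.2f}  {s:.2f}")
```

The Sidak column grows more slowly because it compounds the 95% confidence level instead of adding up the 5% error rates.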
3
How to test > 2 Samples
Two Basic Steps:
1) Choose a specific ANOVA method, given your testing
framework: Between-Groups, Within-Groups, or Mixed
ANOVA… or use a nonparametric equivalent if
warranted. This is to figure out whether your groups or
samples are different overall.
2) Decide in advance whether to conduct Post Hoc (after
the fact) or Planned Comparison tests to figure out which
specific groups are different.
4
The ANOVAs
Between-Groups
Unpaired testing. Difference between independent groups. Single measures or observations.
Within-Groups
Paired testing. Difference between the same groups before and after treatment. Repeated measures or observations.
Mixed
Mixed testing. Difference between independent groups before and after treatment. Repeated measures or observations.
5
ANOVA semantics
One-Way Between-Groups ANOVA means an ANOVA with independent groups, one
independent variable, and one dependent variable. The independent variable
could be type of student by Major, and the dependent variable could be math
proficiency.
Four-Way Between-Groups ANOVA uses the same data but, in addition to Major,
also looks at Gender, Class (Freshman, Sophomore, …), and Ethnicity. So, you
now have four independent variables.
Balanced ANOVA means that each group or sample is of the same size (same
number of males and females, etc.). An Unbalanced ANOVA means that some of
the samples are of different sizes.
6
Excel ANOVA(s) Add-in Cryptic Semantics
“Factor” means the same as “Way.” They both mean Independent Variable.
“With Replication” can be confused with “Repeated Measures,” which typically means
“Within-Groups” or paired testing.
“Without Replication” can be confused with “Single Measure,” which typically means
“Between-Groups” or unpaired testing.
In the Excel Add-in, “With Replication” means you have more than one data point
per group or sample, which is almost always the case.
“Without Replication” in the Excel Add-in can be used for two very different situations:
1) Two-Way Between-Groups ANOVA with a single observation per category; and
2) One-Way Within-Groups ANOVA.

ANOVA method                                          Excel Add-in corresponding tool
One-Way Between-Groups ANOVA                          Anova: Single Factor
Two-Way Between-Groups ANOVA                          Anova: Two-Factor With Replication
  (more than one observation per category, standard)
Two-Way Between-Groups ANOVA                          Anova: Two-Factor Without Replication
  (a single observation per category)
One-Way Within-Groups ANOVA                           Anova: Two-Factor Without Replication
7
Post Hoc vs Planned Comparison Tests
Purpose
  Post Hoc test: Exploratory. You test the difference between all potential combinations of Groups.
  Planned Comparison test: Confirmation of theory or hypothesis. You test only the Groups you expect to be different in a specific direction (greater, lower).
Risk of Type I error
  Post Hoc test: Very low. Very unlikely to generate a false positive (reject the null hypothesis when it is true).
  Planned Comparison test: Low. Not quite as conservative as a Post Hoc test, but conservative enough.
Risk of Type II error
  Post Hoc test: High. Not unlikely to generate a false negative (accept the null hypothesis when it is false). This test lacks Power.
  Planned Comparison test: Lower risk of Type II error than a Post Hoc test. The test is more sensitive, more likely to uncover a difference. It has more Power.
8
HYPOTHESIS TESTING FLOW CHART
Multiple hypothesis testing. > 2 Samples or Groups
PH means Post Hoc test
PC means Planned Comparison test

Research structure
  Are we testing different groups once?
    Unpaired testing. Difference between independent groups (Between-Groups). Single measure or observation.
  Are we testing the same group(s) at different times?
    Paired testing. Difference between same group before and after treatment (Within-Groups). Repeated measures or observations.
    Mixed testing. Difference between independent groups before and after treatment (Mixed). Repeated measures or observations.

Multiple hypothesis test (Are the groups different?)
  Unpaired testing: Normal: Between-Groups ANOVA. Not Normal: Kruskal-Wallis test.
  Paired testing: Normal: Within-Groups ANOVA. Not Normal: Friedman test.
  Mixed testing: Normal: Mixed ANOVA. Not Normal: No nonparametric alternative.

Transition test (to facilitate Post Hoc test)
  Unpaired testing: Unpaired t test, Mann-Whitney
  Paired testing: Paired t test, Wilcoxon Signed-Rank Test

Post Hoc Test (Which group is different?)
  Tukey's HSD test (PH)
  Scheffe test (PH)
  REGWQ test (PH)
  Dunnett test (PC)
  Bonferroni test (PH)
  Sidak test (PH)
  Simple contrasts (PC)
  Repeated contrasts (PC)
  Mixed ANOVA: No Post Hoc test
9
Two-Way Between-Groups
ANOVA example
10
Data Format
Two-Way Between-Groups ANOVA
Two independent variables: Cowboy preference in movies, Gender
One dependent variable: Intelligence score

For Excel Add-in
                 Male   Female
John Wayne       71     66
                 76     64
                 84     66
                 72     47
                 68     66
Clint Eastwood   65     73
                 53     80
                 70     81
                 46     88
                 53     72
None             81     72
                 69     75
                 55     73
                 60     54
                 61     65

For XLStat
XLStat treats this ANOVA as a linear regression with one dependent variable
and two qualitative independent variables.
Y            X1            X2
Int. Score   Cowboy        Gender
71           J. Wayne      Male
76           J. Wayne      Male
84           J. Wayne      Male
72           J. Wayne      Male
68           J. Wayne      Male
66           J. Wayne      Female
64           J. Wayne      Female
66           J. Wayne      Female
47           J. Wayne      Female
66           J. Wayne      Female
65           C. Eastwood   Male
53           C. Eastwood   Male
70           C. Eastwood   Male
46           C. Eastwood   Male
53           C. Eastwood   Male
73           C. Eastwood   Female
80           C. Eastwood   Female
81           C. Eastwood   Female
88           C. Eastwood   Female
72           C. Eastwood   Female
81           None          Male
69           None          Male
55           None          Male
60           None          Male
61           None          Male
72           None          Female
75           None          Female
73           None          Female
54           None          Female
65           None          Female
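
The two layouts on this slide hold the same 30 observations: a grid with one column per gender, and a list with one row per observation (Y, X1, X2). As a minimal sketch, the grid can be flattened into the row-per-observation layout as follows; the dictionary structure and the function name are illustrative choices, not something either tool prescribes.

```python
# Grid layout: one block of rows per cowboy, one column per gender.
wide = {
    "J. Wayne":    {"Male": [71, 76, 84, 72, 68], "Female": [66, 64, 66, 47, 66]},
    "C. Eastwood": {"Male": [65, 53, 70, 46, 53], "Female": [73, 80, 81, 88, 72]},
    "None":        {"Male": [81, 69, 55, 60, 61], "Female": [72, 75, 73, 54, 65]},
}


def wide_to_long(table):
    """Flatten {cowboy: {gender: [scores]}} into (score, cowboy, gender) rows."""
    return [
        (score, cowboy, gender)
        for cowboy, by_gender in table.items()
        for gender, scores in by_gender.items()
        for score in scores
    ]


long_rows = wide_to_long(wide)
for row in long_rows[:3]:
    print(row)
```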
11
Excel Long Hand
Between-Sample Variability (BSV)
                       Sample size   Average   Total Avg.   Differ.^2
J. Wayne - Male        5             74.2      67.5         44.4
J. Wayne - Female      5             61.8      67.5         32.9
C. Eastwood - Male     5             57.4      67.5         102.7
C. Eastwood - Female   5             78.8      67.5         126.9
None - Male            5             65.2      67.5         5.4
None - Female          5             67.8      67.5         0.1
J. Wayne               10            68.0      67.5         0.2
C. Eastwood            10            68.1      67.5         0.3
None                   10            66.5      67.5         1.1
Male                   15            65.6      67.5         3.7
Female                 15            69.5      67.5         3.7

                  SS        DF (k - 1)   MS
Corrected Model   1,562.3   5            312.5
Cowboy            16.1      2            8.0
Gender            112.1     1            112.1

Within-Sample Variability (WSV)
                       Sample - 1   STDEV   Variance
J. Wayne - Male        4            6.2     38.2
J. Wayne - Female      4            8.3     69.2
C. Eastwood - Male     4            9.8     96.3
C. Eastwood - Female   4            6.5     42.7
None - Male            4            10.2    103.2
None - Female          4            8.6     73.7

SS Within     1,693.2
DF (n - k)    24
MS Within     70.5

Between-Sample Variability/Within-Sample Variability Output (BSV/WSV)
Source            SS        df   MS      F      Sign.
Model             1,562.3   5    312.5   4.4    0.005
Cowboy            16.1      2    8.0     0.1    0.893
Gender            112.1     1    112.1   1.6    0.220
Cowboy*Gender     1,434.1   2    717.0   10.2   0.001
Error/Residual    1,693.2   24   70.5
Corrected Total   3,255.5   29
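
The long-hand steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the spreadsheet itself: it assumes a balanced design (so the interaction sum of squares can be obtained by subtraction), the function and variable names are made up for the example, and it stops at the F ratios because the "Sign." column needs an F distribution, which the standard library does not provide.

```python
from statistics import mean, variance

# Scores keyed by (cowboy, gender), copied from the data slide.
cells = {
    ("J. Wayne", "Male"): [71, 76, 84, 72, 68],
    ("J. Wayne", "Female"): [66, 64, 66, 47, 66],
    ("C. Eastwood", "Male"): [65, 53, 70, 46, 53],
    ("C. Eastwood", "Female"): [73, 80, 81, 88, 72],
    ("None", "Male"): [81, 69, 55, 60, 61],
    ("None", "Female"): [72, 75, 73, 54, 65],
}


def two_way_anova(cells):
    """Balanced two-way between-groups ANOVA: {source: (SS, df, MS, F)}."""
    all_scores = [x for xs in cells.values() for x in xs]
    grand = mean(all_scores)

    def between(groups):
        # Between-sample variability: size * (group mean - total mean)^2.
        return sum(len(xs) * (mean(xs) - grand) ** 2 for xs in groups)

    def pooled(index):
        # Pool the cells that share one level of a factor.
        levels = {}
        for key, xs in cells.items():
            levels.setdefault(key[index], []).extend(xs)
        return list(levels.values())

    cowboy, gender = pooled(0), pooled(1)
    ss_model = between(cells.values())
    ss_cowboy, ss_gender = between(cowboy), between(gender)
    ss_inter = ss_model - ss_cowboy - ss_gender  # valid for balanced designs

    # Within-sample variability: (sample size - 1) * sample variance.
    ss_within = sum((len(xs) - 1) * variance(xs) for xs in cells.values())
    df_within = len(all_scores) - len(cells)
    ms_within = ss_within / df_within

    def row(ss, df):
        return ss, df, ss / df, (ss / df) / ms_within

    df_c, df_g = len(cowboy) - 1, len(gender) - 1
    return {
        "Model": row(ss_model, len(cells) - 1),
        "Cowboy": row(ss_cowboy, df_c),
        "Gender": row(ss_gender, df_g),
        "Cowboy*Gender": row(ss_inter, df_c * df_g),
        "Error/Residual": (ss_within, df_within, ms_within, None),
    }


table = two_way_anova(cells)
for source, (ss, df, ms, f) in table.items():
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<15} {ss:>8.1f} {df:>3} {ms:>7.1f} {f_text:>6}")
```

The printed rows should line up with the SS, df, MS and F columns of the output table above, up to rounding.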
12
Excel Add-in
Anova: Two-Factor With Replication

SUMMARY       Male    Female   Total
J. Wayne
  Count       5       5        10
  Sum         371     309      680
  Average     74.2    61.8     68
  Variance    38.2    69.2     90.4
C. Eastwood
  Count       5       5        10
  Sum         287     394      681
  Average     57.4    78.8     68.1
  Variance    96.3    42.7     189.0
None
  Count       5       5        10
  Sum         326     339      665
  Average     65.2    67.8     66.5
  Variance    103.2   73.7     80.5
Total
  Count       15      15
  Sum         984     1042
  Average     65.6    69.5
  Variance    118.4   106.1

ANOVA
Source of Variation   SS       df   MS      F       P-value   F crit
Sample                16.1     2    8.0     0.11    0.893     3.4
Columns               112.1    1    112.1   1.59    0.220     4.3
Interaction           1434.1   2    717.0   10.16   0.001     3.4
Within                1693.2   24   70.6
Total                 3255.5   29

Excel labels: Sample = Cowboy; Columns = Gender; Within = Error/Residual; Total = Corrected Total.
13
XLStat ANOVA
[Chart: Pred(I Score) / I Score. X-axis: Pred(I Score), 45 to 90. Y-axis: I Score, 45 to 90.]
Analysis of variance:
Source DF SS MS F Pr > F
Model 5 1562.3 312.5 4.43 0.005
Error 24 1693.2 70.5
Corrected Total 29 3255.5
Computed against model Y=Mean(Y)
Type I Sum of Squares analysis:
Source DF SS MS F Pr > F
Cowboy 2 16 8.0 0.1 0.893
Gender 1 112 112.1 1.6 0.220
Cowboy*Gender 2 1434 717.0 10.2 0.001
14
Post Hoc and
Planned Comparison tests
15
Tukey’s HSD (PH) vs Dunnett test (PC)
for Cowboys
                                   Tukey's HSD test     Dunnett test
                                   (Post Hoc)           (Planned Comparison)
MS Within                          70.5                 70.5
n (number per treatment or group)  10                   10
Standard Error                     2.66                 3.76
Standard Error formula             SQRT(MS Within/n)    SQRT(2*MS Within/n)

Average intelligence score:
C. Eastwood   68.1
J. Wayne      68.0
None          66.5

Critical value @ α 0.05, 2-tail
df within                          24                   24
k (# groups)                       3                    3
B (from table, α 0.05)             3.53                 2.35

A = Standardized Difference. Estimated Z value = A/B*1.96.
                                Tukey's HSD                      Dunnett
                      Differ.   A      Z value   2-tail P val.   A      Z value   2-tail P val.
C. East vs None       1.60      0.60   0.33      0.74            0.43   0.36      0.72
C. East vs J. Wayne   0.10      0.04   0.02      0.98            0.03   0.02      0.98
J. Wayne vs None      1.50      0.56   0.31      0.75            0.40   0.33      0.74
At α 0.05, every comparison is Not sign. under both tests.
16
Tukey’s Test (PH) (for Cowboys)
Tukey's Honestly Significant Difference (HSD) test. Post Hoc test
MS Within        70.5
n                10     (number per treatment or group)
Standard Error   2.66   SQRT(MS Within*(1/n))

Average intelligence score:
C. Eastwood   68.1
J. Wayne      68.0
None          66.5

A = Standardized Difference. Estimated Z value = A/B*1.96.
                      Differ.   A      Alpha 0.05   Estimated Z value   Estimated 2-tail P val.
C. East vs None       1.60      0.60   Not sign.    0.33                0.74
C. East vs J. Wayne   0.10      0.04   Not sign.    0.02                0.98
J. Wayne vs None      1.50      0.56   Not sign.    0.31                0.75

Critical value @ α 0.05, 2-tail
df within                24
k (# groups)             3
B (from table, α 0.05)   3.53
17
Dunnett test (PC) (for Cowboys)
Dunnett test. Planned Comparison
MS Within        70.5
n                10
Standard Error   3.76   SQRT(2*MS Within/n)

Average intelligence score:
C. Eastwood   68.1
J. Wayne      68.0
None          66.5

A = Standardized Difference. Estimated Z value = A/B*1.96.
                      Differ.   A      Alpha 0.05   Estimated Z value   Estimated 2-tail P val.
C. East vs None       1.60      0.43   Not sign.    0.36                0.72
C. East vs J. Wayne   0.10      0.03   Not sign.    0.02                0.98
J. Wayne vs None      1.50      0.40   Not sign.    0.33                0.74

Critical value @ α 0.05, 2-tail
df within                24
k (# groups)             3
B (from table, α 0.05)   2.35
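
The estimation used on the Tukey and Dunnett slides can be sketched as below. It follows the slides' shortcut as read from the column labels: standardize the mean difference by the test's standard error (A), rescale by the ratio of 1.96 to the test's tabulated critical value (B), and read a 2-tail p value off the normal distribution. This is an approximation of the slides' arithmetic, not an exact Tukey or Dunnett p value, and the function name is illustrative.

```python
import math


def estimated_p(mean_diff, std_error, critical_value, z_crit=1.96):
    """Return (A, estimated Z, estimated 2-tail p) using the slide's rescaling."""
    a = abs(mean_diff) / std_error          # standardized difference (A)
    z = a / critical_value * z_crit         # A / B * 1.96
    p = math.erfc(z / math.sqrt(2.0))       # 2 * (1 - Phi(z)) for z >= 0
    return a, z, p


ms_within, n = 70.5, 10
tukey_se = math.sqrt(ms_within / n)         # about 2.66
dunnett_se = math.sqrt(2 * ms_within / n)   # about 3.76

for label, diff in [("C. East vs None", 1.60),
                    ("C. East vs J. Wayne", 0.10),
                    ("J. Wayne vs None", 1.50)]:
    t = estimated_p(diff, tukey_se, 3.53)
    d = estimated_p(diff, dunnett_se, 2.35)
    print(f"{label:<20} Tukey p={t[2]:.2f}  Dunnett p={d[2]:.2f}")
```

Because the standard errors are not rounded to two decimals here, the last digit can occasionally differ from the slides.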
18
Comparing Dunnett vs Tukey’s across various Mean difference levels

            Standard error   α 5% 2-tail critical value
Dunnett's   3.76             2.35
Tukey's     2.66             3.53
Z                            1.96

Tukey's Test
Mean difference   Standard. differen.   % of 2-tail Critical val.   2-tail Z equival.   2-tail P val est.
0.5               0.19                  5.3%                        0.10                0.92
1.0               0.38                  10.7%                       0.21                0.83
2.0               0.75                  21.3%                       0.42                0.68
3.0               1.13                  32.0%                       0.63                0.53
4.0               1.51                  42.7%                       0.84                0.40
5.0               1.88                  53.3%                       1.05                0.30
6.0               2.26                  64.0%                       1.25                0.21
7.0               2.64                  74.7%                       1.46                0.14
8.0               3.01                  85.3%                       1.67                0.09
9.0               3.39                  96.0%                       1.88                0.06

Dunnett Test
Mean difference   Standard. differen.   % of 2-tail Critical val.   2-tail Z equival.   2-tail P val est.
0.5               0.13                  5.7%                        0.11                0.91
1.0               0.27                  11.3%                       0.22                0.82
2.0               0.53                  22.7%                       0.44                0.66
3.0               0.80                  34.0%                       0.67                0.51
4.0               1.06                  45.3%                       0.89                0.37
5.0               1.33                  56.6%                       1.11                0.27
6.0               1.60                  68.0%                       1.33                0.18
7.0               1.86                  79.3%                       1.55                0.12
8.0               2.13                  90.6%                       1.78                0.08
9.0               2.40                  102.0%                      2.00                0.05
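
One way to read the "% of 2-tail Critical val." columns is to ask at what mean difference each test reaches 100%. A minimal sketch, assuming that threshold is simply the critical value times the standard error (the function name is illustrative):

```python
import math


def minimum_significant_difference(std_error, critical_value):
    """Mean difference at which the standardized difference equals the critical value."""
    return critical_value * std_error


ms_within, n = 70.5, 10
tukey_msd = minimum_significant_difference(math.sqrt(ms_within / n), 3.53)
dunnett_msd = minimum_significant_difference(math.sqrt(2 * ms_within / n), 2.35)

print(f"Tukey:   {tukey_msd:.2f}")
print(f"Dunnett: {dunnett_msd:.2f}")
```

Both thresholds fall close to a mean difference of 9, which matches the last row of the tables: at 9.0 Dunnett has just crossed its critical value (102.0%) while Tukey's has not (96.0%).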
19
Dunnett vs Tukey’s visual comparison
[Chart: Dunnett vs Tukey 2-tail p value. X-axis: Mean difference, 0.5 to 9.0. Y-axis: 2-tail p value, 0.0 to 1.0. Series: Tukey, Dunnett.]
With a 2-tail test, the Dunnett test is only marginally more sensitive than
Tukey’s test (it has slightly more Power, so it is slightly more likely to find a
statistically significant difference). However, with Dunnett, if warranted, you
can also use a 1-tail test, which would make a huge difference. With Tukey’s,
you can’t do that.
20
Bonferroni vs Sidak Test adjusted α value
Multiple hypothesis testing adjustments
Corresponding to α: 5%
Adjusted α value:
# of hypotheses   Bonferroni   Sidak
1                 5.00%        5.00%
2                 2.50%        2.53%
3                 1.67%        1.70%
4                 1.25%        1.27%
5                 1.00%        1.02%
6                 0.83%        0.85%
7                 0.71%        0.73%
8                 0.63%        0.64%
9                 0.56%        0.57%
10                0.50%        0.51%
Bonferroni: α / # of hypotheses
Sidak: 1 - (1 - α)^(1/# of hypotheses)
These tests consist of adjusting the relevant Alpha threshold (i.e. 5%) for
the number of hypotheses you are testing. Bonferroni simply divides the Alpha
value by the # of hypotheses. Sidak uses a compounding formula that is
technically more accurate but makes no material difference in this situation.
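
The two adjustment formulas are short enough to check directly. The sketch below reproduces the adjusted values for 1 to 10 hypotheses; the function names are illustrative.

```python
def bonferroni_alpha(alpha, m):
    """Adjusted threshold: alpha divided by the number of hypotheses."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Adjusted threshold: 1 - (1 - alpha)^(1/m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


for m in range(1, 11):
    b = bonferroni_alpha(0.05, m)
    s = sidak_alpha(0.05, m)
    print(f"{m:>2}  {b:.2%}  {s:.2%}")
```

An individual p value is then compared with the adjusted threshold instead of the original 5%.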
21
What would be the qualifying α value?
At what familywise level α would a single hypothesis qualify (Sidak logic)

                  Original unpaired t test p value
# of hypotheses   0.5%   1%     2.5%   5%     10%    15%
1                 0.01   0.01   0.03   0.05   0.10   0.15
2                 0.01   0.02   0.05   0.10   0.19   0.28
3                 0.01   0.03   0.07   0.14   0.27   0.39
4                 0.02   0.04   0.10   0.19   0.34   0.48
5                 0.02   0.05   0.12   0.23   0.41   0.56
6                 0.03   0.06   0.14   0.26   0.47   0.62
7                 0.03   0.07   0.16   0.30   0.52   0.68
8                 0.04   0.08   0.18   0.34   0.57   0.73
9                 0.04   0.09   0.20   0.37   0.61   0.77
10                0.05   0.10   0.22   0.40   0.65   0.80
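
The table can also be read in reverse: for a given original p value, how many hypotheses can be tested before it stops qualifying at a familywise 5%? The sketch below assumes the table's entries are 1 - (1 - p)^m, which matches the "Sidak logic" label and the tabulated values; both function names are hypothetical.

```python
import math


def familywise_level(p_single, m):
    """Familywise level at which a single p value qualifies among m hypotheses."""
    return 1.0 - (1.0 - p_single) ** m


def max_hypotheses(p_single, alpha_familywise=0.05):
    """Largest m for which familywise_level(p_single, m) stays within the target."""
    return math.floor(math.log(1.0 - alpha_familywise) / math.log(1.0 - p_single))


for p in (0.005, 0.01, 0.025):
    m = max_hypotheses(p)
    print(f"p = {p:.1%}: qualifies for up to {m} hypotheses "
          f"(familywise level {familywise_level(p, m):.3f})")
```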
22
A Radical Idea: Skipping ANOVA
[Chart: the HYPOTHESIS TESTING FLOW CHART from slide 8, repeated here unchanged.]
23
Streamlined Testing
HYPOTHESIS TESTING FLOW CHART
Multiple hypothesis testing. > 2 Samples or Groups

Research structure
  Are we testing different groups once?
    Unpaired testing. Difference between independent groups (Between-Groups). Single measure or observation.
      Normal: Unpaired t test
      Not Normal: Mann-Whitney
  Are we testing the same group(s) at different times?
    Paired testing. Difference between same group before and after treatment (Within-Groups). Repeated measures or observations.
      Normal: Paired t test
      Not Normal: Wilcoxon Signed-Rank test

Post Hoc Test (Which group is different?)
  Bonferroni or Sidak test (PH)
24
Disadvantages of Streamlined Testing
• You can’t run Tukey’s (PH) and Dunnett (PC). Those tests are not just adjustments to the P value, and may be superior in certain circumstances;
• You don’t have access to any Planned Comparison tests, which are more sensitive (more Power) and allow you to use a 1-tail P value when warranted;
• ANOVA gives you valuable information about the different independent variables and their interaction.

Between-Sample Variability/Within-Sample Variability Output (BSV/WSV)
Source          SS        df   MS       F       Sign.
Model           1,562.3   5    312.45   4.43    0.005
Cowboy          16.1      2    8.03     0.11    0.893
Gender          112.1     1    112.13   1.59    0.220
Cowboy*Gender   1,434.1   2    717.03   10.16   0.001