Institut für Medizinische Biometrie, Epidemiologie und Informatik

Aesthetics and power in multiple testing – a contradiction?

MCP 2007, Vienna

Gerhard Hommel

Introduction: Economics and Statistics

Economics: profit is not everything
  Ethical / social component
  Competing interests
  Aesthetics: protection of environment, industrial art, patronage

Statistics: power is not everything
  Ethics: decisions are logical, comprehensible, simple
  Competing interests
  Aesthetics: “beauty of mathematics” (subjective), but also same points as for ethics

Examples for (non-)aesthetics:

Closure test
  + : principle simple to describe
  + : coherence directly obtained
  – : often very cumbersome to perform

Bonferroni-Holm: SD(α/n, α/(n-1), …, α/2, α)

Hochberg: SU(α/n, α/(n-1), …, α/2, α)

FDP, e.g. control of P(FDP > 0.2): SD(α/n, α/(n-1), α/(n-2), α/(n-3), 2α/(n-3), 2α/(n-4), …, 3α/(n-7), …)

not beautiful (and not powerful)!
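The SD (step-down) and SU (step-up) notation above can be made concrete with a minimal sketch. All function names are illustrative. The formula used in `fdp_constants`, (⌊γi⌋+1)α/(n+⌊γi⌋+1−i), is an assumption: it is chosen because it reproduces the sequence shown for γ = 0.2, not because the slide states it.

```python
import math
from fractions import Fraction


def holm_constants(n, alpha):
    """Critical values alpha/n, alpha/(n-1), ..., alpha/2, alpha."""
    return [alpha / (n - i) for i in range(n)]


def fdp_constants(n, alpha, gamma):
    """Assumed formula: (floor(gamma*i) + 1) * alpha / (n + floor(gamma*i) + 1 - i)."""
    constants = []
    for i in range(1, n + 1):
        k = math.floor(gamma * i)
        constants.append((k + 1) * alpha / (n + k + 1 - i))
    return constants


def step_down(pvals, constants):
    """SD: start at the smallest p-value, stop at the first non-rejection."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    rejected = set()
    for rank, i in enumerate(order):
        if pvals[i] > constants[rank]:
            break
        rejected.add(i)
    return rejected


def step_up(pvals, constants):
    """SU: find the largest rank meeting its constant, reject everything up to it."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    for rank in range(len(order) - 1, -1, -1):
        if pvals[order[rank]] <= constants[rank]:
            return set(order[: rank + 1])
    return set()


alpha = Fraction(1, 20)
pvals = [0.01, 0.04, 0.03]
holm_rejected = step_down(pvals, holm_constants(3, alpha))    # Bonferroni-Holm
hochberg_rejected = step_up(pvals, holm_constants(3, alpha))  # Hochberg
fdp = fdp_constants(12, alpha, Fraction(1, 5))                # gamma = 0.2, n = 12
```

With these illustrative p-values the step-down procedure rejects only the first hypothesis, while the step-up procedure with the same constants rejects all three.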

Logical decisions: Coherence

Coherence: When a hypothesis (= subset of the parameter space) is rejected, each of its subsets can be rejected, too.

Closure test: Local level α tests for all ∩-hypotheses + coherence ⇒ control of multiple level (FWER) α.

Closure tests form a complete class within all MTPs controlling the FWER α.

But: Bonferroni-Holm is not coherent, in general!

Quasi-coherence: coherence for all index sets forming an intersection.

Quasi-closure test: Local level α tests for all index sets + quasi-coherence ⇒ control of multiple level (FWER) α.
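A closure test carried out over index sets can be sketched as follows. The Bonferroni local test is an assumption chosen for illustration (any local level α test could be plugged in), and the function names are hypothetical. A hypothesis Hi is rejected only if the local test rejects every index set containing i.

```python
from itertools import combinations


def bonferroni_local_test(pvals, index_set, alpha):
    """Local level-alpha test of the intersection hypothesis for index_set."""
    return min(pvals[i] for i in index_set) <= alpha / len(index_set)


def closure_test(pvals, alpha, local_test=bonferroni_local_test):
    """Reject H_i iff all index sets containing i are rejected locally."""
    n = len(pvals)
    all_sets = [
        s for size in range(1, n + 1) for s in combinations(range(n), size)
    ]
    locally_rejected = {s for s in all_sets if local_test(pvals, s, alpha)}
    return {
        i
        for i in range(n)
        if all(s in locally_rejected for s in all_sets if i in s)
    }


few = closure_test([0.01, 0.04, 0.03], alpha=0.05)
everything = closure_test([0.01, 0.02, 0.03], alpha=0.05)
```

With Bonferroni local tests this sketch gives the same decisions as Bonferroni-Holm, but it visits all 2^n − 1 index sets, which illustrates why the closure test is often cumbersome to perform.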

Monotonic decisions

Consider: monotonicity between different hypotheses:

p1, …, pn = p-values

pi ≤ pj and Hj rejected ⇒ Hi rejected.

Not obligatory when hypotheses carry weights (from importance or expected power)

See
  Benjamini / Hochberg (1997)
  Fixed sequence tests
  Gatekeeping procedures

Monotonic decisions: nested hypotheses

Example: Yi = β0 + β1 xi + β2 xi² + εi

H1: β1 = β2 = 0
H2: β2 = 0

F test of H1: p = .051

t test of H2: p = .024

Bonferroni-Holm (α = .05) rejects only H2

Logical: reject H1, too.

Size of a p-value is not the only criterion for rejection!

xi   -3   -2   -1    0     1    2    3
yi    8    2   -1    1.6  -2    3    4
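The two p-values can be reproduced from the data with a short pure-Python sketch. It relies on two simplifications that are assumptions of this sketch: the design is symmetric about 0, so the linear and (centered) quadratic terms are orthogonal, and the tail probabilities use closed forms that hold only for F(2, 4) and for t with 4 degrees of freedom.

```python
import math

x = [-3, -2, -1, 0, 1, 2, 3]
y = [8, 2, -1, 1.6, -2, 3, 4]
n = len(x)


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ss_explained(column, response):
    """Sum of squares explained by one column orthogonal to the others."""
    cross = sum(c * r for c, r in zip(column, response))
    return cross ** 2 / sum(c * c for c in column)


yc = centered(y)
linear = x                                   # already centered (sum of x is 0)
quadratic = centered([xi ** 2 for xi in x])  # orthogonal to 1 and to x here

ss_total = sum(v * v for v in yc)
ss_lin = ss_explained(linear, yc)
ss_quad = ss_explained(quadratic, yc)
df_error = n - 3
mse = (ss_total - ss_lin - ss_quad) / df_error

f_h1 = ((ss_lin + ss_quad) / 2) / mse  # F test of H1: beta1 = beta2 = 0
t2_h2 = ss_quad / mse                  # squared t statistic for H2: beta2 = 0

# Closed-form tail probabilities, valid only for these degrees of freedom.
p_h1 = (1 + 2 * f_h1 / df_error) ** (-df_error / 2)  # F(2, 4)
s = math.sqrt(t2_h2 / (df_error + t2_h2))
p_h2 = 1 - s * (1 + (1 - s * s) / 2)                 # two-sided t, 4 df

# Bonferroni-Holm at alpha = .05 on the two p-values.
alpha = 0.05
pvals = {"H1": p_h1, "H2": p_h2}
holm_rejected = []
for rank, name in enumerate(sorted(pvals, key=pvals.get)):
    if pvals[name] > alpha / (len(pvals) - rank):
        break
    holm_rejected.append(name)
```

Holm stops after rejecting H2 because the p-value of H1 slightly exceeds α. Since H1 implies H2, rejecting H2 logically entails that H1 is false as well.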

Monotonic decisions: multiple comparisons

Example: Comparison of k = 4 means (ANOVA)

Hij: μi = μj, 1 ≤ i < j ≤ 4

p13 = .0241 < p34 = .0244 (t test; pooled variance)

Closure test rejects H14, H24, H34, but not H13!

(same result with REGWQ)

Non-monotonicity may be reasonable:

It is easier to separate group 4 from the cluster of groups 1, 2, 3 than to find differences within the cluster.

group        1   2   3   4
mean value   0   1   2   3.99

Monotonic decisions

My conclusion:

Only with equal weights and no logical constraints is it mandatory that

  decisions are monotonic in p-values, and
  decisions are exchangeable.

Monotonicity within same hypothesis (α-consistency)

Given p-values p1, …, pn; q1, …, qn

with qi ≤ pi for i = 1, …, n.

When a hypothesis is rejected based on the pi's, it should also be rejected when based on the qi's.

Counterexample 1 (WAP procedure of Benjamini-Hochberg, 1997):

Step-down based on p(j) ≤ w(j)α/(w(j)+…+w(n)):

Controls the FWER, but is not α-consistent.
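The lack of α-consistency can be seen in a small numerical sketch that implements the step-down rule exactly as written above. The weights and p-values are hypothetical values chosen for illustration, and `wap_step_down` is an illustrative name.

```python
def wap_step_down(pvals, weights, alpha):
    """Step-down on ordered p-values: p(j) <= w(j)*alpha / (w(j)+...+w(n))."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    rejected = set()
    for pos, i in enumerate(order):
        remaining = sum(weights[j] for j in order[pos:])
        if pvals[i] > weights[i] * alpha / remaining:
            break
        rejected.add(i)
    return rejected


weights = [0.9, 0.1]
p = [0.04, 0.045]
q = [0.04, 0.03]  # q_i <= p_i for every i

rejected_p = wap_step_down(p, weights, alpha=0.05)
rejected_q = wap_step_down(q, weights, alpha=0.05)
```

Based on p, both hypotheses are rejected. Based on the smaller values q, the lightly weighted hypothesis moves to the front of the ordering, fails its threshold of .005, and the procedure stops without rejecting anything.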

Monotonicity within same hypothesis (α-consistency)

Counterexample 2: Tarone's (1990) MTP
Uses information about minimum attainable p-values α1*, …, αn*

n = 2, α1* = .03, α2* = .04:
  α = .05: no Hj can be rejected;
  α = .035: H1 can be rejected if p1 ≤ .035.

Hommel/Krummenauer (1998): monotonic improvement of Tarone's procedure (using a "rejection function" b(α))
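The n = 2 example can be checked with a minimal sketch. It assumes the usual formulation of Tarone's rule: take the smallest K such that at most K hypotheses have a minimum attainable p-value ≤ α/K, then test those hypotheses at level α/K. The function name and the observed p-values are illustrative.

```python
def tarone(pvals, min_attainable, alpha):
    """Return the indices rejected under the assumed form of Tarone's rule."""
    n = len(pvals)
    for k in range(1, n + 1):
        # Hypotheses that could reach significance at level alpha / k at all.
        testable = [i for i in range(n) if min_attainable[i] <= alpha / k]
        if len(testable) <= k:
            return {i for i in testable if pvals[i] <= alpha / k}
    return set()


min_attainable = [0.03, 0.04]
pvals = [0.03, 0.04]  # each p-value at its minimum attainable value

at_050 = tarone(pvals, min_attainable, alpha=0.05)
at_035 = tarone(pvals, min_attainable, alpha=0.035)
```

At α = .05 both hypotheses are testable for K = 1, so K = 2 is used and the level .025 lies below both minimum attainable p-values. At the smaller α = .035 only H1 is testable, K = 1 suffices, and H1 is rejected.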

The fallback procedure (I)

Wiens (2003): "fixed sequence testing procedure" with possibility to continue

Dmitrienko, Wiens, Westfall (2005): "fallback procedure"

Wiens + Dmitrienko (2005): Proof that FWER is controlled, suggestion for improvement

Two types of weights:
  sequence of hypotheses;
  "assigned weights" α1', …, αn' with Σαi' = α.

The fallback procedure (II)

Use "assigned weights" α1', …, αn' with Σαi' = α.

Actual significance levels:

α1 = α1'

αi = αi' + αi-1 if Hi-1 has been rejected

αi = αi' if Hi-1 has not been rejected.

α1' = α, α2' = ... = αn' = 0 ⇒ fixed sequence test.
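The recursion for the actual significance levels translates directly into a loop. The following is a minimal sketch; the function name and the p-values are illustrative.

```python
def fallback(pvals, assigned):
    """Test H1, ..., Hn in the given order; return one True/False per hypothesis.

    The actual level of Hi is its assigned weight plus the actual level of
    H(i-1) if H(i-1) was rejected, and only its assigned weight otherwise.
    """
    decisions = []
    carried = 0.0
    for p, a in zip(pvals, assigned):
        level = a + carried
        reject = p <= level
        decisions.append(reject)
        carried = level if reject else 0.0
    return decisions


alpha = 0.05

# All weight on H1 gives the fixed sequence test.
fixed_continue = fallback([0.01, 0.03, 0.2], [alpha, 0.0, 0.0])
fixed_stop = fallback([0.06, 0.001, 0.001], [alpha, 0.0, 0.0])

# Positive assigned weights allow testing to continue after a non-rejection.
with_fallback = fallback([0.06, 0.001, 0.001], [0.03, 0.01, 0.01])
```

In the fixed sequence case nothing can be rejected once H1 is retained, however small the later p-values are. With positive assigned weights, H2 and H3 can still be rejected.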

Example for n = 2

Endpoint 1: Functional capacity of heart
Endpoint 2: Mortality
α = .05, α1' = .04, α2' = .01

p1 ≤ .04: Reject H1 and test H2 with α2 = .05.

p1 > .04: Retain H1 and test H2 with α2 = .01.

Weighted Bonferroni-Holm with α1' = .04, α2' = .01:

Rejects H1, in addition, when p2 ≤ .01 and .04 < p1 ≤ .05!
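The difference between the two procedures for n = 2 can be reproduced with a minimal sketch. The p-values (.045, .005) are hypothetical values chosen to fall into the region described above, and the weighted Bonferroni-Holm implementation is one common sequentially rejective formulation, assumed here for illustration.

```python
def fallback(pvals, assigned):
    """Fallback procedure: one True/False decision per hypothesis, in order."""
    decisions = []
    carried = 0.0
    for p, a in zip(pvals, assigned):
        level = a + carried
        reject = p <= level
        decisions.append(reject)
        carried = level if reject else 0.0
    return decisions


def weighted_holm(pvals, assigned):
    """Weighted Bonferroni-Holm; returns the set of rejected indices."""
    alpha = sum(assigned)
    active = set(range(len(pvals)))
    rejected = set()
    while active:
        total = sum(assigned[i] for i in active)
        if total == 0:
            break
        # Weighted Bonferroni test among the hypotheses still active.
        hits = {i for i in active if pvals[i] <= assigned[i] * alpha / total}
        if not hits:
            break
        rejected |= hits
        active -= hits
    return rejected


assigned = [0.04, 0.01]
pvals = [0.045, 0.005]

fallback_decisions = fallback(pvals, assigned)
wbh_rejected = weighted_holm(pvals, assigned)
```

The fallback procedure retains H1 and rejects only H2. Weighted Bonferroni-Holm first rejects H2 at level .01 and then tests H1 at the full level .05, so it rejects both.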

Comparison with weighted Bonferroni-Holm

For n = 2: WBH is strictly more powerful than the fallback procedure. The improvement by Wiens + Dmitrienko is identical to WBH.

For n ≥ 3: There exist situations where fallback rejects and WBH does not, and conversely. (⇒ the improvement by W+D is not identical to WBH)

The fallback procedure for n = 3: weights for intersection hypotheses

αi' = wi α

Σ wi = 1

(see W+D)

index set    weight for index
             1       2        3
{1,2,3}      w1      w2       w3
{1,2}        w1      w2       --
{1,3}        w1      --       w2+w3
{2,3}        --      w1+w2    w3
{1}          w1      --       --
{2}          --      w1+w2    --
{3}          --      --       w1+w2+w3
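The pattern in the table can be generated mechanically: each index in the set receives its own weight plus the weights of the immediately preceding indices that are not in the set. The following minimal sketch reproduces the table under that reading; the function name is illustrative, and equal weights are used only as an example.

```python
from fractions import Fraction
from itertools import combinations


def fallback_intersection_weights(weights, index_set):
    """Weights of the intersection hypothesis for index_set (1-based indices)."""
    result = {}
    carried = 0
    for i, w in enumerate(weights, start=1):
        carried += w
        if i in index_set:
            # Index i collects its own weight and those passed on to it.
            result[i] = carried
            carried = 0
    return result


weights = [Fraction(1, 3)] * 3
table = {
    index_set: fallback_intersection_weights(weights, index_set)
    for size in (3, 2, 1)
    for index_set in combinations((1, 2, 3), size)
}
```

Weight that is carried past the last index of the set is lost, which is why the weights of {1,2}, {1} and {2} sum to less than 1.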

The fallback procedure for n = 3: equal weights

αi' = wi α

wi = 1/3

Consequence for importance: H2 ≥ H3 ≥ H1?

index set    weight for index
             1      2      3
{1,2,3}      1/3    1/3    1/3
{1,2}        1/3    1/3    --
{1,3}        1/3    --     2/3
{2,3}        --     2/3    1/3
{1}          1/3    --     --
{2}          --     2/3    --
{3}          --     --     1

The fallback procedure for n = 3: equal weights; improvement by W+D

αi' = wi α

wi = 1/3

Consequence for importance: H2 ≥ H3 ≥ H1 (remains)

index set    weight for index
             1      2      3
{1,2,3}      1/3    1/3    1/3
{1,2}        1/2    1/2    --
{1,3}        1/3    --     2/3
{2,3}        --     2/3    1/3
{1}          1      --     --
{2}          --     1      --
{3}          --     --     1

The decisions of the fallback procedure (with equal weights) are not exchangeable (and can never become so!).

Example: p(1) = .015, p(2) = .02, p(3) = 1; α = .05.

(Bonferroni-Holm: rejects H(1) and H(2))

p1 < p2 < p3 : reject H1, H2
p1 < p3 < p2 : reject H1
p2 < p1 < p3 : reject H2
p2 < p3 < p1 : reject H2, H3
p3 < p1 < p2 : reject H3 (and H1, with the improvement by W+D)
p3 < p2 < p1 : reject H3
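The six cases can be verified by running the fallback procedure (without the improvement by W+D) on every assignment of the three p-values to H1, H2, H3. This is a minimal sketch with illustrative names.

```python
from itertools import permutations


def fallback(pvals, assigned):
    """Fallback procedure: one True/False decision per hypothesis, in order."""
    decisions = []
    carried = 0.0
    for p, a in zip(pvals, assigned):
        level = a + carried
        reject = p <= level
        decisions.append(reject)
        carried = level if reject else 0.0
    return decisions


alpha = 0.05
assigned = [alpha / 3] * 3
ordered_pvalues = (0.015, 0.02, 1.0)

# Key: (p1, p2, p3); value: names of the rejected hypotheses.
decisions = {}
for assignment in permutations(ordered_pvalues):
    outcome = fallback(assignment, assigned)
    decisions[assignment] = tuple(
        f"H{i + 1}" for i, rejected in enumerate(outcome) if rejected
    )
```

The same three p-values lead to one or two rejections depending only on which hypothesis they belong to, which is the lack of exchangeability stated above.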

The fallback procedure: critical questions

What is the relation between the two different types of weighting?

Can it be meaningful to give higher assigned weights to higher indices?

Can one give "guidelines" on how to choose the weights?

Equal assigned weights: what is the influence of ordering? (anyway: the procedure has "aesthetic" drawbacks)

In which situations can one expect the fallback procedure to be more powerful than WBH?

Or would it be better to abandon it completely?

Thank you for your attendance!

Are there more questions?

Or some answers?
