Institut für Medizinische Biometrie, Epidemiologie und Informatik

Aesthetics and power in multiple testing – a contradiction?

MCP 2007, Vienna

Gerhard Hommel

Introduction: Economics and Statistics

Economics: profit is not everything
- Ethical / social component
- Competing interests
- Aesthetics: protection of environment, industrial art, patronage

Statistics: power is not everything
- Ethics: decisions are logical, conceivable, simple
- Competing interests
- Aesthetics: “beauty of mathematics” (subjective), but also same points as for ethics

Examples for (non-) aesthetics:

Closure test
+ : principle simple to describe
+ : coherence directly obtained
– : often very cumbersome to perform

Bonferroni-Holm: SD(α/n, α/(n-1), … , α/2, α)
Hochberg: SU(α/n, α/(n-1), … , α/2, α)

FDP, e.g. control of P(FDP > 0.2):
SD(α/n, α/(n-1), α/(n-2), α/(n-3), 2α/(n-3), 2α/(n-4), … , 3α/(n-7), …)

not beautiful (and not powerful)!
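The SD(...) and SU(...) notation lists the critical values that are compared with the ordered p-values, step-down or step-up. A minimal Python sketch of both schemes follows. The function names and the example p-values are illustrative assumptions, not part of the talk. The closed form in `fdp_crit` is also an assumption: it reproduces the constants listed for P(FDP > 0.2) and appears to match the step-down constants of Lehmann and Romano (2005).

```python
from math import floor

def step_down(pvals, crit):
    """Go through p(1) <= p(2) <= ...; reject until the first p(j) > crit[j]."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    rejected = set()
    for rank, i in enumerate(order):
        if pvals[i] > crit[rank]:
            break
        rejected.add(i)
    return rejected

def step_up(pvals, crit):
    """Find the largest j with p(j) <= crit[j]; reject H(1), ..., H(j)."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    for rank in range(len(order) - 1, -1, -1):
        if pvals[order[rank]] <= crit[rank]:
            return set(order[:rank + 1])
    return set()

def holm_crit(n, alpha):
    """Critical values alpha/n, alpha/(n-1), ..., alpha/2, alpha."""
    return [alpha / (n - j) for j in range(n)]

def fdp_crit(n, alpha, gamma=0.2):
    """Assumed closed form: (floor(gamma*j)+1)*alpha / (n + floor(gamma*j) + 1 - j)."""
    crit = []
    for j in range(1, n + 1):
        k = floor(gamma * j + 1e-12)  # small offset guards against float rounding
        crit.append((k + 1) * alpha / (n + k + 1 - j))
    return crit

# Illustrative p-values (hypothetical), indices 0..3
p_example = [0.01, 0.02, 0.03, 0.04]
holm_result = step_down(p_example, holm_crit(4, 0.05))      # Bonferroni-Holm
hochberg_result = step_up(p_example, holm_crit(4, 0.05))    # Hochberg
```

With these p-values the step-down version stops after the first rejection, while the step-up version rejects all four hypotheses, which shows how the same critical values lead to different decisions.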

Logical decisions: Coherence

Coherence: When a hypothesis (= subset of the parameter space) is rejected, each of its subsets can be rejected as well.

Closure test: Local level α tests for all ∩-hypotheses + coherence ⇒ control of multiple level (FWER) α.

Closure tests form a complete class within all MTPs controlling the FWER at level α.

But: Bonferroni-Holm is not coherent, in general!

Quasi-coherence: coherence for all index sets forming an intersection.

Quasi-closure test: Local level α tests for all index sets + quasi-coherence ⇒ control of multiple level (FWER) α.
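The closure principle can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the local tests are plain Bonferroni tests, and the hypotheses are assumed to have no logical relations (all intersection hypotheses distinct). In that setting the closure test is known to give the same decisions as Bonferroni-Holm.

```python
from itertools import combinations

def bonferroni_local(subset, pvals, alpha):
    """Local level-alpha test of the intersection hypothesis indexed by subset."""
    return min(pvals[i] for i in subset) <= alpha / len(subset)

def closure_test(pvals, alpha, local_test=bonferroni_local):
    """Reject H_i iff every intersection hypothesis containing i is rejected locally."""
    n = len(pvals)
    all_subsets = [s for size in range(1, n + 1)
                   for s in combinations(range(n), size)]
    locally_rejected = {s for s in all_subsets if local_test(s, pvals, alpha)}
    return {i for i in range(n)
            if all(s in locally_rejected for s in all_subsets if i in s)}

# Illustrative p-values (hypothetical)
closure_a = closure_test([0.01, 0.02, 0.03, 0.04], 0.05)
closure_b = closure_test([0.01, 0.015, 0.03], 0.05)
```

The loop over all 2^n - 1 index sets is exactly what makes the closure test "cumbersome to perform" for larger n.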

Monotonic decisions

Consider: monotonicity between different hypotheses:

p1, … , pn = p-values

pi ≤ pj and Hj rejected ⇒ Hi rejected.

Not obligatory: weights for hypotheses (from importance or expected power)

See:
- Benjamini / Hochberg (1997)
- Fixed sequence tests
- Gatekeeping procedures

Monotonic decisions: nested hypotheses

Example: Yi = β0 + β1 xi + β2 xi² + εi

xi   –3   –2   –1    0    1    2    3
yi    8    2   –1  1.6   –2    3    4

H1: β1 = β2 = 0
H2: β2 = 0

F test of H1: p = .051

t test of H2: p = .024

Bonferroni-Holm (α = .05) rejects only H2

Logical: reject H1, too.

Size of a p-value is not the only criterion for rejection!
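The two p-values can be recomputed from the seven data points. The sketch below is an illustration only: it uses a hand-written least-squares helper (`rss`, a hypothetical name) and closed-form tail probabilities that hold specifically for 4 error degrees of freedom, so that no statistics library is needed. The final step applies the logical constraint that H1 is a subset of H2.

```python
import math

x = [-3, -2, -1, 0, 1, 2, 3]
y = [8, 2, -1, 1.6, -2, 3, 4]

def rss(columns, y):
    """Residual sum of squares after least-squares regression of y on the columns."""
    k = len(columns)
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(u * v for u, v in zip(columns[r], columns[c])) for c in range(k)]
         + [sum(u * v for u, v in zip(columns[r], y))] for r in range(k)]
    for i in range(k):
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for r in range(k):
            if r != i:
                factor = a[r][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    beta = [row[k] for row in a]
    fitted = [sum(b * col[t] for b, col in zip(beta, columns)) for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

ones = [1.0] * len(x)
quad = [v * v for v in x]
rss_full = rss([ones, x, quad], y)      # full quadratic model, 7 - 3 = 4 error df
mse = rss_full / 4

f_h1 = (rss([ones], y) - rss_full) / 2 / mse   # F statistic for H1, df (2, 4)
f_h2 = (rss([ones, x], y) - rss_full) / mse    # squared t statistic for H2, df (1, 4)

# Closed-form tail probabilities, valid only for 4 denominator degrees of freedom
p_h1 = (1 + f_h1 / 2) ** -2
u = math.sqrt(f_h2 / (4 + f_h2))
p_h2 = 1 - 1.5 * u + 0.5 * u ** 3

# Bonferroni-Holm at alpha = .05
alpha = 0.05
p = {"H1": p_h1, "H2": p_h2}
holm = set()
for rank, name in enumerate(sorted(p, key=p.get)):
    if p[name] > alpha / (len(p) - rank):
        break
    holm.add(name)

# H1 is a subset of H2, so rejecting H2 logically implies rejecting H1.
logical = holm | ({"H1"} if "H2" in holm else set())
```

Rounded to three decimals, `p_h1` and `p_h2` agree with the values .051 and .024 quoted above; Holm alone returns only H2, and the logical step adds H1.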

Monotonic decisions: multiple comparisons

Example: Comparison of k = 4 means (ANOVA)

group        1   2   3   4
mean value   0   1   2   3.99

Hij: μi = μj , 1 ≤ i < j ≤ 4

p13 = .0241 < p34 = .0244 (t test; pooled variance)

Closure test rejects H14, H24, H34, but not H13!

(same result with regwq)

Non-monotonicity may be reasonable:

It is easier to separate group 4 from the cluster of groups 1, 2, 3 than to find differences within the cluster.

Monotonic decisions

My conclusion:

Only for equal weights and no logical constraints, it is mandatory that
- decisions are monotonic in p-values, and
- decisions are exchangeable.

Monotonicity within same hypothesis (α-consistency)

Given p-values p1, …, pn; q1, …, qn

with qi ≤ pi for i = 1, …, n.

When a hypothesis is rejected based on the pi's, it should also be rejected when based on the qi's.

Counterexample 1 (WAP procedure of Benjamini-Hochberg, 1997):

Stepdown based on p(j) ≤ w(j)α/(w(j)+…+w(n)):

Controls the FWER, but is not α-consistent.
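A small numerical illustration of this counterexample, as a sketch only: the function implements the step-down rule exactly as written above, while the weights (9 and 1) and the p-values are hypothetical numbers chosen for the illustration, not taken from the talk.

```python
def wap_step_down(pvals, weights, alpha):
    """Step-down on ordered p-values: reject H(j) while
    p(j) <= w(j) * alpha / (w(j) + ... + w(n))."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    remaining_weight = sum(weights)
    rejected = set()
    for i in order:
        if pvals[i] > weights[i] * alpha / remaining_weight:
            break
        rejected.add(i)
        remaining_weight -= weights[i]
    return rejected

weights = [9, 1]          # hypothetical weights
p = [0.04, 0.045]         # original p-values
q = [0.04, 0.03]          # q_i <= p_i for every i

with_p = wap_step_down(p, weights, 0.05)
with_q = wap_step_down(q, weights, 0.05)
```

Based on `p`, both hypotheses are rejected; based on the smaller values `q`, the lightly weighted hypothesis moves to the front of the ordering, fails its threshold of .005, and nothing is rejected.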

Monotonicity within same hypothesis (α-consistency)

Counterexample 2: Tarone's (1990) MTP

Uses information about minimum attainable p-values α1*, …, αn*

n = 2, α1* = .03, α2* = .04:
- α = .05: no Hj can be rejected;
- α = .035: H1 can be rejected if p1 ≤ .035.

Hommel/Krummenauer (1998): monotonic improvement of Tarone's procedure (using a “rejection function” b(α))
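The n = 2 example can be traced with a short sketch of Tarone's procedure as it is commonly stated: find the smallest K such that at most K hypotheses have a minimum attainable p-value of at most α/K, then test those hypotheses at level α/K. The function name and the observed p-values are assumptions made for the illustration.

```python
def tarone(pvals, min_attainable, alpha):
    """Tarone-type procedure using minimum attainable p-values (0-based indices)."""
    n = len(pvals)
    for k in range(1, n + 1):
        # Hypotheses that could possibly be rejected at level alpha / k
        testable = [i for i in range(n) if min_attainable[i] <= alpha / k]
        if len(testable) <= k:   # always true at the latest for k = n
            return {i for i in testable if pvals[i] <= alpha / k}
    return set()

min_attainable = [0.03, 0.04]
observed = [0.03, 0.5]           # hypothetical observed p-values

at_050 = tarone(observed, min_attainable, 0.05)    # K = 2, level .025: nothing testable
at_035 = tarone(observed, min_attainable, 0.035)   # K = 1, level .035: H1 testable
```

Lowering α from .05 to .035 turns "no rejection possible" into a rejection of H1, which is the non-monotonic behaviour described above.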

The fallback procedure (I)

Wiens (2003): “fixed sequence testing procedure” with possibility to continue

Dmitrienko, Wiens, Westfall (2005): “fallback procedure”

Wiens + Dmitrienko (2005): Proof that FWER is controlled, suggestion for improvement

Two types of weights:
- sequence of hypotheses;
- “assigned weights” α1′, …, αn′ with Σαi′ = α.

The fallback procedure (II)

Use “assigned weights” α1′, …, αn′ with Σαi′ = α.

Actual significance levels:

α1 = α1′

αi = αi′ + αi-1 if Hi-1 has been rejected

αi = αi′ if Hi-1 has not been rejected.

α1′ = α, α2′ = ... = αn′ = 0 ⇒ fixed sequence test.
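The recursion for the actual significance levels translates directly into a loop. The following is a minimal sketch; the function name and the example p-values are illustrative assumptions.

```python
def fallback(pvals, assigned):
    """Fallback procedure; hypotheses are tested in the given order.
    Returns the set of rejected hypotheses (1-based indices)."""
    rejected = set()
    carry = 0.0                      # level passed on from a rejected predecessor
    for i, (p, a) in enumerate(zip(pvals, assigned), start=1):
        level = a + carry            # alpha_i = alpha_i' (+ alpha_{i-1} if H_{i-1} rejected)
        if p <= level:
            rejected.add(i)
            carry = level
        else:
            carry = 0.0
    return rejected

# Special case alpha_1' = alpha, all other assigned weights 0: fixed sequence test
fixed_seq_a = fallback([0.01, 0.03, 0.2], [0.05, 0.0, 0.0])
fixed_seq_b = fallback([0.2, 0.001, 0.001], [0.05, 0.0, 0.0])
```

In the second call the first hypothesis is retained, so all later hypotheses are tested at level 0 and the procedure stops, exactly as in a fixed sequence test.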

Example for n = 2

Endpoint 1: Functional capacity of heart
Endpoint 2: Mortality
α = .05, α1′ = .04, α2′ = .01

p1 ≤ .04: Reject H1 and test H2 with α2 = .05.

p1 > .04: Retain H1 and test H2 with α2 = .01.

Weighted Bonferroni-Holm with α1′ = .04, α2′ = .01:

Rejects H1, in addition, when p2 ≤ .01 and .04 < p1 ≤ .05!
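The difference between the two procedures for n = 2 can be checked numerically. The sketch below assumes the usual sequentially rejective form of weighted Bonferroni-Holm (levels rescaled by the total weight of the hypotheses not yet rejected); function names and the p-values p1 = .045, p2 = .005 are hypothetical.

```python
def fallback(pvals, assigned):
    """Fallback procedure in the given order; returns rejected 1-based indices."""
    rejected, carry = set(), 0.0
    for i, (p, a) in enumerate(zip(pvals, assigned), start=1):
        level = a + carry
        if p <= level:
            rejected.add(i)
            carry = level
        else:
            carry = 0.0
    return rejected

def weighted_holm(pvals, assigned):
    """Weighted Bonferroni-Holm; returns rejected 1-based indices."""
    alpha = sum(assigned)
    remaining = set(range(len(pvals)))
    rejected = set()
    while remaining:
        total = sum(assigned[i] for i in remaining)
        if total == 0:
            break
        hits = {i for i in remaining if pvals[i] <= alpha * assigned[i] / total}
        if not hits:
            break
        rejected |= {i + 1 for i in hits}
        remaining -= hits
    return rejected

assigned = [0.04, 0.01]
p_example = [0.045, 0.005]          # .04 < p1 <= .05 and p2 <= .01

fb = fallback(p_example, assigned)
wbh = weighted_holm(p_example, assigned)
```

Here the fallback procedure retains H1 and rejects only H2, whereas weighted Bonferroni-Holm first rejects H2 and can then test H1 at the full level .05.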

Comparison with weighted Bonferroni-Holm

For n = 2: WBH is strictly more powerful than the fallback procedure. The improvement by Wiens + Dmitrienko is identical to WBH.

For n ≥ 3: There exist situations where fallback rejects and WBH does not, and conversely. (⇒ the improvement by W+D is not identical to WBH)

The fallback procedure for n = 3: weights for intersection hypotheses

αi′ = wi α, Σwi = 1 (see W+D)

index set   weight for index
            1      2        3
{1,2,3}     w1     w2       w3
{1,2}       w1     w2       --
{1,3}       w1     --       w2+w3
{2,3}       --     w1+w2    w3
{1}         w1     --       --
{2}         --     w1+w2    --
{3}         --     --       w1+w2+w3
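The table follows a simple pattern: within an index set, each index collects its own weight plus the weights of all preceding indices that lie between it and the previous member of the set. The sketch below encodes that reading of the table; the rule is inferred from the entries shown, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def fallback_intersection_weights(index_set, w):
    """Weights for the intersection hypothesis with the given 1-based index set.
    w is the list [w1, ..., wn]."""
    weights = {}
    prev = 0
    for i in sorted(index_set):
        weights[i] = sum(w[prev:i])      # w_{prev+1} + ... + w_i
        prev = i
    return weights

# Equal weights for n = 3
w = [Fraction(1, 3)] * 3
table = {s: fallback_intersection_weights(s, w)
         for size in (3, 2, 1) for s in combinations((1, 2, 3), size)}
```

For sets such as {1,2} or {1} the weights do not add up to 1, which leaves room for the improvement suggested by W+D.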

The fallback procedure for n = 3: equal weights

αi′ = wi α, wi = 1/3

Consequence for importance: ordering H2, H3, H1?

index set   weight for index
            1      2      3
{1,2,3}     1/3    1/3    1/3
{1,2}       1/3    1/3    --
{1,3}       1/3    --     2/3
{2,3}       --     2/3    1/3
{1}         1/3    --     --
{2}         --     2/3    --
{3}         --     --     1

The fallback procedure for n = 3: equal weights; improvement by W+D

αi′ = wi α, wi = 1/3

Consequence for importance: ordering H2, H3, H1 (remains)

index set   weight for index
            1      2      3
{1,2,3}     1/3    1/3    1/3
{1,2}       1/2    1/2    --
{1,3}       1/3    --     2/3
{2,3}       --     2/3    1/3
{1}         1      --     --
{2}         --     1      --
{3}         --     --     1
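The effect of the improved weights can be seen by running a closure test with weighted Bonferroni local tests on both tables. This is a sketch under the assumption that the tables define the procedures in this way; the dictionary and function names and the p-values are illustrative.

```python
from fractions import Fraction as F

# Weights for intersection hypotheses, equal assigned weights (original fallback)
ORIGINAL = {
    (1, 2, 3): {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)},
    (1, 2): {1: F(1, 3), 2: F(1, 3)},
    (1, 3): {1: F(1, 3), 3: F(2, 3)},
    (2, 3): {2: F(2, 3), 3: F(1, 3)},
    (1,): {1: F(1, 3)},
    (2,): {2: F(2, 3)},
    (3,): {3: F(1)},
}

# Same table with the W+D improvement
IMPROVED = dict(ORIGINAL)
IMPROVED[(1, 2)] = {1: F(1, 2), 2: F(1, 2)}
IMPROVED[(1,)] = {1: F(1)}
IMPROVED[(2,)] = {2: F(1)}

def closed_weighted_bonferroni(p, table, alpha):
    """p maps index -> p-value. An intersection hypothesis is rejected locally
    if some p_i <= weight_i * alpha; H_i is rejected if all index sets
    containing i are rejected locally."""
    local = {s for s, wts in table.items()
             if any(p[i] <= float(wt) * alpha for i, wt in wts.items())}
    return {i for i in p if all(s in local for s in table if i in s)}

p_example = {1: 0.02, 2: 1.0, 3: 0.015}      # hypothetical p-values, p3 < p1 < p2
orig_result = closed_weighted_bonferroni(p_example, ORIGINAL, 0.05)
impr_result = closed_weighted_bonferroni(p_example, IMPROVED, 0.05)
```

With these values the original weights lead to the rejection of H3 only, because the index set {1,2} is not rejected at level α/3; with the improved weight 1/2 it is, and H1 is rejected in addition.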

The fallback procedure for n = 3: equal weights

The decisions of the fallback procedure (with equal weights) are not exchangeable (and can never become so!).

Example: p(1) = .015, p(2) = .02, p(3) = 1; α = .05.

(Bonferroni-Holm: rejects H(1) and H(2))

p1 < p2 < p3 : reject H1, H2
p1 < p3 < p2 : reject H1
p2 < p1 < p3 : reject H2
p2 < p3 < p1 : reject H2, H3
p3 < p1 < p2 : reject H3 (, H1)
p3 < p2 < p1 : reject H3
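The six cases can be reproduced by running the (unimproved) fallback procedure with equal assigned weights α/3 on every assignment of the three p-values to H1, H2, H3. A minimal sketch, with an illustrative function name:

```python
from itertools import permutations

def fallback(pvals, assigned):
    """Fallback procedure in the given order; returns rejected 1-based indices."""
    rejected, carry = set(), 0.0
    for i, (p, a) in enumerate(zip(pvals, assigned), start=1):
        level = a + carry
        if p <= level:
            rejected.add(i)
            carry = level
        else:
            carry = 0.0
    return rejected

alpha = 0.05
values = (0.015, 0.02, 1.0)

# Key: (p1, p2, p3); value: sorted list of rejected hypotheses
decisions = {perm: sorted(fallback(perm, [alpha / 3] * 3))
             for perm in permutations(values)}

for perm, rej in decisions.items():
    print(perm, "-> reject", ["H%d" % i for i in rej])
```

The number of rejections depends on which hypothesis receives which p-value, although the set of p-values is the same in all six cases.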

The fallback procedure: critical questions

- What are the relations of the two different types of weighting?
- Can it be meaningful to give higher assigned weights for higher indices?
- Can one give “guidelines” how to choose the weights?
- Equal assigned weights: what is the influence of ordering? (anyway: the procedure has “aesthetic” drawbacks)
- For which situations can one expect that the fallback procedure is more powerful than WBH?
- Or should one better renounce it completely?

Thank you for your attendance!

Are there more questions?

Or some answers?