Comparing Three or More Groups: Multiple Comparisons vs Planned Comparisons Robert Boudreau, PhD


Robert Boudreau, PhD
Co-Director of Methodology Core

PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases

First, a simple thought experiment

Flip a fair coin 100 times. Let H = # heads.

H = 0, 1, 2, …, 100 are the possible outcomes.

H has a binomial distribution with known probabilities:

Prob[ 40 < H < 60 ] ≈ 0.94, close to 0.95

Prob[ H ≤ 40 ] + Prob[ H ≥ 60 ] ≈ 0.057, close to 0.05
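These tail probabilities can be checked exactly. The following is a minimal Python sketch that sums binomial probabilities with `math.comb`. The helper name `binom_pmf` is illustrative and does not come from the slides.

```python
from math import comb


def binom_pmf(k: int, n: int = 100, p: float = 0.5) -> float:
    """P[H = k] for H ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N = 100
lower = sum(binom_pmf(k, N) for k in range(0, 41))      # P[H <= 40]
upper = sum(binom_pmf(k, N) for k in range(60, N + 1))  # P[H >= 60]
middle = sum(binom_pmf(k, N) for k in range(41, 60))    # P[40 < H < 60]

print(f"P[40 < H < 60]          = {middle:.4f}")
print(f"P[H <= 40] + P[H >= 60] = {lower + upper:.4f}")
```

Because the binomial is discrete, no symmetric pair of cutoffs gives a two-tailed probability of exactly 0.05. The 40/60 cutoffs give a value slightly above it.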


Experiment: 20 people each flip their own coin 100 times.

Q: Approximately how many will get 40 or fewer heads, or 60 or more heads?




Answer: About one, on average (1/20 = 5%)


Experiment: 20 people each flip their own coin 100 times.

On average, one person (1/20 = 0.05) will flip an unusually small or unusually large number of heads.

Q: Can we conclude that this person, “X”, flips an “unfair” coin, or is the result explainable by “chance”?
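One way to approach this question is to ask how often at least one of 20 fair coins lands in the extreme range purely by chance. The following is a minimal sketch that assumes the 20 coins are fair and independent. The variable names are illustrative.

```python
from math import comb

N_FLIPS = 100
N_PEOPLE = 20

# Exact probability that one fair coin gives H <= 40 or H >= 60 in 100 flips
p_extreme = sum(comb(N_FLIPS, k) for k in range(N_FLIPS + 1)
                if k <= 40 or k >= 60) / 2**N_FLIPS

expected_count = N_PEOPLE * p_extreme             # mean number of "extreme" people
p_at_least_one = 1 - (1 - p_extreme) ** N_PEOPLE  # P[at least one extreme person]

print(f"Per-coin probability:      {p_extreme:.4f}")
print(f"Expected number of people: {expected_count:.2f}")
print(f"P[at least one of 20]:     {p_at_least_one:.2f}")
```

Under these assumptions, the chance that at least one of the 20 fair coins looks “unfair” is roughly 0.69. Person X's result is therefore readily explained by chance.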

Controlling Experiment-wise Error

Experiment: 20 people each flip their own coin 100 times.

Person X’s confidence interval didn’t cover 0.5.

Q: What alpha level should be used so that, 95% of the time, all 20 confidence intervals each cover 0.5?

(i.e., so that the correct conclusion is drawn about every single coin)


This is equivalent to drawing a “wrong” conclusion about at least one of the coins only 5% of the time (experiment-wise Type I error).

Controlling Experiment-wise Error

Q: What alpha level should be used so that there’s a 95% probability that all 20 confidence intervals each cover 0.5? (i.e., an experiment-wise correct conclusion)

Set the experiment-wise α = 0.05 and solve for the comparison-wise α*:

$$
\begin{aligned}
\alpha &= \Pr[\text{at least one C.I. misses } 0.5] \\
&= 1 - \Pr[\text{all C.I.'s cover } 0.5] \\
&= 1 - (1 - \alpha^*)^{20} \quad \text{(independent coins)}
\end{aligned}
$$

Sidak: comparison-wise $\alpha^* = 1 - (1 - \alpha)^{1/n}$

n = 20 “comparisons”: $\alpha^* = 1 - (1 - 0.05)^{1/20} = 0.00256$
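The Sidak relation can be checked numerically. This sketch solves for α* and substitutes it back into 1 - (1 - α*)^n, assuming independent comparisons as in the derivation. It also shows the experiment-wise error that would result if each of the 20 intervals were left at 0.05. The function names are illustrative.

```python
def sidak_alpha(alpha_ew: float, n: int) -> float:
    """Comparison-wise alpha* such that 1 - (1 - alpha*)**n == alpha_ew."""
    return 1.0 - (1.0 - alpha_ew) ** (1.0 / n)


def experimentwise_error(alpha_star: float, n: int) -> float:
    """P[at least one of n independent level-alpha* tests rejects a true null]."""
    return 1.0 - (1.0 - alpha_star) ** n


alpha_star = sidak_alpha(0.05, 20)
print(f"Sidak alpha* (n=20):                  {alpha_star:.5f}")
print(f"Check, experiment-wise error:         {experimentwise_error(alpha_star, 20):.4f}")
print(f"Unadjusted (0.05 each), experiment-wise: {experimentwise_error(0.05, 20):.4f}")
```

Leaving every interval at 0.05 gives an experiment-wise error of about 0.64. This is why the comparison-wise level has to shrink.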



Bonferroni: $\alpha^* = \alpha / n$ (0.05/20 = 0.0025)


Mathematically, for n ≥ 2: $\alpha/n < 1 - (1 - \alpha)^{1/n}$

Bonferroni α* < Sidak α* (i.e., Sidak uses a slightly higher comparison-wise α-level)

But the two are usually very close; Sidak is slightly more powerful

Bonferroni works in all situations to guarantee control of experiment-wise error (but may be conservative)

Sidak (derived assuming independence) can under-control under some forms of dependence among the tests
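The two adjustments can be compared side by side. Bonferroni rests on the union bound: Prob[at least one false rejection] ≤ n · (α/n) = α, which holds under any dependence. Sidak's formula uses the independence product instead. The following is a minimal sketch; the function names are illustrative.

```python
def bonferroni_alpha(alpha_ew: float, n: int) -> float:
    """Bonferroni comparison-wise level: alpha / n."""
    return alpha_ew / n


def sidak_alpha(alpha_ew: float, n: int) -> float:
    """Sidak comparison-wise level: 1 - (1 - alpha)^(1/n)."""
    return 1.0 - (1.0 - alpha_ew) ** (1.0 / n)


ALPHA = 0.05
for n in (2, 3, 6, 10, 20):
    b, s = bonferroni_alpha(ALPHA, n), sidak_alpha(ALPHA, n)
    print(f"n={n:>2}  Bonferroni={b:.6f}  Sidak={s:.6f}  Sidak/Bonferroni={s / b:.4f}")

# Under independence, Bonferroni's actual experiment-wise error is slightly below 0.05
bonferroni_ew_20 = 1.0 - (1.0 - bonferroni_alpha(ALPHA, 20)) ** 20
print(f"Bonferroni experiment-wise error (n=20, independent): {bonferroni_ew_20:.4f}")
```

The ratio stays only a few percent above 1, which matches the point that the two methods are usually very close.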

Comparison of Adverse Effect of 4 Drugs on Systolic BP





Unadjusted pairwise t-tests (α = 0.05 for each comparison): critical value of t = 2.13145

Pairwise t-tests (Bonferroni): critical value of t = 3.03628

Pairwise t-tests (Sidak): critical value of t = 3.02585

Comparison of critical values

Scheffe:

* Designed for arbitrary post-hoc testing

* Controls experiment-wise error for all possible simultaneous comparisons and contrasts
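The t critical values on these slides follow from the comparison-wise α*. The sketch below reproduces them in pure Python under several stated assumptions. It assumes 4 groups, 6 pairwise comparisons, and 15 error degrees of freedom. The 15 df figure is inferred from the unadjusted value 2.13145, which is the two-sided 5% point of t with 15 df.

The sketch also computes Scheffe's critical value for contrasts on the t scale, √((k - 1) · F(α; k - 1, df)). The incomplete-beta helper uses the standard continued-fraction method, and all helper names are illustrative. Tukey and Dunnett require the studentized range and multivariate t distributions, so they are not covered here.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):   # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """P(|T| > t) for Student's t with df degrees of freedom, t >= 0."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def f_upper_p(f: float, d1: float, d2: float) -> float:
    """P(F > f) for an F distribution with (d1, d2) degrees of freedom, f >= 0."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def _solve_decreasing(tail, target: float, hi: float) -> float:
    """Bisection on [0, hi] for x with tail(x) == target, tail decreasing in x."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail(mid) > target:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_critical(alpha_star: float, df: float) -> float:
    """Two-sided critical value of t at comparison-wise level alpha_star."""
    return _solve_decreasing(lambda t: t_two_sided_p(t, df), alpha_star, 1000.0)


def scheffe_critical(alpha: float, k: int, df: float) -> float:
    """Scheffe critical value on the t scale: sqrt((k - 1) * F_{alpha; k-1, df})."""
    f_crit = _solve_decreasing(lambda f: f_upper_p(f, k - 1, df), alpha, 1e4)
    return math.sqrt((k - 1) * f_crit)


K, DF, ALPHA = 4, 15, 0.05  # assumed: 4 drugs, 15 error degrees of freedom
M = K * (K - 1) // 2        # 6 pairwise comparisons
print(f"Unadjusted: {t_critical(ALPHA, DF):.5f}")
print(f"Bonferroni: {t_critical(ALPHA / M, DF):.5f}")
print(f"Sidak:      {t_critical(1 - (1 - ALPHA) ** (1 / M), DF):.5f}")
print(f"Scheffe:    {scheffe_critical(ALPHA, K, DF):.5f}")
```

Under these assumptions, Scheffe's value comes out near 3.14, which is larger than the Bonferroni value here. This reflects that Scheffe protects every possible contrast, not just the 6 pairwise ones.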


Comparison of Adverse Effect of 4 Drugs on Systolic BP (v2)

Note: For Drug 4, I’ve subtracted 6 from the previous values



Comparison of Adverse Effect of 4 Drugs on Systolic BP (v2)

ANOVA F-test

Unadjusted pairwise t-tests (v2) (α = 0.05 for each comparison): critical value of t = 2.13145

Pairwise t-tests (Bonferroni) (v2)

Pairwise t-tests (Sidak) (v2)


Tukey’s Studentized Range Test

Related in concept to Scheffe’s method

Designed exclusively for all pairwise comparisons

(recall: Scheffe applies to all possible simultaneous pairwise comparisons and contrasts)

Exact experiment-wise error coverage if sample sizes are equal

Critical values smaller than Bonferroni or Sidak

Hence more powerful in finding pairwise differences

Pairwise t-tests (Tukey) (v2): critical value of t = 2.88215



Dunnett’s Method (Comparison vs a Control)

Related in concept to the Scheffe and Tukey methods

Designed exclusively for pairwise comparisons vs a single control

Exact experiment-wise error coverage of those comparisons if sample sizes are equal

Critical values smaller than Bonferroni, Sidak, or Tukey

Hence more powerful in finding differences vs the control

Comparison vs Control (Dunnett) (v2): critical value of t = 2.61702

Controlling for Multiple Comparisons in Exploratory Analyses

Caterina Rosano, Howard J. Aizenstein, Stephanie Studenski, Anne B. Newman. A Regions-of-Interest Volumetric Analysis of Mobility Limitations in Community-Dwelling Older Adults. Journal of Gerontology: Medical Sciences 2007



Thank you!

Any Questions?
