What To Do About the Multiple Comparisons Problem? Peter Z. Schochet February 2008

Mar 26, 2015

Page 1: What To Do About the Multiple Comparisons Problem? Peter Z. Schochet February 2008.

What To Do About the Multiple Comparisons Problem?

Peter Z. Schochet

February 2008

Page 2

Overview of Presentation

Background

Suggested testing guidelines

Page 3

Background

Page 4

Overview of the Problem

Multiple hypothesis tests are often conducted in impact studies

– Outcomes

– Subgroups

– Treatment groups

Standard testing methods could yield:

– Spurious significant impacts

– Incorrect policy conclusions

Page 5

Assume a Classical Hypothesis Testing Framework

True impacts are fixed for the study population

Test H_0j: Impact_j = 0

Reject H_0j if p-value of t-test ≤ .05

Chance of finding a spurious impact is 5 percent for each test alone

Page 6

But Suppose No True Impacts and the Tests Are Considered Together

Number of Tests (a)    Probability That at Least 1 t-test Is Statistically Significant
1                      .05
5                      .23
10                     .40
20                     .64
50                     .92

(a) Assumes independent tests
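The figures in this table follow directly from the independence assumption in the footnote: if there are no true impacts and each of n independent tests is run at the .05 level, the chance that at least one is significant is 1 - (1 - .05)^n. A minimal Python sketch of that arithmetic (the function name is illustrative and not part of the presentation):

```python
def prob_any_significant(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests rejects when every null is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Same numbers of tests as the rows of the table
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>2} tests: {prob_any_significant(n):.2f}")
```

Rounded to two decimals, this gives .05, .23, .40, .64, and .92 for 1, 5, 10, 20, and 50 tests. With positively correlated tests the probabilities would generally be lower, which is why the footnote matters.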

Page 7

Impact Findings Can Be Misrepresented

Publishing bias

A focus on “stars”

Page 8

Adjustment Procedures Lower Levels for Individual Tests

Control the “combined” error rate

Many available methods:

– Bonferroni: Compare p-values to (.05 / # of tests)

– Fisher’s LSD, Holm (1979), Sidak (1967), Scheffe (1959), Hochberg (1988), Rom (1990), Tukey (1953)

– Resampling methods (Westfall and Young 1993)

– Benjamini-Hochberg (1995)
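To make the idea of "lowering the level for individual tests" concrete, here is a hedged sketch of three of the listed procedures in plain Python: Bonferroni as stated on the slide, plus the standard textbook forms of the Holm step-down and Benjamini-Hochberg step-up rules. The function names and the example p-values are invented for illustration. Note that Benjamini-Hochberg controls the false discovery rate rather than the familywise error rate, so it is not directly comparable to the other two.

```python
from typing import List


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject a null if its p-value is at most alpha / (number of tests)."""
    cutoff = alpha / len(pvals)
    return [p <= cutoff for p in pvals]


def holm(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: compare sorted p-values to alpha/m, alpha/(m-1), ...
    and stop at the first one that fails."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank = 0 for the smallest p-value
        if pvals[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up: find the largest k with p_(k) <= k * alpha / m
    and reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            largest = rank
    reject = [False] * m
    for idx in order[:largest]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    example = [0.001, 0.012, 0.014, 0.03, 0.20]  # made-up p-values
    print("Bonferroni:", bonferroni(example))
    print("Holm:      ", holm(example))
    print("B-H:       ", benjamini_hochberg(example))
```

On these made-up p-values Bonferroni rejects one hypothesis, Holm three, and Benjamini-Hochberg four, which illustrates how the choice of procedure changes how much the individual test levels are lowered.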

Page 9

These Methods Reduce Statistical Power: The Chances of Finding Real Effects

Simulated Statistical Power (a)

Number of Tests    Unadjusted    Bonferroni
5                  .80           .59
10                 .80           .50
20                 .80           .41
50                 .80           .31

(a) Assumes 1,000 treatments and 1,000 controls, 20 percent of all null hypotheses are true, and independent tests
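The table reports simulated power, but the pattern can be approximated analytically. The sketch below is an assumption-laden shortcut, not the simulation behind the slide: it treats each test as a two-sided z-test, backs out the standardized effect that gives .80 power at the unadjusted .05 level, and then asks how often that effect clears the stricter Bonferroni critical value. The function name is illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def approx_bonferroni_power(n_tests: int,
                            alpha: float = 0.05,
                            unadjusted_power: float = 0.80) -> float:
    """Normal-approximation power of one two-sided test after a Bonferroni adjustment.

    Assumption: the tiny chance of rejecting in the wrong tail is ignored.
    """
    z = _STD_NORMAL.inv_cdf
    # Standardized effect implied by the target power at the unadjusted level
    effect = z(1 - alpha / 2) + z(unadjusted_power)
    # Critical value once the level is divided by the number of tests
    critical = z(1 - (alpha / n_tests) / 2)
    return _STD_NORMAL.cdf(effect - critical)


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        print(f"{n:>2} tests: {approx_bonferroni_power(n):.2f}")
```

Under these assumptions the approximation lands on the same two-decimal values as the Bonferroni column for 5, 10, 20, and 50 tests.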

Page 10

Big Debate on Whether To Use Adjustment Procedures

What is the proper balance between Type I and Type II errors?

Page 11

To Adjust or Not To Adjust?

Page 12

February, July, December 2007 Advisory Panel Meetings Held at IES

Chairs:

Phoebe Cottingham, IES
Rob Hollister, Swarthmore
Rebecca Maynard, U. of PA

Participants:

Steve Bell, Abt
Howard Bloom, MDRC
John Burghardt, MPR
Mark Dynarski, MPR
Andrew Gelman, Columbia
David Judkins, Westat
Jeff Kling, Brookings
David Myers, AIR
Larry Orr, Abt
Peter Schochet, MPR

Page 13

Basic Principles for a Testing Strategy

Page 14

The Multiplicity Problem Should Not Be Ignored

Erroneous conclusions can result otherwise

But need a strategy that balances Type I and II errors

Page 15

Limiting the Number of Outcomes and Subgroups Can Help

But not always possible or desirable

Need flexible strategy for confirmatory and exploratory analyses

Page 16

Problem Should Be Addressed by First Structuring the Data

Structure will depend on the research questions

Adjustments should not be conducted blindly across all contrasts

Page 17

Suggested Testing Guidelines

Page 18

The Plan Must Be Specified Up Front

Rigor requires that the strategy be documented prior to data analysis

Page 19

Delineate Separate Outcome Domains

Based on a conceptual framework that relates the intervention to the outcomes

Represent key clusters of constructs

Domain “items” are likely to measure the same underlying trait

– Test scores

– Teacher practices

– School attendance

Page 20

Testing Strategy: Both Confirmatory and Exploratory Components

Confirmatory component

– Addresses central study hypotheses

– Must adjust for multiple comparisons

– Must be specified in advance

Exploratory component

– Identify impacts or relationships for future study

– Findings should be regarded as preliminary

Page 21

Confirmatory Analysis Has Two Potential Parts

1. Domain-specific analysis

2. Between-domain analysis

Page 22

Domain-Specific Analysis

Page 23

Test Impacts for Outcomes as a Group

Create a composite domain outcome

– Weighted average of standardized outcomes: simple average, index, or latent factor

Conduct a t-test on the composite
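The two steps on this slide, building a composite and testing it, can be sketched as follows. This is only one possible reading: it standardizes each outcome against the control group's mean and standard deviation (an assumption, since the slide does not say which reference group to use), takes the simple-average weighting, and computes a Welch-type t-statistic with a normal approximation for the p-value, which is reasonable only for large samples. All names and numbers are made up for illustration.

```python
from statistics import NormalDist, mean, stdev, variance
from typing import List, Optional, Sequence, Tuple

Rows = Sequence[Sequence[float]]  # one inner sequence of outcomes per person


def _standardize(column: Sequence[float], ref: Sequence[float]) -> List[float]:
    """Express a column in units of the reference group's SD (assumption: controls)."""
    m, s = mean(ref), stdev(ref)
    return [(x - m) / s for x in column]


def composite(rows_t: Rows, rows_c: Rows,
              weights: Optional[Sequence[float]] = None
              ) -> Tuple[List[float], List[float]]:
    """Weighted average of standardized outcomes; equal weights give the simple average."""
    k = len(rows_c[0])
    w = list(weights) if weights is not None else [1.0 / k] * k
    cols_t, cols_c = list(zip(*rows_t)), list(zip(*rows_c))
    z_t = [_standardize(cols_t[j], cols_c[j]) for j in range(k)]
    z_c = [_standardize(cols_c[j], cols_c[j]) for j in range(k)]
    comp_t = [sum(w[j] * z_t[j][i] for j in range(k)) for i in range(len(rows_t))]
    comp_c = [sum(w[j] * z_c[j][i] for j in range(k)) for i in range(len(rows_c))]
    return comp_t, comp_c


def t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference in means divided by its unequal-variance standard error."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def two_sided_p_normal(t: float) -> float:
    """Large-sample normal approximation to the two-sided p-value."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


if __name__ == "__main__":
    # Made-up data: two outcomes per person in one domain
    controls = [[10, 3], [12, 5], [11, 4], [9, 2], [13, 6]]
    treated = [[12, 5], [14, 6], [13, 5], [11, 4], [15, 7]]
    comp_t, comp_c = composite(treated, controls)
    t = t_statistic(comp_t, comp_c)
    print("t =", round(t, 2), "approx. p =", round(two_sided_p_normal(t), 3))
```

An index or latent-factor composite would only change the `weights` argument; the single test on the composite stays the same.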

Page 24

What About Tests for Individual Domain Outcomes?

If impact on composite is significant

– Test impacts for individual domain outcomes without multiplicity corrections

– Use only for interpretation

If impact on composite is not significant

– Further tests are not warranted
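The rule on this slide works as a gate: the composite test decides whether individual outcomes are examined at all. A minimal sketch of that logic, with hypothetical function, key, and outcome names:

```python
from typing import Dict, Optional


def domain_follow_up(composite_p: float,
                     item_pvalues: Dict[str, float],
                     alpha: float = 0.05) -> Dict[str, object]:
    """Apply the gate: look at individual outcomes only if the composite is significant."""
    if composite_p > alpha:
        # Composite not significant: further tests are not warranted
        return {"composite_significant": False, "item_results": None}
    # Composite significant: unadjusted item tests, for interpretation only
    item_results: Optional[Dict[str, bool]] = {
        name: p <= alpha for name, p in item_pvalues.items()
    }
    return {"composite_significant": True, "item_results": item_results}


if __name__ == "__main__":
    items = {"reading": 0.02, "math": 0.30}  # made-up p-values
    print(domain_follow_up(0.01, items))
    print(domain_follow_up(0.20, items))
```

The item-level results returned in the first case describe which outcomes appear to drive the composite impact; they are not separate confirmatory findings.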

Page 25

Between-Domain Analysis

Page 26

Applicable If Studies Require Summative Evidence of Impacts

Constructing “unified” composites may not make sense

– Domains measure different latent traits

Test domain composites individually using adjustment procedures

Page 27

Testing Strategy Will Depend on the Research Questions

Are impacts significant in all domains?

– No adjustments are needed

Are impacts significant in any domain?

– Adjustments are needed
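The contrast between the two questions can be sketched in code. For the "all domains" question, a false overall claim requires every test to reject, so testing each domain at the unadjusted level does not inflate the Type I error. For the "any domain" question, each extra test is another chance for a spurious finding, so the level must be adjusted. The sketch below uses Bonferroni for the "any" case purely as an example of an adjustment procedure; the function names and p-values are hypothetical.

```python
from typing import Sequence


def significant_in_all_domains(domain_pvals: Sequence[float],
                               alpha: float = 0.05) -> bool:
    """'All domains' claim: every domain composite must reject at the unadjusted level."""
    return all(p <= alpha for p in domain_pvals)


def significant_in_any_domain(domain_pvals: Sequence[float],
                              alpha: float = 0.05) -> bool:
    """'Any domain' claim: at least one composite must reject at the
    Bonferroni-adjusted level (one possible adjustment, chosen for illustration)."""
    cutoff = alpha / len(domain_pvals)
    return any(p <= cutoff for p in domain_pvals)


if __name__ == "__main__":
    pvals = [0.03, 0.30]  # made-up p-values for two domain composites
    print("all domains:", significant_in_all_domains(pvals))
    print("any domain: ", significant_in_any_domain(pvals))
```

With the made-up p-values .03 and .30, neither claim holds: the second domain fails the "all" test, and .03 does not clear the adjusted cutoff of .025 even though it would pass an unadjusted test.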

Page 28

Other Situations That Require Multiplicity Adjustments

1. Designs with multiple treatment groups

– Apply Tukey-Kramer, Dunnett, or resampling methods to domain composites

2. Subgroup analyses that are part of the confirmatory analysis

– Conduct F-tests for differences across subgroup impacts

Page 29

Statistical Power

Studies must be designed to have sufficient statistical power for all confirmatory analyses

– Includes subgroup analyses

Page 30

Reporting Must Link to the Study Protocols

Qualify confirmatory and exploratory analysis findings in reports

– No one way to present adjusted and unadjusted p-values

– Confidence intervals may be helpful

– Emphasize confirmatory analysis results in the executive summary

Page 31

Testing Approach Summary

Pre-specify plan in the study protocols

Structure the data

– Delineate outcome domains

Confirmatory analysis

– Within and between domains

Exploratory analysis

Qualify findings appropriately