Use of the False Discovery Rate for Evaluating Clinical Safety Data Joseph F. Heyse Devan V. Mehrotra Clinical Biostatistics – Vaccines Merck Research.

Use of the False Discovery Rate for Evaluating Clinical Safety Data

Joseph F. Heyse Devan V. Mehrotra

Clinical Biostatistics – Vaccines
Merck Research Laboratories

Blue Bell, PA

Third International Conference on Multiple Comparisons Bethesda, MD August 6, 2002

Heyse/MCP2002 bl 2

Acknowledgment

This research was in collaboration with the late Professor John Tukey (Princeton University).


Outline

Motivating example

Multiplicity issues

FWER and FDR

Proposal for flagging AEs

Summary of three examples

Concluding remarks


Introduction

Evaluation of safety is an important part of clinical trials of pharmaceutical and biological products.

Adverse experiences (AEs) can be categorized as three types
– Tier 1: Associated with specific hypotheses
– Tier 2: Set encountered as part of trial safety evaluation
– Tier 3: Rare spontaneous reports of serious events that require clinical evaluation

Our interest is primarily Tier 2


ICH Recommendations

ICH-E9 recommends descriptive statistical methods supplemented by confidence intervals

p-values useful to evaluate a specific difference of interest

If hypothesis tests are used, statistical adjustments for multiplicity to quantitate the Type I error are appropriate, but the Type II error is usually of more concern

p-values sometimes useful as a “flagging” device applied to a large number of safety variables to highlight differences worthy of further attention


Illustration: Multiplicity in Safety Assessment

Clinical trial compared the safety and immunogenicity of the combination vaccine COMVAX™* to its monovalent components

1 of 92 safety comparisons revealed a higher rate of unusual high-pitched crying (UHPC) following the second of a three-dose series (6.7% vs. 2.3%, p=0.016)

No medical rationale for this finding was discovered and a larger hypothesis-driven study was designed

Comparable rates were observed following vaccination in this larger trial

*COMVAX™ is a combination of HIB and HB vaccine
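A rough calculation suggests why one nominally significant result among 92 comparisons is unsurprising. The sketch below assumes, purely for illustration, that the 92 comparisons are independent tests at the 0.05 level with no true differences. Real AE data are correlated, so the figures are only indicative.

```python
# Illustrative only: assumes 92 independent comparisons, each tested at the
# 0.05 level, with no true treatment differences (not stated in the slides).
m = 92
alpha = 0.05

# Expected number of comparisons that reach p < 0.05 by chance alone.
expected_false_positives = m * alpha

# Probability that at least one comparison reaches p < 0.05 by chance.
prob_at_least_one = 1 - (1 - alpha) ** m

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {prob_at_least_one:.3f}")
```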


Motivating Example (MMRV* Vaccine)

Safety and immunogenicity vaccine trial.

Study population: healthy toddlers, 12-18 months of age

Group 1 = MMRV + PedvaxHIB on Day 0

Group 2 = MMR + PedvaxHIB on Day 0, followed by (optional) varicella vaccine on Day 42

*MMRV is a combination measles, mumps, rubella, varicella vaccine


Motivating Example (cont’d)

Safety follow-up (local and systemic reactions)
– Group 1: Day 0-42 (N=148)
– Group 2: Day 0-42 (N=148) and Day 42-84 (N=132)

Question: Is the safety profile different if the varicella component is given as part of a combination vaccine on Day 0 compared with giving it 6 weeks later as a monovalent vaccine?

AEs: Group 1 (Day 0-42) vs. Group 2 (Day 42-84)


Clinical AE Counts (“Tier 2” AEs)

X1 = count in Grp 1 (N1=148); X2 = count in Grp 2 (N2=132)

 #  BS  ADVERSE EXPERIENCE              X1   X2  DIFF (%)  p-value
 1  01  ASTHENIA / FATIGUE              57   40     8.2     .1673
 2  01  FEVER                           34   26     3.3     .5606
 3  01  INFECTION, FUNGAL                2    0     1.4     .4998
 4  01  INFECTION, VIRAL                 3    1     1.3     .6248
 5  01  MALAISE                         27   20     3.1     .5248
 6  03  ANOREXIA                         7    2     3.2     .1791
 7  03  CANDIDIASIS, ORAL                2    0     1.4     .4998
 8  03  CONSTIPATION                     2    0     1.4     .4998
 9  03  DIARRHEA                        24   10     8.6     .0289*
10  03  GASTROENTERITIS, INFECTIOUS      3    1     1.3     .6248
11  03  NAUSEA                           2    7    -4.0     .0889
12  03  VOMITING                        19   19    -1.6     .7295
13  05  LYMPHADENOPATHY                  3    2     0.5    1.0000


Clinical AE Counts (“Tier 2” AEs) - cont’d

X1 = count in Grp 1 (N1=148); X2 = count in Grp 2 (N2=132)

 #  BS  ADVERSE EXPERIENCE              X1   X2  DIFF (%)  p-value
14  06  DEHYDRATION                      0    2    -1.5     .2214
15  08  CRYING                           2    0     1.4     .4998
16  08  INSOMNIA                         2    2    -0.2    1.0000
17  08  IRRITABILITY                    75   43    18.1     .0025*
18  09  BRONCHITIS                       4    1     1.9     .3746
19  09  CONGESTION, NASAL                4    2     1.2     .6872
20  09  CONGESTION, RESPIRATORY          1    2    -0.8     .6033
21  09  COUGH                           13    8     2.7     .4969
22  09  INFECTION, RESPIRATORY, UPPER   28   20     3.8     .4308
23  09  LARYNGOTRACHEOBRONCHITIS         2    1     0.6    1.0000
24  09  PHARYNGITIS                     13    8     2.7     .4969
25  09  RHINORRHEA                      15   14    -0.5    1.0000
26  09  SINUSITIS                        3    1     1.3     .6248


Clinical AE Counts (“Tier 2” AEs) - cont’d

X1 = count in Grp 1 (N1=148); X2 = count in Grp 2 (N2=132)

 #  BS  ADVERSE EXPERIENCE              X1   X2  DIFF (%)  p-value
27  09  TONSILLITIS                      2    1     0.6    1.0000
28  09  WHEEZING                         3    1     1.3     .6248
29  10  BITE/STING, NON-VENOMOUS         4    0     2.7     .1248
30  10  ECZEMA                           2    0     1.4     .4998
31  10  PRURITUS                         2    1     0.6    1.0000
32  10  RASH                            13    3     6.5     .0209*
33  10  RASH, DIAPER                     6    2     2.5     .2885
34  10  RASH, MEASLES/RUBELLA-LIKE       8    1     4.6     .0388*
35  10  RASH, VARICELLA-LIKE             4    2     1.2     .6872
36  10  URTICARIA                        0    2    -1.5     .2214
37  10  VIRAL EXANTHEMA                  1    2    -0.8     .6033
38  11  CONJUNCTIVITIS                   0    2    -1.5     .2214
39  11  OTITIS MEDIA                    18   14     1.6     .7109
40  11  OTORRHEA                         2    1     0.6    1.0000


Multiplicity Issues - The Problem

Potential for too many false positive safety findings if the multiplicity problem is ignored (for “Tier 2” AEs).

This can muddy the interpretation of the safety profile of the vaccine/drug.


Multiplicity Issues - The Challenge

To develop a procedure for tackling multiplicity that:

Provides a proper balance between “no adjustment” and “too much adjustment”.

Is easy to automate/implement.


Familywise Error Rate (FWER)

Let $F = \{H_1, H_2, \ldots, H_m\}$ denote a family of $m$ hypotheses.

FWER = Pr(any true $H_i \in F$ is rejected).

We usually seek methods for which FWER $\le \alpha$.

Benjamini & Hochberg (1995) argue that, in certain settings, requiring control of the FWER is often too conservative. They suggest controlling the “false discovery rate” instead, as a more powerful alternative.

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, B, 57, 289-300.


False Discovery Rate (FDR) (Benjamini & Hochberg)

                  Declared        Declared
                  Insignificant   Significant   Total
# of true Hi      U               V             m0
# of false Hi     T               S             m − m0
Total             m − R           R             m

$\mathrm{FDR} = E(V/R)$ = expected proportion of rejected null hypotheses which are incorrectly rejected. Define $0/0$ as $0$.
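In this notation V is the number of true hypotheses that are rejected and R is the total number rejected, so the familywise error rate is Pr(V ≥ 1). A short derivation, added here as an explanatory sketch rather than taken from the slides, shows why the FDR can never exceed the FWER:

```latex
\[
\frac{V}{R} \le \mathbf{1}\{V \ge 1\}
\quad\Longrightarrow\quad
\mathrm{FDR} = E\!\left(\frac{V}{R}\right) \le \Pr(V \ge 1) = \mathrm{FWER}.
\]
% The inequality uses V <= R and the convention 0/0 = 0.
% If every null hypothesis is true (m = m_0), then S = 0 and V = R,
% so V/R = 1{V >= 1} and the two error rates coincide.
```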


False Discovery Rate (FDR) (cont’d)

(Benjamini & Hochberg)

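$\mathrm{FDR} \le \mathrm{FWER}$ {equality holds if $m = m_0$}. Effect of correlations on FDR is an area of research.

Reject $H_{(1)}, H_{(2)}, \ldots, H_{(j)}$ if $p_{(j)} \le \frac{j}{m}\,\alpha$, where $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$ are the ordered p-values {This controls FDR at $\alpha\, m_0/m$}

Adjusted p-values: $\tilde{p}_{(m)} = p_{(m)}$ and $\tilde{p}_{(j)} = \min\left(\tilde{p}_{(j+1)},\ \frac{m}{j}\, p_{(j)}\right)$ for $j \le m-1$

Example

Unadjusted p-values      .0193  .0280  .2038  .4941
FDR-adjusted p-values    .0560  .0560  .2718  .4941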
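The adjusted p-value recursion on this slide can be sketched in a few lines of Python. The function name `bh_adjust` is a label chosen for this illustration, not something from the slides. The code applies the standard Benjamini-Hochberg step-up adjustment to the four example p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


example = [0.0193, 0.0280, 0.2038, 0.4941]
print([round(p, 4) for p in bh_adjust(example)])
```

The result agrees with the slide's adjusted values up to rounding in the third entry, presumably because the slide was computed from unrounded p-values.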


Proposal for Flagging AEs

We routinely summarize AEs by body system (BS).

$s$ body systems ($i = 1, 2, \ldots, s$)

$k_i$ AEs associated with body system $i$

$p_{ij}$ = between-group p-value for the $j$th AE within the $i$th BS (e.g., based on a two-tailed Fisher’s exact test)
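As an illustration of how such a p-value might be computed, here is a minimal pure-Python sketch of a two-sided Fisher's exact test. It uses the common convention of summing the probabilities of all tables no more likely than the observed one. Whether the authors' software used exactly this convention is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher's exact p-value comparing x1/n1 with x2/n2."""
    total_events = x1 + x2
    n = n1 + n2
    denom = comb(n, total_events)

    def prob(k):
        # Hypergeometric probability that k of the events fall in group 1.
        return comb(n1, k) * comb(n2, total_events - k) / denom

    lo = max(0, total_events - n2)
    hi = min(n1, total_events)
    p_obs = prob(x1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (the small relative tolerance guards against floating-point ties).
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Counts taken from the AE tables: fungal infection (2/148 vs. 0/132)
# and irritability (75/148 vs. 43/132).
print(round(fisher_exact_two_sided(2, 148, 0, 132), 4))
print(round(fisher_exact_two_sided(75, 148, 43, 132), 4))
```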


Proposal for Flagging AEs (cont’d)

Step 1: Ignore AEs for which the total incidence is so low that a rejection even at the unadjusted 0.05 level is impossible.

Step 2: Among the remaining AEs, flag those for which the p-value achieves statistical significance after adjusting for multiplicity using a “Double FDR” approach.
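The slides do not spell out how the Step 1 screen is computed. One plausible reading, sketched below under the assumption of a two-sided Fisher's exact test, is to find the smallest p-value attainable for a given total AE count and group sizes, and to drop the AE if even that value exceeds 0.05. The function names are hypothetical.

```python
from math import comb


def min_attainable_p(total_events, n1, n2):
    """Smallest two-sided Fisher p-value possible for a given total AE count."""
    n = n1 + n2
    denom = comb(n, total_events)
    lo = max(0, total_events - n2)
    hi = min(n1, total_events)
    probs = [comb(n1, k) * comb(n2, total_events - k) / denom
             for k in range(lo, hi + 1)]
    # A table's two-sided p-value is the total probability of tables no more
    # likely than it, so the least likely table gives the smallest p-value.
    p_min = min(probs)
    return sum(q for q in probs if q <= p_min * (1 + 1e-7))


def can_reach_significance(total_events, n1, n2, level=0.05):
    """True if some split of the events between groups gives p <= level."""
    return min_attainable_p(total_events, n1, n2) <= level


# Group sizes from the motivating example (N1=148, N2=132).
for total in (2, 3, 4, 5):
    print(total, can_reach_significance(total, 148, 132))
```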


Double FDR Approach

Define $p_i^* = \min(p_{i1}, p_{i2}, \ldots, p_{ik_i})$. This represents the strongest safety “signal” for body system $i$.

1st level FDR adjustment
– Apply FDR adjustment to $p_1^*, p_2^*, \ldots, p_s^*$
– Let $\tilde{p}_i^*$ = FDR-adjusted $p_i^*$

2nd level FDR adjustment
– Within body system $i$, apply FDR adjustment to $p_{i1}, p_{i2}, \ldots, p_{ik_i}$, $1 \le i \le s$
– Let $\tilde{p}_{ij}$ = FDR-adjusted $p_{ij}$


Double FDR Approach (cont’d)

Proposed Flagging Rule

Flag AE(i,j) if $\tilde{p}_i^* \le \alpha_1$ and $\tilde{p}_{ij} \le \alpha_2$

What values of $\alpha_1$ and $\alpha_2$ should we use?
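The two-level rule can be sketched as follows. The code is an illustrative reading of the rule, not the authors' implementation: it assumes the standard Benjamini-Hochberg adjustment at both levels and "less than or equal" comparisons, and the names `bh_adjust`, `double_fdr_flags`, and `mmrv` are hypothetical. The p-values are those listed in the MMRV AE tables, keyed by body system code.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def double_fdr_flags(pvalues_by_system, alpha1, alpha2):
    """Return (body system, AE) pairs flagged by the two-level rule."""
    systems = list(pvalues_by_system)
    # First level: adjust the smallest p-value of each body system.
    p_star = [min(pvalues_by_system[s].values()) for s in systems]
    p_star_adj = dict(zip(systems, bh_adjust(p_star)))
    flagged = []
    for s in systems:
        names = list(pvalues_by_system[s])
        # Second level: adjust the p-values within the body system.
        p_adj = bh_adjust([pvalues_by_system[s][n] for n in names])
        for name, p in zip(names, p_adj):
            if p_star_adj[s] <= alpha1 and p <= alpha2:
                flagged.append((s, name))
    return flagged


# Unadjusted p-values from the MMRV AE tables, grouped by body system (BS).
mmrv = {
    "01": {"ASTHENIA / FATIGUE": .1673, "FEVER": .5606, "INFECTION, FUNGAL": .4998,
           "INFECTION, VIRAL": .6248, "MALAISE": .5248},
    "03": {"ANOREXIA": .1791, "CANDIDIASIS, ORAL": .4998, "CONSTIPATION": .4998,
           "DIARRHEA": .0289, "GASTROENTERITIS, INFECTIOUS": .6248,
           "NAUSEA": .0889, "VOMITING": .7295},
    "05": {"LYMPHADENOPATHY": 1.0},
    "06": {"DEHYDRATION": .2214},
    "08": {"CRYING": .4998, "INSOMNIA": 1.0, "IRRITABILITY": .0025},
    "09": {"BRONCHITIS": .3746, "CONGESTION, NASAL": .6872,
           "CONGESTION, RESPIRATORY": .6033, "COUGH": .4969,
           "INFECTION, RESPIRATORY, UPPER": .4308, "LARYNGOTRACHEOBRONCHITIS": 1.0,
           "PHARYNGITIS": .4969, "RHINORRHEA": 1.0, "SINUSITIS": .6248,
           "TONSILLITIS": 1.0, "WHEEZING": .6248},
    "10": {"BITE/STING, NON-VENOMOUS": .1248, "ECZEMA": .4998, "PRURITUS": 1.0,
           "RASH": .0209, "RASH, DIAPER": .2885, "RASH, MEASLES/RUBELLA-LIKE": .0388,
           "RASH, VARICELLA-LIKE": .6872, "URTICARIA": .2214, "VIRAL EXANTHEMA": .6033},
    "11": {"CONJUNCTIVITIS": .2214, "OTITIS MEDIA": .7109, "OTORRHEA": 1.0},
}

print(double_fdr_flags(mmrv, alpha1=0.05, alpha2=0.10))
```

Under these assumptions, only the body system with the strongest first-level signal passes the first threshold, so AEs with nominally small p-values in other body systems are not flagged.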


Choosing $\alpha_1$ and $\alpha_2$

Set $\alpha_2 = \alpha$ and use either (a) or (b) below for $\alpha_1$.

(a) Use resampling (non-parametric bootstrap) to determine the largest data-dependent $\alpha_1$ ($\le \alpha_2$) that ensures FDR $\le \alpha$.

OR

(b) Choose $\alpha_1$ ($\le \alpha_2$) independent of the data. For example, let $\alpha_1 = \alpha_2/2$ or $\alpha_1 = \alpha_2$, and estimate the resulting FDR using resampling.


Resampling Procedure

Purpose

– To estimate the false discovery rates of the following:
  NOADJ: No multiplicity adjustment; flag AE if unadjusted p < .05
  FULLFDR($\alpha$): Full FDR adjustment (ignore BS grouping)
  DFDR($\alpha_1$, $\alpha_2$): Double FDR adjustment for selected ($\alpha_1$, $\alpha_2$)

– To determine the largest $\alpha_1$ ($\le \alpha_2$) that guarantees FDR $\le \alpha$ when using DFDR($\alpha_1$, $\alpha_2$).


Resampling Procedure (cont’d)

Details

1. POOL data from both treatment groups into a common population. Sample with replacement from this common population, to simulate many repetitions of the original trial.

   This procedure:
   a) simulates a true null situation (Group 1 = Group 2).
   b) preserves the correlation structure of original data.

2. Implement our proposal for flagging AEs using the NOADJ, FULLFDR($\alpha$), and DFDR($\alpha_1$, $\alpha_2$) approaches, and calculate the corresponding FDRs.
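The pooled bootstrap can be sketched as below. Subject-level trial data are not given in the slides, so the example runs on small synthetic data. The group sizes, AE rates, helper names, and number of simulations are all assumptions made for illustration, and only NOADJ and a full FDR adjustment are compared. Because pooling makes the null hypothesis true, every flag is a false discovery, so the estimated FDR is simply the share of simulated trials with at least one flag.

```python
import random
from math import comb


def fisher_p(x1, n1, x2, n2):
    """Two-sided Fisher's exact p-value (sum of tables no more likely than observed)."""
    t, n = x1 + x2, n1 + n2
    denom = comb(n, t)
    probs = {k: comb(n1, k) * comb(n2, t - k) / denom
             for k in range(max(0, t - n2), min(n1, t) + 1)}
    return min(1.0, sum(q for q in probs.values() if q <= probs[x1] * (1 + 1e-7)))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def estimate_null_fdr(group1, group2, n_sims=200, alpha=0.05, seed=1):
    """group1/group2: lists of per-subject tuples of 0/1 AE indicators."""
    rng = random.Random(seed)
    pooled = group1 + group2  # pooling makes Group 1 = Group 2 true
    n1, n2 = len(group1), len(group2)
    n_aes = len(pooled[0])
    runs_with_flags = {"NOADJ": 0, "FULLFDR": 0}
    for _ in range(n_sims):
        # Resample whole subjects so correlations between AEs are preserved.
        sample = [rng.choice(pooled) for _ in range(n1 + n2)]
        g1, g2 = sample[:n1], sample[n1:]
        pvals = [fisher_p(sum(s[j] for s in g1), n1, sum(s[j] for s in g2), n2)
                 for j in range(n_aes)]
        # Under the null, V/R is 1 whenever anything is flagged and 0 otherwise.
        if any(p < alpha for p in pvals):
            runs_with_flags["NOADJ"] += 1
        if any(p <= alpha for p in bh_adjust(pvals)):
            runs_with_flags["FULLFDR"] += 1
    return {method: count / n_sims for method, count in runs_with_flags.items()}


# Synthetic subjects: six hypothetical AEs with made-up incidence rates.
data_rng = random.Random(0)
rates = (0.4, 0.2, 0.1, 0.1, 0.05, 0.3)
group1 = [tuple(int(data_rng.random() < r) for r in rates) for _ in range(60)]
group2 = [tuple(int(data_rng.random() < r) for r in rates) for _ in range(50)]
print(estimate_null_fdr(group1, group2))
```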


MMRV Example - Resampling Results

Y = # of incorrectly flagged AEs*

                    Distribution of Y (%)
Method              0      1      2      3      FDR (%)
NOADJ              48.8   33.0   12.9    5.3     51.2
FULLFDR(.10)       95.2    4.0    0.6    0.2      4.8
DFDR(.02, .05)     97.0    2.5    0.4    0.1      3.0
DFDR(.05, .05)     91.2    7.3    1.1    0.4      8.8
DFDR(.05, .10)     90.9    6.4    1.9    0.8      9.1
DFDR(.10, .10)     79.8   13.0    5.2    2.0     20.2

* out of 40; 2000 simulations


MMRV Example - Resampling Results

DFDR($\alpha_1$, $\alpha_2$): Estimated FDR (%)

              $\alpha_2$
$\alpha_1$    0.05     0.10     0.15
0.01          1.45     1.45     1.45
0.02          3.00     3.00     3.00
0.03          4.70     4.70     4.70
0.04          7.10     7.15     7.15
0.05          8.80     9.15     9.15
0.06                  11.70    11.70
0.07                  13.65    13.70
0.08                  16.35    16.50
0.09                  18.85    19.25
0.10                  20.25    21.30
0.11                           24.25
0.12                           25.60
0.13                           27.75
0.14                           29.90
0.15                           31.25

Max. Acceptable FDR ($\alpha$)             5%          10%         15%
($\alpha_1$, $\alpha_2 = \alpha$)          (.03,.05)   (.05,.10)   (.07,.15)
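The selection of the largest acceptable first-level threshold from this grid can be written out directly. The sketch below copies the estimated FDR values from the table above. The dictionary layout and the function name are choices made for this illustration.

```python
# Estimated FDR (%) for DFDR(alpha1, alpha2), copied from the slide's table.
# Outer key: alpha2; inner key: alpha1.
estimated_fdr = {
    0.05: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.10, 0.05: 8.80},
    0.10: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.15, 0.05: 9.15,
           0.06: 11.70, 0.07: 13.65, 0.08: 16.35, 0.09: 18.85, 0.10: 20.25},
    0.15: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.15, 0.05: 9.15,
           0.06: 11.70, 0.07: 13.70, 0.08: 16.50, 0.09: 19.25, 0.10: 21.30,
           0.11: 24.25, 0.12: 25.60, 0.13: 27.75, 0.14: 29.90, 0.15: 31.25},
}


def largest_alpha1(alpha):
    """Largest alpha1 (with alpha2 = alpha) whose estimated FDR is at most alpha."""
    candidates = [a1 for a1, fdr in estimated_fdr[alpha].items()
                  if fdr <= 100 * alpha]
    return max(candidates) if candidates else None


for alpha in (0.05, 0.10, 0.15):
    print(alpha, largest_alpha1(alpha))
```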


First Level FDR Adjustment

Body System ID               Number of    Unadjusted   FDR Adjusted
                             AE Types     p-value      p-value
Nervous system                   3         0.0025       0.0200
Skin                             9         0.0209       0.0771
Digestive system                 7         0.0289       0.0771
Body site unspecified            5         0.1673       0.2952
Special senses                   3         0.2214       0.2952
Metabolic / immune               1         0.2214       0.2952
Respiratory                     11         0.3746       0.4281
Hematologic and lymphatic        1         1.0000       1.0000


Second Level FDR Adjustment

Body System 08: Nervous System and Psychiatric

Adverse Experience    Unadjusted p-value    FDR Adjusted p-value
Irritability               0.0025                0.0075
Crying                     0.4998                0.7497
Insomnia                   1.0000                1.0000


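Summary of Three Examples: Flagged AEs

Columns: Trial (# of subs.) | # of AEs | No Multiplicity Adjustment | DFDR Adjustment, maximum FDR (%): 15% ($\alpha_1$=.07, $\alpha$=.15) | 10% ($\alpha_1$=.05, $\alpha$=.10) | 5% ($\alpha_1$=.02, $\alpha$=.05)

PedvaxHIB (N=681), 15 AEs
  No Multiplicity Adjustment (FDR ~ 43%): Irritability, Upper Resp. Inf., Rash
  DFDR Adjustment: Irritability, Upper Resp. Inf.

MMRV (N=280), 40 AEs
  No Multiplicity Adjustment (FDR ~ 51%): Irritability, Rash, M/R-like rash, Diarrhea
  DFDR Adjustment: Irritability

COMVAX (N=811), 58 AEs
  No Multiplicity Adjustment (FDR ~ 87%): Erythema, Rash, Rhinorrhea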


Concluding Remarks

Current approach of flagging AEs based on unadjusted p-values (or C.I.s) can result in excessive false positive safety findings. These can cause undue concern for approval/labeling, and can affect post-marketing commitments.

Under our proposal, the unadjusted p-values (or C.I.s) would still be reported. The Double FDR multiplicity adjustment is a method to facilitate the interpretation of the unadjusted p-values.


Concluding Remarks (cont’d)

Our proposal for tackling multiplicity will:

– substantially reduce the percentage of incorrectly flagged AEs.

– be better accepted if described a priori in the protocol/DAP rather than on a post-hoc basis.

– facilitate comparable interpretation of safety results across studies, with respect to Type I error.
