Use of the False Discovery Rate for Evaluating Clinical Safety Data
Joseph F. Heyse, Devan V. Mehrotra
Clinical Biostatistics – Vaccines, Merck Research Laboratories, Blue Bell, PA
Third International Conference on Multiple Comparisons, Bethesda, MD, August 6, 2002
Heyse/MCP2002 bl 2
Acknowledgment
This research was in collaboration with the late Professor John Tukey (Princeton University).
Outline
Motivating example
Multiplicity issues
FWER and FDR
Proposal for flagging AEs
Summary of three examples
Concluding remarks
Introduction
Evaluation of safety is an important part of clinical trials of pharmaceutical and biological products.
Adverse experiences (AEs) can be categorized into three types:
– Tier 1: Associated with specific hypotheses
– Tier 2: The set of AEs encountered as part of the trial safety evaluation
– Tier 3: Rare spontaneous reports of serious events that require clinical evaluation
Our interest is primarily in Tier 2.
ICH Recommendations
ICH-E9 recommends descriptive statistical methods supplemented by confidence intervals
p-values useful to evaluate a specific difference of interest
If hypothesis tests are used, statistical adjustments for multiplicity to quantitate the Type I error are appropriate, but the Type II error is usually of more concern
p-values sometimes useful as a “flagging” device applied to a large number of safety variables to highlight differences worthy of further attention
Illustration: Multiplicity in Safety Assessment
A clinical trial compared the safety and immunogenicity of the combination vaccine COMVAX™* with its monovalent components.
1 of 92 safety comparisons revealed a higher rate of unusual high-pitched crying (UHPC) following the second dose of a three-dose series (6.7% vs. 2.3%, p=0.016).
No medical rationale for this finding was discovered, and a larger hypothesis-driven study was designed.
Comparable rates were observed following vaccination in this larger trial.
*COMVAX™ is a combination of Hib (Haemophilus influenzae type b) and hepatitis B (HB) vaccines.
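A rough calculation shows why one nominal finding among 92 comparisons is weak evidence on its own. The Python sketch below assumes, purely for illustration, that the 92 comparisons are independent, that no true differences exist, and that each test is run at α = 0.05; real AE comparisons are correlated and use discrete tests, so the numbers are only indicative. The function names are hypothetical.

```python
def expected_false_positives(m: int, alpha: float) -> float:
    """Expected number of nominally significant results among m true-null tests."""
    return m * alpha


def prob_at_least_one(m: int, alpha: float) -> float:
    """Pr(at least one nominal rejection) among m INDEPENDENT true-null tests.

    Independence is an illustrative assumption; correlated AE endpoints and
    discrete tests will generally give a different value.
    """
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    m, alpha = 92, 0.05  # 92 safety comparisons, as in the COMVAX illustration
    print(f"Expected false positives: {expected_false_positives(m, alpha):.1f}")
    print(f"Pr(at least one):         {prob_at_least_one(m, alpha):.3f}")
```

Under these assumptions the expected number of false positives is 4.6 and the chance of at least one is about 0.99, so a single nominal hit is roughly what chance alone would produce.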
Motivating Example (MMRV* Vaccine)
Safety and immunogenicity vaccine trial.
Study population: healthy toddlers, 12-18 months of age
Group 1 = MMRV + PedvaxHIB on Day 0
Group 2 = MMR + PedvaxHIB on Day 0, followed by (optional) varicella vaccine on Day 42
*MMRV is a combination measles, mumps, rubella, varicella vaccine
Motivating Example (cont’d)
Safety follow-up (local and systemic reactions):
Group 1: Day 0-42 (N=148)
Group 2: Day 0-42 (N=148) and Day 42-84 (N=132)
Question: Is the safety profile different if the varicella component is given as part of a combination vaccine on Day 0 compared with giving it 6 weeks later as a monovalent vaccine?
Potential for too many false positive safety findings if the multiplicity problem is ignored (for “Tier 2” AEs).
This can muddy the interpretation of the safety profile of the vaccine/drug.
Multiplicity Issues - The Challenge
To develop a procedure for tackling multiplicity that:
Provides a proper balance between “no adjustment” and “too much adjustment”.
Is easy to automate/implement.
Familywise Error Rate (FWER)
Let $F = \{H_1, H_2, \ldots, H_m\}$ denote a family of $m$ hypotheses.
$\text{FWER} = \Pr(\text{any true } H_i \in F \text{ is rejected})$.
We usually seek methods for which $\text{FWER} \le \alpha$.
Benjamini & Hochberg (1995) argue that, in certain settings, requiring control of the FWER is often too conservative. They suggest controlling the “false discovery rate” instead, as a more powerful alternative.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300.
False Discovery Rate (FDR) (Benjamini & Hochberg)
$\text{FDR} = E(V/R)$ = expected proportion of rejected null hypotheses which are incorrectly rejected. Define $0/0$ as 0.
| | Declared Insignificant | Declared Significant | Total |
| # of true $H_i$ | U | V | $m_0$ |
| # of false $H_i$ | T | S | $m - m_0$ |
| Total | $m - R$ | R | $m$ |
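The FDR is an expectation over repeated trials; for one set of decisions, the quantity inside the expectation is the false discovery proportion V/R from the table above. A minimal Python sketch, assuming we know (as in a simulation) which null hypotheses are true; the function names are illustrative only.

```python
from typing import Dict, Sequence


def outcome_counts(rejected: Sequence[bool], null_true: Sequence[bool]) -> Dict[str, int]:
    """Tally the cells U, V, T, S of the table for one set of decisions."""
    if len(rejected) != len(null_true):
        raise ValueError("rejected and null_true must have the same length")
    counts = {"U": 0, "V": 0, "T": 0, "S": 0}
    for rej, is_null in zip(rejected, null_true):
        if is_null:
            counts["V" if rej else "U"] += 1   # true H_i: V if (wrongly) rejected
        else:
            counts["S" if rej else "T"] += 1   # false H_i: S if (rightly) rejected
    return counts


def false_discovery_proportion(rejected: Sequence[bool], null_true: Sequence[bool]) -> float:
    """V/R, using the convention 0/0 = 0 when nothing is rejected."""
    c = outcome_counts(rejected, null_true)
    r = c["V"] + c["S"]                        # R = total rejections
    return c["V"] / r if r > 0 else 0.0
```

Averaging this proportion over many simulated trials estimates the FDR. When every null hypothesis is true ($m = m_0$), V/R equals 1 whenever anything is rejected, so the FDR reduces to the probability of at least one rejection, i.e., the FWER.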
False Discovery Rate (FDR) (cont’d)
(Benjamini & Hochberg)
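$\text{FDR} \le \text{FWER}$ (equality holds if $m = m_0$). The effect of correlations on the FDR is an area of research.
Order the p-values so that $p_1 \le p_2 \le \cdots \le p_m$. Reject $H_1, H_2, \ldots, H_j$ for the largest $j$ with $p_j \le \frac{j}{m}\alpha$. For independent p-values, this controls the FDR at $\frac{m_0}{m}\alpha \le \alpha$.
Adjusted p-values: $\tilde p_m = p_m$ and $\tilde p_j = \min\left(\tilde p_{j+1}, \frac{m}{j}\,p_j\right)$ for $1 \le j \le m-1$.
Example
Unadjusted p-values: .0193 .0280 .2038 .4941
FDR-adjusted p-values: .0560 .0560 .2718 .4941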
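The adjusted p-value recursion translates directly into code. The sketch below is one straightforward Python implementation (the names `bh_adjust` and `bh_reject` are illustrative, not from the talk). It sorts the p-values, walks down from the largest while keeping a running minimum of (m/j)·p_j, and returns the adjusted values in the original order. Rejecting $H_j$ exactly when its adjusted p-value is at most α is equivalent to the step-up rule.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) adjusted p-values, in the input order.

    With p_(1) <= ... <= p_(m):  adj_(m) = p_(m) and
    adj_(j) = min(adj_(j+1), (m / j) * p_(j)) for j = m-1, ..., 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0                     # adjusted p-values never exceed 1
    for rank in range(m, 0, -1):          # rank = j, from m down to 1
        idx = order[rank - 1]
        running_min = min(running_min, m / rank * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


def bh_reject(pvalues: Sequence[float], alpha: float) -> List[bool]:
    """Step-up BH decisions: reject H_j exactly when its adjusted p <= alpha."""
    return [adj <= alpha for adj in bh_adjust(pvalues)]


if __name__ == "__main__":
    print([round(x, 4) for x in bh_adjust([0.0193, 0.0280, 0.2038, 0.4941])])
```

For the example p-values this gives .0560, .0560, .2717 and .4941. The third value differs from the slide's .2718 only in the last digit, likely reflecting rounding of the displayed unadjusted p-values.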
Proposal for Flagging AEs
We routinely summarize AEs by body system (BS).
$s$ body systems ($i = 1, 2, \ldots, s$)
$k_i$ AEs associated with body system $i$
$p_{ij}$ = between-group p-value for the $j$th AE within the $i$th BS (e.g., based on a two-tailed Fisher's exact test)
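The slide names a two-tailed Fisher's exact test for each $p_{ij}$. As a self-contained illustration (not necessarily the implementation used in the talk), the sketch below computes the common two-sided version, which sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. Exact integer weights from `math.comb` avoid floating-point ties. In practice a library routine such as `scipy.stats.fisher_exact` would normally be used; the function name here is hypothetical.

```python
from math import comb


def fisher_two_sided(a: int, n1: int, c: int, n2: int) -> float:
    """Two-sided Fisher's exact p-value for one AE.

    a of n1 subjects in group 1 and c of n2 subjects in group 2 report the AE.
    Given the margins, the group-1 count X is hypergeometric with
    P(X = x) proportional to C(n1, x) * C(n2, t - x), where t = a + c.
    """
    if not (0 <= a <= n1 and 0 <= c <= n2):
        raise ValueError("counts must lie between 0 and the group size")
    t = a + c                                   # subjects reporting the AE
    lo, hi = max(0, t - n2), min(t, n1)         # feasible values of X

    def weight(x: int) -> int:                  # unnormalized P(X = x)
        return comb(n1, x) * comb(n2, t - x)

    observed = weight(a)
    # Sum the probabilities of every table no more likely than the observed one.
    tail = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return tail / comb(n1 + n2, t)              # all weights sum to C(n1 + n2, t)
```

For instance, `fisher_two_sided(10, 148, 1, 148)` gives the p-value for a hypothetical AE reported by 10 of 148 subjects in one group and 1 of 148 in the other.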
Proposal for Flagging AEs (cont’d)
Step 1: Ignore AEs for which the total incidence is so low that a rejection even at the unadjusted 0.05 level is impossible.
Step 2: Among the remaining AEs, flag those for which the p-value achieves statistical significance after adjusting for multiplicity using a "Double FDR" approach.
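Step 1 can be checked before any testing. Given only the total number t of subjects reporting an AE, one can compute the smallest two-sided Fisher p-value achievable over every possible split of those t subjects between the groups. A minimal sketch, assuming the two-sided Fisher test and the threshold p ≤ 0.05 (the slide does not say whether the boundary is included); the function names are hypothetical.

```python
from math import comb


def min_achievable_p(t: int, n1: int, n2: int) -> float:
    """Smallest two-sided Fisher p-value possible when t subjects in total
    report the AE, over all feasible splits between groups of size n1 and n2."""
    lo, hi = max(0, t - n2), min(t, n1)
    # Unnormalized hypergeometric probabilities of each feasible split.
    weights = [comb(n1, x) * comb(n2, t - x) for x in range(lo, hi + 1)]
    total = comb(n1 + n2, t)
    # For a split with weight w, its p-value sums all weights <= w.
    return min(sum(v for v in weights if v <= w) for w in weights) / total


def passes_step1(t: int, n1: int, n2: int, alpha: float = 0.05) -> bool:
    """Keep the AE only if some split could reach p <= alpha (assumed boundary)."""
    return min_achievable_p(t, n1, n2) <= alpha
```

With 148 subjects per group (the Day 0-42 sample sizes in the MMRV example), this sketch gives a minimum p of about 0.06 for t = 5 and about 0.03 for t = 6, so under these assumptions an AE needs at least six reports in total to survive Step 1.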
Double FDR Approach
Define $p_i^* = \min(p_{i1}, p_{i2}, \ldots, p_{ik_i})$, $1 \le i \le s$. This represents the strongest safety "signal" for body system $i$.
1st level FDR adjustment – Apply the FDR adjustment to $p_1^*, p_2^*, \ldots, p_s^*$. Let $\tilde p_i^*$ = FDR-adjusted $p_i^*$.
2nd level FDR adjustment – Within body system $i$, apply the FDR adjustment to $p_{i1}, p_{i2}, \ldots, p_{ik_i}$. Let $\tilde p_{ij}$ = FDR-adjusted $p_{ij}$.
Double FDR Approach (cont’d)
Proposed Flagging Rule
Flag AE(i,j) if $\tilde p_i^* \le \alpha_1$ and $\tilde p_{ij} \le \alpha_2$.
What values of $\alpha_1$ and $\alpha_2$ should we use?
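Putting the two levels and the flagging rule together gives the compact Python sketch below. It assumes the p-values arrive as a dictionary mapping a body-system label to the list of $p_{ij}$ for the AEs retained after Step 1, and it re-implements the Benjamini-Hochberg adjustment so that the block stands alone. Names such as `double_fdr_flags` and the labels in the usage example are illustrative, not from the talk.

```python
from typing import Dict, List, Sequence, Tuple


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):          # from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, m / rank * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


def double_fdr_flags(pvals: Dict[str, List[float]], alpha1: float,
                     alpha2: float) -> List[Tuple[str, int]]:
    """Flag AE (i, j) if adjusted p_i* <= alpha1 and adjusted p_ij <= alpha2.

    Returns (body system, 0-based AE index) pairs in input order.
    """
    systems = [bs for bs, ps in pvals.items() if ps]   # skip empty systems
    # 1st level: p_i* = min_j p_ij, then BH across the s body systems.
    star_adj = bh_adjust([min(pvals[bs]) for bs in systems])
    flags: List[Tuple[str, int]] = []
    for bs, adj_star in zip(systems, star_adj):
        if adj_star > alpha1:
            continue                                    # body system not flagged
        # 2nd level: BH within body system i.
        for j, adj in enumerate(bh_adjust(pvals[bs])):
            if adj <= alpha2:
                flags.append((bs, j))
    return flags
```

For example, with one hypothetical body system carrying the Body System 08 p-values (0.0025, 0.4998, 1.0000) and two other made-up systems, only AEs whose body system passes the $\alpha_1$ screen and whose within-system adjusted p-value is at most $\alpha_2$ are returned.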
Choosing $\alpha_1$ and $\alpha_2$
Set $\alpha_2 = \alpha$ and use either (a) or (b) below for $\alpha_1$.
(a) Use resampling (non-parametric bootstrap) to determine the largest data-dependent $\alpha_1$ ($\le \alpha_2$) that ensures $\text{FDR} \le \alpha$.
OR
(b) Choose $\alpha_1$ ($\le \alpha_2$) independent of the data. For example, let $\alpha_1 = \alpha_2/2$ or $\alpha_1 = \alpha_2$, and estimate the resulting FDR using resampling.
Resampling Procedure
Purpose
– To estimate the false discovery rates of the following:
  NOADJ: No multiplicity adjustment; flag AE if unadjusted p < .05
  FULLFDR($\alpha$): Full FDR adjustment (ignore BS grouping)
  DFDR($\alpha_1$, $\alpha_2$): Double FDR adjustment for selected ($\alpha_1$, $\alpha_2$)
– To determine the largest $\alpha_1$ ($\le \alpha_2$) that guarantees $\text{FDR} \le \alpha$ when using DFDR($\alpha_1$, $\alpha_2$).
Resampling Procedure (cont’d)
Details
1. Pool data from both treatment groups into a common population. Sample with replacement from this common population to simulate many repetitions of the original trial. This procedure:
   a) simulates a true null situation (Group 1 = Group 2);
   b) preserves the correlation structure of the original data.
2. Implement our proposal for flagging AEs using the NOADJ, FULLFDR($\alpha$), and DFDR($\alpha_1$, $\alpha_2$) approaches, and calculate the corresponding FDRs.
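The resampling steps can be sketched in Python as follows. This is a hedged illustration, not the authors' code. It assumes each subject is stored as a tuple of 0/1 AE indicators with a parallel list giving each AE's body system, resamples whole subjects (so correlations among a subject's AEs are preserved), skips the Step 1 screen for brevity, and runs on made-up data. Because every null hypothesis is true in the pooled population, any flagged AE is a false discovery, so each rule's FDR is estimated as the proportion of simulated trials with at least one flag. All function names are illustrative.

```python
import random
from math import comb
from typing import Callable, Dict, List, Sequence, Tuple

Subject = Tuple[int, ...]                              # 0/1 per AE (assumed layout)
Rule = Callable[[List[float], Sequence[str]], int]     # returns number of flags


def fisher_two_sided(a: int, n1: int, c: int, n2: int) -> float:
    t, obs = a + c, comb(n1, a) * comb(n2, c)
    w = [comb(n1, x) * comb(n2, t - x) for x in range(max(0, t - n2), min(t, n1) + 1)]
    return sum(v for v in w if v <= obs) / comb(n1 + n2, t)


def bh_adjust(p: Sequence[float]) -> List[float]:
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adj, run = [0.0] * m, 1.0
    for rank in range(m, 0, -1):                       # from largest p-value down
        idx = order[rank - 1]
        run = min(run, m / rank * p[idx])
        adj[idx] = run
    return adj


def noadj(p: List[float], systems: Sequence[str]) -> int:
    return sum(x < 0.05 for x in p)                    # NOADJ: unadjusted p < .05


def fullfdr(alpha: float) -> Rule:
    return lambda p, systems: sum(a <= alpha for a in bh_adjust(p))


def dfdr(alpha1: float, alpha2: float) -> Rule:
    def rule(p: List[float], systems: Sequence[str]) -> int:
        groups: Dict[str, List[float]] = {}
        for x, bs in zip(p, systems):
            groups.setdefault(bs, []).append(x)
        names = list(groups)
        star_adj = bh_adjust([min(groups[b]) for b in names])       # 1st level
        return sum(sum(a <= alpha2 for a in bh_adjust(groups[b]))   # 2nd level
                   for b, s in zip(names, star_adj) if s <= alpha1)
    return rule


def estimate_fdr(group1: List[Subject], group2: List[Subject],
                 systems: Sequence[str], rules: Dict[str, Rule],
                 n_rep: int = 500, seed: int = 1) -> Dict[str, float]:
    rng = random.Random(seed)
    pooled = group1 + group2                 # Detail 1: pool under the null
    n1, n2 = len(group1), len(group2)
    hits = {name: 0 for name in rules}
    for _ in range(n_rep):
        g1 = rng.choices(pooled, k=n1)       # resample whole subjects
        g2 = rng.choices(pooled, k=n2)
        p = [fisher_two_sided(sum(s[j] for s in g1), n1, sum(s[j] for s in g2), n2)
             for j in range(len(systems))]
        for name, rule in rules.items():     # Detail 2: apply each flagging rule
            hits[name] += rule(p, systems) > 0   # any flag = false discovery
    return {name: h / n_rep for name, h in hits.items()}


def hypothetical_trial(seed: int = 0) -> Tuple[List[Subject], List[Subject], List[str]]:
    """Made-up data: 148 subjects per group, six AEs in three body systems."""
    rng = random.Random(seed)
    systems = ["BS08", "BS08", "BS08", "Skin", "Skin", "Resp"]
    rates = [0.20, 0.10, 0.05, 0.08, 0.04, 0.15]

    def make(n: int) -> List[Subject]:
        return [tuple(int(rng.random() < r) for r in rates) for _ in range(n)]

    return make(148), make(148), systems


if __name__ == "__main__":
    g1, g2, systems = hypothetical_trial()
    rules = {"NOADJ": noadj, "FULLFDR(.10)": fullfdr(0.10), "DFDR(.05,.10)": dfdr(0.05, 0.10)}
    print(estimate_fdr(g1, g2, systems, rules, n_rep=200))
```

To mimic option (a), one could rerun `estimate_fdr` with the same seed over a grid of $\alpha_1$ values and keep the largest one whose estimated FDR is at most $\alpha$.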
Second Level FDR Adjustment
Body System 08: Nervous System and Psychiatric
| Adverse Experience | Unadjusted p-value | FDR-adjusted p-value |
| Irritability | 0.0025 | 0.0075 |
| Crying | 0.4998 | 0.7497 |
| Insomnia | 1.0000 | 1.0000 |
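As a worked check of this table, the adjusted p-value recursion from the FDR slide can be applied with $m = k_i = 3$ to the ordered p-values $p_1 = 0.0025$ (Irritability), $p_2 = 0.4998$ (Crying) and $p_3 = 1.0000$ (Insomnia):

```latex
\begin{aligned}
\tilde p_3 &= p_3 = 1.0000 \\
\tilde p_2 &= \min\!\left(\tilde p_3,\ \tfrac{3}{2}\times 0.4998\right) = \min(1.0000,\ 0.7497) = 0.7497 \\
\tilde p_1 &= \min\!\left(\tilde p_2,\ \tfrac{3}{1}\times 0.0025\right) = \min(0.7497,\ 0.0075) = 0.0075
\end{aligned}
```

These values match the FDR-adjusted column.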
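Summary of Three Examples: Flagged AEs
| Trial (# of subjects) | # of AEs | No Multiplicity Adjustment | DFDR Adjustment, maximum FDR 15% | DFDR Adjustment, maximum FDR 10% | DFDR Adjustment, maximum FDR 5% |
| PedvaxHIB (N=681) | 15 | FDR ~ 43%: Irritability, Upper Resp. Inf., Rash | $\alpha_1$=.07, $\alpha$=.15: Irritability, Upper Resp. Inf. | $\alpha_1$=.05, $\alpha$=.10 | $\alpha_1$=.02, $\alpha$=.05 |
| MMRV (N=280) | 40 | FDR ~ 51%: Irritability, Rash, M/R-like rash, Diarrhea | $\alpha_1$=.07, $\alpha$=.15: Irritability, Rash, M/R-like rash, Diarrhea | $\alpha_1$=.05, $\alpha$=.10: Irritability | Irritability |
| COMVAX (N=811) | 58 | FDR ~ 87%: Erythema, Rash, Rhinorrhea | | | |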
Concluding Remarks
Current approach of flagging AEs based on unadjusted p-values (or C.I.s) can result in excessive false positive safety findings. These can cause undue concern for approval/labeling, and can affect post-marketing commitments.
Under our proposal, the unadjusted p-values (or C.I.s) would still be reported. The Double FDR multiplicity adjustment is a method to facilitate the interpretation of the unadjusted p-values.
Concluding Remarks (cont’d)
Our proposal for tackling multiplicity will:
– substantially reduce the percentage of incorrectly flagged AEs.
– be better accepted if described a priori in the protocol/DAP rather than on a post-hoc basis.
– facilitate comparable interpretation of safety results across studies, with respect to Type I error.