Adaptive Clinical Trials Scott Evans, Ph.D. Harvard University Muscle Study Group September 28, 2012
Adaptive Designs
• Not universally defined
– Broad definition: any design in which key parameters can be changed during the trial based on data from the current study or from external sources
– Narrow definition: specific, planned design changes made after interim analyses of treatment responses
Adaptive Designs
• A design feature
– A planned procedure for statistical error and bias control
– Described in the protocol
• Not a substitute for careful planning
– Not a rescue medication
• Fancy adaptations and statistical methods cannot rescue poorly designed trials
Practical Questions During Trial Conduct
• Stop for efficacy or futility?
• Are there subgroups w/ unacceptable toxicity?
• Has medical knowledge changed the scientific validity, medical importance, ethical acceptability, or equipoise of the trial?
• Should we adjust our design due to inaccurate design assumptions?
– Re-calculate sample size?
– Modify duration of follow-up?
Motivation
“I’ve designed >1000 clinical trials, each time having to make assumptions about variation, control-group response rate, etc. in order to calculate sample size …
I have not been right yet.”
Example as DSMB Member
• Trial designed to detect difference between response rates of 90% (control) and 97.5%
– 7.5% absolute difference
– 486 patients required to have 90% power
• Observed rate of control at interim is 80%
– With N=486, 56% power to detect 7.5% difference
– N=1066 required for 90% power to detect a difference between 80% vs. 87.5%
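The slide does not say which sample size formula was used. As a hedged check, the sketch below assumes a two-sided α = 0.05 test of two proportions with the Fleiss continuity-corrected normal approximation and equal allocation; under those assumptions it reproduces the quoted totals. The function names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Per-group sample size, two-sided test of two proportions,
    normal approximation with Fleiss continuity correction (assumed)."""
    delta = abs(p2 - p1)
    p_bar = (p1 + p2) / 2
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n_cc)


def power_at(n_group, p1, p2, alpha=0.05):
    """Approximate power for a given per-group size (same assumptions)."""
    delta = abs(p2 - p1)
    p_bar = (p1 + p2) / 2
    n = (n_group - 1 / delta) ** 2 / n_group  # undo the continuity correction
    z_a = Z.inv_cdf(1 - alpha / 2)
    z = (delta * sqrt(n) - z_a * sqrt(2 * p_bar * (1 - p_bar))) \
        / sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return Z.cdf(z)


total_planned = 2 * n_per_group(0.90, 0.975)  # 486
total_revised = 2 * n_per_group(0.80, 0.875)  # 1066
power_interim = power_at(243, 0.80, 0.875)    # about 0.56
```

The same 7.5% absolute difference needs more than twice as many patients once the control rate drops from 90% to 80%, because the binomial variance is larger nearer 50%.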
Motivation
• Answering these questions has:
– Ethical attractiveness
    • Safer trials: fewer participants exposed to inefficacious/harmful therapies
– Economic advantages
    • Smaller expected sample sizes
    • Shorter trials
– Public health advantages
    • Answers may get to the medical community more quickly
What Can be Adapted?
• Sample size
• Drop/add arms
• Stop for efficacy or futility
• Population enrichment (adapt eligibility criteria)
• Randomization probabilities
• Doses
• Objectives / hypotheses
– E.g., switching between NI and superiority
• Endpoints
• NI margin
When and Where? Trials with:
• High levels of uncertainty/unknowns (e.g. novel interventions)
• Design characteristics (e.g., power) that are sensitive to assumptions
• Long FU: adaptation is feasible and medical practice can change
• Invasive procedures or expensive evaluations
• Serious diseases; high risk treatments
• Vulnerable populations
• Data that serves as the basis for adaptation is available quickly
Complexity and Acceptability
• Some adaptations are well understood/accepted
• Depends upon
– Type of adaptation
– The data utilized for decision-making
– How adaptation is implemented
– Who is reviewing data and making the decision to adapt
Threat to Trial Integrity
• LOW
– Adaptations prior to any data analyses
– Adaptations based on
    • Baseline data
    • External data
    • Blinded (aggregate) data
    • Nuisance parameters (e.g. variation)
• HIGH
– Unplanned adaptations
– Adaptations based on observed treatment effects
Example: Adaptation based on external data
• ATN 082: Evaluation of Pre-exposure prophylaxis (PREP)
• Randomization to PREP vs. placebo to prevent HIV transmission
– 8/2008: 1st participant enrolled
– 11/22/2010: email notifying results from iPREX trial (Gates Fndtn)
• PREP reduced HIV acquisition in similar trial (NEJM; 11/23/2010)
– 11/23/2010: DSMB call
• Equipoise? Still ethical to randomize and follow?
• Recommendation
– Notify participants and IRBs of iPREX results
– Unblind participants
– Discontinue control arm; offer rollover onto PREP
– Continue enrollment into PREP
Major Scientific Concerns with Adaptive Designs
• Statistical
– Error control associated with multiplicity
• Operational bias
– Adaptations are visible and could be used to infer trial results, affecting patient/investigator action during the trial
    • E.g., participation, adherence, objectivity of patient ratings, etc.
– Not a statistical source of bias and thus difficult to adjust for
– May cause heterogeneity of results (before vs. after adaptation)
Addressing Concerns
• Statistical
– Methods exist (e.g., group sequential and modern adaptive design methods for controlling errors)
• Operational bias
– Careful and responsible application of adaptation
– Well-constructed processes
    • Control of dissemination of adaptation
    • Interim analyses and DSMB procedures
    • The “closed protocol” (protocol team blinded)
– Details regarding the planned adaptation are put into a separate (limited distribution) document to reduce back-calculation for inferring effects
Example: Industry Trial
• Randomized controlled trial for treating lymphoma
• Conditional power calculated during interim analyses
• Pre-specified sample size adaptation rule (e.g., if conditional power is low, stop for futility; if very high, continue as scheduled; if in the middle, apply one of several sample size (or # of events) adaptations)
• DMC can recommend trial continuation but does not specify sample size (thus nobody can back-calculate treatment effect at interim)
• DMC is kept apprised of enrollment and # of events (event-driven trial), and DMC says “STOP” when appropriate
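A minimal sketch of the kind of rule described above, assuming the standard B-value (Brownian motion) formulation of conditional power and a one-sided final critical value with no adjustment for the interim look. The zone cutoffs (0.10 and 0.80) and all function names are hypothetical; the slide does not give the trial's actual thresholds.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def conditional_power(z_interim, info_frac, alpha=0.025, drift=None):
    """Probability that the final Z exceeds z_{1-alpha}, given the interim Z
    at information fraction t. `drift` is the assumed expected final Z;
    by default the current trend, z_interim / sqrt(t), is used."""
    t = info_frac
    b = z_interim * sqrt(t)          # B-value at the interim
    if drift is None:
        drift = z_interim / sqrt(t)  # "current trend continues" assumption
    z_crit = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf((z_crit - b - drift * (1 - t)) / sqrt(1 - t))


def zone(cp, futility=0.10, favorable=0.80):
    """Map conditional power to an action (hypothetical cutoffs)."""
    if cp < futility:
        return "stop for futility"
    if cp >= favorable:
        return "continue as scheduled"
    return "adapt sample size / number of events"


for z in (0.0, 1.5, 3.0):
    cp = conditional_power(z, info_frac=0.5)
    print(f"interim Z = {z:.1f}: CP = {cp:.3f} -> {zone(cp)}")
```

Because the DMC only announces "continue" or "STOP" rather than the new target, observers cannot invert a function like `zone` to recover the interim effect.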
Conceptual Issue
• Should we adapt sample size based on observed treatment effect?
– Trials are designed to detect relevant effects
– Observed effects may not be relevant
– Are we losing sight of clinical relevance?
5/26/2011: Email from FDA Team Leader
• I find many sponsors re-estimate their sample size based on the interim difference. I feel this is incorrect. Sample size should only be re-estimated based on misspecification of variability or control rate. The difference we are trying to detect should be based on clinical input. If we re-estimate based on observed difference, we may end up with a trial that shows a statistically significant difference but not a clinically meaningful difference. Furthermore, I think using this information would affect the Type I error rate even if we adjust for the interim analysis. There seems to be disagreement in FDA as to whether you can re-estimate based on observed difference. I may be in the minority. I would appreciate your thoughts.
Changing Endpoints
• NEJM 2009: Evaluation of 12 reports of trials of Gabapentin
– 8 had a primary endpoint in manuscript different from protocol
– 5 trials failed to report protocol-defined primary outcomes
• ENHANCE Trial: Changes driven by science or business?
– Vytorin vs. simvastatin for preventing atherosclerosis (negative trial)
– Completed in 2006; Registered in clinicaltrials.gov in OCT 2007
– Endpoints entered differed from original design
• Chan et al. (JAMA, 2004) compared published articles with protocols for 102 randomized trials; 62% of the trials had at least one primary endpoint that had been changed, introduced, or omitted.
• Changes to endpoints can compromise the scientific integrity of a trial
• Not generally recommended (concern for “cherry-picking” / error inflation)
• New information could merit endpoint changes
– Evolving medical knowledge (long-term trials); results from other trials or identification of better biomarkers
• Incorporation of up-to-date knowledge into design is theoretically okay if the decision is “independent” of trial data (e.g., external data)
– Demonstration of independence is difficult
– DSMBs: may not be an appropriate decision-maker if they have seen the data
Handling Adaptations
• Update documentation
– The protocol (amendment)
– The clinical trial registry (clinicaltrials.gov)
– The monitoring plan
– The statistical analysis plan
Practical Issues
• Budget implications of changing sample size
• Complex drug supply issues
• Resources to conduct interim analyses
– Data cleaning
– Statistical analyses
– Arranging DSMB meetings
• Protecting the blind and restricted access to data
• Perception issues
Analysis Issues
• Interpret cautiously
• Evaluate issues with
• Statistical error control
• Operational bias
• Generalizability
• Evaluate consistency of results before and after adaptation
Reporting / Publishing
• Clearly describe
– The adaptation
– Whether the adaptation was planned or unplanned
– The rationale for the adaptation
– When the adaptation was made
– The data upon which adaptation is based and whether the data were blinded or unblinded
– The planned process for the adaptation including who made the decision regarding adaptation
– Deviations from the planned process
– Consistency of results before vs. after the adaptation
• Discuss
– Potential biases induced by the adaptation
– Adequacy of firewalls to protect against operational bias
– The effects on error control and multiplicity context
DSMBs
“It’s probably the toughest job in clinical medicine. Being on a DSMB requires real cojones.”
Jeffrey Drazen
Editor, NEJM
Forbes, 2012
DSMBs and Adaptive Designs
• Many (MDs and statisticians) don’t understand adaptive design issues well or appreciate implications of DSMB actions
– Poor DSMB processes can jeopardize trial integrity
• Considerations
– Get DSMB members experienced with adaptive designs
– Statistician chair
– Well-constructed charter
Recent Example: Release of Interim Results
• Vertex has ongoing treatment trial for cystic fibrosis
• Positive results of interim analyses released; trial continued
• Stock price soared
• Executive VP sold stock for 8.8 million profit; other officers too
• Oops! We made a mistake. Results not as positive as reported
• Stock price tumbles…
• Questions regarding interim data practices including DSMB operations
• SEC Investigation ongoing
Recent Example: DSMB Actions Questioned
• J&J prostate cancer drug Zytiga
• Interim results leaned heavily towards positive trial… but results not significant (p=0.08)
• DSMB felt ethical obligation and stopped trial anyway
• Frequentists say trial stopped too early from an evidence perspective
• Some Bayesians argue there is no problem when you consider the prior
• Perception that DSMB is in bed with the company
Predicted Interval Plots (PIPs)
Li L, Evans SR, Uno H, Wei LJ, “Predicted Interval Plots: A Graphical Tool for Data Monitoring in Clinical Trials”, Statistics in Biopharmaceutical Research, 1:4:348-355, 2009.
Evans SR, Li L, Wei LJ, “Data Monitoring in Clinical Trials Using Prediction”, Drug Information Journal, 41:733-742, 2007.
Motivation
• Patient management
– Not a single decision but tailored sequential treatment decisions (adjustments of therapy over time) based on individual patient response (transitions of health states based on efficacy, toxicity, adherence, QOL, etc.)
– Mixture of short-term and long-term outcomes
• Adaptive treatment regime designs
– Compares treatment strategies (of sequential decisions) that are consistent with clinical practice
[Figure, built up over several slides: adaptive treatment regime schematic. Nodes: Eligible Patients; Therapy #1 and Therapy #2; Responders and Non-Responders (short-term response); subsequent Therapies #1 to #5; Follow-up (long-term response). A legend symbol marks the randomization points.]
Example: HIV-Associated PML
• Design compares 4 treatment STRATEGIES
– cART + steroids if IRIS is observed
– cART without steroids
– Enhanced-cART + steroids if IRIS is observed
– Enhanced-cART without steroids
• Step 1
– Randomized to cART or enhanced-cART (cART + enfuvirtide)
– Observe patient response, particularly for IRIS
• Step 2
– If no IRIS then patient continues with therapy
– If IRIS, then randomize to steroids or placebo
[Figure: HIV-associated PML design schematic. PML patients are randomized to cART or cART+ENF; the short-term response is IRIS vs. no IRIS; patients with IRIS are randomized to + Steroid or + Placebo; all branches are followed up for long-term response.]
Coinfection: PML
Short-term Outcome: IRIS
Long-term Outcome: Survival
Adaptive Treatment Regimes
• Distinction between the regime (strategy dictating patient treatment) vs. realized experiences
– Data from individual patients can contribute to multiple strategies
– Patients on the same regime can have different treatment experiences
• ITT complexity
– Assigning treatment at later stages for patients LFU in early stages
– Should consent patients to agree to ALL sequential randomizations
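To illustrate how one patient's data can contribute to more than one strategy, here is a small sketch based on the four strategies of the HIV-associated PML example. It assumes that "without steroids" corresponds to the placebo arm of the second randomization; the names `STRATEGIES` and `consistent_strategies` are hypothetical.

```python
# Each strategy = (step 1 arm, step 2 arm to use IF IRIS is observed).
# Assumption: "without steroids" means the placebo arm at step 2.
STRATEGIES = {
    "cART + steroids if IRIS": ("cART", "steroids"),
    "cART without steroids": ("cART", "placebo"),
    "Enhanced-cART + steroids if IRIS": ("cART+ENF", "steroids"),
    "Enhanced-cART without steroids": ("cART+ENF", "placebo"),
}


def consistent_strategies(step1_arm, iris, step2_arm=None):
    """Return the strategies that a patient's realized treatment path
    is consistent with."""
    matches = []
    for name, (arm, arm_if_iris) in STRATEGIES.items():
        if arm != step1_arm:
            continue
        # Without IRIS there is no second randomization, so the patient's
        # experience matches both strategies that start with this arm.
        if not iris or step2_arm == arm_if_iris:
            matches.append(name)
    return matches


no_iris = consistent_strategies("cART", iris=False)
with_iris = consistent_strategies("cART+ENF", iris=True, step2_arm="steroids")
print(no_iris)    # two strategies
print(with_iris)  # one strategy
```

A patient who never develops IRIS informs both strategies that begin with the assigned step 1 arm, while two patients on the same strategy can have different realized experiences (IRIS vs. no IRIS).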
Summary
• Adaptation is a design feature
– Requires careful and responsible planning
• When used appropriately, adaptive designs can be efficient and informative
• When used inappropriately, adaptive designs can threaten trial integrity
• Be aware of information apparent to observers and consider actions to protect trial integrity
– Minimize access to results to control operational bias
Adaptive Statistician
Many collaborating clinicians ask:
“Can we change statisticians? I’m tired of listening to Evans explain all of the mistakes we are making.”
2-Stage Designs
• “Internal pilot”: Stage I vs. Stage II: learn vs. confirm
– Hypothesis generation vs. hypothesis testing
• Efficiency advantage
– Single trial addresses objectives traditionally addressed in two trials
– Eliminates down-time between separate trials (but less thinking time)
– IRB advantage (vs. approval of two trials)
• Classify by whether objectives or endpoints change across stages
• Important distinction is whether the final analysis uses data from both stages or only Stage II
2-Stage Design: Same Objectives and Endpoints
• Stage I: Evaluate preliminary evidence of effect/no effect
• ACTG 269 (Evans et al., JCO, 2002)
– Phase II single arm trial of oral etoposide for AIDS KS
– Endpoint: tumor response rate (50% decrease in lesion number/size)
– Stage I
    • Enroll small number of participants (N=14)
    • If response is unacceptably low (0/14), then quit for futility noting that if true response rate is 20% then <5% chance of observing 0/14
    • Otherwise continue to Stage II (not testing for efficacy)
    • Expected sample size is minimized when response is low given error constraints
– Trial continued w/ final response rate = 36%
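The "<5% chance of observing 0/14" statement is a direct binomial calculation. A quick sketch (the function name is illustrative only):

```python
def prob_zero_responses(n, true_rate):
    """Binomial probability of observing 0 responders among n participants."""
    return (1 - true_rate) ** n


p_stop = prob_zero_responses(14, 0.20)
print(f"P(0/14 | true rate 20%) = {p_stop:.3f}")  # about 0.044

# The futility stop becomes more likely as the true response rate falls.
for rate in (0.05, 0.10, 0.20, 0.30):
    print(f"true rate {rate:.0%}: P(stop at Stage I) = {prob_zero_responses(14, rate):.3f}")
```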
Adaptive Randomization
• Randomization schedule cannot be constructed prior to trial initiation
• Treatment allocation depends on:
1. Baseline characteristics, or
2. Responses
• Minimization
– Creates between-treatment-group balance wrt important variables
    • “minimizes imbalances”
– Revises the probability of treatment assignment based on baseline characteristics of the participant and participants already randomized
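A minimal sketch of minimization, assuming a simple variant in which imbalance is the sum, over the new participant's baseline factor levels, of the range of arm counts, and the least-imbalanced arm is chosen with probability `p_best`. The function name, the data layout, and the value 0.8 are assumptions for illustration, not a specific trial's algorithm.

```python
import random


def minimization_assign(new_patient, enrolled, arms=("A", "B"),
                        p_best=0.8, rng=random):
    """new_patient: dict mapping factor -> level.
    enrolled: list of (patient_dict, assigned_arm) already randomized."""
    scores = {}
    for candidate in arms:
        total = 0
        for factor, level in new_patient.items():
            # Count prior participants sharing this factor level, by arm.
            counts = {arm: 0 for arm in arms}
            for patient, arm in enrolled:
                if patient.get(factor) == level:
                    counts[arm] += 1
            counts[candidate] += 1  # hypothetically assign the new patient
            total += max(counts.values()) - min(counts.values())
        scores[candidate] = total

    best = min(scores.values())
    best_arms = [a for a in arms if scores[a] == best]
    if len(best_arms) == len(arms):   # all arms tie: plain randomization
        return rng.choice(list(arms))
    if rng.random() < p_best:         # favor the arm(s) minimizing imbalance
        return rng.choice(best_arms)
    return rng.choice([a for a in arms if a not in best_arms])


enrolled = [({"sex": "F", "site": "1"}, "A"),
            ({"sex": "F", "site": "1"}, "A")]
next_arm = minimization_assign({"sex": "F", "site": "1"}, enrolled, p_best=1.0)
print(next_arm)  # "B" restores balance when p_best = 1.0
```

Keeping `p_best` below 1 preserves a random element, so the next assignment cannot be predicted with certainty from the participants already enrolled.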
Adaptive Randomization
• Response adaptive randomization
– Bases treatment assignment probabilities on the observed responses of participants that are already enrolled
– Feasible with short-term outcomes (e.g., emergency medicine trials for stroke, status epilepticus, or traumatic brain injury)
– “Play-the-winner” or “urn design”
    • Proportionally more patients are randomized to the more effective intervention
    • May be attractive for this reason
• Disadvantages
– Time trends in response create challenges (e.g., learning effects in surgery trials)
– Suggests equipoise does not hold
– May be less efficient than group-sequential designs
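A hedged simulation sketch of a two-arm randomized play-the-winner urn: the urn starts with one ball per arm, a success adds a ball for the same arm, and a failure adds a ball for the other arm. The success probabilities and the function name are hypothetical, and responses are assumed to be available before the next participant is randomized.

```python
import random


def play_the_winner(p_success, n_patients, seed=0):
    """Simulate a two-arm randomized play-the-winner urn.
    p_success: dict arm -> assumed true success probability (hypothetical)."""
    rng = random.Random(seed)
    arms = list(p_success)
    urn = {arm: 1 for arm in arms}  # one ball per arm to start
    assignments = []
    for _ in range(n_patients):
        # Draw an arm with probability proportional to its ball count.
        arm = rng.choices(arms, weights=[urn[a] for a in arms])[0]
        assignments.append(arm)
        if rng.random() < p_success[arm]:
            urn[arm] += 1            # success: reward the same arm
        else:
            for other in arms:
                if other != arm:
                    urn[other] += 1  # failure: reward the other arm
    return assignments


assignments = play_the_winner({"A": 0.8, "B": 0.2}, n_patients=2000, seed=1)
share_a = assignments.count("A") / len(assignments)
print(f"share randomized to A: {share_a:.2f}")
```

With a large true difference, most participants end up on the better arm, which is the attraction noted above. The simulation assumes response rates are constant over time, which is exactly what a time trend would violate.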
Adaptive Dose Selection or Duration
• Enroll sequence of cohorts where subsequent cohorts open depending upon outcome from previous cohort
• A5210: AMD11070 (oral CXCR4 entry inhibitor)
– Accrue 6 participants; if <x DLTs then treat next 6 at next higher dose
• 5277: ITX-5061 HCV entry inhibitor for HCV monoinfection
– 3 doses (25/27/150); 3 durations (3/14/28 days)
– Start with highest dose on cohort of 10 (8 active; 2 placebo)
– If anti-viral activity (4/8 show 1 log drop), then reduce dose
– If no activity; then increase duration
Limitations of Many Traditional Methods
• Over-reliance on p-values without careful consideration of effect sizes (clinical relevance) and precision
• Inflexible decision rules based on ONE endpoint
– Desire to base decisions upon totality of evidence (e.g., safety data, secondary endpoints, QOL, external data, etc.)
• No formal evaluation of the ramifications of continuing
Predicted Intervals
• Predict CI at future timepoint (e.g., end of trial or next interim analysis time) conditional upon:
1. Observed data
2. Assumptions regarding future data (e.g., observed trend continues, HA is true, H0 is true, best/worst case scenarios, etc.)
• Use with repeated confidence interval theory to control false positive error
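A simulation sketch of the idea, assuming a continuous endpoint, normally distributed future observations, and a plain 95% Wald interval (no repeated confidence interval adjustment, for simplicity). The interim data here are simulated, and every name and number is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def ci_diff(trt, ctl, z=1.96):
    """Wald CI for the difference in means (treatment minus control)."""
    diff = mean(trt) - mean(ctl)
    se = sqrt(variance(trt) / len(trt) + variance(ctl) / len(ctl))
    return diff - z * se, diff + z * se


def predicted_intervals(trt, ctl, n_final, mu_trt, mu_ctl, sd,
                        n_sim=200, seed=0):
    """Simulate the end-of-trial CI: observed data are kept fixed and the
    remaining participants are drawn under the assumed means and SD."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_sim):
        fut_trt = [rng.gauss(mu_trt, sd) for _ in range(n_final - len(trt))]
        fut_ctl = [rng.gauss(mu_ctl, sd) for _ in range(n_final - len(ctl))]
        out.append(ci_diff(trt + fut_trt, ctl + fut_ctl))
    return out


# Hypothetical interim data: 30 per arm observed, 80 per arm planned.
data_rng = random.Random(42)
trt = [data_rng.gauss(-0.30, 0.35) for _ in range(30)]
ctl = [data_rng.gauss(-0.20, 0.35) for _ in range(30)]
sd_pooled = sqrt((variance(trt) + variance(ctl)) / 2)

current = ci_diff(trt, ctl)
trend = predicted_intervals(trt, ctl, 80, mean(trt), mean(ctl), sd_pooled)
null = predicted_intervals(trt, ctl, 80, mean(ctl), mean(ctl), sd_pooled)

for label, sims in (("current trend", trend), ("H0 true", null)):
    width = mean(hi - lo for lo, hi in sims)
    excl = mean(1.0 if hi < 0 else 0.0 for lo, hi in sims)
    print(f"{label}: mean predicted width {width:.3f}, "
          f"share of CIs entirely below zero {excl:.2f}")
```

Plotting the simulated intervals under each assumption gives a predicted interval plot, and the share of intervals excluding zero plays the role of conditional power under that assumption.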
NARC 009
Evans et al., PLoS ONE, 2007.
• Randomized, double-blind, placebo-controlled, multicenter, dose-ranging study of prosaptide (PRO) for the treatment of HIV-associated neuropathic pain
• Participants were randomized to 2, 4, 8, 16 mg/d PRO or placebo administered via subcutaneous injection
• Primary endpoint:
– 6 week change from baseline in weekly average of random daily Gracely pain scale prompts using an electronic diary
• Designed N = 390 equally allocated between groups
– Interim analysis conducted after 167 participants completed the 6-week double-blind treatment period
Interim Analysis Results: NARC 009
Treatment | N | 95% CI for Mean Change | 95% CI for Diff [1] | 95% PI for Diff [2] | 95% PI for Diff [3] | Required Diff [4]
Placebo | 31 | (-0.35, -0.11) | | | |
2 mg | 34 | (-0.21, -0.04) | (-0.04, 0.25) | (-0.01, 0.21) | (-0.16, 0.06) | -0.54
4 mg | 34 | (-0.38, -0.12) | (-0.19, 0.16) | (-0.14, 0.10) | (-0.23, 0.01) | -0.45
8 mg | 32 | (-0.18, -0.02) | (-0.01, 0.28) | (0.03, 0.23) | (-0.15, 0.05) | -0.56
16 mg | 36 | (-0.34, -0.09) | (-0.16, 0.19) | (-0.11, 0.14) | (-0.21, 0.04) | -0.54
[1] 95% CI for the difference in mean changes vs. placebo
[2] 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming current trend
[3] 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming per protocol, μplacebo = -0.17 and μdrug = -0.34
[4] Difference in mean changes needed in the remaining participants for the CI for the difference in mean changes to exclude zero (in favor of active treatment) at the end of the trial
Predicted Intervals and PIPs
• Intuitive
• Advantages
– Flexible decision making
• Considering all data (all endpoints, external data, etc.)
– Effect sizes and associated precision
• Clinical relevance and statistical significance
– Evaluation of trial with continuation
• PI width provides information about gain in precision
• Conditional power
– Can be used for all types of endpoints (e.g., binary, continuous, event-time) and hypotheses (e.g., superiority or noninferiority)