Adaptive Clinical Trials Scott Evans, Ph.D. Harvard University Muscle Study Group September 28, 2012.

Adaptive Clinical Trials

Scott Evans, Ph.D.

Harvard University

Muscle Study Group

September 28, 2012

Special Thank You

• Dr. Griggs

• Dr. McDermott

DEFINITION

Adaptive Designs

• Not universally defined

– Broad definition: any design in which key parameters can be changed during the trial based on data from the current study or from external sources

– Narrow definition: specific design changes that are planned in advance and made after interim analyses of treatment responses

Adaptive Designs

• A design feature

– A planned procedure for statistical error and bias control

– Described in the protocol

• Not a substitute for careful planning

– Not a rescue medication

• Fancy adaptations and statistical methods cannot rescue poorly designed trials

MOTIVATION

Practical Questions During Trial Conduct

• Stop for efficacy or futility?

• Are there subgroups w/ unacceptable toxicity?

• Has medical knowledge changed the scientific validity, medical importance, ethical acceptability, or equipoise of the trial?

• Should we adjust our design due to inaccurate design assumptions?

– Re-calculate sample size?

– Modify duration of follow-up?

Motivation

“I’ve designed >1000 clinical trials, each time having to make assumptions about variation, control-group response rate, etc. in order to calculate sample size … I have not been right yet.”

Example as DSMB Member

• Trial designed to detect difference between response rates of 90% (control) and 97.5%

– 7.5% absolute difference

– 486 patients required to have 90% power

• Observed rate of control at interim is 80%

– With N=486, 56% power to detect 7.5% difference

– N=1066 required for 90% power to detect a difference between 80% vs. 87.5%
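The arithmetic behind this example can be sketched in a few lines. The slide does not say which formula was used; the sketch below assumes a two-sided test at alpha = 0.05, 1:1 allocation, and the normal approximation with the Fleiss continuity correction. The function names `n_per_arm` and `power_at_n` are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _sds(p_control, p_treat):
    """Pooled and unpooled SD terms used by the normal approximation."""
    p_bar = (p_control + p_treat) / 2
    pooled = sqrt(2 * p_bar * (1 - p_bar))
    unpooled = sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))
    return pooled, unpooled


def n_per_arm(p_control, p_treat, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided comparison of two proportions.

    Assumptions (not stated on the slide): 1:1 allocation, normal
    approximation, Fleiss continuity correction.
    """
    delta = abs(p_treat - p_control)
    pooled, unpooled = _sds(p_control, p_treat)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    n_uncorrected = (z_alpha * pooled + z_beta * unpooled) ** 2 / delta ** 2
    # The continuity correction inflates the uncorrected sample size.
    return n_uncorrected / 4 * (1 + sqrt(1 + 4 / (n_uncorrected * delta))) ** 2


def power_at_n(p_control, p_treat, n_arm, alpha=0.05):
    """Approximate power at a given per-arm size (requires n_arm * delta > 1)."""
    delta = abs(p_treat - p_control)
    pooled, unpooled = _sds(p_control, p_treat)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Undo the continuity correction, then solve the size formula for power.
    n_uncorrected = n_arm * (1 - 1 / (n_arm * delta)) ** 2
    return STD_NORMAL.cdf((delta * sqrt(n_uncorrected) - z_alpha * pooled) / unpooled)


design_total = 2 * ceil(n_per_arm(0.90, 0.975))    # design assumption
interim_power = power_at_n(0.80, 0.875, design_total // 2)  # control rate seen at interim
revised_total = 2 * ceil(n_per_arm(0.80, 0.875))
print(design_total, round(interim_power, 2), revised_total)
```

Under these assumptions the sketch gives 486, roughly 56%, and 1066, in line with the figures on the slide. A different formula (for example, one without continuity correction) would give somewhat smaller sample sizes.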

Motivation

• Answering these questions has:

– Ethical attractiveness

• Safer trials: fewer participants exposed to inefficacious/harmful therapies

– Economical advantages

• Smaller expected sample sizes

• Shorter trials

– Public health advantages

• Answers may get to the medical community more quickly

What Can be Adapted?

• Sample size

• Drop/add arms

• Stop for efficacy or futility

• Population enrichment (adapt eligibility criteria)

• Randomization probabilities

• Doses

• Objectives / hypotheses

– E.g., switching between NI and superiority

• Endpoints

• NI margin

When and Where? Trials with:

• High levels of uncertainty/unknowns (e.g. novel interventions)

• Design characteristics (e.g., power) that are sensitive to assumptions

• Long FU: adaptation is feasible and medical practice can change

• Invasive procedures or expensive evaluations

• Serious diseases; high risk treatments

• Vulnerable populations

• Data that serves as the basis for adaptation is available quickly

TRIAL INTEGRITY

Complexity and Acceptability

• Some adaptations are well understood/accepted

• Depends upon

– Type of adaptation

– The data utilized for decision-making

– How adaptation is implemented

– Who is reviewing data and making the decision to adapt

Threat to Trial Integrity

• LOW

– Adaptations prior to any data analyses

– Adaptations based on

• Baseline data

• External data

• Blinded (aggregate) data

• Nuisance parameters (e.g. variation)

• HIGH

– Unplanned adaptations

– Adaptations based on observed treatment effects

Example: Adaptation based on external data

• ATN 082: Evaluation of Pre-exposure prophylaxis (PREP)

• Randomization to PREP vs. placebo to prevent HIV transmission

– 8/2008: 1st participant enrolled

– 11/22/2010: email notifying results from iPREX trial (Gates Fndtn)

• PREP reduced HIV acquisition in similar trial (NEJM; 11/23/2010)

– 11/23/2010: DSMB call

• Equipoise? Still ethical to randomize and follow?

• Recommendation

– Notify participants and IRBs of iPREX results

– Unblind participants

– Discontinue control arm; offer rollover onto PREP

– Continue enrollment into PREP

Major Scientific Concerns with Adaptive Designs

• Statistical

– Error control associated with multiplicity

• Operational bias

– Adaptations are visible and could be used to infer trial results, affecting patient/investigator action during the trial

• E.g., participation, adherence, objectivity of patient ratings, etc.

– Not a statistical source of bias and thus difficult to adjust for

– May cause heterogeneity of results (before vs. after adaptation)

Addressing Concerns

• Statistical

– Methods exist (e.g., group sequential and modern adaptive design methods for controlling errors)

• Operational bias

– Careful and responsible application of adaptation

– Well-constructed processes

• Control of dissemination of adaptation

• Interim analyses and DSMB procedures

• The “closed protocol” (protocol team blinded)

– Details regarding the planned adaptation are put into a separate (limited distribution) document to reduce back-calculation for inferring effects

Example: Industry Trial

• Randomized controlled trial for treating lymphoma

• Conditional power calculated during interim analyses

• Pre-specified sample size adaptation rule: if conditional power is low, stop for futility; if very high, continue as scheduled; if in the middle, apply one of various adaptations of sample size (or # of events)

• DMC can recommend trial continuation but does not specify sample size (thus nobody can back-calculate treatment effect at interim)

• DMC is kept apprised of enrollment and # of events (event-driven trial), and DMC says “STOP” when appropriate
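A minimal sketch of the kind of rule described above, assuming the standard Brownian-motion approximation for conditional power. The zone cut-offs (0.20 and 0.80) are hypothetical placeholders, not the trial's actual thresholds, and the final critical value ignores any adjustment for the interim look.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_power(z_interim, info_frac, drift=None, alpha=0.025):
    """P(final Z exceeds the one-sided critical value | interim Z).

    info_frac: fraction of total information (e.g., events) observed so far.
    drift: assumed mean of the final Z statistic; None means the observed
        trend continues (drift estimated as z_interim / sqrt(info_frac)).
    """
    if not 0 < info_frac < 1:
        raise ValueError("info_frac must be strictly between 0 and 1")
    b_value = z_interim * sqrt(info_frac)
    if drift is None:
        drift = z_interim / sqrt(info_frac)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    shortfall = z_crit - b_value - drift * (1 - info_frac)
    return 1 - STD_NORMAL.cdf(shortfall / sqrt(1 - info_frac))


def zone(cp, futility=0.20, favorable=0.80):
    """Map conditional power to an action; the cut-offs are HYPOTHETICAL."""
    if cp < futility:
        return "stop for futility"
    if cp >= favorable:
        return "continue as scheduled"
    return "adapt sample size / number of events"


for z in (0.0, 1.2, 2.0):
    cp = conditional_power(z, info_frac=0.5)
    print(f"interim Z = {z:.1f}: conditional power = {cp:.2f} -> {zone(cp)}")
```

Because the zone boundaries map directly onto the interim treatment effect, publishing the chosen sample size would let observers back-calculate that effect, which is why the DMC in this example only says "continue" or "STOP".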

Conceptual Issue

• Should we adapt sample size based on observed treatment effect?

– Trials are designed to detect relevant effects

– Observed effects may not be relevant

– Are we losing sight of clinical relevance?

5/26/2011: Email from FDA Team Leader

• I find many sponsors re-estimate their sample based on the interim difference. I feel this is incorrect. Sample size should only be re-estimated based on misspecification of variability or control rate. The difference we are trying to detect should be based on clinical input. If we re-estimate based on observed difference, we may end up with a trial that shows a statistically significant difference but not a clinically meaningful difference. Furthermore, I think using this information would affect the Type I error rate even if we adjust for the interim analysis. There seems to be disagreement in FDA as to whether you can re-estimate based on observed difference; I may be in the minority. I would appreciate your thoughts.

CHANGING ENDPOINTS

Changing Endpoints

• NEJM 2009: Evaluation of 12 reports of trials of Gabapentin

– 8 had a primary endpoint in manuscript different from protocol

– 5 trials failed to report protocol-defined primary outcomes

• ENHANCE Trial: Changes driven by science or business?

– Vytorin vs. simvastatin for preventing atherosclerosis (negative trial)

– Completed in 2006; Registered in clinicaltrials.gov in OCT 2007

– Endpoints entered differed from original design

• Chan et al. (JAMA, 2004) compared published articles with protocols for 102 randomized trials; 62% of the trials had at least one primary endpoint that had been changed, introduced, or omitted.

• Changes to endpoints can compromise the scientific integrity of a trial

• Not generally recommended (concern for “cherry-picking” / error inflation)

• New information could merit endpoint changes

– Evolving medical knowledge (long-term trials); results from other trials or identification of better biomarkers

• Incorporation of up-to-date knowledge into design is theoretically okay if the decision is “independent” of trial data (e.g., external data)

– Demonstration of independence is difficult

– DSMBs may not be an appropriate decision-maker if they have seen the data

ANALYSIS, REPORTING, AND PRACTICAL ISSUES

Handling Adaptations

• Update documentation

– The protocol (amendment)

– The clinical trial registry (clinicaltrials.gov)

– The monitoring plan

– The statistical analysis plan

Practical Issues

• Budget implications of changing sample size

• Complex drug supply issues

• Resources to conduct interim analyses

– Data cleaning

– Statistical analyses

– Arranging DSMB meetings

• Protecting the blind and restricted access to data

• Perception issues

Analysis Issues

• Interpret cautiously

• Evaluate issues with

• Statistical error control

• Operational bias

• Generalizability

• Evaluate consistency of results before and after adaptation

Reporting / Publishing

• Clearly describe

– The adaptation

– Whether the adaptation was planned or unplanned

– The rationale for the adaptation

– When the adaptation was made

– The data upon which adaptation is based and whether the data were blinded or unblinded

– The planned process for the adaptation including who made the decision regarding adaptation

– Deviations from the planned process

– Consistency of results before vs. after the adaptation

• Discuss

– Potential biases induced by the adaptation

– Adequacy of firewalls to protect against operational bias

– The effects on error control and multiplicity context

DSMBs

“It’s probably the toughest job in clinical medicine. Being on a DSMB requires real cojones.”

Jeffrey Drazen, Editor, NEJM (Forbes, 2012)

DSMBs and Adaptive Designs

• Many (MDs and statisticians) don’t understand adaptive design issues well or appreciate implications of DSMB actions

– Poor DSMB processes can jeopardize trial integrity

• Considerations

– Get DSMB members experienced with adaptive designs

– Statistician chair

– Well-constructed charter

Recent Example: Release of Interim Results

• Vertex has ongoing treatment trial for cystic fibrosis

• Positive results of interim analyses released; trial continued

• Stock price soared

• Executive VP sold stock for 8.8 million profit; other officers too

• Oops! We made a mistake. Results not as positive as reported

• Stock price tumbles…

• Questions regarding interim data practices including DSMB operations

• SEC Investigation ongoing

Recent Example: DSMB Actions Questioned

• J&J prostate cancer drug Zytiga

• Interim results leaned heavily towards positive trial… but results not significant p=0.08.

• DSMB felt ethical obligation and stopped trial anyway

• Frequentists say trial stopped too early from an evidence perspective

• Some Bayesians argue there is no problem when you consider the prior

• Perception that DSMB is in bed with the company

Predicted Interval Plots (PIPs)

Li L, Evans SR, Uno H, Wei LJ, “Predicted Interval Plots: A Graphical Tool for Data Monitoring in Clinical Trials”, Statistics in Biopharmaceutical Research, 1:4:348-355, 2009.

Evans SR, Li L, Wei LJ, “Data Monitoring in Clinical Trials Using Prediction”, Drug Information Journal, 41:733-742, 2007.

RESPONSE-ADAPTIVE TREATMENT REGIMES

Motivation

• Patient management

– Not a single decision but tailored sequential treatment decisions (adjustments of therapy over time) based on individual patient response (transitions of health states based on efficacy, toxicity, adherence, QOL, etc.)

– Mixture of short-term and long-term outcomes

• Adaptive treatment regime designs

– Compare treatment strategies (of sequential decisions) that are consistent with clinical practice

[Figure, built up over several slides: eligible patients are randomized to Therapy #1 or Therapy #2; the short-term response separates responders from non-responders; non-responders are randomized again between Therapy #1 and Therapy #2; all paths continue through follow-up to the long-term response.]

Example: HIV-Associated PML

• Design compares 4 treatment STRATEGIES

– cART + steroids if IRIS is observed

– cART without steroids

– Enhanced-cART + steroids if IRIS is observed

– Enhanced-cART without steroids

• Step 1

– Randomized to cART or enhanced-cART (cART + enfuvirtide)

– Observe patient response, particularly for IRIS

• Step 2

– If no IRIS then patient continues with therapy

– If IRIS, then randomize to steroids or placebo

[Figure: patients with PML are randomized to cART or cART + ENF; those who develop IRIS are randomized to + steroid or + placebo, while those with no IRIS continue on therapy; all paths continue through follow-up from the short-term response to the long-term response. Coinfection: PML. Short-term outcome: IRIS. Long-term outcome: survival.]

Adaptive Treatment Regimes

• Distinction between the regime (strategy dictating patient treatment) vs. realized experiences

– Data from individual patients can contribute to multiple strategies

– Patients on the same regime can have different treatment experiences

• ITT complexity

– Assigning treatment at later stages for patients LFU in early stages

– Should consent patients to agree to ALL sequential randomizations
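The regime vs. realized-experience distinction can be made concrete with the four strategies from the HIV-associated PML example. The sketch below is illustrative only: the `Patient` record and `consistent_regimes` helper are hypothetical names, and it simply lists which strategies a patient's observed path is compatible with.

```python
from dataclasses import dataclass
from typing import Optional

# Each strategy = (Step 1 therapy, Step 2 assignment if IRIS is observed).
REGIMES = {
    "cART + steroids if IRIS": ("cART", "steroid"),
    "cART without steroids": ("cART", "placebo"),
    "Enhanced-cART + steroids if IRIS": ("cART+ENF", "steroid"),
    "Enhanced-cART without steroids": ("cART+ENF", "placebo"),
}


@dataclass
class Patient:
    step1: str                   # "cART" or "cART+ENF"
    iris: bool                   # short-term outcome
    step2: Optional[str] = None  # "steroid" or "placebo"; only set when IRIS occurred


def consistent_regimes(patient):
    """Names of the strategies that the patient's realized path is compatible with."""
    names = []
    for name, (first, if_iris) in REGIMES.items():
        if patient.step1 != first:
            continue
        # Step 2 only distinguishes strategies for patients who developed IRIS.
        if patient.iris and patient.step2 != if_iris:
            continue
        names.append(name)
    return names


no_iris = Patient(step1="cART", iris=False)
iris_steroid = Patient(step1="cART", iris=True, step2="steroid")
print(consistent_regimes(no_iris))       # compatible with both cART strategies
print(consistent_regimes(iris_steroid))  # compatible with one strategy only
```

The two example patients are both on "cART + steroids if IRIS" yet received different treatments, and the patient without IRIS contributes data to two strategies.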

Summary

• Adaptation is a design feature

– Requires careful and responsible planning

• When used appropriately, adaptive designs can be efficient and informative

• When used inappropriately, adaptive designs can threaten trial integrity

• Be aware of information apparent to observers and consider actions to protect trial integrity

– Minimize access to results to control operational bias

Adaptive Statistician

Many collaborating clinicians ask:

“Can we change statisticians? I’m tired of listening to Evans explain all of the mistakes we are making.”

…as you can see dear colleagues, adaptive design is a very easy concept…

Thank you for listening.

BACK-UP

2 STAGE DESIGNS

2-Stage Designs

• “Internal pilot”: Stage I vs. Stage II: learn vs. confirm

– Hypothesis generation vs. hypothesis testing

• Efficiency advantage

– Single trial addresses objectives traditionally addressed in two trials

– Eliminates down-time between separate trials (but less thinking time)

– IRB advantage (vs. approval of two trials)

• Classify by whether objectives or endpoints change across stages

• Important distinction is whether the final analysis uses data from both stages or only Stage II

Seamless Designs (e.g., Phase II/III)

2-Stage Design: Same Objectives and Endpoints

• Stage I: Evaluate preliminary evidence of effect/no effect

• ACTG 269 (Evans et al., JCO, 2002)

– Phase II single arm trial of oral etoposide for AIDS KS

– Endpoint: tumor response rate (50% decrease in lesion number/size)

– Stage I

• Enroll small number of participants (N=14)

• If response is unacceptably low (0/14), then quit for futility, noting that if true response rate is 20% then <5% chance of observing 0/14

• Otherwise continue to Stage II (not testing for efficacy)

• Expected sample size is minimized when response is low given error constraints

– Trial continued w/ final response rate = 36%
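The Stage I futility rule above rests on a simple binomial calculation, sketched here. The 5% and 10% response rates are added purely for illustration; 20% and 36% come from the slide.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k responses among n participants) for true response rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_stop_stage1(p, n_stage1=14):
    """Chance of observing 0 responses in Stage I, i.e. of quitting for futility."""
    return binom_pmf(0, n_stage1, p)


for rate in (0.05, 0.10, 0.20, 0.36):
    print(f"true response rate {rate:.2f}: P(0/14) = {prob_stop_stage1(rate):.3f}")
```

At a true response rate of 20% the chance of 0/14 is 0.8^14, about 4.4%, which is the "<5%" quoted on the slide.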

Adaptive Randomization

• Randomization schedule cannot be constructed prior to trial initiation

• Treatment allocation depends on:

1. Baseline characteristics, or

2. Responses

• Minimization

– Creates between-treatment-group balance wrt important variables

• “minimizes imbalances”

– Revises the probability of treatment assignment based on baseline characteristics of the participant and participants already randomized

Adaptive Randomization

• Response adaptive randomization

– Bases treatment assignment probabilities on the observed responses of participants that are already enrolled

– Feasible with short-term outcomes (e.g., emergency medicine trials such as stroke, status epilepticus, or traumatic brain injury)

– “Play-the-winner” or “urn design”

• Proportionally more patients are randomized to the more effective intervention

• May be attractive for this reason

• Disadvantages

– Time trends in response create challenges (e.g., learning effects in surgery trials)

– Suggests equipoise does not hold

– May be less efficient than group-sequential designs
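One common urn scheme is the randomized play-the-winner rule, sketched below for two arms. The slide names "play-the-winner" and "urn design" without specifying a variant, so this particular rule, the function name `rpw_trial`, and the success probabilities are assumptions chosen for illustration.

```python
import random


def rpw_trial(p_success, n_patients, init_balls=1, add_balls=1, seed=0):
    """Simulate a two-arm randomized play-the-winner urn.

    p_success: dict of arm -> true (hypothetical) success probability.
    Each patient is assigned by drawing a ball with replacement. A success
    adds balls for the assigned arm; a failure adds balls for the other arm.
    """
    if len(p_success) != 2:
        raise ValueError("this sketch handles exactly two arms")
    rng = random.Random(seed)
    arms = list(p_success)
    urn = {arm: init_balls for arm in arms}
    assigned = {arm: 0 for arm in arms}
    for _ in range(n_patients):
        arm = rng.choices(arms, weights=[urn[a] for a in arms])[0]
        assigned[arm] += 1
        other = arms[1] if arm == arms[0] else arms[0]
        # Short-term outcome is assumed to be known before the next assignment.
        if rng.random() < p_success[arm]:
            urn[arm] += add_balls
        else:
            urn[other] += add_balls
    return assigned, urn


assigned, urn = rpw_trial({"A": 0.8, "B": 0.4}, n_patients=200)
print("assigned:", assigned)
print("urn at end:", urn)
```

The sketch assumes a constant success probability per arm; the time trends listed under disadvantages would violate that assumption and could distort the allocation.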

Adaptive Dose Selection or Duration

• Enroll sequence of cohorts where subsequent cohorts open depending upon outcome from previous cohort

• A5210: AMD11070 (oral CXCR4 entry inhibitor)

– Accrue 6 participants; if <x DLTs then treat next 6 at next higher dose

• 5277: ITX-5061 HCV entry inhibitor for HCV monoinfection

– 3 doses (25/27/150); 3 durations (3/14/28 days)

– Start with highest dose on cohort of 10 (8 active; 2 placebo)

– If anti-viral activity (4/8 show 1 log drop), then reduce dose

– If no activity; then increase duration

Limitations of Many Traditional Methods

• Over-reliance on p-values without careful consideration of effect sizes (clinical relevance) and precision

• Inflexible decision rules based on ONE endpoint

– Desire to base decisions upon totality of evidence (e.g., safety data, secondary endpoints, QOL, external data, etc.)

• No formal evaluation of the ramifications of continuing

Predicted Intervals

• Predict CI at future timepoint (e.g., end of trial or next interim analysis time) conditional upon:

1. Observed data

2. Assumptions regarding future data (e.g., observed trend continues, HA is true, H0 is true, best/worst case scenarios, etc.)

• Use with repeated confidence interval theory to control false positive error
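A rough sketch of the idea, assuming a difference in means between two arms, SDs treated as known, and a normal approximation. The interim numbers are made up for illustration, and the sketch does not apply the repeated confidence interval adjustment mentioned above.

```python
import random
from math import sqrt
from statistics import NormalDist


def interval(mean_diff, sd_trt, sd_ctl, n_trt, n_ctl, level=0.95):
    """Normal-approximation CI for a difference in means (SDs treated as known)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sqrt(sd_trt ** 2 / n_trt + sd_ctl ** 2 / n_ctl)
    return mean_diff - half, mean_diff + half


def predicted_intervals(trt, ctl, n_final, future_means=None,
                        level=0.95, n_sims=500, seed=0):
    """Simulate the CI at full enrollment, conditional on the interim data.

    trt, ctl: interim summaries as (n, mean, sd).
    n_final: planned per-arm sample size (must exceed both interim sizes).
    future_means: (mu_trt, mu_ctl) assumed for participants not yet observed;
        None means the observed trend continues.
    """
    (n_t, m_t, s_t), (n_c, m_c, s_c) = trt, ctl
    if n_final <= max(n_t, n_c):
        raise ValueError("n_final must exceed the interim sample sizes")
    mu_t, mu_c = future_means if future_means is not None else (m_t, m_c)
    rem_t, rem_c = n_final - n_t, n_final - n_c
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_sims):
        # Draw the mean of the not-yet-observed participants in each arm.
        fut_t = rng.gauss(mu_t, s_t / sqrt(rem_t))
        fut_c = rng.gauss(mu_c, s_c / sqrt(rem_c))
        final_t = (n_t * m_t + rem_t * fut_t) / n_final
        final_c = (n_c * m_c + rem_c * fut_c) / n_final
        intervals.append(interval(final_t - final_c, s_t, s_c, n_final, n_final, level))
    return sorted(intervals)


# Hypothetical interim summaries: (n, mean change, SD).
trt, ctl = (34, -0.25, 0.35), (31, -0.23, 0.33)
trend = predicted_intervals(trt, ctl, n_final=78)
alt = predicted_intervals(trt, ctl, n_final=78, future_means=(-0.40, -0.20))
for label, pis in (("current trend", trend), ("assumed alternative", alt)):
    share = sum(hi < 0 for lo, hi in pis) / len(pis)
    print(f"{label}: share of predicted intervals excluding 0 = {share:.2f}")
```

Plotting the sorted intervals for each assumption gives the kind of display a predicted interval plot provides; the share of intervals excluding zero plays a role similar to conditional power.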

NARC 009 (Evans et al., PLoS ONE, 2007)

• Randomized, double-blind, placebo-controlled, multicenter, dose-ranging study of prosaptide (PRO) for the treatment of HIV-associated neuropathic pain

• Participants were randomized to 2, 4, 8, 16 mg/d PRO or placebo administered via subcutaneous injection

• Primary endpoint:

– 6 week change from baseline in weekly average of random daily Gracely pain scale prompts using an electronic diary

• Designed N= 390 equally allocated between groups

– Interim analysis conducted after 167 participants completed the 6-week double-blind treatment period

Interim Analysis Results: NARC 009

Treatment   N    95% CI for Mean Change   95% CI for Diff [1]   95% PI for Diff [2]   95% PI for Diff [3]   Required Diff [4]

Placebo     31   (-0.35, -0.11)

2 mg        34   (-0.21, -0.04)           (-0.04, 0.25)         (-0.01, 0.21)         (-0.16, 0.06)         -0.54

4 mg        34   (-0.38, -0.12)           (-0.19, 0.16)         (-0.14, 0.10)         (-0.23, 0.01)         -0.45

8 mg        32   (-0.18, -0.02)           (-0.01, 0.28)         (0.03, 0.23)          (-0.15, 0.05)         -0.56

16 mg       36   (-0.34, -0.09)           (-0.16, 0.19)         (-0.11, 0.14)         (-0.21, 0.04)         -0.54

[1] 95% CI for the difference in mean changes vs. placebo

[2] 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming current trend

[3] 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming per protocol, μ placebo = -0.17 and μ drug = -0.34

[4] Difference in mean changes needed in the remaining participants for the CI for the difference in mean changes to exclude zero (in favor of active treatment) at the end of the trial

Predicted Intervals and PIPs

• Intuitive

• Advantages

– Flexible decision making

• Considering all data (all endpoints, external data, etc.)

– Effect sizes and associated precision

• Clinical relevance and statistical significance

– Evaluation of trial with continuation

• PI width provides information about gain in precision

• Conditional power

– Can be used for all types of endpoints (e.g., binary, continuous, event-time) and hypotheses (e.g., superiority or noninferiority)
