
DRAFT

The Future of Drug Development: Clinical Trial Design

White Paper

September 17, 2007

CBI 3 Cambridge Center, NE20-382, Cambridge, MA 02142 617-253-0257 [email protected] http://web.mit.edu/cbi/index.html


Acknowledgement

MIT’s Center for Biomedical Innovation* greatly appreciates the input of Frank Douglas, PhD, MD, into the conceptualization of this document at CBI’s Streamlining Clinical Trial Operations workshop in March 2007. We also thank the following collaborators for their generous contribution of time, effort, and expertise in building on the initial concept to produce this draft white paper:

Michael Branson, PhD (Novartis International AG)
Pravin Chaturvedi, PhD (IndUS Pharmaceuticals)
Ene Ette, PhD (Anoixis Corp.)
Paul Gallo, PhD (Novartis International AG)
Howard Golub, MD, PhD (Harvard-MIT Division of Health Sciences & Technology and BattelleCRO)
Susan Levinson, PhD (The Strategic Choice LLC)
Cyrus Mehta, PhD (Cytel Software Corporation)
John Orloff, MD (Novartis International AG)
Nitin Patel, PhD (Cytel Software Corporation)
Jose Pinheiro, PhD (Novartis International AG)
Sameer Sabir (MIT MBA & SM - 2009)
Donald Stanski, MD (Novartis International AG)

Please note that the views expressed in this document are those of the authors and not necessarily their affiliated organizations.

*The CBI Leadership Team: Gigi Hirsch, MD – Interim Executive Director, CBI; Ernst Berndt, PhD – MIT Sloan School of Management & CBI Co-Director; Anthony Sinskey, ScD – MIT School of Biology, Harvard-MIT Division of Health Sciences and Technology, & CBI Co-Director; Steve Tannenbaum, PhD – MIT School of Engineering & CBI Co-Director


A Whitepaper from the Center for Biomedical Innovation

Objective: This whitepaper seeks to advance the effective implementation of tools and models which improve clinical development through application of a flexible paradigm for optimizing clinical trial design. This goal will be achieved by:

• Developing a common language and meaningful educational tools to facilitate understanding and communication.

• Sharing examples of the application of modern tools to specific cases, to improve statistical power without increasing sample size, shorten the time to results, and minimize the risk of failure.

• Collaborating with the FDA to advance the use of clinical trial designs which improve the clinical probability of success and ensure the appropriate rigor of analysis.

• Providing training materials to ensure appropriate understanding and application of modern clinical trial design methods, so that trialists can determine where the tools and methods should be applied.

The Current Situation

Although investments in R&D have risen steadily and the process has been continuously improved by key stakeholders, the risk of failure at each stage of the R&D cycle remains unacceptably high in the view of both industry participants and the beneficiaries of their products, including patients and payors. The resultant loss in productivity, depicted in Figure 1, is a serious concern for all stakeholders.

The costs of NIH biomedical research and pharmaceutical industry R&D have increased more than 200% in the last 10 years (Budget US Govt., App., FY 1993-2003; Parexel Pharm. R&D Stat. Sourcebook, 2002/2003); the investment required to achieve one successful drug launch is estimated to have increased by 55% from 1995-2000 to 2000-2002 (Singh et al. In Vivo: The Business & Medicine Report, 17:10, p. 73, November 2003); and the Tufts Center estimates the cost of a new drug at $897 million (DiMasi et al. The Journal of Health Economics, 22 (2003) 151-185). Moreover, in spite of increased investments, the probability of success of a therapeutic agent entering Phase I of clinical development has decreased from 14% to 8%.

With estimates of drug development costs from various sources ranging from $0.8B to $1.7B, tools, methods and models which decrease development time, increase the probability of success, and speed the delivery of products to patients can have a significant impact on overall healthcare spending, which is now approaching $2 trillion in the US alone. The US government has recognized the criticality of this problem in the initiatives formulated by the FDA Critical Path and the NIH Roadmap. These initiatives highlight the challenges of developing new therapeutic agents in a scientific and clinical environment whose complexity is growing exponentially due to advances in the understanding of both disease and safety issues, and they challenge the stakeholder community to provide solutions.
This whitepaper addresses needs within the clinical trial design focus area.


Figure 1. Loss in productivity, measured as NMEs launched per dollar spent in each 5-year period (B. Booth & R. Zemmel, Nature Rev Drug Disc, 2004).

A recent report evaluates the costs associated with different phases of drug discovery and drug development. Close examination of the costs associated with drug development activities shows that clinical-phase activities represent more than 65% of the total expenses per launched new product. As indicated in Figure 2, prior to 2000 the expenses associated with drug discovery operations (including failures and discontinued programs) approximately equaled those associated with drug development activities, defined as the activities supporting the Investigational New Drug (IND) application through those supporting market launch. Since the completion of the human genome project, there has been an explosion of research activity in the pharmaceutical and biotech industry to explore the opportunities associated with novel and unprecedented biological targets to design, discover and develop new therapeutic entities for the treatment of various diseases. Between 2000 and 2002, the increased cost of development activities is particularly evident in the costs associated with Phase 2 and Phase 3 clinical activities, which increased to 44% of the total, compared to 32% in the prior period (Figure 2).


Figure 2. Costs of pharmaceutical R&D allocated to each phase of the process.

There are a number of reasons for the increased costs of development and the increased attrition rates that have led to decreased productivity:

• Aggressive targeting of new mechanisms of action leads to programs that do not have precedents or established proof of concept

• Bayesian approaches are rarely used to guide decision making

• When modeling & simulation are used, they are often applied retrospectively and/or inconsistently

• Pooling of data from different sources is hampered by a lack of common data standards

• Pivotal studies for registration are often designed without fully utilizing prior knowledge which might improve design, reduce costs, and increase trial efficiency

• Large, costly outcome studies are increasingly conducted for multiple stakeholders (commercial and regulatory)

• Trial results are usually analyzed individually without factoring in prior knowledge
In the discovery phase, there has been a considerable effort by the pharmaceutical industry to establish product selection criteria to advance the most likely candidates into development, to improve the efficiency of spending in development. As the exploration of novel unprecedented targets has involved a greater amount of biological, chemical and pharmacological screening activities, the cost to establish such assays has risen and the physiological relevance of the assays is often not validated. These approaches allow the screening of large chemical libraries and generate higher numbers of ‘hits’ and lead candidates, although the hits and leads selected for optimization have a higher attrition rate during preclinical and early development studies.

Following the selection of a clinical candidate, the highest attrition rate occurs during phase 2 studies with the new therapeutic entity. There are several reasons for the lower success rates, including lack of proof of the relevance of the biological target as a disease-intervention or disease-modifying target, and lack of understanding of the dose-response relationship of the new molecular entity (NME). Since the primary objectives of phase 2 studies are to demonstrate clinical proof of activity in relevant patient populations, together with establishment of a dose-response relationship and/or selection of an active dose with adequate tolerability, phase 2 studies represent a critical juncture in clinical drug development programs. When possible, most clinical development programs will use an exploratory phase 2A study to establish clinical proof of activity and a more robust, clinically-relevant phase 2B study (or studies) to establish dose-response relationships, preferably with clinically relevant or clinical endpoints.

Such studies are expensive, and often there are not enough dose groups selected to obtain clear dose-response relationships or a clear demonstration of a single active dose. The use of modern flexible tools can help clinical development programs assess multiple dose groups in a flexible design format and ensure judicious use of limited patient resources. These tools also provide the patient benefit of limiting exposure to ineffective or poorly tolerated dose(s).
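To illustrate how a model-based analysis can interpolate between a limited number of dose groups, the sketch below fits a simple Emax dose-response model to hypothetical data. The Emax model form is standard in dose-response work, but every dose level, response value, and the grid-search fitting shortcut here are illustrative assumptions, not taken from any study described in this paper.

```python
# Minimal sketch of an Emax dose-response model, E(d) = E0 + Emax*d/(ED50 + d).
# All numbers are hypothetical; a real analysis would use a nonlinear
# least-squares routine and account for between-patient variability.

def emax_response(dose, e0, emax, ed50):
    """Predicted response at a given dose under the Emax model."""
    return e0 + emax * dose / (ed50 + dose)

def fit_ed50_by_grid(doses, responses, e0, emax, ed50_grid):
    """Pick the ED50 on a grid that minimizes squared prediction error."""
    def sse(ed50):
        return sum((r - emax_response(d, e0, emax, ed50)) ** 2
                   for d, r in zip(doses, responses))
    return min(ed50_grid, key=sse)

# Hypothetical responses generated from a true ED50 of 20 mg.
doses = [0, 5, 10, 20, 40, 80]
responses = [emax_response(d, e0=0.0, emax=30.0, ed50=20.0) for d in doses]
ed50_hat = fit_ed50_by_grid(doses, responses, e0=0.0, emax=30.0,
                            ed50_grid=[5, 10, 15, 20, 25, 30])
print(ed50_hat)   # prints 20: recovers the true value on noise-free data
```

With even a sparse set of dose groups, the fitted curve provides response estimates at doses that were never studied, which is part of how flexible designs economize on patients.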

Looking to the Future

Traditional views of the development process encompass a linear set of steps designated Pre-clinical, Phase I, Phase II, and Phase III, followed by submission to regulatory authorities for approval (Figure 3). The traditional R&D approach separates discovery and development into distinct phases. The process is multi-phasic and sequential, and is driven by discrete milestones separated by “white space”. This rather inefficient process is not required by the federal code of regulations; rather, it is a process codified by decades of tradition and experience, primarily with agents to treat broad patient populations. Increasingly associated with long development times and high costs, the traditional approach also does not consider the needs of development programs for narrowly defined patient populations in a more stratified or personalized medical paradigm.

The current empiric approach to drug development, based on a “blockbuster” model, has resulted in products targeted to large segments of the patient population with little regard to factors that affect efficacy or safety in individual patients. With the sequencing of the human genome and the advent of new technologies comes the opportunity to move away from a clinical definition of disease to one based on a better understanding of its molecular basis. This should lead to improved tests to diagnose and select patients for specific therapies, dramatically improving responder rates and the benefit/risk balance in individual patients. Therefore, declining R&D productivity and the need for more personalized therapeutic approaches mandate fundamental change to reverse this worrisome trend and address future needs.


[Figure 3 diagram: “Pharmaceutical R&D – The Old Paradigm.” A multi-phase, sequential, milestone-led progression from Research (toxicology, initial safety, pharmacokinetics) through Phases I, II and III (safety, efficacy) to Approval, Market, and Phase IV.]

Figure 3. The Traditional View of Pharmaceutical R&D

In the increasingly complex environment of drug development, a more integrated view of this process is now emerging which seeks to increase flexibility and maximize the use of accumulated knowledge to improve the acquisition of further knowledge about drug candidates (Figure 4). This model is particularly important for the development of new classes of therapeutics for which there are no previous precedents to guide design. In this model of the future, the broader, more flexible phases leading up to submission for approval are designated Exploratory and Confirmatory.

[Figure 4 diagram: “Pharmaceutical R&D – The New Paradigm.” An adaptive, parallel, data-led process: an Exploratory Phase spanning target discovery and validation through proof-of-concept (PoC) clinical trials, followed by a Confirmatory Phase of clinical development (safety, efficacy) leading to Approval and Market.]

Figure 4. A New Paradigm for Pharmaceutical R&D

The Exploratory phase of development seeks to apply all available knowledge and tools, including biomarkers, modeling and simulation, and advanced statistical methodology and designs, to the goals of determining proof of concept and dose selection with a level of rigor that will enhance the likelihood of success in the Confirmatory phase. During the Exploratory phase, integration of discovery and early development functions will break down traditional barriers associated with the proverbial hand-off of compounds from research to development. Early entry into man through new tools, such as the Exploratory IND or Exploratory CTA, will accelerate the generation of early human data and facilitate the selection of compounds with the most desirable PK/PD properties for further clinical development. PoC will consist of a package of studies that establish proof-of-therapeutic-activity, which will be heavily dependent on the availability of efficacy biomarkers. In the new development model, decision making will be enhanced by modeling and simulation, making the most efficient use of all available data, simulating the results of clinical trials in advance, and guiding development strategy.

During the Confirmatory phase, modern designs and tools and knowledge are applied to larger scale studies with the goal of identifying the target patient population in which the drug is efficacious, as well as establishing the benefit to risk ratio and confirming the optimal dose and dosing regimen. Innovative clinical trial designs such as adaptive/seamless studies will compress timelines, improve dose/regimen selection, and reduce the number of patients assigned to non-viable dosing regimens. The patient population will be targeted to those who are most likely to respond, based on a biomarker, and those with the most favorable benefit/risk ratio. Real-time data sharing with Health Authorities, facilitated by the necessary IT infrastructure, will greatly increase transparency between industry and regulators, and will facilitate faster regulatory decision making.

The key element of this new view of the development process is the ability to apply all available knowledge, where appropriate, across the breadth of studies to improve the quality, timeliness and efficiency of the effort. This broader, more flexible view contrasts with the sequential approach in that it reduces both the compartmentalization of data across functional silos and the rigid view of each phase which might preclude more effective trial designs. Flexible clinical trial designs prospectively utilize a set of complex sequential interim analyses that dynamically modify the course of a trial (e.g., adjusting sample size) without negatively impacting the statistical integrity of the trial. This is accomplished by using a fraction of the current study’s data to adjust, in a statistically appropriate manner, design criteria going forward that were based on less reliable initial assumptions. These new approaches establish the need to share all known information more effectively, including the creation of common data standards, pooling of sets of data not previously considered, application of Bayesian approaches to decision-making, and prospective use of modeling and simulation to enhance design and clinical strategy.

Varying Stakeholder Views

It is key to understand the viewpoints of varied stakeholders in order to facilitate change. This paper seeks to bridge the differences amongst stakeholders with varying business needs, disciplinary backgrounds and functional roles which impact drug discovery and development. These points of view include those of the regulatory authorities, the trialist, the statistician, and R&D management in both large and small biopharmaceutical companies, as well as academia/NIH, patient advisory groups, and clinical research organizations. Each of these stakeholders is a key decision-maker and/or influencer whose understanding of these tools is essential to the adoption of modern flexible trial design. However, the adoption of the tools may differ across the groups, as each has a different perspective based on its role in the process.

Within the large pharmaceutical corporation, modern designs have been applied mostly as a tool in phase 2 clinical programs to evaluate multiple dose levels with a reduced number of patients. This can play a critical role in allowing a higher probability of success in phase 2 programs and in ensuring that the very expensive phase 3 trials are run with the optimal (rather than an empirical) dose. Modeling and simulation are also employed by the large pharmaceutical corporation; however, these tools are more frequently employed retrospectively and could provide more benefit with consistent use and experience. Moreover, although large organizations have the advantage of greater resources to collect information on biological targets, biomarkers and pre-clinical assessments, the full potential of this knowledge is often lost to the compartmentalization of data caused by a lack of common data standards, as well as by functional silos. Thus, the tools available are not fully utilized.

For a smaller pharmaceutical or biotech entity, modern designs have the potential to offer a more affordable clinical development program – particularly when the therapeutic focus is for an orphan or niche disease indication – where there are fewer patients available for clinical trial enrollment. Furthermore, novel designs can be discussed with the regulatory agencies in a combined phase 2/3 format to allow the design to be considered as a pivotal study for registration. This opportunity thus represents a significant advantage to the smaller company and allows it to efficiently develop drugs for therapeutic areas of high unmet clinical need, but with reduced resources – both patients and costs. While this benefit applies to all sponsors, resource and time constraints create unique issues for the small company, which must use its limited resources judiciously to achieve its program goals, as the biotech industry usually has to achieve key milestones for its immediate next financing in the time frame available to it (usually around 12 months).

Since modern flexible designs require significant statistical rigor and complete buy-in from the regulatory agency, the planning and protocol development phase takes longer. Furthermore, regulatory agencies and Institutional Review Boards need to accept the novel design formats, in particular those with interim analyses. Sometimes these negotiations and assessments can take a very long time, and the company may choose to go the traditional phase 1, phase 2A, phase 2B and phase 3 route instead. For example, there is a prolonged, albeit necessary, discussion with biostatisticians within the company and at the FDA to agree on the statistical analysis plan. While one may save on the costs of patient resources, a prolonged negotiation amongst statisticians and clinicians from the sponsor and regulatory agency can lead to erosion of capital and time, negating the value of the modern clinical trial design. Thus, it may not really reduce clinical development costs, despite the use of fewer patients; in other words, the traditional route may end up being the shorter route for the program. Facilitating this negotiation process promises to provide solutions to this issue as well.


It would be useful to establish rules and guidelines to allow any sponsor of clinical trials to readily evaluate its ability to use modern clinical trial designs in its NME development programs. This should not be a “one size fits all” solution, but a guide that considers the underlying knowledge of the disease and therapeutic candidate, as well as the goals of the program and the realities of patient access and standard of care. This set of guiding principles takes on even greater importance as the development of new therapeutic agents addressing more targeted patient populations draws on the increasing knowledge of genomics, biomarkers, and enhanced disease understanding to achieve the goals of stratified/personalized medicine (Trusheim MR, Berndt ER, Douglas FL. Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev Drug Discov. 6, 287-293, April 2007).

For the contract research organization (CRO), the benefits of modern designs are aligned with those of its customers, the trial sponsors. However, for the CRO, logistical challenges in the operational implementation of these designs must be addressed. Proactive involvement in the dialogue to address operational challenges can be facilitated through improved understanding of the tools and a common understanding of the designs and their benefits.

The Set of Solutions

Identifying the optimal development strategy and clinical trial design is a complex interdisciplinary challenge, requiring evaluation of the known information, as well as the available methodology and tools to be applied. This paper will address a set of guidelines for the trialist to ensure that the best solutions are applied to the appropriate challenge and to facilitate collaboration of key stakeholders in the development of clinical strategy, including trialists, statisticians and regulatory authorities. These solutions are applied to achieve the following common goals:

• Apply modern trial designs and statistical approaches to increase the probability of success and reduce late-stage attrition

• Use all available data (internal and external) to improve predictive capabilities and to optimize decision making

• Use modern techniques and prior knowledge to optimize design of late development programs (clinical scenario planning), dose selection, and target patient population

Modeling and Simulation techniques provide a framework for achieving these goals and are a cornerstone of the new drug development model. Use of these techniques has been fueled by the accumulation of evidence of the utility of quantitative scientific evidence to underpin decision making at key points along the drug development time axis, as well as by strong support from within regulatory agencies. In the Exploratory phase, modeling and simulation can help refine dose selection and study design. In the Confirmatory phase, simulation helps to understand how different study designs can impact the outcome and likelihood of success, thereby guiding development strategy. Modeling and simulation is facilitated by pooling many sources of data, both from prior studies of the drug and from external data which may be informative, to better guide decision-making. These techniques can be used not just during the trial design process, but mid-study in support of modern flexible trial designs. The methods can be applied to:

Biological Models, where mathematical modeling is used to understand genetic, biochemical and physiological networks, pathways and processes underlying disease, and pharmacotherapy.

Pharmacological Models, including deterministic and stochastic PK/PD (pharmacokinetic/pharmacodynamic) modeling to guide clinical trial design, dose selection, and development strategies.

Statistical Clinical Trial Models, which utilize probability and data analysis for inference and decision making under uncertainty and variability using innovative frequentist as well as Bayesian methodologies.
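As a minimal illustration of the third category, a statistical clinical trial model can be as simple as a Monte Carlo simulation of a two-arm study, run many times under assumed inputs to estimate the probability of a successful outcome before any patient is enrolled. All design inputs below (effect size, variability, sample size) are hypothetical, and a known-variance z-test is used purely to keep the sketch short.

```python
import random, math

def simulate_power(n_per_arm, effect, sd, n_sims=2000, seed=1):
    """Estimate the power of a two-arm trial by simulating it many times.
    All design inputs (effect size, SD, sample size) are hypothetical."""
    rng = random.Random(seed)
    z_crit = 1.96                             # two-sided alpha = 0.05
    hits = 0
    for _ in range(n_sims):
        treat = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        ctrl = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        diff = sum(treat) / n_per_arm - sum(ctrl) / n_per_arm
        se = sd * math.sqrt(2.0 / n_per_arm)  # known-variance z-test for simplicity
        if abs(diff / se) > z_crit:
            hits += 1
    return hits / n_sims

power = simulate_power(n_per_arm=64, effect=0.5, sd=1.0)
print(round(power, 2))   # roughly 0.8 under these assumptions
```

Re-running such a simulation across candidate designs is exactly the kind of “simulating results of clinical trials in advance” described above.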

Some possibilities for enhancing the traditional approach include modeling safety and efficacy responses on a continuous scale, and making use of longitudinal measurements when available. In addition, modern trial designs can be enhanced through Bayesian approaches by using placebo and/or baseline external data, combined with modeling, to improve estimation of the percentage of patients with clinically relevant safety events, or with positive response to treatment, and to use information from pre-clinical studies, or previous trials.

Bayesian methodology relies on the use of probability models to describe the knowledge about parameters of interest (e.g., treatment effect for a drug in development). Bayesian inference uses principles from the scientific method to combine prior beliefs with observed data, producing enhanced, updated information. It is particularly well suited for sequential experimentation, as typically occurs in clinical drug development, when information from previous experiments (e.g., phases in development) provides prior knowledge for future experiments, the data from which is then used to update the current knowledge. Initial beliefs about the parameters are summarized in their prior distribution. Then, new data values are collected experimentally (e.g., patient survival in an oncology trial), and the probability distribution of these values leads to the likelihood function (the observed evidence on the parameters). The two elements are then combined, using Bayes’ theorem, to produce the posterior distribution of the parameters, that is, the updated knowledge given the observed evidence. In contrast, the Frequentist method relies solely on the observed evidence for its inferences, typically not formally taking into account prior information.
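The prior-to-posterior flow described above can be made concrete with the simplest conjugate case: a beta prior on a response rate combined with binomial trial data. The prior parameters and trial counts below are hypothetical.

```python
# Conjugate beta-binomial illustration of the Bayesian update described above.
# A Beta(a, b) prior on a response rate, combined with x responders observed
# in n patients, yields the posterior Beta(a + x, b + n - x).
# All counts here are hypothetical.

def beta_binomial_update(a, b, x, n):
    """Return the posterior (a', b') after observing x responses in n trials."""
    return a + x, b + (n - x)

def beta_mean(a, b):
    return a / (a + b)

# Hypothetical prior belief from an earlier study: roughly a 20% response rate.
a0, b0 = 2, 8
# Hypothetical new trial data: 9 responders out of 20 patients.
a1, b1 = beta_binomial_update(a0, b0, x=9, n=20)
print(beta_mean(a0, b0))              # prior mean: 0.2
print(round(beta_mean(a1, b1), 3))    # posterior mean 0.367, pulled toward the data
```

The posterior mean lies between the prior mean (0.2) and the observed rate (0.45), weighted by their relative amounts of information, which is the “updated knowledge given the observed evidence” described in the text.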

Adaptive design utilizes interim data from the trial itself to modify and improve the study design, without undermining its validity or integrity. In the Exploratory setting, an adaptive trial can assign a larger proportion of the enrolled subjects to the treatment arms that are performing well, drop arms that are performing poorly, and investigate a wider range of doses so as to more effectively select doses that are most likely to succeed in the Confirmatory phase. In the Confirmatory phase, adaptive design can facilitate early identification of efficacious treatments, determine whether the trial should be terminated for futility, and make sample size adjustments at interim looks so as to ensure that the trial is adequately powered. In some cases it might even be possible to enrich the patient population by altering the eligibility criteria at an interim look. Thus, adaptive trials have the potential to translate into more ethical treatment of patients within trials, more efficient drug development, and better focusing of available resources.
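The arm-dropping behavior described above can be sketched as a simple interim decision rule. The rule below (comparing interim response rates against control with a fixed margin) and all interim counts are hypothetical stand-ins for the prespecified statistical criteria, such as posterior or conditional probabilities of success, that a real adaptive trial would use.

```python
# Sketch of one adaptive-design ingredient described above: dropping arms
# that perform poorly at an interim analysis. The drop rule and the interim
# counts are hypothetical illustrations only.

def interim_arm_review(interim, control_arm="placebo", margin=0.05):
    """Keep arms whose interim response rate beats control by at least `margin`."""
    ctrl_resp, ctrl_n = interim[control_arm]
    ctrl_rate = ctrl_resp / ctrl_n
    kept = [control_arm]
    for arm, (resp, n) in interim.items():
        if arm != control_arm and resp / n >= ctrl_rate + margin:
            kept.append(arm)
    return kept

interim = {                    # responders / enrolled at the interim look
    "placebo":  (4, 20),
    "low dose": (4, 20),       # 0.20 vs 0.20 control: below the margin, dropped
    "mid dose": (8, 20),
    "high dose": (11, 20),
}
print(interim_arm_review(interim))   # prints ['placebo', 'mid dose', 'high dose']
```

Subsequent enrollment would then be concentrated on the retained arms, which is how adaptive allocation limits patient exposure to non-performing doses.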

Seamless design combines into a single trial objectives traditionally addressed in separate trials, for example a seamless adaptive phase II/III trial. The advantages of seamless design include: a shorter clinical program, through elimination of the time period that traditionally occurs between phases; potentially greater efficiency from the use of data from both stages, possibly translating into fewer patients needed to obtain the same quality of information; and earlier availability of long-term safety data, through continued follow-up of patients from the first stage. Other types of adaptation that may be considered during the Confirmatory phase of development include Phase III group sequential trials and Phase III trials with sample size re-estimation.

Sample size re-estimation methods provide the flexibility to either increase or decrease the sample size at an interim point in the trial. This is important where there is uncertainty about the between-subject variance in the response, or about the clinically meaningful effect size at which to power the trial. These methods allow the study to begin with a certain sample size, which can then be increased or decreased at an interim point, or even allow early stopping at an efficacy boundary.
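One common ingredient of such methods is conditional power: the probability, given the interim data, that the trial will end in statistical success. A minimal sketch, assuming a Brownian-motion model for the test statistic and that the interim trend continues, is shown below; the interim z-value, information fraction, and the “promising zone” range mentioned in the comments are all hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_power(z_interim, t, z_alpha=1.96):
    """Conditional power at information fraction t, assuming the interim
    trend continues (Brownian-motion model for the score statistic):
    CP = 1 - Phi((z_alpha - z_t/sqrt(t)) / sqrt(1 - t))."""
    mean_final = z_interim / math.sqrt(t)   # expected final z under the trend
    sd_final = math.sqrt(1.0 - t)
    return 1.0 - norm_cdf((z_alpha - mean_final) / sd_final)

# Hypothetical interim look halfway through the trial (t = 0.5) with z = 1.2.
cp = conditional_power(z_interim=1.2, t=0.5)
print(round(cp, 3))   # roughly 0.35 for these hypothetical inputs
# A prespecified rule might increase the sample size only when conditional
# power lands in a "promising" range, e.g. 0.30-0.80 (an illustrative choice).
```

Tying the sample size increase to a prespecified conditional-power rule is one way such adjustments avoid "negatively impacting the statistical integrity of the trial," as discussed earlier.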

The application of a set of solutions to the Exploratory and Confirmatory phases of clinical development can be best illustrated through case studies. The following sections highlight a number of novel approaches that leverage these new techniques and methodologies, with the objective of improving decision making so as to avoid late-stage attrition.

Exploratory Phase of Development

Combining Bayesian Methods with Modeling and Simulation

Better decision making in early development is of critical importance to improve the flagging efficiency of drug development currently facing the pharmaceutical industry. Because these studies are conducted under fairly restricted resources (duration, sample sizes, etc.), the efficient use of all available information is of fundamental importance for decision making. Modeling and simulation approaches can, and should, play an important role in this regard. They can be employed to represent the dose- and time-response behavior of safety and efficacy endpoints, and can be combined with Bayesian methods to provide a continuous flow of information across different phases of development. For example, pre-clinical data can be used to construct models and to provide prior information on model parameters. Likewise, the results from a proof of concept (PoC) study can be used to form prior distributions for a similar model to be used in a subsequent dose-finding study.

An additional benefit of modeling in early development is that it allows the use of external information (e.g., baseline values for safety endpoints) to estimate characteristics of interest about the population. Given the vast quantity of data from other development programs available in most pharmaceutical companies, as well as current discussions within the industry about sharing placebo data across companies, this has enormous potential for improving the efficiency of investigation in early development.

Bayesian Modeling Combined with Use of External Baseline Data to Improve Efficacy and Safety Signal Detection in Early Development

Early development studies for establishing PoC generally use small patient cohorts (typically 10 to 20 patients), observed for a relatively short period of time (several weeks), to evaluate early efficacy and safety signals. Safety and efficacy variables are often measured on a continuous scale and observed several times over the duration of the study. However, the endpoints for the Go/No-Go decision are typically based on a single time point (e.g., change from baseline at end of study) and use dichotomized versions of the original variables to characterize responder/non-responder behavior. An example of the latter is the transformation of continuous liver function test measurements (e.g., ALT and AST) into binary indicators (e.g., exceeding 3× the upper limit of normal (ULN)). There are, therefore, two types of information loss often present in PoC studies: (i) the dichotomization of continuous endpoints, and (ii) the failure to use all of the available longitudinal measurements collected in the study.
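The cost of the first kind of information loss can be seen in a small simulation: analyzing a continuous endpoint directly detects a treatment signal more often than analyzing its dichotomized responder version of the same data. The sketch below uses a one-arm setting, simple one-sided z-tests, and hypothetical parameters; it illustrates the principle only, not any study design discussed here.

```python
import random, math

def detection_rates(n, effect, threshold, n_sims=2000, seed=7):
    """Compare signal detection on a continuous endpoint vs. its
    dichotomized ('responder') version in a one-arm setting.
    Returns (continuous_rate, dichotomized_rate). Toy assumptions only."""
    rng = random.Random(seed)
    cont_hits = dich_hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(effect, 1.0) for _ in range(n)]
        mean = sum(data) / n
        # Continuous analysis: one-sided z-test of mean > 0 (known SD = 1).
        if mean * math.sqrt(n) > 1.645:
            cont_hits += 1
        # Dichotomized analysis: test responder proportion > 50%.
        p = sum(x > threshold for x in data) / n
        if (p - 0.5) / math.sqrt(0.25 / n) > 1.645:
            dich_hits += 1
    return cont_hits / n_sims, dich_hits / n_sims

cont, dich = detection_rates(n=15, effect=0.6, threshold=0.0)
print(cont, dich)   # the continuous analysis detects the signal more often
```

The gap between the two rates is the statistical power thrown away by reducing each patient's continuous measurement to a yes/no indicator.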

The usual design used for efficacy and safety evaluation in PoC studies is characterized by the use of cohorts in a dose-escalation algorithm. Cohorts are assigned, in sequence, to increasing doses until the maximum tolerated dose (MTD), generally determined in a previous study, is reached, or unacceptable safety is observed for a given cohort. A new cohort is only allowed to start once acceptable safety signals are verified for all previous doses. At the end of the study, one hopes to either determine a dose range for further exploration in Phase IIb, or to conclude that no PoC can be established based on the efficacy/safety trade-off.
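The escalation logic just described reduces to a simple loop: dose the cohorts in sequence, and stop at the MTD or at the first cohort with unacceptable safety. In the sketch below the safety review is stubbed out with a hypothetical rule; in a real trial it would be a formal assessment of each cohort's accumulated data, and the dose ladder itself is illustrative.

```python
# Sketch of the cohort dose-escalation algorithm described above. The safety
# review is a stand-in callable; doses and outcomes are hypothetical.

def escalate(doses, mtd, safety_ok):
    """Walk up the dose ladder, stopping at the MTD or at the first
    cohort with unacceptable safety. Returns the doses actually studied."""
    studied = []
    for dose in doses:
        if dose > mtd:
            break                      # never exceed the previously established MTD
        studied.append(dose)           # this cohort is dosed and observed
        if not safety_ok(dose):
            break                      # unacceptable safety halts further escalation
    return studied

doses = [1, 2, 5, 10, 20, 40]          # mg; hypothetical dose ladder
# Hypothetical cohort outcomes: safety acceptable up to 10 mg.
print(escalate(doses, mtd=40, safety_ok=lambda d: d <= 10))
# prints [1, 2, 5, 10, 20]: the 20 mg cohort is dosed, shows a safety
# signal, and escalation stops before 40 mg.
```

Note that the cohort that triggers the stop is still part of the study data; the algorithm only prevents subsequent cohorts from being exposed to higher doses.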

Because of the small cohort sizes, only safety problems occurring in a relatively large percentage of patients can be reliably detected via the dose-escalation procedure. Likewise, only relatively strong efficacy signals can be detected with reasonable statistical power using traditional pair-wise hypothesis tests. Safety and efficacy signal detection can be made more efficient in a variety of ways: by using data (or, more generally, information) external to the trial, and by using longitudinal modeling approaches that make use of all available measurements. Furthermore, the utility of PoC studies within the drug development program can be enhanced by feeding the information obtained in them directly into later-phase trials. Bayesian modeling techniques are particularly useful in implementing these different approaches.

In this section we use a real PoC study in dyslipidemia to illustrate the methods mentioned above. The discussion focuses on the key ideas, rather than the mathematical formalization of the various methods. Graphical displays are used to illustrate the methodology.


Case study: dyslipidemia PoC study

The dose-escalation PoC trial was the first multiple-dose study conducted for the compound in dyslipidemic patients, following a single-dose study in healthy volunteers in which the maximum safe dose (MSD) was established. The 4-week study included, in the initial planning phase, 6 ascending doses of the compound, denoted here d1, d2, …, d6, plus a placebo and an active control arm. The highest dose was later dropped from the design. The primary efficacy endpoint was the percent change from baseline in non-HDL cholesterol, with a requirement of non-inferiority with respect to the active control (not more than 10% less effective than the active control). Superiority to placebo was included as a secondary efficacy criterion. Other lipid variables were measured and used as part of the Go/No Go decision process.

Safety concerns concentrated on potential liver toxicity, with ALT elevation above 3 X ULN being the primary safety event of interest. Evidence of an incidence of 2% or higher of patients experiencing ALT elevations above 3 X ULN in the population would mean a No-Go decision for the corresponding dose.

The sample size that would be required to ensure reasonable statistical power (at least 80%) to properly identify non-inferiority in efficacy and safety problems for a given dose was considerably beyond the available resources for the PoC study. A combination of mixed-effects modeling and Bayesian methods was used to allow more efficient use of the available information. In addition, for the safety analysis, external baseline data from previous studies in similar patient populations was used to improve the performance of the method in detecting small safety signals. These approaches are described in more detail in the next two sections.

A. Detecting safety signals

Pre-clinical data from toxicity studies in dogs was used to formulate an empirical dose-time response model for ALT variation. Because the longitudinal measurements are made on the same subject, the model needs to take into account the correlation among the observations. This is typically done by assigning subject-specific parameters in the model, leading to a so-called mixed-effects model. In addition, the model also needed to characterize the dose-response relationship at any given time point. Nonlinear relationships were required for both the time- and dose-response components of the model, thus leading to a nonlinear mixed-effects (NLME) model.

Instead of modeling the ALT directly, though, the model was developed for the percent change in ALT from baseline. This reduced the impact of inter-subject variation (each baseline value served as a subject's own control) and, most importantly, allowed the use of external baseline data to improve the estimation of reduced safety signals in the patient population, as described later in this section. The pre-clinical data was also used to derive prior distributions for the model parameters in the Bayesian NLME model, with an appropriate discount factor used to keep the prior relatively “non-informative”. Dose-scaling methods based on physiologically based (PB) pharmacokinetic (PK) modeling were used to allow the conversion of the dog toxicity results to humans. The main motivation for using external baseline data in the safety analysis for dose escalation was to obtain an estimate of the percentage of patients with ALT elevations exceeding 3 X ULN in a patient population like the one from the placebo data. This allowed the conversion of the ALT dose-time response model results into a more clinically meaningful metric of "% of responders." This could also have been estimated directly from the percentage of patients in the cohorts with ALT > 3 X ULN, but, because of the small sample size, such an estimate is rather poor and cannot reliably detect a 1% or 2% incidence in the patient population.

After each cohort was completed, the ALT longitudinal measurements were used to update the fit of the Bayesian NLME model. The expected percentages of ALT > 3 X ULN events in the patient population after 4 weeks of treatment for each dose in the trial were estimated as follows. First, individual longitudinal profiles for the percentage change from baseline in ALT were simulated from the current estimated model. Figure 5 provides an example with a sample of profiles derived from the pre-clinical model.

Figure 5. Sample of longitudinal profiles of percent change from baseline in ALT simulated from pre-clinical model.

The simulated profiles are then combined with external baseline ALT data from previous trials with a similar patient population to produce ALT profiles (i.e., for each simulated profile, a baseline ALT value was randomly chosen from the external database and used as a multiplier for the percentage changes), which were then used to estimate the incidence rate of ALT > 3 X ULN events in the patient population. Figure 6 displays ALT profiles obtained by applying this approach to the sample of percentage-change ALT profiles shown in Figure 5. For visualization purposes, only a relatively small number of simulated profiles are included in Figures 5 and 6. In practice, a much larger number of simulated profiles (e.g., 100,000) is used to estimate the safety event incidence rates for each dose.
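The simulation-and-baseline-combination step described above can be sketched in a few lines. This is an illustrative Monte Carlo stand-in, not the study's Bayesian NLME model: the profile shape, the residual noise level, the lognormal baseline distribution, and the ULN value of 40 IU/L are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical external baseline ALT values (IU/L) from earlier trials
# in a similar patient population (distribution parameters are assumed)
external_baselines = rng.lognormal(mean=3.3, sigma=0.5, size=5000)
ULN = 40.0  # assumed upper limit of normal for ALT

def simulate_incidence(peak_pct_change, n_sim=100_000):
    """Estimate P(ALT > 3 x ULN at any weekly visit) for one dose.

    peak_pct_change stands in for the dose's effect in the fitted
    dose-time model: the mean percent change from baseline that the
    profile approaches by the end of the 4-week study."""
    weeks = np.arange(1, 5)  # four weekly visits
    # Illustrative dose-time profile: rises toward a plateau over time
    mean_profile = peak_pct_change * (1 - np.exp(-weeks / 2.0))
    pct_change = mean_profile + rng.normal(0.0, 15.0, size=(n_sim, 4))
    # Attach a randomly drawn external baseline to each simulated profile
    baselines = rng.choice(external_baselines, size=n_sim)
    alt = baselines[:, None] * (1 + pct_change / 100.0)
    # A simulated patient is a "responder" if ALT exceeds 3 x ULN at any visit
    return float(np.mean((alt > 3 * ULN).any(axis=1)))

low = simulate_incidence(5.0)    # weak dose effect
high = simulate_incidence(60.0)  # strong dose effect
```

The resulting incidence estimates are the quantities that get compared against the 2% threshold in the escalation rule.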

Figure 6. Longitudinal ALT profiles corresponding to the percentage change ALT profiles from Figure 5 combined with external baseline data.

The original dose-escalation algorithm proposed for the trial was as follows: escalate until two or more patients with ALT > 3 X ULN (at any given time during the four-week period) are observed at dose di, or until the MTD is reached. The maximum safe dose is then defined as the highest dose at which at most one patient experienced ALT > 3 X ULN, if any. Although simple to implement, this algorithm is not capable of detecting incidence rates around 2% for the safety event, as discussed below.

The Bayesian NLME modeling approach provides a more efficient alternative. It can be used in a dose-escalation algorithm similar to the continual reassessment method (CRM) that has become popular in oncology. The CRM-like algorithm escalates the dose until it is clear that the current, or the next, dose has an incidence rate of ALT > 3 X ULN larger than the desired 2%. More precisely, the doses are escalated until the probability of dose di having a 2% or higher safety incidence rate is greater than 90% (this probability being calculated under a Bayesian framework), or the MTD is reached. The MSD is then defined as the highest dose for which the probability of having a 2% or higher safety incidence rate is less than 90%, if any. Table 1 provides a comparison of the probabilities of each dose being selected as the MSD under the two dose-escalation algorithms, under the dose-response profile estimated from the pre-clinical data. The results clearly show the problem with the original dose-escalation algorithm: it is only capable of detecting incidence rates much larger than 2%. The highest dose has over a 50% chance of being selected as the MSD, even though its associated incidence rate is 10%. By benefiting from the external baseline data, the dose-time modeling of the continuous ALT values, and the Bayesian framework, the CRM-like algorithm performs much better, choosing the “right” dose (i.e., the dose closest to the 2% incidence rate) with about 50% chance. Note that, even under this more efficient algorithm, there is still a substantial chance of selecting an MSD with an incidence rate larger than desired. Larger cohort sizes would be required to improve the operating characteristics of the algorithm.
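The CRM-like escalation rule can be illustrated with a simple Beta-Binomial stand-in for the posterior incidence calculation. The trial itself derived this probability from the dose-time NLME model; the Beta(1, 49) prior and the cohort data in the usage example are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_incidence_at_least(k_events, n_patients, threshold=0.02,
                            prior_a=1.0, prior_b=49.0, n_draws=200_000):
    """Posterior P(incidence >= threshold) under a Beta-Binomial model.

    Beta(1, 49) is an illustrative weakly informative prior with mean
    2%; the trial itself obtained this probability from the Bayesian
    NLME dose-time model rather than from raw event counts."""
    draws = rng.beta(prior_a + k_events,
                     prior_b + n_patients - k_events, size=n_draws)
    return float(np.mean(draws >= threshold))

def select_msd(cohort_data, certainty=0.90):
    """Escalate through ascending doses and return the MSD.

    cohort_data: list of (events, patients) per dose, in dose order.
    Escalation stops at the first dose whose posterior probability of
    a >= 2% incidence exceeds `certainty`; the MSD is the highest dose
    cleared before that (1-based index), or None if none was cleared."""
    msd = None
    for dose_idx, (k, n) in enumerate(cohort_data, start=1):
        if prob_incidence_at_least(k, n) > certainty:
            break  # dose judged unsafe: stop escalation
        msd = dose_idx
    return msd
```

For example, with hypothetical cohorts of 15 patients showing 0, 1, and 4 events at ascending doses, the rule clears the first two doses and stops at the third.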

                                  % probability dose is identified as MSD
Dose    Incidence rate (%)        Original        CRM-like
d1      0.90                      1.17            0.59
d2      1.16                      2.64            12.22
d3      2.04                      4.59            49.56
d4      3.33                      12.32           26.20
d5      6.16                      22.48           4.40
d6      10.00                     56.79           7.04

Table 1. Comparison of original and CRM-like dose-escalation algorithms, under pre-clinical estimated toxicity.

B. Establishing efficacy

A similar approach was used to evaluate the primary efficacy endpoint. The percent change from baseline in non-HDL cholesterol was represented through a Bayesian dose-time response NLME model, derived from the pre-clinical data and using discount factors for prior distributions. The non-inferiority comparison to the active control, as well as the superiority evaluation with respect to placebo, was determined in terms of the difference in percentage change from baseline. In this case, external baseline data was not used: for the efficacy evaluation, the Go/No Go criterion was already established in terms of the continuous measurement (change from baseline) being modeled, so the placebo data was not needed for an "indirect estimation" as in the safety case. Placebo data could have been used to improve prior knowledge about some of the parameters in the Bayesian model (e.g., baseline triglycerides), but was not used in this actual trial. The original criterion to be used in the study to establish the efficacy of a dose di was based on non-inferiority to the active control: if the upper 90% confidence bound on the difference in percentage change from baseline in non-HDL cholesterol was below the non-inferiority margin of 10%, the dose would be declared efficacious. Under the design assumptions (e.g., cohort size, variability at Week 4, etc.), the power to establish efficacy under equivalence (i.e., the same average percentage reduction in non-HDL cholesterol from baseline) was less than 60%. This was considered by the clinical team to be inadequate for decision making, but increasing the cohort size was not an option.

This motivated the evaluation of model-based alternatives, which ultimately led to the development of the Bayesian NLME dose-time response model. The efficacy test under this approach is as follows: dose di is considered efficacious, in a non-inferiority sense, if there is at least a 90% probability that it is not more than 10% worse than the active control with regard to percent reduction in non-HDL cholesterol from baseline. This probability is evaluated under a Bayesian framework. Using this approach, the power to establish efficacy under equivalence and the same design assumptions as before was increased to 78%, and the approach was adopted in the study protocol as the primary efficacy analysis. The test for superiority with respect to placebo was defined similarly: at least a 90% probability that the dose produced a greater reduction in non-HDL cholesterol from baseline than placebo.
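The Bayesian non-inferiority test reduces to computing a posterior probability from draws of the mean difference. Below is a minimal sketch, with normal posterior draws standing in for the NLME posterior; the means and standard deviations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_non_inferior(diff_draws, margin=10.0):
    """Posterior P(dose is not more than `margin` percentage points
    worse than the active control), given posterior draws of the mean
    difference (dose minus control) in percent change from baseline.
    A more negative percent change is a larger cholesterol reduction,
    so the dose is non-inferior when the difference is below the margin."""
    return float(np.mean(diff_draws < margin))

# Hypothetical normal posterior draws; in the study these came from
# the Bayesian NLME dose-time model
good_dose = rng.normal(2.0, 4.0, size=100_000)   # close to control
weak_dose = rng.normal(12.0, 4.0, size=100_000)  # clearly worse

declare_good = prob_non_inferior(good_dose) >= 0.90  # passes the 90% rule
declare_weak = prob_non_inferior(weak_dose) >= 0.90  # fails the 90% rule
```

The superiority-to-placebo test has the same form, with the margin set to zero and the comparator draws taken against placebo.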

Figure 7 displays the (posterior) distributions of the mean differences in percentage reduction of non-HDL cholesterol from baseline with respect to the active control (a) and placebo (b), based on simulated data with similar characteristics as the study data. P-values refer to the (posterior) probabilities of being inferior to the active control by more than 10% (a) and inferior to placebo (b).

Figure 7. Posterior distributions per dose of mean differences in non-HDL change from baseline with respect to active control (a) and placebo (b).

Under the assumed efficacy rules, doses d3, d4 and d5 are considered non-inferior to the active control and only the smallest dose d1 is not superior to placebo.

The Bayesian NLME modeling approach offers the additional advantage of allowing the use of information from different phases of development in an integrated way. The pre-clinical data was used to derive the model and prior distributions (with appropriate discount factors). Because non-HDL cholesterol is also to be used as an efficacy endpoint in Phase IIb, the same modeling approach will also be used for estimating the target dose. The posterior distributions for the model parameters obtained at the end of the PoC study will be used to construct the prior distributions for the dose-ranging trial, with appropriate discount factors used to reduce the impact of the PoC results on the Phase IIb trial.
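For a normal posterior summary, carrying the PoC posterior forward as a discounted Phase IIb prior amounts to a simple variance inflation. The discount value and the normal form are assumptions for illustration; the actual prior construction would operate on the full NLME posterior.

```python
import math

def discounted_prior(post_mean, post_sd, discount=0.5):
    """Turn a PoC posterior summary into a Phase IIb prior by inflating
    the variance by 1/discount (0 < discount <= 1), so the PoC data
    count as only a fraction of their nominal information. The mean is
    carried forward unchanged; discount=0.5 is an arbitrary example."""
    return post_mean, post_sd / math.sqrt(discount)

# E.g., a PoC posterior of -12% +/- 2% becomes a flatter Phase IIb prior
prior_mean, prior_sd = discounted_prior(-12.0, 2.0, discount=0.5)
```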

Modeling and Simulation for Dose and Dose Regimen Selection

Two of the most important goals of modeling are a) to determine the dose(s) and dosing regimen(s) that achieve the target clinical benefit while minimizing undesirable adverse effects, and b) to optimize clinical development strategy by applying simulation techniques to evaluate various clinical study design options. Two examples that illustrate the value and the power of modeling and simulation in dose selection are provided below.

Case Study: Dose selection in a rare genetic disorder

A monoclonal antibody against IL-1b has been developed to treat an IL-1 dependent inflammatory disease called Muckle-Wells syndrome, a rare genetic disorder characterized by fever, urticaria, joint pain, and malaise. The antibody is delivered parenterally and binds to free IL-1b, driving it into the inactive complex and leading to remission of symptoms. Total IL-1b, representing mainly the inactive complex, increases after dosing and can be measured (see Figure 8). By the laws of mass action, the free and active form of IL-1b, which cannot be measured, must decrease. The reduction in free IL-1b results in a decrease in markers of inflammation, including C-reactive protein, and a remission of clinical signs and symptoms of disease.

The clinical data can be captured in a mathematical model, which is continuously adjusted to fit the available emerging data. After building the model, we can simulate, that is, explore on the computer, until we are reasonably certain that we have a suitable dose and dosing regimen (interval between doses) which gives the desired response for the majority of patients (e.g. 80% certain that 90% of patients will be flare free for 2 months). The data derived from this modeling exercise allowed for selection of a dosing regimen that is now being investigated in a Phase III program.
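The "explore on the computer" step can be sketched as a Monte Carlo over simulated patients. Everything in this sketch (the one-compartment PK model, the parameter distributions, the 150 mg dose, and the suppression threshold) is an illustrative assumption, not the fitted model from the program.

```python
import numpy as np

rng = np.random.default_rng(7)

def fraction_flare_free(interval_days, dose_mg=150.0, n_patients=200):
    """Fraction of simulated patients flare free over one dosing interval.

    Toy rule: a patient flares if the antibody trough concentration at
    the end of the interval falls below an assumed suppression threshold."""
    half_life = rng.normal(26.0, 5.0, n_patients).clip(min=10.0)  # days
    volume = rng.lognormal(np.log(6.0), 0.25, n_patients)         # litres
    k_elim = np.log(2.0) / half_life
    trough = (dose_mg / volume) * np.exp(-k_elim * interval_days)
    threshold = 1.0  # ug/mL assumed to keep free IL-1b suppressed
    return float(np.mean(trough > threshold))

def certainty_of_target(interval_days, target=0.90, n_trials=500):
    """P(at least `target` of patients stay flare free), estimated over
    repeated simulated trials, mirroring the '80% certain that 90% of
    patients will be flare free' criterion in the text."""
    hits = [fraction_flare_free(interval_days) >= target
            for _ in range(n_trials)]
    return float(np.mean(hits))

c_8_weeks = certainty_of_target(56)    # dosing every 2 months
c_16_weeks = certainty_of_target(112)  # dosing every 4 months
```

Sweeping the interval (and dose) and picking the cheapest regimen that still meets the certainty criterion is the kind of computation that supported the Phase III regimen choice.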


[Figure 8, left panel: C-reactive protein (mg/L) and symptom score/probability vs. time (days), showing flaring, symptoms, and remission. Right panel: concentrations (µg/mL; pg/mL or nM) of antibody and total IL-1 (complex) vs. time, showing that free IL-1 is suppressed.]

Figure 8. Modeling and simulation to aid in dose and dose regimen selection for a monoclonal Ab treatment of a rare genetic inflammatory disorder.

Case Study: Dose selection in type 2 diabetes mellitus

A DPP-IV inhibitor is being developed for the treatment of type 2 diabetes. The causal chain of events leading from DPP-IV inhibition, through increases in GLP1A and decreases in plasma glucose concentrations, to reductions in HbA1c, a qualified surrogate endpoint for diabetes, is reasonably well understood from a qualitative perspective. A more quantitative assessment can be provided by linking the series of mathematical models for each of the measures in the causal chain of events (Figure 9). Thus, in the example provided below, the model can predict the impact of changing the dose or dosing regimen for the DPP-IV inhibitor on the clinical outcome, estimated by the surrogate marker HbA1c.
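The causal chain can be expressed as a series of small linked functions, one per modeling step. The functional forms and parameter values below are illustrative stand-ins for the fitted models, not the actual models from the program.

```python
import numpy as np

# Each function stands in for one fitted model in the causal chain;
# all functional forms and parameter values are illustrative assumptions.

def pk_conc(dose_mg, t_h, ka=1.0, ke=0.1, v=70.0):
    """PK model: plasma concentration (mg/L) after an oral dose
    (one-compartment model with first-order absorption)."""
    return dose_mg / v * ka / (ka - ke) * (np.exp(-ke * t_h) - np.exp(-ka * t_h))

def dpp4_inhibition(conc, ic50=0.1):
    """PK/PD model: fraction of DPP-IV inhibited (simple Imax model)."""
    return conc / (conc + ic50)

def glucose_reduction(inhibition, max_drop=2.0):
    """PD/biomarker model: mean plasma glucose drop (mmol/L)."""
    return max_drop * inhibition

def hba1c_reduction(glucose_drop, slope=0.35):
    """Biomarker/efficacy model: HbA1c drop (%) per mmol/L of glucose lowering."""
    return slope * glucose_drop

def predict_hba1c_drop(dose_mg, t_h=12.0):
    """Chain the models: dose -> PK -> inhibition -> glucose -> HbA1c."""
    conc = pk_conc(dose_mg, t_h)
    return hba1c_reduction(glucose_reduction(dpp4_inhibition(conc)))
```

Evaluating `predict_hba1c_drop` over a grid of doses and dosing times is the kind of computation used to compare regimens before committing to one.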


[Figure 9 schematic: LAF 237 dose → LAF 237 PK → DPP-IV inhibition → GLP1A increase → glucose reduction → HbA1c reduction, linked in sequence by PK modeling, PK/PD modeling, PD/biomarker modeling, and biomarker/efficacy modeling. Panels: LAF vs time; DPP-IV inhibition vs LAF; glucose vs DPP-IV; observed HbA1c vs predictions based on glucose reductions.]

● INPUT: PK data, biomarker data, outcome data

● OUTPUT & VALUE: Predictions of clinical outcomes for different dosing regimens

Figure 9. A series of models to link dosing regimens with pharmacodynamic parameters and ultimately clinical outcomes.

Adaptive Trial Designs in Early Development

The standard approach to early development programs is to have separate trials for proof-of-concept, to show that a drug is effective within tolerable limits established by Phase 1 studies; dose ranging, to determine the range of doses that are most interesting in terms of efficacy and safety; and dose selection, to choose a dose to carry forward to Phase 3.

Adaptive trial designs allow the use of information collected during a trial to perform mid-trial design modifications. These changes are pre-specified in the protocol. Adaptive designs allow for initial uncertainties in trial design to be confirmed or adapted during the trial. Data collected before and after the adaptation are used in the final analysis. Possible adaptations include adjustments to sample size, allocation to treatments, the addition or deletion of treatment arms, inclusion/exclusion criteria for the study population, statistical hypotheses such as non-inferiority or superiority, and combining trials or treatment phases.

Adaptive designs offer several benefits over the standard approach. For example, a PoC trial can be combined with a dose-ranging trial to yield a single trial. This approach has distinct advantages, in that it reduces start-up cost, reduces time between trials (the “white space”), potentially increases power, and potentially improves estimates of dose-response. Adaptive designs may also enable working with more candidate doses without increasing sample size. This is important for reducing the risk of failure in confirmatory trials, where it has been estimated that industry-wide 45% of Phase III programs do not have the optimum dose.

Adaptive Dose Finding

A key concept in this approach is to assign doses adaptively, i.e., to assign the dose for the next subject based on the responses of previous subjects, with the dose assignment chosen to maximize the information about the dose-response curve. For most drugs the true dose-response relationship is unknown. In a traditional dose-finding trial, the few doses selected may or may not adequately describe the dose-response relationship, and many patients will be allocated to ‘non-informative’ doses. In adaptive dose finding (Figure 10), the strategy is to initially include only a few patients on many doses to determine the dose-response, and then to allocate more patients to the dose range of interest. This reduces the allocation of patients to ‘non-informative’ doses (‘wasted doses’).

[Figure 10: "Adaptive Dose Finding: increased number of doses + adaptive allocation." Response vs. dose curve, with 'wasted' doses marked at both extremes of the dose range.]

Figure 10. Illustration of the impact of adaptive dose finding in which a greater number of doses on the dose-response curve can be studied, minimizing study of ‘wasted’ doses.

Therefore, adaptive dose assignment limits the number of subjects given doses of little interest (too high or too low) as compared to the standard approach. This is an ethical advantage over fixed randomization, as fewer subjects are assigned doses that are too high or too low. It may also avoid the need for a separate follow-up trial when a fixed-dose trial fails to define the dose range adequately.
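A minimal sketch of information-driven allocation: assign the next subject to the arm whose mean response is currently estimated least precisely. Formal adaptive dose-finding designs use model-based criteria (e.g., D-optimal or Bayesian decision-theoretic allocation); this standard-error rule is only a simple illustration.

```python
import numpy as np

def next_dose(responses_by_dose):
    """Choose the dose arm for the next subject: the arm whose mean
    response is currently estimated least precisely (largest standard
    error of the mean), so information accrues where the curve is
    least well known. A stand-in for model-based adaptive allocation."""
    sems = []
    for responses in responses_by_dose:
        n = len(responses)
        if n < 2:
            sems.append(np.inf)  # unexplored or single-subject arms get priority
        else:
            sems.append(np.std(responses, ddof=1) / np.sqrt(n))
    return int(np.argmax(sems))
```

With hypothetical data, an arm with a single observation is chosen before well-characterized arms, and among equally sampled arms the noisier one is chosen next.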

Case Study: Combining PoC and dose-ranging trials into a single adaptive trial

This trial evaluated an analgesic drug to treat dental pain, and tested seven doses of the drug. The primary endpoint was assumed to follow a normal distribution with standard deviation (SD) = 9. Several designs with different sample sizes, randomization ratios of drug to placebo, and starting doses were simulated against several scenarios. Here we describe one design with sample size = 120 (40 placebo, 80 drug).

Bayesian adaptive trials were simulated over seven scenarios to enable comparisons with standard designs. Response was assumed to be monotone in the range below the maximum tolerable dose as determined in Phase 1 studies. Monotone curves have been found to be applicable to a wide variety of drugs, and they were determined to be appropriate for the study drug in this trial. Seven scenarios which represented the gamut of likely dose response curves were chosen from the family of four-parameter logistic functions. This is a flexible family capable of representing concave, convex and sigmoidal response curves in the range of candidate doses. Figure 11 below shows the scenarios.

[Figure 11: mean response vs. dose (0–8) for Scenarios 1–7.]

Figure 11. Representation of seven likely dose-response curves based on earlier studies
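The scenario curves come from the four-parameter logistic family described above, which can be written down directly. The parameter values in the usage lines are illustrative, not the ones used in the actual simulations.

```python
import numpy as np

def four_pl(dose, e0, emax, ed50, hill):
    """Four-parameter logistic dose-response curve: baseline e0,
    maximum effect emax, half-maximal dose ed50, and a Hill slope
    that makes the curve concave, convex, or sigmoidal."""
    dose = np.asarray(dose, dtype=float)
    return e0 + (emax - e0) * dose**hill / (ed50**hill + dose**hill)

# Illustrative curves over the candidate dose range 0-8 (parameter
# values assumed for demonstration)
doses = np.arange(0, 9)
near_flat = four_pl(doses, e0=0.0, emax=1.0, ed50=4.0, hill=1.0)
sigmoidal = four_pl(doses, e0=0.0, emax=20.0, ed50=4.0, hill=3.0)
```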

In simulations, it was found that across all seven scenarios a single adaptive trial can replace two standard trials (PoC and dose-ranging): the combined trial has greater power than the standard PoC design, and it is substantially better at estimating the dose-response curve.

The power of the trend test for proof-of-concept was always greater for the adaptive design as shown in Table 2 below.

Scenario    Power of standard PoC trial (sample size = 30)    Power of combined adaptive design
1           5                                                 5
2           33                                                84
3           32                                                67
4           86                                                100
5           85                                                99
6           100                                               100
7           99                                                100


Table 2. Comparison of the statistical power for adaptive and standard PoC designs.

When there was a small effect size at the highest dose (scenarios 2 & 3), the power of the adaptive design was about double that of the standard design. When the effect size was modest (scenarios 4 & 5), the power increased to practically 100%. When effect sizes were large (scenarios 6 & 7), the power was almost 100% for both the adaptive and standard designs.

For the same total sample size, the adaptive combined PoC-dose finding trial is more efficient than the two standard trials in estimating the response at every dose. This is illustrated in Figure 12 below. The continuous curve shows the efficiency of the adaptive design relative to the standard dose ranging design for scenario 7. Efficiency at each dose is defined as the ratio of the estimation error of the standard design to the estimation error of the adaptive design. The bars show the number of subjects allocated to each dose by the adaptive design. These results are computed by averaging the results of 1000 simulations.

[Figure 12: bars show the average allocation of subjects to doses 1–7 (left axis, "Av. Allocation"); the curve shows efficiency relative to the standard design (right axis, "Efficiency").]

Figure 12. Efficiency of the adaptive design relative to the standard dose-ranging design for scenario 7.

The overall efficiency across all doses is greater by a factor of five, while for the sloping part of the dose-response curve (doses 4, 5, 6), the adaptive design is three times more efficient. As shown in Figure 13 below, the adaptive combined PoC-dose-ranging trial with 60 subjects is as efficient in estimating the response at every dose as the two standard trials with a combined sample size of 120 subjects. It is also as powerful in testing for proof-of-concept.


[Figure 13: efficiency by dose (doses 1–7) for three designs: adaptive (sample size = 60), adaptive (sample size = 120), and standard design (sample size = 120).]

Figure 13. Comparison of efficiency of 60- and 120-subject adaptive trials with a standard design of 120 subjects.

These results hold irrespective of which of the seven scenarios reflects the true dose-response curve. For all seven scenarios, at the same sample size, the efficiency of the adaptive design was about five times that of the standard design over all doses, and three times that of the standard design for estimating dose-response in the sloping part of the curve. Another way to state this result: at half the sample size, the adaptive design is as powerful and efficient as the standard approach with two trials.

Logistical Considerations for Adaptive Designs during Exploratory Development

In pragmatic terms, adaptive trial designs require the following:

a quickly observable response relative to the accrual rate, or good longitudinal models to forecast endpoints in time to adapt dose assignments for future subjects

more statistical up-front work to build a model of the dose-response curve and to perform simulations

many simulations to find the best combination of sample size, randomization ratio between placebo and drug, starting dose, and number of doses; this calls for efficient programs and fast computing platforms

infrastructure for rapid communication of responses from sites to a central unblinded analysis center, and rapid communication of adaptive dose assignments back to sites

randomizer software capable of rapidly computing dynamic allocation of doses to subjects (pre-specified randomization lists will not work)


flexible drug supply process, as demand for doses is not fixed in advance, but evolves as information on responses at various doses is gathered as the trial progresses

Confirmatory Phase of Development

Optimization of trial design during confirmatory development holds the promise of greater success rates, improved efficiency, compressed timelines, smaller overall programs and lower attrition rates. A number of novel approaches to confirmatory development that may contribute to fulfilling this promise are highlighted below.

Seamless adaptive designs

Another opportunity for adding efficiency to the drug development process is through use of so-called seamless designs, which aim to combine into a single trial objectives traditionally addressed in separate trials. An important specific example in a confirmatory context is a seamless adaptive phase II/III design addressing objectives normally achieved through separate phase II and phase III trials. Most typically, this type of design involves selection of a treatment group for continued investigation in a confirmatory trial. Another example is the selection of an optimal patient population for further study, a scenario likely to increase in frequency as medical treatment becomes more stratified. The first stage of such a trial typically would look much like a late phase II trial, with a control group and several treatment groups (e.g., different dose levels of the same treatment). Results are examined at the end of the first stage, and one or more of the treatment groups are selected to continue, along with the control group, into the trial’s second stage. The final analysis comparing the selected group(s) with the control will use data from the continuing groups from both stages of the trial.

The potential advantages of the seamless design are:

reduction in the duration of the clinical program, by elimination of the time period traditionally occurring between phase II and phase III trials;

greater efficiency from the use of data from both stages, which may require fewer patients to obtain the same quality of information;

earlier acquisition of long-term safety data, by continued follow-up of patients from the first stage.

Figure 14 below schematically illustrates a seamless adaptive design, and in a particular example quantifies its advantages versus a traditional separate-trial approach. Four doses, along with a control, are investigated, and the best dose is chosen at interim to continue. Compared to a separate-trial approach using the same number of patients, there is a half-year savings in time, and a 9% increase in power. If the efficiency of the adaptive approach were used to reduce patients instead of increasing power, the power of the separate-trial approach could be achieved with fewer patients in the adaptive design.


Comparison of seamless/adaptive design with a separate Phase II and Phase III approach:

Design                                  Number of Patients   Prob. of Success   Total Duration
Classical (separate Phase II and III)   1,200                83%                2.5 years
Seamless adaptive                       1,200                92%                2 years

[Figure 14 schematic: four doses (A–D) plus control in each design; the adaptive design carries only the selected dose and control into the second stage.]

Figure 14. Hypothetical illustration of the potential advantages of a seamless/adaptive trial design compared to a classical approach to Phase IIb and III studies.

Not all programs will be candidates for these designs. Perhaps most importantly, this approach will tend to be appropriate when a late stage of phase II has been reached, so that most questions necessary to proceed to phase III have been answered earlier in the program, and the nature of a confirmatory trial can be closely envisioned, apart from the decision being addressed at the selection point of the seamless trial.

Other feasibility considerations for use of these designs include:

Length of follow-up time for the endpoint used for selection vs duration of enrollment: shorter follow-up will be more conducive to a seamless adaptive design, while a relatively long endpoint follow-up period will tend to argue against using such a design.

Drug supply and drug packaging will be expected to be more challenging in this setting; development programs which do not involve complex treatment regimens might thus be better suited to such designs.

The processes by which interim data are examined, and the selection decision is made and implemented, must be considered very carefully. Current conventions that restrict knowledge of interim results in ongoing trials should be respected in order to avoid compromising the interpretability of trial results. In some cases the decision being made at the selection point of a seamless design will be one for which the sponsor's perspective may be relevant and which has traditionally been a sponsor responsibility, raising the question of sponsor involvement in the monitoring process. Operating procedures for the monitoring process must be carefully considered to ensure that the right expertise is brought to bear on the decision, while limiting access to the accruing data in order to maintain trial integrity.

Other issues and considerations for seamless adaptive designs:

The endpoint used for selection need not be the same as the endpoint to be used in the main study analysis; if a good surrogate marker is available, this may be used and may enhance the efficiency of the seamless trial.

Modeling and simulation will likely play a very important role in developing the specific details of seamless designs (e.g., per-group sample sizes in the different stages, considered under a variety of underlying scenarios) to ensure that they are robust and efficient.

The final analysis must use statistical methodology appropriate to the design, i.e., ‘naïve’ comparisons of control vs the selected treatment that do not account for the design will not be appropriate.

The appropriateness of the design does not depend on any particular algorithm for choosing the patient group to be continued; it is not even necessary for a firm algorithm to be specified in advance, though the general principles that will govern the decision should be clear in advance.

Sample Size Re-Estimation within a Confirmatory Trial (Phase III)

This section describes the options currently available for using a portion of the information obtained within a confirmatory study to adjust the sample size needed to answer the primary study questions with confidence. We describe why such adjustment is often necessary, and which circumstances and study design features lend themselves to these approaches. We also attempt to provide a balanced set of statistical and non-statistical 'real world' arguments about the strengths and weaknesses of each approach, typically through illustrative examples.

The primary goals of a confirmatory clinical trial are to ensure that the test treatment or diagnostic does not cause more harm than good (safety), and to efficiently and confidently find the actual effect size of the chosen primary outcome(s) within the identified patient population (efficacy). In practice, the initial estimate of sample size, as well as its possible subsequent modification, is motivated by the desire to demonstrate that the experimental arm has greater efficacy than the control arm. The goal is not necessarily to minimize or reduce sample size, but to ensure an appropriate sample size in order to confidently answer the primary study questions.

The SCID Parameter and the Nuisance Parameter (σ)

Before proceeding to discuss design approaches, it is important to point out a fundamental difference between uncertainty about δ (the underlying treatment effect) and uncertainty about σ (the between-subject variability). The parameter δ denotes the true underlying difference between the treatment and control arms with respect to the primary endpoint. Even though the true value of δ is unknown, the trial investigators will usually have in mind a specific value that represents the smallest clinically important delta (SCID) for this clinical trial. They will wish to determine the sample size that can detect values of δ that exceed the SCID with good power. In contrast, the standard deviation σ is simply a "nuisance parameter" whose true value must be estimated somehow in order to proceed with the sample size calculation. For purposes of this paper, we will assume that σ = 1. Keeping this distinction in mind, we now consider the various design options.

Standard Method: Fixed Sample Design with σ Treated as Known

The standard method to power a study is to first estimate the δ of the primary endpoint based on available prior information. That estimated δ should be, at a minimum, the SCID of the primary endpoint. Previous knowledge of the potential effect size may provide further refinement of baseline assumptions. This approach seeks to provide adequate power to find an effect equal to or larger than the SCID.

The SCID can often but not always be pre-specified from purely clinical arguments, whereas the actual effect size is unknown. Therefore, one could in principle design a study with a fixed sample size that will have adequate power to detect the SCID, in the absence of adequate prior information about the actual effect size of the test agent. This is what statisticians envisaged when they created the fixed-sample methodology. However, this fixed sample methodology has several drawbacks. If the actual effect is less than the SCID, the trial will fail and was unnecessary in the first place. If the actual effect is substantially larger than the SCID, a smaller sample size would have sufficed to attain adequate power.
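For concreteness, the fixed-sample arithmetic can be sketched with the usual normal-approximation formula, n per arm = 2σ²(z_α + z_β)²/δ². This is an illustrative sketch, not part of the paper: the function name and defaults are ours, and it assumes σ = 1 and a one-sided test at level 0.025, as in the prototype example discussed below.

```python
from math import ceil
from statistics import NormalDist

def per_arm_n(delta, sigma=1.0, alpha=0.025, power=0.88):
    """Per-arm sample size for a two-arm comparison of normal means,
    one-sided level-alpha test: n = 2 * (sigma * (z_a + z_b) / delta)**2."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    return ceil(2 * (sigma * z / delta) ** 2)

# With sigma = 1 and 88% power: detecting the SCID (delta = 0.1) needs
# roughly 2000 per arm (~4000 total), while delta = 0.2 needs ~500 per arm.
```

These figures line up with the 4000- and 1000-subject trials of the worked example that follows.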

Issues with the standard approach in the 'real world'

Often sponsors cannot justify risking significant resources solely for a trial size resulting from SCID assumptions, especially when this results in a study much larger than the current 'best guess' at the actual effect size. The sponsor typically determines the sample size they can justify. Then, the resulting effect size they are powered to see is calculated. This almost always results in a smaller trial than that which would be required if it were calculated using the SCID. This approach is the complete reverse of what was originally envisioned by the mathematicians who developed these methods. If it is impractical to use the SCID as the effect size in the calculation, the standard methods may require one to 'take a guess' with relatively little data on too many important factors.

In fact, the current process is extremely circular: the approach is very dependent on the estimate of δ. There is typically not enough early data to describe the actual δ in the specific study population of interest. It is the δ that is a major factor in determining appropriate sample sizes. If the δ were known with any reliable certainty, then there would be little need to perform the study.

One approach to solving the problem of uncertainty about δ is to design and execute an additional number of exploratory trials (typically Phase II studies). These small phase II studies are typically carried out to get a more precise estimate of the actual δ and σ so that the pivotal study might be adequately powered. Each exploratory trial, although somewhat smaller than confirmatory trials, still requires significant resources to perform appropriately. Also, the inevitable start-up time and wind-down activities between trials have to be included when determining true program efficiency as well as development timelines. Thus, this might not be the most efficient way to proceed from the viewpoint of the entire clinical trial program.

It is helpful to discuss sample size calculations in the context of a concrete prototype example. Accordingly, we will consider a randomized clinical trial whose primary endpoint is a continuous outcome. However, the issues to be discussed in this context apply equally to other types of outcomes, such as binary or time-to-event outcomes. Figure 15 demonstrates a hypothetical situation in which a study is powered at a sample size of 1000 subjects. Assuming that σ = 1, a 1000-person two-arm trial has adequate power to detect a value δ = 0.2. The study sponsor has determined that the SCID is 0.1. One can calculate that a sample size of 4000 subjects would be required for adequate power to detect the SCID. The sponsor cannot, however, justify a 4,000-subject trial, either from a patient risk exposure standpoint or as a resource investment, given the available sparse evidence from early phase trials. Suppose that the true underlying value of δ is 0.15. In that case a sample size of 2000 subjects would be required to adequately power the trial to detect this difference. The difficulty is, of course, that the true underlying value of δ is not known at the start of the trial. In this example, the 1000-subject study would most likely yield a non-significant result, since it is only powered to detect an effect size of 0.2, which is larger than the actual effect size of 0.15.


Figure 15. Sample ‘real world’ situation regarding sample size

In the above, albeit rather extreme, example, if the decision after completing the 1000-patient under-powered trial were not to repeat the trial with a larger sample size, then a potentially efficacious treatment would be unfortunately and wrongly discarded. If the decision were to repeat the trial with a sample size re-estimated from the point estimate of this underpowered trial, then another 2000-subject trial would need to be executed, and the original 1000-subject trial would have been largely wasted, not to mention the time it would take to initiate and execute an additional trial. The time and resources to perform the study (sometimes upwards of 1-3 years) would have been spent without much benefit other than gaining a more reliable estimate of the actual effect size in order to design another trial. More importantly, the subjects for that study would have been put at unnecessary risk because the study had no real chance of being definitive.

Advantages of flexible designs for sample size re-estimation in confirmatory trialsBased on the issues outlined above, it would thus appear that a more flexible approach to the fixed sample-size methodology is needed. By altering the sample size using interim data from the trial itself, this flexibility can be achieved without compromising the power or the false positive rate of the trial.

Flexible trial designs provide additional well controlled data with the specific defined patient population to inform and potentially adjust sample size decisions, resulting in more efficient and successful designs. This is accomplished in a number of ways. First, by having access to more reliable data on which to make crucial estimates of important factors, such as effect size, sponsors can avoid inconclusive results stemming from underpowered trials. In addition, relative to standard approaches, these techniques have the potential to aid in earlier detection of futile trials or demonstration of superior effectiveness.

[Figure 15 schematic: N = 4000 is powered for the SCID (δ = 0.1); N = 2000 for the actual effect size (true δ = 0.15); N = 1000 is powered only to see an effect size of δ = 0.2.]

Sample size re-estimation should be considered in two situations: (1) when there is significant uncertainty about σ, or (2) when there is a substantial difference between the sample size resulting from using the SCID and the sample size the sponsor can justify based on their ‘best guess’ of the effect size, with or without additional uncertainty about σ.

Sample size re-estimation usually involves the choice of a suitable initial sample size together with one or more interim analyses at which the sample size will be re-assessed. There are two distinct strategies for choosing the initial sample size, and then altering it on the basis of data obtained at various interim analysis time points. One is the group sequential strategy, and the other is the adaptive strategy. In the group sequential strategy one starts out with a large up-front sample size commitment and cuts back if the accruing data suggest that the large sample size is not needed. In the adaptive strategy one proceeds in the opposite direction, by starting out with a smaller initial sample size commitment but with the option to increase it should the accruing data suggest that such an increase is warranted. These two approaches are discussed next, in separate sections.

Group Sequential Design

Suppose that the sponsor is unsure of the true value of δ but believes nevertheless that it is larger than the SCID. In that case a group sequential design might be considered. Such a design is characterized by a maximum sample size, an interim monitoring strategy, and a corresponding boundary for early stopping for efficacy. The maximum sample size is computed so that the study has adequate power to detect a value of δ that the sponsor believes represents a reasonable estimate of the efficacy of the experimental compound, provided this estimate is at least as large as the SCID. If the sponsor wishes to be very conservative about this estimate, the maximum sample size needed can be computed to have adequate power at the SCID itself. An up-front commitment is made to enroll patients up to this maximum sample size. However, if the true δ exceeds the SCID, the trial may terminate earlier with high probability by crossing an early stopping boundary at an interim look.

To return to our earlier example, suppose that the sponsor decides to make an up-front commitment of 4000 patients to the trial but intends to monitor the accruing data up to four times, after 1000, 2000, 3000 and 4000 patients become evaluable for the primary endpoint. The commitment of 4000 patients ensures that the trial will have 88% power to detect a difference as small as δ = 0.1 (in this case the SCID). Although this is a rather large sample size to commit to the trial, the actual sample size is expected to be substantially smaller if the true δ is larger than the SCID. This is so because at each of the four interim monitoring time points, there is a chance of early termination and a declaration of statistical significance. At each interim look all the accumulated efficacy data will be summarized into an estimate of δ divided by its standard error. This standardized estimate of δ is referred to as a Wald statistic. The Wald statistic will be compared to a corresponding early-stopping boundary for that look, and the first time that a boundary is crossed, the trial will be terminated and the experimental arm will be declared to be more efficacious than the control arm; i.e., we will claim that δ > 0. This is depicted in Figure 16 below.

Figure 16: Four-Look Group Sequential Design with Early Stopping at Look 2

The four boundary values of the Wald statistic are displayed on the Y-axis and are located at 4.33, 2.96, 2.36 and 2.01 standardized units above zero, respectively. If the true value of δ is 0, the Wald statistic has a 0.025 probability of crossing one of these four boundaries and thereby creating a false positive result for the clinical trial. On the other hand, if the true value of δ is 0.1 (the SCID), the chance of crossing one of the four boundaries on the positive side is 0.88, the power of the clinical trial. Finally, suppose that the actual value of δ is 0.2 (double the SCID). Now the chance of crossing the upper efficacy boundary is essentially 1. Furthermore, there is a substantial chance of crossing one of the early boundaries and saving on sample size. One can show that the chance of crossing the very first boundary, after enrolling 1000 subjects, is 12%; the chance of crossing the second boundary, after enrolling 2000 subjects, is 81%; the chance of crossing the third boundary, after enrolling 3000 subjects, is 6%; and there is only a 1% chance of going all the way to the end with 4000 subjects enrolled. Thus, even though the up-front sample size commitment was 4000 subjects, the actual sample size is expected to be substantially less: on average, only about 1944 patients.
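These stopping probabilities and the expected sample size can be checked by simulation. The sketch below is illustrative, not from the paper: the boundary values are those of Figure 16, while the function name and the Monte Carlo setup (σ = 1, normally distributed responses, equal allocation) are our assumptions.

```python
import random
from math import sqrt

def simulate_gsd(delta, sigma=1.0, reps=20000, seed=7,
                 boundaries=(4.33, 2.96, 2.36, 2.01),
                 looks=(1000, 2000, 3000, 4000)):
    """Monte Carlo check of the four-look design: returns the probability
    of ever crossing an efficacy boundary and the expected total sample
    size, assuming normally distributed responses."""
    rng = random.Random(seed)
    crossings = 0
    total_n = 0
    for _ in range(reps):
        sum_t = sum_c = 0.0        # running response sums, treatment / control
        prev = 0
        for n_total, bound in zip(looks, boundaries):
            n_arm = n_total // 2
            add = n_arm - prev     # newly evaluable subjects per arm
            # draw the new subjects' contribution via the normal sum
            sum_t += rng.gauss(delta * add, sigma * sqrt(add))
            sum_c += rng.gauss(0.0, sigma * sqrt(add))
            prev = n_arm
            wald = ((sum_t - sum_c) / n_arm) / (sigma * sqrt(2.0 / n_arm))
            if wald >= bound:
                crossings += 1
                break
        total_n += n_total         # n_total holds the look at which we stopped
    return crossings / reps, total_n / reps
```

Run at δ = 0, δ = 0.1 and δ = 0.2, this reproduces (within Monte Carlo error) the 0.025 false positive rate, the 0.88 power at the SCID, and an expected sample size near 1944 at δ = 0.2.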

The early stopping boundaries displayed in Figure 16 must satisfy the regulatory requirement that the false positive rate (or type-1 error) of the clinical trial is limited to 0.025. That is, they must satisfy the requirement that the probability of crossing any one of them when in fact δ = 0 is only 0.025. Further details on this aspect of the methodology and actual sample size calculations are described in Appendix I.

The Adaptive Design

The group sequential design described above is characterized by pre-specifying a maximum sample size up front and possibly terminating earlier if the true δ is larger than anticipated. In contrast, an adaptive design is characterized by pre-specifying a smaller initial sample size commitment based on a less conservative assumption about the unknown δ, but with the possibility of increasing the commitment after seeing some interim data from the trial. On the surface, this is rather like the usual practice of first running a small phase II trial to get some idea about efficacy and safety, then following up with a larger phase III trial once the efficacy and safety of the compound have been established. There is, however, an important distinction between the conventional phase II followed by phase III strategy and the adaptive strategy outlined in the present section. In the conventional approach, the data from the phase II trial are not combined with the data from the phase III trial. The adaptive design, however, utilizes all the data from both stages for the final analysis. This can have important advantages both in terms of gaining additional statistical power and in terms of shortening the drug development time.

In our prototype example, we stated that the SCID was 0.1. We have just seen how we might design a group sequential trial having 88% power to detect the SCID, provided we make an up-front commitment of 4000 subjects to the trial. Sometimes, however, a sponsor might not be willing to make such a large up-front commitment, particularly when the only currently available data on δ come from one or two small phase II trials. The sponsor might feel much more comfortable with a design that starts out with a smaller sample size of, say, 1000 patients, with the opportunity to increase the sample size at an interim look after observing data from the trial itself. This is the motivation for the adaptive design considered next.

Suppose that the sponsor believes that the true δ = 0.2, that is, twice as large as the SCID. If this is indeed the case then a total sample size of 1000 patients will have 89% power at a one sided level of 0.025. On this basis, the sponsor is prepared to make an initial investment of 1000 patients to this trial. As an insurance policy, however, the sponsor intends to take an interim look at the accruing data at the mid-point of the trial, after 500 patients are evaluable for response. An estimate of δ will be obtained from these 500 patients. If the estimate is smaller than the sponsor expected, then the sponsor might choose to increase the sample size so as to preserve the power of the trial.

Many different criteria can be used to decide if a sample size increase is warranted. A commonly used criterion is "conditional power". The conditional power at an interim look is the probability, given the observed data, that upon completion of the trial the experimental compound will demonstrate efficacy. More specifically, conditional power is defined as the probability that the Wald statistic will cross the final efficacy boundary, conditional on its observed value at the interim look. The conditional power computation requires us to specify a value for δ. One may choose the value specified at the initial design stage or the value estimated from the interim data. In this example we will use the interim estimated value of δ for evaluating conditional power. Table 3 below displays conditional power for various estimated values of δ at the interim look, along with the total sample size needed to achieve 80% conditional power at the final look. The entries in the table assume that σ = 1. Notice that the final sample size required to achieve 80% conditional power could either increase or decrease from the initially planned sample size of 1000.

Table 3: Conditional power and sample size for various observed estimates of δ after observing data on the first 500 subjects of a 1000-patient trial

  Interim estimate of δ | Conditional power if no sample size increase | Total sample size needed for 80% conditional power
  0.2                   | 95%                                          | 720 (sample size reduction)
  0.175                 | 86%                                          | 890 (sample size reduction)
  0.15                  | 72%                                          | 1166 (sample size increase)
  0.125                 | 51%                                          | 1757 (sample size increase)
  0.1                   | 30%                                          | 2990 (sample size increase)
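The conditional power column of Table 3 can be closely reproduced with one standard formulation, sketched below under assumptions of our own: the final analysis uses the unadjusted critical value z_0.025, σ = 1, and the true δ is taken equal to the interim estimate. The sample size column of the table may rest on a slightly different method, so we do not attempt to reproduce it exactly.

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(delta_hat, n1_per_arm, n_per_arm, sigma=1.0, alpha=0.025):
    """P(final Wald statistic > z_alpha | interim estimate delta_hat observed
    on n1_per_arm subjects per arm), assuming the true delta = delta_hat."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    z1 = delta_hat * sqrt(n1_per_arm / 2) / sigma   # interim Wald statistic
    t = n1_per_arm / n_per_arm                      # information fraction
    drift = delta_hat * sqrt((n_per_arm - n1_per_arm) / 2) / sigma
    return nd.cdf((sqrt(t) * z1 - z_a) / sqrt(1 - t) + drift)

# Interim look after 250 per arm of a planned 500 per arm (1000-patient)
# trial: delta_hat = 0.15 gives conditional power close to Table 3's 72%.
```

With delta_hat = 0.2, 0.15 and 0.1 the function returns roughly 0.96, 0.72 and 0.30, matching the table to rounding.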

Sample size re-estimation based on interim estimates of the primary effect size parameter must be performed in an unblinded manner (see below). Therefore, an independent statistical group, such as a Data Monitoring Committee (DMC), is usually required to perform the interim analysis. Sometimes the sponsor appoints an internal team located at a different site to perform the interim analysis and make the sample size recommendation, but this must be agreed upon in advance with the regulatory authorities.

Extending the Methodology to Unknown σ

Although the group sequential and adaptive design methods were presented under the assumption that the standard deviation σ is known, they apply equally to the case of unknown σ. One can start out with an initial estimate of σ and a corresponding sample size estimate. Then, at an interim look, one can re-estimate this nuisance parameter, input the updated estimate into the sample size formula, and thereby re-compute the sample size.

An example of sample size re-estimation conducted because of uncertainty about the variance is shown in Figure 17 below. At the beginning of the trial, the planned sample size was estimated at 150 patients based on a standard deviation of 1.0. At the interim analysis, the actual standard deviation was 1.4. Even though the effect size was as originally predicted, an increase in sample size to 295 patients would be required to maintain 90% power. Without the sample size re-estimation, the power at the final analysis would only be 64% and there would be much greater risk of a failed trial.

[Figure 17 schematic: active enrollment with an interim analysis for sample size re-estimation before the final analysis (LPFV). Initial design: Δ = 0.375, σ = 1.0, power = 90%, n = 150. After re-estimation with σ = 1.4: n = 295. If the sample size is not increased, the power at the final analysis would be 64%.]

Figure 17. Hypothetical example of a study in which sample size re-estimation due to uncertainty about σ led to increase in sample size to ensure 90% power was maintained.
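The arithmetic behind Figure 17 can be checked with the same normal-approximation formulas, reading n as patients per arm (an assumption on our part; the figure does not say). The small gap between the 293 computed here and the 295 quoted presumably reflects rounding or a t-distribution correction in the original calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.025, power=0.90):
    """Per-arm n for a two-arm comparison of means, normal approximation."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    return ceil(2 * (sigma * z / delta) ** 2)

def achieved_power(delta, sigma, n_arm, alpha=0.025):
    """Power actually attained with n_arm subjects per arm."""
    nd = NormalDist()
    se = sigma * sqrt(2 / n_arm)
    return nd.cdf(delta / se - nd.inv_cdf(1 - alpha))

# Planned: delta = 0.375, sigma = 1.0 -> 150 per arm for 90% power.
# Interim sigma estimate 1.4 -> about 293 per arm (the figure quotes 295).
# Without the increase, power at sigma = 1.4 drops to about 64%.
```

The 64% figure for the unadjusted design matches the risk of a failed trial described in the text.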

There are two ways to obtain the new sample size in the situation of unknown σ: blinded and unblinded.

In the instance of blinded sample size re-estimation, the sponsor uses pooled data to estimate σ. This is permitted with no penalty to the alpha. It is preferable that the sponsor pre-specify how many times changes are to be made to the sample size, at what time points, and how the new sample size will be calculated. Usually this type of adjustment will not be permitted by regulatory authorities more than once.

For unblinded sample size re-estimation, the sponsor sets up a mechanism (possibly with a DMC) whereby the sample size re-estimation is based on an unblinded estimate of variability (or statistical information) at the interim analysis. Sample size may be altered one or more times but the maximum statistical information must be pre-specified.

If the sponsor agrees that there will be no early stopping for efficacy at the interim looks, no alpha adjustment is necessary. The DMC may monitor the data one or more times and adjust the sample size up or down based on the unblinded estimate of variability and attempt to reach the pre-specified maximum information.

When the sponsor pre-specifies the interim looks at which it is permissible to terminate early for efficacy, the amount of alpha to be spent at each such interim look must be pre-specified. The maximum information must also be pre-specified. The alpha should only be spent at interim looks where early stopping for efficacy is a possibility. Interim looks that are taken solely to estimate the current information need not consume any alpha. The trial then proceeds until either it is terminated early for efficacy or the maximum information is reached.
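The 'maximum information' idea can be made concrete: for a two-arm comparison of means the statistical information is I = n_arm / (2σ²), and the information target for a given power at effect δ is fixed regardless of σ. The sketch below is our own illustration (function names and framing are assumptions), reusing the numbers from Figure 17.

```python
from math import ceil
from statistics import NormalDist

def max_information(delta, alpha=0.025, power=0.90):
    """Information target I_max = ((z_alpha + z_power) / delta)**2; fixed by
    delta and the power requirement, independent of sigma."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    return (z / delta) ** 2

def n_per_arm_for_information(info, sigma):
    """Per-arm n needed to reach the information target, since I = n/(2*sigma**2)."""
    return ceil(2 * sigma ** 2 * info)

# The target for delta = 0.375 at 90% power stays fixed, while the n needed
# to reach it grows as the interim estimate of sigma grows from 1.0 to 1.4.
```

This shows why the DMC can keep adjusting n toward a pre-specified maximum information: the target itself never moves, only the sample size needed to reach it.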

Challenges in the use of flexible clinical trials

The approaches listed above, when used in the right setting with the right safeguards, are statistically valid. We have touched on some of the important statistical issues. Here we will focus on some of the non-statistical 'real world' issues which confront trial designers and reviewers.

From a purely statistical perspective, it is possible to demonstrate that for any given adaptive design, one can construct a corresponding group sequential design that will have a smaller average sample size for the same level of power (see, for example, Tsiatis and Mehta, Biometrika, 2003, vol 90, pages 367-378). For this efficiency gain to be substantial, however, one would have to utilize aggressive early stopping boundaries of a kind not commonly used in group sequential trials. Most group sequential trials utilize conservative stopping boundaries such as the O'Brien-Fleming boundaries discussed earlier, or the Haybittle-Peto boundaries (Pocock S.J., JAMA 2005, vol 294, No 17, pages 2228-2230). In that case the efficiency gain, if any, is likely to be small. As a practical matter, then, there are no major statistical arguments for choosing between the group sequential and the adaptive approaches.

There are, however, real preferences when taking into consideration the required business decisions and their concomitant risks. In the group sequential case, in our example, the sponsor would have to commit to a 4000-patient trial up front, in the hope that the actual effect size will result in an eventual reduction in overall size. In the adaptive case, the sponsor would only have to commit to a 1000-patient trial up front, fully realizing that an additional decision would need to be made at each interim analysis. In the group sequential case (or the standard approach, for that matter), it would be unethical for a sponsor that lacked the resources to perform a 4000-patient trial to commit to starting one. In the adaptive case, the sponsor knows that there is a real likelihood that they will have to commit additional resources beyond the initial 1000-patient commitment to finish the study. However, in this case (extending our example) they would be making this decision after 500 patients have completed the current well-controlled, appropriately designed and managed study containing the precise patient population, as opposed to making this decision based on relatively little Phase II data. There is no question that the overall business risk is diminished in the adaptive case.

Why not, then, use the adaptive design rather than the group sequential approach all the time? There is justifiable fear that sponsors who commit to the smaller up-front number of patients will not, in fact, commit to or be able to increase the sample size even after the interim analysis indicates that an increase is required to maintain power. However, in this case one can argue that there really is no difference from the underpowered fixed design, in that the study will go to 1000 patients either way. In this case, it is a much more informed decision, with significantly more information than existed in the standard approach.

Flexible designs are in general somewhat more complicated than standard designs and require some additional expertise to design and to review. Further, information regarding the results of the current trial must be used to adjust or adapt it appropriately. There is the potential for misuse of this information that could bias the results of the study following the information disclosure. A fundamental underlying assumption of the approach is that the management of the study is no different before and after the interim information is available. It is possible that knowledge of the results of this interim analysis might result in some unwanted change (conscious or unconscious) in the future management of the trial. Even if a DMC were the only group to view the details of the unblinded by-group analyses that are required, the information that would be available to those responsible for carrying out the trial (sponsor, CRO, investigators) might result in some unintended changes in its future management.

Therefore, it is often prudent to perform on-going monitoring of important indicators, preferably by the DMC, to determine whether there are systematic differences before and after the analyses. If there appear to be, then it would be important to understand why, and to correct this as soon as possible. Either way, it would be important to demonstrate in the final analyses that there was no substantial difference before or after the DMC's recommendation(s). Although an added potential source of bias, this does not have to result in bias if care is taken with how the information is disseminated and used. Also, to the extent the study is both randomized and blinded, the possibility of bias is further diminished.

However, there are a number of reasons why a flexible design adds little advantage in some settings. These include situations where the follow-up duration to measure the primary outcome of each patient is long with respect to the time needed to recruit most of the patients. In this instance, there would be no real opportunity for the adaptive approach to adjust the sample size down from the initial sample size since most patients would have been already enrolled at the time the interim analyses can be completed. This in general would also be true for any approach requiring interim analyses to adjust sample size of the current study (including the group sequential approach). Also, for situations where the safety concerns require a minimum number of patients, it would not be appropriate to adjust sample size down below that minimum even if suggested from the interim efficacy analyses. An example of this is in smaller confirmatory studies (such as most medical device studies) where the total sample size is less than 500.

A number of logistical and regulatory actions must be fulfilled in order to avoid compromising an adaptive trial.


(1) The actual algorithm for computing the new sample size must be specified in advance. This is usually implemented through creating a charter for the independent trial management committee charged with the responsibility of performing the unblinded interim analysis and reporting the new sample size to the sponsor.

(2) The sponsor needs to have developed in-house procedures that ensure that the actual algorithm for increasing the sample size is not broadcast throughout the company, especially not to the study investigators.

There are a number of options for utilizing partial data from an on-going confirmatory trial to adjust the sample size of the trial. These approaches mainly depend on what is currently known about δ and σ, and upon other 'real world' considerations. A number of key questions should be asked before attempting to determine if any of these techniques would be of value. These include:

1. Has there not been enough early data to fundamentally understand the procedures for use, and the impact on human physiology, of the experimental treatment?
2. Has there not been enough early data to produce a well-formed hypothesis to test with a confirmatory trial?
3. Is the sample size based more on the minimum required for safety ascertainment than on efficacy?
4. Is the follow-up duration per patient long with respect to the time to recruit most of the patients?
5. Is there no DMC planned or needed for the study?

If the answer to any of these is 'yes', then one should probably not consider a statistical study design that requires interim analyses of unblinded by-group data. If all are answered 'no', then in addition there needs to be a prospective interim analysis plan that clearly specifies the sample size adjustment rules in advance. Also, a data monitoring program should be in place to periodically determine whether the results of, or actions taken as a result of, the interim analyses led to differential treatment of the study groups before and after the analyses.

The Challenges

The use of a flexible paradigm for optimizing clinical trial design can provide a more effective strategy to assess efficacy and safety with high quality, timeliness and efficiency, while limiting the exposure of patients to ineffective or poorly tolerated drug regimens. There are, however, a number of challenges which limit the extent to which these flexible paradigms have been implemented to date.

Precisely because they are so flexible, newer designs require significant statistical analysis, simulation, and logistical work to verify their operating characteristics, and therefore take longer during the planning and protocol development phase.

The regulatory agencies and Institutional Review Boards need to accept the design format for interim analysis. These discussions can sometimes take a very long time, and the company may choose instead to follow the traditional route, the path of least resistance.


These designs require quickly observable responses relative to the patient accrual rate, or good longitudinal models that forecast endpoints in time to adapt dose assignments for future subjects.

Efficient programs, including randomization software and fast computing platforms, are needed.

An infrastructure is needed for rapid communication of responses from trial sites to the central unblinded analysis center and rapid communication of dose assignments to trial sites.

A flexible drug supply process is required to respond to the demand which evolves as information on responses at various doses is gathered in the trial.

Maximizing the use of all potential prior information requires greater collaboration across functional silos in organizations to avoid compartmentalization of data.

Inclusion of a broader sample of datasets may be difficult due to the lack of common data standards.

For novel therapies without proof of concept, competitive hurdles to sharing information considered proprietary inhibit data exchange.

Overcoming internal resistance and aversion to change represents a major hurdle for incorporating the prospective use of novel trial designs and methodologies and modeling and simulation into clinical development programs.

In order to move towards “industrialization” of these new approaches in drug development, a disciplined approach to implementing and sustaining change must be applied across the industry, beginning with education and awareness campaigns targeted to all stakeholders involved in conducting and evaluating clinical trials.

Recommendations & Next Steps

A key barrier to implementation of tools and techniques that advance the quality, timeliness and efficiency of drug development is the ability to work across disciplines and amongst stakeholders to understand how and when to apply these solutions. This white paper makes the following recommendations to address this challenge and provides tools to enable them.

Define and disseminate a common vocabulary and a common understanding of the value of modern trial designs to all stakeholders.

Develop and disseminate guidelines and case studies for assessing situations where tools should be applied and where they should not be utilized.

Create a methodology for dialogue with regulatory authorities to facilitate discussion of clinical strategies which utilize these tools and address potential constraints and issues.

Identify specific solutions to address challenges which inhibit adoption of modern tools and designs.

In order to implement these recommendations, a set of next steps has been defined for the working group.


Review the contents and recommendations of the working group with key stakeholders at the FDA and refine this document as appropriate based on those discussions.

Convene a workshop under the CBI Safe Haven with representation from all stakeholder groups to discuss and debate the issues, barriers to adoption and opportunities identified within this work effort and issue a whitepaper reflecting the stakeholder perspectives as a result of that workshop.

Develop education and communication tools to emphasize dissemination of the learnings, including case studies:

o Publication(s)
o Classroom program
o Written materials
o Web-based learning
o Knowledge-based forum

Establish an ongoing expert panel to continue to update these tools and guidelines as the state of the art evolves in clinical development strategy and trial design.

Identify processes and tools for change management to address challenges in support of relevant stakeholder groups.


Appendix I: Methodology

How the Early Stopping Boundaries are Obtained

The early stopping boundaries displayed in Figure 1 must satisfy the regulatory requirement that the false positive rate (or type-1 error) of the clinical trial is limited to 0.025. That is, they must satisfy the requirement that the probability of crossing any one of them when in fact δ = 0 is only 0.025. If there were no interim monitoring there would be only one boundary, at the end of the trial when all 4000 patients were enrolled. It is well known that for this single-look case, the magnitude of the boundary that the Wald statistic must cross so as to limit the false positive rate to 0.025 is 1.96 units. If, however, the accruing data are to be monitored up to four times, there are four possible chances of reaching a false positive conclusion when δ = 0. Therefore the critical boundary value at these four monitoring points can no longer be 1.96. Since there are now multiple opportunities for crossing a boundary, the criterion for declaring statistical significance must be made stricter than 1.96, or else the overall false positive rate of the trial will be inflated. We have seen in the previous section that this is indeed the case, with the four boundaries being 4.33, 2.96, 2.36 and 2.01.
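The inflation of the false positive rate can be checked numerically. The sketch below (illustrative, not from the white paper) simulates trials under the null hypothesis δ = 0 and compares a single look at the conventional 1.96 boundary with naively reusing 1.96 at all four looks:

```python
# Monte Carlo sketch: under the null, the Wald statistic at information
# fraction t behaves like B(t)/sqrt(t) for a standard Brownian motion B.
import random
import math

random.seed(1)
LOOKS = [0.25, 0.50, 0.75, 1.00]   # information fractions of the four looks
N_SIM = 200_000

def simulate_path():
    """Wald statistics at each look for one simulated null trial."""
    b, t_prev, zs = 0.0, 0.0, []
    for t in LOOKS:
        b += random.gauss(0.0, math.sqrt(t - t_prev))  # Brownian increment
        zs.append(b / math.sqrt(t))
        t_prev = t
    return zs

single = sum(simulate_path()[-1] > 1.96 for _ in range(N_SIM)) / N_SIM
naive  = sum(any(z > 1.96 for z in simulate_path()) for _ in range(N_SIM)) / N_SIM
print(f"single look at 1.96: {single:.4f}   naive 1.96 at four looks: {naive:.4f}")
```

The single-look rate stays near 0.025, while repeatedly applying 1.96 roughly doubles it, which is why the interim boundaries must be stricter.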

The standard way to obtain early stopping boundaries that control the type-1 error at level α is through the use of an α-spending function α(t). This is simply any monotone increasing function defined on the interval [0, 1] with α(0) = 0 and α(1) = α. The variable t is the ratio of the current sample size to the maximum committed sample size and is known as the "information fraction". At any information fraction t, the corresponding value α(t) represents the portion of the total type-1 error that the trial is permitted to use up. Once this function is specified, it is possible to generate stopping boundaries having the property that under the null hypothesis the probability of crossing any boundary at or before time t is α(t). Thus by construction, the probability of crossing any boundary at or before time t = 1, under the null hypothesis δ = 0, must be α = 0.025, the required false positive rate.
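As an illustration, a conservative spending function of the O'Brien-Fleming type in the Lan-DeMets form, α(t) = 2 − 2Φ(z_{α/2}/√t), reproduces the cumulative error levels used in this example. (The paper shows its spending function only graphically, so taking this particular functional form is our assumption.)

```python
# Evaluate an O'Brien-Fleming-type spending function (an assumed form)
# at the four information fractions of the example design.
import math

ALPHA = 0.025  # one-sided type-1 error budget

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p, lo=-10.0, hi=10.0):
    """Inverse normal CDF by bisection (accurate enough for a sketch)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def alpha_spend(t):
    z = phi_inv(1.0 - ALPHA / 2.0)
    return 2.0 * (1.0 - phi(z / math.sqrt(t)))

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"alpha({t}) = {alpha_spend(t):.5f}")
```

The values agree with the cumulative levels quoted in the text: essentially nothing at t = 0.25, about 0.0015 at t = 0.5, about 0.0096 at t = 0.75, and the full 0.025 at t = 1.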

Let us illustrate with the four-look design displayed in Figure 1 (also Figure 16 in main text).


Figure 1: Four-Look Group Sequential Design with Early Stopping at Look 2

Figure 2 below represents the spending function that was used to generate those four boundaries.

Figure 2: Spending Function Generating the Boundaries of Figure 1


Since the boundaries are constructed at sample sizes of 1000, 2000, 3000 and 4000, the corresponding information fractions at which the available α will be spent are t = 0.25, 0.5, 0.75 and 1. Very little of the available α is spent at the first interim look, since α(0.25) = 0.00001. The corresponding first-look stopping boundary must satisfy the requirement that the Wald statistic can cross it with probability only 0.00001 under the null hypothesis that δ = 0. This results in the rather large stopping boundary 4.33. At the second look the cumulative type-1 error available for spending is α(0.5) = 0.0015. The corresponding second-look boundary must now satisfy the requirement that the probability of crossing either the first- or second-look boundaries under the null hypothesis is 0.0015. Using well known mathematical techniques one can calculate that the second-look boundary has to be 2.96 in order to satisfy this probabilistic requirement. At the third look the information fraction is 3000/4000 = 0.75 and the cumulative error available for spending is α(0.75) = 0.0096. The corresponding third-look boundary must satisfy the requirement that the cumulative probability of crossing either the first-, second- or third-look boundaries under the null hypothesis is 0.0096. One can show that in order to satisfy this requirement, the third-look boundary must equal 2.36. At the final look, the cumulative error available for spending is α(1) = 0.025 and the corresponding fourth-look boundary is 2.01. Now the cumulative probability under the null hypothesis of crossing either the first-, second-, third- or fourth-look boundaries is 0.025.
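A quick way to confirm that the four boundaries jointly spend the type-1 error as described is Monte Carlo simulation. This is only a sketch; in practice the boundaries are obtained by numerical integration of the joint distribution of the interim statistics.

```python
# Simulate null trials and count how often the Wald statistic has crossed
# a boundary by each look; the cumulative rates should track the
# spending levels 0.00001, 0.0015, 0.0096 and 0.025.
import random
import math

random.seed(7)
LOOKS = [0.25, 0.50, 0.75, 1.00]
BOUNDS = [4.33, 2.96, 2.36, 2.01]
N_SIM = 400_000

crossed_by = [0, 0, 0, 0]   # trials stopped at or before look k
for _ in range(N_SIM):
    b, t_prev, hit = 0.0, 0.0, None
    for k, t in enumerate(LOOKS):
        b += random.gauss(0.0, math.sqrt(t - t_prev))
        t_prev = t
        if b / math.sqrt(t) > BOUNDS[k]:
            hit = k
            break               # trial stops at first crossing
    if hit is not None:
        for k in range(hit, 4):
            crossed_by[k] += 1

for k in range(4):
    print(f"P(cross by look {k + 1}) ~= {crossed_by[k] / N_SIM:.4f}")
```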

It is convenient to think of the total α as a budget to be allocated, according to the trialist's wishes, at each interim monitoring time point. One could spend all the α in one single look, or one could spread it out over a number of different looks. In the latter case one could spend very little of the available α in the beginning and spend more towards the end of the trial, as was the case for our example. This led to very high stopping boundaries in the beginning and lower ones towards the end. Alternatively one might prefer to spend the available α more aggressively right from the start, as shown in Figure 3.


Figure 3: An aggressive spending function

Boundaries of a four-look group sequential design based on the above spending function would look as shown below in Figure 4.


Figure 4: Boundaries corresponding to the spending function in Figure 3

In conclusion, different spending functions yield different boundary shapes, all preserving the type-1 error α. The trialist can choose the spending function and corresponding boundaries that best satisfy the needs of the trial. The smaller the incremental α that the trialist spends at a given interim look, the higher the stopping boundary at that look will be. In the limiting case, where the trialist has no intention of stopping at all but is only interested in taking an administrative look, none of the available α need be spent at that look.

Sample Size Calculations

The stopping boundaries discussed in the previous section were constructed to satisfy the regulatory requirement that the probability of crossing any one of them under the null hypothesis (i.e., the false positive rate) should not exceed α = 0.025. Once these boundaries have been obtained one can compute the probability of crossing them at any positive value of δ. Suppose one wishes to power the study to have 1 − β power to detect a difference δ = δ₁ with a one-sided level-α test. The sample size N required to achieve this power with a single-look design can be computed as

N = (z_α + z_β)² σ² / δ₁²     (1)

where z_a denotes the upper 100a percentile of the standard normal distribution. This sample size is inflated by an appropriate inflation factor if the trial has one or more interim looks for early stopping; without this inflation, some power would be lost. The magnitude of the inflation factor depends on the number of interim looks, the type of spending function used to compute the boundaries, and the values of α and β. In general the inflation factor is very small if the α is spent conservatively at the beginning of the trial, as is the case in Figure 2. The boundaries created by this spending function are referred to as O'Brien-Fleming boundaries. For a four-look design with O'Brien-Fleming boundaries, α = 0.025 and 90% power, the inflation factor is only about 2%. In other words, if a single-look design achieves 90% power with 100 patients, a corresponding four-look design with O'Brien-Fleming boundaries will require 102 patients to achieve the same power. In contrast, the spending function depicted in Figure 3 spends the α rather aggressively right from the start. The corresponding boundaries displayed in Figure 4 are referred to as Pocock boundaries. For a four-look design with Pocock boundaries, α = 0.025 and 90% power, the inflation factor is about 18%. It is thus seen that the faster we spend the α, the more the maximum sample size gets inflated relative to the sample size needed for a single-look design having the same power.
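For concreteness, here is equation (1) applied with illustrative numbers of our choosing (σ = 1, δ₁ = 0.5, one-sided α = 0.025, 90% power). The formula is written in its generic one-sample form; a two-arm comparison would carry an additional factor. The 2% and 18% inflation factors are the approximate values quoted in the text.

```python
# Single-look sample size from equation (1), then inflated for
# four-look O'Brien-Fleming and Pocock designs (approximate factors).
import math

z_alpha, z_beta = 1.960, 1.282   # upper percentiles for alpha=0.025, beta=0.10
sigma, delta1 = 1.0, 0.5         # illustrative values, not from the paper

n_single = ((z_alpha + z_beta) ** 2) * sigma ** 2 / delta1 ** 2
n_obf = n_single * 1.02          # ~2% inflation (O'Brien-Fleming, 4 looks)
n_pocock = n_single * 1.18       # ~18% inflation (Pocock, 4 looks)

print(f"single look: {math.ceil(n_single)} subjects")
print(f"four-look O'Brien-Fleming: {math.ceil(n_obf)}")
print(f"four-look Pocock: {math.ceil(n_pocock)}")
```

This mirrors the 100-vs-102-patient comparison in the text: conservative early spending costs almost nothing in maximum sample size, while aggressive early spending costs substantially more.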


Appendix II: Glossary

Accrual rate

Rate of build-up of data from a clinical trial

Active Control Arm

The group of subjects randomly assigned to receive a recognized effective treatment (rather than placebo) as the comparator for the study drug.

Adaptive Clinical Trials

Adaptive sampling designs, also called response-adaptive designs, are statistical experiments in which the accruing data (i.e., the observations) are used to adjust the experiment as it is being run. Possible adaptations include the sample size, the allocation of sample size to treatments, addition, deletion or changes to treatment arms, changes to subgroups or inclusion/exclusion criteria, or changes to hypotheses (non-inferiority v. superiority).

Adaptive Seamless Trial Design

Combination of two distinct, normally subsequent, trials in a drug program into a single trial. Information gathered in the first stage of the combined trial is used to adapt the design of the next stage, which follows seamlessly; the information from the learning stage contributes evidence to the overall conclusions. Both stages are defined in a single protocol.

Bayesian Methods

Bayesian methodology relies on the use of probability models to describe parameters of interest (e.g., treatment effect for a drug in development). Bayesian inference uses principles from the scientific method to combine prior knowledge with observed data, producing enhanced, updated information.

Biomarker

A characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. FDA Pharmacogenomics Guidance further defines possible, probable and known valid biomarker categories depending on available scientific information on the marker (1).

Clinical Endpoints

An outcome or medical event that a clinical trial monitors; a characteristic or variable that reflects how a patient feels, functions or survives. Clinical endpoints are observed directly in trials (e.g., fractures, survival) or may be established by assessing an intermediary measurement (e.g., bone mineral density, cholesterol, blood pressure).

Cohort Study

An observational study with more than one group of subjects, or cohorts, as described above.

Conditional Power


The conditional power at an interim look is the probability, given the observed data, that upon completion of the trial the experimental compound will demonstrate efficacy. More specifically, conditional power is defined as the probability that the Wald statistic will cross the final efficacy boundary, conditional on its observed value at the interim look.
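Under the common "current trend" assumption (our choice; the definition above does not fix one), conditional power has a simple closed form: the statistic is assumed to keep drifting at the rate estimated at the interim look. A sketch with hypothetical interim values:

```python
# Conditional power under the current-trend assumption: the B-value
# B(t) = z * sqrt(t) is extrapolated at its estimated drift to t = 1.
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_power(z_k, t_k, z_final):
    drift = z_k / math.sqrt(t_k)             # estimated drift of the B-value
    b_k = z_k * math.sqrt(t_k)               # current B-value
    mean_final = b_k + drift * (1.0 - t_k)   # expected final B-value
    return 1.0 - phi((z_final - mean_final) / math.sqrt(1.0 - t_k))

# e.g. halfway through the trial (t = 0.5) with an observed Wald statistic
# of 2.2 against a final efficacy boundary of 2.01:
print(f"{conditional_power(2.2, 0.5, 2.01):.3f}")
```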

Confirmatory Studies

Normally a later-stage trial (Phase 2b, Phase 3) in which one or more hypotheses are stated in advance and then tested in an adequately controlled trial to provide firm evidence of efficacy (and safety).

CRM-like algorithm

This method escalates the dose until it is clear that the current, or the next, dose has an incidence rate larger than the desired rate. More precisely, the doses are escalated until the probability of dose d_i having a 2% or higher safety incidence rate exceeds 90% (this probability is calculated under a Bayesian framework), or the MTD is reached. The MSD is then defined as the highest dose, if any, for which the probability of having a 2% or higher safety incidence rate is less than 90%.

Continual Reassessment Method (CRM)

The continual reassessment method (CRM) has been proposed as an alternative to a traditional cohort design. Its essential features are the sequential (continuous) selection of the dose level for the next patients based on the dose-toxicity relationship, and the updating of that relationship based on patients' response data using Bayesian calculation.

CRO

Contract Research Organization

Cross-Sectional/Longitudinal Studies

A study in which the presence or absence of disease or other health-related variables is determined in each member of the study population, or in a representative sample, at one particular time. This contrasts with longitudinal studies, in which subjects are followed over a period of time.

Δ

The parameter δ (delta) denotes the true underlying treatment effect. Even though δ is unknown, the trial investigators will usually have in mind a specific value that represents the smallest clinically important delta (SCID) for this clinical trial.

Deterministic Model

A mathematical model in which the parameters and variables are not subject to random fluctuations, so that the system is at any time entirely defined by the initial conditions chosen.

Dichotomised versions of the original variables


Characterisation of a patient as a responder or non-responder on a binary (yes/no) basis relative to the original baseline, to determine whether endpoints were met for that particular patient.

DMC

Data Monitoring Committee.

Dose-ranging study

A clinical trial to compare two or more doses of the same drug to establish the range of doses which are efficacious and well-tolerated, as well as the dose-response relationship of the drug.

Dose-response relationship

The dose-response relationship describes the change in effect on a human or animal caused by differing levels of exposure to a drug.

Enrichment Design

The selection of subpopulations within the general population who are more likely to show a drug benefit than an unselected population. This can range from selecting patients more likely to follow the trial protocol, to those who are likely to have an event that is being measured.

Exploratory Studies

Sometimes called Phase 0, exploratory studies are early or first-in-human trials designed to expedite the development of promising therapeutics by establishing very early on whether the agent behaves in human subjects as was anticipated from preclinical studies. Phase 1 and 2a trials can also be considered exploratory trials, as there is no predefined hypothesis.

Frequentist Methods

Frequentist methods regard the population value as a fixed, unvarying (but unknown) quantity, without a probability distribution. Frequentist methods then calculate confidence intervals for this quantity, or significance tests of hypotheses concerning it.

Go/No Go decision

A decision on whether or not to continue with the development of a drug candidate.

Group Sequential Design


A trial design that allows a look at the data at particular time points, or after a defined number of patients have been entered and followed up, based on a stopping rule derived from repeated significance tests.

'Hits'

Molecules found to have initial activity in screening models designed to discover potential lead candidates for further development.

Late stage attrition

An industry phrase used to describe how a company's drug pipeline is 'whittled down' by drug failures throughout the development process. A large number of compounds enter the pre-clinical phase, and only a small percentage reach the final/NDA stage. Late stage refers to those candidates which fail in Phase 3.

Lead Candidates

Discovery-stage candidates which have met a set of screening criteria to enter the optimization phase prior to selection of clinical candidates.

Likelihood Function

The likelihood, or likelihood function, allows unknown parameters to be estimated from known outcomes.

Longitudinal measurements

Measurements taken chronologically throughout the course of the study.

Maximum Tolerated Dose

Dose that produces grade 3 (severe) or grade 4 (life-threatening) toxicity in 30% or fewer of the patients tested. (Alternative: the maximum dose that can be given safely such that side-effects/toxicity are seen in fewer than 30% of patients.)

Model-based drug development (MBDD)

MBDD utilizes quantitative and scientific data to accurately predict and guide drug discovery and clinical research/trials.

Biological M&S utilizes mathematical modeling to understand genetic, biochemical and physiological networks, pathways and processes underlying disease, and pharmacotherapy.

Pharmacological M&S involves deterministic and stochastic pharmacokinetic/pharmacodynamic (PK/PD) modeling to guide clinical trial design, dose selection, and development strategies.

Statistical M&S applies probability and data analysis for inference and decision making under uncertainty and variability using innovative frequentist as well as Bayesian methodologies.

Non-inferiority


Non-inferiority trials are intended to show that the effect of a new treatment is not worse than that of an active control by more than a specified margin

Non-ResponderA patient in a clinical trial who does not respond, according to the guidelines in the study, to the given treatment

NME

New molecular entity - A medication containing an active substance that has never before been approved for marketing in any form by a regulatory authority.

Orphan Drug

In the United States, an orphan drug is any drug developed under the Orphan Drug Act of January 1983 ("ODA"), a federal law concerning rare ("orphan") diseases, defined as diseases affecting fewer than 200,000 people in the United States (or, in terms of prevalence, fewer than 5 per 10,000 in the community).

Patient Cohort

A group of subjects initially identified as having one or more characteristics in common who are followed over time. In the context of clinical trials, the patients normally have in common the disease state for which the drug is being developed. Other common factors could include weight, age and/or genetic traits relevant to the study.

Pharmacodynamics

Biochemical and physiological effects of drugs and the mechanisms of drug action and the relationship between drug concentration and effect.

Pharmacokinetics

The study of the bodily absorption, distribution, metabolism, and excretion of drugs.

Phase 1

Clinical pharmacology studies, usually in about 20-30 healthy volunteers (sometimes in patients), to determine the safety and tolerability of a drug or treatment, other dynamic effects and the pharmacokinetic profile. Evidence of efficacy may be gained if patients, disease models or biomarkers are used.

Phase 2a

Pilot clinical trials to evaluate efficacy and safety in selected populations who have the disease or condition to be treated, diagnosed or prevented. Phase 2a usually represents the initial studies in which patients are exposed to the drug or treatment. Objectives may focus on dose-response, type of patient, frequency of dosing, and may provide some initial efficacy or proof-of-concept data and safety data in patients, albeit based on a relatively small database.

Phase 2b


Well controlled trials to establish proof of concept in patients who have the disease or condition to be treated, diagnosed or prevented and to determine dose and dosing regimen for confirmatory larger scale Phase 3 trials. These studies may also identify key safety issues and other questions to be addressed in Phase 3 trial protocols.

Phase 3

Multicenter studies in populations of 400 to 3000 patients (or more) for whom the drug or treatment is eventually intended. Phase 3 trials are designed to confirm effectiveness on clinical outcomes, monitor side effects, compare the experimental treatment to commonly used treatments and/or placebo, and collect information that will allow the drug or treatment to be used safely. Trials are also conducted in special groups of patients or under special conditions dictated by the nature of the particular treatment and/or disease. Phase 3 trials often provide much of the information needed for the package insert and labelling of the medicine.

Posterior Probability Distribution

The posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned when the relevant evidence or data is taken into account. One applies Bayes' theorem, multiplying the prior probability distribution by the likelihood function and then normalizing, to get the posterior probability distribution, which is the conditional distribution of the uncertain quantity given the data.
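A minimal numerical illustration (toy numbers of our choosing) of the prior-times-likelihood update for a binomial response rate, where a Beta prior combined with binomial data yields a Beta posterior:

```python
# Conjugate Bayesian update: Beta(a, b) prior on a response rate plus
# binomial data gives a Beta(a + successes, b + failures) posterior.
a, b = 2, 2                    # prior: weakly centred on a 50% response rate
successes, failures = 14, 6    # hypothetical observed data

post_a, post_b = a + successes, b + failures
posterior_mean = post_a / (post_a + post_b)
print(f"posterior mean response rate = {posterior_mean:.3f}")
```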

Predictive Capabilities

Using Bayesian methodologies to ‘predict’ the direction of a clinical trial and make adjustments/adaptations, such as sample size or treatment arms (see adaptive trials) accordingly.

Prior Probability Distribution

In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p is the probability distribution that would express one's uncertainty about p before the experimental data are taken into account. It is meant to attribute uncertainty rather than randomness to the uncertain quantity. A prior may be the purely subjective assessment or ‘beliefs’ of an experienced expert.

Proof Of Concept (POC)

Proof of concept is the evidence to confirm that a drug or treatment mechanism of action is a viable approach to achieve the intended clinical endpoints. Proof of concept is traditionally achieved in a Phase 2a trial before proceeding with more extensive studies, although a series of preclinical and early exploratory clinical studies provide early insights into the likelihood of success and inform the Phase 2 trial design.

Randomization ratio


Ratio of the number of patients assigned to one treatment arm (e.g., placebo) to the number assigned to another (e.g., the active drug).

Responder

A patient in a clinical trial who responds, according to the guidelines in the study, to the given treatment

Sample Size Efficiency

Clinical trial sample size efficiency is related to the difference between the theoretical optimal sample size and the actual study sample size. In this context, the optimal sample size is defined as the minimum number of subjects just required to confidently determine the actual effect size of the primary endpoint, while ensuring that there is enough data to also determine the relative safety of the test treatment or diagnostic.

Sample Size Re-estimation

Starting out with a small initial commitment of patients and factoring in the possibility that the sample size might need to be increased during the course of the trial. This could be accomplished by taking an interim assessment of the treatment effect and reassessing whether the initial plan of a smaller sample size remains viable. The reverse, starting with a large commitment of patients and reducing the number based on interim findings, is also possible.

SCID

Smallest clinically important delta.

Superiority Evaluation

A substantial and consistent demonstration of superiority of active product to placebo.

Stochastic Model

A mathematical model which takes into consideration the presence of some randomness in one or more of its parameters or variables. The predictions of the model therefore do not give a single point estimate but a probability distribution of possible estimates.

Surrogate Endpoint

A biomarker intended to substitute for a clinical endpoint. A surrogate endpoint is expected to predict clinical benefit (or harm, or lack of benefit) based on epidemiologic, therapeutic, pathophysiologic or other scientific evidence. An example is a laboratory measurement of biological activity within the body that indirectly indicates the effect of treatment on a disease state. E.g. CD4 cell counts and viral load are examples of surrogate endpoints for AIDS progression


Type 1 error

The error made when a null hypothesis is rejected although it is actually true. Also called a false positive.

Type 2 error

The error made when one fails to reject a null hypothesis that is actually false. Also called a false negative.

ULN

Upper Limit of Normal

Wald statistic

At each interim look, all the accumulated efficacy data are summarized into an estimate of δ divided by its standard error. This standardized estimate of δ is referred to as a Wald statistic.


References

1. Budget of the US Government, Appendix, FY 1993-2003
2. Parexel Pharmaceutical R&D Statistical Sourcebook, 2002/2003
3. Singh et al. In Vivo: The Business & Medicine Report, 17:10, p. 73, November 2003
4. DiMasi et al. The Journal of Health Economics, 22 (2003) 151-185
5. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther 2001;69:89-95
6. B. Booth & R. Zemmel. Nature Rev Drug Disc 2004
7. Trusheim MR, Berndt ER, Douglas FL. Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev Drug Discov 6, 287-293 (April 2007)
8. Tsiatis and Mehta. Biometrika 2003, vol 90, pages 367-378
9. Pocock SJ. JAMA 2005, vol 294, No 17, pages 2228-2230
