Alzheimer's & Dementia - (2015) 1-10

Research Article

Development of a unified clinical trial database for Alzheimer's disease

Jon Neville a, Steve Kopko b, Steve Broadbent a, Enrique Avilés a, Robert Stafford a, Christine M. Solinsky c, Lisa J. Bain d, Martin Cisneroz a, Klaus Romero a, Diane Stephenson a,*, for the Coalition Against Major Diseases

a Coalition Against Major Diseases (CAMD), Critical Path Institute, Tucson, AZ, USA
b CDISC, Austin, TX, USA
c USC Department of Pharmaceutical Sciences, USA
d Independent Science Writer, Elverson, PA, USA

*Corresponding author. Tel.: +1-520-547-3440; Fax: +1-520-547-3456.
E-mail address: [email protected]
http://dx.doi.org/10.1016/j.jalz.2014.11.005
1552-5260/© 2015 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.
FLA 5.2.0 DTD ■ JALZ1948_proof ■ 26 February 2015 ■ 7:06 pm ■ ce

Abstract
Background: Data obtained in completed Alzheimer's disease (AD) clinical trials can inform decision making for future trials. Recognizing the importance of sharing these data, the Coalition Against Major Diseases created an Online Data Repository for AD (CODR-AD) with the aim of supporting accelerated drug development.
Objective: The aim was to build an open access, standardized database from control arm data collected across many clinical trials.
Methods: Comprehensive AD-specific data standards were developed to enable the pooling of data from different sources. Nine member organizations contributed patient-level data from 24 clinical trials of AD treatments.
Results: CODR-AD consists of pooled, standardized control arm data from 24 trials, currently comprising 6500 subjects. The Alzheimer's Disease Assessment Scale-cognitive subscale 11 is the main outcome, and specific covariates are also included.
Conclusions: CODR-AD represents a unique, integrated, standardized clinical trials database available to qualified researchers. The pooling of data across studies facilitates a more comprehensive understanding of disease heterogeneity.
© 2015 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

Keywords: Alzheimer's disease; Clinical trials database; Placebo data; Data standardization; Data integration; Facilitated access

1. Introduction

Alzheimer's disease (AD) currently affects more than 36 million people worldwide, and its prevalence is expected to triple by 2050 [1]. Yet, despite intensive efforts, there are no approved disease-modifying products capable of slowing or arresting the disease. Recent trials of AD drugs have raised concerns about the path forward for drug development. They have also highlighted the importance of learning as much as possible from trials that have already been conducted for therapeutic candidates. Sharing the data collected in those trials has thus been recognized as an essential, albeit challenging, component of drug development efforts [2].

The U.S. Food and Drug Administration (FDA) recognized the urgency of addressing the public health crisis that stems from a failure to translate scientific progress into new therapies. In response, it launched the Critical Path Initiative in 2004 [3] to drive innovation in the treatment of major diseases such as AD, cancer, and diabetes. In 2005, the Critical Path Institute (C-Path) was created as a public–private partnership to deliver on the mission of the Critical Path Initiative, specifically to improve the efficiency of drug and medical device development through the creation of broadly

accepted standards and tools. C-Path is a fully independent 501(c)(3) nonprofit institute composed of seven precompetitive consortia (www.cpath.org), including the Coalition Against Major Diseases (CAMD). The mission of CAMD is to develop new technologies and methods to accelerate progress in treating neurodegenerative diseases, namely AD and Parkinson's disease. CAMD serves as a neutral third party. It brings together pharmaceutical companies, research organizations, patient advocacy organizations, regulatory and other government agencies, and academia to address critical needs in three major cross-cutting areas: data sharing, disease modeling, and biomarkers [4,5].

Among the first issues CAMD addressed was the need to combine disparate clinical data contributed by multiple organizations. The Alzheimer's Disease Neuroimaging Initiative (ADNI) [6] provides an instructive example of how data sharing fuels progress. However, because ADNI is purely observational, it is necessary to understand how disease progression in ADNI subjects compares with that observed in other populations. This applies particularly to clinical trial subjects enrolled at multiple global sites. Therefore, it is essential to obtain data from randomized samples of subjects that are more representative of global clinical trial populations.

This manuscript describes the process by which CAMD developed an online repository for clinical trial data obtained in globally executed randomized controlled AD clinical studies (C-Path Online Data Repository-Alzheimer's disease; CODR-AD).

2. Methods

2.1. The CDISC standard, study data tabulation model

Establishing and conforming to comprehensive data standards was essential to the development of a database that enables the pooling of data from different sources. For this, CAMD partnered with the Clinical Data Interchange Standards Consortium (CDISC) [7], a nonprofit organization that focuses on developing global standards for clinical trial data collection. Regulators, industry, and other research organizations prefer CDISC standards for several purposes: facilitating regulatory review, aggregating and querying data, sharing data between entities, and streamlining the acquisition and analysis of data. In 2012, when the Prescription Drug User Fee Act was reauthorized, CDISC was recognized as an example of an organization that develops the kind of open standards needed to ensure efficient review of medical products. Such standards will be required for regulatory submissions to the agency by the end of FY2017 [8].

The foundational Study Data Tabulation Model (SDTM) standard, as it existed at the start of CODR development, was insufficient for representing the AD-specific data of interest to CAMD. To address this issue, CAMD worked with CDISC to develop a new AD therapeutic area standard that accommodates additional data elements relevant to AD clinical trials. This therapeutic area standard included the following:
- scores from the Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-Cog) and the Mini-Mental State Examination (MMSE);
- β-amyloid and tau biomarkers;
- apolipoprotein E (APOE) genotype, because the presence of the APOE ε4 allele is the strongest genetic risk factor for AD identified thus far [9].

A key goal of the database was to support the development of quantitative modeling and simulation tools. The variables and domains selected for standardization were therefore those deemed necessary for developing a drug-disease-trial model [10]. A team of clinical trial researchers and data standards experts developed the proposed AD-specific standards, which were then vetted through a public review and comment process. The resulting standards for AD clinical trials were published [11] and represent the first disease-specific therapeutic area standards. Table 1 summarizes the more salient concepts captured by the SDTM domains contained in the database.

As development of the standards progressed, it became increasingly clear that the standards would serve two purposes. They would facilitate the pooling of data from legacy clinical trials, and they would also provide a resource for prospectively collecting data in new trials without the need for remapping after the fact.

SDTM defines how clinical study data should be structured for submissions to the FDA and other regulatory authorities. SDTM is suited to collecting data of various types and storing them in a relatively small number of observation classes. For example, it preserves all data collected at an individual visit by making use of "long" data structures.

"Long" data sets are generally preferred over "wide" data sets for storing data when subject measures are repeated longitudinally. In a long data set, each variable (for example, the test name or the result) is a column heading, and separate observations are captured in different rows. In a "wide" data set, by contrast, each observation is captured as a separate variable (i.e., in a separate column). Long structures thus lead to fewer "holes" in the data set when some subjects have more observations than others or are missing some observations. Long data sets also make it easier to develop standardized programs that operate on this fixed, standard data format. Conversely, wide data sets are generally preferred for data capture and some types of analysis.

The long database structure may be less intuitive to researchers accustomed to working with analysis subsets. Its flexibility was nonetheless important because the AD database includes disparate data and heterogeneous subjects. SDTM was therefore appropriate for the intended CAMD database, given the longitudinal measures repeated across time in AD trials, particularly when the number of observations varies between subjects. Transforming between the two formats is typically a simple task in most statistical software packages.
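The remark that converting between long and wide layouts is simple can be shown with a minimal sketch in plain Python (standard library only). The record layout (subject, week, test code, result) and the test codes ADCTOT and MMSTOT are hypothetical and only loosely modeled on an SDTM findings domain such as QS. They are not the CODR-AD schema.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# One long-format record: (USUBJID, week, test code, numeric result).
# Layout and test codes are hypothetical, loosely modeled on SDTM QS.
LongRow = Tuple[str, int, str, float]
Wide = Dict[str, Dict[int, Optional[float]]]


def long_to_wide(rows: List[LongRow], testcd: str) -> Wide:
    """Pivot one test to wide form: subject -> {week: result}.

    Weeks a subject did not attend become None, i.e. the "holes"
    that the long layout avoids storing.
    """
    weeks = sorted({week for _, week, code, _ in rows if code == testcd})
    wide: Wide = defaultdict(lambda: dict.fromkeys(weeks))
    for subj, week, code, result in rows:
        if code == testcd:
            wide[subj][week] = result
    return dict(wide)


def wide_to_long(wide: Wide, testcd: str) -> List[LongRow]:
    """Reverse the pivot, dropping empty cells instead of storing them."""
    return [
        (subj, week, testcd, result)
        for subj, by_week in wide.items()
        for week, result in by_week.items()
        if result is not None
    ]


rows: List[LongRow] = [
    ("S1", 0, "ADCTOT", 20.0), ("S1", 12, "ADCTOT", 22.0), ("S1", 24, "ADCTOT", 25.0),
    ("S2", 0, "ADCTOT", 18.0), ("S2", 12, "ADCTOT", 19.5),  # S2 missed week 24
    ("S1", 0, "MMSTOT", 22.0),
]
wide = long_to_wide(rows, "ADCTOT")
print(wide["S2"])  # {0: 18.0, 12: 19.5, 24: None}
```

The wide view has to materialize a None cell for the missed visit. The long view never stores it, which is the property the text cites for preferring long structures when the number of observations varies between subjects.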

Table 1
SDTM domains used for CODR-AD

CDISC domain | Abbreviation | Observation class | Contents
Demography | DM | Special purpose | Age; gender; race; ethnicity; country
Subject characteristics | SC | Findings | APOE genotype*; MTHFR genotype*
Concomitant medications | CM | Interventions | Acetylcholinesterase inhibitors†; memantine†; general medications
Adverse events | AE | Events | Event; severity; duration
Medical history | MH | Events | Primary diagnosis (MCI or AD); family history of AD; general medical history
Vital signs | VS | Findings | SBP, DBP; heart rate; temperature; weight, height; BMI; respiratory rate
Questionnaires | QS | Findings | ADAS-Cog; MMSE; others as collected may be present but are not standardized
Laboratory results | LB | Findings | All labs collected, mapped to SDTM; controlled terminology compliance was out of scope in LB

Abbreviations: CDISC, Clinical Data Interchange Standards Consortium; CODR-AD, C-Path Online Data Repository for Alzheimer's disease; MCI, mild cognitive impairment; AD, Alzheimer's disease; SBP, systolic blood pressure; DBP, diastolic blood pressure; BMI, body mass index; ADAS-Cog, Alzheimer's Disease Assessment Scale-cognitive subscale; MMSE, Mini-Mental State Examination.
NOTE. A list of the CDISC Study Data Tabulation Model domains (data sets) used in CODR-AD, their corresponding observation classes, and a summary of the types of data stored in each.
*Not all studies collected genotype. Some study sponsors collected genotype but did not provide it because of issues with informed consent.
†Contributors were asked to supply memantine and acetylcholinesterase inhibitor names as generic names when present in the data. All other concomitant medication names are provided verbatim (as collected) and may or may not be decoded to generic names.

2.2. Collecting and standardizing data

With the standards in place, patient-level data from the control arms of relevant trials were remapped and used to populate the database. The scope of patient-level data requested was extensive. It included characteristics such as demographics, medical history, and subject disposition, as well as outcomes data such as longitudinal ADAS-Cog scores, baseline MMSE scores, and adverse events. The common-denominator longitudinal cognitive endpoint was the ADAS-Cog 11. At the case-report-form level, sponsors administered its items in different orders. A concordance analysis showed that, despite this variation in item administration, the level of agreement was adequate and should not affect the interpretation of the ADAS-Cog 11 (unpublished data). Members were also asked to contribute tables of subject-visit records, disposition events, laboratory results, vital signs, and concomitant medications. APOE genotype was requested as categorical, allele-level isoform data when available. Each of the two alleles could take a categorical value of 2, 3, or 4, which enabled the derivation of a complete APOE genotype for each patient. CAMD partnered with the data standards experts within each member company to remap existing clinical data to the newly developed AD CDISC standard. This resource-intensive step was a critical success factor for the consortium, because data were disparate in nature across studies and could not be pooled for analysis in their original form.
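The allele-level APOE request described above lends itself to a small derivation step. The following sketch uses a hypothetical function, apoe_genotype. It combines two isoform codes into three outputs: a genotype label, the number of ε4 alleles (a disease-progression covariate named in the Results), and the homozygous/carrier/noncarrier grouping defined in the Fig. 1 caption. It is an illustration under those assumptions, not CAMD's derivation code.

```python
from typing import Tuple

VALID_ISOFORMS = {2, 3, 4}
E4_STATUS = {0: "noncarrier", 1: "carrier", 2: "homozygous"}


def apoe_genotype(allele1: int, allele2: int) -> Tuple[str, int, str]:
    """Combine two allele-level APOE isoform codes (each 2, 3, or 4).

    Returns (genotype label, number of ε4 alleles, ε4 status), using the
    homozygous/carrier/noncarrier definitions given for Fig. 1F.
    """
    if allele1 not in VALID_ISOFORMS or allele2 not in VALID_ISOFORMS:
        raise ValueError(f"unexpected APOE isoform codes: {allele1}, {allele2}")
    low, high = sorted((allele1, allele2))  # order-independent label
    e4_count = [allele1, allele2].count(4)
    return f"ε{low}/ε{high}", e4_count, E4_STATUS[e4_count]


print(apoe_genotype(4, 3))  # ('ε3/ε4', 1, 'carrier')
```

Sorting the alleles makes the label independent of the order in which a contributor recorded them, so ε4/ε3 and ε3/ε4 pool into the same category.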

Data were remapped in two stages: logical mapping and programmatic transformation. In the logical mapping stage, a source-to-target (i.e., legacy-to-standard) specification was developed. It provided rules for creating new, standardized data sets from the existing variables in the legacy data sets. This step could not be fully automated. It took many person-hours and required collaboration between data managers, programmers, and often clinical subject matter experts to ensure that the clinical utility and meaning of the data were not compromised. The legacy data usually lacked the metadata or documentation needed to understand them fully. To avoid confusion and error, questions therefore had to be reviewed with a subject matter expert internal to the contributing organization. The process often involved splitting or concatenating variables. It also involved separating variables that had been grouped together in the legacy data into multiple data sets, according to the target standard format (SDTM). In the programmatic transformation stage, clinical data programmers wrote scripts and programs that executed the mapping specification to create the standardized data sets. In almost all cases, the contributors performed the data mapping.
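The two remapping stages can be sketched in a few lines, assuming a legacy data set with one wide row per patient visit. The dictionary QS_SPEC plays the role of the source-to-target specification, and the hypothetical function legacy_to_qs plays the role of the transformation program. All legacy column names (PATID, ADAS_TOTAL, MMSE_TOTAL) and test codes are invented for illustration. The target names (STUDYID, DOMAIN, USUBJID, QSSEQ, QSTESTCD, QSTEST, QSSTRESN, VISIT) are standard SDTM QS variables, but a real mapping involves many more variables and rules.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical source-to-target specification (the "logical mapping" stage):
# legacy column -> (SDTM QSTESTCD, QSTEST). Codes are illustrative only,
# not official CDISC controlled terminology.
QS_SPEC: Dict[str, Tuple[str, str]] = {
    "ADAS_TOTAL": ("ADCTOT", "ADAS-Cog 11 Total Score"),
    "MMSE_TOTAL": ("MMSTOT", "MMSE Total Score"),
}


def legacy_to_qs(study_id: str, legacy_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Programmatic transformation stage: apply QS_SPEC to wide legacy rows."""
    records: List[Dict[str, Any]] = []
    seq: Dict[str, int] = {}  # running QSSEQ counter per subject
    for row in legacy_rows:
        # Concatenate study and legacy patient IDs into one unique subject ID.
        usubjid = f"{study_id}-{row['PATID']}"
        # Split one wide legacy row into one QS record per mapped test.
        for legacy_col, (testcd, test) in QS_SPEC.items():
            value = row.get(legacy_col)
            if value is None:  # not collected at this visit: emit no record
                continue
            seq[usubjid] = seq.get(usubjid, 0) + 1
            records.append({
                "STUDYID": study_id, "DOMAIN": "QS", "USUBJID": usubjid,
                "QSSEQ": seq[usubjid], "QSTESTCD": testcd, "QSTEST": test,
                "QSSTRESN": float(value), "VISIT": row["VISIT"],
            })
    return records


legacy = [
    {"PATID": "001", "VISIT": "BASELINE", "ADAS_TOTAL": 24, "MMSE_TOTAL": 21},
    {"PATID": "001", "VISIT": "WEEK 12", "ADAS_TOTAL": 26, "MMSE_TOTAL": None},
]
for rec in legacy_to_qs("1000", legacy):
    print(rec["USUBJID"], rec["QSSEQ"], rec["QSTESTCD"], rec["QSSTRESN"], rec["VISIT"])
```

Keeping the specification as data, separate from the code that applies it, mirrors the division of labor described above. Subject matter experts can review the mapping rules without reading the transformation program.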

Because errors could be introduced during remapping, the final, crucial step was validation to demonstrate the accuracy of the data. In this phase, data were checked for conformance to the SDTM standard. CAMD data management performed this task with OpenCDISC, an open-source, freely available validation program [12]. OpenCDISC checked conformance to approximately 200 rules covering terminology, structure, and cross-dataset agreement. CAMD performed additional checks for conformance to new terminology rules that had not yet been incorporated into the OpenCDISC software. The program generated validation reports, which CAMD then annotated with instructions or requests for clarification for the contributors. Validating and reconciling errors was often an iterative process. CAMD requested changes from the contributors and revalidated until the data were suitable for the production database.
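The three kinds of checks named above (structure, terminology, and cross-dataset agreement) can be illustrated with a sketch that applies a handful of simplified rules to QS-style records. This is not OpenCDISC and does not reproduce its rule set. The function name validate_qs, the required-variable list, and the allowed test codes are all assumptions chosen for the example.

```python
from typing import Any, Dict, List, Set

# Simplified, hypothetical rules for illustration only.
REQUIRED_QS = ("STUDYID", "DOMAIN", "USUBJID", "QSSEQ", "QSTESTCD")
ALLOWED_QSTESTCD = {"ADCTOT", "MMSTOT"}  # stand-in for a terminology list


def validate_qs(qs: List[Dict[str, Any]], dm_subjects: Set[str]) -> List[str]:
    """Return human-readable issues; an empty list means all checks passed."""
    issues: List[str] = []
    for i, rec in enumerate(qs):
        # Structure: required variables must be present and non-empty.
        missing = [v for v in REQUIRED_QS if rec.get(v) in (None, "")]
        if missing:
            issues.append(f"record {i}: missing {', '.join(missing)}")
        if rec.get("DOMAIN") != "QS":
            issues.append(f"record {i}: DOMAIN is {rec.get('DOMAIN')!r}, expected 'QS'")
        # Terminology: test codes must come from the agreed list.
        if rec.get("QSTESTCD") not in ALLOWED_QSTESTCD:
            issues.append(f"record {i}: QSTESTCD {rec.get('QSTESTCD')!r} not in terminology")
        # Cross-dataset agreement: every QS subject must exist in DM.
        if rec.get("USUBJID") not in dm_subjects:
            issues.append(f"record {i}: USUBJID {rec.get('USUBJID')!r} not found in DM")
    return issues


good = {"STUDYID": "1000", "DOMAIN": "QS", "USUBJID": "1000-001", "QSSEQ": 1, "QSTESTCD": "ADCTOT"}
bad = {"STUDYID": "1000", "DOMAIN": "QS", "USUBJID": "1000-999", "QSSEQ": 2, "QSTESTCD": "ADAS"}
for issue in validate_qs([good, bad], {"1000-001"}):
    print(issue)
```

A report of this kind supports the iterative loop described in the text. The contributor fixes the flagged records, the data are revalidated, and the cycle repeats until the list comes back empty.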


2.3. Data deidentification

Data in the CODR-AD database were deidentified in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor requirements [13]. Conforming to these guidelines entailed removing 18 so-called "identifiers" from the data, such as name, address, and social security number. Most of these identifiers are not typically recorded in clinical trial data and were therefore not supplied to CAMD. To ensure compliance with the HIPAA Safe Harbor requirements, two further transformations were applied where the relevant data were available. Any age greater than 89 years was converted to "999." Full year-month-day dates were converted to an integer representing the number of days elapsed from each subject's reference start date (defined as day 1).
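The two transformations described above can be sketched as small functions. The age rule follows the text directly. The study-day rule assumes the SDTM --DY convention, in which the reference start date is day 1 and there is no day 0, so the day before the reference start is day -1. The function names are hypothetical.

```python
from datetime import date
from typing import Optional

AGE_CAP = 89     # ages above this are masked, per the text
AGE_MASK = 999   # replacement value described in the text


def mask_age(age: Optional[int]) -> Optional[int]:
    """Replace any age over 89 years with the masking value 999."""
    if age is None:
        return None
    return AGE_MASK if age > AGE_CAP else age


def study_day(event: date, reference_start: date) -> int:
    """Convert a full date to a day count relative to the reference start.

    The reference start date is day 1. Following the SDTM --DY convention
    (an assumption here), there is no day 0: the day before is -1.
    """
    delta = (event - reference_start).days
    return delta + 1 if delta >= 0 else delta


ref = date(2008, 3, 10)
print(mask_age(92), study_day(date(2008, 3, 24), ref), study_day(date(2008, 3, 9), ref))
# 999 15 -1
```

Relative day counts preserve the intervals that longitudinal analyses need, such as time from baseline to each ADAS-Cog assessment, while removing the calendar dates themselves.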

2.4. Mixed effects modeling

CAMD developed a clinical trial simulation tool that uses the CODR-AD database as one source of data. The tool is based on mixed effects population models for disease progression, placebo effect, and symptomatic drug effect, together with a Weibull survival model for patient dropouts. These approaches are suited to identifying the sources of variability that drive, for example, varying rates of disease progression within specific subpopulations or varying probabilities of patients dropping out. Such models can identify these subpopulations, and quantify the impact of varying rates of progression on the design and analysis of clinical trials, only when multiple data sources are integrated. This approach is supported by the results described by Rogers et al. [14] and by the endorsement decisions from the FDA [15] and the EMA [16]. See Rogers et al. [14] for a detailed methodology of mixed effects model development.
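To make the combination of a mixed effects progression model and a Weibull dropout model concrete, the sketch below simulates ADAS-Cog trajectories. It uses a generic linear model with subject-level random intercepts and slopes, and draws each subject's dropout time from a Weibull distribution. This is a deliberately simplified stand-in, not the CAMD model (see Rogers et al. [14] for that). Every parameter value is a placeholder, not an estimate from CODR-AD, and the names Params, weibull_survival, and simulate_subject are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Params:
    """Illustrative placeholder values, not estimates from CODR-AD."""
    intercept: float = 22.0       # typical baseline ADAS-Cog score
    intercept_sd: float = 6.0     # between-subject SD of baseline
    slope: float = 0.1            # typical progression, points per week
    slope_sd: float = 0.05        # between-subject SD of progression rate
    resid_sd: float = 2.0         # within-subject residual SD
    weibull_scale: float = 200.0  # dropout scale, weeks
    weibull_shape: float = 1.5    # > 1 means the dropout hazard rises over time


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """Probability that a subject is still enrolled at week t."""
    return math.exp(-((t / scale) ** shape))


def simulate_subject(p: Params, weeks: List[int], rng: random.Random) -> List[Tuple[int, float]]:
    """Simulate one subject's observed scores up to their dropout time."""
    eta0 = rng.gauss(0.0, p.intercept_sd)  # subject-specific random intercept
    eta1 = rng.gauss(0.0, p.slope_sd)      # subject-specific random slope
    dropout_week = rng.weibullvariate(p.weibull_scale, p.weibull_shape)
    observed = []
    for t in weeks:
        if t > dropout_week:
            break  # no observations after dropout
        mean = (p.intercept + eta0) + (p.slope + eta1) * t
        observed.append((t, mean + rng.gauss(0.0, p.resid_sd)))
    return observed


rng = random.Random(2015)
trial = [simulate_subject(Params(), [0, 12, 24, 52, 78], rng) for _ in range(200)]
completers = sum(len(s) == 5 for s in trial)
print(f"{completers} of {len(trial)} simulated subjects completed week 78")
```

In this model each subject draws its own intercept and slope, so the between-subject spread of simulated scores widens at later visits. The Weibull shape parameter controls whether dropout accelerates (shape > 1) or slows (shape < 1) over the course of the trial.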

The same concept applies to the placebo effect function, which captures relevant baseline sources of variability: baseline age and severity, gender, and APOE ε4 genotype. This quantitative description of the placebo effect allows design teams to envision scenarios for the magnitude, duration, and variability of the placebo response, according to the entry criteria selected for the simulated trial [17]. As with any longitudinal modeling approach, the variance of the predictions generally increases as a function of time. These quantitative tools are, by nature, continuously evolving, and they are refined as additional data become available [17].
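The statement about variance over time can be made precise for the simplest case: a linear model with a random intercept and a random slope. This is an illustrative assumption, not the structure of the CAMD model. Suppose subject i is observed at time t_ij. The random intercept η_0i and random slope η_1i have variances ω_0² and ω_1² and covariance ω_01. The residual ε_ij is independent of them, with variance σ². The model and its implied variance are then:

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + \eta_{0i}) + (\beta_1 + \eta_{1i})\,t_{ij} + \varepsilon_{ij},\\
\operatorname{Var}(y_{ij}) &= \omega_0^2 + 2\,\omega_{01}\,t_{ij} + \omega_1^2\,t_{ij}^2 + \sigma^2 .
\end{aligned}
```

The derivative of the variance with respect to t is 2ω_01 + 2ω_1² t. It is nonnegative for all t ≥ 0 whenever ω_01 ≥ 0, and it becomes positive for large t whenever ω_1² > 0. This is why the spread of predicted trajectories typically widens over the course of a trial.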

2.5. Consortium approach

A key factor in the success of the CODR-AD database was the use of a consortium approach to sharing data and information. All full-member organizations of CAMD assigned a representative to the CAMD Coordinating Committee, which determines the direction, budget, and policies of CAMD. Sharing clinical trial data, even when limited to the control arms, is not without perceived risk to the contributor. To mitigate this risk and address the members' concerns, CAMD used two instruments: a consortium legal agreement, and a separate data use agreement that spells out the acceptable use and access policies for the database. The consortium legal agreement also specifies that all publications produced by CAMD must be presented to the Coordinating Committee for review and input before being submitted for publication.

3. Results

To date, CAMD has received data on a total of 6500 subjects from 24 remapped studies of AD and mild cognitive impairment (MCI). These data come from nine member organizations: Abbott (now AbbVie), the Alzheimer's Disease Cooperative Study, AstraZeneca, Eisai, Forest, GlaxoSmithKline, Johnson & Johnson, Pfizer, and Sanofi [17]. By diagnostic stage, the database includes subjects with MCI (n = 1041), mild to moderate AD (n = 4936), moderate to severe AD (n = 146), and severe AD (n = 377).

Table 2 summarizes the trials and baseline subject characteristics, and Fig. 1 presents descriptive statistics of the subjects. Approximately 3200 subjects formed the analysis data set that the CAMD modeling and simulation team used in an integrated analysis. This analysis was one key component in the development of a clinical trial simulation tool for mild and moderate AD [14]. The EMA and the FDA have both recently endorsed the tool, the first drug-disease-trial model to achieve a regulatory decision [15,16]. In addition to CODR data, the modeling tool incorporated patient-level data from ADNI and summary data from the literature. This quantitative drug development tool lets users simulate phase 2 and phase 3 trials in patients with mild and moderate AD, based on longitudinal ADAS-Cog scores and all their sources of variability (Fig. 2). Relevant covariates for disease progression include gender, number of APOE ε4 alleles, baseline age, and baseline disease severity (captured by the baseline MMSE score).

The CAMD consortium members agreed to make the CODR-AD database available to qualified external researchers. The aim was to maximize the impact of the investment in the AD database beyond the primary goals of the consortium. A growing number of diverse research questions are now being addressed by analyzing the CODR-AD database (Table 3). The field increasingly recognizes that data sharing is critical for identifying subtle signals in heterogeneous diseases. Such strategies will help catalyze personalized medicine and de-risk drug development, an urgent need for AD.

Table 2
Summary characteristics of trials and subjects in CODR-AD

STUDYID | Duration (weeks) | N | Female (%) | Years since DX, mean (range) | Background therapy
1000 | 12 | 102 | 58.8 | 2.5 (<1–13) | Yes
1009 | 12 | 164 | 55.5 | 0.9 (<1–11) | No
1013 | 78 | 719 | 50.2 | 2 (<1–10) | Both
1014 | 78 | 644 | 56.2 | 2.1 (<1–11) | Both
1055 | 52 | 140 | 58.6 | NA* | No
1056 | 54 | 494 | 55.9 | 2.5 (<1–20) | Both
1057 | 54 | 500 | 61.4 | 2.1 (<1–10) | Both
1058 | 24 | 166 | 59 | 1.5 (<1–10) | No
1105 | 78 | 326 | 50.9 | 2.2 (<1–12) | Yes
1107 | 24 | 146 | 61 | 2.1 (<1–11) | No‡
1131 | 24 | 57 | 59.7 | 2.6 (<1–10) | No‡
1132 | 52 | 412 | 43.5 | 3.3 (<1–24) | No
1133 | 30 | 162 | 61.1 | NA* | No‡
1134 | 24‡ | 105 | 81.9 | NA* | No‡
1135 | 30‡ | 274 | 55.1 | NA* | No‡
1136 | 52 | 144 | 59 | NA* | No‡
1137 | 24 | 216 | 50.5 | 3.6† (<1–10) | Yes
1138 | 24 | 202 | 57.4 | 3.4† (<1–20) | No
1139 | 24 | 167 | 67.7 | 5.6 (<1–19) | No‡
1140 | 24 | 137 | 42.3 | 2.6 (<1–20) | No‡
1141 | 104 | 492 | 55.2 | 0.3 (<1–5) | No‡
1142 | 78 | 409 | 56 | 4.4 (<1–20) | Both
1143 | 24‡ | 105 | 82.9 | 5.4 (<1–20) | No‡
1144 | 54‡ | 217 | 64.5 | 3.6 (<1–13) | No‡
Total N = 6500

Abbreviations: CODR-AD, C-Path Online Data Repository for Alzheimer's disease; DX, diagnosis.
NOTE. Study ID is the unique identifier assigned to each study by CODR. N is the number of patients randomized to the control arm in each study. Years since DX is the mean number of years since the diagnosis of AD or MCI at the start of each study; the range in years is given in parentheses. Background therapy indicates whether studies enrolled patients who were stably treated with memantine, an acetylcholinesterase inhibitor, or both at trial start. "Yes": such therapy was an inclusion criterion. "No": it was an exclusion criterion. "Both": it was neither an inclusion nor an exclusion criterion.
*Data not available; could not be derived because the date of diagnosis was not provided.
†Studies 1137 and 1138: Years since DX was calculated from a supplemental variable for the estimated start of cognitive problems, as collected in these studies, because a formal diagnosis date was not available in the medical history.
‡These values were determined from the presence or absence of acetylcholinesterase inhibitors or memantine in the data, because neither protocols nor clinicaltrials.gov listings were available to make this determination.

4. Discussion

Several other AD databases are available to researchers, but CODR-AD is unique in one respect. It is the first database available to qualified researchers that pools patient-level records from clinical trials by adhering to an open standard. Because the data are pooled in this way, analysts can query all trials, or any subset of them, without rewriting programming statements for each new study. The pooled studies provide longitudinal results from a variety of assessment tools. These results help researchers better understand how the disease progresses and identify critical points along the disease continuum where intervention may be most effective.
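A small sketch shows why a shared standard removes study-specific code: every study uses the same variable names, so one function can summarize all studies, or any subset, unchanged. Several elements are hypothetical simplifications: the function name, the test code ADCTOT, and the use of VISITNUM to hold the planned study week. The summary itself (mean change from baseline among subjects observed at the target week) is only an example query. The records are toy values, not CODR-AD data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping, Optional, Set, Tuple


def mean_change_from_baseline(
    qs: Iterable[Mapping],
    testcd: str,
    week: int,
    studies: Optional[Set[str]] = None,
) -> Dict[str, float]:
    """Per-study mean change from week 0 to `week` for one test code.

    Only subjects with both a week-0 and a target-week value contribute.
    `studies` restricts the query to a subset of trials; None means all.
    """
    by_subject: Dict[Tuple[str, str], Dict[int, float]] = defaultdict(dict)
    for r in qs:
        if r["QSTESTCD"] != testcd:
            continue
        if studies is not None and r["STUDYID"] not in studies:
            continue
        # Assumption: VISITNUM holds the planned study week.
        by_subject[(r["STUDYID"], r["USUBJID"])][r["VISITNUM"]] = r["QSSTRESN"]

    changes: Dict[str, List[float]] = defaultdict(list)
    for (study, _subject), values in by_subject.items():
        if 0 in values and week in values:
            changes[study].append(values[week] - values[0])
    return {study: mean(deltas) for study, deltas in changes.items()}


def rec(study: str, subj: str, week: int, score: float) -> dict:
    """Build a toy QS record (hypothetical values)."""
    return {"STUDYID": study, "USUBJID": f"{study}-{subj}", "QSTESTCD": "ADCTOT",
            "VISITNUM": week, "QSSTRESN": score}


pooled = [
    rec("1000", "001", 0, 20.0), rec("1000", "001", 12, 23.0),
    rec("1000", "002", 0, 18.0), rec("1000", "002", 12, 19.0),
    rec("1009", "001", 0, 25.0), rec("1009", "001", 12, 24.0),
]
print(mean_change_from_baseline(pooled, "ADCTOT", 12))            # {'1000': 2.0, '1009': -1.0}
print(mean_change_from_baseline(pooled, "ADCTOT", 12, {"1009"}))  # {'1009': -1.0}
```

Adding a twenty-fifth study to the pooled records would require no change to the function, which is the practical benefit the paragraph describes.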

Moreover, such data repositories align with the goals of C-Path and the FDA to use precompetitive data sharing as a means to improve efficiency in clinical trials. In 2010, the FDA created the Janus Clinical Trials Repository Project to provide a hub for integrating data submitted to the agency as supporting evidence for regulatory decisions [18]. The agency has since begun converting legacy data and developing analytic tools to make these data more useful.

CODR, launched in 2010, currently supports several C-Path consortia, including CAMD, the Polycystic Kidney Disease Outcomes Consortium, and the Predictive Safety Testing Consortium. In total, it hosts seven databases that contain data from nearly 150 studies (Fig. 3). C-Path's newest consortium, the Multiple Sclerosis Outcomes Assessment Consortium, will also use CODR for sharing data from multiple sclerosis clinical trials [19]. Access to each database is managed separately according to the policies of its parent consortium. Among all C-Path CODR databases, CODR-AD is so far the only one available to external qualified researchers. Other CODR databases may be opened to qualified external researchers in the future, depending on the objectives of each C-Path consortium. The shared mission of C-Path consortia is to foster the development of drug development tools through precompetitive data sharing across member companies. The CODR database infrastructure provides a common means of accomplishing this shared goal.

The CAMD CODR-AD database does not presently contain biomarker data, such as neuroimaging data or biofluid measures like cerebrospinal fluid analytes. The AD field lacks consensus on the specific methodology and assay protocol standards used for biomarkers in clinical trials to date. This is true for both biofluid and neuroimaging biomarkers. It is also one of the many reasons why ADNI, which does use consensus protocols, has been so successful. At present, biomarkers cannot be pooled across distinct randomized controlled clinical trials into a unified clinical trial database. However, adopting consensus AD-specific CDISC data standards in ongoing and future trials should make such pooling feasible in the future.

At present, this database contains clinical data representing the mild to moderate stages of AD (approximately 5500 subjects) and predementia trials at the stage of MCI (approximately 1000 subjects). CODR-AD not only lets investigators access and download data but also provides a web interface for analyzing data with a commonly used open-source statistical program ("R") [20] and for creating and downloading reports. A web interface for generating reports via Structured Query Language (SQL) is also provided.
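As a rough illustration of SQL-based reporting over standardized tables, the sketch below uses Python's built-in sqlite3 module with a toy in-memory DM table. The table name, the choice of columns, the rows, and the query are assumptions for the example. The actual CODR-AD schema and web interface are not described here. STUDYID, USUBJID, SEX, and AGE are standard SDTM DM variable names.

```python
import sqlite3
from typing import List, Tuple


def subjects_by_study_and_sex(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Example report: number of distinct subjects per study and sex."""
    sql = """
        SELECT STUDYID, SEX, COUNT(DISTINCT USUBJID) AS n_subjects
        FROM dm
        GROUP BY STUDYID, SEX
        ORDER BY STUDYID, SEX
    """
    return conn.execute(sql).fetchall()


# Tiny in-memory example with hypothetical rows (not CODR-AD data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dm (STUDYID TEXT, USUBJID TEXT, SEX TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO dm VALUES (?, ?, ?, ?)",
    [
        ("1000", "1000-001", "F", 74),
        ("1000", "1000-002", "M", 69),
        ("1000", "1000-003", "F", 81),
        ("1009", "1009-001", "M", 77),
    ],
)
for row in subjects_by_study_and_sex(conn):
    print(row)
```

The query never names a specific study, so the same report runs across every pooled trial. A WHERE clause on STUDYID would restrict it to a subset.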

Other data repositories have also been created for data from AD clinical trials: the Global Alzheimer’s Association Interactive Network [21], a cloud-based, federated data repository of AD research data, which is currently under development by the Alzheimer’s Association and the Laboratory of NeuroImaging at the University of Southern California; a donepezil data repository that aggregates clinical trial data from 18 randomized, controlled trials conducted between 1991 and 2005 by Pfizer and Eisai [22]; and the National Alzheimer’s Coordinating Center database [23]. A number of other databases of longitudinal aging studies have also been created [24], yet these vary in the types of data included and in their availability to other researchers.

Fig. 1. Descriptive statistics for the subjects (N = 6500) represented in the C-Path Online Data Repository for AD (CODR-AD) database. (A) Number of subjects by region. (B) Distribution of gender. (C) Categorical age range distribution. (D) Number of subjects by primary diagnosis at start of trial. (E) Number of subjects by baseline severity of cognitive deficit as measured by the Mini-Mental State Examination (MMSE) for both Alzheimer’s disease (AD, blue bars) and mild cognitive impairment (MCI, yellow bars). (F) APOE genotypes represented in CODR-AD. Homozygous subjects are defined by having the ε4 isoform in both alleles. Carriers have one ε4 isoform allele. Noncarriers do not have the ε4 isoform in either allele.
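The APOE categories in Fig. 1F follow directly from counting ε4 alleles. A minimal sketch of that rule, assuming (hypothetically) that genotypes are stored as strings such as "E3/E4"; the storage format used in CODR-AD is not specified in the text.

```python
from collections import Counter


def apoe4_category(genotype: str) -> str:
    """Classify an APOE genotype by its number of e4 alleles.

    Assumes a hypothetical format of two alleles separated by '/',
    each written as 'E2', 'E3', or 'E4' (case-insensitive).
    """
    alleles = [a.strip().upper() for a in genotype.split("/")]
    if len(alleles) != 2 or any(a not in {"E2", "E3", "E4"} for a in alleles):
        raise ValueError(f"unrecognized APOE genotype: {genotype!r}")
    n_e4 = alleles.count("E4")
    if n_e4 == 2:
        return "homozygous"  # e4 on both alleles
    if n_e4 == 1:
        return "carrier"     # e4 on one allele
    return "noncarrier"      # no e4 allele


# Tally categories for a few made-up subjects.
print(Counter(apoe4_category(g) for g in ["E3/E4", "E4/E4", "E3/E3", "E2/E4"]))
```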

It is important to note that data integration would not have been possible without the use of data standards. Overall, CAMD members contributed significant in-kind resources to remap all data, including ADAS-Cog subscales, to the AD CDISC standard. The AD modeling tool could not have been developed successfully with data from a single sponsor. Pooling data sets, as the CAMD team has done in the CODR-AD database, created a unique and powerful research resource by enabling scientists to query for information across all data sets in the database simultaneously.
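Remapping every contribution to one standard is what allows a single query to span all data sets. The sketch below illustrates only the general idea: two hypothetical sponsors report the same quantities under different field names, each record is renamed to shared SDTM-style names, and the results are concatenated. The field names, mapping tables, and values are invented for illustration; the actual remapping to the AD CDISC standard involved much more than renaming fields.

```python
from typing import Any

# Hypothetical sponsor-specific field names mapped to common SDTM-style names.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "sponsor_a": {"PATID": "USUBJID", "TRIAL": "STUDYID", "ADAS11": "QSSTRESN"},
    "sponsor_b": {"subject": "USUBJID", "study": "STUDYID", "adas_total": "QSSTRESN"},
}


def remap(record: dict[str, Any], sponsor: str) -> dict[str, Any]:
    """Rename one sponsor record to the common names; refuse unmapped fields."""
    mapping = FIELD_MAPS[sponsor]
    unmapped = set(record) - set(mapping)
    if unmapped:
        raise KeyError(f"{sponsor}: no mapping for {sorted(unmapped)}")
    return {mapping[key]: value for key, value in record.items()}


def pool(sources: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Remap every sponsor's records and concatenate them into one data set."""
    return [remap(rec, sponsor) for sponsor, recs in sources.items() for rec in recs]


sources = {
    "sponsor_a": [{"PATID": "A-001", "TRIAL": "A1", "ADAS11": 21.0}],
    "sponsor_b": [{"subject": "B-017", "study": "B7", "adas_total": 27.0}],
}
pooled = pool(sources)
# After remapping, one expression queries both contributors' data.
print(sorted({rec["STUDYID"] for rec in pooled}))
```

Failing loudly on unmapped fields, rather than silently dropping them, keeps gaps in a mapping visible during integration.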

C-Path shares AD data with CAMD consortium members and with qualified researchers who request access via the CODR website [24]. When requesting access, users are asked to send their intended research questions and approaches to CAMD for approval. Once approved, users are able to query and analyze data relevant to their research questions. External disclosures are to be communicated to CAMD, and publications acknowledge the consortium. To date, CAMD is aware of multiple external uses of the CODR database to address various research questions, resulting in the publications outlined in Table 3 [17,25–30]. Additionally, we are aware of multiple abstracts that have been presented [31–33].

4.1. Caveats

Although the database has the potential to be very powerful, a number of caveats should be kept in mind. First, integrating data does not necessarily mean that those data are poolable from a statistical, scientific, or clinical standpoint. The database contains 24 studies, but it cannot be assumed that all 24 are suitable to answer every analysis question; users must therefore determine which studies are suitable for their analysis question(s). Furthermore, although data are standardized with regard to SDTM variables and structure, some terminology was left verbatim as submitted by contributors. This is most notable in the labs data set, where, for example, white blood cell counts may be referred to as “WBC” or “leukocytes.” Also, because the concomitant use of acetylcholinesterase inhibitors is a potentially major confounder in assessing outcomes, CAMD asked members to use generic drug names for this class of drugs when they were present in the data (as background therapy, for instance).
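Because some lab terminology was left verbatim, users pooling lab results across studies will typically need their own harmonization step before analysis. A minimal sketch, assuming a hand-curated synonym table; apart from the “WBC”/“leukocytes” example given in the text, the entries and records below are hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym table; a real one would be curated against the data.
LAB_SYNONYMS = {
    "wbc": "LEUKOCYTES",
    "white blood cells": "LEUKOCYTES",
    "leukocytes": "LEUKOCYTES",
}


def harmonize_test_name(verbatim: str) -> str:
    """Map a verbatim lab test name to a canonical label when one is known.

    Unknown names are returned upper-cased but otherwise unchanged so that
    they can be reviewed and added to the table.
    """
    key = " ".join(verbatim.lower().split())  # normalize case and whitespace
    return LAB_SYNONYMS.get(key, verbatim.strip().upper())


def group_results(records):
    """Group (study, verbatim test name, value) triples by canonical test name."""
    grouped = defaultdict(list)
    for study, test, value in records:
        grouped[harmonize_test_name(test)].append((study, value))
    return dict(grouped)


records = [("S01", "WBC", 6.1), ("S02", "Leukocytes", 5.4), ("S02", "Hemoglobin", 13.2)]
print(group_results(records))
```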

The primary cognitive outcome measure in AD is the ADAS-Cog, but this scale is not highly standardized: 10-, 11-, 12-, and 13-item versions of the scale were used by sponsors contributing data. Most trials include a common set of 11 items from a total pool of 15 items. Moreover, the items do not always appear in the same order, and each sponsor may have had different rules in its analysis plans on how to handle missing data. Our resolution was to ask each member to provide the lowest level of raw detail on the individual items rather than its own analyses. If a given item was missing, members were instructed to populate the “status” variable as “not done” and then, if this information was available, to define the “reason not done” as either “cognitive reasons” or “noncognitive reasons.” This allows investigators to dig deeper into each subject’s performance and derive their own scores according to their own analysis plans. The supplementary materials include recommendations on approaching such issues in analysis.

The authors acknowledge that numerous challenges have hindered the development of therapies in AD to date, only some of which may be addressed by sharing data from retrospective trials and using predictive modeling and simulation tools. The field of AD is evolving in many ways, such as new diagnostic guidelines that include biomarkers. The lack of biomarkers in the current CODR-AD database limits its applicability to some key research questions, and the impact of the CODR-AD database will be expanded with the inclusion of biomarkers.

Fig. 2. Conceptual representation of the mild and moderate Alzheimer’s disease (AD) CTS Tool. Standardized data from different sources provide the necessary information to develop quantitative models that capture the relevant aspects of disease progression, pharmacologic effects (which for AD have been categorized as “symptomatic” or “disease-modifying”), and aspects of trial design such as the magnitude, duration, and variability of the placebo effect. Such an integrated drug-disease-trial model forms the basis for the CAMD clinical trial simulation tool for mild and moderate AD. CTS, clinical trial simulation.

Table 3
Publications by external users of the CODR-AD database

| Title | Research topic | Reference |
|---|---|---|
| Disease progression meta-analysis model in AD. | Disease progression modeling | Ito et al. [25] |
| Differences between early and late onset AD | Characterizing clinical features of early- vs. late-onset AD biomarkers research | Panegyres and Chen [29] |
| Identifying combinatorial biomarkers by association rule mining in the CAMD Alzheimer’s database | Combinatorial biomarkers | Szalkai et al. [27] |
| Reliability of the ADAS-Cog in longitudinal studies | Interrater reliability; test-retest reliability; internal consistency | Khan et al. [28] |
| Understanding placebo responses in AD clinical trials from the literature meta-data and CAMD database. | Placebo response | Ito et al. [17] |
| Early-onset AD: a global cross-sectional analysis. | Characterizing early onset AD | Panegyres and Chen [26] |
| Improved utilization of ADAS-Cog assessment data through item response theory based pharmacometric modeling. | Item response theory | Ueckert et al. [30] |

Abbreviations: CODR-AD, C-Path Online Data Repository for AD; AD, Alzheimer’s disease; CAMD, Coalition Against Major Diseases; ADAS-Cog, Alzheimer’s Disease Assessment Scale Cognitive subscale.

NOTE. A summary of the known publications based on the research utilization of the CODR-AD database by investigators external to CAMD. Abstracts also exist yet are not listed here.

5. Moving forward

The C-Path Online Data Repository for AD serves as an example of what can be achieved by standardizing and integrating clinical trial data from industry-sponsored AD trials. The standardization and pooling of clinical trial data facilitates analysis across multiple studies, providing a more comprehensive understanding of the disease process.

CODR-AD is an evolving database, and the data standards developed as part of this project are not intended to be static. C-Path partnered with CDISC on the publication of v2.0 of the AD standards, which incorporates expanded clinical endpoints and biomarkers and is focused on predementia stages of AD (http://www.cdisc.org/therapeutic). CAMD is also working with partners to develop a more user-friendly access interface, and discussions are currently underway with other organizations, including ADNI, to develop approaches for the broader use of CDISC data standards and integrated databases.
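The item-level ADAS-Cog convention described in Section 4.1 (raw item scores, with missing items flagged “not done” and, where known, a cognitive or noncognitive reason) lets users derive totals under their own analysis plans. The sketch below shows one such derivation under explicitly illustrative rules: only the items listed in `item_max` are summed (so an 11-item subset can be chosen from a larger pool), an item not done for cognitive reasons is optionally scored at its maximum, and any other missing item leaves the total undefined. The item codes, reason codes, maxima, and imputation rule are assumptions for illustration; they are not the recommendations in the supplementary materials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemResult:
    item: str               # hypothetical item code
    score: Optional[float]  # None when the item was not done
    status: str = ""        # "NOT DONE" when the item is missing
    reason: str = ""        # hypothetical codes: "COGNITIVE" or "NONCOGNITIVE"


def total_score(results: list[ItemResult], item_max: dict[str, float],
                impute_cognitive_as_max: bool = True) -> Optional[float]:
    """Sum item scores over the items in item_max (higher = more impaired).

    Illustrative policy only: an item not done for cognitive reasons is scored
    at its maximum if impute_cognitive_as_max is True; any other missing item
    makes the total unavailable (None). Items absent from item_max are ignored.
    """
    by_item = {r.item: r for r in results}
    total = 0.0
    for item, max_score in item_max.items():
        r = by_item.get(item)
        if r is not None and r.status != "NOT DONE" and r.score is not None:
            total += r.score
        elif (r is not None and r.status == "NOT DONE"
              and r.reason == "COGNITIVE" and impute_cognitive_as_max):
            total += max_score
        else:
            return None
    return total


# Made-up three-item example.
item_max = {"ITEM_A": 10.0, "ITEM_B": 5.0, "ITEM_C": 8.0}
results = [
    ItemResult("ITEM_A", 4.0),
    ItemResult("ITEM_B", None, "NOT DONE", "COGNITIVE"),
    ItemResult("ITEM_C", 3.0),
]
print(total_score(results, item_max))                                 # 12.0
print(total_score(results, item_max, impute_cognitive_as_max=False))  # None
```

Keeping the policy as explicit parameters, rather than baking in one sponsor’s rule, mirrors the reason raw items were collected in the first place.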

Fig. 3. Contents of the C-Path online data repository. A breakdown of the studies contained in all C-Path Online Data Repository (CODR) databases across the participating C-Path consortia. The databases maintained by the Coalition Against Major Diseases (CAMD) and the Polycystic Kidney Disease Outcomes Assessment Consortium (PKD) use the Clinical Data Interchange Standards Consortium Study Data Tabulation Model (CDISC SDTM). Predictive Safety Testing Consortium (PSTC) studies are primarily nonclinical. The contents of the database are accurate as of the publication of this article; the numbers of studies and working groups among consortia are dynamic and subject to growth. HV, healthy volunteer study.

Other data will continue to be incorporated into CODR-AD as they become available. CAMD is working with companies to develop a framework under which they would be willing to make more data available, such as inclusion and exclusion criteria, trial design methods, active treatment arm data, and biomarkers across the disease continuum. Although companies have expressed concern that making this information available could cost them their anonymity as contributors, regulators and many sponsors agree that these data are essential for more efficient analysis and interpretation of the database [34]. The CODR-AD database and its use to date show that responsible use of big data can yield effective and impactful advances. The landscape of precompetitive data sharing is changing in a positive way, and the CODR-AD database serves as an example for others interested in big data across disciplines and diseases.

Acknowledgments

The authors thank Dr. Arthur Toga (Laboratory of NeuroImaging, Keck School of Medicine of University of Southern California) for review of the manuscript. We also gratefully acknowledge the contributions of Dr. Ray Woosley, Rick Myers, Andy Tofel and Aaron Avery (Ephibian), Dr. Rebecca Kush (CDISC), Cathy Barrows (GSK), Roberta Rosenburg, Ann Pennington, and Kaori Ito (Pfizer), Suzanne Pierre (Sanofi), and Chris Tolk (CDISC).

The success of this database and goals of the CAMD consortium would not have been possible without the contribution of subject-level clinical trials data. We wish to thank Sanofi, Pfizer, GlaxoSmithKline, Johnson & Johnson, AbbVie, Forest Laboratories, Eisai, AstraZeneca, and the Alzheimer’s Disease Cooperative Study (ADCS) for their data contributions.

Funding: Partial funding for this work came from grants from Science Foundation Arizona (SRG 0335-08) and the U.S. Food and Drug Administration (5U01FD003865).

Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jalz.2014.11.005.


RESEARCH IN CONTEXT

1. Systematic review: The data team from the Coalition Against Major Diseases surveyed available clinical databases for Alzheimer’s disease (AD) research and assessed the approaches used by the developers of those databases. Additionally, the team surveyed available clinical data standards (in particular, standards specific to AD) suitable for creating integrated databases.

2. Interpretation: A standardized database of clinical trials in AD (predementia and dementia), the first of its kind, was developed and made available to qualified researchers. A new open-source, publicly available CDISC data standard for Alzheimer’s predementia and dementia was also developed and is available for use in prospective clinical studies.

3. Future directions: Maximizing the usefulness of the database will require incorporating data from additional trials, including biomarker data and data from other outcome assessments and endpoints. The authors are actively seeking these data and will be updating data standards to enable their incorporation into the database.

References

[1] Alzheimer’s Disease International. World Alzheimer’s report. 2009.

[2] Olson S, Downey AS. Sharing clinical research data: workshop summary. Washington, DC: Institute of Medicine of the National Academies; 2013.

[3] Food and Drug Administration. Innovation or stagnation: challenges and opportunity on the critical path to new medical products. 2004. Available at: http://www.fda.gov/ScienceResearch/SpecialTopics/CriticalPathInitiative/CriticalPathOpportunitiesReports/ucm077262.htm. Accessed February 26, 2013.

[4] Romero K, Corrigan B, Neville J, Kopko S, Cantillon M. Striving for an integrated drug development process for neurodegeneration: the coalition against major diseases. Neurodegen Dis Manage 2011;1:1–7.

[5] Romero K, de Mars M, Frank D, Anthony M, Neville J, Kirby L, et al. The coalition against major diseases: developing tools for an integrated drug development process for Alzheimer’s and Parkinson’s diseases. Clin Pharmacol Ther 2009;86:365–7.

[6] Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Green RC, et al. The Alzheimer’s Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimers Dement 2013;9:e111–94.

[7] Clinical Data Interchange Standards Consortium. Clinical Data Interchange Standards Consortium.

[8] Food and Drug Administration. PDUFA reauthorization performance goals and procedures fiscal years 2013 through 2017. 2012. Available at: http://www.fda.gov/downloads/ForIndustry/UserFees/PrescriptionDrugUserFee/UCM270412.pdf. Accessed June 27, 2013.

[9] Corder EH, Saunders AM, Strittmatter WJ, Schmechel DE, Gaskell PC, Small GW, et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families. Science 1993;261:921–3.

[10] Gobburu JV, Lesko LJ. Quantitative disease, drug, and trial models. Annu Rev Pharmacol Toxicol 2009;49:291–301.

[11] Clinical Data Interchange Standards Consortium. Alzheimer’s disease-specific therapeutic area supplement to the study data tabulation model user guide. Prepared by the Coalition Against Major Diseases. Jan 2014. Available at: http://www.cdisc.org/system/files/all/standard_category/application/pdf/taug_alzheimer_27s_20v2.0_20noportfolio.pdf. Accessed February 11, 2015.

[12] OpenCDISC. OpenCDISC Validator. Available at: http://www.opencdisc.org/. Accessed February 11, 2015.

[13] U.S. Department of Health and Human Services. Guidance regarding methods of de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) privacy rule. Available at: http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html. Accessed February 11, 2015.

[14] Rogers JA, Polhamus D, Gillespie WR, Ito K, Romero K, Qiu R, et al. Combining patient-level and summary-level data for Alzheimer’s disease modeling and simulation: a beta regression meta-analysis. J Pharmacokinet Pharmacodyn 2012;39:479–98.

[15] Food and Drug Administration. Regulatory letter to critical path institute and CAMD regarding Alzheimer’s disease simulation tool. FDA Center for Drug Evaluation and Research, Disease specific model library. Available at: http://www.fda.gov/aboutfda/centersoffices/officeofmedicalproductsandtobacco/cder/ucm180485.htm. Accessed February 11, 2015.

[16] European Medicines Agency. Qualification opinion of a novel data driven model of disease progression and trial evaluation in mild and moderate Alzheimer’s disease. 2013. Available at: http://www.ema.europa.eu/docs/en_GB/document_library/Regulatory_and_procedural_guideline/2013/10/WC500151309.pdf. Accessed February 11, 2015.

[17] Ito K, Corrigan B, Romero K, Anziano R, Neville J, Stephenson D, et al. Understanding placebo responses in Alzheimer’s disease clinical trials from the literature meta-data and CAMD database. J Alzheimers Dis 2013;37:173–83.

[18] Food and Drug Administration. Janus Clinical Trials Repository (CTR) Project 2012 [6/27/13]. Available at: http://www.fda.gov/ForIndustry/DataStandards/StudyDataStandards/ucm155327.htm. Accessed February 11, 2015.

[19] Rudick RA, Larocca N, Hudson LD. Multiple Sclerosis Outcome Assessments Consortium: genesis and initial project plan. Mult Scler 2013;20:12–7.

[20] R Project for Statistical Computing. Available at: http://www.r-project.org/. Accessed February 11, 2015.

[21] Alzheimer’s Association. Global Alzheimer’s Association Interactive Network 2013. Available at: http://www.gaain.org/. Accessed February 11, 2015.

[22] Jones R, Wilkinson D, Lopez OL, Cummings J, Waldemar G, Zhang R, et al. Collaborative research between academia and industry using a large clinical trial database: a case study in Alzheimer’s disease. Trials 2011;12:233.

[23] Beekly DL, Ramos EM, Lee WW, Deitrich WD, Jacka ME, Wu J, et al. The National Alzheimer’s Coordinating Center (NACC) database: the uniform data set. Alzheimer Dis Assoc Disord 2007;21:249–58.

[24] CAMD. C-Path online data repository. Available at: https://codr.c-path.org. Accessed February 11, 2015.

[25] Ito K, Ahadieh S, Corrigan B, French J, Fullerton T, Tensfeldt T, et al. Disease progression meta-analysis model in Alzheimer’s disease. Alzheimers Dement 2010;6:39–53.

[26] Panegyres PK, Chen HY, the Coalition against Major Diseases. Early-onset Alzheimer’s disease: a global cross-sectional analysis. Eur J Neurol 2014 Apr 30; http://dx.doi.org/10.1111/ene.12453 [Epub ahead of print]. 2014.

[27] Szalkai B, Grolmusz VK, Grolmusz VI, Coalition Against Major Diseases. Identifying combinatorial biomarkers by association rule mining in the CAMD Alzheimer’s database. Cornell University Library. 2013. Available at: http://arxiv.org/abs/1312.1876. Accessed February 11, 2015.

[28] Khan A, Yavorsky C, DiClemente G, Opler M, Liechti S, Rothman B, et al. Reliability of the Alzheimer’s disease assessment scale (ADAS-Cog) in longitudinal studies. Curr Alzheimer Res 2013;10:952–63.

[29] Panegyres PK, Chen HY. Differences between early and late onset Alzheimer’s disease. Am J Neurodegener Dis 2013;2:300–6.

[30] Ueckert S, Plan EL, Ito K, Karlsson MO, Corrigan B, Hooker AC. Improved utilization of ADAS-Cog assessment data through item response theory based pharmacometric modeling. Pharm Res 2014;31:2152–65 [Epub ahead of print].

[31] Ueckert S, Plan EL, Ito K, Karlsson MO, Corrigan B, Hooker AC. Application of item response theory to ADAS-cog scores modelling in Alzheimer’s disease. Venice, Italy: Population Approach Group in Europe; 2012. p. 21.

[32] Yavorsky C, DiClemente G, Opler M, Khan A, Jovic S, Rothman B. Establishing threshold scores and profiles of cognitive impairment for the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) for patients with higher dementia (MMSE < 12), Alzheimer’s disease and probably MCI. Alzheimers Dement 2012;8:P415–6.

[33] Khan A, Yavorsky C, Opler M, Jovic S. Differential item functioning and the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) among people with Alzheimer’s disease. Alzheimers Dement 2013;9:P460–1.

[34] Eichler HG, Petavy F, Pignatti F, Rasi G. Access to patient-level trial data–a boon to drug developers. N Engl J Med 2013;369:1577–9.