Impressions of a New NCI Director: Big Data
Norman E. Sharpless, M.D.
Cancer Informatics for Cancer Centers
April 3, 2018
www.cancer.gov
www.cancer.gov/espanol
October 17, 2018
NCI: Leading the National Cancer Program
Bethesda, Maryland
Frederick, Maryland
NCI-Designated Cancer Centers
National Clinical Trials Network
Why Big Data Really Matters… A personal story
NCI Genomic Data Commons
Many Programs Generating Multimodal Data
The Cancer Imaging Archive (TCIA)
Proteomic Data Coordinating Center
Clinical Proteomic Tumor Analysis Consortium
Open Public Data
National Cancer Data Ecosystem
Overarching goals:
• Accelerate progress in cancer, including prevention & screening
  • From cutting-edge basic research to wider uptake of standard of care
• Encourage greater cooperation and collaboration
  • Within and between academia, government, and the private sector
• Enhance data sharing
Recommendations:
• Build a National Cancer Data Ecosystem
  • Enhanced cloud-computing platforms
  • Services that link disparate information, including clinical, image, and molecular data
  • Essential underlying data science infrastructure, standards, methods, and portals for the Cancer Data Ecosystem
Enhanced Data Sharing Working Group Recommendation: The Cancer Data Ecosystem
Figure: Cancer Research Data Commons (components shown: SBG CGC, Broad FireCloud, ISB CGC)
Retrospective Characterization and Analysis of Biospecimens Collected from NCI-Sponsored Trials of the National Clinical Trials Network (NCTN) and NCI Community Oncology Research Program (NCORP)
Program Announcement Released: December 4, 2017
Receipt Date for Proposals: March 15, 2018
Based on the Blue Ribbon Panel (BRP) recommendations, projects of particular interest for accelerating our understanding of biologic response include:
• Analyses in clinical settings in which it usually takes many years for complete outcome data to become available from a trial
• Analyses in rare tumor types
• Analyses in special populations (e.g., children, adolescents and young adults, racial/ethnic minority groups, and underserved populations)
Highest priority:
• Hypothesis-driven proposals with detailed statistical plans
• Exploratory or hypothesis-generating projects will be considered, particularly in cases of good clinical opportunity, high diversity of sample representation, or building on data generated by prior analysis projects
Additional criteria:
• Comprehensive molecular analyses of malignant and patient-matched normal samples that could answer one or more key clinical questions
• Feasibility given the number and quality of biospecimens available
• Acceptable timelines for provision of biospecimens and data
• Appropriate consent for use of specimens and appropriate data-sharing plans
NCI-MATCH and Pediatric MATCH: Molecular Analysis for Therapy Choice
NCI Molecular Analysis for Therapy Choice (NCI-MATCH)
• Precision oncology trial to explore treating patients based on the molecular profiles of their tumors
• 1,089 sites in the U.S. across NCTN and NCORP
NCI-MATCHBox
NCI-MATCHBox Team Responsibilities:
• Sequencing Pipeline Configuration
• Seamless Integration with Laboratory and Clinical Systems
• Biospecimen Tracking
• Parsing, Annotation and Variant Reporting
• Automated Patient Management Workflows
• Treatment Arm Management and Tracking
• Algorithm-Driven Treatment Assignment
• Proficiency and Competency Testing Support
• Data Analytics, Visualization and Reporting
NCI Molecular Analysis for Therapy Choice (NCI-MATCH) Rare Variant Initiative:
• Patients with low-frequency mutations (< 2%) for which well-qualified drugs/targets are available
• Foundation Medicine, Caris Life Sciences, MDACC, and MSKCC will notify the treating physician at any MATCH site when the results of their NGS panel would make a patient eligible for a MATCH treatment arm
• Results verified centrally by the NCI-MATCH Oncomine® assay
• RFP for other NGS providers posted August 2017, with proposals received January 2018, to broaden the base of patients available to enroll in precision oncology studies
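The notification rule above (a rare variant, below the 2% frequency cutoff, that maps to a MATCH treatment arm) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `Variant` fields, the placeholder gene/arm table `HYPOTHETICAL_ARMS`, and the function `arms_to_notify` are all hypothetical, and the real MATCH arm rules are far more detailed. A hit here only triggers a notification; eligibility still depends on central verification.

```python
from dataclasses import dataclass

RARE_FREQUENCY_CUTOFF = 0.02  # the "< 2%" threshold from the slide


@dataclass(frozen=True)
class Variant:
    gene: str
    alteration: str
    frequency: float  # assumed field: fraction of tumors carrying this variant


# Hypothetical mapping from (gene, alteration) to MATCH arm labels.
HYPOTHETICAL_ARMS = {
    ("GENE_A", "ALT_1"): "Arm X",
    ("GENE_B", "ALT_2"): "Arm Y",
}


def arms_to_notify(variants, arm_table=HYPOTHETICAL_ARMS):
    """Return arm labels for rare variants that map to a MATCH arm.

    A match only triggers a notification to the treating physician;
    eligibility must still be confirmed by central verification.
    """
    hits = []
    for v in variants:
        if v.frequency >= RARE_FREQUENCY_CUTOFF:
            continue  # common variant: outside the rare-variant rule
        arm = arm_table.get((v.gene, v.alteration))
        if arm is not None:
            hits.append(arm)
    return hits
```

For example, a panel report containing `Variant("GENE_A", "ALT_1", 0.005)` and `Variant("GENE_B", "ALT_2", 0.10)` yields only `["Arm X"]`, because the second variant is too common to fall under the rare-variant rule.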
NCI Molecular Analysis for Therapy Choice (NCI-MATCH)
Time period | # enrolled | # first samples submitted | # first sample fail | # assay complete | # assigned to Rx | # enrolled on Rx
Total pre-pause | 794 | 739 | 116 | 645 | 54 | 27
Total post-pause | 5,602 | 5,222 | 428 | 4,913 | 938 | 662
Overall total screening cohort | 6,396 | 5,961 | 544 | 5,558 | 992 | 689
Total outside assay | 104 | 59 | 3 | 102 | 88 | 71
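In every column of the screening-cohort table, the overall total should equal the pre-pause row plus the post-pause row. The sketch below checks that and derives a few funnel ratios. The column keys and the ratio definitions are illustrative choices, not official NCI-MATCH metrics.

```python
# Screening-cohort counts transcribed from the NCI-MATCH table.
COLUMNS = ("enrolled", "first_samples_submitted", "first_sample_fail",
           "assay_complete", "assigned_to_rx", "enrolled_on_rx")

pre_pause = dict(zip(COLUMNS, (794, 739, 116, 645, 54, 27)))
post_pause = dict(zip(COLUMNS, (5602, 5222, 428, 4913, 938, 662)))
overall = dict(zip(COLUMNS, (6396, 5961, 544, 5558, 992, 689)))

# The overall row should be the column-wise sum of the pre- and post-pause rows.
for col in COLUMNS:
    assert pre_pause[col] + post_pause[col] == overall[col], col


def rate(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Illustrative funnel ratios for the overall screening cohort.
fail_rate = rate(overall["first_sample_fail"], overall["first_samples_submitted"])
assignment_rate = rate(overall["assigned_to_rx"], overall["assay_complete"])
enrollment_rate = rate(overall["enrolled_on_rx"], overall["assigned_to_rx"])
print(fail_rate, assignment_rate, enrollment_rate)  # 9.1 17.8 69.5
```

With these counts, roughly 9.1% of first samples failed, about 17.8% of completed assays led to a treatment-arm assignment, and about 69.5% of assigned patients enrolled on treatment.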
First NCI-MATCH Efficacy Data: Nivolumab in MSI-high cancers
• Median number of cycles: 3.5 (range 1–13+)
• Median time to first response: 2.1 months (includes unconfirmed PRs)
• 6-month PFS: 49% (95% CI: 32–67%)
• Median duration of response not yet reached (4–8+ months; 7/8 still on treatment at the time of data cutoff)
• 11 patients remain on therapy at the time of data cutoff
NCI-COG Pediatric MATCH
Pediatric MATCH Active Therapeutic Arms
Pediatric MATCH Enrollment
Figure: Monthly activity, July 2017 to February 2018 (series: registration, specimen received, assay completed)
• First 131 patients: 74 males, 57 females; age 1–21 years, median age 12 years; 35% of patients AYA
• Tumor sequencing completed on 94 patients
• At least one patient has been matched to each of the treatment arms
BRCA Challenge – Program Overview
1. Share BRCA1 and BRCA2 variants publicly via a web portal
2. Address social, ethical, legal challenges to global data sharing
3. Create a GA4GH model for all disease genes
Major milestones:
• BRCA Exchange: >18,000 variants from multiple sites
• 1/3 expert-classified with supporting rationale
• Coming soon: mobile app with alert function
Mission: Improve care of patients at risk of breast and ovarian cancer using global data sharing and collaboration in the analysis of BRCA1 and BRCA2.
BRCA Exchange Website (brcaexchange.org)
• Flexible searching
• Drill down to extra info
• Tiled format
• Versioning
  • Variant level
  • Dataset level
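Versioning at both the variant level and the dataset level can be pictured as two linked histories: numbered dataset releases, and a per-variant log of classifications tagged with the release in which each was recorded. The sketch below is a hypothetical data structure for illustration only; the class names, fields, and sequential release numbering are assumptions, not the BRCA Exchange schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VariantRecord:
    """One variant with its own classification history (hypothetical schema)."""
    variant_id: str
    history: List[Tuple[int, str]] = field(default_factory=list)  # (release, classification)

    def classify(self, release: int, classification: str) -> None:
        self.history.append((release, classification))

    def as_of(self, release: int) -> Optional[str]:
        """Latest classification recorded at or before the given dataset release."""
        current = None
        for rel, cls in self.history:  # entries are appended in release order
            if rel <= release:
                current = cls
        return current


@dataclass
class Dataset:
    """Dataset-level versioning: each release number identifies a snapshot."""
    releases: List[int] = field(default_factory=list)
    variants: Dict[str, VariantRecord] = field(default_factory=dict)

    def new_release(self) -> int:
        release = len(self.releases) + 1
        self.releases.append(release)
        return release

    def record(self, variant_id: str, classification: str) -> None:
        """Record a classification under the current release (call new_release first)."""
        release = self.releases[-1]
        rec = self.variants.setdefault(variant_id, VariantRecord(variant_id))
        rec.classify(release, classification)
```

Keeping the release tag on each variant entry lets a user reconstruct what any variant looked like in any past dataset release, which is the point of versioning at both levels.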
NCI SEER Program: Surveillance, Epidemiology, and End Results
The SEER Program
• Funded by NCI since 1973 to support research on the diagnosis, treatment, and outcomes of cancer
• 16 population-based registries covering 28% of the US population
• Registries collect information on all cancer cases for residents of the state or region
• Representing racial and ethnic minorities
• Various geographic subgroups
• 450,000+ incident cases annually
• Approximately 85% of cases with real-time electronic pathology (e-path) reporting
Walgreens data for Georgia: frequency distribution of oral antineoplastic drugs by generic category (2013–2016)
Drug category | Unique patient/prescription count (2013–2016)
Antineoplastic – Hormonal and Related Agents | 16,806
Antimetabolites | 7,032
Antineoplastics – Misc. | 3,345
Antineoplastic Enzyme Inhibitors | 1,642
Alkylating Agents | 1,008
Chemotherapy Rescue/Antidote Agents | 524
Antineoplastic – Immunomodulators | 222
Mitotic Inhibitors | 122
Topoisomerase I Inhibitors | 26
Antineoplastic – Antibodies | 17
Antineoplastic or Premalignant Lesion Agents – Topical | 14
Antineoplastic – Angiogenesis Inhibitors | 4
Diagnostic Drugs | 5
Antineoplastic Antibiotics | 5
Total | 30,772
• Initial pilot in GA; once the data are assessed, the pilot will be scaled to the entire SEER program.
• 20,000 total unique patients with 225,420 fills
• These types of real-world data will permit:
  • Monitoring of patient compliance
  • Assessing the use of these agents in the context of outcomes
  • Analyzing differences in use across subpopulations (disparity analysis)
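The Georgia figures can be sanity-checked with simple arithmetic: the category counts should add up to the stated total of 30,772, and the headline numbers give an average number of fills per patient. A minimal sketch follows; the 20,000-patient figure is taken as stated on the slide and is probably rounded, so the per-patient average is approximate. Because the category counts sum to more than the roughly 20,000 unique patients, a patient presumably can be counted under more than one category.

```python
# Counts transcribed from the Georgia drug-category table (2013-2016).
category_counts = {
    "Antineoplastic - Hormonal and Related Agents": 16_806,
    "Antimetabolites": 7_032,
    "Antineoplastics - Misc.": 3_345,
    "Antineoplastic Enzyme Inhibitors": 1_642,
    "Alkylating Agents": 1_008,
    "Chemotherapy Rescue/Antidote Agents": 524,
    "Antineoplastic - Immunomodulators": 222,
    "Mitotic Inhibitors": 122,
    "Topoisomerase I Inhibitors": 26,
    "Antineoplastic - Antibodies": 17,
    "Antineoplastic or Premalignant Lesion Agents - Topical": 14,
    "Antineoplastic - Angiogenesis Inhibitors": 4,
    "Diagnostic Drugs": 5,
    "Antineoplastic Antibiotics": 5,
}

table_total = sum(category_counts.values())
hormonal = category_counts["Antineoplastic - Hormonal and Related Agents"]
hormonal_share = round(100 * hormonal / table_total, 1)  # percent of the table total

# Headline pilot figures from the slide (20,000 is likely a rounded value).
UNIQUE_PATIENTS = 20_000
TOTAL_FILLS = 225_420
fills_per_patient = round(TOTAL_FILLS / UNIQUE_PATIENTS, 1)

print(table_total, hormonal_share, fills_per_patient)  # 30772 54.6 11.3
```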
Trends in checkpoint inhibitor use in oncology practices, captured from Unlimited Systems claims (2013–2017)*
Once scaled to SEER, linked claims data will permit:
• Evaluation of use in the context of demographics and outcome
• Monitoring diffusion of agents
• Measuring use across subgroups of the population (potential for disparities research)
*Represents 12-35% of oncologists in 5 registries
Variation in genetic testing in breast and ovarian cancers by race/ethnicity (California and Georgia)
Overall testing rates (2013–2015):
• 26% of all 82,120 breast cancers
• 33% of all 6,268 ovarian cancers
Capturing outcomes other than survival: two methods
NLP/machine learning solutions
• Working with Department of Energy partners to develop deep learning algorithms to extract recurrence as distant metastatic disease from unstructured text documents (pathology and radiology reports)
Patient-generated data within the registry
• Working with partners to test solutions, e.g., patient portals, direct patient reporting, and patient-generated data sources (2 studies in process)
Department of Energy Pilot Project
NLP / Machine learning solutions
Develop deep learning algorithms to extract recurrence as distant metastatic disease from unstructured text documents (pathology and radiology reports)
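To make the extraction task concrete, the sketch below flags report text that mentions metastatic disease without a preceding negation cue in the same sentence. It swaps in a simple rule-based baseline, which is NOT the deep learning approach described above; the cue lists are hypothetical and far from complete. A learned model is meant to capture exactly the context this baseline misses.

```python
import re

# Hypothetical cue lists for a rule-based baseline; a deep learning model
# would instead learn these signals from labeled pathology/radiology reports.
METASTASIS_PATTERN = re.compile(
    r"\b(metasta\w*|distant (?:disease|spread))\b", re.IGNORECASE)
# A negation cue anywhere earlier in the same sentence suppresses the mention.
NEGATION_PATTERN = re.compile(
    r"\b(no|without|negative for|no evidence of)\b[^.]*$", re.IGNORECASE)


def flags_distant_metastasis(report_text):
    """Return True if any sentence mentions metastasis that is not negated."""
    for sentence in re.split(r"(?<=[.!?])\s+", report_text):
        for match in METASTASIS_PATTERN.finditer(sentence):
            preceding = sentence[:match.start()]
            if not NEGATION_PATTERN.search(preceding):
                return True
    return False
```

For example, "Liver lesions consistent with metastatic disease." is flagged, while "No evidence of metastatic disease." is not. The sentence-wide negation scope is a known weakness: a sentence such as "No adenopathy, but metastatic disease in the liver." would be missed.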
Big Issues in Big Data Facing NCI
Workforce and career development
EHR Mining
Storage – What? How Long? Cloud?
Security, privacy and de-identification
Use of challenges / prizes
CBIIT leadership