Visualization and Analysis of Cancer Genomics Data
Nikolaus Schultz
Marie Josée and Henry R. Kravis Center for Molecular Oncology
Memorial Sloan Kettering Cancer Center
October 13, 2015
Cost of DNA Sequencing is dropping rapidly
The Hallmarks of Cancer
Hanahan and Weinberg. Cell. March 4, 2011.
Cancer is a class of diseases in which a group of cells displays uncontrolled growth, invasion that intrudes upon and destroys adjacent tissues, and sometimes metastasis (spreading to other locations in the body via lymph or blood).
Many of these mechanisms are known; many are not. Only some are treatable.
All these properties are caused by genetic or epigenetic alterations.
Can we identify the responsible alterations in the genomes of cancer patients?
Tumor development / Drivers versus passengers
How does a cancer cell acquire all these different alterations?
Sequential accumulation of genomic alterations that confer a growth advantage (as in evolution, but faster).
Certain early events can increase the rate of accumulation, such as mutations in DNA damage repair genes or cell-cycle checkpoint genes (or exposure to mutagens).
Over time, many alterations develop. The ones that confer a growth advantage are called “drivers”, all others are “passengers”. Can we distinguish between them?
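One common way to approach this question is recurrence: a gene mutated in many more tumors than its background (passenger) mutation rate predicts is a driver candidate. The sketch below is a minimal binomial test under strong simplifying assumptions: a single, uniform per-tumor background rate, independent samples, and no correction for gene length, mutation spectrum, or multiple testing (dedicated tools such as MutSig model these factors). The function names, cohort size, and rate are hypothetical.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log of P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def recurrence_pvalue(mutated: int, total: int, background_rate: float) -> float:
    """P(at least `mutated` of `total` tumors carry a mutation in the gene)
    if the gene only accumulated passenger mutations at `background_rate`.

    Assumes 0 < background_rate < 1 (hypothetical, simplified null model).
    """
    if mutated <= 0:
        return 1.0
    return sum(exp(log_binom_pmf(i, total, background_rate))
               for i in range(mutated, total + 1))


# Hypothetical cohort: 500 tumors, 1% background chance of a mutation per tumor.
print(recurrence_pvalue(20, 500, 0.01))  # far more than chance predicts: driver candidate
print(recurrence_pvalue(5, 500, 0.01))   # roughly what chance predicts: consistent with passengers
```

Working in log space keeps the binomial coefficients from overflowing for cohorts of thousands of samples.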
Identification of functional alterations in genomic data:
- per disease
- per gene
- per patient
- per pathway
Different, recurrent ways to alter the same pathway / process?
Many events are rare, so we need hundreds of samples of the same disease (sub-)type to find them based on recurrence!
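The pathway question can be made concrete: several genes, each rarely altered, can together alter one process in a large fraction of tumors. A minimal sketch, assuming per-sample sets of altered genes (for example, from event calls) and a user-supplied gene set; the samples below are toy data, not real observations, and the gene set is chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Set


def gene_frequencies(altered: Dict[str, Set[str]], genes: Set[str]) -> Dict[str, float]:
    """Fraction of samples in which each gene of the set is altered."""
    n = len(altered)
    counts = Counter(g for gs in altered.values() for g in gs if g in genes)
    return {g: counts[g] / n for g in sorted(genes)}


def pathway_frequency(altered: Dict[str, Set[str]], genes: Set[str]) -> float:
    """Fraction of samples with at least one altered gene in the set."""
    hits = sum(1 for gs in altered.values() if gs & genes)
    return hits / len(altered)


def overlapping_samples(altered: Dict[str, Set[str]], genes: Set[str]) -> int:
    """Number of samples with two or more altered genes from the set."""
    return sum(1 for gs in altered.values() if len(gs & genes) > 1)


# Toy example: five samples (hypothetical data).
altered = {
    "S1": {"KRAS", "TP53"},
    "S2": {"BRAF"},
    "S3": {"NRAS"},
    "S4": {"TP53"},
    "S5": set(),
}
ras_raf = {"KRAS", "NRAS", "BRAF"}
print(gene_frequencies(altered, ras_raf))     # each gene: 0.2
print(pathway_frequency(altered, ras_raf))    # 0.6
print(overlapping_samples(altered, ras_raf))  # 0
```

Each gene alone is altered in only 20% of the toy samples, but the set as a whole is altered in 60%, which is why rare events become visible when grouped by process.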
Clinical applications:
- Development of new prognostic tools
- Identification of new treatment options
- Patient-specific treatment
Utility of cancer genomics data for bioinformaticians, biologists, and clinicians
The Cancer Genome Atlas Project History (2008 to 2015)
Tumor types: GBM, ovarian cancer, AML, kidney clear cell, endometrial cancer, thyroid cancer, head & neck squamous, lung squamous cell carcinoma, colorectal cancer, breast cancer, low grade glioma, GBM Phase II, bladder cancer, lung adenocarcinoma, melanoma, prostate cancer, stomach adenocarcinoma, cervical, liver, sarcoma
+ lobular breast cancer, chromophobe kidney, papillary kidney, pancreatic, rare tumors …
500 samples per tumor type; 10,000 tumor / normal pairs total
Sources of tumor sequencing data
Cancer Cell Line Encyclopedia (CCLE)
Broad Institute, Sanger, Washington University, etc.
Tumor sequencing in hospitals (MSKCC: 500 per month)
10,000 tumors; 6,000 tumors; 1,000 cell lines; 5,000 tumors; >15,000 tumors
Data availability
Raw data (FASTQ / BAM files): dbGaP, CGHub, ICGC Data Portal
Processed data (gene-level data, mutation calls): TCGA Data Portal, ICGC Data Portal, supplementary tables
Data slices (subsets of processed data)
Data visualization and analysis tools
Audience: bioinformaticians → biologists, clinicians
Reduction of complexity!
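A "data slice" can be as simple as a filter over processed mutation calls that keeps only the study and genes of interest. The record layout below (study, sample, gene, protein change) is a simplified, hypothetical stand-in for a mutation call file, not the format of any specific portal; the study and sample names are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MutationCall:
    study: str
    sample: str
    gene: str
    protein_change: str


def data_slice(calls: Iterable[MutationCall], study: str,
               genes: Iterable[str]) -> List[MutationCall]:
    """Return the calls from one study that fall in the requested genes."""
    wanted = set(genes)
    return [c for c in calls if c.study == study and c.gene in wanted]


# Hypothetical processed data from two studies.
calls = [
    MutationCall("study_1", "P01", "ERBB2", "S310F"),
    MutationCall("study_1", "P02", "TP53", "R175H"),
    MutationCall("study_2", "P03", "ERBB2", "V842I"),
    MutationCall("study_1", "P04", "ERBB2", "L755S"),
]
for c in data_slice(calls, "study_1", ["ERBB2"]):
    print(c.sample, c.gene, c.protein_change)
```

The slice is orders of magnitude smaller than the full data set, which is what makes it usable by biologists and clinicians without bioinformatics infrastructure.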
Most mutations found in cancer are “passengers”
Driver alteration frequencies per tumor type
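An alteration frequency is the fraction of samples of a given tumor type in which a gene is altered. A minimal sketch over per-sample event calls, producing a tumor type by gene table; the sample table is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Sample = Tuple[str, str, Set[str]]  # (sample id, tumor type, altered genes)


def frequencies_by_tumor_type(samples: Iterable[Sample],
                              genes: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Return {tumor type: {gene: fraction of samples with the gene altered}}."""
    gene_list = list(genes)
    totals: Dict[str, int] = defaultdict(int)
    altered: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for _, tumor_type, hit in samples:
        totals[tumor_type] += 1
        for g in gene_list:
            if g in hit:
                altered[tumor_type][g] += 1
    return {t: {g: altered[t][g] / totals[t] for g in gene_list} for t in totals}


# Hypothetical samples.
samples = [
    ("T1", "breast", {"PIK3CA", "TP53"}),
    ("T2", "breast", {"PIK3CA"}),
    ("T3", "breast", set()),
    ("T4", "breast", {"TP53"}),
    ("T5", "lung", {"TP53", "KRAS"}),
    ("T6", "lung", {"KRAS"}),
]
freq = frequencies_by_tumor_type(samples, ["PIK3CA", "TP53", "KRAS"])
print(freq)
```

Normalizing by the number of samples per tumor type (rather than the whole cohort) keeps small and large tumor-type cohorts comparable.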
ERBB2 mutation hotspots across cancer types
[Figure: ERBB2 mutations across protein domains (Rec L domain, Furin-like, Rec L domain, kinase domain), annotated "signal" versus "noise"]
S310F: bladder 1, breast 3, cervical 1, colorectal 2, lung adenocarcinoma 2, ovarian 2, stomach 1; CCLE: 1 (bladder)
L755S/M/P/W: breast 4, colorectal 2, endometrial 1, papillary kidney 1, melanoma 1, stomach 1; CCLE: 3 (colorectal, stomach, brain)
V777L/A: breast 1, colorectal 2, GBM 2
V842I: breast 1, colorectal 4, endometrial 2; CCLE: 4 (lung, ovarian, endometrial)
R678Q: breast 1, colorectal 1, endometrial 1, stomach 2; CCLE: 1 (colorectal)
774-776ins: lung adenocarcinoma 6; CCLE: 1 (lung)
References: Greulich et al. PNAS 2012; Kancha et al. PLoS ONE 2011; Bose et al. Cancer Discovery 2012.
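Separating signal from noise here amounts to counting mutated tumors per protein residue, pooled across cancer types. A minimal sketch, assuming protein changes written as on the slide (for example "S310F" or "774-776ins"); the fixed threshold of 5 tumors is an arbitrary simplification, since published hotspot methods use statistical models of residue-level background rates. The TCGA counts are summed from the slide; the two singleton mutations are hypothetical.

```python
import re
from collections import Counter
from typing import Dict

POINT = re.compile(r"^[A-Z](\d+)")        # e.g. S310F, L755S/M/P/W
INSERTION = re.compile(r"^(\d+)-\d+ins")  # e.g. 774-776ins


def protein_position(change: str) -> int:
    """Residue number of a protein change (start residue for insertions)."""
    m = POINT.match(change) or INSERTION.match(change)
    if m is None:
        raise ValueError(f"unrecognized protein change: {change!r}")
    return int(m.group(1))


def find_hotspots(counts: Dict[str, int], min_tumors: int = 5) -> Dict[int, int]:
    """Sum tumor counts per residue and keep residues with at least `min_tumors`."""
    per_residue: Counter = Counter()
    for change, n in counts.items():
        per_residue[protein_position(change)] += n
    return {pos: n for pos, n in sorted(per_residue.items()) if n >= min_tumors}


counts = {
    "S310F": 12,      # 1+3+1+2+2+2+1 across seven tumor types (slide)
    "V842I": 7,       # 1+4+2 (slide)
    "774-776ins": 6,  # lung adenocarcinoma (slide)
    "G222C": 1,       # hypothetical "noise"
    "P1140L": 1,      # hypothetical "noise"
}
print(find_hotspots(counts))  # {310: 12, 774: 6, 842: 7}
```

Pooling across cancer types is what lets residues that are rare in any single disease (for example S310F, at most 3 tumors per type) reach the threshold.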
cBioPortal for Cancer Genomics: Data to knowledge
Tumor DNA → DNA sequencer, microarrays … → tumor and normal sequences → data
Intuitive interface, quick response time, reduction of complexity
Alteration types and thresholds can be customized for each gene.
Reduction of complexity: event calls (which genes are altered in which samples?)
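Event calling reduces continuous or multi-level data to a yes/no answer per gene and sample, with rules that can differ by gene. A minimal sketch, assuming discrete copy-number values in the common GISTIC-style convention (-2 deep deletion, -1 shallow loss, 0 diploid, 1 gain, 2 amplification) and a set of (gene, sample) mutation calls. The `EventRule` structure is hypothetical and is not cBioPortal's internal implementation; the samples are toy data.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class EventRule:
    """Hypothetical per-gene rule deciding which alterations count as an event."""
    mutations: bool = True
    amp_at_or_above: Optional[int] = None  # e.g. 2 = amplification
    del_at_or_below: Optional[int] = None  # e.g. -2 = deep deletion


def is_altered(rule: EventRule, cna: int, mutated: bool) -> bool:
    if rule.mutations and mutated:
        return True
    if rule.amp_at_or_above is not None and cna >= rule.amp_at_or_above:
        return True
    if rule.del_at_or_below is not None and cna <= rule.del_at_or_below:
        return True
    return False


def event_matrix(rules: Dict[str, EventRule],
                 cna: Dict[str, Dict[str, int]],
                 mutations: Set[Tuple[str, str]]) -> Dict[str, Dict[str, bool]]:
    """Return {gene: {sample: altered?}}; missing CNA values are treated as 0."""
    samples = sorted({s for per_sample in cna.values() for s in per_sample}
                     | {s for _, s in mutations})
    return {
        gene: {s: is_altered(rule, cna.get(gene, {}).get(s, 0), (gene, s) in mutations)
               for s in samples}
        for gene, rule in rules.items()
    }


rules = {
    "ERBB2": EventRule(amp_at_or_above=2),  # oncogene: amplifications + mutations
    "PTEN": EventRule(del_at_or_below=-2),  # tumor suppressor: deep deletions + mutations
}
cna = {"ERBB2": {"S1": 2, "S2": 1, "S3": 0}, "PTEN": {"S1": 0, "S2": -1, "S3": -2}}
mutations = {("ERBB2", "S2")}  # hypothetical mutation calls
matrix = event_matrix(rules, cna, mutations)
for gene, row in matrix.items():
    print(gene, row)
```

Note that the shallow PTEN loss in S2 (-1) is not counted, while the ERBB2 gain in S2 is irrelevant because S2 carries an ERBB2 mutation; changing a gene's rule changes only that gene's row.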
Data visualization and exploration in cBioPortal
Genomic data: Clinical (MSK-IMPACT), CMO Research, Foundation Medicine, TCGA, ICGC, other public data
MSKCC clinical data, clinical annotation: Step 1: manual; Step 2: automated via institutional databases
OncoKB: annotation of variant effects, treatment
OncoKB knowledgebase of oncogenic mutations: variant effect; NCCN guidelines; standard therapy; investigational therapy; clinical trials
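Annotation against a knowledgebase can be pictured as a lookup from (gene, variant) to a variant effect plus a treatment tier, followed by ranking a patient's variants. A minimal sketch: the tiers mirror the categories listed for OncoKB (standard therapy per NCCN guidelines, investigational therapy, clinical trials), but the `Level` ordering, the data model, and all gene and variant names are hypothetical placeholders, not OncoKB's actual levels, API, or content.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class Level(IntEnum):
    """Hypothetical actionability tiers, ordered from weakest to strongest."""
    CLINICAL_TRIAL = 1
    INVESTIGATIONAL_THERAPY = 2
    STANDARD_THERAPY = 3  # e.g. supported by NCCN guidelines


@dataclass(frozen=True)
class Annotation:
    variant_effect: str            # e.g. "oncogenic", "unknown"
    level: Optional[Level] = None  # None: no treatment implication recorded


# Toy knowledgebase; gene and variant names are placeholders, not real entries.
KB: Dict[Tuple[str, str], Annotation] = {
    ("GENE_A", "X100Y"): Annotation("oncogenic", Level.STANDARD_THERAPY),
    ("GENE_B", "Z200W"): Annotation("likely oncogenic", Level.CLINICAL_TRIAL),
    ("GENE_C", "Q300R"): Annotation("likely neutral"),
}


def annotate(gene: str, variant: str) -> Annotation:
    """Look up a variant; anything not in the knowledgebase is 'unknown'."""
    return KB.get((gene, variant), Annotation("unknown"))


def most_actionable(variants: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the patient's variant with the strongest treatment tier, if any."""
    scored = [(annotate(g, v).level, (g, v)) for g, v in variants]
    scored = [item for item in scored if item[0] is not None]
    return max(scored, key=lambda item: item[0])[1] if scored else None


patient = [("GENE_C", "Q300R"), ("GENE_B", "Z200W"), ("GENE_A", "X100Y"), ("GENE_D", "A1B")]
print(annotate("GENE_D", "A1B").variant_effect)  # unknown
print(most_actionable(patient))                  # ('GENE_A', 'X100Y')
```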
cBioPortal usage and interest (cbioportal.org)
>5,000 unique users per week, doubling every year
Numerous academic installations of cBioPortal: Dana-Farber, Princess Margaret, CHOP, Weill Cornell, Fred Hutchinson, UCSC, Columbia, NYU, NY Genome Center, British Columbia, University of Michigan, SickKids, Vanderbilt, Emory, UNC, University of Pittsburgh, CRUK, EMBL, Charité Berlin, institutions in Japan, China, …
Interest from several people in modifying or customizing the code and contributing new features
Interest from pharmaceutical companies and others in using cBioPortal:
● For internal data analysis (large pharma)
● In customer-facing applications (smaller service companies)
Switch to open source
cBioPortal source code is available via GitHub: https://github.com/cBioPortal/cbioportal
License: AGPL v3 (GNU Affero General Public License)
A GPL variant; the main difference is that offering a modified version to users over a network also triggers the copyleft requirements
Impact on cancer research, patient treatment, and drug development through:
• More robust and flexible software
• Accelerated development of new features
• Wider user base, collaborative culture
Core cBioPortal development group
Memorial Sloan Kettering Cancer Center: Nikolaus Schultz, Chris Sander, Benjamin Gross, JJ Gao
Dana-Farber Cancer Institute: Ethan Cerami
Princess Margaret Cancer Centre: Trevor Pugh, Stuart Watt
Re-uniting two cBioPortal founders
Coordination of architectural decisions, feature development, merges, etc.
The Hyve now offers commercial services around cBioPortal
Summary
Rapidly growing body of cancer genomics data (public and private)
Reduction of complexity can make these data accessible and interpretable
cBioPortal allows access to cancer genomics data sets:
• cbioportal.org: public site
• via GitHub: install local versions
cBioPortal is now fully open source:
• software
• data pipelines and data sets coming soon
• commercial support available
Still exploring pre-competitive funding options
Acknowledgements
CMO: Cyriac Kandoth, William Lee, Rajmohan Murali, Nicholas D. Socci, Barry Taylor, Michael Berger, Agnes Viale, David B. Solit, Michael Trapani, Ederlinda Paraiso
Molecular Diagnostics: Ahmet Zehir, Aijaz Syed, Donavan Cheng, Michael Berger, Maria Arcila, Marc Ladanyi
Information Systems: Mike Eubanks, Stu Gardos
cBioPortal: JianJiong Gao, Benjamin Gross, Yichao Sun, Hongxin Zhang, Fred Criscuolo, Dong Li, Adam Abeshouse, Ritika Kundra, Annice Chen
Chris Sander, Onur Sumer, Arman Aksoy, Ethan Cerami
Knowledgebase: Debyani Chakravarty, Sarah Phillips, Julia Rudolph
Bioinformatics Core: Joanne Edington