Lessons Learned: Realities of Building Cancer Models - Sharing, Rewards and Affordability
Stephen Friend MD PhD
Nov 28, 2014
[Figure: EGFR signaling pathway. Nodes: EGFR, ERBB2, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2. Inhibitor: EGFRi. Output: Proliferation, Survival]
• EGFR Pathway commonly mutated/activated in Cancer
• 30% of all epithelial cancers
• Blocking Abs approved for treatment of metastatic colon cancer
• Subsequently found that RASMUT tumors don't respond: "Negative Predictive Biomarker"
• However, there are still EGFR+ / RASWT patients who don't respond: need "Positive Predictive Biomarker"
• And in Lung Cancer it is not clear that RASMUT status is a useful biomarker
Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function
Oncogenes only make good targets in particular molecular contexts: EGFR story
Reality: Overlapping Pathways
Preliminary Probabilistic Models - Rosetta
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Networks facilitate direct identification of genes that are causal for disease
Evolutionarily tolerated weak spots
Nat Genet (2005) 205:370
Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.
CVD
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome
Bone
"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
Plus 3 additional papers in PLoS Genet., BMC Genet.
Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics
  • 50 network papers
  • http://sagebase.org/research/resources.php
List of Influential Papers in Network Modeling
Background: Information Commons for Biological Functions
Sage Bionetworks: a non-profit organization with a vision to enable networked team approaches to building better models of disease
BIOMEDICINE INFORMATION COMMONS INCUBATOR
Sagebase.org
Data Repository
Discovery Platform
Building Disease Maps
Commons Pilots
Sage Bionetworks Collaborators
• Pharma Partners
  • Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Roche, Johnson & Johnson
• Foundations
  • Kauffman, CHDI, Gates Foundation
• Government
  • NIH, LSDF, NCI
• Academic
  • Levy (Framingham)
  • Rosengren (Lund)
  • Krauss (CHORI)
• Federation
  • Ideker, Califano, Nolan, Schadt
Predictive models of cancer phenotypes
Panel of tumor samples
Molecular characterization
• mRNA
• copy number
• somatic mutations
• epigenetics
• proteomics
Cancer phenotypes
• Drug sensitivity screens
• Clinical prognosis
Predictive model
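The slide above describes mapping molecular features of a tumor panel to a phenotype such as drug sensitivity. A minimal sketch of that idea follows, assuming a deliberately simple approach that is not taken from the talk: pick the single feature most correlated with the phenotype and fit a least-squares line. The feature names (`mRNA_geneA`, `copy_number_geneB`), the toy values, and the functions `train` and `predict` are all hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def train(features, phenotype):
    """Select the most correlated feature and fit phenotype = a + b * feature.

    features: dict mapping feature name -> list of values, one per sample.
    phenotype: list of measured phenotype values, one per sample.
    """
    best = max(features, key=lambda name: abs(pearson(features[name], phenotype)))
    x = features[best]
    mx, my = mean(x), mean(phenotype)
    var_x = sum((a - mx) ** 2 for a in x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, phenotype))
    slope = cov / var_x if var_x else 0.0
    return {"feature": best, "slope": slope, "intercept": my - slope * mx}


def predict(model, sample):
    """Predict the phenotype for one sample given as a dict of feature values."""
    return model["intercept"] + model["slope"] * sample[model["feature"]]


# Hypothetical toy panel of five tumor samples (values are made up).
features = {
    "mRNA_geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "copy_number_geneB": [2.0, 2.0, 3.0, 2.0, 2.0],
}
drug_sensitivity = [2.0, 4.0, 6.0, 8.0, 10.0]
model = train(features, drug_sensitivity)
```

Real models of this kind combine many features and data types; the single-feature fit is only meant to make the input and output shapes concrete.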
Relating a genetic feature of a cancer to the efficacy of a drug: Gleevec (Imatinib) improves survival in CML patients harboring the BCR-ABL translocation
[Figure: overall survival (%) versus months after beginning treatment]
Brian J Druker, Nature Medicine 15, 1149-1152 (2009)
[Figure: oncogenic pathway diagram with targeted compounds. Genes: KRAS, NRAS, BRAF, MEK1/2, EGFR, ERBB2, MET, PIK3CA, BCR/ABL, ARF, MDM2, TP53. Compounds: Imatinib, Nilotinib, AZD0530, AZD6244, Erlotinib, Lapatinib, Nutlin-3, PD-0325901, PF2341066, PHA-665752, PLX4720, RAF265, ZD-6474]
[Diagram: Biological System, Data, Analysis]
Fundamentally, biological science hasn't changed yet because of the 'Omics Revolution...
...it is still about the process of linking a system to a hypothesis to some data to some analyses
Iterative Networked Approaches To Generating Analyzing and Supporting New Models
Uncouple the automatic linkage between the data generators, analyzers, and validators
SYNAPSE
CURATED DATA
TOOLS/ METHODS
ANALYSES/ MODELS
RAW DATA
BioMedicine Information Commons
Data Generators
Data Analysts
Experimentalists
Clinicians
Patients/ Citizens
Networked Approaches
Networked Team Approaches
1 USABLE DATA
2 REWARDS / RECOGNITION
3 PRIVACY BARRIERS
4 HOW TO DISTRIBUTE TASKS
5 REWARDS FOR SHARING
Open and Networked Team Approaches
1 USABLE DATA
2 REWARDS / RECOGNITION
SYNAPSE
Two approaches to building common scientific and technical knowledge
Text summary of the completed project, assembled after the fact
Social Coding
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• All evolving and accessible in real time
Synapse is GitHub for Biomedical Data
Social Science
• Data and code versioned
• Analysis history captured in real time
• Work anywhere, and share the results with anyone
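To make "data and code versioned, analysis history captured" concrete, here is a minimal sketch of one way such a record could be kept. It is an illustration only and does not reproduce the Synapse API: the functions `content_version` and `record_activity`, the record fields, and the example byte strings are all hypothetical.

```python
import hashlib


def content_version(payload: bytes) -> str:
    """Derive a short version identifier from the content itself."""
    return hashlib.sha256(payload).hexdigest()[:12]


def record_activity(history, code, inputs, output):
    """Append one analysis step linking code and input versions to an output.

    history: list that accumulates the analysis history.
    code, output: bytes; inputs: list of bytes.
    """
    entry = {
        "code": content_version(code),
        "used": [content_version(item) for item in inputs],
        "generated": content_version(output),
    }
    history.append(entry)
    return entry


# Hypothetical example: one normalization step over one raw data file.
history = []
raw = b"sample,expr\nS1,5.2\nS2,7.9\n"
script_v1 = b"normalize(raw)"
step = record_activity(history, script_v1, [raw], b"sample,expr\nS1,0.0\nS2,1.0\n")
```

Because identifiers are derived from content, any change to the script or the data produces a new version, and each history entry states which versions produced which result.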
Watch What I Do, Not What I Say
sage bionetworks synapse project
Most of the People You Need to Work with Don’t Work with You
My Other Computer is “The Cloud”
Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
[Figure: modeling pipeline. Inputs: Expression, Copy number, Mutation, Phenotype. Stages: Predictive model generation, Performance assessment]
Synapse infrastructure for sharing, searching, and analyzing TCGA data
• Automated workflows for curation, QC, and sharing of large-scale datasets.
• All of TCGA, GEO, and user-submitted data processed with standard normalization methods.
• Searchable TCGA data:
  • 23 cancers
  • 11 data platforms
  • Standardized meta-data ontologies
[Figure: Prediction Accuracy (R2) across 130 drugs]
• Comparison of many modeling approaches applied to the same data.
• Models transparently shared and reusable through Synapse.
• Displayed is a comparison of 6 modeling approaches to predict sensitivity to 130 drugs.
• Extending pipeline to evaluate prediction of TCGA phenotypes.
• Hosting of collaborative competitions to compare models from many groups.
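The comparison described above (several modeling approaches scored per drug by R2) can be sketched as follows. This is a minimal illustration, assuming R2 means the coefficient of determination and that approaches are ranked by their mean R2 across drugs; the slides do not specify either detail. The names `approach_A`, `approach_B`, `drug_1`, `drug_2` and all values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rank_approaches(observed_by_drug, predictions):
    """Rank modeling approaches by mean R2 over all drugs, best first.

    observed_by_drug: dict drug -> list of measured sensitivities.
    predictions: dict approach -> dict drug -> list of predicted sensitivities.
    """
    scores = []
    for approach, by_drug in predictions.items():
        per_drug = [
            r_squared(observed_by_drug[drug], by_drug[drug])
            for drug in observed_by_drug
        ]
        scores.append((approach, sum(per_drug) / len(per_drug)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical held-out measurements for two drugs and two approaches.
observed = {"drug_1": [1.0, 2.0, 3.0], "drug_2": [2.0, 4.0, 6.0]}
predictions = {
    "approach_B": {"drug_1": [2.0, 2.0, 2.0], "drug_2": [4.0, 4.0, 4.0]},
    "approach_A": {"drug_1": [1.0, 2.0, 3.0], "drug_2": [2.0, 4.0, 6.0]},
}
ranking = rank_approaches(observed, predictions)
```

Scoring every approach on the same held-out measurements is what makes the comparison objective; a predictor that always outputs the mean scores 0 under this definition.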
Open and Networked Approaches
3 PRIVACY BARRIERS
PORTABLE LEGAL CONSENT: weconsent.us John Wilbanks
REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge
4 HOW TO DISTRIBUTE TASKS
COLLABORATIVE CHALLENGES
5 REWARDS FOR SHARING
What is the problem?
• Our current models of disease biology are primitive and limit doctors' understanding and ability to treat patients
• Current incentives reward those who silo information and work in closed systems
The Solution: Competitions to crowd-source research in biology and other fields
• Why competitions?
  • Objective assessments
  • Acceleration of progress
  • Transparency
  • Reproducibility
  • Extensible, reusable models
• Competitions in biomedical research
  • CASP (protein structure)
  • Foldit / EteRNA (protein / RNA structure)
  • CAGI (genome annotation)
  • Assemblathon / Alignathon (genome assembly / alignment)
  • SBV Improver (industrial methodology benchmarking)
  • DREAM (co-organizer of Sage/DREAM competition)
• Generic competition platforms
  • Kaggle, InnoCentive, MLcomp
METABRIC
• Array-CGH
• Expression arrays
• Sequencing TP53 PIK3CA
• Amplified DNA and cDNA banks
• miRNA profiling
Anglo-Canadian collaboration
Gene sequencing (ICGC)
Sage/DREAM Challenge: Details and Timing
Phase 1: July thru end-Sep 2012
• Training data: 2,000 breast cancer samples from METABRIC cohort
  • Gene expression
  • Copy number
  • Clinical covariates
  • 10 year survival
• Supporting data: Other Sage-curated breast cancer datasets
  • >1,000 samples from GEO
  • ~800 samples from TCGA
  • ~500 additional samples from Norway group
  • Curated and available on Synapse, Sage's compute platform
• Data released in phases on Synapse from now through end-September
• Will evaluate accuracy of models built on METABRIC data to predict survival in:
  • Held out samples from METABRIC
  • Other datasets
Phase 2: Oct 15 thru Nov 12, 2012
• Evaluation of models in novel dataset.
• Validation data: ~500 fresh frozen tumors from Norway group with:
  • Clinical covariates
  • 10 year survival
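The slides say models are evaluated on their accuracy in predicting survival in held-out samples, without naming the metric. A minimal sketch of one common choice for survival data, the concordance index, is shown below as an assumption. The function name `concordance_index` and the toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if censored.
    risk_scores: model output, higher meaning shorter expected survival.
    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical held-out set: survival in years, third patient censored.
held_out_times = [2.0, 5.0, 8.0, 10.0]
held_out_events = [1, 1, 0, 1]
```

A score of 1.0 means every comparable pair is ranked correctly, 0.5 is what a constant prediction achieves, and 0.0 means the ranking is fully reversed.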
Synapse transparent, reproducible, versioned machine learning infrastructure for method comparison
METABRIC cohort: 997 breast cancer samples
• Clinical covariates
• Gene expression (Illumina HT12v3)
• Copy number (Affy SNP 6.0)
• 10 year survival
Loaded through Synapse R client as Bioconductor objects.
Custom models implement train() and predict() API.
Implementation of simple clinical-only survival model used as baseline predictor.
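The train() and predict() contract above can be illustrated with a small sketch. The original infrastructure used the Synapse R client; the Python class below is only an analogy. Its signatures, the class name `ClinicalOnlyBaseline`, the `grade` covariate, and the group-mean rule are assumptions, and censoring is ignored for brevity.

```python
from collections import defaultdict


class ClinicalOnlyBaseline:
    """Hypothetical baseline: predict survival from one clinical covariate."""

    def __init__(self, covariate):
        self.covariate = covariate
        self.group_mean = {}
        self.overall_mean = None

    def train(self, clinical_rows, survival_years):
        """Learn the mean survival time for each value of the covariate."""
        groups = defaultdict(list)
        for row, years in zip(clinical_rows, survival_years):
            groups[row[self.covariate]].append(years)
        self.group_mean = {k: sum(v) / len(v) for k, v in groups.items()}
        self.overall_mean = sum(survival_years) / len(survival_years)
        return self

    def predict(self, clinical_rows):
        """Return predicted survival; unseen covariate values get the overall mean."""
        return [
            self.group_mean.get(row[self.covariate], self.overall_mean)
            for row in clinical_rows
        ]


# Hypothetical training data: tumor grade and survival in years.
train_rows = [{"grade": 1}, {"grade": 1}, {"grade": 3}, {"grade": 3}]
train_years = [10.0, 8.0, 2.0, 4.0]
baseline = ClinicalOnlyBaseline("grade").train(train_rows, train_years)
predicted = baseline.predict([{"grade": 1}, {"grade": 3}, {"grade": 2}])
```

Any model exposing the same two methods can be swapped in and scored by the same evaluation code, which is what makes a shared baseline useful for method comparison.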
Trey Ideker, Janusz Dutkowski
Eric Schadt, Gaurav Pandey
Gustavo Stolovitzky, Erhan Bilal
Andrea Califano, Yishai Shimoni, Mukesh Bansal, Mariano Alvarez
Garry Nolan
In Sock Jang, Ben Sauerwine, Stephen Friend
Justin Guinney
Marc Vidal
Adam Margolin
Ben Logsdon
Federation modeling competition
Models submitted and evaluated in real-time leaderboard
>200 models tested within 3 months
Sage-DREAM Breast Cancer Prognosis Challenge: one month of building better disease models together
154 participants; 27 countries
268 participants; 32 countries
290 models posted to Leaderboard
breast cancer data
Challenge Launch: July 17
August 17 Status
Examples of Participants
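The leaderboard mentioned above, where models are posted and scored as they arrive, can be sketched as a small data structure. This is an illustrative assumption rather than the Challenge's actual implementation: the class `Leaderboard`, its methods, the rule of keeping each team's best score, and the team names are all hypothetical.

```python
class Leaderboard:
    """Hypothetical leaderboard keeping each team's best-scoring model."""

    def __init__(self):
        self._best = {}  # team -> (model_name, score)
        self.submissions = 0

    def submit(self, team, model_name, score):
        """Record a scored submission; higher scores are better."""
        self.submissions += 1
        current = self._best.get(team)
        if current is None or score > current[1]:
            self._best[team] = (model_name, score)

    def ranking(self):
        """Return (team, model_name, score) tuples, best score first."""
        rows = [(team, name, score) for team, (name, score) in self._best.items()]
        return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical submissions arriving over time.
board = Leaderboard()
board.submit("team_x", "clinical_only", 0.62)
board.submit("team_y", "expression_model", 0.68)
board.submit("team_x", "clinical_plus_expression", 0.71)
board.submit("team_y", "expression_model_v2", 0.65)
```

Because every submission is scored immediately against the same data, participants can see where they stand and build on each other's posted models.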
Summary of Breast Cancer Challenge #1
https://synapse.sagebase.org/ - BCCOverview:0
Transparency, reproducibility
Validation in novel dataset
Publication in Science Translational Medicine
Donation of Google-scale compute space.
For the goal of promoting democratization of medicine... Registration starting NOW...
sign up at: synapse.sagebase.org
Breast Cancer Collaborative Challenges and Beyond
Start With Pre-Collated Cohort
Collaborative Challenge Hosted on Synapse
Obtain research questions from breast cancer community for Challenge 2
Generate and fund Challenge 2 research proposal
The challenge on molecular predictors of breast cancer will create a community-based effort to provide an unbiased assessment of the most accurate models and methodologies for prediction of breast cancer survival.
Announce best performing model to predict breast cancer survival