Friend EORTC 2012-11-08
Post on 25-Dec-2014
Integrating Cancer Networks and the Value of Compute Spaces
Stephen H Friend November 8, 2012
EORTC/NCI Dublin
[Figure: EGFR pathway diagram (EGFR, ERBB2, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2, EGFR inhibitor (EGFRi); outputs: proliferation, survival)]
• EGFR pathway commonly mutated/activated in cancer
• 30% of all epithelial cancers
• Blocking antibodies approved for treatment of metastatic colon cancer
• Subsequently found that RAS-mutant (RASmut) tumors don’t respond: a “negative predictive biomarker”
• However, some EGFR+/RAS wild-type (RASwt) patients still don’t respond: a “positive predictive biomarker” is needed
• And in lung cancer it is not clear that RASmut status is a useful biomarker
Predicting treatment response to drugs targeting known oncogenes is complex and requires a detailed understanding of how different genetic backgrounds function
Oncogenes only make good targets in particular molecular contexts: the EGFR story
Reality: Overlapping Pathways
Preliminary Probabilistic Models: Rosetta
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry of New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Networks facilitate direct identification of genes that are causal for disease
Evolutionarily tolerated weak spots
Nat Genet (2005) 205:370
Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
>80 Publications from Rosetta Genetics/Sage Bionetworks
Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature (2008)
"Genetics of gene expression and its effect on disease." Nature (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet (2009)
….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.
CVD
"Identification of pathways for atherosclerosis." Circ Res (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome
Bone
"Integrating genotypic and expression data …for bone traits…" Nat Genet (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res (2009)
Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol (2007)
"Integrating large-scale functional genomic data ..." Nat Genet (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
[Diagram: biological system, data, analysis]
Iterative Networked Approaches to Generating, Analyzing, and Supporting New Models
Uncouple the automatic linkage between the data generators, analyzers, and validators
An Alternative
“Commons are resources that are owned in common or shared among communities.”
- David Bollier
Biomedicine Information Commons
Sage Bionetworks
A non-profit organization with a vision to enable networked team approaches to building better models of disease
BIOMEDICINE INFORMATION COMMONS INCUBATOR
Better Models of Disease: INFORMATION COMMONS
[Diagram: Technology Platform, Challenges, Impactful Models, Governance]
Sage Bionetworks Collaborators
Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche, Johnson & Johnson
Foundations: Kauffman, CHDI, Gates Foundation
Government: NIH, LSDF, NCI
Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
Federation: Ideker, Califano, Nolan, Schadt
Constituencies
[Diagram: IT/Data Generators, Pharma, Academic Consortia, Joint Patient/Scientist Communities, Biotech, Patient Foundations, and Individual Patients around Better Models of Disease: INFORMATION COMMONS (Technology Platform, Challenges, Impactful Models, Governance)]
Background: Information Commons for Biological Functions
[Diagram: BioMedicine Information Commons with SYNAPSE at its core (raw data, curated data, tools/methods, analyses/models), connected to data generators, data analysts, experimentalists, clinicians, and patients/citizens]
Networked Approaches
FOUR PILOTS IN THE SAGE BIONETWORKS COMMONS INCUBATOR
• Provide a “compute space” for hosting and sharing models (SYNAPSE), to complement the data storage and tools provided by Sanger, Broad…
• Co-generate models of drivers for cell line/clinical sensitivity
• Host Challenges and other approaches that maximize the number of people providing and sharing their insights as quickly as possible – https://synapse.sagebase.org/ - BCCOverview:0
• Engage citizens as partners in gathering information, insights, and funds
Two approaches to building common scientific and technical knowledge
• Text summary of the completed project, assembled after the fact
• Social Coding: every code change versioned, every issue tracked, every project the starting point for new work, all evolving and accessible in real time
“Synapse is a compute platform for transparent, reproducible, and modular collaborative research.”
Synapse is GitHub for Biomedical Data
Synapse:
• Data and code versioned
• Analysis history captured in real time
• Work anywhere, and share the results with anyone
• Social/Interactive Science
GitHub:
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• Social/Interactive Coding
Currently at 16K+ datasets and ~1M models
Demo Interaction
Download data from the web; programmatic access to data
Data Repository: with versions (points to a specific version of the repository)
Pancancer collaborative subtype discovery
Download analysis and meta-analysis
Download another cluster result
Download evaluation and view more stats
• Perform Model averaging
• Compare/contrast models
• Find consensus clusters
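The “Find consensus clusters” step can be done in several ways, and the slide does not say which one was used. Here is a minimal sketch. It assumes each submitted result is a list of cluster labels over the same samples. Two samples are linked when they share a cluster in at least half of the results, and the connected components become the consensus clusters. This is an evidence-accumulation style approach swapped in for illustration; `consensus_clusters` and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def consensus_clusters(clusterings, threshold=0.5):
    """Consensus labels from several clusterings of the same samples (hypothetical helper).

    Two samples are linked when they share a cluster in at least `threshold` of the
    clusterings; consensus clusters are the connected components of that graph.
    """
    n = len(clusterings[0])
    if any(len(c) != n for c in clusterings):
        raise ValueError("all clusterings must label the same samples")
    parent = list(range(n))  # union-find forest over samples

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        together = sum(c[i] == c[j] for c in clusterings) / len(clusterings)
        if together >= threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    labels, names = [], {}
    for i in range(n):
        labels.append(names.setdefault(find(i), len(names)))
    return labels
```

For the three labelings [0, 0, 1, 1, 2], [1, 1, 0, 0, 0] and [0, 0, 1, 1, 1], the call returns [0, 0, 1, 1, 1]. Samples 2, 3 and 4 co-cluster in two of the three results, so they merge. Only co-membership matters, not the label values, so results from different groups can be pooled without aligning their cluster IDs.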
[Figure: predictive model generation from expression, copy number, mutation, and phenotype data, followed by performance assessment; prediction accuracy (R²) across 130 drugs]
Synapse infrastructure for sharing, searching, and analyzing TCGA data
• Comparison of many modeling approaches applied to the same data.
• Models transparently shared and reusable through Synapse.
• Displayed is a comparison of 6 modeling approaches to predict sensitivity to 130 drugs.
• Extending pipeline to evaluate prediction of TCGA phenotypes.
• Hosting of collaborative competitions to compare models from many groups.
Synapse transparent, reproducible, versioned machine learning infrastructure for method comparison
1) Automated, standardized workflows for curation, QC and hosting of large-scale datasets (Brig Mecham).
2) Programmatic APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):
    library(synapseClient)  # Sage's R client for Synapse at the time
    # Load cell line feature and response data (the *Id variables hold Synapse IDs)
    ccleFeatureData <- getEntity(ccleFeatureDataId)
    ccleResponseData <- getEntity(ccleResponseDataId)
    # Load TCGA feature and phenotype data (in the same format as the cell line data)
    tcgaFeatureData <- getEntity(tcgaFeatureDataId)
    tcgaResponseData <- getEntity(tcgaResponseDataId)
3) Pluggable API to implement predictive modeling algorithms: the user implements customTrain() and customPredict() functions, with support for all commonly used machine learning methods (for automated benchmarking against new methods).
[Diagram: custom model 1, custom model 2, … custom model N]
4) Statistical performance assessment across models.
5) Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis).
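The slide describes the pluggable API only in R terms (customTrain()/customPredict()). The sketch below is a hypothetical Python analogue of that design. Every model exposes the same train/predict pair, so one benchmarking loop can score any number of plugged-in methods on the same held-out data. The class names, the `benchmark()` helper, and the use of R² as the score are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from statistics import mean


class CustomModel(ABC):
    """Hypothetical plug-in interface: every method supplies train() and predict()."""

    @abstractmethod
    def train(self, features, response):
        """Fit the model and return self so calls can be chained."""

    @abstractmethod
    def predict(self, features):
        """Return one prediction per row of `features`."""


class MeanModel(CustomModel):
    """Baseline that always predicts the mean training response."""

    def train(self, features, response):
        self.mu = mean(response)
        return self

    def predict(self, features):
        return [self.mu for _ in features]


class SingleFeatureLinear(CustomModel):
    """Ordinary least squares on one feature column (illustrative plug-in)."""

    def __init__(self, column=0):
        self.column = column

    def train(self, features, response):
        x = [row[self.column] for row in features]
        mx, my = mean(x), mean(response)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, response))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, features):
        return [self.intercept + self.slope * row[self.column] for row in features]


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observed values."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def benchmark(models, train_x, train_y, test_x, test_y):
    """Train each plugged-in model on the same split and score it on held-out data."""
    return {
        name: r_squared(test_y, model.train(train_x, train_y).predict(test_x))
        for name, model in models.items()
    }


if __name__ == "__main__":
    models = {"mean": MeanModel(), "linear": SingleFeatureLinear()}
    scores = benchmark(models, [[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0],
                       [[4.0], [5.0]], [9.0, 11.0])
    print(scores)  # {'mean': -36.0, 'linear': 1.0}
```

Because the harness only calls train() and predict(), a new method is compared against existing ones by writing those two functions. The data loading, split, and scoring code stays unchanged.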
Objective assessment of factors influencing model performance (>1 million predictions evaluated)
Prediction accuracy on Sanger (130 compounds) and CCLE (24 compounds) improved by:
• Not discretizing data
• Including expression data
• Elastic net regression
[Figure: cross-validation prediction accuracy (R²)]
In Sock Jang
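The accuracy measure on this slide is cross-validated R². This minimal sketch assumes a single numeric feature, with ordinary least squares standing in for the real models. Each sample's prediction comes from a model fit without that sample (k-fold cross-validation), and R² is then computed on the pooled predictions. The helper names and the k = 5 default are hypothetical. Averaging per-fold R² instead is another common convention.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def cv_r_squared(x, y, k=5, seed=0):
    """k-fold cross-validated R^2: each sample is predicted by a model fit without it."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)          # reproducible random fold assignment
    folds = [idx[f::k] for f in range(k)]
    preds = [0.0] * len(x)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = intercept + slope * x[i]
    my = mean(y)
    ss_res = sum((obs - p) ** 2 for obs, p in zip(y, preds))
    ss_tot = sum((obs - my) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot
```

Cross-validated R² can be negative when a model predicts held-out samples worse than their mean. That is one reason it is a stricter yardstick than in-sample fit when comparing modeling choices such as discretizing data or not.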
Assessment of pathway enrichment of inferred predictive feature sets
[Figure: enrichment of KEGG, REACTOME, and BIOCARTA pathways across compounds, for Sanger and CCLE]
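Pathway enrichment of a predictive feature set is often scored with a one-sided hypergeometric (Fisher) over-representation test. The slide does not say which test produced this figure. GSEA, listed earlier as an output option, is a different, rank-based method. Here is a minimal sketch of the over-representation variant, assuming the feature set and each pathway (for example a KEGG, REACTOME, or BIOCARTA gene set) come from the same gene universe. The function name is hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, selected_size, overlap):
    """One-sided hypergeometric p value: P(overlap >= observed) if the selected
    features were a random draw of `selected_size` genes from the universe."""
    if not 0 <= overlap <= min(pathway_size, selected_size):
        raise ValueError("overlap must lie between 0 and min(pathway_size, selected_size)")
    total = comb(universe_size, selected_size)
    # Sum the exact tail probabilities with integer arithmetic, then divide once.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, min(pathway_size, selected_size) + 1)
    )
    return tail / total
```

For example, suppose a 5-gene pathway lies entirely inside a 5-gene feature set drawn from a 20-gene universe. The p value is then 1/15504, about 6.4 × 10⁻⁵. With many pathways and compounds tested at once, the resulting p values would still need a multiple-testing correction.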
Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
Why Stratifying Patients for Therapy Matters
[Figure: response rates in metastatic colorectal cancer (mCRC), chemotherapy vs. chemotherapy + cetuximab: 43% vs. 59% in RASwt patients; 40% vs. 36% in RASmut patients; responders vs. non-responders]
60% of mCRC patients are RASwt; 40% of mCRC patients are RASmut
But not all CRC patients that are RASwt respond to cetuximab. In other cancers for which it is efficacious, RAS status appears not to predict response (e.g. lung).
[Figure: EGFR pathway diagram (EGFR, KRAS, BRAF, MEK1/2; outputs: proliferation, survival)]
RAS Model using primary tumor data to predict KRAS mutation status
290 CRC samples:
• KRAS codon 12 or 13 mutant (n=115) vs. wild type (n=175)
• Penalized regression model using elastic net and gene expression data
Robust External Validation in CRC Data Sets
RAS signatures derived from the CRC cohort can classify mutation status in CRC
[Figure: ROC curves (true positive rate vs. false positive rate) for the TCGA CRC, Khambata-Ford, and Gaedcke data sets]
Model specific to CRC: does not generalize to other KRAS-dependent cancers
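The ROC curves summarize how well the RAS signature score separates mutant from wild-type tumors. Here is a minimal sketch of the area under that curve. It computes the AUC directly as the probability that a randomly chosen mutant sample scores higher than a randomly chosen wild-type sample, with ties counted as one half. The function name and the label coding (1 = mutant, 0 = wild type) are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via its pairwise definition.

    scores -- classifier output per sample (higher = more likely mutant)
    labels -- 1 for mutant, 0 for wild type
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            # A correctly ordered pair counts 1, a tie counts 1/2.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.3, 0.5, 0.1], [1, 1, 0, 0])` gives 0.75, because three of the four mutant/wild-type pairs are ordered correctly. The pairwise loop is quadratic in sample size but makes the meaning of an AUC value explicit. An AUC of 0.5 is chance level.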
[Figure: RIS (0 to 1) for TCGA CRC tumors grouped by mutation (legend: kras, braf, nras, wt). KRAS: p.G12D, p.G12V, p.G13D, p.A146T, p.G12C, p.G12S, p.G12A, p.K117N, p.Q61L, p.A146V, p.E98X, p.G12R, p.G13C, p.Q22K, p.R68S; BRAF: p.V600E, p.E228V, p.F247L, p.K205Q; NRAS: p.Q61K, p.G12C, p.G12D, p.G13R, p.Q61L, p.E132K, p.G12A, p.Q61H, p.Q61R, p.R164C; WT]
Exploring the RASness Model in TCGA Colorectal Carcinoma
Putative novel activating KRAS mutations
Can we predict response to RAS pathway drugs in CRC cell lines?
RASness model translates to predicting response to RAS pathway drugs in CRC cell lines
[Figure: P values; drugs include PD-0325901 and AZD6244 (MEK inhibitors); EGFR pathway diagram (EGFR, ERBB2, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2; outputs: proliferation, survival)]
Note: KRAS and/or BRAF mutation status NOT predictive of response to MEK inhibitor
Correlate RASness score with IC50 for drugs across 21 CRC cell lines from the CCLE panel [1]
1. Barretina et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603.
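The slide does not specify which correlation was computed between the RASness score and IC50. This minimal sketch assumes Spearman's rank correlation, which is insensitive to the skewed scale of IC50 values, with one value per cell line in plain Python lists. A negative coefficient would mean that higher RASness scores go with lower IC50, i.e. greater sensitivity. The function names and the example values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


rasness = [0.1, 0.4, 0.35, 0.8]   # made-up scores for four cell lines
ic50 = [5.0, 2.0, 3.0, 0.5]       # made-up IC50 values for the same lines
print(spearman(rasness, ic50))     # -1.0: perfectly inverse ranking
```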
RASness Model Predicts response to Cetuximab in patient and xenograft data
[Figure: RIS in cetuximab non-responders vs. responders, points marked by mutation status (kras, braf, kras+braf, wt). Panels: tumor, n=19: p=0.023 (p>.5); xeno early, n=26: p=0.017 (p=0.03); xeno late, n=28: p=0.0034 (p>.5); tumor + xeno, n=73: p=1.4e-05 (p=0.13)]
RAS model predicts response to Cetuximab better than mutation status
• 54 xenograft models
• 115 expression arrays, xenograft and primary tumor
• KRAS, BRAF, PIK3CA, APC profiled
• Response to cetuximab, 5-FU, L-OHP, CPT-11 measured
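The p values on the cetuximab slide compare RIS between responders and non-responders, but the test is not named. This minimal sketch assumes a two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the large-sample normal approximation, with no tie or continuity correction. U divided by n1·n2 equals the AUC of the score. The function names are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs (x in a, y in b) with x > y, ties counted as 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def rank_sum_p_value(a, b):
    """Two-sided p value from the large-sample normal approximation.

    No tie or continuity correction; exact tables are preferable for very small groups.
    """
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

A rank-based test fits a score like RIS because it compares the two groups without assuming the scores are normally distributed. With group sizes as small as the 19-tumor panel, an exact or tie-corrected version would be the more careful choice.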
Predictive models of cancer phenotypes
[Diagram: panel of tumor samples; molecular characterization (mRNA, copy number, somatic mutations, epigenetics, proteomics); cancer phenotypes (drug sensitivity screens, clinical prognosis); predictive model]
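Developing predictive models of genotype-specific sensitivity to compound treatment
[Diagram: cancer samples with varying degrees of response to therapy, from sensitive to refractory (e.g. EC50); genetic feature matrix (expression, copy number, somatic mutations, etc.); predictive features (biomarkers)]
Maximize:
$$\log \Pr(\beta \mid C, G) \sim -\lVert C - G\beta \rVert_2^2 - \lambda_1 \lVert \beta \rVert_1 - \lambda_2 \lVert \beta \rVert_2^2$$
where C is the response of each sample to the compound, G is the genetic feature matrix, and β holds the weights of the predictive features (biomarkers).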
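The maximization on this slide is an elastic-net objective. It combines a squared-error fit of compound response C on the feature matrix G with two penalties. The L1 penalty (λ₁) drives most feature weights to exactly zero. The L2 penalty (λ₂) stabilizes groups of correlated features. Maximizing the log-posterior is the same as minimizing the fit term plus both penalties. Below is a minimal cyclic coordinate-descent sketch of that minimization. It assumes centered data with no intercept and stores G as nested lists. It is an illustration, not the implementation used in the work shown; `elastic_net` and its parameters are hypothetical names.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam1, lam2, n_iter=500, tol=1e-10):
    """Minimize ||y - X b||_2^2 + lam1 * ||b||_1 + lam2 * ||b||_2^2 by coordinate descent.

    Assumes the columns of X and the response y are centered (no intercept term).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual (b_j added back in).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            # Closed-form minimizer of the objective in b_j alone.
            new = soft_threshold(rho, lam1 / 2) / (col_sq[j] + lam2)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

Consider the orthogonal toy design X = [[1, 0], [-1, 0], [0, 1], [0, -1]] with y = [2, -2, 3, -3]. With no penalty the solver recovers the exact fit [2.0, 3.0]. Setting lam1 = 2 shrinks it to [1.5, 2.5], and lam1 = 100 zeroes both weights. That zeroing is how the L1 term selects a small set of candidate biomarkers.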
AHR expression predicts sensitivity to MEK inhibitors in NRAS mutant cell lines
Functionally validated by AHR knockdown
[Figure legend: AHR shRNA, control shRNA]
Novel predictions are functionally validated
Prediction Validation
[Figure: BCL-xL expression vs. sensitivity to doxorubicin, triptolide, emetine, ActD (actinomycin D), flavopiridol, anisomycin, puromycin]
BCL-xL expression predicts sensitivity to several chemotherapeutics
Functionally validated by:
BCL-xL knockdown, BCL-xL inhibitor drug synergy
Mouse models, clinical trials
Wei G.*, Margolin A.A.*, et al., Cancer Cell
REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge
What is the problem?
Our current models of disease biology are primitive and limit doctors’ understanding and ability to treat patients
Current incentives reward those who silo information and work in closed systems
The Solution: Competitions to crowd-source research in biology and other fields
Why competitions?
• Objective assessments
• Acceleration of progress
• Transparency
• Reproducibility
• Extensible, reusable models
Competitions in biomedical research
• CASP (protein structure)
• Foldit / EteRNA (protein / RNA structure)
• CAGI (genome interpretation)
• Assemblethon / Alignathon (genome assembly / alignment)
• SBV Improver (industrial methodology benchmarking)
• DREAM (co-organizer of Sage/DREAM competition)
Generic competition platforms
• Kaggle, Innocentive, MLComp
METABRIC
• Array-CGH
• Expression arrays
• Sequencing (TP53, PIK3CA)
• Amplified DNA and cDNA banks
• miRNA profiling
Anglo-Canadian collaboration
Gene sequencing (ICGC)
Sage/DREAM Challenge: Details and Timing
Phase 1: July through end of September 2012
Training data: 2,000 breast cancer samples from METABRIC cohort
• Gene expression
• Copy number
• Clinical covariates
• 10-year survival
Supporting data: Other Sage-curated breast cancer datasets
• >1,000 samples from GEO
• ~800 samples from TCGA
• ~500 additional samples from Norway group
• Curated and available on Synapse, Sage’s compute platform
Data released in phases on Synapse from now through end-September
Will evaluate accuracy of models built on METABRIC data to predict survival in:
• Held out samples from METABRIC
• Other datasets
Phase 2: Oct 15 through Nov 12, 2012
Evaluation of models in novel dataset.
Validation data: ~500 fresh frozen tumors from Norway group with:
• Clinical covariates
• 10-year survival
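The challenge scores survival predictions against held-out outcomes, and the slide does not name the metric. A common choice for censored survival data is Harrell's concordance index, which this minimal sketch assumes. It considers only pairs where the earlier time is an observed event. Among those pairs, it counts the fraction in which the sample with the higher predicted risk failed first. Ties in risk count one half, and tied times are skipped for simplicity. The function name is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored survival data (simplified: tied times are skipped).

    times       -- follow-up time per sample
    events      -- 1 if death/relapse was observed at that time, 0 if censored
    risk_scores -- model output; higher means predicted to fail sooner
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored sample cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # i is known to have failed before j
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 means the risk ranking is no better than chance, and 1.0 means it orders every comparable pair correctly. Because only the ordering matters, a clinical-only baseline and a molecular model can be compared on the same scale even if their raw outputs differ.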
Synapse transparent, reproducible, versioned machine learning infrastructure for method comparison
Custom models implement the train() and predict() API.
A simple clinical-only survival model is implemented as the baseline predictor.
[Participants: Trey Ideker, Janusz Dutkowski, Eric Schadt, Gaurav Pandey, Gustavo Stolovitzky, Erhan Bilal, Andrea Califano, Yishai Shimoni, Mukesh Bansal, Mariano Alvarez, Garry Nolan, In Sock Jang, Ben Sauerwine, Stephen Friend, Justin Guinney, Marc Vidal, Adam Margolin, Ben Logsdon]
Federation modeling competition
Models submitted and evaluated on a real-time leaderboard
>200 models tested within 3 months
Sage-DREAM Breast Cancer Prognosis Challenge: one month of building better disease models together
154 participants; 27 countries
268 participants; 32 countries
290 models posted to Leaderboard
breast cancer data
Challenge Launch: July 17
August 17 Status
Summary of Breast Cancer Challenge #1 https://synapse.sagebase.org/ - BCCOverview:0
Transparency, reproducibility
Validation in novel dataset
Publication in Science Translational Medicine
Donation of Google-scale compute space.
For the goal of promoting democratization of medicine… Registration starting NOW…
sign up at: synapse.sagebase.org
Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of Facts… they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric.
- Edna St. Vincent Millay