Actionable Cancer Network Models And Open Medical Information Systems
Integrating layers of omics data models and compute spaces
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ Amsterdam
ICR Oslo November 1, 2011
Jun 03, 2015
Why not use data-intensive science to build models of disease?
Current Reward Structures
Organizational Structures and Tools
Pilots
Opportunities
What is the problem?
• Regulatory hurdles too high?
• Low-hanging fruit picked?
• Payers unwilling to pay?
• Genome has not delivered?
• Valley of death?
• Companies not large enough to execute on strategy?
• Internal research costs too high?
• Clinical trials in developed countries too expensive?
In fact, all are true but none is the real problem
What is the problem?
We need to rebuild the drug discovery process so that we better understand disease biology before testing proprietary compounds on sick patients
What is the problem?
Most approved cancer therapies assumed tumor indications would represent homogeneous populations
Most new cancer therapies are in search of single altered components
Our existing tumor models often assume pathway knowledge sufficient to infer correct therapies
Personalized Medicine 101: Capturing single base-pair mutations = ID of responders
Reality: Overlapping Pathways
The value of appropriate representations/ maps
Equipment capable of generating massive amounts of data
“Data Intensive” Science- Fourth Scientific Paradigm
Open Information System
IT Interoperability
Host evolving computational models in a “Compute Space”
WHY NOT USE “DATA INTENSIVE” SCIENCE
TO BUILD BETTER DISEASE MAPS?
What will it take to understand disease?
DNA RNA PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
2002 Can one build a “causal” model?
trait
How is genomic data used to understand biology?
“Standard” GWAS Approaches: identify causative DNA variation but provide NO mechanism
Profiling Approaches: genome-scale profiling provides correlates of disease (many examples, BUT what is cause and effect?); provides an unbiased view of molecular physiology as it relates to disease phenotypes; insights on mechanism
“Integrated” Genetics Approaches: provide causal relationships and allow predictions
[Figure: tumors → RNA amplification → microarray hybridization → gene index]
Integration of Genotypic, Gene Expression & Trait Data
Causal Inference
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)
Chen et al. Nature 452: 429 (2008)
Zhang & Horvath. Stat. Appl. Genet. Mol. Biol. 4: article 17 (2005)
Zhu et al. Cytogenet. Genome Res. 105: 363 (2004)
Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
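Causal inference here means asking, for a genetic locus L, a gene expression trait R and a clinical trait C, which ordering of the three best explains the data. The sketch below is a simplified illustration in the spirit of the likelihood-based model comparison of Schadt et al. (2005), not the published procedure. It assumes single-locus, linear-Gaussian relationships and compares three models: causal (L → R → C), reactive (L → C → R) and independent (R ← L → C). The function names, effect sizes and simulated data are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def regression_loglik(x: Sequence[float], y: Sequence[float]) -> float:
    """Maximized log-likelihood of y = a + b*x + e, e ~ N(0, s^2), fitted by least squares."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n  # MLE of the variance
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1.0)


def compare_causal_models(L, R, C) -> Tuple[str, Dict[str, float]]:
    """Score three hypothetical models; the shared P(L) term is dropped.

    Each model is two 3-parameter regressions, so ranking by log-likelihood
    gives the same order as ranking by AIC.
    """
    scores = {
        "causal": regression_loglik(L, R) + regression_loglik(R, C),       # L -> R -> C
        "reactive": regression_loglik(L, C) + regression_loglik(C, R),     # L -> C -> R
        "independent": regression_loglik(L, R) + regression_loglik(L, C),  # R <- L -> C
    }
    return max(scores, key=scores.get), scores


def simulate(n: int = 300, seed: int = 42) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical data generated under the causal model L -> R -> C."""
    rng = random.Random(seed)
    L = [float(sum(rng.random() < 0.5 for _ in range(2))) for _ in range(n)]  # genotype 0/1/2
    R = [1.5 * g + rng.gauss(0.0, 0.7) for g in L]
    C = [r + rng.gauss(0.0, 0.5) for r in R]
    return L, R, C


best, scores = compare_causal_models(*simulate())
print(best, {k: round(v, 1) for k, v in scores.items()})
```

With real data, the locus, transcript and trait would come from a population-based dataset (the "global coherent datasets" of 100s to 1000s of individuals), and a model with different parameter counts would need an explicit penalty such as AIC or BIC.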
“Global Coherent Datasets”
• population based
• 100s-1000s individuals
Define a Gene Co-expression Similarity
Define a Family of Adjacency Functions
Determine the AF Parameters
Define a Measure of Node Distance
Identify Network Modules (Clustering)
Relate the Network Concepts to External Gene or Sample Information
Gene Co-Expression Network Analysis
Zhang B, Horvath S. Stat Appl Genet Mol Biol 2005
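The first four steps listed above can be sketched as follows, assuming the formulation of Zhang & Horvath (2005): unsigned similarity s_ij = |cor(x_i, x_j)|, the power adjacency function a_ij = s_ij^β, connectivity k_i = Σ_u a_iu, and the topological-overlap node distance d_ij = 1 − ω_ij with ω_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij) and l_ij = Σ_u a_iu a_uj. In the method, β is chosen with the scale-free topology criterion; β = 6 here is only an illustrative value, and the 4-gene correlation matrix is the toy brain-sample example from the slides. Clustering the distances into modules is not shown.

```python
from typing import List

Matrix = List[List[float]]

# Toy 4-gene correlation matrix (brain-sample example from the slides)
CORR: Matrix = [
    [1.0, -0.1, -0.6, -0.8],
    [-0.1, 1.0, 0.1, 0.2],
    [-0.6, 0.1, 1.0, 0.8],
    [-0.8, 0.2, 0.8, 1.0],
]


def power_adjacency(corr: Matrix, beta: float = 6.0) -> Matrix:
    """Soft-threshold adjacency a_ij = |cor_ij| ** beta, with a_ii set to 0."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)]
            for i in range(n)]


def topological_overlap(adj: Matrix) -> Matrix:
    """w_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij); w_ii = 1."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the zero diagonal excludes self-links
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # shared neighbours; terms u = i and u = j vanish because a_ii = a_jj = 0
            l_ij = sum(adj[i][u] * adj[u][j] for u in range(n))
            tom[i][j] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


def tom_distance(tom: Matrix) -> Matrix:
    """Node distance used for clustering: 1 - topological overlap."""
    return [[1.0 - v for v in row] for row in tom]


dist = tom_distance(topological_overlap(power_adjacency(CORR)))
for row in dist:
    print([round(v, 3) for v in row])
```

Raising |r| to a power keeps weak correlations as near-zero weights instead of cutting them off, and the overlap term rewards gene pairs that share many neighbours, which tends to make module boundaries less sensitive to noise in single correlations.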
Constructing Co-expression Networks
Start with expression measures for the most variant genes across 100s+ samples (note: this is NOT a gene expression heatmap).
Establish a 2D correlation matrix for all gene pairs (brain sample expression):
       1     2     3     4
1    1.0  -0.1  -0.6  -0.8
2   -0.1   1.0   0.1   0.2
3   -0.6   0.1   1.0   0.8
4   -0.8   0.2   0.8   1.0
Define a threshold for an edge (e.g. |r| ≥ 0.6), giving the connection matrix:
     1  2  3  4
1    1  0  1  1
2    0  1  0  0
3    1  0  1  1
4    1  0  1  1
Hierarchically cluster the connection matrix: genes 1, 3 and 4 form a densely connected block, while gene 2 stays apart.
Identify modules. A network module is a set of genes for which many pairs interact (relative to the total number of pairs in that set).
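A minimal sketch of the hard-threshold procedure, using the 4-gene correlation matrix of the brain-sample example. The slide gives the threshold as "e.g. >0.6", but its connection matrix also links genes 1 and 3 (r = −0.6), so this sketch assumes the absolute correlation is compared with |r| ≥ 0.6. As a stand-in for the slide's hierarchical clustering, modules are taken as connected components of the thresholded graph; this equals single-linkage clustering on the distance 1 − |r| cut at height 0.4, and other linkages may group genes differently. The function names are hypothetical.

```python
from typing import List

# 4-gene correlation matrix from the brain-sample example
CORR = [
    [1.0, -0.1, -0.6, -0.8],
    [-0.1, 1.0, 0.1, 0.2],
    [-0.6, 0.1, 1.0, 0.8],
    [-0.8, 0.2, 0.8, 1.0],
]


def connection_matrix(corr: List[List[float]], threshold: float = 0.6) -> List[List[int]]:
    """1 where |r| >= threshold (an edge), else 0; the diagonal is 1 because r_ii = 1."""
    return [[1 if abs(r) >= threshold else 0 for r in row] for row in corr]


def modules(adj: List[List[int]]) -> List[List[int]]:
    """Connected components of the thresholded graph, as 1-based gene labels."""
    n = len(adj)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:  # depth-first search from `start`
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if adj[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(i + 1 for i in comp))
    return groups


adj = connection_matrix(CORR)
print(adj)           # [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 1]]
print(modules(adj))  # [[1, 3, 4], [2]]
```

Genes 1, 3 and 4 end up in one module and gene 2 on its own, matching the block structure of the clustered connection matrix.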
Preliminary Probabilistic Models (Rosetta/Schadt)
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model (tg = transgenic, ko = knockout) | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Networks facilitate direct identification of genes that are causal for disease: evolutionarily tolerated weak spots
Nat Genet (2005) 205:370
db/db mouse (p ~ 10^-30)
AVANDIA in db/db mouse (legend: up-regulated / down-regulated)
Our ability to integrate compound data into our network analyses
db/db mouse (p ~ 10^-20; p ~ 10^-100)
THE EVOLUTION OF SYSTEMS BIOLOGY
Disease Models
Physiologic / Pathologic
Phenotype Regulation
Literature
Structure Mol. Profiles
Model Evolution
Model Topology
Model Dynamics
Genomic
Signaling
Transcriptional
Protein-Protein Complexes
Extensive Publications now Substantiating Scientific Approach: Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics
• 50 network papers: http://sagebase.org/research/resources.php
List of Influential Papers in Network Modeling (Eric Schadt)
Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature (2008)
"Genetics of gene expression and its effect on disease." Nature (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet (2009)
… plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.
CVD
"Identification of pathways for atherosclerosis." Circ Res (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol (2008)
… plus 5 additional papers in Genome Res., Genomics, Mamm. Genome
Bone
"Integrating genotypic and expression data …for bone traits…" Nat Genet (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res (2009)
Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol (2007)
"Integrating large-scale functional genomic data ..." Nat Genet (2008)
… plus 3 additional papers in PLoS Genet., BMC Genet.
Score Card for Medical Sciences: “Data Intensive” Science, the Fourth Scientific Paradigm
Equipment capable of generating massive amounts of data: A-
Open Information System: D-
IT Interoperability: D
Host evolving computational models in a “Compute Space”: F
We still approach much clinical research as if we were “hunter-gatherers”: not sharing
TENURE FEUDAL STATES
Clinical/genomic data are accessible but minimally usable
Little incentive to annotate and curate data for other scientists to use
Mathematical models of disease are not built to be reproduced or versioned by others
Assumption that genetic alterations in human conditions should be owned
Lack of standard forms for sharing data and lack of forms for future rights and consents
Publication Bias- Where can we find the (negative) clinical data?
Sharing as an adoption of common standards: Clinical, Genomics, Privacy, IP
Sage Mission
Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease
Sagebase.org
Data Repository
Discovery Platform
Building Disease Maps
Commons Pilots
Sage Bionetworks Collaborators
Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Johnson & Johnson
Foundations: Kauffman, CHDI, Gates Foundation
Government: NIH, LSDF
Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
Federation: Ideker, Califano, Butte, Schadt
RULES GOVERN
PLATFORM
NEW MAPS
Disease Map and Tool Users (Scientists, Industry, Foundations, Regulators...)
PLATFORM: Sage Platform and Infrastructure Builders (Academic Biotech and Industry IT Partners...)
PILOTS = PROJECTS FOR COMMONS: Data Sharing Commons Pilots (Federation, CCSB, Inspire2Live...)
NEW TOOLS: Data Tool and Disease Map Generators (Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs...)
RULES AND GOVERNANCE: Data Sharing Barrier Breakers (Patient Advocates, Governance and Policy Makers, Funders...)
Integration of Multiple Networks for Pathway and Target Identification
CNV data, gene expression, clinical traits → Bayesian network and co-expression network → integration of co-expression & Bayesian networks → key driver analysis
Bin Zhang, Jun Zhu
Key Driver Analysis
http://sagebase.org/research/tools.php
Bin Zhang, Jun Zhu, Justin Guinney
A) Miller, 159 samples; B) Christos, 189 samples; C) NKI, 295 samples; D) Wang, 286 samples
Modules: Cell cycle, Pre-mRNA, ECM, Immune response, Blood vessel
E) Super modules
Zhang B et al., Towards a global picture of breast cancer (manuscript).
NKI: N Engl J Med. 2002 Dec 19;347(25):1999.
Wang: Lancet. 2005 Feb 19-25;365(9460):671.
Miller: Breast Cancer Res. 2005;7(6):R953.
Christos: J Natl Cancer Inst. 2006 15;98(4):262.
Model of Breast Cancer: Co-expression
Bin Zhang, Xudong Dai, Jun Zhu
Breast Cancer Bayesian Network Conserved Super-modules
Pathways & Regulators (Key drivers=yellow; key drivers validated in siRNA screen=green)
Cell Cycle (Blue) Chromatin Modification (Black) Pre-mRNA proc. (Brown) mRNA proc. (red)
Extract gene:gene relationships for selected super-modules from BN and define Key Drivers
Zhang B et al., Key Driver Analysis in Gene Networks (manuscript)
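The slide does not spell out how key drivers are defined. One common way to formalize the idea is sketched below, as an assumption rather than the manuscript's exact procedure: for each node of the directed (e.g. Bayesian) network, collect its downstream neighbourhood within `depth` steps and test, with a hypergeometric upper-tail probability, whether genes of the selected super-module are over-represented there. Nodes with the smallest p-values are candidate key drivers. The toy network, module and function names are hypothetical, and no multiple-testing correction is applied.

```python
from collections import deque
from math import comb
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # directed edges: regulator -> targets


def downstream(graph: Graph, node: str, depth: int) -> Set[str]:
    """Nodes reachable from `node` in at most `depth` directed steps (excluding `node`)."""
    seen = {node}
    reached: Set[str] = set()
    queue = deque([(node, 0)])
    while queue:
        u, d = queue.popleft()
        if d == depth:
            continue
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                reached.add(v)
                queue.append((v, d + 1))
    return reached


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def key_driver_pvalues(graph: Graph, module: Set[str], depth: int = 1) -> Dict[str, float]:
    """Enrichment p-value of module genes in each node's downstream set, smallest first."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    N, K = len(nodes), len(module & nodes)
    pvals = {}
    for node in nodes:
        hood = downstream(graph, node, depth)
        if hood:  # leaves have nothing downstream to test
            pvals[node] = hypergeom_sf(len(hood & module), N, K, len(hood))
    return dict(sorted(pvals.items(), key=lambda kv: kv[1]))


toy_graph = {"A": ["B", "C", "D", "E"], "B": ["F"], "F": ["G", "H"], "G": ["I"], "H": ["J"]}
toy_module = {"B", "C", "D", "E"}
print(key_driver_pvalues(toy_graph, toy_module))
```

In this toy network, node A regulates every module gene directly, so it receives the smallest p-value (1/210), while the other regulators point only at non-module genes.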
Bin Zhang Xudong Dai Jun Zhu
Model of Breast Cancer: Integration
= predictive of survival
Predictive model
Developing predictive models of genotype-specific sensitivity to perturbations (Margolin)
Developing predictive models of genotype-specific sensitivity to compound treatment
Predictive features (biomarkers)
Genetic feature matrix: expression, copy number, somatic mutations, etc.
Sensitive / Refractory (e.g. EC50)
Cancer samples with varying degrees of response to therapy
Elastic net regression: models with 500, 100, 20 and 1 feature(s)
Bootstrapping retains robust predictive features
Our approach identifies mutations in genes upstream of MEK as top predictors of sensitivity to MEK inhibition
PD-0325901: #1 Mut BRAF, #3 Mut NRAS
PD-0325901: #9 Mut BRAF, #312 Mut NRAS
[Figure: top-ranked predictive features per compound: #1 BRAF mut; #1 BRAF mut, #2 NRAS mut; #1 BRAF mut, #2 NRAS mut, #3 KRAS mut (two panels); #1 EGFR mut (two panels); #1 ERBB2 expr; #1 EGFR mut, #2 CML lineage; #1 CML lineage; #1 HGF expr; #1 MDM2 expr, #2 TP53 mut, #3 CDKN2A copy]
How accurately would predictive models perform for diagnostics?
For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity
Why not share clinical/genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning)?
Leveraging Existing Technologies
Taverna
Addama
tranSMART
INTEROPERABILITY
Genome Pattern CYTOSCAPE tranSMART I2B2
SYNAPSE
Watch What I Do, Not What I Say Reduce, Reuse, Recycle
Most of the People You Need to Work with Don’t Work with You
My Other Computer is Amazon
sage bionetworks synapse project
CTCAP, Non-Responders, Arch2POCM, The Federation, Portable Legal Consent, Sage Congress Project
Six Pilots at Sage Bionetworks
Clinical Trial Comparator Arm Partnership (CTCAP): Strategic Opportunities for Regulatory Science Leadership and Action, FDA, September 27, 2011
CTCAP
Clinical Trial Comparator Arm Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].
Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.
Started Sept 2010
Sharing clinical/genomic data and analysis will maximize clinical impact and enable discovery
[Figure: data flow from curated to QC'd data to models]
Non-Responders Project
To identify non-responders to approved oncology drug regimens in order to improve outcomes, spare patients unnecessary toxicities from treatments that have no benefit to them, and reduce healthcare costs
The Non-Responder Cancer Project Leadership Team
Garry Nolan, PhD Professor, Baxter Laboratory of Stem Cell Biology, Department of Microbiology and Immunology, Stanford University Director, Proteomics Center at Stanford University
Richard Schilsky, MD Chief, Hematology- Oncology, Deputy Director, Comprehensive Cancer Center, University of Chicago; Chair, National Cancer Institute Board of Scientific Advisors; past-President ASCO, past Chairman CALGB clinical trials group
Todd Golub, MD Founding Director Cancer Biology Program Broad Institute, Charles Dana Investigator Dana-Farber Cancer Institute, Professor of Pediatrics Harvard Medical School, Investigator, Howard Hughes Medical Institute
Stephen Friend, MD, PhD President and Co-Founder of Sage Bionetworks, Head of Merck Oncology 01-08, Founder of Rosetta Inpharmatics 97-01, co-Founder of the Seattle Project
The Non-Responder Project is an international initiative with funding for 6 initial cancers anticipated from both the public and private sectors
Ovarian Renal Breast AML Colon Lung
United States China
Seeking private sector and philanthropic funding for prospective studies
Retrospective study; likely to be funded by the Federal Government
Funded by the Chinese government and private sector partners
GEOGRAPHY
TARGET CANCER
FUNDING SOURCE
Arch2POCM
Restructuring the Precompetitive Space for Drug Discovery
How to Potentially De-Risk High-Risk Therapeutic Areas
The Federation
2008 2009 2010 2011
How can we accelerate the pace of scientific discovery?
Ways to move beyond “traditional” collaborations?
Intra-lab vs Inter-lab Communication
Colrain/ Industrial PPPs Academic Unions
(Nolan and Haussler)
Human aging: predicting biological age (bioage) using whole-blood methylation
[Figure: biological age predicted from whole-blood methylation vs. chronological age. Training cohort: San Diego (n=170), RMSE = 3.35; validation cohort: Utah (n=123), RMSE = 5.44]
• Independent training (n=170) and validation (n=123) Caucasian cohorts
• 450k Illumina methylation array
• Exome sequencing
• Clinical phenotypes: Type II diabetes, BMI, gender…
Sage Federation: model of biological age
[Figure: predicted age (liver expression) vs. chronological age (years); the age differential separates faster aging from slower aging]
Age differential: clinical association (gender, BMI, disease); genotype association; gene pathway expression
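The two quantities on these slides, the RMSE of predicted vs. chronological age and the per-person age differential, are simple to compute. A minimal sketch, assuming the differential is predicted minus chronological age and that a positive value is read as faster aging; the function names and the three ages are hypothetical, not data from the San Diego or Utah cohorts.

```python
import math
from typing import List, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and chronological ages."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("expected two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def age_differential(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Predicted minus chronological age; positive values are read as faster aging."""
    return [p - a for p, a in zip(predicted, actual)]


def aging_label(delta: float, tolerance: float = 0.0) -> str:
    if delta > tolerance:
        return "faster aging"
    if delta < -tolerance:
        return "slower aging"
    return "as expected"


# Hypothetical ages for three individuals
predicted = [52.0, 61.0, 70.0]
chronological = [50.0, 60.0, 73.0]
deltas = age_differential(predicted, chronological)
print(round(rmse(predicted, chronological), 3))  # 2.16
print([aging_label(d) for d in deltas])  # ['faster aging', 'faster aging', 'slower aging']
```

Reporting RMSE separately for the training and validation cohorts, as the slide does, shows how much accuracy is lost on independent data.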
Reproducible science==shareable science
Sweave: combines programmatic analysis with narrative
Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
Dynamic generation of statistical reports using literate data analysis
Federated Aging Project: Combining analysis + narrative
=Sweave Vignette Sage Lab
Califano Lab Ideker Lab
Shared Data Repository
JIRA: Source code repository & wiki
R code + narrative
PDF(plots + text + code snippets)
Data objects
HTML
Submitted Paper
Portable Legal Consent
(Activating Patients)
John Wilbanks
Sage Congress Project April 20 2012
RA Parkinson’s Asthma
(Responders Competitions)
Why not use data-intensive science to build models of disease?
Current Reward Structures
Organizational Structures and Tools
Six Pilots
Opportunities
IMPACT ON PATIENTS