Open Source pre-competitive drug discovery
Moving beyond linear investigations, both of the science and of how we work
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization), Seattle / Beijing / Amsterdam
February 28, 2012
Dec 04, 2014
Partnering & Collaboration: So what has been possible?
• 2006, Moffitt Cancer Center and Merck: all patients at a partnered cancer center (now >25,000) provide consented expression data for classifying sub-populations
• 2007, AZ and Merck (Mek/Akt): combination therapies, each at Ph I, in joint development by 2 Pharma
• 2008, BMS and Merck: sharing all the CT Onc Trial imaging files among 2 Pharma
• 2008, Belfer and Merck: linking Pharma with an “Institute for Applied Cancer Center”
• 2010, Asian Cancer Research Group (ACRG; Lilly, Merck, Pfizer): sharing genomic data on 25,000 samples, with clinical records, expression and exomes, among three Pharma
So what is the problem?
Most approved therapies were assumed to be monotherapies for diseases representing homogeneous populations
Our existing disease models often assume pathway knowledge sufficient to infer correct therapies
Familiar but Incomplete
Reality: Overlapping Pathways
what will it take to understand disease?
DNA, RNA, PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
DIVERSE POWERFUL USE OF MODELS AND NETWORKS
50 network papers http://sagebase.org/research/resources.php
List of Influential Papers in Network Modeling
(Eric Schadt)
Sage Mission
Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease
Sagebase.org
Data Repository
Discovery Platform
Building Disease Maps
Commons Pilots
Sage Bionetworks Collaborators
Pharma Partners: Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Roche
Foundations: Kauffman, CHDI, Gates Foundation
Government: NIH, LSDF, NCI
Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
Federation: Ideker, Califano, Nolan, Schadt
[Figure labels: RULES GOVERN, PLATFORM, NEW MAPS]
Why not share clinical/genomic data and model building within teams in ways currently used by the software industry (power of tracking workflows and versioning)?
Leveraging Existing Technologies
Taverna
Addama
tranSMART
Watch What I Do, Not What I Say
sage bionetworks synapse project
Reduce, Reuse, Recycle
Most of the People You Need to Work with Don’t Work with You
My Other Computer is Cloudera Amazon Google
Sage Metagenomics Project
• > 10k genomic and expression standardized datasets indexed in SCR
• Error detection, normalization in mG
• Access raw or processed data via download or API in downstream analysis
• Building towards open, continuous community curation
Processed Data (S3)
Sage Metagenomics using Amazon Simple Workflow
Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/
Amazon SWF and Synapse
• Maintains state of analysis
• Tracks step execution
• Logs workflow history
• Dispatches work to Amazon or remote worker nodes
• Efficiently matches job size to hardware
• Provides error handling and recovery
• Hosts raw and processed data for further reuse in public or private projects
• Provides visibility into intermediate results and algorithmic details
• Allows programmatic access to data; integration with R
• Provides standard terminologies for annotations
• Search across data sets
Synapse Roadmap
Timeline: Q1-2012, Q2-2012, Q3-2012, Q4-2012, Q1-2013, Q2-2013, Q3-2013, Q4-2013
Milestones: Internal Alpha, Public Beta Testing, Synapse 1.0, Synapse 1.5, Future
Synapse Platform Functionality (in roadmap order)
• Data Repository; Projects and security; R integration; Analysis provenance
• Search; Controlled Vocabularies; Governance of restricted data
• Workflow templates; Publishing figures; Wiki & collaboration tools; Integrated management of cloud resources
• Social networking; User-customized dashboards; R Studio integration; Curation tool integration
Data / Analysis Capabilities (in roadmap order)
• 40+ manually curated clinical studies; 8000+ GEO / ArrayExpress datasets; Clinical, genomic, compound sensitivity; Bioconductor and custom R analysis
• TCGA; METABRIC breast cancer challenge
• Predictive modeling workflows; Automated processing of common genomics platforms
• TBD: Integrations with other visualization and analysis packages
INTEROPERABILITY
[Figure: SYNAPSE shown with Genome Pattern, CYTOSCAPE, tranSMART, I2B2]
Five Pilots involving Sage Bionetworks
• CTCAP
• The Federation
• Portable Legal Consent
• Sage Congress Project
• Arch2POCM
[Figure labels: RULES GOVERN, PLATFORM, NEW MAPS]
Clinical Trial Comparator Arm Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].
Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create a powerful new tool for drug development.
Started Sept 2010
Sharing clinical/genomic data and analysis will maximize clinical impact and enable discovery
[Graphic: curated data to QC'd data to models]
The Federation
2008 2009 2010 2011
How can we accelerate the pace of scientific discovery?
Ways to move beyond “traditional” collaborations?
Intra-lab vs Inter-lab Communication
Colrain/ Industrial PPPs Academic Unions
(Nolan and Haussler)
Sage Federation: model of biological age
[Figure: Predicted Age (liver expression) plotted against Chronological Age (years); the Age Differential separates Faster Aging from Slower Aging. Associations shown: Clinical Association (Gender, BMI, Disease), Genotype Association, Gene Pathway Expression]
Reproducible science==shareable science
Sweave: combines programmatic analysis with narrative
Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002: Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
[Figure: ranked predictive features (#1, #2, #3) per compound. Features shown: TP53 mut, CDKN2A copy, MDM2 expr, HGF expr, CML lineage, EGFR mut, ERBB2 expr, BRAF mut, NRAS mut, KRAS mut]
Can the approach make new discoveries?
For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity
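The idea of an "unbiased" feature ranking can be sketched as follows. This is only an illustration: the deck does not say which ranking method was used, so absolute Pearson correlation is swapped in here as a simple stand-in, and the function names and toy values are hypothetical (the numbers are made up, not taken from any screen).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no ranking information
    return sxy / sqrt(sxx * syy)


def rank_features(feature_table, response):
    """Order feature names by |correlation| with the response, strongest first."""
    scores = {name: abs(pearson(values, response))
              for name, values in feature_table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up data: six cell lines, three binary mutation features.
toy_features = {
    "BRAF mut": [1, 1, 1, 0, 0, 0],
    "NRAS mut": [0, 1, 0, 0, 1, 0],
    "KRAS mut": [0, 0, 1, 1, 0, 0],
}
toy_sensitivity = [0.9, 0.8, 0.85, 0.2, 0.3, 0.1]

ranking = rank_features(toy_features, toy_sensitivity)
print(ranking[0])
```

No feature is privileged in advance; the check on the slide is whether the top-ranked feature for a compound matches its known stratifier.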
Vaske, et al.
Presentation outline
1) Predicting drug response from cancer cell lines
2) Future approaches: network-based predictors and multi-task learning
3) Standardized workflows for data management, versioning and method comparison
Cancer cell line encyclopedia
• Molecular characterization (1,000 cell lines)
• Small molecule screen: viability screens (500 cell lines, 24 compounds)
Data types
• Currently: mRNA, copy number, somatic mutations (36 cancer-related genes)
• In progress: targeted exon sequencing, epigenetics, microRNA, lncRNA, phospho-tyrosine kinase, metabolites
TCGA / ICGC
• Molecular characterization (50 tumor types): genomics, transcriptomics, epigenetics
• Clinical data
Predictive model
• Transfer learning
• Network / pathway prior information (Vaske, et al.)
1) Data management APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):
ccleFeatureData <- getEntity(ccleFeatureDataId)
ccleResponseData <- getEntity(ccleResponseDataId)
tcgaFeatureData <- getEntity(tcgaFeatureDataId)
tcgaResponseData <- getEntity(tcgaResponseDataId)
Observed Data = Systematic Variation + Random Variation
Normalization: Remove the influence of adjustment variables on data...
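A minimal sketch of what "removing the influence of an adjustment variable" can mean, assuming a single categorical adjustment variable (for example a processing batch) whose systematic effect is a per-group shift. This is a simplified illustration, not the normalization method used in the Sage workflows; the function name and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def remove_adjustment_effect(values, adjustment_labels):
    """Subtract each group's offset from the grand mean.

    Systematic variation is modeled here as a constant shift per level of
    the adjustment variable; what remains is treated as the signal plus
    random variation.
    """
    grand_mean = mean(values)
    groups = defaultdict(list)
    for value, label in zip(values, adjustment_labels):
        groups[label].append(value)
    offsets = {label: mean(vals) - grand_mean for label, vals in groups.items()}
    return [value - offsets[label]
            for value, label in zip(values, adjustment_labels)]


# Toy, made-up measurements of one gene across two batches.
observed = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]
batch = ["a", "a", "a", "b", "b", "b"]

normalized = remove_adjustment_effect(observed, batch)
print([round(v, 2) for v in normalized])
```

After the adjustment both batches share the same mean, while differences between samples within a batch are left untouched.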
2) Automated, standardized workflows for curation and QC of large-scale datasets (Brig Mecham).
A. TCGA: Automated cloud-based processing.
B. GEO / ArrayExpress: Normalization workflows, curation of phenotype using standard ontologies.
C. Additional studies with genetic and phenotypic data in Sage repository (e.g. CCLE and Sanger cell line datasets)
3) Pluggable API to implement predictive modeling algorithms.
A) Support for all commonly used machine learning methods (for automated benchmarking against new methods)
B) Pluggable custom methods as R classes implementing customTrain() and customPredict() methods.
   • Can be arbitrarily complex (e.g. pathway and other priors)
   • Support for parallelization in foreach loops.
4) Statistical performance assessment across models.
[Figure: custom model 1, custom model 2, ... custom model N]
5) Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis)
6) Experimental follow-up on top predictions (TBD). E.g. for cell lines: medium throughput suppressor / enhancer screens of drug sensitivity for knockdown / overexpression of predicted biomarkers.
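The pluggable modeling API in step 3 and the cross-model assessment in step 4 can be sketched roughly as below. The slides describe R classes implementing customTrain() and customPredict(); this is a Python analogue for illustration only, and the class names, the benchmark() helper, the error metric and the toy data are all hypothetical rather than part of Synapse.

```python
from abc import ABC, abstractmethod
from statistics import mean


class CustomModel(ABC):
    """Contract every pluggable model is assumed to satisfy."""

    @abstractmethod
    def custom_train(self, features, response):
        """Fit the model to training rows and their responses."""

    @abstractmethod
    def custom_predict(self, features):
        """Return one prediction per feature row."""


class MeanBaseline(CustomModel):
    """Always predicts the mean training response."""

    def custom_train(self, features, response):
        self.mean_ = mean(response)

    def custom_predict(self, features):
        return [self.mean_ for _ in features]


class SingleFeatureLinear(CustomModel):
    """Least-squares line fitted on one feature column."""

    def __init__(self, column=0):
        self.column = column

    def custom_train(self, features, response):
        xs = [row[self.column] for row in features]
        x_bar, y_bar = mean(xs), mean(response)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, response))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = y_bar - self.slope_ * x_bar

    def custom_predict(self, features):
        return [self.intercept_ + self.slope_ * row[self.column]
                for row in features]


def benchmark(models, train_x, train_y, test_x, test_y):
    """Train each model and report its mean squared error on held-out data."""
    results = {}
    for name, model in models.items():
        model.custom_train(train_x, train_y)
        predictions = model.custom_predict(test_x)
        results[name] = mean([(p - o) ** 2 for p, o in zip(predictions, test_y)])
    return results


# Toy, made-up data following y = 2x + 1.
scores = benchmark(
    {"baseline": MeanBaseline(), "linear": SingleFeatureLinear(column=0)},
    train_x=[[0.0], [1.0], [2.0], [3.0]], train_y=[1.0, 3.0, 5.0, 7.0],
    test_x=[[4.0], [5.0]], test_y=[9.0, 11.0],
)
print(scores)
```

Because every model exposes the same two methods, the benchmarking loop never needs to know how a given model works, which is what makes automated comparison against new methods possible.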
Portable Legal Consent
(Activating Patients)
John Wilbanks
Sage Congress Project April 20 2012
• RealNames Parkinson’s Project
• Revisiting Breast Cancer Prognosis
• Fanconi’s Anemia
(Responders Competitions: IBM-DREAM)
Confidential | © 2012 Third Rock Ventures
THE QUICK WIN, FAST FAIL DRUG DEVELOPMENT PARADIGM
March 1, 2012
[Figure: two drug development pipelines compared. Relative cost by phase: $ $ $$ $$$$]
TRADITIONAL: Scarcity of drug discovery. Test each scarce molecule thoroughly. Preclinical development, Phase I, Phase II, Phase III. Milestones: CS, FHD, FED, PD, Launch.
QUICK WIN, FAST FAIL: Abundance of drug discovery. Higher p(TS). Preclinical development, R&D ‘sweet spot’ (CS, FHD, POC), then confirmation, dose finding and commercialization (PD, Launch).
• Increase critical information content early to shift attrition to cheaper phase
• Use savings from shifted attrition to re-invest in the R&D ‘sweet spot’
Source: Nature Publishing Group
Arch2POCM
Restructuring the Precompetitive Space for Drug Discovery
How to potentially De-Risk High-Risk Therapeutic Areas
Arch2POCM: Highlights
A PPP To De-Risk Novel Targets That The Pharmaceutical Industry Can Then Use To Accelerate The Development of New and Effective Medicines
• The Arch2POCM will be a charitable Public Private Partnership (PPP) that will file no patents and whose scientific plan (including target selection) will be endorsed by its pharmaceutical, private and public funders
• Arch2POCM will de-risk novel targets by developing and using pairs of test compounds (two different chemotypes) that interact with the selected targets: the compounds will be developed through Phase IIb clinical trials to determine if the selected target plays a role in the biology of human disease
• Arch2POCM will work with and leverage patient groups and clinical CROs to enable patient recruitment, and with regulators to design novel studies and to validate novel biomarkers
• Arch2POCM will make its GMP test compounds available to academic groups and foundations so they can use them to perform clinical studies and publish on a multitude of additional indications
• Arch2POCM will release all reagents and data to the public at pre-defined stages in its drug development process. To ensure scientific quality, data and reagents will be released once they have been vetted by an independent scientific committee
• Arch2POCM will publish all negative POCM data immediately in order to reduce the number of ongoing redundant proprietary studies (in pharma, biotech and academia) on an invalidated target and thereby
– minimize unnecessary patient exposure
– provide significant economic savings for the pharmaceutical industry
• In the rare instance in which a molecule achieves positive POCM, Arch2POCM will ensure that the compound has the ability to reach the market by arranging for exclusive access to the proprietary IND database for the molecule
Arch2POCM: scale and scope
• Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/Immunology. One for Neuroscience/Schizophrenia/Autism. Both programs will have 6-8 drug discovery projects (targets) - ramped up over a period of 2 years
– It is envisioned that Arch2POCM’s funding partners will select targets that are judged as slightly too risky to be pursued at the top of pharma’s portfolio, but that have significant scientific potential that could benefit from Arch2POCM’s crowdsourcing effort
• These will be executed over a period of 5 years making a total of 16 drug discovery projects
– Projected pipeline attrition by Year 5 (assuming 12 targets loaded in early discovery)
• 30% will enter Phase 1
• 20% will deliver Ph 2 POCM data
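The projected attrition figures can be checked with a short calculation. This sketch uses the stage PTRS values printed on the "Pipeline flow for Arch2POCM" slide later in the deck (early discovery 45%, pre-clinical 70%, Ph I 65%) and assumes, as a simplification not stated on the slides, that each PTRS is the chance of passing that stage and that stages are independent. The function names are hypothetical.

```python
# Stage PTRS values as printed on the "Pipeline flow for Arch2POCM" slide.
STAGE_PTRS = [
    ("Early discovery", 0.45),
    ("Pre-clinical", 0.70),
    ("Ph I", 0.65),
]


def cumulative_pass_rates(stages):
    """Return (stage, probability of passing this stage and all earlier ones)."""
    rates = []
    running = 1.0
    for name, ptrs in stages:
        running *= ptrs  # assumption: stages succeed or fail independently
        rates.append((name, running))
    return rates


def expected_survivors(targets_loaded, stages):
    """Expected number of loaded targets that pass each stage."""
    return [(name, targets_loaded * rate)
            for name, rate in cumulative_pass_rates(stages)]


rates = cumulative_pass_rates(STAGE_PTRS)
survivors = expected_survivors(12, STAGE_PTRS)  # 12 targets, as assumed on the slide
for (name, rate), (_, count) in zip(rates, survivors):
    print(f"pass {name}: {rate:.1%} of loaded targets, about {count:.1f} of 12")
```

Under these assumptions about 31% of early-discovery targets pass pre-clinical and about 20% pass Ph I, which lines up with the 30% and 20% quoted in the bullets.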
Arch2POCM: proposed funding strategy
– Arch2POCM funding will come from a combination of public funding from governments and private sector funding from pharmaceutical and biotechnology companies and from private philanthropists
– By investing $1.6 M annually into one or both of Arch2POCM’s selected disease areas, partnered pharmaceutical companies:
1. obtain a vote on Arch2POCM target selection
2. gain real time data access to Arch2POCM’s 12-16 drug discovery projects
3. have the strategic opportunity to expand their overall portfolio
[Figure: pipeline stages: Lead identification, Lead optimisation, Preclinical, Phase I, Phase II. Outputs: Assay, in vitro probe, Lead, Clinical candidate, Phase I asset, Phase II asset]
Pioneer targets: genomic/genetic, disease networks, academic partners, private partners, SAGE, SGC,
Stage-gate 1: Early Discovery and PCC Compounds (75%)
Stage-gate 2: Pharma’s re-purposed clinical assets (25%)
Entry points for Arch2POCM programs: Two compounds (different chemotypes) will be advanced per target
Five Year Objective: Initiate ≈ 8 drug discovery projects with 6 entering in Early Discovery, one entering in pre-clinical and one entering in PH I
Pipeline flow for Arch2POCM
Months: 0-6, 7-12, 13-18, 19-24, 25-30, 31-36, 37-42, 43-48, 49-54, 55-60
Stage PTRS*: Early discovery (45% PTRS), Pre-clinical (70% PTRS), Ph I (65% PTRS), Ph II (10% PTRS)
[Figure: Year #1 and Year #2 Arch2POCM Target Loads flowing through Pre-clinical, Ph 1 and Ph 2. Loads shown: Early discovery (4), Early discovery (2), Pre-clinical (1), Ph 1 (1)]
Arch2POCM Snapshot at Year 5
Targets Loaded: 8
Projected INDs filed: 3-4
Ph 1 or 2 Trials In Progress: 2
Projected Complete Ph 2 (POCM) Data Sets: 1
*PTRS = Probability of technical and regulatory success
The case for epigenetics/chromatin biology
1. There are epigenetic oncology drugs on the market (HDACs)
2. A growing number of links to oncology, notably many genetic links (i.e. fusion proteins, somatic mutations)
3. A pioneer area: More than 400 targets amenable to small molecule intervention - most of which only recently shown to be “druggable”, and only a few of which are under active investigation
4. Open access, early-stage science is developing quickly – significant collaborative efforts (e.g. SGC, NIH) to generate proteins, structures, assays and chemical starting points
The current epigenetics universe
Domain Family                 Typical substrate class*               Total Targets
Histone Lysine demethylase    Histone/Protein K/R(me)n / (meCpG)     30
Bromodomain                   Histone/Protein K(ac)                  57
ROYAL
Tudor domain                  Histone Kme2/3 - Rme2s                 59
Chromodomain                  Histone/Protein K(me)3                 34
MBT repeat                    Histone K(me)3                         9
PHD finger                    Histone K(me)n                         97
Acetyltransferase             Histone/Protein K                      17
Methyltransferase             Histone/Protein K&R                    60
PARP/ADPRT                    Histone/Protein R&E                    17
MACRO                         Histone/Protein (p)-ADPribose          15
Histone deacetylases          Histone/Protein KAc                    11
Total                                                                395
Now known to be amenable to small molecule inhibition
Why is Arch2POCM a “smart bet” for Pharma investment?
Arch2POCM: an external epigenetic think tank from which Pharma can load the most-likely-to-succeed targets as proprietary programs or leverage Arch2POCM results for its other internal efforts
• A front row seat on the progression of 6-8 epigenetic targets means that:
– Pharma can select the epigenetic targets that best complement their internal portfolio and for which there is the greatest interest
– Pharma can structure Arch2POCM’s projects so that key objectives line up with internal go/no-go decisions
– Pharma can use Arch2POCM data to trigger its internal level of investment on a particular target
• Pharma can use Arch2POCM resources to enrich their internal epigenetics effort: active chemotypes, assays, pre-clinical models, biomarkers, genetic and phenotypic data for patient stratification, relationships to epigenetic experts
• Pharma can use Arch2POCM’s lead compound chemotypes to:
– inform their proprietary medicinal chemistry efforts on the target
– identify chemical scaffolds that impact epigenetic pathways: a proprietary combination therapy opportunity
• Toxicity screening of Arch2POCM compounds with FDA tools can be used to guide internal proprietary chemistry efforts in oncology, inflammation and beyond
• Arch2POCM’s crowd of scientists and clinicians provides its Pharma partners with parallel shots on goal at the best context for Arch2POCM’s compounds/targets
How will Arch2POCM provide “line of sight” to new medicines?
Arch2POCM will partner with scientists, clinicians and CROs that:
• use “Omics” approaches to construct predictive models of disease networks (genomic, proteomic, signaling and metabolic)
• have strategies available to identify those disease network gene(s) which when perturbed, impact the overall functioning of the network
• already have epigenetic assays in place to identify chemotype structures (from discovery and/or pharma’s re-purposed un-used clinical assets) that impact the target and disease-correlated molecular phenotypes
• already have biomarker tools available that can be tested for correlation to Arch2POCM’s targets
• already have access to patient data and/or patient groups to mine for genetic and phenotypic signatures that may represent best responders for Arch2POCM clinical trials
How will Arch2POCM provide “line of sight” to new medicines? (continued)
• Arch2POCM’s Ph II validation of high risk high opportunity targets focuses Pharma’s NME efforts
– Positive POCM data: De-risked validated targets for Pharma development
– Negative POCM data: public release of this data minimizes the amount of time and money that Pharma and the industry place on failed targets
• Arch2POCM’s clinical candidate compounds provide Pharma with multiple paths to new medicines
– Arch2POCM compounds that achieve POCM can be advanced into Ph 3 by Arch2POCM Members
– The purchaser of Arch2POCM’s IND database obtains a significant time advantage over competitors to generate Phase III data and proceed to market
– NMEs that derive from Arch2POCM will launch with database exclusivity protections: 5-8 years to garner a return on investment
• The crowd’s testing of Arch2POCM compounds may identify alternative/better contexts for agonizing/antagonizing the disease biology target
– indications
– patient stratification
– combination therapy options
Arch2POCM: current partnering status
• Pharmaceutical Funding Partners
– Three companies are considering a potential role as industry anchors for Arch2POCM
– Two companies have demonstrated interest in Arch2POCM and their company leadership wants to go to the next step; awaiting face-to-face discussions to go over the agreement
• Public Funding Partners
– Good progress is being made to obtain financial backing for Arch2POCM from public funders in a number of countries (Canada, United Kingdom and Sweden) for both epigenetics and for CNS
– Ontario Brain Institute, Canada has allocated $3M to the development of an autism clinical network that is committed to work with Arch2POCM
• Philanthropic Funding Partners: awaiting designation of anchor partners
• In kind partners
– GE Healthcare (imaging): lead diagnostics partner and willing to share its experimental oncology biomarkers
– Cancer Research UK: considering participating in Arch2POCM through “in kind efforts”, using some of its drug discovery and development resources
• Academic partners
– Institutions that have indicated willingness to let their scientists participate without patent filing: UCSF, Massachusetts General Hospital, University of North Carolina, University of Toronto, Oxford University, Karolinska Institute
– Academic community of epigenetic experts/resources already identified
• Regulatory partners: Because the objective of the Arch2POCM PPP is to probe and elucidate disease biology as opposed to develop new proprietary products, FDA and EMEA are ready to play an active role (toxicity screens, and legacy clinical trial data)
• Patient group partners: leaders from Genetic Alliance, Inspire2Live and the Love Avon Army of Women are actively engaged
STRATEGIC INFLECTION: FORCES AFFECTING A BUSINESS
MDAndersonCC02272012
[Figure: forces affecting a business: Society’s Needs, Customers, Suppliers, New Technologies, New Competitors. Also shown: Businesses, Academia, Government; Networking Disease Model Building]