Exploring Disease Bionetworks and How we Perform our Science Stephen Friend June 18, 2012 ICR
Nov 28, 2014
Information Commons for Biological Function
KRAS NRAS
BRAF
MEK1/2
EGFR
ERBB2
BCR/ABL
EGFRi
Proliferation, Survival
• EGFR pathway commonly mutated/activated in cancer
• 30% of all epithelial cancers
• Blocking Abs approved for treatment of metastatic colon cancer
• Subsequently found that RASMUT tumors don’t respond – “Negative Predictive Biomarker”
• However, there are still EGFR+ / RASWT patients who don’t respond – need “Positive Predictive Biomarker”
• And in lung cancer it is not clear that RASMUT status is a useful biomarker
Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function
Oncogenes only make good targets in particular molecular contexts : EGFR story
Causal Relationships ≠ Correlative Relationships? : CETPi story
• Epidemiological Data provides strong support for independent association of low LDL and high HDL with reduced incidence of heart disease
• Statins reduce LDL and reduce incidence of CVD deaths establishing causal relationship
• CETP inhibition raises HDL – Does this have positive clinical benefit?
• Torcetrapib (Pfizer) - $800M drug failed Ph3 (2006): a) Lack of efficacy; b) Increased mortality (off target?)
• Dalcetrapib (Roche) – development halted in Ph3 (May 2012) for lack of efficacy (no increase in mortality)
• Anacetrapib (Merck) / Evacetrapib (Lilly) – development ongoing. Hoped that they are better inhibitors and this will lead to clinical benefit. Will cost $1 billion+ to find out
Can we save billions of dollars by generating and sharing datasets that let us better understand causal relationships?
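The gap between a correlative and a causal relationship can be illustrated with a small simulation. This is a minimal sketch under a purely hypothetical toy model, not a model of HDL or CETP biology: a hidden factor both raises a marker and lowers risk, so the marker correlates with risk even though shifting the marker directly changes nothing. All names and parameters (`simulate`, `boost`, the noise levels) are assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, boost=0.0, seed=0):
    """Toy cohort: a hidden factor raises the marker and lowers risk.

    `boost` shifts the marker directly (a hypothetical intervention)
    without touching the hidden factor that actually drives risk.
    """
    rng = random.Random(seed)
    marker, risk = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)
        marker.append(hidden + rng.gauss(0, 0.5) + boost)
        risk.append(-hidden + rng.gauss(0, 0.5))  # risk depends only on hidden
    return marker, risk


marker, risk = simulate()
observed_corr = pearson(marker, risk)

# Same cohort (same seed), but the marker is raised by the intervention.
_, risk_treated = simulate(boost=2.0)
mean_risk_untreated = sum(risk) / len(risk)
mean_risk_treated = sum(risk_treated) / len(risk_treated)

print(f"correlation(marker, risk) = {observed_corr:.2f}")
print(f"change in mean risk after raising marker = "
      f"{mean_risk_treated - mean_risk_untreated:.3f}")
```

In this toy setting the marker is strongly and negatively correlated with risk, yet raising it leaves mean risk unchanged, which is the pattern a purely correlative dataset cannot distinguish from a causal one.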
Is there a common framework for testing clinical hypotheses (ARCH2POCM)?
what will it take to understand disease?
DNA RNA PROTEIN
MOVING BEYOND ALTERED COMPONENT LISTS
Familiar but Incomplete
Preliminary Probabilistic Models - Rosetta
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Networks facilitate direct identification of genes that are causal for disease
Evolutionarily tolerated weak spots
Nat Genet (2005) 205:370
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009) ….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome
"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
Metabolic Disease
CVD
Bone
Methods
Extensive Publications now Substantiating Scientific Approach Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics
50 network papers http://sagebase.org/research/resources.php
List of Influential Papers in Network Modeling
Biological System
Data Analysis
Fundamentally, biological science hasn’t changed because of the ‘Omics Revolution……
…..it is about the process of linking a system to a hypothesis to some data to some analyses
But the way we do it has changed…………………………………………
Biological System
Data
Analysis
Biological System
Analysis
Data
Single Lab Model
• R01 Funding
• Hypothesis->data->analysis->paper
• Small-scale data / analysis
• Reproducible?
Multiple Lab Model
• P01 Funding
• Hypothesis->data->analysis->paper
• Medium-scale data / analysis
• Data Generators/Analysts/Validators may be different groups
• Reproducible?
Driven by molecular technologies, we have become more data intensive, leading to more specialization: data generators (centralized cores), data analyzers (bioinformaticians), and validators (experimentalists: lab & clinical). This is reflected in the tendency toward more multi-lab, consortium-style grants in which the data generators, analyzers, and validators may be different labs.
Biological System
Data
Analysis
Iterative Networked Approaches To Generating Analyzing and Supporting New Models
Uncouple the automatic linkage between the data generators, analyzers, and validators
SYNAPSE
CURATED DATA
TOOLS/ METHODS
ANALYZES/ MODELS
RAW DATA
BioMedicine Information Commons
Data Generators
Data Analysts
Experimentalists
Clinicians
Patients/ Citizens
Networked Approaches
1 USABLE DATA
2 REWARDS / RECOGNITION
3 GOVERNANCE
4 HOW TO DISTRIBUTE TASKS
5 PRIVACY BARRIERS
Barriers to Engaging Networked Approaches to a BioMedicine Information Commons
1 USABLE DATA: SYNAPSE
2 REWARDS / RECOGNITION: SYNAPSE
3 RULES / GOVERNANCE: THE FEDERATION
4 HOW TO DISTRIBUTE TASKS: COLLABORATIVE CHALLENGES
5 PRIVACY BARRIERS: PORTABLE LEGAL CONSENT
Open and Networked Approaches: Democratization of Science
1 USABLE DATA
2 REWARDS
RECOGNITION
SYNAPSE
SYNAPSE
Two approaches to building common scientific and technical knowledge
Text summary of the completed project Assembled after the fact
Every code change versioned Every issue tracked Every project the starting point for new work All evolving and accessible in real time Social Coding
Synapse is GitHub for Biomedical Data
Data and code versioned Analysis history captured in real time Work anywhere, and share the results with anyone Social Science
Why not share clinical/genomic data and model building in the ways currently used by the software industry (the power of tracking workflows and versioning)?
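The idea of versioning data and capturing analysis history, as software teams do for code, can be sketched as follows. This is a minimal illustration only and is not the Synapse API: the `VersionedStore` class, its methods, and the toy datasets are hypothetical names invented for this example. Each committed object gets a content-derived version id and a provenance record listing the versions it was derived from.

```python
import hashlib
import json


def content_version(data: bytes) -> str:
    """Derive a short version id from the content itself."""
    return hashlib.sha256(data).hexdigest()[:12]


class VersionedStore:
    """Hypothetical in-memory store; a sketch, not a real service client."""

    def __init__(self):
        self.blobs = {}    # version id -> bytes
        self.history = []  # provenance records, oldest first

    def commit(self, name, data, used=(), note=""):
        """Store `data` and record which earlier versions it was derived from."""
        vid = content_version(data)
        self.blobs[vid] = data
        self.history.append(
            {"name": name, "version": vid, "used": list(used), "note": note}
        )
        return vid

    def lineage(self, vid):
        """Return every version id that `vid` depends on, nearest first."""
        parents = {r["version"]: r["used"] for r in self.history}
        seen, stack = [], list(parents.get(vid, []))
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.append(v)
                stack.extend(parents.get(v, []))
        return seen


store = VersionedStore()
raw = store.commit("expression_raw", b"gene,s1,s2\nA,1,2\n")
norm = store.commit(
    "expression_normalized",
    b"gene,s1,s2\nA,0.33,0.67\n",
    used=[raw],
    note="scaled per gene",
)
model = store.commit(
    "model_v1", json.dumps({"weights": {"A": 0.5}}).encode(), used=[norm]
)

print(store.lineage(model))  # the normalized data, then the raw data
```

Because the version id is a hash of the content, identical data always maps to the same id, and anyone holding a model's id can walk back to the exact inputs that produced it.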
Leveraging Existing Technologies
Taverna
Addama
tranSMART
Watch What I Do, Not What I Say
sage bionetworks synapse project
Reduce, Reuse, Recycle
Most of the People You Need to Work with Don’t Work with You
My Other Computer is “The Cloud”
Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
Find Public Data
Use Existing Tools
Public or Private Projects
Publish Your Work
clearScience links the components of a ‘big science’ project to a cloud computing environment...
so with a click from your browser you can push code into a virtual machine
or data...
or models...
or figures...
or entire compute environments... conveniently pre-populated with data, code, and the library and version dependencies
“my other computer is the cloud… let me hand it to you…”
pilot advisors!
Downloading through TCGA data portal
[Figure: analysis pipeline with data panels (Expression, Copy number, Mutation, Phenotype), Predictive model generation, and Performance assessment]
• Automated workflows for curation, QC, and sharing of large-scale datasets.
• All of TCGA, GEO, and user-submitted data processed with standard normalization methods.
• Searchable TCGA data:
  • 23 cancers
  • 11 data platforms
  • Standardized meta-data ontologies
• Data accessible at multiple levels of aggregation.
• Links to upstream and downstream processing of data.
• Displayed is TCGA Glioblastoma data normalized for each platform across batches.
• Data accessible through programmatic environments such as R.
• Standardized formats allow reuse of analysis pipelines on all processed datasets.
• TCGA, GEO, user-submitted data.
• Comparison of many modeling approaches applied to the same data.
• Models transparently shared and reusable through Synapse.
• Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.
• Extending pipeline to evaluate prediction of TCGA phenotypes.
• Hosting of collaborative competitions to compare models from many groups.
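Comparing several modeling approaches on the same data, as described above, comes down to running every model through one shared evaluation harness. The sketch below is a minimal illustration with toy data and two deliberately simple models; the function names, the k-fold scheme, and the mean-squared-error score are assumptions for this example and are not taken from the Synapse pipeline.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_model(train_x, train_y):
    """Baseline: always predict the training mean."""
    mu = sum(train_y) / len(train_y)
    return lambda x: mu


def linear_model(train_x, train_y):
    """One-variable least-squares line."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx
    return lambda x: my + slope * (x - mx)


def cross_validated_mse(fit, xs, ys, k=5):
    """Score a model by mean squared error on held-out folds."""
    errors = []
    for fold in kfold_indices(len(xs), k):
        held = set(fold)
        tx = [x for i, x in enumerate(xs) if i not in held]
        ty = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(tx, ty)
        errors.extend((predict(xs[i]) - ys[i]) ** 2 for i in fold)
    return sum(errors) / len(errors)


# Toy data: a noiseless linear relationship.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

models = [("mean", mean_model), ("linear", linear_model)]
scores = {name: cross_validated_mse(fit, xs, ys) for name, fit in models}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Keeping the data split and the scoring function fixed while swapping only the `fit` function is what makes the resulting scores comparable across approaches.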
Open and Networked Approaches
3 RULES
GOVERNANCE
THE FEDERATION
A B C
Pipeline Strategy
A B C
D
Divide and Conquer Strategy
A B C
Parallel/Iterative Strategy
sage federation: model of biological age
Faster Aging
Slower Aging
Clinical Association - Gender - BMI - Disease Genotype Association Gene Pathway Expression
Predicted Age (liver expression)
Chronological Age (years)
Age Differential
REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge
4 HOW TO
DISTRIBUTE TASKS
COLLABORATIVE CHALLENGES
What is the problem? Our current models of disease biology are primitive and limit doctors’ understanding and ability to treat patients
Current incentives reward those who silo information and work in closed systems
The Solution: Competitions to crowd-source research in biology and other fields
Why competitions?
• Objective assessments
• Acceleration of progress
• Transparency
• Reproducibility
• Extensible, reusable models
Competitions in biomedical research
• CASP (protein structure)
• Foldit / EteRNA (protein / RNA structure)
• CAGI (genome annotation)
• Assemblathon / Alignathon (genome assembly / alignment)
• SBV Improver (industrial methodology benchmarking)
• DREAM (co-organizer of Sage/DREAM competition)
Generic competition platforms
• Kaggle, InnoCentive, MLComp
The Sage/DREAM breast cancer prognosis challenge
Goal: Challenge to assess the accuracy of computational models designed to predict breast cancer survival using patient clinical and genomic data
Why this is unique: This Sage/DREAM Challenge uses a pre-collated cohort: 2,000 breast cancer samples from the METABRIC cohort
Accessible to all: A cloud-based common compute architecture is being made available by Google to support the computation needed to develop and test challenge models
New rigor:
• Contestants will evaluate their models on a validation data set composed of newly generated data (provided by Dr. Anne-Lise Børresen-Dale)
• Contestants must demonstrate their models can be reproduced by others
New incentives: leaderboard to energize participants, Science Translational Medicine publication for winning team
Breast cancer patients, funders and researchers can track this Challenge on BRIDGE, an open source online community being built by Sage and Ashoka Changemakers and affiliated with this Challenge
Sage/DREAM Challenge: Details and Timing
Phase 1: Apr thru end-Sep 2012
Training data: 2,000 breast cancer samples from METABRIC cohort
• Gene expression
• Copy number
• Clinical covariates
• 10 year survival
Supporting data: Other Sage-curated breast cancer datasets
• >1,000 samples from GEO
• ~800 samples from TCGA
• ~500 additional samples from Norway group
• Curated and available on Synapse, Sage’s compute platform
Data released in phases on Synapse from now through end-September
Will evaluate accuracy of models built on METABRIC data to predict survival in:
• Held out samples from METABRIC
• Other datasets
Phase 2: Oct 1 thru Nov 12, 2012
Evaluation of models in novel dataset.
Validation data: ~500 fresh frozen tumors from Norway group with:
• Clinical covariates
• 10 year survival
Gene expression and copy number data to be generated for model evaluation
• Sent to Cancer Research UK to generate data at same facility as METABRIC
• Models built on training data evaluated on newly generated data
Winners announced at November 12 DREAM conference
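Scoring survival predictions on held-out samples needs a metric that copes with censored follow-up. The slides do not name the metric used in the Challenge, so the sketch below uses the concordance index purely as an assumed, commonly used choice; the function name and the toy cohort are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model orders correctly.

    A pair (i, j) is comparable when patient i has the shorter follow-up
    time and an observed event. The pair is concordant when the model gives
    patient i the higher risk score; ties in score count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: follow-up in years; False marks a censored patient.
times = [2.0, 5.0, 7.0, 10.0]
events = [True, True, False, True]

perfect = concordance_index(times, events, [4, 3, 2, 1])
reversed_order = concordance_index(times, events, [1, 2, 3, 4])
uninformative = concordance_index(times, events, [1, 1, 1, 1])
print(perfect, reversed_order, uninformative)
```

A score of 1.0 means every comparable pair is ranked correctly, 0.5 is what a constant prediction achieves, and 0.0 means the ranking is fully inverted.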
Summary
Transparency, reproducibility
Validation in novel dataset
Publication in Science Translational Medicine
Donation of Google-scale compute space.
For the goal of promoting democratization of medicine… Registration starting NOW…
sign up at synapse.sagebase.org
Presentation outline
mRNA copy number Sequencing
(1,600 genes)
Molecular characterization • 1,000 cell lines
Viability screens • 500 cell lines • 24 compounds
Cancer cell line encyclopedia
1) Predicting drug response from cancer cell lines
Primary tumor datasets (TCGA, METABRIC)
genomics transcriptomics epigenetics
Clinical data (e.g. survival time)
2) Predicting clinical cancer phenotypes
3) Workflows for data management, versioning and method comparison
4) Network-based predictors and multi-task learning
Predictive model
Molecular characterization
Developing predictive models of genotype-specific sensitivity to compound treatment
Predictive Features (biomarkers)
Genetic Feature Matrix: expression, copy number, somatic mutations, etc.
Sensitive / Refractory (e.g. EC50)
Cancer samples with varying degrees of response to therapy
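The setup above (a feature matrix per sample, a measured response per sample, and a ranked list of predictive features as output) can be sketched in a few lines. This is a deliberately simple stand-in, not the modeling method used in the talk: it ranks binary features by the difference in mean response between samples with and without the feature. The feature names and response values are made-up toy data.

```python
def rank_features(features, response):
    """Rank binary features by how strongly they separate the response.

    features: dict mapping feature name -> list of 0/1 flags, one per sample
    response: list of sensitivity values, one per sample
              (assumed here to be log EC50, so lower means more sensitive)
    Returns (name, effect) pairs sorted by absolute effect, largest first.
    """
    ranking = []
    for name, flags in features.items():
        with_f = [r for f, r in zip(flags, response) if f]
        without = [r for f, r in zip(flags, response) if not f]
        if not with_f or not without:
            continue  # constant feature: cannot separate samples
        effect = sum(with_f) / len(with_f) - sum(without) / len(without)
        ranking.append((name, effect))
    ranking.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranking


# Hypothetical toy data: six samples, three mutation features.
features = {
    "mut_A": [1, 1, 1, 0, 0, 0],
    "mut_B": [1, 0, 1, 0, 1, 0],
    "mut_C": [0, 0, 0, 0, 0, 0],
}
log_ec50 = [-2.0, -1.8, -2.2, 0.1, -0.1, 0.0]

ranking = rank_features(features, log_ec50)
for rank, (name, effect) in enumerate(ranking, start=1):
    print(f"#{rank} {name}: mean shift in log EC50 = {effect:.2f}")
```

A univariate ranking like this ignores interactions between features; it is shown only to make the input and output of the prediction task concrete.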
Our approach identifies mutations in genes upstream of MEK as top predictors of sensitivity to MEK inhibition
#1 Mut NRAS
#3 Mut BRAF
PD-0325901
PD-0325901
#9 Mut BRAF
#312 Mut NRAS
#9 Mut KRAS
[Figure: top-ranked predictive features by compound: #1 BRAF mut, #2 NRAS mut, #3 KRAS mut; #1 EGFR mut; #1 ERBB2 expr; #1 CML lineage; #1 HGF expr; #1 MDM2 expr, #2 TP53 mut, #3 CDKN2A copy]
Can the approach make new discoveries?
For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity
Predicted biomarkers supported by literature evidence
Prediction | Literature evidence | Model / Significance
haematopoietic
solid
LBH589 (HDACi)
“Responses with single agent HDACi have been predominantly observed in advanced hematologic malignancies including T-cell lymphoma, Hodgkin lymphoma, and myeloid malignancies.”
HDAC inhibitors are effective in haematopoietic tumors
Supported in current clinical trials
Typical pharma: >10 phase 2 clinical trials in solid tumors @ $millions per trial.
NQO1 over-expression predicts 17-AAG sensitivity
NQO1 metabolizes 17-AAG to stable intermediary with 32-fold increase in activity.
MYC amplification predicts sensitivity to HSP70 inhibition.
HSP70 inhibits MYC-mediated apoptosis.
AHR expression predicts sensitivity to MEK inhibitors in NRAS mutant cell lines
Functionally validated by AHR knockdown
Legend AHR shRNA Control shRNA
Novel predictions are functionally validated
Prediction | Validation
BCL-xL expression predicts sensitivity to several chemotherapeutics
Functionally validated by:
BCL-xL knockdown
BCL-xL inhibitor drug synergy
Mouse models Clinical trials
Wei G.*, Margolin A.A.*, et al, Cancer Cell
Open and Networked Approaches
5 PRIVACY BARRIERS
PORTABLE LEGAL CONSENT: weconsent.us John Wilbanks
Arch2POCM
The Current R&D Ecosystem Is In Need of a New Approach to Drug Development
• $200B per year in biomedical and drug discovery R&D
• Only a handful of new medicines are approved each year
• Productivity in steady decline since 1950
• >90% of novel drugs entering clinical trials fail, and negative POC information is not shared
• Significant pharma revenues going off patent in next 5 years
• >30,000 pharma employees laid off from downsizing in each of last four years
• 90% of 2013 prescriptions will be for generic drugs
Issues With Drug Discovery
1. The greatest attrition is at clinical proof-of-concept – once a “target” is linked to a disease in the clinic, the risk of failure is far lower
2. Most novel targets are pursued by multiple companies in parallel (and most fail at clinical POC)
3. The complete data from failed trials are rarely, if ever, released to the public
Open access research tools drive science
SGC: Open Access Chemical Biology a great success
• PPP:
- GSK, Pfizer, Novartis, Lilly, Abbott, Takeda
- Genome Canada, Ontario, CIHR, Wellcome Trust
• Based in Universities of Toronto and Oxford
• 200 scientists
• Academic network of more than 250 labs
• Generate freely available reagents (proteins, assays, structures, inhibitors, antibodies) for novel, human, therapeutically relevant proteins
• Give these to academic collaborators to dissect pathways and disease networks, and thereby discover new targets for drug discovery
Some SGC Achievements
• Structural impact
– SGC contributed ~25% of global output of human structures annually
– SGC contributes >40% of global output of human parasite structures annually
• High quality science (some publications from 2011): Vedadi et al, Nature Chem Biol, in press (2011); Evans et al, Nature Genetics, in press (2011); Norman et al, Science Transl Med. 3(88):88mr1 (2011); Kochan G et al, PNAS 108:7745 (2011); Clasquin MF et al, Cell 145:969 (2011); Colwill et al, Nature Methods 8:551 (2011); Ceccarelli et al, Cell 145:1075 (2011); Strushkevich et al, PNAS 108:10139 (2011); Bian et al, EMBO J, in press (2011); Norman et al, Science Transl. Med. 3:76cm10 (2011); Xu et al, Nature Comm. 2: art. no. 227 (2011); Edwards et al, Nature 470:163 (2011); Fairman et al, Nature Struct. and Mol. Biol. 18:316 (2011); Adams-Cioaba et al, Nature Comm. 2 (1) (2011); Carr et al, EMBO J 30:317 (2011); Deutsch et al, Cell 144:566 (2011); Filippakopoulos et al, Cell, in press; Nature Chem. Biol. in press, Nature in press
Impact Of SGC’s Open Access JQ1 BET Probe
Paper published Dec 23 has already been cited >60 times
Harvard spin off (15 M$ seed funding raised)
> 5 pharma have launched bromodomain programs
JQ1/SGCB01 has been distributed to >250 labs/companies
Already used by some to link Brd4 to new areas of science
Zuber et al : BRD4 as target in acute leukaemia Nature, 2011 Delmore et al: JQ1 suppresses myc in multiple myeloma Cell, 2011 Dawson et al: BRD4 in MLL (isoxazole inhibitor) Nature, 2011 Blobel et al: Novel Targets in AML Cancer Cell, 2011 Mertz et al : Myc dependent cancer PNAS, 2011 Zhao et al: Post mitotic transcriptional re-activation Nature Cell Biol., 2011
Open access to the clinic?
Drug Discovery Is a Lottery Because:
Knowledge about clinical disease is limiting
- patients are heterogeneous
- do not know how some drugs work, e.g. paracetamol
- different doses effective in different patients
- efficacy is short lived
- poor biomarkers…..
Too many targets/preclinical assays do not prioritize
Other Problems With How We Do Drug Discovery
• Same targets, in parallel, in secret
• No one organisation has all capabilities
• Early IP is making it even harder (makes process slower, harder and more expensive)
Most Novel Targets Fail at Clinical POC
Target ID/
Discovery
Hit/ Probe/ Lead
ID
Clinical candidate
ID
Tox./ Pharmacy
Phase I
Phase IIa/ b
HTS LO
10% 30% 30% 90+% 50%
this is killing our industry
…we can generate “safe” molecules, but they are not developable in chosen patient group
This Failure Is Repeated, Many Times
Target ID/
Discovery
Hit/ Probe/ Lead
ID
Clinical candidate
ID
Toxicology/ Pharmacy
Phase I
Phase IIa/ b
HTS
30% 30% 90+%
…and outcomes are not shared
A Possible Solution: Arch2POCM, An Open Access Clinical Validation PPP
• PPP to clinically validate (Ph IIa) pioneer targets
• Pharma, public, academia, regulators and patient groups are active participants
• Cultivate a common stream of knowledge
– Avoid patents
– Place all data into the public domain
– Crowdsource the PPP’s druglike compounds
• Invalidated targets are identified before pharma makes a substantial proprietary investment
– Reduces the number of redundant trials on bad targets
– Reduces safety concerns
• Validated targets are de-risked for pharma investment
– Pharma can initiate proprietary effort when risks are balanced with returns
– PPP pharma members can acquire Arch2POCM IND for validated targets and benefit from shorter development timeline and data exclusivity for sales
Arch2POCM: Scale and Scope
• Proposed Vertical Goal:
– Initiate 2 programs. One for Oncology/Epigenetics/Immunology. One for Neuroscience/Schizophrenia/Autism.
– Both programs will have 8 drug discovery projects (targets)
– By Year 5, 30% of projects will have started Ph 1 and 20% will have completed Ph IIa
– $200-250M over five years is projected as necessary to advance up to 8 drug discovery projects within each of the two therapeutic programs
– By investing $1.6M annually into one or both of Arch2POCM’s selected disease areas, partnered pharmaceutical companies:
1. obtain a vote on Arch2POCM target selection
2. gain real time data access to Arch2POCM’s 16 drug discovery projects
3. have the strategic opportunity to expand their overall portfolio
• Proposed Horizontal Goal:
– Initiate 1-2 projects (1-2 novel target mechanisms) as pilots to assess Arch2POCM principles
– In either Oncology or Neuroscience
– Specific target mechanisms to be determined by funders’ interest
– Interested funders include pharma, public research foundations and venture philanthropists
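The numbers quoted for the vertical goal can be worked through directly. The short calculation below is only a sketch: it assumes the 30% and 20% shares apply to all 16 projects pooled across both programs, and that a partner pays the $1.6M fee every year for five years in a single disease area. Neither assumption is stated explicitly on the slide.

```python
# Figures quoted on the slide
programs = 2
projects_per_program = 8
share_started_ph1 = 0.30      # projects that have started Ph 1 by Year 5
share_completed_ph2a = 0.20   # projects that have completed Ph IIa by Year 5
partner_fee_musd_per_year = 1.6
years = 5

total_projects = programs * projects_per_program

# Assumption: the shares apply to the pooled set of 16 projects.
expected_started_ph1 = share_started_ph1 * total_projects
expected_completed_ph2a = share_completed_ph2a * total_projects

# Assumption: the fee is paid each year for five years, for one disease area.
partner_five_year_fee_musd = partner_fee_musd_per_year * years

print(f"total projects: {total_projects}")
print(f"expected to have started Ph 1 by Year 5: {expected_started_ph1:.1f}")
print(f"expected to have completed Ph IIa by Year 5: {expected_completed_ph2a:.1f}")
print(f"five-year fee per partner per area: ${partner_five_year_fee_musd:.1f}M")
```

Under these assumptions the plan implies roughly five projects reaching Ph 1 and about three completing Ph IIa, for a per-partner outlay of about $8M per disease area over five years.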
Histone
DNA
Lysine
Epigenetics: Exciting Science and Also A New Area For Drug Discovery
Modification | Write | Read | Erase
Acetyl | HAT | Bromo | HDAC
Methyl | HMT | MBT | DeMethyl
The Case For Epigenetics/Chromatin Biology
1. There are epigenetic oncology drugs on the market (HDACs)
2. A growing number of links to oncology, notably many genetic links (i.e. fusion proteins, somatic mutations)
3. A pioneer area: More than 400 targets amenable to small molecule intervention - most of which only recently shown to be “druggable”, and only a few of which are under active investigation
4. Open access, early-stage science is developing quickly – significant collaborative efforts (e.g. SGC, NIH) to generate proteins, structures, assays and chemical starting points
Domain Family | Typical substrate class* | Total Targets
Histone Lysine demethylase | Histone/Protein K/R(me)n/ (meCpG) | 30
Bromodomain | Histone/Protein K(ac) | 57
ROYAL
Tudor domain | Histone Kme2/3 - Rme2s | 59
Chromodomain | Histone/Protein K(me)3 | 34
MBT repeat | Histone K(me)3 | 9
PHD finger | Histone K(me)n | 97
Acetyltransferase | Histone/Protein K | 17
Methyltransferase | Histone/Protein K&R | 60
PARP/ADPRT | Histone/Protein R&E | 17
MACRO | Histone/Protein (p)-ADPribose | 15
Histone deacetylases | Histone/Protein KAc | 11
395
The Current Epigenetics Universe
Now known to be amenable to small molecule inhibition
SGC Oxford SGC Toronto
BET family chemical biology
What Are Bromodomains and How Do They Function?
What Are Bromodomains:
• Small highly conserved protein recognition domains (~110 residues)
• Bundle of four α-helices and two loops that form a pocket with a conserved Asn residue
• 56 unique human bromodomains identified: spread across 42 proteins
How Do They Function:
• Selectively bind to acetylated lysine residues located on histones
• Histone/BRD complex leads to transcription and gene expression
• Inhibition of BRD binding to acetylated histones leads to gene silencing
Bromodomains: Genetic Links to Cancer
Genetic abnormality
Publications
Available Reagents for Bromodomain Family
28 crystal structures 42 purified proteins
Robust Assays Available
Peptide library screen using SPR
Histone peptide
Targets
We now have a suite of assays for bromodomains • Filippakopoulos et al Cell. 2012 149(1):214-31.
Peptide array screens using dot blots
CBP/PCAF
BET
A Series of Chemical Starting Points
Proof-of-concept JQ1: A Selective Inhibitor for BETs
Panagis Filippakopoulos, Jun Qi, Stefan Knapp, Jay Bradner
NUT midline carcinoma (NMC) is a rare, highly lethal cancer that occurs in children and young adults.
NMCs uniformly present in the midline, most commonly in the head, neck, or mediastinum, as poorly differentiated carcinomas
Rearrangement of the Nuclear protein in testis (NUT) that creates a BRD4-NUT fusion gene
Variant rearrangements, some involving the BRD3 gene
NMC is diagnosed by fluorescence in situ hybridization and NUT antibodies.
It is unclear how common NUT rearrangements are in squamous cell carcinomas due to lack of routine diagnostic
JQ1 Inhibits NMC Tumour Growth
FDG-PET
4 days 50mg/kg IP Jay Bradner/Andrew Kung, Harvard
Potential Year 1 Aims of an Arch2POCM Bromodomain Program
1. Select two pre-clinical candidates: Leverage SGC’s existing open access network of labs, compounds, assays and information to identify two chemotypes for medicinal chemistry optimization
2. Develop a biomarker strategy for clinical development: opportunities for surrogate endpoints and patient stratification
3. Implement crowdsourced research: manufacture and distribute optimized pre-clinical candidates to academic and clinical researchers
Process For Arch2POCM Target Selection
Arch2POCM creates a disease area spreadsheet of relevant information for pioneer targets such as:
1. Novelty: Target selection should focus on addressing fundamental questions on biology and disease association
• No clinical precedent
• Exception: advance an existing asset into a new disease area
2. Targets should be tractable
• In vitro assay availability
• Cell-based assay availability
• Characterized protein (e.g. 3D structure; antibody, cell lines, mouse model)
• Availability of starting chemical matter
3. Evidence of genetic linkages
• Translocations, mutations, splicing alterations specifically linked to disease
• “Peripheral” genetic linkages:
– Gene expression profiles or GWAS data indicate correlation
– Implicated in pathway with clear genetic link (SLS, Networks)
4. Key research contacts (academic or industry)
Potential Targets - Bromodomain Family
Columns: Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data) | Maturity of the program | Positive evidence of the compound playing a role in the given disease | Data showing a failed result of the compound for the given disease | Mouse knockout model (MGI)

SMARCA4
Evidence in tumors: Expression correlates with development of prostate cancer BUT SMARCA4 in general acts as tumor suppressor and is necessary for genome stability; targeted knockdown of SMARCA4 potentiates lung cancer development
Maturity of the program: potent, selective, cell active compound identified
Positive evidence for the compound: NA
Failed result for the compound: NA
Mouse knockout model (MGI): Homozygotes for a null allele die in utero before implantation. Embryos heterozygous for this null allele and an ENU-induced allele show impaired definitive erythropoiesis, anemia and lethality during organogenesis. Heterozygotes show cyanosis and cardiovascular defects and are pre-disposed to breast tumors

SMARCA2A
Evidence in tumors: Gastric cancer; mutated in CLL; depletion of BRM causes accelerated progression to the differentiation phenotype BUT targeted deletion is causative for the development of prostatic hyperplasia in mice
Maturity of the program: potent, selective, cell active compound identified
Positive evidence for the compound: NA
Failed result for the compound: NA
Mouse knockout model (MGI): Mice homozygous for a targeted mutation in this gene may exhibit infertility and a slightly increased body weight in some genetic backgrounds.

CBP
Evidence in tumors: Translocation of CBP with MOZ, monocytic leukemia zinc finger protein, causes acute myeloid leukemia; other translocations involve MLL (HRX); mutated in ALL BUT CBP has also been proposed as a classical tumor suppressor
Maturity of the program: potent, selective, cell active compound identified
Positive evidence for the compound: NA
Failed result for the compound: NA
Mouse knockout model (MGI): Homozygotes for null or altered alleles die around midgestation with defects in hemopoiesis, blood vessel formation, and neural tube closure. Heterozygotes may exhibit skeletal, cardiac, and hematopoietic defects, retarded growth, and hematologic tumors.

ATAD2
Evidence in tumors: Correlated with survival of high-grade osteosarcoma patients after chemotherapy; required for breast cancer cell proliferation; differentially expressed in NSCLC
Maturity of the program: Weak hits
Positive evidence for the compound: NA
Failed result for the compound: NA
Mouse knockout model (MGI): NA

BRD4
Evidence in tumors: Translocations produce BRD4-NUT fusion oncogene causing midline carcinoma
Maturity of the program: JQ1
Positive evidence for the compound: JQ1 in BRD-NUT fusion and MLL
Failed result for the compound: NA
Mouse knockout model (MGI): Homozygotes for a gene-trap null mutation die soon after implantation. Heterozygotes exhibit impaired pre- and postnatal growth, head malformations, lack of subcutaneous fat, cataracts, and abnormal liver cells.

BRD2
Evidence in tumors: In transgenic mice, constitutive lymphoid expression of Brd2 causes a malignancy most similar to human diffuse large B cell lymphoma
Maturity of the program: JQ1
Positive evidence for the compound: JQ1 in BRD-NUT fusion and MLL
Failed result for the compound: NA
Mouse knockout model (MGI): Mice homozygous for a null mutation display embryonic lethality during organogenesis with decreased embryo size, decreased cell proliferation, a delay in the cell cycle, and increased cell death. Heterozygous mice also display decreased cell proliferation.
Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data)
Maturity of the program
Posi;ve evidence of the compound
playing a role in the given disease
Data showing a failed result of the compound for the given
disease
Mouse model (MGI)
JMJD3 Upregulated in prostate cancer; expression is higher in metastaFc prostate cancer BUT JMJD3 contributes to the acFvaFon of the INK4A-‐ARF tumor suppressor locus in response to oncogene -‐ and stress-‐induced senescence.
potent, selecFve, cell acFve compound idenFfied
NA; inhibits TNF-‐alpha
producFon in macrophages of RA paFents
NA Mice homozygous for a knock-‐out allele exhibit perinatal lethality associated with thick alveolar septum and absences of air space in the lungs. Bone marrow chimera mice derived from fetal liver cells exhibit impaired eosinophil recruitment and abnormal response to helminth infecFon.
JARID1B High levels in breast cancer cell lines, strong expression in the invasive but not in the benign components of primary breast carcinomas. BUT tumor suppressor in melanoma cells
No progress NA NA NA
Potential Targets - Demethylases
Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data)
Maturity of the program
Positive evidence of the compound playing a role in the given disease
Data showing a failed result of the compound for the given disease
SETD8 Recent data indicate that SETD8 deregulates PCNA expression by degradation accelerated by methylation at K248. Expression levels of SETD8 and PCNA upregulated in cancer cells. Takawa et al., Cancer Research, May 2012.
Weak inhibitors identified (8 microM) in chemistry optimization.
NA NA
EZH2 EZH2 upregulated in cancer cells. Studies on mutants indicate an interesting profile where both wild-type and mutant (Y641F) are required for malignant phenotype. Sneeringer et al. PNAS 2012. Compounds identified in GSK patents WO 2011/140324 and 140315 and WO 2012/005805 and 075080.
potent, selective, cell active compound identified.
NA NA
MMSET MMSET (also known as WHSC1 or NSD2) is overexpressed in cancer cells. Hudlebusch et al. Clinical Cancer Res 2011
No hits—currently screening
NA NA
DOT1L Daigle et al. Cancer Cell 2011 elegantly show that potent DOT1L inhibitors kill cells containing MLL translocations and do not kill cells lacking the translocations
potent, selective, cell active compound identified.
Transgenic mouse model tumors shrunk by SC dosing of inhibitor
Potential Targets - Histone Methyltransferases
Proposed Metrics For Measuring Arch2POCM Success
Use a therapeutic product profile (TPP) with stage-gates and defined milestones to monitor project progression:
• Small molecule screening hit rate achieved
• SAR/In vitro testing
– Target EC50 achieved by at least XX compounds
– Selectivity target achieved by at least YY compounds
– Biological activity demonstrated for at least XX compounds in human tissue models (disease tissue, stem cells)
• Manufacturing and Quality
– Steady and cost-effective supply of lead compound achieved
– Stability of lead compound demonstrated (sufficient to support POCM testing)
– Lead compound formulation identified to support pre-clinical and clinical studies
– Lead compound demonstrates selected quality attributes (sufficient to support pre-clinical studies and distribution to the crowd)
• Pre-clinical testing
– Lead compounds achieve pre-clinical safety
– Lead compounds surpass target TI
– Lead compounds demonstrate cross-reactivity sufficient to support pre-clinical tox testing
• Clinical
– Lead compounds demonstrate Ph I safety
– Lead compounds demonstrate Ph II POCM
• Data management
– IT database infrastructure populated with XX epigenetics investigators/grant applications/publications
– Database QC and compliance defined and implemented (internal and external)
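The stage-gate monitoring described above could be tracked with a small data structure. What follows is a minimal Python sketch under stated assumptions, not part of the Arch2POCM proposal: the names `Milestone`, `StageGate`, `current_gate`, and `compounds_meeting_target` are hypothetical, and the XX/YY thresholds on the slide are left unspecified, so any numbers appear only as caller-supplied parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Milestone:
    """One milestone inside a stage-gate (wording follows the slide)."""
    description: str
    achieved: bool = False


@dataclass
class StageGate:
    """A gate is passed only when it has milestones and all are achieved."""
    name: str
    milestones: list = field(default_factory=list)

    def passed(self) -> bool:
        return bool(self.milestones) and all(m.achieved for m in self.milestones)


def current_gate(gates):
    """Return the first gate not yet passed, or None if all gates are passed."""
    for gate in gates:
        if not gate.passed():
            return gate
    return None


def compounds_meeting_target(ec50_values, target_ec50):
    """Count compounds whose EC50 is at or below the target (lower = more potent).

    Both arguments must use the same unit; the target itself is an assumption
    supplied by the caller, since the slide does not state one.
    """
    return sum(1 for value in ec50_values if value <= target_ec50)
```

A project would be represented as an ordered list of `StageGate` objects (screening, SAR/in vitro, manufacturing, pre-clinical, clinical, data management), and `current_gate` reports where progression currently stands. Modelling gates as an ordered list reflects the sequential reading of the slide; gates that run in parallel would need a different structure.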
Program Activities Grid For Arch2POCM
Activity Arch2POCM Location/Investigator (TBD)
Target Structure
Compound libraries
Assay development for epigenetic screens and biomarkers
HTP screens for epigenetic hits
Med Chem SAR To ID Two Suitable Binding Arch2POCM Test Compounds
Non-GLP scaleup of Arch2POCM Test Compounds and associated analytics
Distribution of Arch2POCM Test Compounds
PK, PD, ADME, Tox Testing
GMP Manufacturing of Arch2POCM Test Compounds
GMP Formulation
GMP Drug Storage and Distribution
IND Preparation Support
Clinical Assay Development and Qualification
Ph I-II Clinical Trials
Ph I-II Database Management and CSR Production
DISCUSSION
• Opportunities to Review Targets
• Opportunities to Discuss Approach
• Opportunities to Consider Potential Lead Groups for funding using this Open Approach
SYNAPSE
CURATED DATA
TOOLS/ METHODS
ANALYSES/ MODELS
RAW DATA
BioMedical Information Commons
Data Generators
Data Analysts
Experimentalists
Clinicians
Patients/ Citizens
Networked Approaches
1 USABLE DATA
2 REWARDS/ RECOGNITION
3 GOVERNANCE
4 HOW TO DISTRIBUTE TASKS
5 PRIVACY BARRIERS