Stephen Friend ICR UK 2012-06-18

Exploring Disease Bionetworks and How we Perform our Science

Stephen Friend June 18, 2012

ICR

Information Commons for Biological Function

KRAS NRAS

BRAF

MEK1/2

EGFR

ERBB2

BCR/ABL

EGFRi

Proliferation, Survival

•  EGFR pathway commonly mutated/activated in cancer

•  30% of all epithelial cancers

•  Blocking Abs approved for treatment of metastatic colon cancer

•  Subsequently found that RASMUT tumors don’t respond – “Negative Predictive Biomarker”

•  However, there are still EGFR+ / RASWT patients who don’t respond – need “Positive Predictive Biomarker”

•  And in lung cancer it is not clear that RASMUT status is a useful biomarker

Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function

Oncogenes only make good targets in particular molecular contexts : EGFR story

Causal Relationships ≠ Correlative Relationships? : CETPi story

•  Epidemiological Data provides strong support for independent association of low LDL and high HDL with reduced incidence of heart disease

•  Statins reduce LDL and reduce incidence of CVD deaths establishing causal relationship

•  CETP inhibition raises HDL – Does this have positive clinical benefit?

•  Torcetrapib (Pfizer) - $800M drug failed Ph3 (2006): a) Lack of efficacy; b) Increased mortality (off target?)

•  Dalcetrapib (Roche) – development halted in Ph3 (May 2012) for lack of efficacy (no increase in mortality)

•  Anacetrapib (Merck) / Evacetrapib (Lilly) – development ongoing. Hoped that they are better inhibitors and that this will lead to clinical benefit. Will cost $1 billion+ to find out

Can we save billions of dollars by generating and sharing datasets that let us better understand causal relationships?

Is there a common framework for testing clinical hypotheses (ARCH2POCM)?

what will it take to understand disease?

DNA RNA PROTEIN

MOVING BEYOND ALTERED COMPONENT LISTS

Familiar but Incomplete

Preliminary Probabilistic Models - Rosetta

Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are causal for disease

Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)

"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)

"Genetics of gene expression and its effect on disease." Nature. (2008)

"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009) ….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc

"Identification of pathways for atherosclerosis." Circ Res. (2007)

"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)

…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome

"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)

"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)

"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)

"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)

"Integrating large-scale functional genomic data ..." Nat Genet. (2008)

…… Plus 3 additional papers in PLoS Genet., BMC Genet.


Metabolic Disease

CVD

Bone

Methods

Extensive Publications now Substantiating Scientific Approach: Probabilistic Causal Bionetwork Models

• >80 Publications from Rosetta Genetics

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

Biological System

Data Analysis

Fundamentally Biological Science hasn’t changed because of the ‘Omics Revolution……

…..it is about the process of linking a system to a hypothesis to some data to some analyses

But the way we do it has changed…………………………………………

Biological System

Data

Analysis

Biological System

Analysis

Data

Single Lab Model

•  R01 Funding
•  Hypothesis->data->analysis->paper
•  Small-scale data / analysis
•  Reproducible?

Multiple Lab Model

•  P01 Funding
•  Hypothesis->data->analysis->paper
•  Medium-scale data / analysis
•  Data Generators/Analysts/Validators may be different groups
•  Reproducible?

Driven by molecular technologies, we have become more data intensive, leading to more specialization: data generators (centralized cores), data analyzers (bioinformaticians), and validators (experimentalists: lab & clinical). This is reflected in the tendency toward more multi-lab, consortium-style grants in which the data generators, analyzers, and validators may be different labs.

Biological System

Data

Analysis

Iterative Networked Approaches to Generating, Analyzing and Supporting New Models

Uncouple the automatic linkage between the data generators, analyzers, and validators

SYNAPSE

CURATED DATA

TOOLS/ METHODS

ANALYSES/ MODELS

RAW DATA

BioMedicine Information Commons

Data Generators

Data Analysts

Experimentalists

Clinicians

Patients/ Citizens

Networked Approaches



Barriers to Engaging Networked Approaches to a BioMedicine Information Commons

1 USABLE DATA: SYNAPSE

2 REWARDS / RECOGNITION: SYNAPSE

3 RULES / GOVERNANCE: THE FEDERATION

4 HOW TO DISTRIBUTE TASKS: COLLABORATIVE CHALLENGES

5 PRIVACY BARRIERS: PORTABLE LEGAL CONSENT

Open and Networked Approaches: Democratization of Science

1 USABLE DATA: SYNAPSE

2 REWARDS / RECOGNITION: SYNAPSE

Two approaches to building common scientific and technical knowledge

Text summary of the completed project. Assembled after the fact.

Social Coding: Every code change versioned. Every issue tracked. Every project the starting point for new work. All evolving and accessible in real time.

Synapse is GitHub for Biomedical Data

Social Science: Data and code versioned. Analysis history captured in real time. Work anywhere, and share the results with anyone.

Why not share clinical/genomic data and model building in the ways currently used by the software industry (the power of tracking workflows and versioning)?
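As a rough illustration of what "data and code versioned, analysis history captured" could look like in practice, the sketch below records one analysis step by hashing its inputs. It uses only the Python standard library. The record layout and the names `content_hash` and `record_step` are hypothetical and are not Synapse's API.

```python
import hashlib
import json
from typing import Optional


def content_hash(payload: bytes) -> str:
    """Return a SHA-256 hex digest identifying one exact version of some content."""
    return hashlib.sha256(payload).hexdigest()


def record_step(data: bytes, code: str, parent: Optional[str] = None) -> dict:
    """Build a provenance record tying a step to the exact data and code versions used.

    Hypothetical layout: 'parent' is the id of the previous record, so a chain
    of records forms an analysis history.
    """
    record = {
        "data_version": content_hash(data),
        "code_version": content_hash(code.encode("utf-8")),
        "parent": parent,
    }
    # The record id is a hash of the record itself, so any change yields a new id.
    record["id"] = content_hash(json.dumps(record, sort_keys=True).encode("utf-8"))
    return record


# Made-up example data and step names.
raw = b"sample,expr\nA,1.0\nB,2.5\n"
step1 = record_step(raw, "normalize(raw)")
step2 = record_step(raw, "fit_model(normalized)", parent=step1["id"])
print(step2["parent"] == step1["id"])
```

Because identifiers are derived from content, rerunning the same step on the same data reproduces the same id, while any change to data or code produces a new one.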

Leveraging Existing Technologies

Taverna

Addama

tranSMART

Watch What I Do, Not What I Say

sage bionetworks synapse project

Reduce, Reuse, Recycle


Most of the People You Need to Work with Don’t Work with You


My Other Computer is “The Cloud”


Data Analysis with Synapse

Run Any Tool

On Any Platform

Record in Synapse

Share with Anyone

Find Public Data

Use Existing Tools

Public or Private Projects

Publish Your Work

clearScience links the components of a ‘big science’ project to a cloud computing environment...

so with a click from your browser you can push code into a virtual machine

or data...

or models...

or figures...

or entire compute environments... conveniently pre-populated with data, code, and the library and version dependencies

“my other computer is the cloud… let me hand it to you…”

pilot advisors!

Downloading through TCGA data portal

[Figure: analysis pipeline. Inputs: Expression, Copy number, Mutation, Phenotype. Steps: Predictive model generation, Performance assessment]

•  Automated workflows for curation, QC, and sharing of large-scale datasets.

•  All of TCGA, GEO, and user-submitted data processed with standard normalization methods.

•  Searchable TCGA data:
–  23 cancers
–  11 data platforms
–  Standardized meta-data ontologies

•  Data accessible at multiple levels of aggregation.

•  Links to upstream and downstream processing of data.

•  Displayed is TCGA Glioblastoma data normalized for each platform across batches.

•  Data accessible through programmatic environments such as R.

•  Standardized formats allow reuse of analysis pipelines on all processed datasets.

•  TCGA, GEO, user-submitted data.

•  Comparison of many modeling approaches applied to the same data.

•  Models transparently shared and reusable through Synapse.

•  Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.

•  Extending pipeline to evaluate prediction of TCGA phenotypes.

•  Hosting of collaborative competitions to compare models from many groups.
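One way to compare several modeling approaches on the same data is to score each of them on identical held-out folds. The sketch below is an illustration only: the two toy models (a mean predictor and a one-feature least-squares line), the synthetic data, and all function names are assumptions, not the six approaches or the drug-sensitivity data shown on the slide.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-feature ordinary least squares: y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cross_validated_mse(fit, xs, ys, k=5):
    """Mean squared error over k folds; fold i holds out every k-th sample starting at i."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return mean(errors)


# Synthetic example: the response depends linearly on one feature plus a small wobble.
features = [float(i) for i in range(20)]
response = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(features)]

# Every model is scored on exactly the same folds, so the scores are comparable.
scores = {
    "mean": cross_validated_mse(fit_mean, features, response),
    "line": cross_validated_mse(fit_line, features, response),
}
best = min(scores, key=scores.get)
print(best, scores)
```

Keeping the fold assignment fixed across models is the design choice that matters here: differences in score then reflect the models rather than the split.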

Open and Networked Approaches

3 RULES / GOVERNANCE: THE FEDERATION

A B C

Pipeline Strategy

A B C

D

Divide and Conquer Strategy

A B C

Parallel/Iterative Strategy

sage federation: model of biological age

Faster Aging

Slower Aging

Clinical Association -  Gender -  BMI -  Disease

Genotype Association

Gene Pathway Expression

[Figure axes: Predicted Age (liver expression) vs. Chronological Age (years); Age Differential]

REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge

4 HOW TO DISTRIBUTE TASKS: COLLABORATIVE CHALLENGES

What is the problem?

Our current models of disease biology are primitive and limit doctors’ understanding and ability to treat patients

Current incentives reward those who silo information and work in closed systems

The Solution: Competitions to crowd-source research in biology and other fields

  Why competitions?
•  Objective assessments
•  Acceleration of progress
•  Transparency
•  Reproducibility
•  Extensible, reusable models

  Competitions in biomedical research
•  CASP (protein structure)
•  Foldit / EteRNA (protein / RNA structure)
•  CAGI (genome annotation)
•  Assemblathon / Alignathon (genome assembly / alignment)
•  SBV Improver (industrial methodology benchmarking)
•  DREAM (co-organizer of Sage/DREAM competition)

  Generic competition platforms
•  Kaggle, InnoCentive, MLComp

The Sage/DREAM breast cancer prognosis challenge

Goal: Challenge to assess the accuracy of computational models designed to predict breast cancer survival using patient clinical and genomic data

Why this is unique:

  This Sage/DREAM Challenge is a pre-collated cohort: 2,000 breast cancer samples from the METABRIC cohort

  Accessible to all: A cloud-based common compute architecture is being made available by Google to support the computational models needed to develop and test challenge models

  New Rigor:
•  Contestants will evaluate their models on a validation data set composed of newly generated data (provided by Dr. Anne-Lise Børresen-Dale)
•  Contestants must demonstrate their models can be reproduced by others

  New incentives: leaderboard to energize participants, Science Translational Medicine publication for winning team

  Breast cancer patients, funders and researchers can track this Challenge on BRIDGE, an open source online community being built by Sage and Ashoka Changemakers and affiliated with this Challenge
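The slides do not say how the accuracy of a survival model is scored. One common choice for this kind of task is the concordance index, sketched here as an assumption rather than as the Challenge's stated metric. The function name and the toy cohort are invented for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model orders correctly.

    times: observed follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher means higher predicted risk (shorter survival)
    A pair (i, j) is comparable when the patient with the shorter time had an
    observed event. Ties in risk score count as half correct.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (made-up numbers): survival in years, event flag, predicted risk.
times = [2.0, 5.0, 7.0, 10.0]
events = [1, 1, 0, 1]
perfect = [0.9, 0.6, 0.4, 0.1]
reversed_scores = [0.1, 0.4, 0.6, 0.9]
print(concordance_index(times, events, perfect))
print(concordance_index(times, events, reversed_scores))
```

A score of 1.0 means every comparable pair is ranked correctly, 0.5 is what a constant or random predictor achieves, and 0.0 means every pair is ranked backwards.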

Sage/DREAM Challenge: Details and Timing

Phase 1: Apr thru end-Sep 2012

  Training data: 2,000 breast cancer samples from METABRIC cohort
•  Gene expression
•  Copy number
•  Clinical covariates
•  10 year survival

  Supporting data: Other Sage-curated breast cancer datasets
•  >1,000 samples from GEO
•  ~800 samples from TCGA
•  ~500 additional samples from Norway group
•  Curated and available on Synapse, Sage’s compute platform

  Data released in phases on Synapse from now through end-September

  Will evaluate accuracy of models built on METABRIC data to predict survival in:
•  Held out samples from METABRIC
•  Other datasets

Phase 2: Oct 1 thru Nov 12, 2012

  Evaluation of models in novel dataset.

  Validation data: ~500 fresh frozen tumors from Norway group with:
•  Clinical covariates
•  10 year survival

  Gene expression and copy number data to be generated for model evaluation
•  Sent to Cancer Research UK to generate data at same facility as METABRIC
•  Models built on training data evaluated on newly generated data

  Winners announced at November 12 DREAM conference

Summary

Transparency, reproducibility

Validation in novel dataset

Publication in Science Translational Medicine

Donation of Google-scale compute space.

For the goal of promoting democratization of medicine… Registration starting NOW…

sign up at synapse.sagebase.org

Presentation outline

1) Predicting drug response from cancer cell lines

Cancer cell line encyclopedia

Molecular characterization •  1,000 cell lines

  mRNA   copy number   Sequencing (1,600 genes)

Viability screens •  500 cell lines •  24 compounds

2) Predicting clinical cancer phenotypes

Primary tumor datasets (TCGA, METABRIC)

  genomics   transcriptomics   epigenetics

Clinical data (e.g. survival time)

3) Workflows for data management, versioning and method comparison

4) Network-based predictors and multi-task learning

Predictive model

Molecular characterization

Developing predictive models of genotype-specific sensitivity to compound treatment

Predictive Features (biomarkers)

Genetic Feature Matrix: Expression, copy number, somatic mutations, etc.

Sensitive / Refractory (e.g. EC50)

Cancer samples with varying degrees of response to therapy
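To make the setup concrete, here is a deliberately simple sketch of relating a feature matrix to drug response. It ranks binary features (for example, mutation present or absent) by the difference in mean response between altered and unaltered samples. This univariate ranking is a stand-in chosen for brevity and is not the modeling method used in the talk; the sample values and the function name `rank_features` are invented.

```python
from statistics import mean


def rank_features(feature_matrix, response):
    """Rank binary features by how strongly they separate the response.

    feature_matrix: dict mapping feature name -> list of 0/1 flags, one per sample
    response: drug-response value per sample (e.g. log EC50; lower = more sensitive)
    Returns (name, effect) pairs sorted by |effect|, where effect is
    mean(response | feature present) - mean(response | feature absent).
    """
    ranked = []
    for name, flags in feature_matrix.items():
        present = [r for f, r in zip(flags, response) if f == 1]
        absent = [r for f, r in zip(flags, response) if f == 0]
        if not present or not absent:
            continue  # feature does not vary across samples, so it cannot stratify
        ranked.append((name, mean(present) - mean(absent)))
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


# Invented toy data: six cell lines, three candidate features.
features = {
    "BRAF mut": [1, 1, 1, 0, 0, 0],
    "TP53 mut": [1, 0, 1, 0, 1, 0],
    "always 0": [0, 0, 0, 0, 0, 0],
}
log_ec50 = [-2.0, -1.8, -2.2, 0.5, 0.3, 0.4]

ranking = rank_features(features, log_ec50)
print(ranking[0][0])
```

A negative effect here means samples carrying the feature respond at lower concentrations, that is, they are more sensitive.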


Our approach identifies mutations in genes upstream of MEK as top predictors of sensitivity to MEK inhibition

#1 Mut NRAS

#3 Mut BRAF

PD-0325901

PD-0325901

#9 Mut BRAF

#312 Mut NRAS

#9 Mut KRAS

TP53 mut

CDKN2A copy

MDM2 expr

HGF expr

CML lineage EGFR mut

EGFR mut

EGFR mut

CML lineage

ERBB2 expr

BRAF mut

BRAF mut

NRAS mut

BRAF mut

NRAS mut

KRAS mut

BRAF mut

NRAS mut

KRAS mut

#1 BRAF mut

#2 NRAS mut #1 BRAF mut

#3 KRAS mut #2 NRAS mut #1 BRAF mut

#3 KRAS mut #2 NRAS mut #1 BRAF mut

#1 EGFR mut

#1 ERBB2 expr

#1 EGFR mut

#2 CML lineage #1 EGFR mut

#1 CML lineage

#1 HGF expr

#2 TP53 mut #3 CDKN2A copy #1 MDM2 expr

Can the approach make new discoveries?

For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity


Predicted biomarkers supported by literature evidence

Prediction   Literature evidence   Model / Significance

haematopoietic

solid

LBH589 (HDACi)

HDAC inhibitors are effective in haematopoietic tumors

"Responses with single agent HDACi have been predominantly observed in advanced hematologic malignancies including T-cell lymphoma, Hodgkin lymphoma, and myeloid malignancies."

Supported in current clinical trials

Typical pharma: >10 phase 2 clinical trials in solid tumors @ $millions per trial.

NQO1 over-expression predicts 17-AAG sensitivity

NQO1 metabolizes 17-AAG to stable intermediary with 32-fold increase in activity.

MYC amplification predicts sensitivity to HSP70 inhibition.

HSP70 inhibits MYC-mediated apoptosis.

AHR expression predicts sensitivity to MEK inhibitors in NRAS mutant cell lines

Functionally validated by AHR knockdown

Legend AHR shRNA Control shRNA

Novel predictions are functionally validated

Prediction   Validation

BCL-xL expression predicts sensitivity to several chemotherapeutics

Functionally validated by:

BCL-xL knockdown

BCL-xL inhibitor drug synergy

Mouse models Clinical trials

Wei G.*, Margolin A.A.*, et al, Cancer Cell

Open and Networked Approaches

5 PRIVACY BARRIERS

PORTABLE LEGAL CONSENT: weconsent.us John Wilbanks

Arch2POCM

The Current R&D Ecosystem Is In Need of a New Approach to Drug Development

•  $200B per year in biomedical and drug discovery R&D

•  Only a handful of new medicines are approved each year

•  Productivity in steady decline since 1950

•  >90% of novel drugs entering clinical trials fail, and negative POC information is not shared

•  Significant pharma revenues going off patent in next 5 years

•  >30,000 pharma employees laid off from downsizing in each of last four years

•  90% of 2013 prescriptions will be for generic drugs


Issues With Drug Discovery

1.  The greatest attrition is at clinical proof-of-concept – once a “target” is linked to a disease in the clinic, the risk of failure is far lower

2.  Most novel targets are pursued by multiple companies in parallel (and most fail at clinical POC)

3.  The complete data from failed trials are rarely, if ever, released to the public


Open access research tools drive science


SGC: Open Access Chemical Biology a great success

•  PPP:
-  GSK, Pfizer, Novartis, Lilly, Abbott, Takeda
-  Genome Canada, Ontario, CIHR, Wellcome Trust

•  Based in Universities of Toronto and Oxford

•  200 scientists

•  Academic network of more than 250 labs

•  Generate freely available reagents (proteins, assays, structures, inhibitors, antibodies) for novel, human, therapeutically relevant proteins

•  Give these to academic collaborators to dissect pathways and disease networks, and thereby discover new targets for drug discovery

Some SGC Achievements

•  Structural impact
–  SGC contributed ~25% of global output of human structures annually
–  SGC contributes >40% of global output of human parasite structures annually

•  High quality science (some publications from 2011)

Vedadi et al, Nature Chem Biol, in press (2011)
Evans et al, Nature Genetics, in press (2011)
Norman et al, Science Transl Med. 3(88):88mr1 (2011)
Kochan G et al, PNAS 108:7745 (2011)
Clasquin MF et al, Cell 145:969 (2011)
Colwill et al, Nature Methods 8:551 (2011)
Ceccarelli et al, Cell 145:1075 (2011)
Strushkevich et al, PNAS 108:10139 (2011)
Bian et al, EMBO J, in press (2011)
Norman et al, Science Transl Med. 3:76cm10 (2011)
Xu et al, Nature Comm. 2: art. no. 227 (2011)
Edwards et al, Nature 470:163 (2011)
Fairman et al, Nature Struct. and Mol. Biol. 18:316 (2011)
Adams-Cioaba et al, Nature Comm. 2 (1) (2011)
Carr et al, EMBO J 30:317 (2011)
Deutsch et al, Cell 144:566 (2011)
Filippakopoulos et al, Cell, in press; Nature Chem. Biol. in press; Nature in press

Impact Of SGC’s Open Access JQ1 BET Probe

  Paper published Dec 23 has already been cited >60 times

  Harvard spin off (15 M$ seed funding raised)

  > 5 pharma have launched bromodomain programs

  JQ1/SGCB01 has been distributed to >250 labs/companies

  Already used by some to link Brd4 to new areas of science

Zuber et al: BRD4 as target in acute leukaemia. Nature, 2011
Delmore et al: JQ1 suppresses myc in multiple myeloma. Cell, 2011
Dawson et al: BRD4 in MLL (isoxazole inhibitor). Nature, 2011
Blobel et al: Novel Targets in AML. Cancer Cell, 2011
Mertz et al: Myc dependent cancer. PNAS, 2011
Zhao et al: Post mitotic transcriptional re-activation. Nature Cell Biol., 2011

Open access to the clinic?


Drug Discovery Is a Lottery Because:

Knowledge about clinical disease is limiting
-  patients are heterogeneous
-  do not know how some drugs work, e.g. paracetamol
-  different doses effective in different patients
-  efficacy is short lived
-  poor biomarkers…..

Too many targets/preclinical assays do not prioritize

Other Problems With How We Do Drug Discovery

•  Same targets, in parallel, in secret

•  No one organisation has all capabilities

•  Early IP is making it even harder (makes process slower, harder and more expensive)

Most Novel Targets Fail at Clinical POC

Target ID/ Discovery -> Hit/ Probe/ Lead ID -> Clinical candidate ID -> Tox./ Pharmacy -> Phase I -> Phase IIa/ b

HTS   LO

10%   30%   30%   90+%   50%

this is killing our industry

…we can generate “safe” molecules, but they are not developable in chosen patient group

This Failure Is Repeated, Many Times

Target ID/ Discovery -> Hit/ Probe/ Lead ID -> Clinical candidate ID -> Toxicology/ Pharmacy -> Phase I -> Phase IIa/ b

HTS   LO

10%   30%   30%   90+%   50%

[Figure: the same pipeline is drawn seven times in parallel]

…and outcomes are not shared

A Possible Solution: Arch2POCM
An Open Access Clinical Validation PPP

•  PPP to clinically validate (Ph IIa) pioneer targets

•  Pharma, public, academia, regulators and patient groups are active participants

•  Cultivate a common stream of knowledge
–  Avoid patents
–  Place all data into the public domain
–  Crowdsource the PPP’s druglike compounds

•  Invalidated targets are identified before pharma makes a substantial proprietary investment
–  Reduces the number of redundant trials on bad targets
–  Reduces safety concerns

•  Validated targets are de-risked for pharma investment
–  Pharma can initiate proprietary effort when risks are balanced with returns
–  PPP pharma members can acquire Arch2POCM IND for validated targets and benefit from shorter development timeline and data exclusivity for sales

Arch2POCM: Scale and Scope

•  Proposed Vertical Goal:
–  Initiate 2 programs. One for Oncology/Epigenetics/Immunology. One for Neuroscience/Schizophrenia/Autism.
–  Both programs will have 8 drug discovery projects (targets)
–  By Year 5, 30% of projects will have started Ph I and 20% will have completed Ph IIa
–  $200-250M over five years is projected as necessary to advance up to 8 drug discovery projects within each of the two therapeutic programs
–  By investing $1.6 M annually into one or both of Arch2POCM’s selected disease areas, partnered pharmaceutical companies:
1.  obtain a vote on Arch2POCM target selection
2.  gain real time data access to Arch2POCM’s 16 drug discovery projects
3.  have the strategic opportunity to expand their overall portfolio

•  Proposed Horizontal Goal:
–  Initiate 1-2 projects (1-2 novel target mechanisms) as pilots to assess Arch2POCM principles
–  In either Oncology or Neuroscience
–  Specific target mechanisms to be determined by funders’ interest
–  Interested funders include pharma, public research foundations and venture philanthropists
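The vertical-goal figures can be tallied in a few lines. This is only a worked check of the arithmetic, under the assumption that the Year 5 percentages apply to all projects across both programs; the slide does not say so explicitly, and the variable names are made up.

```python
PROGRAMS = 2                # Oncology and Neuroscience programs
PROJECTS_PER_PROGRAM = 8    # drug discovery projects (targets) per program
PCT_STARTED_PH1 = 30        # percent of projects that have started Ph I by Year 5
PCT_COMPLETED_PH2A = 20     # percent of projects that have completed Ph IIa by Year 5

# Assumption: percentages are taken over the combined project count.
total_projects = PROGRAMS * PROJECTS_PER_PROGRAM
started_ph1 = total_projects * PCT_STARTED_PH1 / 100
completed_ph2a = total_projects * PCT_COMPLETED_PH2A / 100

print(total_projects, started_ph1, completed_ph2a)
```

The total of 16 matches the "16 drug discovery projects" mentioned on the slide. Under the stated assumption, the Year 5 targets correspond to roughly five projects having entered Ph I and about three having completed Ph IIa.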

Histone

DNA

Lysine

Epigenetics: Exciting Science and Also A New Area For Drug Discovery

Modification | Write | Read | Erase
Acetyl | HAT | Bromo | HDAC
Methyl | HMT | MBT | DeMethyl

The Case For Epigenetics/Chromatin Biology

1.  There are epigenetic oncology drugs on the market (HDACs)

2.  A growing number of links to oncology, notably many genetic links (i.e. fusion proteins, somatic mutations)

3.  A pioneer area: More than 400 targets amenable to small molecule intervention - most of which only recently shown to be “druggable”, and only a few of which are under active investigation

4.  Open access, early-stage science is developing quickly – significant collaborative efforts (e.g. SGC, NIH) to generate proteins, structures, assays and chemical starting points


The Current Epigenetics Universe

Domain Family | Typical substrate class* | Total Targets
Histone Lysine demethylase | Histone/Protein K/R(me)n/ (meCpG) | 30
Bromodomain | Histone/Protein K(ac) | 57
ROYAL
Tudor domain | Histone Kme2/3 - Rme2s | 59
Chromodomain | Histone/Protein K(me)3 | 34
MBT repeat | Histone K(me)3 | 9
PHD finger | Histone K(me)n | 97
Acetyltransferase | Histone/Protein K | 17
Methyltransferase | Histone/Protein K&R | 60
PARP/ADPRT | Histone/Protein R&E | 17
MACRO | Histone/Protein (p)-ADPribose | 15
Histone deacetylases | Histone/Protein KAc | 11

395

Now known to be amenable to small molecule inhibition

SGC Oxford SGC Toronto

BET family chemical biology

What Are Bromodomains and How Do They Function?

What Are Bromodomains:
• Small highly conserved protein recognition domains (~110 residues)
• Bundle of four α-helices and two loops that form a pocket with a conserved Asn residue
• 56 unique human bromodomains identified: spread across 42 proteins

How Do They Function:
• Selectively bind to acetylated lysine residues located on histones
• Histone/BRD complex leads to transcription and gene expression
• Inhibition of BRD binding to acetylated histones leads to gene silencing

Bromodomains: Genetic Links to Cancer

Genetic abnormality

Publications


Available Reagents for Bromodomain Family

28 crystal structures 42 purified proteins


Robust Assays Available

Peptide library screen using SPR

Histone peptide

Targets

  We now have a suite of assays for bromodomains
•  Filippakopoulos et al Cell. 2012 149(1):214-31.

Peptide array screens using dot blots

CBP/PCAF

BET

A Series of Chemical Starting Points


Proof-of-concept JQ1: A Selective Inhibitor for BETs

Panagis Filippakopoulos, Jun Qi, Stefan Knapp, Jay Bradner

  NUT midline carcinoma (NMC) is a rare, highly lethal cancer that occurs in children and young adults.

  NMCs uniformly present in the midline, most commonly in the head, neck, or mediastinum, as poorly differentiated carcinomas

  Rearrangement of the Nuclear protein in testis (NUT) that creates a BRD4-NUT fusion gene

 Variant rearrangements, some involving the BRD3 gene

  NMC is diagnosed by fluorescence in situ hybridization and NUT antibodies.

It is unclear how common NUT rearrangements are in squamous cell carcinomas due to lack of routine diagnostic

JQ1 Inhibits NMC Tumour Growth

FDG-PET

4 days 50mg/kg IP Jay Bradner/Andrew Kung, Harvard


Potential Year 1 Aims of an Arch2POCM Bromodomain Program

1.  Select two pre-clinical candidates: Leverage SGC’s existing open access network of labs, compounds, assays and information to identify two chemotypes for medicinal chemistry optimization

2.  Develop a biomarker strategy for clinical development: opportunities for surrogate endpoints and patient stratification

3.  Implement crowdsourced research: manufacture and distribute optimized pre-clinical candidates to academic and clinical researchers


Process For Arch2POCM Target Selection

Arch2POCM creates a disease area spreadsheet of relevant information for pioneer targets such as:

1.  Novelty: Target selection should focus on addressing fundamental questions on biology and disease association
•  No clinical precedent
•  Exception: advance an existing asset into a new disease area

2.  Targets should be tractable
•  In vitro assay availability
•  Cell-based assay availability
•  Characterized protein (e.g. 3D structure; antibody, cell lines, mouse model)
•  Availability of starting chemical matter

3.  Evidence of genetic linkages
•  Translocations, mutations, splicing alterations specifically linked to disease
•  “Peripheral” genetic linkages:
–  Gene expression profiles or GWAS data indicate correlation
–  Implicated in pathway with clear genetic link (SLS, Networks)

4.  Key research contacts (academic or industry)

Potential Targets - Bromodomain Family

Columns: Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data) | Maturity of the program | Positive evidence of the compound playing a role in the given disease | Data showing a failed result of the compound for the given disease | Mouse knockout model (MGI)

SMARCA4
Evidence: Expression correlates with development of prostate cancer BUT SMARCA4 in general acts as tumor suppressor and is necessary for genome stability; targeted knockdown of SMARCA4 potentiates lung cancer development
Maturity of the program: potent, selective, cell active compound identified
Positive evidence of the compound: NA
Failed result of the compound: NA
Mouse knockout model (MGI): Homozygotes for a null allele die in utero before implantation. Embryos heterozygous for this null allele and an ENU-induced allele show impaired definitive erythropoiesis, anemia and lethality during organogenesis. Heterozygotes show cyanosis and cardiovascular defects and are pre-disposed to breast tumors

SMARCA2A
Evidence: Gastric cancer; mutated in CLL; depletion of BRM causes accelerated progression to the differentiation phenotype BUT targeted deletion is causative for the development of prostatic hyperplasia in mice
Maturity of the program: (none listed)
Positive evidence of the compound: NA
Failed result of the compound: NA
Mouse knockout model (MGI): Mice homozygous for a targeted mutation in this gene may exhibit infertility and a slightly increased body weight in some genetic backgrounds.

CBP
Evidence: Translocation of CBP with MOZ, monocytic leukemia zinc finger protein, causes acute myeloid leukemia; other translocations involve MLL (HRX); mutated in ALL BUT CBP has also been proposed as a classical tumor suppressor
Maturity of the program: (none listed)
Positive evidence of the compound: NA
Failed result of the compound: NA
Mouse knockout model (MGI): Homozygotes for null or altered alleles die around midgestation with defects in hemopoiesis, blood vessel formation, and neural tube closure. Heterozygotes may exhibit skeletal, cardiac, and hematopoietic defects, retarded growth, and hematologic tumors.

ATAD2
Evidence: Correlated with survival of high-grade osteosarcoma patients after chemotherapy; required for breast cancer cell proliferation; differentially expressed in NSCLC
Maturity of the program: Weak hits
Positive evidence of the compound: NA
Failed result of the compound: NA
Mouse knockout model (MGI): NA

BRD4
Evidence: Translocations produce BRD4-NUT fusion oncogene causing midline carcinoma
Maturity of the program: JQ1
Positive evidence of the compound: JQ1 in BRD-NUT fusion and MLL
Failed result of the compound: NA
Mouse knockout model (MGI): Homozygotes for a gene-trap null mutation die soon after implantation. Heterozygotes exhibit impaired pre- and postnatal growth, head malformations, lack of subcutaneous fat, cataracts, and abnormal liver cells.

BRD2
Evidence: In transgenic mice, constitutive lymphoid expression of Brd2 causes a malignancy most similar to human diffuse large B cell lymphoma
Maturity of the program: JQ1
Positive evidence of the compound: JQ1 in BRD-NUT fusion and MLL
Failed result of the compound: NA
Mouse knockout model (MGI): Mice homozygous for a null mutation display embryonic lethality during organogenesis with decreased embryo size, decreased cell proliferation, a delay in the cell cycle, and increased cell death. Heterozygous mice also display decreased cell proliferation.

Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data)


Positive evidence of the compound playing a role in the given disease

Data showing a failed result of the compound for the given disease

Mouse model (MGI)

JMJD3 Upregulated in prostate cancer; expression is higher in metastatic prostate cancer BUT JMJD3 contributes to the activation of the INK4A-ARF tumor suppressor locus in response to oncogene- and stress-induced senescence.


NA; inhibits TNF-alpha production in macrophages of RA patients

NA Mice homozygous for a knock-out allele exhibit perinatal lethality associated with a thick alveolar septum and absence of air space in the lungs. Bone marrow chimera mice derived from fetal liver cells exhibit impaired eosinophil recruitment and abnormal response to helminth infection.

JARID1B High levels in breast cancer cell lines, strong expression in the invasive but not in the benign components of primary breast carcinomas. BUT tumor suppressor in melanoma cells

No progress NA NA NA

Potential Targets - Demethylases

Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data)


Positive evidence of the compound playing a role in the given disease

Data showing a failed result of the compound for the given disease

SETD8 Recent data indicates that SETD8 deregulates PCNA expression by degradation accelerated by methylation at K248. Expression levels of SETD8 and PCNA upregulated in cancer cells. Cancer Research May 2012 Takawa et al.

Weak inhibitors identified (8 µM) in chemistry optimization.

NA NA

EZH2 EZH2 upregulated in cancer cells. Studies on mutants indicate an interesting profile where both wild-type and mutant (Y641F) are required for malignant phenotype. Sneeringer et al. PNAS 2012. Compounds identified in GSK patents WO 2011/140324 and 140315 and WO 2012/005805 and 075080.

Potent, selective, cell-active compound identified.

NA NA

MMSET MMSET, WHSC1, NSD2 is overexpressed in cancer cells. Hudlebusch et al. Clinical Cancer Res 2011

No hits—currently screening

NA NA

DOT1L Daigle et al. Cancer Cell 2011 elegantly show that potent DOT1L inhibitors kill cells containing MLL translocations and do not kill cells lacking the translocations

Potent, selective, cell-active compound identified.

Transgenic mouse model tumors shrunk by SC dosing of inhibitor

Potential Targets - Histone Methyltransferases

Proposed Metrics For Measuring Arch2POCM Success

Use a therapeutic product profile (TPP) with stage-gates and defined milestones to monitor project progression:

•  Small molecule screening hit rate achieved
•  SAR/In vitro testing
    –  Target EC50 achieved by at least XX compounds
    –  Selectivity target achieved by at least YY compounds
    –  Biological activity demonstrated for at least XX compounds in human tissue models (disease tissue, stem cells)

•  Manufacturing and Quality
    –  Steady and cost-effective supply of lead compound achieved
    –  Stability of lead compound demonstrated (sufficient to support POCM testing)
    –  Lead compound formulation identified to support pre-clinical and clinical studies
    –  Lead compound demonstrates selected quality attributes (sufficient to support pre-clinical studies and distribution to the crowd)

•  Pre-clinical testing
    –  Lead compounds achieve pre-clinical safety
    –  Lead compounds surpass target TI
    –  Lead compounds demonstrate cross-reactivity sufficient to support pre-clinical tox testing

•  Clinical
    –  Lead compounds demonstrate Ph I safety
    –  Lead compounds demonstrate Ph II POCM

•  Data management
    –  IT database infrastructure populated with XX epigenetics investigators/grant applications/publications
    –  Database QC and compliance defined and implemented (internal and external)


Program Activities Grid For Arch2POCM

Activity | Arch2POCM Location/Investigator (TBD)

Target Structure

Compound libraries

Assay development for epigenetic screens and biomarkers

HTP screens for epigenetic hits

Med Chem SAR To ID Two Suitable Binding Arch2POCM Test Compounds

Non-GLP scaleup of Arch2POCM Test Compounds and associated analytics

Distribution of Arch2POCM Test Compounds

PK, PD, ADME, Tox Testing

GMP Manufacturing of Arch2POCM Test Compounds

GMP Formulation

GMP Drug Storage and Distribution

IND Preparation Support

Clinical Assay Development and Qualification

Ph I-II Clinical Trials

Ph I-II Database Management and CSR Production

DISCUSSION

•  Opportunities to Review Targets
•  Opportunities to Discuss Approach
•  Opportunities to Consider Potential Lead Groups for funding using this Open Approach


SYNAPSE

CURATED DATA

TOOLS/ METHODS

ANALYSES/ MODELS

RAW DATA

BioMedical Information Commons

Data Generators

Data Analysts

Experimentalists

Clinicians

Patients/ Citizens


1 USABLE DATA

2 REWARDS / RECOGNITION

3 GOVERNANCE

4 HOW TO DISTRIBUTE TASKS

5 PRIVACY BARRIERS

