Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28

Open Source pre-competitive drug discovery

Moving beyond linear investigations, both of the science and of how we work

Stephen Friend MD PhD

Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ Amsterdam

February 28, 2012

Partnering & Collaboration: So what has been possible?

All patients (now >25,000) at a partnered Cancer Center provide consented expression data on their pts for classifying sub-populations

2006 Moffitt Cancer Center - Merck

Combination Therapies, each at Ph I: joint development by 2 Pharma

2007 AZ - Merck (Mek/Akt)

Sharing all the CT Onc Trial imaging files among 2 Pharma

2008 BMS & Merck

Link Pharma with an “Institute for Applied Cancer Center”

2008 Belfer - Merck

Share genomic data on 25,000 samples with clinical records and Expression and Exomes among three Pharma

2010 Asian Cancer Research Group (ACRG): Lilly, Merck, Pfizer

So what is the problem?

Most approved therapies were assumed to be monotherapies for diseases representing homogeneous populations

Our existing disease models often assume pathway knowledge sufficient to infer correct therapies

Familiar but Incomplete

Reality: Overlapping Pathways

What will it take to understand disease?

DNA RNA PROTEIN (dark matter)

MOVING BEYOND ALTERED COMPONENT LISTS

DIVERSE POWERFUL USE OF MODELS AND NETWORKS

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

(Eric Schadt)

Sage Mission

Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Sage Bionetworks Collaborators

•  Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche

•  Foundations: Kauffman, CHDI, Gates Foundation

•  Government: NIH, LSDF, NCI

•  Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)

•  Federation: Ideker, Califano, Nolan, Schadt

[Figure labels: PLATFORM, NEW MAPS, RULES GOVERN]

Why not share clinical/genomic data and model building within teams in ways currently used by the software industry (power of tracking workflows and versioning)?

Leveraging Existing Technologies

Taverna

Addama

tranSMART

Sage Bionetworks Synapse project

Watch What I Do, Not What I Say

Reduce, Reuse, Recycle

Most of the People You Need to Work with Don’t Work with You

My Other Computer is Cloudera Amazon Google

Sage Metagenomics Project

•  > 10k genomic and expression standardized datasets indexed in SCR
•  Error detection, normalization in mG
•  Access raw or processed data via download or API in downstream analysis
•  Building towards open, continuous community curation

Processed Data (S3)

Sage Metagenomics using Amazon Simple Workflow

Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/

Amazon SWF and Synapse

•  Maintains state of analysis
•  Tracks step execution
•  Logs workflow history
•  Dispatches work to Amazon or remote worker nodes
•  Efficiently matches job size to hardware
•  Provides error handling and recovery

•  Hosts raw and processed data for further reuse in public or private projects
•  Provides visibility into intermediate results and algorithmic details
•  Allows programmatic access to data; integration with R
•  Provides standard terminologies for annotations
•  Search across data sets
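The workflow bullets above (keeping analysis state, tracking step execution, logging history, recovering from errors) can be illustrated with a small sketch. This is not the Amazon SWF API and not Synapse code; `WorkflowRun`, `run_workflow` and the two toy steps are hypothetical names invented for illustration, and the values are toy numbers rather than real data.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class WorkflowRun:
    """State and event history of one analysis run (hypothetical structure)."""
    state: Dict[str, object] = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)

    def log(self, step: str, event: str, detail: str = "") -> None:
        # Append one history record per step event.
        self.history.append(
            {"time": time.time(), "step": step, "event": event, "detail": detail}
        )


def run_workflow(
    steps: List[Tuple[str, Callable[[dict], dict]]],
    run: Optional[WorkflowRun] = None,
    max_attempts: int = 2,
) -> WorkflowRun:
    """Run (name, function) steps in order, retrying a failed step.

    Each function receives the state dict and returns new entries for it.
    """
    if run is None:
        run = WorkflowRun()
    for name, func in steps:
        for attempt in range(1, max_attempts + 1):
            run.log(name, "started", f"attempt {attempt}")
            try:
                run.state.update(func(run.state))
                run.log(name, "completed")
                break
            except Exception as exc:  # record the failure, then retry
                run.log(name, "failed", repr(exc))
        else:
            # Reached only when every attempt failed.
            raise RuntimeError(f"step {name!r} failed after {max_attempts} attempts")
    return run


def load_raw(state: dict) -> dict:
    return {"raw": [2.0, 4.0, 6.0]}  # toy values, not real data


def center(state: dict) -> dict:
    raw = state["raw"]
    mean = sum(raw) / len(raw)
    return {"normalized": [x - mean for x in raw]}


run = run_workflow([("load_raw", load_raw), ("center", center)])
```

After the call, `run.state` holds the intermediate and final results and `run.history` holds one record per step event, which is the kind of visibility into intermediate results the slide describes. A hosted service would additionally persist this state and dispatch steps to remote workers, which the sketch leaves out.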

Synapse Roadmap

Timeline: Q1-2012 Q2-2012 Q3-2012 Q4-2012 Q1-2013 Q2-2013 Q3-2013 Q4-2013

Releases: Internal Alpha, Public Beta Testing, Synapse 1.0, Synapse 1.5, Future

Synapse Platform Functionality

•  Data Repository
•  Projects and security
•  R integration
•  Analysis provenance

•  Search
•  Controlled Vocabularies
•  Governance of restricted data

•  Workflow templates
•  Publishing figures
•  Wiki & collaboration tools
•  Integrated management of cloud resources

•  Social networking
•  User-customized dashboards
•  RStudio integration
•  Curation tool integration

Data / Analysis Capabilities

•  40+ manually curated clinical studies
•  8000+ GEO / ArrayExpress datasets
•  Clinical, genomic, compound sensitivity
•  Bioconductor and custom R analysis

•  TCGA
•  METABRIC breast cancer challenge

•  Predictive modeling workflows
•  Automated processing of common genomics platforms

•  TBD: Integrations with other visualization and analysis packages

INTEROPERABILITY


Genome Pattern CYTOSCAPE tranSMART I2B2

SYNAPSE

Five Pilots involving Sage Bionetworks

•  CTCAP
•  The Federation
•  Portable Legal Consent
•  Sage Congress Project
•  Arch2POCM


Clinical Trial Comparator Arm Partnership (CTCAP)

•  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.

•  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.

•  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].

•  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create a powerful new tool for drug development.

Started Sept 2010

Clinical/genomic data sharing and analysis will maximize clinical impact and enable discovery

•  Graphic of curated to QC'd to models

The Federation

2008 2009 2010 2011

How can we accelerate the pace of scientific discovery?

Ways to move beyond “traditional” collaborations?

Intra-lab vs Inter-lab Communication

Colrain/ Industrial PPPs Academic Unions

(Nolan and Haussler)

Sage Federation: model of biological age

[Figure: Predicted Age (liver expression) plotted against Chronological Age (years); labels: Faster Aging, Slower Aging, Age Differential]

Clinical Association: Gender, BMI, Disease

Genotype Association

Gene Pathway Expression

Reproducible science==shareable science

Sweave: combines programmatic analysis with narrative

Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 - Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9

[Figure: top-ranked predictive features, one line per compound panel]

#1 BRAF mut
#1 BRAF mut, #2 NRAS mut
#1 BRAF mut, #2 NRAS mut, #3 KRAS mut
#1 BRAF mut, #2 NRAS mut, #3 KRAS mut
#1 EGFR mut
#1 ERBB2 expr
#1 EGFR mut
#1 EGFR mut, #2 CML lineage
#1 CML lineage
#1 HGF expr
#1 MDM2 expr, #2 TP53 mut, #3 CDKN2A copy

Can the approach make new discoveries?

For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity


Vaske, et al.

Presentation outline

1) Predicting drug response from cancer cell lines

2) Future approaches: network-based predictors and multi-task learning

3) Standardized workflows for data management, versioning and method comparison

Cancer cell line encyclopedia

•  Molecular characterization (1,000 cell lines)
•  Currently: mRNA, copy number, somatic mutations (36 cancer-related genes)
•  In progress: targeted exon sequencing, epigenetics, microRNA, lncRNA, phospho-tyrosine kinase, metabolites
•  Small molecule screen: viability screens (500 cell lines, 24 compounds)

TCGA / ICGC

•  Molecular characterization (50 tumor types): genomics, transcriptomics, epigenetics
•  Clinical data

Predictive model

•  Transfer learning
•  Network / pathway prior information

Vaske, et al.

1)  Data management APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):

ccleFeatureData <- getEntity(ccleFeatureDataId)
ccleResponseData <- getEntity(ccleResponseDataId)

tcgaFeatureData <- getEntity(tcgaFeatureDataId)
tcgaResponseData <- getEntity(tcgaResponseDataId)

Observed Data = Systematic Variation + Random Variation

Normalization: Remove the influence of adjustment variables on data...

2) Automated, standardized workflows for curation and QC of large-scale datasets (Brig Mecham).

A.  TCGA: Automated cloud-based processing.
B.  GEO / ArrayExpress: Normalization workflows, curation of phenotype using standard ontologies.
C.  Additional studies with genetic and phenotypic data in Sage repository (e.g. CCLE and Sanger cell line datasets)
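The normalization idea stated above, removing the influence of adjustment variables from the observed data, can be sketched for the simplest case of one categorical adjustment variable such as a batch label. This is only an illustration under that assumption, not the workflow used in the Sage pipelines; `remove_group_effect` is a hypothetical helper and the numbers are toy values.

```python
from collections import defaultdict
from statistics import mean


def remove_group_effect(values, groups):
    """Align every group's mean to the overall mean.

    values: measurements for one gene or probe across samples.
    groups: the adjustment variable (e.g. a batch label) for each sample.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    overall = mean(values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    # Systematic offset of each group relative to the overall mean.
    offsets = {g: mean(v) - overall for g, v in by_group.items()}
    # Subtract the offset; within-group (random) variation is preserved.
    return [value - offsets[group] for value, group in zip(values, groups)]


expression = [5.0, 7.0, 9.0, 11.0]  # toy values, not real data
batch = ["A", "A", "B", "B"]
adjusted = remove_group_effect(expression, batch)
```

In the toy input the two batches differ by a constant shift, so after adjustment both batches share the same mean while the spread inside each batch is unchanged. Continuous adjustment variables would need a regression step instead of group centering.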

3)  Pluggable API to implement predictive modeling algorithms.

A)  Support for all commonly used machine learning methods (for automated benchmarking against new methods)

B)  Pluggable custom methods as R classes implementing customTrain() and customPredict() methods.

    A)  Can be arbitrarily complex (e.g. pathway and other priors)

    B)  Support for parallelization in foreach loops.

custom model 1, custom model 2, ... custom model N

4)  Statistical performance assessment across models.

5)  Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis)

6)  Experimental follow-up on top predictions (TBD). E.g. for cell lines: medium throughput suppressor / enhancer screens of drug sensitivity for knockdown / overexpression of predicted biomarkers.
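Item 3) describes custom methods plugged in as R classes that implement customTrain() and customPredict(), and item 4) compares performance across models. The same plug-in idea can be sketched in Python. Everything here is an assumed analogue for illustration: the `CustomModel` interface, the two toy models, the interleaved cross-validation and the toy data are not taken from the Sage code.

```python
from abc import ABC, abstractmethod
from statistics import mean


class CustomModel(ABC):
    """Hypothetical plug-in interface: every model offers train and predict."""

    @abstractmethod
    def custom_train(self, features, responses): ...

    @abstractmethod
    def custom_predict(self, features): ...


class MeanBaseline(CustomModel):
    """Predicts the mean training response for every sample."""

    def custom_train(self, features, responses):
        self.mean_ = mean(responses)
        return self

    def custom_predict(self, features):
        return [self.mean_ for _ in features]


class NearestNeighbor(CustomModel):
    """Predicts the response of the closest training sample."""

    def custom_train(self, features, responses):
        self.rows_ = list(zip(features, responses))
        return self

    def custom_predict(self, features):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.rows_, key=lambda r: dist(r[0], f))[1] for f in features]


def cross_validated_mse(model_factory, features, responses, folds=3):
    """Mean squared error under k-fold cross-validation with interleaved folds."""
    errors = []
    for k in range(folds):
        test_idx = [i for i in range(len(features)) if i % folds == k]
        train_idx = [i for i in range(len(features)) if i % folds != k]
        model = model_factory().custom_train(
            [features[i] for i in train_idx], [responses[i] for i in train_idx]
        )
        preds = model.custom_predict([features[i] for i in test_idx])
        errors.extend((p - responses[i]) ** 2 for p, i in zip(preds, test_idx))
    return mean(errors)


def benchmark(models, features, responses, folds=3):
    """Score every registered model with the same cross-validation splits."""
    return {name: cross_validated_mse(factory, features, responses, folds)
            for name, factory in models.items()}


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]  # toy features, not real data
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]              # toy responses
scores = benchmark({"mean": MeanBaseline, "nn": NearestNeighbor}, X, y)
```

Because `benchmark` only relies on the two interface methods, a new method can be compared against existing ones by adding one entry to the `models` dictionary. Lower scores are better here, since the metric is squared error.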

Portable Legal Consent

(Activating Patients)

John Wilbanks

Sage Congress Project April 20 2012

RealNames Parkinson’s Project

Revisiting Breast Cancer Prognosis

Fanconi’s Anemia

(Responders Competitions: IBM-DREAM)

Confidential | © 2012 Third Rock Ventures

THE QUICK WIN, FAST FAIL DRUG DEVELOPMENT PARADIGM


•  Increase critical information content early to shift attrition to cheaper phase

•  Use savings from shifted attrition to re-invest in the R&D ‘sweet spot’

[Figure: two development paradigms compared]

TRADITIONAL (scarcity of drug discovery): Preclinical development, Phase I, Phase II, Phase III; milestones CS, FHD, FED, PD, Launch; test each scarce molecule thoroughly; cost by phase $ $ $$ $$$$

QUICK WIN, FAST FAIL (abundance of drug discovery): Preclinical development, R&D ‘sweet spot’ (CS, FHD, POC), then confirmation, dose finding and commercialization (PD, Launch); higher p(TS)

Source: Nature Publishing Group

Arch2POCM

Restructuring the Precompetitive Space for Drug Discovery

How to potentially De-Risk High-Risk Therapeutic Areas

Arch2POCM: Highlights

A PPP To De-Risk Novel Targets That The Pharmaceutical Industry Can Then Use To Accelerate The Development of New and Effective Medicines

•  Arch2POCM will be a charitable Public Private Partnership (PPP) that will file no patents and whose scientific plan (including target selection) will be endorsed by its pharmaceutical, private and public funders

•  Arch2POCM will de-risk novel targets by developing and using pairs of test compounds (two different chemotypes) that interact with the selected targets: the compounds will be developed through Phase IIb clinical trials to determine if the selected target plays a role in the biology of human disease

•  Arch2POCM will work with and leverage patient groups and clinical CROs to enable patient recruitment, and with regulators to design novel studies and to validate novel biomarkers

•  Arch2POCM will make its GMP test compounds available to academic groups and foundations so they can use them to perform clinical studies and publish on a multitude of additional indications

•  Arch2POCM will release all reagents and data to the public at pre-defined stages in its drug development process. To ensure scientific quality, data and reagents will be released once they have been vetted by an independent scientific committee

•  Arch2POCM will publish all negative POCM data immediately in order to reduce the number of ongoing redundant proprietary studies (in pharma, biotech and academia) on an invalidated target and thereby
–  minimize unnecessary patient exposure
–  provide significant economic savings for the pharmaceutical industry

•  In the rare instance in which a molecule achieves positive POCM, Arch2POCM will ensure that the compound has the ability to reach the market by arranging for exclusive access to the proprietary IND database for the molecule

Arch2POCM: scale and scope

•  Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/Immunology. One for Neuroscience/Schizophrenia/Autism. Both programs will have 6-8 drug discovery projects (targets) - ramped up over a period of 2 years

–  It is envisioned that Arch2POCM’s funding partners will select targets that are judged as slightly too risky to be pursued at the top of pharma’s portfolio, but that have significant scientific potential that could benefit from Arch2POCM’s crowdsourcing effort

•  These will be executed over a period of 5 years making a total of 16 drug discovery projects

–  Projected pipeline attrition by Year 5 (assuming 12 targets loaded in early discovery)

•  30% will enter Phase 1
•  20% will deliver Ph 2 POCM data

Arch2POCM: proposed funding strategy

–  Arch2POCM funding will come from a combination of public funding from governments and private sector funding from pharmaceutical and biotechnology companies and from private philanthropists

–  By investing $1.6 M annually into one or both of Arch2POCM’s selected disease areas, partnered pharmaceutical companies:

1.  obtain a vote on Arch2POCM target selection
2.  gain real time data access to Arch2POCM’s 12-16 drug discovery projects
3.  have the strategic opportunity to expand their overall portfolio

Entry points for Arch2POCM programs: Two compounds (different chemotypes) will be advanced per target

[Figure: pipeline stages Lead identification, Lead optimisation, Preclinical, Phase I, Phase II; outputs Assay, in vitro probe, Lead, Clinical candidate, Phase I asset, Phase II asset]

Pioneer targets: genomic/genetic, disease networks, academic partners, private partners, SAGE, SGC,

Stage-gate 1: Early Discovery and PCC Compounds (75%)

Stage-gate 2: Pharma’s re-purposed clinical assets (25%)

Five Year Objective: Initiate ≈ 8 drug discovery projects with 6 entering in Early Discovery, one entering in pre-clinical and one entering in PH I

Pipeline flow for Arch2POCM

[Figure: timeline in six-month intervals, months 0-60, showing projects moving through Early discovery, Pre-clinical, Ph 1 and Ph 2]

PTRS* by stage: Early discovery 45%, Pre-clinical 70%, Ph I 65%, Ph II 10%

Target load entries (Year #1 and Year #2): Early discovery (2), Early discovery (4), Pre-clinical (1), Ph 1 (1)

Arch2POCM Snapshot at Year 5

•  Targets Loaded: 8
•  Projected INDs filed: 3-4
•  Ph 1 or 2 Trials In Progress: 2
•  Projected Complete Ph 2 (POCM) Data Sets: 1

*PTRS = Probability of technical and regulatory success
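The stage probabilities above can be chained to estimate how many projects are expected to reach each stage. The sketch below assumes that each PTRS is the chance of advancing from that stage to the next, that projects are independent, and that timing within the five years is ignored; `expected_pipeline` is a hypothetical helper, not part of any Arch2POCM plan. Entry counts come from the five-year objective (6 in early discovery, 1 in pre-clinical, 1 in Ph I).

```python
def expected_pipeline(entries, ptrs, stages):
    """Propagate expected project counts through successive stages.

    entries: number of projects entering directly at each stage.
    ptrs: assumed probability of advancing out of each stage.
    Returns the expected count reaching each stage, plus the expected
    count passing the last stage under the key "passed_last_stage".
    """
    reached = {}
    carried = 0.0
    for stage in stages:
        reached[stage] = carried + entries.get(stage, 0)
        carried = reached[stage] * ptrs[stage]
    reached["passed_last_stage"] = carried
    return reached


stages = ["early_discovery", "pre_clinical", "ph1", "ph2"]
entries = {"early_discovery": 6, "pre_clinical": 1, "ph1": 1}
ptrs = {"early_discovery": 0.45, "pre_clinical": 0.70, "ph1": 0.65, "ph2": 0.10}

result = expected_pipeline(entries, ptrs, stages)
for stage, count in result.items():
    print(f"{stage}: {count:.2f}")
```

Under these assumptions about 3.6 projects are expected to reach Ph I, which is in line with the 3-4 projected INDs if every project reaching Ph I is taken to file an IND. Because the sketch has no time dimension, it does not reproduce the Year 5 counts of trials in progress or completed Ph 2 data sets.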

The case for epigenetics/chromatin biology

1.  There are epigenetic oncology drugs on the market (HDACs)

2.  A growing number of links to oncology, notably many genetic links (i.e. fusion proteins, somatic mutations)

3.  A pioneer area: More than 400 targets amenable to small molecule intervention - most of which only recently shown to be “druggable”, and only a few of which are under active investigation

4.  Open access, early-stage science is developing quickly – significant collaborative efforts (e.g. SGC, NIH) to generate proteins, structures, assays and chemical starting points


The current epigenetics universe

Domain Family                Typical substrate class*            Total Targets
Histone Lysine demethylase   Histone/Protein K/R(me)n/ (meCpG)   30
Bromodomain                  Histone/Protein K(ac)               57
Tudor domain                 Histone Kme2/3 - Rme2s              59
Chromodomain                 Histone/Protein K(me)3              34
MBT repeat                   Histone K(me)3                      9
PHD finger                   Histone K(me)n                      97
Acetyltransferase            Histone/Protein K                   17
Methyltransferase            Histone/Protein K&R                 60
PARP/ADPRT                   Histone/Protein R&E                 17
MACRO                        Histone/Protein (p)-ADPribose       15
Histone deacetylases         Histone/Protein KAc                 11
                                                                 395

ROYAL (family label in the figure)

Now known to be amenable to small molecule inhibition

Why is Arch2POCM a “smart bet” for Pharma investment?

Arch2POCM: an external epigenetic think tank from which Pharma can load the most likely to succeed targets as proprietary programs or leverage Arch2POCM results for its other internal efforts

•  A front row seat on the progression of 6-8 epigenetic targets means that:

•  Pharma can select the epigenetic targets that best complement their internal portfolio and for which there is the greatest interest

•  Pharma can structure Arch2POCM’s projects so that key objectives line up with internal go/no-go decisions

•  Pharma can use Arch2POCM data to trigger its internal level of investment on a particular target

•  Pharma can use Arch2POCM resources to enrich their internal epigenetics effort: active chemotypes, assays, pre-clinical models, biomarkers, genetic and phenotypic data for patient stratification, relationships to epigenetic experts

•  Pharma can use Arch2POCM’s lead compound chemotypes to:
–  inform their proprietary medicinal chemistry efforts on the target
–  identify chemical scaffolds that impact epigenetic pathways: a proprietary combination therapy opportunity

•  Toxicity screening of Arch2POCM compounds with FDA tools can be used to guide internal proprietary chemistry efforts in oncology, inflammation and beyond

•  Arch2POCM’s crowd of scientists and clinicians provides its Pharma partners with parallel shots on goal at the best context for Arch2POCM’s compounds/targets

How will Arch2POCM provide “line of sight” to new medicines?

Arch2POCM will partner with scientists, clinicians and CROs that:

•  use “Omics” approaches to construct predictive models of disease networks (genomic, proteomic, signaling and metabolic)

•  have strategies available to identify those disease network gene(s) which when perturbed, impact the overall functioning of the network

•  already have epigenetic assays in place to identify chemotype structures (from discovery and/or pharma’s re-purposed un-used clinical assets) that impact the target and disease-correlated molecular phenotypes

•  already have biomarker tools available that can be tested for correlation to Arch2POCM’s targets

•  already have access to patient data and/or patient groups to mine for genetic and phenotypic signatures that may represent best responders for Arch2POCM clinical trials


How will Arch2POCM provide “line of sight” to new medicines?

•  Arch2POCM’s Ph II validation of high risk high opportunity targets focuses Pharma’s NME efforts

•  Positive POCM data: De-risked validated targets for Pharma development

•  Negative POCM data: public release of this data minimizes the amount of time and money that Pharma and the industry place on failed targets

•  Arch2POCM’s clinical candidate compounds provide Pharma with multiple paths to new medicines

•  Arch2POCM compounds that achieve POCM can be advanced into Ph 3 by Arch2POCM Members

•  The purchaser of Arch2POCM’s IND database obtains a significant time advantage over competitors to generate Phase III data and proceed to market

•  NMEs that derive from Arch2POCM will launch with database exclusivity protections: 5-8 years to garner a return on investment

•  The crowd’s testing of Arch2POCM compounds may identify alternative/better contexts for agonizing/antagonizing the disease biology target

–  indications
–  patient stratification
–  combination therapy options

Arch2POCM: current partnering status

•  Pharmaceutical Funding Partners

–  Three companies are considering a potential role as industry anchors for Arch2POCM

–  Two companies have demonstrated interest in Arch2POCM and their company leadership wants to go to next step, awaiting face to face discussions to go over agreement

•  Public Funding Partners

–  Good progress is being made to obtain financial backing for Arch2POCM from public funders in a number of countries (Canada, United Kingdom and Sweden) for both epigenetics and for CNS

–  Ontario Brain Institute, Canada has allocated $3M to the development of an autism clinical network that is committed to work with Arch2POCM

•  Philanthropic Funding Partners: awaiting designation of anchor partners

•  In kind partners

–  GE Healthcare (imaging): lead diagnostics partner and willing to share its experimental oncology biomarkers

–  Cancer Research UK: considering participating in Arch2POCM through “in kind efforts”, using some of its drug discovery and development resources

•  Academic partners

–  Institutions that have indicated willingness to let their scientists participate without patent filing: UCSF, Massachusetts General Hospital, University of North Carolina, University of Toronto, Oxford University, Karolinska Institute

–  Academic community of epigenetic experts/resources already identified

•  Regulatory partners: Because the objective of the Arch2POCM PPP is to probe and elucidate disease biology as opposed to develop new proprietary products, FDA and EMEA are ready to play an active role (toxicity screens, and legacy clinical trial data)

•  Patient group partners: leaders from Genetic Alliance, Inspire2Live and the Love Avon Army of Women are actively engaged


STRATEGIC INFLECTION: FORCES AFFECTING A BUSINESS


Society’s Needs Customers

Suppliers

New Technologies

New Competitors

Businesses Academia Government

Networking Disease Model Building
