Stephen Friend Haas School of Business 2012-03-05

The Future of Open Innovation: Development and Use of Therapies

End of the Era of Medical Guilds and Alchemy

Moving beyond the Medical Industrial Complex

Stephen Friend MD PhD Sage Bionetworks (Non-Profit Organization)

Seattle/ Beijing/ Amsterdam

UC Berkeley Haas School of Business Topics in Innovation

March 5, 2012

•  New ways of Building Models of Disease

•  What prevents us from building them?

•  What is Sage Bionetworks?

•  Review of Six Pilots

•  So what are the next steps?

What is the problem?

Most approved therapies were assumed to be monotherapies for diseases representing homogeneous populations

Our existing disease models often assume pathway knowledge sufficient to infer correct therapies

Familiar but Incomplete

Reality: Overlapping Pathways

The value of appropriate representations/ maps

Equipment capable of generating massive amounts of data

“Data Intensive” Science- Fourth Scientific Paradigm

Open Information System

IT Interoperability

Host evolving computational models in a “Compute Space”

WHY NOT USE “DATA INTENSIVE” SCIENCE

TO BUILD BETTER DISEASE MAPS?

what will it take to understand disease?

DNA RNA PROTEIN (dark matter)

MOVING BEYOND ALTERED COMPONENT LISTS

2002 Can one build a “causal” model?

Preliminary Probabilistic Models- Rosetta/Schadt

Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source

Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics

Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics

Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]

Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics

Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO

Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]

Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]

C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA

Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are causal for disease

Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

DIVERSE POWERFUL USE OF MODELS AND NETWORKS

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

(Eric Schadt)

Score Card for Medical Sciences: “Data Intensive” Science- Fourth Scientific Paradigm

Equipment capable of generating massive amounts of data: A-

Open Information System: D-

IT Interoperability: D

Host evolving computational models in a “Compute Space”: F

We still consider much clinical research as if we were “hunter-gatherers”- not sharing

TENURE FEUDAL STATES

Clinical/genomic data are accessible but minimally usable

Little incentive to annotate and curate data for other scientists to use

Mathematical models of disease are not built to be reproduced or versioned by others

Lack of standard forms for future rights and consents

Lack of data standards

Sage Mission

Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Sage Bionetworks Collaborators

•  Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Johnson & Johnson

•  Foundations: Kauffman, CHDI, Gates Foundation

•  Government: NIH, LSDF, NCI

•  Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)

•  Federation: Ideker, Califano, Nolan, Schadt

What is this?

Bayesian networks enriched in inflammation genes correlated with disease severity in pre-frontal cortex of 250 Alzheimer’s patients.

What does it mean?

Inflammation in AD is an interactive multi-pathway system. More broadly, network structure organizes complex disease effects into coherent sub-systems and can prioritize key genes.

Are you joking?

Gene validation shows novel key drivers increase Abeta uptake and decrease neurite length through an ROS burst. (highly relevant to AD pathology)

ALZHEIMER’S

A multi-tissue immune-driven theory of weight loss

[Figure labels: Liver; Adipose; Hypothalamus; Fatty acids; Macrophage/inflammation; Leptin signaling; Phagocytosis-induced lipolysis; M1 macrophage]

RULES GOVERN

PLATFORM

NEW MAPS

PLATFORM: Sage Platform and Infrastructure Builders (Academic, Biotech and Industry IT Partners...)

PILOTS = PROJECTS FOR COMMONS: Data Sharing Commons Pilots (Federation, CCSB, Inspire2Live...)

Why not share clinical/genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning)?

Leveraging Existing Technologies

Taverna

Addama

tranSMART

Watch What I Do, Not What I Say

sage bionetworks synapse project

Most of the People You Need to Work with Don’t Work with You


My Other Computer is Cloudera Amazon Google


Sage Metagenomics Project

•  > 10k genomic and expression standardized datasets indexed in SCR

•  Error detection, normalization in mG

•  Access raw or processed data via download or API in downstream analysis

•  Building towards open, continuous community curation

Processed Data (S3)

Sage Metagenomics using Amazon Simple Workflow

Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/

Synapse Roadmap

Timeline: Q1-2012 Q2-2012 Q3-2012 Q4-2012 Q1-2013 Q2-2013 Q3-2013 Q4-2013

Milestones: Internal Alpha, Public Beta Testing, Synapse 1.0, Synapse 1.5, Future

Rows: Synapse Platform Functionality; Data / Analysis Capabilities

•  Data Repository •  Projects and security •  R integration •  Analysis provenance

• Search • Controlled Vocabularies • Governance of restricted data

•  40+ manually curated clinical studies •  8000 + GEO / Array Express datasets •  Clinical, genomic, compound sensitivity •  Bioconductor and custom R analysis

• TCGA •  METABRIC breast cancer challenge

•  Workflow templates •  Publishing figures •  Wiki & collaboration tools •  Integrated management of cloud resources

•  Social networking •  User-customized dashboards •  R Studio integration •  Curation tool integration

•  Predictive modeling workflows •  Automated processing of common genomics platforms

•  TBD: Integrations with other visualization and analysis packages

Six Pilots involving Sage Bionetworks

CTCAP, Arch2POCM, The Federation, Portable Legal Consent, Sage Congress Project, BRIDGE

RULES GOVERN

PLATFORM

NEW MAPS

Clinical Trial Comparator Arm Partnership (CTCAP)

  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.

  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.

  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].

  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.

Started Sept 2010

Sharing and analysis of clinical/genomic data will maximize clinical impact and enable discovery

•  Graphic of curated to QC'd to models

Arch2POCM

Restructuring the Precompetitive Space for Drug Discovery

How to potentially De-Risk High-Risk Therapeutic Areas

Arch2POCM: scale and scope

•  Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/Immunology. One for Neuroscience/Schizophrenia/Autism. Each program will have 8 drug discovery projects (targets), ramped up over a period of 2 years

–  It is envisioned that Arch2POCM’s funding partners will select targets that are judged as slightly too risky to be pursued at the top of pharma’s portfolio, but that have significant scientific potential that could benefit from Arch2POCM’s crowdsourcing effort

•  These will be executed over a period of 5 years making a total of 16 drug discovery projects

–  Projected pipeline attrition by Year 5 (assuming 12 targets loaded in early discovery)

•  30% will enter Phase 1

•  20% will deliver Ph 2 POCM data

Arch2POCM: Highlights

A PPP To De-Risk Novel Targets That The Pharmaceutical Industry Can Then Use To Accelerate The Development of New and Effective Medicines

•  Arch2POCM will be a charitable Public Private Partnership (PPP) that will file no patents and whose scientific plan (including target selection) will be endorsed by its pharmaceutical, private and public funders

•  Arch2POCM will de-risk novel targets by developing and using pairs of test compounds (two different chemotypes) that interact with the selected targets: the compounds will be developed through Phase IIb clinical trials to determine if the selected target plays a role in the biology of human disease

•  Arch2POCM will work with and leverage patient groups and clinical CROs to enable patient recruitment, and with regulators to design novel studies and to validate novel biomarkers

•  Arch2POCM will make its GMP test compounds available to academic groups and foundations so they can use them to perform clinical studies and publish on a multitude of additional indications

•  Arch2POCM will release all reagents and data to the public at pre-defined stages in its drug development process. To ensure scientific quality, data and reagents will be released once they have been vetted by an independent scientific committee

•  Arch2POCM will publish all negative POCM data immediately in order to reduce the number of ongoing redundant proprietary studies (in pharma, biotech and academia) on an invalidated target and thereby

–  minimize unnecessary patient exposure

–  provide significant economic savings for the pharmaceutical industry

•  In the rare instance in which a molecule achieves positive POCM, Arch2POCM will ensure that the compound has the ability to reach the market by arranging for exclusive access to the proprietary IND database for the molecule

Arch2POCM: proposed funding strategy

–  $160-200M over five years is projected as necessary to advance up to 8 drug discovery projects within each of the two therapeutic programs

–  Arch2POCM funding will come from a combination of public funding from governments and private sector funding from pharmaceutical and biotechnology companies and from private philanthropists

–  By investing $1.6M annually into one or both of Arch2POCM’s selected disease areas, partnered pharmaceutical companies:

1.  obtain a vote on Arch2POCM target selection

2.  have the opportunity to donate existing compounds from their abandoned clinical programs for re-purposing on Arch2POCM’s targets

3.  gain real time data access to Arch2POCM’s 16 drug discovery projects

4.  have the strategic opportunity to expand their overall portfolio

Pipeline flow for Arch2POCM

Five Year Objective: Initiate ≈ 8 drug discovery projects with 6 entering in Early Discovery, one entering in pre-clinical and one entering in Ph I

[Figure: pipeline timeline over months 0-60 in six-month intervals, with targets moving through Early discovery, Pre-clinical, Ph 1 and Ph 2]

Stage PTRS*: Early discovery 45%; Pre-clinical 70%; Ph I 65%; Ph II 10%

Year #1 Arch2POCM Target Load: Early discovery (2), Pre-clinical (1)

Year #2 Arch2POCM Target Load: Early discovery (4), Ph 1 (1)

Arch2POCM Snapshot at Year 5

Targets Loaded: 8

Projected INDs filed: 3-4

Ph 1 or 2 Trials In Progress: 2

Projected Complete Ph 2 (POCM) Data Sets: 1

*PTRS = Probability of technical and regulatory success
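The stage PTRS values on the pipeline slide (Early discovery 45%, Pre-clinical 70%, Ph I 65%, Ph II 10%) can be chained to see where the projected attrition figures may come from. The sketch below is an illustration in Python, not part of the original deck. It assumes that each stage's PTRS is the probability of clearing that stage and that stages are independent, so the probabilities multiply; the names `STAGE_PTRS` and `cumulative_success` are hypothetical.

```python
from itertools import accumulate
from operator import mul

# Stage PTRS values as quoted on the pipeline slide.
STAGE_PTRS = [
    ("Early discovery", 0.45),
    ("Pre-clinical", 0.70),
    ("Ph I", 0.65),
    ("Ph II", 0.10),
]


def cumulative_success(stages):
    """Pair each stage with the probability of clearing it and every earlier stage.

    Assumption: stage outcomes are independent, so probabilities multiply.
    """
    names = [name for name, _ in stages]
    running = accumulate((p for _, p in stages), mul)
    return list(zip(names, running))


result = cumulative_success(STAGE_PTRS)
for name, prob in result:
    print(f"cleared {name}: {prob:.1%}")
```

Under these assumptions a target loaded in early discovery clears pre-clinical (and so enters Phase 1) about 31.5% of the time and clears Ph I about 20.5% of the time, which is close to the "30% will enter Phase 1" and "20% will deliver Ph 2 POCM data" projections quoted earlier in the deck.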

The case for epigenetics/chromatin biology

1.  There are epigenetic oncology drugs on the market (HDACs)

2.  A growing number of links to oncology, notably many genetic links (i.e. fusion proteins, somatic mutations)

3.  A pioneer area: More than 400 targets amenable to small molecule intervention - most of which only recently shown to be “druggable”, and only a few of which are under active investigation

4.  Open access, early-stage science is developing quickly – significant collaborative efforts (e.g. SGC, NIH) to generate proteins, structures, assays and chemical starting points


Arch2POCM epigenetics program: Assumptions for launch and completion of Year 1

•  Funding necessary to prosecute 8 epigenetic target-based projects

   o  ≈$85M for five years with $15M available for Year 1

      •  $1.6M from each of 3 pharma partners ($4.8M)

      •  $5M from public funders and $5M from philanthropists

   o  Year 1: load 3 targets with 2 in Early Discovery and 1 in pre-clinical stage of development

   o  Year 2: load 5 targets with at least one late stage clinical asset from a pharma partner

•  Partners

   –  In kind partners

      o  GE Healthcare (imaging): open sharing of its experimental oncology biomarkers

      o  CRUK: through some of its drug discovery and development resources participating in Arch2POCM

   –  Potential academic partner sites

      •  Institutions that have indicated willingness to let their scientists participate without patent filing: UCSF, Massachusetts General Hospital, University of North Carolina, University of Toronto, Oxford University, Karolinska Institute

      •  Costs to fund Arch2POCM academic partners will be defrayed by crowd-sourcing: each funded investigator will use their own network to amplify what they can do and publish on Arch2POCM targets

   –  Patient groups will enable patient recruitment and reduce costs for clinical studies

   –  FDA and EMEA team of regulators available

      o  Oncology experts available

      o  Can provide in vitro screening assays for toxicities and biomarker development to improve patient selection

      o  FDA to help build and host a compliant Arch2POCM data-sharing site

•  Infrastructure that needs to be in place to execute on time

   o  Align vendors and CROs prior to initiation of Arch2POCM projects

   o  IT and patient database management: harmonization of data-entry across participating clinical collaborators in place well before start of first Arch2POCM trial

General benefits of Arch2POCM for drug development

1.  Arch2POCM’s use of test compounds to de-risk previously unexplored biology enables drug developers to initiate proprietary drug development starting from an array of unbiased, clinically validated targets

2.  Arch2POCM’s crowdsourced research and trials provide the pharmaceutical industry with “parallel shots on goal” by aligning test compounds to the most promising unmet medical needs

3.  The positive and negative clinical trial data that Arch2POCM and the crowd produce and publish will increase clinical success rates (as one can pick targets and indications more smartly) and will save the pharmaceutical industry money by reducing redundant proprietary efforts on failed targets


Why is Arch2POCM a “smart bet” for Pharma investment?

Arch2POCM: an external epigenetic think tank from which Pharma can load the most likely to succeed targets as proprietary programs or leverage Arch2POCM results for its other internal efforts

•  A front row seat on the progression of 8 epigenetic targets means that:

–  Pharma can select the epigenetic targets that best complement their internal portfolio and for which there is the greatest interest

–  Pharma can structure Arch2POCM’s projects so that key objectives line up with internal go/no-go decisions

–  Pharma can use Arch2POCM data to trigger its internal level of investment on a particular target

–  Pharma can use Arch2POCM resources to enrich their internal epigenetics effort: active chemotypes, assays, pre-clinical models, biomarkers, genetic and phenotypic data for patient stratification, relationships to epigenetic experts

–  Pharma can use Arch2POCM’s lead compound chemotypes to:

   o  inform their proprietary medicinal chemistry efforts on the target

   o  identify chemical scaffolds that impact epigenetic pathways: a proprietary combination therapy opportunity

•  Toxicity screening of Arch2POCM compounds with FDA tools can be used to guide internal proprietary chemistry efforts in oncology, inflammation and beyond

•  Arch2POCM’s crowd of scientists and clinicians provides its Pharma partners with parallel shots on goal at the best context for Arch2POCM’s compounds/targets

How will Arch2POCM provide “line of sight” to new medicines?

•  Arch2POCM’s Ph II validation of high risk high opportunity targets focuses Pharma’s NME efforts

–  Positive POCM data: De-risked validated targets for Pharma development

–  Negative POCM data: public release of this data minimizes the amount of time and money that Pharma and the industry place on failed targets

•  Arch2POCM’s clinical candidate compounds provide Pharma with multiple paths to new medicines

–  Arch2POCM compounds that achieve POCM can be advanced into Ph 3 by Arch2POCM Members

–  The purchaser of Arch2POCM’s IND database obtains a significant time advantage over competitors to generate Phase III data and proceed to market

–  NMEs that derive from Arch2POCM will launch with database exclusivity protections: 5-8 years to garner a return on investment

•  The crowd’s testing of Arch2POCM compounds may identify alternative/better contexts for agonizing/antagonizing the disease biology target

–  indications

–  patient stratification

–  combination therapy options

The Federation

2008 2009 2010 2011

How can we accelerate the pace of scientific discovery?

Ways to move beyond “traditional” collaborations?

Intra-lab vs Inter-lab Communication

Colrain/ Industrial PPPs Academic Unions

(Nolan and Haussler)

sage federation: model of biological age

[Figure: Predicted Age (liver expression) plotted against Chronological Age (years); the Age Differential marks Faster Aging and Slower Aging]

Clinical Association: Gender, BMI, Disease

Genotype Association

Gene Pathway Expression

Reproducible science==shareable science

Sweave: combines programmatic analysis with narrative

Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9

Federated Aging Project: Combining analysis + narrative = Sweave Vignette

[Figure labels: Sage Lab; Califano Lab; Ideker Lab; Shared Data Repository; JIRA: Source code repository & wiki; R code + narrative; Data objects; PDF (plots + text + code snippets); HTML; Submitted Paper]

1)  Data management APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):

# Fetch the feature and response objects by their entity IDs
ccleFeatureData <- getEntity(ccleFeatureDataId)
ccleResponseData <- getEntity(ccleResponseDataId)

tcgaFeatureData <- getEntity(tcgaFeatureDataId)
tcgaResponseData <- getEntity(tcgaResponseDataId)

2)  Automated, standardized workflows for curation and QC of large-scale datasets (Brig Mecham).

A.  TCGA: Automated cloud-based processing.

B.  GEO / ArrayExpress: Normalization workflows, curation of phenotype using standard ontologies.

C.  Additional studies with genetic and phenotypic data in Sage repository (e.g. CCLE and Sanger cell line datasets)

Observed Data = Systematic Variation + Random Variation

Normalization: Remove the influence of adjustment variables on data...
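The normalization idea on this slide (observed data as systematic plus random variation, with the influence of adjustment variables removed) can be illustrated with a very small example. The slide does not say which method the workflow uses, so the sketch below swaps in the simplest option: per-batch mean centering on a single categorical adjustment variable. It is written in Python for illustration, and `remove_batch_effect` is a hypothetical helper, not a Sage API.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_effect(values, batches):
    """Shift every batch so that its mean equals the grand mean.

    Assumption: the only adjustment variable is a batch label, and its
    systematic effect is a constant offset per batch.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grand_mean = mean(values)
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    # Systematic variation: how far each batch mean sits from the grand mean.
    offsets = {batch: mean(vals) - grand_mean for batch, vals in by_batch.items()}
    # What remains is the grand mean plus the within-batch (random) variation.
    return [value - offsets[batch] for value, batch in zip(values, batches)]


observed = [5.0, 7.0, 9.0, 11.0]
batches = ["a", "a", "b", "b"]
normalized = remove_batch_effect(observed, batches)
```

After the adjustment both batches share the same mean, so the remaining differences between samples are no longer explained by the batch label. Continuous adjustment variables would need a regression-based approach instead.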

3)  Pluggable API to implement predictive modeling algorithms.

A)  Support for all commonly used machine learning methods (for automated benchmarking against new methods)

B)  Pluggable custom methods as R classes implementing customTrain() and customPredict() methods.

    i)  Can be arbitrarily complex (e.g. pathway and other priors)

    ii)  Support for parallelization in foreach loops.

[custom model 1, custom model 2, ... custom model N]

4)  Statistical performance assessment across models.

5)  Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis)

6)  Experimental follow-up on top predictions (TBD). E.g. for cell lines: medium throughput suppressor / enhancer screens of drug sensitivity for knockdown / overexpression of predicted biomarkers.
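The pluggable modeling interface (step 3) and the performance assessment across models (step 4) fit together naturally, and a small sketch may help. The slide describes R classes implementing customTrain() and customPredict(); the version below is a Python stand-in rather than the Sage implementation. The class names, the snake_case method names, the toy data and the mean-squared-error metric are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from statistics import mean


class CustomModel(ABC):
    """Hypothetical plug-in contract mirroring customTrain()/customPredict()."""

    @abstractmethod
    def custom_train(self, features, responses):
        """Fit the model on training features and responses."""

    @abstractmethod
    def custom_predict(self, features):
        """Return one prediction per feature row."""


class MeanModel(CustomModel):
    """Baseline plug-in: always predicts the mean training response."""

    def custom_train(self, features, responses):
        self.mean_ = mean(responses)

    def custom_predict(self, features):
        return [self.mean_ for _ in features]


class OneFeatureLinearModel(CustomModel):
    """Plug-in that fits a least-squares line on the first feature only."""

    def custom_train(self, features, responses):
        xs = [row[0] for row in features]
        x_bar, y_bar = mean(xs), mean(responses)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, responses))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = y_bar - self.slope_ * x_bar

    def custom_predict(self, features):
        return [self.intercept_ + self.slope_ * row[0] for row in features]


def mean_squared_error(predicted, actual):
    return mean([(p - a) ** 2 for p, a in zip(predicted, actual)])


def benchmark(models, train, test):
    """Train every plug-in on the same data and score it on the same held-out set."""
    scores = {}
    for name, model in models.items():
        model.custom_train(*train)
        scores[name] = mean_squared_error(model.custom_predict(test[0]), test[1])
    return scores


# Toy data: (feature rows, responses) for training and for held-out testing.
train = ([[1.0], [2.0], [3.0], [4.0]], [2.0, 4.0, 6.0, 8.0])
test = ([[5.0], [6.0]], [10.0, 12.0])
scores = benchmark({"mean": MeanModel(), "linear": OneFeatureLinearModel()}, train, test)
```

Because every model exposes the same two methods, the benchmarking loop does not need to know anything about how a given model works, which is what lets new methods be compared against standard ones automatically.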

Portable Legal Consent

(Activating Patients)

John Wilbanks

weconsent.us

Sage Congress Project April 20 2012

RealNames Parkinson’s Project

Revisiting Breast Cancer Prognosis

Fanconi’s Anemia

(Responders Competitions- IBM-DREAM)

Networking Disease Model Building
