Page 1: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

From Gene networks to bioinformatics networks

Stephen Friend MD PhD

Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ San Francisco

NHLBI July 18th, 2011

Page 2: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

why consider the fourth paradigm- data intensive science

thinking beyond the narrative, beyond pathways

advantages of an open innovation compute space

it is more about how than what

Page 3: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

COPD, Diabetes, Pulmonary Fibrosis, Obesity

Treating Symptoms vs. Modifying Diseases

Will it work for me?

Page 4: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Familiar  but  Incomplete  

Page 5: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Reality: Overlapping Pathways

Page 6: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 7: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 8: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

WHY NOT USE “DATA INTENSIVE” SCIENCE TO BUILD BETTER DISEASE MAPS?

Page 9: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Equipment capable of generating massive amounts of data

“Data Intensive Science”- “Fourth Scientific Paradigm” For building: “Better Maps of Human Disease”

Open Information System

IT Interoperability

Evolving Models hosted in a Compute Space- Knowledge Expert

Page 10: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

It is now possible to carry out comprehensive monitoring of many traits at the population level

Monitor disease and molecular traits in populations

Putative causal gene

Disease trait

Page 11: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

what will it take to understand disease?

DNA   RNA   PROTEIN (dark matter)

MOVING  BEYOND  ALTERED  COMPONENT  LISTS  

Page 12: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

2002 Can one build a “causal” model?

Page 13: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

How is genomic data used to understand biology?

“Standard” GWAS Approaches
•  Identify causative DNA variation but provide NO mechanism

Profiling Approaches
•  Genome-scale profiling provides correlates of disease
•  Many examples, BUT what is cause and effect?

“Integrated” Genetics Approaches
•  Provide an unbiased view of molecular physiology as it relates to disease phenotypes
•  Insights on mechanism
•  Provide causal relationships and allow predictions

RNA amplification, Microarray hybridization

[Figure axis labels: Gene Index; Tumors]

Page 14: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Integration of Genotypic, Gene Expression & Trait Data

Causal Inference

Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)
Chen et al. Nature 452: 429 (2008)
Zhang & Horvath. Stat. Appl. Genet. Mol. Biol. 4: article 17 (2005)
Zhu et al. Cytogenet Genome Res. 105: 363 (2004)
Zhu et al. PLoS Comput. Biol. 3: e69 (2007)

“Global Coherent Datasets”
•  population based
•  100s-1000s individuals

Page 15: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Constructing Co-expression Networks

Start with expression measures for the most variant genes across 100s++ samples

Note: NOT a gene expression heatmap

Establish a 2D correlation matrix for all gene pairs

Correlation Matrix (brain sample expression)

         1      2      3      4
1        1   -0.1   -0.6   -0.8
2     -0.1      1    0.1    0.2
3     -0.6    0.1      1    0.8
4     -0.8    0.2    0.8      1

Define Threshold, e.g. >0.6, for edge

Connection Matrix

      1  2  3  4
1     1  0  1  1
2     0  1  0  0
3     1  0  1  1
4     1  0  1  1

Hierarchically cluster

Clustered Connection Matrix (row/column labels on slide: 1 2 4 3)

      1  0  0  0
      0  1  1  1
      0  1  1  1
      0  1  1  1

Identify modules

Network Module: sets of genes for which many pairs interact (relative to the total number of pairs in that set)

[Figure: network diagrams with nodes 1, 2, 3, 4]
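
The steps on this slide (correlation matrix, threshold, modules) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used by the authors: the threshold is applied to the absolute correlation and treated as inclusive (|r| >= 0.6), which is what reproduces the slide's connection matrix, and modules are found as connected components, a simple stand-in for the hierarchical clustering the slide names. The function names are hypothetical.

```python
# Correlation matrix for four genes, copied from the slide.
CORR = [
    [1.0, -0.1, -0.6, -0.8],
    [-0.1, 1.0, 0.1, 0.2],
    [-0.6, 0.1, 1.0, 0.8],
    [-0.8, 0.2, 0.8, 1.0],
]


def connection_matrix(corr, threshold=0.6):
    """Binarize correlations: 1 where |r| >= threshold (assumed inclusive), else 0."""
    return [[1 if abs(r) >= threshold else 0 for r in row] for row in corr]


def modules(conn):
    """Group genes into connected components of the connection matrix.

    This is a simplification: the slide uses hierarchical clustering.
    """
    n = len(conn)
    seen = set()
    result = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node + 1)  # 1-based gene labels, as on the slide
            for other in range(n):
                if conn[node][other] and other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(sorted(component))
    return result


CONN = connection_matrix(CORR)
MODULES = modules(CONN)
print(CONN)
print(MODULES)
```

Applied to the slide's numbers, genes 1, 3 and 4 end up in one group and gene 2 stays on its own. On real data with hundreds of samples and thousands of genes, the component search would be replaced by a proper clustering step.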

Page 16: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Preliminary Probabilistic Models- Rosetta /Schadt

Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are causal for disease

Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

Page 17: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

Page 18: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

(Eric Schadt)

Page 19: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Recognition that the benefits of bionetwork based molecular models of diseases are powerful but that they require significant resources

Appreciation that it will require decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions

Page 20: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Sage Mission

Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Page 21: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Sage Bionetworks Collaborators

•  Pharma Partners: Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Johnson & Johnson

•  Foundations: Kauffman, CHDI, Gates Foundation

•  Government: NIH, LSDF

•  Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)

•  Federation: Ideker, Califano, Butte, Schadt

Page 22: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Engaging Communities of Interest

NEW MAPS: Disease Map and Tool Users (Scientists, Industry, Foundations, Regulators...)

PLATFORM: Sage Platform and Infrastructure Builders (Academic Biotech and Industry IT Partners...)

RULES AND GOVERNANCE: Data Sharing Barrier Breakers (Patient Advocates, Governance and Policy Makers, Funders...)

NEW TOOLS: Data Tool and Disease Map Generators (Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs…)

PILOTS = PROJECTS FOR COMMONS: Data Sharing Commons Pilots (Federation, CCSB, Inspire2Live....)

Page 23: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Research Platform, Commons

Data Repository, Discovery Platform, Building Disease Maps, Tools & Methods

Commons Pilots

Outposts Federation

CCSB

LSDF-WPP Inspire2Live

POC

Cancer Neurological Disease

Metabolic Disease

Pfizer Merck Takeda

Astra Zeneca CHDI Gates NIH

Curation/Annotation

CTCAP Public Data Merck Data TCGA/ICGC

Hosting Data Hosting Tools

Hosting Models

LSDF

Bayesian Models Co-expression Models

KDA/GSVA

Page 24: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

A) Miller 159 samples B) Christos 189 samples

C) NKI 295 samples

D) Wang 286 samples

Cell cycle

Pre-mRNA

ECM

Immune response

Blood vessel

E) Super modules

Zhang B et al., Towards a global picture of breast cancer (manuscript).

NKI: N Engl J Med. 2002 Dec 19;347(25):1999.

Wang: Lancet. 2005 Feb 19-25;365(9460):671.

Miller: Breast Cancer Res. 2005;7(6):R953.

Christos: J Natl Cancer Inst. 2006 15;98(4):262.

Model of Breast Cancer: Co-expression Bin Zhang Xudong Dai Jun Zhu

Page 25: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Model of Alzheimer’s Disease Bin Zhang Jun Zhu

[Figure panels: AD vs. normal; Cell cycle]

http://sage.fhcrc.org/downloads/downloads.php

Page 26: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Blue module: 3000 genes, associated with
•  Type 2 diabetes
•  Elevated HbA1c
•  Reduced insulin secretion

Global expression data from 64 human islet donors

340 genes in islet-specific open chromatin regions

168 overlapping genes, which have
•  Higher connectivity
•  Markedly stronger association with Type 2 diabetes, elevated HbA1c, and reduced insulin secretion
•  Enrichment for beta-cell transcription factors and exocytotic proteins

New Type II Diabetes Disease Models Anders Rosengren
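
One way to judge whether an overlap of 168 genes between a 3000-gene module and a 340-gene set is larger than chance is a hypergeometric tail probability. The sketch below is illustrative only: the slide does not state the size of the background gene set, so the value of 20000 is a hypothetical assumption, and the function name is made up for this example. The slide itself does not report this test.

```python
from math import comb


def overlap_tail_probability(population, module_size, set_size, overlap):
    """P(X >= overlap) when set_size genes are drawn at random from population,
    of which module_size belong to the module (hypergeometric upper tail)."""
    total = comb(population, set_size)
    hits = sum(
        comb(module_size, k) * comb(population - module_size, set_size - k)
        for k in range(overlap, min(module_size, set_size) + 1)
    )
    return hits / total


BACKGROUND = 20000  # hypothetical background size, NOT taken from the slide
MODULE, OPEN_CHROMATIN, OVERLAP = 3000, 340, 168  # counts from the slide

# Overlap expected by chance under the assumed background.
EXPECTED = MODULE * OPEN_CHROMATIN / BACKGROUND
P_VALUE = overlap_tail_probability(BACKGROUND, MODULE, OPEN_CHROMATIN, OVERLAP)

print(f"expected overlap by chance: {EXPECTED:.0f}")
print(f"hypergeometric tail probability: {P_VALUE:.3g}")
```

Under that assumed background, chance alone would give an overlap of about 51 genes, so 168 is roughly a threefold excess. The conclusion is sensitive to the background size, which would have to come from the actual expression platform.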

Page 27: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

•  Search across 1300 datasets in MetaGEO at Sage for similar expression profiles Top hit: Islet dedifferentiation study where the 168 genes were upregulated in mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)

•  Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test

•  Identification of candidate key genes affecting beta-cell differentiation and chromatin

Working hypothesis:

Normal beta-cell: open chromatin in islet-specific regions, high expression of beta-cell transcription factors, differentiated beta-cells and normal insulin secretion

Diabetic beta-cell: lower expression of beta-cell transcription factors affecting the identified module, dedifferentiation, reduced insulin secretion and hyperglycemia

Next steps: Validation of hypothesis and suggested key genes in human islets

Anders Rosengren

New Type II Diabetes Disease Models

Page 28: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Liver Cytochrome P450 Regulatory Network Models

Xia Yang Bin Zhang Jun Zhu

Yang et al. Systematic genetic and genomic analysis of cytochrome P450 enzyme activities in human liver. 2010. Genome Research 20:1020.

Regulators of P450 network

http://sage.fhcrc.org/downloads/downloads.php

Page 29: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Clinical Trial Comparator Arm Partnership (CTCAP)

  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.

  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.

  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].

  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.

Page 30: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Examples: The Sage Federation

•  Founding Lab Groups
   –  Seattle: Sage Bionetworks
   –  New York: Columbia: Andrea Califano
   –  Palo Alto: Stanford: Atul Butte
   –  San Diego: UCSD: Trey Ideker
   –  San Francisco: UCSF/Sage: Eric Schadt

•  Initial Projects
   –  Aging
   –  Diabetes
   –  Warburg

•  Goals
   –  Share all datasets, tools, models
   –  Develop interoperability for human data

Page 31: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Federation's Genome-wide Network and Modeling Approach

Califano group at Columbia Sage Bionetworks Butte group at Stanford

Page 32: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Human Aging Project

Brain A (n=363)

Brain B (n=145)

Blood A (n=~1000)

Blood B (n=~1000)

Brain C (n=400)

Adipose (n=~700)

Data Transformations

TF Activity Profile

Gene Set / Pathway Variation Analysis

Interactome

Machine Learning

Elastic Net

Network Prior Models

Tree Classifiers

Age Model

Page 33: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Deriving Master Regulators from Transcription Factors Regulatory Networks Glycolysis & Glycogenesis Metabolism Pathway

Page 34: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

THE FEDERATION Butte Califano Friend Ideker Schadt

vs

Page 35: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

“… the world is becoming too fast, too complex, and too networked for any company to have all the answers inside”

Y. Benkler, The Wealth of Networks

Page 36: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Is the Industry managing itself into irrelevance?

$130 billion of patented drug sales will face generics in the 2011-2016 period (55% of 2009 US sales)

Sales exposed to generics will double in 2012 (to $33 billion)

98% of big pharma sales come from products 5 years and older (avg patent life = 11 years)

6 big pharmas were lost in the last 10 years

R&D spending is flattening,

Page 37: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Largest Attrition For Pioneer Targets is at Clinical POC (Ph II)

Pipeline: Target ID/Discovery → Hit/Probe/Lead ID → Clinical Candidate ID → Toxicology/Pharmacology → Phase I → Phase IIa/IIb

Attrition: 50% 10% 30% 30% 90%

This is killing drug discovery

We can generate effective and “safe” molecules in animals, but they do not have sufficient efficacy and/or safety in the chosen patient group.
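
To see what the attrition figures on this slide imply end to end, the per-stage losses can be multiplied through. The sketch below assumes the five percentages are independent attrition rates applied one after another. The slide does not say exactly which figure belongs to which stage, but the product does not depend on the order.

```python
from math import prod

# Attrition percentages as printed on the slide. Treating them as
# independent, sequential stage losses is an assumption of this sketch.
ATTRITION = [0.50, 0.10, 0.30, 0.30, 0.90]


def overall_survival(attrition_rates):
    """Fraction of starting programs that clear every stage."""
    return prod(1.0 - rate for rate in attrition_rates)


SURVIVAL = overall_survival(ATTRITION)
print(f"Overall survival across all stages: {SURVIVAL:.1%}")
```

Under these assumptions only about 2% of pioneer-target programs (0.5 × 0.9 × 0.7 × 0.7 × 0.1) would make it through clinical proof of concept, with the single 90% step accounting for most of the loss.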

Page 38: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

The current pharma model is redundant

Attrition: 50% 10% 30% 30% 90%

Negative POC information is not shared

[Figure: the same pipeline, Target ID/Discovery → Hit/Probe/Lead ID → Clinical Candidate ID → Toxicology/Pharmacology → Phase I → Phase IIa/IIb, repeated seven times in parallel]

Page 39: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Let’s imagine….

•  A pool of dedicated, stable funding

•  A process that attracts top scientists and clinicians

•  A process in which regulators can fully collaborate to solve key scientific problems

•  An engaged citizenry that promotes science and acknowledges risk

•  Mechanisms to avoid bureaucratic and administrative barriers

•  Sharing of knowledge to more rapidly achieve understanding of human biology

•  A steady stream of targets whose links to disease have been validated in humans

Page 40: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

A globally distributed public private partnership (PPP) committed to:

• Generate more clinically validated targets by sharing data

• Deliver more new drugs for patients by using compounds to understand disease biology

Arch2POCM

Page 41: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Arch2POCM: what’s in a name?

Arch: as in archipelago and referring to the distributed network of academic labs, pharma partners and clinical sites that will contribute to Arch2POCM programs

POCM: Proof Of Clinical Mechanism: demonstration in a Ph II setting that the mechanism of the selected disease target can be safely and usefully modulated.

Page 42: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 43: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

April 16-17, 2011 San Francisco

Arch2POCM Mission

To establish a pre-competitive “stream” of drug development data and POCM candidates that:

1.  Will focus on high risk/high opportunity targets

2.  Will inform the industry regarding those targets that are validated for proof of clinical mechanism (POCM) and those that are not

3.  Will drive down redundant efforts in discovery and early development

4.  Will lead to substantial cost avoidance (est. $12.5 B annually)

(HOW DOES THIS COMPLEMENT NIH TRANSLATIONAL CENTER)

PARTNERS/ WHO DOES WHAT/ NO IP /CROWDSOURCING

Page 44: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Federation Projects: Building a Compute Space Combining analysis + narrative

=Sweave Vignette Sage Lab

Califano Lab Ideker Lab Califano Lab

Shared Data Repository

JIRA: Source code repository & wiki

R code + narrative

PDF(plots + text + code snippets) PDF(plots + text + code snippets)

Data objects

HTML

Submitted Paper

Page 45: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Reproducible science==shareable science

Sweave: combines programmatic analysis with narrative

Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 - Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9

Dynamic generation of statistical reports using literate data analysis
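
The idea of weaving computed results into narrative can be shown with a toy example. This is not Sweave (which embeds R code chunks in LaTeX) and not the Federation's tooling; it is a hypothetical Python analogue in which expressions between double braces are evaluated and substituted into the text, so the report is regenerated whenever the data change. The `weave` function and the template syntax are invented for this sketch; the sample counts are the four cohort sizes listed on the breast cancer slide.

```python
import re


def weave(template, namespace):
    """Replace each {{ expression }} in the template with its computed value.

    Expressions can only use the names supplied in namespace.
    """
    def run(match):
        expression = match.group(1).strip()
        return str(eval(expression, {"__builtins__": {}}, namespace))

    return re.sub(r"\{\{(.+?)\}\}", run, template)


# Cohort sizes from the breast cancer co-expression slide.
samples = [159, 189, 295, 286]

REPORT = weave(
    "The four cohorts contain {{ sum(samples) }} samples; "
    "the largest has {{ max(samples) }}.",
    {"samples": samples, "sum": sum, "max": max},
)
print(REPORT)
```

Because the numbers in the sentence are computed rather than typed, the narrative cannot drift out of sync with the analysis, which is the sense in which the slide equates reproducible and shareable science.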

Page 46: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Software Tools Support Collaboration

Page 47: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Biology Tools Support Collaboration

Page 48: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Potential Supporting Technologies

Taverna

Addama

tranSMART

Page 49: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Platform for Modeling

SYNAPSE  

Page 50: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 51: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 52: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 53: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 54: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

INTEROPERABILITY (tranSMART)

Page 55: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

 TENURE      FEUDAL  STATES      

Page 56: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Synapse  as  a  Github  for  building  models  of  disease  

Page 57: Stephen Friend National Heart Lung & Blood Institute 2011-07-19
Page 58: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

IMPACT ON PATIENTS

Page 59: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Eight Projects Initiated in last year

Page 60: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

Group D LEGAL STACK-ENABLING PATIENTS: John Wilbanks

Page 61: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

why consider the fourth paradigm- data intensive science

thinking beyond the narrative, beyond pathways

advantages of an open innovation compute space

it is more about how than what

Page 62: Stephen Friend National Heart Lung & Blood Institute 2011-07-19

OPPORTUNITIES FOR LUNG COMMUNITY

Data sets, Tools and Models for Lung Biology/Pathophysiology

Broad Institute cell line panels enriched in lung cancer

Change reward structures for sharing data (patients and academics)

Several Pharma partners interested in building models of respiratory disease- 2 public /3 Industry (Ron Crystal)

