Friend UCSF 2012-07-27

Jun 25, 2015

Sage Base

Stephen Friend, July 27, 2012. University of California San Francisco, San Francisco, CA
Transcript
Page 1: Friend UCSF 2012-07-27

Why not use data intensive science to build better models of diseases together?

– beyond current rewards  

Stephen H Friend MD PhD, Sage Bionetworks (non-profit)

July 27, 2012, UCSF

Page 2: Friend UCSF 2012-07-27
Page 3: Friend UCSF 2012-07-27

So what is the problem?

Most approved therapies were assumed to be monotherapies for diseases representing homogeneous populations

Our existing disease models often assume pathway knowledge sufficient to infer correct therapies

Page 4: Friend UCSF 2012-07-27

Familiar but Incomplete

Page 5: Friend UCSF 2012-07-27

Reality: Overlapping Pathways

Page 6: Friend UCSF 2012-07-27

The value of appropriate representations/ maps

Page 7: Friend UCSF 2012-07-27
Page 8: Friend UCSF 2012-07-27

Equipment capable of generating massive amounts of data

“Data Intensive” Science- Fourth Scientific Paradigm

Open Information System

IT Interoperability

Host evolving computational models in a “Compute Space”

Page 9: Friend UCSF 2012-07-27
Page 10: Friend UCSF 2012-07-27

WHY NOT USE “DATA INTENSIVE” SCIENCE TO BUILD BETTER DISEASE MAPS?

Page 11: Friend UCSF 2012-07-27

what will it take to understand disease?

DNA, RNA, PROTEIN (dark matter)

MOVING BEYOND ALTERED COMPONENT LISTS

Page 12: Friend UCSF 2012-07-27

2002 Can one build a “causal” model?

Page 13: Friend UCSF 2012-07-27

Preliminary Probabilistic Models - Rosetta/Schadt

Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are causal for disease

Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

Page 14: Friend UCSF 2012-07-27

DIVERSE POWERFUL USE OF MODELS AND NETWORKS

Page 15: Friend UCSF 2012-07-27

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

Page 16: Friend UCSF 2012-07-27

(Eric Schadt)

Page 17: Friend UCSF 2012-07-27

“Data Intensive” Science - Fourth Scientific Paradigm: Score Card for Medical Sciences

Equipment capable of generating massive amounts of data: A-

Open Information System: D-

IT Interoperability: D

Host evolving computational models in a “Compute Space”: F

Page 18: Friend UCSF 2012-07-27

We still consider much clinical research as if we were “hunter-gatherers”: not sharing

Page 19: Friend UCSF 2012-07-27

TENURE / FEUDAL STATES

Page 20: Friend UCSF 2012-07-27

Clinical/genomic data are accessible but minimally usable

Little incentive to annotate and curate data for other scientists to use

Page 21: Friend UCSF 2012-07-27

Mathematical models of disease are not built to be reproduced or versioned by others

Page 22: Friend UCSF 2012-07-27

Lack of standard forms for future rights and consents

Page 23: Friend UCSF 2012-07-27

Lack of data standards

Page 24: Friend UCSF 2012-07-27
Page 25: Friend UCSF 2012-07-27

Background: Information Commons for Biological Functions

Page 26: Friend UCSF 2012-07-27

Sage Mission

Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Page 27: Friend UCSF 2012-07-27

Sage Bionetworks Collaborators

Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Johnson & Johnson

Foundations: Kauffman, CHDI, Gates Foundation

Government: NIH, LSDF, NCI

Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)

Federation: Ideker, Califano, Nolan, Schadt

Page 28: Friend UCSF 2012-07-27

Better Models of Disease:

INFORMATION COMMONS

Biomedical Information Commons

Page 29: Friend UCSF 2012-07-27

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Products/Approaches

Page 30: Friend UCSF 2012-07-27

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Constituencies: IT, Pharma, Academic Consortia, Joint Patient/Scientist Communities, Biotech, Patient Foundations, Individual Patients

Page 31: Friend UCSF 2012-07-27

Ongoing Sage Bionetworks Initiatives

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Constituencies: IT, Pharma, Academic Consortia, Joint Patient/Scientist Communities, Biotech, Patient Foundations, Individual Patients

Initiatives shown on the diagram:

RNDP/FA/MEL Communities engaging COMMONS PLATFORM

Takeda

WPP

Discovery Network

BrCA/Challenges

Cell Line Challenge

Common Mind/Mt. Sinai Neuro

TCGA/Challenge

ClearScience

Roche

SB/Gates

Sage CCSB

EU PARTICIPATION

Page 32: Friend UCSF 2012-07-27

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Constituencies: IT, Pharma, Academic Consortia, Joint Patient/Scientist Communities, Biotech, Patient Foundations, Individual Patients

Page 33: Friend UCSF 2012-07-27

Impactful Models, Breast Cancer: Co-expression Networks

[Figure: co-expression network modules across breast cancer datasets. Panels: A) Miller, 159 samples; B) Christos, 189 samples; C) NKI, 295 samples; D) Wang, 286 samples; E) Super modules. Module labels: Cell cycle, Pre-mRNA, ECM, Immune response, Blood vessel.]

Zhang B et al., Towards a global picture of breast cancer (manuscript).

NKI: N Engl J Med. 2002 Dec 19;347(25):1999.

Wang: Lancet. 2005 Feb 19-25;365(9460):671.

Miller: Breast Cancer Res. 2005;7(6):R953.

Christos: J Natl Cancer Inst. 2006 15;98(4):262.

Page 34: Friend UCSF 2012-07-27

What is this?

Bayesian networks enriched in inflammation genes correlated with disease severity in pre-frontal cortex of 250 Alzheimer’s patients.

What does it mean?

Inflammation in AD is an interactive multi-pathway system. More broadly, network structure organizes complex disease effects into coherent sub-systems and can prioritize key genes.

Are you joking?

Gene validation shows novel key drivers increase Abeta uptake and decrease neurite length through an ROS burst. (highly relevant to AD pathology)

CHRIS GAITERI - ALZHEIMER’S

Page 35: Friend UCSF 2012-07-27

IMPACTFUL MODELS

A multi-tissue immune-driven theory of weight loss

[Figure labels: Liver, Adipose, Hypothalamus, Fatty acids, Macrophage/inflammation, Leptin signaling, Phagocytosis-induced lipolysis, M1 macrophage]

Page 36: Friend UCSF 2012-07-27

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Constituencies: IT, Pharma, Academic Consortia, Joint Patient/Scientist Communities, Biotech, Patient Foundations, Individual Patients

Page 37: Friend UCSF 2012-07-27

Two approaches to building common scientific and technical knowledge

Text summary of the completed project
Assembled after the fact

Social Coding
Every code change versioned
Every issue tracked
Every project the starting point for new work
All evolving and accessible in real time

Page 38: Friend UCSF 2012-07-27

Synapse is GitHub for Biomedical Data

Social Science
Data and code versioned
Analysis history captured in real time
Work anywhere, and share the results with anyone

Social Coding
Every code change versioned
Every issue tracked
Every project the starting point for new work
All evolving and accessible in real time

Page 39: Friend UCSF 2012-07-27

Leveraging Existing Technologies

Taverna

Addama

tranSMART

Page 40: Friend UCSF 2012-07-27

Watch What I Do, Not What I Say

sage bionetworks synapse project

Page 41: Friend UCSF 2012-07-27

Most of the People You Need to Work with Don’t Work with You

sage bionetworks synapse project

Page 42: Friend UCSF 2012-07-27

My Other Computer is “The Cloud”

sage bionetworks synapse project

Page 43: Friend UCSF 2012-07-27

Data Analysis with Synapse

Run Any Tool

On Any Platform

Record in Synapse

Share with Anyone

Page 44: Friend UCSF 2012-07-27

[Figure: Performance assessment and Predictive model generation, drawing on Expression, Copy number, Mutation, and Phenotype data]

•  Automated workflows for curation, QC, and sharing of large-scale datasets.

•  All of TCGA, GEO, and user-submitted data processed with standard normalization methods.

•  Searchable TCGA data:
   •  23 cancers
   •  11 data platforms
   •  Standardized meta-data ontologies

Page 45: Friend UCSF 2012-07-27

[Figure: Performance assessment and Predictive model generation, drawing on Expression, Copy number, Mutation, and Phenotype data]

•  Comparison of many modeling approaches applied to the same data.

•  Models transparently shared and reusable through Synapse.

•  Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.

•  Extending pipeline to evaluate prediction of TCGA phenotypes.

•  Hosting of collaborative competitions to compare models from many groups.

Page 46: Friend UCSF 2012-07-27

INTEROPERABILITY

SYNAPSE

Genome Pattern, CYTOSCAPE, tranSMART, I2B2

Page 47: Friend UCSF 2012-07-27

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Constituencies: IT, Pharma, Academic Consortia, Joint Patient/Scientist Communities, Biotech, Patient Foundations, Individual Patients

Page 48: Friend UCSF 2012-07-27

Clinical Trial Comparator Arm Partnership (CTCAP)

  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.

  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.

  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].

  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.

Started Sept 2010

Page 49: Friend UCSF 2012-07-27

Sharing and analysis of clinical/genomic data will maximize clinical impact and enable discovery

•  [Graphic: curated data to QC'd data to models]

Page 50: Friend UCSF 2012-07-27

Arch2POCM

Restructuring the Precompetitive Space for Drug Discovery

How to potentially De-Risk High-Risk Therapeutic Areas

Page 51: Friend UCSF 2012-07-27
Page 52: Friend UCSF 2012-07-27

The Federation

Page 53: Friend UCSF 2012-07-27

2008   2009   2010   2011  

How can we accelerate the pace of scientific discovery?

Ways to move beyond “traditional” collaborations?

Intra-lab vs Inter-lab Communication

Pfizer CTI/ Industrial PPPs Academic Unions

Page 54: Friend UCSF 2012-07-27

(Nolan  and  Haussler)  

Page 55: Friend UCSF 2012-07-27

sage federation: model of biological age

[Figure: Predicted Age (liver expression) plotted against Chronological Age (years), with regions labeled Faster Aging and Slower Aging]

Age Differential

Clinical Association
-  Gender
-  BMI
-  Disease

Genotype Association

Gene Pathway Expression
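The slide does not define the age differential. A minimal sketch, assuming it is simply predicted age minus chronological age, with a positive value read as "faster aging" and a negative value as "slower aging". The function names, the tolerance parameter, and the example ages are hypothetical and are not taken from the Federation analysis.

```python
def age_differential(predicted_ages, chronological_ages):
    """Assumed definition: predicted age minus chronological age, per sample."""
    if len(predicted_ages) != len(chronological_ages):
        raise ValueError("inputs must have the same length")
    return [p - c for p, c in zip(predicted_ages, chronological_ages)]


def label_aging(differentials, tolerance=0.0):
    """Label each sample by the sign of its differential (hypothetical rule)."""
    labels = []
    for d in differentials:
        if d > tolerance:
            labels.append("faster")
        elif d < -tolerance:
            labels.append("slower")
        else:
            labels.append("as expected")
    return labels


# Illustrative values only: expression-predicted ages vs. actual ages in years.
diffs = age_differential([55, 40, 62], [50, 45, 62])
labels = label_aging(diffs)
```

The differential computed this way is the quantity one would then test for clinical, genotype, or pathway associations.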

Page 56: Friend UCSF 2012-07-27

Reproducible science == shareable science

Sweave: combines programmatic analysis with narrative

Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9

Page 57: Friend UCSF 2012-07-27

Portable Legal Consent

(Activating Patients)

John Wilbanks

Page 58: Friend UCSF 2012-07-27
Page 59: Friend UCSF 2012-07-27
Page 60: Friend UCSF 2012-07-27
Page 61: Friend UCSF 2012-07-27

weconsent.us  

Page 62: Friend UCSF 2012-07-27

Better Models of Disease:

INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Constituencies: IT, Pharma, Academic Consortia, Joint Patient/Scientist Communities, Biotech, Patient Foundations, Individual Patients

Page 63: Friend UCSF 2012-07-27

What is the problem?

Our current models of disease biology are primitive and limit doctors’ understanding and ability to treat patients

Current incentives reward those who silo information and work in closed systems

Page 64: Friend UCSF 2012-07-27

The Solution: Competitions to crowd-source research in biology and other fields

Why competitions?
•  Objective assessments
•  Acceleration of progress
•  Transparency
•  Reproducibility
•  Extensible, reusable models

Competitions in biomedical research
•  CASP (protein structure)
•  Foldit / EteRNA (protein / RNA structure)
•  CAGI (genome annotation)
•  Assemblathon / Alignathon (genome assembly / alignment)
•  SBV Improver (industrial methodology benchmarking)
•  DREAM (co-organizer of Sage/DREAM competition)

Generic competition platforms
•  Kaggle, Innocentive, MLComp

Page 65: Friend UCSF 2012-07-27

The Sage/DREAM breast cancer prognosis challenge

Goal: Challenge to assess the accuracy of computational models designed to predict breast cancer survival using patient clinical and genomic data

Why this is unique:

This Sage/DREAM Challenge is a pre-collated cohort: 2000 breast cancer samples from the METABRIC cohort

Accessible to all: A cloud-based common compute architecture is being made available by Google to support the computational models needed to develop and test challenge models

New Rigor:
•  Contestants will evaluate their models on a validation data set composed of newly generated data (provided by Dr. Anne-Lise Børresen-Dale)
•  Contestants must demonstrate their models can be reproduced by others

New incentives: leaderboard to energize participants, Science Translational Medicine publication for winning team

Breast cancer patients, funders and researchers can track this Challenge on BRIDGE, an open source online community being built by Sage and Ashoka Changemakers and affiliated with this Challenge

Page 66: Friend UCSF 2012-07-27

Sage/DREAM Challenge: Details and Timing

Phase 1: July thru end-Sep 2012

Training data: 2,000 breast cancer samples from METABRIC cohort
•  Gene expression
•  Copy number
•  Clinical covariates
•  10 year survival

Supporting data: Other Sage-curated breast cancer datasets
•  >1,000 samples from GEO
•  ~800 samples from TCGA
•  ~500 additional samples from Norway group
•  Curated and available on Synapse, Sage’s compute platform

Data released in phases on Synapse from now through end-September

Will evaluate accuracy of models built on METABRIC data to predict survival in:
•  Held out samples from METABRIC
•  Other datasets
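The slides do not say which accuracy metric the Challenge used. A minimal sketch of held-out evaluation, assuming the concordance index, a common score for survival predictions that handles censored follow-up. The function names, the 20% hold-out fraction, and the toy values are hypothetical.

```python
import random


def split_holdout(sample_ids, holdout_fraction=0.2, seed=0):
    """Shuffle sample ids reproducibly and set aside a held-out fraction."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_holdout = int(round(len(ids) * holdout_fraction))
    return ids[n_holdout:], ids[:n_holdout]  # (training ids, held-out ids)


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable only if patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy held-out data: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
train_ids, holdout_ids = split_holdout(range(10))
```

A score of 1.0 means every comparable pair is ranked correctly, 0.5 is what random scores would give on average, and 0.0 means every pair is ranked backwards.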

Phase 2: Oct 1 thru Nov 12, 2012

Evaluation of models in novel dataset.

Validation data: ~500 fresh frozen tumors from Norway group with:
•  Clinical covariates
•  10 year survival

Gene expression and copy number data to be generated for model evaluation
•  Sent to Cancer Research UK to generate data at same facility as METABRIC
•  Models built on training data evaluated on newly generated data

Winners announced at November 12 DREAM conference

Page 67: Friend UCSF 2012-07-27

[Figure: Performance assessment and Predictive model generation, drawing on Expression, Copy number, Mutation, and Phenotype data]

METABRIC cohort: 997 breast cancer samples

Clinical covariates

Gene expression (Illumina HT12v3)

Copy number (Affy SNP 6.0)

10 year survival

Loaded through Synapse R client as Bioconductor objects.

Page 68: Friend UCSF 2012-07-27

[Figure: Performance assessment and Predictive model generation, drawing on Expression, Copy number, Mutation, and Phenotype data]

Custom models implement train() and predict() API.

Implementation of simple clinical-only survival model used as baseline predictor.
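The train() and predict() contract named on this slide can be sketched as follows. The Challenge models were written against the Synapse R client; Python is used here only for illustration. The class names, the choice of age as the single clinical covariate, and the least-squares fit (which ignores censoring) are all assumptions, not the actual baseline.

```python
from abc import ABC, abstractmethod
from statistics import mean


class SurvivalModel(ABC):
    """Hypothetical interface: every submitted model exposes train() and predict()."""

    @abstractmethod
    def train(self, features, survival_years):
        """Fit the model on training samples."""

    @abstractmethod
    def predict(self, features):
        """Return one predicted survival value per sample."""


class ClinicalOnlyBaseline(SurvivalModel):
    """Toy baseline: least-squares line on one clinical covariate (no censoring)."""

    def __init__(self, covariate="age"):
        self.covariate = covariate
        self.slope = 0.0
        self.intercept = 0.0

    def train(self, features, survival_years):
        xs = [row[self.covariate] for row in features]
        x_bar, y_bar = mean(xs), mean(survival_years)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, survival_years))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, features):
        return [self.intercept + self.slope * row[self.covariate] for row in features]


# Illustrative data only: three training patients, one new patient.
training = [{"age": 40}, {"age": 50}, {"age": 60}]
model = ClinicalOnlyBaseline().train(training, [12.0, 10.0, 8.0])
predictions = model.predict([{"age": 70}])
```

Because every model shares the same two methods, an evaluation harness can train and score any submission without knowing its internals, which is what makes a common leaderboard possible.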

Page 69: Friend UCSF 2012-07-27

[Slide graphic: label text not recoverable from extraction]

Federation modeling competition

Models submitted and evaluated in real-time

leaderboard >200 models tested within 3 months

Page 70: Friend UCSF 2012-07-27

Summary

https://synapse.sagebase.org/ - BCCOverview:0

Transparency, reproducibility

Validation in novel dataset

Publication in Science Translational Medicine

Donation of Google-scale compute space.

For the goal of promoting democratization of medicine… Registration starting NOW…

sign up at: synapse.sagebase.org

[Figure: Performance assessment and Predictive model generation, drawing on Expression, Copy number, Mutation, and Phenotype data]

Page 71: Friend UCSF 2012-07-27
Page 72: Friend UCSF 2012-07-27
Page 73: Friend UCSF 2012-07-27

SUMMARY

These new data intensive models of disease will be strikingly powerful

They will not arise within the current academic/industrial loop

They will be harder and more expensive than we can accept

Citizens as donors of data insights and funds will be critical

For these benefits to be realized, they must become affordable; therefore we will need:

Compute spaces

A Commons

New ways of being rewarded

More eyeballs working without being paid

More willingness to share till after Clinical Proof of Concept

Page 74: Friend UCSF 2012-07-27

Alignments between the UCSF Strategic Plan and the matchstick pilots for the Information Commons being pursued by Sage Bionetworks

•  Invest in infrastructure that enables UCSF to excel in basic, clinical and population science - SYNAPSE as a compute space - links with Michael Wiener / OneMind

•  Build a Bioinformatics initiative across all schools by June 2014 - Impactful Models / Challenges with Laura Esserman / Laura van ’t Veer

•  Enhance existing data repositories and mining tools by June 2012 - Sage collaborations with Alex Pico, Geoff Manley

•  Develop infrastructure to support new team-based, interdisciplinary learning models - PLC (The Federation?)

•  Accelerate translation of groundbreaking science into therapies - Arch2POCM