Computational Infrastructure for Systems Genetics Analysis
Brian Yandell, UW-Madison


Dec 18, 2015

Page 1

Computational Infrastructure for Systems Genetics Analysis

Brian Yandell, UW-Madison

high-throughput analysis of systems data
enable biologists & analysts to share tools

UW-Madison: Yandell, Attie, Broman, Kendziorski
Jackson Labs: Churchill
U Groningen: Jansen, Swertz
UC-Denver: Tabakoff
LabKey: Igra

Page 3

typical workflow (from Mark Igra, LabKey)


[figure: file storage, database storage, a cluster, and the Bio-Web Server, linked by the numbered steps below]

1- Transfer files.
2- Configure analysis.
3- Run analysis and load results.
4- Review results; repeat steps 2-4 as required.
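The numbered steps form a loop: files are transferred once, then configuration, running, and review repeat until the results are acceptable. A minimal Python sketch of that control flow, assuming hypothetical stand-ins throughout: `run_workflow`, the `run` callback, and the `is_acceptable` review rule are illustrations, not LabKey functions, and plain lists stand in for file and database storage.

```python
from typing import Callable, Dict, List

Settings = Dict[str, object]
Result = Dict[str, object]


def run_workflow(
    files: List[str],
    candidate_settings: List[Settings],
    run: Callable[[List[str], Settings], Result],
    is_acceptable: Callable[[Result], bool],
) -> List[Result]:
    """Hypothetical loop: step 1 once, then steps 2-4 until review passes."""
    file_storage = list(files)   # 1- transfer files (a list stands in for file storage)
    database: List[Result] = []  # a list stands in for database storage
    for settings in candidate_settings:       # 2- configure analysis
        result = run(file_storage, settings)  # 3- run analysis ...
        database.append(result)               # ... and load results
        if is_acceptable(result):             # 4- review results
            break                             # stop repeating steps 2-4
    return database


# Usage with a toy "analysis" whose score grows with the threshold setting.
def toy_run(files: List[str], settings: Settings) -> Result:
    return {"threshold": settings["threshold"], "hits": len(files) * settings["threshold"]}


history = run_workflow(
    ["chr2.csv", "traits.csv"],
    [{"threshold": 1}, {"threshold": 3}, {"threshold": 5}],
    toy_run,
    lambda r: r["hits"] >= 6,
)
print(len(history), history[-1])
```

Every round's result is kept, so earlier configurations stay available for comparison when the loop is repeated.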

Page 4

iterate many times:
• get data (GEO, Sage)
• run pipeline (CLIO, XGAP, HTDAS)
• view results (R graphics, GenomeSpace tools)
• systems genetics portal (PhenoGen)
• collaborative portal (LabKey)

Page 6

data access model

[figure: data access spectrum from closed through limited to open; labels: published/shared, collaboration/in progress, patent issues, proprietary/internal, Sage Commons, Company X]
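One way to make the closed/limited/open spectrum concrete is to attach an ordered access level to each dataset. A minimal sketch; the reading rules (owners see everything, named collaborators also see limited data, everyone sees open data) and the names `Access`, `Dataset`, and `can_read` are assumptions for illustration, not part of the slide.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Set


class Access(IntEnum):
    """Ordered from most to least restrictive, following the slide's axis."""
    CLOSED = 0
    LIMITED = 1
    OPEN = 2


@dataclass
class Dataset:
    name: str
    access: Access
    owners: Set[str] = field(default_factory=set)
    collaborators: Set[str] = field(default_factory=set)


def can_read(user: str, ds: Dataset) -> bool:
    """Hypothetical policy for who may read a dataset at each access level."""
    if user in ds.owners or ds.access == Access.OPEN:
        return True
    if ds.access == Access.LIMITED:
        return user in ds.collaborators  # shared within a collaboration only
    return False  # CLOSED: owners only


# Usage: a dataset shared with one collaborating lab.
expr = Dataset("expr", Access.LIMITED, owners={"lab_a"}, collaborators={"lab_b"})
print(can_read("lab_b", expr), can_read("anyone", expr))
```

Moving a dataset toward "open" (for example, on publication) then only changes its `access` field; no per-user lists need editing.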

Page 7

analysis pipeline acts on objects (extends concept of GenePattern)

[figure: pipeline object with input, output, settings, and checks]
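The slide describes a pipeline step as an object that bundles input, output, settings, and checks. A minimal sketch of that idea; `PipelineStep` and its fields are hypothetical names, not GenePattern's API, and the log2 transform is only an example step.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Check = Callable[[Any], bool]


@dataclass
class PipelineStep:
    """Hypothetical pipeline object bundling a function with settings and checks."""
    name: str
    func: Callable[[Any, Dict[str, Any]], Any]
    settings: Dict[str, Any] = field(default_factory=dict)
    input_checks: List[Check] = field(default_factory=list)
    output_checks: List[Check] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        for check in self.input_checks:  # reject bad input before doing any work
            if not check(data):
                raise ValueError(f"{self.name}: input check {check.__name__} failed")
        output = self.func(data, self.settings)
        for check in self.output_checks:  # validate output before the next step uses it
            if not check(output):
                raise ValueError(f"{self.name}: output check {check.__name__} failed")
        return output


# Usage: a log2(x + offset) transform of non-negative expression values.
def non_empty(xs):
    return len(xs) > 0


def non_negative(xs):
    return all(x >= 0 for x in xs)


def all_finite(ys):
    return all(math.isfinite(y) for y in ys)


def log_transform(xs, settings):
    return [math.log2(x + settings["offset"]) for x in xs]


step = PipelineStep("log2", log_transform, {"offset": 1},
                    input_checks=[non_empty, non_negative], output_checks=[all_finite])
print(step.run([0, 1, 3]))  # [0.0, 1.0, 2.0]
```

Keeping settings and checks on the object, rather than inside the function, lets the same step be rerun with different settings and still fail loudly on bad data.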

Page 8

pipeline is composed of many steps

[figure: pipeline steps labeled A, B, C, D, E, and D', with labels i and o]
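When a pipeline is a network of steps rather than a single chain, each step can run only after the steps it depends on. A minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); `run_pipeline` is a hypothetical helper, and the dependency edges in the usage are assumptions, since the slide's diagram did not survive extraction.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Iterable


def run_pipeline(
    steps: Dict[str, Callable[..., Any]],
    depends_on: Dict[str, Iterable[str]],
    initial: Any,
) -> Dict[str, Any]:
    """Run each step after its dependencies; steps with none receive `initial`."""
    results: Dict[str, Any] = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(depends_on).static_order():
        parents = list(depends_on.get(name, ()))
        args = [results[p] for p in parents] if parents else [initial]
        results[name] = steps[name](*args)
    return results


# Usage with hypothetical edges: B and C need A, D needs B and C, E needs D.
steps = {
    "A": lambda x: x + 1,
    "B": lambda a: a * 2,
    "C": lambda a: a * 3,
    "D": lambda b, c: b + c,
    "E": lambda d: d - 1,
}
depends_on = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
results = run_pipeline(steps, depends_on, initial=1)
print(results["E"])
```

Because steps are looked up by name, a variant of one step (hypothetically, a D' alongside D) can be swapped in by replacing a single dictionary entry without touching the rest of the pipeline.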

Page 9

causal model selection choices in context of larger, unknown network

[figure: four panels relating a focal trait to a target trait, labeled causal, reactive, correlated, and uncorrelated]
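A common way to choose among such models is to fit each one as a pair of regressions anchored on a shared QTL genotype and compare BIC. The sketch below assumes one QTL coded 0/1/2, Gaussian errors, and this reading of the four panels: causal Q → focal → target; reactive Q → target → focal; uncorrelated, Q drives both traits independently; correlated, Q drives both with correlated residuals. It is a simplified illustration, not the specific test used in the talk, and `compare_models` is a hypothetical name.

```python
import numpy as np


def gaussian_loglik(y, predictors):
    """Max log-likelihood of y ~ intercept + predictors; returns (loglik, n_params)."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(predictors))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    sigma2 = rss / n  # MLE of the residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik, design.shape[1] + 1  # coefficients + residual variance


def compare_models(q, focal, target):
    """BIC for four QTL-anchored models relating a focal and a target trait."""
    n = len(q)
    # Each model factorizes the joint density into two conditional regressions.
    fits = {
        "causal": [gaussian_loglik(focal, [q]), gaussian_loglik(target, [focal])],
        "reactive": [gaussian_loglik(target, [q]), gaussian_loglik(focal, [target])],
        "uncorrelated": [gaussian_loglik(focal, [q]), gaussian_loglik(target, [q])],
        "correlated": [gaussian_loglik(focal, [q]), gaussian_loglik(target, [q, focal])],
    }
    bic = {}
    for name, parts in fits.items():
        loglik = sum(ll for ll, _ in parts)
        k = sum(p for _, p in parts)
        bic[name] = -2 * loglik + k * np.log(n)  # smaller is better
    return bic


# Usage on simulated data where the focal trait truly drives the target trait.
rng = np.random.default_rng(0)
n = 500
q = rng.integers(0, 3, size=n).astype(float)
focal = q + rng.normal(size=n)
target = focal + rng.normal(size=n)
bic = compare_models(q, focal, target)
print(min(bic, key=bic.get))
```

The correlated model has one extra parameter, so the BIC penalty usually favors the simpler causal model when the extra path adds nothing.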

Page 10

BxH ApoE-/- chr 2: causal architecture

[figure: chr 2 hotspot; label: 12 causal calls]

Page 11

BxH ApoE-/- causal network for transcription factor Pscdbp

[figure: causal network; label: causal trait]

Page 12

iterate many times:
• get data (GEO, Sage)
• run pipeline (CLIO, XGAP, HTDAS)
• view results (R graphics, GenomeSpace tools)
• systems genetics portal (PhenoGen)
• collaborative portal (LabKey)
• develop analysis models & algorithms (update periodically)


Page 13

platform for biologists and analysts

• create and extend pipeline steps
• share algorithms
  – public library
  – private authentication
• compare methods on one platform
• combine data from multiple studies
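A shared library of analysis steps, with public entries and private ones restricted to their owner, also makes side-by-side method comparison straightforward. A minimal sketch; `AlgorithmLibrary` and its methods are hypothetical, and authentication is assumed to happen elsewhere, so a user is represented only by an identity string (or `None` when not logged in).

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class Algorithm:
    name: str
    func: Callable
    owner: str
    public: bool = True


class AlgorithmLibrary:
    """Hypothetical shared library of analysis steps with public and private entries."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, Algorithm] = {}

    def register(self, algo: Algorithm) -> None:
        if algo.name in self._algorithms:
            raise KeyError(f"{algo.name} already registered")
        self._algorithms[algo.name] = algo

    def visible_to(self, user: Optional[str]) -> Set[str]:
        # Unauthenticated users (user=None) see only the public library.
        return {a.name for a in self._algorithms.values() if a.public or a.owner == user}

    def compare(self, names: Iterable[str], data, user: Optional[str] = None) -> Dict[str, object]:
        # Run several methods on the same data so results are directly comparable.
        names = list(names)
        hidden = [n for n in names if n not in self.visible_to(user)]
        if hidden:
            raise PermissionError(f"not visible to {user!r}: {hidden}")
        return {n: self._algorithms[n].func(data) for n in names}


# Usage: one public method and one private method owned by lab_b.
lib = AlgorithmLibrary()
lib.register(Algorithm("mean", lambda xs: sum(xs) / len(xs), owner="lab_a"))
lib.register(Algorithm("median", statistics.median, owner="lab_b", public=False))
print(lib.compare(["mean", "median"], [1, 2, 6], user="lab_b"))
```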

Page 14

[figure: pipeline object (input, output, settings, checks) alongside raw code, a version control system, and R&D]
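Keeping raw code under version control pays off when every result records which version of the code produced it. A minimal sketch, assuming a SHA-256 content hash of the raw step code as a stand-in for a version-control commit id; `version_of` and `run_versioned` are hypothetical helpers, and `exec` is used only to run trusted, reviewed code held as text.

```python
import hashlib
from typing import Any, Dict


def version_of(source: str) -> str:
    """Short content hash of raw step code (a stand-in for a commit id)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


def run_versioned(source: str, entry: str, data: Any, **settings: Any) -> Dict[str, Any]:
    """Compile raw code, call its `entry` function, and record the code version used."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # trusted code only
    return {
        "output": namespace[entry](data, **settings),
        "code_version": version_of(source),
        "settings": dict(settings),
    }


# Usage: two revisions of the same step give the same output but distinct versions.
v1 = "def scale(xs, factor):\n    return [x * factor for x in xs]\n"
v2 = v1.replace("x * factor", "x * factor + 0")
r1 = run_versioned(v1, "scale", [1, 2], factor=3)
r2 = run_versioned(v2, "scale", [1, 2], factor=3)
print(r1["output"], r1["code_version"] != r2["code_version"])
```

Identical outputs from different code revisions stay distinguishable, which is what lets repeated pipeline runs be audited later.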

Page 15

Model/View/Controller (MVC) software architecture

• isolate domain logic from input and presentation
• permit independent development, testing, maintenance

[figure: three components connected by user changes and system actions]
• Controller: input/response
• View: render for interaction
• Model: domain-specific logic
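A minimal sketch of that separation, using a hypothetical LOD-score threshold as the domain logic; the classes and marker names are illustrations, not code from the systems described in the talk. The model can be tested without any interface, and the view can be restyled without touching the significance rule.

```python
from typing import Dict, List


class Model:
    """Domain-specific logic: which markers pass a LOD threshold."""

    def __init__(self, lod_scores: Dict[str, float], threshold: float = 3.0) -> None:
        self.lod_scores = lod_scores
        self.threshold = threshold

    def significant(self) -> List[str]:
        return sorted(m for m, lod in self.lod_scores.items() if lod >= self.threshold)


class View:
    """Rendering only; knows nothing about how significance is decided."""

    def render(self, markers: List[str]) -> str:
        return "significant: " + (", ".join(markers) if markers else "none")


class Controller:
    """Turns user input into model changes and returns the re-rendered response."""

    def __init__(self, model: Model, view: View) -> None:
        self.model = model
        self.view = view

    def set_threshold(self, text: str) -> str:
        self.model.threshold = float(text)                   # user change updates the model
        return self.view.render(self.model.significant())   # system action: redraw


# Usage with made-up markers m1-m3.
controller = Controller(Model({"m1": 2.5, "m2": 4.1, "m3": 3.6}), View())
print(controller.set_threshold("4"))
```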

Page 16

perspectives for building a community where disease data and models are shared

Benefits of wider access to datasets and models:
1- catalyze new insights on disease & methods
2- enable deeper comparison of methods & results

Lessons Learned:
1- need quick feedback between biologists & analysts
2- involve biologists early in development
3- repeated use of pipelines leads to
  – documented learning from experience
  – increased rigor in methods

Challenges Ahead:
1- stitching together components as a coherent system
2- ramping up to ever larger molecular datasets

Page 17

www.stat.wisc.edu/~yandell/

• UW-Madison
  – Alan Attie
  – Christina Kendziorski
  – Karl Broman
  – Mark Keller
  – Andrew Broman
  – Aimee Broman
  – YounJeong Choi
  – Elias Chaibub Neto
  – Jee Young Moon
  – John Dawson
  – Ping Wang
  – NIH Grants DK58037, DK66369, GM74244, GM69430, EY18869

• Jackson Labs (HTDAS)
  – Gary Churchill
  – Ricardo Verdugo
  – Keith Sheppard

• UC-Denver (PhenoGen)
  – Boris Tabakoff
  – Cheryl Hornbaker
  – Laura Saba
  – Paula Hoffman

• LabKey Software
  – Mark Igra

• U Groningen (XGAP)
  – Ritsert Jansen
  – Morris Swertz
  – Pjotr Prins
  – Danny Arends

• Broad Institute
  – Jill Mesirov
  – Michael Reich

Page 18

Systems Genetics Analysis Platform
Brian Yandell, UW-Madison

high-throughput analysis of systems data
enable biologists & analysts to share tools

UW-Madison: Attie, Broman, Kendziorski
Jackson Labs: Churchill
U Groningen: Jansen, Swertz
UC-Denver: Tabakoff
LabKey: Igra

[figure labels: hotspot, causal trait]