
2014.09.10 Taylor Hopkins - Galaxy · users with no informatics infrastructure to ... State University, Johns Hopkins University, and the Pennsylvania Department of Public Health

May 01, 2018


Vandan Gaikwad
Transcript
Page 1

Galaxy

Data intensive biology for everyone.

www.galaxyproject.org

@jxtx / #usegalaxy

Page 2

I ❤ High-Throughput SEQUENCING!

Page 3

High-throughput sequencing is transformative

Page 4

Resequencing

De novo genome sequencing

Direct RNA sequencing

Open chromatin assays (DNase, FAIRE)

Transcription factors (ChIP-seq)

Histone variants (ChIP-seq, MNase-seq)

Long range interactions (5C, Hi-C, ChIA-PET)

Methylation (Bisulfite-seq)

Page 5

High-throughput sequencing is democratizing

Page 6

(http://omicsmaps.com/)

It is widely available...

Page 7

...and practically free!

(NHGRI / Nature 497:546–547)

Page 8

Making sense of this data requires sophisticated methods

How can we ensure that these methods are accessible to researchers?

...while also ensuring that scientific results remain reproducible?

Page 9

Galaxy: accessible analysis system

Page 10

A free (for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage

Open source software that makes integrating your own tools and data and customizing for your own site simple

An open extensible platform for sharing tools, datatypes, workflows, ...

Page 11

Describe analysis tool behavior abstractly

Analysis environment automatically and transparently tracks details

Workflow system for complex analysis, constructed explicitly or automatically

Pervasive sharing, and publication of documents with integrated analysis

Page 12

Visualization and visual analytics

Page 13

The free service is still the easiest way for users with no informatics infrastructure to analyze their data

How can we possibly sustain this?

Page 14
Page 15

usegalaxy.org data growth

[Figure: New Data per Month (TB), axis from 0 to 120, plotted monthly from 2008-04 to 2013-08. Annotations on the plot: "+128 cores for NGS/multicore jobs" and "Data quotas implemented..."]

Nate Coraor

Page 16

usegalaxy.org frustration growth

[Figure: two series plotted monthly from 2008-04 to 2013-08. Axes: Jobs Deleted Before Run (% of submitted), from 0% to 9%; Total Jobs Completed (count), from 0 to 160,000.]

Nate Coraor

Page 17

How can this possibly scale?

1. Leverage existing public cyber-infrastructure

2. Decentralize, provide many deployment models (cloud and local; not talking about this today)

Page 18

The best place to build this robust entry point is clearly a national supercomputing center

The Texas Advanced Computing Center (TACC) has already built substantial infrastructure in the context of the iPlant project

(Including multi-petabyte online storage and cloud infrastructure, co-located with some of the world's largest HPC machines)

However, the iPlant and TACC cyber-infrastructure was underused; thus we established a collaboration

Since October 2013 Galaxy Main has run from TACC

Page 19

Transparent Migrations using Galaxy’s Hierarchical Object Store

[Diagram: Galaxy Server Processes read and write data through the object store. Storage locations shown: Corral, Corral Staging, Penn State. Read Data flow: In Corral? If no, In Staging? If no, In PSU? If no, Object Not Found. A "Yes" at any step ends the lookup at that location. The diagram also has a Write Data path.]

Nate Coraor
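
The read path on this slide (check Corral, then Corral Staging, then Penn State, and fail only if none has the object) can be sketched in a few lines of Python. This is a minimal illustration and not Galaxy's object store code: the `Backend` and `HierarchicalStore` classes and their method names are hypothetical, and sending writes to the first backend is an assumption made for the example.

```python
class ObjectNotFound(Exception):
    """Raised when no backend holds the requested object."""


class Backend:
    """Hypothetical storage backend: a named in-memory key/value store."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def exists(self, key):
        return key in self._objects

    def read(self, key):
        return self._objects[key]

    def write(self, key, data):
        self._objects[key] = data


class HierarchicalStore:
    """Checks backends in priority order, like the read path on the slide."""

    def __init__(self, backends):
        self.backends = list(backends)

    def locate(self, key):
        # Ask each backend in turn; the first one holding the object wins.
        for backend in self.backends:
            if backend.exists(key):
                return backend
        raise ObjectNotFound(key)

    def read(self, key):
        return self.locate(key).read(key)

    def write(self, key, data):
        # Assumption for this sketch: new data goes to the first backend.
        self.backends[0].write(key, data)


corral = Backend("Corral")
staging = Backend("Corral Staging")
psu = Backend("Penn State")
store = HierarchicalStore([corral, staging, psu])

psu.write("dataset_1", b"data not yet migrated")  # still at the old site
store.write("dataset_2", b"new data")             # lands in the first backend

print(store.locate("dataset_1").name)
print(store.locate("dataset_2").name)
```

Because callers never name a backend, an object can be copied from one location to another in the background while reads keep succeeding from whichever location currently holds it.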

Page 20

Expanding to more XSEDE resources

Page 21

Galaxy can already run jobs on almost any batch system, but most XSEDE resources do not provide direct access for job submission…

Page 22

Pulsar

Galaxy job runner that can run almost anywhere

No shared filesystem, stages all necessary Galaxy components

John Chilton
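
The idea of staging everything a job needs, instead of relying on a shared filesystem, can be sketched as follows. This is not Pulsar's code or API: `run_staged_job` and its parameters are hypothetical, and the local file copies stand in for transfers over the network.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_staged_job(command, inputs, output_names, results_dir):
    """Copy inputs into a private job directory, run, copy outputs back.

    Hypothetical helper for illustration; the copies stand in for
    network transfers between the server and the remote resource.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    with tempfile.TemporaryDirectory(prefix="staged_job_") as tmp:
        job_dir = Path(tmp)
        # Stage in: the job sees only what was copied into its directory.
        for src in inputs:
            shutil.copy(src, job_dir / Path(src).name)
        # Run the command with the job directory as its working directory.
        subprocess.run(command, cwd=job_dir, check=True)
        # Stage out: collect declared outputs before the directory is removed.
        for name in output_names:
            dest = results_dir / name
            shutil.copy(job_dir / name, dest)
            collected.append(dest)
    return collected


# Toy usage: the "tool" counts the lines of the staged input file.
work = Path(tempfile.mkdtemp())
reads = work / "reads.txt"
reads.write_text("ACGT\nTTGA\n")
script = (
    "from pathlib import Path; "
    "n = len(Path('reads.txt').read_text().splitlines()); "
    "Path('count.txt').write_text(str(n))"
)
outputs = run_staged_job(
    [sys.executable, "-c", script],
    inputs=[reads],
    output_names=["count.txt"],
    results_dir=work / "results",
)
count = outputs[0].read_text()
print(count)
shutil.rmtree(work)  # clean up the scratch area used by the example
```

The tool refers to `reads.txt` by a relative name only, so it runs the same way wherever the job directory happens to be created.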

Page 23

[Diagram: Galaxy Server Processes run on Galaxy Server VMs (TACC), alongside a Messaging Server. One Pulsar instance runs on Stampede (TACC) and another on Blacklight (PSC). Each Pulsar is connected by two labeled links: Job control (AMQP) and Data transfer (HTTPS).]

Nate Coraor

Page 24

Moving long running jobs out to XSEDE

• Problem:
    • Jobs wait in the queue for a long time
    • Jobs may fail immediately upon run due to bad parameters
    • Most jobs run quickly! Can we relocate the long ones?

• Goals:
    • Shorten wait from submission to start
    • Allow testing params without waiting

• Solutions:
    • Set a short walltime, resubmit jobs to bigger resources (new code)
    • User selection of resources (Stampede: longer wait to start, but more concurrent jobs allowed)
    • Create “development” queues w/ short walltime

Nate Coraor and John Chilton
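
The first solution on this slide (start with a short walltime, then resubmit to a bigger resource if the job runs out of time) can be simulated in a few lines. This is a sketch of the policy only and not Galaxy's resubmission code: the `Destination` class, the `run_with_resubmit` helper, the queue names, and the walltime values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    """Hypothetical compute resource with a walltime limit in minutes."""
    name: str
    walltime_limit: int


class WalltimeExceeded(Exception):
    """Raised when a simulated job needs more time than the limit allows."""


def run_on(destination, job_runtime):
    """Simulate a run: fail if the job needs more time than allowed."""
    if job_runtime > destination.walltime_limit:
        raise WalltimeExceeded(destination.name)
    return destination.name


def run_with_resubmit(job_runtime, destinations):
    """Try destinations in order; return the names of those attempted."""
    attempts = []
    for destination in destinations:
        attempts.append(destination.name)
        try:
            run_on(destination, job_runtime)
            return attempts
        except WalltimeExceeded:
            continue  # resubmit to the next, bigger resource
    raise RuntimeError("job exceeded the walltime of every destination")


destinations = [
    Destination("short_queue", walltime_limit=30),
    Destination("long_queue", walltime_limit=2880),
]

quick = run_with_resubmit(5, destinations)
slow = run_with_resubmit(600, destinations)
print(quick)
print(slow)
```

Under this policy, quick jobs and jobs with bad parameters finish or fail on the short queue without a long wait, and only jobs that need more time pay for a second submission.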

Page 25

State of Affairs

• Today
    • Galaxy Test jobs to Stampede and Blacklight
    • Galaxy Main jobs to Stampede

• Up next
    • Galaxy Main jobs to Blacklight
    • Optimize Trinity tools for Blacklight
    • Linking XSEDE allocations to Galaxy accounts

Page 26

Credits

• Texas Advanced Computing Center
    • Dan Stanzione
    • Matt Vaughn
    • Chris Jordan
    • Mike Packard
    • Nathaniel Mendoza

• iPlant Collaborative
    • Stephen Goff

• Pittsburgh Supercomputing Center
    • Philip Blood
    • Kathy Benninger
    • Robert Budden
    • Jared Yanovich
    • Josephine Palencia
    • J. Ray Scott
    • Joe Lappa

... and the Galaxy Team and community!

Galaxy is supported in part by NSF, NHGRI, Pennsylvania Department of Public Health, The Huck Institutes of the Life Sciences, The Institute for CyberScience at Penn State, and Johns Hopkins University

Page 27

Galaxy Team (group labels on the slide: Engineering; Support and outreach; Leadership)

Dan Blankenberg, Nate Coraor, Dannon Baker, Jeremy Goecks, Anton Nekrutenko, James Taylor, Dave Clements, Jennifer Jackson, Carl Eberhard, Dave Bouvier, John Chilton, Sam Guerler, Martin Čech, Enis Afgan, Nick Stoler

Supported by the NHGRI (HG005542, HG004909, HG005133, HG006620), NSF (DBI-0850103), Penn State University, Johns Hopkins University, and the Pennsylvania Department of Public Health