Building and provisioning genomics platforms on the world’s clouds. Enis Afgan, Johns Hopkins University, Galaxy Project. April 2016, University of Heidelberg.


Apr 15, 2017



Enis Afgan
Transcript
Page 1: Building and provisioning genomics platforms on the world’s clouds

Building and provisioning genomics platforms on the world’s clouds

Enis Afgan, Johns Hopkins University

Galaxy Project, April 2016, University of Heidelberg

Page 2

World’s clouds

• AWS
• AWS (coming soon)
• Google Compute Engine
• Chameleon
• Jetstream
• NeCTAR
• Azure

Page 3

Capacity without an end-to-end solution

Page 4

How do we make appropriate use of clouds?

• VM
• Platform
• Service

Page 5

Standalone VM
Pre-configured server that is readily available.
Pros
• Easy to build; easy to deploy
• Low cloud infrastructure requirements ⟶ transferable
Cons
• Limited capacity (compute and storage)
See it in action: wiki.galaxyproject.org/Cloud/Jetstream

Page 6

Scalable platform
Set up a virtual cluster across multiple VMs with app services.
Pros
• Dynamically scale compute and storage
• Higher-level services: persistent storage, sharing, multi-application
Cons
• Complicated build; considerable infrastructure requirements
See it in action: wiki.galaxyproject.org/CloudMan
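“Dynamically scale compute” can be made concrete with a small decision rule. The sketch below assumes a queue-driven policy in which each worker VM runs a fixed number of jobs; `ScalingPolicy`, `desired_workers` and `scaling_action` are hypothetical names, not CloudMan’s actual autoscaling code, and storage scaling is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical knobs for a queue-driven autoscaler (not CloudMan's settings)."""
    min_workers: int = 0
    max_workers: int = 10
    jobs_per_worker: int = 4  # assumed number of concurrent jobs per worker VM


def desired_workers(queued_jobs: int, running_jobs: int, policy: ScalingPolicy) -> int:
    """Return how many worker VMs the cluster should have for the current load."""
    needed = math.ceil((queued_jobs + running_jobs) / policy.jobs_per_worker)
    # Clamp to the configured bounds: never exceed the budget ceiling and
    # never drop below the always-on minimum.
    return max(policy.min_workers, min(policy.max_workers, needed))


def scaling_action(current_workers: int, queued_jobs: int, running_jobs: int,
                   policy: ScalingPolicy) -> int:
    """Positive result: workers to add; negative: workers to remove; zero: no change."""
    return desired_workers(queued_jobs, running_jobs, policy) - current_workers
```

With `ScalingPolicy(min_workers=1, max_workers=5, jobs_per_worker=4)`, a one-worker cluster facing 10 queued and 4 running jobs would be asked to add 3 workers. Clamping to `max_workers` is what keeps a scalable platform inside its allocation.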

Page 7

Scalable platform (cont.)
Data analysis spans more than one application (even when the main one is Galaxy).
Meet the Genomics Virtual Lab (GVL).
Pros
• Versatile platform built on the scalable CloudMan cluster
• Includes common tutorials
Cons
• Demanding to build
• Calls for more customization
See it in action: genome.edu.au

Page 8

Ready-to-use service
Use cloud resources from an always-on, public service.
Pros
• Visit a URL and start computing; no setup required
Cons
• User quotas still apply
• It’s still a public service: no user customization
See it in action: usegalaxy.org (bwa, bowtie2; more coming)

Page 9

There’s a lot of clouds out there!

• AWS
• AWS (coming soon)
• Google Compute Engine
• Chameleon
• Jetstream
• NeCTAR
• Azure

Page 10

How do we make appropriate use of many clouds?

• VM
• Platform
• Service

Build system

Page 11

Adjustable build system

• Automate the process of building each component
• Codify knowledge about the system ⟶ easier to reproduce
• We use Ansible as the technology of choice

Compose systems from configurable and reusable roles

Galaxy-Kickstarter playbook
artbio.github.io/ansible-artimed/

Galaxy-CloudMan playbook
github.com/galaxyproject/galaxy-cloudman-playbook

Use-Galaxy playbook
github.com/galaxyproject/usegalaxy-playbook
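One way to picture “compose systems from configurable and reusable roles” is as layered variables plus an ordered task list. This is a simplified model, not Ansible itself: it assumes only three variable sources (role defaults, play vars, extra vars), whereas Ansible’s real precedence order has many more levels. The role names and variables are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A reusable unit: default variables plus an ordered list of task names."""
    name: str
    defaults: dict = field(default_factory=dict)
    tasks: list = field(default_factory=list)


def compose(roles, play_vars=None, extra_vars=None):
    """Return (effective variables, ordered task plan) for a list of roles.

    Later sources win: role defaults < play vars < extra vars. This mirrors,
    in simplified form, the idea that Ansible role defaults are the easiest
    values to override and extra vars the hardest.
    """
    effective = {}
    for role in roles:
        effective.update(role.defaults)
    effective.update(play_vars or {})
    effective.update(extra_vars or {})
    # Tasks run in role order, so the same role list reproduces the same build.
    plan = [f"{role.name}: {task}" for role in roles for task in role.tasks]
    return effective, plan


# Hypothetical roles for a small Galaxy server.
database = Role("postgresql", {"db_port": 5432}, ["install", "create_db"])
galaxy = Role("galaxy", {"galaxy_port": 8080}, ["clone", "configure", "start"])

config, plan = compose([database, galaxy], play_vars={"galaxy_port": 80})
```

Reusing `database` in another playbook, or swapping `galaxy` for a different role, changes the resulting system without rewriting the shared pieces, which is what makes a codified build easier to reproduce.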

Page 12

Many clouds AND many solutions!?!

launch.genome.edu.au; use.jetstream-cloud.org; launch.usegalaxy.org

Page 13

CloudBridge (future)
A simple cross-cloud Python library

1. Offer a uniform API irrespective of the underlying provider

2. Provide a set of conformance tests for all supported clouds

3. Focus on mature clouds with a required minimal set of features

4. Be as thin as possible

Support for AWS and OpenStack exists; Google Cloud support is under development.

cloudbridge.readthedocs.org
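The first two CloudBridge goals, a uniform API and shared conformance tests, fit together naturally: if every provider implements one interface, a single test can run unchanged against all of them. The sketch below illustrates that idea with in-memory fake providers; the class and method names are hypothetical and are not CloudBridge’s actual API (see cloudbridge.readthedocs.org for that).

```python
import itertools
from abc import ABC, abstractmethod


class ComputeProvider(ABC):
    """Hypothetical minimal provider interface (not CloudBridge's actual classes)."""

    @abstractmethod
    def create_instance(self, name: str, image_id: str, vm_type: str) -> str:
        """Launch a VM and return its provider-specific ID."""

    @abstractmethod
    def instance_state(self, instance_id: str) -> str:
        """Return a normalized state: 'running' or 'deleted'."""

    @abstractmethod
    def delete_instance(self, instance_id: str) -> None:
        """Terminate the VM."""


class FakeAWS(ComputeProvider):
    """In-memory stand-in for an AWS driver; IDs mimic the 'i-...' style."""

    def __init__(self):
        self._vms = {}
        self._ids = itertools.count(1)

    def create_instance(self, name, image_id, vm_type):
        vm_id = f"i-{next(self._ids):08x}"
        self._vms[vm_id] = "running"
        return vm_id

    def instance_state(self, instance_id):
        return self._vms.get(instance_id, "deleted")

    def delete_instance(self, instance_id):
        self._vms.pop(instance_id, None)


class FakeOpenStack(ComputeProvider):
    """In-memory stand-in for an OpenStack driver with its own native states."""

    _NATIVE_TO_COMMON = {"ACTIVE": "running", "DELETED": "deleted"}

    def __init__(self):
        self._servers = {}
        self._ids = itertools.count(1)

    def create_instance(self, name, image_id, vm_type):
        server_id = f"server-{next(self._ids)}"
        self._servers[server_id] = "ACTIVE"
        return server_id

    def instance_state(self, instance_id):
        # Translating native vocabulary into the common one is the "thin"
        # layer a uniform API needs on top of each provider.
        return self._NATIVE_TO_COMMON[self._servers.get(instance_id, "DELETED")]

    def delete_instance(self, instance_id):
        self._servers[instance_id] = "DELETED"


def conformance_check(provider: ComputeProvider) -> None:
    """The same lifecycle test, run unchanged against every provider."""
    vm = provider.create_instance("test-vm", image_id="img-1", vm_type="small")
    assert provider.instance_state(vm) == "running"
    provider.delete_instance(vm)
    assert provider.instance_state(vm) == "deleted"


for provider in (FakeAWS(), FakeOpenStack()):
    conformance_check(provider)
```

Adding a new cloud then means writing one more subclass and passing it through the existing `conformance_check`, rather than writing new tests.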

Page 14

CloudLaunch (future)
A centralized launcher for any app and any cloud.

• User-configurable applications and clouds
• View and launch shared instances
• Multi-cloud dashboard view

github.com/galaxyproject/cloudlaunch
github.com/galaxyproject/cloudlaunch-ui
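A centralized launcher mostly has to resolve which image and configuration to use for a given (application, cloud) pair, and remember what was launched where. The sketch assumes that per-cloud image overrides and user-supplied settings are the only inputs; `Cloud`, `Application` and `Launcher` are hypothetical names, not the CloudLaunch data model, and no real cloud is contacted.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    name: str
    provider_type: str  # e.g. "aws" or "openstack"
    default_image: str


@dataclass
class Application:
    name: str
    images: dict = field(default_factory=dict)  # per-cloud image overrides
    config: dict = field(default_factory=dict)  # application defaults


class Launcher:
    """Hypothetical central registry pairing any application with any cloud."""

    def __init__(self):
        self.clouds = {}
        self.apps = {}
        self.launched = []

    def register_cloud(self, cloud: Cloud) -> None:
        self.clouds[cloud.name] = cloud

    def register_app(self, app: Application) -> None:
        self.apps[app.name] = app

    def launch(self, app_name: str, cloud_name: str, **user_config) -> dict:
        """Build a launch request; user settings override application defaults."""
        app = self.apps[app_name]
        cloud = self.clouds[cloud_name]
        request = {
            "app": app.name,
            "cloud": cloud.name,
            "provider_type": cloud.provider_type,
            # Fall back to the cloud's default image if the app has no override.
            "image": app.images.get(cloud.name, cloud.default_image),
            "config": {**app.config, **user_config},
        }
        self.launched.append(request)
        return request

    def dashboard(self) -> dict:
        """Group launched applications by cloud for a multi-cloud overview."""
        view = {}
        for req in self.launched:
            view.setdefault(req["cloud"], []).append(req["app"])
        return view
```

Because clouds and applications are plain registered records, adding either one is a configuration step rather than a code change, which is the flexibility the slide describes.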

Page 15

CloudMan (future)
Resource manager with a configurable service layer
• Pull away from low-level application service management
• Leverage containers to supply services
• Allow runtime service and configuration changes
• Run on any infrastructure, including high-level services such as ECS or the Docker API

Goal: Launch a (template-based) CloudMan platform and add application services as desired from Docker Hub or similar, while resource provisioning is handled automatically.
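The goal of adding application services while resource provisioning is handled automatically can be sketched as a placement loop: put each new container service on an existing node with spare CPUs, and provision a node only when none fits. First-fit placement and CPU-only accounting are simplifying assumptions, `ServiceLayer` is a hypothetical name, and the image names in the usage note are placeholders rather than real Docker Hub images.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: int
    used: int = 0
    services: list = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.cpus - self.used


class ServiceLayer:
    """Hypothetical sketch: users add container services, nodes appear as needed."""

    def __init__(self, node_cpus: int = 4):
        self.node_cpus = node_cpus
        self.nodes = []

    def _provision_node(self) -> Node:
        # Placeholder for a real cloud call that would boot and join a worker VM.
        node = Node(name=f"worker-{len(self.nodes) + 1}", cpus=self.node_cpus)
        self.nodes.append(node)
        return node

    def add_service(self, image: str, cpus: int) -> str:
        """Place a container service and return the name of the node running it."""
        if cpus > self.node_cpus:
            raise ValueError(f"{image} needs {cpus} CPUs; nodes have {self.node_cpus}")
        # First fit: reuse the first node that still has enough spare CPUs...
        node = next((n for n in self.nodes if n.free >= cpus), None)
        # ...and provision a new node only when none does.
        if node is None:
            node = self._provision_node()
        node.used += cpus
        node.services.append(image)
        return node.name
```

For example, with 4-CPU nodes, adding `example/galaxy` (3 CPUs), then `example/jupyter` (2 CPUs), then `example/rstudio` (1 CPU) places them on worker-1, worker-2 and worker-1 respectively; the user only ever calls `add_service`.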

Page 16

Galaxy ObjectStore (future)

Allow uniform any-Galaxy computing (i.e., make Galaxy instances interchangeable and disposable)
• Galaxy implements an ObjectStore interface as an abstraction over data
• Leverage it to expand user data storage and allow any Galaxy to connect to a user’s bucket
• Use ObjectStore for reference data (simplify builds)
• We will still need to deal with the database dependency
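Making Galaxy instances interchangeable and disposable means keeping data behind an interface rather than on the instance itself. The sketch below is a deliberately tiny stand-in for that idea: the method names (`put`, `get`, `exists`), both backends and `GalaxyInstance` are hypothetical and do not reproduce Galaxy’s actual ObjectStore classes, and the bucket backend uses a dict in place of real S3 or Swift calls.

```python
import os
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical, much-simplified data abstraction (not Galaxy's actual class)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class DiskObjectStore(ObjectStore):
    """Keeps each dataset as a file; keys are assumed to be simple file names."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as fh:
            fh.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as fh:
            return fh.read()

    def exists(self, key):
        return os.path.exists(self._path(key))


class BucketObjectStore(ObjectStore):
    """Stand-in for a user's cloud bucket; a dict replaces real bucket calls."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def exists(self, key):
        return key in self._objects


class GalaxyInstance:
    """A disposable compute front end: all dataset contents live in the store."""

    def __init__(self, store: ObjectStore):
        self.store = store

    def run_tool(self, input_key: str, output_key: str) -> None:
        # Toy "tool": upper-case the input dataset and store the result.
        self.store.put(output_key, self.store.get(input_key).upper())
```

Because a fresh `GalaxyInstance` pointed at the same bucket sees everything the previous one wrote, the front end can be thrown away and relaunched elsewhere; the database dependency mentioned on the slide is deliberately not modeled.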

Page 17

The endgame?

launch.usegalaxy.org

ObjectStore

CloudBridge

CloudMan

Applications

Page 18

Building your own cloud?
• Make it easy: for end users to register and get onboard (very simple auth); for deployers to interface with the cloud (adopt ‘standards’)
• Develop capacity and usage plans: go for monthly-reset, merit-based Allocation Units (AUs)
• Design for flexibility: users need more storage? Different instance types?
• Create champion teams: bring them onboard early to deploy target apps; give them $$$
• Start with good documentation: technical but not overly detailed (look at AWS)
• Be open; add great, interactive support
• Design a training program: for application developers and end users; build a community
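Monthly-reset, merit-based Allocation Units can be modeled as a per-project balance that is refilled at each month boundary. The exchange rate (1 AU = 1 vCPU-hour) and the `Allocation` class are assumptions for illustration; the slide does not define how AUs are computed or granted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Allocation:
    """Hypothetical project allocation, assuming 1 AU = 1 vCPU-hour."""
    monthly_aus: float                       # merit-based grant, e.g. after review
    month: Optional[Tuple[int, int]] = None  # (year, month) the balance belongs to
    balance: float = 0.0

    def _reset_if_new_month(self, year: int, month: int) -> None:
        # Monthly reset: unused AUs expire and the full grant is restored.
        if self.month != (year, month):
            self.month = (year, month)
            self.balance = self.monthly_aus

    def charge(self, year: int, month: int, vcpus: int, hours: float) -> bool:
        """Deduct vcpus * hours AUs; refuse (return False) if the balance is short."""
        self._reset_if_new_month(year, month)
        cost = vcpus * hours
        if cost > self.balance:
            return False
        self.balance -= cost
        return True
```

With a grant of 100 AUs, a 4-vCPU VM running for 20 hours consumes 80 AUs; a further 22-AU request is refused that month but accepted after the next reset. The reset keeps capacity planning predictable, since no project can hoard AUs across months.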

Page 19

Acknowledgments

Page 20

Want more Galaxy?

gcc2016.iu.edu

Page 21

usegalaxy.org cloud-bursting

(Diagram components: usegalaxy.org, CVMFS, NFS, job_conf.xml)
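In Galaxy, job_conf.xml decides where each job runs, and it can delegate that decision to a Python function (a dynamic job destination). The standalone function below sketches a bursting policy of that kind; its signature, the destination IDs and the queue-depth threshold are hypothetical, it is not Galaxy’s rule-function interface, and real Galaxy tool IDs are usually longer than the short names used here.

```python
# Hypothetical destination IDs; on a real server these would match
# destination entries defined in job_conf.xml.
LOCAL = "local_cluster"
CLOUD = "cloud_burst"

# Tools assumed to run correctly on cloud nodes because their software and
# reference data are reachable there; bwa and bowtie2 are the tools named
# for usegalaxy.org earlier in the talk.
BURSTABLE_TOOLS = {"bwa", "bowtie2"}


def choose_destination(tool_id: str, local_queue_depth: int,
                       burst_threshold: int = 50) -> str:
    """Burst eligible tools to the cloud only when the local queue is long."""
    if tool_id in BURSTABLE_TOOLS and local_queue_depth >= burst_threshold:
        return CLOUD
    return LOCAL
```

Keeping the eligible-tool list explicit reflects the constraint behind bursting: a job can only leave the home cluster if everything it needs is also available on the remote nodes.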