IDB-Cloud Providing Bioinformatics Services on Cloud
A presentation of IDB (Infrastructure Distributed for Biology) using StratusLab technology by Christophe Blanchet and Clément Gauthey at Lille, France, May 2013.
Christophe Blanchet, Clément Gauthey
Infrastructure Distributed for Biology
IDB-IBCP CNRS FR3302 - LYON - FRANCE
http://idee-b.ibcp.fr
IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)
Réseau des Ingénieurs en Bioinformatique, Lille, 23 May 2013
IDB’s Cloud
• Cloud workbench for Biology
  • 13 turnkey bioinformatics appliances (as of Apr. 2013)
  • Running since Sept. 2011, open to the Biology community
  • Lyon, FRANCE
• Powered by
  • StratusLab
    • Compute nodes, block storage
    • 900+ cores, 4+ TB RAM, 36 TB vdisks
    • Mainly Intel Sandy Bridge servers with 32 cores and 128 GB RAM
    • "Bigmem" servers with 64 cores and 768 GB RAM
    • VMs from 1 core / 1 GB RAM to 64 cores / 768 GB RAM
  • + OpenStack
    • Object storage (Swift)
    • 200+ TB redundant & scalable storage
Driven through a simple web interface
Integrate Bioinformatics Tools in Cloud
[Figure: bioinformatics tools (BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML) are combined with Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, Suse) to create new appliances, which are published in the Bioinformatics Marketplace (NGS, Structure, Galaxy, ARIA, Sequence, (…)).]
• Appliances are virtual machines
  • Small: a few GB, easy to convert to most virtualization formats
• Installed and pre-configured with common bioinformatics tools
  • e.g. BLAST, ClustalW, ARIA, MEME, HMMER, TopHat, BWA, SAMtools, etc.
Bioinformatics Appliances
Select your bioinformatics tools
Run Bioinformatics Cloud Instances
[Figure: appliances from the Bioinformatics Marketplace (NGS, Structure, Galaxy, ARIA, Sequence, (…)) are launched as instances on IBCP's cloud resources. IaaS: a master & storage VM (ARIA) launches jobs over ssh on worker VMs (CNS) that use a shared file system. PaaS: a portal offering BLAST, Clustal, etc.]
Manage your Cloud Instances
Biological Data in Cloud
[Figure: public data sources (UniProt, PDB, EMBL, PROSITE, genomes) are shared over NFS with the bioinformatics cloud. The cloud hosts IaaS instances (a master & storage VM running ARIA, worker VMs running CNS, a shared file system, jobs launched over ssh) and a PaaS portal (BLAST, Clustal, etc.). Persistent user data is kept on a pdisk (iSCSI). Users upload their data and get their results via scp or http/S3.]
Biological examples
Common bioinformatics node
• ‘Biocompute’ appliance
• Use your own instance(s)
• With pre-installed standard bioinformatics tools
  • BLAST, FastA, SSearch, HMM, ...
  • ClustalW2, Clustal-Omega, Muscle, ...
  • Bowtie(2), BWA, samtools, ...
  • MEME, R, etc.
• Connected to public reference data
  • UniProt, EMBL, genomes, PDB, etc.
  • Automatically shared with the VMs
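
As an illustration of what working on such an instance might look like, here is a minimal Python sketch that assembles a protein BLAST search against a shared reference database. It assumes the appliance ships the BLAST+ suite (the slides only say "BLAST"); the file names and the database path are hypothetical, since the slides do not say where the shared reference data is mounted.

```python
import shlex


def build_blastp_command(query_fasta, database, output_path,
                         evalue=1e-5, threads=4):
    """Return the argument list for a BLAST+ `blastp` search.

    The list can be passed to subprocess.run() on a machine where
    BLAST+ is installed; nothing is executed here.
    """
    return [
        "blastp",
        "-query", query_fasta,          # user sequences in FASTA format
        "-db", database,                # pre-formatted protein database
        "-out", output_path,            # where the hits are written
        "-evalue", str(evalue),         # E-value cutoff
        "-outfmt", "6",                 # tabular output
        "-num_threads", str(threads),   # use several cores of the VM
    ]


# Hypothetical paths: adapt them to the actual layout of the instance.
command = build_blastp_command(
    "my_proteins.fasta",
    "/hypothetical/shared/uniprot_sprot",
    "hits.tsv",
)
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.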
Structural Biology
• TOwards StruCtural AssignmeNt Improvement
• To improve the determination of protein structures from Nuclear Magnetic Resonance (NMR) data with the ARIA software
• Large computational needs.
• An NMR laboratory will not necessarily invest in building a cluster of about 100 nodes just to run such NMR structure calculations.
• The flexibility of the cloud in deploying the different required bioinformatics tools can speed up such a procedure.
• Commercial interest in providing such tools to structural biologists on a “pay as you go” basis.
• Endorsers: Institut Pasteur Paris and CNRS IBCP
IaaS deployment of ARIA
[Figure: virtual cluster for ARIA. The master & storage VM runs ARIA, which prepares the structures and launches jobs over ssh on the worker VMs (20-100 CNS workers). Workers read and write intermediate results on the shared storage (shared file system); the cycle is repeated (8x) before the final results are produced. Input data: tens of MB. Results: GB.]
A significant increase in the number of calculated protein conformations improves the statistics on the NMR conformations and can help to overcome the ambiguity bottleneck.
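
The master/worker layout of this slide can be sketched in a few lines of Python. This is only an illustration of spreading structure calculations over a pool of workers reached by ssh; it is not ARIA's actual job manager. The worker host names and the remote `run_cns_job` script are hypothetical.

```python
from collections import defaultdict


def assign_jobs(n_structures, workers):
    """Distribute structure indices over the workers in round-robin order."""
    if not workers:
        raise ValueError("at least one worker is required")
    plan = defaultdict(list)
    for index in range(n_structures):
        plan[workers[index % len(workers)]].append(index)
    return dict(plan)


def ssh_command(worker, structure_index):
    """Build the ssh call the master would use to start one job.

    `run_cns_job` is a hypothetical wrapper script assumed to live on
    the shared file system; it is not part of ARIA or CNS.
    """
    return ["ssh", worker, f"run_cns_job {structure_index}"]


# Hypothetical pool of 20 worker VMs computing 100 conformations.
workers = [f"worker-{i:03d}" for i in range(20)]
plan = assign_jobs(100, workers)
for worker in workers[:2]:
    print(worker, plan[worker])
print(ssh_command("worker-000", 0))
```

With 20 workers and 100 conformations each worker receives five jobs. Adding workers shortens each round without changing the master's logic, which is the kind of elasticity an IaaS deployment offers.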
Galaxy portal for NGS analyses
• Analyse NGS data
  • The Galaxy portal is widely used in the community
  • Connected to large public data: sequences and indexes
  • Large user data (GBs)
• Preserve workflows and results (persistent storage)
Proteomics desktop
• Motivation
  • Collaboration with a mass spectrometry platform
  • Running out of space on their local resources
• Protein identification
  • Experimental mass spectrometry data
  • Reference databases: nr, Swiss-Prot
  • Reference screening tools: OMSSA, X!Tandem
• User interface
  • Remote display
    • NX
  • Reference GUIs
    • SearchGUI
    • PeptideShaker
(Image source: PeptideShaker site)
Conclusion
• Provide turnkey bioinformatics appliances
  • Standard tools and pipelines
  • Interoperability: ready to run on cloud
  • Easier to transfer appliances than data (GB vs TB)
• Provide a cloud infrastructure tightly connected to existing bioinformatics infrastructure
  • IDB’s public bioinformatics cloud
  • Linked to public biological databases
  • In collaboration with the French Bioinformatics Institute
• Ease the usage by scientists
  • Usual bioinformatics gateways
  • Large, persistent and ubiquitous storage
  • Web interface for cloud management
  • Access on a registration basis and standard use
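
The "GB vs TB" point in the conclusion can be made concrete with a back-of-the-envelope calculation. The sizes and the link speed below are illustrative assumptions, not figures from the presentation, and the estimate ignores protocol overhead.

```python
GB = 10**9   # bytes
TB = 10**12  # bytes


def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


LINK = 1e9  # assumed 1 Gbit/s network link

appliance_s = transfer_seconds(5 * GB, LINK)  # assumed 5 GB appliance
dataset_s = transfer_seconds(2 * TB, LINK)    # assumed 2 TB dataset

print(f"appliance: {appliance_s:.0f} s")
print(f"dataset:   {dataset_s / 3600:.1f} h")
print(f"ratio:     {dataset_s / appliance_s:.0f}x")
```

Under these assumptions the appliance moves in well under a minute while the dataset takes hours, which is why shipping the tools to the data is the cheaper direction.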
Perspectives
• Define good practices to provide the academic community and industry with bioinformatics services!
• French Bioinformatics Institute - IFB
  • Goals are to provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.
  • Aims at building a national academic cloud devoted to Bioinformatics, inspired by the model evaluated through IDB’s cloud.
• European ELIXIR infrastructure
  • To build a sustainable European infrastructure for biological information, supporting life science research and its translation
  • IFB will be the French representative in ELIXIR.
[Figure: decision flow. Scientists need "toolX": is it available in the Bioinformatics Center appliances catalog? If yes, the appliance is run on a cloud (a bioinformatics or public cloud; regional, national or a federation). If no, engineers create a new appliance and register it. French biologists have access to regional resources (RENABI).]
• Acknowledgment
• IDB members: Clément Gauthey, Simon Malesys
• StratusLab members
• co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001).