Kipper: Sequence database versioning for Galaxy bioinformatics servers

Posted on 14-Jan-2017



Damion Dooley, Hsiao Lab, BC Public Health Microbiology & Reference Laboratory and UBC Department of Pathology, Vancouver, Canada

https://github.com/Public-Health-Bioinformatics (repositories: /kipper, /versioned_data)

How can a sequencing analysis be recreated?

• Retrieve or redo the sequencing data

• Get the right software versions

• Get the databases as they appeared on a certain date

Nice databases vs. juggernauts

Periodically published; varying ability to download past versions

RDP RNA v10.1–11.4 (5.5 GB); SILVA RNA v89–119 (2.6 GB); UniRef (~50 versions, ~35 GB latest)

Pseudo-versioned: a version is stated, but no way to get past ones? No client software for insert/delete diffs

NCBI nt (58 GB), NCBI nr (78 GB): an ancient juggernaut supporting an immortal database and crushing unwary sysadmins in its path
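An insert/delete diff between two releases can be illustrated with a minimal Python sketch. It assumes records are keyed by the first word of each FASTA header and that a record whose sequence changed counts as one delete plus one insert; `parse_fasta` and `diff_versions` are hypothetical helpers for illustration, not part of Kipper or of any NCBI tool.

```python
from typing import Dict, List, Optional, Set, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record id (first word after '>') to its full sequence.

    Assumes well-formed FASTA; multi-line sequences are concatenated.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]  # id = first word of the header
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)  # flush the last record
    return records


def diff_versions(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (inserted_ids, deleted_ids) that turn `old` into `new`.

    A record whose sequence changed appears in both sets (delete + insert).
    """
    deleted = {rid for rid, seq in old.items() if new.get(rid) != seq}
    inserted = {rid for rid, seq in new.items() if old.get(rid) != seq}
    return inserted, deleted


if __name__ == "__main__":
    v1 = parse_fasta(">a\nACGT\n>b\nGGCC\n>c\nTTTT\n")
    v2 = parse_fasta(">a\nACGT\n>b\nGGCA\n>d\nAAAA\n")
    inserted, deleted = diff_versions(v1, v2)
    print("inserted:", sorted(inserted))  # → inserted: ['b', 'd']
    print("deleted:", sorted(deleted))    # → deleted: ['b', 'c']
```

When successive releases mostly overlap, keeping only these two id sets per release is far smaller than keeping every full snapshot.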

Kipper – fetch!

What is a poor server admin to do?

Kipper data store: a metadata file plus one or more volume files
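The slide names only the parts of the store. One way such a store can hold every version while writing each record once is to tag each record with the version that created it and the version that deleted it. The Python below is a hypothetical in-memory model of that idea; `IntervalStore`, `add_version` and `get_version` are illustrative names, and this is not claimed to be Kipper's actual file layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoredRecord:
    record_id: str
    sequence: str
    created: int                   # version in which this record first appears
    deleted: Optional[int] = None  # first version without it; None = still live


class IntervalStore:
    """Hypothetical model: each distinct record is stored once, with a version interval."""

    def __init__(self) -> None:
        self.records: List[StoredRecord] = []
        self.latest = 0  # most recent version number; 0 means the store is empty

    def _live(self) -> Dict[str, StoredRecord]:
        # At most one live record per id, because a changed record is closed
        # out before its replacement is appended.
        return {r.record_id: r for r in self.records if r.deleted is None}

    def add_version(self, snapshot: Dict[str, str]) -> int:
        """Store `snapshot` as a new version, writing only inserts and deletes."""
        version = self.latest + 1
        live = self._live()
        for rid, rec in live.items():
            if snapshot.get(rid) != rec.sequence:
                rec.deleted = version  # removed or changed in this version
        for rid, seq in snapshot.items():
            old = live.get(rid)
            if old is None or old.sequence != seq:
                self.records.append(StoredRecord(rid, seq, created=version))
        self.latest = version
        return version

    def get_version(self, version: int) -> Dict[str, str]:
        """Rebuild the database exactly as it was at `version`."""
        if not 1 <= version <= self.latest:
            raise ValueError(f"no such version: {version}")
        return {
            r.record_id: r.sequence
            for r in self.records
            if r.created <= version and (r.deleted is None or version < r.deleted)
        }


if __name__ == "__main__":
    store = IntervalStore()
    store.add_version({"a": "ACGT", "b": "GGCC"})
    store.add_version({"a": "ACGT", "b": "GGCA", "c": "TTTT"})
    print(store.get_version(1))  # → {'a': 'ACGT', 'b': 'GGCC'}
```

The design choice here is that an unchanged record costs nothing per release; only the records that appear or disappear are touched.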

Version listing

• Add new version:

  $ Kipper rdp_rna -i download.fasta -o .

• Retrieve a version by id:

  $ Kipper rdp_rna -e -n 11

• Kipper is a Python script:

  $ Kipper rdp_rna
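The commands retrieve a version by its numeric id, while the reproducibility goal stated at the start is to get a database as it appeared on a certain date. Assuming the version listing records the date each version was added (a hypothetical list of `(version_id, date)` pairs here, not Kipper's real metadata format), a date can be resolved to an id with a binary search. `version_on` is an illustrative name.

```python
import bisect
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical listing format: (version id, date added), sorted by date.
VersionListing = List[Tuple[int, date]]


def version_on(listing: VersionListing, when: date) -> Optional[int]:
    """Return the id of the newest version added on or before `when`."""
    dates = [added for _, added in listing]
    idx = bisect.bisect_right(dates, when)  # count of versions added <= when
    if idx == 0:
        return None  # `when` predates the first stored version
    return listing[idx - 1][0]


if __name__ == "__main__":
    listing = [(1, date(2014, 1, 15)), (2, date(2014, 6, 1)), (3, date(2015, 2, 20))]
    print(version_on(listing, date(2014, 12, 31)))  # → 2
```

Using `bisect_right` means a version added on the requested date itself counts as available on that date.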

Galaxy: version retrieval

Version retrieval

Acknowledgements

This work was supported by Genome Canada / Genome BC Grant “A Federated Bioinformatics Platform for Public Health Microbial Genomics” to Fiona Brinkman, Gary Van Domselaar and William Hsiao. More information about the IRIDA project (Integrated Rapid Infectious Disease Analysis) can be found at http://www.irida.ca.
