Kipper: Sequence database versioning for Galaxy bioinformatics servers
Posted on 14-Jan-2017
Damion Dooley, Hsiao Lab, BC Public Health Microbiology & Reference Laboratory, and UBC Department of Pathology, Vancouver, Canada
https://github.com/Public-Health-Bioinformatics (repositories: /kipper, /versioned_data)
How can a sequencing analysis be recreated?
• Retrieve or redo the sequencing data
• Get the right software versions
• Get the databases as they appeared on a certain date
Nice database vs. juggernaut
Nice databases: periodically published, with varying ability to download past versions
• RDP RNA v10.1 – 11.4 (5.5 GB)
• Silva RNA v89 – 119 (2.6 GB)
• UniRef (~50 versions, ~35 GB latest)
Juggernauts: pseudo-versioned. A version is stated, but there is no way to get past ones and no client software for an insert/delete diff
• NCBI nt (58 GB)
• NCBI nr (78 GB)
An ancient juggernaut supporting an immortal database and crushing unwary sysadmins in its path
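For databases like NCBI nt and nr, the missing piece is an insert/delete diff between two releases. The minimal sketch below shows what such a diff involves. It assumes that records are keyed by the first word of each FASTA header. `parse_fasta` and `diff_releases` are hypothetical names used only for illustration and are not part of Kipper.

```python
# Hypothetical sketch (not Kipper's code): compare two releases of a FASTA
# database by record id and report inserted, deleted, and changed records.
# Assumption: the record id is the first word of the header line.

def parse_fasta(text):
    """Map each record id to its full (concatenated) sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def diff_releases(old, new):
    """Return (inserted, deleted, changed) sets of record ids."""
    old_ids, new_ids = set(old), set(new)
    inserted = new_ids - old_ids
    deleted = old_ids - new_ids
    changed = {rid for rid in old_ids & new_ids if old[rid] != new[rid]}
    return inserted, deleted, changed


release_1 = """>seqA
ACGT
>seqB
GGCC
"""
release_2 = """>seqA
ACGTT
>seqC
TTAA
"""

inserted, deleted, changed = diff_releases(parse_fasta(release_1), parse_fasta(release_2))
print(sorted(inserted), sorted(deleted), sorted(changed))
# ['seqC'] ['seqB'] ['seqA']
```

A changed record can be treated as a delete of the old sequence followed by an insert of the new one. The inserted and deleted sets therefore carry everything needed to move from one release to the next.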
Kipper – fetch!
What is a poor server admin to do?
Kipper data store
• Metadata file
• Volume file(s)
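The sketch below is one way to picture what a volume file could hold. It assumes each record revision is one tab-separated row giving the record key, the version that created it, the version that deleted it (blank if still live), and the value. This layout is an assumption, not Kipper's documented format. Under it, retrieving any version takes a single filtering pass. `parse_volume` and `extract_version` are hypothetical names.

```python
# Hypothetical volume-file layout (an assumption, not Kipper's documented format):
# one tab-separated row per record revision: key, created, deleted, value.
# An empty "deleted" field means the revision is still live in the latest version.

def parse_volume(lines):
    """Turn raw volume lines into (key, created, deleted, value) tuples."""
    rows = []
    for line in lines:
        key, created, deleted, value = line.rstrip("\n").split("\t", 3)
        rows.append((key, int(created), int(deleted) if deleted else None, value))
    return rows


def extract_version(rows, n):
    """Return {key: value} for every revision that was live in version n."""
    snapshot = {}
    for key, created, deleted, value in rows:
        if created <= n and (deleted is None or deleted > n):
            snapshot[key] = value
    return snapshot


volume = [
    "seqA\t1\t2\tACGT",   # seqA as first published; replaced in version 2
    "seqA\t2\t\tACGTT",   # seqA's revised sequence, still live
    "seqB\t1\t2\tGGCC",   # seqB, deleted in version 2
    "seqC\t2\t\tTTAA",    # seqC, inserted in version 2
]

rows = parse_volume(volume)
print(extract_version(rows, 1))  # {'seqA': 'ACGT', 'seqB': 'GGCC'}
print(extract_version(rows, 2))  # {'seqA': 'ACGTT', 'seqC': 'TTAA'}
```

In this layout each record revision is stored once, tagged with the versions in which it appeared and disappeared. A new release then costs only its changed records, and any past version can be rebuilt from one pass over the file.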
Version listing
• Kipper is a python script
$ Kipper rdp_rna

• Add new version:
$ Kipper rdp_rna -i download.fasta -o .
• Retrieve a version by id:
$ Kipper rdp_rna -e -n11
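The "add new version" step can be sketched on an assumed row layout of (key, created, deleted, value), where deleted is None while a row is live. In this sketch, records that are missing or changed in the incoming release get their deleted field set to the new version id. New or changed records are then appended as live rows. The `add_version` function and the `metadata` dictionary are hypothetical. The dictionary stands in for the metadata file's version listing. Neither reflects Kipper's actual code or file contents.

```python
# Hypothetical sketch of adding a version to a versioned record store.
# Assumed row layout: (key, created, deleted, value); deleted is None while live.
# Assumes at most one live row per key.
from datetime import date


def add_version(rows, metadata, new_snapshot, when):
    """Close off deleted/changed records and append inserted/changed ones."""
    new_id = len(metadata["versions"]) + 1
    # Index of the live row for each key in the current store.
    live = {key: i for i, (key, _c, deleted, _v) in enumerate(rows) if deleted is None}
    updated = list(rows)
    for key, i in live.items():
        if new_snapshot.get(key) != rows[i][3]:        # deleted or changed
            k, created, _d, v = rows[i]
            updated[i] = (k, created, new_id, v)
    for key, value in new_snapshot.items():
        if key not in live or rows[live[key]][3] != value:  # inserted or changed
            updated.append((key, new_id, None, value))
    # Record the new version in the (hypothetical) version listing.
    metadata["versions"].append({"id": new_id, "date": when.isoformat()})
    return updated


metadata = {"versions": []}
rows = add_version([], metadata, {"seqA": "ACGT", "seqB": "GGCC"}, date(2016, 1, 1))
rows = add_version(rows, metadata, {"seqA": "ACGTT", "seqC": "TTAA"}, date(2016, 6, 1))
for row in rows:
    print(row)
# ('seqA', 1, 2, 'ACGT')
# ('seqB', 1, 2, 'GGCC')
# ('seqA', 2, None, 'ACGTT')
# ('seqC', 2, None, 'TTAA')
print([v["id"] for v in metadata["versions"]])  # [1, 2]
```

Unchanged records are never rewritten, so each release adds rows only for its inserts and changes. This is why storing many versions this way can cost far less than keeping a full copy of each release.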
Galaxy - version retrieval
Acknowledgements
This work was supported by Genome Canada / Genome BC Grant “A Federated Bioinformatics Platform for Public Health Microbial Genomics” to Fiona Brinkman, Gary Van Domselaar and William Hsiao. More information about the IRIDA project (Integrated Rapid Infectious Disease Analysis) can be found at http://www.irida.ca