Pooling metagenomes in MEGAN based on environmental parameters
Hans-Joachim Ruscheweyh
Center for Bioinformatics, Tuebingen University
June 15, 2011
Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters
Hans-Joachim Ruscheweyh's talk from the 1st Earth Microbiome Project meeting in Shenzhen
The study of DNA of uncultured organisms
More than 99% of all microbes cannot be cultured
A genome is the entire genetic information of a single organism
A metagenome is the entire genetic information of an assemblage of organisms
The tree reflects the NCBI taxonomy
Reads are compared against a reference database, e.g. NR
Reads are mapped onto the tree using the comparison results, based on the LCA algorithm
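The mapping step can be illustrated with a minimal sketch of lowest-common-ancestor (LCA) assignment. The toy taxonomy, the function names and the hit lists below are assumptions made for illustration; they are not MEGAN code or real NCBI data, and any filtering of hits by score that a real implementation would apply is omitted.

```python
# Toy taxonomy as child -> parent links (illustrative subset, not an NCBI dump).
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Vibrio": "Gammaproteobacteria",
    "Pelagibacter": "Alphaproteobacteria",
}


def path_to_root(taxon):
    """Return the nodes from `taxon` up to and including the root."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def lca(taxa):
    """Lowest common ancestor of a non-empty list of taxa."""
    common = set(path_to_root(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(path_to_root(taxon))
    # Walking upward from the first taxon, the first shared node is the deepest one.
    for node in path_to_root(taxa[0]):
        if node in common:
            return node


def assign_read(hit_taxa):
    """Place a read on the tree; a read without hits stays unassigned (None)."""
    return lca(hit_taxa) if hit_taxa else None


# Hypothetical reads with the taxa of their database hits.
print(assign_read(["Escherichia", "Vibrio"]))
print(assign_read(["Escherichia", "Pelagibacter"]))
```

A read whose hits all fall in one genus is placed at that genus, while a read with hits spread across distant taxa moves up to a higher, less specific node.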
MEGAN communicates with a PostgreSQL database
Many datasets are available in one database instance
Many users can operate on the same database instance
This avoids redundancy on often large datasets
Metadata are, for example, environmental parameters recorded together with the actual metagenomic sample, e.g. collection date, gender, health status, ...
Dataset        Month     Salinity   Ammonia
January_2PM    January   33.3       0.0
January_10PM   January   34.2       0.0
August_4AM     August    33.3       0.14
August_10AM    August    32.1       0.06
Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a ’multi-omic’ study of the seasonal and diel temporal variation; Gilbert et al. (2010)
A primary dataset is a dataset created from the original BLAST output and the reads file
A combined dataset is created from primary datasets
A combined dataset is created by using:
References to read and match data of the primary datasets
Optionally also the classification data of the primary datasets
Hence, a combined dataset can be created time and space efficiently
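The idea of building a combined dataset from references rather than copies can be sketched as follows. The class names, fields and read identifiers are hypothetical and chosen only for illustration; MEGAN's actual storage lives in the PostgreSQL database and is not shown in the slides.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PrimaryDataset:
    name: str
    # read id -> assigned taxon; stands in for read, match and classification data
    assignments: dict


@dataclass
class CombinedDataset:
    name: str
    members: list  # references to PrimaryDataset objects, nothing is copied

    def taxon_counts(self):
        """Aggregate per-taxon read counts across all referenced datasets."""
        counts = Counter()
        for dataset in self.members:
            counts.update(dataset.assignments.values())
        return counts


# Made-up reads for two of the January samples.
jan_2pm = PrimaryDataset("January_2PM", {"r1": "Vibrio", "r2": "Pelagibacter"})
jan_10pm = PrimaryDataset("January_10PM", {"r3": "Vibrio"})

winter = CombinedDataset("winter", [jan_2pm, jan_10pm])
print(winter.taxon_counts())
```

Because the combined dataset only holds references, creating it costs little beyond the list of members, which is consistent with the small time and space overhead reported in the talk.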
Input: 8 primary datasets. Altogether ~100,000 reads, ~4 million matches, ~4.5 GB space
It takes ~50 minutes to load these datasets into the database
Three combined datasets (winter, spring, summer) are created
Their creation takes ~30 seconds and needs ~40 MB additional space
Alternatively, combined datasets can be created on-the-fly. This takes less than a second and needs no additional space
MEGAN communicates with a PostgreSQL database
This gives the user access to many datasets
Many users can work on the database simultaneously
Primary datasets can be pooled to create combined datasets
The MetaData Analyzer allows one to create combined datasets by applying boolean expressions to the assigned metadata
This technique is highly space and time efficient
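Selecting primary datasets with a boolean expression over their metadata might look like the sketch below. The values are the four rows of the metadata table from the talk; the `pool` helper and the use of Python lambdas as the expression syntax are assumptions, since the slides do not show the MetaData Analyzer's own expression language.

```python
# Metadata rows from the talk's example table.
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}


def pool(metadata, predicate):
    """Return the sorted names of all datasets whose metadata satisfy the predicate."""
    return sorted(name for name, row in metadata.items() if predicate(row))


# Hypothetical pooling rules expressed as boolean predicates.
january = pool(METADATA, lambda m: m["Month"] == "January")
low_salinity_or_high_ammonia = pool(
    METADATA, lambda m: m["Salinity"] < 33.0 or m["Ammonia"] > 0.1
)

print(january)
print(low_salinity_or_high_ammonia)
```

Each resulting list of names is what a combined dataset would then reference.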
MEGAN v4 is freely available from www-ab.informatik.uni-tuebingen.de/software/megan
Integrative analysis of environmental sequences using MEGAN4. Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, Stephan C. Schuster; submitted 2011
Thanks go to Daniel Huson, Suparna Mitra, Nico Weber, Stephan Schuster
Thank you for your attention!