Challenges for metagenomic data analysis and lessons from viral metagenomes [What would you do if sequencing were free?] Rob Edwards http://phage.sdsu.edu/~rob San Diego State University Fellowship for Interpretation of Genomes SGM Meeting, Warwick, April 200
Challenges for metagenomic data analysis and lessons from viral metagenomes
[What would you do if sequencing were free?]
Rob Edwards
http://phage.sdsu.edu/~rob
San Diego State University
Fellowship for Interpretation of Genomes
SGM Meeting, Warwick, April 2006
Outline
• The envy is not mine
• A tour around the world, thanks to phage
• People suck
• What is the most successful gene in evolution?
• Is there a Future?
This is all 454 sequence data
• 21 libraries
  – 10 microbial, 11 phage
• 597,340,328 bp total
  – 20% of the human genome
  – 50% of all complete and partial microbial genomes
• 5,769,035 sequences
  – Average 274,716 per library
• Average read length 103.5 bp
  – Average read length has not increased in 7 months
• Cost 0.04¢ per bp
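The derived figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the totals quoted above; the human genome size of roughly 3 billion bp is an assumption added here for the comparison, and the total cost is simply the quoted per-base price multiplied out, not a figure from the talk.

```python
# Totals quoted on the slide.
total_bp = 597_340_328
n_sequences = 5_769_035
n_libraries = 21
cost_cents_per_bp = 0.04

# Assumption (not from the slide): human genome is roughly 3 billion bp.
human_genome_bp = 3.0e9

sequences_per_library = n_sequences / n_libraries
mean_read_length = total_bp / n_sequences
fraction_of_human_genome = total_bp / human_genome_bp
total_cost_dollars = total_bp * cost_cents_per_bp / 100  # cents to dollars

print(f"Sequences per library: {sequences_per_library:,.0f}")
print(f"Mean read length: {mean_read_length:.1f} bp")
print(f"Fraction of human genome: {fraction_of_human_genome:.0%}")
print(f"Implied total cost: ${total_cost_dollars:,.0f}")
```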
Sequencing is cheap and easy.
Bioinformatics is neither.
The Soudan Mine, Minnesota
Red Stuff: Oxidized
Black Stuff: Reduced
Red and Black Samples Are Different
Cloned and 454-sequenced 16S are indistinguishable
Black stuff
Red
Cloned Red
There are different amounts of metabolism in each environment
There are different amounts of substrates in each environment
Black Stuff
Red Stuff
But are the differences significant?
• Sample 10,000 proteins from site 1
• Count frequency of each “subsystem”
• Repeat 20,000 times
• Repeat for sample 2
• Combine both samples
• Sample 10,000 proteins 20,000 times
• Build 95% CI
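The resampling steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code used for the talk. The slide does not say how the 95% CI is compared with the per-site counts, so the decision rule here is an assumption: a subsystem is flagged when the difference between the two sites' median resampled counts falls outside the 95% interval of differences between pairs of draws from the combined sample. The function names and the input format (one subsystem label per protein) are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(proteins, n_proteins, n_reps, rng):
    """Draw n_proteins labels with replacement, n_reps times.

    Returns one Counter of subsystem frequencies per replicate.
    """
    return [Counter(rng.choices(proteins, k=n_proteins)) for _ in range(n_reps)]


def percentile_interval(values, level=0.95):
    """Central percentile interval of a list of numbers."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int((1.0 - tail) * last)]


def compare_subsystems(site1, site2, n_proteins=10_000, n_reps=20_000, seed=0):
    """Flag subsystems whose counts differ between two sites.

    site1 and site2 are lists with one subsystem label per protein.
    """
    rng = random.Random(seed)
    reps1 = resample_counts(site1, n_proteins, n_reps, rng)
    reps2 = resample_counts(site2, n_proteins, n_reps, rng)

    # Combined sample: both sites pooled, as on the slide.
    pooled = list(site1) + list(site2)
    null_a = resample_counts(pooled, n_proteins, n_reps, rng)
    null_b = resample_counts(pooled, n_proteins, n_reps, rng)

    results = {}
    for subsystem in sorted(set(pooled)):
        # Assumed test statistic: difference of median resampled counts.
        observed = (median(c[subsystem] for c in reps1)
                    - median(c[subsystem] for c in reps2))
        # Differences expected if both sites shared one distribution.
        null_diffs = [a[subsystem] - b[subsystem] for a, b in zip(null_a, null_b)]
        low, high = percentile_interval(null_diffs, level=0.95)
        results[subsystem] = {
            "observed": observed,
            "ci": (low, high),
            "significant": observed < low or observed > high,
        }
    return results
```

With the slide's settings (10,000 proteins, 20,000 repeats) this pure-Python version is slow; smaller values such as `compare_subsystems(red, black, n_proteins=200, n_reps=300)` are enough to try it out, where `red` and `black` are placeholder lists of labels. Pooling by simple concatenation also weights the larger site more heavily, which may or may not match the original analysis.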