Metagenomic tools for the fungal community Holly Bik, UC Davis 19 October 2012
Dec 05, 2014
http://phylosift.wordpress.com
Explicitly Phylogenetic Approaches
Aligned environmental sequences
Guide Tree
Evolutionary Placement of short reads
We provide:
• Support for paired-end (raw) Illumina data
• Marker gene data for Bacteria, Archaea, Eukaryotes, and Viruses
• Taxonomy assignments based on probability distributions over a reference phylogeny
• Complement to existing tools – QIIME/VAMPS
  – Inputs/outputs will be compatible for use with other software tools
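As an illustration of how a probability distribution over a reference phylogeny can yield a taxonomy assignment, here is a minimal sketch. It assumes each reference edge has already been mapped to a root-to-tip lineage and that a read's placement probabilities over edges are known; the function name, edge IDs, lineages and the 0.9 threshold are hypothetical, and this is not PhyloSift's own code. Support for a taxon is the summed probability of all edges whose lineage contains it, and the read is assigned to the most specific taxon that meets the threshold.

```python
from collections import defaultdict

def assign_taxonomy(edge_probs, edge_lineage, min_prob=0.9):
    """Return (taxon, support) for the most specific taxon with support >= min_prob.

    edge_probs:   {edge_id: posterior probability of the read on that edge}
    edge_lineage: {edge_id: tuple of taxa ordered from root to most specific}
    With min_prob > 0.5, all taxa passing the threshold lie on one lineage.
    """
    support = defaultdict(float)
    for edge, p in edge_probs.items():
        # Placement mass on an edge supports every taxon in that edge's lineage.
        for taxon in edge_lineage[edge]:
            support[taxon] += p
    best, best_depth = None, -1
    for edge in edge_probs:
        for depth, taxon in enumerate(edge_lineage[edge]):
            if support[taxon] >= min_prob and depth > best_depth:
                best, best_depth = taxon, depth
    return best, (support[best] if best is not None else 0.0)

# Hypothetical placement of one read spread over three reference edges.
edge_probs = {"e1": 0.6, "e2": 0.35, "e3": 0.05}
edge_lineage = {
    "e1": ("Bacteria", "Proteobacteria", "Nitrosomonas"),
    "e2": ("Bacteria", "Proteobacteria", "Nitrosospira"),
    "e3": ("Archaea", "Thaumarchaeota"),
}
print(assign_taxonomy(edge_probs, edge_lineage))
```

Here neither genus alone reaches 0.9, so the read is reported at the phylum level, where the probability mass of both genera combines.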
Markers
• PMPROK – Dongying Wu's Bac/Arch markers
• Eukaryotic Orthologs – Parfrey 2011 paper
• 16S/18S rRNA
• Mitochondria – protein-coding genes
• Viral Markers – Markov clustering on genomes
• Codon Subtrees – finer scale taxonomy
• Extended Markers – plastids, gene families
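The viral markers are described as coming from Markov clustering (MCL) on genomes. The toy implementation below shows the core MCL loop on a small similarity matrix: it alternates expansion (squaring the column-stochastic matrix) with inflation (elementwise powers followed by renormalization) until the matrix stops changing. The input matrix, the inflation value of 2.0 and the tolerances are illustrative assumptions. A real analysis would run the MCL program on all-vs-all genome similarities rather than this sketch.

```python
def normalize_columns(m):
    """Scale each column of a square matrix so it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]

def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a symmetric similarity matrix (list of lists)."""
    n = len(similarity)
    # Add self-loops (standard MCL practice), then make columns stochastic.
    m = normalize_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: two steps of the random walk (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: elementwise power, then renormalize; strong flows get stronger.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that retain flow ("attractors") list the members of each cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Two hypothetical groups of genomes with no similarity between the groups.
sim = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(sim))
```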
Reference Marker Genes
The Monkey – Build Marker Packages
Inputs:
• Alignment File (marker sequences in FASTA format)
• Mapping File (sequence name, NCBI taxon ID)
Execute build_marker mode:
• Generate unique IDs for input sequences
• Create profile HMMs (or CMs for rRNA data) using input sequences – hmmbuild (ssu-build for rRNA)
• Build tree (FastTree) and collapse topology according to a user-specified PD cutoff (e.g. 99%)
Tree Reconciliation:
• Reconcile NCBI taxonomy IDs with phylogenetic topology
• Quantitative metric (minimum Hamming distance) used to match edges between NCBI taxon tree and molecular phylogeny
Built Marker Packages:
• Clean and package new marker genes
• New marker gene packages placed into shared PhyloSift marker directory
• Built PhyloSift marker package: tree, HMM profile (CMs for rRNA), taxon map, representative sequences, alignment
Index Marker Database:
• Execute index mode
• Indexes the marker databases needed for LAST and Bowtie
• Locally indexed marker packages will not interfere with automatic updates to PhyloSift core markers
NOTE: New marker packages are named according to input filenames (e.g. MarkerAlignment.fasta). Core marker data will be overwritten during new marker builds if input filenames are not distinct from those of existing PhyloSift markers.
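The tree reconciliation step matches NCBI taxa to edges of the molecular phylogeny using a minimum Hamming distance. A minimal sketch of that idea follows. It assumes a rooted phylogeny in which each edge is represented by the set of leaves below it, and each NCBI taxon by the set of leaves assigned to it. Both sets are encoded as 0/1 vectors over a shared leaf order, and each taxon is matched to the edge whose vector differs in the fewest positions. The function names and toy tree are hypothetical and do not reproduce PhyloSift's reconciliation code.

```python
def to_bits(clade, leaf_order):
    """Encode a set of leaves as a 0/1 vector over a fixed leaf order."""
    return [1 if leaf in clade else 0 for leaf in leaf_order]

def hamming(x, y):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(x, y))

def match_taxa_to_edges(taxon_clades, edge_clades):
    """Match each taxon to the tree edge whose leaf set is closest in Hamming distance.

    taxon_clades: {taxon: set of leaves assigned to that taxon in the NCBI tree}
    edge_clades:  {edge: set of leaves below that edge in the rooted phylogeny}
    """
    leaf_order = sorted(set().union(*taxon_clades.values(), *edge_clades.values()))
    edge_bits = {e: to_bits(c, leaf_order) for e, c in edge_clades.items()}
    matches = {}
    for taxon, clade in taxon_clades.items():
        bits = to_bits(clade, leaf_order)
        best = min(edge_bits, key=lambda e: hamming(bits, edge_bits[e]))
        matches[taxon] = (best, hamming(bits, edge_bits[best]))
    return matches

# Hypothetical rooted tree ((a,b),(c,(d,e))) with its internal edges.
edges = {"e_ab": {"a", "b"}, "e_cde": {"c", "d", "e"}, "e_de": {"d", "e"}}
taxa = {"GenusX": {"a", "b"}, "GenusY": {"c", "d"}}
print(match_taxa_to_edges(taxa, edges))
# {'GenusX': ('e_ab', 0), 'GenusY': ('e_cde', 1)}
```

GenusY is not monophyletic in this toy tree, so no edge matches it exactly. It is placed on the edge that is off by a single leaf.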
The Kangaroo – Simulation Data
Execute sim mode
Genome Directory:
• Define the number of genomes to pick (default = 10) and number of reads to generate per file (default = 100,000)
• Grinder algorithm randomly generates reads from selected genomes, outputs simulated PE-Illumina and 454 datasets
Select Taxa (PD on concatenated tree):
• Determines PD contributions for taxa present in the concatenated guide tree in the PhyloSift marker directory
• Two separate approaches used:
  1. Select some number of taxa that contribute to PD (user input, default = 10 taxa)
  2. Sample taxa uniformly without replacement
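The two taxon-selection strategies can be sketched as follows. This sketch rests on several assumptions. The guide tree is stored as a parent-pointer map with branch lengths. PD is rooted PD, meaning the total length of the union of root-to-tip paths. "Contributing to PD" is interpreted as greedily adding the taxon that increases PD the most. The tree format and function names are hypothetical; this is not PhyloSift's sim mode.

```python
import random

def root_path(tip, parent):
    """Edges (each named by its child node) on the path from a tip to the root."""
    edges, node = [], tip
    while parent[node][0] is not None:
        edges.append(node)
        node = parent[node][0]
    return edges

def phylogenetic_diversity(tips, parent):
    """Rooted PD: total length of the union of root-to-tip paths."""
    covered = set()
    for tip in tips:
        covered.update(root_path(tip, parent))
    return sum(parent[e][1] for e in covered)

def select_by_pd(tips, parent, k=10):
    """Greedily add the tip that increases PD the most, k times."""
    chosen = []
    for _ in range(min(k, len(tips))):
        candidates = [t for t in tips if t not in chosen]
        best = max(candidates, key=lambda t: phylogenetic_diversity(chosen + [t], parent))
        chosen.append(best)
    return chosen

def select_uniform(tips, k=10, seed=None):
    """Sample k tips uniformly without replacement."""
    return random.Random(seed).sample(tips, min(k, len(tips)))

# Hypothetical guide tree: node -> (parent, length of the branch above node).
parent = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "A": ("n1", 0.5),
    "B": ("n1", 0.4),
    "C": ("root", 1.0),
}
tips = ["A", "B", "C"]
print(select_by_pd(tips, parent, k=2))  # ['C', 'A']
print(select_uniform(tips, k=2, seed=1))
```

The greedy strategy favors taxa on long, isolated branches (C first), whereas uniform sampling ignores tree shape entirely.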
Generated Simulated Reads
Knockout Swaths of Taxa:
• Workflow plugs into updateDB to remove genomes that were used to simulate metagenome data, as well as a swath of related taxa.
Simulation Marker Directory:
• A new marker directory is created in which the simulated genomes have been knocked out of the marker packages.
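A minimal sketch of knocking out a "swath" of taxa follows. It assumes that the swath means every reference taxon within a chosen patristic (path-length) radius of any simulated genome, and that the tree is stored as a parent-pointer map with branch lengths. Both the radius-based definition and the function names are illustrative assumptions, not the updateDB implementation.

```python
def distances_to_ancestors(node, parent):
    """Map node and each of its ancestors to the path length from node."""
    out, dist = {}, 0.0
    while node is not None:
        out[node] = dist
        up, length = parent[node]
        dist += length
        node = up
    return out

def patristic(a, b, parent):
    """Path length between two nodes via their most recent common ancestor."""
    up_a = distances_to_ancestors(a, parent)
    node, dist = b, 0.0
    while node not in up_a:
        up, length = parent[node]
        dist += length
        node = up
    return dist + up_a[node]

def knock_out(reference_taxa, simulated, parent, radius):
    """Drop simulated genomes plus every reference taxon within `radius` of one."""
    removed = {t for t in reference_taxa
               if any(patristic(t, s, parent) <= radius for s in simulated)}
    kept = [t for t in reference_taxa if t not in removed]
    return kept, sorted(removed)

# Hypothetical tree: node -> (parent, length of the branch above node).
parent = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "A": ("n1", 0.5),
    "B": ("n1", 0.4),
    "C": ("root", 1.0),
}
print(knock_out(["A", "B", "C"], ["A"], parent, radius=1.0))  # (['C'], ['A', 'B'])
```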
Compute metrics between target and remaining taxa:
• Calculated metrics include: the distance to nearest neighbors, connecting branch lengths, and the number of sampled nodes within various PD units of connecting nodes.
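Two of these metrics, nearest-neighbor distance and the number of remaining taxa within several PD radii, can be sketched in a few lines. The sketch assumes pairwise patristic distances have already been computed into a nested dictionary. The radii, taxon names and distances are hypothetical, and connecting branch lengths are not covered here.

```python
def target_metrics(target, remaining, dist, radii=(0.25, 0.5, 1.0)):
    """Nearest-neighbor distance and neighbor counts within several PD radii.

    dist: {taxon: {other_taxon: patristic distance}}, assumed precomputed.
    """
    by_distance = sorted((dist[target][t], t) for t in remaining)
    nearest_distance, nearest_taxon = by_distance[0]
    counts = {r: sum(1 for d, _ in by_distance if d <= r) for r in radii}
    return {"nearest_taxon": nearest_taxon,
            "nearest_distance": nearest_distance,
            "count_within": counts}

# Hypothetical distances from a knocked-out target "A" to the remaining taxa.
dist = {"A": {"C": 1.6, "D": 0.3}}
print(target_metrics("A", ["C", "D"], dist, radii=(0.25, 0.5, 2.0)))
# {'nearest_taxon': 'D', 'nearest_distance': 0.3, 'count_within': {0.25: 0, 0.5: 1, 2.0: 2}}
```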
DBupdate – Mining new genomes
Execute phylosift_dbupdate.pl
EBI Genomes
Run PhyloSift (search + align)
Amino Acid Tree
A taxa set is selected with a maxPD cutoff of 0.02 and a new tree is inferred
Infer Updated Tree
PD metric used to split the guide tree into smaller subtrees; subsets of taxa are selected such that no branch connecting them has length >0.X for some value of X
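One way to read this splitting rule is to cut every branch longer than the threshold and take the tips of each remaining connected piece as a subset. The sketch below follows that interpretation. It assumes a rooted tree stored as a parent-pointer map with branch lengths; the threshold is a parameter because the slide leaves its value open. The cut-based reading and the function name are assumptions, not PhyloSift's code.

```python
from collections import defaultdict

def split_on_long_branches(parent, tips, max_len):
    """Cut branches longer than max_len; return the tip sets of the pieces."""
    def piece_top(node):
        # Climb toward the root until the next branch up would be cut.
        while parent[node][0] is not None and parent[node][1] <= max_len:
            node = parent[node][0]
        return node
    groups = defaultdict(list)
    for tip in tips:
        groups[piece_top(tip)].append(tip)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical tree: node -> (parent, length of the branch above node).
parent = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "A": ("n1", 0.5),
    "B": ("n1", 0.4),
    "C": ("root", 1.0),
}
print(split_on_long_branches(parent, ["A", "B", "C"], 0.6))   # [['A', 'B'], ['C']]
print(split_on_long_branches(parent, ["A", "B", "C"], 0.45))  # [['A'], ['B'], ['C']]
```

Lowering the threshold produces more, smaller subtrees, which is the lever the PD metric provides.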
Add new sequences to marker packages
JGI Genomes
Private Genomes
NCBI Genomes
Nucleotide Tree
Prune Tree
Update reference sequences with new data
New sequences added at 0.25 PD for the amino acid tree; a higher PD threshold enables more aggressive searches of the reference database, since LAST searching is faster with fewer sequences.
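Here is a minimal sketch of adding new sequences only when they are sufficiently novel. A candidate is kept only if it lies at least the threshold (0.25 here) from every sequence already in the reference set, so a higher threshold keeps fewer sequences. For a self-contained example, simple p-distance on aligned sequences stands in for the tree-based PD that PhyloSift uses. That substitution, the function names and the toy sequences are all assumptions.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def add_novel_sequences(references, candidates, min_dist=0.25, distance=p_distance):
    """Add a candidate only if it is at least min_dist from every kept sequence."""
    kept, added = list(references), []
    for seq in candidates:
        if all(distance(seq, ref) >= min_dist for ref in kept):
            kept.append(seq)
            added.append(seq)
    return kept, added

# Hypothetical aligned protein fragments.
kept, added = add_novel_sequences(["MKTAYIAK"], ["MKTAYIAR", "MSTGYLAR", "MSTGYLAK"])
print(added)  # ['MSTGYLAR']
```

The first candidate differs from the reference at only 1 of 8 sites and the third differs from the newly added sequence at 1 of 8 sites, so both are skipped.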
Tree Reconciliation
Reconcile NCBI taxonomy IDs with phylogenetic topologies, for both the amino acid tree and codon subtrees
Codon Subtrees
Package Markers
Automated Download to PhyloSift Users
Users' local marker databases are automatically scanned each time PhyloSift is run, and any new updates are automatically downloaded if available
Tree Reconciliation in PhyloSift
Figure: environmental sequences and named taxa on the reference tree, with reconciliation cases labeled "Great!", "Not bad", and "Getting tricky…"
Tree Placement: Fat Tree – Guppy
Marine Metagenome
Figure labels: Chemoautotrophic bacteria – oxidize ammonia into nitrite; Alveolate protists; Common seawater Archaea
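A fat tree draws each branch with a width that reflects how much placement mass falls on it. The sketch below computes per-edge mass from a placement file in the jplace JSON format used by pplacer and guppy. It looks up the `edge_num` and `like_weight_ratio` columns through the file's `fields` list and counts one unit per name in `n`, or uses the masses given in `nm`. It spreads each read's mass across its candidate edges by likelihood weight ratio. That weighting choice is an assumption and may differ from guppy's own defaults. The sample file is fabricated for illustration only.

```python
import json
from collections import defaultdict

def edge_mass(jplace_text):
    """Sum placement mass per edge, weighting each placement by like_weight_ratio."""
    doc = json.loads(jplace_text)
    fields = doc["fields"]
    edge_i = fields.index("edge_num")
    lwr_i = fields.index("like_weight_ratio")
    mass = defaultdict(float)
    for pl in doc["placements"]:
        if "nm" in pl:
            # Names with explicit masses: [[name, mass], ...]
            multiplicity = sum(m for _, m in pl["nm"])
        else:
            names = pl.get("n", [])
            multiplicity = 1 if isinstance(names, str) else len(names)
        for row in pl["p"]:
            mass[row[edge_i]] += multiplicity * row[lwr_i]
    return dict(mass)

# Fabricated two-read example; edge numbers appear in braces in the tree string.
sample = json.dumps({
    "tree": "((A:0.1{0},B:0.2{1}):0.1{2},C:0.3{3}){4};",
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[0, -100.0, 0.75, 0.05, 0.01], [1, -101.1, 0.25, 0.1, 0.02]],
         "n": ["read1"]},
        {"p": [[3, -90.0, 1.0, 0.1, 0.01]], "nm": [["read2", 2.0]]},
    ],
    "version": 3,
    "metadata": {},
})
print(edge_mass(sample))  # {0: 0.75, 1: 0.25, 3: 2.0}
```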
Tree Placement: Tog Tree – Guppy
Marine Metagenome
Tree Placement: Sing Tree – Guppy
Linking with the Fungal ITS community
• How does fungal ITS sequence data relate to your project?
  – PhyloSift has the capability to add any marker gene reference packages that are relevant for specific taxonomic communities
• What fungal ITS data does your project currently provide?
  – None, but we do mine other marker genes from fungal genomes
• What fungal ITS data is your project hoping to provide?
  – We wouldn't provide data, but can work with users to increase support for fungal analyses
• Is your project involved with curating fungal ITS sequences?
  – No, but we would curate alignments and marker packages of ITS sequences mined from public databases
• If so, what curation strategies are being implemented for your project?
  – Alignment filtering and masking, pruning reference trees
• What tools for working with fungal ITS sequences does your project currently provide?
  – None so far, but they can be implemented if given a reference dataset (e.g. an alignment)
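Alignment masking is listed above as a curation strategy. One common and simple form is dropping columns that are mostly gaps, which matters for length-variable regions such as ITS. The sketch below shows that form only. The 0.5 gap threshold, the treatment of "-" and "." as gaps and the function name are assumptions, not PhyloSift's masking procedure.

```python
def gap_mask(alignment, max_gap_fraction=0.5):
    """Drop columns where the fraction of gap characters exceeds max_gap_fraction.

    alignment: {sequence name: aligned sequence}, all of equal length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        col for col in range(length)
        if sum(alignment[n][col] in "-." for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][c] for c in keep) for n in names}

# Hypothetical three-sequence alignment; column 3 is gaps in two of three rows.
aln = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACT-T"}
print(gap_mask(aln))  # {'s1': 'ACGT', 's2': 'A-GT', 's3': 'AC-T'}
```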
• What tools are you developing / planning to develop?
  – Current focus is on multisample comparisons
  – Gene tree reconciliation
  – Probability distribution over tree topology to delimit OTUs (phylogenetic OTUs)
• What framework of fungal taxonomy does your project use?
  – NCBI-derived taxonomy (because of tree mapping/reconciliation issues)
SATELLITE MEETING
Eukaryotic Metagenomics
March/April 2013 UC Davis
Acknowledgements
UC Davis
• Jonathan Eisen
• Aaron Darling
• Guillaume Jospin
• Dongying Wu
• David Coil
PhyloSift Software Development on GitHub: https://github.com/gjospin/PhyloSift
Google Group for user support: https://groups.google.com/d/forum/phylosift
Twitter: @PhyloSift