Extending Cyberinfrastructure for Gene Tree Reconciliation
James Estill, John Bowers, Hariolf Haefele, Adam Kubach, Naim Matasci, Sheldon McKay, Andrew Muir, Dennis Roberts, Sriram Srinivasan, Cécile Ané, Jim Leebens-Mack, Todd Vision
iEvoBio, June 21, 2011
The reconciliation of gene trees to species trees uses the species tree to infer the history of evolutionary events, such as gene duplication and loss, in an individual gene family. A cyberinfrastructure for tree reconciliation (TR) has been developed that includes an extensible pipeline for high-throughput reconciliation of gene trees to species trees, database utilities, and a visualization tool. The TR database schema extends the Ensembl Compara database to include species trees and the mapping between the nodes of a gene tree and the species tree used for that reconciliation. This permits large-scale analysis of the distribution of gene tree events on the species tree, and comparison of the evolutionary timing of events between gene trees. The Chado controlled vocabulary module was also incorporated to support the use of OBO ontologies to tag attribute values within the database. The schema supports multiple reconciliations for each gene tree, and an ontology for TR was developed to support storage of metadata for TR methodologies. Additions to the BioPerl Tree API allow for direct import of reconciled trees in PRIME format, and utilities have been provided to populate the database from de novo analyses of gene tree reconciliations. Queries against the database are facilitated by a RESTful web API that allows for BLAST searches against gene sequences in the database, as well as searches for GO term assignments among gene families. These tools support comparative analysis of reconciliation methodologies, which we illustrate by reporting an evaluation of the accuracy of methods that reconcile gene trees individually relative to synteny-informed reconstructions of genome duplication history. We also illustrate a novel visualization tool for interactively exploring the mapping between gene trees and the species tree.
Transcript
Extending Cyberinfrastructure for Gene Tree Reconciliation
James Estill, John Bowers, Hariolf Haefele, Adam Kubach, Naim Matasci, Sheldon McKay, Andrew Muir, Dennis Roberts, Sriram Srinivasan, Cécile Ané, Jim Leebens-Mack, Todd Vision
iEvoBio, June 21, 2011
iPlant Tree of Life (iPTOL)
• Tree Reconciliation
• Big Trees
• Data Assembly
• Trait Evolution
• Data Integration
• Tree Visualization
Gene Tree Reconciliation
Projection of gene trees onto a species tree
• gene duplications
• gene losses
• lineage sorting
• horizontal transfer
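As an illustration of what projecting a gene tree onto a species tree can look like, here is a minimal Python sketch of the classic last-common-ancestor (LCA) mapping, which labels each internal gene tree node as a duplication or a speciation. This is a textbook technique, not necessarily the method used by the tools named in these slides. The species tree topology ((A,B)D,C)E, the gene names, and all function names are assumptions made for the example, and gene losses, lineage sorting, and horizontal transfer are not handled.

```python
# Hypothetical species tree ((A,B)D,C)E stored as child -> parent.
SPECIES_PARENT = {"A": "D", "B": "D", "C": "E", "D": "E", "E": None}


def ancestors(node, parent):
    """Return the path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def lca(a, b, parent):
    """Last common ancestor of two species tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, gene_species, parent):
    """Map each internal node of a binary gene tree (nested tuples of gene
    names) to a species tree node and label it, in post-order."""
    events = []

    def visit(node):
        if isinstance(node, str):          # leaf: look up the gene's species
            return gene_species[node]
        left, right = (visit(child) for child in node)
        here = lca(left, right, parent)
        # A node mapping to the same species node as one of its children
        # is inferred to be a duplication; otherwise it is a speciation.
        label = "duplication" if here in (left, right) else "speciation"
        events.append((here, label))
        return here

    visit(gene_tree)
    return events


# Hypothetical gene family: two copies in species A, one each in B and C.
gene_species = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
gene_tree = (("a1", "b1"), ("a2", "c1"))
events = reconcile(gene_tree, gene_species, SPECIES_PARENT)
```

In this toy case the root of the gene tree maps to the same species node as one of its children, so it is labeled a duplication.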
Gene Tree Reconciliation
• Locating gene duplications allows us to identify orthologs and paralogs
• Identify gene composition in inferred ancestral genomes
• Map the positions of ancestral polyploidy events
• Contribute to the study of the “fate” of duplicated genes
• Address questions of gene family coevolution
Existing Cyberinfrastructure
[Diagram labels: Gene Trees, Generate Reconciliations, Visualize Reconciliations, TreeBeST, primeGSR, primeTV, fltreebest, EC]
Extending Cyberinfrastructure
• Increased interoperability among the component pieces
• Query the location of gene duplications on the species tree
• Integrate tree visualization tools that scale to many thousands of nodes
• Allow for the storage and analysis of multiple reconciliations for a single gene tree within a single database structure
Extending Cyberinfrastructure
[Diagram labels: Species Trees, Gene Trees, Generate Reconciliations, Visualize Reconciliations, TreeBeST, primeGSR, primeTV, fltreebest]
Extending Cyberinfrastructure
[Diagram: adding species trees. Labels: Species Trees, Reconciled, Gene Trees, Generate Reconciliations, Visualize Reconciliations, TreeBeST, primeGSR, primeTV, fltreebest]
[Figure: three panels. Reconciled Tree: labeled with species tree nodes A to E and gene tree nodes 1 to 23. Gene Tree: 12 genes from 3 species, 23 nodes in the gene tree. Species Tree: 3 species, 5 nodes (A to E).]
Mapping Host to Guest
• Map the guest tree onto the host tree by defining the position on a host tree edge that each gene tree node maps to
[Figure: a host tree edge running from parent node A to child node B, with a guest tree edge joining guest tree nodes 1 and 2.]
Mapping Host to Guest
• Guest nodes can map to four general locations on host edges
  1. Inside Parent Node
  2. Inside Child Node
  3. Edge Between Host Nodes
  4. Outside of Host Edge
Mapping Host to Guest
• Locations stored in a reconciliation map table

map id   guest node   host parent node   host child node
1001     1            A                  A
1002     2            B                  B
1003     3            A                  B
1004     4            NULL               A
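A minimal Python sketch of such a table, using an in-memory SQLite database, may make the encoding concrete. The table name `reconciliation_map` and the column names are assumptions derived from the column headers on the slide, not the actual TR schema. The helper `describe` is hypothetical and encodes one reading of the four rows: equal parent and child means the guest node sits inside a host node, a NULL parent means it lies outside the host edge, and anything else places it on the edge between two host nodes.

```python
import sqlite3

# Assumed table and column names; the real TR schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reconciliation_map (
        map_id           INTEGER PRIMARY KEY,
        guest_node       INTEGER NOT NULL,
        host_parent_node TEXT,
        host_child_node  TEXT NOT NULL
    )
    """
)

# The four example rows from the slide (None is stored as SQL NULL).
conn.executemany(
    "INSERT INTO reconciliation_map VALUES (?, ?, ?, ?)",
    [
        (1001, 1, "A", "A"),
        (1002, 2, "B", "B"),
        (1003, 3, "A", "B"),
        (1004, 4, None, "A"),
    ],
)


def describe(host_parent, host_child):
    """Hypothetical helper: turn a (parent, child) pair into a location label."""
    if host_parent is None:
        return "outside of host edge"
    if host_parent == host_child:
        return "inside host node"
    return "edge between host nodes"


rows = conn.execute(
    "SELECT guest_node, host_parent_node, host_child_node "
    "FROM reconciliation_map ORDER BY map_id"
)
locations = {guest: describe(parent, child) for guest, parent, child in rows}
```

Whether an "inside host node" row refers to the parent or the child end of a host edge depends on which edge is being considered, so the sketch does not try to distinguish the two.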
Reconciliation
• Reconciliation is a mapping of the nodes of the guest tree (gene tree) onto the nodes and edges of the host tree (species tree)
• The topology of the two trees is stored separately from the mapping of the reconciliation itself
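One way to picture this separation is the following Python sketch, in which each tree holds only its topology and each reconciliation holds only the guest-to-host mapping. All class names, the method labels `method_1` and `method_2`, and the node names are hypothetical. Keeping the mapping apart from the topologies is what lets several reconciliations coexist for one gene tree.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (host parent node, host child node); the parent is None above the host edge.
HostEdge = Tuple[Optional[str], str]


@dataclass
class Tree:
    # Topology only: each node points to its parent, the root points to None.
    parent: Dict[str, Optional[str]]


@dataclass
class Reconciliation:
    method: str                      # label for the methodology used
    node_map: Dict[str, HostEdge]    # guest node -> location on the host tree


@dataclass
class GeneFamily:
    guest: Tree                      # gene tree
    host: Tree                       # species tree
    reconciliations: List[Reconciliation] = field(default_factory=list)


def guest_nodes_per_host_edge(rec: Reconciliation) -> Counter:
    """Count how many guest nodes a reconciliation places at each host location."""
    return Counter(rec.node_map.values())


host = Tree({"A": None, "B": "A"})
guest = Tree({"g1": None, "g2": "g1", "g3": "g1"})
family = GeneFamily(guest, host)

# Two alternative mappings of the same guest tree onto the same host tree.
family.reconciliations.append(
    Reconciliation("method_1", {"g1": ("A", "A"), "g2": ("A", "B"), "g3": ("A", "B")})
)
family.reconciliations.append(
    Reconciliation("method_2", {"g1": (None, "A"), "g2": ("A", "A"), "g3": ("B", "B")})
)

counts = {rec.method: guest_nodes_per_host_edge(rec) for rec in family.reconciliations}
```

Adding the second reconciliation leaves both tree topologies untouched, which is the property the slide describes.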