Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination
Author: Dan Gusfield
Presentation by: C. Badri Narayanan
Feb 25, 2016
Agenda
• Main Problem – Root-Unknown galled-tree problem
• Solving Optimal Root-Unknown Galled-Tree Problem
Root-Unknown Galled-Tree problem
Given a set of sequences (say, M), find a galled-tree for M with the minimum number of recombinations if one exists; otherwise, output "none"
Let’s see the approach previously taken
Points Considered in Theorem(s)
• Only single-crossover recombinations are considered
• The algorithm will be extended to multiple crossover recombinations
Before seeing the approach let’s consider some definitions
Definition of Terms
• Trivial Component: A node of the incompatibility graph with no edges
• Component (a.k.a. Connected/Non-Trivial Component): A maximal set of nodes in which every pair of nodes is joined by at least one path
• Reduced galled-tree: A galled-tree in which no gall contains a character site from a trivial component
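The definitions above can be illustrated with a minimal sketch. It assumes the standard four-gamete test for incompatibility (two sites are incompatible when all of 00, 01, 10, 11 occur in them), which the slides do not spell out; the function names, the 0-based site numbering, and the example matrix are hypothetical.

```python
from itertools import combinations

def incompatible(M, i, j):
    """Assumed four-gamete test: sites i and j show all of 00, 01, 10, 11."""
    return len({(row[i], row[j]) for row in M}) == 4

def incompatibility_components(M):
    """Return the connected components (lists of 0-based sites) of the incompatibility graph."""
    m = len(M[0])
    adj = {s: set() for s in range(m)}
    for i, j in combinations(range(m), 2):
        if incompatible(M, i, j):
            adj[i].add(j)
            adj[j].add(i)
    seen, comps = set(), []
    for s in range(m):
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:  # depth-first search from site s
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps

# Hypothetical input: sites 0 and 1 are incompatible, sites 2 and 3 are not.
M = ["0000", "1000", "0100", "1100", "0010"]
components = incompatibility_components(M)
trivial = [c for c in components if len(c) == 1]
```

Here `components` holds one non-trivial component (sites 0 and 1), while sites 2 and 3 each form a trivial component.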
Previous Approaches – A Roadmap
• To construct a galled-tree for M with known ancestral sequence (say, A):
– Focus on each non-trivial component of the incompatibility graph separately
– For each component in the incompatibility graph, determine the site arrangement on a gall
– Connect the galls in a tree structure
– Place the sites from the trivial components
Difficulties for Unknown Ancestral Sequence
• For any two sequences S & S’ (in M), the conflict and incompatibility graphs may be different
• How do we know which (ancestral) sequence will allow a galled-tree?
Optimal Galled-Tree
• A galled-tree that minimizes the number of recombinations over all galled-trees for a set of sequences (say, M), and over all choices of ancestral sequence, is called an “Optimal Galled-Tree”
• The ancestral sequence of an optimal galled-tree is called an “optimal ancestral sequence”
Author’s Approach: Theorem on Galled Trees – Finding an Ancestral Sequence
If there is a galled-tree for M with some ancestral sequence, then there is an optimal galled-tree for M where the (optimal) ancestral sequence is one of the sequences in M
Proof for the Theorem
T – optimal galled-tree for M
A – ancestral sequence for T
Every gall must have at least three edges branching off of it
Proof continued….
Choose a path P in T from the root to some leaf z that does not contain any recombination nodes
Zz – the sequence labeling z, where Zz is in M
Make Zz the ancestral sequence and reverse the directions of all edges on path P
Main Problem contd..
• Each such reversal of edges changes the direction of mutation on those edges
• The reversal of edges does not change:
> Labels on edges in T
> Recombination node on a gall
• The modified tree T’ also derives M
Main Problem contd..
• Ancestral sequence of T’ is Zz which is a member of M
• T’ also contains same number of galls and hence T’ is also optimal
• Running time is O(n^2 m + n^4) where
n – number of sequences
m – length of binary sequence
Solving Optimal Root-Unknown Galled-Tree Problem
• M – can be derived on a galled-tree; T* - an optimal galled-tree for M
• A* - an optimal ancestral sequence
Connecting galls of T*
Assumptions:
• Every node v on a gall Q in T* is incident with exactly one edge whose other end is off of Q (a.k.a. “off-edge”)
• An off-edge may be directed into or out of a node (say, x)
Connecting Galls of T*
• Transform T* to T’ (conceptually) as follows
– Node 00100 (say, x) is incident with 2 off-edges
– A new edge from x to a new node (say, y) is introduced
– The 2 original edges (that were initially out of x) are reconnected so that they leave y instead
– T’ specifies how galls of T* are connected to each other but does not show the internal arrangement of the sites on any gall
Connecting Galls of T*
If x is the root of T*, then create a new root and connect it with an edge to x
Contract each gall Q in T* to a single node (say, q) and make all edges undirected
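The contraction step can be sketched as follows. This is only an illustration under stated assumptions: the edge-list representation, the `gall_of` mapping, the function name, and the node labels are all hypothetical and not taken from the paper.

```python
def contract_galls(edges, gall_of):
    """Contract every gall to one node and drop edge directions.

    edges   : iterable of (u, v) pairs from T* (direction is ignored)
    gall_of : dict mapping each node that lies on a gall to that gall's
              contracted node name (hypothetical representation)
    """
    contracted = set()
    for u, v in edges:
        a = gall_of.get(u, u)  # nodes off every gall keep their own name
        b = gall_of.get(v, v)
        if a != b:             # edges inside a gall disappear
            contracted.add(frozenset((a, b)))
    return sorted(tuple(sorted(e)) for e in contracted)

# Hypothetical gall Q on nodes a, b, c, d, with one off-edge per gall node.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),        # the gall itself
         ("r", "a"), ("b", "l1"), ("c", "l2"), ("d", "l3")]     # off-edges
gall_of = {"a": "q", "b": "q", "c": "q", "d": "q"}
tree_edges = contract_galls(edges, gall_of)
```

In this example the four gall nodes collapse into the single node `q`, and the result is a star with `q` joined to `r`, `l1`, `l2`, and `l3`.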
Algorithmic Construction of T’
• Find a family of splits SP(T)
• C1 & C2 are obtained from the incompatibility graph
• The leaf nodes for the tree (on the right side of the figure) are determined by the sites that have a unique combination of characters
Extensions to Complex Biological Phenomena & Structured Recombination
• Site-Arrangement algorithm for gall Q corresponding to component C
Let M(C) be matrix M restricted to the sites in C
For each distinct sequence X in M(C):
– Let M(C, X) be M(C) after removal of all rows with sequence X
– If there is an undirected perfect phylogeny T(C) for M(C, X) where all sites of C are contained in one path whose end sequences can be recombined (with a single crossover) to create sequence X, then output the pair (X, T(C))
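The single-crossover condition on the two end sequences can be sketched in a few lines. The sketch assumes the usual model in which a single-crossover recombinant is a non-empty prefix of one parent followed by a non-empty suffix of the other; the function names and example strings are hypothetical, and the perfect-phylogeny construction itself is not shown.

```python
def prefix_suffix(p, s, x):
    """True if x is a non-empty prefix of p followed by a non-empty suffix of s."""
    return any(p[:k] + s[k:] == x for k in range(1, len(x)))

def is_single_crossover_recombinant(a, b, x):
    """True if x can be created from end sequences a and b with one crossover."""
    assert len(a) == len(b) == len(x)
    return prefix_suffix(a, b, x) or prefix_suffix(b, a, x)

# Hypothetical end sequences of the path and candidate recombinants.
a, b = "00011", "11100"
ok = is_single_crossover_recombinant(a, b, "00000")   # prefix 000 of a + suffix 00 of b
bad = is_single_crossover_recombinant(a, b, "01010")  # needs more than one crossover
```

Both orders of the parents are tried, since either end sequence may supply the prefix.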
• Step 2 of the above algorithm is modified for multiple-crossover recombination
• Let Su(C) and Sy(C) denote two sequences
• To determine if X can be created by a multiple-crossover recombination of Su(C) and Sy(C), starting with Su(C):
• Algorithm:
– i = 1; Z = Su(C)
– while i is not past the end of X {
• Find the longest substring of Z starting at position i that matches the substring of X starting at position i
• If there is none, return no
• Otherwise, set i to the position just past the right end of those matching substrings
• If Z = Su(C) then set Z = Sy(C), else set Z = Su(C)
}
– Return yes
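A minimal Python sketch of this greedy check follows. It assumes the loop stops once i passes the end of X, which the slide leaves implicit, and it uses 0-based positions instead of the slide's 1-based ones; the function name and example strings are hypothetical.

```python
def multi_crossover_from(su, sy, x):
    """Greedy test: can x be copied alternately from su and sy, starting with su?"""
    assert len(su) == len(sy) == len(x)
    parents = (su, sy)
    current = 0          # index into parents; 0 means Z = su
    i = 0                # 0-based position (the slide uses i = 1)
    while i < len(x):    # assumed stopping rule: whole of x has been matched
        z = parents[current]
        j = i
        while j < len(x) and z[j] == x[j]:
            j += 1       # extend the match as far as possible
        if j == i:
            return False  # Z does not match x at position i
        i = j
        current = 1 - current  # switch to the other sequence
    return True

su, sy = "000000", "111111"
yes = multi_crossover_from(su, sy, "001100")  # su, then sy, then su
no = multi_crossover_from(su, sy, "110000")   # x does not start like su
```

Because the slide fixes Su(C) as the starting sequence, a caller that wants to allow either start would also call the function with the two parents swapped.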
The above algorithm produces a multiple-crossover galled-tree for M
Thank You