SuperTriplets: a triplet-based supertree approach to phylogenomics Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery


Dec 30, 2015

Page 1

SuperTriplets: a triplet-based supertree approach to phylogenomics

Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery

Page 2

SuperTriplets: ISMB 2010

Introduction: inferring phylogeny (1 gene)

Page 3

Introduction: inferring phylogeny (3 genes)

[Figure: sequence alignments for Gene 1, Gene 2 and Gene 3, combined in two ways: SuperTree and SuperMatrix]

Page 4

Introduction: inferring phylogeny (more data)

[Figure: sequence alignments for Gene 1 … Gene 1000, combined in two ways: SuperTree and SuperMatrix; additional data sources: SNP / Morpho / biblio]

Page 5

Supertree overview: MRP

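MRP (Matrix Representation with Parsimony) [Baum 1992, Ragan 1992]
• 1 binary sequence per taxon
• 1 site per clade (1 = in the clade; 0 = outside; ? = missing)

Example MRP matrix:

0100101001?11?0100
01??0?011?0???0010
??0011010??001????
0100010??00??001?0
111??0101000????01

[Figure: source trees on taxa A to F (leaf orders ABCDEF, CDEABF, CDEFBA) and the resulting MRP supertree]

[Goloboff and Pol, 2002] The MRP supertree can contain a relation contradicted by all source trees.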
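The MRP encoding described on this slide can be sketched in a few lines. This is only an illustration, assuming rooted source trees written as nested Python tuples with string leaves; the names `leaves`, `clades` and `mrp_matrix` are hypothetical, and the all-zero outgroup row that MRP analyses usually add for rooting is left out.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every internal node except the root."""
    if isinstance(tree, str):
        return
    for child in tree:
        if not isinstance(child, str):
            yield frozenset(leaves(child))
            yield from clades(child)


def mrp_matrix(forest):
    """One binary sequence per taxon, one site per clade of each source tree."""
    taxa = sorted(set().union(*(leaves(tree) for tree in forest)))
    rows = {taxon: [] for taxon in taxa}
    for tree in forest:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")   # taxon missing from this tree
                elif taxon in clade:
                    rows[taxon].append("1")   # inside the clade
                else:
                    rows[taxon].append("0")   # in the tree, outside the clade
    return {taxon: "".join(sites) for taxon, sites in rows.items()}


# Toy forest (made up for illustration): two overlapping rooted trees.
forest = [((("A", "B"), "C"), "D"), (("B", "C"), "E")]
matrix = mrp_matrix(forest)
for taxon, sequence in matrix.items():
    print(taxon, sequence)
```

The three sites here correspond to the clades {A,B,C} and {A,B} of the first tree and {B,C} of the second; taxa absent from a tree receive "?" at that tree's sites.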

Page 6

Supertree overview: intuitive approach

The supertree problem (intuitive formulation)
• Input: a collection of overlapping trees (a forest)
• Output: the tree that best represents this collection
• A major question is: how to define "best represents"?

Visualizing supertree candidates within the tree space

Median supertree
• Intuitive solution
• Generalization of the consensus tree
• Good theoretical properties [Steel and Rodrigo, 2008]

Page 7

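Supertree overview: median tree

[Figure: the distance d between two trees, drawn as an equation of the form d( , ) = + - over tree pictograms]

Tree decomposition as:
• split set
• quartet set
• triplet set

[Figure panels: Initial trees, Tree restriction]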
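To make the triplet decomposition and the tree restriction concrete, here is a small sketch. It assumes rooted trees as nested tuples, and it assumes the distance is the size of the symmetric difference of the two triplet sets after restricting both trees to their shared taxa; the slide's own equation survives only as a drawing, so this is one plausible reading, not the authors' definition. All function names are hypothetical.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Return resolved triplets as (frozenset({x, y}), z), meaning xy|z."""
    result = set()
    if isinstance(tree, str):
        return result
    child_leaves = [leaves(child) for child in tree]
    for i, child in enumerate(tree):
        result |= triplets(child)
        # Leaves hanging from the other children of this node.
        outside = set().union(*(child_leaves[:i] + child_leaves[i + 1:]))
        # x and y sit in the same child, z in another one: xy|z.
        for x, y in combinations(sorted(child_leaves[i]), 2):
            for z in outside:
                result.add((frozenset((x, y)), z))
    return result


def triplet_distance(tree_a, tree_b):
    """Symmetric difference of triplet sets on the shared taxa (assumption)."""
    shared = leaves(tree_a) & leaves(tree_b)

    def restricted(tree):
        return {(pair, z) for pair, z in triplets(tree)
                if pair <= shared and z in shared}

    set_a, set_b = restricted(tree_a), restricted(tree_b)
    return len(set_a) + len(set_b) - 2 * len(set_a & set_b)


# Toy trees (made up): they share taxa A, B and C only.
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "E")
print(len(triplets(t1)), triplet_distance(t1, t2))
```

Filtering the triplet set by the shared taxa gives the same result as pruning the tree first, which is why no explicit pruning step is needed in this sketch.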

Page 8

Supertree overview: MRP and median tree

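[Figure: an input forest of rooted source trees T1 (taxa A, B, C, D, E), T2 and T3 (taxa A, B, C, F, G, H) is decomposed into triplets; each triplet becomes one site of a triplet matrix representation (Triplet MR). The standard MRP matrix of the input forest is shown alongside.]

Triplet MR (one line per site; one character per taxon A to H, plus a final 0 for the root added by the rooting step):

Triplet  ABCDEFGH  root
AB|C     110?????  0
AB|D     11?0????  0
…
GH|F     ?????011  0
…
FH|G     ?????101  0
…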

Page 9

Supertree overview: MRP and median tree

The parsimony value is related to the triplet distance:
• 1 parsimony step for triplets within the supertree
• 2 parsimony steps for others
• parsimony score = nbSites + (triplet distance)/2

The MRP approach is ill-suited to triplet encoding:
• for 100 taxa, 97% of « ? »
• for 1000 taxa, 99.7% of « ? »
• unnecessarily huge matrices
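The percentages of « ? » can be checked with a short calculation. The sketch below assumes that each triplet site informs exactly 3 taxa and that the added root row is not counted, which reproduces the figures quoted on the slide; `triplet_matrix_stats` is a hypothetical helper name.

```python
from math import comb


def triplet_matrix_stats(n_taxa):
    """Return (fraction of '?' per triplet site, maximum sites per source tree)."""
    # Each triplet site scores 3 taxa; the other n - 3 taxa are missing.
    missing_fraction = (n_taxa - 3) / n_taxa
    # A fully resolved tree on n taxa resolves every one of its C(n, 3) triples.
    max_sites_per_tree = comb(n_taxa, 3)
    return missing_fraction, max_sites_per_tree


for n in (100, 1000):
    fraction, sites = triplet_matrix_stats(n)
    print(f"{n} taxa: {fraction:.1%} missing, up to {sites} sites per tree")
```

The number of sites grows cubically with the number of taxa while almost every cell is missing, which is the sense in which the matrices are unnecessarily huge.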

Page 10

Supertriplets: few notations

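Given a forest F of input trees:
• N+(xy|z): number of occurrences of xy|z in F
• N-(xy|z) = N+(xz|y) + N+(yz|x) (alternative resolutions in F)
• N(x|y|z): number of occurrences of the unresolved triplet x|y|z in F
• Once these counts are stored, the input trees are no longer needed (little impact of forest size)

Searching for the (asymmetric) triplet median tree T:

Median:

\[ d_3(T, F) = \sum_{T_i \in F} d_3(T, T_i) \]

Asymmetric:

\[ d_3(T, F) = \sum_{xy|z \,\in\, \mathrm{triplets}(T)} \left( 2N^{-}(xy|z) + N(x|y|z) \right) + \sum_{x|y|z \,\in\, \mathrm{triplets}(T)} \left( N^{-}(xy|z) + N^{+}(xy|z) \right) \]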
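The counts and the distance d3(T,F) can be sketched as follows. This assumes nested-tuple rooted trees and reads the formula literally: a triplet resolved in T costs 2 per conflicting resolution in F and 1 per unresolved occurrence, while a triplet left unresolved in T costs 1 per resolved occurrence. The helper names (`count_triplets`, `n_minus`, `d3`) and the toy forest are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Resolved triplets as (frozenset({x, y}), z), meaning xy|z."""
    result = set()
    if isinstance(tree, str):
        return result
    child_leaves = [leaves(child) for child in tree]
    for i, child in enumerate(tree):
        result |= triplets(child)
        outside = set().union(*(child_leaves[:i] + child_leaves[i + 1:]))
        for x, y in combinations(sorted(child_leaves[i]), 2):
            for z in outside:
                result.add((frozenset((x, y)), z))
    return result


def count_triplets(forest):
    """Return N+ (resolved counts) and N(x|y|z) (unresolved counts) over F."""
    n_plus, n_unresolved = Counter(), Counter()
    for tree in forest:
        resolved = triplets(tree)
        n_plus.update(resolved)
        resolved_sets = {pair | {z} for pair, z in resolved}
        for trio in combinations(sorted(leaves(tree)), 3):
            if frozenset(trio) not in resolved_sets:
                n_unresolved[frozenset(trio)] += 1
    return n_plus, n_unresolved


def n_minus(n_plus, pair, z):
    """N-(xy|z) = N+(xz|y) + N+(yz|x)."""
    x, y = sorted(pair)
    return n_plus[(frozenset((x, z)), y)] + n_plus[(frozenset((y, z)), x)]


def d3(tree, n_plus, n_unresolved):
    """Evaluate d3(T, F) from the stored counts only."""
    total, resolved_sets = 0, set()
    for pair, z in triplets(tree):
        trio = pair | {z}
        resolved_sets.add(trio)
        total += 2 * n_minus(n_plus, pair, z) + n_unresolved[trio]
    for x, y, z in combinations(sorted(leaves(tree)), 3):
        if frozenset((x, y, z)) not in resolved_sets:
            pair = frozenset((x, y))
            total += n_minus(n_plus, pair, z) + n_plus[(pair, z)]
    return total


# Toy forest (made up): AB|C twice, AC|B once, one unresolved tree.
forest = [(("A", "B"), "C"), (("A", "C"), "B"), (("A", "B"), "C"), ("A", "B", "C")]
n_plus, n_unresolved = count_triplets(forest)
print(d3((("A", "B"), "C"), n_plus, n_unresolved))
```

Once `count_triplets` has run, `d3` never touches the forest again, which illustrates why the forest size has little impact on the later steps.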

Page 11

Supertriplets: general overview

• Triplet decomposition: O(n^3 |F|)
• First sketch (NJ-like strategy): O(n^3) + consistency
• Improvement (NNI local search): O(n^3) to test all branches once
• Branch support and collapse: O(n^3)

Counts produced by the triplet decomposition:

N+(homo pan|mus)   N-(homo pan|mus)
N+(pan bos|mus)    N-(pan bos|mus)
N+(homo pan|bos)   N-(homo pan|bos)
N+(mus pan|bos)    N-(mus pan|bos)
…

Page 12

Supertriplets: agglomerative process

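[Figure: agglomerative construction of a tree on taxa A, B, C, D, E through trees T0, T1, T2, T3]
• T0 to T1: agglomeration of C1={A}, C2={B}; new triplets AB|C, AB|D, AB|E
• T1 to T2: agglomeration of C1={D}, C2={E}; new triplets DE|A, DE|B, DE|C
• T2 to T3: agglomeration of C1={A,B}, C2={C}; new triplets AC|D, BC|D, AC|E, BC|E

Triplets(T3): AB|C, AB|D, AB|E, DE|A, DE|B, DE|C, AC|D, BC|D, AC|E, BC|E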

Page 13

Supertriplets: agglomerative process

Agglomeration of (CA, CB)
• Transforms T into T'
• Resolves some new triplets (AB|X) with A ∈ CA, B ∈ CB, X ∉ {CA ∪ CB}

\[ d_3(T', F) = d_3(T, F) - \left( \sum N^{+}(AB|X) - \sum N^{-}(AB|X) \right) \]

We select the pair maximizing

\[ \mathrm{Score}(C_A, C_B) = \frac{\sum N^{+}(AB|X) - \sum N^{-}(AB|X)}{\sum N^{+}(AB|X) + \sum N^{-}(AB|X)} \]

The whole process is O(n^3): when CA and CB are agglomerated,
• Score(CD, CE) is unchanged
• Score(C{AB}, CD) is easily derived from Score(CA, CD) and Score(CB, CD)
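The pair selection can be illustrated with a naive version of the score. This sketch recomputes every score from scratch, so it does not reproduce the incremental updates that give the O(n^3) bound mentioned on the slide; the function names and the N+ counts are hypothetical example data.

```python
from itertools import combinations


def score(cluster_a, cluster_b, taxa, n_plus):
    """(sum N+ - sum N-) / (sum N+ + sum N-) over the triplets AB|X."""
    plus = minus = 0
    outside = taxa - cluster_a - cluster_b
    for a in cluster_a:
        for b in cluster_b:
            for x in outside:
                plus += n_plus.get((frozenset((a, b)), x), 0)
                # N-(AB|X) = N+(AX|B) + N+(BX|A)
                minus += n_plus.get((frozenset((a, x)), b), 0)
                minus += n_plus.get((frozenset((b, x)), a), 0)
    if plus + minus == 0:
        return 0.0  # no information about this pair (a choice of this sketch)
    return (plus - minus) / (plus + minus)


def best_pair(clusters, taxa, n_plus):
    """Return the pair of clusters with the highest score."""
    return max(combinations(clusters, 2),
               key=lambda pair: score(pair[0], pair[1], taxa, n_plus))


# Made-up counts: keys are (frozenset({x, y}), z) for the triplet xy|z.
n_plus = {
    (frozenset("AB"), "C"): 3,
    (frozenset("AC"), "B"): 1,
    (frozenset("AB"), "D"): 4,
    (frozenset("AC"), "D"): 4,
    (frozenset("BC"), "D"): 4,
}
taxa = frozenset("ABCD")
clusters = [frozenset(t) for t in "ABCD"]
first = best_pair(clusters, taxa, n_plus)
print(sorted(first[0]), sorted(first[1]), score(first[0], first[1], taxa, n_plus))
```

With these counts the pair ({A}, {B}) wins with a score of 0.75, while any pair involving D scores -1 because every triplet it would resolve is contradicted.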

Page 14

Supertriplets: NNI optimisation

[Figure: trees T and T', one NNI apart]
• 2 possible NNIs per edge
• The variation d3(T',F) - d3(T,F) depends on few triplets (shown in the figure)
• All these variations are initially evaluated in O(n^3)
• Once an NNI is done, few NNIs have to be re-evaluated (4 adjacent edges)
• NNI optimisation is therefore very fast

Page 15

Supertriplets: edge supports

Local support
• ∑ N+( ) / [ ∑ N+( ) + ∑ N-( ) ] (the triplets involved are drawn on the slide)
• If < 0.5, collapsing the edge improves d3(T,F)

Global support
• Also takes into account triplets whose N+( ) and N-( ) impact two edges

Final edge support: min(local, global)

[Figure: tree T]
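The local support ratio can be sketched as below. The slide shows the triplets entering the sums only as drawings, so this example assumes they are the triplets ab|x resolved by the edge, with a and b taken from the two child subtrees below the edge and x from the sibling subtree. That reading, the function names and the counts are all assumptions; the global support is not sketched because its definition is not recoverable from the transcript.

```python
def local_support(child_1, child_2, sibling, n_plus):
    """sum N+ / (sum N+ + sum N-) over the triplets ab|x tied to one edge."""
    plus = minus = 0
    for a in child_1:
        for b in child_2:
            for x in sibling:
                plus += n_plus.get((frozenset((a, b)), x), 0)
                # Alternative resolutions ax|b and bx|a count against the edge.
                minus += n_plus.get((frozenset((a, x)), b), 0)
                minus += n_plus.get((frozenset((b, x)), a), 0)
    total = plus + minus
    return plus / total if total else None  # None: no triplet informs the edge


def should_collapse(support):
    """Collapse when the support is below 0.5, as stated on the slide."""
    return support is not None and support < 0.5


# Made-up counts: AB|C seen 3 times, AC|B seen once.
n_plus = {(frozenset("AB"), "C"): 3, (frozenset("AC"), "B"): 1}
well_supported = local_support({"A"}, {"B"}, {"C"}, n_plus)
poorly_supported = local_support({"A"}, {"C"}, {"B"}, n_plus)
print(well_supported, should_collapse(well_supported))
print(poorly_supported, should_collapse(poorly_supported))
```

A ratio below 0.5 means the forest contains more contradicting than supporting triplets for that edge, which is the situation in which the slide states that collapsing improves d3(T,F).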

Page 16

Supertriplets: simulation protocol

Are they similar? Triplet/split measure

[Eulenstein et al. 2004] [Criscuolo et al. 2006]

Page 17

Supertriplets: simulation results

[Figure: simulation results measured on splits and on triplets, on a scale from "lack of resolution" to "perfect"; annotations: "Less resolved, very few errors" and "Contain errors"]

Page 18

Supertriplets: Phylogenomic case study

Supertree of 33 mammals
• Species: complete genomes (EnsEMBL v54)
• Sequences: orthologous CDS (OrthoMaM v5)
• Gene trees: 13 000 ML trees (inferred using PAUP)

Output supertree
• Computed in 30 s
• Congruent with [Prasad et al. 2008]

Page 19

Conclusion & prospects

(Asymmetric) median supertree
• Easy to understand
• Makes tree weighting natural

MRP, triplets and median supertree
• Understanding the criterion optimized by MRP
• Designing a dedicated algorithm to optimize it: http://www.supertriplets.univ-montp2.fr/

Supertrees & supermatrix are complementary
• 1 000 vertebrate genome project
• Divide and conquer approach:
  i) trees based on multiple CDSs (supermatrix)
  ii) assembling those trees (supertree)

Page 20

SuperTriplets: http://www.supertriplets.univ-montp2.fr/
