Towards inferring the history of life in the presence of lateral gene transfers
Bastien Boussau
LBBE, CNRS, Université de Lyon
Jun 20, 2015
Collaborators
– Gergely Szöllősi (Budapest)
– Eric Tannier (LBBE, Lyon)
– Nicolas Lartillot (LBBE, Lyon)
– Vincent Daubin (LBBE, Lyon)
Gene trees provide confusing signals about species relationships
Boussau, Gueguen, Gouy, BMC Evol. Biol. 2008
HOGENOM database
Gene transfers and the quixotic pursuit of the TOL
Doolittle WF,
Science 1999
“The monistic concept of a single universal tree appears […] increasingly obsolete. […] [It is] no longer the most scientifically productive position to hold […] [It] accounts for only a minority of observations from genomes.”
Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall, Lapointe, Dupré, Dagan, Boucher, Martin, Biology Direct 2009.
“Bien parece, respondió Don Quijote, que no estás cursado en esto de las aventuras”
"Obviously," replied Don Quijote, "you don't know much about adventures."
Don Quijote, VIII
Can we extract some signal from the noise?
Huge amounts of gene tree incongruence in genomic data
1. What proportion of it is biological (=signal) and what proportion of it comes from our failure to correctly infer gene trees (=noise)?
2. Amid the signal, is there a trace of a tree of life?
1-Removing the noise in gene trees
[Figure: RF distance to real tree, for the usual approach and for ALE+DTL]
Szöllősi et al., Syst. Biol. 2013
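The accuracy measure on this slide is the RF (Robinson-Foulds) distance between a reconstructed gene tree and the real one. As a minimal sketch, assuming rooted trees written as nested tuples and the clade-based variant of the distance (the slides do not say which variant was used), it can be computed as follows. The function names and the example trees are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Each clade is a frozenset of leaf names. The clade containing every
    leaf is dropped because it is shared by all trees on the same leaves.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf is its own (trivial) clade
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical example: the inferred tree misplaces leaf B.
real_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(real_tree, inferred_tree))
```

A distance of 0 means the two trees have the same clades; each misplaced clade adds to the count.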
[Figure: transfer events per family, for the usual approach and for ALE+DTL]
2-Amid the signal, is there a trace of a tree of life?
• STRALE:
• A Bayesian probabilistic method that can interpret thousands of gene trees in terms of:
  • speciation events
  • duplication events (D)
  • transfer events (T)
  • loss events (L)
• A method able to estimate the DTL rates
• A method able to reconstruct the species tree
• A method able to order the nodes of the species tree
Using transfers to date clades
[Figure: axis labelled TIME]
Because we can identify gene transfers, we have information for ordering the nodes of a species tree
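The reasoning can be made concrete with a small sketch. A transfer requires the donor and recipient branches to exist at the same time, so the node at the top of each branch must be older than the node at the bottom of the other. The code below is an illustration under stated assumptions, not the STRALE algorithm: the species tree, the branch naming (each branch is named after the node at its lower end), and the helper functions are all hypothetical, and transfers are assumed to involve only branches present in the tree.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical rooted species tree, stored as child -> parent.
parent = {"A": "n1", "B": "n1", "C": "n2", "D": "n2",
          "n1": "root", "n2": "root"}


def order_constraints(parent, transfers):
    """Return (older, younger) node pairs implied by the tree and transfers.

    Each transfer is a (donor_branch, recipient_branch) pair; neither may
    be the root, which has no branch above it in this representation.
    """
    # Every parent node is older than its children.
    constraints = {(p, c) for c, p in parent.items()}
    for donor, recipient in transfers:
        # The two branches overlap in time, so each branch starts
        # before the other one ends.
        constraints.add((parent[donor], recipient))
        constraints.add((parent[recipient], donor))
    return constraints


def oldest_to_youngest(constraints):
    """One node ordering compatible with all constraints.

    graphlib raises CycleError if the constraints contradict each other.
    """
    sorter = TopologicalSorter()
    for older, younger in constraints:
        sorter.add(younger, older)  # `older` must appear before `younger`
    return list(sorter.static_order())


# A transfer from the branch leading to n1 into the branch leading to C
# implies that n2 (top of C's branch) is older than n1.
order = oldest_to_youngest(order_constraints(parent, [("n1", "C")]))
print(order)
```

Without the transfer, the tree alone leaves the relative order of n1 and n2 undetermined; the single transfer resolves it.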
Simulation to test the species tree reconstruction
• 20 species
• 200 gene families
[Figure: two species trees on 20 tips (labelled 0 to 19), panels "Simulated" and "Inferred"; scale axes run from 0.0 to 0.7 and from 0.0 to 1.25]
Can we recover DTL rate heterogeneity among families?
• Simulation:
• 10 species; 100 gene families per rate category
• 4 rate categories drawn from a Gamma distribution
[Figure: rate for each rate category (1 to 4), and inferred DTL rate plotted against simulated DTL rate]
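The simulation setup described above can be sketched in a few lines. This is only an illustration of drawing four rate categories from a Gamma distribution and assigning 100 gene families to each; the shape parameter, the choice of a mean of 1, the seed, and the function name are assumptions, since the slides do not give them.

```python
import random


def draw_rate_categories(n_categories=4, shape=1.0, seed=0):
    """Draw one rate per category from a Gamma distribution with mean 1.

    `shape` and `seed` are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # gammavariate(alpha, beta) has mean alpha * beta, so a scale of
    # 1 / shape gives a mean of 1.
    return sorted(rng.gammavariate(shape, 1.0 / shape)
                  for _ in range(n_categories))


rates = draw_rate_categories()
# 100 gene families per rate category, as in the simulation description.
family_rates = [rate for rate in rates for _ in range(100)]
print(rates)
```

Each gene family would then be simulated with the DTL rate of its category, and the inferred rates compared with these simulated values.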
Application to 36 Cyanobacteria
[Figure: species tree of the 36 genomes. Tip labels: PROM3, SYNE7, TRIEI, CYAA5, ACAM1, NOSP7, PROM9, PROM5, CYAP4, SYNPX, SYNP6, CYAP8, SYNS9, ANAVT, SYNPW, PROMS, SYNY3, SYNS3, SYNSC, GLVIO1, SYNR3, SYNJA, THEEB, PROM2, CYAP7, PROM0, PROMT, MICAN, SYNJB, PRMAR1, PROMM, PROM4, SYNP2, PROMP, ANASP, PROM1. Numeric labels on the tree are all 1, except one at 0.79.]
Szöllősi et al., PNAS 2012
1099 gene families
Conclusion, perspectives
• STRALE:
• A Bayesian probabilistic method that can interpret thousands of gene trees with DTL events and reconstruct a time-ordered species tree
• Currently undergoing tests
• Can run on thousands of gene families (parallel architecture)
• Will be open access
• Can run on dozens of species
Postdocs wanted!
Thank you!
• Organizers of Evolution 2014!
• LBBE collaborators (Lyon and Budapest):
– Gergely Szöllősi
– Eric Tannier
– Nicolas Lartillot
– Vincent Daubin