Bastien Boussau LBBE, CNRS, Université de Lyon Genome-scale phylogenomics
Page 1: Genome-scale phylogenomics

Bastien Boussau

LBBE, CNRS, Université de Lyon

Genome-scale phylogenomics

Page 2: Genome-scale phylogenomics

Collaborators

• Lyon collaborators:

• Adrián Arellano Davín

• Gergely Szöllősi (Budapest),

• Eric Tannier,

• Vincent Daubin,

• Thomas Bigot,

• Magali Semeria,

• Manolo Gouy,

• Laurent Duret

• Austin collaborators:

• Siavash Mirarab

• Md. Shamsuzzoha Bayzid

• Tandy Warnow

• RevBayes collaborators:

• Sebastian Hoehna • Michael Landis • Tracy Heath • Fredrik Ronquist • Brian Moore • John Huelsenbeck • …

Page 3: Genome-scale phylogenomics

To study genome evolution:

1. One species tree

2. Thousands of gene trees

[Figure: tree over species A, B, C, D along a TIME axis, with tip values for a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4).]

Page 5: Genome-scale phylogenomics

Why our current pipeline can be improved

Page 6: Genome-scale phylogenomics

Why our current pipeline can be improved

• Gene alignments: error prone, short, point estimates

Page 7: Genome-scale phylogenomics

Why our current pipeline can be improved

• Gene alignments: error prone, short, point estimates

• Gene trees: based on alignments, point estimates

Page 8: Genome-scale phylogenomics

Why our current pipeline can be improved

• Gene alignments: error prone, short, point estimates

• Gene trees: based on alignments, point estimates

• Species trees: based on gene trees

Page 10: Genome-scale phylogenomics

[Figure: two tree panels over species A, B, C, D along a TIME axis; tip values are shown for a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4).]

Page 11: Genome-scale phylogenomics

[Figure: three tree panels over species A, B, C, D along a TIME axis; tip values are shown for a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4).]

Page 13: Genome-scale phylogenomics

[Figure: tree panels over species A, B, C, D along a TIME axis; one panel is labeled D (duplication).]

Page 14: Genome-scale phylogenomics

[Figure: tree panels over species A, B, C, D along a TIME axis; panels are labeled D and DL (duplication and loss).]

Page 15: Genome-scale phylogenomics

[Figure: tree panels over species A, B, C, D along a TIME axis; panels are labeled D, DL and LGT (lateral gene transfer).]

Page 16: Genome-scale phylogenomics

[Figure: tree panels over species A, B, C, D along a TIME axis; panels are labeled D, DL, LGT and ILS (incomplete lineage sorting).]

Page 17: Genome-scale phylogenomics

[Figure: tree panels over species A, B, C, D along a TIME axis; panels are labeled D, DL, LGT and ILS.]

DL: Boussau et al., Genome Research 2013

Page 18: Genome-scale phylogenomics

[Figure: tree panels over species A, B, C, D along a TIME axis; panels are labeled D, DL, LGT and ILS.]

DL: Boussau et al., Genome Research 2013

DL+T: Szöllősi et al., PNAS 2013

Page 19: Genome-scale phylogenomics

[Figure: tree panels over species A, B, C, D along a TIME axis; panels are labeled D, DL, LGT and ILS.]

ILS: Mirarab et al., Science 2014

DL: Boussau et al., Genome Research 2013

DL+T: Szöllősi et al., PNAS 2013

Page 20: Genome-scale phylogenomics

PHYLDOG

Joint reconstruction of the species tree, gene trees, and numbers of duplications and losses

• Input: all gene families (thousands of alignments)

• Output: rooted species tree, numbers of duplications and losses, rooted gene trees

[Figure: species tree over species A, B, C, D along a TIME axis, annotated with duplication labels D1 to D6 and loss labels L1 to L6.]

Boussau et al., Genome Research 2013

Page 21: Genome-scale phylogenomics

PHYLDOG

Joint reconstruction of the species tree, gene trees, and numbers of duplications and losses

• Input: all gene families (thousands of alignments)

• Output: rooted species tree, numbers of duplications and losses, rooted gene trees

• Probabilistic models: sequence evolution, gene family evolution

[Figure: species tree over species A, B, C, D along a TIME axis, annotated with duplication labels D1 to D6 and loss labels L1 to L6.]

Boussau et al., Genome Research 2013
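To make "numbers of duplications" concrete, here is a minimal sketch of how duplications can be counted when a rooted gene tree is compared with a rooted species tree. It is not PHYLDOG's method: PHYLDOG relies on probabilistic models, whereas this sketch swaps in the simpler parsimony-style LCA reconciliation. The nested-tuple tree encoding and the function names are assumptions made for illustration, gene-tree leaves are labeled directly by species name, and losses are not counted.

```python
def collect_clades(tree, out):
    """Return the set of leaves under `tree` and record every clade in `out`."""
    if not isinstance(tree, tuple):
        leaf = frozenset([tree])
        out.add(leaf)
        return leaf
    clade = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(clade)
    return clade


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that LCA reconciliation labels as duplications."""
    species_clades = set()
    collect_clades(species_tree, species_clades)

    def lca(species):
        # Smallest species-tree clade that contains all the given species.
        return min((c for c in species_clades if species <= c), key=len)

    duplications = 0

    def visit(node):
        nonlocal duplications
        if not isinstance(node, tuple):
            return lca(frozenset([node]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # A node mapped to the same species-tree node as one of its
        # children is interpreted as a duplication, otherwise a speciation.
        if here in child_maps:
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


species_tree = (("A", "B"), ("C", "D"))
congruent = (("A", "B"), ("C", "D"))
with_extra_copy = ((("A", "B"), ("C", "D")), ("A", "B"))
print(count_duplications(congruent, species_tree))
print(count_duplications(with_extra_copy, species_tree))
```

A gene tree that matches the species tree needs no duplication, while the second gene tree, which carries an extra (A, B) copy next to the full family, needs one at its root.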

Page 22: Genome-scale phylogenomics

PHYLDOG: better trees for better ancestral genomes

[Figure: mammalian species tree with clades labeled Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates and Glires, with bar plots comparing PHYLDOG, TreeBeST and PhyML. Species shown: Sus scrofa, Felis catus, Ornithorhynchus anatinus, Oryctolagus cuniculus, Loxodonta africana, Mus musculus, Gorilla gorilla, Dipodomys ordii, Monodelphis domestica, Vicugna pacos, Macaca mulatta, Tupaia belangeri, Procavia capensis, Spermophilus tridecemlineatus, Pongo pygmaeus, Tursiops truncatus, Microcebus murinus, Callithrix jacchus, Equus caballus, Erinaceus europaeus, Tarsius syrichta, Choloepus hoffmanni, Ochotona princeps, Cavia porcellus, Pan troglodytes, Bos taurus, Rattus norvegicus, Homo sapiens, Otolemur garnettii, Dasypus novemcinctus, Echinops telfairi, Pteropus vampyrus, Macropus eugenii, Canis familiaris, Sorex araneus, Myotis lucifugus.]

Page 23: Genome-scale phylogenomics

An example gene family

[Figure: trees of the same gene family reconstructed by TreeBeST and by PHYLDOG (scale bars 0.1 and 0.3). Tips are labeled by species: Ornithorhynchus anatinus, Mus musculus, Cavia porcellus, Oryctolagus cuniculus, Canis familiaris, Bos taurus, Homo sapiens, Pongo pygmaeus, Equus caballus, Callithrix jacchus, Monodelphis domestica, Spermophilus tridecemlineatus.]

Boussau et al., Genome Research 2013

Page 24: Genome-scale phylogenomics

[Figure: tree over species A, B, C, D along a TIME axis.]

ILS: Mirarab et al., Science 2014

DL: Boussau et al., Genome Research 2013

DL+T: Szöllősi et al., PNAS 2013

Page 25: Genome-scale phylogenomics

[Figure: tree over species A, B, C, D along a TIME axis; panels are labeled D, DL, LGT and ILS.]

ILS: Mirarab et al., Science 2014

DL: Boussau et al., Genome Research 2013

DL+T: Szöllősi et al., PNAS 2013

Page 26: Genome-scale phylogenomics

Gene transfers and the quixotic pursuit of the TOL

Doolittle WF, Science 1999

Page 28: Genome-scale phylogenomics

Gene transfers and the quixotic pursuit of the TOL

Doolittle WF, Science 1999

“The monistic concept of a single universal tree appears […] increasingly obsolete. […] [It is] no longer the most scientifically productive position to hold […] [It] accounts for only a minority of observations from genomes.”

Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall, Lapointe, Dupré, Dagan, Boucher, Martin, Biology Direct 2009.

Page 29: Genome-scale phylogenomics

Using transfers to date clades

[Figure: tree drawn along a TIME axis, with a question mark.]

Page 35: Genome-scale phylogenomics

Using transfers to date clades

[Figure: tree drawn along a TIME axis, with a question mark.]

Because we can identify gene transfers, we have information for ordering the nodes of a species tree
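One way to see why transfers carry ordering information: a transfer requires the donor and recipient branches to exist at the same time, so the node at the top of the donor branch must be older than the node at the bottom of the recipient branch, and vice versa. The sketch below only illustrates that reasoning and is not the inference method presented in this talk. The input format and the function name `order_nodes` are hypothetical. It collects the "older than" constraints and returns one node order compatible with them, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_nodes(tree_edges, transfers):
    """Return species-tree nodes in one order compatible with all constraints,
    oldest first.

    tree_edges: iterable of (parent, child) pairs of the species tree.
    transfers:  iterable of (donor_branch, recipient_branch) pairs, each branch
                being a (parent, child) pair (hypothetical input format).
    """
    must_be_older = {}  # node -> set of nodes that have to come before it

    def add_constraint(older, younger):
        must_be_older.setdefault(older, set())
        must_be_older.setdefault(younger, set()).add(older)

    # A parent is always older than its children.
    for parent, child in tree_edges:
        add_constraint(parent, child)

    # Donor and recipient branches must overlap in time.
    for (donor_parent, donor_child), (recip_parent, recip_child) in transfers:
        add_constraint(donor_parent, recip_child)
        add_constraint(recip_parent, donor_child)

    # Raises graphlib.CycleError if the constraints contradict each other.
    return list(TopologicalSorter(must_be_older).static_order())


edges = [("root", "X"), ("root", "Y"),
         ("X", "A"), ("X", "B"), ("Y", "C"), ("Y", "D")]
# One transfer from the branch X->A into the branch root->Y.
order = order_nodes(edges, [(("X", "A"), ("root", "Y"))])
print(order.index("X") < order.index("Y"))
```

Without the transfer, the tree alone says nothing about whether X or Y is older; with it, X has to precede Y. Contradictory transfers produce a cycle, which the sorter reports as an error rather than resolving.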

Page 36: Genome-scale phylogenomics

Bayesian species tree inference accounting for DTL events

• STRALE:

• A Bayesian probabilistic method that can interpret thousands of gene trees in terms of speciation events, duplication events (D), transfer events (T) and loss events (L)

• A method able to estimate the DTL rates

• A method able to reconstruct the species tree

• A method able to order the nodes of the species tree

Page 37: Genome-scale phylogenomics

Simulation to test the species tree reconstruction

• 20 species

• 200 gene families

[Figure: simulated and inferred species trees, with tips numbered 0 to 19.]

Page 38: Genome-scale phylogenomics

Better gene trees, fewer transfers

[Figure: RF distance to real tree, comparing the usual approach with ALE+DTL.]

Szöllősi et al., Syst. Biol. 2013
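The RF (Robinson-Foulds) distance used as the accuracy measure counts the bipartitions of the leaf set that appear in one tree but not in the other. The following is a minimal sketch of the unrooted version, assuming trees are encoded as nested tuples with string leaf names and share the same leaves. The encoding and function names are illustrative and are not taken from the cited study.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`, recording the leaf set of every internal node."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial bipartitions, each stored as the side without a reference leaf."""
    clades = set()
    all_leaves = leaf_sets(tree, clades)
    reference = min(all_leaves)
    splits = set()
    for clade in clades:
        side = all_leaves - clade if reference in clade else clade
        # Skip splits that separate nothing or a single leaf.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


true_tree = (("A", "B"), ("C", "D"))
same_topology = ("A", ("B", ("C", "D")))
other_topology = (("A", "C"), ("B", "D"))
print(rf_distance(true_tree, same_topology))
print(rf_distance(true_tree, other_topology))
```

Because bipartitions ignore the root position, the first two trees are at distance 0 even though they are written with different roots, while the third differs from the true tree by one bipartition on each side.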

Page 39: Genome-scale phylogenomics

Better gene trees, fewer transfers

[Figure, two panels: transfer events per family, and RF distance to real tree, each comparing the usual approach with ALE+DTL.]

Szöllősi et al., Syst. Biol. 2013

Page 40: Genome-scale phylogenomics

Better gene trees, fewer transfers

[Figure, two panels: transfer events per family, and RF distance to real tree, each comparing the usual approach with ALE+DTL.]

Szöllősi et al., Syst. Biol. 2013

Better ancestral genomes: go see Adrián Arellano Davín’s poster on reconstructing ancestral genomes across the tree of life!

Page 43: Genome-scale phylogenomics

Statistical binning

Mirarab et al., Science 2014

Page 44: Genome-scale phylogenomics

Statistical binning

Mirarab et al., Science 2014

MP-EST

Page 46: Genome-scale phylogenomics

Statistical binning

Mirarab et al., Science 2014

MP-EST

MP-EST

Page 47: Genome-scale phylogenomics

Statistical binning improves species tree inference

Mirarab et al., Science 2014

Page 48: Genome-scale phylogenomics

Statistical binning

Mirarab et al., Science 2014

Page 49: Genome-scale phylogenomics

Statistical binning and birds

Jarvis et al., Science 2014

Page 50: Genome-scale phylogenomics

RevBayes

• Collaborative effort

• Model-based phylogenetics

• Many models of sequence evolution

• Models for dating

• Models for phylogeography

• Models for continuous traits

• Models for gene tree/species tree inference

• http://revbayes.net

• Sebastian Hoehna • Michael Landis • Tracy Heath • Fredrik Ronquist • Nicolas Lartillot • Brian Moore • John Huelsenbeck • …

Page 51: Genome-scale phylogenomics

Conclusions

• We develop methods for gene tree and species tree inference

• Improvement of gene trees and species trees in the presence of:

• duplications and losses,

• transfers,

• incomplete lineage sorting

• Parallel algorithms applicable to genome-scale data

Page 53: Genome-scale phylogenomics

Thanks!

• Lyon collaborators:

• Adrián Arellano Davín

• Gergely Szöllősi (Budapest),

• Eric Tannier,

• Vincent Daubin,

• Thomas Bigot,

• Magali Semeria,

• Manolo Gouy,

• Laurent Duret

• Austin collaborators:

• Siavash Mirarab

• Md. Shamsuzzoha Bayzid

• Tandy Warnow

Go see Adrián Arellano Davín’s poster on reconstructing ancestral genomes across the tree of life!