Algorithms for phylogeny construction. A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction. ICE-TCS Inaugural Symposium. Bjarni V. Halldórsson. April 30, 2005.

Character based phylogeny - Reykjavík University

Sep 13, 2019


Algorithms for phylogeny construction

A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction

ICE-TCS Inaugural Symposium

Bjarni V. Halldórsson

April 30, 2005


Character based phylogeny


[Figure: a character based tree built from two yes/no questions, "Has Intelligence?" and "Has Body Hair?". The leaf labels are not recoverable from the transcript.]


Genes, genomes

• Gene - a sequence having functional importance, e.g. AACG, CACC, TACT

• Genome - a sequence containing genes as subsequences, e.g. TATAACGTTTCTACTCTATTACTCC
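The definition above can be illustrated with a minimal Python sketch that lists where each example gene occurs as a substring of the example genome. The helper name `find_all` is hypothetical and positions are 0-based; real gene finding is far more involved than exact substring search.

```python
def find_all(genome: str, gene: str) -> list[int]:
    """Return every 0-based start position at which gene occurs in genome."""
    positions = []
    start = genome.find(gene)
    while start != -1:
        positions.append(start)
        # Search again one position later so overlapping hits are kept.
        start = genome.find(gene, start + 1)
    return positions


GENOME = "TATAACGTTTCTACTCTATTACTCC"   # example genome from the slide
GENES = ["AACG", "CACC", "TACT"]       # example genes from the slide

occurrences = {gene: find_all(GENOME, gene) for gene in GENES}
for gene, positions in occurrences.items():
    print(gene, positions)
```

In this example genome AACG occurs once, TACT occurs twice, and CACC does not occur at all.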


Evolution - changes in the genome

Original:

TATAACGTTTCTACTCTATTACTCC

Mutation:

TATAACGTTTCTAATCTCTTACTCC

Duplication:

TATAACGTTTCTACTCTATTACTCCTCTACTCT

Loss:

TA-TCTACTCTATTACTCC
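The three kinds of change listed above can be reproduced with plain string operations. This is a sketch only: the function names are hypothetical, and the index ranges are one reading that is consistent with the slide's sequences, not something the slide states.

```python
ORIGINAL = "TATAACGTTTCTACTCTATTACTCC"


def substitutions(before: str, after: str) -> list[tuple[int, str, str]]:
    """List (position, old base, new base) for two equal-length sequences."""
    if len(before) != len(after):
        raise ValueError("sequences must have equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


def duplicate(seq: str, start: int, end: int) -> str:
    """Append a copy of seq[start:end] to the end of the sequence."""
    return seq + seq[start:end]


def lose(seq: str, start: int, end: int) -> str:
    """Delete seq[start:end] from the sequence."""
    return seq[:start] + seq[end:]


# Mutation: compare the original with the slide's mutated sequence.
mutated = "TATAACGTTTCTAATCTCTTACTCC"
print(substitutions(ORIGINAL, mutated))

# Duplication: copying positions 9..16 to the end gives the slide's sequence.
print(duplicate(ORIGINAL, 9, 17))

# Loss: deleting positions 2..8 gives the slide's sequence without the gap mark.
print(lose(ORIGINAL, 2, 9))
```

Under this reading the mutation slide shows two substitutions, and the lost segment contains the example gene AACG.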


Phylogenies

A species phylogeny shows the evolutionary history of a set of species.

[Figure: species tree with leaves mouse, human and monkey.]

A gene phylogeny shows the evolutionary history of a single gene.

[Figure: gene tree for a gene P with internal nodes labelled P, P1 and P2 and leaves Pmouse, P1human, P1monkey, P2human and P2monkey.]


Why are gene phylogenies interesting?

• The same gene in different species is likely to play the same role.

• We want to determine the function of a gene in human.

• Experiments in mouse, yeast or flies are less controversial and take less time than in human.


Phylogeny construction considering mutations

A very large number of algorithms exist for this problem.

• Character based algorithms (as mentioned before).

• Distance based algorithms: distances between the sequences are computed (such as the number of mutations that occurred between the sequences).

• If the phylogeny has the ultrametric property an efficient algorithm can be employed.
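As a rough illustration of the last two bullets, the sketch below computes pairwise distances and tests the ultrametric property through the three-point condition (in every triple of sequences, the two largest pairwise distances are equal). Two assumptions are made here: Hamming distance on equal-length sequences stands in for "number of mutations", and the function names are hypothetical. The sketch only checks the property; it is not the construction algorithm the slide refers to.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_ultrametric(dist: list[list[float]], tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple the two largest distances agree."""
    for i, j, k in combinations(range(len(dist)), 3):
        _, second, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if largest - second > tol:
            return False
    return True


# Distances between the three example genes from the "Genes, genomes" slide.
genes = ["AACG", "CACC", "TACT"]
gene_dist = [[hamming(a, b) for b in genes] for a in genes]
print(gene_dist, is_ultrametric(gene_dist))

# A made-up matrix that violates the condition (largest two values: 5 and 6).
skewed = [[0, 2, 6], [2, 0, 5], [6, 5, 0]]
print(is_ultrametric(skewed))
```

The three example genes happen to be pairwise at distance 2, so their distance matrix is trivially ultrametric.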


Macroevolutionary phylogeny

Input: A rooted species tree, $T_S$, with $s$ leaves; a list of multiplicities $m_1 \dots m_s$, where $m_l$ is the number of gene family members found in species $l$; weights $c_\lambda$ and $c_\delta$.

Output: A rooted gene tree $T_G$ with $\sum_{l=1}^{s} m_l$ leaves such that the D/L Score of $T_G$ is minimal.

[Figure: example species tree whose leaves are labelled with their multiplicities: 2A, 1B, 2C, 1D, 2E, 1F, 2G.]


Phylogenies considering only cost of loss

• If the cost of losing a gene is much higher than the cost of duplication we will construct a phylogeny that minimizes the number of lost genes.

• All duplications will then take place after the speciations.
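A small sketch of the counting behind this case, under two assumptions: every species holds at least one copy, and (as the second bullet says) every duplication happens inside an individual species after the speciations. The multiplicities are read off the leaf labels of the example species tree (2A, 1B, 2C, 1D, 2E, 1F, 2G); the function name is hypothetical.

```python
MULTIPLICITIES = {"A": 2, "B": 1, "C": 2, "D": 1, "E": 2, "F": 1, "G": 2}


def duplications_without_losses(multiplicities: dict[str, int]) -> int:
    """Duplications needed when a single ancestral copy reaches every species
    and each species then duplicates it until it holds m_l copies."""
    if any(m < 1 for m in multiplicities.values()):
        raise ValueError("this sketch assumes every species has at least one copy")
    # A species with m copies needs m - 1 duplications of its single copy.
    return sum(m - 1 for m in multiplicities.values())


no_loss_duplications = duplications_without_losses(MULTIPLICITIES)
print(no_loss_duplications)
```

For the example multiplicities this gives four duplications and no losses, which agrees with the "4 Duplications" example.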


4 Duplications

[Figure: the example species tree (leaves 2A, 1B, 2C, 1D, 2E, 1F, 2G) and a gene tree without losses that contains four duplication nodes (Dupl); its leaves are A, A, B, C, C, D, E, E, F, G, G.]


Phylogenies considering only cost of duplication

• If the cost of a duplication is much higher than the cost of a loss we will construct a phylogeny that minimizes the number of duplications.

• All duplications can then be assumed to occur before any speciation occurs.
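The counting behind this case can be sketched as follows, assuming all duplications happen at the root until the largest multiplicity is reached and every species with fewer copies then loses the surplus. Losses are counted once per species here, so a loss shared by a whole clade would be over-counted; treat the loss figure as an upper bound. The function name is hypothetical and the multiplicities come from the example species tree.

```python
MULTIPLICITIES = {"A": 2, "B": 1, "C": 2, "D": 1, "E": 2, "F": 1, "G": 2}


def duplicate_first_history(multiplicities: dict[str, int]) -> tuple[int, int]:
    """Return (duplications, losses) when all duplications precede speciation."""
    top = max(multiplicities.values())
    if top < 1:
        raise ValueError("this sketch assumes at least one species has a copy")
    duplications = top - 1                      # grow 1 copy into `top` copies
    losses = sum(top - m for m in multiplicities.values())  # per-species surplus
    return duplications, losses


dup_first = duplicate_first_history(MULTIPLICITIES)
print(dup_first)
```

For the example multiplicities this gives one duplication and three losses (in B, D and F), matching the "1 Duplication, 3 Losses" example.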


1 Duplication, 3 Losses

[Figure: the example species tree (leaves 2A, 1B, 2C, 1D, 2E, 1F, 2G) and a gene tree with one duplication and three leaves marked Lost.]


Phylogenies considering duplication and loss

Reconstruct[$T_S$, {$m_1 \dots m_s$}]

    Ascend[root($T_S$)];

    Descend[root($T_S$), 1];

    Construct[root($T_S$)];


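Ascend[v]

    if v is not a leaf: Ascend[l(v)]; Ascend[r(v)];

    if v is a leaf:

        $\forall i$ s.t. $1 \le i \le m$:

            $\mathrm{costmin}_v[i] \leftarrow c_\delta \cdot \max(m_v - i, 0) + c_\lambda \cdot \max(i - m_v, 0)$;

    if v is not a leaf:

        $\forall i, j$ s.t. $1 \le i, j \le m$:

            $\mathrm{cost}_v[i, j] \leftarrow c_\delta \cdot \max(j - i, 0) + c_\lambda \cdot \max(i - j, 0) + \mathrm{costmin}_{l(v)}[j] + \mathrm{costmin}_{r(v)}[j]$;

        $\forall i$: $\mathrm{costmin}_v[i] \leftarrow \min_j \{\mathrm{cost}_v[i, j]\}$;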
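One way to read the Ascend pass is as a bottom-up dynamic program in which i is the number of gene copies entering a species node and j the number leaving it. The Python below is a minimal sketch under that reading, not the authors' implementation: the class `SpeciesNode`, the parameter names `m_max`, `c_dup` and `c_loss`, and the two-leaf test tree are all hypothetical, and a binary species tree is assumed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpeciesNode:
    name: str
    m: int = 0                                   # multiplicity m_v (leaves only)
    left: Optional["SpeciesNode"] = None
    right: Optional["SpeciesNode"] = None
    cost: dict = field(default_factory=dict)     # cost_v[i, j], internal nodes
    costmin: dict = field(default_factory=dict)  # costmin_v[i]

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def ascend(v: SpeciesNode, m_max: int, c_dup: float, c_loss: float) -> None:
    """Post-order pass filling costmin_v[i] for i = 1..m_max copies entering v."""
    counts = range(1, m_max + 1)
    if v.is_leaf():
        for i in counts:
            # Too few copies arrive: duplicate. Too many arrive: lose.
            v.costmin[i] = c_dup * max(v.m - i, 0) + c_loss * max(i - v.m, 0)
        return
    ascend(v.left, m_max, c_dup, c_loss)
    ascend(v.right, m_max, c_dup, c_loss)
    for i in counts:
        for j in counts:
            # i copies enter v, j copies are passed on to both children.
            v.cost[i, j] = (c_dup * max(j - i, 0) + c_loss * max(i - j, 0)
                            + v.left.costmin[j] + v.right.costmin[j])
        v.costmin[i] = min(v.cost[i, j] for j in counts)


# Hypothetical two-leaf tree: species A with 2 copies, species B with 1 copy.
root = SpeciesNode("AB", left=SpeciesNode("A", m=2), right=SpeciesNode("B", m=1))
ascend(root, m_max=2, c_dup=1, c_loss=1)
print(root.costmin)
```

With unit weights the cheapest history for this tiny tree costs 1 whether one or two copies enter the root: either A duplicates its single copy, or B loses one of two.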


Descend

Descend[v, i]

    if v is a leaf:

        v.losses ← max(i − m_v, 0); v.dups ← max(m_v − i, 0);

        v.out ← 0;

    else

        repeat { v.out++ } until ( cost_v[i, v.out] == costmin_v[i] );

        Descend[l(v), v.out]; Descend[r(v), v.out];

        v.losses ← max(i − v.out, 0); v.dups ← max(v.out − i, 0);
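The Descend pass can be sketched in Python as a top-down traceback over the cost tables. To keep the example runnable on its own, the Ascend recurrence is repeated in compact form. Everything named here (`Node`, `totals`, the three-leaf tree and the weights 3 and 1) is a hypothetical illustration rather than the authors' code, and `v.out` is assumed to start at 0 before the repeat loop.

```python
class Node:
    """Species-tree node; m is the multiplicity m_v and is used at leaves only."""

    def __init__(self, name, m=0, left=None, right=None):
        self.name, self.m, self.left, self.right = name, m, left, right
        self.cost, self.costmin = {}, {}
        self.out = self.dups = self.losses = 0

    def is_leaf(self):
        return self.left is None


def ascend(v, m_max, c_dup, c_loss):
    """Cost tables as in the Ascend pass (repeated so this sketch runs alone)."""
    counts = range(1, m_max + 1)
    if v.is_leaf():
        for i in counts:
            v.costmin[i] = c_dup * max(v.m - i, 0) + c_loss * max(i - v.m, 0)
        return
    ascend(v.left, m_max, c_dup, c_loss)
    ascend(v.right, m_max, c_dup, c_loss)
    for i in counts:
        for j in counts:
            v.cost[i, j] = (c_dup * max(j - i, 0) + c_loss * max(i - j, 0)
                            + v.left.costmin[j] + v.right.costmin[j])
        v.costmin[i] = min(v.cost[i, j] for j in counts)


def descend(v, i):
    """Top-down pass: fix the copies leaving v (v.out) and the events at v."""
    if v.is_leaf():
        v.losses, v.dups, v.out = max(i - v.m, 0), max(v.m - i, 0), 0
        return
    v.out = 1
    while v.cost[i, v.out] != v.costmin[i]:   # first j attaining the minimum
        v.out += 1
    descend(v.left, v.out)
    descend(v.right, v.out)
    v.losses, v.dups = max(i - v.out, 0), max(v.out - i, 0)


def totals(v):
    """Sum duplications and losses over the subtree rooted at v."""
    if v.is_leaf():
        return v.dups, v.losses
    left_d, left_l = totals(v.left)
    right_d, right_l = totals(v.right)
    return v.dups + left_d + right_d, v.losses + left_l + right_l


# Hypothetical tree ((A:2, B:1), C:2) with c_dup = 3 and c_loss = 1.
root = Node("R", left=Node("X", left=Node("A", 2), right=Node("B", 1)),
            right=Node("C", 2))
ascend(root, m_max=2, c_dup=3, c_loss=1)
descend(root, 1)
print(totals(root), root.costmin[1])
```

In this example the traceback places one duplication at the root and one loss in species B, for a total cost of 3 + 1 = 4, which equals the root's minimum cost for a single entering copy.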


Construct

Construct[s]

    g ← new gene node; g.species ← s

    if (s.currDup < s.dups)

        s.currDup++; l(g) ← Construct[s]; r(g) ← Construct[s];

    else if (s.currLoss < s.losses)

        s.currLoss++;

    else if (s.currSpec < s.out)

        s.currSpec++;

        if s is not a leaf: l(g) ← Construct[l(s)]; r(g) ← Construct[r(s)];

    return g;
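A Python sketch of the Construct step follows. To keep it self-contained, the species nodes are annotated by hand with `dups`, `losses` and `out` values of the kind a Descend pass would assign; the tree ((A:2, B:1), C:2), the class names and the `kind` label on gene nodes are assumptions made for this illustration (the pseudocode itself does not mark which gene nodes are duplications, losses or speciations).

```python
class Species:
    """Species node already annotated with dups, losses and out."""

    def __init__(self, name, dups=0, losses=0, out=0, left=None, right=None):
        self.name, self.left, self.right = name, left, right
        self.dups, self.losses, self.out = dups, losses, out
        self.curr_dup = self.curr_loss = self.curr_spec = 0


class Gene:
    """Gene-tree node; `kind` is a label added for this sketch."""

    def __init__(self, species):
        self.species, self.kind = species.name, "gene"
        self.left = self.right = None


def construct(s):
    """Build the gene subtree for one gene lineage entering species node s."""
    g = Gene(s)
    if s.curr_dup < s.dups:            # still duplications to place at s
        s.curr_dup += 1
        g.kind = "dup"
        g.left, g.right = construct(s), construct(s)
    elif s.curr_loss < s.losses:       # this lineage is lost at s
        s.curr_loss += 1
        g.kind = "loss"
    elif s.curr_spec < s.out:          # lineage passes on to both children
        s.curr_spec += 1
        if s.left is not None:
            g.kind = "spec"
            g.left, g.right = construct(s.left), construct(s.right)
    return g


def leaves(g):
    """List (kind, species) for the leaves of a gene tree, left to right."""
    if g.left is None:
        return [(g.kind, g.species)]
    return leaves(g.left) + leaves(g.right)


# Hand-set annotations: one duplication at the root, one loss in species B.
a, b, c = Species("A"), Species("B", losses=1), Species("C")
x = Species("X", out=2, left=a, right=b)
root = Species("R", dups=1, out=2, left=x, right=c)

gene_tree = construct(root)
print(leaves(gene_tree))
```

The resulting gene tree has five gene leaves (two in A, one in B, two in C), the sum of the multiplicities, plus one leaf marked as a loss.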


2 Duplications, 1 loss

[Figure: the example species tree (leaves 2A, 1B, 2C, 1D, 2E, 1F, 2G) and a gene tree with two duplication nodes (Dupl) and one leaf marked Lost.]
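The three example histories can be compared numerically. The sketch below assumes the D/L score is the weighted sum of duplications and losses, which is how the Ascend recurrence charges them, and assumes unit weights; the slides do not state which weights were used for the example. The event counts are taken from the slide titles, and the dictionary keys are labels invented here.

```python
def dl_score(dups: int, losses: int, c_dup: float = 1.0, c_loss: float = 1.0) -> float:
    """Weighted duplication/loss score (assumed form: c_dup*dups + c_loss*losses)."""
    return c_dup * dups + c_loss * losses


# (duplications, losses) taken from the titles of the three example slides.
HISTORIES = {
    "loss-minimizing": (4, 0),
    "duplication-minimizing": (1, 3),
    "combined": (2, 1),
}

scores = {name: dl_score(d, n_lost) for name, (d, n_lost) in HISTORIES.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Under unit weights the two extreme histories both score 4, while the history with two duplications and one loss scores 3.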


Time Complexity

An optimal history can be found in time $O(nm^2)$, where $n$ is the number of nodes in the species tree and $m$ is the maximum number of genes drawn from any species.

In Ascend, leaves of the species tree can be annotated with multiplicities in $O(nm)$ time. The cost vector in each node is of length $m + 1$ and each entry can be computed in time $O(m)$, total $O(nm^2)$.

Descend requires $O(m)$ at each node, total $O(nm)$. Construct inserts duplication and loss nodes in the new tree, which can number in total no more than $m$ per node in $T_S$. Total $O(nm)$.


Extensions

• Combining duplication and loss cost with cost of mutations.

– Some edges of a phylogenetic tree are well supported by micro-evolutionary phylogeny construction algorithms.

– Edges that are not as well supported can be rearranged to minimize duplication and loss.

• Consider and display all possible optimal histories.


Acknowledgements

• R. Ravi, Carnegie Mellon University

• Dannie Durand, Carnegie Mellon University

A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction. D. Durand, B. V. Halldórsson, B. Vernot, 2005. Proceedings of the Ninth Annual International Conference on Computational Molecular Biology (RECOMB), To Appear.
