CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 [email protected] www.cis.fiu.edu/~giri/teach/BioinfS15.html

Giri Narasimhanusers.cis.fiu.edu/~giri/teach/Bioinf/S15/LecX1-Phylogeny.pdf · Giri Narasimhan ECS 254; Phone: x3748 ... Slide by Pevsner ... Molecular Clock Hypothesis, Zuckerkandl

Apr 22, 2018


CAP 5510: Introduction to Bioinformatics
CGS 5166: Bioinformatics Tools

Giri Narasimhan ECS 254; Phone: x3748

[email protected] www.cis.fiu.edu/~giri/teach/BioinfS15.html


Evolution and Phylogeny

Introduction

Page 215

Darwin: Evolution & Natural Selection

• Charles Darwin's 1859 book (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) introduced the theory of evolution.

• The struggle for existence induces natural selection. Offspring differ from their parents (that is, variability exists), and individuals that are more fit for a given environment are selected for. In this way, over long periods of time, species evolve. Groups of organisms change over time so that descendants differ structurally and functionally from their ancestors.

Slide by Pevsner 4/5/15 CAP5510 / CGS5166 3

Dominant View of Evolution

• All existing organisms are derived from a common ancestor, and new species arise when a population splits into subpopulations that do not cross-breed.

• Organization: directed rooted tree. Existing species: leaves. Common ancestor species (divergence event): internal node. Length of an edge: time.

[Figure: Five kingdom system (Haeckel, 1879). Labels: plants, animals, monera, fungi, protists, protozoa, invertebrates, vertebrates, mammals.]

Page 516

Slide by Pevsner

Evolution & Phylogeny

• At the molecular level, evolution is a process of mutation with selection.

• Molecular evolution is the study of changes in genes and proteins throughout different branches of the tree of life.

• Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are also used for phylogenetic analyses.

Slide by Pevsner

Questions for Phylogenetic Analysis

• How many genes are related to my favorite gene?
• How closely related are whales, dolphins & porpoises to cows?
• Where and when did HIV or other viruses originate?
• What is the history of life on Earth?
• Was the extinct quagga more like a zebra or a horse?

Slide by Pevsner

Phylogenetic Trees

• Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA and protein sequence data.

[Figure: tree with nodes labeled A to I, drawn against a time axis, with numeric branch-length labels.]

Slide by Pevsner

Tree Nomenclature

[Figure: two renderings of a tree. Left panel: nodes A to I drawn against a time axis. Right panel: taxa A to E with numeric branch lengths and a scale bar of one unit. Callout: taxon.]

Fig. 7.8 Page 232

Slide by Pevsner

Tree nomenclature

[Figure: two renderings of a tree. Left panel: nodes A to I drawn against a time axis. Right panel: taxa A to E with numeric branch lengths and a scale bar of one unit.]

• taxon
• operational taxonomic unit (OTU), such as a protein sequence

Fig. 7.8 Page 232

Slide by Pevsner

Tree nomenclature

[Figure: two renderings of a tree. Left panel: nodes A to I drawn against a time axis. Right panel: taxa A to E with numeric branch lengths and a scale bar of one unit.]

• branch (edge)
• node (intersection or terminating point of two or more branches)

Fig. 7.8 Page 232

Slide by Pevsner

Tree nomenclature

[Figure: two renderings of a tree. Left panel: nodes A to I drawn against a time axis. Right panel: taxa A to E with numeric branch lengths and a scale bar of one unit.]

• Branches are unscaled: OTUs are neatly aligned, and nodes reflect time.
• Branches are scaled: branch lengths are proportional to the number of amino acid changes.

Fig. 7.8 Page 232

Slide by Pevsner

Tree nomenclature

[Figure: two renderings of a tree. Left panel: nodes A to I drawn against a time axis. Right panel: taxa A to E with numeric branch lengths and a scale bar of one unit.]

• bifurcating internal node
• multifurcating internal node

Fig. 7.9 Page 233

Slide by Pevsner

Examples of multifurcation: failure to resolve the branching order of some metazoans and protostomes

Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations Compressed in Time, Science 310:1933 (2005), Fig. 1.

Slide by Pevsner

Tree nomenclature: clades

[Figure: tree with nodes A to I drawn against a time axis; the highlighted group is clade ABF.]

Clade ABF (monophyletic group)

Fig. 7.8 Page 232

Slide by Pevsner

Tree nomenclature

[Figure: tree with nodes A to I drawn against a time axis; the highlighted group is clade CDH.]

Clade CDH

Fig. 7.8 Page 232

Slide by Pevsner

Tree nomenclature

[Figure: tree with nodes A to I drawn against a time axis; the highlighted group is clade ABF/CDH/G.]

Clade ABF/CDH/G

Fig. 7.8 Page 232

Slide by Pevsner

Examples of clades

Lindblad-Toh et al., Nature 438: 803 (2005), fig. 10

Slide by Pevsner

Tree nomenclature: roots

[Figure: left, a rooted tree (specifies evolutionary path) with nodes 1 to 9, drawn from past to present; right, an unrooted tree with nodes 1 to 8.]

Fig. 7.10 Page 234

Slide by Pevsner

Tree nomenclature: outgroup rooting

[Figure: left, the rooted tree with nodes 1 to 9, drawn from past to present; right, a rooted tree with nodes 1 to 10 in which an outgroup (used to place the root) determines the position of the root.]

Fig. 7.10 Page 234

Slide by Pevsner

Constructing Evolutionary/Phylogenetic Trees

• 2 broad categories:
  - Distance-based methods
    - Ultrametric
    - Additive:
      - UPGMA
      - Transformed Distance
      - Neighbor-Joining
  - Character-based methods
    - Maximum Parsimony
    - Maximum Likelihood
    - Bayesian Methods

Ultrametric

• An ultrametric tree: internal node labels decrease along every path from the root to a leaf, and the distance between two leaf nodes is the label of their least common ancestor.

• An ultrametric distance matrix: a symmetric matrix such that for every i, j, k, there is a tie for the maximum of D(i,j), D(j,k), D(i,k).

[Figure: tree on leaves i, j, k. The upper internal node is labeled Dij, Dik; the lower internal node, above j and k, is labeled Djk.]
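The tie-for-maximum condition on a distance matrix can be checked directly over all triples. The following is a minimal Python sketch; the function name `is_ultrametric`, the tolerance `tol`, and both example matrices are illustrative assumptions rather than material from the lecture.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Return True if the square matrix D (list of lists) is ultrametric."""
    n = len(D)
    # Basic sanity checks: zero diagonal and symmetry.
    for i in range(n):
        if abs(D[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            if abs(D[i][j] - D[j][i]) > tol:
                return False
    # Three-point condition: the two largest of the three distances must tie.
    for i, j, k in combinations(range(n), 3):
        low, mid, high = sorted((D[i][j], D[j][k], D[i][k]))
        if high - mid > tol:
            return False
    return True

# Hypothetical 3-taxon matrix: the first two taxa are close,
# and the third is equally far from both.
clock_like = [[0, 2, 4],
              [2, 0, 4],
              [4, 4, 0]]
# Same matrix with one distance stretched, so the maximum is no longer tied.
not_clock_like = [[0, 2, 4],
                  [2, 0, 5],
                  [4, 5, 0]]

print(is_ultrametric(clock_like))      # True
print(is_ultrametric(not_clock_like))  # False
```

The check costs O(n^3) time, which is fine for small examples; it only verifies the condition and does not build the tree.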

Ultrametric: Assumptions

• Molecular Clock Hypothesis (Zuckerkandl & Pauling, 1962): accepted point mutations in the amino acid sequence of a protein occur at a constant rate.
  - The rate varies from protein to protein.
  - The rate varies from one part of a protein to another.

Ultrametric Data Sources

• Lab-based methods: hybridization
  - Take denatured DNA of the two taxa and let the strands hybridize. Then measure the energy needed to separate them.

• Sequence-based methods: distance

Ultrametric: Example

     A  B  C  D  E  F  G  H
A    0  4  3  4  5  4  3  4

[Figure: partial tree built from row A. A joins the group {C, G} at height 3, the group {B, D, F, H} at height 4, and E at height 5.]

Ultrametric: Example

     A  B  C  D  E  F  G  H
A    0  4  3  4  5  4  3  4
B       0  4  2  5  1  4  4

[Figure: the tree refined using row B. Within {B, D, F, H}, B joins F at height 1 and D at height 2, with H attached at height 4; A, {C, G}, and E are placed as before.]

Ultrametric: Distances Computed

     A  B  C  D  E  F  G  H
A    0  4  3  4  5  4  3  4
B       0  4  2  5  1  4  4

Row C shows a single computed entry, 2; its column is not recoverable from the transcript. Rows D to H are empty.

[Figure: the same tree as on the previous slide, with heights 1, 2, 3, 4, 5.]


Additive-Distance Trees

     A  B  C  D
A    0  3  7  9
B       0  6  8
C          0  6
D             0

[Figure: unrooted tree on leaves A, B, C, D with edge weights 2, 1, 3, 2, 4.]

Additive distance trees are edge-weighted trees in which the distance between two leaf nodes is exactly equal to the length of the path between them.
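To make the definition concrete, the sketch below computes path lengths between leaves of an edge-weighted tree and compares them with the 4-taxon matrix above. The assignment of weights to edges is an assumption: it is the unique assignment that solves the six pairwise equations for the topology grouping A with B and C with D, not something read off the slide figure. The names `leaf_distances`, `u`, and `v` are hypothetical.

```python
def leaf_distances(edges, leaves):
    """Path lengths between all pairs of leaves in an edge-weighted tree."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    def distances_from(source):
        # Depth-first walk; in a tree each node is reached by exactly one path.
        dist = {source: 0}
        stack = [source]
        while stack:
            node = stack.pop()
            for neighbor, w in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + w
                    stack.append(neighbor)
        return dist

    table = {}
    for a in leaves:
        from_a = distances_from(a)
        table[a] = {b: from_a[b] for b in leaves}
    return table

# Assumed edge assignment (derived from the matrix, not read off the slide):
# u is the internal node next to A and B, v the one next to C and D.
edges = [("A", "u", 2), ("B", "u", 1), ("u", "v", 3), ("C", "v", 2), ("D", "v", 4)]
matrix = {
    "A": {"A": 0, "B": 3, "C": 7, "D": 9},
    "B": {"A": 3, "B": 0, "C": 6, "D": 8},
    "C": {"A": 7, "B": 6, "C": 0, "D": 6},
    "D": {"A": 9, "B": 8, "C": 6, "D": 0},
}
print(leaf_distances(edges, "ABCD") == matrix)  # True
```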

Unrooted Trees on 4 Taxa

[Figure: the three unrooted tree topologies on taxa A, B, C, D.]

Four-Point Condition

• If the true tree is as shown below, then
  1. dAB + dCD < dAC + dBD, and
  2. dAB + dCD < dAD + dBC

[Figure: unrooted tree on taxa A, B, C, D that groups A with B and C with D.]
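The two inequalities suggest a simple way to pick a topology for four taxa: compute the three pairwise sums and take the smallest. The sketch below does this for the 4-taxon matrix from the Additive-Distance Trees slide; the function name `quartet_split` and its return format are assumptions made for illustration.

```python
def quartet_split(dist, taxa):
    """Return the pairing with the smallest sum of distances, plus all three sums."""
    a, b, c, d = taxa
    sums = {
        ((a, b), (c, d)): dist[a][b] + dist[c][d],
        ((a, c), (b, d)): dist[a][c] + dist[b][d],
        ((a, d), (b, c)): dist[a][d] + dist[b][c],
    }
    best = min(sums, key=sums.get)
    return best, sums

# Matrix from the Additive-Distance Trees slide, written symmetrically.
dist = {
    "A": {"B": 3, "C": 7, "D": 9},
    "B": {"A": 3, "C": 6, "D": 8},
    "C": {"A": 7, "B": 6, "D": 6},
    "D": {"A": 9, "B": 8, "C": 6},
}
best, sums = quartet_split(dist, "ABCD")
print(best)                   # (('A', 'B'), ('C', 'D'))
print(sorted(sums.values()))  # [9, 15, 15]
```

For an exactly additive matrix the two larger sums are equal, as they are here; with noisy data they usually differ, and the smallest sum is taken as the best-supported split.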

Unweighted pair-group method with arithmetic means (UPGMA)

Initial distance matrix:

     A     B     C
B    dAB
C    dAC   dBC
D    dAD   dBD   dCD

[Figure: A and B joined at a node of height dAB/2.]

Distance matrix after merging A and B:

     AB       C
C    d(AB)C
D    d(AB)D   dCD

d(AB)C = (dAC + dBC) / 2
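The merge-and-average step can be repeated until one cluster remains. The following is a minimal sketch of that loop, not the lecture's own code: the function name `upgma`, the Newick-style cluster labels, and the example matrix are assumptions. For clusters of more than one taxon it uses the size-weighted average, which reduces to the slide's (dAC + dBC) / 2 when A and B are single taxa.

```python
from itertools import combinations

def upgma(names, dist):
    """Return the list of merges as (cluster_a, cluster_b, node_height)."""
    sizes = {name: 1 for name in names}
    d = {frozenset(p): float(dist[p[0]][p[1]]) for p in combinations(names, 2)}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)              # closest pair of clusters
        a, b = sorted(pair)
        merges.append((a, b, d[pair] / 2.0))  # new node sits at half the distance
        merged = f"({a},{b})"
        for c in sizes:
            if c not in pair:
                dac = d.pop(frozenset((a, c)))
                dbc = d.pop(frozenset((b, c)))
                # Average over all leaf pairs, weighted by cluster sizes.
                d[frozenset((merged, c))] = (
                    sizes[a] * dac + sizes[b] * dbc
                ) / (sizes[a] + sizes[b])
        del d[pair]
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return merges

# Hypothetical ultrametric distances on four taxa (upper triangle only).
example = {"A": {"B": 2, "C": 4, "D": 6}, "B": {"C": 4, "D": 6}, "C": {"D": 6}}
for step in upgma(["A", "B", "C", "D"], example):
    print(step)
```

On this input the merges are A with B at height 1.0, then C at height 2.0, then D at height 3.0. Ties in the minimum are broken by dictionary order, which is an arbitrary choice in this sketch.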

Transformed Distance Method

• UPGMA makes errors when rate constancy among lineages does not hold.
• Remedy: introduce an outgroup O and make corrections:

$$ D'_{ij} = \frac{D_{ij} - D_{iO} - D_{jO}}{2} + \frac{1}{n} \sum_{k=1}^{n} D_{kO} $$

• Now apply UPGMA.
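The correction can be applied to every ingroup pair before running UPGMA. The sketch below assumes the transformed distance is half of (D_ij minus D_iO minus D_jO) plus the mean distance from the n ingroup taxa to the outgroup O. The function name `transformed_distances`, the zero diagonal, and the example distances are all hypothetical.

```python
def transformed_distances(dist, ingroup, outgroup):
    """Apply the outgroup correction to every pair of ingroup taxa."""
    mean_to_outgroup = sum(dist[k][outgroup] for k in ingroup) / len(ingroup)
    corrected = {a: {} for a in ingroup}
    for a in ingroup:
        for b in ingroup:
            if a == b:
                corrected[a][b] = 0.0  # diagonal kept at zero by convention
            else:
                corrected[a][b] = (
                    dist[a][b] - dist[a][outgroup] - dist[b][outgroup]
                ) / 2 + mean_to_outgroup
    return corrected

# Hypothetical distances: B has evolved faster than A (it is farther from O).
dist = {
    "A": {"B": 4, "C": 6, "O": 5},
    "B": {"A": 4, "C": 6, "O": 7},
    "C": {"A": 6, "B": 6, "O": 6},
}
new = transformed_distances(dist, ["A", "B", "C"], "O")
print(new["A"]["B"], new["A"]["C"], new["B"]["C"])  # 2.0 3.5 2.5
```

The corrected matrix can then be passed to any UPGMA routine in place of the original distances.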

Saitou & Nei: Neighbor-Joining Method

• Start with a star topology.
• Find the pair to separate such that the total length of the tree is minimized. The pair is then replaced by its arithmetic mean, and the process is repeated.

$$ S_{12} = \frac{1}{2(n-2)} \sum_{k=3}^{n} (D_{1k} + D_{2k}) + \frac{1}{2} D_{12} + \frac{1}{n-2} \sum_{3 \le i < j \le n} D_{ij} $$

Neighbor-Joining

[Figure: left, a star tree on taxa 1, 2, 3, ..., n; right, the tree after taxa 1 and 2 are separated from the rest at a new internal node.]

$$ S_{12} = \frac{1}{2(n-2)} \sum_{k=3}^{n} (D_{1k} + D_{2k}) + \frac{1}{2} D_{12} + \frac{1}{n-2} \sum_{3 \le i < j \le n} D_{ij} $$
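The pair-selection step can be sketched by evaluating the total-length formula for every candidate pair and keeping the minimum. The code below is an illustrative sketch for a single selection step, not a full neighbor-joining implementation. The function name `nj_tree_length` is an assumption, and the matrix is the 4-taxon example from the Additive-Distance Trees slide with taxa indexed 0 to 3 for A to D.

```python
from itertools import combinations

def nj_tree_length(D, i, j):
    """Total tree length S_ij when taxa i and j are joined first."""
    n = len(D)
    others = [k for k in range(n) if k not in (i, j)]
    # Average distance from the joined pair to every other taxon.
    to_pair = sum(D[i][k] + D[j][k] for k in others) / (2 * (n - 2))
    # Distances among the remaining taxa (each unordered pair counted once).
    among_rest = sum(D[a][b] for a, b in combinations(others, 2)) / (n - 2)
    return to_pair + D[i][j] / 2 + among_rest

D = [[0, 3, 7, 9],
     [3, 0, 6, 8],
     [7, 6, 0, 6],
     [9, 8, 6, 0]]

scores = {pair: nj_tree_length(D, *pair) for pair in combinations(range(4), 2)}
print(scores[(0, 1)], scores[(0, 2)], scores[(2, 3)])  # 12.0 13.5 12.0
print(min(scores, key=scores.get))                     # (0, 1)
```

Joining A with B (or, equivalently for four taxa, C with D) gives length 12, which equals the sum of the five edge weights 2 + 1 + 3 + 2 + 4 of the additive tree for this matrix.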


Character-based Methods

• Input: characters, morphological features, sequences, etc.
• Output: a phylogenetic tree that provides the history of which features changed. [Perfect Phylogeny Problem]
• One leaf per object, one edge per character; path ⇔ changed traits.

     1  2  3  4  5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0

[Figure: perfect phylogeny for this matrix, with edges labeled by characters 1 to 5 and leaves A, B, C, D, E.]

Example

• Perfect phylogeny does not always exist.

Character matrix with no perfect phylogeny (characters 2 and 5 conflict):

     1  2  3  4  5
A    1  1  0  0  0
B    0  0  1  0  1
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  1

Character matrix with a perfect phylogeny:

     1  2  3  4  5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0

[Figure: perfect phylogeny for the second matrix, with edges labeled by characters 1 to 5 and leaves A, B, C, D, E.]
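Whether a binary character matrix admits a perfect phylogeny can be tested with a pairwise compatibility check. The sketch below assumes the ancestral state of every character is 0, in which case a perfect phylogeny exists exactly when, for every pair of characters, the sets of taxa carrying state 1 are either disjoint or nested. The function name `has_perfect_phylogeny` and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations

def has_perfect_phylogeny(matrix):
    """matrix maps each taxon to its list of 0/1 character states."""
    n_chars = len(next(iter(matrix.values())))
    # For each character, the set of taxa that carry state 1.
    carriers = [
        {taxon for taxon, row in matrix.items() if row[c] == 1}
        for c in range(n_chars)
    ]
    for s, t in combinations(carriers, 2):
        overlapping = bool(s & t)
        nested = s <= t or t <= s
        if overlapping and not nested:
            return False  # the two characters cannot both change only once
    return True

with_tree = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 0, 1],
    "D": [0, 0, 1, 1, 0],
    "E": [0, 1, 0, 0, 0],
}
without_tree = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 1],
    "C": [1, 1, 0, 0, 1],
    "D": [0, 0, 1, 1, 0],
    "E": [0, 1, 0, 0, 1],
}
print(has_perfect_phylogeny(with_tree))     # True
print(has_perfect_phylogeny(without_tree))  # False
```

In the second matrix, character 2 is carried by {A, C, E} and character 5 by {B, C, E}; the sets overlap without one containing the other, so no tree can explain both with a single change each.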

Maximum Parsimony

• Minimize the total number of mutations implied by the evolutionary history.

Examples of Character Data

            Characters/Sites
Sequences   1  2  3  4  5  6  7  8  9
1           A  A  G  A  G  T  T  C  A
2           A  G  C  C  G  T  T  C  T
3           A  G  A  T  A  T  C  C  A
4           A  G  A  G  A  T  C  C  T

     1  2  3  4  5
A    1  1  0  0  0
B    0  0  1  0  1
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  1

Maximum Parsimony Method: Example

            Characters/Sites
Sequences   1  2  3  4  5  6  7  8  9
1           A  A  G  A  G  T  T  C  A
2           A  G  C  C  G  T  T  C  T
3           A  G  A  T  A  T  C  C  A
4           A  G  A  G  A  T  C  C  T

Unrooted Trees on 4 Taxa

[Figure: the three unrooted tree topologies on taxa A, B, C, D.]

[Figure: four copies of the following sequence table; any per-copy highlighting is not recoverable from the transcript.]

            Characters/Sites
Sequences   1  2  3  4  5  6  7  8  9
1           A  A  G  A  G  T  T  C  A
2           A  G  C  C  G  T  T  C  T
3           A  G  A  T  A  T  C  C  A
4           A  G  A  G  A  T  C  C  T


Inferring nucleotides on internal nodes
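One standard way to infer candidate nucleotides at internal nodes and count the implied changes is Fitch's small-parsimony algorithm. The transcript does not say which procedure this slide used, so the sketch below is an assumption. It scores the three four-taxon topologies on the four example sequences, representing each unrooted tree as a nested pair rooted on its internal edge; the names `fitch_site` and `parsimony_score` are hypothetical.

```python
def fitch_site(tree, states):
    """Return (candidate state set, minimum number of changes) for one site."""
    if isinstance(tree, str):          # leaf: its observed nucleotide, no cost
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:                         # children agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Sum the per-site Fitch costs over all sites."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )

seqs = {
    "1": "AAGAGTTCA",
    "2": "AGCCGTTCT",
    "3": "AGATATCCA",
    "4": "AGAGATCCT",
}
topologies = {
    "12|34": (("1", "2"), ("3", "4")),
    "13|24": (("1", "3"), ("2", "4")),
    "14|23": (("1", "4"), ("2", "3")),
}
for label, tree in topologies.items():
    print(label, parsimony_score(tree, seqs))
# 12|34 10
# 13|24 11
# 14|23 12
```

Under this scoring, the topology that groups sequence 1 with 2 and sequence 3 with 4 needs the fewest changes (10) and would be the maximum parsimony tree for these data.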

Searching for the Maximum Parsimony Tree: Exhaustive Search

Searching for the Maximum Parsimony Tree: Branch-&-Bound


Probabilistic Models of Evolution

• Assume a model of substitution that gives Pr{Si(t+Δ) = Y | Si(t) = X}, the probability that site i is in state Y at time t+Δ given that it was in state X at time t.
• Using this formula, it is possible to compute the likelihood that data D is generated by a given phylogenetic tree T under a model of substitution. Now find the tree with the maximum likelihood.

[Figure: a single edge from a node in state X to a node in state Y.]

• Time elapsed? Δ
• Prob of change along edge? Pr{Si(t+Δ) = Y | Si(t) = X}
• Prob of data? Product of prob for all edges
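The slide leaves the substitution model unspecified. As an illustration only, the sketch below plugs in the Jukes-Cantor (JC69) model, in which every nucleotide changes to each of the other three at the same rate, and multiplies per-site probabilities along one edge. The choice of JC69, the `rate` parameter (expected substitutions per site per unit time), and the function names are assumptions.

```python
import math

def jc69_prob(x, y, t, rate=1.0):
    """Pr{S(t0 + t) = y | S(t0) = x} under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * rate * t / 3.0)
    if x == y:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

def edge_likelihood(parent_seq, child_seq, t, rate=1.0):
    """Probability of the child sequence given the parent, sites independent."""
    likelihood = 1.0
    for x, y in zip(parent_seq, child_seq):
        likelihood *= jc69_prob(x, y, t, rate)
    return likelihood

# The four outcomes from a given starting state always sum to 1.
print(sum(jc69_prob("A", y, 0.5) for y in "ACGT"))
# Two example sequences from the parsimony slides, with an arbitrary elapsed time.
print(edge_likelihood("AAGAGTTCA", "AGCCGTTCT", t=0.3))
```

With states known at every node, the probability of the data is the product of such edge terms over all edges, as the slide states. In practice the internal states are unknown, so a full likelihood computation also sums over them.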

Computing Maximum Likelihood Tree