Inferring phylogenetic trees Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington [email protected]

Inferring phylogenetic trees

Feb 23, 2016

Transcript
Page 1: Inferring phylogenetic trees

Inferring phylogenetic trees

Prof. William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington
[email protected]

Page 2: Inferring phylogenetic trees

One-minute responses

• I did not understand anything in the Gibbs sampling and the second method.
• The class was quite OK now. Understood most important things.
• I understood 50% of the Python part. But I am a bit confused about the goal of the programs.
• Please send us the slides immediately after lecture.
  – I put the slides on the website during the Python half of the class. Hit “refresh” on the web browser to see them.
• I didn’t understand clearly converting scores to p-values, more especially putting 1 and 2. Otherwise everything was clear.
• I think we should go a little bit slower.
• I didn’t understand the EM and Gibbs.
• The concept of EM and Gibbs sampling are really very important. Please go in depth on them.
• Python sessions are still fine as usual.
• These algorithms are complex. Could you please explain them with a bit of some examples?
• I didn’t understand the second Python problem.
• Emile must not mark our assessment on the programming part.

Page 3: Inferring phylogenetic trees

Revision - Gibbs

Start: randomly select motif occurrences in the sequences. Then repeat:

1. Randomly discard one sequence
2. Build PSSM from remaining sequences
   • Counts
   • Add pseudocounts
   • Normalize
3. Scan discarded sequence with PSSM
4. Choose new occurrence according to resulting probabilities

[Diagram: a cycle linking the sequences, the motif occurrences, and the PSSM]
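The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the lecture's reference code: the function names (`build_pssm`, `gibbs_step`), the DNA alphabet, the pseudocount of 1, and the use of raw PSSM probabilities (no background correction) as sampling weights are all choices made here.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA sequences

def build_pssm(occurrences, pseudocount=1.0):
    """Per-column probabilities: counts, plus pseudocounts, then normalized."""
    width = len(occurrences[0])
    pssm = []
    for col in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for occ in occurrences:
            counts[occ[col]] += 1
        total = sum(counts.values())
        pssm.append({base: counts[base] / total for base in ALPHABET})
    return pssm

def gibbs_step(sequences, starts, width, rng=random):
    """Run one Gibbs iteration and return an updated copy of the start positions."""
    held_out = rng.randrange(len(sequences))          # 1. discard one sequence
    others = [seq[s:s + width]
              for i, (seq, s) in enumerate(zip(sequences, starts))
              if i != held_out]
    pssm = build_pssm(others)                         # 2. PSSM from the rest
    seq = sequences[held_out]
    weights = []
    for pos in range(len(seq) - width + 1):           # 3. scan the discarded sequence
        prob = 1.0
        for col in range(width):
            prob *= pssm[col][seq[pos + col]]
        weights.append(prob)
    # 4. sample the new occurrence in proportion to its probability
    new_start = rng.choices(range(len(weights)), weights=weights)[0]
    updated = list(starts)
    updated[held_out] = new_start
    return updated
```

Calling `gibbs_step` repeatedly, starting from random start positions, gives the sampler; because step 4 samples rather than taking the maximum, two runs can end at different motifs.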

Page 4: Inferring phylogenetic trees

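Revision - EM

Start: randomly select motif occurrences in the sequences. Then repeat:

Build the PSSM from the motif occurrences:

1. Counts
2. Add pseudocounts
3. Normalize
4. Divide by background
5. Take log2

Update the motif occurrences:

1. Scan each sequence with PSSM
2. Take top-scoring occurrence

[Diagram: a cycle linking the sequences, the motif occurrences, and the PSSM]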
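A rough Python sketch of one pass through this loop is given below. It follows the slide's simplified version, in which each sequence keeps only its single top-scoring occurrence. The names (`log_odds_pssm`, `best_start`, `em_like_iteration`), the uniform background, and the pseudocount of 1 are assumptions made for the illustration.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA sequences

def log_odds_pssm(occurrences, background=None, pseudocount=1.0):
    """Counts, pseudocounts, normalize, divide by background, take log2."""
    background = background or {base: 0.25 for base in ALPHABET}
    width = len(occurrences[0])
    pssm = []
    for col in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for occ in occurrences:
            counts[occ[col]] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2(counts[base] / total / background[base])
                     for base in ALPHABET})
    return pssm

def best_start(seq, pssm):
    """Scan one sequence and return the start of its top-scoring window."""
    width = len(pssm)
    scores = [sum(pssm[col][seq[pos + col]] for col in range(width))
              for pos in range(len(seq) - width + 1)]
    return max(range(len(scores)), key=scores.__getitem__)

def em_like_iteration(sequences, starts, width):
    """Rebuild the PSSM from current occurrences, then re-pick each occurrence."""
    occurrences = [seq[s:s + width] for seq, s in zip(sequences, starts)]
    pssm = log_odds_pssm(occurrences)
    return [best_start(seq, pssm) for seq in sequences]
```

Iterating `em_like_iteration` until the start positions stop changing gives a deterministic counterpart to the Gibbs sampler: the only difference in the update is taking the maximum instead of sampling.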

Page 5: Inferring phylogenetic trees

Phylogenetic inference

Rabbit   Dove   Lion   Donkey

?

Page 6: Inferring phylogenetic trees

Outline

• Parsimony
• Distance methods
  – Computing distances
  – Finding the tree

• Maximum likelihood

Page 7: Inferring phylogenetic trees

Selecting a method

1. Choose set of related sequences.
2. Obtain multiple sequence alignment.
3. Is there strong sequence similarity?
   – Yes: maximum parsimony methods
   – No: go to step 4
4. Is there clearly recognizable sequence similarity?
   – Yes: distance methods
   – No: maximum likelihood methods

Page 8: Inferring phylogenetic trees

Maximum parsimony

for each possible tree
    compute the parsimony score
return the tree with the best score

• "for each possible tree": enumerating these trees can take a very long time.
• "compute the parsimony score": computing this score is straightforward.

Page 9: Inferring phylogenetic trees

How many trees?

• With four sequences: 3 unrooted trees.
• With five sequences: 15 unrooted trees.
• With seven sequences: 945 unrooted trees.

[Figure: the three unrooted trees on leaves 1, 2, 3, 4, one for each way of splitting the four leaves into two pairs]
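The counts on this slide follow the standard formula for unrooted binary trees on n labeled leaves, (2n − 5)!! = 3 · 5 · … · (2n − 5). A small sketch (the function name `num_unrooted_trees` is made up here) shows how quickly the number grows:

```python
def num_unrooted_trees(n_leaves):
    """Number of unrooted binary trees on n labeled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least three leaves")
    count = 1
    for factor in range(3, 2 * n_leaves - 4, 2):  # 3, 5, ..., 2n - 5
        count *= factor
    return count

for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
```

For ten sequences the count is already 2,027,025, which is why exhaustive enumeration is only practical for small numbers of sequences.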

Page 10: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A

Site 1, tree ((Scer, Spar), (Smik, Skud)): Scer = A, Spar = G, Smik = A, Skud = A

Page 11: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A

Site 1, tree ((Scer, Spar), (Smik, Skud)): Scer = A, Spar = G, Smik = A, Skud = A

Internal nodes: A, A

Score = 1

Page 12: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A

Site 1: Scer = A, Spar = G, Smik = A, Skud = A

• Tree ((Scer, Spar), (Smik, Skud)): internal nodes A, A; score = 1
• Tree ((Scer, Smik), (Spar, Skud)): internal nodes A, A; score = 1
• Tree ((Scer, Skud), (Smik, Spar)): internal nodes A, A; score = 1

This site is uninformative, because all the trees have the same score.

Page 13: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A

Exercise: Scer = __, Spar = __, Smik = __, Skud = __

• Tree ((Scer, Spar), (Smik, Skud)): score = ?
• Tree ((Scer, Smik), (Spar, Skud)): score = ?
• Tree ((Scer, Skud), (Smik, Spar)): score = ?

Page 14: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A

Site 2: Scer = G, Spar = G, Smik = A, Skud = T

• Tree ((Scer, Spar), (Smik, Skud)): internal nodes G, A; score = 2
• Tree ((Scer, Smik), (Spar, Skud)): internal nodes G, G; score = 2
• Tree ((Scer, Skud), (Smik, Spar)): internal nodes G, G; score = 2

Page 15: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A

Exercise: Scer = __, Spar = __, Smik = __, Skud = __

• Tree ((Scer, Spar), (Smik, Skud)): score = ?
• Tree ((Scer, Smik), (Spar, Skud)): score = ?
• Tree ((Scer, Skud), (Smik, Spar)): score = ?

Page 16: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A

Site 5: Scer = A, Spar = A, Smik = T, Skud = T

• Tree ((Scer, Spar), (Smik, Skud)): internal nodes A, T; score = 1. This tree is best.
• Tree ((Scer, Smik), (Spar, Skud)): internal nodes A, A; score = 2
• Tree ((Scer, Skud), (Smik, Spar)): internal nodes A, A; score = 2
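The per-site comparison across the three four-taxon trees can be sketched as below. The scoring uses Fitch's set-intersection rule, which is one standard way to count the minimum number of changes; the slides do not name the method, so treat that choice, and the helper names, as assumptions of this sketch.

```python
def quartet_site_score(pair1, pair2):
    """Minimum number of changes at one site on the unrooted tree pair1 | pair2."""
    def join(x, y):
        # Fitch rule: keep the intersection if non-empty, else union plus one change.
        common = x & y
        return (common, 0) if common else (x | y, 1)

    left, cost_left = join({pair1[0]}, {pair1[1]})
    right, cost_right = join({pair2[0]}, {pair2[1]})
    _, cost_middle = join(left, right)
    return cost_left + cost_right + cost_middle

def site_scores(scer, spar, smik, skud):
    """Scores of one site on the three possible trees, in the slides' order."""
    return (quartet_site_score((scer, spar), (smik, skud)),
            quartet_site_score((scer, smik), (spar, skud)),
            quartet_site_score((scer, skud), (smik, spar)))

def is_informative(scer, spar, smik, skud):
    """A site is informative only if the trees do not all score the same."""
    return len(set(site_scores(scer, spar, smik, skud))) > 1

print(site_scores("A", "G", "A", "A"))   # site 1
print(site_scores("G", "G", "A", "T"))   # site 2
print(site_scores("A", "A", "T", "T"))   # site 5
```

Sites 1 and 2 score the same on every tree, so only a site like site 5, where two bases each appear at least twice, helps to choose between trees.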

Page 17: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A
Score    1 2 1 1 1 1 0 1 2 2 0 0 1 2 2 2 3 1 2 1

Tree: ((Scer, Spar), (Smik, Skud))

Total = 26
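Summing the per-site scores over the whole alignment can be sketched with a small recursive version of Fitch's algorithm. The nested-tuple tree representation and the function names are assumptions of this example; rooting the unrooted tree on its internal edge does not change the parsimony score.

```python
ALIGNMENT = {
    "Scer": "AGAAAAATAACTTTCTCATG",
    "Spar": "GGAAAAATAACTTTCTGACA",
    "Smik": "AAAATAACTTCTCAACAATA",
    "Skud": "ATCTTGATCCCTTGTGTTGA",
}

def fitch(tree, site):
    """Return (possible states, number of changes) for a nested-tuple tree."""
    if isinstance(tree, str):            # leaf: the observed base, no changes
        return {site[tree]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def per_site_scores(tree, alignment):
    """Parsimony score of every alignment column on the given tree."""
    length = len(next(iter(alignment.values())))
    return [fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
            for i in range(length)]

tree = (("Scer", "Spar"), ("Smik", "Skud"))
scores = per_site_scores(tree, ALIGNMENT)
print(scores, "Total =", sum(scores))
```

For the tree ((Scer, Spar), (Smik, Skud)) this reproduces the per-site scores on the slide and the total of 26. Passing a different nested tuple scores another topology.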

Page 18: Inferring phylogenetic trees

Computing parsimony scores

Scer     A G A A A A A T A A C T T T C T C A T G
Spar     G G A A A A A T A A C T T T C T G A C A
Smik     A A A A T A A C T T C T C A A C A A T A
Skud     A T C T T G A T C C C T T G T G T T G A
Score    1 2 1 1 2 1 0 1 2 2 0 0 1 2 2 2 3 1 2 1

Tree: ((Scer, Smik), (Spar, Skud))

Total = 27

Page 19: Inferring phylogenetic trees

Parsimony software

• In general, the most widely used programs for phylogenetic analysis are
  – Phylip (Joe Felsenstein)
  – PAUP (David Swofford)
  – MacClade (David and Wayne Maddison)

• All three do parsimony. Only Phylip is free.

Page 20: Inferring phylogenetic trees

Previous one-minute responses

• How many sequences are usually analyzed by parsimony methods?
  – Exhaustively, probably tens of sequences. With heuristic search methods, you can analyze arbitrarily many, but you lose the guarantee that you’re finding the most parsimonious tree.
• What do good parsimony scores look like?
  – It depends upon how many sequences are involved, and how divergent they are.
• Why doesn’t the parsimony method take into account transitions versus transversions?
  – It can; I presented the simplest version.

Page 21: Inferring phylogenetic trees

Jukes-Cantor model

• Assume the same probability of change at all positions and all times.
• d_AB is the proportion of changed sites in the alignment.
• K_AB is the distance between sequences A and B.

$$K_{AB} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\,d_{AB}\right)$$
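As a worked check of the Jukes-Cantor correction, $K_{AB} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}d_{AB}\right)$, take two aligned sequences that differ at 2 of 4 ungapped sites, so that $d_{AB} = 0.5$:

```latex
K_{AB} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot 0.5\right)
       = -\frac{3}{4}\ln\frac{1}{3}
       \approx 0.824
```

The corrected distance is larger than the observed proportion of differences because some sites may have changed more than once. The logarithm is undefined once $d_{AB} \ge 3/4$, so the formula cannot be applied to sequences that are that divergent.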

Page 22: Inferring phylogenetic trees

Problem #1

• Write a program jukes-cantor.py that takes as input a pairwise sequence alignment and prints the Jukes-Cantor distance. Skip sites that contain gaps.

> cat twoseqs.txt
ACGT
ACCG
> python jukes-cantor.py twoseqs.txt
0.823959

$$K_{AB} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\,d_{AB}\right)$$
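One possible shape for such a program is sketched below; it is not the course's official solution. It assumes the input file holds one aligned sequence per line, that gaps are written as "-", and that returning infinity is an acceptable way to signal a proportion of differences too large for the formula. The function names are made up for the example.

```python
import math

def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences, skipping gapped sites."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":      # assumption: "-" marks a gap
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    if differing == 0:
        return 0.0
    d = differing / compared          # proportion of changed sites
    if d >= 0.75:
        return math.inf               # the logarithm is undefined here
    return -0.75 * math.log(1 - 4 * d / 3)

def main(path):
    """Read two aligned sequences (one per line) and print their distance."""
    with open(path) as handle:
        seqs = [line.strip() for line in handle if line.strip()]
    print(f"{jukes_cantor(seqs[0], seqs[1]):.6f}")
```

To use it as a script, call `main(sys.argv[1])` after importing `sys`. With the two sequences ACGT and ACCG the printed value is 0.823959.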

Page 23: Inferring phylogenetic trees

Problem #2

• Generalize your previous program to work for a multiple sequence alignment.

> cat threeseqs.txt
ACGT
ACTG
ACGG
> python jukes-cantor-matrix.py threeseqs.txt
0.000 0.824 0.304
0.824 0.000 0.304
0.304 0.304 0.000
> jukes-cantor-multiple.py moreseqs.txt
0.000 0.233 0.383 0.233
0.233 0.000 0.824 0.572
0.383 0.824 0.000 0.107
0.233 0.572 0.107 0.000
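The generalization amounts to applying the pairwise distance to every pair of rows. A minimal sketch follows, again assuming one aligned sequence per line, "-" for gaps, and illustrative function names; the pairwise function is repeated so that the block runs on its own.

```python
import math

def jc_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences, skipping gapped sites."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    differing = sum(a != b for a, b in pairs)
    if differing == 0:
        return 0.0
    d = differing / len(pairs)
    if d >= 0.75:
        return math.inf               # the logarithm is undefined here
    return -0.75 * math.log(1 - 4 * d / 3)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise Jukes-Cantor distances."""
    return [[jc_distance(a, b) for b in seqs] for a in seqs]

def format_matrix(matrix):
    """One row per line, three decimals, as in the example output."""
    return "\n".join(" ".join(f"{value:.3f}" for value in row) for row in matrix)

print(format_matrix(distance_matrix(["ACGT", "ACTG", "ACGG"])))
```

Gapped sites are skipped per pair rather than per column, so two sequences can still be compared at a site where a third sequence has a gap; whether that matches the intended behaviour of the exercise is a judgment call.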