The Evolutionary Basis of Bioinformatics: An Introduction to Phylogenetics http://bioquest.org/bedr > Sequence 1 GAGGTAGTAATTAGATCCGAAA… > Sequence 2 GAGGTAGTAATTAGATCTGAAA… > Sequence 3 GAGGTAGTAATTAGATCTGTCA…

Dec 17, 2015


Nelson Small
Transcript
Page 1

The Evolutionary Basis of Bioinformatics: An Introduction to Phylogenetics

http://bioquest.org/bedrock

> Sequence 1
GAGGTAGTAATTAGATCCGAAA…
> Sequence 2
GAGGTAGTAATTAGATCTGAAA…
> Sequence 3
GAGGTAGTAATTAGATCTGTCA…
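The three title-slide sequences differ at only a few sites, and those differences are the raw material for a tree. As a minimal sketch, the snippet below counts pairwise differences using only the bases visible on the slide. The sequences are truncated with "…" in the source, so the counts apply to these 22-base prefixes only; the names `prefixes` and `count_differences` are illustrative and not part of the original material.

```python
from itertools import combinations

# Only the bases visible on the slide; the elided remainder ("…") is NOT filled in.
prefixes = {
    "Sequence 1": "GAGGTAGTAATTAGATCCGAAA",
    "Sequence 2": "GAGGTAGTAATTAGATCTGAAA",
    "Sequence 3": "GAGGTAGTAATTAGATCTGTCA",
}


def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# One entry per unordered pair of sequences.
differences = {
    (m, n): count_differences(prefixes[m], prefixes[n])
    for m, n in combinations(prefixes, 2)
}

for (m, n), k in differences.items():
    print(f"{m} vs {n}: {k} differences over {len(prefixes[m])} sites")
```

Over these prefixes, Sequences 1 and 2 are the most similar pair, which is the kind of observation a distance-based method starts from.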

Page 2

What is phylogenetics?

Phylogenetics is the study of evolutionary relationships among and within species.

[Figure: tree with tips crocodiles, birds, lizards, snakes, rodents, primates and marsupials.]

Page 3

What is phylogenetics?

[Figure: tree with tips crocodiles, birds, lizards, snakes, rodents, primates and marsupials.]

This is an example of a phylogenetic tree.

Page 4

Applications of phylogenetics

• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?

• Medicine: What are the evolutionary relationships among the various prion-related diseases?

To be continued…

Page 5

Phylogenetic concepts: Interpreting a Phylogeny

[Figure: tree of Sequences A, B, C, D and E drawn along a time axis.]

Which sequence is most closely related to B?

A, because B diverged from A more recently than from any other sequence.

Physical position in the tree is not meaningful! Only tree structure matters.

Page 6

Phylogenetic concepts: Rooted and Unrooted Trees

[Figure: a rooted tree of A, B, C and D drawn along a time axis, set equal to an unrooted tree of the same taxa with the root position marked; a second unrooted tree of A, B, C and D has no root marked (X), and its rooted form is left as question marks.]

Page 7

Rooting and Tree Interpretation

[Figure: three panels showing trees of bacteria, archaebacteria (archaea), oak, fruit fly, chicken and human; character labels on the trees: – bones, – cell nuclei, + cell nuclei, + bones.]

Page 8

Rooting Methods

Outgroup root: add 2+ taxa whose branches contain the tree’s new root.

[Figure: tree of trout, eagle, bat and mouse, shown before and after adding shark and ray.]

You must already know the position of the new tree’s root (often by going from a higher to a lower taxonomic unit, e.g. family → genus).

Page 9

How Many Trees?

[Blank table to fill in, assuming bifurcation only. Rows: # sequences = 3, 4, 5, 6, 10, 30 and N. Columns: # pairwise distances; unrooted trees (# trees, # branches / tree); rooted trees (# trees, # branches / tree).]

Page 10

How Many Trees?

                              Unrooted trees                  Rooted trees
# sequences   # pairwise      # trees         # branches      # trees         # branches
              distances                       / tree                          / tree
3             3               1               3               3               4
4             6               3               5               15              6
5             10              15              7               105             8
6             15              105             9               945             10
10            45              2,027,025       17              34,459,425      18
30            435             8.69 × 10^36    57              4.95 × 10^38    58

For N sequences:

# pairwise distances: $\frac{N(N-1)}{2}$

Unrooted trees: # trees $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$, # branches / tree $2N-3$

Rooted trees: # trees $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$, # branches / tree $2N-2$
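The counts in this table follow directly from the closed-form expressions for bifurcating trees. The sketch below evaluates them with exact integer arithmetic; the function names are illustrative and do not come from the slides.

```python
from math import factorial


def pairwise_distances(n):
    """Number of distinct sequence pairs: N(N-1)/2."""
    return n * (n - 1) // 2


def unrooted_trees(n):
    """Number of bifurcating unrooted trees for n >= 3 sequences."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """Number of bifurcating rooted trees for n >= 2 sequences."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_branches(n):
    """Branches per unrooted tree: 2N - 3."""
    return 2 * n - 3


def rooted_branches(n):
    """Branches per rooted tree: 2N - 2."""
    return 2 * n - 2


for n in (3, 4, 5, 6, 10, 30):
    print(n, pairwise_distances(n),
          unrooted_trees(n), unrooted_branches(n),
          rooted_trees(n), rooted_branches(n))
```

One relationship worth noticing: each unrooted tree can be rooted on any of its 2N - 3 branches, so the rooted count equals the unrooted count multiplied by 2N - 3.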

Page 11

Tree Types

Evolutionary trees measure time.

[Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos and bats; scale bar: 50 million years.]

Phylograms measure change.

[Figure: rooted phylogram of sharks, seahorses, frogs, owls, crocodiles, armadillos and bats; scale bar: 5% change.]

Page 12

Tree Properties

Ultrametricity: All tips are an equal distance from the root.

[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]

a = b + c + d + e

Additivity: Distance between any two tips equals the total branch length between them.

[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]

XY = a + b + c + d + e

In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
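Both properties can be checked directly on a distance matrix. The sketch below uses the standard three-point condition (ultrametric) and four-point condition (additive), which are not stated on the slides but are equivalent matrix-level tests. The two example matrices are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: in every quadruple, the two largest pair sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical clock-like distances among four tips A, B, C, D.
clocklike = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]

# Hypothetical distances from a tree with unequal branch lengths.
unequal_rates = [
    [0, 4, 5, 8],
    [4, 0, 7, 10],
    [5, 7, 0, 7],
    [8, 10, 7, 0],
]

print(is_ultrametric(clocklike), is_additive(clocklike))
print(is_ultrametric(unequal_rates), is_additive(unequal_rates))
```

The first matrix passes both tests; the second is additive but not ultrametric, which is the pattern expected when branch lengths measure change and lineages have changed by different amounts.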

Page 13

Tree Building Exercise

Ultrametricity: All tips are an equal distance from the root.

[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]

a = b + c + d + e

Using the distance matrix given, construct an ultrametric tree.
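The distance matrix for this exercise did not survive in the transcript, so the sketch below uses a hypothetical four-tip matrix. It applies UPGMA, one common clustering procedure that always yields an ultrametric tree; the slides do not say which procedure the exercise intends, so treat this as one possible approach rather than the expected solution.

```python
def upgma(labels, matrix):
    """Build an ultrametric tree by UPGMA and return it as a Newick string."""
    n = len(labels)
    # Active clusters: id -> (newick text, number of tips, height of the cluster's node)
    clusters = {i: (labels[i], 1, 0.0) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)          # closest pair of clusters
        height = dist[(i, j)] / 2               # the new node sits halfway between them
        newick_i, size_i, height_i = clusters.pop(i)
        newick_j, size_j, height_j = clusters.pop(j)
        merged = f"({newick_i}:{height - height_i:g},{newick_j}:{height - height_j:g})"
        # Distance from the merged cluster to each remaining one: size-weighted mean
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


# Hypothetical distances among tips A, B, C, D (not the matrix from the exercise).
example = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]

tree = upgma(["A", "B", "C", "D"], example)
print(tree)
```

In the resulting tree every tip lies 5 units from the root, which is the ultrametric property the exercise asks for.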

Page 14

Phylogenetic Methods

Many different procedures exist. Three of the most popular:

Neighbor-joining
• Minimizes distance between nearest neighbors

Maximum parsimony
• Minimizes total evolutionary change

Maximum likelihood
• Maximizes likelihood of observed data
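Neighbor-joining does not simply join the pair with the smallest raw distance. Its standard selection criterion, not shown on the slides, corrects each distance by how far both taxa are from everything else: Q(i, j) = (n - 2) d(i, j) - Σ d(i, k) - Σ d(j, k). The sketch below computes Q for a hypothetical tree in which A and C sit on short branches (length 1), B and D on long ones (length 4), and an internal branch of length 1 separates (A, B) from (C, D).

```python
from itertools import combinations

taxa = ["A", "B", "C", "D"]

# Hypothetical path-length distances on the tree described above.
d = {
    ("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5,
}


def dist(x, y):
    """Symmetric lookup into the distance table."""
    return 0 if x == y else d[tuple(sorted((x, y)))]


def q_matrix(names):
    """Neighbor-joining Q value for every pair of taxa."""
    n = len(names)
    total = {x: sum(dist(x, k) for k in names) for x in names}
    return {
        (x, y): (n - 2) * dist(x, y) - total[x] - total[y]
        for x, y in combinations(names, 2)
    }


q = q_matrix(taxa)
closest_raw = min(d, key=d.get)                 # pair with the smallest raw distance
best_q = min(q.values())
true_neighbors = [pair for pair, value in q.items() if value == best_q]

print("smallest raw distance:", closest_raw)
print("pairs with the lowest Q:", true_neighbors)
```

Here A and C have the smallest raw distance, yet the lowest Q values belong to (A, B) and (C, D), the pairs that are actually neighbors on the tree.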

Page 15

Comparison of Methods

Neighbor-joining
• Very fast
• Easily trapped in local optima
• Good for generating tentative tree, or choosing among multiple trees

Maximum parsimony
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, strong conservation)

Maximum likelihood
• Very slow
• Highly dependent on assumed evolution model
• Good for very small data sets and for testing trees built using other methods

Page 16

Phylogenetic concepts: Homology and Homoplasy

[Figure: bat, chimp and hawk scored for two characters (Hair? Wings?), with a tree of bat, chimp and hawk labeled: no hair, no wings; + hair; + wings; + wings.]

Homology: identity due to shared ancestry (evolutionary signal)

Homoplasy: identity despite separate ancestry (evolutionary noise)

Page 17

Trees are hypotheses about evolutionary history

So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.

Page 18

Tree Testing

Let’s study the following four sequences:

P. A C A T A C G
Q. G T A T A C G
R. G C A C A T G
S. G C A C A C A

How can we explain the indicated character?

1. Homology: Changed just once.
2. Homoplasy: Changed twice or more.

[Figure: tree of P, Q, R and S.]

Homology more likely, but homoplasy still feasible.

Page 19

Tree Testing

Now let’s look at four other sequences:

W. A C A T G T C A G A C G
X. G T A T G T C A G A C G
Y. G C A C A C T G A A T G
Z. G C A C A C T G A A C A

[Figure: tree with tip labels P, Q, R and S.]

Same two explanations possible. Any changes to their relative likelihood?

Homology much more likely; homoplasy implausible.
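One way to make this comparison concrete is to count the minimum number of changes each possible tree requires. The sketch below applies Fitch's small-parsimony count, a standard algorithm that the slides do not name, to the four sequences P, Q, R, S and to W, X, Y, Z, scoring all three unrooted groupings of four taxa. Function and variable names are illustrative.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a nested tuple of tip names; `states` maps tip name -> base.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


def scores_for_all_groupings(alignment):
    """Score the three possible groupings of four taxa."""
    a, b, c, d = alignment
    groupings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return {tree: parsimony_score(tree, alignment) for tree in groupings}


first = {"P": "ACATACG", "Q": "GTATACG", "R": "GCACATG", "S": "GCACACA"}
second = {"W": "ACATGTCAGACG", "X": "GTATGTCAGACG",
          "Y": "GCACACTGAATG", "Z": "GCACACTGAACA"}

first_scores = scores_for_all_groupings(first)
second_scores = scores_for_all_groupings(second)
print(first_scores)
print(second_scores)
```

For the first data set the best grouping beats the alternatives by a single change, so one case of homoplasy would be enough to explain it away. For the second data set the gap is six changes, which is why homoplasy becomes implausible.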

Page 20

Tree Testing

Basic principle:

Long branches → Strong evolutionary signal

Short branches → Weak evolutionary signal

Zero-length branches → NO evolutionary signal

[Figure: three trees of A, B, C and D, one for each case.]

Tree-testing methods: Bootstrapping, Jackknifing, Split decomposition, …
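Bootstrapping, the first method listed, resamples alignment columns with replacement and asks how often the same grouping is recovered. The sketch below is a deliberately simplified version for four taxa: a replicate counts as support when more resampled sites favor the first-two versus last-two grouping than either alternative. The sequences are the P, Q, R, S and W, X, Y, Z examples from the preceding Tree Testing slides; the function name, replicate count and seed are assumptions made for illustration.

```python
import random


def bootstrap_support(alignment, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which the grouping (1st,2nd | 3rd,4th) wins."""
    names = list(alignment)
    columns = list(zip(*(alignment[n] for n in names)))   # one tuple of bases per site
    rng = random.Random(seed)
    wins = 0
    for _ in range(replicates):
        sample = rng.choices(columns, k=len(columns))     # resample sites with replacement
        ab_cd = sum(w == x != y == z for w, x, y, z in sample)
        ac_bd = sum(w == y != x == z for w, x, y, z in sample)
        ad_bc = sum(w == z != x == y for w, x, y, z in sample)
        if ab_cd > max(ac_bd, ad_bc):
            wins += 1
    return wins / replicates


short_alignment = {"P": "ACATACG", "Q": "GTATACG", "R": "GCACATG", "S": "GCACACA"}
long_alignment = {"W": "ACATGTCAGACG", "X": "GTATGTCAGACG",
                  "Y": "GCACACTGAATG", "Z": "GCACACTGAACA"}

weak = bootstrap_support(short_alignment)
strong = bootstrap_support(long_alignment)
print(f"PQ|RS support: {weak:.2f}")
print(f"WX|YZ support: {strong:.2f}")
```

The grouping backed by a single informative site is lost in a sizable share of replicates, while the grouping backed by six sites is recovered almost every time, matching the link between branch length and signal strength described above.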

Page 21

Applications of phylogenetics

1. Forensics

Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

Page 22

Phylogenetic analysis

Page 23

So what do the results mean?

• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?

• Do we have enough data to be confident in our conclusions? What additional data would help?

• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?

Page 24

Applications of phylogenetics

2. Conservation

How much gene flow is there among local populations of island foxes off the coast of California?

Page 25

http://bioquest.org/bedrock/

Wayne, R. K., and Morin, P. A. 2004. Conservation genetics in the new molecular age. Frontiers in Ecology and the Environment 2: 89-97. (ESA publication)

Page 26

Applications of phylogenetics

3. Medicine

What are the evolutionary relationships among the various prion-related diseases?

Page 27

Linking Sequence and Structure

Enolase