Trees & Topologies Chapter 3, Part 1

Trees & Topologies Chapter 3, Part 1. Terminology Equivalence Classes – specific separation of a set of genes into disjoint sets covering the whole set.

Jan 03, 2016


Howard Gardner

Trees & Topologies Chapter 3, Part 1


Terminology

• Equivalence Classes – specific separation of a set of genes into disjoint sets covering the whole set of genes

• Jump Process – describes which pair of genes coalesce at each coalescence event

• Waiting Time Process – the waiting time to the next coalescent event when there are k genes left
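The three terms fit together in a simulation. The following is a minimal sketch, assuming the standard coalescent in which the waiting time with k lineages is exponential with rate k(k-1)/2 and every pair of lineages is equally likely to coalesce. The function name `simulate_coalescent` and the tuple layout are illustrative choices, not taken from the slides.

```python
import random

def simulate_coalescent(n, rng):
    """Simulate one coalescent genealogy for n genes labeled 1..n.

    Returns one (waiting_time, merged_pair, classes_after) tuple per
    coalescence event.
    """
    # Equivalence classes: start with every gene in its own class.
    classes = [frozenset([i]) for i in range(1, n + 1)]
    events = []
    while len(classes) > 1:
        k = len(classes)
        # Waiting time process: exponential with rate k(k-1)/2 (assumed model).
        wait = rng.expovariate(k * (k - 1) / 2)
        # Jump process: a uniformly chosen pair of lineages coalesces.
        a, b = rng.sample(classes, 2)
        classes = [c for c in classes if c not in (a, b)] + [a | b]
        events.append((wait, (a, b), list(classes)))
    return events

events = simulate_coalescent(5, random.Random(1))
for wait, (a, b), classes in events:
    print(f"wait={wait:.3f}  merge {sorted(a)} + {sorted(b)}  -> {len(classes)} classes")
```

After every event the classes remain disjoint and still cover all n genes, which is the equivalence-class property in the first bullet.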

2/19/2009 COMP 790-Trees & Topologies 2


Coalescent Tree


Coalescent vs. Phylogenetic Trees

• Phylogenetic tree: branch length = number of mutations

• Coalescent tree: branch length = time to coalescence (coalescent time × 2N generations × generation time)

• Expected number of mutations = θ/2 × coalescent time
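The two conversions on this slide can be written out as a short sketch. It assumes that the symbol in front of "/2" is the scaled mutation rate θ, and the parameter values in the usage lines (N = 10,000 and a 25-year generation time) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def coalescent_to_years(t_coal, n_diploid, generation_years):
    """Convert coalescent units to years: t x 2N generations x generation time."""
    return t_coal * 2 * n_diploid * generation_years

def expected_mutations(theta, t_coal):
    """Expected number of mutations on a branch of length t_coal: (theta / 2) * t_coal."""
    return theta / 2 * t_coal

# Hypothetical values: N = 10,000 diploid individuals, 25 years per generation.
print(coalescent_to_years(1.0, 10_000, 25))  # one coalescent unit in years
print(expected_mutations(4.0, 0.5))          # theta = 4 on a branch of length 0.5
```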


Rooted Phylogenetic Tree

Four representations of a coalescent tree


Counting Trees & Topologies

(Ck) # of coalescent topologies with k leaves

(Bk) # of binary unrooted tree topologies with k leaves


Recursion Illustrated


Basic recursion for the number of unrooted tree topologies as a function of leaves
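For reference, the standard recursions and closed forms for the two counts are given below. They are stated here as an addition, with B_k and C_k as defined under Counting Trees & Topologies. A new leaf can be attached to any of the 2k-3 edges of an unrooted binary tree with k leaves, and any of the k(k-1)/2 pairs of lineages can be the first to coalesce.

```latex
\begin{align*}
B_{k+1} &= (2k-3)\,B_k, & B_2 &= 1, & B_k &= \frac{(2k-4)!}{(k-2)!\,2^{k-2}} \\
C_k &= \binom{k}{2}\,C_{k-1}, & C_1 &= 1, & C_k &= \frac{k!\,(k-1)!}{2^{k-1}}
\end{align*}
```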


Recurrence Intuition

k     B_k            C_k
2     1              1
3     1              3
4     3              18
5     15             180
6     105            2700
7     945            56700
8     10395          1587600
10    2027025        2571912000
15    7.9 × 10^12    7.0 × 10^18
20    2.2 × 10^20    5.6 × 10^29
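The tabulated values can be reproduced with a few lines of code. This is a minimal sketch that assumes the usual recursions, B_{k+1} = (2k-3) B_k for unrooted topologies and C_k = k(k-1)/2 × C_{k-1} for coalescent topologies. The function names are illustrative.

```python
from math import comb

def unrooted_topologies(k):
    """B_k: number of binary unrooted tree topologies with k labeled leaves."""
    b = 1
    for j in range(2, k):
        b *= 2 * j - 3  # attach leaf j+1 to one of the 2j-3 edges
    return b

def coalescent_topologies(k):
    """C_k: number of coalescent topologies (ordered coalescence events) with k leaves."""
    c = 1
    for j in range(2, k + 1):
        c *= comb(j, 2)  # choose which pair of the j lineages coalesces
    return c

for k in (2, 3, 4, 5, 6, 7, 8, 10, 15, 20):
    print(f"{k:>2}  {unrooted_topologies(k):.4g}  {coalescent_topologies(k):.4g}")
```

C_k grows much faster than B_k because a coalescent topology also records the order of the coalescence events.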


Gene Trees

• Graph that shows the ancestral relationship between genes.

• Assume infinite sites model to build gene trees. (Ch. 5 discusses what happens without this assumption)

• Not a coalescent tree.

• Clusters genes according to their type and mutation pattern.


Example Gene Tree


Data set with five sequences and four segregating sites with relative positions.

The tree is built up starting with the first site, adding one more site at a time.


Building Gene Trees

1. Determine if data passes 4-gamete test. If not, there cannot be a gene tree.

2. Treat each column as a binary number and sort the columns in decreasing order, so that the largest binary number is in column one.

3. Add the sequences one at a time, each with all its characters. The characters of a sequence are the entries of its row, read right to left. The sequence is placed by tracing from the leaves towards the root. It has its own edges up to the point where its prefix coincides with characters already added to the tree.

4. Root is labeled with an open circle. It can be removed to form an unrooted tree.
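Step 1 can be sketched in code. The four-gamete test checks every pair of sites: if all four combinations 00, 01, 10 and 11 occur among the sequences, the data are incompatible with a gene tree under the infinite sites model. The function name and the list-of-rows input format are illustrative assumptions.

```python
from itertools import combinations

def passes_four_gamete_test(rows):
    """Return True if no pair of sites shows all four gametes 00, 01, 10, 11.

    rows is a list of sequences, each a list of 0/1 characters of equal length.
    """
    n_sites = len(rows[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct (site i, site j) combinations seen in the sample.
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:
            return False
    return True

compatible = [[0, 0], [0, 1], [1, 0]]
incompatible = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(passes_four_gamete_test(compatible))
print(passes_four_gamete_test(incompatible))
```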


Example

Given the following table, build a gene tree using the four steps listed under Building Gene Trees.


Sequence  A  B  C  D
1         0  0  1  0
2         0  0  0  1
3         1  0  0  0
4         0  0  0  1
5         1  1  0  0
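The column-sorting step can be tried on this table. The sketch below assumes that each column is read top to bottom as a binary number, with sequence 1 as the most significant bit. That reading direction is an assumption, and the opposite convention gives a different but equally valid order. The sketch also groups identical sequences, since a gene tree clusters genes by type.

```python
rows = [
    [0, 0, 1, 0],  # sequence 1
    [0, 0, 0, 1],  # sequence 2
    [1, 0, 0, 0],  # sequence 3
    [0, 0, 0, 1],  # sequence 4
    [1, 1, 0, 0],  # sequence 5
]
sites = "ABCD"

def column_value(rows, j):
    """Read column j top to bottom as a binary number (assumed convention)."""
    bits = "".join(str(row[j]) for row in rows)
    return int(bits, 2)

values = {sites[j]: column_value(rows, j) for j in range(len(sites))}
order = sorted(sites, key=lambda s: values[s], reverse=True)
print(values)  # {'A': 5, 'B': 1, 'C': 16, 'D': 10}
print(order)   # ['C', 'D', 'A', 'B']

# Group sequences that share the same type (identical rows).
groups = {}
for number, row in enumerate(rows, start=1):
    groups.setdefault(tuple(row), []).append(number)
print(groups)
```

Sequences 2 and 4 are identical, so they end up at the same leaf of the gene tree.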


Nested Subsamples

• Assume a sample A of size n is taken, and within that sample a subsample B of size m is taken, m ≤ n.

• Process describing the number of ancestors starts out in (m,n) and jumps to either (m,n-1) or (m-1,n-1)
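The jump structure gives a direct recursion for the probability that B and A share the same MRCA. The sketch below assumes that each pair among the n current ancestors is equally likely to coalesce. The process then moves to (m-1, n-1) with probability m(m-1)/(n(n-1)), when both lineages are ancestors of B, and to (m, n-1) otherwise. B finds its MRCA too early if the process reaches m = 1 while n > 1. The function name is illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def prob_same_mrca(m, n):
    """P(MRCA of subsample B equals MRCA of sample A), starting in state (m, n), 1 <= m <= n."""
    if m == n:
        return Fraction(1)   # every remaining lineage is an ancestor of B
    if m == 1:
        return Fraction(0)   # B has already found its MRCA, A has not
    p_within = Fraction(comb(m, 2), comb(n, 2))  # both coalescing lineages belong to B
    return (p_within * prob_same_mrca(m - 1, n - 1)
            + (1 - p_within) * prob_same_mrca(m, n - 1))

print(prob_same_mrca(2, 3))    # 2/3
print(prob_same_mrca(5, 200))
```

Exact fractions avoid rounding, and memoization keeps the recursion cheap for moderate n.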


More nested subsamples

• Probability that the MRCA of B is also the MRCA of A

• Special case: A is the whole population (n → ∞, or n = 2N, and 2N is large)
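For reference, the standard coalescent result for this probability is stated below as an addition. Its limit for the whole population is consistent with the tabulated values for P(A = B).

```latex
\begin{align*}
P\bigl(\mathrm{MRCA}(B) = \mathrm{MRCA}(A)\bigr) &= \frac{(m-1)(n+1)}{(m+1)(n-1)}, \qquad 2 \le m \le n, \\
\lim_{n \to \infty} P\bigl(\mathrm{MRCA}(B) = \mathrm{MRCA}(A)\bigr) &= \frac{m-1}{m+1}.
\end{align*}
```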


More nested subsamples

m     P(A = B)
1     0/2 (no info)
2     1/3
3     1/2
5     2/3 = 0.67
9     4/5 = 0.80
19    9/10 = 0.90
29    14/15 = 0.9333


Remember: the expected time until the whole population has found an MRCA is 2 (in coalescent units), and the expected time until a sample of size two has found an MRCA is 1.
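Both numbers follow from summing the expected waiting times. Assuming the standard coalescent, the expected time spent with k lineages is 2/(k(k-1)), so the expected time to the MRCA of a sample of size n is 2(1 - 1/n). A minimal check, with an illustrative function name:

```python
from fractions import Fraction

def expected_tmrca(n):
    """Expected time to the MRCA of n genes, in coalescent units."""
    # Sum of expected waiting times 2 / (k(k-1)) for k = n, n-1, ..., 2.
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))

print(expected_tmrca(2))             # 1
print(float(expected_tmrca(10)))     # 1.8
print(float(expected_tmrca(10**4)))  # close to 2
```

A sample of size two already covers half of the expected time back to the MRCA of the whole population.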


Hanging Subtrees


Unbalanced Trees

• Probability that the basal split into two lineages at the root of the tree results in the labeled, unordered partition (i, n-i), i = 1,2,…,n/2

• In large samples, unbalanced trees are unlikely.
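The split probabilities can be sketched as follows. This uses the standard coalescent fact that the number of leaves on one side of the root is uniform on 1, ..., n-1. The formulas are stated as an addition and are not necessarily in the same form as on the slide. Function names are illustrative.

```python
from fractions import Fraction
from math import comb

def prob_split_sizes(i, n):
    """P(the root splits the n leaves into groups of sizes i and n - i), 1 <= i <= n // 2."""
    if 2 * i == n:
        return Fraction(1, n - 1)
    return Fraction(2, n - 1)

def prob_labeled_partition(i, n):
    """P(the root split equals one specific labeled partition with sizes i and n - i)."""
    return Fraction(2, (n - 1) * comb(n, i))

n = 50
print(prob_split_sizes(1, n))        # most unbalanced size split: 2/(n-1)
print(prob_labeled_partition(1, n))  # one named sequence alone: 2/(n(n-1))
```

For i = 1 the second function reduces to 2/(n(n-1)), the form used in the Neanderthal example.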


Neanderthal Example

• Nordborg (1998) studied the tree of a combined sample of 986 human mitochondrial sequences and 1 Neanderthal sequence.

• Assuming random mating, the probability that the root split separates the Neanderthal sequence from all human sequences is 2/(986 × 985) ≈ 2 × 10^-6.

• Nordborg pointed out that a large part of the human sample had found a common ancestor by the time the sequenced Neanderthal lived (30,000-100,000 years ago).

• For example, if there were 5 ancestors of the present human sample 30,000 years ago, the probability is 2/(5 × 4) = 10%.

• The tree therefore does not provide strong evidence against interbreeding between Neanderthals and humans.
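The arithmetic on this slide is easy to check. The sketch below uses the form 2/(n(n-1)) exactly as it appears in the bullets, with n = 986 and n = 5. The function name is illustrative.

```python
def prob_single_lineage_split(n):
    """2 / (n(n-1)), the form used in the bullets above."""
    return 2 / (n * (n - 1))

print(f"{prob_single_lineage_split(986):.2e}")  # about 2e-06
print(f"{prob_single_lineage_split(5):.0%}")    # 10%
```

The probability rises by several orders of magnitude once only a handful of human ancestral lineages remain, which is the point of the argument.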


Next Time

• More Trees & Topologies
  – A single lineage
  – Disjoint subsamples
  – A sample partitioned by a mutation
  – The probability of going from n ancestors to k ancestors
