Evolutionary Genetics: Part 3 Coalescent 2 – Effective Population size S. peruvianum S. chilense Winter Semester 2012-2013 Prof Aurélien Tellier FG Populationsgenetik

Jul 18, 2020

Page 1: Evolutionary Genetics: Part 3 Coalescent 2 –Effective ... · The real physical population is likely not to behave as in the Wright –Fisher model Most populations show some kind

Evolutionary Genetics: Part 3

Coalescent 2 – Effective Population size

S. peruvianum

S. chilense

Winter Semester 2012-2013

Prof Aurélien Tellier

FG Populationsgenetik


Color code

Red: important result or definition

Purple: exercise to do

Green: some bits of maths


Population genetics: 4 evolutionary forces

[Diagram: four forces acting on molecular diversity]
• Random genomic processes (mutation, duplication, recombination, gene conversion)
• Natural selection
• Random demographic process (drift)
• Random spatial process (migration)


Effective population size


The coalescent

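• We can calculate many aspects of a genealogical (coalescent) tree for a population of size 2N (sample of size n, times in generations)
• Time to MRCA: E[TMRCA] = 4N (1 – 1/n)
• Length of a tree: E[L] ≈ 4N log(n-1)
• Time of coalescence of the last two lineages: E[T2] = 2N
[Figure: coalescent tree for a sample of n = 5; expected interval lengths 2N (2 lineages), 2N/3 (3 lineages), 2N/6 (4 lineages), 2N/10 (5 lineages)]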
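The expectations above can be checked numerically. The sketch below is only an illustration: the function names are made up for this example, and it assumes the standard result that the expected time with k lineages is 2N / C(k, 2) generations, which matches the interval lengths 2N, 2N/3, 2N/6, 2N/10 in the figure.

```python
from math import comb


def expected_interval(k, N):
    """Expected time (in generations) during which exactly k lineages exist."""
    return 2 * N / comb(k, 2)


def expected_tmrca(n, N):
    """Sum of all intervals from n lineages down to 2 lineages."""
    return sum(expected_interval(k, N) for k in range(2, n + 1))


def expected_tree_length(n, N):
    """Each interval counts once per lineage present during it."""
    return sum(k * expected_interval(k, N) for k in range(2, n + 1))


N, n = 1000, 5  # illustrative values, not taken from the lecture
print([expected_interval(k, N) for k in range(2, n + 1)])
# Compare with the closed forms 4N(1 - 1/n) and 4N * sum_{i=1}^{n-1} 1/i
print(expected_tmrca(n, N), 4 * N * (1 - 1 / n))
print(expected_tree_length(n, N), 4 * N * sum(1 / i for i in range(1, n)))
```

The tree length is computed here with the exact harmonic sum rather than the logarithmic approximation, which is only accurate for large n.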


Definition

• The real physical population is likely not to behave as in the Wright–Fisher model
• Most populations show some kind of structure:
  • Geographic proximity of individuals
  • Social constraints…
• The number of descendants per individual may not follow the Poisson distribution of the Wright–Fisher model (for example, it may have a larger variance)
• Effective population size = size of a Wright–Fisher population that would produce the same rate of genetic drift as the population of interest
• One consequence of drift: do two randomly picked offspring individuals have a common ancestor in the parent generation?


Definition

• We will use here the inbreeding effective size, Ne
• Also called identity by descent population size
Ne = 1 / (2 * P[T2 = 1])
where T2 = time until two lineages coalesce, given in generations
This depends only on the immediately preceding generation!
• An extension is:
Ne(t) = E[T2] / 2
• This relates to the number of generations until an MRCA is found in the population


Definition

• For the haploid Wright–Fisher model:
Ne = 1 / (2 * P[T2 = 1])
with P[T2 = 1] = 1 / (2N)
so that Ne = N
• The extension is:
Ne(t) = E[T2] / 2
with E[T2] = 2N
so that Ne(t) = N
• For the Wright–Fisher model the two definitions agree


Calculating Ne

• Diploid model with different numbers of males and females
  • Nf = number of females
  • Nm = number of males
  • Nf + Nm = N
• P[T2 = 1] = (1 - 1/(2N)) * N / (8 Nf Nm)
• Ne(t) = Ne ≈ 4 Nf Nm / (Nf + Nm), neglecting the factor (1 - 1/(2N))
• For example: when one male has a harem, Nf = 20 and Nm = 1
• What is Ne?
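A minimal sketch of the two-sex formula, to make the effect of an unequal sex ratio concrete. The function name is an assumption made for this example; the formula is the one given on the slide.

```python
def ne_two_sexes(n_females, n_males):
    """Effective size Ne = 4 * Nf * Nm / (Nf + Nm) for a two-sex diploid model."""
    return 4 * n_females * n_males / (n_females + n_males)


# Equal sex ratio: Ne equals the census size N = Nf + Nm
print(ne_two_sexes(50, 50))  # 100.0

# Harem example from the slide: 20 females, 1 male, so N = 21
print(ne_two_sexes(20, 1))   # 80/21, about 3.8
```

With a single male, Ne stays below 4 however many females there are, because every offspring carries a gene copy from that male.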


Calculating N and Ne

• Example based on the human population: many human genes have an MRCA less than 200,000 years ago
• If one generation = 20 years, this is less than 10,000 generations
• Since E[TMRCA] ≈ 4Ne generations, 4Ne < 200,000 / 20
• Ne < 200,000 / (4*20) => Ne < 2,500!
• Of course N is bigger in the human population, but Ne may be very small
• We will see how to estimate Ne from sequence data later on
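The arithmetic of this bound can be written out as a small sketch. The function name is hypothetical, and the relation E[TMRCA] ≈ 4Ne generations is the large-sample approximation of 4N(1 – 1/n).

```python
def ne_upper_bound(tmrca_years, generation_time_years):
    """Upper bound on Ne from an upper bound on the TMRCA, using TMRCA ~ 4 * Ne generations."""
    tmrca_generations = tmrca_years / generation_time_years
    return tmrca_generations / 4


# Values from the slide: MRCA younger than 200,000 years, 20 years per generation
print(ne_upper_bound(200_000, 20))  # 2500.0
```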


The coalescent 2 – role of mutations


Coalescent tree + mutations

“Coalescent theory” John Wakeley, 2009

• The distribution of mutations amongst individuals can be summarized as a tree (on a genealogy)


Coalescent tree + mutations

• How to add mutation on a coalescent tree?
• In a Wright–Fisher model: see drawing
• µ = probability that an offspring changes its genotype (mutation)
• And P[no mutation] = 1 - µ
• This means, for example, for a two-allele model with alleles A and a: mutation goes from a to A, and vice versa
• The classical model for DNA sequences is the so-called infinite site model
  • Definition: each new mutation hits a new site in the genome
  • So it cannot be masked by back mutation
  • Not affected by recurrent mutation
  • Every mutation is visible except if lost by drift


Models of mutation

• There are other models of sequence evolution, but these will not be used for now.
• Infinite allele model
  • Definition: each mutation creates a new allele
  • Example on a tree
• Finite site model
  • Definition: mutations fall on a finite number of sites
  • Example on a tree


Coalescent tree + mutations

• How to add mutation on a coalescent tree?
• µ = probability that an offspring changes its genotype (mutation)
• And P[no mutation] = 1 - µ
• Do you see where this is going?
• After t generations, what is the probability that there were no mutations?
• P[X > t] = (1 - µ)^t ≈ e^(-µt), where X is the time until the first mutation
• So, as for coalescence times, we can draw the time until a new mutation from an exponential distribution
• And put this on a tree, drawing for each branch the time to a new mutation
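A short sketch of the approximation and of the sampling step. The value of µ, the seed, and the function names are assumptions chosen for illustration; `random.Random.expovariate` takes the rate (here µ) as its argument.

```python
import math
import random


def p_no_mutation_exact(mu, t):
    """Probability of no mutation in t discrete generations."""
    return (1 - mu) ** t


def p_no_mutation_approx(mu, t):
    """Exponential approximation, accurate when mu is small."""
    return math.exp(-mu * t)


def time_to_next_mutation(mu, rng):
    """Draw the waiting time (in generations) until the next mutation on a lineage."""
    return rng.expovariate(mu)


mu = 1e-3  # illustrative mutation probability per generation
for t in (10, 100, 1000):
    print(t, p_no_mutation_exact(mu, t), p_no_mutation_approx(mu, t))

rng = random.Random(1)
waits = [time_to_next_mutation(mu, rng) for _ in range(10_000)]
print(sum(waits) / len(waits))  # should be close to 1/mu = 1000 generations
```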


Coalescent tree + mutations

• How to add mutation on a coalescent tree?
• The mutation will be visible in all descendants from that branch
[Figure: tree over 4 sites; sequence AAAA at the root; sequences AAAA, TTAA, TTTT at the tips]


Coalescent tree + mutations

• How to add mutation on a coalescent tree?
• The mutation will be visible in all descendants from that branch
[Figure: same tree. With 4 sites: AAAA at the root; AAAA, TTAA, TTTT at the tips. With one more mutation, 5 sites: AAAAA at the root; AAAAG, TTAAA, TTTTA at the tips]


Mutations on a tree

• For neutral mutations we can do this process without changing the shape of the tree or the size of the tree
  • Tree topology = shape and branching of the tree
  • Branch lengths = length of branches, usually in units of 2N generations
• Because:
  • Forward in time: a neutral mutation does not change the offspring distribution of an individual
  • Backward in time: mutation does not change the probability to be picked as a parent


Tree topology

• For neutral mutations we can do this process without changing the shape of the tree or the size of the tree
  • Tree topology = shape and branching of the tree
  • Branch lengths = length of branches, usually in units of 2N generations
• Definitions: external branches and internal branches


Tree topology and mutation

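• We classify mutations (SNPs) by their frequency in the sample
  • Mutation a is found in two sequences = doubleton
  • Mutation b is found in one sequence = singleton
[Figure: tree with four sequences (1 to 4), mutation a on an internal branch and mutation b on an external branch]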


Mutations on a tree

• We are now interested in the number of mutations on each branch of the tree
• For a branch of length l:
  • The number of mutations follows a Poisson distribution with parameter (lµ)
• So for the total tree, of length L: Poisson (Lµ)
• Remember:

$$E[L] = 4N \sum_{i=1}^{n-1} \frac{1}{i}$$

• So we define S as the total number of mutations on a tree (on a set of sequences):

$$E[S] = \mu E[L] = 4N\mu \sum_{i=1}^{n-1} \frac{1}{i} = \theta \sum_{i=1}^{n-1} \frac{1}{i}$$

With θ = 4Neµ
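As a rough illustration of E[S] = θ Σ 1/i, the sketch below tabulates the expected number of segregating sites for a few sample sizes. The function names and the value θ = 2 are assumptions made for the example.

```python
def harmonic_sum(n):
    """Sum of 1/i for i = 1 .. n-1, for a sample of n sequences."""
    return sum(1 / i for i in range(1, n))


def expected_segregating_sites(theta, n):
    """E[S] = theta * sum_{i=1}^{n-1} 1/i under the neutral constant-size coalescent."""
    return theta * harmonic_sum(n)


theta = 2.0  # illustrative value of 4 * Ne * mu for the whole locus
for n in (2, 3, 10, 100):
    print(n, expected_segregating_sites(theta, n))
```

E[S] grows only logarithmically with n, so adding sequences to an already large sample reveals few new segregating sites.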


The population mutation rate

• This is the crucial parameter: it combines mutation and Ne
• θ is called the population mutation rate or scaled mutation rate
• We can estimate θ based on sequence data
• Two estimators have been derived:
  • θπ derived by Tajima (1983)
  • θS (or θW) derived by Watterson (1975)

θ = 4Neµ


Watterson estimator

• θS = θW is based on the number of segregating sites S in a tree, compared to the average branch length of a sample of size n
• Defined as:

$$\theta_S = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}}$$

• Remember:

$$E[L] = 4N \sum_{i=1}^{n-1} \frac{1}{i}$$

• This is the expected average number of segregating sites per given length of tree branch


Tajima estimator

• θπ is defined as the average number of differences over all pairs of sequences in a sample
• Based on πij, which is the number of differences between two sequences i and j
• Defined as (each pair counted once):

$$\theta_\pi = \frac{1}{n(n-1)/2} \sum_{i<j} \pi_{ij} = \frac{2}{n(n-1)} \sum_{i<j} \pi_{ij}$$

• Because there are n(n-1)/2 pairs of sequences
• So take all sequences, and count the number of differences for all pairs,
• And then take the average


Tajima estimator

• Based on πij, which is the number of differences between two sequences i and j
• Different mutations count differently
  • Mutation a is counted in four pairwise comparisons
  • Mutation b is counted in three comparisons
• πij and thus θπ depend on how many mutations fall on internal or external branches
[Figure: tree with four sequences (1 to 4), mutation a on an internal branch and mutation b on an external branch]


Coalescent tree + mutations

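• Example of calculation
[Figure: tree over 4 sites; sequence AAAA at the root; sampled sequences ATAA, TAAT, TATA (n = 3, S = 4)]

$$\theta_S = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}} = \frac{4}{1 + \frac{1}{2}} = \frac{8}{3}$$

$$\theta_\pi = \frac{2}{n(n-1)} \sum_{i<j} \pi_{ij} = \frac{3 + 3 + 2}{3} = \frac{8}{3}$$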
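The worked example can be reproduced with a few lines of code. This is a minimal sketch assuming aligned sequences of equal length with no gaps or missing data; the function names are invented for the example and are not taken from DnaSP or any other package.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """theta_S = S / sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def pairwise_differences(a, b):
    """pi_ij: number of positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def tajima_theta(seqs):
    """theta_pi = average of pi_ij over the n(n-1)/2 unordered pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


sample = ["ATAA", "TAAT", "TATA"]  # the three sequences of the slide example
print(segregating_sites(sample))  # 4
print(watterson_theta(sample))    # 8/3
print(tajima_theta(sample))       # 8/3
```

Using `itertools.combinations` visits each unordered pair exactly once, which corresponds to the sum over i < j.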


Watterson estimator

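• θS = θW is based on the number of segregating sites S in a tree, compared to the average branch length of a sample of size n
• Defined as:

$$\theta_S = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}}$$

• Remember:

$$E[L] = 4N \sum_{i=1}^{n-1} \frac{1}{i}$$

[Figure: coalescent tree for n = 5; contributions to the tree length (number of lineages × interval length, in units of 2N): 5 × 1/10, 4 × 1/6, 3 × 1/3, 2 × 1]

$$E[L] = 2N\left(2 + 1 + \frac{2}{3} + \frac{1}{2}\right) = 2N \cdot 2\left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right) = 4N \sum_{i=1}^{n-1} \frac{1}{i}$$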


Neutral model of coalescent

• Very important result:

θS = θπ (in expectation)

• If the population follows a neutral model of coalescent with constant population size!

θ = 4Neµ


Estimating Ne

• It is possible to estimate Ne based on the two estimators
• IF and only IF you have independent data on the mutation rate

Ne = θπ / (4µ) = θS / (4µ)

• This assumes:
  • Infinite site model
  • Constant Ne over time
  • Homogeneous population (equal coalescent probability for all pairs)
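A minimal sketch of this estimate. The function name and the per-site θ value of 0.001 are assumptions for illustration only; θ and µ must be expressed on the same scale (both per site, or both per locus).

```python
def estimate_ne(theta, mu):
    """Ne = theta / (4 * mu); theta and mu must share the same scale (per site or per locus)."""
    return theta / (4 * mu)


theta_per_site = 0.001  # hypothetical estimate of theta_pi or theta_S, per base
mu_per_site = 1.2e-8    # per base per generation, the human rate quoted in the exercise below
print(estimate_ne(theta_per_site, mu_per_site))  # roughly 20,800
```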


Estimating Ne

• Exercise: calculate θπ and θS, and estimate Ne
• For two datasets:
  • In human populations: TNFSF-5-Humans.fas
  • In Drosophila populations: 055-Droso.nex
• Define populations in DnaSP using: data => define sequence sets
• Then => Polymorphism analysis
• For Drosophila: Europe and Africa
• Mutation rate in humans = 1.2 × 10^-8 per base per generation (Scally and Durbin, Nat Rev Genetics, October 2012)
• Mutation rate in Drosophila = 10^-8 per base per generation
• What are the differences?


Heterozygosity


Heterozygosity

• Definition: heterozygosity H is the probability that two alleles taken at random from a population are different at a random site or locus.
• It is a key measure of diversity in populations
• If H0 is the heterozygosity at generation 0, then at generation 1, assuming no new mutations:

$$H_1 = \frac{1}{2N_e} \cdot 0 + \left(1 - \frac{1}{2N_e}\right) H_0$$

• First term: with probability 1/(2Ne) the two alleles have the same parent at generation 0, and then probability 0 to be different
• Second term: with probability 1 - 1/(2Ne) the two alleles have different parents, and these parents have probability H0 (by definition) to be different


Heterozygosity

• By iteration we get at generation t:

$$H_t = \left(1 - \frac{1}{2N_e}\right)^t H_0$$

• This means that in the absence of mutation, heterozygosity is lost at a rate of 1/(2Ne) every generation
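To give a feeling for how fast drift removes diversity, the sketch below evaluates H_t and the number of generations needed to halve H. The function names and the values H0 = 0.5 and Ne = 100 are assumptions chosen for illustration.

```python
import math


def heterozygosity(h0, ne, t):
    """H_t = H_0 * (1 - 1/(2 Ne))^t, with no new mutations."""
    return h0 * (1 - 1 / (2 * ne)) ** t


def half_life(ne):
    """Generations needed for heterozygosity to drop to half its initial value."""
    return math.log(0.5) / math.log(1 - 1 / (2 * ne))


ne = 100  # illustrative effective size
for t in (0, 10, 100, 1000):
    print(t, heterozygosity(0.5, ne, t))
print(half_life(ne), 2 * ne * math.log(2))  # the half-life is close to 2 * Ne * ln(2)
```

The half-life scales linearly with Ne: small populations lose diversity quickly, large ones slowly.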


Heterozygosity + mutation

• With the infinite allele model assumption that every new mutation creates a new allele:
• Two contrary mechanisms drive the evolution of diversity in a population: genetic drift and mutation
• If they have the same strength and balance each other = mutation-drift balance
• The change in heterozygosity between two generations is:

$$\Delta H = H_{t+1} - H_t = -\frac{1}{2N_e} H_t + 2\mu \left(1 - H_t\right)$$


Heterozygosity + mutation

• At equilibrium the value of heterozygosity is Ĥ:

$$\Delta H = H_{t+1} - H_t = -\frac{1}{2N_e} H_t + 2\mu \left(1 - H_t\right)$$

• First term: change of heterozygosity due to random drift (always negative)
• Second term: change of heterozygosity due to new mutations (always positive)

$$\Delta H = 0 \Rightarrow \hat{H} = \frac{4N_e\mu}{1 + 4N_e\mu}$$

Ĥ = θ / (1 + θ)

• The value at equilibrium increases with increasing µ and Ne. WHY?
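The equilibrium can also be reached numerically by iterating the recursion for ΔH. This is a sketch under assumed parameter values (Ne = 1000, µ = 10^-4, chosen so that θ = 0.4); the function names are invented for the example.

```python
def delta_h(h, ne, mu):
    """Change in heterozygosity in one generation: drift term plus mutation term."""
    return -h / (2 * ne) + 2 * mu * (1 - h)


def equilibrium_h(ne, mu):
    """H_hat = theta / (1 + theta) with theta = 4 * Ne * mu."""
    theta = 4 * ne * mu
    return theta / (1 + theta)


ne, mu = 1000, 1e-4  # illustrative values
h = 0.0              # start from a population with no diversity
for _ in range(50_000):
    h += delta_h(h, ne, mu)
print(h, equilibrium_h(ne, mu))  # the iteration converges to theta / (1 + theta)
```

Starting from H = 0 or from H = 1 leads to the same value, since the drift term pulls H down and the mutation term pushes it up until they cancel.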


Mutation – Drift balance

• In the case of such a model, we are interested in:
  • What is the probability for a new mutation to get fixed?
  • How long does it take to get fixed?
• Using a coalescent argument: all copies in a future population descend from a single ancestor in the present generation; the mutation gets fixed if and only if the mutant is that ancestor, and this probability = 1/(2N)
• The expected time of fixation is equal to the expected time to the MRCA, so it is = 4N generations
• What do we expect for selected loci?


Mutation – Drift balance

• Substitution rate = rate at which mutations get fixed in a population/species
• It is called k
• A new mutation starts with frequency 1/(2N) in a population
• The substitution rate is obtained by multiplying the number of new mutations per generation in a population = 2Nµ
• by the probability that one mutation gets fixed = 1/(2N)
• So k = 2Nµ * (1/(2N)) = µ (Kimura)
• Most striking result: k does not depend on the effective population size
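The cancellation of N can be made explicit with a tiny sketch. The function name and the population sizes are assumptions for illustration; the value 10^-8 is the Drosophila mutation rate quoted in the exercise.

```python
def substitution_rate(n_diploid, mu):
    """k = (new mutations per generation) * (fixation probability of each one)."""
    new_mutations_per_generation = 2 * n_diploid * mu
    fixation_probability = 1 / (2 * n_diploid)
    return new_mutations_per_generation * fixation_probability


mu = 1e-8
for n_diploid in (100, 10_000, 1_000_000):
    print(n_diploid, substitution_rate(n_diploid, mu))  # always equal to mu, up to rounding
```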