Non-stationary population genetic models with selection: Theory and Inference Scott Williamson and Carlos Bustamante Cornell University

Dec 21, 2015

Page 1

Non-stationary population genetic models with selection: Theory and Inference

Scott Williamson

and Carlos Bustamante

Cornell University

Page 2

Inferring natural selection from samples

• Statistical tests of the neutral theory (lots)

• Methods for detecting selective sweeps (lots)

• Parametric inference: estimating selection parameters, etc.

• Quantification of selective constraint, deleterious mutation

Page 3

The demography problem

• Many existing methods assume random mating, constant population size

• These assumptions don’t apply in most natural populations

• The effect of demography can mimic the effect of natural selection

Page 4

Natural selection and population growth

• Inferring selection from the frequency spectrum while correcting for demography

• The McDonald-Kreitman test: does recent population growth cause you to misidentify negative selection as adaptive evolution?

Page 5

The frequency spectrum: an example

Site              163  975  1972  2188  3529  4424  4961  5286  7019

Sequence 1        A    G    G     C     T     T     A     A     A
Sequence 2        A    T    G     C     T     C     G     A     A
Sequence 3        G    T    G     T     T     C     A     C     G
Sequence 4        A    G    G     C     T     C     A     A     G
Sequence 5        A    G    A     C     C     C     G     A     A

Frequency class   1    2    1     1     1     4     2     1     3

Legend: Ancestral, Derived

[Figure: The frequency spectrum. x-axis: frequency class (1-4); y-axis: count (1-5).]
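A minimal sketch of how the spectrum on this slide can be tallied from the alignment. The ancestral base at each site is not given in the transcript; the `ANCESTRAL` string below is an assumption, chosen so that the derived-allele counts match the slide's frequency-class row. Function names are illustrative.

```python
from collections import Counter

# Alignment transcribed from the slide: 5 sequences x 9 segregating sites.
SEQUENCES = [
    "AGGCTTAAA",
    "ATGCTCGAA",
    "GTGTTCACG",
    "AGGCTCAAG",
    "AGACCCGAA",
]

# ASSUMPTION: ancestral states inferred from the slide's frequency-class row.
ANCESTRAL = "AGGCTTAAG"


def derived_counts(sequences, ancestral):
    """Number of sequences carrying a non-ancestral base at each site."""
    return [
        sum(seq[site] != anc for seq in sequences)
        for site, anc in enumerate(ancestral)
    ]


def frequency_spectrum(counts, n):
    """Number of sites in each frequency class 1..n-1."""
    tally = Counter(counts)
    return [tally[i] for i in range(1, n)]


counts = derived_counts(SEQUENCES, ANCESTRAL)
spectrum = frequency_spectrum(counts, len(SEQUENCES))
print(counts)    # derived-allele count per site
print(spectrum)  # number of sites in frequency classes 1..4
```

With folded data (no outgroup), the same tally would use the minor-allele count instead of the derived-allele count.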

Page 6

Natural selection and the frequency spectrum

[Figure: Equilibrium neutral and positively selected frequency spectra. x-axis: frequency class (1-9); y-axis: count (2-10). Series: Neutral; 2Ns=2.]

Page 7

Natural selection and the frequency spectrum

[Figure: Equilibrium neutral and negatively selected frequency spectra. x-axis: frequency class (1-9); y-axis: count (2-10). Series: Neutral; 2Ns=-2.]

Page 8

Natural selection vs. demography

[Figure: Non-stationary neutral and equilibrium selected frequency spectra. x-axis: frequency class (1-9); y-axis: count (1-7). Series: Population growth, neutral; Equilibrium, 2Ns=-2.]

Page 9

How do we distinguish selection from demography?

McDonald-Kreitman approach:

• Use a priori information to classify changes as “neutral” (e.g. synonymous, non-coding) or “potentially selected” (e.g. non-synonymous)

• Putatively neutral changes are treated as a standard for patterns of neutral evolution in a particular sample

• Potentially selected sites are compared to the neutral standard

Can we develop a neutral standard for the frequency spectrum?

Page 10

Comparing frequency spectra for different classes of mutation

[Figure: Observed frequency spectra. x-axis: frequency class (1-9); y-axis: count (10-60). Series: Putatively neutral; Potentially selected.]

This talk:

• Likelihood ratio test of neutrality at potentially selected sites, using information from the neutral sites

• Biologically meaningful measure of the difference between the two spectra

Page 11

Comparing frequency spectra for different classes of mutation

[Figure: Observed frequency spectra. x-axis: frequency class (1-9); y-axis: count (10-60). Series: Putatively neutral; Potentially selected.]

A model-based approach:

1. Fit a neutral demographic model to estimate demographic parameters

2. Given those parameter estimates, fit a selective demographic model to estimate selection parameters, test hypotheses

Page 12

Comparing frequency spectra for different classes of mutation

[Figure: Observed frequency spectra. x-axis: frequency class (1-9); y-axis: count (10-60). Series: Putatively neutral; Potentially selected.]

Requirements:

1. Demographic model

2. Frequency spectrum predictions from the model under neutrality

3. Frequency spectrum predictions from the model subject to natural selection

Page 13

Theory: population growth model

2-epoch model

[Figure: population size versus time, showing the ancestral size N_A, the current size N_C, and the present ("now").]

ν = N_A/N_C

Model parameters: τ, ν

Page 14

Theory: predicting the frequency spectrum

Definitions:

x_i  Number of sites in frequency class i

f(q,t;Θ)  Distribution of allele frequency, q, at time t, for model parameters Θ

n  Sample size

Predictions:

$$E[x_i;\Theta] = \binom{n}{i}\int_0^1 q^i (1-q)^{n-i}\, f(q,t;\Theta)\, dq$$

$$P(i,n;\Theta) = \frac{E[x_i;\Theta]}{\sum_{j=1}^{n-1} E[x_j;\Theta]}$$
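The two predictions can be sketched numerically as below: the expected count in class i is a binomial-weighted integral of the allele-frequency density, and P(i,n) is that count normalized over classes 1..n-1. The density used here, f(q) = 1/q, is the standard stationary neutral case and is an assumption chosen only because it has a known answer (E[x_i] proportional to 1/i); the function names and the midpoint rule are illustrative choices, not the authors' implementation.

```python
from math import comb


def expected_spectrum(density, n, steps=2000):
    """E[x_i] for i = 1..n-1 via the midpoint rule on (0, 1).

    Entry i-1 of the result approximates
    C(n, i) * integral of q^i (1-q)^(n-i) * density(q) dq.
    """
    h = 1.0 / steps
    binomials = [comb(n, i) for i in range(1, n)]
    totals = [0.0] * (n - 1)
    for k in range(steps):
        q = (k + 0.5) * h
        fq = density(q)
        for i in range(1, n):
            totals[i - 1] += binomials[i - 1] * q**i * (1 - q) ** (n - i) * fq * h
    return totals


def spectrum_probabilities(expected):
    """P(i, n): expected counts normalized to sum to one."""
    total = sum(expected)
    return [e / total for e in expected]


def neutral_density(q):
    # ASSUMPTION: stationary neutral density, scaled so that E[x_i] = 1/i.
    return 1.0 / q


n = 10
expected = expected_spectrum(neutral_density, n)
probs = spectrum_probabilities(expected)
print([round(e, 4) for e in expected])
```

Any other density (for example one produced by a numerical solution of the diffusion equation) can be passed in place of `neutral_density`.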

Page 15

Theory: the distribution of allele frequency

Poisson Random Field approach (Sawyer and Hartl 1992):

• Use single-locus diffusion theory to predict the distribution of allele frequency

• If sites are independent (i.e. in linkage equilibrium) and identically distributed, then the single-locus theory applies across sites

To get f, we need to solve the diffusion equation:

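$$\frac{\partial}{\partial t} f(q,t;\Theta) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\big[V(q;\Theta)\, f(q,t;\Theta)\big] - \frac{\partial}{\partial q}\big[M(q;\Theta)\, f(q,t;\Theta)\big]$$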

Page 16

Theory: time-dependent solution, neutral case

The forward equation under neutrality:

$$\frac{\partial}{\partial t} f(q,t) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\big[q(1-q)\, f(q,t)\big]$$

Kimura’s (1964) solution, given some initial allele frequency, p:

$$\phi(q,t \mid p) = \sum_{i=1}^{\infty} \frac{(2i+1)\,\big(1-(1-2p)^2\big)}{i(i+1)}\; C_{i-1}^{3/2}(1-2p)\; C_{i-1}^{3/2}(1-2q)\; e^{-i(i+1)t/2}$$

where the C^{3/2}_{i-1} are Gegenbauer polynomials
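A sketch of how Kimura's series can be evaluated, assuming the Gegenbauer-polynomial form with time measured in units of 2N generations (so the i-th term decays as exp(-i(i+1)t/2)). The truncation at 60 terms and all function names are illustrative assumptions. As a sanity check, only the first term of the series contributes to the integral of q(1-q)φ, which should equal p(1-p)e^{-t}.

```python
from math import exp


def gegenbauer_3_2(max_degree, x):
    """Values C_k^{(3/2)}(x) for k = 0..max_degree via the three-term recurrence."""
    values = [1.0, 3.0 * x]
    for k in range(2, max_degree + 1):
        values.append(((2 * k + 1) * x * values[k - 1] - (k + 1) * values[k - 2]) / k)
    return values[: max_degree + 1]


def kimura_density(q, t, p, terms=60):
    """Truncated series for phi(q, t | p); t is in units of 2N generations."""
    r, z = 1.0 - 2.0 * p, 1.0 - 2.0 * q
    c_r = gegenbauer_3_2(terms - 1, r)
    c_z = gegenbauer_3_2(terms - 1, z)
    total = 0.0
    for i in range(1, terms + 1):
        weight = (2 * i + 1) * (1.0 - r * r) / (i * (i + 1))
        total += weight * c_r[i - 1] * c_z[i - 1] * exp(-i * (i + 1) * t / 2.0)
    return total


def heterozygosity_integral(t, p, steps=2000):
    """Midpoint-rule integral of q(1-q) * phi(q, t | p) over (0, 1)."""
    h = 1.0 / steps
    return sum(
        q * (1 - q) * kimura_density(q, t, p) * h
        for q in ((k + 0.5) * h for k in range(steps))
    )


p0, t0 = 0.3, 0.5
numeric = heterozygosity_integral(t0, p0)
exact = p0 * (1 - p0) * exp(-t0)
print(numeric, exact)
```

The series converges slowly for very small t, where many more terms would be needed.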

Page 17

Theory: time-dependent solution, neutral case

Applying Kimura’s solution to the 2-epoch model:

Distribution of allele frequency:

$$f(q;\tau,\nu) = \int_0^1 \phi(q,\tau \mid p)\,\frac{\nu}{p}\, dp \;+\; \frac{1}{2}\int_0^{\tau} \phi\!\left(q,t \,\Big|\, \tfrac{1}{2N_C}\right) dt$$

Ancestral mutations:

$$f_A(q;\tau,\nu) = \int_0^1 \phi(q,\tau \mid p)\,\frac{\nu}{p}\, dp$$

$$f_C(q;\tau,\nu) = \frac{1}{2}\int_0^{\tau} \phi\!\left(q,t \,\Big|\, \tfrac{1}{2N_C}\right) dt$$

Page 18

Theory: time-dependent solution, neutral case

[Figure: Expected frequency spectrum after a change in population size (ν=0.01). x-axis: frequency class (1-9); y-axis: P(i,n;τ,0.01), ticks 0.2-0.8.]

Page 19

Theory: time-dependent solution, neutral case

Multinomial likelihood:

$$\ell(\tau,\nu \mid \mathbf{x}) = \ln\!\Big[\Big(\sum_{i=1}^{n-1} x_i\Big)!\Big] - \sum_{i=1}^{n-1}\ln(x_i!) + \sum_{i=1}^{n-1} x_i \ln P(i,n;\tau,\nu)$$

Maximum likelihood estimates of τ and ν

Likelihood ratio test of population growth
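The multinomial log-likelihood can be sketched as a short function of the observed spectrum and the predicted class probabilities. The probabilities passed in below come from the standard constant-size neutral spectrum (P proportional to 1/i), used here only as a stand-in; maximizing over the growth parameters would require P(i,n) from the 2-epoch model, which this sketch does not compute. The example counts are the small spectrum from the 5-sequence illustration, and all names are illustrative.

```python
from math import lgamma, log


def multinomial_log_likelihood(counts, probs):
    """ln L of observed counts x_i given class probabilities P(i, n)."""
    total = sum(counts)
    loglik = lgamma(total + 1)          # ln(total!)
    for x, p in zip(counts, probs):
        loglik -= lgamma(x + 1)         # - ln(x_i!)
        if x > 0:
            loglik += x * log(p)        # + x_i ln P(i, n)
    return loglik


def neutral_equilibrium_probs(n):
    """Constant-size neutral spectrum: P(i, n) proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


counts = [5, 2, 1, 1]                   # spectrum for n = 5 sequences
null_probs = neutral_equilibrium_probs(5)
saturated_probs = [x / sum(counts) for x in counts]

lnL_null = multinomial_log_likelihood(counts, null_probs)
lnL_saturated = multinomial_log_likelihood(counts, saturated_probs)
print(lnL_null, lnL_saturated, 2 * (lnL_saturated - lnL_null))
```

The factorial terms do not depend on the model parameters, so they cancel in any likelihood ratio.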

Page 20

Comparing frequency spectra for different classes of mutation

[Figure: Observed frequency spectra. x-axis: frequency class (1-9); y-axis: count (10-60). Series: Putatively neutral; Potentially selected.]

Requirements:

1. Demographic model

2. Frequency spectrum predictions from the model under neutrality

3. Frequency spectrum predictions from the model subject to natural selection

Page 21

Theory: time-dependent solution, selected case

The forward equation with selection:

$$\frac{\partial}{\partial t} f_S(q,t;\gamma) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\big[q(1-q)\, f_S(q,t;\gamma)\big] - \gamma\,\frac{\partial}{\partial q}\big[q(1-q)\, f_S(q,t;\gamma)\big]$$

where γ = 2N_C s

Initial condition (the stationary distribution in the ancestral population, with ν = N_A/N_C):

$$f_S(q,0;\gamma) = \nu\,\frac{1-e^{-2\nu\gamma(1-q)}}{1-e^{-2\nu\gamma}}\;\frac{1}{q(1-q)}$$

Page 22

Theory: time-dependent solution, selected case

1. Numerically solve the forward equation using the Crank-Nicolson finite differencing scheme

2. Use this approximation of f to evaluate the likelihood function:

$$\ell(\gamma,\tau,\nu \mid \mathbf{x}) = \ln\!\Big[\Big(\sum_{i=1}^{n-1} x_i\Big)!\Big] - \sum_{i=1}^{n-1}\ln(x_i!) + \sum_{i=1}^{n-1} x_i \ln P(i,n;\gamma,\tau,\nu)$$

3. Fix τ and ν to their MLEs from the neutral data

4. Optimize the likelihood for γ. Likelihood ratio test of neutrality:

$$\mathrm{LRT} = 2\big[\ell(\hat\gamma,\hat\tau,\hat\nu \mid \mathbf{x}) - \ell(0,\hat\tau,\hat\nu \mid \mathbf{x})\big]$$

Page 23

Theory: time-dependent solution, selected case

How can we be sure that the numerical solution actually works?

• Von Neumann stability analysis: solution is unconditionally stable

• Numerical solution converges to the stationary distribution after ~4N_C generations

• Comparison with time-dependent neutral predictions: Kimura, Crank, and Nicolson all agree with each other
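A minimal sketch of a Crank-Nicolson solver for the forward equation with selection, together with the convergence check described above. Several choices here are assumptions for illustration, not the authors' implementation: the equation is solved for g = q(1-q)f, which stays bounded; mutation input is represented by holding g(0) = 1 and g(1) = 0; the grid is uniform; and the linear system is solved with the Thomas algorithm. Time is in units of 2N_C generations and γ = 2N_C s.

```python
from math import exp


def stationary_g(q, gamma):
    """Equilibrium g(q) = q(1-q) f(q) = (1 - e^{-2 gamma (1-q)}) / (1 - e^{-2 gamma})."""
    if abs(gamma) < 1e-12:
        return 1.0 - q
    return (1.0 - exp(-2.0 * gamma * (1.0 - q))) / (1.0 - exp(-2.0 * gamma))


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for j in range(1, n):
        denom = diag[j] - lower[j] * c[j - 1]
        c[j] = upper[j] / denom
        d[j] = (rhs[j] - lower[j] * d[j - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):
        x[j] = d[j] - c[j] * x[j + 1]
    return x


def crank_nicolson(g0, gamma, t_end, dt):
    """Advance dg/dt = q(1-q) [g''/2 - gamma g'] with both boundary values held fixed."""
    m = len(g0) - 1
    h = 1.0 / m
    lo, di, up = [0.0] * (m + 1), [0.0] * (m + 1), [0.0] * (m + 1)
    for j in range(1, m):
        a = (j * h) * (1.0 - j * h)
        lo[j] = a * (0.5 / h**2 + 0.5 * gamma / h)
        di[j] = -a / h**2
        up[j] = a * (0.5 / h**2 - 0.5 * gamma / h)
    # Implicit half of the step: (I - dt/2 L); boundary rows reduce to the identity.
    a_lo = [-0.5 * dt * v for v in lo]
    a_di = [1.0 - 0.5 * dt * v for v in di]
    a_up = [-0.5 * dt * v for v in up]
    g = list(g0)
    for _ in range(round(t_end / dt)):
        rhs = g[:]  # boundary entries are carried over unchanged
        for j in range(1, m):
            rhs[j] = g[j] + 0.5 * dt * (lo[j] * g[j - 1] + di[j] * g[j] + up[j] * g[j + 1])
        g = solve_tridiagonal(a_lo, a_di, a_up, rhs)
    return g


m = 100
grid = [j / m for j in range(m + 1)]
nu, gamma = 0.1, -2.0                       # hypothetical parameter values
g0 = [nu * stationary_g(q, nu * gamma) for q in grid]   # ancestral equilibrium
g0[0] = 1.0                                 # mutation input at the current size
g = crank_nicolson(g0, gamma, t_end=10.0, dt=0.01)
max_error = max(abs(a - stationary_g(q, gamma)) for a, q in zip(g, grid))
print(max_error)
```

Stopping at a shorter `t_end` gives the non-stationary density; dividing `g` by q(1-q) at interior grid points recovers f.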

Page 24

Human Polymorphism Data

• From Stephens et al. (2001)

• 80 individuals, geographically diverse ancestry

• 313 genes, 720 kb sequenced

• ~3000 SNPs (72% non-coding, 13% synonymous, 15% non-synonymous)

Page 25

Results for non-coding changes, assuming neutrality

Model                 MLEs                    ln(L)

2-epoch               τ = 0.016, ν = 0.13     -5674.6

Equilibrium neutral                           -6046.6 (P≈0, d.f. 2)

Goodness-of-fit                               -5608.3 (P=0.54, d.f. 76)
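The population-growth test can be worked through from the two log-likelihoods in the table. This sketch assumes the usual asymptotic chi-square null distribution with the 2 degrees of freedom shown on the slide; for 2 d.f. the chi-square tail probability is exactly exp(-x/2), so no statistics library is needed. Variable names are illustrative.

```python
from math import exp


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-x / 2.0)


lnL_two_epoch = -5674.6     # 2-epoch model, from the table
lnL_equilibrium = -6046.6   # equilibrium neutral model, from the table

lrt = 2.0 * (lnL_two_epoch - lnL_equilibrium)
p_value = chi2_sf_2df(lrt)
print(lrt, p_value)
```

The statistic is 744, which puts the P-value far below any conventional threshold, in line with the "P≈0" entry.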

Page 26

Results for non-synonymous changes, categorized by Grantham’s distance

Category        S      γ̂       P-value

conservative    136    -2.24    0.52
moderate        137    -6.08    0.07
radical         107    -8.44    0.02
all nonsyn      380    -4.88    0.10

Page 27

Ongoing work and future directions

1. Simulate, simulate, simulate

• How robust is the method to different types of demographic forces?

• How does linkage among some sites affect the analysis?

• How does estimation error affect the LRTs?

2. Numerical solution for different demographic scenarios (e.g. bottleneck, population structure)

3. Variable selective effects among new mutations

Page 28

The McDonald-Kreitman test

S_n  Number of non-synonymous segregating sites

D_n  Number of non-synonymous fixed differences

S_s  Number of synonymous segregating sites

D_s  Number of synonymous fixed differences

$S_n/D_n < S_s/D_s$: Adaptive evolution

$S_n/D_n > S_s/D_s$: Negative selection

Extensions: Sawyer and Hartl (1992), Rand and Kann (1996), Smith and Eyre-Walker (2002), Bustamante et al. (2002), others
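A small sketch of the 2x2 comparison behind the test: the neutrality index NI = (S_n D_s)/(S_s D_n) summarizes the direction of the departure (NI < 1 corresponds to S_n/D_n < S_s/D_s), and a two-sided Fisher exact test gives a P-value for the table. The counts below are hypothetical, chosen only to illustrate the calculation, and Fisher's exact test is one common choice of test rather than anything specified in the talk.

```python
from math import comb


def neutrality_index(sn, ss, dn, ds):
    """NI = (S_n * D_s) / (S_s * D_n)."""
    return (sn * ds) / (ss * dn)


def fisher_exact_two_sided(sn, ss, dn, ds):
    """Two-sided Fisher exact P-value for the table [[sn, ss], [dn, ds]]."""
    row1, row2 = sn + ss, dn + ds
    col1 = sn + dn
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed = prob(sn)
    k_min, k_max = max(0, col1 - row2), min(col1, row1)
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= observed * (1 + 1e-9)
    )


# Hypothetical counts for illustration only.
sn, ss, dn, ds = 2, 42, 7, 17
ni = neutrality_index(sn, ss, dn, ds)
p_value = fisher_exact_two_sided(sn, ss, dn, ds)
pattern = "adaptive evolution" if ni < 1 else "negative selection"
print(ni, p_value, pattern)
```

NI is undefined when S_s or D_n is zero, so real data sets with sparse tables need separate handling.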

Page 29

Demography and the McDonald-Kreitman test

• Robust to different demographic scenarios because it implicitly conditions on the underlying genealogy (see Nielsen 2001)

• However, under some demographic scenarios it’s possible to misidentify the type of selection

• Weak negative selection with population growth

When the population size is small, non-synonymous deleterious mutations might be fixed by drift

Once the population size becomes large, the level of non-synonymous polymorphism would be reduced (relative to the level of synonymous polymorphism)

Result: $S_n/D_n < S_s/D_s$, the same pattern expected under adaptive evolution

Page 30

Demography and the McDonald-Kreitman test

• Over what range of parameter values might you misidentify negative selection as adaptive evolution?

• How large is the effect?

Eyre-Walker (2002):

• Addressed these questions, finding that recent population growth or bottlenecks can cause you to misidentify negative selection

• Assumed that levels of polymorphism and fixation rates changed instantaneously with population size

Page 31

Demography and the McDonald-Kreitman test

$$E[S_n] = \theta_n \int_0^1 \big[1 - q^n - (1-q)^n\big]\, f(q;\gamma,\tau,\nu)\, dq$$

$$E[S_s] = \theta_s \int_0^1 \big[1 - q^n - (1-q)^n\big]\, f(q;0,\tau,\nu)\, dq$$

[Equations for E[D_n] and E[D_s], the expected numbers of non-synonymous and synonymous fixed differences, are not recoverable from the transcript; both depend on t_div and on f.]

where t_div is the divergence time, measured in units of 2N_C generations

Page 32

Demography and the McDonald-Kreitman test

$$NI = \frac{S_n D_s}{S_s D_n}$$

[Figure: Expected Neutrality Index (NI) versus ν (=N_A/N_C). x-axis ticks: 0.01, 0.1, 1; y-axis ticks: 1, 10. Four panels, labeled "=1, t_div=4", "=1, t_div=10", "=0.1, t_div=4" and "=0.1, t_div=10"; the symbol before the first "=" in each label is missing from the transcript.]

Page 33

Demography and the McDonald-Kreitman test: Preliminary results

1. It is possible to misidentify negative selection for some parameter combinations

2. But…the parameter range over which this is true is probably smaller than previously thought, as is the magnitude of the effect

Page 34

Summary

1. Model-based approach to correcting for demography while inferring selection

• Evidence for very recent population growth in humans

• Reasonable estimates of selection parameters for classes of non-synonymous changes

2. McDonald-Kreitman test: negative selection + population growth problem not as severe as previously thought

3. Numerical methods for solving the diffusion are fast, accurate, and fun!

Page 35

Acknowledgements

Collaborator: Carlos Bustamante

Data: Genaissance Pharmaceuticals

Helpful discussions: Bret Payseur, Rasmus Nielsen, Matt Dimmic, Jim Crow, Hiroshi Akashi, Graham Coop