Coalescent theory
Coalescence
[Figure: gene genealogy from the past to the present, marking a coalescence event and the common ancestor of all genes (MRCA).]
Coalescence
The probability that two alleles coalesce in the previous generation is 1/N; the probability that they descend from two different genes is 1 - 1/N.
The probability that 3 alleles have 3 different ancestors in the previous generation is the probability that allele 1 and allele 2 have two different ancestors, multiplied by the probability that allele 3 has an ancestor different from the other two: (1 - 1/N)(1 - 2/N).
The probability that k alleles have k distinct ancestors in the previous generation is:
$$P(k) = \prod_{i=1}^{k-1}\left(1 - \frac{i}{N}\right) \approx 1 - \frac{\binom{k}{2}}{N}$$
with
$$\binom{k}{2} = \frac{k!}{2!\,(k-2)!}$$
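The product formula and its first-order approximation can be compared numerically. The sketch below is only an illustration: the function names are hypothetical, and N is taken as the number of gene copies in the population, as in the 1/N probabilities above.

```python
from math import comb


def prob_distinct_ancestors(k: int, n: int) -> float:
    """Exact probability that k alleles have k distinct ancestors one generation back."""
    p = 1.0
    for i in range(1, k):
        p *= 1.0 - i / n
    return p


def prob_distinct_approx(k: int, n: int) -> float:
    """First-order approximation 1 - C(k,2)/N, reasonable when k is much smaller than N."""
    return 1.0 - comb(k, 2) / n


if __name__ == "__main__":
    n = 1000  # illustrative population size
    for k in (2, 3, 5, 10):
        print(k, prob_distinct_ancestors(k, n), prob_distinct_approx(k, n))
```

For k = 2 both expressions are exactly 1 - 1/N; the gap grows with k because the approximation drops all terms of order 1/N^2 and higher.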
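Coalescence
Probability that two genes coalesce t+1 generations ago:
$$\left(1 - \frac{1}{N}\right)^{t}\frac{1}{N} \approx \frac{1}{N}\,e^{-t/N}$$
Exponential distribution for large N (1/N small)
This process follows a geometric distribution (mean 1/p, variance q/p^2) with p = 1/N.
The mean is therefore N and the variance is
$$\frac{1 - 1/N}{(1/N)^2} = N^2\left(1 - \frac{1}{N}\right) = N(N-1) \approx N^2$$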
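As a numerical check of the two-gene case, the following sketch sums the geometric probabilities directly and compares them with the exponential approximation. The function names and the truncation point t_max are illustrative assumptions, not part of the slides.

```python
import math


def coalescence_pmf(t: int, n: int) -> float:
    """P(two genes coalesce exactly t + 1 generations ago) = (1 - 1/N)^t * (1/N)."""
    return (1.0 - 1.0 / n) ** t * (1.0 / n)


def exponential_approximation(t: int, n: int) -> float:
    """Continuous-time approximation (1/N) * exp(-t/N), for large N."""
    return math.exp(-t / n) / n


def moments(n: int, t_max: int) -> tuple[float, float]:
    """Mean and variance of the coalescence time t + 1, summed up to t_max terms."""
    mean = 0.0
    second = 0.0
    for t in range(t_max):
        p = coalescence_pmf(t, n)
        mean += (t + 1) * p
        second += (t + 1) ** 2 * p
    return mean, second - mean ** 2


if __name__ == "__main__":
    mean, variance = moments(n=50, t_max=5000)
    print(mean, variance)  # close to N = 50 and N(N - 1) = 2450
```

The truncation is harmless as long as t_max is many multiples of N, since the tail probability decays like exp(-t_max/N).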
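Coalescence
We return to the sample of k alleles. The probability of no coalescence during t generations, followed by a coalescence event, is
$$P(k)^{t}\,[1 - P(k)] \approx \left(1 - \frac{\binom{k}{2}}{N}\right)^{t}\frac{\binom{k}{2}}{N} \approx \frac{\binom{k}{2}}{N}\,e^{-\binom{k}{2}\,t/N}$$
The mean is then
$$T_k = \frac{N}{\binom{k}{2}} = \frac{2N}{k(k-1)}$$
T2 = N
T3 = N/3
T4 = N/6
T5 = N/10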
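The mean waiting times listed above can be checked against a small forward-in-time draw of parents. This is a minimal sketch under a Wright-Fisher assumption (each of the k lineages picks its parent uniformly among N gene copies); the function names, the seed and the number of replicates are illustrative choices.

```python
import random
from math import comb


def expected_waiting_time(k: int, n: int) -> float:
    """Approximate mean number of generations until the next coalescence: N / C(k,2)."""
    return n / comb(k, 2)


def simulate_waiting_time(k: int, n: int, rng: random.Random) -> int:
    """Generations until at least two of the k lineages pick the same parent."""
    t = 0
    while True:
        t += 1
        parents = {rng.randrange(n) for _ in range(k)}
        if len(parents) < k:
            return t


def mean_simulated_waiting_time(k: int, n: int, replicates: int = 2000, seed: int = 1) -> float:
    """Average simulated waiting time over independent replicates."""
    rng = random.Random(seed)
    return sum(simulate_waiting_time(k, n, rng) for _ in range(replicates)) / replicates


if __name__ == "__main__":
    n = 100
    for k in (2, 3, 4, 5):
        print(k, expected_waiting_time(k, n), mean_simulated_waiting_time(k, n))
```

The simulated mean is 1/(1 - P(k)) exactly, so it sits slightly above N / C(k,2) when k is not negligible compared with N.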
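Coalescence
Time to the MRCA
$$T_{MRCA} = \sum_{i=2}^{k} T_i = \sum_{i=2}^{k} \frac{2N}{i(i-1)} = 2N\left(1 - \frac{1}{k}\right)$$
Tree length
$$T = \sum_{i=2}^{k} i\,T_i = \sum_{i=2}^{k} \frac{2N\,i}{i(i-1)} = 2N\sum_{i=2}^{k} \frac{1}{i-1} = 2N\sum_{i=1}^{k-1} \frac{1}{i}$$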
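Both sums can be evaluated term by term to confirm the closed forms. The sketch below assumes the mean interval lengths T_i = 2N / (i(i-1)) from the previous slide; the function names are hypothetical.

```python
def expected_tmrca(k: int, n: float) -> float:
    """Sum of the mean interval lengths T_i = 2N / (i (i - 1)) for i = 2..k."""
    return sum(2.0 * n / (i * (i - 1)) for i in range(2, k + 1))


def expected_tree_length(k: int, n: float) -> float:
    """Sum of i * T_i: each interval with i lineages contributes i branches."""
    return sum(i * 2.0 * n / (i * (i - 1)) for i in range(2, k + 1))


if __name__ == "__main__":
    n = 1000  # illustrative population size
    for k in (2, 5, 10, 100):
        closed_tmrca = 2 * n * (1 - 1 / k)
        closed_length = 2 * n * sum(1 / i for i in range(1, k))
        print(k, expected_tmrca(k, n), closed_tmrca,
              expected_tree_length(k, n), closed_length)
```

The time to the MRCA saturates at 2N as k grows, while the tree length keeps growing like a harmonic sum, so adding samples mostly adds short recent branches.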
Stationary population
Growing population
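The divergence between sequences reflects their coalescence time
[Figure: genealogy of five sampled sequences (ATACCTATC, ATACCTATC, AAACCTAAC, ATTCGTATG, ATTAGTATG) descending from the ancestral sequence ATACGTATC. Mutations marked on the branches: T2→A2, A3→T3, C4→A4, G5→C5, T8→A8, C9→G9.]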
A practical measure: the distribution of the number of differences between pairs of genes (mismatch distribution)
Sample of sequences:
ATACCTATC
ATACCTATC
AAACCTAAC
ATTCGTATG
ATTAGTATG
For each pair of sequences, we count the number of differences between individuals.
We then count the number of pairs separated by one difference, two differences, and so on.
[Figure: histogram of the mismatch distribution. x-axis: number of differences (0 to 6); y-axis: frequency (0% to 30%).]
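The counting procedure can be written in a few lines for the five sequences of this sample. This is a minimal sketch; the function names are hypothetical and the sequences are assumed to be aligned and of equal length.

```python
from collections import Counter
from itertools import combinations

SEQUENCES = [
    "ATACCTATC",
    "ATACCTATC",
    "AAACCTAAC",
    "ATTCGTATG",
    "ATTAGTATG",
]


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_counts(sequences: list[str]) -> dict[int, int]:
    """Number of pairs of sequences for each number of pairwise differences."""
    counts = Counter(count_differences(a, b) for a, b in combinations(sequences, 2))
    return dict(sorted(counts.items()))


def mismatch_distribution(sequences: list[str]) -> dict[int, float]:
    """Same counts, expressed as a fraction of all pairs."""
    counts = mismatch_counts(sequences)
    n_pairs = sum(counts.values())
    return {d: c / n_pairs for d, c in counts.items()}


if __name__ == "__main__":
    print(mismatch_counts(SEQUENCES))
    # {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1}
    print(mismatch_distribution(SEQUENCES))
```

With 5 sequences there are 10 pairs, so each pair contributes 10% to the histogram.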
Mutation spectrum
Sample of sequences:
ATACCTATC
ATACCTATC
AAACCTAAC
ATTCGTATG
ATTAGTATG
Ancestral: ATACGTATC
For each mutation we count the number of sequences that carry the mutation:
S2 = 1, S3 = 2, S4 = 1, S5 = 3, S8 = 1, S9 = 2
We count the number of mutations that have a given frequency:
3 mutations of frequency 1
2 mutations of frequency 2
1 mutation of frequency 3
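The same tally can be made programmatically by comparing each sampled sequence with the ancestral one. The sketch assumes that every site mutated at most once, so any difference from the ancestral base counts as the derived state; the function names are hypothetical. A direct count over the five sequences also picks up the mutation at site 8, carried only by AAACCTAAC.

```python
from collections import Counter

ANCESTRAL = "ATACGTATC"
SEQUENCES = [
    "ATACCTATC",
    "ATACCTATC",
    "AAACCTAAC",
    "ATTCGTATG",
    "ATTAGTATG",
]


def derived_counts(ancestral: str, sequences: list[str]) -> dict[int, int]:
    """For each polymorphic site (1-based), the number of sequences carrying the mutation."""
    counts = {}
    for position, ancestral_base in enumerate(ancestral, start=1):
        carriers = sum(seq[position - 1] != ancestral_base for seq in sequences)
        if carriers > 0:
            counts[position] = carriers
    return counts


def frequency_spectrum(counts: dict[int, int]) -> dict[int, int]:
    """Number of mutations observed at each frequency (number of carriers)."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    per_site = derived_counts(ANCESTRAL, SEQUENCES)
    print(per_site)                      # {2: 1, 3: 2, 4: 1, 5: 3, 8: 1, 9: 2}
    print(frequency_spectrum(per_site))  # {1: 3, 2: 2, 3: 1}
```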
Stationary population
Mutation
Growing population
Impact of demographic history on
population genetics
Hunter-Gatherer populations
(source : L. Excoffier)
Post-Neolithic populations
The detection of rapid population growth
a case study: Birgus latro
Old constant size populations Young expanding populations
Lavery, S., Moritz, C. & Fielder, D. R. 1996. Mol. Biol. Evol. 13, 1106-1113.
Coalescence
Estimating demographic parameters:
Migration rates
Growth rates
Dates of population split events
Testing the intensity of selection
Origins and Genetic Diversity of Pygmy
Hunter-Gatherers from Western Central Africa
Tracing back population history
Genetic adaptation to different lifestyles:
hunter-gatherers versus farmers
Genetic factors involved in small height
With the teams of L Quintana-Murci (Pasteur Institute), Y LeBouc (Hôpital
Armand-Trousseau), F Austerlitz (Orsay)
22 populations
10 Pygmy populations
12 Non-pygmy populations
Average = 28 individuals/population
28 nuclear microsatellite loci
Population Set
PCA based on the pairwise FST values
Pygmy populations clearly differentiated from non-pygmies.
Pygmy populations more scattered on the graph (more differentiated)
Individual Structuring of the Central African Genetic Diversity
Pygmies and Non-Pygmies cluster in two groups
Among Pygmies:
Asymmetric admixture signal from Non-pygmies
Heterogeneous signal: variable admixture intensity with Non-pygmies
Does this echo sociocultural rules on intermarriage
between Pygmies and Non-pygmies?
Social Intermarriage Rules between Pygmies and Non-pygmies
Improbable marriages
Potential marriages
Patrilocality:
The married woman lives in her husband's village
Frequent divorces:
The Pygmy woman goes back to
her community of origin
+ Illegitimate children
Asymmetrical admixture from Non-pygmies into Pygmies
Heterogeneous signal among Pygmies = specific social relationships with Non-pygmies
and immediate Pygmy neighbours
Principle of ABC methods: example of two
diverging populations
[Figure: an ancestral population of size Na splitting at time T into two populations of sizes N1 and N2.]
Parameters to estimate: N1, N2, Na, T, ...
Observed statistics : He1, He2, FST
The program draws parameter values from uninformative prior distributions,
performs simulations with these values, and computes the same statistics on
the simulated data.
Only the simulations whose statistics are close enough to the observed
statistics are kept, which yields an a posteriori estimate of the
parameters.
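The draw, simulate, compare and keep loop described above can be sketched on a deliberately simpler problem than the two-population model. Everything in this example is an assumption made for illustration: a single parameter theta = 2N mu, a uniform prior on [0, prior_max], and one summary statistic (the mean number of differences between two genes, averaged over independent loci). It is not the model or the software used in the study.

```python
import random
import statistics


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Poisson variate obtained by counting unit-rate arrivals in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_statistic(theta: float, n_loci: int, rng: random.Random) -> float:
    """Mean number of differences between two genes over independent loci."""
    total = 0
    for _ in range(n_loci):
        t = rng.expovariate(1.0)  # pairwise coalescence time, in units of N generations
        total += poisson_draw(theta * t, rng)  # mutations on both branches
    return total / n_loci


def abc_rejection(observed: float, n_loci: int, n_draws: int,
                  tolerance: float, prior_max: float, seed: int = 0) -> list[float]:
    """Keep the prior draws whose simulated statistic falls within the tolerance."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)  # draw from the uninformative prior
        if abs(simulate_statistic(theta, n_loci, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed=2.0, n_loci=50, n_draws=3000,
                              tolerance=0.2, prior_max=10.0)
    print(len(posterior), statistics.median(posterior))
```

A smaller tolerance gives a sharper posterior but keeps fewer simulations, so in practice the tolerance is traded off against the number of draws.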
Prior versus posterior
distribution
From the posterior distribution, we can obtain a point
estimate using the mode or the median value.
We can also obtain 95% confidence intervals.
Linear regression method
Beaumont et al (2002) Genetics 162, 2025-2035.
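One way to read the linear regression method is as a correction applied to the accepted simulations: regress the parameter on the summary statistic near the observed value, then shift each accepted parameter to where it would sit if its statistic had matched the observation exactly. The sketch below is a simplified one-statistic version with Epanechnikov-shaped weights. The function name, the single statistic and the bandwidth delta are illustrative assumptions, and the cited paper should be consulted for the full multivariate method.

```python
def regression_adjust(thetas, stats, s_obs, delta):
    """Weighted linear adjustment of accepted parameters (one summary statistic).

    Returns the adjusted parameter values and their kernel weights.
    """
    # Keep simulations within delta of the observed statistic; weight closer ones more.
    kept = [(th, s, 1.0 - ((s - s_obs) / delta) ** 2)
            for th, s in zip(thetas, stats) if abs(s - s_obs) < delta]
    w_sum = sum(w for _, _, w in kept)
    s_bar = sum(w * s for _, s, w in kept) / w_sum
    th_bar = sum(w * th for th, _, w in kept) / w_sum
    # Weighted least-squares slope of theta on the statistic.
    sxx = sum(w * (s - s_bar) ** 2 for _, s, w in kept)
    sxy = sum(w * (s - s_bar) * (th - th_bar) for th, s, w in kept)
    slope = sxy / sxx
    # Project each accepted parameter onto the observed statistic.
    adjusted = [th - slope * (s - s_obs) for th, s, _ in kept]
    weights = [w for _, _, w in kept]
    return adjusted, weights


if __name__ == "__main__":
    stats = [0.8, 0.9, 1.0, 1.1, 1.3]
    thetas = [2.0 * s + 1.0 for s in stats]  # perfectly linear toy relationship
    adjusted, weights = regression_adjust(thetas, stats, s_obs=1.0, delta=0.5)
    print(adjusted)  # every value is close to 3.0
```

In the toy usage the relationship is exactly linear, so all adjusted values collapse onto the value predicted at the observed statistic; with real simulations the adjustment only narrows the posterior.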
ABC study
Comparing two scenarios
Estimating the parameters for the best scenario
Performed with the software DIY ABC
Cornuet et al (2008) Bioinformatics (advance online publication).
35 summary statistics.
the mean number of alleles per population
the mean expected heterozygosity He
the mean allele size variance expressed in
base pairs
all pairwise FSTs
All pairwise genetic distances ().
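Two of the single-population statistics in this list are simple to compute from microsatellite allele sizes. The sketch below is an illustration only: the function names are hypothetical, the allele sizes are made up, and the exact estimators used by the software (for example whether a sample-size correction is applied) are not stated on the slide.

```python
import statistics
from collections import Counter


def expected_heterozygosity(alleles: list[int]) -> float:
    """He = 1 - sum of squared allele frequencies at one locus."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def unbiased_heterozygosity(alleles: list[int]) -> float:
    """He with the usual n / (n - 1) small-sample correction."""
    n = len(alleles)
    return n / (n - 1) * expected_heterozygosity(alleles)


def allele_size_variance(alleles: list[int]) -> float:
    """Sample variance of allele sizes (in base pairs) at one locus."""
    return statistics.variance(alleles)


if __name__ == "__main__":
    locus = [100, 100, 102, 104]  # hypothetical allele sizes in base pairs
    print(expected_heterozygosity(locus))  # 0.625
    print(unbiased_heterozygosity(locus))
    print(allele_size_variance(locus))
```

The per-population means quoted in the list would then be averages of these per-locus values over the loci.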
The common origin scenario wins
Ancient separation time between pygmies and non-pygmies: 89,675 YBP (95% CI: 23,025-123,275).
Recent divergence of the pygmy groups: 2,900 YBP (95% CI: 850-30,050).
Similar level of admixture as with Structure, except for the Baka
Estimated population sizes
Parameter                          Estimate (interval)
N1 (Baka)                          8,137 (1,347-9,824)
N2 (Bezan)                         2,795 (790-9,677)
N3 (Kola)                          3,302 (603-9,599)
N4 (Koya)                          3,197 (1,134-9,771)
Nnp (Non-pygmies)                  77,157 (27,926-97,828)
Nap (ancestral pygmy population)   8,007 (960-9,825)
NA (ancestral population)          1,071 (202-8,404)
Most likely scenario
Tested by ABC approach (Verdu et al, 2009 Current Biology)
Origins and Genetic Diversity of Pygmy Hunter-Gatherers from Western Central Africa
Inferring the Demographic History of African Farmers and Pygmy Hunter-Gatherers Using a Multilocus Resequencing Data Set
Patin et al, PLoS Genetics 2009
20 neutral sequences
Structure analysis. Best likelihood: K = 4
Summary statistics
Frequency spectrum
Distance between simulations and observed data, computed with the statistics S, pi, D, D*
Demographic scenarios:
Tbot and Sbot: time and intensity of the bottleneck
Trec and Srec: time and intensity of the recovery
WPyg: bottleneck 2500-25000 yrs (80% decrease), recovery 125 yrs later (100-400%)
EPyg: bottleneck 250-2500 (90-95% decrease), no recovery
Models of population history
A-WE is the best
Values:
Na: 11402 (75000-15000)
Tsep: 56000 (25000-130000)
Tsep Pyg: 22000 (14 000-66 000)
Gene flow:
WPYG-EPYG: 4.4 × 10^-4
WPYG-AGR: 1.8 × 10^-4
EPYG-AGR: 2.4 × 10^-5