Models of DNA evolution How does DNA change, and how can we obtain distances?
Feb 22, 2016
The Jukes-Cantor model
Thomas H. Jukes (1906-1999)
King JL, Jukes TH. 1969. Non-Darwinian evolution. Science 164: 788-798.
Charles R. Cantor (born 1942)
The Jukes-Cantor model
[Diagram: the four bases A, G, C and T, with arrows between every pair of bases, each labelled u]
in the JC model, each base in the sequence has an equal chance, u, of changing into one of the three other bases
The Jukes-Cantor model
[Diagram: the four bases A, G, C and T, with arrows between every pair of bases, each labelled u/3]
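as a convenient fiction, each base has a chance of (4/3)u of being replaced by a base drawn at random from all 4 possibilities, including itself; since 1 draw in 4 returns the same base, the rate of actual change is still (3/4)(4/3)u = u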
The Jukes-Cantor model
$\Pr(Y = y) = \dfrac{e^{-m} m^{y}}{y!}$
537 hits, 576 squares; average hits per square m = 537/576 = 0.9323
probability of not being hit: $\dfrac{e^{-0.9323} \times 0.9323^{0}}{0!} = 0.3936$
expected number of squares not hit at all: 226.74
the probability of no event is given by the zero term of a Poisson distribution
The Jukes-Cantor model
probability of being hit once: $\dfrac{e^{-0.9323} \times 0.9323^{1}}{1!} = 0.3670$
expected number of squares hit: 226.74 (not at all), 211.39 (1x)
The Jukes-Cantor model
probability of being hit twice: $\dfrac{e^{-0.9323} \times 0.9323^{2}}{2!} = 0.1711$
expected number of squares hit: 226.74 (not at all), 211.39 (1x), 98.54 (2x)
The Jukes-Cantor model
probability of being hit four times: $\dfrac{e^{-0.9323} \times 0.9323^{4}}{4!} = 0.012$
hits per square              not at all   1x       2x      3x      4x     5+
expected number of squares   226.74       211.39   98.54   30.62   7.13   1.6
observed number of squares   229          211      93      35      7      1
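The expected counts in this comparison can be recomputed from the Poisson formula. The following is a minimal sketch, not the lecturer's own code; the names `poisson_pmf`, `expected` and `observed` are chosen here for illustration, and the "5+" class is taken as whatever remains after the first five terms.

```python
import math

def poisson_pmf(y, m):
    """Probability of exactly y events when the mean number of events is m."""
    return math.exp(-m) * m ** y / math.factorial(y)

hits, squares = 537, 576
m = hits / squares  # average hits per square, about 0.9323

# Expected number of squares with 0, 1, 2, 3 and 4 hits.
expected = [squares * poisson_pmf(y, m) for y in range(5)]
# Everything not covered by the first five terms goes into the "5+" class.
expected.append(squares - sum(expected))

observed = [229, 211, 93, 35, 7, 1]
labels = ["0", "1", "2", "3", "4", "5+"]

for label, exp_count, obs_count in zip(labels, expected, observed):
    print(f"{label:>2} hits: expected {exp_count:7.2f}, observed {obs_count}")
```

The values agree with the slide to within rounding in the last digit.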
The Jukes-Cantor model
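$\Pr(Y = y) = \dfrac{e^{-m} m^{y}}{y!}$
[Diagram: the four bases A, G, C and T, with arrows between every pair of bases, each labelled u/3]
probability of no event = $e^{-\frac{4}{3}ut}$
probability of ≥1 event = $1 - e^{-\frac{4}{3}ut}$
probability of C at the end of a branch that started with A = $\tfrac{1}{4}\left(1 - e^{-\frac{4}{3}ut}\right)$
probability that a site is different at the two ends of a branch = $\tfrac{3}{4}\left(1 - e^{-\frac{4}{3}ut}\right)$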
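These probabilities follow from treating events at rate (4/3)u as Poisson with mean m = (4/3)ut, so the zero term is the probability of no event. A minimal sketch of the three quantities is given below; the function name `jc_probabilities` is hypothetical, and the probability of ending on the same base (no event, or an event that happens to draw the same base) is added here for completeness rather than taken from the slide.

```python
import math

def jc_probabilities(ut):
    """Jukes-Cantor probabilities for a branch of length ut.

    Returns (p_same, p_specific_other, p_any_difference).
    """
    no_event = math.exp(-4.0 * ut / 3.0)  # zero term of the Poisson distribution
    # After at least one event the base is a random draw from all 4 bases.
    p_specific_other = 0.25 * (1.0 - no_event)
    p_same = no_event + 0.25 * (1.0 - no_event)
    p_any_difference = 0.75 * (1.0 - no_event)
    return p_same, p_specific_other, p_any_difference

for ut in (0.0, 0.1, 0.5, 1.0, 3.0):
    p_same, p_other, p_diff = jc_probabilities(ut)
    print(f"ut = {ut:3.1f}: same {p_same:.4f}, "
          f"specific other {p_other:.4f}, different {p_diff:.4f}")
```

For every branch length the probability of staying the same plus three times the probability of a specific other base adds up to 1, and the probability of a difference approaches 0.75 for long branches.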
The Jukes-Cantor model
[Plot: differences per site (y-axis, 0.0 to 0.8) against branch length ut (x-axis, 0 to 3), showing the curve $y = \tfrac{3}{4}\left(1 - e^{-\frac{4}{3}ut}\right)$]
the expected difference per site between two sequences increases with branch length but reaches a plateau at 0.75
The Jukes-Cantor model
not using the J&C correction will distort the tree
[Figure: the real tree, with tips A, B, C and D]
expected uncorrected sequence differences:
     A         B         C         D
A    0         0.57698   0.59858   0.70439
B    0.57698   0         0.24726   0.59858
C    0.59858   0.24726   0         0.57698
D    0.70439   0.59858   0.57698   0
[Figure: the least squares tree fitted to these uncorrected differences, with tips A, B, C and D]
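The effect of the correction can be checked on the matrix above. The sketch below inverts the Jukes-Cantor curve, d = -(3/4) ln(1 - (4/3)p), to turn each uncorrected difference p back into a branch length. As a simple stand-in for the least squares fit on the slide (an assumption of this example, not the method used there), it then applies the four-point condition: for tree-like distances, the pairing of the four tips with the smallest sum of within-pair distances matches the tree. The names `jc_distance` and `best_pairing` are hypothetical.

```python
import math

def jc_distance(p):
    """Jukes-Cantor corrected distance for a fraction p of differing sites.

    Only defined for p < 0.75, the plateau of the uncorrected curve.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Expected uncorrected differences from the slide.
uncorrected = {
    ("A", "B"): 0.57698, ("A", "C"): 0.59858, ("A", "D"): 0.70439,
    ("B", "C"): 0.24726, ("B", "D"): 0.59858, ("C", "D"): 0.57698,
}
corrected = {pair: jc_distance(p) for pair, p in uncorrected.items()}

def best_pairing(dist):
    """Return the pairing of the four tips with the smallest distance sum."""
    sums = {
        "AB|CD": dist[("A", "B")] + dist[("C", "D")],
        "AC|BD": dist[("A", "C")] + dist[("B", "D")],
        "AD|BC": dist[("A", "D")] + dist[("B", "C")],
    }
    return min(sums, key=sums.get)

for pair in sorted(corrected):
    print(pair, f"{uncorrected[pair]:.5f} -> {corrected[pair]:.3f}")
print("uncorrected pairing:", best_pairing(uncorrected))
print("corrected pairing:  ", best_pairing(corrected))
```

Under these assumptions the uncorrected differences and the corrected distances favour different pairings of the four tips, which is the distortion the slide warns about.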
The Jukes-Cantor model
[Diagram: the four bases A, G, C and T, with arrows between every pair of bases, each labelled u/3]
the J&C model assumes no difference in substitution rates between transversions and transitions
Kimura’s two-parameter model
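[Diagram: the four bases A, G, C and T; the two transitions (A↔G and C↔T) are labelled a, the four transversions are labelled b]
$R = \dfrac{\text{number of transitions}}{\text{number of transversions}} = \dfrac{a}{2b}$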
the Kimura model allows a difference in substitution rate between transversions and transitions
Kimura’s two-parameter model
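$\mathrm{Prob}(\text{transition} \mid t) = \tfrac{1}{4} - \tfrac{1}{2}\exp\!\left(-\tfrac{2R+1}{R+1}\,t\right) + \tfrac{1}{4}\exp\!\left(-\tfrac{2}{R+1}\,t\right)$
probability that a transition will occur in a time interval t
$R = \dfrac{a}{2b}$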
Kimura’s two-parameter model
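$\mathrm{Prob}(\text{transition} \mid t) = \tfrac{1}{4} - \tfrac{1}{2}\exp\!\left(-\tfrac{2R+1}{R+1}\,t\right) + \tfrac{1}{4}\exp\!\left(-\tfrac{2}{R+1}\,t\right)$
$\mathrm{Prob}(\text{transversion} \mid t) = \tfrac{1}{2} - \tfrac{1}{2}\exp\!\left(-\tfrac{2}{R+1}\,t\right)$
probability that any transversion will occur in a time interval t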
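The two Kimura probabilities can be evaluated directly. The sketch below assumes the standard form of the two-parameter formulas, in which time is scaled so that the total substitution rate a + 2b equals 1 and R = a/(2b); the function name `k2p_probabilities` is hypothetical.

```python
import math

def k2p_probabilities(t, R):
    """Kimura two-parameter probabilities after time t.

    Returns (p_transition, p_transversion) for transition/transversion
    ratio R, with time scaled so the total substitution rate is 1.
    """
    fast = math.exp(-(2.0 * R + 1.0) / (R + 1.0) * t)
    slow = math.exp(-2.0 / (R + 1.0) * t)
    p_transition = 0.25 - 0.5 * fast + 0.25 * slow
    p_transversion = 0.5 - 0.5 * slow
    return p_transition, p_transversion

for R in (10.0, 2.0):
    for t in (0.5, 1.0, 3.0):
        ts, tv = k2p_probabilities(t, R)
        print(f"R = {R:4.1f}, t = {t:3.1f}: transitions {ts:.4f}, "
              f"transversions {tv:.4f}, total {ts + tv:.4f}")
```

With R = 0.5 (that is, a = b) both exponents reduce to (4/3)t and the total equals the Jukes-Cantor curve (3/4)(1 - e^(-(4/3)t)), which is a useful consistency check between the two models.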
Kimura’s two-parameter model
[Plot, R = 10: differences (y-axis, 0.0 to 0.8) against time (branch length, x-axis, 0.0 to 3.0), with curves for transitions, transversions and total, and a marker at 50% different]
Kimura’s two-parameter model
[Plot, R = 2: differences (y-axis, 0.0 to 0.8) against time (branch length, x-axis, 0.0 to 3.0), with curves for transitions, transversions and total, and a marker at 50% different]
Tamura-Nei models
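rates of the two event types, and the resulting rate of change from each base (rows) to each base (columns):
             P(event type I)             P(event type II)
             purine>purine, pyrim>pyrim  random base       A                      G                      C                      T
purine
A            a_R                         b                 -                      a_R p_G/p_R + b p_G    b p_C                  b p_T
G            a_R                         b                 a_R p_A/p_R + b p_A    -                      b p_C                  b p_T
pyrimidine
C            a_Y                         b                 b p_A                  b p_G                  -                      a_Y p_T/p_Y + b p_T
T            a_Y                         b                 b p_A                  b p_G                  a_Y p_C/p_Y + b p_C    -
p_A, p_G, p_C, p_T: relative proportions of A, G, C, T in the pool
p_R = p_A + p_G
p_Y = p_C + p_T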
the T&N models allow asymmetric base frequencies
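The Tamura-Nei rates can be assembled from the two event types: a type II event replaces the base with one drawn from the whole pool, and a type I event replaces it with one drawn from its own class (purine or pyrimidine). The sketch below is an illustration under that reading of the table; the function name `tamura_nei_rates` and the numerical values are made up for the example.

```python
def tamura_nei_rates(a_R, a_Y, b, p):
    """Off-diagonal Tamura-Nei rates as a dict keyed by (from_base, to_base).

    a_R, a_Y: rates of type I events for purines and pyrimidines.
    b:        rate of type II events (replacement by a random base).
    p:        dict of base proportions p_A, p_G, p_C, p_T, summing to 1.
    """
    purines = ("A", "G")
    p_R = p["A"] + p["G"]
    p_Y = p["C"] + p["T"]
    rates = {}
    for i in "AGCT":
        for j in "AGCT":
            if i == j:
                continue
            rate = b * p[j]  # type II: any base, in proportion to the pool
            if (i in purines) == (j in purines):
                # type I: only within the same class, in proportion to
                # the base's share of that class
                if i in purines:
                    rate += a_R * p[j] / p_R
                else:
                    rate += a_Y * p[j] / p_Y
            rates[(i, j)] = rate
    return rates

freqs = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}  # illustrative values only
rates = tamura_nei_rates(a_R=2.0, a_Y=1.0, b=0.5, p=freqs)
for (i, j), rate in rates.items():
    print(f"{i} -> {j}: {rate:.3f}")
```

Setting a_R = a_Y = 0 leaves only the type II events, so every rate becomes b times the proportion of the target base.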
The general time-reversible model (GTR)
     A        G        C        T
A    -        a p_G    b p_C    g p_T
G    a p_A    -        d p_C    e p_T
C    b p_A    d p_G    -        h p_T
T    g p_A    e p_G    h p_C    -
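What makes this table time-reversible is that each pair of bases shares one rate parameter, so the flow from i to j, p_i times the rate i to j, equals the flow from j to i. The sketch below builds the twelve rates from the six shared parameters and checks that balance; the names `gtr_rates` and `is_time_reversible` and all numerical values are hypothetical.

```python
def gtr_rates(exchange, p):
    """Off-diagonal GTR rates from six pairwise parameters.

    exchange: dict keyed by a pair of bases, one entry per unordered pair
              (the a, b, g, d, e, h of the table).
    p:        dict of base proportions.
    """
    rates = {}
    for (i, j), x in exchange.items():
        rates[(i, j)] = x * p[j]  # rate of change from i to j
        rates[(j, i)] = x * p[i]  # rate of change from j to i
    return rates

def is_time_reversible(rates, p, tol=1e-12):
    """Check p_i * rate(i -> j) == p_j * rate(j -> i) for every pair."""
    return all(
        abs(p[i] * rate - p[j] * rates[(j, i)]) <= tol
        for (i, j), rate in rates.items()
    )

freqs = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}  # illustrative values only
exchange = {
    ("A", "G"): 1.0,  # a
    ("A", "C"): 0.5,  # b
    ("A", "T"): 0.4,  # g
    ("G", "C"): 0.3,  # d
    ("G", "T"): 0.6,  # e
    ("C", "T"): 1.2,  # h
}
rates = gtr_rates(exchange, freqs)
print("reversible:", is_time_reversible(rates, freqs))

# Giving one direction its own parameter, as a 12-parameter model allows,
# generally breaks the balance.
unbalanced = dict(rates)
unbalanced[("G", "A")] = 2.0 * freqs["A"]
print("after changing G -> A alone:", is_time_reversible(unbalanced, freqs))
```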
The general 12-parameter model
     A        G        C        T
A    -        a p_G    b p_C    g p_T
G    d p_A    -        e p_C    f p_T
C    g p_A    h p_G    -        i p_T
T    j p_A    k p_G    l p_C    -