Linkage Analysis: An Introduction
Pak Sham
Twin Workshop 2001
Linkage Mapping
Compares inheritance pattern of trait with the inheritance pattern of chromosomal regions
First genetic map constructed in 1913 (Sturtevant)
Uses naturally occurring DNA variation (polymorphisms) as genetic markers
More than 400 Mendelian (single-gene) disorders mapped
Current challenge: mapping quantitative trait loci (QTLs)
Linkage = Co-segregation
[Figure: pedigree with marker genotypes A2A4, A3A4, A1A3, A1A2, A2A3 and offspring A1A2, A1A4, A3A4, A3A2]
Marker allele A1 co-segregates with a dominant disease
Recombination
[Figure: parental genotypes A1Q1 / A2Q2 (marker alleles A1, A2; trait-locus alleles Q1, Q2)]
Likely gametes (non-recombinants): A1Q1, A2Q2
Unlikely gametes (recombinants): A1Q2, A2Q1
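Recombination of three linked loci
Gamete probabilities, with recombination fraction $\theta_1$ between loci 1 and 2 and $\theta_2$ between loci 2 and 3 (no interference):
No recombination: $(1-\theta_1)(1-\theta_2)$
Recombination between loci 1 and 2 only: $\theta_1(1-\theta_2)$
Recombination between loci 2 and 3 only: $(1-\theta_1)\theta_2$
Recombination in both intervals: $\theta_1\theta_2$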
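Under the same no-interference assumption, the two outer loci appear recombinant only when exactly one interval recombines, so $\theta_{13} = \theta_1 + \theta_2 - 2\theta_1\theta_2$. A minimal Python sketch of that bookkeeping; the function names and example values are illustrative only.

```python
def three_locus_gametes(theta1: float, theta2: float) -> dict:
    """Gamete class probabilities for loci 1-2-3, assuming no interference."""
    return {
        "no recombination": (1 - theta1) * (1 - theta2),
        "interval 1 only": theta1 * (1 - theta2),
        "interval 2 only": (1 - theta1) * theta2,
        "both intervals": theta1 * theta2,
    }

def outer_recombination(theta1: float, theta2: float) -> float:
    """Loci 1 and 3 look recombinant only if exactly one interval recombines."""
    g = three_locus_gametes(theta1, theta2)
    return g["interval 1 only"] + g["interval 2 only"]

print(round(sum(three_locus_gametes(0.1, 0.2).values()), 10))  # 1.0
print(round(outer_recombination(0.1, 0.2), 4))  # 0.26, less than 0.1 + 0.2
```

The double-recombinant class is what makes recombination fractions non-additive, which motivates working with map distances instead.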
Map distance
Map distance between two loci (in Morgans) = expected number of crossovers between them per meiosis
Note: map distances are additive
Recombination & map distance
Haldane map function: $\theta = \tfrac{1}{2}\left(1 - e^{-2m}\right)$
[Figure: Haldane map function; x-axis map distance m (Morgans, 0 to 1), y-axis recombination fraction θ (0 to 0.5)]
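The Haldane function and its inverse, $m = -\tfrac{1}{2}\ln(1 - 2\theta)$, can be sketched in Python as below (function names are hypothetical). Combining two intervals with the no-interference rule and converting back shows that Haldane distances add exactly, because $1 - 2\theta_{13} = (1 - 2\theta_1)(1 - 2\theta_2)$.

```python
import math

def haldane_theta(m: float) -> float:
    """Recombination fraction for map distance m (Morgans): (1 - exp(-2m)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_distance(theta: float) -> float:
    """Inverse Haldane map function, valid for 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)

m1, m2 = 0.1, 0.2
t1, t2 = haldane_theta(m1), haldane_theta(m2)
t13 = t1 + t2 - 2 * t1 * t2               # two intervals, no interference
print(round(haldane_distance(t13), 6))    # 0.3, i.e. m1 + m2
print(round(haldane_theta(1.0), 4))       # 0.4323, approaching the 0.5 plateau
```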
Methods of Linkage Analysis
Model-based lod scores: assume an explicit trait model
Model-free allele-sharing methods: affected sib pairs; affected pedigree members
Quantitative trait loci: variance-components models
Double Backcross: Fully Informative Gametes
Grandparents: AABB × aabb
Parents: AaBb × aabb
Offspring: AaBb, aabb (non-recombinant); Aabb, aaBb (recombinant)
Linkage Analysis: Fully Informative Gametes
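Count data: recombinant gametes $R$; non-recombinant gametes $N$
Parameter: recombination fraction $\theta$
Likelihood: $L(\theta) = \theta^{R}(1-\theta)^{N}$
Parameter estimate: $\hat\theta = \dfrac{R}{R+N}$
Chi-square: $\chi^2 = 2\left[R\ln\hat\theta + N\ln(1-\hat\theta)\right] - 2(R+N)\ln 0.5$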
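A small Python sketch of the phase-known calculation, returning $\hat\theta$, the likelihood-ratio chi-square and the equivalent lod (chi-square divided by $2\ln 10$). It uses the slide's unrestricted estimate $R/(R+N)$; linkage programs usually also restrict $\theta$ to $[0, 0.5]$. The counts in the example are hypothetical.

```python
import math

def phase_known_linkage(R: int, N: int):
    """Return (theta_hat, chi-square, lod) for R recombinant and N non-recombinant gametes."""
    theta_hat = R / (R + N)                  # unrestricted MLE, as on the slide

    def loglik(theta: float) -> float:
        # Natural-log likelihood R*ln(theta) + N*ln(1 - theta); empty terms skipped (0*ln 0 = 0)
        ll = 0.0
        if R:
            ll += R * math.log(theta)
        if N:
            ll += N * math.log(1 - theta)
        return ll

    chi2 = 2 * (loglik(theta_hat) - loglik(0.5))   # likelihood-ratio chi-square, 1 df
    lod = chi2 / (2 * math.log(10))               # same ratio on the log10 scale
    return theta_hat, chi2, lod

theta_hat, chi2, lod = phase_known_linkage(R=2, N=18)
print(f"{theta_hat:.2f} {chi2:.2f} {lod:.2f}")   # 0.10 14.72 3.20
```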
Phase Unknown Meioses
Parents: AaBb × aabb (phase of the double heterozygote unknown)
Offspring: AaBb, aabb | Aabb, aaBb
Either (phase AB/ab): non-recombinant | recombinant
Or (phase Ab/aB): recombinant | non-recombinant
Linkage Analysis: Phase-unknown Meioses
Count data: either recombinant gametes $X$, non-recombinant gametes $Y$
or recombinant gametes $Y$, non-recombinant gametes $X$
Likelihood: $L(\theta) = \theta^{X}(1-\theta)^{Y} + \theta^{Y}(1-\theta)^{X}$
An example of incomplete data: a mixture distribution likelihood function
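The phase-unknown mixture can be maximised numerically. This Python sketch weights each phase by 1/2 (a constant that cancels in the lod) and uses a simple grid on $(0, 0.5]$ rather than any particular program's optimiser. With the same hypothetical 2 : 18 split as before, the lod drops by about $\log_{10} 2 \approx 0.30$ because the phase must be inferred.

```python
import math

def phase_unknown_loglik(theta: float, X: int, Y: int) -> float:
    """ln of the two-phase mixture, each phase given prior weight 1/2."""
    return math.log(0.5 * theta**X * (1 - theta)**Y + 0.5 * theta**Y * (1 - theta)**X)

def phase_unknown_lod(X: int, Y: int):
    """Grid-search MLE of theta on (0, 0.5] and the lod against theta = 0.5."""
    grid = [k / 1000 for k in range(1, 501)]       # 0.001, 0.002, ..., 0.5
    theta_hat = max(grid, key=lambda t: phase_unknown_loglik(t, X, Y))
    lod = (phase_unknown_loglik(theta_hat, X, Y) - phase_unknown_loglik(0.5, X, Y)) / math.log(10)
    return theta_hat, lod

print(phase_unknown_lod(2, 18))   # theta_hat = 0.1, lod about 2.90
```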
Parental genotypes unknown
Likelihood will be a function of:
allele frequencies (population parameters)
$\theta$ (transmission parameter)
[Figure: offspring genotypes AaBb, aabb, Aabb, aaBb with parental genotypes unobserved]
Trait phenotypes
Penetrance parameters
Genotype   P(Disease | genotype)   P(Normal | genotype)
AA         $f_2$                   $1-f_2$
Aa         $f_1$                   $1-f_1$
aa         $f_0$                   $1-f_0$
Each phenotype is compatible with multiple genotypes.
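Because each phenotype is compatible with several genotypes, its probability is a mixture over genotypes. The Python sketch below assumes a single biallelic locus in Hardy-Weinberg equilibrium with allele A at frequency p (an assumption, not part of the slide) and also returns $P(\text{genotype} \mid \text{disease})$. The penetrance values are hypothetical.

```python
def phenotype_probs(p: float, f2: float, f1: float, f0: float):
    """Return P(disease) and P(genotype | disease) for one biallelic locus.

    Assumes Hardy-Weinberg genotype frequencies with allele A at frequency p.
    """
    q = 1 - p
    prior = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    penetrance = {"AA": f2, "Aa": f1, "aa": f0}
    joint = {g: prior[g] * penetrance[g] for g in prior}   # P(genotype, disease)
    p_disease = sum(joint.values())                          # mixture over genotypes
    posterior = {g: joint[g] / p_disease for g in joint}     # P(genotype | disease)
    return p_disease, posterior

# Hypothetical dominant model: p = 0.01, f2 = f1 = 0.9, phenocopy rate f0 = 0.01
p_dis, post = phenotype_probs(0.01, f2=0.9, f1=0.9, f0=0.01)
print(round(p_dis, 4), {g: round(v, 2) for g, v in post.items()})
# 0.0277 {'AA': 0.0, 'Aa': 0.64, 'aa': 0.35}
```

Even with a modest phenocopy rate, about a third of affected individuals in this example carry no disease allele, which is why penetrance enters the likelihood explicitly.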
General Pedigree Likelihood
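Likelihood is a sum of products (mixture distribution likelihood):
$$L = \sum_{g_1}\cdots\sum_{g_n} \prod_{i=1}^{n} \mathrm{pen}(x_i \mid g_i) \prod_{f \in \text{founders}} \mathrm{pop}(g_f) \prod_{o \in \text{non-founders}} \mathrm{trans}\left(g_o \mid g_{\mathrm{mo}(o)}, g_{\mathrm{fa}(o)}\right)$$
where $x_i$ and $g_i$ are the phenotype and genotype of pedigree member $i$, and $\mathrm{mo}(o)$, $\mathrm{fa}(o)$ are the mother and father of non-founder $o$
Number of terms $= (m_1 \times m_2 \times \cdots \times m_k)^{2n}$
where $m_j$ is the number of alleles at locus $j$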
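A brute-force evaluation of the sum of products for one nuclear family, in Python. Assumptions of this sketch: a single biallelic locus, Hardy-Weinberg founders, Mendelian transmission, and unordered genotypes coded as the number of A alleles, so there are $3^n$ terms rather than the ordered-genotype count above. All names and parameter values are hypothetical.

```python
from itertools import product

def pop(g: int, p: float) -> float:
    """Founder genotype probability (Hardy-Weinberg); g = number of A alleles (0, 1, 2)."""
    q = 1 - p
    return (q * q, 2 * p * q, p * p)[g]

def trans(child: int, mother: int, father: int) -> float:
    """Mendelian probability of the child's genotype given the parents' genotypes."""
    a, b = mother / 2, father / 2            # P(each parent transmits A)
    return ((1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b)[child]

def pen(affected: int, g: int, f) -> float:
    """Penetrance model f = (f0, f1, f2) = P(affected | g)."""
    return f[g] if affected else 1 - f[g]

def nuclear_family_likelihood(phen, p, f):
    """Sum over every genotype vector; phen = (father, mother, child 1, child 2, ...)."""
    total, terms = 0.0, 0
    for gs in product(range(3), repeat=len(phen)):
        father, mother, kids = gs[0], gs[1], gs[2:]
        term = pop(father, p) * pop(mother, p)          # founders
        for k in kids:
            term *= trans(k, mother, father)            # non-founders
        for x, g in zip(phen, gs):
            term *= pen(x, g, f)                        # everyone's phenotype
        total += term
        terms += 1
    return total, terms

L, n_terms = nuclear_family_likelihood((0, 1, 1, 1), p=0.01, f=(0.01, 0.9, 0.9))
print(n_terms)   # 81 = 3**4 genotype vectors
```

The number of terms grows exponentially with pedigree size, which is the cost that peeling avoids.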
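Elston-Stewart algorithm
Reduces computations by peeling:
Step 1: condition the likelihood of family 1 on the genotype of X (the individual linking families 1 and 2)
Step 2: compute the joint likelihood of families 2 and 1, summing over the genotypes of X
[Figure: nuclear families 1 and 2 joined through individual X]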
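The two peeling steps can be checked against brute force on a tiny hypothetical pedigree: founders P1 and P2 with children S and X (family 1), and X with spouse P3 and child C (family 2). The sketch below assumes one biallelic locus, Hardy-Weinberg founders and made-up penetrances; it illustrates the idea of peeling rather than reproducing any specific program.

```python
from itertools import product

P_A = 0.05                  # hypothetical allele frequency of A
F = (0.02, 0.8, 0.8)        # hypothetical penetrances (f0, f1, f2)
G = range(3)                # genotype = number of A alleles

def pop(g):
    q = 1 - P_A
    return (q * q, 2 * P_A * q, P_A * P_A)[g]

def trans(child, mother, father):
    a, b = mother / 2, father / 2
    return ((1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b)[child]

def pen(x, g):
    return F[g] if x else 1 - F[g]

# 1 = affected (hypothetical phenotypes)
ph = {"P1": 0, "P2": 1, "S": 1, "X": 1, "P3": 0, "C": 1}

def founder(name, g):
    return pop(g) * pen(ph[name], g)

def brute_force():
    """Sum over all 3**6 genotype vectors of the six individuals."""
    return sum(founder("P1", p1) * founder("P2", p2)
               * trans(s, p1, p2) * pen(ph["S"], s)
               * trans(x, p1, p2) * pen(ph["X"], x)
               * founder("P3", p3) * trans(c, x, p3) * pen(ph["C"], c)
               for p1, p2, s, x, p3, c in product(G, repeat=6))

def peeled():
    # Step 1: likelihood of family 1 conditional on each genotype of X
    fam1 = [sum(founder("P1", p1) * founder("P2", p2) * trans(x, p1, p2)
                * sum(trans(s, p1, p2) * pen(ph["S"], s) for s in G)
                for p1 in G for p2 in G)
            for x in G]
    # Step 2: join family 2, summing over X's genotype only once
    return sum(fam1[x] * pen(ph["X"], x)
               * sum(founder("P3", p3) * sum(trans(c, x, p3) * pen(ph["C"], c) for c in G)
                     for p3 in G)
               for x in G)

print(brute_force(), peeled())   # equal up to floating-point rounding
```

The saving comes from distributing sums inside products, so each family is summed over separately and the families interact only through the three genotypes of X.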
Lod Score: Morton (1955)
$$\text{Lod} = \log_{10} \frac{L(\theta)}{L(0.5)}$$
Lod > 3: conclude linkage
Prior odds of linkage   Likelihood ratio   Posterior odds
1:50                    1000               20:1
Lod < −2: exclude linkage
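The odds row is Bayes' rule on the odds scale: posterior odds = prior odds × likelihood ratio, and a lod of 3 is a likelihood ratio of $10^3$. As a worked check of the figures above:

$$\underbrace{\tfrac{1}{50}}_{\text{prior odds}} \times \underbrace{10^{3}}_{\text{likelihood ratio (lod} = 3)} = \underbrace{20}_{\text{posterior odds } 20:1}$$

By the same arithmetic, a lod of −2 is a likelihood ratio of $10^{-2}$: the data are 100 times more probable without linkage than with it.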
Linkage Analysis: Admixture Test
Model: probability of linkage in a family = $\alpha$
Likelihood (per family): $L(\alpha, \theta) = \alpha L(\theta) + (1-\alpha) L(\theta = 1/2)$; the overall likelihood is the product over families
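A Python sketch of the admixture test for phase-known families, maximising the log10 likelihood ratio over a grid of $\alpha$ and $\theta$. The per-family $(R, N)$ counts are hypothetical, and a real analysis would use full pedigree likelihoods and a finer optimiser.

```python
import math

def family_lik(theta: float, R: int, N: int) -> float:
    """Phase-known likelihood theta^R (1 - theta)^N for one family."""
    return theta**R * (1 - theta)**N

def hlod(families, steps: int = 100):
    """Grid-maximise the admixture lod over alpha in [0, 1] and theta in (0, 0.5].

    families: list of (R, N) counts, one pair per family (phase known).
    """
    def lod(alpha, theta):
        total = 0.0
        for R, N in families:
            null = family_lik(0.5, R, N)
            mix = alpha * family_lik(theta, R, N) + (1 - alpha) * null
            total += math.log10(mix) - math.log10(null)
        return total

    best = (0.0, 0.0, 0.5)                       # (hlod, alpha, theta); alpha = 0 gives lod 0
    for i in range(steps + 1):
        for j in range(1, steps + 1):
            alpha, theta = i / steps, 0.5 * j / steps
            h = lod(alpha, theta)
            if h > best[0]:
                best = (h, alpha, theta)
    return best

# Hypothetical data: two tightly linked families and two apparently unlinked ones
print(hlod([(0, 10), (1, 9), (5, 5), (6, 4)]))
```

Here the homogeneous lod (alpha fixed at 1) is only about 1.4, while allowing a fraction of unlinked families raises the maximum above 3.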
Allele sharing (non-parametric) methods
Penrose (1935): sib pair linkage
For a rare disease, IBD sharing at a linked marker is:
increased in concordant affected pairs
close to the null expectation in concordant normal pairs
decreased in discordant pairs
Therefore: affected sib pair design
Test $H_0$: proportion of alleles IBD $= 1/2$
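With IBD known exactly, the test of $H_0$ can be sketched as the mean-sharing z-test below. Under $H_0$ a sib pair shares a proportion 0, 1/2 or 1 of alleles with probabilities 1/4, 1/2, 1/4, so the per-pair variance is 1/8. The counts are hypothetical and the normal approximation is assumed.

```python
import math

def asp_mean_test(n0: int, n1: int, n2: int):
    """Mean IBD test for affected sib pairs sharing 0, 1 or 2 alleles IBD."""
    n = n0 + n1 + n2
    pi_bar = (0.5 * n1 + n2) / n          # mean proportion of alleles shared IBD
    se = math.sqrt(1 / (8 * n))           # null variance of pi per pair is 1/8
    z = (pi_bar - 0.5) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return pi_bar, z, p_one_sided

pi_bar, z, p = asp_mean_test(15, 50, 35)
print(f"{pi_bar:.2f} {z:.2f} {p:.4f}")   # 0.60 2.83 0.0023
```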
Affected sib pairs: incomplete marker information
Parameters: IBD sharing probabilities $Z = (z_0, z_1, z_2)$
Marker genotype data $M$: finite mixture likelihood
$$L(Z) = \prod_{i} \sum_{j=0}^{2} z_j \, P(M_i \mid \mathrm{IBD}_i = j)$$
Programs: SPLINK, ASPEX
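One way to maximise this finite mixture likelihood is the EM algorithm for mixing weights, sketched in Python below; it is not necessarily how SPLINK or ASPEX do it, and no constraints on $Z$ are imposed. The per-pair values $P(M_i \mid \mathrm{IBD}_i = j)$ are hypothetical inputs; in practice they come from marker genotypes and allele frequencies. The result is reported as a lod against the null $Z = (1/4, 1/2, 1/4)$.

```python
import math

def mls_em(marker_lik, iters: int = 500):
    """EM estimate of Z maximising prod_i sum_j z_j P(M_i | IBD = j); returns (Z, lod)."""
    null = (0.25, 0.5, 0.25)
    z = list(null)                                  # start at the null; EM never decreases L
    for _ in range(iters):
        counts = [0.0, 0.0, 0.0]
        for lik in marker_lik:
            w = [z[j] * lik[j] for j in range(3)]
            s = sum(w)
            for j in range(3):
                counts[j] += w[j] / s               # posterior P(IBD = j | M_i)
        z = [c / len(marker_lik) for c in counts]

    def log10_lik(zz):
        return sum(math.log10(sum(zz[j] * lik[j] for j in range(3))) for lik in marker_lik)

    return z, log10_lik(z) - log10_lik(null)

# Hypothetical P(M_i | IBD = 0, 1, 2) for six sib pairs
marker_lik = [(0.1, 0.3, 0.9), (0.2, 0.4, 0.5), (0.05, 0.2, 0.6),
              (0.3, 0.3, 0.3), (0.4, 0.2, 0.1), (0.1, 0.5, 0.4)]
z_hat, mls = mls_em(marker_lik)
print([round(v, 3) for v in z_hat], round(mls, 3))
```

The fourth pair's marker data are uninformative (equal likelihood for every IBD state), so it contributes nothing to the lod.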
Joint distribution of pedigree IBD
IBD values of different relative pairs are not independent
e.g. if IBD(1,2) = 2 and IBD(1,3) = 2, then IBD(2,3) = 2
Inheritance vector gives the joint IBD distribution
Each element describes one meiosis: whether the parent's paternally inherited allele (1) or maternally inherited allele (0) is transmitted
Vector of 2N elements (N = number of non-founders)
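For a sibship the inheritance vectors can be enumerated directly. In this Python sketch the founder alleles are labelled 1 to 4 (a labelling chosen here for illustration), each sib contributes two elements, and all $2^{2N}$ vectors are equally likely a priori. It recovers the sib-pair IBD distribution (1/4, 1/2, 1/4) and the dependence example above.

```python
from itertools import product

# Founder alleles: father carries (1, 2) = (paternal, maternal); mother carries (3, 4).
FATHER, MOTHER = (1, 2), (3, 4)

def child_alleles(bits):
    """bits = (paternal meiosis, maternal meiosis); 1 = parent's paternal allele transmitted."""
    pat = FATHER[0] if bits[0] == 1 else FATHER[1]
    mat = MOTHER[0] if bits[1] == 1 else MOTHER[1]
    return pat, mat

def ibd(a, b):
    """Alleles shared IBD by two full sibs (0, 1 or 2)."""
    return int(a[0] == b[0]) + int(a[1] == b[1])

def sibship_vectors(n_sibs):
    """Children's alleles for each of the 2^(2N) inheritance vectors."""
    for v in product((0, 1), repeat=2 * n_sibs):
        yield [child_alleles(v[2 * i: 2 * i + 2]) for i in range(n_sibs)]

counts = [0, 0, 0]
for kids in sibship_vectors(2):
    counts[ibd(kids[0], kids[1])] += 1
print([c / 16 for c in counts])                 # [0.25, 0.5, 0.25]

consistent = [k for k in sibship_vectors(3) if ibd(k[0], k[1]) == 2 and ibd(k[0], k[2]) == 2]
print({ibd(k[1], k[2]) for k in consistent})    # {2}
```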
Pedigree allele-sharing methods
APM (affected pedigree members): uses identity by state (IBS)
ERPA (Extended Relative Pairs Analysis): dodgy statistic
Genehunter NPL (Non-Parametric Linkage): conservative
Genehunter-PLUS: likelihood-based ("tilting")
Problem: all these methods consider affected members only
Convergence of parametric and non-parametric methods
Curtis and Sham (1995)
MFLINK: treats penetrance as a parameter
Terwilliger et al. (2000)
Complex recombination fractions
Parameters with no simple biological interpretation
Quantitative Sib Pair Linkage
$X$, $Y$: sib trait values standardised to mean 0, variance 1; $r$ = sib correlation; $V_A$ = additive QTL variance; $\hat\pi$ = estimated proportion of alleles shared IBD
Haseman-Elston Regression (1972):
$$(X-Y)^2 = 2(1-r) - 2V_A(\hat\pi - 0.5) + \varepsilon$$
Haseman-Elston Revisited (2000):
$$XY = r + V_A(\hat\pi - 0.5) + \varepsilon$$
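A small simulation makes the two regressions concrete. The Python sketch below assumes IBD is known exactly ($\hat\pi = \pi$), standardised traits, and $\mathrm{Cov}(X, Y \mid \pi) = r + V_A(\pi - 0.5)$; the settings are hypothetical. It generates each pair from independent sum and difference components, then checks that the squared-difference slope is near $-2V_A$ and the cross-product slope near $V_A$.

```python
import numpy as np

def simulate_sib_pairs(n, r, va, rng):
    """Standardised sib-pair traits with Cov(X, Y | pi) = r + va*(pi - 0.5)."""
    pi = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.25, 0.5, 0.25])
    cov = r + va * (pi - 0.5)
    # X + Y and X - Y are independent normals with variances 2(1 + cov) and 2(1 - cov)
    s = rng.standard_normal(n) * np.sqrt(2 * (1 + cov))
    d = rng.standard_normal(n) * np.sqrt(2 * (1 - cov))
    return (s + d) / 2, (s - d) / 2, pi

def slope(y, x):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

rng = np.random.default_rng(2001)
x, y, pi = simulate_sib_pairs(20000, r=0.3, va=0.4, rng=rng)
print(slope((x - y) ** 2, pi))   # close to -2 * va = -0.8 (original Haseman-Elston)
print(slope(x * y, pi))          # close to va = 0.4 (cross-product version)
```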
Improved Haseman-Elston
Sham and Purcell (2001): use as dependent variable
$$\frac{(X+Y)^2}{(1+r)^2} - \frac{(X-Y)^2}{(1-r)^2}$$
Gives equivalent power to the variance components model for sib pair data
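To see why combining squared sum and squared difference helps, the Python sketch below assumes the dependent variable $(X+Y)^2/(1+r)^2 - (X-Y)^2/(1-r)^2$, i.e. each squared term scaled by $(1 \pm r)^2$, to which its null variance is proportional. Under the same simulation assumptions as the Haseman-Elston example (known IBD, hypothetical settings), the expected slope on $\pi$ is $2V_A\left[(1+r)^{-2} + (1-r)^{-2}\right]$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, r, va = 40000, 0.3, 0.4                          # hypothetical simulation settings
pi = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.25, 0.5, 0.25])
cov = r + va * (pi - 0.5)                           # Cov(X, Y | pi); mean covariance is r
s = rng.standard_normal(n) * np.sqrt(2 * (1 + cov))     # X + Y
d = rng.standard_normal(n) * np.sqrt(2 * (1 - cov))     # X - Y
x, y = (s + d) / 2, (s - d) / 2

# Squared sum and squared difference, each divided by (1 +/- r)^2
dep = (x + y) ** 2 / (1 + r) ** 2 - (x - y) ** 2 / (1 - r) ** 2
b = np.polyfit(pi, dep, 1)[0]                       # slope of dep on pi
expected = 2 * va * (1 / (1 + r) ** 2 + 1 / (1 - r) ** 2)
print(round(b, 2), round(expected, 2))              # estimate should be close to 2.11
```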
Variance components linkage
Models trait values of pedigree members jointly
Assumes multivariate normality conditional on IBD
Covariance between relative pairs $= Vr + V_A\left[\hat\pi - E(\pi)\right]$
where $V$ = trait variance
$r$ = correlation (depends on relationship)
$V_A$ = QTL additive variance
$\hat\pi$ = estimated proportion of alleles shared IBD
$E(\pi)$ = expected proportion IBD
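For one sibship, the model's covariance matrix and multivariate normal log-likelihood can be sketched with NumPy as below. The sketch assumes full sibs ($E(\pi) = 0.5$), a known mean of 0, and hypothetical values for $\hat\pi$, traits and parameters; a real analysis sums over families and maximises over $V$, $r$ and $V_A$.

```python
import numpy as np

def sibship_cov(pihat, V: float, r: float, VA: float) -> np.ndarray:
    """V on the diagonal; V*r + VA*(pihat_jk - 0.5) off the diagonal.

    pihat: symmetric matrix of estimated IBD proportions; E(pi) = 0.5 assumes full sibs.
    """
    cov = V * r + VA * (np.asarray(pihat, dtype=float) - 0.5)
    np.fill_diagonal(cov, V)
    return cov

def mvn_loglik(x, mean: float, cov: np.ndarray) -> float:
    """Multivariate normal log-density of trait vector x."""
    dev = np.asarray(x, dtype=float) - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        return -np.inf                       # not positive definite: parameters inadmissible
    quad = dev @ np.linalg.solve(cov, dev)
    return -0.5 * (dev.size * np.log(2 * np.pi) + logdet + quad)

# Hypothetical sibship of three: estimated pihat and standardised trait values
pihat = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.5], [0.1, 0.5, 1.0]]
x = [1.2, 1.0, -0.3]
ll_qtl = mvn_loglik(x, 0.0, sibship_cov(pihat, V=1.0, r=0.3, VA=0.4))
ll_null = mvn_loglik(x, 0.0, sibship_cov(pihat, V=1.0, r=0.3, VA=0.0))
print(ll_qtl - ll_null)   # log-likelihood ratio for this family at VA = 0.4
```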
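QTL linkage model for sib-pair data
[Path diagram: phenotypes PT1 and PT2, each determined by latent factors Q (QTL), S (shared) and N (non-shared) with path coefficients q, s, n; S1 and S2 correlated 1; Q1 and Q2 correlated $\pi$ (0 / 0.5 / 1)]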
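Tracing the paths of this sib-pair model, and assuming each latent factor Q, S, N has unit variance, gives the implied variances and covariance (a worked derivation, not taken from the slides):

$$\operatorname{Var}(PT_1) = \operatorname{Var}(PT_2) = q^2 + s^2 + n^2, \qquad \operatorname{Cov}(PT_1, PT_2 \mid \pi) = \pi q^2 + s^2, \quad \pi \in \{0, 0.5, 1\}$$

Averaging over $\pi$ (mean 0.5 for sibs) gives $Vr = 0.5q^2 + s^2$, so the covariance can be rewritten as $Vr + q^2(\pi - 0.5)$, the variance-components form with $V_A = q^2$. Setting $q = 0$ gives the null model for the linkage test.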
Incomplete Marker Information
IBD sharing cannot be deduced from marker genotypes with certainty
Obtain probabilities of all possible IBD values
Finite mixture likelihood:
$$L = \sum_{i} Z_i \, L(X \mid \mathrm{IBD} = i; V_A)$$
Pi-hat likelihood:
$$L = L(X \mid \mathrm{IBD} = 2\hat\pi; V_A)$$
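For a single standardised sib pair, the two likelihoods can be compared as in this Python sketch. Assumptions: unit trait variances, $\mathrm{Cov} = r + V_A(\pi - 0.5)$ for IBD state $i$ with $\pi = i/2$, and hypothetical IBD probabilities $Z_i$. The pi-hat version plugs the expected proportion $\hat\pi = Z_1/2 + Z_2$ into one normal density instead of mixing three.

```python
import math

def bvn_density(x: float, y: float, cov: float) -> float:
    """Bivariate normal density with unit variances and covariance cov."""
    det = 1 - cov * cov
    quad = (x * x - 2 * cov * x * y + y * y) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def mixture_lik(x, y, z, r, va):
    """sum_i z_i L(x, y | IBD = i; VA), with Cov = r + va*(i/2 - 0.5)."""
    return sum(z[i] * bvn_density(x, y, r + va * (i / 2 - 0.5)) for i in range(3))

def pihat_lik(x, y, z, r, va):
    """L(x, y | IBD = 2*pihat; VA), using pihat = z1/2 + z2."""
    pihat = 0.5 * z[1] + z[2]
    return bvn_density(x, y, r + va * (pihat - 0.5))

z = (0.2, 0.5, 0.3)    # hypothetical IBD probabilities from marker data
print(mixture_lik(1.1, 0.9, z, r=0.3, va=0.4), pihat_lik(1.1, 0.9, z, r=0.3, va=0.4))
```

The two coincide when IBD is known with certainty and differ otherwise; the pi-hat form is simpler to compute but is an approximation to the mixture.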
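Conditioning on Trait Values
Usual test:
$$\ln LR = \max_{V_A} \ln \sum_i Z_i \, L(X \mid \mathrm{IBD}=i; V_A) - \ln L(X; V_A = 0)$$
Conditional test:
$$\ln LR = \max_{V_A} \left[\ln \sum_i Z_i \, L(X \mid \mathrm{IBD}=i; V_A) - \ln \sum_i P_i \, L(X \mid \mathrm{IBD}=i; V_A)\right]$$
$Z_i$ = IBD probability estimated from marker genotypes; $P_i$ = IBD probability given the relationship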
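Both statistics can be sketched for standardised sib pairs in Python, assuming $r$ is known, $P_i = (1/4, 1/2, 1/4)$ for full sibs, and a simple grid over $V_A$ that keeps the covariance valid. The data are hypothetical. A pair whose marker data are uninformative ($Z_i = P_i$) contributes exactly zero to the conditional test, whatever its trait values.

```python
import math

def bvn_density(x, y, cov):
    """Bivariate normal density with unit variances and covariance cov."""
    det = 1 - cov * cov
    return math.exp(-0.5 * (x * x - 2 * cov * x * y + y * y) / det) / (2 * math.pi * math.sqrt(det))

def pair_lik(x, y, probs, r, va):
    """sum_i probs_i L(x, y | IBD = i; VA) for one standardised sib pair."""
    return sum(probs[i] * bvn_density(x, y, r + va * (i / 2 - 0.5)) for i in range(3))

PRIOR = (0.25, 0.5, 0.25)   # P_i: IBD probabilities for full sibs given the relationship

def usual_lnlr(pairs, r, grid):
    """max over VA of sum ln sum_i Z_i L(X | IBD=i; VA), minus sum ln L(X; VA = 0)."""
    null = sum(math.log(bvn_density(x, y, r)) for x, y, _ in pairs)
    return max(sum(math.log(pair_lik(x, y, z, r, va)) for x, y, z in pairs) for va in grid) - null

def conditional_lnlr(pairs, r, grid):
    """max over VA of sum [ln sum_i Z_i L(X|IBD=i; VA) - ln sum_i P_i L(X|IBD=i; VA)]."""
    return max(sum(math.log(pair_lik(x, y, z, r, va)) - math.log(pair_lik(x, y, PRIOR, r, va))
                   for x, y, z in pairs) for va in grid)

# Hypothetical (x, y, Z) for five sib pairs
pairs = [(1.2, 1.0, (0.1, 0.3, 0.6)), (-0.8, -1.1, (0.2, 0.4, 0.4)),
         (1.5, -0.9, (0.6, 0.3, 0.1)), (0.2, 0.4, PRIOR), (-1.3, -0.7, (0.0, 0.5, 0.5))]
grid = [k / 100 for k in range(131)]   # VA in [0, 1.3] keeps |r +/- VA/2| < 1 for r = 0.3
print(usual_lnlr(pairs, 0.3, grid), conditional_lnlr(pairs, 0.3, grid))
```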
QTL linkage: some problems
Sensitivity to misspecification of marker allele frequencies and positions
Sensitivity to non-normality / phenotypic selection
Heavy computational demand for large pedigrees or many marker loci
Sensitivity to marker genotype and relationship errors
Low power and poor localisation for minor QTLs