IBD-based QTL detection in multi-cross inbred designs: A case
study of
cereal breeding programs
Sébastien Crepieux*,1, Bertrand Servin† , Claude Lebreton‡ and
Gilles Charmet*
*UMR 1095 INRA-UBP, 234 Av. du Brezet, 63039 Clermont-Ferrand
Cedex 2, France,
†INRA UMR de Génétique Végétale, INRA/UPS/INAPG, 91 190 Gif sur
Yvette, France
‡Limagrain Agro-Industrie, site d’ULICE, av G. Gershwin, BP173,
F-63204 Riom Cedex,
France
IBD-based multi-cross QTL mapping
1 : corresponding author : 1095 INRA-UBP, 234 Av. du Brezet,
63039 Clermont-Ferrand
Cedex 2, France. Email : [email protected]
Phone : 00 33 4 73 62 43 09 , Fax: 00 33 4 73 62 44 53
Key words: QTL detection, variance component, pedigree breeding,
IBD, multi-cross
ABSTRACT
Mapping quantitative trait loci in plants is usually conducted
using a population derived from
a cross between two inbred lines. The power of such QTL
detection and the parameter
estimates highly depend on the choice of the two parental lines.
Thus, the QTL detected in
such populations only represent a small part of the genetic
architecture of the trait. Besides,
the effects of only two alleles are characterised, which is of
limited interest to the breeder. On
the other hand, common pedigree breeding material remains
unexploited for QTL mapping. In
this study, we extend QTL mapping methodology to a generalized
framework, based on a
two-step IBD variance component approach, applicable to any type
of breeding population
coming from inbred parents. The power and accuracy of this
method were assessed on
simulated data mimicking conventional breeding programs in
cereals. This method can
provide an alternative to the development of specifically
designed recombinant populations, by
exploiting the genetic variation actually managed by plant
breeders. The use of these detected
QTL in assisting breeding would thus be facilitated.
INTRODUCTION
The availability of molecular markers in the 1980s opened
new scope for quantitative
genetics and breeding. It was anticipated that loci underlying
quantitative traits (QTL) could be manipulated as easily as
Mendelian factors. This
prospect, however, has remained largely unrealized, despite
the large corpus of theoretical
studies on marker assisted selection (e.g. LANDE and THOMPSON
1990; GIMELFARB and
LANDE 1994, 1995; HOSPITAL et al. 1997). The main reason is
probably that the cost of markers, combined with a relatively
small gain in selection efficiency, generally makes marker
assisted selection (MAS)
much more expensive than conventional breeding (MOREAU et al.
2000). The other reason is
that applied breeding programs and QTL research are often
disconnected, i.e. carried out by
different teams and using different plant material.
Classically, QTL analyses are carried out on a few progenies
from broad base crosses, coming
from a small number of distantly related lines, often including
wild relatives. Such analyses
mostly involve bi-parental progenies such as back-crosses (BC),
doubled haploids lines (DH),
F2 or recombinant inbred lines (RILs). In the approaches based
on this kind of plant material,
the effect of an allele substitution at a candidate locus is
tested. This is called the fixed model
approach (XU and ATCHLEY 1995) since it considers a fixed number
of distinct alleles (most
often two) at each putative QTL. Statistical methods for the QTL
analysis of bi-parental
populations underwent successive improvements through the advent
of Interval Mapping
(LANDER and BOTSTEIN 1989) and its linearization (HALEY and
KNOTT 1992), the Composite
Interval Mapping (ZENG 1993, 1994 ; JANSEN 1993) and multiple
trait QTL mapping (KOROL
et al. 1995; JIANG and ZENG).
On the other hand, the breeder’s material is far from the
studied bi-parental populations.
Breeders generally handle many small families from crosses
between (often) highly related
elite lines. Thus, the above described methods are poorly
adapted. Moreover there are many
drawbacks for the breeder’s use of the QTL found on bi-parental
populations. First, when
only two parents are considered, some markers and potential QTLs
are more likely to be
monomorphic, even if parental lines are carefully selected for
trait divergence. Since, by
definition, QTL can only be found at polymorphic sites in the
genome, the expected number
of QTL detected with a bi-parental cross will be lower than that
expected when analyzing
several crosses at a time (assuming the total number of
genotypes is not the limiting factor).
The second drawback is that the QTL effect is estimated as a
contrast between two alleles and
in one genetic background only. Therefore, in that context, the
improvement of a line by the
introgression of a QTL allele in a completely new genetic
background is rather unpredictable,
because of possible epistatic interaction between QTL and
genetic background. Finally, from
an economic standpoint, the cost of creation of large single
cross progenies and specific trials
for trait evaluation to perform QTL detection is quite high and
often at the expense of other
selection programs.
All these drawbacks reduce the breeders’ interest for
implementing such experimental designs
when funding and work are constrained. Bi-parental crosses are
usually preferred for more
upstream studies, e.g. genomics: the fine-mapping of a QTL,
which is a pre-requisite for its
positional cloning, is easier when fewer QTLs are segregating.
In contrast, the breeder’s focus
is to characterize the effects of a wide range of alleles in
his or her germplasm. Methods for
simultaneous detection and manipulation of QTL in breeding
programs would thus enhance
the applicability of MAS.
In plant breeding, new methods for QTL detection in complex
designs, close to those used in
real breeding schemes, have already been developed. MURANTY
(1996) suggested working, in
plants, with progenies from several parents, in order to
increase the probability of having more
than one allele at a putative QTL and to obtain a more
representative estimate of the
variance accounted for by a QTL. She operated in a fixed-effect
framework. Simulations
demonstrated that a higher QTL detection power was achieved, for
a given sample size. XU
(1998) compared the QTL detection powers obtained with random
effect models and fixed
effects and found similar values for individual family sizes as
low as 25 individuals.
However, in more unbalanced designs, the random effect approach
was presumed to be more
suited as it can handle any arbitrary pedigree of individuals
(LYNCH and WALSH 1998, XU
1998). Efficient methodologies for more fragmented populations
in plants have been
developed (XIE et al. 1998; YI and XU 2001, BINK et al. 2002,
JANSEN et al. 2003 for
example), but their extension or implementation for any complex
plant design, involving a
mixture of half-sib and full-sib families of different sizes,
at any generation of selfing and
with hermaphrodite parents, is not
straightforward. The identical-by-descent
(IBD)-based variance component analysis is a powerful
statistical method for QTL mapping
in complex populations and can be used in pedigrees of arbitrary
size and complexity
(ALMASY and BLANGERO 1998). These IBD-based variance component
analyses are derived
from the assumption that individuals of similar phenotype are
more likely to share alleles that
are identical by descent. The construction of IBD matrices for
alleles at each tested position
along the genome, and the fitting of random effect models (which
assumes that QTL effects
are normally distributed) offer an appropriate method to map QTL
if the progeny population
is large enough and if the progenies are connected in some way.
Besides, these models do not
need to assume a known, finite set of alleles at each putative
QTL. Thus, they offer a less
parameterized statistical environment in which to map QTL,
because only the variances need
to be estimated instead of every allele substitution effect.
Generally, the IBD-based
approaches assume a between family IBD-likelihood of zero (i.e.
no parents in common
between the two families), and thus, consider the parents as
founders. However, this
assumption is often wrong in common breeding pedigrees.
Furthermore, in fragmented
situations, i.e. where there are many families of small sizes
(especially when the genotyping
takes place at a late stage in pedigree breeding, where we may
easily end up with as few as
one or two lines per cross), the IBD-likelihood matrix can be
very sparse. Hence, there could
be much to be gained in exploring the actual between family
IBD-likelihoods.
In the work reported in this paper, we built on these
developments and further assumed a non-zero
IBD-likelihood between non-sib lines. We then present a
unified IBD-based variance
component analysis framework, to map QTL in any kind of
multi-cross designs involving
self-pollinating species, at any generation. To test the
accuracy of the method, we developed a
simulation program which mimics the steps of real breeding
schemes. We chose the
simulation parameters according to the information provided by
breeders on their real
material in order to be as comprehensive as possible in the
range of genetic configurations
explored.
This method can provide an alternative to the development of
specifically designed
recombinant populations, by exploiting the genetic variation
actually managed by plant
breeders. Our method can thus provide breeders with valuable
information about the breeding
values of their material and help them to design selection
strategies.
METHODS
Two-step IBD based variance component method
The method used to map QTL in a complex inbred pedigree is a two
step variance component
method, as described in GEORGE et al. (2000). This method
first constructs
the (co)variance matrix of fixed and random effects
at each putative QTL
position and then estimates the likelihood of the presence of a
QTL at these positions using
appropriate linear models.
These two steps are common to all interval mapping based
variance components methods.
What differs mainly among all the published methods is the way
to calculate the IBD
probabilities (see GEORGE et al. 2000 for a review of IBD
probabilities calculation). We
adopted a deterministic approach to infer IBD probabilities for
any generation of
recombination and breeding scheme (e.g. F2, Fn, RIL, BCn), based
on the MDM program
(SERVIN et al. 2002).
Mixed linear models
We assume that the quantitative trait is a linear combination of
fixed design effects, a putative
QTL effect (with additive or/and dominance effect) and additive
polygenic effects. The
random polygenic effect is seen as the cumulative effect of all
loci affecting the quantitative
trait that are unlinked to the QTL. The model is:
$$y = X\beta + Zu + Zv + e \qquad (1)$$
where y is an (m*1) vector of phenotypes, X is an (m*s) design
matrix, β is a (s*1) vector of
fixed effects, Z is an (m*q) incidence matrix relating records
to individuals, u is a (q*1) vector
of additive QTL effects, v is a (q*1) vector of additive
polygenic effects and e is the residual.
We assume that the random effects u, v and e are uncorrelated
and distributed as multivariate
normal densities: $u \sim N(0, G\sigma_u^2)$; $v \sim N(0, A\sigma_v^2)$; $e \sim N(0, I\sigma_e^2)$, with
$\sigma_u^2$, $\sigma_v^2$ and $\sigma_e^2$ being
respectively the additive variance of the QTL, the polygenic
variance and the residual
variance. A is the (q*q) additive genetic relationship matrix; G
is the (q*q) (co)variance
matrix for the QTL additive effects conditional on marker
information; and I is the (m*m)
identity matrix.
The model without QTL segregating in the population is, with the
same notations:
$$y = X\beta + Zv + e \qquad (2)$$
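As a reading aid, the covariance structure implied by models (1) and (2) can be sketched in a few lines. Because u, v and e are uncorrelated, Var(y) = ZGZ'σu² + ZAZ'σv² + Iσe². The sketch assumes one record per line (Z = I), and the function name and toy matrices are hypothetical illustrations, not values from this study.

```python
def phenotypic_covariance(G, A, var_u, var_v, var_e):
    """Var(y) = G*var_u + A*var_v + I*var_e, assuming Z = I (one record per line).

    With var_u = 0 this is the covariance implied by the null model (2).
    """
    q = len(G)
    return [
        [G[i][j] * var_u + A[i][j] * var_v + (var_e if i == j else 0.0)
         for j in range(q)]
        for i in range(q)
    ]


# Hypothetical toy values for two inbred lines (not taken from the paper).
G = [[2.0, 1.0], [1.0, 2.0]]   # IBD-based matrix at the tested position
A = [[2.0, 0.5], [0.5, 2.0]]   # genome-wide additive relationship matrix
V_qtl = phenotypic_covariance(G, A, var_u=0.1, var_v=0.4, var_e=0.5)
V_null = phenotypic_covariance(G, A, var_u=0.0, var_v=0.4, var_e=0.5)
```

The only difference between the two covariance matrices is the G term, which is what the likelihood ratio test at each position evaluates.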
Computation of the IBD probabilities
We consider a mapping population composed of several
sub-populations of small size. Each
of these sub-populations is an offspring coming from an inbred
pedigree started with two
parents. For example, these sub-populations could be produced by
several consecutive
selfings (e.g. RILs) or back-crossings. We want to compute the
probability that two
individuals taken from any of these sub-populations share IBD
alleles at a given locus of their
genome. If we consider a pair of individuals from the mapping
population, they may be (i)
taken from the same sub-population, in which case they are
full-sibs (ii) taken from two
different sub-populations. In this last case, if one of the
parents is common to the two sub-populations,
the two individuals are half-sibs; if the
parents of the two sub-populations are
distinct, the two individuals are considered as unrelated. We
now derive the
IBD probabilities for each of these
cases.
IBD value between two sibs at a QTL: Within each full-sib family
of the breeding scheme,
only two alleles are segregating giving only three possible
genotypes at the QTL: QQ, Qq and
qq.
Following the notation of XIE et al. (1998), the IBD value of two
individuals i and j is measured
as:
$$\pi_{i,j} = 2\theta_{i,j} = \begin{cases} 2 & \text{for } QQ\text{-}QQ \text{ or } qq\text{-}qq \\ 1 & \text{for } QQ\text{-}Qq,\ Qq\text{-}qq \text{ or } Qq\text{-}Qq \\ 0 & \text{for } QQ\text{-}qq \end{cases}$$
where $\pi_{i,j}$ is the ijth element of G, and $\theta_{i,j}$ is MALECOT’s
(1948) coefficient of coancestry.
As pointed out by many authors, when inbreeding is present, $\pi_{i,j}$
is not interpreted as the
proportion of alleles IBD, but rather as twice the coefficient
of coancestry (KEMPTHORNE
1955; HARRIS 1964; COCKERHAM 1983).
Inferring the IBD value of a QTL from markers: The IBD value is
completely determined
by the genotypes of two individuals at the QTL of interest. The
actual QTL genotype of an
individual, however, is not observable, and must be inferred
from flanking marker
information. We denote the following probabilities:
$p_j^{(2)} = \Pr(QQ \mid I_M)$, $p_j^{(1)} = \Pr(Qq \mid I_M)$ and $p_j^{(0)} = \Pr(qq \mid I_M)$.
We write $p_i = [p_i^{(2)}\ p_i^{(1)}\ p_i^{(0)}]^T$ and $p_j = [p_j^{(2)}\ p_j^{(1)}\ p_j^{(0)}]^T$. The
conditional expectations of the
IBD values between two full sibs are:
$$\pi_{i,j} = E(\pi_{i,j} \mid I_M) = p_i^T C p_j$$ for between individuals,
and
$$\pi_{j,j} = E(\pi_{j,j} \mid I_M) = c^T p_j$$ for the individual with itself,
where:
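$$C = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \quad \text{and} \quad c = \begin{bmatrix} 2 & 1 & 2 \end{bmatrix}^T$$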
In the rest of the paper, this formula, for computing IBD
probabilities, will be referred to as
formula (1).
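Formula (1) can be illustrated with a minimal sketch. It assumes that genotype probabilities are ordered (QQ, Qq, qq) and that C holds the IBD values 2, 1 or 0 for each pair of genotypes in that order; the function name is a hypothetical illustration.

```python
# IBD values for genotype pairs, rows and columns ordered (QQ, Qq, qq).
# Assumption: this ordering matches p = [Pr(QQ), Pr(Qq), Pr(qq)].
C = [[2, 1, 0],
     [1, 1, 1],
     [0, 1, 2]]


def expected_ibd_full_sibs(p_i, p_j):
    """E(pi_ij | I_M) = p_i^T C p_j for two full sibs."""
    return sum(p_i[a] * C[a][b] * p_j[b] for a in range(3) for b in range(3))


# Two lines known to be QQ share both alleles; QQ versus qq share none.
same = expected_ibd_full_sibs([1, 0, 0], [1, 0, 0])
opposite = expected_ibd_full_sibs([1, 0, 0], [0, 0, 1])
# With no marker information in fixed lines (QQ or qq equally likely),
# the expectation falls back to the average value for full sibs.
uninformative = expected_ibd_full_sibs([0.5, 0, 0.5], [0.5, 0, 0.5])
```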
General case: Above, we emphasized the calculation of IBD for
the full-sib case. The
generalization to the half-sib case is trivial.
Using this first formula to compute IBD probabilities, we
assumed that parents of sub-
populations were unrelated, i.e. they did not share any common
ancestors. However, in
practice, these parents were coming from previous generations of
breeding and were very
likely to share IBD alleles in their genome (due to the
intensive use of some “star varieties”,
for example). In order to take these possible relationships
between parents into account, we
estimated coancestry coefficients between them, using molecular
information, as suggested by
BERNARDO (1993). The pedigree information, if available, could
be used to that end.
However, the selection pressure may generate a discrepancy in
the predicted proportion of
parental genomes shared by the current lines. Besides, we
assumed that the pedigree
information was often very scarce or even unavailable. Hence, we
resorted to a genetic
similarity based index to estimate that proportion of
genome.
First, we generalized the C matrix to the known half-sib,
full-sib and unrelated individuals by
introducing the coancestries between parents, estimated by
markers in our case. We
considered two individuals taken in two sub-populations. Let P1
and P2 be the parents of the
first sub-population and P3 and P4, the parents of the other
sub-population. We denote by
$GS_{P1P3}$, $GS_{P1P4}$, $GS_{P2P3}$ and $GS_{P2P4}$ the estimates of the
coefficients of coancestry between
these parents. Taking into account possible coancestries between
P1 and P3 on one hand and
P2 and P4 on the other hand, the C matrix can then be re-written
as:
$$C_1 = \begin{bmatrix} 2GS_{P1P3} & GS_{P1P3} & 0 \\ GS_{P1P3} & \tfrac{1}{2}(GS_{P1P3} + GS_{P2P4}) & GS_{P2P4} \\ 0 & GS_{P2P4} & 2GS_{P2P4} \end{bmatrix}$$
Note that in the full-sib case P1=P3 and P2=P4, so that $GS_{P1P3}$
and $GS_{P2P4}$ are equal to one
and the $C_1$ matrix reduces to C. Similarly, the relevant C
matrices for half-sib individuals or
for unrelated individuals can be obtained by replacing
$GS_{P1P3}$ by one and $GS_{P2P4}$
by zero in the first case, and both $GS_{P1P3}$ and $GS_{P2P4}$ by zero in the
second.
Similarly, taking into account the coefficients between the
parents P1 and P4 on one hand and
P2 and P3 on the other hand, we can re-write the C matrix as:
$$C_2 = \begin{bmatrix} 0 & GS_{P1P4} & 2GS_{P1P4} \\ GS_{P2P3} & \tfrac{1}{2}(GS_{P1P4} + GS_{P2P3}) & GS_{P1P4} \\ 2GS_{P2P3} & GS_{P2P3} & 0 \end{bmatrix}$$
Finally, we can draw a general formula for the conditional
expectation of the IBD values
between two individuals coming from four (distinct or not)
inbred parents:
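$$\pi_{i,j} = E(\pi_{i,j} \mid I_M) = p_i^T C_1 p_j + p_i^T C_2 p_j$$
i.e.
$$\begin{aligned} \pi_{i,j} = E(\pi_{i,j} \mid I_M) = 2\Big( & GS_{P1P3}\big[(p_i^{(2)} + \tfrac{1}{2}p_i^{(1)})(p_j^{(2)} + \tfrac{1}{2}p_j^{(1)})\big] + GS_{P1P4}\big[(p_i^{(2)} + \tfrac{1}{2}p_i^{(1)})(p_j^{(0)} + \tfrac{1}{2}p_j^{(1)})\big] \\ + \; & GS_{P2P3}\big[(p_i^{(0)} + \tfrac{1}{2}p_i^{(1)})(p_j^{(2)} + \tfrac{1}{2}p_j^{(1)})\big] + GS_{P2P4}\big[(p_i^{(0)} + \tfrac{1}{2}p_i^{(1)})(p_j^{(0)} + \tfrac{1}{2}p_j^{(1)})\big] \Big) \end{aligned}$$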
The conditional expectation of the IBD for an individual with
itself remains:
$$\pi_{j,j} = E(\pi_{j,j} \mid I_M) = c^T p_j$$
In the rest of the paper, this formula, using the C1 and C2
matrices, will be referred to as
formula (2).
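Formula (2) can be sketched as follows, assuming that individual i descends from parents P1 and P2, individual j from P3 and P4, and that genotype probabilities are ordered (QQ, Qq, qq) with Q inherited from the first parent of each cross. The function and argument names are hypothetical.

```python
def parental_allele_probs(p):
    """p = [Pr(QQ), Pr(Qq), Pr(qq)].

    Returns the probabilities that a randomly drawn allele comes from the
    first parent (Q) and from the second parent (q) of the cross.
    """
    p2, p1, p0 = p
    return p2 + 0.5 * p1, p0 + 0.5 * p1


def expected_ibd_general(p_i, p_j, gs13, gs14, gs23, gs24):
    """Expected IBD value between i (from P1 x P2) and j (from P3 x P4).

    gsXY is the estimated coancestry between parents PX and PY.
    """
    a_i, b_i = parental_allele_probs(p_i)
    a_j, b_j = parental_allele_probs(p_j)
    return 2.0 * (gs13 * a_i * a_j + gs14 * a_i * b_j
                  + gs23 * b_i * a_j + gs24 * b_i * b_j)


# Full sibs: P1 = P3 and P2 = P4 are treated as unrelated to each other.
full_sib = expected_ibd_general([1, 0, 0], [1, 0, 0], 1.0, 0.0, 0.0, 1.0)
# Unrelated families: every coancestry between parents is zero.
unrelated = expected_ibd_general([1, 0, 0], [1, 0, 0], 0.0, 0.0, 0.0, 0.0)
# Distinct but related parents P1 and P3 (hypothetical coancestry of 0.4).
related = expected_ibd_general([1, 0, 0], [1, 0, 0], 0.4, 0.0, 0.0, 0.0)
```

In the full-sib setting the function returns the same values as formula (1), which is a convenient check when implementing the general case.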
The elements of the additive relationship matrix A are estimates
of the proportion of IBD
genome between any two lines: the element $a_{i,j}$ is the
proportion of IBD genome between
line i and line j, based on genetic similarities.
For both A and G matrices, genetic similarities ($GS_{i,j}$) were
computed using the NEI and LI
(1979) formula.
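The NEI and LI similarity is commonly written as GS = 2N_ij / (N_i + N_j), where N_ij is the number of alleles shared by lines i and j and N_i, N_j are the numbers of alleles scored in each line. A minimal sketch, assuming fully inbred lines with one allele per marker and hypothetical marker names and allele codes:

```python
def nei_li_similarity(alleles_i, alleles_j):
    """Nei and Li similarity between two inbred lines.

    alleles_*: dict mapping marker name -> allele code (one allele per
    marker for a fixed line; markers with missing data are simply omitted).
    """
    bands_i = set(alleles_i.items())
    bands_j = set(alleles_j.items())
    total = len(bands_i) + len(bands_j)
    if total == 0:
        raise ValueError("no marker data available for either line")
    return 2.0 * len(bands_i & bands_j) / total


# Hypothetical genotypes: the two lines share alleles at 2 of 3 markers.
line_a = {"m1": 1, "m2": 2, "m3": 3}
line_b = {"m1": 1, "m2": 5, "m3": 3}
gs_ab = nei_li_similarity(line_a, line_b)
```

Alleles that are alike in state but not IBD inflate this index, which is one of the biases explored in the simulations.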
Implementation of the IBD formula
We used the deterministic approach of the MDM program (SERVIN et
al. 2002) to compute all
the $p_i$ and $p_j$ probabilities. IBD-likelihoods were computed every
3 cM. Two flanking markers
were used to infer the genotype probabilities. In the frequent
case where the two parents
shared the same marker alleles at one or two loci flanking the
putative QTL position, the next
closest markers to the interval were used. It can easily be
demonstrated that the IBD
probabilities calculated at a putative QTL will be more precise
if the flanking markers are
highly polymorphic. Issues concerning the use of better estimates of the
coefficients of coancestry
are discussed in the last part of the article.
All the G matrices were then inverted and written in the ASREML
(GILMOUR et al. 1998) format
for user-defined inverse (co)variance matrices. We also computed
the additive relationship
matrix A, which was likewise inverted and written in ASREML format
(end of step 1).
In step 2, ASREML provided restricted maximum-likelihood (REML)
estimates for models (1) and
(2). To test for the presence of a QTL against no QTL at a
particular chromosomal position,
we used the log-likelihood ratio test: LR = -2 [ln L0(H0, no QTL
present) - ln L1(H1, QTL
present)], where L1 and L0 represent the likelihood values of
(1) and (2) evaluated at the
REML solutions, respectively.
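The test statistic is minus twice the difference between the two log-likelihoods, and a genome scan repeats it at every tested position. A minimal sketch, assuming the REML log-likelihoods have already been obtained; the function names and numerical values are hypothetical.

```python
def likelihood_ratio(loglik_null, loglik_qtl):
    """LR = -2 * (ln L0 - ln L1), computed from REML log-likelihoods."""
    return -2.0 * (loglik_null - loglik_qtl)


def scan_profile(loglik_null, loglik_by_position):
    """LR at each tested position and the position of the highest LR.

    loglik_by_position: dict mapping position (cM) -> ln L1 at that position.
    """
    profile = {pos: likelihood_ratio(loglik_null, ll)
               for pos, ll in loglik_by_position.items()}
    peak = max(profile, key=profile.get)
    return profile, peak


# Hypothetical log-likelihoods at three positions tested every 3 cM.
profile, peak = scan_profile(-1250.0, {42: -1249.0, 45: -1243.5, 48: -1245.0})
```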
Test statistic under the null hypothesis
The choice of the threshold in this kind of population is always
challenging. Many
publications (ZENG 1994; XU and ATCHLEY 1995 for example) report
that when a
chromosomal interval is being scanned, the empirical
distribution of LR follows a mixture of
two Chi-square distributions, with one and two degrees of
freedom, respectively.
Since this article deals with simulated data, it is possible to
replicate data under the null
hypothesis of no QTL segregating, construct the empirical
distribution of LR and derive
an empirical threshold by choosing the 95th percentile of the
highest test statistic, generally over
500 or 1000 stochastic realizations. In this paper, we
calculated an empirical threshold for
every set of parameters to see whether significant differences
in the threshold appeared or not.
Hence, for each set of parameters tested, we ran 1000 additional
simulations with no QTL
segregating. We increased the polygenic variance such that the
total genetic variance
remained unchanged. We thus determined the empirical threshold
by choosing the 95th
percentile from the list of 1000 runs.
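The empirical threshold can be sketched as follows, assuming that the highest LR of the genome scan has been stored for each null replicate. The nearest-rank definition of the percentile used here is an assumption, since the exact percentile rule is not specified, and the function name is hypothetical.

```python
def empirical_threshold(null_max_lr, percent=95):
    """Nearest-rank percentile of the highest LR observed in each null replicate.

    null_max_lr: one value per replicate simulated without a QTL.
    """
    if not null_max_lr:
        raise ValueError("at least one null replicate is required")
    ordered = sorted(null_max_lr)
    # Integer arithmetic avoids floating-point rounding in the rank.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical null maxima 1, 2, ..., 1000: the threshold is the 950th value.
threshold = empirical_threshold(list(range(1, 1001)))
```

A replicate simulated with a QTL is then declared significant when its highest LR exceeds this threshold.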
A SIMULATION STUDY: The case of a cereal breeding program
We chose the cereal example in the simulation study as it
contains most of the difficulties
generally encountered in inbred breeding schemes:
- the frequent unavailability of reliable pedigree information
beyond the parents (and
thus the unavailability of ancestor lines for genotyping)
- the possibility of genotyping only advanced generations of
selfing, when the number of
lines has been narrowed down and the precision of trials
increased, which requires
computing IBD at the F6 or F7 stage without any marker
information between the
initial cross and the resulting progenies
- the very high number of parents of the mapping population
yielding very small full-sib
families, and an uneven (L shaped) distribution of half-sib
family sizes
Simulation of the inbred breeding scheme
Every year, new plant breeding programs are started. The choice
of the numerous parents for
crosses combines on one hand lines with the highest genetic
merit for some traits of interest
like yield or quality, and on the other hand, some lines of
specific interest such as special
quality or pest/disease resistance, sometimes taken in old or
exotic material. The initial
crossing scheme of a breeding program depends strongly on the
breeder’s choice of
parents, but the breeding process itself is often very similar.
To reproduce the steps of the breeding programs, an S-PLUS
(2000) function was developed.
This function was designed by analysing information from wheat
breeders, together with the allele diversity and
frequencies made available by recent studies (DONINI et al.
2000; RODER et al. 2002).
All simulations follow a similar procedure: (Figure 1)
Figure 1 around here
Marker-genotypes construction: Before the start of the breeding
process, we considered only
a few founder lines, with full linkage disequilibrium across all
their genome, hence between
all the markers and the QTLs. We also imposed that at this
generation, no founder lines had
any alleles in common with any other. Thus, line 1 carried only
markers and QTLs coded 1,
line 2 only markers and QTLs coded 2, and the last founder line
(the NPth) carried only
markers and QTLs coded NP all along its genome. The genome of
simulated individuals was
composed of 21 chromosomes of 100 cM, with markers evenly spaced
every d centiMorgans
all along the chromosomes, with a marker at each of the two
telomeres of every chromosome. On
chromosome 1, we simulated a single QTL between two markers. In
addition to the QTL of
interest, we simulated on the other chromosomes Npoly=40
randomly located QTLs (that
could therefore be linked or not) with random effects to
simulate the polygenic contribution to
the trait values.
First generation of crosses: From the founder lines generation,
circular crosses, i.e. 1*2, 2*3,
… , NP-1*NP, NP*1 were performed. We then derived lines by
self-pollination during five
generations in order to obtain F6 lines. NP mixed
sub-populations of the same size (same
contribution for all founders for this first generation) were
derived, giving the “G0”
generation. The population was considered as totally fixed, by
sampling only one gamete after
the last self-pollination. This stage corresponded to the end of
a breeding cycle. The
recombination procedure was based on randomly placed chiasmata
with no interference.
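The first generation of crosses can be sketched as below. This is an illustration under stated assumptions, not the S-PLUS function used in the study: crossovers are placed as a Poisson process with one crossover per 100 cM on average (one way to model the absence of interference), and all function names are hypothetical.

```python
import random


def circular_crosses(n_founders):
    """Crosses 1*2, 2*3, ..., (NP-1)*NP, NP*1 between founder lines."""
    return [(k, k % n_founders + 1) for k in range(1, n_founders + 1)]


def make_gamete(hap_a, hap_b, positions_cm, rng):
    """One gamete from a plant carrying haplotypes hap_a and hap_b.

    positions_cm must be sorted in increasing order along the chromosome.
    """
    current, other = (hap_a, hap_b) if rng.random() < 0.5 else (hap_b, hap_a)
    next_crossover = rng.expovariate(1.0 / 100.0)
    gamete = []
    for locus, position in enumerate(positions_cm):
        # Switch strand for every crossover located before this locus.
        while next_crossover < position:
            current, other = other, current
            next_crossover += rng.expovariate(1.0 / 100.0)
        gamete.append(current[locus])
    return gamete


def derive_f6_line(founder_a, founder_b, positions_cm, rng):
    """F1, then five generations of selfing, then one gamete sampled to fix the line."""
    plant = (list(founder_a), list(founder_b))          # F1
    for _ in range(5):                                   # F2 to F6
        plant = (make_gamete(plant[0], plant[1], positions_cm, rng),
                 make_gamete(plant[0], plant[1], positions_cm, rng))
    return make_gamete(plant[0], plant[1], positions_cm, rng)


rng = random.Random(1)
positions = list(range(0, 101, 10))                      # 11 markers, d = 10 cM
line = derive_f6_line([1] * 11, [2] * 11, positions, rng)
```

Each locus of the resulting line carries the allele code of one of the two founders, so the founder origin of every chromosome segment can be followed through the cycle.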
Quantitative trait: The parameters for the creation of the
quantitative trait are the QTL
heritability (h²QTL) and the heritability of the polygenes
(h²poly).
We created at the founder line generation the NP allele effects
for the QTL and the Npoly
polygenes. The NP=20 possible effects of the QTL were drawn from
a normal distribution
with mean 0 and variance 1. Then the QTL variance (VarQTL) was
calculated at the true QTL
position, and the NP=20 effects for each of the Npoly polygenes
were drawn from a normal
distribution with mean 0 and variance [VarQTL*(h²poly/h²QTL)]/Npoly.
Finally, the true variance
accounted for by the polygenes was computed (VarPoly), and a random
normally distributed noise
with variance $\sigma_e^2$ = [VarQTL*(1/h²QTL-1) - VarPoly] was added
to simulate phenotypic values
of the trait. Thus, the ratio of the additive variance explained
by the QTL to the total
phenotypic variance is exactly equal to the specified value
h²QTL, while the ratio of the polygenic
variance to the total phenotypic variance could be slightly different
from the specified h²poly.
Hence, the allele effects and the environmental variance were
created at the first generation,
and remained constant for all the generations even if the number
of alleles decreased.
Nevertheless, the environmental variance was adjusted at the
last generation to set the desired
QTL heritability before performing the QTL detection.
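The variance bookkeeping described above can be checked with a short sketch. The function names and the numerical values are hypothetical; only the two variance formulas come from the text.

```python
def polygene_effect_variance(var_qtl, h2_qtl, h2_poly, n_poly):
    """Variance of the allele effects drawn for each polygene."""
    return var_qtl * (h2_poly / h2_qtl) / n_poly


def residual_variance(var_qtl, var_poly, h2_qtl):
    """Noise variance giving exactly the requested QTL heritability."""
    var_e = var_qtl * (1.0 / h2_qtl - 1.0) - var_poly
    if var_e < 0:
        raise ValueError("realized polygenic variance is too large for this h2_qtl")
    return var_e


# Hypothetical realized variances: VarQTL = 1.0, VarPoly = 4.2, target h2QTL = 0.1.
var_qtl, var_poly, h2_qtl = 1.0, 4.2, 0.1
var_e = residual_variance(var_qtl, var_poly, h2_qtl)
total = var_qtl + var_poly + var_e
realized_h2_qtl = var_qtl / total      # equals the specified h2QTL
realized_h2_poly = var_poly / total    # may deviate from the specified h2poly
```

In this example the realized polygenic heritability is 0.42 rather than a specified 0.4, which illustrates why only the QTL heritability is matched exactly.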
Overlapping generations and matrix of crosses: Once the genotypes
and phenotypes of the
lines had been obtained, virtual breeding schemes were
performed. Two hypotheses were
implemented by extrapolating from information provided by
breeders:
- The “overlapping” choice of the parents. All the parents were
not necessarily extracted from
the last generation only, but a proportion of them (parameter)
could originate from older ones
- The influence of a matrix of crosses on the structure of the
resulting progeny of a cross
breeding program, which really influenced the effective
population size
The design of crosses at the beginning of a breeding scheme
could be seen as a geometric
series, since the representation of parents in the selected
progeny is uneven, L-shaped rather
than random. For example, if a given line, say X, is the most
cultivated line at a given period
(with the best agronomic performance in a range of
environments), X will usually be crossed
to many other lines to fully exploit its genetic value. After
self-pollination and selection, a
certain number of lines coming from this parent X will still
remain at the F6 stage, and will
form one of the largest half-sib families of all the breeding
scheme (containing possibly some
full-sibs when a specific cross is particularly outstanding). On
the other hand, an exotic plant
of very specific interest but with low agronomic
performance may also be used to
initiate crosses, but on a smaller scale. Some of its offspring
will probably also be selected, but
to a much smaller extent.
A matrix of crosses was implemented to reproduce the formation
of half-sib and full-sib
families during a cycle of breeding, hence taking into account
unbalanced contributions of
parents to the final population. Only the upper triangle of this
matrix was filled, with the parents
sorted from the best ones to the exotic ones. The choice and
order of the parents used to fill the
matrix were based on the generation from which they originated.
We considered that the
best agronomic ones came from
the most recent generations of breeding, and that the exotic ones
came from older generations of
breeding. The “overlapping” option extracted 80% of parents from
the most recent generation of
breeding, and 20% from all the older generations (accounting for
10% of the resulting
progeny). On average, 285 crosses were performed per simulated
selection cycle (out of 4750
possible crosses in the full matrix of crosses). Note that each
cross gave, on average, 1.75 full-
sibs and that each parent was found, on average, in ten
progenies. Thus, each individual is
related, on average, to 8.25 half-sibs and related to 0.75
full-sib.
The resulting half-sib family sizes are presented in Figure 2.
Figure 2 around here
Performing n breeding cycles: After the first generation, a loop
on the number of breeding
cycles was performed and the parents of crosses, the resulting
progenies and the phenotypic
data were stored. All available generations were used to build
the next. The last breeding
cycle, NG, was used for QTL detection.
Note that at the beginning, all the allele frequencies were
equal, which was not the case after
many generations due to genetic drift and non-panmictic
conditions. NP alleles with different
effects were possible at each QTL locus and at each marker. All
the markers and QTL were in
full linkage disequilibrium at G0 but were not after many
recombinations (5 generations of
self-pollination by cycle, and NG cycles before the mapping
generation).
Thus, it is easy to anticipate that the information carried by
markers and QTL will differ
in many cases, and that ANOVA-based methods are likely to be
inefficient.
As the goal of this article is not to assess the influence
of selection on QTL detection,
we did not perform recurrent pedigree selection on the value
of the quantitative trait for the
choice of the parents, nor intra-cross selection during the
self-pollination process. We only
wanted to assess the influence of the structure of the
population and the effect of the
design of crosses and breeding cycles for a non-selected trait,
at a non-selected locus.
Design of simulations:
Standard setting: as proposed by XU (1998), instead of
performing simulations on a factorial
design of every possible parameter combinations, we chose a
standard setting for each
parameter. The conditions of this standard setting were 21
chromosomes of length 100 cM
each with 11 markers spaced every 10 cM, a QTL segregating at
position 45 on chromosome
1 with heritability 0.1. Npoly=40 polygenic QTL were randomly
placed on the other
chromosomes setting a total genetic heritability (h²g) of 0.435
(with standard deviation 0.051
among the 100 repetitions). For each cycle, about 500
individuals were created at the F6 stage
coming from 100 parents. The studied generation came from the
10th cycle of breeding,
having considered NP=20 founder lines at the beginning of
selection.
Comparison of IBD formula (1) and IBD formula (2) for three
levels of QTL heritability:
Both formulae were tested for the same set of 100 independent
replicates. Three different
quantitative traits per replicate were created at the last
generation of breeding scheme with
QTL heritabilities 0.05, 0.1 and 0.2. Total genetic heritability
(h²g) was initially set to 0.5 but
after 10 generations of crosses (and genetic drift), some
stochastic differences between
replicates appeared. Thus, h²g values are 0.427 (0.063 standard
deviation among 100
replicates) for h²QTL=0.05, 0.435 (0.051) for h²QTL=0.1, and
0.456 (0.047) for h²QTL=0.2.
Different experimental conditions: We first tested the
robustness of formula (2) for different
marker bias conditions. To avoid stochastic differences between
the tested conditions, the 100
same replicates from the standard setting were used before
adding bias to the marker
information. Thus, differences between the results within those
conditions can be directly
attributed to this added bias on marker information. Created
biases included (1) missing
marker information (NA) (10% of randomly placed NA in the
progenies), (2) portion of non-
IBD Alike-In-State (AIS) alleles (a portion of allele codes were
changed to other existing
allele codes in parents and progeny files to obtain 25% of AIS
(i.e. not IBD) alleles in the
whole genome).
We then varied parameters around the standard setting,
changing only one parameter at a
time: these parameters include (1) the total number of founder
lines in the base population
(NP: 40 vs 20); (2) the number of breeding cycles (NG: 20 vs 10);
(3) the marker density (d:
20 vs 10 cM); (4) the number of lines at the mapping generation
(n: 250 vs 500) and the number of
parents (m: 50 vs 100).
We also tested the 20 cM marker density condition (3) with 25%
of AIS alleles in order to
anticipate the effect of this added bias at a lower marker
density. Formula (2) was used for the
analysis of all these variations, except for the NG=20 case,
where we tested both formulae.
Some parameters remained constant for all the simulations: the
total number of chromosomes
(21), the number of polygenic QTL (40), the position of the QTL
on chromosome 1 (45 cM
except for the d=20cM case where the QTL position is 50 cM), the
total genetic heritability
originally set to 0.5.
RESULTS
We report below the results of the two-step variance component
analysis for the different settings.
Due to computing constraints, only every third centiMorgan
was tested for the presence
of a QTL. Under each condition, the detection was performed for
100 random replicates.
Parameter estimates and their standard errors are reported. The
empirical thresholds were
computed from the analysis of 1000 replicated data sets. In the
special case of the estimation
of the QTL position, we measured a confidence interval (CI)
based on the LOD-Drop-Off
(LDO) method (LANDER and BOTSTEIN 1989) calculated for each
significant replicate. With
this interval, we calculated the frequency at which the true QTL
location was included in the
LOD-Drop-Off CI. We also measured the frequency at which the true QTL
position was included in
the CI determined by 4 times the standard deviation of the estimated
position. Finally, for the
significant replicates and at the true QTL position, we
calculated the bias on the QTL
heritability induced by our method, for both formulae. We then
computed the QTL
heritability over an interval including the true QTL
position.
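The LOD-Drop-Off interval and its percentage of inclusion can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses: the function names and the LR values are hypothetical, the interval is read directly off the scanned grid (every third cM) without interpolation, and one LOD unit is converted to LR units by the factor 2 ln(10).

```python
import math

LR_PER_LOD = 2.0 * math.log(10.0)  # one LOD unit expressed in LR units


def lod_drop_interval(positions, lr_profile, drop_lod):
    """Return (left, right): the contiguous run of scanned positions around the
    LR peak whose LR stays within `drop_lod` LOD units of the maximum."""
    peak = max(range(len(lr_profile)), key=lr_profile.__getitem__)
    floor = lr_profile[peak] - drop_lod * LR_PER_LOD
    left = right = peak
    while left > 0 and lr_profile[left - 1] >= floor:
        left -= 1
    while right < len(lr_profile) - 1 and lr_profile[right + 1] >= floor:
        right += 1
    return positions[left], positions[right]


def inclusion_frequency(intervals, true_position):
    """Fraction of intervals (one per significant replicate) containing the true QTL."""
    hits = sum(1 for left, right in intervals if left <= true_position <= right)
    return hits / len(intervals)


positions = [0, 3, 6, 9, 12, 15]               # scanned positions in cM
lr_profile = [1.0, 4.0, 9.0, 12.0, 8.0, 2.0]   # hypothetical LR values
one_lod = lod_drop_interval(positions, lr_profile, 1)   # (6, 12)
two_lod = lod_drop_interval(positions, lr_profile, 2)   # (3, 12)
```

Because the bounds are found separately on each side of the peak, the interval need not be symmetric around the estimated position, unlike an interval built from the standard deviation of the position estimates.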
Figure 3 around here
The average likelihood ratio test profiles over 100 replicates
for the different settings explored
are presented in Figure 3. As expected, we notice a strong
influence of the magnitude of the
QTL effect (i.e. the heritability of the QTL) on the LR profile
(Figure 3a). Formula (2) -
which takes into account ancestor pedigree relationships as
estimated by markers, to infer the
IBD probabilities - clearly outperforms formula (1) for the three
levels of QTL heritabilities in
terms of detection power. For subsequent simulations, we used
formula (2) only.
The method seems robust to biased marker information (Figure
3b). Indeed, the likelihood
ratio test profiles obtained under conditions likely to be found in
real breeding schemes (i.e. non-informative
or wrongly informative alleles, biased map
estimation) show only small
differences between the tested conditions, except for
the 10% missing marker
information in the last generation, where the LR profile lies
below the profile obtained with
complete marker information. Alike-in-state alleles that come
from different founder lines -
hence linked to different, non-IBD QTL alleles - have only a
small influence on the LR curve.
Figures 3c and 3d show the variations around the standard
setting, for the number of founders,
number of breeding cycles, total number of F6 and marker
density. For a constant mean size
of the derived populations (10 F6 lines per parent), switching
from 500 to 250 F6 generates a
strong decrease in detection power (see graph 3c, lower curve).
We also notice on this figure a
very low influence of the number of breeding cycles (NG=20) and
a strong effect of the
marker informativeness represented by the number of founder
lines (NP=40 against 20).
Finally, we notice a low influence of the marker density, except
on the smoothness of the
curve which presents informativeness peaks close to the marker
positions (Figure 3d).
Nevertheless, with 25% of AIS alleles, the QTL detection power
was found more sensitive to
a decrease in marker density from a value of one marker every 20
cM downwards. With a
marker every 10 cM, the decrease in detection power is not very
large (see Figure 3b).
Table 1 around here
The ability of the method to accurately estimate the parameters
of interest can be judged from
the results presented in Table 1. The accuracy of the QTL
estimated position increases with
the switch from formula (1) to formula (2) and also, as
expected, with higher QTL
heritabilities. Nevertheless, formula (2) leads us to
overestimate both QTL and total genetic
heritabilities, more than formula (1) does.
For the different designs explored, the smaller population size
is the parameter which affects
the precision of the QTL position estimate most. Conversely,
the higher number of
founder lines at the beginning of crosses gives the best
estimates, as it increases the chance of
having polymorphic markers between the parents. The influence of
the number of breeding
generations on the accuracy of the parameter estimates also
turned out to be small.
Finally, we notice that the 20 cM density case yields accurate
results even if the number of
markers available to build the relationship matrix and infer the
IBD is divided by two.
Nevertheless, both heritability estimates are less accurate than
with a 10 cM density map.
Table 2 around here
The empirical threshold values of LR test statistics over 1000
replicated simulations are
reported in Table 2. For all the designs, the critical values
are nearly equivalent. This is not
really surprising as the number of parameters being tested in
the random model strategy
remains the same. We also report in Table 2 the average LR test
statistics and the power
estimates for a Type I error α=0.05 over 100 replicated
simulations. First we notice that the
value of the LR test statistic increased with the value of the
QTL heritability and that the
chance to detect QTL is increased by using formula (2).
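The way an empirical threshold and the corresponding power estimate are obtained can be sketched as follows. This is a hedged illustration, not the code used in this study: the function names are hypothetical, the threshold is taken as the order statistic of the null maxima that leaves a fraction α of them above it, and the input values are placeholders rather than simulation output.

```python
def empirical_threshold(null_max_lr, alpha=0.05):
    """Value exceeded by a fraction `alpha` of the maximum LR statistics
    obtained from replicates simulated without a QTL."""
    ordered = sorted(null_max_lr)
    n_exceeding = round(alpha * len(ordered))
    return ordered[len(ordered) - n_exceeding - 1]


def detection_power(max_lr, threshold):
    """Fraction of replicates whose maximum LR exceeds the threshold."""
    return sum(1 for value in max_lr if value > threshold) / len(max_lr)


# Placeholder null distribution: 1000 distinct values standing in for the
# maximum LR of 1000 replicates simulated under the null hypothesis.
null_max_lr = [float(i) for i in range(1, 1001)]
threshold = empirical_threshold(null_max_lr, alpha=0.05)

# Placeholder maxima from four replicates simulated with a QTL.
power = detection_power([900.0, 960.0, 1000.0, 10.0], threshold)
```

With 1000 null replicates and α=0.05, exactly 50 of the null maxima lie above the returned threshold.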
Biased marker information (non-IBD but AIS marker alleles,
missing genotypes) does not
really influence the detection, except for 10% of missing data
where the loss in power reaches
14% in comparison to the standard setting. For the different
experimental designs, a higher
number of founder lines tends to increase the detection power.
Finally, the simulations where
d=20cM and NF6=250 give correct detection powers in comparison
to the standard setting.
Table 3 around here
In Table 3, we report the size of the confidence interval
determined by four standard
deviations of the estimated position or the drop by one and by
two LOD units. These CI have
been established by only taking the significant runs into
account. We also report the
frequency at which the true position (i.e. 45 cM) is included in
the delimited interval
(percentage of inclusion).
We notice that the drop by two LOD units gives the confidence
interval that is closest to 95%
for the percentage of inclusion. In most cases, this interval is
larger than or equal to the one
delimited by four standard deviations. Nevertheless, as the LOD
drop-off method does not
give symmetric intervals, it takes into account the information
of the curve in a better way.
We also notice that the drop off by one LOD unit gives
appropriate confidence intervals only
for the highest QTL heritability.
Tables 4, 5 and 6 around here
Table 4 shows, for the different levels of QTL heritabilities,
the parameter estimates for the
replicates that are significant under the empirical threshold.
We notice that, in comparison to
Table 1 which includes all the replicates, the overestimation of
the QTL heritability is higher.
However, the estimated position is more accurate when averaging
over the significant
replicates only, yielding a smaller standard deviation also.
In Table 5, we present the QTL and total genetic heritabilities
at the true QTL position to
detect possible bias from the method under formula (1) and (2).
We notice that formula (2)
gives more correct estimates at the true QTL position than
formula (1) for all levels of QTL
heritabilities. As formula (2) gives almost unbiased estimates
at the true QTL position, a
corrective factor for the heritability could then be worked out
as a function of the estimated
position. We thus report in Table 6 an attempt to give a more
appropriate estimate of the
heritability for this kind of method. We built an “Averaged
Confidence Interval Heritability”
by averaging the heritability over the confidence interval
around the position estimate (which
is supposed to include the true QTL position), instead of taking
only a point heritability
estimate at the detected position. We notice that this
corrective factor yields results very close
to the true parameter value with formula (2) but underestimates
both heritabilities with
formula (1).
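The “Averaged Confidence Interval Heritability” amounts to the following calculation, given here as a minimal sketch. The function name, the heritability profile and the interval bounds are hypothetical; the bounds are assumed to come from a confidence interval on the position, such as a 2-LOD drop-off interval, computed beforehand on the same scanned positions.

```python
def averaged_ci_heritability(positions, h2_profile, left, right):
    """Mean of the QTL heritability estimates over the scanned positions
    lying inside the confidence interval [left, right]."""
    inside = [h2 for pos, h2 in zip(positions, h2_profile) if left <= pos <= right]
    return sum(inside) / len(inside)


positions = [0, 3, 6, 9, 12, 15]                    # scanned positions in cM
h2_profile = [0.02, 0.06, 0.11, 0.15, 0.10, 0.03]   # hypothetical estimates
detected = 9                                        # position of the LR maximum

point_estimate = h2_profile[positions.index(detected)]
averaged = averaged_ci_heritability(positions, h2_profile, left=3, right=12)
```

In this toy profile the averaged value is lower than the point estimate at the detected position, which is the direction of correction sought when the point estimate is inflated.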
DISCUSSION
Obviously, many statistical methods already exist to map QTL in
inbred plant material;
however, these methods mainly focus on a single bi-parental
cross. Other methods have been
developed to address more challenging population structures (XIE
et al. 1998; XU 1998; YI
and XU 2001, for example). Nevertheless, these methods do not
appear to be easily extended
to highly fragmented populations, at any selfed or back-crossed
generation, and coming from
many different parents. They also do not take into account the
possibility for alleles to be IBD
if ancestor pedigrees are not available.
In this study, we extended the QTL mapping methodology proposed
by GEORGE et al. (2000)
and based on a two-step IBD variance component approach, to
typical plant breeding
populations made up of selfed lines which may have either: one
or two parents in common,
parents related to each other or not related to each other. In
this paper we studied the simplest
possible scenario where no relevant information about the
ancestors’ pedigrees was available.
The power and accuracy of this method were assessed using
simulated data mimicking
conventional breeding programs in cereals, in an effort to
reproduce actual conditions of
marker and gene frequencies and linkage disequilibrium across
the parental lines.
In constructing the matrix of IBD probabilities, a more thorough
use of the marker data was
achieved by calculating the genetic similarities between the
parents. The extent to which the
matrix G was modified from formula (1) to formula (2) is quite
large. The proportion of PIBD
values equal to zero with formula (1) – those values between
non-sib lines – and replaced by
non-zero values, was equal to 91%, as calculated from the
distribution of family sizes featured
in Figure 2. The inferred relatedness patterns between non-sibs
lead to a substantial
improvement of the accuracy of the position estimates and of the
QTL detection power.
Nevertheless, a downside of this improvement is a stronger
overestimation of the QTL
heritability as we shall discuss below.
The method performed quite well for all the tested sources of
bias (missing marker data, non-
IBD alike-in-state alleles). Two complementary explanations can
be put forward to explain
the observed loss in statistical power in the settings with
non-IBD AIS alleles, observed with
the lower marker density:
1- The lower chance of having informative markers flanking the
interval being scanned.
Informative flanking markers had to be fetched further
apart on average,
which in turn decreased the accuracy of the estimates of the putative QTL’s
allelic state.
2- A higher proportion of alleles that are AIS but not IBD
should also have generated an
upward bias in some or most of the genetic similarity estimates between
the parents. This, in
turn, will have affected the estimates of the IBD probabilities and,
concomitantly, of the
additive relationships between individuals, thereby generating an
upward bias in these
estimates for non-full-sibs.
In our population design, there is a strong within-family
linkage disequilibrium that can be
exploited by comparing the parents’ genotypes to the current F6 lines,
which accumulated relatively
few cross-overs. Formula (1) is solely based on the utilization
of this linkage disequilibrium,
and is similar to that used by XIE et al. (1998). Formula (2)
can be viewed, loosely speaking,
as an attempt to merge, to some extent, several families
together on the basis of the likelihood
that the parents share the same alleles identical-by-descent at
the putative locus. The power
increase obtained by using formula (2) follows the same
principle as that obtained by XIE et
al. (1998) in their Table 7 when they switched from a 100 x 5
sampling strategy to a less
fragmented 50 x 10 strategy.
The same comparison can be made to explain the increased
detection power observed after 20
breeding cycles instead of 10. Due to the very uneven parental
contribution to the crossing
scheme at each breeding cycle, some genetic drift takes place
regularly during the breeding
cycles leading to the loss of certain haplotypes. Cross-overs
increasingly occur between
chromosome blocks with similar haplotypes and are thus
genetically ineffective. This could
be compared to a situation where a smaller number of “effective”
parents were used, which,
for a constant progeny size, gives enhanced power (XIE et al.,
1998). This explanation is
confirmed by the comparison between the use of formula (1) and
formula (2) for the same 100
replicates, for the NG=20 setting. Formula (2) anticipates the
similarities between the parents
in a better way, hence yielding better detection power.
QTL heritability readjustment before mapping
In preliminary simulations, QTL heritabilities changed, from the
founder line generation to
that of the mapping F6 lines, due to random genetic drift and
non-panmictic conditions in
small populations, hence yielding different heritabilities at
the last generation. What we are
concerned with is the QTL heritability at the current mapping
generation to allow sounder
comparison with published results. It therefore seemed sensible
to choose that one, as
opposed to that at the founder generation, as an entry parameter
for our study. It is also to
conciliate a germplasm history effect and a resulting QTL
heritability set as a parameter, that
we did a systematic readjustment of this parameter at the end of
the ten breeding cycles. In so
doing, we did not compromise too much with the assessment of the
germplasm history effect
as far as allelic frequencies and the ratio of the QTL effect
over the other polygenic effects are
concerned.
Overestimation of the QTL heritability and proposals for a
corrected heritability
The conclusions of CHARCOSSET and GALLAIS (1996) about the
overestimation of h²QTL by the R²
estimator, in the standard fixed-effect ANOVA framework, cannot
be directly extended to our
case, where random effects are fitted to the QTL. In fact, if
the model is known, REML
estimates of σu² and σv² are unbiased, since the estimates of
the fixed effects and the
predictions of the random effects are unbiased (the “U” of the
BLUE and BLUP acronyms).
However, in QTL mapping, the model is unknown inasmuch as we do
not know whether
there is a QTL or not, or if we assume that there is one, we do
not know which locus must
have its segregation’s effect fitted in the model.
In this paper, we reported the mean ĥ²QTL over all stochastic
realisations, in addition to that over
the significant QTL only. This allowed us to distinguish
between the part of the overestimation
due to the Beavis effect (BEAVIS 1994) and other remaining
sources of bias. Some bias did remain.
However, when, in our simulations, h²QTL was estimated at
the QTL’s real position only
(which, again, cannot be done with real data), this bias
disappeared with formula (2). This
suggests that:
1- The uncertainty over the QTL’s position generates by itself a
substantial bias in the QTL
heritability estimates. Since the locus retained is
the one that yields the model
with the maximum likelihood ratio (Lmax), it is also likely to
be the locus where a
chance association with the residual component of the phenotype
is strongest. Thus,
in expectation, the residual vector will
decrease or increase ĥ²QTL with the same frequency at the
QTL’s real position,
whereas it will increase it more often at Lmax,
since it is, by definition,
the locus most strongly associated with the phenotypes.
2- Since formula (2) recovers more of the real information, it
was to be expected that
ĥ²QTL would in this case approach h²QTL more closely and that
formula (1) would give
an underestimate at the QTL’s real position.
We shall focus on formula (2) (which yields more accurate
heritability estimates at the true
QTL position) to propose a correction of the (over)estimated
heritability at the position of
maximum likelihood. At this stage, we focused on the significant
replicates only to be
representative of what we would find with real data. We plotted
the LR curve and that of the
putative QTL heritability estimate from many replicates along
the scanned chromosome. The
highest LR at a certain position also corresponds to the highest
detected QTL heritability as
exemplified by the fact that both curves showed the same
pattern, i.e. the estimated QTL
heritability and the LR curve followed proportional values along
the y-axis. This property can
be mathematically demonstrated.
For lower levels of QTL heritabilities, the position was poorly
estimated for many replicates
(high standard deviation of the position estimates for the set
of significant replicates), and the
QTL effects were overestimated. This was interpreted as being
due to the Beavis effect (BEAVIS
1994), because chance association between residuals and
genotype can yield a
maximum for the QTL heritability away from the true QTL
position.
On the contrary, when drawing the same curves for the replicates
with the highest QTL
heritability (i.e. 0.2), we observed the same pattern for the
curves (they still follow each other,
as expected), but the detected position was more often very
close to the true QTL position
(yielding a smaller standard deviation on QTL position estimates
for the set of significant
replicates). Thus, the variance component method applied to high
QTL heritabilities did not
yield such a bias on QTL heritability estimates. This was
interpreted as due to a lower effect
of the residual component, contrary to the case of lower
heritabilities.
To summarize, we pointed out that:
(i) the estimated QTL heritability curve follows the LR
curve.
(ii) formula (2) yields the correct QTL heritability estimate at
the true QTL position,
(iii) the poor heritability estimates at the detected position
are due to wrong position
estimates, and to the averaging of heritabilities, which were
obtained at their own
different position estimates (hence not at the true one).
We worked out a kind of confidence interval for the estimated
heritability, based on the
confidence interval for the QTL position estimate, bearing in
mind that the expectation of the
position where the estimated heritability is equal to the true
one lies at the true QTL position.
Thus a pair of loci that delimits a confidence interval (hence a
set of positions) that contains the
true QTL position will also delimit a corresponding set of
estimated heritabilities (at the
different likely QTL positions within this interval) that
contains the true one. The expectation
of the heritabilities calculated in the middle of the interval
will be higher than the real QTL
heritability by a certain factor that only depends on the real
QTL heritability and on the
experimental design. On the other hand, towards the ends of the
confidence interval, the
heritability of the QTL would, on average, be underestimated, by
a factor that depends not
only on QTL heritability and experimental design but also on the
chosen stringency of the
confidence interval. Hence, a less stringent confidence interval
will contain a higher
proportion of underestimates. What we observed was that a 95%
confidence interval (as
roughly determined by a 2-LOD drop-off interval) seems to
contain just the right proportion
of under- and overestimates so that when we average the
estimated heritabilities over the
confidence interval, we obtain an unbiased estimate.
Leads for improvement:
Taking all the markers to calculate the genetic similarities
between the parents seems to be the
most appropriate solution in order to calculate the πi,j at each
putative locus. As for the
polygenic term of the model, one may argue that the calculation
of the matrix A (in v ~
(0, Aσv²)) could be more precise if it was based on the markers
that are actually linked to some
polygenes, i.e. to some QTLs, instead of using all the markers
indiscriminately as we did in
this study. Implementing this scheme would require a forward
iterative search procedure
whereby the first step of QTL mapping would be carried out with
no polygenic term. The
second step would add a polygenic term with A being equal to the
arithmetic mean of the G
matrices of the different positions of the QTL detected in step
one. New QTLs could then be
detected due to an increase in power, since some background
genetic noise would have been
removed or at least, it is expected that the new QTL position
estimates will be more precise.
Hence, in the subsequent round these new position estimates
would be used to update A. And
the QTL search could be iterated until a convergence criterion
is satisfied. This procedure is
somewhat analogous to one option of the Composite Interval
Mapping proposed by ZENG
(1994) in which a best subset of markers was chosen by stepwise
regression, then used as
cofactors in the linear model adjustment in the QTL search. This
procedure, though, can bring
an advantage only if a few QTLs explain the genetic variation as
opposed to many with a
small effect, all over the genome.
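The forward iterative search outlined above can be sketched as the loop below. This is only a skeleton under stated assumptions: scan is a hypothetical, user-supplied function standing for a complete variance component genome scan (not implemented here), which returns the positions of the detected QTL given a relationship matrix A, or given None when no polygenic term is fitted; g_by_position is an assumed mapping from scanned positions to their G matrices; and convergence is taken to mean that two successive rounds detect the same positions.

```python
def mean_matrix(matrices):
    """Element-wise arithmetic mean of equally sized square matrices."""
    size, count = len(matrices[0]), len(matrices)
    return [[sum(m[i][j] for m in matrices) / count for j in range(size)]
            for i in range(size)]


def iterative_search(g_by_position, scan, max_rounds=10):
    """Alternate between building A from the detected QTL and rescanning.
    Returns the detected positions and the A matrix used in the last scan."""
    detected = frozenset(scan(None))  # step one: scan without a polygenic term
    a_matrix = None
    for _ in range(max_rounds):
        if not detected:
            break
        # A is the arithmetic mean of the G matrices at the detected positions.
        a_matrix = mean_matrix([g_by_position[p] for p in sorted(detected)])
        updated = frozenset(scan(a_matrix))
        if updated == detected:  # convergence: same positions as before
            break
        detected = updated
    return sorted(detected), a_matrix


# Toy example with two lines and two candidate positions (hypothetical values).
g_by_position = {
    10: [[1.0, 0.0], [0.0, 1.0]],
    40: [[1.0, 1.0], [1.0, 1.0]],
}


def toy_scan(a_matrix):
    # Stand-in for a real scan: a second QTL appears once A is fitted.
    return [10] if a_matrix is None else [10, 40]


qtl_positions, a_final = iterative_search(g_by_position, toy_scan)
```

The max_rounds guard is an added safety measure, since nothing in the procedure as described guarantees that the set of detected positions stabilises.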
There is still some scope for a more accurate and probably less
biased estimation of the
coefficients of co-ancestries between parents and individuals in
order to better estimate the
parameters of the model and increase the QTL detection power. A
first attempt to improve the
PIBD estimates could be to subtract from all genetic
similarities an estimated proportion of
alleles that supposedly unrelated lines have in common:
by definition, these
shared alleles would be alike in state only and not IBD. This
method was suggested by
MELCHINGER et al. (1991). The use of STRUCTURE (PRITCHARD et al. 2000;
FALUSH et al. 2003 for
linked loci), for example, would make it possible to group parents
according to a common selection
history. Our genetic similarities between the parents would be
replaced by the scalar products
of the parents’ decompositions over the inferred clusters. The
remainder of the PIBD
calculation would be identical. Likewise, matrix A could be
computed from the same
decomposition.
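The first of these suggestions, removing a baseline of alike-in-state sharing from the genetic similarities, could look as follows. This is a sketch under assumptions, not a method tested in this study: the function names are hypothetical, the baseline S0 is taken as the mean similarity among pairs of lines assumed to be unrelated, and the rescaling by 1 - S0 (which keeps the similarity of a line with itself at 1) goes beyond the plain subtraction mentioned in the text.

```python
def baseline_from_unrelated(similarities):
    """Mean marker similarity among pairs of lines assumed to be unrelated."""
    return sum(similarities) / len(similarities)


def ibd_similarity(similarity, baseline):
    """Similarity corrected for alike-in-state sharing:
    (S - S0) / (1 - S0), with negative values set to 0."""
    if not 0.0 <= baseline < 1.0:
        raise ValueError("baseline must lie in [0, 1)")
    return max(0.0, (similarity - baseline) / (1.0 - baseline))


# Hypothetical similarities between pairs of supposedly unrelated lines.
s0 = baseline_from_unrelated([0.1, 0.2, 0.3])

corrected_identical = ibd_similarity(1.0, 0.2)    # identical lines stay at 1
corrected_baseline = ibd_similarity(0.2, 0.2)     # baseline sharing maps to 0
corrected_parents = ibd_similarity(0.6, 0.2)      # intermediate case
```

Values below the baseline are truncated at zero, so that pairs sharing fewer alleles than supposedly unrelated lines are simply treated as sharing no IBD alleles.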
The proposed method has proved to be powerful in detecting
medium sized QTL (h²=0.1) in a
typical set of inbred lines from a complex pedigree, such as
those created in common cereal
breeding programs. The use of this kind of method could
increase the relevance and cost
effectiveness of quantitative trait loci mapping in applied
contexts and could provide an
alternative to the development of specifically designed
recombinant populations, by exploiting
the genetic variation actually used by plant breeders. JANSEN et
al. (2003) proposed to use
parental haplotype sharing to routinely map QTL in breeding
populations. The use of
haplotypes is challenging in this kind of method, where only
little information is available on
founders and on the relationships between parents. It is even
more challenging with the
increasing use of SNPs versus microsatellite markers. Combining
the proposed IBD method
with the method proposed by JANSEN et al. (2003) would be a good
solution for mapping QTL
and properly estimating the haplotype effects, at a lower genotyping
cost. One would first detect
QTL within an IBD-based variance component framework at a low
marker density then use a
higher marker density for some QTL of interest to build
haplotypes and estimate their effects
within a fixed-effect framework. Such locally high density
mapping could allow identifying
the haplotypes of minimum length that have the most promising
effect. Besides, it would
directly provide markers to manipulate these haplotypes in
breeding schemes.
The methodology developed in this article is currently applied
to the analysis of real wheat
breeding data.
Acknowledgments: The authors are grateful to the Ministère de
l'Economie, des Finances et
de l'Industrie for its financial support (ASG program n° 01 04
90 6058).
LITERATURE CITED
ALMASY, L., and J. BLANGERO, 1998 Multipoint quantitative trait
linkage analysis in general
pedigrees. Am. J. Hum. Genet. 62: 1198-1211.
BEAVIS W.D., 1994 The power and deceit of QTL experiments:
lessons from comparative
QTL studies. American Seed Trade Association. 49th Annual Corn
and Sorghum Research
Conference. Washington D.C.
BERNARDO, R., 1993 Estimation of coefficient of coancestry using
molecular markers in
maize. Theor. Appl. Genet. 85: 1055-1062.
BINK, M. C. A. M., P. UIMARI, M. SILLANPÄÄ, L. JANSS, and R.
JANSEN, 2002 Multiple QTL
mapping in related plant populations via a pedigree-analysis
approach. Theor. Appl. Genet.
104: 751-762.
CHARCOSSET, A., and A. GALLAIS, 1996 Estimation of the
contribution of quantitative trait
loci (QTL) to the variance of a quantitative trait by means of
genetic markers. Theor. Appl.
Genet. 93: 1193-1201.
COCKERHAM, C. C., 1983 Covariances of relatives from
self-fertilization. Crop Science 23:
1177-1180.
DONINI P., J. R. LAW, R. M. D. KOEBNER, J. C. REEVES and R. J.
COOKE, 2000 Temporal
trends in the diversity of UK wheat. Theor. Appl. Genet. 100:
912-917.
FALUSH D., M. STEPHENS and J. K. PRITCHARD, 2003 Inference of
population structure using
multilocus genotype data: linked loci and correlated allele
frequencies. Genetics 164: 1567-
1587.
GEORGE A. W., P. M. VISSCHER and C. S. HALEY, 2000 Mapping
quantitative trait loci in complex
pedigrees: a two-step variance component approach. Genetics 156:
2081-2092.
GILMOUR, A. R., B. R. CULLIS, S. J. WELHAM and R. THOMPSON, 1998
ASREML. Program
User Manual. Ed. Orange Agricultural Institute, New South Wales,
Australia.
GIMELFARB, A., and R. LANDE, 1994 Simulation of marker-assisted
selection in hybrid
populations. Genet. Res. 63: 39-47.
GIMELFARB, A., and R. LANDE, 1995 Marker-assisted selection and
marker-QTL associations
in hybrid populations. Theor. Appl. Genet. 91: 522-528.
HALEY, C.S., and S. KNOTT, 1992 A simple regression method for
mapping quantitative trait
loci in line crosses using flanking markers. Heredity 69:
315-324.
HARRIS, D. L., 1964 Genotypic covariances between inbred
relatives. Genetics 50: 1319-
1348.
HOSPITAL, F., L. MOREAU, F. LACOUDRE, A. CHARCOSSET and A.
GALLAIS, 1997 More on the
efficiency of marker-assisted selection. Theor. Appl. Genet. 95:
1181-1189.
JANSEN, R. C., 1993 Interval mapping of multiple quantitative
trait loci. Genetics 135: 205-
211.
JANSEN R. C., J-L. JANNINK and W.D. BEAVIS, 2003 Mapping
quantitative trait loci in plant
breeding populations: use of parental haplotype sharing. Crop
Sci. 43: 829-834.
JIANG, C., and Z-B. ZENG, 1995 Multiple trait analysis of
genetic mapping for quantitative
trait loci. Genetics 140: 1111-1127.
KEMPTHORNE, O., 1955 The correlation between relatives in inbred
populations. Genetics 40:
681-691.
KOROL, A., Y. RONIN, and V. KIRZHNER, 1995 Interval mapping of
quantitative trait loci
employing correlated trait complexes. Genetics 140:
1137-1147.
LANDE R., and R. THOMPSON, 1990 Efficiency of marker-assisted
selection in the
improvement of quantitative traits. Genetics 124: 743-756.
LANDER, E., and D. BOTSTEIN, 1989 Mapping mendelian factors
underlying quantitative traits
using RFLP linkage maps. Genetics 121: 185-199.
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of
Quantitative Traits. Sinauer
Associates, Sunderland, MA.
MALECOT G., 1948 Les mathématiques de l'hérédité, Ed. Masson et
Cie, Paris.
MELCHINGER A. E., M. M. MESSMER, M. LEE, W. L. WOODMAN and K. R.
LAMKEY, 1991
Diversity and relationships among U.S. maize inbreds revealed by
restriction fragment length
polymorphisms. Crop Sci. 31: 669-678.
MOREAU L., S. LEMARIÉ, A. CHARCOSSET, and A. GALLAIS, 2000
Economic efficiency of
one cycle of marker-assisted selection. Crop Sci. 40:
329-337.
MURANTY, H., 1996 Power of tests for quantitative trait loci
detection using full-sib families
in different schemes. Heredity 76: 156-165.
NEI M. and W. H. LI, 1979 Mathematical model for studying
genetic variations in terms of
restriction endonucleases. Proc. Natl. Acad. Sci. 76:
5369-5373.
PRITCHARD J. K., M. STEPHENS and P. DONNELLY, 2000 Inference of
population structure
using multilocus genotype data. Genetics 155: 945-959.
RODER, M. S., K. WENDEHAKE, V. KORZUN, G. BREDEMEIJER, D.
LABORIE et al., 2002
Construction and analysis of a microsatellite-based database of
European wheat varieties.
Theor. Appl. Genet. 106: 67-73.
SERVIN, B., C. DILLMANN, G. DECOUX and F. HOSPITAL, 2002 MDM a
program to compute
fully informative genotype frequencies in complex breeding
schemes. J. Hered. 93(3): 227-
228.
S-PLUS, 2000 S-PLUS guide to statistical and mathematical
analyses. MathSoft,
Massachusetts Institute of Technology
XIE, C., D. D. G. GESSLER and S. XU, 1998 Combining different
line crosses for mapping
quantitative trait loci using the identical by descent-based
variance component method.
Genetics 149: 1139-1146.
XU, S. 1998 Mapping quantitative trait loci using multiple
families of line crosses. Genetics
148: 517-524.
XU, S., and W. R. ATCHLEY, 1995 A random model approach to
interval mapping of
quantitative trait loci. Genetics 141: 1189-1197.
YI, N., and S. XU, 2001 Bayesian mapping of quantitative trait
loci under complicated mating
designs. Genetics 157: 1759-1771.
ZENG, Z-B., 1994 Precision mapping of quantitative trait loci.
Genetics 136: 1457-1468.
ZENG, Z-B., 1993 Theoretical basis of separation of multiple
linked gene effects on mapping
quantitative trait loci. Proc. Natl. Acad. Sci. USA. 90:
10972-10976.
TABLE 1
Estimates of the position, QTL and total genetic heritabilities;
h²QTL and h²g respectively
Experimental design       h²g             Position        ĥ²QTL           ĥ²g
(1) QTL heritability and formula
h²QTL=0.05 formula (1)    0.427 (0.068)   49.44 (25.42)   0.067 (0.035)   0.427 (0.108)
h²QTL=0.05 formula (2)    0.427 (0.068)   46.45 (21.84)   0.085 (0.041)   0.432 (0.107)
h²QTL=0.1 formula (1)     0.435 (0.051)   47.76 (20.88)   0.099 (0.042)   0.442 (0.081)
h²QTL=0.1 formula (2)     0.435 (0.051)   45.52 (17.72)   0.129 (0.056)   0.450 (0.083)
h²QTL=0.2 formula (1)     0.456 (0.047)   44.91 (7.35)    0.192 (0.058)   0.451 (0.083)
h²QTL=0.2 formula (2)     0.456 (0.047)   45.73 (6.89)    0.228 (0.065)   0.467 (0.091)
(2) Different experimental designs
Standard setting          0.435 (0.051)   45.52 (17.72)   0.129 (0.056)   0.450 (0.083)
AIS 25%                   0.435 (0.051)   47.15 (17.99)   0.135 (0.060)   0.445 (0.084)
NA 10%                    0.435 (0.051)   48.55 (18.87)   0.131 (0.098)   0.434 (0.094)
NP=40                     0.451 (0.067)   48.42 (13.75)   0.136 (0.053)   0.446 (0.108)
NG=20 formula (1)         0.413 (0.059)   48.00 (18.75)   0.091 (0.039)   0.430 (0.091)
NG=20 formula (2)         0.413 (0.059)   46.89 (15.96)   0.145 (0.049)   0.436 (0.094)
d=20cM                    0.433 (0.052)   52.27 (23.10)   0.149 (0.070)   0.395 (0.088)
d=20, AIS 25%             0.433 (0.052)   52.87 (24.80)   0.147 (0.076)   0.399 (0.087)
Npar=50, NF6=250          0.403 (0.077)   49.86 (23.25)   0.159 (0.070)   0.441 (0.149)
The standard setting is a QTL heritability of 0.1, a total
genetic heritability (h²g) of 0.435
(with standard deviation 0.051 among the 100 replicates) and a
10 cM marker density, with a
QTL at 45 cM. About 500 individuals are created at the F6 stage,
coming from 100 parents, for
each breeding cycle. h²g is the result of the readjustment of
the residuals in order to obtain
a given target h²QTL. The studied generation comes from the 10th
cycle of breeding, having
considered 20 founder lines at the beginning of selection.
(1) Each set of runs differs by the simulated QTL heritability
noted in column one, and
by the IBD formula tested. IBD formula (1) only takes into
account half-sib and full-sib
relationships, while formula (2) adds ancestor relationships
estimated by markers.
(2) Each additional set of runs differs from the standard setting
by the simulation
parameter noted in column one. Except for the NG=20 case, all the
other settings are
tested with formula (2) only.
Means and standard deviations (in parentheses) are calculated
over the 100 replicates.
TABLE 2
LR threshold, test statistic and QTL detection power
                          Threshold   Test statistic   Power (%)
(1) QTL heritability and formula
h²QTL=0.05 formula (1)    4.08        4.62 (3.52)      47
h²QTL=0.05 formula (2)    3.96        5.23 (3.97)      58
h²QTL=0.1 formula (1)     4.08        7.75 (5.06)      71
h²QTL=0.1 formula (2)     3.96        9.76 (6.71)      80
h²QTL=0.2 formula (1)     4.08        22.03 (10.9)     100
h²QTL=0.2 formula (2)     3.96        23.69 (10.8)     100
(2) Different experimental designs
Standard setting          3.96        9.76 (6.71)      80
AIS 25%                   4.28        9.91 (6.56)      72
NA 10%                    4.30        7.81 (5.53)      66
NP=40                     3.70        11.29 (7.09)     89
NG=20 formula (1)         3.72        5.95 (4.15)      65
NG=20 formula (2)         3.62        10.62 (5.75)     91
d=20                      4.25        9.21 (6.81)      75
d=20, AIS 25%             4.94        8.06 (6.50)      64
Npar=50, NF6=250          4.00        7.96 (3.66)      62
See Table 1 for the standard setting. Each additional set of runs differs from the standard
setting by the parameter change noted in column one. Threshold is the empirical
threshold calculated from 1000 replicates. Test statistic is the mean (standard deviation
in parentheses) of the maximum of the LR test over the 100 replicates. Power is the
percentage of replicates with a maximum LR exceeding the empirical threshold.
TABLE 3
Accuracy of three methods used to infer Confidence Intervals (CI)
CI- 4 S.D. % 4 S.D. CI- 1 LDO % 1 LDO CI- 2 LDO % 2 LDO
(1) QTL heritabilities and IBD formula
h²QTL=0.05 formula (1) 86.40 91 41.78 79 86.66 97
h²QTL=0.05 formula (2) 72.20 93 41.18 79 85.43 97
h²QTL=0.1 formula (1) 63.28 90 29.10 82 74.75 99
h²QTL=0.1 formula (2) 63.40 93 31.04 86 66.62 96
h²QTL=0.2 formula (1) 29.40 96 17.26 90 34.33 97
h²QTL=0.2 formula (2) 26.00 98 15.21 91 28.21 97
(2) Different experimental designs
Standard Setting 63.40 93 31.04 86 66.62 96
AIS 25% 63.84 92 26.4 80 62.9 97
NA 10% 63.24 91 28.48 80 67.6 95
NP=40 52.24 94 26.66 85 60.34 97
NG=20, formula (1) 67.60 91 43.95 72 82.80 96
NG=20, formula (2) 62.20 91 27.30 76 66.44 97
d=20 93.60 96 36.35 75 73.96 94
d=20, AIS 25% 94.56 95 35.08 67 75.61 91
Npar=50, NF6=250 88.16 89 45.61 79 82.82 95
See Table 1 for the standard setting. “CI- 4 S.D.” represents the size of the CI calculated as
four times the standard deviation of the estimated QTL position over the significant runs, and
“% 4 S.D.” represents the percentage of times that the true position is included within the
delimited interval. “CI- 1 LDO” and “CI- 2 LDO” represent the size of the CI as determined
by a drop of one and two LOD units respectively (multiply by 2*ln(10) to convert to LR units).
“% 1 LDO” and “% 2 LDO” represent the percentage of times that the true position
is included within the delimited interval.
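The interval definitions in this note can be sketched in Python as follows. This is an illustration only: the function names, the contiguous-region rule around the peak, the use of the sample standard deviation (`ddof=1`) and the toy LR profile are all assumptions, not details taken from the study. The only quantity taken from the note is the conversion factor 2*ln(10) between LOD and LR units.

```python
import math

import numpy as np

# One LOD unit corresponds to 2*ln(10) LR units (about 4.605).
LOD_TO_LR = 2.0 * math.log(10.0)


def lod_drop_interval(positions, lr_profile, lod_drop=1.0):
    """Bounds of the contiguous region around the LR peak where the
    profile stays within `lod_drop` LOD units of the maximum.

    ASSUMPTION: the interval is grown outward from the peak on the grid of
    tested positions; interpolation between grid points is not attempted.
    """
    positions = np.asarray(positions, dtype=float)
    lr = np.asarray(lr_profile, dtype=float)
    peak = int(np.argmax(lr))
    cutoff = lr[peak] - lod_drop * LOD_TO_LR
    left = peak
    while left > 0 and lr[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lr) - 1 and lr[right + 1] >= cutoff:
        right += 1
    return float(positions[left]), float(positions[right])


def four_sd_interval_size(estimated_positions):
    """CI size taken as four times the standard deviation of the
    estimated QTL positions (ASSUMPTION: sample SD, ddof=1)."""
    return 4.0 * float(np.std(np.asarray(estimated_positions, dtype=float), ddof=1))


def coverage_percent(intervals, true_position):
    """Percentage of intervals that contain the true QTL position."""
    hits = [low <= true_position <= high for low, high in intervals]
    return 100.0 * sum(hits) / len(hits)


# Toy LR profile on a 10 cM grid (invented values, peak at 40 cM).
grid = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
profile = [0, 1, 3, 6, 10, 8, 4, 2, 1, 0]
print(lod_drop_interval(grid, profile, lod_drop=1.0))  # (30.0, 50.0)
print(lod_drop_interval(grid, profile, lod_drop=2.0))  # (10.0, 80.0)
```

In this toy profile the one-LOD interval spans 20 cM and the two-LOD interval 70 cM, which shows how the wider drop trades interval size for coverage of the true position.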
TABLE 4
Estimates of the QTL parameters for different levels of QTL heritability under the
empirical threshold
QTL heritability Formula h²g Position ĥ2QTL ĥ2g
h²QTL=0.05 Formula (1) 0.427 (0.068) 47.55 (21.61) 0.092 (0.032) 0.429 (0.110)
h²QTL=0.05 Formula (2) 0.427 (0.068) 45.98 (18.05) 0.110 (0.030) 0.437 (0.109)
h²QTL=0.1 Formula (1) 0.435 (0.051) 49.56 (15.82) 0.117 (0.035) 0.452 (0.081)
h²QTL=0.1 Formula (2) 0.435 (0.051) 45.56 (15.85) 0.143 (0.051) 0.450 (0.08)
h²QTL=0.2 Formula (1) 0.456 (0.047) 44.91 (7.35) 0.192 (0.058) 0.451 (0.083)
h²QTL=0.2 Formula (2) 0.456 (0.047) 45.49 (6.50) 0.230 (0.062) 0.471 (0.081)
Each set of runs differs by the simulated QTL heritability noted in column one, and by the
IBD formula tested. Means and standard deviations (in parentheses) are reported only for the
significant replicates among the 100 (their number is given in the “Power” column of Table 2).
TABLE 5
Estimates of the QTL and total genetic heritabilities at the true QTL position (45)
Simulated values: h²QTL=0.05, h2g=0.427 | h²QTL=0.1, h2g=0.435 | h²QTL=0.2, h2g=0.456
Formula ĥ2QTL ĥ2g | ĥ2QTL ĥ2g | ĥ2QTL ĥ2g
(1) 0.036 (0.037) 0.426 (0.106) | 0.068 (0.048) 0.442 (0.080) | 0.191 (0.056) 0.455 (0.081)
(2) 0.051 (0.047) 0.430 (0.108) | 0.094 (0.064) 0.447 (0.080) | 0.225 (0.066) 0.471 (0.094)
Formulas (1) and (2) are tested for three levels of QTL heritability at the true QTL position.
h2g results from readjusting the residuals in order to obtain a given target h2QTL.
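One way to read the readjustment of the residuals is sketched below in Python. This is an assumption-laden illustration and not the simulation code of the study: it assumes that the phenotypic variance is the sum of a QTL variance, a polygenic variance and a residual variance, and that the residual variance alone is rescaled until the QTL explains the target share. The function names and the numerical values are hypothetical.

```python
def residual_variance_for_target(var_qtl, var_polygenic, h2_qtl_target):
    """Residual variance that makes the QTL explain h2_qtl_target of the
    phenotypic variance.

    ASSUMPTION: phenotypic variance = var_qtl + var_polygenic + var_residual.
    """
    total = var_qtl / h2_qtl_target
    var_residual = total - var_qtl - var_polygenic
    if var_residual < 0:
        raise ValueError("target h2QTL cannot be reached with a non-negative residual variance")
    return var_residual


def realized_heritabilities(var_qtl, var_polygenic, var_residual):
    """Return (h2QTL, h2g) implied by the three variance components."""
    total = var_qtl + var_polygenic + var_residual
    return var_qtl / total, (var_qtl + var_polygenic) / total


# Toy values (invented): QTL variance 1, polygenic variance 3, target h2QTL 0.1.
ve = residual_variance_for_target(1.0, 3.0, 0.1)
h2_qtl, h2_g = realized_heritabilities(1.0, 3.0, ve)
print(ve, h2_qtl, h2_g)
```

Under this reading, h2g is not set directly: it follows from the variance components once the residual variance has been fixed, which would explain why it varies slightly between replicates and between target h2QTL levels.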
TABLE 6
Averaged Confidence Interval Heritability for the significant replicates over the
2 Lod-Drop-off units interval
heritabilities: h²QTL=0.05, h2g=0.427 | h²QTL=0.1, h2g=0.435 | h²QTL=0.2, h2g=0.456
No-Selection ĥ2QTL ĥ2g | ĥ2QTL ĥ2g | ĥ2QTL ĥ2g
(1) 0.049 (0.028) 0.405 (0.104) | 0.072 (0.039) 0.434 (0.083) | 0.158 (0.062) 0.438 (0.085)
(2) 0.054 (0.034) 0.418 (0.109) | 0.100 (0.063) 0.438 (0.081) | 0.194 (0.067) 0.459 (0.082)
Formulas (1) and (2) are tested for three levels of QTL heritability.
h2g results from readjusting the residuals in order to obtain a given target h2QTL.
FIGURE 1
[Diagram of the simulated breeding scheme. Recoverable labels: creation of the QTL and polygene effects; NP sub-populations (1, 2, 3, ..., NP; Q1, Q2, Q3, ..., QNP); founder lines with full marker-QTL linkage disequilibrium; G0 generation; circular crosses; for i=10 breeding cycles: parents P1, P2, ..., P79, P80 + P81-P100 (overlapping), matrix of crosses over P1 to P100, 1 to 4 progenies per cross + half-sib relationships, Gi generation of 500 F6 lines; G10 generation; QTL detection.]
FIGURE 2
[Plot. x-axis: Parent name (ticks P1 to P96); y-axis: Number of progenies / parent (ticks 0 to 80); series: Mean, Standard Deviation.]
FIGURE 3
FIGURE LEGENDS:
FIGURE 1: Simulation of the breeding scheme.
FIGURE 2: Mean and standard error of the half-sib family size over 1000 replicates created by
the “matrix of crosses” function under the standard setting (100 parents, 500 F6).
FIGURE 3: Comparison of the LR profiles for (a) different levels of QTL heritability under
formulas (1) and (2), (b) different kinds of biased marker information for the standard setting
under formula (2), (c) differences in the breeding scheme compared to the standard setting, and
(d) a density of one marker every 20 cM, with 25% of AIS non-IBD alleles.