Brigham Young University
BYU ScholarsArchive
Theses and Dissertations
2006-07-19
Simple Sequence Repeat Development, Polymorphism and Genetic Mapping in Quinoa (Chenopodium quinoa Willd.)
David Jarvis, Brigham Young University - Provo
Follow this and additional works at: https://scholarsarchive.byu.edu/etd
Part of the Animal Sciences Commons
BYU ScholarsArchive Citation
Jarvis, David, "Simple Sequence Repeat Development, Polymorphism and Genetic Mapping in Quinoa (Chenopodium quinoa Willd.)" (2006). Theses and Dissertations. 504. https://scholarsarchive.byu.edu/etd/504
This Thesis is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected], [email protected].
in partial fulfillment of the degree requirements for
Master of Science
Department of Plant and Animal Sciences
Brigham Young University
August 2006
BRIGHAM YOUNG UNIVERSITY
GRADUATE COMMITTEE APPROVAL
of a thesis submitted by
David E. Jarvis
This thesis has been read by each member of the following graduate committee and by majority vote has been found to be satisfactory.
Eric N. Jellen, Chair
P. Jeffrey Maughan
R. Paul Evans
BRIGHAM YOUNG UNIVERSITY
As chair of the candidate’s graduate committee, I have read the thesis of David E. Jarvis in its final form and have found that (1) its format, citations, and bibliographical style are consistent and acceptable and fulfill university and departmental style requirements; (2) its illustrative material including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the graduate committee and is ready for submission to the university library.
Eric N. Jellen, Chair, Graduate Committee
Accepted for the Department: Von D. Jolley, Graduate Coordinator
Accepted for the College: Rodney J. Brown, Dean, College of Biology and Agriculture
ABSTRACT
SIMPLE SEQUENCE REPEAT DEVELOPMENT, POLYMORPHISM
AND GENETIC MAPPING IN QUINOA
(CHENOPODIUM QUINOA WILLD.)
David E. Jarvis
Department of Plant and Animal Sciences
Master of Science
Quinoa is an important, highly nutritional grain crop in the Andean region of
South America. DNA markers and linkage maps are important tools for the improvement
of underdeveloped crops such as quinoa. The objectives of this study were to (i) develop
a new set of SSR markers to augment the number of SSR markers available in quinoa,
and (ii) construct a new genetic linkage map of quinoa based on SSRs using multiple
recombinant-inbred line (RIL) populations. Here we report the development of 216 new
polymorphic SSR markers from libraries enriched for GA, CAA, and AAT repeats, as
well as 6 SSR markers developed from BAC-end sequences (BES-SSRs).
Heterozygosity (H) values of the SSR markers ranged from 0.12 to 0.90, with an average
value of 0.56. These new SSR and BES-SSR markers were analyzed on two RIL
mapping populations (designated Population 1 and Population 40), each obtained by
crossing Altiplano and coastal ecotypes of quinoa. Additional markers, including AFLPs,
two 11S seed storage protein loci, a SNP, and the nucleolar organizing region (NOR),
were also analyzed on one or both populations. Linkage maps were constructed for both
populations. The Population 1 map contains 275 markers, including 200 SSR and 70
AFLP markers, as well as five additional markers. The map consists of 41 linkage
groups (LGs) covering 913 cM. The Population 40 map contains 68 markers, including
62 SSR and six BES-SSR markers, and consists of 20 LGs covering 353 cM. Thirty-nine
anchor markers common between both maps were used to combine 15 Population 1 LGs
with 13 Population 40 LGs. The resulting integrated map consists of 13 LGs containing
140 SSR, 48 AFLP, four BES-SSR, one SNP, and one NOR marker spanning a total of
606 cM. A high level of segregation distortion was observed in both populations,
indicating possible chromosomal regions associated with gametophytic factors or QTLs
conferring a selective advantage under the particular growing conditions. As these maps
are based primarily on easily-transferable SSR markers, they are particularly suitable for
applications in the underdeveloped Andean regions where quinoa is grown.
ACKNOWLEDGMENTS
Many thanks are owed to my committee chair, Dr. Eric N. Jellen, and my committee
members, Dr. P. Jeff Maughan and Dr. R. Paul Evans. Their doors and email inboxes
have always been open, and they have always been willing to answer questions or give
advice. Dr. Mikel Stevens and Dr. Craig Coleman have also been helpful and enjoyable
to work with. Thanks to all the quinoa team for making such a unique environment in
which students can learn and grow together. I am indebted to the countless undergrads
that have helped me with the tedious tasks of pouring and loading gels; there are too
many to mention them all. Dr. Olga Kopp and her team, especially Melanie Mallory, did
excellent work on the SSR development. Thanks to Aaron Towers for his help on
AFLPs, Kristin Andelin for her help on SNP mapping, and Jenny N. Thornton for her
help on 11S mapping. I am also grateful to the McKnight Foundation and the Holmes
Family Foundation, whose financial support made my project possible. Most of all, a
very special thanks to my patient and beautiful wife, Stephanie, for her constant love and
support.
TABLE OF CONTENTS
Graduate Committee Approval
Final Reading Approval and Acceptance
Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1: Simple Sequence Repeat Development, Polymorphism, and Genetic Mapping in Quinoa (Chenopodium quinoa Willd.)
    Introduction
    Materials and Methods
    Results and Discussion
    Conclusions
    References
Chapter 2: Tables and Figures
Chapter 3: Literature Review
    Introduction
    History
    Taxonomy
    Breeding
    Biotic and Abiotic Stresses
    Molecular Studies in Quinoa
    Molecular Markers
    Simple Sequence Repeats
    Mapping
    Conclusion
    References
Appendix: Scoring Data
LIST OF FIGURES
Figure 1. Number of clones sequenced and primers developed for each library.
    (A) Total number of sequenced clones, including those containing unique microsatellites, redundant sequences, and those not used for primer design.
    (B) Total number of primers designed, including polymorphic and monomorphic primers, polymorphic primers with high molecular weight amplicons, those polymorphic only between C. berlandieri and quinoa, and primers with poor or no amplification.
Figure 2. Histogram showing number and heterozygosity (H) values of polymorphic markers by repeat length.
Figure 3. Linkage maps.
    (A) Population 1.
    (B) Population 40.
    (C) Integrated map.
Figure 4. Comparison of loci linked to the BSP locus (BSPL) (Ricks 2005) in LG 11 of the Maughan et al. (2004) map and linkage group (LG) 1 of the integrated map reported herein.
LIST OF TABLES
Table 1. Quinoa microsatellite marker name, primary motif, complexity, type, primer sequences, expected PCR product size (PRO), observed number of alleles (ONA), and heterozygosity value (H).
Table 2. Significant database sequence homologies to microsatellite-containing clones for which primers were designed, including E-value, nucleotide and/or protein homology match, organism match, and GenBank accession number, as identified through BLASTN and BLASTX searches.
Table 3. Skewed markers scored and mapped in Populations 1 and 40.
    (A) Name and parental direction of skewed markers scored in Populations 1 and 40.
    (B) Number, linkage group location, and parental direction of skewed markers for Populations 1 and 40.
Table 4. Potentially homoeologous loci and linkage groups (LG) in the Population 1, Population 40, and integrated maps, as indicated by a single primer set amplifying two segregating loci.
Chapter 1: SIMPLE SEQUENCE REPEAT DEVELOPMENT, POLYMORPHISM AND
GENETIC MAPPING IN QUINOA (CHENOPODIUM QUINOA WILLD.)
Introduction
Quinoa (Chenopodium quinoa Willd.) is an allotetraploid (2n = 4x = 36) that shows
amphidiploid inheritance for most qualitative traits (Simmonds 1971; Risi and Galwey
1984; Ward 2000). It is an important South American cereal crop that recently has
gained international attention for the high nutritional value of its grain. Grown primarily
in the Altiplano regions of Bolivia, Ecuador, Chile, and Peru, quinoa has served as an
important staple crop for subsistence farmers for thousands of years (Pearsall 1992;
Wilson 1988; Maughan et al. 2004). It is well-suited as a staple crop in the Altiplano due
to its high protein content (7.5-22.1%) (Tapia et al. 1979) as well as its ability to grow in
the harsh environments that characterize much of the Altiplano, specifically high altitudes
(up to 4000 m), frequent frosts, and saline soils (Risi and Galwey 1984; Vacher 1998;
Prado et al. 2000; Jacobsen et al. 2003; Maughan et al. 2004).
Despite its many desirable nutritional characteristics, however, quinoa is plagued
by a number of biotic stressors. Serious quinoa diseases include bacterial stem rot and
downy mildew (Danielsen et al. 2003). Quinoa is also affected by avian, arthropod, and
nematode pests (Rasmussen et al. 2003; Franco 2003), all of which reduce grain yields.
Thus, a major breeding objective for quinoa includes the development of disease-
resistant, high-yielding varieties. Unlike the other major cereal crops which have
benefited greatly from modern plant breeding techniques and genetic research, genetic
improvement of quinoa has in large part been neglected. Indeed, it has been in just the
last five years that dedicated breeding programs for quinoa have been established.
Essential to these improvement programs is the development of molecular tools,
including genetic markers and genetic maps (Mason et al. 2005).
Genetic markers are essential tools for modern plant breeding research programs
(Staub et al. 1996). They are particularly important for germplasm conservation and
core-collection development (Diwan et al. 1995; Tanksley and McCouch 1997), as well
as in enhanced breeding applications, such as marker assisted selection. Crucial to all of
these marker applications is the development of highly informative, easily transferable,
and reliable genetic markers. The first step towards the development of genetic markers
for quinoa was made by Mason et al. (2005) who reported the development of 208
microsatellite, or simple sequence repeat (SSR), markers. These markers have already
been utilized to assess the genetic diversity among quinoa accessions within the USDA
collection (Christensen 2005) and efforts to genetically characterize Andean and Chilean
germplasm are currently underway. Unfortunately, of the 208 SSR markers identified by
Mason et al. (2005), only 67 were considered highly polymorphic (H>0.7), highlighting
the need for additional marker development.
The first genetic linkage map of quinoa was reported by Maughan et al. (2004).
This map, which covered an estimated 60% of the genome, was based primarily on
amplified fragment length polymorphisms (AFLPs) since relatively few sequence-based
(e.g., SSR) markers were available. Unfortunately, the technical difficulty of AFLP
markers, and of transferring the technology to the developing countries
where quinoa is cultivated, has limited the utility of this map and the
development of marker-assisted selection (MAS) strategies within quinoa improvement programs. Here we report the
construction of a second-generation map based primarily on easily transferable and reliable
SSR markers. Specifically, the objectives of this study were to (i) develop a new set of
polymorphic SSR markers to augment the number of SSR markers available in quinoa,
and (ii) construct a new genetic linkage map of quinoa based primarily on the SSRs
described here and by Mason et al. (2005) using two immortalized recombinant-inbred
line (RIL) populations.
Materials and Methods
Plant material
For SSR development and characterization, seeds from 22 quinoa accessions representing
the geographical distribution of cultivated quinoa were kindly provided by Angel Mujica,
National University of the Altiplano, Puno, Peru, and Alejandro Bonifacio, PROINPA,
La Paz, Bolivia. Seeds from control species pitseed goosefoot (Chenopodium berlandieri
Moq.; PI 595315) were kindly provided by David Brenner (USDA, Chenopodium
curator, Ames, Iowa).
For genetic map construction, two RIL populations were developed (designated
Population 1 and Population 40). Population 1 consists of 82 F6 plants from a cross of
‘Ku-2’ (Chilean coastal ecotype) and ‘0654’ (Peruvian Altiplano ecotype), while
Population 40 is from a cross between ‘NL-6’ (Chilean coastal ecotype) and ‘Chucapaca’
(Bolivian Altiplano ecotype) and consists of 85 F7 plants. Both RIL populations were
produced by self-fertilizing a single F1 plant and allowing plants of subsequent
generations to self-fertilize.
All plants were greenhouse grown in Provo, Utah, USA in 15 cm (6 in) pots using
Sunshine Mix II (Sun Grow, Inc., Bellevue, WA) and were supplemented with nitrogen
fertilizer. Plants were maintained at 25°C under broad-spectrum halogen lamps with a
12-h photoperiod.
DNA extraction
Genomic DNA from all plants was extracted from 30 mg freeze-dried leaf tissue
following procedures described by Sambrook et al. (1989), with modifications described
by Todd and Vodkin (1996).
SSR discovery and analysis
SSR markers were developed from two sources: enriched SSR libraries and BAC-end
sequences from a BAC library reported by Stevens et al. (2006). Enriched libraries
for GA, CAA, and AAT repeats were produced by Genomic Identification Services, Inc.
(Chatsworth, CA) using genomic DNA from the Bolivian Altiplano ecotype ‘Surimi’
according to the protocol described by Mason et al. (2005). Libraries were plated in S-gal
media (Sigma-Aldrich, Inc., Saint Louis, MO) supplemented with 50 mg/l ampicillin, for
blue-white detection of recombinant clones. Recombinant clones were sequenced bi-
directionally using M13 forward (5’ GTA AAA CGA CGG CCA GT) and M13 reverse
(5’ CAG GAA ACA GCT ATG AC) primers at the Arizona Genomics Institute (Tucson,
AZ) using standard ABI Prism Taq dye terminator cycle sequencing methodologies. The
computer program Contig Express (InforMax, Inc., Frederick, MD) was used to
determine consensus sequences, eliminate redundant clones, and identify simple
sequence repeats.
Primers flanking each unique SSR were designed using the web-based computer
program Primer3 version 2.0 (Rozen and Skaletsky 2000) according to the program’s
default parameters. Oligonucleotide primers were synthesized by Integrated DNA
Technologies, Inc. (Iowa City, IA). All primers were screened on a panel of eight DNAs
including seven quinoa and one pitseed goosefoot accession. This panel was used to
eliminate monomorphic primer pairs or primer pairs that failed to amplify. Primers that
amplified successfully on this panel and showed a simple amplification pattern were
subsequently run on a full panel consisting of the 22 quinoa and one pitseed goosefoot
control accession. All data analysis, including calculation of heterozygosity values, was
performed using data obtained from this full panel. Pitseed goosefoot was included to
assess the extent of cross-species amplification of the SSR primers.
SSR primers developed from the quinoa BAC-end sequences (BES-SSRs) were
identified using the web-based computer program Tandem Repeats Finder (Benson
1999). Only repeat sequences with repeat length greater than 20 bp (n=10 for
dinucleotides; n=7 for trinucleotides, etc.) were selected for primer design using the
program Primer3 version 2.0 (Rozen and Skaletsky 2000) as described previously.
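The length criterion above can be illustrated with a short sketch. The Python function below is not Tandem Repeats Finder and does not reproduce its algorithm; it is a simplified regular-expression scan for perfect di- and trinucleotide repeats. The name `find_ssrs`, its parameters, and the example sequence are hypothetical. The sketch assumes the threshold acts as a minimum total repeat length of 20 bp, which corresponds to the n=10 and n=7 repeat counts quoted above.

```python
import re


def find_ssrs(seq, min_bp=20, motif_sizes=(2, 3)):
    """Return (start, motif, repeat_count) for perfect repeats of at least min_bp.

    Illustrative only: imperfect or compound repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    for size in motif_sizes:
        # Smallest repeat count whose total length reaches min_bp
        # (10 for dinucleotides, 7 for trinucleotides when min_bp is 20).
        min_repeats = -(-min_bp // size)
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (size, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAA...
            hits.append((match.start(), motif, len(match.group(1)) // size))
    return hits


# Made-up sequence: a (GA)10 repeat and a (CAA)7 repeat with short flanks.
example = "TTGC" + "GA" * 10 + "CCTG" + "CAA" * 7 + "GT"
print(find_ssrs(example))
```

Because the scan runs left to right, the reported motif depends on where the repeat starts in the sequence, so the same tract could be reported as CAA or ACA depending on its flanking base.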
PCR amplifications of the SSRs were performed in 10-μl PCR reactions
containing 30 ng genomic DNA, 0.2 mM of each dNTP, 2.5 mM MgCl2, 1X PCR buffer,
0.1 mM cresol red and 2% (w/v) sucrose, 0.5U JumpStart Taq polymerase (Sigma-
Aldrich, Inc., Saint Louis, MO), 1.0 µM forward primer, and 1.0 µM reverse primer.
The thermal cycling profile was as follows: 94°C for 60 s; 19 cycles of 94°C
for 60 s, 64°C for 30 s (decreasing 0.5°C every cycle), and 72°C for 60 s; 30 cycles of 94°C
for 60 s, 55°C for 60 s, and 72°C for 60 s; and a final extension at 72°C for 10 min.
PCR products were separated on 3% Metaphor agarose gels (Cambrex Bio Science, Inc.,
East Rutherford, NJ) at 120V for 4-5 h. All gels were run in 0.5X TBE and were
visualized using ethidium bromide staining with UV transillumination.
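The touchdown portion of the SSR thermal profile can be written out explicitly. The sketch below assumes that the first of the 19 touchdown cycles anneals at 64°C and that the 0.5°C decrement applies from the second cycle onward, so that the last touchdown cycle anneals at the 55°C used in the remaining 30 cycles. The function names are hypothetical and the step labels are only for illustration.

```python
def touchdown_annealing(start_c=64.0, step_c=0.5, cycles=19):
    """Annealing temperature of each touchdown cycle, first cycle at start_c."""
    return [start_c - step_c * i for i in range(cycles)]


def ssr_pcr_profile():
    """Expand the SSR profile into (step, temperature in C, seconds) tuples."""
    steps = [("initial denaturation", 94.0, 60)]
    for anneal in touchdown_annealing():
        steps += [("denature", 94.0, 60), ("anneal", anneal, 30), ("extend", 72.0, 60)]
    for _ in range(30):
        steps += [("denature", 94.0, 60), ("anneal", 55.0, 60), ("extend", 72.0, 60)]
    steps.append(("final extension", 72.0, 600))  # 10 minutes
    return steps


temps = touchdown_annealing()
print(temps[0], temps[-1], len(temps))
```

Under these assumptions the touchdown phase ends exactly at the fixed annealing temperature of the second phase, which is the usual intent of a touchdown design.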
Data analysis
The information content for each new SSR was described using the heterozygosity (H)
value. In a multiallele system, the heterozygosity value estimates the probability that any
two individuals taken at random from a population will differ at the locus, and is
determined using the following equation:
$$H = 1 - \sum_{i=1}^{k} P_i^{2}$$
where P_i is the frequency of the ith allele and k is the number of alleles (Nei 1978).
Additional markers
Amplified Fragment Length Polymorphisms. To increase the number of markers on the
map, AFLP analysis was performed on Population 1 following procedures described by
Vos et al. (1995), with minor modifications for quinoa as described by Maughan et al.
(2004). Further modifications included a selective amplification protocol consisting of
94°C for 60 s, followed by 13 cycles of 94°C for 30 s, 65°C for 30 s, and 72°C for 60 s.
The annealing temperature was lowered 0.7°C for each of the 12 cycles following the first;
these were followed by 23 cycles of 94°C for 30 s, 56°C for 30 s, and 72°C for 60 s.
Nucleolar Organizing Region (NOR) mapping. Maughan et al. (2006) recently reported
the cloning and sequencing of the intergenic spacer (IGS) region of the 45S NOR in
quinoa. Sequence analysis of the parents of Population 1 revealed a 43-bp indel
polymorphism, present in ‘Ku-2’ (GenBank # DQ187958) and deleted in ‘0654’
(DQ187960). Segregation analysis of the NOR was performed using standard PCR (as
described above) with primers flanking the indel (5’ TTT GAA ACC ATA ACA CAC
CTA TAA AG and 5’ TGG TCC AAA GAA TGG GTA TTT). PCR products were
resolved on 1.4% agarose.
11S seed protein mapping. The isolation of two BAC clones containing homologs of the
11S seed storage protein gene was recently reported in quinoa (Stevens et al. 2006). Two
11S loci (11S_77L9, 11S_164F2), presumably from each of quinoa’s subgenomes, were
isolated and sequenced (Balzotti, personal communication).
Sequence analysis of 11S_77L9 revealed a polymorphism between ‘Ku-2’ and
‘0654’ in a DraI restriction site, allowing for mapping of the polymorphism in Population
1 using a standard cleaved amplified polymorphic sequence (CAPS) assay (Konieczny
and Ausubel 1993). Briefly, DNA from the parents and the RIL population was
amplified in a 10-μl reaction containing 30 ng genomic DNA, 0.2 mM of each dNTP, 2.5
mM MgCl2, 1X PCR buffer, 0.1 mM cresol red and 2% (w/v) sucrose, 0.125U JumpStart
Taq polymerase, 0.5 mM forward primer (5’ ACA ACA CCG GAA ATG AGC CT), and
0.5 mM reverse primer (5’ CCA CTG AAT ACG TTG CCG C). PCR conditions were as
follows: 95°C for 5 min; 40 cycles of 95°C for 30 s, 65°C for 30 s, and 72°C for 30 s;
followed by a hold at 72°C for 7 min. The PCR product was brought up to a volume of 20
μl with water, 1X Tango buffer (Fermentas, Hanover, MD) and 5U DraI restriction
endonuclease (Fermentas, Hanover, MD), and was incubated at 37°C for a minimum of 2
h. Restriction fragments were size-separated on 1% agarose at 150V for 2 h, and were
visualized using ethidium bromide staining with UV transillumination.
Sequence analysis of 11S_164F2 revealed no polymorphisms between ‘Ku-2’ and
‘0654’ in a common restriction enzyme site; thus, a TaqMan allelic discrimination assay
(Perkin Elmer Biosystems) was used to map this locus. The allelic discrimination
reactions were performed using Applied Biosystems (Foster City, CA) PCR Supermix
according to the manufacturer’s protocol. The final reaction consisted of 30 ng of quinoa
genomic DNA, 0.4 µM forward (5’ GCG CTT TTT CCA ATA TTA GAC TCA A) and
reverse (5’ TGT TGA AGT TGG TAC GTA AGC ATC A) primers, 0.2 µM of each
discrimination probe (5’ TTG TTT GCT ACA TTC A; 5’ TAT TGT TTG ATA CAT
TCA AT) and a 1X concentration of the PCR Supermix, which includes an internal ROX
standard dye. PCR amplifications were carried out on an ABI 7300 RT-thermocycler
using the following thermal cycling conditions: 50°C for 2 min, 95°C for 10 min, 40
cycles of 95°C for 15 s and 60°C for 60 s. The analysis of the allelic discrimination
assays was performed using the SDS v2.0 software (Applied Biosystems, Foster City,
CA). Genotype calls for each accession were determined by inspecting the plot of the
fluorescence signals (standardized with ROX values) from each of the allelic
discrimination probes (VIC vs. FAM) generated from the post-PCR fluorescence reads
(end-point analysis). Fluorescence of only the FAM probe or only the VIC probe
indicated homozygosity for a particular allele while intermediate fluorescence from both
reporters indicated heterozygosity at the locus. DNA samples with allelic genotypes,
verified via sequencing, were utilized as internal standards to validate each TaqMan SNP
assay.
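The end-point calling logic described above can be sketched in a few lines. This is not the SDS v2.0 algorithm; it is a minimal illustration in which the function name `call_genotype`, the signal cutoff, and the heterozygote band are all hypothetical values chosen for the example, and the inputs are assumed to be ROX-normalized FAM and VIC signals. Which parental allele each dye reports is not stated here, so the labels refer only to the dyes.

```python
def call_genotype(fam, vic, min_signal=0.2, het_band=(0.35, 0.65)):
    """Classify one sample from ROX-normalized FAM and VIC end-point signals.

    min_signal and het_band are hypothetical thresholds, not values from SDS.
    """
    total = fam + vic
    if total < min_signal:
        return "no_call"  # too little fluorescence from either probe
    fam_fraction = fam / total
    if fam_fraction > het_band[1]:
        return "FAM_homozygote"  # essentially only the FAM probe fluoresces
    if fam_fraction < het_band[0]:
        return "VIC_homozygote"  # essentially only the VIC probe fluoresces
    return "heterozygote"  # intermediate signal from both reporters


# Made-up signal pairs for illustration.
for fam, vic in [(1.0, 0.05), (0.05, 1.0), (0.6, 0.55), (0.05, 0.05)]:
    print(fam, vic, call_genotype(fam, vic))
```

In practice, calls would be anchored to the sequence-verified control samples mentioned above rather than to fixed thresholds.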
Others. A SNP (S01C15) was analyzed on Population 1 using a standard CAPS assay.
The betalain color locus (scored as stem color) was also analyzed on both populations.
Map construction
For map construction, markers were scored as codominant (as was the case with a
majority of the SSR markers) or dominant (majority of the AFLP markers). Marker
segregation was analyzed for conformity to Mendelian ratios expected in RILs using a
chi-square test, with two and one degrees of freedom for codominant and dominant
markers, respectively. Linkage groups were constructed with a minimum LOD score of
3.0 using the default mapping parameters (LOD>1.0, recombination threshold = 0.4,
ripple value = 1, jump threshold = 5, Kosambi mapping function) of the computer
program JoinMap, version 3.0 (van Ooijen and Voorrips 2001). Linkage groups from the
two different populations that shared at least one common marker were combined using
the “Combine groups for map integration” function of JoinMap (Stam 1993).
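The Kosambi mapping function named above converts a recombination fraction r into a map distance. The sketch below uses the standard textbook form of the function, d = 25 ln((1 + 2r)/(1 - 2r)) in centimorgans, and is not taken from JoinMap itself; the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


# Distances for a few recombination fractions, including the 0.4 threshold.
for r in (0.05, 0.1, 0.25, 0.4):
    print(r, round(kosambi_cm(r), 2))
```

For small r the distance in cM is close to 100r, and it grows faster than 100r as r approaches 0.5, which is why large recombination fractions near the linkage threshold correspond to long map intervals.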
Results and Discussion
SSR discovery and analysis
Here we report the results of libraries enriched for GA, CAA and AAT. These particular
libraries were chosen based on results reported by Mason et al. (2005) that suggested that
the quinoa genome contains high frequencies of GA, CAA, and AAT repeats. A total of
1172 clones were sequenced, including 490 clones from each of the GA and CAA
libraries and 192 clones from the AAT library. A total of 436 (37%) clones were
identified that contained unique SSR sequences, of which 402 were suitable for primer
design (178, 85, and 139 from the GA, AAT, and CAA libraries, respectively) (Fig. 1a).
As expected from the enriched libraries, the most common repeats observed in the study
were GA (49%), CAA (35.6%), and AAT (12.9%). Other repeat motifs, including CA,
CGA, GAA and GGT, were also observed, albeit infrequently. Of the 402 SSRs tested,
216 (54%) were polymorphic when tested on the screening panel of seven quinoa
accessions (Fig. 1b). An additional 19 (4.7%) were polymorphic when the pitseed
goosefoot accession was included in the analysis (interspecies polymorphism). The
remaining primers (165) were monomorphic or amplified poorly. In only nine cases did
a primer successfully amplify in quinoa but not in pitseed goosefoot, suggesting that
these two Chenopodium species share a high degree of DNA sequence homology.
Indeed, gene flow between quinoa and pitseed goosefoot has been reported previously
(Wilson and Manhart 1993). Most polymorphic markers had repeat lengths of greater
than 20 bp (Fig. 2), confirming the conclusions of Mason et al. (2005) who suggested that
the future development of SSR markers in quinoa should focus on the identification of
markers with repeat lengths of >20 bp in order to maximize polymorphism (H values).
All 216 polymorphic SSRs, including 111 dinucleotide, 104 trinucleotide, and one
hexanucleotide repeat, were screened on the larger panel of 22 quinoa accessions and
one pitseed goosefoot accession to determine their polymorphic information content
(heterozygosity values). A total of 888 alleles were observed across all 22 quinoa
samples included in the full panel. The observed number of alleles per SSR ranged from
2 to 13, with an average of 4 alleles per SSR. Heterozygosity (H) values ranged from
0.12 to 0.90, with an average value of 0.56 (Table 1). These values are within the range
observed previously in quinoa (Mason et al. 2005) as well as in related species such as
sugar beet (Cureton et al. 2002; Rae et al. 2000). According to Ott (1992), a marker is
considered polymorphic if H ≥ 0.10 and highly polymorphic if H ≥ 0.70. Based on these
criteria, all 216 markers identified are considered polymorphic, and 53 (25%) are
considered highly polymorphic (H ≥ 0.70).
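As an illustration of how an H value and the Ott (1992) thresholds quoted above fit together, the following sketch computes H = 1 - sum of squared allele frequencies from a list of allele calls and then classifies the marker. It assumes one allele call per accession for a given marker, which is an assumption made for the example; the allele labels and function names are hypothetical.

```python
from collections import Counter


def heterozygosity(alleles):
    """H = 1 - sum(P_i^2), with P_i the frequency of the ith allele."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def classify(h):
    """Apply the thresholds cited from Ott (1992)."""
    if h >= 0.70:
        return "highly polymorphic"
    if h >= 0.10:
        return "polymorphic"
    return "below the polymorphic threshold"


# Made-up calls for one marker across 22 accessions (two alleles, equal frequency).
calls = ["a"] * 11 + ["b"] * 11
h = heterozygosity(calls)
print(h, classify(h))
```

With k equally frequent alleles H equals 1 - 1/k, so a marker needs at least four alleles to reach the 0.70 threshold.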
Sequence homology analysis was conducted for clones for which primers were
designed. BLASTN and BLASTX searches identified 41 sequenced clones with
significant homology (E<0.0001) to sequences in the GenBank databases (Table 2).
Seven clones showed homology to known sequences at the nucleotide level only, while
32 showed significant homology (E<0.0001) to known sequences at the amino acid level
only. Two sequences showed homology at both the nucleotide and amino acid level.
Hits to annotated gene and protein sequences in GenBank included the SotA gene; an alpha
zein gene of Zea mays; proteins involved in developmental processes, including a putative
C2H2-type zinc finger protein and a circadian clock-associated protein; and proteins
involved in defense responses and protection, including Nim1 (non-inducible immunity-like
protein). Metabolic proteins including isocitrate dehydrogenase, succinyl CoA
synthetase, succinyl CoA ligase, phosphoenolpyruvate carboxylase kinase, beta-amylase,
and oligosaccharyl transferase were also identified. Homologies with GenBank
sequences were most often identified with Arabidopsis thaliana and rice (Oryza sativa
L.).
SSR marker analysis
Population 1. A total of 424 SSR primers were screened on the parents of Population 1.
Analysis on the entire population revealed 203 primers that were polymorphic and easily
scored, while the rest were either not segregating in the population or were too
ambiguous to score. The 203 polymorphic primers amplified 213 segregating loci, a
result of 193 primers amplifying one locus each, and 10 primers each amplifying two
loci. Quinoa is an allotetraploid and it is likely that the second band amplified by these 10
primers represents amplification products from homoeologous loci from the two
subgenomes of quinoa. Of the total marker loci scored, 190 (89%) loci were scored in a
codominant fashion, while 23 (11%) were scored as dominant. Of the dominant loci, 14
were specific to ‘Ku-2’ and nine were specific to ‘0654’. Sixteen markers (7.5%)
deviated significantly (P<0.05) from the expected 1:1 segregation ratio, eight (3.8%) of
which were highly significant (P<0.01; Table 3a). Approximately 15 SSR primers
displayed complex banding patterns when amplified in this population, as well as in
Population 40. This was previously observed in quinoa (Maughan et al. 2004) and is
likely caused by duplicate chromosome regions in the allotetraploid quinoa genome (Rae
et al. 2000).
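The segregation tests reported here can be sketched as a chi-square goodness-of-fit calculation. The code below is an illustration under stated assumptions, not the analysis actually run: expected genotype proportions come from simple selfing theory (the heterozygote fraction halves each generation after the F1), p-values use the closed forms available for one and two degrees of freedom, and the marker counts in the usage lines are made up. All function names are hypothetical.

```python
import math


def ril_expected(generation):
    """Expected (AA, AB, BB) proportions after selfing an F1 to generation Fn."""
    het = 0.5 ** (generation - 1)
    hom = (1.0 - het) / 2.0
    return hom, het, hom


def chi_square(observed, expected_props):
    """Goodness-of-fit statistic for observed counts against expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))


def p_value(stat, df):
    """Upper-tail chi-square probability for df = 1 or df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


# Made-up dominant marker scored on 82 lines, tested against 1:1 (1 df).
stat = chi_square([60, 22], [0.5, 0.5])
print(round(stat, 2), p_value(stat, 1) < 0.01)

# Made-up codominant marker in an F6 population (2 df).
stat2 = chi_square([45, 3, 34], ril_expected(6))
print(round(stat2, 2), round(p_value(stat2, 2), 3))
```

Treating a dominant marker as 1:1 ignores the small residual heterozygote class, which is a reasonable approximation at F6 or F7 but is still an approximation.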
Population 40. Population 40 was used in an effort to increase the total number of
markers placed within the genetic map. Thus, the same 424 SSR primers were screened
on the parents (‘Chucapaca’ and ‘NL-6’) of Population 40; however, only those primers
uniquely polymorphic to Population 40, as well as a small set of common (anchor)
markers, were chosen for analysis on the entire population. In total, 82 SSRs were
polymorphic and easily scored in the population. The 82 polymorphic SSRs amplified a
total of 84 polymorphic loci, again a likely result of two primers each amplifying
homoeologous loci. Thirty-seven of the 84 loci are uniquely polymorphic to Population
40, while the remaining 47 were used as anchor markers for cross-population map
integration. Seventy-eight (93%) markers were scored in a codominant fashion, while six
(7%) were scored as dominant loci. Four of the dominant loci were specific to
‘Chucapaca’, while two were specific to ‘NL-6’. Twenty-seven markers (32%) deviated
significantly (P<0.05) from expected segregation values, twenty-one (25%) of which
were highly significant (P<0.01; Table 3a). Eighteen BES-SSRs were also screened on
the parents of Population 40, six of which were polymorphic and easily scored on the
entire population. All of the BES-SSRs were scored as codominant loci, and none of
them showed distorted segregation.
AFLP, 11S, NOR and morphological markers
AFLP analysis was conducted only on Population 1. Twenty-four primer combinations
were chosen based on their previously demonstrated ability to amplify polymorphic loci
(Maughan et al. 2004). A total of 81 polymorphic, easily-scored loci were amplified
from the 24 AFLP primer combinations. The number of polymorphic loci per primer
combination varied from one to nine, with an average of 3.4. Of the 81 scored
polymorphic loci, 79 were dominant, and two were codominant. Thirty-one (39%) of the
dominant loci were specific to ‘Ku-2’, while 48 (61%) were specific to ‘0654’. An
unusually high number of AFLP markers showed distorted segregation; 15 and 7 markers
were significant at P<0.05 and P<0.01, respectively (Table 3a).
Five additional morphological and DNA markers were analyzed for Population 1,
including: the betalain color locus (scored as stem color), two 11S seed storage protein
loci, the NOR, and a SNP marker (S01C15; GenBank # CN782051). While
heterozygotes for the color locus could not be distinguished, all loci, except the betalain
color locus, were scored in a codominant fashion, and none showed distorted segregation.
Linkage analysis and map construction
Population 1. A total of 299 loci were included in the linkage analysis of Population 1;
275 (92%) of these loci mapped at a minimum LOD of 3.0, including 200 (94%) SSR, 70
(86%) AFLP, and all five additional markers (11S loci, NOR locus, betalain color locus,
and SNP). The resulting map (Fig. 3a) consists of 41 linkage groups covering 913 cM, or
approximately 54% of the predicted 1700-cM quinoa genome (Maughan et al. 2004).
Linkage groups (LGs) were numbered based on the number of markers, with LG 1
containing the most markers (41). Linkage group lengths vary from a high of 86 cM (LG
1) to a low of 0 cM (LGs 39, 40, and 41). The largest interval between two linked
markers is 22 cM on LG 21, and the average distance between all loci is 3.32 cM/marker.
Most intervals (88%) are <10 cM, and 85% of intervals between SSR markers are <10
cM. The largest gap between SSR markers is 25 cM on LG 20, with an average gap
between SSR markers of 4.6 cM.
Population 40. A total of 91 loci were included in the linkage analysis of Population 40;
68 (75%) markers mapped at a minimum LOD of 3.0, including all six BES-SSRs. The
betalain color locus did not map in Population 40. The resulting map (Fig. 3b) consists of
20 LGs covering 353 cM, or an estimated 21% of the entire quinoa genome. LG 1
contains the most markers (12) and spans the longest distance (72 cM), while eight
LGs each contain only two markers; in two of these LGs the markers cosegregated. Markers are
spaced at an average of 5.2 cM/marker, with the largest interval being 25 cM on LG 7.
Sixty-nine percent of all intervals are <10 cM.
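The coverage and spacing figures quoted for the two population maps follow from simple ratios. The sketch below reproduces them from the totals given in the text; it assumes that average spacing was computed as total map length divided by the number of mapped loci, which matches the reported values but is not stated explicitly. The variable and function names are illustrative.

```python
GENOME_CM = 1700  # predicted quinoa genome size cited in the text (cM)

# (total map length in cM, number of mapped loci), as reported in the text
population_maps = {
    "Population 1": (913, 275),
    "Population 40": (353, 68),
}


def coverage_and_spacing(length_cm, n_loci, genome_cm=GENOME_CM):
    """Return (percent of genome covered, average cM per marker)."""
    coverage_pct = 100 * length_cm / genome_cm
    spacing = length_cm / n_loci
    return coverage_pct, spacing


for name, (length_cm, n_loci) in population_maps.items():
    coverage_pct, spacing = coverage_and_spacing(length_cm, n_loci)
    print(f"{name}: {coverage_pct:.0f}% coverage, {spacing:.2f} cM/marker")
```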
Skewed markers
The high number of skewed markers in this study (particularly in Population 40) was
not observed in the AFLP linkage map constructed by Maughan et al. (2004), but has been
observed in other plant studies using both inter- and intraspecific crosses (for a review,
see Jenczewski et al. 1997). Segregation distortion of markers has been reported as a
result of random chance or as the result of linkage disequilibrium with genes that
ultimately reduce viability of the gamete and/or zygote (Zamir and Tadmor 1986). Of the
22 skewed markers that mapped in Population 1, 15 are skewed toward ‘Ku-2’, while
seven are skewed toward ‘0654’ (Table 3b). Eleven skewed markers mapped to LG 1 of
Population 1, seven of which are skewed toward ‘Ku-2’. Six of these seven markers are
localized to the first 34 cM of the LG. The remaining four markers on LG 1, all skewed toward ‘0654’, are
AFLPs localized to a 7-cM region. Six of the 22 skewed markers in
Population 1 are localized to a 23-cM interval on LG 13; all six markers are skewed
toward ‘Ku-2’. All skewed markers in Population 1 mapped to a total of six different
LGs. While some linkage groups contained only one skewed marker, the presence of
clusters of markers skewed to one parent or the other is suggestive of chromosomal
regions containing possible gametophytic factors (Lu et al. 2002). Alternatively, these
skewed chromosomal regions, such as those on LGs 1 and 13, may be associated with
QTL conferring a selective advantage under the particular greenhouse growing conditions
utilized to produce the RIL populations – we note that some (approximately 10%) of the
lines were lost during the population development process. A better understanding of the
cause of these skewed regions will require further studies. We also note that while
segregation distortion is generally believed to be greater in interspecific crosses, reaching
levels as high as 68.5% (Paterson et al. 1988), levels can also be high in intraspecific
crosses. For example, Hall and Willis (2005) observed similar levels of distortion (near
50%) in both interspecific and intraspecific crosses, an observation attributed to the high
level of genomic divergence between the parents of the intraspecific cross. Thus, the
extent of segregation distortion appears to be only indirectly related to the type of cross,
and more directly related to the extent of genome divergence between the lines being
crossed. Our populations are the result of crossing highly divergent Altiplano and
Coastal quinoa ecotypes (Mason et al. 2005). Indeed, the parents of Population 1 and
Population 40 have very low similarity coefficients (0.304 and 0.245, respectively)
suggesting a high degree of genome divergence between the parents of both crosses
(Maughan et al. 2004). This high level of genome divergence may also play a role in the
aberrant phenotypes periodically displayed in certain progeny of Population 1 throughout
the inbreeding process. These plants were shorter than normal, with reduced internode
length and thicker leaves with mostly smooth rather than toothed margins. In addition,
they displayed delayed flowering, reduced inflorescence structures, and increased
sterility.
Population 40 contains fewer markers, although a larger percentage showed
segregation distortion. Ten of the 23 skewed markers that mapped in Population 40 are
skewed toward ‘Chucapaca’, while 13 are skewed toward ‘NL-6’ (Table 3b). These 23
skewed markers mapped to 14 different LGs. Four LGs (1, 3, 7, 8) each contain three skewed
markers, while all other LGs contain fewer than three skewed markers. In this
population, any given marker was significantly skewed if five or more individuals in the
population were heterozygous at that marker locus. Interestingly, of the 23 skewed
markers that mapped in Population 40, 17 (74%) were skewed because they contained
five or more heterozygotes. A significant excess of heterozygotes has also been
observed in other mapping studies. In mapping RFLP loci in an F2 diploid alfalfa
(Medicago sativa L.) population, Brummer et al. (1993) noted that a majority of all
skewed markers had too many heterozygotes. This observation was attributed to the
maximum heterozygosis hypothesis [Demarly (cited in Busbice et al. 1972)], which
asserts that fitness is directly correlated with the number of alleles at a locus. Thus, the high
number of heterozygous loci reported here could be evidence of heterozygote advantage
at particular loci in quinoa. Interestingly, these same loci in Population 1 do not show
excess heterozygosity, suggesting that this phenomenon may be population specific.
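The threshold of five heterozygotes is easier to interpret against how rare heterozygotes are expected to be in an inbred population. The sketch below assumes strict selfing from a fully heterozygous F1 with no selection, in which case the expected heterozygote fraction halves each generation. The population size and the generations shown are hypothetical and are not taken from this section.

```python
def expected_het_fraction(generation):
    """Expected heterozygote fraction at one locus in generation F_n.

    Assumes strict selfing from a fully heterozygous F1 and no selection,
    so heterozygosity halves with every generation after the F1.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or greater")
    return 0.5 ** (generation - 1)


POP_SIZE = 80  # hypothetical number of lines, for illustration only

for generation in (2, 4, 6, 8):
    fraction = expected_het_fraction(generation)
    expected_count = fraction * POP_SIZE
    print(f"F{generation}: {expected_count:.2f} heterozygotes expected of {POP_SIZE}")
```

Under these assumptions, only a few heterozygous lines are expected per locus in advanced generations, so a handful of observed heterozygotes can already represent a large relative excess.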
Integrated map. Thirty-nine mapped markers were common between Population 1 and
Population 40, and were thus used as anchor markers to integrate portions of the two
maps. Twenty Population 1 LGs shared at least one common marker with 17 different
Population 40 LGs. Five pairs of LGs sharing common markers could not be
combined using JoinMap software. The remaining 15 Population 1 LGs sharing common
markers with 13 Population 40 LGs were successfully integrated into 13 new LGs. This
integrated map (Fig. 3c) contains 140 SSR, 48 AFLP, four BES-SSR, one SNP, and one
NOR marker spanning a total of 606 cM. Sixteen markers in the integrated map are
unique to Population 40. Three SSR markers (KGA165, QCA053, QCA117) that
grouped with LG 1 in Population 1, but did not map to precise locations in that
population, could be mapped more precisely in the integrated map. Linkage
groups were again ordered based on the number of linked markers, with LG 1 containing
47 markers covering 147 cM, and LG 13 containing 3 markers spanning 13 cM. The
average spacing between SSR, AFLP, and all markers is 4.3, 12.6, and 3.1 cM/marker,
respectively. The largest interval between two mapped loci is 16 cM on LGs 4 and 5.
The largest interval between two linked SSR markers and between two linked AFLP
markers is 24 cM and 38 cM, on LGs 5 and 1, respectively. Ninety-two percent of all
intervals are <10 cM, while 88% and 67% of intervals between two linked SSR markers
and between two linked AFLP markers are <10 cM, respectively.
The lack of anchor markers in many of the Population 1 and 40 LGs prevented
their integration; thus, this map clearly does not represent complete genome coverage.
Indeed, the integrated map covers only an estimated 37% of the predicted 1700-cM
quinoa genome. However, the map does demonstrate the ability to combine maps of
different populations in quinoa. This allows for the addition of markers polymorphic in
only one of the maps, thus increasing the total number of mapped markers. Furthermore,
an integrated map with more markers can result in higher marker density, as was the case
here. The average marker density of the integrated map was 3.1 cM/marker, compared to
3.3 and 5.2 cM/marker in Populations 1 and 40, respectively. In addition, the average LG
length in the integrated map was 47 cM, compared to 22 and 18 cM in Populations 1 and
40, respectively. Moreover, since lines within both populations have been selfed to near
homozygosity, each line can be propagated indefinitely without genetic change. Such
populations are essential for the quinoa research community, since they alleviate the need
to develop new mapping populations each time new genetic markers become available.
Indeed, the use of RIL populations for genetic map production achieves greater mapping
resolution, since recombination breakpoints are denser in RILs than in F2
populations (single meiotic events) (Broman 2005). Additionally, since the seed supply of these
populations is essentially limitless, they also lend themselves to qualitative
and quantitative trait locus mapping experiments, because replicated field trials can be
analyzed using identical genetic material. The quantitative trait data can then be used to
determine if any molecular markers are closely associated with those traits, an important
first step toward map-based gene cloning.
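The marker density and average linkage group length compared in the paragraph above can be recomputed from the map totals given in the text. This is a sketch under two assumptions: density is taken as total length divided by mapped markers, and the integrated-map marker count (194) is the sum of the marker types listed for that map (140 SSR, 48 AFLP, four BES-SSR, one SNP, one NOR). The names used are illustrative.

```python
# (total length in cM, mapped markers, linkage groups), from the text
linkage_maps = {
    "Population 1": (913, 275, 41),
    "Population 40": (353, 68, 20),
    "Integrated": (606, 140 + 48 + 4 + 1 + 1, 13),
}


def map_summary(length_cm, n_markers, n_groups):
    """Return (average cM per marker, average linkage group length in cM)."""
    density = length_cm / n_markers
    mean_lg_length = length_cm / n_groups
    return density, mean_lg_length


for name, (length_cm, n_markers, n_groups) in linkage_maps.items():
    density, mean_lg_length = map_summary(length_cm, n_markers, n_groups)
    print(f"{name}: {density:.1f} cM/marker, mean LG length {mean_lg_length:.0f} cM")
```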
Conclusions
The major objectives of this project were to increase the number of available SSR
markers and to build the first SSR-based genetic map of quinoa. We report the
development and characterization of 216 new SSR markers and the development of a
genetic map based primarily on sequence-tagged SSR markers. Compared to the haploid
chromosome number (18) of quinoa, the high number of linkage groups identified in both
populations indicates that many regions of the genome have not been detected and that
additional markers and/or targeted marker development is still needed to coalesce linkage
groups and provide complete coverage of the quinoa genome. One potential method for
coalescing linkage groups into syntenic groups involves the mining of marker-containing
BACs for suitable in situ hybridization markers. These markers, or possibly their BACs
if the latter do not contain dispersed repetitive sequences, can then be hybridized in pairs
with BACs containing markers from other linkage groups directly to quinoa
chromosomes using fluorescent in situ hybridization (FISH). Here we report the
development and mapping of several SSR markers derived from BAC-end sequences of a
newly constructed BAC library (Stevens et al. 2006). The development of a physical
map via restriction mapping of BAC-end sequencing should prove invaluable in the
targeted development of genetic markers as well as the integration of future genetic and
physical maps of quinoa (McCouch et al. 2002; Mozo et al. 1999).
The markers, maps, and populations developed here are an important step toward
developing marker-assisted selection (MAS) strategies for important agronomic
characteristics in quinoa. For example, saponin content is an ideal trait for
marker-assisted selection; saponins are bitter, antinutritional triterpenoid
compounds found on the quinoa seed coat.
The presence of saponins deters avian predation, but also increases production
costs due to necessary washing steps. Thus, the ability to effectively select for the
presence or absence of saponins is of agronomic importance. We previously identified a
number of markers loosely linked to the bitter saponin production locus (BSPL),
including an AFLP marker (eACAmCTG-135) linked to the BSPL at 9 cM (Ricks 2005).
This same marker was also present in the Maughan et al. (2004) map (LG 11), and was
included in this study as well (Population 1 LG 1, integrated LG 1). Comparison of
markers present on the BSPL LG, LG 11 of the Maughan et al. (2004) map, and LG 1 of the
integrated map presented here revealed several common markers (Fig. 4). The presence
of additional markers on LG 1 of the integrated map should allow for the identification of
a marker more closely linked to the BSPL, thus improving MAS strategies.
Furthermore, the linkage maps reported here can also be used for cytogenetic
studies. Several SSR primers amplified more than one segregating locus; of these, eight
amplified two loci that both mapped in Population 1. One of these pairs (QATG087-A,
QATG087-B) mapped to the same linkage group, while all others mapped to different
linkage groups (Table 4). In addition, the two 11S seed storage protein loci each mapped
to different linkage groups. Thus, these linkage groups represent putative homoeologous
chromosomes in the allotetraploid quinoa genome, and in the future may be useful in
cytological analyses and genome evolutionary studies.
The markers and maps presented here will be particularly useful in the developing
regions where quinoa is cultivated. Compared to other marker techniques, SSRs are
relatively inexpensive once they have been developed, highly polymorphic, and easy to
use. SSR markers are easily transferred between laboratories and are highly
reproducible. These characteristics make them especially applicable in developing
countries that may lack the resources required for other marker techniques.
References
Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic
Acids Res 27:573–80
Broman KW (2005) The genomes of recombinant inbred lines. Genetics 169:1133–1146
Brummer EC, Bouton JH, Kochert G (1993) Development of an RFLP map in diploid
alfalfa. Theor Appl Genet 86:329–332
Busbice TH, Hill RR, Carnahan HL (1972) Genetics and breeding procedures. In:
Hanson CH, ed. Alfalfa Science and Technology. American Society of Agronomy,
Madison, WI, pp. 283–314
Christensen SA (2005) Assessment of Chenopodium quinoa Willd. genetic diversity in
the USDA and CIP-FAO collections using SSRs and SNPs. Thesis. Brigham Young
University, Provo, UT
Cureton AN, Burns MJ, Ford-Lloyd BV, Newbury HJ (2002) Development of simple
sequence repeat (SSR) markers for the assessment of gene flow between sea beet (Beta
Figure 1. Number of clones sequenced (A) and primers developed (B) for each library. (A) Total number of sequenced clones, including those containing unique microsatellites, redundant sequences, and those not used for primer design. (B) Total number of primers designed, including polymorphic and monomorphic primers, polymorphic primers with high molecular weight amplicons, those polymorphic only between C. berlandieri and quinoa, and primers with poor or no amplification.
[Panel A: x-axis, Library (GA, AAT, CAA); y-axis, Number of Clones (0 to 600); legend: Not used, Redundant, Unique microsatellites.]
[Panel B: x-axis, Library (GA, AAT, CAA); y-axis, Number of Primers (0 to 200); legend: High molecular weight polymorphic, No amplification, Poor amplification, Polym. C. berlandieri, Monomorphic, Polymorphic.]
Figure 2. Histogram showing number and heterozygosity (H) values of polymorphic markers by repeat length.
[Figure 2 plot. Title: H Value of Polymorphic Markers by Repeat Length; x-axis, Length in Base Pairs (bins 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, >100); y-axis, Count (0 to 50); legend (H value): 0.70 - 0.90, 0.50 - 0.69, 0.30 - 0.49, 0.10 - 0.29.]
Table 2. Significant database sequence homologies to microsatellite-containing clones for which primers were designed, including E-value, nucleotide and/or protein homology match, organism match, and GenBank accession number, as identified through BLASTN and BLASTX searches.
Marker name  E-value   Nucleotide homology            E-value   Protein homology                          Organism matched      GenBank accession #
KAAT007      2.00E-39  Mitochondrial DNA              -         -                                         Beta vulgaris         BA000009 GI:47118321
KAAT032      2.00E-06  DNA microsatellite locus mm17  -         -                                         Sus scrofa            AB206674 GI:78483939
KAAT040      -         -                              3.00E-09  Putative C2H2 type zinc finger protein    Arabidopsis thaliana  AAL91203 GI:19698935
KCAA010      -         -                              1.00E-06  Hypothetical protein MmC1DRAFT_0268       Magnetococcus sp.     EAN27141
KGA177       -         -                              6.00E-05  Putative protein                          Oryza sativa          CAC39056 GI:14140139
KGA184       -         -                              3.00E-21  Catalytic protein                         Arabidopsis thaliana  NP_566539 GI:18401044
KGA186       -         -                              1.00E-06  At5g09860 – MYH9-7 protein                Arabidopsis thaliana  AAN33204 GI:23463079
KGA024       -         -                              1.00E-28  Phosphoenolpyruvate carboxylase kinase 2  Clusia minor          AAR3183 GI:39842453
Figure 3. Linkage maps of Population 1 (A), Population 40 (B), and the integrated map (C). Distances in centiMorgans are indicated on the left side of each linkage group. All SSR markers begin with Q or K; newly developed SSR markers reported herein begin with K. BES-SSR markers begin with B. AFLP markers begin with e. Markers skewed at P<0.05 and P<0.01 are indicated with * and **, respectively. A. Population 1 linkage map.
B. Population 40 linkage map.
C. Integrated map
Table 3. Skewed markers scored (A) and mapped (B) in Populations 1 and 40. SSR markers begin with Q or K. AFLP markers begin with e. Markers skewed at P<0.05 and P<0.01 are indicated with * and **, respectively. (A) Name and parental direction of skewed markers scored in Populations 1 and 40. (B) Number, linkage group location, and parental direction of skewed markers for Population 1 and Population 40.
Figure 4. Comparison of loci linked to the BSP locus (BSPL) (Ricks 2005) in LG 11 of the Maughan et al. (2004) map and linkage group (LG) 1 of the integrated map reported herein. Distances in centiMorgans are reported on the left side of the linkage groups. SSR markers begin with Q or K. AFLP markers begin with e.
Table 4. Potentially homoeologous loci and linkage groups (LG) in the Population 1, Population 40, and integrated maps, as indicated by a single primer set amplifying two segregating loci.