Genetic diversity in domesticated soybean (Glycine max) and its wild progenitor (Glycine soja) for simple sequence repeat and single-nucleotide polymorphism loci
Ying-Hui Li1, Wei Li1, Chen Zhang1, Liang Yang2, Ru-Zhen Chang1, Brandon S. Gaut2 and Li-Juan Qiu1
1The National Key Facility for Crop Gene Resources and Genetic Improvement (NFCRI) ⁄ Key Lab of Germplasm Utilization (MOA), Institute of Crop Science, Chinese Academy of Agricultural Sciences, 100081 Beijing, China; 2Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
Author for correspondence: Li-Juan Qiu. Tel: +86 10 8210 5843. Email: [email protected]
Received: 6 January 2010. Accepted: 14 May 2010
New Phytologist (2010) 188: 242–253. doi: 10.1111/j.1469-8137.2010.03344.x
www.newphytologist.com. © The Authors (2010). Journal compilation © New Phytologist Trust (2010)
Key words: diversity pattern, domestication, geographic variation, introgression, population structure.
Summary
• The study of genetic diversity between a crop and its wild relatives may yield fundamental insights into evolutionary history and the process of domestication.
• In this study, we genotyped a sample of 303 accessions of domesticated soybean (Glycine max) and its wild progenitor Glycine soja with 99 microsatellite markers and 554 single-nucleotide polymorphism (SNP) markers.
• The simple sequence repeat (SSR) loci averaged 21.5 alleles per locus and overall Nei’s gene diversity of 0.77. The SNPs had substantially lower genetic diversity (0.35) than SSRs. SSR analyses indicated that G. soja exhibited higher diversity than G. max, but SNPs provided a slightly different snapshot of diversity between the two taxa. For both marker types, the primary division of genetic diversity was between the wild and domesticated accessions. Within taxa, G. max consisted of four geographic regions in China. G. soja formed six subgroups. Genealogical analyses indicated that cultivated soybean tended to form a monophyletic clade with respect to G. soja.
• G. soja and G. max represent distinct germplasm pools. Limited evidence of admixture was discovered between these two species. Overall, our analyses are consistent with the origin of G. max from regions along the Yellow River of China.
Introduction
Plant domestication fundamentally altered the course of human history, prompting the shift from hunter–gatherer to agricultural societies. Domestication is a multifaceted process that is amenable to study by a wide range of disciplines, including archaeology, anthropology, molecular genetics and evolutionary biology. Accordingly, the study of domestication has yielded fundamental insights into early societies, the genes and biological mechanisms that underlie morphological change, and the strength and patterns of selection (Doebley et al., 2006).
Recent genetic and archaeological investigations have also shown that the process of domestication can vary substantially among crop species. For example, genetic studies of maize suggest that it was domesticated only once, from a wild progenitor located in highland Mexico (Matsuoka et al., 2002). After domestication c. 9000 yr ago (Matsuoka et al., 2002), archaeological evidence indicates that cultivated maize dispersed throughout the Americas quite rapidly – that is, within, perhaps, hundreds of years (Pohl et al., 2007). By contrast, evolutionary genetic analyses of barley and rice have demonstrated at least two domestication events for both species (Cheng et al., 2003; Morrell & Clegg, 2007). Moreover, archaeological study of cereal grains suggests that the process of domestication may have taken thousands of years (Fuller, 2007), and perhaps as long as five millennia for rice (Fuller et al., 2009).
While the result for rice is still open to interpretation (Jones & Liu, 2009) and may not be completely compatible with genetic evidence (Zhang et al., 2009), extending the duration of domestication from a rapid to a multimillennial
process has important implications for interpreting patterns of genetic diversity between crops and their wild relatives. For example, a domestication process of thousands (as opposed to hundreds) of years provides more opportunities for local domestication, migration among local domesticates and local extinction. Just as importantly, a domestication event of long duration provides an expanded period for potential introgression between the domesticate and local populations of its wild relative(s). Evidence for local introgression between a crop and its wild relatives can be inferred from patterns of genetic diversity – for example, in rice (Garris et al., 2005).
Here, we investigate genetic diversity in cultivated soybean (Glycine max) and its wild progenitor Glycine soja. Soybean is cultivated globally, in part because it produces among the highest gross oil output – with the highest protein content – of any vegetable crop (Mohamed & Rangappa, 1992). The weight of cytological, biochemical and molecular evidence supports the domestication of soybean from G. soja, a wild annual species that is native throughout China and parts of Korea, Japan and Russia (Fig. 1). Nonetheless, several aspects of soybean domestication are not well established, which is surprising given its agricultural importance. For example, the location of domestication in China is not yet clearly substantiated. It has been hypothesized that soybean was domesticated in north-eastern China (Fukuda, 1933; Li, 1994), the Yellow River valley of northern China (Vavilov, 1951; Hymowitz & Newell, 1981; Chang, 1989; Zhou et al., 1998; Dong et al., 2004; Zhao & Gai, 2004; Li et al., 2008) and southern China (Gai et al., 2000). It is also unclear whether soybean was domesticated more than once, but multiple domestications have been explicitly suggested (Xu et al., 2002).
Patterns of molecular diversity often yield insights into the location and number of domestication events. Molecular diversity in G. max and G. soja has been examined with a series of markers, including simple-sequence repeats (SSRs), random amplified polymorphic DNA (RAPD) markers and amplified fragment length polymorphisms (AFLPs). To date, these studies have yielded similar insights into the patterning of genetic diversity in G. soja and its relationship to G. max. Typically, genetic diversity clusters by taxon, with a clear differentiation between wild and domesticated taxa (Powell et al., 1996). Within a taxon, the genetic structure of G. max and G. soja typically agrees with geographic location (Dong et al., 2001, 2004; Abe et al., 2003; Xu & Gai, 2003; Li et al., 2008b). For example, Chinese and Japanese G. soja populations form distinct germplasm pools (Hirata et al., 1999; Kuroda et al., 2006), and Asian accessions of G. max (Abe et al., 2003) group in general accordance with planting region and also the sowing season.
In this study, we investigate genetic diversity in a broad sample of G. max and G. soja. Our study differs from previous studies of genetic diversity in Glycine in three important ways. First, we rely on two types of molecular markers – SSRs and SNPs, which differ in mutational properties (Payseur & Jing, 2009) – and compare results between them. Second, our study differs with respect to the size and extent of the Glycine sample. Our survey includes G. max individuals that represent over 90% of the phenotypic diversity found within the Chinese soybean germplasm collection and G. soja individuals representative of its natural range. Third, we explicitly consider the possibilities of multiple centers of domestication and of admixture and prolonged gene flow between the wild and the cultivated species after the initial domestication event(s).
Fig. 1 The geographic distributions of samples used in this study. Regional sampling is designated by circles, where Glycine soja is represented by the open part of the circle and Glycine max is the closed portion. Each circle represents a different province. The number of samples per province is indicated by the number. The four colored portions divide China into four regions: NER (northeast region), NR (north region), HR (Huanghuai region) and SR (south region). The blue lines represent the Yellow and Yangtze rivers.
With SSR data from 99 loci and SNP data from 738 loci typed in a common sample of 303 individuals, we address the following questions: Do G. soja and G. max continue to represent distinct germplasm pools, as in previous studies? Is G. soja geographically structured? If so, can we identify the region or the regions of China in which G. max was domesticated? Is there evidence that the process of domestication included substantial admixture between wild and cultivated populations? Along the way, we also evaluate the relative merits of SNPs and SSRs to address these questions.
Materials and Methods
Plant materials
We sampled a total of 435 accessions representing cultivated G. max (L.) Merr. (321), its wild progenitor species G. soja Sieb. et Zucc (112) and two outgroup species (Fig. 1). The G. max population consisted of 240 landraces and 81 cultivars, including the minicore collection of cultivated soybean in the Chinese National Soybean GeneBank (CNSGB) (248 G. max accessions). The minicore collection represents most of the phenotypic diversity and c. 70% of the molecular genetic diversity of 23 587 cultivated accessions housed in the CNSGB (Wang et al., 2006; Qiu, 2009). Our accessions originate from four large ecological regions, northeast region (NER), north region (NR), Huanghuai region (HR) and south region (SR) (Fig. 1), ranging over 19.4 to 50.2°N and 86.3 to 130.2°E. These four regions represent the four major planting areas of soybean in China (Li et al., 2008).
Accessions of the wild progenitor were selected to represent the geographical range of this species from 24.5 to 52.2°N and 100.5 to 141.2°E. Of the 112 G. soja accessions, 73 were from China, 8 from Korea, 9 from Russia and 22 from Japan (Fig. 1). A single accession each of Glycine tomentella Hayata (one of two perennial species found in China) and Glycine falcata Benth. was included as an outgroup. The G. max accessions and 73 Chinese G. soja were obtained from the CNSGB, with the remainder provided by Dr Randall Nelson from the USDA-ARS Soybean Germplasm Collection (University of Illinois, Urbana, IL, USA). Detailed information about each accession is provided in the Supporting Information, Table S1.
Data collection
For both SNP and SSR analyses, DNA was extracted from young leaf tissue of one plant (G. tomentella and G. falcata) or a bulk of young leaf tissue of 20–30 plants (G. soja and G. max) per accession as previously described (Xie et al., 2005). We bulked samples to produce enough DNA for genotyping; the bulking of samples is justified by the fact that each accession of the minicore collection has been culled for both phenotypic and genotypic homogeneity.
Ninety-nine simple sequence repeats (SSRs) were selected for genotyping, based on their distribution across the genetic linkage map (http://bldg6.arsusda.gov/cregan/soymap.htm). The SSR loci were mapped onto the Williams82 genome sequence (http://www.phytozome.net) with BLAST (E-value < 10⁻¹⁰), using the SSR primers as a query. If the best BLAST hit covered only part of the primer sequence, we extended it in order to map the full length. Then, the allele size in Williams82 was calculated based on the boundaries of the extended hit. Overall, the 99 SSR loci were located on 20 integrated genetic linkage groups, covering 1581.8 cM of the soybean genome, with a 20.0 cM average genetic distance between adjoining loci (Table S2). PCR amplification of SSRs followed Xie et al. (2005). The PCR products were separated on an ABI PRISM 377 DNA Analyser (Applied Biosystems, Foster City, CA, USA). Allele sizes were estimated with an automated sequencer (Applied Biosystems) and inspected manually. When the genotyping had multiple (≥ 3) peaks, we treated the observation as missing data. These had little overall effect, because they represented only 0.2% of the total genotyping data. In a small proportion of cases, two peaks were identified at four SSR markers (c. 4.0% of the 99 SSR markers); for these, we scored the higher peak after confirmation by a repeat test (ABI PRISM 377 DNA Analyser) and polyacrylamide gel electrophoresis (PAGE).
We assayed 738 SNPs in the complete set of 435 accessions. These SNPs were polymorphic in a set of six diverse G. max accessions and were used to build the first transcript map of soybean (Choi et al., 2007). They were chosen based on a design ability rank score > 0.6 and a pre-evaluation of 60 bp of upstream and downstream flanking regions by Illumina (http://www.illumina.com/, San Diego, CA, USA). The upstream and downstream data were accessed at http://bfgl.anri.barc.usda.gov/soybean/.
The SNPs were assayed using the Illumina GoldenGate assay, which was performed based on the manufacturer’s protocol and the methods described in Shen et al. (2005). For each SNP, the lowest acceptable GenCall and GenTrain scores were set at 80% and 0.6, respectively, for separating homozygote and heterozygote clusters.
The 738 SNPs were also mapped to the Williams82 genome sequence with BLAST (E-value < 10⁻¹⁰) using the SNP flanking sequences as queries; 17 of the 738 SNP markers either overlapped with another marker or had an ambiguous location and were not considered further.
A list of the SNP and SSR loci, along with genetic and physical positions, is provided in Table S2.
Analyses of genetic diversity and population structure
Summary statistics were computed for both the SSR and SNP data sets. The statistics included the number of alleles, the proportion of heterozygous individuals in the population and Nei’s gene diversity, as calculated by PowerMarker 3.25 (Liu & Muse, 2005). In addition, π was calculated on SNP data with DnaSP (Rozas et al., 2003). As sample sizes differed across populations, the number of distinct alleles per sample was estimated by ADZE (Szpiech et al., 2008), which employs a rarefaction approach to obtain sample-size corrected estimates. Sample-size corrected estimates of allelic richness were calculated with FSTAT (Goudet, 2001).
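The two statistics named above can be illustrated with a minimal sketch. It assumes the uncorrected form of Nei's gene diversity (one minus the sum of squared allele frequencies) and the standard rarefaction expectation for the number of distinct alleles in a subsample; the packages cited in the text may apply additional sample-size or inbreeding corrections, and the function names here are hypothetical.

```python
from collections import Counter
from math import comb


def gene_diversity(alleles):
    """Uncorrected Nei's gene diversity at one locus: 1 - sum(p_i ** 2)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def rarefied_allelic_richness(alleles, g):
    """Expected number of distinct alleles in a random subsample of g gene copies."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if g > n:
        raise ValueError("subsample size g exceeds the number of gene copies")
    # For each allele, add the probability that it appears at least once
    # in the subsample; math.comb returns 0 when n - c < g.
    return sum(1.0 - comb(n - c, g) / comb(n, g) for c in counts.values())


# Illustrative allele sizes (in bp) at one hypothetical SSR locus.
locus = [150, 150, 152, 156]
diversity = gene_diversity(locus)
richness_at_two = rarefied_allelic_richness(locus, 2)
```

Rarefying every population to the same g is what makes allele counts comparable between samples of unequal size, such as the G. soja and G. max samples compared here.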
We examined population structure and differentiation with two methods. First, we used two Bayesian Markov Chain Monte Carlo approaches, STRUCTURE 2.1 (Pritchard et al., 2000; Falush et al., 2003) and InStruct (Gao et al., 2007). STRUCTURE minimizes deviations from Hardy–Weinberg equilibrium within an inferred population; by contrast, InStruct uses expected genotype frequencies and estimates of selfing rate to make population assignments.
For InStruct, one haplotype per line was included in the dataset. Haplotypes were inferred from the STRUCTURE dataset using PHASE (Stephens et al., 2001) version 2.1 (Stephens & Donnelly, 2003). For both STRUCTURE and InStruct analyses, we employed the admixture and independent allele frequency models, using a number of clusters (K) ranging from 1 to 8. Five runs were performed for each value of K, without using previous population information. Burn-in time and replication number were both set to 100 000 for each run. Additional parameters in the InStruct analyses were set to the default values on the website (http://cbsuapps.tc.cornell.edu/instruct.aspx/). The value of log_e Pr(X ⁄ K) and the variance in log_e Pr(X ⁄ K) (Var(log_e P(X ⁄ K))) were used to identify the appropriate values of K.
Our second method of examining population structure was analyses of molecular variance (AMOVA), based on implementations in Arlequin (http://cmpg.unibe.ch/software/arlequin3/).
Phylogenetic analyses
We constructed two types of phylogenetic trees. Both trees were based on the shared-allele distance among accessions, as calculated by PowerMarker 3.25 and displayed by MEGA4 (Tamura et al., 2007). Both tree types were also based on the neighbor-joining algorithm (Saitou & Nei, 1987) implemented in MEGA, using the G. tomentella and G. falcata accessions as outgroups. The first type of tree used data from all 435 accessions and treated each accession as an operational taxonomic unit (OTU). The second tree grouped accessions into 15 OTUs on the basis of their geographic location and their position in the first type of tree. Accessions from the same geographic region or subregion were grouped as an OTU when they clustered together in the first tree. Both trees were calculated for three data sets – SSRs, SNPs and SSRs + SNPs – resulting in a total of six trees. Confidence on each tree was assessed with 1000 bootstrap replications.
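As an illustration of the distance underlying these trees, the sketch below computes a shared-allele distance between two accessions as one minus the mean proportion of alleles shared per locus. This is a common textbook form, given under the assumption of diploid genotypes stored as allele pairs with None for missing data; it is not claimed to reproduce PowerMarker's exact implementation.

```python
from collections import Counter


def shared_allele_distance(geno_a, geno_b):
    """1 - mean proportion of shared alleles over loci typed in both accessions.

    Each genotype is a list of (allele1, allele2) tuples, or None when missing.
    """
    shared, compared = 0.0, 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip loci missing in either accession
        # Multiset intersection counts alleles held in common (0, 1 or 2).
        overlap = sum((Counter(a) & Counter(b)).values())
        shared += overlap / 2.0
        compared += 1
    if compared == 0:
        raise ValueError("no loci typed in both accessions")
    return 1.0 - shared / compared


# Two hypothetical accessions typed at three SSR loci (allele sizes in bp).
acc_1 = [(150, 150), (200, 204), None]
acc_2 = [(150, 152), (200, 204), (310, 310)]
distance = shared_allele_distance(acc_1, acc_2)
```

A full pairwise matrix of such distances is the input a neighbor-joining routine would take.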
We compared the shared-allele distance matrices between SSR and SNP data with a Mantel test, which was based on 1000 random permutations as implemented in MXCOMP within the NTSYSpc 2.10j package (New York, NY, USA).
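The logic of a permutation-based Mantel test can be sketched as follows. This is a generic version that correlates the upper triangles of two distance matrices and permutes the rows and columns of one matrix together; it is offered under the assumption of a one-sided test on the Pearson correlation, and the details of the MXCOMP routine may differ. All names are hypothetical.

```python
import random
from math import sqrt


def _upper(matrix, order):
    """Upper-triangle entries of matrix after relabelling objects by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, permutations=1000, seed=0):
    """Return (observed correlation, one-sided permutation P-value)."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = _upper(d1, identity)
    observed = _pearson(x, _upper(d2, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the objects of the second matrix
        if _pearson(x, _upper(d2, order)) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)
```

Permuting object labels, rather than individual matrix cells, preserves the dependence among distances that share an accession, which is why an ordinary correlation test is not used for distance matrices.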
Results
Diversity among G. soja and G. max samples
We attempted to genotype 435 accessions of wild and cultivated soybean at 99 SSRs and 738 SNPs. Both SSRs and SNPs were culled with respect to quality and failure rates. For the SSR dataset, all 99 loci provided reliable results, but 62 accessions had missing data for nine or more loci; these accessions were removed from further analyses. For the SNP dataset, 167 SNP loci were removed in total, owing to failure in 20% or more of samples (121 SNPs), apparent heterozygosity in > 20% of samples, suggesting paralogous markers (34 SNPs; Fig. S1), or monomorphism in all accessions (12 SNPs). Together with the 17 SNPs excluded earlier because of overlapping or ambiguous map locations, this left a SNP dataset of 554 SNPs (738 − 17 − 167; Table S2). However, 70 accessions had missing data for 55 SNP loci, and these accessions were removed. Thus, the genotyping data resulted in three datasets: the SSR dataset, consisting of 373 accessions scored for 99 SSRs; the SNP dataset, consisting of 365 accessions genotyped for 554 SNPs; and a combined SSRs + SNPs dataset, comprising 303 common accessions scored for 554 SNPs and 99 SSRs.
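The three SNP exclusion criteria can be expressed as a short filter. This is a minimal sketch, assuming genotypes are stored as two-character strings with None for a failed call and that both thresholds are proportions of all samples; the function name and data layout are hypothetical and do not describe the authors' pipeline.

```python
def classify_snp(calls, max_fail=0.20, max_het=0.20):
    """Return the reason a SNP would be dropped, or 'keep'.

    calls: one genotype per accession, e.g. 'AA' or 'AG', or None if the call failed.
    """
    n = len(calls)
    failed = sum(1 for c in calls if c is None)
    if failed / n >= max_fail:  # failure in 20% or more of samples
        return "high_failure"
    typed = [c for c in calls if c is not None]
    het = sum(1 for c in typed if c[0] != c[1])
    if het / n > max_het:  # apparent heterozygosity in more than 20% of samples
        return "excess_heterozygosity"
    if len({allele for c in typed for allele in c}) < 2:
        return "monomorphic"
    return "keep"


# Hypothetical SNP typed in ten accessions.
status = classify_snp(["AA"] * 5 + ["GG"] * 4 + ["AG"])
```

Because the accessions are largely selfing lines, a high share of heterozygous calls is more plausibly a sign of paralogous sequence than of true heterozygosity, which is the rationale given in the text for the second criterion.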
To explore properties of genetic variation, summary statistics were calculated for the various datasets (Table 1). The 99 SSR loci averaged 21.5 alleles per locus, with an overall Nei’s gene diversity of 0.77. The SNPs had substantially lower genetic diversity (0.35) than the SSRs (t-test, P < 0.001). The marker types also had markedly different allele frequency distributions (Fig. S2); for SSRs, most (80.9%) of the alleles were at < 5% frequency, but most (92.0%) SNPs had an overall frequency ≥ 5%. Presumably, the relatively high frequency of SNP markers reflects ascertainment biases.
A few major themes become apparent when contrasting diversity between G. soja and G. max. First, SSRs indicate that wild G. soja has significantly higher allelic richness, gene diversity and allele numbers than cultivated G. max (t-test, P < 0.01). For example, the number of alleles observed in G. soja (1807) exceeded that of G. max (1473), despite smaller sample sizes in G. soja (92 vs 279). After using the rarefaction method to standardize for sample sizes, G. soja still exhibited higher numbers of expected distinct and private alleles than G. max at different sample sizes (Fig. S3).
Second, SNPs provided a slightly different snapshot of molecular diversity between the two taxa. The numbers of distinct and private alleles expected in G. soja were higher than in G. max (Fig. S3), as was found with SSR data. However, genetic diversity, as measured both by Nei’s gene diversity and by π, was lower in G. soja (0.301 and 0.309, respectively) than in G. max (0.337 and 0.343, respectively) (Table 1).
The information from SSR and SNP markers was concordant in some respects, however. For example, both indicate that China had the highest gene diversity within G. soja followed by Japan, Korea and Russia (Table 1). Both marker types also provided similar insights into the partitioning of diversity based on AMOVA analyses (Table 2). The proportion of variance caused by differences between species was 9.7–10.4% for SSRs and 14.3–15.2% for SNPs. For both, the largest component of variation was among individuals within population (SSRs, 73.9–84.1%; SNPs, 65.7–78.4%).
Population structure
STRUCTURE vs INSTRUCT We applied two Bayesian approaches – STRUCTURE (Pritchard et al., 2000; Falush et al., 2003) and InStruct (Gao et al., 2007) – to investigate genetic clustering among G. max and G. soja accessions. Each approach was applied to SSR data alone, SNP data alone, and combined SSR + SNP data. The analyses using STRUCTURE did not produce a clear ‘plateau’, as the estimated log probability of data Pr(X ⁄ K) increased gradually as values of K increased (Fig. S4). The variance in log_e Pr(X ⁄ K) (Var(log_e P(X ⁄ K))) increased constantly from K = 1 to K = 4 (SSR dataset) or K = 5 (SNP dataset), with only slight changes at higher K-values. For SSR data, most (83.4%) accessions were assigned to a population at K = 6, and this subdivision seemed biologically sensible for selfing soybean (see the section ‘STRUCTURE analyses among datasets’). Hence we selected K = 6 as the optimal cluster number. For SNPs we selected K = 5 because only 57.1% of accessions could be assigned to a single population when K = 6, but 76.3% of accessions could be assigned with K = 5. InStruct suggested the same number of populations for the SSR (K = 6) and SNP (K = 5) datasets (data not shown).
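The share of accessions "assigned to a population" depends on a membership cut-off that the text does not state. The sketch below shows the calculation under a purely hypothetical cut-off of 0.6 on the largest membership coefficient; both the threshold and the function name are assumptions made for illustration.

```python
def fraction_assigned(q_matrix, threshold=0.6):
    """Share of accessions whose largest membership coefficient reaches the threshold.

    q_matrix: one row per accession, one membership coefficient per cluster.
    threshold: hypothetical cut-off; the value used in the study is not given.
    """
    if not q_matrix:
        raise ValueError("q_matrix is empty")
    assigned = sum(1 for row in q_matrix if max(row) >= threshold)
    return assigned / len(q_matrix)


# Four hypothetical accessions at K = 2.
q = [[0.9, 0.1], [0.5, 0.5], [0.3, 0.7], [0.55, 0.45]]
share = fraction_assigned(q)
```

Comparing this share across values of K is one way to express the reasoning in the text, where K = 5 was preferred for SNPs because fewer accessions could be assigned at K = 6.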
The results were highly concordant among runs in STRUCTURE and InStruct analyses, respectively, and thus results are shown for a single run. At K = 2, the population structure inferred by these two approaches was similar (Fig. 2a,b); both first differentiated G. soja and G. max. However, the assignment of accessions differed slightly between the two approaches (Table S1). For example, STRUCTURE split cultivated soybean into four populations and wild soybean into two populations, but InStruct split cultivated soybean into three populations and wild soybean into three populations.
Table 1 Summary statistics for Glycine max and Glycine soja populations by single-nucleotide polymorphism (SNP) (554) and simple sequence repeat (SSR) (99) loci
SNP
Species   Type       Origin   Sample size   Number of polymorphic loci   π¹      Nei’s gene diversity   Heterozygosity²   f³
G. max    -          -        298           546                          0.343   0.337                  0.049             0.854
G. max    Bred       China    65            524                          0.322   0.313                  0.050             0.842
G. max    Landrace   China    233           541                          0.344   0.338                  0.049             0.856
G. soja   -          -        65            533                          0.309   0.301                  0.072             0.763
G. soja   -          China    41            525                          0.314   0.340                  0.067             0.809
G. soja   -          Japan    14            319                          0.259   0.269                  0.059             0.795
G. soja   -          Korea    5             300                          0.242   0.233                  0.113             0.593
G. soja   -          Russia   5             196                          0.185   0.177                  0.013             0.942
Total⁵    -          -        363           554                          0.357   0.350                  0.054             0.836
SSR
Species   Type       Origin   Sample size   Number of alleles   Nei’s gene diversity   Allelic richness⁴   Heterozygosity²   f³
G. max    -          -        279           1473                0.687                  10.1 (43)           0.049             0.929
G. max    Bred       China    62            884                 0.672                  8.0 (30)            0.056             0.917
G. max    Landrace   China    217           1332                0.682                  8.8 (30)            0.046             0.932
G. soja   -          -        92            1807                0.871                  16.7 (43)           0.153             0.826
G. soja   -          China    61            1506                0.851                  5.4 (4)             0.155             0.821
G. soja   -          Japan    15            842                 0.807                  5.1 (4)             0.156             0.819
G. soja   -          Korea    7             542                 0.736                  4.5 (4)             0.229             0.730
G. soja   -          Russia   9             463                 0.695                  4.0 (4)             0.073             0.909
Total⁵    -          -        371           2133                0.766                  14.0 (43)           0.075             0.903
¹Nucleotide diversity. Sites with alignment gaps or missing data were considered.
²The average of proportion of heterozygous individuals in the population (Liu & Muse, 2005).
³f, Inbreeding coefficient.
⁴The number in the bracket displays minimum sample size for testing allelic richness.
⁵The overall estimates are calculated as the average across all loci, whereas variances and confidence intervals are estimated by nonparametric bootstrapping (100 times) across different loci.
Overall, the differences in assignment seemed to be relatively minor, because the consistency of assignment was very high for most groups. For example, we calculated the frequency of assignment between STRUCTURE and InStruct based on SSR, SNP and SSR + SNP data (Table S3). For the SSR data, 100% of NER and NR STRUCTURE-inferred accessions were assigned by InStruct into the NER and NR clusters. Similarly, 97.3% of the HR and 99.0% of the SR accessions inferred by STRUCTURE were assigned into the HR + SR cluster identified by InStruct.
Fig. 2 Population structure inferred by Bayesian clustering approaches based on simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP data, respectively. (a) Total accessions using STRUCTURE; (b) total accessions using INSTRUCT; (c) Glycine soja without inferred hybrids using STRUCTURE. Each individual is shown as a thin vertical line partitioned into K colored components, representing inferred membership in K genetic clusters. The top row (a and b) provides the species name and the bottom row (a, b and c) indicates geographic region. NER, northeast region, China; NR, north region, China; HR, Huanghuai region, China; SR, south region, China; C, China; R, Russia; K, Korea; J, Japan.
Table 2 Analyses of molecular variance (AMOVA) based on simple sequence repeat (SSR) (99 loci) and single-nucleotide polymorphism (SNP) (554 loci) analyses
Sample    Number of groups        Number of populations              Percentage of variation and 95% confidence intervals (%)
                                  Total   In G. soja   In G. max     Among groups    Among populations within groups    Among individuals within populations    Within individuals
                                                                     SSR     SNP     SSR     SNP                        SSR     SNP                             SSR     SNP
Total     2 (G. soja, G. max)     11      7            4             10.4    15.2    4.6     5.9                        73.9    65.7                            11.2    13.3
Total     2 (G. soja, G. max)     8       4            4             9.7     14.3    4.6     6.3                        74.5    66.2                            11.2    13.3
Total     1                       2       1            1             -       -       11.5    16.5                       77.3    70.2                            11.3    13.3
G. max    1                       4       -            4             -       -       5.0     7.2                        84.1    78.4                            10.9    14.4
G. soja   4 (China, Korea,
G. soja   1                       7       7            -             -       -       5.2     7.3                        77.3    68.5                            17.6    24.3
G. soja   1                       4       4            -             -       -       6.8     8.7                        76.1    67.7                            17.1    23.6
The greatest difference between the STRUCTURE and InStruct analyses was in the number of admixed individuals. STRUCTURE identified far more admixed individuals (78), but the number of unassigned accessions was much lower in InStruct analyses (8). Because the STRUCTURE results more closely fit the geographical distribution and the results of previous studies (Li et al., 2008; Wen et al., 2008), we chose to describe them in more detail.
STRUCTURE analyses among datasets The STRUCTURE results varied somewhat among the three data sets (Fig. 2a,b). For SSRs alone, the two taxa were clearly delineated at K = 2. Each additional cluster delineated geographic regions (Fig. 3): at K = 3, G. max accessions from south China separated from north China; at K = 4, accessions from northeast China were separated from the Yellow River region; at K = 5, G. soja accessions split into two clusters (China vs neighboring countries); finally, at K = 6, the K-value with the highest likelihood, G. max separated into two clusters along the Yellow River. Thus, the cultivated accessions ultimately grouped into four clusters that were largely concordant with major geographic regions in China, including NER, NR, HR and SR.
The STRUCTURE analyses of the SNP and SNP + SSR datasets agree with the SSR data in most respects – that is, G. max is differentiated into four regional groups, and G. soja is clearly separated from G. max. However, the group delineation with K = 2 was not primarily along taxonomic lines. This initial delineation separated G. max from a group that included G. soja and part of the NR group of G. max (the other NR accessions were defined as unassigned accessions), thus suggesting the possibilities either of introgression between G. soja and NR or of recent shared ancestry.
The STRUCTURE analyses provided limited evidence of admixture between G. soja and G. max (Fig. 4). For all three datasets, some accessions labeled as G. max contained an appreciable component of diversity that was assigned to the wild gene pool. For the SSR dataset, for example, the accessions could be divided into sets with low, medium and high assignment probabilities to the G. soja gene pool. The low set consisted of 264 G. max and two G. soja accessions with an ancestry coefficient ≤ 0.27. (Here 'ancestry coefficient' is defined as the inferred proportion of membership in the G. soja gene pool when K = 2.) The high group included 61 accessions, comprising 60 G. soja and one G. max accession with ancestry coefficients ≥ 0.88. The middle group, which is the most interesting because it may represent hybrids or introgressed material, consisted of 44 accessions (15 from G. max and 29 from G. soja) with ancestry coefficients between 0.33 and 0.83. This middle group contained
Fig. 4 Distinction of wild and cultivated soybean, expressed as individual ancestry to the wild gene pool in a STRUCTURE analysis assuming two populations. Ancestry was ranked for each individual, and the ranks are plotted against the ancestry in the wild gene pool. The arrows indicate small gaps in the distribution (see text for details). Glycine max, circles; Glycine soja, crosses. SSR, simple sequence repeat; SNP, single-nucleotide polymorphism.
[Fig. 3 panels: SSR (K = 2 to 6), SNP (K = 2 to 5) and SSR + SNP (K = 2 to 5), each showing how G. soja and the regional G. max clusters (NER, NR, HR, SR) separate as K increases.]
Fig. 3 Schematic of the clustering procedure used to infer population structure with STRUCTURE, based on simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP data for Glycine max and Glycine soja. NER, northeast region, China; NR, north region, China; HR, Huanghuai region, China; SR, south region, China; C, China; R, Russia; K, Korea; J, Japan.
more accessions with the SNP dataset (60 G. max and 14 G. soja accessions) and with the SNP + SSR dataset (43 G. max and nine G. soja accessions). Thus, while Bayesian analyses clearly delineate between taxa, these analyses also suggest that c. 20% of wild and domesticated accessions are either poorly differentiated or owe their origin to admixture.
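The three-way grouping described for the SSR dataset can be sketched as a simple threshold rule on the ancestry coefficient (the inferred proportion of membership in the G. soja gene pool at K = 2). The sketch below is illustrative only: the function names and the "gap" label are our assumptions, and the cut-offs are the SSR values quoted in the text, which need not apply to the SNP or SSR + SNP datasets.

```python
def ancestry_class(q_wild):
    """Classify one accession by its ancestry coefficient in the wild gene pool.

    Cut-offs are the SSR values quoted in the text (K = 2); the 'gap' label is
    a hypothetical name for the intervals in which no SSR accession is reported.
    """
    if not 0.0 <= q_wild <= 1.0:
        raise ValueError("ancestry coefficient must lie in [0, 1]")
    if q_wild <= 0.27:
        return "low"      # mostly G. max in the SSR analysis
    if q_wild >= 0.88:
        return "high"     # mostly G. soja in the SSR analysis
    if 0.33 <= q_wild <= 0.83:
        return "middle"   # putative hybrids or introgressed material
    return "gap"


def count_classes(coefficients):
    """Tally how many accessions fall into each class."""
    counts = {"low": 0, "middle": 0, "high": 0, "gap": 0}
    for q in coefficients:
        counts[ancestry_class(q)] += 1
    return counts
```

Applied to the per-accession output of a K = 2 run, `count_classes` would return the sizes of the low, middle and high sets; a non-zero "gap" count would indicate that the cut-offs do not match the gaps in that dataset's distribution.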
To detect population structure within G. soja, we removed the putative hybrids (Fig. 4) and reanalyzed these G. soja datasets with STRUCTURE. The SSR, SNP and SSR + SNP datasets yielded similar results (Fig. S4b). We found K = 6 to converge well and show the highest averaged likelihoods among runs of the program for all three datasets. The accessions from different regions (except NER) tended to form six distinct clusters, corresponding to their geographical origins (Japan, Korea, Russia and NR, HR and SR in China) (Fig. 2c). This pattern demonstrates that geographical genetic differentiation exists in wild soybean (G. soja).
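Choosing K by the highest likelihood averaged over replicate runs can be sketched as follows. This is a minimal illustration, not the authors' procedure in detail: the function name is hypothetical and the log-likelihood values are made up for the example, not taken from Fig. S4.

```python
from statistics import mean


def best_k(log_likelihoods_by_k):
    """Return the K whose replicate runs have the highest mean ln P(X | K)."""
    averaged = {k: mean(runs) for k, runs in log_likelihoods_by_k.items()}
    return max(averaged, key=averaged.get), averaged


# Made-up values for illustration only; they are NOT the study's results.
toy_runs = {
    4: [-1250.0, -1248.0, -1252.0],
    5: [-1210.0, -1215.0, -1205.0],
    6: [-1190.0, -1192.0, -1188.0],
    7: [-1195.0, -1230.0, -1201.0],
}
k, averaged = best_k(toy_runs)
```

In practice one would also inspect the variance among runs at each K, since a high mean with poor convergence (as in the toy K = 7 runs) is less trustworthy than a slightly lower but stable value.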
Genealogical analyses
To gain insight into potential locations of G. max domestication, we constructed a neighbor-joining tree, based on individual accessions, with G. tomentella and G. falcata outgroups (Fig. 5a). Regardless of the dataset examined, with few exceptions, cultivated soybean tended to form a monophyletic clade with respect to G. soja. Moreover, accessions within G. max tended to form subpopulations corresponding to geographic origin, but there was overlap, particularly among accessions from the NR and HR regions with SSR data, consistent with the STRUCTURE analyses (Fig. 2a).
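The trees are based on shared-allele pairwise distances (Fig. 5). A minimal sketch of one common form of this distance between two accessions, one minus the proportion of alleles shared averaged over loci, is given below. The data layout (a list of per-locus allele tuples, with `None` for missing data) and the function name are assumptions made for illustration; the software used in the study may handle missing data and ploidy differently.

```python
from collections import Counter


def shared_allele_distance(genotype_a, genotype_b):
    """Return 1 - mean proportion of shared alleles over loci scored in both.

    Each genotype is a list of loci; each locus is a tuple of alleles
    (e.g. (150, 152) for an SSR or ("A", "A") for a SNP), or None if missing.
    """
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotypes must cover the same loci")
    proportions = []
    for locus_a, locus_b in zip(genotype_a, genotype_b):
        if locus_a is None or locus_b is None:
            continue  # skip loci missing in either accession
        # Multiset intersection counts alleles shared with multiplicity.
        shared = sum((Counter(locus_a) & Counter(locus_b)).values())
        proportions.append(shared / max(len(locus_a), len(locus_b)))
    if not proportions:
        raise ValueError("no loci scored in both accessions")
    return 1.0 - sum(proportions) / len(proportions)
```

A matrix of such distances over all pairs of accessions is the input that a neighbor-joining algorithm would then turn into a tree.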
To better assess the monophyly of G. max and relationships among geographic regions, we collapsed clades to form OTUs. We pooled the accessions into 13 (SNPs and SSRs + SNPs) or 15 (SSRs) OTUs based on their geographic origin (latitude and longitude), their position in Fig. 5(a) and their assignment in STRUCTURE analyses (see
[Fig. 5 panels: (a) trees of individual accessions for SSR, SNP and SSR + SNP data, with the perennial outgroups G. tomentella and G. falcata and a scale bar of 0.05; (b) OTU trees for SSR, SNP and SSR + SNP data, with bootstrap percentages at the nodes.]
Fig. 5 Neighbor-joining tree of Glycine soja and Glycine max rooted with Glycine tomentella and Glycine falcata based on shared allele pairwise distances. (a) Trees for individual soybean accessions. Colored symbols indicate the inferred genetic cluster from STRUCTURE analyses. SSR, simple sequence repeat; SNP, single-nucleotide polymorphism. (b) Trees for operational taxonomic units (OTUs) of soybean accessions. The percentage bootstrap support is indicated at each node. Abbreviations for OTUs are: C, cultivated soybean (G. max); W, wild soybean (G. soja); NER, northeast region, China; NR, north region, China; HR, Huanghuai region, China; SR, south region, China; EP, east part; WP, west part; NP, north part; SP, south part.
Table S1). The OTUs comprised 3–20 accessions. This approach again resulted in a monophyletic grouping of G. max (Fig. 5b), but also revealed that wild accessions from the HR region (SSR analyses) or the NR + HR cluster (SNP analyses) are closest phylogenetically to the G. max clade. Interestingly, STRUCTURE analyses of SNP data assigned NR cultivated accessions into G. soja clusters at K = 2 (Fig. 3). These patterns suggest that the NR accessions best represent early domestication germplasm.
In most trees, the deepest split separated the two species (Fig. 5). Within species, there were clear geographic patterns of the OTUs. Within G. max, OTUs from NR split first. Within G. soja, the populations from China were more closely related to G. max than they were to populations from Japan or Korea. Although not entirely consistent among datasets, phylogenetic analyses tended to suggest that wild soybeans from NR and HR, both of which are along the Yellow River, were genetically most closely related to cultivated soybeans.
Discussion
The analysis of genetic diversity in domesticated crops and their wild ancestors is typically used for one of three purposes. The first is to identify distinct genetic groups for retention of germplasm (Agrama et al., 2009). The second is to identify the genes that underlie important phenotypic and genetic shifts during domestication and breeding, using the approach of selective sweep mapping (Vigouroux et al., 2002; Wright et al., 2005; Ross-Ibarra et al., 2007; Chapman et al., 2008). The third is to infer aspects of the history and timing of domestication. Here, we have assayed genetic diversity in a broad sample of domesticated soybean (G. max) and its wild progenitor (G. soja) with this third purpose in mind – that is, to provide additional insights into soybean domestication – but the breadth of our study ensures that it is also useful for analyses of germplasm collections.
Our study uses both SSRs and SNPs, and our sampling is much broader than that of previous studies (Matsuoka et al., 2002; Becquet et al., 2007). Nonetheless, our data, like previous data (Powell et al., 1996; Kuroda et al., 2006), suggest that the primary division of genetic diversity is consistently that between wild (G. soja) and domesticated soybean (G. max). For example, STRUCTURE and InStruct analyses on all datasets clearly delineate wild and cultivated germplasm at K = 2, and AMOVA analyses also clearly differentiate between wild and cultivated accessions. Moreover, phylogenetic analyses of both individual accessions (Fig. 5a) and collapsed OTUs (Fig. 5b) tend to suggest that the cultivated germplasm is monophyletic. Based on these lines of evidence, we favor the interpretation that soybean, like maize (Matsuoka et al., 2002), barley (Badr et al., 2000), pearl millet (Oumar et al., 2008), emmer wheat (Ozkan et al., 2002) and einkorn wheat (Heun et al., 1997), may result from a single domestication event.
If this inference is correct, soybean differs from other species for which genetic data provide compelling evidence of multiple domestication events (Londo et al., 2006; Morrell & Clegg, 2007; Sang & Ge, 2007; Aguilar-Melendez et al., 2009). In this context, it should be emphasized that inferential methods are imperfect, because simulations indicate that multiple domestication events can lead to monophyletic clustering of domesticated accessions under some conditions (Allaby et al., 2008; Ross-Ibarra & Gaut, 2008). Thus, a pattern of monophyly could provide a false signal of a single domestication event, and there may be a bias toward concluding there has been a single domestication event even when it is untrue. Nonetheless, our inference of a single domestication event is consistent with most previous studies of soybean (Xu et al., 1986; Zhu et al., 1995; Zhou et al., 1998; Gai et al., 2000), except one based on a modest sample of chloroplast DNA (cp)SSRs, which suggested that the cultivated soybean originated independently in different regions from different wild gene pools (Xu et al., 2002).
Admixture and geographic subdivision
Because our study relies on more genetic markers than previous studies, our data provide more potential for insight into geographic delineations within species and hybridization between species. With regard to genetic subdivisions within species, G. max clustered by geographic location; G. max in China divides into the geographic regions NER, NR, HR and SR. This is a coarser clustering than inferred previously from a study that used fewer SSR markers (59) but more landrace accessions (1863) (Li et al., 2008). The latter identified seven clusters, representing roughly the clusters inferred here, except that four separate clusters were inferred within the geographical region of SR, and these SR subclusters reflected differences in sowing season. Thus, increasing sample size may be beneficial for inferring fine-scale geographic structure (Morin et al., 2004).
We infer six genetic subgroups within G. soja. These six clusters separate geographically, corresponding to Japan, Korea, Russia and three distinct regions in China. This pattern was also discovered in a previous study (Wen et al., 2009). AMOVA analyses at 60 SSR loci and eight morphological traits with 196 Chinese G. soja accessions also showed that significant variation exists among northeast China, the Huanghuai Valleys and southern China. The lone exception was the Northeast region (NER) of China, for which accessions were mainly assigned to the mixture cluster and the HR, NR and Korea subgroups. This may reflect the small number of accessions from NER (six for SSR, five for SNP and three for SSR + SNP analyses).
The STRUCTURE and InStruct analyses provided slightly different insights into the extent of potential hybridization between wild and cultivated soybean. STRUCTURE suggests that fully 20% of our accessions are admixed, which may indicate extensive post-domestication hybridization between species. By contrast, InStruct assigns only 0.5% of individuals as hybrid, based on the SSR dataset, which is more similar to the measured natural hybridization rate of 0.73% (Nakayama & Yamaguchi, 2002). Potential hybrid individuals have been noted in the field, and sometimes these hybrids are considered to belong to an intermediate evolutionary species, Glycine gracilis (Skvortzow, 1927; Fukuda, 1933; Chen & Nelson, 2004), but others have considered them as hybridization products of G. soja and G. max (Hymowitz, 1970).
Almost all unassigned accessions originated from the region from 30 to 40°N latitude in China, but along a wide longitudinal swath. We thus examined morphological characteristics of NR accessions that were collected from the 34 to 40°N region and assigned into the G. soja cluster when K = 2 with SNP data. Based on the descriptions of seed color, 100-seed weight, growth habit and stem termination in catalogs (Wang, 1982; Chang & Sun, 1991; Chang et al., 1996), most of these exhibit ancestral traits, including: 72% with black and 22% with bicolor or green seed coat color, 100% with small seed size (100-seed weight < 12 g), 78% with viney (or semi-viney) growth habit and 89% with indeterminate (or semi-determinate) stem termination. Thus, these accessions seem to show evidence of admixture.
SNPs vs SSRs
This study utilized extensive data from both SNP and SSR markers and hence provides an opportunity to carefully assess the relative utility of these two marker types. Overall, SNPs had lower resolving power for detecting population structure. For example, SSR data yielded six clusters that were consistent with geographical origin, but SNPs revealed only five clusters without resolving groups expected to be clearly differentiated (e.g. Chinese vs Russian populations and Japanese vs Korean populations within G. soja). Despite these differences, SSR and SNP analyses still yielded similar population structure within the species, especially within G. max, similar fractions of diversity attributable to various hierarchical components of population structure (Table 2) and similar phylogenetic information, as measured by Mantel tests on the pairwise shared-allele distance (r = 0.505, P = 0.001).
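A Mantel test of this kind correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one of them. The following is a minimal pure-Python sketch under our own assumptions: a one-sided test, Pearson correlation, and hypothetical function names. The number of permutations used in the study is not stated here, although with 999 permutations the smallest attainable P-value under this scheme is 0.001.

```python
import math
import random


def _upper(matrix, order):
    """Upper-triangle entries of a square matrix after reordering rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test between two distance matrices."""
    n = len(dist_a)
    identity = list(range(n))
    x = _upper(dist_a, identity)
    r_obs = _pearson(x, _upper(dist_b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute accessions in the second matrix only
        if _pearson(x, _upper(dist_b, order)) >= r_obs:
            hits += 1
    # Count the observed arrangement itself as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Here `dist_a` and `dist_b` would be the SSR-based and SNP-based shared-allele distance matrices over the same accessions in the same order.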
Some previous studies have also found that SSR markers perform better at clustering germplasm into populations than SNP markers (Rosenberg et al., 2003; Hamblin et al., 2007; Payseur & Jing, 2009). In our case, it seems that there are two reasons for the discrepancy in the power to resolve populations. The first is the level of genetic diversity. The number of observed SNP alleles (1108) was only half that of SSR markers (2133), despite assaying > 5 times as many SNP loci (554 vs 99). The second, as noted previously (Morin et al., 2004; Kauwe et al., 2005), is the frequency of distinct alleles. The SNP markers used in this study were discovered in only six G. max cultivars (Choi et al., 2007) and hence most of our alleles were neither specific to G. soja nor rare within populations. Hence, population structure was probably more poorly resolved because common alleles are more likely to be shared among populations.
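The allele counts quoted above can be checked with simple arithmetic. The short sketch below uses only the numbers given in the text; the variable names are ours.

```python
# Locus and allele counts as quoted in the text.
snp_loci, snp_alleles = 554, 1108
ssr_loci, ssr_alleles = 99, 2133

snp_alleles_per_locus = snp_alleles / snp_loci  # exactly 2.0 (biallelic SNPs)
ssr_alleles_per_locus = ssr_alleles / ssr_loci  # about 21.5 alleles per SSR locus
locus_ratio = snp_loci / ssr_loci               # about 5.6 times as many SNP loci
allele_ratio = snp_alleles / ssr_alleles        # about 0.52, i.e. roughly half

print(f"SNP alleles per locus: {snp_alleles_per_locus:.1f}")
print(f"SSR alleles per locus: {ssr_alleles_per_locus:.1f}")
print(f"SNP:SSR locus ratio:   {locus_ratio:.1f}")
print(f"SNP:SSR allele ratio:  {allele_ratio:.2f}")
```

The per-locus SSR figure agrees with the average of 21.5 alleles per locus reported in the Summary, and the two-allele ceiling per SNP locus is what limits the total SNP allele count despite the larger number of loci.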
Acknowledgements
This research was supported by the State Key Basic Research and Development Plan of China (973) (Nos. 2010CB125900 and 2004CB117203), National Key Technologies R&D Program in the 11th Five-Year Plan (No. 2006BAD13B05), State High-tech (863) (Nos. 2006AA10A110 and 2006AA10Z164), International Science and Technology Cooperation and Exchanges Projects (No. 20061773) and the Academy and Institute Foundation for Basic Scientific Research in Institute of Crop Science, Chinese Academy of Agricultural Sciences. We thank Dr Song Ge (Institute of Botany, Chinese Academy of Sciences, Beijing, China), Dr Marinus J. M. Smulders (Plant Research International, Wageningen UR, the Netherlands), Dr Richard Abbott and two anonymous reviewers for stimulating discussions and useful suggestions.
References
Abe J, Xu D, Suzuki Y, Kanazawa A, Shimamoto Y. 2003. Soybean germplasm pools in Asia revealed by nuclear SSRs. Theoretical and Applied Genetics 106: 445–453.
Agrama HA, Yan WG, Lee F, Fjellstrom R, Chen MH, Jia M, McClung A. 2009. Genetic assessment of a mini-core subset developed from the USDA rice genebank. Crop Science 49: 1336–1346.
Aguilar-Melendez A, Morrell PL, Roose ML, Kim SC. 2009. Genetic diversity and structure in semiwild and domesticated chiles (Capsicum annuum; Solanaceae) from Mexico. American Journal of Botany 96: 1190–1202.
Allaby RG, Fuller DQ, Brown TA. 2008. The genetic expectations of a protracted model for the origins of domesticated crops. Proceedings of the National Academy of Sciences, USA 105: 13982–13986.
Badr A, Müller K, Schäfer-Pregl R, Rabey HE, Effgen S, Ibrahim HH, Pozzi C, Rohde W, Salamini F. 2000. On the origin and domestication history of barley (Hordeum vulgare). Molecular Biology and Evolution 17: 499–510.
Becquet C, Patterson N, Stone AC, Przeworski M, Reich D. 2007. Genetic structure of chimpanzee populations. PLoS Genetics 3: e66. 0617–0626.
Chang R. 1989. Studies on the origin of the cultivated soybean (Glycine max (L.) Merr.). Oil Crops of China: 1–6.
Chang R, Sun J. 1991. Catalogues of Chinese soybean germplasm and resources (sequel 1). Beijing, China: China Agricultural Press.
Chang R, Sun J, Qiu L, Chen Y. 1996. Catalogues of Chinese soybean germplasm and resources (sequel 2). Beijing, China: China Agricultural Press.
of simple sequence repeats and single nucleotide polymorphisms in assessment of maize diversity and relatedness. PLoS ONE 2: e1367.
Heun M, Schafer-Pregl R, Klawan D, Castagna R, Accerbi M, Borghi B, Salamini F. 1997. Site of einkorn wheat domestication identified by DNA fingerprinting. Science 278: 1312–1314.
Hirata T, Abe J, Shimamoto Y. 1999. Genetic structure of the Japanese soybean population. Genetic Resources and Crop Evolution 46: 441–453.
Hymowitz T. 1970. On the domestication of the soybean. Economic Botany 24: 408–421.
Hymowitz T, Newell C. 1981. Taxonomy of the genus Glycine, domestication and uses of soybeans. Economic Botany 35: 272–288.
Jones MK, Liu X. 2009. Origins of agriculture in East Asia. Science 324: 730–731.
Kauwe J, Bertelsen S, Bierut L, Dunn G, Hinrichs A, Jin C, Suarez B. 2005. The efficacy of short tandem repeat polymorphisms versus single-nucleotide polymorphisms for resolving population structure. BMC Genetics 6(Suppl. 1): S84.
Kuroda Y, Kaga A, Tomooka N, Vaughan D. 2006. Population genetic structure of Japanese wild soybean (Glycine soja) based on microsatellite variation. Molecular Ecology 15: 959–974.
Li FS. 1994. A study on origin and evolution of soybean. Soybean Science (China) 13: 61–66.
Li Y, Guan R, Liu Z, Ma Y, Wang L, Li L, Lin F, Luan W, Chen P, Qiu L. 2008. Genetic structure and diversity of cultivated soybean (Glycine max (L.) Merr.) landraces in China. Theoretical and Applied Genetics 117: 857–871.
Liu K, Muse S. 2005. Powermarker: an integrated analysis environment for genetic marker analysis. Bioinformatics 21: 2128–2129.
Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, Gaut BS. 2005. The effects of artificial selection on the maize genome. Science 308: 1310–1314.
Xie H, Chang R, Guan R, Qiu L. 2005. Genetic diversity of Chinese summer soybean germplasm revealed by SSR markers. Chinese Science Bulletin 50: 526–535.
Xu B, Zheng H, Lu Q, Zhao S, Zhou S. 1986. Three new evidences of the origional area of soybean. Soybean Science (China) 5: 123–130.
Xu D, Abe J, Gai J, Shimamoto Y. 2002. Diversity of chloroplast DNA SSRs in wild and cultivated soybeans: evidence for multiple origins of cultivated soybean. Theoretical and Applied Genetics 105: 645–653.
Xu D, Gai J. 2003. Genetic diversity of wild and cultivated soybeans growing in China revealed by RAPD analysis. Plant Breeding 122: 503–506.
Zhang LB, Zhu Q, Wu ZQ, Ross-Ibarra J, Gaut BS, Ge S, Sang T. 2009. Selection on grain shattering genes and rates of rice domestication. New Phytologist 184: 708–720.
Zhao TJ, Gai JY. 2004. The origin and evolution of cultivated soybean (Glycine max (L.) Merr.). Scientia Agricultura Sinica 37: 954–962.
Zhou X, Peng Y, Wang G, Chang R. 1998. Preliminary studies on the centers of genetic diversity and origination of cultivated soybean in China. Acta Agronomica Sinica 31: 37–43.
Zhu T, Shi L, Doyle JJ, Keim P. 1995. A single nuclear locus phylogeny of soybean based on DNA sequence. Theoretical and Applied Genetics 90: 991–999.
Supporting Information
Additional supporting information may be found in the online version of this article.
Fig. S1 Heterozygosity distribution at 99 simple sequence repeat (SSR) loci and 554 single-nucleotide polymorphism (SNP) loci.
Fig. S2 Allele frequency distribution at 99 simple sequence repeat (SSR) loci and 554 single-nucleotide polymorphism (SNP) loci.
Fig. S3 The mean number of distinct alleles (a) and private alleles (b) per locus as a function of standardized sample size for two species (Glycine soja and Glycine max) and their subpopulations in China.
Fig. S4 Estimated ln P(X ⁄ K) and Var(log_e P(X ⁄ K)) for K from 1 to 8 based on simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP datasets in all accessions (a) and wild accessions (b).
Fig. S5 Each value gives the per cent of accessions that had ≥ 0.8 ancestry in the STRUCTURE analyses.
Table S1 The geographical distribution pattern and inferred clusters and OTUs based on simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP analyses of 435 accessions in China
Table S2 Description of the 99 simple sequence repeat (SSR) markers and 617 single-nucleotide polymorphism (SNP) markers, population genetic statistics for the SSR and SNP complete dataset, respectively
Table S3 Frequency of assignment from STRUCTURE-inferred clusters to INSTRUCT-inferred clusters based on simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP dataset, respectively
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.