
Genetic diversity in domesticated soybean (Glycine max) and its wild progenitor (Glycine soja) for simple sequence repeat and single-nucleotide polymorphism loci


Ying-Hui Li¹, Wei Li¹, Chen Zhang¹, Liang Yang², Ru-Zhen Chang¹, Brandon S. Gaut² and Li-Juan Qiu¹

¹The National Key Facility for Crop Gene Resources and Genetic Improvement (NFCRI)/Key Lab of Germplasm Utilization (MOA), Institute of Crop Science, Chinese Academy of Agricultural Sciences, 100081 Beijing, China; ²Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA

Author for correspondence: Li-Juan Qiu, Tel: +86 10 8210 5843

Email: [email protected]

Received: 6 January 2010

Accepted: 14 May 2010

New Phytologist (2010) 188: 242–253, doi: 10.1111/j.1469-8137.2010.03344.x
© The Authors (2010). Journal compilation © New Phytologist Trust (2010). www.newphytologist.com

Key words: diversity pattern, domestication, geographic variation, introgression, population structure.

Summary

• The study of genetic diversity between a crop and its wild relatives may yield fundamental insights into evolutionary history and the process of domestication.

• In this study, we genotyped a sample of 303 accessions of domesticated soybean (Glycine max) and its wild progenitor Glycine soja with 99 microsatellite markers and 554 single-nucleotide polymorphism (SNP) markers.

• The simple sequence repeat (SSR) loci averaged 21.5 alleles per locus, with an overall Nei's gene diversity of 0.77. The SNPs had substantially lower genetic diversity (0.35) than the SSRs. SSR analyses indicated that G. soja was more diverse than G. max, but SNPs provided a slightly different snapshot of diversity between the two taxa. For both marker types, the primary division of genetic diversity was between the wild and domesticated accessions. Within taxa, G. max accessions clustered into four groups corresponding to geographic regions of China, and G. soja formed six subgroups. Genealogical analyses indicated that cultivated soybean tended to form a monophyletic clade with respect to G. soja.

• G. soja and G. max represent distinct germplasm pools, and we found only limited evidence of admixture between the two species. Overall, our analyses are consistent with the origin of G. max from regions along the Yellow River of China.

Introduction

Plant domestication fundamentally altered the course of human history, prompting the shift from hunter–gatherer to agricultural societies. Domestication is a multifaceted process that is amenable to study by a wide range of disciplines, including archaeology, anthropology, molecular genetics and evolutionary biology. Accordingly, the study of domestication has yielded fundamental insights into early societies, the genes and biological mechanisms that underlie morphological change, and the strength and patterns of selection (Doebley et al., 2006).

Recent genetic and archaeological investigations have also shown that the process of domestication can vary substantially among crop species. For example, genetic studies of maize suggest that it was domesticated only once, from a wild progenitor located in highland Mexico (Matsuoka et al., 2002). Archaeological evidence indicates that, after its domestication c. 9000 yr ago (Matsuoka et al., 2002), cultivated maize dispersed throughout the Americas quite rapidly, perhaps within hundreds of years (Pohl et al., 2007). By contrast, evolutionary genetic analyses of barley and rice have demonstrated at least two domestication events for each species (Cheng et al., 2003; Morrell & Clegg, 2007). Moreover, archaeological study of cereal grains suggests that the process of domestication may have taken thousands of years (Fuller, 2007), perhaps as long as five millennia for rice (Fuller et al., 2009).

While the result for rice is still open to interpretation (Jones & Liu, 2009) and may not be fully compatible with genetic evidence (Zhang et al., 2009), extending the duration of domestication from a rapid to a multimillennial process has important implications for interpreting patterns of genetic diversity between crops and their wild relatives. For example, a domestication process lasting thousands (as opposed to hundreds) of years provides more opportunities for local domestication, migration among local domesticates and local extinction. Just as importantly, a domestication event of long duration provides an extended period for potential introgression between the domesticate and local populations of its wild relative(s). Evidence for local introgression between a crop and its wild relatives can be inferred from patterns of genetic diversity, as has been done for rice (Garris et al., 2005).

Here, we investigate genetic diversity in cultivated soybean (Glycine max) and its wild progenitor Glycine soja. Soybean is cultivated globally, in part because it has among the highest gross oil output, together with the highest protein content, of any vegetable crop (Mohamed & Rangappa, 1992). The weight of cytological, biochemical and molecular evidence supports the domestication of soybean from G. soja, a wild annual species that is native throughout China and parts of Korea, Japan and Russia (Fig. 1). Nonetheless, several aspects of soybean domestication are not well established, which is surprising given its agricultural importance. For example, the location of domestication within China has not yet been clearly substantiated. It has been hypothesized that soybean was domesticated in north-eastern China (Fukuda, 1933; Li, 1994), the Yellow River valley of northern China (Vavilov, 1951; Hymowitz & Newell, 1981; Chang, 1989; Zhou et al., 1998; Dong et al., 2004; Zhao & Gai, 2004; Li et al., 2008) or southern China (Gai et al., 2000). It is also unclear whether soybean was domesticated more than once, although multiple domestications have been explicitly suggested (Xu et al., 2002).

Patterns of molecular diversity often yield insights into the location and number of domestication events. Molecular diversity in G. max and G. soja has been examined with a series of markers, including simple sequence repeats (SSRs), random amplified polymorphic DNA (RAPD) markers and amplified fragment length polymorphisms (AFLPs). To date, these studies have yielded similar insights into the patterning of genetic diversity in G. soja and its relationship to G. max. Typically, genetic diversity clusters by taxon, with a clear differentiation between wild and domesticated taxa (Powell et al., 1996). Within a taxon, the genetic structure of G. max and G. soja typically agrees with geographic location (Dong et al., 2001, 2004; Abe et al., 2003; Xu & Gai, 2003; Li et al., 2008b). For example, Chinese and Japanese G. soja populations form distinct germplasm pools (Hirata et al., 1999; Kuroda et al., 2006), and Asian accessions of G. max (Abe et al., 2003) group in general accordance with planting region and sowing season.

In this study, we investigate genetic diversity in a broad sample of G. max and G. soja. Our study differs from previous studies of genetic diversity in Glycine in three important ways. First, we rely on two types of molecular markers, SSRs and SNPs, which differ in mutational properties (Payseur & Jing, 2009), and compare results between them. Second, our study differs with respect to the size and extent of the Glycine sample. Our survey includes G. max individuals that represent over 90% of the phenotypic diversity found within the Chinese soybean germplasm collection and G. soja individuals representative of its natural range. Third, we explicitly consider the possibilities of multiple centers of domestication and of admixture and prolonged gene flow between the wild and the cultivated species after the initial domestication event(s).

[Fig. 1: sampling map of China and neighboring areas (Russia labeled), showing per-province sample circles and the North, Huanghuai and South regions.]

Fig. 1 The geographic distribution of samples used in this study. Regional sampling is designated by circles, where Glycine soja is represented by the open part of the circle and Glycine max by the closed portion. Each circle represents a different province, and the accompanying number gives the number of samples from that province. The four colored portions divide China into four regions: NER (northeast region), NR (north region), HR (Huanghuai region) and SR (south region). The blue lines represent the Yellow and Yangtze rivers.

With SSR data from 99 loci and SNP data from 554 loci (of 738 assayed), typed in a common sample of 303 individuals, we address the following questions: Do G. soja and G. max continue to represent distinct germplasm pools, as in previous studies? Is G. soja geographically structured? If so, can we identify the region or regions of China in which G. max was domesticated? Is there evidence that the process of domestication included substantial admixture between wild and cultivated populations? Along the way, we also evaluate the relative merits of SNPs and SSRs for addressing these questions.

Materials and Methods

Plant materials

We sampled a total of 435 accessions representing cultivated G. max (L.) Merr. (321), its wild progenitor species G. soja Sieb. et Zucc. (112) and two outgroup species (Fig. 1). The G. max sample consisted of 240 landraces and 81 cultivars, including the minicore collection of cultivated soybean in the Chinese National Soybean GeneBank (CNSGB) (248 G. max accessions). The minicore collection represents most of the phenotypic diversity and c. 70% of the molecular genetic diversity of the 23 587 cultivated accessions housed in the CNSGB (Wang et al., 2006; Qiu, 2009). Our accessions originate from four large ecological regions, the northeast region (NER), north region (NR), Huanghuai region (HR) and south region (SR) (Fig. 1), ranging from 19.4 to 50.2°N and from 86.3 to 130.2°E. These four regions represent the four major soybean-planting areas of China (Li et al., 2008).

Accessions of the wild progenitor were selected to represent the geographical range of this species, from 24.5 to 52.2°N and from 100.5 to 141.2°E. Of the 112 G. soja accessions, 73 were from China, 8 from Korea, 9 from Russia and 22 from Japan (Fig. 1). Single accessions of Glycine tomentella Hayata (one of two perennial species found in China) and Glycine falcata Benth. were included as outgroups. The G. max accessions and the 73 Chinese G. soja accessions were obtained from the CNSGB; the remainder were provided by Dr Randall Nelson from the USDA-ARS Soybean Germplasm Collection (University of Illinois, Urbana, IL, USA). Detailed information about each accession is provided in the Supporting Information, Table S1.

Data collection

For both SNP and SSR analyses, DNA was extracted as previously described (Xie et al., 2005) from young leaf tissue of one plant (G. tomentella and G. falcata) or from a bulk of young leaf tissue of 20–30 plants (G. soja and G. max) per accession. We bulked samples to obtain enough DNA for genotyping; bulking is justified because each accession of the minicore collection has been culled for both phenotypic and genotypic homogeneity.

Ninety-nine SSRs were selected for genotyping based on their distribution across the genetic linkage map (http://bldg6.arsusda.gov/cregan/soymap.htm). The SSR loci were mapped onto the Williams82 genome sequence (http://www.phytozome.net) with BLAST (E-value < 10⁻¹⁰), using the SSR primers as queries. If the best BLAST hit covered only part of a primer sequence, we extended it to map the full primer length. The allele size in Williams82 was then calculated from the boundaries of the extended hit. Overall, the 99 SSR loci were located on 20 integrated genetic linkage groups, covering 1581.8 cM of the soybean genome, with an average genetic distance of 20.0 cM between adjacent loci (Table S2). PCR amplification of SSRs followed Xie et al. (2005). The PCR products were separated on an ABI PRISM 377 DNA Analyser (Applied Biosystems, Foster City, CA, USA). Allele sizes were estimated with an automated sequencer (Applied Biosystems) and inspected manually. When a genotype showed multiple (≥ 3) peaks, we treated the observation as missing data. These cases had little overall effect, because they represented only 0.2% of the total genotyping data. When two peaks were identified, which occurred occasionally in four SSR markers (c. 4.0% of the 99 SSR markers), we scored the higher peak after confirmation by a repeat test (ABI PRISM 377 DNA Analyser) and polyacrylamide gel electrophoresis (PAGE).
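The primer-mapping and peak-scoring rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes BLAST tabular coordinates (1-based `qstart` and `sstart` of the first aligned primer base), a forward primer hitting the plus strand and a reverse primer hitting the minus strand, and represents electropherogram peaks as hypothetical `(size_bp, height)` pairs. All function names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def primer_5prime_coord(qstart: int, sstart: int, plus_strand: bool) -> int:
    """Reference coordinate of a primer's 5' end from a (possibly partial) BLAST hit.

    qstart/sstart are the 1-based query and subject coordinates of the first
    aligned primer base. The (qstart - 1) unaligned 5' primer bases are added
    back by extending the hit outward along the reference.
    """
    if plus_strand:
        return sstart - (qstart - 1)
    # On the minus strand, moving toward the primer's 5' end increases the coordinate.
    return sstart + (qstart - 1)


def reference_allele_size(fwd_qstart: int, fwd_sstart: int,
                          rev_qstart: int, rev_sstart: int) -> int:
    """Amplicon length between the extended forward (+) and reverse (-) primer hits."""
    start = primer_5prime_coord(fwd_qstart, fwd_sstart, plus_strand=True)
    end = primer_5prime_coord(rev_qstart, rev_sstart, plus_strand=False)
    return end - start + 1


def score_ssr_peaks(peaks: Sequence[Tuple[int, float]]) -> Optional[int]:
    """Score one SSR genotype from (size_bp, height) peaks.

    No peaks or >= 3 peaks -> None (missing data); two peaks -> size of the
    higher peak (confirmation by repeat runs and PAGE is not modeled);
    one peak -> its size.
    """
    if not peaks or len(peaks) >= 3:
        return None
    size, _height = max(peaks, key=lambda peak: peak[1])
    return size


# Forward primer aligned from its 3rd base at reference position 1003;
# reverse primer aligned from its 1st base at reference position 1200 (minus strand).
print(reference_allele_size(3, 1003, 1, 1200))                # 200
print(score_ssr_peaks([(150, 900.0), (152, 300.0)]))           # 150
print(score_ssr_peaks([(150, 9.0), (152, 3.0), (156, 1.0)]))   # None
```

Extending the partial hit before measuring keeps the computed size equal to the full amplicon length rather than the aligned portion only.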

We assayed 738 SNPs in the complete set of 435 accessions. These SNPs were polymorphic in a set of six diverse G. max accessions and were used to build the first transcript map of soybean (Choi et al., 2007). They were chosen based on a designability rank score > 0.6 and a pre-evaluation by Illumina (http://www.illumina.com/, San Diego, CA, USA) of 60 bp of upstream and downstream flanking sequence. The flanking sequence data were accessed at http://bfgl.anri.barc.usda.gov/soybean/.

The SNPs were genotyped with the Illumina GoldenGate assay, following the manufacturer's protocol and the methods described in Shen et al. (2005). For each SNP, the minimum acceptable GenCall and GenTrain scores were set at 80% and 0.6, respectively, for separating homozygote and heterozygote clusters.

The 738 SNPs were also mapped to the Williams82 genome sequence with BLAST (E-value < 10⁻¹⁰), using the SNP flanking sequences as queries; 17 of the 738 SNP markers either overlapped with another marker or had an ambiguous location and were not considered further.

A list of the SNP and SSR loci, along with their genetic and physical positions, is provided in Table S2.

Analyses of genetic diversity and population structure

Summary statistics were computed for both the SSR and SNP data sets. The statistics included the number of alleles, the proportion of heterozygous individuals in the population and Nei's gene diversity, as calculated by PowerMarker 3.25 (Liu & Muse, 2005). In addition, nucleotide diversity (π) was calculated on the SNP data with DnaSP (Rozas et al., 2003). As sample sizes differed across populations, the number of distinct alleles per sample was estimated with ADZE (Szpiech et al., 2008), which employs a rarefaction approach to obtain sample-size-corrected estimates. Sample-size-corrected estimates of allelic richness were calculated with FSTAT (Goudet, 2001).
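Nei's gene diversity and π are closely related for biallelic SNPs, which helps explain why the two statistics track each other closely in Table 1. The sketch below uses the textbook sample-size-corrected form H = n/(n − 1)·(1 − Σ p_i²), where n is the number of sampled gene copies and p_i are allele frequencies; PowerMarker's and DnaSP's exact estimators (for instance, their handling of missing data) may differ, and the function names are hypothetical. For a biallelic site, H equals the per-site π, the fraction of sequence pairs that differ.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def gene_diversity(alleles: Sequence[Hashable]) -> float:
    """Sample-size-corrected Nei's gene diversity at one locus."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def pairwise_pi(site: Sequence[Hashable]) -> float:
    """Per-site nucleotide diversity: fraction of sequence pairs that differ."""
    pairs = list(combinations(site, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def proportion_heterozygous(genotypes: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Proportion of heterozygous individuals at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


site = ["A"] * 6 + ["G"] * 4             # 10 sampled copies, p = 0.6
print(round(gene_diversity(site), 4))    # 0.5333
print(round(pairwise_pi(site), 4))       # 0.5333 (24 differing pairs out of 45)
```

Multi-locus values such as those in Table 1 would then be averages of the per-locus statistics.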

We examined population structure and differentiation with two methods. First, we used two Bayesian Markov chain Monte Carlo (MCMC) approaches, STRUCTURE 2.1 (Pritchard et al., 2000; Falush et al., 2003) and InStruct (Gao et al., 2007). STRUCTURE minimizes deviations from Hardy–Weinberg equilibrium within inferred populations; by contrast, InStruct uses expected genotype frequencies and estimates of the selfing rate to make population assignments.
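Why a selfing-aware model matters for soybean can be illustrated with the standard equilibrium relationship under partial selfing, F = s/(2 − s), where s is the selfing rate and F the inbreeding coefficient; selfing inflates homozygote frequencies relative to Hardy–Weinberg expectations. This is a textbook population-genetic sketch, not InStruct's algorithm (InStruct estimates selfing rates jointly with cluster membership), and the function names are hypothetical.

```python
from typing import Dict


def equilibrium_f(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient under partial selfing: F = s / (2 - s)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing rate must be in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def selfing_rate_from_f(f: float) -> float:
    """Invert F = s / (2 - s), giving s = 2F / (1 + F)."""
    return 2.0 * f / (1.0 + f)


def genotype_frequencies(p: float, selfing_rate: float) -> Dict[str, float]:
    """Expected biallelic genotype frequencies with inbreeding caused by selfing."""
    q = 1.0 - p
    f = equilibrium_f(selfing_rate)
    return {
        "AA": p * p + f * p * q,
        "Aa": 2.0 * p * q * (1.0 - f),
        "aa": q * q + f * p * q,
    }


print(genotype_frequencies(0.5, 0.0))   # Hardy–Weinberg: {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}
print(genotype_frequencies(0.5, 0.95))  # heterozygotes fall to about 5%
```

Under this idealized model, an inbreeding coefficient near 0.9 corresponds to s ≈ 0.95 (`selfing_rate_from_f(0.9)`), which illustrates how far a highly selfing crop departs from the Hardy–Weinberg assumption.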

For InStruct, one haplotype per line was included in the dataset. Haplotypes were inferred from the STRUCTURE dataset using PHASE (Stephens et al., 2001) version 2.1 (Stephens & Donnelly, 2003). For both the STRUCTURE and InStruct analyses, we employed the admixture and independent allele frequency models, with the number of clusters (K) ranging from 1 to 8. Five runs were performed for each value of K, without using prior population information. The burn-in period and the number of MCMC replications were both set to 100 000 for each run. Additional parameters in the InStruct analyses were set to the default values on the website (http://cbsuapps.tc.cornell.edu/instruct.aspx/). The value of ln Pr(X|K) and its variance, Var[ln Pr(X|K)], were used to identify the appropriate value of K.
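Tabulating ln Pr(X|K) across the five replicate runs per K can be sketched as follows. The run values in the example are hypothetical placeholders, not results from this study, and the increment helper is only an illustrative way to look for the 'plateau' discussed in the Results, not a procedure the study reports using.

```python
from statistics import mean, variance
from typing import Dict, List, Tuple


def summarize_ln_pr(runs: Dict[int, List[float]]) -> Dict[int, Tuple[float, float]]:
    """Mean and sample variance of ln Pr(X|K) over replicate runs, for each K."""
    return {k: (mean(values), variance(values)) for k, values in sorted(runs.items())}


def mean_increments(summary: Dict[int, Tuple[float, float]]) -> Dict[int, float]:
    """Gain in mean ln Pr(X|K) from one K to the next; small gains suggest a plateau."""
    ks = sorted(summary)
    return {k: summary[k][0] - summary[prev][0] for prev, k in zip(ks, ks[1:])}


# Hypothetical ln Pr(X|K) values from five runs at each K (not data from this study).
runs = {
    1: [-100.0, -100.0, -101.0, -99.0, -100.0],
    2: [-80.0, -81.0, -79.0, -80.0, -80.0],
    3: [-78.0, -77.0, -79.0, -78.0, -78.0],
}
summary = summarize_ln_pr(runs)
print(summary)                   # {1: (-100.0, 0.5), 2: (-80.0, 0.5), 3: (-78.0, 0.5)}
print(mean_increments(summary))  # {2: 20.0, 3: 2.0}
```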

Our second method of examining population structure was analysis of molecular variance (AMOVA), as implemented in ARLEQUIN (http://cmpg.unibe.ch/software/arlequin3/).
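The variance partitioning behind AMOVA can be illustrated with a simplified two-level version (among populations and within populations) of the sums-of-squared-distances formulation. This sketch is deliberately reduced: the AMOVA in Table 2 is hierarchical, includes a within-individual level and tests significance by permutation in ARLEQUIN, none of which is reproduced here. The input is assumed to be a matrix of squared distances between samples plus one population label per sample; the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence


def amova_two_level(d2: Sequence[Sequence[float]],
                    labels: Sequence[Hashable]) -> Dict[str, float]:
    """Two-level AMOVA from a matrix of squared distances between samples."""
    n = len(labels)
    pops: Dict[Hashable, List[int]] = {}
    for i, label in enumerate(labels):
        pops.setdefault(label, []).append(i)
    p = len(pops)
    if p < 2 or n <= p:
        raise ValueError("need >= 2 populations and more samples than populations")

    # Sum of squared deviations = (sum of squared distances over unordered pairs) / size.
    ssd_total = sum(d2[i][j] for i, j in combinations(range(n), 2)) / n
    ssd_within = sum(
        sum(d2[i][j] for i, j in combinations(idx, 2)) / len(idx)
        for idx in pops.values()
    )
    ssd_among = ssd_total - ssd_within

    msd_among = ssd_among / (p - 1)
    msd_within = ssd_within / (n - p)
    # Average population size, adjusted for unequal sizes.
    n0 = (n - sum(len(idx) ** 2 for idx in pops.values()) / n) / (p - 1)

    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    total = var_among + var_within
    if total <= 0:
        raise ValueError("total variance component is not positive")
    return {
        "var_among": var_among,
        "var_within": var_within,
        "pct_among": 100.0 * var_among / total,
        "phi_st": var_among / total,
    }


# Two hypothetical populations of one-dimensional "genotypes".
values = [0.0, 2.0, 10.0, 12.0]
d2 = [[(a - b) ** 2 for b in values] for a in values]
result = amova_two_level(d2, ["pop1", "pop1", "pop2", "pop2"])
print(round(result["pct_among"], 1))  # 96.1
```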

Phylogenetic analyses

We constructed two types of phylogenetic trees. Both were based on the shared-allele distance among accessions, as calculated by PowerMarker 3.25, and were displayed with MEGA4 (Tamura et al., 2007). Both tree types were also built with the neighbor-joining algorithm (Saitou & Nei, 1987) implemented in MEGA, using the G. tomentella and G. falcata accessions as outgroups. The first type of tree used data from all 435 accessions and treated each accession as an operational taxonomic unit (OTU). The second tree grouped accessions into 15 OTUs on the basis of their geographic location and their position in the first type of tree: accessions from the same geographic region or subregion were grouped as an OTU when they clustered together in the first tree. Both trees were calculated for three data sets (SSRs, SNPs and SSRs + SNPs), resulting in a total of six trees. Support for each tree was assessed with 1000 bootstrap replicates.
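A shared-allele distance between two accessions can be sketched as one minus the proportion of allele copies they share, averaged over the loci scored in both. This follows one common definition and is not guaranteed to match PowerMarker's implementation in every detail (for instance, its treatment of missing data); genotypes are represented as hypothetical allele pairs, with `None` for a missing locus.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # a diploid genotype, or None if missing


def shared_allele_distance(x: Sequence[Genotype], y: Sequence[Genotype]) -> float:
    """1 - (shared allele copies / 2L) over the L loci scored in both accessions."""
    shared = 0
    loci = 0
    for gx, gy in zip(x, y):
        if gx is None or gy is None:
            continue  # skip loci missing in either accession
        unmatched = list(gy)
        for allele in gx:
            if allele in unmatched:  # count each allele copy at most once
                unmatched.remove(allele)
                shared += 1
        loci += 1
    if loci == 0:
        raise ValueError("no loci scored in both accessions")
    return 1.0 - shared / (2 * loci)


a = [("150", "150"), ("A", "G"), None]
b = [("150", "152"), ("G", "G"), ("C", "C")]
print(shared_allele_distance(a, b))  # 0.5 (2 of 4 allele copies shared over 2 loci)
```

A full distance matrix built this way could then be passed to a neighbor-joining routine; tree building itself is not sketched here.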

We compared the shared-allele distance matrices from the SSR and SNP data with a Mantel test based on 1000 random permutations, as implemented in MXCOMP within the NTSYSpc 2.10j package (New York, NY, USA).
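A Mantel test of the kind described can be sketched as the Pearson correlation between the upper triangles of two distance matrices, with significance assessed by jointly permuting the rows and columns of one matrix. This is a generic formulation, not MXCOMP's code: it reports a one-sided P-value with the usual +1 correction, and the matrices in the example are hypothetical stand-ins for SSR- and SNP-based distances.

```python
import random
from itertools import combinations
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def upper_triangle(m: Matrix, order: Optional[List[int]] = None) -> List[float]:
    """Off-diagonal upper-triangle entries, optionally after relabeling rows/columns."""
    idx = order if order is not None else list(range(len(m)))
    return [m[idx[i]][idx[j]] for i, j in combinations(range(len(m)), 2)]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("a distance matrix has no variation")
    return cov / sqrt(va * vb)


def mantel(a: Matrix, b: Matrix, permutations: int = 1000,
           seed: int = 1) -> Tuple[float, float]:
    """Return (r, one-sided P) for the association between two distance matrices."""
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    rng = random.Random(seed)
    order = list(range(len(b)))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of b together
        if pearson(x, upper_triangle(b, order)) >= r_obs:
            at_least_as_large += 1
    return r_obs, (at_least_as_large + 1) / (permutations + 1)


positions = [0, 1, 3, 6, 10, 15, 21]
ssr_like = [[abs(p - q) for q in positions] for p in positions]
snp_like = [[0.5 * abs(p - q) for q in positions] for p in positions]
r, p_value = mantel(ssr_like, snp_like)
print(round(r, 3), p_value < 0.05)  # 1.0 True
```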

Results

Diversity among G. soja and G. max samples

We attempted to genotype 435 accessions of wild and cultivated soybean at 99 SSRs and 738 SNPs. Both SSRs and SNPs were filtered with respect to quality and failure rates. For the SSR dataset, all 99 loci provided reliable results, but 62 accessions had missing data for nine or more loci; these accessions were removed from further analyses. For the SNP dataset, 167 SNP loci were removed in total, owing to failure in 20% or more of samples (121 SNPs), apparent heterozygosity in > 20% of samples, suggesting paralogous markers (34 SNPs; Fig. S1), or monomorphism in all accessions (12 SNPs). Together with the 17 SNPs excluded for overlapping or ambiguous genome positions (738 − 17 − 167 = 554), the SNP dataset thus ultimately consisted of 554 SNPs (Table S2). However, 70 accessions had missing data for 55 SNP loci, and these accessions were removed. The genotyping thus resulted in three datasets: the SSR dataset, consisting of 373 accessions scored for 99 SSRs; the SNP dataset, consisting of 365 accessions genotyped for 554 SNPs; and a combined SSR + SNP dataset, comprising 303 common accessions scored for 554 SNPs and 99 SSRs.
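The locus and accession filters described above can be sketched as follows. The thresholds mirror the text (failure in ≥ 20% of samples, apparent heterozygosity in > 20% of samples, monomorphism), but the data representation is hypothetical, and computing heterozygosity over successfully scored samples only is an assumption. The last line checks the SNP bookkeeping: 738 assayed, minus 17 with ambiguous positions, minus 167 removed by the quality filters.

```python
from typing import Optional, Sequence, Tuple

Call = Optional[Tuple[str, str]]  # a diploid SNP call, or None if the call failed


def snp_filter_reason(calls: Sequence[Call], max_fail: float = 0.20,
                      max_het: float = 0.20) -> str:
    """Return 'ok' or the reason a SNP locus would be removed."""
    scored = [c for c in calls if c is not None]
    if (len(calls) - len(scored)) / len(calls) >= max_fail:
        return "failure"         # failed in >= 20% of samples
    het = sum(a != b for a, b in scored) / len(scored)
    if het > max_het:
        return "heterozygosity"  # > 20% apparent heterozygotes: possible paralog
    if len({allele for call in scored for allele in call}) < 2:
        return "monomorphic"
    return "ok"


def keep_accession(calls: Sequence[Call], drop_at_missing: int) -> bool:
    """Keep an accession unless its number of missing loci reaches the cutoff."""
    return sum(c is None for c in calls) < drop_at_missing


print(snp_filter_reason([("A", "A")] * 7 + [None] * 3))        # failure
print(snp_filter_reason([("A", "G")] * 3 + [("A", "A")] * 7))   # heterozygosity
print(snp_filter_reason([("A", "A")] * 10))                    # monomorphic
print(snp_filter_reason([("A", "A")] * 8 + [("G", "G")] * 2))  # ok
print(738 - 17 - (121 + 34 + 12))                              # 554 SNPs retained
```

For the SSR data, `keep_accession(calls, drop_at_missing=9)` would reproduce the "nine or more missing loci" rule.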

To explore properties of genetic variation, summary statistics were calculated for the various datasets (Table 1). The 99 SSR loci averaged 21.5 alleles per locus, with an overall Nei's gene diversity of 0.77. The SNPs had substantially lower genetic diversity (0.35) than the SSRs (t-test, P < 0.001). The two marker types also had markedly different allele frequency distributions (Fig. S2): for SSRs, most (80.9%) alleles were at < 5% frequency, whereas most (92.0%) SNPs had an overall frequency ≥ 5%. The relatively high allele frequencies at SNP markers presumably reflect ascertainment bias, because the SNPs were chosen as polymorphisms in a panel of only six diverse G. max accessions (Choi et al., 2007).
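The contrast in allele-frequency spectra can be made concrete with two summaries: the share of all observed alleles below 5% frequency (as reported for SSRs) and the share of SNPs whose less common allele is at ≥ 5% (one reading of the SNP figure, stated here as an assumption). Loci are hypothetical lists of sampled allele copies, and the function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def rare_allele_fraction(loci: Sequence[Sequence[Hashable]], cutoff: float = 0.05) -> float:
    """Share of all observed alleles (pooled over loci) with frequency < cutoff."""
    rare = total = 0
    for copies in loci:
        n = len(copies)
        for count in Counter(copies).values():
            total += 1
            if count / n < cutoff:
                rare += 1
    return rare / total


def common_snp_fraction(loci: Sequence[Sequence[Hashable]], cutoff: float = 0.05) -> float:
    """Share of SNPs whose less common allele has frequency >= cutoff (assumed reading)."""
    common = 0
    for copies in loci:
        counts = list(Counter(copies).values())
        minor = min(counts) / len(copies) if len(counts) > 1 else 0.0
        if minor >= cutoff:
            common += 1
    return common / len(loci)


loci = [["a"] * 39 + ["b"], ["c"] * 20 + ["d"] * 20]
print(rare_allele_fraction(loci))  # 0.25 (allele "b" is 1 of 4 alleles, at 2.5%)
print(common_snp_fraction(loci))   # 0.5
```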

A few major themes become apparent when contrasting diversity between G. soja and G. max. First, SSRs indicate that wild G. soja has significantly higher allelic richness, gene diversity and allele numbers than cultivated G. max (t-test, P < 0.01). For example, the number of alleles observed in G. soja (1807) exceeded that in G. max (1473), despite the smaller sample size of G. soja (92 vs 279). After using the rarefaction method to standardize for sample size, G. soja still exhibited higher expected numbers of distinct and private alleles than G. max across sample sizes (Fig. S3).
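Rarefaction, the approach ADZE uses, estimates how many distinct alleles would be expected in a subsample of g gene copies, which makes populations with unequal sample sizes comparable. A minimal single-locus sketch of the standard formula, E[A_g] = Σ_i [1 − C(N − N_i, g)/C(N, g)], where N_i is the count of allele i among N sampled copies, is shown below; ADZE's handling of private alleles and of averaging across loci is not reproduced, and the function name is hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def rarefied_allele_count(copies: Sequence[Hashable], g: int) -> float:
    """Expected number of distinct alleles in a random subsample of g gene copies."""
    n = len(copies)
    if not 1 <= g <= n:
        raise ValueError("g must be between 1 and the number of sampled copies")
    total_ways = comb(n, g)
    # An allele with count c is absent from the subsample with probability
    # C(n - c, g) / C(n, g); math.comb returns 0 when g > n - c.
    return sum(1.0 - comb(n - c, g) / total_ways for c in Counter(copies).values())


locus = ["a", "a", "b", "c"]
print(rarefied_allele_count(locus, 4))            # 3.0
print(round(rarefied_allele_count(locus, 2), 4))  # 1.8333
```

Evaluating this at a common g for every population (for example, the smallest sample size being compared) is what makes the expected counts in Fig. S3 comparable despite the 92 vs 279 sample sizes.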

Second, SNPs provided a slightly different snapshot of molecular diversity between the two taxa. The expected numbers of distinct and private alleles were higher in G. soja than in G. max (Fig. S3), as with the SSR data. However, genetic diversity, as measured by both Nei's gene diversity and π, was lower in G. soja (0.301 and 0.309, respectively) than in G. max (0.337 and 0.343, respectively) (Table 1).

The SSR and SNP markers were nonetheless concordant in some respects. For example, both indicate that, within G. soja, the Chinese accessions had the highest gene diversity, followed by those from Japan, Korea and Russia (Table 1). Both marker types also provided similar insights into the partitioning of diversity in the AMOVA analyses (Table 2). The proportion of variance due to differences between species was 9.7–10.4% for SSRs and 14.3–15.2% for SNPs. For both, the largest component of variation was among individuals within populations (SSRs, 73.9–84.1%; SNPs, 65.7–78.4%).

Population structure

STRUCTURE vs INSTRUCT We applied two Bayesian approaches, STRUCTURE (Pritchard et al., 2000; Falush et al., 2003) and InStruct (Gao et al., 2007), to investigate genetic clustering among G. max and G. soja accessions. Each approach was applied to SSR data alone, SNP data alone and combined SSR + SNP data. The STRUCTURE analyses did not produce a clear 'plateau': the estimated log probability of the data, ln Pr(X|K), increased gradually as K increased (Fig. S4). The variance Var[ln Pr(X|K)] increased steadily from K = 1 to K = 4 (SSR dataset) or K = 5 (SNP dataset), with only slight changes at higher values of K. For the SSR data, most (83.4%) accessions were assigned to a single population at K = 6, and this subdivision seemed biologically sensible for selfing soybean (see 'STRUCTURE analyses among datasets' below). Hence we selected K = 6 as the optimal number of clusters. For SNPs we selected K = 5, because only 57.1% of accessions could be assigned to a single population at K = 6, compared with 76.3% at K = 5. InStruct suggested the same numbers of populations for the SSR (K = 6) and SNP (K = 5) datasets (data not shown).
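The assignment criterion used to compare K = 5 and K = 6 (the share of accessions that can be assigned to a single population) can be sketched from a membership matrix Q, with one row per accession and one column per cluster. The text does not state the membership cutoff, so the `threshold` parameter and the value 0.7 in the example are hypothetical, as are the membership values.

```python
from typing import List, Optional, Sequence


def assign_clusters(q: Sequence[Sequence[float]], threshold: float) -> List[Optional[int]]:
    """Cluster index of each accession's largest membership, or None if below threshold."""
    labels: List[Optional[int]] = []
    for row in q:
        best = max(range(len(row)), key=lambda k: row[k])
        labels.append(best if row[best] >= threshold else None)
    return labels


def assigned_fraction(q: Sequence[Sequence[float]], threshold: float) -> float:
    """Share of accessions assigned to a single cluster at the given threshold."""
    labels = assign_clusters(q, threshold)
    return sum(label is not None for label in labels) / len(labels)


# Hypothetical membership coefficients for four accessions at K = 2.
q = [[0.90, 0.10], [0.60, 0.40], [0.75, 0.25], [0.20, 0.80]]
print(assign_clusters(q, threshold=0.7))    # [0, None, 0, 1]
print(assigned_fraction(q, threshold=0.7))  # 0.75
```

Comparing `assigned_fraction` across candidate K values is the kind of comparison behind the 57.1% vs 76.3% figures for the SNP data.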

The results were highly concordant among replicate runs within both the STRUCTURE and InStruct analyses, so results are shown for a single run. At K = 2, the population structures inferred by the two approaches were similar (Fig. 2a,b); both first differentiated G. soja from G. max. However, the assignment of accessions differed slightly between the two approaches (Table S1). For example, STRUCTURE split cultivated soybean into four populations and wild soybean into two populations, whereas InStruct split cultivated soybean into three populations and wild soybean into three populations.

Table 1 Summary statistics for Glycine max and Glycine soja populations based on single-nucleotide polymorphism (SNP) (554) and simple sequence repeat (SSR) (99) loci

Abbreviations: n, sample size; Poly., number of polymorphic loci; Alleles, number of alleles; GD, Nei's gene diversity; AR, allelic richness⁴; Het., heterozygosity²; f, inbreeding coefficient³.

Species | Type | Origin | SNP n | SNP Poly. | SNP π¹ | SNP GD | SNP Het.² | SNP f³ | SSR n | SSR Alleles | SSR GD | SSR AR⁴ | SSR Het.² | SSR f³
G. max | | | 298 | 546 | 0.343 | 0.337 | 0.049 | 0.854 | 279 | 1473 | 0.687 | 10.1 (43) | 0.049 | 0.929
 | Bred | China | 65 | 524 | 0.322 | 0.313 | 0.050 | 0.842 | 62 | 884 | 0.672 | 8.0 (30) | 0.056 | 0.917
 | Landrace | China | 233 | 541 | 0.344 | 0.338 | 0.049 | 0.856 | 217 | 1332 | 0.682 | 8.8 (30) | 0.046 | 0.932
G. soja | | | 65 | 533 | 0.309 | 0.301 | 0.072 | 0.763 | 92 | 1807 | 0.871 | 16.7 (43) | 0.153 | 0.826
 | | China | 41 | 525 | 0.314 | 0.340 | 0.067 | 0.809 | 61 | 1506 | 0.851 | 5.4 (4) | 0.155 | 0.821
 | | Japan | 14 | 319 | 0.259 | 0.269 | 0.059 | 0.795 | 15 | 842 | 0.807 | 5.1 (4) | 0.156 | 0.819
 | | Korea | 5 | 300 | 0.242 | 0.233 | 0.113 | 0.593 | 7 | 542 | 0.736 | 4.5 (4) | 0.229 | 0.730
 | | Russia | 5 | 196 | 0.185 | 0.177 | 0.013 | 0.942 | 9 | 463 | 0.695 | 4.0 (4) | 0.073 | 0.909
Total⁵ | | | 363 | 554 | 0.357 | 0.350 | 0.054 | 0.836 | 371 | 2133 | 0.766 | 14.0 (43) | 0.075 | 0.903

¹ Nucleotide diversity. Sites with alignment gaps or missing data were considered.
² The average proportion of heterozygous individuals in the population (Liu & Muse, 2005).
³ f, inbreeding coefficient.
⁴ The number in brackets is the minimum sample size used for estimating allelic richness.
⁵ The overall estimates are calculated as the average across all loci, whereas variances and confidence intervals are estimated by nonparametric bootstrapping (100 times) across loci.

Overall, the differences in assignment seemed to be rela-tively minor, because the consistency of assignment was veryhigh for most groups. For example, we calculated the freq-uency of assignment between structure and instruct

based on SSR, SNP and SSR + SNP data (Table S3). Forthe SSR data, 100% of NER and NR structure-inferredaccessions were assigned by instruct into the NER and

(a) PNS+RSSRSS SNPG. max G. soja G. max G. soja G. max G. soja

K = 2

K = 3

K = 4

K = 5

(b)

(c)

K = 6

K = 2

K = 3

K = 4

K = 5

C RKJ JKRCRSRHRNRENJKRCRSRHRNRENRSRHRNREN

K = 6

SR HR NR NER R K JSR HR NR NER R K JSR HR NR NER R K J

1.00.90.80.70.60.50.40.30.20.10.0

Fig. 2 Population structure inferred by Bayesian clustering approaches based on simple sequence repeat (SSR), single-nucleotide polymor-phism (SNP) and SSR + SNP data, respectively. (a) Total accessions using STRUCTURE; (b) total accessions using INSTRUCT; (c) Glycine soja withoutinferred hybrids using STRUCTURE. Each individual is shown as a thin vertical line partitioned into K colored components, representing inferredmembership in K genetic clusters. The top row (a and b) provided the species name and the bottom row (a, b and c) indicates geographicregion. NER, northeast region, China; NR, north region, China; HR, Huanghuai region, China; SR, south region, China; C, China; R, Russia; K,Korea; J, Japan.

Table 2 Analyses of molecular variance (AMOVA) based on simple sequence repeat (SSR) (99 loci) and single-nucleotide polymorphism (SNP) (554 loci) data

Percentage of variation and 95% confidence intervals (%), given as SSR / SNP:

| Sample | Number of groups | Number of populations: total | In Glycine soja | In Glycine max | Among groups | Among populations within groups | Among individuals within populations | Within individuals |
|---|---|---|---|---|---|---|---|---|
| Total | 2 (G. soja, G. max) | 11 | 7 | 4 | 10.4 / 15.2 | 4.6 / 5.9 | 73.9 / 65.7 | 11.2 / 13.3 |
| Total | 2 (G. soja, G. max) | 8 | 4 | 4 | 9.7 / 14.3 | 4.6 / 6.3 | 74.5 / 66.2 | 11.2 / 13.3 |
| Total | 1 | 2 | 1 | 1 | – | 11.5 / 16.5 | 77.3 / 70.2 | 11.3 / 13.3 |
| G. max | 1 | 4 | – | 4 | – | 5.0 / 7.2 | 84.1 / 78.4 | 10.9 / 14.4 |
| G. soja | 4 (China, Korea, Japan, Russia) | 7 | 7 | – | 5.8 / 7.2 | 1.5 / 2.4 | 75.6 / 66.8 | 17.2 / 23.7 |
| G. soja | 1 | 7 | 7 | – | – | 5.2 / 7.3 | 77.3 / 68.5 | 17.6 / 24.3 |
| G. soja | 1 | 4 | 4 | – | – | 6.8 / 8.7 | 76.1 / 67.7 | 17.1 / 23.6 |


NR clusters. Similarly, 97.3% of the HR and 99.0% of the SR accessions inferred by STRUCTURE were assigned into the HR + SR cluster identified by INSTRUCT.
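The assignment frequencies above amount to a row-normalized cross-tabulation of the two programs' cluster labels. The Python sketch below assumes that each accession carries one hard label from each program (an unassigned accession could simply be given its own label); the labels are hypothetical, and the function is an illustration rather than the procedure behind Table S3.

```python
from collections import Counter, defaultdict


def assignment_consistency(structure_labels, instruct_labels):
    """Row-normalized cross-tabulation of two clusterings.

    For each STRUCTURE cluster, return the percentage of its accessions
    that INSTRUCT placed in each of its own clusters.
    """
    if len(structure_labels) != len(instruct_labels):
        raise ValueError("both label lists must cover the same accessions")
    counts = defaultdict(Counter)
    for s_label, i_label in zip(structure_labels, instruct_labels):
        counts[s_label][i_label] += 1
    table = {}
    for s_label, row in counts.items():
        total = sum(row.values())
        table[s_label] = {i_label: 100.0 * n / total for i_label, n in row.items()}
    return table


# Hypothetical labels for six accessions (not the study's data).
structure = ["HR", "HR", "HR", "HR", "SR", "SR"]
instruct = ["HR+SR", "HR+SR", "HR+SR", "NR", "HR+SR", "HR+SR"]
print(assignment_consistency(structure, instruct))
# {'HR': {'HR+SR': 75.0, 'NR': 25.0}, 'SR': {'HR+SR': 100.0}}
```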

The greatest difference between the STRUCTURE and INSTRUCT analyses was in the number of admixed individuals. STRUCTURE identified far more admixed individuals (78), whereas the number of unassigned accessions in the INSTRUCT analyses was much lower (8). Because the STRUCTURE results more closely fit the geographical distribution and the results of previous studies (Li et al., 2008; Wen et al., 2008), we focus on them in more detail.

STRUCTURE analyses among datasets

The STRUCTURE results varied somewhat among the three datasets (Fig. 2a,b). For SSRs alone, the two taxa were clearly delineated at K = 2. Each additional cluster delineated geographic regions (Fig. 3): at K = 3, G. max accessions from south China separated from those from north China; at K = 4, accessions from northeast China separated from those of the Yellow River region; at K = 5, G. soja accessions split into two clusters (China vs neighboring countries); finally, at K = 6, the K value with the highest likelihood, G. max separated into two clusters along the Yellow River. Thus, the cultivated accessions ultimately grouped into four clusters that were largely concordant with major geographic regions in China: NER, NR, HR and SR.

The STRUCTURE analyses of the SNP and SNP + SSR datasets agree with the SSR data in most respects; that is, G. max is differentiated into four regional groups, and G. soja is clearly separated from G. max. However, the group delineation at K = 2 was not primarily along taxonomic lines. This initial delineation separated G. max from a group that included G. soja and part of the NR group of G. max (the other NR accessions were classed as unassigned), suggesting either introgression between G. soja and NR accessions or recent shared ancestry.

The STRUCTURE analyses provided limited evidence of admixture between G. soja and G. max (Fig. 4). For all three datasets, some accessions labeled as G. max contained an appreciable component of diversity that was assigned to the wild gene pool. For the SSR dataset, for example, the accessions could be divided into those with low, medium and high assignment probabilities to the G. soja gene pool. The low group consisted of 264 G. max and two G. soja accessions with an ancestry coefficient ≤ 0.27. (Here ‘ancestry coefficient’ is defined as the inferred proportion of membership in the G. soja gene pool when K = 2.) The high group included 61 accessions, comprising 60 G. soja accessions and one G. max accession, with ancestry coefficients ≥ 0.88. The middle group, which is the most interesting because it may represent hybrids or introgressed material, consisted of 44 accessions (15 from G. max and 29 from G. soja) with ancestry coefficients between 0.33 and 0.83. This middle group contained

[Fig. 4 image: three panels (SSR, SNP, SSR + SNP); x-axis, genotype ranking (0–350); y-axis, ancestry in wild gene pool (0–1.2).]

Fig. 4 Distinction of wild and cultivated soybean, expressed as individual ancestry in the wild gene pool in a STRUCTURE analysis assuming two populations. Ancestry was ranked for each individual, and the ranks are plotted against the ancestry in the wild gene pool. The arrows indicate small gaps in the distribution (see text for details). Glycine max, circles; Glycine soja, crosses. SSR, simple sequence repeat; SNP, single-nucleotide polymorphism.

[Fig. 3 schematic: successive splits at K = 2–6 (SSR) and K = 2–5 (SNP; SSR + SNP). SSR: K = 2, G. soja vs G. max; K = 3, North China vs South China; K = 4, HR + NR vs NER; K = 5, G. soja C + R vs J + K; K = 6, final clusters NER, NR, HR, SR, C + R and J + K. SNP and SSR + SNP: at K = 2, G. max vs G. soja + G. max (NR); at K = 5, final clusters G. soja, NER, NR, HR and SR.]

Fig. 3 Schematic of the clustering procedure for inferring population structure using STRUCTURE, based on simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP data for Glycine max and Glycine soja. NER, northeast region, China; NR, north region, China; HR, Huanghuai region, China; SR, south region, China; C, China; R, Russia; K, Korea; J, Japan.


more accessions with the SNP dataset (60 G. max and 14 G. soja accessions) and with the SNP + SSR dataset (43 G. max and nine G. soja accessions). Thus, while Bayesian analyses clearly delineate between taxa, these analyses also suggest that c. 20% of wild and domesticated accessions are either poorly differentiated or owe their origin to admixture.
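The low, middle and high groups are separated by small gaps in the ranked ancestry coefficients (arrows in Fig. 4). One way to locate such gaps programmatically is to sort the coefficients and take the widest jumps between neighbors, as in the Python sketch below. The "two widest gaps" rule, the function names and the toy coefficients are assumptions for illustration; the text does not say how the gaps were identified.

```python
def widest_gaps(q_values, n_gaps=2):
    """Return (below, above) value pairs bracketing the n_gaps widest gaps
    between consecutive sorted ancestry coefficients, ordered low to high."""
    q = sorted(q_values)
    if len(q) <= n_gaps:
        raise ValueError("need more coefficients than gaps")
    # Index i marks the gap between q[i] and q[i + 1].
    widest = sorted(range(len(q) - 1), key=lambda i: q[i + 1] - q[i], reverse=True)[:n_gaps]
    return [(q[i], q[i + 1]) for i in sorted(widest)]


def classify(q, gaps):
    """Label one coefficient 'low', 'middle' or 'high' relative to two gaps."""
    (low_max, _), (_, high_min) = gaps
    if q <= low_max:
        return "low"
    if q >= high_min:
        return "high"
    return "middle"


# Toy ancestry coefficients in the wild gene pool at K = 2 (hypothetical).
q = [0.02, 0.05, 0.08, 0.10, 0.40, 0.45, 0.50, 0.55, 0.90, 0.95, 0.98]
gaps = widest_gaps(q)
print(gaps)  # [(0.1, 0.4), (0.55, 0.9)]
print([classify(x, gaps) for x in q].count("middle"))  # 4
```

On real STRUCTURE output the two widest jumps need not coincide with the visually identified gaps, so any bounds found this way should be checked against a plot like Fig. 4.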

To detect population structure within G. soja, we removed the putative hybrids (Fig. 4) and reanalyzed the G. soja datasets with STRUCTURE. The SSR, SNP and SSR + SNP datasets yielded similar results (Fig. S4b). For all three datasets, K = 6 converged well and showed the highest averaged likelihood among runs of the program. The accessions from different regions (except NER) tended to form six distinct clusters, corresponding to their geographical origins (Japan, Korea, Russia, and NR, HR and SR in China) (Fig. 2c). This pattern demonstrates that geographical genetic differentiation exists in wild soybean (G. soja).
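Choosing K from replicate STRUCTURE runs, as summarized in Fig. S4, comes down to comparing the mean and variance of ln P(X|K) across runs. Below is a minimal Python sketch that assumes the per-run log-likelihoods have already been collected; the numbers are hypothetical, not this study's output.

```python
from statistics import mean, variance


def summarize_runs(lnp_by_k):
    """Mean and variance of ln P(X|K) across replicate runs for each K,
    plus the K with the highest mean."""
    summary = {}
    for k, values in lnp_by_k.items():
        if len(values) < 2:
            raise ValueError(f"K = {k}: at least two runs are needed for a variance")
        summary[k] = (mean(values), variance(values))
    best_k = max(summary, key=lambda k: summary[k][0])
    return summary, best_k


# Hypothetical ln P(X|K) values from three runs per K (not the study's output).
runs = {
    5: [-1050.2, -1049.8, -1051.0],
    6: [-1020.5, -1021.1, -1020.9],
    7: [-1030.0, -1010.0, -1055.0],
}
summary, best_k = summarize_runs(runs)
print(best_k)  # 6
```

In the toy data, K = 7 contains the single best run but has a large variance across runs. Averaging over runs guards against exactly this kind of poor convergence.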

Genealogical analyses

To gain insight into potential locations of G. max domestication, we constructed a neighbor-joining tree of individual accessions, with G. tomentella and G. falcata as outgroups (Fig. 5a). Regardless of the dataset examined, cultivated soybean, with few exceptions, tended to form a monophyletic clade with respect to G. soja. Moreover, accessions within G. max tended to form subpopulations corresponding to geographic origin, but there was overlap, particularly among accessions from the NR and HR regions with SSR data, consistent with the STRUCTURE analyses (Fig. 2a).
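The trees in Fig. 5 were built by neighbor joining on shared-allele distances (Saitou & Nei, 1987). Below is a dependency-free Python sketch of the neighbor-joining algorithm. The five-taxon matrix is a standard textbook toy example rather than soybean data. The output is unrooted (rooting on the G. tomentella and G. falcata outgroups would be a separate step), and bootstrapping is omitted.

```python
import itertools


def neighbor_joining(names, matrix):
    """Unrooted neighbor-joining tree (Saitou & Nei, 1987) as a Newick string.

    names: taxon labels; matrix: symmetric distance matrix (list of lists)
    with zeros on the diagonal. Branch lengths are printed to 3 decimals.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        r = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimizing Q(a, b) = (r - 2) d(a, b) - R(a) - R(b).
        best = None
        for a, b in itertools.combinations(nodes, 2):
            q = (r - 2) * d[a][b] - totals[a] - totals[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node.
        la = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (r - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Resolve the last three nodes as a trifurcation (unrooted center).
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


# Textbook five-taxon toy matrix (not soybean data).
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, dist))
# (d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);
```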

To better assess the monophyly of G. max and relationships among geographic regions, we collapsed clades to form OTUs. We pooled the accessions into 13 (SNPs and SSRs + SNPs) or 15 (SSRs) OTUs based on their geographic origin (latitude and longitude), their position in Fig. 5(a) and their assignment in STRUCTURE analyses (see

[Fig. 5 image: (a) neighbor-joining trees of individual accessions for SSR, SNP and SSR + SNP data, showing G. soja and G. max (NER, NR, HR, SR) with the outgroups G. tomentella and G. falcata; legend: annual, perennial; NER, NR, HR, SR, Russia, Korea, Japan; scale bars 0.05. (b) OTU trees for SSR, SNP and SSR + SNP data with perennial outgroups, cultivated OTUs (C-NER-NP, C-NER-SP, C-NR-EP, C-NR-WP, C-HR-EP, C-HR-WP, C-SR-EP, C-SR-WP, C-SR-SP) and wild OTUs (W-Russia, W-Korea, W-Japan, W-NR, W-HR, W-SR, W-NR+HR), with bootstrap values at nodes.]

Fig. 5 Neighbor-joining trees of Glycine soja and Glycine max rooted with Glycine tomentella and Glycine falcata, based on shared-allele pairwise distances. (a) Trees for individual soybean accessions. Colored symbols indicate the inferred genetic cluster from STRUCTURE analyses. SSR, simple sequence repeat; SNP, single-nucleotide polymorphism. (b) Trees for operational taxonomic units (OTUs) of soybean accessions. The percentage bootstrap support is indicated at each node. Abbreviations for OTUs are: C, cultivated soybean (G. max); W, wild soybean (G. soja); NER, northeast region, China; NR, north region, China; HR, Huanghuai region, China; SR, south region, China; EP, east part; WP, west part; NP, north part; SP, south part.


Table S1). The OTUs comprised 3–20 accessions. This approach again resulted in a monophyletic grouping of G. max (Fig. 5b), but also revealed that wild accessions from the HR region (SSR analyses) or the NR + HR cluster (SNP analyses) are phylogenetically closest to the G. max clade. Interestingly, STRUCTURE analyses of SNP data assigned NR cultivated accessions into G. soja clusters at K = 2 (Fig. 3). These patterns suggest that the NR accessions best represent early domestication germplasm.

In most trees, the deepest split separated the two species (Fig. 5). Within species, the OTUs showed clear geographic patterns. Within G. max, OTUs from NR split first. Within G. soja, the populations from China were more closely related to G. max than to populations from Japan or Korea. Although not entirely consistent among datasets, phylogenetic analyses tended to suggest that wild soybeans from NR and HR, both of which lie along the Yellow River, were genetically most closely related to cultivated soybeans.

Discussion

Analyses of genetic diversity in domesticated crops and their wild ancestors are typically used for one of three purposes. The first is to identify distinct genetic groups for retention of germplasm (Agrama et al., 2009). The second is to identify the genes that underlie important phenotypic and genetic shifts during domestication and breeding, using the approach of selective sweep mapping (Vigouroux et al., 2002; Wright et al., 2005; Ross-Ibarra et al., 2007; Chapman et al., 2008). The third is to infer aspects of the history and timing of domestication. Here, we have assayed genetic diversity in a broad sample of domesticated soybean (G. max) and its wild progenitor (G. soja) with this third purpose in mind, that is, to provide additional insights into soybean domestication. The breadth of our study, however, ensures that it is also useful for analyses of germplasm collections.

Our study uses both SSRs and SNPs, and our sampling is much broader than that of previous studies (Matsuoka et al., 2002; Becquet et al., 2007). Nonetheless, our data, like previous data (Powell et al., 1996; Kuroda et al., 2006), suggest that the primary division of genetic diversity is consistently that between wild (G. soja) and domesticated soybean (G. max). For example, STRUCTURE and INSTRUCT analyses on all datasets clearly delineate wild and cultivated germplasm at K = 2, and AMOVA analyses also clearly differentiate between wild and cultivated accessions. Moreover, phylogenetic analyses of both individual accessions (Fig. 5a) and collapsed OTUs (Fig. 5b) tend to suggest that the cultivated germplasm is monophyletic. Based on these lines of evidence, we favor the interpretation that soybean, like maize (Matsuoka et al., 2002), barley (Badr et al., 2000), pearl millet (Oumar et al., 2008), emmer wheat (Ozkan et al., 2002) and einkorn wheat (Heun et al., 1997), may result from a single domestication event.
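Table 2 partitions molecular variance hierarchically: among groups, among populations within groups, among individuals within populations and within individuals. A simplified one-level AMOVA that splits variance only among and within populations illustrates the core arithmetic, using the classical AMOVA variance-component formulas. The Python sketch below, its squared Euclidean distances and its toy data are assumptions for illustration, not the software or settings used for Table 2.

```python
def amova_one_level(dist2, labels):
    """Simplified one-level AMOVA (after Excoffier, Smouse & Quattro, 1992).

    dist2: matrix of squared distances between individuals;
    labels: population label of each individual.
    Returns (% variation among populations, % variation within populations).
    """
    n = len(labels)
    pops = {}
    for i, p in enumerate(labels):
        pops.setdefault(p, []).append(i)
    k = len(pops)
    if k < 2 or n <= k:
        raise ValueError("need >= 2 populations and more individuals than populations")
    # Sums of squared deviations; the double sums count each pair twice.
    ssd_total = sum(dist2[i][j] for i in range(n) for j in range(n)) / (2 * n)
    ssd_within = sum(
        sum(dist2[i][j] for i in members for j in members) / (2 * len(members))
        for members in pops.values()
    )
    ssd_among = ssd_total - ssd_within
    ms_among = ssd_among / (k - 1)
    ms_within = ssd_within / (n - k)
    # n0: effective population size for unbalanced sampling.
    n0 = (n - sum(len(m) ** 2 for m in pops.values()) / n) / (k - 1)
    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    total = var_among + var_within
    return 100 * var_among / total, 100 * var_within / total


# Toy one-dimensional "genotypes" for two hypothetical populations.
x = [0.0, 1.0, 4.0, 5.0]
labels = ["wild", "wild", "cultivated", "cultivated"]
dist2 = [[(a - b) ** 2 for b in x] for a in x]
among, within = amova_one_level(dist2, labels)
print(round(among, 1), round(within, 1))  # 93.9 6.1
```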

If this inference is correct, soybean differs from other species for which genetic data provide compelling evidence of multiple domestication events (Londo et al., 2006; Morrell & Clegg, 2007; Sang & Ge, 2007; Aguilar-Melendez et al., 2009). In this context, it should be emphasized that inferential methods are imperfect, because simulations indicate that multiple domestication events can lead to monophyletic clustering of domesticated accessions under some conditions (Allaby et al., 2008; Ross-Ibarra & Gaut, 2008). Thus, a pattern of monophyly could provide a false signal of a single domestication event, and there may be a bias toward concluding that there has been a single domestication event even when this is untrue. Nonetheless, our inference of a single domestication event is consistent with most previous studies of soybean (Xu et al., 1986; Zhu et al., 1995; Zhou et al., 1998; Gai et al., 2000), except one based on a modest sample of chloroplast DNA SSRs (cpSSRs), which suggested that cultivated soybean originated independently in different regions from different wild gene pools (Xu et al., 2002).

Admixture and geographic subdivision

Because our study relies on more genetic markers than previous studies, our data provide more potential for insight into geographic delineations within species and hybridization between species. With regard to genetic subdivisions within species, G. max clustered by geographic location: in China, G. max divides into the geographic regions NER, NR, HR and SR. This is a coarser clustering than that inferred previously by a study that used fewer SSR markers (59) but more landrace accessions (1863) (Li et al., 2008). That study identified seven clusters, broadly matching the clusters inferred here, except that four separate clusters were inferred within the SR region, and these SR subclusters reflected differences in sowing season. Thus, increasing sample size may be beneficial for inferring finer-scale geographic structure (Morin et al., 2004).

We infer six genetic subgroups within G. soja. These six clusters separate geographically, corresponding to Japan, Korea, Russia and three distinct regions in China. This pattern was also found in a previous study (Wen et al., 2009). An AMOVA of 196 Chinese G. soja accessions at 60 SSR loci and eight morphological traits also showed that significant variation exists among northeast China, the Huanghuai Valleys and southern China. The lone exception to the geographic pattern was the northeast region (NER) of China, whose accessions were mainly assigned to a mixed cluster and to the HR, NR and Korea subgroups. This may reflect the small number of accessions from NER (six for SSR, five for SNP and three for SSR + SNP analyses).


The STRUCTURE and INSTRUCT analyses provided slightly different insights into the extent of potential hybridization between wild and cultivated soybean. STRUCTURE suggests that fully 20% of our accessions are admixed, which may indicate extensive post-domestication hybridization between species. By contrast, INSTRUCT assigns only 0.5% of individuals as hybrids, based on the SSR dataset, which is closer to the measured natural hybridization rate of 0.73% (Nakayama & Yamaguchi, 2002). Potential hybrid individuals have been noted in the field, and these hybrids are sometimes considered to belong to an intermediate evolutionary species, Glycine gracilis (Skvortzow, 1927; Fukuda, 1933; Chen & Nelson, 2004), whereas others have considered them to be hybridization products of G. soja and G. max (Hymowitz, 1970).

Almost all unassigned accessions originated from the region between 30 and 40°N latitude in China, but along a wide longitudinal swath. We thus examined the morphological characteristics of NR accessions that were collected from the 34–40°N region and assigned into the G. soja cluster at K = 2 with SNP data. Based on catalog descriptions of seed color, 100-seed weight, growth habit and stem termination (Wang, 1982; Chang & Sun, 1991; Chang et al., 1996), most of these accessions exhibit ancestral traits: 72% have black and 22% bicolor or green seed coats, 100% have small seeds (100-seed weight < 12 g), 78% are viny (or semi-viny) and 89% are indeterminate (or semi-determinate). Thus, these accessions seem to show evidence of admixture.

SNPs vs SSRs

This study used extensive data from both SNP and SSR markers and hence provides an opportunity to carefully assess the relative utility of these two marker types. Overall, SNPs had lower power to resolve population structure. For example, SSR data yielded six clusters that were consistent with geographical origin, but SNPs revealed only five clusters, without resolving groups expected to be clearly differentiated (e.g. Chinese vs Russian populations and Japanese vs Korean populations within G. soja). Despite these differences, SSR and SNP analyses still yielded similar population structure within species, especially within G. max; similar fractions of diversity attributable to the various hierarchical components of population structure (Table 2); and similar phylogenetic information, as measured by Mantel tests on the pairwise shared-allele distances (r = 0.505, P = 0.001).
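The Mantel comparison of SSR and SNP distances can be sketched in Python as follows. The shared-allele distance here is 1 minus the mean proportion of alleles two diploid genotypes share per locus. That definition, the one-sided test and the permutation count are assumptions, since this section does not specify them, and the genotypes are toy data. With 999 permutations, as in this sketch's default, the smallest attainable P-value is 0.001.

```python
import random
from collections import Counter


def shared_allele_distance(g1, g2):
    """1 minus the mean proportion of alleles shared per locus.

    g1, g2: diploid genotypes as lists of (allele, allele) tuples, one per
    locus; None marks a missing genotype, and that locus is skipped.
    """
    shares = []
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        common = sum((Counter(a) & Counter(b)).values())  # 0, 1 or 2 shared alleles
        shares.append(common / 2)
    if not shares:
        raise ValueError("no locus is scored in both genotypes")
    return 1 - sum(shares) / len(shares)


def distance_matrix(genotypes):
    return [[shared_allele_distance(g, h) for h in genotypes] for g in genotypes]


def mantel(dx, dy, permutations=999, seed=1):
    """Pearson r between the upper triangles of dx and dy, with a one-sided
    P-value from jointly permuting the rows and columns of dy."""
    n = len(dx)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xs = [dx[i][j] for i, j in pairs]
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)

    def r_for(perm):
        ys = [dy[perm[i]][perm[j]] for i, j in pairs]
        my = sum(ys) / len(ys)
        syy = sum((y - my) ** 2 for y in ys)
        if sxx == 0 or syy == 0:
            raise ValueError("a distance matrix has no variation")
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return sxy / (sxx * syy) ** 0.5

    identity = list(range(n))
    r_obs = r_for(identity)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if r_for(perm) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Toy genotypes for five hypothetical accessions.
ssr = [[(1, 1), (2, 2), (3, 3)], [(1, 1), (2, 2), (4, 4)], [(1, 1), (5, 5), (4, 4)],
       [(6, 6), (5, 5), (4, 4)], [(6, 6), (7, 7), (8, 8)]]
snp = [[("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")],
       [("A", "A"), ("C", "C"), ("G", "G"), ("A", "A")],
       [("A", "A"), ("C", "C"), ("T", "T"), ("A", "A")],
       [("A", "A"), ("G", "G"), ("T", "T"), ("A", "A")],
       [("G", "G"), ("G", "G"), ("T", "T"), ("A", "A")]]
r, p = mantel(distance_matrix(ssr), distance_matrix(snp))
print(f"r = {r:.3f}, P = {p:.3f}")
```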

Some previous studies have also found that SSR markers perform better than SNP markers at clustering germplasm into populations (Rosenberg et al., 2003; Hamblin et al., 2007; Payseur & Jing, 2009). In our case, there seem to be two reasons for the discrepancy in the power to resolve populations. The first is the level of genetic diversity. The number of observed SNP alleles (1108) was only half that of SSR alleles (2133), despite assaying > 5 times as many SNP loci (554 vs 99). The second, as noted previously (Morin et al., 2004; Kauwe et al., 2005), is the frequency of distinct alleles. The SNP markers used in this study were discovered in only six G. max cultivars (Choi et al., 2007), and hence most of our alleles were neither specific to G. soja nor rare within populations. Population structure was therefore probably more poorly resolved, because common alleles are more likely to be shared among populations.
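The diversity gap follows partly from allele counts. Nei's gene diversity, H = 1 − Σ p_i², cannot exceed 1 − 1/k for a locus with k alleles, so a biallelic SNP is capped at 0.5, while a highly polymorphic SSR can approach 1. The counts above also imply exactly two alleles per SNP locus (1108/554) and about 21.5 per SSR locus (2133/99). Below is a minimal Python sketch using the uncorrected estimator. The choice of estimator is an assumption, since this section does not restate the one used in the study.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's gene diversity at one locus, H = 1 - sum(p_i ** 2), computed
    from allele calls without a sample-size correction."""
    n = len(alleles)
    if n == 0:
        raise ValueError("no allele calls")
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def max_gene_diversity(k):
    """Upper bound 1 - 1/k, reached when k alleles are equally frequent."""
    return 1.0 - 1.0 / k


print(gene_diversity(["A"] * 50 + ["G"] * 50))  # 0.5: a SNP cannot exceed this
print(round(max_gene_diversity(21), 3))         # 0.952: e.g. 21 equifrequent SSR alleles
print(1108 / 554, round(2133 / 99, 1))          # 2.0 21.5 alleles per SNP / SSR locus
```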

Acknowledgements

This research was supported by the State Key Basic Research and Development Plan of China (973) (Nos. 2010CB125900 and 2004CB117203), the National Key Technologies R&D Program in the 11th Five-Year Plan (No. 2006BAD13B05), State High-tech (863) (Nos. 2006AA10A110 and 2006AA10Z164), the International Science and Technology Cooperation and Exchanges Projects (No. 20061773) and the Academy and Institute Foundation for Basic Scientific Research in the Institute of Crop Science, Chinese Academy of Agricultural Sciences. We thank Dr Song Ge (Institute of Botany, Chinese Academy of Sciences, Beijing, China), Dr Marinus J. M. Smulders (Plant Research International, Wageningen UR, the Netherlands), Dr Richard Abbott and two anonymous reviewers for stimulating discussions and useful suggestions.

References

Abe J, Xu D, Suzuki Y, Kanazawa A, Shimamoto Y. 2003. Soybean germplasm pools in Asia revealed by nuclear SSRs. Theoretical and Applied Genetics 106: 445–453.

Agrama HA, Yan WG, Lee F, Fjellstrom R, Chen MH, Jia M, McClung A. 2009. Genetic assessment of a mini-core subset developed from the USDA rice genebank. Crop Science 49: 1336–1346.

Aguilar-Melendez A, Morrell PL, Roose ML, Kim SC. 2009. Genetic diversity and structure in semiwild and domesticated chiles (Capsicum annuum; Solanaceae) from Mexico. American Journal of Botany 96: 1190–1202.

Allaby RG, Fuller DQ, Brown TA. 2008. The genetic expectations of a protracted model for the origins of domesticated crops. Proceedings of the National Academy of Sciences, USA 105: 13982–13986.

Badr A, Müller K, Schäfer-Pregl R, Rabey HE, Effgen S, Ibrahim HH, Pozzi C, Rohde W, Salamini F. 2000. On the origin and domestication history of barley (Hordeum vulgare). Molecular Biology and Evolution 17: 499–510.

Becquet C, Patterson N, Stone AC, Przeworski M, Reich D. 2007. Genetic structure of chimpanzee populations. PLoS Genetics 3: e66.

Chang R. 1989. Studies on the origin of the cultivated soybean (Glycine max (L.) Merr.). Oil Crops of China: 1–6.

Chang R, Sun J. 1991. Catalogues of Chinese soybean germplasm and resources (sequel 1). Beijing, China: China Agricultural Press.

Chang R, Sun J, Qiu L, Chen Y. 1996. Catalogues of Chinese soybean germplasm and resources (sequel 2). Beijing, China: China Agricultural Press.

Chapman MA, Pashley CH, Wenzler J, Hvala J, Tang S, Knapp SJ, Burke JM. 2008. A genomic scan for selection reveals candidates for genes involved in the evolution of cultivated sunflower (Helianthus annuus). The Plant Cell 20: 2931–2945.

Chen Y, Nelson RL. 2004. Genetic variation and relationships among cultivated, wild, and semiwild soybean. Crop Science 44: 316–325.

Cheng C, Motohashi R, Tsuchimoto S, Fukuta Y, Ohtsubo H, Ohtsubo E. 2003. Polyphyletic origin of cultivated rice: based on the interspersion pattern of SINEs. Molecular Biology and Evolution 20: 67–75.

Choi I, Hyten D, Matukumalli L, Song Q, Chaky J, Quigley C, Chase K, Lark K, Reiter R, Yoon M. 2007. A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics 176: 685–696.

Doebley J, Gaut B, Smith B. 2006. The molecular genetics of crop domestication. Cell 127: 1309–1321.

Dong Y, Zhao L, Liu B, Wang Z, Jin Z, Sun H. 2004. The genetic diversity of cultivated soybean grown in China. Theoretical and Applied Genetics 108: 931–936.

Dong Y, Zhuang B, Zhao L, Sun H, He M. 2001. The genetic diversity of annual wild soybeans grown in China. Theoretical and Applied Genetics 103: 98–103.

Falush D, Stephens M, Pritchard J. 2003. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567–1587.

Fukuda Y. 1933. Cytological studies on the wild and cultivated Manchurian soybeans. Japanese Journal of Botany 6: 489–506.

Fuller D. 2007. Contrasting patterns in crop domestication and domestication rates: recent archaeobotanical insights from the Old World. Annals of Botany 100: 903–924.

Fuller D, Qin L, Zheng Y, Zhao Z, Chen X, Hosoya L, Sun G. 2009. The domestication process and domestication rate in rice: spikelet bases from the lower Yangtze. Science 323: 1607–1610.

Gai J, Xu D, Gao Z, Shimamoto Y, Abe J, Fukushi H, Kitajima S. 2000. Studies on the evolutionary relationship among eco-types of G. max and G. soja in China. Acta Agronomica Sinica 26: 513–520.

Gao H, Williamson S, Bustamante CD. 2007. A Markov chain Monte Carlo approach for joint inference of population structure and inbreeding rates from multilocus genotype data. Genetics 176: 1635–1651.

Garris A, Tai T, Coburn J, Kresovich S, McCouch S. 2005. Genetic structure and diversity in Oryza sativa L. Genetics 169: 1631–1638.

Goudet J. 2001. FSTAT, a program to estimate and test gene diversities and fixation indices (version 2.9.3) [WWW document]. URL http://www2.unil.ch/popgen/softwares/fstat.htm [accessed on 28 June 2010].

Hamblin MT, Warburton ML, Buckler ES. 2007. Empirical comparison of simple sequence repeats and single nucleotide polymorphisms in assessment of maize diversity and relatedness. PLoS ONE 2: e1367.

Heun M, Schafer-Pregl R, Klawan D, Castagna R, Accerbi M, Borghi B, Salamini F. 1997. Site of einkorn wheat domestication identified by DNA fingerprinting. Science 278: 1312–1314.

Hirata T, Abe J, Shimamoto Y. 1999. Genetic structure of the Japanese soybean population. Genetic Resources and Crop Evolution 46: 441–453.

Hymowitz T. 1970. On the domestication of the soybean. Economic Botany 24: 408–421.

Hymowitz T, Newell C. 1981. Taxonomy of the genus Glycine, domestication and uses of soybeans. Economic Botany 35: 272–288.

Jones MK, Liu X. 2009. Origins of agriculture in East Asia. Science 324: 730–731.

Kauwe J, Bertelsen S, Bierut L, Dunn G, Hinrichs A, Jin C, Suarez B. 2005. The efficacy of short tandem repeat polymorphisms versus single-nucleotide polymorphisms for resolving population structure. BMC Genetics 6(Suppl. 1): S84.

Kuroda Y, Kaga A, Tomooka N, Vaughan D. 2006. Population genetic structure of Japanese wild soybean (Glycine soja) based on microsatellite variation. Molecular Ecology 15: 959–974.

Li FS. 1994. A study on origin and evolution of soybean. Soybean Science (China) 13: 61–66.

Li Y, Guan R, Liu Z, Ma Y, Wang L, Li L, Lin F, Luan W, Chen P, Qiu L. 2008. Genetic structure and diversity of cultivated soybean (Glycine max (L.) Merr.) landraces in China. Theoretical and Applied Genetics 117: 857–871.

Liu K, Muse S. 2005. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics 21: 2128–2129.

Londo JP, Chiang YC, Hung KH, Chiang TY, Schaal BA. 2006. Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa. Proceedings of the National Academy of Sciences, USA 103: 9578–9583.

Matsuoka Y, Vigouroux Y, Goodman M, Sanchez G. 2002. A single domestication for maize shown by multilocus microsatellite genotyping. Proceedings of the National Academy of Sciences, USA 99: 6080–6084.

Mohamed AI, Rangappa M. 1992. Nutrient composition and anti-nutritional factors in vegetable soybean. II: Oil, fatty acids, sterols, and lipoxygenase activity. Food Chemistry 44: 277–282.

Morin PA, Luikart G, Wayne RK. 2004. SNPs in ecology, evolution and conservation. Trends in Ecology and Evolution 19: 208–216.

Morrell P, Clegg M. 2007. Genetic evidence for a second domestication of barley (Hordeum vulgare) east of the Fertile Crescent. Proceedings of the National Academy of Sciences, USA 104: 3289–3294.

Nakayama Y, Yamaguchi H. 2002. Natural hybridization in wild soybean (Glycine max ssp. soja) by pollen flow from cultivated soybean (Glycine max ssp. max) in a designed population. Weed Biology and Management 2: 25–30.

Oumar I, Mariac C, Pham J, Vigouroux Y. 2008. Phylogeny and origin of pearl millet (Pennisetum glaucum (L.) R. Br) as revealed by microsatellite loci. Theoretical and Applied Genetics 117: 489–497.

Ozkan H, Brandolini A, Schafer-Pregl R, Salamini F. 2002. AFLP analysis of a collection of tetraploid wheats indicates the origin of emmer and hard wheat domestication in Southeast Turkey. Molecular Biology and Evolution 19: 1797–1801.

Payseur BA, Jing P. 2009. A genome-wide comparison of population structure at STRPs and nearby SNPs in humans. Molecular Biology and Evolution 26: 1369–1377.

Pohl M, Piperno D, Pope K, Jones J. 2007. Microfossil evidence for pre-Columbian maize dispersals in the neotropics from San Andres, Tabasco, Mexico. Proceedings of the National Academy of Sciences, USA 104: 6870–6875.

Powell W, Morgante M, Doyle J, McNicol J, Tingey S, Rafalski A. 1996. Genepool variation in genus Glycine subgenus soja revealed by polymorphic nuclear and chloroplast microsatellites. Genetics 144: 793–803.

Pritchard J, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945–959.

Qiu L. 2009. Establishment, representative testing and research progress of soybean core collection and mini core collection. Acta Agronomica Sinica 35: 571–575.

Rosenberg NA, Li LM, Ward R, Pritchard JK. 2003. Informativeness of genetic markers for inference of ancestry. The American Journal of Human Genetics 73: 1402–1422.

Ross-Ibarra J, Gaut BS. 2008. Multiple domestications do not appear monophyletic. Proceedings of the National Academy of Sciences, USA 105: E105.

Ross-Ibarra J, Morrell PL, Gaut BS. 2007. Plant domestication, a unique opportunity to identify the genetic basis of adaptation. Proceedings of the National Academy of Sciences, USA 104(Suppl. 1): 8641–8648.

Rozas J, Sanchez-DelBarrio J, Messeguer X, Rozas R. 2003. DnaSP, DNA polymorphism analysis by the coalescent and other methods. Bioinformatics 19: 2496–2497.

Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406–425.

Sang T, Ge S. 2007. The puzzle of rice domestication. Journal of Integrative Plant Biology 49: 760–768.


Shen R, Fan J, Campbell D, Chang W, Chen J, Doucet D, Yeakley J,

Bibikova M, Wickham Garcia E, McBride C. 2005. High-throughput

SNP genotyping on universal bead arrays. Mutation Research-Fundamental and Molecular Mechanisms of Mutagenesis 573: 70–82.

Skvortzow BV. 1927. The soybean–wild and cultivated in Eastern Asia. In:

Proceedings of the Manchurian Research Society, Natural History SectionPublication Series A, No. 22. Harbin, China.

Stephens M, Donnelly P. 2003. A comparison of Bayesian methods for

haplotype reconstruction from population genotype data. The AmericanJournal of Human Genetics 73: 1162–1169.

Stephens M, Smith NJ, Donnelly P. 2001. A new statistical method for

haplotype reconstruction from population data. The American Journal ofHuman Genetics 68: 978–989.

Szpiech Z, Jakobsson M, Rosenberg N. 2008. ADZE: Allelic diversityanalyzer version 1.0. WWW document. URL http://

rosenberglab.bioinformatics.med.umich.edu/adze.html [accessed on

28 June 2010].

Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: molecular

evolutionary genetics analysis (MEGA) software version 4.0. MolecularBiology and Evolution 24: 1596–1599.

Vavilov N. 1951. The origin, variation, immunity and breeding of cultivatedplants. New York, NY, USA: Ronald Press, Translated from the Russian

by K. Starrchester.

Vigouroux Y, McMullen M, Hittinger CT, Houchins K, Schulz L,

Kresovich S, Matsuoka Y, Doebley J. 2002. Identifying genes of

agronomic importance in maize by screening microsatellites for evidence

of selection during domestication. Proceedings of the National Academy ofSciences, USA 99: 9650–9655.

Wang G. 1982. Catalogues of Chinese soybean germplasm and resources.Beijing, China: China Agricultural Press.

Wang L, Guan R, LIU Z, Chang R, Qiu L. 2006. Genetic diversity of

Chinese cultivated soybean revealed by SSR markers. Crop Science 46:

1032–1038.

Wen Z, Ding Y, Zhao T, Gai J. 2009. Genetic diversity and peculiarity of

annual wild soybean (G. soja Sieb. et Zucc.) from various eco-regions in

China. Theoretical and Applied Genetics 119: 371–381.

Wen ZX, Zhao TJ, Zhang YZ, Liu SH, Wang CE, Wang F, Gai JY. 2008.

Association analysis of agronomic and quality traits with SSR markers in

Glycine max and Glycine soja in China: I. Population structure and

associated markers. Acta Agronomica Sinica 34: 1169–1178.

Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen

MD, Gaut BS. 2005. The effects of artificial selection on the maize

genome. Science 308: 1310–1314.

Xie H, Chang R, Guan R, Qiu L. 2005. Genetic diversity of Chinese

summer soybean germplasm revealed by SSR markers. Chinese ScienceBulletin 50: 526–535.

Xu B, Zheng H, Lu Q, Zhao S, Zhou S. 1986. Three new evidences of the

origional area of soybean. Soybean Science (China) 5: 123–130.

Xu D, Abe J, Gai J, Shimamoto Y. 2002. Diversity of chloroplast DNA

SSRs in wild and cultivated soybeans: evidence for multiple origins of

cultivated soybean. Theoretical and Applied Genetics 105: 645–653.

Xu D, Gai J. 2003. Genetic diversity of wild and cultivated soybeans growing in China revealed by RAPD analysis. Plant Breeding 122: 503–506.

Zhang LB, Zhu Q, Wu ZQ, Ross-Ibarra J, Gaut BS, Ge S, Sang T. 2009. Selection on grain shattering genes and rates of rice domestication. New Phytologist 184: 708–720.

Zhao TJ, Gai JY. 2004. The origin and evolution of cultivated soybean (Glycine max (L.) Merr.). Scientia Agricultura Sinica 37: 954–962.

Zhou X, Peng Y, Wang G, Chang R. 1998. Preliminary studies on the centers of genetic diversity and origination of cultivated soybean in China. Acta Agronomica Sinica 31: 37–43.

Zhu T, Shi L, Doyle JJ, Keim P. 1995. A single nuclear locus phylogeny of soybean based on DNA sequence. Theoretical and Applied Genetics 90: 991–999.

Supporting Information

Additional supporting information may be found in the online version of this article.

Fig. S1 Heterozygosity distribution at 99 simple sequence repeat (SSR) loci and 554 single-nucleotide polymorphism (SNP) loci.

Fig. S2 Allele frequency distribution at 99 simple sequence repeat (SSR) loci and 554 single-nucleotide polymorphism (SNP) loci.

Fig. S3 The mean number of distinct alleles (a) and private alleles (b) per locus as a function of standardized sample size for the two species (Glycine soja and Glycine max) and their subpopulations in China.

Fig. S4 Estimated ln P(X|K) and Var(ln P(X|K)) for K from 1 to 8 based on the simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP datasets for all accessions (a) and for wild accessions (b).

Fig. S5 Each value gives the percentage of accessions that had ≥ 0.8 ancestry in the STRUCTURE analyses.

Table S1 The geographical distribution pattern and inferred clusters and OTUs based on simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP analyses of 435 accessions in China.

Table S2 Description of the 99 simple sequence repeat (SSR) markers and 617 single-nucleotide polymorphism (SNP) markers, and population genetic statistics for the complete SSR and SNP datasets, respectively.

Table S3 Frequency of assignment from STRUCTURE-inferred clusters to INSTRUCT-inferred clusters based on the simple sequence repeat (SSR), single-nucleotide polymorphism (SNP) and SSR + SNP datasets, respectively.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.


© The Authors (2010)

Journal compilation © New Phytologist Trust (2010)

New Phytologist (2010) 188: 242–253

www.newphytologist.com