
Author's personal copy

Comparative analysis of Gossypium and Vitis genomes indicates genome duplication specific to the Gossypium lineage

Lifeng Lin a, Haibao Tang a, Rosana O. Compton a, Cornelia Lemke a, Lisa K. Rainville a, Xiyin Wang a,b, Junkang Rong a,1, Mukesh Kumar Rana a,c, Andrew H. Paterson a,⁎

a Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30605, USA
b Center for Genomics and Biocomputing, College of Sciences, Hebei Polytechnic University, Tangshan, Hebei, 063000, China
c NRC on DNA Fingerprinting, NBPGR, Pusa Campus, New Delhi, 110012, India

Article info

Article history:
Received 17 November 2010
Accepted 15 February 2011
Available online 22 February 2011

Keywords: Gossypium; Lineage specific genome-wide duplication; Vitis vinifera; Whole genome alignment; Synteny; Collinearity; Gene loss; Gene density

Abstract

Genetic mapping studies have suggested that diploid cotton (Gossypium) might be an ancient polyploid. However, further evidence is lacking due to the complexity of the genome and the lack of sequence resources. Here, we used the grape (Vitis vinifera) genome as an out-group in two different approaches to further explore evidence regarding ancient whole genome duplication (WGD) event(s) in the diploid Gossypium lineage and its (their) effects: a genome-level alignment analysis and a local-level sequence component analysis. Both studies suggest that at least one round of genome duplication occurred in the Gossypium lineage. Also, gene densities in corresponding regions of the Gossypium raimondii, V. vinifera, Arabidopsis thaliana and Carica papaya genomes are similar, despite the large differences in their genome sizes and in the number of WGDs each genome has experienced. These observations fit the model that differences in plant genome sizes are largely explained by transposon insertions into heterochromatic regions.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Whole genome duplication (WGD) events have been more frequent in the lineages of flowering plant species than in most other taxa. With more plant genomes being sequenced and released, and with the emergence of new tools for genome comparison, our understanding of the history of genome duplication and its importance in angiosperm evolution is becoming clearer. An ancient genome triplication event is very likely to have been shared by all eudicots [1,2], and different lineages have experienced additional, more recent WGD events [2,3]. For example, Populus had one round of tetraploidy in the Salicoid lineage [4] and Glycine had two rounds of tetraploidy in the legume lineage [5]. In contrast, Vitis and Carica have had no lineage-specific genome duplication events since the common ancestor of all rosids [1,6].

WGD profoundly impacts the genomic landscape in many ways [7]. Synthetic polyploid plants experience abrupt CpG methylation changes after genome doubling [8]. Interchromosomal rearrangements increase after WGD in teleost fish [9]. Duplicated genes created by WGD behave differently from those created by single gene duplications, showing a longer life span before one copy is pseudogenized and/or deleted [10]. In a cross-taxon alignment using a Gossypium raimondii (D-genome cotton) physical map [11], more Gossypium contigs were aligned to the distantly related Vitis vinifera genome than to the more closely related Arabidopsis genome [11]. It is possible that the two additional WGD events in the Arabidopsis lineage, along with subsequent gene losses and chromosomal rearrangements, have significantly disrupted the conservation of synteny.

The fact that members of the Gossypium genus have a gametic chromosome number of 13, while several related genera have many species with n = 6, has long hinted that a Gossypium ancestor may have experienced a relatively recent WGD [12]. While the history of genome duplication in the Gossypium lineage is not yet clear due to the lack of a whole genome sequence, classical cytogenetic analysis, Ks distributions of duplicated gene pairs, and possible homoeologous relationships among multiple chromosomal segments within the Gossypium genome [13–15] all support the hypothesis that Gossypium experienced at least one whole genome duplication event since the triplication shared by most if not all eudicots. However, inferred Gossypium homology to date is based on genetic mapping, which depends on marker density and might lead to some spurious matches [14]. Additionally, sequence shuffling between pericentromeric regions may also cause false positives [14]. Therefore, although ancient lineage-specific WGD in Gossypium has been suggested, definitive proof is still lacking.

Genomics 97 (2011) 313–320

⁎ Corresponding author at: 111 Riverbend Road, Rm 228, Athens, GA 30602-6810, USA. Fax: +1 706 5830160.

E-mail address: [email protected] (A.H. Paterson). URL: http://www.plantgenome.uga.edu (A.H. Paterson).

1 Current affiliation: School of Agriculture and Food Sciences, Zhejiang Forestry University, Lin'an, Hangzhou, Zhejiang, 311300, China.

0888-7543/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2011.02.007

Contents lists available at ScienceDirect

Genomics

Journal homepage: www.elsevier.com/locate/ygeno

In sequenced genomes, one common method to search for evidence of ancient WGD is the “all-against-all” dot plot. In this method, ancient homologous genes (or “anchors”) are identified using BLAST, and syntenic segments appear as consecutive strings of homologous genes preserved in a linear pattern parallel to the diagonal or anti-diagonal (the latter indicating segmental inversion). Compared to the Ks distribution plot, this approach provides not only structural evidence of ancient duplication events but also the physical locations of the duplicated segment pairs. However, this method is not feasible in species that lack contiguous sequences or information about the relative chromosomal positions of the sequences.
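The idea of reading syntenic segments off a dot plot can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes anchors are already given as (gene rank in genome A, gene rank in genome B) pairs, and it uses a simple greedy rule (an anchor extends the current run if both ranks advance by at most `max_gap`, with genome B moving in a consistent direction). The parameter names and thresholds are hypothetical; dedicated synteny tools use more robust chaining that tolerates interleaved noise.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (gene rank in genome A, gene rank in genome B)


def collinear_runs(anchors: List[Anchor], max_gap: int = 3,
                   min_size: int = 3) -> List[List[Anchor]]:
    """Greedily chain homologous anchors into diagonal or anti-diagonal runs.

    Anchors are visited in order of their rank in genome A. An anchor extends the
    current run when it advances 1..max_gap ranks in A and 1..max_gap ranks in B,
    always in the same B direction (+1 diagonal, -1 anti-diagonal/inversion).
    Runs shorter than min_size anchors are discarded as noise.
    """
    runs: List[List[Anchor]] = []
    current: List[Anchor] = []
    direction = 0  # 0 until the second anchor of a run fixes the orientation
    for a, b in sorted(anchors):
        if current:
            prev_a, prev_b = current[-1]
            da, db = a - prev_a, b - prev_b
            sign = 1 if db > 0 else -1
            if (1 <= da <= max_gap and 1 <= abs(db) <= max_gap
                    and direction in (0, sign)):
                direction = sign
                current.append((a, b))
                continue
            if len(current) >= min_size:
                runs.append(current)
        current, direction = [(a, b)], 0
    if len(current) >= min_size:
        runs.append(current)
    return runs


anchors = [(1, 10), (2, 11), (4, 12), (5, 14),   # diagonal run
           (20, 50), (21, 49), (22, 47),         # anti-diagonal run (inversion)
           (30, 5)]                              # isolated match
print(collinear_runs(anchors))
```

Here the first four anchors form a diagonal run, the next three an anti-diagonal run, and the isolated match is dropped.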

Without whole genome data, local gene loss patterns can also be indicative of the history of WGD [16]. After genome duplication, one copy of each duplicated gene is thought to be freed from selective pressure, and may adopt new functions (neofunctionalization), share the original gene function with its paralogue (subfunctionalization), or become pseudogenized or removed. Indeed, the majority of duplicated gene copies are lost within just a few million years after polyploidy. If a eudicot genome (such as G. raimondii) has experienced WGD with consequent gene loss after its divergence from V. vinifera, one would predict that many genes would have been lost from their ancestral locations.

To further our understanding of its evolutionary history, we studied the Gossypium genome using two different methods: a whole-genome-level dot plot analysis, and a local-level comparative study of a specific region of Gossypium–Vitis synteny based on two sequenced G. raimondii BACs totaling ~184 kb. Both the whole genome dot plots and the local-level sequence comparisons provide new evidence of Gossypium lineage-specific genome duplication after the Vitales–Malvales split. Comparison of homologous sequences between the two species also offers new insight into mechanisms of genome size variation.

2. Results

2.1. Gossypium–Vitis whole genome dot plot

Gossypium is a genus consisting of 50 allotetraploid and diploid species. The smallest diploid Gossypium genome, that of G. raimondii, has an estimated size of around 880 Mb [17]. The construction of a cotton consensus map with 13 chromosomes, by merging the most saturated AD tetraploid genetic map (constructed from an F2 population of tetraploid cotton G. hirsutum and G. barbadense) and the D genome genetic map (constructed from an F2 population of diploid cotton G. trilobum and G. raimondii), was described in earlier studies [13]. Briefly, 333 pairs of loci that were mapped in extensive blocks in different subgenomes were used as the basis for merging. The positions of non-shared markers were interpolated between these common anchor markers based on the relative recombinational distance from the nearest anchor marker. The combined genetic map contains 3016 loci distributed among a reduced set of 13 putative ancestral chromosomes, giving a marker density higher than any previously published map and more resolution than either individual map alone.
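The interpolation step can be sketched as linear scaling between the two flanking anchor markers. This is an illustrative assumption about how "relative recombinational distance" was applied (the text notes later that the interpolation relies on a linear assumption); the function name and the example anchor positions are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def interpolate_position(x: float, anchors: List[Tuple[float, float]]) -> float:
    """Place a non-shared marker on the consensus map by linear interpolation.

    `anchors` holds (position on the source map, position on the consensus map),
    in cM, for shared anchor markers on one chromosome, sorted by source position.
    Markers beyond the outermost anchors are extrapolated from the nearest pair.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two anchor markers")
    source = [s for s, _ in anchors]
    i = bisect_right(source, x)
    i = min(max(i, 1), len(anchors) - 1)  # index of the right-hand flanking anchor
    (s1, c1), (s2, c2) = anchors[i - 1], anchors[i]
    if s2 == s1:  # coincident anchors: no interval to scale against
        return c1
    return c1 + (x - s1) * (c2 - c1) / (s2 - s1)


# Hypothetical anchors: source-map cM -> consensus-map cM
anchors = [(0.0, 0.0), (10.0, 20.0), (30.0, 40.0)]
print(interpolate_position(5.0, anchors))
```

A marker halfway between two anchors on its source map lands halfway between them on the consensus map; if the genetic/physical ratio varies within the interval, this placement can be locally wrong.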

In this study, all genetic markers from the Gossypium consensus map were compared against all Vitis genes and plotted. Of the 3016 loci on the cotton consensus genetic map, 1865 showed homology to a total of 3012 genes in the Vitis genome. These genes/loci formed 5097 pairs, and the positions of these pairs were used to create the genome-wide dot plot.
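To draw a genome-wide dot plot, each homologous pair must be placed on two continuous axes. One common way to do this (assumed here, not described in the text) is to lay chromosomes end to end and add the cumulative length of the preceding chromosomes to each position. The helper names and chromosome labels below are hypothetical; note the two axes may use different units (cM on the genetic map, physical units for Vitis).

```python
from itertools import accumulate
from typing import Dict, List, Tuple

Locus = Tuple[str, float]  # (chromosome name, position on that chromosome)


def cumulative_offsets(chrom_lengths: Dict[str, float]) -> Dict[str, float]:
    """Start coordinate of each chromosome when laid end to end in dict order."""
    names = list(chrom_lengths)
    ends = list(accumulate(chrom_lengths[n] for n in names))
    return dict(zip(names, [0.0] + ends[:-1]))


def dot_plot_points(pairs: List[Tuple[Locus, Locus]],
                    x_lengths: Dict[str, float],
                    y_lengths: Dict[str, float]) -> List[Tuple[float, float]]:
    """Convert homologous (x locus, y locus) pairs into genome-wide plot coordinates."""
    x_off, y_off = cumulative_offsets(x_lengths), cumulative_offsets(y_lengths)
    return [(x_off[cx] + px, y_off[cy] + py) for (cx, px), (cy, py) in pairs]


# Hypothetical example: two map chromosomes (cM) against two Vitis chromosomes (Mb)
x_lengths = {"G1": 100.0, "G2": 50.0}
y_lengths = {"V1": 30.0, "V2": 40.0}
pairs = [(("G1", 10.0), ("V1", 5.0)), (("G2", 20.0), ("V2", 7.0))]
print(dot_plot_points(pairs, x_lengths, y_lengths))
```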

We were able to detect more than 50 blocks of syntenic regions between the Gossypium consensus map and Vitis chromosomes (Fig. 1). It is clear from the dot plot that there is often more than one region in Gossypium that matches the same Vitis region. For example, more than half of Vitis chromosome 18 matches regions on Gossypium consensus chromosomes 9 and 10. Similarly, syntenic blocks are found between Vitis chromosome 3 and Gossypium consensus chromosomes 8 and 12, and between Vitis chromosome 14 and Gossypium consensus chromosomes 1 and 6. Across many regions of the Vitis genome, two or more blocks of Gossypium consensus chromosome fragments are syntenic to the same Vitis chromosome region, and we argue that the duplicated Gossypium regions are likely derived from a whole genome doubling event not shared with Vitis.

The consensus map provided us with improved information about genome structure in cotton. However, we recognize that the syntenic blocks in Fig. 1 appear “fuzzy” because of uncertainties in the exact order of genetic markers and in the construction of the consensus map. For example, some areas of the dot plot show a high density of matches but lack a clear collinear relationship. Even where we could discern significant collinear relationships, there are still fluctuations around the predicted linear order. We should note that the consensus Gossypium map was constructed by merging the genetic markers from the At-, Dt- and D-genome genetic maps. Interpolating the positions of unshared markers could be problematic for inferring marker order on a local scale, because both maps are of relatively low resolution (ca. 1 cM) and because the genetic/physical distance ratio can fluctuate widely (violating the linear assumption). Nonetheless, 1629 Vitis genes and 954 Gossypium loci are found in syntenic blocks, among which 263 Vitis genes and 314 Gossypium loci are found in blocks that show a 2:1 relationship between Gossypium and Vitis. These “duplicated blocks” are distributed across many different Vitis chromosomes, which strongly indicates that at least one genome-wide duplication event has occurred in the Gossypium lineage since its divergence from Vitis.
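Identifying 2:1 regions amounts to asking how many Gossypium blocks cover each Vitis gene. The sketch below is a simplified illustration under assumed inputs: each block is reduced to its span of Vitis gene ranks, and coverage depth is counted per gene. It counts blocks, not distinct Gossypium chromosomes, and the block labels are hypothetical, loosely echoing the chromosome 18 example above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A syntenic block: (Gossypium chromosome, Vitis chromosome, first and last Vitis gene rank)
Block = Tuple[str, str, int, int]


def coverage_depth(blocks: List[Block]) -> Dict[Tuple[str, int], int]:
    """Count how many Gossypium blocks cover each Vitis gene (chromosome, rank)."""
    depth: Counter = Counter()
    for _, v_chrom, start, end in blocks:
        for rank in range(min(start, end), max(start, end) + 1):
            depth[(v_chrom, rank)] += 1
    return dict(depth)


def multiply_covered(blocks: List[Block], min_depth: int = 2) -> List[Tuple[str, int]]:
    """Vitis genes matched by at least min_depth Gossypium blocks (e.g. 2:1 regions)."""
    return sorted(k for k, d in coverage_depth(blocks).items() if d >= min_depth)


# Hypothetical blocks echoing the Vitis chromosome 18 vs. Gossypium 9 and 10 example
blocks = [("G9", "V18", 1, 5), ("G10", "V18", 3, 8), ("G1", "V14", 1, 2)]
print(multiply_covered(blocks))
```

Only the Vitis genes covered by both hypothetical blocks (ranks 3 to 5 on V18) are reported as doubly covered.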

We further note that the current analysis is feasible because of the high marker density of our Gossypium consensus map. Indeed, we also attempted to detect collinearity using its individual components, the AD tetraploid reference map and the D genome genetic map [13], separately. Although there are isolated cases where homoeologous tetraploid Gossypium chromosomes were found to be syntenic to the same Vitis chromosome region, these analyses generally fail to reach the resolution of the analysis with the consensus map. In many instances, syntenic blocks detected in the plot using the consensus map were missing from the plots using the individual maps due to a lack of data points (Supplemental Figure 1).

2.2. Gossypium BAC sequencing and microsynteny detection

We surveyed three BACs from the D-genome Gossypium physical map [11] using shotgun sequencing. The BACs selected were GR174O23, GR109E22 and GR163B08, in the order arranged by FPC (Fingerprinted Contigs [11,18]). Two sequence contigs were assembled for GR109E22, of 30,903 bp (GR109E22contig1) and 78,650 bp (GR109E22contig2). One sequence gap (<3 kb) remains between the two contigs, but they are ordered and oriented using mate-pair information from the subclones. The assembled lengths of the other two BACs are 97,267 bp for GR174O23 and 134,012 bp for GR163B08. Sequence comparison among the three BACs revealed that GR174O23 overlaps with GR109E22contig1, giving a merged sequence 104,965 bp long. No overlaps among the other BAC sequence fragments were found.
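A quick arithmetic check relates these lengths to the ~184 kb mentioned in the Introduction. The overlap length is derived here rather than stated in the text, and assumes the merged sequence is simply the two inputs minus a single shared overlap; the <3 kb gap is not counted.

```python
# Assembled lengths reported in the text (bp)
gr174o23 = 97_267
gr109e22_contig1 = 30_903
gr109e22_contig2 = 78_650
merged_174_contig1 = 104_965

# Overlap implied by merging GR174O23 with GR109E22contig1
overlap = gr174o23 + gr109e22_contig1 - merged_174_contig1

# Sequence of the two syntenic BACs after merging (excluding the <3 kb gap)
syntenic_total = merged_174_contig1 + gr109e22_contig2

print(overlap, syntenic_total)
```

This gives an implied overlap of about 23 kb and a combined length of 183,615 bp, consistent with the ~184 kb figure.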

We created putative cotton gene models using two different methods: ab initio gene prediction with FGENESH, and a similarity-based method that aligned sequences to cotton EST databases (see Methods). A total of 12 genes were identified in GR109E22 contig2 and an additional 12 genes in the combined fragment of GR109E22 contig1 and GR174O23. The BAC GR163B08 has 19 genes identified by FGENESH, but these either failed to show any corresponding EST sequence or are transposon-related (detailed in later sections). Significant synteny was found between two regions on Vitis chromosome 6 and two Gossypium BACs, GR174O23 and GR109E22. Putative gene sequences from GR163B08 correspond to genes at multiple scattered positions in the Vitis genome, and we failed to detect syntenic relationships to any Vitis region using this BAC.

For ease of analysis, we divided the collinear relationships found between the G. raimondii BACs and the V. vinifera genome into two regions. Region 1 contains the consensus sequence combining GR174O23, GR109E22 contig1 and the part of GR109E22 contig2 that is immediately adjacent to contig1 across the sequencing gap in the BAC. This region contains 8 collinear genes that align to the region from 21.8 Mb to 22.3 Mb (Vv6g1599–Vv6g1637) on Vitis chromosome 6. Region 2 contains the remaining portion of GR109E22 contig2, which corresponds to 7.5 Mb to 7.8 Mb (Vv6g0801–Vv6g0829) on Vitis chromosome 6, with 9 genes in collinear order (Fig. 2).

Region 1 and region 2 are contiguous in the Gossypium genome but are located on separate arms of chromosome 6 in Vitis (Fig. 2). To determine whether this rearrangement happened in the Gossypium genome or in the Vitis genome, we examined the corresponding syntenic regions of Arabidopsis. Because of the lineage-specific genome-wide duplications and rearrangements in the Arabidopsis genome, we identified several Arabidopsis genomic locations that showed synteny to the Gossypium BACs. Nonetheless, region 1 and region 2 are syntenic to different Arabidopsis genomic locations that are not adjacent to each other. In addition, the synteny between Vitis and Arabidopsis in these regions does not break at the same point as it does between region 1 and region 2 in the Gossypium genome studied here. We conclude that the genome rearrangement that fused region 1 and region 2 happened in the Gossypium lineage.

Fig. 1. Whole genome dot plots between the Gossypium consensus genetic map and Vitis whole genome gene sequences. Collinearity blocks, identified as described in Methods, are highlighted in red.

Fig. 2. Position of orthologous regions of sequenced Gossypium BACs and two regions on Vitis chromosome 6.

2.3. Ks value between syntenic gene pairs

We further calculated the synonymous substitution rate (Ks) between our predicted Gossypium gene models and syntenic orthologs in Arabidopsis and Vitis (Supplemental Table 1). With a median value of ~1.8 (21 gene pairs), Ks between Gossypium–Arabidopsis orthologs is much higher than between Gossypium–Vitis orthologs (median ~1.4, 18 gene pairs). This trend is unexpected, since cotton is phylogenetically closer to Arabidopsis (both taxa belong to the Eurosid II clade) than to Vitis. However, substitution rates vary significantly among angiosperm lineages [19]. Indeed, among the four sequenced rosid genomes studied in Ref. [20], Arabidopsis has the fastest substitution rate while Vitis has the slowest, which could explain this unexpected Ks trend. Also, the substitution rate of genes from a small region such as the one studied here might not be representative of the whole genome. A genome-level comparative analysis, once a cotton genome sequence is available, will clarify the evolutionary history of these species.
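The Methods used for Ks are not described in this section. As one common choice, assumed here for illustration only, the raw proportion of synonymous differences per synonymous site (counted beforehand, e.g. Nei-Gojobori style) can be corrected with the Jukes-Cantor formula, Ks = -3/4 ln(1 - 4p/3), and summarized by the median across gene pairs. The function names are hypothetical.

```python
import math
from statistics import median
from typing import List


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a raw proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def median_ks(syn_diffs: List[float], syn_sites: List[float]) -> float:
    """Median JC-corrected Ks across gene pairs.

    syn_diffs[i] and syn_sites[i] are the synonymous differences and synonymous
    sites for pair i, assumed to be counted beforehand (e.g. Nei-Gojobori style).
    """
    return median(jukes_cantor(d / s) for d, s in zip(syn_diffs, syn_sites))


# Raw proportion that corresponds to Ks = 1.8 under this correction
p_for_1_8 = 0.75 * (1.0 - math.exp(-4.0 * 1.8 / 3.0))
print(round(p_for_1_8, 3), round(jukes_cantor(p_for_1_8), 3))
```

Under this correction, a Ks of ~1.8 corresponds to roughly 68% of synonymous sites differing, close to the 75% ceiling where the correction diverges, so values in this range carry substantial uncertainty.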

2.4. Extensive loss of homologous genes observed in Gossypium sequenced regions

To investigate gene loss in the Gossypium lineage after its split from Vitis, we constructed a putative ancestral gene order and compared it to the homologous regions in Arabidopsis. By comparing genes conserved in collinear arrangements in all four Arabidopsis homologous regions (resulting from the two doublings in the Arabidopsis lineage), we were able to distinguish genes present in ancestral locations from “lineage-specific” insertion or deletion events. Genes found in collinear blocks across taxa were inferred to be in putative ancestral locations; other genes are likely to reflect lineage-specific insertions or deletions. Fig. 3 shows an example using Region 2.
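The classification rule can be sketched with set operations. This is an illustration under stated assumptions, not the authors' code: a Vitis gene is treated as being in a putative ancestral location if at least one outgroup segment (an Arabidopsis homeolog or Carica) retains it in collinear position, and the genome being evaluated (Gossypium) is kept out of that support set so that its own retention does not inflate the ancestral count. All gene IDs are hypothetical.

```python
from typing import Dict, List, Set


def ancestral_genes(reference: List[str], support: Dict[str, Set[str]]) -> List[str]:
    """Reference (Vitis) genes, in reference order, collinear in at least one segment.

    `support` maps an outgroup segment label to the set of reference genes that
    segment retains in collinear positions.
    """
    supported: Set[str] = set().union(*support.values())
    return [g for g in reference if g in supported]


def retained(segment_hits: Set[str], ancestral: List[str]) -> List[str]:
    """Ancestral genes still found in collinear positions in one evaluated segment."""
    return [g for g in ancestral if g in segment_hits]


# Hypothetical gene IDs and collinear hits, for illustration only
reference = ["v1", "v2", "v3", "v4", "v5"]
support = {"At_a": {"v1", "v3"}, "At_b": {"v3", "v5"}, "Carica": {"v1", "v3", "v5"}}
gossypium_hits = {"v1", "v5"}

anc = ancestral_genes(reference, support)
print(anc, retained(gossypium_hits, anc))
```

Genes v2 and v4 have no collinear support outside Vitis and would be treated as lineage-specific; of the three putative ancestral genes, two are retained in the hypothetical Gossypium segment.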

In Region 1, 19 genes were in putative ancestral locations on the Vitis chromosome, of which 8 are still preserved in Gossypium; in Region 2 (Fig. 3), 9 of the 17 genes in putative ancestral locations in Vitis are preserved in Gossypium. In both cases, roughly half of the V. vinifera genes in ancestral locations are still found in the syntenic locations in Gossypium.

We further compared the extent of gene loss in the Gossypium regions with that in the corresponding regions of Carica (no WGDs after its divergence from Vitis) and Arabidopsis (two WGDs after its divergence from Vitis). Gene numbers are similar in the Carica and Vitis regions, each containing approximately twice the number of genes found in collinear positions in Gossypium (Table 2). In the collinear Arabidopsis regions, the number of preserved genes is substantially lower than in Gossypium (Table 2), closer to ¼ of the genes in putative ancestral locations.
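Expressing the Table 2 counts as retention fractions makes the comparison explicit. The sketch averages the four Arabidopsis segments per region; using the mean is a simple summary chosen here, not necessarily how the authors compared them.

```python
def retention(retained: int, ancestral: int) -> float:
    """Fraction of putative ancestral genes still in collinear positions."""
    return retained / ancestral


# Counts from Table 2
ANCESTRAL = {"Region 1": 19, "Region 2": 17}
GOSSYPIUM = {"Region 1": 8, "Region 2": 9}
ARABIDOPSIS = {"Region 1": [6, 4, 9, 5], "Region 2": [5, 8, 3, 6]}

summary = {}
for region, n in ANCESTRAL.items():
    arab = [retention(k, n) for k in ARABIDOPSIS[region]]
    summary[region] = (retention(GOSSYPIUM[region], n), sum(arab) / len(arab))
    print(f"{region}: Gossypium {summary[region][0]:.2f}, "
          f"mean Arabidopsis segment {summary[region][1]:.2f}")
```

This gives about 0.42 and 0.53 for Gossypium, against about 0.32 for the average Arabidopsis segment in both regions, which is nearer ¼ than ½.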

Many genes in the collinear regions of these genomes do not fit into putative ancestral gene positions; these are likely to be lineage-specific gene insertions or deletions. In particular, in Arabidopsis Region 2 (Fig. 3), seven consecutive genes have no homologs in the corresponding Vitis, Carica or Gossypium regions, but are found in a collinear block on Vitis chromosome 13, indicating translocation of a large fragment into this region in the Arabidopsis lineage.

2.5. The Vitis homologous region spans a larger physical distance than the corresponding regions of Gossypium and Arabidopsis

Although the V. vinifera genome is only about 55% of the size of the G. raimondii genome [1,17], the syntenic region in Vitis is much larger in physical size than the corresponding Gossypium region in both cases. Region 1 covers a Vitis genomic region of ~446 kb and a Gossypium region of 58 kb; region 2 covers a Vitis region of 290 kb and a Gossypium region of 53 kb. In both cases, the Vitis region is 5–10 times as large as the corresponding Gossypium region. Arabidopsis syntenic regions had physical sizes similar to the Gossypium regions (between 22 and 70 kb for both Regions 1 and 2). The size difference between corresponding regions of Gossypium and Vitis could be caused by expansion in the Vitis genome, by condensation in the Gossypium genome, or, very likely, by both.
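The contrast between region sizes and genome sizes can be made concrete with the numbers above. The 0.55 figure is the whole-genome Vitis/G. raimondii ratio quoted in the text, used here only as the ratio one might naively expect if region size tracked genome size.

```python
# Physical sizes (kb) of corresponding regions, as given above: (Vitis, Gossypium)
regions = {"Region 1": (446, 58), "Region 2": (290, 53)}

# Ratio expected if region size simply tracked genome size (Vitis ~55% of G. raimondii)
genome_ratio = 0.55

ratios = {name: vitis / gossypium for name, (vitis, gossypium) in regions.items()}
for name, ratio in ratios.items():
    print(f"{name}: Vitis/Gossypium = {ratio:.1f}x (genome-wide expectation {genome_ratio}x)")
```

The observed ratios are about 7.7 and 5.5, an order of magnitude above the genome-wide expectation.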

Transposons

We analyzed the distribution of transposable elements (TEs; Figs. 4 and 5) using RepBase (http://www.girinst.org/) with default parameters. TEs comprise a larger proportion of the sequence in the V. vinifera homologous regions (25% and 17% of Regions 1 and 2) than in G. raimondii (13% and 7%). Both DNA transposons and retroelements make up a larger portion of the Vitis sequences than of the Gossypium sequences. The difference in transposon content explains 30% and 18% of the size differences between the compared regions (Fig. 5A).
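A back-of-envelope check reproduces the share of the size difference attributable to TEs from the rounded figures in the text: extra TE sequence in Vitis divided by the total size difference. With rounded inputs this gives about 27% and 19%, close to the reported 30% and 18%; the small discrepancy presumably reflects rounding of the percentages and region sizes (an assumption).

```python
# Region sizes (kb) and TE fractions reported in the text: (Vitis, Gossypium)
SIZES = {"Region 1": (446, 58), "Region 2": (290, 53)}
TE_FRACTION = {"Region 1": (0.25, 0.13), "Region 2": (0.17, 0.07)}

explained = {}
for region, (v_kb, g_kb) in SIZES.items():
    v_te, g_te = TE_FRACTION[region]
    te_diff = v_kb * v_te - g_kb * g_te      # extra TE sequence in Vitis (kb)
    explained[region] = te_diff / (v_kb - g_kb)
    print(f"{region}: {te_diff:.0f} kb of {v_kb - g_kb} kb ({explained[region]:.0%})")
```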

Fig. 3. Pattern of Gossypium gene loss in Region 2. Genes that showed collinearity across genomes are represented by filled squares; genes not preserved in collinearity (putative lineage-specific insertions) are represented by hollow squares. Of the 17 putative ancestral genes in Vitis, only 9 are still identifiable in Gossypium.

Table 1
Vitis homologous regions of the sequenced Gossypium BACs.

Region    Vitis gene number   GR BAC number               Homologous position on GR BAC (approx. kb)
Region 1  Vv6g1599            GR174O23_GR109E22contig1    64
Region 1  Vv6g1600            GR174O23_GR109E22contig1    79
Region 1  Vv6g1602            GR174O23_GR109E22contig1    84
Region 1  Vv6g1615            GR174O23_GR109E22contig1    89
Region 1  Vv6g1617            GR174O23_GR109E22contig1    97
Region 1  Vv6g1624            GR174O23_GR109E22contig1    105
Region 1  Vv6g1625            GR109E22Contig2             2
Region 1  Vv6g1637            GR109E22Contig2             16
Region 2  Vv6g0801            GR109E22Contig2             24
Region 2  Vv6g0802            GR109E22Contig2             29
Region 2  Vv6g0806            GR109E22Contig2             34
Region 2  Vv6g0814            GR109E22Contig2             43
Region 2  Vv6g0817            GR109E22Contig2             49
Region 2  Vv6g0819            GR109E22Contig2             49
Region 2  Vv6g0823            GR109E22Contig2             54
Region 2  Vv6g0826            GR109E22Contig2             58
Region 2  Vv6g0829            GR109E22Contig2             77


Gene loss

In addition to the size difference explained by transposable elements, there is still a 3× to 4× difference in the size of the corresponding G. raimondii and V. vinifera sequences (Fig. 4). This variation in the physical length of syntenic regions is approximately proportional to the number of genes identified. In region 1, Vitis has 38 genes (446 kb) corresponding to 8 genes in Gossypium (15 kb + 43 kb); in region 2, Vitis has 29 genes (290 kb) corresponding to 9 genes in Gossypium (53 kb). In both cases, the size of the genomic region corresponds to the number of genes identified, i.e., gene densities are relatively constant. By plotting the positions of genes and TEs in these regions of the two genomes (Fig. 5), we found that many “extra” gene sequences in the Vitis regions are indeed retained in ancestral positions, suggesting that they may have been lost from this particular region of Gossypium during diploidization following a lineage-specific WGD. This suggests that the missing genes in Gossypium are likely to be present in paralogous (homeologous) regions that have yet to be identified and sequenced.
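The "relatively constant" gene density can be checked directly from the counts above, expressed as genes per 100 kb. Computed this way, the Gossypium and Vitis densities differ by less than twofold, whereas the region lengths themselves differ roughly five- to eightfold.

```python
# Gene counts and region sizes (kb) from the paragraph above: (genes, kb)
REGIONS = {
    "Region 1": {"Vitis": (38, 446), "Gossypium": (8, 58)},   # 58 kb = 15 kb + 43 kb
    "Region 2": {"Vitis": (29, 290), "Gossypium": (9, 53)},
}

density = {
    region: {sp: 100 * genes / kb for sp, (genes, kb) in taxa.items()}
    for region, taxa in REGIONS.items()
}
for region, d in density.items():
    print(f"{region}: " + ", ".join(f"{sp} {v:.1f} genes/100 kb" for sp, v in d.items()))
```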

2.6. A non-syntenic Gossypium BAC is enriched for repetitive DNA

GR163B08, distal to GR109E22 in the same physical BAC contig (Fig. 2), differs markedly from the other two BACs sequenced. Homology searches in GenBank showed that 8 (of 19) predicted genes on this BAC are retrotransposon-related, and the remaining 11 showed either no significant homology to known proteins or homology only to unknown proteins. No collinearity was detected with the Vitis, Carica or Arabidopsis genomes. A total of 11% of the BAC sequence is made up of transposable elements, but unlike in the other two BACs, these are almost exclusively (97%) LTR retrotransposons. The number of tandem repeats in this BAC is 3 to 8 times higher than in the other two BACs.

GR163B08 is closer to the end of the chromosome than the other sequenced BACs (Lin et al., unpublished) and may lie in or near a transition zone from gene-rich euchromatin to the subtelomeric region. Common features of subtelomeric regions include enrichment of tandem repeats and large transposable element insertions [21], consistent with the sequence composition of GR163B08.

3. Discussion

Earlier genome mapping studies suggested that diploid Gossypium might be an ancient polyploid [13]. In this study, we used two different approaches to further investigate this hypothesis. We first performed a whole genome dot plot analysis of genetic markers in Gossypium against all genes in the sequenced V. vinifera genome. Although a significant improvement over prior studies, the resolution of the dot plot is still constrained by the limited number of informative Gossypium markers. Nonetheless, in many cases one V. vinifera chromosome segment corresponded to at least two Gossypium segments, which strongly suggests at least one round of WGD in the diploid Gossypium lineage. Sequencing of one of the collinear regions revealed genome stratification in Gossypium that fits the expected pattern of duplicated gene loss after WGD events. These findings complement and reinforce earlier published findings, obtained using different methods, that Gossypium species are ancient polyploids.

Although the V. vinifera genome is smaller than that of G. raimondii, the homologous regions in V. vinifera that we analyzed are much larger than the G. raimondii regions. We argue that although TE insertions do play a role in the size differences, diploidization in the Gossypium genome could explain a larger portion of the size difference. The missing genes in the Gossypium regions studied are likely to be retained in paleo-duplicated fragments elsewhere in the Gossypium genome that we have not sampled. Therefore, although gene loss has made the Gossypium regions studied smaller than the corresponding Vitis regions, given the similar gene densities it is likely that overall genome size is not much affected by gene deletion. Earlier studies in other taxa [22] suggest that the genome is composed of two distinct components: genes densely packed in euchromatic regions, and heterochromatic regions consisting largely of repetitive DNA, which explain the majority of genome size differences. Therefore, given the similar gene densities in these genomes, variation in genome size is mostly determined by the amount of heterochromatin.

3.1. New evidence supporting a history of WGD in Gossypium

Both cytogenetic studies [15] and intragenomic comparisons of genetic marker positions, together with the use of the current gene/marker order to deduce the ancestral gene order [13,23], previously suggested that the Gossypium lineage experienced at least one WGD. Two new lines of evidence further support this hypothesis and indicate that the Gossypium WGD was subsequent to the triplication affecting most if not all dicots.

Table 2
Number of ancestral genes preserved in Gossypium and Arabidopsis in the sequenced BACs

                                            Region 1      Region 2
Size of the region in Vitis                 446 kb        290 kb
Number of orthologous Vitis genes           19            17
Size of the region in Gossypium             ~58 kb        53 kb
Number of orthologous Gossypium genes       8             9
Size of the region in Carica                ~250 kb       328 kb
Number of orthologous Carica genes          ~20           18
Size of the region in Arabidopsis           22–70 kb      23–40 kb
Number of orthologous Arabidopsis genes     6, 4, 9, 5    5, 8, 3, 6

Fig. 4. The proportion of transposable elements (A) and genes (B) in Gossypium BAC sequences and the corresponding Vitis homologous regions.

317L. Lin et al. / Genomics 97 (2011) 313–320


Compared to earlier analyses that used the software packages CrimeStat2 and FISH to detect ancient duplicated segments [14], the whole-genome dot–plots of mapped Gossypium genes presented here reveal fewer segments. However, our analysis imposed the additional requirement that genes be ordered in a collinear manner. This reduces the likelihood that corresponding segments can be explained by factors other than genome duplication.
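The collinearity requirement can be illustrated with a simple greedy chaining step. This is a minimal sketch, not ColinearScan, which the authors used and which applies statistical tests to candidate blocks. It assumes that each marker-to-Vitis match has been reduced to a hypothetical `Anchor` record. A block is extended only while the next anchor lies within 10 cM on the consensus map and 1 Mb on the Vitis chromosome (the thresholds given in Section 4.2) and continues in the same physical direction. The `min_size` of 3 anchors is an arbitrary illustrative choice.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Anchor:
    marker: str   # Gossypium marker name (hypothetical record layout)
    lg: str       # linkage group on the consensus map
    cm: float     # genetic position (cM)
    vchr: str     # Vitis chromosome of the matched gene
    vpos: int     # physical position of the matched Vitis gene (bp)


def chain_blocks(anchors, max_cm=10.0, max_bp=1_000_000, min_size=3):
    """Greedily chain anchors into blocks that are monotonic in both genetic
    and physical order and respect the gap limits."""
    blocks = []
    ordered = sorted(anchors, key=lambda a: (a.lg, a.vchr, a.cm))
    for _, group in groupby(ordered, key=lambda a: (a.lg, a.vchr)):
        current, direction = [], 0
        for a in group:
            if current:
                prev = current[-1]
                step = a.vpos - prev.vpos
                sign = (step > 0) - (step < 0)
                collinear = sign != 0 and direction in (0, sign)
                if collinear and a.cm - prev.cm <= max_cm and abs(step) <= max_bp:
                    current.append(a)
                    direction = direction or sign
                    continue
                # Gap or order violation: close the current block and start anew.
                if len(current) >= min_size:
                    blocks.append(current)
                current, direction = [], 0
            current.append(a)
        if len(current) >= min_size:
            blocks.append(current)
    return blocks
```

Because the chaining is greedy, a single out-of-order anchor splits a block. Real tools tolerate such noise, which is one reason a dedicated program was used in the study.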

Our study also includes a local-level comparative analysis that starts from a putative ancestral gene order prior to the duplication event, in a "top-down" approach [2]. Because it is based on BAC sequences, the local analysis offers a more detailed view of local regions than studies using coarsely mapped Gossypium markers alone. The fraction of retained collinear genes in all Gossypium regions studied is less than 50% but is appreciably higher than that of any one Arabidopsis segment (the latter having experienced two WGD events after its divergence from Vitis). However, across the four Arabidopsis segments (Fig. 3), a total of 17 genes are preserved in collinear locations in at least one segment, roughly double the 9 in the single G. raimondii segment.
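The comparison above uses two different summaries of the same presence/absence data:

- the fraction of inferred ancestral genes retained in each individual segment;
- the number of ancestral genes retained in at least one of several paralogous segments.

A minimal sketch of both summaries follows. The gene identifiers and segment contents are made up for illustration and are not the genes analyzed in the paper.

```python
def retention_summary(ancestral, retained_by_segment):
    """Per-segment retained fraction and the size of the union across segments.

    ancestral: iterable of inferred ancestral gene IDs (hypothetical labels).
    retained_by_segment: dict mapping segment name -> set of ancestral gene IDs
    found in collinear positions in that segment.
    """
    ancestral = set(ancestral)
    if not ancestral:
        raise ValueError("empty ancestral gene set")
    per_segment = {
        seg: len(genes & ancestral) / len(ancestral)
        for seg, genes in retained_by_segment.items()
    }
    union = set().union(*retained_by_segment.values()) & ancestral
    return per_segment, len(union)


# Hypothetical toy example: four segments descended from two rounds of WGD.
ancestral = [f"g{i}" for i in range(1, 11)]
segments = {
    "seg1": {"g1", "g4", "g7"},
    "seg2": {"g2", "g5"},
    "seg3": {"g3", "g4", "g8", "g10"},
    "seg4": {"g6", "g9"},
}
per_segment, n_union = retention_summary(ancestral, segments)
print(per_segment)
print(n_union)  # 10
```

Each toy segment keeps at most 40% of the ancestral genes, but together the four segments account for all ten. This is the same complementarity the text appeals to when comparing one Gossypium segment with four Arabidopsis segments.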

This pattern of gene loss in the sequenced G. raimondii region could be the result of:

a) individual single-gene translocation events;
b) gene loss after segmental duplication; or
c) gene loss after genome duplication.

The first scenario is unlikely because the gene content of the homologous Carica region, which has a similar divergence time from Vitis, is still very well conserved. Although our comparison here includes only four species, the correlation between the pattern of gene conservation and the number of genome duplications is quite significant. However, because of the limited sequence sampled in this study, it is difficult to distinguish the effect of genome duplication from that of segmental duplication using the gene-loss pattern of this particular region alone. Nevertheless, given the evidence from our genome-level dot–plot analysis and other published genome-level comparative mapping results, it is likely that the gene-loss pattern observed in our Gossypium BACs is representative of a genome-wide event. It is also likely that the gene losses were caused by WGD rather than by segmental duplications. In other words, the relatively small number of ancestral genes found in the Gossypium BACs is likely explained by the existence of one or more paleo-duplicated segments in the G. raimondii genome that have not yet been sequenced. Such a segment would be expected to retain collinearity with the same V. vinifera region, but with a gene set largely complementary to the one found on the sequenced BACs. This would account for the missing half of the inferred ancestral gene content.

Whether there have been one or two rounds of WGD in the Gossypium lineage is yet to be determined. Although our local-level gene-loss patterns resemble the effect of one round of WGD, we need to be cautious in our conclusions for several reasons. First, when inferring the ancestral gene repertoire, we inevitably miss genes that were lost in V. vinifera, in all Arabidopsis homologous regions, or in both. The real ancestral gene number may therefore be larger than we infer. If so, the apparent 2:1 ratio of ancestral gene number to preserved G. raimondii gene number may not be significantly different from 4:1 (indicative of two rounds of WGD). Second, Arabidopsis thaliana homologous regions on average preserve fewer duplicated genes than G. raimondii. Even so, the number of duplicated genes preserved in Gossypium is not significantly larger than the number found in the best-preserved Arabidopsis homologous region (Table 1, bold numbers) (Fig. 3). Sequencing of the whole G. raimondii genome (in progress) will provide a relatively complete list of Gossypium genes and their arrangements, clarifying the history of genome duplications in Gossypium species.
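The sensitivity of the 2:1 versus 4:1 argument to the inferred ancestral count can be shown with a deliberately simple model. This model is an assumption made for illustration and is not the authors' analysis. It assumes:

- after k rounds of WGD and complete fractionation, each ancestral gene survives in a given segment independently with probability 1/2^k;
- the number retained in one segment is therefore binomial.

The sketch compares the likelihood of the observed retention under one round (p = 1/2) and two rounds (p = 1/4). It treats the ~17 genes mentioned above as the ancestral count, and also uses a hypothetical larger count of 30 to mimic undercounting.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def one_vs_two_rounds(retained, ancestral):
    """Likelihood ratio L(one round) / L(two rounds) under the toy model in
    which each ancestral gene survives in a segment with probability 1/2**rounds."""
    l_one = binom_pmf(retained, ancestral, 0.5)
    l_two = binom_pmf(retained, ancestral, 0.25)
    return l_one / l_two


# 9 genes retained in the G. raimondii segment.
print(one_vs_two_rounds(9, 17))  # > 1: favors one round
print(one_vs_two_rounds(9, 30))  # < 1: favors two rounds if ancestral genes were missed
```

With 17 ancestral genes the ratio favors one round by roughly 20-fold. With a hypothetical 30 it reverses to favor two rounds. This illustrates why an incomplete ancestral repertoire can make the 2:1 ratio indistinguishable from 4:1.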

3.2. Effect of genome duplication on genome size

There is no obvious correlation between the number of WGDs a genome has experienced and the size of that genome. Genomes with a history of WGD vary greatly in size. For example, sorghum and rice share a similar history of WGD, yet the sorghum genome is ~72% larger (740 Mb vs 430 Mb) [24]. The Arabidopsis genome, despite a history of one triplication and two duplications, is one of the most compact genomes among higher plants.
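The "~72% larger" figure is measured relative to the smaller (rice) genome. As a quick check of the arithmetic with the sizes quoted above:

```python
sorghum_mb, rice_mb = 740, 430  # genome sizes quoted in the text [24]
pct_larger = (sorghum_mb - rice_mb) / rice_mb * 100
print(f"sorghum is {pct_larger:.0f}% larger than rice")  # sorghum is 72% larger than rice
```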

Fig. 5. Schematic view of gene and transposable elements in corresponding Gossypium and Vitis regions. Lines connecting different regions indicate syntenic genes.


The loss of duplicate genes (or "diploidization") is common after whole-genome duplication events. Over long periods of time, the diploidization process seems to restore plant genomes to a relatively stable gene number, while changing the relative abundance of some functional groups of genes. For the (albeit small) genomic regions that we studied here, gene density in homologous regions is similar in genomes with and without WGD. This is consistent with the notion that genome size variation is mostly caused by transposon accumulation in heterochromatic regions. Comparative studies between rice and sorghum showed that in genomes where the sizes of the gene space are very similar, heterochromatin alone causes huge differences in genome size [22,24]. In the regions of our study, however, fewer transposon insertions were detected in the Gossypium sequences. This might be because the Gossypium BACs selected came from a gene-rich region. Transposon insertions tend to accumulate in heterochromatic regions such as peri-centromeric or sub-telomeric regions [22,25]. In euchromatic regions, duplicated genes in one paralogous region might be removed along with neighboring sequences after WGD, making the region more compact than its unduplicated counterparts in outgroup genomes.

Many studies of genome size evolution focus on the effects of transposable elements, particularly the insertion and deletion patterns of LTR retrotransposons [25,26]. The rapid expansion of one or a few transposon families can lead to a huge increase in genome size [27–29]. A burst of transposon activity has been described in synthesized polyploids, and retrotransposons alone can cause genome size doubling even without WGD [30]. Our findings here suggest that, regardless of the number of WGDs a genome has experienced, the collective size of gene-rich regions in different genomes does not vary much after extensive gene loss. For example, the sum of the sizes of the four Arabidopsis regions homologous to the Vitis region studied is similar to the size of the Vitis region. This suggests that most WGDs have little long-term impact on the huge differences in genome size between plant species.

3.3. Advantage of using the V. vinifera genome in whole genome dot–plots

The V. vinifera genome is an excellent reference for efforts to determine the number of WGDs in eudicots. The Vitis genome has experienced no WGD since the ancient hexaploidy event shared by most if not all eudicots [1]. Its slow evolutionary rate [2] and its stable karyotype [1] may have left it closer to the ancestral gene order than most other eudicot lineages. Vitis is also a good phylogenetic outgroup for comparative analysis of many eudicot species. These attributes are very helpful in elucidating the duplication history of a new genome.

Detecting ancient WGD often requires relatively complete information, i.e. the sequences and arrangement of most genes in a genome, so that collinearity and/or synteny remain discernible after extensive gene loss, single-gene duplications and translocations. Gossypium is not yet sequenced and has only ~2000 genes genetically mapped. As a result, relatively few homologous gene pairs are available so far to distinguish paleopolyploidy from background noise [13].

We show that the scarcity of data points for inferring paleopolyploidy by intragenomic comparison can be partially mitigated. The approach combines a consensus genetic map (with markers from homoeologous chromosomes in tetraploid cotton interleaved into a consensus order) with comparison to an outgroup genome. It has two advantages:

1. The consensus genetic map approximately doubles the number of Gossypium data points available.
2. An outgroup genome such as Vitis helps to detect "ghost duplication" [31] segments that are not detectable in self-plots because one homolog has been lost.

The dot–plot analysis of the Gossypium consensus map against the V. vinifera genome in this study revealed patterns of synteny that are not detected in Gossypium–Gossypium dot–plots. This method could be generalized to the study of other genomes with well-developed genetic maps but lacking a whole-genome sequence.
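The outgroup signal ("one V. vinifera segment corresponding to at least two Gossypium segments") can be summarized as a coverage depth along each Vitis chromosome. A minimal sweep-line sketch follows. It makes the following assumptions, none of which come from the paper:

- collinear blocks have already been called;
- each block is given as a hypothetical `(vitis_chr, start, end, gossypium_segment)` tuple with half-open coordinates;
- every block represents a distinct Gossypium segment, so overlapping blocks from the same segment should be merged first.

Runs with depth of 2 or more mark candidate duplicated regions.

```python
from collections import defaultdict


def synteny_depth(blocks):
    """Return {vitis_chr: [(start, end, depth), ...]} for covered stretches.

    blocks: iterable of (vitis_chr, start, end, gossypium_segment), half-open.
    """
    events = defaultdict(list)
    for vchr, start, end, _segment in blocks:
        events[vchr].append((start, 1))
        events[vchr].append((end, -1))
    depth_map = {}
    for vchr, evs in events.items():
        # At equal positions, -1 sorts before +1, so abutting blocks do not overlap.
        evs.sort()
        runs, depth, prev = [], 0, None
        for pos, delta in evs:
            if prev is not None and pos > prev and depth > 0:
                runs.append((prev, pos, depth))
            depth += delta
            prev = pos
        depth_map[vchr] = runs
    return depth_map


def duplicated_runs(depth_map, min_depth=2):
    """Keep only stretches covered by at least min_depth Gossypium segments."""
    return {c: [r for r in runs if r[2] >= min_depth] for c, runs in depth_map.items()}
```

For example, `synteny_depth([("chr1", 0, 100, "A"), ("chr1", 50, 150, "B")])` reports depth 2 over positions 50 to 100, the stretch where two Gossypium segments share the same Vitis region.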

4. Materials and methods

4.1. Genetic map and genome sequences

The Gossypium genetic map and marker sequence data were retrieved from a previously published map [13]. Gene peptide sequences and position information for Vitis, Carica and Arabidopsis were all downloaded from the Plant Genome Duplication Database (PGDD: http://chibba.agtec.uga.edu/duplication/). Gossypium mRNA sequences were downloaded from PlantGDB (http://www.plantgdb.org/prj/ESTCluster/progress.php).

4.2. Gossypium–Vitis whole genome dot–plot

Gossypium genetic marker sequences were aligned against Vitis genes using BLASTx, with an E-value cutoff of 1e-10. The top five hits were retained in the BLAST results. The dot–plot was generated using a Python script (http://github.com/tanghaibao/quota-alignment/blob/master/scripts/blast_plot.py). ColinearScan [32] was used to detect collinear blocks. The maximum gap allowed within a syntenic block on a Vitis chromosome was set to 1 Mb, and the maximum genetic distance allowed on the consensus map was set to 10 cM.
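The hit-filtering step (E-value ≤ 1e-10, top five hits per marker) can be sketched as follows. The sketch assumes BLAST tabular output (`-outfmt 6`, with the E-value in column 11 and the bit score in column 12). The paper does not state which output format was used. Ranking hits by bit score and collapsing multiple HSPs of the same query/subject pair are also assumptions.

```python
from collections import defaultdict


def filter_blast(lines, max_evalue=1e-10, top_n=5):
    """Return {query: [subject, ...]} keeping at most top_n subjects per query.

    Assumes BLAST tabular output (-outfmt 6): column 11 is the E-value and
    column 12 the bit score. Multiple HSPs for the same query/subject pair
    are collapsed to the best bit score.
    """
    best = defaultdict(dict)  # query -> {subject: best bit score}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if bitscore > best[query].get(subject, float("-inf")):
            best[query][subject] = bitscore
    return {
        query: sorted(scores, key=scores.get, reverse=True)[:top_n]
        for query, scores in best.items()
    }
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example `filter_blast(open("markers_vs_vitis.blastx.tsv"))`. The file name is hypothetical.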

4.3. BAC sequencing

The BACs were sequenced following a shotgun protocol. Each BAC DNA sample was sheared using a HydroShear (GeneMachines) to ensure random fragmentation. Sheared DNA fragments were repaired using the End-It DNA End-Repair Kit (Epicentre, Madison, Wisconsin, USA). Fragments of ~4–5 kb were size-selected on a 1% low-melting agarose gel, and the DNA was eluted from the gel using the QIAEX II gel extraction system (Qiagen, Valencia, California, USA). DNA fragments were then ligated into the pCR-Blunt II-TOPO vector and transformed into Escherichia coli DH10B host cells by electroporation. The transformed cells were spread onto Q-plates and picked into 96-well plates by a Q-bot. Sequencing was performed on an ABI 3730-XL Sequence Analyzer using the BigDye Terminator v3.1 Cycle Sequencing Kit. Chromatograms were assembled using Phred/Phrap. The quality of sequence assemblies was checked using Sequencher v.4.1.4.

4.4. Gene and repetitive element identification from BAC sequences

Genes were identified from BAC sequences using FGENESH (http://linux1.softberry.com/berry.phtml). For Gossypium, the species parameter was set to "Dicot plants"; for Vitis, it was set to V. vinifera. Repetitive elements were identified using the RepBase repeat masking service (http://www.girinst.org/), with species set to A. thaliana.
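Fig. 4 reports the proportion of each sequence occupied by transposable elements and by genes. However the feature coordinates are obtained (the masking service's output format is not described here), that proportion reduces to the following: the length of the union of feature intervals divided by sequence length. Overlapping annotations must be merged so that no base is counted twice. A minimal sketch with hypothetical 0-based, half-open coordinates:

```python
def masked_fraction(intervals, seq_len):
    """Fraction of a sequence of length seq_len covered by the union of
    (start, end) half-open intervals; overlaps are counted once."""
    covered, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        # Clip annotations that run past either end of the sequence.
        start, end = max(0, start), min(seq_len, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / seq_len if seq_len else 0.0
```

For example, `masked_fraction([(0, 10), (5, 20), (30, 40)], 100)` merges the first two intervals into 0 to 20, adds 10 bases from the third, and returns 0.3. The same function applies to gene (exon) intervals for panel B.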

4.5. Local-level collinearity searches

Vitis peptide sequences were used as tBLASTn queries against the BAC sequences, with an E-value cutoff of 1e-20. The BLAST results were manually checked for collinearity. For Arabidopsis–Vitis synteny, multiple collinearity search and alignment were performed using MCScan [20].

4.6. Calculation of synonymous substitutions (Ks)

For homologues inferred from syntenic alignments, we aligned the protein sequences using CLUSTALW [33] and used the protein alignments to guide coding sequence alignments with PAL2NAL [34]. To calculate Ks, we used the Nei–Gojobori method implemented in the yn00 program of the PAML package [35]. A Python script was used to pipeline all the calculations and is available at http://github.com/tanghaibao/bio-pipeline/tree/master/synonymous_calculation/.
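To show what the Ks step computes, here is a simplified sketch of the Nei–Gojobori (1986) counting method with a Jukes–Cantor correction. It is not PAML's yn00 and may differ from yn00's output in edge cases. It makes the following assumptions:

- the two input sequences are already codon-aligned, as after PAL2NAL;
- gapped, ambiguous and stop codons are skipped;
- changes to stop codons count as nonsynonymous when computing sites;
- differences at codons differing in several positions are averaged over all pathways that avoid intermediate stop codons.

```python
import math
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip([x + y + z for x in BASES for y in BASES for z in BASES], AMINO_ACIDS))


def synonymous_sites(codon):
    """Sum over positions of (#synonymous single-base alternatives) / 3."""
    sites = 0.0
    for i in range(3):
        for base in "ACGT":
            if base != codon[i] and CODE[codon[:i] + base + codon[i + 1:]] == CODE[codon]:
                sites += 1 / 3
    return sites


def differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over stop-free pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total, non_total, n_paths = syn_total + syn, non_total + non, n_paths + 1
    if n_paths == 0:
        # Assumption: if every pathway passes a stop codon, count all changes as nonsynonymous.
        return 0.0, float(len(positions))
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor distance; inf when the proportion is saturated."""
    if p <= 0:
        return 0.0
    x = 1 - 4 * p / 3
    return -0.75 * math.log(x) if x > 0 else math.inf


def ng86(seq1, seq2):
    """Return (Ka, Ks) for two in-frame, codon-aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip gapped/ambiguous codons and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        raise ValueError("no comparable codons")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

For example, `ng86("CTGAAA", "CTAAAA")` contains one synonymous third-position change (CTG and CTA both encode Leu), so it returns Ka = 0 and a positive Ks.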

Supplementary materials related to this article can be found online at 10.1016/j.ygeno.2011.02.007.

References

[1] O. Jaillon, J.M. Aury, B. Noel, A. Policriti, C. Clepet, A. Casagrande, N. Choisne, S. Aubourg, N. Vitulo, C. Jubin, A. Vezzi, F. Legeai, P. Hugueney, C. Dasilva, D. Horner, E. Mica, D. Jublot, J. Poulain, C. Bruyere, A. Billault, B. Segurens, M. Gouyvenoux, E. Ugarte, F. Cattonaro, V. Anthouard, V. Vico, C. Del Fabbro, M. Alaux, G. Di Gaspero, V. Dumas, N. Felice, S. Paillard, I. Juman, M. Moroldo, S. Scalabrin, A. Canaguier, I. Le Clainche, G. Malacrida, E. Durand, G. Pesole, V. Laucou, P. Chatelet, D. Merdinoglu, M. Delledonne, M. Pezzotti, A. Lecharny, C. Scarpelli, F. Artiguenave, M.E. Pe, G. Valle, M. Morgante, M. Caboche, A.F. Adam-Blondon, J. Weissenbach, F. Quetier, P. Wincker, The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla, Nature 449 (2007) 463–467.

[2] H. Tang, J.E. Bowers, X. Wang, R. Ming, M. Alam, A.H. Paterson, Synteny and collinearity in plant genomes, Science 320 (2008) 486–488.

[3] J.E. Bowers, B.A. Chapman, J. Rong, A.H. Paterson, Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events, Nature 422 (2003) 433–438.

[4] E. Stokstad, Genomics. Poplar tree sequence yields genome double take, Science 313 (2006) 1556.

[5] J. Schmutz, S.B. Cannon, J. Schlueter, J. Ma, T. Mitros, W. Nelson, D.L. Hyten, Q. Song, J.J. Thelen, J. Cheng, D. Xu, U. Hellsten, G.D. May, Y. Yu, T. Sakurai, T. Umezawa, M.K. Bhattacharyya, D. Sandhu, B. Valliyodan, E. Lindquist, M. Peto, D. Grant, S. Shu, D. Goodstein, K. Barry, M. Futrell-Griggs, B. Abernathy, J. Du, Z. Tian, L. Zhu, N. Gill, T. Joshi, M. Libault, A. Sethuraman, X.C. Zhang, K. Shinozaki, H.T. Nguyen, R.A. Wing, P. Cregan, J. Specht, J. Grimwood, D. Rokhsar, G. Stacey, R.C. Shoemaker, S.A. Jackson, Genome sequence of the palaeopolyploid soybean, Nature 463 (2010) 178–183.

[6] R. Ming, S. Hou, Y. Feng, Q. Yu, A. Dionne-Laporte, J.H. Saw, P. Senin, W. Wang, B.V. Ly, K.L. Lewis, S.L. Salzberg, L. Feng, M.R. Jones, R.L. Skelton, J.E. Murray, C. Chen, W. Qian, J. Shen, P. Du, M. Eustice, E. Tong, H. Tang, E. Lyons, R.E. Paull, T.P. Michael, K. Wall, D.W. Rice, H. Albert, M.L. Wang, Y.J. Zhu, M. Schatz, N. Nagarajan, R.A. Acob, P. Guan, A. Blas, C.M. Wai, C.M. Ackerman, Y. Ren, C. Liu, J. Wang, J.K. Na, E.V. Shakirov, B. Haas, J. Thimmapuram, D. Nelson, X. Wang, J.E. Bowers, A.R. Gschwend, A.L. Delcher, R. Singh, J.Y. Suzuki, S. Tripathi, K. Neupane, H. Wei, B. Irikura, M. Paidi, N. Jiang, W. Zhang, G. Presting, A. Windsor, R. Navajas-Perez, M.J. Torres, F.A. Feltus, B. Porter, Y. Li, A.M. Burroughs, M.C. Luo, L. Liu, D.A. Christopher, S.M. Mount, P.H. Moore, T. Sugimura, J. Jiang, M.A. Schuler, V. Friedman, T. Mitchell-Olds, D.E. Shippen, C.W. dePamphilis, J.D. Palmer, M. Freeling, A.H. Paterson, D. Gonsalves, L. Wang, M. Alam, The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus), Nature 452 (2008) 991–996.

[7] M. Semon, K.H. Wolfe, Consequences of genome duplication, Curr. Opin. Genet. Dev. 17 (2007) 505–512.

[8] L.N. Lukens, J.C. Pires, E. Leon, R. Vogelzang, L. Oslach, T. Osborn, Patterns of sequence loss and cytosine methylation within a population of newly resynthesized Brassica napus allopolyploids, Plant Physiol. 140 (2006) 336–348.

[9] J.H. Postlethwait, Y.L. Yan, A. Amores, B. Cresko, A. Singer, D. Rubin, Consequences of genome duplication for the evolution of developmental mechanisms in teleost fish, Integr. Comp. Biol. 45 (2005) 1058.

[10] M. Lynch, J.S. Conery, The evolutionary fate and consequences of duplicate genes, Science 290 (2000) 1151–1155.

[11] L. Lin, G. Pierce, J. Bowers, J. Estill, R. Compton, L. Rainville, C. Kim, C. Lemke, J. Rong, H. Tang, X. Wang, M. Braidotti, A. Chen, K. Chicola, K. Collura, E. Epps, W. Golser, C. Grover, J. Ingles, S. Karunakaran, D. Kudrna, J. Olive, N. Tabassum, E. Um, M. Wissotski, Y. Yu, A. Zuccolo, M. ur Rahman, D. Peterson, R. Wing, J. Wendel, A. Paterson, A draft physical map of a D-genome cotton species (Gossypium raimondii), BMC Genomics 11 (2010) 395.

[12] J. Rong, A. Paterson, Comparative Genomics of Cotton and Arabidopsis, in: A. Paterson (Ed.), Genetics and Genomics of Cotton, Springer, New York, 2009.

[13] J. Rong, C. Abbey, J.E. Bowers, C.L. Brubaker, C. Chang, P.W. Chee, T.A. Delmonte, X. Ding, J.J. Garza, B.S. Marler, C.H. Park, G.J. Pierce, K.M. Rainey, V.K. Rastogi, S.R. Schulze, N.L. Trolinder, J.F. Wendel, T.A. Wilkins, T.D. Williams-Coplin, R.A. Wing, R.J. Wright, X. Zhao, L. Zhu, A.H. Paterson, A 3347-locus genetic recombination map of sequence-tagged sites reveals features of genome organization, transmission and evolution of cotton (Gossypium), Genetics 166 (2004) 389–417.

[14] J. Rong, J.E. Bowers, S.R. Schulze, V.N. Waghmare, C.J. Rogers, G.J. Pierce, H. Zhang, J.C. Estill, A.H. Paterson, Comparative genomics of Gossypium and Arabidopsis: unraveling the consequences of both ancient and recent polyploidy, Genome Res. 15 (2005) 1198–1210.

[15] O.V. Muravenko, A.R. Fedotov, E.O. Punina, L.I. Fedorova, V.G. Grif, A.V. Zelenin, Comparison of chromosome BrdU–Hoechst–Giemsa banding patterns of the A(1) and (AD)(2) genomes of cotton, Genome 41 (1998) 616–625.

[16] H.M. Ku, T. Vision, J.P. Liu, S.D. Tanksley, Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny, Proc. Natl Acad. Sci. USA 97 (2000) 9121–9126.

[17] B. Hendrix, J.M. Stewart, Estimation of the nuclear DNA content of Gossypium species, Ann. Bot. (Lond) 95 (2005) 789–797.

[18] C. Soderlund, I. Longden, R. Mott, FPC: a system for building contigs from restriction fingerprinted clones, Comput. Appl. Biosci. 13 (1997) 523–535.

[19] S.A. Smith, M.J. Donoghue, Rates of molecular evolution are linked to life history in flowering plants, Science 322 (2008) 86–89.

[20] H. Tang, X. Wang, J.E. Bowers, R. Ming, M. Alam, A.H. Paterson, Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps, Genome Res. 18 (2008) 1944–1954.

[21] H.F. Kuo, K.M. Olsen, E.J. Richards, Natural variation in a subtelomeric region of Arabidopsis: implications for the genomic dynamics of a chromosome end, Genetics 173 (2006) 401–417.

[22] J.E. Bowers, M.A. Arias, R. Asher, J.A. Avise, R.T. Ball, G.A. Brewer, R.W. Buss, A.H. Chen, T.M. Edwards, J.C. Estill, H.E. Exum, V.H. Goff, K.L. Herrick, C.L. Steele, S. Karunakaran, G.K. Lafayette, C. Lemke, B.S. Marler, S.L. Masters, J.M. McMillan, L.K. Nelson, G.A. Newsome, C.C. Nwakanma, R.N. Odeh, C.A. Phelps, E.A. Rarick, C.J. Rogers, S.P. Ryan, K.A. Slaughter, C.A. Soderlund, H. Tang, R.A. Wing, A.H. Paterson, Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses, Proc. Natl Acad. Sci. USA 102 (2005) 13206–13211.

[23] A. Desai, P.W. Chee, J. Rong, O.L. May, A.H. Paterson, Chromosome structural changes in diploid and tetraploid A genomes of Gossypium, Genome 49 (2006) 336–345.

[24] A.H. Paterson, J.E. Bowers, R. Bruggmann, I. Dubchak, J. Grimwood, H. Gundlach, G. Haberer, U. Hellsten, T. Mitros, A. Poliakov, J. Schmutz, M. Spannagl, H. Tang, X. Wang, T. Wicker, A.K. Bharti, J. Chapman, F.A. Feltus, U. Gowik, I.V. Grigoriev, E. Lyons, C.A. Maher, M. Martis, A. Narechania, R.P. Otillar, B.W. Penning, A.A. Salamov, Y. Wang, L. Zhang, N.C. Carpita, M. Freeling, A.R. Gingle, C.T. Hash, B. Keller, P. Klein, S. Kresovich, M.C. McCann, R. Ming, D.G. Peterson, D. Mehboob ur, D. Ware, P. Westhoff, K.F.X. Mayer, J. Messing, D.S. Rokhsar, The Sorghum bicolor genome and the diversification of grasses, Nature 457 (2009) 551–556.

[25] J.L. Bennetzen, J.X. Ma, K. Devos, Mechanisms of recent genome size variation in flowering plants, Ann. Bot. 95 (2005) 127–132.

[26] J.L. Bennetzen, Mechanisms and rates of genome expansion and contraction in flowering plants, Genetica 115 (2002) 29–36.

[27] X. Zhao, R.A. Wing, A.H. Paterson, Cloning and characterization of the majority of repetitive DNA in cotton (Gossypium L.), Genome 38 (1995) 1177–1188.

[28] J.S. Hawkins, H. Kim, J.D. Nason, R.A. Wing, J.F. Wendel, Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium, Genome Res. 16 (2006) 1252–1261.

[29] X.P. Zhao, Y. Si, R.E. Hanson, C.F. Crane, H.J. Price, D.M. Stelly, J.F. Wendel, A.H. Paterson, Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton, Genome Res. 8 (1998) 479–492.

[30] B. Piegu, R. Guyot, N. Picault, A. Roulin, A. Saniyal, H. Kim, K. Collura, D.S. Brar, S. Jackson, R.A. Wing, O. Panaud, Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice, Genome Res. 16 (2006) 1262–1269.

[31] C. Simillion, K. Vandepoele, M.C.E. Van Montagu, M. Zabeau, Y. Van de Peer, The hidden duplication past of Arabidopsis thaliana, Proc. Natl Acad. Sci. USA 99 (2002) 13627–13632.

[32] X.Y. Wang, X.L. Shi, Z. Li, Q.H. Zhu, L. Kong, W. Tang, S. Ge, J.C. Luo, Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice, BMC Bioinform. 7 (2006).

[33] M.A. Larkin, G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F. Valentin, I.M. Wallace, A. Wilm, R. Lopez, J.D. Thompson, T.J. Gibson, D.G. Higgins, Clustal W and Clustal X version 2.0, Bioinformatics 23 (2007) 2947–2948.

[34] M. Suyama, D. Torrents, P. Bork, PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments, Nucleic Acids Res. 34 (2006) W609–W612.

[35] Z. Yang, PAML 4: phylogenetic analysis by maximum likelihood, Mol. Biol. Evol. 24 (2007) 1586–1591.
