RESOURCE
The Chimonanthus salicifolius genome provides insight into magnoliid evolution and flavonoid biosynthesis
Qundan Lv1,†, Jie Qiu2,†, Jie Liu2,†, Zheng Li3, Wenting Zhang2, Qin Wang2, Jie Fang1, Junjie Pan1, Zhengdao Chen1, Wenliang Cheng1, Michael S. Barker3, Xuehui Huang2, Xin Wei2,* and Kejun Cheng1,*
1Chemical Biology Center, Lishui Institute of Agriculture and Forestry Sciences, Lishui, China,
2Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China, and
3Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, USA
Received 4 November 2019; revised 25 May 2020; accepted 2 June 2020.
resolution of APG III and IV, which placed magnoliids as sister to a clade containing both monocots and eudicots. However, it is consistent with a previous analysis of 59 low-copy nuclear genes in 26 Mesangiospermae (Zeng et al., 2014) and with a phylogeny constructed from orthologous low-copy nuclear genes in 115 plant species (Zhang et al., 2020). It is also supported by the phylogenomic framework constructed from 410 single-copy nuclear gene families extracted from genome and transcriptome data for 1153 species (One Thousand Plant Transcriptomes Initiative, 2019).
Ancient whole-genome duplications in the Chimonanthus genome
In this study, we inferred and placed two rounds of ancient WGD in the genome of C. salicifolius by combining Ks plots and ortholog divergences, synteny analyses and the MAPS phylogenomic approach. We show evidence for an ancient polyploidy event found only in the Chimonanthus genome and not shared with Cinnamomum and Liriodendron. This WGD was not inferred in the 1KP project (One Thousand Plant Transcriptomes Initiative, 2019), likely because two highly overlapping WGD peaks are difficult to resolve with mixture models fitted to duplicate gene age distributions. Based on the similarity of the Ks distributions of Idiospermum australiense and Calycanthus floridus from the 1KP study, this Chimonanthus WGD is likely shared by the Calycanthaceae. We also show evidence consistent with an ancient WGD shared among Laurales and Magnoliales. A previous study showed that the ancient WGD inferred in the Liriodendron genome likely predated the divergence of Magnoliaceae and Lauraceae (Chen et al., 2019). Based on the genome of Cinnamomum and the transcriptomes of 17 Laurales and Magnoliales species from the 1KP project, previous studies inferred an ancient polyploidy event shared by Lauraceae and another round of ancient WGD in the common ancestor of Laurales and Magnoliales (Chaw et al., 2019; One Thousand Plant Transcriptomes Initiative, 2019). Consistent with these studies, our MAPS phylogenomic analysis provides further evidence for placing this ancient WGD in the ancestor shared by Laurales and Magnoliales. Overall, our ancient WGD analyses are largely consistent with previous findings and provide clear evidence for two rounds of ancient WGD in Chimonanthus.
The magnoliid genomes contain a large number of long genes
In total, 2737 long genes were identified in the C. salicifolius genome, many more than in monocot and eudicot genomes. Long genes with long introns (> 10 kb) have also been detected in animals such as humans, Rattus norvegicus, Danio rerio and Drosophila. The human genome contains more than 8000 introns longer than 24 kb and more than 1200 super-long introns (> 100 kb) (Shepard et al., 2009). Previous research on the long introns of Drosophila revealed that some of them undergo recursive splicing (Hatton et al., 1998; Conklin et al., 2005; Sibley et al., 2015). Mutations in recursive splicing sites result in many human diseases (Chabot and Shkreta, 2016). Recursive splicing is difficult to capture and requires nascent RNA sequencing, which profiles pre-mRNA transcripts shortly after they are transcribed (Pai et al., 2018). With data from purpose-designed transcriptomic experiments, further characteristics of the long genes in magnoliid genomes (such as recursive splicing and other mechanisms) could be explored in future.
The Chimonanthus salicifolius genome benefits functional genomics research and molecular breeding of Chimonanthus salicifolius
The genus Chimonanthus is widely grown in Asia, America and Europe. Chimonanthus salicifolius is distributed mainly in central and eastern China. It is collected and used as a traditional medicine. Plants of this species show vigorous growth, tolerance to several abiotic and biotic stresses, and flowering at low temperatures. Despite its importance, C. salicifolius remains underutilized, and basic research on it is lacking.
With a high-quality reference genome, genome-wide association studies (GWAS) and genome-wide linkage mapping could be performed to quickly and comprehensively identify quantitative trait loci (QTLs) related to the yield (yield of leaf buds) and quality (content of flavones and other bioactive secondary metabolites) of C. salicifolius. Using gene annotation and gene expression information, candidate genes in the QTL regions could then be identified. Genome-editing and genetic complementation experiments, which will also benefit from the gene sequences provided by this genome, could be carried out to validate the candidate genes. These genes can be further utilized in molecular breeding of high-yield, superior-quality C. salicifolius cultivars.
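The candidate-gene step can be sketched as a simple interval-overlap query against the genome annotation. This is a minimal illustration under stated assumptions, not a procedure used in this study: the `Gene` records, chromosome names ("Chr01", "Chr02"), gene IDs and QTL coordinates are all hypothetical, and real coordinates would be parsed from the annotation files.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """A gene model with 1-based inclusive coordinates (hypothetical example records)."""
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_in_qtl(genes, chrom, qtl_start, qtl_end):
    """Return IDs of genes on `chrom` that overlap the inclusive interval [qtl_start, qtl_end]."""
    return [
        g.gene_id
        for g in genes
        if g.chrom == chrom and g.start <= qtl_end and g.end >= qtl_start
    ]


# Hypothetical annotation records; real ones would come from the genome annotation.
genes = [
    Gene("geneA", "Chr01", 1_000, 5_000),
    Gene("geneB", "Chr01", 48_000, 52_000),
    Gene("geneC", "Chr01", 90_000, 95_000),
    Gene("geneD", "Chr02", 50_000, 60_000),
]

candidates = genes_in_qtl(genes, "Chr01", 40_000, 91_000)
print(candidates)
```

Because overlap is tested on inclusive coordinates, a gene that only partly extends into the QTL interval (geneC here) is still kept as a candidate; expression data would then be used to prioritize among such genes.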
Thus, an accurate reference genome of C. salicifolius will provide a platform for elucidating the genomic evolution of the genus Chimonanthus and for understanding the genes responsible for the biosynthesis of the various flavonoids made in C. salicifolius, as well as laying a foundation for the molecular breeding of C. salicifolius.
EXPERIMENTAL PROCEDURES
Plant materials, genomic DNA extraction and sequencing
The C. salicifolius individual used for genome sequencing was originally collected from Liandu District (28°27′53″N, 119°55′31″E), Lishui City, Zhejiang Province, Eastern China, and preserved in the Lishui Institute of Agriculture and Forestry Sciences. The RNA samples were collected from a wild population of C. salicifolius in its natural environment in Kaihua County (29°14′26″N, 118°27′57″E), Zhejiang Province, Eastern China.
Genomic DNA was extracted from young leaves of C. salicifolius plants using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the user manual. Further treatment and preparation of the genomic DNA for Illumina sequencing followed the description in Wei et al. (2016). PacBio SMRTbell libraries (20 kb inserts) were prepared with a Template Prep Kit (Pacific Biosciences, Menlo Park, CA, USA), and 12 SMRT cells were run on the PacBio Sequel system with P6-C4 chemistry (Chin et al., 2013).
RNA extraction and sequencing
Tissues of roots, stems, leaf buds and seeds, as well as flowers and leaves at three developmental stages, were collected from three individuals, and total RNA was extracted from each sample using the RNeasy Plant Mini Kit (Qiagen) according to the user manual. The cDNA was synthesized from 20 μg total RNA using ReverTra Ace (TOYOBO, Osaka, Japan) with oligo(dT) primer following the manufacturer’s protocol. High-throughput sequencing was then performed on the Illumina HiSeq X Ten platform.
Genome size estimation
Flow cytometry was used to determine the nuclear DNA content of C. salicifolius as described by Doležel et al. (2007). Samples were prepared by homogenizing young leaves of C. salicifolius and O. sativa ssp. japonica cv. Nipponbare (as an internal standard, 0.91 pg/2C; Ammiraju et al., 2006) on ice in Galbraith’s buffer (supplemented with 5 mM sodium metabisulfite and 5 μl β-mercaptoethanol) with 50 μg ml⁻¹ propidium iodide, and then analyzed on a MoFlo XDP Cell Sorter (excitation 488 nm, emission 620 nm; Beckman Coulter, Hialeah, FL, USA) after filtration. The data were further analyzed with FlowJo_V10.4.0 software. The nuclear DNA content of C. salicifolius was estimated as follows, with 1 pg of DNA assumed to be equivalent to 9.78 × 10⁸ bp: Sample 1C value = Reference 1C value × (sample 2C mean peak position / reference 2C mean peak position). Genome size estimation based on Illumina short reads was conducted via a 17-bp k-mer frequency analysis with ‘kmerfreq’ as implemented in SOAPdenovo2 (Luo et al., 2012).
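Both genome-size estimates reduce to short calculations, sketched below. The flow-cytometry function applies the formula and bp-per-pg factor stated above; the peak positions are hypothetical. The k-mer part is a simplified illustration of the general "total k-mers divided by main-peak depth" idea on simulated error-free reads; it is not the kmerfreq implementation, it counts k-mers on one strand only (real counters usually merge reverse complements), and it ignores heterozygosity and repeats.

```python
import random
from collections import Counter

BP_PER_PG = 9.78e8  # conversion stated in the text: 1 pg DNA = 9.78 × 10^8 bp


def flow_cytometry_1c(ref_1c_pg, sample_peak, ref_peak):
    """Sample 1C (pg) = reference 1C × sample 2C mean peak / reference 2C mean peak."""
    return ref_1c_pg * sample_peak / ref_peak


def kmer_histogram(reads, k):
    """Count every k-mer in the reads; return {depth: number of distinct k-mers at that depth}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def kmer_genome_size(hist, min_depth=2):
    """Genome size ≈ total k-mers / depth of the main peak (depths below min_depth are skipped)."""
    total = sum(depth * n for depth, n in hist.items())
    peak = max((d for d in hist if d >= min_depth), key=lambda d: hist[d])
    return total / peak, peak


# Flow cytometry: rice reference 0.91 pg/2C, i.e. 0.455 pg/1C; peak positions are hypothetical.
sample_1c = flow_cytometry_1c(0.91 / 2, sample_peak=200.0, ref_peak=100.0)
print(f"{sample_1c:.3f} pg, about {sample_1c * BP_PER_PG / 1e6:.0f} Mb")

# k-mer survey on a simulated 5-kb genome tiled by error-free 100-bp reads every 5 bp.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
reads = [genome[s:s + 100] for s in range(0, len(genome) - 100 + 1, 5)]
size, peak = kmer_genome_size(kmer_histogram(reads, 17))
print(peak, round(size))
```

The k-mer estimate (about 4.85 kb) falls slightly below the true 5 kb because k-mers near the genome ends are under-covered in this toy setup.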
De novo assembly and genome evaluation
De novo assembly of C. salicifolius was performed using Falcon v1.87 (Chin et al., 2016) software. After base error correction, overlap graphs were built, and consensus contigs were constructed based on raw PacBio long reads. Contig sequences were aligned against each other to remove redundant sequences with more than 85% similarity and overlap. The Illumina data were aligned with the assembly contigs by bwa (Li and Durbin, 2009), and SNP and indel errors were corrected using Pilon v1.22 (Walker et al., 2014).
The contigs were scaffolded by FragScaff v140324.1 (Adey et al., 2014) using 10x Genomics data. Based on Hi-C data, scaffolds were anchored to 11 pseudomolecules using LACHESIS software (Burton et al., 2013). The completeness of the assembled genome was evaluated by BUSCO v3 using the ‘embryophyta_odb9’ database (Simão et al., 2015).
Repeat and gene annotation
We constructed a C. salicifolius genome repeat library using RepeatModeler v1.0.11 with the default parameters (Chen, 2004). The constructed C. salicifolius repeat library was further used to run RepeatMasker v4.0.7 (Chen, 2004) for whole-genome repeat annotation.
A combination of ab initio gene prediction, protein homology evidence and transcriptomic evidence was used to predict protein-coding genes. AUGUSTUS v3.0.3 (Stanke and Waack, 2003), SNAP v5.0 (Leskovec and Sosic, 2016) and GeneMark-ET v4.212 (Lomsadze et al., 2014) were used for ab initio gene prediction. The protein sequences of Arabidopsis were aligned to the assembled C. salicifolius genome with Exonerate (Slater and Birney, 2005) to obtain evidence for gene structures. The open reading frames (ORFs) in the transcripts from the RNA-seq data were predicted by PASA v2.0.1 (Haas et al., 2003). Finally, all predictions were combined into consensus gene models using EVM (Haas et al., 2008).
The predicted C. salicifolius gene models were aligned against the Swiss-Prot and NR protein databases for functional annotation (BLASTP, E-value ≤ 1E-5). InterProScan v5 (Zdobnov and Apweiler, 2001) was then applied to predict protein domains and GO terms for each gene model with the setting ‘-appl PfamA -goterms -pa’. Non-coding RNAs were predicted by the Infernal program using the default parameters (Nawrocki and Eddy, 2013).
Phylogenetic analysis and estimation of divergence time
A total of 17 plant species, including four magnoliids (C. salicifolius, P. americana, C. kanehirae, L. chinense), three monocots (O. sativa, Z. mays, M. acuminata), eight eudicots (A. thaliana, P. trichocarpa, P. mume, V. vinifera, D. carota, M. guttatus, A. coerulea, N. nucifera), A. trichopoda and S. moellendorffii, were selected for building the phylogenetic tree. Except for N. nucifera, all genomes were downloaded from the JGI FTP site (ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v12.0/). Paralogs and orthologs among the 17 species were identified using the OrthoFinder pipeline with the parameters ‘-M msa -oa’ (Emms and Kelly, 2015), and the protein sequences of the 103 single-copy genes identified were used for phylogenetic tree construction. RAxML v8 (Stamatakis, 2014) was used for tree construction, with the parameters ‘-m PROTGAMMAAUTO --auto-prot=bic’ to automatically select the best protein model, and 100 bootstrap replicates were performed. The phylogenetic tree was visualized using MEGA v5 (Tamura et al., 2011). In addition, ASTRAL-III v5.7.3 (Zhang et al., 2018) was applied to infer the coalescent-based species tree from 1420 gene trees (Figure S6).
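Selecting single-copy genes amounts to keeping orthogroups with exactly one gene in every species. The sketch below shows that filter on an in-memory table; the orthogroup IDs, species labels and counts are hypothetical, and the text does not state whether any filtering beyond single-copy status was applied to obtain the 103 genes.

```python
def single_copy_orthogroups(gene_counts, species):
    """Return orthogroup IDs that contain exactly one gene in every listed species.

    `gene_counts` maps orthogroup ID -> {species label: gene count}; absent species count as 0.
    """
    return sorted(
        og
        for og, counts in gene_counts.items()
        if all(counts.get(sp, 0) == 1 for sp in species)
    )


# Hypothetical counts for three of the 17 species; real counts would come from OrthoFinder.
species = ["C_salicifolius", "A_trichopoda", "O_sativa"]
gene_counts = {
    "OG0000001": {"C_salicifolius": 1, "A_trichopoda": 1, "O_sativa": 1},
    "OG0000002": {"C_salicifolius": 2, "A_trichopoda": 1, "O_sativa": 1},  # duplicated: excluded
    "OG0000003": {"C_salicifolius": 1, "O_sativa": 1},  # missing a species: excluded
    "OG0000004": {"C_salicifolius": 1, "A_trichopoda": 1, "O_sativa": 1},
}
print(single_copy_orthogroups(gene_counts, species))
```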
Estimation of divergence and ancient whole-genome duplications
DupPipe analyses of ancient whole-genome duplications. For each genome, we used the DupPipe pipeline to construct gene families and estimate the age distribution of gene duplications (Barker et al., 2008, 2010). We translated DNA sequences and identified ORFs by comparing the Genewise (Birney et al., 2004) alignment to the best-hit protein from a collection of proteins from 25 plant genomes from Phytozome (Goodstein et al., 2012). For all DupPipe runs, we used protein-guided DNA alignments to align our nucleic acid sequences while maintaining the ORFs. We estimated synonymous divergence (Ks) using PAML with the F3X4 model (Yang, 2007) for each node in the gene family phylogenies. We then used mixture modeling to identify significant peaks consistent with a potential WGD and to estimate their median paralog Ks values. Significant peaks were identified using a likelihood ratio test in the boot.comp function of the package mixtools in R (Benaglia et al., 2009).
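The mixture-modeling step can be illustrated with a two-component normal mixture fitted to a Ks distribution by expectation-maximization (EM). This is a minimal Python sketch, not the mixtools code: the Ks values are simulated with two hypothetical peaks, the fit reports component means rather than the median paralog Ks used in the text, and only the likelihood-ratio statistic for one versus two components is computed. boot.comp additionally calibrates that statistic by parametric bootstrap, which is omitted here.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean `mu` and standard deviation `sd`."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def log_likelihood(xs, weights, means, sds):
    """Log-likelihood of the data under a normal mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)))
        for x in xs
    )


def fit_two_normals(xs, n_iter=200):
    """Fit a two-component normal mixture to 1-D data by expectation-maximization."""
    data = sorted(xs)
    n = len(data)
    means = [data[n // 4], data[3 * n // 4]]  # start at the lower and upper quartiles
    sds = [(data[-1] - data[0]) / 4] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each Ks value.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, sds


# Simulated paralog Ks values with two hypothetical WGD peaks (not data from this study).
rng = random.Random(0)
ks = [rng.gauss(0.35, 0.06) for _ in range(300)] + [rng.gauss(0.9, 0.12) for _ in range(200)]

weights, means, sds = fit_two_normals(ks)
ll2 = log_likelihood(ks, weights, means, sds)
ll1 = log_likelihood(ks, [1.0], [statistics.fmean(ks)], [statistics.pstdev(ks)])
print("component means:", [round(m, 2) for m in means])
print("likelihood-ratio statistic:", round(2 * (ll2 - ll1), 1))
```

When two WGD peaks overlap heavily, the two fitted components tend to merge, which illustrates why closely spaced events can be missed by this kind of analysis.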
Estimating orthologous divergence. To place putative WGDs in relation to lineage divergence, we estimated the synonymous divergence of orthologs among pairs of species that may share a WGD based on their phylogenetic position and evidence from the within-species Ks plots. We used the RBH Orthologue pipeline (Barker et al., 2010) to estimate the mean and median synonymous divergence of orthologs, and compared those with the synonymous divergence of inferred paleopolyploid peaks. We identified orthologs as reciprocal best BLAST hits in pairs of transcriptomes. Using protein-guided DNA alignments, we estimated the pairwise synonymous divergence for each pair of orthologs using PAML with the F3X4 model (Yang, 2007).
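The reciprocal-best-hit rule can be sketched in a few lines: a pair (a, b) is kept only if b is the top-scoring hit of a in species B and a is the top-scoring hit of b in species A. The sketch assumes hits are already summarized as (query, subject, bitscore) tuples; the gene IDs and scores are hypothetical, and ties are broken by keeping the first hit seen, which the RBH Orthologue pipeline may handle differently.

```python
def best_hits(hits):
    """Map each query to its highest-bitscore subject; `hits` holds (query, subject, bitscore)."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(hits_ab, hits_ba):
    """Return sorted (a, b) pairs where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(hits_ab)
    ba = best_hits(hits_ba)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical BLAST results between species A (a*) and species B (b*).
hits_ab = [("a1", "b1", 250.0), ("a1", "b2", 120.0), ("a2", "b2", 300.0), ("a3", "b2", 310.0)]
hits_ba = [("b1", "a1", 245.0), ("b2", "a3", 305.0), ("b2", "a2", 290.0)]
print(reciprocal_best_hits(hits_ab, hits_ba))
```

Here a2 is dropped: its best hit is b2, but b2's best hit is a3, so the relationship is not reciprocal.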
Synteny analyses and dating of ancient whole-genome duplications and orthology divergence. The genomic collinearity blocks for intra- and interspecies comparisons of magnoliids were identified with the MCscan program (Tang et al., 2008). We performed all-against-all LAST (Kielbasa et al., 2011) searches and chained the LAST hits with a distance cutoff of 10 genes, requiring at least 5 gene pairs per synteny block. The syntenic ‘depth’ function implemented in MCscan was applied to estimate the duplication history of each genome. The genomic synteny was visualized with the Python version of MCscan (Tang et al., 2008) and Circos (Krzywinski et al., 2009). The ages of ancient WGDs and orthology divergences were estimated using the formula T = Ks/(2R), where Ks is the number of synonymous substitutions per site and R (3.02 × 10⁻⁹) is the synonymous substitution rate for magnoliids estimated by Cui et al. (2006). The divergence times for A. trichopoda – O. sativa and O. sativa – magnoliids were based on TimeTree (Kumar et al., 2017).
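Two calculations from this paragraph are sketched below. `ks_to_mya` applies T = Ks/(2R) with the stated magnoliid rate; the factor 2 reflects that synonymous changes accumulate independently in both gene copies (or both lineages) after the split. `chain_anchors` is a deliberately simplified greedy grouping of gene-index anchor pairs that only illustrates how the 10-gene distance cutoff and the 5-pair minimum act; the actual MCscan chaining algorithm differs, and the anchor coordinates are hypothetical.

```python
R_MAGNOLIID = 3.02e-9  # synonymous substitutions per site per year (Cui et al., 2006, as cited)


def ks_to_mya(ks, rate=R_MAGNOLIID):
    """Convert a Ks value to an age in million years using T = Ks / (2R)."""
    return ks / (2 * rate) / 1e6


def chain_anchors(anchors, max_gap=10, min_pairs=5):
    """Greedily group (gene index in genome A, gene index in genome B) pairs into blocks.

    A pair extends the current block when it lies within `max_gap` genes of the block's
    last pair on both sides; blocks with fewer than `min_pairs` pairs are discarded.
    """
    blocks, current = [], []
    for i, j in sorted(anchors):
        if current and i - current[-1][0] <= max_gap and abs(j - current[-1][1]) <= max_gap:
            current.append((i, j))
        else:
            if len(current) >= min_pairs:
                blocks.append(current)
            current = [(i, j)]
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


print(f"Ks 0.6 corresponds to {ks_to_mya(0.6):.1f} Mya")

# Hypothetical anchors: six collinear pairs, then three inverted pairs (too few to keep).
anchors = [(i, i + 100) for i in range(6)] + [(30 + i, 500 - i) for i in range(3)]
print(chain_anchors(anchors))
```

Using `abs()` on the gene-index gap lets inverted (reverse-order) segments form blocks as well.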
MAPS analyses of whole-genome duplications from genomes of multiple species. To determine the WGD nodes across the magnoliid phylogeny, the MAPS tool (Li et al., 2015, 2018) was applied. Six species were selected: the four magnoliids (P. americana, C. kanehirae, C. salicifolius, L. chinense), plus one monocot species (O. sativa) and A. trichopoda as outgroups. Orthologous groups for the six species were obtained from OrthoFinder (Emms and Kelly, 2015). We retained gene families with at most 20 genes, yielding a total of 8437 gene families. The phylogenetic trees for these 8437 gene families, constructed with FastTree (Price et al., 2009), were analyzed with the MAPS program. Both null and positive simulations of the background gene birth and death rates were performed for comparison with the observed number of duplications at each node.
For null simulations, we estimated the gene birth rate (λ) and death rate (μ) for the six selected species with WGDgc (Rabier et al., 2014). Gene count data of each gene family for the six species were obtained from OrthoFinder (Emms and Kelly, 2015). The estimated parameters (λ = 1.355; μ = 0.050) were configured in the MAPS program, and gene trees were then simulated within the species tree using the GuestTreeGen program from GenPhyloData (Sjöstrand et al., 2013). Following the settings of the 1KP project (One Thousand Plant Transcriptomes Initiative, 2019; Li and Barker, 2020), we simulated 3000 gene trees for each species tree with at least one tip per species: 1000 gene trees at the estimated λ and μ, 1000 gene trees at half of the estimated λ and μ, and 1000 gene trees at three times λ and μ. We then randomly resampled 1000 trees without replacement from the total pool of gene trees 100 times to provide a measure of uncertainty on the percentage of subtrees at each node. A Fisher’s exact test was used to identify locations with significant increases in gene duplication compared with the null simulation.
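The resampling and testing steps can be sketched as follows. This illustration assumes each simulated gene tree has already been reduced to a single yes/no flag for a duplication at the node of interest (a hypothetical summary, not MAPS output), and it uses a one-sided Fisher's exact test on a 2x2 table of observed versus null counts; the text does not state the sidedness or exact table layout used, so both are assumptions, as are all counts below.

```python
import math
import random
import statistics


def resample_percentages(pool, n_draw=1000, n_rep=100, seed=0):
    """Percent of duplication-supporting trees in `n_rep` draws of `n_draw` trees without replacement.

    `pool` holds one boolean per simulated gene tree (True = duplication at the focal node).
    """
    rng = random.Random(seed)
    return [100 * sum(rng.sample(pool, n_draw)) / n_draw for _ in range(n_rep)]


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    # Hypergeometric upper tail: probability of at least `a` duplications in row 1 by chance.
    tail = sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / math.comb(n, row1)


# Hypothetical pool: 3000 simulated trees, 300 of which show a duplication at the node.
pool = [i < 300 for i in range(3000)]
pcts = resample_percentages(pool)
print(f"null: {statistics.fmean(pcts):.1f}% (SD {statistics.stdev(pcts):.2f}) across 100 resamples")

# Hypothetical subtree counts with / without a duplication: observed row vs. null row.
p = fisher_greater(a=180, b=820, c=100, d=900)
print(f"one-sided Fisher p = {p:.2e}")
```

Sampling without replacement (`random.Random.sample`) mirrors the resampling scheme described above; the spread of `pcts` gives the uncertainty on the null percentage at a node.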
For positive simulations, we simulated gene trees using the same methods described above. However, we incorporated WGDs at the locations in the MAPS phylogeny with significantly larger numbers of gene duplications compared with the null simulation. We allowed at least 20% of the genes to be retained following the simulated WGD to account for biased gene retention and loss.
Identification and validation of long genes
The lengths of all genes were screened, and genes longer than 20 kb were selected. Twenty long genes were randomly selected, and their coding sequences were amplified from the cDNA of different C. salicifolius tissues using KOD-FX Plus (TOYOBO). The primers used for cloning long genes are listed in Table S11. After A-tailing with the DNA A-Tailing Kit (TAKARA), the amplified fragments were ligated into the pMD18-T cloning vector using the pMD™ 18-T Vector Cloning Kit (TAKARA, Shiga, Japan). Positive single bacterial colonies were selected for plasmid extraction and sequencing. The resulting sequences were aligned with those of the corresponding long genes.
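The screening and random selection can be sketched as below, assuming gene spans are available as 1-based inclusive (start, end) coordinates. The gene IDs and spans are hypothetical, and the fixed random seed is an illustrative choice for reproducibility, not a detail reported in the text.

```python
import random


def long_genes(gene_spans, min_len=20_000):
    """IDs of genes whose span (end - start + 1, 1-based inclusive) exceeds `min_len` bp."""
    return sorted(g for g, (start, end) in gene_spans.items() if end - start + 1 > min_len)


def pick_for_validation(candidates, n=20, seed=42):
    """Randomly choose up to `n` genes for RT-PCR validation; `seed` makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical gene spans: g0 spans 15 kb, g1 16 kb, ..., g29 44 kb.
spans = {f"g{i}": (1, 15_000 + 1_000 * i) for i in range(30)}

candidates = long_genes(spans)
chosen = pick_for_validation(candidates)
print(len(candidates), len(chosen))
```

A gene of exactly 20 kb (g5 here) is excluded because the criterion is "longer than 20 kb".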
Gene expression profiling
The raw paired-end RNA-seq reads were filtered into clean data with FASTP (Chen et al., 2018). The clean reads were aligned to the C. salicifolius reference genome generated in this study with Hisat2 (Kim et al., 2015), and StringTie (Pertea et al., 2015) was used to quantify expression. Differential expression analysis was performed with Cuffdiff in the Cufflinks package (Trapnell et al., 2010). Gene coexpression patterns were visualized using the R package ‘corrplot’.
The MapMan software was used to investigate the transcriptomic profiles of different developmental stages of flowers and leaves. A functional annotation database was constructed with Mercator (Lohse et al., 2014). The list of significantly differentially expressed genes was loaded into MapMan to identify significantly up- and downregulated pathways. GO enrichment analysis was performed using agriGO (Tian et al., 2017), with the GO terms identified by InterProScan as the species background. The ‘Plant GO slim’ option was selected, and a false discovery rate (FDR) threshold of 0.05 was used to call enriched GO terms.
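The FDR cutoff can be illustrated with the Benjamini-Hochberg procedure. The text does not say which multiple-testing correction agriGO applied (agriGO offers several), so Benjamini-Hochberg is used here only as a representative FDR method, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw enrichment p-values for eight GO terms.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
qvals = benjamini_hochberg(pvals)
enriched = [i for i, q in enumerate(qvals) if q <= 0.05]
print(enriched)
```

Three raw p-values lie just below 0.05, yet only the first two terms remain significant once the eight tests are accounted for.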
Identification of genes involved in the flavonoid pathway
To identify candidate genes involved in the flavonoid pathway in the C. salicifolius genome, we collected the A. thaliana genes documented in the flavonoid pathway (Saito et al., 2013). The protein sequences of four species (A. trichopoda, O. sativa, A. thaliana, C. salicifolius) were combined into a database. Using each A. thaliana flavonoid pathway gene as a query sequence, BLASTP was applied to scan for homologous genes (E-value threshold: 1E-10). Phylogenetic trees were constructed for the homologous genes of the four species with RAxML v8 (Stamatakis, 2014) and further used to identify candidate orthologous genes.
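The homolog screen can be sketched as a filter over BLASTP tabular output, assuming the standard 12-column `-outfmt 6` layout in which the E-value is the 11th column. The query is the Arabidopsis chalcone synthase gene (AT5G13930) used as an example; all subject IDs and scores are hypothetical.

```python
import csv
import io
from collections import defaultdict


def homologs_from_blast6(handle, max_evalue=1e-10):
    """Collect subject IDs per query from BLAST tabular (-outfmt 6) lines with E-value <= max_evalue."""
    hits = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            hits[query].add(subject)
    return {q: sorted(s) for q, s in hits.items()}


# Hypothetical BLASTP output for one Arabidopsis query against the four-species database.
blast_out = io.StringIO(
    "AT5G13930\tCsal_g001\t85.2\t395\t58\t0\t1\t395\t1\t395\t1e-200\t700\n"
    "AT5G13930\tCsal_g002\t40.1\t300\t170\t5\t10\t300\t12\t305\t3e-8\t95\n"
    "AT5G13930\tOsat_g010\t78.0\t390\t85\t1\t1\t390\t3\t392\t2e-180\t640\n"
)
print(homologs_from_blast6(blast_out))
```

The weak hit (E = 3e-8) is discarded by the 1E-10 threshold; the retained homologs would then be placed on gene trees to separate orthologs from paralogs.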
Evaluation of flavonoid content in different tissues
The tissues used in the flavonoid evaluation matched the samples used for RNA-seq. These samples were collected and dried at 60°C. The dried samples were ground into powder and sieved through an 80–100 mesh. HPLC analysis was carried out on an Agilent 1260 instrument following the method described previously (Yang et al., 2018). The contents of six flavonoids, namely kaempferol, kaempferol-3-O-glucoside, kaempferol-3-O-rutinoside, quercetin, isoquercitrin and rutin, were analyzed with commercial reference standards. Pearson correlation coefficients were calculated between the expression values of each gene in the identified flavonoid pathway and the measured flavonoid contents.
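The expression-content correlation is a plain Pearson coefficient computed across tissues, sketched below. The expression values and rutin contents for five tissues are hypothetical placeholders, not measurements from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression of one pathway gene and rutin content across five tissues.
expression = [12.0, 30.5, 8.2, 45.1, 20.0]
rutin = [1.1, 2.9, 0.7, 4.0, 2.2]
print(round(pearson_r(expression, rutin), 3))
```

With only a handful of tissues per gene, such coefficients are sensitive to single outlying samples, so they are best read as a screen for candidate genes rather than as proof of function.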
ACKNOWLEDGEMENTS
This work is financially supported by the Zhejiang Major Science & Technology Project of New Agricultural Varieties (2016C02058), the Zhejiang Province Major Science & Technology Project (2012C12014-1), the National Natural Science Foundation of China (31671282), Shanghai Science and Technology Committee Rising-Star Program (19QA1406500), and Shanghai Engineering Research Center of Plant Germplasm Resources (17DZ2252700). The authors thank Nextomics Biosciences Co., Ltd (Wuhan) for the help in genome assembly, Dr Qiang Zhao from National Center for Gene Research of Chinese Academy of Sciences for assistance in genome annotation, Dr Yunpeng Zhao (Zhejiang University), and Dr Jun Yang (Chinese Academy of Sciences) for the discussion and providing valuable suggestions to the manuscript. The authors gratefully acknowledge the support of the IBM high-performance computing cluster of Analysis Center of Agrobiology and Environmental Sciences, Zhejiang University.
AUTHOR CONTRIBUTIONS
KC, XH and XW conceived and coordinated the project. QL,
JL, QW, JF, JP, ZC and WC prepared the materials and
conducted the experiments. JL and JQ performed the
assembly and annotation of the genome. JQ, XW, ZL, WZ
and JL carried out the phylogenetic, comparative genomics
and transcriptome analysis. XW, JQ, ZL, QL, MB and KC
wrote the manuscript.
CONFLICT OF INTEREST
The authors declare no competing financial interests.
DATA AVAILABILITY STATEMENT
The assembled C. salicifolius genome and its related data have been deposited under NCBI BioProject accession PRJNA602413. The genome assembly has been assigned the accession number JAAGOE000000000. The SRA accession numbers for the raw sequencing data (PacBio, Illumina, 10x Genomics and Hi-C) are SRR11127589-SRR11127597 and SRR11191851-SRR11191853. The transcriptomic data generated in this study are under accession numbers SRR11109013-SRR11109042. The C. salicifolius genome assembly and the annotated genes are accessible at http://xhhuanglab.cn/data/Chimonanthus_salicifolius.html.
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article.
Figure S1. Genome survey of C. salicifolius based on k-mer analysis using Illumina sequencing data.
Figure S2. Genome size estimation based on flow cytometry using O. sativa as an internal reference.
Figure S3. Hi-C contact map of the 11 constructed pseudochromosomes.
Figure S4. Percentage of long genes and all genes present in genomic regions of different repetitive levels.
Figure S5. Heterozygous SNP distribution in the repeat sequence regions.
Figure S6. Coalescent-based phylogenetic tree constructed from 1420 orthologous genes retrieved from 15 plants.
Figure S7. Distribution of Ks among paralogs in four magnoliid plants.
Figure S8. Genomic syntenic depth ratios of magnoliids against A. trichopoda and V. vinifera.
Figure S9. Long genes validated by PCR amplification.
Figure S10. Tissues used for RNA-seq.
Figure S11. PCA based on the expression profile of all genes for different tissues of C. salicifolius.
Figure S12. Transcriptomic profiles for different tissues of C. salicifolius.
Figure S13. GO and MapMan terms for the significantly differentially expressed genes between bud and blooming stages.
Figure S14. Expansion of two gene families related to cold tolerance.
Figure S15. Transcriptomic profile for metabolism-related genes visualized by MapMan.
Figure S16. Classification of the UDP-glucosyltransferase multigene family in the C. salicifolius genome.
Figure S17. Distribution of flavonoid pathway genes in the C. salicifolius genome.
Table S1. Assessment of the completeness of the genome assembly by BUSCO analysis.
Table S2. Summary statistics of repeat sequences in the C. salicifolius genome.
Table S3. Comparison of the number of genes with specific protein domains in C. salicifolius and magnoliids against 11 monocot and eudicot plants.
Table S4. MAPS results for placements of WGDs for magnoliids and their simulated distributions.
Table S5. Long genes that were successfully amplified from cDNA of C. salicifolius tissues.
Table S6. GO enrichment terms of long genes.
Table S7. Summary of RNA-seq data generated in this study.
Table S8. Homologous genes involved in flavone biosynthetic pathways in C. salicifolius.
Table S9. Flavonoid content of different tissues in C. salicifolius.
Table S10. Correlation of flavonoid content and the expression of flavonoid biosynthetic genes.
Table S11. Primers used in PCR amplification for validation of long genes.
REFERENCES
Adey, A., Kitzman, J.O., Burton, J.N. et al. (2014) In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049.
Amborella Genome Project (2013) The Amborella genome and the evolution of flowering plants. Science, 342, 1241089.
Ammiraju, J.S.S., Luo, M.Z., Goicoechea, J.L. et al. (2006) The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res. 16, 140–147.
Barker, M.S., Kane, N.C., Matvienko, M., Kozik, A., Michelmore, R.W., Knapp, S.J. and Rieseberg, L.H. (2008) Multiple paleopolyploidizations during the evolution of the Compositae reveal parallel patterns of duplicate gene retention after millions of years. Mol. Biol. Evol. 25, 2445–2455.