Genetic Architecture of Maize Kernel Composition in the Nested Association Mapping and Inbred Association Panels¹[W]
Jason P. Cook, Michael D. McMullen, James B. Holland, Feng Tian, Peter Bradbury, Jeffrey Ross-Ibarra, Edward S. Buckler, and Sherry A. Flint-Garcia*
Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211 (J.P.C., M.D.M., S.A.F.-G.); United States Department of Agriculture-Agricultural Research Service, Columbia, Missouri 65211 (M.D.M., S.A.F.-G.); United States Department of Agriculture-Agricultural Research Service, Raleigh, North Carolina 27695 (J.B.H.); United States Department of Agriculture-Agricultural Research Service, Ithaca, New York 14853 (P.B., E.S.B.); Department of Crop Science, North Carolina State University, Raleigh, North Carolina 27695 (J.B.H.); Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853 (F.T., P.B., E.S.B.); and Department of Plant Sciences, University of California, Davis, California 95616 (J.R.-I.)
The maize (Zea mays) kernel plays a critical role in feeding humans and livestock around the world and in a wide array of industrial applications. An understanding of the regulation of kernel starch, protein, and oil is needed in order to manipulate composition to meet future needs. We conducted joint-linkage quantitative trait locus mapping and genome-wide association studies (GWAS) for kernel starch, protein, and oil in the maize nested association mapping population, composed of 25 recombinant inbred line families derived from diverse inbred lines. Joint-linkage mapping revealed that each kernel composition trait is controlled by 21 to 26 quantitative trait loci. Numerous GWAS associations were detected, including several oil and starch associations in acyl-CoA:diacylglycerol acyltransferase1-2, a gene that regulates oil composition and quantity. Results from nested association mapping were verified in a 282 inbred association panel using both GWAS and candidate gene association approaches. We identified many beneficial alleles that will be useful for improving kernel starch, protein, and oil content.
Maize (Zea mays) is the world's most important production crop (faostat.fao.org): its starch, protein, and oil are essential in supplying adequate food and nutrition to both humans and animals, and maize starch has recently become an important feedstock for ethanol production. Altering starch content can lead to higher yields, specialty industrial applications, and improved sweet corn varieties, while increased protein content and augmented levels of essential amino acids improve nutritional quality. Growing demand for healthy cooking oil can be met by improved oil content and composition.
Substantial effort has been spent to develop maize varieties that meet market demands for modified kernel composition. Specialty maize germplasm with unique kernel composition traits has been developed by exploiting mutations affecting kernel grain composition and quality. These include opaque2 (o2), which increases Lys content (Mertz et al., 1964); amylose-free waxy1 (wx1; Lambert, 2001); sugary1 (su1), sugary enhancer (SE), and shrunken2 (sh2), which are responsible for sweet corn (Schultz and Juvik, 2004); and linoleic acid1 (ln1), which confers an altered fatty acid ratio (Poneleit and Alexander, 1965). Use of specialty maize germplasm with unique kernel composition has been limited, however, due to difficulties in developing agronomically superior germplasm. Future progress in kernel composition improvement will depend on understanding and exploiting quantitative trait loci (QTLs) for kernel composition traits.
The complex genetic architecture of starch, protein, and oil content has been demonstrated in the Illinois long-term selection experiment, in which more than 100 generations of recurrent selection have increased oil and protein content to approximately 20% and 27%, respectively (Moose et al., 2004). The continued phenotypic response of kernel composition provides convincing evidence that these traits are controlled by many genes. This is further demonstrated by the numerous starch, protein, and oil QTLs detected in studies involving lines derived from the Illinois long-term selection populations (Goldman et al., 1993, 1994; Sene et al., 2001; Laurie et al., 2004; Hill, 2005; Dudley et al., 2004, 2007; Dudley, 2008; Clark et al., 2006; Wassom et al., 2008). Little is known, however, about the causative genetic factors underlying kernel composition QTLs.
¹ This work was supported by the National Science Foundation (DBI–0321467 and IOS–0820619) and U.S. Department of Agriculture National Institute of Food and Agriculture (grant no. 2009–01864).
* Corresponding author; e-mail [email protected]. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Sherry Flint-Garcia ([email protected]).
[W] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.111.185033
Plant Physiology®, February 2012, Vol. 158, pp. 824–834, www.plantphysiol.org © 2011 American Society of Plant Biologists. All Rights Reserved.
Two publicly available maize genetic resources were developed for high-power, high-resolution QTL analysis: the nested association mapping (NAM) population (McMullen et al., 2009) and the 282 inbred line (IL) association panel (AP; Flint-Garcia et al., 2005). The NAM population was developed by crossing 25 diverse founder ILs to the reference inbred B73 and producing 25 recombinant inbred line (RIL) families. The present NAM genetic map is based on 1,106 single nucleotide polymorphisms (SNPs) assayed on 4,699 RILs. The power and resolution of joint-linkage mapping in NAM were recently demonstrated for maize flowering time (Buckler et al., 2009). The unique structure of NAM also offers an opportunity to further dissect QTLs using genome-wide association studies (GWAS; Tian et al., 2011). Release of the first-generation maize HapMap (Gore et al., 2009) enables projection of 1.6 million SNPs and indels identified in the NAM founder lines onto the NAM RILs. Use of HapMap markers for GWAS successfully dissected leaf morphology and northern and southern leaf blight QTLs to the level of individual genes (Kump et al., 2011; Poland et al., 2011; Tian et al., 2011). The 282 IL AP exploits the rapid breakdown of linkage disequilibrium in diverse maize lines, enabling very high resolution for QTL mapping via association analysis (Flint-Garcia et al., 2005). The candidate gene association approach has been successful in identifying genes controlling various quantitative traits in maize (Thornsberry et al., 2001; Wilson et al., 2004; Harjes et al., 2008; Krill et al., 2010; Yan et al., 2010).
In this study, we evaluated the NAM population and
the 282 IL AP for starch, protein, and oil content. QTLs were identified by joint-linkage analysis and further resolved with GWAS in NAM. We report that the genetic architecture of kernel starch, protein, and oil composition is characterized primarily by additive gene action. The fine-mapping resolution of NAM-enabled GWAS allowed us to resolve an oil QTL on chromosome 6 to the genic level, revealing an allelic series for acyl-CoA:diacylglycerol acyltransferase1-2 (DGAT1-2), a gene involved in oil synthesis. The NAM analysis was complemented by GWAS on the 282 inbred AP using 55,000 SNPs. After multiple test correction, none of the GWAS associations in the AP were significant. However, SNPs located in specific candidate genes were significant when the candidate gene association analysis approach was used.
RESULTS
Phenotypic Assessment of NAM and AP Kernel Composition
Starch, protein, and oil content were estimated by near-infrared (NIR) spectroscopy for self-pollinated seed samples of the NAM population and the 282 inbred AP grown in seven locations spanning 2 y. The Perten Ethanol Calibration Package contains over 1,700 calibration samples with the following ranges: 7.4%–37.6% for moisture, 4.9%–15.3% for protein, and 2.2%–3.5% for oil. The R² values for the Perten calibrations are all very high (>0.94) for samples within these ranges. The proprietary Syngenta starch calibration sample set contained 814 samples ranging from 48.3% to 67.9% starch, and the R² value was 0.94 for samples within that range. After these calibration sample composition values were adjusted to a dry matter basis, the vast majority of our NAM and AP samples fell within the range of the calibration; only 0.7%, 1.2%, and 0.9% of our values fell outside that range for starch, protein, and oil, respectively. All composition values were adjusted to a dry matter basis.
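The text reports values on a dry matter basis but does not give the conversion. The sketch below assumes the standard moisture correction, dry-basis % = as-is % × 100 / (100 − moisture %). It also computes the share of values falling outside a calibration range, mirroring the 0.7% to 1.2% figures above. Function names and inputs are hypothetical illustrations, not the study's pipeline.

```python
def to_dry_matter(as_is_percent: float, moisture_percent: float) -> float:
    """Convert a constituent from % of the as-is sample to % of dry matter.

    Assumes the standard moisture correction (an illustration, not the
    study's documented procedure).
    """
    if not 0.0 <= moisture_percent < 100.0:
        raise ValueError("moisture must lie in [0, 100)")
    return as_is_percent * 100.0 / (100.0 - moisture_percent)


def fraction_outside(values, low: float, high: float) -> float:
    """Share of values falling outside the closed calibration range [low, high]."""
    values = list(values)
    if not values:
        return 0.0
    outside = sum(1 for v in values if v < low or v > high)
    return outside / len(values)


# Hypothetical input: 9.0% protein measured on a sample at 10% moisture.
print(to_dry_matter(9.0, 10.0))  # 10.0
# Hypothetical dry-basis oil values checked against a hypothetical 3.5-5.5% range.
print(fraction_outside([3.1, 4.0, 5.6, 4.4], low=3.5, high=5.5))  # 0.5
```

Converting the calibration limits with the same function before calling `fraction_outside` keeps both sides of the comparison on the same basis, as the text describes.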
The two NAM sweet corn families (IL14H and P39) were excluded from analysis due to their extreme kernel phenotypes. Starch, protein, and oil content among the NAM founders ranged from 62.3% to 69.6%, 12.3% to 15.3%, and 3.5% to 5.5%, respectively, whereas the NAM population displayed transgressive segregation, resulting in greater differences among the RILs (Table I; Supplemental Table S1). Starch, protein, and oil content among the inbreds in the 282 AP ranged from 59.6% to 70.3%, 11.5% to 17.5%, and 3.1% to 8.2%, respectively (Table I). In both the NAM population and AP, highly significant (P < 0.0001) negative phenotypic correlations were detected between starch and both protein (r = −0.66 and −0.56 for NAM and AP, respectively) and oil (r = −0.41 and −0.33 for NAM and AP, respectively), and a significant positive phenotypic correlation was detected between protein and oil (r = 0.32 and 0.29 for NAM and AP, respectively). Broad-sense heritability for these traits was high in both the NAM population and AP, ranging from 83% to 91% (Table I).
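The heritabilities in Table I come from multi-environment data, but the text does not state the estimator. One common entry-mean formulation, assumed here only for illustration, divides genotypic variance by the variance of an entry mean: H² = σ²_G / (σ²_G + σ²_GE/e + σ²_ε/(re)), for e environments and r replicates per environment. The variance components below are hypothetical, not the study's estimates.

```python
def entry_mean_heritability(var_g: float, var_ge: float, var_e: float,
                            n_env: int, n_rep: int = 1) -> float:
    """Broad-sense heritability on an entry-mean basis (assumed formulation).

    var_g: genotypic variance; var_ge: genotype-by-environment variance;
    var_e: residual variance; n_env environments with n_rep replicates each.
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("need at least one environment and one replicate")
    var_entry_mean = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    return var_g / var_entry_mean


# Hypothetical variance components for seven environments, one plot each.
h2 = entry_mean_heritability(var_g=4.0, var_ge=1.4, var_e=2.8, n_env=7)
print(round(h2, 2))  # 0.87
```

Averaging over more environments shrinks the G × E and residual terms, which is one reason entry-mean heritabilities from multi-location trials tend to be high.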
NAM Joint QTL Linkage Analysis
Joint stepwise regression identified 21 starch, 26 protein, and 22 oil QTLs, which collectively explained 59%, 61%, and 70% of the total variation, respectively (Fig. 1; Table I). All starch, protein, and oil QTLs were shared among multiple families, with most QTLs showing significant effects in three to six families. Because the founder lines were crossed to a common reference line (B73), additive allelic effects relative to B73 can be accurately estimated. Joint-linkage mapping detects QTLs that are linked to the SNPs being tested. Although the SNP markers are biallelic, each of the 23 populations was allowed to have an independent allele by fitting a population-by-marker term in the stepwise regression and final models. A total of 133 starch, 136 protein, and 114 oil alleles were significant after false discovery rate (FDR) correction (P = 0.05; Fig. 2; Supplemental Figs. S1 and S2; Supplemental Tables S5–S7). All QTL additive allelic effects were small relative to the amount of variation observed among founders; the largest allelic effects for starch, protein, and oil QTLs were 0.65%, −0.38%, and 0.21% dry matter, respectively. Allelic series, that is, QTLs displaying both positive and negative additive allelic effects, were identified in 31% to 43% of the QTLs, depending on the trait.
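The population-by-marker term can be written as a linear model. The following is a sketch in the spirit of published NAM joint-linkage analyses (e.g. Buckler et al., 2009), not necessarily the exact model fitted here, and the symbols are introduced for illustration. For RIL j in family i, x_ijk codes the genotype at the marker for QTL k: 0 for the B73 allele and 1 for the founder allele, possibly fractional when imputed.

```latex
y_{ij} = \mu + f_i + \sum_{k=1}^{q} \beta_{ik}\, x_{ijk} + \varepsilon_{ij}
```

Here f_i is the family main effect and β_ik is the effect of founder i's allele at QTL k relative to B73. Because β carries a family index, each of the 23 families can have its own allele even at a biallelic marker. A QTL forms an allelic series when some significant β_ik are positive and others negative.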
We searched for the presence of epistatic interactions in the NAM population by testing all pairwise marker combinations. Eight significant epistatic interactions were observed for oil at the NAM level at the 5% FDR (Benjamini and Hochberg, 1995). However, none of these oil interactions remained significant when added to the full joint-linkage model. Analysis of individual families yielded only two family-specific epistatic interactions for protein that were significant after FDR correction, but these were likewise not significant in the context of the joint-linkage model.
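The 5% FDR threshold used throughout follows Benjamini and Hochberg (1995). Below is a minimal pure-Python sketch of that step-up procedure. The function name and example P-values are hypothetical, and this is not the software used in the study. The last line counts the marker pairs that exhaustive pairwise testing of the 1,106-marker NAM map would involve.

```python
import math


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans (same order as pvalues): True where H0 is rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * alpha / m,
    then reject the k smallest P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical P-values for eight tests.
pvals = [0.039, 0.001, 0.205, 0.041, 0.008, 0.06, 0.074, 0.042]
print(benjamini_hochberg(pvals))
# [False, True, False, False, True, False, False, False]

# Marker pairs if all 1,106 NAM map markers are tested pairwise.
print(math.comb(1106, 2))  # 611065
```

The step-up rule is more permissive than a per-test Bonferroni cutoff. Even so, several hundred thousand pairwise tests demand very small P-values, which helps explain why few epistatic interactions survive.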
The NAM design provides a powerful test of pleiotropy among overlapping QTL intervals from multiple traits by correlating the allelic effects across 23 families. Joint-linkage mapping with 1,106 markers produced starch, protein, and oil QTL support intervals averaging 9.1 to 14.4 cM. The majority of the starch (90%), protein (85%), and oil (73%) QTL intervals overlapped the interval of a second kernel composition trait and were subsequently tested for pleiotropy. A high level of pleiotropy was expected, as starch, protein, and oil make up the bulk of the kernel's dry matter. Because these components cannot sum to more than 100% of dry matter, a significant increase in the percentage of one must be accompanied by a decrease in the others. If two traits share a QTL due to pleiotropy, the allelic effects at that locus will be significantly correlated. Allelic effects were significantly correlated (P ≤ 0.001) for each pair of traits examined (Supplemental Table S8). Each QTL was also analyzed independently, revealing that 12 of 13 (92%) starch/protein, 1 of 8 (13%) starch/oil, 7 of 11 (64%) protein/oil, and 1 of 8 (13%) starch/protein/oil QTLs were pleiotropic (P ≤ 0.05; Supplemental Table S8).
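If two traits share a QTL through pleiotropy, the family-specific allelic effects for one trait should correlate with those for the other. The sketch below computes a Pearson correlation and its t statistic with n − 2 degrees of freedom in plain Python. It assumes this simple correlation test, which may differ in detail from the study's procedure. A P-value can then be read from a t distribution; for example, `scipy.stats.pearsonr` returns both r and P.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of allelic effects."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical allelic effects of four founders at one shared QTL.
print(round(pearson_r([0.10, 0.20, 0.30, 0.40], [0.10, 0.30, 0.20, 0.40]), 3))  # 0.8
# The starch/oil correlation reported near DGAT1-2 (r = -0.59) over 23 families.
print(round(correlation_t(-0.59, 23), 2))  # -3.35
```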
GWAS in NAM and 282 Inbred AP
The NAM design, combined with the increased marker density provided by HapMap.v1 markers (Gore et al., 2009), enables further dissection of the joint-linkage mapping QTL intervals via GWAS. To perform GWAS, 1.6 million HapMap.v1 SNPs and indels identified in the 26 NAM parents were projected onto the NAM RILs (Tian et al., 2011). Two GWAS methods were tested, each run on a chromosome-by-chromosome basis accounting for the presence of QTLs on the other nine chromosomes. In the first analysis, a single forward regression model was developed for each trait based on the complete RIL data set (23 complete NAM families). The single forward regression method identified 33 starch, 31 protein, and 43 oil SNP associations (Supplemental Tables S9–S11). To explore a wider range of models, a second analysis was conducted based on 100 random subsamples, each containing 80% of the RILs from each family. The subsampling method yielded 127 starch, 118 protein, and 135 oil SNP associations with resample model inclusion probability (RMIP) ≥ 0.05 (Supplemental Tables S12–S14). More than 80% of all associations from the single regression analysis were also identified in the subsampling analysis (Supplemental Tables S9–S11).
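RMIP is the fraction of subsample models in which a SNP is selected. The sketch below draws 100 subsamples of 80% of the RILs within each family and tallies how often each SNP is chosen. The selection step is a hypothetical placeholder (`select_snps`); in the study it is forward regression, which is not reimplemented here.

```python
import random
from collections import Counter


def stratified_subsample(rils_by_family, fraction, rng):
    """Draw the same fraction of RILs, without replacement, from every family."""
    chosen = []
    for family in sorted(rils_by_family):
        rils = rils_by_family[family]
        chosen.extend(rng.sample(rils, round(len(rils) * fraction)))
    return chosen


def rmip(rils_by_family, select_snps, n_subsamples=100, fraction=0.8, seed=0):
    """Fraction of subsamples in which each SNP enters the model.

    select_snps is a hypothetical stand-in for the forward-regression step:
    it receives a list of RIL ids and returns the SNPs it selects.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_subsamples):
        subsample = stratified_subsample(rils_by_family, fraction, rng)
        counts.update(set(select_snps(subsample)))
    return {snp: hits / n_subsamples for snp, hits in counts.items()}


# Toy data: two families of five RILs and a selector that mimics a weak signal.
families = {"fam1": ["r1", "r2", "r3", "r4", "r5"],
            "fam2": ["r6", "r7", "r8", "r9", "r10"]}


def toy_selector(subsample):
    selected = ["snpA"]            # always chosen
    if "r1" in subsample:
        selected.append("snpB")    # chosen only when RIL r1 is drawn
    return selected


scores = rmip(families, toy_selector)
reported = {snp: p for snp, p in scores.items() if p >= 0.05}  # RMIP >= 0.05 cut
print(scores["snpA"])  # 1.0
```

Sampling within each family keeps every family represented in every subsample, so no single family drives the inclusion counts.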
NAM GWAS results were compared with the NAM joint-linkage QTL intervals. Between 47% and 100% of the SNPs selected by the 100-subsample method overlapped NAM joint-linkage QTL intervals, depending on the RMIP level and trait (Supplemental Fig. S3).
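The overlap percentages come from asking, for each selected SNP, whether it falls inside any joint-linkage support interval, optionally after an RMIP cutoff. The sketch below assumes the intervals have already been converted to physical coordinates (bp) on the same reference as the SNPs. All intervals are hypothetical; only the 105,019,473-bp SNP and its RMIP of 0.67 come from the text.

```python
def in_any_interval(chrom, pos, intervals_by_chrom):
    """True if pos lies within any closed (start, end) interval on chrom."""
    return any(start <= pos <= end for start, end in intervals_by_chrom.get(chrom, ()))


def overlap_fraction(snps, intervals_by_chrom, min_rmip=0.0):
    """Share of SNPs with RMIP >= min_rmip that fall inside a QTL interval.

    snps: iterable of (chromosome, position_bp, rmip) tuples.
    """
    kept = [(c, p) for c, p, r in snps if r >= min_rmip]
    if not kept:
        return 0.0
    hits = sum(1 for c, p in kept if in_any_interval(c, p, intervals_by_chrom))
    return hits / len(kept)


# Hypothetical support intervals in bp.
intervals = {6: [(95_000_000, 120_000_000)], 1: [(10_000_000, 20_000_000)]}
snps = [(6, 105_019_473, 0.67), (6, 130_000_000, 0.20),
        (1, 15_000_000, 0.06), (2, 15_000_000, 0.10)]
print(overlap_fraction(snps, intervals))                # 0.5
print(overlap_fraction(snps, intervals, min_rmip=0.5))  # 1.0
```

Raising `min_rmip` keeps only the most consistently selected SNPs, which is how the overlap can vary from 47% to 100% with RMIP level.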
Table I. Means, ranges, difference within range, and broad-sense heritability estimates for percent starch, protein, and oil kernel composition best linear unbiased predictors on a dry matter basis in the NAM population and 282 inbred AP
Number of QTLs detected in NAM by joint-linkage analysis for each trait, with their respective R² values explaining the amount of genetic variation detected by the QTLs.
Between 54% and 62% of SNPs selected by both the subsampling and single forward regression GWAS methods overlapped the NAM joint-linkage QTL intervals, depending on the trait (starch, oil, or protein; Fig. 3).
Although the joint-linkage genetic QTL intervals in
NAM were relatively small (average 9.1–14.4 cM), several intervals encompassed over 100 Mb of DNA sequence (Supplemental Tables S2–S4). In most cases, intervals that encompass large genomic regions correspond to low-recombination regions, often centromeric regions (Gore et al., 2009). GWAS with NAM was able to further dissect several of the QTL intervals overlapping large genomic regions into substantially smaller genomic intervals (Fig. 3).
Complementing the NAM analysis, we conducted
an association analysis of kernel composition traits in an AP composed of 282 ILs (Flint-Garcia et al., 2005) genotyped with the MaizeSNP50 BeadChip (Illumina Inc.). Removal of nonpolymorphic and low-quality SNPs resulted in a data set of 51,741 SNPs, which were used for GWAS employing the mixed linear model (MLM) method (Q+K; Yu et al., 2006) to control for population structure. None of the 51,741 SNPs was significantly associated with any of the traits after a multiple test FDR correction (P = 0.05) was applied (Benjamini and Hochberg, 1995).
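For reference, the Q+K mixed linear model of Yu et al. (2006) has the general form below. Here y holds phenotypes, β holds other fixed effects, α is the effect of the SNP being tested, v holds population-structure effects associated with the Q matrix, u holds random polygenic effects whose covariance is proportional to the kinship matrix K, and e holds residuals. Software implementations differ in how variance components are estimated, so treat this as a summary of the model family rather than the exact settings used in this study.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\alpha + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \operatorname{Var}(\mathbf{u}) = 2\mathbf{K}\sigma^2_g,
\qquad \operatorname{Var}(\mathbf{e}) = \mathbf{R}\sigma^2_e
```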
Underlying Genetic Architecture
The ultimate goal of our QTL study was to identify genes underlying kernel composition traits. We identified NAM GWAS associations in several genes encoding enzymes known to be important in biochemical pathways that influence kernel starch, protein, and oil content, such as DGAT1-2 (RMIP 0.67), carbonic anhydrase (RMIP 0.59), Suc synthase (RMIP 0.36), pyruvate kinase (RMIP 0.23), β-amylase2 (RMIP 0.20), nitrate reductase (RMIP 0.07), and α-amylase (RMIP 0.06; Buchanan et al., 2000; Supplemental Tables S12–S14). Additionally, several significant GWAS associations were located within transcription factors, zinc finger binding proteins, kinases, and the histone H1 variant H1.2, all of which regulate complex biochemical pathways (Supplemental Tables S9–S14).
We explored the relationship between the joint-linkage oil QTL on chromosome 6 (NAM marker m708; PZA03461.1) and a gene previously identified to affect oil content and the ratio of oleic to linoleic acid (Zheng et al., 2008). This QTL was the most significant joint-linkage QTL in our experiment. It overlaps a previously identified locus, ln1, confirmed to encode a type I acyl-CoA:diacylglycerol acyltransferase located at chromosome 6: 105,013,351 to 105,020,258 (B73 RefGen_v1; Schnable et al., 2009), which is involved in the Kennedy pathway for triacylglycerol biosynthesis (Zheng et al., 2008). Zheng et al. (2008) identified a functional Phe insertion in the C terminus of the protein that resulted in a high-oil allele of DGAT1-2 with a 0.29% additive genetic effect. The NAM joint-linkage QTL on chromosome 6 overlapping DGAT1-2 showed a distinct allelic series ranging from −0.05% to 0.21% (Fig. 4A). The 23 NAM founders used in this study were genotyped for the indel conferring the Phe insertion, and the four founders with the highest allelic effects (M162W, Oh7B, Ky21, and Tx303) also have the high-oil Phe insertion allele (Fig. 4B).
Figure 2. Heat map displaying additive allelic effects for oil content QTLs for the 23 NAM founders relative to B73. The top horizontal axis lists the chromosome and genetic map position for each QTL peak, and the bottom axis shows the NAM map SNP selected by stepwise regression. The vertical axis displays the 23 inbred NAM founder lines sorted in increasing percent oil content on a dry matter basis. Allelic effects are color coded in 0.05% increments.
GWAS on the NAM population further suggests that DGAT1-2 is responsible for the joint-linkage oil QTL on chromosome 6. Two biallelic SNPs (105,014,855 and 105,019,473 bp) located in DGAT1-2 were associated with oil content with RMIP scores of 0.31 and 0.67, respectively (Fig. 4, C and D). Likewise, two SNPs (105,019,334 and 105,019,473 bp) located in DGAT1-2 were associated with starch content with RMIP scores of 0.51 and 0.11, respectively (Fig. 4, C and D). The two oil SNPs had positive estimated additive effects relative to B73 (0.13% and 0.18%), and both starch SNPs had negative estimated additive effects relative to B73 (−0.32% and −0.38%; Fig. 4D). The negative effects for starch and positive effects for oil are consistent with the significant pleiotropy (r = −0.59) detected between the overlapping starch (m707) and oil (m708) joint-linkage QTLs in this region (Supplemental Table S8). The oil SNP located at 105,019,473 bp and the starch SNP located at 105,019,334 bp were also selected by the NAM GWAS single forward regression analysis.
Figure 3. Starch, protein, and oil GWAS in NAM and the 282 inbred AP compared with the NAM joint-linkage mapping analysis. The regions shaded blue (starch), red (protein), and green (oil) depict NAM joint-linkage QTL support intervals, with their height indicating log of the odds (LOD) score. Gray boxes along the horizontal axis, centromere positions. A, C, and E, NAM; black diamonds indicate the position and magnitude of associations detected by the subsampling method (RMIP ≥ 0.05; Supplemental Tables S12–S14), and yellow diamonds show the position and magnitude of associations selected by both the 100-subsample and single forward regression methods (RMIP; Supplemental Tables S9–S11). B, D, and F, 282 inbred AP; black diamonds show the position and magnitude of GWAS SNPs selected by MLM (Q+K) analysis at P = 0.01.
Association of the Phe:indel in DGAT1-2 was not
detectable by GWAS because the indel was not present in the HapMap.v1 marker set. To verify that the Phe:indel was associated with kernel composition in our diverse inbred panel, we implemented a candidate gene association analysis using the MLM (Q+K) method on the 282 inbred AP. Consistent with previous results, the Phe:indel was significantly associated with oil content (P = 9.99E-04) but not with starch or protein content (Fig. 4D). In addition, two SNPs from the MaizeSNP50 BeadChip are located in DGAT1-2, at 105,013,351 and 105,019,334 bp. Although these SNP associations were not significant after a 5% FDR correction in the context of a full genome scan, they were associated with oil content (P = 1.17E-04 and 4.32E-05, respectively) when a candidate gene approach was used (Fig. 4D). The additive allelic effects for these SNPs were 0.18% and 0.19%, respectively (Fig. 4D).
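The contrast between the genome scan and the candidate gene test comes down to how many tests share the error budget. The study used the Benjamini-Hochberg FDR. As a simpler illustration, the sketch below uses the Bonferroni per-test cutoff α/m, which is stricter than FDR control. It compares the DGAT1-2 P-values reported above against cutoffs for the 51,741-SNP AP scan, a 1.6-million-variant scan, and a candidate test of three markers. Treating the candidate test as exactly three tests, and using Bonferroni at all, are assumptions for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# P-values reported in the text for DGAT1-2 markers in the 282-line AP.
dgat_pvalues = {"Phe:indel": 9.99e-04, "SNP 105,013,351": 1.17e-04,
                "SNP 105,019,334": 4.32e-05}

scenarios = [("AP genome scan", 51_741),
             ("HapMap.v1 scale", 1_600_000),
             ("three candidate markers (assumed)", 3)]
for label, m in scenarios:
    cutoff = bonferroni_threshold(0.05, m)
    passing = [name for name, p in dgat_pvalues.items() if p <= cutoff]
    print(f"{label}: cutoff {cutoff:.2e}, passing: {passing}")
```

At genome scale the per-test cutoff falls below 1E-06, about 45 times smaller than the strongest DGAT1-2 P-value. Restricting the test to one gene leaves every reported marker well inside the cutoff.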
Figure 4. QTL and GWAS analyses for the chromosome 6 oil QTL and candidate gene DGAT1-2. A, NAM additive allelic effect estimates (percentage oil content on a dry matter basis) for the m708 QTL interval overlapping the DGAT1-2 genomic position. Red bars, NAM founders possessing a significant high-oil allele; blue bar, NAM founder with a significant low-oil allele relative to B73. B, NAM founder genotypes for all markers displaying significant associations in DGAT1-2. C, DGAT1-2 gene model showing the position of markers with significant associations. Note that DGAT1-2 is on the negative DNA strand. D, NAM GWAS and candidate gene association analysis for DGAT1-2. M2 is the Phe:indel previously determined to be the functional polymorphism for oil content at this locus (Zheng et al., 2008).
Comparing the NAM joint-linkage QTL allelic effects for the 23 founders with the genotypes at the significant GWAS markers for oil content in DGAT1-2 suggests the presence of an allelic series for DGAT1-2 (Fig. 4, A and B). The four lines with the highest estimated oil allelic effects (Tx303, Ky21, Oh7B, and M162W) have non-B73 genotypes at all five significant markers detected by GWAS and candidate gene association analysis. Two founders with intermediate oil allelic effects, CML228 and Ki3, have non-B73 genotypes at only two markers, both located in the N-terminal region of DGAT1-2 (105,019,334 and 105,019,473 bp; Fig. 4, A and B). All other founder lines have the B73 haplotype for DGAT1-2.
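The allelic series can be read as haplotype classes across the five significant DGAT1-2 markers. The sketch below encodes only the genotype pattern stated in the text, that is, which founders are non-B73 at which markers. Any founder not listed is treated as carrying the B73 haplotype, as the text states. The data structure and function names are illustrative; B97 and Oh43 serve only as examples of B73-haplotype founders.

```python
from collections import defaultdict

# DGAT1-2 markers with significant associations (bp, B73 RefGen_v1) plus the Phe:indel.
MARKERS = ("105,013,351", "105,014,855", "105,019,334", "105,019,473", "Phe:indel")

# Non-B73 genotypes as described in the text; unlisted founders are B73-like.
NON_B73 = {
    "Tx303": set(MARKERS),
    "Ky21": set(MARKERS),
    "Oh7B": set(MARKERS),
    "M162W": set(MARKERS),
    "CML228": {"105,019,334", "105,019,473"},
    "Ki3": {"105,019,334", "105,019,473"},
}


def haplotype(founder):
    """Tuple of 'B73' or 'alt' calls across MARKERS for one founder."""
    alt = NON_B73.get(founder, set())
    return tuple("alt" if marker in alt else "B73" for marker in MARKERS)


def group_by_haplotype(founders):
    """Map each observed haplotype to the founders carrying it, in input order."""
    groups = defaultdict(list)
    for founder in founders:
        groups[haplotype(founder)].append(founder)
    return dict(groups)


groups = group_by_haplotype(
    ["B97", "CML228", "Ki3", "Ky21", "M162W", "Oh43", "Oh7B", "Tx303"])
for hap, members in groups.items():
    print(sum(call == "alt" for call in hap), "non-B73 markers:", members)
```

The three classes (five, two, and zero non-B73 markers) line up with the high, intermediate, and B73-like oil effects in Fig. 4A.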
DISCUSSION
Joint-linkage analysis on the NAM population revealed that variation in kernel starch, protein, and oil content is controlled by at least 21 to 26 QTLs per trait, each with relatively small effects. We compared our NAM QTL results with previous biparental QTL studies in which we could determine the physical location of the markers. Previous QTL studies detected a wide range (0 to >50) in the number of kernel composition QTLs (Goldman et al., 1993, 1994; Sene et al., 2001; Dudley et al., 2004, 2007; Laurie et al., 2004; Clark et al., 2006; Dudley, 2008; Wassom et al., 2008). We found that fewer than one-half of these previously reported QTLs were detected in NAM (Supplemental Table S15). Several factors could be responsible for differences in the position and number of QTLs detected in NAM versus these studies, including variation in allele frequency, mapping resolution influenced by the magnitude of linkage disequilibrium in a population, marker density, environmental effects, and QTL analysis methods. The majority of the previous QTL studies used parental lines with extreme kernel composition phenotypes derived from the Illinois long-term selection program (Goldman et al., 1993, 1994; Laurie et al., 2004; Hill, 2005; Clark et al., 2006; Dudley et al., 2004, 2007; Wassom et al., 2008). The Illinois high- and low-oil and high- and low-protein populations were driven apart via artificial selection, and these populations likely accumulated additional variation controlled by small-effect QTLs (Moose et al., 2004). In contrast, the NAM parents were chosen to represent overall natural variation in maize rather than variation specific to kernel composition, resulting in less extreme kernel composition variation and therefore fewer QTLs (Yu et al., 2008).
While NAM was successful in capturing a representative sample of QTLs for kernel composition in naturally diverse germplasm, the 24 founders analyzed in this study do not possess all the phenotypic variation present in maize for kernel composition traits.
Because RILs are nearly homozygous, the heterozygous genotypes needed to estimate dominance are essentially absent. Additive × dominance and dominance × dominance epistatic interactions therefore cannot be measured with the RIL structure of NAM; however, NAM has excellent power to detect additive × additive epistasis. We found no evidence that additive × additive epistatic interactions are important for kernel composition traits in NAM; thus, the genetic architecture of kernel starch, protein, and oil content in the NAM population is characterized primarily by additive gene action. The lack of epistasis for kernel composition is consistent with other traits studied in NAM: flowering time, leaf morphology, and northern and southern leaf blight resistance (Buckler et al., 2009; Kump et al., 2011; Poland et al., 2011; Tian et al., 2011). In contrast, previous biparental kernel composition QTL studies reported minimal to substantial levels of epistasis (Goldman et al., 1993; Laurie et al., 2004; Dudley, 2008; Wassom et al., 2008). Variation in the number of epistatic interactions among studies is not uncommon and has been observed for numerous traits and species (Barton and Keightley, 2002; Holland, 2007; Hill et al., 2008; Phillips, 2008). Interestingly, two kernel composition studies that used either RILs or S2 lines derived from the same source exhibited contrasting levels of epistasis for oil content (Laurie et al., 2004; Dudley, 2008). The study using RILs found that variation in oil was predominantly explained by additive effects, leaving little variation for detection of epistatic effects (Laurie et al., 2004). In contrast, the study using S2 progeny found a higher proportion of oil variation explained by dominance effects and also detected substantial nonadditive epistatic interactions (Dudley, 2008).
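Why only additive × additive epistasis is testable in RILs can be seen from how genotypes are coded. This sketch assumes a common F2-style coding, which is not necessarily the coding used in the study. A marker scored as 0, 1, or 2 copies of the donor allele gets an additive term x = g − 1 and a dominance indicator d that equals 1 only for heterozygotes. In homozygous RILs every d is 0, so dominance columns, and any interaction involving them, are constant. The additive × additive product still varies. Genotypes below are hypothetical.

```python
def additive_code(g: int) -> int:
    """-1, 0, +1 for 0, 1, 2 copies of the donor (non-B73) allele."""
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2 donor-allele copies")
    return g - 1


def dominance_code(g: int) -> int:
    """1 for heterozygotes, 0 for either homozygote."""
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2 donor-allele copies")
    return 1 if g == 1 else 0


def interaction_columns(locus1, locus2):
    """Main-effect and two-locus interaction columns for paired genotype lists."""
    names = ("a1", "a2", "d1", "d2", "a1xa2", "a1xd2", "d1xa2", "d1xd2")
    cols = {name: [] for name in names}
    for g1, g2 in zip(locus1, locus2):
        a1, a2 = additive_code(g1), additive_code(g2)
        d1, d2 = dominance_code(g1), dominance_code(g2)
        values = (a1, a2, d1, d2, a1 * a2, a1 * d2, d1 * a2, d1 * d2)
        for name, value in zip(names, values):
            cols[name].append(value)
    return cols


# Hypothetical RIL genotypes: only homozygous classes (0 or 2) occur.
cols = interaction_columns([0, 2, 2, 0, 2, 0], [0, 0, 2, 2, 2, 0])
for name, values in cols.items():
    print(name, values, "varies" if len(set(values)) > 1 else "constant")
```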
One of the greatest challenges in developing varieties with desirable kernel quality characteristics in major crops [e.g. maize, wheat (Triticum aestivum), rice (Oryza sativa), soybean (Glycine max), and barley (Hordeum vulgare)] is the strong phenotypic correlation among kernel quality traits that can be attributed to pleiotropy (Simmonds, 1995; Ge et al., 2005; Panthee et al., 2005). Studies using parental lines derived from the Illinois long-term selection program (Goldman et al., 1994; Dudley et al., 2004, 2007; Clark et al., 2006; Wassom et al., 2008) suggest that kernel composition is regulated by a complex genetic network, resulting in strong phenotypic correlations and pleiotropic interactions, and that it will be difficult to develop maize germplasm with high starch, protein, and oil content simultaneously. Our analysis in NAM confirms that these traits are significantly correlated both phenotypically and genetically across diverse germplasm (Supplemental Table S8).
The NAM population was specifically constructed for high-resolution QTL dissection (Yu et al., 2008) and has proven valuable for GWAS (Kump et al., 2011; Poland et al., 2011; Tian et al., 2011). Inter- and intrachromosomal linkage disequilibrium among SNPs in the NAM founders was reduced during NAM population development through random chromosome assortment and recombination, thereby reducing spurious unlinked associations and increasing mapping resolution. Analysis of linkage disequilibrium in NAM indicated that GWAS resolution will vary but in specific cases appears sufficient to identify causal genes (Kump et al., 2011; Poland et al., 2011; Tian et al., 2011). We demonstrate the use of GWAS to identify DGAT1-2 as a strong candidate gene for a 23.5-cM oil QTL (m708; PZA03461.1) corresponding to an approximately 25-Mb genomic region on chromosome 6. GWAS identified two oil and two starch associations in DGAT1-2, a gene previously shown to influence oil content via a Phe insertion and to be responsible for the ln1 mutation (Zheng et al., 2008). While DGAT1-2 was not shown to affect starch content in Zheng et al. (2008), joint-linkage QTL analysis in NAM revealed that the oil QTL (m708; PZA03461.1) overlapping DGAT1-2 was significantly pleiotropic with starch (m707; PZB01658.1) and protein (m707; PZB01658.1; Supplemental Table S8). Detection of two GWAS SNPs with positive allelic effects on oil and two GWAS SNPs with negative effects on starch in DGAT1-2 further substantiates pleiotropic effects on kernel composition.
We complemented our studies in the NAM population with an AP of 282 ILs, using both candidate gene association and GWAS approaches to verify NAM GWAS hits. GWAS using the MaizeSNP50 BeadChip produced no significant associations after multiple hypothesis test correction. However, candidate gene association analysis proved effective: we detected a significant association between oil content and the Phe insertion previously identified in DGAT1-2 for increased oil (Zheng et al., 2008). Two additional SNPs on the MaizeSNP50 BeadChip located in the DGAT1-2 gene were significantly associated with oil content using the candidate gene approach.
Results from performing GWAS on both the NAM
population and the AP suggest that NAM may be better suited than the AP for detecting associations with small effects. NAM is genetically diverse, although it captures only 80% of the diversity in the AP; true associations involving alleles that are rare in the AP are likely to go undetected in that panel due to a lack of power. This is supported by the lower overlap between the NAM joint-linkage results and the GWAS AP hits (Supplemental Fig. S4) as compared with the NAM GWAS hits (Supplemental Fig. S3). Many associations detected by GWAS on the AP are undoubtedly real, as evidenced by the DGAT1-2 example. However, multiple test correction requires highly significant associations. As the number of SNPs available for GWAS approaches millions, it will become increasingly difficult to detect significant associations by GWAS in an AP of the present population size, especially for QTLs with small effects.
Other than DGAT1-2, we were surprised that we
did not detect additional NAM GWAS associationswith other classical kernel composition genes such aso2, pyruvate orthophosphate dikinase, amylose-free wx1(=starch-granule-bound nucleotide diphosphate-starch gluco-syl transferase), su1 (=isoamylase-type starch-debranchingenzyme), prolamine box binding factor1 (pbf1), sh2 (=ADPG-ppase), and zein protein genes despite substantial SNPcoverage within or around these genes (Mertz et al.,1964; Thompson and Larkins, 1989; Vicente-Carbajosaet al., 1997; Lambert, 2001; Schultz and Juvik, 2004;Hennen-Bierwagen et al., 2009). We did detect GWASassociations in several genes that are known to be im-portant enzymes in biochemical pathways that influ-ence starch, protein, and oil kernel content [i.e. carbonicanhydrase (RMIP 0.59); Suc synthase (RMIP 0.36); pyru-
vate kinase (RMIP 0.23); b-amylase2 (RMIP 0.20); nitratereductase (RMIP 0.07); and a-amylase (RMIP 0.06)].Interestingly, the majority of the significant GWASassociations located within annotated genes were ele-ments that regulate complex molecular pathways suchas transcription factors, zinc finger binding proteins,kinases, and the histone H1 variant H1.2 (Supplemen-tal Tables S9–S14). Transcription factors and zinc fingerbinding proteins, such as o2, WRINKLED1 (ZmWRI1),and pbf1, have already been shown to be key regula-tors of kernel composition pathways, and kinasesare essential for signal transduction and regulationof feedback loops (Vicente-Carbajosa et al., 1997;Manicacci et al., 2009; Pouvreau et al., 2011). Histonevariants, such as the H1.2 gene that we found to beassociated with oil (RMIP; 0.63), are not well charac-terized for kernel composition; however, chromatinremodeling has been implicated in regulation of kernelcomposition, and histone variants have been shown tobe involved with gene-specific transcription regula-tion (Ascenzi and Gantt, 1997; Vicente-Carbajosa et al.,1997; Locatelli et al., 2009; Miclaus et al., 2011). Wepropose the prevalence of GWAS associations in reg-ulatory elements with small effects is related to thedelicate balance necessary for an inbred breeding pro-gram; breeders must manipulate multiple pleiotropictraits while simultaneously improving the overall ag-ronomic performance of a new IL. For example, whilethe null mutant allele of o2 results in a dramatic increasein Lys content, it would likely be selected out of thebreeding population due to its substantial negativeagronomic effects (Gibbon and Larkins, 2005). Selectionof subtle changes in multiple regulatory elements is amore likely mode of action in a breeding program.
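The RMIP values quoted above are resample model inclusion probabilities: the proportion of resampled GWAS models in which a given SNP was selected. The minimal sketch below shows only that final tallying step. It assumes the resampling and model selection have already been run and that each resample is summarized as a set of selected SNP identifiers; the function name `rmip` and the SNP names are hypothetical and do not come from the study's software.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def rmip(selected_per_resample: Iterable[Set[str]]) -> Dict[str, float]:
    """Return, for each SNP, the fraction of resampled models that selected it.

    Each element of `selected_per_resample` is the set of SNP IDs chosen in one
    resample (an assumed input format). SNPs never selected are omitted.
    """
    models = list(selected_per_resample)
    if not models:
        raise ValueError("need at least one resampled model")
    # set(model) guards against a SNP being counted twice within one model.
    counts = Counter(snp for model in models for snp in set(model))
    return {snp: n / len(models) for snp, n in counts.items()}


# Hypothetical example: four resamples, one of which selected no SNPs.
example = rmip([{"snpA", "snpB"}, {"snpA"}, {"snpA", "snpC"}, set()])
print(example)
```

Reading the output, a SNP selected in three of four resamples has an RMIP of 0.75. Higher values indicate associations that are more robust to which lines happen to be sampled.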
Plant Physiol. Vol. 158, 2012 831 www.plantphysiol.org
Our DGAT1-2 results provide valuable confirmation that GWAS in the NAM population is capable of identifying genes underlying kernel composition QTLs. A broad inference about the accuracy of NAM GWAS for kernel composition is limited, however, by the small number of genes that have been verified to control natural variation in kernel composition. For example, we cannot rule out the possibility that the eight additional significant GWAS SNPs identified in the chromosome 6 oil QTL interval (m708; PZA03461.1) that are not located in the DGAT1-2 gene lie in valid candidate genes, because their function is currently unknown. Likewise, the lack of known genetic factors regulating quantitative variation in kernel composition (as opposed to the classical mutants with large effects) limits our ability to explore significant GWAS SNPs located outside the QTL intervals. Further analysis of the additional significant GWAS associations will help determine whether they arise because, under some conditions, the biallelic GWAS methods have more power than the multi-allelic QTL methods to detect weak QTL effects, or whether they are false positives caused by linkage disequilibrium within chromosomes combined with insufficient SNP coverage in the causative gene (Gore et al., 2009; Kump et al., 2011; Tian et al., 2011). Significant SNPs should not be ignored, as they could represent real QTLs, but they should be approached with caution, as they may be aberrations due to extended linkage disequilibrium. Characterization of the candidate genes responsible for kernel composition QTLs, such as the regulatory elements discussed above, will provide valuable information that can be used to "train" GWAS to detect genes associated with kernel composition traits.
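The earlier point that multiple test correction demands ever more extreme P values as SNP numbers grow can be made concrete with a Bonferroni cutoff, alpha divided by the number of tests. Bonferroni is used here only as an illustrative assumption; this section does not say which correction the study applied. The test counts are hypothetical round numbers, not the study's SNP totals. They contrast a candidate gene test of a few SNPs with array-scale and sequencing-scale GWAS, which is one reason a candidate gene analysis can reach significance when genome-wide GWAS does not.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that controls the family-wise error rate at alpha."""
    if not 0 < alpha < 1 or n_tests < 1:
        raise ValueError("alpha must be in (0, 1) and n_tests must be >= 1")
    return alpha / n_tests


# Illustrative test counts only (assumptions, not the study's numbers).
scenarios = [
    ("candidate gene, 3 SNPs", 3),
    ("~50k-SNP array", 50_000),
    ("1M SNPs", 1_000_000),
    ("10M SNPs", 10_000_000),
]
for label, m in scenarios:
    p = bonferroni_threshold(0.05, m)
    print(f"{label:>24}: P < {p:.2e}  (-log10 P > {-math.log10(p):.2f})")
```

Each tenfold increase in the number of tests lowers the cutoff tenfold, adding one unit on the -log10 scale. A small-effect QTL must therefore reach a steadily more extreme P value in a panel whose size stays fixed.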
We have identified many favorable alleles for improving starch, protein, and oil content in maize relative to B73. While B73 had the highest starch content of the NAM founders, it does not carry all the favorable alleles at the QTLs we identified. In fact, substituting the most favorable allele at 12 QTLs is predicted to increase the starch content of B73 from 69.6% to 79.2%. Even more striking is the potential to increase the oil content of B73 from 3.6% to 7.2% by selecting favorable alleles at 17 QTLs. The most favorable alleles are dispersed among 10 of the NAM parents in the case of starch and among 12 of the NAM parents for oil. Thus, a large, intermated population of the NAM parents would be required to bring together all these favorable alleles in a breeding program focused on kernel composition.
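The predicted gains above amount to 79.2 − 69.6 = 9.6 percentage points of starch across 12 QTLs and 7.2 − 3.6 = 3.6 points of oil across 17 QTLs, a doubling of B73 oil content. The minimal sketch below shows how such a prediction can be formed under a purely additive model (no epistasis or interaction). It also records which founders donate the best alleles, which is why stacking them requires intermating many NAM parents. The additive model is an assumption, since this section does not describe the prediction method. The function name, founder labels, and effect sizes are hypothetical and are not the study's estimates.

```python
from typing import Dict, List, Set, Tuple


def stack_best_alleles(
    base_value: float, qtl_effects: List[Dict[str, float]]
) -> Tuple[float, Set[str]]:
    """Add the best positive allele effect at each QTL to a base line's value.

    Each dict maps founder -> allele effect relative to the base line, whose own
    allele is taken as 0. A QTL contributes only if some founder beats the base
    allele. Returns (predicted value, set of donor founders).
    """
    gain = 0.0
    donors: Set[str] = set()
    for effects in qtl_effects:
        if not effects:
            continue
        donor, effect = max(effects.items(), key=lambda item: item[1])
        if effect > 0:
            gain += effect
            donors.add(donor)
    return base_value + gain, donors


# Hypothetical effects at three QTLs, starting from B73's 3.6% oil.
value, donors = stack_best_alleles(
    3.6,
    [{"founder_A": 0.4, "founder_B": -0.1}, {"founder_C": 0.25}, {"founder_B": -0.2}],
)
print(round(value, 2), sorted(donors))
```

In this toy case, the third QTL contributes nothing because no founder beats the base allele. The two useful alleles come from different founders, mirroring how favorable alleles for starch and oil are scattered across many NAM parents.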
In conclusion, the successful resolution of the genetic architecture of kernel composition demonstrates the power of NAM. Analysis of the DGAT1-2 gene shows that NAM has sufficient mapping resolution to identify significant associations between traits and functional genes. Many of the significant GWAS SNP associations we detected are located in uncharacterized genes (Supplemental Tables S9–S14); hence, better gene annotation of the B73 reference genome and additional experiments will be required to determine whether these genes indeed influence kernel composition. As marker coverage of the NAM RIL population increases and the locations of recombination events are refined, the ability to detect additional functional polymorphisms will also improve. Results from this study can be used directly to develop maize germplasm with improved kernel composition traits.
MATERIALS AND METHODS
Materials and Phenotypic Analysis
Development of the NAM population has been previously described
(Buckler et al., 2009; McMullen et al., 2009). The present study utilized 4,699
RILs genotyped with 1,106 SNPs. Similarly, the 282 IL AP was selected to
represent the genetic diversity found in worldwide collections of publicly
available germplasm (Flint-Garcia et al., 2005).
The NAM population and AP were planted in seven locations: five