Genetic Diversity and the Effects of Artificial Selection in Maize
Feb 12, 2016
Maize Diversity Project Team
Molecular Diversity
How has selection shaped molecular diversity in maize?
What is the relationship of selected genes to agronomic traits?
Goal: Identify genes exhibiting selection (domestication, agronomic improvement, and local adaptation)
Community resource: SNP marker collection
[Figure: photos of teosinte, landraces, and inbreds/hybrids. Photos courtesy J. Doebley]
Major predictions for the model
The genes that have contributed most to maize improvement, i.e. those that have experienced the strongest history of selection, have the least genetic variability left to contribute to crop improvement by classical breeding.
These genes will not be detected in standard QTL experiments because all lines will contain similar alleles.
Can we develop genomics screens to identify genes that have undergone selection?
Invariant SSR approach (Vigouroux et al. 2002 PNAS 99:9650)
Directly contrast sequence diversity among teosintes and inbreds (Wright et al. 2005 Science 308:1310)
Are genes with low inbred diversity enriched for selected genes? (Yamasaki et al. 2005 Plant Cell 17:2859)
[email protected] for .pdfs
Summary of Sequencing on Random Genes (Irie Vroh Bi, Masanori Yamasaki, Kate Houchins)
MPZ inbreds: (temperate) B73(2), Mo17(2), Hp301, Il14H, Ky21, M37W, Oh43; (tropical) CML69, CML247, CML322, CML333, KUI3, KUI11, NC350.
MPZ inbreds: 1095 alignments, 6169 SNPs.
MPZ inbreds + 16 teosinte partial inbreds: 774 alignments, 3463 SNPs in MPZ inbreds, 6136 SNPs in teosintes.
[Figure: histogram of π for 1095 genes in maize inbreds. x-axis: π, 0 to ≥0.03; y-axis: 0 to 120. Average π = 0.0067]
Sequence statistics for 1095 genes for diverse maize inbred lines.
            N      L       Total L   S     Total S   π
All Maize   13.1   280.4   307034    5.6   6169      0.0067
Temperate   6.7    292.2   310306    4.3   4560      0.0065
Tropical    6.6    290.8   308816    4.2   4427      0.0061
N = number of sequences, L = length of alignment, S = number of segregating sites, π = average number of pairwise differences per bp.
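The statistics defined above can be computed from a single alignment roughly as follows. This is a minimal Python sketch, not the project's pipeline: the function name `diversity_stats` and the toy alignment are hypothetical, and the code assumes gap-free sequences of equal length.

```python
from itertools import combinations

def diversity_stats(seqs):
    """Return (N, L, S, pi) for a list of aligned, equal-length sequences.

    N  = number of sequences
    L  = alignment length in bp
    S  = number of segregating (polymorphic) sites
    pi = average number of pairwise differences per bp
    Assumes no gaps or missing data (a simplification).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    # A site is segregating if more than one base is seen in its column.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Average the number of mismatches over all unordered sequence pairs.
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    pi = total_diffs / len(pairs) / length
    return n, length, s, pi

# Hypothetical toy alignment, for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
n, length, s, pi = diversity_stats(toy)
print(n, length, s, round(pi, 4))
```

Averaging N, L, S and π over all alignments would give per-gene means comparable in form to the table rows.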
Inbred-Teosinte Sequence Summary
• Number of alignments >5 in both sets: 774
• Average sample size inbreds: 12.0
• Average sample size teosinte: 12.7
• Average alignment length: 294
• Total SNPs in inbreds: 3463
• Total SNPs in teosintes: 6136
Diversity in maize inbreds vs. teosinte
[Figure: scatter plot of π in inbreds (y-axis, 0 to 0.07) against π in teosintes (x-axis, 0 to 0.08)]
Average π inbred/π teosinte: 0.57
Excluding π inbred = 0 values: 0.63
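The two averages reported for this comparison (with and without genes that have zero diversity in inbreds) could be computed along the following lines. This sketch assumes the average is a mean of per-gene ratios, which the slide does not state. The function name and the input values are hypothetical.

```python
def mean_diversity_ratio(pi_inbred, pi_teosinte, exclude_zero_inbred=False):
    """Mean of per-gene pi_inbred / pi_teosinte ratios.

    Genes with pi_teosinte == 0 are skipped because the ratio is undefined.
    With exclude_zero_inbred=True, genes with pi_inbred == 0 are skipped too.
    """
    ratios = []
    for inb, teo in zip(pi_inbred, pi_teosinte):
        if teo == 0:
            continue
        if exclude_zero_inbred and inb == 0:
            continue
        ratios.append(inb / teo)
    if not ratios:
        raise ValueError("no genes left after filtering")
    return sum(ratios) / len(ratios)

# Hypothetical per-gene values, for illustration only.
pi_inbred = [0.000, 0.004, 0.010, 0.006]
pi_teosinte = [0.010, 0.008, 0.010, 0.012]
all_genes = mean_diversity_ratio(pi_inbred, pi_teosinte)
nonzero = mean_diversity_ratio(pi_inbred, pi_teosinte, exclude_zero_inbred=True)
print(round(all_genes, 2), round(nonzero, 2))
```

Dropping the zero-diversity genes raises the average, which is the direction of the difference between the two reported values.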
To identify the selected genes we need new statistical approaches
• There are two models: a selection model and a bottleneck model
• We must estimate the size of the bottleneck
• For each model, we estimate the probability of the data given the model (the likelihood) for each gene
• This is very simulation- and computer-intensive!
• This approach allows us to estimate the proportion of genes under selection and to identify the candidates
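To give a sense of why this is simulation-intensive, here is a minimal Python sketch of a neutral coalescent simulation with a bottleneck. It is not the authors' method: it ignores recombination, it models population size as piecewise constant, and every name and parameter value in it (`simulate_s`, `prob_s_at_most`, the epoch durations and sizes, `theta`) is a hypothetical choice for illustration. It estimates how often a neutral model yields as few segregating sites as were observed.

```python
import random

def poisson(rng, mean):
    """Draw a Poisson variate by counting unit-rate arrivals before `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_s(rng, n, theta, epochs):
    """Simulate the number of segregating sites S for n sequences.

    theta  = 4 * N_ref * mu for the whole locus.
    epochs = list of (duration, relative_size) going back in time, with
             durations in units of 2 * N_ref generations and sizes relative
             to N_ref. The last epoch should have duration float("inf").
    """
    k = n               # lineages that have not yet coalesced
    tree_length = 0.0   # total branch length, units of 2 * N_ref generations
    for duration, rel_size in epochs:
        remaining = duration
        while k > 1:
            rate = k * (k - 1) / 2 / rel_size  # coalescence rate in this epoch
            wait = rng.expovariate(rate)
            if wait >= remaining:
                # No coalescence before the epoch ends; go to the next epoch.
                tree_length += k * remaining
                break
            tree_length += k * wait
            remaining -= wait
            k -= 1
        if k == 1:
            break
    # Mutations fall on the tree as a Poisson process with rate theta / 2.
    return poisson(rng, theta / 2 * tree_length)

def prob_s_at_most(observed_s, n, theta, epochs, reps=2000, seed=1):
    """Monte Carlo estimate of P(S <= observed_s) under the neutral model."""
    rng = random.Random(seed)
    hits = sum(simulate_s(rng, n, theta, epochs) <= observed_s
               for _ in range(reps))
    return hits / reps

# Hypothetical epochs: recent period, then a bottleneck, then ancestral size.
bottleneck = [(0.01, 1.0), (0.02, 0.05), (float("inf"), 1.0)]
constant = [(float("inf"), 1.0)]
p_bottleneck = prob_s_at_most(0, n=14, theta=3.0, epochs=bottleneck)
p_constant = prob_s_at_most(0, n=14, theta=3.0, epochs=constant)
print(p_bottleneck, p_constant)
```

A bottleneck alone makes zero-diversity genes more likely than under constant population size, which is why the bottleneck has to be estimated before a lack of diversity can be attributed to selection.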
[Figure: population size diagrams for the neutral and selected models, labeled with parameters t1, t2, Na, Nb and Np]
Two models: to be considered selected, a gene needs to fail the neutral model and be accepted by the selected model.
Locus         S inb.   S teo.   Probability of being   Annotated BLAST hit
                                in selected class
scl394_p3**   0        27       0.74                   Arabidopsis thaliana L28 ribosomal protein
scl491_p3**   0        13       0.62                   Maize dihydrodipicolinate synthase
scl405_p3**   0        12       0.59                   Unknown expressed protein
scl427_p2*    0        16       0.54                   A. thaliana DNAJ heat shock protein
scl526_p3**   1        16       0.54                   Maize hexokinase
scl499_p5**   0        12       0.51                   Unknown expressed protein
scl512_p1**   0        16       0.51                   Triticum adenylosuccinate synthetase
scl536_p4**   0        17       0.49                   Oryza sativa putative acetyl transferase
scl531_p4**   0        11       0.46                   Oryza sativa putative auxin-induced protein
scl457_p4*    0        7        0.45                   Oryza sativa putative growth factor
Genes significant for selection
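One way to read a "probability of being in selected class" is as a posterior probability in a two-class mixture of neutral and selected genes. The sketch below is written under that assumption and is not necessarily how the values in the table were obtained. The function name and the likelihood values are hypothetical. The prior of 0.04 is the proportion used in the genome-scale estimate elsewhere in this talk.

```python
def posterior_selected(lik_neutral, lik_selected, prior_selected):
    """Posterior probability that a gene belongs to the selected class.

    lik_neutral    = likelihood of the gene's data under the neutral model
    lik_selected   = likelihood of the gene's data under the selection model
    prior_selected = assumed proportion of genes under selection
    """
    weighted_selected = prior_selected * lik_selected
    total = weighted_selected + (1.0 - prior_selected) * lik_neutral
    if total == 0:
        raise ValueError("both likelihoods are zero")
    return weighted_selected / total

# Hypothetical likelihoods for one gene, for illustration only.
p = posterior_selected(lik_neutral=0.002, lik_selected=0.05, prior_selected=0.04)
print(round(p, 2))
```

Because the prior proportion of selected genes is small, the data must be far more likely under the selection model than under the neutral model before the posterior passes one half.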
On a genomic scale….
• Assume 40,000 genes in maize
• 40,000 x 0.04 = 1600 selected genes
• Before genome scans, 11 genes had been identified as selected by population genetic approaches
• By sequencing 1000 genes, have ~30 novel candidates
• These genes need to be divided between domestication and improvement
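The extrapolation in these bullets is simple arithmetic, written out below as a small Python sketch. The gene total, the proportion and the sample size come from the slide. The variable names are hypothetical, and applying the same 0.04 proportion to the 1000 sequenced genes is an assumption made here for comparison with the ~30 candidates reported.

```python
total_genes = 40_000        # assumed number of genes in maize (from the slide)
selected_fraction = 0.04    # proportion of genes under selection (from the slide)
sequenced_genes = 1_000     # genes sequenced so far (from the slide)

# Genome-wide extrapolation: 40,000 x 0.04.
expected_genome_wide = round(total_genes * selected_fraction)

# Assumption for comparison: the same proportion applies to the sequenced sample.
expected_in_sample = round(sequenced_genes * selected_fraction)

print(expected_genome_wide, expected_in_sample)
```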
What genes show evidence of selection?
• Genes involved in amino acid synthesis or metabolism
• Genes involved in growth response
• Transcription factors and signal transduction components
• Unique genes with no significant BLAST homologies
Are genes with low inbred diversity enriched for domestication and improvement candidates?
(Masanori Yamasaki)
Chose 35 genes with no diversity among the MPZ inbred set.
Sequenced the same region in 16 haploid landrace samples, 16 teosinte partial inbreds and a Tripsacum dactyloides sample.
Performed Hudson-Kreitman-Aguadé (HKA) tests for selection on inbreds, landraces and teosintes against the neutral genes adh1, glb1, fus6 and bz2.
Performed coalescent simulations of domestication (CS) of inbreds vs. teosintes and landraces vs. teosintes.
[Figure: π along nucleotide position (bp) in inbreds and teosintes, one panel per gene: amino acid transporter, GTP-binding protein, unknown, fruit protein, chromatin remodeling, F-box (circadian clock), ankyrin repeat, ARF]
           Inbreds                                               Teosintes
Unigene    N    L       S    P (HKA total)   P (HKA silent)     N    L       S    P (HKA total)   P (HKA silent)   Candidate status   Homology search
AY108876   14   1,055   1    < 0.0068 **     < 0.0120 *         16   1,026   13   < 0.0433 *      < 0.1849         Selected Gene      Amino acid transporter
AY107195   14   3,119   1    < 0.0058 **     < 0.0087 **        11   3,097   81   < 0.5889        < 0.6761         Selected Gene      Auxin response factor
AY110109   14   1,466   1    < 0.0051 **     < 0.0054 **        14   1,355   43   < 0.3613        < 0.5321         Selected Gene      GTP-binding protein
AY105060   14   1,090   0    < 0.0041 **     < 0.0053 **        15   1,112   59   < 0.7005        < 0.7631         Selected Gene
AY108178   14   1,259   0    < 0.0054 **     < 0.0082 **        13   1,224   54   < 0.3233        < 0.4719         Selected Gene      Circadian clock
AY106616   14   2,745   84   < 0.4395        < 0.2205           7    2,619   97   < 0.6859        < 0.7214         -                  Ankyrin repeat-like protein
AY107952   14   2,469   23   < 0.1193        < 0.0927           14   2,599   38   < 0.1453        < 0.1678         -                  Putative fruit protein, Oxidoreductase
AY106371   14   1,574   4    < 0.0094 **     < 0.0061 **        15   1,615   65   < 0.4603        < 0.4047         Selected Gene      Putative methyl-binding domain protein
Do genes exhibiting signatures of selection control agronomic traits?
(Sherry Flint-Garcia)
• Hypothesis: manipulation of the expression of domestication and improvement genes will alter key agronomic traits
• Methods: use genetic and transgenic approaches to examine teosinte, exotic, and inbred alleles
• Test case: amino acid composition in kernels
• Evidence for selection for cysteine synthase, chorismate mutase, dihydrodipicolinate synthase and hexokinase
To what extent has diversity in amino acid synthesis genes been reduced by selection? (Sherry Flint-Garcia)
• Whitt et al., 2002 demonstrated that 3 of 6 genes in the starch synthesis pathway in maize show solid evidence of artificial selection
• Evidence for selection for cysteine synthase, chorismate mutase, dihydrodipicolinate synthase and hexokinase from random sequencing
• Chose 16 additional genes for important steps in amino acid synthesis, sequenced in teosintes, landraces and inbreds and conducted tests of selection
[Figure: bar charts of amino acid content, as percent of total amino acid and as percent of kernel weight, in teosinte (n = 7), landraces (n = 11) and maize (n = 27). Categories: total amino acid, alanine, arginine, aspartic acid, cysteine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, valine. Significance markers (** or ns) are shown for teosinte vs. landraces and for teosinte vs. inbred lines.]
[Figure: amino acid biosynthesis pathway diagram, from glucose through 3-phosphoglycerate, phosphoenolpyruvate, pyruvate, acetyl-CoA and the TCA cycle to the individual amino acids. Labeled enzymes and genes: cysteine synthase, 2-isopropylmalate synthase, DHDP synthase, cystathionine γ-synthase, SAM synthetase I, SAM synthetase II, aspartate aminotransferase, asparagine synthetase, aspartate kinase, threonine deaminase, acetohydroxyacid synthase, glutamate dehydrogenase, proline dehydrogenase, nitrate reductase, hexokinase (N:C sensing), PAL, chorismate mutase, anthranilate synthase β, tryptophan synthase β1, and ntl1 (nitrogen regulating protein).]
Sequencing candidate genes
• Goal is to sequence 1000 candidate genes in all inbreds for the 25DL, 16 teosintes, 2 Tripsacum, and W22 R-std
• Shared responsibility by E. Buckler and M. McMullen laboratories
• Develop SNP (or sequence) based assays for association analysis
• Develop a mechanism to accept candidate gene suggestions from outside the project
• www.panzea.org
[Figure: diversity levels of 100%, 80% and 60%, with gene groups labeled 38,000 genes, 1,000 genes and 1,000 genes]
Implications for GEM
• For the vast majority of genes, inbred lines retain on average 60% of the common diversity of teosinte and 80% of the diversity of landraces. Loss of diversity is therefore a problem specific to particular genes and traits rather than a general problem.
• Most of the diversity lost in unselected genes is in rare alleles and therefore hard to capture
Implications for GEM
• Our studies to date have not addressed specific adaptation, possibly a more important justification for GEM than limited diversity per se
• It is hard for me to think about how to tap diversity for specific adaptation without considering diversity in a trait context.