Crop design with genomics and natural diversity
Edward Buckler, USDA-ARS, Cornell University
http://www.maizegenetics.net
Goal: Create the global model to decrease cycle time
Standard breeding (5-year cycle):
Make crosses → Inbreed → Small-scale hybrid trials → Large-area hybrid trials → Sell or release winning hybrids
Genomic selection (GS) (4-month cycle; 4 years to release):
Make crosses → Doubled haploid → Genotype → Predict value with the model (which also draws on data from other efforts) → Small-scale hybrid trials → Large-area hybrid trials → Sell or release winning hybrids
With perfect knowledge, GS could run 15X faster; the current reality is ~3X.
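The 15X figure matches the cycle lengths on this slide: 60 months for standard breeding divided by 4 months for GS is 15. A common way to see why "perfect knowledge" also matters is the breeder's equation, where expected genetic gain per year is i · r · σA / L (selection intensity, selection accuracy, additive genetic standard deviation, cycle length in years). The minimal sketch below assumes i and σA are equal for both schemes; the accuracy values in the usage note are hypothetical and only show how a ~3X outcome could arise.

```python
def gain_per_year(i: float, r: float, sigma_a: float, cycle_years: float) -> float:
    """Breeder's equation: expected genetic gain per year = i * r * sigma_A / L."""
    return i * r * sigma_a / cycle_years


def relative_rate(r_std: float, cycle_std: float, r_gs: float, cycle_gs: float,
                  i: float = 1.0, sigma_a: float = 1.0) -> float:
    """How many times faster GS gains per year than standard breeding.

    Assumption: selection intensity i and additive SD sigma_a are the same for
    both schemes, so only accuracy r and cycle length (years) drive the ratio.
    """
    return (gain_per_year(i, r_gs, sigma_a, cycle_gs)
            / gain_per_year(i, r_std, sigma_a, cycle_std))


# Cycle lengths from the slide: 5 years (standard) vs. 4 months (GS).
STANDARD_CYCLE_YEARS = 5.0
GS_CYCLE_YEARS = 4.0 / 12.0
```

If GS predictions are as accurate as the selection they replace, `relative_rate(1.0, STANDARD_CYCLE_YEARS, 1.0, GS_CYCLE_YEARS)` gives 15 (up to floating-point rounding); with a hypothetical GS accuracy one fifth of that, `relative_rate(1.0, 5.0, 0.2, GS_CYCLE_YEARS)` gives about 3.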
GS versus GWAS
• Same data: genome wide marker and
phenotypes, different statistics
• GWAS – Genome Wide Association
Studies aim to identify causative
genes and variants
• GS – Genomic Selection aims to predict
phenotype using the complete genotype
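To make the distinction concrete, here is a minimal pure-Python sketch that runs both statistics on the same marker matrix and phenotypes. The GWAS side fits each marker separately (a single-marker regression slope); the GS side fits all markers jointly with ridge regression and predicts phenotype from the complete genotype. Real analyses typically add kinship or population-structure terms and tune the shrinkage; the function names, the toy data, and the penalty `lam` are illustrative assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def center(values: Sequence[float]) -> List[float]:
    """Subtract the mean so the regressions need no separate intercept."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def gwas_scan(X: Matrix, y: Sequence[float]) -> List[float]:
    """GWAS-style: regress y on each marker separately; one slope per marker."""
    yc = center(y)
    slopes = []
    for j in range(len(X[0])):
        xc = center([row[j] for row in X])
        sxx = sum(v * v for v in xc)
        slopes.append(sum(a * b for a, b in zip(xc, yc)) / sxx if sxx else 0.0)
    return slopes


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_gs(X: Matrix, y: Sequence[float], lam: float) -> List[float]:
    """GS-style: fit all marker effects jointly via (X'X + lam*I) beta = X'y on centered data."""
    p = len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    yc = center(y)
    A = [[sum(a * b for a, b in zip(cols[i], cols[k])) + (lam if i == k else 0.0)
          for k in range(p)] for i in range(p)]
    rhs = [sum(a * b for a, b in zip(cols[i], yc)) for i in range(p)]
    return solve(A, rhs)


def predict_gs(X: Matrix, y: Sequence[float], beta: Sequence[float],
               x_new: Sequence[float]) -> float:
    """Predict a new line's phenotype from its complete genotype and fitted effects."""
    means = [sum(row[j] for row in X) / len(X) for j in range(len(beta))]
    return sum(y) / len(y) + sum(b * (v - m) for b, v, m in zip(beta, x_new, means))
```

With `X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [2, 2]]` and `y = [1, 1, 3, 3, 5, 5]` (phenotype depends only on marker 0), `gwas_scan` reports a slope of 2.0 for marker 0 and about 1.41 for the correlated marker 1, while `ridge_gs(X, y, 1e-9)` assigns effects of about 2.0 and 0.0.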
Challenges
• Genotype to unite the world’s germplasm
resources
• Resolve complex traits finely enough that perfect LD
(linkage disequilibrium) remains for more than 15 meioses
• Collect relevant trait and environmental data and
mathematically model their interaction
• Deploy
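One way to read the 15-meioses target: under a simple random-mating model, linkage disequilibrium D between a marker and a causal variant shrinks by a factor of (1 - c) per meiosis, where c is their recombination fraction, so after t meioses D_t = D_0 (1 - c)^t. Only the causal variant itself (c = 0) keeps perfect LD indefinitely. The sketch just evaluates that formula; treating each generation as one meiosis is a simplifying assumption.

```python
def ld_remaining(c: float, meioses: int) -> float:
    """Fraction of the initial linkage disequilibrium D_0 left after `meioses` meioses.

    Uses D_t = D_0 * (1 - c) ** t, where c is the recombination fraction between
    the marker and the causal variant (simple random-mating model).
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return (1.0 - c) ** meioses


for c in (0.0, 0.01, 0.1):
    print(f"c = {c:<5} LD left after 15 meioses: {ld_remaining(c, 15):.3f}")
```

This prints 1.000, 0.860, and 0.206: even a marker about 1 cM away loses roughly 14% of its LD over 15 meioses, and one 10 cM away keeps only about a fifth.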
The Maize Diversity Project
McMullen & Flint-Garcia, at University
of Missouri
Holland, at North Carolina State Univ.
Ware, at Cold Spring Harbor Lab.
Sun & Kresovich, Cornell University
Doebley, University of Wisconsin
USDA-ARS & NSF Plant Genome
www.panzea.org
Unite world’s
germplasm diversity
Maize has more molecular diversity
than humans and apes combined
Silent diversity (Zhao PNAS 2000; Tenaillon et al., PNAS 2001)
[Figure: silent nucleotide diversity values of 1.34%, 0.09%, and 1.42%]
Maize likely has functional variation at every gene. In total, there could
be hundreds of thousands of functional SNPs (single nucleotide polymorphisms)
Only 50% of the maize genome is
shared between two varieties
Fu & Dooner 2002, Morgante et al. 2005, Brunner et al 2005
Numerous presence/absence variants (PAVs) and copy number variants (CNVs): Springer, Lai, Schnable in 2010
[Figure: genome shared between any two individuals: 50% among maize plants vs. 99% among people]
Maize genetic variation has been
evolving for 5 million years
[Figure: timeline from 5 million years ago (mya) to the present, spanning the warm Pliocene and the cold Pleistocene. Maize labels: modern variation begins evolving, sister genus diverges, Zea species begin diverging, maize domesticated. Human labels include divergence from chimps, Ardipithecus, Australopithecus, Homo erectus, modern humans, and modern variation begins.]
The Maize HapMapV2 Project
Ware, at Cold Spring Harbor Lab.
Ross-Ibarra, Univ. California, Davis
X. Xun & S. Chi, Beijing Genome Inst.
Y. Xu, CIMMYT
J. Lai, Chinese Agri. Univ.
Q. Sun, Cornell Univ.
N. Springer, Univ. of Minnesota
McMullen, at University of Missouri
Doebley & Kaeppler, Univ. of Wisconsin
USDA-ARS, NSF, BGI, JGI
Maize HapMap2
• Increase the breadth of samples (teosinte, landraces, improved lines)
– All inbred lines
• Whole genome shotgun, Illumina paired-end, 76-100 bp
• 103 lines, 13 billion reads, 1 Tbp of sequence
• Median 5X coverage
[Figure: sequence reads (Gbp, 0-800) by sample group]
• Maize improved lines (including NAM): 60 inbred lines
• Maize landraces: 23 inbred lines
• Teosinte (Zea mays ssp. parviglumis): 17 inbred lines
• Teosinte (Zea mays ssp. mexicana): 2 inbred lines
• Tripsacum dactyloides: 1 sample
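The HapMap2 totals (1 Tbp over 103 lines, median 5X) can be sanity-checked with simple coverage arithmetic. The sketch assumes a maize genome of about 2.3 Gbp and the Poisson (Lander-Waterman) model of read placement, which ignores repeats and library biases, so real coverage in a repetitive genome is less even than this.

```python
import math

# Assumption: approximate size of the maize B73 reference genome (~2.3 Gbp).
MAIZE_GENOME_BP = 2.3e9


def mean_coverage(total_bp: float, n_lines: int, genome_bp: float = MAIZE_GENOME_BP) -> float:
    """Average sequencing depth per line: total bases / (lines * genome size)."""
    return total_bp / (n_lines * genome_bp)


def fraction_covered(min_reads: int, coverage: float) -> float:
    """Poisson (Lander-Waterman) fraction of sites hit by at least `min_reads` reads."""
    below = sum(math.exp(-coverage) * coverage ** k / math.factorial(k)
                for k in range(min_reads))
    return 1.0 - below


print(f"mean depth: {mean_coverage(1e12, 103):.1f}X")
for k in (1, 2):
    print(f"sites with >= {k} reads at 5X: {fraction_covered(k, 5.0):.3f}")
```

The mean works out to about 4.2X, the same range as the reported median of 5X (mean and median need not match, and depth differed between sample groups). At 5X the Poisson model puts roughly 99% of sites under at least one read and 96% under at least two.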
The Warning (and It Applies to Many Other Studies)
• CSHL and BGI alignment pipelines agree only 50% of the time
on the same data
• ~160M SNPs identified; most probably really exist somewhere
– BUT MOST DO NOT EXIST WHERE THEY WERE ALIGNED
– GENETIC AND EVOLUTIONARY CONTROLS ARE NEEDED
• >50% errors if standard pipelines are accepted as-is
• 55M SNPs pass various population and genetic
filters
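Agreement between two pipelines can be quantified by comparing their call sets. The slide does not say which measure was used, so the Jaccard index below (SNPs called by both divided by SNPs called by either) is an assumption, as is the record layout. Keeping only the intersection is a conservative first filter, not a substitute for the genetic and evolutionary controls described above.

```python
from typing import Set, Tuple

# Hypothetical SNP record: (chromosome, position, alternate allele).
Snp = Tuple[str, int, str]


def concordance(calls_a: Set[Snp], calls_b: Set[Snp]) -> float:
    """Jaccard agreement: SNPs called by both pipelines / SNPs called by either."""
    union = calls_a | calls_b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(calls_a & calls_b) / len(union)


def consensus(calls_a: Set[Snp], calls_b: Set[Snp]) -> Set[Snp]:
    """Conservative filter: keep only SNPs both pipelines report at the same place."""
    return calls_a & calls_b


pipeline_a = {("1", 100, "A"), ("1", 200, "T")}
pipeline_b = {("1", 100, "A"), ("2", 50, "G")}
print(concordance(pipeline_a, pipeline_b), consensus(pipeline_a, pipeline_b))
```

In this toy example the two call sets share one of three distinct SNPs, so agreement is one third.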
HapMapV2 Results
• 55M SNPs identified
• Domestication and improvement loci found
• Copy number and PAV identified
– 80-90% of the genome in flux
– Explain many QTL
Genotyping By Sequencing
GBS: Reduced representation sequencing for rapidly genotyping highly diverse species
RJ Elshire, JC Glaubitz, Q Sun, JA Poland, K Kawamoto, ES Buckler, and SE Mitchell
Institute for Genomic Diversity
PLoS ONE 2011
http://www.maizegenetics.net/
What is GBS?
• Use next-generation sequencing to genotype a reduced-representation
portion of a genome
• RAD, RRL, CROPS, GBS
• Molecularly, the most effective approaches use restriction enzymes
– The first maize HapMap used RRL (Gore et al. 2009 Science)
– Recent efforts aim to drive the price down
Expectation of marker distribution
• Biparental population: presence/absence 50%, non-polymorphic 18%, biallelic 17%, too repetitive 15%
• Across the species: presence/absence 50%, multiallelic 34%, too repetitive 15%, non-polymorphic 1%
GBS 96-plex Protocol (http://www.maizegenetics.net/)
1. Plate DNA and adapter pair
– Barcode adapter: "sticky ends" plus a 4-8 bp barcode
– Common adapter
– Primer 1 and primer 2
2. Digest DNA with a methylation-sensitive restriction enzyme: ApeKI (5-base cutter) or PstI (6-base cutter)
3. Ligate adapters
(Steps 2 and 3 may be done simultaneously)
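The digestion step can be mimicked in silico to see which fragments a reduced-representation library would sample. ApeKI recognizes GCWGC (W = A or T) and cuts after the leading G, so every fragment downstream of a cut begins with CWGC. This sketch works on the top strand only and ignores the 3-base overhang, methylation sensitivity, and the fact that only adapter-ligated fragments short enough to amplify get sequenced; the function names are hypothetical.

```python
import re
from typing import List

# ApeKI recognition site G^CWGC (W = A or T); the enzyme cuts after the leading G.
# A lookahead is used so that overlapping sites such as GCAGCAGC are all found.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def apeki_cut_positions(seq: str) -> List[int]:
    """0-based top-strand cut positions (just after the G of each GCWGC site)."""
    return [m.start() + 1 for m in APEKI_SITE.finditer(seq.upper())]


def digest(seq: str) -> List[str]:
    """Split the top strand at every ApeKI cut (overhang and methylation ignored)."""
    bounds = [0] + apeki_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

For example, `digest("AAGCAGCTTTGCTGCAA")` returns `['AAG', 'CAGCTTTG', 'CTGCAA']`; each fragment after a cut starts with CAGC or CTGC, the same prefixes GBS reads carry.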
GBS 96-plex Protocol (continued)
4. Pool DNAs
5. Clean-up
6. PCR with primers
7. Evaluate fragment sizes
8. Sequence (8 x 96 samples per flowcell): 1.3 million reads per sample, 110 Mbp (today)
[Figure: example sequence reads, each beginning with the ApeKI cut-site remnant CAGC or CTGC]
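Because 96 or more samples share one lane, each read has to be assigned back to its sample by its barcode. The sketch assumes ApeKI libraries, reads that begin with the barcode, exact barcode matches (no mismatch tolerance), and an invented barcode table; requiring the CWGC cut-site remnant right after the barcode rejects most chance matches.

```python
import re
from typing import Dict, Optional, Tuple

# ApeKI leaves CWGC (W = A or T) at the start of each genomic insert.
CUT_REMNANT = re.compile(r"C[AT]GC")


def demultiplex(read: str, barcodes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (sample, insert) when the read begins with a known barcode followed by CWGC.

    Barcodes vary in length (4-8 bp), so longer barcodes are tried first to keep a
    short barcode from shadowing a longer one that shares its prefix.
    """
    for barcode in sorted(barcodes, key=len, reverse=True):
        if read.startswith(barcode) and CUT_REMNANT.match(read, len(barcode)):
            return barcodes[barcode], read[len(barcode):]
    return None  # no recognizable barcode: discard the read


# Hypothetical barcode table; real GBS barcode sets are designed by the lab.
BARCODES = {"ACGT": "sample_A", "TTGCA": "sample_B", "CCAGTAG": "sample_C"}
```

`demultiplex("ACGTCAGCTTTG", BARCODES)` returns `("sample_A", "CAGCTTTG")`, while `demultiplex("ACGTAAAA", BARCODES)` returns `None` because no cut-site remnant follows the barcode.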
Costs per DNA sample at various multiplex levels
(cost in US dollars; components: sequencing, labor, reagents and consumables)
• 48-plex: $33.00
• 96-plex: $19.00
• 384-plex: $9.00
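The three price points fit a simple model in which each sample carries a fixed cost plus an equal share of one sequencing lane: cost(n) = a + b / n for n-plex. This two-parameter model is an assumption (labor and reagent costs need not be exactly fixed per sample); fitting it to the 48- and 96-plex points and checking the 384-plex prediction is a quick consistency test.

```python
from typing import Tuple


def fit_cost_model(n1: int, cost1: float, n2: int, cost2: float) -> Tuple[float, float]:
    """Fit cost(n) = a + b / n exactly through two (plex, cost-per-sample) points."""
    b = (cost1 - cost2) / (1.0 / n1 - 1.0 / n2)
    a = cost1 - b / n1
    return a, b


def cost_per_sample(n: int, a: float, b: float) -> float:
    """Per-sample cost: fixed part a plus a 1/n share of the per-lane cost b."""
    return a + b / n


a, b = fit_cost_model(48, 33.00, 96, 19.00)
print(f"fixed ~${a:.2f}/sample, lane ~${b:.0f}; 384-plex predicted ${cost_per_sample(384, a, b):.2f}")
```

With the slide's numbers this gives a fixed cost of about $5.00 per sample and an implied lane cost of about $1,344, and predicts $8.50 per sample at 384-plex, close to the $9.00 shown.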
GBS has been used in 11
plant species
Molecular biology: basically solved. Over 30,000 samples run in recent months
Samples per species (labeled in chart): maize 11,115; sorghum 3,325; reed canary grass 1,045; switchgrass 950; grape 570; cacao 190
The main GBS challenges
are currently
bioinformatic
Bioinformatics Problems
• Massive amounts of data
• Complex genomes with many
unstable regions
• No reference genome
• Missing data
• Phasing and imputation
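Low coverage and presence/absence variation leave many GBS sites without reads. A simple illustration of imputation for inbred lines is to fill each missing call from the most similar other line that has data at that site (a nearest-neighbour approach). This is a sketch of the general idea, not the pipeline used in this work; it ignores heterozygosity, phasing, and the fact that the most similar line can change along a chromosome.

```python
from typing import List, Optional, Sequence

Calls = List[Optional[str]]  # one allele per site for an inbred line; None = missing


def identity(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> float:
    """Proportion of identical calls over sites where both lines have data."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def impute(lines: List[Calls]) -> List[Calls]:
    """Fill each missing call from the most similar line that has data at that site."""
    result = []
    for i, target in enumerate(lines):
        donors = sorted((line for j, line in enumerate(lines) if j != i),
                        key=lambda other: identity(target, other), reverse=True)
        filled = list(target)
        for site, call in enumerate(filled):
            if call is None:
                # Take the first (most similar) donor with an observed call here.
                filled[site] = next((d[site] for d in donors if d[site] is not None), None)
        result.append(filled)
    return result
```

Sites missing in every line stay missing, which keeps the sketch from inventing calls it has no evidence for.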
GBS Bioinformatic Pipelines
[Figure: two pipelines.
Discovery pipeline steps: QSeq; tag counts by taxa; tags by taxa; map tags genetically and map tags by homology; genetic logic; reference genetic map; alleles and synonyms; alleles to SNPs.
Production pipeline steps: QSeq; tag counts by taxa; tags by taxa; reference genome; assign tags to alleles; alleles to SNPs and locations; genotypes (HapMap format).]
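The "tag counts by taxa" and "tags by taxa" steps can be understood as collapsing identical reads into unique sequence tags and tabulating how often each tag appears in each sample, which works even without a reference genome. The sketch assumes reads are already demultiplexed and barcode-trimmed, and truncates them to a fixed tag length; the 64-bp default is an assumption here, and reads shorter than the tag length are skipped.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

TagTable = Dict[str, Counter]  # tag sequence -> Counter of {taxon: read count}


def tags_by_taxa(reads_by_taxon: Dict[str, Iterable[str]], tag_length: int = 64) -> TagTable:
    """Collapse reads to fixed-length tags and count each tag's reads per taxon."""
    table: TagTable = defaultdict(Counter)
    for taxon, reads in reads_by_taxon.items():
        for read in reads:
            if len(read) >= tag_length:  # skip reads too short to form a full tag
                table[read[:tag_length]][taxon] += 1
    return dict(table)


def present_absent(table: TagTable, taxa: Iterable[str]) -> Dict[str, Dict[str, bool]]:
    """Score each tag as present/absent per taxon, the simplest genotype a tag gives."""
    taxa = list(taxa)
    return {tag: {t: counts[t] > 0 for t in taxa} for tag, counts in table.items()}
```

The presence/absence scoring matters in maize because, as the marker-distribution estimates show, about half of the expected markers are presence/absence rather than SNPs.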
Physical and genetic mapping of 8.7 million GBS alleles
[Figure: two panels, Alleles and Reads. Allele categories: genetic and physical agree; genetic and physical disagree; not in physical, genetically mapped; complex mapping or modest power currently; consistent error or evenly repetitive. Read categories: reads with strong genetic and/or BLAST position; reads with weaker position hypothesis; reads with no hypothesis (error or evenly repetitive).]
• Only 29% of alleles are simple: physical and genetic positions agree
• 55% of alleles are easily genetically mappable
• Many complex alleles are rarer, so 71% of alleles are genetically and/or physically interpretable
• With more samples and better error models, perhaps 90% will be usable
The 12-Trillion-Data-Point Opportunity/Problem
• By the end of 2011:
– GBS on ~30,000 public samples worldwide
– 200M variants known from whole-genome sequencing
– Combine and impute missing data:
2 alleles x 30,000 lines x 200,000,000 variants =
12 trillion data points
Doing the statistics and math will be a
challenge.
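The arithmetic behind the headline number, plus a rough sense of its size on disk, fits in a few lines. The storage layouts (one byte per data point, or bit-packing at two bits per point) are illustrative assumptions, not a format used by the project; two bits cover four states, so a real format would also need a way to mark missing data.

```python
def data_points(alleles_per_call: int, lines: int, variants: int) -> int:
    """Number of values in a full lines-by-variants allele matrix."""
    return alleles_per_call * lines * variants


def storage_bytes(points: int, bits_per_point: int) -> int:
    """Uncompressed storage if every data point is packed into `bits_per_point` bits."""
    return points * bits_per_point // 8


n = data_points(2, 30_000, 200_000_000)
print(f"{n:,} data points")  # 12 trillion
print(f"{storage_bytes(n, 8) / 1e12:.0f} TB at one byte per point")
print(f"{storage_bytes(n, 2) / 1e12:.0f} TB at two bits per point")
```

That is 12 TB at one byte per point and 3 TB bit-packed, before any compression.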
Resolve complex
traits
The Hammer: Maize Nested Association Mapping (NAM)
• Crossed 25 diverse maize lines (each to B73) and sequenced them to capture a substantial portion of the world’s breeding diversity
• Derived 5,000 inbred lines from the crosses
• Grew millions of plants
• Largest genetic dissection system ever
[Figure: NAM design (McMullen et al. 2009 Science). B73 is crossed to each of 25 founders (P1 ... P25): B97, CML52, CML69, CML103, CML228, CML247, CML277, CML322, CML333, Hp301, IL14H, Ki3, Ki11, Ky21, M37W, M162W, Mo18W, MS71, NC350, NC358, Oh7B, Oh43, P39, Tx303, and Tzi8. Each F1 gives a population (Pop1 ... Pop25) of RILs (RIL1 ... RIL200).]
Genotyping parents by sequencing to exploit both recent and ancient recombination
NAM HapMapV1 provides 1.6M SNPs (Gore, Chia et al. 2009 Science)
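Sequencing only the parents pays off because founder SNPs can be projected onto the RILs from each RIL's parental origin at flanking markers. The sketch shows the idea for one family: a RIL's origin at each marker is coded 0 (B73) or 1 (the other founder), and between two markers the probability of founder origin is interpolated linearly by physical position. Linear interpolation on physical distance is a simplifying assumption (genetic-map distance would be more faithful), and the function names and inputs are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def founder_probability(marker_pos: Sequence[int], ril_origin: Sequence[float],
                        snp_pos: int) -> float:
    """Probability that a RIL carries the non-B73 founder allele at snp_pos.

    marker_pos: sorted, distinct marker positions; ril_origin: 0 = B73, 1 = founder.
    Outside the marker span, the nearest marker's origin is used.
    """
    i = bisect_left(marker_pos, snp_pos)
    if i == 0:
        return float(ril_origin[0])
    if i == len(marker_pos):
        return float(ril_origin[-1])
    left, right = marker_pos[i - 1], marker_pos[i]
    w = (snp_pos - left) / (right - left)
    return (1.0 - w) * ril_origin[i - 1] + w * ril_origin[i]


def projected_dosage(p_founder: float, b73_allele: str, founder_allele: str, alt: str) -> float:
    """Expected count (0 to 1 in an inbred) of allele `alt` given the founder probability."""
    return p_founder * (founder_allele == alt) + (1.0 - p_founder) * (b73_allele == alt)
```

A RIL that switches from B73 to founder origin between markers at 100 and 200 gets a founder probability of 0.5 at position 150; if the founder carries G where B73 carries A, its projected G dosage there is 0.5.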
GWAS for Plant
Density – the leaf
architecture
portion of the
story
There has been an 8-fold jump in US
maize yield over the last 80 years
USDA-NASS; Troyer 2006 Crop Sci. 46:528–543; Duvick 2005 Maydica 50:193-202
[Figure: corn plants per acre (0-35,000) and average corn yield (bu/ac, 0-180) by year, 1865-2005, across the open-pollinated, double-cross, single-cross, and modern hybrid eras]
3-fold increase in plant density
Leaf angle, blade length, blade width
• Determine canopy morphology and light harvest
• Important for high density and yield
Newer hybrids have upright leaves (Duvick 2005)
At least 30-40 genes control each
aspect of the leaf
[Figure: frequency of significant alleles by allelic effect, showing alleles with positive and negative effects, for leaf length, leaf width, and upper leaf angle]
• Leaf length: 93% of significant alleles have <18 mm effect
• Leaf width: 95% of significant alleles have <3 mm effect
• Upper leaf angle: 96% of significant alleles have <2.5° effect
Each gene has a small effect
Tian, Bradbury et al. 2011 Nature Genetics
liguleless1 and liguleless2 explained the
two "biggest" leaf angle QTL
[Figure: upper leaf angle association results across chromosomes 1-10: bootstrap posterior probability (BPP) and −log(p) for associations with positive and negative effects, linkage QTL peaks, and recombination rate (cM/Mb); lg1, lg2, lg3, and lg4 are marked]
Tian, Bradbury et al 2011 Nature Genetics
The biggest SNP effect was less than 2°
• lg1: QTL effect -2.4°, SNP effect -1.2°
• lg2: QTL effect -2.5°, SNP effect -1.7°
Leaf length (36 QTL), leaf width (34 QTL), upper leaf angle (30 QTL), and days to silk (39 QTL)
[Figure: number of shared QTLs and phenotypic correlation (r²) for each pair of these traits; pairs share 2-7 QTLs, with r² of 0.03-0.30]
Low genetic overlap among leaf
architecture traits
Genetic architectures are finely
tuned to each exact
environment, with evolution
favoring low pleiotropy
What genes have natural variation
to control Carbon & Nitrogen
metabolism in the field?
Nengyi Zhang
With the Stitt and Gibon
groups, sampled 12,000
plants in the field for
basic carbon and nitrogen
metabolites across all of
NAM
Direct GWAS hit in the carbonic anhydrase (CA) gene
[Figure: GWAS (BPP) and linkage (−log(P)) signals at the CA locus]
CA is the single most important gene
controlling chlorophyll, malate, nitrate,
glutamine, and overall protein content.
Carbonic anhydrase (CA) is a critical
enzyme in carbon fixation in C4 plants
(Ludwig M. et al. Plant Physiol. 1998)
• CA converts CO2 to bicarbonate (CO2 + H2O ⇌ HCO3- + H+), which the C4 pathway fixes into acids such as malate
• CAs are upstream regulators of CO2-controlled stomatal
movements in guard cells (Hu H. et al. Nature Cell Biology 2010)
• Relevant to water use efficiency and heat stress
CA SNP associations: Chla, Mala, Nitr, Glut, Prot, Prin1
Trait | SNP (chr: position) | BPP (%) | Candidate gene | Gene coordinates (AGP)
Glut | 3: 213,890,769 | 6 | carbonic anhydrase | 213,888,899-213,896,251
Star | 2: 22,808,083 | 76 | invertase | 22,804,880-22,809,451
Star | 5: 168,868,583 | 67 | invertase | 168,865,756-168,868,879
Chla | 3: 213,848,077 | 48 | carbonic anhydrase | 213,847,057-213,859,958
Chla | 3: 213,848,298 | 10 | carbonic anhydrase | 213,847,057-213,859,958
Chla | 3: 213,894,582 | 24 | carbonic anhydrase | 213,888,899-213,896,251
Chla | 9: 23,215,157 | 16 | starch synthase | 23,213,761-23,217,689
Gluc | 5: 167,871,133 | 5 | 1,4-alpha-glucan branching enzyme | 167,869,465-167,892,914
Fruc | 5: 204,526,436 | 41 | endoglucanase 1 (cellulase) | 204,527,678-204,531,175
Mala | 3: 213,856,232 | 29 | carbonic anhydrase | 213,847,057-213,859,958
Mala | 3: 214,330,739 | 23 | malate transporter | 214,325,927-214,328,710
Prot | 8: 117,977,083 | 8 | ribosome protein | 117,979,473-117,983,191
Prot | 3: 213,854,238 | 11 | carbonic anhydrase | 213,847,057-213,859,958
Nitr | 1: 202,621,762 | 5 | malate dehydrogenase (NADP+) | 202,617,705-202,621,864
Nitr | 2: 181,079,834 | 39 | chlorophyll a/b-binding protein | 181,076,994-181,079,397
Nitr | 3: 213,848,077 | 7 | carbonic anhydrase | 213,847,057-213,859,958
Nitr | 4: 166,175,217 | 5 | glutamine synthetase | 166,172,187-166,175,518
Fuma | 1: 195,285,519 | 22 | pyruvate dehydrogenase E1 | 195,281,414-195,283,531
Significant SNPs lie either within or very near (<2 kb)
candidate genes
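How close each SNP sits to its candidate gene can be checked row by row with a simple distance from the SNP to the gene interval, which is zero when the SNP falls inside the gene. The sketch takes coordinates exactly as they appear in the table; the helper name is hypothetical.

```python
def distance_to_gene(snp_pos: int, gene_start: int, gene_end: int) -> int:
    """Base pairs from a SNP to the nearest end of a gene interval; 0 if inside."""
    if gene_start > gene_end:
        gene_start, gene_end = gene_end, gene_start
    if snp_pos < gene_start:
        return gene_start - snp_pos
    if snp_pos > gene_end:
        return snp_pos - gene_end
    return 0


rows = [
    ("Glut", 213_890_769, 213_888_899, 213_896_251),  # carbonic anhydrase
    ("Fruc", 204_526_436, 204_527_678, 204_531_175),  # endoglucanase 1
]
for trait, snp, start, end in rows:
    print(trait, distance_to_gene(snp, start, end), "bp")
```

For these two rows it reports 0 bp (inside the carbonic anhydrase gene) and 1,242 bp upstream of endoglucanase 1.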
Can we make useful
predictions?
[Figure: NAM QTL prediction of days to flowering. Observed days to flowering of parental lines vs. days predicted from marker models (both axes 65-95 days); fitted line y = 1.32x - 21.2, R² = 0.93]
Can we predict?
With a $20 test, we can predict when many
varieties will flower to within a couple of days
NAM QTLs accurately
predict many traits.
Hence we can breed
with them.
[Figure: observed vs. predicted values from NAM QTL models: leaf length R² = 0.84, leaf width R² = 0.81, upper leaf angle R² = 0.78]
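The predictions behind these plots can be sketched as an additive QTL model: a line's predicted value is an intercept plus the sum of each QTL allele effect times the line's allele dosage, and accuracy is summarized as R², computed here as the squared Pearson correlation between observed and predicted values. The effects and genotypes below are hypothetical; this illustrates the additive idea, not the NAM model itself.

```python
from typing import Sequence


def predict(intercept: float, effects: Sequence[float], dosages: Sequence[float]) -> float:
    """Additive prediction: intercept + sum of (QTL effect x allele dosage)."""
    return intercept + sum(e * d for e, d in zip(effects, dosages))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical days-to-flowering model with two QTL (effects in days).
effects = [2.0, -1.5]
lines = {"line1": [0, 0], "line2": [1, 0], "line3": [1, 1], "line4": [0, 1]}
preds = {name: predict(70.0, effects, g) for name, g in lines.items()}
print(preds)
```

Squared correlation ignores systematic bias, so a fit can show a high R² even when its slope differs from 1 (as in the flowering-time fit, slope 1.32).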
Taming of NAM
NAM and Ames Yield
Trials
• 1,800 NAM lines testcrossed onto PVP lines
and trialed in 4 locations in 2010 and 6
locations in 2011
• Every inbred in Ames was
evaluated for basic traits in 2010
• Yield trials of 1,200 Ames inbreds testcrossed onto
PVP lines in 6 environments in 2011
• Collaborating with breeders to combine
GEBV (genomic estimated breeding value) models
S. Larsson
C. Romay
Evaluate Natural Variation
Mathematically Model Genotype to Phenotype
Predict Phenotype
Facilitates Rapid Breeding Progress
What can genomics do to
accelerate the breeding of
simple and complex traits?
What should and can we do
in the next decades?
• Double yield with same fertilizer and
water (better drought and N utilization)
– Perhaps even more in the developing world.
• Perennialize our crops
• Biofortify crops to improve nutrition in
the developing world
• Do this in 100 species.
Who do I contact to learn more?
• NAM – Jim Holland, Mike McMullen, and Sherry Flint-Garcia
• HapMapV2 – Doreen Ware, Jer-Ming Chia, Jeff Ross-Ibarra
• QTL Mapping on NAM – Peter Bradbury, Zhiwu Zhang, Feng Tian
• Leaf Architecture – Feng Tian
• C & N Metabolites – Nengyi Zhang & Yves Gibon
• GBS Methods & Bioinformatics – Rob Elshire & Sharon Mitchell, Qi Sun, Jeff Glaubitz, James Harriman
Web: www.panzea.org & www.maizegenetics.net
Supported by USDA-ARS & NSF