The potential benefits of the potato genome sequence and high throughput SNP platform to breeding
David Douches (1), Candice N. Hansey (2), Alicia Massa (2), Kim Felcher (1), Joseph Coombs (1) and C. Robin Buell (2)
(1) Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824
(2) Department of Plant Biology, Michigan State University, East Lansing, MI 48824
The Potato: Our favorite vegetable
• Potatoes are the world’s 3rd most important crop, especially in developing countries
• Americans eat ~57 kg (126 pounds) of potatoes per year (fries and chips)
• Breeding is challenging and relies on antiquated methods
– Most cultivated potatoes are tetraploid, highly heterozygous, not all fertile, and vegetatively propagated
• Can genomics provide insight into unique aspects of potato biology/genetics and can this be used to improve potato as a crop?
Doubled Monoploid DM 1-3 516 R44
• Doubled monoploid line DM 1-3 516 R44 of adapted Solanum tuberosum Group Phureja (from Richard Veilleux, Virginia Tech, USA)
• Reduced complexity for whole genome shotgun sequencing due to homozygosity
• Taxonomic study (Spooner et al. 2007) suggests it is the same species as S. tuberosum
• Very slow growing, presumably due to increased ‘genetic load’ caused by exposure of inferior alleles to environment and homozygosity
• Genome size based on flow cytometry ~850 Mb
The Potato Genome
• Assembled DM genome (727 Mb)
• WGS of RH
• RH BAC sequences
• First asterid genome published
• ~39,000 genes
Potato Breeding Challenges
• Potato breeding – currently a phenotypic based process.
• “A lot” of molecular markers for a potato breeder pre-2011 was 4
• Tetraploid breeding and genetics
• Vegetative propagation
• The challenge has been for the breeder to combine the market-driven quality with the agronomic performance and host plant resistance needed by the growers.
What is SolCAP?
The SolCAP project is a coordinated agricultural project that links together people from public institutions, private institutions and industries who are dedicated to the improvement of the Solanaceae crops: potato and tomato. Through innovative research, education and extension the SolCAP project will focus on providing significant benefits to both the consumer and the environment.
The SolCAP project is supported by the Agriculture and Food Research Initiative Applied Plant Genomics CAP Program of the USDA’s National Institute of Food and Agriculture
SolCAP Overall Research Objective
• To reduce the gap between genomics and breeding, SolCAP will provide infrastructure to link allelic variation of SNPs in genes to valuable traits.
– Identify up to 10,000 SNPs for tomato and potato in elite germplasm
– Genotype germplasm panels and mapping populations with Illumina Infinium potato and tomato SNP arrays
Potato SNP Discovery
RNA-Seq reads:
– Align reads to contigs with Maq pipeline: 2,263,279 SNPs
– Filter SNPs for read depth, density, and quality with Maq SNPFilter: 575,340 SNPs (filtered SNPs)
Sanger ESTs:
– Assemble ESTs per variety using TGICL and call SNPs: 8,327 SNPs
– Filter SNPs for read depth and density using custom Perl script: 2,358 SNPs (filtered SNPs)
Mapping to the genome:
– Align contigs to genome sequence and link SNPs from each variety to a genomic position: 80,986 SNPs
– Remove SNPs that are not biallelic and SNPs within 50 bp of an intron: 69,011 SNPs (high confidence SNPs)
Varieties: Atlantic, Premier Russet, Snowden, Bintje, Kennebec, Shepody
Hamilton et al., 2012
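The last filtering step of the SNP discovery pipeline (keep biallelic SNPs, drop SNPs close to an intron) can be sketched as follows. This is a minimal illustration, not the SolCAP pipeline itself: the `Snp` record, its field names, and the reading of "within 50 bp" as a distance below 50 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    # Hypothetical record layout; field names are assumptions for this sketch.
    position: int            # position on the reference sequence
    alleles: frozenset       # alleles observed across the varieties
    intron_distance: int     # distance in bp to the nearest intron


def high_confidence(snps, min_intron_distance=50):
    """Keep SNPs that are biallelic and not within min_intron_distance bp of an intron.

    Assumption: "within 50 bp" is read as a distance strictly below 50.
    """
    return [
        s for s in snps
        if len(s.alleles) == 2 and s.intron_distance >= min_intron_distance
    ]


candidates = [
    Snp(100, frozenset("AG"), 120),   # biallelic, far from an intron: kept
    Snp(200, frozenset("ACG"), 300),  # three alleles: removed
    Snp(300, frozenset("CT"), 10),    # too close to an intron: removed
]
kept = high_confidence(candidates)
print([s.position for s in kept])
```

SNPs near introns are presumably removed because the flanking sequence used for probe design must match the genomic sequence; that rationale is an inference, not something stated on the slide.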
SNP Detection: Infinium 8303 Potato Array
• Unique oligo for each bead type
• Bead Pool is 250,000 per sample
• Random self-assembly of beads onto the chip
• Redundancy averages 15 to 30 beads of each type
• 8,303 SNPs on Illumina Infinium chip
• 24 samples per chip
Infinium 8303 Potato Array (A genome-wide set of SNP markers)
Number of SNPs   Reason selected
3,018            In community-provided candidate genes
536              Previously identified genetic markers
4,749            Genome-wide coverage
[Figure: SNPs per 100 kb and genes per 100 kb]
Felcher et al., 2012
Calling SNPs with Infinium 8303 Potato Array
• SolCAP custom potato calling file
– Based on potato diversity panel, two 4x populations and one 2x population
– http://solcap.msu.edu
• 3 Cluster Calling
– Good: 7,412 (89.3%)
– Questionable: 296 (3.6%)
– Segregation: 254 (3.1%)
– Bad: 341 (4.1%)
• 215 out of the 250 clones (86%) were genotyped twice and had less than 0.57% difference between the replicates in the diploid model
• Filtered SNPs to contain less than 20% missing data
– 6,373 SNPs in the filtered diploid model (simplified with all 2x)
– 3,763 SNPs in the filtered dosage model (4x-AAAA, 2x-AA, 1x-A)
Hirsch et al., in prep
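The missing-data filter described above (keep SNPs with less than 20% missing calls) can be sketched in a few lines. This is an illustration under assumptions: the input layout (a dict of SNP name to per-clone calls, with `None` for a missing call) and the function names are hypothetical, not the format of the SolCAP calling file.

```python
def missing_fraction(calls):
    """Fraction of clones with no genotype call (None) at one SNP."""
    return sum(c is None for c in calls) / len(calls)


def filter_snps(genotypes, max_missing=0.20):
    """Keep SNPs whose missing-call fraction is strictly below max_missing."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if missing_fraction(calls) < max_missing
    }


# Hypothetical dosage-model calls for five clones at three SNPs.
genotypes = {
    "snp_1": ["AAAA", "AAAB", "AABB", "ABBB", "BBBB"],  # 0% missing
    "snp_2": ["AAAA", None, "AABB", "AABB", "BBBB"],    # 20% missing
    "snp_3": [None, None, "AABB", "ABBB", "BBBB"],      # 40% missing
}
filtered = filter_snps(genotypes)
print(sorted(filtered))
```

The same function applies to the diploid model if the calls are written as AA, AB and BB. The dosage model retains fewer SNPs (3,763 versus 6,373) because five-cluster calls fail more often than three-cluster calls.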
[Figure: histograms of percent missing data (x-axis: Percent Missing, 0 to 100; y-axis: Frequency) in four panels: SNP Diploid Filtered, SNP Dosage Filtered, Clone Diploid Filtered, Clone Dosage Filtered]
Genetic Relationship Between Clones
Legend: Red – Chip Processing; Dark Blue – Genetic Stock; Purple – Pigmented; Green – Processing Russet; Light Blue – Round White Table; Pink – Species; Brown – Table Russet; Yellow – Yellow
Hirsch et al., in prep
UPGMA tree from Rogers’ allele-frequency-based distances
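For readers unfamiliar with the distance behind the tree, here is a minimal sketch of Rogers' distance between two clones. It assumes biallelic SNPs scored as dosage strings (for example "AAAB") and treats the allele frequency within a clone as the proportion of A alleles in its call; whether the authors computed it exactly this way is an assumption, and the function names are hypothetical. The UPGMA clustering step itself is not shown.

```python
from math import sqrt


def allele_a_frequency(call):
    """Proportion of allele A in a dosage call, e.g. 'AAAB' -> 0.75, 'AB' -> 0.5."""
    return call.count("A") / len(call)


def rogers_distance(calls_x, calls_y):
    """Rogers' distance averaged over loci where both clones have a call.

    Per locus: sqrt(0.5 * sum over alleles of (p - q)^2). For biallelic loci
    this reduces to the absolute difference in allele A frequency.
    """
    total, n_loci = 0.0, 0
    for x, y in zip(calls_x, calls_y):
        if x is None or y is None:
            continue  # skip loci with a missing call in either clone
        p, q = allele_a_frequency(x), allele_a_frequency(y)
        # Two alleles per locus: A (p, q) and B (1 - p, 1 - q).
        total += sqrt(0.5 * ((p - q) ** 2 + ((1 - p) - (1 - q)) ** 2))
        n_loci += 1
    if n_loci == 0:
        raise ValueError("no loci with calls in both clones")
    return total / n_loci


clone_1 = ["AAAA", "AABB", "AAAB", None]
clone_2 = ["BBBB", "AABB", "AABB", "AAAA"]
print(rogers_distance(clone_1, clone_2))
```

A pairwise matrix of these distances over all clones is the input a UPGMA routine would take.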
Divergence of Market Classes
Hamilton et al., 2012; Hirsch et al., in prep
Legend: Green – Processing Russet; Brown – Table Russet
Russet germplasm groups tightly regardless of whether it was bred for processing (French fries) or table markets
[Figure labels: Species, Processing Russet, Table Russet, Chip Processing, Yellow, Pigmented, Round White Table, Diploid Breeding Line, Genetic Stock]
Subgroups within Market Classes
Hirsch et al., in prep
• There is divergence of market classes
• There are also subgroups within the market classes, particularly in the chippers
Divergence of Market Classes
Hirsch et al., in prep
A century of potato breeding has resulted in clear genetic differentiation of germplasm within market classes
Legend: Red – Chip Processing; Dark Blue – Genetic Stock; Purple – Pigmented; Green – Processing Russet; Light Blue – Round White Table; Pink – Species; Brown – Table Russet; Yellow – Yellow
Percent Heterozygosity in Potato
Average percent heterozygosity in the panel is 51.21%
Hirsch et al., in prep
Percent Heterozygosity in Potato
Heterozygosity is much lower in the species and genetic stocks
Hirsch et al., in prep
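Percent heterozygosity per clone can be computed directly from the genotype calls. The sketch below assumes dosage-style calls ("AAAA", "AAAB", ...) with `None` for a missing call and counts any call carrying both alleles as heterozygous; the exact definition used for the 51.21% figure is not given on the slides, so treat this as one plausible reading.

```python
def percent_heterozygosity(calls):
    """Percentage of called SNPs at which a clone carries both alleles."""
    called = [c for c in calls if c is not None]
    if not called:
        raise ValueError("clone has no genotype calls")
    heterozygous = sum(len(set(c)) > 1 for c in called)
    return 100.0 * heterozygous / len(called)


# Hypothetical calls: a tetraploid cultivar and a diploid genetic stock.
cultivar = ["AAAB", "AABB", "AAAA", "ABBB", None, "BBBB"]
genetic_stock = ["AA", "AA", "BB", "AB", "BB", "AA", "BB", "AA"]
print(percent_heterozygosity(cultivar))
print(percent_heterozygosity(genetic_stock))
```

Under this definition, simplex, duplex and triplex calls all count equally as heterozygous.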
Phenotypic Evaluation
• Traits were measured at two locations, Wisconsin (Jansky and Bethke) and New York (De Jong), with two replications per location, in the summer of 2010
• Traits measured for biochemical composition, growth descriptors, tuber phenotypes, and processing properties
• Only tetraploid clones were phenotyped
Hirsch et al., in prep
Market Class Phenotypic Divergence
Phenotypic divergence between market classes is observed for many of the traits in the expected patterns given the selective pressures placed on each market class.
[Figure: histograms of fructose (mg/g) and sucrose (mg/g), y-axis: Count; panel labels: Chip Processing and Processing Russet; Chip Processing]
Hirsch et al., in prep
Market Class Phenotypic Divergence: Tuber Length
Tuber appearance traits also diverged among market classes
[Figure: histogram of tuber length (mm), y-axis: Count; panel label: Processing Russet and Table Russet]
Hirsch et al., in prep
Market Class Phenotypic Divergence: Yield
Not all traits, such as yield, demonstrated clear market class divergence.
[Figure: histogram of yield per 10-plant plot (kg), y-axis: Count]
Hirsch et al., in prep
Ongoing Work
• Allelic composition over time
• Phenotypic divergence over time
• Tracking genes selected through pedigrees important for traits of interest
• Role of wild species in germplasm diversity
• Association mapping
Blindauer, C.A., and R. Schmid. 2010. Cytosolic metal handling in plants: determinants for zinc specificity in metal transporters and metallothioneins. Metallomics 2: 510-529.
• HMA ATPases – Heavy Metal Associated transporting ATPases involved in metal transport from the cytosol
• MTP – Metal Tolerance Proteins involved in membrane-bound transport
• ZIP – Zinc/Iron Permease responsible for cellular metal ion uptake, especially in roots
“The percentage of genes coding for zinc-binding proteins in eukaryotes is estimated conservatively at around 10%.”
Gene model              Putative function
PGSC0003DMG402004858    C2H2L domain class transcription factor
PGSC0003DMG400013784    Non-LTR retrotransposon reverse transcriptase
PGSC0003DMG400022166    SEC14 cytosolic factor
PGSC0003DMG400030728    Zinc ion binding protein
Breeder's Toolbox
• Integrated, breeder-focused resources for genotypic and phenotypic analysis at SGN and MSU
– http://solcap.msu.edu
– http://solanaceae.plantbiology.msu.edu
– http://solgenomics.net
Double Reduction in Tetraploids
• Autotetraploids can undergo double reduction that results in (the segments of) two sister chromatids being recovered in a single gamete.
• For this to occur, multivalent pairing must take place with a cross-over between a locus and its centromere followed by the two pairs of chromatids passing to the same pole in anaphase I (adjacent segregation).
Tetraploid Mapping
• Premier Russet (PR) X Rio Grande Russet (RG)
– PRRG – 184 Progeny
– Rich Novy’s population
Double Reduction Example: Tetraploid Potato on Infinium Array
Parents          Progeny
PR     RG        AAAA   AAAB   AABB   ABBB   BBBB   NC
AAAA   AAAB      92     86     6      0      0      3
Expected ratio   1      1
BBBB   ABBB      0      0      6      87     92     2
Expected ratio                        1      1
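The logic of the example above can be written out as a short sketch. In a nulliplex x simplex cross (AAAA x AAAB), only AAAA and AAAB progeny are expected in a 1:1 ratio without double reduction, so AABB progeny are candidate double-reduction products. The function names are hypothetical, and the sketch does not distinguish double reduction from genotyping error.

```python
def split_progeny(counts, expected_classes):
    """Separate progeny dosage classes into expected ones and unexpected, non-empty ones."""
    expected = {g: n for g, n in counts.items() if g in expected_classes}
    unexpected = {
        g: n for g, n in counts.items()
        if g not in expected_classes and n > 0
    }
    return expected, unexpected


def chi_square_one_to_one(a, b):
    """Chi-square statistic (1 degree of freedom) for a 1:1 expected ratio."""
    mean = (a + b) / 2
    return ((a - mean) ** 2 + (b - mean) ** 2) / mean


# Progeny counts from the AAAA x AAAB row of the table (no-call column omitted).
counts = {"AAAA": 92, "AAAB": 86, "AABB": 6, "ABBB": 0, "BBBB": 0}
expected, candidates = split_progeny(counts, {"AAAA", "AAAB"})
chi2 = chi_square_one_to_one(expected["AAAA"], expected["AAAB"])

print("candidate double-reduction progeny:", candidates)
print("chi-square for 1:1 segregation:", round(chi2, 3))
```

The statistic is well below 3.841, the 5% critical value for one degree of freedom, so the two expected classes are consistent with 1:1 segregation while the six AABB progeny stand out as candidates.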
Distribution of Simplex SNPs with Double Reduction in PRxRG
             No. of SNPs
No. of DR    PR     RG     Total
0            373    168    541
1            47     68     115
2            19     37     56
3            32     14     46
4            7      13     20
5            7      8      15
6            2      4      6
7            0      1      1
Distribution of Simplex SNPs with Double Reduction in PRxRG by chromosome and parent
             Premier Russet             Rio Grande
Chromosome   No. SNPs   No. DR SNPs     No. SNPs   No. DR SNPs
chr01        46         21              33         19
chr02        56         15              20         15
chr03        48         5               24         4
chr04        43         7               50         30
chr05        33         4               14         6
chr06        47         9               31         8
chr07        36         2               15         6
chr08        46         12              12         10
chr09        39         11              43         15
chr10        43         21              19         8
chr11        17         6               21         11
chr12        33         1               31         13
Total        487        114             313        145
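The per-chromosome counts are easier to compare as proportions. The sketch below copies the numbers from the table above, recomputes the totals, and reports the fraction of simplex SNPs showing double reduction for each parent; variable and function names are chosen for the example.

```python
# (PR SNPs, PR DR SNPs, RG SNPs, RG DR SNPs) per chromosome, copied from the table.
counts = {
    "chr01": (46, 21, 33, 19),
    "chr02": (56, 15, 20, 15),
    "chr03": (48, 5, 24, 4),
    "chr04": (43, 7, 50, 30),
    "chr05": (33, 4, 14, 6),
    "chr06": (47, 9, 31, 8),
    "chr07": (36, 2, 15, 6),
    "chr08": (46, 12, 12, 10),
    "chr09": (39, 11, 43, 15),
    "chr10": (43, 21, 19, 8),
    "chr11": (17, 6, 21, 11),
    "chr12": (33, 1, 31, 13),
}


def dr_fractions(table):
    """Fraction of simplex SNPs with double reduction, per chromosome: (PR, RG)."""
    return {
        chrom: (pr_dr / pr_n, rg_dr / rg_n)
        for chrom, (pr_n, pr_dr, rg_n, rg_dr) in table.items()
    }


# Column sums, to compare against the Total row of the table.
totals = tuple(sum(row[i] for row in counts.values()) for i in range(4))
fractions = dr_fractions(counts)

for chrom, (pr, rg) in fractions.items():
    print(f"{chrom}  PR {pr:.1%}  RG {rg:.1%}")
print("totals:", totals)
```

The recomputed totals match the Total row, and the proportions show that double reduction was detected at a higher share of Rio Grande simplex SNPs than Premier Russet simplex SNPs in this population.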
Double Reduction in PRxRG Simplex SNPs by Pseudomolecule Chromosome Position
Is there a homozygous potato?
• Most wild tuber-bearing Solanum species are diploid (2n = 2x = 24) and self-sterile because of a genetically based gametophytic self-incompatibility system
• It has been difficult to develop inbred lines for breeding and genetics studies.
• Self-compatible genotypes have occasionally been reported
• In S. chacoense, self-compatibility is conditioned by the presence of a dominant allele of an S-locus (self-incompatibility locus) inhibitor gene (Sli)
S. chacoense S7 line 523-3
Objective
• This study was carried out to characterize the distribution of heterozygous SNPs in potato inbred lines that have been self-pollinated for 6-7 generations
Levels of Heterozygosity in SolCAP Germplasm Panel Diploids
Pathogenesis, environmental stress, and defense response related proteins
SNP Frequency Distribution Chromosome 4
[Figure panels: S7, S6]
SNP Frequency Distribution Chromosome 8
[Figure panels: S7, S6]
Future studies
• Can we detect where selection has occurred in the genome?
• What genes might be under selection?
– Limitation: insufficient recombination to identify candidate genes
• How can we apply these tools and this information to crop improvement?
• Can sub-population data based on inbred lines predict hybrid performance?
• Mapping in Elite x Elite populations
– Recombination is limiting
– Increase the number of SNPs for mapping
• Study genomes of wild relatives
• Genome wide selection
– Limitation to GWS is establishing appropriate trait models.
Genotyping strategies to consider (balancing information, cost, time)
• Genotyping by sequencing – reduced representation ($50/sample)
• Genotyping using the Infinium array ($100/sample)
• Optimized pools of 384 SNPs for community mapping projects
– BeadXpress and Kbio platforms, e.g. tomato: http://www.extension.org/pages/61007/
• Process:
– Select SNPs based on polymorphic information content (PIC) in target germplasm pools
– Select SNPs based on genetic map position
– Fill in based on physical position
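The first step of the selection process, ranking SNPs by polymorphic information content, can be sketched as below. It uses the standard PIC formula (1 minus the sum of squared allele frequencies, minus the sum over allele pairs of 2 p_i^2 p_j^2); the allele frequencies and SNP names are made up for illustration, and the map-position and physical-position steps are not shown.

```python
def pic(freqs):
    """Polymorphic information content for one marker, given its allele frequencies."""
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - pair_term


def top_snps_by_pic(allele_a_freq, n):
    """Return the n biallelic SNPs with the highest PIC in a germplasm pool."""
    ranked = sorted(
        allele_a_freq,
        key=lambda snp: pic([allele_a_freq[snp], 1 - allele_a_freq[snp]]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical allele A frequencies in a target germplasm pool.
pool = {"snp_1": 0.5, "snp_2": 0.9, "snp_3": 0.7, "snp_4": 1.0}
selected = top_snps_by_pic(pool, 2)
print(selected)
```

For a biallelic SNP, PIC peaks at 0.375 when both alleles are equally frequent and falls to 0 for a monomorphic marker, so this ranking favours SNPs that segregate in the target pool.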
Summary
• DR products were identified in Simplex x Nulliplex crosses. Other crosses will also allow us to study DR.
• DR is observed on all chromosomes and all arms.
• DR frequency is greater further from the centromere.
Discussion
• The distribution of residual heterozygosity in S. chacoense S6 and S7 lines is genome wide
• Only 34 of the >1000 heterozygous SNPs were heterozygous across all lines tested
• 40% of these SNPs are from genes related to pathogenesis, environmental stress, and defense response mechanisms
• The residual heterozygosity may be due to selfing and other factors such as selection, recombination, and mutation
• The S. chacoense S7 lines are a resource for future genetic studies
Summary
• SolCAP has developed a genome-wide set of SNP markers that can be used by the breeding and genetics community
• The Infinium SNPs allow for dosage calls in heterozygous tetraploid potatoes.
• Five cluster calling of SNPs in Genome Studio adds power to marker analysis
• There has been both phenotypic and genotypic divergence between market classes
• Identification of genes associated with traits of interest and the use of marker assisted selection will allow for phenotypic improvement to proceed at a more rapid pace
Summary
• We have the tools in place to start to identify these associations with the diversity panel
– Significant genotypic variation for traits of interest
– Genetic variation in the population underlying the phenotypic variation
• QTL mapping of economically important traits is initiated in tetraploid populations with simplex, duplex and triplex SNPs
• Potato QTL mapping is more feasible with a genome wide set of SNPs!
• Opportunities to improve our understanding of the potato genome and develop new breeding strategies are numerous in the genomics era!
Acknowledgments
Collaborators, OSU
Heather Merk
Sung-Chur Sim
Matt Robbins
Troy Aldrich
Collaborators, MSU
C Robin Buell
John Hamilton
Dan Zarka
Kelly Zarka
Collaborators, VTU
Richard Veilleux
Industry Collaborators
Cindy Lawley, Illumina
Martin Ganal, TraitGenetics
Funding USDA/AFRI
This project is supported by the Agriculture and Food Research Initiative of USDA’s