FHI Biotechnology Approaches Genome sequencing Clonal testing Transgenics GE trees New varieties Marker-aided breeding
Jan 07, 2016
Chestnut Genome Research Team
John E. Carlson, PI, Schatz Center, Penn State University
DNA Sequencing
Stephan C. Schuster, Professor of Biochemistry and Molecular Biology, Penn State
Lynn P. Tomsho, Daniela Drautz, and Lindsay Kasson, Sequencing Specialists, Penn State
Tyler Wagner, Research Assistant, Penn State
Bioinformatics and Comparative Genomics
Webb Miller, Professor of Biology and Computer Science & Engineering, Penn State
Charles Addo-Quaye, Postdoctoral Fellow, Penn State
Meg Staton, Stephen Ficklin and Christopher Saski, Bioinformatics team at Clemson University Genomics Institute
Abdelali Barakat, Research Associate, Clemson University
FHI Cooperators: Bert Abbott, Sandra Anagnostakis, Kathleen Baier, Ali Barakat, Nurul Faridi, Eric Feng, Stephen Ficklin, Fred Hebard, Thomas Kubisiak, Charles Maynard, Scott Merkle, Joseph Nairn, William Powell, Dana Nelson
Our Goals:
1) Develop a complete reference genome sequence for chestnut
2) Identify all genes in the three blight resistance QTL
3) Deliver candidate genes to the FHI Transgenics group and the FHI Marker-aided breeding group
4) Provide the genome to the research community
5) Demonstrate the potential of genomics to address forest health and ecosystem restoration.
The Chinese Chestnut Genome Sequencing Project
DELIVERABLES FOR YEAR ONE were all achieved
1. The reference Castanea mollissima cv. Vanuxem genome was sequenced to over 25-fold depth.
2. Preliminary de novo assemblies of the reference genome sequence were conducted.
3. Use of genetic and physical map information (from the FHI genetic technologies group) in genome assembly commenced.
DELIVERABLES FOR YEAR ONE, the details
• “Shot-gun” sequencing completed by March 2010
  18-fold* depth by 454 technology = 14.2 Gigabases
  47-fold* depth by Illumina technology = 37.6 Gigabases
• Passed QC tests:
  mtDNA < 0.4% and cpDNA < 0.3% of sequence
  microbial DNA negligible
  sequence reads over 350 bp
  repetitive DNA manageable (conserved repeats at 9 to 12%)
• Preliminary assemblies of the genome sequence were promising, totalling approximately 852 Mbp, but in smaller pieces than desired
* assumes a genome size for chestnut of approximately 800 Mbp
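The fold-depth figures for year one follow from dividing the total bases sequenced by the assumed genome size. The sketch below is only an illustration of that arithmetic; the function name `fold_coverage` and the variable names are hypothetical, and the 800 Mbp genome size is the assumption stated in the footnote, not a measured value.

```python
# Fold coverage = total sequenced bases / assumed genome size.
# ASSUMPTION (from the slide footnote): chestnut genome is approximately 800 Mbp.
GENOME_SIZE_BP = 800_000_000


def fold_coverage(total_bases, genome_size_bp=GENOME_SIZE_BP):
    """Return the average number of times each base is expected to be covered."""
    return total_bases / genome_size_bp


# Totals reported for year one, in bases.
year_one_runs = {
    "454": 14_200_000_000,       # 14.2 Gigabases
    "Illumina": 37_600_000_000,  # 37.6 Gigabases
}

for platform, bases in year_one_runs.items():
    print(f"{platform}: about {fold_coverage(bases):.1f}-fold depth")
```

If the genome is larger than 800 Mbp, as the year-one lessons suggest it may be, the same totals correspond to proportionally lower depth.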
WHAT WE LEARNED IN YEAR ONE
1. “Next Gen” sequencing technologies produce a large amount of high quality data, very quickly.
2. Large amounts of high quality data take a long time to assemble using currently available software.
3. Assembly of the reference genome will require more than just “shot gun” Next Gen sequence data.
4. “Paired end” data are required to pull contigs together into chromosome scaffolds.
5. For assembly purposes, the chestnut genome may be larger than 800 Mbp.
DELIVERABLES ACHIEVED IN YEAR TWO
1. Produced paired-end sequence data.
2. Covered the physical map with BAC-end sequences.
3. Commenced gene identification and characterization:
   Transcripts aligned to the genome assembly
   Assembly searched for genes
   Preliminary annotations of genes conducted
4. Strategy for resistance gene discovery updated.
DELIVERABLES FOR YEAR TWO, the details
1. Paired-end sequences from 454 sequencing at 4.5-fold depth (3.6 Gb).
2. 43,143 BAC-end sequences obtained, “tiling” the physical genome map to 1.5-fold depth, anchored to the genetic map.
3. New assemblies conducted using the paired-end data:
   587,208,063 bp assembled into 51,766 scaffolds
   925,312,071 bp assembled into 1,147,939 contigs
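To put the year-two assembly totals in perspective, the mean scaffold and contig lengths can be derived from the reported figures. This is a rough sketch only: a mean is a crude summary, and a statistic such as N50 would need the individual scaffold lengths, which the slides do not give. The helper name `mean_length` is hypothetical.

```python
# Mean piece length = total assembled bases / number of pieces.
# Inputs are the year-two totals reported on the slide.

def mean_length(total_bp, piece_count):
    """Average length in bp of the assembled pieces."""
    return total_bp / piece_count


scaffold_mean = mean_length(587_208_063, 51_766)
contig_mean = mean_length(925_312_071, 1_147_939)

# Paired-end depth, using the same 800 Mbp genome-size assumption
# as the year-one figures.
paired_end_depth = 3_600_000_000 / 800_000_000

print(f"mean scaffold length: {scaffold_mean:,.0f} bp")
print(f"mean contig length:   {contig_mean:,.0f} bp")
print(f"paired-end depth:     {paired_end_depth:.1f}-fold")
```

The gap between the two means shows how much the paired-end data help join short contigs into longer scaffolds.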
DELIVERABLES FOR YEAR TWO: Gene Identification and Characterization
4. Chinese chestnut unigenes (transcripts) from NSF project aligned well to the current genome assembly:
   97% of transcripts (46,954) aligned to genome assembly
   98% identity of transcripts and genome sequences
5. Results of gene search with preliminary assembly: 66,662 gene models predicted in the scaffolds
   - certainly an over-estimate of gene number at this point
   - mean gene length 2,761 bp, maximum length 43,203 bp
   - mean number of genes per scaffold 12.8, maximum 58
6. Candidate gene sequences identified in genome contigs
   Coding sequences delivered to the transgenics team
The largest gene identified in the preliminary Chinese Chestnut genome assembly
• Transcript length: 43,203 bases
• Number of Exons: 71
• Scaffold ID: scaffold01252
Homolog of AT1G67120 (NP_176883.4), AAA ATPase, von Willebrand factor type A domain-containing protein, with nucleoside-triphosphatase activity.
Most Arabidopsis single-copy genes have strong matches to the current genome assembly (by BLAST alignment)
[Figure: number of genes plotted against E-values (strength of matches), N = 959]
Best matches of proteins from the chestnut genome assembly are to peach and other related species
Only 1% of best matches to Arabidopsis.
BLASTx alignments to model plant genomes in Phytozome
Best matches:
• peach, 23%
• rice, 12%
• grapevine, 7%
• Eurosids 1 species, 56%
The peach genome is best for chestnut gene discovery.
Source: http://www.phytozome.net/
[Figure labels: eurosids 1, eurosids 2]
The predicted chestnut proteins are most similar to species in the Eurosids 1 clade, which also includes peach and chestnut.
However, the genome assembly is uneven and not as good as needed to assemble all of the blight resistance QTL genes
Range of coverage among genome scaffolds
Our target is the blight resistance genes. We are sequencing the resistance QTL themselves; this work is already in progress:
QTL (linkage group)  Physical map contig #  Estimated contig size  Clones in minimum tiling path  Estimated clone lengths  DNA pool
G                    7039                   4.51 Mb                40                             6.22 Mb                  A
F                    403                    5.13 Mb                51                             7.64 Mb                  B
B                    9166                   2.50 Mb                24                             3.47 Mb                  C
B                    4269                   2.31 Mb                24                             3.45 Mb                  C
B                    3279                   1.68 Mb                19                             2.37 Mb                  D
B                    11956                  3.65 Mb                30                             5.06 Mb                  D
TOTALS                                      19.79 Mb               188                            28.2 Mb
• Sets of BAC contigs covering the QTLs were identified.
• Sequencing of each QTL is underway as contig pools.
• Genes will be identified using peach resistance QTL and Chinese chestnut (CC) transcripts.
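The totals in the QTL contig table can be checked by summing the per-contig rows. The sketch below does only that; the tuple layout and variable names are hypothetical. Summing the rounded row values gives 19.78 Mb for the contig sizes, so the 19.79 Mb shown on the slide presumably reflects unrounded inputs (an assumption, not something the slides state).

```python
# Rows from the QTL contig table:
# (linkage group, contig #, contig size Mb, clones in tiling path, clone length Mb, DNA pool)
qtl_contigs = [
    ("G", 7039, 4.51, 40, 6.22, "A"),
    ("F", 403, 5.13, 51, 7.64, "B"),
    ("B", 9166, 2.50, 24, 3.47, "C"),
    ("B", 4269, 2.31, 24, 3.45, "C"),
    ("B", 3279, 1.68, 19, 2.37, "D"),
    ("B", 11956, 3.65, 30, 5.06, "D"),
]

total_contig_mb = sum(row[2] for row in qtl_contigs)
total_clones = sum(row[3] for row in qtl_contigs)
total_clone_mb = sum(row[4] for row in qtl_contigs)

# Ratio of summed clone length to contig length: a rough measure of how much
# neighbouring clones in the minimum tiling paths overlap.
redundancy = total_clone_mb / total_contig_mb

print(f"contig total: {total_contig_mb:.2f} Mb")
print(f"clones:       {total_clones}")
print(f"clone total:  {total_clone_mb:.2f} Mb")
print(f"clone/contig: {redundancy:.2f}")
```

A ratio above 1 is expected, because clones in a tiling path overlap their neighbours, so the clone lengths sum to more than the region they cover.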
Year 3 - Gene discovery
Complete QTL sequences
Markers in QTL genes
Candidate genes from the QTLs
Candidate gene validation
[Diagram labels: Marker-aided breeding, Genome sequencing, Clonal testing, Transgenics, GE trees, New varieties]