LONG-STANDING QUESTIONS IN PLANT-PATHOGEN INTERACTIONS:
WHAT ARE THE MOLECULAR BASES OF SPECIFICITY BETWEEN HOST AND PATHOGEN?
WHAT IS THE GENETIC BASIS FOR VARIATION IN SUCH SPECIFICITY?
HOW CAN WE ACHIEVE DURABLE RESISTANCE?
WHAT NEW OPPORTUNITIES ARE PROVIDED BY HIGH-THROUGHPUT DNA SEQUENCING AND MARKER TECHNOLOGIES?
High Throughput DNA Sequencing and the Deployment of Resistance Genes
Richard Michelmore The Genome Center & Dept. Plant Sciences
University of California, Davis
http://michelmorelab.ucdavis.edu
Lettuce genetics, genomics & breeding
The Compositae Genome Project
Molecular basis of disease resistance: CHARGE, Niblrrs
OVERLAP AND INTEGRATION OF PROJECTS IN MICHELMORE LAB
Gary Shroth (Illumina): “One HiSeq will be able to generate as much sequence as was in GenBank in 2009, every four days”.
Decreasing cost of sequencing (1990 – 2010)
DNA sequence becoming an inexpensive commodity. New paradigms as to how DNA sequence is generated, handled and valued.
The Economist, August 12th, 2010
The Revolution in DNA Sequencing: Identification of Variation
• Clone by clone, Sanger sequencing: long, high accuracy reads (ABI 3730)
• Massively parallel, shorter reads: 454; Illumina Genome Analyzer & HiSeq 2000; ABI SOLiD; Helicos; (Complete Genomics)
• Imminent arrivals: Ion Torrent; Pacific Biosciences; Oxford Nanopore
Illumina Sequencing Platforms
Feature                          GAIIx       HiSeq 2000
Flowcells x surface imaging      1 x 1       2 x 2
Read length (bp)                 2 x 150     2 x 100
Yield per run (PF data)          50 Gb       200 to 350 Gb
Data rate                        5 Gb/day    25 to 40 Gb/day
Human genomes (30x) per run      0.5         >2
G. Shroth, Illumina
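The "human genomes per run" row can be checked from the yield figures. The sketch below is only a back-of-envelope calculation; the 3.2 Gb genome size is an assumption that does not appear on the slide.

```python
# Assumption: haploid human genome of about 3.2 Gb (not stated on the slide).
HUMAN_GENOME_GB = 3.2


def genomes_per_run(yield_gb, coverage=30, genome_gb=HUMAN_GENOME_GB):
    """Number of genomes that one run can cover to the given depth."""
    return yield_gb / (coverage * genome_gb)


def run_days(yield_gb, rate_gb_per_day):
    """Run time implied by a yield and a data rate."""
    return yield_gb / rate_gb_per_day


# Yields taken from the table above.
for name, yield_gb in [("GAIIx", 50), ("HiSeq 2000, initial", 200), ("HiSeq 2000, later", 350)]:
    print(f"{name}: {genomes_per_run(yield_gb):.2f} human genomes at 30x")

print(f"HiSeq 2000 run time at 25 Gb/day: {run_days(200, 25):.0f} days")
```

Under that assumption the GAIIx yield gives about half a genome at 30x and the initial HiSeq 2000 yield gives slightly more than two, in line with the table.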
Illumina Sequencing of Clusters of DNA Molecules: Reversible Terminator Chemistry
DNA (0.1-1.0 µg) -> sample preparation (fragmentation & addition of primers) -> cluster growth in flow cell -> sequencing (cycles of synthesis, imaging, & washing) -> image acquisition -> base calling (T G C T A C G A T …)
[Figure: numbered clusters in the flow cell and the base (A, C, G or T) incorporated at each cluster in each cycle]
G. Shroth, Illumina
Illumina HiSeq 2000
• Dual surface imaging
• Fast scanning and imaging
• Two flow cells
• Initially capable of 200 Gb per run -> 350 Gb
• Run time 7 to 8 days for 2x100 bp
• 25 Gb/day
• 2 billion paired-end reads
• <$10,000 per human genome
• <$200 per transcriptome
G. Shroth, Illumina
• Low instrument and reagent costs
• Silicon chip technology
• Single-use microchips
• Uses natural dNTPs and DNA polymerase
• Real-time detection of base incorporation
• No optics or enzymic amplification cascade
• Proprietary H+ ion sensitive layer
www.iontorrent.com
Sequencing determined by measuring H+ release following incorporation of nucleotide. Chip interrogated with cycles of A, T, C, G.
Interrogated with C
Interrogated with G (or A)
Interrogated with T
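The flow-based readout described above can be illustrated with an idealized simulation. This is a sketch only: it assumes noise-free signals proportional to the number of bases incorporated per flow, and it uses the A, T, C, G order given on the slide, which may not match the order used on a real instrument.

```python
# Flow order as stated on the slide; treated here as an assumption.
FLOW_ORDER = "ATCG"


def simulate_flows(read, n_cycles, flow_order=FLOW_ORDER):
    """Return (base, count) per flow for the strand being synthesized.

    count is the number of consecutive bases incorporated when the chip is
    flooded with that nucleotide (0 if the next base does not match).
    """
    signals = []
    pos = 0
    for i in range(n_cycles * len(flow_order)):
        base = flow_order[i % len(flow_order)]
        count = 0
        while pos < len(read) and read[pos] == base:
            count += 1
            pos += 1
        signals.append((base, count))
    return signals


def call_bases(signals):
    """Rebuild the read from idealized per-flow incorporation counts."""
    return "".join(base * count for base, count in signals)


flows = simulate_flows("AATCCCG", n_cycles=1)
print(flows)              # homopolymers give proportionally larger signals
print(call_bases(flows))
```

In this idealized model a homopolymer such as CCC is read in a single flow, so accuracy depends on how well the signal size distinguishes two, three or more incorporations.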
Simple kits and operation Benchtop convenience, small footprint
Fast. Single reads
Up to millions of reads 10s to 100s Mb output
Not single molecule sequencing
Compatible with preps for other libraries
www.iontorrent.com
Single Molecule Real Time (SMRT™) sequencing
• Recording natural DNA synthesis by DNA polymerase as it occurs
• Single molecule resolution
• Simple amplification-free sample prep
• Long reads, average read over 1 kb
• Fast: 1 to 3 bases incorporated per second; sample prep to data analysis in less than a day
• Low overall costs
• 80,000 Zero Mode Waveguides (ZMWs) monitored simultaneously, 2 sets per SMRT cell
• ~33% of ZMWs have only one polymerase
• Not for counting large numbers of tags
http://pacificbiosciences.com
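The figure of roughly one third of ZMWs holding a single polymerase is close to what random loading would allow. The sketch below assumes polymerases land in ZMWs independently, so that occupancy follows a Poisson distribution; that loading model is an assumption, not something stated on the slide.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k polymerases in a ZMW under Poisson loading."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def single_occupancy(mean):
    """Fraction of ZMWs expected to hold exactly one polymerase."""
    return poisson_pmf(1, mean)


N_ZMWS = 80_000  # from the slide

for mean in (0.5, 1.0, 2.0):
    frac = single_occupancy(mean)
    print(f"mean load {mean}: {frac:.3f} single, about {frac * N_ZMWS:,.0f} usable ZMWs")
```

Under this model the single-occupancy fraction peaks at a mean load of one polymerase per ZMW, where it equals 1/e (about 37%), so a value near 33% is close to the ceiling for random loading.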
Processive DNA Synthesis with Phospholinked Nucleotides
Incorporation of labeled nucleotide creates flash of light, captured by optics system and converted into base call with quality metrics
Base being incorporated held in detection volume for tens of milliseconds, producing a bright flash of light. Phosphate chain cleaved, releasing the attached dye molecule. Process repeats at 1 to 3 bases per second.
Kinetic detection of methylated bases during sequencing E.g.: N6-methyladenosine (mA)
www.nanoporetech.com
Exonuclease attached at entrance of α-hemolysin nanopore delivers individual DNA bases in order into the nanopore. Cyclodextrin attached to inside surface of nanopore acts as a binding site for individual DNA bases and allows accurate monitoring of their passage through the nanopore.
$1,000 ($100?) human genome coming =>
• $1,000 genome for many animals and plants
• $100 genome for fungi
• $10 genome for bacteria en masse
Not just genomic DNA sequence:
• DNA modifications
• epigenomics & copy number variation (CNV)
• expression analysis
Enormous amounts of sequence data
• Need for major data handling capabilities
• Vital role for bioinformatics
The Challenge and Opportunity: How to utilize the deluge of sequence data?
In near future: DNA sequence = an inexpensive commodity generated on a variety of platforms
High-throughput sequencing Sequence assembly
Annotation Acquisition of other relevant data
Display in genome browser
UC Davis Sequencing & Gene Expression Service Cores 2011 ->
Tissue, DNA or RNA samples brought to Genome Center
Researcher queries samples versus existing information over web
Integrated activities of DNA Technology & Bioinformatics Cores
Genomic sequencing
• De novo: microbial, animal and plant diversity; novel & unculturable organisms; biomes (bacteria = 100x human); novel genespace
• Re-sequencing: SNP and CNV discovery, TILLING; gene cloning, novel allelic diversity; Genome Wide Association Studies (GWAS); high resolution population genetics
• Mapping: BSAseq
Gene regulation
• Transcriptome sequencing for gene models and splicing
• RNAseq for expression analysis
• Small and non-coding RNAs
• Ribosome profiling
• ChIPseq for DNA binding sites
• DNA modifications and epigenomics
Gene discovery from non-model organisms
E.g. Cloning of 10 genes from wheat rust, Puccinia striiformis f.sp. tritici (collaboration with J. Dubcovsky)
Whole genome sequencing (<100Mb genome) Two lanes Illumina GA to ~60x.
Quicker, cheaper, more informative than gene-by-gene. Permanent resource for other studies.
Genomic Encyclopedia of Bacteria & Archaea www.jgi.doe.gov/programs/GEBA Using a phylogenetic approach to sample diversity for genome sequencing.
1000 (Human) Genomes Project
http://www.1000genomes.org
• Extending HapMap.
• 2,000 anonymous individuals from 20 populations.
• 4x coverage originally planned; deeper, 30x now feasible.
• Pilot studies: 15 M SNPs, 1 M small indels, 20 K structural variants.
• 25,000 human genomes sequenced by end of 2011 by this & other projects.
1001 Arabidopsis Genomes Project
BGI: 1,000 economically and scientifically important species in two years
http://ldl.genomics.org.cn
Analyses of Biological Diversity
Genome-Wide Association Studies (GWAS)
• Case and control comparisons to identify phenotype-marker associations
• Requires sufficient density of markers & numbers of individuals
• Currently based on SNP marker analysis
• Will be replaced by sequencing, at least in non-model species
• False positives due to multiple comparison issues
• Power issues due to loci of small effect
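The multiple comparison issue can be made concrete with a Bonferroni correction, one common (and conservative) way to control false positives. The marker count below is a hypothetical value chosen for illustration, not a figure from the slides.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of null markers passing an uncorrected threshold."""
    return alpha * n_tests


N_MARKERS = 500_000  # hypothetical number of SNPs tested
ALPHA = 0.05

print(f"uncorrected: about {expected_false_positives(ALPHA, N_MARKERS):,.0f} false positives expected")
print(f"Bonferroni per-marker threshold: {bonferroni_threshold(ALPHA, N_MARKERS):.1e}")
```

The stricter threshold is also why loci of small effect are hard to detect: a weak association needs many more individuals before its p-value drops that low.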
E.g. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. The Wellcome Trust Case Control Consortium. (2007) Nature 447:661-678.
PLANNED ASSOCIATION MAPPING POPULATIONS IN LETTUCE
• Cultivated lettuce, within L. sativa types: Crisphead x Crisphead; Romaine x Romaine; Leafy x Leafy + Butterhead
• Cultivated lettuce, between types: L. sativa x L. sativa
• Primary genepool, between & within: L. sativa x L. serriola; L. serriola x L. serriola
• Secondary genepool, between & within: L. sativa x L. saligna; L. saligna x L. saligna
For each population: intercross 20 genotypes for 3 generations to break down LD & population substructure. Self to produce 200 F7:8 RILs. Phenotype & sequence F7:8 RILs.
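Selfing to the F7:8 generation is what makes the RILs nearly homozygous. The sketch below shows the expected decline in heterozygosity, under the simplifying assumption that selfing starts from a fully heterozygous plant; the intercrossed populations planned here would start from a different level.

```python
def residual_heterozygosity(selfing_generations, initial=1.0):
    """Expected fraction of heterozygous loci after repeated selfing.

    Each round of selfing halves the expected heterozygosity.
    """
    return initial * 0.5 ** selfing_generations


# Assumption: generations are counted from a fully heterozygous F1,
# so an F7 plant has been through 6 rounds of selfing.
for generation in range(2, 9):
    het = residual_heterozygosity(generation - 1)
    print(f"F{generation}: {het:.4f} of loci expected heterozygous")
```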
Old paradigm (slow and inflexible):
• One-by-one marker development
• Utilization of core set of reference markers
Current paradigm (faster but specific to populations):
• Sequence transcriptome of parents to identify 1,000s of SNPs
• Develop informative SNP panel for specific sets of crosses
• Run SNPs on segregating individuals
Future paradigm (fast, flexible, & highly informative):
• Sequence transcriptome or genespace of segregating individuals
Rate limiting steps:
• Informative populations
• Accurate phenotyping
• Library preparation
• Sequencing not limiting
• Data analysis?
Genetic Mapping
Parent-independent genotyping for constructing an ultrahigh-density linkage map based on population sequencing. Xie et al., 2010. PNAS 2010.
238 rice RILs each sequenced to 0.055x, 13x in aggregate. Barcoded and multiplexed. 2x 36 nt paired-end reads, 20.6 Mb total single run. Genotypes inferred from RILs using maximum parsimony of recombination & HMM. New capabilities => any species tractable in a single run.
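At 0.055x per line, each RIL has reads at only a small fraction of SNPs, so genotypes have to be inferred from neighbouring markers. The sketch below is a deliberately simplified two-state HMM decoded with Viterbi; it assumes the parental alleles at each SNP are already known, which is not the parent-independent method of Xie et al., and the error and switch rates are illustrative guesses.

```python
import math

N_RILS, PER_RIL_COVERAGE = 238, 0.055
print(f"aggregate coverage: {N_RILS * PER_RIL_COVERAGE:.2f}x")


def viterbi_genotypes(obs, error=0.05, switch=0.01):
    """Most likely parental origin ('A' or 'B') at each ordered SNP.

    obs holds 'A', 'B' or None (no read at that SNP) for one RIL.
    error and switch are assumed per-SNP miscall and recombination rates.
    """
    states = ("A", "B")

    def emit(state, o):
        if o is None:
            return 0.0  # missing data carries no information
        return math.log(1 - error) if o == state else math.log(error)

    def trans(prev, cur):
        return math.log(1 - switch) if prev == cur else math.log(switch)

    score = {s: math.log(0.5) + emit(s, obs[0]) for s in states}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[best_prev] + trans(best_prev, s) + emit(s, o)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


calls = ["A", "A", None, "A", "B", "A", "A", "B", "B", None, "B", "B"]
print("".join(viterbi_genotypes(calls)))
```

With these settings the isolated B call is treated as a sequencing error rather than a double recombination, and the missing positions are filled in from their neighbours.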
E.g. BSAseq: High-throughput sequencing of the 52b2 mutant of Arabidopsis to identify the causal mutation using sequence-based mapping of a bulked segregating population. Cuperus et al. (2010) PNAS 107:466-471.
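The idea behind sequence-based bulked segregant mapping can be sketched as a scan for the region where the mutant-parent allele approaches fixation in the pooled sample. The positions and read counts below are invented for illustration, and the fixed-size SNP window is an assumption rather than the published analysis.

```python
def peak_window(positions, alt_counts, depths, window=3):
    """Find the run of consecutive SNPs with the highest pooled allele frequency.

    Returns (start_position, end_position, frequency), where frequency is the
    share of reads carrying the mutant-parent allele across the window.
    """
    best = None
    for i in range(len(positions) - window + 1):
        alt = sum(alt_counts[i:i + window])
        depth = sum(depths[i:i + window])
        if depth == 0:
            continue  # no reads in this window
        freq = alt / depth
        if best is None or freq > best[2]:
            best = (positions[i], positions[i + window - 1], freq)
    return best


# Hypothetical SNP positions (Mb) and read counts from a pool of mutant plants.
positions = [1, 2, 3, 4, 5, 6, 7]
alt_counts = [10, 12, 18, 20, 19, 13, 9]
depths = [20, 20, 20, 20, 20, 20, 20]

start, end, freq = peak_window(positions, alt_counts, depths)
print(f"candidate interval: {start} to {end} Mb, allele frequency {freq:.2f}")
```

Unlinked regions sit near 0.5 in the pool, so the interval with a frequency close to 1 is where the causal mutation is expected to lie.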
• Rapidly replacing chip hybridizations
• Analysis of unknown genomes/genes, no a priori knowledge required
• No limitation on number of genes assayed
• Digital, unambiguous readout
• High sensitivity, specificity, dynamic range
• Opportunity to progressively sequence to greater resolution
• Detection of a few mRNAs per cell possible
• Low cost (vs. SAGE, MPSS, = chips)
• Multiplex samples: 12 per lane, 192 libraries per run on HiSeq 2000
• Quantification of splice variants
• Quantification of allele specific expression
• Currently limited by speed (& cost) of library preparation ->
52 PHENOTYPES MAPPED RELATIVE TO 289 RESISTANCE GENE CANDIDATES
Nearly all resistance phenotypes segregate with a cluster of NBS-LRR encoding genes. 100s of resistance genes in all genotypes.
Complex clusters of phenotypes & candidate genes
Leah McHale et al., 2009. Theor. Appl. Genet. 118:1223-4. Maria Truco, Oswaldo Ochoa, Kirsten Lahre, et al. unpublished
MAJOR PROTEIN MOTIFS SHARED BETWEEN PLANT DISEASE RESISTANCE GENE PRODUCTS ACTIVE AGAINST DIVERSE PATHOGENS & PESTS
[Figure: resistance gene products drawn by domain structure. Genes as labelled: Cf-2, Cf-4, Cf-5, Cf-9, RPP27; Xa21, FLS2; Pto, L6, M, N, RPS4, RPP1, RPP5, RRS1; RPM1, RPS2, Prf, I2, Mi, Dm3, RPP8, Bs2; RPP13, RPS5, Xa1, Pi-B, Rx1, Rx2, Rp1, Gpa2; RPW8; Rpg1. Legend: Nucleotide Binding Site; Leucine-Rich Repeat; Protein Kinase; Toll-Interleukin Homology; Coiled-coil domain; C terminal domain.]
FUNCTIONAL ANALYSIS OF CANDIDATE GENES USING RNAi: SILENCING RGC2 GENE FAMILY WITH GUS REPORTER FRAGMENT FOR SILENCING
[Figure: RNAi construct; labels: 35S, ocs, RGC2B = Dm3, ~500bp, pdk intron, ~400bp GUS, LG2]
• Generate stable transgenic plants with RNAi construct.
• Test for silencing of GUS using Agrobacterium-mediated transient assays with 35S::GUS.
• Test plants exhibiting silencing for disease resistance to isolates expressing avirulence to multiple Dm genes in cluster.
Wroblewski et al. 2007. Plant J. 51:803-818
RNAi using the NBS region of RGC12G silences the linked Dm7, Dm4 and Dm11 genes
Crosses: CG w/ RNAi (T-DNA/-)(dm0) x LSE57/15 (Dm7); R4T57 (Dm4) x CG w/ RNAi (T-DNA/-)(dm0); Capitan (Dm11) x CG w/ RNAi (T-DNA/-)(dm0)
[Figure: for each cross, four progeny shown with the values "+ + - -", "R R S S" and "no no yes yes"; panel labels: GUS, Avr4, Silencing, Dm4 and GUS, Avr11, Silencing, Dm11]
Marilena Christopoulou
Conclusion: Dm4, Dm7 and Dm11 are encoded by RGC12G or related sequence(s)
On-going: making RNAi lines for most candidate resistance genes
= excellent markers for breeding resistance
COMBINING RESISTANCES TO FUSARIUM AND DIEBACK IN CRISPHEAD AND ROMAINE
In collaboration with Ivan Simko, USDA, Salinas
[Figure: genetic map of chromosome 2, scale 0 to 140. Phenotypes shown: Ra, Dm1, ANT2, Dm2, Dm6, Dm14, Dm15, Dm16, Dm18, FUS2, Dm3, Tvr. Candidate gene markers shown: RGC2F, RGC2A, AY153845, RGC2X, QGB28O08, RGC2J, RGC2M, RGC2B, RGC2S, CLSM16923, RGC2Y, RGC2U, RGC4U, CLX_S3_4455, QGD10N11, CLX_S3_14182, CLX_S3_1771, CLX_S3_9978, LsatNBS05, CLX_S3_13590, CLX_S3_11288, CLX_S3_12930, CLX_S3_3045, CLX_S3_4550, CLX_S3_3943, CLX_S3_15473, CLX_S3_6671, QGI11K10, CLX_S3_7649, CLX_S3_4292, CLX_S3_15231, QGC22B19, CLX_S3_4116, CLX_S3_1216, CLX_S3_12591, CLX_S3_4001, CLX_S3_7749, CLX_S3_8485, CLX_S3_891, CLX_S3_5919, QGF20E14, CLX_S3_7847, RGC1B, CLX_S3_6313]
Dieback resistance from crisphead
Fusarium resistance from romaine
Danger that transfer of resistance to one disease will introduce susceptibility to the other.
Broke linkage. Have lines resistant to both diseases.
QTL for Fusarium resistance close to resistance to dieback
Chr 2
Large cluster of NBS-LRR encoding genes and resistance phenotypes
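Breaking a tight linkage such as the one described above depends on screening enough progeny to find a rare recombinant. The sketch below estimates that number; the per-plant recombinant probability is a hypothetical value, since the slides do not give the distance between the two loci.

```python
import math


def plants_needed(recombinant_prob, confidence=0.95):
    """Smallest number of progeny giving at least one desired recombinant.

    recombinant_prob is the chance that any single plant carries the
    desired recombinant; plants are assumed independent.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - recombinant_prob))


# Hypothetical: 1% of progeny carry the wanted recombinant chromosome.
print(plants_needed(0.01))        # progeny to screen for 95% confidence
print(plants_needed(0.01, 0.99))  # progeny to screen for 99% confidence
```

Markers flanking the two resistance loci make this screening practical, because recombinants can be identified by genotype before any disease testing.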
CONSEQUENCES OF RESISTANCE GENE GENETICS & MOLECULAR BIOLOGY TO PLANT IMPROVEMENT
• All genotypes have large numbers (100s) of resistance genes.
• Backcrossing often introgresses clusters not single resistance genes.
• Introgression will replace resistance genes in recurrent parent.
• Clustering implies pyramiding some combinations of genes will be very difficult.
• Large number and wide variety of recognition specificities possible.
• Only finite number of specificities against a particular pathogen??
• Possibility for repeated introgression of same resistance specificity.
• Some non-host resistance may be pyramids of specific genes.
• Quantitative resistance can be conferred by single NBS-LRR genes.
• Divergent selection using multiple/different resistances optimal.
• Opportunities for transfer across sexual compatibility barriers.
HOW CAN WE ACHIEVE DURABLE RESISTANCE?
Strategies to enhance durability of resistance
Use knowledge of pathogen variability to inform strategies for resistance gene deployment. New opportunities from new technologies for rapid and comprehensive genotyping.
Utilize ‘durable’ resistance – empirical rather than mechanistic?
Match life expectancy of resistance and cultivar
PYRAMIDING RESISTANCE GENES USING MARKER-ASSISTED SELECTION OF CAUSAL GENES
[Figure: MULTIPLE RECOGNITION EVENTS. DIRECT OR INDIRECT PATHOGEN PRODUCTS -> SIGNAL CASCADE, RESISTANCE RESPONSE & RAPID CELL DEATH. SELECTION PRESSURE ON PATHOGEN FOR LACK OF ACTIVITY OF MULTIPLE AVIRULENCE FACTORS.]
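The evolutionary hurdle created by a pyramid can be illustrated with a simple product rule. The sketch assumes that virulence on each resistance gene arises independently, and the frequencies and population size are hypothetical; real pathogen populations with sexual recombination or pre-existing multi-virulent races can violate the independence assumption.

```python
import math


def pyramid_virulence_frequency(per_gene_frequencies):
    """Frequency of pathogen genotypes virulent on every gene in a pyramid,
    assuming virulence on each gene arises independently."""
    return math.prod(per_gene_frequencies)


def expected_virulent(population_size, per_gene_frequencies):
    """Expected number of fully virulent individuals in the population."""
    return population_size * pyramid_virulence_frequency(per_gene_frequencies)


FREQ = 1e-3          # hypothetical frequency of virulence on any one gene
POPULATION = 10**6   # hypothetical pathogen population size

for n_genes in (1, 2, 3):
    count = expected_virulent(POPULATION, [FREQ] * n_genes)
    print(f"{n_genes} gene(s): about {count:g} fully virulent individuals expected")
```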
Diversify selection pressure on pathogen
Heterogeneity of resistance genes in space and/or time:
• diversify resistance sources,
• pipeline with different resistance genes from other programs,
• multilines and cultivar mixtures,
• sym- and allopatric gene deployment.
Roy Johnson (1984). A critical analysis of durable resistance. Ann. Rev. Phytopathol. 22:309-30.
IMPACT OF HIGH THROUGHPUT SEQUENCING & MASSIVELY PARALLEL GENOTYPING ON RESISTANCE STRATEGIES
• Global rather than gene-by-gene analysis
• Saturation of identification of candidate genes
    Recognition, signal transduction, response
    SNPs in causal genes
• Characterization of germplasm
    Full genome resequencing of 10 – 100 genotypes
    Global genotyping of 1,000 – 10,000 genotypes
    Natural variation in 1°, 2°, & 3° genepools
    Vast numbers of resistance genes available
• Characterization of pathogens
    (A)virulence factors
    Pathogen variability
• Gene deployment
    Marker assisted selection of causal genes
    Pyramids of multiple genes, conventional & transgenic
    Heterogeneity between genotypes in space & time
    Fragment selection pressure on pathogen populations
    Manage pathogen evolution
PATHOGEN POPULATION GENETICS SHOULD DRIVE DEPLOYMENT OF RESISTANCE GENES
Influenza paradigm
Continual sampling of pathogen:
• Virulence phenotyping
• Gene-space sequencing (10s)
• SNP genotyping (1000s)
+
Deployment of effective resistance genes:
• Pyramiding, MAS or effector-driven selection
• Allo- and sympatric diversity
• Temporal adjustment of resistance genes deployed
• Transgenic approaches for novel resistance strategies
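The link between pathogen sampling and gene deployment can be sketched as a small decision step: summarize virulence in the sampled isolates, then choose the gene combination that the fewest isolates can overcome. The isolate names and virulence phenotypes below are entirely hypothetical; only the Dm gene names come from the slides.

```python
from itertools import combinations


def virulence_frequency(isolates, gene):
    """Share of sampled isolates that are virulent on the given gene."""
    return sum(gene in virulent_on for virulent_on in isolates.values()) / len(isolates)


def best_pyramid(isolates, genes, size=2):
    """Gene combination overcome in full by the fewest sampled isolates."""
    def overcome(combo):
        return sum(all(g in v for g in combo) for v in isolates.values())
    return min(combinations(sorted(genes), size), key=overcome)


# Hypothetical survey: isolate -> set of Dm genes it is virulent on.
isolates = {
    "iso1": {"Dm3", "Dm7"},
    "iso2": {"Dm3"},
    "iso3": {"Dm3", "Dm11"},
    "iso4": {"Dm7"},
}
genes = ["Dm3", "Dm7", "Dm11"]

for gene in genes:
    print(f"{gene}: virulence frequency {virulence_frequency(isolates, gene):.2f}")
print("suggested pyramid:", best_pyramid(isolates, genes))
```

Repeating this as new isolates are sampled is what allows the deployed genes to be adjusted over time, as in the annual revision of influenza vaccines.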
Potential Challenges to Implementation of the Influenza Paradigm for Resistance Gene Deployment
Collection of pathogen samples from diverse, low-tech locations. Global coordination of pathogen sampling.
Data processing and interpretation.
Data-driven consensus building and decision making.
Sufficient numbers of genes for pipeline (conventional &/or transgenic).
Persuading breeders (public and private) to participate and coordinate.
Providing agronomically acceptable options to farmers and consumers.
Revision/adaptation of regulatory/registration requirements to accommodate agronomically equivalent cultivars with different resistance genes.
SUMMARY
• DNA sequence becoming an inexpensive commodity generated on a variety of platforms.
• New paradigms as to how DNA sequence is generated, handled and valued.
• Enormous amounts of sequence data imminent.
• Need for major bioinformatics capabilities: data into knowledge.
• Unprecedented opportunities for discovery and manipulation of variation in plants and pathogens.
• Durable resistance: old ideas, new opportunities.
    Diversify sources and types of resistance.
    Pyramid to increase evolutionary hurdle for pathogen.
    Fragment/diversify selection pressure on pathogen.
• Pathogen population genetics should drive deployment of resistance genes: the influenza paradigm.