The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
M. larici-populina Transcriptome
Mlp Summer workshop – INRA Nancy, August 20-21 2008
Duplessis Sébastien (INRA Nancy)
Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM
Mlp Transcriptome – Goals and Means
Goals
Gene Expression
- Identify genetic determinants involved in Mlp biology
- Identify sets of genes involved in development of infection structures (secretion, effectors, avirulence, ...)
- Identify sets of genes involved in biotrophy (nutrition, transport)
- Identify gene expression profiles during the plant-fungal interaction
Gene Models Annotation
- Validation of Gene Models prediction
- Detection of new Gene Models
Means
EST sequencing
- Sanger ESTs from specific cDNA library (cDNA cloning / 100-1000s ESTs)
- 454-pyrosequencing from specific tissue (no cDNA cloning / 200-400k reads)
454: 80 Mb in 1 run for 10K€ vs. 1000s of Sanger ESTs for much more
=> Genes expressed in a given tissue (specific and ubiquitous)
=> No gene prediction a priori
Array-based expression profiling
- DNA Chips – NimbleGen Systems oligonucleotide arrays
=> Expression of all predicted genes represented on the array
=> Gene prediction a priori or EST sequencing required
Mlp Transcriptome – EST sequencing I
cDNA library of Mlp 98AG31 urediniospores and germlings
250 µg of DNase-free RNA were isolated from Mlp 98AG31 urediniospores and germlings (urediniospores grown for less than 12 h on agar) and sent to JGI
Mlp is an obligate biotroph so spores are unique sources for uncontaminated ESTs
cDNA Library => 29,081 cDNA clones
5'/3' sequencing => 52,269 ESTs (including ~ 4,500 ESTs previously obtained at INRA Nancy)
EST assembly => 11,535 consensus sequences (mean size 780 nt; range 100 to 5,052 nt)
- 6,599 singletons
- 4,936 clusters
- 119 consensus sequences contain > 50 ESTs
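The assembly figures above can be cross-checked with a few lines of Python. This is only a sketch using the numbers quoted on the slide; the variable names are illustrative and not part of any pipeline used for the project.

```python
# Figures copied from the slide; variable names are illustrative only.
clones = 29_081
ests = 52_269          # includes the ~4,500 ESTs previously obtained at INRA Nancy
consensus = 11_535
singletons = 6_599
clusters = 4_936

# Every consensus sequence is either a singleton or a multi-EST cluster.
assert singletons + clusters == consensus

ests_per_clone = ests / clones
singleton_share = singletons / consensus

print(f"ESTs per clone: {ests_per_clone:.2f}")
print(f"Singleton share of consensus sequences: {singleton_share:.1%}")
```

The ratio of ESTs to clones is below 2 even with the earlier INRA ESTs included, which is what one would expect if not every 5'/3' read passed quality filtering.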
Best BLAST hits of the most abundant ESTs consisted of:
- stress response TF rds1, HSP, glycosidase, ubiquitin, fruiting body protein, cyclin, SOD, Ras, antibiotic resistance, protease, laccase, tubulin
- dehydrogenases and cytP450 from Uromyces fabae
- predicted gene models from P. graminis
Comparison to released Pucciniales ESTs (e-value < 10^-5)
Phakopsora pachyrhizi (soybean rust) ESTs => germinated/non-germinated spores, infected tissues
Puccinia graminis f. sp. tritici (wheat stem rust) ESTs => germinated/non-germinated urediniospores and teliospores
[Figure: overlap of Mlp ESTs with Pp ESTs (46,411; 28,536; 5,858) and with Pgt ESTs (45,812; 56,753; 6,483)]
4,045 Pgt spore ESTs
5,738 Pp spore ESTs
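A comparison of this kind is typically summarised by counting query ESTs that have at least one hit below the e-value cutoff. The sketch below assumes standard 12-column tabular BLAST output, in which the e-value is the 11th column; the function name and the example rows are hypothetical and do not reproduce the analysis behind the slide.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # threshold quoted on the slide


def queries_with_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return the set of query IDs with at least one hit below the cutoff.

    Expects 12-column tab-separated BLAST output; column 11 is the e-value.
    """
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) < cutoff:
            hits.add(row[0])
    return hits


# Made-up rows for illustration only; IDs and scores are not real data.
example = (
    "mlp_est_1\tpgt_est_9\t92.1\t200\t16\t0\t1\t200\t5\t204\t3e-40\t150\n"
    "mlp_est_2\tpgt_est_3\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.002\t30\n"
)
print(sorted(queries_with_hits(example)))
```

Only the first example query passes the cutoff; the second has an e-value of 0.002 and is ignored.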
Mlp 98AG31 ESTs for Gene Prediction and Gene model support
ESTs were used in JGI and EuGene predictions
=> 27% of Gene Models supported (4,507 Gene Models)
ESTs to support gene curation
=> ESTs and clusters are shown on the JGI Melampsora website
Mlp Transcriptome – EST Sequencing II
M. medusae f. sp. deltoidae (MMD)
- Multiple isolates, different growth stages (field)
M. larici-populina (MLP and MLP-H)
- Multiple isolates, different growth stages (field)
- Single isolate, haustoria-enriched (in vitro)
M. medusae f. sp. tremuloidae (MMT)
- Single isolate, 13 days growth (in vitro)
M. occidentalis (MO)
- Single isolate, 13 days growth (in vitro)
cDNA Libraries from various Melampsora Spp. (Feau, Joly, Hamelin, CFS, Canada)
Library  Construction kit  # clones sequenced  # readable sequences  # contigs  # singletons
MMD      Stratagene        5,541               3,695                 465        589
MLP      Stratagene        3,008               2,493                 282        564
MLP-H    Clontech          3,708               3,137                 615        1,034
MMT      Clontech          3,008               2,793                 638        999
MO       Clontech          3,008               2,642                 367        1,285
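Two derived quantities make the libraries easier to compare: the fraction of sequenced clones that gave a readable sequence, and the number of unique sequences (contigs plus singletons). The sketch below computes both from the values in the table; the `summarize` helper is hypothetical.

```python
# Values transcribed from the table: (kit, clones, readable, contigs, singletons)
libraries = {
    "MMD":   ("Stratagene", 5541, 3695, 465, 589),
    "MLP":   ("Stratagene", 3008, 2493, 282, 564),
    "MLP-H": ("Clontech",   3708, 3137, 615, 1034),
    "MMT":   ("Clontech",   3008, 2793, 638, 999),
    "MO":    ("Clontech",   3008, 2642, 367, 1285),
}


def summarize(libs):
    """Return {library: (readable fraction, contigs + singletons)}."""
    return {
        name: (readable / clones, contigs + singletons)
        for name, (_kit, clones, readable, contigs, singletons) in libs.items()
    }


for name, (readable_frac, unique_seqs) in summarize(libraries).items():
    print(f"{name:6s} readable {readable_frac:.1%}  unique sequences {unique_seqs}")
```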
Annotation of Melampsora spp. ESTs
Taxonomic origin of hits:
- No hits 70.6%
- Fungi 26.8%
- Plants 1.4% (poplars 13%, others 87%)
- Invertebrates 0.74%
- Vertebrates 0.25%
- Prokaryota 0.18%
Proteins with matches:
- Hypothetical proteins 63%
- Known proteins in public databases 37%
Functional categories:
- Protein synthesis 27%: ribosomal proteins (77%), translational factors (22%), tRNA-synthetases (1%)
- Metabolism 15%: carbohydrates, amino acids, lipids (66%); biosynthesis of cofactors and vitamins (18%); N, P and S (9%); nucleotides (7%)
- Unclassified 12%
- Cell defense 6%: stress response (72%), detoxification (28%)
- Cellular organization 6%
- Transport facilitator 6%
- Energy 5%: glycolysis (42%), respiration (33%), TCA pathway (17%), gluconeogenesis (8%)
- Protein destination 5%
- Intracellular traffic 5%
- Transcription 4%
- Signal transduction mediators 4%
- Avirulence and pathogenicity factors 3%: haustorially expressed secreted proteins (47%), rust transferred protein precursors (27%), planta induced rust proteins (13%), other (13%)
- Cell growth/Cell division/DNA synthesis 2%
Figure 1. Gene prediction and classification of the 4,867 assembled ESTs from the four Melampsora libraries. ESTs with significant matches (Blastx against the Uniprot database; E values < 10^-20) were classified into categories according to the functional nomenclature presented in Kamoun et al. (1999).
Feau et al. 2007. Can. J. Bot.
Annotation of Melampsora spp. ESTs
[Bar chart: No. of assembled ESTs representing the protein family (axis ticks 0, 5, 10, 15, 20 and 196), by species: M. larici-populina, M. medusae f. sp. deltoidae, M. medusae f. sp. tremuloidae, M. occidentalis]
Protein families shown: Protein/domain of unknown function; RNA recognition motif; ATP synthase; Ubiquitin domain; Core histone H2A/H2B/H3/H4; RNA-metabolising metallo-beta-lactamase; Elongation factor; Thioredoxin; 14-3-3 protein; Ras family; Cytochrome b5-like Heme/Steroid binding domain; Helicase conserved C-terminal domain; CFEM domain; Cyclophilin type peptidyl-prolyl cis-trans isomerase; Putative GTPase activating protein for Arf; WD domain, G beta repeat; Zinc finger (C2HC/CCCH/ZPR1); Actin; Cytochrome c oxidase subunit Va/Vb; Enolase; Acyl CoA binding protein; Heat shock protein; EF hand; Mitochondrial carrier protein; Ribosomal proteins
Feau et al. 2007. Can. J. Bot.
Mlp Transcriptome – EST Sequencing III: 454-pyrosequencing
454-pyrosequencing of poplar leaf infected tissues
Melampsora is an obligate biotroph => specialized infection structures (haustoria) formed after 16 h post-inoculation (pi) and uredinia formed after 7 dpi only in the plant host
Strong Mlp invasion of plant tissues was observed at 4 dpi (Rinaldi et al., 2007)
Pyrosequencing allows the generation of hundreds of thousands of sequences from isolated transcripts
=> 200,000 ESTs from transcripts isolated from infected poplar leaves at 4 and 7 dpi with 454 GS-FLEX (Roche) by Cogenix
- Transcripts expressed during plant infection
- Transcripts involved in infection structure development, maintenance and biotrophy
- Transcripts involved in spore formation and maturation
- Identification of plant infection-specific transcripts by comparison with Sanger ESTs
Mlp Transcriptome – 454-pyrosequencing
(From Ellegren, Mol. Ecol. 2008)
454-sequencing at JGI
1. 250 µg of total RNA were isolated from poplar leaves ('Beaupré') infected with Mlp 98AG31, at 4 dpi and 7 dpi
2. cDNA synthesis with SMART cDNA synthesis kit from 60 ng purified mRNA
3. 10 µg cDNA recovered and sent to Cogenix for 454-pyrosequencing on GS-FLEX (Roche)
4 dpi: infection hyphae, haustoria
7 dpi: infection hyphae, haustoria, uredinia, spore-forming cells
Pictures by S Hacquard & S Duplessis (2008), by confocal microscopy with PI/Uvitex staining
Cogenix report on 454-sequencing
454-pyrosequencing generates > 400,000 sequences, or 2 x 200,000 sequences, in 1 run
Infected poplar tissues => 185,663 sequences
454 sequences are short (mean length 203 nt) and require assembly for transcript reconstruction
Assembly by Newbler => 148,688 reads assembled into 10,629 contigs & 36,975 unassembled reads (= singletons?)
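The read accounting in the Cogenix report can be verified directly, and the mean read length gives a rough estimate of the total sequence yield. A small sketch using only the figures quoted above (variable names are illustrative):

```python
total_reads = 185_663
mean_read_length_nt = 203
assembled_reads = 148_688
unassembled_reads = 36_975
newbler_contigs = 10_629

# Assembled and unassembled reads should account for every read.
assert assembled_reads + unassembled_reads == total_reads

approx_total_mb = total_reads * mean_read_length_nt / 1e6   # rough yield in Mb
assembled_fraction = assembled_reads / total_reads
reads_per_contig = assembled_reads / newbler_contigs

print(f"Approximate yield: {approx_total_mb:.1f} Mb")
print(f"Reads placed in contigs: {assembled_fraction:.1%}")
print(f"Mean reads per contig: {reads_per_contig:.1f}")
```

The yield estimate is approximate because it multiplies the read count by a mean length rather than summing actual read lengths.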
Newbler assembly vs. MIRA assembly
Newbler is a de novo assembler designed for genomic sequences (not transcripts), working in flow space (flowgrams), not nucleotide space
Newbler tends to eliminate many reads for no obvious reason (> 38,000 reads are lost)
Cogenix recommended the use of another de novo assembler dedicated to transcript assembly
CAP3 is not recommended
MIRA is an EST assembler recently updated for 454 data
=> http://chevreux.org/projects_mira.html
MIRA generates more contigs than Newbler => 17,511 contigs (including 2,600 singletons)
MIRA provides information on overall quality of sequences (tag 'too short' = low-quality sequences)
GenomeThreader (Gth) maps transcript sequences to a genome sequence
MIRA contigs are mapped to the Mlp and poplar genomes to identify fungal and plant transcripts
[Figure: Newbler vs. MIRA. Number of contigs (Nb contigs) per bin, for Mlp sequences and poplar sequences: two panels binned from 0.5 to 1.00 in steps of 0.02, and two panels binned by e-value (10e-5 - 10e-20, 10e-20 - 10e-50, 10e-50 - 10e-100, 10e-100 - 0.0, 0.0)]
Singleton reads from Newbler are mostly low-quality sequences
Final MIRA assembly vs. poplar and Mlp genomes
- Contigs that showed a Gth score < 0.9 were dissolved into singletons
- Contigs attributed to both genomes with Gth scores > 0.9 were manually resolved
- Contigs attributed to a genome and containing reads attributed to the other genome were manually inspected with Consed => new contigs/singletons
- Singletons with Gth scores < 0.9 were not retained
5,956 contigs & 9,562 singletons attributed to Mlp
6,414 contigs & 21,400 singletons attributed to Poplar
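The score-based part of these rules can be written as a small decision function. This is a sketch, not the procedure actually used: the `triage` function and its return labels are hypothetical, a score equal to 0.9 is treated as passing (the slide does not say), and singletons matching both genomes are sent to manual review by analogy with contigs. The Consed inspection of contigs containing reads from the other genome is not modelled.

```python
GTH_THRESHOLD = 0.9  # cutoff quoted on the slide


def triage(kind, mlp_score, poplar_score, threshold=GTH_THRESHOLD):
    """Suggest what to do with one sequence given its best Gth scores.

    kind is "contig" or "singleton"; a score of None means no alignment.
    Treating a score equal to the threshold as passing is an assumption.
    """
    mlp_ok = mlp_score is not None and mlp_score >= threshold
    poplar_ok = poplar_score is not None and poplar_score >= threshold
    if mlp_ok and poplar_ok:
        return "manual review"          # attributed to both genomes
    if mlp_ok:
        return "Mlp"
    if poplar_ok:
        return "poplar"
    # Below threshold on both genomes: contigs are split up, singletons dropped.
    return "dissolve into singletons" if kind == "contig" else "discard"


print(triage("contig", 0.97, 0.40))
print(triage("contig", 0.95, 0.93))
print(triage("singleton", None, 0.50))
```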
PASA (Program to Assemble Spliced Alignments)
PASA is a tool designed for curation of gene catalogs using sets of ESTs and FL-cDNA. It is based on stringent alignment to the genome sequence with GMAP, assembly into clusters based on position on the genome sequence, and comparison to the current catalogue of gene models => curation
PASA was used in several published 454-analyses, and in Arabidopsis community for gene curation
PASA => Mlp EST (Sanger & 454 contigs) vs. Mlp genome/gene models
PASA outputs for Mlp 454 Contigs
PASA was run using all 454 reads against the Mlp genome; a similar number of gene models was supported
PASA outputs for Mlp Sanger contigs
Total of 6,294 Mlp Gene Models supported (38%)
Examples of gene model curation based on Mlp 454 contigs proposed by PASA
Most abundant transcripts supporting Mlp Gene Models identified through 454-sequencing
4,010 gene models supported by 454 ESTs
- 935 with no hits in nr/Swiss-Prot; 391 specific to Pucciniales; 519 specific to Mlp
- 265 encode small secreted proteins (SSPs) => 166 with no hits in nr/Swiss-Prot; 34 specific to Pucciniales; 128 specific to Mlp
Mlp Transcriptome – NimbleGen Systems oligonucleotide arrays
NimbleGen Systems expression oligonucleotide arrays
~390,000 60-mer oligoprobes evenly distributed on a 2 cm² array
4-plex arrays = 80,000 to 90,000 probes per array (+ controls)
Set of 8 oligoprobes/gene duplicated in Laccaria bicolor
16,694 JGI models + new EuGene models with 454 support [All 454-supported new CDS?]
17,000 to 20,000 Mlp gene models => 4 probes/gene => no duplicated probes => Populus filtered
10 x 4-plex NimbleGen arrays ordered – Design ASAP
Mlp gene expression during time-course infection
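The choice of 4 probes per gene follows from the probe budget of one sub-array. The sketch below works through that arithmetic with the figures from the slide; the helper name is hypothetical, and the real design also has to reserve space for controls and filter probes against Populus.

```python
TOTAL_PROBES = 390_000                   # approximate features on the full array
PLEX = 4                                 # sub-arrays per slide
USABLE_PER_SUBARRAY = (80_000, 90_000)   # range quoted on the slide, controls excluded


def probes_per_gene(n_genes, usable_probes):
    """Largest whole number of probes per gene that fits on one sub-array."""
    return usable_probes // n_genes


print("Features per sub-array before controls:", TOTAL_PROBES // PLEX)
for n_genes in (17_000, 20_000):
    for usable in USABLE_PER_SUBARRAY:
        print(n_genes, "genes,", usable, "probes ->",
              probes_per_gene(n_genes, usable), "probes/gene")
```

With 20,000 gene models, 4 probes per gene already uses 80,000 probes, so duplicated probe sets would not fit on a 4-plex sub-array.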
Mlp Transcriptome – Conclusions
Conclusions
- 52,269 Mlp 98AG31 ESTs support 27% of JGI Mlp gene models
- ESTs from other Melampsora spp. to help in annotation (+ polymorphism study)
- 185,000 454-reads were assembled into 12,370 contigs & 30,962 singletons
  5,956 contigs & 9,562 singletons attributed to Mlp by Gth
  6,414 contigs & 21,400 singletons attributed to poplar by Gth
- PASA identified a total of 6,294 Mlp gene models supported by 454 and Sanger EST contigs together = 38% of Mlp gene models (an increase of 11 percentage points)
- MIRA identified many gene models that may need annotation
- MIRA also identified more than 2,500 putative new genes (to be verified)
- Among the 4,010 gene models expressed in planta
  => 519 are specific to Mlp and 391 to Pucciniales
  => 265 encode SSPs and 128 SSPs are specific to Mlp
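The headline numbers in these conclusions are mutually consistent, as the following sketch shows. It assumes that the percentages are relative to the 16,694 JGI gene models mentioned on the array-design slide, and it reuses the 4,507 Sanger-supported gene models quoted earlier; both choices are assumptions made for this check, not statements from the conclusions themselves.

```python
JGI_GENE_MODELS = 16_694   # assumed denominator, taken from the array-design slide

mlp = {"contigs": 5_956, "singletons": 9_562}
poplar = {"contigs": 6_414, "singletons": 21_400}

total_contigs = mlp["contigs"] + poplar["contigs"]
total_singletons = mlp["singletons"] + poplar["singletons"]


def percent_supported(n_supported, n_models=JGI_GENE_MODELS):
    """Share of gene models with EST support, in percent."""
    return 100 * n_supported / n_models


sanger_pct = percent_supported(4_507)     # Sanger ESTs only
combined_pct = percent_supported(6_294)   # Sanger + 454 contigs (PASA)

print(total_contigs, total_singletons)
print(f"Sanger only: {sanger_pct:.1f}%  combined: {combined_pct:.1f}%  "
      f"gain: {combined_pct - sanger_pct:.1f} points")
```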
Ongoing…
— Curation of Gene Models supported by 454 contigs
— Prediction/Curation of putative new genes with 454 contigs support
— Design of NimbleGen Systems Oligoarray Mlp v1.0
To come…
— Alternative splicing
— Presence of SNPs (Transcripts expressed in both nuclei?)
— Profiles of candidate genes during timecourse infection of poplar leaves
Mlp 98AG31: the 'bad guy' genomic team at INRA, UMR 1136 IAM
Stéphane Hacquard (INRA Nancy): Mlp effectors
Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp Bioinfo
Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families
Marie-Pierre Oudot-Le Secq (INRA Nancy): EST annotation
Duplessis Sébastien & Francis Martin