The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease M. larici-populina Transcriptome Mlp Summer workshop – INRA Nancy, August 20-21 2008 Duplessis Sébastien (INRA Nancy) Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM
The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
M. larici-populina Transcriptome
Mlp Summer workshop – INRA Nancy, August 20-21 2008
Duplessis Sébastien (INRA Nancy)
Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM
Mlp Transcriptome – Goals and Means
Goals
Gene Expression
- Identify genetic determinants involved in Mlp biology
- Identify sets of genes involved in development of infection structures (secretion, effectors, avirulence, ...)
- Identify sets of genes involved in biotrophy (nutrition, transport)
- Identify expression profiles expressed during plant-fungal interaction
Gene Models Annotation
- Validation of Gene Models prediction
- Detection of new Gene Models
Mlp Transcriptome – Goals and Means
Means
EST sequencing
- Sanger ESTs from specific cDNA library (cDNA cloning / 100-1000s ESTs)
- 454-pyrosequencing from specific tissue (no cDNA cloning / 200-400k reads)
454: 80 Mb in 1 run for 10K€ vs. 1000s of Sanger ESTs for much more
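The cost argument can be made concrete with a small calculation. This is only a sketch based on the figures quoted on the slide (80 Mb and 10 K€ per run); the 400,000 reads per run is the upper bound of the read range given above, and the function names are illustrative.

```python
# Figures quoted on the slide; helper names are illustrative only.
RUN_YIELD_MB = 80        # Mb of sequence produced by one 454 run
RUN_COST_EUR = 10_000    # cost of one run, in euros
READS_PER_RUN = 400_000  # assumption: upper bound of the 200-400k range


def cost_per_mb(cost_eur, yield_mb):
    """Cost in euros per megabase of sequence."""
    return cost_eur / yield_mb


def mean_read_length(yield_mb, n_reads):
    """Mean read length in nucleotides implied by yield and read count."""
    return yield_mb * 1_000_000 / n_reads


print(f"{cost_per_mb(RUN_COST_EUR, RUN_YIELD_MB):.0f} EUR per Mb")
print(f"{mean_read_length(RUN_YIELD_MB, READS_PER_RUN):.0f} nt per read")
```

Under these assumptions a run costs 125 € per Mb and implies reads of about 200 nt. No Sanger cost is given on the slide, so the comparison itself is not computed here.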
=> Genes expressed in a given tissue (specific and ubiquitous)
=> No gene prediction a priori
Array-based expression profiling
- DNA Chips – NimbleGen Systems oligonucleotide arrays
=> Expression of all predicted genes represented on the array
=> Gene prediction a priori or EST sequencing required
Mlp Transcriptome – EST sequencing I
cDNA library of Mlp 98AG31 urediniospores and germlings
250 µg of DNase-free RNA were isolated from Mlp 98AG31 urediniospores and germlings (urediniospores grown for less than 12 h on agar) and sent to JGI
Mlp is an obligate biotroph so spores are unique sources for uncontaminated ESTs
Figure 1. Gene prediction and classification of the 4867 assembled ESTs from the four Melampsora libraries. ESTs with significant matches (Blastx against the Uniprot database; E values < 10^-20) were classified into categories according to the functional nomenclature presented in Kamoun et al. (1999).
[Bar chart; the per-species counts are not recoverable from the transcript. Axis: No. of assembled ESTs representing the protein family. Protein families shown: Cyclophilin type peptidyl-prolyl cis-trans isomerase; Putative GTPase activating protein for Arf; WD domain, G beta repeat; Zinc finger (C2HC/CCCH/ZPR1); Actin; Cytochrome c oxidase subunit Va/Vb; Enolase; Acyl CoA binding protein; Heat shock protein; EF hand; Mitochondrial carrier protein; Ribosomal proteins. Series: M. larici-populina; M. medusae f.sp. deltoidae; M. medusae f.sp. tremuloidae; M. occidentalis.]
Feau et al. 2007. Can.J.Bot
Mlp Transcriptome – EST Sequencing III: 454-pyrosequencing
454-pyrosequencing of poplar leaf infected tissues
Melampsora is an obligate biotroph => specialized infection structures (haustoria, formed after 16 h post-inoculation (pi)) and uredinia (formed after 7 dpi) develop only in the plant host
Strong Mlp invasion of plant tissues was observed at 4 dpi (Rinaldi et al., 2007)
Pyrosequencing allows the generation of hundreds of thousands of sequences from isolated transcripts
=> 200,000 ESTs from transcripts isolated from Poplar infected leaves at 4 and 7 dpi with 454 GS-FLEX (Roche) by Cogenix
- Transcripts expressed during plant infection
- Transcripts involved in infection structure development, maintenance and biotrophy
- Transcripts involved in spore formation and maturation
- Identification of plant infection-specific transcripts by comparison with Sanger ESTs
Mlp Transcriptome – 454-pyrosequencing
(From Ellegren, Mol. Ecol. 2008)
Mlp Transcriptome – 454-pyrosequencing
454-sequencing at JGI
Mlp Transcriptome – 454-pyrosequencing
1. 250 µg of total RNA were isolated from infected Poplar leaves ('Beaupré') at 4 dpi and 7 dpi with Mlp 98AG31
2. cDNA synthesis with SMART cDNA synthesis kit from 60 ng purified mRNA
3. 10 µg cDNA recovered and sent to Cogenix for 454-pyrosequencing on GS-FLEX (Roche)
Pictures by S Hacquard & S Duplessis (2008), by confocal microscopy with PI/Uvitex staining
Mlp Transcriptome – 454-pyrosequencing
Cogenix report on 454-sequencing
454-pyrosequencing generates > 400,000 sequences, or 2 x 200,000 sequences, in 1 run
Poplar infected tissues => 185,663 sequences
454 sequences are short (mean length 203 nt) and require assembly for transcript reconstruction
Assembly by Newbler => 148,688 reads assembled into 10,629 contigs & 36,975 unassembled reads (= singletons?)
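The read counts in the Cogenix report can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Numbers quoted in the Cogenix report; variable names are illustrative.
TOTAL_READS = 185_663
ASSEMBLED_READS = 148_688
UNASSEMBLED_READS = 36_975
CONTIGS = 10_629
MEAN_READ_LENGTH_NT = 203

# Assembled and unassembled reads should account for every read.
assert ASSEMBLED_READS + UNASSEMBLED_READS == TOTAL_READS

fraction_assembled = ASSEMBLED_READS / TOTAL_READS
reads_per_contig = ASSEMBLED_READS / CONTIGS
total_yield_mb = TOTAL_READS * MEAN_READ_LENGTH_NT / 1e6

print(f"assembled: {fraction_assembled:.1%}")          # assembled: 80.1%
print(f"reads per contig: {reads_per_contig:.1f}")     # reads per contig: 14.0
print(f"total sequence: {total_yield_mb:.1f} Mb")      # total sequence: 37.7 Mb
```

The two read classes sum exactly to the 185,663 reads, so about 80% of the reads end up in contigs, at roughly 14 reads per contig on average.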
Mlp Transcriptome – 454-pyrosequencing
Newbler assembly vs. MIRA assembly
Newbler is a de novo assembler designed for genomic sequences (not transcripts), working in flowgram space, not nucleotide space
Newbler tends to eliminate many reads for no obvious reason (>38,000 reads are lost)
Cogenix recommended the use of another de novo assembler dedicated to transcript assembly
CAP3 is not recommended
MIRA is an EST assembler recently updated for 454 data
=> http://chevreux.org/projects_mira.html
MIRA generates more contigs than Newbler => 17,511 contigs (including 2,600 singletons)
MIRA provides information on the overall quality of sequences (tag 'too short' = low quality sequences)
GenomeThreader (Gth) maps transcript sequences to a genome sequence
MIRA contigs are mapped to the Mlp and poplar genomes to identify fungal and plant transcripts
Singleton reads from Newbler are mostly low quality sequences
Mlp Transcriptome – 454-pyrosequencing
Final MIRA assembly vs. poplar and Mlp genomes
- Contigs that showed a Gth score < 0.9 were dissolved into singletons
- Contigs attributed to both genomes with Gth scores > 0.9 were manually resolved
- Contigs attributed to a genome and containing reads attributed to the other genome were manually inspected with Consed => new contigs/singletons
- Singletons with Gth scores < 0.9 were not retained
5,956 contigs & 9,562 singletons attributed to Mlp
6,414 contigs & 21,400 singletons attributed to Poplar
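The score-based part of these rules can be sketched as a small decision function. This is an illustration, not the script actually used: the function name and return labels are hypothetical, `None` stands for "no alignment to that genome", and a score of exactly 0.9 is treated as failing because the slide does not say how ties were handled. The Consed inspection of mixed contigs is a manual step and is not modelled.

```python
THRESHOLD = 0.9  # Gth score cut-off quoted on the slide


def assign_contig(mlp_score, poplar_score, threshold=THRESHOLD):
    """Assign a contig from its best Gth score against each genome.

    Returns 'Mlp', 'Poplar', 'manual' (passes for both genomes) or
    'dissolve' (passes for neither, so the contig is split into singletons).
    A score of None means the contig did not align to that genome.
    """
    mlp_hit = mlp_score is not None and mlp_score > threshold
    poplar_hit = poplar_score is not None and poplar_score > threshold
    if mlp_hit and poplar_hit:
        return "manual"
    if mlp_hit:
        return "Mlp"
    if poplar_hit:
        return "Poplar"
    return "dissolve"


# Hypothetical scores, for illustration only.
print(assign_contig(0.97, 0.42))   # Mlp
print(assign_contig(None, 0.95))   # Poplar
print(assign_contig(0.93, 0.96))   # manual
print(assign_contig(0.80, None))   # dissolve
```

The same threshold test applies to singletons, except that a failing singleton is discarded instead of dissolved.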
PASA (Program to Assemble Spliced Alignments)
PASA is a tool designed for curation of gene catalogs using sets of ESTs and FL-cDNAs. It is based on stringent alignment to the genome sequence with GMAP, assembly in clusters based on position on the genome sequence, and comparison to the current catalogue of gene models => curation
PASA was used in several published 454-analyses, and in Arabidopsis community for gene curation
PASA => Mlp EST (Sanger & 454 contigs) vs. Mlp genome/gene models
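The "assembly in clusters based on position on genome sequence" step can be pictured as grouping alignments whose genomic intervals overlap. The sketch below is a deliberately simplified illustration and not PASA's actual algorithm: it ignores strand and splice-site compatibility, and the `Alignment` record, its fields and the example coordinates are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one transcript-to-genome alignment.
Alignment = namedtuple("Alignment", "name scaffold start end")


def cluster_by_position(alignments):
    """Group alignments that overlap on the same scaffold."""
    clusters = []
    ordered = sorted(alignments, key=lambda a: (a.scaffold, a.start, a.end))
    for aln in ordered:
        last = clusters[-1] if clusters else None
        if last and last["scaffold"] == aln.scaffold and aln.start <= last["end"]:
            # Overlaps the current cluster: add it and extend the interval.
            last["members"].append(aln.name)
            last["end"] = max(last["end"], aln.end)
        else:
            # New scaffold or a gap on the same scaffold: start a new cluster.
            clusters.append({"scaffold": aln.scaffold, "start": aln.start,
                             "end": aln.end, "members": [aln.name]})
    return clusters


# Made-up coordinates, for illustration only.
example = [
    Alignment("est1", "scaffold_1", 100, 500),
    Alignment("contig7", "scaffold_1", 450, 900),
    Alignment("est2", "scaffold_1", 2000, 2400),
    Alignment("est3", "scaffold_2", 120, 300),
]
for cluster in cluster_by_position(example):
    print(cluster["scaffold"], cluster["start"], cluster["end"], cluster["members"])
```

Each resulting cluster would then be compared with the gene model annotated at the same locus, which is where curation proposals come from.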
Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp 454 Contigs
PASA was run using all 454 reads against Mlp Genome and a similar number of gene models were supported
Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp Sanger contigs
Total of 6294 Mlp Gene Models supported (38%)
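The size of the gene catalog implied by this figure can be back-calculated. This is a rough sketch under two assumptions not stated on the slide: that the 38% is relative to the complete set of Mlp gene models, and that it was rounded to the nearest integer percent.

```python
SUPPORTED_MODELS = 6294   # gene models supported, as quoted
SUPPORTED_PERCENT = 38    # percentage quoted, assumed rounded to an integer

# Point estimate and the range compatible with rounding (37.5% to 38.5%).
point_estimate = SUPPORTED_MODELS / (SUPPORTED_PERCENT / 100)
lower_bound = SUPPORTED_MODELS / ((SUPPORTED_PERCENT + 0.5) / 100)
upper_bound = SUPPORTED_MODELS / ((SUPPORTED_PERCENT - 0.5) / 100)

print(f"implied catalog size: about {point_estimate:.0f} gene models")
print(f"range compatible with rounding: {lower_bound:.0f} to {upper_bound:.0f}")
```

Under these assumptions the catalog would contain roughly 16,300 to 16,800 gene models.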
Mlp Transcriptome – 454-pyrosequencing
Examples of gene model curation based on Mlp 454 contigs proposed by PASA