Next Generation (Sequencing) Tools for
Advanced Molecular Breeding
Peter Winter, Kamila Bokszczanin, Himabindu Kudapa, Alejandro Rodriguez Meisel,
Nicolas Krezdorn, Ruth Jüngling, Rajeev Varshney, Guenter Kahl, Björn Rotter
GenXPro GmbH, Frankfurt am Main
www.genxpro.de
TranSNiPtomics: Genome-wide transcription profiles provided by
NGS-based Massive Analysis of cDNA Ends (MACE) simultaneously
identify allele-specific differential expression of root-trait-related
drought-response genes in drought-tolerant and susceptible
chickpea varieties
Genome & Transcriptome Analysis Services
Transcriptome: - Massive Analysis of cDNA Ends (MACE)
- Bacterial SuperSAGE
- Normalization of cDNA libraries (qualitative information)
- RNA-seq
- Small RNAs / microRNA in tissues, body fluids, exosomes
- Other non-coding RNAs, Degradome
- qPCR service
Genome: - Whole-genome Sequencing
- Digital karyotyping (ST-DK), RC-seq, CNVs
- Methylation-specific DK (ST-MSDK), Meth-seq
- All Exome sequencing, Target Enrichment
Metagenome: - COXI, 16S rRNA, others...
Bioinformatics: - NGS Data Handling, Assembly, Quantification, BLAST
- Expression Data Interpretation, Gene Ontology
Nucleotide-based information
GenXPro: Our Service Portfolio
Platform | Illumina HiSeq 2000 | PacBio
Sequence length | 2 x 150 bp | ~500-15,000 bp (!)
Throughput per subunit | 30-60 gigabases | 250 megabases
Full service: Transcriptomics, Genomics, Genotyping, Epigenomics, Bioinformatics
• Patented technique for reduced-representation analyses
• Method to eliminate PCR duplicates from the dataset
• No prior knowledge about NGS required, no hardware, no software, just samples…
Hardware/Computers:
• 164 CPUs, 704 Gigabyte RAM
Assembly:
• different assembly programs available
Annotation:
• Novoalign, BLAST, SOAP, BLAT, Annovar
Enrichment Analysis:
• Gene Ontology, KEGG, BioCarta, GSEA etc.
How to handle Gigabytes of Data?
Bioinformatics NGS data management at GenXPro
From a poster by Manish Roorkiwal, ICRISAT, 2/7/2013
Drought is the major constraint to chickpea production:
A case for TranSNiPtomics
Application of NGS for Chickpea Breeding
TranSNiPtomics
Requirements:
• Sufficient coverage to distinguish sequencing errors from true SNPs
• Accurate measurement of transcription levels
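The coverage requirement can be made concrete with a simple binomial model: how likely is it that sequencing errors alone produce the observed number of non-reference reads? This is a minimal sketch, not the authors' SNP caller. It assumes a fixed 1% chance per read of showing the alternative base by error, and the function names, the minimum depth of 10 reads and the cut-off of 0.001 are illustrative assumptions.

```python
from math import exp, lgamma, log, log1p


def prob_errors_explain(alt_count: int, depth: int, error_rate: float = 0.01) -> float:
    """P(X >= alt_count) for X ~ Binomial(depth, error_rate).

    The probability that sequencing errors alone would produce at least
    alt_count non-reference reads at a position covered by `depth` reads.
    Computed in log space so that large depths do not overflow.
    """
    def log_pmf(k: int) -> float:
        return (lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
                + k * log(error_rate) + (depth - k) * log1p(-error_rate))

    return sum(exp(log_pmf(k)) for k in range(alt_count, depth + 1))


def is_candidate_snp(alt_count: int, depth: int, error_rate: float = 0.01,
                     alpha: float = 1e-3, min_depth: int = 10) -> bool:
    """Hypothetical rule: require enough reads AND an alternative-allele
    count that errors alone are unlikely to explain."""
    if depth < min_depth:
        return False  # too few reads to separate errors from SNPs
    return prob_errors_explain(alt_count, depth, error_rate) < alpha


# Usage with made-up counts:
print(is_candidate_snp(alt_count=20, depth=40))  # True: balanced alleles, deep coverage
print(is_candidate_snp(alt_count=1, depth=100))  # False: one odd read is likely an error
print(is_candidate_snp(alt_count=2, depth=4))    # False: below the minimum depth
```

The same two alternative reads that are meaningless at a depth of 4 would become convincing at higher depth with a higher allele fraction, which is why coverage at the SNP position matters.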
“TranSNiPtomics”:
simultaneous analysis of gene expression AND polymorphism =
allele-specific gene expression measurement
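The idea of measuring expression and polymorphism at once can be sketched from the reads that cover one SNP in a tag: their sum gives an expression level, their ratio gives the allele balance. This is a minimal sketch under stated assumptions: each read is taken to represent one transcript, and `AlleleCounts`, `transniptomics_profile`, the sample names and all counts are hypothetical. No statistical test for allelic imbalance is included.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    ref: int           # reads carrying the reference allele at the SNP
    alt: int           # reads carrying the alternative allele
    library_size: int  # total reads sequenced for this sample


def transniptomics_profile(samples: dict[str, AlleleCounts]) -> dict[str, tuple[float, float]]:
    """Return (tags per million, alternative-allele fraction) per sample.

    Both values come from the same reads: their sum measures expression,
    their ratio measures allele-specific expression.
    """
    profile = {}
    for name, c in samples.items():
        total = c.ref + c.alt
        tags_per_million = total * 1_000_000 / c.library_size
        alt_fraction = c.alt / total if total else float("nan")
        profile[name] = (tags_per_million, alt_fraction)
    return profile


# Hypothetical counts for one tag in a tolerant and a susceptible variety
profile = transniptomics_profile({
    "tolerant": AlleleCounts(ref=90, alt=10, library_size=2_000_000),
    "susceptible": AlleleCounts(ref=5, alt=15, library_size=1_000_000),
})
print(profile)  # {'tolerant': (50.0, 0.1), 'susceptible': (20.0, 0.75)}
```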
Advantages:
• Markers lie within genes, so they are very likely connected to a specific trait
• Markers can be chosen from differentially expressed genes to increase the chance that they are involved in the trait
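Choosing markers from differentially expressed genes amounts to joining a SNP list with expression results and filtering. A minimal sketch, assuming each SNP is already assigned to a gene and that a differential-expression analysis has produced a log2 fold change and an adjusted p-value per gene; the function name, the input layout and the thresholds (|log2FC| >= 1, adjusted p <= 0.05) are illustrative assumptions, not values from the poster.

```python
def select_markers(snps, de_results, min_abs_log2fc=1.0, max_padj=0.05):
    """Keep SNPs whose gene is differentially expressed.

    snps:       list of (snp_id, gene_id) pairs
    de_results: dict gene_id -> (log2 fold change, adjusted p-value)
    """
    selected = []
    for snp_id, gene_id in snps:
        if gene_id not in de_results:
            continue  # no expression data for this gene
        log2fc, padj = de_results[gene_id]
        if abs(log2fc) >= min_abs_log2fc and padj <= max_padj:
            selected.append(snp_id)
    return selected


# Hypothetical inputs
snps = [("snp1", "geneA"), ("snp2", "geneB"), ("snp3", "geneC"), ("snp4", "geneD")]
de_results = {"geneA": (2.3, 0.001), "geneB": (0.2, 0.5), "geneC": (-1.5, 0.01)}
print(select_markers(snps, de_results))  # ['snp1', 'snp3']
```

Using the absolute fold change keeps both up- and down-regulated genes, since either direction may be relevant to the trait.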
Transcriptomes
[Figure: frequencies of transcript species; total transcript distribution]
Less than 0.2% of genes contribute more than 40% of all transcripts.
More than 50% of transcripts are present in fewer than 10 copies.
Some frequent, many rare transcripts
Differential gene expression results in large differences in
transcript representation in all transcriptomes
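The skew described above can be quantified from a table of read counts per transcript species. A minimal sketch on invented toy counts (not data from the poster); `top_share` and `rare_fraction` are hypothetical helper names, and the threshold of 10 copies mirrors the figure text.

```python
def top_share(counts: list[int], top_fraction: float) -> float:
    """Share of all transcripts contributed by the most highly expressed genes."""
    ranked = sorted(counts, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


def rare_fraction(counts: list[int], threshold: int = 10) -> float:
    """Fraction of transcript species observed fewer than `threshold` times."""
    return sum(1 for c in counts if c < threshold) / len(counts)


# Toy profile: one very abundant transcript species and 99 rare ones
counts = [1000] + [5] * 99
print(top_share(counts, 0.01))  # about 0.67
print(rare_fraction(counts))    # 0.99
```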
RNA-Seq
[Figure: RNA-Seq of the cDNAs of transcripts A and B]
o Many reads per transcript
o The number of reads per transcript varies with transcript length
o Quantification is often difficult in non-model organisms
o Very deep sequencing is required for short and low-abundance transcripts
(e.g. transcription factors, receptors)
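The length dependence can be illustrated with a simple expectation model. This is a sketch assuming RNA-Seq fragments are sampled uniformly along every transcript molecule, so reads scale with copy number times length, while MACE yields one 3’ fragment per molecule; the transcript names, lengths and read totals are made up.

```python
def expected_rnaseq_reads(copies, lengths, total_reads):
    """Reads proportional to copies x length (fragments from whole transcripts)."""
    weights = {t: copies[t] * lengths[t] for t in copies}
    total = sum(weights.values())
    return {t: total_reads * w / total for t, w in weights.items()}


def expected_mace_reads(copies, total_reads):
    """Reads proportional to copies only (one 3' fragment per transcript molecule)."""
    total = sum(copies.values())
    return {t: total_reads * c / total for t, c in copies.items()}


copies = {"A": 10, "B": 10}      # equally abundant transcripts
lengths = {"A": 3000, "B": 500}  # but A is six times longer
rnaseq_reads = expected_rnaseq_reads(copies, lengths, total_reads=3500)
mace_reads = expected_mace_reads(copies, total_reads=3500)
print(rnaseq_reads)  # {'A': 3000.0, 'B': 500.0}
print(mace_reads)    # {'A': 1750.0, 'B': 1750.0}
```

Length normalization (as in RPKM or TPM) can correct for this in RNA-Seq, but it needs reliable transcript lengths, which may not be available in non-model organisms.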
Measuring the Transcriptome
Our solution = Massive Analysis of cDNA Ends (MACE)
Only the 3’ ends (or 5’ ends) of the cDNAs are sequenced
• Reduced complexity, fewer variants, but:
• concentration on the most polymorphic region in a gene
• highly specific, allowing good annotation!
• easy to quantify!
• high coverage for SNP detection!
• low costs!
• hundreds of genotypes, e.g. mapping populations, can be analysed
at reasonable cost
MACE
[Figure: cDNAs (5’ to 3’) with poly(A)/poly(T) ends]
Massive Analysis of cDNA Ends (MACE):
[Figure: cDNAs and streptavidin beads]
How it works
Massive Analysis of cDNA Ends: MACE
[Figure: fragmentation and washing on streptavidin beads; fragments of 100-300 bp]
[Figure: second-generation sequencing of 50-100 bp]
[Figure: assembly and counting]
[Figure: assembly and counting (50-400 bp); counting, BLAST; example counts per transcript: 4, 1, 1]
Only one fragment per transcript!
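Because each transcript contributes only one fragment, quantification reduces to counting reads per gene. A minimal sketch, assuming reads have already been assigned to gene IDs by some mapping step (not shown); `count_tags` is a hypothetical name, and the toy input is chosen to reproduce counts of 4, 1 and 1.

```python
from collections import Counter


def count_tags(read_assignments):
    """Count reads per gene; None marks reads that could not be assigned."""
    counts = Counter()
    unassigned = 0
    for gene_id in read_assignments:
        if gene_id is None:
            unassigned += 1  # handled separately in this sketch (e.g. assembly, BLAST)
        else:
            counts[gene_id] += 1  # one read = one transcript molecule
    return counts, unassigned


reads = ["gene1", "gene1", "gene2", "gene1", None, "gene3", "gene1"]
counts, unassigned = count_tags(reads)
print(dict(counts), unassigned)  # {'gene1': 4, 'gene2': 1, 'gene3': 1} 1
```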
Bioinformatics: automated workflow for model and non-model organisms
[Figure: tags undergo annotation/mapping (Gene 1, Gene 2, unknown tags); unknown tags are assembled and annotated by BLASTX against protein databases; quantification (example counts 1, 1, 4); enrichment analysis; web tool “MACE2GO” data browser]
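One common way to perform the enrichment-analysis step is a one-sided hypergeometric test per Gene Ontology term. This sketch shows that generic test only; it is not a description of how MACE2GO computes enrichment, the function name is hypothetical, the toy numbers are invented, and correction for testing many terms is left out.

```python
from math import comb


def go_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all quantified genes)
    K: background genes annotated with the GO term
    n: genes in the list of interest (e.g. differentially expressed)
    k: genes in the list that carry the GO term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy example: all 4 genes of interest carry a term found on 5 of 20 genes
print(go_enrichment_p(k=4, n=4, K=5, N=20))  # about 0.001
```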
RNA-Seq vs. MACE
[Figure: RNA-Seq of the cDNAs of transcripts A and B]
Many reads per transcript; the number of reads per transcript varies!
For similar resolution, RNA-Seq requires
about 20-30 times more sequencing*
MACE: one read = one transcript
*Asmann et al. 2009
[Figure: MACE of the cDNAs of transcripts A and B; SNP alleles A/T and C/G]
RNA-Seq = high complexity
MACE = reduced complexity
MACE: reads concentrated on the polymorphic 3’ end; SNPs with sufficient coverage: 2
RNA-Seq: reads distributed over the whole transcript; SNPs with sufficient coverage: 0
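Why concentrating reads on the 3’ end helps SNP detection can be shown with a back-of-the-envelope coverage estimate. This sketch assumes the same number of reads falls on the gene in both methods and that reads are spread uniformly within the sequenced region; the 2,000 bp transcript, the 300 bp 3’ region, the 100 bp read length and the minimum depth of 10 are illustrative values, and the function names are hypothetical.

```python
def expected_coverage(n_reads: int, read_length: int, region_length: int) -> float:
    """Mean per-base depth if reads fall uniformly within a region."""
    return n_reads * read_length / region_length


def snp_callable(coverage: float, min_depth: int = 10) -> bool:
    """Hypothetical rule: a SNP needs at least min_depth reads to be called."""
    return coverage >= min_depth


# Hypothetical gene: 100 reads of 100 bp each in both methods
rnaseq = expected_coverage(100, 100, region_length=2000)  # spread over the transcript
mace = expected_coverage(100, 100, region_length=300)     # confined to the 3' end
print(rnaseq, snp_callable(rnaseq))        # 5.0 False
print(round(mace, 1), snp_callable(mace))  # 33.3 True
```

The trade-off is that a SNP outside the sequenced 3’ region receives no MACE coverage at all, which is acceptable here because the 3’ end is the most polymorphic part of the gene.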
comp14620_c0_seq1 len=251 path=[229:0-168 398:169-250] : gi|326519328|dbj|BAJ96663.1| predicted protein [Hordeum vulgare subsp. vulgare] e-value: 7e-06 blast score: 46.2169 T C 194 2 GACAAGGTAGGTACTCATAAAACAAACCATGGAGAGAGACCATGAACCAAATTGGACAAAACATACTTGCTTCCATATTAGAAAGCTTACATGGTATATT/C AAGTGGTGCTAAATAATCTTATAGAAGGGCAAAACAGTATACACGGTCTGCAAGAGAGTGGCCACAAGCAGGACGACGGCG
comp2006449_c0_seq160 C T 0 11 GTATACTTTTATGTACAAGTAGTTGCTTAATTGTTATTATGTGTTCTCTTTTTAGTTATC/T TTCTTCATTATAATTTTTCCATGGAAATAATGTATGCTGGTAGAGTGGCAGTGGTAATCAATGTGTATATTGCAAGGTGCTAGAGTACACACTGCAGGCT