FIND MEANING IN COMPLEXITY For Research Use Only. Not for use in diagnostic procedures.
– Alternative polyadenylation
– Alternative start sites (possible w/ proper protocol)
Disclaimer
• Everything shown from now on is at the transcript/isoform level, not the gene level
• Data shown is preliminary, very unbaked
• Concept Analysis
Count Information Associated with Each Unique Transcript
Clusters of transcript alignments using FL + nFL reads
Transcript 1 Transcript 2 Transcript 3
Final transcript consensus
Transcript 1 Transcript 2 Transcript 3
Count matrix
Transcript   Count   Norm_Count
1            8       0.08
2            5       0.05
3            7       0.07
…            …       …
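The slide does not say how Norm_Count is derived. One reading consistent with the values shown (8 → 0.08, 5 → 0.05, 7 → 0.07) is that each count is divided by the total count over all transcripts. The sketch below assumes that definition; the function name and the "other" bucket are hypothetical and only serve to make the total come out to 100.

```python
def normalize_counts(counts):
    """Divide each transcript's count by the total count over all transcripts.

    ASSUMPTION: Norm_Count = count / total. The slides do not define it.
    """
    total = sum(counts.values())
    if total == 0:
        # No reads at all: return zeros rather than dividing by zero.
        return {t: 0.0 for t in counts}
    return {t: c / total for t, c in counts.items()}


# Counts for transcripts 1-3 come from the table; "other" is a hypothetical
# stand-in for the remaining transcripts hidden behind the "…".
counts = {"1": 8, "2": 5, "3": 7, "other": 80}
norm_counts = normalize_counts(counts)

for transcript, value in norm_counts.items():
    print(transcript, round(value, 2))
```

If the real normalization uses a different denominator (for example per-library totals), only the `total` line would change.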
Count Information from non-FL reads
For non-FL reads:
• If uniquely associated with a transcript, assume it belongs to that transcript
• If ambiguously associated, it is most likely because the read is a partial match
• For now, the weight of an ambiguous nFL read is just 1 / (number of associated transcripts)
read_count = # of FL + # of unique nFL + weighted # of ambiguous nFL
In the current dataset, about 40-60% of nFL reads partially match multiple isoforms (FL reads are always fully and uniquely associated)
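The counting rule on this slide can be sketched in a few lines. This is only an illustration, not the pipeline's actual code: the function name, the input layout (one transcript ID per FL read, a list of candidate transcript IDs per nFL read), and the example reads are all assumptions.

```python
from collections import defaultdict


def read_counts(fl_hits, nfl_hits):
    """Per-transcript read counts with fractional weights for ambiguous nFL reads.

    fl_hits:  one transcript ID per FL read (FL reads map uniquely).
    nfl_hits: one list of transcript IDs per nFL read (hypothetical layout).
    """
    counts = defaultdict(float)

    # Each FL read contributes a full count to its single transcript.
    for transcript in fl_hits:
        counts[transcript] += 1.0

    # An nFL read associated with k transcripts contributes 1/k to each.
    # A unique nFL read (k = 1) therefore contributes a full count.
    for hits in nfl_hits:
        candidates = set(hits)
        if not candidates:
            continue  # read associated with nothing: ignored
        weight = 1.0 / len(candidates)
        for transcript in candidates:
            counts[transcript] += weight

    return dict(counts)


# Made-up example: 3 FL reads and 3 nFL reads (1 unique, 2 ambiguous).
fl = ["T1", "T1", "T2"]
nfl = [["T1"], ["T1", "T2"], ["T2", "T3"]]
result = read_counts(fl, nfl)
print(result)
```

With this weighting, every read contributes exactly 1 in total, so the counts sum to the number of reads that were associated with at least one transcript.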
Read Count Variation in Technical Replicates
Rat Heart
• Technical replicates (same starting RNA & protocol)
• 3 size libraries (1 – 2 kb, 2 – 3 kb, 3 – 6 kb)
• Runs from different size libraries pooled for the bioinformatics pipeline
Boxplot of log2 read counts
Scatterplot of log2 read count for each transcript
Rat Heart, technical replicates
Read Count Variation in Technical Replicates
Rat Lung, technical replicates
All technical replicates were sequenced with a total of ~8 SMRT® Cells (low depth). Most NA transcripts have low counts.
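A comparison like the log2 scatterplots can be sketched as follows. The slides do not state how the plots were produced, so everything here is an assumption: the helper names, the choice of Pearson correlation as a summary, the treatment of a transcript missing from one replicate as NA (its log2 count is undefined), and the example counts.

```python
import math


def log2_pairs(rep_a, rep_b):
    """Split transcripts into (id, log2 a, log2 b) pairs and NA transcripts.

    A transcript is NA when its count is missing or zero in either replicate.
    """
    shared, na = [], []
    for transcript in sorted(set(rep_a) | set(rep_b)):
        a = rep_a.get(transcript, 0)
        b = rep_b.get(transcript, 0)
        if a > 0 and b > 0:
            shared.append((transcript, math.log2(a), math.log2(b)))
        else:
            na.append(transcript)
    return shared, na


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (needs variance > 0)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up counts for two technical replicates.
rep_a = {"T1": 8, "T2": 4, "T3": 1}
rep_b = {"T1": 16, "T2": 8, "T4": 2}

shared, na = log2_pairs(rep_a, rep_b)
r = pearson([x for _, x, _ in shared], [y for _, _, y in shared])
print(shared, na, r)
```

In this toy input the NA transcripts (T3, T4) are also the lowest-count ones, mirroring the observation on the slide; with real data that would need to be checked rather than assumed.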
Choice of Chemistry Does Not Bias Sequencing
Rat Brain
Same 3-size library (not technical replicate)
• Sequenced with P4-C2 chemistry
• Sequenced with P5-C3 chemistry
However, for longer (> 3 kb) transcripts, P5-C3 chemistry increases the chance of seeing FL reads
Choice of PCR Enzyme May Bias Amplification
Human Brain, 2 – 3 kb library
Human Brain, 3 – 6 kb library
Current Iso-Seq Protocol Amplifies Sample Twice
Workflow (from the slide diagram):
• Input: Total RNA (with Optional Poly-A Selection) or polyA+ RNA
• Reverse Transcription (SMARTScribe RT)
• Full-length 1st Strand cDNA
• PCR Optimization
• Large-scale Amplification
• Amplified cDNA
• Size Selection (1-2 kb, 2-3 kb, 3-6 kb)
• Re-Amplification (1-2 kb, 2-3 kb, 3-6 kb)
• SMRTbell™ Template Preparation (1-2 kb, 2-3 kb, 3-6 kb)
• Optional Size Selection (3-6 kb)
• SMRT® Sequencing
2nd Amplification Does Not Introduce Strong Bias
FL Read Length Distribution
• Std. vs. skipping 2nd amp
• Std. vs. skipping 1st amp
Skipping the 1st amplification results in size selection of first-strand cDNA, which may be hard to optimize
Expected Transcript Variability in Different Rat Tissues
Rat Heart vs Rat Lung (axes: Heart, Lung)
Rat Heart vs Rat Brain (axes: Heart, Brain)
Conclusion
• Technical variation not a big issue
– If done with same library protocol
– Different (PCR) enzymes bias amplification
– Amplification can be tolerated if kept at reasonable # of cycles
• Potential for DE
– Still many unknown factors
– Everything shown in previous slides merely “proof of concept”
– With control comes better modeling
Looking Ahead
Wet Lab
• Detection limit
• Amplification bias
– Adding control at known %
– Factors: GC? Length? Enzyme?
Bioinformatics
• Account for library pooling
• Ambiguous mapping
• Modeling bias
• DE isoform detection
• Combining short-read data
For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.