Base Calling Error Toleration in Reference Based Assembly Hadi Gharibi Email: [email protected] Sharif University of Technology Max Planck Institute for Molecular Genetics May 2015
Jan 19, 2017

Page 1: Base Calling Error Toleration in Reference Based Assembly

Base Calling Error Toleration in Reference Based Assembly

Hadi Gharibi
Email: [email protected]
Sharif University of Technology
Max Planck Institute for Molecular Genetics
May 2015

Page 2: Base Calling Error Toleration in Reference Based Assembly

How Base Calling Error Can Be Tolerated in Next Generation Sequencing (NGS)

Importance

Challenges

Our Hypothesis

Our Approach

• Deal with Large Amount of Data
• Impact on Sequencing Data Analysis Time and Accuracy

Researchers have developed many base calling algorithms; however, none has resolved the trade-off between accuracy and time complexity.

• Required Accuracy
• Sequencing Data Analysis Execution Time

Base Calling Error Is Compensated in Downstream Sequencing Steps

• Massive Data
• Diverse Algorithms

Page 3: Base Calling Error Toleration in Reference Based Assembly

Importance: Base Calling Translates Noisy Intensity Data Into Reads

Figure: Image Processing → Intensity → Base Calling → Read → Assembling → Genome

© EMBO Conference, 2014 [1]
© Illumina Inc., 2011 [2]
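As a rough illustration of the base calling step on this slide, the sketch below turns per-cycle, four-channel intensities into a read by picking the brightest channel. It is a toy under stated assumptions, not the method of any caller discussed in the talk: the function names, the Gaussian noise model, and the `noise_sd` parameter are all hypothetical, and real callers also correct for effects such as channel crosstalk and phasing.

```python
import random

BASES = "ACGT"

def simulate_intensities(seq, noise_sd, rng):
    """One [A, C, G, T] intensity vector per cycle (assumed model):
    1.0 on the true channel, 0.0 elsewhere, plus Gaussian noise."""
    return [
        [(1.0 if b == base else 0.0) + rng.gauss(0.0, noise_sd) for b in BASES]
        for base in seq
    ]

def call_bases(intensities):
    """Naive base caller: pick the brightest channel in every cycle."""
    return "".join(
        BASES[max(range(4), key=row.__getitem__)] for row in intensities
    )

def error_rate(truth, called):
    """Fraction of cycles where the called base differs from the true base."""
    return sum(t != c for t, c in zip(truth, called)) / len(truth)

if __name__ == "__main__":
    rng = random.Random(0)
    truth = "".join(rng.choice(BASES) for _ in range(1000))
    for sd in (0.1, 0.5, 1.0):
        called = call_bases(simulate_intensities(truth, sd, rng))
        print(sd, error_rate(truth, called))
```

Raising `noise_sd` raises the base calling error rate, which is the quantity the rest of the talk treats as the input variable.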

Page 4: Base Calling Error Toleration in Reference Based Assembly

Challenge: Base Calling Errors Are Always Compared

Figure: Error rate of base callers per sequencing cycle on the PhiX174 test data. The more accurate callers are slower than the others. [3]

© C. Ye, 2014 [3]

Page 5: Base Calling Error Toleration in Reference Based Assembly

Fundamental Question:

How Much Accuracy Is Required?

Page 6: Base Calling Error Toleration in Reference Based Assembly

Our Approach: Analytical Assumptions and Method

Assumptions

• Random Genome
• Single Variations
• Mismatches << Read Length
• Uniform Substitution Error
• Equally Likely Base Errors

Method

• Variant Calling for Re-sequencing
• Derive Variant Calling Errors
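One way to write down the "Uniform Substitution Error" and "Equally Likely Base Errors" assumptions is shown below. The notation is an assumption of this note, not taken from the slides: $e$ is the per-base error rate, $x$ the true base, $\hat{b}$ the called base, $L$ the read length, and $K$ the number of miscalled bases in a read. The binomial form additionally assumes that errors at different positions are independent.

```latex
P(\hat{b} = y \mid b = x) =
\begin{cases}
1 - e, & y = x, \\
e / 3, & y \neq x,
\end{cases}
\qquad x, y \in \{A, C, G, T\}

K \sim \mathrm{Binomial}(L, e), \qquad \mathbb{E}[K] = L e
```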

Page 7: Base Calling Error Toleration in Reference Based Assembly

Analytical Results: Base Calling Error Is Tolerated by Mapping Mismatch

Figure: Variant Calling Error Vs. Base Calling Error

• Random Genome
• Mismatches = {2, 5, 7, 9}
• Genome Size ~ 4 Mbp
• Read Length = 30 bp
• Variation Rate = 0.01
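To get a feel for why a mapping mismatch budget can absorb base calling errors, the sketch below computes the probability that a read contains more miscalled bases than the mapper allows. This is not the derivation behind the figure: it assumes independent substitution errors at a per-base rate `e` (a hypothetical parameter name), and it ignores true variants, which also consume part of the mismatch budget.

```python
from math import comb

def prob_read_exceeds_budget(read_len, max_mismatches, e):
    """P(K > max_mismatches) for K ~ Binomial(read_len, e), where K is the
    number of miscalled bases in one read (independence is an assumption)."""
    within_budget = sum(
        comb(read_len, k) * e**k * (1 - e) ** (read_len - k)
        for k in range(max_mismatches + 1)
    )
    return 1.0 - within_budget

if __name__ == "__main__":
    # Read length and mismatch settings follow the slide; the error rates
    # are arbitrary illustrative values.
    for e in (0.01, 0.05, 0.10):
        for m in (2, 5, 7, 9):
            print(e, m, prob_read_exceeds_budget(30, m, e))
```

For a fixed error rate, the probability of losing a read drops as the mismatch budget grows, which is the qualitative effect named in the slide title.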

Page 8: Base Calling Error Toleration in Reference Based Assembly

Simulation Method and Setup

Method

• Generate Target Genome
• Simulate Reads [4]
• Add Base Calling Error
• Call Variants
• Calculate Variant Calling Error

Setup

© GemSIM, 2013 [4]
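The five method steps above can be sketched end to end in a few lines. This is a toy stand-in, not the pipeline used in the talk: it replaces GemSIM with uniform random read sampling, skips mapping by keeping each read's true start position (so the mismatch budget is not modeled), calls variants by a simple majority vote, and reports false positives and false negatives as one possible reading of "variant calling error". All function names and default parameters are hypothetical, and the genome is far smaller than 4 Mbp so that the script runs quickly.

```python
import random
from collections import Counter

BASES = "ACGT"

def random_genome(length, rng):
    return "".join(rng.choice(BASES) for _ in range(length))

def add_substitutions(seq, rate, rng):
    """Substitute each position independently with probability `rate`,
    choosing uniformly among the three other bases.
    Returns the new sequence and the set of changed positions."""
    out, changed = [], set()
    for i, base in enumerate(seq):
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
            changed.add(i)
        else:
            out.append(base)
    return "".join(out), changed

def simulate_reads(genome, read_len, n_reads, rng):
    """Single-end shotgun reads as (start, sequence) pairs; the start is
    kept so that this toy can skip the mapping step."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((start, genome[start:start + read_len]))
    return reads

def call_variants(reference, reads):
    """Majority vote per position; report covered positions whose
    consensus base differs from the reference."""
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1
    return {
        i for i, counts in enumerate(pileup)
        if counts and counts.most_common(1)[0][0] != reference[i]
    }

def run(genome_len=2000, read_len=30, coverage=20,
        variation_rate=0.01, e=0.02, seed=0):
    rng = random.Random(seed)
    reference = random_genome(genome_len, rng)                      # reference
    target, true_variants = add_substitutions(reference, variation_rate, rng)
    n_reads = coverage * genome_len // read_len
    reads = simulate_reads(target, read_len, n_reads, rng)          # reads
    noisy = [(s, add_substitutions(r, e, rng)[0]) for s, r in reads]  # errors
    called = call_variants(reference, noisy)                        # variants
    false_pos = len(called - true_variants)
    false_neg = len(true_variants - called)
    return false_pos, false_neg, len(true_variants)

if __name__ == "__main__":
    for e in (0.0, 0.02, 0.10):
        print(e, run(e=e))
```

Sweeping `e` while holding the other parameters fixed gives a small-scale analogue of a variant calling error versus base calling error curve.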

Page 9: Base Calling Error Toleration in Reference Based Assembly

Simulation Results: Simulation Verifies Analysis Predictions

• E. coli Genome [5]
• Mismatches = {3, 4, 5}
• Genome Size ~ 4 Mbp
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [6]

Figure: Variant Calling Error Vs. Base Calling Error

© NCBI, 2014 [5]
© G. BGI, 2008 [6]

Page 10: Base Calling Error Toleration in Reference Based Assembly

Simulation Results: Random Genome Obviates Repeat Region Effect

• Genome Sizes ~ 4 Mbp
• Mismatches = 3
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [6]

Figure: Random Genome Vs. E. coli Genome

© G. BGI, 2008 [6]

Page 11: Base Calling Error Toleration in Reference Based Assembly

Conclusion

Simulation Results

• Confirm the Hypothesis
• Genome Repeat Regions Impair Accuracy

Analytical Results

• Confirm the Hypothesis
• Higher Mismatches May Not Obey

Page 12: Base Calling Error Toleration in Reference Based Assembly

Next Steps

Simulation Steps

• Genome Having More Repeat Regions
• Develop Mapper with Higher Mismatches

Analytical Steps

• Genome Structure
• Paired-end Shotgun Sequencing
• Erasure Base Calling Error
• Other Variant Types

Page 13: Base Calling Error Toleration in Reference Based Assembly

References

[1] EMBO Conference, “Human Evolution in the Genomic Era: Origins, Populations, and Phenotypes,” 2014. [Online]. Available: events.embo.org/14-human-evo
[2] Illumina Inc., “Theory of Operation, HCS 1.4/RTA 1.12,” 2011.
[3] C. Ye, C. Hsiao, and H. Corrada Bravo, “BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution,” Bioinformatics, 30(9), 1214–1219, 2014.
[4] GemSIM, “Gemsim,” 2013. [Online]. Available: http://sourceforge.net/projects/gemsim
[5] NCBI, “Escherichia coli O157:H7 str. Sakai DNA, complete genome - Nucleotide - NCBI,” 2014. [Online]. Available: http://www.ncbi.nlm.nih.gov/nuccore/47118301?report=fasta
[6] G. BGI, “Soap: Short oligonucleotide analysis package,” 2008. [Online]. Available: http://soap.genomics.org.cn
[7] C. Ledergerber and C. Dessimoz, “Base-calling for next-generation sequencing platforms,” Briefings in Bioinformatics, 2011.

Page 14: Base Calling Error Toleration in Reference Based Assembly

Acknowledgement

Thank You for Your Patience, Time and Attention.

Danke sehr (Thank you very much)