Dynamic mappers of NGS reads Karel Břinda (LIGM Université Paris-Est) Valentina Boeva (Institut Curie) Gregory Kucherov (LIGM Université Paris-Est)
Dynamic mappers of NGS reads - Institut Gaspard Monge · igm.univ-mlv.fr/AlgoB/slides/Brinda_SeqBio_2014.pdf · 2014-11-18

Dynamic mappers of NGS reads

Karel Břinda (LIGM Université Paris-Est)

Valentina Boeva (Institut Curie)

Gregory Kucherov (LIGM Université Paris-Est)

Introduction

Read mapping is a bottleneck in NGS data processing (e.g., for variant calling)

A lot of effort is constantly invested in the development of new mappers

None of them supports dynamic updates of the reference during the mapping

Idea: update reference during the mapping

Only a few papers on this topic exist

◦ J. Pritt. Efficiently Improving the Reference Genome for DNA Read Alignment. Seminar work, Harvard University, 2013.

◦ A. Ghanayim and D. Geiger. Iterative referencing for improving the interpretation of DNA sequence data. Technical report, Technion, Israel, 2013.

◦ C. S. Iliopoulos et al. An algorithm for mapping short reads to a dynamically changing genomic sequence. Journal of Discrete Algorithms 10, 2012.

Mapping – from static to dynamic

1. Static mapping
◦ Classical mappers, no updates

2. Iterative referencing
◦ A standard mapper is used; mapping is followed by variant calling, repeated over many iterations

3. Dynamic mapping
◦ The mapper dynamically updates its index according to the already mapped reads
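
The three regimes differ mainly in when the reference is updated. The following is a minimal sketch of that control flow only. The callables `map_read` and `update` are hypothetical placeholders (they are not the API of any real mapper), and an alignment is reduced to a `(read, position)` pair.

```python
from typing import Callable, List, Sequence, Tuple

Alignment = Tuple[str, int]  # hypothetical stand-in for a SAM record
MapFn = Callable[[str, str], Alignment]           # (reference, read) -> alignment
UpdateFn = Callable[[str, List[Alignment]], str]  # (reference, alignments) -> reference


def static_mapping(ref: str, reads: Sequence[str], map_read: MapFn) -> List[Alignment]:
    # One pass over the reads; the reference never changes.
    return [map_read(ref, r) for r in reads]


def iterative_referencing(ref: str, reads: Sequence[str], map_read: MapFn,
                          update: UpdateFn, iterations: int):
    # Every iteration maps ALL reads, then updates the reference once.
    alignments: List[Alignment] = []
    for _ in range(iterations):
        alignments = [map_read(ref, r) for r in reads]
        ref = update(ref, alignments)
    return ref, alignments


def dynamic_mapping(ref: str, reads: Sequence[str], map_read: MapFn, update: UpdateFn):
    # The reference may change after every single read.
    alignments: List[Alignment] = []
    for r in reads:
        alignments.append(map_read(ref, r))
        ref = update(ref, alignments)
    return ref, alignments
```

With n reads, iterative referencing calls the mapper n times per iteration, while dynamic mapping calls it n times in total but needs an index that can absorb an update after each read.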

1) Static mapping (standard mappers)

[Diagram: the reads (1, 2, …, n) go through a static mapper with a fixed reference (index) in a single iteration of read mapping; the mapper output is a SAM/BAM file.]

2) Iterative referencing (Ghanayim & Geiger, 2013)

[Diagram: a static mapper with a reference (index) and statistics. Every iteration takes all reads (1, 2, …, n) and consists of read mapping, pileup and consensus, and an update of the reference; the output is a SAM/BAM file.]

3) Dynamic mapping (no existing mapper until now)

[Diagram: a dynamic mapper with a reference (index) and statistics. The reads 1, 2, …, n are processed one per iteration; every iteration consists of read mapping and an update of the reference; the output is a SAM/BAM file.]

Estimating the usefulness

                        Memory requirements   Speed   Quality of alignment
Iterative referencing   +                     --      ++
Dynamic mapping         --                    +       +
Static mapping          +                     ++      -

Dynamic mappers

Difficulties – dynamic data structures

Two basic types of mappers:

◦ FM-index based (e.g., BWA-ALN, BWA-SW, BWA-MEM, GEM, etc.)

◦ Hash-table based (e.g., SHRiMP 2, SToRM, etc.)

Data structures must be dynamic
◦ Difficult to make dynamic versions

◦ More memory needed

◦ Worse cache optimization (=> significant decrease in speed)

Dynamic FM-index – already studied:
◦ M. Salson, T. Lecroq, M. Léonard, and L. Mouchard. A four-stage algorithm for updating a Burrows–Wheeler transform. Theoretical Computer Science 410(43), 2009.

◦ M. Salson, T. Lecroq, M. Léonard, and L. Mouchard. Dynamic extended suffix arrays. Journal of Discrete Algorithms 8(2), 2010.

◦ Implementation: http://dfmi.sourceforge.net/
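
To make concrete what a dynamic update means for a hash-table based index, here is a minimal sketch of a k-mer index that supports substitutions. The class `KmerIndex` and its methods are hypothetical and are not taken from SHRiMP 2, SToRM, or any other mapper. Insertions and deletions, which shift all downstream positions, are deliberately left out.

```python
from collections import defaultdict


class KmerIndex:
    """Toy k-mer hash index over a reference, updatable by single-base substitutions."""

    def __init__(self, reference: str, k: int):
        self.k = k
        self.ref = list(reference)
        self.table = defaultdict(set)  # k-mer -> set of 0-based start positions
        for i in range(len(self.ref) - k + 1):
            self.table[self._kmer(i)].add(i)

    def _kmer(self, i: int) -> str:
        return "".join(self.ref[i:i + self.k])

    def _window(self, pos: int) -> range:
        # Start positions of all k-mers that overlap position `pos`.
        lo = max(0, pos - self.k + 1)
        hi = min(pos, len(self.ref) - self.k)
        return range(lo, hi + 1)

    def substitute(self, pos: int, base: str) -> None:
        # Remove the (at most k) k-mers touching `pos`, change the base, re-insert them.
        for i in self._window(pos):
            kmer = self._kmer(i)
            self.table[kmer].discard(i)
            if not self.table[kmer]:
                del self.table[kmer]
        self.ref[pos] = base
        for i in self._window(pos):
            self.table[self._kmer(i)].add(i)

    def lookup(self, kmer: str):
        return sorted(self.table.get(kmer, ()))
```

In this sketch a substitution touches at most k table entries, so it is cheap. The hard part for a real mapper is the compact, cache-friendly layout that static indexes rely on, which a structure like this gives up.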

Difficulties – statistics and reference

To make updates, it is necessary to keep simplified pileups (nucleotide counts in an alignment column).

It is difficult to deal with insertions.

The coordinates of already mapped reads can change during the mapping.
◦ Possible solution: padded reference, many initial placeholders (‘*’ character), final small post-processing corrections of the SAM file.

Example (memory needed for statistics for a single nucleotide)

‘A’ counter   ‘C’ counter   ‘G’ counter   ‘T’ counter   DEL counter   Sum
3 bits        3 bits        3 bits        3 bits        3 bits        15 bits
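
A minimal sketch of the 15-bit layout above: five 3-bit counters packed into one integer. The slide does not say what happens when a counter reaches its maximum of 7; halving all counters on saturation is an assumption made here only so that the example is complete. The function names are hypothetical.

```python
FIELDS = ("A", "C", "G", "T", "DEL")
BITS = 3
MAX = (1 << BITS) - 1  # largest value of a 3-bit counter: 7


def pack(counts: dict) -> int:
    # Place counter i in bits [3*i, 3*i + 2].
    packed = 0
    for i, field in enumerate(FIELDS):
        assert 0 <= counts[field] <= MAX
        packed |= counts[field] << (i * BITS)
    return packed


def get(packed: int, field: str) -> int:
    shift = FIELDS.index(field) * BITS
    return (packed >> shift) & MAX


def increment(packed: int, field: str) -> int:
    if get(packed, field) == MAX:
        # ASSUMPTION (not from the slides): halve every counter on saturation.
        packed = pack({f: get(packed, f) // 2 for f in FIELDS})
    return packed + (1 << (FIELDS.index(field) * BITS))


def consensus(packed: int) -> str:
    # Most frequent symbol in the column; ties go to the earlier field.
    return max(FIELDS, key=lambda f: get(packed, f))
```

For a genome of length L this costs 15·L bits of statistics on top of the index itself.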

Example (padded reference, an insertion at pos. 14)

Pos.   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Ref.   C  *  *  A  *  *  G  *  *  C  *  *  G  C  *  C  *  *  A  *  …
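
A minimal sketch of the padded-reference idea, using the example string above. Padded coordinates stay fixed when a base is inserted into a placeholder, and the final conversion to ordinary coordinates is a count of non-placeholder characters. Both function names are hypothetical, and the sketch only handles an insertion into the slot directly after the given position.

```python
def insert_base(padded_ref: str, after_pos: int, base: str) -> str:
    """Write `base` into the placeholder right after 1-based position `after_pos`."""
    if after_pos >= len(padded_ref) or padded_ref[after_pos] != "*":
        raise ValueError("no free placeholder right after this position")
    return padded_ref[:after_pos] + base + padded_ref[after_pos + 1:]


def unpadded_position(padded_ref: str, padded_pos: int) -> int:
    """Convert a 1-based padded position to a 1-based position without placeholders."""
    if padded_ref[padded_pos - 1] == "*":
        raise ValueError("position is a placeholder")
    return sum(1 for c in padded_ref[:padded_pos] if c != "*")


before = "C**A**G**C**G**C**A*"
after = insert_base(before, 13, "C")  # the insertion at pos. 14 from the example
```

The base at padded position 16 keeps its padded coordinate, while its ordinary coordinate moves from 6 to 7; this is the kind of correction that would be applied to the SAM file in post-processing.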

Difficulties – remapping, unmapping

When the reference sequence changes too much, some of the already mapped reads should be remapped or unmapped

Possible solution:
◦ Ignore it

◦ Iterate over the set of reads several times and take only the last reported alignment for each read

Reference: ...AAAAATATATATATCGATCTGC...
(annotation on the reference, placement lost in extraction: CC _)

Reads:
1: ATCTATATATCG
2: CCGATCTGC
3: CCCGATCTG
4: ATCCCGATC
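
The second option ("take only the last reported alignment for each read") can be sketched in a few lines. The representation of an alignment as a `(read_id, position)` pair, with `None` for an unmapped read, is an assumption made for illustration.

```python
def last_alignments(passes):
    """Keep, for every read, only the alignment reported in the latest pass.

    `passes` is a sequence of passes over the reads; each pass is a sequence of
    (read_id, position) pairs, where position is None if the read was unmapped.
    """
    final = {}
    for alignments in passes:
        for read_id, position in alignments:
            final[read_id] = position  # a later pass overwrites an earlier one
    return final
```

A read that mapped in an early pass and is reported as unmapped later ends up unmapped, which covers the "unmapping" case.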

Simulating dynamic mapping

Dynamic mapping

[Diagram: a dynamic mapper with a reference (index) and statistics. The reads 1, 2, …, n are processed one per iteration; every iteration consists of read mapping and an update of the reference; the output is a SAM/BAM file.]

Simulation (ideal approach)

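[Diagram: a static mapper with a reference (index) and statistics. Iteration 1 takes read 1, iteration 2 takes reads 1 and 2, and so on up to reads 1, 2, …, n; every iteration consists of read mapping, pileup and consensus, and an update of the reference; the output is a SAM/BAM file.]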

Simulation (feasible approach: 1/d iterations)

[Diagram: a static mapper with a reference (index) and statistics. The first iteration takes the first d reads, the second the first 2d reads, the third the first 3d reads, and so on; every iteration consists of read mapping, pileup and consensus, and an update of the reference; the output is a SAM/BAM file.]
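
The feasible approach can be sketched as follows. This is a toy illustration and not the authors' pipeline: `map_read` is a hypothetical ungapped Hamming-distance mapper standing in for a real static mapper, the consensus is a plain majority vote restricted to substitutions, and returning the alignments of the last iteration is an assumption, since the diagram does not say which alignments are reported.

```python
from collections import Counter


def map_read(ref, read, max_mismatches=2):
    """Toy static mapper (assumption): best ungapped placement by Hamming distance."""
    best = None
    for pos in range(len(ref) - len(read) + 1):
        dist = sum(a != b for a, b in zip(ref[pos:pos + len(read)], read))
        if dist <= max_mismatches and (best is None or dist < best[1]):
            best = (pos, dist)
    return None if best is None else best[0]


def consensus_update(ref, reads, positions, min_depth=2):
    """Pileup and majority vote; only substitutions (SNP updates) are applied."""
    columns = [Counter() for _ in ref]
    for read, pos in zip(reads, positions):
        if pos is None:
            continue  # unmapped reads contribute nothing
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    new_ref = list(ref)
    for i, column in enumerate(columns):
        depth = sum(column.values())
        if depth >= min_depth:
            base, count = column.most_common(1)[0]
            if 2 * count > depth:  # strict majority
                new_ref[i] = base
    return "".join(new_ref)


def simulate_dynamic(ref, reads, d):
    """Iteration j maps the first j*d reads with the static mapper, then updates."""
    positions = []
    for end in range(d, len(reads) + d, d):
        prefix = reads[:end]
        positions = [map_read(ref, r) for r in prefix]
        ref = consensus_update(ref, prefix, positions)
    return ref, positions
```

With d = 1 this corresponds to the ideal approach, where the reference can change after every read; a larger d reduces the number of mapping rounds at the price of coarser updates.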

Our pipeline

Goals:
◦ Simulating a dynamic mapper using existing static mappers

◦ Estimating the usefulness of dynamic mapping

◦ Making general statements about its benefit

Implementation:
◦ A set of several scripts (Bash, Python) and programs (C++)

◦ It uses standard bioinformatics software (SAMtools suite, etc.) and mappers (any mapper can be incorporated)

◦ Updates are made by our own simple variant caller (simulating the real capabilities of a mapper)

◦ Currently, only SNP updates (no indels) and single-end reads are supported

Comparing mappers and alignments

Comparison of mappers

Typical approach:

1. Taking several mappers as black-boxes.

2. Simulating reads.

3. Mapping by the selected mappers.

4. Applying the same threshold on mapping qualities for all reads.

5. Comparing.

…it is not very useful.

[Figure with the threshold 20 (on mapping qualities) marked. Source: Heng Li: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, arXiv:1303.3997]

It is important to consider all thresholds on mapping qualities!

Source: Heng Li: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, arXiv:1303.3997
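
Considering all thresholds amounts to sweeping the mapping-quality cutoff and recording, for each value, how many reads are kept and how many of those are wrong. A minimal sketch, assuming simulated reads for which correctness is known; the function name and the `(mapq, is_correct)` input format are hypothetical and are not LAVEnder's interface.

```python
def mapq_curve(records, total_reads):
    """Sweep all mapping-quality thresholds.

    `records` holds one (mapq, is_correct) pair per mapped read; `total_reads`
    also counts the unmapped reads. Returns rows of
    (threshold, percent of all reads kept, fraction of wrong reads among kept).
    """
    rows = []
    for threshold in sorted({mapq for mapq, _ in records}, reverse=True):
        kept = [ok for mapq, ok in records if mapq >= threshold]
        wrong = sum(1 for ok in kept if not ok)
        rows.append((threshold, 100.0 * len(kept) / total_reads, wrong / len(kept)))
    return rows


rows = mapq_curve(
    [(60, True), (60, True), (20, True), (20, False), (0, False)],
    total_reads=10,
)
```

Each row is one point of a curve, so two mappers are compared by their whole curves rather than by the single point at threshold 20.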

LAVEnder

A new evaluation software for comparing alignments (C++, Python)

It creates interactive HTML reports for a set of BAM files

Support of:
◦ DWGsim read simulator (will be extended)

◦ Single-end reads

Availability:
◦ Currently a private repository on GitHub

◦ In case of interest, don’t hesitate to contact me at [email protected]

Example of a comparison

• Human chromosome 21

• Sequencing error rate: 0.04

• Mutation rate: 0.10

• Single-end reads

• Simulated by DWGsim

• Aligned by BWA-MEM

[Plot axes: “Fraction of wrongly mapped reads in mapped reads” and “Part of all reads in %”]

EXPERIMENTS

Setup

Mappers: BWA-ALN, BWA-MEM

Reference genomes: a bacterium (Borrelia crocidurae), human chromosome 21

Mutation rates: 0.01 – 0.05 for BWA-ALN, 0.15 for BWA-MEM

Sequencing error rate: 0.01

Read length: 100

Read simulator: DWGsim

Evaluator: LAVEnder

BWA-ALN, Borrelia crocidurae
Rate of mutations: 0.01, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-ALN, Borrelia crocidurae
Rate of mutations: 0.01, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

BWA-ALN, human chromosome 21
Rate of mutations: 0.01, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-ALN, human chromosome 21
Rate of mutations: 0.01, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

BWA-ALN, Borrelia crocidurae
Rate of mutations: 0.03, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-ALN, Borrelia crocidurae
Rate of mutations: 0.03, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

BWA-ALN, human chromosome 21
Rate of mutations: 0.03, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-ALN, human chromosome 21
Rate of mutations: 0.03, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

BWA-ALN, Borrelia crocidurae
Rate of mutations: 0.05, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-ALN, Borrelia crocidurae
Rate of mutations: 0.05, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

BWA-ALN, human chromosome 21
Rate of mutations: 0.05, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-ALN, human chromosome 21
Rate of mutations: 0.05, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

BWA-MEM, Borrelia crocidurae
Rate of mutations: 0.15, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-MEM, Borrelia crocidurae
Rate of mutations: 0.15, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

BWA-MEM, human chromosome 21
Rate of mutations: 0.15, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plot: mapping of all reads without any updates]

BWA-MEM, human chromosome 21
Rate of mutations: 0.15, rate of seq. errors: 0.01, read length: 100, average coverage: 10

[Plots: dynamic mapping; iterative referencing]

Conclusion

We have shown:
◦ For cases with a small number of mutations between genomes, static mapping suffices (e.g., 1%+1%, BWA-ALN)

◦ For cases with a high number of mutations, mapping is much improved when dynamic mapping is employed (e.g., 15%+1%, BWA-MEM)

Real situations: regions with low mutation rates as well as highly mutated regions (e.g., hot spot regions)
◦ If we are also interested in these regions, dynamic mapping/iterative referencing would provide great improvement (especially for, e.g., variant calling)

Side products of our work:
◦ LAVEnder – a new evaluator of alignments

Thank you for your attention!

Gregory Kucherov

Valentina Boeva