An Improved Search Algorithm for Optimal Multiple-Sequence Alignment Paper by: Stefan Schroedl Presentation by: Bryan Franklin

Jan 13, 2016

Page 1: An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

Paper by: Stefan Schroedl

Presentation by: Bryan Franklin

Page 2

Outline

Multiple-Sequence Alignment (MSA)

Graph Representation

Computing Path Costs

Heuristics

Experimental Results

Other Optimizations

Page 3

Multiple-Sequence-Alignment

Sequence:

DNA: String over alphabet {A,C,G,T}

Protein: String with |Σ|=20 (one symbol for each amino acid)

Alignment:

Insert gaps (_) into sequences to line up matching characters.

Page 4

Multiple-Sequence-Alignment

Sequences: ABCB, BCD, DB
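
As a small illustration of the gap-insertion idea, the sketch below checks that one possible alignment of these three sequences is well formed: every row has the same length, deleting the gap symbol recovers the original sequence, and no column consists of gaps only. The particular alignment shown and the helper name `is_valid_alignment` are illustrative assumptions, not taken from the slides.

```python
GAP = "_"

def is_valid_alignment(sequences, rows):
    """Return True if rows is a gapped version of sequences with equal row lengths."""
    if len(sequences) != len(rows):
        return False
    same_length = len({len(r) for r in rows}) == 1
    recovers = all(r.replace(GAP, "") == s for s, r in zip(sequences, rows))
    # A column made only of gaps carries no information, so reject it.
    no_empty_column = all(any(c != GAP for c in col) for col in zip(*rows))
    return same_length and recovers and no_empty_column

sequences = ["ABCB", "BCD", "DB"]
rows = ["ABC_B",
        "_BCD_",
        "___DB"]
print(is_valid_alignment(sequences, rows))
```

Many other valid alignments exist for the same input; the optimization problem is to pick the cheapest one under a chosen cost model.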

Page 5

Multiple-Sequence-Alignment

Edit operations: insertion and deletion (together "indels"), and point mutation (single-symbol replacement)

Find a minimum-cost set of edit operations that aligns two or more sequences.

NP-hard for an arbitrary number of sequences
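
For just two sequences the problem is solvable in quadratic time by dynamic programming, which is a useful reference point for the general case. A minimal sketch, assuming unit cost for every insertion, deletion and replacement (the slides do not state the cost model, and `edit_distance` is a hypothetical name):

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and replacements turning a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or replace (cost 1)
            ))
        prev = curr
    return prev[-1]

print(edit_distance("ABCB", "BCD"))  # prints 2
```

The table has one dimension per sequence, so the same recurrence over k sequences needs a k-dimensional table, which is why exhaustive dynamic programming stops scaling quickly.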

Page 6

Multiple-Sequence-Alignment

Applications

Common ancestry between species

Locating useful portions of DNA

Predicting structure of folded proteins

Page 7

Graph Representation
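
The diagram for this slide is not preserved in the transcript. In the usual formulation, assumed here, a node is a tuple of positions (one per sequence) and each edge advances at least one sequence by one symbol while the others receive a gap, so an alignment is a path from the all-zero corner to the corner of sequence lengths. The function name `successors` is hypothetical.

```python
from itertools import product

def successors(node, sequences):
    """Yield (next_node, column) pairs reachable from node in the alignment lattice."""
    k = len(sequences)
    for step in product((0, 1), repeat=k):
        if not any(step):
            continue  # the all-zero step would add an all-gap column
        nxt = tuple(p + d for p, d in zip(node, step))
        if any(p > len(s) for p, s in zip(nxt, sequences)):
            continue  # cannot advance past the end of a sequence
        # Sequences that advance contribute their symbol, the rest a gap.
        column = tuple(s[p] if d else "_" for s, p, d in zip(sequences, node, step))
        yield nxt, column

sequences = ["ABCB", "BCD", "DB"]
for nxt, column in successors((0, 0, 0), sequences):
    print(nxt, "".join(column))
```

Under this assumption an interior node has 2^k - 1 outgoing edges for k sequences, which is one reason the search needs good pruning.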

Page 8

Computing G(n)

Considerations

Biological meaning

Cost of computation

Sum-of-pairs

Substitution matrix with (|Σ|+1)^2 entries (the extra symbol is the gap)

Sum alignment costs for all pairs

Costs can depend on neighbors
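
A minimal sketch of sum-of-pairs scoring, assuming a toy cost function (0 for a match or two facing gaps, 1 for a mismatch or a symbol facing a gap) in place of a real substitution matrix. Neighbor-dependent costs are ignored here, and the names `pair_cost` and `sum_of_pairs` are hypothetical.

```python
from itertools import combinations

GAP = "_"

def pair_cost(x, y):
    """Toy substitution cost; a real matrix would have (|Σ|+1)^2 entries."""
    if x == y:
        return 0  # match, or two gaps facing each other
    return 1      # mismatch, or a symbol facing a gap

def sum_of_pairs(rows):
    """Sum pair_cost over every pair of rows in every column of the alignment."""
    return sum(pair_cost(x, y)
               for column in zip(*rows)
               for x, y in combinations(column, 2))

rows = ["ABC_B", "_BCD_", "___DB"]
print(sum_of_pairs(rows))
```

Because the total is a sum over columns, it can be accumulated edge by edge along a path, which is what makes it usable as a path cost in a graph search.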

Page 9

Computing G(n)

Sequences: ABCB, BCD, DB

Page 10

Computing G(n)

[Figure residue: the values 6, 7, 8, 7, 7 appear on this slide; the diagram they label is not preserved.]

Page 11

Computing G(n)

Page 12

Heuristics

Methods Examined

2-fold (h_pair)

Divide-and-conquer

3-fold (h_{3,all})

4-fold (h_{4,all})
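
To illustrate the pairwise (2-fold) idea: under sum-of-pairs scoring, the cost of any multiple alignment is at least the sum of the optimal two-sequence alignment costs, so that sum is a lower bound on the remaining path cost. A minimal sketch, assuming unit mismatch and gap costs and recomputing each pairwise table on demand (a practical implementation would likely precompute them); `pairwise_cost` and `h_pair` are hypothetical names.

```python
from itertools import combinations

def pairwise_cost(a, b):
    """Optimal two-sequence alignment cost with unit mismatch and gap costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def h_pair(sequences, node):
    """Lower bound on the remaining sum-of-pairs cost from node (a tuple of positions)."""
    # Only the part of each sequence after its current position is still unaligned.
    suffixes = [s[p:] for s, p in zip(sequences, node)]
    return sum(pairwise_cost(a, b) for a, b in combinations(suffixes, 2))

sequences = ["ABCB", "BCD", "DB"]
print(h_pair(sequences, (0, 0, 0)))
```

The 3-fold and 4-fold variants follow the same pattern with optimal alignments of triples or quadruples, trading more precomputation for a tighter bound.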

Page 13

Heuristic Comparison

Page 14

Resource Usage

Page 15

Optimizations

Sparse Path Representation

Curve fitting for predicting threshold values

Sub-optimal paths periodically deleted

Page 16

The End

Any Questions?
