Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department

Feb 25, 2016

Page 1: Regular Expression Constrained Sequence Alignment

Regular Expression Constrained Sequence Alignment

Abdullah N. Arslan
Assistant Professor

Computer Science Department

Page 2: Regular Expression Constrained Sequence Alignment

Outline

• Sequence alignment
  • Common framework
  • DP solution
  • Why constrained?

• RE constrained sequence alignment
  • Algorithm

• Concluding Remarks

Page 3: Regular Expression Constrained Sequence Alignment

Alignment Matrix

Page 4: Regular Expression Constrained Sequence Alignment

Edit Graph

Page 5: Regular Expression Constrained Sequence Alignment

Dynamic Programming Solution

H_{i,j}: maximum score achieved at (i, j)

where H_{i,j} = 0 whenever i = 0 or j = 0

H_{n,m} in O(nm) time, O(m) space
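The recurrence itself did not survive in this transcript. As a minimal sketch of how H_{n,m} can be computed in O(nm) time and O(m) space, the function below assumes the usual three-way recurrence (substitution, gap in one sequence, gap in the other) and keeps the zero boundary stated on the slide. The function name and the scoring parameters `match`, `mismatch`, and `gap` are hypothetical and not taken from the slides.

```python
def global_score(a, b, match=1, mismatch=-1, gap=1):
    """Score H[n][m] with H[i][0] = H[0][j] = 0, using two rows only.

    Assumed recurrence (not from the slides):
    H[i][j] = max(H[i-1][j-1] + s(a_i, b_j), H[i-1][j] - gap, H[i][j-1] - gap)
    """
    m = len(b)
    prev = [0] * (m + 1)              # row i-1; row 0 is all zeros
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)           # cur[0] = 0 is the boundary column
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # align a_i with b_j
                         prev[j] - gap,       # a_i against a gap
                         cur[j - 1] - gap)    # b_j against a gap
        prev = cur                    # only the previous row is kept: O(m) space
    return prev[m]


print(global_score("ACGT", "AGT"))
```

Only the score is recovered this way. Recovering the alignment itself in linear space would need an additional technique that the slides do not describe.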

Page 6: Regular Expression Constrained Sequence Alignment

DP Solution: Local Alignment

H_{i,j}: similarity score achieved at (i, j)

where S_{i,j} = 0 whenever i = 0 or j = 0

max H_{i,j} in O(nm) time, O(m) space
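The local-alignment recurrence is also missing from the transcript. The sketch below assumes the standard local formulation, in which every cell is clamped at 0 and the answer is the maximum over all cells. The function name and scoring parameters are hypothetical.

```python
def local_score(a, b, match=1, mismatch=-1, gap=1):
    """Maximum cell value over the whole table, computed with two rows.

    Assumed recurrence (not from the slides): the same three-way maximum as
    in the global case, with an extra 0 so an alignment may start anywhere.
    """
    m = len(b)
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,                  # start a new local alignment here
                         prev[j - 1] + sub,
                         prev[j] - gap,
                         cur[j - 1] - gap)
            best = max(best, cur[j])         # the answer is the table maximum
        prev = cur
    return best


print(local_score("TTACGTAA", "GGACGTCC"))
```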

Page 7: Regular Expression Constrained Sequence Alignment

Dynamic Programming Formulation

Affine gap penalties: the penalty for a gap of length k is α + β(k-1), where α is the gap-open penalty and β is the gap-extension penalty

where S_{i,j} = F_{i,j} = E_{i,j} = 0 when i = 0 or j = 0

max H_{i,j} in O(nm) time, O(m) space
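The three recurrences for S, E, and F are not preserved here. The following is a sketch of the common three-table scheme for affine gaps in a local alignment, under the assumption that E tracks alignments ending with a gap in the first sequence and F those ending with a gap in the second. The parameter names `gap_open` and `gap_extend` are hypothetical, and a gap of length k costs `gap_open + gap_extend * (k - 1)`.

```python
def local_affine_score(a, b, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Best local score with affine gaps, keeping one previous row of S and F."""
    m = len(b)
    s_prev = [0] * (m + 1)   # S[i-1][*]
    f_prev = [0] * (m + 1)   # F[i-1][*]
    best = 0
    for i in range(1, len(a) + 1):
        s_cur = [0] * (m + 1)
        f_cur = [0] * (m + 1)
        e = 0                # E[i][0] = 0 boundary, then E[i][j] as j advances
        for j in range(1, m + 1):
            # Open a gap from S or extend the gap already running.
            e = max(s_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(s_prev[j] - gap_open, f_prev[j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s_cur[j] = max(0, s_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, s_cur[j])
        s_prev, f_prev = s_cur, f_cur
    return best


# AAA and GGG match (6 + 6); skipping CCC is one gap of length 3 (2 + 1 + 1).
print(local_affine_score("AAACCCGGG", "AAAGGG"))
```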

Page 8: Regular Expression Constrained Sequence Alignment

The Definition of the Constrained LCS Problem

• The constrained LCS (CLCS) problem
  • Given strings S1, S2, and P
  • Find an LCS of S1 and S2 such that P is a subsequence of this LCS

• Motivation: computing the homology of two biological sequences that have a specific part in common
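To make the problem statement concrete, here is a minimal sketch of a three-dimensional dynamic program for the CLCS length. It is an illustration written for this transcript, not a reproduction of any cited algorithm. The table entry `f[i][j][k]` is assumed to hold the length of the longest common subsequence of the first `i` symbols of S1 and the first `j` symbols of S2 that contains the first `k` symbols of P as a subsequence.

```python
NEG = float("-inf")  # marks combinations for which no valid subsequence exists


def clcs_length(s1, s2, p):
    """Length of a longest common subsequence of s1 and s2 containing p,
    or None when no common subsequence contains p."""
    n, m, r = len(s1), len(s2), len(p)
    f = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        f[i][0][0] = 0               # empty prefix of s2, nothing of p required
    for j in range(m + 1):
        f[0][j][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(r + 1):
                best = max(f[i - 1][j][k], f[i][j - 1][k])
                if s1[i - 1] == s2[j - 1]:
                    # Use the matching symbol without advancing in p ...
                    best = max(best, f[i - 1][j - 1][k] + 1)
                    # ... or let it also cover the k-th symbol of p.
                    if k > 0 and s1[i - 1] == p[k - 1]:
                        best = max(best, f[i - 1][j - 1][k - 1] + 1)
                f[i][j][k] = best
    return None if f[n][m][r] == NEG else f[n][m][r]


print(clcs_length("abcbdab", "bdcaba", "aa"))
```

With an empty constraint the function reduces to the ordinary LCS length. The table has (n+1)(m+1)(r+1) entries, each filled in constant time.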

Page 9: Regular Expression Constrained Sequence Alignment

Constrained Sequence Alignment Problems

• Constrained LCS
  • Tsai 2003, O(n^2 m^2 r) time
  • Chin et al. 2004, Arslan and Egecioglu 2004: O(nmr) time

• Edit-distance constrained sequence alignment
  • Arslan and Egecioglu 2004, O(dnmr)

• Regular-expression constrained sequence alignment (this paper)
  • Motivation: Comet and Henry, 2002; PROSITE patterns

Page 10: Regular Expression Constrained Sequence Alignment

PROSITE patterns as constraints

• PROSITE patterns are regular expressions with no Kleene closure
  • PROSITE database
  • e.g. [GA]-X(4)-G-K-[ST]: ATP/GTP-binding site motif A (P-loop) (PS00017)

• Comet and Henry reward alignments

• Regular expression constrained sequence alignment
  • Find a maximal alignment that includes a given RE
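As an illustration of why PROSITE patterns are ordinary regular expressions, the sketch below translates a small subset of the PROSITE syntax into a Python `re` pattern. It handles only single residues, `x` wildcards, `[..]` alternatives, `{..}` exclusions, and `(n)` or `(n,m)` repetition counts. The function name is hypothetical, and anchors and other PROSITE features are deliberately left out.

```python
import re

# One PROSITE element: a residue, x, [ABC] or {ABC}, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern such as [GA]-X(4)-G-K-[ST]."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        found = _ELEMENT.fullmatch(element)
        if found is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        symbol, low, high = found.groups()
        if symbol in ("x", "X"):
            core = "."                          # any residue
        elif symbol.startswith("{"):
            core = "[^" + symbol[1:-1] + "]"    # any residue except these
        else:
            core = symbol                       # a residue or a [..] class
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


p_loop = prosite_to_regex("[GA]-X(4)-G-K-[ST]")
print(p_loop)
print(bool(re.search(p_loop, "MGAAAAGKSL")))
```

Because the repetition counts are bounded, the resulting expression needs no Kleene closure, which matches the first bullet above.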

Page 11: Regular Expression Constrained Sequence Alignment

Example: For [GA]-X(4)-G-K-[ST]

Page 12: Regular Expression Constrained Sequence Alignment

Using Edit Graph: e.g. A(C+G)*(S+T)

Page 13: Regular Expression Constrained Sequence Alignment

Automata for A(C+G)*(S+T)
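The automaton figure is not part of this transcript. As a stand-in, the sketch below writes down one possible automaton for A(C+G)*(S+T) by hand, reading + as union, and simulates it by tracking the set of active states. The state numbering and the names `DELTA`, `START`, `FINALS`, and `accepts` are assumptions made for this example, and the automaton need not coincide with the one drawn on the slide.

```python
# Hand-built automaton for A(C+G)*(S+T):
# state 0 --A--> 1, state 1 loops on C and G, state 1 --S or T--> 2 (accepting).
DELTA = {
    (0, "A"): {1},
    (1, "C"): {1},
    (1, "G"): {1},
    (1, "S"): {2},
    (1, "T"): {2},
}
START = 0
FINALS = {2}


def accepts(word, delta=DELTA, start=START, finals=FINALS):
    """Simulate the automaton on word by following all active states."""
    active = {start}
    for symbol in word:
        active = {t for s in active for t in delta.get((s, symbol), ())}
        if not active:          # no state can read this symbol
            return False
    return bool(active & finals)


print(accepts("ACGGT"), accepts("A"))
```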

Page 14: Regular Expression Constrained Sequence Alignment

Some Details of Automata Construction

• Equivalent NFA N to a given RE R

• Construct from N a new N×N automaton
  • Moves on edit operations (or equivalently on alignment columns)
  • States have weights
    • Interested in the weights of the final states after the alignment is complete

Page 15: Regular Expression Constrained Sequence Alignment

Weighted Automaton

• Initial weights are -∞

• Weight of (q0,q0) is initially 0

• Update new maximum scores at reachable states

• Weights become -∞ in unreachable states

• What are the maximum weights at the final states?

Page 16: Regular Expression Constrained Sequence Alignment

Computations on Automata

Page 17: Regular Expression Constrained Sequence Alignment

Complexity

• Simulate automata based on DP solution
  • Each step requires examining the transition functions
  • Maintain a list of active (reachable) states
  • Update state weights as alignments are formed
  • Automaton M_{i,j} has the optimum weights
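The simulation can be sketched as follows, under several assumptions that the slides do not confirm. The constraint is read as: some contiguous block of alignment columns must align a substring of each sequence that matches R. A hand-written deterministic automaton for A(C+G)*(S+T) replaces the NFA. The alignment is scored globally with hypothetical `match`, `mismatch`, and `gap` values. Each cell (i, j) keeps a dictionary of active states and their best weights; the labels "pre" and "post" (names invented here) mark columns before and after the constrained block, and absent labels play the role of weight -∞.

```python
NEG = float("-inf")

# Deterministic automaton for A(C+G)*(S+T); state 2 is accepting.
DELTA = {(0, "A"): 1, (1, "C"): 1, (1, "G"): 1, (1, "S"): 2, (1, "T"): 2}
Q0, FINALS = 0, {2}


def re_constrained_score(s1, s2, delta=DELTA, q0=Q0, finals=FINALS,
                         match=1, mismatch=-1, gap=-1):
    """Best score of an alignment containing a block whose two aligned
    substrings are both accepted, or None when no such alignment exists."""
    n, m = len(s1), len(s2)
    cells = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            cur = {"pre": 0} if i == 0 and j == 0 else {}
            moves = []                      # (rows used, columns used, score)
            if i > 0 and j > 0:
                moves.append((1, 1, match if s1[i - 1] == s2[j - 1] else mismatch))
            if i > 0:
                moves.append((1, 0, gap))
            if j > 0:
                moves.append((0, 1, gap))
            for di, dj, cost in moves:
                for label, weight in cells[i - di][j - dj].items():
                    if isinstance(label, tuple):      # inside the block
                        p, q = label
                        if di:
                            p = delta.get((p, s1[i - 1]))
                        if dj:
                            q = delta.get((q, s2[j - 1]))
                        if p is None or q is None:
                            continue                  # state pair is unreachable
                        label = (p, q)
                    if weight + cost > cur.get(label, NEG):
                        cur[label] = weight + cost
            # Free moves: enter the block, then leave it from accepting pairs.
            if "pre" in cur and cur["pre"] > cur.get((q0, q0), NEG):
                cur[(q0, q0)] = cur["pre"]
            done = max((w for lab, w in cur.items() if isinstance(lab, tuple)
                        and lab[0] in finals and lab[1] in finals), default=NEG)
            if done > cur.get("post", NEG):
                cur["post"] = done
            cells[i][j] = cur
    return cells[n][m].get("post")


print(re_constrained_score("AT", "AS"))
```

Keeping only the reachable labels in each dictionary corresponds to the list of active states mentioned above. Storing every cell is done for clarity; two rows of cells are enough when only the final weight is needed.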

Page 18: Regular Expression Constrained Sequence Alignment

Generalizations: Local Alignment & Affine gaps

Page 19: Regular Expression Constrained Sequence Alignment

CONCLUSION

• Introduced the regular expression constrained sequence alignment problem

• Presented an algorithm for the problem

• Future work: generalization of the problem for
  • Multiple sequence alignment
  • Multiple regular expressions as a constraint

Page 20: Regular Expression Constrained Sequence Alignment

Thank You