Jan 19, 2016

Local Alignment of RNA Sequences with Arbitrary Scoring Schemes


Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann

RNA sequences

[Figure: an RNA secondary structure with base pairs C-G, C-G, G-C, U-A, A-U, C-G, and the RNA sequence C C G U A G U A C C A C A G U G U G G.]

Alignment of Strings

Global Alignment: $O(nm)$

S1 = U C A C C G _ A _ G

S2 = U C G C G G U A U G
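The $O(nm)$ bound comes from filling one table cell per pair of prefixes. A minimal sketch of that dynamic program follows; the unit substitution and gap costs, and the function name `global_alignment_cost`, are assumptions made for illustration and are not taken from the slides.

```python
def global_alignment_cost(s1, s2, sub_cost=1, gap_cost=1):
    """Minimum total cost of aligning all of s1 with all of s2.

    sub_cost and gap_cost are assumed unit costs (plain edit distance).
    """
    n, m = len(s1), len(s2)
    # cost[i][j] = cheapest alignment of s1[:i] with s2[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if s1[i - 1] == s2[j - 1] else sub_cost)
            up = cost[i - 1][j] + gap_cost    # s1[i-1] aligned to a gap
            left = cost[i][j - 1] + gap_cost  # s2[j-1] aligned to a gap
            cost[i][j] = min(diag, up, left)
    return cost[n][m]


# The two strings from the slide: two substitutions and two gaps.
print(global_alignment_cost("UCACCGAG", "UCGCGGUAUG"))  # 4
```

The two nested loops visit each of the (n+1)(m+1) cells once and do constant work per cell, which gives the $O(nm)$ running time.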

Alignment of RNA sequences

RNA Global Alignment via tree edit distance:

A A G G C C C U G A U

A G A C C G U U U

$O(m^2 n^2) = O(n^4)$ [SZ 1989]

$O(m^2 n \lg n) = O(n^3 \log n)$ [K 1998]

$O(m^2 n (1 + \lg \frac{n}{m})) = O(n^3)$ [DMRW 2006]

Theorem: All these algorithms compute the edit distance between any two arcs, provided we match these arcs.

The Alignment graph

[Figure: the alignment graph, a grid with U C A C C G A G along one axis and U C G C G G U A U G along the other.]

Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2.
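The correspondence can be made concrete for the plain grid, where no arcs are matched. In the sketch below, a vertex (i, j) is assumed to mean "i characters of R1 and j characters of R2 consumed", and each edge is a diagonal, horizontal, or vertical unit step. The function name `path_to_alignment` and this vertex convention are illustrative assumptions, not notation from the slides.

```python
def path_to_alignment(r1, r2, path):
    """Turn a path of grid vertices into the alignment it encodes.

    path is a list of (i, j) vertices; consecutive vertices must differ by
    (1, 1), (1, 0) or (0, 1). The path may start and end anywhere, so the
    result aligns a substring of r1 with a substring of r2.
    """
    row1, row2 = [], []
    for (i, j), (i2, j2) in zip(path, path[1:]):
        step = (i2 - i, j2 - j)
        if step == (1, 1):      # r1[i] aligned with r2[j]
            row1.append(r1[i])
            row2.append(r2[j])
        elif step == (1, 0):    # r1[i] aligned with a gap
            row1.append(r1[i])
            row2.append("_")
        elif step == (0, 1):    # gap aligned with r2[j]
            row1.append("_")
            row2.append(r2[j])
        else:
            raise ValueError(f"not a grid edge: {(i, j)} -> {(i2, j2)}")
    return "".join(row1), "".join(row2)


path = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6),
        (6, 7), (7, 8), (7, 9), (8, 10)]
print(path_to_alignment("UCACCGAG", "UCGCGGUAUG", path))
# ('UCACCG_A_G', 'UCGCGGUAUG')
```

A path that starts and ends in the interior of the grid, such as `[(2, 2), (3, 3), (4, 4)]`, yields an alignment of two substrings, which is why the graph is suited to local alignment.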

The Alignment graph

[Figure: the alignment graph.]

Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2 in which all arcs are deleted.

The Alignment graph

[Figure: the alignment graph.]

Theorem: There is a one-to-one correspondence between HEAVIEST paths in the alignment graph and OPTIMAL alignments of substrings of R1 and R2.

The Local Alignment algorithms

We use the alignment graph to compute the local similarity between two RNA sequences according to two well-known metrics:

• Smith-Waterman: the highest-scoring alignment between any pair of substrings of the input RNAs.

• Its normalized version.

Standard Local Similarity (Smith-Waterman)

The score is computed via a dynamic program:

$Score(i,j) = \max \{\, Score(i',j') + \text{weight of the incoming edge from } (i',j'),\ 0 \,\}$

[Figure: the alignment graph.]

Time complexity: $O(mn)$ + one run of a global algorithm = $O(m^2 n (1 + \lg \frac{n}{m}))$
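The recurrence can be sketched as a single pass over the grid in row-major order. In this sketch the match, mismatch, and gap weights are arbitrary illustrative values, and `arc_edges` is a hypothetical dictionary standing in for the extra edges that represent matched arcs. In the talk those edge weights come from one run of a global tree edit distance algorithm, which is not implemented here.

```python
def simple_sub(a, b):
    """Assumed substitution weights: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def local_score(r1, r2, sub, gap, arc_edges=None):
    """Weight of the heaviest path in the alignment graph, floored at 0.

    arc_edges (hypothetical) maps a vertex (i, j) to a list of
    ((i_prev, j_prev), weight) pairs with i_prev < i and j_prev < j.
    """
    arc_edges = arc_edges or {}
    n, m = len(r1), len(r2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = [0]  # a path may start at any vertex
            if i > 0:
                candidates.append(score[i - 1][j] + gap)
            if j > 0:
                candidates.append(score[i][j - 1] + gap)
            if i > 0 and j > 0:
                candidates.append(score[i - 1][j - 1] + sub(r1[i - 1], r2[j - 1]))
            for (pi, pj), w in arc_edges.get((i, j), []):
                candidates.append(score[pi][pj] + w)
            score[i][j] = max(candidates)
            best = max(best, score[i][j])
    return best


print(local_score("AAGG", "CCAAGGUU", simple_sub, -1))  # 8
# One made-up arc edge from vertex (0, 2) to vertex (4, 6) with weight 10:
print(local_score("AAGG", "CCAAGGUU", simple_sub, -1, {(4, 6): [((0, 2), 10)]}))  # 10
```

Each vertex has a constant number of grid edges, so with the arc weights already available the pass itself matches the $O(mn)$ term in the bound above.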

Normalized Local Similarity

The weakness of the Smith-Waterman approach [AP 2001]

Solution: look for the substrings $R_1', R_2'$ (with their arcs) that maximize

$$\frac{ED(R_1', R_2')}{|R_1'| + |R_2'|}$$

subject to a condition relating $ED(R_1', R_2')$ to some given value.

Normalized Local Similarity

Again, a dynamic program:

Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.

For every k, i, j compute

$Length(k,i,j) = \min \{\, Length(k-w,i',j') + (j'-j + i'-i) \,\}$, where w is the weight of the incoming edge from (i',j').

• The best k/Length(k,i,j) over all i, j, k is the normalized score.

Time complexity: $O(n^2 m)$ + one run of a global algorithm = $O(n^2 m) + O(m^2 n (1 + \lg \frac{n}{m})) = O(n^2 m)$
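A minimal sketch of the Length(k,i,j) table follows, restricted to the plain grid edges and to integer weights. Several things here are assumptions for illustration: the weights (+2 match, -1 mismatch, -1 gap), the use of a dictionary per vertex instead of a dense table over k, and taking the length added by an edge to be the number of characters it spans in both sequences. Arc edges and the threshold on the score are left out.

```python
def normalized_local_score(r1, r2, sub, gap):
    """Best k / Length(k, i, j) over all vertices (i, j) and weights k.

    length[i][j] maps a path weight k to the shortest length of a path
    with that weight ending at (i, j). The empty path has k = 0, length 0.
    """
    n, m = len(r1), len(r2)
    length = [[{0: 0} for _ in range(m + 1)] for _ in range(n + 1)]
    best = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            incoming = []  # (i_prev, j_prev, edge weight w)
            if i > 0:
                incoming.append((i - 1, j, gap))
            if j > 0:
                incoming.append((i, j - 1, gap))
            if i > 0 and j > 0:
                incoming.append((i - 1, j - 1, sub(r1[i - 1], r2[j - 1])))
            cell = length[i][j]
            for pi, pj, w in incoming:
                span = abs(i - pi) + abs(j - pj)  # characters covered by the edge
                for k_prev, l_prev in length[pi][pj].items():
                    k, l = k_prev + w, l_prev + span
                    if l < cell.get(k, float("inf")):
                        cell[k] = l  # shorter path of weight k found
            for k, l in cell.items():
                if l > 0:
                    best = max(best, k / l)
    return best


def simple_sub(a, b):
    """Assumed substitution weights: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


print(normalized_local_score("AC", "AC", simple_sub, -1))  # 1.0
```

With these weights a matching diagonal edge contributes weight 2 over length 2, so no path can exceed a ratio of 1. Without a threshold on the score a single matched pair already attains that maximum, which is the kind of degenerate answer the "given value" in the problem statement rules out.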

Open Problems

Arc deletion

Improve global tree edit distance

[Figure: the alignment graph.]
