On realizing shapes in the theory of RNA neutral networks Speaker: Leszek Gąsieniec, U of Liverpool, UK Joint work with: Peter Clote, Boston College, USA Roman Kolpakov, U of Moscow, Russia Evangelos Kranakis, Carleton U, Canada Danny Krizanc, Wesleyan U, USA
On realizing shapes in the theory of RNA neutral networks
• For an integer n>0, a length-n RNA nucleotide sequence is considered as a word in the space Cn = {A,C,G,U}^n
• For a = a1a2…an ∈ Cn, a secondary structure S ∈ Sn is a collection of pairs (i,j) s.t.:
– aiaj ∈ {AU,UA,CG,GC}
– if (i,j) and (k,l) ∈ S then the combination i<k<j<l is not permitted, i.e., pseudo-knots are disallowed
– for each pair (i,j) ∈ S the values of i,j are unique, i.e., each position occurs in at most one pair
Shapes
[Figure: the sequence ACGUCGGUACCAGUUGAGGUCCGAGGACG drawn three times with different sets of pairs; the second and third drawings are marked NO]
Shapes
• Secondary structures can be identified with balanced parenthesis expressions padded with ‘dots’, where
– a dot (°) corresponds to an unpaired nucleotide position, and
– a matching parenthesis which opens at nucleotide position i and closes at nucleotide position j corresponds to a base pair (i,j)
AC°UCGGUA°CAGUU°A°°UC°GAG°°C°
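The correspondence between parenthesis expressions and base pairs can be sketched as follows. This is only an illustration: it assumes the common plain-text notation with '.' for the dot and '(' / ')' for the parentheses (the slide draws the dot as °), and the name `parse_shape` is hypothetical.

```python
def parse_shape(shape):
    """Return the sorted list of base pairs (i, j), 1-indexed, encoded by a dot-bracket string."""
    stack = []   # positions of parentheses that are still open
    pairs = []
    for pos, ch in enumerate(shape, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

Because the expression is balanced, the resulting pairs are automatically free of pseudo-knots and no position is used twice.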
Realizing Shapes
• Given a shape S ∈ Sn and a word a ∈ Cn, we say that a realizes S if padding with dots is feasible
S: AC°UCGGUA°CAGUU°A°°UC°GAG°°C°
a: ACGUCGGUACCAGUUGAGGUCCGAGGACG
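A minimal sketch of the realizability test, assuming the shape is already given as a list of 1-indexed pairs (i,j); the names `ALLOWED` and `realizes` are hypothetical. Only the four pairings AU, UA, CG, GC listed in the definition of a secondary structure are accepted.

```python
ALLOWED = {"AU", "UA", "CG", "GC"}

def realizes(seq, pairs):
    """True if every pair (i, j), 1-indexed, joins complementary nucleotides of seq."""
    return all(seq[i - 1] + seq[j - 1] in ALLOWED for i, j in pairs)
```

For instance, `realizes("ACGU", [(1, 4), (2, 3)])` holds because position 1 pairs A with U and position 2 pairs C with G.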
Decision Problem
• “Given a finite set of secondary structures (shapes) {S1,S2,…,Sk}, under what conditions does there exist a single RNA sequence which can realize each of the given structures?”
• What can be done if such a realization is not feasible?
Optimization Problem M*RP
• We add a “don’t care” symbol * which matches any symbol in {A,C,G,U}.
• Given a set of secondary structures (shapes) {S1,S2,…,Sk} to be realized by a sequence in Cn, find the minimum number of positions N(S1,...,Sk) for which, after removal (replacement) of all base pairs incident to these positions, there exists a sequence a ∈ Cn which realizes each of the structures Si.
• We call this the Min * Realizability Problem and we refer to it by M*RP
Results
• O(nk) algorithm for the decision problem, i.e., when N(S1,…,Sk)=0
• Proof that M*RP problem is NP-hard for k > 3 (case k=3 is unclear)
• We also study a bounded version of M*RP with a limited number of *s. E.g., we show that the case limited to a single * is also solvable in time O(nk).
M*RP Simplification
• We observe that a string a realizing the shapes S1,…,Sk over the four letter alphabet {A,C,G,U} exists if and only if there is a binary string b realizing (here we mean that the endpoints of each edge/pair must have a different bit 0/1) the same set of shapes.
M*RP Simplification
AC°UCGGUA°CAGUU°A°°UC°GAG°°C°
10°010101°0110U°A°°UC°GAG°°C°
10°010101°01101°1°°00°100°°1°
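One way to see the equivalence in both directions is a fixed letter-to-bit mapping. The mapping below is an assumption chosen for illustration (it is not the assignment drawn on the slide, which is made pair by pair), and the function names are hypothetical.

```python
# A and C map to 0, U and G map to 1, so AU, UA, CG, GC all get two different bits.
TO_BIT = {"A": "0", "U": "1", "C": "0", "G": "1"}
# Conversely, 0 -> C and 1 -> G turns any pair with different bits into CG or GC.
TO_BASE = {"0": "C", "1": "G"}

def rna_to_binary(seq):
    """Binary string whose paired positions differ whenever seq pairs them validly."""
    return "".join(TO_BIT[c] for c in seq)

def binary_to_rna(bits):
    """RNA word over {C, G} realizing every shape that the binary string realizes."""
    return "".join(TO_BASE[b] for b in bits)
```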
Graph of shapes
• G(S1,…,Sk) = (V,E) is a graph with:
– the set of vertices V containing consecutive positions 1,…,n of the nucleotides (binary symbols in the simplified version) of the sequence a ∈ Cn
– the set of edges E is the union of the set of edges appearing in the shapes S1,…,Sk
Graph of shapes
[Figure: the sequence ACGUCGGUACCAGUUGAGGUCCGAGGACG shown twice, with positions labelled 1, 2, …, n]
An observation
• Lemma:
– Any set of shapes S1,S2,…,Sk of size n can be realized by a single binary string b if and only if the graph G(S1,S2,…,Sk) has no odd cycles (it is 2-colorable).
– Moreover, one can check the existence of b and, if b exists, construct it in O(nk) time
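The lemma can be illustrated with a standard BFS 2-coloring of the graph of shapes. This is a sketch under assumptions: each shape is passed as a list of 1-indexed pairs, and the name `realize_binary` is hypothetical. Each shape contributes at most n/2 edges, so the traversal touches O(nk) vertices and edges.

```python
from collections import deque

def realize_binary(n, shapes):
    """Return a binary string realizing all shapes, or None if the graph has an odd cycle."""
    adj = [[] for _ in range(n + 1)]
    for pairs in shapes:              # union of the edges of all shapes
        for i, j in pairs:
            adj[i].append(j)
            adj[j].append(i)
    color = [None] * (n + 1)
    for start in range(1, n + 1):     # one BFS per connected component
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None       # both endpoints share a bit: odd cycle
    return "".join(str(c) for c in color[1:])
```

Two shapes whose edges form the 4-cycle 1-2-3-4 are realizable, while three shapes whose edges form a triangle are not.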
M*RP[m] Problem
• M*RP[m] problem - for any set of shapes S1,…,Sk compute a string over the alphabet {0,1,*} which realizes all shapes and contains no more than m occurrences of the don’t care symbol *
• Lemma: M*RP[m] problem can be solved in time O(C(n,m)·||G(S1,…,Sk)||), where C(n,m) is the binomial coefficient “n choose m”
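A brute-force sketch in the spirit of this bound: try each candidate set of * positions and run a linear-time 2-coloring on what remains. The names `two_color` and `solve_bounded` are hypothetical, the graph is assumed to be given as a list of 1-indexed edges, and trying the sizes 0,…,m in increasing order is a convenience added here so that the answer uses as few *s as possible.

```python
from collections import deque
from itertools import combinations

def two_color(n, edges, removed):
    """2-color positions 1..n ignoring `removed`; return colors, or None on an odd cycle."""
    adj = [[] for _ in range(n + 1)]
    for i, j in edges:
        if i not in removed and j not in removed:
            adj[i].append(j)
            adj[j].append(i)
    color = [None] * (n + 1)
    for start in range(1, n + 1):
        if start in removed or color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None
    return color

def solve_bounded(n, edges, m):
    """Return a string over {0,1,*} with at most m stars, or None if none exists."""
    for size in range(m + 1):
        for stars in combinations(range(1, n + 1), size):
            removed = set(stars)
            color = two_color(n, edges, removed)
            if color is not None:
                return "".join("*" if v in removed else str(color[v])
                               for v in range(1, n + 1))
    return None
```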
Solving M*RP[1] problem
• Using the formula from previous slide we know that M*RP[1] problem can be solved in time O(n||G(S1,…,Sk)||)
• In what follows we give some details of the algorithm solving M*RP[1] in time O(||G(S1,…,Sk)||)
Critical vertices
• A vertex of a graph G is called critical if it is contained in all odd cycles in G.
• Lemma: All critical vertices of an arbitrary graph G can be found in time O(||G||).
• Theorem: M*RP[1] can be solved in O(||G(S1,…,Sk)||) time.
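The definition can be checked directly: a vertex v lies on all odd cycles exactly when G with v deleted has no odd cycle. The sketch below uses this naive test, so it runs in O(n·||G||) time and is not the linear-time method claimed by the lemma; the function names are hypothetical and the graph is assumed to be a list of 1-indexed edges.

```python
from collections import deque

def has_odd_cycle(n, edges, skip=None):
    """True if the graph on 1..n, with vertex `skip` deleted, contains an odd cycle."""
    adj = [[] for _ in range(n + 1)]
    for i, j in edges:
        if i != skip and j != skip:
            adj[i].append(j)
            adj[j].append(i)
    color = {}
    for start in range(1, n + 1):
        if start == skip or start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return True
    return False

def critical_vertices_naive(n, edges):
    """Vertices whose deletion leaves no odd cycle (every vertex, if G has none)."""
    return [v for v in range(1, n + 1) if not has_odd_cycle(n, edges, skip=v)]
```

For two triangles sharing a single vertex, only the shared vertex is critical; replacing it by * makes the remaining graph 2-colorable.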
Sketch of the algorithm
• Find any odd cycle without chords
– this can be done via finding any odd cycle C, e.g., with the help of BFS and a parity test
– having an odd cycle we “chop-off” (one after another) its even sub-cycles based on chords
– all done in time O(||G||)
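The two steps above can be sketched as follows. This is an illustration under assumptions, not the linear-time procedure from the talk: the BFS step looks for an edge joining two vertices of the same BFS level, and the chord step rescans the cycle after every cut, which is simpler but slower than O(||G||). Function names are hypothetical; edges are 1-indexed pairs.

```python
from collections import deque

def find_odd_cycle(n, edges):
    """Return the vertices of some odd cycle in cyclic order, or None if there is none."""
    adj = [[] for _ in range(n + 1)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    depth = [None] * (n + 1)
    parent = [0] * (n + 1)
    for start in range(1, n + 1):
        if depth[start] is not None:
            continue
        depth[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if depth[v] is None:
                    depth[v] = depth[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif depth[v] == depth[u]:
                    # Parity test: climb from both ends to their common ancestor.
                    left, right = [u], [v]
                    while left[-1] != right[-1]:
                        left.append(parent[left[-1]])
                        right.append(parent[right[-1]])
                    return left + right[-2::-1]
    return None

def chop_chords(cycle, edges):
    """Shrink an odd cycle along chords, always keeping the odd side of the cut."""
    edge_set = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        k = len(cycle)
        for a in range(k):
            for b in range(a + 2, k):
                if a == 0 and b == k - 1:
                    continue  # consecutive on the cycle, not a chord
                if frozenset((cycle[a], cycle[b])) in edge_set:
                    inner = cycle[a:b + 1]
                    outer = cycle[:a + 1] + cycle[b:]
                    cycle = inner if len(inner) % 2 == 1 else outer
                    changed = True
                    break
            if changed:
                break
    return cycle
```

A chord splits a cycle of odd length k into two cycles whose lengths sum to k+2, so exactly one of them is odd; that one is kept and the even one is chopped off.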
External connected components K1,K2…,Ke
[Figure: the odd cycle C with external connected components K1, K2, …, Ki attached to it]
Odd neighbor pairs
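[Figure: a connected component Ki (“Connected component Ki territory”) attached to the odd cycle C (“Odd cycle C territory”) at vertices x and y; the segment of C between them has length L, and the figure is annotated L + l(x) + l(y) = 5]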
Some properties of external connected components
• The external components must not contain an odd cycle, i.e., each component is 2-colorable
• For any Ki
– the number of odd neighbor pairs of Ki must be odd,
– and it cannot be larger than 2
• Which means that each Ki must have exactly one odd neighbor pair, which defines a segment Li on the odd cycle C
Critical vertices
• Let R be the intersection of all segments Li
• One can prove that:
– all critical vertices are contained in R
– and every vertex in R is critical, i.e., any cycle in G which does not contain vertices from R must be even
• The content of the set R can be computed in time linear in ||C||.
Conclusion
• Theorem: M*RP[1] can be solved in O(||G(S1,…,Sk)||) time
– what is the complexity of M*RP[i]?
• Theorem: M*RP is NP-hard for k>3
– the case with k=2 is always realizable, and
– the complexity of the case with k=3 is not