Page 1: lecture7

Decision Problems

Observation: Many problems admit polynomial-time algorithms.

Question: Can we solve all problems in polynomial time?

Answer: No, absolutely not.

Definition: The class of problems that can be solved by polynomial-time algorithms is called P.

Besides P, we also have the notion of NP problems, which describes the hardness of a problem. Note, however, that NP ≠ NotP.

A problem is called a decision problem if it has a yes/no solution. Sometimes we also call it a language-recognition problem. Many problems can be cast as decision problems by imposing simple constraints.

Let U be the set of all possible inputs to the decision problem, and let L ⊂ U be the set of inputs for which the answer to the problem is "yes". L is called the language corresponding to the problem.

Jiming Peng, AdvOL, CAS, McMaster 1


Polynomial Reduction

Definition: Let L1 and L2 be two languages from the input spaces U1 and U2. We say L1 is polynomially reducible to L2 if there is a polynomial-time algorithm that converts each input u1 ∈ U1 to another input u2 ∈ U2 such that u1 ∈ L1 if and only if u2 ∈ L2.

Remark: In the above definition, we assume the algorithm is polynomial in the size of u1, which implies that the size of u2 is also polynomial in the size of u1.

Theorem: If L1 is polynomially reducible to L2 and there is a polynomial algorithm for L2, then there is a polynomial algorithm for L1.

Theorem: If L1 is polynomially reducible to L2, and L2 to L3, then L1 is polynomially reducible to L3.
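The first theorem is just function composition: a polynomial-time reduction followed by a polynomial-time solver for L2 is a polynomial-time solver for L1. A minimal sketch in Python; the concrete pair of languages here (L1 = "all entries even?", L2 = "is this integer zero?") is an illustrative assumption, not from the lecture:

```python
def reduce_l1_to_l2(u1):
    # Polynomial-time conversion: u1 is in L1 iff the result is in L2.
    return sum(x % 2 for x in u1)

def solve_l2(u2):
    # Assumed polynomial-time decision procedure for L2.
    return u2 == 0

def solve_l1(u1):
    # Composition stays polynomial: poly(|u1|) work for the reduction,
    # then poly(size of u2) = poly(poly(|u1|)) work for the L2 solver.
    return solve_l2(reduce_l1_to_l2(u1))

print(solve_l1([2, 4, 6]))  # True
print(solve_l1([2, 3]))     # False
```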


NP Problems

Definition: NP denotes the class of problems for which a positive answer has a 'certificate' such that the correctness of the positive answer can be verified in polynomial time.

The algorithm that verifies the correctness of the positive answer is called a nondeterministic algorithm.

Examples:

L1 := {G | G is a graph with a perfect matching},
L′1 := {(G, M) | G is a graph and M a perfect matching in G}.
L′1 is polynomially solvable, thus L1 ∈ NP!

L2 := {G | G is a Hamiltonian graph},
L′2 := {(G, M) | G is a graph and M a Hamilton cycle in G}.
L′2 ∈ P, therefore L2 ∈ NP.
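The certificate view can be made concrete for L1: deciding whether G has a perfect matching is the hard direction, but checking a proposed matching M takes only linear time. A minimal sketch; the graph encoding (vertices 0..n−1, edges as pairs) is a convention assumed for this example:

```python
def verify_perfect_matching(n, edges, matching):
    """Check the certificate: is `matching` a perfect matching of the graph?"""
    covered = set()
    for u, v in matching:
        if (u, v) not in edges and (v, u) not in edges:
            return False          # certificate uses a non-edge
        if u in covered or v in covered:
            return False          # some vertex matched twice
        covered.update((u, v))
    return len(covered) == n      # every vertex is matched

edges = {(0, 1), (1, 2), (2, 3), (3, 0)}          # a 4-cycle
print(verify_perfect_matching(4, edges, [(0, 1), (2, 3)]))  # True
print(verify_perfect_matching(4, edges, [(0, 1)]))          # False
```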

Clearly, P ⊂ NP.

The relation between P and NP has become one of the great open mysteries in computer science and mathematics. It is one of the seven open "million-dollar" Millennium Prize questions posed at the turn of the century.


NP-Completeness

Question: Does P equal NP?

Answer: NP seems much larger than P. However, we have not found a single problem in NP that is provably not in P!

We next introduce two classes of problems that have not been proven to be in P yet.

Definition: A problem X is called an NP-hard problem if every problem in NP is polynomially reducible to X.

Conclusion: If we can solve an NP-hard problem in polynomial time, then we can solve all the problems in NP in polynomial time!

Definition: A problem X is called an NP-complete problem if it is NP-hard and belongs to NP.

Conclusion: NP-complete problems are the hardest problems in NP. If we can prove one NP-complete problem is in P, then P = NP.


SAT: An NP-Complete Problem

The following lemma by Cook (1971) is fundamental in the theory of NP-Completeness (NPC).

Lemma: A problem X is an NPC problem if (1) X belongs to NP and (2) Y is polynomially reducible to X for some NPC problem Y.

In his seminal paper, Cook gave the first example of an NPC problem.

Satisfiability (SAT): Let S be a Boolean expression, such as S = (x + y + z̄) · (x̄ + y + z), where addition denotes 'or', multiplication denotes 'and', and a bar denotes complement. A Boolean expression is said to be satisfiable if there is an assignment of 0s and 1s to the variables such that the value of the expression is 1. The SAT problem is to determine whether a given expression is satisfiable.

A detailed proof can be found in Cook's paper, which uses the correspondence between Turing machines and Boolean expressions. From now on we use the fact that SAT is NPC.


Other NPC Problems

Definition: An instance of 3SAT is a Boolean expression in which each clause contains exactly three variables.

3SAT Problem: Given a Boolean expression in which each clause contains exactly three variables, determine whether it is satisfiable.

Theorem: 3SAT is NPC.

Proof: Obviously 3SAT belongs to NP, because we can verify whether an assignment is satisfying in polynomial time. We next construct a polynomial reduction that transforms a general SAT instance into 3SAT.

Let E be an arbitrary instance of SAT. We replace each clause of E by several 3-clauses. First consider a clause C = (x1 + x2 + · · · + xk) with k ≥ 4.

Let y1, . . . , yk−3 be new variables. Using these new variables, we define

C′ = (x1 + x2 + y1) · (x3 + ȳ1 + y2) · (x4 + ȳ2 + y3) · · · (xk−1 + xk + ȳk−3).
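The clause-splitting rule can be sketched in code. Literals are plain strings and a leading '~' marks a complemented literal; this representation, and the names of the fresh variables, are assumptions of the example only:

```python
def split_clause(literals, fresh_prefix="y"):
    """Turn a clause (x1 + ... + xk), k >= 4, into k-2 three-literal clauses
    chained by fresh variables y1, ..., y_{k-3}."""
    k = len(literals)
    if k <= 3:
        return [list(literals)]
    ys = [f"{fresh_prefix}{i}" for i in range(1, k - 2)]   # y1 .. y_{k-3}
    clauses = [[literals[0], literals[1], ys[0]]]          # (x1 + x2 + y1)
    for j in range(1, k - 3):                              # (x_{j+2} + ~y_j + y_{j+1})
        clauses.append([literals[j + 1], "~" + ys[j - 1], ys[j]])
    clauses.append([literals[k - 2], literals[k - 1], "~" + ys[-1]])
    return clauses

print(split_clause(["x1", "x2", "x3", "x4", "x5"]))
# [['x1', 'x2', 'y1'], ['x3', '~y1', 'y2'], ['x4', 'x5', '~y2']]
```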


Proof of NPC 3SAT

Statement: C′ is satisfiable if and only if C is satisfiable.

To prove the statement, observe that if C is satisfiable, then at least one of the xi must be 1. For instance, if xi = 1 for some i > 2, then we set y1, · · · , yi−2 to 1 and the rest to 0, which satisfies C′. If x1 or x2 is 1, then we can set all the yj to 0.

Conversely, if C′ is satisfiable, then at least one xi must be 1. Otherwise, if all the xi are 0, then

C′ = y1 · (ȳ1 + y2) · (ȳ2 + y3) · · · ȳk−3.

This expression is clearly unsatisfiable.

Other cases: If C = (x1 + x2), then

C′ = (x1 + x2 + z) · (x1 + x2 + z̄),

where z is a new variable. If C = x1, then

C′ = (x1 + y + z) · (x1 + y + z̄) · (x1 + ȳ + z) · (x1 + ȳ + z̄).

This reduction can be done in polynomial time. Therefore, SAT can be reduced to 3SAT in polynomial time. Since SAT is NPC, so is 3SAT.


3-Coloring Problem

Let G = (V, E) be an undirected graph. A valid coloring of G is an assignment of colors to the vertices such that no two adjacent vertices have the same color.

Problem: Given a graph G = (V, E), determine whether G can be colored with three colors.

Theorem: 3-coloring is NP-complete.

Proof: Obviously 3-coloring is in NP. To prove it is NP-complete, we reduce 3SAT to 3-coloring.

Let E = (x + y + z) be a clause of a 3SAT instance. We want to construct a graph G such that E is satisfiable if and only if G can be 3-colored. First we construct the main triangle, denoted by M and labelled with colors T, F, A. These color names are used only in the proof. For each variable x, we then build another triangle Mx whose vertices are A, x, x̄.


From 3-SAT to 3-coloring

[Figure: the basic (main) triangle with vertices T, F, A, together with a variable triangle A, x, x̄ for each of the variables x, y, z.]


Impose Satisfiability

We now impose the condition that at least one variable in the clause must be 1. We introduce 6 new vertices and connect them to the graph built so far. Call the three new vertices connected to T and to x, y, z the Outer vertices (O), and the other three new vertices, which form a triangle, the Inner vertices (I).

[Figure: constructing the graph — the inner triangle I1, I2, I3 and the outer vertices O1, O2, O3, each outer vertex adjacent to an inner vertex, to T, and to one of x, y, z.]


Reduce 3-SAT to 3-coloring

We claim that if this graph can be colored with no more than 3 colors, then at least one of x, y, z must be colored T, since otherwise all the outer vertices must be colored A, and then we cannot color the inner triangle.

Now consider the converse. Suppose that E is satisfiable and we want to color the graph with 3 colors. Because E is satisfiable, we can assume w.l.o.g. that x is 1. Then we can color the outer vertex connected to x with F, and the remaining outer vertices with A. Correspondingly we can color the inner triangle.

This is a polynomial reduction from 3-SAT to the 3-coloring problem. Because 3-SAT is NP-complete, so is 3-coloring.


A graph for x+y+z=1

[Figure: the complete gadget — the main triangle T, F, A, the variable triangles for x, y, z, and the outer vertices O1, O2, O3. x + y + z = 1 if and only if this graph can be colored with three colors.]


NP-Complete Clique Problem

Problem: A clique C is a subgraph of G in which all vertices are connected to each other. The clique problem is to determine, for a given G and constant k, whether G has a clique of size ≥ k.

Theorem: The clique problem is NP-complete.

Proof: Obviously the clique problem belongs to NP. It suffices to reduce SAT to the clique problem. Let E = E1 · · · Em be an arbitrary Boolean expression. We construct a graph in the following way:

1. Cast each occurrence of a variable in a clause as a vertex of the graph;
2. Add edges linking the vertices from different clauses unless they are complements of each other;
3. Vertices from the same clause are not connected.
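The three construction rules can be sketched in code. As before, literals are strings with '~' marking a complement — a representation assumed for this example; vertices are (clause index, literal) pairs:

```python
def build_clique_graph(clauses):
    """Build the graph of the SAT -> Clique reduction."""
    vertices = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]

    def complementary(a, b):
        return a == "~" + b or b == "~" + a

    edges = set()
    for i, (ci, li) in enumerate(vertices):
        for cj, lj in vertices[i + 1:]:
            # Rules 2 and 3: connect across clauses only, and never a
            # literal to its complement.
            if ci != cj and not complementary(li, lj):
                edges.add(((ci, li), (cj, lj)))
    return vertices, edges

# (x + y) · (~x + y): satisfiable, and indeed a clique of size m = 2
# exists, e.g. {(0, 'y'), (1, 'y')}.
V, E = build_clique_graph([["x", "y"], ["~x", "y"]])
print(len(V), len(E))  # 4 3
```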


A Graph for Boolean Clause

[Figure: the graph for the expression (x + y + z̄) · (x̄ + ȳ + z) · (y + z̄) — one vertex per literal occurrence; vertices in different clauses are joined unless they are complementary.]


NP-completeness of Clique

Statement: G has a clique of size ≥ m if and only if E is satisfiable.

Proof: The construction guarantees that the maximal clique size does not exceed m. Suppose E is satisfiable; then there is a truth assignment such that each clause has at least one 'true' literal. We claim all these 'true' vertices in G are connected, because the chosen vertices cannot contain a complementary pair. This means the resulting subgraph is a clique.

Conversely, assume that G contains a clique of size ≥ m. The clique must consist of m vertices from distinct 'columns' (clauses). We assign the corresponding literals the value 1, their complements 0, and the remaining variables arbitrarily. Since all the vertices in the clique are connected, and a complementary pair is never connected, this assignment is consistent.


Vertex Covering Problem

Definition: Let G = (V, E) be a graph. A vertex set S of G is called a vertex cover if each edge in G is incident to at least one of the vertices in S.

Problem: Given an undirected graph G = (V, E) and an integer k, decide whether G has a vertex cover containing ≤ k vertices.

The vertex cover problem clearly belongs to NP. Recall that Clique is NP-complete, so it suffices to reduce the clique problem to the vertex cover problem.

The idea is to construct the complement Ḡ of G on the same vertex set: every edge of G is absent from Ḡ, while every pair of vertices not adjacent in G is connected in Ḡ.

Now we can show that a clique of size k in G corresponds to a vertex cover of size n − k in Ḡ, and vice versa.
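The correspondence is easy to check mechanically: C is a clique in G exactly when V \ C covers every edge of Ḡ. A minimal sketch; the 4-vertex stand-in graph is an assumption of this example (the slide's 6-vertex graph is not given in machine-readable form):

```python
def complement(n, edges):
    """All vertex pairs of {0..n-1} that are NOT edges of the graph."""
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if (u, v) not in edges and (v, u) not in edges}

def is_vertex_cover(cover, edges):
    return all(u in cover or v in cover for u, v in edges)

n = 4
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}   # clique {0, 1, 2}, k = 3
comp = complement(n, edges)                # Ḡ has edges {(0,3), (1,3)}
print(is_vertex_cover({3}, comp))          # True: V \ clique, size n - k = 1
```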


From Clique to Vertex Cover

[Figure: a 6-vertex graph with clique (v2, v5, v6); its complement has a vertex cover (v1, v3, v4).]


NP-complete family

[Figure: the NP-complete family — SAT at the root, with 3-SAT, Clique, Set Cover, 3-Coloring, Dominating Set, Independent Set, Hamilton Cycle, TSP and Partition below it.]


Branch and Bound for NPCPs

Consider the 3-coloring problem. Note that if a vertex v is colored, then there are two ways to color its neighbor. This fits the structure of a binary tree.

We can start with any two vertices and explore all the possibilities for the remaining vertices: pick one child in the tree, and continue this process until the whole graph is colored or a 'No' answer is reported. In the latter case, we backtrack and try the other children.

Algorithm 3-coloring(G, Var U);
Input: G = (V, E), a set U of already-colored vertices (initially empty);
Output: a 3-coloring of G, if one exists.
Begin
    If U = V then G is colored; stop.
    else pick a vertex v not in U;
        for C := 1 to 3 do
            if no neighbor of v is colored with C then
                U := U + v; color v with C;
                3-coloring(G, U);
                U := U − v   { backtrack: uncolor v and try the next color }
End
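A runnable sketch of the backtracking procedure above, with the uncoloring step made explicit so the search can try other branches (the dict-of-neighbors graph encoding is an assumption of this example):

```python
def three_coloring(graph, color=None):
    """graph: dict vertex -> set of neighbors. Returns a coloring or None."""
    if color is None:
        color = {}
    if len(color) == len(graph):          # U = V: the whole graph is colored
        return dict(color)
    v = next(u for u in graph if u not in color)
    for c in (1, 2, 3):
        if all(color.get(w) != c for w in graph[v]):
            color[v] = c                  # U := U + v, color v with c
            result = three_coloring(graph, color)
            if result is not None:
                return result
            del color[v]                  # backtrack: try the next color
    return None                           # 'No' for this branch

# A 4-cycle is 3-colorable; the complete graph K4 is not.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(three_coloring(cycle) is not None)  # True
print(three_coloring(k4))                 # None
```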


Backtrack for 3-Coloring

We use the colors R(ed), G(reen) and B(lue).

[Figure: the backtracking tree for coloring a 5-vertex graph v1, …, v5 — e.g. 1B, 2G at the root, branches 3R and 3B, 'No' leaves where no color fits, and a 'Yes' leaf at 5R.]


Branch and Bound for ILP

The technique of branch and bound is frequently used in integer linear programming (ILP), where we usually want to minimize or maximize a linear objective subject to some constraints. The heuristic in ILP is to fix some variables temporarily and then solve the resulting ILP, which is usually smaller and relatively easier than the original problem. We can also use the relaxed linear program to solve the ILP: if the solution of the relaxed LP is integral, then it solves the ILP.

Example:

min x1 − 2x2,  x1, x2 ∈ {0, 1}.

We first set x1 = 0 and solve the subproblem

min −2x2,  x2 ∈ {0, 1},

which has a solution at x2 = 1 with value −2. Then we set x1 = 1 and solve

min 1 − 2x2,  x2 ∈ {0, 1}.

The minimal solution of this subproblem has value −1. Comparing the two values, we get the solution to the original problem: x1 = 0, x2 = 1.


ILP for Clique Problem

Problem: Find a clique C of maximal size in a graph G = (V, E).

We model the problem as an integer linear programming problem. Define n variables corresponding to the vertices as follows:

xi = 1 if vertex vi is in C, and xi = 0 otherwise.

Then the maximal clique problem can be formulated as

max z = x1 + x2 + · · · + xn;
xi ∈ {0, 1};
xi + xj ≤ 1 for all (vi, vj) ∉ E.


ILP for Clique

[Figure: the 6-vertex graph with clique (v2, v5, v6).]

For the above graph, the ILP model reads

max z = x1 + x2 + · · · + x6;
xi ∈ {0, 1};
x1 + x3 ≤ 1, x1 + x4 ≤ 1;
x1 + x5 ≤ 1, x1 + x6 ≤ 1;
x2 + x3 ≤ 1, x2 + x4 ≤ 1;
x3 + x5 ≤ 1, x4 + x5 ≤ 1.

The final solution is x1 = 0, x2 = 1, x3 = 0, x4 = 0, x5 = 1, x6 = 1.
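A brute-force sketch can check this model by enumerating all 0/1 assignments against the eight listed constraints. One caveat: with only the non-edges listed above included, the model admits more than one optimum of value 3 (x3 = x4 = x6 = 1 also scores 3, alongside the clique x2 = x5 = x6 = 1):

```python
from itertools import product

# non-edges of the 6-vertex example, as listed in the constraints above
non_edges = [(1, 3), (1, 4), (1, 5), (1, 6),
             (2, 3), (2, 4), (3, 5), (4, 5)]

def feasible(x):
    """x is a 0/1 tuple (x1, ..., x6); check xi + xj <= 1 per non-edge."""
    return all(x[i - 1] + x[j - 1] <= 1 for i, j in non_edges)

best = max(sum(bits) for bits in product((0, 1), repeat=6) if feasible(bits))
print(best)                           # optimal objective value z
print(feasible((0, 1, 0, 0, 1, 1)))  # the slide's solution attains it
```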


LP Relaxation for ILP

Note that in our ILP example for the clique problem, if we relax the constraints xi ∈ {0, 1} to 0 ≤ xi ≤ 1, then we get an LP problem that can be solved efficiently. By solving the relaxed problem, we may obtain a solution to the original problem!

However, this is not true for general ILPs. Nevertheless, the LP relaxation provides a useful approach to solving ILPs. For example, we can employ the backtracking technique and use the values of the easily solvable relaxed LP problems to prune some children. For instance, suppose we already have a feasible solution and thus a value z1. After fixing some variables, we solve the relaxed LP. If the resulting optimal value is worse than z1, then we can throw away the whole branch and thus avoid unnecessary work.

The worst case of this branch-and-bound algorithm is exponential. But by exploiting the special structure of the underlying problem, special heuristics can be developed, and many results have been reported.


Approximation Algorithms

Definition: An algorithm that may not find the optimal result but always gives a good feasible solution is called an approximation algorithm.

If NPC problems are so hard to solve, why not try approximation algorithms?

Definition: For a given problem, an approximation algorithm is called a ρ-approximation algorithm if it always gives a solution satisfying C/C∗ ≤ ρ (when the underlying problem is a minimization) or C∗/C ≤ ρ (when it is a maximization), where C∗ is the value of the exact optimal solution and C that of the approximate solution.

Question: If we do not know the exact solution C∗ of the problem, how can we estimate the approximation ratio ρ?


Approximate Bin Packing

Bin Packing: Let X be a set of elements xi ∈ [0, 1], i = 1, · · · , n. Partition these elements into as few subsets (bins) as possible such that the sum of the elements in each subset is less than or equal to 1.

This is a variant of the Knapsack problem.

A direct approach is to place each item into the first bin that still has room for it, opening a new bin when no previous bin does. This is called the first fit algorithm, which requires at most 2·OPT bins. The proof is easy, because first fit can never leave two bins less than half full.

A better idea is to sort the items in decreasing order first and then apply first fit; the solution improves to about 1.22·OPT. The proof can be found in the main textbook.
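Both variants can be sketched in a few lines. Item sizes are scaled to integers with bin capacity 10 (an assumption of this example, to avoid floating-point round-off):

```python
def first_fit(items, capacity=10):
    bins = []                          # remaining room in each open bin
    for x in items:
        for i, room in enumerate(bins):
            if x <= room:              # place x in the first bin with room
                bins[i] -= x
                break
        else:
            bins.append(capacity - x)  # no bin fits: open a new one
    return len(bins)

def first_fit_decreasing(items, capacity=10):
    # sort items in decreasing order, then run plain first fit
    return first_fit(sorted(items, reverse=True), capacity)

items = [5, 7, 5, 2, 4, 2, 5, 1, 6]    # total 37, so OPT >= 4
print(first_fit(items), first_fit_decreasing(items))  # 5 4
```

On this instance sorting first saves a bin: first fit uses 5 bins, first fit decreasing reaches the lower bound of 4.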


Approximate Vertex Cover

Definition: A vertex cover is a set of vertices such that each edge in G = (V, E) is incident to at least one vertex of the set.

Problem: Find a minimum vertex cover in a graph.

The problem is NPC, so we try to find an approximate solution.

We can use a maximal matching to approach it. The set of all endpoints of the edges in a maximal matching forms a vertex cover of size at most 2·OPT, where OPT is the number of vertices in the best vertex cover.

But first we need to know how to find a maximal matching in a graph.
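For the 2-approximation itself, an inclusion-maximal matching found greedily already suffices: scan the edges, keep any edge whose endpoints are both still uncovered, and output all matched endpoints. Every edge is covered, and any optimal cover must contain at least one endpoint of each matched edge, hence the factor 2. A minimal sketch:

```python
def approx_vertex_cover(edges):
    """Greedy maximal matching; return the set of all matched endpoints."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered:
            cover.update((u, v))                # take both endpoints
    return cover

# A star on 0 plus a pendant edge: the optimal cover {0, 3} has size 2.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
c = approx_vertex_cover(edges)
print(c, all(u in c or v in c for u, v in edges))
```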


Euclidean TSP

Problem: Let ci, i = 1, · · · , n, be n points in the plane. Find the Hamilton cycle of minimal total distance.

The problem is NP-hard, but since the points lie in the plane, the distances satisfy the triangle inequality.

We can start with a minimum-cost spanning tree, which can be obtained in polynomial time. The cost of the tree is less than or equal to that of the optimal cycle (note that by removing one edge from the cycle we get a spanning tree).

Using depth-first traversal, we can construct a circuit that traverses each tree edge twice, so the length of this circuit is at most twice that of the minimal TSP tour. Now we construct a cycle from this circuit: instead of backtracking, we move directly to the next new vertex. By the triangle inequality, this gives a cycle whose length is less than 2·OPT for TSP.
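The whole 2-approximation can be sketched as follows: Prim's algorithm builds the spanning tree, and the shortcut tour is just the depth-first preorder of that tree (the choice of Prim and of vertex 0 as the root are assumptions of this example):

```python
from math import dist

def approx_tsp(points):
    """MST-based 2-approximation; returns a tour as a list of indices."""
    n = len(points)
    cost, link = [float("inf")] * n, [0] * n
    cost[0] = 0.0
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    for _ in range(n):                       # Prim's algorithm, O(n^2)
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: cost[i])
        in_tree[v] = True
        if v != 0:
            children[link[v]].append(v)      # record the tree edge link[v]-v
        for u in range(n):
            d = dist(points[v], points[u])
            if not in_tree[u] and d < cost[u]:
                cost[u], link[u] = d, v
    tour, stack = [], [0]                    # DFS preorder = shortcut circuit
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

points = [(0, 0), (0, 1), (1, 1), (1, 0)]    # unit square, optimal tour 4.0
tour = approx_tsp(points)
length = sum(dist(points[tour[i]], points[tour[(i + 1) % 4]]) for i in range(4))
print(tour, length)
```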


TSP tour VS Spanning Tree

[Figure: a spanning tree, and a TSP tour expanded from the spanning tree.]


Further improvement

We can use the idea of an Eulerian circuit to construct a better TSP tour. This can be done by adding edges that pair up all nodes of odd degree. Note that the number of odd-degree nodes is always even! (Why?)

Suppose in total there are 2k odd-degree nodes; then k edges suffice to pair them up, and we want the pairing whose total distance is minimal. This gives rise to a minimum-distance matching problem, which can be solved in O(n³) time.

We can prove that the total distance of the minimum matching is less than half the length of the optimal TSP tour. This can be shown as follows: the tour itself induces two disjoint matchings that together link all the odd-degree nodes, with total length less than that of the TSP tour; hence the cheaper of the two, and so also the minimum matching, is shorter than half the tour.


Eulerian Circuit to TSP

[Figure: a spanning tree plus a matching; the TSP tour derived from the Eulerian circuit; two different matchings linking all the odd-degree nodes together.]


Matching Problem in Graph

Definition: Let G = (V, E) be a graph.
1. A matching is a set of edges no two of which have a vertex in common;
2. A perfect matching is a matching in which all the vertices are matched;
3. A matching is called a maximal matching if it has maximal cardinality.

Sometimes, finding a perfect matching in a graph is impossible. However, if the graph is very dense, for example |V| = 2n and the degree of each vertex in the graph is greater than n, then we can use induction and a greedy algorithm to find a perfect matching in such a dense graph.

The algorithm proceeds as follows. First take any edge of the graph, and remove its two endpoints together with all edges incident to them. This leaves a smaller graph with |V1| = 2n − 2. Since the original graph is very dense, so is the reduced graph.


Maximal matching

Theorem: A matching is maximal if and only if it has no augmenting path.

Finding a maximal matching:
• Start with M = ∅;
• Find an augmenting path P relative to M and replace M by M ⊕ P (the symmetric difference);
• Repeat the process until no augmenting path exists.

Idea: From two matchings M, N we can obtain an augmenting path for M or N. Consider the graph G′ = (V, M ⊕ N):
• Each vertex is an endpoint of at most one edge from M and one edge from N;
• Each connected component of G′ forms a path (or cycle) with edges alternating between M and N;
• Each component that is a path with more edges from one matching than the other forms an augmenting path for the smaller of M and N.


Finding augmenting path

• Level 0: Collect all unmatched vertices of V;
• At odd level i: add new vertices that are adjacent to a vertex at level i − 1 by a non-matching edge (the edge is added as well);
• At even level i: add new vertices that are adjacent to a vertex at level i − 1 by an edge of the matching M, together with that edge;
• Continue the process until an unmatched vertex is added at an odd level, or no more vertices can be added;
• On the alternating path found, remove all the edges that are in the original matching and keep the rest; this augments the matching.


Augmenting path: 1

[Figure: a graph on vertices v1, …, v8 with a partial matching. Unmatched vertices: v5, v8. Add v5 and the edge (v5, v4).]


Augmenting path: 2

[Figure: add v3 and the edge (v4, v3); then add v6 and the edge (v3, v6).]


Augmenting path: 3

[Figure: add v7 and the edge (v6, v7); then add v8 and the edge (v7, v8).]


Augmenting path: 4

[Figure: the final maximal matching, obtained by removing the edges of the original matching along the augmenting path.]


Computing with DNA

So far we have been working with digital computers. How about DNA-based computers?

In 1994, Len Adleman (a computer scientist) showed that an NP-complete problem can be solved using DNA! This is impossible in the classical way. Adleman's work is based on biochemical processes that work on huge numbers of molecules in parallel.

The problem Adleman tackled is the Hamilton path problem (HP) in a directed graph G = (V, E) with designated start v0 and end vn vertices. The problem is to decide whether there is a path from v0 to vn, with n = |V|, that passes through all the other vertices in G exactly once. This is an NP-hard problem.

Let w0, w1, · · · , wq be any path in G. We can check whether it is a Hamilton path by determining whether it satisfies the following properties:
1: w0 = v0, wq = vn;
2: q = n;
3: every vertex in V appears exactly once in the path.
This check can be done in polynomial time, but the problem is that there are too many possible paths...


Background knowledge on DNA

Using the DNA model, we can perform the following process:

• Generate DNA strands to represent paths in G;
• Use biochemical processes to extract strands satisfying properties 1–3, and discard all others:
  a: extract strands that start at v0 and end at vn;
  b: extract strands that include n vertices;
  c: extract strands that contain every vertex.
• Any strand that remains represents a Hamilton path. If none remains, then there is no HP in G.

DNA is deoxyribonucleic acid, the genetic material that encodes the characteristics of living things. It consists of strings of chemicals called nucleotides, denoted by adenine (A), cytosine (C), guanine (G) and thymine (T). Thus we can encode any information using this four-letter alphabet, as opposed to binary 0-1 coding.


Background knowledge on DNA

Two Nobel laureates, J. Watson and F. Crick, found the double-helix structure of DNA: A and T are complements, and C and G are complements. Two strands of nucleotides will attach to each other if they have complementary components in corresponding positions. (It is also possible that DNA strands attach to each other without the complementary elements.)

We associate a string Ri = di,1 di,2 · · · di,20 of 20 letters from the alphabet A, C, G, T with each vertex vi of the graph G. The recipe for generating DNA strands uses two ingredients, for edges and for vertices. For each edge vivj ≠ v0vn, make a strand Si,j of 20 letters, where the first half is the last half di,11 · · · di,20 of Ri, and the second half is the first half dj,1 · · · dj,10 of Rj. For edges starting from v0 or ending at vn we use all of R0 or Rn, plus the corresponding half of the other vertex; for these edges, we get 30 letters.


DNA model

A large number of the edge strands, about 10^14 copies of each, are synthesized for the graph and put into a 'pot'. For each Ri (except R0 and Rn), create its complement R̄i and add a large number of copies. For a path S4,5, S5,2, S2,1, the construction guarantees it contains the substring R5R2, so we can attach the complement R̄5R̄2 to it. In this way all the paths in the graph are formed.

We can then select the strands with the correct start and end. A DNA molecule representing a Hamilton path has a complete copy of Ri for each vertex vi on the path, so we can extract DNA strands of length n × 20 by a biochemical process. For each vi (except v0 and vn), we mix in copies of R̄i, extract the strands to which they attach, and discard the others. The R̄i molecules are then separated from the strands and removed. The remaining strands represent paths that pass through vi.

When this process has been carried out for all vertices, any remaining strand is the desired Hamilton path.


Comments on DNA model

Theoretically, all these steps can be done in time linear in the problem size. But the cost also depends on the volume of material involved in the biochemical process. How fast does this volume grow with the size of the underlying problem?

It is also possible that errors occur in the biochemical process; then we will not get the exact solution. Like probabilistic methods: fast, but with no guarantee of correctness!

Extensive research in this direction is ongoing...
