Final Exam Review
The final exam will have a similar format and the same requirements as the mid-term exam:
• Closed book, no computer, no smartphone
• Calculator is OK
Final exam questions are drawn from:
• Questions in Homework 2 and Programming Assignment 2
• Content listed in the following slides

String Similarity
How similar are two strings? Example: ocurrance vs. occurrence.

  o c u r r a n c e -
  o c c u r r e n c e
  6 mismatches, 1 gap

  o c - u r r a n c e
  o c c u r r e n c e
  1 mismatch, 1 gap

  o c - u r r - a n c e
  o c c u r r e - n c e
  0 mismatches, 3 gaps
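
The mismatch and gap counts for the three alignments above can be checked mechanically. The sketch below is only an illustration: the function name `score_alignment` and the use of `-` as the gap symbol are assumptions, not part of the slides.

```python
def score_alignment(top, bottom):
    """Count (mismatches, gaps) for two aligned strings of equal length.

    A '-' in either row marks a gap; any other differing pair is a mismatch.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a != b:
            mismatches += 1
    return mismatches, gaps


print(score_alignment("ocurrance-", "occurrence"))    # (6, 1)
print(score_alignment("oc-urrance", "occurrence"))    # (1, 1)
print(score_alignment("oc-urr-ance", "occurre-nce"))  # (0, 3)
```

Which alignment is "best" depends on how gaps and mismatches are penalized, which is what the edit distance definition makes precise.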

Edit Distance
Applications.
• Basis for Unix diff.
• Speech recognition.
• Computational biology.
Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970]
• Gap penalty $\delta$; mismatch penalty $\alpha_{pq}$.
• Cost = sum of gap and mismatch penalties.

  C T G A C C T A C C T
  C C T G A C T A C A T
  cost = $\alpha_{TC} + \alpha_{GT} + \alpha_{AG} + 2\alpha_{CA}$

  - C T G A C C T A C C T
  C C T G A C - T A C A T
  cost = $2\delta + \alpha_{CA}$

Sequence Alignment
Goal: Given two strings X = x1 x2 . . . xm and Y = y1 y2 . . . yn, find an alignment of minimum cost.
Def. An alignment M is a set of ordered pairs xi-yj such that each item occurs in at most one pair and there are no crossings.
Def. The pairs xi-yj and xi'-yj' cross if i < i' but j > j'.
Ex: CTACCG vs. TACATG.
Sol: M = x2-y1, x3-y2, x4-y3, x5-y4, x6-y6.

  x1 x2 x3 x4 x5    x6
  C  T  A  C  C  -  G
  -  T  A  C  A  T  G
     y1 y2 y3 y4 y5 y6
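
A minimum-cost alignment can be computed with the standard dynamic program over prefixes of X and Y. The sketch below is a minimal illustration, not the course's reference solution: the names `min_alignment_cost` and `unit_mismatch` are hypothetical, and the penalties (gap penalty 1, mismatch penalty 1) are assumed only for the example.

```python
def min_alignment_cost(x, y, delta, alpha):
    """Minimum alignment cost of strings x and y.

    delta: gap penalty; alpha(p, q): mismatch penalty (assumed 0 when p == q).
    opt[i][j] is the minimum cost of aligning x[:i] with y[:j].
    """
    m, n = len(x), len(y)
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        opt[i][0] = i * delta          # x[:i] aligned against nothing: i gaps
    for j in range(1, n + 1):
        opt[0][j] = j * delta          # y[:j] aligned against nothing: j gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            opt[i][j] = min(
                alpha(x[i - 1], y[j - 1]) + opt[i - 1][j - 1],  # pair x_i with y_j
                delta + opt[i - 1][j],                          # leave x_i unmatched
                delta + opt[i][j - 1],                          # leave y_j unmatched
            )
    return opt[m][n]


def unit_mismatch(p, q):
    """Assumed penalty for illustration: 0 for a match, 1 for a mismatch."""
    return 0 if p == q else 1


print(min_alignment_cost("CTACCG", "TACATG", 1, unit_mismatch))  # 3
```

With these assumed unit penalties, the alignment M in the example (two unmatched characters and one mismatch, C against A) has cost 3, which matches the optimum the table finds.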

Cuts
Def. An s-t cut is a partition (A, B) of V with $s \in A$ and $t \in B$.
Def. The capacity of a cut (A, B) is: $\mathrm{cap}(A, B) = \sum_{e \text{ out of } A} c(e)$
[Figure: flow network on nodes s, 2, 3, 4, 5, 6, 7, t with edge capacities; the cut set A is marked.]
Capacity = 10 + 5 + 15 = 30

Cuts
[Figure: the same flow network with a different cut set A marked.]
Capacity = 9 + 15 + 8 + 30 = 62
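
The capacity of a cut is just the sum of capacities on edges leaving A, which is easy to check in code. The figure's edges are not recoverable from this transcript, so the edge list below is an assumption: a hypothetical network that uses the capacities listed on the slide and reproduces both cut capacities (30 and 62). The name `cut_capacity` and the choice of cut sets are likewise illustrative.

```python
# Hypothetical edge list (u, v, capacity); an assumed reconstruction of the figure.
EDGES = [
    ("s", "2", 10), ("s", "3", 5), ("s", "4", 15),
    ("2", "3", 4), ("2", "5", 9), ("2", "6", 15),
    ("3", "4", 4), ("3", "6", 8), ("4", "7", 30),
    ("5", "6", 15), ("5", "t", 10), ("6", "7", 15),
    ("6", "t", 10), ("7", "3", 6), ("7", "t", 10),
]


def cut_capacity(edges, A):
    """Sum c(e) over edges e = (u, v) that leave A (u in A, v not in A)."""
    return sum(c for u, v, c in edges if u in A and v not in A)


print(cut_capacity(EDGES, {"s"}))                  # 30
print(cut_capacity(EDGES, {"s", "2", "3", "4"}))   # 62
```

Note that edges entering A (such as the assumed edge from 7 to 3 in the second cut) do not count toward the capacity.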

Residual Graph
Original edge: $e = (u, v) \in E$. Flow $f(e)$, capacity $c(e)$.
[Figure: edge u → v with capacity 17 and flow 6.]
Residual edge. "Undo" flow sent. $e = (u, v)$ and $e^R = (v, u)$. Residual capacity:
$$c_f(e) = \begin{cases} c(e) - f(e) & \text{if } e \in E \\ f(e) & \text{if } e^R \in E \end{cases}$$
[Figure: residual edges u → v with residual capacity 11 and v → u with residual capacity 6.]
Residual graph: $G_f = (V, E_f)$. Residual edges with positive residual capacity. $E_f = \{e : f(e) < c(e)\} \cup \{e^R : f(e) > 0\}$.
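
As a small check of the definition, the sketch below builds the residual edges for the single edge in the figure (capacity 17, flow 6). The function name `residual_edges` and the dictionary representation are assumptions made for illustration.

```python
def residual_edges(flow, capacity):
    """Return {(u, v): residual capacity}, keeping only positive entries.

    capacity: {(u, v): c(e)}; flow: {(u, v): f(e)} (missing edges carry 0 flow).
    """
    residual = {}
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        if f < c:   # forward residual edge with capacity c(e) - f(e)
            residual[(u, v)] = residual.get((u, v), 0) + (c - f)
        if f > 0:   # reverse residual edge with capacity f(e), to "undo" flow
            residual[(v, u)] = residual.get((v, u), 0) + f
    return residual


print(residual_edges({("u", "v"): 6}, {("u", "v"): 17}))
# {('u', 'v'): 11, ('v', 'u'): 6}
```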

Ford-Fulkerson Algorithm
[Figure: flow network G on nodes s, 2, 3, 4, 5, t, labeled with edge capacities.]

Augmenting Path Algorithm
Augment(f, c, P) {
   b ← bottleneck(P)
   foreach e ∈ P {
      if (e ∈ E) f(e) ← f(e) + b          (forward edge)
      else       f(eR) ← f(eR) - b        (reverse edge)
   }
   return f
}

Ford-Fulkerson(G, s, t, c) {
   foreach e ∈ E
      f(e) ← 0
   Gf ← residual graph

   while (there exists augmenting path P) {
      f ← Augment(f, c, P)
      update Gf
   }
   return f
}
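
The pseudocode above can be turned into a short runnable sketch. This version keeps only residual capacities (pushing b units along a path lowers the forward residual and raises the reverse residual) and finds augmenting paths with depth-first search, which is one possible choice the slides do not prescribe. The edge list is an assumption: a hypothetical network on s, 2, 3, 4, 5, t that uses the capacities shown in the figure. Integer capacities are assumed so the loop terminates.

```python
def ford_fulkerson(capacity, s, t):
    """Return the max-flow value from s to t. capacity: {(u, v): c(e)}."""
    # residual[u][v] = current residual capacity of edge (u, v)
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)

    def find_augmenting_path():
        """DFS in the residual graph; returns a list of edges or None."""
        stack, parent = [s], {s: None}
        while stack:
            u = stack.pop()
            if u == t:
                path = []
                while parent[u] is not None:
                    path.append((parent[u], u))
                    u = parent[u]
                return path[::-1]
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    stack.append(v)
        return None

    value = 0
    path = find_augmenting_path()
    while path is not None:
        b = min(residual[u][v] for u, v in path)   # bottleneck(P)
        for u, v in path:
            residual[u][v] -= b                    # use up forward capacity
            residual[v][u] += b                    # allow the flow to be undone
        value += b
        path = find_augmenting_path()
    return value


# Hypothetical network (assumed edges, capacities taken from the figure).
G = {
    ("s", "2"): 10, ("s", "4"): 10, ("2", "3"): 4, ("2", "4"): 2,
    ("2", "5"): 8, ("3", "t"): 10, ("4", "5"): 9, ("5", "3"): 6,
    ("5", "t"): 10,
}
print(ford_fulkerson(G, "s", "t"))  # 19
```

For this assumed network the cut A = {s, 4} has capacity 10 + 9 = 19, so the returned flow value equals a cut capacity, as the max-flow min-cut theorem requires.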

Certifiers and Certificates: 3-Satisfiability (3-SAT)
SAT. Given a CNF formula $\Phi$, is there a satisfying assignment?
Certificate. An assignment of truth values to the n Boolean variables.
Certifier. Check that each clause in $\Phi$ has at least one true literal.
Ex. [Example: a formula (instance s) and a satisfying assignment (certificate t).]
Conclusion. SAT is in NP.
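
The certifier described above runs in time linear in the size of the formula. A minimal sketch follows; the encoding of a literal as a `(variable, is_positive)` pair, the name `certifier`, and the two-clause formula are all assumptions for illustration, not the example from the slide.

```python
def certifier(clauses, assignment):
    """Return True iff every clause has at least one true literal.

    clauses: list of clauses, each a list of (variable, is_positive) literals.
    assignment: {variable: bool}, the proposed certificate.
    """
    return all(
        any(assignment[var] == is_positive for var, is_positive in clause)
        for clause in clauses
    )


# Hypothetical instance: (not x1 or x2 or x3) and (x1 or not x2 or x3)
phi = [
    [("x1", False), ("x2", True), ("x3", True)],
    [("x1", True), ("x2", False), ("x3", True)],
]
print(certifier(phi, {"x1": True, "x2": True, "x3": False}))   # True
print(certifier(phi, {"x1": True, "x2": False, "x3": False}))  # False
```

A `False` result only rejects that particular certificate; it says nothing about whether the formula is satisfiable.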

Subset Sum
SUBSET-SUM. Given natural numbers w1, …, wn and an integer W, is there a subset that adds up to exactly W?
Remark. With arithmetic problems, input integers are encoded in binary. A polynomial reduction must be polynomial in the size of the binary encoding.
Claim. 3-SAT $\le_P$ SUBSET-SUM.
Pf. Given an instance $\Phi$ of 3-SAT, we construct an instance of SUBSET-SUM that has a solution iff $\Phi$ is satisfiable.

Subset Sum
Construction. Given a 3-SAT instance $\Phi$ with n variables and k clauses, form 2n + 2k decimal integers, each of n + k digits, as illustrated below.
Claim. $\Phi$ is satisfiable iff there exists a subset that sums to W.
Pf. No carries possible.

         x  y  z  C1 C2 C3     value
  x      1  0  0  0  1  0    100,010
  ¬x     1  0  0  1  0  1    100,101
  y      0  1  0  1  0  0     10,100
  ¬y     0  1  0  0  1  1     10,011
  z      0  0  1  1  1  0      1,110
  ¬z     0  0  1  0  0  1      1,001
  dummy  0  0  0  1  0  0        100
  dummy  0  0  0  2  0  0        200
  dummy  0  0  0  0  1  0         10
  dummy  0  0  0  0  2  0         20
  dummy  0  0  0  0  0  1          1
  dummy  0  0  0  0  0  2          2
  W      1  1  1  4  4  4    111,444

The dummy rows let each clause column sum to 4.
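
The construction can be sketched in a few lines and checked by brute force on a tiny instance. The clauses themselves are not in this transcript, so the three clauses below are an assumption, chosen because they reproduce the digits in the table; the function names and the literal encoding `(variable, is_positive)` are also hypothetical. The brute-force search is exponential and is here only to confirm the claim on this small example.

```python
from itertools import combinations


def sat_to_subset_sum(variables, clauses):
    """Build (numbers, W): 2n + 2k integers of n + k decimal digits each."""
    n, k = len(variables), len(clauses)

    def to_int(digits):
        return int("".join(str(d) for d in digits))

    numbers = []
    for i, var in enumerate(variables):
        for is_positive in (True, False):        # one number per literal
            digits = [0] * (n + k)
            digits[i] = 1                        # variable column
            for j, clause in enumerate(clauses):
                if (var, is_positive) in clause:
                    digits[n + j] = 1            # clause column
            numbers.append(to_int(digits))
    for j in range(k):
        for dummy in (1, 2):                     # two dummies per clause
            digits = [0] * (n + k)
            digits[n + j] = dummy
            numbers.append(to_int(digits))
    return numbers, to_int([1] * n + [4] * k)


def has_subset_sum(numbers, W):
    """Exhaustive check over all subsets (illustration only)."""
    return any(
        sum(subset) == W
        for r in range(len(numbers) + 1)
        for subset in combinations(numbers, r)
    )


# Assumed clauses: C1 = (not x or y or z), C2 = (x or not y or z),
# C3 = (not x or not y or not z).
clauses = [
    [("x", False), ("y", True), ("z", True)],
    [("x", True), ("y", False), ("z", True)],
    [("x", False), ("y", False), ("z", False)],
]
numbers, W = sat_to_subset_sum(["x", "y", "z"], clauses)
print(numbers, W)
print(has_subset_sum(numbers, W))  # True
```

For instance, x = true, y = true, z = false satisfies the assumed formula, and the corresponding numbers 100,010 + 10,100 + 1,001 plus all six dummies sum to 111,444.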

Weighted Vertex Cover
Definition. Given a graph G = (V, E), a vertex cover is a set $S \subseteq V$ such that each edge in E has at least one end in S.
Weighted vertex cover. Given a graph G with vertex weights, find a vertex cover of minimum weight. This problem is NP-hard; when every node has weight 1, it reduces to the standard (unweighted) vertex cover problem.
[Figure: a graph with vertex weights 2, 2, 4, 9, shown twice with two different vertex covers: weight = 2 + 2 + 4 and weight = 11.]

Pricing Method
Pricing method. Set prices and find vertex cover simultaneously.
Weighted-Vertex-Cover-Approx(G, w) {
   foreach e in E
      pe = 0

   while (∃ edge e = (i, j) such that neither i nor j is tight) {
      select such an edge e
      increase pe as much as possible until i or j is tight
   }

   S ← set of all tight nodes
   return S
}
Why is S a vertex cover? (Prove by contradiction.)
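
A runnable sketch of the pseudocode above follows. It uses the standard meaning of "tight" (the prices of the edges incident to node i sum to exactly its weight) and visits the edges once in list order, which is one valid way to realize the while loop: an edge with an already tight endpoint gets a price increase of 0. The function name, the four-node graph, and its weights are hypothetical, and integer weights are assumed so the equality test is exact.

```python
def pricing_vertex_cover(weights, edges):
    """Pricing method. weights: {node: w_i}; edges: list of (i, j).

    Returns (S, price): the set of tight nodes and the price of each edge.
    """
    paid = {v: 0 for v in weights}   # total price charged to each node so far
    price = {}
    for i, j in edges:
        # Raise p_e as far as fairness allows; this is 0 if i or j is tight.
        raise_by = min(weights[i] - paid[i], weights[j] - paid[j])
        price[(i, j)] = raise_by
        paid[i] += raise_by
        paid[j] += raise_by
    S = {v for v in weights if paid[v] == weights[v]}   # tight nodes
    return S, price


# Hypothetical example graph (not the figure from the slides).
weights = {"a": 4, "b": 3, "c": 5, "d": 3}
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
S, price = pricing_vertex_cover(weights, edges)
print(sorted(S), sum(weights[v] for v in S))  # ['a', 'b', 'd'] 10
print(sum(price.values()))                    # 7
```

In this assumed example the cover found has weight 10, while {a, d} is a cover of weight 7, so the output is not optimal; it is still within a factor of 2 of the total price paid, which is the guarantee usually proved for this method.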
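
Approximation method: Pricing Method
Pricing method. Each edge must be covered by some vertex. Edge e = (i, j) pays price $p_e \ge 0$ to use vertices i and j.
Fairness. Edges incident to vertex i should pay at most $w_i$ in total: for each vertex i, $\sum_{e = (i, j)} p_e \le w_i$.
[Figure: example graph with vertex weights 2, 2, 4, 9.]
Lemma. For any vertex cover S and any fair prices $p_e$: $\sum_e p_e \le w(S)$.
Pf. $\sum_{e \in E} p_e \le \sum_{i \in S} \sum_{e = (i, j)} p_e \le \sum_{i \in S} w_i = w(S)$. ▪
The first inequality holds because each edge e is covered by at least one node in S; the second follows by summing the fairness inequalities for each node in S.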
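
Pricing Method
[Figure 11.8: example run of the pricing method; node labels give the vertex weight and edge labels give the price, e.g. the price of edge a-b.]
The example shows that the pricing method does not always produce an optimal weighted vertex cover.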

Weighted Vertex Cover: IP Formulation
Weighted vertex cover. Given an undirected graph G = (V, E) with vertex weights $w_i \ge 0$, find a minimum-weight subset of nodes S such that every edge is incident to at least one vertex in S.
Integer programming formulation.
• Model inclusion of each vertex i using a 0/1 variable $x_i$.
• Vertex covers are in 1-1 correspondence with 0/1 assignments: $S = \{ i \in V : x_i = 1 \}$.
• Objective function: minimize $\sum_i w_i x_i$.
• Constraints: must take either i or j for each edge (i, j): $x_i + x_j \ge 1$.
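
To make the formulation concrete, the sketch below enumerates every 0/1 assignment, keeps those satisfying $x_i + x_j \ge 1$ on each edge, and returns one of minimum weight. This is exhaustive search, not an integer programming solver, and takes exponential time; it is meant only to show how the variables, constraints, and objective fit together. The function name and the four-node example graph are hypothetical.

```python
from itertools import product


def min_weight_vertex_cover(weights, edges):
    """Solve the 0/1 program by enumeration. Returns (cost, cover set)."""
    nodes = list(weights)
    best_cost, best_cover = None, None
    for bits in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, bits))                       # x_i in {0, 1}
        if all(x[i] + x[j] >= 1 for i, j in edges):      # edge constraints
            cost = sum(weights[v] * x[v] for v in nodes)  # objective
            if best_cost is None or cost < best_cost:
                best_cost = cost
                best_cover = {v for v in nodes if x[v] == 1}
    return best_cost, best_cover


# Hypothetical example graph (assumed weights and edges).
weights = {"a": 4, "b": 3, "c": 5, "d": 3}
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
print(min_weight_vertex_cover(weights, edges))
```

On this assumed graph the minimum is cost 7 with cover {a, d}: any cover without a must contain b, c, and d (weight 11), and any cover with a still needs c or d.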