Introduction to Algorithm
• Instructor: Dr. Bin Fu
• Office: ENGR 3.280 (Third Floor)
• Email: [email protected]
• Textbook: Introduction to Algorithms (Second Edition) by Cormen, Leiserson, Rivest and Stein
Dec 27, 2015
Why study algorithms and performance?
• Performance often draws the line between what
is feasible and what is impossible.
• Algorithmic mathematics provides a language
for talking about program behavior (e.g., by using big-O notation).
• In real life, many algorithms, though different
from each other, fall into one of several
paradigms (discussed shortly). These
paradigms can be studied.
Why these particular algorithms?
In this course, we will discuss problems, and algorithms for solving these problems.
Why these algorithms (cont.)
1. Main paradigms:
a) Greedy algorithms
b) Divide-and-Conquer
c) Dynamic programming
d) Branch-and-Bound (mostly in AI)
e) Etc.
2. Other reasons:
a) Relevance to many areas:
• E.g., networking, internet, search engines…
Topics
• Recursive Equations
• Divide and Conquer Method
• Dynamic Programming Method
• Basic and Advanced Data Structures
• Graph Algorithms
• Approximation Algorithms
• NP-Complete Theory
• Randomized Algorithms
Grade
• 4-5 assignments (35%)
• Midterm (20%)
• Final (25%)
• Exercises and Attendance for the class (20%)
Advantages of being a good algorithm designer
• It helps you develop efficient software
• It makes it easier to switch from one area of computer science to another
2.2 Analyzing Algorithms
RAM (random-access machine) model:
• Each memory access takes one step.
• Instructions are executed one at a time, sequentially.
Running time: the total number of steps, expressed as a function of the input size.
The problem of sorting
• Input: sequence ⟨a1, a2, …, an⟩ of numbers.
• Output: permutation ⟨a'1, a'2, …, a'n⟩ of the input numbers such that a'1 ≤ a'2 ≤ … ≤ a'n.
Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
Running time
The running time depends on the input:
• Parameterize the running time by the
size of the input n
• Seek upper bounds on the running time
T(n) for the input size n, because everybody likes a guarantee.
Kinds of analyses
Worst-case: (usually)
• T(n) = maximum time of algorithm on any input of size n.
Average-case: (sometimes)
• T(n) = expected time of algorithm over all inputs of size n.
• Need assumption of statistical distribution of inputs.
Best-case: (bogus)
• Cheat with a slow algorithm that works fast on some input.
Machine-independent time
What is insertion sort's worst-case time?
• It depends on the speed of our computer:
  • relative speed (on the same machine),
  • absolute speed (on different machines).
BIG IDEA:
• Ignore machine-dependent constants.
• Look at growth of T(n) as n → ∞.
• "Asymptotic Analysis"
Bubble Sort Algorithm
1. Compare 1st two elements and exchange them if they are out of order.
2. Move down one element and compare 2nd and 3rd
elements. Exchange if necessary. Continue until end of array.
3. Pass through array again, repeating process and exchanging as necessary.
4. Repeat until a pass is made with no exchanges.
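The four steps above can be sketched in C as follows. This is a minimal illustration, not code from the course: the function name bubble_sort and the returned pass count are assumptions added so the behavior is easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sort a[0..n-1] into increasing order with bubble sort.
   Returns the number of passes made over the array
   (the return value is an illustrative extra, not part of the slides). */
int bubble_sort(int a[], size_t n)
{
    assert(a != NULL || n == 0);
    int passes = 0;
    bool exchanged = true;
    while (exchanged) {                /* step 4: stop after a pass with no exchanges */
        exchanged = false;
        for (size_t i = 0; i + 1 < n; i++) {   /* steps 1-2: walk down the array */
            if (a[i] > a[i + 1]) {     /* neighbours out of order: exchange them */
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
                exchanged = true;
            }
        }
        passes++;                      /* step 3: pass through the array again */
    }
    return passes;
}
```

Each pass costs at most n - 1 comparisons, and the last pass only confirms that nothing is left to exchange.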
Bubble Sort Example
Array numlist3 contains 17 23 5 11
• Compare values 17 and 23. In correct order, so no exchange.
• Compare values 23 and 5. Not in correct order, so exchange them: 17 5 23 11
• Compare values 23 and 11. Not in correct order, so exchange them: 17 5 11 23
Bubble Sort Example (continued)
After first pass, array numlist3 contains 17 5 11 23
• Compare values 17 and 5. Not in correct order, so exchange them: 5 17 11 23
• Compare values 17 and 11. Not in correct order, so exchange them: 5 11 17 23
• Compare values 17 and 23. In correct order, so no exchange. (23 is in order from the previous pass.)
Bubble Sort Example (continued)
After second pass, array numlist3 contains 5 11 17 23
• Compare values 5 and 11. In correct order, so no exchange.
• Compare values 11 and 17. In correct order, so no exchange.
• Compare values 17 and 23. In correct order, so no exchange. (17 and 23 are in order from the previous passes.)
No exchanges, so array is in order.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Sorting Problem
• Given a series of integers
7, 2, 5, 3, 6, 9, 8, 1
• Arrange them in increasing order:
1 < 2 < 3 < 5 < 6 < 7 < 8 < 9
Merge for Sorting
• Convert the sorting of 7, 2, 5, 3, 6, 9, 8, 1 into the sorting of 7, 2, 5, 3 and of 6, 9, 8, 1
• Sort 7, 2, 5, 3 into 2 < 3 < 5 < 7
• Sort 6, 9, 8, 1 into 1 < 6 < 8 < 9
• Merge 2 < 3 < 5 < 7 and 1 < 6 < 8 < 9:

First list     Second list    Merged so far
2<3<5<7        1<6<8<9        (empty)
2<3<5<7        6<8<9          1
3<5<7          6<8<9          1<2
5<7            6<8<9          1<2<3
7              6<8<9          1<2<3<5
7              8<9            1<2<3<5<6
(empty)        8<9            1<2<3<5<6<7
(empty)        (empty)        1<2<3<5<6<7<8<9
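The split, sort, and merge steps above can be sketched in C as follows. This is a minimal illustration under stated assumptions: the names merge and merge_sort, the half-open index ranges, and the caller-supplied scratch buffer tmp are choices made here, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid-1] and a[mid..hi-1] using scratch buffer tmp. */
static void merge(int a[], int tmp[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)                 /* repeatedly take the smaller front element */
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) tmp[k++] = a[i++];        /* copy what is left of the first run */
    while (j < hi)  tmp[k++] = a[j++];        /* copy what is left of the second run */
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof a[0]);
}

/* Sort a[lo..hi-1]; tmp must have room for at least hi elements. */
void merge_sort(int a[], int tmp[], size_t lo, size_t hi)
{
    assert(lo <= hi);
    if (hi - lo < 2) return;                  /* zero or one element: already sorted */
    size_t mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);              /* sort the first half */
    merge_sort(a, tmp, mid, hi);              /* sort the second half */
    merge(a, tmp, lo, mid, hi);               /* combine the two sorted halves */
}
```

Merging two runs with n elements in total takes at most n - 1 comparisons, which is the "+ n" term in the recurrence for the running time.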
Time analysis
Problem P is split into two subproblems P1 and P2, each of half the size.
T(n): the number of steps to sort n elements
T(1) = 1
T(n) = 2T(n/2) + n for n > 1
Time analysis
T(n) = 2T(n/2) + n
= 2(2T(n/4) + n/2) + n = 4T(n/4) + n + n
= 4(2T(n/8) + n/4) + n + n = 8T(n/8) + n + n + n
= …
= $2^k T(n/2^k) + kn$
= O(n log n)

Layer   Problems    #Nodes    Merge time per node    Cost of the layer
0       P           1         n                      n
1       P1, P2      2         n/2                    n
…       …           …         …                      …
k                   $2^k$     $n/2^k$                $2^k \cdot n/2^k = n$

Every layer costs n steps.
Total #layers is log n. Total time is O(n log n).
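As a sanity check on the analysis, the recurrence T(1) = 1, T(n) = 2T(n/2) + n can be evaluated directly and compared with the closed form n log n + n that unrolling gives when n is a power of two. The sketch below is illustrative only; the function names merge_sort_steps and closed_form are assumptions.

```c
#include <assert.h>

/* Evaluate T(1) = 1, T(n) = 2T(n/2) + n for n a power of two. */
unsigned long merge_sort_steps(unsigned long n)
{
    assert(n >= 1 && (n & (n - 1)) == 0);   /* n must be a power of two */
    if (n == 1) return 1;
    return 2 * merge_sort_steps(n / 2) + n;
}

/* Closed form n*log2(n) + n: unrolling k = log2(n) times gives 2^k T(1) + k n. */
unsigned long closed_form(unsigned long n)
{
    unsigned long k = 0;
    for (unsigned long m = n; m > 1; m /= 2)
        k++;                                 /* k = log2(n) */
    return n * k + n;
}
```

For example, T(8) = 2T(4) + 8 = 2(12) + 8 = 32, and the closed form gives 8·3 + 8 = 32.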
Exponentiation Problem
• Compute $a^{16}$
• Compute $a^n$
Polynomial Problem
• Compute the degree-7 polynomial
158273 234567 xxxxxxx
• Compute a general polynomial:
$a_n x^n + a_{n-1} x^{n-1} + \dots + a_2 x^2 + a_1 x + a_0$
Exercise
Draw the tree for merge sorting
15, 11, 4, 22, 31, 55, 71, 12, 7, 2, 5, 3, 6, 9, 8, 1
State the number of comparisons that you use.
Chapter 3-4
Growth of Functions and Recursion Equations
O-notation: f(n) = O(g(n)); g(n) is an asymptotic upper bound for f(n).
O(g(n)) = {f(n) | there are positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0}
Example: $3n^2 - 6n = O(n^2)$.
O-notation
• Drop low-order terms; ignore leading constants.
• Example: $3n^3 + 90n^2 - 5n + 6046 = O(n^3)$
• We say that T(n) = O(g(n)) iff there exist positive constants $c_1$ and $n_0$ such that $0 < T(n) < c_1 g(n)$ for all $n > n_0$.
• Usually T(n) is running time, and n is size of input.
Simplified Master Theorem
Let $T(n) = aT(n/b) + cn^r$ be a recursive equation on the nonnegative integers,
where a > 0, b > 1, c > 0, and r >= 0 are constants.
Then,
• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$
• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$
• 3. If $r > \log_b a$, then $T(n) = O(n^r)$
Sum
• Sum: $1 + d + d^2 + \dots + d^k$
• Let
$S_k = 1 + d + d^2 + \dots + d^k$
$dS_k = d + d^2 + d^3 + \dots + d^{k+1}$
$dS_k - S_k = d^{k+1} - 1$
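Sum
• From
$S_k = 1 + d + d^2 + \dots + d^k$
$dS_k = d + d^2 + d^3 + \dots + d^{k+1}$
we get $dS_k - S_k = d^{k+1} - 1$.
• So, $(d - 1)S_k = d^{k+1} - 1$.
Thus, $S_k = \frac{d^{k+1} - 1}{d - 1}$ for $d \neq 1$.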
Sum
$S_k = 1 + d + d^2 + \dots + d^k$
• Assume that d is a constant.
• Case 1: d > 1. $S_k = O(d^k)$ (Main term is in the right region)
• Case 2: d = 1. $S_k = O(k)$ (Each term is the same)
• Case 3: d < 1. $S_k = O(1)$ (Main term is in the left region)
Order
$T(n) = n^r (1 + d + d^2 + \dots + d^{h(n)})$
• Assume that d, r are constants.
• Case 1: d > 1. $T(n) = O(n^r d^{h(n)})$
• Case 2: d = 1. $T(n) = O(n^r h(n))$
• Case 3: d < 1. $T(n) = O(n^r)$
Simplified Master Theorem
Let $T(n) = aT(n/b) + cn^r$ be a recursive equation on the nonnegative integers,
where a > 0, b > 1, c > 0, and r > 0 are constants.
Then,
• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$
• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$
• 3. If $r > \log_b a$, then $T(n) = O(n^r)$

Recursion tree (every node has a children):
Layer   #Nodes    Cost of each node
0       1         $cn^r$
1       a         $c(n/b)^r$
2       $a^2$     $c(n/b^2)^r$
…       …         …
Proof.
• The number of nodes in the j-th layer is $a^j$
• The size of a node in the j-th layer is $n/b^j$
• The cost of each node in the j-th layer is $c(n/b^j)^r$
• The total cost at the j-th layer is $a^j c (n/b^j)^r = cn^r \left(\frac{a}{b^r}\right)^j$
Proof.
• The number of layers is $\log_b n$
• The total cost at all layers is
$\sum_{j=0}^{\log_b n} cn^r \left(\frac{a}{b^r}\right)^j = cn^r \sum_{j=0}^{\log_b n} \left(\frac{a}{b^r}\right)^j = cn^r \sum_{j=0}^{\log_b n} d^j$,
where d is the constant $d = \frac{a}{b^r}$.
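Proof.
• Case 1. $r < \log_b a$
So, $b^r < b^{\log_b a} = a$ and $d = \frac{a}{b^r} > 1$.
Therefore, the total cost is
$O(cn^r d^{\log_b n}) = O\left(cn^r \left(\frac{a}{b^r}\right)^{\log_b n}\right) = O\left(cn^r \frac{n^{\log_b a}}{n^r}\right) = O(n^{\log_b a})$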
Proof.
• Case 2. $r = \log_b a$
So, $b^r = b^{\log_b a} = a$ and $d = \frac{a}{b^r} = 1$.
Therefore, the total cost is
$cn^r \sum_{j=0}^{\log_b n} d^j = cn^r \sum_{j=0}^{\log_b n} 1 = cn^r (\log_b n + 1) = O(n^r \log n)$
Proof.
• Case 3. $r > \log_b a$
So, $b^r > b^{\log_b a} = a$ and $d = \frac{a}{b^r} < 1$.
Therefore, the total cost is
$cn^r \sum_{j=0}^{\log_b n} d^j = cn^r \cdot O(1) = O(n^r)$
Simplified Master Theorem
Let $T(n) = aT(n/b) + cn^r$. Then,
• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$
(The main cost is in the bottom layers)
• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$
(Every layer has roughly the same cost)
• 3. If $r > \log_b a$, then $T(n) = O(n^r)$
(The main cost is in the top layers)
3.1 Asymptotic notation
Θ-notation: f(n) = Θ(g(n)); g(n) is an asymptotically tight bound for f(n).
Θ(g(n)) = {f(n) | there exist positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0}
Example: Prove $3n^2 - 6n = Θ(n^2)$.
Proof:
We need to find constants c1, c2 and n0 such that:
$c_1 n^2 ≤ 3n^2 - 6n ≤ c_2 n^2$ (for all n ≥ n0)
Divide by $n^2$:
$c_1 ≤ 3 - 6/n ≤ c_2$
Select c1 = 2, c2 = 3 and n0 = 6.
Note: f(n) = Θ(g(n)) iff g(n) = Θ(f(n)). For example: $n^2 = Θ(3n^2 - 6n)$
O-notation: f(n) = O(g(n)); g(n) is an asymptotic upper bound for f(n).
O(g(n)) = {f(n) | there are positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0}
• Θ(g(n)) ⊆ O(g(n))
• f(n) = Θ(g(n)) implies f(n) = O(g(n))
• 6n = O(n), 6n = O($n^2$)
• Computational time O($n^2$) means the time in the worst case is O($n^2$)
Ω-notation: f(n) = Ω(g(n)); g(n) is an asymptotic lower bound for f(n).
Ω(g(n)) = {f(n) | there are positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0}
Note: f(n) = Θ(g(n)) if and only if
(f(n) = O(g(n))) & (f(n) = Ω(g(n)))
(tight bound = upper bound and lower bound)
o-notation: f(n) = o(g(n)) (little-oh of g of n)
o(g(n)) = {f(n) | for every positive constant c, there exists a constant n0 > 0 such that 0 ≤ f(n) < c g(n) for all n ≥ n0}
• 2n = o($n^2$), but $2n^2 ≠ o(n^2)$
• f(n) = o(g(n)) can also be written $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$
Comparison of functions
• functions: Ω Θ O o
  numbers:   ≥ = ≤ <
• Transitivity, Reflexivity, Symmetry
• Two real numbers are always comparable; two functions may not be comparable
  – Example: f(n) = n and g(n) = $n^{1 + \sin n}$
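Appendix A: Summation formulas
Linearity: $\sum_{k=1}^{n} (c a_k + b_k) = c \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k$
Arithmetic series: $\sum_{k=1}^{n} k = \frac{1}{2} n(n+1) = Θ(n^2)$
Geometric series: $\sum_{k=0}^{n} x^k = \frac{x^{n+1} - 1}{x - 1}$ for $x \neq 1$
Harmonic series: $H_n = \sum_{k=1}^{n} \frac{1}{k} = \log_e n + O(1)$
Infinite series, for |x| < 1: $\sum_{k=0}^{\infty} x^k = \frac{1}{1-x}$ and $\sum_{k=0}^{\infty} k x^k = \frac{x}{(1-x)^2}$
Telescoping: $\sum_{k=1}^{n-1} \frac{1}{k(k+1)} = \sum_{k=1}^{n-1} \left(\frac{1}{k} - \frac{1}{k+1}\right) = 1 - \frac{1}{n}$
Products: $\lg \prod_{k=1}^{n} a_k = \sum_{k=1}^{n} \lg a_k$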
Exercise
• For the following two equations, identify the main cost region and the solution with the simplified master theorem:
1. $T(n) = 8T(n/2) + n^2$
2. $T(n) = 8T(n/2) + n^5$
Divide and Conquer Method
The best-known algorithm design strategy:
1. Divide instance of problem into two or more smaller instances
2. Solve smaller instances recursively
3. Obtain solution to original (larger) instance by combining these solutions
Integer Multiplication
• 12 × 47 = 564 by the algorithm below
     12
   × 47
   ----
     84
    48
   ----
    564
Integer Multiplication
A = 1234567890135798  B = 87654321284820912
The elementary school algorithm:
            a1 a2 … an
            b1 b2 … bn
      (d10) d11 d12 … d1n
   (d20) d21 d22 … d2n
   … … … … … … …
(dn0) dn1 dn2 … dnn
Efficiency: $n^2$ one-digit multiplications
Karatsuba's Algorithm
• Using the classical pen and paper algorithm, two n-digit integers can be multiplied in O($n^2$) operations. Karatsuba came up with a faster algorithm.
• Let A and B be two integers with
  – $A = A_1 10^k + A_0$, $A_0 < 10^k$
  – $B = B_1 10^k + B_0$, $B_0 < 10^k$
  – $C = A \cdot B = (A_1 10^k + A_0)(B_1 10^k + B_0) = A_1 B_1 10^{2k} + (A_1 B_0 + A_0 B_1) 10^k + A_0 B_0$
A trivial analysis
• T(n) = 4T(n/2) + O(n)
• T(n) = O($n^2$)
3 Multiplications
Instead, this can be computed with 3 multiplications:
• $T_0 = A_0 B_0$
• $T_1 = (A_1 + A_0)(B_1 + B_0)$
• $T_2 = A_1 B_1$
• $C = T_2 10^{2k} + (T_1 - T_0 - T_2) 10^k + T_0$
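The three-multiplication identity can be sketched in C on machine integers. This is only an illustration under stated assumptions: a real implementation works on arbitrarily long digit arrays, while this sketch restricts inputs to 32 bits so every product fits in 64 bits, and the helper names digits and pow10u are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Number of decimal digits of x (at least 1). */
static int digits(uint64_t x)
{
    int d = 1;
    while (x >= 10) { x /= 10; d++; }
    return d;
}

/* 10^k as an unsigned 64-bit integer (k is small in this sketch). */
static uint64_t pow10u(int k)
{
    uint64_t p = 1;
    while (k-- > 0) p *= 10;
    return p;
}

/* Multiply a and b with Karatsuba's three recursive products. */
uint64_t karatsuba(uint64_t a, uint64_t b)
{
    assert(a <= UINT32_MAX && b <= UINT32_MAX);   /* keeps every product below 2^64 */
    if (a < 10 || b < 10) return a * b;           /* base case: a one-digit factor */
    int n = digits(a) > digits(b) ? digits(a) : digits(b);
    uint64_t p = pow10u(n / 2);                   /* p = 10^k with k = n/2 */
    uint64_t a1 = a / p, a0 = a % p;              /* A = A1*10^k + A0 */
    uint64_t b1 = b / p, b0 = b % p;              /* B = B1*10^k + B0 */
    uint64_t t0 = karatsuba(a0, b0);
    uint64_t t1 = karatsuba(a1 + a0, b1 + b0);
    uint64_t t2 = karatsuba(a1, b1);
    return t2 * p * p + (t1 - t0 - t2) * p + t0;  /* C = T2*10^2k + (T1-T0-T2)*10^k + T0 */
}
```

The middle term works because T1 - T0 - T2 = A1·B0 + A0·B1, so one multiplication of the sums replaces two separate products.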
Complexity of Algorithm
• Let T(n) be the time to compute the product of two n-digit numbers using Karatsuba's algorithm. Assume $n = 2^k$.
• T(n) ≤ 3T(n/2) + cn
• T(n) = O($n^{\lg 3}$), lg 3 ≈ 1.58
Matrix Multiplication
• Regular method takes time O($n^3$)
• IDEA:
  [ r s ]   [ a b ]   [ e f ]
  [ t u ] = [ c d ] × [ g h ]
• r = ae + bg
• s = af + bh
• t = ce + dg
• u = cf + dh
Strassen’s algorithm
• Multiply 2×2 matrices with only 7 recursive mults.
• P1 = a ⋅ (f – h)
• P2 = (a + b) ⋅ h
• P3 = (c + d) ⋅ e
• P4 = d ⋅ (g – e)
• P5 = (a + d) ⋅ (e + h)
• P6 = (b – d) ⋅ (g + h)
• P7 = (a – c) ⋅ (e + f)
Strassen’s Algorithm
• r = P5 + P4 – P2 + P6
• s = P1 + P2
• t = P3 + P4
• u = P5 + P1 – P3 – P7
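The seven products and four combinations can be tried out on plain 2×2 integer matrices. The sketch below is illustrative only: in the real algorithm the entries a, …, h are (n/2)×(n/2) blocks multiplied recursively, and the function name strassen2x2 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* C = A x B for 2x2 integer matrices, using Strassen's seven products. */
void strassen2x2(long A[2][2], long B[2][2], long C[2][2])
{
    assert(A != NULL && B != NULL && C != NULL);
    long a = A[0][0], b = A[0][1], c = A[1][0], d = A[1][1];
    long e = B[0][0], f = B[0][1], g = B[1][0], h = B[1][1];

    long p1 = a * (f - h);
    long p2 = (a + b) * h;
    long p3 = (c + d) * e;
    long p4 = d * (g - e);
    long p5 = (a + d) * (e + h);
    long p6 = (b - d) * (g + h);
    long p7 = (a - c) * (e + f);

    C[0][0] = p5 + p4 - p2 + p6;   /* r = ae + bg */
    C[0][1] = p1 + p2;             /* s = af + bh */
    C[1][0] = p3 + p4;             /* t = ce + dg */
    C[1][1] = p5 + p1 - p3 - p7;   /* u = cf + dh */
}
```

A numeric check like this does not replace the algebraic verification, but it catches sign errors quickly.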
Problem
Verify one of the four equations in Strassen's algorithm:
u = P5 + P1 – P3 – P7
Strassen observed [1969] that the product of two matrices can be computed as follows:
  [ C00 C01 ]   [ A00 A01 ]   [ B00 B01 ]
  [ C10 C11 ] = [ A10 A11 ] * [ B10 B11 ]

                [ M1 + M4 - M5 + M7    M3 + M5           ]
              = [ M2 + M4              M1 + M3 - M2 + M6 ]
M1 = (A00 + A11) (B00 + B11)
M2 = (A10 + A11) B00
M3 = A00 (B01 - B11)
M4 = A11 (B10 - B00)
M5 = (A00 + A01) B11
M6 = (A10 - A00) (B00 + B01)
M7 = (A01 - A11) (B10 + B11)
Analysis of Strassen's Algorithm
If n is not a power of 2, matrices can be padded with zeros.
Running time:
T(n) = 7T(n/2) + O($n^2$), T(1) = 1
Solution: T(n) = O($n^{\log_2 7}$) ≈ O($n^{2.807}$)
vs. $n^3$ of brute-force alg.
Problem
Verify two of the four equations in Strassen's algorithm:
t = P3 + P4
and
u = P5 + P1 – P3 – P7
Heap
            3
         7     5
       10 11  6
• Father <= children
• The root is the smallest
• Complete binary tree (every layer except the bottom is filled up)
Heap Operations
• Insertion
• Deletion: Remove root (take the least)
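Heap Insertion
Put the new element 1 at the next free leaf:
            3
         7     5
       10 11  6  1
Insertion Adjustment
Adjust it on the path from the new leaf to the root. 1 is smaller than its father 5, so exchange them:
            3
         7     1
       10 11  6  5
1 is smaller than its father 3, so exchange them:
            1
         7     3
       10 11  6  5
• Except the new leaf, all adjusted nodes get smaller values
• Insertion does not damage the heap
Deletion (Remove Root)
Remove the root 1:
            _
         7     3
       10 11  6  5
Deletion Adjustment
Take the last leaf to the root:
            5
         7     3
       10 11  6
Adjust on the path from the root to a leaf. 5 is larger than its smaller child 3, so exchange them:
            3
         7     5
       10 11  6
• Deletion does not damage the heap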
Heap Representation
• A heap with no more than n elements uses array h of size n, indexed h[1], …, h[n].
• The children of h[i] are h[2i] and h[2i+1]
• h[⌊i/2⌋] is the father of h[i]
Prove by induction:
• left[i] = 2i
• right[i] = 2i+1
• father(i) = ⌊i/2⌋
Adjustment for Insertion
Bottom-Up-Adjust(i){
   if (i > 1 and h[i] < h[father(i)]){     // stop at the root
      swap h[i] and h[father(i)];
      Bottom-Up-Adjust(father(i));
   }
}
Insert
Insert(data, heapsize){
   put data at h[heapsize+1];
   heapsize = heapsize + 1;
   Bottom-Up-Adjust(heapsize);
}
Adjustment for deletion
Top-Down-Adjust(i){
   if (left(i) > heapsize) return;         // h[i] is a leaf
   let h[child] be the smaller of h[left(i)] and h[right(i)]
      (use h[left(i)] if right(i) > heapsize);
   if (h[i] > h[child]){
      swap h[i] and h[child];
      Top-Down-Adjust(child);
   }
}
Delete
Delete(heapsize){
   move h[heapsize] to h[1];
   heapsize = heapsize - 1;
   Top-Down-Adjust(1);
}
Complexity of Heap Operations
• A heap has n elements.
• The depth of heap is O(log n)
• Insertion takes O(log n) steps
• Deletion takes O(log n) steps
Heap Sorting
• Input: a1, a2, …, an
• Build Heap: n insertions
Cost: O(nlog n)
• Remove from heap: n deletions
Cost: O(nlog n)
Total cost: O(nlog n)
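The heap sorting scheme above (n insertions followed by n deletions of the root) can be sketched in C. This is a minimal illustration under stated assumptions: the heap is stored 1-based in a caller-supplied scratch array h with room for n + 1 ints, the adjustments are written as loops rather than recursion, and all function names are chosen here.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* h[1..size] is a min-heap stored 1-based: children of i are 2i and 2i+1. */
static void bottom_up_adjust(int h[], size_t i)
{
    while (i > 1 && h[i] < h[i / 2]) {     /* smaller than its father: move up */
        swap_int(&h[i], &h[i / 2]);
        i /= 2;
    }
}

static void top_down_adjust(int h[], size_t size, size_t i)
{
    for (;;) {
        size_t child = 2 * i;
        if (child > size) break;           /* h[i] is a leaf */
        if (child + 1 <= size && h[child + 1] < h[child])
            child++;                       /* pick the smaller child */
        if (h[i] <= h[child]) break;       /* heap order restored */
        swap_int(&h[i], &h[child]);
        i = child;
    }
}

/* Sort a[0..n-1] into increasing order; h is scratch space for n + 1 ints. */
void heap_sort(int a[], int h[], size_t n)
{
    assert(a != NULL && h != NULL);
    size_t size = 0;
    for (size_t k = 0; k < n; k++) {       /* build heap: n insertions */
        h[++size] = a[k];
        bottom_up_adjust(h, size);
    }
    for (size_t k = 0; k < n; k++) {       /* n deletions of the root */
        a[k] = h[1];                       /* the root is the least element */
        h[1] = h[size--];                  /* take the last leaf to the root */
        top_down_adjust(h, size, 1);
    }
}
```

Each insertion and each deletion follows a single root-to-leaf path, so both phases cost O(n log n).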
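ATM Traffic Shaping
[Figure: incoming traffic is placed into per-VC queues (vc1 queue, vc2 queue, vc3 queue, …, vcn queue); a scheduler sends packets from the queues to the outgoing link.]
Tele-communication
[Figure: phones connect through a gateway and a switch to other phones.]
• Each vc is considered as one phone connection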
Traffic in one virtual circuit
[Figure: two timelines over the same time axis.
Incoming packets: p1 p2 p3 p4 p5 p6 p7
Outgoing packets after shaping: p1 p2 p3 p4 p5 p6 p7]
Interval Packet Delay
• Inter packet time gap is big enough
• Every virtual circuit i has a minimal inter-packet delay constant inter_delay_i
[Figure: time axis showing Packet 3 and Packet 4 separated by a gap > inter_delay_i.]
A Trivial Algorithm
• After a packet in one vc, set the ready time for sending the next packet :
ready_time=current_time+inter_delay
• Periodically check all queues, and send packet without inter delay violation
Drawback of the Algorithm
• Much time is wasted for checking those vcs which have no packet ready
Another Algorithm
• Check if there is at least one queue ready to send (current_time > ready_time)
• Use a heap to select a queue with the least ready time
Heap for Selecting Queue
• Each vc has <ready_time, vc_i> to enter heap
• Heap is built based on the order of ready_time
Apply Heap to Traffic Control
[Figure: each queue (vc1 queue, vc2 queue, …, vc1000 queue) contributes one entry <ready_time, vc_i> to the heap: <ready_time, vc_1>, <ready_time, vc_2>, …, <ready_time, vc_1000>.]
When to insert to heap?
• When a queue just has one new packet, or
• When a queue just sends out one packet and still has packets waiting
When to delete from heap?
• Outgoing bandwidth is available, and
• The least ready_time in heap is expired
Drawback of One Heap Solution
• It cannot prevent a greedy VC
• It may ignore some VCs
• Traffic control is not predictable
Two Heaps Design
[Figure: the VC queues (vc1 queue, …, vc1000 queue) are served through two heaps, one for timing and one for fairness.]
Two Heaps Functions
• Time Heap: Control the inter packet delay
<time_ready, vc_i>
• Fairness Heap: Balance the service among all VCs
<service_got, vc_i>
Adjust service_got for fairness
• Each vc has a weight w_i > 1
• When a packet is sent, its
service_got = service_got + w_i
• When a queue just has one packet,
service_got = max(service_got, time_stamp)
• The service is inversely proportional to weight w_i
Problem
            1
         7     3
       10 11  6  5
a) Draw the steps to insert element 2.
b) After 2 is inserted, draw the steps to remove the root.
Dynamic Programming
• Recursion: like divide-and-conquer.
• Overlap in subproblems: unlike divide-and-conquer.
[Figure: problem P(n) is divided into subproblems P(m1), P(m2), …, P(mk); their solutions S1, S2, …, Sk are combined into the solution S.]
Matrix Multiplication (definition)
Given a series of matrices A1, A2, …, An, where matrix Ai has size $p_{i-1} \times p_i$, find a way to compute A1A2…An so that it uses the least number of multiplications.
Example: A1 A2 A3 A4
p_i: 13 5 89 3 34
5 ways to compute the product of 4 matrices:
(A1(A2(A3A4))), (A1((A2A3)A4)), ((A1A2)(A3A4)),
((A1(A2A3))A4), (((A1A2)A3)A4).
Matrix Multiplication (Example)
A1 A2 A3 A4 with p_i: 13 5 89 3 34
(A1(A2(A3A4))), costs = 26418
(A1((A2A3)A4)), costs = 4055
((A1A2)(A3A4)), costs = 54201
((A1(A2A3))A4), costs = 2856
(((A1A2)A3)A4), costs = 10582
For (A1(A2(A3A4))), the products are A3A4, then A2(A3A4), then A1(A2A3A4):
cost = 89*3*34 + 5*89*34 + 13*5*34 = 9078 + 15130 + 2210
= 26418
Catalan Number
For any n, # ways to fully parenthesize the product of a chain of n+1 matrices
= # binary trees with n nodes
= # ways to arrange n pairs of fully matched parentheses
= n-th Catalan Number = C(2n, n)/(n + 1) = Θ($4^n / n^{3/2}$)
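Multiplication Tree
(A1(A2(A3A4)))
(A1((A2A3)A4))
((A1A2)(A3A4))
((A1(A2A3))A4)
(((A1A2)A3)A4)
[Figure: each parenthesization corresponds to a binary tree whose leaves are A1 A2 A3 A4 and whose internal nodes are the split positions 1, 2, 3.]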
Multiplication Design (1)
[Figure: tree T with root k; left subtree T1 covers 1, …, k and right subtree T2 covers k+1, …, n.]
If T is an optimal solution for A1, A2, …, An, then T1 (resp. T2) is an optimal solution for A1, A2, …, Ak (resp. Ak+1, Ak+2, …, An).
Multiplication Design (2)
Let m[i, j] be the minimum number of scalar multiplications needed to compute the product Ai…Aj, for 1 ≤ i ≤ j ≤ n.
If the optimal solution splits the product Ai…Aj = (Ai…Ak)(Ak+1…Aj), for some k, i ≤ k < j, then
$m[i, j] = m[i, k] + m[k+1, j] + p_{i-1} p_k p_j$. Hence, we have:
$m[i, j] = \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \}$ if i < j
$m[i, j] = 0$ if i = j
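The recurrence for m[i, j] can be filled in bottom-up, shortest chains first. The C sketch below is illustrative only: the bound MAXN, the function name matrix_chain, and the table layout are assumptions; p[0..n] holds the dimensions so that matrix A_i has size p[i-1] × p[i], and s[i][j] records a split k that achieves the minimum.

```c
#include <assert.h>
#include <limits.h>

#define MAXN 32   /* assumed upper bound on the chain length for this sketch */

/* Returns m[1][n], the least number of scalar multiplications for A_1...A_n,
   and fills s[i][j] with an optimal split position for each i < j. */
long matrix_chain(const int p[], int n, int s[MAXN + 1][MAXN + 1])
{
    assert(n >= 1 && n <= MAXN);
    static long m[MAXN + 1][MAXN + 1];

    for (int i = 1; i <= n; i++)
        m[i][i] = 0;                              /* a single matrix costs nothing */

    for (int len = 2; len <= n; len++) {          /* chain length */
        for (int i = 1; i + len - 1 <= n; i++) {
            int j = i + len - 1;
            m[i][j] = LONG_MAX;
            for (int k = i; k < j; k++) {         /* try every split (A_i..A_k)(A_k+1..A_j) */
                long cost = m[i][k] + m[k + 1][j]
                          + (long)p[i - 1] * p[k] * p[j];
                if (cost < m[i][j]) {
                    m[i][j] = cost;
                    s[i][j] = k;
                }
            }
        }
    }
    return m[1][n];
}
```

With the dimensions 13, 5, 89, 3, 34 this yields 2856, the cost of ((A1(A2A3))A4).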
Matrix Multiplication (Example)
Consider an example with sequence of dimensions <5, 2, 3, 4, 6, 7, 8>
$m[i, j] = \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \}$

m     j=1    2     3     4     5     6
i=1    0     30    64    132   226   348
i=2          0     24    72    156   268
i=3                0     72    198   366
i=4                      0     168   392
i=5                            0     336
i=6                                  0
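Matrix Multiplication (Find Solution)
$m[i, j] = \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \}$
s[i, j] = a value of k that gives the minimum

s     j=2    3     4     5     6
i=1    1     1     1     1     1
i=2          2     3     4     5
i=3                3     4     5
i=4                      4     5
i=5                            5

[Figure: solution tree. [1,6] splits into A1 and [2,6]; [2,6] into [2,5] and A6; [2,5] into [2,4] and A5; [2,4] into [2,3] and A4; [2,3] into A2 and A3.]
A1((((A2A3)A4)A5)A6)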
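Analysis
$m[i, j] = \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \}$
To fill the entry m[i, j], it needs Θ(j − i) operations. Hence the execution time of the algorithm is
$\sum_{i=1}^{n} \sum_{j=i}^{n} Θ(j - i) = \sum_{j=1}^{n} \sum_{i=1}^{j} Θ(j - i) = \sum_{j=1}^{n} Θ\left(j^2 - \frac{j(j+1)}{2}\right) = \sum_{j=1}^{n} Θ(j^2) = Θ(n^3)$
Time: Θ($n^3$)
Space: Θ($n^2$)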
Steps for Developing DP algorithm
1. Characterize the structure of an optimal solution.
2. Derive a recursive formula for computing the values of optimal solutions.
3. Compute the value of an optimal solution in a bottom-up fashion (top-down is also applicable).
4. Construct an optimal solution in a top-down fashion.
Elements of Dynamic Programming
• Optimal substructure (a problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems)
• Overlapping subproblems
• Memoization
Longest Common Subsequence (Def.)
Given two sequences X = <x1, x2, …, xm> and Y = <y1, y2, …, yn>, find a maximum-length common subsequence of X and Y.
Example 1: Input: ABCBDAB and BDCABA
Common subsequences: AB, ABA, BCB, BCAB, BCBA, …
Longest: BCAB, BCBA, … Length = 4
Example 2: vintner and writers
Longest Common Subsequence (Design 1)
Let Z = <z1, z2, …, zk> be an LCS of X = <x1, x2, …, xm> and Y = <y1, y2, …, yn>.
• If $z_k \neq x_m$, then Z is an LCS of <x1, x2, …, x(m−1)> and Y.
• If $z_k \neq y_n$, then Z is an LCS of X and <y1, y2, …, y(n−1)>.
• If $z_k = x_m = y_n$, then <z1, z2, …, z(k−1)> is an LCS of <x1, x2, …, x(m−1)> and <y1, y2, …, y(n−1)>.
Longest Common Subsequence (Design 2)
Let L[i, j] be the length of an LCS of the prefixes Xi = <x1, x2, …, xi> and Yj = <y1, y2, …, yj>, for 0 ≤ i ≤ m and 0 ≤ j ≤ n. We have:
L[i, j] = 0                               if i = 0 or j = 0
        = L[i−1, j−1] + 1                 if i, j > 0 and xi = yj
        = max(L[i, j−1], L[i−1, j])       if i, j > 0 and xi ≠ yj
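The recurrence for L[i, j] translates directly into a table-filling loop. The C sketch below is illustrative only: the bound MAXLEN and the function name lcs_length are assumptions, and it computes only the length, not the subsequence itself.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 64   /* assumed bound on the input lengths for this sketch */

/* Length of a longest common subsequence of the strings x and y. */
int lcs_length(const char *x, const char *y)
{
    size_t m = strlen(x), n = strlen(y);
    assert(m <= MAXLEN && n <= MAXLEN);
    static int L[MAXLEN + 1][MAXLEN + 1];

    for (size_t i = 0; i <= m; i++) L[i][0] = 0;   /* empty prefix of Y */
    for (size_t j = 0; j <= n; j++) L[0][j] = 0;   /* empty prefix of X */

    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            if (x[i - 1] == y[j - 1])              /* x_i = y_j: extend the LCS */
                L[i][j] = L[i - 1][j - 1] + 1;
            else if (L[i][j - 1] >= L[i - 1][j])   /* otherwise drop y_j or x_i */
                L[i][j] = L[i][j - 1];
            else
                L[i][j] = L[i - 1][j];
        }
    }
    return L[m][n];
}
```

The two nested loops fill (m + 1)(n + 1) entries with constant work each.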
Longest Common Subsequence
L[i, j] = 0                               if i = 0 or j = 0
        = L[i−1, j−1] + 1                 if i, j > 0 and xi = yj
        = max(L[i, j−1], L[i−1, j])       if i, j > 0 and xi ≠ yj
Time: Θ(mn)
Space: Θ(mn)

      A  B  C  B  D  A  B
   B  0  1  1  1  1  1  1
   D  0  1  1  1  2  2  2
   C  0  1  2  2  2  2  2
   A  1  1  2  2  2  3  3
   B  1  2  2  3  3  3  4
   A  1  2  2  3  3  4  4

LCS: BCBA
Optimal Polygon Triangulation
Find a triangulation such that the sum of the weights of the triangles in the triangulation is minimized.
[Figure: convex polygon with vertices v0, v1, v2, v3, v4, v5, v6.]
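Optimal Polygon Triangulation (Design 1)
[Figure: polygon v0, …, vn with triangle v0 vk vn; T1 lies on the side v0, …, vk and T2 on the side vk, …, vn.]
If T is an optimal solution for v0, v1, …, vn with
$T = T_1 \cup T_2 \cup \{ v_0 v_k, v_k v_n \}$
then T1 (resp. T2) is an optimal solution for v0, v1, …, vk (resp. vk, vk+1, …, vn), 1 ≤ k < n.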
Optimal Polygon Triangulation (Design 2)
Let t[i, j] be the weight of an optimal triangulation of the polygon v(i−1), vi, …, vj, for 1 ≤ i < j ≤ n.
If the triangulation splits the polygon into v(i−1), vi, …, vk and vk, vk+1, …, vj for some k, then
$t[i, j] = t[i, k] + t[k+1, j] + w(v_{i-1} v_k v_j)$. Hence, we have:
$t[i, j] = \min_{i \le k < j} \{ t[i, k] + t[k+1, j] + w(v_{i-1} v_k v_j) \}$ if i < j
$t[i, j] = 0$ if i = j
Catalan Number
• Segner's recurrence formula gives:
$E_n = E_2 E_{n-1} + E_3 E_{n-2} + \dots + E_{n-1} E_2$, with $E_1 = E_2 = 1$
$C_n = E_{n+2}$
Problem
Draw the dynamic programming table to find the Longest Common Subsequence of BACAC and CABC.
Data Structures
• Linked List
• Heap
• Application of Heap in an Industry Product
Program=Data Structure + Algorithm
Linked List
[Figure: a list of three nodes holding 8, 10, 15.]
Node Structure
• struct listnode{
     type data;
     struct listnode *nextPtr;
  };
Dynamic Memory Allocation:
• Apply for the memory when it is needed
• Release memory when it is not needed
Linked List Operations
• Insertion:
add a new node to the linked list
• Deletion:
remove a node from the current linked list
Linked List to Implement
startPtr → c → e → h
The linked list is in increasing order of characters
Insertion
• Create a new node
• Find a place to insert
• Apply for a new piece of memory
• Adjust the nearby pointers
After g is inserted
startPtr → c → e → g → h
The linked list is in increasing order of characters
Find Location to Insert
startPtr → c → e → h, new node g
previousPtr points to e and currentPtr points to h, so g is placed between e and h.
Deletion
• Find the node to delete
• Adjust the pointers nearby the deleted node
• Release the memory for the deleted node
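The insertion and deletion steps can be sketched in C for a list of characters kept in increasing order. This is a minimal illustration, not the course's code: the function names list_insert and list_delete and the integer return values are assumptions, while previousPtr and currentPtr follow the pointer names used on the slides.

```c
#include <assert.h>
#include <stdlib.h>

struct listnode {
    char data;
    struct listnode *nextPtr;
};

/* Insert value into the increasing list *startPtr.
   Returns 1 on success, 0 if no memory is available. */
int list_insert(struct listnode **startPtr, char value)
{
    assert(startPtr != NULL);
    struct listnode *newPtr = malloc(sizeof *newPtr);   /* apply for memory */
    if (newPtr == NULL) return 0;
    newPtr->data = value;

    struct listnode *previousPtr = NULL, *currentPtr = *startPtr;
    while (currentPtr != NULL && currentPtr->data < value) {  /* find the place */
        previousPtr = currentPtr;
        currentPtr = currentPtr->nextPtr;
    }
    newPtr->nextPtr = currentPtr;                        /* adjust nearby pointers */
    if (previousPtr == NULL) *startPtr = newPtr;         /* new first node */
    else previousPtr->nextPtr = newPtr;
    return 1;
}

/* Delete the first node holding value. Returns 1 if a node was removed. */
int list_delete(struct listnode **startPtr, char value)
{
    assert(startPtr != NULL);
    struct listnode *previousPtr = NULL, *currentPtr = *startPtr;
    while (currentPtr != NULL && currentPtr->data != value) { /* find the node */
        previousPtr = currentPtr;
        currentPtr = currentPtr->nextPtr;
    }
    if (currentPtr == NULL) return 0;                    /* value not in the list */
    if (previousPtr == NULL) *startPtr = currentPtr->nextPtr;
    else previousPtr->nextPtr = currentPtr->nextPtr;     /* bypass the node */
    free(currentPtr);                                    /* release its memory */
    return 1;
}
```

Passing the address of startPtr lets both functions change the first node without a special return value for the new head.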
Find the Node to Delete
startPtr → c → e → h
Remove the node and release memory
startPtr → c → h
After e is deleted
startPtr → c → h
The linked list is in increasing order of characters
Queue
• First In, First Out
Queue Linked List
• headPtr points to the first node and tailPtr to the last node
Three Important Operations
Supporting these three operations is the foundation of modern databases:
• Search
• Insertion
• Deletion
Tree
• Tree is a 2-dimensional structure
            47
         25    77
       11  43     93
• Binary tree: root, left_child, right_child
• Numbers in the left subtree <= numbers in the right subtree.
Data structure for one node
struct treenode{
   struct treenode *leftPtr;
   int data;
   struct treenode *rightPtr;
};
typedef struct treenode TreeNode;
typedef TreeNode *TreeNodePtr;
Operations
• Insertion
• Traverse:
inorder
preorder
postorder
Insertion
void insertNode(TreeNodePtr *treePtr, int value){
   if (*treePtr is empty) {
      allocate memory and put the value here
   }
   else if (value < (*treePtr)->data)
      insert at the left tree
   else
      insert at the right tree
}
/* Requires <stdio.h> and <stdlib.h>, plus the TreeNode and TreeNodePtr definitions. */
void insertNode( TreeNodePtr *treePtr, int value )
{
   if ( *treePtr == NULL ) {   /* empty spot found: create the node here */
      *treePtr = malloc( sizeof( TreeNode ) );
      if ( *treePtr != NULL ) {
         ( *treePtr )->data = value;
         ( *treePtr )->leftPtr = NULL;
         ( *treePtr )->rightPtr = NULL;
      }
      else
         printf( "No memory available.\n" );
   }
   else if ( value < ( *treePtr )->data )
      insertNode( &( ( *treePtr )->leftPtr ), value );
   else if ( value > ( *treePtr )->data )
      insertNode( &( ( *treePtr )->rightPtr ), value );
   else
      printf( "dup" );   /* duplicate values are not inserted */
}
void inOrder( TreeNodePtr treePtr )
{
if ( treePtr != NULL ) {
inOrder( treePtr->leftPtr );
printf( "%3d", treePtr->data );
inOrder( treePtr->rightPtr );
}
}
void preOrder( TreeNodePtr treePtr )
{
if ( treePtr != NULL ) {
printf( "%3d", treePtr->data );
preOrder( treePtr->leftPtr );
preOrder( treePtr->rightPtr );
}
}
void postOrder( TreeNodePtr treePtr )
{
if ( treePtr != NULL ) {
postOrder( treePtr->leftPtr );
postOrder( treePtr->rightPtr );
printf( "%3d", treePtr->data );
}
}
Problem
Implement the function to find the largest and least elements in the binary tree:
void max(TreeNodePtr *treePtr, int *largest, int *least)
Binary tree
When the binary tree is not balanced, it takes O(n) steps for search, insert, or delete.
• Search: O( n) steps in the worst case
• Insert: O(n) steps in the worst case
• Delete: O(n) steps in the worst case
Maintaining Balance
• Binary Search Tree
  – Height governed by
    • Initial order
    • Sequence of insertion/deletion
• Need a structure that tends to maintain balance
  – How?
    • Grow in 'width' first, then height
    • Accommodate horizontal growth
    • More data at each level
    • Nodes of two forms
      – One data member and two children ("Two node")
      – Two data members and three children ("Three node")
2-3 Tree
• Each node which is not a leaf has either 2 or 3 sons
• Every path from the root to a leaf has the same length.
Depth and Size for 2-3 tree
• Let d be the depth of a 2-3 tree
• Level k has between $2^k$ and $3^k$ nodes
• A 2-3 tree of depth d = Θ(log n) can hold n nodes at the leaf level
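2-3 Tree Nodes
• Two node S:L
  – left subtree: values ≤ S
  – right subtree: values > S and ≤ L
• Three node S:L:M
  – left subtree: values ≤ S
  – middle subtree: values > S and ≤ L
  – right subtree: values > L and ≤ M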
Search in 2-3 Tree
Search(a, r){
   if (r only has leaf children) return r
   else {
      if (a <= S) then return Search(a, left_child)
      else if (a <= L) then return Search(a, mid_child)
      else return Search(a, right_child)
   }
}
Insertion(36)
Before:
   root:     40:100
   children: 20:40 | 60:80:100
   leaves:   20 40 | 60 80 100
After 36 becomes a son of 20:40:
   root:     40:100
   children: 20:36:40 | 60:80:100
   leaves:   20 36 40 | 60 80 100
Insertion(50)
After 50 becomes a son of 60:80:100, that node has four sons (50 60 80 100):
   root:     40:100
   children: 20:36:40 | 60:80:100
   leaves:   20 36 40 | 50 60 80 100
The node is split into 50:60 and 80:100:
   root:     40:60:100
   children: 20:36:40 | 50:60 | 80:100
   leaves:   20 36 40 | 50 60 | 80 100
Insertion
Insertion(a){
   use Search(a, root) to find the node r
   make a a son of r
   if (r has four sons)
      adjust the tree from r up to the root by Addson(r)
}
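Insertion and Splits on the path to root
[Figure: path from the father of the new leaf up to the root. The split starts at the father of the new leaf and stops at the first node on the path that does not end up with four sons.]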
Addson(v)
Addson(v){
   create a new node v'
   make the two rightmost sons of v sons of v'
   if (v has no father) {
      create a new root r
      make v and v' the left and right sons of r
   }
   else {
      make v' a son of father(v) to the right of v
      if (father(v) has four sons) then Addson(father(v))
   }
}
Computational steps for insertion
• Assume the tree has n nodes on the leaf level
• Insertion operates on the nodes from the root to a leaf
• The path from the root to a leaf has O(log n) nodes
• The number of steps for insertion is O(log n)
Deletion(80)
Before:
   root:     36:50:100
   children: 20:36 | 40:50 | 60:80:100
   leaves:   20 36 | 40 50 | 60 80 100
After:
   root:     36:50:100
   children: 20:36 | 40:50 | 60:100
   leaves:   20 36 | 40 50 | 60 100
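Deletion(4)
Before:
   root:     5:9
   level 1:  3:5 | 7:9
   level 2:  1:3 | 4:5 | 6:7 | 8:9
   leaves:   1 3 | 4 5 | 6 7 | 8 9
Remove leaf 4; node 4:5 is left with one son (5):
   leaves:   1 3 | 5 | 6 7 | 8 9
Its brother 1:3 has only two sons, so 5 becomes a son of 1:3 and the node is removed:
   root:     5:9
   level 1:  3:5 | 7:9
   level 2:  1:3:5 | 6:7 | 8:9
   leaves:   1 3 5 | 6 7 | 8 9
Node 3:5 now has one son, so its son joins the brother 7:9 and the node is removed; the old root is left with one son and is removed as well:
   root:     5:7:9
   children: 1:3:5 | 6:7 | 8:9
   leaves:   1 3 5 | 6 7 | 8 9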
Deletion
[Figure: path from the father of the deleted leaf up to the root. The merge starts at the level of the deleted leaf's father and stops at the first level where no node is left with a single son.]
Deletion
Delete(r, a){
   remove the son of r with value a
   call RemoveSon(r) to recursively adjust the tree
   (roughly along the path from r to root)
}
RemoveSon(r)
RemoveSon(r){
   if (r has one child) {
      let r' be a brother of r
      if (r' has 3 sons)
         let r get a son from r'
      else {
         make the son of r a son of r'
         let f be the father of r
         remove r
         RemoveSon(f)
      }
   }
}
Problem: show how to delete 7
   root:     5:9
   level 1:  3:5 | 7:9
   level 2:  1:3 | 4:5 | 6:7 | 8:9
   leaves:   1 3 | 4 5 | 6 7 | 8 9
Binomial Heaps
Operations
• Insert(H,x)
• Minimum(H)
• Extract-Min(H)
• Union(H1, H2)
• Decrease-Key(H,x, k)
• Delete(H,x)
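• Binomial trees:
  B0 has a single node.
  Bk consists of two binomial trees B(k−1) linked together: the root of one is the leftmost child of the root of the other.
  [Figure: examples B1, B2, B3, and the nodes at depth i of Bk.]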
– Lemma 1
For the binomial tree Bk,
1. there are $2^k$ nodes,
2. the height of the tree is k,
3. there are exactly $\binom{k}{i}$ nodes at depth i for i = 0, 1, …, k,
4. the root has degree k, which is greater than that of any other node; moreover, if the children of the root are numbered from left to right by k−1, k−2, …, 0, child i is the root of a subtree Bi.
Proof:
1) By induction, $2^{k-1} + 2^{k-1} = 2^k$
2) By induction, 1 + (k−1) = k
3) By induction, $\binom{k-1}{i} + \binom{k-1}{i-1} = \binom{k}{i}$
Corollary:
The maximum degree of any node in an n-node binomial tree is lg n.
• Binomial heaps:
H: a set of binomial trees satisfying the following:
1. Each binomial tree in H is heap-ordered:
the key of a node is greater than or equal to the key of its parent
2. There is at most one binomial tree in H whose root has a given degree
Note:
By 2., an n-node binomial heap H consists of at most $\lfloor \lg n \rfloor + 1$ binomial trees.
Representation of binomial heaps
[Figure: (a) a binomial heap H, with head[H] pointing to its root list; (b) the linked representation of H, in which every node stores the fields p (parent), key, degree, child, and sibling.]
– Operations on binomial heaps
• Creating a new binomial heap
• Finding the minimum key
• Uniting 2 binomial heaps
Make-Binomial-Heap(): head[H] ← NIL. Time: θ(1)
Binomial-Heap-Minimum(H)
{
    y ← NIL
    x ← head[H]
    min ← ∞
    while x ≠ NIL do
    {
        if key[x] < min then
        {
            min ← key[x]
            y ← x
        }
        x ← sibling[x]
    }
    return y
}
Time: O(lg n)
• Binomial-Heap-Merge
Binomial-Link(y, z)
{
    p[y] ← z
    sibling[y] ← child[z]
    child[z] ← y
    degree[z] ← degree[z] + 1
}
[Figure: y and z are roots of B_{k-1} trees; after the link, y is the leftmost child of z.]
[Figure: Binomial-Heap-Union example. (a) The heaps H1 and H2. (b) The output of Binomial-Heap-Merge, a root list sorted by degree, with the pointers x and next-x; Case 3 applies. The following frames show Case 2, Case 4, Case 3, and Case 1 in turn, with the pointers prev-x, x, and next-x.]
Binomial-Heap-Union(H1, H2)
{
    H ← Make-Binomial-Heap()
    head[H] ← Binomial-Heap-Merge(H1, H2)
    free H1 and H2, but not the lists they point to
    if head[H] = NIL then return H
    prev-x ← NIL
    x ← head[H]
    next-x ← sibling[x]
    while (next-x ≠ NIL) do
    {
        if (degree[x] ≠ degree[next-x]) or
           (sibling[next-x] ≠ NIL and degree[sibling[next-x]] = degree[x]) then
        {   // Case 1 and Case 2
            prev-x ← x
            x ← next-x
        }
        else if (key[x] ≤ key[next-x]) then
        {   // Case 3
            sibling[x] ← sibling[next-x]
            Binomial-Link(next-x, x)
        }
        else
        {   // Case 4
            if (prev-x = NIL) then head[H] ← next-x
            else sibling[prev-x] ← next-x
            Binomial-Link(x, next-x)
            x ← next-x
        }
        next-x ← sibling[x]
    }
    return H
}
[Figure: the four cases of Binomial-Heap-Union, with roots a, b, c, d pointed to by prev-x, x, next-x, and sibling[next-x].
(a) Case 1: x is a B_k and next-x is a B_l with l ≠ k. The pointers move one position to the right.
(b) Case 2: x, next-x, and sibling[next-x] are all B_k trees. The pointers move one position to the right.
(c) Case 3: x and next-x are B_k trees, sibling[next-x] is a B_l, and key[x] ≤ key[next-x]. next-x is linked under x, which becomes a B_{k+1}.
(d) Case 4: x and next-x are B_k trees, sibling[next-x] is a B_l, and key[x] > key[next-x]. x is linked under next-x, which becomes a B_{k+1}.]
• Insert a node
Binomial-Heap-Insert(H, x)
{
    H' ← Make-Binomial-Heap()
    p[x] ← NIL
    child[x] ← NIL
    sibling[x] ← NIL
    degree[x] ← 0
    head[H'] ← x
    H ← Binomial-Heap-Union(H, H')
}
• Extracting the node with minimum key
Binomial-Heap-Extract-Min(H)
{
    find the root x with the min key in the root list of H, and remove x from the root list of H
    H' ← Make-Binomial-Heap()
    reverse the order of the linked list of x's children, and set head[H'] to point to the head of the resulting list
    H ← Binomial-Heap-Union(H, H')
    return x
}
[Figure: Binomial-Heap-Extract-Min example. (a) The heap H. (b) The root x with the minimum key is removed from the root list of H. (c) The children of x, in reversed order, form the heap H'. (d) The result of uniting H and H'.]
• Decreasing a key
Binomial-Heap-Decrease-Key(H, x, k)
{
    if k > key[x] then error "k > key[x]"
    key[x] ← k
    y ← x
    z ← p[y]
    while (z ≠ NIL and key[y] < key[z]) do
    {
        exchange key[y] ↔ key[z] and other fields
        y ← z
        z ← p[y]
    }
}
• Deleting a key
Binomial-Heap-Delete(H, x)
{
    Binomial-Heap-Decrease-Key(H, x, -∞)
    Binomial-Heap-Extract-Min(H)
}
[Figure: Binomial-Heap-Decrease-Key example in three frames (a) to (c). The decreased key 7 is exchanged first with its parent key 16 and then with 10; the pointers y and z mark the current node and its parent.]
Minimum Spanning Trees
(Greedy Algorithms)
Graph
• Graph: A set of nodes V
A set of edges E from V x V
V = {v1, v2, v3, v4}
E = {(v1,v2), (v2,v3), (v3,v4), (v2,v4)}
[Figure: the graph on nodes v1, v2, v3, v4.]
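Path
• Graph G=(V,E)
• A path is a series of edges linked one by one: each edge starts at the node where the previous edge ends
• Loop: a path that starts and ends at the same node
Tree
• A graph is connected if every two nodes have a path to connect them
• A tree is a connected graph without a loop
[Figure: a connected graph and a tree.]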
• Every connected graph can be converted into tree by removing some edges
Removing one edge on a loop does not damage the connectivity.
A tree is a minimal connected graph
• Removing any edge on a tree damages the connectivity
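Proof. Let T=(V,E) be a tree.
Remove (v1, v2) from T to get T'=(V, E-{(v1,v2)}).
If T' is still connected, it has a path from v1 to v2. Together with the edge (v1, v2), this path forms a loop in T. Contradiction!
Number of edges in a tree
• Each tree with at least two nodes has a node with only one edge
Proof. Start from one node and build a path that never reuses an edge. The path must end at a node with only one edge; otherwise it returns to a node it has already visited, and the tree has a loop.
• Each tree of n nodes has n-1 edges
Proof. By induction. It is true for n=1,2.
Assume it is true at case n.
At case n+1, find a node with only one edge and remove it together with that edge. The remaining tree has n nodes, so by the inductive assumption it has n-1 edges. Hence the original tree has n edges.
Unique path on tree
• Every two nodes in a tree have a unique path.
Proof. If two nodes had two different paths, the two paths together would contain a loop.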
Weighted Graph
Many graph problems have weighted edges
All weights are positive values here
[Figure: a weighted graph G=(V,E) on nodes a, b, c, d, e, f, g, h.]
Minimum Spanning Trees (MST)
Find the lowest-cost way to connect all of the points
(the cost is the sum of the weights of the selected edges).
The solution must be a tree. (Why?)
A spanning tree: a subgraph that is a tree and connects all of the points.
A graph may contain an exponential number of spanning trees
(e.g., the number of spanning trees of a complete graph on n nodes is $n^{n-2}$).
A High-Level Greedy Algorithm for MST
The algorithm grows an MST one edge at a time and maintains the invariant that the edge set A is always a subset of some MST.
An edge is safe if it can be added to A without destroying the invariant.
How to check that T is a spanning tree of G? How to select a "safe" edge?
A = ∅;
while (T=(V,A) is not a spanning tree of G) {
    select a safe edge and add it to A;
}
MST Basic Lemma
Let V = V1 ∪ V2, where V1 and V2 have no intersection, and let
(V1,V2) = {uv | u ∈ V1 and v ∈ V2}.
If xy ∈ (V1,V2) and w(xy) = min{w(uv) | uv ∈ (V1,V2)}, then xy is contained in an MST.
[Figure: the example graph on nodes a to h, split into the two parts V1 and V2.]
Proof
• Claim: the edge xy with the minimal weight w(xy) among the edges connecting V1 and V2 is an extension toward an MST.
• Otherwise, take an MST that does not contain xy and add xy to it. This creates a loop, and the loop contains another edge uv connecting V1 and V2. Replacing uv by xy gives a spanning tree whose weight is not larger, so it is an MST that contains xy.
Kruskal’s Algorithm (pseudo code 1)
A = ∅;
for (each edge in order by nondecreasing weight)
    if (adding the edge to A doesn't create a cycle) {
        add it to A;
        if (|A| == n-1) break;
    }
How to check that adding an edge does not create a cycle?
Kruskal’s Algorithm (Example)
[Figure: eight frames on the weighted example graph with nodes a to h. The edges are examined in order of nondecreasing weight, and an edge is added whenever it does not create a cycle.]
MST cost = 17
Kruskal’s Algorithm (pseudo code 2)
A = ∅;
initial(n); // for each node x construct a set {x}
for (each edge xy in order by nondecreasing weight)
    if (!find(x, y)) {
        union(x, y);
        add xy to A;
        if (|A| == n-1) break;
    }
find(x, y) = true iff x and y are in the same set
union(x, y): unite the two sets that contain x and y, respectively.
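The pseudo code with find and union can be sketched in C++ as follows. This is an illustration under assumptions, not the course's reference code: nodes are numbered 0..n-1, the names Edge, DisjointSets, and kruskal are hypothetical, union is spelled unite because union is a C++ keyword, and the path-halving step inside root is an added optimization.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int x, y; long long w; };

// A simple union-find structure: every set is a tree of parent pointers.
struct DisjointSets {
    std::vector<int> parent;
    explicit DisjointSets(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);   // initial(n): each node is its own set
    }
    int root(int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];            // path halving (added optimization)
            v = parent[v];
        }
        return v;
    }
    bool find(int x, int y) { return root(x) == root(y); }   // same set?
    void unite(int x, int y) {
        int rx = root(x), ry = root(y);
        parent[rx] = ry;
    }
};

// Returns the total weight of the chosen edges; the edges are appended to *chosen if given.
long long kruskal(int n, std::vector<Edge> edges, std::vector<Edge>* chosen = nullptr) {
    assert(n >= 1);
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    DisjointSets sets(n);
    long long cost = 0;
    int taken = 0;
    for (const Edge& e : edges) {
        if (sets.find(e.x, e.y)) continue;   // adding e would create a cycle
        sets.unite(e.x, e.y);
        cost += e.w;
        ++taken;
        if (chosen) chosen->push_back(e);
        if (taken == n - 1) break;           // |A| == n-1
    }
    return cost;
}
```

If the input graph is not connected, the loop ends with fewer than n-1 edges and the result is a spanning forest.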
Prim’s Algorithm (pseudo code 1)
ALGORITHM Prim(G)
// Input: A weighted connected graph G=(V,E)
// Output: An MST T=(V, A)
VT ← { v0 }   // Any vertex will do
A ← ∅
for i ← 1 to |V|-1 do
    find an edge xy ∈ (VT, V-VT) s.t. its weight is minimized among all edges in (VT, V-VT)
    VT ← VT ∪ { y }
    A ← A ∪ { xy }
Prim’s Algorithm (Example)
[Figure: eight frames on the weighted example graph with nodes a to h. In every frame the tree grows by the minimum-weight edge that leaves the current tree.]
MST cost = 17
Prim’s Algorithm (pseudo code 2)
Build a priority queue Q for V with key[u] = ∞ for every u ∈ V;
key[v0] = 0; π[v0] = Nil; // Any vertex will do
while (Q ≠ ∅) {
    u = Extract-Min(Q);
    for (each v ∈ Adj(u))
        if (v ∈ Q && w(u, v) < key[v]) {
            π[v] = u;
            key[v] = w(u, v);
            Change-Priority(Q, v, key[v]);
        }
}
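A hedged C++ sketch of the priority-queue version follows. std::priority_queue has no Change-Priority operation, so this sketch swaps in a common substitute: it pushes a new entry instead and skips stale entries when they are extracted. The names Adj and primCost are hypothetical, and the graph is assumed to be connected and undirected, with each edge stored in both adjacency lists.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds pairs (v, w(u,v)).
using Adj = std::vector<std::vector<std::pair<int, long long>>>;

// Returns the cost of a minimum spanning tree, grown from vertex v0.
long long primCost(const Adj& adj, int v0 = 0) {
    assert(!adj.empty());
    std::vector<bool> inTree(adj.size(), false);
    typedef std::pair<long long, int> Item;          // (key, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;
    q.push(Item(0, v0));
    long long cost = 0;
    while (!q.empty()) {
        Item top = q.top();                          // Extract-Min
        q.pop();
        int u = top.second;
        if (inTree[u]) continue;                     // stale entry: u was extracted earlier
        inTree[u] = true;
        cost += top.first;
        for (const auto& e : adj[u])
            if (!inTree[e.first])
                q.push(Item(e.second, e.first));     // stands in for Change-Priority
    }
    return cost;
}
```

With lazy deletion the queue can hold up to m entries, so the running time is O(m log m) = O(m log n), the same bound as the analysis for the heap version.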
Minimum Spanning Tree (analysis)
Let n = |V(G)|, m = |E(G)|.
Execution time of Kruskal’s algorithm (union-find operations, with the edges ordered by a binomial heap):
O(m log m) = O(m log n)
Running time of Prim’s algorithm (adjacency lists + (binary or ordinary) heap):
O((m+n) log n) = O(m log n)
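Problem: Find the minimum spanning tree with Prim’s algorithm. Start from the node a. Show the steps.
[Figure: a weighted graph on nodes a, b, c, d, e, f, g, h. The edge-weight labels, in the order extracted from the figure, are 1, 6, 2, 10, 7, 4, 5, 8, 2, 3, 4, 9.]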
Problem 1
1. Give asymptotic upper and lower bounds for T(n) in each of the following recurrences. Assume that T(n) is constant for n ≤ 2. Make your bounds as tight as possible, and justify your answers.
• a) T(n)=8T(n/2)+
• b) T(n)=2T(n/4)+
• c) T(n)=T(n-1)+
• d) T(n)= T( ) +1
5nn
2nn
Problem 1. a)
Upper bound: $O(n^5)$ (by the Simplified Master Theorem)
Lower bound: $\Omega(n^5)$ (by the recursion)
Problem 1. b)
Upper bound: $O(\sqrt{n}\log n)$ (by the Simplified Master Theorem, Case 2)
Lower bound: $\Omega(\sqrt{n}\log n)$ (by the recursion tree analysis)
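Problem 1. c)
Upper bound: $O(n^4)$
Lower bound: $\Omega(n^4)$
Problem 1. c)
We have
$$\begin{aligned}
T(n) &= T(n-1) + n^3 \\
     &= T(n-2) + (n-1)^3 + n^3 \\
     &= T(n-3) + (n-2)^3 + (n-1)^3 + n^3 \\
     &= \cdots \\
     &= 1^3 + 2^3 + \cdots + (n-1)^3 + n^3 \\
     &= \Theta(n^4)
\end{aligned}$$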
Problem 1. d)
• Upper bound: $O(\log\log n)$
• Lower bound: $\Omega(\log\log n)$
Problem 1. d)
Upper bound: $O(n^2)$ (by the Simplified Master Theorem, Case 3)
Lower bound: $\Omega(n^2)$ (by the recursion tree)
Problem 2
2. Let A[0...n-1] be an array of n distinct integers. A pair (A[i], A[j]) is said to be an inversion if these numbers are out of order, i.e., i<j but A[i]>A[j]. Design an O(n log n) time algorithm for counting the number of inversions.
Solution
• Revise merge sort.
• When merging two sorted sub-arrays, compare the two front elements.
a) Remove the front left element if it is less than or equal to the front right element.
b) Otherwise the front right element is smaller than every element that remains in the left half: increase the counter by the number of elements remaining in the left half, and remove the front right element.
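The revised merge step can be sketched as follows. This is a minimal illustration, assuming the array is passed as a std::vector<int>; the function names mergeCount and countInversions are hypothetical, and half-open index ranges [lo, hi) are used instead of the inclusive ranges of the course's mergesort.

```cpp
#include <cassert>
#include <vector>

// Sorts a[lo, hi) and returns the number of inversions inside that range.
long long mergeCount(std::vector<int>& a, std::vector<int>& tmp, int lo, int hi) {
    if (hi - lo < 2) return 0;
    int mid = lo + (hi - lo) / 2;
    long long count = mergeCount(a, tmp, lo, mid) + mergeCount(a, tmp, mid, hi);
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (a[i] <= a[j]) {
            tmp[k++] = a[i++];       // case a): front left is not larger, no inversion
        } else {
            count += mid - i;        // case b): a[j] is smaller than all remaining left elements
            tmp[k++] = a[j++];
        }
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = tmp[k];
    return count;
}

// Takes the array by value so that the caller's copy is left unsorted.
long long countInversions(std::vector<int> a) {
    std::vector<int> tmp(a.size());
    return mergeCount(a, tmp, 0, static_cast<int>(a.size()));
}
```

The work per merge is linear, so the running time satisfies T(n) = 2T(n/2) + O(n) = O(n log n).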
Problem 3: Bubble, Merge, and Heap Sort
a) int bubblesort(int *a, int size)
b) int mergesort(int *a, int size)
c) int generate(int *a, int size)
d) Test both with 10, 100, 1000, 10000, 100000, 1000000, and 4000000 integers.
Merge
void merge(long int *a, long int lo, long int m, long int hi)
{
    long int i, j, k;
    i = 0; j = lo;
    // copy the first half of array a to the auxiliary array b
    while (j <= m)
        b[i++] = a[j++];
    i = 0; k = lo;
    // copy back the next-greatest element each time
    while (k < j && j <= hi)
        if (b[i] <= a[j])
            a[k++] = b[i++];
        else
            a[k++] = a[j++];
    // copy back the remaining elements of the first half (if any)
    while (k < j)
        a[k++] = b[i++];
}
Mergesort
void mergesort(long int *a, long int lo, long int hi){
if (lo<hi) {
long int m=(lo+hi)/2;
mergesort(&a[0], lo, m);
mergesort(&a[0], m+1, hi);
merge(&a[0], lo, m, hi);
}
}
The beginning of the program
#include <cstdio>    // printf, scanf
#include <cstdlib>   // rand
#include <ctime>     // time
#define ARRAYSIZE 100000
long int array[ARRAYSIZE];
long int b[ARRAYSIZE];   // auxiliary array used by merge
void merge(long int a[], long int lo, long int m, long int hi);
void mergesort(long int a[], long int lo, long int hi);
void swap(long int *element1Ptr, long int *element2Ptr);
void bubbleSort(long int *array, const long int size);
Main
int main(void)
{
    time_t t1, t2;
    long int option, i;
    printf("Enter 1 for merge sort or 2 for bubble sort\n");
    scanf("%ld", &option);   // option is a long int, so it is read with %ld
    for (i = 0; i < ARRAYSIZE; i++)
        array[i] = rand();   /* load random values */
    if (option == 1) {
        t1 = time(NULL);
        mergesort(&array[0], 0, ARRAYSIZE - 1);
        t2 = time(NULL);
    } else {
        t1 = time(NULL);
        bubbleSort(&array[0], ARRAYSIZE);
        t2 = time(NULL);
    }
    // report the measured time (one-second resolution)
    printf("Elapsed time: %ld seconds\n", (long)(t2 - t1));
    return 0;
}
Homework 2
• The knapsack problem is that given a set of positive integers {a1,…, an}, and a knapsack of size s, find a subset A of {a1,…, an} such that the sum of elements in A is the largest, but at most s.
• Part 1. Use the dynamic programming method to design the algorithm for the knapsack problem. Prove the correctness of your algorithm. Show the computational time of your algorithm carefully.
Homework 2
Part 2. Use C++ to implement the function below
int knapsack(int *a,               // the input integers
             int n,                // the number of input integers
             int s,                // knapsack size
             int *subset,          // subset elements
             int &size_of_subset   // the number of items in the subset
)
Test your program for the following knapsack problem:
Input list: 5, 23, 27, 37, 48, 51, 63, 67, 71, 75, 79, 83, 89, 91, 101, 112, 121, 132, 137, 141, 143, 147, 153, 159, 171, 181, 190, 191 with knapsack size 595.
Print out the subset and the sum of its elements.
Also print out your source code.
Single-Source Shortest Paths
Shortest-paths with Source s (Example)
[Figure: the original graph G on nodes s, t, u, v, w, x, y, z with edge weights 1 and 10 (left), and the shortest paths with source s (right).]
Shortest-path problem
• Find the shortest path in a graph.
• G=(V,E) is a weighted directed graph.
• Weight function w: E → R assigns a weight to each edge.
• p=(v0,v1,…,vk) is a path from v0 to vk.
Shortest-path problem
• Define the weight of the path p:
$w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$
• Define the shortest distance from node u to node v:
$\delta(u,v) = \min\{ w(p) : p \text{ is a path from } u \text{ to } v \}$ if there exists a path from u to v, and $\delta(u,v) = \infty$ otherwise.
Shortest-path tree rooted at s
• For graph G=(V,E), its shortest-path tree rooted at s is G'=(V',E'), which satisfies:
– V' is the set of all nodes reachable from s.
– G' is a tree with s as its root.
– In G', the path from s to v is a shortest path from s to v in G.
Shortest-path tree rooted at s (Example)
[Figure: the original graph G on nodes s, t, u, v, w, x, y, z with edge weights 1 and 10 (left), and its shortest-path tree rooted at s on nodes s, t, v, w, x, y, z (right).]
Predecessor graph
• For graph G=(V,E), follow table π to build Gπ=(Vπ,Eπ), which satisfies:
– π[s]=NIL, and s∈Vπ.
– If π[v]≠NIL, then (π[v],v)∈Eπ and v∈Vπ.
• The shortest-path tree rooted at s is an example of a predecessor graph.
Predecessor graph Example
[Figure: the original graph G (left) and its shortest-path tree rooted at s (right).]
v:     s    t   u    v   w   x   y   z
π[v]:  NIL  s   NIL  t   s   t   x   v
Initialize-Single-Source Algorithm
• Define d[v] to be the length of the shortest path from s to v found so far (an upper bound on the shortest distance).
• Let π[v] be the node just before v on that path.
• Initially, d[v]=∞, π[v]=NIL, d[s]=0. Except for the shortest path from s to s, everything else is unknown.
Initialize-Single-Source Algorithm
Initialize-Single-Source(G,s)
{ for each vertex v∈V[G]
    do d[v] ← ∞
       π[v] ← NIL
  d[s] ← 0
}
Relaxation Algorithm
• Use the edge (u,v) to improve the currently known shortest path to v.
Relax(u,v,w)
{ if d[v] > d[u]+w(u,v)
    then d[v] ← d[u]+w(u,v)
         π[v] ← u
}
Relaxation Example
Before Relax(u,v,w): d[u]=4 and d[v]=7.
• If w(u,v)=2 (<3): after Relax(u,v,w), d[v]=6. The shortest path from s to v is renewed and π[v] ← u.
• If w(u,v)=4 (>3): after Relax(u,v,w), d[v]=7. The shortest distance from s to v is not updated.
Shortest Path and Relaxation
• Triangle inequality:
For every edge (u,v), δ(s,v) <= δ(s,u)+w(u,v).
• Upper-bound property: δ(s,v) <= d[v]; d[v] is always an upper bound for the shortest distance from s to v. Once d[v]=δ(s,v), Relaxation does not update d[v] any more.
Shortest Path and Relaxation
• No path:
If there is no path from s to v, then d[v]=δ(s,v)=∞.
• Convergence property: If the shortest path from s to v has (u,v) as its last edge and d[u]=δ(s,u), then Relax(u,v,w) makes d[v]=δ(s,v).
Shortest Path and Relaxation
• Path-relaxation property: If p=(v0,v1,…,vk) is a shortest path from s=v0 to vk, then executing Relax(v0,v1,w), Relax(v1,v2,w), …, Relax(vk-1,vk,w) in this order achieves d[vk]=δ(s,vk).
• Predecessor-graph property: After a series of Relaxations, when d[v]=δ(s,v) for every node v, the corresponding predecessor graph Gπ is a shortest-path tree rooted at s.
Bellman-Ford Algorithm
• It computes the shortest paths for a graph without a negative loop.
Bellman-Ford(G,w,s)
{ Initialize-Single-Source(G,s)
  for i = 1 to |V|-1
    do for each edge (u,v)∈E
         do Relax(u,v,w)
  for each edge (u,v)∈E
    do if d[v] > d[u]+w(u,v)
         then return false   // Negative loop
  return true                // Success
}
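A minimal C++ sketch of the pseudo code above, under stated assumptions: nodes are numbered 0..n-1, the graph is given as an edge list, and the names Arc, kInf, and bellmanFord are hypothetical. The extra test d[u] != kInf only prevents arithmetic on the "infinite" value.

```cpp
#include <cassert>
#include <limits>
#include <vector>

struct Arc { int u, v; long long w; };   // directed edge (u,v) with weight w
const long long kInf = std::numeric_limits<long long>::max();

// Fills d and pi; returns false if a negative loop is reachable from s.
bool bellmanFord(int n, const std::vector<Arc>& arcs, int s,
                 std::vector<long long>& d, std::vector<int>& pi) {
    assert(0 <= s && s < n);
    d.assign(n, kInf);                    // Initialize-Single-Source
    pi.assign(n, -1);                     // -1 plays the role of NIL
    d[s] = 0;
    for (int i = 1; i <= n - 1; ++i)      // |V|-1 passes over all edges
        for (const Arc& a : arcs)
            if (d[a.u] != kInf && d[a.u] + a.w < d[a.v]) {   // Relax(u,v,w)
                d[a.v] = d[a.u] + a.w;
                pi[a.v] = a.u;
            }
    for (const Arc& a : arcs)             // one more pass detects a negative loop
        if (d[a.u] != kInf && d[a.u] + a.w < d[a.v])
            return false;
    return true;
}
```

The two nested loops make the O(|V||E|) bound visible directly.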
Bellman-Ford Algorithm Example
[Figure: frames (a) to (e) on a five-node graph with source s and edge weights 6, 7, 8, 9, 2, -3, -4, -2, 5, 7. The d values are 0 for s and ∞ for every other node in frame (a); in the last frames they are 0, 2, 7, 4, -2, in the order extracted from the figure.]
Bellman-Ford Algorithm Analysis
• Correctness: every shortest simple path from s has at most |V|-1 edges, and each pass relaxes every edge, including the next edge of such a path. By the path-relaxation property, after |V|-1 passes, d[v]=δ(s,v) for every node v reachable from s.
Bellman-Ford Algorithm Analysis
• Time Complexity:
– Initialize-Single-Source takes O(|V|) steps.
– Each edge is relaxed |V|-1 times, so Relaxation costs O(|E||V|) steps in total.
– Finally, it spends O(|E|) steps to check whether there is a negative loop.
• Total time: O(|V||E|).
Dijkstra Algorithm
• Can only handle graphs without negative edges.
• It is faster than the Bellman-Ford algorithm because it selects an order in which to do Relaxation.
• Use a priority queue for the implementation.
• Main idea: use the convergence property.
Dijkstra Algorithm
Q: Priority queue with d as the key
Dijkstra(G,w,s)
{ Initialize-Single-Source(G,s)
Q=V[G]
while Q is not empty
do u=Extract-Min(Q)
for each v∈adj[u]
do Relax(u,v,w)
}
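The pseudo code puts all of V[G] into Q and relies on the key d decreasing inside Relax. std::priority_queue cannot decrease a key, so the sketch below substitutes a common variant: a node is pushed again whenever its d value improves, and outdated entries are skipped on extraction. The names Graph, kInf, and dijkstra are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds pairs (v, w(u,v)).
using Graph = std::vector<std::vector<std::pair<int, long long>>>;
const long long kInf = std::numeric_limits<long long>::max();

std::vector<long long> dijkstra(const Graph& adj, int s) {
    std::vector<long long> d(adj.size(), kInf);   // Initialize-Single-Source
    typedef std::pair<long long, int> Item;       // (d value, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;
    d[s] = 0;
    q.push(Item(0, s));
    while (!q.empty()) {
        Item top = q.top();                       // Extract-Min
        q.pop();
        int u = top.second;
        if (top.first > d[u]) continue;           // outdated entry: u was already processed
        for (const auto& e : adj[u]) {
            assert(e.second >= 0);                // Dijkstra requires non-negative weights
            if (d[u] + e.second < d[e.first]) {   // Relax(u,v,w)
                d[e.first] = d[u] + e.second;
                q.push(Item(d[e.first], e.first));
            }
        }
    }
    return d;                                     // kInf marks unreachable nodes
}
```

With this variant the queue holds at most |E|+1 entries, which keeps the O(|E| log |V|) bound of the binary-heap implementation.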
Dijkstra Algorithm Example
[Figure: frames (a) to (e) on a five-node graph with source s and edge weights 10, 5, 1, 2, 3, 2, 7, 9, 4, 6. The d values change from (0, ∞, ∞, ∞, ∞) in frame (a) to (0, 10, ∞, 5, ∞), (0, 8, 14, 5, 7), (0, 8, 11, 5, 7), and finally (0, 8, 9, 5, 7), in the order extracted from the figure.]
Dijkstra Algorithm Analysis
• Different priority queues give different costs.
• Linear array: costs O(|V|^2) steps.
• Binary heap: costs O(|E|log|V|) steps.
• Fibonacci heap: costs O(|E|+|V|log|V|) steps.
Single-source shortest paths in DAGs
• Different from Bellman-Ford: it follows a topological order to do Relaxation, so it can find the shortest paths in shorter time.
DAG-Shortest-Path(G,w,s)
{ Topologically sort V[G]
  Initialize-Single-Source(G,s)
  for each u taken in topological order
    do for each v∈adj[u]
         do Relax(u,v,w)
}
• Costs O(|V|+|E|) steps.
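The procedure can be sketched in C++ as below. The slides do not say how the topological sort is done; this sketch assumes a queue-based method that repeatedly removes nodes of in-degree 0. The names Graph, kInf, and dagShortestPaths are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// adj[u] holds pairs (v, w(u,v)); weights may be negative.
using Graph = std::vector<std::vector<std::pair<int, long long>>>;
const long long kInf = std::numeric_limits<long long>::max();

std::vector<long long> dagShortestPaths(const Graph& adj, int s) {
    const int n = static_cast<int>(adj.size());
    // Topological sort: repeatedly output a node whose in-degree has dropped to 0.
    std::vector<int> indeg(n, 0);
    for (int u = 0; u < n; ++u)
        for (const auto& e : adj[u]) ++indeg[e.first];
    std::vector<int> order;
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) order.push_back(u);
    for (std::size_t i = 0; i < order.size(); ++i)
        for (const auto& e : adj[order[i]])
            if (--indeg[e.first] == 0) order.push_back(e.first);
    assert(static_cast<int>(order.size()) == n);   // fails if the graph has a loop

    std::vector<long long> d(n, kInf);             // Initialize-Single-Source
    d[s] = 0;
    for (int u : order) {
        if (d[u] == kInf) continue;                // u is not reachable from s
        for (const auto& e : adj[u])
            if (d[u] + e.second < d[e.first])      // Relax(u,v,w)
                d[e.first] = d[u] + e.second;
    }
    return d;
}
```

Every node and every edge is handled a constant number of times, which gives the O(|V|+|E|) cost.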
DAG-Shortest-Path Example
[Figure: frames (a) to (g) on a six-node DAG drawn in topological order, with source s as the second node and edge weights 5, 2, 7, -1, -2, 6, 1, 3, 4, 2. The d values start at (∞, 0, ∞, ∞, ∞, ∞) and are updated node by node in topological order.]
Problem: apply Dijkstra's algorithm to find the shortest paths
to all nodes from s. Show how d[v] changes at every v.
[Figure: the graph G on nodes s, t, u, v, w, x, y, z with edge weights 1 and 10.]
Bipartite Matching
Lecture 3: Jan 17
Bipartite Matching
A graph is bipartite if its vertex set can be partitioned
into two subsets A and B so that each edge has one
endpoint in A and the other endpoint in B.
A matching M is a subset of edges so that
every vertex has degree at most one in
M.
A B
The bipartite matching problem:
Find a matching with the maximum number of edges.
Maximum Matching
A perfect matching is a matching in which every vertex is matched.
The perfect matching problem: Is there a perfect matching?
• Greedy method?
(add an edge with both endpoints unmatched)
First Try
Key Questions
• How to tell if a graph does not have a (perfect) matching?
• How to determine the size of a maximum matching?
• How to find a maximum matching efficiently?
Hall’s Theorem [1935]:
A bipartite graph G=(A,B;E) has a matching that “saturates” A
if and only if |N(S)| >= |S| for every subset S of A.
SN(S)
Existence of Perfect Matching
König [1931]:
In a bipartite graph, the size of a maximum matching
is equal to the size of a minimum vertex cover.
What is a good upper bound on the size of a maximum matching?
Min-max theorem NP and co-NP
Implies Hall’s theorem.
Bound for Maximum Matching
König [1931]:
In a bipartite graph, the size of a maximum matching
is equal to the size of a minimum vertex cover.
Any idea to find a larger matching?
Algorithmic Idea?
Given a matching M, an M-alternating path is a path that alternates
between edges in M and edges not in M. An M-alternating path
whose endpoints are unmatched by M is an M-augmenting path.
Augmenting Path
What if there is no more M-augmenting path?
Prove the contrapositive:
A bigger matching M' ⇒ an M-augmenting path
1. Consider the symmetric difference M ⊕ M'
2. Every vertex in M ⊕ M' has degree at most 2
3. A component in M ⊕ M' is an even cycle or a path
4. Since |M'| > |M|, some path component has more edges of M' than of M: an M-augmenting path!
If there is no M-augmenting path, then M is maximum!
Optimality Condition
Algorithm
Key: M is maximum ⇔ no M-augmenting path
How to find an M-augmenting path efficiently?
Finding M-augmenting paths
• Orient the edges (edges in M go up, others go down)
• An M-augmenting path ⇔
a directed path between two unmatched vertices
Complexity (n vertices, m edges)
• At most n iterations
• An augmenting path in O(m) time by a DFS or a BFS
• Total running time O(nm)
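The augmenting-path algorithm can be sketched with a DFS as follows. This is an illustration under assumptions: the two sides are numbered 0..|A|-1 and 0..|B|-1, the class name BipartiteMatcher and its members are hypothetical, and the orientation of the edges is implicit (the search leaves A along any edge and returns from B only along a matching edge).

```cpp
#include <cassert>
#include <vector>

struct BipartiteMatcher {
    std::vector<std::vector<int>> adj;   // adj[a] = neighbours of a in B
    std::vector<int> matchB;             // matchB[b] = partner of b in A, or -1 if unmatched
    std::vector<bool> visited;           // nodes of B reached by the current search

    BipartiteMatcher(std::vector<std::vector<int>> adjA, int sizeB)
        : adj(adjA), matchB(sizeB, -1) {}

    // DFS for an M-augmenting path starting at the unmatched node a of A.
    bool augment(int a) {
        for (int b : adj[a]) {
            if (visited[b]) continue;
            visited[b] = true;
            // b is unmatched, or the partner of b can be re-matched elsewhere
            if (matchB[b] == -1 || augment(matchB[b])) {
                matchB[b] = a;           // flip the edges along the path
                return true;
            }
        }
        return false;
    }

    // At most |A| searches, each taking O(m) time.
    int maximumMatching() {
        int size = 0;
        for (int a = 0; a < static_cast<int>(adj.size()); ++a) {
            visited.assign(matchB.size(), false);
            if (augment(a)) ++size;
        }
        return size;
    }
};
```

A node of A that fails to find an augmenting path is not tried again; this is a standard property of the method and is assumed here without proof.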
Hall’s Theorem [1935]:
A bipartite graph G=(A,B;E) has a matching that “saturates” A
if and only if |N(S)| >= |S| for every subset S of A.
König [1931]:
In a bipartite graph, the size of a maximum matching
is equal to the size of a minimum vertex cover.
Idea: consider why the algorithm got stuck…
Minimum Vertex Cover
Observation: Many short and disjoint augmenting paths.
Idea: Find augmenting paths simultaneously in one search.
Faster Algorithms
• Matching
• Determinants
• Randomized algorithms
Bonus problem 1 (50%):
Given a bipartite graph with red and blue edges,
find a deterministic polynomial time algorithm to determine
if there is a perfect matching with exactly k red edges.
Randomized Algorithm
Application of Bipartite Matching
[Figure: a bipartite graph between the people Jerry, Darek, Tom, Isaac and the jobs Marking, Tutorials, Solutions, Newsgroup.]
Job Assignment Problem:
Each person is willing to do a subset of jobs.
Can you find an assignment so that all jobs are taken care of?
Application of Bipartite Matching
With Hall’s theorem, now you can determine exactly
when a partial chessboard can be filled with dominos.
Application of Bipartite Matching
Latin Square: a nxn square, the goal is to fill the square
with numbers from 1 to n so that:
• Each row contains every number from 1 to n.
• Each column contains every number from 1 to n.
Application of Bipartite Matching
Now suppose you are given a partial Latin Square.
Can you always extend it to a Latin Square?
With Hall’s theorem, you can prove that the answer is yes.
Homework 2
• Problem 1. Bitonic Euclidean Traveling Salesman problem.
Problem 1
• Sort the points by x-coordinate and number them 1,…,n
• Define C(i, j): the minimal cost of a tour from i to 1 (to the leftmost point) and from 1 to j (to the rightmost point).
• Identify the recursion for C(i,j)
Recursion
• Case i>j+1:
$C(i,j) = dist(i,i-1) + C(i-1,j)$
• Case j>i+1:
$C(i,j) = dist(j-1,j) + C(i,j-1)$
• Case j=i+1:
$C(i,j) = \min\{ dist(j,i-1) + C(i,i-1),\ dist(i,i-1) + C(i-1,j) \}$
• Case j=i:
$C(i,i) = \min\{ dist(i,i-1) + C(i-1,i),\ dist(i,i-1) + C(i,i-1) \}$
• Case i=j+1:
$C(i,i-1) = \min\{ dist(i,i-2) + C(i-2,i-1),\ dist(i-1,i-2) + C(i,i-2) \}$
Time
• Each C(i,j) needs to deal with O(1) cases.
• Output C(n,n).
• Total time is $O(n^2)$
Problem 2
• Printing Neatly problem.
• The extra space in a line that holds word i, word i+1, …, word j is
$M - j + i - \sum_{k=i}^{j} l_k$
where M is the line width and $l_k$ is the length of word k.
• Minimize the sum of the cubes of the extra space for all lines except the last.
Problem 2
• Define the cube of the extra space of a line for printing word i, word i+1, …, word j:
$line(i,j) = \left( M - j + i - \sum_{k=i}^{j} l_k \right)^3$
• Define C(k) to be the cost for printing word k, word k+1, …, word n.
Recursion
• If word k, word k+1, …, word n can fit into one row, then C(k)=0.
• Otherwise, assume h is the maximal number of words, starting from word k, that fit into one row:
$C(k) = \min_{k \le g < k+h} \{ line(k,g) + C(g+1) \}$
Time
• Each C(k) takes O(n) time.
• Total time is $O(n^2)$
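The recursion for C(k) can be evaluated from the last word backwards. The sketch below is an illustration, assuming 0-based word indices, one space between neighbouring words, and that every word fits into a line by itself; the function name printNeatlyCost is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// len[k] = length of word k, M = line width. Returns C(0), the minimal total cost.
long long printNeatlyCost(const std::vector<int>& len, int M) {
    const int n = static_cast<int>(len.size());
    for (int l : len) assert(l >= 1 && l <= M);   // assumption: each word fits on a line
    std::vector<long long> C(n + 1, 0);           // C[n] = 0: nothing left to print
    for (int k = n - 1; k >= 0; --k) {
        C[k] = std::numeric_limits<long long>::max();
        long long used = -1;                      // characters used by words k..g with single spaces
        for (int g = k; g < n; ++g) {
            used += len[g] + 1;
            if (used > M) break;                  // words k..g no longer fit into one row
            if (g == n - 1) {                     // words k..n-1 fit into the last row
                C[k] = 0;
                break;
            }
            long long extra = M - used;           // extra space of this row
            C[k] = std::min(C[k], extra * extra * extra + C[g + 1]);
        }
    }
    return C[0];
}
```

For word lengths 3, 2, 2, 5 and M = 6, the best layout puts the first word alone (extra space 3), the next two words together (extra space 1), and the last word on the last line, for a cost of 27 + 1 = 28.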
Problem: Find an augmenting path to improve the red matching
Midterm
• >=90: 2
• 80-89: 3
• 70-79: 4
• 60-70: 5
• <60 : 2
Problem 1
Solve the following recursive equations with big-O notation:
T(n)=T(n-2)+n^3, with T(1)=1.
T(n)=16T(n/2)+n^2 , with T(1)=1.
Simplified Master Theorem
Let
$T(n) = aT(n/b) + cn^r$
be a recursive equation on the nonnegative integers,
where a>0, b>1, c>0, and r>0 are constants.
Then,
• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$
• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$
• 3. If $r > \log_b a$, then $T(n) = O(n^r)$
Problem 1
a) T(n)=T(n-1)+n^3, with T(1)=1.
Solution: T(n)=O(n^4)
b) T(n)=16T(n/2)+n^2, with T(1)=1.
Solution: T(n)=O(n^4)
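As a check of the two answers, the reasoning can be written out as follows. This is a sketch that assumes the Simplified Master Theorem in the form $T(n) = aT(n/b) + cn^r$; part a) is not of that form, so it is unrolled directly.

```latex
% a) unroll the recurrence; T(1) = 1 = 1^3 and every term is at most n^3
T(n) = T(n-1) + n^3 = \sum_{i=1}^{n} i^3 \le n \cdot n^3 = O(n^4)

% b) a = 16, b = 2, c = 1, r = 2
\log_b a = \log_2 16 = 4 > 2 = r
\;\Longrightarrow\; T(n) = O(n^{\log_b a}) = O(n^4) \quad \text{(case 1)}
```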
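Problem 2
• Delete 7
Initial tree:
5:9
3:5 7:9
1:3 4:5 6:7 8:9
1 3 4 5 6 7 8 9
Step 1: the leaf 7 is removed.
5:9
3:5 7:9
1:3 4:5 6:7 8:9
1 3 4 5 6 8 9
Step 2: node 6:7 has only one son (6), so this son is given to the brother 8:9, which becomes 6:8:9.
5:9
3:5 7:9
1:3 4:5 6:8:9
1 3 4 5 6 8 9
Step 3: node 7:9 has only one son, so it merges with its brother 3:5 into 3:5:9. The old root is left with one son and is removed.
3:5:9
1:3 4:5 6:8:9
1 3 4 5 6 8 9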
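Problem 3
The following is a heap. a) Show the steps to insert a new element 1. b) Show the steps to remove the root after 1 is inserted.
2
7 3
11 8 6 4
Heap Insertion
Step 1: put 1 at the next free leaf position, as a child of 11.
2
7 3
11 8 6 4
1
Step 2: 1 < 11, so 1 and 11 are exchanged.
2
7 3
1 8 6 4
11
Step 3: 1 < 7, so 1 and 7 are exchanged.
2
1 3
7 8 6 4
11
Step 4: 1 < 2, so 1 and 2 are exchanged, and 1 becomes the root.
1
2 3
7 8 6 4
11
Heap Deletion
Step 1: remove the root 1. The root position is empty.
Step 2: the smaller child of the empty position, 2, moves up.
Step 3: the smaller child of the empty position, 7, moves up.
Step 4: the only child of the empty position, 11, moves up.
2
7 3
11 8 6 4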
Problem 5
Apply the Prim’s Algorithm to find the minimum spanning tree. Show each of your steps.
7 10 2 5 5 9 7 1 3 5 2 4
Problem 5
5. (20%) Find an O(nlog n) time algorithm such that given two sets of integers A and B, it determines whether B is a subset of A, where n=max(|A|,|B|), which is the larger size of A and B.
For examples, if A={3, 7,5} and B={3,5}, then the algorithm returns “yes”; and if A={3, 7,5} and B={2,5}, then the algorithm returns “no”.
Problem 6
6. (20%) This is a job scheduling problem with one machine. Each job has a specific time interval to be executed by the machine. In order to allocate some jobs to the machine, all the jobs assigned to the machine must have disjoint time intervals. For example, the list of input jobs has time intervals: [1, 3], [2, 6], [5, 9], [7,13], [11, 15]. There is an overlap between [1,3] and [2,6]. Therefore, [1,3] and [2,6] cannot be assigned to the machine together. Three jobs can be assigned to the machine without overlap as below [1,3], [5,9], and [11,15] (all intervals are disjoint) .
Develop an algorithm for the scheduling problem to get the maximal number of jobs assigned to the machine. Show the time complexity of your algorithm. Hint: you may use a greedy method to solve this problem.
Improve Midterm by 20 points
Rewrite the solution for problem 6, and implement the algorithm with C++.
Submit your solution with test results.
Problem 6
Design an algorithm to test if an undirected graph is connected. A graph is connected if there exists a path between every two vertices. For examples, the left graph is connected, but the right graph is not.
Example
• Vertices a,b,c,d are reachable, but e is not.
a
s
c
b d
e
Problem 6 Solution
• Assign weight one to each edge.
• Apply the minimal spanning tree algorithm
• The graph is connected iff the size of minimal spanning tree is n-1, where n is the number of nodes.
Problem 7
a) Design an O(n log n) time algorithm that given an array of n integers, it finds two elements a and b with |a-b|<5.
b) Improve the algorithm to O(n) time if the n integers in the input are in the range from 1 to 7n.
Problem 7 Examples• Connected Unconnected
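Problem 7 Solution a)
• Sort the array with merge sort. O(n log n) time
• Scan the sorted array and check whether two neighbors have difference < 5. O(n) time.
• Total time is O(n log n)+O(n)=O(n log n)
Problem 7 Solution b)
• Define an array a[] indexed by 1,…,7n and initialize every entry to 0; O(n) time
• For each integer k in the list: if a[k] is already 1, the value k occurs twice and the difference 0 is less than 5; otherwise let a[k]=1; O(n) time
• Check if there exist two 1s with distance less than 5 in array a[]. O(n) time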
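Both solutions can be sketched in C++ as follows. The function names are hypothetical, std::sort stands in for merge sort (both take O(n log n) time), and the second function assumes, as in part b), that every input value lies between 1 and 7n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// a) Sort, then compare neighbours: O(n log n).
bool hasClosePairBySorting(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    for (std::size_t i = 1; i < a.size(); ++i)
        if (static_cast<long long>(a[i]) - a[i - 1] < 5) return true;
    return false;
}

// b) Mark the values in an array of size 7n+1, then scan it: O(n).
bool hasClosePairBounded(const std::vector<int>& a) {
    const long long n = static_cast<long long>(a.size());
    std::vector<char> seen(7 * n + 1, 0);
    for (int v : a) {
        assert(v >= 1 && v <= 7 * n);   // assumption of part b)
        if (seen[v]) return true;       // the same value twice: difference 0
        seen[v] = 1;
    }
    long long last = -1;                // position of the previous 1, if any
    for (long long k = 1; k <= 7 * n; ++k) {
        if (!seen[k]) continue;
        if (last != -1 && k - last < 5) return true;
        last = k;
    }
    return false;
}
```

In the sorted order the closest pair is always a pair of neighbours, which is why checking neighbours is enough in part a).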
Problem 8
Suppose you have one machine and a set of n jobs a1, a2, …, an to process on that machine. Each job aj has a processing time tj, and a profit pj, and a deadline dj. The machine can process only one job at a time, and job aj must run uninterruptedly for tj consecutive time units. If job aj is completed by its deadline dj, you receive a profit pj, but if it is completed after its deadline, you receive a profit 0. Give an algorithm to find the schedule that obtains the maximum amount of profit, assuming that all processing times are integers between 1 and n. What is the running time of your algorithm.
Problem 8 Solution
• Try the dynamic programming method.
• Improve your midterm by working on it again.
• Due March 31 (Tuesday)
NP-completeness
NP Problems
blind monkey
Hamiltonian Path Problem
• Given n cities
• Does there exist a path through each city exactly once?
[Figure: map of airports ORD, PVD, MIA, DFW, SFO, LAX, LGA, HNL.]
Hamiltonian Path
A Hamiltonian path goes through each node exactly once
HAMPATH={G| G is a directed graph with a Hamiltonian path}
[Figure: a directed graph with the path endpoints labeled s and t.]
Polynomial
n: input size
$n^c$ is a polynomial of n, where c does not depend on n.
Examples: $n, n^2, n^3, \ldots, n^{100}, \ldots$
Class P
P is the complexity class consisting of all decision problems that have polynomial-time algorithms
Polynomial-Time Decision Problems
• Decision problems: output is 1 or 0 (“yes” or “no”)
• Examples:
• Is a given circuit satisfiable?
• Does a text T contain a pattern P?
• Does an instance of 0/1 Knapsack have a solution with benefit at least K?
• Does a graph G have an MST with weight at most K?
The Complexity Class P
• A complexity class is a collection of languages
• P is the complexity class consisting of all decision problems that have polynomial-time algorithms
• For each problem L in P, there is a polynomial-time decision algorithm A for L.
– If n=|x|, for x in L (decision with “yes”), then A runs in p(n) time on input x.
– The function p(n) is some polynomial
Verifier
A verifier for a language L is an algorithm V,
L={w| V accepts <w,c> for some string c}
For the verifier V for L, c is a certificate of w if V accepts <w,c>
If the verifier V for the language L runs in polynomial time, V is the polynomial time verifier for L.
Verifier for Hamiltonian Path
For <G,s,t>, a certificate is a list of nodes of G: $v_1, v_2, \ldots, v_m$
Verifier:
Check if m is the number of nodes of G
Check if all nodes are different
Check if $v_1=s$ and $v_m=t$
Check if each $(v_i, v_{i+1})$ is a directed edge of G for i=1,…,m-1
If all pass, accept. Otherwise, reject.
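The verifier steps above can be sketched in Python. The representation (a set of nodes and a set of directed edge pairs) and the function name `verify_hampath` are assumptions for this illustration. Every check takes time polynomial in the size of G, which is what membership in NP requires.

```python
def verify_hampath(nodes, edges, s, t, cert):
    """nodes: set of nodes; edges: set of (u, v) pairs; cert: list of nodes."""
    if len(cert) != len(nodes) or not cert:
        return False                      # m must be the number of nodes
    if set(cert) != set(nodes):
        return False                      # all nodes different, all from G
    if cert[0] != s or cert[-1] != t:
        return False                      # path must run from s to t
    # each consecutive pair must be a directed edge of G
    return all((u, v) in edges for u, v in zip(cert, cert[1:]))

nodes = {1, 2, 3, 4}
edges = {(1, 2), (2, 3), (3, 4), (1, 3)}
print(verify_hampath(nodes, edges, 1, 4, [1, 2, 3, 4]))  # True
print(verify_hampath(nodes, edges, 1, 4, [1, 3, 2, 4]))  # False
```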
NP example (2)
• Problem: Decide if a graph has a Hamiltonian tour with weight at most K
• Verification Algorithm:
1. Test that the tour contains all nodes
2. Test that the tour has weight at most K
• Analysis: Verification takes O(n) time, so the problem can be solved in polynomial time by a non-deterministic algorithm.
• Think about it this way: if we are given such a tour, we can verify it.
Class NP
NP is the class of languages that have polynomial time verifiers.
Examples:
• HAMPATH is in NP
Clique Problem
Given undirected graph G, a clique is a set of nodes of G such that every two nodes are connected by an edge.
A k-clique is a clique with k nodes
[Figure: a 5-clique.]
Clique Problem
CLIQUE={<G,k>| G is an undirected graph with a k-clique}
CLIQUE is in NP.
Subset Sum Problem
SUBSET-SUM={<S,t>| S=$\{x_1,x_2,\ldots,x_k\}$ and for some $\{y_1,y_2,\ldots,y_m\}\subseteq\{x_1,x_2,\ldots,x_k\}$, we have $y_1+y_2+\ldots+y_m=t$}
Polynomial Time Computable
A function $f()$ is a polynomial time computable function if some polynomial time algorithm A exists that outputs $f(w)$ for input w.
Polynomial Time Reduction
Assume that A and B are two languages.
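A is polynomial time mapping reducible to B, written $A \le_P B$, if a polynomial time computable function f exists such that
$w \in A \iff f(w) \in B$
The function f is called a polynomial time reduction of A to B.
Transitivity
• If $A \le_P B$ and $B \le_P C$, then $A \le_P C$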
Boolean Formula
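A literal is either a boolean variable or its negation: $x$ or $\bar{x}$
A clause is the disjunction of several literals: $(x_1\vee\bar{x}_2\vee\bar{x}_3\vee x_4)$
Conjunctive normal form is the conjunction of several clauses: $(x_1\vee\bar{x}_2\vee\bar{x}_3\vee x_4)\wedge(x_3\vee\bar{x}_5\vee x_6)\wedge(x_3\vee\bar{x}_6)$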
3SAT
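A 3-conjunctive normal form formula (3cnf-formula) is a conjunctive normal form formula with at most 3 literals in each clause
3SAT={ $\phi$ | $\phi$ is a satisfiable 3cnf-formula}
Example: $(x_1\vee\bar{x}_2\vee\bar{x}_3)\wedge(x_3\vee\bar{x}_5\vee x_6)\wedge(x_3\vee\bar{x}_6)$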
3SAT to CLIQUE
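Example: a 3cnf-formula with three clauses, each containing one literal on each of $x_1$, $x_2$, $x_3$.
[Figure: the reduction graph, with one node per literal occurrence, arranged in three groups of three.]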
Outline
• P and NP
– Definition of P
– Definition of NP
– Alternate definition of NP
• NP-completeness
– Definition of NP-complete and NP-hard
– The Cook-Levin Theorem
More Outline
• Some NP-complete problems
– Problem reduction
– SAT (and CNF-SAT and 3SAT)
– Vertex Cover
– Clique
– Hamiltonian Cycle
What is a problem
• A language is a set of strings
• A problem is a collection of instances
• An instance can be coded into a string
• A language = a problem
• Size of the problem refers to the length of the string
• An algorithm that solves a problem = a Turing machine that accepts a language
Traveling Salesman Problem
• Given n cities
• Find a shortest path through each city exactly once.
[Figure: map of airports ORD, PVD, MIA, DFW, SFO, LAX, LGA, HNL with flight distances as edge weights.]
Running Time Revisited
• Input size, n
• All the polynomial-time algorithms studied so far in this course run in polynomial time using this definition of input size.
[Figure: the same airport map with flight distances as edge weights.]
Problem
• Given the formula f=$(x_1\vee x_2)\wedge(x_1\vee x_2)\wedge(x_3\vee x_4)$
• construct a graph G such that f is satisfiable iff G has a clique of size 3.
An Interesting Problem
[Figure: a Boolean circuit built from NOT, OR, and AND logic gates, with sample 0/1 values on its inputs, internal wires, and output.]
A Boolean circuit is a circuit of AND, OR, and NOT gates; the CIRCUIT-SAT problem is to determine if there is an assignment of 0’s and 1’s to a circuit’s inputs so that the circuit outputs 1.
CIRCUIT-SAT is in NP
[Figure: the same Boolean circuit of NOT, OR, and AND logic gates with 0/1 values on its inputs and output.]
Non-deterministically choose a set of inputs and the outcome of every gate, then test each gate’s I/O.
If there is an input assignment, we can verify that in polynomial time.
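NP-Completeness
• Reduction: transform each string of one language into a string of another language so that membership is preserved. P-reduction means that each string can be transformed in polynomial time.
• NP-complete class L: L is in NP. For each language M in NP, we can take an input x for M, transform it in polynomial time to an input x’ for L such that x is in M if and only if x’ is in L.
• L is NP-hard if every language in NP is polynomial-time reducible to L; L itself need not be in NP, so an NP-hard problem is at least as hard as every NP-complete problem.
[Figure: every language in NP maps to L by a poly-time reduction.]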
Cook-Levin Theorem
• Cook’s Theorem: CIRCUIT-SAT is NP-complete.
– Proof: We already showed it is in NP.
– To prove it is NP-complete, we have to show that every language in NP can be reduced to it.
– Let M be in NP, and let x be an input for M.
– Let y be a certificate that allows us to verify membership in M in polynomial time, p(n), by some algorithm D.
– Let S be a circuit of size at most $O(p(n)^2)$ that simulates a computer (details omitted…)
[Figure: every language M in NP reduces to CIRCUIT-SAT in polynomial time.]
Cook-Levin Proof
[Figure: copies of circuit S chained one after another, one per step of D, for p(n) steps; each copy carries x, D, W, and y in fewer than p(n) cells, the inputs have size n, and the last copy produces the 0/1 output from D.]
We can build a circuit that simulates the verification of x’s membership in M using y.
• Let W be the working storage for D (including registers, such as program counter); let D be given in RAM “machine code.”
• Simulate p(n) steps of D by replicating circuit S for each step of D. Only input: y.
• Circuit is satisfiable if and only if x is accepted by D with some certificate y
• Total size is still polynomial: $O(p(n)^3)$.
Some Thoughts about P and NP
• Belief: P is a proper subset of NP.
• Implication: the NP-complete problems are the hardest in NP.
• Why: Because if we could solve an NP-complete problem in polynomial time, we could solve every problem in NP in polynomial time.
• That is, if an NP-complete problem is solvable in polynomial time, then P=NP.
• Since so many people have attempted without success to find polynomial-time solutions to NP-complete problems, showing your problem is NP-complete is equivalent to showing that a lot of smart people have worked on your problem and found no polynomial-time algorithm.
[Figure: P drawn inside NP; the NP-complete problems, including CIRCUIT-SAT, live in the part of NP outside P.]
Circuit Formula
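[Figure: a circuit with inputs $x_1, x_2, x_3$ and gate outputs $y_1, \ldots, y_6$. The circuit formula is the conjunction of the output variable $y_1$ with one equivalence per gate, for example $(y_1\leftrightarrow(y_2\wedge\bar{x}_2))$.]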
Logic
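• De Morgan's Law:
$\bar{\bar{x}}=x$
$\overline{(x\wedge y)}=\bar{x}\vee\bar{y}$
$\overline{(x\vee y)}=\bar{x}\wedge\bar{y}$
$\overline{(x\wedge y\wedge z)}=\bar{x}\vee\bar{y}\vee\bar{z}$
$\overline{(x\vee y\vee z)}=\bar{x}\wedge\bar{y}\wedge\bar{z}$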
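Truth table for $f=(y_1\leftrightarrow(y_2\wedge\bar{x}_2))$
y1 y2 x2 | f | ¬f
 1  1  1 | 0 | 1
 1  1  0 | 1 | 0
 1  0  1 | 0 | 1
 1  0  0 | 0 | 1
 0  1  1 | 1 | 0
 0  1  0 | 0 | 1
 0  0  1 | 1 | 0
 0  0  0 | 1 | 0
Convert to CNF
Conversion: read $\bar{f}$ off the rows of the truth table where f is 0:
$\bar{f}=(y_1\wedge y_2\wedge x_2)\vee(y_1\wedge\bar{y}_2\wedge x_2)\vee(y_1\wedge\bar{y}_2\wedge\bar{x}_2)\vee(\bar{y}_1\wedge y_2\wedge\bar{x}_2)$
Then negate both sides and apply De Morgan's law:
$f=\overline{\bar{f}}$
$=\overline{(y_1\wedge y_2\wedge x_2)\vee(y_1\wedge\bar{y}_2\wedge x_2)\vee(y_1\wedge\bar{y}_2\wedge\bar{x}_2)\vee(\bar{y}_1\wedge y_2\wedge\bar{x}_2)}$
$=\overline{(y_1\wedge y_2\wedge x_2)}\wedge\overline{(y_1\wedge\bar{y}_2\wedge x_2)}\wedge\overline{(y_1\wedge\bar{y}_2\wedge\bar{x}_2)}\wedge\overline{(\bar{y}_1\wedge y_2\wedge\bar{x}_2)}$
$=(\bar{y}_1\vee\bar{y}_2\vee\bar{x}_2)\wedge(\bar{y}_1\vee y_2\vee\bar{x}_2)\wedge(\bar{y}_1\vee y_2\vee x_2)\wedge(y_1\vee\bar{y}_2\vee x_2)$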
3SAT
• The SAT problem is still NP-complete even if the formula is a conjunction of disjuncts, that is, it is in conjunctive normal form (CNF).
• The SAT problem is still NP-complete even if it is in CNF and every clause has just 3 literals (a variable or its negation):
– (a+b+¬d)(¬a+¬c+e)(¬b+d+e)(a+¬c+¬e)
• Reduction from SAT.
Showing NP-Completeness
[Figure: reduction graph with literal nodes for $x_1, \ldots, x_4$ and clause nodes 11, 12, 13, 21, 22, 23, 31, 32, 33.]
Problem Reduction
• A language M is polynomial-time reducible to a language L if an instance x for M can be transformed in polynomial time to an instance x’ for L such that x is in M if and only if x’ is in L.
– Denote this by $M \le_P L$.
• A problem (language) L is NP-hard if every problem in NP is polynomial-time reducible to L (another way to define NP-hard).
• A problem (language) is NP-complete if it is in NP and it is NP-hard.
• CIRCUIT-SAT is NP-complete:
– CIRCUIT-SAT is in NP
– For every M in NP, $M \le_P$ CIRCUIT-SAT.
Problem Reduction
• A general problem M is polynomial-time reducible to a general problem L if an instance x of problem M can be transformed in polynomial time to an instance x’ of problem L such that the solution to x is yes if and only if the solution to x’ is yes.
– Denote this by $M \le_P L$.
• A problem (language) L is NP-hard if every problem in NP is polynomial-time reducible to L.
• A problem (language) is NP-complete if it is in NP and it is NP-hard.
• CIRCUIT-SAT is NP-complete:
– CIRCUIT-SAT is in NP
– For every M in NP, $M \le_P$ CIRCUIT-SAT.
Transitivity of Reducibility
• If $A \le_P B$ and $B \le_P C$, then $A \le_P C$.
– An input x for A can be converted to x’ for B, such that x is in A if and only if x’ is in B. Likewise, for B to C.
– Convert x’ into x’’ for C such that x’ is in B iff x’’ is in C.
– Hence, if x is in A, x’ is in B, and x’’ is in C.
– Likewise, if x’’ is in C, x’ is in B, and x is in A.
– Thus, $A \le_P C$, since polynomials are closed under composition.
• Types of reductions:
– Local replacement: Show $A \le_P B$ by dividing an input to A into components and show how each component can be converted to a component for B.
– Component design: Show $A \le_P B$ by building special components for an input of B that enforce properties needed for A, such as “choice” or “evaluate.”
CNF-SAT
• A Boolean formula is a formula where the variables and operations are Boolean (0/1):
– (a+b+¬d+e)(¬a+¬c)(¬b+c+d+e)(a+¬c+¬e)
– OR: +, AND: (times), NOT: ¬
• SAT: Given a Boolean formula S, is S satisfiable, that is, can we assign 0’s and 1’s to the variables so that S is 1 (“true”)?
– Easy to see that CNF-SAT is in NP:
• Non-deterministically choose an assignment of 0’s and 1’s to the variables and then evaluate each clause. If they are all 1 (“true”), then the formula is satisfiable.
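The deterministic half of that argument, evaluating every clause under a chosen assignment, can be sketched in Python. The encoding of a literal as a `(variable, is_positive)` pair is an assumption made for this illustration; the formula `S` is the example formula from the slide.

```python
def evaluate_cnf(clauses, assignment):
    """True iff every clause has at least one literal that evaluates to 1."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in clauses
    )

# (a+b+¬d+e)(¬a+¬c)(¬b+c+d+e)(a+¬c+¬e)
S = [
    [("a", True), ("b", True), ("d", False), ("e", True)],
    [("a", False), ("c", False)],
    [("b", False), ("c", True), ("d", True), ("e", True)],
    [("a", True), ("c", False), ("e", False)],
]

guess = {"a": True, "b": False, "c": False, "d": False, "e": False}
print(evaluate_cnf(S, guess))  # True: this guess is a certificate
```

The check runs in time linear in the size of the formula, so a guessed assignment serves as a polynomial-time verifiable certificate.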
CNF-SAT is NP-complete
• Reduce CIRCUIT-SAT to CNF-SAT.
– Given a Boolean circuit, make a variable for every input and gate.
– Create a sub-formula for each gate, characterizing its effect. Form the formula as the output variable AND-ed with all these sub-formulas:
• Example: m((a+b)↔e)(c↔¬f)(d↔¬g)(e↔¬h)(ef↔i)(m↔kn)…
[Figure: a Boolean circuit with inputs a, b, c, d, gate outputs e, f, g, h, i, j, k, n, and output m.]
The formula is satisfiable if and only if the Boolean circuit is satisfiable.
Vertex Cover
• A vertex cover of graph G=(V,E) is a subset W of V, such that, for every edge (a,b) in E, a is in W or b is in W.
• VERTEX-COVER: Given a graph G and an integer K, does G have a vertex cover of size at most K?
• VERTEX-COVER is in NP: Non-deterministically choose a subset W of size K and check that every edge is covered by W.
Vertex-Cover is NP-complete
Reduce 3SAT to VERTEX-COVER.
Let S be a Boolean formula in CNF with each clause having 3 literals.
• For each variable x, create a node for x and ¬x, and connect these two (truth setting component).
• For each clause Ci = (a+b+c), create a triangle on nodes i1, i2, i3 and connect the three nodes (clause satisfying component).
Vertex-Cover is NP-complete
Completing the construction
• Connect each literal in a clause triangle to its copy in a variable pair. E.g., for a clause Ci = (¬x+y+z), connect the triangle nodes i1, i2, i3 to the nodes ¬x, y, and z.
• Let n=# of variables
• Let m=# of clauses
• Set K=n+2m
• G has 3m+2n vertices
Vertex-Cover is NP-complete
Example: (a+b+c)(¬a+b+¬c)(¬b+¬c+¬d)
Graph has vertex cover of size K=4+6=10 iff formula is satisfiable.
[Figure: variable pairs a/¬a, b/¬b, c/¬c, d/¬d and clause triangles with nodes 11, 12, 13, 21, 22, 23, 31, 32, 33.]
Proof: Vertex-Cover is NP-complete
• We need to prove the following two statements:
– Suppose there is an assignment of Boolean values that satisfies S; then we need to prove that there is a cover of size K.
– Suppose the special graph has a cover of size K ≤ n+2m; then we need to prove that the Boolean expression is satisfiable.
Why? (satisfiable ⇒ cover)
• Suppose there is an assignment of Boolean values that satisfies S
– Build a subset of vertices that contains each literal that is assigned 1 by the satisfying assignment
– For each clause, the satisfying assignment must assign 1 to at least one of the summands. Pick one such summand and include the other two vertices of the clause triangle in the vertex cover (triangle vertices are not shared with other clauses).
– The cover has size n + 2m (as required).
Is What We Described a Cover?
• Each edge in a truth setting component (x, ¬x) is covered.
• Each edge in a clause satisfying component is covered.
– Two of the three edges leaving a clause satisfying component are covered by the two triangle vertices in the cover.
– The remaining edge (incident to a clause satisfying component but not covered by a vertex in the component) is covered by a node in cover C labeled with a literal, since the corresponding literal is 1 (by how we chose the vertices to be covered in the clause satisfying components).
– (Choose two from each clause and choose the one that has true value in each truth setting component.)
Why? (cover ⇒ satisfiable)
• Suppose there is a cover C with size at most n + 2m
• For this special graph, any cover must contain at least one vertex from each truth setting component, and two from each clause satisfying component, so size is at least n + 2m (so exactly that)
• So, one edge incident to any clause satisfying component is not covered by a vertex in the clause satisfying component. This edge must be covered by the other endpoint, which is labeled with a literal.
• Assign 1 to the literal labeling each such node; then each clause in S is satisfied, hence S is satisfied.
Why? (cover ⇒ satisfiable)
This is the complete proof.
• Bottom line: S is satisfiable iff G has a vertex cover of size at most n + 2m.
• Bottom line 2: Vertex Cover is NP-Complete
Clique
• A clique of a graph G=(V,E) is a subgraph C that is fully-connected (every pair in C has an edge).
• CLIQUE: Given a graph G and an integer K, is there a clique in G of size at least K?
• CLIQUE is in NP: non-deterministically choose a subset C of size K and check that every pair in C has an edge in G.
[Figure: this graph has a clique of size 5.]
CLIQUE is NP-Complete
[Figure: a graph G and its complement G’.]
Reduction from VERTEX-COVER.
A graph G has a vertex cover of size K if and only if its complement has a clique of size n-K.
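The equivalence behind this reduction can be checked by brute force on a small graph. The sketch below is only an illustration under assumed conventions (vertices numbered 0..n-1, edges as pairs, hypothetical function names): W is a vertex cover of G exactly when the remaining n-|W| vertices form a clique in the complement of G.

```python
from itertools import combinations

def complement(n, edges):
    """All vertex pairs of 0..n-1 that are NOT edges of the input graph."""
    present = {frozenset(e) for e in edges}
    return {(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present}

def is_vertex_cover(edges, cover):
    return all(u in cover or v in cover for u, v in edges)

def is_clique(edges, nodes):
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present
               for p in combinations(sorted(nodes), 2))

n, G = 4, {(0, 1), (1, 2), (2, 3)}   # a path on four vertices
Gc = complement(n, G)
for size in range(n + 1):
    for W in combinations(range(n), size):
        rest = set(range(n)) - set(W)
        # cover of size K in G  <=>  clique of size n-K in the complement
        assert is_vertex_cover(G, set(W)) == is_clique(Gc, rest)
print("equivalence holds on the example")
```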
Some Other NP-Complete Problems
• SET-COVER: Given a collection of m sets, are there K of these sets whose union is the same as the union of the whole collection of m sets?
– NP-complete by reduction from VERTEX-COVER
• SUBSET-SUM: Given a set of integers and a distinguished integer K, is there a subset of the integers that sums to K?
– NP-complete by reduction from VERTEX-COVER
Some Other NP-Complete Problems
• 0/1 Knapsack: Given a collection of items with weights and benefits, is there a subset of weight at most W and benefit at least K?
– NP-complete by reduction from SUBSET-SUM
• Hamiltonian-Cycle: Given a graph G, is there a cycle in G that visits each vertex exactly once?
– NP-complete by reduction from VERTEX-COVER
• Traveling Salesperson Tour: Given a complete weighted graph G, is there a cycle that visits each vertex and has total cost at most K?
– NP-complete by reduction from Hamiltonian-Cycle.
Beyond NP
Outline and Reading
• Co-NP
– A language L is in Co-NP iff its complement (-L) is in NP.
– Example: unsatisfiability, the language of all Boolean expressions that are not satisfiable.
• PSpace
– A language is in PSpace if some Turing machine accepts it using only polynomial space on an offline machine.
Some facts
• Co-NP=?NP, P=?PSpace.
– Do not know
• PSpace=NPSpace
• P is a subset of Co-NP. P=Co-P
• Other facts
– Co-NP ⊆ PSPACE ⊆ EXPTIME.
– The validity problem for propositional logic is Co-NP-complete.
– Determining whether a position in a generalized checkers game is a winning position for one of the players is PSPACE-hard.
– ML type checking is EXPTIME-complete.
Turing Machine
• Write on the tape and read from it
• Head can move left and right
• Tape is infinite
• Rejecting and accepting states
[Figure: a control unit with a head on the tape a b a b ...]
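Deterministic Turing Machine
7-tuple $(Q,\Sigma,\Gamma,\delta,q_0,q_{accept},q_{reject})$
1. Q is the finite set of states
2. $\Sigma$ is the input alphabet not containing the special blank symbol
3. $\Gamma$ is the tape alphabet
4. $\delta: Q\times\Gamma\to Q\times\Gamma\times\{L,R\}$ is the transition function
5. $q_0\in Q$ is the start state,
6. $q_{accept}\in Q$ is the accept state
7. $q_{reject}\in Q$ is the reject state, where $q_{reject}\ne q_{accept}$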
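Nondeterministic Turing Machine
$(Q,\Gamma,\delta,q_0,q_{accept},q_{reject})$
1. Q is the finite set of states
2. $\Gamma$ is the tape alphabet
3. $\delta: Q\times\Gamma\to P(Q\times\Gamma\times\{L,R\})$ is the transition function
4. $q_0\in Q$ is the start state,
5. $q_{accept}\in Q$ is the accept state.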
Configuration
• Current state: q7
• Current head position on the tape: 4th cell
• Current tape content: abab
[Figure: the tape a b a b ... with the control in state q7.]
Configuration
A configuration is represented by $w_{left}\,q\,a\,w_{right}$
where $w_{left}$ is the left part of the tape content,
$w_{right}$ is the right part of the tape content,
a is the symbol at the head position,
q is the current state
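Configuration Transition
For $\delta(q_i,b)=(q_j,c,L)$
Tape before: $udbav$
Tape after: $udcav$
$ud\,q_i\,bav \Rightarrow u\,q_j\,dcav$
Configuration Transition
For $\delta(q_i,b)=(q_j,c,R)$
Tape before: $udbav$
Tape after: $udcav$
$ud\,q_i\,bav \Rightarrow udc\,q_j\,av$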
Configuration
Start configuration: $q_0w$, where w is the input
Accepting configuration: a configuration with state $q_{accept}$
Rejecting configuration: a configuration with state $q_{reject}$
Accept Computation
A Turing machine M accepts input w if a sequence of configurations $C_1, C_2, \ldots, C_k$ exists where
1. $C_1$ is the start configuration of M on input w,
2. each $C_i$ yields $C_{i+1}$, and
3. $C_k$ is an accepting configuration
Language recognized by TM
For a Turing machine M, L(M) denotes the set of all strings accepted by M.
A language is Turing recognizable if some Turing machine recognizes it.
Turing Recognizable
• Turing machine M recognizes language L
– If $x\in L$, then M(x) accepts
– If $x\notin L$, then M(x) rejects or runs forever
Decidability
A language L is Turing decidable if there is a deterministic Turing machine M such that
• If x is in L, then M accepts x in finite number of steps
• If x is not in L, then M rejects x in finite number of steps
Example: {w#w| w is in {0,1}*} is Turing decidable
Turing Decidable
• Turing machine M decides language L
– If $x\in L$, then M(x) accepts
– If $x\notin L$, then M(x) rejects
– M always stops
Observation
If L is Turing decidable, then L is Turing recognizable
NP-completeness
A language B is NP-complete if
1. B is in NP, and
2. Every A in NP is polynomial time reducible to B
Theorem. If B is NP-complete and B is in P, then P=NP.
SAT
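A boolean formula is satisfiable if there exists an assignment to its variables that makes the formula true
SAT={ $\phi$ | $\phi$ is a satisfiable boolean formula}
Example: $(x_1\vee\bar{x}_2\vee\bar{x}_3)\wedge(x_3\vee\bar{x}_5\vee x_6)\wedge(x_3\vee\bar{x}_6)$ is satisfied by $x_1=1$, $x_5=0$, $x_6=0$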
Cook-Levin Theorem
Theorem: SAT is NP-complete
Proof.
1. SAT is in NP.
2. For every problem A in NP, $A \le_P SAT$
Proof
Build the formula $\phi=\phi_{cell}\wedge\phi_{start}\wedge\phi_{move}\wedge\phi_{accept}$, where
1. $\phi_{start}$: The start configuration is legal
2. $\phi_{accept}$: The final state is accept.
3. $\phi_{move}$: The movement is legal.
4. $\phi_{cell}$: Each cell takes one legal symbol.
Proof
1. $x_{i,j,s}$: 1 if the cell[i,j] holds symbol s; 0 otherwise
2. $n^k$: Time bound for the NTM M with constant k.
3. $C=Q\cup\Gamma\cup\{\#\}$: The set of symbols a cell can hold.
4. $M=(Q,\Gamma,\delta,q_0,q_{accept})$: NTM M for accepting A.
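Proof
Each cell has only one symbol
1. The symbol is selected from C: $\bigvee_{s\in C}x_{i,j,s}$
2. Only one symbol is selected: $\bigwedge_{s,t\in C,\ s\ne t}(\overline{x_{i,j,s}}\vee\overline{x_{i,j,t}})$
3. It is true for all cells in all configurations: $\bigwedge_{1\le i,j\le n^k}(\ldots)$
$\phi_{cell}=\bigwedge_{1\le i,j\le n^k}\Big[\Big(\bigvee_{s\in C}x_{i,j,s}\Big)\wedge\Big(\bigwedge_{s,t\in C,\ s\ne t}(\overline{x_{i,j,s}}\vee\overline{x_{i,j,t}})\Big)\Big]$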
The start configuration is
#......# 210 nwwwq
0,2,1#,1,1 qstart xx
#,,1,1,1,4,1,3,1 ...... kk nnnn xxxx
nwnww xxx ,2,1,4,1,3,1 ...21
Proof
The accepting state has been reached.
This makes sure the accept state appears among the configuration transitions.
$\phi_{accept}=\bigvee_{1\le i,j\le n^k}x_{i,j,q_{accept}}$
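Proof
Characterize the legal move
The whole move is legal if all windows are legal.
$\phi_{move}=\bigwedge_{1\le i<n^k,\ 1<j<n^k}(\text{the }(i,j)\text{ window is legal})$
Characterize that one window is legal
$\bigvee_{a_1,\ldots,a_6\ \text{is legal}}(x_{i,j-1,a_1}\wedge x_{i,j,a_2}\wedge x_{i,j+1,a_3}\wedge x_{i+1,j-1,a_4}\wedge x_{i+1,j,a_5}\wedge x_{i+1,j+1,a_6})$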
Proof
The state transition
$\delta(q_1,b)=\{(q_2,c,L),(q_2,a,R)\}$
Legal windows:
 a  q1 b       a  q1 b       b b b
 q2 a  c       a  a  q2      b b b
Illegal windows:
 a  q1 b
 q1 a  a
Prepare for the Final
• Regular language and automata
• Context free language
• Decidability
• Undecidability
• Complexity theory
Regular Language
Concepts: Automata, regular expression
Skills: Design automata to accept a regular language
Prove a language is not regular
$\{0^i1^i \mid i\ge 0\}$
}0|1{ 23 ii
Context-free Language
Concepts: Context-free grammar, parsing tree
Skills: Design automata to accept a context-free language
Prove a language is not context-free
$\{0^i1^i2^i \mid i\ge 0\}$
}0|10{ 2332 iii
Decidability
Concepts: Turing machine, algorithm, Church-Turing Thesis, Turing recognizable, Turing Decidable
Skills: Prove a language is decidable (design algorithm)
Prove a language is Turing recognizable
$\{0^i1^i2^i \mid i\ge 0\}$
$L=\{p(x_1,\ldots,x_n) \mid \text{polynomial } p \text{ has an integer solution}\}$
$A_{TM}=\{\langle M,w\rangle \mid M \text{ accepts } w\}$
Undecidability
Concepts: Countable, Turing undecidable, reduction
Skills: Diagonal method: Prove $A_{TM}$ is undecidable
Use reduction to prove a language is undecidable: $A_{TM}\le_m HALT_{TM}$
Complexity
Concepts: Time on Turing machine
PTIME(t(n))
NP-completeness
Polynomial time reduction
Polynomial time verifier
$P=\bigcup_k TIME(n^k)$
$NP=\bigcup_k NTIME(n^k)$
Complexity
Skill: Prove a problem is in P
Prove a problem is in NP: SAT, 3SAT, Clique, Composite
Use reduction to prove a problem is NP-complete: $SAT'\le_p 3SAT$, $3SAT\le_p CLIQUE$
Grade
• A:…
• B:…
• C: Miss exam or homework
SAT’
A conjunctive normal form is a conjunction of some clauses
SAT’={ $\phi$ | $\phi$ is a satisfiable conjunctive normal form}
[Example: a conjunctive normal form with three clauses over $x_1, x_2, x_3, x_5, x_6$, satisfied by $x_1=1$, $x_5=0$, $x_6=0$.]
Cook-Levin Theorem’
Theorem: SAT’ is NP-complete
Proof. Same as the proof that SAT is NP-complete
3SAT
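A 3-conjunctive normal form formula (3cnf-formula) is a conjunctive normal form formula with at most 3 literals in each clause
3SAT={ $\phi$ | $\phi$ is a satisfiable 3cnf-formula}
Example: $(x_1\vee\bar{x}_2\vee\bar{x}_3)\wedge(x_3\vee\bar{x}_5\vee x_6)\wedge(x_3\vee\bar{x}_6)$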
3SAT is NP-complete
Theorem: There is a polynomial time reduction from SAT’ to 3SAT.
3SAT is NP-complete
$(a_1\vee a_2\vee a_3\vee a_4)$ is satisfiable if and only if the following is satisfiable
$(a_1\vee a_2\vee z)\wedge(\bar{z}\vee a_3\vee a_4)$
3SAT is NP-complete
$(a_1\vee a_2\vee\ldots\vee a_l)$ is satisfiable if and only if the following is satisfiable
$(a_1\vee a_2\vee z_1)\wedge(\bar{z}_1\vee a_3\vee z_2)\wedge(\bar{z}_2\vee a_4\vee z_3)\wedge\ldots\wedge(\bar{z}_{l-3}\vee a_{l-1}\vee a_l)$
3SAT is NP-complete
Convert every clause $(a_1\vee a_2\vee\ldots\vee a_l)$ with more than 3 literals into 3cnf in this way, using new variables $z_1,\ldots,z_{l-3}$ for each clause.
3SAT is NP-complete
Conjunctive normal form: $f_1\wedge f_2\wedge\ldots\wedge f_k$
Each clause $f_i$ is converted into $g_i$ $(i=1,2,\ldots,k)$
$f_1\wedge f_2\wedge\ldots\wedge f_k$ is satisfiable if and only if $g_1\wedge g_2\wedge\ldots\wedge g_k$ is satisfiable
Problem: Convert Circuit C to Formula f such that C is satisfiable iff f is satisfiable
[Figure: Circuit C with inputs $x_1$, $x_2$, $x_3$.]
Approximation Algorithms
Outline and Reading
• Approximation Algorithms for NP-Complete Problems
– Approximation ratios
– Polynomial-Time Approximation Schemes
– 2-Approximation for Vertex Cover
– Approximate Scheme for Subset Sum
– 2-Approximation for TSP special case
– Log n-Approximation for Set Cover
Approximation Ratios
• Optimization Problems
– We have some problem instance x that has many feasible “solutions”.
– We are trying to minimize (or maximize) some cost function c(S) for a “solution” S to x. For example,
• Finding a minimum spanning tree of a graph
• Finding a smallest vertex cover of a graph
• Finding a smallest traveling salesperson tour in a graph
Approximation Ratios
• An approximation produces a solution T
– T is a k-approximation to the optimal solution OPT if c(T)/c(OPT) ≤ k (assuming a minimization problem; for a maximization problem the ratio is reversed)
Polynomial-Time Approximation Schemes
• A problem L has a polynomial-time approximation scheme (PTAS) if it has a polynomial-time $(1+\epsilon)$-approximation algorithm, for any fixed $\epsilon>0$ (this value can appear in the running time).
• Subset Sum has a PTAS.
Vertex Cover
• A vertex cover of graph G=(V,E) is a subset W of V, such that, for every (a,b) in E, a is in W or b is in W.
• OPT-VERTEX-COVER: Given a graph G, find a vertex cover of G with smallest size.
• OPT-VERTEX-COVER is NP-hard.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
A 2-Approximation for Vertex Cover
• Every chosen edge e has both ends in C
• But e must be covered by an optimal cover; hence, one end of e must be in OPT
• Thus, there are at most twice as many vertices in C as in OPT.
• That is, C is a 2-approx. of OPT
• Running time: O(m)
Algorithm VertexCoverApprox(G)
  Input graph G
  Output a vertex cover C for G
  C ← empty set
  H ← G
  while H has edges
    e ← H.removeEdge(H.anEdge())
    v ← H.origin(e)
    w ← H.destination(e)
    C.add(v)
    C.add(w)
    for each f incident to v or w
      H.removeEdge(f)
  return C
Subset Sum
Given a set {x1,x2,…,xn} of integers and an integer t, find {y1,y2,…,yk} a subset of {x1,x2,…,xn} such that:
$t=\sum_{i=1}^{k}y_i$
Approximate Solution for Subset Sum
• Find a subset {y1,y2,…,yk} from {x1,x2,…,xn} such that
• y1+y2+…+yk ≤ t
• Maximize (y1+y2+…+yk)/(z1+z2+…+zm), which is at most 1,
• where z1+z2+…+zm is the optimal solution, that is, z1+z2+…+zm ≤ t and t-(z1+z2+…+zm) is minimal
Subset Sum
To prove NP-complete:
1. Prove it is in NP
• Verifiable in polynomial time
• Give a nondeterministic algorithm
2. Reduction from a known NP-complete problem to subset sum
• Reduction from 3SAT to subset sum
Subset Sum is in NP
sum = 0
A = {x1,x2,…,xn}
for each x in A
  y ← choice(A)
  sum = sum + y
  if ( sum = t ) then success
  A ← A – {y}
done
fail
Inequality
• Standard formulas
$e^x=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\ldots$
$\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n=e$
• Assume that $0\le x\le 1$, we have
$e^x\le 1+x+x^2$
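Scaling factor
• Select $\delta=\frac{\epsilon}{2n}$
• Each time the difference is scaled by factor $(1+\delta)$
• After n times, for $0<\epsilon<1$,
$(1+\delta)^n=\left(1+\frac{\epsilon}{2n}\right)^n\le e^{\epsilon/2}\le 1+\frac{\epsilon}{2}+\frac{\epsilon^2}{4}\le 1+\epsilon$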
Trimming
Example:
L=<10, 11, 12, 15, 20, 21, 22, 23, 24, 29> and $\delta=0.1$
It is trimmed to L’=<10, 12, 15, 20, 23, 29>
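A hedged Python sketch of trimming and of the approximation scheme that uses it, following the standard textbook formulation with $\delta=\epsilon/2n$. The function names are hypothetical, and `Fraction` is used so that comparisons such as 11 > 10·1.1 are exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

def trim(sorted_list, delta):
    """Drop every value within a factor (1 + delta) of the last kept value."""
    kept = [sorted_list[0]]
    last = sorted_list[0]
    for y in sorted_list[1:]:
        if y > last * (1 + delta):
            kept.append(y)
            last = y
    return kept

def approx_subset_sum(values, t, epsilon):
    """Largest subset sum found that is <= t, within a factor 1 + epsilon."""
    delta = Fraction(epsilon) / (2 * len(values))
    sums = [0]
    for x in values:
        merged = sorted(set(sums) | {s + x for s in sums})
        sums = [s for s in trim(merged, delta) if s <= t]
    return max(sums)

L = [10, 11, 12, 15, 20, 21, 22, 23, 24, 29]
print(trim(L, Fraction(1, 10)))
print(approx_subset_sum([104, 102, 201, 101], 308, Fraction(2, 5)))
```

Trimming keeps each list short, which is what makes the running time polynomial in n and 1/ε.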
Reduction
Goal: Reduce 3SAT to SUBSET-SUM.
How: Let Ф be a 3 conjunctive normal form formula. Build an instance of SUBSET-SUM problem (S, t) such that Ф is satisfiable if and only if there is a subset T of S whose elements sum to t.
Prove the reduction is polynomial.
1. Algorithm
Input: Ф - 3 conjunctive normal form formula
Variables: x1, x2, …, xl
Clauses: c1,c2,…,ck.
Output: S, t such that
Ф is satisfiable iff there is T subset of S
which sums to t.
1. Algorithm (cont.)
      x1 x2 … xl   c1 c2 … ck
y1     1  0 …  0    1  0 …  0
z1     1  0 …  0    0  1 …  0
y2        1 …  0    0  0 …  1
z2        1 …  0    0  0 …  0
…
yl             1    0  0 …  0
zl             1    0  0 …  0
g1                  1  0 …  0
h1                  2  0 …  0
g2                     1 …  0
h2                     2 …  0
…
gk                          1
hk                          2
t      1  1 …  1    4  4 …  4
1. Algorithm (cont.)
(yi,xj), (zi,xj) – 1 if i=j, 0 otherwise
(yi,cj) – 1 if cj contains the literal xi, 0 otherwise
(zi,cj) – 1 if cj contains the literal ¬xi, 0 otherwise
(gi,xj), (hi,xj) – 0
(gi,cj), (hi,cj) – 1 if i=j, 0 otherwise
Each row represents a decimal number.
S={y1,z1,..,yl,zl,g1,h1,…,gk,hk}
t is the last row in the table.
2. Reduction ‘⇒’
Given a variable assignment which satisfies Ф, find T.
1. If xi is true then yi is in T, else zi is in T
2. Add gi and/or hi to T such that each of the last k digits of the sum of T is 4.
3. Reduction ‘⇐’
Given T a subset of S which sums to t, find a variable assignment which satisfies Ф.
1. If yi is in T then xi is true
2. If zi is in T then xi is false
4. Polynomial
Table size is $O((k+l)^2)$, that is, $O(n^2)$
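The table construction can be sketched in Python and checked by brute force on small formulas. Literals are encoded as nonzero integers (+i for $x_i$, -i for ¬$x_i$), and each clause is assumed to mention a variable at most once; these conventions and all function names are assumptions for this illustration. The subset-sum check is an exhaustive search used only for testing, not part of the reduction.

```python
def three_sat_to_subset_sum(clauses, num_vars):
    """Build the decimal numbers y_i, z_i, g_j, h_j and the target t."""
    k = len(clauses)

    def number(var_index, clause_digits):
        digits = [0] * num_vars + list(clause_digits)
        if var_index is not None:
            digits[var_index] = 1
        return int("".join(map(str, digits)))

    S = []
    for i in range(1, num_vars + 1):
        S.append(number(i - 1, [int(i in c) for c in clauses]))    # y_i
        S.append(number(i - 1, [int(-i in c) for c in clauses]))   # z_i
    for j in range(k):
        S.append(number(None, [int(m == j) for m in range(k)]))        # g_j
        S.append(number(None, [2 * int(m == j) for m in range(k)]))    # h_j
    t = int("1" * num_vars + "4" * k)
    return S, t

def has_subset_with_sum(numbers, target):
    """Exhaustive reachable-sums search (exponential in general)."""
    sums = {0}
    for x in numbers:
        sums |= {s + x for s in sums if s + x <= target}
    return target in sums

# Example with three variables and four clauses.
phi = [(1, -2, -3), (-1, -2, -3), (-1, -2, 3), (1, 2, 3)]
S, t = three_sat_to_subset_sum(phi, 3)
print(S[0], t, has_subset_with_sum(S, t))
```

No digit column can overflow: a clause column sums to at most 3+1+2=6 and a variable column to at most 2, so the decimal digits never carry.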
Example
$\Phi=C_1\wedge C_2\wedge C_3\wedge C_4$
$C_1=(x_1\vee\bar{x}_2\vee\bar{x}_3)$, $C_2=(\bar{x}_1\vee\bar{x}_2\vee\bar{x}_3)$
$C_3=(\bar{x}_1\vee\bar{x}_2\vee x_3)$, $C_4=(x_1\vee x_2\vee x_3)$
      x1 x2 x3   c1 c2 c3 c4
y1     1  0  0    1  0  0  1
z1     1  0  0    0  1  1  0
y2        1  0    0  0  0  1
z2        1  0    1  1  1  0
y3           1    0  0  1  1
z3           1    1  1  0  0
g1                1  0  0  0
h1                2  0  0  0
g2                   1  0  0
h2                   2  0  0
g3                      1  0
h3                      2  0
g4                         1
h4                         2
t      1  1  1    4  4  4  4
Special Case of the Traveling Salesperson Problem
• OPT-TSP: Given a complete, weighted graph, find a cycle of minimum cost that visits each vertex.
– OPT-TSP is NP-hard
– Special case: edge weights satisfy the triangle inequality (which is common in many applications):
• w(a,b) + w(b,c) ≥ w(a,c)
[Figure: a triangle on vertices a, b, c with edge weights 5, 4, and 7.]
A 2-Approximation for TSP Special Case
[Figure: Euler tour P of MST M, and the output tour T.]
Algorithm TSPApprox(G)
  Input weighted complete graph G, satisfying the triangle inequality
  Output a TSP tour T for G
  M ← a minimum spanning tree for G
  P ← an Euler tour traversal of M, starting at some vertex s
  T ← empty list
  for each vertex v in P (in traversal order)
    if this is v’s first appearance in P then
      T.insertLast(v)
  T.insertLast(s)
  return T
A 2-Approximation for TSP Special Case - Proof
[Figure: optimal tour OPT (at least the cost of MST M), Euler tour P of MST M (twice the cost of M), and output tour T (at most the cost of P).]
• The optimal tour is a spanning tour; removing one of its edges leaves a spanning tree, hence |M|≤|OPT|.
• The Euler tour P visits each edge of M twice; hence |P|=2|M|.
• Each time we shortcut a vertex in the Euler Tour we will not increase the total length, by the triangle inequality (w(a,b) + w(b,c) ≥ w(a,c)); hence, |T|≤|P|.
• Therefore, |T|≤|P|=2|M|≤2|OPT|
Problem
• Convert the following spanning tree into a path so that it provides a 2-approximation for the traveling salesman problem. Point out the edges not in the tree.
Set Cover
• OPT-SET-COVER: Given a collection of m sets, find the smallest number of them whose union is the same as the whole collection of m sets?
– OPT-SET-COVER is NP-hard
• Greedy approach produces an O(log n)-approximation algorithm. See §13.4.4 for details.
Algorithm SetCoverApprox(G)
  Input: a collection of sets S1…Sm
  Output: a subcollection C with the same union
  F ← {S1,S2,…,Sm}
  C ← empty set
  U ← union of S1…Sm
  while U is not empty
    Si ← set in F with most elements in U
    F.remove(Si)
    C.add(Si)
    Remove all elements in Si from U
  return C
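A minimal Python sketch of the greedy procedure above, assuming the input is a list of Python sets. The function name `set_cover_approx` is illustrative, and ties are broken by taking the first largest set, which the slides do not specify.

```python
def set_cover_approx(sets):
    """Greedy set cover: repeatedly take the set covering the most uncovered elements."""
    remaining = [set(s) for s in sets]      # F
    uncovered = set().union(*remaining)     # U
    cover = []                              # C
    while uncovered:
        # Si <- set in F with most elements in U
        best = max(remaining, key=lambda s: len(s & uncovered))
        remaining.remove(best)
        cover.append(best)
        uncovered -= best
    return cover
```

On the collection {1,2,3}, {3,4}, {4,5}, {1,2,3,4}, the sketch first picks {1,2,3,4} and then {4,5}.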
Final Exam
• May 11 (Tuesday)
• 5:45-8:25pm
Randomized Algorithm
[Figure: a blind monkey chooses among branching paths at random, trying to get the apple]
Randomized algorithm
• Randomly select 4 independent paths
• Each path has a 1/4 chance to get the apple
• Each path has a 1 − 1/4 = 3/4 chance to get nothing
• It has a $(3/4)^4 = 81/256 < 1/3$ chance to fail on all 4 paths
• It has at least a 1 − (1/3) = 2/3 chance to get an apple from trying the 4 paths
• The worst case is that the monkey gets an apple only after trying 13 paths
Try 6 Paths
• It has probability $(3/4)^6 < 0.178$ of failing on all 6 paths
• It has at least 1 − 0.178 = 0.822 probability of getting an apple from trying the 6 paths
• The worst case is that the monkey gets an apple only after trying 13 paths
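The failure probabilities on these two slides can be checked with exact arithmetic. The sketch below assumes each try is an independent pick that succeeds with probability 1/4; the function name `failure_probability` is illustrative.

```python
from fractions import Fraction

def failure_probability(paths_tried, success=Fraction(1, 4)):
    """Probability that every one of `paths_tried` independent tries misses the apple."""
    return (1 - success) ** paths_tried

four = failure_probability(4)   # (3/4)^4
six = failure_probability(6)    # (3/4)^6
print(four, float(six))
```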
Polynomial Identity
• Check if a polynomial is constantly equal to zero:
$(x+1)^2 - (x^2 + 2x + 1) = 0$
0)2()1( 22 yxyyxyx
Degree of polynomial
• The highest exponent among all monomial terms.
• A single variable polynomial converted into the format below (with $a_n \neq 0$) has degree n:
$a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$
For example, $(x+1)^2 = x^2 + 2x + 1$
Degree of Multiple Variable Polynomial
• The polynomial $P(x_1, x_2, \dots, x_k)$ has multi-degree $(d_1, d_2, \dots, d_k)$ if the highest degree (exponent) of $x_i$ is $d_i$
• The degree of a variable in a multiple variable polynomial is its highest exponent.
• For example: the following polynomial has multi-degree (30, 100)
$x^{30} y + y + x + x^2 y^{100}$
Fact
• Each nonzero single variable polynomial of degree n has at most n different real roots.
Randomized algorithm Polynomial Identity
• Assume the polynomial P(x) has degree n
• Randomly select n+1 different real numbers $x_1, x_2, \dots, x_{n+1}$
• P(x) is zero iff all of $P(x_1), P(x_2), \dots, P(x_{n+1})$ are zero
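A small Python sketch of this test. It samples distinct integers rather than arbitrary real numbers so that the arithmetic is exact; that choice, the sampling range, and the name `is_identically_zero` are assumptions made for illustration.

```python
import random

def is_identically_zero(poly, degree_bound):
    """poly: a function of one integer argument; degree_bound: an upper bound n on its degree."""
    span = 10 * (degree_bound + 1)
    # n + 1 distinct sample points; a nonzero polynomial of degree n vanishes on at most n of them.
    points = random.sample(range(-span, span), degree_bound + 1)
    return all(poly(x) == 0 for x in points)
```

With n+1 distinct points the answer is always correct, because a nonzero polynomial of degree at most n cannot vanish at all of them.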
Checking the identity of two lists
• Given two lists of integers, check if they will be the same after sorting.
• 5,1, 9,1,4 and 1, 4, 1,9,5
Two algorithms
• Check after they are sorted.
Time: O(n log n)
• Convert into two polynomials
Time: O(n)
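One possible way to realize the polynomial approach is sketched below. It assumes each list $a_1, \dots, a_n$ is turned into the polynomial $\prod_i (x - a_i)$ and that the two polynomials are compared at one random point modulo a large prime; the slides do not state which polynomials are used, so the product form, the prime `PRIME`, and the name `same_multiset` are all assumptions.

```python
import random

PRIME = (1 << 61) - 1  # a Mersenne prime; entries are assumed smaller than this in absolute value

def same_multiset(a, b):
    """Randomized check that list b is a rearrangement of list a, using O(n) multiplications."""
    if len(a) != len(b):
        return False
    z = random.randrange(PRIME)
    pa = pb = 1
    for x in a:
        pa = pa * (z - x) % PRIME   # evaluates prod (z - a_i) mod PRIME
    for y in b:
        pb = pb * (z - y) % PRIME
    return pa == pb
```

If the lists agree after sorting, the answer is always True. If they differ, the two products are different polynomials of degree n, so they agree at a random point with probability at most n / PRIME.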
Example
• For the polynomial $P(x) = (x+1)^2 - (x^2 + 2x + 1)$, if we let x=1,2,3, then P(1)=P(2)=P(3)=0.
Example
• For the polynomial $P(x) = (x-1)(x-2)$, if we let x=1,2,3, then P(1)=P(2)=0, but P(3)=2.
Two variables polynomial has infinite roots
• The polynomial equation $x^2 + y^2 - 1 = 0$ has infinitely many roots. It represents the circle of radius one centered at the origin.
Randomized Algorithm
• Randomly select a point (x,y) on the plane; if the point is not on the circle boundary, then $x^2 + y^2 - 1 \neq 0$
Convert the multiple variable polynomial
• The polynomial $x^{30} y + x + x^2 y^{100} + 5$ can be converted into the format: $x^2 y^{100} + x^{30} y + (x + 5)$
Convert two variables polynomial
• The polynomial $P(x, y)$ of multi-degree $(d_1, d_2)$ can be converted into the format:
$P_{d_2}(x)\, y^{d_2} + P_{d_2-1}(x)\, y^{d_2-1} + \dots + P_1(x)\, y + P_0(x)$
where each $P_i(x)$ has degree at most $d_1$
Convert the multiple variable polynomial
• For the polynomial $x^2 y^{100} + x^{30} y + (x + 5)$, replace y by $x^{31}$:
$x^2 x^{3100} + x^{30} x^{31} + (x + 5) = x^{3102} + x^{61} + (x + 5)$
Convert three variables polynomial
• The polynomial $P(x_1, x_2, x_3)$ of multi-degree $(d_1, d_2, d_3)$ can be converted into the format:
$P_{d_3}(x_1, x_2)\, x_3^{d_3} + P_{d_3-1}(x_1, x_2)\, x_3^{d_3-1} + \dots + P_1(x_1, x_2)\, x_3 + P_0(x_1, x_2)$
where each $P_i(x_1, x_2)$ has multi-degree at most $(d_1, d_2)$
Convert two variables polynomial
• For the polynomial $P(x, y)$ of multi-degree $(d_1, d_2)$, replace y by $x^{d_1+1}$:
$P_{d_2}(x)\, x^{(d_1+1)d_2} + P_{d_2-1}(x)\, x^{(d_1+1)(d_2-1)} + \dots + P_1(x)\, x^{d_1+1} + P_0(x)$
which is $P(x, x^{d_1+1})$ and has degree at most $(d_1+1)(d_2+1)$
Convert multiple variables polynomial into single variable poly.
• For the polynomial $P(x_1, x_2, \dots, x_k)$ of multi-degree $(d_1, d_2, \dots, d_k)$
• It can be converted into a single variable polynomial $Q(x_1)$ of degree at most $(d_1+1)(d_2+1)\cdots(d_k+1)$
• Furthermore, $P(x_1, x_2, \dots, x_k)$ is not zero iff $Q(x_1)$ is not zero
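The conversion can be sketched in Python as below. The representation (a dict from exponent tuples to coefficients) and the name `to_single_variable` are assumptions for illustration. Variable $x_2$ is replaced by $x_1^{d_1+1}$, $x_3$ by $x_1^{(d_1+1)(d_2+1)}$, and so on, so distinct monomials map to distinct powers of $x_1$.

```python
def to_single_variable(poly, degrees):
    """poly: {(e1, ..., ek): coefficient}; degrees: the multi-degree (d1, ..., dk).

    Returns {exponent of x1: coefficient} after the substitution.
    """
    result = {}
    for exps, coeff in poly.items():
        exponent, stride = 0, 1
        for e, d in zip(exps, degrees):
            exponent += e * stride   # x_i contributes e_i * (d_1+1)...(d_{i-1}+1)
            stride *= d + 1
        result[exponent] = result.get(exponent, 0) + coeff
    # Drop terms whose coefficients cancelled.
    return {e: c for e, c in result.items() if c != 0}
```

Applied to $x^2 y^{100} + x^{30} y + x + 5$ with multi-degree (30, 100), it produces the exponents 3102, 61, 1 and 0.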
Randomized algorithm for multiple variables polynomial
• Input: the polynomial $P(x_1, x_2, \dots, x_k)$ of multi-degree $(d_1, d_2, \dots, d_k)$
• Convert it into a single variable polynomial $Q(x_1)$ of degree (at most) $(d_1+1)(d_2+1)\cdots(d_k+1)$
• Randomly select an integer z in $[1, 1000(d_1+1)(d_2+1)\cdots(d_k+1)]$
• Evaluate $Q(z)$. If P(…) is zero, then $Q(z)$ is zero. Otherwise, Q(z) is zero with chance <1/1000
Big Open Problem
• Is there any deterministic algorithm such that, given a polynomial $P(x_1, x_2, \dots, x_k)$, the algorithm decides if it is identical to zero in $n^c$ steps, where c is a constant and n is the length of the input polynomial?
Degree
• The degree of a monomial $x_1^{d_1} x_2^{d_2} \cdots x_k^{d_k}$ is $d_1 + d_2 + \dots + d_k$
• For example, $x_1^{3} x_2^{21} x_k^{7}$ has degree 3+21+7=31.
• The degree of a multi-variable polynomial is the largest degree of its monomials after sum of product expansion.
Schwartz-Zippel Theorem
Let $Q(x_1, \dots, x_n)$ be a multivariate polynomial of degree d. Fix a set S of integers, and let $r_1, \dots, r_n$ be chosen randomly and uniformly from S. If $Q(x_1, \dots, x_n) \not\equiv 0$, then with probability at most $\frac{d}{|S|}$, $Q(r_1, \dots, r_n) = 0$.
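The theorem suggests a simple black-box test, sketched below. The function name `probably_zero`, the choice $S = \{0, \dots, 100d\}$, and the number of trials are assumptions for illustration. Each trial errs with probability at most $d/|S| < 1/100$ when the polynomial is not identically zero.

```python
import random

def probably_zero(poly, num_vars, degree, trials=20):
    """poly: a function of num_vars integers; degree: an upper bound d on its total degree."""
    sample_size = 100 * degree + 1          # |S|, so d / |S| < 1/100
    for _ in range(trials):
        point = [random.randrange(sample_size) for _ in range(num_vars)]
        if poly(*point) != 0:
            return False                    # a nonzero value proves the polynomial is not zero
    return True                             # all trials gave zero: identically zero with high probability
```

A False answer is always correct; a True answer can be wrong only if every trial happened to hit a root.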
Proof
• Basis: The number of variables is one.
The polynomial has at most d different roots.
So, with probability at most $\frac{d}{|S|}$, $Q(r) = 0$
• Hypothesis: if the number of variables is n, then
with probability at most $\frac{d}{|S|}$, $Q(r_1, \dots, r_n) = 0$
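• Induction: The number of variables is n+1, and $Q(x_1, \dots, x_n, x_{n+1}) \not\equiv 0$.
Write $Q(x_1, \dots, x_n, x_{n+1}) = \sum_{i=0}^{k} x_{n+1}^{i}\, Q_i(x_1, \dots, x_n)$, where k is the largest exponent of $x_{n+1}$, so that $Q_k \not\equiv 0$ and $Q_k$ has degree at most $d-k$.
With probability at most $\frac{d-k}{|S|}$, $Q_k(r_1, \dots, r_n) = 0$.
If $Q_k(r_1, \dots, r_n) \neq 0$, then with probability at most $\frac{k}{|S|}$, $Q(r_1, \dots, r_n, r_{n+1}) = \sum_{i=0}^{k} r_{n+1}^{i}\, Q_i(r_1, \dots, r_n) = 0$, because it is a nonzero polynomial of degree k in the single variable $x_{n+1}$.
Therefore, with probability at most $\frac{d-k}{|S|} + \frac{k}{|S|} = \frac{d}{|S|}$, $Q(r_1, \dots, r_n, r_{n+1}) = 0$.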
Application
• Find the perfect matching of a bipartite graph.
• Convert it into a determinant.
• Check if the determinant is zero.
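A sketch of this application, under assumptions the slide does not spell out: the bipartite graph has n vertices on each side, entry (i, j) of the matrix is a separate variable when edge (i, j) exists and 0 otherwise, and the determinant is tested by substituting random values modulo a prime. The names `det_mod`, `has_perfect_matching`, and `PRIME` are illustrative.

```python
import random

PRIME = (1 << 61) - 1  # a Mersenne prime used as the modulus

def det_mod(matrix, p=PRIME):
    """Determinant modulo p by Gaussian elimination."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] % p), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                       # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)     # modular inverse via Fermat's little theorem
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p

def has_perfect_matching(n, edges, trials=3):
    """edges: pairs (i, j) joining left vertex i to right vertex j, both in range(n)."""
    for _ in range(trials):
        matrix = [[0] * n for _ in range(n)]
        for i, j in edges:
            matrix[i][j] = random.randrange(1, PRIME)  # random value for the variable x_ij
        if det_mod(matrix) != 0:
            return True    # nonzero determinant: some perfect matching exists
    return False           # zero in every trial: no perfect matching, with high probability
```

The sketch only decides whether a perfect matching exists; it does not output the matching itself.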
Problem: Convert the multiple variable polynomial
• For polynomial P(x,y)=
Use the previous method to convert it into one variable
polynomial Q(x) so that P(x,y) is identical to zero iff Q(x) is identical to zero
11010102 yxxyxy
Expectation
• Let f(x) be a real-valued function. Then the expectation of f(X) is given by
$E[f(X)] = \sum_{x} f(x) \cdot \Pr[X = x]$
Independence
Two random variables X and Y are independent if, for all values a and b,
$\Pr[(X = a) \text{ and } (Y = b)] = \Pr[X = a] \cdot \Pr[Y = b]$
Independence
If two random variables X and Y are independent, then
$E[XY] = E[X]\,E[Y]$ and $E[f(X)\,g(Y)] = E[f(X)]\,E[g(Y)]$
Markov Inequality
Let Y be a random variable assuming only non-negative values. Then for all t>0,
$\Pr[Y \ge t] \le \frac{E[Y]}{t}$.
Chernoff Bound
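Theorem: Let $X_1, X_2, \dots, X_n$ be independent 0,1-random variables such that $\Pr[X_i = 1] = p$. Then for $0 < \delta < 1$ and $X = X_1 + X_2 + \dots + X_n$,
\[ \Pr[X \ge (1+\delta)pn] \le \left[ \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right]^{pn} \]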
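As a numerical sanity check, the bound can be compared with the exact binomial tail. The function names `upper_tail` and `chernoff_upper_bound` and the sample parameters are illustrative assumptions, not part of the lecture.

```python
from math import comb, e

def upper_tail(n, p, delta):
    """Exact Pr[X >= (1 + delta) * p * n] for X the sum of n independent 0/1 variables."""
    threshold = (1 + delta) * p * n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k >= threshold)

def chernoff_upper_bound(n, p, delta):
    """The bound [e^delta / (1 + delta)^(1 + delta)]^(p n)."""
    return (e**delta / (1 + delta)**(1 + delta))**(p * n)

print(upper_tail(100, 0.5, 0.2), chernoff_upper_bound(100, 0.5, 0.2))
```

For n = 100, p = 0.5 and δ = 0.2 the exact tail is well below the bound, which is expected since the Chernoff bound is not tight.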
Proof
Proof. For any real number t > 0,
\[ \Pr[X \ge (1+\delta)pn] = \Pr[e^{tX} \ge e^{t(1+\delta)pn}]. \]
Proof
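Proof. Apply the Markov inequality, then use the independence of $X_1, \dots, X_n$:
\[ \Pr[X \ge (1+\delta)pn] = \Pr[e^{tX} \ge e^{t(1+\delta)pn}] \le \frac{E[e^{tX}]}{e^{t(1+\delta)pn}} = \frac{E[e^{tX_1}]\,E[e^{tX_2}] \cdots E[e^{tX_n}]}{e^{t(1+\delta)pn}}. \]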
Proof
Proof. By the definition of expectation,
\[ E[e^{tX_i}] = p e^{t} + (1-p). \]
Using $1 + x \le e^{x}$,
\[ p e^{t} + (1-p) = 1 + p(e^{t} - 1) \le e^{p(e^{t}-1)}. \]
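Proof
Proof. Apply Markov inequality
\[ \Pr[X \ge (1+\delta)pn] \le \frac{E[e^{tX_1}]\,E[e^{tX_2}] \cdots E[e^{tX_n}]}{e^{t(1+\delta)pn}} \le \frac{e^{p(e^{t}-1)}\, e^{p(e^{t}-1)} \cdots e^{p(e^{t}-1)}}{e^{t(1+\delta)pn}} = \frac{e^{pn(e^{t}-1)}}{e^{t(1+\delta)pn}}. \]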
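Proof
Find a t to make it minimal. Let $t = \ln(1+\delta)$.
\[ \Pr[X \ge (1+\delta)pn] \le \frac{e^{pn(e^{t}-1)}}{e^{t(1+\delta)pn}} = \frac{e^{pn(1+\delta-1)}}{e^{\ln(1+\delta)(1+\delta)pn}} = \frac{e^{\delta pn}}{(1+\delta)^{(1+\delta)pn}} = \left[ \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right]^{pn}. \]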
Chernoff Bound
Theorem: Let $X_1, X_2, \dots, X_n$ be independent 0,1-random variables such that $\Pr[X_i = 1] = p$. Then for $0 < \delta < 1$ and $X = X_1 + X_2 + \dots + X_n$,
\[ \Pr[X \le (1-\delta)pn] \le e^{-\frac{1}{2}\delta^{2} pn} \]
Proof
Proof. For any real number t > 0,
\[ \Pr[X \le (1-\delta)pn] = \Pr[-X \ge -(1-\delta)pn] = \Pr[e^{-tX} \ge e^{-t(1-\delta)pn}]. \]
Proof
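Proof. Apply the Markov inequality, then use the independence of $X_1, \dots, X_n$:
\[ \Pr[X \le (1-\delta)pn] = \Pr[e^{-tX} \ge e^{-t(1-\delta)pn}] \le \frac{E[e^{-tX}]}{e^{-t(1-\delta)pn}} = \frac{E[e^{-tX_1}]\,E[e^{-tX_2}] \cdots E[e^{-tX_n}]}{e^{-t(1-\delta)pn}}. \]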
Proof
Proof. By the definition of expectation,
\[ E[e^{-tX_i}] = p e^{-t} + (1-p) = 1 + p(e^{-t} - 1) \le e^{p(e^{-t}-1)}. \]
Proof
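Proof.
\[ \Pr[X \le (1-\delta)pn] \le \frac{E[e^{-tX_1}]\,E[e^{-tX_2}] \cdots E[e^{-tX_n}]}{e^{-t(1-\delta)pn}} \le \frac{e^{p(e^{-t}-1)}\, e^{p(e^{-t}-1)} \cdots e^{p(e^{-t}-1)}}{e^{-t(1-\delta)pn}} = \frac{e^{pn(e^{-t}-1)}}{e^{-t(1-\delta)pn}}. \]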
Proof
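Proof. Let $t = \ln(1/(1-\delta))$.
\[ \Pr[X \le (1-\delta)pn] \le \frac{e^{pn(e^{-t}-1)}}{e^{-t(1-\delta)pn}} = \frac{e^{pn((1-\delta)-1)}}{e^{-\ln(1/(1-\delta))(1-\delta)pn}} = \frac{e^{-\delta pn}}{(1-\delta)^{(1-\delta)pn}} = \left[ \frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}} \right]^{pn}. \]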
Proof
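Use the following inequality:
For each $\delta \in (0,1]$,
\[ (1-\delta)^{(1-\delta)} > e^{-\delta + \delta^{2}/2}. \]
Substituting it into the bound $\left[ e^{-\delta} / (1-\delta)^{(1-\delta)} \right]^{pn}$ gives $\Pr[X \le (1-\delta)pn] \le e^{-\frac{1}{2}\delta^{2} pn}$.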
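Both the auxiliary inequality and the resulting lower-tail bound can be checked numerically. The names `lower_tail`, `chernoff_lower_bound`, and `inequality_holds`, and the sample parameters, are illustrative assumptions.

```python
from math import comb, exp

def lower_tail(n, p, delta):
    """Exact Pr[X <= (1 - delta) * p * n] for X the sum of n independent 0/1 variables."""
    threshold = (1 - delta) * p * n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k <= threshold)

def chernoff_lower_bound(n, p, delta):
    """The bound exp(-delta^2 * p * n / 2)."""
    return exp(-delta**2 * p * n / 2)

def inequality_holds(delta):
    """Checks (1 - delta)^(1 - delta) > exp(-delta + delta^2 / 2) for one value of delta."""
    return (1 - delta)**(1 - delta) > exp(-delta + delta**2 / 2)

print(lower_tail(100, 0.5, 0.2), chernoff_lower_bound(100, 0.5, 0.2))
```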
Homework 4
• Problem 1. An independent set of a graph G=(V,E) is a subset V’ of V of vertices such that each edge in E is incident on at most one vertex in V’. The independent set problem is to find a maximum-size independent set in G. Formulate a related decision problem for the independent set problem and prove that it is NP-complete.
Solution of Problem 1
• We reduce the clique problem to the independent set problem.
• Let G=(V,E) be a graph.
• Construct a graph G’=(V,E’) such that (u,v) is in E’ if and only if (u,v) is not in E.
• G has a clique of size k if and only if G’ has an independent set of size k.
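The reduction can be illustrated on small graphs with a brute-force check. The helper names `complement`, `has_clique`, and `has_independent_set` are hypothetical, and the exponential-time search is only there to confirm the equivalence on tiny inputs.

```python
from itertools import combinations

def complement(vertices, edges):
    """Edges of G' = (V, E'): (u, v) is in E' iff (u, v) is not in E."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(vertices, 2)
            if frozenset((u, v)) not in present]

def has_clique(vertices, edges, k):
    """Brute force: is there a set of k vertices with every pair joined by an edge?"""
    present = {frozenset(e) for e in edges}
    return any(all(frozenset(pair) in present for pair in combinations(group, 2))
               for group in combinations(vertices, k))

def has_independent_set(vertices, edges, k):
    """Brute force: is there a set of k vertices with no pair joined by an edge?"""
    present = {frozenset(e) for e in edges}
    return any(all(frozenset(pair) not in present for pair in combinations(group, 2))
               for group in combinations(vertices, k))
```

For any small graph, `has_clique(V, E, k)` and `has_independent_set(V, complement(V, E), k)` should return the same answer.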
Problem 2
The longest path problem is: given a graph G and an integer g, find in G a simple path of length g. Prove that the longest path problem is NP-complete.
Solution of Problem 2
• Reduce the Hamiltonian path problem to the longest path problem.
• Let G=(V,E) be an input for the Hamiltonian path problem. Assume n=|V| (number of vertices).
• Let <G,n> be an instance for the longest path problem, with the length of a path counted in vertices.
• G has a Hamiltonian path if and only if G has a simple path with n vertices.
Problem 3
In the hitting set problem, we are given a family of sets {S1, S2, …, Sn} and a budget b, and we wish to find a set H of size at most b which intersects every Si, if such an H exists. In other words, we want $H \cap S_i \neq \emptyset$ for all i.
Show that hitting set is NP-complete.
Solution of Problem 3
• Reduce the vertex cover problem to the hitting set problem.
• Let G=(V,E) be an input of vertex cover problem.
• Construct the hitting set problem with S1, S2,…, Sm such that each Si={u,v} for an edge (u,v) in E.
• The hitting set problem has solution of size b if and only if the graph can be covered by b vertices.
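The construction is short enough to write down directly. In this sketch the names `vertex_cover_to_hitting_set` and `has_hitting_set` are hypothetical, and the brute-force solver is included only to check the reduction on tiny graphs.

```python
from itertools import combinations

def vertex_cover_to_hitting_set(edges, b):
    """Each edge (u, v) becomes the set {u, v}; the budget b is unchanged."""
    return [set(e) for e in edges], b

def has_hitting_set(sets, b):
    """Brute force: is there a set H of size at most b that intersects every set?"""
    universe = sorted(set().union(*sets))
    return any(all(set(h) & s for s in sets)
               for size in range(b + 1)
               for h in combinations(universe, size))
```

For the path 0-1-2 a budget of 1 suffices (H = {1}), while a triangle needs a budget of 2.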
Problem 4
Show that for every problem A in NP, there is an algorithm which solves A in time $O(2^{p(n)})$, where n is the size of the input instance and p(n) is a polynomial (which may depend on A).
Polynomial
n: input size
$n^c$ is a polynomial of n, where c does not depend on n.
Examples: $n, n^2, n^3, \dots, n^{100}, \dots$
Class P
P is the complexity class consisting of all decision problems that have polynomial-time algorithms
Polynomial-Time Decision Problems
• Decision problems: output is 1 or 0 (“yes” or “no”)
• Examples:
• Is a given circuit satisfiable?
• Does a text T contain a pattern P?
• Does an instance of 0/1 Knapsack have a solution with benefit at least K?
• Does a graph G have an MST with weight at most K?
The Complexity Class P
• A complexity class is a collection of languages
• P is the complexity class consisting of all decision problems that have polynomial-time algorithms
• For each problem L in P, there is a polynomial-time decision algorithm A for L.
– If n=|x|, for x in L (decision with “yes”), then A runs in p(n) time on input x.
– The function p(n) is some polynomial
Verifier
A verifier for a language L is an algorithm V, where
L={w| V accepts <w,c> for some string c}
For the verifier V for L, c is a certificate of w if V accepts <w,c>
If the verifier V for the language L runs in polynomial time, V is the polynomial time verifier for L.
Verifier for Hamiltonian Path
For <G,s,t>, a certificate is a list of nodes of G: $v_1, v_2, \dots, v_m$
Verifier:
check if m is the number of nodes of G
check if all nodes are different
check if each $(v_i, v_{i+1})$ is a directed edge of G for i=1,…,m-1
If all pass, accept. Otherwise, reject.
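A possible Python rendering of this verifier. The function name `verify_hampath` and the input representation are assumptions. The check that the list starts at s and ends at t is not listed on the slide and is added here as an assumption, since the input is <G,s,t>.

```python
def verify_hampath(nodes, edges, s, t, certificate):
    """nodes: list of nodes of G; edges: directed pairs (u, v); certificate: proposed path."""
    edge_set = set(edges)
    # check if m is the number of nodes of G
    if len(certificate) != len(nodes) or not certificate:
        return False
    # check if all nodes are different (and all belong to G)
    if set(certificate) != set(nodes):
        return False
    # assumption, not on the slide: the path must run from s to t
    if certificate[0] != s or certificate[-1] != t:
        return False
    # check if each (v_i, v_{i+1}) is a directed edge of G
    return all((a, b) in edge_set for a, b in zip(certificate, certificate[1:]))
```

Each check takes time polynomial in the size of G, so this is a polynomial time verifier in the sense defined above.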
NP example (2)
• Problem: Decide if a graph has a Hamiltonian tour with weight at most K
• Verification Algorithm: 1. Test that the tour contains all nodes
2. Test that the tour has weight at most K
• Analysis: Verification takes O(n) time, so a nondeterministic algorithm solves the problem in polynomial time.
• Think about it this way: if we are given such a tour, we can verify it.
Class NP
NP is the class of languages that have polynomial time verifiers.
Examples:
• HAMPATH is in NP
Clique Problem
Given undirected graph G, a clique is a set of nodes of G such that every two nodes are connected by an edge.
A k-clique is a clique with k nodes
[Figure: a 5-clique]
Solution of Problem 4
• Assume that A is a problem in NP.
• A has a polynomial time verifier V(.). Let V(.) run in time h(n) = $n^d$.
• For each input x of length n, x is in A if and only if there is a certificate c such that V(x,c) accepts.
• Since V runs in polynomial time, the length of c is bounded by a polynomial q(n).
• The alphabet has a finite number k of symbols.
Solution of problem 4
• There are at most $k^{q(n)}$ certificates of length q(n).
• For each certificate c, check if V(x,c) accepts.
• If V(x,c) accepts for one of those certificates, then x is in A.
• Total time is $h(n)\, k^{q(n)} = n^{d} k^{q(n)} = 2^{d \log n}\, 2^{q(n) \log k} = 2^{d \log n + q(n) \log k} \le 2^{dn + kq(n)}$
• Let p(n) be the polynomial $p(n) = dn + kq(n)$, where k and d are constants.
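The exhaustive search over certificates can be sketched as follows. The function name `decide_by_exhaustive_search` and the subset-sum verifier used in the example are hypothetical, chosen only to make the sketch concrete.

```python
from itertools import product

def decide_by_exhaustive_search(x, verifier, alphabet, max_len):
    """Try every certificate of length at most max_len over the given alphabet."""
    for length in range(max_len + 1):
        for cert in product(alphabet, repeat=length):
            if verifier(x, cert):
                return True    # some certificate is accepted, so x is in A
    return False               # no certificate is accepted

def subset_sum_verifier(x, cert):
    """Example verifier: cert is a 0/1 tuple selecting numbers that must sum to the target."""
    numbers, target = x
    return (len(cert) == len(numbers)
            and sum(n for n, bit in zip(numbers, cert) if bit == 1) == target)
```

With alphabet size k and certificate length bound q(n), the loop runs the verifier on the order of $k^{q(n)}$ times, matching the running time computed above.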
Homework 5
Problem 1
Problem 1. Bin Packing Problem (35-1 from the textbook)
• Suppose that we are given a set of n objects, where the size $s_i$ of the i-th object satisfies $0 < s_i < 1$. We wish to pack all the objects into the minimum number of unit-size bins. Each bin can hold any subset of the objects whose total size does not exceed 1.
• Prove that the problem of determining the minimum number of bins required is NP-hard. (Hint: Reduce from the subset-sum problem).
• The first-fit heuristic takes each object in turn and places it into the first bin that can accommodate it. Let $S = \sum_{i=1}^{n} s_i$.
• Argue that the optimal number of bins required is at least $\lceil S \rceil$.
• Argue that the first-fit heuristic leaves at most one bin less than half full.
• Prove that the number of bins used by the first-fit heuristic is never more than $\lceil 2S \rceil$.
• Prove an approximation ratio of 2 for the first-fit heuristic.
• Analyze its running time.
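For reference, the first-fit heuristic described in the problem can be written as the short sketch below. The function name `first_fit` is illustrative, and this straightforward version scans all open bins for every object, so it is not meant as the efficient implementation the problem asks about.

```python
def first_fit(sizes):
    """Place each object into the first bin that can accommodate it; return the bin loads."""
    bins = []
    for size in sizes:
        for i, load in enumerate(bins):
            if load + size <= 1:
                bins[i] += size
                break
        else:
            bins.append(size)   # no existing bin fits: open a new one
    return bins
```

Using exact fractions for the sizes avoids floating-point rounding when a bin is filled to exactly 1.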
Problem 2
Problem 2. A box contains n balls. Each ball is either red or white. Let $\epsilon$ be an arbitrary constant in (0,1).
• a) Assume that the number of red balls is at least n/10. Develop a constant time algorithm that gives a $(1-\epsilon)$-approximation for the number of red balls in the box.
• b) If the number of red balls is at least n/m, what is the time complexity of your algorithm to give a $(1-\epsilon)$-approximation for the number of red balls?
• You can assume that the input is in an array char b[n], where each b[i] is either ‘r’ (red) or ‘w’ (white). Hint: Apply the Chernoff bound.
Problem 3
a) Develop an O(n) time algorithm that, given two lists of integers, decides if the second list is a permutation of the first. For example, 9, 2, 13, 97 is a permutation of 2, 9, 13, 97. Do not use a sorting algorithm that takes O(n log n) time.
b) Your algorithm may do a large number of multiplications that will generate very large numbers and slow down the computation. Propose some strategies to avoid large number multiplications in your algorithm.
Example
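[Figure: arithmetic circuit with inputs $x_1, x_2, x_3, x_4$; two multiplication gates (*) feed a subtraction gate (-)]
It outputs $x_1 x_2 - x_3 x_4$, which is not identically zero.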
Example
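[Figure: arithmetic circuit with inputs $x_1, x_2$; two multiplication gates (*) feed a subtraction gate (-)]
It outputs $x_1 x_2 - x_1 x_2$, which is identically zero.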