Aug 05, 2020
Data Structures and Algorithms
Xiaoqing [email protected]
What are algorithms?
A sequence of computational steps that transform the input into the output.
Sorting problem:
Input: a sequence of n numbers ⟨a1, a2, …, an⟩.
Output: a permutation (reordering) ⟨a'1, a'2, …, a'n⟩ such that a'1 ≤ a'2 ≤ … ≤ a'n.

Instance of the sorting problem:
Input: ⟨32, 45, 64, 28, 45, 58⟩
Output: ⟨28, 32, 45, 45, 58, 64⟩
Example of insertion sort:
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
2 3 4 6 8 9   done
Insertion sort

INSERTION-SORT(A)
1  for j ← 2 to length[A]
2      do key ← A[j]
3         // Insert A[j] into the sorted sequence A[1 .. j − 1]
4         i ← j − 1
5         while i > 0 and A[i] > key
6             do A[i + 1] ← A[i]
7                i ← i − 1
8         A[i + 1] ← key
[Figure: array A[1 .. n]; the prefix A[1 .. j − 1] is already sorted, key = A[j], and i scans leftward from j − 1.]
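A direct translation of the pseudocode above into Python might look like this (a minimal sketch: Python lists are 0-indexed while the pseudocode is 1-indexed, and the function name is my own, not the lecture's):

```python
def insertion_sort(a):
    """Sort list a in place, mirroring INSERTION-SORT(A)."""
    for j in range(1, len(a)):          # pseudocode line 1 (0-indexed here)
        key = a[j]                      # line 2
        i = j - 1                       # line 4
        while i >= 0 and a[i] > key:    # line 5
            a[i + 1] = a[i]             # line 6: shift larger element right
            i -= 1                      # line 7
        a[i + 1] = key                  # line 8: drop key into the gap
    return a

print(insertion_sort([8, 2, 4, 9, 3, 6]))  # → [2, 3, 4, 6, 8, 9]
```

Running it on the example input from the trace above reproduces the final line of the trace.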
Running time
- The running time depends on the input: an already sorted sequence is easier to sort.
- Parameterize the running time by the size of the input, since short sequences are easier to sort than long ones.
- Generally, we seek upper bounds on the running time, because everybody likes a guarantee.
Kinds of analyses
Worst-case (usually): T(n) = maximum time of the algorithm on any input of size n.
Average-case (sometimes): T(n) = expected time of the algorithm over all inputs of size n. Needs an assumption about the statistical distribution of inputs.
Best-case (bogus): cheat with a slow algorithm that works fast on some input.
Analysis of insertion sort

INSERTION-SORT(A)                                           cost   times
1  for j ← 2 to length[A]                                   c1     n
2      do key ← A[j]                                        c2     n − 1
3         // Insert A[j] into the sorted
          sequence A[1 .. j − 1]                            0      n − 1
4         i ← j − 1                                         c4     n − 1
5         while i > 0 and A[i] > key                        c5     ∑_{j=2}^{n} t_j
6             do A[i + 1] ← A[i]                            c6     ∑_{j=2}^{n} (t_j − 1)
7                i ← i − 1                                  c7     ∑_{j=2}^{n} (t_j − 1)
8         A[i + 1] ← key                                    c8     n − 1

Here t_j is the number of times the while-loop test on line 5 is executed for that value of j. Summing cost × times over all lines:

T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 ∑_{j=2}^{n} t_j + c6 ∑_{j=2}^{n} (t_j − 1) + c7 ∑_{j=2}^{n} (t_j − 1) + c8(n − 1)
Analysis of insertion sort: best and worst case

T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 ∑_{j=2}^{n} t_j + c6 ∑_{j=2}^{n} (t_j − 1) + c7 ∑_{j=2}^{n} (t_j − 1) + c8(n − 1)

Best case: the array is already sorted, so t_j = 1 for every j and lines 6–7 never execute:
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n − 1) + c8(n − 1)
     = (c1 + c2 + c4 + c5 + c8)·n − (c2 + c4 + c5 + c8)
     = an + b   (a linear function of n)

Worst case: the array is in reverse sorted order, so t_j = j; using ∑_{j=2}^{n} j = n(n + 1)/2 − 1 and ∑_{j=2}^{n} (j − 1) = n(n − 1)/2:
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n(n + 1)/2 − 1) + c6·n(n − 1)/2 + c7·n(n − 1)/2 + c8(n − 1)
     = (c5/2 + c6/2 + c7/2)·n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8)·n − (c2 + c4 + c5 + c8)
     = an² + bn + c   (a quadratic function of n)
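The quadratic worst case can be seen concretely by counting how many times the shift on line 6 runs: on a reverse-sorted input t_j = j, so the shift count is ∑_{j=2}^{n} (j − 1) = n(n − 1)/2. A small instrumented sketch (the function name is my own):

```python
def shift_count(a):
    """Run insertion sort on a copy of a, counting line-6 shifts."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            shifts += 1          # one execution of line 6
            i -= 1
        a[i + 1] = key
    return shifts

n = 50
print(shift_count(range(n, 0, -1)))  # reverse sorted: n(n-1)/2 = 1225
print(shift_count(range(1, n + 1)))  # already sorted: 0
```

The two prints exhibit the quadratic worst case and the linear best case side by side.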
Machine-independent time
What is insertion sort's worst-case time? It depends on the speed of our computer:
- relative speed (on the same machine),
- absolute speed (on different machines).

"Asymptotic analysis"
BIG IDEA:
- Ignore machine-dependent constants.
- Look at the growth of T(n) as n → ∞.
Θ-notation
Math: Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }.
Engineering: drop low-order terms; ignore leading constants.
Example: 3n³ + 90n² − 5n + 6046 = Θ(n³).
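The engineering rule can be checked numerically: for f(n) = 3n³ + 90n² − 5n + 6046, the ratio f(n)/n³ approaches the leading constant 3 as n grows, which is exactly what f(n) = Θ(n³) predicts. A quick sketch:

```python
def f(n):
    return 3 * n**3 + 90 * n**2 - 5 * n + 6046

for n in (10, 1000, 10**6):
    print(n, f(n) / n**3)   # the ratio tends to 3 as n grows
```

At small n the low-order terms dominate (the ratio is far from 3); by n = 10⁶ it is within 0.001 of 3.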
Asymptotic performance
When n gets large enough, a Θ(n²) algorithm always beats a Θ(n³) algorithm.
[Figure: T(n) versus n; the two curves cross at some n0.]
Merge sort

MERGE-SORT A[1 .. n]
1. If n = 1, done.
2. Recursively sort A[1 .. ⌈n/2⌉] and A[⌈n/2⌉ + 1 .. n].
3. "Merge" the 2 sorted lists.

Key subroutine: MERGE
Merging two sorted arrays
Merge ⟨2, 7, 13, 20⟩ and ⟨1, 9, 11, 12⟩: repeatedly compare the smallest remaining element of each array and output the smaller one:
1, 2, 7, 9, 11, 12 — then only ⟨13, 20⟩ remains, which is appended.
Result: 1, 2, 7, 9, 11, 12, 13, 20.
Time = Θ(n) to merge a total of n elements (linear time).
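MERGE and MERGE-SORT can be sketched in Python as follows (0-indexed; the helper names are mine, not the lecture's):

```python
def merge(left, right):
    """Merge two sorted lists in Θ(n) time, n = len(left) + len(right)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # take the smaller front element
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                 # one side is exhausted;
    out.extend(right[j:])                # append the other's remainder
    return out

def merge_sort(a):
    if len(a) <= 1:                      # step 1: n = 1, done
        return list(a)
    mid = (len(a) + 1) // 2              # ⌈n/2⌉, as in the pseudocode
    return merge(merge_sort(a[:mid]),    # step 2: sort the two halves
                 merge_sort(a[mid:]))    # step 3: merge them
```

Calling merge([2, 7, 13, 20], [1, 9, 11, 12]) reproduces the worked example above.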
Operation of merge sort

initial sequence:   20 12 | 7 9 | 11 13 | 1 2
                       ↓ merge (four pairwise merges)
                    12 20 | 7 9 | 11 13 | 1 2
                       ↓ merge (two merges of pairs)
                    7 9 12 20 | 1 2 11 13
                       ↓ merge
sorted sequence:    1 2 7 9 11 12 13 20
Analyzing merge sort

MERGE-SORT A[1 .. n]                                  T(n)
1. If n = 1, done.                                    Θ(1)
2. Recursively sort A[1 .. ⌈n/2⌉]
   and A[⌈n/2⌉ + 1 .. n].                             2T(n/2)
3. "Merge" the 2 sorted lists.                        Θ(n)

Sloppiness: step 2 should be T(⌈n/2⌉) + T(⌊n/2⌋), but it turns out not to matter asymptotically.
Recurrence for merge sort

T(n) = Θ(1)              if n = 1,
       2T(n/2) + Θ(n)    if n > 1.

Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
Expanding the recurrence into a recursion tree:

                  cn                        → cn
               /      \
           cn/2        cn/2                 → cn
          /    \      /    \
       cn/4  cn/4  cn/4  cn/4               → cn
         ⋮                                     ⋮
    c  c  c  c  c  …  c  c                  → cn   (n leaves)

Height h = lg n, and each of the lg n + 1 levels costs cn.
Total: cn lg n + cn = Θ(n lg n).
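The closed form cn lg n + cn can be sanity-checked by evaluating the recurrence exactly for powers of two, taking T(1) = c as the (assumed) base case:

```python
from math import log2

def T(n, c=1):
    """Exact value of T(n) = 2T(n/2) + cn with T(1) = c, n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n

for k in range(1, 11):
    n = 2 ** k
    assert T(n) == n * log2(n) + n   # matches cn lg n + cn with c = 1
print("T(n) = cn lg n + cn verified for n = 2 .. 1024")
```

Every power of two up to 1024 matches the closed form exactly, so the tree sum is tight, not just an upper bound.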
Insertion sort versus merge sort

Computer A executes one billion (10⁹) instructions per second and runs insertion sort, tuned to 2n² instructions (c1 = 2).
Computer B executes ten million (10⁷) instructions per second and runs merge sort, tuned to 50 n lg n instructions (c2 = 50).

To sort one million numbers:
Computer A: 2 · (10⁶)² instructions / 10⁹ instructions per second = 2000 seconds.
Computer B: 50 · 10⁶ · lg 10⁶ instructions / 10⁷ instructions per second ≈ 100 seconds.

For ten million numbers, insertion sort on the fast machine takes about 2.3 days, while merge sort on the slow machine takes about 20 minutes.
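The arithmetic behind these figures, under the stated instruction counts and machine speeds, can be reproduced directly:

```python
from math import log2

def insertion_time(n, speed=1e9):     # Computer A: 2n² instructions
    return 2 * n**2 / speed

def merge_time(n, speed=1e7):         # Computer B: 50 n lg n instructions
    return 50 * n * log2(n) / speed

n = 10**6
print(insertion_time(n))              # 2000.0 seconds
print(merge_time(n))                  # ≈ 99.7 seconds

n = 10**7
print(insertion_time(n) / 86400)      # ≈ 2.3 days
print(merge_time(n) / 60)             # ≈ 19.4 minutes
```

Even with a 100× slower machine, the asymptotically faster algorithm wins decisively at these input sizes.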
Analysis of algorithms
The theoretical study of computer-program performance and resource usage.

What's more important than performance?
Modularity, correctness, maintainability, functionality, robustness, user-friendliness, programmer time, simplicity, extensibility, reliability.
Comparison of running times
For each function f(n) and time t in the following table, determine the largest size n of a problem that can be solved in time t, assuming that the algorithm to solve the problem takes f(n) microseconds.

f(n): lg n, √n, n, n lg n, n², n³, 2ⁿ, n!
t: 1 second, 1 minute, 1 hour, 1 day, 1 month, 1 year, 1 century
Asymptotically tight bound
Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }.
[Figure: f(n) sandwiched between c1·g(n) and c2·g(n) for all n ≥ n0.]
f(n) = Θ(g(n))
Asymptotic upper bound
O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0 }.
[Figure: f(n) lying below c·g(n) for all n ≥ n0.]
f(n) = O(g(n))
Asymptotic lower bound
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0 }.
[Figure: f(n) lying above c·g(n) for all n ≥ n0.]
f(n) = Ω(g(n))
Asymptotic notations
An analogy between the asymptotic comparison of two functions f and g and the comparison of two real numbers a and b:
f(n) = O(g(n))  ≈  a ≤ b,
f(n) = Ω(g(n))  ≈  a ≥ b,
f(n) = Θ(g(n))  ≈  a = b.
Θ(g(n)) = O(g(n)) ∩ Ω(g(n)).
Recurrences
- Substitution method
- Recursion-tree method
- Master method

Substitution method
The most general method:
1. Guess the form of the solution.
2. Verify by induction.
3. Solve for constants.
EXAMPLE: T(n) = 4T(n/2) + n. [Assume that T(1) = Θ(1).]
Guess O(n³). Assume that T(k) ≤ ck³ for k < n.
Prove T(n) ≤ cn³ by induction.

Example of substitution:
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)³ + n
     = (c/2)n³ + n
     = cn³ − ((c/2)n³ − n)   ← desired − residual
     ≤ cn³                   ← desired
whenever the residual (c/2)n³ − n ≥ 0, for example, if c ≥ 2 and n ≥ 1.

We must also handle the initial conditions, that is, ground the induction with base cases.
Base: T(n) = Θ(1) for all n < n0, where n0 is a suitable constant.
For 1 ≤ n < n0, we have "Θ(1)" ≤ cn³, if we pick c big enough.
This bound is not tight!
A tighter upper bound?
We shall prove that T(n) = O(n²).
Assume that T(k) ≤ ck² for k < n:
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)² + n
     = cn² + n
     = O(n²)   Wrong! We must prove the exact inductive hypothesis:
     = cn² − (−n)   [desired − residual]
     ≤ cn²   for no choice of c > 0. Lose!
A tighter upper bound!
IDEA: Strengthen the inductive hypothesis by subtracting a low-order term.
Inductive hypothesis: T(k) ≤ c1·k² − c2·k for k < n.
T(n) = 4T(n/2) + n
     ≤ 4(c1(n/2)² − c2(n/2)) + n
     = c1·n² − 2c2·n + n
     = c1·n² − c2·n − (c2·n − n)
     ≤ c1·n² − c2·n   if c2 ≥ 1.
Pick c1 big enough to handle the initial conditions.
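For this recurrence the strengthened hypothesis turns out to be exactly tight: taking T(1) = 1 (my assumed base case) and n a power of 2, T(n) = 4T(n/2) + n solves to exactly 2n² − n, i.e. c1 = 2, c2 = 1 with equality. A quick check:

```python
def T(n):
    """T(n) = 4T(n/2) + n with T(1) = 1, for n a power of 2."""
    return 1 if n == 1 else 4 * T(n // 2) + n

for k in range(11):
    n = 2 ** k
    assert T(n) == 2 * n * n - n   # the hypothesis c1·n² − c2·n, with equality
print("T(n) = 2n² − n for n = 1 .. 1024")
```

This shows why the weaker hypothesis T(k) ≤ ck² had to fail: the true solution sits a full linear term below cn², and the induction needs that slack.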
Substitution: changing variables
T(n) = 2T(⌊√n⌋) + lg n
Renaming m = lg n yields
T(2^m) = 2T(2^(m/2)) + m.
Rename S(m) = T(2^m) to produce the new recurrence
S(m) = 2S(m/2) + m, so S(m) = O(m lg m).
Changing back from S(m) to T(n):
T(n) = T(2^m) = S(m) = O(m lg m) = O(lg n · lg lg n).
Recursion-tree method
A recursion tree models the costs (time) of a recursive execution of an algorithm. The recursion-tree method promotes intuition, and it is good for generating guesses for the substitution method.
Example of recursion tree
Solve T(n) = 3T(⌊n/4⌋) + n² (written T(n) = 3T(n/4) + n² below):

                     cn²                                 → cn²
           /          |          \
      c(n/4)²     c(n/4)²     c(n/4)²                    → (3/16) cn²
      /  |  \     /  |  \     /  |  \
  c(n/16)²   …   (nine nodes)   …   c(n/16)²             → (3/16)² cn²
       ⋮                                                     ⋮
  T(1)  T(1)  …  (n^(log₄3) leaves)  …  T(1)             → Θ(n^(log₄3))

Height = log₄ n.
Cost for the entire tree:
T(n) = cn² + (3/16) cn² + (3/16)² cn² + ⋯ + (3/16)^(log₄n − 1) cn² + Θ(n^(log₄3))
     = ∑_{i=0}^{log₄n − 1} (3/16)^i cn² + Θ(n^(log₄3))
     < ∑_{i=0}^{∞} (3/16)^i cn² + Θ(n^(log₄3))
     = (1 / (1 − 3/16)) cn² + Θ(n^(log₄3))
     = (16/13) cn² + Θ(n^(log₄3))
     = O(n²).
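Numerically, with c = 1 and the (assumed) base case T(1) = 1, the ratio T(n)/n² indeed stays below the geometric-series bound 16/13 ≈ 1.23 and approaches it from below as n grows:

```python
def T(n):
    """T(n) = 3T(n/4) + n² with T(1) = 1, for n a power of 4."""
    return 1 if n == 1 else 3 * T(n // 4) + n * n

for k in range(1, 9):
    n = 4 ** k
    assert T(n) / n**2 < 16 / 13   # never exceeds the series bound
print(T(4**8) / (4**8) ** 2)       # close to 16/13 ≈ 1.2308
```

The bound from summing the infinite series is therefore not just valid but essentially the exact leading constant.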
Substitution method to verify: T(n) = 3T(⌊n/4⌋) + Θ(n²).
Guess T(n) = O(n²) and assume T(k) ≤ dk² for k < n:
T(n) ≤ 3T(⌊n/4⌋) + cn²
     ≤ 3d⌊n/4⌋² + cn²
     ≤ 3d(n/4)² + cn²
     = (3/16) dn² + cn²
     ≤ dn²,
where the last step holds as long as d ≥ (16/13)c.
The master method
The master method applies to recurrences of the form
T(n) = aT(n/b) + f(n),
where a ≥ 1, b > 1, and f is asymptotically positive.
Three common cases
Compare f(n) with n^(log_b a):

1. f(n) = O(n^(log_b a − ε)) for some constant ε > 0: f(n) grows polynomially slower than n^(log_b a) (by an n^ε factor).
   Solution: T(n) = Θ(n^(log_b a)).

2. f(n) = Θ(n^(log_b a)): f(n) and n^(log_b a) grow at similar rates.
   Solution: T(n) = Θ(n^(log_b a) lg n).

3. f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0: f(n) grows polynomially faster than n^(log_b a) (by an n^ε factor), and f(n) satisfies the regularity condition that a·f(n/b) ≤ c·f(n) for some constant c < 1.
   Solution: T(n) = Θ(f(n)).
Idea of master theorem
Recursion tree for T(n) = aT(n/b) + f(n):

                  f(n)                                   → f(n)
         /    …(a children)…    \
    f(n/b)       f(n/b)       f(n/b)                     → a·f(n/b)
    / … \        / … \        / … \
  f(n/b²)   …   (a² nodes)   …   f(n/b²)                 → a²·f(n/b²)
       ⋮                                                     ⋮
  T(1)  …  (n^(log_b a) leaves)  …  T(1)                 → Θ(n^(log_b a))

Height = log_b n.
Total: T(n) = Θ(n^(log_b a)) + ∑_{j=0}^{log_b n − 1} a^j f(n/b^j).
Examples

EX. T(n) = 4T(n/2) + n
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n.
CASE 1: f(n) = O(n^(2 − ε)) for ε = 1.
∴ T(n) = Θ(n²).

EX. T(n) = 4T(n/2) + n²
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n².
CASE 2: f(n) = Θ(n²).
∴ T(n) = Θ(n² lg n).

EX. T(n) = 4T(n/2) + n³
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n³.
CASE 3: f(n) = Ω(n^(2 + ε)) for ε = 1, and 4(n/2)³ ≤ cn³ (reg. cond.) for c = 1/2.
∴ T(n) = Θ(n³).

EX. T(n) = 4T(n/2) + n²/lg n
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n²/lg n.
The master method does not apply: n²/lg n is smaller than n², but not polynomially smaller (lg n grows slower than n^ε for every ε > 0).
Any questions?
Xiaoqing Zheng
Fudan University