Aug 05, 2020
Data Structures and Algorithms
Xiaoqing [email protected]
What are algorithms?
A sequence of computational steps that transform the input into the output.
Sorting problem:
Input: a sequence of n numbers ⟨a1, a2, …, an⟩.
Output: a permutation (reordering) ⟨a'1, a'2, …, a'n⟩ such that a'1 ≤ a'2 ≤ … ≤ a'n.

Instance of the sorting problem:
Input: ⟨32, 45, 64, 28, 45, 58⟩
Output: ⟨28, 32, 45, 45, 58, 64⟩
Example of insertion sort:
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
2 3 4 6 8 9   done
Insertion sort

INSERTION-SORT(A)
1  for j ← 2 to length[A]
2      do key ← A[j]
3         // Insert A[j] into the sorted sequence A[1 .. j − 1]
4         i ← j − 1
5         while i > 0 and A[i] > key
6             do A[i + 1] ← A[i]
7                i ← i − 1
8         A[i + 1] ← key
[Figure: array A[1 .. n]; the prefix A[1 .. j − 1] is already sorted, key = A[j], and i scans leftward from j − 1.]
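A direct translation of the pseudocode above into Python might look like this (a minimal sketch: Python lists are 0-indexed while the pseudocode is 1-indexed, and the function name is my own, not the lecture's):

```python
def insertion_sort(a):
    """Sort list a in place, mirroring INSERTION-SORT(A)."""
    for j in range(1, len(a)):          # pseudocode line 1 (0-indexed here)
        key = a[j]                      # line 2
        i = j - 1                       # line 4
        while i >= 0 and a[i] > key:    # line 5
            a[i + 1] = a[i]             # line 6: shift larger element right
            i -= 1                      # line 7
        a[i + 1] = key                  # line 8: drop key into the gap
    return a

print(insertion_sort([8, 2, 4, 9, 3, 6]))  # → [2, 3, 4, 6, 8, 9]
```

Running it on the example input from the trace above reproduces the final line of the trace.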
Running time
- The running time depends on the input: an already sorted sequence is easier to sort.
- Parameterize the running time by the size of the input, since short sequences are easier to sort than long ones.
- Generally, we seek upper bounds on the running time, because everybody likes a guarantee.
Kinds of analyses
Worst-case (usually): T(n) = maximum time of the algorithm on any input of size n.
Average-case (sometimes): T(n) = expected time of the algorithm over all inputs of size n. Needs an assumption about the statistical distribution of inputs.
Best-case (bogus): cheat with a slow algorithm that works fast on some input.
Analysis of insertion sort

INSERTION-SORT(A)                                           cost   times
1  for j ← 2 to length[A]                                   c1     n
2      do key ← A[j]                                        c2     n − 1
3         // Insert A[j] into the sorted
          sequence A[1 .. j − 1]                            0      n − 1
4         i ← j − 1                                         c4     n − 1
5         while i > 0 and A[i] > key                        c5     ∑_{j=2}^{n} t_j
6             do A[i + 1] ← A[i]                            c6     ∑_{j=2}^{n} (t_j − 1)
7                i ← i − 1                                  c7     ∑_{j=2}^{n} (t_j − 1)
8         A[i + 1] ← key                                    c8     n − 1

Here t_j is the number of times the while-loop test on line 5 is executed for that value of j. Summing cost × times over all lines:

T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 ∑_{j=2}^{n} t_j + c6 ∑_{j=2}^{n} (t_j − 1) + c7 ∑_{j=2}^{n} (t_j − 1) + c8(n − 1)
Analysis of insertion sort: best and worst case

T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 ∑_{j=2}^{n} t_j + c6 ∑_{j=2}^{n} (t_j − 1) + c7 ∑_{j=2}^{n} (t_j − 1) + c8(n − 1)

Best case: the array is already sorted, so t_j = 1 for every j and lines 6–7 never execute:
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n − 1) + c8(n − 1)
     = (c1 + c2 + c4 + c5 + c8)·n − (c2 + c4 + c5 + c8)
     = an + b   (a linear function of n)

Worst case: the array is in reverse sorted order, so t_j = j; using ∑_{j=2}^{n} j = n(n + 1)/2 − 1 and ∑_{j=2}^{n} (j − 1) = n(n − 1)/2:
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n(n + 1)/2 − 1) + c6·n(n − 1)/2 + c7·n(n − 1)/2 + c8(n − 1)
     = (c5/2 + c6/2 + c7/2)·n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8)·n − (c2 + c4 + c5 + c8)
     = an² + bn + c   (a quadratic function of n)
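The quadratic worst case can be seen concretely by counting how many times the shift on line 6 runs: on a reverse-sorted input t_j = j, so the shift count is ∑_{j=2}^{n} (j − 1) = n(n − 1)/2. A small instrumented sketch (the function name is my own):

```python
def shift_count(a):
    """Run insertion sort on a copy of a, counting line-6 shifts."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            shifts += 1          # one execution of line 6
            i -= 1
        a[i + 1] = key
    return shifts

n = 50
print(shift_count(range(n, 0, -1)))  # reverse sorted: n(n-1)/2 = 1225
print(shift_count(range(1, n + 1)))  # already sorted: 0
```

The two prints exhibit the quadratic worst case and the linear best case side by side.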
Machine-independent time
What is insertion sort's worst-case time? It depends on the speed of our computer:
- relative speed (on the same machine),
- absolute speed (on different machines).

"Asymptotic analysis"
BIG IDEA:
- Ignore machine-dependent constants.
- Look at the growth of T(n) as n → ∞.
Θ-notation
Math: Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }.
Engineering: drop low-order terms; ignore leading constants.
Example: 3n³ + 90n² − 5n + 6046 = Θ(n³).
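The engineering rule can be checked numerically: for f(n) = 3n³ + 90n² − 5n + 6046, the ratio f(n)/n³ approaches the leading constant 3 as n grows, which is exactly what f(n) = Θ(n³) predicts. A quick sketch:

```python
def f(n):
    return 3 * n**3 + 90 * n**2 - 5 * n + 6046

for n in (10, 1000, 10**6):
    print(n, f(n) / n**3)   # the ratio tends to 3 as n grows
```

At small n the low-order terms dominate (the ratio is far from 3); by n = 10⁶ it is within 0.001 of 3.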
Asymptotic performance
When n gets large enough, a Θ(n²) algorithm always beats a Θ(n³) algorithm.
[Figure: T(n) versus n; the two curves cross at some n0.]
Merge sort

MERGE-SORT A[1 .. n]
1. If n = 1, done.
2. Recursively sort A[1 .. ⌈n/2⌉] and A[⌈n/2⌉ + 1 .. n].
3. "Merge" the 2 sorted lists.

Key subroutine: MERGE
Merging two sorted arrays
Merge ⟨2, 7, 13, 20⟩ and ⟨1, 9, 11, 12⟩: repeatedly compare the smallest remaining element of each array and output the smaller one:
1, 2, 7, 9, 11, 12 — then only ⟨13, 20⟩ remains, which is appended.
Result: 1, 2, 7, 9, 11, 12, 13, 20.
Time = Θ(n) to merge a total of n elements (linear time).
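MERGE and MERGE-SORT can be sketched in Python as follows (0-indexed; the helper names are mine, not the lecture's):

```python
def merge(left, right):
    """Merge two sorted lists in Θ(n) time, n = len(left) + len(right)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # take the smaller front element
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                 # one side is exhausted;
    out.extend(right[j:])                # append the other's remainder
    return out

def merge_sort(a):
    if len(a) <= 1:                      # step 1: n = 1, done
        return list(a)
    mid = (len(a) + 1) // 2              # ⌈n/2⌉, as in the pseudocode
    return merge(merge_sort(a[:mid]),    # step 2: sort the two halves
                 merge_sort(a[mid:]))    # step 3: merge them
```

Calling merge([2, 7, 13, 20], [1, 9, 11, 12]) reproduces the worked example above.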
Operation of merge sort

initial sequence:   20 12 | 7 9 | 11 13 | 1 2
                       ↓ merge (four pairwise merges)
                    12 20 | 7 9 | 11 13 | 1 2
                       ↓ merge (two merges of pairs)
                    7 9 12 20 | 1 2 11 13
                       ↓ merge
sorted sequence:    1 2 7 9 11 12 13 20
Analyzing merge sort

MERGE-SORT A[1 .. n]                                  T(n)
1. If n = 1, done.                                    Θ(1)
2. Recursively sort A[1 .. ⌈n/2⌉]
   and A[⌈n/2⌉ + 1 .. n].                             2T(n/2)
3. "Merge" the 2 sorted lists.                        Θ(n)

Sloppiness: step 2 should be T(⌈n/2⌉) + T(⌊n/2⌋), but it turns out not to matter asymptotically.
Recurrence for merge sort

T(n) = Θ(1)              if n = 1,
       2T(n/2) + Θ(n)    if n > 1.

Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
Expanding the recurrence into a recursion tree:

                  cn                        → cn
               /      \
           cn/2        cn/2                 → cn
          /    \      /    \
       cn/4  cn/4  cn/4  cn/4               → cn
         ⋮                                     ⋮
    c  c  c  c  c  …  c  c                  → cn   (n leaves)

Height h = lg n, and each of the lg n + 1 levels costs cn.
Total: cn lg n + cn = Θ(n lg n).
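The closed form cn lg n + cn can be sanity-checked by evaluating the recurrence exactly for powers of two, taking T(1) = c as the (assumed) base case:

```python
from math import log2

def T(n, c=1):
    """Exact value of T(n) = 2T(n/2) + cn with T(1) = c, n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n

for k in range(1, 11):
    n = 2 ** k
    assert T(n) == n * log2(n) + n   # matches cn lg n + cn with c = 1
print("T(n) = cn lg n + cn verified for n = 2 .. 1024")
```

Every power of two up to 1024 matches the closed form exactly, so the tree sum is tight, not just an upper bound.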
Insertion sort versus merge sort

Computer A executes one billion (10⁹) instructions per second and runs insertion sort, tuned to 2n² instructions (c1 = 2).
Computer B executes ten million (10⁷) instructions per second and runs merge sort, tuned to 50 n lg n instructions (c2 = 50).

To sort one million numbers:
Computer A: 2 · (10⁶)² instructions / 10⁹ instructions per second = 2000 seconds.
Computer B: 50 · 10⁶ · lg 10⁶ instructions / 10⁷ instructions per second ≈ 100 seconds.

For ten million numbers, insertion sort on the fast machine takes about 2.3 days, while merge sort on the slow machine takes about 20 minutes.
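The arithmetic behind these figures, under the stated instruction counts and machine speeds, can be reproduced directly:

```python
from math import log2

def insertion_time(n, speed=1e9):     # Computer A: 2n² instructions
    return 2 * n**2 / speed

def merge_time(n, speed=1e7):         # Computer B: 50 n lg n instructions
    return 50 * n * log2(n) / speed

n = 10**6
print(insertion_time(n))              # 2000.0 seconds
print(merge_time(n))                  # ≈ 99.7 seconds

n = 10**7
print(insertion_time(n) / 86400)      # ≈ 2.3 days
print(merge_time(n) / 60)             # ≈ 19.4 minutes
```

Even with a 100× slower machine, the asymptotically faster algorithm wins decisively at these input sizes.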
Analysis of algorithms
The theoretical study of computer-program performance and resource usage.

What's more important than performance?
Modularity, correctness, maintainability, functionality, robustness, user-friendliness, programmer time, simplicity, extensibility, reliability.
Comparison of running times
For each function f(n) and time t in the following table, determine the largest size n of a problem that can be solved in time t, assuming that the algorithm to solve the problem takes f(n) microseconds.

f(n): lg n, √n, n, n lg n, n², n³, 2ⁿ, n!
t: 1 second, 1 minute, 1 hour, 1 day, 1 month, 1 year, 1 century
Asymptotically tight bound
Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }.
[Figure: f(n) sandwiched between c1·g(n) and c2·g(n) for all n ≥ n0.]
f(n) = Θ(g(n))
Asymptotic upper bound
O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0 }.
[Figure: f(n) lying below c·g(n) for all n ≥ n0.]
f(n) = O(g(n))
Asymptotic lower bound
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0 }.
[Figure: f(n) lying above c·g(n) for all n ≥ n0.]
f(n) = Ω(g(n))
Asymptotic notations
An analogy between the asymptotic comparison of two functions f and g and the comparison of two real numbers a and b:
f(n) = O(g(n))  ≈  a ≤ b,
f(n) = Ω(g(n))  ≈  a ≥ b,
f(n) = Θ(g(n))  ≈  a = b.
Θ(g(n)) = O(g(n)) ∩ Ω(g(n)).
Recurrences
- Substitution method
- Recursion-tree method
- Master method

Substitution method
The most general method:
1. Guess the form of the solution.
2. Verify by induction.
3. Solve for constants.
EXAMPLE: T(n) = 4T(n/2) + n. [Assume that T(1) = Θ(1).]
Guess O(n³). Assume that T(k) ≤ ck³ for k < n.
Prove T(n) ≤ cn³ by induction.

Example of substitution:
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)³ + n
     = (c/2)n³ + n
     = cn³ − ((c/2)n³ − n)   ← desired − residual
     ≤ cn³                   ← desired
whenever the residual (c/2)n³ − n ≥ 0, for example, if c ≥ 2 and n ≥ 1.

We must also handle the initial conditions, that is, ground the induction with base cases.
Base: T(n) = Θ(1) for all n < n0, where n0 is a suitable constant.
For 1 ≤ n < n0, we have "Θ(1)" ≤ cn³, if we pick c big enough.
This bound is not tight!
A tighter upper bound?
We shall prove that T(n) = O(n²).
Assume that T(k) ≤ ck² for k < n:
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)² + n
     = cn² + n
     = O(n²)   Wrong! We must prove the exact inductive hypothesis:
     = cn² − (−n)   [desired − residual]
     ≤ cn²   for no choice of c > 0. Lose!
A tighter upper bound!
IDEA: Strengthen the inductive hypothesis by subtracting a low-order term.
Inductive hypothesis: T(k) ≤ c1·k² − c2·k for k < n.
T(n) = 4T(n/2) + n
     ≤ 4(c1(n/2)² − c2(n/2)) + n
     = c1·n² − 2c2·n + n
     = c1·n² − c2·n − (c2·n − n)
     ≤ c1·n² − c2·n   if c2 ≥ 1.
Pick c1 big enough to handle the initial conditions.
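For this recurrence the strengthened hypothesis turns out to be exactly tight: taking T(1) = 1 (my assumed base case) and n a power of 2, T(n) = 4T(n/2) + n solves to exactly 2n² − n, i.e. c1 = 2, c2 = 1 with equality. A quick check:

```python
def T(n):
    """T(n) = 4T(n/2) + n with T(1) = 1, for n a power of 2."""
    return 1 if n == 1 else 4 * T(n // 2) + n

for k in range(11):
    n = 2 ** k
    assert T(n) == 2 * n * n - n   # the hypothesis c1·n² − c2·n, with equality
print("T(n) = 2n² − n for n = 1 .. 1024")
```

This shows why the weaker hypothesis T(k) ≤ ck² had to fail: the true solution sits a full linear term below cn², and the induction needs that slack.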
Substitution: changing variables
T(n) = 2T(⌊√n⌋) + lg n
Renaming m = lg n yields
T(2^m) = 2T(2^(m/2)) + m.
Rename S(m) = T(2^m) to produce the new recurrence
S(m) = 2S(m/2) + m, so S(m) = O(m lg m).
Changing back from S(m) to T(n):
T(n) = T(2^m) = S(m) = O(m lg m) = O(lg n · lg lg n).
Recursion-tree method
A recursion tree models the costs (time) of a recursive execution of an algorithm. The recursion-tree method promotes intuition, and it is good for generating guesses for the substitution method.
Example of recursion tree
Solve T(n) = 3T(⌊n/4⌋) + n² (written T(n) = 3T(n/4) + n² below):

                     cn²                                 → cn²
           /          |          \
      c(n/4)²     c(n/4)²     c(n/4)²                    → (3/16) cn²
      /  |  \     /  |  \     /  |  \
  c(n/16)²   …   (nine nodes)   …   c(n/16)²             → (3/16)² cn²
       ⋮                                                     ⋮
  T(1)  T(1)  …  (n^(log₄3) leaves)  …  T(1)             → Θ(n^(log₄3))

Height = log₄ n.
Cost for the entire tree:
T(n) = cn² + (3/16) cn² + (3/16)² cn² + ⋯ + (3/16)^(log₄n − 1) cn² + Θ(n^(log₄3))
     = ∑_{i=0}^{log₄n − 1} (3/16)^i cn² + Θ(n^(log₄3))
     < ∑_{i=0}^{∞} (3/16)^i cn² + Θ(n^(log₄3))
     = (1 / (1 − 3/16)) cn² + Θ(n^(log₄3))
     = (16/13) cn² + Θ(n^(log₄3))
     = O(n²).
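Numerically, with c = 1 and the (assumed) base case T(1) = 1, the ratio T(n)/n² indeed stays below the geometric-series bound 16/13 ≈ 1.23 and approaches it from below as n grows:

```python
def T(n):
    """T(n) = 3T(n/4) + n² with T(1) = 1, for n a power of 4."""
    return 1 if n == 1 else 3 * T(n // 4) + n * n

for k in range(1, 9):
    n = 4 ** k
    assert T(n) / n**2 < 16 / 13   # never exceeds the series bound
print(T(4**8) / (4**8) ** 2)       # close to 16/13 ≈ 1.2308
```

The bound from summing the infinite series is therefore not just valid but essentially the exact leading constant.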
Substitution method to verify: T(n) = 3T(⌊n/4⌋) + Θ(n²).
Guess T(n) = O(n²) and assume T(k) ≤ dk² for k < n:
T(n) ≤ 3T(⌊n/4⌋) + cn²
     ≤ 3d⌊n/4⌋² + cn²
     ≤ 3d(n/4)² + cn²
     = (3/16) dn² + cn²
     ≤ dn²,
where the last step holds as long as d ≥ (16/13)c.
The master method
The master method applies to recurrences of the form
T(n) = aT(n/b) + f(n),
where a ≥ 1, b > 1, and f is asymptotically positive.
Three common cases
Compare f(n) with n^(log_b a):

1. f(n) = O(n^(log_b a − ε)) for some constant ε > 0: f(n) grows polynomially slower than n^(log_b a) (by an n^ε factor).
   Solution: T(n) = Θ(n^(log_b a)).

2. f(n) = Θ(n^(log_b a)): f(n) and n^(log_b a) grow at similar rates.
   Solution: T(n) = Θ(n^(log_b a) lg n).

3. f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0: f(n) grows polynomially faster than n^(log_b a) (by an n^ε factor), and f(n) satisfies the regularity condition that a·f(n/b) ≤ c·f(n) for some constant c < 1.
   Solution: T(n) = Θ(f(n)).
Idea of master theorem
Recursion tree for T(n) = aT(n/b) + f(n):

                  f(n)                                   → f(n)
         /    …(a children)…    \
    f(n/b)       f(n/b)       f(n/b)                     → a·f(n/b)
    / … \        / … \        / … \
  f(n/b²)   …   (a² nodes)   …   f(n/b²)                 → a²·f(n/b²)
       ⋮                                                     ⋮
  T(1)  …  (n^(log_b a) leaves)  …  T(1)                 → Θ(n^(log_b a))

Height = log_b n.
Total: T(n) = Θ(n^(log_b a)) + ∑_{j=0}^{log_b n − 1} a^j f(n/b^j).
Examples

EX. T(n) = 4T(n/2) + n
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n.
CASE 1: f(n) = O(n^(2 − ε)) for ε = 1.
∴ T(n) = Θ(n²).

EX. T(n) = 4T(n/2) + n²
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n².
CASE 2: f(n) = Θ(n²).
∴ T(n) = Θ(n² lg n).

EX. T(n) = 4T(n/2) + n³
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n³.
CASE 3: f(n) = Ω(n^(2 + ε)) for ε = 1, and 4(n/2)³ ≤ cn³ (reg. cond.) for c = 1/2.
∴ T(n) = Θ(n³).

EX. T(n) = 4T(n/2) + n²/lg n
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n²/lg n.
The master method does not apply: n²/lg n is smaller than n², but not polynomially smaller (lg n grows slower than n^ε for every ε > 0).
Any questions?
Xiaoqing Zheng
Fudan University