Sorting Sung Yong Shin TC Lab. CS Dept., KAIST

Sorting

Jan 03, 2016


Page 1: Sorting

Sorting

Sung Yong Shin

TC Lab.

CS Dept., KAIST

Page 2: Sorting

Contents

1. Introduction

2. Insertion Sort

3. Quick Sort

4. Merge Sort

5. Heap Sort

6. Shell Sort

7. Radix Sort

8. External Sorting

( 2 - 7 : Internal Sorting, 8 : External Sorting )

Reading Assignment

p149-222, Baase

Page 3: Sorting

1. Introduction

Sorting, Searching : 25 - 50 % of computing time !!!

– Updating

– Reporting

– Queries

Rich Results

Internal Sorting

– The file to be sorted is small enough that the entire sort can be carried out in main memory.

– ( Goal : better time complexity )

External Sorting

– The file is too large for main memory, so the sort has to work on external storage.

– ( Goal : fewer I/O operations )

Page 4: Sorting

SORT : Given a set of n real numbers, rearrange them in increasing order.

Given a file of n records (R1, R2, R3, …, Rn) with keys (k1, k2, …, kn), find a permutation

$\sigma : \{1, 2, 3, \ldots, n\} \to \{1, 2, 3, \ldots, n\}$

such that

$i < j \implies k_{\sigma(i)} \le k_{\sigma(j)}$

[Figure: a file drawn as a table of records 1, 2, 3, …, n, each with a key field]

note : keys are not necessarily real numbers

Page 5: Sorting

Time complexity

– Worst

– Average

Lower Bound ( Worst case )

Decision Tree : at least n! leaves

$T_{SORT}(n) = \Omega(\log_2 n!) = \Omega(n \log_2 n)$

Stability

A sorting method is stable if equal keys remain in the same relative order in the sorted list as they were in the original list.

In place

An algorithm is said to be in place if the amount of extra space is constant with respect to input size.
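The decision-tree bound can be evaluated for small n. The sketch below is only an illustration; the function name `comparison_lower_bound` is hypothetical and not from the slides. It computes ceil(log2(n!)), the minimum height of a binary tree with n! leaves, and prints n log2 n next to it.

```python
import math

def comparison_lower_bound(n: int) -> int:
    """Smallest possible worst-case number of comparisons: ceil(log2(n!))."""
    # A binary decision tree with at least n! leaves has height >= ceil(log2(n!)).
    # For an integer f >= 1, (f - 1).bit_length() equals ceil(log2(f)).
    return (math.factorial(n) - 1).bit_length()

for n in (2, 3, 4, 5, 10):
    # Third column: n * log2(n), for a rough comparison of growth.
    print(n, comparison_lower_bound(n), round(n * math.log2(n), 1))
```

For example, no comparison sort can sort 5 keys with fewer than 7 comparisons in the worst case.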

Page 6: Sorting

2. Insertion Sort

Page 7: Sorting

Algorithm (Insertion Sort)

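procedure InsertionSort (var L : array; n : integer);
var
    x : Key;
    xindex, j : Index;
begin
    for xindex := 2 to n do
    begin
        x := L(xindex);    { key to insert into the sorted prefix L(1) .. L(xindex-1) }
        j := xindex - 1;
        { assumes "and" is evaluated left to right, so L(0) is never read }
        while (j > 0) and (L(j) > x) do
        begin
            L(j+1) := L(j);    { shift the larger key one position to the right }
            j := j - 1;
        end; {while}
        L(j+1) := x;
    end {for}
end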

Correctness Proof

Exercise. Hint : Loop invariant ( induction on xindex )

Page 8: Sorting

$T_{SORT}(n) = \Omega(n \log n)$

T(n) ? ( Worst case )

xindex    # of comparisons
2         1
3         2
4         3
…         …
i         i-1
…         …
n         n-1

Total # of comparisons = $\sum_{i=2}^{n} (i-1) = \frac{n(n-1)}{2}$

$T(n) = O(n^2)$

Far from Optimal !!! However, ……

Page 9: Sorting

Average Behavior

Assumption : All n! permutations are equally likely as input.

Keys are distinct.

Observation:

[Figure: x, the i th element, has i possible positions among the first i keys]

P( x is in the j th position ) = 1/i, j = 1, 2, …, i

Ai(n) = the average # of comparisons, given that x is the i th element

A(n) = average # of comparisons

$A(n) = \sum_{i=1}^{n} A_i(n)$

Page 10: Sorting

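$A_i(n) = \begin{cases} 0 & \text{if } i = 1 \\ \frac{1}{i}\sum_{j=1}^{i-1} j + \frac{1}{i}(i-1) = \frac{i+1}{2} - \frac{1}{i} & \text{if } i \ge 2 \end{cases}$

$A(n) = \sum_{i=2}^{n} \left( \frac{i+1}{2} - \frac{1}{i} \right) = \frac{n^2}{4} + \frac{3n}{4} - 1 - \sum_{i=2}^{n} \frac{1}{i} \approx \frac{n^2}{4}$, since $\sum_{i=2}^{n} \frac{1}{i} \approx \ln n$ (See page 26)

$A(n) = O(n^2)$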
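The average-case count can be checked by brute force for small n. The following is a minimal sketch, assuming distinct keys and counting one comparison for each test of the form L(j) > x; the helper names are hypothetical. It averages the count over all n! permutations and compares it with the sum of (i+1)/2 - 1/i for i = 2, …, n.

```python
from fractions import Fraction
from itertools import permutations

def insertion_sort_comparisons(keys):
    """Insertion-sort a copy of keys and return the number of key comparisons."""
    a = list(keys)
    count = 0
    for xindex in range(1, len(a)):
        x = a[xindex]
        j = xindex - 1
        while j >= 0:
            count += 1              # one comparison of the form L(j) > x
            if a[j] <= x:
                break
            a[j + 1] = a[j]         # shift the larger key to the right
            j -= 1
        a[j + 1] = x
    return count

def average_comparisons(n):
    """Exact average over all n! equally likely permutations."""
    perms = list(permutations(range(n)))
    return Fraction(sum(insertion_sort_comparisons(p) for p in perms), len(perms))

def closed_form(n):
    """Sum over i = 2..n of (i+1)/2 - 1/i."""
    return sum(Fraction(i + 1, 2) - Fraction(1, i) for i in range(2, n + 1))

for n in range(1, 7):
    print(n, average_comparisons(n), closed_form(n))
```

The two columns agree for every n tried here, and a reversed input of 5 keys gives the worst case of 5·4/2 = 10 comparisons.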

Page 11: Sorting

Observation

Assumption :

(1) Compare adjacent keys

(2) Depending on the result, move the current pair of compared keys locally.

[Figure: key x at position i, with neighboring positions i-1 and i+1]

What kind of sorting algorithms work under these two assumptions?

– Insertion Sort

– Bubble Sort

– …………

What is the lower bound on time complexity under these assumptions?

Page 12: Sorting

{ x1, x2, …, xn }

$\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$

$\pi(i) < \pi(j) \iff x_i < x_j$

π(i) means that the i th element is placed at the π(i) th position.

π(i), 1 ≤ i ≤ n, is the correct position of xi when the list is sorted.

Page 13: Sorting

Now,

$\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$, where π is defined as

π(i) = j, if xi is the j th smallest one

L = ( x1, x2, x3, x4, x5, x6 ) = ( 2.2, 5.3, 4.2, 6.6, 1.9, 3.8 )

( π(1), π(2), π(3), π(4), π(5), π(6) ) = ( 2, 5, 4, 6, 1, 3 )

Page 14: Sorting

L = ( x1, x2, x3, x4, x5, x6 ) = ( 2.2, 5.3, 4.2, 6.6, 1.9, 3.8 )

( π(1), π(2), π(3), π(4), π(5), π(6) ) = ( 2, 5, 4, 6, 1, 3 )

Def’n : An inversion of the permutation π is an ordered pair

(π(i), π(j)) such that i < j and π(i) > π(j)

(π(i), π(j)) is an inversion ⟺ The i th and j th keys are left out of order (LOO).

How many inversions in L?

(2, -) : 1
(5, -) : 3
(4, -) : 2
(6, -) : 2
(1, -) : 0

8 inversions (LOO’s)

Given |L| = n, how many possible inversions in the worst case ?

$\frac{n(n-1)}{2}$ inversions !!!

why?
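The inversion count for the example list can be reproduced with a few lines of code. This is a brute-force sketch with hypothetical helper names (`ranks`, `count_inversions`), assuming distinct keys; it is meant only to check the numbers above.

```python
def ranks(keys):
    """Return the permutation with entry j at index i if keys[i] is the j-th smallest key."""
    order = sorted(keys)
    return [order.index(k) + 1 for k in keys]   # keys assumed distinct

def count_inversions(perm):
    """Count pairs i < j with perm[i] > perm[j] by checking every pair, O(n^2)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

L = [2.2, 5.3, 4.2, 6.6, 1.9, 3.8]
pi = ranks(L)
print(pi, count_inversions(pi))                  # the example list from the slide
print(count_inversions(list(range(6, 0, -1))))   # reversed list: every pair is inverted
```

The reversed list of 6 keys has 6·5/2 = 15 inversions, which is the worst case for n = 6.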

Page 15: Sorting

Inversion (π(i), π(j)), i < j

What does an inversion imply in sorting?

xi is required to follow xj in the sorted list !!!

How can you do this?

detection : comparisons (“Local”)

resolving : “Local” moves

At least one comparison is needed per inversion !!!

How many inversions in the worst case?

$\frac{n(n-1)}{2}$

$\Omega(n^2)$ in worst case

How about average case ?

Well, …

need to compute the average # of inversions !!!

Page 16: Sorting

$\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$

$\pi(i) < \pi(j) \iff x_i < x_j$

n! permutations !!!

Assumption:

$P(\pi = \pi_i) = \frac{1}{n!}$, i = 1, 2, …, n!

Transpose of π :

$(\pi(1), \pi(2), \ldots, \pi(n))^T = (\pi(n), \pi(n-1), \ldots, \pi(1))$

For each (i, j), i < j, the pair (π(i), π(j)) is an inversion in exactly one of

( π(1), π(2), …, π(n) ) and ( π(n), π(n-1), …, π(1) )

How many $(\pi, \pi^T)$ pairs?

$\frac{n!}{2}$

why?

How many (i, j) pairs?

$\binom{n}{2} = \frac{n(n-1)}{2}$

Page 17: Sorting

Average # of inversions

$\frac{1}{n!} \cdot \frac{n!}{2} \cdot \frac{n(n-1)}{2} = \frac{n(n-1)}{4}$

$\Omega(n^2)$ in average case, too !!!

Sort        In Place?   Stable?
Insertion   yes         yes
Selection   yes         no   why?
Bubble      yes         yes

- easy to implement

- good for small input
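One way to see why selection sort is not stable is to sort records that carry a tag next to the key. The sketch below is an illustration under that assumption; records are hypothetical (key, tag) pairs and only the key is compared.

```python
def selection_sort(records):
    """Selection sort on (key, tag) records; only keys are compared."""
    a = list(records)
    for i in range(len(a) - 1):
        m = i
        for j in range(i + 1, len(a)):
            if a[j][0] < a[m][0]:
                m = j
        a[i], a[m] = a[m], a[i]     # long-distance swap may jump over an equal key
    return a

def insertion_sort(records):
    """Insertion sort on (key, tag) records; keys move only past strictly larger keys."""
    a = list(records)
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j][0] > x[0]:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

data = [(2, "a"), (2, "b"), (1, "c")]
print(selection_sort(data))   # the two records with key 2 come out as b, a
print(insertion_sort(data))   # the two records with key 2 keep the order a, b
```

In the selection sort run, the first swap moves (2, "a") behind (2, "b"), so equal keys change their relative order.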

Page 18: Sorting

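3. QuickSort

Basic Idea

(1) Place xi in its final position j.

( x1, x2, …, xj, …, xn ) after the placement, with xi at position j :

xk < xi for k = 1, 2, …, j-1 ; xk > xi for k = j+1, j+2, …, n

Example ( x = 26; o scans from the left, t scans from the right; keys just moved are in parentheses )

step   L(1)  L(2)  L(3)  L(4)  L(5)  L(6)  L(7)  L(8)  L(9)  L(10)
1      26    5     37 o  1     61    11    59    15    48    19 t
2      26    5     (19)  1 o   61    11    59    15    48 t  (37)
3      26    5     19    1     61 o  11    59    15 t  48    37
4      26    5     19    1     (15)  11 o  59 t  (61)  48    37
5      26    5     19    1     15    11 t  59 o  61    48    37
6      (11)  5     19    1     15    (26)  59    61    48    37

(2) Divide and Conquer !!!

T(n) = T(j - 1) + T(n - j) + O(n) !!!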
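The two steps can be sketched in code. This is one possible partition scheme, assuming the first key is used as x and two indices named o and t scan inward as in the trace of 26, 5, 37, …; it is not necessarily the exact procedure used in the lecture.

```python
def partition(a, lo, hi):
    """Place x = a[lo] in its final position within a[lo..hi]; return that position."""
    x = a[lo]
    o, t = lo + 1, hi
    while True:
        while o <= t and a[o] < x:   # o scans right, stops at a key >= x
            o += 1
        while a[t] > x:              # t scans left, stops at a key <= x (a[lo] is a sentinel)
            t -= 1
        if o >= t:
            break
        a[o], a[t] = a[t], a[o]      # both keys are on the wrong side: exchange them
        o += 1
        t -= 1
    a[lo], a[t] = a[t], a[lo]        # x goes to its final position t
    return t

def quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place by divide and conquer."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        j = partition(a, lo, hi)
        quicksort(a, lo, j - 1)      # keys <= x
        quicksort(a, j + 1, hi)      # keys >= x

keys = [26, 5, 37, 1, 61, 11, 59, 15, 48, 19]
print(partition(keys, 0, len(keys) - 1), keys)
quicksort(keys)
print(keys)
```

On the example list the first partition step puts 26 at index 5 (position 6 in the slide's 1-based numbering).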

Page 19: Sorting

[Figure: partitioning around x with pointers o and t]

Alternative method (Textbook)

[Figure: x is followed by a region of keys < x, a region of keys > x, and the next unexamined key y]

case 1 : y ≥ x

case 2 : y < x

Page 20: Sorting

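Time Complexity

Worst case (when?)

[Figure: x ends up at one end of the list, leaving a subproblem P(0) of size 0 and a subproblem P(n-1) of size n-1, with T(0) = 0]

T(n) = T(n-1) + c(n-1), c > 0

$T(n) = O\left(\frac{n(n-1)}{2}\right) = O(n^2)$

Average Case

$A(n) = c(n-1) + \frac{1}{n}\sum_{i=1}^{n} \left( A(i-1) + A(n-i) \right)$

$A(n) = c(n-1) + \frac{2}{n}\sum_{i=2}^{n-1} A(i) = O(n \log n)$

why?

i = 1     A(0) + A(n-1)
2         A(1) + A(n-2)
……
n-1       A(n-2) + A(1)
n         A(n-1) + A(0)

A(0) = A(1) = 0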
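The average-case recurrence A(n) = c(n-1) + (2/n)(A(2) + … + A(n-1)) with A(0) = A(1) = 0 can be evaluated directly. The sketch below assumes c = 1 and compares the exact values with 2 n ln n; the function name `average_cost` is hypothetical.

```python
import math
from fractions import Fraction

def average_cost(n_max, c=1):
    """Return [A(0), ..., A(n_max)] for A(n) = c(n-1) + (2/n) * (A(2) + ... + A(n-1))."""
    A = [Fraction(0), Fraction(0)]       # A(0) = A(1) = 0
    running = Fraction(0)                # A(1) + ... + A(n-1); A(1) = 0 adds nothing
    for n in range(2, n_max + 1):
        running += A[n - 1]
        A.append(c * (n - 1) + Fraction(2, n) * running)
    return A

A = average_cost(200)
for n in (2, 3, 10, 100, 200):
    # Exact value of the recurrence next to 2 n ln n.
    print(n, round(float(A[n]), 2), round(2 * n * math.log(n), 2))
```

For the values tried here A(n) stays below 2 n ln n, which is consistent with A(n) = O(n log n).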

Page 21: Sorting

Good Performance !!!

(Practically)

In place?

no!!! why?

Stable?

no!!! why?

Page 22: Sorting

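Theorem : $A(n) = O(n \log_e n)$

[Proof] From the previous lecture,

$A(n) = c(n-1) + \frac{2}{n}\sum_{i=2}^{n-1} A(i)$

We show by induction on n that $A(n) \le c' n \log_e n$, where $c' = 2c$.

n = 1:

A(1) = 0 why?

$n \log_e n = 1 \cdot \log_e 1 = 0$

n ≤ k :

Suppose that $A(i) \le c' i \log_e i$ for $1 \le i \le k$

Page 23: Sorting

n = k + 1 :

$A(k+1) \le c'(k+1)\log_e(k+1)$ why?

$A(k+1) = c(k+1-1) + \frac{2}{k+1}\sum_{i=2}^{k} A(i)$

$\le ck + \frac{2}{k+1}\sum_{i=2}^{k} c' i \log_e i$

$\le ck + \frac{2c'}{k+1}\int_{1}^{k+1} x \log_e x \, dx$

$= ck + \frac{2c'}{k+1}\left[ \frac{x^2}{2}\log_e x - \frac{x^2}{4} \right]_{1}^{k+1}$

$= ck + c'(k+1)\log_e(k+1) - \frac{c'(k+1)}{2} + \frac{c'}{2(k+1)}$

$= c'(k+1)\log_e(k+1) - \frac{ck}{k+1}$ ( using c' = 2c )

$\le c'(k+1)\log_e(k+1)$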

Page 24: Sorting

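Alternative Proof

$A(n) = c(n-1) + \frac{2}{n}\sum_{i=2}^{n-1} A(i)$

$A(n-1) = c(n-2) + \frac{2}{n-1}\sum_{i=2}^{n-2} A(i)$

$nA(n) - (n-1)A(n-1) = cn(n-1) - c(n-1)(n-2) + 2A(n-1)$

$= 2A(n-1) + 2c(n-1)$

$nA(n) = (n+1)A(n-1) + 2c(n-1)$

$\frac{A(n)}{n+1} = \frac{A(n-1)}{n} + \frac{2c(n-1)}{n(n+1)}$

Let $B(n) = \frac{A(n)}{n+1}$. Then

$B(n) = B(n-1) + \frac{2c(n-1)}{n(n+1)}$, B(1) = 0

$B(n) = \sum_{i=2}^{n} \frac{2c(i-1)}{i(i+1)} \le 2c\sum_{i=2}^{n} \frac{1}{i+1} \le 2c\int_{2}^{n+1} \frac{dx}{x} = 2c\log_e(n+1) - 2c\log_e 2$

$B(n) = O(\log_e n)$

$A(n) = O(n \log_e n)$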

Page 25: Sorting

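Comment on QuickSort

[Figure: the list L at positions 1, 2, …, i, …, n, partitioned as < x | x | > x, with x at position i]

T(n) = T(i-1) + T(n-i) + cn

very sensitive to i, hence x

( what is T(n) when $i = \frac{n}{2}$ ? )

i) choosing x

- random

- median { L(1), L((n+1)/2), L(n) }

…………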

Page 26: Sorting

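ii) QuickSort if n > k0

Other nonrecursive sort, otherwise !!!

[Figure: recursion tree. A problem of size n splits into subproblems of sizes i-1 and n-i, and so on. QuickSort handles subproblems larger than k0; the other sort handles subproblems of size at most k0.]

iii) Manipulate the stack explicitly

why?

iv) Put the larger subproblem in the stack !!!

why?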
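Points ii) to iv) can be combined in one routine. The following is a minimal sketch under several assumptions that are not from the slides: the cutoff k0 = 8 is arbitrary, insertion sort plays the role of the other sort, and the first key of each subproblem is used as x.

```python
def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] in place; used for subproblems of size at most k0."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def partition(a, lo, hi):
    """Place x = a[lo] in its final position within a[lo..hi]; return that position."""
    x = a[lo]
    o, t = lo + 1, hi
    while True:
        while o <= t and a[o] < x:
            o += 1
        while a[t] > x:
            t -= 1
        if o >= t:
            break
        a[o], a[t] = a[t], a[o]
        o += 1
        t -= 1
    a[lo], a[t] = a[t], a[lo]
    return t

def quicksort_explicit_stack(a, k0=8):
    """Sort a in place without recursion; return the largest stack size that occurred."""
    stack, max_stack = [], 0
    lo, hi = 0, len(a) - 1
    while True:
        while hi - lo + 1 > k0:                  # ii) QuickSort only while n > k0
            p = partition(a, lo, hi)
            left, right = (lo, p - 1), (p + 1, hi)
            if p - lo > hi - p:                  # iv) push the larger subproblem ...
                stack.append(left)
                lo, hi = right                   # ... and continue with the smaller one
            else:
                stack.append(right)
                lo, hi = left
            max_stack = max(max_stack, len(stack))
        insertion_sort_range(a, lo, hi)          # ii) other sort for small subproblems
        if not stack:
            return max_stack
        lo, hi = stack.pop()                     # iii) explicit stack instead of recursion

data = [(i * 7919) % 1009 for i in range(1000)]
depth = quicksort_explicit_stack(data)
print(data == sorted(data), depth)
```

Because the subproblem that continues immediately has at most half the keys, the stack holds at most about log2 n entries, even on an already sorted input where the partitions are as unbalanced as possible.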

Page 27: Sorting

4. Merge Sort

Basic Idea

“ Divide and Conquer”

[Figure: a problem P(n) splits into two subproblems P(n/2) and P(n/2)]

Divide

[ 1 11 5 21 3 15 12 17 ]

[ [ 1 11 5 21 ] [ 3 15 12 17 ] ]

[ [ [ 1 11 ] [ 5 21 ] ] [ [ 3 15 ] [ 12 17 ] ] ]

[ [ [ [1] [11] ] [ [5] [21] ] ] [ [ [3] [15] ] [ [12] [17] ] ] ]

Merge

[ [ [ 1 11 ] [ 5 21 ] ] [ [ 3 15 ] [ 12 17 ] ] ]

[ [ 1 5 11 21 ] [ 3 12 15 17 ] ]

[ 1 3 5 11 12 15 17 21 ]
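The divide and merge phases can be sketched as a short recursive routine. This is a minimal top-down version that returns a new list; the function names are illustrative and the lecture may organize the copying differently.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:       # take from the right only if strictly smaller,
            out.append(right[j])     # so equal keys keep their original order
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])             # at most one of these two tails is non-empty
    out.extend(right[j:])
    return out

def merge_sort(a):
    """Divide: split in half. Conquer: sort each half. Combine: merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([1, 11, 5, 21, 3, 15, 12, 17]))
```

The extra lists built by the slicing and by `merge` are the copying that keeps this version from being in place.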

Page 28: Sorting

T(n) = 2T(n/2) + cn

( cn : time required for dividing and merging )

T(n) = O(nlogn)

Optimal !!!

How about A(n)?

A(n) = O(nlogn)

why?

Is the mergesort optimal in the average case?

well, …

Page 29: Sorting

Merging

[ 1 3 5 7 9 11 ] and [ 2 4 6 8 10 ] merge into [ 1 2 3 4 5 6 7 8 9 10 11 ]

How many comparisons?

10 comparisons !!!

n+m-1 comparisons in general, for lists of n and m entries ( worst case )

Theorem : Any algorithm for merging two sorted lists, each containing n entries, does at least 2n-1 comparisons in the worst case.

[Proof]

Choose (a1, a2, …, an) and (b1, b2, …, bn) so that the merged list is (a1, b1, a2, b2, …, an, bn), i.e.,

ai < bi, i = 1, 2, …, n, and bi < ai+1, i = 1, 2, …, n-1

Claim : bi must be compared with ai and ai+1

2n-1 comparisons. Why?
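The comparison counts can be checked by instrumenting a merge routine. The sketch below counts one comparison per loop iteration; the name `merge_count` is hypothetical.

```python
def merge_count(a, b):
    """Merge sorted lists a and b; return (merged list, number of key comparisons)."""
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(a) and j < len(b):
        comparisons += 1
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
    out.extend(a[i:])        # the leftover tail needs no further comparisons
    out.extend(b[j:])
    return out, comparisons

print(merge_count([1, 3, 5, 7, 9, 11], [2, 4, 6, 8, 10]))   # interleaved keys
print(merge_count([1, 3, 5], [2, 4, 6]))                    # two lists of n = 3 keys
print(merge_count([1, 2, 3], [4, 5, 6]))                    # not a worst case
```

Interleaved inputs force n+m-1 comparisons (10 for the slide's lists, 2n-1 = 5 for two lists of 3 keys), while the last call stops after 3.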

Page 30: Sorting

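Suppose that bi is not compared with ai

a1 < b1 < … < ai-1 < bi-1 < ai < bi < ai+1 < … < an < bn

a1 < b1 < … < ai-1 < bi-1 < bi < ai < ai+1 < … < an < bn

The algorithm cannot tell these two inputs apart, since every other comparison has the same outcome on both. It gives the same result for both, which is wrong for one of them. #

Similarly, bi needs to be compared with ai+1 !!!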

Page 31: Sorting

Is the Mergesort stable?

yes !!!

why?

Is the Mergesort in place?

no !!!

why? - stack

- copying

Page 32: Sorting

Lower bound for SORT in average case

[Figure: decision tree]

$T_A(n) = c \cdot \frac{epl}{l}$

l : # of leaf nodes

$\frac{epl}{l}$ : the average path length from the root to a leaf

$T_A(n) = \Theta\left(\frac{epl}{l}\right)$

Def’n : The external path length (epl) of a tree is the sum of the lengths of all paths from the root to the leaves.

Def’n : A binary tree is said to be a 2-tree if every node of the tree is of outdegree 0 or 2.

A decision tree is a 2-tree

Page 33: Sorting

Lemma : Among 2-trees with l leaves, the epl is minimized if all the leaves are on at most two adjacent levels.

[Figure: a full binary tree and a complete binary tree]

[Proof] (By Contradiction) Suppose that we have a 2-tree of depth d that has a leaf X at level k, where k ≤ d - 2.

[Figure: leaf X at level k; node Y at level d-1 with two leaf children at level d. In the rebuilt tree the two leaves hang under X at level k+1, and Y becomes a leaf at level d-1.]

Change in epl :

- ( k + 2d ) + ( 2(k + 1) + d - 1 ) = k + 1 - d < 0

We can always rebuild a 2-tree with the same number of leaves and lower epl. #

Page 34: Sorting

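Lemma : The minimum epl with l leaves is $l \lfloor \log_2 l \rfloor + 2(l - 2^{\lfloor \log_2 l \rfloor})$

[Proof]

If $l = 2^k$ for some integer k ≥ 0, then all the leaves are at level k !!! why?

$k = \log_2 l$

Suppose that $l \ne 2^k$ for any integer k. Then, the depth is

$d = \lceil \log_2 l \rceil$ why? $l \le 2^d$

From the previous lemma, all leaves are at level d-1 or d.

[Figure: counting every leaf at level d-1 gives l(d-1); the 2(l - 2^{d-1}) leaves at level d each add 1]

$epl = l(d-1) + 2(l - 2^{d-1})$ why?

$= l \lfloor \log_2 l \rfloor + 2(l - 2^{\lfloor \log_2 l \rfloor})$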
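The closed form for the minimum epl can be checked against an exhaustive search for small l. The sketch below relies on one observation that is an addition here: a 2-tree with l leaves whose two subtrees have a and l - a leaves has epl equal to the sum of the subtree epls plus l, since every leaf is one level deeper. The function names are hypothetical.

```python
from functools import lru_cache

def min_epl_formula(l):
    """l * floor(log2 l) + 2 * (l - 2 ** floor(log2 l))."""
    k = l.bit_length() - 1              # floor(log2 l) for an integer l >= 1
    return l * k + 2 * (l - 2 ** k)

@lru_cache(maxsize=None)
def min_epl_search(l):
    """Minimum epl over all 2-trees with l leaves, trying every split of the leaves."""
    if l == 1:
        return 0                        # a single leaf: the root itself
    # Hanging two subtrees under a new root pushes each of the l leaves one level down.
    return l + min(min_epl_search(a) + min_epl_search(l - a)
                   for a in range(1, l // 2 + 1))

for l in (1, 2, 3, 5, 8, 13):
    print(l, min_epl_formula(l), min_epl_search(l))
```

For every l tried here the search and the formula give the same value.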

Page 35: Sorting

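Lemma : The average path length in a 2-tree with l leaves is at least $\lfloor \log_2 l \rfloor$

[Proof]

$\frac{epl}{l} \ge \frac{l \lfloor \log_2 l \rfloor + 2(l - 2^{\lfloor \log_2 l \rfloor})}{l} = \lfloor \log_2 l \rfloor + 2\left(1 - \frac{2^{\lfloor \log_2 l \rfloor}}{l}\right) \ge \lfloor \log_2 l \rfloor$

Theorem: The average # of comparisons done by an algorithm to sort n items by comparison of keys is at least $\lfloor \log_2 n! \rfloor = \Omega(n \ln n)$

[Proof] $l \ge n!$

QuickSort and MergeSort are optimal in the average case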