Sorting
Sung Yong Shin
TC Lab., CS Dept., KAIST
Jan 03, 2016
Contents

1. Introduction
2. Insertion Sort
3. Quick Sort
4. Merge Sort
5. Heap Sort
6. Shell Sort
7. Radix Sort
8. External Sorting

( 1 - 7 : internal sorting ; 8 : external sorting )

Reading Assignment : pp. 149 - 222, Baase
1. Introduction

Sorting and searching take 25 - 50 % of computing time !!!
– Updating
– Reporting
– Queries

Internal Sorting
– The file to be sorted is small enough so that the entire sort can be carried out in main memory.
– ( minimize time complexity )
External Sorting
– The file is too large to fit in main memory.
– ( minimize # of I/O operations )
SORT : Given a set of n real numbers, rearrange them in increasing order.

Given a file of n records (R1, R2, R3, …, Rn) with keys (k1, k2, …, kn), find a permutation

  σ : {1, 2, 3, …, n} → {1, 2, 3, …, n}

such that

  σ(i) < σ(j)  ⟹  ki ≤ kj

Record :  R1  R2  R3  …  Rn
Key    :  k1  k2  k3  …  kn

note : keys are not necessarily real numbers
Lower Bound ( Worst case )

Stability
– A sorting method is stable if equal keys remain in the same relative order in the sorted list as they were in the original list.

In place
– An algorithm is said to be in place if the amount of extra space is constant with respect to input size.

Time complexity
– Worst case
– Average case

Decision Tree : at least n! leaves

T_SORT = Ω( log2 n! ) = Ω( n log2 n )
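The Ω(n log2 n) bound can be sanity-checked numerically: log2 n! (the minimum average depth of a decision tree with n! leaves) tracks n log2 n. A small sketch (Python; the printed table is illustrative only):

```python
from math import factorial, log2

# A decision tree for sorting n keys has at least n! leaves,
# so its depth is at least log2(n!), which grows like n*log2(n).
for n in (4, 8, 16):
    print(n, round(log2(factorial(n)), 2), round(n * log2(n), 2))
```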
2. Insertion Sort

Algorithm (Insertion Sort)

procedure InsertionSort ( var L : array; n : integer );
var
  x : Key;
  xindex, j : Index;
begin
  for xindex := 2 to n do
    x := L(xindex);
    j := xindex - 1;
    while j > 0 and L(j) > x do
      L(j+1) := L(j);
      j := j - 1;
    end {while}
    L(j+1) := x;
  end {for}
end
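For reference, a direct translation of the procedure into runnable Python (0-based indices; the function name is mine):

```python
def insertion_sort(L):
    """In-place straight insertion sort, mirroring the pseudocode above."""
    for xindex in range(1, len(L)):    # pseudocode: for xindex := 2 to n
        x = L[xindex]
        j = xindex - 1
        while j >= 0 and L[j] > x:     # shift keys larger than x one slot right
            L[j + 1] = L[j]
            j -= 1
        L[j + 1] = x                   # drop x into the hole
    return L
```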
Correctness Proof
Exercise. Hint : Loop invariant ( induction on xindex )
T_SORT(n) = Ω( n log n )

T(n) ?  ( Worst case )

xindex    # of comparisons
  2             1
  3             2
  4             3
  …             …
  i            i-1
  …             …
  n            n-1

Total # of comparisons = n(n-1)/2

T(n) = O(n^2)

Far from optimal !!! However, ……
Average Behavior

Assumption : n! permutations are equally likely as input.
Keys are distinct.

Observation :
P( x ends up in the jth of the first i positions ) = 1/i,  j = 1, 2, …, i

Ai(n) = given x is the ith element, the average # of comparisons to insert it

  Ai(n) = 0,  if i = 1
  Ai(n) = (1/i) ( Σ_{j=1}^{i-1} j + (i-1) ) = (i+1)/2 - 1/i,  i ≥ 2

( the extra (i-1)/i term : landing in the leftmost position costs i-1 comparisons, the same as position 2 )

A(n) = average # of comparisons
     = Σ_{i=2}^{n} Ai(n)
     = Σ_{i=2}^{n} ( (i+1)/2 - 1/i )
     = n^2/4 + 3n/4 - 1 - Σ_{i=2}^{n} 1/i
     ≈ n^2/4 + 3n/4 - ln n

A(n) = O(n^2)
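The closed form above can be confirmed by brute force over all n! inputs for a small n; a sketch (helper name is mine; the comparison being counted is the key test L(j) > x of the pseudocode):

```python
from fractions import Fraction
from itertools import permutations

def comparisons(p):
    """Run insertion sort on p, counting key comparisons L[j] > x."""
    L, comps = list(p), 0
    for i in range(1, len(L)):
        x, j = L[i], i - 1
        while j >= 0:
            comps += 1            # one key comparison L[j] > x
            if L[j] <= x:
                break
            L[j + 1] = L[j]
            j -= 1
        L[j + 1] = x
    return comps

n = 4
total = sum(comparisons(p) for p in permutations(range(n)))
avg = Fraction(total, 24)         # Sum_{i=2}^{4} ((i+1)/2 - 1/i) = 59/12
```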
Assumption :
(1) Compare adjacent keys only.
(2) Depending on the result, move the current pair of compared keys locally.

What kinds of sorting algorithms satisfy these two assumptions?
– Insertion Sort
– Bubble Sort
– …………

What is the lower bound on time complexity under these assumptions?
Observation

{ x1, x2, …, xn }

σ : { 1, 2, …, n } → { 1, 2, …, n }

  σ(i) < σ(j)  ⟺  xi < xj

σ(i) means that the ith element is placed at the σ(i)th position.
σ(i), 1 ≤ i ≤ n, is the correct position of xi when the list is sorted.

Now, define σ : {1,2,…,n} → {1,2,…,n} as

  σ(i) = j, if xi is the jth smallest one

        x1   x2   x3   x4   x5   x6
L = ( 2.2, 5.3, 4.2, 6.6, 1.9, 3.8 )
σ = (   2,   5,   4,   6,   1,   3 )
  = ( σ(1), σ(2), σ(3), σ(4), σ(5), σ(6) )

Def'n : An inversion of the permutation σ is an ordered pair
( σ(i), σ(j) ) such that i < j and σ(i) > σ(j).

( σ(i), σ(j) ) is an inversion  ⟺  the ith and jth keys are left
out of order (LOO).
How many inversions in L?

(2, -) : 1
(5, -) : 3
(4, -) : 2
(6, -) : 2
(1, -) : 0

8 inversions (LOO's)

Given |L| = n, how many inversions are possible in the worst case?

n(n-1)/2 inversions !!!
why?
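Counting inversions directly from the definition is short enough to check the example above; a sketch (Python, function name is mine):

```python
def inversions(sigma):
    """Number of pairs (i, j), i < j, with sigma[i] > sigma[j]."""
    n = len(sigma)
    return sum(1 for i in range(n)
                 for j in range(i + 1, n) if sigma[i] > sigma[j])

# sigma for L = (2.2, 5.3, 4.2, 6.6, 1.9, 3.8)
print(inversions([2, 5, 4, 6, 1, 3]))   # 8, as counted above
```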
        x1   x2   x3   x4   x5   x6
L = ( 2.2, 5.3, 4.2, 6.6, 1.9, 3.8 )
σ = (   2,   5,   4,   6,   1,   3 )
  = ( σ(1), σ(2), σ(3), σ(4), σ(5), σ(6) )

What does an inversion ( σ(i), σ(j) ) imply in sorting?

xi is required to follow xj in the sorted list !!!

How can you do this?
detection : comparisons ( “local” )
resolving : “local” moves

At least one comparison is needed per inversion !!!

How many inversions in the worst case?
n(n-1)/2  ⟹  Ω(n^2) in the worst case

How about the average case?
Well, … we need to compute the average # of inversions !!!
σ : { 1, 2, …, n } → { 1, 2, …, n }
σ(i) < σ(j)  ⟺  xi < xj

n! permutations !!!

Assumption :
P( σ = σi ) = 1/n!,  i = 1, 2, …, n!

Transpose of σ :
( σ(1), σ(2), …, σ(n) )^T = ( σ(n), σ(n-1), …, σ(1) )

For each pair (i, j) with i < j, ( σ(i), σ(j) ) is an inversion in
either ( σ(1), σ(2), …, σ(n) ) or ( σ(n), σ(n-1), …, σ(1) ), but not both.

How many ( σ, σ^T ) pairs?  n!/2   why?
How many ( i, j ) pairs?  n(n-1)/2

Average # of inversions
  = (1/n!) · (n!/2) · n(n-1)/2 = n(n-1)/4

Ω(n^2) in the average case, too !!!
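The n(n-1)/4 average can be confirmed exhaustively for a small n, since all n! permutations are equally likely; a sketch (names are mine):

```python
from itertools import permutations

def inversions(sigma):
    """Number of pairs (i, j), i < j, with sigma[i] > sigma[j]."""
    n = len(sigma)
    return sum(1 for i in range(n)
                 for j in range(i + 1, n) if sigma[i] > sigma[j])

n = 4
perms = list(permutations(range(1, n + 1)))   # all n! equally likely inputs
avg = sum(inversions(p) for p in perms) / len(perms)
print(avg)   # n(n-1)/4 = 3.0
```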
Sort        In Place?   Stable?
Insertion   yes         yes
Selection   yes         no   why?
Bubble      yes         yes

- easy to implement
- good for small input
Partition example ( pivot x = 26 ; o and t are the left/right scan pointers ; parenthesized keys have just been moved ) :

26    26    26    26    26   (11)
 5     5     5     5     5     5
37 o  (19)   19    19    19    19
 1     1 o    1     1     1     1
61    61    61 o  (15)   15    15
11    11    11    11 o   11 t (26)
59    59    59    59 t   59 o  59
15    15    15 t  (61)   61    61
48    48 t   48    48    48    48
19 t  (37)   37    37    37    37
3. QuickSort

Basic Idea

(1) Place xi in its final position j :

( x1, x2, …, xj, …, xn )
  xk < xi for k = 1, 2, …, j-1 ;  xk > xi for k = j+1, j+2, …, n

(2) Divide and Conquer !!!

T(n) = T(j-1) + T(n-j) + O(n) !!!
Partitioning around the pivot x — two cases for the scanned key y :

  case 1 : y > x — y already belongs to the right part ; leave it there.
  case 2 : y < x — y belongs to the left part ; move it there.

Alternative method (Textbook) : two pointers o and t ; o scans from the left past keys ≤ x, t scans from the right past keys > x ; when both stop, the two keys are swapped ; when o and t cross, x is swapped into its final position.
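A runnable sketch of the two-pointer partition (Hoare/textbook style; names are mine). On the trace list from the earlier slide it leaves pivot 26 at position 6, as shown there:

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot x = a[lo]; return x's final index."""
    x = a[lo]
    o, t = lo + 1, hi              # o scans right for a key > x; t scans left for a key < x
    while True:
        while o <= hi and a[o] <= x:
            o += 1
        while a[t] > x:            # stops at lo at the latest, since a[lo] == x
            t -= 1
        if o < t:
            a[o], a[t] = a[t], a[o]
        else:
            break
    a[lo], a[t] = a[t], a[lo]      # put the pivot into its final position t
    return t

a = [26, 5, 37, 1, 61, 11, 59, 15, 48, 19]
print(partition(a, 0, len(a) - 1), a)
# 5 [11, 5, 19, 1, 15, 26, 59, 61, 48, 37]
```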
Worst case ( when? — e.g., the list is already sorted )

The split is ( 0, n-1 ) every time : P(0) and P(n-1), with T(0) = 0

T(n) = T(n-1) + c(n-1),  c > 0
     = O(n^2)
Average Case

A(n) = c(n-1) + (1/n) Σ_{i=1}^{n} ( A(i-1) + A(n-i) )

where i is the pivot's final position.   why?

  i = 1   :  A(0) + A(n-1)
  i = 2   :  A(1) + A(n-2)
   ……
  i = n-1 :  A(n-2) + A(1)
  i = n   :  A(n-1) + A(0)

Since A(0) = 0,

A(n) = c(n-1) + (2/n) Σ_{i=1}^{n-1} A(i)

A(n) = O( n log n )

Time Complexity : O(n^2) in the worst case, O(n log n) on average.
Good Performance !!!
(Practically)
In place?
no!!! why?
Stable?
no!!! why?
Theorem : A(n) = O( n loge n )

[Proof] From the previous lecture,

  A(n) ≤ cn + (2/n) Σ_{i=1}^{n-1} A(i),  n ≥ 2

n = 1 :
  A(1) = 0   why?
  n loge n = 1 loge 1 = 0

Suppose that A(i) ≤ c' i loge i for 1 ≤ i ≤ k, where c' = 4c.

n = k+1 :

  A(k+1) ≤ c(k+1) + (2/(k+1)) Σ_{i=1}^{k} A(i)
         ≤ c(k+1) + (2c'/(k+1)) Σ_{i=1}^{k} i loge i
         ≤ c(k+1) + (2c'/(k+1)) ∫_1^{k+1} x loge x dx      ( Σ_{i=1}^{k} i loge i ≤ ∫_1^{k+1} x loge x dx )
         = c(k+1) + (2c'/(k+1)) [ (x^2/2) loge x - x^2/4 ]_1^{k+1}
         = c(k+1) + c'(k+1) loge(k+1) - c'(k+1)/2 + c'/(2(k+1))
         ≤ c'(k+1) loge(k+1)   why?
           ( because c(k+1) + c'/(2(k+1)) ≤ c'(k+1)/2 when c' = 4c and k ≥ 1 )

Hence A(n) ≤ c' n loge n for all n ≥ 1, i.e., A(n) = O( n loge n ).
Alternative Proof

From A(n) = c(n-1) + (2/n) Σ_{i=1}^{n-1} A(i) :

  n A(n) = cn(n-1) + 2 Σ_{i=1}^{n-1} A(i)
  (n-1) A(n-1) = c(n-1)(n-2) + 2 Σ_{i=1}^{n-2} A(i)

Subtracting,

  n A(n) - (n-1) A(n-1) = 2c(n-1) + 2 A(n-1)
  n A(n) = (n+1) A(n-1) + 2c(n-1)

Dividing by n(n+1),

  A(n)/(n+1) = A(n-1)/n + 2c(n-1)/( n(n+1) )

Let B(n) = A(n)/(n+1). Then, with B(1) = 0,

  B(n) = B(n-1) + 2c(n-1)/( n(n+1) )
       ≤ B(n-1) + 2c/(n+1)
       ≤ 2c Σ_{i=2}^{n} 1/(i+1)
       ≤ 2c ∫_2^{n+1} dx/x
       = 2c ( loge(n+1) - loge 2 )

  B(n) = O( loge n )
  A(n) = (n+1) B(n) = O( n loge n )
Comment on QuickSort

L = ( <x … x … >x ),  positions 1 … i … n

  T(n) = T(i-1) + T(n-i) + cn,  n ≥ 2

very sensitive to i, hence to the choice of x

i) choosing x
  - random
  - median of { L(1), L((n+1)/2), L(n) }
  - …………
ii) QuickSort if n > k0 ; some other nonrecursive sort, otherwise !!!
   ( the recursion tree is cut off at size k0 ; subproblems of size ≤ k0 go to the other sort )
iii) Manipulate the stack explicitly.   why?
iv) Put the larger subproblem in the stack !!!   why?
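Improvements i), ii) and iv) can be put together in one sketch (Python; CUTOFF stands in for k0 and its value is an assumption; names are mine). Recursing on the smaller side and iterating on the larger has the same effect as pushing the larger subproblem: the recursion depth stays O(log n).

```python
def insertion_sort(a, lo, hi):
    """Sort the small range a[lo..hi] in place (the 'other sort' below k0)."""
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def partition(a, lo, hi):
    """Median-of-three pivot, then a two-pointer scan; returns the pivot's index."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:  a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
    a[lo], a[mid] = a[mid], a[lo]      # median of {L(1), L((n+1)/2), L(n)} becomes the pivot
    x, o, t = a[lo], lo + 1, hi
    while True:
        while o <= hi and a[o] <= x:
            o += 1
        while a[t] > x:
            t -= 1
        if o < t:
            a[o], a[t] = a[t], a[o]
        else:
            break
    a[lo], a[t] = a[t], a[lo]
    return t

CUTOFF = 16                            # k0: assumed value

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > CUTOFF:
        p = partition(a, lo, hi)
        if p - lo < hi - p:            # recurse on the smaller side,
            quicksort(a, lo, p - 1)    # iterate on the larger: stack depth O(log n)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1
    insertion_sort(a, lo, hi)          # the other (nonrecursive) sort below k0
```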
4. Merge Sort

Basic Idea : “Divide and Conquer”

Divide : P(n) → P(⌈n/2⌉), P(⌊n/2⌋)
Merge

[ 1 11 5 21 3 15 12 17 ]
[ [ 1 11 5 21 ] [ 3 15 12 17 ] ]
[ [ [ 1 11 ] [ 5 21 ] ] [ [ 3 15 ] [ 12 17 ] ] ]
[ [ [ [1] [11] ] [ [5] [21] ] ] [ [ [3] [15] ] [ [12] [17] ] ] ]
[ [ [ 1 11 ] [ 5 21 ] ] [ [ 3 15 ] [ 12 17 ] ] ]
[ [ 1 5 11 21 ] [ 3 12 15 17 ] ]
[ 1 3 5 11 12 15 17 21 ]

T(n) = 2T(n/2) + cn
  ( cn : time required for dividing and merging )

T(n) = O( n log n )
Optimal !!!
How about A(n)?
A(n) = O(nlogn)
why?
Is the mergesort optimal in the average case?
well, …
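A runnable sketch of the mergesort on the example list above (Python; not in place, as discussed below):

```python
def merge_sort(a):
    """Top-down mergesort; returns a new sorted list."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # '<=' keeps equal keys in order: stable
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])               # append whatever remains of either run
    out.extend(right[j:])
    return out

print(merge_sort([1, 11, 5, 21, 3, 15, 12, 17]))
# [1, 3, 5, 11, 12, 15, 17, 21]
```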
Merging

1 3 5 7 9 11
2 4 6 8 10
→ 1 2 3 4 5 6 7 8 9 10 11

How many comparisons?
10 comparisons !!!
n + m - 1 comparisons in general
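Counting comparisons while merging reproduces the n + m - 1 figure on this example; a sketch (function name is mine):

```python
def merge_count(a, b):
    """Merge two sorted lists; also return the number of key comparisons made."""
    out, i, j, comps = [], 0, 0, 0
    while i < len(a) and j < len(b):
        comps += 1                     # one key comparison per loop iteration
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])                  # one list is exhausted: no more comparisons
    out.extend(b[j:])
    return out, comps

merged, comps = merge_count([1, 3, 5, 7, 9, 11], [2, 4, 6, 8, 10])
print(comps)   # 10 = n + m - 1 for n = 6, m = 5
```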
Theorem : Any algorithm for merging two sorted lists, each containing n entries, does at least 2n-1 comparisons in the worst case.

[Proof]
Take (a1, a2, …, an) and (b1, b2, …, bn) with ai < bi < ai+1, i = 1, 2, …, n-1,
which merge into (a1, b1, a2, b2, …, an, bn).

Claim : bi must be compared with ai and ai+1  ⟹  2n-1 comparisons.  Why?

Suppose that bi is not compared with ai. Then the two orders

a1 < b1 < … < ai-1 < bi-1 < ai < bi < ai+1 < … < an < bn
a1 < b1 < … < ai-1 < bi-1 < bi < ai < ai+1 < … < an < bn

are indistinguishable to the algorithm — the same result for both. #

Similarly, bi needs to be compared with ai+1 !!!
Is the Mergesort stable?
yes !!!   why?

Is the Mergesort in place?
no !!!   why?
  - stack
  - copying
Lower bound for SORT in the average case

Decision tree :
  l : # of leaf nodes
  epl : external path length
  epl / l : the average external path length from the root to a leaf

  T_A(n) ≥ epl / l

Def'n : The external path length (epl) of a tree is the sum of the lengths of all paths from the root to all leaves.

Def'n : A binary tree is said to be a 2-tree if every node of the tree is of outdegree 0 or 2.

A decision tree is a 2-tree.
Lemma : Among 2-trees with l leaves, the epl is minimized if all the leaves are on at most two adjacent levels.

( cf. full binary tree, complete binary tree )

[Proof] (By Contradiction) Suppose that a minimum-epl 2-tree has a leaf X at level k, where k ≤ d - 2 and d is the deepest level. Take an internal node Y at level d - 1 whose two children are leaves at level d, and move Y's two children under X.

The change in epl is

  - ( k + 2d ) + ( 2(k+1) + (d-1) ) = k + 1 - d < 0

( X's path of length k becomes two paths of length k+1 ; Y's two leaf paths of length d become one leaf path of length d-1. )

So we can always rebuild a 2-tree with the same number of leaves and lower epl. #
Lemma : The minimum epl of a 2-tree with l leaves is

  l ⌊log2 l⌋ + 2 ( l - 2^⌊log2 l⌋ )

[Proof]
If l = 2^k, k ∈ Z+, then all the leaves are at level k = log2 l   why?
and epl = l log2 l.

Suppose that l ≠ 2^k for any k ∈ Z+. Then, with d = ⌈log2 l⌉,

  l ≤ 2^d   why?

From the previous lemma, all leaves are at level d-1 or d. If x leaves are at level d-1, counting the nodes at level d-1 gives 2^(d-1) = x + (l-x)/2, so l - x = 2 ( l - 2^(d-1) ) leaves are at level d. Hence

  epl = l(d-1) + 2 ( l - 2^(d-1) )   why?
      = l ⌊log2 l⌋ + 2 ( l - 2^⌊log2 l⌋ )     ( d - 1 = ⌊log2 l⌋ )
Lemma : The average path length in a 2-tree with l leaves is at least log2 l.

[Proof]
  epl / l ≥ ⌊log2 l⌋ + 2 ( l - 2^⌊log2 l⌋ ) / l ≥ log2 l

( writing l = 2^b (1+x) with b = ⌊log2 l⌋ and 0 ≤ x < 1, the last step is the inequality 2x ≥ (1+x) log2(1+x) )

Theorem : The average # of comparisons done by any algorithm to sort n items by comparison of keys is at least log2 n! = Ω( n log n ).

[Proof] l ≥ n!, so the average # of comparisons is at least log2 l ≥ log2 n!. #

∴ QuickSort and MergeSort are optimal in the average case.