CHAPTER 8 SORTING

【Contents】 BASIC CONCEPTS
1. Introduction and Notation
2. Insertion Sort
3. Selection Sort
4. Shell Sort
5. Lower Bounds
6. Divide-and-Conquer Sorting
7. Mergesort for Linked Lists
8. Quicksort for Contiguous Lists
9. Heaps and Heapsort
10. Review: Comparison of Methods

Definitions of Sorting

Problem description: Given a list of records $(R_0, R_1, \ldots, R_{n-1})$ where each $R_i$ has a key value $K_i$. There is a transitive ordering relation "<" on the keys such that for any two key values $K_i$ and $K_j$, either $K_i = K_j$ or $K_i < K_j$ or $K_j < K_i$. Sorting is to find a permutation $\sigma$ such that $K_{\sigma(i-1)} \le K_{\sigma(i)}$ for all $0 < i \le n-1$. The desired ordering is then $(R_{\sigma(0)}, R_{\sigma(1)}, \ldots, R_{\sigma(n-1)})$.

Remarks:
No single sorting technique is the best in all cases.
If there are identical key values, then $\sigma$ is not unique.
If $\sigma_s$ is a permutation which is not only sorted but also stable (that is, if $K_i = K_j$ for $i < j$, then $R_i$ precedes $R_j$ in the sorted list), then such a $\sigma_s$ is unique. A sorting method is stable if it generates $\sigma_s$.
An internal sort is to sort the list entirely in main memory.
An external sort is to sort the file piece by piece in main memory.
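The stability remark above can be illustrated with a small sketch. The `Record` type and the function name `sorted_stably` are hypothetical names introduced only for this example; `std::stable_sort` is the standard-library routine that guarantees records with equal keys keep their original relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type for illustration: a key plus a tag that
// identifies where the record stood in the original list.
struct Record {
    int key;
    std::string tag;
};

// Sort by key only. std::stable_sort keeps records with equal keys
// in the order in which they appeared in the input.
std::vector<Record> sorted_stably(std::vector<Record> records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) { return a.key < b.key; });
    return records;
}
```

Calling `sorted_stably` on records with keys 25, 16, 25, 8 should leave the two records with key 25 in their original relative order, which is exactly the permutation $\sigma_s$ described above.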

Overview

Sorting: arranging a disordered collection of data into sequence according to some rule.

Data list (datalist): the finite collection of data objects to be sorted.

Sort key (key): a data object usually has several attribute fields, that is, several data members. One of these fields can be used to distinguish objects and serves as the basis for sorting; this field is the sort key. Which field is used as the sort key for a given data list depends on the needs of the application.

Stability of a sorting algorithm: suppose the sequence contains two objects r[i] and r[j] whose keys satisfy k[i] == k[j], and r[i] precedes r[j] before sorting. If r[i] still precedes r[j] after sorting, the sorting method is stable; otherwise it is unstable.

Internal and external sorting: in an internal sort, all data objects reside in main memory during sorting. In an external sort, there are too many objects to hold in main memory at once, so they must be moved between main memory and external storage as the sort proceeds.

Time cost of sorting: the time cost is the most important measure of a sorting algorithm. It can be measured by the number of key comparisons and the number of data moves performed by the algorithm.

Rough estimates of running time are generally made for the average case. For algorithms that are strongly affected by the initial arrangement of the keys and by the number of objects, the best case and the worst case must also be estimated.

Additional storage required by the algorithm: another criterion for evaluating an algorithm.

Sortable_lists

template <class Record>
class Sortable_list: public List<Record> {
public:
   // Add prototypes for sorting methods here.
private:
   // Add prototypes for auxiliary functions here.
};

Insertion Sort

template <class Record>
void Sortable_list<Record> :: insertion_sort( )
/* Post: The entries of the Sortable_list have been rearranged so that
         the keys in all the entries are sorted into nondecreasing order.
   Uses: Methods for the class Record; the contiguous List implementation of Chapter 6 */
{
   int first_unsorted;    // position of first unsorted entry
   int position;          // searches sorted part of list
   Record current;        // holds the entry temporarily removed from list
   for (first_unsorted = 1; first_unsorted < count; first_unsorted++)
      if (entry[first_unsorted] < entry[first_unsorted - 1]) {
         position = first_unsorted;
         current = entry[first_unsorted];   // Pull unsorted entry out of the list.
         do {   // Shift all entries until the proper position is found.
            entry[position] = entry[position - 1];
            position--;                     // position is empty.
         } while (position > 0 && entry[position - 1] > current);
         entry[position] = current;
      }
}

Worst case: $T_p = O\left(\sum_{i=1}^{n-1} i\right) = O(n^2)$

The pivot key is placed at the right position with respect to the sorted sub-list.

Insertion Sort: Linked Version

Insertion Sort

A record $R_i$ is LOO (Left Out of Order) iff $K_i < \max_{0 \le j < i} \{ K_j \}$.

If there are k records that are LOO, then the worst case is

$T_p = O(n) + O((n-1) + (n-2) + \cdots + (n-k)) = O((k+1)\,n)$

Good for k << n and n not too large.
If k = n, $T_p = O(n^2)$.
Average $T_p = O(n^2)$.

Variations:

Replace the sequential search by the binary search. The search is faster, but the records still have to be moved.

Use a linked list for the representation. Insertion becomes simple, but we have to use the sequential search.
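The first variation can be sketched as follows. This is a minimal illustration on a `std::vector<int>` rather than the `Sortable_list` class of these slides, and the function name `binary_insertion_sort` is an assumption made for the example. `std::upper_bound` performs the binary search, and `std::move_backward` shows that the entries still have to be shifted one by one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort whose search step is a binary search over the sorted prefix.
void binary_insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int current = a[i];
        // First position in the sorted prefix a[0..i) whose entry is greater
        // than current; using upper_bound keeps equal keys in their original order.
        auto pos = std::upper_bound(a.begin(), a.begin() + i, current);
        // Shift the entries in [pos, a.begin() + i) one place to the right.
        std::move_backward(pos, a.begin() + i, a.begin() + i + 1);
        *pos = current;
    }
}
```

The number of comparisons drops to O(n log n), but the shifting keeps the overall worst case at O(n^2).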

Selection Sort

Shell Sort

Shell sort is also called diminishing increment sort. The basic idea is as follows. Suppose the sequence to be sorted has n objects. First choose an integer gap < n as the interval and divide all objects into gap subsequences, placing all objects that are a distance gap apart in the same subsequence. Apply straight insertion sort to each subsequence separately. Then shrink the interval, for example by taking gap = gap/2, and repeat the division into subsequences and the sorting. Continue until gap == 1, when all objects are sorted together in a single sequence.
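A minimal sketch of the procedure described above, assuming the increments are obtained by repeated halving (gap = gap/2), one of many possible increment sequences. It works on a `std::vector<int>`, and the name `shell_sort` is chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shell sort with the halving increment sequence n/2, n/4, ..., 1.
void shell_sort(std::vector<int>& a) {
    for (std::size_t gap = a.size() / 2; gap > 0; gap /= 2) {
        // One insertion-sort pass over every subsequence of entries gap apart.
        for (std::size_t i = gap; i < a.size(); ++i) {
            int current = a[i];
            std::size_t j = i;
            // Shift larger entries of the same subsequence gap places to the right.
            while (j >= gap && a[j - gap] > current) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = current;
        }
    }
}
```

The final pass with gap == 1 is an ordinary insertion sort, which is why the result is always fully sorted regardless of the earlier increments.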

Lower Bounds of Sorting

Optimal Sorting Time

【Theorem】 Any algorithm that sorts by comparisons only must have a worst case computing time of $\Omega(n \log_2 n)$.

Proof:

[Figure: Decision tree for insertion sort on R0, R1, and R2. The internal nodes compare $K_0 \le K_1$, $K_1 \le K_2$, and $K_0 \le K_2$, each with a T and an F branch; the six leaves (stop) are the permutations [0,1,2], [0,2,1], [2,0,1], [1,0,2], [1,2,0], and [2,1,0].]

When sorting n distinct elements, there are n! different possible results. Thus any decision tree must have at least n! leaves.

If the height of the tree is k, then $n! \le 2^{k-1}$ (the number of leaves in a complete binary tree), so $k \ge \log_2 n! + 1$.

Since $n! \ge (n/2)^{n/2}$, we have $\log_2 n! \ge (n/2) \log_2 (n/2) = \Theta(n \log_2 n)$.

Therefore $T_p = k \ge c \cdot n \log_2 n$.
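To make the bound concrete, the following sketch evaluates $\lceil \log_2 n! \rceil$, the minimum number of comparisons that any decision tree needs in the worst case (the height k above counts levels, so it is one more than this value). The function name `min_comparisons` is hypothetical, and the small tolerance guards against floating-point rounding when the logarithm is an exact integer.

```cpp
#include <cassert>
#include <cmath>

// Smallest worst-case number of comparisons for sorting n distinct keys:
// ceil(log2(n!)), computed as the sum of log2(i) for i = 2..n.
int min_comparisons(int n) {
    double log2_factorial = 0.0;
    for (int i = 2; i <= n; ++i) {
        log2_factorial += std::log2(static_cast<double>(i));
    }
    // Subtract a tiny tolerance so an exact integer is not rounded up.
    return static_cast<int>(std::ceil(log2_factorial - 1e-9));
}
```

For n = 3 the value is 3, which matches the decision tree for R0, R1, and R2: six leaves cannot fit in a tree that makes fewer than three comparisons on its longest path.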

Divide and Conquer Sorting

void Sortable_list :: sort( )
{
   if the list has length greater than 1 {
      partition the list into lowlist, highlist;
      lowlist.sort( );
      highlist.sort( );
      combine(lowlist, highlist);
   }
}
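One concrete instance of this outline is mergesort, where the partition step simply cuts the list in half and the combine step merges two sorted halves. The sketch below works on a `std::vector<int>` instead of the `Sortable_list` class, and the name `merge_sort` is an assumption for the example; `std::merge` is the standard-library merging routine.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Divide-and-conquer sort: split in half, sort each half, merge the results.
void merge_sort(std::vector<int>& list) {
    if (list.size() <= 1) return;            // length 0 or 1: already sorted
    auto mid = list.begin() + list.size() / 2;
    std::vector<int> lowlist(list.begin(), mid);    // partition step
    std::vector<int> highlist(mid, list.end());
    merge_sort(lowlist);
    merge_sort(highlist);
    // Combine step: merge the two sorted halves back into the original list.
    std::merge(lowlist.begin(), lowlist.end(),
               highlist.begin(), highlist.end(), list.begin());
}
```

Copying the halves costs O(n) additional space, which is the price this simple contiguous version pays; the linked version avoids the copies by relinking nodes.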

Merge Sort

Iterative Merge Sort

Sketch of the idea:

[Figure: the list at positions 0, 1, 2, 3, ..., n-4, n-3, n-2, n-1, merged level by level into progressively longer sorted sublists.]

[Figure: recursive merge sort on the keys 21 25 49 25* 16 08. The list is split repeatedly during the recursive descent, and the sorted sublists are merged as the recursion unwinds.]

Divide Linked List in Half

Analysis of Mergesort

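Quick Sort: gives the best average Tp

Sketch of the idea: The pivot Ki is placed at the right position with respect to the whole list.

[Figure: partitioning the list R0, R1, R2, ..., Ri, ..., Rj, ..., Rn-2, Rn-1 with R0 as the pivot. Index i scans from the left and index j scans from the right, comparing keys with K0. While i < j, Ri and Rj are exchanged and the scan continues on the sublist Ri+1, ..., Rj-1, until Rj lies to the left of Ri. Position j is then the right position for R0, giving [ Rj R1 R2 ... Rj-1 ] R0 [ Ri ... Rn-2 Rn-1 ].]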
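The sketch above can be written out as follows. This is one possible partitioning scheme that takes the first entry as the pivot, as in the figure; it is not necessarily the version used later in these slides, and the names `partition_around_first` and `quick_sort` are chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Partition a[low..high] around the pivot a[low]; return the pivot's final index.
int partition_around_first(std::vector<int>& a, int low, int high) {
    int pivot = a[low];
    int i = low + 1;   // scans from the left for a key greater than the pivot
    int j = high;      // scans from the right for a key not greater than the pivot
    while (true) {
        while (i <= j && a[i] <= pivot) ++i;
        while (i <= j && a[j] > pivot) --j;
        if (i > j) break;            // the scans have crossed
        std::swap(a[i], a[j]);       // both entries are on the wrong side
    }
    std::swap(a[low], a[j]);         // j is the right position for the pivot
    return j;
}

// Sort a[low..high] by partitioning and recursing on the two sublists.
void quick_sort(std::vector<int>& a, int low, int high) {
    if (low < high) {
        int p = partition_around_first(a, low, high);
        quick_sort(a, low, p - 1);
        quick_sort(a, p + 1, high);
    }
}
```

A whole vector `a` would be sorted with `quick_sort(a, 0, static_cast<int>(a.size()) - 1)`. With the first entry as pivot, an already sorted list produces the most unbalanced partitions.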

Smaller Universes

Quicksort

Partitioning the List

Quick Sort

Worst case: $T_p = O(n^2)$, which occurs if the list is already in sorted order.

Lucky case: the pivot always splits the list into two halves [ ... ... ] [ ... ... ].

$$
\begin{aligned}
T(n) &= O(n) + 2\,T(n/2) \\
     &= O(n) + 2\,[\,O(n/2) + 2\,T(n/2^2)\,] \\
     &= 2\,O(n) + 2^2\,T(n/2^2) \\
     &= \cdots \\
     &= k\,O(n) + 2^k\,T(n/2^k) \qquad (n/2^k = 1 \Rightarrow k = \log_2 n) \\
     &= O(n \log_2 n) + n\,T(1) \\
     &= O(n \log_2 n)
\end{aligned}
$$

【Lemma 7.1】 Let $T_{avg}(n)$ be the expected time for quicksort to sort a file with n records. Then there exists a constant k such that $T_{avg}(n) \le k\, n \log_e n$ for $n \ge 2$.

Proof: By induction. See p.331~332.

Space: $S_{lucky} = O(\ln n)$; $S_{worst} = O(n)$.

Trees, Lists, and Heaps

If the root of the tree is in position 0 of the list, then the left and right children of the node in position k are in positions 2k + 1 and 2k + 2 of the list, respectively. If these positions are beyond the end of the list, then these children do not exist.

The Definition of Heaps

DEFINITION A heap is a list in which each entry contains a key, and, for all positions k in the list, the key at position k is at least as large as the keys in positions 2k + 1 and 2k + 2, provided these positions exist in the list.

Sequential representation of a complete binary tree: $K_i \ge K_{2i+1}$ and $K_i \ge K_{2i+2}$.
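The heap condition can be checked directly from the index arithmetic, assuming the root is stored in position 0 so that the children of position k are in positions 2k + 1 and 2k + 2. The function name `is_max_heap` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if every key is at least as large as the keys of its children,
// where the children of position k are stored in positions 2k + 1 and 2k + 2.
bool is_max_heap(const std::vector<int>& keys) {
    for (std::size_t k = 0; k < keys.size(); ++k) {
        std::size_t left = 2 * k + 1;
        std::size_t right = 2 * k + 2;
        // A child exists only if its position lies inside the list.
        if (left < keys.size() && keys[k] < keys[left]) return false;
        if (right < keys.size() && keys[k] < keys[right]) return false;
    }
    return true;
}
```

Note that a heap is only partially ordered: the list 49 25 21 25 16 8 satisfies the condition even though it is not sorted.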

Heap Sort

[Figure: heap sort traced on a list of four records with keys b, d, a, c in positions [1] to [4].]

Step 1: Adjust the list into a max heap.
Step 2: Exchange the 1st record with the last one.
Step 3: Decrement the heap size and go to step 1.
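The three steps can be sketched as below, using 0-based positions (children of position k at 2k + 1 and 2k + 2) instead of the 1-based positions of the figure. The helper name `sift_down` and the function name `heap_sort` are assumptions for this example, not the names used in the course text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Restore the max-heap property for the subtree rooted at `root`,
// looking only at the first `size` entries of a.
void sift_down(std::vector<int>& a, std::size_t root, std::size_t size) {
    while (2 * root + 1 < size) {
        std::size_t child = 2 * root + 1;                          // left child
        if (child + 1 < size && a[child + 1] > a[child]) ++child;  // larger child
        if (a[root] >= a[child]) return;                           // heap property holds
        std::swap(a[root], a[child]);
        root = child;
    }
}

void heap_sort(std::vector<int>& a) {
    std::size_t n = a.size();
    // Step 1: adjust the whole list into a max heap, bottom up.
    for (std::size_t i = n / 2; i-- > 0; ) sift_down(a, i, n);
    for (std::size_t last = n; last > 1; --last) {
        std::swap(a[0], a[last - 1]);   // Step 2: exchange first and last records
        sift_down(a, 0, last - 1);      // Step 3: shrink the heap and readjust
    }
}
```

After the first pass only the root can violate the heap property at each iteration, so each readjustment costs at most the height of the heap.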

Trace of heap_sort

Insertion into a heap

$$
T_{\text{adjust into max heap}} = \sum_{i=1}^{h} (\text{\# of nodes on level } i) \times (\text{max moving distance}) = \sum_{i=1}^{h} 2^{i-1}\,(h - i) = O(n)
$$

Heap Sort

Analysis of heapsort:

$$
T_{\text{sorting and adjust}} = (n-1) \times (\text{max height}) = O(n \ln n)
$$

Therefore $T_p = O(n \ln n)$. With a fixed amount of additional storage, it is slightly slower than merge sort using O(n) additional space, but is faster than merge sort using O(1) additional space.

Radix Sort: sorting records that have several keys

Suppose that the record $R_i$ has r keys.

$K_i^{\,j}$ ::= the j-th key of record $R_i$
$K_i^{\,0}$ ::= the most significant key of record $R_i$
$K_i^{\,r-1}$ ::= the least significant key of record $R_i$

A list of records $R_0, \ldots, R_{n-1}$ is lexically sorted with respect to the keys $K^0, K^1, \ldots, K^{r-1}$ iff

$$
(K_i^{\,0}, K_i^{\,1}, \ldots, K_i^{\,r-1}) \le (K_{i+1}^{\,0}, K_{i+1}^{\,1}, \ldots, K_{i+1}^{\,r-1}), \qquad 0 \le i < n-1 .
$$

That is, either all r keys are equal, or $K_i^{\,0} = K_{i+1}^{\,0}, \ldots, K_i^{\,l} = K_{i+1}^{\,l}$ and $K_i^{\,l+1} < K_{i+1}^{\,l+1}$ for some $l < r-1$.

〖Example〗 A deck of cards sorted on 2 keys
$K^0$ [Suit]: ♣ < ♦ < ♥ < ♠
$K^1$ [Face value]: 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < J < Q < K < A

Sorting result: 2♣ ... A♣ 2♦ ... A♦ 2♥ ... A♥ 2♠ ... A♠

1. MSD (Most Significant Digit) Sort

Step 1: Sort on $K^0$: for example, create 4 bins for the suits.

[Figure: four bins of cards, one per suit; cards such as 3, 5, A, and 4 are shown in the bins.]

Step 2: Sort each bin independently (using insertion sort).

Step 3: Stack the 4 bins.

2. LSD (Least Significant Digit) Sort

Step 1: Sort on $K^1$: for example, create 13 bins for the face values.

[Figure: thirteen bins of cards labeled 2, 3, 4, 5, ..., A, one per face value.]

Step 2: Reform them into a single pile.

Step 3: Create 4 bins and resort using any stable method.

Note: If the number of the least significant keys is O(n), then a bin sort requires only O(n) time, thus making it a very fast sorting technique.

Remark: MSD or LSD can be used to sort a single key, if we interpret this key as a composite of several keys. For example, for $0 \le K \le 999$ we can break it into 3 keys: $K = K^0 \times 100 + K^1 \times 10 + K^2 \times 1$.

3. Radix Sort: decompose the (integer) key into digits using a radix r

〖Example〗
K = 12, r = 10 ⇒ $K^0 = 1$, $K^1 = 2$
K = 1100, r = 10 ⇒ $K^0 = K^1 = 1$, $K^2 = K^3 = 0$
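The decomposition of an integer key into digits can be sketched as follows. The function name `digits_msd_first` and the `width` parameter (the number of digits to produce, padding with leading zeros) are assumptions for this example, and the key is assumed to be non-negative.

```cpp
#include <cassert>
#include <vector>

// Decompose a non-negative key into `width` digits in the given radix.
// digits[0] is the most significant digit K^0, digits[width - 1] the least.
std::vector<int> digits_msd_first(int key, int radix, int width) {
    std::vector<int> digits(width, 0);
    for (int j = width - 1; j >= 0; --j) {
        digits[j] = key % radix;   // extract the current least significant digit
        key /= radix;              // drop it and continue with the rest
    }
    return digits;
}
```

For instance, `digits_msd_first(12, 10, 2)` yields the digits 1 and 2, matching the first example above.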

The "distribution" and "collection" passes of radix sort

Each bin is a queue with a front pointer fr[k] and a rear pointer re[k], for k = 0, ..., 9.

Initial list: 614 738 921 485 637 101 215 530 790 306

Pass 1: distribute on the least significant digit (i = 3)
bin 0: 530 790
bin 1: 921 101
bin 4: 614
bin 5: 485 215
bin 6: 306
bin 7: 637
bin 8: 738
(bins 2, 3, and 9 are empty)
Pass 1 collection: 530 790 921 101 614 485 215 306 637 738

Pass 2: distribute on the middle digit (i = 2)
bin 0: 101 306
bin 1: 614 215
bin 2: 921
bin 3: 530 637 738
bin 8: 485
bin 9: 790
(bins 4, 5, 6, and 7 are empty)
Pass 2 collection: 101 306 614 215 921 530 637 738 485 790

Pass 3: distribute on the most significant digit (i = 1)
bin 1: 101
bin 2: 215
bin 3: 306
bin 4: 485
bin 5: 530
bin 6: 614 637
bin 7: 738 790
bin 9: 921
(bins 0 and 8 are empty)
Pass 3 collection: 101 215 306 485 530 614 637 738 790 921
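The distribution and collection passes can be sketched with one queue per digit value. This version uses `std::vector` bins rather than the linked queues with fr[] and re[] pointers of the slides, and the name `radix_sort` and its parameters are assumptions for the example; keys are assumed to be non-negative integers with at most `digit_count` digits.

```cpp
#include <cassert>
#include <vector>

// LSD radix sort: one distribution and one collection per digit,
// starting from the least significant digit.
void radix_sort(std::vector<int>& keys, int digit_count, int radix = 10) {
    int divisor = 1;   // selects the digit examined in the current pass
    for (int pass = 0; pass < digit_count; ++pass) {
        // Distribution: append each key to the bin for its current digit.
        std::vector<std::vector<int>> bins(radix);
        for (int key : keys) {
            bins[(key / divisor) % radix].push_back(key);
        }
        // Collection: concatenate the bins in order 0, 1, ..., radix - 1.
        keys.clear();
        for (const auto& bin : bins) {
            keys.insert(keys.end(), bin.begin(), bin.end());
        }
        divisor *= radix;
    }
}
```

Because keys are appended to and read from each bin in first-in, first-out order, every pass is stable, which is what allows the later passes to preserve the order established by the earlier ones.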

Linked Implementation of Radix Sort

Sorting Method, Linked Radix Sort

Auxiliary Functions, Linked Radix Sort

The time used by radix sort is O(nk), where n is the number of items being sorted and k is the number of characters in a key.

The relative performance of radix sort to other methods will relate to the relative sizes of nk and n lg n; that is, of k and lg n.

If the keys are long but there are relatively few of them, then k is large and lg n relatively small, and other methods (such as mergesort) will outperform radix sort.

If k is small (the keys are short) and there are a large number of keys, then radix sort will be faster than any other method we have studied.

Analysis of Radix Sort

If each key has d digits, then d passes of "distribution" and "collection" are needed. Each pass distributes n objects and collects radix queues, so the total time complexity is O(d(n + radix)).

For a given radix, linked radix sort works well when the number of objects is large and the keys have few digits.

Radix sort requires n + 2 radix additional link pointers.

Radix sort is a stable sorting method.

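Comparison of Sorting Methods

Method                    Comparisons (best / worst)   Moves (best / worst)   Stable   Additional storage (best / worst)
Straight insertion sort   n / n^2                      0 / n^2                yes      1
Binary insertion sort     n log2 n                     0 / n^2                yes      1
Bubble sort               n / n^2                      0 / n^2                yes      1
Quicksort                 n log2 n / n^2               n log2 n / n^2         no       log2 n / n
Simple selection sort     n^2                          0 / n                  no       1
Tournament sort           n log2 n                     n log2 n               yes      n
Heapsort                  n log2 n                     n log2 n               no       1
Mergesort                 n log2 n                     n log2 n               yes      n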

Homework

P327 Exercises 8.2: E1(d), E2, E3, E4, E5
P332 Exercises 8.3: E1(d), E2
P335 Exercises 8.4: E2
P338 Exercises 8.5: E3
P343 Exercises 8.6: E1, E2
P360 Exercises 8.8: E1, E3, E4, E8
P370 Exercises 8.9: E1(f)(h), E2(d)
P396 Exercises 9.5: E2
