Chap 7 Sorting



Motivation of Sorting

• The term "list" here means a collection of records.
• Each record has one or more fields.
• Each record has a key used to distinguish one record from another.
• For example, a phone directory is a list. Name, phone number, or even address can be the key, depending on the application or need.

Sorting

• Two ways to store a collection of records: sequential and non-sequential.

• Assume a sequential list f. To retrieve a record with key k from such a list, we can examine the keys in the order f[n].key, f[n-1].key, …, f[1].key => sequential search.

Example of An Element of A Search List

class Element
{
public:
    int getKey() const { return key; }
    void setKey(int k) { key = k; }
private:
    int key;
    // other fields …
};

Sequential Search

int SeqSearch(Element* f, const int n, const int k)
// Search the list f with key values f[1].key, …, f[n].key.
// Return i such that f[i].key == k. If there is no such record, return 0.
{
    int i = n;
    f[0].setKey(k);                     // sentinel: the search always terminates
    while (f[i].getKey() != k) i--;
    return i;
}

Sequential Search

• The number of comparisons needed to find the record at position i is n − i + 1.

• The average number of comparisons for a successful search is

  Σ_{i=1}^{n} (n − i + 1) / n = (n + 1) / 2

• For the phone directory lookup, there should be a better way than this.

Search

• A binary search takes only O(log n) time to search a sequential list with n records.

• In fact, to look up a name starting with W in the phone directory, we start searching toward the end of the directory rather than the middle. This search method is based on an interpolation scheme.

• An interpolation scheme relies on an ordered list. The next probe position is estimated as

  i ≈ ((k − f[l].key) / (f[u].key − f[l].key)) × n

where l and u are the current lower and upper search bounds.
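As a concrete illustration (not part of the original slides), here is a minimal sketch of an interpolation search over the Element interface defined earlier; the probe position follows the interpolation idea above, with the usual generalization of scaling by the current range u − l rather than n.

int InterpolationSearch(Element* f, const int n, const int k)
// A sketch of interpolation search on the sorted list f[1..n] (assumes the
// Element::getKey() accessor shown earlier). Returns the index of the record
// with key k, or 0 if no such record exists.
{
    int low = 1, high = n;
    while (low <= high && k >= f[low].getKey() && k <= f[high].getKey())
    {
        if (f[high].getKey() == f[low].getKey())          // all keys in range are equal
            return (f[low].getKey() == k) ? low : 0;
        // probe position estimated from the key distribution
        int i = low + (int)((double)(k - f[low].getKey())
                            / (f[high].getKey() - f[low].getKey())
                            * (high - low));
        if (f[i].getKey() == k) return i;
        else if (f[i].getKey() < k) low = i + 1;
        else high = i - 1;
    }
    return 0;
}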

Verifying Two Lists With Sequential Search

void Verify1(Element* F1, Element* F2, const int n, const int m)
// Compare two unordered lists F1 and F2 of size n and m, respectively.
{
    bool* marked = new bool[m + 1];
    for (int i = 1; i <= m; i++) marked[i] = false;

    for (int i = 1; i <= n; i++)
    {
        int j = SeqSearch(F2, m, F1[i].getKey());
        if (j == 0) cout << F1[i].getKey() << " not in F2" << endl;
        else
        {
            if (F1[i].other != F2[j].other)
                cout << "Discrepancy in " << F1[i].getKey() << ": "
                     << F1[i].other << " and " << F2[j].other << endl;
            marked[j] = true;     // mark the record F2[j] as having been seen
        }
    }
    for (int i = 1; i <= m; i++)
        if (!marked[i]) cout << F2[i].getKey() << " not in F1" << endl;
    delete[] marked;
}

O(mn)

Fast Verification of Two Lists

void Verify2(Element* F1, Element* F2, const int n, const int m)
// Same task as Verify1, but sort F1 and F2 first so that the keys are in
// increasing order. Assume the keys within each list are distinct.
{
    sort(F1, n);
    sort(F2, m);
    int i = 1, j = 1;
    while ((i <= n) && (j <= m))
        switch (compare(F1[i].getKey(), F2[j].getKey()))
        {
        case '<':
            cout << F1[i].getKey() << " not in F2" << endl;
            i++;
            break;
        case '=':
            if (F1[i].other != F2[j].other)
                cout << "Discrepancy in " << F1[i].getKey() << ": "
                     << F1[i].other << " and " << F2[j].other << endl;
            i++; j++;
            break;
        case '>':
            cout << F2[j].getKey() << " not in F1" << endl;
            j++;
        }
    if (i <= n) PrintRest(F1, i, n, 1);        // print records i through n of F1
    else if (j <= m) PrintRest(F2, j, m, 2);   // print records j through m of F2
}

O(max{n log n, m log m})

Formal Description of Sorting

• Given a list of records (R1, R2, …, Rn), each record having a key Ki. The sorting problem is that of finding a permutation σ such that Kσ(i) ≤ Kσ(i+1), 1 ≤ i ≤ n − 1. The desired ordering is (Rσ(1), Rσ(2), …, Rσ(n)).

Formal Description of Sorting (Cont.)

• If a list has several identical key values, the permutation is not unique. Let σs be the permutation with the following properties:
(1) Kσ(i) ≤ Kσ(i+1), 1 ≤ i ≤ n − 1.

(2) If i < j and Ki == Kj in the input list, then Ri precedes Rj in the sorted list.

• A sorting method that generates the permutation σs is said to be stable.
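As a small self-contained illustration of stability (not from the slides), the standard library's std::stable_sort produces exactly such a permutation σs, whereas std::sort gives no guarantee about the relative order of equal keys:

#include <algorithm>
#include <iostream>
#include <vector>

// Records with duplicate keys; the tag field lets us observe the original order.
struct Rec { int key; char tag; };

int main()
{
    std::vector<Rec> v = { {3, 'a'}, {1, 'b'}, {3, 'c'}, {1, 'd'} };
    std::stable_sort(v.begin(), v.end(),
                     [](const Rec& x, const Rec& y) { return x.key < y.key; });
    for (const Rec& r : v) std::cout << r.key << r.tag << ' ';
    std::cout << '\n';   // prints "1b 1d 3a 3c": equal keys keep their input order
    return 0;
}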

Categories of Sorting Methods

• Internal methods: used when the list to be sorted is small enough that the entire sort can be carried out in main memory. Examples: insertion sort, quick sort, merge sort, heap sort, and radix sort.

• External methods: used on larger lists that do not fit in main memory.

Insert Into A Sorted List

void insert(const Element e, Element* list, int i)
// Insert element e with key e.key into the ordered sequence list[0], …, list[i]
// such that the resulting sequence is also ordered. Assume that e.key ≥ list[0].key.
// The array list must have space allocated for at least i + 2 elements.
{
    while (e.getKey() < list[i].getKey())
    {
        list[i + 1] = list[i];   // shift larger records one position right
        i--;
    }
    list[i + 1] = e;
}

O(i)

Insertion Sort

void InsertSort(Element* list, const int n)
// Sort list into nondecreasing order of the key.
{
    list[0].setKey(MININT);              // sentinel so insert() always stops
    for (int j = 2; j <= n; j++)
        insert(list[j], list, j - 1);
}

Worst-case time: Σ_{i=1}^{n−1} O(i) = O(n²)

Insertion Sort Example 1

• Record Ri is left out of order (LOO) iff Ri < max{Rj | 1 ≤ j < i}.

• Example 7.1: Assume n = 5 and the input key sequence is 5, 4, 3, 2, 1.

j [1] [2] [3] [4] [5]

- 5 4 3 2 1

2 4 5 3 2 1

3 3 4 5 2 1

4 2 3 4 5 1

5 1 2 3 4 5

Insertion Sort Example 2

• Example 7.2: Assume n = 5 and the input key sequence is 2, 3, 4, 5, 1

j [1] [2] [3] [4] [5]

- 2 3 4 5 1

2 2 3 4 5 1

3 2 3 4 5 1

4 2 3 4 5 1

5 1 2 3 4 5

Cost of iterations j = 2, 3, 4, 5: O(1), O(1), O(1), O(n); total O(n).

Insertion Sort Analysis

• If there are k LOO records in a list, the computing time for sorting the list via insertion sort is O((k+1)n) = O(kn).

• Therefore, if k << n, then insertion sort might be a good sorting choice.

Insertion Sort Variations

• Binary insertion sort: the number of comparisons in an insertion sort can be reduced by replacing the sequential search with a binary search. The number of record moves remains the same (see the sketch below).

• List insertion sort: the elements are kept in a linked list rather than an array. The number of record moves becomes zero because only the link fields require adjustment. However, we must retain the sequential search.
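A minimal sketch of binary insertion sort (assuming the Element interface used in the programs above); only the comparison count improves, the record moves are unchanged from ordinary insertion sort.

void BinaryInsertSort(Element* list, const int n)
// A sketch of binary insertion sort. A binary search locates the position
// where list[j] belongs among list[1..j-1]; the key comparisons drop to
// O(n log n), but the worst-case record moves remain O(n^2).
{
    for (int j = 2; j <= n; j++)
    {
        Element e = list[j];
        int key = e.getKey();
        int low = 1, high = j - 1;
        while (low <= high)                 // binary search for the insertion point
        {
            int mid = (low + high) / 2;
            if (key < list[mid].getKey()) high = mid - 1;
            else low = mid + 1;             // placing equal keys after keeps the sort stable
        }
        for (int i = j - 1; i >= low; i--)  // shift larger records one position right
            list[i + 1] = list[i];
        list[low] = e;
    }
}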

Quick Sort

• Quick sort was developed by C. A. R. Hoare. It has the best average behavior among the sorting methods presented here.

• Quick sort differs from insertion sort in that the pivot key Ki is placed at its correct spot s(i) with respect to the whole list: Kj ≤ Ks(i) for j < s(i) and Kj ≥ Ks(i) for j > s(i).

• Therefore, the sublists to the left and to the right of s(i) can be sorted independently.

Quick Sort

void QuickSort(Element* list, const int left, const int right)
// Sort records list[left], …, list[right] into nondecreasing order on the field key.
// pivot = list[left].key is arbitrarily chosen as the pivot key. Pointers i and j are
// used to partition the sublist so that at any time list[m].key ≤ pivot for m < i and
// list[m].key ≥ pivot for m > j. It is assumed that list[left].key ≤ list[right+1].key.
{
    if (left < right)
    {
        int i = left, j = right + 1, pivot = list[left].getKey();
        do
        {
            do i++; while (list[i].getKey() < pivot);
            do j--; while (list[j].getKey() > pivot);
            if (i < j) InterChange(list, i, j);
        } while (i < j);
        InterChange(list, left, j);        // put the pivot into its final position j
        QuickSort(list, left, j - 1);
        QuickSort(list, j + 1, right);
    }
}

Quick Sort Example

• Example 7.3: The input list has 10 records with keys (26, 5, 37, 1, 61, 11, 59, 15, 48, 19).

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 Left Right

[26 5 37 1 61 11 59 15 48 19] 1 10

[11 5 19 1 15] 26 [59 61 48 37] 1 5

[1 5] 11 [19 15] 26 [59 61 48 37] 1 2

1 5 11 [19 15] 26 [59 61 48 37] 4 5

1 5 11 15 19 26 [59 61 48 37] 7 10

1 5 11 15 19 26 [48 37] 59 [61] 7 8

1 5 11 15 19 26 37 48 59 [61] 10 10

1 5 11 15 19 26 37 48 59 61

Quick Sort (Cont.)• In QuickSort(), list[n+1] has been set to have a

key at least as large as the remaining keys.• Analysis of QuickSort

– The worst case is O(n²).
– If each time a record is correctly positioned, the sublist to its left is of the same size as the sublist to its right, and T(n) is the time taken to sort a list of size n:

  T(n) ≤ cn + 2T(n/2), for some constant c
       ≤ cn + 2(cn/2 + 2T(n/4))
       = 2cn + 4T(n/4)
       ⋮
       ≤ cn log₂n + nT(1) = O(n log n)

Lemma 7.1

• Lemma 7.1: Let Tavg(n) be the expected time for function QuickSort to sort a list with n records. Then there exists a constant k such that Tavg(n) ≤ kn logen for n ≥ 2.

Analysis of Quick Sort• Unlike insertion sort (which only needs additional

space for a record), quick sort needs stack space to implement the recursion.

• If the lists split evenly, the maximum recursion depth would be log n and the stack space is of O(log n).

• The worst case is when the lists split into a left sublist of size n – 1 and a right sublist of size 0 at each level of recursion. In this case, the recursion depth is n, the stack space of O(n).

• The worst-case stack space can be reduced by a factor of 4 by realizing that right sublists of size less than 2 need not be stacked. An asymptotic reduction in stack space can be achieved by sorting the smaller sublist first; in that case the additional stack space is at most O(log n) (see the sketch below).
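A sketch of the "smaller sublist first" idea (not the text's exact code, and shown on a plain int array to stay self-contained): recursing only on the smaller part and looping on the larger one bounds the recursion depth, and hence the stack space, by O(log n).

#include <utility>   // std::swap

void QuickSortSmallFirst(int* a, int left, int right)
{
    while (left < right)
    {
        // Simple Lomuto-style partition around a[right] (for brevity;
        // the text's partition scheme works equally well here).
        int pivot = a[right], i = left - 1;
        for (int m = left; m < right; m++)
            if (a[m] <= pivot) std::swap(a[++i], a[m]);
        std::swap(a[i + 1], a[right]);
        int p = i + 1;                   // pivot's final position

        if (p - left < right - p) {      // left part is smaller: recurse on it
            QuickSortSmallFirst(a, left, p - 1);
            left = p + 1;                // iterate on the larger right part
        } else {                         // right part is smaller: recurse on it
            QuickSortSmallFirst(a, p + 1, right);
            right = p - 1;               // iterate on the larger left part
        }
    }
}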

Quick Sort Variations

• Quick sort using a median of three: Pick the median of the first, middle, and last keys in the current sublist as the pivot. Thus, pivot = median{Kl, K(l+r)/2, Kr}.
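A small sketch of how the median-of-three pivot could be wired into the QuickSort function shown earlier (assuming the same Element interface and the InterChange helper it already uses): order the three sample keys and move the median into position left, then partition as before.

void MedianOfThree(Element* list, const int left, const int right)
// Order list[left], list[(left+right)/2], list[right] and move their median
// into position left, so QuickSort's existing partition (which uses
// list[left] as the pivot) can be called unchanged.
{
    int mid = (left + right) / 2;
    if (list[mid].getKey() < list[left].getKey())   InterChange(list, left, mid);
    if (list[right].getKey() < list[left].getKey()) InterChange(list, left, right);
    if (list[right].getKey() < list[mid].getKey())  InterChange(list, mid, right);
    InterChange(list, left, mid);   // the median now sits in the pivot slot
}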

Decision Tree

• So far both insertion sort and quick sort have a worst-case complexity of O(n²).

• If we restrict attention to sorting algorithms in which the only operations permitted on keys are comparisons and interchanges, then O(n log n) is the best possible worst-case time.

• This is shown by using a tree that describes the sorting process. Each vertex of the tree represents a key comparison, and the branches indicate the result. Such a tree is called a decision tree.

Decision Tree for Insertion Sort

[Figure: decision tree for insertion sort on three records. The internal nodes compare K1 ≤ K2, K1 ≤ K3, and K2 ≤ K3, with Yes/No branches; the six leaves I–VI correspond to the 3! = 6 possible orderings of the input.]

Decision Tree (Cont.)

• Theorem 7.1: Any decision tree that sorts n distinct elements has a height of at least log2(n!) + 1

• Corollary: Any algorithm that sorts only by comparisons must have a worst-case computing time of Ω(n log n)
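A brief derivation (standard reasoning, not spelled out on the slides) of how the corollary follows from Theorem 7.1: a decision tree of height h has at most 2^(h−1) leaves, and sorting n distinct keys requires at least n! distinguishable leaves, so

  h ≥ log₂(n!) + 1 ≥ log₂((n/2)^(n/2)) = (n/2) log₂(n/2) = Ω(n log n),

since n! contains at least n/2 factors that are each at least n/2. Hence every comparison-based sorting algorithm needs Ω(n log n) comparisons in the worst case.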

Simple Merge

void merge(Element* initList, Element* mergeList, const int l, const int m, const int n)
// Merge the sorted sublists initList[l..m] and initList[m+1..n] into mergeList[l..n].
{
    int i1 = l, iResult = l, i2 = m + 1;
    for (; i1 <= m && i2 <= n; iResult++)
    {
        if (initList[i1].getKey() <= initList[i2].getKey())
        {
            mergeList[iResult] = initList[i1];
            i1++;
        }
        else
        {
            mergeList[iResult] = initList[i2];
            i2++;
        }
    }
    // copy whichever sublist still has records left
    if (i1 > m)
        for (int t = i2; t <= n; t++)
            mergeList[iResult + t - i2] = initList[t];
    else
        for (int t = i1; t <= m; t++)
            mergeList[iResult + t - i1] = initList[t];
}

O(n - l + 1)

Analysis of Simple Merge

• If an array is used, additional space for n – l +1 records is needed.

• If a linked list is used instead, then additional space for n − l + 1 links is needed.

O(1) Space Merge

• A second merge algorithm requires only O(1) additional space.

• Assume a total of n records are to be merged into one list, where n is a perfect square and the numbers of records in the left sublist and the right sublist are multiples of √n.

O(1) Space Merge Steps

Step 1: Identify the √n records with the largest keys. This is done by following right to left along the two lists to be merged.

Step 2: Exchange the records of the second list that were identified in Step 1 with those just to the left of those identified from the first list, so that the √n records with the largest keys form a contiguous block.

Step 3: Swap the block of largest records with the leftmost block (unless it is already the leftmost block). Sort the rightmost block.

Step 4: Reorder the blocks, excluding the block of largest records, into nondecreasing order of the last key in the blocks.

Step 5: Perform as many merge substeps as needed to merge the √n − 1 blocks other than the block with the largest keys.

Step 6: Sort the block with the largest keys.

O(1) Space Merge Example (First 8 Lines)

0 2 4 6 8 a c e g i j k l m n t w z|1 3 5 7 9 b d f h o p q r s u v x y

0 2 4 6 8 a c e g i j k l m n t w z 1 3 5 7 9 b d f h o p q r s u v x y

0 2 4 6 8 a|c e g i j k|u v x y w z|1 3 5 7 9 b|d f h o p q|r s l m n t

u v x y w z|c e g i j k|0 2 4 6 8 a|1 3 5 7 9 b|d f h o p q|l m n r s t

u v x y w z 0 2 4 6 8 a|1 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 v x y w z u 2 4 6 8 a|1 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 x y w z u 2 4 6 8 a|v 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 2 y w z u x 4 6 8 a|v 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

O(1) Space Merge Example (Last 8 Lines)

0 1 2 3 4 5 u x w 6 8 a|v y z 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 u w a|v y z x 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a w|v y z x u b|c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a w v y z x u b c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u|y x w o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u y x w o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q y x w|v z u r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t|v z u y x w

Analysis of O(1) Space Merge

• Steps 1 and 2 and the swapping of Step 3 each take O(√n) time and O(1) space.

• The sort of Step 3 can be done in O(n) time and O(1) space using an insertion sort.

• Step 4 can be done in O(n) time and O(1) space using a selection sort. (Selection sort sorts m records using O(m²) key comparisons and O(m) record moves, so here it needs O(n) comparisons, and the time to move blocks is O(n).)

• If an insertion sort is used in Step 4 instead, the time becomes O(n^1.5), since insertion sort may need O(n) block moves and each block move involves √n records.
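For reference, a minimal selection sort sketch over the Element interface used earlier; it is the O(m) record-move property, visible in the single swap per outer pass, that makes it the right choice in Step 4, where each "record" being moved is an entire block of √n keys.

void SelectionSort(Element* list, const int m)
// A sketch of selection sort: O(m^2) key comparisons but at most m - 1
// record swaps, i.e. O(m) record moves.
{
    for (int i = 1; i < m; i++)
    {
        int minPos = i;
        for (int j = i + 1; j <= m; j++)                  // find the smallest remaining key
            if (list[j].getKey() < list[minPos].getKey()) minPos = j;
        if (minPos != i)
        {                                                 // at most one swap per outer pass
            Element t = list[i];
            list[i] = list[minPos];
            list[minPos] = t;
        }
    }
}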

Analysis of O(1) Space Merge (Cont.)

• The total number of merge substeps is at most √n − 1. The total time for Step 5 is O(n).

• The sort of Step 6 can be done in O(n) time using either a selection sort or an insertion sort.

• Therefore, the total time is O(n) and the additional space used is O(1).

Iterative Merge Sort

• Treat the input as n sorted lists, each of length 1.

• Lists are merged in pairs to obtain n/2 lists, each of size 2 (if n is odd, one list is of length 1).

• The n/2 lists are then merged by pairs, and so on until we are left with only one list.

Merge Tree

26 5 77 1 61 11 59 15 48 19

5 26 1 77 11 61 15 59 19 48

1 5 26 77 11 15 59 61 19 48

1 5 11 15 26 59 61 77 19 48

1 5 11 15 19 26 48 59 61 77

Iterative Merge Sort

void MergeSort(Element* list, const int n)
// Sort list into nondecreasing order of the keys list[1].key, …, list[n].key.
{
    Element* tempList = new Element[n + 1];
    // l is the length of the sublists currently being merged.
    for (int l = 1; l < n; l *= 2)
    {
        MergePass(list, tempList, n, l);
        l *= 2;
        MergePass(tempList, list, n, l);   // interchange the roles of list and tempList
    }
    delete[] tempList;
}

Merge Pass

void MergePass(Element* initList, Element* resultList, const int n, const int l)
// One pass of merge sort. Adjacent pairs of sublists of length l are merged
// from list initList to list resultList. n is the number of records in initList.
{
    int i;
    for (i = 1;
         i <= n - 2*l + 1;   // enough elements remaining to form two sublists of length l?
         i += 2*l)
        merge(initList, resultList, i, i + l - 1, i + 2*l - 1);

    // merge the remaining list of length < 2*l
    if ((i + l - 1) < n) merge(initList, resultList, i, i + l - 1, n);
    else for (int t = i; t <= n; t++) resultList[t] = initList[t];
}

Analysis of MergeSort

• A total of ⌈log₂ n⌉ passes are made over the data. Each pass of merge sort takes O(n) time.

• The total computing time is therefore O(n log n).

Recursive Merge Sort• Recursive merge sort divides the list to be

sorted into two roughly equal parts:– the left sublist [left : (left+right)/2]

– the right sublist [(left+right)/2 +1 : right]

• These sublists are sorted recursively, and the sorted sublists are merged.

• To avoid copying, the use of a linked list (with integer links instead of pointers) for the sublists is desirable.

Sublist Partitioning For Recursive Merge Sort

26 5 77 1 61 11 59 15 48 19

5 26 | 77 | 1 61 | 11 59 | 15 | 19 48

5 26 77 | 1 61 | 11 15 59 | 19 48

1 5 26 61 77 | 11 15 19 48 59

1 5 11 15 19 26 48 59 61 77

Program 7.11 (Recursive Merge Sort )

class Element
{
private:
    int key;
    Field other;
    int link;
public:
    Element() { link = 0; }
};

int rMergeSort(Element* list, const int left, const int right)
// List (list[left], …, list[right]) is to be sorted on the field key.
// link is a link field in each record that is initially 0.
// list[0] is a record for intermediate results used only in ListMerge.
{
    if (left >= right) return left;
    int mid = (left + right) / 2;
    return ListMerge(list,
                     rMergeSort(list, left, mid),
                     rMergeSort(list, mid + 1, right));
}

O(n log n)

Program 7.12 (Merging Linked Lists)

int ListMerge(Element* list, const int start1, const int start2)
{
    int iResult = 0;   // list[0] collects the merged chain
    int i1 = start1, i2 = start2;
    for (; i1 && i2; )
    {
        if (list[i1].key <= list[i2].key)
        {
            list[iResult].link = i1;
            iResult = i1;
            i1 = list[i1].link;
        }
        else
        {
            list[iResult].link = i2;
            iResult = i2;
            i2 = list[i2].link;
        }
    }
    // attach the remainder of whichever chain is not exhausted
    if (i1 == 0) list[iResult].link = i2;
    else list[iResult].link = i1;
    return list[0].link;
}

Natural Merge Sort

• Natural merge sort takes advantage of the prevailing order within the list before performing merge sort.

• It runs an initial pass over the data to determine the sublists (runs) of records that are already in order (a sketch of this pass follows below).

• Then it uses the sublists for the merge sort.
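A sketch of that initial pass (an assumption about how it might be coded, using the Element interface from the earlier programs and a std::vector for the boundaries): it records where each naturally ordered run starts, and the merge passes then work run by run instead of with fixed-length sublists.

#include <vector>

std::vector<int> FindRuns(Element* list, const int n)
// Scan the list once and record the index at which each naturally ordered
// run starts. The returned boundaries (plus n+1 as a sentinel) can then be
// fed to the usual merge() passes.
{
    std::vector<int> runStart;
    runStart.push_back(1);                       // the first run starts at 1
    for (int i = 2; i <= n; i++)
        if (list[i].getKey() < list[i - 1].getKey())
            runStart.push_back(i);               // a descent starts a new run
    runStart.push_back(n + 1);                   // sentinel: one past the end
    return runStart;
}
// For the example list (26, 5, 77, 1, 61, 11, 59, 15, 48, 19) this yields run
// starts 1, 2, 4, 6, 8, 10 (and sentinel 11), i.e. runs 26 | 5 77 | 1 61 | 11 59 | 15 48 | 19.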

Natural Merge Sort Example

26 5 77 1 61 11 59 15 48 19      (initial runs: 26 | 5 77 | 1 61 | 11 59 | 15 48 | 19)

5 26 77 | 1 11 59 61 | 15 19 48

1 5 11 26 59 61 77 | 15 19 48

1 5 11 15 19 26 48 59 61 77

Heap Sort• Merge sort needs additional storage space

proportional to the number of records in the file being sorted, even though its computing time is O(n log n)

• O(1) merge only needs O(1) space but the sorting algorithm is much slower.

• We will see that heap sort only requires a fixed amount of additional storage and achieves worst-case and average computing time O(n log n).

• Heap sort uses the max-heap structure.

Heap Sort (Cont.)

• For heap sort, first of all, the n records are inserted into an empty heap.

• Next, the records are extracted from the heap one at a time.

• With the use of a special function adjust(), we can create a heap of n records faster.

Program 7.13 (Adjusting A Max Heap)

void adjust(Element* tree, const int root, const int n)
// Adjust the binary tree with root root to satisfy the heap property. The left and
// right subtrees of root already satisfy the heap property. No node has index greater than n.
{
    Element e = tree[root];
    int k = e.getKey();
    int j;
    for (j = 2*root; j <= n; j *= 2)
    {
        if (j < n)
            if (tree[j].getKey() < tree[j+1].getKey()) j++;   // j points to the larger child
        // compare the larger child with k. If k is larger, we are done.
        if (k >= tree[j].getKey()) break;
        tree[j/2] = tree[j];   // move the jth record up the tree
    }
    tree[j/2] = e;
}

Program 7.14 (Heap Sort)

void HeapSort(Element* list, const int n)
// The list (list[1], …, list[n]) is sorted into nondecreasing order of the field key.
{
    for (int i = n/2; i >= 1; i--)    // convert list into a max heap
        adjust(list, i, n);
    for (int i = n - 1; i >= 1; i--)  // sort list
    {
        Element t = list[i+1];        // interchange list[1] and list[i+1]
        list[i+1] = list[1];
        list[1] = t;
        adjust(list, 1, i);
    }
}

The heap-creation loop takes time Σ_{i=1}^{k} 2^(i−1)(k − i) ≤ 2^k ≤ 2n = O(n), where k = ⌈log₂(n+1)⌉ is the depth of the heap. Each of the n − 1 calls to adjust() in the second loop takes O(log n) time, so the total time of HeapSort is

O(n log n)

Converting An Array Into A Max Heap

[Figure: (a) the input array (26, 5, 77, 1, 61, 11, 59, 15, 48, 19) viewed as a complete binary tree with positions [1]–[10]; (b) the initial max heap produced by the heap-creation loop, (77, 61, 59, 48, 19, 11, 26, 15, 1, 5).]

Heap Sort Example

[Figure: the first two deletions of heap sort. After the first interchange and adjust, the heap is (61, 48, 59, 15, 19, 11, 26, 5, 1) with heap size 9 and sorted part [77]. After the second, the heap is (59, 48, 26, 15, 19, 11, 1, 5) with heap size 8 and sorted part [61, 77].]

Sorting Several Keys

• A list of records is said to be sorted with respect to the keys K1, K2, …, Kr (K1 being the most significant) iff for every pair of records i and j with i < j, (K1i, K2i, …, Kri) ≤ (K1j, K2j, …, Krj), where Ksi denotes the value of key Ks in record i.

• The r-tuple (x1, x2, …, xr) is less than or equal to the r-tuple (y1, y2, …, yr) iff either xi = yi for 1 ≤ i ≤ j and xj+1 < yj+1 for some j < r, or xi = yi for 1 ≤ i ≤ r.

• Two popular ways to sort on multiple keys:
  – Sort on the most significant key first, into multiple piles; sort each pile on the second most significant key, and so on; then combine the piles. This is sort on most-significant-digit-first (MSD).
  – The other way is to sort on the least significant digit first (LSD).

• Example: sorting a deck of cards on suit and face value, with Spade > Heart > Diamond > Club.

Sorting Several Keys (Cont.)

• LSD and MSD only define the order in which the keys are to be sorted; they do not specify how each key is sorted.

• Bin sort can be used for sorting on each key. The complexity of bin sort is O(n).

• LSD and MSD can be used even when there is only one key. For example, if the keys are numeric, each decimal digit may be regarded as a subkey => radix sort.

Radix Sort

• In a radix sort, we decompose the sort key using some radix r. The number of bins needed is r.

• Assume the records R1, R2, …, Rn are to be sorted using radix r. Each key has d digits, each in the range 0 to r − 1.

• Assume each record has a link field. The records in the same bin are linked together into a chain:
  – f[i], 0 ≤ i < r: pointer to the first record in bin i.
  – e[i]: pointer to the last record in bin i.
  – Each chain operates as a queue.
  – Each record is assumed to have a key array key[d], with 0 ≤ key[i] < r for 0 ≤ i < d.

Program 7.15 (LSD Radix Sort)

void RadixSort(Element* list, const int d, const int n)
{
    int e[radix], f[radix];   // queue end and front pointers (radix == r)
    for (int i = 1; i <= n; i++) list[i].link = i + 1;   // link the records into a chain
    list[n].link = 0;
    int current = 1;          // start of the chain
    for (int i = d - 1; i >= 0; i--)   // sort on digit key[i]
    {
        int j;
        for (j = 0; j < radix; j++) f[j] = 0;   // initialize bins to empty queues
        for (; current; current = list[current].link)   // put records into queues
        {
            int k = list[current].key[i];
            if (f[k] == 0) f[k] = current;
            else list[e[k]].link = current;
            e[k] = current;
        }
        for (j = 0; f[j] == 0; j++);   // find the first non-empty queue
        current = f[j];
        int last = e[j];
        for (int k = j + 1; k < radix; k++)   // concatenate the remaining queues
        {
            if (f[k])
            {
                list[last].link = f[k];
                last = e[k];
            }
        }
        list[last].link = 0;
    }
}

Each pass takes O(n) time to distribute the records into bins plus O(r) time to concatenate the queues; with d passes, the total time is O(d(n + r)).

Radix Sort Example

179 208 306 93 859 984 55 9 271 33

[Figure: bins (queues f[i]/e[i]) after the first pass, on the least significant digit:
 bin 1: 271; bin 3: 93, 33; bin 4: 984; bin 5: 55; bin 6: 306; bin 8: 208; bin 9: 179, 859, 9.]

Chain after the first pass:
271 93 33 984 55 306 208 179 859 9

Radix Sort Example (Cont.)

Chain before the second pass:
271 93 33 984 55 306 208 179 859 9

[Figure: bins after the second pass, on the tens digit:
 bin 0: 306, 208, 9; bin 3: 33; bin 5: 55, 859; bin 7: 271, 179; bin 8: 984; bin 9: 93.]

Chain after the second pass:
306 208 9 33 55 859 271 179 984 93

Radix Sort Example (Cont.)

Chain before the third pass:
306 208 9 33 55 859 271 179 984 93

[Figure: bins after the third pass, on the hundreds digit:
 bin 0: 9, 33, 55, 93; bin 1: 179; bin 2: 208, 271; bin 3: 306; bin 8: 859; bin 9: 984.]

Chain after the third pass (sorted):
9 33 55 93 179 208 271 306 859 984

List And Table Sorts• Apart from radix sort and recursive merge

sort, all the sorting methods we have looked at so far require excessive data movement.

• When the amount of data to be sorted is large, data movement tends to slow down the process.

• It is desirable to modify sorting algorithms to minimize the data movement.

• Methods such as insertion sort or merge sort can be modified to work with a linked list rather than a sequential list. Instead of physical movement, an additional link field is used to reflect the change in the position of the record in the list.

Program 7.16 (Rearranging Records Using A Doubly Linked List)

void list1(Element* list, const int n, int first)
// Rearrange the records in place into sorted order, given the sorted chain
// starting at first. A backward link field linkb is used in addition to link.
{
    int prev = 0;
    for (int current = first; current; current = list[current].link)
    {   // convert the chain into a doubly linked list
        list[current].linkb = prev;
        prev = current;
    }
    for (int i = 1; i < n; i++)   // move list[first] to position i while maintaining the list
    {
        if (first != i)
        {
            if (list[i].link) list[list[i].link].linkb = first;
            list[list[i].linkb].link = first;
            Element a = list[first];
            list[first] = list[i];
            list[i] = a;
        }
        first = list[i].link;
    }
}

Time: O(n) link operations; O(nm) in total if each record is m words long.

Example 7.9

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 26 5 77 1 61 11 59 15 48 19

link 9 6 0 2 3 8 5 10 7 1

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 26 5 77 1 61 11 59 15 48 19

link 9 6 0 2 3 8 5 10 7 1

linkb 10 4 5 0 7 2 9 6 1 8

Example 7.9 (Cont.)

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 77 26 61 11 59 15 48 19

link 2 6 0 9 3 8 5 10 7 4

linkb 0 4 5 10 7 2 9 6 4 8

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 77 26 61 11 59 15 48 19

link 2 6 0 9 3 8 5 10 7 1

linkb 0 4 5 10 7 2 9 6 1 8

Configuration after first iteration of the for loop of list1, first = 2

Configuration after second iteration of the for loop of list1, first = 6

Example 7.9 (Cont.)

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 11 26 61 77 59 15 48 19

link 2 6 8 9 6 0 5 10 7 4

linkb 0 4 2 10 7 5 9 6 4 8

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 11 15 61 77 59 26 48 19

link 2 6 8 10 6 0 5 9 7 8

linkb 0 4 2 6 7 5 9 10 8 8

Configuration after third iteration of the for loop of list1, first = 8

Configuration after fourth iteration of the for loop of list1, first = 10

Example 7.10 (Rearranging Records Using Only One Link)

void list2(Element* list, const int n, int first)
// Same function as list1 except that a second link field linkb is not required.
{
    for (int i = 1; i < n; i++)
    {
        while (first < i) first = list[first].link;
        int q = list[first].link;   // list[q] is the next record in nondecreasing order
        if (first != i)
        {
            // Interchange list[i] and list[first], moving list[first] to its correct
            // spot, as list[first] has the ith smallest key. Also set the link from
            // the old position of list[i] to its new one.
            Element t = list[i];
            list[i] = list[first];
            list[first] = t;
            list[i].link = first;
        }
        first = q;
    }
}

Time: O(n) link operations; O(nm) in total if each record is m words long.

Example 7.10 (Rearranging Records Using Only One Link)

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 77 26 61 11 59 15 48 19

link 4 6 0 9 3 8 5 10 7 1

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 77 26 61 11 59 15 48 19

link 4 6 0 9 3 8 5 10 7 1

Configuration after the first iteration of the for loop of list2, first = 2

Configuration after the second iteration of the for loop of list2, first = 6

Example 7.10 (Rearranging Records Using Only One Link)

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 11 26 61 77 59 15 48 19

link 4 6 6 9 3 0 5 10 7 1

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 11 15 61 77 59 26 48 19

link 4 6 6 8 3 0 5 9 7 1

Configuration after the third iteration of the for loop of list2, first = 8

Configuration after the fourth iteration of the for loop of list2, first = 10

Example 7.10 (Rearranging Records Using Only One Link)

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 11 15 19 77 59 26 48 61

link 4 6 6 8 10 0 5 9 7 3

i R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

key 1 5 11 15 19 26 59 77 48 61

link 4 6 6 8 1 8 5 0 7 3

Configuration after the fifth iteration of the for loop of list2, first = 1

Configuration after the sixth iteration of the for loop of list2, first = 9

Table Sort• The list-sort technique is not well suited for quick

sort and heap sort. • One can maintain an auxiliary table, t, with one

entry per record. The entries serve as an indirect reference to the records.

• Initially, t[i] = i. When interchanges are required, only the table entries are exchanged.

• It may sometimes be necessary to physically rearrange the records according to the permutation specified by t.
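A minimal sketch of building and sorting such a table with the standard library (the physical rearrangement itself is Program 7.18 below); only the integer entries of t are moved while sorting.

#include <algorithm>
#include <numeric>   // std::iota

void TableSortIndices(Element* list, const int n, int* t)
// Build the auxiliary table t (1-based, as in the text): t[i] starts as i,
// and only table entries are exchanged while sorting, by comparing the keys
// the entries refer to.
{
    std::iota(t + 1, t + n + 1, 1);              // t[1..n] = 1, 2, ..., n
    std::sort(t + 1, t + n + 1,
              [list](int a, int b) { return list[a].getKey() < list[b].getKey(); });
    // Now list[t[1]], list[t[2]], ..., list[t[n]] is the sorted order,
    // although the records in list[] have not moved.
}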

Table Sort (Cont.)

• The function to rearrange records corresponding to the permutation t[1], t[2], …, t[n] can be viewed as an application of a theorem from mathematics: every permutation is made up of disjoint cycles. The cycle for any element i is made up of i, t[i], t^2[i], …, t^k[i], where t^j[i] = t[t^(j−1)[i]], t^0[i] = i, and t^k[i] = i.

Program 7.18 (Table Sort)

void table(Element* list, const int n, int* t)
// Rearrange list[1], …, list[n] to correspond to the sequence list[t[1]], …, list[t[n]].
{
    for (int i = 1; i < n; i++)
    {
        if (t[i] != i)   // a nontrivial cycle starts at i
        {
            Element p = list[i];
            int j = i;
            do
            {
                int k = t[j];
                list[j] = list[k];
                t[j] = j;
                j = k;
            } while (t[j] != i);
            list[j] = p;   // j is the final position for record p
            t[j] = j;
        }
    }
}

Table Sort Example

R1 R2 R3 R4 R5 R6 R7 R8

key 35 14 12 42 26 50 31 18

t 3 2 8 5 7 1 4 6

key 12 14 18 42 26 35 31 50

t 1 2 3 5 7 6 4 8

key 12 14 18 26 31 35 42 50

t 1 2 3 4 5 6 7 8

Initial configuration

Configuration after rearrangement of first cycle

Configuration after rearrangement of second cycle

(The nontrivial cycles of the initial t are (1, 3, 8, 6) and (4, 5, 7); record 2 lies on a trivial cycle.)

Analysis of Table Sort

• To rearrange a nontrivial cycle containing kl distinct records, kl + 1 record moves are needed, so the total number of record moves is Σ_{l, kl > 0} (kl + 1).

• Since the records on all nontrivial cycles are distinct, Σ kl ≤ n.

• The total number of record moves is maximum when Σ kl = n and there are ⌊n/2⌋ cycles of two records each, giving ⌊3n/2⌋ record moves.

• Assuming that one record move costs O(m) time, the total computing time is O(mn).

Summary Of Internal Sorting• No one method is best for all conditions.

– Insertion sort is good when the list is already partially ordered, and it is a good choice when n is small.

– Merge sort has the best worst-case behavior but needs more storage than heap sort.

– Quick sort has the best average behavior, but its worst-case behavior is O(n2).

– The behavior of radix sort depends on the size of the keys and the choice of r.

External Sorting• There are some lists that are too large to fit in

the memory of a computer. So internal sorting is not possible.

• Some records are stored in the disk (tape, etc.). System retrieves a block of data from a disk at a time. A block contains multiple records.

• The most popular method for sorting on external storage devices is merge sort.– Segments of the input list are sorted. – Sorted segments (called runs) are written onto external

storage.
– Runs are merged until only one run is left.

Example 7.12

• Consider a computer capable of internally sorting 750 records at a time that is used to sort 4500 records.

• Six runs are generated, each containing 750 sorted records.

• Allocate three 250-record blocks of internal memory for merging runs: two blocks buffer the two input runs and the third holds the output.

• Three factors contribute to the read/write time of a disk:
  – seek time
  – latency time
  – transmission time

Example 7.12 (Cont.)

run1 (1 – 750)

run2 (751 – 1500)

run3 (1501 – 2250)

run4 (2251 – 3000)

run5 (3001 – 3750)

run6 (3751 – 4500)

Example 7.12 (Cont.)

• tIO = ts + tl +trw

tIS = time to internal sort 750 records

ntm = time to merge n records from input buffers to the output buffer

ts = maximum seek time

tl = maximum latency time

trw = time to read or write one block of 250 records

Example 7.12 (Cont.)

Operation Time

(1) Read 18 blocks of input, 18tIO, internally sort, 6tIS , write 18 blocks, 18tIO

36tIO + 6tIS

(2) Merge runs 1 to 6 in pairs 36tIO + 4500tm

(3) Merge two runs of 1500 records each, 12 blocks

24tIO + 3000tm

(4) Merge one run of 3000 records with one run of 1500 records

36tIO + 4500tm

Total time 132tIO + 12000tm+ 6tIS

K-Way Merging

• To merge m runs via 2-way merging needs ⌈log₂ m⌉ + 1 passes over the data.

• If we use a higher-order merge, the number of passes over the data is reduced.

• With a k-way merge on m runs, we need ⌈log_k m⌉ passes over the data.

• But is it always true that the higher the order of merging, the less computing time we need? Not necessarily!
  – k − 1 comparisons are needed to determine the next output record.
  – If a loser tree is used to reduce the number of comparisons, we can achieve a complexity of O(n log₂ m).
  – The block size is reduced as k increases; a smaller block size increases the number of disk accesses and hence the input/output time.
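A rough accounting of the trade-off described above (standard reasoning, not worked out on the slide): without a selection tree, producing each of the n output records of a k-way merge costs k − 1 key comparisons, and ⌈log_k m⌉ merge passes are made, so the comparison time is

  O(n (k − 1) ⌈log_k m⌉) = O(n (k − 1) log₂ m / log₂ k),

which grows with k. With a loser tree, selecting each output record costs only O(log₂ k) comparisons, giving

  O(n log₂ k · ⌈log_k m⌉) = O(n log₂ m),

which is independent of k; the remaining effect of a larger k is the reduced block size and hence increased I/O time per pass.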

Buffer Handling for Parallel Operation

• To achieve better performance, multiple input buffers and two output buffers are used to avoid idle time.

• Evenly distributing input buffers among all runs may still have idle time problem. Buffers should be dynamically assigned to whoever needs to retrieve more data to avoid halting the computing process.

• We should take advantage of task overlapping and keep computing process busy and avoid idle time.

Buffering Algorithm

Step 1: Input the first block of each of the k runs, setting up k linked queues, each having one block of data.

Step 2: Let LastKey[i] be the last key input from run i. Let NextRun be the run for which LastKey is minimum.

Step 3: Use a function kwaymerge to merge records from the k input queues into the output buffer.

Step 4: Wait for any ongoing disk I/O to complete.

Step 5: If an input buffer has been read, add it to the queue for the appropriate run.

Step 6: If LastKey[NextRun] != +infinity, then initiate reading the next block from run NextRun into a free input buffer.

Step 7: Initiate the writing of the output buffer.

Step 8: If a record with key +infinity has not been merged into the output buffer, go back to Step 3. Otherwise, wait for the ongoing write to complete and then terminate.

Optimal Merging of Runs

[Figure: two ways of merging four runs of sizes 2, 4, 5, and 15.
 (a) Merge runs 2 and 4 (giving 6), then merge the result with 5 (giving 11), then with 15 (giving 26).
 (b) Merge 2 with 4 (giving 6) and 5 with 15 (giving 20), then merge the two results (giving 26).]

(a) weighted external path length = 2*3 + 4*3 + 5*2 + 15*1 = 43

(b) weighted external path length = 2*2 + 4*2 + 5*2 + 15*2 = 52

Huffman Code

• Assume we want to obtain an optimal set of codes for messages M1, M2, …, M(n+1). Each code is a binary string that will be used for transmission of the corresponding message.

• At the receiving end, a decode tree is used to decode the binary string and recover the message.

• A zero is interpreted as a left branch and a one as a right branch. These codes are called Huffman codes.

• The cost of decoding a code word is proportional to the number of bits in the code. This number is equal to the distance of the corresponding external node from the root node.

• If qi is the relative frequency with which message Mi will be transmitted, then the expected decoding time is

  Σ_{1 ≤ i ≤ n+1} qi di

  where di is the distance of the external node for message Mi from the root node.

Huffman Codes (Cont.)

• The expected decoding time is minimized by choosing code words resulting in a decode tree with minimal weighted external path length.

[Figure: a decode tree for four messages M1–M4. Zeros label left branches and ones label right branches; M1 and M2 are the deepest external nodes and M4 the shallowest.]

Huffman Function

class BinaryTree
{
public:
    BinaryTree(BinaryTree bt1, BinaryTree bt2)
    {
        root = new BinaryTreeNode;                // combine the two trees under a new root
        root->LeftChild = bt1.root;
        root->RightChild = bt2.root;
        root->weight = bt1.root->weight + bt2.root->weight;
    }
private:
    BinaryTreeNode* root;
};

void huffman(List<BinaryTree> l)
// l is a list of single-node binary trees as described above.
{
    int n = l.Size();                    // number of binary trees in l
    for (int i = 0; i < n - 1; i++)      // loop n - 1 times
    {
        BinaryTree first = l.DeleteMinWeight();   // the two trees of least weight
        BinaryTree second = l.DeleteMinWeight();
        BinaryTree* bt = new BinaryTree(first, second);
        l.Insert(*bt);                   // put the combined tree back into the list
    }
}

O(n log n)

Huffman Tree Example

[Figure: Huffman tree construction for leaves of weights 2, 3, 5, 7, 9, and 13. (a) Combine 2 and 3 into a tree of weight 5; (b) combine it with the leaf 5 to get weight 10; (c) combine 7 and 9 to get weight 16; (d) combine the trees of weights 10 and 13 to get weight 23; (e) combine the trees of weights 16 and 23 to get the final tree of weight 39.]
