ALG0183 Algorithms & Data Structures by Dr Andy Brooks
ALG0183 Algorithms & Data Structures
Lecture 16: Merge sort
8/25/2009
• comparison sort
• worst-case, average-case, and best-case O(n log n)
• usually not an in-place algorithm (an additional list is used when merging)
• usually stable implementations
The main disadvantage of merge sort is that it uses extra memory proportional to n. If you are sorting a very big list you will need a lot of memory. Merge sort is stable if the merge function is stable.
recursion/sjálfkvaðning, recursive/endurkvæmur
two-way
Chapter 8.5
Definition: A sort algorithm that splits the items to be sorted into two groups, recursively sorts each group, and merges them into a final, sorted sequence. Run time is Θ(n log n).
Θ Formal Definition: f(n) = Θ(g(n)) means there are positive constants c1, c2, and k, such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ k. The values of c1, c2, and k must be fixed for the function f and must not depend on n.
An algorithmic technique where a function, in order to accomplish a task, calls itself with some part of the task.
Note: Every recursive solution involves two major parts or cases, the second part having three components:
• base case(s), in which the problem is simple enough to be solved directly, and
• recursive case(s). A recursive case has three components:
– divide the problem into one or more simpler or smaller parts of the problem,
– call the function (recursively) on each part, and
– combine the solutions of the parts into a solution for the problem.
Depending on the problem, any of these may be trivial or complex.
Note: it can be possible to run into stack memory problems if the depth of the recursion is large. For merge sort, however, the depth of the recursion is O(log2n) – see later.
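As a sketch, the base case and the three components of the recursive case map onto merge sort as follows (the class and method names here are illustrative, not the lecture's code):

```java
import java.util.Arrays;

public class MergeSortSketch {

    public static int[] mergeSort(int[] data) {
        if (data.length <= 1) {
            return data;                 // base case: 0 or 1 items, already sorted
        }
        int mid = data.length / 2;       // divide into two smaller parts
        int[] left  = mergeSort(Arrays.copyOfRange(data, 0, mid));          // recurse
        int[] right = mergeSort(Arrays.copyOfRange(data, mid, data.length)); // recurse
        return merge(left, right);       // combine the two sorted parts
    }

    // Merge two sorted arrays into one sorted array (the extra memory proportional to n).
    private static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

Because each recursive call halves the input, the depth of the call stack stays at O(log2 n), which is why the stack-memory concern does not arise here.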
History & Ongoing Development: “A Meticulous Analysis of Mergesort Programs”, Jyrki Katajainen and Jesper Larsson Träff, Lecture Notes in Computer Science, Vol. 1203, Proceedings of the Third Italian Conference on Algorithms and Complexity, pp. 217–228, 1997
“Mergesort is as important in the history of sorting as sorting in the history of computing. A detailed description of bottom-up mergesort, together with a timing analysis, appeared in a report by Goldstine and von Neumann [6] as early as 1948. Today numerous variants of the basic method are known, for instance, top-down mergesort (see, e.g., [17, pp. 165-166]), queue mergesort [7], in-place mergesort (see, e.g., [8]), natural mergesort (see, e.g., [11, pp. 159-163]), as well as other adaptive versions of mergesort (see [5, 14] and the references in these surveys). The development in this paper is based on bottom-up mergesort, or straight mergesort as it was called by Knuth [11, pp. 163-165].”
“New implementations for two-way and four-way bottom-up mergesort are given, the worst-case complexities of which are shown to be bounded by 5.5n log2 n + O(n) and 3.25n log2 n + O(n), respectively.”
four-way bottom-up mergesort
int first1 = first, last1 = mid;  // endpoints of first subarray
int first2 = mid+1, last2 = last; // endpoints of second subarray
int index = first1;               // next index open in temp array

// Copy smaller item from each subarray into temp until one
// of the subarrays is exhausted
while (first1 <= last1 && first2 <= last2) {
    if (data[first1].compareTo(data[first2]) < 0) {
        temp[index] = data[first1];
        first1++;
    } else {
        temp[index] = data[first2];
        first2++;
    }
    index++;
}
additional list
// Copy remaining elements from first subarray, if any
// (this loop is needed for correctness: the leftovers must reach temp
// before temp is copied back)
while (first1 <= last1) {
    temp[index] = data[first1];
    first1++;
    index++;
}

// Copy remaining elements from second subarray, if any
while (first2 <= last2) {
    temp[index] = data[first2];
    first2++;
    index++;
}
// Copy merged data into original array
for (index = first; index <= last; index++)
    data[index] = temp[index];
}
}
Stable or unstable? Record keys are a, b, and c. There are 4 records. Two records have the same key b. Let the subscripts x and y be used to distinguish the records with key b.
The list bx by c a is split into the halves bx by and c a. Sorting the left half merges the one-element subarrays bx and by:

if (data[first1].compareTo(data[first2]) < 0) {
    temp[index] = data[first1];
    first1++;
} else {
    temp[index] = data[first2];
    first2++;
}

Record bx is not less than by, so by is placed into the temp array first, giving by bx. The final merge then produces a by bx c: the relative order of bx and by has been reversed. (Applies to all test cases.) For a stable sort, the comparison should be <= 0, not < 0.
stop dividing
divide-and-conquer
Lewis code is unstable.
Check for yourself.
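One way to check for yourself: a small sketch (the Rec record type and merge helper are illustrative, not the lecture's code) that merges the same two sorted halves under both tie-breaking rules:

```java
public class StabilityDemo {

    // Hypothetical record type: a key plus a label to tell equal keys apart.
    static class Rec {
        final char key; final String label;
        Rec(char key, String label) { this.key = key; this.label = label; }
    }

    // Merge two key-sorted halves; 'stable' selects <= 0 rather than < 0.
    static Rec[] merge(Rec[] a, Rec[] b, boolean stable) {
        Rec[] out = new Rec[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            int cmp = Character.compare(a[i].key, b[j].key);
            boolean takeFirst = stable ? cmp <= 0 : cmp < 0;
            out[k++] = takeFirst ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    static String labels(Rec[] rs) {
        StringBuilder sb = new StringBuilder();
        for (Rec r : rs) sb.append(r.label).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Rec[] first  = { new Rec('b', "bx"), new Rec('c', "c") };  // sorted half 1
        Rec[] second = { new Rec('a', "a"),  new Rec('b', "by") }; // sorted half 2
        System.out.println(labels(merge(first, second, false))); // < 0:  a by bx c (unstable)
        System.out.println(labels(merge(first, second, true)));  // <= 0: a bx by c (stable)
    }
}
```

On the tie, < 0 is false, so the else branch takes by from the second subarray first; with <= 0 the first subarray wins and the original relative order of bx and by survives.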
idealised
“Selecting the Right Algorithm”Talk given by Michail G. Lagoudakis at the 2001 AAAI Fall Symposium Series: Using Uncertainty within Computation, Cape Cod, MA, November 2001. http://www2.isye.gatech.edu/~mlagouda/presentations.html
Earlier cross-over point.
• The dashed line represents the algorithm which starts with Insertion Sort and then swaps over to Quicksort at the cross-over point.
• The hybrid algorithm makes use of knowledge of the interaction between algorithms and performs even better: Insertion Sort, then Merge Sort, then Quicksort.
optimal policy
...but is the Quicksort stable?
platform dependent
Optimal policies must be determined empirically, i.e., by doing an experiment.
Many possible improvements can be made to merge sort.
• Sorting can be sped up by choosing a more efficient algorithm for small n.
– Advice is to use insertion sort for small n.
• The merge need not be performed if the highest element of the first subarray is less than the lowest element in the second subarray. (less than or equal to?)
– Java's merge sort has this improvement.
– The improvement can certainly help for nearly ordered lists.
– (There is a small cost finding the highest and lowest elements.)
• Four-way merging has been reported as being better than two-way merging.
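A sketch combining the first two improvements, an insertion-sort cutoff for small n and skipping the merge when the halves are already in order (the cutoff value 16 and all names here are illustrative assumptions, not Java's actual library code):

```java
public class ImprovedMergeSort {

    static final int CUTOFF = 16; // switch-over point; best value found empirically

    public static void sort(int[] data, int[] temp, int first, int last) {
        if (last - first + 1 <= CUTOFF) {  // improvement 1: insertion sort for small n
            insertionSort(data, first, last);
            return;
        }
        int mid = (first + last) / 2;
        sort(data, temp, first, mid);
        sort(data, temp, mid + 1, last);
        if (data[mid] <= data[mid + 1]) {  // improvement 2: highest of first half
            return;                        // <= lowest of second half, so skip merge
        }
        merge(data, temp, first, mid, last);
    }

    static void insertionSort(int[] data, int first, int last) {
        for (int i = first + 1; i <= last; i++) {
            int v = data[i];
            int j = i - 1;
            while (j >= first && data[j] > v) {
                data[j + 1] = data[j];
                j--;
            }
            data[j + 1] = v;
        }
    }

    static void merge(int[] data, int[] temp, int first, int mid, int last) {
        int i = first, j = mid + 1, k = first;
        while (i <= mid && j <= last)
            temp[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
        while (i <= mid)  temp[k++] = data[i++];
        while (j <= last) temp[k++] = data[j++];
        for (k = first; k <= last; k++) data[k] = temp[k];
    }
}
```

Note that using <= in the merge comparison also keeps this version stable, and the skip-merge test is where nearly ordered lists win big: for an already sorted list no merging happens at all.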
Some Big-Oh analysis: merge sort (recursive)
• Assume the number of items N to be sorted is a power of 2.
• Assume the cost of merging two sublists of size N/2 is N comparisons. (actually N−1 comparisons)
• Let T(N) equal the number of comparisons needed to sort N items.
– A proxy measure for the time needed (ignoring moves/swaps).
• The time to sort N items is the time to sort two sublists of size N/2 plus the time to merge the two sublists together.
• The recurrence relation is: T(N) = 2T(N/2) + N for N > 1, with T(1) = 0.
• This recurrence relation is typical of many “divide-and-conquer” algorithms.
• There are several ways of proving that T(N) is N log2 N.
O(n log n)
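One way to see it: divide the recurrence by N to get T(N)/N = T(N/2)/(N/2) + 1; telescoping this log2 N times gives T(N)/N = log2 N, so T(N) = N log2 N exactly. A quick sanity check of that claim (illustrative code, not from the slides):

```java
public class RecurrenceCheck {

    // T(1) = 0; T(N) = 2*T(N/2) + N for N > 1, with N a power of 2.
    static long t(long n) {
        return (n == 1) ? 0 : 2 * t(n / 2) + n;
    }

    public static void main(String[] args) {
        for (long n = 2; n <= (1 << 20); n *= 2) {
            long log2n = Long.numberOfTrailingZeros(n); // log2(n) for a power of 2
            if (t(n) != n * log2n) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("T(N) = N log2 N holds for N = 2 .. 2^20");
    }
}
```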
O(n log n) proof by recursion tree
[Figure: a recursion tree in which the sublist sizes halve at each level: 16, 8, 4, 2. Each level performs N comparisons in total, so the overall work is N log N.]
If the number of items N = 16, there are log2 16 = 4 levels.