May 06, 2015
Why Sort?
• A classic problem in computer science!
• Data requested in sorted order
  • e.g., find students in increasing gpa order
• Sorting is the first step in bulk loading a B+ tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records
• Sorting is useful for summarizing related groups of tuples
• The sort-merge join algorithm involves sorting.
• Problem: sort 100 GB of data with 1 GB of RAM.
  • Why not virtual memory?
Bubble sort
• Compare each element (except the last one) with its neighbor to the right
  • If they are out of order, swap them
  • This puts the largest element at the very end
  • The last element is now in the correct and final place
• Compare each element (except the last two) with its neighbor to the right
  • If they are out of order, swap them
  • This puts the second largest element next to last
  • The last two elements are now in their correct and final places
• Compare each element (except the last three) with its neighbor to the right
• Continue as above until you have no unsorted elements on the left
Example of bubble sort
Pass 1: 7 2 8 5 4 → 2 7 8 5 4 → 2 7 8 5 4 → 2 7 5 8 4 → 2 7 5 4 8
Pass 2: 2 7 5 4 8 → 2 7 5 4 8 → 2 5 7 4 8 → 2 5 4 7 8
Pass 3: 2 5 4 7 8 → 2 5 4 7 8 → 2 4 5 7 8
Pass 4: 2 4 5 7 8 → 2 4 5 7 8 (done)
Sorting - Bubble
• From the first element
  • Exchange pairs if they’re out of order
  • Last one must now be the largest
• Repeat from the first to n-1
• Stop when you have only one element to check
Bubble Sort
/* Bubble sort for integers */
#define SWAP(a,b) { int t; t=a; a=b; b=t; }
void bubble( int a[], int n ) {
    int i, j;
    for(i=0;i<n;i++) { /* n passes thru the array */
        /* From start to the end of unsorted part */
        for(j=1;j<(n-i);j++) {
            /* If adjacent items out of order, swap */
            if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);
        }
    }
}
Bubble Sort - Analysis
(The code analysed is the bubble() function shown above.)
• The inner loop body, if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);, is an O(1) statement
• Inner loop: n-1, n-2, n-3, … , 1 iterations
• Outer loop: n iterations
• Overall: summing the inner loop iteration counts over the outer loop iterations gives
  $\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = O(n^2)$
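The iteration count can be checked empirically. The sketch below is a hypothetical instrumented variant of bubble() (the name bubble_count and the counter are assumptions, not part of the slides). It counts how often the inner loop body runs, which should come to (n-1) + (n-2) + … + 1 + 0 = n(n-1)/2.

```c
/* Hypothetical helper: same loops as bubble(), plus a comparison counter. */
long bubble_count(int a[], int n) {
    long comparisons = 0;
    for (int i = 0; i < n; i++) {          /* n passes through the array */
        for (int j = 1; j < n - i; j++) {  /* unsorted part only */
            comparisons++;                 /* one O(1) compare per iteration */
            if (a[j - 1] > a[j]) {
                int t = a[j - 1];
                a[j - 1] = a[j];
                a[j] = t;
            }
        }
    }
    return comparisons;
}
```

For the five-element example 7 2 8 5 4, the count is 4 + 3 + 2 + 1 + 0 = 10 = 5·4/2.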
Sorting - Simple
• Bubble sort
  • O(n^2)
  • Very simple code
• Insertion sort
  • Slightly better than bubble sort
    • Fewer comparisons
  • Also O(n^2)
Selection sort
• Given an array of length n,
  • Search elements 0 through n-1 and select the smallest
    • Swap it with the element in location 0
  • Search elements 1 through n-1 and select the smallest
    • Swap it with the element in location 1
  • Search elements 2 through n-1 and select the smallest
    • Swap it with the element in location 2
  • Search elements 3 through n-1 and select the smallest
    • Swap it with the element in location 3
  • Continue in this fashion until there’s nothing left to search
Example and analysis of selection sort
Example:
7 2 8 5 4
2 7 8 5 4
2 4 8 5 7
2 4 5 8 7
2 4 5 7 8
• The selection sort might swap an array element with itself; this is harmless, and not worth checking for
Analysis:
• The outer loop executes n-1 times
• The inner loop executes about n/2 times on average (from n to 2 times)
• Work done in the inner loop is constant (compare two array elements and remember the smaller)
• Time required is roughly (n-1)*(n/2)
• You should recognize this as O(n^2)
Selection sort
How does it work:
• First find the smallest element in the array and exchange it with the element in the first position, then find the second smallest element and exchange it with the element in the second position, and continue in this way until the entire array is sorted.
• How would it sort the list in nonincreasing order?
Selection sort is:
• One of the simplest sorting techniques.
• A good algorithm to sort a small number of elements
• An incremental algorithm (induction method)
• Inefficient for large lists.
Incremental algorithms process the input elements one-by-one and maintain the solution for the elements processed so far.
Selection Sort Algorithm
Input: An array A[1..n] of n elements.
Output: A[1..n] sorted in nondecreasing order.
1. for i ← 1 to n - 1
2.     k ← i
3.     for j ← i + 1 to n   {Find the i th smallest element.}
4.         if A[j] < A[k] then k ← j
5.     end for
6.     if k ≠ i then interchange A[i] and A[k]
7. end for
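The pseudocode above can be sketched in C as follows. This is a minimal translation under two assumptions: the elements are ints and the array is 0-based, so i runs from 0 to n-2 instead of 1 to n-1. The function name selection_sort is illustrative.

```c
/* Selection sort, a 0-based translation of the pseudocode. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int k = i;                       /* index of smallest element seen so far */
        for (int j = i + 1; j < n; j++) {
            if (a[j] < a[k]) k = j;      /* find the i-th smallest element */
        }
        if (k != i) {                    /* interchange a[i] and a[k] */
            int t = a[i];
            a[i] = a[k];
            a[k] = t;
        }
    }
}
```

Calling selection_sort on the array 7 2 8 5 4 with n = 5 leaves it as 2 4 5 7 8.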
Sorting
Card players all know how to sort …
• First card is already sorted
• With all the rest,
  • Scan back from the end until you find the first card larger than the new one,
  • Move all the lower ones up one slot
  • Insert it
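One step of insertion sort
Before: 3 4 7 12 14 14 20 21 33 38 | 10 55 9 23 28 16   (left of the bar is sorted; 10 is next to be inserted)
temp = 10
3 4 7 are less than 10 and stay in place; 12 14 14 20 21 33 38 each move one slot to the right
After:  3 4 7 10 12 14 14 20 21 33 38 | 55 9 23 28 16   (left of the bar is sorted)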
Algorithm: INSERTIONSORT
Input: An array A[1..n] of n elements.
Output: A[1..n] sorted in nondecreasing order.
1. for i ← 2 to n
2.     x ← A[i]
3.     j ← i - 1
4.     while (j > 0) and (A[j] > x)
5.         A[j + 1] ← A[j]
6.         j ← j - 1
7.     end while
8.     A[j + 1] ← x
9. end for
Example sort : 34 8 64 51 32 21
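As a rough C sketch of INSERTIONSORT, assuming int elements and a 0-based array (so the loop starts at index 1 and the guard becomes j >= 0), with insertion_sort as an illustrative name:

```c
/* Insertion sort, a 0-based translation of INSERTIONSORT. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int x = a[i];                 /* element to insert into a[0..i-1] */
        int j = i - 1;
        while (j >= 0 && a[j] > x) {  /* shift larger elements one slot right */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = x;                 /* drop x into the gap */
    }
}
```

Applied to the example array 34 8 64 51 32 21, the result is 8 21 32 34 51 64.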
Analysis of insertion sort
• We run once through the outer loop, inserting each of n elements; this is a factor of n
• On average, there are n/2 elements already sorted
  • The inner loop looks at (and moves) half of these
  • This gives a second factor of n/4
• Hence, the time required for an insertion sort of an array of n elements is proportional to n^2/4
• Discarding constants, we find that insertion sort is O(n^2)
Summary
• Bubble sort, selection sort, and insertion sort are all O(n^2)
• As we will see later, we can do much better than this with somewhat more complicated sorting algorithms
• Within O(n^2),
  • Bubble sort is very slow, and should probably never be used for anything
  • Selection sort is intermediate in speed
  • Insertion sort is usually the fastest of the three; in fact, for small arrays (say, 10 or 15 elements), insertion sort is faster than more complicated sorting algorithms
• Selection sort and insertion sort are “good enough” for small arrays
Radix Sort
Consider the following 9 numbers:
493 812 715 710 195 437 582 340 385
We should start sorting by comparing and ordering the one's digits:
Digit  Sublist
0      710 340
1
2      812 582
3      493
4
5      715 195 385
6
7      437
8
9
Notice that the numbers were added onto the list in the order that they were found, which is why the numbers appear to be unsorted in each of the sublists above. Now, we gather the sublists (in order from the 0 sublist to the 9 sublist) into the main list again:
710 340 812 582 493 715 195 385 437
Now, the sublists are created again, this time based on the ten's digit:
Digit  Sublist
0
1      710 812 715
2
3      437
4      340
5
6
7
8      582 385
9      493 195
Now the sublists are gathered in order from 0 to 9:
710 812 715 437 340 582 385 493 195
Finally, the sublists are created according to the hundred's digit:
Digit  Sublist
0
1      195
2
3      340 385
4      437 493
5      582
6
7      710 715
8      812
9
At last, the list is gathered up again:
195 340 385 437 493 582 710 715 812
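The three passes above can be sketched in C. This is only one possible implementation, and it swaps in a different bookkeeping technique: instead of building ten explicit sublists, it counts how many numbers fall under each digit and uses prefix sums to place them, which yields the same gathered order. It assumes non-negative ints and base 10; the name radix_sort and the return convention (0 on success, -1 if the scratch buffer cannot be allocated) are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Least-significant-digit radix sort for non-negative ints, base 10. */
int radix_sort(int a[], int n) {
    if (n <= 1) return 0;
    int *out = malloc((size_t)n * sizeof *out);   /* gathered list for one pass */
    if (out == NULL) return -1;

    int max = a[0];
    for (int i = 1; i < n; i++) {
        if (a[i] > max) max = a[i];
    }

    /* One pass per digit: ones, tens, hundreds, ... */
    for (long long exp = 1; max / exp > 0; exp *= 10) {
        int count[10] = {0};
        for (int i = 0; i < n; i++) {
            count[(a[i] / exp) % 10]++;           /* size of each digit's sublist */
        }
        for (int d = 1; d < 10; d++) {
            count[d] += count[d - 1];             /* count[d] = one past end of sublist d */
        }
        /* Walk backwards so numbers with equal digits keep their order. */
        for (int i = n - 1; i >= 0; i--) {
            int d = (int)((a[i] / exp) % 10);
            out[--count[d]] = a[i];
        }
        memcpy(a, out, (size_t)n * sizeof *a);
    }
    free(out);
    return 0;
}
```

Keeping equal-digit numbers in their existing order on every pass is what lets the later passes preserve the work of the earlier ones.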
Disadvantages
Still, there are some tradeoffs for Radix Sort that can make it less preferable than other sorts.
The speed of Radix Sort largely depends on the inner basic operations, and if the operations are not efficient enough, Radix Sort can be slower than some other algorithms such as Quick Sort and Merge Sort. These operations include the insert and delete functions of the sublists and the process of isolating the digit you want.
In the example above, the numbers were all of equal length, but many times, this is not the case. If the numbers are not of the same length, then a test is needed to check for additional digits that need sorting. This can be one of the slowest parts of Radix Sort, and it is one of the hardest to make efficient.
Radix Sort can also take up more space than other sorting algorithms, since in addition to the array that will be sorted, you need to have a sublist for each of the possible digits or letters. If you are sorting pure English words, you will need at least 26 different sublists, and if you are sorting alphanumeric words or sentences, you will probably need more than 40 sublists in all!
Since Radix Sort depends on the digits or letters, Radix Sort is also much less flexible than other sorts. For every different type of data, Radix Sort needs to be rewritten, and if the sorting order changes, the sort needs to be rewritten again. In short, Radix Sort takes more time to write, and it is very difficult to write a general purpose Radix Sort that can handle all kinds of data.
Merge Sort
7 2 9 4 → 2 4 7 9
7 2 → 2 7    9 4 → 4 9
7 → 7    2 → 2    9 → 9    4 → 4
Merge Sort
Merge sort is based on the divide-and-conquer paradigm. It consists of three steps:
• Divide: partition input sequence S into two sequences S1 and S2 of about n/2 elements each
• Recur: recursively sort S1 and S2
• Conquer: merge S1 and S2 into a unique sorted sequence
Algorithm mergeSort(S, C)
  Input sequence S, comparator C
  Output sequence S sorted according to C
  if S.size() > 1 {
    (S1, S2) := partition(S, S.size()/2)
    S1 := mergeSort(S1, C)
    S2 := mergeSort(S2, C)
    S := merge(S1, S2)
  }
  return(S)
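A minimal C sketch of the three steps follows. It departs from the pseudocode in ways that should be read as assumptions: the sequence is an int array, the comparator C is replaced by the built-in < on ints, and S1 and S2 are index ranges of one array merged through a scratch buffer instead of separate sequences. All function names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Conquer: merge sorted ranges a[lo..mid) and a[mid..hi) via tmp. */
static void merge(int a[], int tmp[], int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        /* take from the right only when strictly smaller, so ties keep order */
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof *a);
}

/* Divide and recur on the half-open range a[lo..hi). */
static void merge_sort_range(int a[], int tmp[], int lo, int hi) {
    if (hi - lo <= 1) return;            /* size 0 or 1: already sorted */
    int mid = lo + (hi - lo) / 2;        /* S1 = a[lo..mid), S2 = a[mid..hi) */
    merge_sort_range(a, tmp, lo, mid);
    merge_sort_range(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort(int a[], int n) {
    if (n <= 1) return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;
    merge_sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

Allocating the scratch buffer once, instead of inside every merge, keeps the extra space at n elements in total.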
Divide-and-Conquer
Merge Sort Execution Tree (recursive calls)
• An execution of merge-sort is depicted by a binary tree
  • each node represents a recursive call of merge-sort and stores
    • unsorted sequence before the execution and its partition
    • sorted sequence at the end of the execution
  • the root is the initial call
  • the leaves are calls on subsequences of size 0 or 1
7 2 9 4 → 2 4 7 9
7 2 → 2 7    9 4 → 4 9
7 → 7    2 → 2    9 → 9    4 → 4
Execution Example
The example sorts 7 2 9 4 3 8 6 1. Each node is written as "unsorted sequence → sorted sequence". The slides step through the tree in this order:
1. Partition
2. Recursive call, partition
3. Recursive call, partition
4. Recursive call, base case
5. Recursive call, base case
6. Merge
7. Recursive call, …, base case, merge
8. Merge
9. Recursive call, …, merge, merge
10. Merge
Completed tree:
7 2 9 4 3 8 6 1 → 1 2 3 4 6 7 8 9
7 2 9 4 → 2 4 7 9    3 8 6 1 → 1 3 6 8
7 2 → 2 7    9 4 → 4 9    3 8 → 3 8    6 1 → 1 6
7 → 7    2 → 2    9 → 9    4 → 4    3 → 3    8 → 8    6 → 6    1 → 1
Another Analysis of Merge-Sort
• The height h of the merge-sort tree is O(log n)
  • at each recursive call we divide the sequence in half
• The work done at each level is O(n)
  • At level i, we partition and merge 2^i sequences of size n/2^i
• Thus, the total running time of merge-sort is O(n log n)
depth    #seqs            size              Cost for level
0        1                n                 n
1        2                n/2               n
…        …                …                 …
i        2^i              n/2^i             n
…        …                …                 …
log n    2^(log n) = n    n/2^(log n) = 1   n
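Summing the "cost for level" column makes the bound explicit. The derivation below is a sketch that assumes n is a power of two and that each level costs exactly n units, as in the table.

```latex
T(n) \;=\; \sum_{i=0}^{\log_2 n} 2^i \cdot \frac{n}{2^i}
     \;=\; \sum_{i=0}^{\log_2 n} n
     \;=\; n\,(\log_2 n + 1)
     \;=\; O(n \log n)
```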
Summary of Sorting Algorithms (so far)
Algorithm        Time                      Notes
Selection Sort   O(n^2)                    Slow, in-place. For small data sets
Insertion Sort   O(n^2) WC, AC; O(n) BC    Slow, in-place. For small data sets
Heap Sort        O(n log n)                Fast, in-place. For large data sets
Merge Sort       O(n log n)                Fast, sequential data access. For huge data sets
(WC = worst case, AC = average case, BC = best case)
Quick-Sort
7 4 9 6 2 → 2 4 6 7 9
4 2 → 2 4    7 9 → 7 9
2 → 2    9 → 9
Quick-sort is a randomized sorting algorithm based on the divide-and-conquer paradigm:
• Divide: pick a random element x (called pivot) and partition S into
  • L: elements less than x
  • E: elements equal to x
  • G: elements greater than x
• Recur: sort L and G
• Conquer: join L, E and G
Analysis of Quick Sort using Recurrence Relations
• Assumption: random pivot expected to give equal sized sublists
• The running time of Quick Sort can be expressed as:
T(n) = 2T(n/2) + P(n)
• T(n) - time to run quicksort() on an input of size n
• P(n) - time to run partition() on input of size n
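To see where this recurrence leads, it can be unrolled. The sketch below adds two assumptions that the slide does not state: partition() is linear, P(n) = cn for some constant c, and n is a power of two.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 2^k\,T(n/2^k) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
```

This is the balanced case only. If the pivot always lands at one end, the recurrence becomes T(n) = T(n-1) + cn, which is O(n^2).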
Algorithm QuickSort(S, l, r)
  Input sequence S, ranks l and r
  Output sequence S with the elements of rank between l and r rearranged in increasing order
  if l ≥ r
    return
  i ← a random integer between l and r
  x ← S.elemAtRank(i)
  (h, k) ← Partition(x)
  QuickSort(S, l, h − 1)
  QuickSort(S, k + 1, r)
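One way to realise this pseudocode in C is sketched below, under several assumptions: S is an int array, ranks are 0-based indices, the pivot is drawn with rand() (whose modulo use is slightly biased and is only meant for illustration), and Partition is done in place with a three-way scan so that the L, E and G groups end up as adjacent ranges. The names quick_sort3 and swap_int are hypothetical.

```c
#include <stdlib.h>

static void swap_int(int *x, int *y) {
    int t = *x;
    *x = *y;
    *y = t;
}

/* Sort a[l..r] (inclusive) using a random pivot and a three-way partition. */
void quick_sort3(int a[], int l, int r) {
    if (l >= r) return;
    int x = a[l + rand() % (r - l + 1)];   /* random pivot value */

    /* Invariant: a[l..lt-1] < x, a[lt..i-1] == x, a[gt+1..r] > x */
    int lt = l, i = l, gt = r;
    while (i <= gt) {
        if (a[i] < x) {
            swap_int(&a[lt], &a[i]);       /* grow L */
            lt++;
            i++;
        } else if (a[i] > x) {
            swap_int(&a[i], &a[gt]);       /* grow G; a[i] is still unexamined */
            gt--;
        } else {
            i++;                           /* grow E */
        }
    }
    /* Here h = lt and k = gt in the pseudocode's notation. */
    quick_sort3(a, l, lt - 1);             /* sort L */
    quick_sort3(a, gt + 1, r);             /* sort G */
}
```

Because elements equal to the pivot are left out of both recursive calls, arrays with many duplicate keys do not degrade the recursion.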
Quicksort
• Efficient sorting algorithm
  • Discovered by C.A.R. Hoare
• Example of Divide and Conquer algorithm
• Two phases
  • Partition phase: divides the work into half
  • Sort phase: conquers the halves!
Quicksort
• Partition
  • Choose a pivot
  • Find the position for the pivot so that
    • all elements to the left are less
    • all elements to the right are greater
  [ < pivot | pivot | > pivot ]
Quicksort
• Conquer
  • Apply the same algorithm to each half
  [ < pivot | pivot | > pivot ]
  [ < p’ | p’ | > p’ | pivot | < p” | p” | > p” ]
Quicksort
Implementation
void quicksort( int *a, int low, int high ) {
    int pivot;
    /* Termination condition! */
    if ( high > low ) {
        pivot = partition( a, low, high );   /* Divide */
        quicksort( a, low, pivot-1 );        /* Conquer */
        quicksort( a, pivot+1, high );       /* Conquer */
    }
}
Quicksort - Partition
/* SWAP as defined for the bubble sort code */
#define SWAP(a,b) { int t; t=a; a=b; b=t; }
int partition( int *a, int low, int high ) {
    int left, right;
    int pivot_item;
    pivot_item = a[low];
    left = low;
    right = high;
    while ( left < right ) {
        /* Move left while item <= pivot, without running past high */
        while( left < high && a[left] <= pivot_item ) left++;
        /* Move right while item > pivot */
        while( a[right] > pivot_item ) right--;
        if ( left < right ) SWAP(a[left],a[right]);
    }
    /* right is final position for the pivot */
    a[low] = a[right];
    a[right] = pivot_item;
    return right;
}
This example uses ints to keep things simple!
Walkthrough on 23 12 15 38 42 18 36 29 27 (low is the leftmost index, high the rightmost):
1. Any item will do as the pivot, choose the leftmost one! pivot: 23
2. Set left and right markers: left = low, right = high
3. Move the markers until they cross over
4. Move the left pointer while it points to items <= pivot; move right similarly, while it points to items > pivot. Here left stops at 38 and right stops at 18.
5. Swap the two items on the wrong side of the pivot:
   23 12 15 18 42 38 36 29 27
6. Move the markers again: left stops at 42 and right stops at 18. left and right have swapped over, so stop
7. Finally, swap the pivot and right:
   18 12 15 23 42 38 36 29 27
8. Return the position of the pivot (right)
Quicksort - Conquer
18 12 15 23 42 38 36 29 27
pivot: 23
• Recursively sort left half (18 12 15)
• Recursively sort right half (42 38 36 29 27)