Sorting

TABLE OF CONTENTS

INTRODUCTION

CLASSIFICATION OF SORTING ALGORITHMS

BUBBLE SORT

INSERTION SORT

SHELL SORT

MERGE SORT

HEAP SORT

QUICK SORT

BUCKET SORT

RADIX SORT


INTRODUCTION

A sorting algorithm is an algorithm that puts the elements of a list into a specific order. Sorting is a fundamental building block of many programs: it helps optimize the performance of a program and can greatly simplify its code. This has made sorting algorithms an area of lasting interest for computer scientists.

The list is an abstract data type that implements an ordered collection of values, where a value may occur more than once. The order in which the list is sorted can be numerical or lexicographical.

This report aims to give an overview of the various sorting algorithms in use. It gives a detailed (but necessarily limited) explanation of the working and uses of some of the most widely used sorting algorithms. It also presents the criteria by which sorting algorithms are classified, among them complexity, stability, and the general method followed.

For every sorting algorithm, we have provided a code fragment for the reader's convenience. The report is meant to be used as study material, not just reading material.


CLASSIFICATION OF SORTING ALGORITHMS

Sorting algorithms are classified based on various factors. These factors also influence the efficiency of the sorting algorithm. They are often classified by:

Computational Complexity – Algorithms can be compared by their worst, average, and best behaviour in terms of Big O notation. For a sorting algorithm on a list of size n, good behaviour is O(n log n) and bad behaviour is O(n²). Ideal behaviour for a sorting algorithm is O(n).

Memory Usage – There are in-place algorithms and out-of-place algorithms. In-place algorithms require just O(1) or O(log n) extra memory beyond the items being sorted, so they do not need auxiliary locations where data is temporarily stored, as out-of-place sorting algorithms do.

Recursion – Some algorithms are recursive, some are non-recursive, and others may be implemented either way (e.g., merge sort).

Stability – Stable sorting algorithms maintain the relative order of records with equal keys (i.e., values). As an example, suppose the following set of pairs is to be sorted by their first component:

( 2 , 6 ) ( 2 , 1 ) ( 3 , 5 ) ( 4 , 3 ) ( 7 , 2 )

They can be sorted in two ways:

( 2 , 6 ) ( 2 , 1 ) ( 3 , 5 ) ( 4 , 3 ) ( 7 , 2 ) – order of equal keys not changed
( 2 , 1 ) ( 2 , 6 ) ( 3 , 5 ) ( 4 , 3 ) ( 7 , 2 ) – order of equal keys changed

The algorithm that does not change the relative order is the stable one. An unstable algorithm can always be specially implemented to be stable by keeping the original position of the data as the tie breaker for equal values, though this requires additional computational cost and memory.
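To make this concrete, here is a small Java illustration (the snippet is ours, not part of the original report). Java's Arrays.sort on an array of objects is a stable sort, so sorting the pairs above by their first component keeps ( 2 , 6 ) ahead of ( 2 , 1 ):

import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    public static void main(String[] args) {
        // The pairs from the example above, to be sorted by first component.
        int[][] pairs = { {2, 6}, {2, 1}, {3, 5}, {4, 3}, {7, 2} };

        // Arrays.sort on objects is stable, so records with equal keys
        // keep their original relative order: (2,6) stays before (2,1).
        Arrays.sort(pairs, Comparator.comparingInt((int[] p) -> p[0]));

        System.out.println(Arrays.deepToString(pairs));
        // prints [[2, 6], [2, 1], [3, 5], [4, 3], [7, 2]]
    }
}

Sorting on the pair (first component, original index) instead would make any unstable sort produce the same stable result, at the cost of the extra key.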

Comparison – Algorithms can be classified by whether or not they are comparison sorts. A comparison sort examines the data only by comparing two elements with a comparison operator. Most of the widely used sorts are comparison sorts.

General Method – According to the method followed, a sorting algorithm is classified as an insertion, exchange, selection, or merging sort, among others. For example, bubble sort is an exchange sort and heap sort is a selection sort.

All these factors will be considered in the detailed study of the widely used and popular sorting algorithms that follows.


BUBBLE SORT

Bubble sort is a simple and straightforward method of sorting data that is widely used in computer science education. It works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The smaller elements bubble their way to the top, hence the name bubble sort. Bubble sort is a comparison sort.

Bubble sort has a worst-case and average complexity of O(n²), where n is the number of items being sorted: sorting 100 elements takes on the order of 10,000 comparisons. Many other sorting algorithms perform substantially better, with worst or average cases of O(n log n). Even insertion sort, which also has a worst case of O(n²), performs better than bubble sort. The use of bubble sort is therefore not practical when n is large. The performance of bubble sort also depends on the positions of the elements. Large elements at the beginning of the list do not pose a problem, as they are quickly swapped towards the end. Small elements towards the end, however, move to the beginning extremely slowly. Cocktail sort is a variant of bubble sort that addresses this problem, but it still retains the O(n²) worst-case complexity.

Let us take the array of numbers "5 1 4 2 8" and sort it from lowest to highest using the bubble sort algorithm. In each step, the algorithm compares one pair of adjacent elements.

First Pass:

( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), the algorithm compares the first two elements and swaps them since 5 > 1
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), since these elements are already in order (8 > 5), the algorithm does not swap them

Second Pass:

( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

Now the array is already sorted, but our algorithm does not know whether it is finished. The algorithm needs one whole pass without any swap to know the array is sorted.

Third Pass:

( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

Finally, the array is sorted, and the algorithm can terminate.

The performance of bubble sort can be improved marginally. When the first pass is over, the greatest element has moved to the last position, i.e. position n-1 in the array, and need not be compared in further passes. Each pass can therefore be one step shorter than the previous pass. This roughly halves the number of comparisons, although the complexity still remains O(n²).

Due to its simplicity and straightforwardness, bubble sort is often used to introduce the concept of an algorithm to introductory computer science students. The Jargon File, which famously calls bogo-sort ‘the archetypical perversely awful algorithm’, also calls bubble sort ‘the generic bad algorithm’. D. Knuth, in his popular book ‘The Art of Computer Programming’, concludes that bubble sort seems to have nothing to recommend it except a catchy name. Researchers like Owen Astrachan have shown experimentally that insertion sort performs better even on random lists, and have gone as far as to recommend that bubble sort no longer be taught.
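The following Java fragment is a minimal sketch of bubble sort in the style of the other fragments in this report (the method name bubbleSort is ours). It includes both the early-exit check and the shrinking pass length described above:

void bubbleSort(int[] arr) {
    int n = arr.length;
    boolean swapped = true;
    while (swapped) {
        swapped = false;
        for (int i = 1; i < n; i++) {
            if (arr[i - 1] > arr[i]) {   // adjacent pair out of order
                int tmp = arr[i - 1];    // swap them
                arr[i - 1] = arr[i];
                arr[i] = tmp;
                swapped = true;
            }
        }
        n--;  // the largest remaining element is now in its final place,
              // so the next pass can be one step shorter
    }         // a full pass without swaps means the array is sorted
}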


INSERTION SORT

Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly sorted lists, and is often used as a part of more sophisticated algorithms. It is a comparison sort in which the sorted list is built one entry at a time. It works by taking elements from the list one by one and inserting them at their correct position in a new sorted list. The new list and the remaining elements can share the array's space, but insertion is expensive, requiring all following elements to be shifted over by one.

Insertion sort has an average and worst-case complexity of O(n²). The best case is an already sorted list, for which the running time is linear, i.e. O(n); the worst case is a list in reverse order. Since the running time is quadratic even in the average case, insertion sort is not considered suitable for large lists. However, it is one of the fastest algorithms when the list contains fewer than about 10 elements. Shell sort, a variant of insertion sort, is more efficient for larger lists; the next chapter gives more detail on it. Insertion sort is stable and in-place, requiring only a constant amount of additional memory space, and it can sort a list as it receives it. Compared to advanced algorithms like quick sort, heap sort, or merge sort, insertion sort is much less efficient on large lists.

The following example shows the working of insertion sort. Let us again consider the array of numbers "5 1 4 2 8". In each step, the element being inserted is compared against elements of the sorted prefix.

First Pass:

( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), the algorithm moves down the list searching for the position of the element 1 (which belongs before 5) and brings it to the front by swaps.

Second Pass:

( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), here 4 < 5, therefore 4 is moved towards its position
( 1 4 5 2 8 ) → ( 1 4 5 2 8 ), here 1 < 4, hence no swapping is done


Third Pass:

( 1 4 5 2 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), here the list happens to be sorted, but the algorithm keeps comparing until it finds an element smaller than the one being inserted
( 1 2 4 5 8 ) → ( 1 2 4 5 8 ), here 1 < 2, so the pass ends

Fourth Pass:

( 1 2 4 5 8 ) → ( 1 2 4 5 8 ), 8 is already in its position

Instead of swapping, direct shifting can be done by using binary search to find where the element is to be inserted; see the sketch after the code fragment below. Binary search pays off only because the number of comparisons shrinks; the number of shifts stays the same. Since insertion into an array is tedious, one can use a linked list for the sort, but then binary search cannot be used, as linked lists do not allow random access. In 2004 Bender, Farach-Colton, and Mosteiro produced a new variant of insertion sort, called library sort, that leaves a small number of unused gaps spread throughout the array. The benefit is that elements need to be shifted only until a gap is reached.


// Standard insertion sort: grows a sorted prefix arr[0..i), shifting
// larger elements right until the correct slot for arr[i] is found.
void insertionSort(int[] arr) {
    int i, j, newValue;
    for (i = 1; i < arr.length; i++) {
        newValue = arr[i];       // element to insert into the sorted prefix
        j = i;
        while (j > 0 && arr[j - 1] > newValue) {
            arr[j] = arr[j - 1]; // shift the larger element one step right
            j--;
        }
        arr[j] = newValue;       // drop the element into its slot
    }
}
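The binary-search variant mentioned above can be sketched as follows (binaryInsertionSort is our illustrative name, not code from the report). The insertion point is found with binary search, and the sorted prefix is shifted in one block; comparisons drop to O(n log n), but the shifts keep the overall worst case at O(n²):

void binaryInsertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int newValue = arr[i];
        // Find the first position in arr[0..i) holding an element
        // greater than newValue; using <= keeps the sort stable.
        int lo = 0, hi = i;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (arr[mid] <= newValue) lo = mid + 1;
            else hi = mid;
        }
        // Shift arr[lo..i) one step right and insert.
        System.arraycopy(arr, lo, arr, lo + 1, i - lo);
        arr[lo] = newValue;
    }
}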


SHELL SORT

Shell sort was invented by Donald Shell in 1959 and is named after its inventor. It is an improved version of insertion sort, combining ideas from insertion sort and bubble sort to achieve much better efficiency than either of these traditional algorithms. Shell sort improves insertion sort by comparing elements separated by a gap of several positions. This lets an element take "bigger steps" toward its expected position. Multiple passes over the data are made with smaller and smaller gap sizes. The last step of Shell sort is a plain insertion sort, but by then the array is guaranteed to be almost sorted. An implementation of shell sort can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort. The effect is that the data sequence is partially sorted. The process is repeated, each time with a smaller number of columns, until in the last step the array consists of only one column. In practice the data sequence is not actually held in a two-dimensional array, but in a one-dimensional array that is indexed appropriately.

Though shell sort is a simple algorithm, finding its complexity is a laborious task. The original shell sort algorithm has O(n²) complexity in comparisons and exchanges. The gap sequence is the major factor that improves or deteriorates the performance of the algorithm. The original gap sequence suggested by Donald Shell was to begin with N/2 and halve the gap until it reaches 1; with this sequence the worst-case running time is O(n²). Other popular gap sequences and their worst-case running times are as follows: O(n^(3/2)) for Hibbard's increments of 2^k − 1, O(n^(4/3)) for Sedgewick's increments of 9·4^k − 9·2^k + 1 or 4^k − 3·2^k + 1, and O(n log² n) for Pratt's increments of the form 2^i·3^j, with possibly better but unproven running times for other sequences. The existence of an O(n log n) worst-case implementation of Shell sort (which would be optimal for comparison sort algorithms) was precluded by Poonen, Plaxton, and Suel.


Let 3 7 9 0 5 1 6 8 4 2 0 6 1 5 7 3 4 9 8 2 be the data sequence to be sorted. First, it is arranged in an array with 7 columns (left), then the columns are sorted (right):

3 7 9 0 5 1 6        3 3 2 0 5 1 5
8 4 2 0 6 1 5        7 4 4 0 6 1 6
7 3 4 9 8 2          8 7 9 9 8 2

Data elements 8 and 9 have now already come to the end of the sequence, but a

small element (2) is also still there. In the next step, the sequence is arranged in 3

columns, which are again sorted:

3 3 2        0 0 1
0 5 1        1 2 2
5 7 4        3 3 4
4 0 6        4 5 6
1 6 8        5 6 8
7 9 9        7 7 9
8 2          8 9

Now the sequence is almost completely sorted. When it is arranged in one column in the last step, only a 6, an 8 and a 9 have to move a little to their correct positions.

The best known sequence, according to research by Marcin Ciura, is 1, 4, 10, 23, 57, 132, 301, 701, 1750. This study also concluded that "comparisons rather than moves should be considered the dominant operation in Shellsort." Another sequence that performs very well on large arrays is obtained by raising the Fibonacci numbers (leaving out one of the starting 1's) to the power of twice the golden ratio, which gives the following sequence: 1, 9, 34, 182, 836, 4025, 19001, 90358, 428481, 2034035, 9651787, 45806244, 217378076, 1031612713, ….

Algorithm Shellsort

void shellsort(int[] a, int n) {
    int i, j, k, h, v;
    int[] cols = {1391376, 463792, 198768, 86961, 33936, 13776, 4592,
                  1968, 861, 336, 112, 48, 21, 7, 3, 1};
    for (k = 0; k < 16; k++) {
        h = cols[k];                 // current gap
        // Gapped insertion sort: sort the h interleaved subsequences.
        for (i = h; i < n; i++) {
            v = a[i];
            j = i;
            while (j >= h && a[j - h] > v) {
                a[j] = a[j - h];     // shift within the subsequence
                j = j - h;
            }
            a[j] = v;
        }
    }
}


MERGE SORT

Merge sort is a comparison sort which is very effective on large lists, with a worst-case complexity of O(n log n). It was invented by John von Neumann in 1945. Merge sort is an example of a divide and conquer algorithm. The algorithm is as follows:

1. If the list is of length 0 or 1, it is already sorted. Otherwise:
2. Divide the unsorted list into two sub-lists of about half the size.
3. Sort each sub-list recursively by re-applying merge sort.
4. Merge the two sub-lists back into one sorted list.

The two main ideas behind the algorithm are that sorting small lists takes less time and fewer steps than sorting long lists, and that creating a sorted list from two sorted lists is easier than from two unsorted lists. Merge sort is a stable sort, i.e. the order of equal inputs is preserved in the sorted list.

As mentioned above, merge sort has an average and worst-case performance of O(n log n) when sorting n objects, so its worst case matches quick sort's best case; in the worst case, merge sort does about 39% fewer comparisons than quick sort does in the average case. The main disadvantage of merge sort is that recursive implementations create method-call overhead, costing time and memory, but it is not difficult to code an iterative, non-recursive merge sort that avoids this overhead. Also, merge sort does not sort in place, so it requires extra memory to be allocated for the sorted output. One advantage of natural variants of merge sort is O(n) complexity on an already sorted input, which amounts to running through the list and checking that it is presorted. Sorting in place is possible using linked lists but is quite complicated; in such cases, heap sort is preferable. Merge sort is a stable sort as long as the merge operation is implemented properly.


Consider the list "3, 5, 4, 9, 2" to be sorted using merge sort. First the list is repeatedly divided into smaller lists: ( 3 5 4 9 2 ) splits into ( 3 5 ) and ( 4 9 2 ); ( 3 5 ) splits into ( 3 ) and ( 5 ); ( 4 9 2 ) splits into ( 4 ) and ( 9 2 ), which splits into ( 9 ) and ( 2 ). Now let us consider the comparisons that take place. According to the algorithm, a list containing 0 or 1 elements is already sorted and is merged to form a larger sorted list. Accordingly, 3 and 5 will be merged as ( 3 5 ), and 9 and 2 will be merged as ( 2 9 ).

Now 4 and ( 2 9 ) are two sorted lists to be merged; after comparisons the new sorted list is ( 2 4 9 ). The two sorted lists to be merged next are ( 3 5 ) and ( 2 4 9 ). The comparisons are shown below: at each step the heads of the two lists are compared and the smaller is appended to the output.

( 3 5 ) ( 2 4 9 ) → ( 2 )
( 3 5 ) ( 4 9 ) → ( 2 3 )
( 5 ) ( 4 9 ) → ( 2 3 4 )
( 5 ) ( 9 ) → ( 2 3 4 5 9 )

Thus the merged list is ( 2 3 4 5 9 ), which is the required sorted output.

Various programming languages use either merge sort or a variant of the algorithm as their built-in sorting method.



// Recursive top-down merge sort: split the array in half, sort each
// half, then merge the two sorted halves back into the original array.
public int[] mergeSort(int array[]) {
    if (array.length > 1) {
        int elementsInA1 = array.length / 2;
        int elementsInA2 = elementsInA1;
        if ((array.length % 2) == 1)
            elementsInA2 += 1;       // odd length: second half gets the extra element
        int arr1[] = new int[elementsInA1];
        int arr2[] = new int[elementsInA2];
        for (int i = 0; i < elementsInA1; i++)
            arr1[i] = array[i];
        for (int i = elementsInA1; i < elementsInA1 + elementsInA2; i++)
            arr2[i - elementsInA1] = array[i];
        arr1 = mergeSort(arr1);
        arr2 = mergeSort(arr2);
        // Merge: repeatedly copy the smaller head element back into array.
        int i = 0, j = 0, k = 0;
        while (arr1.length != j && arr2.length != k) {
            if (arr1[j] < arr2[k]) {
                array[i] = arr1[j];
                i++;
                j++;
            } else {
                array[i] = arr2[k];
                i++;
                k++;
            }
        }
        // Drain whichever half still has elements left.
        while (arr1.length != j) {
            array[i] = arr1[j];
            i++;
            j++;
        }
        while (arr2.length != k) {
            array[i] = arr2[k];
            i++;
            k++;
        }
    }
    return array;
}


HEAPSORT

Heapsort is a much more efficient version of selection sort. Like selection sort, it works by determining the largest (or smallest) element of the list, placing it at the end (or beginning) of the list, then continuing with the rest of the list, but it accomplishes this task more efficiently by using a data structure called a heap, a special type of binary tree. It is guaranteed that the root element of the heap is the largest element in a max-heap (or the smallest in a min-heap). When the largest element is removed from the heap, there is no need to search for the next largest element: the heap rearranges itself so that the next largest element becomes the root. Finding the next largest element and moving it to the top takes only O(log n) time, so the whole Heapsort algorithm takes just O(n log n) time.

A heap is a specialized tree-based data structure in which, if B is a child node of A, then key(A) ≥ key(B); therefore the root element of such a heap is always its largest element. The operations that can be performed on a heap include inserting a new element, deleting the root element, and so on. For elementary Heapsort algorithms, the binary heap data structure is widely used. The heap operations and their algorithms are given in Appendix B.

As Heapsort has O(n log n) complexity, it is often compared with quick sort and merge sort. Quick sort has an O(n²) worst case, which is risky and inefficient for large lists, although quick sort usually works better on smaller lists because of caching and other factors. Since Heapsort's O(n log n) bound is guaranteed, it is considered more secure and is used in embedded systems where security and predictability are a great concern. Compared with merge sort, the main advantage of Heapsort is that it requires only a constant amount of auxiliary storage space, whereas merge sort requires O(n) auxiliary space. Merge sort in turn has several advantages over Heapsort, among them that merge sort is stable and adapts easily to linked lists and to lists on slow media.


Let us study an example which demonstrates the working of Heapsort. For the list ( 11 9 34 25 17 109 53 ), the corresponding heap data structure is shown below.
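Built bottom-up with the standard sift-down procedure (one common construction; building the heap by repeated insertion can give a different but equally valid heap), the max-heap is:

        109
       /    \
     25      53
    /  \    /  \
   9   17  34  11

Repeatedly swapping the root with the last element of the heap and sifting the new root down then emits 109, 53, 34, 25, 17, 11, 9 into the back of the array, yielding the sorted list. A compact Java sketch in the style of the other fragments (the helper name siftDown is ours; the report defers the heap operations themselves to Appendix B):

void heapSort(int[] a) {
    int n = a.length;
    // Build a max-heap bottom-up: sift down every non-leaf node.
    for (int i = n / 2 - 1; i >= 0; i--)
        siftDown(a, i, n);
    // Move the root (maximum) behind the heap and re-heapify.
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
        siftDown(a, 0, end);
    }
}

// Restore the max-heap property for the subtree rooted at i,
// looking only at the first 'size' elements of the array.
void siftDown(int[] a, int i, int size) {
    while (2 * i + 1 < size) {
        int child = 2 * i + 1;                    // left child
        if (child + 1 < size && a[child + 1] > a[child])
            child++;                              // right child is larger
        if (a[i] >= a[child]) break;              // heap property holds
        int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
        i = child;
    }
}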


An interesting alternative to Heapsort is introsort, which combines quick sort and Heapsort, keeping the worst-case guarantee of Heapsort and the average-case performance of quick sort.


QUICK SORT


Quicksort is a divide and conquer algorithm which relies on a partition operation: to partition an array, we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in-place. We then recursively sort the lesser and greater sublists. Efficient implementations of quicksort (with in-place partitioning) are typically unstable sorts and somewhat complex, but are among the fastest sorting algorithms in practice. Together with its modest O(log n) space usage, this makes quicksort one of the most popular sorting algorithms, available in many standard libraries. The most complex issue in quicksort is choosing a good pivot element; consistently poor choices of pivots can result in drastically slower O(n²) performance, but if at each step we choose the median as the pivot then it works in O(n log n).
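As a minimal illustration of the partition-and-recurse scheme just described, here is a Java sketch using Lomuto partitioning with the last element as pivot (a deliberately simple pivot choice; as noted above, real implementations choose pivots more carefully):

void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;               // zero or one element: already sorted
    int pivot = a[hi];                  // last element as pivot
    int p = lo;                         // everything before p is < pivot
    for (int i = lo; i < hi; i++) {
        if (a[i] < pivot) {
            int tmp = a[i]; a[i] = a[p]; a[p] = tmp;
            p++;
        }
    }
    int tmp = a[p]; a[p] = a[hi]; a[hi] = tmp;  // pivot into its final place
    quickSort(a, lo, p - 1);            // recursively sort the lesser sublist
    quickSort(a, p + 1, hi);            // recursively sort the greater sublist
}

The initial call is quickSort(a, 0, a.length - 1).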

BUCKET SORT


Bucket sort is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. Thus this is most effective on data whose values are limited (e.g. a sort of a million integers ranging from 1 to 1000). A variation of this method called the single buffered count sort is faster than quicksort and takes about the same time to run on any set of data.
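A rough Java sketch of the idea for the bounded case mentioned above, with values assumed to lie in 1..1000 and ten buckets as an arbitrary illustrative choice; each bucket is sorted with the library sort before concatenation:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Bucket sort for values known to lie in [1, 1000].
void bucketSort(int[] a) {
    final int BUCKETS = 10, RANGE = 1000;
    List<List<Integer>> bucket = new ArrayList<>();
    for (int i = 0; i < BUCKETS; i++) bucket.add(new ArrayList<>());
    // Scatter: each value goes to the bucket covering its sub-range.
    for (int v : a)
        bucket.get((v - 1) * BUCKETS / RANGE).add(v);
    // Sort each bucket individually, then concatenate them in order.
    int k = 0;
    for (List<Integer> b : bucket) {
        Collections.sort(b);            // any sorting algorithm works here
        for (int v : b) a[k++] = v;
    }
}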

RADIX SORT


Radix sort is an algorithm that sorts a list of n fixed-size numbers of length k in O(n · k) time by treating them as bit strings. We first sort the list by the least significant bit while preserving the relative order of elements using a stable sort. Then we sort by the next bit, and so on from right to left, and the list ends up sorted. Most often, the counting sort algorithm is used for each bitwise pass, since the number of values a bit can take is minimal: only '0' or '1'.
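A Java sketch of the bitwise procedure just described: one stable counting pass per bit, from least significant to most significant (assuming non-negative 32-bit integers):

void radixSort(int[] a) {
    int n = a.length;
    int[] out = new int[n];
    for (int bit = 0; bit < 32; bit++) {
        // Stable counting pass on the current bit ('0' before '1').
        int zeros = 0;
        for (int v : a)
            if (((v >> bit) & 1) == 0) zeros++;
        int zi = 0, oi = zeros;         // zeros go first, then ones
        for (int v : a) {
            if (((v >> bit) & 1) == 0) out[zi++] = v;
            else out[oi++] = v;
        }
        System.arraycopy(out, 0, a, 0, n);
    }
}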

Sorting algorithms

Theory: Computational complexity theory | Big O notation | Total order | Lists | Stability | Comparison sort
Exchange sorts: Bubble sort | Cocktail sort | Odd-even sort | Comb sort | Gnome sort | Quicksort
Selection sorts: Selection sort | Heapsort | Smoothsort | Cartesian tree sort | Tournament sort
Insertion sorts: Insertion sort | Shell sort | Tree sort | Library sort | Patience sorting
Merge sorts: Merge sort | Strand sort | Timsort
Non-comparison sorts: Radix sort | Bucket sort | Counting sort | Pigeonhole sort | Burstsort | Bead sort
Others: Topological sorting | Sorting network | Bitonic sorter | Batcher odd-even mergesort | Pancake sorting
Ineffective/humorous sorts: Bogosort | Stooge sort

In mathematics, computer science, and related fields, big O notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. Big O notation allows its users to simplify functions in order to concentrate on their growth rates: different functions with the same growth rate may be represented using the same O notation.
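For example, 3n² + 5n + 4 is O(n²): for all n ≥ 1 we have 3n² + 5n + 4 ≤ 12n², so the lower-order terms and the constant factor are dropped, and any algorithm taking 3n² + 5n + 4 steps is simply called quadratic.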

Notation – Name – Example

O(1) – constant – Determining if a number is even or odd; using a constant-size lookup table or hash table
O(α(n)) – inverse Ackermann – Amortized time per operation using a disjoint set
O(log* n) – iterated logarithmic – The find algorithm of Hopcroft and Ullman on a disjoint set
O(log log n) – log-logarithmic – Amortized time per operation using a bounded priority queue [4]
O(log n) – logarithmic – Finding an item in a sorted array with a binary search
O((log n)^c), c > 1 – polylogarithmic – Deciding if n is prime with the AKS primality test
O(n^c), 0 < c < 1 – fractional power – Searching in a kd-tree
O(n) – linear – Finding an item in an unsorted list; adding two n-digit numbers
O(n log n) – linearithmic, loglinear, or quasilinear – Performing a Fast Fourier transform; heapsort, quicksort (best case), or merge sort
O(n²) – quadratic – Multiplying two n-digit numbers by a simple algorithm; adding two n×n matrices; bubble sort (worst case or naive implementation), shell sort, quicksort (worst case), or insertion sort
O(n³) – cubic – Multiplying two n×n matrices by the simple algorithm; finding the shortest paths on a weighted digraph with the Floyd-Warshall algorithm; inverting a (dense) n×n matrix using LU or Cholesky decomposition
O(n^c), c > 1 – polynomial or algebraic – Tree-adjoining grammar parsing; maximum matching for bipartite graphs (grows faster than cubic if and only if c > 3)
L_n[α, c], 0 < α < 1 – L-notation – Factoring a number using the special or general number field sieve
O(c^n), c > 1 – exponential or geometric – Finding the (exact) solution to the traveling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute force
O(n!) – factorial or combinatorial – Solving the traveling salesman problem via brute-force search; finding the determinant with expansion by minors
O(2^(2^n)) – double exponential – Deciding the truth of a given statement in Presburger arithmetic


REFERENCES

http://www.algolist.net/Algorithms/Sorting/Insertion_sort
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.1393