G64ADS Advanced Data Structures

Post on 22-Jan-2016

1

G64ADS Advanced Data Structures

Sorting

2

Insertion sort

1) Initially p = 1

2) Let the first p elements be sorted

3) Insert the (p+1)th element properly in the list so that now p+1 elements are sorted

4) Increment p and go to step (3)

3

Insertion sort

4

Insertion sort

o Consists of N - 1 passes
o Pass p, for p = 1 through N - 1, ensures that the elements in positions 0 through p are in sorted order
  o elements in positions 0 through p - 1 are already sorted
  o move the element in position p left until its correct place is found among the first p + 1 elements
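The passes described above can be sketched in Python as follows. This is a minimal illustration, not code from the slides; the function name `insertion_sort` is an assumption, while `p` and `tmp` follow the names used in the worked example.

```python
def insertion_sort(a):
    """Sort list a in place in increasing order and return it."""
    for p in range(1, len(a)):          # passes p = 1 .. N-1
        tmp = a[p]                      # element to insert
        j = p
        # Shift larger elements one place right until tmp's slot is found.
        while j > 0 and a[j - 1] > tmp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp                      # positions 0..p are now sorted
    return a
```

Calling `insertion_sort([34, 8, 64, 51, 32, 21])` walks through the same five passes as the worked example on the next slides.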

5

Insertion sort

To sort the following numbers in increasing order:

34 8 64 51 32 21

p = 1, tmp = 8

34 > tmp, so the second element a[1] is set to 34 (8 34 ...)

We have reached the front of the list, so the 1st position a[0] = tmp = 8

After 1st pass: 8 34 64 51 32 21

(first 2 elements are sorted)

6

p = 2, tmp = 64

34 < 64, so stop at the 3rd position and set the 3rd position = 64

After 2nd pass: 8 34 64 51 32 21

(first 3 elements are sorted)

p = 3, tmp = 51

51 < 64, so we have 8 34 64 64 32 21

34 < 51, so stop at the 2nd position and set the 3rd position = tmp

After 3rd pass: 8 34 51 64 32 21

(first 4 elements are sorted)

p = 4, tmp = 32

32 < 64, so 8 34 51 64 64 21

32 < 51, so 8 34 51 51 64 21

next, 32 < 34, so 8 34 34 51 64 21

next, 32 > 8, so stop at the 1st position and set the 2nd position = 32

After 4th pass: 8 32 34 51 64 21

p = 5, tmp = 21

After 5th pass: 8 21 32 34 51 64

7

Insertion sort worst-case running time

o Inner loop is executed p times for each p = 1, ..., N - 1
  o Overall: 1 + 2 + 3 + ... + (N - 1) = ... = O(N^2)
o Space requirement is O()

8

Heapsort

(1) Build a binary heap of N elements
  o the minimum element is at the top of the heap

(2) Perform N DeleteMin operations
  o the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort - Analysis

(1) Build a binary heap of N elements
  o repeatedly insert N elements: O(N log N) time

(2) Perform N DeleteMin operations
  o each DeleteMin operation takes O(log N), so O(N log N) in total

(3) Record these elements in a second array and then copy the array back
  o O(N)

o Total time complexity: O(N log N)
o Memory requirement: uses an extra array, O(N)
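The three steps analysed above can be sketched with Python's standard `heapq` module, which implements a binary min-heap on a list. This is an illustrative sketch only; the function name `heapsort_with_copy` is an assumption.

```python
import heapq

def heapsort_with_copy(a):
    """Heapsort using a separate heap and an extra output array (O(N) extra memory)."""
    heap = []
    for x in a:                         # (1) build the heap by repeated insertion
        heapq.heappush(heap, x)
    out = [heapq.heappop(heap)          # (2) N DeleteMin operations
           for _ in range(len(heap))]
    a[:] = out                          # (3) copy the second array back
    return a
```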

10

Heapsort - No Extra Memory

o Observation: after each deleteMin, the size of the heap shrinks by 1
o We can use the last cell just freed up to store the element that was just deleted
  o after the last deleteMin, the array will contain the elements in decreasing sorted order
o To sort the elements in decreasing order, use a min heap
o To sort the elements in increasing order, use a max heap
  o the parent has a larger element than the child
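A minimal in-place sketch of this idea, using a max heap to sort in increasing order, might look like the following. The helper names `_sift_down` and `heapsort` are assumptions, and the heap is built bottom-up here rather than by repeated insertion, which is a substitution made for brevity.

```python
def _sift_down(a, i, n):
    """Restore the max-heap property for the subtree rooted at i within a[0:n]."""
    while True:
        child = 2 * i + 1
        if child >= n:
            break
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1                  # pick the larger child
        if a[i] >= a[child]:
            break
        a[i], a[child] = a[child], a[i]
        i = child

def heapsort(a):
    """Sort a in place in increasing order using a max heap and no extra array."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # build the max heap bottom-up
        _sift_down(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]         # store the deleted max in the freed cell
        _sift_down(a, 0, end)               # heap now occupies a[0:end]
    return a
```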

11

Heapsort - No Extra Memory

Sort in increasing order: use a max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes
o Sort each smaller list recursively
o Merge the two sorted lists to get one sorted list
o How to divide the list?
  o Running time?
o How to merge the two sorted lists?
  o Running time?

14

Mergesort Divide

o If the input list is a linked list, dividing takes Θ(N) time
  o We scan the linked list, stop at the (N/2)th entry and cut the link
o If the input list is an array A[0..N-1], dividing takes O(1) time
  o we can represent a sublist by two integers left and right: to divide A[left..right], we compute center = (left+right)/2 and obtain A[left..center] and A[center+1..right]
o Try left = 0, right = 50, center = ?

15

Mergesort

o Divide-and-conquer strategy
  o recursively mergesort the first half and the second half
  o merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input: two sorted arrays A and B
o Output: a sorted array C
o Three counters: Actr, Bctr and Cctr
  o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

(2) When either input list is exhausted, the remainder of the other list is copied to C
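The merge routine just described can be sketched as below. This is an illustration under simple assumptions: the inputs are Python lists, the function name `merge` is chosen here, and the counter Cctr is implicit in `append`.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list c."""
    c = []
    actr = bctr = 0
    # (1) Copy the smaller front element and advance the matching counter.
    while actr < len(a) and bctr < len(b):
        if a[actr] <= b[bctr]:
            c.append(a[actr])
            actr += 1
        else:
            c.append(b[bctr])
            bctr += 1
    # (2) One input is exhausted: copy the remainder of the other.
    c.extend(a[actr:])
    c.extend(b[bctr:])
    return c
```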

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists
o Space requirement
  o merging two sorted lists requires linear extra memory
  o additional work to copy to the temporary array and back
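To make the temporary array and the copy-back step concrete, here is a hedged sketch of an array-based mergesort that represents sublists by `left`, `center` and `right` indices. The function names and the single shared `tmp` buffer are assumptions of this sketch, not something the slides specify.

```python
def merge_sort(a):
    """Sort list a in place using one temporary array of the same length."""
    tmp = [None] * len(a)
    _merge_sort(a, tmp, 0, len(a) - 1)
    return a

def _merge_sort(a, tmp, left, right):
    if left >= right:                   # 0 or 1 element: already sorted
        return
    center = (left + right) // 2        # divide in O(1)
    _merge_sort(a, tmp, left, center)
    _merge_sort(a, tmp, center + 1, right)
    # Merge a[left..center] and a[center+1..right] into tmp[left..right].
    i, j, k = left, center + 1, left
    while i <= center and j <= right:
        if a[i] <= a[j]:
            tmp[k] = a[i]
            i += 1
        else:
            tmp[k] = a[j]
            j += 1
        k += 1
    while i <= center:
        tmp[k] = a[i]
        i += 1
        k += 1
    while j <= right:
        tmp[k] = a[j]
        j += 1
        k += 1
    a[left:right + 1] = tmp[left:right + 1]   # copy back from the temporary array
```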

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step: O(1) time
o Conquer step: 2 T(N/2) time
o Combine step: O(N) time
o Recurrence equation
  o T(1) = 1
  o T(N) = 2T(N/2) + N

22

Mergesort Analysis

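\[
\begin{aligned}
T(N) &= 2T(N/2) + N \\
     &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
     &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^k T(N/2^k) + kN
\end{aligned}
\]

Since N = 2^k, we have k = log_2 N, so

\[
T(N) = 2^k T(N/2^k) + kN = N + N \log N = O(N \log N)
\]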

23

Quicksort

o Divide-and-conquer approach to sorting
o Like MergeSort, except
  o Don't divide the array in half
  o Partition the array based on elements being less than or greater than some element of the array (the pivot)
o Worst-case running time: O(N^2)
o Average-case running time: O(N log N)
o Fastest generic sorting algorithm in practice
o Even faster if a simple sort (e.g., InsertionSort) is used when the array is small

24

Quicksort Algorithm

o Given array S

o Modify S so its elements are in increasing order

1. If size of S is 0 or 1, return

2. Pick any element v in S as the pivot

3. Partition S - {v} into two disjoint groups
  o S1 = {x ∈ (S - {v}) | x ≤ v}
  o S2 = {x ∈ (S - {v}) | x ≥ v}

4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
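The four steps can be written almost literally as the sketch below. It is an illustration only: it returns a new list instead of modifying S in place, it picks the middle element as the pivot, and it sends elements equal to the pivot to S1, all of which are assumptions made here. The name `simple_quicksort` is also hypothetical.

```python
def simple_quicksort(s):
    """Return a new sorted list following the four-step description of quicksort."""
    if len(s) <= 1:                          # step 1
        return list(s)
    mid = len(s) // 2
    v = s[mid]                               # step 2: any element may serve as pivot
    rest = s[:mid] + s[mid + 1:]             # S - {v}
    s1 = [x for x in rest if x <= v]         # step 3: elements equal to v go to S1 here
    s2 = [x for x in rest if x > v]
    return simple_quicksort(s1) + [v] + simple_quicksort(s2)   # step 4
```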

25

Quicksort Example

26

Why so fast?

o MergeSort always divides array in half
o QuickSort might divide array into subproblems of size 1 and N-1
  o When?
  o Leading to O(N^2) performance
o Need to choose pivot wisely (but efficiently)
o MergeSort requires temporary array for merge step
o QuickSort can partition the array in place
o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first element
  o What if array already or nearly sorted?
  o Good for random array
o Choose random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of random number generator

28

Picking the Pivot

o Best choice of pivot?
  o Median of array
o Median is expensive to calculate
o Estimate median as the median of three elements
  o Choose first, middle and last elements
  o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
o Has been shown to reduce running time (comparisons) by 14%
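A small sketch of the median-of-three estimate follows. The function name `median_of_three`, its default arguments, and the choice of `(left + right) // 2` as the middle position are assumptions of this example.

```python
def median_of_three(s, left=0, right=None):
    """Return the median of the first, middle and last elements of s[left..right]."""
    if right is None:
        right = len(s) - 1
    center = (left + right) // 2
    # Sorting three values and taking the middle one gives their median.
    return sorted([s[left], s[center], s[right]])[1]
```

For the example array, the three candidates are 8 (first), 6 (middle, index 4) and 0 (last), so the chosen pivot is 6.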

29

Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently
o Good strategy
  o Swap pivot with last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap pivot and S[i]
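One possible reading of this strategy in Python is sketched below. The explicit bounds checks on `i` and `j` are an addition made here so the scans cannot run off the subarray, and the function name `partition` and its `pivot_index` parameter are hypothetical.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place around s[pivot_index]; return the pivot's final index."""
    s[pivot_index], s[right] = s[right], s[pivot_index]   # park the pivot at the end
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] <= pivot:    # increment i until s[i] > pivot
            i += 1
        while j >= i and s[j] >= pivot:    # decrement j until s[j] < pivot
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]        # both elements are on the wrong side
        else:
            break
    s[i], s[right] = s[right], s[i]        # move the pivot between the two groups
    return i
```

With the array `[8, 1, 4, 9, 6, 3, 5, 2, 7, 0]` and the pivot 6 at index 4, the pivot ends at index 6, with smaller elements to its left and larger ones to its right.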

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to pivot
  o No swaps (good)
  o But then i = (right - 1) and array partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to pivot
    o Increment i while S[i] < pivot
    o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for array of identical elements
    o Unlikely for input array, but more likely for recursive calls to QuickSort

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids issue with finding median-of-three pivot for array of size 2 or less
  o Has been shown to reduce running time by 15%
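Putting the pieces together, a quicksort with a median-of-three pivot, scans that stop on elements equal to the pivot, and an insertion-sort cutoff might be sketched as follows. The value `CUTOFF = 10` is an assumed threshold from the 5 to 20 range, and all function names are hypothetical rather than taken from the slides.

```python
CUTOFF = 10  # assumed threshold; subarrays with at most CUTOFF elements use insertion sort

def _insertion_sort(a, left, right):
    for p in range(left + 1, right + 1):
        tmp, j = a[p], p
        while j > left and a[j - 1] > tmp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp

def _median3(a, left, right):
    """Order a[left], a[center], a[right]; hide the pivot at a[right - 1] and return it."""
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]

def _quicksort(a, left, right):
    if right - left < CUTOFF:               # small subarray: fall back to insertion sort
        _insertion_sort(a, left, right)
        return
    pivot = _median3(a, left, right)
    i, j = left, right - 1
    while True:
        i += 1
        while a[i] < pivot:                 # stops on elements equal to the pivot
            i += 1
        j -= 1
        while a[j] > pivot:                 # a[left] <= pivot acts as a sentinel
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            break
    a[i], a[right - 1] = a[right - 1], a[i]  # restore the pivot
    _quicksort(a, left, i - 1)
    _quicksort(a, i + 1, right)

def quicksort(a):
    """Sort list a in place and return it."""
    _quicksort(a, 0, len(a) - 1)
    return a
```

Because both scans stop on elements equal to the pivot, an array of identical elements is split near the middle at every level instead of degenerating into the quadratic case.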

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)
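As a hedged illustration of how this recurrence behaves (a standard calculation, not taken from the slides), consider the two extreme values of i. Here c is an assumed constant for the linear partitioning cost, and the constant T(0) is absorbed into the cN term.

```latex
\begin{aligned}
\text{Worst case } (i = 0 \text{ at every level}):\quad
  T(N) &= T(N-1) + cN \\
       &= T(1) + c\sum_{k=2}^{N} k = O(N^2) \\[4pt]
\text{Balanced case } (i \approx N/2 \text{ at every level}):\quad
  T(N) &= 2T(N/2) + cN = O(N \log N)
\end{aligned}
```

The balanced case is the same recurrence as for mergesort, which is why a good pivot choice matters so much.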

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
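A decision tree for sorting N elements needs at least one leaf per possible ordering, that is N! leaves, and a binary tree with L leaves has depth at least ceil(log2 L). The short sketch below evaluates that bound for small N; the function name is an assumption, and the bit-length trick is simply an exact way to compute ceil(log2(N!)) with integers.

```python
import math

def min_worst_case_comparisons(n):
    """Lower bound ceil(log2(n!)) on worst-case comparisons for sorting n elements."""
    leaves = math.factorial(n)          # one leaf per possible ordering
    # For any integer x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    return (leaves - 1).bit_length()
```

For example, three elements have 3! = 6 orderings, so any comparison sort needs at least 3 comparisons in the worst case.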

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
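A minimal sketch of CountingSort under these constraints is shown below. It assumes the elements are non-negative integers in the range 0 to M - 1; the function name `counting_sort` and its parameter names are chosen for this example.

```python
def counting_sort(a, m):
    """Return a sorted copy of a, whose elements are integers in range(m)."""
    c = [0] * m
    for x in a:                         # C[i] = number of i's in A
        c[x] += 1
    b = []
    for value, count in enumerate(c):   # emit each value as often as it occurred
        b.extend([value] * count)
    return b
```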

50

Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0,1)
  o Create N equal-sized buckets over [0,1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
  o Assumes each bucket will contain Θ(1) elements
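The BucketSort steps can be sketched as follows. This is an illustration only: it uses Python's built-in `list.sort()` for each bucket in place of InsertionSort, and the `min(...)` guard on the bucket index is a defensive addition of this sketch.

```python
def bucket_sort(a):
    """Return a sorted copy of a, whose elements are floats in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]            # N equal-sized buckets over [0, 1)
    for x in a:
        index = min(int(x * n), n - 1)          # bucket covering x
        buckets[index].append(x)
    out = []
    for bucket in buckets:
        bucket.sort()                           # built-in sort stands in for InsertionSort
        out.extend(bucket)                      # concatenate buckets in order
    return out
```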

51

External Sorting

o What if the N elements we wish to sort do not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
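The overall shape of this approach can be simulated in memory as below. This is only a sketch under loud assumptions: plain Python lists stand in for the sorted runs on disk, the built-in `sorted()` replaces QuickSort, and `heapq.merge` performs the K-way merge with a heap (which costs O(N log K) rather than the O(N) quoted above). Buffering by disk page is not modelled, and the function name is hypothetical.

```python
import heapq

def external_merge_sort(a, m):
    """Sort a in chunks of at most m elements, then K-way merge the sorted runs.

    Returns (sorted_list, number_of_runs).
    """
    runs = []
    for start in range(0, len(a), m):
        # Each run stands in for a chunk read from disk, sorted, and written back.
        runs.append(sorted(a[start:start + m]))
    merged = list(heapq.merge(*runs))   # K-way merge of the sorted runs
    return merged, len(runs)
```

With 23 elements and M = 5, the sketch produces K = 5 runs before merging.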

53

External Mergesort

o T(N,M) = O((K M log M) + N)
o T(N,M) = O(((N/M) M log M) + N)
o T(N,M) = O((N log M) + N)
o T(N,M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)

54

Summary

      3

      Insertion sort

      4

      Insertion sort

      o Consists of N - 1 passeso For pass p = 1 through N - 1 ensures that the elements

      in positions 0 through p are in sorted ordero elements in positions 0 through p - 1 are already sortedo move the element in position p left until its correct place is found

      among the first p + 1 elements

      5

      Insertion sort

      To sort the following numbers in increasing order

      34 8 64 51 32 21

      p = 1 tmp = 8

      34 gt tmp so second element a[1] is set to 34 8 34hellip

      We have reached the front of the list Thus 1st position a[0] = tmp=8

      After 1st pass 8 34 64 51 32 21

      (first 2 elements are sorted)

      6

      p = 3 tmp = 51

      51 lt 64 so we have 8 34 64 64 32 21

      34 lt 51 so stop at 2nd position set 3rd position = tmp

      After 3rd pass 8 34 51 64 32 21

      (first 4 elements are sorted)p = 4 tmp = 32

      32 lt 64 so 8 34 51 64 64 21

      32 lt 51 so 8 34 51 51 64 21

      next 32 lt 34 so 8 34 34 51 64 21

      next 32 gt 8 so stop at 1st position and set 2nd position = 32

      After 4th pass 8 32 34 51 64 21

      p = 5 tmp = 21

      After 5th pass 8 21 32 34 51 64

      p = 2 tmp = 64

      34 lt 64 so stop at 3rd position and set 3rd position = 64

      After 2nd pass 8 34 64 51 32 21

      (first 3 elements are sorted)

      7

      Insertion sort worst-case running time

      o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

      8

      Heapsort

      (1) Build a binary heap of N elements o the minimum element is at the top of the heap

      (2) Perform N DeleteMin operationso the elements are extracted in sorted order

      (3) Record these elements in a second array and then copy the array back

      9

      Heapsort -Analysis

      (1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

      (2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

      (3) Record these elements in a second array and then copy the array backo O(N)

      o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

      10

      Heapsort ndash No Extra Memory

      o Observation after each deleteMin the size of heap shrinks by 1

      o We can use the last cell just freed up to store the element that was just deleted

      after the last deleteMin the array will contain the elements in decreasing sorted order

      o To sort the elements in the decreasing order use a min heap

      o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

      11

      Heapsort ndash No Extra Memory

      Sort in increasing order use max heap

      Delete 97

      12

      Mergesort

      Based on divide-and-conquer strategy

      o Divide the list into two smaller lists of about equal sizes

      o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

      o How to divide the list o Running timeo How to merge the two sorted lists o Running time

      13

      Mergesort

      Based on divide-and-conquer strategy

      o Divide the list into two smaller lists of about equal sizes

      o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

      o How to divide the list o Running timeo How to merge the two sorted lists o Running time

      14

      Mergesort Divide

      o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

      cut the link

      o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

      o Try left=0 right = 50 center=

      15

      Mergesort

      o Divide-and-conquer strategyo recursively mergesort the first half and the

      second halfo merge the two sorted halves together

      16

      Mergesort

      17

      Mergesort Merge

      o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

      o initially set to the beginning of their respective arrays

      (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

      (2) When either input list is exhausted the remainder of the other list is copied to C

      18

      Mergesort Merge

      19

      Mergesort Merge

      20

      Mergesort Analysis

      o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

      o Space requiremento merging two sorted lists requires linear extra

      memoryo additional work to copy to the temporary array

      and back

      21

      Mergesort Analysis

      o Let T(N) denote the worst-case running time of mergesort to sort N numbers

      o Assume that N is a power of 2

      o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

      o T(1) = 1o T(N) = 2T(N2) + N

      22

      Mergesort Analysis

      kNN

      T

      NN

      T

      NNN

      T

      NN

      T

      NNN

      T

      NN

      TNT

      kk

      )2(2

      3)8(8

      2)4

      )8(2(4

      2)4(4

      )2

      )4(2(2

      )2(2)(

      )log(

      log

      )2(2)(

      NNO

      NNN

      kNN

      TNTk

      k

      Since N=2k we have k=log2 n

      23

      Quicksort

      o Divide-and-conquer approach to sorting
      o Like MergeSort, except
        o Don't divide the array in half
        o Partition the array based on elements being less than or greater than some element of the array (the pivot)
      o Worst-case running time: O(N^2)
      o Average-case running time: O(N log N)
      o Fastest generic sorting algorithm in practice
      o Even faster if a simple sort (e.g., InsertionSort) is used when the array is small

      24

      Quicksort Algorithm

      o Given array S

      o Modify S so its elements are in increasing order

      1. If the size of S is 0 or 1, return
      2. Pick any element v in S as the pivot
      3. Partition S - {v} into two disjoint groups
         o S1 = {x ∈ S - {v} | x ≤ v}
         o S2 = {x ∈ S - {v} | x ≥ v}
      4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
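The four steps translate almost directly into a list-based sketch. Two choices here are assumptions rather than part of the slide: the pivot is taken from the middle position, and elements equal to the pivot are all sent to S1 (the definition allows either group). Unlike the slide's in-place goal, this version returns a new list.

```python
def quicksort(s):
    """Return a sorted copy of s, following the four steps literally."""
    if len(s) <= 1:                       # step 1: size 0 or 1
        return list(s)
    pivot_index = len(s) // 2             # step 2: any element may serve as v
    v = s[pivot_index]
    rest = s[:pivot_index] + s[pivot_index + 1:]   # S - {v}
    s1 = [x for x in rest if x <= v]      # step 3: elements <= v
    s2 = [x for x in rest if x > v]       #         elements > v
    return quicksort(s1) + [v] + quicksort(s2)     # step 4
```

Sending every duplicate of the pivot to one side is simple but can unbalance the split when many keys are equal.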

      25

      Quicksort Example

      26

      Why so fast?

      o MergeSort always divides the array in half
      o QuickSort might divide the array into subproblems of size 1 and N-1
        o When?
        o Leading to O(N^2) performance
      o Need to choose the pivot wisely (but efficiently)
      o MergeSort requires a temporary array for the merge step
      o QuickSort can partition the array in place
        o This more than makes up for bad pivot choices

      27

      Picking the Pivot

      o Choosing the first element
        o What if the array is already or nearly sorted?
        o Good for a random array
      o Choose a random pivot
        o Good in practice if truly random
        o Still possible to get some bad choices
        o Requires execution of a random number generator

      28

      Picking the Pivot

      o Best choice of pivot?
        o Median of array
      o Median is expensive to calculate
      o Estimate the median as the median of three elements
        o Choose first, middle, and last elements
        o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
      o Has been shown to reduce running time (comparisons) by 14%
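A small sketch of the median-of-three estimate, applied to the slide's example array. The function name and the convention that "middle" means index (left + right) // 2 are assumptions.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    # Sort the three (value, index) pairs and take the middle one.
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]


example = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
pivot_index = median_of_three(example, 0, len(example) - 1)
# First = 8, middle = 6, last = 0, so the pivot is 6 (at index 4).
```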

      29

      Partitioning Strategy

      o Partitioning is conceptually straightforward, but easy to do inefficiently
      o Good strategy
        o Swap pivot with last element S[right]
        o Set i = left
        o Set j = (right - 1)
        o While (i < j)
          o Increment i until S[i] > pivot
          o Decrement j until S[j] < pivot
          o If (i < j), then swap S[i] and S[j]
        o Swap pivot and S[i]
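The strategy above can be sketched as an in-place routine. The signature (in particular the `pivot_index` parameter) and the explicit bounds checks on i and j are assumptions added so the scans cannot run off the subarray; the slide leaves those details open.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place; return the pivot's final index."""
    pivot = s[pivot_index]
    s[pivot_index], s[right] = s[right], s[pivot_index]  # park pivot at the end
    i, j = left, right - 1
    while True:
        while i <= j and s[i] <= pivot:   # advance i until s[i] > pivot
            i += 1
        while j >= i and s[j] >= pivot:   # retreat j until s[j] < pivot
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]       # both are on the wrong side: swap
        else:
            break
    s[i], s[right] = s[right], s[i]       # put the pivot between the groups
    return i
```

After the call, everything left of the returned index is at most the pivot and everything right of it is at least the pivot.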

      30

      Partitioning Example

      31

      Partitioning Example

      32

      Partitioning Strategy

      o How to handle duplicates?
      o Consider the case where all elements are equal
      o Current approach: skip over elements equal to pivot
        o No swaps (good)
        o But then i = (right - 1) and the array is partitioned into N-1 and 1 elements
        o Worst case O(N^2) performance

      33

      Partitioning Strategy

      o How to handle duplicates?
      o Alternative approach
        o Don't skip elements equal to pivot
        o Increment i while S[i] < pivot
        o Decrement j while S[j] > pivot
        o Adds some unnecessary swaps
        o But results in perfect partitioning for an array of identical elements
          o Unlikely for the input array, but more likely for recursive calls to QuickSort
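A sketch of the alternative, in which both scans stop on elements equal to the pivot. The function name, the `pivot_index` parameter, and the bounds check on j are assumptions; the pivot parked at S[right] acts as a sentinel for i.

```python
def partition_stop_on_equal(s, left, right, pivot_index):
    """Partition s[left..right] in place, stopping both scans on equal keys."""
    pivot = s[pivot_index]
    s[pivot_index], s[right] = s[right], s[pivot_index]  # park pivot at the end
    i, j = left, right - 1
    while True:
        while s[i] < pivot:                 # s[right] == pivot stops this scan
            i += 1
        while j >= left and s[j] > pivot:   # guard keeps j inside the subarray
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]
            i += 1                          # step past the swapped pair so
            j -= 1                          # equal keys cannot stall the loop
        else:
            break
    s[i], s[right] = s[right], s[i]
    return i


identical = [5] * 8
split = partition_stop_on_equal(identical, 0, 7, 0)
# The pivot lands near the middle (index 3), giving groups of 3 and 4.
```

The extra swaps of equal elements are the price paid for the balanced split.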

      34

      Small Arrays

      o When S is small, generating lots of recursive calls on small sub-arrays is expensive
      o General strategy
        o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
        o Good thresholds range from 5 to 20
        o Also avoids the issue with finding a median-of-three pivot for an array of size 2 or less
        o Has been shown to reduce running time by 15%
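Putting the pieces together, one possible in-place quicksort with a small-array cutoff looks like the sketch below. The cutoff value of 10, the helper names, and the trick of sorting the three sampled elements so that they act as sentinels are assumptions in the style of common textbook implementations, not a transcription of the slides' own code.

```python
CUTOFF = 10  # assumed threshold; the slide suggests values from 5 to 20


def insertion_sort(a, left, right):
    """Sort a[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = a[p]
        j = p
        while j > left and a[j - 1] > tmp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp


def median3(a, left, right):
    """Order a[left], a[center], a[right]; hide the pivot at a[right - 1]."""
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]


def quicksort(a, left=0, right=None):
    """Sort a[left..right] in place."""
    if right is None:
        right = len(a) - 1
    if right - left + 1 < CUTOFF:         # small subarray: use insertion sort
        insertion_sort(a, left, right)
        return
    pivot = median3(a, left, right)
    i, j = left, right - 1
    while True:
        i += 1
        while a[i] < pivot:               # a[right - 1] == pivot stops i
            i += 1
        j -= 1
        while a[j] > pivot:               # a[left] <= pivot stops j
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            break
    a[i], a[right - 1] = a[right - 1], a[i]   # restore the pivot
    quicksort(a, left, i - 1)
    quicksort(a, i + 1, right)
```

Because the cutoff guarantees at least three elements whenever `median3` runs, the size-2 special case mentioned on the slide never arises.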

      35

      QuickSort Implementation

      36

      QuickSort Implementation

      37

      QuickSort Implementation

      38

      Analysis of QuickSort

      o Let i be the number of elements sent to the left partition

      o Compute running time T(N) for array of size N

      o T(0) = T(1) = O(1)

      o T(N) = T(i) + T(N - i - 1) + O(N)
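Two extreme choices of i illustrate how this recurrence behaves. The sketch below writes the O(N) partitioning cost as cN for some constant c and drops the O(1) term T(0); both are simplifying assumptions.

```latex
% Worst case: the pivot is always the smallest element, so i = 0.
\begin{align*}
T(N) &= T(N-1) + cN \\
     &= T(N-2) + c(N-1) + cN \\
     &\;\;\vdots \\
     &= T(1) + c\sum_{k=2}^{N} k = O(N^2)
\end{align*}

% Best case: the pivot is always the median, so i \approx N/2.
\begin{align*}
T(N) &= 2\,T(N/2) + cN = O(N \log N)
\end{align*}
```

The best case has the same form as the mergesort recurrence, which is why both give O(N log N).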

      39

      Analysis of QuickSort

      40

      Analysis of QuickSort

      41

      Comparison Sorting

      42

      Comparison Sorting

      43

      Comparison Sorting

      44

      Lower Bound on Sorting

      o Best worst-case sorting algorithm (so far) is O(N log N)

      o Can we do better?
      o Can we prove a lower bound on the sorting problem?
      o Preview
        o For comparison sorting, no, we can't do better
        o Can show a lower bound of Ω(N log N)

      45

      Decision Trees

      o A decision tree is a binary tree
        o Each node represents a set of possible orderings of the array elements
        o Each branch represents an outcome of a particular comparison
      o Each leaf of the decision tree represents a particular ordering of the original array elements

      46

      Decision Trees

      47

      Decision Tree for Sorting

      o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

      o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

      o In the average case the number of comparisons is the average of the depths of all leaves

      o There are N! different orderings of N elements
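These observations lead to the standard counting argument for the lower bound, sketched below. It relies on two facts: a decision tree that sorts N elements needs at least N! leaves (one per ordering), and a binary tree of depth d has at most 2^d leaves. The symbol d for the depth is introduced here for illustration.

```latex
\begin{align*}
2^{d} \ge N! \quad &\Longrightarrow \quad d \ge \log_2 (N!) \\
\log_2 (N!) &= \sum_{i=1}^{N} \log_2 i
  \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
  \;\ge\; \frac{N}{2} \log_2 \frac{N}{2} \\
  &= \Omega(N \log N)
\end{align*}
```

So any comparison sort performs Ω(N log N) comparisons in the worst case.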

      48

      Lower Bound for Comparison Sorting

      o Lemma 7.1
      o Lemma 7.2
      o Theorem 7.6
      o Theorem 7.7

      49

      Linear Sorting

      o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

      o CountingSort

        o Given array A of N integer elements, each less than M
        o Create array C of size M, where C[i] is the number of i's in A
        o Use C to place elements into new sorted array B
        o Running time Θ(N + M) = Θ(N) if M = Θ(N)
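A minimal sketch of CountingSort as described. It assumes the elements are non-negative integers in the range 0 to M - 1 (the slide only says "less than M"); the function name and the explicit range check are illustrative additions.

```python
def counting_sort(a, m):
    """Sort integers in range(m) without comparing elements to each other."""
    c = [0] * m                      # c[i] counts the occurrences of i in a
    for x in a:
        if not 0 <= x < m:
            raise ValueError("element outside the assumed range 0..m-1")
        c[x] += 1
    b = []                           # new sorted array B
    for value in range(m):
        b.extend([value] * c[value])  # emit each value c[value] times
    return b
```

The first loop is Θ(N) and the second is Θ(N + M), matching the stated running time.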

      50

      Linear Sorting

      o BucketSort

        o Assume N elements of A uniformly distributed over the range [0, 1)
        o Create N equal-sized buckets over [0, 1)
        o Add each element of A into the appropriate bucket
        o Sort each bucket (e.g., with InsertionSort)
        o Return concatenation of buckets
        o Average-case running time Θ(N)
          o Assumes each bucket will contain Θ(1) elements
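The BucketSort steps can be sketched as follows, assuming every input value lies in [0, 1). The function names are hypothetical, and the `min(...)` clamp is a defensive addition against floating-point rounding at the top of the range.

```python
def insertion_sort(items):
    """Sort a small list in place."""
    for p in range(1, len(items)):
        tmp = items[p]
        j = p
        while j > 0 and items[j - 1] > tmp:
            items[j] = items[j - 1]
            j -= 1
        items[j] = tmp


def bucket_sort(a):
    """Sort floats drawn from [0, 1) using len(a) equal-width buckets."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in a:
        index = min(int(x * n), n - 1)   # bucket k covers [k/n, (k+1)/n)
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)           # each bucket is expected to be tiny
        result.extend(bucket)
    return result
```

If the input is far from uniform, many elements fall into one bucket and the insertion sort inside it dominates the cost.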

      51

      External Sorting

      o What if the N elements we wish to sort do not fit in memory?
      o Obviously, our existing sort algorithms are inefficient
        o Each comparison potentially requires a disk access
      o We want to minimize disk accesses

      52

      External Mergesorting

      o N = number of elements in array A to be sorted
      o M = number of elements that fit in memory
      o K = ⌈N/M⌉
      o Approach
        o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
        o Repeat above K times until all of A is processed
        o Create K input buffers and 1 output buffer, each of size M/(K+1)
        o Perform a K-way merge: O(N)
          o Update input buffers one disk-page at a time
          o Write output buffer one disk-page at a time
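The two phases can be simulated in memory to show the control flow. This sketch makes several assumptions: Python lists stand in for the sorted runs on disk, the built-in `sorted` stands in for QuickSort, page-sized buffering is not modelled, and the function names are hypothetical.

```python
from heapq import merge  # lazy K-way merge of sorted iterables


def make_runs(a, m):
    """Phase 1: cut a into chunks of at most m elements and sort each one."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return [sorted(a[start:start + m]) for start in range(0, len(a), m)]


def external_mergesort(a, m):
    """Phase 2: K-way merge of the K = ceil(N / M) sorted runs."""
    runs = make_runs(a, m)
    return list(merge(*runs))
```

`heapq.merge` reads each run sequentially, which mirrors the sequential disk access pattern. It keeps the run heads in a heap, so each output element costs O(log K) comparisons; the O(N) figure on the slide may be read as counting element transfers rather than comparisons.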

      53

      External Mergesort

      o T(N, M) = O((K · M log M) + N)
      o T(N, M) = O(((N/M) · M log M) + N)
      o T(N, M) = O((N log M) + N)
      o T(N, M) = O(N log M)
      o Disk accesses (all sequential)
        o P = page size
        o Accesses = 4N/P (read-all/write-all twice)

      54

      Summary


        4

        Insertion sort

        o Consists of N - 1 passeso For pass p = 1 through N - 1 ensures that the elements

        in positions 0 through p are in sorted ordero elements in positions 0 through p - 1 are already sortedo move the element in position p left until its correct place is found

        among the first p + 1 elements

        5

        Insertion sort

        To sort the following numbers in increasing order

        34 8 64 51 32 21

        p = 1 tmp = 8

        34 gt tmp so second element a[1] is set to 34 8 34hellip

        We have reached the front of the list Thus 1st position a[0] = tmp=8

        After 1st pass 8 34 64 51 32 21

        (first 2 elements are sorted)

        6

        p = 3 tmp = 51

        51 lt 64 so we have 8 34 64 64 32 21

        34 lt 51 so stop at 2nd position set 3rd position = tmp

        After 3rd pass 8 34 51 64 32 21

        (first 4 elements are sorted)p = 4 tmp = 32

        32 lt 64 so 8 34 51 64 64 21

        32 lt 51 so 8 34 51 51 64 21

        next 32 lt 34 so 8 34 34 51 64 21

        next 32 gt 8 so stop at 1st position and set 2nd position = 32

        After 4th pass 8 32 34 51 64 21

        p = 5 tmp = 21

        After 5th pass 8 21 32 34 51 64

        p = 2 tmp = 64

        34 lt 64 so stop at 3rd position and set 3rd position = 64

        After 2nd pass 8 34 64 51 32 21

        (first 3 elements are sorted)

        7

        Insertion sort worst-case running time

        o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

        8

        Heapsort

        (1) Build a binary heap of N elements o the minimum element is at the top of the heap

        (2) Perform N DeleteMin operationso the elements are extracted in sorted order

        (3) Record these elements in a second array and then copy the array back

        9

        Heapsort -Analysis

        (1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

        (2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

        (3) Record these elements in a second array and then copy the array backo O(N)

        o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

        10

        Heapsort ndash No Extra Memory

        o Observation after each deleteMin the size of heap shrinks by 1

        o We can use the last cell just freed up to store the element that was just deleted

        after the last deleteMin the array will contain the elements in decreasing sorted order

        o To sort the elements in the decreasing order use a min heap

        o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

        11

        Heapsort ndash No Extra Memory

        Sort in increasing order use max heap

        Delete 97

        12

        Mergesort

        Based on divide-and-conquer strategy

        o Divide the list into two smaller lists of about equal sizes

        o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

        o How to divide the list o Running timeo How to merge the two sorted lists o Running time

        13

        Mergesort

        Based on divide-and-conquer strategy

        o Divide the list into two smaller lists of about equal sizes

        o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

        o How to divide the list o Running timeo How to merge the two sorted lists o Running time

        14

        Mergesort Divide

        o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

        cut the link

        o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

        o Try left=0 right = 50 center=

        15

        Mergesort

        o Divide-and-conquer strategyo recursively mergesort the first half and the

        second halfo merge the two sorted halves together

        16

        Mergesort

        17

        Mergesort Merge

        o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

        o initially set to the beginning of their respective arrays

        (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

        (2) When either input list is exhausted the remainder of the other list is copied to C

        18

        Mergesort Merge

        19

        Mergesort Merge

        20

        Mergesort Analysis

        o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

        o Space requiremento merging two sorted lists requires linear extra

        memoryo additional work to copy to the temporary array

        and back

        21

        Mergesort Analysis

        o Let T(N) denote the worst-case running time of mergesort to sort N numbers

        o Assume that N is a power of 2

        o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

        o T(1) = 1o T(N) = 2T(N2) + N

        22

        Mergesort Analysis

        kNN

        T

        NN

        T

        NNN

        T

        NN

        T

        NNN

        T

        NN

        TNT

        kk

        )2(2

        3)8(8

        2)4

        )8(2(4

        2)4(4

        )2

        )4(2(2

        )2(2)(

        )log(

        log

        )2(2)(

        NNO

        NNN

        kNN

        TNTk

        k

        Since N=2k we have k=log2 n

        23

        Quicksort

        o Divide-and-conquer approach to sortingo Like MergeSort except

        o Donrsquot divide the array in halfo Partition the array based elements being less than or

        greater than some element of the array (the pivot)

        o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

        InsertionSort) when array is small

        24

        Quicksort Algorithm

        o Given array S

        o Modify S so elements in increasing order

        1 If size of S is 0 or 1 return

        2 Pick any element v in S as the pivot

        3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

        4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

        25

        Quicksort Example

        26

        Why so fast

        o MergeSort always divides array in halfo QuickSort might divide array into

        subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

        o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

        merge stepo QuickSort can partition the array in place

        o This more than makes up for bad pivot choices

        27

        Picking the Pivot

        o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

        generator

        28

        Picking the Pivot

        o Best choice of pivoto Median of array

        o Median is expensive to calculateo Estimate median as the median of

        three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

        o Has been shown to reduce running time (comparisons) by 14

        29

        Partitioning Strategy

        o Partitioning is conceptually straightforward but easy to do inefficiently

        o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

        o Increment i until S[i] gt pivot

        o Decrement j until S[j] lt pivot

        o If (i lt j) then swap S[i] and S[j]

        o Swap pivot and S[i]

        30

        Partitioning Example

        31

        Partitioning Example

        32

        Partitioning Strategy

        o How to handle duplicateso Consider the case where all elements

        are equalo Current approach Skip over elements

        equal to pivoto No swaps (good)

        o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

        o Worst case O(N2) performance

        33

        Partitioning Strategy

        o How to handle duplicateso Alternative approach

        o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

        o Adds some unnecessary swapso But results in perfect partitioning for array

        of identical elementso Unlikely for input array but more likely for

        recursive calls to QuickSort

        34

        Small Arrays

        o When S is small generating lots of recursive calls on small sub-arrays is expensive

        o General strategyo When N lt threshold use a sort more efficient for

        small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

        for array of size 2 or less

        o Has been shown to reduce running time by 15

        35

        QuickSort Implementation

        36

        QuickSort Implementation

        37

        QuickSort Implementation

        38

        Analysis of QuickSort

        o Let i be the number of elements sent to the left partition

        o Compute running time T(N) for array of size N

        o T(0) = T(1) = O(1)

        o T(N) = T(i) + T(N ndashi ndash1) + O(N)

        39

        Analysis of QuickSort

        40

        Analysis of QuickSort

        41

        Comparison Sorting

        42

        Comparison Sorting

        43

        Comparison Sorting

        44

        Lower Bound on Sorting

        o Best worst-case sorting algorithm (so far) is O(N log N)

        o Can we do bettero Can we prove a lower bound on the

        sorting problemo Preview

        o For comparison sorting no we canrsquot do better

        o Can show lower bound of Ω(N log N)

        45

        Decision Trees

        o A decision tree is a binary treeo Each node represents a set of possible

        orderings of the array elementso Each branch represents an outcome of

        a particular comparison

        o Each leaf of the decision tree represents a particular ordering of the original array elements

        46

        Decision Trees

        47

        Decision Tree for Sorting

        o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

        o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

        o In the average case the number of comparisons is the average of the depths of all leaves

        o There are N different orderings of N elements

        48

        Lower Bound for Comparison Sorting

        o Lemma 71o Lemma 72o Theorem 76o Theorem 77

        49

        Linear Sorting

        o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

        o CountingSort

        o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

        irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

        50

        Linear Sorting

        o BucketSort

        o Assume N elements of A uniformly distributed over the range [01)

        o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

        o Assumes each bucket will contain Θ(1) elements

        51

        External Sorting

        o What is the number of elements N we wish to sort do not fit in memory

        o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

        o We want to minimize disk accesses

        52

        External Mergesorting

        o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

        o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

        o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

        size M(K+1)o Perform a K-way merge O(N)

        o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

        53

        External Mergesort

        o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

        o P = page size

        o Accesses = 4NP (read-allwrite-all twice)

        54

        Summary

        • G64ADS Advanced Data Structures
        • Insertion sort
        • Slide 3
        • Slide 4
        • Slide 5
        • Slide 6
        • Insertion sort worst-case running time
        • Heapsort
        • Heapsort -Analysis
        • Heapsort ndash No Extra Memory
        • Slide 11
        • Mergesort
        • Slide 13
        • Mergesort Divide
        • Slide 15
        • Slide 16
        • Mergesort Merge
        • Slide 18
        • Slide 19
        • Mergesort Analysis
        • Slide 21
        • Slide 22
        • Quicksort
        • Quicksort Algorithm
        • Quicksort Example
        • Why so fast
        • Picking the Pivot
        • Slide 28
        • Partitioning Strategy
        • Partitioning Example
        • Slide 31
        • Slide 32
        • Slide 33
        • Small Arrays
        • QuickSort Implementation
        • Slide 36
        • Slide 37
        • Analysis of QuickSort
        • Slide 39
        • Slide 40
        • Comparison Sorting
        • Slide 42
        • Slide 43
        • Lower Bound on Sorting
        • Decision Trees
        • Slide 46
        • Decision Tree for Sorting
        • Lower Bound for Comparison Sorting
        • Linear Sorting
        • Slide 50
        • External Sorting
        • External Mergesorting
        • External Mergesort
        • Summary

          5

          Insertion sort

          To sort the following numbers in increasing order

          34 8 64 51 32 21

          p = 1 tmp = 8

          34 gt tmp so second element a[1] is set to 34 8 34hellip

          We have reached the front of the list Thus 1st position a[0] = tmp=8

          After 1st pass 8 34 64 51 32 21

          (first 2 elements are sorted)

          6

          p = 3 tmp = 51

          51 lt 64 so we have 8 34 64 64 32 21

          34 lt 51 so stop at 2nd position set 3rd position = tmp

          After 3rd pass 8 34 51 64 32 21

          (first 4 elements are sorted)p = 4 tmp = 32

          32 lt 64 so 8 34 51 64 64 21

          32 lt 51 so 8 34 51 51 64 21

          next 32 lt 34 so 8 34 34 51 64 21

          next 32 gt 8 so stop at 1st position and set 2nd position = 32

          After 4th pass 8 32 34 51 64 21

          p = 5 tmp = 21

          After 5th pass 8 21 32 34 51 64

          p = 2 tmp = 64

          34 lt 64 so stop at 3rd position and set 3rd position = 64

          After 2nd pass 8 34 64 51 32 21

          (first 3 elements are sorted)

          7

          Insertion sort worst-case running time

          o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

          8

          Heapsort

          (1) Build a binary heap of N elements o the minimum element is at the top of the heap

          (2) Perform N DeleteMin operationso the elements are extracted in sorted order

          (3) Record these elements in a second array and then copy the array back

          9

          Heapsort -Analysis

          (1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

          (2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

          (3) Record these elements in a second array and then copy the array backo O(N)

          o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

          10

          Heapsort ndash No Extra Memory

          o Observation after each deleteMin the size of heap shrinks by 1

          o We can use the last cell just freed up to store the element that was just deleted

          after the last deleteMin the array will contain the elements in decreasing sorted order

          o To sort the elements in the decreasing order use a min heap

          o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

          11

          Heapsort ndash No Extra Memory

          Sort in increasing order use max heap

          Delete 97

          12

          Mergesort

          Based on divide-and-conquer strategy

          o Divide the list into two smaller lists of about equal sizes

          o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

          o How to divide the list o Running timeo How to merge the two sorted lists o Running time

          13

          Mergesort

          Based on divide-and-conquer strategy

          o Divide the list into two smaller lists of about equal sizes

          o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

          o How to divide the list o Running timeo How to merge the two sorted lists o Running time

          14

          Mergesort Divide

          o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

          cut the link

          o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

          o Try left=0 right = 50 center=

          15

          Mergesort

          o Divide-and-conquer strategyo recursively mergesort the first half and the

          second halfo merge the two sorted halves together

          16

          Mergesort

          17

          Mergesort Merge

          o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

          o initially set to the beginning of their respective arrays

          (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

          (2) When either input list is exhausted the remainder of the other list is copied to C

          18

          Mergesort Merge

          19

          Mergesort Merge

          20

          Mergesort Analysis

          o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

          o Space requiremento merging two sorted lists requires linear extra

          memoryo additional work to copy to the temporary array

          and back

          21

          Mergesort Analysis

          o Let T(N) denote the worst-case running time of mergesort to sort N numbers

          o Assume that N is a power of 2

          o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

          o T(1) = 1o T(N) = 2T(N2) + N

          22

          Mergesort Analysis

          kNN

          T

          NN

          T

          NNN

          T

          NN

          T

          NNN

          T

          NN

          TNT

          kk

          )2(2

          3)8(8

          2)4

          )8(2(4

          2)4(4

          )2

          )4(2(2

          )2(2)(

          )log(

          log

          )2(2)(

          NNO

          NNN

          kNN

          TNTk

          k

          Since N=2k we have k=log2 n

          23

          Quicksort

          o Divide-and-conquer approach to sortingo Like MergeSort except

          o Donrsquot divide the array in halfo Partition the array based elements being less than or

          greater than some element of the array (the pivot)

          o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

          InsertionSort) when array is small

          24

          Quicksort Algorithm

          o Given array S

          o Modify S so elements in increasing order

          1 If size of S is 0 or 1 return

          2 Pick any element v in S as the pivot

          3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

          4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

          25

          Quicksort Example

          26

          Why so fast

          o MergeSort always divides array in halfo QuickSort might divide array into

          subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

          o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

          merge stepo QuickSort can partition the array in place

          o This more than makes up for bad pivot choices

          27

          Picking the Pivot

          o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

          generator

          28

          Picking the Pivot

          o Best choice of pivoto Median of array

          o Median is expensive to calculateo Estimate median as the median of

          three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

          o Has been shown to reduce running time (comparisons) by 14

          29

          Partitioning Strategy

          o Partitioning is conceptually straightforward but easy to do inefficiently

          o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

          o Increment i until S[i] gt pivot

          o Decrement j until S[j] lt pivot

          o If (i lt j) then swap S[i] and S[j]

          o Swap pivot and S[i]

          30

          Partitioning Example

          31

          Partitioning Example

          32

          Partitioning Strategy

          o How to handle duplicateso Consider the case where all elements

          are equalo Current approach Skip over elements

          equal to pivoto No swaps (good)

          o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

          o Worst case O(N2) performance

          33

          Partitioning Strategy

          o How to handle duplicateso Alternative approach

          o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

          o Adds some unnecessary swapso But results in perfect partitioning for array

          of identical elementso Unlikely for input array but more likely for

          recursive calls to QuickSort

          34

          Small Arrays

          o When S is small generating lots of recursive calls on small sub-arrays is expensive

          o General strategyo When N lt threshold use a sort more efficient for

          small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

          for array of size 2 or less

          o Has been shown to reduce running time by 15

          35

          QuickSort Implementation

          36

          QuickSort Implementation

          37

          QuickSort Implementation

          38

          Analysis of QuickSort

          o Let i be the number of elements sent to the left partition

          o Compute running time T(N) for array of size N

          o T(0) = T(1) = O(1)

          o T(N) = T(i) + T(N - i - 1) + O(N)
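As a worked illustration of this recurrence, write the O(N) partitioning cost as cN for some constant c (an assumption made only to keep the algebra concrete). The standard cases then read:

```latex
% Worst case: the pivot is always the smallest element, so i = 0.
T(N) = T(N-1) + cN
     = T(1) + c \sum_{k=2}^{N} k
     = O(N^2)

% Best case: the pivot is always the median, so i \approx N/2.
T(N) = 2\,T(N/2) + cN = O(N \log N)

% Average case: every value of i from 0 to N-1 is equally likely.
T(N) = \frac{2}{N} \sum_{j=0}^{N-1} T(j) + cN = O(N \log N)
```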

          39

          Analysis of QuickSort

          40

          Analysis of QuickSort

          41

          Comparison Sorting

          42

          Comparison Sorting

          43

          Comparison Sorting

          44

          Lower Bound on Sorting

          o Best worst-case sorting algorithm (so far) is O(N log N)

          o Can we do better?
          o Can we prove a lower bound on the sorting problem?
          o Preview
            o For comparison sorting, no, we can't do better
            o Can show lower bound of Ω(N log N)

          45

          Decision Trees

          o A decision tree is a binary tree
            o Each node represents a set of possible orderings of the array elements
            o Each branch represents an outcome of a particular comparison

          o Each leaf of the decision tree represents a particular ordering of the original array elements

          46

          Decision Trees

          47

          Decision Tree for Sorting

          o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

          o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf

          o In the average case, the number of comparisons is the average of the depths of all leaves

          o There are N! different orderings of N elements
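A sketch of how these facts combine into the lower bound: a binary tree of depth d has at most 2^d leaves, and the decision tree needs at least N! leaves, so the worst-case number of comparisons d satisfies the inequalities below. This is the standard argument, written out here as an illustration.

```latex
2^{d} \ge N!
\quad\Longrightarrow\quad
d \ge \lceil \log_2 N! \rceil

\log_2 N! = \sum_{k=1}^{N} \log_2 k
          \ge \sum_{k=N/2}^{N} \log_2 k
          \ge \frac{N}{2} \log_2 \frac{N}{2}
          = \Omega(N \log N)
```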

          48

          Lower Bound for Comparison Sorting

          o Lemma 7.1
          o Lemma 7.2
          o Theorem 7.6
          o Theorem 7.7

          49

          Linear Sorting

          o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

          o CountingSort

            o Given array A of N integer elements, each less than M
            o Create array C of size M, where C[i] is the number of i's in A
            o Use C to place elements into new sorted array B
            o Running time Θ(N+M) = Θ(N) if M = Θ(N)
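A minimal CountingSort sketch following the steps above. It assumes the elements are non-negative integers in the range 0 to M-1; the function name `counting_sort` is illustrative.

```python
def counting_sort(a, m):
    """Return a sorted copy of a, whose elements are integers in range(m)."""
    c = [0] * m
    for x in a:
        c[x] += 1                      # c[i] = number of i's in a
    b = []
    for value, count in enumerate(c):
        b.extend([value] * count)      # emit each value as often as it occurred
    return b
```

The first loop does N steps and the second does M steps plus N appended elements, which gives the Θ(N+M) running time without comparing any two elements.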

          50

          Linear Sorting

          o BucketSort

            o Assume N elements of A uniformly distributed over the range [0, 1)
            o Create N equal-sized buckets over [0, 1)
            o Add each element of A into appropriate bucket
            o Sort each bucket (e.g., with InsertionSort)
            o Return concatenation of buckets
            o Average case running time Θ(N)
              o Assumes each bucket will contain Θ(1) elements
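The BucketSort steps can be sketched as below, assuming the inputs are floats in [0, 1). The name `bucket_sort` is illustrative, and Python's built-in `list.sort` stands in for InsertionSort on each bucket.

```python
def bucket_sort(a):
    """Return a sorted copy of a, whose elements lie in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]            # N equal-sized buckets over [0, 1)
    for x in a:
        index = min(int(x * n), n - 1)          # guard against floating-point round-up
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        bucket.sort()                           # stand-in for InsertionSort
        result.extend(bucket)                   # concatenate buckets in order
    return result
```

If the input is not close to uniform, many elements land in a few buckets and the per-bucket sort dominates the running time.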

          51

          External Sorting

          o What if the N elements we wish to sort do not fit in memory?

          o Obviously, our existing sort algorithms are inefficient
            o Each comparison potentially requires a disk access

          o We want to minimize disk accesses

          52

          External Mergesorting

          o N = number of elements in array A to be sorted
          o M = number of elements that fit in memory
          o K = ⌈N/M⌉
          o Approach
            o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
            o Repeat above K times until all of A is processed
            o Create K input buffers and 1 output buffer, each of size M/(K+1)
            o Perform a K-way merge: O(N)
              o Update input buffers one disk-page at a time
              o Write output buffer one disk-page at a time

          53

          External Mergesort

          o T(N, M) = O((K · M log M) + N)
          o T(N, M) = O(((N/M) · M log M) + N)
          o T(N, M) = O((N log M) + N)
          o T(N, M) = O(N log M)
          o Disk accesses (all sequential)
            o P = page size
            o Accesses = 4N/P (read-all/write-all twice)
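The two phases can be simulated in memory as a rough sketch. Here plain Python lists stand in for the sorted runs written to disk, and no paging or buffering is modeled. The function name `external_mergesort` is an assumption. `heapq.merge` from the standard library performs the K-way merge; it uses a heap, so it spends O(log K) comparisons per element rather than the constant the slide's O(N) figure assumes.

```python
import heapq

def external_mergesort(a, m):
    """Sort a using runs of at most m elements followed by a K-way merge."""
    # Phase 1: K = ceil(N / M) runs, each sorted independently "in memory".
    runs = [sorted(a[start:start + m]) for start in range(0, len(a), m)]
    # Phase 2: K-way merge of the sorted runs.
    return list(heapq.merge(*runs))
```

With N = 10 and M = 3 this creates K = 4 runs before merging.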

          54

          Summary

            8

            Heapsort

            (1) Build a binary heap of N elements
                o the minimum element is at the top of the heap

            (2) Perform N DeleteMin operations
                o the elements are extracted in sorted order

            (3) Record these elements in a second array and then copy the array back

            9

            Heapsort: Analysis

            (1) Build a binary heap of N elements
                o repeatedly insert N elements: O(N log N) time

            (2) Perform N DeleteMin operations
                o Each DeleteMin operation takes O(log N): O(N log N) in total

            (3) Record these elements in a second array and then copy the array back
                o O(N)

            o Total time complexity: O(N log N)
            o Memory requirement: uses an extra array, O(N)
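The three steps can be sketched with Python's standard `heapq` module, which implements a binary min heap on a list. The function name `heapsort_extra_array` is illustrative.

```python
import heapq

def heapsort_extra_array(a):
    """Sort a in place using a min heap and a second array."""
    heap = []
    for x in a:
        heapq.heappush(heap, x)                             # (1) N inserts, O(log N) each
    out = [heapq.heappop(heap) for _ in range(len(heap))]   # (2) N deleteMins into a second array
    a[:] = out                                              # (3) copy the array back, O(N)
```

Both `heap` and `out` hold N elements at some point, which is the O(N) extra memory noted above.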

            10

            Heapsort – No Extra Memory

            o Observation: after each deleteMin, the size of the heap shrinks by 1
              o We can use the last cell just freed up to store the element that was just deleted
              o After the last deleteMin, the array will contain the elements in decreasing sorted order

            o To sort the elements in decreasing order, use a min heap

            o To sort the elements in increasing order, use a max heap
              o the parent has a larger element than the child
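A sketch of the in-place variant for increasing order, using a max heap stored in the array itself. The helper name `sift_down` is an assumption, and the heap is built bottom-up here (an O(N) construction) instead of by repeated insertion.

```python
def sift_down(a, i, size):
    """Restore max-heap order below index i within a[0:size]."""
    while True:
        child = 2 * i + 1
        if child >= size:
            return
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1                     # pick the larger child
        if a[i] >= a[child]:
            return
        a[i], a[child] = a[child], a[i]
        i = child

def heapsort(a):
    """Sort a in increasing order with no extra array."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # build the max heap
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]        # deleteMax: store it in the cell just freed
        sift_down(a, 0, end)               # heap now occupies a[0:end]
```

Each pass moves the current maximum into the last cell of the shrinking heap, so the sorted suffix grows from the right.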

            11

            Heapsort – No Extra Memory

            Sort in increasing order: use max heap

            Delete 97

            12

            Mergesort

            Based on divide-and-conquer strategy

            o Divide the list into two smaller lists of about equal sizes

            o Sort each smaller list recursively
            o Merge the two sorted lists to get one sorted list

            o How to divide the list? Running time?
            o How to merge the two sorted lists? Running time?

            14

            Mergesort Divide

            o If the input list is a linked list, dividing takes Θ(N) time
              o We scan the linked list, stop at the (N/2)th entry, and cut the link

            o If the input list is an array A[0..N-1], dividing takes O(1) time
              o we can represent a sublist by two integers, left and right
              o to divide A[left..right], we compute center = (left+right)/2 and obtain A[left..center] and A[center+1..right]

            o Try left = 0, right = 50, center = ?

            15

            Mergesort

            o Divide-and-conquer strategy
              o recursively mergesort the first half and the second half
              o merge the two sorted halves together

            16

            Mergesort

            17

            Mergesort Merge

            o Input: two sorted arrays A and B
            o Output: a sorted array C
            o Three counters: Actr, Bctr, and Cctr
              o initially set to the beginning of their respective arrays

            (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

            (2) When either input list is exhausted, the remainder of the other list is copied to C
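The merge routine, and the recursive sort built on it, might look like this in Python. It is a sketch: the names `merge` and `merge_sort` are illustrative, appending to `c` plays the role of the Cctr counter, and the sort returns a new list instead of copying back into the input.

```python
def merge(a, b):
    """Merge two sorted lists into a new sorted list."""
    c = []
    actr = bctr = 0
    while actr < len(a) and bctr < len(b):
        if a[actr] <= b[bctr]:             # copy the smaller front element to c
            c.append(a[actr])
            actr += 1
        else:
            c.append(b[bctr])
            bctr += 1
    c.extend(a[actr:])                     # one of these two is empty;
    c.extend(b[bctr:])                     # the other is the unexhausted remainder
    return c

def merge_sort(s):
    """Return a sorted copy of s."""
    if len(s) <= 1:
        return list(s)
    center = len(s) // 2
    return merge(merge_sort(s[:center]), merge_sort(s[center:]))
```

Taking from `a` on ties keeps equal elements in their original order, so the sort is stable.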

            18

            Mergesort Merge

            19

            Mergesort Merge

            20

            Mergesort Analysis

            o Merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists

            o Space requirement
              o merging two sorted lists requires linear extra memory
              o additional work to copy to the temporary array and back

            21

            Mergesort Analysis

            o Let T(N) denote the worst-case running time of mergesort to sort N numbers

            o Assume that N is a power of 2

            o Divide step: O(1) time
            o Conquer step: 2T(N/2) time
            o Combine step: O(N) time
            o Recurrence equation
              o T(1) = 1
              o T(N) = 2T(N/2) + N

            22

            Mergesort Analysis

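            \begin{align*}
            T(N) &= 2T(N/2) + N \\
                 &= 2\bigl(2T(N/4) + N/2\bigr) + N \\
                 &= 4T(N/4) + 2N \\
                 &= 4\bigl(2T(N/8) + N/4\bigr) + 2N \\
                 &= 8T(N/8) + 3N \\
                 &= \cdots \\
                 &= 2^k T(N/2^k) + kN
            \end{align*}

            Since N = 2^k, we have k = \log_2 N, so

            \begin{align*}
            T(N) &= 2^k T(N/2^k) + kN \\
                 &= N\,T(1) + N \log_2 N \\
                 &= N + N \log N \\
                 &= O(N \log N)
            \end{align*}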

            23

            Quicksort

            o Divide-and-conquer approach to sorting
            o Like MergeSort, except
              o Don't divide the array in half
              o Partition the array based on elements being less than or greater than some element of the array (the pivot)

            o Worst case running time O(N^2)
            o Average case running time O(N log N)
            o Fastest generic sorting algorithm in practice
            o Even faster if we use a simple sort (e.g., InsertionSort) when the array is small

            24

            Quicksort Algorithm

            o Given array S

            o Modify S so elements in increasing order

            1. If size of S is 0 or 1, return

            2. Pick any element v in S as the pivot

            3. Partition S - {v} into two disjoint groups
               o S1 = {x ∈ (S - {v}) | x ≤ v}
               o S2 = {x ∈ (S - {v}) | x ≥ v}

            4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
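The four steps translate almost directly into a short, not-in-place Python sketch. The function name `quicksort_simple` is illustrative. It takes the middle element as the pivot, sends keys equal to the pivot to S2, and returns a new list instead of modifying S; all three are arbitrary choices made for this illustration.

```python
def quicksort_simple(s):
    """Return a sorted copy of s using the four-step description."""
    if len(s) <= 1:                          # step 1
        return list(s)
    mid = len(s) // 2
    v = s[mid]                               # step 2: any element may serve as pivot
    rest = s[:mid] + s[mid + 1:]             # S - {v}
    s1 = [x for x in rest if x < v]          # step 3: elements below the pivot
    s2 = [x for x in rest if x >= v]         #         elements at or above the pivot
    return quicksort_simple(s1) + [v] + quicksort_simple(s2)   # step 4
```

This version allocates new lists at every level, so it does not show the in-place partitioning that makes QuickSort fast in practice.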

            25

            Quicksort Example

            26

            Why so fast?

            o MergeSort always divides array in half
            o QuickSort might divide array into subproblems of size 1 and N-1
              o When?
              o Leading to O(N^2) performance
              o Need to choose pivot wisely (but efficiently)
            o MergeSort requires temporary array for merge step
            o QuickSort can partition the array in place
              o This more than makes up for bad pivot choices

            27

            Picking the Pivot

            o Choosing the first element
              o What if array already or nearly sorted?
              o Good for random array
            o Choose random pivot
              o Good in practice if truly random
              o Still possible to get some bad choices
              o Requires execution of random number generator

            28

            Picking the Pivot

            o Best choice of pivot?
              o Median of array
            o Median is expensive to calculate
            o Estimate median as the median of three elements
              o Choose the first, middle, and last elements
              o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
            o Has been shown to reduce running time (comparisons) by 14%
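As a small illustration of the median-of-three estimate, the helper below returns the index of the median among the first, middle, and last elements. The name `median_of_three` and the choice of (left + right) // 2 as the middle position are assumptions.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]                # index paired with the middle value
```

For the example array the three candidates are 8, 6, and 0, so the chosen pivot is 6 at index 4.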

              7

              Insertion sort worst-case running time

              o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

              8

              Heapsort

              (1) Build a binary heap of N elements o the minimum element is at the top of the heap

              (2) Perform N DeleteMin operationso the elements are extracted in sorted order

              (3) Record these elements in a second array and then copy the array back

              9

              Heapsort -Analysis

              (1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

              (2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

              (3) Record these elements in a second array and then copy the array backo O(N)

              o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

              10

              Heapsort ndash No Extra Memory

              o Observation after each deleteMin the size of heap shrinks by 1

              o We can use the last cell just freed up to store the element that was just deleted

              after the last deleteMin the array will contain the elements in decreasing sorted order

              o To sort the elements in the decreasing order use a min heap

              o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

              11

              Heapsort ndash No Extra Memory

              Sort in increasing order use max heap

              Delete 97

              12

              Mergesort

              Based on divide-and-conquer strategy

              o Divide the list into two smaller lists of about equal sizes

              o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

              o How to divide the list o Running timeo How to merge the two sorted lists o Running time

              13

              Mergesort

              Based on divide-and-conquer strategy

              o Divide the list into two smaller lists of about equal sizes

              o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

              o How to divide the list o Running timeo How to merge the two sorted lists o Running time

              14

              Mergesort Divide

              o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

              cut the link

              o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

              o Try left=0 right = 50 center=

              15

              Mergesort

              o Divide-and-conquer strategyo recursively mergesort the first half and the

              second halfo merge the two sorted halves together

              16

              Mergesort

              17

              Mergesort Merge

              o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

              o initially set to the beginning of their respective arrays

              (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

              (2) When either input list is exhausted the remainder of the other list is copied to C

              18

              Mergesort Merge

              19

              Mergesort Merge

              20

              Mergesort Analysis

              o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

              o Space requiremento merging two sorted lists requires linear extra

              memoryo additional work to copy to the temporary array

              and back

              21

              Mergesort Analysis

              o Let T(N) denote the worst-case running time of mergesort to sort N numbers

              o Assume that N is a power of 2

              o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

              o T(1) = 1o T(N) = 2T(N2) + N

              22

              Mergesort Analysis

              kNN

              T

              NN

              T

              NNN

              T

              NN

              T

              NNN

              T

              NN

              TNT

              kk

              )2(2

              3)8(8

              2)4

              )8(2(4

              2)4(4

              )2

              )4(2(2

              )2(2)(

              )log(

              log

              )2(2)(

              NNO

              NNN

              kNN

              TNTk

              k

              Since N=2k we have k=log2 n

              23

              Quicksort

              o Divide-and-conquer approach to sortingo Like MergeSort except

              o Donrsquot divide the array in halfo Partition the array based elements being less than or

              greater than some element of the array (the pivot)

              o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

              InsertionSort) when array is small

              24

              Quicksort Algorithm

              o Given array S

              o Modify S so elements in increasing order

              1 If size of S is 0 or 1 return

              2 Pick any element v in S as the pivot

              3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

              4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

              25

              Quicksort Example

              26

              Why so fast

              o MergeSort always divides array in halfo QuickSort might divide array into

              subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

              o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

              merge stepo QuickSort can partition the array in place

              o This more than makes up for bad pivot choices

              27

              Picking the Pivot

              o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

              generator

              28

              Picking the Pivot

              o Best choice of pivoto Median of array

              o Median is expensive to calculateo Estimate median as the median of

              three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

              o Has been shown to reduce running time (comparisons) by 14

              29

              Partitioning Strategy

              o Partitioning is conceptually straightforward but easy to do inefficiently

              o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

              o Increment i until S[i] gt pivot

              o Decrement j until S[j] lt pivot

              o If (i lt j) then swap S[i] and S[j]

              o Swap pivot and S[i]

              30

              Partitioning Example

              31

              Partitioning Example

              32

              Partitioning Strategy

              o How to handle duplicateso Consider the case where all elements

              are equalo Current approach Skip over elements

              equal to pivoto No swaps (good)

              o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

              o Worst case O(N2) performance

              33

              Partitioning Strategy

              o How to handle duplicateso Alternative approach

              o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

              o Adds some unnecessary swapso But results in perfect partitioning for array

              of identical elementso Unlikely for input array but more likely for

              recursive calls to QuickSort

              34

              Small Arrays

              o When S is small generating lots of recursive calls on small sub-arrays is expensive

              o General strategyo When N lt threshold use a sort more efficient for

              small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

              for array of size 2 or less

              o Has been shown to reduce running time by 15

              35

              QuickSort Implementation

              36

              QuickSort Implementation

              37

              QuickSort Implementation

              38

              Analysis of QuickSort

              o Let i be the number of elements sent to the left partition

              o Compute running time T(N) for array of size N

              o T(0) = T(1) = O(1)

              o T(N) = T(i) + T(N ndashi ndash1) + O(N)

              39

              Analysis of QuickSort

              40

              Analysis of QuickSort

              41

              Comparison Sorting

              42

              Comparison Sorting

              43

              Comparison Sorting

              44

              Lower Bound on Sorting

              o Best worst-case sorting algorithm (so far) is O(N log N)



                8

                Heapsort

                (1) Build a binary heap of N elements
                    o the minimum element is at the top of the heap

                (2) Perform N DeleteMin operations
                    o the elements are extracted in sorted order

                (3) Record these elements in a second array and then copy the array back
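The three steps above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: it assumes Python's standard `heapq` module as the binary min heap, and the function name `heapsort_with_copy` is made up for this example. Note that `heapq.heapify` builds the heap in O(N), which is faster than inserting the N elements one at a time.

```python
import heapq


def heapsort_with_copy(a):
    """Sort list a in increasing order using a min heap and a second array."""
    heap = list(a)
    heapq.heapify(heap)                    # (1) build a binary min heap
    n = len(heap)
    second = [heapq.heappop(heap) for _ in range(n)]  # (2) N DeleteMin operations
    a[:] = second                          # (3) copy the second array back
    return a


print(heapsort_with_copy([34, 8, 64, 51, 32, 21]))  # [8, 21, 32, 34, 51, 64]
```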

                9

                Heapsort - Analysis

                (1) Build a binary heap of N elements
                    o repeatedly insert N elements: O(N log N) time

                (2) Perform N DeleteMin operations
                    o each DeleteMin operation takes O(log N): O(N log N) in total

                (3) Record these elements in a second array and then copy the array back
                    o O(N)

                o Total time complexity: O(N log N)
                o Memory requirement: uses an extra array, O(N)

                10

                Heapsort - No Extra Memory

                o Observation: after each deleteMin, the size of the heap shrinks by 1

                o We can use the last cell just freed up to store the element that was just deleted
                    o after the last deleteMin, the array will contain the elements in decreasing sorted order

                o To sort the elements in decreasing order, use a min heap

                o To sort the elements in increasing order, use a max heap
                    o the parent has a larger element than the child

                11

                Heapsort - No Extra Memory

                Sort in increasing order: use a max heap

                Delete 97
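A minimal sketch of the in-place variant, assuming a 0-based array in which the children of index i sit at 2i + 1 and 2i + 2. The helper name `sift_down` is an illustrative choice, not taken from the slides. Each deleted maximum is stored in the cell the shrinking heap has just freed, so the array ends up in increasing order.

```python
def sift_down(a, i, n):
    """Restore the max-heap property for the subtree rooted at i, within a[0:n]."""
    while True:
        child = 2 * i + 1
        if child >= n:
            return
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1                     # pick the larger child
        if a[i] >= a[child]:
            return
        a[i], a[child] = a[child], a[i]
        i = child


def heapsort(a):
    """Sort a in increasing order in place, using a max heap and no second array."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # build the max heap bottom-up
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]        # store the deleted max in the freed cell
        sift_down(a, 0, end)               # the heap now occupies a[0:end]
    return a


print(heapsort([34, 8, 64, 51, 32, 21]))  # [8, 21, 32, 34, 51, 64]
```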

                12

                Mergesort

                Based on divide-and-conquer strategy

                o Divide the list into two smaller lists of about equal sizes

                o Sort each smaller list recursively
                o Merge the two sorted lists to get one sorted list

                o How to divide the list? Running time?
                o How to merge the two sorted lists? Running time?

                14

                Mergesort Divide

                o If the input list is a linked list, dividing takes Θ(N) time
                    o We scan the linked list, stop at the (N/2)-th entry and cut the link

                o If the input list is an array A[0..N-1], dividing takes O(1) time
                    o we can represent a sublist by two integers left and right
                    o to divide A[left..right], we compute center = (left + right) / 2 and obtain A[left..center] and A[center+1..right]

                o Try left = 0, right = 50: center = ?
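The array case can be checked with a short sketch. It assumes integer (floor) division for the center, and the function name `divide` is hypothetical.

```python
def divide(left, right):
    """Split the index range [left, right] into two halves in O(1) time."""
    center = (left + right) // 2           # integer division, assumed here
    return (left, center), (center + 1, right)


print(divide(0, 50))  # ((0, 25), (26, 50))
```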

                15

                Mergesort

                o Divide-and-conquer strategy
                    o recursively mergesort the first half and the second half
                    o merge the two sorted halves together

                16

                Mergesort

                17

                Mergesort Merge

                o Input: two sorted arrays A and B
                o Output: a sorted array C
                o Three counters: Actr, Bctr and Cctr
                    o initially set to the beginning of their respective arrays

                (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

                (2) When either input list is exhausted, the remainder of the other list is copied to C
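The merge routine described above might look like this in Python. It is a sketch under a few assumptions: `Cctr` is implicit because `append` always writes to the next free entry of C, ties take the element from A, and the `merge_sort` driver copies sublists by slicing instead of passing left/right indices.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list c."""
    c = []
    actr = bctr = 0
    # (1) copy the smaller front element and advance the matching counter
    while actr < len(a) and bctr < len(b):
        if a[actr] <= b[bctr]:
            c.append(a[actr])
            actr += 1
        else:
            c.append(b[bctr])
            bctr += 1
    # (2) one input is exhausted: copy the remainder of the other
    c.extend(a[actr:])
    c.extend(b[bctr:])
    return c


def merge_sort(a):
    """Recursively sort each half, then merge the two sorted halves."""
    if len(a) <= 1:
        return list(a)
    center = len(a) // 2
    return merge(merge_sort(a[:center]), merge_sort(a[center:]))


print(merge([1, 13, 24, 26], [2, 15, 27, 38]))  # [1, 2, 13, 15, 24, 26, 27, 38]
```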

                18

                Mergesort Merge

                19

                Mergesort Merge

                20

                Mergesort Analysis

                o Merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists

                o Space requirement
                    o merging two sorted lists requires linear extra memory
                    o additional work to copy to the temporary array and back

                21

                Mergesort Analysis

                o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                o Assume that N is a power of 2

                o Divide step: O(1) time
                o Conquer step: 2 T(N/2) time
                o Combine step: O(N) time
                o Recurrence equation
                    o T(1) = 1
                    o T(N) = 2T(N/2) + N

                22

                Mergesort Analysis

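                \[
                \begin{aligned}
                T(N) &= 2T(N/2) + N \\
                     &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
                     &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
                     &\;\;\vdots \\
                     &= 2^k\,T(N/2^k) + kN
                \end{aligned}
                \]

                Since N = 2^k, we have k = log_2 N, so

                \[
                T(N) = 2^k\,T(N/2^k) + kN = N\,T(1) + N\log_2 N = N + N\log_2 N = O(N \log N)
                \]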
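As a quick sanity check, the recurrence T(1) = 1, T(N) = 2T(N/2) + N can be evaluated directly and compared with the closed form N log_2 N + N for powers of 2. The function name below is illustrative only.

```python
def mergesort_time(n):
    """Evaluate T(1) = 1, T(N) = 2 T(N/2) + N for N a power of 2."""
    if n == 1:
        return 1
    return 2 * mergesort_time(n // 2) + n


for k in range(6):
    n = 2 ** k
    # second column: recurrence value; third column: closed form N*k + N
    print(n, mergesort_time(n), n * k + n)
```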

                23

                Quicksort

                o Divide-and-conquer approach to sorting
                o Like MergeSort, except
                    o Don't divide the array in half
                    o Partition the array based on elements being less than or greater than some element of the array (the pivot)

                o Worst case running time: O(N^2)
                o Average case running time: O(N log N)
                o Fastest generic sorting algorithm in practice
                o Even faster if we use a simple sort (e.g., InsertionSort) when the array is small

                24

                Quicksort Algorithm

                o Given array S

                o Modify S so elements are in increasing order

                1. If the size of S is 0 or 1, return

                2. Pick any element v in S as the pivot

                3. Partition S - {v} into two disjoint groups
                    o S1 = {x ∈ S - {v} | x ≤ v}
                    o S2 = {x ∈ S - {v} | x ≥ v}

                4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
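The four steps translate almost literally into a short, non-in-place sketch. Two choices here are assumptions made for the example: the middle element is used as the pivot (the slide allows any element), and elements equal to the pivot are all sent to S1. The function returns a new list, whereas the slide modifies S.

```python
def quicksort(s):
    """Return a sorted copy of s, following the four steps literally."""
    if len(s) <= 1:                        # step 1
        return list(s)
    mid = len(s) // 2
    v = s[mid]                             # step 2: pivot (middle element, assumed)
    rest = s[:mid] + s[mid + 1:]           # S - {v}
    s1 = [x for x in rest if x <= v]       # step 3: ties go to S1 in this sketch
    s2 = [x for x in rest if x > v]
    return quicksort(s1) + [v] + quicksort(s2)   # step 4


print(quicksort([8, 1, 4, 9, 6, 3, 5, 2, 7, 0]))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```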

                25

                Quicksort Example

                26

                Why so fast?

                o MergeSort always divides the array in half
                o QuickSort might divide the array into subproblems of size 1 and N-1
                    o When?
                    o Leading to O(N^2) performance
                o Need to choose the pivot wisely (but efficiently)
                o MergeSort requires a temporary array for the merge step
                o QuickSort can partition the array in place
                    o This more than makes up for bad pivot choices

                27

                Picking the Pivot

                o Choosing the first element
                    o What if the array is already or nearly sorted?
                    o Good for a random array
                o Choose a random pivot
                    o Good in practice if truly random
                    o Still possible to get some bad choices
                    o Requires execution of a random number generator

                28

                Picking the Pivot

                o Best choice of pivot?
                    o Median of the array
                o Median is expensive to calculate
                o Estimate the median as the median of three elements
                    o Choose the first, middle and last elements
                    o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
                o Has been shown to reduce running time (comparisons) by 14%
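For the example array, a median-of-three pick can be sketched as below. The sketch assumes the middle position is (left + right) // 2, and the function name is hypothetical. With first = 8, middle = 6 and last = 0, the estimated median is 6.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]                # index of the middle value


s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
idx = median_of_three(s, 0, len(s) - 1)
print(idx, s[idx])  # 4 6
```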

                29

                Partitioning Strategy

                o Partitioning is conceptually straightforward, but easy to do inefficiently

                o Good strategy
                    o Swap pivot with last element S[right]
                    o Set i = left
                    o Set j = (right - 1)
                    o While (i < j)
                        o Increment i until S[i] > pivot
                        o Decrement j until S[j] < pivot
                        o If (i < j), then swap S[i] and S[j]
                    o Swap pivot and S[i]

                30

                Partitioning Example

                31

                Partitioning Example

                32

                Partitioning Strategy

                o How to handle duplicates?
                o Consider the case where all elements are equal
                o Current approach: skip over elements equal to the pivot
                    o No swaps (good)
                    o But then i = (right - 1) and the array is partitioned into N-1 and 1 elements
                    o Worst case O(N^2) performance

                33

                Partitioning Strategy

                o How to handle duplicates?
                o Alternative approach
                    o Don't skip elements equal to the pivot
                        o Increment i while S[i] < pivot
                        o Decrement j while S[j] > pivot
                    o Adds some unnecessary swaps
                    o But results in perfect partitioning for an array of identical elements
                        o Unlikely for the input array, but more likely for recursive calls to QuickSort
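A sketch of the alternative approach, in which both scans stop on elements equal to the pivot. The signature (with an explicit `pivot_index`) and the bounds checks in the inner loops are assumptions made so that the example stands alone; the slides' own implementation is not reproduced in this transcript.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place around s[pivot_index].

    Both scans stop on elements equal to the pivot. Returns the pivot's
    final index.
    """
    s[pivot_index], s[right] = s[right], s[pivot_index]   # move pivot to the end
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] < pivot:     # increment i while S[i] < pivot
            i += 1
        while j >= i and s[j] > pivot:     # decrement j while S[j] > pivot
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]        # swap pivot and S[i]
    return i


same = [5] * 9
print(partition(same, 0, 8, 4))  # 4: identical elements split evenly
```

On an array of identical elements the pivot ends up in the middle, so the two recursive subproblems are balanced, at the cost of swapping equal elements.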

                34

                Small Arrays

                o When S is small, generating lots of recursive calls on small sub-arrays is expensive

                o General strategy
                    o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
                    o Good thresholds range from 5 to 20
                    o Also avoids the issue with finding a median-of-three pivot for an array of size 2 or less
                    o Has been shown to reduce running time by 15%
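Putting the pieces together, one possible in-place quicksort with median-of-three pivoting and an insertion-sort cutoff is sketched below. The threshold value `CUTOFF = 10` is an assumed choice within the 5 to 20 range, and all function names are illustrative. The code slides that follow in the deck are not reproduced in this transcript, so this should not be read as the course's implementation.

```python
CUTOFF = 10  # assumed threshold, within the 5-20 range mentioned above


def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:          # small sub-array: use the simple sort
        insertion_sort(s, left, right)
        return
    # Median of three: order s[left] <= s[center] <= s[right].
    center = (left + right) // 2
    if s[center] < s[left]:
        s[left], s[center] = s[center], s[left]
    if s[right] < s[left]:
        s[left], s[right] = s[right], s[left]
    if s[right] < s[center]:
        s[center], s[right] = s[right], s[center]
    # Park the pivot at right - 1; s[left] and s[right] act as sentinels.
    s[center], s[right - 1] = s[right - 1], s[center]
    pivot = s[right - 1]
    i, j = left, right - 1
    while True:
        i += 1
        while s[i] < pivot:                # stops on elements equal to the pivot
            i += 1
        j -= 1
        while s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
    s[i], s[right - 1] = s[right - 1], s[i]    # restore the pivot
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0, 11, 10]
quicksort(data)
print(data)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```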

                35

                QuickSort Implementation

                36

                QuickSort Implementation

                37

                QuickSort Implementation

                38

                Analysis of QuickSort

                o Let i be the number of elements sent to the left partition

                o Compute running time T(N) for array of size N

                o T(0) = T(1) = O(1)

                o T(N) = T(i) + T(N - i - 1) + O(N)
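To get a feel for how the split size i drives the running time, the recurrence can be evaluated numerically. The sketch below assumes T(0) = T(1) = 1 and takes the O(N) term to be exactly N. The `split` argument is a hypothetical helper that returns i for a given sub-array size.

```python
def quicksort_cost(n, split):
    """Evaluate T(N) = T(i) + T(N - i - 1) + N bottom-up, with i = split(N)."""
    t = [1, 1] + [0] * max(0, n - 1)       # T(0) = T(1) = 1 (assumed constants)
    for size in range(2, n + 1):
        i = split(size)
        t[size] = t[i] + t[size - i - 1] + size
    return t[n]


worst = quicksort_cost(1000, lambda size: 0)                 # pivot is always the smallest
balanced = quicksort_cost(1000, lambda size: (size - 1) // 2)  # pivot is always the median
print(worst, balanced)
```

With i = 0 at every level the cost grows quadratically, while an even split gives growth on the order of N log N.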

                39

                Analysis of QuickSort

                40

                Analysis of QuickSort

                41

                Comparison Sorting

                42

                Comparison Sorting

                43

                Comparison Sorting

                44

                Lower Bound on Sorting

                o Best worst-case sorting algorithm (so far) is O(N log N)

                o Can we do better?
                o Can we prove a lower bound on the sorting problem?
                o Preview
                    o For comparison sorting, no, we can't do better
                    o Can show a lower bound of Ω(N log N)

                45

                Decision Trees

                o A decision tree is a binary tree
                    o Each node represents a set of possible orderings of the array elements
                    o Each branch represents an outcome of a particular comparison
                o Each leaf of the decision tree represents a particular ordering of the original array elements

                46

                Decision Trees

                47

                Decision Tree for Sorting

                o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf

                o In the average case, the number of comparisons is the average of the depths of all leaves

                o There are N! different orderings of N elements
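Since a binary tree with L leaves has depth at least ⌈log_2 L⌉, a decision tree with N! leaves forces at least ⌈log_2 N!⌉ comparisons in the worst case. The sketch below computes that bound exactly with integer arithmetic. The function name is illustrative.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest depth of a binary tree with n! leaves: ceil(log2(n!))."""
    leaves = math.factorial(n)
    return (leaves - 1).bit_length()       # exact ceil(log2(leaves)) for leaves >= 1


for n in (3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n))
```

For N = 3 there are 6 orderings, so no comparison sort can always finish in fewer than 3 comparisons.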

                48

                Lower Bound for Comparison Sorting

                o Lemma 7.1
                o Lemma 7.2
                o Theorem 7.6
                o Theorem 7.7

                49

                Linear Sorting

                o Some constraints on the input array allow faster than Θ(N log N) sorting (no comparisons)

                o CountingSort
                    o Given array A of N integer elements, each less than M
                    o Create array C of size M, where C[i] is the number of i's in A
                    o Use C to place elements into new sorted array B
                    o Running time Θ(N+M) = Θ(N) if M = Θ(N)
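A minimal CountingSort sketch following these steps. It assumes the elements are non-negative integers in the range 0..M-1; the slide only states "less than M".

```python
def counting_sort(a, m):
    """Sort a list of integers in the range 0..m-1 in Theta(N + M) time."""
    c = [0] * m
    for x in a:                            # C[i] = number of i's in A
        c[x] += 1
    b = []
    for value, count in enumerate(c):      # place elements into sorted array B
        b.extend([value] * count)
    return b


print(counting_sort([3, 1, 4, 1, 5, 0, 2, 5], 6))  # [0, 1, 1, 2, 3, 4, 5, 5]
```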

                50

                Linear Sorting

                o BucketSort

                o Assume N elements of A uniformly distributed over the range [0, 1)

                o Create N equal-sized buckets over [0, 1)
                o Add each element of A into the appropriate bucket
                o Sort each bucket (e.g., with InsertionSort)
                o Return the concatenation of the buckets
                o Average case running time Θ(N)
                    o Assumes each bucket will contain Θ(1) elements
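The BucketSort steps can be sketched as follows. The bucket index `int(x * n)` and the clamp to `n - 1` (a guard against floating-point rounding) are choices made for this example.

```python
def insertion_sort(b):
    """Sort the list b in place."""
    for p in range(1, len(b)):
        tmp = b[p]
        j = p
        while j > 0 and b[j - 1] > tmp:
            b[j] = b[j - 1]
            j -= 1
        b[j] = tmp


def bucket_sort(a):
    """Sort floats drawn from [0, 1) using N equal-sized buckets."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[min(int(x * n), n - 1)].append(x)   # bucket k covers [k/n, (k+1)/n)
    result = []
    for b in buckets:
        insertion_sort(b)                  # each bucket is expected to be small
        result.extend(b)                   # concatenate the buckets in order
    return result


print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```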

                51

                External Sorting

                o What if the N elements we wish to sort do not fit in memory?

                o Obviously, our existing sort algorithms are inefficient
                    o Each comparison potentially requires a disk access

                o We want to minimize disk accesses

                52

                External Mergesorting

                o N = number of elements in array A to be sorted
                o M = number of elements that fit in memory
                o K = ⌈N/M⌉
                o Approach
                    o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
                    o Repeat above K times until all of A is processed
                    o Create K input buffers and 1 output buffer, each of size M/(K+1)
                    o Perform a K-way merge: O(N)
                        o Update input buffers one disk-page at a time
                        o Write output buffer one disk-page at a time
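The overall shape of the approach can be simulated in memory. This is only a sketch under loud assumptions: Python lists stand in for disk files, the built-in `sorted` (not QuickSort) sorts each run, page-sized buffering is not modelled, and `heapq.merge` performs the K-way merge. A heap-based merge costs O(N log K), which matches the slide's O(N) only when K is treated as a constant.

```python
import heapq
import math


def external_mergesort(a, m):
    """Simulate external mergesort: sort runs of size m, then K-way merge them."""
    if not a:
        return []
    k = math.ceil(len(a) / m)              # K = ceil(N / M) runs
    # Each run stands in for a chunk read from disk, sorted, and written back.
    runs = [sorted(a[r * m:(r + 1) * m]) for r in range(k)]
    # heapq.merge reads the K sorted runs lazily, like K input buffers.
    return list(heapq.merge(*runs))


print(external_mergesort([9, 4, 7, 1, 8, 2, 6, 3, 5, 0], 3))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```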

                53

                External Mergesort

                o T(N, M) = O((K · M log M) + N)
                o T(N, M) = O(((N/M) · M log M) + N)
                o T(N, M) = O((N log M) + N)
                o T(N, M) = O(N log M)
                o Disk accesses (all sequential)
                    o P = page size
                    o Accesses = 4N/P (read-all/write-all twice)
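A small worked example of these formulas. The sizes are invented for illustration: N = 1,000,000 elements, M = 100,000 elements of memory, P = 1,000 elements per page.

```python
import math


def external_mergesort_cost(n, m, p):
    """Return (K, page accesses) for N elements, memory M, page size P."""
    k = math.ceil(n / m)                   # number of sorted runs
    accesses = 4 * n / p                   # read all and write all, twice
    return k, accesses


print(external_mergesort_cost(1_000_000, 100_000, 1_000))  # (10, 4000.0)
```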

                54

                Summary


                  9

                  Heapsort -Analysis

                  (1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

                  (2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

                  (3) Record these elements in a second array and then copy the array backo O(N)

                  o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

                  10

                  Heapsort ndash No Extra Memory

                  o Observation after each deleteMin the size of heap shrinks by 1

                  o We can use the last cell just freed up to store the element that was just deleted

                  after the last deleteMin the array will contain the elements in decreasing sorted order

                  o To sort the elements in the decreasing order use a min heap

                  o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

                  11

                  Heapsort ndash No Extra Memory

                  Sort in increasing order use max heap

                  Delete 97

                  12

                  Mergesort

                  Based on divide-and-conquer strategy

                  o Divide the list into two smaller lists of about equal sizes

                  o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

                  o How to divide the list o Running timeo How to merge the two sorted lists o Running time

                  13

                  Mergesort

                  Based on divide-and-conquer strategy

                  o Divide the list into two smaller lists of about equal sizes

                  o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

                  o How to divide the list o Running timeo How to merge the two sorted lists o Running time

                  14

                  Mergesort Divide

                  o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

                  cut the link

                  o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

                  o Try left=0 right = 50 center=

                  15

                  Mergesort

                  o Divide-and-conquer strategyo recursively mergesort the first half and the

                  second halfo merge the two sorted halves together

                  16

                  Mergesort

                  17

                  Mergesort Merge

                  o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

                  o initially set to the beginning of their respective arrays

                  (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

                  (2) When either input list is exhausted the remainder of the other list is copied to C

                  18

                  Mergesort Merge

                  19

                  Mergesort Merge

                  20

                  Mergesort Analysis

                  o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

                  o Space requiremento merging two sorted lists requires linear extra

                  memoryo additional work to copy to the temporary array

                  and back

                  21

                  Mergesort Analysis

                  o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                  o Assume that N is a power of 2

                  o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

                  o T(1) = 1o T(N) = 2T(N2) + N

                  22

                  Mergesort Analysis

                  kNN

                  T

                  NN

                  T

                  NNN

                  T

                  NN

                  T

                  NNN

                  T

                  NN

                  TNT

                  kk

                  )2(2

                  3)8(8

                  2)4

                  )8(2(4

                  2)4(4

                  )2

                  )4(2(2

                  )2(2)(

                  )log(

                  log

                  )2(2)(

                  NNO

                  NNN

                  kNN

                  TNTk

                  k

                  Since N=2k we have k=log2 n

                  23

                  Quicksort

                  o Divide-and-conquer approach to sortingo Like MergeSort except

                  o Donrsquot divide the array in halfo Partition the array based elements being less than or

                  greater than some element of the array (the pivot)

                  o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

                  InsertionSort) when array is small

                  24

                  Quicksort Algorithm

                  o Given array S

                  o Modify S so elements in increasing order

                  1 If size of S is 0 or 1 return

                  2 Pick any element v in S as the pivot

                  3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

                  4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

                  25

                  Quicksort Example

                  26

                  Why so fast

                  o MergeSort always divides array in halfo QuickSort might divide array into

                  subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

                  o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

                  merge stepo QuickSort can partition the array in place

                  o This more than makes up for bad pivot choices

                  27

                  Picking the Pivot

                  o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                  generator

                  28

                  Picking the Pivot

                  o Best choice of pivoto Median of array

                  o Median is expensive to calculateo Estimate median as the median of

                  three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                  o Has been shown to reduce running time (comparisons) by 14

                  29

                  Partitioning Strategy

                  o Partitioning is conceptually straightforward but easy to do inefficiently

                  o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                  o Increment i until S[i] gt pivot

                  o Decrement j until S[j] lt pivot

                  o If (i lt j) then swap S[i] and S[j]

                  o Swap pivot and S[i]

                  30

                  Partitioning Example

                  31

                  Partitioning Example

                  32

                  Partitioning Strategy

                  o How to handle duplicateso Consider the case where all elements

                  are equalo Current approach Skip over elements

                  equal to pivoto No swaps (good)

                  o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                  o Worst case O(N2) performance

                  33

                  Partitioning Strategy

                  o How to handle duplicateso Alternative approach

                  o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                  o Adds some unnecessary swapso But results in perfect partitioning for array

                  of identical elementso Unlikely for input array but more likely for

                  recursive calls to QuickSort

                  34

                  Small Arrays

                  o When S is small generating lots of recursive calls on small sub-arrays is expensive

                  o General strategyo When N lt threshold use a sort more efficient for

                  small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                  for array of size 2 or less

                  o Has been shown to reduce running time by 15

                  35

                  QuickSort Implementation

                  36

                  QuickSort Implementation

                  37

                  QuickSort Implementation

                  38

                  Analysis of QuickSort

                  o Let i be the number of elements sent to the left partition

                  o Compute running time T(N) for array of size N

                  o T(0) = T(1) = O(1)

                  o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                  39

                  Analysis of QuickSort

                  40

                  Analysis of QuickSort

                  41

                  Comparison Sorting

                  42

                  Comparison Sorting

                  43

                  Comparison Sorting

                  44

                  Lower Bound on Sorting

                  o Best worst-case sorting algorithm (so far) is O(N log N)

                  o Can we do bettero Can we prove a lower bound on the

                  sorting problemo Preview

                  o For comparison sorting no we canrsquot do better

                  o Can show lower bound of Ω(N log N)

                  45

                  Decision Trees

                  o A decision tree is a binary treeo Each node represents a set of possible

                  orderings of the array elementso Each branch represents an outcome of

                  a particular comparison

                  o Each leaf of the decision tree represents a particular ordering of the original array elements

                  46

                  Decision Trees

                  47

                  Decision Tree for Sorting

                  o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                  o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                  o In the average case the number of comparisons is the average of the depths of all leaves

                  o There are N different orderings of N elements

                  48

                  Lower Bound for Comparison Sorting

                  o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                  49

                  Linear Sorting

                  o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                  o CountingSort

                  o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                  irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                    10

                    Heapsort – No Extra Memory

                    o Observation: after each deleteMin, the size of the heap shrinks by 1
                    o We can use the last cell just freed up to store the element that was just deleted
                        o after the last deleteMin, the array will contain the elements in decreasing sorted order
                    o To sort the elements in decreasing order, use a min heap
                    o To sort the elements in increasing order, use a max heap
                        o the parent has a larger element than the child
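The idea of reusing the freed cell can be sketched in a few lines of Python. This is a minimal illustration, not the code from the slides: the names `heapsort` and `sift_down` are assumptions, and the heap is stored in the list itself with the children of index `i` at `2i + 1` and `2i + 2`.

```python
def heapsort(a):
    """Sort the list a in place in increasing order using a max heap."""
    n = len(a)

    def sift_down(root, size):
        # Move a[root] down until neither child (within a[0:size]) is larger.
        while True:
            child = 2 * root + 1
            if child >= size:
                return
            if child + 1 < size and a[child + 1] > a[child]:
                child += 1          # pick the larger of the two children
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    # Build the max heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # Each deleteMax shrinks the heap by one; the freed last cell
    # receives the element that was just deleted.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```

Because the largest element is moved to the end first, a max heap yields increasing order without any second array.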

                    11

                    Heapsort – No Extra Memory

                    Sort in increasing order: use a max heap

                    [Figure: delete 97 from the max heap]

                    12

                    Mergesort

                    Based on divide-and-conquer strategy

                    o Divide the list into two smaller lists of about equal sizes
                    o Sort each smaller list recursively
                    o Merge the two sorted lists to get one sorted list
                    o How to divide the list? Running time?
                    o How to merge the two sorted lists? Running time?

                    14

                    Mergesort Divide

                    o If the input list is a linked list, dividing takes Θ(N) time
                        o We scan the linked list, stop at the (N/2)th entry and cut the link
                    o If the input list is an array A[0..N-1], dividing takes O(1) time
                        o we can represent a sublist by two integers, left and right
                        o to divide A[left..right], we compute center = (left + right)/2 and obtain A[left..center] and A[center+1..right]
                    o Try left = 0, right = 50: center = ?

                    15

                    Mergesort

                    o Divide-and-conquer strategy
                        o recursively mergesort the first half and the second half
                        o merge the two sorted halves together
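The recursive structure can be sketched as follows. This is an illustrative version under stated assumptions: the function name `mergesort` is hypothetical, it returns a new list rather than sorting in place, and the merge step is delegated to the standard-library `heapq.merge` so that only the divide and recursion are on display.

```python
from heapq import merge  # standard-library merge of already-sorted inputs


def mergesort(a):
    """Return a new sorted list; the input list is left unchanged."""
    if len(a) <= 1:                   # base case: nothing to divide
        return list(a)
    center = len(a) // 2              # divide into two halves of about equal size
    left = mergesort(a[:center])      # sort the first half recursively
    right = mergesort(a[center:])     # sort the second half recursively
    return list(merge(left, right))   # combine the two sorted halves
```

Slicing copies the halves, so this sketch uses more memory than an index-based version that passes `left` and `right` bounds into a shared array.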

                    16

                    Mergesort

                    17

                    Mergesort Merge

                    o Input: two sorted arrays A and B
                    o Output: a sorted array C
                    o Three counters: Actr, Bctr and Cctr
                        o initially set to the beginning of their respective arrays

                    (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

                    (2) When either input list is exhausted, the remainder of the other list is copied to C
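A direct transcription of the two steps into Python might look like this. The counter names follow the slide (`actr`, `bctr`, `cctr`); the function name `merge` and the choice to take from A on ties are assumptions made for the sketch.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list c."""
    c = [None] * (len(a) + len(b))
    actr = bctr = cctr = 0

    # Step (1): copy the smaller front element and advance the counters.
    while actr < len(a) and bctr < len(b):
        if a[actr] <= b[bctr]:
            c[cctr] = a[actr]
            actr += 1
        else:
            c[cctr] = b[bctr]
            bctr += 1
        cctr += 1

    # Step (2): one input is exhausted; copy the remainder of the other.
    while actr < len(a):
        c[cctr] = a[actr]
        actr += 1
        cctr += 1
    while bctr < len(b):
        c[cctr] = b[bctr]
        bctr += 1
        cctr += 1
    return c
```

Every element is copied exactly once, which is where the linear merge time comes from.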

                    18

                    Mergesort Merge

                    19

                    Mergesort Merge

                    20

                    Mergesort Analysis

                    o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

                    o Space requirement
                        o merging two sorted lists requires linear extra memory
                        o additional work to copy to the temporary array and back

                    21

                    Mergesort Analysis

                    o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                    o Assume that N is a power of 2

                    o Divide step: O(1) time
                    o Conquer step: 2T(N/2) time
                    o Combine step: O(N) time
                    o Recurrence equation
                        o T(1) = 1
                        o T(N) = 2T(N/2) + N

                    22

                    Mergesort Analysis

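                    \begin{align*}
                    T(N) &= 2T(N/2) + N \\
                         &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
                         &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
                         &\;\;\vdots \\
                         &= 2^k T(N/2^k) + kN
                    \end{align*}

                    Since N = 2^k, we have k = \log_2 N, so

                    \begin{align*}
                    T(N) &= 2^k T(N/2^k) + kN \\
                         &= N\,T(1) + N \log_2 N \\
                         &= N + N \log N = O(N \log N)
                    \end{align*}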
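As a quick sanity check, the recurrence T(1) = 1, T(N) = 2T(N/2) + N can be evaluated directly and compared with the closed form N log2 N + N for powers of two. The helper name `t` is an assumption used only for this check.

```python
def t(n):
    """Evaluate T(1) = 1, T(N) = 2 T(N/2) + N for N a power of two."""
    return 1 if n == 1 else 2 * t(n // 2) + n


# Compare against the closed form N * k + N, where N = 2**k.
for k in range(11):
    n = 2 ** k
    assert t(n) == n * k + n
```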

                    23

                    Quicksort

                    o Divide-and-conquer approach to sorting
                    o Like MergeSort, except
                        o Don't divide the array in half
                        o Partition the array based on elements being less than or greater than some element of the array (the pivot)
                    o Worst case running time O(N^2)
                    o Average case running time O(N log N)
                    o Fastest generic sorting algorithm in practice
                    o Even faster if we use a simple sort (e.g., InsertionSort) when the array is small

                    24

                    Quicksort Algorithm

                    o Given array S

                    o Modify S so elements in increasing order

                    1. If the size of S is 0 or 1, return

                    2. Pick any element v in S as the pivot

                    3. Partition S - {v} into two disjoint groups
                        o S1 = {x ∈ (S - {v}) | x ≤ v}
                        o S2 = {x ∈ (S - {v}) | x ≥ v}

                    4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
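The four steps translate almost literally into a short, list-based Python sketch. It is not an in-place implementation; the function name `quicksort`, the choice of the middle element as pivot, and sending elements equal to the pivot to S1 are all assumptions made for illustration.

```python
def quicksort(s):
    """Return a new list with the elements of s in increasing order."""
    if len(s) <= 1:                       # step 1
        return list(s)
    mid = len(s) // 2
    v = s[mid]                            # step 2: any element may be the pivot
    rest = s[:mid] + s[mid + 1:]          # S - {v}
    s1 = [x for x in rest if x <= v]      # step 3 (assumption: ties go to S1)
    s2 = [x for x in rest if x > v]
    return quicksort(s1) + [v] + quicksort(s2)   # step 4
```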

                    25

                    Quicksort Example

                    26

                    Why so fast

                    o MergeSort always divides array in half
                    o QuickSort might divide array into subproblems of size 1 and N-1
                        o When?
                        o Leading to O(N^2) performance
                    o Need to choose pivot wisely (but efficiently)
                    o MergeSort requires temporary array for merge step
                    o QuickSort can partition the array in place
                        o This more than makes up for bad pivot choices

                    27

                    Picking the Pivot

                    o Choosing the first element
                        o What if array already or nearly sorted?
                        o Good for random array
                    o Choose random pivot
                        o Good in practice if truly random
                        o Still possible to get some bad choices
                        o Requires execution of random number generator

                    28

                    Picking the Pivot

                    o Best choice of pivot?
                        o Median of array
                    o Median is expensive to calculate
                    o Estimate median as the median of three elements
                        o Choose first, middle and last elements
                        o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
                    o Has been shown to reduce running time (comparisons) by 14%
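A small sketch of the median-of-three estimate, applied to the example array from the slide. The function name `median_of_three` is hypothetical, and taking the middle index as `(left + right) // 2` is an assumption consistent with the center computation used for mergesort.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]   # index belonging to the middle value


s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
pivot_index = median_of_three(s, 0, len(s) - 1)
# First, middle and last elements are 8, 6 and 0, so the chosen pivot is 6.
```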

                    29

                    Partitioning Strategy

                    o Partitioning is conceptually straightforward, but easy to do inefficiently
                    o Good strategy
                        o Swap pivot with last element S[right]
                        o Set i = left
                        o Set j = (right - 1)
                        o While (i < j)
                            o Increment i until S[i] > pivot
                            o Decrement j until S[j] < pivot
                            o If (i < j), then swap S[i] and S[j]
                        o Swap pivot and S[i]

                    30

                    Partitioning Example

                    31

                    Partitioning Example

                    32

                    Partitioning Strategy

                    o How to handle duplicates?
                    o Consider the case where all elements are equal
                    o Current approach: skip over elements equal to pivot
                        o No swaps (good)
                        o But then i = (right - 1) and array partitioned into N-1 and 1 elements
                        o Worst case O(N^2) performance

                    33

                    Partitioning Strategy

                    o How to handle duplicates?
                    o Alternative approach
                        o Don't skip elements equal to pivot
                        o Increment i while S[i] < pivot
                        o Decrement j while S[j] > pivot
                        o Adds some unnecessary swaps
                        o But results in perfect partitioning for array of identical elements
                            o Unlikely for input array, but more likely for recursive calls to QuickSort
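The alternative approach, in which both scans stop on elements equal to the pivot, can be sketched as an in-place routine. This is an illustration rather than the slides' own code: the name `partition` is hypothetical, the pivot is assumed to have already been swapped into `s[right]`, and the explicit `j > left` guard is an added safety check.

```python
def partition(s, left, right):
    """Partition s[left..right] around the pivot s[right]; return its final index."""
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:                 # stops at i == right at the latest
            i += 1
        while j > left and s[j] > pivot:    # guard keeps j inside the range
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]             # both stopped: swap, even if equal
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]         # put the pivot between the groups
    return i
```

On an array of identical elements the two scans meet in the middle, so the pivot lands near the center instead of at one end.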

                    34

                    Small Arrays

                    o When S is small, generating lots of recursive calls on small sub-arrays is expensive
                    o General strategy
                        o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
                        o Good thresholds range from 5 to 20
                        o Also avoids issue with finding median-of-three pivot for array of size 2 or less
                    o Has been shown to reduce running time by 15%
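Putting the pieces together, a quicksort with a small-array cutoff might look like the sketch below. The threshold value `CUTOFF = 10` is an assumed choice from the 5 to 20 range, the function names are hypothetical, and the pivot handling (median-of-three, with the pivot parked at `right - 1`) is one common textbook arrangement rather than necessarily the one on the implementation slides.

```python
CUTOFF = 10  # assumed threshold, chosen from the 5-20 range


def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place and return s."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:           # small sub-array: no recursion
        insertion_sort(s, left, right)
        return s
    # Median-of-three: order s[left] <= s[center] <= s[right].
    center = (left + right) // 2
    if s[center] < s[left]:
        s[left], s[center] = s[center], s[left]
    if s[right] < s[left]:
        s[left], s[right] = s[right], s[left]
    if s[right] < s[center]:
        s[center], s[right] = s[right], s[center]
    s[center], s[right - 1] = s[right - 1], s[center]   # park the pivot
    pivot = s[right - 1]
    i, j = left, right - 1
    while True:
        i += 1
        while s[i] < pivot:     # s[right - 1] == pivot stops this scan
            i += 1
        j -= 1
        while s[j] > pivot:     # s[left] <= pivot stops this scan
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
    s[i], s[right - 1] = s[right - 1], s[i]             # restore the pivot
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)
    return s
```

Because sub-arrays below the cutoff never reach the median-of-three code, the pivot selection always has at least three distinct positions to work with.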

                    35

                    QuickSort Implementation

                    36

                    QuickSort Implementation

                    37

                    QuickSort Implementation

                    38

                    Analysis of QuickSort

                    o Let i be the number of elements sent to the left partition

                    o Compute running time T(N) for array of size N

                    o T(0) = T(1) = O(1)

                    o T(N) = T(i) + T(N - i - 1) + O(N)
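To see how strongly the split size i affects this recurrence, it can be evaluated numerically for two extreme pivot choices. This is a rough sketch: the O(N) term is taken to be exactly N, the base cases are taken to be exactly 1, and the helper name `cost` is hypothetical.

```python
def cost(n, split):
    """Evaluate T(N) = T(i) + T(N - i - 1) + N with T(0) = T(1) = 1, i = split(N)."""
    t = [1, 1] + [0] * (n - 1)
    for m in range(2, n + 1):
        i = split(m)
        t[m] = t[i] + t[m - i - 1] + m
    return t[n]


worst = cost(1000, lambda m: 0)                 # pivot is always the smallest element
balanced = cost(1000, lambda m: (m - 1) // 2)   # pivot is always the median
```

With i = 0 the total grows quadratically, while the balanced split stays close to N log N.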

                    39

                    Analysis of QuickSort

                    40

                    Analysis of QuickSort

                    41

                    Comparison Sorting

                    42

                    Comparison Sorting

                    43

                    Comparison Sorting

                    44

                    Lower Bound on Sorting

                    o Best worst-case sorting algorithm (so far) is O(N log N)

                    o Can we do better?
                    o Can we prove a lower bound on the sorting problem?
                    o Preview
                        o For comparison sorting, no, we can't do better
                        o Can show lower bound of Ω(N log N)

                    45

                    Decision Trees

                    o A decision tree is a binary tree
                        o Each node represents a set of possible orderings of the array elements
                        o Each branch represents an outcome of a particular comparison

                    o Each leaf of the decision tree represents a particular ordering of the original array elements

                    46

                    Decision Trees

                    47

                    Decision Tree for Sorting

                    o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                    o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                    o In the average case the number of comparisons is the average of the depths of all leaves

                    o There are N! different orderings of N elements
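Since a binary tree of depth d has at most 2^d leaves, a decision tree with one leaf per ordering needs depth at least ⌈log2(N!)⌉. The sketch below computes that bound exactly with integer arithmetic; the function name is hypothetical.

```python
from math import factorial


def min_worst_case_comparisons(n):
    """Return ceil(log2(n!)), the least depth of a binary tree with n! leaves."""
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (factorial(n) - 1).bit_length()
```

For example, sorting 3 elements needs at least 3 comparisons in the worst case, and sorting 5 elements needs at least 7.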

                    48

                    Lower Bound for Comparison Sorting

                    o Lemma 7.1
                    o Lemma 7.2
                    o Theorem 7.6
                    o Theorem 7.7

                    49

                    Linear Sorting

                    o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                    o CountingSort

                        o Given array A of N integer elements, each less than M
                        o Create array C of size M, where C[i] is the number of i's in A
                        o Use C to place elements into new sorted array B
                        o Running time Θ(N+M) = Θ(N) if M = Θ(N)
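The CountingSort steps can be sketched directly. The function name `counting_sort` is hypothetical, and the elements are assumed to be non-negative integers smaller than `m`.

```python
def counting_sort(a, m):
    """Sort a list of integers in the range 0..m-1 without comparing elements."""
    c = [0] * m
    for x in a:                     # C[i] = number of i's in A
        c[x] += 1
    b = []
    for value, count in enumerate(c):
        b.extend([value] * count)   # emit each value as often as it occurred
    return b
```

One pass over A and one pass over C give the Θ(N+M) running time.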

                    50

                    Linear Sorting

                    o BucketSort

                    o Assume N elements of A uniformly distributed over the range [0, 1)
                    o Create N equal-sized buckets over [0, 1)
                    o Add each element of A into appropriate bucket
                    o Sort each bucket (e.g., with InsertionSort)
                    o Return concatenation of buckets
                    o Average case running time Θ(N)
                        o Assumes each bucket will contain Θ(1) elements
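A minimal BucketSort sketch under the same assumption of inputs in [0, 1). The function name `bucket_sort` is hypothetical. Instead of a separate sorting pass, each bucket is kept sorted as elements arrive using the standard-library `bisect.insort`, which plays the role of the per-bucket InsertionSort.

```python
from bisect import insort


def bucket_sort(a):
    """Sort a list of floats drawn from [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]          # N equal-sized buckets over [0, 1)
    for x in a:
        index = min(int(x * n), n - 1)        # guard against rounding up to n
        insort(buckets[index], x)             # insert into sorted position
    result = []
    for bucket in buckets:                    # concatenate the buckets in order
        result.extend(bucket)
    return result
```

If the input is far from uniform, many elements can land in one bucket and the running time degrades toward that of the insertion step.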

                    51

                    External Sorting

                    o What if the number of elements N we wish to sort does not fit in memory?
                    o Obviously, our existing sort algorithms are inefficient
                        o Each comparison potentially requires a disk access

                    o We want to minimize disk accesses

                    52

                    External Mergesorting

                    o N = number of elements in array A to be sorted
                    o M = number of elements that fit in memory
                    o K = ⌈N/M⌉
                    o Approach
                        o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
                        o Repeat above K times until all of A processed
                        o Create K input buffers and 1 output buffer, each of size M/(K+1)
                        o Perform a K-way merge: O(N)
                            o Update input buffers one disk-page at a time
                            o Write output buffer one disk-page at a time
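The two phases (sorted runs, then a K-way merge) can be sketched with ordinary files. Everything concrete here is an assumption for illustration: the function name `external_mergesort`, the file format (one integer per line), the use of Python's built-in sort for each run in place of QuickSort, and reliance on the operating system's file buffering instead of explicit M/(K+1)-sized buffers. The merge uses the standard-library `heapq.merge`, which costs O(N log K) comparisons.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_mergesort(in_path, out_path, m):
    """Sort a file of one integer per line, holding at most m values per run."""
    run_paths = []
    # Phase 1: read m values at a time, sort them, write each run to disk.
    with open(in_path) as src:
        while True:
            chunk = [int(line) for line in islice(src, m)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)

    # Phase 2: K-way merge of the sorted runs, streamed line by line.
    runs = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in r) for r in runs]
        with open(out_path, "w") as dst:
            dst.writelines(f"{x}\n" for x in heapq.merge(*streams))
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)
```

Each run is read and written sequentially, which is the access pattern the disk-access count on the next slide assumes.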

                    53

                    External Mergesort

                    o T(N, M) = O(K · M log M + N)
                    o T(N, M) = O((N/M) · M log M + N)
                    o T(N, M) = O(N log M + N)
                    o T(N, M) = O(N log M)
                    o Disk accesses (all sequential)
                        o P = page size
                        o Accesses = 4N/P (read-all/write-all twice)

                    54

                    Summary

                    • G64ADS Advanced Data Structures
                    • Insertion sort
                    • Slide 3
                    • Slide 4
                    • Slide 5
                    • Slide 6
                    • Insertion sort worst-case running time
                    • Heapsort
                    • Heapsort - Analysis
                    • Heapsort – No Extra Memory
                    • Slide 11
                    • Mergesort
                    • Slide 13
                    • Mergesort Divide
                    • Slide 15
                    • Slide 16
                    • Mergesort Merge
                    • Slide 18
                    • Slide 19
                    • Mergesort Analysis
                    • Slide 21
                    • Slide 22
                    • Quicksort
                    • Quicksort Algorithm
                    • Quicksort Example
                    • Why so fast
                    • Picking the Pivot
                    • Slide 28
                    • Partitioning Strategy
                    • Partitioning Example
                    • Slide 31
                    • Slide 32
                    • Slide 33
                    • Small Arrays
                    • QuickSort Implementation
                    • Slide 36
                    • Slide 37
                    • Analysis of QuickSort
                    • Slide 39
                    • Slide 40
                    • Comparison Sorting
                    • Slide 42
                    • Slide 43
                    • Lower Bound on Sorting
                    • Decision Trees
                    • Slide 46
                    • Decision Tree for Sorting
                    • Lower Bound for Comparison Sorting
                    • Linear Sorting
                    • Slide 50
                    • External Sorting
                    • External Mergesorting
                    • External Mergesort
                    • Summary

                      11

                      Heapsort ndash No Extra Memory

                      Sort in increasing order use max heap

                      Delete 97

                      12

                      Mergesort

                      Based on divide-and-conquer strategy

                      o Divide the list into two smaller lists of about equal sizes

                      o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

                      o How to divide the list o Running timeo How to merge the two sorted lists o Running time

                      13

                      Mergesort

                      Based on divide-and-conquer strategy

                      o Divide the list into two smaller lists of about equal sizes

                      o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

                      o How to divide the list o Running timeo How to merge the two sorted lists o Running time

                      14

                      Mergesort Divide

                      o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

                      cut the link

                      o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

                      o Try left=0 right = 50 center=

                      15

                      Mergesort

                      o Divide-and-conquer strategyo recursively mergesort the first half and the

                      second halfo merge the two sorted halves together

                      16

                      Mergesort

                      17

                      Mergesort Merge

                      o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

                      o initially set to the beginning of their respective arrays

                      (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

                      (2) When either input list is exhausted the remainder of the other list is copied to C

                      18

                      Mergesort Merge

                      19

                      Mergesort Merge

                      20

                      Mergesort Analysis

                      o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

                      o Space requiremento merging two sorted lists requires linear extra

                      memoryo additional work to copy to the temporary array

                      and back

                      21

                      Mergesort Analysis

                      o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                      o Assume that N is a power of 2

                      o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

                      o T(1) = 1o T(N) = 2T(N2) + N

                      22

                      Mergesort Analysis

                      kNN

                      T

                      NN

                      T

                      NNN

                      T

                      NN

                      T

                      NNN

                      T

                      NN

                      TNT

                      kk

                      )2(2

                      3)8(8

                      2)4

                      )8(2(4

                      2)4(4

                      )2

                      )4(2(2

                      )2(2)(

                      )log(

                      log

                      )2(2)(

                      NNO

                      NNN

                      kNN

                      TNTk

                      k

                      Since N=2k we have k=log2 n

                      23

                      Quicksort

                      o Divide-and-conquer approach to sortingo Like MergeSort except

                      o Donrsquot divide the array in halfo Partition the array based elements being less than or

                      greater than some element of the array (the pivot)

                      o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

                      InsertionSort) when array is small

                      24

                      Quicksort Algorithm

                      o Given array S

                      o Modify S so elements in increasing order

                      1 If size of S is 0 or 1 return

                      2 Pick any element v in S as the pivot

                      3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

                      4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

                      25

                      Quicksort Example

                      26

                      Why so fast

                      o MergeSort always divides array in halfo QuickSort might divide array into

                      subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

                      o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

                      merge stepo QuickSort can partition the array in place

                      o This more than makes up for bad pivot choices

                      27

                      Picking the Pivot

                      o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                      generator

                      28

                      Picking the Pivot

                      o Best choice of pivoto Median of array

                      o Median is expensive to calculateo Estimate median as the median of

                      three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                      o Has been shown to reduce running time (comparisons) by 14

                      29

                      Partitioning Strategy

                      o Partitioning is conceptually straightforward but easy to do inefficiently

                      o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                      o Increment i until S[i] gt pivot

                      o Decrement j until S[j] lt pivot

                      o If (i lt j) then swap S[i] and S[j]

                      o Swap pivot and S[i]

                      30

                      Partitioning Example

                      31

                      Partitioning Example

                      32

                      Partitioning Strategy

                      o How to handle duplicateso Consider the case where all elements

                      are equalo Current approach Skip over elements

                      equal to pivoto No swaps (good)

                      o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                      o Worst case O(N2) performance

                      33

                      Partitioning Strategy

                      o How to handle duplicateso Alternative approach

                      o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                      o Adds some unnecessary swapso But results in perfect partitioning for array

                      of identical elementso Unlikely for input array but more likely for

                      recursive calls to QuickSort

                      34

                      Small Arrays

                      o When S is small generating lots of recursive calls on small sub-arrays is expensive

                      o General strategyo When N lt threshold use a sort more efficient for

                      small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                      for array of size 2 or less

                      o Has been shown to reduce running time by 15

                      35

                      QuickSort Implementation

                      36

                      QuickSort Implementation

                      37

                      QuickSort Implementation

                      38

                      Analysis of QuickSort

                      o Let i be the number of elements sent to the left partition

                      o Compute running time T(N) for array of size N

                      o T(0) = T(1) = O(1)

                      o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                      39

                      Analysis of QuickSort

                      40

                      Analysis of QuickSort

                      41

                      Comparison Sorting

                      42

                      Comparison Sorting

                      43

                      Comparison Sorting

                      44

                      Lower Bound on Sorting

                      o Best worst-case sorting algorithm (so far) is O(N log N)

                      o Can we do bettero Can we prove a lower bound on the

                      sorting problemo Preview

                      o For comparison sorting no we canrsquot do better

                      o Can show lower bound of Ω(N log N)

                      45

                      Decision Trees

                      o A decision tree is a binary treeo Each node represents a set of possible

                      orderings of the array elementso Each branch represents an outcome of

                      a particular comparison

                      o Each leaf of the decision tree represents a particular ordering of the original array elements

                      46

                      Decision Trees

                      47

                      Decision Tree for Sorting

                      o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                      o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                      o In the average case the number of comparisons is the average of the depths of all leaves

                      o There are N different orderings of N elements

                      48

                      Lower Bound for Comparison Sorting

                      o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                      49

                      Linear Sorting

                      o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                      o CountingSort

                      o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                      irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                      50

                      Linear Sorting

                      o BucketSort

                      o Assume N elements of A uniformly distributed over the range [01)

                      o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                      o Assumes each bucket will contain Θ(1) elements

                      51

                      External Sorting

                      o What is the number of elements N we wish to sort do not fit in memory

                      o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                      o We want to minimize disk accesses

                      52

                      External Mergesorting


                        12

                        Mergesort

                        Based on divide-and-conquer strategy

                        o Divide the list into two smaller lists of about equal sizes
                        o Sort each smaller list recursively
                        o Merge the two sorted lists to get one sorted list

                        o How to divide the list? Running time?
                        o How to merge the two sorted lists? Running time?

                        14

                        Mergesort Divide

                        o If the input list is a linked list, dividing takes Θ(N) time
                            o We scan the linked list, stop at the (N/2)-th entry and cut the link
                        o If the input list is an array A[0..N-1], dividing takes O(1) time
                            o We can represent a sublist by two integers, left and right
                            o To divide A[left..right], we compute center = (left + right)/2 and obtain A[left..center] and A[center+1..right]
                        o Try left = 0, right = 50: center = ?
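The array case can be sketched in a few lines. This is only an illustration: the function name `split_range` is hypothetical, and the division is assumed to be integer division that rounds down.

```python
def split_range(left, right):
    """Split the index range [left, right] into [left, center] and [center+1, right]."""
    # Integer division keeps the cost O(1), independent of the sublist length.
    center = (left + right) // 2
    return (left, center), (center + 1, right)


first_half, second_half = split_range(0, 7)
print(first_half, second_half)  # (0, 3) (4, 7)
```

No elements are copied or moved; the two halves are described purely by their index pairs.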

                        15

                        Mergesort

                        o Divide-and-conquer strategy
                            o recursively mergesort the first half and the second half
                            o merge the two sorted halves together

                        16

                        Mergesort

                        17

                        Mergesort Merge

                        o Input: two sorted arrays A and B
                        o Output: a sorted array C
                        o Three counters: Actr, Bctr and Cctr
                            o initially set to the beginning of their respective arrays

                        (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

                        (2) When either input list is exhausted, the remainder of the other list is copied to C
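A minimal sketch of the merge step and of the mergesort that uses it, under some assumptions: inputs are Python lists, the counter `Cctr` is implicit because `append` always writes to the next entry of `c`, and ties are resolved in favour of `a`. The function names are illustrative only.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list c."""
    c = []
    actr = bctr = 0
    # Step (1): copy the smaller front element and advance its counter.
    while actr < len(a) and bctr < len(b):
        if a[actr] <= b[bctr]:
            c.append(a[actr])
            actr += 1
        else:
            c.append(b[bctr])
            bctr += 1
    # Step (2): one input is exhausted, so copy the remainder of the other.
    c.extend(a[actr:])
    c.extend(b[bctr:])
    return c


def merge_sort(items):
    """Recursively sort each half, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    center = len(items) // 2
    return merge(merge_sort(items[:center]), merge_sort(items[center:]))


print(merge([1, 13, 24, 26], [2, 15, 27, 38]))
print(merge_sort([34, 8, 64, 51, 32, 21]))
```

Slicing copies the halves, so this version trades extra memory for brevity; an index-based version with one shared temporary array would avoid those copies.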

                        18

                        Mergesort Merge

                        19

                        Mergesort Merge

                        20

                        Mergesort Analysis

                        o Merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists
                        o Space requirement
                            o merging two sorted lists requires linear extra memory
                            o additional work to copy to the temporary array and back

                        21

                        Mergesort Analysis

                        o Let T(N) denote the worst-case running time of mergesort to sort N numbers
                        o Assume that N is a power of 2
                        o Divide step: O(1) time
                        o Conquer step: 2T(N/2) time
                        o Combine step: O(N) time
                        o Recurrence equation
                            o T(1) = 1
                            o T(N) = 2T(N/2) + N

                        22

                        Mergesort Analysis

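                        \begin{aligned}
                        T(N) &= 2T(N/2) + N \\
                             &= 2\bigl(2T(N/4) + N/2\bigr) + N \\
                             &= 4T(N/4) + 2N \\
                             &= 4\bigl(2T(N/8) + N/4\bigr) + 2N \\
                             &= 8T(N/8) + 3N \\
                             &= \cdots \\
                             &= 2^k T(N/2^k) + kN
                        \end{aligned}

                        Since N = 2^k, we have k = log_2 N

                        \begin{aligned}
                        T(N) &= 2^k T(N/2^k) + kN \\
                             &= N \cdot T(1) + N \log_2 N \\
                             &= N + N \log_2 N \\
                             &= O(N \log N)
                        \end{aligned}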
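As a quick sanity check, the recurrence T(1) = 1, T(N) = 2T(N/2) + N can be evaluated directly and compared with the closed form N log_2 N + N. The function name `mergesort_cost` is hypothetical, and N is assumed to be a power of 2, as in the analysis.

```python
def mergesort_cost(n):
    """Evaluate T(1) = 1, T(N) = 2 T(N/2) + N for N a power of 2."""
    if n == 1:
        return 1
    return 2 * mergesort_cost(n // 2) + n


for k in range(11):
    n = 2 ** k
    # Closed form from the telescoped recurrence: N * log2(N) + N, with log2(N) = k.
    assert mergesort_cost(n) == n * k + n

print(mergesort_cost(1024))  # 11264
```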

                        23

                        Quicksort

                        o Divide-and-conquer approach to sorting
                        o Like MergeSort, except
                            o Don't divide the array in half
                            o Partition the array based on elements being less than or greater than some element of the array (the pivot)
                        o Worst-case running time: O(N^2)
                        o Average-case running time: O(N log N)
                        o Fastest generic sorting algorithm in practice
                        o Even faster if a simple sort (e.g., InsertionSort) is used when the array is small

                        24

                        Quicksort Algorithm

                        o Given array S
                        o Modify S so elements are in increasing order

                        1. If size of S is 0 or 1, return
                        2. Pick any element v in S as the pivot
                        3. Partition S - {v} into two disjoint groups
                            o S1 = {x ∈ (S - {v}) | x ≤ v}
                            o S2 = {x ∈ (S - {v}) | x ≥ v}
                        4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
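The four steps translate almost directly into a short, not-in-place sketch. Two choices here are assumptions made only for brevity: the pivot is the first element, and elements equal to the pivot all go to S2 (the definition allows either group).

```python
def quicksort(s):
    """Return a new sorted list, following steps 1 to 4 literally."""
    # Step 1: lists of size 0 or 1 are already sorted.
    if len(s) <= 1:
        return list(s)
    # Step 2: any element may serve as the pivot; the first is an arbitrary choice.
    v = s[0]
    rest = s[1:]
    # Step 3: partition S - {v} into two disjoint groups.
    s1 = [x for x in rest if x < v]
    s2 = [x for x in rest if x >= v]
    # Step 4: sorted S1, then the pivot, then sorted S2.
    return quicksort(s1) + [v] + quicksort(s2)


print(quicksort([8, 1, 4, 9, 6, 3, 5, 2, 7, 0]))
```

This version allocates new lists at every level, so it illustrates the recursion rather than the in-place partitioning that makes quicksort fast in practice.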

                        25

                        Quicksort Example

                        26

                        Why so fast?

                        o MergeSort always divides array in half
                        o QuickSort might divide array into subproblems of size 1 and N-1
                            o When?
                            o Leading to O(N^2) performance
                            o Need to choose pivot wisely (but efficiently)
                        o MergeSort requires temporary array for merge step
                        o QuickSort can partition the array in place
                            o This more than makes up for bad pivot choices

                        27

                        Picking the Pivot

                        o Choosing the first element
                            o What if array already or nearly sorted?
                            o Good for random array
                        o Choose random pivot
                            o Good in practice if truly random
                            o Still possible to get some bad choices
                            o Requires execution of random number generator

                        28

                        Picking the Pivot

                        o Best choice of pivot?
                            o Median of array
                        o Median is expensive to calculate
                        o Estimate median as the median of three elements
                            o Choose first, middle and last elements
                            o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
                        o Has been shown to reduce running time (comparisons) by 14%
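A small sketch of median-of-three selection, assuming the middle position is (left + right) / 2 rounded down. The helper name `median_of_three` is hypothetical, and it returns the index of the chosen pivot rather than its value.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center] and s[right]."""
    center = (left + right) // 2
    # Sort the three (value, index) pairs and take the middle one.
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]


s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
pivot_index = median_of_three(s, 0, len(s) - 1)
print(pivot_index, s[pivot_index])  # 4 6
```

For the example sequence the three candidates are 8, 6 and 0, so the pivot is 6.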

                        29

                        Partitioning Strategy

                        o Partitioning is conceptually straightforward, but easy to do inefficiently
                        o Good strategy
                            o Swap pivot with last element S[right]
                            o Set i = left
                            o Set j = (right - 1)
                            o While (i < j)
                                o Increment i until S[i] > pivot
                                o Decrement j until S[j] < pivot
                                o If (i < j), then swap S[i] and S[j]
                            o Swap pivot and S[i]

                        30

                        Partitioning Example

                        31

                        Partitioning Example

                        32

                        Partitioning Strategy

                        o How to handle duplicates?
                        o Consider the case where all elements are equal
                        o Current approach: skip over elements equal to pivot
                            o No swaps (good)
                            o But then i = (right - 1) and array partitioned into N-1 and 1 elements
                            o Worst case O(N^2) performance

                        33

                        Partitioning Strategy

                        o How to handle duplicates?
                        o Alternative approach
                            o Don't skip elements equal to pivot
                            o Increment i while S[i] < pivot
                            o Decrement j while S[j] > pivot
                            o Adds some unnecessary swaps
                            o But results in perfect partitioning for array of identical elements
                                o Unlikely for input array, but more likely for recursive calls to QuickSort
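The alternative approach can be sketched as an in-place partition in which both scans stop on elements equal to the pivot. This is one possible rendering, not the deck's own code: the name `partition`, the explicit `pivot_index` parameter and the bounds checks in the inner loops are assumptions.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place and return the pivot's final index."""
    # Move the pivot out of the way by swapping it with the last element.
    s[pivot_index], s[right] = s[right], s[pivot_index]
    pivot = s[right]
    i, j = left, right - 1
    while True:
        # Increment i while s[i] < pivot (stops on elements equal to the pivot).
        while i <= j and s[i] < pivot:
            i += 1
        # Decrement j while s[j] > pivot (also stops on equal elements).
        while j >= i and s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    # Put the pivot between the two groups.
    s[i], s[right] = s[right], s[i]
    return i


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
p = partition(data, 0, len(data) - 1, 4)   # pivot value 6
print(p, data)

same = [5] * 8
print(partition(same, 0, len(same) - 1, 0))  # lands near the middle
```

On the array of identical elements the pivot ends up near the centre, so the two recursive calls receive sub-arrays of nearly equal size.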

                        34

                        Small Arrays

                        o When S is small, generating lots of recursive calls on small sub-arrays is expensive
                        o General strategy
                            o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
                            o Good thresholds range from 5 to 20
                            o Also avoids issue with finding median-of-three pivot for array of size 2 or less
                            o Has been shown to reduce running time by 15%
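A sketch of the cutoff strategy, under stated assumptions: `CUTOFF = 10` is a hypothetical value picked from the 5 to 20 range, the pivot is simply the middle element, and the partition loop stops on elements equal to the pivot. It is an illustration of the idea rather than a tuned implementation.

```python
CUTOFF = 10  # hypothetical threshold within the 5 to 20 range


def insertion_sort(s, left, right):
    """Sort s[left..right] in place by insertion."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place, switching to insertion sort on small ranges."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)
        return
    # Use the middle element as pivot and park it at the right end.
    center = (left + right) // 2
    s[center], s[right] = s[right], s[center]
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] < pivot:
            i += 1
        while j >= i and s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)


values = [(k * 37) % 101 for k in range(101)]  # a fixed permutation of 0..100
quicksort(values)
print(values[:5], values[-5:])
```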

                        35

                        QuickSort Implementation

                        36

                        QuickSort Implementation

                        37

                        QuickSort Implementation

                        38

                        Analysis of QuickSort

                        o Let i be the number of elements sent to the left partition
                        o Compute running time T(N) for array of size N
                        o T(0) = T(1) = O(1)
                        o T(N) = T(i) + T(N - i - 1) + O(N)
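To see how the choice of i affects the total, the recurrence can be evaluated bottom-up. This sketch assumes the constants are exactly 1, that is T(0) = T(1) = 1 and a partitioning cost of exactly N; the name `quicksort_cost` and the `split` callback are hypothetical.

```python
def quicksort_cost(n, split):
    """Evaluate T(N) = T(i) + T(N - i - 1) + N bottom-up, with T(0) = T(1) = 1.

    split(size) returns i, the number of elements sent to the left partition.
    """
    t = [1, 1]
    for size in range(2, n + 1):
        i = split(size)
        t.append(t[i] + t[size - i - 1] + size)
    return t[n]


# Worst case: the pivot is always the smallest element, so i = 0.
worst = quicksort_cost(1024, lambda size: 0)
# Balanced case: the pivot is always the median, so i = (size - 1) // 2.
balanced = quicksort_cost(1024, lambda size: (size - 1) // 2)
print(worst, balanced)
```

With i = 0 the total grows quadratically in N, while the balanced split stays close to N log N.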

                        39

                        Analysis of QuickSort

                        40

                        Analysis of QuickSort

                        41

                        Comparison Sorting

                        42

                        Comparison Sorting

                        43

                        Comparison Sorting

                        44

                        Lower Bound on Sorting

                        o Best worst-case sorting algorithm (so far) is O(N log N)
                        o Can we do better?
                        o Can we prove a lower bound on the sorting problem?
                        o Preview
                            o For comparison sorting, no, we can't do better
                            o Can show lower bound of Ω(N log N)

                        45

                        Decision Trees

                        o A decision tree is a binary tree
                            o Each node represents a set of possible orderings of the array elements
                            o Each branch represents an outcome of a particular comparison
                        o Each leaf of the decision tree represents a particular ordering of the original array elements

                        46

                        Decision Trees

                        47

                        Decision Tree for Sorting

                        o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
                        o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
                        o In the average case, the number of comparisons is the average of the depths of all leaves
                        o There are N! different orderings of N elements
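Because a binary tree needs depth at least log_2 L to hold L leaves, the N! orderings give a concrete lower bound on worst-case comparisons. The sketch below computes the ceiling of log_2(N!) with exact integer arithmetic; the function name `comparison_lower_bound` is hypothetical.

```python
import math


def comparison_lower_bound(n):
    """Return ceil(log2(n!)), the minimum depth of a decision tree with n! leaves."""
    orderings = math.factorial(n)
    # For any integer x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    return (orderings - 1).bit_length()


for n in (1, 2, 3, 4, 10):
    print(n, comparison_lower_bound(n))
```

This gives a bound only: it says no comparison sort can guarantee fewer comparisons, not that some algorithm reaches exactly this number for every N.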

                        48

                        Lower Bound for Comparison Sorting

                        o Lemma 7.1
                        o Lemma 7.2
                        o Theorem 7.6
                        o Theorem 7.7

                        49

                        Linear Sorting

                        o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)
                        o CountingSort
                            o Given array A of N integer elements, each less than M
                            o Create array C of size M, where C[i] is the number of i's in A
                            o Use C to place elements into new sorted array B
                            o Running time Θ(N+M) = Θ(N) if M = Θ(N)
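A minimal CountingSort sketch using the same names A, C, B and M as above. It assumes the elements are non-negative integers strictly less than M; the function name is illustrative.

```python
def counting_sort(a, m):
    """Sort a list of integers in the range 0..m-1 without comparing elements."""
    # c[i] counts how many times the value i occurs in a.
    c = [0] * m
    for x in a:
        c[x] += 1
    # Emit each value as many times as it was counted, in increasing order.
    b = []
    for value, count in enumerate(c):
        b.extend([value] * count)
    return b


print(counting_sort([3, 0, 2, 3, 1, 0], 4))  # [0, 0, 1, 2, 3, 3]
```

The first loop costs Θ(N) and the second Θ(N + M), which matches the stated running time.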

                        50

                        Linear Sorting

                        o BucketSort
                            o Assume N elements of A uniformly distributed over the range [0,1)
                            o Create N equal-sized buckets over [0,1)
                            o Add each element of A into appropriate bucket
                            o Sort each bucket (e.g., with InsertionSort)
                            o Return concatenation of buckets
                            o Average case running time Θ(N)
                                o Assumes each bucket will contain Θ(1) elements
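A BucketSort sketch under the same uniformity assumption. Two details are choices made for this illustration: Python's built-in `list.sort` stands in for InsertionSort inside each bucket, and the bucket index is clamped to N - 1 to guard against floating-point rounding.

```python
def bucket_sort(a):
    """Sort floats assumed to lie in [0, 1) using len(a) equal-sized buckets."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in a:
        # Bucket k covers [k/n, (k+1)/n); clamp in case x * n rounds up to n.
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for bucket in buckets:
        bucket.sort()  # stand-in for InsertionSort on a small bucket
        result.extend(bucket)
    return result


print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```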

                        51

                        External Sorting

                        o What if the number of elements N we wish to sort does not fit in memory?
                        o Obviously, our existing sort algorithms are inefficient
                            o Each comparison potentially requires a disk access
                        o We want to minimize disk accesses

                        52

                        External Mergesorting

                        o N = number of elements in array A to be sorted
                        o M = number of elements that fit in memory
                        o K = ⌈N/M⌉
                        o Approach
                            o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
                            o Repeat above K times until all of A processed
                            o Create K input buffers and 1 output buffer, each of size M/(K+1)
                            o Perform a K-way merge: O(N)
                                o Update input buffers one disk-page at a time
                                o Write output buffer one disk-page at a time
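The approach can be sketched with temporary files standing in for the disk. Several parts are assumptions for illustration: the name `external_sort`, the one-integer-per-line run format, the use of `list.sort` in place of QuickSort for each run, and `heapq.merge` as the K-way merge. Buffering is left to Python's file objects rather than managed as explicit M/(K+1)-sized buffers.

```python
import heapq
import tempfile


def external_sort(values, memory_limit):
    """Yield the integers in values in sorted order, sorting at most memory_limit at a time."""
    runs = []
    chunk = []

    def flush():
        # Sort one memory-sized chunk and write it out as a sorted run.
        chunk.sort()
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{v}\n" for v in chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) == memory_limit:
            flush()
    if chunk:
        flush()

    try:
        # K-way merge: each run is read back lazily, line by line.
        streams = [(int(line) for line in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()


print(list(external_sort([34, 8, 64, 51, 32, 21], memory_limit=2)))
```

Every element is written once and read once during run creation, and read once more during the merge; the merged output is yielded to the caller instead of being written back to disk.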

                        53

                        External Mergesort

                        o T(N,M) = O(K·M log M + N)
                        o T(N,M) = O((N/M)·M log M + N)
                        o T(N,M) = O(N log M + N)
                        o T(N,M) = O(N log M)
                        o Disk accesses (all sequential)
                            o P = page size
                            o Accesses = 4N/P (read-all/write-all twice)
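A short arithmetic illustration of K = ⌈N/M⌉ and of the 4N/P access count. The numbers are hypothetical, and N, M and P are all assumed to be measured in elements.

```python
def external_sort_stats(n, m, p):
    """Return (K, accesses) for N elements, M elements of memory and P elements per page."""
    k = -(-n // m)          # integer ceiling of N / M
    accesses = 4 * n / p    # read all and write all, twice
    return k, accesses


# Hypothetical sizes: one million elements, memory for 100,000, pages of 1,000.
print(external_sort_stats(1_000_000, 100_000, 1_000))  # (10, 4000.0)
```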

                        54

                        Summary


                          13

                          Mergesort

                          Based on divide-and-conquer strategy

                          o Divide the list into two smaller lists of about equal sizes

                          o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

                          o How to divide the list o Running timeo How to merge the two sorted lists o Running time

                          14

                          Mergesort Divide

                          o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

                          cut the link

                          o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

                          o Try left=0 right = 50 center=

                          15

                          Mergesort

                          o Divide-and-conquer strategyo recursively mergesort the first half and the

                          second halfo merge the two sorted halves together

                          16

                          Mergesort

                          17

                          Mergesort Merge

                          o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

                          o initially set to the beginning of their respective arrays

                          (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

                          (2) When either input list is exhausted the remainder of the other list is copied to C

                          18

                          Mergesort Merge

                          19

                          Mergesort Merge

                          20

                          Mergesort Analysis

                          o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

                          o Space requiremento merging two sorted lists requires linear extra

                          memoryo additional work to copy to the temporary array

                          and back

                          21

                          Mergesort Analysis

                          o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                          o Assume that N is a power of 2

                          o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

                          o T(1) = 1o T(N) = 2T(N2) + N

                          22

                          Mergesort Analysis

                          kNN

                          T

                          NN

                          T

                          NNN

                          T

                          NN

                          T

                          NNN

                          T

                          NN

                          TNT

                          kk

                          )2(2

                          3)8(8

                          2)4

                          )8(2(4

                          2)4(4

                          )2

                          )4(2(2

                          )2(2)(

                          )log(

                          log

                          )2(2)(

                          NNO

                          NNN

                          kNN

                          TNTk

                          k

                          Since N=2k we have k=log2 n

                          23

                          Quicksort

                          o Divide-and-conquer approach to sortingo Like MergeSort except

                          o Donrsquot divide the array in halfo Partition the array based elements being less than or

                          greater than some element of the array (the pivot)

                          o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

                          InsertionSort) when array is small

                          24

                          Quicksort Algorithm

                          o Given array S

                          o Modify S so elements in increasing order

                          1 If size of S is 0 or 1 return

                          2 Pick any element v in S as the pivot

                          3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

                          4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

                          25

                          Quicksort Example

                          26

                          Why so fast

                          o MergeSort always divides array in halfo QuickSort might divide array into

                          subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

                          o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

                          merge stepo QuickSort can partition the array in place

                          o This more than makes up for bad pivot choices

                          27

                          Picking the Pivot

                          o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                          generator

                          28

                          Picking the Pivot

                          o Best choice of pivoto Median of array

                          o Median is expensive to calculateo Estimate median as the median of

                          three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                          o Has been shown to reduce running time (comparisons) by 14

                          29

                          Partitioning Strategy

                          o Partitioning is conceptually straightforward but easy to do inefficiently

                          o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                          o Increment i until S[i] gt pivot

                          o Decrement j until S[j] lt pivot

                          o If (i lt j) then swap S[i] and S[j]

                          o Swap pivot and S[i]

                          30

                          Partitioning Example

                          31

                          Partitioning Example

                          32

                          Partitioning Strategy

                          o How to handle duplicateso Consider the case where all elements

                          are equalo Current approach Skip over elements

                          equal to pivoto No swaps (good)

                          o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                          o Worst case O(N2) performance

                          33

                          Partitioning Strategy

                          o How to handle duplicateso Alternative approach

                          o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                          o Adds some unnecessary swapso But results in perfect partitioning for array

                          of identical elementso Unlikely for input array but more likely for

                          recursive calls to QuickSort

                          34

                          Small Arrays

                          o When S is small generating lots of recursive calls on small sub-arrays is expensive

                          o General strategyo When N lt threshold use a sort more efficient for

                          small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                          for array of size 2 or less

                          o Has been shown to reduce running time by 15

                          35

                          QuickSort Implementation

                          36

                          QuickSort Implementation

                          37

                          QuickSort Implementation

                          38

                          Analysis of QuickSort

                          o Let i be the number of elements sent to the left partition

                          o Compute running time T(N) for array of size N

                          o T(0) = T(1) = O(1)

                          o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                          39

                          Analysis of QuickSort

                          40

                          Analysis of QuickSort

                          41

                          Comparison Sorting

                          42

                          Comparison Sorting

                          43

                          Comparison Sorting

                          44

                          Lower Bound on Sorting

                          o Best worst-case sorting algorithm (so far) is O(N log N)

                          o Can we do bettero Can we prove a lower bound on the

                          sorting problemo Preview

                          o For comparison sorting no we canrsquot do better

                          o Can show lower bound of Ω(N log N)

                          45

                          Decision Trees

                          o A decision tree is a binary treeo Each node represents a set of possible

                          orderings of the array elementso Each branch represents an outcome of

                          a particular comparison

                          o Each leaf of the decision tree represents a particular ordering of the original array elements

                          46

                          Decision Trees

                          47

                          Decision Tree for Sorting

                          o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                          o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                          o In the average case the number of comparisons is the average of the depths of all leaves

                          o There are N different orderings of N elements

                          48

                          Lower Bound for Comparison Sorting

                          o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                          49

                          Linear Sorting

                          o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                          o CountingSort

                          o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                          irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                          50

                          Linear Sorting

                          o BucketSort

                          o Assume N elements of A uniformly distributed over the range [01)

                          o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                          o Assumes each bucket will contain Θ(1) elements

                          51

                          External Sorting

                          o What is the number of elements N we wish to sort do not fit in memory

                          o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                          o We want to minimize disk accesses

                          52

                          External Mergesorting

                          o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                          o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                          o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                          size M(K+1)o Perform a K-way merge O(N)

                          o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                          53

                          External Mergesort

                          o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                          o P = page size

                          o Accesses = 4NP (read-allwrite-all twice)

                          54

                          Summary


14

Mergesort Divide

o If the input list is a linked list, dividing takes Θ(N) time
  o We scan the linked list, stop at the (N/2)-th entry, and cut the link
o If the input list is an array A[0..N-1], dividing takes O(1) time
  o We can represent a sublist by two integers, left and right
  o To divide A[left..right], we compute center = (left + right)/2 and obtain A[left..center] and A[center+1..right]
o Try left = 0, right = 50, center = ?
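The index arithmetic for the array case can be sketched as follows. The helper name `split_range` is hypothetical, and the sketch assumes that the division in center = (left + right)/2 is integer division.

```python
def split_range(left, right):
    """Split the inclusive index range [left, right] into two sub-ranges."""
    center = (left + right) // 2  # integer division (assumed)
    return (left, center), (center + 1, right)


# The exercise on the slide: left = 0, right = 50
first_half, second_half = split_range(0, 50)
print(first_half, second_half)  # (0, 25) (26, 50)
```

No elements are copied: each sublist is just a pair of integers, which is why the divide step costs O(1) for arrays.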

15

Mergesort

o Divide-and-conquer strategy
  o Recursively mergesort the first half and the second half
  o Merge the two sorted halves together
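A minimal sketch of this strategy is given below. It is not the course's implementation: it copies sublists for brevity, and it borrows the standard library's `heapq.merge` as a stand-in for the merge step.

```python
from heapq import merge


def merge_sort(items):
    """Return a new sorted list using the divide-and-conquer strategy."""
    if len(items) <= 1:                   # base case: already sorted
        return list(items)
    center = len(items) // 2
    first = merge_sort(items[:center])    # sort the first half
    second = merge_sort(items[center:])   # sort the second half
    return list(merge(first, second))     # merge the two sorted halves


print(merge_sort([34, 8, 64, 51, 32, 21]))  # [8, 21, 32, 34, 51, 64]
```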

16

Mergesort

17

Mergesort Merge

o Input: two sorted arrays A and B
o Output: a sorted array C
o Three counters: Actr, Bctr, and Cctr
  o Initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

(2) When either input list is exhausted, the remainder of the other list is copied to C
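Steps (1) and (2) can be sketched as below. The counter names follow the slide. Using a growing Python list for C (so that Cctr is implicit) is a simplification made for this example.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list c."""
    c = []
    actr = bctr = 0
    # Step (1): copy the smaller front element and advance its counter.
    while actr < len(a) and bctr < len(b):
        if a[actr] <= b[bctr]:
            c.append(a[actr])
            actr += 1
        else:
            c.append(b[bctr])
            bctr += 1
    # Step (2): one input is exhausted; copy the remainder of the other.
    c.extend(a[actr:])
    c.extend(b[bctr:])
    return c


print(merge([1, 13, 24, 26], [2, 15, 27, 38]))
# [1, 2, 13, 15, 24, 26, 27, 38]
```

Each element is copied exactly once, so the work is proportional to the combined length of the two inputs.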

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists
o Space requirement
  o Merging two sorted lists requires linear extra memory
  o Additional work to copy to the temporary array and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers
o Assume that N is a power of 2
o Divide step: O(1) time
o Conquer step: 2T(N/2) time
o Combine step: O(N) time
o Recurrence equation
  o T(1) = 1
  o T(N) = 2T(N/2) + N

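22

Mergesort Analysis

\[
\begin{aligned}
T(N) &= 2T(N/2) + N \\
     &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
     &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^k\,T(N/2^k) + kN
\end{aligned}
\]

Since N = 2^k, we have k = log_2 N, so

\[
T(N) = 2^k\,T(N/2^k) + kN = N\,T(1) + N \log_2 N = N + N \log N = O(N \log N)
\]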
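As a quick sanity check, the recurrence T(1) = 1, T(N) = 2T(N/2) + N can be evaluated directly and compared with the closed form N log_2 N + N for powers of two. The function name `t` is chosen for this illustration only.

```python
def t(n):
    """Evaluate T(1) = 1, T(N) = 2T(N/2) + N for N a power of 2."""
    return 1 if n == 1 else 2 * t(n // 2) + n


for k in range(11):
    n = 2 ** k
    closed_form = n * k + n  # N log2 N + N, with k = log2 N
    assert t(n) == closed_form

print(t(1024))  # 11264 = 1024 * 10 + 1024
```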

23

Quicksort

o Divide-and-conquer approach to sorting
o Like MergeSort, except
  o Don't divide the array in half
  o Partition the array based on elements being less than or greater than some element of the array (the pivot)
o Worst case running time O(N^2)
o Average case running time O(N log N)
o Fastest generic sorting algorithm in practice
o Even faster if a simple sort (e.g., InsertionSort) is used when the array is small

24

Quicksort Algorithm

o Given array S
o Modify S so elements are in increasing order

1. If size of S is 0 or 1, return
2. Pick any element v in S as the pivot
3. Partition S - {v} into two disjoint groups
   o S1 = {x ∈ (S - {v}) | x ≤ v}
   o S2 = {x ∈ (S - {v}) | x ≥ v}
4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
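The four steps translate almost directly into a list-based sketch. This version builds new lists rather than modifying S in place. It picks the middle element as the pivot (the slide allows any element), and it sends elements equal to v to S2 (the slide allows either group). Both choices are assumptions made for the example.

```python
def quicksort(s):
    """Return a sorted copy of s, following steps 1-4."""
    if len(s) <= 1:                        # step 1
        return list(s)
    mid = len(s) // 2
    v = s[mid]                             # step 2: pivot (assumed: middle)
    rest = s[:mid] + s[mid + 1:]           # S - {v}
    s1 = [x for x in rest if x < v]        # step 3: elements below v
    s2 = [x for x in rest if x >= v]       #         elements at or above v
    return quicksort(s1) + [v] + quicksort(s2)  # step 4


print(quicksort([13, 81, 92, 43, 65, 31, 57, 26, 75, 0]))
# [0, 13, 26, 31, 43, 57, 65, 75, 81, 92]
```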

25

Quicksort Example

26

Why so fast?

o MergeSort always divides array in half
o QuickSort might divide array into subproblems of size 1 and N-1
  o When?
  o Leading to O(N^2) performance
  o Need to choose pivot wisely (but efficiently)
o MergeSort requires temporary array for merge step
o QuickSort can partition the array in place
  o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first element
  o What if array already or nearly sorted?
  o Good for random array
o Choose random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of random number generator

28

Picking the Pivot

o Best choice of pivot
  o Median of array
o Median is expensive to calculate
o Estimate median as the median of three elements
  o Choose first, middle, and last elements
  o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
o Has been shown to reduce running time (comparisons) by 14%
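A small sketch of the median-of-three estimate, applied to the example list above. The function name `median_of_three` and the choice to return an index rather than a value are assumptions made for this illustration.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]  # index of the middle of the three values


s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
p = median_of_three(s, 0, len(s) - 1)
print(p, s[p])  # 4 6  (first = 8, middle = 6, last = 0, so the median is 6)
```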

29

Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently
o Good strategy
  o Swap pivot with last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to pivot
  o No swaps (good)
  o But then i = (right - 1) and array partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to pivot
  o Increment i while S[i] < pivot
  o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for array of identical elements
    o Unlikely for input array, but more likely for recursive calls to QuickSort
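The alternative approach can be sketched as an in-place partition in which both scans stop on elements equal to the pivot. The sketch assumes the pivot has already been swapped into S[right]. The explicit bounds checks on i and j are an addition for safety, not something stated on the slides.

```python
def partition(s, left, right):
    """Partition s[left..right] around the pivot stored in s[right].

    Returns the pivot's final index. Both scans stop on keys equal
    to the pivot, so an array of identical elements splits evenly.
    """
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] < pivot:   # increment i while S[i] < pivot
            i += 1
        while j >= i and s[j] > pivot:   # decrement j while S[j] > pivot
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]          # both elements are on the wrong side
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]      # move the pivot to its final place
    return i


same = [5] * 7
print(partition(same, 0, 6))  # 3: three elements on each side of the pivot
```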

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids issue with finding median-of-three pivot for array of size 2 or less
  o Has been shown to reduce running time by 15%
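The cutoff strategy might look like the sketch below. `CUTOFF = 10` is an assumed value inside the 5 to 20 range mentioned above. To keep the example short, the partition step is a simple single-scan (Lomuto-style) partition around the middle element rather than the two-pointer scheme from the earlier slides. That simpler scheme degrades on arrays of identical elements.

```python
CUTOFF = 10  # assumed threshold, within the suggested 5 to 20 range


def insertion_sort(s, left, right):
    """Sort s[left..right] in place by insertion."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Sort s in place, switching to insertion sort on small sub-arrays."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)
        return
    center = (left + right) // 2
    s[center], s[right] = s[right], s[center]   # park the pivot at the end
    pivot = s[right]
    store = left
    for k in range(left, right):                # single-scan partition
        if s[k] < pivot:
            s[k], s[store] = s[store], s[k]
            store += 1
    s[store], s[right] = s[right], s[store]     # pivot to its final position
    quicksort(s, left, store - 1)
    quicksort(s, store + 1, right)
```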

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition
o Compute running time T(N) for array of size N
o T(0) = T(1) = O(1)
o T(N) = T(i) + T(N - i - 1) + O(N)
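Two standard special cases of this recurrence are sketched here. The constant c, standing for the factor hidden in the O(N) term, is notation introduced for this illustration, and the constant cost T(0) is absorbed into it.

\[
\begin{aligned}
\text{Worst case } (i = 0):\quad & T(N) = T(N-1) + cN \;\Rightarrow\; T(N) = T(1) + c\sum_{k=2}^{N} k = O(N^2) \\
\text{Best case } (i = N/2):\quad & T(N) = 2T(N/2) + cN \;\Rightarrow\; T(N) = O(N \log N)
\end{aligned}
\]

The best case has the same form as the mergesort recurrence, which is why a pivot near the median gives O(N log N).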

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
o In the average case, the number of comparisons is the average of the depths of all leaves
o There are N! different orderings of N elements
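Because a binary tree with L leaves has depth at least ⌈log_2 L⌉, a decision tree with N! leaves forces at least ⌈log_2(N!)⌉ comparisons in the worst case. The sketch below computes that bound exactly with integer arithmetic. The function name `min_comparisons` is hypothetical.

```python
import math


def min_comparisons(n):
    """Return ceil(log2(n!)), the minimum worst-case comparison count."""
    leaves = math.factorial(n)          # one leaf per possible ordering
    return (leaves - 1).bit_length()    # exact ceil(log2(leaves)) for leaves >= 1


for n in (3, 4, 5, 10):
    print(n, min_comparisons(n))
# 3 3
# 4 5
# 5 7
# 10 22
```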

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)
o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
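A minimal sketch of CountingSort using the array names A, C, and B from the slide. It assumes the elements are non-negative integers smaller than M.

```python
def counting_sort(a, m):
    """Sort a list of integers in the range 0..m-1 without comparisons."""
    c = [0] * m
    for x in a:                 # C[i] = number of i's in A
        c[x] += 1
    b = []
    for value, count in enumerate(c):
        b.extend([value] * count)   # emit each value as often as it occurred
    return b


print(counting_sort([3, 0, 2, 3, 1, 0], 4))  # [0, 0, 1, 2, 3, 3]
```

The first loop touches each of the N elements once and the second visits each of the M counters once, which gives the Θ(N+M) running time.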

50

Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0, 1)
  o Create N equal-sized buckets over [0, 1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
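The BucketSort steps can be sketched as follows. Python's built-in `list.sort` stands in for the per-bucket InsertionSort, and the `min(...)` guard against floating-point rounding is an addition made for this example.

```python
def bucket_sort(a):
    """Sort floats drawn from [0, 1) using len(a) equal-sized buckets."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        index = min(int(x * n), n - 1)   # bucket covering [index/n, (index+1)/n)
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        bucket.sort()                    # stand-in for InsertionSort
        result.extend(bucket)            # concatenate buckets in order
    return result


data = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
print(bucket_sort(data))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```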

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
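The approach can be simulated in memory as a rough sketch. Here plain Python lists stand in for the sorted runs on disk, the built-in `sorted` stands in for QuickSort, and `heapq.merge` performs the K-way merge. Disk paging and buffer management are not modelled.

```python
import heapq
import math


def external_mergesort(a, m):
    """Sort a using sorted runs of at most m elements and a K-way merge."""
    k = math.ceil(len(a) / m)                    # K = ceil(N / M) runs
    runs = [sorted(a[r * m:(r + 1) * m])         # sort one memory-load at a time
            for r in range(k)]
    return list(heapq.merge(*runs))              # K-way merge of the sorted runs


print(external_mergesort([5, 2, 9, 1, 7, 3, 8, 6, 4, 0], 3))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that `heapq.merge` costs O(N log K) comparisons, so the O(N) figure for the merge corresponds to treating K as a constant.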

53

External Mergesort

o T(N, M) = O(K·M log M + N)
o T(N, M) = O((N/M)·M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)
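A small worked example of these quantities. The function name and the sample values of N, M, and P are made up for illustration, and the page count is rounded up to whole pages.

```python
import math


def external_sort_cost(n, m, p):
    """Return (K, page accesses) for N elements, memory M, page size P."""
    k = math.ceil(n / m)               # number of sorted runs, K = ceil(N / M)
    accesses = 4 * math.ceil(n / p)    # read-all and write-all, done twice
    return k, accesses


# Hypothetical sizes: 1,000,000 elements, 100,000 fit in memory, 1,000 per page
print(external_sort_cost(1_000_000, 100_000, 1_000))  # (10, 4000)
```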

                            54

                            Summary


                              15

                              Mergesort

                              o Divide-and-conquer strategyo recursively mergesort the first half and the

                              second halfo merge the two sorted halves together

                              16

                              Mergesort

                              17

                              Mergesort Merge

                              o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

                              o initially set to the beginning of their respective arrays

                              (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

                              (2) When either input list is exhausted the remainder of the other list is copied to C

                              18

                              Mergesort Merge

                              19

                              Mergesort Merge

                              20

                              Mergesort Analysis

                              o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

                              o Space requiremento merging two sorted lists requires linear extra

                              memoryo additional work to copy to the temporary array

                              and back

                              21

                              Mergesort Analysis

                              o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                              o Assume that N is a power of 2

                              o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

                              o T(1) = 1o T(N) = 2T(N2) + N

                              22

                              Mergesort Analysis

                              kNN

                              T

                              NN

                              T

                              NNN

                              T

                              NN

                              T

                              NNN

                              T

                              NN

                              TNT

                              kk

                              )2(2

                              3)8(8

                              2)4

                              )8(2(4

                              2)4(4

                              )2

                              )4(2(2

                              )2(2)(

                              )log(

                              log

                              )2(2)(

                              NNO

                              NNN

                              kNN

                              TNTk

                              k

                              Since N=2k we have k=log2 n

                              23

                              Quicksort

                              o Divide-and-conquer approach to sortingo Like MergeSort except

                              o Donrsquot divide the array in halfo Partition the array based elements being less than or

                              greater than some element of the array (the pivot)

                              o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

                              InsertionSort) when array is small

                              24

                              Quicksort Algorithm

                              o Given array S

                              o Modify S so elements in increasing order

                              1 If size of S is 0 or 1 return

                              2 Pick any element v in S as the pivot

                              3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

                              4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

                              25

                              Quicksort Example

                              26

                              Why so fast

                              o MergeSort always divides array in halfo QuickSort might divide array into

                              subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

                              o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

                              merge stepo QuickSort can partition the array in place

                              o This more than makes up for bad pivot choices

                              27

                              Picking the Pivot

                              o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                              generator

                              28

                              Picking the Pivot

                              o Best choice of pivoto Median of array

                              o Median is expensive to calculateo Estimate median as the median of

                              three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                              o Has been shown to reduce running time (comparisons) by 14

                              29

                              Partitioning Strategy

                              o Partitioning is conceptually straightforward but easy to do inefficiently

                              o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                              o Increment i until S[i] gt pivot

                              o Decrement j until S[j] lt pivot

                              o If (i lt j) then swap S[i] and S[j]

                              o Swap pivot and S[i]

                              30

                              Partitioning Example

                              31

                              Partitioning Example

                              32

                              Partitioning Strategy

                              o How to handle duplicateso Consider the case where all elements

                              are equalo Current approach Skip over elements

                              equal to pivoto No swaps (good)

                              o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                              o Worst case O(N2) performance

                              33

                              Partitioning Strategy

                              o How to handle duplicateso Alternative approach

                              o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                              o Adds some unnecessary swapso But results in perfect partitioning for array

                              of identical elementso Unlikely for input array but more likely for

                              recursive calls to QuickSort

                              34

                              Small Arrays

                              o When S is small generating lots of recursive calls on small sub-arrays is expensive

                              o General strategyo When N lt threshold use a sort more efficient for

                              small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                              for array of size 2 or less

                              o Has been shown to reduce running time by 15

                              35

                              QuickSort Implementation

                              36

                              QuickSort Implementation

                              37

                              QuickSort Implementation

                              38

                              Analysis of QuickSort

                              o Let i be the number of elements sent to the left partition

                              o Compute running time T(N) for array of size N

                              o T(0) = T(1) = O(1)

                              o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                              39

                              Analysis of QuickSort

                              40

                              Analysis of QuickSort

                              41

                              Comparison Sorting

                              42

                              Comparison Sorting

                              43

                              Comparison Sorting

                              44

                              Lower Bound on Sorting

                              o Best worst-case sorting algorithm (so far) is O(N log N)

                              o Can we do bettero Can we prove a lower bound on the

                              sorting problemo Preview

                              o For comparison sorting no we canrsquot do better

                              o Can show lower bound of Ω(N log N)

                              45

                              Decision Trees

                              o A decision tree is a binary treeo Each node represents a set of possible

                              orderings of the array elementso Each branch represents an outcome of

                              a particular comparison

                              o Each leaf of the decision tree represents a particular ordering of the original array elements

                              46

                              Decision Trees

                              47

                              Decision Tree for Sorting

                              o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                              o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                              o In the average case the number of comparisons is the average of the depths of all leaves

                              o There are N different orderings of N elements

                              48

                              Lower Bound for Comparison Sorting

                              o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                              49

                              Linear Sorting

                              o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                              o CountingSort

                              o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                              irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                              50

                              Linear Sorting

                              o BucketSort

                              o Assume N elements of A uniformly distributed over the range [01)

                              o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                              o Assumes each bucket will contain Θ(1) elements

                              51

                              External Sorting

                              o What is the number of elements N we wish to sort do not fit in memory

                              o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                              o We want to minimize disk accesses

                              52

                              External Mergesorting

                              o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                              o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                              o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                              size M(K+1)o Perform a K-way merge O(N)

                              o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                              53

                              External Mergesort

                              o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                              o P = page size

                              o Accesses = 4NP (read-allwrite-all twice)

                              54

                              Summary


                                16

                                Mergesort

                                17

                                Mergesort Merge

o Input: two sorted arrays A and B
o Output: a sorted array C
o Three counters: Actr, Bctr and Cctr
  o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

(2) When either input list is exhausted, the remainder of the other list is copied to C
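The two merge steps above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the function name `merge` is an assumption, and the counter Cctr is implicit in `append`.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list c."""
    c = []
    actr = bctr = 0
    # Step (1): copy the smaller front element and advance that counter.
    while actr < len(a) and bctr < len(b):
        if a[actr] <= b[bctr]:
            c.append(a[actr])
            actr += 1
        else:
            c.append(b[bctr])
            bctr += 1
    # Step (2): one input is exhausted; copy the remainder of the other.
    c.extend(a[actr:])
    c.extend(b[bctr:])
    return c
```

Every element is copied exactly once, which is where the linear merge cost comes from.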

                                18

                                Mergesort Merge

                                19

                                Mergesort Merge

                                20

                                Mergesort Analysis

o Merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists
o Space requirement
  o merging two sorted lists requires linear extra memory
  o additional work to copy to the temporary array and back

                                21

                                Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers
o Assume that N is a power of 2
o Divide step: O(1) time
o Conquer step: 2T(N/2) time
o Combine step: O(N) time
o Recurrence equation
  o T(1) = 1
  o T(N) = 2T(N/2) + N

                                22

                                Mergesort Analysis

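\[
\begin{aligned}
T(N) &= 2T(N/2) + N \\
     &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
     &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^k T(N/2^k) + kN
\end{aligned}
\]

Since N = 2^k, we have k = log2 N, so

\[
T(N) = 2^k T(N/2^k) + kN = N\,T(1) + N \log_2 N = N + N \log N = O(N \log N)
\]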
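As a quick sanity check, the recurrence T(1) = 1, T(N) = 2T(N/2) + N can be evaluated directly and compared with the closed form N log2 N + N that unrolling it gives. The function name `t` is only an illustrative choice.

```python
def t(n):
    """Evaluate T(1) = 1, T(N) = 2 T(N/2) + N for N a power of 2."""
    return 1 if n == 1 else 2 * t(n // 2) + n

for k in range(11):
    n = 2 ** k
    # Closed form: N log2 N + N, where log2 N = k.
    assert t(n) == n * k + n
```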

                                23

                                Quicksort

o Divide-and-conquer approach to sorting
o Like MergeSort, except
  o Don't divide the array in half
  o Partition the array based on elements being less than or greater than some element of the array (the pivot)
o Worst-case running time: O(N^2)
o Average-case running time: O(N log N)
o Fastest generic sorting algorithm in practice
o Even faster if a simple sort (e.g., InsertionSort) is used when the array is small

                                24

                                Quicksort Algorithm

o Given array S
o Modify S so its elements are in increasing order

1. If the size of S is 0 or 1, return
2. Pick any element v in S as the pivot
3. Partition S - {v} into two disjoint groups
   o S1 = {x ∈ (S - {v}) | x ≤ v}
   o S2 = {x ∈ (S - {v}) | x ≥ v}
4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
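The four steps can be written almost literally in Python. This sketch builds new lists instead of partitioning in place, picks the first element as the pivot purely for simplicity, and sends elements equal to the pivot to S2. All three choices are assumptions of the example, not requirements of the algorithm.

```python
def quicksort(s):
    """Return a sorted copy of s, following steps 1-4 (not in place)."""
    if len(s) <= 1:                       # step 1: nothing to sort
        return list(s)
    v = s[0]                              # step 2: any element works; first is arbitrary
    rest = s[1:]                          # S - {v}
    s1 = [x for x in rest if x < v]       # step 3: elements below the pivot
    s2 = [x for x in rest if x >= v]      #         elements equal to or above it
    return quicksort(s1) + [v] + quicksort(s2)   # step 4
```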

                                25

                                Quicksort Example

                                26

Why so fast?

o MergeSort always divides the array in half
o QuickSort might divide the array into subproblems of size 1 and N-1
  o When?
  o Leading to O(N^2) performance
o Need to choose pivot wisely (but efficiently)
o MergeSort requires a temporary array for the merge step
o QuickSort can partition the array in place
o This more than makes up for bad pivot choices

                                27

                                Picking the Pivot

o Choosing the first element
  o What if the array is already or nearly sorted?
  o Good for a random array
o Choose a random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of a random number generator

                                28

                                Picking the Pivot

o Best choice of pivot?
  o Median of array
o Median is expensive to calculate
o Estimate median as the median of three elements
  o Choose first, middle and last elements
  o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
o Has been shown to reduce running time (comparisons) by 14%
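A small sketch of the median-of-three estimate, applied to the example array from the slide. The helper name `median_of_three` and the choice of (left + right) // 2 as the "middle" position are assumptions of this example.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    # Order the three candidate positions by the values stored there.
    candidates = sorted([left, center, right], key=lambda idx: s[idx])
    return candidates[1]

s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
pivot_index = median_of_three(s, 0, len(s) - 1)
# first = 8, middle = s[4] = 6, last = 0, so the median of the three is 6
```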

                                29

                                Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently
o Good strategy
  o Swap pivot with last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap pivot and S[i]

                                30

                                Partitioning Example

                                31

                                Partitioning Example

                                32

                                Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to the pivot
  o No swaps (good)
  o But then i = (right - 1) and the array is partitioned into N-1 and 1 elements
  o Worst case: O(N^2) performance

                                33

                                Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to the pivot
    o Increment i while S[i] < pivot
    o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for an array of identical elements
    o Unlikely for the input array, but more likely for recursive calls to QuickSort
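The in-place strategy, with both scans stopping on elements equal to the pivot, might look like the following sketch. The function name, the explicit `pivot_index` parameter and the bounds checks in the scan loops are assumptions added so the example runs on its own.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place around s[pivot_index].

    Returns the final index of the pivot.
    """
    s[pivot_index], s[right] = s[right], s[pivot_index]   # park pivot at the end
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i < right and s[i] < pivot:    # stop on elements >= pivot
            i += 1
        while j > left and s[j] > pivot:     # stop on elements <= pivot
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]          # move pivot to its final place
    return i

same = [5, 5, 5, 5, 5]
middle = partition(same, 0, 4, 2)   # lands in the middle: a balanced split
```

On the all-equal array the pivot ends up at index 2, so the two recursive calls get two elements each rather than four and none.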

                                34

                                Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids the issue of finding a median-of-three pivot for an array of size 2 or less
  o Has been shown to reduce running time by 15%
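A sketch of the cutoff idea: ranges shorter than a threshold are handed to insertion sort instead of being partitioned further. The value `CUTOFF = 10` is an assumed threshold within the range quoted above, and the middle-element pivot is a placeholder choice for this example only.

```python
CUTOFF = 10  # assumed threshold; the slide suggests values from 5 to 20

def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp

def quicksort(s, left=0, right=None):
    """Sort s in place; small ranges are handed to insertion sort."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)
        return
    mid = (left + right) // 2                # placeholder pivot choice
    s[mid], s[right] = s[right], s[mid]
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i < right and s[i] < pivot:
            i += 1
        while j > left and s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)
```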

                                35

                                QuickSort Implementation

                                36

                                QuickSort Implementation

                                37

                                QuickSort Implementation

                                38

                                Analysis of QuickSort

                                o Let i be the number of elements sent to the left partition

                                o Compute running time T(N) for array of size N

                                o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)
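To see how the recurrence behaves, it helps to plug in the two extreme values of i. The sketch below writes the O(N) term as cN for an assumed constant c and absorbs the constant T(0) into that term.

```latex
% Worst case: the pivot is always the smallest element, so i = 0.
\[
T(N) = T(N-1) + cN
     = T(1) + c \sum_{k=2}^{N} k
     = O(N^2)
\]

% Best case: the pivot always splits the array in half, so i \approx N/2.
\[
T(N) = 2\,T(N/2) + cN = O(N \log N)
\]
```

The best-case recurrence has the same form as the mergesort recurrence T(N) = 2T(N/2) + N, which is why both give O(N log N).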

                                39

                                Analysis of QuickSort

                                40

                                Analysis of QuickSort

                                41

                                Comparison Sorting

                                42

                                Comparison Sorting

                                43

                                Comparison Sorting

                                44

                                Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

                                45

                                Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

                                46

                                Decision Trees

                                47

                                Decision Tree for Sorting

                                o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                o In the average case the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
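The counting argument can be made concrete. A binary tree of depth d has at most 2^d leaves, and the decision tree needs one leaf per ordering, i.e. N! leaves, so its depth is at least ⌈log2(N!)⌉. The function name below is an illustrative assumption.

```python
import math

def min_worst_case_comparisons(n):
    """Smallest depth a binary decision tree with n! leaves can have."""
    leaves = math.factorial(n)
    # (x - 1).bit_length() equals ceil(log2(x)) for integers x >= 1,
    # which avoids floating-point rounding for large factorials.
    return (leaves - 1).bit_length()
```

For example, sorting 5 elements needs at least 7 comparisons in the worst case, since 2^6 = 64 < 120 = 5! ≤ 128 = 2^7.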

                                48

                                Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

                                49

                                Linear Sorting

                                o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                o CountingSort

  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
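A minimal CountingSort sketch following the description above. It assumes the elements are non-negative integers below M; the function and parameter names are illustrative.

```python
def counting_sort(a, m):
    """Sort a list of integers drawn from range(0, m) without comparisons."""
    c = [0] * m
    for x in a:               # C[i] = number of i's in A
        c[x] += 1
    b = []
    for i in range(m):        # emit each value i exactly C[i] times
        b.extend([i] * c[i])
    return b
```

The first loop touches each of the N elements once and the second walks the M counters, giving the Θ(N+M) running time.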

                                50

                                Linear Sorting

                                o BucketSort

o Assume N elements of A uniformly distributed over the range [0,1)
o Create N equal-sized buckets over [0,1)
o Add each element of A into appropriate bucket
o Sort each bucket (e.g., with InsertionSort)
o Return concatenation of buckets
o Average case running time Θ(N)
  o Assumes each bucket will contain Θ(1) elements
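The BucketSort steps can be sketched as below, assuming every input value lies in [0, 1). The names are illustrative, and the `min(...)` guard against floating-point rounding at the top of the range is an addition of this example.

```python
def bucket_sort(a):
    """Sort floats in [0, 1) using N equal-width buckets."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in a:
        # Bucket k covers [k/n, (k+1)/n); clamp in case x * n rounds up to n.
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for bucket in buckets:
        # Insertion sort: cheap because each bucket is expected to be tiny.
        for p in range(1, len(bucket)):
            tmp = bucket[p]
            j = p
            while j > 0 and bucket[j - 1] > tmp:
                bucket[j] = bucket[j - 1]
                j -= 1
            bucket[j] = tmp
        result.extend(bucket)
    return result
```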

                                51

                                External Sorting

o What if the number of elements N we wish to sort does not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

                                52

                                External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
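The two phases can be simulated in memory as a rough sketch. Here plain Python lists stand in for the sorted runs on disk, the built-in `sorted` (not QuickSort) sorts each run, and `heapq.merge` performs the K-way merge. The function name and the in-memory simplification are assumptions of this example, and paging of the buffers is not modelled.

```python
import heapq

def external_mergesort(a, m):
    """Sort a in runs of at most m elements, then K-way merge the runs."""
    # Phase 1: K = ceil(N/M) sorted runs; each list stands in for a run on disk.
    runs = [sorted(a[start:start + m]) for start in range(0, len(a), m)]
    # Phase 2: K-way merge; heapq.merge reads each run lazily, front to back.
    return list(heapq.merge(*runs))
```

Because each run is consumed strictly from front to back, the merge phase only ever needs the current page of each run in memory.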

                                53

                                External Mergesort

o T(N,M) = O(K·M log M + N)
o T(N,M) = O((N/M)·M log M + N)
o T(N,M) = O(N log M + N)
o T(N,M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)

                                54

                                Summary


                                  17

                                  Mergesort Merge

                                  o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

                                  o initially set to the beginning of their respective arrays

                                  (1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

                                  (2) When either input list is exhausted the remainder of the other list is copied to C

                                  18

                                  Mergesort Merge

                                  19

                                  Mergesort Merge

                                  20

                                  Mergesort Analysis

                                  o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

                                  o Space requiremento merging two sorted lists requires linear extra

                                  memoryo additional work to copy to the temporary array

                                  and back

                                  21

                                  Mergesort Analysis

                                  o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                                  o Assume that N is a power of 2

                                  o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

                                  o T(1) = 1o T(N) = 2T(N2) + N

                                  22

                                  Mergesort Analysis

                                  kNN

                                  T

                                  NN

                                  T

                                  NNN

                                  T

                                  NN

                                  T

                                  NNN

                                  T

                                  NN

                                  TNT

                                  kk

                                  )2(2

                                  3)8(8

                                  2)4

                                  )8(2(4

                                  2)4(4

                                  )2

                                  )4(2(2

                                  )2(2)(

                                  )log(

                                  log

                                  )2(2)(

                                  NNO

                                  NNN

                                  kNN

                                  TNTk

                                  k

                                  Since N=2k we have k=log2 n

                                  23

                                  Quicksort

                                  o Divide-and-conquer approach to sortingo Like MergeSort except

                                  o Donrsquot divide the array in halfo Partition the array based elements being less than or

                                  greater than some element of the array (the pivot)

                                  o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

                                  InsertionSort) when array is small

                                  24

                                  Quicksort Algorithm

                                  o Given array S

                                  o Modify S so elements in increasing order

                                  1 If size of S is 0 or 1 return

                                  2 Pick any element v in S as the pivot

                                  3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

                                  4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

                                  25

                                  Quicksort Example

                                  26

                                  Why so fast

                                  o MergeSort always divides array in halfo QuickSort might divide array into

                                  subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

                                  o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

                                  merge stepo QuickSort can partition the array in place

                                  o This more than makes up for bad pivot choices

                                  27

                                  Picking the Pivot

                                  o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                                  generator

                                  28

                                  Picking the Pivot

                                  o Best choice of pivoto Median of array

                                  o Median is expensive to calculateo Estimate median as the median of

                                  three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                                  o Has been shown to reduce running time (comparisons) by 14

                                  29

                                  Partitioning Strategy

                                  o Partitioning is conceptually straightforward but easy to do inefficiently

                                  o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                                  o Increment i until S[i] gt pivot

                                  o Decrement j until S[j] lt pivot

                                  o If (i lt j) then swap S[i] and S[j]

                                  o Swap pivot and S[i]

                                  30

                                  Partitioning Example

                                  31

                                  Partitioning Example

                                  32

                                  Partitioning Strategy

                                  o How to handle duplicateso Consider the case where all elements

                                  are equalo Current approach Skip over elements

                                  equal to pivoto No swaps (good)

                                  o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                                  o Worst case O(N2) performance

                                  33

                                  Partitioning Strategy

                                  o How to handle duplicateso Alternative approach

                                  o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                                  o Adds some unnecessary swapso But results in perfect partitioning for array

                                  of identical elementso Unlikely for input array but more likely for

                                  recursive calls to QuickSort

                                  34

                                  Small Arrays

                                  o When S is small generating lots of recursive calls on small sub-arrays is expensive

                                  o General strategyo When N lt threshold use a sort more efficient for

                                  small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                                  for array of size 2 or less

                                  o Has been shown to reduce running time by 15

                                  35

                                  QuickSort Implementation

                                  36

                                  QuickSort Implementation

                                  37

                                  QuickSort Implementation

                                  38

                                  Analysis of QuickSort

                                  o Let i be the number of elements sent to the left partition

                                  o Compute running time T(N) for array of size N

                                  o T(0) = T(1) = O(1)

                                  o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                                  39

                                  Analysis of QuickSort

                                  40

                                  Analysis of QuickSort

                                  41

                                  Comparison Sorting

                                  42

                                  Comparison Sorting

                                  43

                                  Comparison Sorting

                                  44

                                  Lower Bound on Sorting

                                  o Best worst-case sorting algorithm (so far) is O(N log N)

                                  o Can we do bettero Can we prove a lower bound on the

                                  sorting problemo Preview

                                  o For comparison sorting no we canrsquot do better

                                  o Can show lower bound of Ω(N log N)

                                  45

                                  Decision Trees

                                  o A decision tree is a binary treeo Each node represents a set of possible

                                  orderings of the array elementso Each branch represents an outcome of

                                  a particular comparison

                                  o Each leaf of the decision tree represents a particular ordering of the original array elements

                                  46

                                  Decision Trees

                                  47

                                  Decision Tree for Sorting

                                  o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                  o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                  o In the average case the number of comparisons is the average of the depths of all leaves

                                  o There are N different orderings of N elements

                                  48

                                  Lower Bound for Comparison Sorting

                                  o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                  49

                                  Linear Sorting

                                  o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                  o CountingSort

                                  o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                  irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                  50

                                  Linear Sorting

                                  o BucketSort

                                  o Assume N elements of A uniformly distributed over the range [01)

                                  o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                  o Assumes each bucket will contain Θ(1) elements

                                  51

                                  External Sorting

                                  o What is the number of elements N we wish to sort do not fit in memory

                                  o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                  o We want to minimize disk accesses

                                  52

                                  External Mergesorting

                                  o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                  o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                  o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                  size M(K+1)o Perform a K-way merge O(N)

                                  o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                  53

                                  External Mergesort

                                  o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                  o P = page size

                                  o Accesses = 4NP (read-allwrite-all twice)

                                  54

                                  Summary


                                    18

                                    Mergesort Merge

                                    19

                                    Mergesort Merge

                                    20

                                    Mergesort Analysis

o Merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists

o Space requirement
  o merging two sorted lists requires linear extra memory
  o additional work to copy to the temporary array and back
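A minimal sketch of the merge step and the mergesort built on it, written in Python for illustration. The names `merge` and `mergesort` are chosen here, and the version below returns new lists instead of copying through a shared temporary array, which keeps the code short at the cost of extra allocation.

```python
def merge(left, right):
    """Merge two sorted lists in O(len(left) + len(right)) time."""
    merged = []  # the linear extra memory mentioned above
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from the left list on ties keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two lists still has elements left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def mergesort(a):
    """Return a sorted copy of a: divide in half, sort each half, merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```

Each element is appended to `merged` exactly once, which is where the O(m1 + m2) bound comes from.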

                                    21

                                    Mergesort Analysis

                                    o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                                    o Assume that N is a power of 2

o Divide step: O(1) time
o Conquer step: 2T(N/2) time
o Combine step: O(N) time
o Recurrence equation
  o T(1) = 1
  o T(N) = 2T(N/2) + N

                                    22

                                    Mergesort Analysis

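\begin{align*}
T(N) &= 2T(N/2) + N \\
     &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
     &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^k\,T(N/2^k) + kN
\end{align*}

Since N = 2^k, we have k = \log_2 N, so

\begin{align*}
T(N) &= 2^k\,T(N/2^k) + kN \\
     &= N \log N + N \\
     &= O(N \log N)
\end{align*}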
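As a quick sanity check on the expansion, the recurrence T(1) = 1, T(N) = 2T(N/2) + N can be evaluated directly and compared with the closed form N log2 N + N for powers of two. The function names below are made up for this sketch.

```python
def t_recurrence(n):
    """Evaluate T(1) = 1, T(N) = 2 T(N/2) + N for N a power of two."""
    return 1 if n == 1 else 2 * t_recurrence(n // 2) + n


def t_closed_form(n):
    """N * log2(N) + N, with log2(N) computed exactly for powers of two."""
    k = n.bit_length() - 1  # k = log2(n) when n == 2**k
    return n * k + n


# The two agree for every power of two tried here.
for k in range(11):
    n = 2 ** k
    assert t_recurrence(n) == t_closed_form(n)
```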

                                    23

                                    Quicksort

o Divide-and-conquer approach to sorting
o Like MergeSort, except
  o Don't divide the array in half
  o Partition the array based on elements being less than or greater than some element of the array (the pivot)

o Worst case running time O(N^2)
o Average case running time O(N log N)
o Fastest generic sorting algorithm in practice
o Even faster if a simple sort (e.g., InsertionSort) is used when the array is small

                                    24

                                    Quicksort Algorithm

o Given array S

o Modify S so its elements are in increasing order

1. If size of S is 0 or 1, return

2. Pick any element v in S as the pivot

3. Partition S - {v} into two disjoint groups
  o S1 = {x ∈ (S - {v}) | x ≤ v}
  o S2 = {x ∈ (S - {v}) | x ≥ v}

4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
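The four steps translate almost literally into a short Python sketch. Two choices below are assumptions made for illustration: the middle element is used as the pivot (the algorithm allows any element), and elements equal to the pivot are all sent to S1. This list-building version is not in place; it only mirrors the definition.

```python
def quicksort(s):
    """Return a sorted copy of s, following the four steps literally."""
    # Step 1: arrays of size 0 or 1 are already sorted.
    if len(s) <= 1:
        return list(s)
    # Step 2: pick a pivot v (the middle element, an arbitrary choice).
    mid = len(s) // 2
    v = s[mid]
    rest = s[:mid] + s[mid + 1:]  # S - {v}
    # Step 3: partition the remaining elements around v.
    s1 = [x for x in rest if x <= v]
    s2 = [x for x in rest if x > v]
    # Step 4: sorted S1, then v, then sorted S2.
    return quicksort(s1) + [v] + quicksort(s2)
```

For example, `quicksort([8, 1, 4, 9, 6, 3, 5, 2, 7, 0])` returns the digits 0 through 9 in order.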

                                    25

                                    Quicksort Example

                                    26

Why so fast?

o MergeSort always divides array in half
o QuickSort might divide array into subproblems of size 1 and N-1
  o When?
  o Leading to O(N^2) performance
o Need to choose pivot wisely (but efficiently)
o MergeSort requires temporary array for merge step
o QuickSort can partition the array in place
  o This more than makes up for bad pivot choices

                                    27

                                    Picking the Pivot

o Choosing the first element
  o What if array already or nearly sorted?
  o Good for random array
o Choose random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of random number generator

                                    28

                                    Picking the Pivot

o Best choice of pivot?
  o Median of array
o Median is expensive to calculate
o Estimate median as the median of three elements
  o Choose first, middle and last elements
  o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
o Has been shown to reduce running time (comparisons) by 14%
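A small sketch of median-of-three selection in Python. The helper name `median_of_three` and the convention of returning an index (rather than the value) are assumptions made for this example.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    # Sort the three (value, index) pairs and take the middle one.
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]


s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
pivot_index = median_of_three(s, 0, len(s) - 1)
print(s[pivot_index])  # first = 8, middle = 6, last = 0, so this prints 6
```

For the example array the true median would be 4 or 5, so 6 is only an estimate, but it costs a constant number of comparisons.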

                                    29

                                    Partitioning Strategy

                                    o Partitioning is conceptually straightforward but easy to do inefficiently

o Good strategy
  o Swap pivot with last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j) then swap S[i] and S[j]
  o Swap pivot and S[i]

                                    30

                                    Partitioning Example

                                    31

                                    Partitioning Example

                                    32

                                    Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to pivot
  o No swaps (good)
  o But then i = (right - 1) and array partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

                                    33

                                    Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to pivot
    o Increment i while S[i] < pivot
    o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for array of identical elements
    o Unlikely for input array, but more likely for recursive calls to QuickSort
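The in-place partitioning strategy with the alternative treatment of duplicates might look like the following Python sketch. The function name `partition` and the `pivot_index` parameter are hypothetical; both scans stop on elements equal to the pivot, as the alternative approach prescribes.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place and return the pivot's final index."""
    pivot = s[pivot_index]
    # Park the pivot in the last position, out of the way of the scans.
    s[pivot_index], s[right] = s[right], s[pivot_index]
    i, j = left, right - 1
    while True:
        # Stop i on any element >= pivot (equal elements are not skipped).
        while i < right and s[i] < pivot:
            i += 1
        # Stop j on any element <= pivot.
        while j > left and s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    # Move the pivot between the two groups.
    s[i], s[right] = s[right], s[i]
    return i
```

On an array of nine identical elements, `partition([5] * 9, 0, 8, 4)` returns 4, so the pivot lands in the middle and the two subproblems have equal size, at the price of swaps that change nothing.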

                                    34

                                    Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive

o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids issue with finding median-of-three pivot for array of size 2 or less

o Has been shown to reduce running time by 15%
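Putting the pieces together, a quicksort with a small-array cutoff could be sketched as below. This is an illustration, not the implementation from the slides (which did not survive in the transcript): the default `cutoff=10` is simply one value from the 5 to 20 range, and the code assumes `cutoff >= 1`.

```python
def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None, cutoff=10):
    """Sort s[left..right] in place, handing small ranges to insertion sort."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < cutoff:
        insertion_sort(s, left, right)
        return
    # Median-of-three pivot, parked at s[right] during partitioning.
    center = (left + right) // 2
    _, p = sorted([(s[left], left), (s[center], center), (s[right], right)])[1]
    s[p], s[right] = s[right], s[p]
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:  # s[right] == pivot always stops this scan
            i += 1
        while j > left and s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]  # pivot to its final position
    quicksort(s, left, i - 1, cutoff)
    quicksort(s, i + 1, right, cutoff)
```

Because ranges shorter than the cutoff never reach the partitioning code, the median-of-three step always has at least `cutoff` elements to choose from.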

                                    35

                                    QuickSort Implementation

                                    36

                                    QuickSort Implementation

                                    37

                                    QuickSort Implementation

                                    38

                                    Analysis of QuickSort

                                    o Let i be the number of elements sent to the left partition

                                    o Compute running time T(N) for array of size N

                                    o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)
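Two special cases of this recurrence are easy to work out by hand. The sketch below writes the O(N) partitioning cost as cN for an assumed constant c and absorbs T(0) into that term.

```latex
% Worst case: the pivot is always the smallest element, so i = 0.
\begin{align*}
T(N) &= T(N-1) + cN \\
     &= T(N-2) + c(N-1) + cN \\
     &= T(1) + c\sum_{k=2}^{N} k = O(N^2)
\end{align*}

% Best case: the pivot always splits the array in half, so i \approx N/2.
\begin{align*}
T(N) &= 2T(N/2) + cN = O(N \log N)
\end{align*}
```

The best-case recurrence has the same form as the mergesort recurrence, which is why both give O(N log N).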

                                    39

                                    Analysis of QuickSort

                                    40

                                    Analysis of QuickSort

                                    41

                                    Comparison Sorting

                                    42

                                    Comparison Sorting

                                    43

                                    Comparison Sorting

                                    44

                                    Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

                                    45

                                    Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

                                    46

                                    Decision Trees

                                    47

                                    Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case, the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
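A decision tree that sorts N elements needs at least one leaf for each of the N! possible orderings, and a binary tree of depth d has at most 2^d leaves, so the deepest leaf sits at depth at least ⌈log2(N!)⌉. The short Python sketch below computes that bound exactly; the function name is made up for this illustration.

```python
import math


def min_worst_case_comparisons(n):
    """Return ceil(log2(n!)), the least depth of a binary tree with n! leaves."""
    leaves = math.factorial(n)
    # For an integer x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    return (leaves - 1).bit_length()


for n in (3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n))
# prints: 3 3, then 4 5, then 5 7, then 10 22
```

Since log2(N!) grows like N log N, this is the Ω(N log N) lower bound for comparison sorting.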

                                    48

                                    Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

                                    49

                                    Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
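A minimal CountingSort sketch in Python, assuming the elements are non-negative integers smaller than M. The names `a`, `m`, `c` and `b` follow A, M, C and B above.

```python
def counting_sort(a, m):
    """Return a sorted copy of a, whose elements are integers in range(m)."""
    # c[i] is the number of times i occurs in a: one pass, Theta(N).
    c = [0] * m
    for x in a:
        c[x] += 1
    # Emit each value i exactly c[i] times, in increasing order: Theta(N + M).
    b = []
    for i, count in enumerate(c):
        b.extend([i] * count)
    return b
```

For example, `counting_sort([3, 0, 2, 3, 1, 0], 4)` returns `[0, 0, 1, 2, 3, 3]` without comparing any two elements of the input.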

                                    50

                                    Linear Sorting

                                    o BucketSort

  o Assume N elements of A uniformly distributed over the range [0, 1)
  o Create N equal-sized buckets over [0, 1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
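The BucketSort steps can be sketched in Python as follows. The function names are chosen for this example, and the `min(..., n - 1)` guard is an added safety measure against floating-point rounding for values very close to 1.

```python
def insertion_sort(bucket):
    """Sort a small list in place."""
    for p in range(1, len(bucket)):
        tmp = bucket[p]
        j = p
        while j > 0 and bucket[j - 1] > tmp:
            bucket[j] = bucket[j - 1]
            j -= 1
        bucket[j] = tmp


def bucket_sort(a):
    """Return a sorted copy of a, whose elements are floats in [0, 1)."""
    n = len(a)
    # N equal-sized buckets: bucket k covers [k/N, (k+1)/N).
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[min(int(x * n), n - 1)].append(x)
    # With uniform input each bucket holds Theta(1) elements on average,
    # so sorting all of them costs Theta(N) overall.
    for bucket in buckets:
        insertion_sort(bucket)
    return [x for bucket in buckets for x in bucket]
```

If the input is far from uniform, most elements land in a few buckets and the running time degrades toward that of the per-bucket sort.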

                                      19

                                      Mergesort Merge

                                      20

                                      Mergesort Analysis

                                      o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

                                      o Space requiremento merging two sorted lists requires linear extra

                                      memoryo additional work to copy to the temporary array

                                      and back

                                      21

                                      Mergesort Analysis

                                      o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                                      o Assume that N is a power of 2

                                      o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

                                      o T(1) = 1o T(N) = 2T(N2) + N

                                      22

                                      Mergesort Analysis

                                      kNN

                                      T

                                      NN

                                      T

                                      NNN

                                      T

                                      NN

                                      T

                                      NNN

                                      T

                                      NN

                                      TNT

                                      kk

                                      )2(2

                                      3)8(8

                                      2)4

                                      )8(2(4

                                      2)4(4

                                      )2

                                      )4(2(2

                                      )2(2)(

                                      )log(

                                      log

                                      )2(2)(

                                      NNO

                                      NNN

                                      kNN

                                      TNTk

                                      k

                                      Since N=2k we have k=log2 n

                                      23

                                      Quicksort

                                      o Divide-and-conquer approach to sortingo Like MergeSort except

                                      o Donrsquot divide the array in halfo Partition the array based elements being less than or

                                      greater than some element of the array (the pivot)

                                      o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

                                      InsertionSort) when array is small

                                      24

                                      Quicksort Algorithm

                                      o Given array S

                                      o Modify S so elements in increasing order

                                      1 If size of S is 0 or 1 return

                                      2 Pick any element v in S as the pivot

                                      3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

                                      4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

                                      25

                                      Quicksort Example

                                      26

                                      Why so fast

                                      o MergeSort always divides array in halfo QuickSort might divide array into

                                      subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

                                      o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

                                      merge stepo QuickSort can partition the array in place

                                      o This more than makes up for bad pivot choices

                                      27

                                      Picking the Pivot

                                      o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                                      generator

                                      28

                                      Picking the Pivot

                                      o Best choice of pivoto Median of array

                                      o Median is expensive to calculateo Estimate median as the median of

                                      three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                                      o Has been shown to reduce running time (comparisons) by 14

                                      29

                                      Partitioning Strategy

                                      o Partitioning is conceptually straightforward but easy to do inefficiently

                                      o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                                      o Increment i until S[i] gt pivot

                                      o Decrement j until S[j] lt pivot

                                      o If (i lt j) then swap S[i] and S[j]

                                      o Swap pivot and S[i]

                                      30

                                      Partitioning Example

                                      31

                                      Partitioning Example

                                      32

                                      Partitioning Strategy

                                      o How to handle duplicateso Consider the case where all elements

                                      are equalo Current approach Skip over elements

                                      equal to pivoto No swaps (good)

                                      o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                                      o Worst case O(N2) performance

                                      33

                                      Partitioning Strategy

                                      o How to handle duplicateso Alternative approach

                                      o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                                      o Adds some unnecessary swapso But results in perfect partitioning for array

                                      of identical elementso Unlikely for input array but more likely for

                                      recursive calls to QuickSort

                                      34

                                      Small Arrays

                                      o When S is small generating lots of recursive calls on small sub-arrays is expensive

                                      o General strategyo When N lt threshold use a sort more efficient for

                                      small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                                      for array of size 2 or less

                                      o Has been shown to reduce running time by 15

                                      35

                                      QuickSort Implementation

                                      36

                                      QuickSort Implementation

                                      37

                                      QuickSort Implementation

                                      38

                                      Analysis of QuickSort

                                      o Let i be the number of elements sent to the left partition

                                      o Compute running time T(N) for array of size N

                                      o T(0) = T(1) = O(1)

                                      o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                                      39

                                      Analysis of QuickSort

                                      40

                                      Analysis of QuickSort

                                      41

                                      Comparison Sorting

                                      42

                                      Comparison Sorting

                                      43

                                      Comparison Sorting

                                      44

                                      Lower Bound on Sorting

                                      o Best worst-case sorting algorithm (so far) is O(N log N)

                                      o Can we do bettero Can we prove a lower bound on the

                                      sorting problemo Preview

                                      o For comparison sorting no we canrsquot do better

                                      o Can show lower bound of Ω(N log N)

                                      45

                                      Decision Trees

                                      o A decision tree is a binary treeo Each node represents a set of possible

                                      orderings of the array elementso Each branch represents an outcome of

                                      a particular comparison

                                      o Each leaf of the decision tree represents a particular ordering of the original array elements

                                      46

                                      Decision Trees

                                      47

                                      Decision Tree for Sorting

                                      o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                      o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                      o In the average case the number of comparisons is the average of the depths of all leaves

                                      o There are N different orderings of N elements

                                      48

                                      Lower Bound for Comparison Sorting

                                      o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                      49

                                      Linear Sorting

                                      o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                      o CountingSort

                                      o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                      irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                      50

                                      Linear Sorting

                                      o BucketSort

                                      o Assume N elements of A uniformly distributed over the range [01)

                                      o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                      o Assumes each bucket will contain Θ(1) elements

                                      51

                                      External Sorting

                                      o What is the number of elements N we wish to sort do not fit in memory

                                      o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                      o We want to minimize disk accesses

                                      52

                                      External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M elements of A, sort them using QuickSort, and write them back to disk: O(M log M)
  o Repeat the above K times until all of A is processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk page at a time
    o Write output buffer one disk page at a time
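The two phases above can be sketched in memory as follows. This is only an illustration under loud assumptions: Python lists stand in for disk files, `sorted` (Timsort) stands in for QuickSort, and `heapq.merge` stands in for the buffered K-way merge; no paging is modelled.

```python
import heapq


def external_mergesort(data, m):
    """Sort data while never sorting more than m elements at once."""
    # Phase 1: build K = ceil(N / m) sorted runs; each run would be written to disk.
    runs = [sorted(data[i:i + m]) for i in range(0, len(data), m)]
    # Phase 2: K-way merge; heapq.merge consumes the runs lazily, front to back,
    # much like reading each input buffer one page at a time.
    return list(heapq.merge(*runs))


print(external_mergesort([9, 4, 7, 1, 8, 2, 6, 3, 5], 4))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that a heap-based merge spends O(log K) per element, so this sketch merges in O(N log K); the O(N) figure above treats K as a constant.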

                                      53

                                      External Mergesort

o T(N, M) = O(K · M log M + N)
o T(N, M) = O((N/M) · M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read all and write all, twice)
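To see where the factor of 4 comes from, the count can be split by phase. The concrete values of N and P below are hypothetical, chosen only to make the arithmetic visible, with P measured in elements per page.

```latex
\begin{align*}
\text{Accesses} &= \underbrace{\frac{N}{P}}_{\text{read for run sorting}}
  + \underbrace{\frac{N}{P}}_{\text{write sorted runs}}
  + \underbrace{\frac{N}{P}}_{\text{read for merge}}
  + \underbrace{\frac{N}{P}}_{\text{write merged output}}
  = \frac{4N}{P} \\
N = 10^{9},\; P = 10^{4} \quad &\Longrightarrow \quad \frac{4 \cdot 10^{9}}{10^{4}} = 4 \cdot 10^{5} \text{ page accesses}
\end{align*}
```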

                                      54

                                      Summary


                                        20

                                        Mergesort Analysis

                                        o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requirement
  o merging two sorted lists requires linear extra memory
  o additional work to copy to the temporary array and back
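A minimal sketch of the merge step, to make the O(m1 + m2) cost and the extra memory concrete. The function name `merge` and the use of a fresh output list are choices made for this example.

```python
def merge(left, right):
    """Merge two sorted lists into a new sorted list."""
    out = []                          # the linear extra memory
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps equal elements in order (stable)
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])              # at most one of these two is non-empty
    out.extend(right[j:])
    return out


print(merge([1, 13, 24, 26], [2, 15, 27, 38]))
# [1, 2, 13, 15, 24, 26, 27, 38]
```

Each element is appended to `out` exactly once, so the work is proportional to m1 + m2.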

                                        21

                                        Mergesort Analysis

                                        o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                                        o Assume that N is a power of 2

o Divide step: O(1) time
o Conquer step: 2T(N/2) time
o Combine step: O(N) time
o Recurrence equation
  o T(1) = 1
  o T(N) = 2T(N/2) + N

                                        22

                                        Mergesort Analysis

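\begin{align*}
T(N) &= 2T(N/2) + N \\
     &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
     &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^{k}\,T(N/2^{k}) + kN
\end{align*}

Since N = 2^k, we have k = \log_2 N, so

\begin{align*}
T(N) &= 2^{k}\,T(N/2^{k}) + kN \\
     &= N\,T(1) + N \log_2 N \\
     &= N + N \log N \\
     &= O(N \log N)
\end{align*}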
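As a quick sanity check on the mergesort recurrence T(1) = 1, T(N) = 2T(N/2) + N, the short sketch below evaluates it directly for powers of 2 and compares it with the closed form N log2 N + N. The function name `T` simply mirrors the notation of the recurrence.

```python
def T(n):
    """Evaluate T(1) = 1, T(N) = 2 T(N/2) + N for N a power of 2."""
    return 1 if n == 1 else 2 * T(n // 2) + n


for k in range(11):
    n = 2 ** k
    assert T(n) == n * k + n        # closed form: N log2 N + N

print(T(8))  # 32, i.e. 8 * 3 + 8
```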

                                        23

                                        Quicksort

o Divide-and-conquer approach to sorting
o Like MergeSort, except
  o Don't divide the array in half
  o Partition the array based on elements being less than or greater than some element of the array (the pivot)
o Worst-case running time O(N^2)
o Average-case running time O(N log N)
o Fastest generic sorting algorithm in practice
o Even faster if a simple sort (e.g., InsertionSort) is used when the array is small

                                        24

                                        Quicksort Algorithm

o Given array S
o Modify S so that its elements are in increasing order

1. If the size of S is 0 or 1, return
2. Pick any element v in S as the pivot
3. Partition S - {v} into two disjoint groups
   o S1 = {x ∈ (S - {v}) | x ≤ v}
   o S2 = {x ∈ (S - {v}) | x ≥ v}
4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
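The four steps above translate almost directly into a list-based sketch. It is not in place and is meant only to mirror the definition; picking the middle element as v and sending elements equal to v to S1 are arbitrary choices made for this example.

```python
def quicksort(s):
    """Return a sorted copy of s, following the four steps literally."""
    if len(s) <= 1:                       # step 1
        return list(s)
    mid = len(s) // 2
    v = s[mid]                            # step 2: any element may be the pivot
    rest = s[:mid] + s[mid + 1:]          # S - {v}
    s1 = [x for x in rest if x <= v]      # step 3: ties go to S1 so the groups stay disjoint
    s2 = [x for x in rest if x > v]
    return quicksort(s1) + [v] + quicksort(s2)   # step 4


print(quicksort([13, 81, 92, 43, 65, 31, 57, 26, 75, 0]))
# [0, 13, 26, 31, 43, 57, 65, 75, 81, 92]
```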

                                        25

                                        Quicksort Example

                                        26

                                        Why so fast

o MergeSort always divides the array in half
o QuickSort might divide the array into subproblems of size 1 and N - 1
  o When?
  o Leading to O(N^2) performance
  o Need to choose the pivot wisely (but efficiently)
o MergeSort requires a temporary array for the merge step
o QuickSort can partition the array in place
  o This more than makes up for bad pivot choices

                                        27

                                        Picking the Pivot

o Choosing the first element
  o What if the array is already or nearly sorted?
  o Good for a random array
o Choose a random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of a random number generator

                                        28

                                        Picking the Pivot

o Best choice of pivot?
  o Median of the array
o Median is expensive to calculate
o Estimate the median as the median of three elements
  o Choose the first, middle, and last elements
  o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
o Has been shown to reduce running time (comparisons) by 14%
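A small sketch of median-of-three selection on the example array. The function name `median_of_three` and the choice of the middle index as (left + right) // 2 are assumptions made for this illustration.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]          # index belonging to the middle value


s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
p = median_of_three(s, 0, len(s) - 1)
print(p, s[p])  # 4 6  (first = 8, middle = 6, last = 0, so the median is 6)
```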

                                        29

                                        Partitioning Strategy

                                        o Partitioning is conceptually straightforward but easy to do inefficiently

o Good strategy
  o Swap the pivot with the last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap pivot and S[i]
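The strategy above can be sketched as an in-place routine. The function name, the `pivot_index` parameter, and the explicit bounds checks (which the slide leaves implicit) are additions for this example; the scans skip over elements equal to the pivot, as the bullets describe.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place and return the pivot's final index."""
    s[pivot_index], s[right] = s[right], s[pivot_index]   # park pivot at the end
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] <= pivot:   # increment i until S[i] > pivot
            i += 1
        while j >= i and s[j] >= pivot:   # decrement j until S[j] < pivot
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]
        else:
            break
    s[i], s[right] = s[right], s[i]       # move pivot between the two groups
    return i


s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
p = partition(s, 0, len(s) - 1, 4)        # pivot value 6
print(p, s[p])  # 6 6
```

Everything left of index p is at most the pivot and everything right of it is at least the pivot; the order inside each group is unspecified.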

                                        30

                                        Partitioning Example

                                        31

                                        Partitioning Example

                                        32

                                        Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to the pivot
  o No swaps (good)
  o But then i = (right - 1) and the array is partitioned into N - 1 and 1 elements
  o Worst case O(N^2) performance

                                        33

                                        Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to the pivot
  o Increment i while S[i] < pivot
  o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for an array of identical elements
    o Unlikely for the input array, but more likely for recursive calls to QuickSort
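A sketch of the alternative approach, in which both scans stop on elements equal to the pivot. It assumes the pivot has already been moved to s[right]; the function name and the explicit bound on j are choices made for this example.

```python
def partition_stop_on_equal(s, left, right):
    """Partition s[left..right] around s[right]; scans stop on equal keys."""
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:               # s[right] == pivot stops this scan
            i += 1
        while j > left and s[j] > pivot:
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]       # possibly an unnecessary swap of equal keys
            i += 1
            j -= 1
        else:
            break
    s[i], s[right] = s[right], s[i]
    return i


same = [5] * 8
print(partition_stop_on_equal(same, 0, 7))  # 3
```

On eight identical keys the pivot lands at index 3, leaving groups of 3 and 4 elements instead of the lopsided split produced by skipping equal keys.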

                                        34

                                        Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids the issue of finding a median-of-three pivot for an array of size 2 or less
  o Has been shown to reduce running time by 15%
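The pieces discussed so far (median-of-three pivot, scans that stop on equal keys, and a cutoff to InsertionSort) can be combined as in the sketch below. The implementation slides did not survive extraction, so this is a reconstruction of the general technique, not the deck's own code; the names and the cutoff value of 10 are assumptions.

```python
CUTOFF = 10  # assumed threshold, within the 5 to 20 range mentioned above


def insertion_sort(s, left, right):
    for p in range(left + 1, right + 1):
        tmp, j = s[p], p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:             # small sub-array: no more recursion
        insertion_sort(s, left, right)
        return
    center = (left + right) // 2              # median-of-three: order the three samples
    if s[center] < s[left]:
        s[left], s[center] = s[center], s[left]
    if s[right] < s[left]:
        s[left], s[right] = s[right], s[left]
    if s[right] < s[center]:
        s[center], s[right] = s[right], s[center]
    pivot = s[center]
    s[center], s[right - 1] = s[right - 1], s[center]   # park pivot at right - 1
    i, j = left, right - 1
    while True:
        i += 1
        while s[i] < pivot:                   # s[right - 1] == pivot is a sentinel
            i += 1
        j -= 1
        while s[j] > pivot:                   # s[left] <= pivot is a sentinel
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]
        else:
            break
    s[i], s[right - 1] = s[right - 1], s[i]   # restore pivot
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)
```

Because sub-arrays shorter than the cutoff never reach the pivot-selection code, the median-of-three step always has at least three distinct positions to sample.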

                                        35

                                        QuickSort Implementation

                                        36

                                        QuickSort Implementation

                                        37

                                        QuickSort Implementation

                                        38

                                        Analysis of QuickSort

                                        o Let i be the number of elements sent to the left partition

                                        o Compute running time T(N) for array of size N

                                        o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)
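Two extreme choices of i show the range of behaviour of this recurrence. The derivation below is the standard one, written with an assumed constant c for the O(N) partitioning cost; it is a sketch and not a transcription of the slides that follow.

```latex
\begin{align*}
\text{Worst case } (i = 0 \text{ at every level}): \quad
T(N) &= T(N-1) + cN \\
     &= T(1) + c \sum_{j=2}^{N} j = O(N^{2}) \\[4pt]
\text{Best case } (i \approx N/2 \text{ at every level}): \quad
T(N) &= 2T(N/2) + cN = O(N \log N)
\end{align*}
```

The best case is the same recurrence as mergesort, which is why a pivot close to the median matters.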

                                        39

                                        Analysis of QuickSort

                                        40

                                        Analysis of QuickSort

                                        41

                                        Comparison Sorting

                                        42

                                        Comparison Sorting

                                        43

                                        Comparison Sorting

                                        44

                                        Lower Bound on Sorting

                                        o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show a lower bound of Ω(N log N)

                                        45

                                        Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison

                                        o Each leaf of the decision tree represents a particular ordering of the original array elements
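Since each leaf corresponds to one ordering and N distinct elements have N! orderings, a binary decision tree for sorting needs depth at least ⌈log2(N!)⌉. The sketch below tabulates that bound for small N; the function name is made up for this example.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest depth d with 2**d >= n!, i.e. ceil(log2(n!))."""
    leaves = math.factorial(n)           # one leaf per possible ordering
    return (leaves - 1).bit_length()     # exact integer ceil(log2(leaves))


for n in range(1, 6):
    print(n, min_worst_case_comparisons(n))
# 1 0
# 2 1
# 3 3
# 4 5
# 5 7
```

For N = 3 there are 6 orderings, so no comparison sort can always finish in 2 comparisons, because a tree of depth 2 has at most 4 leaves.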

                                        46

                                        Decision Trees



                                          21

                                          Mergesort Analysis

                                          o Let T(N) denote the worst-case running time of mergesort to sort N numbers

                                          o Assume that N is a power of 2

                                          o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

                                          o T(1) = 1o T(N) = 2T(N2) + N

                                          22

                                          Mergesort Analysis

                                          kNN

                                          T

                                          NN

                                          T

                                          NNN

                                          T

                                          NN

                                          T

                                          NNN

                                          T

                                          NN

                                          TNT

                                          kk

                                          )2(2

                                          3)8(8

                                          2)4

                                          )8(2(4

                                          2)4(4

                                          )2

                                          )4(2(2

                                          )2(2)(

                                          )log(

                                          log

                                          )2(2)(

                                          NNO

                                          NNN

                                          kNN

                                          TNTk

                                          k

                                          Since N=2k we have k=log2 n

                                          23

                                          Quicksort

                                          o Divide-and-conquer approach to sortingo Like MergeSort except

                                          o Donrsquot divide the array in halfo Partition the array based elements being less than or

                                          greater than some element of the array (the pivot)

                                          o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

                                          InsertionSort) when array is small

                                          24

                                          Quicksort Algorithm

                                          o Given array S

                                          o Modify S so elements in increasing order

                                          1 If size of S is 0 or 1 return

                                          2 Pick any element v in S as the pivot

                                          3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

                                          4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

                                          25

                                          Quicksort Example

                                          26

                                          Why so fast

                                          o MergeSort always divides array in halfo QuickSort might divide array into

                                          subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

                                          o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

                                          merge stepo QuickSort can partition the array in place

                                          o This more than makes up for bad pivot choices

                                          27

                                          Picking the Pivot

                                          o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                                          generator

                                          28

                                          Picking the Pivot

                                          o Best choice of pivoto Median of array

                                          o Median is expensive to calculateo Estimate median as the median of

                                          three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                                          o Has been shown to reduce running time (comparisons) by 14

                                          29

                                          Partitioning Strategy

                                          o Partitioning is conceptually straightforward but easy to do inefficiently

                                          o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                                          o Increment i until S[i] gt pivot

                                          o Decrement j until S[j] lt pivot

                                          o If (i lt j) then swap S[i] and S[j]

                                          o Swap pivot and S[i]

                                          30

                                          Partitioning Example

                                          31

                                          Partitioning Example

                                          32

                                          Partitioning Strategy

                                          o How to handle duplicateso Consider the case where all elements

                                          are equalo Current approach Skip over elements

                                          equal to pivoto No swaps (good)

                                          o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                                          o Worst case O(N2) performance

                                          33

                                          Partitioning Strategy

                                          o How to handle duplicateso Alternative approach

                                          o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                                          o Adds some unnecessary swapso But results in perfect partitioning for array

                                          of identical elementso Unlikely for input array but more likely for

                                          recursive calls to QuickSort

                                          34

                                          Small Arrays

                                          o When S is small generating lots of recursive calls on small sub-arrays is expensive

                                          o General strategyo When N lt threshold use a sort more efficient for

                                          small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                                          for array of size 2 or less

                                          o Has been shown to reduce running time by 15

                                          35

                                          QuickSort Implementation

                                          36

                                          QuickSort Implementation

                                          37

                                          QuickSort Implementation

                                          38

                                          Analysis of QuickSort

                                          o Let i be the number of elements sent to the left partition

                                          o Compute running time T(N) for array of size N

                                          o T(0) = T(1) = O(1)

                                          o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                                          39

                                          Analysis of QuickSort

                                          40

                                          Analysis of QuickSort

                                          41

                                          Comparison Sorting

                                          42

                                          Comparison Sorting

                                          43

                                          Comparison Sorting

                                          44

                                          Lower Bound on Sorting

                                          o Best worst-case sorting algorithm (so far) is O(N log N)

                                          o Can we do bettero Can we prove a lower bound on the

                                          sorting problemo Preview

                                          o For comparison sorting no we canrsquot do better

                                          o Can show lower bound of Ω(N log N)

                                          45

                                          Decision Trees

                                          o A decision tree is a binary treeo Each node represents a set of possible

                                          orderings of the array elementso Each branch represents an outcome of

                                          a particular comparison

                                          o Each leaf of the decision tree represents a particular ordering of the original array elements

                                          46

                                          Decision Trees

                                          47

                                          Decision Tree for Sorting

                                          o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                          o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                          o In the average case the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
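A small sketch of what the decision-tree argument implies: N elements can be ordered in N! (N factorial) ways, so a binary decision tree needs at least N! leaves, and its deepest leaf sits at depth at least ceil(log2(N!)). The function name below is hypothetical and chosen for illustration.

```python
import math


def min_worst_case_comparisons(n: int) -> int:
    """Lower bound on worst-case comparisons for sorting n elements.

    A binary tree with L leaves has depth at least ceil(log2(L)).
    For an integer L >= 1, ceil(log2(L)) equals (L - 1).bit_length(),
    which avoids floating-point rounding.
    """
    leaves = math.factorial(n)  # one leaf per possible ordering
    return (leaves - 1).bit_length()


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10):
        print(n, min_worst_case_comparisons(n))
```

For three elements the bound is 3 comparisons, which matches the depth of the usual three-element decision tree.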

                                          48

                                          Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

                                          49

                                          Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)
o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
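The CountingSort steps can be sketched as follows. This is a minimal version that assumes the elements are non-negative integers below M; the function name and the range check are illustrative additions, not part of the slides.

```python
def counting_sort(a, m):
    """Sort integers in range(0, m) in Theta(N + M) time, without comparisons."""
    counts = [0] * m                      # C[i] = number of i's in A
    for x in a:
        if not 0 <= x < m:
            raise ValueError(f"element {x} is outside [0, {m})")
        counts[x] += 1

    b = []                                # new sorted array B
    for value, count in enumerate(counts):
        b.extend([value] * count)         # emit each value as often as it occurred
    return b


if __name__ == "__main__":
    print(counting_sort([3, 0, 2, 3, 1, 0], 4))
```

Building C takes Θ(N) and walking C takes Θ(M + N), which gives the Θ(N + M) total stated above.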

                                          50

                                          Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0, 1)
  o Create N equal-sized buckets over [0, 1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
  o Assumes each bucket will contain Θ(1) elements
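A rough sketch of the BucketSort steps, assuming every input value lies in [0, 1). The helper names are hypothetical, and clamping the bucket index with min is a defensive addition rather than something the slides specify.

```python
def insertion_sort(items):
    """Sort a short list in place; cheap when a bucket holds O(1) elements."""
    for p in range(1, len(items)):
        tmp = items[p]
        j = p
        while j > 0 and items[j - 1] > tmp:
            items[j] = items[j - 1]
            j -= 1
        items[j] = tmp


def bucket_sort(a):
    """Sort floats in [0, 1) using N equal-width buckets."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in a:
        index = min(int(x * n), n - 1)   # bucket k covers [k/n, (k+1)/n)
        buckets[index].append(x)

    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)            # concatenate buckets in order
    return result


if __name__ == "__main__":
    print(bucket_sort([0.42, 0.07, 0.91, 0.33, 0.58]))
```

If the input is far from uniform, many elements land in one bucket and the insertion sort inside that bucket dominates the running time.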

                                          51

                                          External Sorting

o What if the number of elements N we wish to sort does not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

                                          52

                                          External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
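The two phases can be simulated in memory as a sketch. The assumptions are stated plainly: Python lists stand in for the on-disk runs, the built-in `sorted` stands in for QuickSort, and the standard-library `heapq.merge` performs the K-way merge. Because `heapq.merge` uses a heap of size K, each output element costs O(log K), so the merge is O(N) only when K is treated as a constant. Paging and buffer management are not modelled here.

```python
import heapq


def external_mergesort(a, m):
    """Sort `a` while never sorting more than `m` elements at once."""
    if m < 1:
        raise ValueError("m must be at least 1")

    # Phase 1: cut A into K = ceil(N / M) runs and sort each run on its own.
    runs = [sorted(a[start:start + m]) for start in range(0, len(a), m)]

    # Phase 2: K-way merge of the sorted runs into one output sequence.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    print(external_mergesort([9, 4, 7, 1, 8, 2, 6, 3, 5], 4))
```

With N = 9 and M = 4 this produces K = 3 runs, which are then merged in a single pass.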

                                          53

                                          External Mergesort

o T(N, M) = O(K·M log M + N)
o T(N, M) = O((N/M)·M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)

                                          54

                                          Summary


                                            22

                                            Mergesort Analysis

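\begin{align*}
T(N) &= 2T(N/2) + N \\
     &= 2\bigl(2T(N/4) + N/2\bigr) + N = 4T(N/4) + 2N \\
     &= 4\bigl(2T(N/8) + N/4\bigr) + 2N = 8T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^k\,T(N/2^k) + kN
\end{align*}

Since N = 2^k, we have k = log_2 N, so

\begin{align*}
T(N) &= 2^k\,T(N/2^k) + kN \\
     &= N\,T(1) + N \log N \\
     &= O(N \log N)
\end{align*}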
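A quick numerical check of the mergesort recurrence T(N) = 2T(N/2) + N, assuming T(1) = 1 and N a power of two. Under those assumptions the closed form is N·T(1) + N·log2(N). Both function names are illustrative.

```python
def mergesort_cost(n: int) -> int:
    """Evaluate T(N) = 2 T(N/2) + N directly, with T(1) = 1 (assumed)."""
    if n == 1:
        return 1
    return 2 * mergesort_cost(n // 2) + n


def closed_form(n: int) -> int:
    """N * T(1) + N * log2(N) for N = 2^k, using k = bit_length - 1."""
    k = n.bit_length() - 1
    return n * 1 + n * k


if __name__ == "__main__":
    for k in range(11):
        n = 2 ** k
        print(n, mergesort_cost(n), closed_form(n))
```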

                                            23

                                            Quicksort

o Divide-and-conquer approach to sorting
o Like MergeSort, except
  o Don't divide the array in half
  o Partition the array based on elements being less than or greater than some element of the array (the pivot)
o Worst case running time O(N^2)
o Average case running time O(N log N)
o Fastest generic sorting algorithm in practice
o Even faster if use simple sort (e.g., InsertionSort) when array is small

                                            24

                                            Quicksort Algorithm

o Given array S
o Modify S so elements in increasing order

1. If size of S is 0 or 1, return
2. Pick any element v in S as the pivot
3. Partition S - {v} into two disjoint groups
   o S1 = {x ∈ (S - {v}) | x ≤ v}
   o S2 = {x ∈ (S - {v}) | x ≥ v}
4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
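The four steps can be sketched directly, at the cost of extra memory, since this version builds new lists instead of partitioning in place. Two choices here are assumptions made for the example: the middle element is taken as the pivot, and elements equal to the pivot are sent to S1.

```python
def quicksort(s):
    """Return a new sorted list, following the four-step description."""
    # Step 1: arrays of size 0 or 1 are already sorted.
    if len(s) <= 1:
        return list(s)

    # Step 2: pick any element as the pivot (the middle one, by assumption).
    mid = len(s) // 2
    v = s[mid]

    # Step 3: partition S - {v} into two disjoint groups.
    rest = s[:mid] + s[mid + 1:]
    s1 = [x for x in rest if x <= v]   # ties go to S1 (assumption)
    s2 = [x for x in rest if x > v]

    # Step 4: QuickSort(S1), then v, then QuickSort(S2).
    return quicksort(s1) + [v] + quicksort(s2)


if __name__ == "__main__":
    print(quicksort([34, 8, 64, 51, 32, 21]))
```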

                                            25

                                            Quicksort Example

                                            26

Why so fast?

o MergeSort always divides array in half
o QuickSort might divide array into subproblems of size 1 and N-1
  o When?
  o Leading to O(N^2) performance
  o Need to choose pivot wisely (but efficiently)
o MergeSort requires temporary array for merge step
o QuickSort can partition the array in place
  o This more than makes up for bad pivot choices

                                            27

                                            Picking the Pivot

o Choosing the first element
  o What if array already or nearly sorted?
  o Good for random array
o Choose random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of random number generator

                                            28

                                            Picking the Pivot

o Best choice of pivot?
  o Median of array
o Median is expensive to calculate
o Estimate median as the median of three elements
  o Choose first, middle, and last elements
  o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
o Has been shown to reduce running time (comparisons) by 14%
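A minimal sketch of median-of-three pivot selection. It assumes the "middle" element is the one at index (left + right) // 2, rounding down. The function name is hypothetical.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right]."""
    center = (left + right) // 2
    # Sort the three (value, index) pairs and keep the middle one.
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]


if __name__ == "__main__":
    data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
    pivot_index = median_of_three(data, 0, len(data) - 1)
    print(pivot_index, data[pivot_index])
```

For the example array the three candidates are 8 (first), 6 (middle, index 4) and 0 (last), so 6 is chosen as the pivot.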

                                            29

                                            Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently
o Good strategy
  o Swap pivot with last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap pivot and S[i]
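The strategy above can be sketched as an in-place routine. The explicit bounds checks on i and j, the `pivot_index` parameter, and returning the pivot's final position are additions made for this example so that the code runs safely on any input; they are not stated on the slide.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place around s[pivot_index].

    Returns the final index of the pivot. Elements equal to the pivot
    are skipped by both scans, as in the strategy described above.
    """
    s[pivot_index], s[right] = s[right], s[pivot_index]  # park pivot at the end
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] <= pivot:   # advance i until S[i] > pivot
            i += 1
        while j >= i and s[j] >= pivot:   # retreat j until S[j] < pivot
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]       # both are on the wrong side: swap
        else:
            break
    s[i], s[right] = s[right], s[i]       # move pivot between the two groups
    return i


if __name__ == "__main__":
    data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
    p = partition(data, 0, len(data) - 1, 4)   # pivot value 6
    print(p, data)
```

After the call, everything left of the returned index is at most the pivot and everything right of it is at least the pivot.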

                                            30

                                            Partitioning Example

                                            31

                                            Partitioning Example

                                            32

                                            Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to pivot
  o No swaps (good)
  o But then i = (right - 1) and array partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

                                            33

                                            Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to pivot
    o Increment i while S[i] < pivot
    o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for array of identical elements
    o Unlikely for input array, but more likely for recursive calls to QuickSort
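A sketch of the alternative approach, in which both scans stop on elements equal to the pivot. It assumes the pivot has already been moved to s[right]. The function name, the guard on j, and stepping both indices after each swap are choices made for this example.

```python
def partition_stop_on_equal(s, left, right):
    """Partition s[left..right] in place; the pivot is assumed to be s[right].

    Both scans stop at elements equal to the pivot, so an array of
    identical values is split near the middle.
    """
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:                # s[right] == pivot stops this scan
            i += 1
        while j > left and s[j] > pivot:   # guard keeps j inside the range
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]
            i += 1                         # step past the swapped pair
            j -= 1
        else:
            break
    s[i], s[right] = s[right], s[i]        # put the pivot in its final place
    return i


if __name__ == "__main__":
    same = [5] * 9
    print(partition_stop_on_equal(same, 0, len(same) - 1))
```

On nine identical elements the pivot lands at index 4, leaving four elements on each side, at the price of swaps that do not change the array.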

                                            34

                                            Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids issue with finding median-of-three pivot for array of size 2 or less
  o Has been shown to reduce running time by 15%
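One way these pieces might fit together is sketched below: an in-place quicksort that uses median-of-three pivoting and falls back to insertion sort below a cutoff. The cutoff of 10 is an assumed value from the 5 to 20 range, and all names are illustrative. This is not necessarily the implementation shown on the original slides.

```python
CUTOFF = 10  # assumed threshold, picked from the suggested 5..20 range


def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)     # small sub-array: skip recursion
        return

    # Median of three: order s[left] <= s[center] <= s[right].
    center = (left + right) // 2
    if s[center] < s[left]:
        s[left], s[center] = s[center], s[left]
    if s[right] < s[left]:
        s[left], s[right] = s[right], s[left]
    if s[right] < s[center]:
        s[center], s[right] = s[right], s[center]

    # Park the pivot at right - 1; s[left] and s[right] act as sentinels.
    s[center], s[right - 1] = s[right - 1], s[center]
    pivot = s[right - 1]

    i, j = left, right - 1
    while True:
        i += 1
        while s[i] < pivot:                # stops on elements equal to pivot
            i += 1
        j -= 1
        while s[j] > pivot:
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]
        else:
            break
    s[i], s[right - 1] = s[right - 1], s[i]  # restore pivot

    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)


if __name__ == "__main__":
    data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0, 12, 11]
    quicksort(data)
    print(data)
```

Because the cutoff is at least 3, the median-of-three step never runs on a sub-array with fewer than three elements.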

                                            35

                                            QuickSort Implementation

                                            36

                                            QuickSort Implementation

                                            37

                                            QuickSort Implementation



                                              49

                                              Linear Sorting

                                              o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                              o CountingSort

                                              o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                              irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                              50

                                              Linear Sorting

                                              o BucketSort

                                              o Assume N elements of A uniformly distributed over the range [01)

                                              o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                              o Assumes each bucket will contain Θ(1) elements

                                              51

                                              External Sorting

                                              o What is the number of elements N we wish to sort do not fit in memory

                                              o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                              o We want to minimize disk accesses

                                              52

                                              External Mergesorting

                                              o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                              o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                              o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                              size M(K+1)o Perform a K-way merge O(N)

                                              o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                              53

                                              External Mergesort

                                              o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                              o P = page size

                                              o Accesses = 4NP (read-allwrite-all twice)

                                              54

                                              Summary

                                              • G64ADS Advanced Data Structures
                                              • Insertion sort
                                              • Slide 3
                                              • Slide 4
                                              • Slide 5
                                              • Slide 6
                                              • Insertion sort worst-case running time
                                              • Heapsort
                                              • Heapsort -Analysis
                                              • Heapsort ndash No Extra Memory
                                              • Slide 11
                                              • Mergesort
                                              • Slide 13
                                              • Mergesort Divide
                                              • Slide 15
                                              • Slide 16
                                              • Mergesort Merge
                                              • Slide 18
                                              • Slide 19
                                              • Mergesort Analysis
                                              • Slide 21
                                              • Slide 22
                                              • Quicksort
                                              • Quicksort Algorithm
                                              • Quicksort Example
                                              • Why so fast
                                              • Picking the Pivot
                                              • Slide 28
                                              • Partitioning Strategy
                                              • Partitioning Example
                                              • Slide 31
                                              • Slide 32
                                              • Slide 33
                                              • Small Arrays
                                              • QuickSort Implementation
                                              • Slide 36
                                              • Slide 37
                                              • Analysis of QuickSort
                                              • Slide 39
                                              • Slide 40
                                              • Comparison Sorting
                                              • Slide 42
                                              • Slide 43
                                              • Lower Bound on Sorting
                                              • Decision Trees
                                              • Slide 46
                                              • Decision Tree for Sorting
                                              • Lower Bound for Comparison Sorting
                                              • Linear Sorting
                                              • Slide 50
                                              • External Sorting
                                              • External Mergesorting
                                              • External Mergesort
                                              • Summary

24

Quicksort Algorithm

o Given array S
o Modify S so that its elements are in increasing order

1. If the size of S is 0 or 1, return
2. Pick any element v in S as the pivot
3. Partition S - {v} into two disjoint groups
   o S1 = {x ∈ S - {v} | x ≤ v}
   o S2 = {x ∈ S - {v} | x ≥ v}
4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
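The four steps above can be sketched in Python as follows. This is a minimal, non-in-place illustration: the function name `quicksort`, the choice of the middle element as pivot, and the decision to send elements equal to the pivot into the left group are assumptions made for the example, not something the slide prescribes.

```python
def quicksort(s):
    """Return a new sorted list following steps 1-4 (not in place)."""
    # Step 1: arrays of size 0 or 1 are already sorted.
    if len(s) <= 1:
        return list(s)
    # Step 2: pick any element as the pivot (here the middle one, an arbitrary choice).
    mid = len(s) // 2
    v = s[mid]
    rest = s[:mid] + s[mid + 1:]          # S - {v}
    # Step 3: partition the rest into two disjoint groups.
    s1 = [x for x in rest if x <= v]      # ties go left in this sketch
    s2 = [x for x in rest if x > v]
    # Step 4: sorted S1, then v, then sorted S2.
    return quicksort(s1) + [v] + quicksort(s2)


print(quicksort([34, 8, 64, 51, 32, 21]))  # [8, 21, 32, 34, 51, 64]
```

Building new lists keeps the sketch short; the in-place partitioning discussed on the later slides avoids this extra memory.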

25

Quicksort Example

26

Why so fast?

o MergeSort always divides the array in half
o QuickSort might divide the array into subproblems of size 1 and N-1
  o When?
  o Leading to O(N^2) performance
  o Need to choose the pivot wisely (but efficiently)
o MergeSort requires a temporary array for the merge step
o QuickSort can partition the array in place
  o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first element
  o What if the array is already or nearly sorted?
  o Good for a random array
o Choose a random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of a random number generator

28

Picking the Pivot

o Best choice of pivot?
  o Median of the array
  o Median is expensive to calculate
o Estimate the median as the median of three elements
  o Choose the first, middle and last elements
  o e.g. <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
  o Has been shown to reduce running time (comparisons) by 14%
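A small sketch of the median-of-three estimate, applied to the example array from the slide. The helper name `median_of_three` and the use of `(left + right) // 2` as the "middle" position are assumptions for illustration.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center] and s[right]."""
    center = (left + right) // 2
    # Order the three candidate indices by their values and take the middle one.
    candidates = sorted([left, center, right], key=lambda k: s[k])
    return candidates[1]


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
idx = median_of_three(data, 0, len(data) - 1)
print(idx, data[idx])  # 4 6
```

Here the first, middle and last elements are 8, 6 and 0, so the pivot estimate is 6.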

29

Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently
o Good strategy
  o Swap pivot with last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap pivot and S[i]
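The strategy above might be written in Python roughly as below. The function name `partition`, the `pivot_index` parameter, and the explicit bounds checks on `i` and `j` are assumptions added so the sketch runs safely on any input; the slide itself only gives the outline.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place and return the pivot's final index."""
    pivot = s[pivot_index]
    # Move the pivot out of the way by swapping it with the last element.
    s[pivot_index], s[right] = s[right], s[pivot_index]
    i, j = left, right - 1
    while True:
        # Increment i until s[i] > pivot (or i reaches the parked pivot).
        while i < right and s[i] <= pivot:
            i += 1
        # Decrement j until s[j] < pivot (or j runs off the left end).
        while j >= left and s[j] >= pivot:
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]
        else:
            break
    # Put the pivot between the two groups.
    s[i], s[right] = s[right], s[i]
    return i


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
p = partition(data, 0, len(data) - 1, pivot_index=4)  # pivot value 6
print(p, data)  # 6 [2, 1, 4, 5, 0, 3, 6, 8, 7, 9]
```

Note that this version skips over elements equal to the pivot, which matters when the array contains many duplicates.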

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to the pivot
  o No swaps (good)
  o But then i = (right - 1) and the array is partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to the pivot
    o Increment i while S[i] < pivot
    o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for an array of identical elements
    o Unlikely for the input array, but more likely for recursive calls to QuickSort
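To see the effect of stopping on elements equal to the pivot, here is a hedged sketch. The name `partition_stop_on_equal`, the use of the last element as pivot, and the `j > left` guard are assumptions for the example.

```python
def partition_stop_on_equal(s, left, right):
    """Partition s[left..right] around s[right]; i and j stop on elements equal to the pivot."""
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:               # stops at s[right] at the latest
            i += 1
        while j > left and s[j] > pivot:  # guard keeps j inside the range
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]       # possibly an unnecessary swap of equal keys
            i += 1
            j -= 1
        else:
            break
    s[i], s[right] = s[right], s[i]
    return i


print(partition_stop_on_equal([5] * 9, 0, 8))  # 4
```

With nine identical elements the pivot ends up in the middle (index 4), giving two sub-arrays of four elements each rather than a split of N-1 and nothing.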

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g. InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids the issue of finding a median-of-three pivot for an array of size 2 or less
  o Has been shown to reduce running time by 15%
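A sketch of an in-place QuickSort that switches to InsertionSort below a cutoff. The constant `CUTOFF = 10` is an assumed value inside the 5 to 20 range mentioned above, and the function names and the median-of-three pivot are illustrative choices rather than the deck's own implementation.

```python
import random

CUTOFF = 10  # assumed threshold; any value from 5 to 20 should behave similarly


def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """In-place quicksort that hands small sub-arrays to insertion sort."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)
        return
    # Median-of-three pivot, parked at s[right] before partitioning.
    center = (left + right) // 2
    m = sorted([left, center, right], key=lambda k: s[k])[1]
    s[m], s[right] = s[right], s[m]
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:
            i += 1
        while j > left and s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)


data = random.sample(range(1000), 200)
quicksort(data)
print(data == sorted(data))  # True
```

Because the recursion never sees fewer than `CUTOFF` elements, the median-of-three step always has at least three distinct positions to sample.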

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition
o Compute running time T(N) for an array of size N
  o T(0) = T(1) = O(1)
  o T(N) = T(i) + T(N - i - 1) + O(N)
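The recurrence can be explored numerically. In the sketch below the O(1) and O(N) terms are taken to be exactly 1 and N, which is an assumption made only so that concrete numbers come out; the function names are likewise illustrative.

```python
from functools import lru_cache


def worst_case(n):
    """T(N) with i = 0 at every level: T(N) = T(0) + T(N-1) + N."""
    t = 1                        # T(0) = T(1) = 1
    for size in range(2, n + 1):
        t = 1 + t + size         # T(0) + T(size - 1) + size
    return t


@lru_cache(maxsize=None)
def balanced(n):
    """T(N) with i = (N-1)//2 at every level (pivot always the median)."""
    if n <= 1:
        return 1
    i = (n - 1) // 2
    return balanced(i) + balanced(n - i - 1) + n


for n in (1_000, 2_000, 4_000):
    print(n, worst_case(n), balanced(n))
```

Doubling N roughly quadruples the worst-case value, consistent with O(N^2), while the balanced value grows only slightly faster than linearly, consistent with O(N log N).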

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show a lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
o In the average case, the number of comparisons is the average of the depths of all leaves
o There are N! different orderings of N elements
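Since a binary tree of depth d has at most 2^d leaves, a decision tree that must distinguish all N! orderings needs depth at least ceil(log2(N!)). The sketch below simply evaluates that quantity; the function name is an assumption for illustration.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest possible depth of a binary decision tree with n! leaves."""
    return math.ceil(math.log2(math.factorial(n)))


for n in (3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n))
# 3 3
# 4 5
# 5 7
# 10 22
```

So no comparison-based algorithm can sort every input of 5 elements with fewer than 7 comparisons in the worst case.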

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on the input array allow faster than Θ(N log N) sorting (no comparisons)
o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
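A minimal CountingSort sketch following the steps above. It assumes the elements are non-negative integers in the range 0 to M-1 (the slide only says "less than M"); the names `counting_sort`, `a`, `m`, `c` and `b` mirror the slide's A, M, C and B.

```python
def counting_sort(a, m):
    """Sort a list of integers x with 0 <= x < m (assumed non-negative)."""
    # C[i] is the number of i's in A.
    c = [0] * m
    for x in a:
        c[x] += 1
    # Use C to place elements into the new sorted array B.
    b = []
    for value, count in enumerate(c):
        b.extend([value] * count)
    return b


print(counting_sort([3, 0, 2, 3, 1, 0], 4))  # [0, 0, 1, 2, 3, 3]
```

The first loop touches each of the N elements once and the second walks the M counters, which is where the Θ(N+M) bound comes from.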

50

Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0, 1)
  o Create N equal-sized buckets over [0, 1)
  o Add each element of A into the appropriate bucket
  o Sort each bucket (e.g. with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
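The BucketSort steps might look like this in Python. The function name is illustrative, and Python's built-in `list.sort` stands in for the InsertionSort the slide suggests, purely for brevity.

```python
def bucket_sort(a):
    """Sort floats assumed to lie in [0, 1) using len(a) equal-sized buckets."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        # Bucket k covers [k/n, (k+1)/n); min() guards against float rounding up to n.
        k = min(int(x * n), n - 1)
        buckets[k].append(x)
    result = []
    for b in buckets:
        b.sort()          # stand-in for InsertionSort on a tiny bucket
        result.extend(b)
    return result


print(bucket_sort([0.42, 0.07, 0.91, 0.33, 0.58]))  # [0.07, 0.33, 0.42, 0.58, 0.91]
```

If the input is far from uniform, many elements land in the same bucket and the per-bucket sort dominates, so the Θ(N) average no longer applies.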

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A is processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
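The two phases can be simulated in memory as a rough sketch. Here plain Python lists stand in for the sorted runs on disk, `sorted` stands in for QuickSort, and the standard library's `heapq.merge` performs the K-way merge; the function name and this in-memory setup are assumptions, and no paging or buffering is modelled.

```python
import heapq


def external_mergesort(a, m):
    """Sort `a` while only ever sorting m elements at a time."""
    # Phase 1: cut A into K = ceil(N/M) chunks and sort each one separately
    # (each sorted chunk plays the role of a run written back to disk).
    runs = [sorted(a[start:start + m]) for start in range(0, len(a), m)]
    # Phase 2: K-way merge of the sorted runs; heapq.merge reads them lazily.
    return list(heapq.merge(*runs))


print(external_mergesort([9, 4, 7, 1, 8, 2, 6, 3, 5], 3))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In a real external sort each run would be a file and the merge would refill its input buffers page by page, as the slide describes.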

53

External Mergesort

o T(N, M) = O((K * M log M) + N)
o T(N, M) = O(((N/M) * M log M) + N)
o T(N, M) = O((N log M) + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)

54

Summary



                                                  o Assumes each bucket will contain Θ(1) elements

                                                  51

                                                  External Sorting

                                                  o What is the number of elements N we wish to sort do not fit in memory

                                                  o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                  o We want to minimize disk accesses

                                                  52

                                                  External Mergesorting

                                                  o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                  o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                  o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                  size M(K+1)o Perform a K-way merge O(N)

                                                  o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                  53

                                                  External Mergesort

                                                  o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                  o P = page size

                                                  o Accesses = 4NP (read-allwrite-all twice)

                                                  54

                                                  Summary

                                                  • G64ADS Advanced Data Structures
                                                  • Insertion sort
                                                  • Slide 3
                                                  • Slide 4
                                                  • Slide 5
                                                  • Slide 6
                                                  • Insertion sort worst-case running time
                                                  • Heapsort
                                                  • Heapsort -Analysis
                                                  • Heapsort ndash No Extra Memory
                                                  • Slide 11
                                                  • Mergesort
                                                  • Slide 13
                                                  • Mergesort Divide
                                                  • Slide 15
                                                  • Slide 16
                                                  • Mergesort Merge
                                                  • Slide 18
                                                  • Slide 19
                                                  • Mergesort Analysis
                                                  • Slide 21
                                                  • Slide 22
                                                  • Quicksort
                                                  • Quicksort Algorithm
                                                  • Quicksort Example
                                                  • Why so fast
                                                  • Picking the Pivot
                                                  • Slide 28
                                                  • Partitioning Strategy
                                                  • Partitioning Example
                                                  • Slide 31
                                                  • Slide 32
                                                  • Slide 33
                                                  • Small Arrays
                                                  • QuickSort Implementation
                                                  • Slide 36
                                                  • Slide 37
                                                  • Analysis of QuickSort
                                                  • Slide 39
                                                  • Slide 40
                                                  • Comparison Sorting
                                                  • Slide 42
                                                  • Slide 43
                                                  • Lower Bound on Sorting
                                                  • Decision Trees
                                                  • Slide 46
                                                  • Decision Tree for Sorting
                                                  • Lower Bound for Comparison Sorting
                                                  • Linear Sorting
                                                  • Slide 50
                                                  • External Sorting
                                                  • External Mergesorting
                                                  • External Mergesort
                                                  • Summary

26

Why so fast?

o MergeSort always divides array in half
o QuickSort might divide array into subproblems of size 1 and N-1
  o When?
  o Leading to O(N^2) performance
o Need to choose pivot wisely (but efficiently)
o MergeSort requires temporary array for merge step
o QuickSort can partition the array in place
  o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first element
  o What if array already or nearly sorted?
  o Good for random array
o Choose random pivot
  o Good in practice if truly random
  o Still possible to get some bad choices
  o Requires execution of random number generator

28

Picking the Pivot

o Best choice of pivot?
  o Median of array
o Median is expensive to calculate
o Estimate median as the median of three elements
  o Choose first, middle, and last elements
  o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>: first is 8, middle (index (left + right) / 2, rounded down) is 6, last is 0, so the pivot is 6
o Has been shown to reduce running time (comparisons) by 14%
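The median-of-three estimate can be sketched in a few lines of Python. This is only an illustration: the function name `median_of_three` and the choice of the middle index as `(left + right) // 2` are assumptions, not taken from the slides.

```python
def median_of_three(s, left, right):
    """Return the index of the median of s[left], s[center], s[right].

    Hypothetical helper: the name and the (left + right) // 2 center
    are assumptions made for this sketch.
    """
    center = (left + right) // 2
    # Pair each candidate value with its index, then take the middle pair.
    candidates = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return candidates[1][1]


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
pivot_index = median_of_three(data, 0, len(data) - 1)
print(data[pivot_index])  # prints 6 (candidates are 8, 6 and 0)
```

Sorting three pairs is a constant amount of work, so the estimate stays cheap compared with finding the true median.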

29

Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently
o Good strategy
  o Swap pivot with last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j)
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: Skip over elements equal to pivot
  o No swaps (good)
  o But then i = (right - 1) and array partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to pivot
    o Increment i while S[i] < pivot
    o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for array of identical elements
    o Unlikely for input array, but more likely for recursive calls to QuickSort
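A minimal Python sketch of the partitioning strategy, using the alternative rule in which both scans stop on elements equal to the pivot. The function name `partition`, its `pivot_index` parameter, and the extra bounds checks in the scans are assumptions added so the sketch is safe for any pivot choice; they are not taken from the slides.

```python
def partition(s, left, right, pivot_index):
    """Partition s[left..right] in place around s[pivot_index].

    Returns the final index of the pivot. Both scans stop on elements
    equal to the pivot, so an array of identical values splits evenly.
    """
    pivot = s[pivot_index]
    # Park the pivot in the last position, S[right].
    s[pivot_index], s[right] = s[right], s[pivot_index]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] < pivot:   # stop on elements >= pivot
            i += 1
        while j >= i and s[j] > pivot:   # stop on elements <= pivot
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    # Restore the pivot between the two partitions.
    s[i], s[right] = s[right], s[i]
    return i


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
p = partition(data, 0, len(data) - 1, 4)   # pivot value 6
print(p, data)

same = [5, 5, 5, 5, 5]
print(partition(same, 0, len(same) - 1, 2))  # pivot lands in the middle
```

With all elements equal, each scan stops immediately, the two indices meet in the middle, and the array is split into two halves of nearly equal size instead of N-1 and 1.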

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids issue with finding median-of-three pivot for array of size 2 or less
  o Has been shown to reduce running time by 15%
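The pieces above (median-of-three pivot, in-place partitioning that stops on equal elements, and a cutoff to InsertionSort) can be combined as in the following sketch. It is not the implementation from the slides: the names `CUTOFF`, `insertion_sort` and `quicksort` and the threshold value 10 are assumptions chosen for illustration.

```python
CUTOFF = 10  # assumed threshold; the slides suggest values between 5 and 20


def insertion_sort(s, left, right):
    """Sort s[left..right] in place by insertion."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)  # small sub-array: no more recursion
        return
    # Median-of-three pivot: median of the first, middle and last elements.
    center = (left + right) // 2
    pivot_index = sorted([left, center, right], key=lambda k: s[k])[1]
    pivot = s[pivot_index]
    s[pivot_index], s[right] = s[right], s[pivot_index]
    # Partition, stopping both scans on elements equal to the pivot.
    i, j = left, right - 1
    while True:
        while i <= j and s[i] < pivot:
            i += 1
        while j >= i and s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)


data = [34, 8, 64, 51, 32, 21, 8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
quicksort(data)
print(data)
```

Because sub-arrays shorter than the cutoff never reach the pivot selection, the median-of-three step always has at least three distinct positions to choose from.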

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition
o Compute running time T(N) for array of size N
o T(0) = T(1) = O(1)
o T(N) = T(i) + T(N - i - 1) + O(N)
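As a sketch of where this recurrence leads, the three standard cases are written out below. The constant c standing for the O(N) partitioning cost is an assumption of the sketch, as is the uniform distribution over i in the average case.

```latex
% Worst case: the pivot is always the smallest element, so i = 0.
\begin{align*}
T(N) &= T(N-1) + T(0) + cN \\
     &= T(1) + (N-1)\,T(0) + c\sum_{k=2}^{N} k = O(N^2)
\end{align*}

% Best case: the pivot is always the median, so i \approx N/2.
\begin{align*}
T(N) &= 2\,T(N/2) + cN = O(N \log N)
\end{align*}

% Average case: each value i = 0, \dots, N-1 is assumed equally likely.
\begin{align*}
T(N) &= \frac{2}{N}\sum_{j=0}^{N-1} T(j) + cN = O(N \log N)
\end{align*}
```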

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
o In the average case, the number of comparisons is the average of the depths of all leaves
o There are N! different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7
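The statements of these lemmas and theorems did not survive in the transcript. The following is a sketch of the standard decision-tree argument for the Ω(N log N) bound, not a reproduction of the slide; the symbol d for the depth of the tree is an assumption of the sketch.

```latex
% A binary tree of depth d has at most 2^d leaves, and a decision tree
% that sorts N distinct elements needs at least N! leaves.
\begin{align*}
2^{d} \ge N! \quad\Longrightarrow\quad d \ge \lceil \log_2 N! \rceil
\end{align*}

% Keep only the largest half of the terms of \log_2 N!.
\begin{align*}
\log_2 N! = \sum_{k=1}^{N} \log_2 k
  \;\ge\; \sum_{k=\lceil N/2 \rceil}^{N} \log_2 k
  \;\ge\; \frac{N}{2}\,\log_2 \frac{N}{2}
  \;=\; \Omega(N \log N)
\end{align*}
```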

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)
o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
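A minimal Python sketch of CountingSort, following the steps above with the same array names A, B and C. It assumes the elements are non-negative integers smaller than M; the function name `counting_sort` is chosen for illustration.

```python
def counting_sort(a, m):
    """Return a sorted copy of a, assuming every element is an int in range(m)."""
    c = [0] * m                  # c[i] = number of occurrences of i in a
    for x in a:
        c[x] += 1
    b = []                       # new sorted array
    for value, count in enumerate(c):
        b.extend([value] * count)
    return b


print(counting_sort([3, 0, 2, 3, 1, 0, 3], 4))  # prints [0, 0, 1, 2, 3, 3, 3]
```

The first loop touches each of the N elements once and the second loop visits each of the M counters once, which gives the Θ(N+M) running time without a single comparison between elements.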

50

Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0, 1)
  o Create N equal-sized buckets over [0, 1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
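The BucketSort steps can be sketched in Python as follows. The function name `bucket_sort` and the guard that clamps the bucket index are assumptions of this sketch; the input is assumed to consist of floats in [0, 1).

```python
def bucket_sort(a):
    """Return a sorted copy of a, assuming every element lies in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]       # N equal-sized buckets over [0, 1)
    for x in a:
        # Bucket k covers [k/n, (k+1)/n); min() guards against float rounding.
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for bucket in buckets:
        # Insertion sort within the bucket (expected to hold few elements).
        for p in range(1, len(bucket)):
            tmp = bucket[p]
            j = p
            while j > 0 and bucket[j - 1] > tmp:
                bucket[j] = bucket[j - 1]
                j -= 1
            bucket[j] = tmp
        result.extend(bucket)              # concatenate buckets in order
    return result


print(bucket_sort([0.42, 0.07, 0.93, 0.31, 0.58, 0.07]))
```

If the input is far from uniform, many elements can land in one bucket and the insertion sort inside that bucket dominates, so the Θ(N) figure is an average-case claim only.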

51

External Sorting

o What if the N elements we wish to sort do not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
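A rough Python sketch of the two phases (run creation, then a K-way merge) is given below. Several things in it are assumptions rather than part of the slides: the function name `external_mergesort`, the file format (one integer per line, no blank lines), the use of the built-in `list.sort` in place of QuickSort, and the reliance on Python's ordinary file buffering instead of explicit buffers of size M/(K+1). The merge uses `heapq.merge`, which costs O(N log K) comparisons rather than the O(N) quoted above for a fixed K.

```python
import heapq
import itertools
import os
import tempfile


def external_mergesort(input_path, output_path, m):
    """Sort a file of integers (one per line) using runs of at most m values."""
    run_paths = []
    # Phase 1: read m values at a time, sort them, write each run to disk.
    with open(input_path) as src:
        while True:
            chunk = [int(line) for line in itertools.islice(src, m)]
            if not chunk:
                break
            chunk.sort()  # stands in for the in-memory QuickSort
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)
    # Phase 2: K-way merge of the sorted runs, streamed line by line.
    runs = [open(path) for path in run_paths]
    try:
        with open(output_path, "w") as out:
            streams = [(int(line) for line in run) for run in runs]
            out.writelines(f"{x}\n" for x in heapq.merge(*streams))
    finally:
        for run in runs:
            run.close()
        for path in run_paths:
            os.remove(path)
```

Each element is read and written once while creating the runs and once more during the merge, which matches the "read-all/write-all twice" pattern of the approach.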

53

External Mergesort

o T(N, M) = O(K·M log M + N)
o T(N, M) = O((N/M)·M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)

                                                    54

                                                    Summary

                                                      27

                                                      Picking the Pivot

                                                      o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

                                                      generator

                                                      28

                                                      Picking the Pivot

                                                      o Best choice of pivoto Median of array

                                                      o Median is expensive to calculateo Estimate median as the median of

                                                      three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

                                                      o Has been shown to reduce running time (comparisons) by 14

                                                      29

                                                      Partitioning Strategy

                                                      o Partitioning is conceptually straightforward but easy to do inefficiently

                                                      o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                                                      o Increment i until S[i] gt pivot

                                                      o Decrement j until S[j] lt pivot

                                                      o If (i lt j) then swap S[i] and S[j]

                                                      o Swap pivot and S[i]

                                                      30

                                                      Partitioning Example

                                                      31

                                                      Partitioning Example

                                                      32

                                                      Partitioning Strategy

                                                      o How to handle duplicateso Consider the case where all elements

                                                      are equalo Current approach Skip over elements

                                                      equal to pivoto No swaps (good)

                                                      o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                                                      o Worst case O(N2) performance

                                                      33

                                                      Partitioning Strategy

                                                      o How to handle duplicateso Alternative approach

                                                      o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                                                      o Adds some unnecessary swapso But results in perfect partitioning for array

                                                      of identical elementso Unlikely for input array but more likely for

                                                      recursive calls to QuickSort

                                                      34

                                                      Small Arrays

                                                      o When S is small generating lots of recursive calls on small sub-arrays is expensive

                                                      o General strategyo When N lt threshold use a sort more efficient for

                                                      small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                                                      for array of size 2 or less

                                                      o Has been shown to reduce running time by 15

                                                      35

                                                      QuickSort Implementation

                                                      36

                                                      QuickSort Implementation

                                                      37

                                                      QuickSort Implementation

                                                      38

                                                      Analysis of QuickSort

                                                      o Let i be the number of elements sent to the left partition

                                                      o Compute running time T(N) for array of size N

                                                      o T(0) = T(1) = O(1)

                                                      o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                                                      39

                                                      Analysis of QuickSort

                                                      40

                                                      Analysis of QuickSort

                                                      41

                                                      Comparison Sorting

                                                      42

                                                      Comparison Sorting

                                                      43

                                                      Comparison Sorting

                                                      44

                                                      Lower Bound on Sorting

                                                      o Best worst-case sorting algorithm (so far) is O(N log N)

                                                      o Can we do bettero Can we prove a lower bound on the

                                                      sorting problemo Preview

                                                      o For comparison sorting no we canrsquot do better

                                                      o Can show lower bound of Ω(N log N)

                                                      45

                                                      Decision Trees

                                                      o A decision tree is a binary treeo Each node represents a set of possible

                                                      orderings of the array elementso Each branch represents an outcome of

                                                      a particular comparison

                                                      o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                      46

                                                      Decision Trees

                                                      47

                                                      Decision Tree for Sorting

                                                      o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                      o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                      o In the average case the number of comparisons is the average of the depths of all leaves

                                                      o There are N different orderings of N elements

                                                      48

                                                      Lower Bound for Comparison Sorting

                                                      o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on the input array allow sorting faster than Θ(N log N) (no comparisons)
o CountingSort
    o Given array A of N non-negative integer elements, each less than M
    o Create array C of size M, where C[i] is the number of i's in A
    o Use C to place elements into new sorted array B
    o Running time Θ(N + M) = Θ(N) if M = Θ(N)
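
The CountingSort steps above can be sketched in Python as follows. The names counting_sort, a, m, counts and b are illustrative stand-ins for A, M, C and B. The range check is an addition made here, not something the slides specify.

```python
def counting_sort(a, m):
    """Sort a list of integers in the range 0..m-1 without comparing elements."""
    counts = [0] * m                      # C[i] = number of i's in A
    for x in a:
        if not 0 <= x < m:
            raise ValueError(f"element {x} is outside the range 0..{m - 1}")
        counts[x] += 1
    b = []                                # new sorted array B
    for value, count in enumerate(counts):
        b.extend([value] * count)         # emit each value as often as it occurred
    return b

print(counting_sort([3, 0, 2, 3, 1, 0], 4))
```

The first loop touches each of the N elements once and the second walks all M counters, which is where the Θ(N + M) running time comes from.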

50

Linear Sorting

o BucketSort
    o Assume N elements of A uniformly distributed over the range [0, 1)
    o Create N equal-sized buckets over [0, 1)
    o Add each element of A into the appropriate bucket
    o Sort each bucket (e.g., with InsertionSort)
    o Return concatenation of buckets
o Average-case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
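
A minimal Python sketch of the BucketSort steps above, assuming the inputs are floats in [0, 1). The function names and the index clamp are illustrative choices made here, not taken from the slides.

```python
def insertion_sort(items):
    """Sort a small list in place by inserting each element into the sorted prefix."""
    for p in range(1, len(items)):
        tmp = items[p]
        j = p
        while j > 0 and items[j - 1] > tmp:
            items[j] = items[j - 1]
            j -= 1
        items[j] = tmp

def bucket_sort(a):
    """Sort floats drawn from [0, 1) using len(a) equal-sized buckets."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in a:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"element {x} is outside [0, 1)")
        # Bucket i covers [i/n, (i+1)/n); the clamp guards against rounding.
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)            # each bucket is expected to be tiny
        result.extend(bucket)
    return result

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```

If the input is far from uniform, many elements land in one bucket and the InsertionSort step dominates, so the Θ(N) figure is an average-case claim only.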

51

External Sorting

o What if the N elements we wish to sort do not fit in memory?
o Our existing sort algorithms are then inefficient
    o Each comparison potentially requires a disk access
o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
    o Read in M elements of A, sort them using QuickSort, and write them back to disk: O(M log M)
    o Repeat the above K times until all of A is processed
    o Create K input buffers and 1 output buffer, each of size M/(K+1)
    o Perform a K-way merge: O(N), treating K as a constant
        o Update input buffers one disk page at a time
        o Write output buffer one disk page at a time
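
The approach above can be sketched in Python under some simplifying assumptions: temporary text files stand in for the disk, Python's built-in sort stands in for QuickSort, and buffered file reads stand in for the page-sized input buffers. The name external_mergesort and its parameters are hypothetical, chosen for this illustration.

```python
import heapq
import os
import tempfile

def external_mergesort(values, m):
    """Yield the integers from `values` in sorted order, sorting at most m at a time."""
    if m < 1:
        raise ValueError("m must be at least 1")
    run_paths = []
    run = []

    def flush_run():
        run.sort()                        # stand-in for QuickSort on one memory load
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{x}\n" for x in run)   # write the sorted run back to "disk"
        run_paths.append(path)
        run.clear()

    for x in values:                      # phase 1: build K = ceil(N/M) sorted runs
        run.append(x)
        if len(run) == m:
            flush_run()
    if run:
        flush_run()

    files = [open(path) for path in run_paths]
    try:
        # Phase 2: K-way merge; each stream reads its run lazily through a file buffer.
        streams = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)

print(list(external_mergesort([9, 3, 7, 1, 8, 2, 6, 4, 5, 0], 3)))
```

Here heapq.merge keeps only one pending element per run in memory. It uses a heap, so this particular merge costs O(N log K) comparisons rather than O(N); the two agree when K is treated as a constant.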

53

External Mergesort

o T(N, M) = O(K · M log M + N)
o T(N, M) = O((N/M) · M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
    o P = page size
    o Accesses = 4N/P (read all / write all, twice)
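
As a worked example of these formulas, the following sketch plugs in illustrative numbers. The values of N, M and P are assumptions picked for easy arithmetic, and the function name is hypothetical.

```python
def external_sort_costs(n, m, p):
    """Return (K, page accesses) for N elements, memory size M, page size P."""
    k = -(-n // m)                 # K = ceil(N / M), using integer arithmetic
    pages = -(-n // p)             # pages touched by one full pass over the data
    # Two passes (run creation, then merge), each reading and writing every page.
    return k, 4 * pages

print(external_sort_costs(1_000_000, 100_000, 1_000))
```

With N = 1,000,000, M = 100,000 and P = 1,000 this gives K = 10 runs and 4N/P = 4,000 sequential page accesses.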

54

Summary


28

Picking the Pivot

o Best choice of pivot
    o Median of array
o Median is expensive to calculate
o Estimate median as the median of three elements
    o Choose first, middle, and last elements
    o e.g., <8, 1, 4, 9, 6, 3, 5, 2, 7, 0>
o Has been shown to reduce running time (comparisons) by 14%
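
A small Python sketch of the median-of-three estimate, applied to the example array above. The function name median_of_three_index is illustrative, and taking the middle element at index (left + right) // 2 is an assumption about how "middle" is defined.

```python
def median_of_three_index(s, left, right):
    """Return the index of the median of s[left], s[center] and s[right]."""
    center = (left + right) // 2
    a, b, c = s[left], s[center], s[right]
    if a <= b <= c or c <= b <= a:
        return center
    if b <= a <= c or c <= a <= b:
        return left
    return right

s = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
idx = median_of_three_index(s, 0, len(s) - 1)
print(idx, s[idx])
```

For this array the three candidates are 8 (first), 6 (middle) and 0 (last), so the chosen pivot is 6.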

29

Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently
o Good strategy
    o Swap pivot with last element S[right]
    o Set i = left
    o Set j = (right - 1)
    o While (i < j)
        o Increment i until S[i] > pivot
        o Decrement j until S[j] < pivot
        o If (i < j) then swap S[i] and S[j]
    o Swap pivot and S[i]
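
The strategy above can be sketched in Python as follows, assuming the pivot has already been swapped into S[right]. The bounds checks on i and j are additions made here so the scans cannot run off the sub-array; the slides do not spell them out. The function name is illustrative.

```python
def partition_skip_equal(s, left, right):
    """Partition s[left..right] around the pivot at s[right]; return its final index."""
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while i <= j and s[i] <= pivot:   # increment i until s[i] > pivot
            i += 1
        while j >= i and s[j] >= pivot:   # decrement j until s[j] < pivot
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]       # both elements are on the wrong side
        else:
            break
    s[i], s[right] = s[right], s[i]       # move the pivot between the two parts
    return i

data = [8, 1, 4, 9, 0, 3, 5, 2, 7, 6]     # pivot 6 already in the last position
p = partition_skip_equal(data, 0, len(data) - 1)
print(p, data)
```

After the call, everything to the left of the returned index is no larger than the pivot and everything to the right is no smaller.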

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to pivot
    o No swaps (good)
    o But then i = (right - 1) and array partitioned into N-1 and 1 elements
    o Worst case: O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach
    o Don't skip elements equal to pivot
        o Increment i while S[i] < pivot
        o Decrement j while S[j] > pivot
    o Adds some unnecessary swaps
    o But results in perfect partitioning for array of identical elements
        o Unlikely for input array, but more likely for recursive calls to QuickSort
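
A hedged Python sketch of the alternative approach, in which both scans stop on elements equal to the pivot. It assumes the pivot is stored at S[right]. The guard on j and the function name are additions for this illustration.

```python
def partition_stop_on_equal(s, left, right):
    """Partition s[left..right] around the pivot at s[right], stopping on equal keys."""
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:               # stops at s[right] (the pivot) at the latest
            i += 1
        while j > left and s[j] > pivot:  # guard keeps j inside the sub-array
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]       # may swap two equal keys (the extra swaps)
            i += 1
            j -= 1
        else:
            break
    s[i], s[right] = s[right], s[i]
    return i

same = [5] * 8
print(partition_stop_on_equal(same, 0, len(same) - 1))
```

On eight identical elements the pivot lands at index 3, leaving 3 elements on the left and 4 on the right, instead of the lopsided split produced when equal keys are skipped.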

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
    o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
    o Good thresholds range from 5 to 20
    o Also avoids issue with finding median-of-three pivot for array of size 2 or less
    o Has been shown to reduce running time by 15%
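
Putting the pieces together, here is one possible Python sketch of QuickSort with a small-array cutoff, median-of-three pivot selection, and partitioning that stops on equal keys. It is not the implementation from the course slides, which did not survive in this transcript. The CUTOFF value of 10 is an assumption taken from the 5 to 20 range above.

```python
CUTOFF = 10  # assumed threshold; the slides only say good values range from 5 to 20

def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp

def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place; sorts the whole list by default."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)    # small sub-array: skip the recursion
        return
    center = (left + right) // 2
    # Order the three samples so that s[left] <= s[center] <= s[right].
    if s[center] < s[left]:
        s[left], s[center] = s[center], s[left]
    if s[right] < s[left]:
        s[left], s[right] = s[right], s[left]
    if s[right] < s[center]:
        s[center], s[right] = s[right], s[center]
    s[center], s[right] = s[right], s[center]   # swap pivot (the median) with last element
    pivot = s[right]
    i, j = left, right - 1
    while True:
        while s[i] < pivot:
            i += 1
        while j > left and s[j] > pivot:
            j -= 1
        if i < j:
            s[i], s[j] = s[j], s[i]
            i += 1
            j -= 1
        else:
            break
    s[i], s[right] = s[right], s[i]       # pivot into its final position
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)

data = list(range(50, 0, -1))
quicksort(data)
print(data[:5], data[-5:])
```

Because every sub-array that reaches the partitioning code has at least CUTOFF elements, the first, middle and last positions are always distinct, which sidesteps the size-2 problem mentioned above.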

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition
o Compute running time T(N) for array of size N
o T(0) = T(1) = O(1)
o T(N) = T(i) + T(N - i - 1) + O(N)
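
As a worked illustration of this recurrence, the two extreme choices of i can be solved directly. The constant c below is an assumed name for the hidden constant in the O(N) partitioning cost, and the constant T(0) term is dropped for simplicity.

```latex
\begin{align*}
\text{Worst case } (i = 0):\quad
  & T(N) = T(N-1) + cN \\
  & \Rightarrow\; T(N) = T(1) + c\sum_{k=2}^{N} k = O(N^2) \\[4pt]
\text{Best case } (i \approx N/2):\quad
  & T(N) = 2\,T(N/2) + cN \\
  & \Rightarrow\; T(N) = cN\log_2 N + N\,T(1) = O(N \log N)
\end{align*}
```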

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
    o For comparison sorting, no, we can't do better
    o Can show lower bound of Ω(N log N)


                                                          29

                                                          Partitioning Strategy

                                                          o Partitioning is conceptually straightforward but easy to do inefficiently

                                                          o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

                                                          o Increment i until S[i] gt pivot

                                                          o Decrement j until S[j] lt pivot

                                                          o If (i lt j) then swap S[i] and S[j]

                                                          o Swap pivot and S[i]

                                                          30

                                                          Partitioning Example

                                                          31

                                                          Partitioning Example

                                                          32

                                                          Partitioning Strategy

                                                          o How to handle duplicateso Consider the case where all elements

                                                          are equalo Current approach Skip over elements

                                                          equal to pivoto No swaps (good)

                                                          o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

                                                          o Worst case O(N2) performance

                                                          33

                                                          Partitioning Strategy

                                                          o How to handle duplicateso Alternative approach

                                                          o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

                                                          o Adds some unnecessary swapso But results in perfect partitioning for array

                                                          of identical elementso Unlikely for input array but more likely for

                                                          recursive calls to QuickSort

                                                          34

                                                          Small Arrays

                                                          o When S is small generating lots of recursive calls on small sub-arrays is expensive

                                                          o General strategyo When N lt threshold use a sort more efficient for

                                                          small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                                                          for array of size 2 or less

                                                          o Has been shown to reduce running time by 15

                                                          35

                                                          QuickSort Implementation

                                                          36

                                                          QuickSort Implementation

                                                          37

                                                          QuickSort Implementation

                                                          38

                                                          Analysis of QuickSort

                                                          o Let i be the number of elements sent to the left partition

                                                          o Compute running time T(N) for array of size N

                                                          o T(0) = T(1) = O(1)

                                                          o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                                                          39

                                                          Analysis of QuickSort

                                                          40

                                                          Analysis of QuickSort

                                                          41

                                                          Comparison Sorting

                                                          42

                                                          Comparison Sorting

                                                          43

                                                          Comparison Sorting

                                                          44

                                                          Lower Bound on Sorting

                                                          o Best worst-case sorting algorithm (so far) is O(N log N)

                                                          o Can we do bettero Can we prove a lower bound on the

                                                          sorting problemo Preview

                                                          o For comparison sorting no we canrsquot do better

                                                          o Can show lower bound of Ω(N log N)

                                                          45

                                                          Decision Trees

                                                          o A decision tree is a binary treeo Each node represents a set of possible

                                                          orderings of the array elementso Each branch represents an outcome of

                                                          a particular comparison

                                                          o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                          46

                                                          Decision Trees

                                                          47

                                                          Decision Tree for Sorting

                                                          o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                          o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                          o In the average case the number of comparisons is the average of the depths of all leaves

                                                          o There are N different orderings of N elements

                                                          48

                                                          Lower Bound for Comparison Sorting

                                                          o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                          49

                                                          Linear Sorting

                                                          o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                          o CountingSort

                                                          o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                          irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                          50

                                                          Linear Sorting

                                                          o BucketSort

                                                          o Assume N elements of A uniformly distributed over the range [01)

                                                          o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                          o Assumes each bucket will contain Θ(1) elements

                                                          51

                                                          External Sorting

                                                          o What is the number of elements N we wish to sort do not fit in memory

                                                          o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                          o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time

53

External Mergesort

o T(N,M) = O(K·M log M + N)
o T(N,M) = O((N/M)·M log M + N)
o T(N,M) = O(N log M + N)
o T(N,M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)
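
The two phases of external mergesort can be simulated in memory with a short Python sketch. Everything here is an assumption made for illustration: Python lists stand in for runs on disk, the built-in `sorted` stands in for QuickSort, `heapq.merge` stands in for the buffered K-way merge, and the names `external_mergesort` and `disk_accesses` are hypothetical.

```python
import heapq


def external_mergesort(a, m):
    """Simulate external mergesort when only m elements fit in memory."""
    # Phase 1: cut A into K = ceil(N/M) runs and sort each run "in memory".
    runs = [sorted(a[start:start + m]) for start in range(0, len(a), m)]
    # Phase 2: a single K-way merge over the sorted runs.
    return list(heapq.merge(*runs))


def disk_accesses(n, p):
    """Page accesses 4N/P: read all and write all, once per phase.

    Assumes the page size p divides n evenly.
    """
    return 4 * n // p
```

With N = 1,000,000 elements and a page size of P = 1,000 elements, `disk_accesses` gives 4,000 sequential page accesses.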

                                                          54

                                                          Summary

                                                          • G64ADS Advanced Data Structures
                                                          • Insertion sort
                                                          • Slide 3
                                                          • Slide 4
                                                          • Slide 5
                                                          • Slide 6
                                                          • Insertion sort worst-case running time
                                                          • Heapsort
                                                          • Heapsort - Analysis
                                                          • Heapsort - No Extra Memory
                                                          • Slide 11
                                                          • Mergesort
                                                          • Slide 13
                                                          • Mergesort Divide
                                                          • Slide 15
                                                          • Slide 16
                                                          • Mergesort Merge
                                                          • Slide 18
                                                          • Slide 19
                                                          • Mergesort Analysis
                                                          • Slide 21
                                                          • Slide 22
                                                          • Quicksort
                                                          • Quicksort Algorithm
                                                          • Quicksort Example
                                                          • Why so fast
                                                          • Picking the Pivot
                                                          • Slide 28
                                                          • Partitioning Strategy
                                                          • Partitioning Example
                                                          • Slide 31
                                                          • Slide 32
                                                          • Slide 33
                                                          • Small Arrays
                                                          • QuickSort Implementation
                                                          • Slide 36
                                                          • Slide 37
                                                          • Analysis of QuickSort
                                                          • Slide 39
                                                          • Slide 40
                                                          • Comparison Sorting
                                                          • Slide 42
                                                          • Slide 43
                                                          • Lower Bound on Sorting
                                                          • Decision Trees
                                                          • Slide 46
                                                          • Decision Tree for Sorting
                                                          • Lower Bound for Comparison Sorting
                                                          • Linear Sorting
                                                          • Slide 50
                                                          • External Sorting
                                                          • External Mergesorting
                                                          • External Mergesort
                                                          • Summary

                                                            30

                                                            Partitioning Example

                                                            31

                                                            Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to pivot
  o No swaps (good)
  o But then i = (right - 1) and array partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to pivot
    o Increment i while S[i] < pivot
    o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for array of identical elements
    o Unlikely for input array, but more likely for recursive calls to QuickSort
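
The alternative approach can be sketched as a partition routine in which both scans stop on elements equal to the pivot. This is an illustrative sketch, not the slides' implementation: the name `partition` is hypothetical, and it assumes the pivot has already been placed in the last position, `s[right]`.

```python
def partition(s, left, right):
    """Partition s[left..right] around the pivot s[right]; return its final index.

    Both scans use strict comparisons, so they stop on elements equal to the pivot.
    """
    pivot = s[right]
    i, j = left - 1, right
    while True:
        i += 1
        while s[i] < pivot:               # s[right] == pivot stops this scan
            i += 1
        j -= 1
        while j > left and s[j] > pivot:  # explicit bound instead of a sentinel
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]           # may swap equal elements (unnecessary but harmless)
    s[i], s[right] = s[right], s[i]       # move the pivot into its final place
    return i
```

On eight identical elements the pivot lands at index 3, leaving sub-arrays of 3 and 4 elements instead of the lopsided split that skipping equal elements would produce.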

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids issue with finding median-of-three pivot for array of size 2 or less
  o Has been shown to reduce running time by 15%
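
A hedged sketch of the cutoff strategy follows, combining median-of-three pivot selection with InsertionSort for small sub-arrays. The threshold value `CUTOFF = 10` and all function names are assumptions chosen for illustration; the code on the implementation slides did not survive extraction, so this is not a reproduction of it.

```python
CUTOFF = 10  # assumed threshold, inside the 5 to 20 range suggested above


def insertion_sort(s, left, right):
    """Sort s[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def quicksort(s, left=0, right=None):
    """Quicksort s[left..right] in place, switching to InsertionSort below CUTOFF."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)
        return
    # Median-of-three: order s[left], s[center], s[right].
    center = (left + right) // 2
    if s[center] < s[left]:
        s[left], s[center] = s[center], s[left]
    if s[right] < s[left]:
        s[left], s[right] = s[right], s[left]
    if s[right] < s[center]:
        s[center], s[right] = s[right], s[center]
    # Hide the pivot at right - 1; s[left] and s[right - 1] act as sentinels.
    s[center], s[right - 1] = s[right - 1], s[center]
    pivot = s[right - 1]
    i, j = left, right - 1
    while True:
        i += 1
        while s[i] < pivot:
            i += 1
        j -= 1
        while s[j] > pivot:
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
    s[i], s[right - 1] = s[right - 1], s[i]  # restore the pivot
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)
```

Because sub-arrays shorter than `CUTOFF` never reach the median-of-three step, the size-2-or-less case mentioned above cannot arise.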

                                                            35

                                                            QuickSort Implementation

                                                            36

                                                            QuickSort Implementation

                                                            37

                                                            QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition
o Compute running time T(N) for array of size N
o T(0) = T(1) = O(1)
o T(N) = T(i) + T(N - i - 1) + O(N)
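
To see how the choice of i drives the recurrence, the sketch below evaluates it numerically. It assumes unit constants (O(1) is taken as 1 and O(N) as N), and the names `running_time`, `worst_split`, and `balanced_split` are hypothetical.

```python
def running_time(n, split):
    """Evaluate T(N) = T(i) + T(N - i - 1) + N with T(0) = T(1) = 1, i = split(N)."""
    t = [1, 1]
    for size in range(2, n + 1):
        i = split(size)
        t.append(t[i] + t[size - i - 1] + size)
    return t[n]


def worst_split(size):
    """Pivot is always the smallest element: nothing goes left."""
    return 0


def balanced_split(size):
    """Pivot is always the median: half of the rest goes left."""
    return (size - 1) // 2
```

With `worst_split` the total grows quadratically (5,149 for N = 100), while `balanced_split` stays close to N log N, which matches the worst-case and best-case behaviour of QuickSort.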

                                                            39

                                                            Analysis of QuickSort

                                                            40

                                                            Analysis of QuickSort

                                                            41

                                                            Comparison Sorting

                                                            42

                                                            Comparison Sorting

                                                            43

                                                            Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                            46

                                                            Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
o In the average case, the number of comparisons is the average of the depths of all leaves
o There are N! different orderings of N elements
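
Since a binary tree of depth d has at most 2^d leaves and the decision tree needs one leaf per ordering, the deepest leaf must be at depth at least ⌈log2(N!)⌉. The small sketch below computes that bound; the function name `min_comparisons` is an assumption made for illustration.

```python
import math


def min_comparisons(n):
    """Smallest depth d with 2**d >= n!, that is, ceil(log2(n!))."""
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (math.factorial(n) - 1).bit_length()
```

For instance, sorting 3 elements needs at least 3 comparisons in the worst case and sorting 5 elements needs at least 7.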

                                                            48

                                                            Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)
o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
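
A minimal Python sketch of CountingSort as described above. It assumes the elements are non-negative integers less than M; the function name `counting_sort` is hypothetical.

```python
def counting_sort(a, m):
    """Sort non-negative integers, each less than m, without comparisons."""
    c = [0] * m                      # c[i] counts how many i's are in a
    for x in a:
        c[x] += 1
    b = []                           # new sorted array
    for value, count in enumerate(c):
        b.extend([value] * count)    # emit each value as often as it occurred
    return b
```

The first loop touches each of the N elements once and the second walks the M counters, which gives the Θ(N+M) running time.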



32

Partitioning Strategy

o How to handle duplicates?
o Consider the case where all elements are equal
o Current approach: skip over elements equal to pivot
  o No swaps (good)
  o But then i = (right - 1) and array partitioned into N-1 and 1 elements
  o Worst case O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach
  o Don't skip elements equal to pivot
  o Increment i while S[i] < pivot
  o Decrement j while S[j] > pivot
  o Adds some unnecessary swaps
  o But results in perfect partitioning for array of identical elements
    o Unlikely for input array, but more likely for recursive calls to QuickSort
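The two ways of handling duplicates can be compared with a minimal sketch. It is not the partition routine from the slides: as a simplifying assumption the pivot is taken from s[right] rather than chosen by median-of-three, and the function name partition and the flag skip_equal are hypothetical.

```python
def partition(s, left, right, skip_equal=False):
    """Partition s[left..right] around the pivot s[right]; return the pivot's final index.

    skip_equal=False: i and j stop on elements equal to the pivot (alternative approach).
    skip_equal=True:  i and j skip over elements equal to the pivot (earlier approach).
    """
    pivot = s[right]                      # assumption: pivot is simply the last element
    i, j = left, right - 1
    while True:
        if skip_equal:
            while i < right and s[i] <= pivot:
                i += 1
            while j > left and s[j] >= pivot:
                j -= 1
        else:
            while i < right and s[i] < pivot:
                i += 1
            while j > left and s[j] > pivot:
                j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]           # both scans stopped: swap the pair
        i += 1
        j -= 1
    s[i], s[right] = s[right], s[i]       # move the pivot between the two partitions
    return i


same = [7] * 9
print(partition(same[:], 0, 8))                   # stopping on equals: pivot lands mid-array
print(partition(same[:], 0, 8, skip_equal=True))  # skipping equals: pivot stays at the right end
```

On an array of identical elements the first call does many swaps but splits the array evenly, while the second does no swaps and leaves one partition holding everything except the pivot.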

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive
o General strategy
  o When N < threshold, use a sort more efficient for small arrays (e.g., InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids issue with finding median-of-three pivot for array of size 2 or less
  o Has been shown to reduce running time by 15%
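A sketch of the cutoff strategy, assuming a threshold of 10 (any value in the 5 to 20 range could be used) and median-of-three pivot selection. The names CUTOFF, insertion_sort, median3 and quicksort are illustrative and are not taken from the implementation slides.

```python
CUTOFF = 10  # hypothetical threshold; the slides suggest a value between 5 and 20


def insertion_sort(s, left, right):
    """Sort s[left..right] in place by insertion."""
    for p in range(left + 1, right + 1):
        tmp = s[p]
        j = p
        while j > left and s[j - 1] > tmp:
            s[j] = s[j - 1]
            j -= 1
        s[j] = tmp


def median3(s, left, right):
    """Order s[left], s[center], s[right]; hide the median at right - 1 and return it."""
    center = (left + right) // 2
    if s[center] < s[left]:
        s[left], s[center] = s[center], s[left]
    if s[right] < s[left]:
        s[left], s[right] = s[right], s[left]
    if s[right] < s[center]:
        s[center], s[right] = s[right], s[center]
    s[center], s[right - 1] = s[right - 1], s[center]
    return s[right - 1]


def quicksort(s, left=0, right=None):
    """Sort s[left..right] in place, switching to insertion sort below CUTOFF."""
    if right is None:
        right = len(s) - 1
    if right - left + 1 < CUTOFF:
        insertion_sort(s, left, right)    # small sub-array: no further recursion
        return
    pivot = median3(s, left, right)
    i, j = left, right - 1
    while True:
        i += 1
        while s[i] < pivot:               # s[right - 1] == pivot stops this scan
            i += 1
        j -= 1
        while s[j] > pivot:               # s[left] <= pivot stops this scan
            j -= 1
        if i >= j:
            break
        s[i], s[j] = s[j], s[i]
    s[i], s[right - 1] = s[right - 1], s[i]   # restore the pivot
    quicksort(s, left, i - 1)
    quicksort(s, i + 1, right)
```

Because every sub-array that reaches median3 has at least CUTOFF elements, the size-2-or-less special case never arises.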

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition
o Compute running time T(N) for array of size N
  o T(0) = T(1) = O(1)
  o T(N) = T(i) + T(N - i - 1) + O(N)
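To get a feel for the recurrence, it can be evaluated numerically for two extreme choices of i. The sketch below assumes the O(1) and O(N) terms are exactly 1 and N; the function names are hypothetical.

```python
def recurrence_cost(n, split):
    """Evaluate T(n) = T(i) + T(n - i - 1) + n bottom-up, with T(0) = T(1) = 1.

    split(k) returns i, the size of the left partition for an array of size k.
    """
    t = [1] * (max(n, 1) + 1)
    for k in range(2, n + 1):
        i = split(k)
        t[k] = t[i] + t[k - i - 1] + k
    return t[n]


def worst_split(k):
    """Pivot is always the smallest element: nothing goes left."""
    return 0


def balanced_split(k):
    """Pivot is always the median: the two partitions are as equal as possible."""
    return (k - 1) // 2


for n in (127, 1023):
    print(n, recurrence_cost(n, worst_split), recurrence_cost(n, balanced_split))
```

With i = 0 the cost grows quadratically in N, while the balanced split grows roughly like N log N.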

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
o In the average case, the number of comparisons is the average of the depths of all leaves
o There are N! different orderings of N elements
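Since N elements have N! possible orderings and a binary tree with L leaves has depth at least ceil(log2 L), the deepest leaf of any decision tree for sorting is at depth at least ceil(log2(N!)). A small sketch that computes this bound exactly (the function name is hypothetical):

```python
import math


def min_worst_case_comparisons(n):
    """Return ceil(log2(n!)), a lower bound on worst-case comparisons for n elements."""
    leaves = math.factorial(n)           # one leaf per possible ordering
    return (leaves - 1).bit_length()     # exact ceil(log2(leaves)) for leaves >= 1


for n in (3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n))
```

Integer bit lengths are used instead of floating-point logarithms so the result stays exact for large N.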

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)
o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
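The CountingSort steps above can be sketched as follows, assuming the elements are non-negative integers in the range 0 to M-1. The function name counting_sort is illustrative.

```python
def counting_sort(a, m):
    """Return a sorted copy of a, whose elements are integers in range(m)."""
    c = [0] * m                          # C[i] = number of i's in A
    for x in a:
        if not 0 <= x < m:
            raise ValueError(f"element {x} is outside range(0, {m})")
        c[x] += 1
    b = []                               # new sorted array B
    for value, count in enumerate(c):
        b.extend([value] * count)
    return b


print(counting_sort([3, 0, 2, 3, 1, 0], 4))
```

No element is ever compared with another: one pass over A fills C and one pass over C fills B, which gives the Θ(N+M) running time.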

50

Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0,1)
  o Create N equal-sized buckets over [0,1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
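A minimal sketch of the BucketSort steps, assuming floating-point inputs in [0,1). The function name and the bucket index formula int(x * n) are illustrative choices, not taken from the slides.

```python
def bucket_sort(a):
    """Return a sorted copy of a, whose elements are floats in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]     # N equal-sized buckets over [0, 1)
    for x in a:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"element {x} is outside [0, 1)")
        index = min(int(x * n), n - 1)   # min() guards against floating-point rounding
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        # Insertion sort: cheap because each bucket is expected to hold few elements.
        for p in range(1, len(bucket)):
            tmp = bucket[p]
            j = p
            while j > 0 and bucket[j - 1] > tmp:
                bucket[j] = bucket[j - 1]
                j -= 1
            bucket[j] = tmp
        result.extend(bucket)            # concatenate buckets in order
    return result


print(bucket_sort([0.42, 0.07, 0.93, 0.41, 0.58]))
```

If the input is far from uniform, most elements can fall into a few buckets and the insertion sorts dominate, so the Θ(N) figure is an average-case claim only.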

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
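The approach can be sketched with temporary files standing in for the disk. This is a simplified illustration under several assumptions: elements are integers stored one per line, each run is sorted with Python's built-in list sort rather than QuickSort, and the standard library's heapq.merge performs the K-way merge while file buffering is left to the runtime instead of being managed page by page. The function name external_mergesort is hypothetical.

```python
import heapq
import itertools
import tempfile


def external_mergesort(items, m):
    """Yield the integers in items in sorted order, sorting at most m of them in memory at once."""
    if m < 1:
        raise ValueError("m must be at least 1")
    runs = []
    it = iter(items)
    try:
        while True:
            chunk = list(itertools.islice(it, m))   # read in M elements
            if not chunk:
                break
            chunk.sort()                            # sort one run in memory
            run = tempfile.TemporaryFile(mode="w+")
            run.writelines(f"{x}\n" for x in chunk)  # write the run back to "disk"
            run.seek(0)
            runs.append(run)
        # K-way merge: each stream reads its run lazily, one line at a time.
        streams = [(int(line) for line in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()


data = [29, 3, 17, 8, 42, 1, 15, 23, 4, 16]
print(list(external_mergesort(data, 3)))
```

With 10 elements and m = 3 the sketch creates K = 4 sorted runs before merging them.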

53

External Mergesort

o T(N,M) = O((K · M log M) + N)
o T(N,M) = O(((N/M) · M log M) + N)
o T(N,M) = O((N log M) + N)
o T(N,M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)
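As a worked example of these quantities, the sketch below computes K, the buffer size M/(K+1) and the number of page accesses for given N, M and P. The function name and the sample values are hypothetical, and whole pages are assumed, so N/P is rounded up.

```python
def external_sort_plan(n, m, p):
    """Return (K, buffer size, page accesses) for N elements, memory M and page size P."""
    if min(n, m, p) < 1:
        raise ValueError("n, m and p must all be at least 1")
    k = -(-n // m)                 # K = ceil(N / M) sorted runs
    buffer_size = m // (k + 1)     # K input buffers plus 1 output buffer share memory
    pages = -(-n // p)             # pages needed to hold all N elements
    accesses = 4 * pages           # read all and write all, twice
    return k, buffer_size, accesses


# Hypothetical sizes: one million elements, memory for 100,000, pages of 1,000 elements.
print(external_sort_plan(1_000_000, 100_000, 1_000))
```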

54

Summary

                                                                    34

                                                                    Small Arrays

                                                                    o When S is small generating lots of recursive calls on small sub-arrays is expensive

                                                                    o General strategyo When N lt threshold use a sort more efficient for

                                                                    small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

                                                                    for array of size 2 or less

                                                                    o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)
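To see how the choice of i drives the recurrence, it can be evaluated numerically. The sketch below assumes unit costs (T(0) = T(1) = 1 and a partition cost of exactly N); the function name `quicksort_cost` and the `split` parameter are illustrative, not from the slides.

```python
def quicksort_cost(n, split):
    """Evaluate T(n) = T(i) + T(n - i - 1) + n with i = split(size).

    Assumes T(0) = T(1) = 1 and a partition cost of exactly n.
    """
    t = [1] * (n + 1)
    for size in range(2, n + 1):
        i = split(size)
        t[size] = t[i] + t[size - i - 1] + size
    return t[n]


def worst_split(size):
    return 0                  # pivot is always the smallest element


def balanced_split(size):
    return (size - 1) // 2    # pivot is always the median


for n in (10, 100, 1000):
    print(n, quicksort_cost(n, worst_split), quicksort_cost(n, balanced_split))
```

With i = 0 at every level the cost grows quadratically in N, while the balanced split grows in proportion to N log N.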

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?

o Can we prove a lower bound on the sorting problem?

o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case, the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
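A binary tree with L leaves has depth at least ⌈log2 L⌉, so a decision tree that must distinguish N! orderings has depth at least ⌈log2(N!)⌉. The sketch below computes that bound for small N; the function name `min_worst_case_comparisons` is an assumption made for illustration.

```python
import math


def min_worst_case_comparisons(n):
    """Return ceil(log2(n!)), the least possible depth of the decision tree."""
    leaves = math.factorial(n)
    # For an integer L >= 1, (L - 1).bit_length() equals ceil(log2(L)).
    return (leaves - 1).bit_length()


for n in (3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n))
```

Integer arithmetic is used instead of `math.log2` so that the result stays exact for large N.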

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
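The CountingSort steps above might look like this in code. It is a minimal sketch that assumes the elements are non-negative integers below M; the names `counting_sort`, `a`, `m`, `c`, and `b` mirror the slide's A, M, C, and B but are otherwise illustrative.

```python
def counting_sort(a, m):
    """Return a sorted copy of a, whose elements are integers in [0, m)."""
    c = [0] * m                      # c[i] = number of i's in a
    for x in a:
        if not 0 <= x < m:
            raise ValueError(f"element {x} is outside [0, {m})")
        c[x] += 1
    b = []                           # new sorted array
    for value, count in enumerate(c):
        b.extend([value] * count)
    return b


print(counting_sort([3, 0, 2, 3, 1, 0], 4))  # [0, 0, 1, 2, 3, 3]
```

One pass over A and one pass over C give the Θ(N+M) running time, and no two elements are ever compared with each other.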

50

Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0,1)
  o Create N equal-sized buckets over [0,1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
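A minimal sketch of the BucketSort steps, assuming the input is a list of floats in [0,1). The function names are illustrative, and the `min(..., n - 1)` guard is an added safety measure against floating-point rounding rather than part of the slide's description.

```python
def insertion_sort(bucket):
    """Sort a small list in place."""
    for p in range(1, len(bucket)):
        tmp = bucket[p]
        j = p
        while j > 0 and bucket[j - 1] > tmp:
            bucket[j] = bucket[j - 1]
            j -= 1
        bucket[j] = tmp


def bucket_sort(a):
    """Return a sorted copy of a, whose elements lie in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]            # N equal-sized buckets
    for x in a:
        index = min(int(x * n), n - 1)          # bucket covering x
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)                   # concatenate in order
    return result


print(bucket_sort([0.42, 0.07, 0.99, 0.5, 0.31, 0.07]))
```

If the input is far from uniform, most elements land in a few buckets and the per-bucket insertion sort dominates the running time.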

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?

o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
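The two phases above (sorted runs, then a K-way merge) can be sketched in a few lines. This is an illustration under several assumptions: temporary text files stand in for the disk, Python's built-in `sort` replaces QuickSort, `heapq.merge` performs the K-way merge (it keeps a heap, so each output element costs O(log K)), and the merged result is collected in a list only so it can be inspected. The name `external_mergesort` is hypothetical.

```python
import heapq
import tempfile


def external_mergesort(values, m):
    """Sort integers from an iterable, holding at most m of them per run."""
    runs = []
    chunk = []

    def flush():
        # Phase 1: sort one in-memory run and write it out sequentially.
        chunk.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{x}\n" for x in chunk)
        f.seek(0)
        runs.append(f)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) == m:
            flush()
    if chunk:
        flush()

    # Phase 2: K-way merge; each run is read back lazily, line by line.
    readers = [(int(line) for line in f) for f in runs]
    merged = list(heapq.merge(*readers))
    for f in runs:
        f.close()
    return merged


print(external_mergesort([5, 3, 9, 1, 7, 2, 8], 3))  # [1, 2, 3, 5, 7, 8, 9]
```

A production version would stream the merged output back to disk page by page instead of building a list.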

53

External Mergesort

o T(N, M) = O(K · (M log M) + N)
o T(N, M) = O((N/M) · (M log M) + N)
o T(N, M) = O((N log M) + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)
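As a quick arithmetic check of these formulas, the sketch below plugs in sample sizes. The function name `external_sort_plan` and the example values of N, M, and P are assumptions chosen only for illustration.

```python
def external_sort_plan(n, m, p):
    """Return (K, buffer size, page accesses) for external mergesort."""
    k = -(-n // m)               # K = ceil(N / M) using integer arithmetic
    buffer_size = m // (k + 1)   # K input buffers plus 1 output buffer
    accesses = 4 * n / p         # read all + write all, done twice
    return k, buffer_size, accesses


# Hypothetical sizes: 1,000,000 elements, 100,000 fit in memory, 1,000 per page.
print(external_sort_plan(1_000_000, 100_000, 1_000))  # (10, 9090, 4000.0)
```

The factor of 4 comes from two sequential passes: run creation reads and writes every page once, and the merge reads and writes every page once more.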

                                                                    54

                                                                    Summary



                                                                        36

                                                                        QuickSort Implementation

                                                                        37

                                                                        QuickSort Implementation

                                                                        38

                                                                        Analysis of QuickSort

                                                                        o Let i be the number of elements sent to the left partition

                                                                        o Compute running time T(N) for array of size N

                                                                        o T(0) = T(1) = O(1)

                                                                        o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                                                                        39

                                                                        Analysis of QuickSort

                                                                        40

                                                                        Analysis of QuickSort

                                                                        41

                                                                        Comparison Sorting

                                                                        42

                                                                        Comparison Sorting

                                                                        43

                                                                        Comparison Sorting

                                                                        44

                                                                        Lower Bound on Sorting

                                                                        o Best worst-case sorting algorithm (so far) is O(N log N)

                                                                        o Can we do bettero Can we prove a lower bound on the

                                                                        sorting problemo Preview

                                                                        o For comparison sorting no we canrsquot do better

                                                                        o Can show lower bound of Ω(N log N)

                                                                        45

                                                                        Decision Trees

                                                                        o A decision tree is a binary treeo Each node represents a set of possible

                                                                        orderings of the array elementso Each branch represents an outcome of

                                                                        a particular comparison

                                                                        o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                                        46

                                                                        Decision Trees

                                                                        47

                                                                        Decision Tree for Sorting

                                                                        o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                                        o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                                        o In the average case the number of comparisons is the average of the depths of all leaves

                                                                        o There are N different orderings of N elements

                                                                        48

                                                                        Lower Bound for Comparison Sorting

                                                                        o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                                        49

                                                                        Linear Sorting

                                                                        o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                        o CountingSort

                                                                        o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                        irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                                        50

                                                                        Linear Sorting

                                                                        o BucketSort

                                                                        o Assume N elements of A uniformly distributed over the range [01)

                                                                        o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                                        o Assumes each bucket will contain Θ(1) elements

                                                                        51

                                                                        External Sorting

o What if the N elements we wish to sort do not fit in memory?
o Obviously, our existing sort algorithms are inefficient
o Each comparison potentially requires a disk access

                                                                        o We want to minimize disk accesses

                                                                        52

                                                                        External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
o Read in M elements of A, sort them using QuickSort, and write them back to disk: O(M log M)
o Repeat the above K times until all of A is processed
o Create K input buffers and 1 output buffer, each of size M/(K+1)
o Perform a K-way merge: O(N)
o Update input buffers one disk page at a time
o Write output buffer one disk page at a time
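The two phases above can be sketched in Python. This is an illustration under several assumptions that are not in the slides: the data is a text file with one integer per line, Python's built-in sort stands in for QuickSort, and ordinary file buffering plays the role of the page-sized input and output buffers. The name `external_mergesort` and its parameters are hypothetical.

```python
import heapq
import itertools
import os
import tempfile


def external_mergesort(in_path, out_path, m):
    """Sort a file of integers holding at most m values in memory per run.

    Returns K, the number of sorted runs written to disk.
    """
    run_paths = []
    # Phase 1: read M elements, sort them in memory, write the run to disk.
    with open(in_path) as src:
        while True:
            chunk = [int(line) for line in itertools.islice(src, m)]
            if not chunk:
                break
            chunk.sort()                      # stand-in for QuickSort
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)

    # Phase 2: K-way merge of the sorted runs, streamed line by line.
    runs = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in f) for f in runs]
        with open(out_path, "w") as out:
            out.writelines(f"{x}\n" for x in heapq.merge(*streams))
    finally:
        for f in runs:
            f.close()
        for path in run_paths:
            os.remove(path)
    return len(run_paths)
```

`heapq.merge` keeps one element per run in a heap, so this merge costs O(N log K) comparisons; that matches the O(N) figure above only when K is treated as a constant.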

                                                                        53

                                                                        External Mergesort

o T(N, M) = O(K · M log M + N)
o T(N, M) = O((N/M) · M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
o P = page size
o Accesses = 4N/P (read all and write all, twice)
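To make the formulas concrete, the short sketch below plugs numbers into K = ⌈N/M⌉ and Accesses = 4N/P. The function name and the sample sizes are illustrative assumptions, not figures from the course.

```python
def external_sort_cost(n, m, p):
    """Return (K, page accesses) for N elements, memory M, page size P."""
    k = -(-n // m)            # ceiling of N/M: number of sorted runs
    pages = -(-n // p)        # ceiling of N/P: pages needed to hold the data
    accesses = 4 * pages      # read all + write all, once per phase
    return k, accesses


# Illustrative sizes: 10^9 elements, 10^7 fit in memory, 10^4 per page.
k, accesses = external_sort_cost(10**9, 10**7, 10**4)
print(k, accesses)            # 100 400000
```

The access count does not depend on M: more memory reduces the number of runs K, but every element is still read and written exactly twice.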

                                                                        54

                                                                        Summary


                                                                          37

                                                                          QuickSort Implementation

                                                                          38

                                                                          Analysis of QuickSort

                                                                          o Let i be the number of elements sent to the left partition

                                                                          o Compute running time T(N) for array of size N

                                                                          o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)
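The content of the following analysis slides did not survive in this transcript. As a sketch of the standard cases, assuming the O(N) partitioning cost is at most cN for some constant c, the recurrence resolves as follows:

```latex
\begin{align*}
\text{Worst case } (i = 0 \text{ at every level}):\quad
  & T(N) = T(N-1) + cN
    \;\Longrightarrow\; T(N) = T(1) + c\sum_{k=2}^{N} k = O(N^{2}) \\
\text{Best case } (i \approx N/2):\quad
  & T(N) = 2\,T(N/2) + cN
    \;\Longrightarrow\; T(N) = O(N \log N) \\
\text{Average case } (i \text{ uniform on } 0,\dots,N-1):\quad
  & T(N) = \frac{2}{N}\sum_{j=0}^{N-1} T(j) + cN
    \;\Longrightarrow\; T(N) = O(N \log N)
\end{align*}
```

The worst case arises when the pivot is always the smallest (or largest) element, for example a first-element pivot on already sorted input.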

                                                                          39

                                                                          Analysis of QuickSort

                                                                          40

                                                                          Analysis of QuickSort

                                                                          41

                                                                          Comparison Sorting

                                                                          42

                                                                          Comparison Sorting

                                                                          43

                                                                          Comparison Sorting

                                                                          44

                                                                          Lower Bound on Sorting

                                                                          o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
o For comparison sorting, no, we can't do better

                                                                          o Can show lower bound of Ω(N log N)

                                                                          45

                                                                          Decision Trees

o A decision tree is a binary tree
o Each node represents a set of possible orderings of the array elements
o Each branch represents an outcome of a particular comparison

                                                                          o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                                          46

                                                                          Decision Trees

                                                                          47

                                                                          Decision Tree for Sorting

                                                                          o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                                          o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                                          o In the average case the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
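One way to check these claims on small inputs is to count comparisons directly. The sketch below is an illustration with hypothetical function names: it runs a comparison-counting mergesort on every ordering of N elements and sets the worst case against ⌈log₂(N!)⌉, the minimum depth of a binary tree with N! leaves.

```python
import itertools
import math


def merge_sort_comparisons(items):
    """Return (sorted list, number of element comparisons performed)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort_comparisons(items[:mid])
    right, right_count = merge_sort_comparisons(items[mid:])

    merged, count = [], left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        count += 1                      # one branch taken in the decision tree
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


def worst_case_comparisons(n):
    """Depth of the deepest leaf: maximum comparisons over all N! orderings."""
    return max(merge_sort_comparisons(perm)[1]
               for perm in itertools.permutations(range(n)))


def lower_bound(n):
    """Minimum possible depth of a binary tree with N! leaves."""
    return math.ceil(math.log2(math.factorial(n)))
```

For N = 4 both values are 5, so mergesort is optimal there; for N = 5 mergesort needs 8 comparisons in the worst case against a bound of 7.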

                                                                            38

                                                                            Analysis of QuickSort

                                                                            o Let i be the number of elements sent to the left partition

                                                                            o Compute running time T(N) for array of size N

                                                                            o T(0) = T(1) = O(1)

                                                                            o T(N) = T(i) + T(N ndashi ndash1) + O(N)

                                                                            39

                                                                            Analysis of QuickSort

                                                                            40

                                                                            Analysis of QuickSort

                                                                            41

                                                                            Comparison Sorting

                                                                            42

                                                                            Comparison Sorting

                                                                            43

                                                                            Comparison Sorting

                                                                            44

                                                                            Lower Bound on Sorting

                                                                            o Best worst-case sorting algorithm (so far) is O(N log N)

                                                                            o Can we do bettero Can we prove a lower bound on the

                                                                            sorting problemo Preview

                                                                            o For comparison sorting no we canrsquot do better

                                                                            o Can show lower bound of Ω(N log N)

                                                                            45

                                                                            Decision Trees

                                                                            o A decision tree is a binary treeo Each node represents a set of possible

                                                                            orderings of the array elementso Each branch represents an outcome of

                                                                            a particular comparison

                                                                            o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                                            46

                                                                            Decision Trees

                                                                            47

                                                                            Decision Tree for Sorting

                                                                            o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                                            o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                                            o In the average case the number of comparisons is the average of the depths of all leaves

                                                                            o There are N different orderings of N elements

                                                                            48

                                                                            Lower Bound for Comparison Sorting

                                                                            o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                                            49

                                                                            Linear Sorting

                                                                            o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                            o CountingSort

                                                                            o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                            irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                                            50

                                                                            Linear Sorting

                                                                            o BucketSort

                                                                            o Assume N elements of A uniformly distributed over the range [01)

                                                                            o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                                            o Assumes each bucket will contain Θ(1) elements

                                                                            51

                                                                            External Sorting

                                                                            o What is the number of elements N we wish to sort do not fit in memory

                                                                            o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                                            o We want to minimize disk accesses

                                                                            52

                                                                            External Mergesorting

                                                                            o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                                            o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                                            o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                                            size M(K+1)o Perform a K-way merge O(N)

                                                                            o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                                            53

                                                                            External Mergesort

                                                                            o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                                            o P = page size

                                                                            o Accesses = 4NP (read-allwrite-all twice)

                                                                            54

                                                                            Summary

                                                                            • G64ADS Advanced Data Structures
                                                                            • Insertion sort
                                                                            • Slide 3
                                                                            • Slide 4
                                                                            • Slide 5
                                                                            • Slide 6
                                                                            • Insertion sort worst-case running time
                                                                            • Heapsort
                                                                            • Heapsort -Analysis
                                                                            • Heapsort ndash No Extra Memory
                                                                            • Slide 11
                                                                            • Mergesort
                                                                            • Slide 13
                                                                            • Mergesort Divide
                                                                            • Slide 15

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)
o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
  o Each node represents a set of possible orderings of the array elements
  o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
o In the average case, the number of comparisons is the average of the depths of all leaves
o There are N! different orderings of N elements
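The counting behind these bullets can be made concrete. A decision tree that sorts N elements needs at least N! leaves (one per ordering), and a binary tree of depth d has at most 2^d leaves, so the deepest leaf sits at depth at least ⌈log2(N!)⌉. The Python sketch below computes that bound for a few small N; the function name `min_comparisons` is our own and does not come from the slides.

```python
import math

def min_comparisons(n: int) -> int:
    """Smallest depth d with 2**d >= n!, i.e. ceil(log2(n!))."""
    leaves = math.factorial(n)          # one leaf per possible ordering
    # (leaves - 1).bit_length() is an exact integer ceil(log2(leaves))
    return (leaves - 1).bit_length()

for n in (2, 3, 4, 5, 10):
    print(n, math.factorial(n), min_comparisons(n))
```

For N = 3 this gives 3 comparisons in the worst case, and for N = 5 it gives 7. No comparison-based algorithm can guarantee fewer.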

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7
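The transcript keeps only the labels of these lemmas and theorems, not their statements. What follows is the standard textbook argument for the Ω(N log N) bound, offered as a sketch rather than a reproduction of the slide. It assumes a binary decision tree of depth d, which has at most 2^d leaves, while a comparison sort needs at least N! leaves.

```latex
\begin{align*}
2^{d} &\ge N! \;\Longrightarrow\; d \ge \log_2 (N!) \\
\log_2 (N!) &= \sum_{i=1}^{N} \log_2 i
  \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
  \;\ge\; \frac{N}{2}\,\log_2 \frac{N}{2} \\
&= \frac{N}{2}\log_2 N - \frac{N}{2} \;=\; \Omega(N \log N)
\end{align*}
```

The second line drops the smaller half of the terms and bounds each remaining term from below by log2(N/2). At least N/2 terms remain.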

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)
o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place elements into new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
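The CountingSort steps can be sketched in Python as follows. This is a minimal illustration that assumes the elements are nonnegative integers in the range 0 to M-1. The names `counting_sort`, `counts` and `b` are ours, standing in for the slide's C and B.

```python
def counting_sort(a: list[int], m: int) -> list[int]:
    """Sort integers in the range 0 <= x < m without comparing elements."""
    counts = [0] * m                    # C[i] = number of i's in A
    for x in a:
        if not 0 <= x < m:
            raise ValueError(f"element {x} outside [0, {m})")
        counts[x] += 1
    b = []                              # new sorted array B
    for value, c in enumerate(counts):
        b.extend([value] * c)           # emit each value as often as it occurred
    return b

print(counting_sort([3, 1, 4, 1, 5, 2, 0, 3], 6))  # [0, 1, 1, 2, 3, 3, 4, 5]
```

The first loop touches each of the N elements once and the second walks the M counters, which is where the Θ(N+M) running time comes from.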

50

Linear Sorting

o BucketSort
  o Assume N elements of A uniformly distributed over the range [0,1)
  o Create N equal-sized buckets over [0,1)
  o Add each element of A into appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return concatenation of buckets
  o Average case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
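A possible Python rendering of the BucketSort steps is shown below. It is a sketch that assumes every element is a float in [0,1). The helper names `insertion_sort` and `bucket_sort` are ours, and element x is assigned to bucket number floor(x · N).

```python
def insertion_sort(bucket: list[float]) -> None:
    """In-place insertion sort, used on the (expected small) buckets."""
    for p in range(1, len(bucket)):
        tmp = bucket[p]
        j = p
        while j > 0 and bucket[j - 1] > tmp:
            bucket[j] = bucket[j - 1]   # shift larger elements right
            j -= 1
        bucket[j] = tmp

def bucket_sort(a: list[float]) -> list[float]:
    """Sort values in [0, 1) using len(a) equal-width buckets."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"element {x} outside [0, 1)")
        # bucket i covers [i/n, (i+1)/n); min() guards against rounding
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)           # concatenate buckets in order
    return result
```

If the input is far from uniform, many elements land in one bucket and the insertion sort on that bucket dominates, so the Θ(N) figure holds only on average under the uniformity assumption.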

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?
o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access
o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read in M amount of A, sort it using QuickSort, and write it back to disk: O(M log M)
  o Repeat above K times until all of A processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk-page at a time
    o Write output buffer one disk-page at a time
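The approach above can be sketched in Python under some loudly stated assumptions. The input is taken to be a text file with one integer per line, and the function name `external_mergesort` and the run-file layout are ours. Python's built-in `sort` stands in for the QuickSort named on the slide, and ordinary buffered file I/O stands in for the explicit page-sized buffers. The K-way merge uses the standard-library `heapq.merge`.

```python
import heapq
import itertools
import os
import tempfile

def external_mergesort(in_path: str, out_path: str, m: int) -> int:
    """Sort a file of one integer per line, holding at most m values in
    memory while creating runs. Returns K, the number of sorted runs."""
    run_paths = []
    with open(in_path) as src:
        while True:
            # Read the next (at most) m elements of A
            chunk = [int(line) for line in itertools.islice(src, m)]
            if not chunk:
                break
            chunk.sort()                # in-memory sort of one run
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)

    runs = [open(p) for p in run_paths]
    try:
        # One lazy, sequential reader per run
        streams = [(int(line) for line in f) for f in runs]
        with open(out_path, "w") as out:
            # K-way merge: always emits the smallest head element
            out.writelines(f"{x}\n" for x in heapq.merge(*streams))
    finally:
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)                # clean up temporary run files
    return len(run_paths)
```

Calling `external_mergesort("in.txt", "out.txt", 10)` on a file of 103 integers would create 11 runs and then merge them. The heap inside `heapq.merge` does O(log K) work per element, which matches the slide's O(N) merge cost when K is treated as a constant.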

53

External Mergesort

o T(N, M) = O((K · M log M) + N)
o T(N, M) = O(((N/M) · M log M) + N)
o T(N, M) = O((N log M) + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)
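To make the access count concrete, the small Python sketch below evaluates K = ⌈N/M⌉ and 4N/P. The function name `external_sort_cost` and the sample sizes are illustrative assumptions of ours, not figures from the slides. Partial pages are rounded up to whole pages.

```python
def external_sort_cost(n: int, m: int, p: int) -> tuple[int, int]:
    """Return (K, page accesses) for N elements, memory size M, page size P."""
    k = (n + m - 1) // m                # K = ceil(N / M) sorted runs
    pages_per_pass = (n + p - 1) // p   # ceil(N / P) pages hold all of A
    # Run creation reads and writes everything once; the merge does so again.
    accesses = 4 * pages_per_pass
    return k, accesses

print(external_sort_cost(n=1_000_000, m=100_000, p=1_000))  # (10, 4000)
```

With these assumed sizes the data is split into 10 runs and the whole sort touches 4000 pages, all of them sequentially.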

54

Summary



                                                                                  41

                                                                                  Comparison Sorting

                                                                                  42

                                                                                  Comparison Sorting

                                                                                  43

                                                                                  Comparison Sorting

                                                                                  44

                                                                                  Lower Bound on Sorting

                                                                                  o Best worst-case sorting algorithm (so far) is O(N log N)

                                                                                  o Can we do bettero Can we prove a lower bound on the

                                                                                  sorting problemo Preview

                                                                                  o For comparison sorting no we canrsquot do better

                                                                                  o Can show lower bound of Ω(N log N)

                                                                                  45

                                                                                  Decision Trees

                                                                                  o A decision tree is a binary treeo Each node represents a set of possible

                                                                                  orderings of the array elementso Each branch represents an outcome of

                                                                                  a particular comparison

                                                                                  o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                                                  46

                                                                                  Decision Trees

                                                                                  47

                                                                                  Decision Tree for Sorting

                                                                                  o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                                                  o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                                                  o In the average case the number of comparisons is the average of the depths of all leaves

                                                                                  o There are N different orderings of N elements

                                                                                  48

                                                                                  Lower Bound for Comparison Sorting

                                                                                  o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                                                  49

                                                                                  Linear Sorting

                                                                                  o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                                  o CountingSort

                                                                                  o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                                  irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                                                  50

                                                                                  Linear Sorting

                                                                                  o BucketSort

  o Assume the N elements of A are uniformly distributed over the range [0, 1)
  o Create N equal-sized buckets over [0, 1)
  o Add each element of A to the appropriate bucket
  o Sort each bucket (e.g., with InsertionSort)
  o Return the concatenation of the buckets
  o Average-case running time: Θ(N)
    o Assumes each bucket will contain Θ(1) elements
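A minimal sketch of BucketSort under the slide's assumption that every element lies in [0, 1). The helper names and the guard against floating-point rounding at the top of the range are illustrative additions, not part of the original slides.

```python
def insertion_sort(items):
    """Sort `items` in place with insertion sort and return it."""
    for p in range(1, len(items)):
        tmp = items[p]
        j = p
        while j > 0 and items[j - 1] > tmp:
            items[j] = items[j - 1]
            j -= 1
        items[j] = tmp
    return items


def bucket_sort(a):
    """Return a sorted copy of `a`, whose elements are floats in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]           # N equal-sized buckets over [0, 1)
    for x in a:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"element {x} is outside [0, 1)")
        index = min(int(x * n), n - 1)         # guard against rounding up to n
        buckets[index].append(x)

    result = []
    for bucket in buckets:                     # buckets are already in key order
        result.extend(insertion_sort(bucket))
    return result


print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```

If the input is far from uniform, many elements land in one bucket and the running time degrades toward that of the per-bucket sort, which is quadratic for insertion sort.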

                                                                                  51

                                                                                  External Sorting

o What if the N elements we wish to sort do not fit in memory?

o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access

o We want to minimize disk accesses

                                                                                  52

                                                                                  External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
  o Read M elements of A into memory, sort them using QuickSort, and write them back to disk: O(M log M)
  o Repeat the above K times until all of A is processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update input buffers one disk page at a time
    o Write the output buffer one disk page at a time
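The two phases, run formation followed by a K-way merge, can be sketched as below. This is an illustration under several assumptions rather than the slide's implementation: the input is any iterable of integers standing in for the on-disk array, each sorted run is stored as a temporary text file with one integer per line, Python's built-in `list.sort` replaces QuickSort, and `heapq.merge` performs the K-way merge with a heap, costing O(N log K) rather than the O(N) quoted on the slide. Page-sized buffering is left to the operating system's file buffering.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_run(sorted_values):
    """Write one sorted run to a temporary file and return the file's path."""
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    with f:
        for v in sorted_values:
            f.write(f"{v}\n")
    return f.name


def _read_run(path):
    """Lazily yield the integers stored in a run file, one line at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_merge_sort(values, m):
    """Yield `values` in sorted order while holding at most `m` items of a run in memory."""
    if m < 1:
        raise ValueError("m must be at least 1")
    it = iter(values)
    run_paths = []
    try:
        # Phase 1: read M elements at a time, sort them, write each run to disk.
        while True:
            chunk = list(islice(it, m))
            if not chunk:
                break
            chunk.sort()
            run_paths.append(_write_run(chunk))
        # Phase 2: K-way merge of the K sorted runs.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


print(list(external_merge_sort([5, 3, 8, 1, 9, 2, 7], 3)))
```

With seven elements and `m = 3` the sketch forms K = 3 runs, matching K = ⌈N/M⌉.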

                                                                                  53

                                                                                  External Mergesort

o T(N, M) = O((K · M log M) + N)
o T(N, M) = O(((N/M) · M log M) + N)
o T(N, M) = O((N log M) + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential)
  o P = page size
  o Accesses = 4N/P (read all and write all, twice)
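As a small worked example of the access count: run formation reads and writes every page once, and the merge reads and writes every page once more, giving four page transfers per page of data. The function below is an illustrative helper. It assumes a single merge pass and rounds N/P up to whole pages, a detail the slide's formula leaves implicit.

```python
def disk_accesses(n, p):
    """Page accesses for external mergesort on n elements with p elements per page.

    Assumes one run-formation pass and one merge pass, each reading and
    writing all of the data once.
    """
    if n < 0 or p < 1:
        raise ValueError("n must be non-negative and p must be positive")
    pages = -(-n // p)        # ceiling of n / p using integer arithmetic
    return 4 * pages          # read all + write all, twice


print(disk_accesses(1000, 100))
```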

                                                                                  54

                                                                                  Summary

                                                                                  • G64ADS Advanced Data Structures
                                                                                  • Insertion sort
                                                                                  • Slide 3
                                                                                  • Slide 4
                                                                                  • Slide 5
                                                                                  • Slide 6
                                                                                  • Insertion sort worst-case running time
                                                                                  • Heapsort
                                                                                  • Heapsort - Analysis
                                                                                  • Heapsort - No Extra Memory
                                                                                  • Slide 11
                                                                                  • Mergesort
                                                                                  • Slide 13
                                                                                  • Mergesort Divide
                                                                                  • Slide 15
                                                                                  • Slide 16
                                                                                  • Mergesort Merge
                                                                                  • Slide 18
                                                                                  • Slide 19
                                                                                  • Mergesort Analysis
                                                                                  • Slide 21
                                                                                  • Slide 22
                                                                                  • Quicksort
                                                                                  • Quicksort Algorithm
                                                                                  • Quicksort Example
                                                                                  • Why so fast
                                                                                  • Picking the Pivot
                                                                                  • Slide 28
                                                                                  • Partitioning Strategy
                                                                                  • Partitioning Example
                                                                                  • Slide 31
                                                                                  • Slide 32
                                                                                  • Slide 33
                                                                                  • Small Arrays
                                                                                  • QuickSort Implementation
                                                                                  • Slide 36
                                                                                  • Slide 37
                                                                                  • Analysis of QuickSort
                                                                                  • Slide 39
                                                                                  • Slide 40
                                                                                  • Comparison Sorting
                                                                                  • Slide 42
                                                                                  • Slide 43
                                                                                  • Lower Bound on Sorting
                                                                                  • Decision Trees
                                                                                  • Slide 46
                                                                                  • Decision Tree for Sorting
                                                                                  • Lower Bound for Comparison Sorting
                                                                                  • Linear Sorting
                                                                                  • Slide 50
                                                                                  • External Sorting
                                                                                  • External Mergesorting
                                                                                  • External Mergesort
                                                                                  • Summary

                                                                                    42

                                                                                    Comparison Sorting

                                                                                    43

                                                                                    Comparison Sorting

                                                                                    44

                                                                                    Lower Bound on Sorting

                                                                                    o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview
  o For comparison sorting, no, we can't do better
  o Can show a lower bound of Ω(N log N)

                                                                                    45

                                                                                    Decision Trees

o A decision tree is a binary tree
o Each node represents a set of possible orderings of the array elements
o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements
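For three distinct keys, a decision tree can be written out directly as nested comparisons. In the sketch below each `if` is an internal node, each branch is a comparison outcome, and each `return` is a leaf naming one ordering of the original elements. The function name and the a/b/c labels are illustrative assumptions.

```python
from itertools import permutations


def decision_tree_3(a, b, c):
    """Return the names of three distinct keys in increasing order of value."""
    if a < b:
        if b < c:
            return ("a", "b", "c")    # a < b < c
        if a < c:
            return ("a", "c", "b")    # a < c < b
        return ("c", "a", "b")        # c < a < b
    if a < c:
        return ("b", "a", "c")        # b < a < c
    if b < c:
        return ("b", "c", "a")        # b < c < a
    return ("c", "b", "a")            # c < b < a


# Every one of the 3! = 6 input orderings reaches a different leaf.
leaves = {decision_tree_3(*perm) for perm in permutations((1, 2, 3))}
print(len(leaves))
```

The deepest leaf sits three comparisons down, which matches ⌈log₂ 6⌉ = 3, so no comparison-based method can sort three keys with fewer comparisons in the worst case.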

                                                                                    46

                                                                                    Decision Trees

                                                                                      • Decision Tree for Sorting
                                                                                      • Lower Bound for Comparison Sorting
                                                                                      • Linear Sorting
                                                                                      • Slide 50
                                                                                      • External Sorting
                                                                                      • External Mergesorting
                                                                                      • External Mergesort
                                                                                      • Summary

                                                                                        44

                                                                                        Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?

o Can we prove a lower bound on the sorting problem?

o Preview

    o For comparison sorting, no, we can't do better

    o Can show a lower bound of Ω(N log N)

                                                                                        45

                                                                                        Decision Trees

o A decision tree is a binary tree

o Each node represents a set of possible orderings of the array elements

o Each branch represents an outcome of a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                                                        46

                                                                                        Decision Trees

                                                                                        47

                                                                                        Decision Tree for Sorting

                                                                                        o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case, the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
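The link between leaf depths and comparison counts can be checked by brute force for small N. The sketch below is illustrative only: it assumes insertion sort as the comparison sort, and the function names are invented for this example. It runs the sort on every permutation, records how many comparisons each input needs (the depth of that input's leaf), and computes ⌈log2(N!)⌉, the minimum depth of any binary tree with N! leaves.

```python
from itertools import permutations
from math import factorial


def insertion_sort_comparisons(items):
    """Sort a copy of items with insertion sort; return the number of comparisons."""
    a = list(items)
    comparisons = 0
    for p in range(1, len(a)):
        tmp = a[p]
        j = p
        while j > 0:
            comparisons += 1  # one element-to-element comparison
            if a[j - 1] > tmp:
                a[j] = a[j - 1]  # shift the larger element right
                j -= 1
            else:
                break
        a[j] = tmp
    assert a == sorted(items)
    return comparisons


def leaf_depths(n):
    """Comparison count for each of the n! input orderings (one leaf per ordering)."""
    return [insertion_sort_comparisons(perm) for perm in permutations(range(n))]


def comparison_lower_bound(n):
    """ceil(log2(n!)), computed exactly with integer arithmetic."""
    return (factorial(n) - 1).bit_length()


if __name__ == "__main__":
    for n in range(2, 7):
        depths = leaf_depths(n)
        print(n, max(depths), sum(depths) / len(depths), comparison_lower_bound(n))
```

For N = 3 the deepest leaf is at depth 3, which equals ⌈log2(3!)⌉ = 3. For larger N, insertion sort's worst case grows quadratically and moves well above the bound.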

                                                                                        48

                                                                                        Lower Bound for Comparison Sorting

o Lemma 7.1

o Lemma 7.2

o Theorem 7.6

o Theorem 7.7
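The slide lists only the lemma and theorem numbers, not their statements. What follows is the standard decision-tree argument for the Ω(N log N) bound, given as a sketch; it is not necessarily the exact wording of the cited results. Here d is the depth of the tree and L its number of leaves.

```latex
\begin{align*}
  &\text{A binary tree of depth } d \text{ has at most } 2^{d} \text{ leaves, so }
    L \le 2^{d} \;\Longrightarrow\; d \ge \lceil \log_2 L \rceil. \\
  &\text{A decision tree for sorting } N \text{ elements needs } L \ge N! \text{ leaves, so }
    d \ge \lceil \log_2 (N!) \rceil. \\
  &\log_2 (N!) = \sum_{i=1}^{N} \log_2 i
    \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
    \;\ge\; \frac{N}{2} \log_2 \frac{N}{2}
    \;=\; \Omega(N \log N).
\end{align*}
```

The last step uses the fact that the sum from ⌈N/2⌉ to N has at least N/2 terms, each at least log2(N/2).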

                                                                                        49

                                                                                        Linear Sorting

o Some constraints on the input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

    o Given array A of N non-negative integer elements, each less than M

    o Create array C of size M, where C[i] is the number of i's in A

    o Use C to place elements into new sorted array B

    o Running time Θ(N+M) = Θ(N) if M = Θ(N)
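The CountingSort steps above can be sketched in a few lines. This is a minimal illustration that assumes the keys are plain non-negative integers below M; the function name and the range check are additions for this example. The names A, C and B from the slide appear as `a`, `counts` and `b`.

```python
def counting_sort(a, m):
    """Return a sorted copy of a, whose elements are integers in the range 0..m-1."""
    counts = [0] * m  # C: counts[i] is the number of i's in a
    for x in a:
        if not 0 <= x < m:
            raise ValueError(f"element {x} is outside the range [0, {m})")
        counts[x] += 1

    b = []  # B: the new sorted array
    for value, count in enumerate(counts):
        b.extend([value] * count)  # emit each value as many times as it occurred
    return b


if __name__ == "__main__":
    print(counting_sort([3, 1, 4, 1, 5, 2, 0], 6))
```

The first loop touches each of the N elements once and the second walks the M counters once, which is where the Θ(N+M) running time comes from. No two elements are ever compared with each other.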

                                                                                        50

                                                                                        Linear Sorting

                                                                                        o BucketSort

    o Assume N elements of A uniformly distributed over the range [0, 1)

    o Create N equal-sized buckets over [0, 1)

    o Add each element of A into the appropriate bucket

    o Sort each bucket (e.g. with InsertionSort)

    o Return the concatenation of the buckets

    o Average-case running time Θ(N)

        o Assumes each bucket will contain Θ(1) elements
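A minimal sketch of the BucketSort steps, assuming the inputs are Python floats in [0, 1). The function names are chosen for this example, and the clamp on the bucket index is an added guard against floating-point rounding rather than part of the algorithm as stated.

```python
def insertion_sort(a):
    """Sort the list a in place using insertion sort."""
    for p in range(1, len(a)):
        tmp = a[p]
        j = p
        while j > 0 and a[j - 1] > tmp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp


def bucket_sort(a):
    """Return a sorted copy of a, whose elements are floats in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]  # n equal-sized buckets over [0, 1)
    for x in a:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"element {x} is outside the range [0, 1)")
        index = min(int(x * n), n - 1)  # clamp in case x * n rounds up to n
        buckets[index].append(x)

    result = []
    for bucket in buckets:
        insertion_sort(bucket)  # each bucket is expected to hold only a few elements
        result.extend(bucket)   # concatenate the buckets in order
    return result


if __name__ == "__main__":
    print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```

If the input is far from uniform, many elements land in the same bucket and the per-bucket insertion sort dominates, so the Θ(N) figure holds only on average under the uniformity assumption.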

                                                                                        51

                                                                                        External Sorting

o What if the N elements we wish to sort do not fit in memory?

o Obviously, our existing sort algorithms are inefficient

    o Each comparison potentially requires a disk access

o We want to minimize disk accesses

                                                                                        52

                                                                                        External Mergesorting

o N = number of elements in array A to be sorted

o M = number of elements that fit in memory

o K = ⌈N/M⌉

o Approach

    o Read in M elements of A, sort them using QuickSort, and write them back to disk: O(M log M)

    o Repeat the above K times until all of A is processed

    o Create K input buffers and 1 output buffer, each of size M/(K+1)

    o Perform a K-way merge: O(N)

        o Update input buffers one disk page at a time

        o Write output buffer one disk page at a time
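The two phases above (create K sorted runs, then K-way merge) can be sketched with temporary files. This is a toy illustration under several assumptions: elements are integers stored one per line, `chunk_size` plays the role of M, Python's built-in `list.sort` stands in for QuickSort, and file buffering is left to the runtime rather than managed in pages of size M/(K+1). The function name is invented for this example.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size):
    """Sort an iterable of ints via sorted runs on disk followed by a K-way merge."""
    run_paths = []

    def write_run(chunk):
        chunk.sort()  # in-memory sort of at most chunk_size elements
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)

    # Phase 1: read chunk_size elements at a time and write each sorted run to disk.
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            write_run(chunk)
            chunk = []
    if chunk:
        write_run(chunk)

    # Phase 2: K-way merge, reading each run sequentially.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        # A real external sort would stream this to an output file;
        # a list is returned here only to keep the example short.
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


if __name__ == "__main__":
    print(external_sort([34, 8, 64, 51, 32, 21], chunk_size=2))
```

Note that `heapq.merge` keeps a heap of K entries, so it performs O(N log K) comparisons; the O(N) figure on the slide treats K as a constant.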

                                                                                        53

                                                                                        External Mergesort

o T(N, M) = O((K·M log M) + N)

o T(N, M) = O(((N/M)·M log M) + N)

o T(N, M) = O((N log M) + N)

o T(N, M) = O(N log M)

o Disk accesses (all sequential)

    o P = page size

    o Accesses = 4N/P (read-all/write-all twice)
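As a quick worked example of the access count: the run-creation pass reads and writes every page once, and the merge pass does the same, giving four sequential sweeps of N/P pages. The helper below is hypothetical and rounds N/P up to whole pages, a detail the slide's formula leaves out.

```python
def sequential_page_accesses(n, p):
    """Page accesses for one run-creation pass plus one merge pass: 4 * ceil(n / p)."""
    pages = -(-n // p)  # ceiling division: pages needed to hold n elements
    return 4 * pages    # read all + write all, done twice


if __name__ == "__main__":
    # Assumed figures for illustration: 1,000,000 elements, 1,000 elements per page.
    print(sequential_page_accesses(1_000_000, 1_000))  # 4000
```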

                                                                                        54

                                                                                        Summary

                                                                                        • G64ADS Advanced Data Structures
                                                                                        • Insertion sort
                                                                                        • Slide 3
                                                                                        • Slide 4
                                                                                        • Slide 5
                                                                                        • Slide 6
                                                                                        • Insertion sort worst-case running time
                                                                                        • Heapsort
                                                                                        • Heapsort -Analysis
                                                                                        • Heapsort ndash No Extra Memory
                                                                                        • Slide 11
                                                                                        • Mergesort
                                                                                        • Slide 13
                                                                                        • Mergesort Divide
                                                                                        • Slide 15
                                                                                        • Slide 16
                                                                                        • Mergesort Merge
                                                                                        • Slide 18
                                                                                        • Slide 19
                                                                                        • Mergesort Analysis
                                                                                        • Slide 21
                                                                                        • Slide 22
                                                                                        • Quicksort
                                                                                        • Quicksort Algorithm
                                                                                        • Quicksort Example
                                                                                        • Why so fast
                                                                                        • Picking the Pivot
                                                                                        • Slide 28
                                                                                        • Partitioning Strategy
                                                                                        • Partitioning Example
                                                                                        • Slide 31
                                                                                        • Slide 32
                                                                                        • Slide 33
                                                                                        • Small Arrays
                                                                                        • QuickSort Implementation
                                                                                        • Slide 36
                                                                                        • Slide 37
                                                                                        • Analysis of QuickSort
                                                                                        • Slide 39
                                                                                        • Slide 40
                                                                                        • Comparison Sorting
                                                                                        • Slide 42
                                                                                        • Slide 43
                                                                                        • Lower Bound on Sorting
                                                                                        • Decision Trees
                                                                                        • Slide 46
                                                                                        • Decision Tree for Sorting
                                                                                        • Lower Bound for Comparison Sorting
                                                                                        • Linear Sorting
                                                                                        • Slide 50
                                                                                        • External Sorting
                                                                                        • External Mergesorting
                                                                                        • External Mergesort
                                                                                        • Summary

                                                                                          45

                                                                                          Decision Trees

                                                                                          o A decision tree is a binary treeo Each node represents a set of possible

                                                                                          orderings of the array elementso Each branch represents an outcome of

                                                                                          a particular comparison

                                                                                          o Each leaf of the decision tree represents a particular ordering of the original array elements

                                                                                          46

                                                                                          Decision Trees

                                                                                          47

                                                                                          Decision Tree for Sorting

                                                                                          o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                                                          o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                                                          o In the average case the number of comparisons is the average of the depths of all leaves

                                                                                          o There are N different orderings of N elements

                                                                                          48

                                                                                          Lower Bound for Comparison Sorting

                                                                                          o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                                                          49

                                                                                          Linear Sorting

                                                                                          o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                                          o CountingSort

                                                                                          o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                                          irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                                                          50

                                                                                          Linear Sorting

                                                                                          o BucketSort

                                                                                          o Assume N elements of A uniformly distributed over the range [01)

                                                                                          o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                                                          o Assumes each bucket will contain Θ(1) elements

                                                                                          51

                                                                                          External Sorting

                                                                                          o What is the number of elements N we wish to sort do not fit in memory

                                                                                          o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                                                          o We want to minimize disk accesses

                                                                                          52

                                                                                          External Mergesorting

                                                                                          o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                                                          o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                                                          o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                                                          size M(K+1)o Perform a K-way merge O(N)

                                                                                          o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                                                          53

                                                                                          External Mergesort

                                                                                          o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                                                          o P = page size

                                                                                          o Accesses = 4NP (read-allwrite-all twice)

                                                                                          54

                                                                                          Summary

                                                                                          • G64ADS Advanced Data Structures
                                                                                          • Insertion sort
                                                                                          • Slide 3
                                                                                          • Slide 4
                                                                                          • Slide 5
                                                                                          • Slide 6
                                                                                          • Insertion sort worst-case running time
                                                                                          • Heapsort
                                                                                          • Heapsort -Analysis
                                                                                          • Heapsort ndash No Extra Memory
                                                                                          • Slide 11
                                                                                          • Mergesort
                                                                                          • Slide 13
                                                                                          • Mergesort Divide
                                                                                          • Slide 15
                                                                                          • Slide 16
                                                                                          • Mergesort Merge
                                                                                          • Slide 18
                                                                                          • Slide 19
                                                                                          • Mergesort Analysis
                                                                                          • Slide 21
                                                                                          • Slide 22
                                                                                          • Quicksort
                                                                                          • Quicksort Algorithm
                                                                                          • Quicksort Example
                                                                                          • Why so fast
                                                                                          • Picking the Pivot
                                                                                          • Slide 28
                                                                                          • Partitioning Strategy
                                                                                          • Partitioning Example
                                                                                          • Slide 31
                                                                                          • Slide 32
                                                                                          • Slide 33
                                                                                          • Small Arrays
                                                                                          • QuickSort Implementation
                                                                                          • Slide 36
                                                                                          • Slide 37
                                                                                          • Analysis of QuickSort
                                                                                          • Slide 39
                                                                                          • Slide 40
                                                                                          • Comparison Sorting
                                                                                          • Slide 42
                                                                                          • Slide 43
                                                                                          • Lower Bound on Sorting
                                                                                          • Decision Trees
                                                                                          • Slide 46
                                                                                          • Decision Tree for Sorting
                                                                                          • Lower Bound for Comparison Sorting
                                                                                          • Linear Sorting
                                                                                          • Slide 50
                                                                                          • External Sorting
                                                                                          • External Mergesorting
                                                                                          • External Mergesort
                                                                                          • Summary

                                                                                            46

                                                                                            Decision Trees

                                                                                            47

                                                                                            Decision Tree for Sorting

                                                                                            o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

                                                                                            o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

                                                                                            o In the average case the number of comparisons is the average of the depths of all leaves

                                                                                            o There are N different orderings of N elements

                                                                                            48

                                                                                            Lower Bound for Comparison Sorting

                                                                                            o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                                                            49

                                                                                            Linear Sorting

                                                                                            o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                                            o CountingSort

                                                                                            o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                                            irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                                                            50

                                                                                            Linear Sorting

                                                                                            o BucketSort

                                                                                            o Assume N elements of A uniformly distributed over the range [01)

                                                                                            o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                                                            o Assumes each bucket will contain Θ(1) elements

                                                                                            51

                                                                                            External Sorting

o What if the N elements we wish to sort do not fit in memory?
o Our existing sort algorithms become inefficient
o Each comparison potentially requires a disk access
o We want to minimize disk accesses

                                                                                            52

                                                                                            External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach
o Read in M elements of A, sort them using QuickSort, and write them back to disk: O(M log M)
o Repeat the above K times until all of A is processed
o Create K input buffers and 1 output buffer, each of size M/(K+1)
o Perform a K-way merge: O(N)
o Update the input buffers one disk page at a time
o Write the output buffer one disk page at a time
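The approach above can be sketched in Python as follows. This is an illustration under several assumptions, none of which come from the slides: elements are integers stored one per line in temporary files, `list.sort` stands in for QuickSort, Python's buffered file objects stand in for the explicit page-sized buffers, and the helper names (`external_mergesort`, `_write_run`, `_read_run`) are hypothetical. The merge uses `heapq.merge`, which does O(log K) work per element.

```python
import heapq
import tempfile


def _write_run(chunk):
    """Sort one memory-sized chunk and write it to a temporary file (a sorted run)."""
    chunk.sort()  # stand-in for the QuickSort step
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(f"{x}\n" for x in chunk)
    run.seek(0)
    return run


def _read_run(run):
    """Stream a sorted run back from disk one element at a time."""
    for line in run:
        yield int(line)


def external_mergesort(values, m):
    """Yield the integers in `values` in sorted order, holding about m in memory."""
    if m < 1:
        raise ValueError("m must be at least 1")
    runs = []
    chunk = []
    # Phase 1: create K = ceil(N/M) sorted runs on disk.
    for x in values:
        chunk.append(x)
        if len(chunk) == m:
            runs.append(_write_run(chunk))
            chunk = []
    if chunk:
        runs.append(_write_run(chunk))
    # Phase 2: K-way merge of the runs.
    try:
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()
```

For example, `list(external_mergesort([34, 8, 64, 51, 32, 21], 2))` creates three runs of two elements each and merges them into one sorted sequence.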

                                                                                            53

                                                                                            External Mergesort

o T(N,M) = O((K · M log M) + N)
o T(N,M) = O(((N/M) · M log M) + N)
o T(N,M) = O((N log M) + N)
o T(N,M) = O(N log M)
o Disk accesses (all sequential)
o P = page size
o Accesses = 4N/P (read all and write all, twice)
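To make the access count concrete, the small helper below evaluates K = ⌈N/M⌉ and 4N/P for given sizes. The function name and the example figures are assumptions for illustration, and N/P is rounded up to whole pages, which the slide's formula leaves implicit.

```python
def external_mergesort_cost(n, m, p):
    """Return (K, page accesses) for N elements, M in memory, P elements per page."""
    k = -(-n // m)      # K = ceil(N / M) sorted runs
    pages = -(-n // p)  # pages needed to hold all N elements
    # Each page is read and written once while creating runs,
    # then read and written once more during the merge.
    return k, 4 * pages


# Example: N = 1,000,000 elements, M = 100,000 in memory, P = 1,000 per page.
print(external_mergesort_cost(1_000_000, 100_000, 1_000))  # (10, 4000)
```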

                                                                                            54

                                                                                            Summary


                                                                                              48

                                                                                              Lower Bound for Comparison Sorting

                                                                                              o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                                                              49

                                                                                              Linear Sorting

                                                                                              o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                                              o CountingSort

                                                                                              o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                                              irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                                                              50

                                                                                              Linear Sorting

                                                                                              o BucketSort

                                                                                              o Assume N elements of A uniformly distributed over the range [01)

                                                                                              o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                                                              o Assumes each bucket will contain Θ(1) elements

                                                                                              51

                                                                                              External Sorting

                                                                                              o What is the number of elements N we wish to sort do not fit in memory

                                                                                              o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                                                              o We want to minimize disk accesses

                                                                                              52

                                                                                              External Mergesorting

                                                                                              o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                                                              o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                                                              o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                                                              size M(K+1)o Perform a K-way merge O(N)

                                                                                              o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                                                              53

                                                                                              External Mergesort

                                                                                              o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                                                              o P = page size

                                                                                              o Accesses = 4NP (read-allwrite-all twice)

                                                                                              54

                                                                                              Summary

                                                                                              • G64ADS Advanced Data Structures
                                                                                              • Insertion sort
                                                                                              • Slide 3
                                                                                              • Slide 4
                                                                                              • Slide 5
                                                                                              • Slide 6
                                                                                              • Insertion sort worst-case running time
                                                                                              • Heapsort
                                                                                              • Heapsort -Analysis
                                                                                              • Heapsort ndash No Extra Memory
                                                                                              • Slide 11
                                                                                              • Mergesort
                                                                                              • Slide 13
                                                                                              • Mergesort Divide
                                                                                              • Slide 15
                                                                                              • Slide 16
                                                                                              • Mergesort Merge
                                                                                              • Slide 18
                                                                                              • Slide 19
                                                                                              • Mergesort Analysis
                                                                                              • Slide 21
                                                                                              • Slide 22
                                                                                              • Quicksort
                                                                                              • Quicksort Algorithm
                                                                                              • Quicksort Example
                                                                                              • Why so fast
                                                                                              • Picking the Pivot
                                                                                              • Slide 28
                                                                                              • Partitioning Strategy
                                                                                              • Partitioning Example
                                                                                              • Slide 31
                                                                                              • Slide 32
                                                                                              • Slide 33
                                                                                              • Small Arrays
                                                                                              • QuickSort Implementation
                                                                                              • Slide 36
                                                                                              • Slide 37
                                                                                              • Analysis of QuickSort
                                                                                              • Slide 39
                                                                                              • Slide 40
                                                                                              • Comparison Sorting
                                                                                              • Slide 42
                                                                                              • Slide 43
                                                                                              • Lower Bound on Sorting
                                                                                              • Decision Trees
                                                                                              • Slide 46
                                                                                              • Decision Tree for Sorting
                                                                                              • Lower Bound for Comparison Sorting
                                                                                              • Linear Sorting
                                                                                              • Slide 50
                                                                                              • External Sorting
                                                                                              • External Mergesorting
                                                                                              • External Mergesort
                                                                                              • Summary

                                                                                                48

                                                                                                Lower Bound for Comparison Sorting

                                                                                                o Lemma 71o Lemma 72o Theorem 76o Theorem 77

                                                                                                49

                                                                                                Linear Sorting

                                                                                                o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                                                o CountingSort

                                                                                                o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                                                irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

                                                                                                50

                                                                                                Linear Sorting

                                                                                                o BucketSort

                                                                                                o Assume N elements of A uniformly distributed over the range [01)

                                                                                                o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                                                                o Assumes each bucket will contain Θ(1) elements

                                                                                                51

                                                                                                External Sorting

                                                                                                o What is the number of elements N we wish to sort do not fit in memory

                                                                                                o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                                                                o We want to minimize disk accesses

                                                                                                52

                                                                                                External Mergesorting

                                                                                                o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                                                                o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                                                                o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                                                                size M(K+1)o Perform a K-way merge O(N)

                                                                                                o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                                                                53

                                                                                                External Mergesort

                                                                                                o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                                                                o P = page size

                                                                                                o Accesses = 4NP (read-allwrite-all twice)

                                                                                                54

                                                                                                Summary

                                                                                                • G64ADS Advanced Data Structures
                                                                                                • Insertion sort
                                                                                                • Slide 3
                                                                                                • Slide 4
                                                                                                • Slide 5
                                                                                                • Slide 6
                                                                                                • Insertion sort worst-case running time
                                                                                                • Heapsort
                                                                                                • Heapsort -Analysis
                                                                                                • Heapsort ndash No Extra Memory
                                                                                                • Slide 11
                                                                                                • Mergesort
                                                                                                • Slide 13
                                                                                                • Mergesort Divide
                                                                                                • Slide 15
                                                                                                • Slide 16
                                                                                                • Mergesort Merge
                                                                                                • Slide 18
                                                                                                • Slide 19
                                                                                                • Mergesort Analysis
                                                                                                • Slide 21
                                                                                                • Slide 22
                                                                                                • Quicksort
                                                                                                • Quicksort Algorithm
                                                                                                • Quicksort Example
                                                                                                • Why so fast
                                                                                                • Picking the Pivot
                                                                                                • Slide 28
                                                                                                • Partitioning Strategy
                                                                                                • Partitioning Example
                                                                                                • Slide 31
                                                                                                • Slide 32
                                                                                                • Slide 33
                                                                                                • Small Arrays
                                                                                                • QuickSort Implementation
                                                                                                • Slide 36
                                                                                                • Slide 37
                                                                                                • Analysis of QuickSort
                                                                                                • Slide 39
                                                                                                • Slide 40
                                                                                                • Comparison Sorting
                                                                                                • Slide 42
                                                                                                • Slide 43
                                                                                                • Lower Bound on Sorting
                                                                                                • Decision Trees
                                                                                                • Slide 46
                                                                                                • Decision Tree for Sorting
                                                                                                • Lower Bound for Comparison Sorting
                                                                                                • Linear Sorting
                                                                                                • Slide 50
                                                                                                • External Sorting
                                                                                                • External Mergesorting
                                                                                                • External Mergesort
                                                                                                • Summary

                                                                                                  49

                                                                                                  Linear Sorting

                                                                                                  o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

                                                                                                  o CountingSort

                                                                                                  o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

                                                                                                  irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume the N elements of A are uniformly distributed over the range [0, 1)

o Create N equal-sized buckets over [0, 1)

o Add each element of A to the appropriate bucket

o Sort each bucket (e.g. with InsertionSort)

o Return the concatenation of the buckets

o Average-case running time: Θ(N)

o Assumes each bucket will contain Θ(1) elements
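
A possible Python sketch of BucketSort under the assumptions stated above (inputs are floats in [0, 1)). The names bucket_sort and insertion_sort, and the choice of int(x * n) as the bucket index, are illustrative assumptions rather than details given on the slides.

```python
def insertion_sort(items):
    """Sort a list in place by insertion; fast when the list is short."""
    for p in range(1, len(items)):
        tmp = items[p]
        j = p
        while j > 0 and items[j - 1] > tmp:
            items[j] = items[j - 1]   # shift larger elements one place right
            j -= 1
        items[j] = tmp


def bucket_sort(a):
    """Return a sorted copy of a, whose elements are floats in [0, 1)."""
    n = len(a)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]  # N equal-sized buckets over [0, 1)
    for x in a:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"element {x} is outside [0, 1)")
        # Bucket i covers [i/n, (i+1)/n); min() guards against rounding.
        buckets[min(int(x * n), n - 1)].append(x)

    result = []
    for bucket in buckets:
        insertion_sort(bucket)        # each bucket is expected to be tiny
        result.extend(bucket)         # concatenate the buckets in order
    return result
```

If the input is far from uniform, many elements land in the same bucket and the insertion sort inside that bucket dominates, so the Θ(N) bound holds only on average under the uniformity assumption.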

51

External Sorting

o What if the N elements we wish to sort do not fit in memory?

o Obviously, our existing sort algorithms are inefficient

o Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted

o M = number of elements that fit in memory

o K = ⌈N/M⌉

o Approach

o Read M elements of A into memory, sort them using QuickSort, and write them back to disk: O(M log M)

o Repeat the above K times until all of A is processed

o Create K input buffers and 1 output buffer, each of size M/(K+1)

o Perform a K-way merge: O(N)

o Update the input buffers one disk page at a time

o Write the output buffer one disk page at a time
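
The two phases above can be sketched in Python roughly as follows. This is an illustration under several assumptions that are not in the slides: the data is a text file with one integer per line, the function name external_mergesort is hypothetical, Python's built-in sort stands in for QuickSort, and Python's buffered file objects stand in for the page-sized input and output buffers.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_mergesort(in_path, out_path, m):
    """Sort a file of integers (one per line), keeping at most m in memory
    during run creation. Returns K, the number of sorted runs written."""
    run_paths = []

    # Phase 1: read M elements at a time, sort them, write each run to disk.
    with open(in_path) as src:
        while True:
            chunk = [int(line) for line in islice(src, m)]
            if not chunk:
                break
            chunk.sort()  # in-memory sort (stand-in for QuickSort)
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)

    # Phase 2: K-way merge of the sorted runs, streamed from and to disk.
    runs = [open(path) for path in run_paths]
    try:
        with open(out_path, "w") as out:
            streams = [map(int, f) for f in runs]  # lazy, one value at a time
            for x in heapq.merge(*streams):
                out.write(f"{x}\n")
    finally:
        for f in runs:
            f.close()
        for path in run_paths:
            os.remove(path)
    return len(run_paths)
```

Note that heapq.merge keeps a heap of K candidates, so this particular merge performs O(N log K) comparisons; it still reads and writes every element exactly once in the merge phase.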

53

External Mergesort

o T(N, M) = O(K · M log M + N)

o T(N, M) = O((N/M) · M log M + N)

o T(N, M) = O(N log M + N)

o T(N, M) = O(N log M)

o Disk accesses (all sequential)

o P = page size

o Accesses = 4N/P (read all and write all, twice)
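
To make the quantities above concrete, here is a small Python calculation. The function name external_sort_plan and the input sizes are made up for illustration; only the formulas K = ⌈N/M⌉, buffer size M/(K+1), and accesses = 4N/P come from the slides.

```python
def external_sort_plan(n, m, p):
    """Return (K, buffer size in elements, number of page accesses)."""
    k = (n + m - 1) // m               # K = ceil(N / M) sorted runs
    buffer_size = m // (k + 1)         # K input buffers + 1 output buffer
    pages = (n + p - 1) // p           # pages needed to hold all N elements
    accesses = 4 * pages               # read all + write all, done twice
    return k, buffer_size, accesses


# Hypothetical sizes: 1,000,000 elements, 100,000 fit in memory, 1,000 per page.
print(external_sort_plan(1_000_000, 100_000, 1_000))  # (10, 9090, 4000)
```

The factor of 4 arises because every element is read once and written once while the runs are created, and read once and written once again during the merge.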

                                                                                                  54

                                                                                                  Summary


                                                                                                    50

                                                                                                    Linear Sorting

                                                                                                    o BucketSort

                                                                                                    o Assume N elements of A uniformly distributed over the range [01)

                                                                                                    o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

                                                                                                    o Assumes each bucket will contain Θ(1) elements

                                                                                                    51

                                                                                                    External Sorting

                                                                                                    o What is the number of elements N we wish to sort do not fit in memory

                                                                                                    o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                                                                    o We want to minimize disk accesses

                                                                                                    52

                                                                                                    External Mergesorting

                                                                                                    o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                                                                    o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                                                                    o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                                                                    size M(K+1)o Perform a K-way merge O(N)

                                                                                                    o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                                                                    53

                                                                                                    External Mergesort

                                                                                                    o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                                                                    o P = page size

                                                                                                    o Accesses = 4NP (read-allwrite-all twice)

                                                                                                    54

                                                                                                    Summary

                                                                                                    • G64ADS Advanced Data Structures
                                                                                                    • Insertion sort
                                                                                                    • Slide 3
                                                                                                    • Slide 4
                                                                                                    • Slide 5
                                                                                                    • Slide 6
                                                                                                    • Insertion sort worst-case running time
                                                                                                    • Heapsort
                                                                                                    • Heapsort -Analysis
                                                                                                    • Heapsort ndash No Extra Memory
                                                                                                    • Slide 11
                                                                                                    • Mergesort
                                                                                                    • Slide 13
                                                                                                    • Mergesort Divide
                                                                                                    • Slide 15
                                                                                                    • Slide 16
                                                                                                    • Mergesort Merge
                                                                                                    • Slide 18
                                                                                                    • Slide 19
                                                                                                    • Mergesort Analysis
                                                                                                    • Slide 21
                                                                                                    • Slide 22
                                                                                                    • Quicksort
                                                                                                    • Quicksort Algorithm
                                                                                                    • Quicksort Example
                                                                                                    • Why so fast
                                                                                                    • Picking the Pivot
                                                                                                    • Slide 28
                                                                                                    • Partitioning Strategy
                                                                                                    • Partitioning Example
                                                                                                    • Slide 31
                                                                                                    • Slide 32
                                                                                                    • Slide 33
                                                                                                    • Small Arrays
                                                                                                    • QuickSort Implementation
                                                                                                    • Slide 36
                                                                                                    • Slide 37
                                                                                                    • Analysis of QuickSort
                                                                                                    • Slide 39
                                                                                                    • Slide 40
                                                                                                    • Comparison Sorting
                                                                                                    • Slide 42
                                                                                                    • Slide 43
                                                                                                    • Lower Bound on Sorting
                                                                                                    • Decision Trees
                                                                                                    • Slide 46
                                                                                                    • Decision Tree for Sorting
                                                                                                    • Lower Bound for Comparison Sorting
                                                                                                    • Linear Sorting
                                                                                                    • Slide 50
                                                                                                    • External Sorting
                                                                                                    • External Mergesorting
                                                                                                    • External Mergesort
                                                                                                    • Summary

                                                                                                      51

                                                                                                      External Sorting

                                                                                                      o What is the number of elements N we wish to sort do not fit in memory

                                                                                                      o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

                                                                                                      o We want to minimize disk accesses

                                                                                                      52

                                                                                                      External Mergesorting

                                                                                                      o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                                                                      o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                                                                      o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                                                                      size M(K+1)o Perform a K-way merge O(N)

                                                                                                      o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                                                                      53

                                                                                                      External Mergesort

                                                                                                      o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                                                                      o P = page size

                                                                                                      o Accesses = 4NP (read-allwrite-all twice)

                                                                                                      54

                                                                                                      Summary

                                                                                                      • G64ADS Advanced Data Structures
                                                                                                      • Insertion sort
                                                                                                      • Slide 3
                                                                                                      • Slide 4
                                                                                                      • Slide 5
                                                                                                      • Slide 6
                                                                                                      • Insertion sort worst-case running time
                                                                                                      • Heapsort
                                                                                                      • Heapsort -Analysis
                                                                                                      • Heapsort ndash No Extra Memory
                                                                                                      • Slide 11
                                                                                                      • Mergesort
                                                                                                      • Slide 13
                                                                                                      • Mergesort Divide
                                                                                                      • Slide 15
                                                                                                      • Slide 16
                                                                                                      • Mergesort Merge
                                                                                                      • Slide 18
                                                                                                      • Slide 19
                                                                                                      • Mergesort Analysis
                                                                                                      • Slide 21
                                                                                                      • Slide 22
                                                                                                      • Quicksort
                                                                                                      • Quicksort Algorithm
                                                                                                      • Quicksort Example
                                                                                                      • Why so fast
                                                                                                      • Picking the Pivot
                                                                                                      • Slide 28
                                                                                                      • Partitioning Strategy
                                                                                                      • Partitioning Example
                                                                                                      • Slide 31
                                                                                                      • Slide 32
                                                                                                      • Slide 33
                                                                                                      • Small Arrays
                                                                                                      • QuickSort Implementation
                                                                                                      • Slide 36
                                                                                                      • Slide 37
                                                                                                      • Analysis of QuickSort
                                                                                                      • Slide 39
                                                                                                      • Slide 40
                                                                                                      • Comparison Sorting
                                                                                                      • Slide 42
                                                                                                      • Slide 43
                                                                                                      • Lower Bound on Sorting
                                                                                                      • Decision Trees
                                                                                                      • Slide 46
                                                                                                      • Decision Tree for Sorting
                                                                                                      • Lower Bound for Comparison Sorting
                                                                                                      • Linear Sorting
                                                                                                      • Slide 50
                                                                                                      • External Sorting
                                                                                                      • External Mergesorting
                                                                                                      • External Mergesort
                                                                                                      • Summary

                                                                                                        52

                                                                                                        External Mergesorting

                                                                                                        o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

                                                                                                        o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

                                                                                                        o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

                                                                                                        size M(K+1)o Perform a K-way merge O(N)

                                                                                                        o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

                                                                                                        53

                                                                                                        External Mergesort

                                                                                                        o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

                                                                                                        o P = page size

                                                                                                        o Accesses = 4NP (read-allwrite-all twice)

                                                                                                        54

                                                                                                        Summary

                                                                                                        • G64ADS Advanced Data Structures
                                                                                                        • Insertion sort
                                                                                                        • Slide 3
                                                                                                        • Slide 4
                                                                                                        • Slide 5
                                                                                                        • Slide 6
                                                                                                        • Insertion sort worst-case running time
                                                                                                        • Heapsort
• Heapsort - Analysis
• Heapsort - No Extra Memory
                                                                                                        • Slide 11
                                                                                                        • Mergesort
                                                                                                        • Slide 13
                                                                                                        • Mergesort Divide
                                                                                                        • Slide 15
                                                                                                        • Slide 16
                                                                                                        • Mergesort Merge
                                                                                                        • Slide 18
                                                                                                        • Slide 19
                                                                                                        • Mergesort Analysis
                                                                                                        • Slide 21
                                                                                                        • Slide 22
                                                                                                        • Quicksort
                                                                                                        • Quicksort Algorithm
                                                                                                        • Quicksort Example
                                                                                                        • Why so fast
                                                                                                        • Picking the Pivot
                                                                                                        • Slide 28
                                                                                                        • Partitioning Strategy
                                                                                                        • Partitioning Example
                                                                                                        • Slide 31
                                                                                                        • Slide 32
                                                                                                        • Slide 33
                                                                                                        • Small Arrays
                                                                                                        • QuickSort Implementation
                                                                                                        • Slide 36
                                                                                                        • Slide 37
                                                                                                        • Analysis of QuickSort
                                                                                                        • Slide 39
                                                                                                        • Slide 40
                                                                                                        • Comparison Sorting
                                                                                                        • Slide 42
                                                                                                        • Slide 43
                                                                                                        • Lower Bound on Sorting
                                                                                                        • Decision Trees
                                                                                                        • Slide 46
                                                                                                        • Decision Tree for Sorting
                                                                                                        • Lower Bound for Comparison Sorting
                                                                                                        • Linear Sorting
                                                                                                        • Slide 50
                                                                                                        • External Sorting
                                                                                                        • External Mergesorting
                                                                                                        • External Mergesort
                                                                                                        • Summary
