Dual-Pivot Quicksort - Asymmetries in Sorting

Apr 15, 2017

Transcript
Page 1: Dual-Pivot Quicksort - Asymmetries in Sorting

Dual-Pivot Quicksort — Asymmetries in Sorting

Sebastian Wild, Markus E. Nebel
[wild, nebel]@cs.uni-kl.de

Computer Science Department

Feb 2014, Dagstuhl Seminar 14091

Data Structures and Advanced Models of Computation on Big Data

Sebastian Wild Dual-Pivot Quicksort Feb. 2014 1 / 19

Page 2: Dual-Pivot Quicksort - Asymmetries in Sorting

Sorting Algorithms in Practice

Many inventions by the algorithms community

vs. few methods successful in practice

C

C++

Java 6

Quicksort + Mergesort variant as stable sort

.NET

Haskell

Python Timsort

[Figure: sorting methods listed on Wikipedia, next to the sorting methods of standard libraries for random access data]

Page 4: Dual-Pivot Quicksort - Asymmetries in Sorting

History of Quicksort in Practice

1961/62 Hoare: first publication, average case analysis

1969 Singleton: median-of-three & Insertionsort on small subarrays

1975-78 Sedgewick: detailed analysis of many optimizations

1993 Bentley, McIlroy: Engineering a Sort Function

1997 Musser: O(n log n) worst case by bounded recursion depth

Basic algorithm settled since 1961; latest tweaks from the 1990s.
Since then: almost identical in all programming libraries!

Until 2009: Java 7 switches to a new dual-pivot Quicksort!

[Timeline: 1961, ’62, 1969, 1975, ’77, ’78, 1993, 1997, 2009, today]

Page 7: Dual-Pivot Quicksort - Asymmetries in Sorting

Discovery of Yaroslavskiy’s Algorithm

2009 Vladimir Yaroslavskiy (researcher in St. Petersburg) experiments with Quicksort with two pivots

11 Sep 2009 announcement on Java core library mailing list

29 Oct 2009 inclusion in development version of library

2009 – 2011 optimizations by Josh Bloch, Jon Bentley, many others

28 July 2011 public release of Java 7 with Yaroslavskiy’s Quicksort

[Timeline: 1961, ’62, 1969, 1975, ’77, ’78, 1993, 1997, 2009, today]

Page 9: Dual-Pivot Quicksort - Asymmetries in Sorting

Running Time Experiments

Why switch to a new, unknown algorithm? Because it is faster!

[Plot: normalized running time, time / (10^−6 · n ln n), between 7 and 9, against n from 0 to 2 · 10^6, for the Java 6 library and the Java 7 library]

Normalized Java runtimes (in ms). Average and standard deviation of 1000 random permutations per size.

This remains true for basic variants of the two algorithms!

Page 10: Dual-Pivot Quicksort - Asymmetries in Sorting

Dual Pivot Quicksort

High Level Algorithm:

1 Partition array around two pivots p ≤ q.

2 Sort 3 subarrays recursively.

How to do partitioning?

1 For each element x, determine its class

small for x < p

medium for p < x < q

large for q < x

by comparing x to p and/or q

2 Arrange elements according to classes (small elements left of p, medium elements between p and q, large elements right of q)

Page 12: Dual-Pivot Quicksort - Asymmetries in Sorting

Dual Pivot Quicksort — Previous Work

Using two pivots is not a new idea!

Robert Sedgewick, 1975

in-place dual pivot Quicksort implementation

more comparisons and swaps than classic Quicksort

Pascal Hennequin, 1991

list-based Quicksort with r pivots

r = 2: same #comparisons as classic Quicksort (∼ 2 n ln n)

Using two pivots does not pay.

Page 14: Dual-Pivot Quicksort - Asymmetries in Sorting

Dual Pivot Quicksort — Comparison “Bound”

How many comparisons to determine classes (small, medium or large)?

Assume we compare with p first (Hennequin, 1991): small elements need 1, others 2 comparisons.

On average, $\tfrac{1}{3}$ of the elements are small:

$\tfrac{1}{3} \cdot 1 + \tfrac{2}{3} \cdot 2 = \tfrac{5}{3}$ comparisons per element

Any partitioning method requires $\tfrac{5}{3}(n-2) \sim \tfrac{20}{12} n$ comparisons, right?

No! (Stay tuned ...)
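The 5/3 figure can be checked by brute force. The sketch below is only an illustration, assuming a random permutation so that every pair of pivot ranks p < q is equally likely; the function name `expected_cmps_p_first` is made up for this example. It averages the cost of the "compare with p first" rule over all pivot pairs, using exact fractions.

```python
from fractions import Fraction
from itertools import combinations


def expected_cmps_p_first(n):
    """Average comparisons to classify the n - 2 non-pivot elements when
    every element is compared with p first (1 cmp if small, else 2)."""
    pairs = list(combinations(range(1, n + 1), 2))  # pivot ranks p < q
    total = Fraction(0)
    for p, q in pairs:
        small = p - 1               # elements smaller than p
        others = (n - 2) - small    # medium and large elements
        total += small * 1 + others * 2
    return total / len(pairs)


for n in (5, 10, 50):
    # second and third column agree: the average is exactly 5/3 * (n - 2)
    print(n, expected_cmps_p_first(n), Fraction(5, 3) * (n - 2))
```

The same enumeration with q compared first gives the same average by symmetry, which is why 5/3 per element looks like a natural bound at this point of the talk.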

Page 17: Dual-Pivot Quicksort - Asymmetries in Sorting

Yaroslavskiy’s Algorithm

Decisions for the element at pointer k:
< p ? yes: swap with ℓ; no: < q ? yes: skip; no: swap with g

Decisions for the element at pointer g:
> q ? yes: skip; no: < p ? yes: swap with ℓ; no: swap with k

Invariant: p | < p | p ≤ ◦ ≤ q | not yet classified | ≥ q | q, with the pointers ℓ, k and g marking the block boundaries

Example with p = 3 and q = 6:

3 5 1 7 4 2 8 6 (input)

3 1 5 7 4 2 8 6

3 1 2 5 4 7 8 6 (all elements classified)

2 1 3 5 4 6 8 7 (pivots swapped to their final positions)

1 2 3 4 5 6 7 8 (after sorting the three subarrays recursively)

Page 39: Dual-Pivot Quicksort - Asymmetries in Sorting

Asymmetries in #Comparisons

How many comparisons used?

[Flowchart: at pointer k, test “< p ?” first, then “< q ?”; at pointer g, test “> q ?” first, then “< p ?”]

Cheap elements (1 cmp.):

small in k’s range (s@K)

large in g’s range (l@G)

Key Observation: sizes of classes and ranges are correlated:

many small elements ⇒ k’s range big

many large elements ⇒ g’s range big

⇒ more than 1/3 cheap elements (on average)

Page 41: Dual-Pivot Quicksort - Asymmetries in Sorting

Comparisons per Partitioning Step

Can give full distribution of s@K and l@G (hypergeometric conditional on p and q)

Expected #cmps:

$$c_n \;\sim\; \underbrace{2n}_{\text{two cmps per element}} \;-\; \underbrace{\tfrac{3}{12}\,n}_{\text{small in } k\text{'s range}} \;-\; \underbrace{\tfrac{1}{6}\,n}_{\text{large in } g\text{'s range}} \;=\; \tfrac{19}{12}\,n$$

Recall: “lower bound” was $\tfrac{20}{12} n$.
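The 19/12 figure can also be observed empirically. The following sketch is not the authors' experiment. It runs one partitioning step, transcribed from the pseudocode on the "Yaroslavskiy’s Quicksort" slide, on random permutations and counts every key comparison with p or q. The names `partition_cmps` and `average_cmps_per_element`, the seed and the sample sizes are arbitrary choices for illustration.

```python
import random


def partition_cmps(a):
    """Run one Yaroslavskiy partitioning step on list a (in place) and
    return the number of key comparisons performed."""
    left, right = 0, len(a) - 1
    cmps = 1                                  # the test p > q
    if a[left] > a[right]:
        a[left], a[right] = a[right], a[left]
    p, q = a[left], a[right]
    l, g, k = left + 1, right - 1, left + 1
    while k <= g:
        cmps += 1                             # A[k] < p ?
        if a[k] < p:
            a[k], a[l] = a[l], a[k]
            l += 1
        else:
            cmps += 1                         # A[k] > q ?
            if a[k] > q:
                while True:
                    cmps += 1                 # A[g] > q ?
                    if a[g] > q and k < g:
                        g -= 1
                    else:
                        break
                a[k], a[g] = a[g], a[k]
                g -= 1
                cmps += 1                     # A[k] < p ?
                if a[k] < p:
                    a[k], a[l] = a[l], a[k]
                    l += 1
        k += 1
    return cmps


def average_cmps_per_element(n, trials, seed=1):
    """Average comparisons per element over random permutations of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = list(range(n))
        rng.shuffle(a)
        total += partition_cmps(a)
    return total / (trials * n)


estimate = average_cmps_per_element(400, 2000)
print(estimate, 19 / 12)   # the two numbers should be close
```

Because this is a Monte Carlo estimate at finite n, it only approximates 19/12 ≈ 1.583; it should nevertheless land visibly below 20/12 ≈ 1.667.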

Page 42: Dual-Pivot Quicksort - Asymmetries in Sorting

Results — Basic Operations

Solving dual-pivot Quicksort recurrence gives:

Comparisons:

Yaroslavskiy needs only ∼ 1.9 n ln n comparisons on average.

Classic Quicksort: ∼ 2 n ln n

Swaps:

∼ 0.6 n ln n swaps for Yaroslavskiy

∼ 0.3 n ln n swaps for classic Quicksort

Which algorithm is faster? Result remains inconclusive!

[Bar chart: comparisons and swaps for classic Quicksort and Yaroslavskiy, scale 0 to 2; Yaroslavskiy: −5% comparisons, +80% swaps]

Page 44: Dual-Pivot Quicksort - Asymmetries in Sorting

Instruction Counts à la Knuth

How to force conclusive analysis?

More detailed cost model!

old days: Knuth’s MIX instructions

in the Java universe: Java Bytecode instructions

#Bytecodes highly correlated with running time

Page 46: Dual-Pivot Quicksort - Asymmetries in Sorting

Results — Primitive Instructions

Results:

Yaroslavskiy’s algorithm: ∼ 21.7 n ln n Bytecodes

Classic Quicksort: ∼ 18 n ln n Bytecodes

Similar for Knuth’s new mythical machine MMIX

Yaroslavskiy: ∼ (13.1υ + 2.8µ) n ln n

Classic Quicksort: ∼ (11υ + 2.6µ) n ln n

υ (“oops”): CPU cycles; µ (“mems”): memory accesses

Classic Quicksort significantly better in all these measures ...

→ These models ignore memory hierarchies; see Alex López-Ortiz’ talk!

Page 49: Dual-Pivot Quicksort - Asymmetries in Sorting

Asymmetries in #Comparisons — Revisited

How many comparisons used?

[Flowchart: at pointer k, test “< p ?” first, then “< q ?”; at pointer g, test “> q ?” first, then “< p ?”]

Cheap elements (1 cmp.):

small in k’s range (s@K)

large in g’s range (l@G)

Key Observation: sizes of classes and ranges are correlated:

many small elements ⇒ k’s range big

many large elements ⇒ g’s range big

Page 50: Dual-Pivot Quicksort - Asymmetries in Sorting

Exploiting Asymmetries in #Comparisons

Motivation: medium elements always cause 2 cmps ⇒ should avoid medium elements if possible!

Thought Experiment:

Assume always τ1 · n / τ2 · n / τ3 · n elements small / medium / large.

[Flowchart as before, with the invariant < p | p ≤ ◦ ≤ q | ≥ q]

Comparisons per partitioning step:

$$\underbrace{2n}_{\text{two per element}} \;-\; \underbrace{\tau_1(\tau_1+\tau_2)\,n}_{\text{small in } k\text{'s range}} \;-\; \underbrace{\tau_3\,\tau_3\,n}_{\text{large in } g\text{'s range}}$$

Minimal for τ = (1, 0, 0) or τ = (0, 0, 1).
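A few special cases make the formula concrete. The evaluation below is plain arithmetic on the expression from this slide, written per element (divided by n); nothing beyond the slide's formula is assumed.

```latex
\begin{align*}
\tau = (1,0,0):&\quad 2 - 1\cdot(1+0) - 0^2 = 1, \\
\tau = (0,0,1):&\quad 2 - 0\cdot(0+0) - 1^2 = 1, \\
\tau = (0,1,0):&\quad 2 - 0\cdot(0+1) - 0^2 = 2, \\
\tau = \left(\tfrac13,\tfrac13,\tfrac13\right):&\quad
  2 - \tfrac13\cdot\tfrac23 - \left(\tfrac13\right)^2 = \tfrac53 .
\end{align*}
```

With the split fixed at exact tertiles the cost is 5/3 per element, the same as the earlier "bound". The smaller value 19/12 from the slide on comparisons per partitioning step arises only after averaging over random pivot ranks.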

Page 54: Dual-Pivot Quicksort - Asymmetries in Sorting

Exploiting Asymmetries in #Comparisons

τ = (1, 0, 0) optimal w. r. t. one partitioning step

But: worst case for partitioning:

[Figure comparing τ = (1, 0, 0) with the symmetric case τ = (1/3, 1/3, 1/3)]

Is τ = (1/2, 0, 1/2) better?

trade-off between partitioning costs and equality of split

What is the optimal τ?

Page 57: Dual-Pivot Quicksort - Asymmetries in Sorting

Results — Optimal τ

We could show: overall

$$\sim\; \frac{2 - \tau_1(\tau_1+\tau_2) - \tau_3\,\tau_3}{-\sum_{i=1}^{3} \tau_i \ln(\tau_i)} \; n \ln n \ \text{cmps}$$

optimal choice τ* ≈ (0.4288, 0.2688, 0.3024)

quite different from (1/3, 1/3, 1/3)

similar analysis for #Bytecodes:

τ*_BC ≈ (0.2068, 0.3485, 0.4447)

[Plot: leading coefficient over τ1 and τ2, both axes from 0 to 1; marked values 1.4931 (at τ*) and 1.5171 (at the symmetric choice)]

Optimal skew of pivots heavily depends on cost measure!
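The optimum quoted on this slide can be reproduced approximately with a brute-force search. The sketch below evaluates the slide's coefficient of n ln n on a regular grid over the simplex; the grid resolution and the function names `leading_coefficient` and `grid_minimum` are arbitrary choices for illustration, and a grid search is of course not how the exact optimum was derived.

```python
import math


def leading_coefficient(t1, t2):
    """Coefficient of n ln n for fixed pivot quantiles tau = (t1, t2, t3)."""
    t3 = 1.0 - t1 - t2
    partition_cost = 2.0 - t1 * (t1 + t2) - t3 * t3
    entropy = -sum(t * math.log(t) for t in (t1, t2, t3))
    return partition_cost / entropy


def grid_minimum(steps=500):
    """Search the open simplex on a regular grid with spacing 1 / steps."""
    best_value, best_tau = float("inf"), None
    for i in range(1, steps):
        for j in range(1, steps - i):      # keeps t3 strictly positive
            t1, t2 = i / steps, j / steps
            value = leading_coefficient(t1, t2)
            if value < best_value:
                best_value, best_tau = value, (t1, t2, 1.0 - t1 - t2)
    return best_value, best_tau


best_value, best_tau = grid_minimum()
print(best_value, best_tau)                  # close to the slide's optimum
print(leading_coefficient(1 / 3, 1 / 3))     # symmetric choice
```

The symmetric value is (5/3) / ln 3, which matches the 1.5171 marked in the plot.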

Page 63: Dual-Pivot Quicksort - Asymmetries in Sorting

Conclusion

We’ve seen:

Breaking symmetry in order of comparisons is beneficial in multi-pivot Quicksort

skewed pivots can amplify savings

trade-off between partitioning costs and balance in subproblem sizes

Open Questions:

How about other input distributions? (up to now only random permutations)

Is Yaroslavskiy’s algorithm special? Are there better asymmetric Quicksorts (for practice)?

→ Martin Aumüller’s talk

Page 65: Dual-Pivot Quicksort - Asymmetries in Sorting

Choice of τ in Practice

Lesson for Practice?

exact quantiles too expensive ⇒ use sampling!

sample size k controls deviation

limited to discrete quantiles

Java 7 library uses tertiles of five elements

E[τ_{2,4}] = 19.298 n ln n Bytecodes

Recall: Optimum (for #Bytecodes)

τ*_BC ≈ (0.2068, 0.3485, 0.4447)

Running time:

C++: 0.5% faster

Java: ... depends on JIT

probably better choice: E[τ_{1,3}] = 18.791 n ln n Bytecodes
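Pivot sampling of this kind can be sketched in a few lines. The helper below is hypothetical and much simpler than the actual Java 7 library code: it takes five roughly equally spaced elements, sorts them, and returns the i-th and j-th smallest as pivots. It assumes the subarray has at least five elements.

```python
def choose_pivots(a, left, right, i=2, j=4, sample_size=5):
    """Return the i-th and j-th smallest (1-based, i < j) of sample_size
    roughly equally spaced elements of a[left..right]."""
    step = (right - left) // (sample_size - 1)
    positions = [left + s * step for s in range(sample_size)]
    sample = sorted(a[pos] for pos in positions)
    return sample[i - 1], sample[j - 1]


data = list(range(100))
print(choose_pivots(data, 0, 99))              # tertiles of five: ranks 2 and 4
print(choose_pivots(data, 0, 99, i=1, j=3))    # skewed choice: ranks 1 and 3
```

For a random sample of five, ranks (2, 4) split the input in expected proportions (1/3, 1/3, 1/3), while ranks (1, 3) give (1/6, 1/3, 1/2). This is a standard order-statistics fact, and the second vector is noticeably closer to the #Bytecodes optimum quoted on this slide.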

Page 70: Dual-Pivot Quicksort - Asymmetries in Sorting

Beating the “Lower Bound”

∼ 20/12 n comparisons are only needed if there is a single comparison location; then the checks for x and y are independent

But: Can have several comparison locations!

Here: Assume two locations C1 and C2 s. t.

C1 first compares with p.

C2 first compares with q.

C1 executed often iff p is large.

C2 executed often iff q is small.

C1 executed often
iff many small elements
iff good chance that C1 needs only one comparison

(C2 similar)

⇒ fewer than 5/3 comparisons per element on average

Page 72: Dual-Pivot Quicksort - Asymmetries in Sorting

Yaroslavskiy’s QuicksortDUALPIVOTQUICKSORTYAROSLAVSKIY(A, left, right)

1 if right − left > 12 p := A[left]; q := A[right]3 if p > q then Swap p and q end if4 ` := left + 1; g := right − 1; k := `5 while k 6 g6 Ck if A[k] < p7 SwapA[k] andA[`] ; ` := `+ 18 elseC′k if A[k] > q9 Cg while A[g] > q and k < g do g := g− 1 end while10 SwapA[k] andA[g] ; g := g− 111 C′g if A[k] < p12 SwapA[k] andA[`] ; ` := `+ 113 end if14 end if15 k := k+ 116 end while17 ` := `− 1; g := g+ 118 SwapA[left] andA[`] ; SwapA[right] andA[g]19 DUALPIVOTQUICKSORTYAROSLAVSKIY(A, left , `− 1)20 DUALPIVOTQUICKSORTYAROSLAVSKIY(A, `+ 1,g− 1)21 DUALPIVOTQUICKSORTYAROSLAVSKIY(A,g+ 1, right )22 end if

2 comparison locations

Ck handles pointer k

Cg handles pointer g

Ck first checks < p

C′k, if needed, checks > q

Cg first checks > q

C′g, if needed, checks < p
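
The four comparison locations can be made concrete with a small Python sketch that tallies how often each one is executed. This is an illustration under stated assumptions, not the authors' code: the function name `dual_pivot_quicksort` and the `counts` dictionary are introduced here, "Swap p and q" is read as also swapping A[left] and A[right], and the single pivot comparison p > q is not tallied.

```python
import random


def dual_pivot_quicksort(a, left=0, right=None, counts=None):
    """Sort a[left..right] in place and tally comparisons per location."""
    if right is None:
        right = len(a) - 1
    if counts is None:
        counts = {"Ck": 0, "Ck'": 0, "Cg": 0, "Cg'": 0}
    if right - left >= 1:
        # Assumption: swapping p and q also swaps the array entries.
        if a[left] > a[right]:
            a[left], a[right] = a[right], a[left]
        p, q = a[left], a[right]
        l, g, k = left + 1, right - 1, left + 1
        while k <= g:
            counts["Ck"] += 1                  # location Ck: A[k] < p
            if a[k] < p:
                a[k], a[l] = a[l], a[k]
                l += 1
            else:
                counts["Ck'"] += 1             # location C'k: A[k] > q
                if a[k] > q:
                    while True:
                        counts["Cg"] += 1      # location Cg: A[g] > q
                        if a[g] > q and k < g:
                            g -= 1
                        else:
                            break
                    a[k], a[g] = a[g], a[k]
                    g -= 1
                    counts["Cg'"] += 1         # location C'g: A[k] < p
                    if a[k] < p:
                        a[k], a[l] = a[l], a[k]
                        l += 1
            k += 1
        l -= 1
        g += 1
        # Move the pivots to their final positions.
        a[left], a[l] = a[l], a[left]
        a[right], a[g] = a[g], a[right]
        dual_pivot_quicksort(a, left, l - 1, counts)
        dual_pivot_quicksort(a, l + 1, g - 1, counts)
        dual_pivot_quicksort(a, g + 1, right, counts)
    return counts


rng = random.Random(2014)          # fixed seed, chosen arbitrarily
data = list(range(1000))
rng.shuffle(data)
counts = dual_pivot_quicksort(data)
print(counts, sum(counts.values()))
```

Keeping one counter per location makes it possible to see separately how often the second comparisons C′k and C′g are actually reached.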


Dual Pivot Quicksort Recurrence

$C_n$: expected #comparisons to sort a random permutation of $\{1, \dots, n\}$

$C_n$ satisfies the recurrence relation

$$C_n = T_n + \frac{2}{n(n-1)} \sum_{1 \le p < q \le n} \bigl( C_{p-1} + C_{q-p-1} + C_{n-q} \bigr),$$

with $T_n$ the expected #comparisons in the first partitioning step

recurrence solvable by standard methods

linear $T_n \sim a \cdot n$ yields $C_n \sim \frac{6}{5} a \cdot n \ln n$.

It remains to compute $T_n$.
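
The 6/5 factor can be checked numerically. The sketch below is only an illustration: the helper name `expected_costs` and the choice of toll function `a * n` are assumptions made here. It uses the fact that each of the three sub-sums in the recurrence equals the sum of (n − 1 − k) · C_k over k = 0, …, n − 2, so C_n = T_n + 6/(n(n−1)) times that sum. Because C_n / n grows like (6/5) · a · ln n plus a constant, doubling n should raise C_n / n by about (6/5) · a · ln 2.

```python
import math


def expected_costs(n_max, toll):
    """Return [C_0, ..., C_n_max] for the dual-pivot recurrence with T_n = toll(n)."""
    costs = [0.0] * (n_max + 1)    # C_0 = C_1 = 0
    sum_c = 0.0                    # running sum of C_k for k <= n - 2
    sum_kc = 0.0                   # running sum of k * C_k for k <= n - 2
    for n in range(2, n_max + 1):
        k = n - 2
        sum_c += costs[k]
        sum_kc += k * costs[k]
        # sum_{k=0}^{n-2} (n - 1 - k) * C_k
        weighted = (n - 1) * sum_c - sum_kc
        costs[n] = toll(n) + 6.0 * weighted / (n * (n - 1))
    return costs


a = 1.0
costs = expected_costs(8000, lambda n: a * n)
slope = costs[8000] / 8000 - costs[4000] / 4000
print(slope, 6 / 5 * a * math.log(2))
```

Comparing the growth of C_n / n between n and 2n, rather than C_n / (n ln n) directly, sidesteps the linear term of C_n, which decays only slowly relative to n ln n.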


Analysis of Yaroslavskiy’s Algorithm (1)

first comparison for all elements (at Ck or Cg): ∼ n comparisons

second comparison for some elements at C′k or C′g
. . . but how often are C′k and C′g reached?

C′k: all non-small elements reached by pointer k.

C′g: all non-large elements reached by pointer g.

second comparison for medium elements is not avoidable: ∼ (1/3) · n comparisons in expectation

it remains to count:
large elements reached by k and
small elements reached by g.


Analysis of Yaroslavskiy’s Algorithm (2)

Second comparisons for small and large elements? Depends on location!

C′k: number of large elements in k’s range: l@K

C′g: number of small elements in g’s range: s@G

[Figure: example array with pivots p and q, split into k’s range K and g’s range G; in the example, l@K = 3 and s@G = 2]

l@K and s@G are random even for fixed p and q


Distribution of l@K and s@G (1)

Assume p and q are fixed.

1 Where do k and g cross?

Recall invariant: [ < p | ℓ→ | p ≤ ◦ ≤ q | k→ | ? | ←g | > q ]

k and g cross at (rank of) q
|K| ∼ q, |G| ∼ n − q

2 How many small and large elements?

#small = |{1, . . . , p − 1}| = p − 1

#large = |{q + 1, . . . , n}| = n − q

|K|, |G|, #small and #large are constant (for given p and q).


Distribution of l@K and s@G (2)

3 Conditional Distribution of l@K

We draw positions of large elements at random.
n − 2 positions
draw #large positions without replacement
|K| positions contribute to l@K

$l@K \overset{\mathcal{D}}{=} \mathrm{Hypergeometric}(\#\text{large}, |K|, n-2)$

= number of red balls obtained
when drawing #large balls without replacement
from an urn with n − 2 balls, exactly |K| of which are red.
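
The urn model can be checked exactly with rational arithmetic. The sketch below is only an illustration: the helper names `hypergeometric_pmf` and `hypergeometric_mean` and the example numbers are assumptions made here. It builds the hypergeometric distribution from binomial coefficients and confirms that its mean is draws · red / total, which is the form #large · |K| / (n − 2) takes in the urn picture.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(hits, draws, red, total):
    """P[exactly `hits` red balls] when drawing `draws` balls without
    replacement from an urn with `total` balls, `red` of which are red."""
    return Fraction(comb(red, hits) * comb(total - red, draws - hits),
                    comb(total, draws))


def hypergeometric_mean(draws, red, total):
    """Mean of the distribution, summed directly from the pmf."""
    lo = max(0, draws - (total - red))   # fewest red balls possible
    hi = min(draws, red)                 # most red balls possible
    return sum(h * hypergeometric_pmf(h, draws, red, total)
               for h in range(lo, hi + 1))


# Hypothetical example: n = 12, so n - 2 = 10 positions,
# #large = 4 draws, |K| = 7 "red" positions.
mean = hypergeometric_mean(draws=4, red=7, total=10)
print(mean, Fraction(4 * 7, 10))
```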

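$$\mathbb{E}[\,l@K \mid p, q\,] = \frac{\#\text{large} \cdot |K|}{n-2} \sim \frac{(n-q) \cdot q}{n-2} \qquad \text{(by steps 1 and 2)}$$

law of total expectation:

$$\mathbb{E}[\,l@K\,] = \sum_{1 \le p < q \le n} \frac{1}{\binom{n}{2}} \cdot \frac{(n-q) \cdot q}{n-2} \sim \frac{1}{6} n$$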
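
The sum over all pivot pairs can be evaluated exactly for small n to see where the n/6 comes from. The sketch below is only an illustration, and the function name `expected_large_at_k` is an assumption made here. It averages (n − q) · q / (n − 2) over all pairs p < q. Working the sum out by hand gives exactly (n + 1)/6, in line with the ∼ n/6 asymptotic.

```python
from fractions import Fraction


def expected_large_at_k(n):
    """Average of (n - q) * q / (n - 2) over all pivot pairs 1 <= p < q <= n."""
    pairs = n * (n - 1) // 2             # binomial(n, 2) equally likely pairs
    total = Fraction(0)
    for p in range(1, n):
        for q in range(p + 1, n + 1):
            total += Fraction((n - q) * q, n - 2)
    return total / pairs


for n in (3, 10, 50):
    print(n, expected_large_at_k(n), Fraction(n + 1, 6))
```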
