Reducing Order Enforcement Cost in Complex Query Plans Ravindra Guravannavar and S. Sudarshan (To appear in ICDE 2007)

Jan 11, 2016

Transcript
Page 1: Reducing Order Enforcement Cost in Complex Query Plans

Reducing Order Enforcement Cost in Complex Query Plans

Ravindra Guravannavar and S. Sudarshan (To appear in ICDE 2007)

Page 2: Reducing Order Enforcement Cost in Complex Query Plans

Background

Sort-based query processing algorithms
  Sort-merge join (also union/intersection)
  Sort-based grouping and duplicate elimination
  Explicit "order by"

Notion of "interesting sort orders" (System R)
  Find and remember the best plan for each sort order that may be useful
  Optimization goal in Volcano: (expr, sort-order)

Page 3: Reducing Order Enforcement Cost in Complex Query Plans

The Problem

Interesting orders can be too many!
  Factorial in the number of attributes involved

Plan cost can vary substantially with the choice of interesting order
  Clustering and covering indices
  Other operators in the input sub-expressions
  Possibility of partial sorting

[Figure: group-by G on {a2, a4, a5, …} above a join of R and S on R.a1=S.a1 and R.a2=S.a2 … R.an=S.an]

Page 4: Reducing Order Enforcement Cost in Complex Query Plans

Motivation

Joins in data integration and decision support involve a large number of attributes

Increasing use of covering indices
  Several alternative sort orders
  Partial sorting

Query patterns
  Attributes common to multiple operators

Known techniques
  Work only for unary operators like group-by

Page 5: Reducing Order Enforcement Cost in Complex Query Plans

Outline of the Talk

Partial sorting
  Changes to external sort
  Optimizer changes to handle partial sort orders

Interesting orders for a join tree: a special case
  Problem is NP-hard
  A 2-approximation for the special case

The general problem
  Notion of favorable orders
  Plan generation using favorable orders
  Post-optimization phase

Experimental results

Page 6: Reducing Order Enforcement Cost in Complex Query Plans

Exploiting Partial Sort Orders

Sort on (a1, a2) given input already sorted on (a1)

Standard external sort
  Cost is independent of the input sort order

Replacement selection
  Produces a single run but incurs I/O

Both methods break the pipeline: the first output tuple is produced only after reading all the input

[Figure: join of R and S on R.a1=S.a1 and R.a2=S.a2, with a clustering index on (R.a1). Order annotations on the inputs: (a1) to (a1,a2), and () to (a1,a2)]

Page 7: Reducing Order Enforcement Cost in Complex Query Plans

A Minor Change to External Sorting

The input consists of multiple "partial sort segments"
Hold only one segment at any given time
When a new segment starts
  Sort the current segment and output it

No run-generation I/O if each segment fits in memory
Early output (good for Top-K)
Reduced comparisons
  O(n log(n/k)) vs. O(n log n), k = number of segments

Example input, sorted on a1 but not on a2:

a1  a2
1   2
1   1
1   5
1   3
2   4
2   1
2   6
6   3
…   …
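The segment-at-a-time idea on this slide can be sketched in a few lines of Python. This is only an in-memory illustration, not the authors' external-sort implementation: the function name `partial_sort` and its parameters are hypothetical, rows are modeled as tuples, and every segment is assumed to fit in memory.

```python
from itertools import groupby


def partial_sort(rows, prefix_len, key_len):
    """Yield rows ordered on their first key_len columns.

    Assumes rows already arrive sorted on their first prefix_len columns,
    so each run of equal prefix values is one partial sort segment.
    """
    for _, segment in groupby(rows, key=lambda r: r[:prefix_len]):
        # Only the current segment is materialized and sorted.
        yield from sorted(segment, key=lambda r: r[:key_len])


# The (a1, a2) rows from the slide: sorted on a1, unsorted on a2.
rows = [(1, 2), (1, 1), (1, 5), (1, 3), (2, 4), (2, 1), (2, 6), (6, 3)]
result = list(partial_sort(rows, prefix_len=1, key_len=2))
```

Because the function is a generator, the first output row becomes available once the first segment has been read, which is the early-output property the slide points to for Top-K queries.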

Page 8: Reducing Order Enforcement Cost in Complex Query Plans

Optimizer Changes to Handle Partial Sort Orders

Cost model for partial sort:
  Let the input order be o1
  Let the required (output) order be o2
  Let os = longest common prefix of o1 and o2
  Let or = o2 − os (i.e., os + or = o2)
  A(o) = attribute set of order o
  ε = empty (no) sort order

coe(e, o1, o2) = D(e, A(os)) × coe(e', ε, or), where e' = σ_p(e) and p equates A(os) to a constant.

Page 9: Reducing Order Enforcement Cost in Complex Query Plans

Optimizer Changes to Handle Partial Sort Orders

Cost model for partial sort:
  coe(e, o1, o2) = D(e, A(os)) × coe(e', ε, or), where e' = σ_p(e) and p equates A(os) to a constant.

Example:
  o1 = (a, b)
  o2 = (a, c)
  os = (a), or = (c), e' = σ_(a=k)(e)
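A rough sketch of how this cost formula could be evaluated is given below. Several things here are assumptions rather than part of the slides: D(e, A(os)) is read as the number of distinct values of the prefix attributes, rows are assumed to be spread evenly over those values, and the cost of a full sort is replaced by a toy n·log2(n) comparison count. All function names are hypothetical.

```python
import math


def split_orders(o1, o2):
    """Return (os, or): the longest common prefix of o1 and o2, and the rest of o2."""
    n = 0
    while n < len(o1) and n < len(o2) and o1[n] == o2[n]:
        n += 1
    return o2[:n], o2[n:]


def full_sort_cost(num_rows):
    # Toy stand-in for coe(e, (), o): comparisons of an in-memory sort (assumption).
    return num_rows * math.log2(num_rows) if num_rows > 1 else 0.0


def partial_sort_cost(num_rows, distinct_prefix_values, o1, o2):
    """coe(e, o1, o2) under the uniform-segment assumption stated above."""
    os_, or_ = split_orders(o1, o2)
    if not or_:
        return 0.0  # the input order already satisfies the required order
    if not os_:
        return full_sort_cost(num_rows)  # no shared prefix: full sort
    rows_per_segment = num_rows / distinct_prefix_values
    return distinct_prefix_values * full_sort_cost(rows_per_segment)


# Example from the slide: o1 = (a, b), o2 = (a, c).
prefix, remainder = split_orders(("a", "b"), ("a", "c"))
cost_partial = partial_sort_cost(1024, 4, ("a", "b"), ("a", "c"))
cost_full = full_sort_cost(1024)
```

With 1024 rows in 4 segments, the sketch charges four sorts of 256 rows each instead of one sort of 1024 rows.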

Page 10: Reducing Order Enforcement Cost in Complex Query Plans

Flexible Order Requirements

Most operators have interest in any order on the attributes involved
  Merge-join, merge-union, group-by, duplicate elimination

Binary operators demand the same order from both inputs

[Figure: operator tree annotated with attribute sets: G {a1, a2}; {a1,a2,a3,a4}; {a4,a7}; {a3,a5,a6}]

Page 11: Reducing Order Enforcement Cost in Complex Query Plans

Finding Optimal is NP-Hard

A special case:
  All relations/intermediate results are of the same size
  All attribute cardinalities are the same

We try to maximize the length of common prefixes
  Maximize Σ LCP(pi, pj), summed over adjacent operators i and j, where pi is the order chosen at operator i

Reduction from the graph layout problem SUM-CUT
Optimal algorithm for paths and a 2-approximation for binary trees

Page 12: Reducing Order Enforcement Cost in Complex Query Plans

A 2-Approximation Algorithm

Optimal algorithm for paths

[Figure: a path of nodes s1, s2, s3, …, sn-1, sn]

OPT(i,j) = max {OPT(i,k) + OPT(k+1,j) + c(i,j)}, i ≤ k < j

2-approximation for binary trees
  OPT ≤ OPT-EVEN + OPT-ODD
  Take the one with the higher benefit

[Figure: tree with even levels and odd levels marked]
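The recurrence for paths can be evaluated with a memoized function, as in the sketch below. The slide does not define c(i,j) or the base case, so two assumptions are made here: c(i,j) is taken to be the number of attributes common to all of s_i … s_j, and OPT(i,i) is taken to be 0. The function name `best_path_benefit` is hypothetical.

```python
from functools import lru_cache


def best_path_benefit(attr_sets):
    """Evaluate OPT(1, n) for a path whose nodes carry the given attribute sets."""
    sets = [frozenset(s) for s in attr_sets]

    def c(i, j):
        # Assumed meaning of c(i, j): attributes shared by every set in s_i..s_j.
        common = set(sets[i])
        for s in sets[i + 1 : j + 1]:
            common &= s
        return len(common)

    @lru_cache(maxsize=None)
    def opt(i, j):
        if i == j:
            return 0  # assumed base case
        return c(i, j) + max(opt(i, k) + opt(k + 1, j) for k in range(i, j))

    return opt(0, len(sets) - 1) if sets else 0


two_nodes = best_path_benefit([{"a", "b"}, {"a", "b"}])
three_nodes = best_path_benefit([{"a", "b"}, {"a", "b"}, {"b", "c"}])
```

For the three-node input, choosing the orders (b,a), (b,a) and (b,c) gives common prefixes of length 2 and 1, which matches the value the recurrence returns under these assumptions.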

Page 13: Reducing Order Enforcement Cost in Complex Query Plans

General Case

Logical plan space for inputs not expanded (i.e., join order not fixed)

Varying sizes of relations and intermediate results

Not all orders on base relations have the same cost (due to clustering and covering indices)

Page 14: Reducing Order Enforcement Cost in Complex Query Plans

Overview of the Approach

Identify a small set of favorable orders
  Orders that are relatively inexpensive
  Should not require expanding the input plan space

Plan generation (Phase-1)
  Deduce the interesting orders from the favorable orders
  Try each interesting order, retain the best

Plan refinement (Phase-2)
  Use the 2-approximation algorithm and refine the sort orders further

Page 15: Reducing Order Enforcement Cost in Complex Query Plans

Favorable Orders

Benefit of an order:
  benefit(o, e) = cbp(e, ε) + coe(e, ε, o) − cbp(e, o)

Positive benefit
  The order can be obtained at a cost lower than that of a full sort of the unordered result (e.g., the clustering order)

Favorable orders:
  ford(e) = { o : benefit(o, e) > 0 }
  Can be a huge set
  E.g., every order having the clustering order as its prefix is a favorable order
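The benefit test can be illustrated with a small sketch. Everything numeric below is made up for illustration: `cbp` is a hypothetical table of best-plan costs per order (the empty tuple stands for "no order"), and `sort_cost` is a hypothetical stand-in for the cost of fully sorting the unordered result.

```python
def benefit(order, cbp, full_sort_cost):
    """benefit(o, e) = cbp(e, no order) + coe(e, no order, o) - cbp(e, o)."""
    return cbp[()] + full_sort_cost(order) - cbp[order]


def favorable_orders(cbp, full_sort_cost):
    """Orders with strictly positive benefit."""
    return {o for o in cbp if o and benefit(o, cbp, full_sort_cost) > 0}


def sort_cost(order):
    # Hypothetical flat cost of a full sort, independent of the target order.
    return 50.0


# Hypothetical best-plan costs for a relation clustered on a1.
cbp = {
    (): 100.0,            # no order required
    ("a1",): 100.0,       # clustering order comes for free
    ("a1", "a2"): 120.0,  # cheap partial sort on top of the clustering order
    ("a2",): 150.0,       # needs a full sort
}
ford = favorable_orders(cbp, sort_cost)
```

With these numbers, both orders that start with the clustering order have positive benefit, while ("a2",) has benefit 0 and is not favorable.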

Page 16: Reducing Order Enforcement Cost in Complex Query Plans

Minimal Favorable Orders

A favorable order o that satisfies:
  1. ∄ o' ≤ o, o' ≠ o, s.t. cbp(e, o') + coe(e, o', o) = cbp(e, o)
  2. ∄ o'' ≠ o s.t. o ≤ o'' and cbp(e, o'') = cbp(e, o)
  (o' ≤ o means that o' is a prefix of o)

E.g., relation R with a clustering index on (a1,a2)
  (a1,a2) is a minimal favorable order
  (a1), (a1,a2,a3) are not

ford-min(e): set of all minimal favorable orders for expression e

For base relations, the size of ford-min is limited to the number of covering indices

Page 17: Reducing Order Enforcement Cost in Complex Query Plans

Computing Favorable Orders: Issues

Defined in terms of the cost of the best plan
  Need them before optimizing input sub-expressions

Even ford-min can get prohibitively large for join, group-by expressions

[Figure: join of R and S, with labels J1 and J2. Annotation: ford-min contains every permutation of the join attributes]

Page 18: Reducing Order Enforcement Cost in Complex Query Plans

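Heuristics for Computing ford-min

e = R: {o : o is a clustering or covering index order}

e = σ_p(e1): {o : o ∈ ford-min(e1)}

e = Π_L(e1): {o : o' ∈ ford-min(e1) and o = o' ∧ L}
  E.g., e = Π_{a,b}(e1), ford-min(e1) = {(a,c,b)} gives ford-min(e) = {(a)}

e = e1 ⋈ e2: Let T = ford-min(e1) ∪ ford-min(e2)
  ford-min(e) = T ∪ {o : o' ∈ T and o = (o' ∧ S) + permute(S − A(o' ∧ S))}

Here o ∧ L is the longest prefix of o whose attributes all belong to L, S is the set of join attributes, and permute picks an arbitrary permutation.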

Page 19: Reducing Order Enforcement Cost in Complex Query Plans

Heuristics for Computing ford-min

S = {a,b,c,d}

[Figure: join whose inputs have ford-min = {(a,b,e), (b)} and ford-min = {(a)}]

T = {(a,b,e), (b), (a)}

Input F.Order (o)   o ∧ {a,b,c,d}   Extended Order
(a,b,e)             (a,b)           (a,b,c,d)
(b)                 (b)             (b,a,c,d)
(a)                 (a)
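The join heuristic and the table above can be reproduced with the sketch below. The helper names are hypothetical, orders are modeled as tuples, and the "arbitrary" permutation of the remaining join attributes is fixed to alphabetical order purely so that the output is deterministic; that choice is an assumption of this sketch.

```python
def prefix_within(order, attrs):
    """Longest prefix of order whose attributes all belong to attrs."""
    prefix = []
    for attr in order:
        if attr not in attrs:
            break
        prefix.append(attr)
    return tuple(prefix)


def extend_order(order, join_attrs):
    """Restrict order to the join attributes, then append the missing ones."""
    prefix = prefix_within(order, join_attrs)
    # One arbitrary permutation of the remaining attributes (alphabetical here).
    rest = sorted(set(join_attrs) - set(prefix))
    return prefix + tuple(rest)


def join_ford_min(ford_min_left, ford_min_right, join_attrs):
    """Input favorable orders plus their extensions over the join attributes."""
    t = set(ford_min_left) | set(ford_min_right)
    return t | {extend_order(o, join_attrs) for o in t}


S = {"a", "b", "c", "d"}
orders = join_ford_min({("a", "b", "e"), ("b",)}, {("a",)}, S)
```

The size of the resulting set is bounded by twice the number of input favorable orders, instead of growing with the number of permutations of the join attributes.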

Page 20: Reducing Order Enforcement Cost in Complex Query Plans

Plan Generation (Phase-1)

Form the set I of interesting orders to try
  Collect the input favorable orders and the required output order
  Take the LCP with the set of join attributes
  Extend the orders (arbitrarily) to include the remaining attributes

For each order o in I, generate optimization sub-goals for the input sub-expressions

Page 21: Reducing Order Enforcement Cost in Complex Query Plans

Plan Refinement (Phase-2)

Identify the suffix that can be freely reordered
Use the 2-approximation algorithm to reorder the suffix

[Figure: join tree over R1(a), R2(a), R3(a) and R4(a) with attribute sets {a,d,h}, {a,e,h} and {a,e,h}. Sort orders shown: (a,b,c,h), (a,d,h), (a,e,h) and (a,h,b,c), (a,h,d), (a,h,e)]

Page 22: Reducing Order Enforcement Cost in Complex Query Plans

Experiments

1. Benefits of exploiting partial sort orders
2. Evaluate the plans produced by our optimizer extensions

Systems compared: PostgreSQL 8.1.3, SQL Server 2005, DB2 8.2, PYRO
Test machine:     Intel P4 (HT) PC, 512 MB
Dataset:          TPC-H 1 GB and synthetic
Queries:          Synthetic and from a real application

Page 23: Reducing Order Enforcement Cost in Complex Query Plans

Experiment 1

SELECT suppkey, partkey FROM lineitem
ORDER BY suppkey, partkey;

Input order (suppkey), required order (suppkey, partkey)

Page 24: Reducing Order Enforcement Cost in Complex Query Plans

Experiment 2

R(c1,c2,c3), 10 M records, input order (c1), required order (c1,c2), card(c1) = 10,000

Page 25: Reducing Order Enforcement Cost in Complex Query Plans


Experiment 3

Page 26: Reducing Order Enforcement Cost in Complex Query Plans

Experiment 4

Parts running out of stock:

SELECT ps_suppkey, ps_partkey, ps_availqty,
       sum(l_quantity) AS total_required
FROM partsupp, lineitem
WHERE ps_suppkey=l_suppkey AND ps_partkey=l_partkey
      AND l_linestatus='O'
GROUP BY ps_partkey, ps_suppkey, ps_availqty
HAVING sum(l_quantity) > ps_availqty
ORDER BY ps_partkey;

Page 27: Reducing Order Enforcement Cost in Complex Query Plans

Experiment 4 - Plans

[Figure: merge-join plan on SYS1 and SYS2]
[Figure: plan generated by PYRO-O]

Page 28: Reducing Order Enforcement Cost in Complex Query Plans


Experiment 4 & 5 - Timings

Page 29: Reducing Order Enforcement Cost in Complex Query Plans

Experiments with Variants of PYRO

PYRO    : Baseline PYRO
PYRO-O- : No partial sort
PYRO-P  : Postgres heuristic
PYRO-O  : Our approach
PYRO-E  : Exhaustive

Page 30: Reducing Order Enforcement Cost in Complex Query Plans


Optimization Overheads

Page 31: Reducing Order Enforcement Cost in Complex Query Plans


Questions?