Coloring Away Communication in Parallel Query Optimization

Waqar Hasan, Rajeev Motwani
Stanford University

Παυλάτος Χρήστος
pavlatos@cslab.ece.ntua.gr

Dec 19, 2015




Parallel plans for SQL queries

The goal is to find optimal parallel plans for SQL queries, using a model that represents the partitioning of data as a color.


Hong and Stonebraker approach

The problem of finding parallel plans has been broken into two phases: join ordering and query rewrite (JOQR), and parallelization.

SQL Query → JOQR → Parallelization → Parallel Plan


Optimize JOQR phase

The JOQR phase has two parts:

Conventional optimization

Query tree annotation and coloring


Partitioning

A partitioning is a pair (a, h), where a is an attribute and h is a function that maps values of a to non-negative integers.


Partitioning example

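Suppose we have two tables, Emp (name, number) and Cust (name, number), that are both partitioned across two sites using the function h(number) = number mod 2. Since the tables have the same partitioning, their join on number decomposes site by site:

$\text{Emp} \bowtie \text{Cust} = (\text{Emp}_0 \bowtie \text{Cust}_0) \cup (\text{Emp}_1 \bowtie \text{Cust}_1)$

This permits $\text{Emp} \bowtie \text{Cust}$ to be computed in parallel.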
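The decomposition above can be checked on a small case. The following is a minimal sketch, assuming rows are (name, number) tuples and the join is an equijoin on number; the sample rows and helper names are made up for illustration.

```python
from itertools import product

def h(number):
    """Partitioning function from the example: number mod 2."""
    return number % 2

def partition(rows):
    """Split (name, number) rows into two per-site fragments using h."""
    fragments = [[], []]
    for row in rows:
        fragments[h(row[1])].append(row)
    return fragments

def join_on_number(left, right):
    """Naive equijoin on the number attribute (nested loops)."""
    return {(a, b) for a, b in product(left, right) if a[1] == b[1]}

# Made-up sample data, not from the slides.
emp = [("ann", 1), ("bob", 2), ("cy", 3)]
cust = [("dee", 2), ("eli", 3), ("flo", 4)]

emp_parts, cust_parts = partition(emp), partition(cust)

# Each site joins only its own fragments; no rows cross sites.
local_joins = [join_on_number(e, c) for e, c in zip(emp_parts, cust_parts)]
partitioned_result = set().union(*local_joins)

full_result = join_on_number(emp, cust)
```

Here `partitioned_result` equals `full_result`, because two rows with the same number always land in fragments with the same index.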


The new approach

We want to choose the partitioning attributes in a query tree to minimize the sum total of communication and computation cost. By regarding partitioning attributes as colors, we model the problem as query tree coloring.


Some definitions

The color of a node in a query tree is the attribute used for partitioning the node.

An edge between nodes i and j is multicolored if and only if i and j have different colors.

The weight $c_e$ of an edge e represents the repartitioning cost.


Query tree Coloring problem

Given a query tree T = (V, E), the weights of the edges, and colors for some subset of the nodes, color the remaining nodes so as to minimize the total weight of multicolored edges.
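To make the objective concrete, here is a minimal sketch that scores a coloring and finds a minimum by exhaustive search. It is not the algorithm from the talk, only a brute-force baseline for tiny trees. The tree, its weights, and the function names are hypothetical, and the search assumes at least one node is precolored.

```python
from itertools import product

def coloring_cost(edges, color):
    """Total weight of multicolored edges.

    edges: dict mapping (i, j) node pairs to the edge weight
    color: dict mapping every node to its color
    """
    return sum(w for (i, j), w in edges.items() if color[i] != color[j])

def brute_force_coloring(edges, precolored):
    """Try every assignment for the uncolored nodes (exponential time)."""
    nodes = sorted({n for e in edges for n in e})
    free = [n for n in nodes if n not in precolored]
    # Only colors already present are tried; a brand-new color cannot
    # lower the cost, since its whole region could adopt a neighbor's color.
    palette = sorted(set(precolored.values()))
    best = None
    for choice in product(palette, repeat=len(free)):
        color = {**precolored, **dict(zip(free, choice))}
        cost = coloring_cost(edges, color)
        if best is None or cost < best[0]:
            best = (cost, color)
    return best

# Hypothetical tree: node "j" with children "r" and "s" and parent "g".
edges = {("j", "r"): 5, ("j", "s"): 3, ("g", "j"): 4}
precolored = {"r": "A", "s": "B", "g": "B"}

cost, color = brute_force_coloring(edges, precolored)
# Coloring "j" with B cuts only the edge to "r", for a total cost of 5.
```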


An example


Problem Simplification

(Split) A colored interior node of degree d may be split into d nodes of the same color, with each incident edge connected to a distinct copy.

(Collapse) An uncolored leaf node may be collapsed into its parent, which gives it the same color as its parent.
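The two simplifications can be sketched as follows. This is an illustrative implementation under assumptions not stated on the slide: the tree is an undirected list of (u, v, weight) edges, precolored nodes are given in a dict, and the copies of a split node are named (node, k). All names are hypothetical.

```python
def degrees(edges):
    """Count incident edges per node."""
    degree = {}
    for u, v, _ in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree

def collapse_uncolored_leaves(edges, precolored):
    """Repeatedly drop uncolored leaves; each takes its neighbor's color,
    so its edge is never multicolored and can be ignored."""
    edges = list(edges)
    while True:
        degree = degrees(edges)
        removable = {n for n, d in degree.items() if d == 1 and n not in precolored}
        if not removable:
            return edges
        edges = [(u, v, w) for u, v, w in edges
                 if u not in removable and v not in removable]

def split_colored_interior(edges, precolored):
    """Give each edge at a colored interior node its own copy of that node."""
    degree = degrees(edges)
    colors = {n: c for n, c in precolored.items() if degree.get(n, 0) <= 1}
    copies = {}

    def endpoint(n):
        if n in precolored and degree[n] > 1:
            copies[n] = copies.get(n, 0) + 1
            copy = (n, copies[n])          # hypothetical naming of copies
            colors[copy] = precolored[n]
            return copy
        return n

    return [(endpoint(u), endpoint(v), w) for u, v, w in edges], colors

# Hypothetical tree: "t" is an uncolored leaf, "s" a colored interior node.
edges = [("g", "j", 4), ("j", "r", 5), ("j", "s", 3), ("s", "t", 2), ("s", "u", 1)]
precolored = {"g": "B", "r": "A", "s": "B", "u": "D"}

collapsed = collapse_uncolored_leaves(edges, precolored)
split_edges, split_colors = split_colored_interior(collapsed, precolored)
```

After both steps every colored node in the example is a leaf, and the tree has fallen apart into independent pieces at the former node "s".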


Examples on simplifications


Lemma

Suppose m is a mother with edges $e_1, e_2, \dots, e_d$ to leaf children $u_1, u_2, \dots, u_d$. Assume that we have numbered the children in order of non-decreasing edge weight, i.e. $c_{e_1} \le c_{e_2} \le \dots \le c_{e_d}$.

Then there is a minimal coloring that cuts $e_1, e_2, \dots, e_{d-1}$.
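A small check of this setting, restricted to the simplest assumed case: the mother m has no neighbors other than its leaf children, and the children carry distinct colors. The sketch lists, for each color m could take, which edges are cut and at what cost. The weights and names are made up.

```python
def star_cut_sets(children):
    """children: list of (color, weight) pairs for the leaf children of m,
    in order of non-decreasing weight.

    Returns {color: (cost, cut_indices)}, where cut_indices are the
    1-based indices of the edges cut when m takes that color.
    """
    options = {}
    for color, _ in children:
        cut = [i for i, (c, _) in enumerate(children, start=1) if c != color]
        cost = sum(children[i - 1][1] for i in cut)
        options[color] = (cost, cut)
    return options

# Made-up example with d = 3 and distinct child colors.
children = [("A", 2), ("B", 3), ("C", 7)]
options = star_cut_sets(children)
best_color = min(options, key=lambda c: options[c][0])
```

In this example the cheapest choice gives m the color of the child on the heaviest edge, so only the two lighter edges are cut.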


The algorithm


An example


Algorithm for Repeated colors


Decompose the tree


Combining computation and communication costs

We can develop a new model by extending the definition of color to be a triple <p, s, i>, where:

p is the partitioning attribute

s is the sort attribute

i is the indexing attribute


The cost of a node

The cost of a node consists of:

The cost of recoloring the outputs of its children to have the colors of its inputs

The cost of executing the strategy itself


Strategy

A strategy specifies a particular algorithm for computing an operator. It requires the inputs to satisfy some constraints and guarantees some properties for its output.


Constraint

We use color patterns to specify such input-output constraints. A constraint has the form:

$\text{Input}_1, \dots, \text{Input}_n \rightarrow \text{Output}$

where each $\text{Input}_j$ and $\text{Output}$ are color patterns.
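One way to picture a constraint is as a list of input patterns plus an output pattern, checked against the colors of the actual inputs. The sketch below is an assumption-laden illustration, not the notation of the talk: colors are <p, s, i> triples, `None` in a pattern is taken to mean "any value", and the merge-style join constraint on attribute "A" is invented for the example.

```python
from typing import NamedTuple, Optional

class Color(NamedTuple):
    p: Optional[str]  # partitioning attribute
    s: Optional[str]  # sort attribute
    i: Optional[str]  # indexing attribute

WILDCARD = None  # assumption: None in a pattern matches any value

def matches(pattern, color):
    """True if every non-wildcard field of the pattern equals the color's field."""
    return all(want is WILDCARD or want == have
               for want, have in zip(pattern, color))

def satisfies(input_patterns, input_colors):
    """True if each input color matches the corresponding input pattern."""
    return len(input_patterns) == len(input_colors) and all(
        matches(pat, col) for pat, col in zip(input_patterns, input_colors))

# Hypothetical constraint: both inputs partitioned and sorted on "A",
# output partitioned and sorted on "A".
merge_join_inputs = [Color("A", "A", WILDCARD), Color("A", "A", WILDCARD)]
merge_join_output = Color("A", "A", WILDCARD)

ok = satisfies(merge_join_inputs, [Color("A", "A", None), Color("A", "A", "B")])
bad = satisfies(merge_join_inputs, [Color("A", None, None), Color("A", "A", None)])
```

In the second call the first input is partitioned on "A" but not sorted on it, so it would have to be recolored before this strategy could run.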