Coloring Away Communication in Parallel Query Optimization Waqar Hasan, Rajeev Motwani Stanford University Παυλάτος Χρήστος [email protected]
Dec 19, 2015
Parallel plans for SQL queries
The problem is to find optimal parallel plans for SQL queries using a model based on representing the partitioning of data as a color.
Hong and Stonebraker approach
The problem of finding parallel plans is broken into two phases: join ordering and query rewrite (JOQR), followed by parallelization.
SQL Query → JOQR → Parallelization → Parallel Plan
Optimize JOQR phase
JOQR
Conventional optimization
Query tree annotation and coloring
Partitioning
A partitioning is a pair (a, h) where
a is an attribute and h is a function that maps values of a to non-negative integers.
Partitioning example
Suppose we have two tables, Emp (name, number) and Cust (name, number), that are both partitioned across two sites using the function h(number) = number mod 2. Since the tables have the same partitioning, Emp ⋈ Cust = (Emp0 ⋈ Cust0) ∪ (Emp1 ⋈ Cust1) for a join on number.
This permits Emp ⋈ Cust to be computed in parallel.
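The Emp/Cust example can be checked with a small sketch. The sample rows, the helper names `partition` and `join`, and the assumption that the join is an equi-join on number are illustrative only and do not come from the slides.

```python
def partition(rows, h, sites):
    """Split (name, number) rows into `sites` fragments using h(number)."""
    parts = [[] for _ in range(sites)]
    for row in rows:
        parts[h(row[1])].append(row)
    return parts

def join(left, right):
    """Naive equi-join on number; returns (left name, right name, number)."""
    return {(l[0], r[0], l[1]) for l in left for r in right if l[1] == r[1]}

# Partitioning function from the example: number mod 2.
h = lambda number: number % 2

# Hypothetical sample data.
emp = [("ann", 1), ("bob", 2), ("cy", 3)]
cust = [("dee", 2), ("eve", 3), ("fay", 4)]

emp_parts = partition(emp, h, 2)
cust_parts = partition(cust, h, 2)

# Each site joins only its own fragments; no rows move between sites.
local = set()
for emp_i, cust_i in zip(emp_parts, cust_parts):
    local |= join(emp_i, cust_i)

# Reference result computed on the unpartitioned tables.
full = join(emp, cust)
```

Here `local` and `full` hold the same rows, which is the identity the example states: matching rows always have the same number, so they always land on the same site.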
The new approach
We want to choose the partitioning attributes in a query tree to minimize the sum total of communication and computation cost. By regarding partitioning attributes as colors we model the problem as a query tree coloring.
Some definitions
The color of a node in a query tree is the attribute used for partitioning the node.
An edge between nodes i and j is multicolored if and only if i and j have different colors.
The weight c_e of an edge e represents the cost of repartitioning.
Query tree Coloring problem
Given a query tree T = (V, E), the weights of the edges, and colors for some subset of the nodes, color the remaining nodes so as to minimize the total weight of multicolored edges.
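To make the objective concrete, the following sketch evaluates the total weight of multicolored edges and finds a minimum by exhaustive search on a tiny tree. It is a brute-force illustration only, not the algorithm from the paper; the function names, node labels, colors, and weights are all hypothetical. It also assumes that only colors already present in the precoloring need to be tried.

```python
from itertools import product

def cut_weight(edges, color):
    """Total weight of multicolored edges under a full coloring."""
    return sum(w for (i, j, w) in edges if color[i] != color[j])

def best_coloring(nodes, edges, fixed):
    """Exhaustively color the uncolored nodes; exponential, for tiny trees only."""
    # Assumption: trying only the precolored colors is enough.
    palette = sorted(set(fixed.values()))
    free = [n for n in nodes if n not in fixed]
    best = None
    for choice in product(palette, repeat=len(free)):
        color = dict(fixed)
        color.update(zip(free, choice))
        cost = cut_weight(edges, color)
        if best is None or cost < best[0]:
            best = (cost, color)
    return best

# Hypothetical query tree: r and m are uncolored, the leaves are precolored.
nodes = ["r", "a", "b", "m", "c", "d"]
edges = [("r", "a", 5), ("r", "b", 3), ("r", "m", 4), ("m", "c", 2), ("m", "d", 1)]
fixed = {"a": "A", "b": "B", "c": "B", "d": "A"}

cost, color = best_coloring(nodes, edges, fixed)
```

For this tree the search colors both r and m with A, leaving only the edges to b and c multicolored, for a total weight of 5.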
Problem Simplification
(Split) A colored interior node of degree d may be split into d nodes of the same color, with each incident edge connected to a distinct copy.
(Collapse) An uncolored leaf node may be collapsed into its parent. This gives it the same color as its parent.
Lemma
Suppose m is a mother with edges e1, e2, …, ed to leaf children u1, u2, …, ud. Assume that we have numbered the children in order of non-decreasing edge weight, i.e. c_e1 ≤ c_e2 ≤ … ≤ c_ed.
Then there is a minimal coloring that cuts e1, e2, …, e(d-1).
Combining computation and communication costs
We can develop a new model by extending the definition of color to be a triple <p, s, i>, where
p is the partitioning attribute,
s is the sort attribute,
i is the indexing attribute.
The cost of a node
The cost of a node consists of
the cost of recoloring the outputs of its children to have the colors of its inputs,
the cost of executing the strategy itself.
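The two cost components can be combined in a short sketch. It is a simplification under stated assumptions: a child is charged a flat recoloring cost whenever its output color differs from the required input color in any component, and the names `Color` and `node_cost` and all numbers are hypothetical.

```python
from typing import NamedTuple, Optional

class Color(NamedTuple):
    """Color as a triple <p, s, i>; None means the property is absent."""
    partition: Optional[str]
    sort: Optional[str]
    index: Optional[str]

def node_cost(child_outputs, required_inputs, recolor_costs, exec_cost):
    """Execution cost of the strategy plus recoloring cost for mismatched children."""
    total = exec_cost
    for have, need, cost in zip(child_outputs, required_inputs, recolor_costs):
        # Assumption: any difference in the triple triggers the full recoloring cost.
        if have != need:
            total += cost
    return total

# Hypothetical binary operator that needs both inputs partitioned on "number".
need = Color("number", None, None)
outputs = [Color("number", None, None), Color("name", None, None)]

total = node_cost(outputs, [need, need], recolor_costs=[10, 7], exec_cost=20)
```

Only the second child must be recolored here, so `total` is the execution cost of 20 plus the recoloring cost of 7.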
Strategy
A strategy specifies a particular algorithm for computing an operator. It requires the inputs to satisfy some constraints and guarantees some properties for its output.