Query processing

5/2/2011

Query Processing and Optimization

Introduction

• Users are expected to write "efficient" queries, but they do not always do so.
– Users typically do not have enough information about the database to write efficient queries (e.g., no information on table sizes).
– Users cannot tell whether a query is efficient without knowing how the DBMS's query processor works.
• The DBMS's job is to optimize the user's query by:
– Converting the query to an internal representation (tree or graph)
– Evaluating the costs of several possible ways of executing the query and finding the best one.

Steps in Query Processing

[Figure: SQL query → Query Parsing → Parse Tree → Query Optimization → Execution Plan → Code Generation → Code → Runtime DB Processor → Result. Example annotations: a parse tree joining Employee and Project, and an execution plan reading "Join Employee and Project using hash join, ...".]

Query Processing

[Figure: Query in a high-level language → Scanning, Parsing, & Validating → Intermediate form of query → Query Optimizer → Execution Plan → Query Code Generator → Code to execute the query → Runtime DB Processor → Result of query.]

Basic Steps in Query Processing

1. Parsing and translation
2. Optimization
3. Evaluation

Basic Steps in Query Processing

• Parsing and translation
– Translate the query into its internal form, which is then translated into relational algebra.
– The parser checks syntax and verifies relations.
• Evaluation
– The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.


Query Processing

• Consider the query:

select balance

from account

where balance<2500

• Can be translated into either of the following RA expressions:

σ_{balance<2500}(π_{balance}(account))

π_{balance}(σ_{balance<2500}(account))

• The RA expressions are equivalent

Query Processing

• Each relational algebra operation can be evaluated using one of several different algorithms
– Correspondingly, a relational-algebra expression can be evaluated in many ways.
• An annotated expression specifying a detailed evaluation strategy is called an evaluation plan
– E.g., can use an index on balance to find accounts with balance < 2500,
– or can perform a complete relation scan and discard accounts with balance ≥ 2500

Query Plan

Query Optimization

• Amongst all equivalent evaluation plans, choose the one with the lowest cost.
– Cost is estimated using statistical information from the database catalog
• e.g., number of tuples in each relation, size of tuples, etc.
• First we need to learn:
– How to measure query costs
– Algorithms for evaluating relational algebra operations
– How to combine algorithms for individual operations in order to evaluate a complete expression
– How to optimize queries, that is, how to find an evaluation plan with the lowest estimated cost

Measures of Query Cost

• Cost is generally measured as the total elapsed time for answering a query
– Many factors contribute to time cost
• disk accesses, CPU, or even network communication
• Typically disk access is the predominant cost, and it is also relatively easy to estimate. It is measured by taking into account

– Number of seeks * average-seek-cost

+ Number of blocks read * average-block-read-cost

+ Number of blocks written * average-block-write-cost

• Cost to write a block is greater than cost to read a block

– data is read back after being written to ensure that the write was successful

– Assumption: single disk

• Can modify formulae for multiple disks/RAID arrays

• Or just use single-disk formulae, but interpret them as measuring resource consumption instead of time

Measures of Query Cost (Cont.)

• For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures
– tT: time to transfer one block
– tS: time for one seek
– Cost for b block transfers plus S seeks: b * tT + S * tS
• We ignore CPU costs for simplicity
– Real systems do take CPU cost into account
• We do not include the cost of writing output to disk in our cost formulae

• Several algorithms can reduce disk I/O by using extra buffer space

– Amount of real memory available to buffer depends on other concurrent queries and OS processes, known only during execution

• We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is available

• Required data may be buffer resident already, avoiding disk I/O

– But hard to take into account for cost estimation
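As a rough illustration of the cost measure b * tT + S * tS, the sketch below compares reading 100 blocks sequentially (one seek) with reading the same 100 blocks from scattered locations (one seek per block). The function name and the timing values (0.1 ms per block transfer and 4 ms per seek, written in microseconds) are assumptions chosen for the example, not figures from these slides.

```python
def io_cost_us(block_transfers, seeks, t_transfer_us, t_seek_us):
    """Estimated I/O cost b * tT + S * tS, in microseconds."""
    return block_transfers * t_transfer_us + seeks * t_seek_us

# Hypothetical device timings (assumed for illustration only).
T_TRANSFER_US = 100   # tT: time to transfer one block
T_SEEK_US = 4000      # tS: time for one seek

sequential = io_cost_us(100, 1, T_TRANSFER_US, T_SEEK_US)    # 100 blocks, 1 seek
scattered = io_cost_us(100, 100, T_TRANSFER_US, T_SEEK_US)   # 100 blocks, 100 seeks
print(sequential, scattered)
```

Under these assumed timings the seek term dominates as soon as the blocks are not contiguous, which is why the cost formulas on the following slides count seeks separately from transfers.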


Statistics and Catalogs

• For each Table

– Table name, file name (or some identifier) & file structure (e.g., heap file)

– Attribute name and type of each attribute

– Index name of each index

– Integrity constraints

• For each Index

– Index name & the structure (e.g., B+ tree)

– Search key attributes

• For each View

– View name & definition


• Cardinality: NTuples(R) for each R

• Size: NPages(R) for each R

• Index Cardinality: Number of distinct key values NKeys(I) for each I

• Index Size: INPages(I) for each index I

• For B+ tree index, INPages is number of leaf pages

• Index Height: Number of non-leaf levels IHeight(I) for each tree index

• Index Range: ILow(I) & IHigh(I)


• Catalogs updated periodically

– Updating whenever data changes is too expensive

• More detailed information (e.g., histograms of the values in some field) is sometimes stored.

Operator Evaluation

Algorithms for evaluating relational operators use some simple ideas extensively:

– Indexing: If a selection or join condition is specified, use an index to examine just the tuples that satisfy the condition.

– Iteration: Sometimes, faster to scan all tuples even if there is an index. (And sometimes, we can scan the data entries in an index instead of the table itself.)

– Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.

Access Paths

• An access path is a method of retrieving tuples:
• File scan, or index that matches a selection (in the query)
• A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
• E.g., a tree index on <a, b, c> matches the selections a=5 AND b=3 and a=5 AND b>6, but not b=3.
• A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index.
• E.g., a hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
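The matching rules above can be sketched as two small predicates. This is a simplified model, not how any particular DBMS implements it: a selection is assumed to be a list of hypothetical (attribute, operator) pairs joined by AND, and the function names are made up for the example.

```python
def tree_index_matches(search_key, terms):
    """True if the attributes used in `terms` form a non-empty prefix of the search key."""
    attrs = {attr for attr, _op in terms}
    return bool(attrs) and attrs == set(search_key[:len(attrs)])

def hash_index_matches(search_key, terms):
    """True if every attribute of the search key has an equality term."""
    eq_attrs = {attr for attr, op in terms if op == "="}
    return set(search_key) <= eq_attrs

key = ("a", "b", "c")
print(tree_index_matches(key, [("a", "="), ("b", ">")]))               # a=5 AND b>6
print(tree_index_matches(key, [("b", "=")]))                           # b=3
print(hash_index_matches(key, [("a", "="), ("b", "="), ("c", "=")]))   # a=5 AND b=3 AND c=5
print(hash_index_matches(key, [("a", ">"), ("b", "="), ("c", "=")]))   # a>5 AND b=3 AND c=5
```

The constants in the conditions are irrelevant for matching, so only attribute names and operators are modeled.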

Access Paths

• Selectivity: Number of pages retrieved (Index + data) to retrieve all desired tuples

• Using the most selective access path minimizes the cost of data retrieval

• Reduction factor:
• Each conjunct is a filter
• The fraction of tuples satisfying a given conjunct is called the reduction factor


Query Optimization

• Techniques used by a DBMS to process, optimize, and execute high-level queries

• A high-level query is
– Scanned
– Parsed
– Validated
• Internal representation
– Query tree
– Query graph
• Many execution strategies
• Choosing a suitable one for processing a query is QUERY OPTIMIZATION
• Ideally: want to find the best plan
• Practically: avoid the worst plans!

Query Optimization

• Scanning
– The scanner identifies the language tokens, such as SQL keywords, attribute names, & relation names
• Parsing
– The parser checks the query syntax to determine whether it is formulated according to the grammar rules of the query language
• Validating
– Checking that all the attribute & relation names are valid and semantically meaningful names in the schema of the particular DB being queried

SQL Queries to Relational Algebra

• SQL queries are optimized by decomposing them into a collection of smaller units, called blocks

• Query optimizer concentrates on optimizing a single block at a time

Translating SQL Queries into Relational Algebra

• Query block: the basic unit that can be translated into the algebraic operators and optimized.

• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block.

• Nested queries within a query are identified as separate query blocks.

• Aggregate operators in SQL must be included in the extended algebra.


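Relational Algebra

SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
                 FROM EMPLOYEE
                 WHERE DNO = 5);

The query decomposes into two blocks.

Inner block:

SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5

Outer block (C stands for the result of the inner block):

SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C

Inner block in RA: ℱ_{MAX SALARY}(σ_{DNO=5}(EMPLOYEE))

Outer block in RA: π_{LNAME, FNAME}(σ_{SALARY>C}(EMPLOYEE))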

Select Operation

• File scan: scan all records of the file to find records that satisfy the selection condition
• Binary search: applicable when the file is sorted on the attributes specified in the selection condition
• Index scan: use an index to locate the qualifying records
– Primary index, single-record retrieval: equality comparison on a primary key attribute with a primary index
– Primary index, multiple-record retrieval: comparison condition (<, >, etc.) on a key field with a primary index
– Clustering index to retrieve multiple records
– Secondary index to retrieve single or multiple records


Conjunctive Conditions

OP1 AND OP2 (e.g., EmpNo=123 AND Age=30)

Conjunctive selection: Evaluate the condition that has an index (i.e., that can be evaluated very fast), get the qualifying tuples, and then check whether these tuples satisfy the remaining conditions.

Conjunctive selection using a composite index: if there is a composite index on the attributes involved in one or more conditions, use the composite index to find the qualifying tuples.

[Figure: a composite index on (EmpNo, Age) with entries (012, 25) and (123, 30) pointing to the complete Employee records.]

Conjunctive selection by intersection of record pointers: if secondary indexes are available, evaluate each condition and intersect the sets of record pointers obtained.

Conjunctive Conditions

When more than one attribute has an index:
– use the one that costs least, and
– the one that returns the smallest number of qualifying tuples

Disjunctive select conditions (OP1 OR OP2) are much more costly:
– potentially a large number of tuples will qualify
– costly if any one of the conditions doesn't have an index

The selectivity of a condition is the number of tuples that satisfy the condition divided by the total number of tuples. The smaller the selectivity, the fewer the tuples retrieved, and the more desirable it is to use that condition to retrieve the records.

Join Operation

• Join is one of the most time-consuming operations in query processing.
• A two-way join is a join of two relations, and there are many algorithms to evaluate the join.
• A multi-way join is a join of more than two relations; different orders of evaluating a multi-way join have different speeds.
• We shall study methods for implementing two-way joins of the form R ⋈_{A=B} S

Join Algorithm: Nested (inner-outer) Loop

R ⋈_{A=B} S

Nested (inner-outer) loop: For each record r in R (outer loop), retrieve every record s from S (inner loop) and check if r[A] = s[B].

for each tuple r in R do
    for each tuple s in S do
        if r[A] = s[B] then output result
    end
end

[Figure: R with join values 0005, 0002, 0004, 0002, 0002, 0001, 0005 and S with join values 0005, 0002, 0002, 0003, 0002, 0005.]

• m tuples in R and n tuples in S give m*n checks
• R and S can be reversed
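A minimal Python sketch of the nested-loop join, using the join-attribute values shown on the slide for R and S. Each relation is reduced to a list of join values for brevity; the function name and the comparison counter are additions for illustration.

```python
def nested_loop_join(r, s):
    """Return the matching (r_value, s_value) pairs and the number of checks made."""
    result, checks = [], 0
    for a in r:          # outer loop over R
        for b in s:      # inner loop over S
            checks += 1
            if a == b:
                result.append((a, b))
    return result, checks

R = ["0005", "0002", "0004", "0002", "0002", "0001", "0005"]
S = ["0005", "0002", "0002", "0003", "0002", "0005"]
pairs, checks = nested_loop_join(R, S)
print(len(pairs), checks)
```

With 7 tuples in R and 6 in S the loop always performs 7 * 6 = 42 checks, regardless of how many pairs actually match.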

When One Join Attribute is Indexed

If an index (or hash key) exists, say, on attribute B of S, should we put R in the outer loop or S? Why?

• Records in the outer relation are accessed sequentially, so an index on the outer relation doesn't help.
• Records in the inner relation are accessed randomly, so an index can retrieve all records in the inner relation that satisfy the join condition.

[Figure: R (0005, 0002, 0004, 0002, 0002, 0001, 0005) as the outer relation, probing an index on S (0005, 0002, 0002, 0003, 0002, 0005).]

Sort-Merge Join

R ⋈_{A=B} S

Sort-merge join: if the records of R and S are sorted on the join attributes A and B, respectively, then the relations are scanned in, say, ascending order, matching the records that have the same values for A and B.

[Figure: sorted R (0001, 0002, 0002, 0002, 0004, 0005, 0005) merged with sorted S (0002, 0002, 0002, 0003, 0005, 0005).]

• R and S are only scanned once.
• Even if the relations are not sorted, it is better to sort them first and do a sort-merge join than to do a double-loop join.
• Cost if R and S are sorted: n + m
• Cost if not sorted: n log(n) + m log(m) + m + n
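The merge phase can be sketched as follows. This is one possible in-memory formulation, assuming each relation is just a list of join values; groups of equal values are located on both sides and paired up, which is how duplicates such as the three 0002 entries are handled here.

```python
def sort_merge_join(r, s):
    """Join two lists of join values by sorting them and merging in one pass."""
    r, s = sorted(r), sorted(s)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            i += 1
        elif r[i] > s[j]:
            j += 1
        else:
            value = r[i]
            i_end, j_end = i, j
            while i_end < len(r) and r[i_end] == value:   # run of equal values in R
                i_end += 1
            while j_end < len(s) and s[j_end] == value:   # run of equal values in S
                j_end += 1
            out.extend((a, b) for a in r[i:i_end] for b in s[j:j_end])
            i, j = i_end, j_end
    return out

R = ["0005", "0002", "0004", "0002", "0002", "0001", "0005"]
S = ["0005", "0002", "0002", "0003", "0002", "0005"]
merged = sort_merge_join(R, S)
print(len(merged))
```

Neither cursor ever moves backwards, so each sorted input is read once; the only repeated work is pairing tuples inside a group of equal join values.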


Hash Join Method

Hash join: R and S are both hashed to the same hash file based on the join attributes. Tuples in the same bucket are then "joined".

[Figure: R (0001, 0002, 0002, 0002, 0004, 0005, 0005) and S (0002, 0002, 0002, 0003, 0005, 0005) hashed into shared buckets, so that equal join values from both relations land in the same bucket.]
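A simplified in-memory sketch of the idea: a Python dictionary stands in for the hash file, R is hashed into it first, and each S value then probes its bucket. A disk-based hash join would partition both relations to files instead; the names below are illustrative only.

```python
from collections import defaultdict

def hash_join(r, s):
    """Build buckets from r, then probe them with each value of s."""
    buckets = defaultdict(list)
    for a in r:                      # build phase: hash R on the join attribute
        buckets[a].append(a)
    result = []
    for b in s:                      # probe phase: look only in the matching bucket
        for a in buckets.get(b, []):
            result.append((a, b))
    return result

R = ["0001", "0002", "0002", "0002", "0004", "0005", "0005"]
S = ["0002", "0002", "0002", "0003", "0005", "0005"]
joined = hash_join(R, S)
print(len(joined))
```

Each S value is compared only against R values in its own bucket, so values such as 0003 never meet any R tuple at all.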

Hints on Evaluating Joins

• Disk accesses are based on blocks, not individual tuples
• Main memory buffers can significantly reduce the number of disk accesses
– Use the smaller relation in the outer loop in the nested-loop method
– Consider the cases where 1 buffer, 2 buffers, or m buffers are available
• When an index is available, either the smaller relation or the one with a large number of matching tuples should be used in the outer loop.
• If the join attributes are not indexed, it may be faster to create the indexes on the fly (hash join is close to generating a hash index on the fly)
• Sort-merge is often the most efficient choice, since the relations are frequently sorted already
• Hash join is efficient if the hash file can be kept in main memory



Selection Operation

• File scan
• Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.
– Cost estimate = b_r block transfers + 1 seek: b_r * tT + tS
• b_r denotes the number of blocks containing records from relation r
– If the selection is on a key attribute, can stop on finding the record
• cost = (b_r / 2) block transfers + 1 seek: (b_r / 2) * tT + tS
– Linear search can be applied regardless of
• selection condition, or
• ordering of records in the file, or
• availability of indices
• Note: binary search generally does not make sense since data is not stored consecutively
– except when there is an index available,
– and binary search requires more seeks than index search

Selections Using Indices

• Index scan: search algorithms that use an index
– selection condition must be on the search key of the index.
• A2 (primary index, equality on key). Retrieve a single record that satisfies the corresponding equality condition
– Cost = (h_i + 1) * (tT + tS), where h_i is the height of the index
• A3 (primary index, equality on nonkey). Retrieve multiple records.
– Records will be on consecutive blocks
• Let b = number of blocks containing matching records
– Cost = h_i * (tT + tS) + tS + tT * b


Selections Using Indices

• A4 (secondary index, equality on nonkey).

– Retrieve a single record if the search-key is a candidate key

• Cost = (hi + 1) * (tT + tS)

– Retrieve multiple records if search-key is not a candidate key

• each of n matching records may be on a different block

• Cost = (hi + n) * (tT + tS)

– Can be very expensive!
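To see why the secondary-index case can be so expensive, the cost formulas for A1 through A4 can be evaluated side by side. The sketch below is only an illustration: the function names are invented, h_i is taken to be the index height, and the timings and sizes (tT = 100 µs, tS = 4000 µs, a 400-block relation, a 3-level index, 50 matching records) are assumed values.

```python
def cost_a1_linear(b_r, t_t, t_s):
    """A1: scan all b_r blocks after one seek."""
    return b_r * t_t + t_s

def cost_a2_primary_key(h_i, t_t, t_s):
    """A2: traverse the index, then fetch one block."""
    return (h_i + 1) * (t_t + t_s)

def cost_a3_primary_nonkey(h_i, b, t_t, t_s):
    """A3: traverse the index, then read b consecutive blocks."""
    return h_i * (t_t + t_s) + t_s + t_t * b

def cost_a4_secondary_nonkey(h_i, n, t_t, t_s):
    """A4: traverse the index, then one seek and one transfer per matching record."""
    return (h_i + n) * (t_t + t_s)

T_T, T_S = 100, 4000   # assumed timings in microseconds
linear = cost_a1_linear(400, T_T, T_S)
a2 = cost_a2_primary_key(3, T_T, T_S)
a3 = cost_a3_primary_nonkey(3, 2, T_T, T_S)
a4 = cost_a4_secondary_nonkey(3, 50, T_T, T_S)
print(linear, a2, a3, a4)
```

With these assumed numbers the secondary-index plan costs roughly five times as much as the linear scan, because every one of the 50 matching records may need its own seek.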

Selections Involving Comparisons

• Can implement selections of the form σ_{A≤V}(r) or σ_{A≥V}(r) by using
– a linear file scan,
– or by using indices in the following ways:
• A5 (primary index, comparison). (Relation is sorted on A)
• For σ_{A≥V}(r), use the index to find the first tuple ≥ v and scan the relation sequentially from there
• For σ_{A≤V}(r), just scan the relation sequentially till the first tuple > v; do not use the index
• A6 (secondary index, comparison).
• For σ_{A≥V}(r), use the index to find the first index entry ≥ v and scan the index sequentially from there, to find pointers to records.
• For σ_{A≤V}(r), just scan the leaf pages of the index finding pointers to records, till the first entry > v
• In either case, retrieve the records that are pointed to
– requires an I/O for each record
– Linear file scan may be cheaper

Implementation of Complex Selections

• Conjunction: σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(r)
• A7 (conjunctive selection using one index).
– Select a combination of θi and algorithms A1 through A7 that results in the least cost for σ_{θi}(r).
– Test other conditions on tuple after fetching it into memory buffer.
• A8 (conjunctive selection using composite index).
– Use appropriate composite (multiple-key) index if available.
• A9 (conjunctive selection by intersection of identifiers).
– Requires indices with record pointers.
– Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers.
– Then fetch records from file
– If some conditions do not have appropriate indices, apply test in memory.

Algorithms for Complex Selections

• Disjunction: σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(r).
• A10 (disjunctive selection by union of identifiers).
– Applicable if all conditions have available indices.
• Otherwise use linear scan.
– Use corresponding index for each condition, and take union of all the obtained sets of record pointers.
– Then fetch records from file
• Negation: σ_{¬θ}(r)
– Use linear scan on file
– If very few records satisfy ¬θ, and an index is applicable to θ
• Find satisfying records using index and fetch from file
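Selection by intersection (A9) or union (A10) of record pointers can be sketched with Python sets. Everything here is a toy model: record ids stand in for record pointers, each index is a dictionary from value to a set of ids, and the third employee record is made up so that the two results differ.

```python
def build_index(records, attr):
    """Secondary index: attribute value -> set of record ids (record pointers)."""
    index = {}
    for rid, rec in records.items():
        index.setdefault(rec[attr], set()).add(rid)
    return index

def select_by_pointers(records, indexes, conditions, combine):
    """Look up each equality condition in its index, combine the id sets, then fetch."""
    rid_sets = [indexes[attr].get(value, set()) for attr, value in conditions]
    rids = combine(*rid_sets)
    return [records[rid] for rid in sorted(rids)]

employees = {
    0: {"EmpNo": "012", "Age": 25},
    1: {"EmpNo": "123", "Age": 30},
    2: {"EmpNo": "456", "Age": 30},   # hypothetical extra record
}
indexes = {attr: build_index(employees, attr) for attr in ("EmpNo", "Age")}
conditions = [("EmpNo", "123"), ("Age", 30)]

both = select_by_pointers(employees, indexes, conditions, set.intersection)   # AND
either = select_by_pointers(employees, indexes, conditions, set.union)        # OR
print(both, either)
```

Records are fetched only after the pointer sets have been combined, so the conjunctive case touches one record here while the disjunctive case touches two.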

Sorting

• We may build an index on the relation, and then use the index to

read the relation in sorted order. May lead to one disk block access

for each tuple.

• For relations that fit in memory, techniques like quicksort can be

used. For relations that don’t fit in memory, external

sort-merge is a good choice.

External Sort-Merge

Let M denote memory size (in pages).

1. Create sorted runs. Let i be 0 initially.
   Repeatedly do the following till the end of the relation:
   (a) Read M blocks of relation into memory
   (b) Sort the in-memory blocks
   (c) Write sorted data to run R_i; increment i.
   Let the final value of i be N
2. Merge the runs (next slide).


External Sort-Merge (Cont.)

2. Merge the runs (N-way merge). We assume (for now) that N < M.
   1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page
   2. repeat
      1. Select the first record (in sort order) among all buffer pages
      2. Write the record to the output buffer. If the output buffer is full, write it to disk.
      3. Delete the record from its input buffer page. If the buffer page becomes empty, then read the next block (if any) of the run into the buffer.
   3. until all input buffer pages are empty

External Sort-Merge (Cont.)

• If N ≥ M, several merge passes are required.
– In each pass, contiguous groups of M - 1 runs are merged.
– A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor.
• E.g., if M = 11 and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs
– Repeated passes are performed till all runs have been merged into one.
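The two phases can be sketched in a few lines of Python. This is an in-memory simulation under simplifying assumptions: one record stands in for one block, Python lists stand in for runs on disk, and `heapq.merge` plays the role of the (M - 1)-way merge. The function name and the returned list of run counts are additions for illustration.

```python
import heapq
import random

def external_sort(records, m):
    """Sort with runs of at most m records and (m - 1)-way merge passes.

    Returns the sorted list and the number of runs after each phase.
    """
    if m < 3:
        raise ValueError("need at least 3 buffer pages to merge 2 runs at a time")
    # Phase 1: create sorted runs of m records each.
    runs = [sorted(records[i:i + m]) for i in range(0, len(records), m)]
    run_counts = [len(runs)]
    # Phase 2: merge contiguous groups of m - 1 runs until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + m - 1]))
                for i in range(0, len(runs), m - 1)]
        run_counts.append(len(runs))
    return (runs[0] if runs else []), run_counts

data = list(range(990))
random.Random(0).shuffle(data)
sorted_data, run_counts = external_sort(data, 11)
print(run_counts)
```

With M = 11 and 990 records the sketch starts from 90 runs, matching the example above: the first merge pass leaves 9 runs and the second leaves one.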

Example: External Sorting Using Sort-Merge


Query Optimization

• The query optimizer would now choose an execution plan for each block
• Note that the inner block needs to be evaluated only once to produce the maximum salary
• This is an uncorrelated nested query
• It is much harder to optimize a correlated nested query, where a tuple variable from the outer block appears in the WHERE clause of the inner block

Select S.sname
From Sailors S
Where exists (select *
              from Reserves R
              where R.bid=103
              AND R.sid=S.sid)

A Word about *

• All we want to do is check that a qualifying row exists; we do not really want to retrieve any columns from the row

Select S.sname
From Sailors S
Where exists (select *
              from Reserves R
              where R.bid=103
              AND R.sid=S.sid)

Select count (*)
From Sailors S

Select count (distinct S.sname)
From Sailors S

If COUNT does not include DISTINCT (and sname is never null), the above two queries give the same result

COUNT (*) is a better querying style since it is immediately clear that all records contribute to the total count

Query Optimization

• Given a relational algebra expression, how do we transform it into a more efficient one?
• Use the query tree as a tool to rearrange the operations of the relational algebra expression

Query Optimization

• RDBMS query optimizers are very complex pieces of software
• Typically represent 40-50 man-years of development effort!

Query Optimization

• SQL queries are translated into relational algebra & then optimized
• Two main techniques for optimization
• Heuristic based
» Ordering the operations in a query execution strategy
» Works for most cases but not guaranteed for all possible cases
• Cost based
» Systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost estimate
• Both combined in a typical query optimizer

Query Optimization

• Query is essentially treated as a σ-π-⋈ algebra expression
• Remaining operations are carried out on the result of the σ-π-⋈ expression
• Optimizing an RA expression involves:
• Enumerating alternative plans for evaluating the expression (not all of them)
• Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost


Query Evaluation Plans

• A QEP consists of an extended RA tree

• Additional annotations at each node indicating the access method to use for each table and the implementation method to use for each relational operator

Structure and Execution of a Query Tree

• A query tree is a tree structure that

corresponds to a relational algebra expression

by representing the input relations as leaf

nodes and the relational algebra operations as

internal nodes of the tree

• An execution of the query tree consists of

executing an internal node operation whenever

its operands are available and then replacing

that internal node by the relation that results

from executing the operation

Query Optimization: Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

The schema:
Sailors (sid, sname, rating, age) 50 bytes
Reserves (sid, bid, day, rname) 40 bytes

RA expression: π_{sname}(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors))

RA tree (root first):
π_{sname}
  σ_{bid=100 ∧ rating>5}
    ⋈_{sid=sid}
      Reserves
      Sailors

Plan: the same tree annotated with π_{sname} (on-the-fly), σ_{bid=100 ∧ rating>5} (on-the-fly), and ⋈_{sid=sid} (simple nested loops).

Interpreting the TREE

Tree partially specifies how to evaluate the query

• First compute join between Reserves & Sailors

• Then the selections

• Finally the projection

RA tree: π_{sname} over σ_{bid=100 ∧ rating>5} over (Reserves ⋈_{sid=sid} Sailors)

Interpreting the TREE

Decide on the implementation of each operation involved

• Page-oriented simple nested loops join between Reserves & Sailors with Reserves as the outer table
• Apply selections & projections to each tuple in the result of the join as it is produced
• Result of the join before the selections and projections is never stored in its entirety
• Convention: outer table is the left child of the operator

Plan: π_{sname} (on-the-fly) over σ_{bid=100 ∧ rating>5} (on-the-fly) over Reserves ⋈_{sid=sid} Sailors, with a file scan of each base table.

Heuristics for Optimizing a Query

• A query may have several equivalent query trees
• A query parser generates a standard canonical query tree from a SQL query
– Cartesian products are first applied (FROM)
– then the conditions (WHERE)
– and finally projection (SELECT)


select ProjNo, DeptNo, EmpName, Address, Birthdate
from Project, Department, Employee
where ProjLocation='Stafford' and
      MgrNo=EmpNo and
      Department.DeptNo=Employee.DeptNo

[Canonical query tree: π_{ProjNo, DeptNo, EmpName, Address, Birthdate} over σ_{ProjLocation='Stafford' AND MgrNo=EmpNo AND DeptNo=DeptNo} over the Cartesian product of Project, Department, and Employee.]

The query optimizer transforms this canonical query into an efficient final query

Find the names of employees born after 1957 who work on a project named 'Aquarius'

select EmpName
from Employee, WorksOn, Project
where ProjName='Aquarius' AND
      Project.ProjNo=WorksOn.ProjNo AND
      Employee.EmpNo = WorksOn.EmpNo AND
      Birthdate > 'DEC-31-1957'

WorksOn (EmpNo, ProjNo, Hours)

[Canonical query tree: π_{EmpName} over σ_{ProjName='Aquarius' AND Project.ProjNo=WorksOn.ProjNo AND Employee.EmpNo=WorksOn.EmpNo AND Birthdate>'DEC-31-1957'} over the Cartesian product of Employee, WorksOn, and Project.]

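Example

Push all the conditions as far down the tree as possible

[Query tree: π_{EmpName} at the root; σ_{Birthdate>'DEC-31-1957'} applied directly to Employee and σ_{ProjName='Aquarius'} applied directly to Project; the join conditions EmpNo=EmpNo and ProjNo=ProjNo applied as selections above the corresponding Cartesian products.]

Expensive due to large size of Employee

Example

Rearrange join sequence according to estimates of relation sizes

[Query tree: π_{EmpName} at the root; σ_{ProjName='Aquarius'}(Project) is now combined with WorksOn first (condition ProjNo=ProjNo), and the result is then combined with Employee (condition EmpNo=EmpNo).]

• Only need ProjNo attribute from Project and WorksOn
• Only need EmpNo attribute from Employee and WorksOn, and EmpName from Employee

Example

Replace cross products and selection sequence with a join operation

[Query tree: π_{EmpName} over a join on EmpNo=EmpNo whose inputs are Employee and the join of Project with WorksOn on ProjNo=ProjNo.]

Example

Push projection as far down the query tree as possible

[Query tree: π_{EmpName} at the root over the join on EmpNo=EmpNo; below it, π_{EmpNo, EmpName} is applied on the Employee side, π_{EmpNo} to the result of the join on ProjNo=ProjNo, π_{ProjNo} on the Project side, and π_{EmpNo, ProjNo} to WorksOn.]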


Transformation Rules

1. Cascade of σ: A conjunctive selection condition can be broken up into a cascade (sequence) of individual σ operations:
• σ_{c1 AND c2 AND ... AND cn}(R) ≡ σ_{c1}(σ_{c2}(...(σ_{cn}(R))...))
2. Commutativity of σ:
• σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))
3. Cascade of π:
• π_{List1}(π_{List2}(...(π_{Listn}(R))...)) ≡ π_{List1}(R)
if List1 is included in List2…Listn; result is null if List1 is not in any of List2…Listn
4. Commuting σ with π: if the selection condition c involves only attributes that are in the projection list List1
• π_{List1}(σ_{c}(R)) ≡ σ_{c}(π_{List1}(R))
5. Commutativity of JOIN or ×: R ⋈ S ≡ S ⋈ R
6. Commuting σ with JOIN: if all the attributes in the selection condition c involve only the attributes of one of the relations being joined, say, R
• σ_{c}(R ⋈ S) ≡ (σ_{c}(R)) ⋈ S
7. Commuting π with JOIN: if List can be separated into List1 and List2 involving only attributes from R and S, respectively, and the join condition c involves only attributes in List:
• π_{List}(R ⋈_{c} S) ≡ (π_{List1}(R)) ⋈_{c} (π_{List2}(S))
8. Commuting set operations: ∪ and ∩ are commutative
9. JOIN, ×, ∪, and ∩ are associative
10. σ distributes over ∪, ∩, and −
• σ_{c}(R θ S) ≡ σ_{c}(R) θ σ_{c}(S), where θ is one of ∪, ∩, −
11. π distributes over ∪
• π_{List}(R ∪ S) ≡ (π_{List}(R)) ∪ (π_{List}(S))
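Rule 6 (pushing a selection below a join) can be checked on a toy instance. The sketch below uses lists of dictionaries as relations and hypothetical `select` and `join` helpers; the Sailors and Reserves rows are invented sample data, not taken from the slides.

```python
def select(pred, rel):
    """σ: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def join(r, s, key):
    """Equi-join on a shared attribute name, r as the outer relation."""
    return [{**a, **b} for a in r for b in s if a[key] == b[key]]

# Invented sample data.
sailors = [{"sid": 1, "rating": 7}, {"sid": 2, "rating": 3}, {"sid": 3, "rating": 9}]
reserves = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 100},
            {"sid": 3, "bid": 101}, {"sid": 1, "bid": 102}]

def high_rating(t):
    return t["rating"] > 5

full_join = join(reserves, sailors, "sid")
select_after_join = select(high_rating, full_join)                       # σ_c(R ⋈ S)
select_before_join = join(reserves, select(high_rating, sailors), "sid")  # R ⋈ σ_c(S)
print(len(full_join), len(select_after_join), select_after_join == select_before_join)
```

Both orders produce the same three tuples, but the second never materializes the fourth join tuple that the selection would discard; on large relations this reduction of intermediate results is the point of the heuristic.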


Heuristic Algebraic Optimization

• Use rule 1 to break up any σ operation with conjunctive conditions into a sequence of σ operations
• Use rules 2, 4, 6, and 10 concerning commutativity of σ with other operations to move each σ operation as far down the query tree as possible based on the attributes in the σ operations
• Use rule 9 concerning associativity of binary operations to rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive σ operations are executed first
• Combine sequences of Cartesian product and σ operations representing a join condition into single JOIN operations
• Use rules 3, 4, 7, and 11 concerning the cascading of π and commuting π with other operations to break down a π and move the projection attributes down the tree as far as possible
• Identify subtrees that represent groups of operations that can be executed by a single algorithm (select/join followed by project)

Pipelined Evaluation

• Motivation
– A query is mapped into a sequence of operations.
– Each execution of an operation produces a temporary result.
– Generating and saving temporary files on disk is time consuming and expensive.
• Alternative:
– Avoid constructing temporary results as much as possible.
– Pipeline the data through multiple operations: pass the result of a previous operator to the next without waiting to complete the previous operation.


Pipelined Evaluation

• The result of one operator is sometimes pipelined to another operator without creating a temporary table to hold the intermediate result

• The output of R ⋈ S is pipelined into the selections & projections that follow

• Cost of writing out the intermediate result & reading it back in can be significant

• Temporary table: Materialized Tuples


• Consider a selection query in which only a part of the selection condition matches an index
• 2 instances of the selection operator
– Matching (primary) part of the selection condition
– Rest
• Pipelining: apply the second selection to each tuple in the result of the primary selection as it is produced, adding tuples that qualify to the final result
• When the input to a unary operator is pipelined into it, we say that the operator is applied on-the-fly


• Result tuples of the first join are pipelined into the join with C
• Conceptually, the evaluation is initiated from the root, & the node joining A & B produces tuples as and when they are requested by its parent node

[Figure: query tree for (A ⋈ B) ⋈ C, with A ⋈ B as the left child of the upper join and C as its right child.]
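Python generators give a compact way to picture demand-driven pipelining: each operator pulls tuples from its child only when its own parent asks for one. The operator names, the scan log, and the sample Sailors rows below are all assumptions made for the illustration.

```python
def scan(table, log):
    """Leaf operator: yield rows one at a time, recording each row that is read."""
    for row in table:
        log.append(row["sid"])
        yield row

def select(pred, rows):
    """Applied on-the-fly: pass through each qualifying row as it arrives."""
    for row in rows:
        if pred(row):
            yield row

def project(attrs, rows):
    """Applied on-the-fly: keep only the requested attributes."""
    for row in rows:
        yield tuple(row[a] for a in attrs)

# Invented sample data.
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 3},
    {"sid": 2, "sname": "Bob", "rating": 8},
    {"sid": 3, "sname": "Cy", "rating": 9},
]
scanned = []
plan = project(("sname",), select(lambda r: r["rating"] > 5, scan(sailors, scanned)))

first = next(plan)          # the root asks for one tuple
scanned_for_first = list(scanned)
rest = list(plan)           # drain the remaining tuples
print(first, scanned_for_first, rest)
```

Only the first two rows are read to produce the first result tuple, and no intermediate table is ever built between the operators.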

Estimation of the Size of Joins

• The Cartesian product r × s contains n_r * n_s tuples; each tuple occupies s_r + s_s bytes.
• If R ∩ S = ∅, then r ⋈ s is the same as r × s.
• If R ∩ S is a key for R, then a tuple of s will join with at most one tuple from r; therefore, the number of tuples in r ⋈ s is no greater than the number of tuples in s.
• If R ∩ S is a foreign key in S referencing R, then the number of tuples in r ⋈ s is exactly the same as the number of tuples in s.
• The case for R ∩ S being a foreign key referencing S is symmetric.

Example of Size Estimation

• In the example query depositor ⋈ customer, customer-name in depositor is a foreign key referencing customer; hence, the result has exactly n_depositor tuples, which is 5000.
• Data: R = Customer, S = Depositor (n = number of tuples, f = tuples per block, b = number of blocks)
n_customer = 10,000
f_customer = 25
b_customer = 10000/25 = 400
n_depositor = 5,000
f_depositor = 50
b_depositor = 5000/50 = 100

Estimation of the size of Joins

• If R ∩ S = {A} is not a key for R or S:
If we assume that every tuple t in R produces tuples in R ⋈ S, the number of tuples in R ⋈ S is estimated to be:
(n_r * n_s) / V(A, s)
where V(A, s) is the number of distinct values of A in s
• If the reverse is true, the estimate obtained will be:
(n_r * n_s) / V(A, r)
• The lower of these two estimates is probably the more accurate one.



• Compute the size estimates for depositor ⋈ customer without using information about foreign keys:
– n_customer = 10,000
– n_depositor = 5,000
– V(customer-name, depositor) = 2500 (there are 5,000 tuples in the depositor relation but only 2,500 distinct depositors, so on average every depositor has two accounts)
– V(customer-name, customer) = 10000 (customer-name is unique)
– The two estimates are 5000 * 10000/2500 = 20,000 and 5000 * 10000/10000 = 5000
– We choose the lower estimate, which, in this case, is the same as our earlier computation using foreign keys.
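The estimate can be reproduced with a few lines of Python. The function name is made up, and integer division is used on the assumption that the inputs divide evenly, as they do in this example.

```python
def estimate_join_size(n_r, n_s, v_a_r, v_a_s):
    """Return both estimates n_r*n_s/V(A,s) and n_r*n_s/V(A,r), plus the lower one."""
    via_s = n_r * n_s // v_a_s
    via_r = n_r * n_s // v_a_r
    return via_s, via_r, min(via_s, via_r)

# r = depositor, s = customer, A = customer-name
via_customer, via_depositor, chosen = estimate_join_size(
    n_r=5000, n_s=10000, v_a_r=2500, v_a_s=10000)
print(via_customer, via_depositor, chosen)
```

The chosen value, 5000, agrees with the foreign-key argument: every depositor tuple matches exactly one customer tuple.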

Nested-Loop Join

• Compute the theta join, r ⋈_θ s

for each tuple t_r in r do begin
    for each tuple t_s in s do begin
        test pair (t_r, t_s) to see if they satisfy the join condition θ
        if they do, add t_r · t_s to the result.
    end
end

• r is called the outer relation and s the inner relation of the join.

• Requires no indices and can be used with any kind of join condition.

• Expensive since it examines every pair of tuples in the two relations.
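The algorithm above translates almost line for line into Python. This is a minimal in-memory sketch: relations are lists of tuples, the join condition is passed as a function, and the sample rows are made up for illustration.

```python
def nested_loop_join(r, s, theta):
    """Return the theta join of r and s; r is the outer and s the inner relation."""
    result = []
    for t_r in r:                 # outer loop: one pass over r
        for t_s in s:             # inner loop: s is rescanned for every tuple of r
            if theta(t_r, t_s):   # any join condition works, no index is needed
                result.append(t_r + t_s)  # concatenate the pair (tr . ts)
    return result


# Hypothetical rows: depositor(customer_name, account), customer(customer_name, city)
depositor = [("Ann", "A-101"), ("Bob", "A-215"), ("Ann", "A-305")]
customer = [("Ann", "Harrison"), ("Bob", "Rye"), ("Cy", "Stamford")]
joined = nested_loop_join(depositor, customer, lambda d, c: d[0] == c[0])
print(joined)
```

Every one of the 3 × 3 pairs is tested, which is why the method is expensive on large relations.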

Cost of Nested-Loop Join

• If there is enough memory to hold only one block of each relation, the estimated cost is $n_r \cdot b_s + b_r$ disk accesses

– $b_r$ disk accesses load R into the buffer ($b_r$ is the number of blocks in r)

– For each tuple in r, S has to be read into the buffer, costing $b_s$ disk accesses ($b_s$ is the number of blocks in s)

• If the smaller relation fits entirely in memory, use it as the inner relation. This reduces the cost estimate to $b_r + b_s$ disk accesses.

– $b_r + b_s$ is the minimum possible cost to read R and S once

– Putting both relations in memory won’t reduce the cost further
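As a worked example, the cost formulas can be applied to the customer and depositor statistics from the size-estimation slide (10,000 tuples in 400 blocks and 5,000 tuples in 100 blocks). The helper names below are made up for this sketch.

```python
def nested_loop_cost(n_outer, b_outer, b_inner):
    """Worst case: memory holds only one block of each relation."""
    return n_outer * b_inner + b_outer


def nested_loop_cost_inner_in_memory(b_outer, b_inner):
    """Best case: the inner relation fits entirely in memory."""
    return b_outer + b_inner


n_customer, b_customer = 10_000, 400
n_depositor, b_depositor = 5_000, 100

depositor_outer = nested_loop_cost(n_depositor, b_depositor, b_customer)
customer_outer = nested_loop_cost(n_customer, b_customer, b_depositor)
best_case = nested_loop_cost_inner_in_memory(b_customer, b_depositor)
print(depositor_outer, customer_outer, best_case)
```

Under these assumptions the worst case is 2,000,100 disk accesses with depositor as the outer relation and 1,000,400 with customer as the outer relation, against 500 when the inner relation fits in memory.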


Optimization

Structure and Execution of a Query Tree

• A query tree is a tree structure that

corresponds to a relational algebra expression

by representing the input relations as leaf

nodes and the relational algebra operations as

internal nodes of the tree

• An execution of the query tree consists of

executing an internal node operation whenever

its operands are available and then replacing

that internal node by the relation that results

from executing the operation
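The bottom-up execution described above can be sketched as a small recursive evaluator. The nested-tuple encoding of the tree, the sample rows, and the attribute name rsid (used so the two relations do not share a column name) are assumptions made for this illustration.

```python
# Leaves are relation names; internal nodes are ("select", predicate, child),
# ("project", attribute_names, child) or ("join", predicate, left, right).

def execute(node, database):
    """Evaluate a query tree bottom-up; each node is replaced by its result."""
    if isinstance(node, str):                       # leaf: an input relation
        return database[node]
    op = node[0]
    if op == "select":
        _, predicate, child = node
        return [t for t in execute(child, database) if predicate(t)]
    if op == "project":
        _, attributes, child = node
        return [{a: t[a] for a in attributes} for t in execute(child, database)]
    if op == "join":
        _, predicate, left, right = node
        left_rows = execute(left, database)
        right_rows = execute(right, database)
        return [{**l, **r} for l in left_rows for r in right_rows if predicate(l, r)]
    raise ValueError(f"unknown operation: {op}")


database = {
    "Sailors": [{"sid": 1, "sname": "Dustin", "rating": 7},
                {"sid": 2, "sname": "Lubber", "rating": 3}],
    "Reserves": [{"rsid": 1, "bid": 100}, {"rsid": 2, "bid": 100}],
}
tree = ("project", ["sname"],
        ("select", lambda t: t["bid"] == 100 and t["rating"] > 5,
         ("join", lambda r, s: r["rsid"] == s["sid"], "Reserves", "Sailors")))
result = execute(tree, database)
print(result)
```

This evaluator materializes every intermediate relation as a Python list, which keeps the sketch short but is not how a real engine has to work.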

Query Optimization: Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5

RA Tree:

[Figure: query tree with the join of Reserves and Sailors on sid=sid at the bottom, the selection bid=100 ∧ rating>5 above it, and the projection on sname at the root]

Plan:

[Figure: the same tree, with the selection and the projection marked (On-the-fly)]

RA Expression: ∏sname (σ bid=100 ∧ rating>5 (R ►◄sid=sid S))

The Schema:

Sailors (sid, sname, rating, age) 50 Bytes

Reserves (sid, bid, day, rname) 40 Bytes


Interpreting the TREE

Tree partially specifies how to evaluate the query

• First compute join between Reserves & Sailors

• Then the selections

• Finally the projection

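RA Tree:

[Figure: query tree with the join of Reserves and Sailors on sid=sid at the bottom, the selection bid=100 ∧ rating>5 above it, and the projection on sname at the root]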

Interpreting the TREE

Decide on the implementation of each operation involved

• Page-oriented simple nested-loops join between Reserves & Sailors with Reserves as the outer table

• Apply selections & projections to each tuple in the result of the join as it is produced

• Result of the join before the selections and projections is never stored in its entirety

• Convention: Outer table is the left child of the operator

Plan:

[Figure: the RA tree annotated with (File Scan) on Reserves and Sailors and (On-the-fly) on the selection and the projection]


• A query may have several equivalent

query trees

• A query parser generates a standard canonical query tree from a SQL query

– Cartesian products are first applied (FROM)

– then the conditions (WHERE)

– and finally projection (SELECT)

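[Figure: canonical query tree: projection on ProjNo, DeptNo, EmpName, Address, Birthdate at the root; below it the selection ProjLocation=‘Stafford’ AND MgrNo=EmpNo AND DeptNo=DeptNo; below that the Cartesian product of Project, Department and Employee]

The query optimizer transforms this canonical query into an efficient final query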


select ProjNo, DeptNo, EmpName, Address, Birthdate
from Project, Department, Employee
where ProjLocation='Stafford' and
MgrNo=EmpNo and
Department.DeptNo=Employee.DeptNo

Find the names of employees born after 1957 who work on a project named 'Aquarius'

select EmpName
from Employee, WorksOn, Project
where ProjName='Aquarius' AND
Project.ProjNo=WorksOn.ProjNo AND
Employee.EmpNo = WorksOn.EmpNo AND
Birthdate > 'DEC-31-1957'

WorksOn (EmpNo, ProjNo, Hours)

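[Figure: canonical query tree: projection on EmpName at the root; below it the selection ProjName=‘Aquarius’ AND Project.ProjNo=WorksOn.ProjNo AND Employee.EmpNo=WorksOn.EmpNo AND Birthdate > ‘DEC-31-1957’; below that the Cartesian product of Project, WorksOn and Employee]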

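Example

Push all the conditions as far down the tree as possible

[Figure: query tree with the projection on EmpName at the root, the conditions ProjNo=ProjNo and EmpNo=EmpNo, and the leaves Project, WorksOn and Employee. Annotation: expensive due to large size of Employee]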


Example

Rearrange join sequence according to estimates of relation sizes

[Figure: query tree with the projection on EmpName at the root, the conditions EmpNo=EmpNo and ProjNo=ProjNo, the selection PNAME=‘Aquarius’, and the leaves Employee, WorksOn and Project]

• Only need ProjNo attribute from Project and WorksOn

• Only need EmpNo attribute from Employee and WorksOn, and EmpName from Employee

Example

Replace cross products and selection sequence with a join operation

[Figure: query tree with the projection on EmpName at the root and joins on EmpNo=EmpNo and ProjNo=ProjNo over Employee, WorksOn and Project]

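Example

Push projection as far down the query tree as possible

[Figure: query tree with the root projection (labelled LNAME in the figure), joins on EmpNo=EmpNo and ProjNo=ProjNo over Employee, WorksOn and Project, and projections on (EmpNo, EmpName), (EmpNo), (EmpNo, ProjNo) and (ProjNo) pushed below the joins]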

1. Cascade of σ: A conjunctive selection condition can be broken up into a cascade (sequence) of individual σ operations:

• $\sigma_{c_1 \text{ AND } c_2 \text{ AND } \dots \text{ AND } c_n}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(\dots(\sigma_{c_n}(R))\dots))$

2. Commutativity of σ:

• $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$

3. Cascade of π:

• $\pi_{List_1}(\pi_{List_2}(\dots(\pi_{List_n}(R))\dots)) \equiv \pi_{List_1}(R)$

if List1 is included in List2…Listn; result is null if List1 is not in any of List2…Listn

4. Commuting σ with π: if the selection condition c involves only attributes that are in the projection list List1

• $\pi_{List_1}(\sigma_c(R)) \equiv \sigma_c(\pi_{List_1}(R))$

5. Commutativity of JOIN or ×: $R \bowtie S \equiv S \bowtie R$

6. Commuting σ with JOIN: if all the attributes in the selection condition c involve only the attributes of one of the relations being joined, say, R

• $\sigma_c(R \bowtie S) \equiv (\sigma_c(R)) \bowtie S$

7. Commuting π with JOIN: if List can be separated into List1 and List2 involving only attributes from R and S, respectively, and the join condition c involves only attributes in List:

• $\pi_{List}(R \bowtie_c S) \equiv (\pi_{List_1}(R)) \bowtie_c (\pi_{List_2}(S))$

8. Commuting set operations: ∪ and ∩ are commutative

9. JOIN, ×, ∪, ∩ are associative

10. σ distributes over ∪, ∩, −. With θ standing for any one of these three operations:

• $\sigma_c(R \;\theta\; S) \equiv \sigma_c(R) \;\theta\; \sigma_c(S)$

11. π distributes over ∪

• $\pi_{List}(R \cup S) \equiv (\pi_{List}(R)) \cup (\pi_{List}(S))$
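A quick way to build intuition for these rules is to evaluate both sides on a small instance. The sketch below checks rules 1, 2 and 6 on made-up relations represented as lists of dictionaries. Agreement on one instance illustrates a rule but does not prove it.

```python
def select(condition, relation):
    return [t for t in relation if condition(t)]


def join(r, s, condition):
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]


def same_rows(x, y):
    """Compare two relations as multisets of rows, ignoring order."""
    key = lambda t: sorted(t.items())
    return sorted(x, key=key) == sorted(y, key=key)


# Made-up sample relations
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
S = [{"a2": 1, "c": "x"}, {"a2": 3, "c": "y"}]

c1 = lambda t: t["b"] > 10          # refers only to attributes of R
c2 = lambda t: t["a"] < 3
on_a = lambda tr, ts: tr["a"] == ts["a2"]

# Rule 1: a conjunctive selection equals a cascade of selections
rule1 = same_rows(select(lambda t: c1(t) and c2(t), R), select(c1, select(c2, R)))
# Rule 2: selections commute
rule2 = same_rows(select(c1, select(c2, R)), select(c2, select(c1, R)))
# Rule 6: a selection on R's attributes can be pushed below the join
rule6 = same_rows(select(c1, join(R, S, on_a)), join(select(c1, R), S, on_a))
print(rule1, rule2, rule6)
```

In the rule 6 check, the pushed-down form joins only two tuples of R instead of three, which is the saving the heuristic optimizer is after.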



Pictorial Depiction of Equivalence Rules

Heuristic Algebraic Optimization

• Use rule 1 to break up any σ operation with conjunctive conditions into a sequence of σ operations

• Use rules 2, 4, 6, and 10 concerning commutativity of σ with other operations to move each σ operation as far down the query tree as possible based on the attributes in the σ operations

• Use rule 9 concerning associativity of binary operations to rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive σ operations are executed first

Heuristic Algebraic Optimization

• Combine sequences of Cartesian product and σ operation representing a join condition into single JOIN operations

• Use rules 3, 4, 7, and 11 concerning the cascading of π and commuting π with other operations, break down a π and move the projection attributes down the tree as far as possible

• Identify subtrees that represent groups of operations that can be executed by a single algorithm (select/join followed by project)

Evaluation of Expressions

• Alternatives for evaluating an entire expression tree

– Materialization: generate results of an expression whose inputs

are relations or are already computed, materialize (store) it on

disk.

– Pipelining: pass on tuples to parent operations even as an

operation is being executed

Materialization

• Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use intermediate results materialized into temporary relations to evaluate next-level operations.

• E.g., in the figure below, compute and store $\sigma_{building = \text{"Watson"}}(department)$, then compute and store its join with instructor, and finally compute the projection on name.

[Figure: expression tree with the projection on name at the root, above the join of $\sigma_{building = \text{"Watson"}}(department)$ with instructor]

Materialization (Cont.)

• Materialized evaluation is always applicable

• Cost of writing results to disk and reading them back can be quite

high

– Our cost formulas for operations ignore cost of writing results to

disk, so

• Overall cost = Sum of costs of individual operations +

cost of writing intermediate results to disk

• Double buffering: use two output buffers for each operation, when

one is full write it to disk while the other is getting filled

– Allows overlap of disk writes with computation and reduces

execution time


Pipelining

• Pipelined evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.

• E.g., in previous expression tree, don’t store result of $\sigma_{building = \text{"Watson"}}(department)$

– instead, pass tuples directly to the join. Similarly, don’t store result of join, pass tuples directly to projection.

• Much cheaper than materialization: no need to store a temporary relation to disk.

• Pipelining may not always be possible, e.g., sort, hash-join.

• For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation.

• Pipelines can be executed in two ways: demand driven and producer driven

Pipelining

• In demand driven or lazy evaluation

– system repeatedly requests next tuple from top level operation

– Each operation requests next tuple from children operations as required, in order to output its next tuple

– In between calls, operation has to maintain "state" so it knows what to return next

• In producer-driven or eager pipelining

– Operators produce tuples eagerly and pass them up to their parents

• Buffer maintained between operators, child puts tuples in buffer, parent removes tuples from buffer

• if buffer is full, child waits till there is space in the buffer, and then generates more tuples

– System schedules operations that have space in output buffer and can process more input tuples

• Alternative names: pull and push models of pipelining

Pipelining (Cont.)

• Implementation of demand-driven pipelining

– Each operation is implemented as an iterator implementing the following operations

• open()

– E.g. file scan: initialize file scan

» state: pointer to beginning of file

– E.g. merge join: sort relations;

» state: pointers to beginning of sorted relations

• next()

– E.g. for file scan: Output next tuple, and advance and store file pointer

– E.g. for merge join: continue with merge from earlier state till next output tuple is found. Save pointers as iterator state.

• close()
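The open()/next()/close() interface can be sketched as follows. This is an illustrative Python version, not the code of any particular DBMS: a list stands in for the file on disk, None is used as the end-of-input marker, and the class names and sample rows are made up.

```python
class FileScan:
    """Leaf iterator; a Python list stands in for the file on disk."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.position = 0                  # state: pointer to beginning of file

    def next(self):
        if self.position >= len(self.rows):
            return None                    # no more tuples
        row = self.rows[self.position]
        self.position += 1                 # advance and store the file pointer
        return row

    def close(self):
        self.position = None


class Select:
    """Unary operator applied on the fly to tuples pulled from its child."""

    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def open(self):
        self.child.open()

    def next(self):
        # Keep pulling from the child until a tuple qualifies or input ends
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self):
        self.child.close()


# Made-up account rows (account number, balance); keep balances below 2500
plan = Select(FileScan([("A-101", 500), ("A-215", 3000), ("A-305", 2400)]),
              lambda row: row[1] < 2500)
plan.open()
output = []
while (row := plan.next()) is not None:   # the system pulls from the top operator
    output.append(row)
plan.close()
print(output)
```

No intermediate relation is stored: each call to next() on the selection pulls only as many tuples from the scan as it needs to produce one result.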

Evaluation Algorithms for Pipelining

• Some algorithms are not able to output results even as they get input tuples

– E.g. merge join, or hash join

– intermediate results written to disk and then read back

• Algorithm variants to generate (at least some) results on the fly, as

input tuples are read in

– E.g. hybrid hash join generates output tuples even as probe relation

tuples in the in-memory partition (partition 0) are read in

– Double-pipelined join technique: Hybrid hash join, modified to

buffer partition 0 tuples of both relations in-memory, reading them

as they become available, and output results of any matches

between partition 0 tuples

• When a new r0 tuple is found, match it with existing s0 tuples,

output matches, and save it in r0

• Symmetrically for s0 tuples
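The matching step of the double-pipelined join can be sketched as a symmetric in-memory hash join. This sketch covers only the in-memory partition 0 logic described above. Partitioning and spilling to disk are omitted, and the function name and the interleaved input format are assumptions.

```python
from collections import defaultdict


def double_pipelined_join(stream, key_r, key_s):
    """Symmetric hash join over an interleaved stream of tuples.

    stream yields ("r", tuple) or ("s", tuple) in arrival order.
    """
    r0 = defaultdict(list)   # r tuples seen so far, hashed on the join key
    s0 = defaultdict(list)   # s tuples seen so far, hashed on the join key
    for source, tup in stream:
        if source == "r":
            k = key_r(tup)
            for match in s0.get(k, []):   # match the new r tuple with existing s tuples
                yield tup + match
            r0[k].append(tup)             # then save it in r0
        else:
            k = key_s(tup)
            for match in r0.get(k, []):   # symmetrically for s tuples
                yield match + tup
            s0[k].append(tup)


arrivals = [("r", (1, "a")), ("s", (1, "x")), ("s", (2, "y")),
            ("r", (2, "b")), ("r", (1, "c"))]
results = list(double_pipelined_join(arrivals,
                                     key_r=lambda t: t[0],
                                     key_s=lambda t: t[0]))
print(results)
```

Each result is produced as soon as both matching tuples have arrived, regardless of which input delivers first.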


• Motivation

– A query is mapped into a sequence of operations.

– Each execution of an operation produces a temporary result.

– Generating and saving temporary files on disk is time consuming and expensive.

• Alternative:

– Avoid constructing temporary results as much as possible.

– Pipeline the data through multiple operations: pass the result of a previous operator to the next without waiting to complete the previous operation.


• The result of one operator is sometimes pipelined to another operator without creating a temporary table to hold the intermediate result

• The output of R ►◄S is pipelined into the selections & projections that follow

• Cost of writing out the intermediate result & reading it back in can be significant

• Temporary table: Materialized Tuples



• Consider a selection query in which only a part of the selection condition matches an index

• 2 instances of selection operator

– Matching (primary) part of the selection condition

– Rest

• Pipelining: apply the second selection to each tuple in the result of the primary selection as it is produced, and add the tuples that qualify to the final result

• When the input to a unary operator is pipelined into it, we say that the operator is applied on-the-fly


• Result tuples of first join pipelined into join with C

• Conceptually, the evaluation is initiated from the root, & the node joining A & B produces tuples as and when they are requested by its parent node

[Figure: join tree for (A ►◄B) ►◄ C, with the join of A and B as the left input of the join with C]

Statistical Information for Cost Estimation

• $n_r$: number of tuples in a relation r.

• $b_r$: number of blocks containing tuples of r.

• $l_r$: size of a tuple of r.

• $f_r$: blocking factor of r, i.e., the number of tuples of r that fit into one block.

• V(A, r): number of distinct values that appear in r for attribute A; same as the size of $\pi_A(r)$.

• If tuples of r are stored together physically in a file, then:

$$b_r = \left\lceil \frac{n_r}{f_r} \right\rceil$$

Histograms

• Histogram on attribute age of relation person

• Equi-width histograms

• Equi-depth histograms
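The difference between the two kinds of histogram can be shown with a short sketch. An equi-width histogram divides the value range into ranges of equal size and counts the values in each, while an equi-depth histogram chooses range boundaries so that each range holds roughly the same number of values. The function names and the list of ages are made up for illustration.

```python
def equi_width_histogram(values, buckets):
    """Split [min, max] into ranges of equal width and count the values in each."""
    low, high = min(values), max(values)
    width = (high - low) / buckets
    if width == 0:                          # all values equal: one populated bucket
        return [len(values)] + [0] * (buckets - 1)
    counts = [0] * buckets
    for v in values:
        # The maximum value would land one past the end, so clamp it
        index = min(int((v - low) / width), buckets - 1)
        counts[index] += 1
    return counts


def equi_depth_boundaries(values, buckets):
    """Pick boundaries so that each range holds about the same number of values."""
    ordered = sorted(values)
    step = len(ordered) / buckets
    return [ordered[int(i * step)] for i in range(1, buckets)]


ages = [21, 22, 22, 23, 25, 30, 31, 45, 60, 80]  # made-up ages
print(equi_width_histogram(ages, 3))
print(equi_depth_boundaries(ages, 2))
```

On skewed data such as this, the equi-width buckets end up very unevenly filled, while the equi-depth boundary moves toward the dense part of the range.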

Estimation of the Size of Joins

• The Cartesian product $r \times s$ contains $n_r \cdot n_s$ tuples; each tuple occupies $l_r + l_s$ bytes.

• If $R \cap S = \emptyset$, then $r \bowtie s$ is the same as $r \times s$.


Selection Size Estimation

• $\sigma_{A=v}(r)$

• $n_r / V(A,r)$: number of records that will satisfy the selection

• Equality condition on a key attribute: size estimate = 1

• $\sigma_{A \le v}(r)$ (case of $\sigma_{A \ge v}(r)$ is symmetric)

– Let c denote the estimated number of tuples satisfying the condition.

– If min(A,r) and max(A,r) are available in catalog

• c = 0 if v < min(A,r)

• $c = n_r \cdot \dfrac{v - \min(A,r)}{\max(A,r) - \min(A,r)}$

– If histograms available, can refine above estimate

– In absence of statistical information c is assumed to be $n_r / 2$.
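These estimates can be sketched directly in Python. The function names and sample numbers are illustrative. Returning $n_r$ when v is at or above max(A, r) is an added assumption that keeps the estimate from exceeding the relation size; the formulas above do not state that case.

```python
def estimate_equality(n_r, distinct_values):
    """Size of the selection A = v: n_r / V(A, r), assuming uniform values."""
    return n_r / distinct_values


def estimate_at_most(n_r, v, min_a=None, max_a=None):
    """Size of the selection A <= v, using min(A, r) and max(A, r) if known."""
    if min_a is None or max_a is None:
        return n_r / 2                     # no statistics available
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r                         # assumption: every tuple qualifies
    return n_r * (v - min_a) / (max_a - min_a)


# Made-up statistics: 10,000 tuples, 50 distinct values, A ranging over 0..10000
print(estimate_equality(10_000, 50))
print(estimate_at_most(10_000, 2_500, min_a=0, max_a=10_000))
print(estimate_at_most(10_000, 2_500))
```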

Size Estimation of Complex Selections

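• The selectivity of a condition $\theta_i$ is the probability that a tuple in the relation r satisfies $\theta_i$.

– If $s_i$ is the number of satisfying tuples in r, the selectivity of $\theta_i$ is given by $s_i / n_r$.

• Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$. Assuming independence, estimate of tuples in the result is:

$$n_r \cdot \frac{s_1 \cdot s_2 \cdots s_n}{n_r^{\,n}}$$

• Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$. Estimated number of tuples:

$$n_r \cdot \left(1 - \left(1 - \frac{s_1}{n_r}\right)\left(1 - \frac{s_2}{n_r}\right) \cdots \left(1 - \frac{s_n}{n_r}\right)\right)$$

• Negation: $\sigma_{\neg\theta}(r)$. Estimated number of tuples:

$$n_r - size(\sigma_{\theta}(r))$$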
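The three formulas can be sketched as small Python helpers. The function names and sample sizes are made up, and the conjunction and disjunction estimates rely on the independence assumption stated above.

```python
from math import prod


def estimate_conjunction(n_r, sizes):
    """n_r * (s_1 * s_2 * ... * s_n) / n_r^n, assuming independent conditions."""
    return n_r * prod(s / n_r for s in sizes)


def estimate_disjunction(n_r, sizes):
    """n_r * (1 - (1 - s_1/n_r) * ... * (1 - s_n/n_r))."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))


def estimate_negation(n_r, size):
    """n_r minus the size of the selection on the un-negated condition."""
    return n_r - size


# Made-up example: 10,000 tuples, two conditions satisfied by 5,000 and 2,000 tuples
n_r, sizes = 10_000, [5_000, 2_000]
print(estimate_conjunction(n_r, sizes))
print(estimate_disjunction(n_r, sizes))
print(estimate_negation(n_r, sizes[0]))
```

For these numbers the selectivities are 0.5 and 0.2, so the conjunction estimate is about 1,000 tuples and the disjunction estimate about 6,000.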


Heuristic Optimization

• Cost-based optimization is expensive

• Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.

• Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:

– Perform selection early (reduces the number of tuples)

– Perform projection early (reduces the number of attributes)

– Perform most restrictive selection and join operations before other similar operations.

– Some systems use only heuristics, others combine heuristics with partial cost-based optimization.

Heuristic Optimization

Perform selection operations as early as possible

– A heuristic optimizer would use this rule without finding out whether the cost is reduced by this transformation

– Does it always work?

– Consider this: σθ (A ►◄B)

– Condition θ only refers to attributes in B

– Selection can definitely be performed before the join

– A is extremely small as compared to B

– Index on the join attribute of B

– No index on the attributes used by θ

– Is it a good idea to push the selection before the join?

– Performing the selection early, i.e., directly on B, would require a scan of all tuples in B

– It is probably cheaper to compute the join using the index and then reject the tuples that fail the selection


Perform projection operations as early as possible

– Projection operation, like the selection operation, reduces the size of relations

– Whenever we need to generate a temporary relation, it is advantageous to apply immediately any projections that are possible


Perform selections earlier than projections

– Selections have the potential of reducing the size of a relation greatly

– Selections enable the use of indices to access tuples



– Heuristics reorder an initial query-tree representation in such a way that the operations that reduce the size of the intermediate results are applied first

– Early selections reduce the number of tuples

– Early projections reduce the number of attributes

– Heuristic transformations also restructure the tree so that the system performs the most restrictive selection and join operations before other similar operations

SYSTEM R Optimizer

Current relational query optimizers have been greatly influenced by choices made in the design of IBM’s System R query optimizer

– Use of statistics about the DB instance to estimate the cost of a query evaluation plan (QEP)

– Consider only plans with binary joins in which the inner relation is a base relation

• This heuristic greatly reduces the number of alternative plans that must be considered

SYSTEM R Optimizer

– Focus optimization on the class of SQL queries without nesting & treat nested queries in a relatively ad-hoc way

– Not to perform duplicate elimination for projections except as a final step when required by a DISTINCT clause

– Cartesian products avoided

– A model of cost that accounted for CPU costs as well as I/O costs

– Only left-deep plans

Left-Deep Plans


Left Deep Join Trees

• In left-deep join trees, the right-hand-side input

for each join is a relation, not the result of an

intermediate join.
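To see the shape of the left-deep search space, the sketch below enumerates every left-deep join order for a small set of relations. The nested-tuple representation and the function name are assumptions for illustration. A real optimizer would cost and prune these orders rather than list them all.

```python
from itertools import permutations


def left_deep_plans(relations):
    """Yield every left-deep join order as a nested tuple ((A, B), C), ..."""
    for order in permutations(relations):
        plan = order[0]
        for relation in order[1:]:
            plan = (plan, relation)   # right-hand input is always a base relation
        yield plan


plans = list(left_deep_plans(["A", "B", "C"]))
print(len(plans))
print(plans[0])
```

For n relations there are n! left-deep orders (6 for the three relations here), which is far fewer than the number of join trees of arbitrary shape.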
