
CSCE 608 – 600 Database Systems

Jan 05, 2016


ludlow

Page 1: CSCE 608 – 600  Database Systems

CSCE 608 – 600 Database Systems

Chapter 15: Query Execution


Page 2: CSCE 608 – 600  Database Systems


Index-Based Algorithms

The existence of an index is especially helpful for selection, and also helps other operations.

Clustered relation: tuples are packed into the minimum number of blocks

Clustering index: all tuples with the same value for the index's search key are packed into the minimum number of blocks

Page 3: CSCE 608 – 600  Database Systems


Index-Based Selection

Without an index, selection takes B(R), or even T(R), disk I/O's.

To select all tuples with attribute a equal to value v, when there is an index on a: search the index for value v and get pointers to exactly the blocks containing the desired tuples.

If the index is clustering, then the number of disk I/O's is about B(R)/V(R,a).

Page 4: CSCE 608 – 600  Database Systems


Examples

Suppose B(R) = 1000, T(R) = 20,000, there is an index on a, and we want to select all tuples with a = 0.

  If R is clustered and we don't use the index: 1000 disk I/O's

  If R is not clustered and we don't use the index: 20,000 disk I/O's

  If V(R,a) = 100, the index is clustering, and we use the index: 1000/100 = 10 disk I/O's (on average)

  If V(R,a) = 10, R is not clustered, the index is non-clustering, and we use the index: 20,000/10 = 2000 disk I/O's (on average)

  If V(R,a) = 20,000 (a is a key) and we use the index: 1 disk I/O
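The arithmetic in these examples can be reproduced with a small cost estimator. This is only a sketch: the function name selection_cost and its parameters are hypothetical, and, like the slide, it ignores the I/O's needed to read the index itself.

```python
def selection_cost(B, T, V=None, r_clustered=True,
                   use_index=False, index_clustering=False):
    """Estimated disk I/O's for selecting tuples with a = v.

    B = B(R), T = T(R), V = V(R,a). Index lookup I/O's are ignored,
    as in the slide's examples.
    """
    if not use_index:
        # Full scan: one I/O per block if R is clustered, else per tuple.
        return B if r_clustered else T
    if index_clustering:
        # Matching tuples are packed into about B/V blocks (at least 1).
        return max(1, -(-B // V))
    # Non-clustering index: each matching tuple may be on its own block.
    return max(1, -(-T // V))


print(selection_cost(1000, 20000, r_clustered=True))
print(selection_cost(1000, 20000, r_clustered=False))
print(selection_cost(1000, 20000, V=100, use_index=True, index_clustering=True))
print(selection_cost(1000, 20000, V=10, r_clustered=False, use_index=True))
print(selection_cost(1000, 20000, V=20000, use_index=True))
```

The five calls correspond, in order, to the five cases listed above.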

Page 5: CSCE 608 – 600  Database Systems


Using Indexes in Other Operations

1. If the index is a B-tree, can efficiently select tuples with indexed attribute in a range

2. If selection is on a complex condition such as "a = v AND …", first do the index-based algorithm to get tuples satisfying "a = v".

Such splitting is part of the job of the query optimizer

Page 6: CSCE 608 – 600  Database Systems


Index-Based Join Algorithm

Consider natural join of R(X,Y) and S(Y,Z). Suppose S has an index on Y.

for each block of R
  for each tuple t in the current block
    use the index on S to find tuples of S that match t in the attribute(s) Y
    output the join of these tuples with t
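The loop above can be sketched in memory as follows. This is an illustration under stated assumptions, not a disk-based implementation: a Python dictionary stands in for the index on S.Y, relations are lists of tuples, and the names build_index and index_join are hypothetical.

```python
from collections import defaultdict


def build_index(S, key_pos):
    """Stand-in for an index on S: maps a Y-value to the S-tuples having it."""
    index = defaultdict(list)
    for s in S:
        index[s[key_pos]].append(s)
    return index


def index_join(R, S_index):
    """Natural join of R(X, Y) with S(Y, Z), probing the index once per R-tuple."""
    out = []
    for (x, y) in R:                        # scan R tuple by tuple
        for (_, z) in S_index.get(y, []):   # index lookup on S.Y
            out.append((x, y, z))           # join t with each matching S-tuple
    return out


R = [("a", 1), ("b", 2), ("c", 3)]
S = [(1, "p"), (1, "q"), (3, "r")]
result = index_join(R, build_index(S, 0))
print(result)
```

Only the R-tuples with a matching Y-value in S contribute output; the tuple ("b", 2) finds nothing in the index and is skipped.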

Page 7: CSCE 608 – 600  Database Systems


Analysis of Index-Based Join

To get all the blocks of R, either B(R) or T(R) disk I/O's are needed

For each tuple of R, there are on average T(S)/V(S,Y) matching tuples of S:

  T(R)*T(S)/V(S,Y) disk I/O's if the index is not clustering

  T(R)*B(S)/V(S,Y) disk I/O's if the index is clustering

This method is efficient if R is much smaller than S and V(S,Y) is large (i.e., not many tuples of S match)
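To see how the two formulas compare, here is a small sketch that adds the cost of reading R to the lookup cost. The function name index_join_cost and the sample statistics are assumptions chosen for illustration; they do not come from the slides.

```python
def index_join_cost(B_R, T_R, B_S, T_S, V_S_Y,
                    r_clustered=True, index_clustering=True):
    """Estimated disk I/O's for an index-based join of R with S on Y."""
    # Reading R: B(R) I/O's if clustered, otherwise up to T(R).
    read_R = B_R if r_clustered else T_R
    # Per R-tuple lookup cost: B(S)/V(S,Y) with a clustering index,
    # T(S)/V(S,Y) otherwise.
    per_tuple = (B_S if index_clustering else T_S) / V_S_Y
    return read_R + T_R * per_tuple


# Hypothetical statistics: a small R joined with a larger S.
clustering = index_join_cost(100, 1000, 1000, 20000, 100)
non_clustering = index_join_cost(100, 1000, 1000, 20000, 100,
                                 index_clustering=False)
print(clustering, non_clustering)
```

With these assumed numbers the lookups dominate the total in both cases, which is why the method pays off mainly when T(R) is small and V(S,Y) is large.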

Page 8: CSCE 608 – 600  Database Systems


Join Using a Sorted Index

Suppose we want to join R(X,Y) and S(Y,Z).

Suppose we have a sorted index (e.g., B-tree) on Y for both R and S: do a sort-join, but there is no need to sort the indexed relations first.
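A minimal sketch of the merge step, assuming the two index scans already deliver the tuples in Y order. Plain Python lists stand in for those scans, and the name merge_join is hypothetical.

```python
def merge_join(R_sorted, S_sorted):
    """Join R(X, Y) and S(Y, Z); both inputs must already be sorted on Y."""
    out = []
    i = j = 0
    while i < len(R_sorted) and j < len(S_sorted):
        y_r, y_s = R_sorted[i][1], S_sorted[j][0]
        if y_r < y_s:
            i += 1
        elif y_r > y_s:
            j += 1
        else:
            # Find the run of S-tuples that share this Y-value.
            j_end = j
            while j_end < len(S_sorted) and S_sorted[j_end][0] == y_r:
                j_end += 1
            # Pair every R-tuple with this Y-value against that run.
            while i < len(R_sorted) and R_sorted[i][1] == y_r:
                for k in range(j, j_end):
                    out.append((R_sorted[i][0], y_r, S_sorted[k][1]))
                i += 1
            j = j_end
    return out


R_sorted = [("a", 1), ("b", 1), ("c", 3)]
S_sorted = [(1, "p"), (2, "q"), (3, "r"), (3, "s")]
joined = merge_join(R_sorted, S_sorted)
print(joined)
```

Each input is traversed once, which is the saving over a sort-join that must first sort both relations.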

Page 9: CSCE 608 – 600  Database Systems


Buffer Management

The availability of blocks (buffers) of main memory is controlled by the buffer manager.

When a new buffer is needed, a replacement policy is used to decide which existing buffer should be returned to disk.

If the number of buffers available for an operation cannot be predicted in advance, then the algorithm chosen must degrade gracefully as the number of buffers shrinks.

If the number of buffers available is not large enough for a two-pass algorithm, then there are generalizations to algorithms that use three or more passes.

Page 10: CSCE 608 – 600  Database Systems

CSCE 608 - 600 Database Systems

Chapter 16: Query Compiler


Page 11: CSCE 608 – 600  Database Systems

Query Compiler

Parsing

Logical Query Plan


Page 12: CSCE 608 – 600  Database Systems

[Figure: query compilation pipeline. SQL query → parse → parse tree → convert → logical query plan → apply laws → "improved" l.q.p. → estimate result sizes (with statistics as input) → l.q.p. + sizes → consider physical plans → {P1, P2, …} → estimate costs → {(P1,C1), (P2,C2), …} → pick best → Pi → execute → answer]

Page 13: CSCE 608 – 600  Database Systems


Outline

Convert SQL query to a parse tree
  Semantic checking: attributes, relation names, types

Convert to a logical query plan (relational algebra expression)
  deal with subqueries

Improve the logical query plan
  use algebraic transformations
  group together certain operators
  evaluate logical plan based on estimated size of relations

Convert to a physical query plan
  search the space of physical plans
  choose order of operations
  complete the physical query plan

Page 14: CSCE 608 – 600  Database Systems


Parsing

Goal is to convert a text string containing a query into a parse tree data structure:
  leaves form the text string (broken into lexical elements)
  internal nodes are syntactic categories

Uses standard algorithmic techniques from compilers:
  given a grammar for the language (e.g., SQL), process the string and build the tree

Page 15: CSCE 608 – 600  Database Systems


Example: SQL query

SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);

(Find the movies with stars born in 1960)

Assume we have a simplified grammar for SQL.

Page 16: CSCE 608 – 600  Database Systems


Example: Parse Tree

<Query>
  <SFW>: SELECT <SelList> FROM <FromList> WHERE <Condition>
    <SelList>: <Attribute>: title
    <FromList>: <RelName>: StarsIn
    <Condition>: <Tuple> IN <Query>
      <Tuple>: <Attribute>: starName
      <Query>: ( <Query> )
        <Query>
          <SFW>: SELECT <SelList> FROM <FromList> WHERE <Condition>
            <SelList>: <Attribute>: name
            <FromList>: <RelName>: MovieStar
            <Condition>: <Attribute> LIKE <Pattern>
              <Attribute>: birthdate
              <Pattern>: '%1960'

Page 17: CSCE 608 – 600  Database Systems


The Preprocessor

replaces each reference to a view with a parse (sub)-tree that describes the view (i.e., a query)

does semantic checking:
  are relations and views mentioned in the schema?
  are attributes mentioned in the current scope?
  are attribute types correct?


Page 19: CSCE 608 – 600  Database Systems


Convert Parse Tree to Relational Algebra

Complete algorithm depends on specific grammar, which determines forms of the parse trees

Here we give a flavor of the approach

Page 20: CSCE 608 – 600  Database Systems


Conversion

Suppose there are no subqueries.

SELECT att-list FROM rel-list WHERE cond

is converted into

PROJ_{att-list}(SELECT_{cond}(PRODUCT(rel-list))), or

π_{att-list}(σ_{cond}(×(rel-list)))
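As a rough illustration of this conversion for subquery-free queries, the sketch below builds the algebra expression as a string from the three parts of a SELECT-FROM-WHERE query. The function name sfw_to_algebra is hypothetical, and a real compiler would build an expression tree from the parse tree rather than a string.

```python
def sfw_to_algebra(att_list, rel_list, cond):
    """Render SELECT att_list FROM rel_list WHERE cond as projection over
    selection over product (no subqueries handled)."""
    product = " × ".join(rel_list)      # product of all relations in FROM
    atts = ", ".join(att_list)          # projection list from SELECT
    return "π_{" + atts + "}(σ_{" + cond + "}(" + product + "))"


expr = sfw_to_algebra(
    ["movieTitle"],
    ["StarsIn", "MovieStar"],
    "starName = name AND birthdate LIKE '%1960'",
)
print(expr)
```

The selection and projection wrap the product in that order because the WHERE condition may mention attributes that the SELECT list drops.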

Page 21: CSCE 608 – 600  Database Systems

SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE '%1960';

<Query>
  <SFW>: SELECT <SelList> FROM <FromList> WHERE <Condition>
    <SelList>: <Attribute>: movieTitle
    <FromList>: <RelName> , <FromList>
      <RelName>: StarsIn
      <FromList>: <RelName>: MovieStar
    <Condition>: <Condition> AND <Condition>
      <Condition>: <Attribute> = <Attribute>
        <Attribute>: starName
        <Attribute>: name
      <Condition>: <Attribute> LIKE <Pattern>
        <Attribute>: birthdate
        <Pattern>: '%1960'

Page 22: CSCE 608 – 600  Database Systems


Equivalent Algebraic Expression Tree

π_{movieTitle}
  σ_{starName = name AND birthdate LIKE '%1960'}
    ×
      StarsIn
      MovieStar

Page 23: CSCE 608 – 600  Database Systems


Handling Subqueries

Recall the (equivalent) query:

SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);

Use an intermediate format called two-argument selection

Page 24: CSCE 608 – 600  Database Systems


Example: Two-Argument Selection

π_{title}
  σ (two-argument selection)
    StarsIn
    <condition>
      <tuple>: <attribute>: starName
      IN
      π_{name}
        σ_{birthdate LIKE '%1960'}
          MovieStar

Page 25: CSCE 608 – 600  Database Systems


Converting Two-Argument Selection

To continue the conversion, we need rules for replacing two-argument selection with a relational algebra expression

Different rules depending on the nature of the subquery

Here we show an example for the IN operator and an uncorrelated subquery (the subquery computes a relation independent of the tuple being tested)

Page 26: CSCE 608 – 600  Database Systems


Rules for IN

The two-argument selection with arguments R and <Condition> (t IN S) is replaced by

σ_C(R × S)

C is the condition that equates attributes in t with corresponding attributes in S
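The sketch below compares filtering R with IN against the rewritten form, a selection with condition C over the product of R and S. It uses Python lists as bags; the function names and sample data are hypothetical. One point it illustrates, stated here as an observation about bag semantics rather than something shown on the slide: if S contains duplicates, the product form repeats R-tuples, so S must be duplicate-free (or be deduplicated first) for the two forms to agree.

```python
def select_in(R, attr_pos, S_values):
    """Two-argument selection: keep R-tuples whose attribute is IN S."""
    return [t for t in R if t[attr_pos] in S_values]


def product_then_select(R, attr_pos, S_values):
    """Rewritten form: selection with C over R x S, keeping R's attributes."""
    return [t for t in R for s in S_values if t[attr_pos] == s]


StarsIn = [("Movie A", "Ann"), ("Movie B", "Bob"), ("Movie C", "Cy")]
names = ["Ann", "Cy", "Ann"]          # subquery result with a duplicate

in_result = select_in(StarsIn, 1, names)
naive = product_then_select(StarsIn, 1, names)
deduped = product_then_select(StarsIn, 1, sorted(set(names)))

print(in_result)
print(naive)      # "Movie A" appears twice because "Ann" is duplicated in S
print(deduped)    # matches the IN result once duplicates are removed from S
```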

Page 27: CSCE 608 – 600  Database Systems


Example: Logical Query Plan

π_{title}
  σ_{starName=name}
    ×
      StarsIn
      π_{name}
        σ_{birthdate LIKE '%1960'}
          MovieStar

Page 28: CSCE 608 – 600  Database Systems


What if Subquery is Correlated?

Example is when subquery refers to the current tuple of the outer scope that is being tested

More complicated to deal with, since subquery cannot be translated in isolation

Need to incorporate external attributes in the translation

Some details are in the textbook


Page 30: CSCE 608 – 600  Database Systems


Improving the Logical Query Plan

There are numerous algebraic laws concerning relational algebra operations

By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently

Next we'll survey some of these laws

Page 31: CSCE 608 – 600  Database Systems


Associative and Commutative Operations

The following operations are both associative and commutative:
  product
  natural join
  set and bag union
  set and bag intersection

associative: (A op B) op C = A op (B op C)

commutative: A op B = B op A

Page 32: CSCE 608 – 600  Database Systems


Laws Involving Selection

Selections usually reduce the size of the relation

Usually good to do selections early, i.e., "push them down the tree"

Also can be helpful to break up a complex selection into parts

Page 33: CSCE 608 – 600  Database Systems


Selection Splitting

σ_{C1 AND C2}(R) = σ_{C1}(σ_{C2}(R))

σ_{C1 OR C2}(R) = (σ_{C1}(R)) ∪_set (σ_{C2}(R)), if R is a set

σ_{C1}(σ_{C2}(R)) = σ_{C2}(σ_{C1}(R))
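The three splitting laws can be checked on a small example. This is a sketch with hypothetical data: R is a Python set of tuples (so the set-union law applies), and select is a helper written for this illustration.

```python
def select(cond, rel):
    """Selection on a set of tuples."""
    return {t for t in rel if cond(t)}


R = {(1, "x"), (2, "y"), (3, "x"), (4, "z")}
c1 = lambda t: t[0] > 1        # condition C1
c2 = lambda t: t[1] == "x"     # condition C2

# AND splits into a cascade of selections.
and_lhs = select(lambda t: c1(t) and c2(t), R)
and_rhs = select(c1, select(c2, R))

# OR splits into a set union of two selections (R is a set here).
or_lhs = select(lambda t: c1(t) or c2(t), R)
or_rhs = select(c1, R) | select(c2, R)

# Cascaded selections commute.
swap_lhs = select(c1, select(c2, R))
swap_rhs = select(c2, select(c1, R))

print(and_lhs == and_rhs, or_lhs == or_rhs, swap_lhs == swap_rhs)
```

The OR law needs the "R is a set" condition because a tuple satisfying both C1 and C2 would otherwise be counted twice by a bag union.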

Page 34: CSCE 608 – 600  Database Systems


Selection and Binary Operators

Must push selection to both arguments:
  σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S)

Must push to first arg, optional for 2nd:
  σ_C(R - S) = σ_C(R) - S
  σ_C(R - S) = σ_C(R) - σ_C(S)

Push to at least one arg with all attributes mentioned in C:
  product, natural join, theta join, intersection
  e.g., σ_C(R × S) = σ_C(R) × S, if R has all the atts in C
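A quick check of these pushdown laws on sets, as a sketch with hypothetical data. Tuples have a single attribute, and for the product case the condition tests position 0, which belongs to R.

```python
def select(cond, rel):
    """Selection on a set of tuples."""
    return {t for t in rel if cond(t)}


R = {(1,), (2,), (3,), (4,)}
S = {(3,), (4,), (5,)}
C = lambda t: t[0] % 2 == 0    # condition on the first attribute

# Union: the selection goes to both arguments.
union_ok = select(C, R | S) == select(C, R) | select(C, S)

# Difference: pushing to the first argument is required, the second optional.
diff_first = select(C, R - S) == select(C, R) - S
diff_both = select(C, R - S) == select(C, R) - select(C, S)

# Product: C mentions only R's attribute, so push it to R alone.
product = {r + s for r in R for s in S}
prod_lhs = select(C, product)
prod_rhs = {r + s for r in select(C, R) for s in S}

print(union_ok, diff_first, diff_both, prod_lhs == prod_rhs)
```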

Page 35: CSCE 608 – 600  Database Systems


Pushing Selection Up the Tree

Suppose we have relations
  StarsIn(title,year,starName)
  Movie(title,year,len,inColor,studioName)

and a view

CREATE VIEW MoviesOf1996 AS
    SELECT *
    FROM Movie
    WHERE year = 1996;

and the query

SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;

Page 36: CSCE 608 – 600  Database Systems


The Straightforward Tree

π_{starName,studioName}
  ⋈
    σ_{year=1996}
      Movie
    StarsIn

Remember the rule σ_C(R ⋈ S) = σ_C(R) ⋈ S?

Page 37: CSCE 608 – 600  Database Systems


The Improved Logical Query Plan

Original plan:
  π_{starName,studioName}(σ_{year=1996}(Movie) ⋈ StarsIn)

Push selection up the tree:
  π_{starName,studioName}(σ_{year=1996}(Movie ⋈ StarsIn))

Push selection down the tree (to both arguments):
  π_{starName,studioName}(σ_{year=1996}(Movie) ⋈ σ_{year=1996}(StarsIn))

Page 38: CSCE 608 – 600  Database Systems


Laws Involving Projections

Consider adding in additional projections
  Adding a projection lower in the tree can improve performance, since often tuple size is reduced
  Usually not as helpful as pushing selections down

If a projection is inserted in the tree, then none of the eliminated attributes can appear above this point in the tree
  Ex: π_L(R × S) = π_L(π_M(R) × π_N(S)), where M (resp. N) is all attributes of R (resp. S) that are used in L

Another example:
  π_L(R ∪_bag S) = π_L(R) ∪_bag π_L(S)
  But watch out for set union!
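The warning about set union can be made concrete with a two-tuple example. This sketch models bags as collections.Counter objects and uses a bag projection (no duplicate elimination); the helper names and data are hypothetical.

```python
from collections import Counter


def project(bag, positions):
    """Bag projection: keeps duplicates."""
    out = Counter()
    for t, n in bag.items():
        out[tuple(t[p] for p in positions)] += n
    return out


def bag_union(a, b):
    return a + b                       # multiplicities add


def set_union(a, b):
    return Counter(set(a) | set(b))    # each distinct tuple exactly once


R = Counter({(1, 2): 1})
S = Counter({(1, 3): 1})
L = [0]                                # project onto the first attribute

bag_lhs = project(bag_union(R, S), L)
bag_rhs = bag_union(project(R, L), project(S, L))

set_lhs = project(set_union(R, S), L)
set_rhs = set_union(project(R, L), project(S, L))

print(bag_lhs == bag_rhs)   # the law holds for bag union
print(set_lhs == set_rhs)   # it fails for set union in this example
```

Here (1,2) and (1,3) are distinct before projection but collide afterwards, so projecting first and then taking a set union loses one copy of (1).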

Page 39: CSCE 608 – 600  Database Systems


Push Projection Below Selection?

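Rule: π_L(σ_C(R)) = π_L(σ_C(π_M(R))), where M is all attributes used by L or C

But is it a good idea?

SELECT starName FROM StarsIn WHERE movieYear = 1996;

Plan without the extra projection:
  π_{starName}(σ_{movieYear=1996}(StarsIn))

Plan with the projection pushed below the selection:
  π_{starName}(σ_{movieYear=1996}(π_{starName,movieYear}(StarsIn)))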

Page 40: CSCE 608 – 600  Database Systems


Joins and Products

Recall from the definitions of relational algebra:
  R ⋈_C S = σ_C(R × S) (theta join)
  R ⋈ S = π_L(σ_C(R × S)) (natural join)

where C equates same-name attributes in R and S, and L includes all attributes of R and S dropping duplicates

To improve a logical query plan, replace a product followed by a selection with a join
  Join algorithms are usually faster than doing product followed by selection

Page 41: CSCE 608 – 600  Database Systems


Duplicate Elimination

Moving duplicate elimination (δ) down the tree is potentially beneficial as it can reduce the size of intermediate relations

δ can be eliminated if its argument has no duplicates:
  a relation with a primary key
  a relation resulting from a grouping operator

Legal to push δ through product, join, selection, and bag intersection
  Ex: δ(R × S) = δ(R) × δ(S)

Cannot push δ through bag union, bag difference, or projection
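A small check of one legal and one illegal case, with bags modeled as collections.Counter objects. The helper names delta and product and the sample bags are hypothetical.

```python
from collections import Counter


def delta(bag):
    """Duplicate elimination: every distinct tuple with multiplicity 1."""
    return Counter(set(bag))


def product(a, b):
    """Bag product: multiplicities multiply."""
    return Counter({r + s: n * m for r, n in a.items() for s, m in b.items()})


R = Counter({("a",): 2, ("b",): 1})
S = Counter({("a",): 1, ("c",): 3})

# Legal: duplicate elimination distributes over product.
through_product = delta(product(R, S)) == product(delta(R), delta(S))

# Not legal: ("a",) occurs in both R and S, so eliminating duplicates
# before the bag union leaves two copies, while doing it after leaves one.
after_union = delta(R + S)
before_union = delta(R) + delta(S)

print(through_product)
print(after_union == before_union)
```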