8/12/2019 Lecture05 Query Processing Ch23
Chapter 23
Query Processing
Pearson Education 2009
Chapter 23 - Objectives
Objectives of query processing and optimization.
Static versus dynamic query optimization.
How a query is decomposed and semantically
analyzed.
How to create a relational algebra tree (R.A.T.) to represent a query.
Rules of equivalence for relational algebra (RA) operations.
How to apply heuristic transformation rules to improve efficiency of a query.
Chapter 23 - Objectives
Types of database statistics required to estimate
cost of operations.
Different strategies for implementing selection.
How to evaluate cost and size of selection.
Different strategies for implementing join.
How to evaluate cost and size of join.
Different strategies for implementing projection.
How to evaluate cost and size of projection.
Chapter 23 - Objectives
How to evaluate the cost and size of other RA operations.
How pipelining can be used to improve efficiency of queries.
Difference between materialization and pipelining.
Advantages of left-deep trees.
Approaches to finding optimal execution strategy.
How Oracle handles query optimization (QO).
Introduction
In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language.
It is the programmer's responsibility to select the most appropriate execution strategy.
With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved.
This relieves user of knowing what constitutes a good execution strategy.
Introduction
Two main techniques for query optimization:
heuristic rules that order operations in a query;
comparing different strategies based on relative costs, and selecting one that minimizes resource usage.
Disk access tends to be dominant cost in query
processing for centralized DBMS.
Query Processing
Activities involved in retrieving data from the
database.
Aims of QP:
transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing RA);
execute strategy to retrieve required data.
Query Optimization
Activity of choosing an efficient execution strategy for processing query.
As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage.
Generally, reduce total execution time of query.
May also reduce response time of query.
Problem is computationally intractable with large number of relations, so strategy adopted is reduced to finding near-optimum solution.
Example 23.1 - Different Strategies
Find all Managers who work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'London');
Example 23.1 - Different Strategies
Three equivalent RA queries are:
(1) σ_{(position='Manager') ∧ (city='London') ∧ (Staff.branchNo=Branch.branchNo)}(Staff × Branch)
(2) σ_{(position='Manager') ∧ (city='London')}(Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch)
(3) (σ_{position='Manager'}(Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (σ_{city='London'}(Branch))
Example 23.1 - Different Strategies
Assume:
1000 tuples in Staff; 50 tuples in Branch;
50 Managers; 5 London branches;
no indexes or sort keys;
results of any intermediate operations stored
on disk;
cost of the final write is ignored;
tuples are accessed one at a time.
Example 23.1 - Cost Comparison
Costs (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian product and join operations are much more expensive than selection, and third option significantly reduces size of relations being joined together.
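The three totals can be reproduced with a short calculation. This is a sketch only: the breakdown of each term in the comments is one reading of the slide's arithmetic, and it assumes the join produces one tuple per Staff member.

```python
# Figures assumed from the example: 1000 Staff tuples, 50 Branch tuples,
# 50 Managers, 5 London branches, one disk access per tuple read or written.
STAFF, BRANCH, MANAGERS, LONDON = 1000, 50, 50, 5

def strategy_costs():
    # (1) read both relations, write the Cartesian product, read it back to select
    cost1 = (STAFF + BRANCH) + 2 * (STAFF * BRANCH)
    # (2) read both relations for the join, write its 1000 tuples, read them back to select
    cost2 = 2 * STAFF + (STAFF + BRANCH)
    # (3) select Managers (read 1000, write 50), select London branches
    #     (read 50, write 5), then read both reduced relations for the join
    cost3 = (STAFF + MANAGERS) + (BRANCH + LONDON) + (MANAGERS + LONDON)
    return cost1, cost2, cost3

print(strategy_costs())
```

Strategy (3) comes out roughly 87 times cheaper than strategy (1) under these assumptions.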
Phases of Query Processing
QP has four main phases:
decomposition (consisting of parsing and
validation);
optimization;
code generation;
execution.
Phases of Query Processing
Dynamic versus Static Optimization
Two times when first three phases of QP can be carried out:
dynamically every time query is run;
statically when query is first submitted.
Advantages of dynamic QO arise from fact that information is up to date.
Disadvantages are that performance of query is affected, and time may limit finding optimum strategy.
Dynamic versus Static Optimization
Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.
Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.
Could use a hybrid approach to overcome this.
Query Decomposition
Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct.
Typical stages are:
analysis,
normalization,
semantic analysis,
simplification,
query restructuring.
Analysis
Analyze query lexically and syntactically using compiler techniques.
Verify relations and attributes exist.
Verify operations are appropriate for object type.
Analysis - Example
SELECT staff_no
FROM Staff
WHERE position > 10;
This query would be rejected on two grounds:
staff_no is not defined for Staff relation
(should be staffNo).
Comparison >10 is incompatible with type of position, which is a variable character string.
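The two checks can be sketched as a lookup against a system catalog. The `CATALOG` dictionary, its type names, and the `analyze` function are hypothetical, introduced only for illustration; a real DBMS performs these checks inside its parser and validator.

```python
# Hypothetical catalog: relation name -> {attribute name: type name}
CATALOG = {"Staff": {"staffNo": "varchar", "position": "varchar", "salary": "decimal"}}
NUMERIC_TYPES = {"decimal", "integer"}

def analyze(relation, columns, comparisons):
    """Return a list of error messages; an empty list means the query passes."""
    attrs = CATALOG.get(relation)
    if attrs is None:
        return [f"relation {relation} does not exist"]
    errors = [f"{c} is not defined for {relation}" for c in columns if c not in attrs]
    for col, op, literal in comparisons:
        if col not in attrs:
            errors.append(f"{col} is not defined for {relation}")
        # A numeric literal may only be compared with a numeric attribute, and vice versa
        elif isinstance(literal, (int, float)) != (attrs[col] in NUMERIC_TYPES):
            errors.append(f"{op} {literal!r} is incompatible with {col} ({attrs[col]})")
    return errors

# SELECT staff_no FROM Staff WHERE position > 10;
for message in analyze("Staff", ["staff_no"], [("position", ">", 10)]):
    print(message)
```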
Example 23.1 - R.A.T.
Normalization
Converts query into a normalized form for easier
manipulation.
Predicate can be converted into one of two forms:
Conjunctive normal form:
(position = 'Manager' ∨ salary > 20000) ∧ (branchNo = 'B003')
Disjunctive normal form:
(position = 'Manager' ∧ branchNo = 'B003') ∨ (salary > 20000 ∧ branchNo = 'B003')
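Treating each comparison as a Boolean variable, the two normal forms of the predicate "position is Manager or salary exceeds 20000, and branchNo is B003" can be checked for equivalence by brute force. The function and parameter names below are illustrative only.

```python
from itertools import product

def cnf(is_manager, high_salary, in_b003):
    # (position = 'Manager' OR salary > 20000) AND (branchNo = 'B003')
    return (is_manager or high_salary) and in_b003

def dnf(is_manager, high_salary, in_b003):
    # (position = 'Manager' AND branchNo = 'B003') OR (salary > 20000 AND branchNo = 'B003')
    return (is_manager and in_b003) or (high_salary and in_b003)

def forms_agree():
    # Compare the two forms on all 8 truth assignments
    return all(cnf(*v) == dnf(*v) for v in product([False, True], repeat=3))

print(forms_agree())
```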
Semantic Analysis
Rejects normalized queries that are incorrectly formulated or contradictory.
Query is incorrectly formulated if components do not contribute to generation of result.
Query is contradictory if its predicate cannot be satisfied by any tuple.
Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.
Semantic Analysis
For these queries, could construct:
A relation connection graph.
Normalized attribute connection graph.
Relation connection graph
Create node for each relation and node for result.
Create edges between two nodes that represent a join, and edges between nodes that represent projection.
If not connected, query is incorrectly formulated.
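The connectivity test on a relation connection graph can be sketched as a plain graph traversal. The node names and the missing join in the second example are made up for illustration.

```python
def is_connected(nodes, edges):
    """Return True if every node is reachable from the first node."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacent = {n: set() for n in nodes}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in adjacent[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(nodes)

nodes = ["Client", "Viewing", "PropertyForRent", "Result"]
# Two join edges plus a projection edge to the result: graph is connected
good = [("Client", "Viewing"), ("Viewing", "PropertyForRent"), ("PropertyForRent", "Result")]
# Join between Viewing and PropertyForRent omitted: query is incorrectly formulated
bad = [("Client", "Viewing"), ("PropertyForRent", "Result")]
print(is_connected(nodes, good), is_connected(nodes, bad))
```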
Simplification
Detects redundant qualifications,
eliminates common sub-expressions,
transforms query to semantically equivalent but more easily and efficiently computed form.
Typically, access restrictions, view definitions, and integrity constraints are considered.
Assuming user has appropriate access privileges, first apply well-known idempotency rules of Boolean algebra.
Transformation Rules for RA Operations
Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).
σ_{p ∧ q ∧ r}(R) = σ_p(σ_q(σ_r(R)))
Sometimes referred to as cascade of Selection.
For example:
σ_{branchNo='B003' ∧ salary>15000}(Staff) = σ_{branchNo='B003'}(σ_{salary>15000}(Staff))
Transformation Rules for RA Operations
Commutativity of Selection.
σ_p(σ_q(R)) = σ_q(σ_p(R))
For example:
σ_{branchNo='B003'}(σ_{salary>15000}(Staff)) = σ_{salary>15000}(σ_{branchNo='B003'}(Staff))
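Cascade and commutativity of Selection can be checked on a small in-memory relation. The sample Staff rows and the `select` helper are invented for this sketch.

```python
def select(pred, relation):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if pred(t)]

# Made-up sample rows
STAFF = [
    {"staffNo": "S1", "branchNo": "B003", "salary": 18000},
    {"staffNo": "S2", "branchNo": "B003", "salary": 12000},
    {"staffNo": "S3", "branchNo": "B005", "salary": 30000},
]
in_b003 = lambda t: t["branchNo"] == "B003"
well_paid = lambda t: t["salary"] > 15000

conjunctive = select(lambda t: in_b003(t) and well_paid(t), STAFF)
cascade_a = select(in_b003, select(well_paid, STAFF))   # salary test applied first
cascade_b = select(well_paid, select(in_b003, STAFF))   # branch test applied first
print(conjunctive == cascade_a == cascade_b)
```

All three forms return the same tuples, so the optimizer is free to apply the more restrictive predicate first.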
Transformation Rules for RA Operations
In a sequence of Projection operations, only the last in the sequence is required.
Π_L Π_M … Π_N(R) = Π_L(R)
For example:
Π_{lName} Π_{branchNo, lName}(Staff) = Π_{lName}(Staff)
Transformation Rules for RA Operations
Commutativity of Selection and Projection.
If predicate p involves only attributes in projection list, Selection and Projection operations commute:
Π_{A1, …, Am}(σ_p(R)) = σ_p(Π_{A1, …, Am}(R)) where p ∈ {A1, A2, …, Am}
For example:
Π_{fName, lName}(σ_{lName='Beech'}(Staff)) = σ_{lName='Beech'}(Π_{fName, lName}(Staff))
Transformation Rules for RA Operations
Commutativity of Theta join (and Cartesian product).
R ⋈_p S = S ⋈_p R
R × S = S × R
Rule also applies to Equijoin and Natural join.
For example:
Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch = Branch ⋈_{Staff.branchNo=Branch.branchNo} Staff
Transformation Rules for RA Operations
Commutativity of Selection and Theta join (or Cartesian product).
If selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute:
σ_p(R ⋈_r S) = (σ_p(R)) ⋈_r S
σ_p(R × S) = (σ_p(R)) × S
where p ∈ {A1, A2, …, An}
Transformation Rules for RA Operations
If selection predicate is conjunctive predicate having form (p ∧ q), where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as:
σ_{p ∧ q}(R ⋈_r S) = (σ_p(R)) ⋈_r (σ_q(S))
σ_{p ∧ q}(R × S) = (σ_p(R)) × (σ_q(S))
Transformation Rules for RA Operations
For example:
σ_{position='Manager' ∧ city='London'}(Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch) =
(σ_{position='Manager'}(Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (σ_{city='London'}(Branch))
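Pushing the two selections below the join can be illustrated on tiny relations. The rows, the `select` helper, and the nested-loop `join` are assumptions made for this sketch, not the way a DBMS implements these operators.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]

def join(left, right, cond):
    # Nested-loop theta join; merged dicts stand in for concatenated tuples
    return [{**l, **r} for l in left for r in right if cond(l, r)]

# Made-up sample rows
STAFF = [
    {"staffNo": "S1", "position": "Manager", "branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "branchNo": "B1"},
    {"staffNo": "S3", "position": "Manager", "branchNo": "B2"},
]
BRANCH = [
    {"branchNo": "B1", "city": "London"},
    {"branchNo": "B2", "city": "Glasgow"},
]
same_branch = lambda s, b: s["branchNo"] == b["branchNo"]

# Selection applied after the join
late = select(lambda t: t["position"] == "Manager" and t["city"] == "London",
              join(STAFF, BRANCH, same_branch))
# Selections pushed down to each relation before the join
early = join(select(lambda t: t["position"] == "Manager", STAFF),
             select(lambda t: t["city"] == "London", BRANCH),
             same_branch)
print(late == early)
```

Both forms return the same tuples, but the second joins 2 Staff tuples with 1 Branch tuple instead of 3 with 2.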
Transformation Rules for RA Operations
Commutativity of Projection and Theta join (or Cartesian product).
If projection list is of form L = L1 ∪ L2, where L1 only has attributes of R, and L2 only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:
Π_{L1 ∪ L2}(R ⋈_r S) = (Π_{L1}(R)) ⋈_r (Π_{L2}(S))
Transformation Rules for RA Operations
If join condition contains additional attributes not in L (M = M1 ∪ M2 where M1 only has attributes of R, and M2 only has attributes of S), a final projection operation is required:
Π_{L1 ∪ L2}(R ⋈_r S) = Π_{L1 ∪ L2}((Π_{L1 ∪ M1}(R)) ⋈_r (Π_{L2 ∪ M2}(S)))
Transformation Rules for RA Operations
For example:
Π_{position, city, branchNo}(Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch) =
(Π_{position, branchNo}(Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (Π_{city, branchNo}(Branch))
and using the latter rule:
Π_{position, city}(Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch) =
Π_{position, city}((Π_{position, branchNo}(Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (Π_{city, branchNo}(Branch)))
Transformation Rules for RA Operations
Commutativity of Union and Intersection (but not set difference).
R ∪ S = S ∪ R
R ∩ S = S ∩ R
Transformation Rules for RA Operations
Commutativity of Selection and set operations (Union, Intersection, and Set difference).
σ_p(R ∪ S) = σ_p(R) ∪ σ_p(S)
σ_p(R ∩ S) = σ_p(R) ∩ σ_p(S)
σ_p(R − S) = σ_p(R) − σ_p(S)
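A quick check with Python sets shows Selection distributing over the three set operations. Because set difference is not commutative, the operands must keep their order on both sides. The integer "tuples" and the predicate are arbitrary stand-ins.

```python
def select(pred, rel):
    return {t for t in rel if pred(t)}

# Arbitrary stand-in relations and predicate
R = {1, 2, 3, 4, 5, 6}
S = {4, 5, 6, 7, 8}
even = lambda x: x % 2 == 0

union_ok = select(even, R | S) == select(even, R) | select(even, S)
intersection_ok = select(even, R & S) == select(even, R) & select(even, S)
difference_ok = select(even, R - S) == select(even, R) - select(even, S)
# Swapping the operands of the difference gives a different relation
swapped = select(even, S) - select(even, R)
print(union_ok, intersection_ok, difference_ok, swapped)
```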
Transformation Rules for RA Operations
Commutativity of Projection and Union.
Π_L(R ∪ S) = Π_L(S) ∪ Π_L(R)
Associativity of Union and Intersection (but not Set difference).
(R ∪ S) ∪ T = S ∪ (R ∪ T)
(R ∩ S) ∩ T = S ∩ (R ∩ T)
Transformation Rules for RA Operations
Associativity of Theta join (and Cartesian product).
Cartesian product and Natural join are always associative:
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R × S) × T = R × (S × T)
If join condition q involves attributes only from S and T, then Theta join is associative:
(R ⋈_p S) ⋈_{q ∧ r} T = R ⋈_{p ∧ r} (S ⋈_q T)
Transformation Rules for RA Operations
For example:
(Staff ⋈_{Staff.staffNo=PropertyForRent.staffNo} PropertyForRent) ⋈_{ownerNo=Owner.ownerNo ∧ Staff.lName=Owner.lName} Owner =
Staff ⋈_{Staff.staffNo=PropertyForRent.staffNo ∧ Staff.lName=lName} (PropertyForRent ⋈_{ownerNo} Owner)
Example 23.3 Use of Transformation Rules
For prospective renters of flats, find properties that match requirements and are owned by CO93.
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = 'Flat' AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND
p.ownerNo = 'CO93';
Example 23.3 Use of Transformation Rules
Heuristical Processing Strategies
Perform Selection operations as early as possible.
Keep predicates on same relation together.
Combine Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.
Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations are executed first.
Heuristical Processing Strategies
Perform Projection as early as possible.
Keep projection attributes on same relation together.
Compute common expressions once.
If common expression appears more than once, and
result not too large, store result and reuse it when
required.
Useful when querying views, as same expression is used
to construct view each time.
Cost Estimation for RA Operations
Many different ways of implementing RA operations.
Aim of QO is to choose most efficient one.
Use formulae that estimate costs for a number of options, and select one with lowest cost.
Consider only cost of disk access, which is usually dominant cost in QP.
Many estimates are based on cardinality of the relation, so need to be able to estimate this.
Database Statistics
Success of estimation depends on amount and
currency of statistical information DBMS holds.
Keeping statistics current can be problematic.
If statistics are updated every time a tuple is changed, this would impact performance.
DBMS could update statistics on a periodic basis,
for example nightly, or whenever the system is
idle.
Query Optimization in Oracle
Oracle supports two approaches to query optimization: rule-based and cost-based.
Rule-based
15 rules, ranked in order of efficiency. Particular access path for a table only chosen if statement contains a predicate or other construct that makes that access path available.
Score assigned to each execution strategy using these rankings and strategy with best (lowest) score selected.
QO in Oracle: Rule-Based
When 2 strategies have same score, tie-break resolved by making decision based on order in which tables occur in the SQL statement.
QO in Oracle: Rule-Based Example
SELECT propertyNo
FROM PropertyForRent
WHERE rooms > 7 AND city = 'London'
Single-column access path using index on city from WHERE condition (city = 'London'). Rank 9.
Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11.
Full table scan. Rank 15.
Although there is an index on propertyNo, column does not appear in WHERE clause and so is not considered by optimizer.
Based on these paths, rule-based optimizer will choose to use index based on city column.
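The choice among the three available access paths amounts to picking the lowest rank. The list below simply restates the ranks from this example; the `choose_path` helper is hypothetical and does not reflect Oracle internals.

```python
# (description, rank) pairs restated from the example; lower rank is better
ACCESS_PATHS = [
    ("full table scan", 15),
    ("unbounded range scan using index on rooms", 11),
    ("single-column access path using index on city", 9),
]

def choose_path(paths):
    """Pick the available access path with the best (lowest) rank."""
    return min(paths, key=lambda path: path[1])

print(choose_path(ACCESS_PATHS))
```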
QO in Oracle: Cost-Based
To improve QO, Oracle introduced cost-based optimizer in Oracle 7, which selects strategy that requires minimal resource use necessary to process all rows accessed by query (avoiding above tie-break anomaly).
User can select whether minimal resource usage is based on throughput or based on response time, by setting the OPTIMIZER_MODE initialization parameter.
Cost-based optimizer also takes into consideration hints that the user may provide.
QO in Oracle: Statistics
Cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by query.
It is the user's responsibility to generate these statistics and keep them current.
Package DBMS_STATS can be used to generate and manage statistics.
Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.
QO in Oracle: Histograms
Previously made assumption that data values within columns of a table are uniformly distributed.
Histogram of values and their relative frequencies gives optimizer improved selectivity estimates in presence of non-uniform distribution.
QO in Oracle: Histograms
(a) uniform distribution of rooms; (b) actual non-uniform distribution.
(a) can be stored compactly as low value (1) and high value (10), and as total count of all frequencies (in this case, 100).
QO in Oracle: Histograms
Histogram is data structure that can improve estimates of number of tuples in result.
Two types of histogram:
width-balanced histogram, which divides data into a fixed number of equal-width ranges (called buckets) each containing count of number of values falling within that bucket;
height-balanced histogram, which places approximately same number of values in each bucket so that end points of each bucket are determined by how many values are in that bucket.
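The two histogram types can be sketched as follows. The frequency table for rooms is invented (100 values between 1 and 10), and the helpers assume integer values and a value count divisible by the number of buckets.

```python
def width_balanced(values, low, high, buckets):
    """Count the values falling in each of `buckets` equal-width ranges over [low, high]."""
    width = (high - low + 1) / buckets
    counts = [0] * buckets
    for v in values:
        counts[min(int((v - low) / width), buckets - 1)] += 1
    return counts

def height_balanced(values, buckets):
    """Return the upper end point of each bucket when every bucket holds the same number of values."""
    ordered = sorted(values)
    size = len(ordered) // buckets  # assumes len(values) is divisible by buckets
    return [ordered[(i + 1) * size - 1] for i in range(buckets)]

# Invented non-uniform distribution: rooms value -> frequency (total 100)
FREQUENCIES = {1: 5, 2: 5, 3: 5, 4: 10, 5: 10, 6: 10, 7: 20, 8: 25, 9: 5, 10: 5}
ROOMS = [value for value, count in FREQUENCIES.items() for _ in range(count)]

print(width_balanced(ROOMS, 1, 10, 5))   # counts for ranges 1-2, 3-4, 5-6, 7-8, 9-10
print(height_balanced(ROOMS, 5))         # end points of buckets holding 20 values each
```

In the width-balanced form the bucket boundaries are fixed and the counts vary; in the height-balanced form the counts are fixed and the end points crowd together where the data is dense.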
QO in Oracle: Histograms
(a) width-balanced for rooms with 5 buckets. Each bucket of equal width with 2 values (1-2, 3-4, etc.)
(b) height-balanced: height of each column is 20 (100/5).
QO in Oracle: Viewing Execution Plan