Chapter 13 Query Chapter 13 Query Processing Processing Melissa Jamili CS 157B November 11, 2004
Dec 31, 2015
Chapter 13 Query ProcessingChapter 13 Query Processing
Melissa Jamili
CS 157B
November 11, 2004
OutlineOutline
OverviewMeasures of query costSelection operation– Basic algorithms– Selections using indices– Selections involving comparisons– Implementation of complex selections
Sorting
OverviewOverview
Query processing – the range of activities involved in extracting data from a database.
– Includes: translation of queries, query-optimizing transformations, evaluation of queries.
Overview (cont.)Overview (cont.)
Steps involved in processing a query:
1) Parsing and translation
2) Optimization
3) Evaluation
Step 1. Parsing and translationStep 1. Parsing and translation
System checks the syntax of the query.
Creates a parse-tree representation of the query.
Translates the query into a relational-algebra expression.
Step 2. OptimizationStep 2. Optimization
Each relational-algebra operation can be executed by one of several different algorithms. Ex. Every tuple can be searched linearly
or a B+ tree can be used to index the tuples.
A query optimizer must know the cost of each operation.
Step 3. EvaluationStep 3. Evaluation
A relational-algebra expression and evaluation primitive are needed.– evaluation primitive - relational-algebra
expression annotated with instructions specifying how to evaluate each operation.
Step 3. Evaluation (cont.)Step 3. Evaluation (cont.)
Query execution plan or query-evaluation plan – sequence of primitive operations that can be used to evaluate the expression.
Π balance (σ balance < 2500 (account))
Step 3. Evaluation (cont.)Step 3. Evaluation (cont.)
Then the query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
Steps in query processingSteps in query processing
Measures of Query CostMeasures of Query Cost
Measured in terms of different resources, including disk accesses and CPU execution time
Measured in response time for a query-evaluation plan (clock time to execute the plan)
Measures of Query Cost (cont.)Measures of Query Cost (cont.)
Disk accesses, number of block transfers from disk, are usually the most important cost
To calculate add these numbers:1) # of seek operations performed2) # of blocks read3) # of blocks writtenAfter multiplying them by the average seek time,
average transfer time, and average transfer time, respectively.
Selection OperationSelection Operation
File scan is the lowest-level operator to access data.
File scans are search algorithms that locate and retrieve records that fulfill a selection condition.
Allows an entire relation to be read where the relation is stored in a single, dedicated file.
Basic AlgorithmsBasic Algorithms
Two basic scan algorithms to implement the selection operation:
A1 (linear search). – The system scans each file block and tests all records to
see whether they satisfy the selection condition.– System terminates if the required record is found,
without looking at the other records. A2 (binary search).
– System performs the binary search on the blocks of the file.
– File is ordered on an attribute, and selection condition is an equality comparison.
Selections Using IndicesSelections Using Indices
Index structures referred to a access paths, since they provide a path through which data can be located and accessed.
Primary index – allows records to be read in an order that corresponds to the physical order in the file.
Secondary index – any index that is not a primary index.
Selections Using Indices (cont.)Selections Using Indices (cont.)
A3 (primary index, equality on key).– Use the index to retrieve a single record
that satisfies the corresponding equality condition.
A4 (primary index, equality on nonkey).– Same as A3, but multiple records may
need to be fetched.
Selections Using Indices (cont.)Selections Using Indices (cont.)
A5 (secondary index, equality).– Retrieve a single record if the equality
condition is on a key.– Retrieve multiple records if the indexing
field is not a key.
Selections Involving ComparisonsSelections Involving Comparisons
Consider the form σA ≥v (r). Can be implemented by linear or binary
search or by using indices: A6 (primary index, comparison).– For A ≥ v, look for first tuple that has the value
of A = v, return all the tuples starting from the tuple up to the end of the file.
– For A < v, file scan from the beginning up to (but not including) the first tuple with attribute A = v.
Selections Involving Comparisons Selections Involving Comparisons (cont.)(cont.)A7 (secondary index, comparison).– The lowest-level blocks are scanned• From the smallest value up to v for < and ≤.• From v up to the maximum value for > and
≥.
Implementation of Complex Implementation of Complex SelectionsSelectionsPreviously only considered simple
selection conditions with an equality or comparison operation.
Now consider more complex selection predicates.
Implementation of Complex Implementation of Complex Selections (cont.)Selections (cont.)Conjunction: σθ1 ∩ θ2 ∩ … ∩ θn (r)
Disjunction: σθ1 ∪ θ2 … ∪ ∪ θn (r)
Negation: σ-θ (r)
Implementation of Complex Implementation of Complex Selections (cont.)Selections (cont.)A8 (conjunctive selection using one
index).– Determine if there is an access path for
an attribute in one of the simple conditions.
– If yes, algorithms A2 through A7 can be used.
– On all records retrieved test if they satisfy the remaining simple conditions.
Implementation of Complex Implementation of Complex Selections (cont.)Selections (cont.)A9 (conjunctive selection using
composite index (multiple attributes).– Search index directly if selection
specifies an equality condition on 2 or move attributes and a composite index exists.
– Type of index determines if A3, A4, or A5 will be used.
Implementation of Complex Implementation of Complex Selections (cont.)Selections (cont.) A10 (conjunction selection by intersection
of identifiers).– Requires indices with record pointers.– Scan each index for pointers that satisfy an
individual condition.– Take intersection of all retrieved pointers to
get set of pointers that satisfy the conjunctive condition.
– Then pointers used to retrieve actual records.– If indices not available on all conditions, then
algorithm tests retrieved records against remaining conditions.
Implementation of Complex Implementation of Complex Selections (cont.)Selections (cont.)A11 (disjunctive selection by union
of identifiers).– Each index is scanned for pointers to
tuples that satisfy the individual condition.
– Union of all the retrieved pointers gives the set of pointers to all tuples that satisfy the disjunctive condition.
– Pointers then used to retrieve actual records.
SortingSorting
Focus on external sorting where relations that are bigger than memory.
Most common is external merge-sort algorithm.
Sorting (cont.)Sorting (cont.)
N-way merge– A number of runs
are created and sorted.
– Runs are merged.
ConclusionConclusion
Query processing - translation, transformation, and evaluation.
Query cost measured by number of disk accesses.
Selection operation algorithmsSorting – N-way merge.