Access Path Selection in a RDBMS Shahram Ghandeharizadeh Computer Science Department University of Southern California.


System R

Grand-daddy of RDBMSs.
Started in 1975 at IBM San Jose Research Lab.
Won the ACM Software System Award in 1988.
Introduced fundamental database concepts such as SQL, locking, logging, cost-based query optimization techniques, etc.

Four Phases of SQL Processing

Parsing
  Checks for correct SQL syntax.
  Computes the list of items to be retrieved, the table(s) referenced, and the boolean combination of simple predicates.

Optimization
  Looks up the tables in the database catalog for their existence, statistics, and available access paths.
  Computes the execution plan with minimum cost.
  Output: execution plan in the Access Specification Language (ASL).

Code generation
  The code generator is a table-driven program that translates ASL trees into machine language code.
  The parse tree is replaced by executable machine code and its associated data structures. This code can be stored in the database for later execution.

Execution
  Executes the machine code by invoking the System R internal storage system (RSS) via the storage system interface (RSI) to scan each of the physically stored relations referenced by the query.

Research Storage System (RSS)

Maintains physical storage of relations and access paths on these relations.
Implements locking and logging.
RSS represents a relation as:
  A collection of tuples stored in 4KB pages.
  Columns of a tuple are physically contiguous.
  No tuple spans a page.
Pages are organized into logical units called segments.
  Segments may contain one or more relations.
  Each tuple is tagged with the identification of the relation to which it belongs.

  No relation spans more than one segment.

RSS (Cont…)

Access tuples using a scan: OPEN, NEXT, and CLOSE. A scan returns a tuple at a time.
Supports two types of scans:
  1. Segment scan: finds all tuples of a relation. All non-empty pages of a segment are referenced only once.
  2. Index scan: B+-trees.

Optimizer

Formulates a cost prediction for each access plan, using the following cost formula:

  COST = Page fetches + W * (RSI Calls)

W is an adjustable weighting factor between I/O and CPU.
RSI calls is an approximation for CPU utilization.
Assumptions:
  The WHERE tree is considered to be in conjunctive normal form.
  Every conjunct is called a boolean factor.
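The cost formula can be sketched in a few lines. This is only an illustration: the function name `plan_cost`, the two candidate plans, and all of the numbers (including W = 0.02) are assumptions made up for the example, not values from System R.

```python
def plan_cost(page_fetches: float, rsi_calls: float, w: float) -> float:
    """COST = page fetches + W * (RSI calls)."""
    return page_fetches + w * rsi_calls


# Hypothetical plans; the counts below are invented for illustration only.
W = 0.02
scan_plan = plan_cost(page_fetches=100, rsi_calls=1000, w=W)
index_plan = plan_cost(page_fetches=30, rsi_calls=200, w=W)

# The optimizer keeps the plan with the smallest predicted cost.
cheapest = min(
    [("segment scan", scan_plan), ("index scan", index_plan)],
    key=lambda plan: plan[1],
)
print(cheapest[0])
```

A larger W makes the optimizer favor plans that pass fewer tuples across the RSI, while a smaller W makes it favor plans with fewer page fetches.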

Optimizer (Motivation)

Given a query, there are many ways to execute it. The optimizer must identify the best execution plan.

Example:

  SELECT name, title, sal
  FROM Emp, Job
  WHERE Emp.Job = Job.Job
    AND Title = 'CLERK'

Decide the order in which to perform the different operators:
  Process "Title = 'CLERK'" followed by the join, or
  Process the join "Emp.Job = Job.Job" followed by "Title = 'CLERK'".
Decide which access path to use: segment scan, clustered index, non-clustered index.
Decide the join algorithm: nested-loops versus merge-scan.

This paper tries to answer all the above questions!

How?

Enumerate the different execution plans.
Estimate the cost of performing each plan.
Pick the cheapest plan.

What is the definition of cost?

  COST = Page fetches + W * (RSI Calls)

Conjunctive Normal Form

A formula is in conjunctive normal form (CNF) if it is a conjunction of clauses:
  A AND B
  ~A AND (B OR C)
  (A OR B) AND (D OR ~E)

Is ~(B OR C) in CNF?
  Fix it by carrying the negation inside: ~B AND ~C

How about (A AND B) OR C?
  Transform it to (A OR C) AND (B OR C)

CNF

Why?
  Every tuple returned to the user must satisfy every boolean factor.
  If a tuple fails a boolean factor, discard it from further consideration.
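A minimal sketch of why CNF is convenient: a WHERE clause becomes a list of boolean factors, and a tuple is discarded as soon as one factor fails. The names `satisfies_where` and `factors`, and the sample predicate (major = 'CS' OR major = 'EE') AND gpa > 3.5, are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
BooleanFactor = Callable[[Row], bool]


def satisfies_where(row: Row, factors: List[BooleanFactor]) -> bool:
    """A row qualifies only if it satisfies every boolean factor.

    all() stops at the first factor that returns False, which mirrors
    discarding the tuple from further consideration.
    """
    return all(factor(row) for factor in factors)


# Hypothetical WHERE clause in CNF: (major = 'CS' OR major = 'EE') AND gpa > 3.5
factors: List[BooleanFactor] = [
    lambda r: r["major"] in ("CS", "EE"),
    lambda r: r["gpa"] > 3.5,
]

students = [
    {"name": "Bob", "gpa": 3.7, "major": "CS"},
    {"name": "Tom", "gpa": 3.2, "major": "EE"},
    {"name": "Kathy", "gpa": 3.8, "major": "LS"},
]
qualifying = [s["name"] for s in students if satisfies_where(s, factors)]
print(qualifying)
```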

Database Catalog

System R maintains statistics for each relation T:
  NCARD(T), number of records in T.
  TCARD(T), number of pages in the segment that hold tuples of T.
  P(T), fraction of data pages in the segment that hold tuples of relation T:
    P(T) = TCARD(T) / (# of non-empty pages in the segment)
For each index I on relation T:
  ICARD(I), number of distinct keys in index I.
  NINDX(I), number of pages in index I.

Maintenance of Statistics

Selectivity Factor (F)

Corresponds to the expected fraction of tuples which will satisfy the predicate.

Column = value
  F = 1 / ICARD(column index) with an index, assuming an even distribution of tuples among the index key values.
  F = 1 / 10 otherwise.
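The equality rule can be written as a small function. This is a sketch; the name `eq_selectivity` and the convention of passing `None` when no index exists are assumptions for the example.

```python
from typing import Optional


def eq_selectivity(icard: Optional[int]) -> float:
    """Selectivity factor F for 'column = value'.

    icard is ICARD of an index on the column, or None when the column
    has no index (an assumed convention for this sketch).
    """
    if icard:  # an index exists and has at least one distinct key
        return 1.0 / icard
    return 1.0 / 10.0


print(eq_selectivity(11))    # index with 11 distinct keys
print(eq_selectivity(None))  # no index: default guess of 1/10
```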

Clustered Index

Assume a student table: Student(name, age, gpa, major)

t(Student) = 16 (number of tuples)

P(Student) = 4 (number of pages)

Bob, 21, 3.7, CS

Mary, 24, 3, ECE

Tom, 20, 3.2, EE

Kathy, 18, 3.8, LS

Kane, 19, 3.8, ME

Lam, 22, 2.8, ME

Chang, 18, 2.5, CS

Vera, 17, 3.9, EE

Louis, 32, 4, LS

Martha, 29, 3.8, CS

James, 24, 3.1, ME

Pat, 19, 2.8, EE

Chris, 22, 3.9, CS

Chad, 28, 2.3, LS

Leila, 20, 3.5, LS

Shideh, 16, 4, CS

[Figure: Number of Records per GPA. Vertical axis: number of records (0 to 4). Horizontal axis: actual GPA values 2.3, 2.5, 2.8, 3, 3.1, 3.2, 3.5, 3.7, 3.8, 3.9, 4.]

ESTIMATING NUMBER OF RESULTING RECORDS

For exact match selection predicates, assume a uniform distribution of records across the number of unique values. E.g., the selection predicate is gpa = 3.3.

For range selection predicates, assume a uniform distribution of records across the range of available values defined by min and max. In this case, one must think about the interval. E.g., gpa > 3.5.

[Figure: two charts over the GPA values 2.3, 2.5, 2.8, 3, 3.1, 3.2, 3.5, 3.7, 3.8, 3.9, 4; vertical axes 0 to 2 and 0 to 1.]
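As a worked example, the two estimates can be computed for the 16 Student records listed earlier and compared with the true counts. The variable names are assumptions for this sketch; the gpa values are copied from the Student table.

```python
# gpa values of the 16 Student records, in table order.
gpas = [3.7, 3, 3.2, 3.8, 3.8, 2.8, 2.5, 3.9, 4, 3.8, 3.1, 2.8, 3.9, 2.3, 3.5, 4]

ncard = len(gpas)          # number of records
distinct = len(set(gpas))  # number of unique gpa values
low, high = min(gpas), max(gpas)

# Exact match (gpa = 3.3): records spread evenly over the unique values.
est_eq = ncard / distinct
actual_eq = sum(1 for g in gpas if g == 3.3)

# Range (gpa > 3.5): records spread evenly over the interval [low, high].
est_gt = ncard * (high - 3.5) / (high - low)
actual_gt = sum(1 for g in gpas if g > 3.5)

print(f"gpa = 3.3: estimated {est_eq:.2f}, actual {actual_eq}")
print(f"gpa > 3.5: estimated {est_gt:.2f}, actual {actual_gt}")
```

The estimates (about 1.45 and 4.71 records) differ from the true counts (0 and 8) because the actual gpa values are not uniformly distributed.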


Column > value
  F = (high key value - value) / (high key value - low key value), as long as the column is an arithmetic type and value is known at access path selection time.
  F = 1/3 otherwise (column is not arithmetic).

Column < value
  F = (value - low key value) / (high key value - low key value), as long as the column is an arithmetic type and value is known at access path selection time.
  F = 1/3 otherwise (column is not arithmetic).

Value1 < Column < Value2
  F = (Value2 - Value1) / (high key value - low key value), as long as the column is arithmetic.
  F = 1/4 otherwise.
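The three range rules can be sketched as functions. The function names, the use of `None` to mean "column not arithmetic or value unknown", and the clamping of F to [0, 1] are all assumptions of this sketch, not part of the rules above.

```python
from typing import Optional


def _clamp(f: float) -> float:
    """Keep F inside [0, 1]; an assumption added for out-of-range values."""
    return max(0.0, min(1.0, f))


def gt_selectivity(value: Optional[float], low: float, high: float) -> float:
    """F for 'column > value'; value=None means non-arithmetic or unknown."""
    if value is None or high == low:
        return 1.0 / 3.0
    return _clamp((high - value) / (high - low))


def lt_selectivity(value: Optional[float], low: float, high: float) -> float:
    """F for 'column < value'; value=None means non-arithmetic or unknown."""
    if value is None or high == low:
        return 1.0 / 3.0
    return _clamp((value - low) / (high - low))


def between_selectivity(v1: Optional[float], v2: Optional[float],
                        low: float, high: float) -> float:
    """F for 'v1 < column < v2'; None bounds mean non-arithmetic or unknown."""
    if v1 is None or v2 is None or high == low:
        return 1.0 / 4.0
    return _clamp((v2 - v1) / (high - low))


# gpa ranges from 2.3 to 4.0 in the Student table.
print(gt_selectivity(3.5, 2.3, 4.0))
print(lt_selectivity(3.0, 2.3, 4.0))
print(between_selectivity(3.0, 3.5, 2.3, 4.0))
```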


Column in (list of values)

Join predicate, Column 1 = Column 2

Disjunctive predicate

Conjunctive predicate

Negation

Interesting order

A query block's GROUP BY or ORDER BY clauses may correspond to the order of records in an access path. This tuple order is an interesting order.

Example queries: Student(name, age, gpa, major) with a B+-tree on the gpa attribute

  SELECT name
  FROM Student
  WHERE gpa < 3.0
  ORDER BY gpa

  SELECT gpa, count(*)
  FROM Student
  WHERE gpa < 3.0
  GROUP BY gpa

B+-Tree

A B+-tree on the gpa attribute

[Figure: B+-tree on gpa over the 16 Student records (4 data pages, 4 records per page). Root separator key: 3.6. Leaf entries are (gpa, (page, slot)); the record identifiers follow key order.]

Index entries below 3.6:
  (2.3, (1,1))  (2.5, (1,2))  (2.8, (1,3))  (2.8, (1,4))
  (3, (2,1))  (3.1, (2,2))  (3.2, (2,3))  (3.5, (2,4))

Index entries above 3.6:
  (3.7, (3,1))  (3.8, (3,2))  (3.8, (3,3))  (3.8, (3,4))
  (3.9, (4,1))  (3.9, (4,2))  (4, (4,3))  (4, (4,4))

Single Relation Access Paths

Single relation access paths are simple selects with ORDER BY and GROUP BY clauses.

  SELECT name
  FROM Student
  WHERE age < 20

Without an index, must perform a segment scan. What is the cost?

  TCARD / P + W * RSISCAN

  TCARD(T), number of pages in the segment that hold tuples of T.
  P(T), fraction of data pages in the segment that hold tuples of relation T:
    P(T) = TCARD(T) / (# of non-empty pages in the segment)

Why divide by P? Tuples of Student might be intermixed with tuples of another table (e.g., professors) in the same segment, and a segment scan reads every non-empty page of the segment. Example: the student table with TCARD = 100 pages and P(T) = 0.75 requires reading 100 / 0.75 ≈ 133 pages. Note that P(T) = 1 when the student table is not intermixed with another table.
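The segment scan cost can be sketched as follows. Reading RSISCAN as the number of RSI calls made by the scan is an assumption based on the general cost formula; the function name and the W and RSI-call values are invented for the example.

```python
def segment_scan_cost(tcard: float, p: float, rsi_calls: float, w: float) -> float:
    """TCARD / P + W * (RSI calls made by the scan).

    tcard / p is the number of non-empty pages in the segment, all of
    which a segment scan must read.
    """
    return tcard / p + w * rsi_calls


# Page fetches only (W = 0) for the example on this slide.
shared_segment = segment_scan_cost(tcard=100, p=0.75, rsi_calls=0, w=0.0)
private_segment = segment_scan_cost(tcard=100, p=1.0, rsi_calls=0, w=0.0)
print(round(shared_segment, 1), private_segment)
```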


Cost of scanning leaf pages and data pages containing the qualifying records

[Figure: the same B+-tree on the gpa attribute as on the B+-Tree slide; root separator key 3.6, record identifiers in key order.]

Non-Clustered B+-Tree

A random I/O for every qualifying record.

[Figure: non-clustered B+-tree on the gpa attribute. Root separator key: 3.6. Leaf entries are (gpa, (page, slot)).]

Data pages (4 records per page):
  Page 1: Bob, 21, 3.7, CS; Mary, 24, 3, ECE; Tom, 20, 3.2, EE; Kathy, 18, 3.8, LS
  Page 2: Kane, 19, 3.8, ME; Lam, 22, 2.8, ME; Chang, 18, 2.5, CS; Vera, 17, 3.9, EE
  Page 3: Louis, 32, 4, LS; Martha, 29, 3.8, CS; James, 24, 3.1, ME; Pat, 19, 2.8, EE
  Page 4: Chris, 22, 3.9, CS; Chad, 28, 2.3, LS; Leila, 20, 3.5, LS; Shideh, 16, 4, CS

Index entries below 3.6:
  (2.3, (4,2))  (2.5, (2,3))  (2.8, (2,2))  (2.8, (3,4))
  (3, (1,2))  (3.1, (3,3))  (3.2, (1,3))  (3.5, (4,3))

Index entries above 3.6:
  (3.7, (1,1))  (3.8, (1,4))  (3.8, (2,1))  (3.8, (3,2))
  (3.9, (2,4))  (3.9, (4,1))  (4, (3,1))  (4, (4,4))
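To make the difference concrete, the sketch below counts data page fetches when the leaf entries of the two example trees are followed in key order. It assumes a buffer that holds only the most recently fetched data page, and it assumes an order among entries with equal keys (by page, then slot); both are choices made for this illustration. The function name `data_page_fetches` is hypothetical.

```python
from typing import Callable, List, Tuple

Entry = Tuple[float, Tuple[int, int]]  # (gpa, (page, slot))

# Leaf entries of the clustered example: record ids follow key order.
clustered: List[Entry] = [
    (2.3, (1, 1)), (2.5, (1, 2)), (2.8, (1, 3)), (2.8, (1, 4)),
    (3.0, (2, 1)), (3.1, (2, 2)), (3.2, (2, 3)), (3.5, (2, 4)),
    (3.7, (3, 1)), (3.8, (3, 2)), (3.8, (3, 3)), (3.8, (3, 4)),
    (3.9, (4, 1)), (3.9, (4, 2)), (4.0, (4, 3)), (4.0, (4, 4)),
]

# Leaf entries of the non-clustered example: record ids are scattered.
non_clustered: List[Entry] = [
    (2.3, (4, 2)), (2.5, (2, 3)), (2.8, (2, 2)), (2.8, (3, 4)),
    (3.0, (1, 2)), (3.1, (3, 3)), (3.2, (1, 3)), (3.5, (4, 3)),
    (3.7, (1, 1)), (3.8, (1, 4)), (3.8, (2, 1)), (3.8, (3, 2)),
    (3.9, (2, 4)), (3.9, (4, 1)), (4.0, (3, 1)), (4.0, (4, 4)),
]


def data_page_fetches(entries: List[Entry], qualifies: Callable[[float], bool]) -> int:
    """Count data page fetches with a one-page buffer (assumed)."""
    fetches = 0
    buffered_page = None
    for key, (page, _slot) in entries:
        if qualifies(key) and page != buffered_page:
            fetches += 1
            buffered_page = page
    return fetches


high_gpa = lambda gpa: gpa > 3.6
print(data_page_fetches(clustered, high_gpa))      # pages 3 and 4, once each
print(data_page_fetches(non_clustered, high_gpa))  # nearly one fetch per record
```

With 8 qualifying records, the clustered tree touches each of the 2 relevant data pages once, while the non-clustered tree re-fetches pages almost once per record.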

R EQUALITY JOIN S: R.A = S.A

Two algorithms for performing the join operator: nested loops and merge-scan.

Tuple nested loops:

  for each tuple r in R do
    for each tuple s in S do
      if r.A = s.A then output r,s in the result relation
    end-for
  end-for

Estimated cost of tuple nested loops:

  TCARD(R)/P(R) + [NCARD(R) × TCARD(S)/P(S)]
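The pseudocode and its cost estimate translate directly into a short sketch. Relations are modeled as lists of dictionaries, and the function names and sample statistics are assumptions for this example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def nested_loops_join(r: List[Row], s: List[Row], key: str = "A") -> List[Tuple[Row, Row]]:
    """Tuple nested loops: scan all of S once for every tuple of R."""
    result = []
    for r_row in r:
        for s_row in s:
            if r_row[key] == s_row[key]:
                result.append((r_row, s_row))
    return result


def nested_loops_page_fetches(tcard_r: float, p_r: float, ncard_r: float,
                              tcard_s: float, p_s: float) -> float:
    """TCARD(R)/P(R) + NCARD(R) * TCARD(S)/P(S)."""
    return tcard_r / p_r + ncard_r * (tcard_s / p_s)


R = [{"A": 1, "r": "r1"}, {"A": 2, "r": "r2"}]
S = [{"A": 2, "s": "s1"}, {"A": 3, "s": "s2"}, {"A": 2, "s": "s3"}]
pairs = nested_loops_join(R, S)
print([(r["r"], s["s"]) for r, s in pairs])

# Hypothetical statistics: R has 100 tuples on 10 pages, S occupies 20 pages.
print(nested_loops_page_fetches(10, 1.0, 100, 20, 1.0))
```

The second term dominates: S is scanned once per tuple of R, so the smaller relation is usually the better choice for the outer loop.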

EQUALITY JOIN (Cont…)

Merge-scan:
  1. Interesting order on R.A (sorted)
  2. Interesting order on S.A (sorted)
  3. Scan R and S in parallel, merging tuples with matching A values

Estimated cost of merge scan: NINDX(I_R) + NINDX(I_S)
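A minimal in-memory sketch of the three merge-scan steps follows. Here `sorted()` stands in for whatever access path delivers the interesting order (for example an index scan), which is an assumption of this sketch; the function name `merge_scan_join` is also hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge_scan_join(r: List[Row], s: List[Row], key: str = "A") -> List[Tuple[Row, Row]]:
    """Join two relations on `key` by scanning both in key order."""
    # Steps 1 and 2: obtain both inputs in order of the join column.
    r = sorted(r, key=lambda row: row[key])
    s = sorted(s, key=lambda row: row[key])

    result = []
    i = j = 0
    # Step 3: advance through R and S in parallel.
    while i < len(r) and j < len(s):
        a, b = r[i][key], s[j][key]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(r) and r[i_end][key] == a:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][key] == a:
                j_end += 1
            for r_row in r[i:i_end]:
                for s_row in s[j:j_end]:
                    result.append((r_row, s_row))
            i, j = i_end, j_end
    return result


R = [{"A": 2, "r": "r1"}, {"A": 1, "r": "r2"}, {"A": 2, "r": "r3"}]
S = [{"A": 2, "s": "s1"}, {"A": 3, "s": "s2"}, {"A": 2, "s": "s3"}]
pairs = merge_scan_join(R, S)
print(len(pairs))
```

Each input is read once, front to back, which is why the approach pays off when both inputs already arrive in the interesting order.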

N-Way Join

N-way joins are performed as a sequence of 2-way joins.
Utilize pipelining whenever appropriate.
The ordering of the joins is important. Consider all orderings such that:
  Join predicates relate the two participating tables together; do not consider cartesian products. For example, if the join clause is (R.A = S.A and R.B = T.B), then it would be a mistake to use the following clause: (S Cartesian product T) and R.A = ST.A and R.B = ST.B.
  Delay computation of cartesian products as much as possible.
Consider interesting orders in order to use merge-scan whenever possible.
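The rule about avoiding cartesian products can be illustrated with a brute-force sketch that enumerates left-to-right join orders and keeps only those in which every newly added table shares a join predicate with a table already joined. This enumeration over all permutations is a simplification chosen for clarity, not the search procedure of the paper, and the function name `join_orders` is an assumption.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def join_orders(tables: Sequence[str],
                join_predicates: Sequence[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Return join orders that never require a cartesian product."""
    linked = {frozenset(pair) for pair in join_predicates}
    orders = []
    for order in permutations(tables):
        # Every table after the first must join with some earlier table.
        connected = all(
            any(frozenset((table, earlier)) in linked for earlier in order[:k])
            for k, table in enumerate(order)
            if k > 0
        )
        if connected:
            orders.append(order)
    return orders


# Join clause from the slide: R.A = S.A and R.B = T.B
orders = join_orders(["R", "S", "T"], [("R", "S"), ("R", "T")])
for order in orders:
    print(" -> ".join(order))
```

Orders that start with S and T together are rejected, because no predicate relates S to T and joining them first would be a cartesian product.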

Search Space

Rather large search space for expressions joining several tables:

Heuristics prune the search space:

Nested Queries

Correlation subquery: a subquery with a reference to a value obtained from a candidate tuple of a higher level query block.

Non-correlation subqueries: evaluate the inner query once and use its results to process the outer query.
