Optimization, Auto-Tuning, and Introduction to Transactions Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 25, 2003 slide content courtesy Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke

Dec 22, 2015


Optimization, Auto-Tuning, and Introduction to Transactions

Zachary G. Ives
University of Pennsylvania

CIS 550 – Database & Information Systems

November 25, 2003

Some slide content courtesy Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke


Administrivia

It’s nearly the end!
– Homework 7 due next Tuesday 12/2
– Projects due next Thurs. 12/4: please sign up to give me a demo that day
– Final exam handed out 12/4
– Final exam and project report due 12/18

Projects will be graded both on quality of the project and the quality of the report – writing is always important!


Recap of Query Optimization

Plan: a tree of relational algebra (R.A.) operators, with a choice of algorithm for each operator. Each operator is typically implemented using a ‘pull’ interface: when an operator is ‘pulled’ for its next output tuple, it ‘pulls’ on its inputs and computes that tuple.
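The ‘pull’ interface can be sketched with Python generators: each `next()` call on the root of the plan makes every operator beneath it produce just enough input for one more output tuple. This is a minimal sketch under stated assumptions: the operator functions `scan`, `select` and `project` and the three Sailors rows are illustrative, not System R code.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

def scan(table: List[Row]) -> Iterator[Row]:
    """Leaf operator: hands out stored tuples one at a time."""
    for row in table:
        yield row

def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pulls tuples from its input until one satisfies the predicate."""
    for row in child:
        if pred(row):
            yield row

def project(child: Iterator[Row], cols: List[str]) -> Iterator[Row]:
    """Pulls one input tuple per output tuple and keeps only `cols`."""
    for row in child:
        yield {c: row[c] for c in cols}

# Hypothetical Sailors instance.
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]

# Plan tree: project[sname](select[rating > 7](scan(Sailors)))
plan = project(select(scan(sailors), lambda r: r["rating"] > 7), ["sname"])
print(next(plan))   # {'sname': 'lubber'}: dustin was pulled and rejected on the way
print(list(plan))   # [{'sname': 'rusty'}]
```

Nothing is computed until the root is pulled, which is what lets a chain of such operators pipeline tuples without materializing intermediate results.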

Two main issues:
– For a given query, what plans are considered?
   – Algorithm to search plan space for cheapest (estimated) plan.
– How is the cost of a plan estimated?

Ideally: Want to find best plan. Practically: Avoid worst plans!

Our focus is on the approach from “System R”.


Highlights of System R Optimizer

Impact: Most widely used currently; works well for < 10 joins.

Cost estimation: Approximate art at best. Statistics, maintained in system catalogs, are used to estimate the cost of operations and result sizes. Considers a combination of CPU and I/O costs.

Plan Space: Too large, must be pruned!
– Break the query into blocks.
– Only the space of left-deep plans will be considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
– Cartesian products are avoided.


First Need to Divide into Query Blocks

An SQL query is parsed into a collection of query blocks, and these are optimized one block at a time.

Nested blocks are usually treated as calls to a subroutine, made once per outer tuple. (This is an over-simplification, but serves for now.)

SELECT S.sname
FROM Sailors S
WHERE S.age IN (SELECT MAX (S2.age)
                FROM Sailors S2
                GROUP BY S2.rating)

(Outer block: the main SELECT over Sailors S. Nested block: the subquery inside IN.)

For each block, the plans considered are:
– All available access methods, for each relation in the FROM clause.
– Possible join trees.


Recall: The Core Idea in System R

For computing the most effective way of joining tables, use dynamic programming, which builds subresults and then uses these to construct successively bigger results:
– Find the cheapest ways of accessing each table
– Find the cheapest way of joining every pair of tables
– Find the cheapest way of joining every pair of tables with a 3rd table (reusing the cheapest way of getting that pair)
– Find the cheapest way of joining every triple of tables with a 4th table (reusing the cheapest way of joining that triple)
– and so on, until all the tables in the block are joined
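The dynamic-programming steps above can be sketched in Python for left-deep plans. Assumptions to flag: the cardinalities in `CARD`, the selectivities in `SEL`, and the cost model (sum of estimated intermediate result sizes, with access costs ignored) are all made up for illustration; System R's real cost formulas combine I/O and CPU estimates.

```python
from itertools import combinations

# Hypothetical catalog statistics (not from the slides).
CARD = {"A": 1000, "B": 100, "C": 10, "D": 5000}
SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.001}

def est_size(rels):
    """Estimated size of joining `rels`: product of cardinalities times the
    selectivity of each predicate whose two tables are both in `rels`
    (independence assumption). Pairs without a predicate act as Cartesian products."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for pair, s in SEL.items():
        if pair <= rels:
            size *= s
    return size

def best_left_deep_plan(tables):
    # best[S] = (cost, plan) of the cheapest left-deep plan found for table set S.
    best = {frozenset([t]): (0.0, t) for t in tables}   # access costs ignored
    for k in range(2, len(tables) + 1):
        for subset in combinations(tables, k):
            s = frozenset(subset)
            for inner in sorted(s):                          # table joined last
                outer_cost, outer_plan = best[s - {inner}]   # reuse cheapest (k-1)-way plan
                cost = outer_cost + est_size(s)
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, (outer_plan, inner))
    return best[frozenset(tables)]

cost, plan = best_left_deep_plan(["A", "B", "C", "D"])
print(plan, round(cost))   # ((('D', 'C'), 'B'), 'A') 5550
```

Each k-way entry is built only from stored (k-1)-way entries, so no plan for a subset is ever recomputed; the nested tuples read as left-deep trees, e.g. ((D ⋈ C) ⋈ B) ⋈ A.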


Still Not Quite Enough…

Problem 1: Query blocks also have selection, projection, grouping!

Problem 2: We still have to consider too many alternative plans!

Problem 3: Sorting causes funny things to happen in the dynamic programming approach – it doesn’t easily account for amortization of a sort across multiple joins


Heuristic 1: Selections, Projections, Groupings

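What do we know is generally the case about selection & projection operations? (They usually shrink their input, so it generally pays to apply them as early as possible, i.e., to “push them down”.)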

ORDER BY, GROUP BY, aggregates etc. handled as a final step


Heuristic: Left-Deep Join Trees

Fundamental decision in System R: only left-deep join trees are considered.
– As the number of joins increases, the number of alternative plans grows rapidly; we need to restrict the search space.
– Left-deep trees allow us to generate all fully pipelined plans: intermediate results are not written to temporary files.
– Not all left-deep trees are fully pipelined (e.g., a sort-merge join).

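[Figure: three example join trees over relations A, B, C and D; only the first is left-deep, and the last is a bushy tree that joins C ⋈ D with B ⋈ A.]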
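A quick count shows why the restriction matters. Assuming every join order is allowed (Cartesian products included) and that swapping a join's two inputs counts as a different plan, there are n! left-deep trees over n relations but (2n-2)!/(n-1)! binary join trees of any shape (n! leaf orders times the Catalan number of tree shapes). The helper names are illustrative.

```python
from math import factorial

def left_deep_trees(n: int) -> int:
    """Left-deep join trees over n relations: one per permutation of the relations."""
    return factorial(n)

def all_join_trees(n: int) -> int:
    """Binary join trees of any shape over n relations:
    n! leaf orders x Catalan(n-1) shapes = (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (2, 4, 6, 8, 10):
    print(n, left_deep_trees(n), all_join_trees(n))
# for n = 10: 3628800 left-deep trees vs 17643225600 trees in total
```

Even the left-deep space grows factorially, which is why the dynamic programming that follows matters: it never enumerates the orders one by one.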


“Interesting Orders”

Dynamic programming doesn’t account for amortization of a sort across multiple joins. We need to fix this!

Solution: Figure out all of the possible orderings that might be useful in the plan (for joining or grouping), and create a separate “layer” in the DP table for these. At every point in the DP algorithm:
– Find the cheapest join that maintains the order
– Find the cheapest join that doesn’t maintain the order, using both the ordered and unordered alternatives
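One step of this fix can be sketched as a pruning function that keeps the cheapest plan overall plus the cheapest plan in each interesting-order layer, instead of a single cheapest plan per set of relations. The plan names, costs, and the fixed `SORT_COST` are hypothetical numbers chosen only to show why the more expensive sorted plan has to survive pruning.

```python
from typing import NamedTuple, Optional

class Plan(NamedTuple):
    desc: str
    cost: float
    order: Optional[str]   # attribute the output is sorted on, or None

SORT_COST = 300  # hypothetical cost of sorting the join result

def prune(candidates, interesting):
    """Keep the cheapest plan regardless of order, plus the cheapest plan in each
    interesting-order layer."""
    kept = {None: min(candidates, key=lambda p: p.cost)}
    for order in interesting:
        ordered = [p for p in candidates if p.order == order]
        if ordered:
            kept[order] = min(ordered, key=lambda p: p.cost)
    return kept

def finish(kept, needed_order):
    """Pick the final plan, paying for a sort when the needed order is missing."""
    def total(p):
        return p.cost + (0 if p.order == needed_order else SORT_COST)
    best = min(kept.values(), key=total)
    return best.desc, total(best)

candidates = [
    Plan("HashJoin(Sailors, Reserves)", 500, None),
    Plan("SortMergeJoin(Sailors, Reserves)", 650, "sid"),
]
# Suppose a later GROUP BY sid needs output ordered on sid.
print(finish(prune(candidates, set()), "sid"))    # ('HashJoin(Sailors, Reserves)', 800)
print(finish(prune(candidates, {"sid"}), "sid"))  # ('SortMergeJoin(Sailors, Reserves)', 650)
```

Without the extra layer the sort-merge plan is discarded as locally more expensive, and the final plan pays for a separate sort.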


Final Details of Query Block Optimization

First, joins and Cartesian products are enumerated: an (N-1)-way plan is not combined with an additional relation if there is no join condition between them, unless all predicates in WHERE have been used up.
– i.e., avoid Cartesian products if possible

Selections and projections are “pushed down”.

The final ORDER BY is applied last.

In spite of pruning the plan space and using heuristics, this approach is still exponential in the number of tables.
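The ‘avoid Cartesian products’ rule can be sketched as a filter on which relation the enumerator tries next. This is a simplified reading of the rule: it falls back to a Cartesian product only when no remaining relation shares a join predicate with the plan built so far. Modelling each predicate as the set of relation names it mentions is an assumption, and `extensions` and `has_join_predicate` are hypothetical helpers.

```python
def has_join_predicate(joined, candidate, predicates):
    """True if some join predicate links `candidate` to a relation already in `joined`."""
    return any(candidate in p and (p - {candidate}) & joined for p in predicates)

def extensions(joined, all_rels, predicates):
    """Relations worth joining next to the plan over `joined`."""
    remaining = sorted(r for r in all_rels if r not in joined)
    linked = [r for r in remaining if has_join_predicate(joined, r, predicates)]
    return linked if linked else remaining   # Cartesian product only as a last resort

rels = {"Sailors", "Reserves", "Boats"}
preds = [frozenset({"Sailors", "Reserves"}), frozenset({"Reserves", "Boats"})]
print(extensions({"Sailors"}, rels, preds))                  # ['Reserves']: Sailors x Boats skipped
print(extensions({"Sailors", "Reserves"}, rels, preds[:1]))  # ['Boats'], as a Cartesian product
```

The second call has no predicate mentioning Boats, so the only way to finish the plan is a Cartesian product.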


Nested Queries

Nested block is optimized independently, with the outer tuple considered as providing a selection condition.

Outer block is optimized with the cost of `calling’ nested block computation taken into account.

Implicit ordering of these blocks means that some good strategies are not considered. The non-nested version of the query is typically optimized better.

SELECT S.sname
FROM Sailors S
WHERE EXISTS (SELECT *
              FROM Reserves R
              WHERE R.bid=103 AND R.sid=S.sid)

Nested block to optimize:
SELECT *
FROM Reserves R
WHERE R.bid=103 AND R.sid = outer value

Equivalent non-nested query:
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid AND R.bid=103
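The two formulations can be compared directly in SQLite through Python's standard `sqlite3` module; the tiny Sailors/Reserves instance is made up. One caveat worth making explicit: the join form returns a sailor once per matching reservation, so it matches the EXISTS form exactly only if duplicates are removed (hence the `DISTINCT` variant below) or a sailor cannot reserve boat 103 more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (22, 'dustin', 7, 45.0), (31, 'lubber', 8, 55.5), (58, 'rusty', 10, 35.0);
INSERT INTO Reserves VALUES (22, 101, '1998-10-10'), (31, 103, '1998-11-06'),
                            (31, 103, '1998-11-12'), (58, 103, '1998-11-12');
""")

nested = """SELECT S.sname FROM Sailors S
            WHERE EXISTS (SELECT * FROM Reserves R
                          WHERE R.bid = 103 AND R.sid = S.sid)"""
flat = """SELECT S.sname FROM Sailors S, Reserves R
          WHERE S.sid = R.sid AND R.bid = 103"""
flat_distinct = flat.replace("SELECT S.sname", "SELECT DISTINCT S.sname")

def names(sql):
    return sorted(row[0] for row in conn.execute(sql))

print(names(nested))         # ['lubber', 'rusty']
print(names(flat))           # ['lubber', 'lubber', 'rusty']: lubber reserved boat 103 twice
print(names(flat_distinct))  # ['lubber', 'rusty']
```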


Query Optimization Recapped

Query optimization is an important task in a relational DBMS

– Must understand optimization in order to understand the performance impact of a given database design (relations, indexes) on a workload (set of queries)
– Additionally, may need to do “hand optimization”

Two parts to optimizing a query:
– Consider a set of alternative plans
   – Heuristics for simpler operators
   – Must prune search space; typically, left-deep plans only
– Must estimate cost of each plan that is considered
   – Must estimate size of result and cost for each plan node
   – Key issues: statistics, indexes, operator implementations
   – PITFALL: often the estimates of intermediate results AREN’T good!!!


The Bigger Picture: Tuning

We saw that indexes and optimization decisions were critical to performance
– Homeworks 6 and 7 tried to demonstrate some of that

Also important: buffer pool sizes, layout of data on disk, isolation levels (discussed shortly)

Many DBAs and consultants have made a living off understanding query workloads, data, and estimated intermediate result sizes
– They “tune” DBs as a specialty
– … Though this career MIGHT be diminishing in significance…


Autonomic & Auto-Tuning DBMSs

Hot research topic: self-tuning and adaptive DBMSs

SQL Server and DB2 have “Index Wizards” that take a query workload and try to find an optimal set of indices for it
– Basically, they try lots of combinations of indices to find one that works well

“Adaptive query processing” systems also try to figure out where the optimizer’s estimates “went wrong” and compensate for it:
– Change the query plan in the middle of execution, or
– Make a note so we pick a better plan next time!
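The ‘try lots of combinations’ idea behind an index wizard can be sketched as an exhaustive search over subsets of candidate indexes, scored by a cost function. Everything here is a hypothetical stand-in: the workload, the candidate columns, and especially `workload_cost`, whose made-up unit costs replace the optimizer estimates a real tool would rely on.

```python
from itertools import combinations

# Hypothetical workload: (columns a query filters on, how often it runs).
WORKLOAD = [({"Sailors.rating"}, 50), ({"Reserves.bid"}, 30), ({"Sailors.age"}, 5)]
CANDIDATES = ["Sailors.rating", "Reserves.bid", "Sailors.age"]
SCAN_COST, INDEX_COST, MAINTENANCE = 100, 10, 500   # made-up unit costs

def workload_cost(indexes):
    """Toy cost: a query is cheap if an index covers a column it filters on, else it
    scans; every index adds a fixed maintenance charge for updates."""
    total = 0
    for cols, freq in WORKLOAD:
        total += freq * (INDEX_COST if cols & set(indexes) else SCAN_COST)
    return total + MAINTENANCE * len(indexes)

def recommend(max_indexes):
    """Try every subset of at most `max_indexes` candidates and keep the cheapest."""
    subsets = (c for k in range(max_indexes + 1) for c in combinations(CANDIDATES, k))
    best = min(subsets, key=workload_cost)
    return sorted(best), workload_cost(best)

print(recommend(3))   # (['Reserves.bid', 'Sailors.rating'], 2300)
print(recommend(1))   # (['Sailors.rating'], 4500)
```

The rarely used `Sailors.age` index is left out because, under these made-up costs, its upkeep outweighs what it saves; the number of subsets doubles with every candidate, so exhaustive search only works for tiny candidate lists.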


Switching Gears…

We’ve spent a lot of time talking about querying data

Yet updates are a really major part of many DBMS applications
– Particularly important: ensuring ACID properties

– Atomicity: each operation looks atomic to the user
– Consistency: each operation in isolation keeps the database in a consistent state (this is the responsibility of the user)
– Isolation: should be able to understand what’s going on by considering each separate transaction independently
– Durability: updates stay in the DBMS!!!


What is a transaction?

A transaction is a sequence of read and write operations on data items that logically functions as one unit of work:
– it should either be done entirely or not at all
– if it succeeds, the effects of write operations persist (commit); if it fails, no effects of write operations persist (abort)
– these guarantees are made despite concurrent activity in the system, and despite failures that may occur


How things can go wrong

Suppose we have a table of bank accounts which contains the balance of each account. A deposit of $50 to a particular account # 1234 would be written as:

update Accounts
set balance = balance + 50
where account# = '1234';

This reads and writes the account’s balance. What if two owners of the account make deposits simultaneously?


Concurrent deposits

This SQL update code is represented as a sequence of read and write operations on “data items” (which for now should be thought of as individual accounts):

Deposit 1                  Deposit 2
read(X.bal)                read(X.bal)
X.bal := X.bal + $50       X.bal := X.bal + $10
write(X.bal)               write(X.bal)

Here, X is the data item representing the account with account# 1234.


A “bad” concurrent execution

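But only one “action” (e.g., a read or a write) can happen at a time, and there are a variety of ways in which the two deposits could be interleaved. For example (time runs downward):

Deposit 1                  Deposit 2
read(X.bal)
                           read(X.bal)
X.bal := X.bal + $50
                           X.bal := X.bal + $10
write(X.bal)
                           write(X.bal)

BAD! Both deposits read the same original balance, so Deposit 2’s write overwrites Deposit 1’s and the $50 deposit is lost.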
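To see how many orderings go wrong, the sketch below enumerates every interleaving of the two deposits' three steps (read, add, write) that keeps each deposit's own steps in order, and runs each one. The starting balance of $100 and the `run` and `interleavings` helpers are assumptions for illustration; the amounts $50 and $10 come from the example.

```python
from itertools import combinations

def run(schedule, amounts, start=100):
    """Execute one interleaving. schedule[i] says which deposit (0 or 1) takes its
    next step; each deposit does read(X.bal), X.bal := X.bal + amount, write(X.bal)."""
    db = {"X.bal": start}
    local = [0, 0]   # each deposit's private copy of X.bal
    step = [0, 0]    # index of each deposit's next step
    for t in schedule:
        if step[t] == 0:
            local[t] = db["X.bal"]          # read(X.bal)
        elif step[t] == 1:
            local[t] += amounts[t]          # X.bal := X.bal + amount
        else:
            db["X.bal"] = local[t]          # write(X.bal)
        step[t] += 1
    return db["X.bal"]

def interleavings(n=3):
    """All merges of two n-step sequences that keep each sequence's own order."""
    for slots in combinations(range(2 * n), n):
        yield [0 if i in slots else 1 for i in range(2 * n)]

amounts = (50, 10)
results = [run(s, amounts) for s in interleavings()]
print(len(results), sum(r == 160 for r in results))   # 20 2: only the two serial orders are correct
print(run([0, 1, 0, 1, 0, 1], amounts))                # 110: Deposit 1's $50 is lost
```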


A “good” execution

The previous execution would have been fine if the accounts were different (i.e., one were X and one were Y).

The following execution is a serial execution, and executes one transaction after the other (time runs downward):

Deposit 1                  Deposit 2
read(X.bal)
X.bal := X.bal + $50
write(X.bal)
                           read(X.bal)
                           X.bal := X.bal + $10
                           write(X.bal)

GOOD!


Good executions

An execution is “good” if it is serial (i.e., the transactions are executed one after the other) or serializable (i.e., equivalent to some serial execution).

This execution is equivalent to executing Deposit 1 then Deposit 3, or vice versa (time runs downward):

Deposit 1                  Deposit 3
read(X.bal)
                           read(Y.bal)
X.bal := X.bal + $50
                           Y.bal := Y.bal + $10
write(X.bal)
                           write(Y.bal)
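A standard mechanical check for executions like these is the conflict-serializability test, a sufficient (though stricter) condition for being equivalent to some serial execution: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj (same data item, at least one of them a write), and accept if the resulting precedence graph has no cycle. Encoding a schedule as (transaction, op, item) triples, and leaving out the local arithmetic steps, are assumptions of this sketch.

```python
def conflict_serializable(schedule):
    """schedule: (transaction, op, item) triples in execution order, op is 'r' or 'w'.
    Returns True if the precedence graph is acyclic (conflict-serializable)."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))          # ti's operation came first
    nodes = {t for t, _, _ in schedule}
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for u, v in edges:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == len(nodes)

good = [("D1", "r", "X"), ("D3", "r", "Y"), ("D1", "w", "X"), ("D3", "w", "Y")]
bad = [("D1", "r", "X"), ("D2", "r", "X"), ("D1", "w", "X"), ("D2", "w", "X")]
print(conflict_serializable(good), conflict_serializable(bad))   # True False
```

In the bad schedule, D1 reads X before D2 writes it and D2 reads X before D1 writes it, giving edges in both directions and therefore a cycle.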


Atomicity

Problems can also occur if a crash occurs in the middle of executing a transaction:

Need to guarantee that the write to X does not persist (ABORT)
– Default assumption if a transaction doesn’t commit

Transfer
read(X.bal)
read(Y.bal)
X.bal = X.bal - $100
        CRASH
Y.bal = Y.bal + $100
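Python's `sqlite3` module can illustrate the ABORT default: used as a context manager, a connection commits when the block finishes normally and rolls back if an exception escapes. Here a raised exception stands in for the crash, which is a simplification (recovering from a real crash relies on the DBMS's log, not shown); the `transfer` helper and the balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (acct TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Accounts VALUES ('X', 500), ('Y', 200)")
conn.commit()

def transfer(amount, crash_midway=False):
    try:
        with conn:  # commit if the block completes, roll back if an exception escapes
            conn.execute("UPDATE Accounts SET balance = balance - ? WHERE acct = 'X'", (amount,))
            if crash_midway:
                raise RuntimeError("simulated crash after the write to X")
            conn.execute("UPDATE Accounts SET balance = balance + ? WHERE acct = 'Y'", (amount,))
    except RuntimeError:
        pass  # the transaction has been rolled back

def balances():
    return conn.execute("SELECT acct, balance FROM Accounts ORDER BY acct").fetchall()

transfer(100, crash_midway=True)
print(balances())   # [('X', 500), ('Y', 200)]: the write to X did not persist
transfer(100)
print(balances())   # [('X', 400), ('Y', 300)]
```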


Transactions in SQL

A transaction begins when any SQL statement that queries the db begins.

To end a transaction, the user issues a COMMIT or ROLLBACK statement.

Transfer:
UPDATE Accounts SET balance = balance - 100 WHERE account# = '1234';
UPDATE Accounts SET balance = balance + 100 WHERE account# = '5678';
COMMIT;
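The same transfer can be run against SQLite from Python, once ending in COMMIT and once in ROLLBACK. Two assumptions to flag: the column is named `acct` to avoid relying on `#` being legal in an unquoted identifier (this varies across systems), and the connection is opened with `isolation_level=None` so that the `sqlite3` module never begins transactions implicitly. In that autocommit mode SQLite treats each statement as its own transaction, so grouping the two updates needs the explicit BEGIN below.

```python
import sqlite3

# isolation_level=None: no implicit BEGIN, so the statements below control transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Accounts (acct TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Accounts VALUES ('1234', 500), ('5678', 200)")

def balances():
    return conn.execute("SELECT acct, balance FROM Accounts ORDER BY acct").fetchall()

# Transfer that ends with COMMIT: both updates persist.
conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET balance = balance - 100 WHERE acct = '1234'")
conn.execute("UPDATE Accounts SET balance = balance + 100 WHERE acct = '5678'")
conn.execute("COMMIT")
print(balances())   # [('1234', 400), ('5678', 300)]

# Transfer that ends with ROLLBACK: neither update persists.
conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET balance = balance - 100 WHERE acct = '1234'")
conn.execute("UPDATE Accounts SET balance = balance + 100 WHERE acct = '5678'")
conn.execute("ROLLBACK")
print(balances())   # [('1234', 400), ('5678', 300)]
```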


Read-only transactions

When a transaction only reads information, we have more freedom to let the transaction execute in parallel with other transactions.

We signal this to the system by stating:

SET TRANSACTION READ ONLY;
SELECT * FROM Accounts WHERE account# = '1234';
...


Read-write transactions

If we state “read-only”, then the transaction cannot perform any updates:

SET TRANSACTION READ ONLY;
UPDATE Accounts SET balance = balance - 100 WHERE account# = '1234';   -- ILLEGAL!
...

Instead, we must specify that the transaction may update (the default):

SET TRANSACTION READ WRITE;
UPDATE Accounts SET balance = balance - 100 WHERE account# = '1234';
...
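SQLite has no SET TRANSACTION READ ONLY statement; as an assumption-laden stand-in, the sketch uses its connection-level `PRAGMA query_only`, under which attempts to CREATE, DROP, INSERT, UPDATE or DELETE are rejected with a read-only error. It shows the read succeeding, the update being refused, and the update going through once writes are re-enabled. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)   # autocommit mode
conn.execute("CREATE TABLE Accounts (acct TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Accounts VALUES ('1234', 500)")

def balance():
    return conn.execute("SELECT balance FROM Accounts WHERE acct = '1234'").fetchone()[0]

conn.execute("PRAGMA query_only = ON")    # stand-in for READ ONLY
print(balance())                          # 500: reads are still allowed
try:
    conn.execute("UPDATE Accounts SET balance = balance - 100 WHERE acct = '1234'")
    update_rejected = False
except sqlite3.Error as exc:              # SQLite reports a read-only error
    update_rejected = True
    print("rejected:", exc)

conn.execute("PRAGMA query_only = OFF")   # stand-in for READ WRITE
conn.execute("UPDATE Accounts SET balance = balance - 100 WHERE acct = '1234'")
print(balance())                          # 400
```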


Dirty reads

Dirty data is data written by an uncommitted transaction; a dirty read is a read of dirty data.

Sometimes dirty reads are acceptable, other times they are not:

e.g., if we wished to ensure balances never went negative in the transfer example, we should test that there is enough money first!


“Bad” dirty read

EXEC SQL select balance into :bal
         from Accounts
         where account# = '1234';
if (bal > 100) {
    EXEC SQL update Accounts
             set balance = balance - 100
             where account# = '1234';
    EXEC SQL update Accounts
             set balance = balance + 100
             where account# = '5678';
}
EXEC SQL COMMIT;

If the initial read (the select into :bal) were dirty, the balance could become negative: the value it saw might have been written by a transaction that later aborts.
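The danger can be replayed with a toy store that, like a system permitting dirty reads, lets readers see uncommitted writes. `ToyAccount` and the sequence of events (another transaction deposits $100 without committing, then aborts) are hypothetical; the point is only that the check passes on a value that never officially existed.

```python
class ToyAccount:
    """Hypothetical single-version store with no isolation: readers see uncommitted writes."""
    def __init__(self, balance):
        self.committed = balance
        self.visible = balance           # what any reader currently sees

    def write_uncommitted(self, value):
        self.visible = value

    def abort(self):
        self.visible = self.committed    # undo the uncommitted write

acct = ToyAccount(balance=50)
acct.write_uncommitted(150)   # T2 deposits $100 but has not committed
bal = acct.visible            # our select into :bal reads 150: a dirty read
acct.abort()                  # T2 aborts; the balance is 50 again
if bal > 100:
    acct.visible -= 100       # update ... set balance = balance - 100
print(acct.visible)           # -50
```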


Acceptable dirty read

However, if we are just checking availability of an airline seat, a dirty read might be fine!

Reservation transaction:
EXEC SQL select occupied into :occ
         from Flights
         where Num = '123' and date = '11-03-99' and seat = '23f';
if (!occ) {
    EXEC SQL update Flights
             set occupied = true
             where Num = '123' and date = '11-03-99' and seat = '23f';
}
else {
    notify user that seat is unavailable
}


Other phenomena

Unrepeatable read: a transaction reads the same data item twice and gets different values.

Phantom problem: a transaction retrieves a collection of tuples twice and sees different results.


Phantom Problem Example

T1: “find the oldest climber who is either MED or EXP”
T2: “insert a new EXP climber aged 96, then insert a new MED climber aged 60”

Suppose that T1 locks all data pages with some EXP climber and finds that the oldest is 85. Then T2 executes, inserting the new EXP climber on a page not locked by T1. T1 then completes, locking all pages with some MED climber and finding that the oldest MED climber is 60 (whereas the previous oldest MED climber had been 40).
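Reading T1 as reporting the oldest EXP climber and the oldest MED climber, the sketch below compares the interleaving with both serial orders. The initial table is made up except for the ages 85 and 40 from the example. T1 ends up seeing 85 and 60, a pair that neither serial order can produce, because T2's new rows were phantoms that T1's page locks never covered.

```python
# Hypothetical initial data consistent with the example: oldest EXP 85, oldest MED 40.
INITIAL = [("ann", "EXP", 85), ("bo", "EXP", 70), ("cy", "MED", 40)]

def oldest(rows, level):
    return max(age for _, lvl, age in rows if lvl == level)

def t2(rows):
    rows.append(("dee", "EXP", 96))   # new EXP climber aged 96
    rows.append(("eve", "MED", 60))   # new MED climber aged 60

db = list(INITIAL)
t1_then_t2 = (oldest(db, "EXP"), oldest(db, "MED"))
t2(db)

db = list(INITIAL)
t2(db)
t2_then_t1 = (oldest(db, "EXP"), oldest(db, "MED"))

db = list(INITIAL)
exp = oldest(db, "EXP")   # T1 scans the EXP climbers first
t2(db)                    # T2 runs in between; its rows land outside T1's locks
med = oldest(db, "MED")   # T1 then scans the MED climbers
interleaved = (exp, med)

print(t1_then_t2, t2_then_t1, interleaved)   # (85, 40) (96, 60) (85, 60)
```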