Top Banner
You’re Smarter than a Database Overcoming the optimizer’s bad cardinality estimates
53

Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Mar 26, 2015

Download

Documents

Julia Reed
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

You’re Smarter than a Database

Overcoming the optimizer’s bad cardinality estimates

Page 2: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

About me

Bobby Durrett US Foodservice [email protected] Scripts in

http://www.geocities.com/bobbyandmarielle/sqltuning.zip

Page 3: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

What you know

Page 4: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

What the database knows

Page 5: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Before SQL Example - mainframe Datacom/DB COBOL

List index names Write loops

read a from one index i1 where one.c=10

while more table one rows exist get next row

read b from two index i2 where two.a = one.a

while more table two rows exist get next row

print one.a,two.b

end while

end while

Page 6: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

SQL Tell what you want, not how to get it

select one.a,two.b

from

one,two

where

one.c=10 and one.a=two.a;

Page 7: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Pre-SQL versus SQL

Pre-SQL code very efficient – runs in megabytes – VSE mainframe COBOL

Labor intensive SQL can be inefficient – runs in

gigabytes (if you are lucky!) Much more productive – do in

minutes what took hours before – create tables

Page 8: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

What the database doesn’t know

Optimizer has a limited set of statistics that describe the data

It can miscalculate the number of rows a query will return, its cardinality

A cardinality error can lead optimizer to choose a slow way to run the SQL

Page 9: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Example plan/Cardinality-------------------------------------------------| Id | Operation | Name | Rows | Cost-------------------------------------------------| 0 | SELECT STATEMENT | | 10 | 3|* 1 | TABLE ACCESS FULL| TEST1 | 10 | 3------------------------------------------------- Plan = how Oracle will run your query Rows = how many rows optimizer thinks

that step will return Cost = estimate of time query will take, a

function of the number of rows

Page 10: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

How to fix cardinality problems Find out if it really is a cardinality

issue Determine the reason it occurred

Single column Multiple columns

Choose a strategy Give the optimizer more information Override optimizer decision Change the application

Page 11: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Four examples

Four examples of how the optimizer calculates cardinality

Full scripts and their outputs on portal, pieces on slides – edited for simplicity

Page 12: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 1: Find out if it really is a cardinality issue

Example 1 Data

select a,count(*) from test1 group by a;

A COUNT(*) ---------- ---------- 1 10

Query

select * from test1 where a=1;

Page 13: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 1: Find out if it really is a cardinality issue

Get estimated cardinality from plan-------------------------------------------| Id | Operation | Name | Rows |-------------------------------------------| 0 | SELECT STATEMENT | | 10 ||* 1 | TABLE ACCESS FULL| TEST1 | 10 |-------------------------------------------

Do query for actual number of rowsselect count(*) from test1 where a=1;

Page 14: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 1: Find out if it really is a cardinality issue

Plan is a tree – find cardinality and select count(*) on part of query represented by that part of plan. join

table join

table table

Page 15: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Unequal distribution of data: Within a single column

Last name “Smith” or “Jones”

Among multiple columns – Address Zipcode and State

Page 16: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Example 2 - Unequal distribution of values in a single column 1,000,000 rows with value 1 1 row with value 2

select a,count(*) from TEST2 group by a;

A COUNT(*)

---------- ----------

1 1000000

2 1

Page 17: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

SQL statement – returns one row

select * from TEST2 where a=2;

Page 18: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Plan with wrong number of rows = 500,000 Full scan instead of range scan – 100 times

slower

---------------------------------------------| Operation | Name | Rows |---------------------------------------------| SELECT STATEMENT | | 500K|| INDEX FAST FULL SCAN| TEST2INDEX | 500K|---------------------------------------------

Page 19: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Column statistics – two distinct values

LOW HIGH NUM_DISTINCT---------- ---------- ------------ 1 2 2

Table statistic – total # of rows – 1,000,001

NUM_ROWS---------- 1000001

Page 20: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Rows in plan = (rows in table)/ (distinct values of column)

500000=1000001/2 Optimizer knew that there were

only two values – assumed they had equal number of rows

Page 21: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Example 3 - Combinations of column values not equally distributed 1,000,000 rows with values 1,1 1,000,000 rows with values 2,2 1 row with value 1,2 ~ Equal numbers of 1’s and 2’s in each

column A B COUNT(*)---------- ---------- ----------

1 1 1000000

1 2 1

2 2 1000000

Page 22: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

SQL statement – retrieves one row

select sum(a+b)

from TEST3

where

a=1 and b=2;

Page 23: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Plan with wrong number of rows = 500,000 Inefficient full scan

----------------------------------------------

| Operation | Name | Rows |

----------------------------------------------

| SELECT STATEMENT | | 1 |

| SORT AGGREGATE | | 1 |

| INDEX FAST FULL SCAN| TEST3INDEX | 500K|

----------------------------------------------

Page 24: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Column statistics

C LOW HIGH NUM_DISTINCT- ---------- ---------- ------------A 1 2 2B 1 2 2

Table statistic – total # of rows – 2,000,001

NUM_ROWS---------- 2000001

Page 25: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

Rows in plan = (rows in table)/ (distinct values A * distinct values B)

500000=2000001/(2 * 2) Optimizer assumes all four

combinations (1,1),(1,2),(2,1),(2,2) equally likely

Page 26: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 2: Understand the reason for the wrong cardinality

How to tell which assumption is in play? Select count(*) each column

select a,count(*) from TEST3 group by a;

select b,count(*) from TEST3 group by b;

count(*) each column combination

select a,b,count(*) from TEST3

group by a,b;

Page 27: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Giving the optimizer more information Histograms SQL Profiles

Overriding optimizer decisions Hints

Changing the application

Try to use optimizer as much as possible to minimize development work

Page 28: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Giving the optimizer more information – using histograms Works for unequal distribution within

a single column A histogram records the distribution

of values within a column in up to 254 “buckets”

Works best on columns with fewer than 255 distinct values

Page 29: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Run gather_table_stats command to get histograms on the column – 254 is max number of buckets

method_opt=>'FOR ALL COLUMNS SIZE 254'

Page 30: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Plan for Example 2 with correct number of rows with histogram Uses range scan-----------------------------------------

| Operation | Name | Rows |

-----------------------------------------

| SELECT STATEMENT | | 1 |

| INDEX RANGE SCAN| TEST2INDEX | 1 |

-----------------------------------------

Page 31: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Column statistics – two buckets

LOW HIGH NUM_DISTINCT NUM_BUCKETS---------- ---------- ------------ ----------- 1 2 2 2

Table statistic – unchanged

NUM_ROWS---------- 1000001

Page 32: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Time without histograms (1 second):

Elapsed: 00:00:01.00

Time with histograms(1/100th second):

Elapsed: 00:00:00.01

Page 33: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Giving the optimizer more information – using SQL Profiles Works for unequal distribution among

multiple columns Includes information about the relationship

between columns in the SQL – correlated columns or predicates

Page 34: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

SQL Tuning Advisor gathers statistics on the columns

...DBMS_SQLTUNE.CREATE_TUNING_TASK(...

...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(...

Accept the SQL Profile it creates to use the new statistics

...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (...

Page 35: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Example 3 plan with correct number of rows = 1 using SQL profile

--------------------------------------------------| Operation | Name | Rows | Bytes |--------------------------------------------------| SELECT STATEMENT | | 1 | 6 || SORT AGGREGATE | | 1 | 6 || INDEX RANGE SCAN| TEST3INDEX | 1 | 6 |-------------------------------------------------|

Page 36: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Time without a profile (1 second):

Elapsed: 00:00:01.09

Time with a profile(1/100th second):

Elapsed: 00:00:00.01

Page 37: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Overriding optimizer decisions – using hints Example 4 has unequal distribution of

column values across two tables – histograms and SQL Profiles don’t work

Hint forces index range scan Small amount of additional code – not like

Cobol on mainframe

Page 38: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Example 4 - SMALL table MANY relates to 1 – there are many rows

with value 1 FEW relates to 2 – there are few with value 2

insert into SMALL values ('MANY',1);

insert into SMALL values ('FEW',2);

Page 39: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Example 4 - LARGE table: 1,000,000 rows with value 1 1 row with value 2 NUM COUNT(*)---------- ----------

1 1000000

2 1

Page 40: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

SQL statement – returns one row

select B.NUM

from SMALL A,LARGE B

where

A.NUM=B.NUM and

A.NAME='FEW';

Page 41: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Plan with wrong number of rows = 125,000

----------------------------------------------| Operation | Name | Rows |----------------------------------------------| SELECT STATEMENT | | 125K|| HASH JOIN | | 125K|| TABLE ACCESS FULL | SMALL | 1 || INDEX FAST FULL SCAN| LARGEINDEX | 1000K|----------------------------------------------

Page 42: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Column statistics – two buckets on all columns – using histograms

LOW HIGH NUM_DISTINCT NUM_BUCKETS---------- ---------- ------------ -----------

1 2 2 2

LOW HIGH NUM_DISTINCT NUM_BUCKETS

---- ---- ------------ -----------

FEW MANY 2 2

Page 43: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Table statistics – SMALL has 2 rows, LARGE 1000001

NUM_ROWS----------

2

NUM_ROWS----------

1000001

Page 44: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

125000=1000001/8 Optimizer appears to assume all

eight combinations of the three columns’ values are equally likely

Can’t verify formula – references don’t include formula with histograms

Even worse without histograms – cardinality is 500000

Page 45: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

No SQL profile from SQL Tuning Advisor:

There are no recommendations to improve the statement.

Neither histograms nor SQL profiles help example 4

Page 46: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Statement with hints: Use index Don’t do full scan

select /*+ INDEX(B LARGEINDEX) NO_INDEX_FFS(B LARGEINDEX) */ B.NUMfrom SMALL A,LARGE Bwherea.NUM=B.NUM andA.NAME='FEW';

Page 47: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Time without a hint (1 second):

Elapsed: 00:00:01.03

Time with a hint (1/100th second):

Elapsed: 00:00:00.01

Page 48: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Changing the application Change your tables so that the optimizer gets

your SQL’s cardinality right Requires more work designing tables, but

keeps productivity benefits of SQL

Page 49: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Example 4 – moved NAME column to LARGE table and split table in two One million (‘MANY’,1) rows in LARGEA One (‘FEW’,2) row in LARGEB Query:

select NUMfrom (select * from largeaunionselect * from largeb)whereNAME='FEW';

Page 50: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Plan is just as efficient as with hint: Number of rows = 2 (reality is 1) Range Scan

--------------------------------------------------------------| Id | Operation | Name | Rows |--------------------------------------------------------------| 0 | SELECT STATEMENT | | 2 || 1 | VIEW | | 2 || 2 | SORT UNIQUE | | 2 || 3 | UNION-ALL | | || 4 | TABLE ACCESS BY INDEX ROWID| LARGEA | 1 ||* 5 | INDEX RANGE SCAN | LARGEAINDEX | 1 || 6 | TABLE ACCESS BY INDEX ROWID| LARGEB | 1 ||* 7 | INDEX RANGE SCAN | LARGEBINDEX | 1 |--------------------------------------------------------------

Page 51: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Step 3: Choose the best strategy for fixing the cardinality problem

Time without table change (1 second):

Elapsed: 00:00:01.03

Time with table change (1/100th second):

Elapsed: 00:00:00.01

Page 52: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

Conclusion

SQL improves productivity, optimizer has limits Identify cases where cardinality is wrong Understand why the database got it wrong

One column Multiple columns

Choose best strategy to fix Give optimizer more info Override optimizer’s choices Redesign tables

Page 53: Youre Smarter than a Database Overcoming the optimizers bad cardinality estimates.

References Cost Based Optimizer Fundamentals, Jonathan Lewis

Metalink Note:212809.1, Limitations of the Oracle Cost Based Optimizer

Metalink Note:68992.1, Predicate Selectivity

Histograms – Myths and Facts, Wolfgang Breitling Select Journal, Volume 13, Number 3