Top Banner
Understanding Histograms in the Oracle Database Robert Gaydos Robert Gaydos R Wave Solutions [email protected] Copyright © R Wave Solutions 2008 All rights reserved
28

Understanding Histogram Ppt

Apr 27, 2015

Download

Documents

Sandeep Sharma
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Understanding Histogram Ppt

Understanding Histograms in the Oracle Database

Robert GaydosRobert GaydosR Wave [email protected]

Copyright © R Wave Solutions 2008 All rights reserved

Page 2: Understanding Histogram Ppt

Addenda

� What is a Histogram?� Types of Histograms� How to create Histograms� How to identify existing Histograms� Rules and tips when creating Histograms� Q & A

Copyright © R Wave Solutions 2008 All rights reserved

Page 3: Understanding Histogram Ppt

What is a Histogram?

� A histogram holds the data distribution of values within a column of a table.– Number of occurrences for a specific value/range – Number of occurrences for a specific value/range – Used by CBO to optimize a query.

� It is collected by using DBMS_STATS .– By default DBMS_STATS does not collect

histogram stats.� Min/Max Value and # of distinct values� NOTE: DBMS_STAT is used to delete histogram data.

Copyright © R Wave Solutions 2008 All rights reserved

Page 4: Understanding Histogram Ppt

Types of Histograms

� Two types of Histograms– frequency histograms

height-balanced histograms – height-balanced histograms – Type of histogram is stored in the HISTOGRAM

column of the *tab_col_statistics views (USER/DBA)

– Value = (‘HEIGHT BALANCED’, ‘FREQUENCY’, or ‘NONE’)

Copyright © R Wave Solutions 2008 All rights reserved

Page 5: Understanding Histogram Ppt

Histogram and Buckets

� When Histograms are created the number of buckets can be specified.

� It is this number that controls the type of � It is this number that controls the type of histogram created.

� # Buckets = # of Rows of information.� When Distinct Values <= Buckets

– Then Frequency Histogram is created– Else Height-Balanced Histogram is created

Copyright © R Wave Solutions 2008 All rights reserved

Page 6: Understanding Histogram Ppt

Frequency Histograms

� Each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of � Each bucket contains the number of occurrences of that single value.

Copyright © R Wave Solutions 2008 All rights reserved

Page 7: Understanding Histogram Ppt

Frequency Histograms Example

DataABC

SortedABB

ResultsBucket 1 A = 1Bucket 2 B = 3

CBCCCCCBDE

BBCCCCCCDE

Bucket 2 B = 3Bucket 3 C = 6Bucket 4 D = 1Bucket 5 E = 1

Copyright © R Wave Solutions 2008 All rights reserved

Page 8: Understanding Histogram Ppt

Height-Balanced Histograms

� In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same band contains approximately the same number of rows.

� The useful information that the histogram provides is where in the range of values the endpoints fall .

Copyright © R Wave Solutions 2008 All rights reserved

Page 9: Understanding Histogram Ppt

Height-Balanced Example 1

DataABC

SortedABB

ResultsEPN0

EPVB

CBCCCCCBDE

BBCCCCCCDE

Buckets = 4

0123

BCCE

Copyright © R Wave Solutions 2008 All rights reserved

Page 10: Understanding Histogram Ppt

Height-Balanced Example 1

DataABC

SortedABB

ResultsEPN0

EPVB

Notice that C Crosses Buckets.

The DB only stores0 - B2 - C3 - E

CBCCCCCBDE

BBCCCCCCDE

0123

Buckets = 4

BCCE

Crosses Buckets.

This is how database knows which values are popular

Copyright © R Wave Solutions 2008 All rights reserved

Page 11: Understanding Histogram Ppt

How to Create a Histogram

� Created by using DBMS_STATS.GATHER_TABLE_STATS

� METHOD_OPT => 'FOR COLUMNS SIZE � METHOD_OPT => 'FOR COLUMNS SIZE <# of buckets> <Column Name>‘

execute dbms_stats.gather_table_stats (ownname => 'oe', tabname => 'inventories', METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand');

Copyright © R Wave Solutions 2008 All rights reserved

Page 12: Understanding Histogram Ppt

Identify Existing Histograms

� In 10g– HISTOGRAM column of the *tab_col_statistics views

(USER/DBA)

In 9� In 9select owner, table_name, column_name, count(*)

bucketsfrom dba_histogramswhere endpoint_number not in (0,1)group by owner, table_name, column_name

Copyright © R Wave Solutions 2008 All rights reserved

Page 13: Understanding Histogram Ppt

Rules On When To Create Histograms

� First there are no rules of thumb that are always correct.When creating using dbms_stats, the use of � When creating using dbms_stats, the use of the DEGREE (for parallel) can skew the value of density

� Running dbms_stats can drop histograms– Use method_opt==>'REPEAT‘

Copyright © R Wave Solutions 2008 All rights reserved

Page 14: Understanding Histogram Ppt

Rules On When To Create Histograms

� Index maintenance overhead is performed during update, insert and delete operations.

� Histogram maintenance overhead is � Histogram maintenance overhead is performed during the analyze/dbms_stats process.

� Histograms are NOT just for indexed columns.

– Adding a histogram to an un-indexed column that is used in a where clause can improve performance.

Copyright © R Wave Solutions 2008 All rights reserved

Page 15: Understanding Histogram Ppt

Histogram Example

select sum(t1.d2*t2.d3*t3.d3)from t1, t2, t3where t1.fk1 = t2.pk1and t3.pk1 = t2.fk1and t3.d2 = 35and t1.d3 = 0; << This column has 2 values in it

In this example column T1.D3 has 2 values and is our most selective criterion, but without a histogram CBO assumes an even distribution of 2 distinct values (density = 0.5).

Copyright © R Wave Solutions 2008 All rights reserved

Page 16: Understanding Histogram Ppt

SQL*Plan Without Histogram

Rows Row Source Operation------- ---------------------------------

1 SORT AGGREGATE 1 SORT AGGREGATE 2088 HASH JOIN 601 TABLE ACCESS BY INDEX ROWID 601 INDEX RANGE SCAN OBJ#(57612)

1000 HASH JOIN 4 TABLE ACCESS FULL OBJ#(57604)

filter("T1"."D3"=0)62500 TABLE ACCESS FULL OBJ#(57605)

Copyright © R Wave Solutions 2008 All rights reserved

Page 17: Understanding Histogram Ppt

Timing Without Histogram

call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 1 0.00 0.00 0 0 0 0Fetch 2 1.04 4.91 19013 20231 0 1------- ------ -------- ---------- ---------- ---------- ---------- ----------------- ------ -------- ---------- ---------- ---------- ---------- ----------total 4 1.04 4.91 19013 20231 0 1

Query = 20231

Copyright © R Wave Solutions 2008 All rights reserved

Page 18: Understanding Histogram Ppt

Use DBMS_STATS to Add Histogram

execute dbms_stats.gather_table_stats(NULL, 'T1', estimate_percent=>null, method_opt=>'FOR COLUMNS SIZE AUTO method_opt=>'FOR COLUMNS SIZE AUTO d3');

Copyright © R Wave Solutions 2008 All rights reserved

Page 19: Understanding Histogram Ppt

Timing With Histogram

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.01 0.00 0 0 0 0

Fetch 2 0.56 2.84 9841 10666 0 1Fetch 2 0.56 2.84 9841 10666 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.57 2.84 9841 10666 0 1

Now Query = 10666It was Query = 20231

½ the IOCopyright © R Wave Solutions 2008 All rights reserved

Page 20: Understanding Histogram Ppt

Plan With Histogram

Rows Row Source Operation------- ---------------------------------------

1 SORT AGGREGATE 2088 HASH JOIN 2088 HASH JOIN 601 TABLE ACCESS BY INDEX ROWID OBJ#(57606) 601 INDEX RANGE SCAN OBJ#(57612) 1000 TABLE ACCESS BY INDEX ROWID OBJ#(57605) 1005 NESTED LOOPS

4 TABLE ACCESS FULL OBJ#(57604)filter("T1"."D3"=0)

1000 INDEX RANGE SCAN OBJ#(57609)access("T1"."FK1"="T2"."PK1")

Copyright © R Wave Solutions 2008 All rights reserved

Page 21: Understanding Histogram Ppt

Histograms Opportunities

� Any column used in a where clause with skewed dataColumns that are not queried all the time� Columns that are not queried all the time

� Reduced overhead for insert, update, delete

Copyright © R Wave Solutions 2008 All rights reserved

Page 22: Understanding Histogram Ppt

When Not To Create A Histogram

� No rule of thumb is ever perfect.� Do not create Histograms

– Evenly distributed data in the column– Columns that are not queried

– Do not create them on every column of every table.

– PK of a table

Copyright © R Wave Solutions 2008 All rights reserved

Page 23: Understanding Histogram Ppt

Creating / Maintaining Histograms

� Table maintenance procedures are important.– Accidentally deleting histogram data is possible.– Accidentally deleting histogram data is possible.– Histogram data is ONLY Collected/Maintained

when stats are collected (DBMS_STATS).– Changing data in the table does not change data

stored in the histogram.

Copyright © R Wave Solutions 2008 All rights reserved

Page 24: Understanding Histogram Ppt

Maintaining Histograms

� Always check if histogram exists on table before DBMS_STATS is run.– Do not forget to note the # of buckets.– Do not forget to note the # of buckets.– Create tables to hold information.

� Use METHOD_OPT FOR ALL COLUMNS REPEAT to prevent deletion of histograms data.

� Distribution ratio change -> Recollect

Copyright © R Wave Solutions 2008 All rights reserved

Page 25: Understanding Histogram Ppt

Creating Histograms

� What is the best way to start out creating Histograms? Three popular ways to create.– Salt and Pepper - Create them as you need them.– Salt and Pepper - Create them as you need them.– Little Hammer DBMS_STATS using

METHOD_OPT AUTO� Should NOT be used after fresh restarted database

– Big Hammer DBMS_STATS using SKEWONLY– Over Kill Hammer - FOR ALL COLUMNS

Copyright © R Wave Solutions 2008 All rights reserved

Page 26: Understanding Histogram Ppt

Other Comments

� It is the Density of a column that the optimizer considers when accessing a table.Changing the number of buckets changes � Changing the number of buckets changes the density.

� More Buckets DOES NOT mean better density.

� Using degree != 1 can affect density.

Copyright © R Wave Solutions 2008 All rights reserved

Page 27: Understanding Histogram Ppt

Review

� What is a Histogram?� Two types of Histograms in the database� How to create Histograms� Data is collected by running DBMS_STATS.� Different methods to create histograms� Bucket size affects density of column.� Density of the column affects CBO.

Copyright © R Wave Solutions 2008 All rights reserved

Page 28: Understanding Histogram Ppt

Thank You for Your Time

Copyright © R Wave Solutions 2008 All rights reserved