SQL Performance Basics for DB2 UDB for iSeries

Luis Gonzalez-Suarez [email protected]

Copyright IBM Corporation 2003
© 2005 IBM Corporation

Trademarks & Disclaimer

• Course content and examples are based upon Version 5 Release 3 of OS/400 and DB2 UDB for iSeries. System and application behavior may be different on other releases of OS/400 and DB2 UDB.
• Example queries may not compile or run.
• NOTICE: This publication may refer to products that are not currently available in your country.
• IBM makes no commitment to make available any products referred to herein.
• IBM, eServer, iSeries, AS/400, OS/400 and DB2 are trademarks of the IBM Corporation in the United States or other countries or both.
• Other company, product, and service names may be trademarks or service marks of others.
• Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States and/or other countries.
• IBM’s VisualAge products and services are not associated with or sponsored by Visual Edge Software, Ltd.
• A list of trademarks may be found on the World Wide Web: http://www.ibm.com/trademarks.html

Agenda

• Introduction
• Data Access Choices
  – Table Scan Access Method
  – Index Probe Access Method
• The Optimizer
• Optimization Tools
  – Query Optimizer Debug Messages
  – Print SQL Information
  – Database Monitor
  – iSeries Navigator’s Visual Explain
• Optimizing Queries
  – Examples
• Summary

Introduction

Rule of Thumb

• iSeries uses only a cost-based optimizer.
• This presentation describes the most common situations you will encounter and the standard optimization rules (Rules of Thumb) for handling those queries.
• Because optimization decisions are based on a variety of factors, exceptions to the rules discussed here will present themselves. Being aware that they exist will save you considerable headache.

Optimizer Interfaces

[Diagram: the interfaces below pass through the Optimizer and the Query Dispatcher to one of two engines, CQE or SQE, which sit above the data and the System Licensed Internal Code (SLIC).]

• Non-SQL Interfaces: OPNQRYF, Query/400, QQQQry API
• SQL Based Interfaces: ODBC / JDBC / CLI, Embedded & Interactive SQL, Run SQL Scripts, Query Manager, Net.Data, RUNSQLSTM
• Parts of the SQE Optimizer now reside in SLIC

Data Access Choices

Table Scan Access Method

Reads all of the rows from the table and applies the selection criteria to the rows within the table.

• Advantages:
  – Minimizes page I/O operations through asynchronous pre-fetching of the rows, since the pages are scanned sequentially
    • Requests a larger I/O to bring the data in more efficiently
• Potential Disadvantages:
  – All rows in the table are examined regardless of the selectivity of the query
  – Rows marked as deleted are still paged into memory even though none will be selected
• Rule of Thumb:
  – Used when asking for or expecting a large number of rows returned from the table
  – Used when the number of large I/Os needed to scan the table is less than the number of smaller I/Os required to probe it

Table Scan Example

    SELECT * FROM Employee
    WHERE WorkDept BETWEEN 'A01' AND 'E01'
    OPTIMIZE FOR ALL ROWS

SQL4010 Table scan access for table 1.

WorkDept values, as listed on the slide:

A01, F01, D01, C01, B01, F01, C01, A01, D01, C01, A01, E01, G01, A01, G01, B01, B01

Need to touch every row and test to see if it matches the range selection:

'A01' – 'E01'
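The scan described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how DB2 implements it: the `table_scan` helper is a hypothetical name, and the list simply reuses the WorkDept values from the slide in the order they were listed.

```python
# WorkDept values copied from the slide; their physical order is an assumption.
WORK_DEPTS = ["A01", "F01", "D01", "C01", "B01", "F01", "C01", "A01", "D01",
              "C01", "A01", "E01", "G01", "A01", "G01", "B01", "B01"]


def table_scan(rows, low, high):
    """Visit every row and keep the ones inside the inclusive range."""
    examined = 0
    selected = []
    for row_number, work_dept in enumerate(rows, start=1):
        examined += 1  # every row is touched, whether or not it matches
        if low <= work_dept <= high:
            selected.append((row_number, work_dept))
    return examined, selected


examined, selected = table_scan(WORK_DEPTS, "A01", "E01")
print(f"examined {examined} rows, selected {len(selected)}")
```

The point to notice is that the number of rows examined equals the table size no matter how selective the range is.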

Table Scan Graph

Remember, we optimize to eliminate records at the earliest possible point.

[Query graph: Final Select over TableScan (Employee), with the selection WorkDept BETWEEN 'A01' AND 'E01' applied at the table scan.]

Index Probe Access Method

• Selection criteria are applied to ranges of index entries to quickly get a subset of rows before the table is retrieved.
  – Advantages:
    • Only those index entries that are within a selected range are processed
    • Provides quick access to rows in an OLTP environment
  – Potential Disadvantages:
    • Can perform poorly when a large number of rows are selected
      – Requires a separate random I/O against the table to extract the values
  – Rule of Thumb:
    • Used when asking for or expecting only a few rows returned from the index
    • Used when sequencing the rows is required for ordering or grouping
    • The selection columns match the first (n) key fields of the index

Index Probe Example

    CREATE INDEX X1 ON Employee (LastName, WorkDept)

    SELECT * FROM Employee
    WHERE WorkDept BETWEEN 'A01' AND 'E01'
      AND LastName IN ('Smith', 'Jones', 'Peterson')
    OPTIMIZE FOR ALL ROWS

SQL4008 Index X1 used for table 1.
SQL4011 Index scan-key row positioning used on table 1.

Index entries, as listed on the slide:

LastName    WorkDept
Wulf        A01
Smith       F01
Smith       D01
Smith       C01
Smith       B01
Peterson    F01
Peterson    C01
Milligan    A01
Jones       D01
Jones       C01
Jones       A01
Driesch     E01
Doe         G01
Cain        A01
Anderson    G01
Anderson    B01
Adamson     B01

Think of processing a set of ranges:

'JonesA01' – 'JonesE01'
'PetersonA01' – 'PetersonE01'
'SmithA01' – 'SmithE01'
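The "set of ranges" idea can be sketched with Python's standard `bisect` module over a sorted list of composite keys. This is an illustrative model only: a sorted list stands in for the radix index, and `index_probe` is a hypothetical helper, not a DB2 interface. The entries are the (LastName, WorkDept) pairs from the slide.

```python
from bisect import bisect_left, bisect_right

# (LastName, WorkDept) pairs from the slide, sorted the way a keyed index holds them.
INDEX_X1 = sorted([
    ("Wulf", "A01"), ("Smith", "F01"), ("Smith", "D01"), ("Smith", "C01"),
    ("Smith", "B01"), ("Peterson", "F01"), ("Peterson", "C01"),
    ("Milligan", "A01"), ("Jones", "D01"), ("Jones", "C01"), ("Jones", "A01"),
    ("Driesch", "E01"), ("Doe", "G01"), ("Cain", "A01"), ("Anderson", "G01"),
    ("Anderson", "B01"), ("Adamson", "B01"),
])


def index_probe(index, last_names, low_dept, high_dept):
    """Position to each (name, low)..(name, high) key range and read only those entries."""
    matches = []
    for name in sorted(last_names):
        start = bisect_left(index, (name, low_dept))    # first entry >= (name, low)
        stop = bisect_right(index, (name, high_dept))   # one past last entry <= (name, high)
        matches.extend(index[start:stop])
    return matches


matches = index_probe(INDEX_X1, ["Smith", "Jones", "Peterson"], "A01", "E01")
print(len(matches), "index entries fall inside the three ranges")
```

Only the entries inside the three key ranges are read; in the real access method each of them would then cost a random I/O against the table to fetch the row.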

Index Probe Graph

Remember, we optimize to eliminate records at the earliest possible point.

[Query graph: Final Select over TableProbe (Employee) over IndexProbe (X1), with the selection WorkDept BETWEEN 'A01' AND 'E01' AND LastName IN ('Smith', 'Jones', 'Peterson') applied at the index probe.]

The Optimizer

Phases of Query Optimization

Query processing can be divided into four phases:

• Query Validation
  – Validate the query request
  – Validate any existing access plan
  – Build any internal structures
• Query Dispatcher
  – Determine which query engine should complete the processing
• Query Optimization
  – Choose the most efficient access method
  – Build the access plan
• Query Execution
  – Build the structures needed for the query cursor
  – Build the structures needed for any temporary indexes (if needed)
  – Build and activate the query cursor (ODP)
  – Generate any feedback requested

Most Efficient Access Method

Cost-based optimization dictates that the most efficient access method will vary for the same query based upon selectivity.

– Many different data access choices can be used to satisfy a query. Each has its own strengths and weaknesses
– What data access method should we use to find rows that match a specific Color and Size combination from a million-record table?
  • When 1 row matches the selection
  • When 1,000 rows match the selection
  • When 100,000 rows match the selection
  • When 1,000,000 rows match the selection
– How does the optimizer know which choice to make?
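One way to build intuition for the question above is a toy cost comparison. The sketch below is not the optimizer's cost model. It assumes a scan reads the table in large sequential I/Os of a fixed number of rows (`rows_per_large_io`, an invented parameter), that an index probe needs one random table I/O per matching row, and that both kinds of I/O cost the same. All function names are hypothetical.

```python
import math


def table_scan_ios(total_rows, rows_per_large_io):
    """Large sequential I/Os needed to read the whole table (assumed model)."""
    return math.ceil(total_rows / rows_per_large_io)


def index_probe_ios(matching_rows):
    """One random table I/O per matching row (assumed model)."""
    return matching_rows


def cheaper_access_method(total_rows, matching_rows, rows_per_large_io=1000):
    scan = table_scan_ios(total_rows, rows_per_large_io)
    probe = index_probe_ios(matching_rows)
    # On a tie this toy model falls back to the scan.
    return "index probe" if probe < scan else "table scan"


for matching in (1, 1_000, 100_000, 1_000_000):
    print(matching, "->", cheaper_access_method(1_000_000, matching))
```

Under these assumptions the probe wins only while the number of matching rows stays below the number of large I/Os a full scan would need, which mirrors the Rules of Thumb given for the two access methods.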

SQE Optimizer

• Controls the strategies and algorithms used to determine which data access methods should be employed
• Has no knowledge of the meta-data or the system’s capabilities
  – Asks questions about the system and the tables and uses the answers in its algorithms
  – Separation of responsibilities: now relies upon the Statistics Manager and the SQE Primitives to provide answers to plug into the algorithms
• The optimizer has new strategies and uses existing strategies differently
  – Temporary indexes will no longer be considered; new algorithms are now available
  – Table scans will be considered more often due to new SQE Primitives
• Access plans are now organized into a tree-based structure to provide maximum flexibility
• Still responsible for collecting and processing the feedback information

SQE Primitives

• Provide the actual implementation of the query using data access methods derived from an OO tree-based architecture
• More aggressive in utilizing I/O subsystems and main memory
  – Different memory footprint because the mechanisms and structures used by SQE have changed from CQE
• Many of the existing data access methods have been redesigned and reimplemented
  – On average, less CPU consumption
  – New algorithms for distinct and SMP processing
• Some new data access methods have been added
• Temporary results have been retooled to eliminate the need for a “true” database object to be created
  – Leverages the proximity of the code in relation to the data to minimize data movement and take advantage of efficient structures to satisfy the request

SQE Plan Cache

• Incorporates Self-Managing Technology
  – Cache is automatically maintained to keep the most active queries available for reuse
  – Plans are optimized on demand as new stats or indexes become available
  – Foundation for a self-learning query optimizer to interrogate the plans to make wiser costing decisions
• Caches all access plans optimized by the SQE Optimizer
  – Allows more reuse of existing plans regardless of interface for identical SQL statements
  – Works in conjunction with the System Wide Statement Cache and the SQL programs, packages and service programs
• Cache is cleared during an IPL or when varying the IASP

SQE Plan Cache (cont.)

• Better centralized management of plan information:
  – Plans are stored in a compressed mode
  – Plans are stored independent of job information for better sharing of plans
  – Access is optimized to minimize contention on plan entries across the system
  – Out-of-date plans are cycled from the cache as more space is needed
  – Multiple plans can be maintained for identical SQL statements (library list or environmental changes)
• Repository of information that is used to determine feedback and automatic stats generation
• Enabling auto stats collection causes the Statistics Manager to interrogate the Plan Cache looking for plans where stats would have been helpful

Statistics

• All query optimizers rely upon statistics to make plan decisions
• The accuracy of the statistics dictates the optimizer’s ability to choose the best plan
  – DB2 UDB for iSeries has always relied upon indexes as its source for stats
  – Other databases rely upon manual stats collection for their source of stats
• Starting in V5R2M0, the SQL Query Engine (SQE) offers a hybrid approach where statistics will be automatically collected for cases where indexes do not already exist
  – Diminished need to create indexes solely for statistics
  – Still need indexes for plan implementation choices

Sources for Statistics Answers

• Meta-data sources
  – Existing indexes (Radix or Encoded Vector)
    • More accurately describe multi-key values
    • Stats available immediately as the index maintenance occurs
  – Stats (only used for the SQL Query Engine)
    • Column Cardinality, Histograms & Frequent Values List
    • Constructed over a single column in a table
      – Stored internally as a part of the table object after created
    • Collected automatically by default for the system
    • Stats not immediately maintained as the table changes. Instead, stats are refreshed as they become “stale” over time
• Default or stale sources
  – No representation of actual values in columns
    • Essentially just a guess based upon how the column is used

Selectivity Statistics

The optimizer will use the selectivity of the selection predicates to determine how many rows must be processed for each data access method considered.

• The selectivity will always be calculated for each selection predicate. The answer will come from either:
  – Default Sources (default filtering based upon the operator used)
  – Meta-Data Sources (existing indexes or column statistics)
• Always want meta-data statistics for your most and least selective columns

Operator      SQL Syntax      Default Sources          Meta-Data Sources
Equals        =               10%                      Actual Selectivity
Not Equals    <>, ^=          90%                      Actual Selectivity
Non-Equals    >, >=, <, <=    33%                      Actual Selectivity
Ranges        BETWEEN         25%                      Actual Selectivity
Wild Cards    LIKE            20%                      Actual Selectivity
Ands          AND             Intersection of Values   Intersection of Values
Ors           OR              Union of Values          Union of Values
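The default percentages in the table can be combined with a little arithmetic. The sketch below is an illustration under stated assumptions rather than the optimizer's actual formula: it treats AND as a product of selectivities (which matches the 0.10 × 0.10 × 0.10 = 0.001 figure used later in this deck) and, as a further assumption, treats OR as the union of independent predicates. The function names are hypothetical.

```python
from fractions import Fraction

# Default selectivities taken from the table above.
DEFAULT_SELECTIVITY = {
    "=": Fraction(10, 100),
    "<>": Fraction(90, 100),
    "^=": Fraction(90, 100),
    ">": Fraction(33, 100),
    ">=": Fraction(33, 100),
    "<": Fraction(33, 100),
    "<=": Fraction(33, 100),
    "BETWEEN": Fraction(25, 100),
    "LIKE": Fraction(20, 100),
}


def and_selectivity(selectivities):
    """Intersection: multiply the individual selectivities."""
    result = Fraction(1)
    for selectivity in selectivities:
        result *= selectivity
    return result


def or_selectivity(selectivities):
    """Union, assuming independent predicates: 1 - product of the misses."""
    miss = Fraction(1)
    for selectivity in selectivities:
        miss *= 1 - selectivity
    return 1 - miss


# Three equality predicates ANDed together over a 10,000-row table.
three_equals = and_selectivity([DEFAULT_SELECTIVITY["="]] * 3)
estimated_rows = three_equals * 10_000
print(three_equals, estimated_rows)
```

Exact fractions are used instead of floats so that the 0.1% result is not blurred by rounding.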

Strategy for Query Optimization

The Query Optimization Phase will generally follow this simplified strategy:

– Gather statistics needed for costing
  • Selectivity statistics
  • Indexes available to be costed
– Sort the indexes based upon their perceived usefulness
  • Environmental attributes that may affect the costs
– Generate a default cost
  • Build an access plan associated with the default plan
– For each index (until time-out):
  • Gather information needed specific to this index
  • Build an access plan associated with this index
  • Cost the use of the index with this access plan
  • Compare the resulting cost against the cost from the current best plan

• All access plans generated to be costed must be a valid choice to implement the query if optimization is cut short for any reason.
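The simplified strategy above can be sketched as a small loop. Everything in this sketch is hypothetical: the plan tuples, the `cost_of` and `usefulness` callables, and the use of a fixed `budget` of costed indexes as a stand-in for the optimizer's time-out. It only illustrates the shape of the algorithm: start from a valid default plan, cost candidates in order of perceived usefulness, and keep the cheapest plan seen so far.

```python
def choose_plan(default_plan, candidate_indexes, cost_of, usefulness, budget):
    """Return the cheapest plan found before the budget (time-out stand-in) runs out."""
    best_plan = default_plan
    best_cost = cost_of(default_plan)          # default cost, always a valid plan
    ranked = sorted(candidate_indexes, key=usefulness, reverse=True)
    for costed, index_name in enumerate(ranked):
        if costed >= budget:                   # optimization cut short
            break
        plan = ("index probe", index_name)     # access plan for this index
        cost = cost_of(plan)
        if cost < best_cost:                   # compare against current best plan
            best_plan, best_cost = plan, cost
    return best_plan, best_cost


# Invented costs and usefulness rankings, for illustration only.
COSTS = {
    ("table scan", None): 1000,
    ("index probe", "X1"): 40,
    ("index probe", "X2"): 15,
    ("index probe", "X3"): 5,
}
USEFULNESS = {"X1": 3, "X2": 2, "X3": 1}

plan, cost = choose_plan(("table scan", None), ["X3", "X1", "X2"],
                         COSTS.__getitem__, USEFULNESS.__getitem__, budget=2)
print(plan, cost)
```

With a budget of two, X3 is never costed even though it would have been cheapest, which is why the order in which indexes are sorted matters when optimization can time out.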

Optimization Tools

What Tools?

• DB2 UDB for iSeries offers a number of free tools to help explain your queries
  – Help determine how your queries are being implemented
  – Help show what data access choices are being made by the optimizer
  – Tools to help explain the feedback from the optimizer:
    • Query Optimizer Debug Messages
    • Print SQL Information
    • Database Monitor
      – Summary and Detailed Monitors
    • iSeries Navigator’s Visual Explain
  – Tools to modify the environment for the optimizer:
    • Predictive Query Governor
    • Query Attributes & INI File
    • Index & Statistic Advisors

Query Optimizer Debug Messages

• Informational messages are written to the joblog illustrating the implementation of the query
  – Use messages written to the joblog to detail the data access choices made by the optimizer
    • Message IDs CPI4321 through CPI434F
  – Messages are issued when queries are run in a debug mode environment
    • STRDBG
    • STRSRVJOB and STRDBG for batch jobs
    • MESSAGES_DEBUG from the QAQQINI file
  – Detailed explanation is contained within the second-level text of the messages
    • Why an index was used or not used
    • Why a temporary result was created
    • Join order of the resulting query
    • The keys for any indexes created or advised

Print SQL Information

• CL command that creates a spool file report that lists the SQL statements and their implementations
  – DB2 UDB for iSeries version of the SQL EXPLAIN utility
  – Creates a spool file report that contains:
    • The environmental information used to invoke the SQL pre-compiler
    • The SQL statements
    • The data access choices made by the optimizer
  – Can be issued against all of the following objects:
    • SQL programs (*SQLPGM)
    • SQL service programs (*SRVPGM)
    • SQL packages (*SQLPKG)
    • Job level SQL statement cache (*JOB)
  – Data access output is similar to the debug messages
    • Message IDs SQL4001 through SQL403F
    • All information is contained within the first-level text of the messages
  – Examples
    • PRTSQLINF OBJ(MySQLPkg) OBJTYPE(*SQLPKG)
    • PRTSQLINF OBJ(*JOB)

Database Monitor

• Logs all query activity for a single job or all the jobs on the system to a database table
  – Two different types of monitors can be employed:
    • Detailed Monitor logs all activity into a table with no compression of information
      – Controlled through STRDBMON & ENDDBMON CL commands
      – Large overhead as data is collected, formatted and written to the table
    • Summary Monitor logs all unique queries into a series of tables after some compression and summarization has occurred
      – Controlled through a series of APIs
      – Only collected for SQL statements
      – Less overhead as data is managed within memory until the API indicates to write the data into the tables
  – Monitor only collects data; queries must then be written to extract information from the table(s)
    • iSeries Navigator has pre-written queries that can be used
    • The data in the table is uniquely labeled for each query to ensure all of the associated records can be analyzed

Data Collected by Database Monitor

• The Database Monitor collects the same information as the Debug Messages and Print SQL Information for a query. It also adds information that can only be found within the monitor:
  – System, job and database schema names
  – Original SQL statement and host variable values
  – Estimated as well as actual processing time
  – Estimated as well as actual number of rows selected
  – Fields advised for either an index or a column statistic creation
  – Total optimization time
  – Types of operations the query performs (ORDER BY, GROUP BY, UNION)
  – ODP implementation (Full Open or Reusable Open)
  – Others

iSeries Navigator’s Visual Explain

• GUI tool through iSeries Navigator to visually represent the implementation of a query
  – Based on the existing data collected by the detailed SQL Performance Monitor (Database Monitor)
    • SQL Performance Monitors can either be collected by or imported into iSeries Navigator
  – Can be launched through iSeries Navigator by:
    • Listing explainable statements for a detailed SQL Performance Monitor
    • Selecting the explain option on the Run SQL Scripts window
  – Self-study iSeries Navigator tutorials for Visual Explain:
    • Performance Tuning DB2 UDB with iSeries Navigator & Visual Explain
    • www.iseries.ibm.com/developer/education/ibo/view.html?biz

[Screenshot: iSeries Navigator’s Visual Explain]

Predictive Query Governor

• Tests the access plan chosen by the optimizer against a predetermined time limit prior to running the query
  – The time limit can be specified in one of the following ways:
    • QQRYTIMLMT system value
    • CHGQRYA CL command
    • QUERY_TIME_LIMIT from the QAQQINI file
  – Allows a query that will exceed the time limit to be canceled before it starts using system resources
    • Works based upon the estimated runtime as calculated by the optimizer
  – Inquiry message CPA4259 is issued when the query will exceed the time limit
    • Can either ignore the message or cancel the query at this point
    • Print SQL Information messages are always contained in the second-level text of the inquiry message
  – Debug messages will be written to the joblog if the query is canceled (option C)
  – A time limit of zero can be used while iteratively tuning a query to see the effects of any changes being made

Query Attributes & INI File

• Provides a central point of control for all environmental attributes, options and knobs that can impact query optimization
  – Options are stored within a database table (QAQQINI)
    • Dynamically set through INSERT / UPDATE / DELETE statements
    • Options are validated and cached by the system
  – Three columns are used to identify each attribute
    • QQPARM: the attribute or option name
    • QQVAL: the value associated with the parameter
    • QQTEXT: optional description of the attribute
  – Can modify attributes either for a specific job or for the entire system
    • Optimizer searches for the QAQQINI file based upon a specific library
    • Default library, QUSRSYS, can be overridden using the CHGQRYA CL command
  – Template file is shipped in the QSYS library
    • CRTDUPOBJ OBJ(QAQQINI) FROMLIB(QSYS) OBJTYPE(*FILE) TOLIB(MyLib) NEWOBJ(*OBJ) DATA(*YES)
  – Supported options are documented in the Database Performance and Query Optimization book at www.iseries.ibm.com/db2/books.htm

Index & Statistic Advisors

• The optimizer will attempt to advise what indexes or statistics should be created for a particular query
  – Optimizer has no knowledge that either of these will actually benefit the query
  – Advises when an index could be created that would match the selection within the query
    • Shows up in Debug Message CPI432F or Database Monitor records 3000, 3001 or 3002
    • Best used to help analyze complex selection for a query
  – Advises when a column statistic should be created because there was no source of statistics on a column
    • Only shows up in the 3015 Database Monitor record

Optimizing Queries

Examples

Simple Optimization Strategy

• In order to optimize most queries, you will have to create an index for the optimizer to choose:
  – All optimizers rely upon statistics for their costing, so make sure that appropriate statistics are available to the optimizer. Statistics come from:
    • Existing Indexes
    • Column Statistics
  – In general, the perfect index will arrange the columns as follows:
    • Selection predicates + join predicates
    • Join predicates + selection predicates
    • Selection predicates + group by columns
    • Selection predicates + order by columns
  – Always place the most selective columns as the first key(s) in the index
    • May have to adjust the key field order to match your data model and how most queries access the table
  – Attempt to avoid variable-length and null-capable columns as index keys
  – Use any feedback from the optimizer to modify your approach and iterate the process

Using a Query Graph

• Queries can be represented as a graph to help visualize what columns should be considered for index creation:
  – Separate all of the tables and major functions in the query into different nodes of the graph
    • Create a different node for each table, grouping, ordering or join requirement
  – Push all of the selection to the lowest level possible in the graph
  – Process the columns starting at the bottom of the graph to determine which ones should be included in any indexes
  – Use the Perfect Index Guidelines to determine what should be included in the index
• Columns at the top of the graph may be better suited for a column stat rather than a permanent index.

Binary Radix Index

• Key values are compressed
  – Common patterns are stored once
  – Unique portion stored in “leaf” pages
  – Positive impact on size and depth of the index tree
• Algorithm used to find values
  – Binary search
    • Very efficient process to find a unique value
  – Modified to fit the data structure
• Maintenance
  – Index data is automatically spread across all available disk units
  – Tree is automatically rebalanced to maintain an efficient structure

Binary Radix Index

Database Table (key value and row number):

ARKANSAS       001
MISSISSIPPI    002
MISSOURI       003
IOWA           004
ARIZONA        005
…

[Diagram: radix tree. ROOT leads to test nodes. The common prefix "AR" branches to "IZONA 005" and "KANSAS 001"; "IOWA 004" is stored whole; the common prefix "MISS" branches to "ISSIPPI 002" and "OURI 003".]

ADVANTAGES:
• Quick access to a single key value (million-entry index, on average, only 20 tests)
• Also efficient for a small, selected range of key values (low cardinality)

DISADVANTAGES:
• Table rows are retrieved in order of key values (not physical order), which equates to many random I/Os when selecting a large number of keys (high cardinality)
• No way to predict which physical index pages are next when traversing the index for a large number of key values

Encoded Vector Index (EVI)

• New index object for delivering fast data access in decision support and query reporting environments
  – Complementary alternative to the existing index object (binary radix tree structure: keyed logical file or SQL index)
  – Advanced technology from IBM Research that is a variation on bitmap indexing
  – Easy-to-access data statistics improve query optimizer decision making
• Can only be created through an SQL interface
  – CREATE ENCODED VECTOR INDEX Library/EVI_Name ON Library/Table_Name (Column) WITH n DISTINCT VALUES

Encoded Vector Index (EVI)

• Composed of two parts: the Symbol Table and the Vector
• The symbol table contains information for each distinct key value. Each key value is assigned a unique code
• The code is 1, 2, or 4 bytes depending on the number of distinct key values
• Rather than a bit array for each distinct key value, the index has one array of codes (the Vector)

Symbol Table

Key Value    Code    First Row    Last Row    Count
Arizona      1       1            80005       5000
Arkansas     2       5            99760       7300
…
Virginia     37      1222         30111       340
Wyoming      38      7            83000       2760

Vector

Row Number    Code
1             1
2             17
3             18
4             9
5             2
6             7
7             38
8             38
9             1
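The two-part structure can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the on-disk format: codes are assigned in sorted key order, the symbol table is a dictionary, the vector is a plain list, and `build_evi` and `rows_matching` are hypothetical helpers.

```python
def build_evi(column_values):
    """Build a symbol table (one entry per distinct key) and a vector (one code per row)."""
    symbol_table = {}
    for code, key in enumerate(sorted(set(column_values)), start=1):
        symbol_table[key] = {"code": code, "first_row": None, "last_row": None, "count": 0}
    vector = []
    for row_number, key in enumerate(column_values, start=1):
        entry = symbol_table[key]
        if entry["first_row"] is None:
            entry["first_row"] = row_number
        entry["last_row"] = row_number
        entry["count"] += 1
        vector.append(entry["code"])   # the vector stores a code, not the key itself
    return symbol_table, vector


def rows_matching(symbol_table, vector, key):
    """Scan only the first-row..last-row slice of the vector for the key's code."""
    entry = symbol_table.get(key)
    if entry is None:
        return []
    code = entry["code"]
    return [row for row in range(entry["first_row"], entry["last_row"] + 1)
            if vector[row - 1] == code]


states = ["Arizona", "Virginia", "Wyoming", "Arkansas", "Arizona", "Wyoming"]
symbol_table, vector = build_evi(states)
print(vector, rows_matching(symbol_table, vector, "Wyoming"))
```

The count kept per key in the symbol table is the kind of readily available statistic that helps the optimizer, and the first-row and last-row values bound how much of the vector has to be examined for a given key.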

Simple Selection Query

This query will return the sum of all of the item prices for a given employee at a store on a particular date.

Assume that no indexes exist for this table of 10,000 records.

    SELECT SUM(Item_Price)
    FROM Sales                       Default Selectivity Statistic
    WHERE Date = '07/15/1966'        0.10
      AND Store = 'MN55906'          0.10
      AND EmpNum = 222473            0.10
                                     0.001 (0.1%)

SQL4010 Table scan access for table 1.

The default selectivity statistic for each selection predicate is 10% (0.10), which means that the entire query is assumed to select only 0.1% (0.001) of the records in the table (i.e. 10 records) for the summation.

An index with three keys, Date, Store and EmpNum (in any order, because of the equal selection), can be used to satisfy all three selection predicates at once.

SQL4008 Index X1 used for table 1.
SQL4011 Index scan-key row positioning used on table 1.

Simple Selection Query Graph

An index could be created over the columns Date, Store and EmpNum in any order because they all use equal selection predicates.

[Query graph: Final Select over Aggregation SUM(Item_Price) over Table (Sales), with the selection WHERE Date = '07/15/1966' AND Store = 'MN55906' AND EmpNum = 222473 applied at the table.]

Index Key Field Order

Let’s look at our last example with index key fields of Date, Store and EmpNum:

    SELECT SUM(Item_Price)
    FROM Sales
    WHERE Date = '07/15/1966'
      AND Store = 'MN55906'
      AND EmpNum = 222473

SQL4008 Index X1 used for table 1.
SQL4011 Index scan-key row positioning used on table 1.

This index works fine for this query; however, you need to determine how it will work with other queries over the same file.

This index implies that the Date column is used the most often in the queries over this table (not very likely).

An index with the key fields in the order of Store, EmpNum and Date will probably be usable by more queries.

The key order is very important when analyzing an index, so some care is needed when creating multi-key indexes.

Selection and Order By Query

This query will return all of the detail information that was used in the previous example.

Assume that only the index we created in the Simple Query example exists (Store, EmpNum and Date):

    SELECT Item_Price, Date
    FROM Sales
    WHERE Store = 'MN55906'
      AND EmpNum = 222473
    ORDER BY Date, Item_Price

SQL4011 Index scan-key row positioning used on table 1.
SQL4012 Index created from X1 for table 1.

The need for an index keyed on Store and EmpNum conflicts with an index required for the ORDER BY (Date and Item_Price).

However, an index keyed on Store, EmpNum, Date and Item_Price (in this order) can be used to satisfy both the selection and ordering requirements for this query at the same time.
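The "selection predicates + order by columns" guideline for this query can be written down as a small helper. This is only a sketch of the rule of thumb: `suggest_index_keys` and `create_index_statement` are hypothetical names, the index name X2 is invented for illustration, and a real decision would still weigh selectivity and the other queries over the table.

```python
def suggest_index_keys(equal_selection_columns, order_by_columns):
    """Equal-selection columns first, then ORDER BY columns, without duplicates."""
    keys = []
    for column in list(equal_selection_columns) + list(order_by_columns):
        if column not in keys:
            keys.append(column)
    return keys


def create_index_statement(index_name, table, keys):
    """Format the suggestion as a CREATE INDEX statement."""
    return f"CREATE INDEX {index_name} ON {table} ({', '.join(keys)})"


keys = suggest_index_keys(["Store", "EmpNum"], ["Date", "Item_Price"])
statement = create_index_statement("X2", "Sales", keys)
print(statement)
```

Putting the equal-selection columns first lets the index position directly to the matching Store and EmpNum entries, and the trailing Date and Item_Price keys return those entries already in ORDER BY sequence.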

Selection and Order By Query Graph

Always start looking at the bottom of the graph for the columns to place into the index first. Then work your way up the graph looking for additional columns.

[Query graph: Final Select over ORDER BY Date, Item_Price over Table (Sales), with the selection WHERE Store = 'MN55906' AND EmpNum = 222473 applied at the table.]

Complex Selection Query

This query will return the sum of all of the item prices for a given employee on a set of dates at two particular stores.

Again, assume that no indexes exist for this table of 10,000 records:

    SELECT SUM(Item_Price)
    FROM Sales
    WHERE EmpNum = :HV_Emp
      AND ((Date IN (:HV_Date1, :HV_Date2)
            AND Store = :HV_Store1
            AND Location = 'Mid-West')
        OR (Date IN ('07/15/1966', :HV_Date3, '11/06/1969')
            AND Store = :HV_Store2))

SQL4010 Table scan access for table 1.

The default selectivity statistic for this query is difficult to figure out and probably would not give you much useful information (the actual default is 0.45585%).

Need to rely upon one of the other Optimization Tools to help interrogate the complex selection found within this query…

Complex Selection Query Graph

Need to determine what columns are being used as equal selection predicates across all of the ranges (ORs) for this query.

[Query graph: Final Select over Aggregation SUM(Item_Price) over Table (Sales), with the following selection applied at the table.]

    WHERE EmpNum = :HV_Emp
      AND ((Date IN (:HV_Date1, :HV_Date2)
            AND Store = :HV_Store1
            AND Location = 'Mid-West')
        OR (Date IN ('07/15/1966', :HV_Date3, '11/06/1969')
            AND Store = :HV_Store2))

Copyright IBM Corporation 2003 52© 2005 IBM Corporation

Complex Selection Index Advisor

Message ID . . . . : CPI432F
Severity . . . . . . . : 00
Message type . . : Informational

Message . . . : Access path suggestion for file SALES.

Cause . . . . . : To improve performance the query optimizer is suggesting a permanent access path be built with the key fields it is recommending. The access path will access records from member SALES of file SALES in library COOTER.

In the list of key fields that follow, the query optimizer is recommending the first 3 key fields as primary key fields. The remaining key fields are considered secondary key fields and are listed in order of expected selectivity based on this query. The following list contains the suggested primary and secondary key fields:

DATE, EMPNUM, STORE, LOCATION.

The index advisor will help to identify what columns are being used as equal selection predicates across all of the OR ranges. These are labeled as the primary key fields within this debug message.

The index advisor only recommends a list of index key candidates; a DBA still needs to make the final determination of the key field order for any permanent index.

From the debug messages in the job log…
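When many of these messages accumulate in a job log, it can help to split the advised key list into primary and secondary fields programmatically. The following Python sketch parses the message wording quoted on this slide; it assumes the text has the same form as shown here (the wording may differ on other releases), and the function name is hypothetical.

```python
import re


def split_advised_keys(message_text):
    """Split the key list of a CPI432F message into (primary, secondary)."""
    count_match = re.search(r"first (\d+) key fields", message_text)
    list_match = re.search(r"key fields:\s*([A-Z0-9_, ]+)\.", message_text)
    if count_match is None or list_match is None:
        raise ValueError("text does not look like the CPI432F message shown")
    primary_count = int(count_match.group(1))
    keys = [key.strip() for key in list_match.group(1).split(",")]
    return keys[:primary_count], keys[primary_count:]


# Second-level text of the message, copied from this slide.
MESSAGE = (
    "In the list of key fields that follow, the query optimizer is "
    "recommending the first 3 key fields as primary key fields. The "
    "remaining key fields are considered secondary key fields and are "
    "listed in order of expected selectivity based on this query. The "
    "following list contains the suggested primary and secondary key "
    "fields: DATE, EMPNUM, STORE, LOCATION."
)

primary, secondary = split_advised_keys(MESSAGE)
print(primary)    # ['DATE', 'EMPNUM', 'STORE']
print(secondary)  # ['LOCATION']
```

The split only reports what the advisor listed; the order of the key fields in a permanent index remains a decision for the DBA.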


Summary


Conclusion

• Use the tools provided to understand what data access methods your queries are using.
• Optimization is all about eliminating a record from consideration as early as possible.
• Use any suggestions made by the tools to create the correct indexes or statistics. This will help the correct data access method to be chosen.
• Don’t be afraid to experiment with the tools to find out what the best data access method for your queries really is.
• Remember this is an iterative process so do not get frustrated!


Additional Information

• DB2 UDB for iSeries home page – http://www.iseries.ibm.com/db2
• Newsgroups
  – USENET: comp.sys.ibm.as400.misc, comp.database.ibm-db2
  – iSeries Network (NEWS/400 Magazine) SQL & DB2 Forum – http://www.iseriesnetwork.com/Forums/main.cfm?CFApp=59
• Education Resources – Classroom & Online
  – http://www.iseries.ibm.com/db2/db2educ_m.htm
  – http://www.iseries.ibm.com/developer/education/ibo/index.html
• DB2 UDB for iSeries Publications
  – Online Manuals: http://www.iseries.ibm.com/db2/books.htm
  – Porting Help: http://www.iseries.ibm.com/developer/db2/porting.html
  – DB2 UDB for iSeries Redbooks (http://ibm.com/redbooks)
    • Stored Procedures & Triggers on DB2 UDB for iSeries (SG24-6503)
    • DB2 UDB for AS/400 Object Relational Support (SG24-5409)
    • SQL Query Engine (http://publib-b.boulder.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg2456598.html)
  – SQL/400 Developer’s Guide by Paul Conte & Mike Cravitz
    • http://iseriesnetwork.com/str/books/Uniquebook2.cfm?NextBook=183
  – iSeries and AS/400 SQL at Work by Howard Arner
    • http://www.sqlthing.com/books.htm
• Please send questions or comments to [email protected]


Informational CD

• DB2 UDB for iSeries Informational CD
  – White Papers
  – Expert Video Clips
  – Product Demos & Information
  – Self-Guided Education
  – Links to Additional Information
• Order CD from:
  – www-1.ibm.com/servers/eserver/education/media.html?true/true/true/true/


Class Roadmap

• DB2 UDB for iSeries Fundamentals (S6145)
• Accessing DB2 UDB for iSeries w/SQL (S6137)
• Developing iSeries Applications w/SQL (S6138)
• DB2 UDB for iSeries SQL Advanced Programming (S6139)
• DB2 UDB for iSeries SQL & Query Performance Workshop – www.ibm.com/eserver/iseries/service/igs/db2performance.html

Self-study iSeries Navigator tutorials for DB2 UDB: www.iseries.ibm.com/developer/education/ibo/view.html?biz
• Piloting DB2 UDB with iSeries Navigator
• Performance Tuning DB2 UDB with iSeries Navigator & Visual Explain
• Integrating XML and DB2 UDB for iSeries

www.ibm.com/eserver/iseries/db2/db2educ_m.htm
www.ibm.com/services/learning