Rules of Thumb
! iSeries uses only a cost-based optimizer.
! This presentation will attempt to describe the most common situations you will encounter and the standard optimization rules (Rules of Thumb) for handling those queries.
! Since the optimization decisions are based on a variety of different factors, exceptions to the rules discussed here will present themselves. Be aware that they exist and save yourself considerable headache.
Phases of Query Optimization
Query processing can be divided into four phases:
! Query Validation
– Validate the query request
– Validate any existing access plan
– Build any internal structures
! Query Dispatcher
– Determine which query engine should complete the processing
! Query Optimization
– Choose the most efficient access method
– Build the access plan
! Query Execution
– Build structures needed for the query cursor
– Build structures needed for any temporary indexes (if needed)
– Build and activate the query cursor (ODP)
– Generate any feedback requested
! Cost-based optimization dictates that the most efficient access method will vary for the same query based upon selectivity.
– Many different data access choices can be used to satisfy a query. Each has its own strengths and weaknesses.
– What data access method should we use to find rows that match a specific Color and Size combination in a million-record table?
• When 1 row matches the selection
• When 1,000 rows match the selection
• When 100,000 rows match the selection
• When 1,000,000 rows match the selection
– How does the optimizer know which choice to make?
SQE Primitives
! Provide the actual implementation of the query using data access methods derived from an OO tree-based architecture
! More aggressive in utilizing I/O subsystems and main memory
– Different memory footprint because the mechanisms and structures used by SQE have changed from CQE
! Redesigned and reimplemented many of the same existing data access methods
– On average, less CPU consumption
– New algorithms for distinct and SMP processing
! Some new data access methods have been added
! Temporary results have been retooled to eliminate the need for a "true" database object to be created
– Leverages the proximity of the code to the data to minimize data movement and takes advantage of efficient structures to satisfy the request
Statistics
! All query optimizers rely upon statistics to make plan decisions
! The accuracy of the statistics dictates the optimizer's ability to choose the best plan
– DB2 UDB for iSeries has always relied upon indexes as its source for stats
– Other databases rely upon manual stats collection for their source of stats
! Starting in V5R2M0, the SQL Query Engine (SQE) offers a hybrid approach where statistics are automatically collected for cases where indexes do not already exist
– Diminished need to create indexes solely for statistics
– Still need indexes for plan implementation choices
! Meta-data sources
– Existing indexes (Radix or Encoded Vector)
• More accurately describe multi-key values
• Stats available immediately as index maintenance occurs
– Stats (only used by the SQL Query Engine)
• Column cardinality, histograms & frequent values list
• Constructed over a single column in a table
– Stored internally as part of the table object after creation
• Collected automatically by default for the system
• Stats are not immediately maintained as the table changes; instead, stats are refreshed as they become "stale" over time
! Default or stale sources
– No representation of actual values in columns
• Essentially just a guess based upon how the column is used
Selectivity Statistics
! The optimizer will use the selectivity of the selection predicates to determine how many rows must be processed for each data access method considered
! The selectivity will always be calculated for each selection predicate. The answer will come from either:
– Default sources (default filtering based upon the operator used)
– Meta-data sources (existing indexes or column statistics)
! Always want meta-data statistics for your most and least selective columns
[Query graph: ORs produce a union of values; ANDs produce an intersection of values]
Strategy for Query Optimization
The Query Optimization phase will generally follow this simplified strategy:
– Gather the statistics needed for costing
• Selectivity statistics
• Indexes available to be costed
• Environmental attributes that may affect the costs
– Sort the indexes based upon their perceived usefulness
– Generate a default cost
• Build an access plan associated with the default plan
– For each index (until time-out):
• Gather information specific to this index
• Build an access plan associated with this index
• Cost the use of the index with this access plan
• Compare the resulting cost against the cost from the current best plan
! All access plans generated to be costed must be a valid choice to implement the query in case optimization is cut short for any reason.
! Informational messages are written to the joblog illustrating the implementation of the query
– Use the messages written to the joblog to detail the data access choices made by the optimizer
• Message IDs CPI4321 through CPI434F
– Messages are issued when queries are run in a debug mode environment
• STRDBG
• STRSRVJOB and STRDBG for batch jobs
• MESSAGES_DEBUG from the QAQQINI file
– Detailed explanation contained within the second-level text of the messages
• Why an index was used or not used
• Why a temporary result was created
• Join order of the resulting query
• The keys for any indexes created or advised
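As a sketch of turning on debug mode before running a query (the job identifier passed to STRSRVJOB is a placeholder):

```
/* Interactive job: write optimizer messages to the joblog */
STRDBG UPDPROD(*YES)

/* Batch job: put the job under service first, then start debug */
STRSRVJOB JOB(123456/QUSER/MYJOB)
STRDBG UPDPROD(*YES)
```

After the query runs, display the joblog and use F1 on the CPI43xx messages to see the second-level text.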
Print SQL Information
! CL command that creates a spool file report listing the SQL statements and their implementations
– DB2 UDB for iSeries' version of the SQL EXPLAIN utility
– Creates a spool file report that contains:
• The environmental information used to invoke the SQL pre-compiler
• The SQL statements
• The data access choices made by the optimizer
– Can be issued against all of the following objects:
• SQL programs (*SQLPGM)
• SQL service programs (*SRVPGM)
• SQL packages (*SQLPKG)
• Job level SQL statement cache (*JOB)
– Data access output is similar to the debug messages
• Message IDs SQL4001 through SQL403F
• All information contained within the first-level text of the messages
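The report is produced with the PRTSQLINF CL command; a sketch, with placeholder object names:

```
/* Explain the SQL in an embedded-SQL program; other object types
   (*SRVPGM, *SQLPKG, *JOB) can be named on OBJTYPE as listed above */
PRTSQLINF OBJ(MYLIB/MYPGM) OBJTYPE(*PGM)
```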
Database Monitor
! Logs all query activity for a single job or all jobs on the system to a database table
– Two different types of monitors can be employed:
• Detailed Monitor logs all activity into a table with no compression of information
– Controlled through the STRDBMON & ENDDBMON CL commands
– Large overhead as data is collected, formatted and written to the table
• Summary Monitor logs all unique queries into a series of tables after some compression and summarization has occurred
– Controlled through a series of APIs
– Only collected for SQL statements
– Less overhead as data is managed within memory until the API indicates to write the data into the tables
– The monitor only collects data; queries must then be written to extract information from the table(s)
• iSeries Navigator has pre-written queries that can be used
• The data in the table is uniquely labeled for each query so that all of the records for a given query can be correlated
! The Database Monitor collects the same information as the Debug Messages and Print SQL Information for a query. It also adds information that can only be found within the monitor:
– System, job and database schema names
– Original SQL statement and host variable values
– Estimated as well as actual processing time
– Estimated as well as actual number of rows selected
– Fields advised for either an index or a column statistic creation
– Total optimization time
– Types of operations the query performs (ORDER BY, GROUP BY, UNION)
– ODP implementation (Full Open or Reusable Open)
– Others
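As a sketch, a detailed monitor can be started with STRDBMON and the resulting out-file queried afterwards. Library and file names are placeholders; the QQRID and QQ1000 column names and the 1000 record type are the monitor conventions as I understand them, so verify against your release's documentation:

```
STRDBMON OUTFILE(MYLIB/DBMONOUT) JOB(*ALL) TYPE(*DETAIL)
/* ... run the workload to be analyzed ... */
ENDDBMON JOB(*ALL)
```

```sql
-- List the captured SQL statement text (record type 1000)
SELECT QQRID, QQ1000
FROM MYLIB/DBMONOUT
WHERE QQRID = 1000
```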
Predictive Query Governor
! Tests the access plan chosen by the optimizer against a pre-determined time limit prior to running the query
– The time limit can be specified in one of the following ways:
• QQRYTIMLMT system value
• CHGQRYA CL command
• QUERY_TIME_LIMIT from the QAQQINI file
– Allows a query that will exceed the time limit to be canceled before it starts using system resources
• Works based upon the estimated runtime as calculated by the optimizer
– Inquiry message CPA4259 is issued when the query will exceed the time limit
• Can either ignore the message or cancel the query at this point
• Print SQL Information messages are always contained in the second-level text of the inquiry message
– Debug messages will be written to the joblog if the query is canceled (option C)
– A time limit of zero can be used while iteratively tuning a query to see the optimizer's estimated runtime without actually running the query
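A sketch of setting the governor for the current job with CHGQRYA:

```
/* Cancel any query whose estimated runtime exceeds 600 seconds */
CHGQRYA QRYTIMLMT(600)

/* Zero limit while tuning: every query triggers CPA4259,
   exposing the optimizer's estimate before anything runs */
CHGQRYA QRYTIMLMT(0)
```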
Query Attributes & INI File
! Provides a central point of control for all environmental attributes, options and knobs that can impact query optimization
– Options are stored within a database table (QAQQINI)
• Dynamically set through INSERT / UPDATE / DELETE statements
• Options are validated and cached by the system
– Three columns are used to identify each attribute
• QQPARM The attribute or option's name
• QQVAL The value associated with the parameter
• QQTEXT Optional description of the attribute
– Can modify attributes either for a specific job or for the entire system
• The optimizer searches for the QAQQINI file in a specific library
• The default library, QUSRSYS, can be overridden using the CHGQRYA CL command
– A template file is shipped in the QSYS library
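A hedged sketch of setting up a job-level QAQQINI (MYLIB is a placeholder; if the MESSAGES_DEBUG row is not present in your copy, INSERT it instead of updating):

```
/* Duplicate the shipped template, then point this job at it */
CRTDUPOBJ OBJ(QAQQINI) FROMLIB(QSYS) OBJTYPE(*FILE) TOLIB(MYLIB)
CHGQRYA QRYOPTLIB(MYLIB)
```

```sql
-- Turn on optimizer debug messages through the INI file
UPDATE MYLIB/QAQQINI
SET QQVAL = '*YES'
WHERE QQPARM = 'MESSAGES_DEBUG'
```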
! The optimizer will attempt to advise what indexes or statistics should be created for a particular query
– The optimizer has no knowledge that either of these will actually benefit the query
– Advises when an index could be created that would match the selection within the query
• Shows up in debug message CPI432F or Database Monitor records 3000, 3001 or 3002
• Best used to help analyze complex selection for a query
– Advises when a column statistic should be created because there was no source of statistics on a column
• Only shows up in the 3015 Database Monitor record
Simple Optimization Strategy
! In order to optimize most queries, you will have to create an index for the optimizer to choose:
– All optimizers rely upon statistics for their costing, so make sure that appropriate statistics are available to the optimizer. Statistics come from:
• Existing indexes
• Column statistics
– In general, the perfect index will arrange the columns as follows:
• Selection predicates + join predicates
• Join predicates + selection predicates
• Selection predicates + group by columns
• Selection predicates + order by columns
– Always place the most selective columns as the first key(s) in the index
• May have to adjust the key field order to match your data model and how most queries access the table
– Attempt to avoid variable length and null capable columns as index keys
– Use any feedback from the optimizer to modify your approach and refine your indexes
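As a sketch of the "selection predicates + order by columns" pattern above (table, library and column names are purely illustrative):

```sql
-- Selection columns first, most selective leading, then the ORDER BY columns
CREATE INDEX MYLIB/ORDERS_IX1
ON MYLIB/ORDERS (CUSTNO, STATUS, ORDERDATE)
```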
! The symbol table contains information for each distinct key value. Each key value is assigned a unique code
! The code is 1, 2 or 4 bytes depending on the number of distinct key values
! Rather than a bit array for each distinct key value, the index has one array of codes
SELECT SUM(Item_Price)           Default Selectivity Statistic
FROM Sales
WHERE Date = '07/15/1966'        0.10
AND Store = 'MN55906'            0.10
AND EmpNum = 222473              0.10
                                 Combined: 0.001 (0.1%)

SQL4010 Table scan access for table 1.
The default selectivity statistic for each selection predicate is 10% (0.10) which means that the entire query is only assumed to be selecting 0.1% (0.001) of the records in the table (i.e. 10 records) for the summation.
An index with three keys: Date, Store and EmpNum (in any order because of the equal selection) can be used to satisfy all three selection predicate criteria at once.
This query will return the sum of all of the item prices for a given employee at a given store on a particular date.
Assume that no indexes exist for this table of 10,000 records.
SQL4008 Index X1 used for table 1.
SQL4011 Index scan-key row positioning used on table 1.
Simple Selection Query Graph
An index could be created over the columns Date, Store and EmpNum in any order because they all use equal selection predicates.
WHERE Date = '07/15/1966'
AND Store = 'MN55906'
AND EmpNum = 222473
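Such an index might be created as follows (the name X1 matches the index named in the debug messages; the library name is illustrative):

```sql
CREATE INDEX COOTER/X1
ON SALES (DATE, STORE, EMPNUM)
```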
SELECT Item_Price, Date
FROM Sales
WHERE Store = 'MN55906'
AND EmpNum = 222473
ORDER BY Date, Item_Price
SQL4011 Index scan-key row positioning used on table 1.
SQL4012 Index created from X1 for table 1.
The need for an index keyed on Store and EmpNum conflicts with an index required for the ORDER BY (Date and Item_Price).
However, an index keyed on Store, EmpNum, Date and Item_Price (in this order) can be used to satisfy both the selection and ordering requirements for this query at the same time.
This query will return all of the detail information that was used in the previous example.
Assume that only the index we created in the Simple Query example exists (Store, EmpNum and Date):
Always start looking at the bottom of the graph for the columns to place into the index first. Then work your way up the graph looking for additional columns.
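A sketch of the single index that satisfies both the selection and the ordering (the index name X2 is hypothetical):

```sql
-- Equal-selection columns first, then the ORDER BY columns in order
CREATE INDEX COOTER/X2
ON SALES (STORE, EMPNUM, DATE, ITEM_PRICE)
```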
AND ((Date IN (:HV_Date1, :HV_Date2)
      AND Store = :HV_Store1
      AND Location = 'Mid-West')
  OR (Date IN ('07/15/1966', :HV_Date3, '11/06/1969')
      AND Store = :HV_Store2))
SQL4010 Table scan access for table 1.
The default selectivity statistic for this query is difficult to figure out and probably will not give you much useful information (the actual default is 0.45585%).
Need to rely upon one of the other Optimization Tools to help interrogate the complex selection found within this query…
This query will return the sum of all of the item prices for a given employee on a set of dates at two particular stores.
Again assume that no indexes exist for this table of 10,000 records:
Message . . . :   Access path suggestion for file SALES.
Cause . . . . . : To improve performance the query optimizer is suggesting a permanent access path be built with the key fields it is recommending. The access path will access records from member SALES of file SALES in library COOTER.
In the list of key fields that follow, the query optimizer is recommending the first 3 key fields as primary key fields. The remaining key fields are considered secondary key fields and are listed in order of expected selectivity based on this query. The following list contains the suggested primary and secondary key fields:
DATE, EMPNUM, STORE, LOCATION.
The index advisor will help to identify what columns are being used as equal selection predicates across all of the OR ranges. These are labeled as the primary key fields within this debug message.
The index advisor will only recommend a list of index key candidates, a DBA will still need to make the final determination of the key field order for any permanent index.
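Acting on the advice above might look like the following. The key list (DATE, EMPNUM, STORE, LOCATION) comes from the advisor message; the index name is hypothetical, and per the note above the final key order remains the DBA's decision:

```sql
CREATE INDEX COOTER/SALES_IX1
ON SALES (DATE, EMPNUM, STORE, LOCATION)
```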
! DB2 UDB for iSeries Informational CD– White Papers– Expert Video Clips– Product Demos & Information– Self-Guided Education– Links to Additional Information
! Order CD from:– www-1.ibm.com/servers/eserver/education/media.html?true/true/true/true/