Explain 21773 Ram On

8/8/2019 Explain 21773 Ram On

1/41


2/41

Agenda

What is the EXPLAIN facility?

What does the EXPLAIN terminology mean?

What can be learned by reading the EXPLAIN text?

Where does the EXPLAIN output come from?

What are the latest features of EXPLAIN?

What can be done to influence the Optimizer?

Summary


3/41


4/41

How is EXPLAIN Text Generated?

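[Diagram: two request paths. A normal SQL request enters the PE, where the Optimizer (a cost-based optimizer) turns the Parse Tree into the Best Plan, which the Dispatcher sends to the AMPs. An EXPLAIN SQL request follows the same PE and Optimizer path from Parse Tree to Best Plan, but the plan is returned to the User as Explain Text.]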


5/41

Information Known to Optimizer

Number of nodes in system

Number and type of CPUs per node

Number of configured AMP Vprocs

Disk array configuration

Interconnect configuration

Amount and configuration of memory

All are taken into account when calculating query cost.


6/41

Additional Information Required by the Optimizer

Columns with indexes

Rows in the table

Rows per block

Values per column

Rows per value


7/41

Dynamic Sampling

Select a random AMP based on the session number.

Read its Master Index.

Count the cylinders with data for this table.

Select one cylinder and read its Cylinder Index.

Count the data blocks and rows for this table on this cylinder.

Calculate the approximate number of rows in this table as the product of:

> (Average Number of Rows per Block in the sampled Cylinder)

> (Number of Data Blocks in the sampled Cylinder)

> (Number of Cylinders with data for this table on this AMP)

> (Number of AMPs in this configuration)

Secondary Index demographics are collected the same way concurrently.

Any skewed component in the sample skews the demographics. The Optimizer makes its choices based on the demographics.

Skewed demographics mislead the Optimizer into choosing a plan which can degrade performance.

Different sessions may optimize a query differently with Random Samples. The parser is more aggressive with collected statistics.

Random Samples are:

> LESS COMPLETE than COLLECTed STATISTICS.

> MORE CURRENT than COLLECTed STATISTICS.
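The row-count extrapolation on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the sample numbers are hypothetical, and the real Optimizer reads these counts from the Master Index and Cylinder Index rather than taking them as arguments.

```python
def estimate_table_rows(rows_per_block, blocks_in_cylinder,
                        cylinders_on_amp, num_amps):
    """Extrapolate one sampled cylinder on one AMP to a table-wide row count."""
    factors = (rows_per_block, blocks_in_cylinder, cylinders_on_amp, num_amps)
    if min(factors) < 0:
        raise ValueError("sample counts must be non-negative")
    # Product of the four factors listed on the slide.
    return round(rows_per_block * blocks_in_cylinder * cylinders_on_amp * num_amps)


# Hypothetical sample: 50 rows per block, 20 data blocks in the sampled
# cylinder, 3 cylinders with table data on the sampled AMP, 10 AMPs.
estimate = estimate_table_rows(50, 20, 3, 10)
print(estimate)  # 30000
```

Because every factor comes from a single cylinder on a single AMP, an unusual value in any one of them is multiplied straight through to the final estimate.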


8/41

Dynamic Sampling

V2R6.0 Enhancement

When statistics are not available, the Optimizer can obtain random samples from more than one AMP when generating row counts for a query plan.

Improves the row count, row size and rows per value estimates for a given table.
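A small Python sketch shows why sampling more than one AMP can help when data is skewed. The per-AMP row counts are invented, and averaging the sampled AMPs is an assumption made for illustration; the slide does not say how many AMPs are sampled or how the samples are combined.

```python
from statistics import mean


def estimate_from_amps(sampled_rows_per_amp, num_amps):
    """Scale the average row count of the sampled AMPs up to all AMPs."""
    if not sampled_rows_per_amp:
        raise ValueError("at least one AMP must be sampled")
    return round(mean(sampled_rows_per_amp) * num_amps)


# Hypothetical 4-AMP system in which the AMP at index 2 holds a skewed share.
rows_on_amp = [1000, 1000, 4000, 1000]
true_total = sum(rows_on_amp)                               # 7000

single_amp = estimate_from_amps([rows_on_amp[2]], 4)        # skewed AMP only
multi_amp = estimate_from_amps(rows_on_amp[1:4], 4)         # three AMPs sampled

print(true_total, single_amp, multi_amp)  # 7000 16000 8000
```

In this made-up case the single-AMP sample overestimates the table by more than a factor of two, while the three-AMP sample lands much closer to the true count.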


9/41

Optimizer Facts

Cost-based Optimizer - looks for lowest cost plan

Does not store plan - dynamically regenerates

As data demographics change, so may plan

Will only assign cost to steps for which there are choices

Assigns confidence factors on row estimates

Mature, large-table, decision-support optimization


10/41

EXPLAIN Example

EXPLAIN SELECT department_name

,last_name

,first_name

FROM employee INNER JOIN department

ON employee.employee_number =

department.manager_employee_number;


11/41

EXPLAIN Example (cont.)

1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.

2) Next, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.department.

3) We lock SAMPLE.employee for read, and we lock SAMPLE.department for read.

4) We execute the following steps in parallel.

   1) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with no residual conditions into Spool 2, which is built locally on the AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 2,002 rows. The estimated time for this step is 0.22 seconds.

   2) We do an all-AMPs RETRIEVE step from SAMPLE.department by way of an all-rows scan with no residual conditions into Spool 3, which is duplicated on all AMPs. Then we do a SORT to order Spool 3 by row hash. The size of Spool 3 is estimated with high confidence to be 792 rows. The estimated time for this step is 0.05 seconds.

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use). Spool 2 and Spool 3 are joined using a merge join, with a join condition of ("manager_employee_number = manager_employee_number"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with index join confidence to be 5,916 rows. The estimated time for this step is 0.42 seconds.

6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.63 seconds.
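When reading many plans, it can help to pull the row estimates and confidence phrases out of the EXPLAIN text mechanically. The following Python sketch does that with a regular expression. The pattern is an assumption based only on the wording shown in this example, not on a documented output format, and the function name is hypothetical.

```python
import re

# Assumed wording, taken from the example plan above.
ESTIMATE = re.compile(
    r"size of Spool (\d+) is estimated with "
    r"(high|low|no|index join) confidence to be ([\d,]+) rows",
    re.IGNORECASE,
)


def spool_estimates(explain_text):
    """Return (spool number, confidence, estimated rows) for each estimate."""
    flat = " ".join(explain_text.split())  # undo line wrapping
    return [
        (int(spool), confidence.lower(), int(rows.replace(",", "")))
        for spool, confidence, rows in ESTIMATE.findall(flat)
    ]


SAMPLE_TEXT = """
The size of Spool 2 is
estimated with high confidence to be 2,002 rows.
The size of Spool 3 is estimated with high confidence to be 792 rows.
The size of Spool 1 is estimated with index join confidence to be 5,916 rows.
"""

for row in spool_estimates(SAMPLE_TEXT):
    print(row)
```

Steps reported with low or no confidence are the natural places to start when deciding where statistics might be worth collecting.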


12/41

EXPLAIN Terminology

Pseudo Table Locks

Prevents two users from getting conflicting locks with all-AMP requests.

All-AMP lock requests are handled as follows:

> PE determines Table ID hash for an AMP to manage the all-AMP lock request.

> Put pseudo lock on the table

> Acquire lock on all AMPs
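The idea behind the steps above is that every all-AMP lock request for the same table is routed to the same AMP first, so requests queue in one place instead of deadlocking across AMPs. The Python sketch below models only that idea. The class and function names are hypothetical, and `zlib.crc32` is a stand-in for Teradata's hashing, which this deck does not describe.

```python
import zlib
from collections import defaultdict, deque


def gatekeeper_amp(table_id, num_amps):
    """Pick the AMP that manages all-AMP lock requests for a table."""
    # crc32 is only a stand-in hash for this illustration.
    return zlib.crc32(table_id.encode("utf-8")) % num_amps


class PseudoLockManager:
    """Serialize all-AMP lock requests per table at one gatekeeper AMP."""

    def __init__(self, num_amps):
        self.num_amps = num_amps
        self.queues = defaultdict(deque)  # (amp, table) -> waiting sessions

    def request(self, session, table_id):
        """Queue a request; True means the session holds the pseudo lock."""
        key = (gatekeeper_amp(table_id, self.num_amps), table_id)
        self.queues[key].append(session)
        return self.queues[key][0] == session

    def release(self, session, table_id):
        """Release the pseudo lock and return the next holder, if any."""
        key = (gatekeeper_amp(table_id, self.num_amps), table_id)
        queue = self.queues[key]
        if not queue or queue[0] != session:
            raise ValueError("session does not hold the pseudo lock")
        queue.popleft()
        return queue[0] if queue else None


manager = PseudoLockManager(num_amps=4)
print(manager.request("session_1", "SAMPLE.employee"))  # True: may lock all AMPs
print(manager.request("session_2", "SAMPLE.employee"))  # False: must wait
print(manager.release("session_1", "SAMPLE.employee"))  # session_2
```

Only the session at the head of the queue goes on to acquire the real lock on all AMPs, which is what keeps two users from holding conflicting partial locks.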


13/41

EXPLAIN Terminology

[Diagram: Pseudo Table Locks. Two PEs each determine the Table ID hash and send their all-AMP lock requests (first request, second request) to the same AMP.]


14/41

EXPLAIN Terminology

Most EXPLAIN text is easy to understand. The following additional definitions may be helpful:

... (Last Use)

A spool file is no longer needed and will be released when this step completes.

... with no residual conditions

All applicable conditions have been applied to the rows.

... END TRANSACTION

Transaction locks are released, and changes are committed.

... eliminating duplicate rows ...

Doing a DISTINCT operation. (Duplicate rows only exist in spool files, not set tables.)

... by way of the sort key in spool field1

Field1 is created to allow a tag sort.

... we do an ABORT test

Caused by an ABORT or ROLLBACK statement.

... by way of a traversal of index #n extracting row ids only

A spool file is built containing the Row IDs found in a secondary index (index #n).


15/41

EXPLAIN Terminology (cont.)

... we do a SMS (set manipulation step)

Combining rows using a UNION, MINUS, or INTERSECT operator.

... we do a BMSMS (bit map set manipulation step)

Doing a NUSI Bit Map operation.

... which is redistributed by hash code to all AMPs.

... which is duplicated on all AMPs.

Relocating data in preparation for a join.

... (one_AMP) or (group_AMPs)

Indicates one AMP or a subset of AMPs will be used instead of all AMPs.

... ("NOT (table_name.column_name IS NULL)")

Feature in which the Optimizer realizes that the column being joined to is NOT NULL or has referential integrity.

... joined using a row id join

Indicates a join back condition with a join index.


16/41

Understanding Row and Time Estimates

The EXPLAIN facility may express confidence for a retrieve from a table. Some of the phrases used are:

. . . with high confidence . . .

> Restricting conditions exist on index(es) or column(s) that have collected statistics.

. . . with low confidence . . .

> Restricting conditions exist on index(es) having no statistics, but estimates can be based upon a sampling of the index(es).

> Restricting conditions exist on index(es) or column(s) that have collected statistics but are AND-ed together with conditions on non-indexed columns.

> Restricting conditions exist on index(es) or column(s) that have collected statistics but are OR-ed together with other conditions.

. . . with no confidence . . .

> Conditions outside the above.

For a retrieve from a spool, the confidence is the same as the step generating the spool.


17/41


The following are confidence phrases for a join:

. . . with index join confidence . . .

> A join condition via a primary index.

. . . with high confidence . . .

> One input relation has high confidence and the other has high or index join confidence.

. . . with low confidence . . .

> One input relation has low confidence and the other has low, high, or index join confidence.

. . . with no confidence . . .

> One input relation has no confidence.

> Statistics do not exist for either join field.
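The join rules above can be read as a small decision function. The Python sketch below encodes them under one labelled assumption: the order in which the rules are checked (primary index join first, then no, then low, then high) is a guess, since the slide lists the cases but does not state their precedence. The function name is hypothetical.

```python
LEVELS = {"high", "low", "no", "index join"}


def join_confidence(left, right, via_primary_index=False):
    """Combine the confidence of two join inputs, following the slide's cases."""
    if left not in LEVELS or right not in LEVELS:
        raise ValueError("unknown confidence level")
    # Assumed precedence: the slide does not say which rule wins on overlap.
    if via_primary_index:
        return "index join"
    if "no" in (left, right):
        return "no"
    if "low" in (left, right):
        return "low"
    return "high"  # both inputs are high or index join


print(join_confidence("high", "high", via_primary_index=True))  # index join
print(join_confidence("high", "index join"))                    # high
print(join_confidence("low", "high"))                           # low
print(join_confidence("no", "high"))                            # no
```

The practical reading is that a join estimate is only as trustworthy as its weakest input, unless the join itself goes through a primary index.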


18/41

Synchronized Scanning

EXPLAIN SELECT * FROM message_detail;

1) First, we lock TWI.message_detail for read.

2) Next, we do an all-AMPs RETRIEVE step from TWI.message_detail by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 301,886 rows. The estimated time for this step is 1 minute and 15 seconds.

3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 1 minute and 15 seconds.


19/41

Sync Scanning

Permits overlap processing for concurrent table scans. Takes advantage of data blocks already read into memory.

[Diagram: timeline against Available Memory, labelled "User 1 Begins", "User 2 Begins" and "User 2 Continues". Captions: User 1 and User 2 scan the data in sync; data blocks are kept in memory for the synchronized scan (released on an LRU basis); data blocks are immediately discarded after User 1 finishes.]


20/41

Query Cost Estimates

Row estimates:

> May be estimated using random samples, statistics or indexes

> Are assigned a confidence level - high, low or none

> Affect timing estimates - more rows, more time needed

Timings:

> Used to determine the lowest cost plan

> Total cost generated if all processing steps have assigned cost

> Not intended to predict wall-clock time, useful for comparisons

Miscellaneous Notes:

> Estimates too large to display show 3 asterisks (***).

> The accuracy of the time estimate depends upon the accuracy of the row estimate.


21/41


Low and no confidence may indicate a need to collect statistics on indexes or columns involved in restricting conditions.

You may otherwise consider a closer examination of the conditions in the query for possible changes that may improve the confidence.

Collecting statistics or altering the conditions has no real impact unless it influences the Optimizer to pick a better plan.


22/41

Parallel Steps

PARALLEL STEPS are AMP steps that can execute concurrently:

They have no functional overlap and do not contend for resources.

They improve performance.

The Optimizer generates PARALLEL STEPS whenever possible.

EXPLAIN text identifies Parallel Steps.


23/41

EXPLAIN of Create Table

QUERY

EXPLAIN CREATE TABLE SAMPLE.department

( department_number SMALLINT

, department_name CHAR(30) NOT NULL

, budget_amount DECIMAL(10,2)

, manager_employee_number INTEGER)

UNIQUE PRIMARY INDEX (department_number) ;

EXPLANATION

--------------------------------------------------------------------------------------

1) First, we lock SAMPLE.department for exclusive use.

2) Next, we lock a distinct DBC."pseudo table" for write on a RowHash for deadlock prevention, we lock a distinct DBC."pseudo table" for read on a RowHash for deadlock prevention, we lock a distinct DBC."pseudo table" for write on a RowHash for deadlock prevention, and we lock a distinct DBC."pseudo table" for write on a RowHash for deadlock prevention.

3) We lock DBC.AccessRights for write on a RowHash, we lock DBC.TVFields for write on a RowHash, we lock DBC.TVM for write on a RowHash, we lock DBC.DBase for read on a RowHash, and we lock DBC.Indexes for write on a RowHash.


24/41

EXPLAIN of Create Table (cont.)

4) We execute the following steps in parallel.

   1) We do a single-AMP ABORT test from DBC.DBase by way of the unique primary index.

   2) We do a single-AMP ABORT test from DBC.TVM by way of the unique primary index with no residual conditions.

   3) We do an INSERT into DBC.TVFields (no lock required).

   4) We do an INSERT into DBC.TVFields (no lock required).

   5) We do an INSERT into DBC.TVFields (no lock required).

   6) We do an INSERT into DBC.TVFields (no lock required).

   7) We do an INSERT into DBC.Indexes (no lock required).

   8) We do an INSERT into DBC.TVM (no lock required).

   9) We INSERT default rights to DBC.AccessRights for SAMPLE.department.

5) We create the table header.

6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> No rows are returned to the user as the result of statement 1.


25/41

Unique Primary Index Request (UPI)

EXPLAIN

SELECT * FROM employee WHERE employee_number = 801;

1) First, we do a single-AMP RETRIEVE step from SAMPLE.employee by way of the unique primary index "SAMPLE.employee.employee_number = 801" with no residual conditions. The estimated time for this step is 0.03 seconds.

> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

Simplest and most efficient type of access

Spool is not used


26/41


27/41

Full Table Scan

EXPLAIN

SELECT employee_number FROM employee

WHERE manager_employee_number = 40801

AND job_code = 411100;

1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.

2) Next, we lock SAMPLE.employee for read.

3) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with a condition of ("(SAMPLE.employee.job_code = 411100) AND (SAMPLE.employee.manager_employee_number = 40801)") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 139 rows. The estimated time for this step is 0.23 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.23 seconds.


28/41


29/41

Optimized INSERT/SELECT

INSERT/SELECT is the process of SELECTing data from one table and using it as input to be inserted into another table.

Two different optimizations can occur:

1.) If the PI of the source and destination tables are identical, an AMP local operation is used.

2.) If the target table is empty,

a.) Transient Journaling is reduced

b.) 64K block transfers are used

If both conditions are satisfied,

both optimizations are used.
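The two conditions on this slide are independent of each other, which a short Python sketch makes explicit. The function name and the wording of the returned labels are illustrative only; the logic is just the slide's two rules.

```python
def insert_select_optimizations(same_primary_index, target_empty):
    """List the INSERT/SELECT optimizations that apply, per the slide's rules."""
    applied = []
    if same_primary_index:
        # Rule 1: identical PI on source and destination tables.
        applied.append("AMP-local operation")
    if target_empty:
        # Rule 2: empty target table.
        applied.append("reduced transient journaling")
        applied.append("64K block transfers")
    return applied


print(insert_select_optimizations(True, True))
print(insert_select_optimizations(False, True))
print(insert_select_optimizations(False, False))  # []
```

Copying into a freshly created, empty table with the same Primary Index as the source is the case where everything on the list applies.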


30/41

Optimized INSERT/SELECT Example

EXPLAIN

INSERT INTO employee_copy SELECT * FROM employee;

1) First, we lock a distinct SAMPLE."pseudo table" for write on a RowHash to prevent global deadlock for SAMPLE.employee_copy.

2) Next, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.

3) We lock SAMPLE.employee_copy for write, and we lock SAMPLE.employee for read.

4) We do a MERGE into SAMPLE.employee_copy from SAMPLE.employee.

5) We spoil the parser's dictionary cache for the table.

6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> No rows are returned to the user as the result of statement 1.


31/41

INSERT/SELECT With Different PIs

If the target table has a different Primary Index, a standard INSERT/SELECT process must be used. A BYNET operation will be used to relocate the SELECTed rows onto the target AMPs.

This will require:

a.) Single row inserts (vs. 64 K blocks)

b.) Transient journal entries for each row


32/41

Non-Optimized INSERT/SELECT Example

CREATE TABLE employee

(employee_number INTEGER

,manager_employee_number INTEGER

:

,salary_amount DECIMAL(10,2) NOT NULL)

UNIQUE PRIMARY INDEX(employee_number);

CREATE TABLE employee_charPI

(employee_number CHAR(11)

,manager_employee_number INTEGER

:

,salary_amount DECIMAL(10,2) NOT NULL)

UNIQUE PRIMARY INDEX(employee_number);


33/41

Non-Optimized INSERT/SELECT Example (cont.)

EXPLAIN INSERT INTO employee_charPI SELECT * FROM employee;

1) First, we lock a distinct SAMPLE."pseudo table" for write on a RowHash to prevent global deadlock for SAMPLE.employee_charPI.

2) Next, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.

3) We lock SAMPLE.employee_charPI for write, and we lock SAMPLE.employee for read.

4) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with no residual conditions into Spool 1, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by row hash. The size of Spool 1 is estimated with high confidence to be 2,002 rows. The estimated time for this step is 0.61 seconds.

5) We do a MERGE into SAMPLE.employee_charPI from Spool 1 (Last Use).

6) We spoil the parser's dictionary cache for the table.

7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> No rows are returned to the user as the result of statement 1.


34/41

Unexpected Full Table Scan

EXPLAIN SELECT * FROM employee_charPI

WHERE employee_number = 801;

1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee_charPI.

2) Next, we lock SAMPLE.employee_charPI for read.

3) We do an all-AMPs RETRIEVE step from SAMPLE.employee_charPI by way of an all-rows scan with a condition of ("(TRANSLATE((SAMPLE.employee_charPI.employee_number )USING LATIN_TO_UNICODE)(FLOAT, FORMAT '-9.99999999999999E-999'))= 8.01000000000000E 002") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 212 rows. The estimated time for this step is 0.12 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.12 seconds.


35/41

Correct Use of Primary Index

EXPLAIN SELECT * FROM employee_charPI

WHERE employee_number = '801';

1) First, we do a single-AMP RETRIEVE step from SAMPLE.employee_charPI by way of the unique primary index "SAMPLE.employee_charPI.employee_number = '801'" with no residual conditions. The estimated time for this step is 0.01 seconds.

> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.01 seconds.
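Why does a numeric literal against a CHAR Primary Index force the full table scan seen in the previous example? The plan there converts every stored value to FLOAT before comparing, and many different strings convert to the same number. The Python sketch below is only an analogy: Python's `float()` stands in for the database's CHAR-to-FLOAT conversion, and the candidate strings are invented.

```python
def numeric_matches(stored_values, literal):
    """Values that match when each stored string is converted to a number."""
    return [value for value in stored_values if float(value) == float(literal)]


def string_matches(stored_values, literal):
    """Values that match when the literal is compared as a string."""
    return [value for value in stored_values if value == literal]


# Hypothetical CHAR values that could sit in employee_number.
stored = ["801", " 801", "0801", "801.0", "8.01E2"]

print(numeric_matches(stored, 801))   # all five strings qualify
print(string_matches(stored, "801"))  # ['801']
```

With the numeric comparison there is no single string value to hash, so every row has to be read and converted. With the quoted literal there is exactly one value, so it can be hashed and the row fetched from one AMP.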


36/41

Explaining Macros

CREATE MACRO test (dept_no INTEGER)

AS

(SELECT * FROM employee

WHERE department_number = :dept_no;);

EXPLAIN EXEC test (17401);

1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.

2) Next, we lock SAMPLE.employee for read.

3) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of index # 4 "SAMPLE.employee.department_number = 17401" with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 7 rows. The estimated time for this step is 0.23 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.23 seconds.


37/41

Explaining Macros (cont.)

EXPLAIN USING (dept_in INTEGER) EXECUTE test (:dept_in);

1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.

2) Next, we lock SAMPLE.employee for read.

3) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with a condition of ("SAMPLE.employee.department_number = :x") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 20 rows. The estimated time for this step is 0.26 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.26 seconds.


38/41

How to Influence the Optimizer

COLLECTED STATISTICS can help the Optimizer make better decisions using actual row counts and data distribution information.

Collect Statistics on:

> Non-unique indexes

> Non-index join columns

> Small tables

Collect Statistics down-side:

> Requires a full table scan

> Must be kept current

> May be unnecessary


39/41

Other Factors To Help Optimizer

Proper index choices at physical design time

Add secondary indexes where helpful

Use equality-based join conditions

Experiment using EXPLAIN


40/41

Summary

EXPLAIN is a tool to help you plan query resources

Teradata uses a cost-based optimizer

Adding Secondary Indexes gives optimizer more choices

Collecting Statistics allows better plan estimates

Most mature optimizer for large table decision support in the industry


41/41
