
Explain 21773 Ram On

Apr 10, 2018


hyd.rasool
  • 8/8/2019 Explain 21773 Ram On


    Agenda

    What is the EXPLAIN facility?

    What does the EXPLAIN terminology mean?

    What can be learned by reading the EXPLAIN text?

    Where does the EXPLAIN output come from?

    What are the latest features of EXPLAIN?

    What can be done to influence the Optimizer?

    Summary


    How is EXPLAIN Text Generated?

    [Figure (Cost-based Optimizer). Normal request: SQL -> PE (Parse Tree -> Optimizer -> Best Plan) -> Dispatcher -> AMPs. EXPLAIN request: EXPLAIN SQL -> PE (Parse Tree -> Optimizer -> Best Plan) -> Explain Text -> User.]


    Information Known to Optimizer

    Number of nodes in system

    Number and type of CPUs per node

    Number of configured AMP Vprocs

    Disk array configuration

    Interconnect configuration

    Amount and configuration of memory

    All are taken into account when calculating query cost.


    Additional Information Required by the Optimizer

    Columns with indexes

    Rows in the table

    Rows per block

    Values per column

    Rows per value
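To make "values per column" and "rows per value" concrete, here is a minimal Python sketch that computes them from one column's values and uses the average rows per value to estimate an equality condition. The function names and the averaging rule are illustrative assumptions, not how Teradata stores or applies statistics; the department numbers are hypothetical.

```python
from collections import Counter


def column_demographics(values):
    """Simple demographics for one column (illustrative model only).

    Every value is read, which mirrors why full statistics need a full pass.
    """
    counts = Counter(values)
    rows = len(values)
    distinct = len(counts)
    return {
        "rows": rows,
        "values_per_column": distinct,
        "avg_rows_per_value": rows / distinct if distinct else 0.0,
        "max_rows_per_value": max(counts.values(), default=0),
    }


def estimate_equality_rows(demographics):
    """Estimate rows for `column = <constant>` from the average rows per value."""
    return round(demographics["avg_rows_per_value"])


# Hypothetical department_number values for 12 employee rows.
dept = [401, 401, 401, 403, 403, 501, 501, 501, 501, 501, 302, 302]
stats = column_demographics(dept)
print(stats)
print(estimate_equality_rows(stats))  # 3
```

The gap between the average (3 rows per value) and the maximum (5 rows for value 501) is the kind of skew that a single average cannot capture.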


    Dynamic Sampling

    Select a random AMP based on Session number. Read its Master Index. Count the cylinders with data for this table. Select one cylinder and read its Cylinder Index. Count the data blocks and rows for this table on this cylinder. Calculate the approximate number of rows in this table:

    $$N_{\text{rows}} \approx R \times B \times C \times A$$

    where R is the average number of rows per block in the sampled cylinder, B is the number of data blocks in the sampled cylinder, C is the number of cylinders with data for this table on this AMP, and A is the number of AMPs in this configuration.

    Secondary index demographics are collected the same way, concurrently.

    Any skew in the sampled component skews the demographics, and the Optimizer makes its choices based on those demographics.

    Skewed demographics can mislead the Optimizer into choosing a plan that degrades performance.

    Different sessions may optimize the same query differently when random samples are used. The parser is more aggressive when collected statistics are available. Random samples are:

    > LESS COMPLETE than collected statistics.

    > MORE CURRENT than collected statistics.
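The dynamic-sampling estimate on this slide (average rows per block × data blocks in the sampled cylinder × cylinders with data on the AMP × number of AMPs) can be written as a short Python sketch. It is a toy model: `CylinderSample` and `estimate_table_rows` are hypothetical names and the numbers are invented. The second call shows how sampling a skewed AMP inflates the whole-table estimate.

```python
from dataclasses import dataclass


@dataclass
class CylinderSample:
    """What the sampled AMP reads from one cylinder index (hypothetical model)."""
    rows: int         # rows of this table in the sampled cylinder
    data_blocks: int  # data blocks of this table in the sampled cylinder


def estimate_table_rows(sample: CylinderSample, cylinders_on_amp: int, num_amps: int) -> int:
    """Four-factor dynamic-sample row estimate."""
    avg_rows_per_block = sample.rows / sample.data_blocks
    return round(avg_rows_per_block * sample.data_blocks * cylinders_on_amp * num_amps)


sample = CylinderSample(rows=1_000, data_blocks=50)

# Evenly distributed table: every AMP holds 4 cylinders of this table.
even = estimate_table_rows(sample, cylinders_on_amp=4, num_amps=10)

# Same cylinder sample, but the chosen AMP is a "hot" AMP holding 12 cylinders
# (e.g. because of a skewed primary index); the skew is multiplied across all AMPs.
skewed = estimate_table_rows(sample, cylinders_on_amp=12, num_amps=10)

print(even, skewed)  # 40000 120000
```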


    Dynamic Sampling

    V2R6.0 Enhancement

    When statistics are not available, the Optimizer can obtain random samples from more than one AMP when generating row counts for a query plan.

    Improves the row count, row size, and rows-per-value estimates for a given table.
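A hedged sketch of why sampling several AMPs helps: averaging the per-AMP row counts of the sampled AMPs dilutes the effect of a single skewed AMP. The averaging rule, the function name, and the row counts below are assumptions for illustration, not the documented V2R6.0 algorithm.

```python
def estimate_rows_multi_amp(sampled_amp_rows: list[int], num_amps: int) -> int:
    """Average the row counts of the sampled AMPs, then scale to all AMPs."""
    if not sampled_amp_rows:
        raise ValueError("need at least one sampled AMP")
    return round(sum(sampled_amp_rows) / len(sampled_amp_rows) * num_amps)


# Hypothetical 10-AMP system: nine AMPs hold 4,000 rows, one skewed AMP holds 12,000.
actual_rows = 9 * 4_000 + 12_000  # 48,000

one_amp = estimate_rows_multi_amp([12_000], num_amps=10)                                # hot AMP only
five_amps = estimate_rows_multi_amp([12_000, 4_000, 4_000, 4_000, 4_000], num_amps=10)  # hot AMP + four others

print(actual_rows, one_amp, five_amps)  # 48000 120000 56000
```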


    Optimizer Facts

    Cost-based Optimizer - looks for lowest cost plan

    Does not store plan - dynamically regenerates

    As data demographics change, so may plan

    Will only assign cost to steps for which there are choices

    Assigns confidence factors to row estimates

    Mature, large-table, decision-support optimization


    EXPLAIN Example

    EXPLAIN SELECT department_name
          ,last_name
          ,first_name
    FROM employee INNER JOIN department
    ON employee.employee_number = department.manager_employee_number;


    EXPLAIN Example (cont.)

    1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.
    2) Next, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.department.
    3) We lock SAMPLE.employee for read, and we lock SAMPLE.department for read.
    4) We execute the following steps in parallel.
         1) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with no residual conditions into Spool 2, which is built locally on the AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 2,002 rows. The estimated time for this step is 0.22 seconds.
         2) We do an all-AMPs RETRIEVE step from SAMPLE.department by way of an all-rows scan with no residual conditions into Spool 3, which is duplicated on all AMPs. Then we do a SORT to order Spool 3 by row hash. The size of Spool 3 is estimated with high confidence to be 792 rows. The estimated time for this step is 0.05 seconds.
    5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use). Spool 2 and Spool 3 are joined using a merge join, with a join condition of ("manager_employee_number = manager_employee_number"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with index join confidence to be 5,916 rows. The estimated time for this step is 0.42 seconds.
    6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.63 seconds.


    EXPLAIN Terminology

    Pseudo Table Locks

    Prevent two users' all-AMP lock requests on the same table from acquiring conflicting locks and deadlocking each other.

    All-AMP lock requests are handled as follows:

    > The PE hashes the table ID to determine the AMP that manages the all-AMP lock request.

    > A pseudo lock is placed on the table through that AMP.

    > The lock is then acquired on all AMPs.
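The routing step can be sketched as follows. In this Python toy model (hypothetical class and function names), `zlib.crc32` stands in for the table ID hash; because the same table ID always maps to the same AMP, the first and second all-AMP lock requests for a table queue up in one place instead of racing each other across the AMPs. Lock modes are ignored, so compatible read locks are shown waiting even though they would not.

```python
import zlib
from collections import defaultdict, deque


def gatekeeper_amp(table_id: str, num_amps: int) -> int:
    """Pick the AMP that serializes all-AMP lock requests for a table.

    zlib.crc32 is a stand-in for Teradata's hash (an assumption); what matters
    is that one table ID always maps to the same AMP.
    """
    return zlib.crc32(table_id.encode("utf-8")) % num_amps


class PseudoTableLocks:
    def __init__(self, num_amps: int):
        self.num_amps = num_amps
        self.queues = defaultdict(deque)  # (amp, table_id) -> requests in arrival order

    def request(self, user: str, table_id: str) -> int:
        amp = gatekeeper_amp(table_id, self.num_amps)
        self.queues[(amp, table_id)].append(user)
        return amp

    def holder(self, table_id: str) -> str:
        """The request at the head of the queue goes on to lock the table on all AMPs."""
        amp = gatekeeper_amp(table_id, self.num_amps)
        return self.queues[(amp, table_id)][0]


locks = PseudoTableLocks(num_amps=4)
amp1 = locks.request("user1", "SAMPLE.employee")  # first request
amp2 = locks.request("user2", "SAMPLE.employee")  # second request lands on the same AMP
print(amp1 == amp2, locks.holder("SAMPLE.employee"))  # True user1
```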


    EXPLAIN Terminology

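    [Figure: Pseudo Table Locks. Two PEs each submit an all-AMP lock request for the same table (first request, second request). Each PE determines the table ID hash, so both requests are routed to the same AMP.]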


    EXPLAIN Terminology

    Most EXPLAIN text is easy to understand. The following additional definitions may be helpful:

    ... (Last Use) ...
    A spool file is no longer needed and will be released when this step completes.

    ... with no residual conditions ...
    All applicable conditions have been applied to the rows.

    ... END TRANSACTION ...
    Transaction locks are released, and changes are committed.

    ... eliminating duplicate rows ...
    A DISTINCT operation is being performed. (Duplicate rows exist only in spool files, not in set tables.)

    ... by way of the sort key in spool field1 ...
    Field1 is created to allow a tag sort.

    ... we do an ABORT test ...
    Caused by an ABORT or ROLLBACK statement.

    ... by way of a traversal of index #n extracting row ids only ...
    A spool file is built containing the Row IDs found in a secondary index (index #n).


    EXPLAIN Terminology, (cont.)

    ... we do a SMS (set manipulation step) ...
    Rows are combined using a UNION, MINUS, or INTERSECT operator.

    ... we do a BMSMS (bit map set manipulation step) ...
    A NUSI bit map operation is being performed.

    ... which is redistributed by hash code to all AMPs ...
    ... which is duplicated on all AMPs ...
    Data is being relocated in preparation for a join.

    ... (one_AMP) or (group_AMPs) ...
    One AMP or a subset of AMPs will be used instead of all AMPs.

    ... ("NOT (table_name.column_name IS NULL)") ...
    The Optimizer has recognized that the column being joined to is NOT NULL or is covered by referential integrity.

    ... joined using a row id join ...
    Indicates a join-back condition with a join index.
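A NUSI bit map operation can be pictured as turning the row IDs that each NUSI qualifies into a bit map and combining the maps (AND for AND-ed conditions, OR for OR-ed ones) before any base rows are read. The Python sketch below uses integers as bit maps and small integers as stand-ins for row IDs; real row IDs are (row hash, uniqueness value) pairs, and the function names and row IDs are hypothetical.

```python
def to_bitmap(row_ids):
    """Set bit n for every row ID n (row IDs simplified to small integers)."""
    bitmap = 0
    for rid in row_ids:
        bitmap |= 1 << rid
    return bitmap


def from_bitmap(bitmap):
    """Return the row IDs whose bits are set, in ascending order."""
    rids = []
    n = 0
    while bitmap:
        if bitmap & 1:
            rids.append(n)
        bitmap >>= 1
        n += 1
    return rids


# Hypothetical row IDs qualified by two NUSIs on the same AMP.
job_code_rids = [2, 5, 7, 9, 12]   # job_code = 411100
manager_rids = [1, 5, 9, 10]       # manager_employee_number = 40801

both = from_bitmap(to_bitmap(job_code_rids) & to_bitmap(manager_rids))
print(both)  # [5, 9]
```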


    Understanding Row and Time Estimates

    The EXPLAIN facility may express confidence for a retrieve from a table. Some of the phrases used are:

    . . . with high confidence . . .
    > Restricting conditions exist on index(es) or column(s) that have collected statistics.

    . . . with low confidence . . .
    > Restricting conditions exist on index(es) having no statistics, but estimates can be based upon a sampling of the index(es).
    > Restricting conditions exist on index(es) or column(s) that have collected statistics but are AND-ed together with conditions on non-indexed columns.
    > Restricting conditions exist on index(es) or column(s) that have collected statistics but are OR-ed together with other conditions.

    . . . with no confidence . . .
    > Conditions outside the above.

    For a retrieve from a spool, the confidence is the same as that of the step generating the spool.
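The table-retrieve rules above can be encoded as a small classifier. This Python sketch is a simplification: `Condition` and `retrieve_confidence` are hypothetical names, AND-ed and OR-ed mixtures are treated alike, and retrieves with no restricting condition are out of scope. Assuming neither column has an index or statistics, it agrees with the "no confidence" estimate in the Full Table Scan example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    indexed: bool    # the condition is on an index
    has_stats: bool  # statistics have been collected on that index or column


def retrieve_confidence(conditions: list[Condition]) -> str:
    """Classify a retrieve from a table as "high", "low" or "no" confidence."""
    if not conditions:
        raise ValueError("this sketch only covers retrieves with restricting conditions")
    if all(c.has_stats for c in conditions):
        return "high"  # every condition is backed by collected statistics
    if any(c.has_stats for c in conditions):
        return "low"   # statistics AND-ed or OR-ed with unsupported conditions
    if any(c.indexed for c in conditions):
        return "low"   # no statistics, but the index can be sampled
    return "no"


# Full Table Scan example: job_code and manager_employee_number, no index, no statistics.
print(retrieve_confidence([Condition(False, False), Condition(False, False)]))  # no
```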


    Understanding Row and Time Estimates

    The following are confidence phrases for a join:

    . . . with index join confidence . . .
    > A join condition via a primary index.

    . . . with high confidence . . .
    > One input relation has high confidence and the other has high or index join confidence.

    . . . with low confidence . . .
    > One input relation has low confidence and the other has low, high, or index join confidence.

    . . . with no confidence . . .
    > One input relation has no confidence.
    > Statistics do not exist for either join field.
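The join rules can be sketched the same way. The slide does not say which rule wins when several apply, so the precedence in this Python sketch (no confidence first, then a primary-index join, then low, then high) is an assumption, as is the function name. With two high-confidence inputs joined on employee_number, it returns the index join confidence reported in the employee/department example.

```python
LEVELS = {"index join", "high", "low", "no"}


def join_confidence(left: str, right: str, via_primary_index: bool = False,
                    stats_on_join_fields: bool = True) -> str:
    """Combine the confidences of two join inputs (assumed rule precedence)."""
    if left not in LEVELS or right not in LEVELS:
        raise ValueError("confidence must be one of: " + ", ".join(sorted(LEVELS)))
    if "no" in (left, right) or not stats_on_join_fields:
        return "no"
    if via_primary_index:
        return "index join"
    if "low" in (left, right):
        return "low"
    return "high"  # both inputs are "high" or "index join"


print(join_confidence("high", "high", via_primary_index=True))  # index join
```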


    Synchronized Scanning

    EXPLAIN SELECT * FROM message_detail;

    1) First, we lock TWI.message_detail for read.
    2) Next, we do an all-AMPs RETRIEVE step from TWI.message_detail by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 301,886 rows. The estimated time for this step is 1 minute and 15 seconds.
    3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 1 minute and 15 seconds.


    Sync Scanning

    Permits overlapped processing of concurrent table scans by taking advantage of data blocks already read into memory.

    [Figure: Available Memory over time. User 1 begins a scan; User 2 begins later, and User 1 and User 2 scan the data in sync. Data blocks are kept in memory for the synchronized scan (released on an LRU basis). After User 1 finishes, User 2 continues and its data blocks are immediately discarded.]
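The effect described on this slide can be approximated with a small LRU block cache. This Python sketch is a toy model: the class, the function, and the block counts are hypothetical, and the two scans run in lockstep rather than truly concurrently. User 2 joins User 1's scan part-way through, finds each block already in memory, and then continues with the blocks it missed, which by then have been evicted.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU cache of data blocks (a toy model, not Teradata's memory manager)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.disk_reads = 0

    def read(self, block_id: int) -> None:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # hit: mark as most recently used
            return
        self.disk_reads += 1                   # miss: read the block from disk
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block


def synchronized_scans(num_blocks: int, user2_joins_at: int, capacity: int) -> int:
    """Disk reads for two full scans when User 2 joins User 1's scan mid-way."""
    cache = BlockCache(capacity)
    for b in range(user2_joins_at):              # User 1 begins alone
        cache.read(b)
    for b in range(user2_joins_at, num_blocks):  # both users scan in sync
        cache.read(b)                            # User 1 brings the block into memory
        cache.read(b)                            # User 2 finds it already in memory
    for b in range(user2_joins_at):              # User 2 continues with the blocks it missed
        cache.read(b)
    return cache.disk_reads


print(synchronized_scans(num_blocks=100, user2_joins_at=30, capacity=10))  # 130
```

In this model, two scans that share nothing would read 200 blocks; the synchronized pair reads 130.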


    Query Cost Estimates

    Row estimates:
    > May be estimated using random samples, statistics, or indexes
    > Are assigned a confidence level: high, low, or none
    > Affect timing estimates: more rows, more time needed

    Timings:
    > Used to determine the lowest-cost plan
    > A total cost is generated if all processing steps have an assigned cost
    > Not intended to predict wall-clock time; useful for comparisons

    Miscellaneous Notes:

    > Estimates too large to display show 3 asterisks (***).

    > The accuracy of the time estimate depends upon the accuracy of the row estimate.


    Understanding Row and Time Estimates

    Low and no confidence may indicate a need to collect statistics on indexes or columns involved in restricting conditions.

    Otherwise, consider examining the conditions in the query more closely for changes that may improve the confidence.

    Collecting statistics or altering the conditions has no real impact unless it influences the Optimizer to pick a better plan.


    Parallel Steps

    PARALLEL STEPS are AMP steps that can execute concurrently:

    They have no functional overlap and do not contend for resources.

    They improve performance.

    The Optimizer generates PARALLEL STEPS whenever possible.

    EXPLAIN text identifies Parallel Steps.
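The "no functional overlap" condition can be checked from what each step reads and writes. This Python sketch uses hypothetical `Step` and `can_run_in_parallel` names, ignores resource contention, and is not the Optimizer's actual test; it applies the check to steps 4.1, 4.2, and 5 of the employee/department EXPLAIN example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def can_run_in_parallel(a: Step, b: Step) -> bool:
    """No functional overlap: neither step writes anything the other reads or writes."""
    return not (a.writes & (b.reads | b.writes)) and not (b.writes & a.reads)


s41 = Step("4.1", reads={"SAMPLE.employee"}, writes={"Spool 2"})
s42 = Step("4.2", reads={"SAMPLE.department"}, writes={"Spool 3"})
s5 = Step("5", reads={"Spool 2", "Spool 3"}, writes={"Spool 1"})

print(can_run_in_parallel(s41, s42))  # True
print(can_run_in_parallel(s41, s5))   # False: step 5 reads Spool 2, which step 4.1 writes
```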


    EXPLAIN of Create Table

    QUERY
    EXPLAIN CREATE TABLE SAMPLE.department
      ( department_number SMALLINT
      , department_name CHAR(30) NOT NULL
      , budget_amount DECIMAL(10,2)
      , manager_employee_number INTEGER )
    UNIQUE PRIMARY INDEX (department_number);

    EXPLANATION
    --------------------------------------------------------------------------------------
    1) First, we lock SAMPLE.department for exclusive use.
    2) Next, we lock a distinct DBC."pseudo table" for write on a RowHash for deadlock prevention, we lock a distinct DBC."pseudo table" for read on a RowHash for deadlock prevention, we lock a distinct DBC."pseudo table" for write on a RowHash for deadlock prevention, and we lock a distinct DBC."pseudo table" for write on a RowHash for deadlock prevention.
    3) We lock DBC.AccessRights for write on a RowHash, we lock DBC.TVFields for write on a RowHash, we lock DBC.TVM for write on a RowHash, we lock DBC.DBase for read on a RowHash, and we lock DBC.Indexes for write on a RowHash.


    EXPLAIN of Create Table (cont.)

    4) We execute the following steps in parallel.
         1) We do a single-AMP ABORT test from DBC.DBase by way of the unique primary index.
         2) We do a single-AMP ABORT test from DBC.TVM by way of the unique primary index with no residual conditions.
         3) We do an INSERT into DBC.TVFields (no lock required).
         4) We do an INSERT into DBC.TVFields (no lock required).
         5) We do an INSERT into DBC.TVFields (no lock required).
         6) We do an INSERT into DBC.TVFields (no lock required).
         7) We do an INSERT into DBC.Indexes (no lock required).
         8) We do an INSERT into DBC.TVM (no lock required).
         9) We INSERT default rights to DBC.AccessRights for SAMPLE.department.
    5) We create the table header.
    6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> No rows are returned to the user as the result of statement 1.


    Unique Primary INDEX Request (UPI)

    EXPLAIN
    SELECT * FROM employee WHERE employee_number = 801;

    1) First, we do a single-AMP RETRIEVE step from SAMPLE.employee by way of the unique primary index "SAMPLE.employee.employee_number = 801" with no residual conditions. The estimated time for this step is 0.03 seconds.
    -> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

    Simplest and most efficient type of access

    Spool is not used
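The contrast between single-AMP UPI access and the all-AMPs scans elsewhere in this deck comes from hash distribution of rows by primary index. The Python sketch below is a toy model: SHA-256 over the value's 4-byte representation stands in for Teradata's row hash and hash map, and the helper names (`amp_for`, `distribute`, `upi_retrieve`, `all_rows_scan`) and employee rows are hypothetical.

```python
import hashlib


def amp_for(key: int, num_amps: int) -> int:
    """Hash an INTEGER primary index value to pick its AMP (stand-in hash)."""
    digest = hashlib.sha256(key.to_bytes(4, "little", signed=True)).digest()
    return int.from_bytes(digest[:4], "little") % num_amps


def distribute(rows, num_amps):
    """Place each row on the AMP chosen by its primary index, employee_number."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[amp_for(row["employee_number"], num_amps)].append(row)
    return amps


def upi_retrieve(amps, key):
    """Single-AMP retrieve: hash the key, read one AMP, return the row directly (no spool)."""
    home = amps[amp_for(key, len(amps))]
    row = next((r for r in home if r["employee_number"] == key), None)
    return row, 1  # (row or None, number of AMPs touched)


def all_rows_scan(amps, condition):
    """All-AMPs retrieve: every AMP scans all of its rows into a spool."""
    spool = [r for amp in amps for r in amp if condition(r)]
    return spool, len(amps)


# Hypothetical employee rows: 40 employees; odd numbers have job_code 411100.
rows = [{"employee_number": n, "job_code": 411100 if n % 2 else 412101}
        for n in range(801, 841)]
amps = distribute(rows, num_amps=8)

row, upi_amps = upi_retrieve(amps, 801)
spool, scan_amps = all_rows_scan(amps, lambda r: r["job_code"] == 411100)
print(row, upi_amps)          # {'employee_number': 801, 'job_code': 411100} 1
print(len(spool), scan_amps)  # 20 8
```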


    Full Table Scan

    EXPLAIN
    SELECT employee_number FROM employee
    WHERE manager_employee_number = 40801
    AND job_code = 411100;

    1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.
    2) Next, we lock SAMPLE.employee for read.
    3) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with a condition of ("(SAMPLE.employee.job_code = 411100) AND (SAMPLE.employee.manager_employee_number = 40801)") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 139 rows. The estimated time for this step is 0.23 seconds.
    4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.23 seconds.


    Optimized INSERT/SELECT

    INSERT/SELECT is the process of SELECTing data from one table and using it as input to be inserted into another table.

    Two different optimizations can occur:

    1.) If the source and target tables have identical primary indexes, an AMP-local operation is used.

    2.) If the target table is empty,

    a.) Transient journaling is reduced

    b.) 64K block transfers are used

    If both conditions are satisfied, both optimizations are used.


    Optimized INSERT/SELECT Example

    EXPLAIN
    INSERT INTO employee_copy SELECT * FROM employee;

    1) First, we lock a distinct SAMPLE."pseudo table" for write on a RowHash to prevent global deadlock for SAMPLE.employee_copy.
    2) Next, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.
    3) We lock SAMPLE.employee_copy for write, and we lock SAMPLE.employee for read.
    4) We do a MERGE into SAMPLE.employee_copy from SAMPLE.employee.
    5) We spoil the parser's dictionary cache for the table.
    6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> No rows are returned to the user as the result of statement 1.


    INSERT/SELECT With Different PIs

    If the target table has a different primary index, a standard INSERT/SELECT process must be used. A BYNET operation will be used to relocate the SELECTed rows onto the target AMPs.

    This will require:

    a.) Single-row inserts (vs. 64K blocks)

    b.) Transient journal entries for each row
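Why a different primary index forces a BYNET relocation can be sketched by hashing each row's primary index value before and after the INSERT/SELECT. This Python toy model assumes SHA-256 over the stored bytes in place of Teradata's hash function, treats `str(v)` as the INTEGER-to-CHAR conversion (Teradata's actual conversion may add padding), and uses hypothetical PI values and helper names.

```python
import hashlib


def row_hash(value) -> int:
    """Stand-in row hash over the value's stored bytes.

    INTEGER and CHAR values are stored differently, so 801 and '801' hash differently.
    """
    if isinstance(value, int):
        data = value.to_bytes(4, "little", signed=True)  # INTEGER: 4-byte binary
    else:
        data = value.encode("latin-1")                   # CHAR: character bytes
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "little")


def rows_to_redistribute(pi_values, to_target_pi, num_amps):
    """Count rows whose target AMP differs from the AMP they are read on."""
    moved = 0
    for value in pi_values:
        source_amp = row_hash(value) % num_amps
        target_amp = row_hash(to_target_pi(value)) % num_amps
        if source_amp != target_amp:
            moved += 1  # this row must cross the BYNET to reach its target AMP
    return moved


employee_numbers = range(801, 1801)  # 1,000 hypothetical INTEGER PI values

same_pi = rows_to_redistribute(employee_numbers, lambda v: v, num_amps=8)       # employee -> employee_copy
char_pi = rows_to_redistribute(employee_numbers, lambda v: str(v), num_amps=8)  # employee -> employee_charPI
print(same_pi, char_pi)  # same_pi is 0; char_pi counts the rows that would move
```

With an identical primary index no row leaves its AMP, which is what makes the AMP-local optimization possible; with the CHAR primary index most rows land on a different AMP.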


    Non-Optimized INSERT/SELECT Example

    CREATE TABLE employee
      (employee_number INTEGER
      ,manager_employee_number INTEGER
      :
      ,salary_amount DECIMAL(10,2) NOT NULL)
    UNIQUE PRIMARY INDEX (employee_number);

    CREATE TABLE employee_charPI
      (employee_number CHAR(11)
      ,manager_employee_number INTEGER
      :
      ,salary_amount DECIMAL(10,2) NOT NULL)
    UNIQUE PRIMARY INDEX (employee_number);


    Non-Optimized INSERT/SELECT Example (cont.)

    EXPLAIN INSERT INTO employee_charPI SELECT * FROM employee;

    1) First, we lock a distinct SAMPLE."pseudo table" for write on a RowHash to prevent global deadlock for SAMPLE.employee_charPI.
    2) Next, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.
    3) We lock SAMPLE.employee_charPI for write, and we lock SAMPLE.employee for read.
    4) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with no residual conditions into Spool 1, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by row hash. The size of Spool 1 is estimated with high confidence to be 2,002 rows. The estimated time for this step is 0.61 seconds.
    5) We do a MERGE into SAMPLE.employee_charPI from Spool 1 (Last Use).
    6) We spoil the parser's dictionary cache for the table.
    7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> No rows are returned to the user as the result of statement 1.


    Unexpected Full Table Scan

    EXPLAIN SELECT * FROM employee_charPI WHERE employee_number = 801;

    1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee_charPI.
    2) Next, we lock SAMPLE.employee_charPI for read.
    3) We do an all-AMPs RETRIEVE step from SAMPLE.employee_charPI by way of an all-rows scan with a condition of ("(TRANSLATE((SAMPLE.employee_charPI.employee_number )USING LATIN_TO_UNICODE)(FLOAT, FORMAT '-9.99999999999999E-999'))= 8.01000000000000E 002") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 212 rows. The estimated time for this step is 0.12 seconds.
    4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.12 seconds.
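One way to see why the primary index cannot be used here: a numeric comparison against a CHAR column matches every character string that converts to 801, and those strings are stored (and therefore hashed) differently, so the matching rows could live on any AMP. The Python sketch below is illustrative only; Python's `float()` stands in for the CHAR-to-FLOAT conversion shown in the EXPLAIN, and the stored values and function name are hypothetical.

```python
def matches_numeric_literal(char_values: list[str], literal: float) -> list[str]:
    """Evaluate CHAR_column = numeric_literal by converting every stored value."""
    hits = []
    for v in char_values:        # every row on every AMP has to be visited
        if float(v) == literal:  # stand-in for the per-row CHAR-to-FLOAT conversion
            hits.append(v)
    return hits


# Hypothetical CHAR employee_number values.
stored = ["801", "0801", "  801", "801.0", "802", "1801"]

numeric_hits = matches_numeric_literal(stored, 801.0)  # WHERE employee_number = 801
char_hits = [v for v in stored if v == "801"]          # WHERE employee_number = '801'
print(numeric_hits)  # ['801', '0801', '  801', '801.0']
print(char_hits)     # ['801']
```

Only the character literal identifies a single stored value, and therefore a single row hash and AMP.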


    Correct use of Primary INDEX

    EXPLAIN SELECT * FROM employee_charPI
    WHERE employee_number = '801';

    1) First, we do a single-AMP RETRIEVE step from SAMPLE.employee_charPI by way of the unique primary index "SAMPLE.employee_charPI.employee_number = '801'" with no residual conditions. The estimated time for this step is 0.01 seconds.
    -> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.01 seconds.


    Explaining Macros

    CREATE MACRO test (dept_no INTEGER)
    AS
    (SELECT * FROM employee
     WHERE department_number = :dept_no;);

    EXPLAIN EXEC test (17401);

    1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.
    2) Next, we lock SAMPLE.employee for read.
    3) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of index # 4 "SAMPLE.employee.department_number = 17401" with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 7 rows. The estimated time for this step is 0.23 seconds.
    4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.23 seconds.


    Explaining Macros (cont.)

    EXPLAIN USING (dept_in INTEGER) EXEC test (:dept_in);

    1) First, we lock a distinct SAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for SAMPLE.employee.
    2) Next, we lock SAMPLE.employee for read.
    3) We do an all-AMPs RETRIEVE step from SAMPLE.employee by way of an all-rows scan with a condition of ("SAMPLE.employee.department_number = :x") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 20 rows. The estimated time for this step is 0.26 seconds.
    4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.26 seconds.


    How to Influence the Optimizer

    COLLECTED STATISTICS can help the Optimizer make better decisions using actual row counts and data distribution information.

    Collect statistics on:
    > Non-unique indexes
    > Non-index join columns
    > Small tables

    Collect statistics downsides:
    > Requires a full table scan
    > Must be kept current
    > May be unnecessary


    Other Factors To Help Optimizer

    Proper index choices at physical design time

    Add secondary indexes where helpful

    Use equality-based join conditions

    Experiment using EXPLAIN


    Summary

    EXPLAIN is a tool to help you plan query resources

    Teradata uses a cost-based optimizer

    Adding Secondary Indexes gives optimizer more choices

    Collecting Statistics allows better plan estimates

    Most mature optimizer for large-table decision support in the industry
