Teradata Indexes

Teradata Indexes: What they are and how they work

Alison Torres, Director, Teradata Warehouse Consulting

Teradata Certified Master V2R3 & V2R5

Teradata Overview


[Figure: a Teradata node. Components shown: UNIX / Windows / Linux O/S; PDE (Parallel Database Extensions); Teradata RDBMS; communication interfaces (LAN gateway, channel).]


Teradata expansion to MPP nodes

[Figure: two nodes connected through the BYNET switch. Inside each node, a V-Net links the PE V-Procs and the AMP V-Procs, and each AMP V-Proc owns its own DATA.]


Agenda

•Primary Indexes

•Partitioned Primary Index

•Secondary Indexes

>Unique and Non-unique

•Other Types of Secondary Indexes

>Single Table Join Index

>Value Ordered Index

>Sparse Join Index

>Value Ordered Sparse Single Table Join Index

>Hash Index


Primary Indexes

• Primary physical access path

• Mechanism used to assign a row to an AMP

• Table must have one and only one Primary Index

• Primary Index cannot be changed without recreating the table

• UPIs result in even distribution of the rows of the table across all AMPs.

• UPIs ensure no duplicate rows

• PI accesses are always one-AMP operations

• NUPIs distribute table rows in proportion to the degree of uniqueness of the index and the number of AMPs: the more unique the index, the more even the distribution

• Primary Indexes may or may not be the same as Primary Keys


Primary Indexes: How They Work

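[Figure: the Primary Index column value(s) pass through the Teradata Hash Function to produce a RowHash (hash bucket). The Hash Map assigns the bucket to an AMP, and the row travels over the BYNET to that AMP. On each AMP, rows of Table A are stored as RowHash (RH) plus data columns (D), ordered by RH.]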
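The hashing path on this slide (primary index value, then row hash, then hash bucket, then hash map, then AMP) can be sketched in Python. This is only an illustration under stated assumptions, not Teradata's implementation: MD5 stands in for the Teradata hash function, the bucket count and the round-robin hash map are choices made for the example, and the names row_hash, hash_bucket, HASH_MAP, amp_for, and distribute are hypothetical.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4            # assumed system size for the example
NUM_BUCKETS = 65536     # assumed bucket count (16-bit bucket number)

# Hypothetical hash map: bucket number -> AMP, assigned round-robin.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value) -> int:
    """32-bit stand-in for the row hash. MD5 is NOT Teradata's hash function."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_bucket(rh: int) -> int:
    """Use the high-order 16 bits of the row hash as the bucket number."""
    return rh >> 16


def amp_for(pi_value) -> int:
    """AMP that owns a row with this primary index value."""
    return HASH_MAP[hash_bucket(row_hash(pi_value))]


def distribute(pi_values) -> Counter:
    """Count how many rows land on each AMP."""
    return Counter(amp_for(v) for v in pi_values)


unique_pi = distribute(range(10_000))   # UPI-like: every value distinct
skewed_pi = distribute([7] * 10_000)    # extreme NUPI: one value repeated

print("unique values:", sorted(unique_pi.items()))
print("single value: ", sorted(skewed_pi.items()))
```

Distinct values spread across all four AMPs, while a single repeated value puts every row on one AMP, which is the skew a poorly chosen NUPI can cause. The same value always maps to the same AMP, which is why PI access is a one-AMP operation.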


Agenda

•Partitioned Primary Index

>How PPIs work

>PPI Performance Considerations and Trade-offs


Partitioned Primary Index (PPI)

• Description

> A table organization to optimize the physical database design for range constrained queries

> Allows partitioning of large history tables by a numeric value (e.g. month) so queries can access only one month of a history

• Your Benefit

> Significantly improve performance for range constrained queries

> Strategic queries still see all data in one table, but tactical queries look only at the subset they need

> Performance improvements for other functions like deletes and updates

> Read only a subset of table

> Easy to manage

> None of the pain of re-partitioning

> All the self management you expect from Teradata

> Reduce high-volume batch insert times by 90%

> Delete large volumes of rows nearly instantaneously

> Drop unneeded secondary indexes or value-ordered join indexes



• Overview of the Basics

> Rows are still hash distributed among the AMPs on the primary index columns

> Rows are ordered by partition, then by Primary Index Hash within partition

> CREATE TABLE statement has new PARTITION BY clause

> Partitioning column can be part of primary index, but is not required to be

• In many cases, better performance occurs when partitioning column is part of the primary index

> Maximum of 65,535 partitions, numbered from one

> One or more columns in partitioning expression


• Restrictions on PPI Tables

> For PI to be unique (UPI with partition), partitioning column must be part of the PI

> No character or graphic comparison allowed in partitioning expression

> PPIs allowed on base tables, Global Temporary, and Volatile Temporary tables only

• Performance Awareness

> Possible degradation of PI Access

• If partitioning column is not qualified, all partitions will be read

• Joins on PI columns from a non-PPI table to a PPI table will result in comparing PI column in every PPI table partition



Partitioned Primary Indexes: How They Work (since V2R5.0)

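[Figure: the Primary Index columns pass through the Teradata Hash Function to produce the RowHash (hash bucket), and the Hash Map selects the AMP, reached over the BYNET. The partitioning columns pass through the user-specified partitioning function to produce the partition number. On each AMP, rows of Table A are stored as partition (P), RowHash (RH), and data columns (D), grouped into Partitions 1 to 4 and ordered by RH within each partition.]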


Non-PPI vs. PPI

SEL … WHERE order_date BETWEEN DATE '2007-01-01' AND DATE '2008-01-14';

AMP with PI: Requires Full Table Scan

AMP with PPI: Partitions are accessed as needed
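A small Python sketch of partition elimination for the date-range query on this slide. It models one AMP only and assumes monthly partitions starting at 2007-01-01; the partitioning function, the sample data (one order on the 15th of each month for 36 months), and the names partition_no, build_amp, and scan are hypothetical and chosen for illustration.

```python
from datetime import date

BASE = date(2007, 1, 1)  # assumed start of the first monthly partition


def partition_no(d: date) -> int:
    """Monthly partition number, counting from 1 (hypothetical function)."""
    return (d.year - BASE.year) * 12 + (d.month - BASE.month) + 1


def build_amp(order_dates):
    """Group one AMP's rows by partition number."""
    partitions = {}
    for d in order_dates:
        partitions.setdefault(partition_no(d), []).append(d)
    return partitions


def scan(partitions, lo: date, hi: date):
    """Read only partitions that can hold qualifying rows.

    Returns (matching rows, number of rows read).
    """
    wanted = range(partition_no(lo), partition_no(hi) + 1)
    read = [d for p in wanted for d in partitions.get(p, [])]
    return [d for d in read if lo <= d <= hi], len(read)


# 36 months of data, one order on the 15th of each month.
orders = [date(2007 + m // 12, m % 12 + 1, 15) for m in range(36)]
amp = build_amp(orders)

hits, rows_read = scan(amp, date(2007, 1, 1), date(2008, 1, 14))
print(f"qualifying rows: {len(hits)}")
print(f"rows read with partition elimination: {rows_read}")
print(f"rows read by a full table scan: {len(orders)}")
```

With these assumptions the query touches 13 of the 36 monthly partitions, so it reads 13 rows instead of 36. The benefit depends on the constant date range in the WHERE clause, which is what lets the partitions be excluded up front.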


Partitioned Primary Indexes

• Trade-off considerations

> Potential advantages

• Partition elimination

• Finer granularity on separation of data

• May eliminate need for some secondary indexes

• Deletes by partitions are nearly instantaneous

> Potential disadvantages

• Rows are two bytes wider

• PI cannot be defined as unique when partitioning column is not part of PI

• Access can be degraded if partitioning column is not specified in the query

• Joins to non-partitioned tables with same PI may be degraded



• Trade-off considerations

> Common errors

• The Optimizer needs to see the partitioning column qualified by a constant to determine which partitions can be excluded

• Use caution with range partition deletes: if data falls outside the partition range, it will be moved to the NO RANGE partition, and the move will not be fast

> Conclusions

• PPI can offer dramatic improvements in query response time and in high volume data load and maintenance operations

• There may be degradations in PI access and in join steps due to PPI

• DBA should understand trade-off considerations

• Testing of various alternatives will usually be necessary to get the maximum benefit from PPI


Agenda

•Secondary Indexes

>Unique Secondary Index (USI)

>Non-unique Secondary Index (NUSI)


Secondary Indexes

>A secondary index is an alternate path to the rows of a table.

>Secondary indexes:

•Do not affect table distribution.

•Add overhead, both in terms of disk space and maintenance.

•May be added or dropped dynamically as needed.

•Are chosen to improve access performance.


Unique Secondary Index (USI) Access

Create USI:

CREATE UNIQUE INDEX (cust) ON customer;

Access via USI:

SELECT * FROM customer WHERE cust = 54;

[Figure: USI access on a four-AMP system (AMP 1 to AMP 4), Customer table ID = 100. The PE passes USI value 54 through the hashing algorithm to build the subtable key (Table ID 100, Row Hash 602, USI Value 54) and sends it over the BYNET to the AMP that holds the matching USI subtable row. That subtable row supplies the base table RowID (Table ID 100, Row Hash 778, Unique Val 7), which is sent over the BYNET to the AMP that holds the base table row. Each AMP stores a portion of the base table (RowID, Cust, Name, Phone, with the NUPI and USI columns marked) and a portion of the USI subtable (RowID, Cust, base table RowID).]
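The two-step USI lookup described on this slide can be sketched in Python as follows. The sketch assumes a four-AMP system, uses MD5 as a stand-in for the Teradata hash function, simplifies the RowID to a (primary index value, counter) pair, and uses made-up sample rows; amp_of, insert, and select_by_cust are hypothetical names.

```python
import hashlib

NUM_AMPS = 4  # assumed system size


def amp_of(value) -> int:
    """Stand-in hash-to-AMP mapping (not Teradata's real hash function)."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# Each AMP holds a slice of the base table and a slice of the USI subtable.
base = [dict() for _ in range(NUM_AMPS)]          # row_id -> (cust, name, phone)
usi_subtable = [dict() for _ in range(NUM_AMPS)]  # cust -> (base AMP, row_id)


def insert(cust, name, phone):
    """Place the base row by its primary index (phone) and index it by cust."""
    usi_amp = amp_of(cust)
    if cust in usi_subtable[usi_amp]:
        raise ValueError(f"duplicate USI value {cust}")
    home = amp_of(phone)
    row_id = (phone, len(base[home]))  # simplified RowID
    base[home][row_id] = (cust, name, phone)
    usi_subtable[usi_amp][cust] = (home, row_id)


def select_by_cust(cust):
    """Return (row, set of AMPs touched) for SELECT ... WHERE cust = value."""
    usi_amp = amp_of(cust)                   # step 1: AMP with the subtable row
    entry = usi_subtable[usi_amp].get(cust)
    if entry is None:
        return None, {usi_amp}
    home, row_id = entry                     # step 2: AMP with the base row
    return base[home][row_id], {usi_amp, home}


# Made-up sample rows for the example.
insert(54, "Smith", "555-7777")
insert(31, "Adams", "111-2222")
insert(40, "Rice", "666-5555")
insert(84, "White", "555-4444")

row, amps = select_by_cust(54)
print(row, "AMPs touched:", len(amps))
```

However many AMPs the system has, the lookup touches at most two: one for the subtable row and one for the base row. The duplicate check in insert is the sketch's counterpart of the uniqueness a USI enforces.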


Non-Unique Secondary Index (NUSI) Access

Create NUSI:

CREATE INDEX (name) ON customer;

Access via NUSI:

SELECT * FROM customer WHERE name = 'Adams';

[Figure: NUSI access on a four-AMP system (AMP 1 to AMP 4), Customer table ID = 100. The PE passes NUSI value 'Adams' through the hashing algorithm to build the subtable key (Table ID 100, Row Hash 567, NUSI Value 'Adams'), which is sent over the BYNET to the AMPs. Each AMP holds its own NUSI subtable (RowID, Name, base table RowIDs) covering only the base table rows (RowID, Cust, Name, Phone) stored on that same AMP, so each AMP looks up 'Adams' locally and returns its qualifying rows.]


Full Table Scans vs. Non-Unique Secondary Index (NUSI)

• Full Table Scans Read Every Data Block

> Great when aggregating

• NUSI useful when not every block is read

> Usefulness depends on % of rows qualifying and number of rows in a data block

[Figure: NUSI entries pointing into the table data blocks.]

• Optimizer will decide best access methods

• Use EXPLAIN to determine index usage

• Collect Statistics on NUSIs


Overlooked NUSI Criteria

Usage depends on rows per block that qualify

Example 1: IF >= 1 row per block qualifies, THEN full table scan of the base table is faster than NUSI access and NUSI is not used

> If 100 rows/block and 1% of the data qualifies, then every block will be read. Full Table Scan is faster.

Example 2: IF < 1 row per block qualifies, THEN NUSI access is faster than full table scan

> If 100 rows/block and 1 in 1000 rows qualify, then 1 in every 10 blocks would be read. NUSI will be used.
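The rule of thumb in the two examples can be written as a short calculation. This Python sketch encodes only the slide's heuristic (qualifying rows per block = rows per block x fraction qualifying); the function names are hypothetical, and the actual choice is made by the Optimizer from collected statistics, not by this formula.

```python
from fractions import Fraction


def qualifying_rows_per_block(rows_per_block: int, one_in: int) -> Fraction:
    """Average qualifying rows per block when 1 row in `one_in` qualifies."""
    return Fraction(rows_per_block, one_in)


def fraction_of_blocks_read(rows_per_block: int, one_in: int) -> Fraction:
    """Slide's approximation: at most every block is read."""
    return min(Fraction(1), qualifying_rows_per_block(rows_per_block, one_in))


def access_method(rows_per_block: int, one_in: int) -> str:
    """Apply the slide's threshold of one qualifying row per block."""
    if qualifying_rows_per_block(rows_per_block, one_in) >= 1:
        return "full table scan"
    return "NUSI"


# Example 1: 100 rows/block, 1% (1 in 100) of the rows qualify.
print(access_method(100, 100), fraction_of_blocks_read(100, 100))

# Example 2: 100 rows/block, 1 in 1000 rows qualify.
print(access_method(100, 1000), fraction_of_blocks_read(100, 1000))
```

Exact fractions are used instead of floats so the boundary case of exactly one qualifying row per block is not affected by rounding.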


More NUSI Criteria

Uneven Distribution of Values

• Some values represent a large percentage of the table, other values have few instances

>Full Table Scan done for values that represent a large percent of table

>NUSI is used for values that represent a tiny percent of the table

• Example:*

>Large corporation with 100,000 calls / month would do Full Table Scan

>Residential phone customer with 20 calls / month would use NUSI

*(Candidate for Sparse Single Table Join Index - STJI)

• A sparse index is a special case of an STJI: create a join index and qualify it with a WHERE clause. A WHERE clause cannot be put on a secondary index.


NUSI - Index Covering

• Index Covering> Occurs when Query can be satisfied by columns in the secondary index

> Enables scanning secondary index sub-table instead of primary data table

> Savings based on number of bytes (columns) in NUSI definition versus number of bytes (columns) in table definition.

> Example on next page

• Table has 26 data columns; NUSI has 5 data columns

• I/O savings ~ 60% - 80%


NUSI – Vertical Partitioning of Data

• Index Covering - example - NUSI with 5 columns

• Table has 26 data columns; NUSI has 5 data columns

• I/O savings ~ 60% - 80%

[Figure: table data beside the narrower NUSI. The NUSI contains the Row Hash Code of each data row.]

• Query satisfied with NUSI access only


NUSI on PI Columns of PPI Table

• NUSIs can be defined on the same columns as the PI of the PPI table

• For a given value, accessing a NUSI on PI column of a PPI table results in a single-AMP operation

• Example:

> Access seasonal items sold in a store

> PPI on Store and Item with Partition on Date

> NUSI on Item

> NUSI accesses only the partitions for the months when item was sold


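Agenda

•Other Types of Secondary Indexes

>Single Table Join Index

>Value Ordered Index

>Sparse Join Index

>Value Ordered Sparse Single Table Join Index

>Hash Index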


Other Types of Secondary Indexes

• Join Index

> Used to define a pre-joined table on frequently joined columns (with optional aggregation) without denormalizing the database.

> Used to create a full or partial replication of a base table with a primary index on a foreign key column, to facilitate joins of very large tables by hashing their rows to the same AMP as the large table.

> Used to define a summary table without denormalizing the database.

> You can define a join index on one or several tables.

• Sparse Index

> Any join index, whether simple or aggregate, multi-table or single-table, can be sparse.

> Uses a constant expression in the WHERE clause of its definition to narrowly filter its row population.

• Value-Ordered NUSI

> Very efficient for range conditions and conditions with an inequality on the secondary index column set.

• Hash Index

> Used for the same purposes as single-table join indexes.

> Creates a full or partial replication of a base table with a primary index on a foreign key column, to facilitate joins of very large tables by hashing them to the same AMP. Limited to one table only.


Spectacular Gains Through Indexes

Vertical Partitioning Indexes

> Non-Unique Secondary Index (NUSI)

> Single Table Join Index

> Release 5 - Expanded to 64 columns in an index

Enhanced Index Features

> Index Covering

> Value Ordering

> Sparse Index (Qualification of rows to put into STJI)

Horizontal Partitioning of Data Table

> Partitioned Primary Index (PPI)


Single Table Join Index (STJI) - Index Covering

• More on Vertical Partitioning of Data

> Index Covering - example - STJI with 5 columns

[Figure: table data beside the STJI, which holds columns 1, 5, 6, 11, and 15 of the table. The STJI can have the Row Hash Code (ROWID) of the data row.]

• Different structure, query satisfied with STJI access

• Index maintained automatically


Single Table Join Index (STJI)

• Similarities between STJI and NUSI

> STJI can be defined with same columns as NUSI

> Index covering applies to both STJI and NUSI

> Can do value ordering on both STJI and NUSI

• Basic Differences between STJI and NUSI

> STJI is similar to a table with a primary index and other columns defined

• Means STJI row can be stored on same or different AMP as table data row; NUSI is stored on same AMP as table data row

> Cannot join to a NUSI, but can join to an STJI

> NUSI is supported by MultiLoad, but STJI is not

> All columns of NUSI must be accessed for Index to be considered


Single Table Join Index (STJI) - Different Primary Index

• Use Different Primary Index for STJI to avoid row redistribution at time of query

• Primary Index for base table is (store, item, date_sold)

• Primary Index for STJI is (store, item)

Effect is for a given item and store, all rows with different dates are grouped together in same data block

* Same table, indexed two different ways


Single Table Join Index (STJI) - Different Primary Index

Case: table PI = (store, item, date_sold)

Rows are redistributed and sorted to get all rows with same store and item and date on same AMP

Case: STJI PI = (store, item)

Rows have been redistributed and grouped together at the time STJI was built

SELECT item, COUNT(DISTINCT(Store_no))
FROM Sales_History
WHERE On_hand_qty > 0
AND qty_sold > 0
AND item IN (x,y,z)
AND date_sold IN (a,b,c)
AND store IN (d,e,f)
GROUP BY 1 ORDER BY 1;

Find certain items sold in a set of stores for a given set of dates

Result: Query suite ran 10 times faster with a STJI


Single Table Join Index (STJI) – Same Primary Index – Partial Covering

• Use Same Primary Index for STJI so STJI row is on same AMP as table data row

• Similar to NUSI – STJI row is on same AMP as data row

• Useful for partial covering

• Partial covering means qualification is done on Index columns before accessing primary data table for non-covered columns. This can reduce the number of rows to retrieve from base table

• Acts like a NUSI

> AMP-local and no BYNET traffic

• Useful for scoring:

> Only want 2000 names of the 20M

• Scan the STJI which is a narrower table

• Then go back to base table


STJI - Using LIKE clause

CREATE JOIN INDEX LIKETAB AS
SELECT car_license, pi_col, ROWID
FROM Customer_Info
PRIMARY INDEX (pi_col);

SELECT * FROM Customer_Info
WHERE car_license LIKE 'ABC%';

Query scans LIKETAB (a very narrow table), qualifies rows with car_license LIKE 'ABC%', then uses ROWID to get data from table Customer_Info, where the row is on the same AMP because the tables have the same primary index

* Optimizer will NOT scan NUSI for LIKE, instead it will scan base table


STJI - Using LIKE clause

Customer needed OLTP type response time

> seconds, not minutes

LIKE clause on base table does full table scan

>40 million rows with 19 columns on 2 node system

>Full Table Scan took 1 minute

Build STJI with column for LIKE plus ROWID of base table

>Use LIKE clause on narrower table

> Took 4 seconds


Value Ordered Index

• Value-ordering option on NUSI and STJI

> Numeric restriction of 4 bytes only

> Integer values only - no character data

• V2R5 - Expanded from 16 to 64 columns

• Syntax –

CREATE INDEX OrdDate (orderdate)
ORDER BY VALUES (orderdate)
ON ORDERS;


Value Ordered Indexes

• Value Ordered Non-Unique Secondary Index (VONUSI)

• Value Ordered Single Table Join Index (VOSTJI)

• Example

> Invoice Table - 60 Million rows

• 1500 days X 100 outlets X 400 sales / day

> Invoice_Item Table - 240 Million rows

• 1500 days X 100 outlets X 400 sales/day X 4 items/sale
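The row counts in this example follow directly from the stated factors. A quick Python check of the arithmetic, using only the figures given on the slide (the variable names are ours):

```python
days = 1500
outlets = 100
sales_per_day = 400      # per outlet
items_per_sale = 4

invoice_rows = days * outlets * sales_per_day
invoice_item_rows = invoice_rows * items_per_sale

print(f"Invoice rows:      {invoice_rows:,}")
print(f"Invoice_Item rows: {invoice_item_rows:,}")
```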



[Figure: two access paths compared.]

Table Data – Rows are hashed; query does full table scan

Value Ordered NUSI/STJI – Scans only the value-specified portion of the table



Teradata Database V2R4 - VONUSI & VOSTJI

Note: Cannot join to NUSI but can join to STJI



SPARSE Join Indexes

•Description

> Indexes a portion of the table that is used most frequently

> Uses WHERE clause predicates to limit the rows indexed

> Like other index choices, a sparse Join Index should be chosen to support high frequency queries requiring short response times

•Your Benefit

> A sparse index can focus on the portion of the table(s) that are most frequently used:

• Reduces the storage requirements for a join index and maintenance costs for updates

• Makes access faster since the size of the join index is smaller

• No change for the user: optimizer will evaluate all join indexes, and choose one if it's appropriate for the specific query

[Figure: an indexed column containing Null values beside its index. The Sparse Index (NOT NULL) holds entries only for the rows that are not Null.]


Sparse Single Table Join Index – Form of Horizontal Partitioning

• Built with a qualification of which rows to store in index

CREATE JOIN INDEX J1 AS
SELECT * FROM sales
WHERE Status = 'ACTIVE';


Sparse Single Table Join Index

[Figure: Skewed Distribution of Phone Calls per Day. Vertical axis: Number of Calls (0 to 1200). Large Corporation averages 1000 calls per day; Small Businesses average 25 calls per day; Residential Customers average 2.2 calls per day.]

• Residential Customers

>Comprise 94% of all customers

>Make 50% of all calls

• Application needs quick response for Residential customer calls

• Use Sparse STJI instead of NUSI

>Saves space and maintenance time


Sparse Single Table Join Index - Other Examples

• Insurance – 10 year history

> Clients renew insurance every 6 months

> 1/20th of data represents current paid policies

• Manufacturing parts - used and available in inventory

> Less than 1% of parts are available

> Retrieval looking for available parts

• Retail – Filled versus unfilled Orders

> Less than 0.5% of orders are unfilled


Value Ordered Sparse Single Table Join Index

• Combination of Vertical and Horizontal Partitioning

> Sparse Join Index qualifies only active rows

> Build index only with columns needed

• Value order the Sparse Join Index

[Figure: the base table holds a 12 month history of all flyers and all flights; the sparse join index holds only today's flights and today's flyers.]


Value Ordered Sparse STJI on PPI Table

• Join Index Maintenance is not supported by MultiLoad

• Qualify Sparse Join Index on Partitioning Value

> Rebuilding Index only requires processing qualifying Partition

> Time to rebuild and space required for Sparse STJI can be quite small

[Figure: retailer table with 100M rows/day; the sparse STJI covers a subset of the latest data.]


Hash Index – Similar to STJI

• Hash Index automatically includes ROWID

• Hash Index can have a primary index different from the primary index of the base table

> Different primary index allows for redistribution of data at time hash index is built – useful for index covering

> Also useful when number of rows for a given value is less than half the number of AMPs in a table

• Hash Index can be value ordered

• Uses different DDL to create it than STJI syntax


Hash Index – Similar to STJI

• Hash Index can have the same primary index as the base table

> Having the same primary index allows for Partial Covering where rows are qualified by columns in index and retrieval of table data is on the same AMP

Example - Marketing Campaign

• Reduce list of prospects with successive qualifying queries using Hash Index until final number achieved, then get detailed data.


Join Index Types

• Simple Join Indexes are, like all Teradata indexes, automatically updated with the base tables, and automatically evaluated and selected by the Optimizer.

• Single Table Join Indexes, like Hashed NUSIs, are built on a single table, used primarily for covering (base table row IDs are optional) and can be hashed on a user-defined Primary Index.

• Multi-table Join Indexes can store covering data from as many as 64 base tables. NUSIs can be defined on these indexes, and the user can define the PI column.

• Aggregate Join Indexes may include SUM and COUNT values (from which Averages may be calculated) on one or more of their columns. They may be defined on:

> Single Tables - A columnar subset of a base table with aggregates automatically maintained by the software, or

> Multiple Tables - A columnar subset of as many as 64 base tables with aggregate columns automatically maintained.

• Sparse Join Indexes are defined with a WHERE clause that limits the number of base table rows included and space required to store them.


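Teradata Indexes

•Primary Indexes

•Partitioned Primary Index

•Secondary Indexes

>Unique and Non-unique

•Other Types of Secondary Indexes

>Single Table Join Index

>Value Ordered Index

>Sparse Join Index

>Value Ordered Sparse Single Table Join Index

>Hash Index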


Alison Torres

Additional Considerations



• ‘Obvious’ PPI candidate table

> Large sales history table with 24 full months and current month-to-date

> Nightly batch inserts of that day's transactions (high volume) and monthly deletes of oldest data

• Consider partitioning by transaction date

> Primary Index is (product_code, transaction_date, agent_id)

• No secondary indexes or join indexes

> Some queries access PI

> No other tables have the same primary index (and will not join to it using the PI)




> What makes it such a good PPI candidate?

• High volume of daily inserts, so there is a bias toward partitioning on transaction date

• Transaction date is part of PI, so that is even better

> Proposal: Convert to PPI, partitioned by transaction date with daily granularity

> Could also consider partitioning by product_code or agent_id

• Would improve some queries

• Would not improve batch insert or delete operations




> Summary of benefits of PPI proposal:

> Improves performance of batch inserts of daily transactions and periodic bulk deletes of oldest data

• Faster inserts: most inserts will be appended to end of table

• Faster deletes: ALTER TABLE … DROP RANGE is nearly instantaneous (disclaimer: no secondary index or join index)

> Many queries can benefit from partition elimination

> PI access is not degraded much, if at all

> Joins should not be degraded (but check EXPLAINs, and do comparison testing if they change)

• If joins are slower, could consider weekly or monthly partitions instead of daily




> Disadvantages of PPI proposal:

> Full Table Scan queries will have to read a little more data, due to the two-byte partition number embedded in each row

• If rows average 50 bytes, then 4% more disk space is needed. Secondary index rows would also be wider (this example has none)

> Reconfig and Table Rebuild will be slower

• These are infrequent operations
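The 4% figure quoted for 50-byte rows is the two-byte partition number divided by the average row size. A short Python sketch of that ratio for a few row sizes; the 50-byte case comes from the slide, the other sizes are illustrative, and the function name is hypothetical.

```python
from fractions import Fraction

PARTITION_NUMBER_BYTES = 2  # per-row overhead stated on the slide


def ppi_space_overhead(avg_row_bytes: int) -> Fraction:
    """Extra space per row as a fraction of the non-partitioned row size."""
    return Fraction(PARTITION_NUMBER_BYTES, avg_row_bytes)


for size in (50, 100, 200, 400):
    pct = float(ppi_space_overhead(size)) * 100
    print(f"{size:>4}-byte rows: {pct:.1f}% more space")
```

The wider the rows, the smaller the relative cost, so the overhead matters most for narrow tables.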



• ‘Obvious’ PPI candidate table Alternative

> Why daily granularity?

> Finer granularity improves the batch load performance more than coarser granularity – no difference for batch deletes

> Queries can get more partition elimination with finer granularity, if they specify short time intervals

> No real disadvantage to having daily partitions for this example (if join EXPLAINs are unchanged), so go for maximum granularity



• ‘Maybe Yes/Maybe No’ PPI candidate table

> Large invoice table

• Four years of history

• Unique Primary Index is invoice number

• Partition candidate is invoice date

• Nightly batch inserts, monthly deletes of oldest data

• Fairly high volume of PI accesses

• Some time-constrained queries

• Other tables have same PI, and no invoice date column




> Why is it an uncertain candidate for partitioning?

• PI is single column, unique, and used for access and joins

> Advantages of partitioning on invoice date

• Inserts and deletes will be faster, but the secondary index will reduce the amount of improvement

• Date-constrained queries will be faster

> Disadvantages of partitioning on invoice date

• Must define PI as non-unique, and define unique secondary index to enforce uniqueness

– A USI prevents MultiLoad protocol inserts

• PI access will use the secondary index and will take two or three times as long as non-PPI PI access

• Joins to other tables with same PI will probably be degraded




> Worthwhile to convert to PPI?

• Need to measure performance difference

• Need to assess relative business importance of performance differences

• Might consider schema changes to accommodate PPI. For example, denormalize other tables with same PI by adding the invoice date column to improve join performance
