www.SageLogix.com Scaling to Infinity Partitioning in Oracle DW 09-February 2004 Tim Gorman SageLogix, Inc. RMOUG Training Days 2005
Mar 26, 2015
www.SageLogix.com
Scaling to InfinityPartitioning in Oracle
DW
09-February 2004
Tim GormanSageLogix, Inc.
RMOUG Training Days 2005RMOUG Training Days 2005
www.SageLogix.com
Who am I?Who am I?
“C” programmer since 1983
Databases since 1984, including BTrieve, C-ISAM, Unify, DEC VMS/RMS
Oracle application developer since 1990
RMOUG “Training Days” presenter since 1993
Oracle DBA since 1994
Oracle data warehouses since 1994
RMOUG board member since 1995
With Oak Table (2004) With Gary Dodge (2000 and 1998)
www.SageLogix.com
AgendaAgenda
DW “utopia” on OracleRiding the “virtuous cycle”
Sliding down the “death spiral”
Partitioning, and the use of EXCHANGE PARTITION for ETL, is the differentiator
www.SageLogix.com
Four DW characteristics
Four DW characteristics
Non-volatile, time-variant, subject-oriented, integratedBill Inmon “Building the Data Warehouse” 3rd Ed 2002 (Wiley)Think about what these mean?
Consider the converse of these characteristics? Volatile? Static-image? Process-oriented? Application-specific?
Time-variant dimensional data model implies:Insert, index, and analyze each row of data only onceFrom an implementation perspective, this is vital to remember!
Consider an extreme situation? Analytical database for quantum research in physics 50 Tbytes of data to load every day
www.SageLogix.com
The Virtuous CycleThe Virtuous Cycle
Insert-only processing enables…Direct-path loads of data
Partitioned tables/indexes stored in time-variant tablespaces
Direct-path (a.k.a. append) loads enable…Larger volume loads with less overall impact
Table compression
NOLOGGING and PARALLEL operations
Partitioned tables/indexes stored in time-variant tablespaces enable…
EXCHANGE PARTITION during ETL
READ ONLY tablespaces as data ages
Performance scalability from partition pruning
www.SageLogix.com
EXCHANGE PARTITION during ETL enables…Bitmap indexes and bitmap-join indices
Elimination of ETL “load window” and 24x7 availability for queries
Bitmap indices enable…Star transformations on “star” (dimensional) schemas
READ ONLY tablespaces enable…Near-line storage (i.e. NAS, SAMFS/HFS, etc)
“Right-sizing” of storage to the need, classified by IOPS
Backup efficienciesREAD WRITE tablespaces scheduled for backup every week
READ ONLY tablespaces scheduled for backup every year
The Virtuous CycleThe Virtuous Cycle
www.SageLogix.com
The Death SpiralThe Death Spiral
Volatile data presented in a static-image according to process-oriented concepts leads to…
ETL using “conventional-path” INSERT, UPDATE, and DELETE operations (including MERGE and multi-table INSERT)
Conventional-path operations are troubletrouble with:Bitmap indexes and bitmap-join indexes
Forcing frequent complete rebuilds until they get too big
Contention in Shared Pool, Buffer Cache, global structuresMixing of queries and loads simultaneously on table and indexesPeriodic rebuilds/reorgs of tables if deletions occur
Full redo logging and undo transaction tracking
ETL will dominate the workload in the databaseQueries will consist mainly of “dumps” or extracts to downstream systemsQuery performance will be abysmal and worsening…
www.SageLogix.com
UpSert/Merge logic during large-scale ETL represents deathdeath for scalability in a large DW
The Death SpiralThe Death Spiral
www.SageLogix.com
Virtue is easier on everyone
Virtue is easier on everyone
Use dimensional data models for presentation of data to users
Anything goes during ETL staging, but users want simplicity and speedOracle mechanisms optimize dimensional data models
Star transformations using bitmap and bitmap-join indexesPartition pruning during queries
Non-intrusive ETL processing (24x7 query operations)Direct-path “bulk” loading without interrupting queriesNewly-loaded data “published” simultaneously to users
Conserving resourcesAny UPDATE or DELETE logic can be converted to INSERT
Table compressionREAD ONLY tablespaces
www.SageLogix.com
Direct-path loadsDirect-path loads
Bulk loading feature first introduced in Oracle v6 in FASTLOAD utility on MVS to compete with DB2
Incorporated into SQL*Loader DIRECT=TRUE in v7.0Extended to CREATE TABLE AS SELECT in v7.2Extended to INSERT /*+ APPEND */ in v8.0Enhanced in v8.1 to leave behind a direct-path log for use by MV “fast” refresh
Loads data outside of “managed space”Load above the “high-water mark” in target table in SQL*Loader DIRECT=TRUE PARALLEL=FALSE
After successful completion, high-water mark is raised to include newly-loaded rows in the table
Load into TEMPORARY segments in all other load mechanismsAfter successful completion, TEMPORARY segments are merged into the table segment
www.SageLogix.com
Direct-path loadsDirect-path loads
Fast bulk load mechanism bypasses:Buffer Cache
Though which “conventional-path” INSERTs, and all UPDATE and DELETE operations pass
Log Buffer and entire redo log generation processIf keyword NOLOGGING is utilized
Processes format blocks with newly-inserted rows in private process memory
Writes them directly to datafiles
www.SageLogix.com
Direct-path loadsDirect-path loads
NOLOGGING options exist for all direct-path or APPEND operations
Available only for INSERT operations, never UPDATE or DELETE
A potential performance enhancementIf the redo logging stream is truly causing performance problems
Don’t assume that this is so, please verify!
Flip side: NOLOGGING means no recoverabilityRMAN incremental backup capability can help here…
www.SageLogix.com
Exchange PartitionExchange Partition
EXCHANGE PARTITION is crucial to non-intrusive ETLData is transformed, cleansed, loaded, indexed, analyzed offline from “live” tables and indexes
Direct-path load operations are especially tough on “live” indexes
“In-flight” queries continue to process during and after EXCHANGE PARTITION operations
Oracle’s read-consistency mechanisms cause existing operations to use data that was exchanged away from the table, and new operations to use data exchanged into the table
Local-partitioned indexes and statistics are exchanged as well
Global-partitioned indexes are maintained during exchange operation
www.SageLogix.com
Exchange PartitionExchange Partition
EXCHANGE PARTITION can be transparent to in-flight queries
DML locks prevent exchange on objects where INSERT, UPDATE, DELETE, and SELECT … FOR UPDATE in progressWhat happens to in-flight queries if standalone table “TT” is truncated or dropped immediately after the exchange completes?
P11 P12 P13 P14 P15 P15 P17 P18
TT
Queries started after EXCHANGEutilize the segment that is partitionP18 after the exchange
Queries that were in-flight beforeEXCHANGE continue to utilizethe segment that was partitionP18 before the exchange
www.SageLogix.com
Exchange PartitionExchange Partition
The basic technique of bulk-loading new data into a temporary “load table”, which is then indexed, analyzed, and then “published” all at once to end-users using the EXCHANGE PARTITION operation, should be the default load technique for all large tables in a data warehouse
fact tables
slowly-changing or quickly-changing dimensions
Assumptions for this example:Composite partitioned fact table named TXN
Range partitioned on DATE column TXN_DATE
Hash partitioned on NUMBER column ACCT_KEY
Data to be loaded into partition P20040225 on TXN
www.SageLogix.com
Exchange PartitionExchange Partition
1. Create temporary table TXN_TEMP as a hash-partitioned table
2. Perform parallel, direct-path load of new data into TXN_TEMP
3. Gather CBO statistics on table TXN_TEMP4. Create indexes on the temporary hash-partitioned
table TXN_TEMP corresponding to the local indexes on TXN
• using PARALLEL, NOLOGGING, and COMPUTE STATISTICS options
5. alter table TXNexchange partition P20040225 with table TXN_TEMPincluding indexes without validation update global indexes;
6. Table TXN_TEMP is left ready for next load cycle
www.SageLogix.com
22-Feb2004
23-Feb2004
24-Feb2004
(empty) 25-Feb2004
Composite-partitionedtable TXN
Hash-partitionedtable TXN_TEMP
2. BulkLoads
5. EXCHANGE PARTITION
4. IndexCreates
3. Analyze
Exchange PartitionExchange Partition
www.SageLogix.com
Exchange PartitionExchange Partition
It is a good idea to encapsulate this logic inside PL/SQL packaged- or stored-procedures:
SQL> execute exchpart.prepare(‘TXN_FACT’,’TMP_’, - 2 ’25-FEB-2004’,’27-FEB-2004’);SQL> alter session enable parallel dml;SQL> insert /*+ append nologging parallel(n,4) */ 2 into tmp_txn_fact n 3 select /*+ full(x) parallel(x,4) */ * 4 from stage_txn_fact x 5 where load_date >= ‘25-FEB-2004’ 6 and load_date < ‘28-FEB-2004’;SQL> commit;SQL> execute exchpart.finish(‘TXN_FACT’,’TMP_’);
DDL for “exchpart.sql” posted at http://www.EvDBT.com/tools.htm
www.SageLogix.com
Exchange PartitionExchange Partition
It is wise to encapsulate this partition-exchange functionality in a PL/SQL package- or stored-procedure
Along with the related functionality to:Gather CBO statistics on the table
Build indexes (in the proper related tablespaces with the proper parameters)
Also, the use of stored procedures to encapsulate this logic is crucial for security
You do NOT want to grant anybody the ability to ALTER TABLE or CREATE TABLE
You do NOT want anybody connecting as the table owner schema!
Stored procedures, once created, can be granted
www.SageLogix.com
Publishing Loaded Data
Publishing Loaded Data
Coordinating the final EXCHANGE PARTITION operation permits all of the newly-loaded data to appear to the end-users simultaneously
Publishing data
If newly-loaded data is becoming visible to users gradually
Then a “load window” when new queries cannot be started becomes necessary
Exchange Partition load techniques make “load windows” of restricted activity unnecessary
www.SageLogix.com
Star TransformationsStar Transformations
Why are “star” join-transformations desirable?Typical Oracle “nested loops”, “sort/merge”, or “hash” join methods tend to start the query from a dimension table and join into the fact table using only one index
Further “filtering” is performed by joining to other dimensions
Very “sequential” and not optimal
Star transformations do the following:Using database statistics, identifies the pattern of one large table at the center of a query involving two or more smaller tables
Resolves a result set from each of the dimension tables
Merges all of the results sets from all of the dimensions
Uses powerful BITMAP MERGE operation on fact table
www.SageLogix.com
Star TransformationsStar Transformations
1. Drive query from one of the dimensions
2. Join to the fact from that dimension
3. Filter on the fact by joining to other dimensions
www.SageLogix.com
Star TransformationsStar Transformations
1. Find result set in each dimension
2. Merge results from all dimensions
3. Join to the fact from merged result set, using BITMAP MERGE index scan
www.SageLogix.com
Star TransformationsStar Transformations
Enabling “star” join transformations in OracleParameter settings:
COMPATIBLE = 8.1.0 or higherOPTIMIZER_FEATURES_ENABLE = 8.1.0 or higherSTAR_TRANSFORMATION_ENABLED = TEMP_DISABLE
The optimizer will consider performing a cost-based query transformation on the star query but will not use temporary tables in the star transformation.Lots of bugs associated with setting to TRUE
BITMAP_MERGE_AREA_SIZE = <huge!>Default is 1M. Set to 16M? 32M? 128M? 512M?
HASH_JOIN_ENABLED = TRUE
Bitmap indexesAll fact table foreign-key columns must have bitmap indices(Optional) all dimension table non-key attribute columns should have bitmap indicesALTER TABLE … MINIMIZE RECORDS_PER_BLOCK
www.SageLogix.com
Star TransformationsStar Transformations
STAR_TRANSFORMATION_ENABLE = TEMP_DISABLESetting to TRUE causes star transformation to create then drop a global temporary table to store intermediate results
Buggy, nasty plan…
ALTER TABLE … MINIMIZE RECORDS_PER_BLOCKMetalink note #103490.1 provides explanation
ALTER TABLE command scans rows in the table and calculates a value for “maximum number of rows in a block”, which is used by CREATE BITMAP INDEX operations
If MINIMIZE RECORDS_PER_BLOCK not performed, a default max number of rows value is used instead
MINIMIZE allows smaller bitmap indexes to be created
www.SageLogix.com
Star TransformationsStar Transformations
A “star transformation” is not named as such in the EXPLAIN PLAN display
Instead, it is indicated by the following operations:
TABLE ACCESS BY ROWID OF (<fact-table-name>)
BITMAP AND
BITMAP MERGE
(data retrieved from a dimension)
BITMAP MERGE
(data retrieved from a dimension)
BITMAP MERGE
(data retrieved from a dimension)
www.SageLogix.com
Star TransformationsStar Transformations
Bitmap-join index (BJI) is an optimization of the initial phase of a star transformation
Index itself is comprised of saved data from the initial merge structure
BJI is an index on a table using data from one or more table joins
Almost like a materialized view of one of the steps of a BITMAP MERGE operation during a star query
SQL> CREATE BITMAP INDEX c_s_p_bjix1 2 ON SALES (c.region, p.category) 3 FROM SALES s, CUSTOMERS c, PRODUCTS p 4 WHERE c.cust_id = s.cust_id 5 AND p.prod_id = s.prod_id; Index created.
www.SageLogix.com
Partition PruningPartition Pruning
Oracle offers a total of five ways to partition tables and indexes
RANGE of data values
LIST of specified data values
HASH (pseudo-random distribution of data values)
Composite RANGE-HASH
Composite RANGE-LIST
Oracle cost-based optimizer can prune partitions that will not be utilized from the query
Explicit pruning (partition/subpartition name specified in query)
Implicit pruning (partition-key columns are referenced in query)
Don’t scan what you don’t need
www.SageLogix.com
Partition PruningPartition Pruning
The decision of what columns to partition upon must be carefully considered
Question: “Which queries do we want to optimize?”Try to choose outer RANGE partition-key column based on a frequently-queried DATE column
Bear in mind that RANGE partitioning is very important to ETL processing also…
EXCHANGE PARTITION permits a great deal of flexibility, so there is no need to choose to benefit ETL exclusively
Choose the inner HASH or LIST subpartition-key column based on any other frequently-queried column
Use HASH subpartitioning for open-ended data valuesUse LIST subpartitioning for bounded static data values
www.SageLogix.com
Partition PruningPartition Pruning
Prune by RANGE partition key on TXN_DT
Prune byHASHsubpartitionkey onACCT_ID
Or, prune by both!
www.SageLogix.com
Partition PruningPartition Pruning
Explicit partition pruning
SELECT …
FROM sales_fact PARTITION (sales_200403)
WHERE …
SELECT …
FROM sales_fact SUBPARTITION (sales_200403_sp12)
WHERE …
www.SageLogix.com
Implicit partition pruning
select …
from txn_fact
where posting_date between ‘03-May-2004’
and ‘05-May-2004’
and …
select …
from txn_fact
where state = ‘NJ’
and …
Partition PruningPartition Pruning
www.SageLogix.com
Partition PruningPartition Pruning
Implicit partition pruning can be disabled if there is an expression involving the partition-key column
Partition pruning is disabled in this situation:select …from sales_factwhere trunc(posting_date) = ‘15-MAY-2004’and …
Partitiong pruning is enabled in this situation:select …from sales_factwhere posting_date >= ‘15-MAY-2004’and posting_date < ‘16-MAY-2004’and …
www.SageLogix.com
Partition PruningPartition Pruning
EXPLAIN PLAN displays are a little confusing:
PARTITION RANGE (ALL)
PARTITION LIST (ALL)
PARTITION HASH (ALL)
No partition pruning occurring
PARTITION RANGE (ITERATOR)
PARTITION LIST (ITERATOR)
PARTITION HASH (ITERATOR)Pruning involving two or more partitions
No mention of partitions at all in the EXPLAIN PLAN indicates that pruning has occurred to include one and only one partition
www.SageLogix.com
Table CompressionTable Compression
Available in Oracle9i Release 2 (v9.2.0)Physical storage attribute for tables and materialized views
[ CREATE | ALTER ] TABLE …[ COMPRESS | NOCOMPRESS ] …
Restrictions:Can be used for RANGE or LIST partitions
But cannot be used with HASH partitionsBut cannot be used for HASH or LIST sub-partitions
Can be specified for NESTED tablesBut cannot be used with any LOB construct
Such as CLOB, BLOB, BFILE, and VARRAY
Not valid for index-organized or external tables
www.SageLogix.com
Table CompressionTable Compression
Storing repeated data values once in each blockA symbol table of distinct data values created in each block
The symbol table is stored as another table in the block
Each column in a row in a block references back to an entry in the symbol table in the block
Header & Tailer
Table & Column Map
ITL
Symbol table
Free
Row data
www.SageLogix.com
Table CompressionTable Compression
Only bulk-loading INSERT operations perform compression
CREATE TABLE … AS SELECT …INSERT /*+ APPEND */ (single-threaded and parallel)ALTER TABLE … MOVE …ALTER TABLE … MOVE PARTITION …ALTER TABLE … MERGE PARTITION …ALTER TABLE … SPLIT PARTITION …SQL*Loader DIRECT=TRUE
Conventional INSERT operations unaffectedSELECT, UPDATE, and DELETE behavior also unaffected
www.SageLogix.com
Table CompressionTable Compression
SELECTImpressive performance improvements!!!
Less I/O due to fewer blocksCompression ratio is linear with performance improvements
Better impact on FULL table scansIndexed scans still exhibit less-impressive improvements
Conventional and direct-path INSERTNoticeable performance slowdown (2-3x)
UPDATEVery negative performance impact observed (4-8x)
DELETESome performance improvements observed
www.SageLogix.com
Table CompressionTable Compression
Columns cannot be added, renamed, modified, or dropped on compressed tables or partitioned tables with compressed partitions
Might be fixed in 10g?
Local partitioned indexes are marked UNUSABLE during compression
Includes indexes on non-partitioned tables
Must be rebuilt
Global partitioned indexes can be maintained using UPDATE GLOBAL INDEXES
Includes non-partitioned indexes on partitioned tablesA rare situation when GLOBAL indexes can be more highly available than LOCAL indexes!
www.SageLogix.com
READ ONLY tablespacesREAD ONLY tablespaces
Partitioning by a datetime value allows the time-variant nature of data to be exploited
Within the same table, different partitions can exist in different tablespaces
Different tablespaces can reside on different types of storage media
Most-expensive (i.e. SSD)
Very expensive (i.e. SAN)
Less expensive (i.e. JBOD/HDD, NAS, SAMFS, etc)
Set tablespaces to READ ONLY as soon as possibleVerify read-only nature using V$SEGMENT_STATISTICS and/or V$FILESTAT
Over time, the majority of data in any DW can be READ ONLY
www.SageLogix.com
READ ONLY tablespaces can be:Backed up less frequently (i.e. quarterly, annually, etc) than “active” READ WRITE tablespaces, with no compromise on recovery strategy
Moved from faster, more-expensive storage to cheaper, less-expensive storage
Without interrupting operationsOS-level copy of datafiles can proceed without interruption
Finalizing ALTER DATABASE RENAME FILE would actually interrupt in-flight queries
Only way to scale to infinity from a storage perspective!
READ ONLY tablespacesREAD ONLY tablespaces
www.SageLogix.com
Q&AQ&A
Questions?
Email: [email protected]
Personal website: http://www.EvDBT.com/
Corporate website: http://www.SageLogix.com/