www.semantec.de
Oracle 8i/9i features which support Data Warehousing
Author: Krasen Paskalev
Certified Oracle DBA
Semantec GmbH.
D-71083 Herrenberg

Agenda
• ETL Features
• Data Warehouse Management
• Data Warehouse Querying
• Parallel Operations

Agenda
• ETL (Extraction, Transformation, Transportation and Loading)
– Transportable Tablespaces
– External Tables
– Table Functions
– MERGE Statement
• Data Warehouse Management
• Data Warehouse Querying
• Parallel Operations

Transportable tablespaces
• The fastest method for moving data between databases
• The tablespaces, with all their data, are plugged into the data warehouse database
[Figure: a tablespace is copied via ftp from the Production database to the Data Warehouse database]
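The transport steps could look roughly like the sketch below. It assumes a hypothetical tablespace named `sales_0102`; the dump file name and datafile path are placeholders, and the `exp`/`imp` calls run on the operating system, so they appear only as comments.

```sql
-- 1. Check that the tablespace is self-contained
--    (no objects that depend on other tablespaces).
BEGIN
  DBMS_TTS.TRANSPORT_SET_CHECK('SALES_0102', TRUE);
END;
/
SELECT * FROM transport_set_violations;   -- no rows means the set is transportable

-- 2. Make the tablespace read-only on the production database.
ALTER TABLESPACE sales_0102 READ ONLY;

-- 3. On the operating system (as a SYSDBA user), export only the metadata:
--      exp transport_tablespace=y tablespaces=sales_0102 file=sales_0102.dmp
--    Then copy the dump file and the datafile(s) to the warehouse host, e.g. via ftp.

-- 4. On the warehouse host, plug the tablespace in (placeholder path):
--      imp transport_tablespace=y file=sales_0102.dmp
--          datafiles='/u01/oradata/dwh/sales_0102.dbf'

-- 5. Optionally return the tablespace to read-write mode afterwards.
ALTER TABLESPACE sales_0102 READ WRITE;
```

Only metadata passes through export and import; the data itself moves as a plain file copy, which is why the method is fast.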

External Tables
• Read-only virtual tables over external files (figure: ASCII file, Excel sheet)
• Can be directly queried and joined in SQL, PL/SQL and Java
• Avoid data staging
• One-step loading and transformation
• Save DB space
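A minimal sketch of an external table over a comma-separated ASCII file. The directory path, file name, and table name `new_sales_ext` are assumptions for illustration; the columns follow the `id`/`amount` layout used elsewhere in these slides.

```sql
-- Directory object pointing at the folder that holds the flat file (placeholder path).
CREATE DIRECTORY ext_data_dir AS '/data/incoming';

-- Read-only virtual table: no rows are stored in the database.
CREATE TABLE new_sales_ext (
  id     NUMBER,
  amount NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('new_sales.csv')
)
REJECT LIMIT UNLIMITED;

-- Load and transform in one step, without a staging table.
INSERT INTO sales (id, amount)
SELECT e.id, e.amount
FROM   new_sales_ext e
WHERE  e.amount > 0;
```

The file is parsed at query time, so the external table can be joined like any other table but cannot be updated or indexed.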

Table Functions
• Can take a set of rows as input
• Can return a set of rows as output
• Can be used in the FROM clause
• Can be parallelized
• Can be pipelined
• User-defined in PL/SQL, Java or C
[Figure: Sales -> Table Function -> result set]
Region   %
West     30
Central  50
East     20

Table Functions
• Pipelining Data Transformation
[Figure: Source -> Table Function (Step 1) -> Table Function (Step 2) -> Target, plus a Log table]
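As an illustration, a pipelined table function in PL/SQL might look like the sketch below. The type names, the function name `clean_sales`, and the filtering rule are hypothetical; the `new_sales` and `sales` tables are assumed to have `id` and `amount` columns.

```sql
-- Object and collection types describing the rows the function returns.
CREATE TYPE sales_row_t AS OBJECT (id NUMBER, amount NUMBER);
/
CREATE TYPE sales_tab_t AS TABLE OF sales_row_t;
/

-- Pipelined table function: reads rows from a cursor, drops non-positive
-- amounts, and pipes each remaining row out as soon as it is ready.
CREATE OR REPLACE FUNCTION clean_sales (p_rows SYS_REFCURSOR)
  RETURN sales_tab_t PIPELINED
IS
  v_id     NUMBER;
  v_amount NUMBER;
BEGIN
  LOOP
    FETCH p_rows INTO v_id, v_amount;
    EXIT WHEN p_rows%NOTFOUND;
    IF v_amount > 0 THEN
      PIPE ROW (sales_row_t(v_id, v_amount));
    END IF;
  END LOOP;
  CLOSE p_rows;
  RETURN;
END;
/

-- The function is used in the FROM clause; its input is itself a query.
INSERT INTO sales (id, amount)
SELECT t.id, t.amount
FROM   TABLE(clean_sales(CURSOR(SELECT id, amount FROM new_sales))) t;
```

Because rows are piped out one at a time, a second table function could consume them as the next transformation step without the intermediate result being staged in a table.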

MERGE statement
new_sales
id amount
4 3000
8 1000
9 2000
sales (before)
id amount
4 2000
7 3000
8 5000
MERGE INTO sales s
USING new_sales n
ON (s.id = n.id)
WHEN MATCHED THEN
  UPDATE SET s.amount = s.amount + n.amount
WHEN NOT MATCHED THEN
  INSERT (s.id, s.amount)
  VALUES (n.id, n.amount);
sales (after)
id amount
4 5000 (UPDATE)
7 3000
8 6000 (UPDATE)
9 2000 (INSERT)

MERGE Advantages
• Single simple SQL statement
• Can be parallelized
• Can use Bulk DML
• Fewer scans of the base table
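For comparison, the same result without MERGE needs two statements, each of which reads both tables. This sketch assumes the `sales` and `new_sales` tables from the MERGE slide, with `id` unique in `new_sales`.

```sql
-- Step 1: update the rows that already exist in sales.
UPDATE sales s
SET    s.amount = s.amount +
                  (SELECT n.amount FROM new_sales n WHERE n.id = s.id)
WHERE  EXISTS (SELECT 1 FROM new_sales n WHERE n.id = s.id);

-- Step 2: insert the rows that are not yet in sales.
INSERT INTO sales (id, amount)
SELECT n.id, n.amount
FROM   new_sales n
WHERE  NOT EXISTS (SELECT 1 FROM sales s WHERE s.id = n.id);
```

MERGE expresses both steps as a single join between the two tables, which is where the saving in scans comes from.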

More ETL Features
• Direct-path Interface
– SQL*Loader
– CREATE TABLE AS SELECT
– INSERT
– Oracle Call Interface
• Multi-table INSERTs
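A short sketch of a direct-path INSERT and a multi-table INSERT. The target tables `sales_history`, `big_sales`, and `small_sales` and the 5000 threshold are hypothetical; `new_sales` is assumed to have `id` and `amount` columns.

```sql
-- Direct-path INSERT: the APPEND hint writes new blocks above the
-- high-water mark instead of going through the buffer cache.
INSERT /*+ APPEND */ INTO sales_history (id, amount)
SELECT id, amount FROM new_sales;
COMMIT;  -- the table cannot be queried in this session until the commit

-- Multi-table INSERT: one scan of the source feeds several targets.
INSERT ALL
  WHEN amount >= 5000 THEN
    INTO big_sales   (id, amount) VALUES (id, amount)
  WHEN amount < 5000 THEN
    INTO small_sales (id, amount) VALUES (id, amount)
SELECT id, amount FROM new_sales;
COMMIT;
```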

Agenda
• ETL Features
• Data Warehouse Management
– Partitioning
– Materialized Views
– DBMS_STATS
• Data Warehouse Querying
• Parallel Operations

Partitioning
Table Sales, one partition per month, each in its own tablespace:
Partition   Tablespace
Jan 2002    Tablespace 0102
Feb 2002    Tablespace 0202
...         ...
Dec 2002    Tablespace 1202

Advantages of Partitioning
• Partition independence
– LOAD, MOVE, PURGE and DROP partitions
– MERGE, SPLIT, EXCHANGE partitions
– BACKUP, RESTORE, SET READ ONLY
• Partition elimination
– SELECT or JOIN only the partitions needed
• Parallel Operations
– SELECT, UPDATE, DELETE, MERGE
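The points above can be sketched on a monthly range-partitioned table. The column `sale_date`, the partition names, the tablespace names, and the staging table `sales_stage_0302` are assumptions made for this example.

```sql
-- One partition per month, each in its own tablespace.
CREATE TABLE sales (
  id        NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p_0102 VALUES LESS THAN (TO_DATE('01.02.2002', 'DD.MM.YYYY'))
    TABLESPACE ts_0102,
  PARTITION p_0202 VALUES LESS THAN (TO_DATE('01.03.2002', 'DD.MM.YYYY'))
    TABLESPACE ts_0202
);

-- Partition independence: each operation touches one partition only.
ALTER TABLE sales ADD PARTITION p_0302
  VALUES LESS THAN (TO_DATE('01.04.2002', 'DD.MM.YYYY')) TABLESPACE ts_0302;

-- Swap a pre-loaded staging table (same column layout) into the partition.
ALTER TABLE sales EXCHANGE PARTITION p_0302 WITH TABLE sales_stage_0302;

-- Purge the oldest month without a row-by-row DELETE.
ALTER TABLE sales DROP PARTITION p_0102;

-- Partition elimination: the date predicate restricts the scan to p_0202.
SELECT SUM(amount)
FROM   sales
WHERE  sale_date >= TO_DATE('01.02.2002', 'DD.MM.YYYY')
AND    sale_date <  TO_DATE('01.03.2002', 'DD.MM.YYYY');
```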

Partitioning Methods
• Hash Partitioning
– Even row distribution by hash function
• Range Partitioning
– <01.01.2002 | <01.02.2002 | ... | <01.01.2003
• List Partitioning
– Stuttgart, Munich | Mannheim, Frankfurt | ...
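Hash and list partitioning might be declared as in the sketch below. Table, column, and partition names are hypothetical; the city groups are the ones listed on the slide.

```sql
-- Hash partitioning: rows are spread evenly over four partitions
-- by a hash of cust_id.
CREATE TABLE sales_by_customer (
  cust_id NUMBER,
  amount  NUMBER
)
PARTITION BY HASH (cust_id) PARTITIONS 4;

-- List partitioning: each partition holds an explicit set of values.
CREATE TABLE sales_by_city (
  city   VARCHAR2(30),
  amount NUMBER
)
PARTITION BY LIST (city) (
  PARTITION p_group_1 VALUES ('Stuttgart', 'Munich'),
  PARTITION p_group_2 VALUES ('Mannheim', 'Frankfurt')
);
```

A row whose city is not named in any list is rejected, so every expected value has to be assigned to a partition.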

Table Compression
• Stores tables or partitions in compressed format
• Reduces disk space requirements
• Reduces memory requirements
• Speeds up query execution
• Speeds up backup and recovery
• Very efficient for highly redundant data, e.g. the fact table
• Compression ratios of 2x to 4x are typical
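A minimal sketch of enabling compression, assuming Oracle9i Release 2 and a partitioned `sales` table; the names `sales_hist` and `p_0102` are hypothetical.

```sql
-- Create a compressed copy of a table in one bulk operation.
CREATE TABLE sales_hist COMPRESS
AS SELECT * FROM sales;

-- Compress an existing partition by rebuilding it.
ALTER TABLE sales MOVE PARTITION p_0102 COMPRESS;
```

Rows are compressed only when loaded in bulk (for example CREATE TABLE AS SELECT, direct-path INSERT, or MOVE), so the feature suits data that is loaded once and then mostly read.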

Materialized Views
[Figure: the materialized view revenue_sum is built from table sales by the query below]
sales (region, month, invc_sum, ...)
revenue_sum (region, month, revenue)
SELECT region, month,
       SUM(invc_sum) revenue
FROM sales
GROUP BY region, month

Advantages of Materialized Views
• Improved query/reporting performance for:
– Summaries
– Aggregates
– Joins
• Fast Refresh
– Data change tracking
– Partition change tracking
• No application change needed – their usage is automatic
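A sketch of how the `revenue_sum` view could be defined with fast refresh and query rewrite. The extra COUNT columns and the materialized view log are what fast refresh of a SUM aggregate generally requires; the aliases `revenue_cnt` and `row_cnt` are made up for this example.

```sql
-- Change tracking on the base table, needed for fast refresh.
CREATE MATERIALIZED VIEW LOG ON sales
  WITH ROWID, SEQUENCE (region, month, invc_sum)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW revenue_sum
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT region, month,
       SUM(invc_sum)   AS revenue,
       COUNT(invc_sum) AS revenue_cnt,  -- required alongside SUM for fast refresh
       COUNT(*)        AS row_cnt       -- required for fast refresh
FROM   sales
GROUP  BY region, month;

-- Apply only the changes recorded in the log ('F' = fast refresh).
BEGIN
  DBMS_MVIEW.REFRESH('REVENUE_SUM', 'F');
END;
/

-- With rewrite enabled, this query against sales can be answered
-- from revenue_sum without any change to the application.
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;

SELECT region, SUM(invc_sum)
FROM   sales
GROUP  BY region;
```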

DBMS_STATS
• New package for gathering table and index statistics
• Gathers statistics in parallel
• Can export and import statistics
[Figure: statistics transferred between the Production Data Warehouse and the Development Data Warehouse]
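The three points above might translate into calls like these. The schema name `DWH`, the statistics table `STATS_TRANSFER`, the sample size, and the degree of parallelism are assumptions.

```sql
BEGIN
  -- Gather table and index statistics with four parallel processes.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'DWH',
    tabname          => 'SALES',
    estimate_percent => 10,
    degree           => 4,
    cascade          => TRUE);

  -- Export the statistics into an ordinary table.
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'DWH', stattab => 'STATS_TRANSFER');
  DBMS_STATS.EXPORT_TABLE_STATS(
    ownname => 'DWH',
    tabname => 'SALES',
    stattab => 'STATS_TRANSFER',
    cascade => TRUE);
END;
/

-- After copying STATS_TRANSFER to the other database (e.g. with exp/imp),
-- load the statistics into its data dictionary.
BEGIN
  DBMS_STATS.IMPORT_TABLE_STATS(
    ownname => 'DWH',
    tabname => 'SALES',
    stattab => 'STATS_TRANSFER',
    cascade => TRUE);
END;
/
```

Importing production statistics into a development database lets the optimizer choose the same execution plans there, even when the development tables hold far less data.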

More Data Warehouse Management Features
• Index-organized tables
• Online index rebuild
• Online table rebuild

Agenda
• ETL Features
• Data Warehouse Management
• Data Warehouse Querying
– Bitmap Indexing
– Star Query Transformation
– Aggregation – ROLLUP, CUBE, Grouping Sets
– Analytic functions
• Parallel Operations

Bitmap Indexes
Region:  east  central  west  NULL
rowid    1     0        0     0
rowid    0     0        1     0
...      0     0        0     1
rowid    0     1        0     0
Bitmaps are combined with bitwise operations:
(1 0 0 0) OR (0 1 0 0) AND NOT (0 0 1 0) = (1 1 0 0)

Advantages of Bitmap Indexes
• Reduced response time for ad hoc queries
• Uses much less space than a B-tree index on low-cardinality columns
• Dramatic performance gains for a large class of queries:
– Multiple AND, OR and NOT conditions
– IS NULL conditions
– COUNT
– NOT IN - Bitmap MINUS
– BETWEEN - Bitmap UNION
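A small sketch of queries that can benefit from bitmap indexes. It assumes a `sales` table with a `region` column as in the bitmap example; the `channel` column and its values are hypothetical.

```sql
CREATE BITMAP INDEX sales_region_bix  ON sales (region);
CREATE BITMAP INDEX sales_channel_bix ON sales (channel);

-- AND, OR and NOT conditions can be answered by combining the bitmaps
-- before any table row is read.
SELECT COUNT(*)
FROM   sales
WHERE  (region = 'east' OR region = 'west')
AND    NOT (channel = 'web');

-- Bitmap indexes also record NULLs, so IS NULL conditions can use them.
SELECT COUNT(*)
FROM   sales
WHERE  region IS NULL;
```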

Star Query Transformation
• The query is rewritten for efficient execution
[Figure: star schema]
Fact table: sales (cust_id, prod_id, q_id, amount)
Dimension tables: customers (cust_id, name), products (prod_id, name), quarters (q_id, name)
• Steps:
1. Filter all dimensions
2. Combine the bitmap indexes on the fact table's foreign keys
3. Retrieve the fact rows and the other dimension rows
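The steps above can be sketched on the schema from the figure. The filter values ('ACME', 'Widget', 'Q1') and the index names are made up; the second query shows only the conceptual shape of the rewrite, not the optimizer's literal output.

```sql
-- Bitmap indexes on the fact table's foreign keys are a prerequisite.
CREATE BITMAP INDEX sales_cust_bix ON sales (cust_id);
CREATE BITMAP INDEX sales_prod_bix ON sales (prod_id);
CREATE BITMAP INDEX sales_q_bix    ON sales (q_id);

ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TRUE;

-- A typical star query: one fact table joined to filtered dimensions.
SELECT c.name AS customer, p.name AS product, SUM(s.amount) AS total
FROM   sales s, customers c, products p, quarters q
WHERE  s.cust_id = c.cust_id
AND    s.prod_id = p.prod_id
AND    s.q_id    = q.q_id
AND    c.name = 'ACME'
AND    p.name = 'Widget'
AND    q.name = 'Q1'
GROUP  BY c.name, p.name;

-- Conceptual form after the transformation: each dimension filter becomes
-- a subquery whose keys probe one bitmap index; the bitmaps are ANDed
-- together before the fact rows are fetched.
SELECT SUM(s.amount) AS total
FROM   sales s
WHERE  s.cust_id IN (SELECT cust_id FROM customers WHERE name = 'ACME')
AND    s.prod_id IN (SELECT prod_id FROM products  WHERE name = 'Widget')
AND    s.q_id    IN (SELECT q_id    FROM quarters  WHERE name = 'Q1');
```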

Aggregation Operators
• Oracle extends the GROUP BY clause with:
– ROLLUP
– CUBE
– Grouping Sets
SELECT SUM(amount)
FROM sales
GROUP BY country, quarter
        UK     US     Total
Q1      1000   3000   4000
Q2      1500   5000   6500
Total   2500   8000   10500
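The cross-tab above, including its totals, can be produced by one CUBE query. The sketch uses a hypothetical demo table `sales_xtab` loaded with the four cell values from the slide.

```sql
CREATE TABLE sales_xtab (
  country VARCHAR2(2),
  quarter VARCHAR2(2),
  amount  NUMBER
);

INSERT INTO sales_xtab VALUES ('UK', 'Q1', 1000);
INSERT INTO sales_xtab VALUES ('US', 'Q1', 3000);
INSERT INTO sales_xtab VALUES ('UK', 'Q2', 1500);
INSERT INTO sales_xtab VALUES ('US', 'Q2', 5000);

-- CUBE adds the subtotals per country, per quarter, and the grand total
-- to the four detail rows. In a subtotal row, the column that has been
-- aggregated away is returned as NULL.
SELECT country, quarter, SUM(amount) AS total
FROM   sales_xtab
GROUP  BY CUBE (country, quarter);
```

The result has nine rows: the four cells, the country totals 2500 (UK) and 8000 (US), the quarter totals 4000 (Q1) and 6500 (Q2), and the grand total 10500.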

ROLLUP and CUBE
ROLLUP – subtotals at increasing levels of aggregation, from right to left (n+1 groupings)
ROLLUP(country, department, quarter)
(country, department, quarter)
(country, department)
(country)
() - Grand Total
CUBE – subtotals on all combinations (2^n groupings)
CUBE(country, department, quarter)
(country, department, quarter)
(country, department)
(country, quarter)
(department, quarter)
(country)
(department)
(quarter)
() - Grand Total
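A sketch of the ROLLUP case and its GROUPING SETS equivalent, assuming a `sales` table with `country`, `department`, `quarter`, and `amount` columns. With n = 3 columns, ROLLUP produces n + 1 = 4 groupings and CUBE would produce 2^3 = 8.

```sql
-- ROLLUP: detail rows plus subtotals from right to left.
-- GROUPING() returns 1 when the column has been aggregated away,
-- which distinguishes subtotal rows from genuine NULL values.
SELECT country, department, quarter,
       SUM(amount)          AS total,
       GROUPING(department) AS is_department_subtotal,
       GROUPING(quarter)    AS is_quarter_subtotal
FROM   sales
GROUP  BY ROLLUP (country, department, quarter);

-- The same four groupings written out explicitly.
SELECT country, department, quarter, SUM(amount) AS total
FROM   sales
GROUP  BY GROUPING SETS (
         (country, department, quarter),
         (country, department),
         (country),
         ()
       );
```

GROUPING SETS is the most flexible of the three: any subset of the CUBE groupings can be listed, so only the subtotals a report actually needs are computed.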

Aggregation Operators Advantages
• Applicable to many aggregation functions:
– SUM, AVG, COUNT
– MIN, MAX
– STDDEV, VARIANCE
• Flexible aggregation groups and levels
• Run in parallel

Analytic functions
• Significantly improved performance for complex reports such as:
– Ranking – Find the top 10 sales in each region
– Moving aggregates – What is the 90-day moving sales average?
– Period-over-period comparison – How do the revenues from January 2002 compare to January 2001?
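Two of these report types could be written as below. The `sales` columns (`id`, `region`, `amount`) and the table `monthly_revenue` (one row per month, no gaps, columns `month` and `revenue`) are assumptions for this sketch.

```sql
-- Ranking: the top 10 sales in each region.
SELECT region, id, amount
FROM  (SELECT region, id, amount,
              RANK() OVER (PARTITION BY region
                           ORDER BY amount DESC) AS rnk
       FROM   sales)
WHERE  rnk <= 10;

-- Period-over-period: each month next to the same month one year earlier.
-- LAG(revenue, 12) looks 12 rows back, which matches "one year earlier"
-- only if every month is present exactly once.
SELECT month,
       revenue,
       LAG(revenue, 12) OVER (ORDER BY month) AS revenue_prev_year
FROM   monthly_revenue;
```

Without analytic functions, both reports would need self-joins or correlated subqueries, which read the same table several times.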

Example – Moving Window
SELECT c.cust_id, t.month,
       SUM(amount_sold) SALES,
       AVG(SUM(amount_sold))
         OVER (ORDER BY c.cust_id, t.month ROWS 2 PRECEDING) MOV_3_MONTH
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id AND
      s.cust_id = c.cust_id AND
      t.year = 1999 AND
      c.cust_id IN (6380)
GROUP BY c.cust_id, t.month
ORDER BY c.cust_id, t.month;
CUST_ID MONTH SALES MOV_3_MONTH
------- ------- ------- -----------
6380 1999-01 19,642 19,642
6380 1999-02 19,324 19,483
6380 1999-03 21,655 20,207
6380 1999-04 27,091 22,690
6380 1999-05 16,367 21,704
6380 1999-06 24,755 22,738

More Data Warehouse Querying Features
• Function-based Indexes
• Optimizer Plan Stability
• Statistics for Long Running Operations
• Resumable Statements
• Full Outer Join
• WITH Operator
• Oracle Text – see the talk “Advanced Searching with Oracle Text”, 14.11.2002, 2nd conference day, 11:50-12:30, Konferenzraum EG (conference room, ground floor)

Agenda
• ETL Features
• Data Warehouse Management
• Data Warehouse Querying
• Parallel Operations

Parallel Operations
• Dramatically reduce execution time of data-intensive operations
• Loading
– Direct Path Load
• DDL Statements
– CREATE TABLE AS SELECT, CREATE INDEX
– REBUILD INDEX, REBUILD INDEX PARTITION
– MOVE, SPLIT, COALESCE PARTITION
• DML Statements
– INSERT ... SELECT
– UPDATE, DELETE and MERGE

Parallel Operations
• Access methods
– Table and index range and full scans
• Join methods
– Nested loops, Sort merge, Hash, Star transformation
• SQL operations
– GROUP BY, ROLLUP, CUBE
– DISTINCT, UNION, UNION ALL
– Aggregate functions
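A few of the operations listed above, requested in parallel. The degree of 4, the tables `sales_copy` and `sales_hist`, and the index `sales_region_bix` are assumptions; a suitable degree depends on the available CPUs and I/O bandwidth.

```sql
-- Parallel query: full scan and GROUP BY with four parallel processes.
SELECT /*+ PARALLEL(s, 4) */ region, SUM(amount)
FROM   sales s
GROUP  BY region;

-- Parallel DDL: CREATE TABLE AS SELECT and an index rebuild.
CREATE TABLE sales_copy PARALLEL 4 NOLOGGING
AS SELECT * FROM sales;

ALTER INDEX sales_region_bix REBUILD PARALLEL 4;

-- Parallel DML has to be enabled explicitly for the session.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(h, 4) */ INTO sales_hist h
SELECT /*+ PARALLEL(s, 4) */ * FROM sales s;
COMMIT;
```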

Parallel System Requirements
• Symmetric Multiprocessor Systems, Clusters or Massively Parallel Systems
• Sufficient I/O Bandwidth
• Sufficient (Underutilized) CPUs
• Sufficient Memory

Summary
• Effective handling of multi-terabyte Data Warehouses
• Rich feature set for all Data Warehouse operations
• Flexible aggregation and analytical features for high performance queries
• Effective parallelism

Want to know more?
Company: Semantec GmbH.
Name: Krasen Paskalev, Armin Singer, Peter Kopecki
Address: Benzstr. 32, D-71083 Herrenberg, Germany
Telephone: +49(7032)9130-0
Telephone: +49(7032)9130-12
Fax: +49(7032)9130-22
Internet: www.semantec.de
Meet us here -> booth 2C on the ground floor