Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates. Oracle Data Warehousing and Business Intelligence - The Complete Story Andreas Katsaris Arisant, LLC
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Oracle Data Warehousing and
Business Intelligence -
The Complete Story Andreas Katsaris
Arisant, LLC
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
The complete picture…
Siebel CRM
Oracle EBS PeopleSoft SAP/R3
Other Sources
Oracle Data Integrator
E-LT Metadata E-LT Agent
Oracle BI Enterprise Data
Warehouse
Oracle BI Suite EE
Oracle BI Server
Oracle BI Presentation Server
Answers Interactive Dashboards Publisher Delivers
Bulk E-LT
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Agenda • Warehouse definition
– Why Bother? – Life before Integrated warehousing and BI – Terminology
• Warehouse design aspects and considerations – Methodology – Dimensional Modeling – Top Most powerful and underutilized database features
• Oracle Business Intelligence (OBI) – OBI Components and descriptions
• Oracle Data Integrator (ODI) – ETL Strategy – Architecture
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Data Warehouse Mission
• Gather, integrate and reconcile operational, decision support, and external data
• Provide meaningful, accessible, consistent, and easy to understand business information to enterprise users
• Act as a single integrated source of data for processing information
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
What is a Data Warehouse?
• a relational database designed for query and analysis
• contains historical data derived from transaction data, but it can include data from other sources
• enables an organization to consolidate data from several sources
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
What is a Data Warehouse?
• Components of a warehouse environment – extraction, transformation, and loading (ETL)
engine – online analytical processing (OLAP) solution – client analysis tools/Data Mining – other applications that manage the delivery of
data to enterprise users
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Data Warehouse Methodology • Methodology
– Iterative/Agile process, avoid waterfall – ‘big bang’ approach – Start with one Subject Area and one target user group
• Scoping Phase - Identify key players (business sponsors, stakeholders, users) - Build and validate the business case
• Show ROI benefits - Support for the following types of functionality
- Activities that cannot be done at all today, but should be - Activities that are accomplished today by manually synthesizing
data from reports, file extracts, and other sources • Evaluate candidate technologies for the ETL, data warehouse and front-
end tools – Today we are confident that we can do all this with Oracle
technologies!
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Data Warehouse Architectures
• Data Warehouse Architecture (Basic) • Data Warehouse Architecture (with Staging
Area) • Data Warehouse Architecture (with Staging
Area and Data Marts)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Data Warehouse Architecture (Staging Area and Data Marts)
End Users
Data Sources:Databases,flat files spreadsheets, XML …
Warehouse Staging Area
Extraction
Data Mart
Transformation & Load
Transformation & Load
KNOWLEDGE
INFORMATION DATA Managed
Reporting
Ad-Hoc Queries
Balanced Scorecards
BU
S. INTELLIG
ENC
E DELIVERY
Notification
Analysis
Dashboards
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Ultimate Goal • Timely, Accurate, Fast delivery of knowledge
Star Schema with
Mviews (rewrite, refresh) Star Schema
3NF
OLAP cubes
(rewrite, refresh)
Better Performance
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Top Most powerful and underutilized database features
• Data Sourcing
• ETL at database level
• Maintenance
• Performance
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Data Sourcing - Replication - Transportable Tablespaces - Streams - CDC (Change Data Capture) - External Tables (access to flat files) - Data Pump - SqlLoader (access to flat files)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Replication
• Snapshot logs on source system • Snapshot technology can be extended to
produce/preserve exact row changes • Resistance from source system owners
– usually complain about performance impact – space requirements for snapshot logs
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Transportable Tablespaces
• copying of datafiles (tablespaces in read only mode) • move both table and index data avoids index rebuilds • Cross platform support • When to consider:
– Lots of source system changes – Required tables can be placed in specific tablespace(s) – No timestamps so incremental extraction not possible – Other options not allowed (streams, replication, CDC)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Streams • routes published information to subscribed destinations • only changes to desired objects are captured • Architecture
– Capture – Staging/Propagation – Consumption – Parallel processing of log files
• Non-intrusive – No triggers or snapshot logs – uses redo log/archive log
• Custom Transformations possible • Access via Oracle-supplied PL/SQL packages or Enterprise
Manager Console • Golden Gate acquisition
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Change Data Capture • simplifies the process of identifying changed data since the
last extraction (incremental extraction) • architecture is based on publisher-subscriber model • synchronous (part of transaction)
– change data is captured via triggers and stored inside the database in change tables
• asynchronous (not part of transaction) – lightweight Oracle Streams application – Changes extracted from the log files
• Captured data made available to the target systems in a controlled manner (subscription window), using database views
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
External Tables • Files stored outside the database • Read only in 9i – Read/Write in 10g(CTAS) –
compressed and encrypted in 11g • DML/index creation not allowed • Can be read with SQL as if they were tables
– May be joined with database tables – Filtering allowed with WHERE clause
• Can be used to load staging area or act as a staging area themselves
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Data Pump • Next generation export/import tools • Various interfaces
– expdp / impdp – Web based GUI via Database Control – DBMS_DATAPUMP
• Jobs (exports and loads) are interruptible and resumable • Parallelism • Fine-grain object selection • Allows data movement via db links!
– Works in a pipelined fashion • Compressed and encrypted in 11g
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
• Resumable Operations • PL/SQL
– Direct Path Operations – Transformation and Loading functions – BULK operations
ETL at the database level
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Resumable Operations • Instance level
– init.ora: RESUMABLE_TIMEOUT = 3600 • Session level
– ALTER SESSION ENABLE RESUMABLE <TIMEOUT secs>; – ALTER SESSION DISABLE RESUMABLE;
• Used for: – DDL
• CREATE INDEX, CTAS … – DML
• INSERT/UPDATE
• Detection: – Look in DBA_RESUMABLE where STATUS='SUSPENDED'
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
PL/SQL • Easy to write, flexible, fast deployment • Extraction/Loading using Direct Path
Operations – CTAS operations with NOLOGGING – INSERT … SELECT with APPEND and PARALLEL hints – Ideal for incremental extractions (e.g. timestamp based)
• Transformations/Loading – Pipelined functions, MERGE, Multitable INSERT, INSERT
FIRST, pivot, myriad of Analytical functions
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
PL/SQL
• BULK operations – BULK COLLECT
• bulk binds with SELECT statements • fetch into user defined array or PL/SQL table
– FORALL • bulk binds with INSERT, UPDATE, and DELETE
statements – RETURNING INTO
• Retrieving DML Results into a Collection
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
DML Error Logging • Rows causing errors are loaded in an error log table • Alternative to using PL/SQL with the SAVE EXCEPTIONS
clause BEGIN DBMS_ERRLOG.CREATE_ERROR_LOG('SALES_FACT'); END; /
INSERT /*+ APPEND */ INTO sales_fact SELECT * FROM <source tables> LOG ERRORS REJECT LIMIT UNLIMITED ;
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
• Maintenance – DML/DDL Operations using :
• Direct Path Operations (NOLOGGING, append hint)
• Parallelism • Partitioning • ILM (Oracle ILM tool) • Advanced Compression
Maintenance
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Performance • Dimensional modeling/star schemas • Partitioning • Parallelism • Deferred/disabled constraints • Index Management (unuse/rebuilt) • Advanced compression (test first) • Materialized Views • OLAP Cubes/Cube Organized Materialized Views • Data Caching
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Star Schema • FACT table
– ‘skinny’ and large – Contains numeric measurements and FKs to dimensions – FK columns – bitmap indexes – Consider partitioning and archiving aspects
• Dimensions – Denormalized – small and ‘wide’ – Contain business attributes relating to the measurements
• Avoid snowflake if possible
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Star Schema
FACT SALES
DIMENSION 1 REGION
DIMENSION 2 PRODUCT
DIMENSION 4 ENTITY
DIMENSION 3 TIME
• Intuitive to end-users • Mirrors the way people think about the business • High Query Performance
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Direct Path Operations • NOLOGGING option, APPEND hint • CTAS … NOLOGGING • INSERT /*+ append */ INTO … SELECT …
– exclusive locks on the table (no parallel streams, use PARALLEL hint instead)
– No enabled FKs (hint ignored) – Index maintenance deferred to end to direct
path insert operation • CREATE INDEX …. NOLOGGING;
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Parallelism • Queries requiring large table scans, joins, or
partitioned index scans • Creation of large indexes • Creation of large tables (including
materialized views) • Bulk inserts, updates, merges, and deletes • Understand related init.ora parms and set
appropriately
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Parallelism • Types:
– Parallel Query – Parallel DDL – Parallel DML
• Implementation: – via a hint
/*+ PARALLEL(<table name>,<degree of parallelism>) */
– at the object definition level • CREATE/ALTER TABLE finance_trans PARALLEL 4; • CREATE/ALTER INDEX … PARALLEL 2;
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Partitioning • Goal
– Rolling window operations – Increased performance
• Types – Range Partitioning – Hash Partitioning – List Partitioning – Composite Partitioning
• FACT tables are candidates for partitioning
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Partitioning • Partition Pruning (a.k.a partition elimination)
– evaluate WHERE clause to eliminate unneeded partitions
– Reduces unnecessary I/O – Improved query performance
• Partition operations to facilitate warehouse loads – Load multiple partitions in parallel – Load stand alone table - EXCHANGE partition – Index maintenance at the partition level – Archive old data using EXCHANGE partition
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Partitioning- 11g enhancements • Partition Advisor • New composite combinations
– list/range, range/range, list/hash, list/list • Interval Partitioning
– Automatic creation of range-based partitions • REF Partitioning
– Partition detail table based on the master-table key • Virtual-Column Based Partitioning
– Partition based on an expression
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Interval Partitioning • Minimizes periodic partition maintenance (No need to
create new partitions) • Partition segments allocated as soon as new data
arrives • Local indexes are created and maintained as well • Requires at least one range partition • Range key value determines the range high point • Partitioning key can only be a single column, and either
DATE or NUMBER datatype
CREATE TABLE sales(…) PARTITION BY RANGE (sales_date) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH')) ( PARTITION p1 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) );
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
REF Partitioning • Related tables benefit from same partitioning strategy • Partition a table based on a column in another table • Partition key inherited via FK/PK relationship • Avoids having the partition key on the child table
CREATE TABLE orders /* parent table */ ( order_id /* PK */…..order_date /* partition key*/… …CONSTRAINT orders_pk PRIMARY KEY(order_id) ) PARTITION BY RANGE(order_date)…
CREATE TABLE order_items /* child table */ ( order_id /* FK */… CONSTRAINT order_items_fk ..FOREIGN KEY(order_id) REFERENCES orders(order_id) ) PARTITION BY REFERENCE (order_items_fk);
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Virtual-Column Based Partitioning • Add expression based virtual (meta data) columns • Virtual column used as the partition key
CREATE TABLE sales ( sales_num NUMBER(12), rep_name VARCHAR2(20), sales_code NUMBER(2) GENERATED ALWAYS AS TO_NUMBER (SUBSTR(TO_CHAR(sales_num),1,2)) ) PARTITION BY LIST (sales_code)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Advanced Compression • Table Compression
– Table Scan Performance: 2x faster – Storage Savings: 2x smaller – DML Performance: 5% slower
CREATE TABLE SALES_FACT (…) COMPRESS FOR ALL OPERATIONS;
• RMAN Compression – ~40% faster than compressed backups in 10g – Slightly better compression ratio than in 10g
RMAN> CONFIGURE COMPRESSION ALGORITHM ‘zlib’; RMAN> backup as COMPRESSED BACKUPSET database archivelog all;
• Data Pump Compression expdp hr FULL=y DUMPFILE=dpump_dir:full.dmp COMPRESS;
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Advanced Compression
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
ILM – Information Lifecycle Management
• Why Bother – Compliance – Performance – Cost – Data Maintenance
5% - most active
35% - less active
60% - historical
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
ILM – Information Lifecycle Management
• Implementation of different tiers of storage • Consider Oracle ILM Assistant (free!)
– Leverages Oracle Partitioning – Uses Lifecycle Definitions – Calculates storage costs & savings – Simulates the impact of partitioning on a table – Advises how to partition a table – Generates scripts to move data when required
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
SQL Query Result Cache
• Caching of query results or PL/SQL function calls • DML/DDL against dependent database objects
invalidates cache • Candidate queries
– access many, many rows – return few rows (small result set) – executed many times
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
SQL Query Result Cache • result_cache_mode init.ora parameter
– AUTO (optimizer uses repetitive executions to determine if query will be cached)
– MANUAL (need use /*+ RESULT_CACHE */ hint in queries) – FORCE (All results are stored in cache)
• result_cache_max_size init.ora parameter – default is dependent on other memory settings – (0.25% of memory_target or 0.5% of sga_target or 1% of
shared_pool_size) – 0 disables result cache – never >75% of shared pool (built-in restriction)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Optimizing Querying
• Star Queries • Materialized Views • Cube Organized Materialized Views • Index Merge • Bitmap Join Indexes
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Star Queries • A join between a fact table and a number of dimension
tables • Require the existence of bitmap indexes on FK columns on
the FACT table • retrieve exactly the necessary rows from the fact table using n merged bitmap indexes
• join this result set to the dimension tables • Implementation:
– star_transformation_enabled=true|false|temp_disable (system or session modifiable)
– hints for ‘stubborn’ queries • /*+ STAR_TRANSFORMATION */ • /*+ FACT */
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Materialized Views • Pre-aggregation of data • Changes to the underlying tables reflected in mview
– Fast/Complete refresh – Refresh can be on demand or scheduled
• Can be partitioned and indexed like a table • Query Rewrite
– /*+ REWRITE_OR_ERROR */ hint –new in 10g (ORA-30393) – DBMS_MVIEW.EXPLAIN_REWRITE() to see why rewrite failed – $ORACLE_HOME/rdbms/admin/utlxrw.sql for REWRITE_TABLE (explanation)
• DBMS_ADVISOR package – APIs to recommend materialized views and indexes – DBMS_ADVISOR.TUNE_MVIEW()
• API to tune existing materialized views and indexes • Tuned version in USER_TUNE_MVIEW
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Materialized Views • Query rewrite makes mviews transparent to end-users
Base tables Snapshot logs
INSERT/
UPDATE/
DELETE
Query (report, ETL code, etc)
Create materialized view … enable
query rewrite …
Select ….
Without query rewrite
With query rewrite
Refresh mode ?
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Cubes and Cube Organized MViews
• Query Options – Query star schema (fact+dimensions) – Build mviews on star schema
• Too many combinations
– Build cubes
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Cubes and Cube Organized MViews
• Stored in special db areas called Analytic Workspaces (which are stored in BLOBs)
• Manipulated via Analytic Workspaces Manager tool • 11g- Best of both worlds
– Rewrite and refresh features of regular MVs – Performance benefits of OLAP cubes – Expose OLAP cube as a relational object access via SQL – CUBE_TABLE function searches cube using SQL
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Cubes and Cube Organized MViews
SQL> explain plan for 2 select * from table(cube_table('GLOBAL.PRICE_CUBE'));
PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------------- Plan hash value: 3184667476
-------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2000 | 195K| 29 (0)| 00:00:01 | | 1 | CUBE SCAN PARTIAL OUTER| PRICE_CUBE | 2000 | 195K| 29 (0)| 00:00:01 | --------------------------------------------------------------------------------------
• Query cube as if it was a relational object
• Refresh using DBMS_MVIEW.REFRESH
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Index Merge
• Merge two separate indexes • Avoid creating a new concatenated index • Implementation via a hint
– /*+ index_join(table_name,index1, index2) */
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Index Merge
SQL> select count(*) from FINANCE_TRANS where COMPANY_KEY between 1 and 1000 and CHRGE_CODE
= 'BASIC';
Elapsed: 00:03:44.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5147 Card=1 Bytes=11 )
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (BY INDEX ROWID) OF ‘FINANCE_TRANS' (Cost=5147 Card=10004 Bytes=110044)
3 2 INDEX (RANGE SCAN) OF 'FINANCE_TRANS_IX2' (NON-UNIQUE) (Cost=51 Card=100035)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Index Merge
SQL> select /*+ index_join(FINANCE_TRANS, FINANCE_TRANS _IX2,TEMP1) */ count(*) from FINANCE_TRANS where COMPANY_KEY between 1 and 1000 and CHRGE_CODE='BASIC';
Elapsed: 00:00:01.73
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2037 Card=1 Bytes=11)
1 0 SORT (AGGREGATE)
2 1 VIEW OF 'index$_join$_001' (Cost=2037 Card=10004 Bytes=110044)
3 2 HASH JOIN
4 3 INDEX (RANGE SCAN) OF ' FINANCE_TRANS _IX2' (NON-UN IQUE) (Cost=1811 Card=10004 Bytes=110044)
5 3 INDEX (RANGE SCAN) OF 'TEMP1' (NON-UNIQUE) (Cost=1811 Card=10004 Bytes=110044)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Bitmap Join Indexes • An index build on a table using columns from another table(s)! • Index contains the data to support a join query • Allows the query to retrieve the data from the index rather than
referencing the join tables
• Good for query performance, bad for DML operations – Consider drop/recreate
CREATE BITMAP INDEX temp_bixj1 ON BUDGET_FACT
(LEDGER_DIM.ledger_name)
FROM BUDGET_FACT, LEDGER_DIM
WHERE LEDGER_DIM.ledger_key = BUDGET_FACT.ledger_key
NOLOGGING
TABLESPACE INDEX_1;
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Bitmap Join Indexes -- Without bitmap join index
SQL> SELECT D.ledger_name, SUM(F.base_amount) FROM BUDGET_FACT F, LEDGER_DIM D
WHERE D.ledger_key = F.ledger_key AND D.ledger_name = ‘Internal Spending’
GROUP BY D.ledger_name;
Elapsed: 00:00:07.58 Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4667 Card=1 Bytes=29 )
1 0 SORT (GROUP BY NOSORT) (Cost=4667 Card=1 Bytes=29)
2 1 HASH JOIN (Cost=4667 Card=1577907 Bytes=45759303)
3 2 VIEW OF 'index$_join$_002' (Cost=97 Card=6125 Bytes=128625) 4 3 HASH JOIN
5 4 INDEX (RANGE SCAN) OF ' LEDGER_DIM_IX4' (NON-UNIQUE ) (Cost=6 Card=6125 Bytes=128625)
6 4 INDEX (FAST FULL SCAN) OF 'LEDGER_DIM_PK' (UNIQUE)(Cost=6 Card=6125 Bytes=128625)
7 2 TABLE ACCESS (FULL) OF ‘BUDGET_FACT' (Cost=2631 Card=8092591 Bytes=64740728)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Bitmap Join Indexes -- Add bitmap join index TEMP_BIXJ1
Elapsed: 00:00:00.72
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1875 Card=1 Bytes=29 )
1 0 SORT (GROUP BY NOSORT) (Cost=1875 Card=1 Bytes=29)
2 1 HASH JOIN (Cost=1875 Card=1577907 Bytes=45759303)
3 2 VIEW OF 'index$_join$_002' (Cost=97 Card=6125 Bytes=128625)
4 3 HASH JOIN
5 4 INDEX (RANGE SCAN) OF ‘LEDGER_DIM_IX4' (NON-UNIQUE) (Cost=6 Card=6125 Bytes=128625)
6 4 INDEX (FAST FULL SCAN) OF ‘LEDGER_DIM_PK' (UNIQUE)(Cost=6 Card=6125 Bytes=128625)
7 2 TABLE ACCESS (BY INDEX ROWID) OF ‘BUDGET_FACT' (Cost=1712 Card=8092591 Bytes=64740728)
8 7 BITMAP CONVERSION (TO ROWIDS)
9 8 BITMAP INDEX (SINGLE VALUE) OF 'TEMP_BIXJ1'
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Index Management • Why is it a big deal in a warehouse? • Index maintenance during DML
– Benchmark loads with/without indexes present – Specific attention to bitmap index maintenance – Consider invalidate/rebuild or drop/recreate – Partition level maintenance provides granular control – Provide APIs (stored procs) to ETL application
• Virtual Indexes! – undocumented feature – Not intended for standalone usage – Part of Tuning Pack’s Virtual index Wizard – Allows CBO to evaluate a new index without having to create it!
• Index monitoring (watch out from DBMS_STATS)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Index Management - Case Study – Loading in non partitioned table
• 1.2 billion existing records and 3 indexes • Load 100 million records • Last 30% of records require some lookups on first
70% of loaded records (1 of the indexes is needed)
– Slow load performance – Indexes can be dropped to improve loading
• Need to be recreated • Not available for other queries
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Index Management - Case Study
No partitioning
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Index Management - Case Study • The solution
– Partition the table and indexes – Invalidate specific index partitions (mark
unusable) • No inline index maintenance while data is loading • Rebuild required index partition to help last part of the
load (granular control) – Rebuild remaining index partitions – Other partitions and related indexes available for
querying while loading new data
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Unusable partitioned Indexes: No inline index maintenance
Usable partitioned Indexes: Rebuilt by the application using stored procedures after a load has been completed
Index Management - Case Study Partitioned table and indexes
P2 P3 P4 P5 P1 P6
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Constraint Management • Do I need foreign keys in a warehouse?
– My code takes care of data integrity – Our data is clean
• Trace to see what happens while loading • Consider /*+ append */ hint behavior • Consider some of these options
– Deferred constraints • check that constraint is satisfied only at commit time • Useful when loading in no particular order
– Disable/enable, validate/novalidate – Enable in parallel!
• Metalink Note:124848.1
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Statistics Management • Be careful what/when you analyze • Manage statistics on ‘big’ tables
– Do I really need to analyze all the tables? – Determine frequency/timing, estimate sample – Consider ‘faking’ statistics – Consider providing APIs to ETL application
• Ability to analyze before/after certain loads
• Index monitoring – Watch out from dbms_stats.gather_table_stats and
dbms_stats.gather_index_stats behavior • Procedures mark analyzed indexes as USED!
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Oracle Business Intelligence EE
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
From this…
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
To this…
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Gartner Magic Quadrant for BI Platforms 2008 2009
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Traditional, non-Integrated Solution
Mining Engine
- Analysis
OLAP Engine
-Analytic Apps
Reporting Engine
Portal
ETL
ETL
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
N0n-Integrated Solution • Siloed reports, multiple islands of information • No knowledge sharing • Duplication of effort and reports • Security issues • Different formats and tools, local data copies • Access and distribution is problematic
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Oracle’s Approach
• Integrated BI Database – BI, Data Mining and OLAP functions in the database
• Integrated BI Tools – Common BI Technology Platform
• Integrated Analytic Applications – Industry Specific pre-build Apps
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Comprehensive DW and BI stack (source - Oracle)
Oracle OLAP Oracle Data Mining Oracle Partitioning OWB/ODI
Oracle BI Suite Standard Edition Oracle BI Suite Enterprise Edition
Oracle BI Applications
Oracle Daily Business Intelligence
Oracle Corporate Performance Management
PeopleSoft Enterprise Performance Management
Oracle Real-Time Decisions
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Pipeline Analysis
Triangulated Forecasting
Sales Team Effectiveness
Up-sell / Cross-sell
Discounting Analysis
Lead Conversion
Receivables / Payables Analysis
Customer Profitability
Product Profitability
Regulatory Compliance
Expense Management
Cash Flow Analysis
Churn Propensity
Customer Satisfaction
Resolution Rates
Service Rep Effectiveness
Service Cost Analysis
Service Trends
Campaign Scorecard
Response Rates
Product Propensity
Loyalty and Attrition
Market Basket Analysis
Campaign ROI
Supplier Performance
Inventory Analysis
Procurement Cycle Times
Inventory Availability
Employee Expenses
BOM Analysis
Employee Productivity
Compensation Analysis
Compliance Reporting
Workforce Profile
Turnover Trends
Return on Human Capital
Oracle BI Applications (source - Oracle)
Sales Analytics
Financial Analytics
Workforce Analytics
Supply Chain Analytics
Service & Contact Center
Analytics Marketing Analytics
Travel & Trans
Auto Comms. & Media
Complex Mfg.
Consumer Sector
Energy Financial Services
High Tech
Insurance & Health
Life Sciences
Public Sector
Other Operational & Analytic Sources Prebuilt adapters: Siebel Oracle SAP PeopleSoft
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
OBI Apps (source - Oracle)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
OBI Suite Offerings Enterprise Edition Plus (OBI EE Plus) • BI Server • BI Answers • BI Interactive Dashboards • BI Publisher (a.k.a XML Publisher) • BI Delivers • BI Disconnected Analytics • MS Office Add-In • Hyperion Source: http://www.oracle.com/technology/products/bi/enterprise-edition.html
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
OBIEE Answers –Ad Hoc Querying
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
OBIEE Interactive Dashboards
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
BI Publisher – “Pixel-Perfect” Reporting
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
BI Server Overview • Integrates information from various sources
– Native RDBMS support for Oracle, SQL Server, DB2 and Teradata • Sophisticated data access, aggregation, and calculation
engine • Three separate layers
– Physical – Business Model – Presentation
• Mapping of physical data schemas (tables & joins) to a logical business model
• Security services • Represent multiple physical data sources as a single,
simplified data structure to end user tools
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
BI Server Overview – Admin Tool
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
BI Server – Physical Layer
• exact depiction of the target physical objects
• mapping of physical data schemas (tables, joins) to a logical business model
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
• Layer of abstraction that sits on top of the physical data – Simplifies complex database structures
• ‘firewall’ between users and the mechanics of the physical data access layer
• Drill-Paths – e.g. Year->Qtr->Month->Week->Day
• Derived Business Measures or Calculations – e.g. support for time-series analysis, moving
averages, weighted averages, rolling sums, cumulative calculations, etc.
BI Server – Business Model Layer
Customer
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
• Sits on top of Business Model • Can build many views into the
Business Model: – simplified Presentation for
end-users – complex Presentation for
power-users/report writers (complex calculations, etc.)
BI Server – Presentation Layer
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Oracle Data Integrator (ODI) • Sunopsis Acquisition • ‘Oracle friendly’ architecture (leverages RDBMS) • Data Movement and Transformation from
Multiple Sources to Heterogeneous Targets • Key Differentiators
– Transformations leverage RDBMS – Declarative Design (automatically generates the Data
Flow whatever the sources and target DB) – Knowledge Modules
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Oracle Data Integrator Architecture • Java design-time environment
– Runs on any platform – Thin client for browsing
Metadata • Java runtime environment
– Runs on any platform – Orchestrates the execution
of data flows • Metadata repository
– Pluggable on many RDBMS – Ready for deployment – Modular and extensible
metadata
Design-Time
Metadata Management
Runtime
Agent
Data Flow Conductor
Service Interfaces and Developer APIs
User Interfaces
Thin Client
Data Flow Generator
Knowledge Module Interpreter
Knowledge Modules
Master Repository
Work Repositories
Runtime Repositories
Data Flow Generator
Runtime Session
Interpreter
Data Flow
Operator
Designer
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Terminology • ETL/ELT projects are designed in the Designer tool • Transformations in ODI are defined in objects called Interfaces • Interfaces are stored into Projects • Interfaces are sequenced in a Package that will be ultimately
compiled into a Scenario for production execution
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Interface • An Interface will define
– Where the data is sent to (the Target) – Where the data is coming from (the Sources) – How the data is transformed from the Source format to the Target
format (the Mappings) – How the data is physically transferred (the path taken) from the
sources to the target (the data Flow) • Source and target are defined using Metadata imported from the
databases and other systems • Mappings are expressed in SQL • Flows are defined in Templates called Knowledge Modules (KMs) • An interface may have more than one source but only populates a
single target • To populate several targets, you need several interfaces
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Staging Area • A separate, dedicated area in an RDBMS where ODI creates its temporary objects and executes some of the transformation rules • By default, ODI sets the staging area on the target data server
- can be on the source or a 3rd RDBMS or the Sunopsis Memory Engine - cannot be placed on non relational systems (Flat files, ESBs, etc.)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Metadata • ODI is strongly based on the relational paradigm • In ODI, data are handled through tabular structures
defined as datastores • Datastores are used for all type of “real” data
structures: database tables, flat files, XML files, JMS messages, LDAP trees, etc
• The definition of these datastores (the metadata) will be used in the tool to design the data integration processes.
• Defining the datastores is the starting point of any data integration project
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Two Methods for Reverse Engineering • Standard reverse-engineering
– Uses JDBC connectivity features to retrieve metadata, then writes it to the ODI repository
• Customized reverse-engineering – Read metadata from the application/database system
repository, then writes these metadata in the ODI repository
– Uses a technology-specific strategy, implemented in a Reverse-engineering Knowledge Module (RKM)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Knowledge Modules - KMs • Knowledge Modules are templates of code that define
integration patterns and their implementation • They are usually written to follow Data Integration best
practices, but can be adapted and modified for project specific requirements
• Example steps of a KM: • When loading data from a heterogeneous environment:
1. create a staging table 2. to load the data, use SQL loader 3. create the CONTROL file for SQL loader 4. when finished with the integration, remove the CONTROL file and
the staging table
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
• A Knowledge Module is made of steps
• Each step has a name and a template for the code to be generated
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Journalize Read from CDC
Source
Load From Sources to
Staging
Check Constraints before
Load
Integrate Transform and Move
to Targets
Service Expose Data and Transformation
Services
Reverse Engineer Metadata
Reverse
Journalize
Load
Check
Integrate Services
Pluggable Knowledge Modules Architecture
CDC
Sources
Staging Tables
Error Tables
Target Tables
WS
WS W
S
SAP/R3
Siebel
Log Miner
DB2 Journals
SQL Server Triggers
Oracle DBLink
DB2 Exp/Imp
JMS Queues Check MS Excel
Check Sybase
Oracle SQL*Loader
TPump/ Multiload
Type II SCD
Oracle Merge
Siebel EIM Schema
Oracle Web Services
DB2 Web Services
Sample out-of-the-box Knowledge Modules
Knowledge Modules Hot-Pluggable: Modular, Flexible, Extensible
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
KM Types KM Type Description
CKM Check Checks data in a datastore or during an integration process.
IKM Integration Uses a given strategy to populate the target datastore from the staging area.
LKM Loading Assembles data from source datastores to the staging area.
RKM Reverse-engineering
Retrieves the structure of a data model from a database. Only needed for customized reverse-engineering.
JKM Journalizing Sets up a system for Changed Data Capture to reduce the amount of data that needs to be processed.
Inte
rface
s Mo
dels
SKM Web Services Defines the code that will be generated to create Data Web Services (Exposing data as a web service)
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Target (Oracle)
Staging Area
Staging Area on Target-Example ETL
Source (Sybase)
ORDERS
LINES
CORRECTIONS File
SALES C$_0
C$_1
I$_SALES
1
2
3
5
Extract/Join/Transform
Extract/Transform
Join/Transform
Transform & Integrate
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Target (Oracle)
Staging Area Source (Sybase)
ORDERS
LINES
CORRECTIONS File
SALES C$_0
C$_1
I$_SALES
LKM_1
LKM SQL to Oracle
LKM_2
LKM File to Oracle (SQLLDR)
IKM_1
IKM Oracle Incremental Update
IKM_1
IKM Oracle Incremental Update
Staging Area on Target-Example ETL
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Requirements
• To run an interface, you need at least the following: – A target table – An Integration Knowledge Module – A Loading Knowledge Module if there is a remote
source
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Code Generation • When we ask ODI to Execute the transformations, ODI will
generate the necessary code for the execution (usually SQL code)
• The code is stored in the repository • The execution details are available in the Operator
Interface: – Statistics about the jobs (duration, number of records
processed, inserted, updated, deleted) – Actual code that was generated and executed by the
database – Error codes and error messages returned by the
databases if any
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
ODI Design-Time Environment ODI Runtime Environment
Development Development Servers and Applications
Production Production Servers and Applications
User Interfaces
Overview: 6 steps to Production 1. Retrieve/Enrich metadata 2. Design transformations 3. Orchestrate data flows
4. Generate/Deploy data flows 5. Monitor executions 6. Analyze impact / data lineage
Agent Data Flow Conductor
CRM
Legacy
ERP
Data Warehouse
Files / XML
User Interfaces
Administrators
Designers
ESB
Design-time Repositories Design-time
Repositories
Agent Data Flow Conductor
CRM
Legacy
ERP
Data Warehouse
Files / XML
Operator
Metadata Navigator
ESB
Runtime Repository
Copyright © 2009 Arisant LLC. Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Q&A • Thank you for attending • If you have follow-up questions I will be here for the rest of the day or can be contacted by email [email protected]