The Warehouse Designer’s School Of Hard Knocks

Page 1: The Warehouse Designer’s School Of Hard Knocks

The Warehouse Designer’s School Of Hard Knocks

A Graduate’s Perspective

David Stanford, Sr. Vice President

Cognicase Inc., [email protected]

The Data Warehouse Designer’s School of Hard Knocks

International Oracle Users Group, April 15, 2002

Page 2: Objectives

• Obtain a clear understanding of data warehouse design ‘hot points’

• Identify solutions and alternatives for these ‘hot points’

• See how real world solutions are implemented

Page 3: Agenda

• Top 10 Gotchas
• Warehouse Design
• Surrogate Keys
• Tracking History
• Row Level Security In The Warehouse
• BAM Rules, Audit and Administrative Fields
• Other Tidbits of Advice

Page 4: Dave’s Top 10 Gotchas

1. Failing to model for both a) the view of the data when the event occurred and b) the view of the data as of today’s reality
2. Limiting the number of dimensions
3. Failing to model and populate a meta data repository
4. Failing to provide sufficient audit capabilities to verify loads against source systems
5. Not using surrogate keys for everything

Page 5: Dave’s Top 10 Gotchas (cont’d)

6. Failing to design an error correction process
7. Normalizing too much
8. Not using a staging area
9. Failing to load ALL of the fact data
10. Failing to classify incorrect data

Page 6: Warehouse Design

Page 7: Overall Architecture

• At the 20,000-foot level, we must decide on the use of:
  – The Operational Data Store (ODS)
  – The Staging Area
  – The Data Warehouse proper
  – The Data Mart(s)
• Collectively, these components are being described as “The Corporate Information Factory” (CIF)

Page 8: Design Considerations

• The ODS vs. the Warehouse
• The ODS vs. the Staging Area
• The Warehouse vs. the Data Mart
• The lines become blurred
• Plan for the world, design for the future, and build for today

Page 9: Not Designing For The Future…

[Diagram: Source OLTP systems feeding separate HR, Sales, Management, and Manufacturing data marts, each with its own users: stove pipe data marts...]

Page 10: …leads to trouble

[Diagram: the stove pipe data marts, each serving its own constituency, …become legamarts]

Page 11: Basic Data Warehouse Architecture

[Diagram: Source OLTP Systems → Staging Area → Data Warehouse]

Page 12: Data Mart DW Architecture

[Diagram: Source OLTP Systems → Staging Area → Data Warehouse → Data Marts. Source: Enterprise Group]

Page 13: Data Warehouse Process

[Diagram: Source OLTP Systems → Staging Area → Data Warehouse → Data Marts, with Meta Data and System Monitoring spanning every stage. Source: Enterprise Group]

Process activities:
• Design, Mapping
• Extract, Scrub, Transform
• Load, Index, Aggregation
• Replication, Data Set Distribution
• Access & Analysis, Resource Scheduling & Distribution

Data characteristics, from source to data mart:
• Raw Detail, No/Minimal History
• Integrated, Scrubbed
• History, Summaries
• Targeted, Specialized (OLAP)

Page 14: Where The Work Is

[Diagram: the same process (Source OLTP Systems, Staging Area, Data Warehouse, Data Marts; Design/Mapping, Extract/Scrub/Transform, Load/Index/Aggregation, Replication/Data Set Distribution, Access & Analysis/Resource Scheduling & Distribution; Meta Data and System Monitoring), annotated “Over 80% of the work is here”. Source: Enterprise Group]

Page 15: Data Warehouse Architecture

[Diagram: Components of a Data Warehousing Architecture. Source Databases (Relational, ERP, e-Commerce, External, Legacy) → Data Extraction (data extract process) → Data Staging (staging area, ETL tool, data cleansing tool) → Data Transformation & Load → Central Data Warehouse → Architected Data Marts (RDBMS and MDB, served through mid-tier servers) → Data Access and Analysis. Supporting components: data modeling tools, warehouse administration, central meta data, local metadata, and metadata exchange.]

Page 16: The Staging Area

• Holds a mirror copy of the extract files
• Allows pre-processing of the data before loading
• Allows easier reloading (you WILL do this)
• Keeps more control with the DW team, rather than an external group (the extract team)
• Facilitates easier audit processes
• Can facilitate error correction processes

Page 17: Modeling Is Not Straightforward

[Diagram: Donation at the center, related to Member, Income, Campaign, Time, Gender, Marital Status, Location, and Age]

Page 18: Should These Dimensions Be Combined?

[Diagram: the same Donation model, with Member, Income, Campaign, Time, Gender, Marital Status, Location, and Age as candidate dimensions to combine]

Page 19: The 10-Step Process: Data Model Design

1. Identify major subject areas or topics
2. Add an element of time to the tables
3. Create appropriate names for tables, columns, and views
4. Add derived fields where applicable
5. Add administrative fields
6. Consider security and privacy in the design
7. Make sure the data model answers the critical business questions
8. Consider meta data
9. Consider error correction
10. Performance considerations: Tune, Tune, Tune

Page 20: Independent of Approach…

…the goal of the data model is to satisfy two primary criteria:

1. Meet Business Objectives
2. Provide Good Performance

Page 21: Warehouse Design

• Normalized (Relational) Design
• Dimensional Design
• Hybrid Design
• Behind the Scenes

Page 22: Normalized/Relational Schema

• Usually as normalized as possible
• Used mostly in OLTP databases
• Uses entities and relationships to describe data
• Fast for inserts and updates

Page 23: Relational Schemas

Page 24: The Star Schema

• Used in OLAP (BI) and data warehouses
• Uses FACT and DIMENSION tables
• Normalized FACT table
• Denormalized dimensions
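
To make the fact/dimension split concrete, here is a minimal sketch of a star schema. It uses Python’s built-in sqlite3 module rather than Oracle. The table and column names (fct_donation, dim_member, dim_date) are hypothetical and loosely borrow from the donation example later in this deck. The fact table carries only dimension keys and a measure. Each dimension keeps its descriptive attributes together in one denormalized row.

```python
import sqlite3

# Hypothetical donation star: a narrow fact table of keys plus a measure,
# surrounded by denormalized dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_member (
    member_key  INTEGER PRIMARY KEY,
    member_name TEXT,
    city        TEXT            -- location is denormalized into the member row
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    cal_date  TEXT,
    cal_year  INTEGER,
    cal_month INTEGER
);
CREATE TABLE fct_donation (
    member_key INTEGER REFERENCES dim_member (member_key),
    date_key   INTEGER REFERENCES dim_date (date_key),
    amount     REAL
);
INSERT INTO dim_member VALUES (1, 'Sandy Rubble', 'Bedrock'),
                              (2, 'Fred', 'Bedrock'),
                              (3, 'Wilma', 'GravelPit');
INSERT INTO dim_date VALUES (10, '2001-01-15', 2001, 1),
                            (11, '2001-03-15', 2001, 3);
INSERT INTO fct_donation VALUES (1, 10, 50.0), (2, 10, 25.0),
                                (3, 11, 100.0), (1, 11, 10.0);
""")

# Typical star join: group on dimension attributes, sum the fact measure.
rows = conn.execute("""
    SELECT m.city, d.cal_month, SUM(f.amount) AS total
    FROM fct_donation f
    JOIN dim_member m ON m.member_key = f.member_key
    JOIN dim_date   d ON d.date_key   = f.date_key
    GROUP BY m.city, d.cal_month
    ORDER BY m.city, d.cal_month
""").fetchall()
print(rows)  # [('Bedrock', 1, 75.0), ('Bedrock', 3, 10.0), ('GravelPit', 3, 100.0)]
```

The query pattern is the point. Filters and groupings come from dimension attributes, and the measures are summed from the fact table.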

Page 25: Sample Star Schema

Page 26: Snowflake Schema

• Contains FACT and DIMENSION tables
• Dimension tables can act as the FACT table of another STAR
• Dimension hierarchies are normalized

Page 27: Sample Snowflake Schema

Page 28: Hybrid

• In reality, the DW is more normalized than the data marts but has elements of dimensional design
• The data marts are star schemas but have elements of normalization

Page 29: Behind The Scenes

• There are several aspects of a design that users don’t directly see:
  – Meta Data
  – Error Correction
  – Audit
  – Load Control (if not using a scheduling tool)
  – Transformation Tables (used for transforming the data before it is loaded into the DW)

Page 30: Behind The Scenes

[Diagram: Source OLTP Systems → Staging Area → Data Warehouse → Data Marts, supported behind the scenes by Error Correction, Meta Data, Audit, Load Control, and Transform Tables]

Page 31: Surrogate Keys

Page 32: Surrogate Keys

• A surrogate key is a single-column, unique identifier for each row within a table
• Always use surrogate keys for dimensions
• Always use surrogate keys for the time dimension
• Always use surrogate keys for facts
• Always use surrogate keys for transformation tables
• Always use surrogate keys for EVERY table

Page 33: Surrogate Keys Avoid…

• Duplicate keys from different source systems
• Recycling of primary keys
• Use of the same key for different business rows
• Lengthy composite key joins
• Wasted space in fact tables
• Disruption from application changes or upgrades in source systems
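
The first three problems above disappear once the source system is part of the key lookup. Below is a minimal sketch that assumes a hypothetical in-memory key map in Python. In Oracle this role is typically played by a sequence plus a key-lookup table. The same natural key arriving from two source systems gets two different surrogate keys. The same business row seen again gets its existing key back.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical helper: maps (source_system, natural_key) to a warehouse surrogate key."""

    def __init__(self, start=1):
        self._next = count(start)   # stand-in for an Oracle sequence
        self._keys = {}

    def key_for(self, source_system, natural_key):
        # Including the source system means identical natural keys from
        # different systems can never collide in the warehouse.
        lookup = (source_system, natural_key)
        if lookup not in self._keys:
            self._keys[lookup] = next(self._next)
        return self._keys[lookup]


keys = SurrogateKeyMap(start=100)
k1 = keys.key_for("HR", "1")     # id 1 in the HR system
k2 = keys.key_for("SALES", "1")  # an unrelated id 1 in the sales system
k3 = keys.key_for("HR", "1")     # the same HR row arriving again
print(k1, k2, k3)  # 100 101 100
```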

Page 34: Using Surrogates In Fact Tables

• You will need a surrogate key on the fact table if you allow ‘unknown’ values into the fact table (which, by the way, is recommended)
• The PK of a fact table is typically the combination of its base dimension keys

Page 35: Surrogates In Fact Tables

FCT_CLAIMS
  Product_Key: NUMBER(10,0)
  Primary_Diagnosis_key: NUMBER(10,0)
  Date_Of_First_Service_Key: NUMBER(10,0)
  Provider_key: NUMBER(10,0)
  Contract_key: NUMBER(10,0)
  Member_key: NUMBER(10,0)
  Benefit_package_key: NUMBER(10,0)

Dimensions, each keyed by a NUMBER(10,0) surrogate: DIM_PRODUCTS (Product_Key), DIM_ICD9_PRIMARY_DIAGNOSES (Primary_Diagnosis_key), DIM_DATES_OF_FIRST_SERVICE (Date_Of_First_Service_Key), DIM_SERVICE_PROVIDERS (Provider_key), DIM_CONTRACTS (Contract_key), DIM_MEMBERS (Member_key), DIM_BENEFIT_PACKAGES (Benefit_package_key)

Page 36: Surrogates In Fact Tables

Date Of First Service: 15-Jan-2001
Benefit Package: Family, Eye Coverage
Contract: 123456789
Product: ExtendaGroup
Member: David Stanford
Service Provider: Dr. Walters
Primary Diagnosis: Broken Arm
Amount: $123.34

Page 37: Surrogates In Fact Tables

Date Of First Service: 15-Jan-2001
Benefit Package: Family, Eye Coverage
Contract: 123456789
Product: ExtendaGroup
Member: David Stanford
Service Provider: Dr. Walters
Primary Diagnosis: MISSING (Broken Arm)
Amount: $123.34

Page 38: Surrogates In Fact Tables

Date Of First Service: 15-Jan-2001
Benefit Package: Family, Eye Coverage
Contract: 123456789
Product: ExtendaGroup
Member: David Stanford
Service Provider: Dr. Walters
Primary Diagnosis: MISSING (Heart Attack)
Amount: $16,239.00

• Both claims now carry identical dimension keys, because each primary diagnosis resolved to the same MISSING record. This results in a duplicate primary key in the table

Page 39: Surrogates In Fact Tables

FCT_CLAIMS
  Claim_Line_Key: NUMBER(10,0)

Dimensions (unchanged): DIM_PRODUCTS (Product_Key), DIM_ICD9_PRIMARY_DIAGNOSES (Primary_Diagnosis_key), DIM_DATES_OF_FIRST_SERVICE (Date_Of_First_Service_Key), DIM_SERVICE_PROVIDERS (Provider_key), DIM_CONTRACTS (Contract_key), DIM_MEMBERS (Member_key), DIM_BENEFIT_PACKAGES (Benefit_package_key)

• Thus the need for a surrogate primary key
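
The duplicate-key problem can be reproduced in a few lines of Python with sqlite3. This is a sketch under simplified assumptions: only four of the seven dimension keys are modeled, and all key values are hypothetical. Two different claims whose diagnosis lookups both fell back to the -99 dummy record violate the composite primary key. The version keyed by a surrogate claim line key loads both claims.

```python
import sqlite3

MISSING = -99  # BAM dummy key: the diagnosis lookup failed

conn = sqlite3.connect(":memory:")
# Option 1: the PK is the combination of the dimension keys (four, for brevity).
conn.execute("""
    CREATE TABLE fct_claims_composite (
        member_key INTEGER, provider_key INTEGER, date_key INTEGER,
        primary_diagnosis_key INTEGER, amount REAL,
        PRIMARY KEY (member_key, provider_key, date_key, primary_diagnosis_key))""")
# Option 2: a single surrogate claim_line_key is the PK.
conn.execute("""
    CREATE TABLE fct_claims_surrogate (
        claim_line_key INTEGER PRIMARY KEY,   -- assigned automatically by SQLite
        member_key INTEGER, provider_key INTEGER, date_key INTEGER,
        primary_diagnosis_key INTEGER, amount REAL)""")

# Two different claims (broken arm, heart attack): same member, provider,
# and date, and both diagnosis lookups fell back to -99.
claims = [(7, 3, 15, MISSING, 123.34), (7, 3, 15, MISSING, 16239.00)]

composite_error = None
try:
    conn.executemany("INSERT INTO fct_claims_composite VALUES (?, ?, ?, ?, ?)", claims)
except sqlite3.IntegrityError as exc:
    composite_error = str(exc)  # the second claim collides with the first

conn.executemany(
    "INSERT INTO fct_claims_surrogate "
    "(member_key, provider_key, date_key, primary_diagnosis_key, amount) "
    "VALUES (?, ?, ?, ?, ?)", claims)
loaded = conn.execute("SELECT COUNT(*) FROM fct_claims_surrogate").fetchone()[0]
print(composite_error)
print(loaded)  # 2
```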

Page 40: Tracking History

Page 41: Tracking History in Dimensions

• Type 1 – No history
• Type 2 – All history
• Type 3 – Some history

Page 42: Type 1 – No History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms. | Date 01-Jan-2001

Page 43: Type 1 – No History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Source Transaction #2: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs.

Warehouse Record #2: Key 100 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs. | Date 15-Mar-2001
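
Here is a minimal Python sketch of the Type 1 overwrite shown above. It assumes a hypothetical in-memory dimension keyed by the business Id. After the load, the surrogate key is still 100, and the earlier address and salutation are gone.

```python
# Hypothetical in-memory dimension: business Id -> its single warehouse row.
dim_member = {
    1: {"key": 100, "id": 1, "name": "Sandy Rubble", "address": "23 Boulder Rd",
        "city": "Bedrock", "salutation": "Ms.", "date": "01-Jan-2001"},
}


def apply_type1(dimension, source_row, load_date):
    """Type 1 (no history): overwrite the existing row; the surrogate key never changes.

    Assumes the business Id already exists; inserting brand-new Ids is omitted.
    """
    row = dimension[source_row["id"]]
    row.update(source_row)  # the old address and salutation are simply lost
    row["date"] = load_date
    return row


txn2 = {"id": 1, "name": "Sandy Rubble", "address": "42 Slate Ave",
        "city": "GravelPit", "salutation": "Mrs."}
apply_type1(dim_member, txn2, "15-Mar-2001")
print(dim_member[1])
```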

Page 44: Type 2 – All History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms. | Date 01-Jan-2001

Source Transaction #2: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs.

Warehouse Record #2: Key 101 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs. | Date 15-Mar-2001
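
Here is a minimal Python sketch of Type 2. The names are hypothetical, and a Python counter stands in for an Oracle sequence. Besides adding a row with a new surrogate key, it maintains the effective date, end date, and most-recent flag listed among the dimension administrative fields. The half-open date range convention is an assumption.

```python
from datetime import date
from itertools import count

next_key = count(100)  # hypothetical stand-in for an Oracle sequence
dim_member = []        # every version of every member
TRACKED = ("name", "address", "city", "salutation")


def apply_type2(dimension, source_row, effective):
    """Type 2 (all history): expire the current version, then insert a new one."""
    current = next((r for r in dimension
                    if r["id"] == source_row["id"] and r["most_recent_flg"] == 1), None)
    if current is not None:
        if all(current[c] == source_row[c] for c in TRACKED):
            return current              # no change, nothing to do
        current["most_recent_flg"] = 0  # close out the old version
        current["end_dt"] = effective   # half-open range: [effective_dt, end_dt)
    new_row = {"key": next(next_key), **source_row,
               "effective_dt": effective, "end_dt": None, "most_recent_flg": 1}
    dimension.append(new_row)
    return new_row


apply_type2(dim_member, {"id": 1, "name": "Sandy Rubble", "address": "23 Boulder Rd",
                         "city": "Bedrock", "salutation": "Ms."}, date(2001, 1, 1))
apply_type2(dim_member, {"id": 1, "name": "Sandy Rubble", "address": "42 Slate Ave",
                         "city": "GravelPit", "salutation": "Mrs."}, date(2001, 3, 15))
for row in dim_member:
    print(row["key"], row["city"], row["salutation"], row["end_dt"], row["most_recent_flg"])
```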

Page 45: Type 3 – Some History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Original Salutation Ms. | Salutation Ms. | Date 01-Jan-2001

Page 46: Type 3 – Some History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Source Transaction #2: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs.

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Original Salutation Ms. | Salutation Mrs. | Date 15-Mar-2001
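
Here is a minimal Python sketch of Type 3 as shown above, with hypothetical function names. The first load copies the salutation into an original_salutation column. Later loads overwrite everything else in place under the same key.

```python
def load_type3(source_row, key, load_date):
    """First load: the original column starts out equal to the current salutation."""
    return {"key": key, **source_row,
            "original_salutation": source_row["salutation"], "date": load_date}


def apply_type3(row, source_row, load_date):
    """Later loads overwrite in place under the same key.

    original_salutation never appears in a source row, so update() leaves it
    at its first-load value.
    """
    row.update(source_row)
    row["date"] = load_date
    return row


row = load_type3({"id": 1, "name": "Sandy Rubble", "address": "23 Boulder Rd",
                  "city": "Bedrock", "salutation": "Ms."}, 100, "01-Jan-2001")
apply_type3(row, {"id": 1, "name": "Sandy Rubble", "address": "42 Slate Ave",
                  "city": "GravelPit", "salutation": "Mrs."}, "15-Mar-2001")
print(row["key"], row["original_salutation"], row["salutation"])  # 100 Ms. Mrs.
```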

Page 47: More Dimension Types… Combinations

• Type 3 Prime – Types 1 and 2 (the most common)
• Type 4 – Types 1 and 3
• Type 5 – Types 2 and 3
• Type 6 – Types 1, 2, and 3 (the second most common)

Page 48: Trigger Fields

• Trigger fields are the fields within a table for which you want to track history
• Non-trigger fields are those for which you do not want to track history

Page 49: Type 3 Prime – All and No History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Page 50: Type 3 Prime – All and No History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Source Transaction #2: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Ms.

Warehouse Record #2: Key 100 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Ms. | Date 15-Mar-2001

Page 51: Type 3 Prime – All and No History

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms. | Date 01-Jan-2001

Source Transaction #2: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs.

Warehouse Record #2: Key 101 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs. | Date 15-Mar-2001
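
Type 3 Prime can be sketched by classifying each column as a trigger or non-trigger field. In this hypothetical Python version, salutation is the only trigger field, matching the slides. A salutation change adds a new row, and an address change overwrites the current row. A non-trigger change could overwrite only the current version or every version of the business key; this sketch assumes the current version only.

```python
from itertools import count

TRIGGER_FIELDS = ("salutation",)                  # a change here adds a row (Type 2)
NON_TRIGGER_FIELDS = ("name", "address", "city")  # a change here overwrites (Type 1)

next_key = count(100)  # hypothetical stand-in for an Oracle sequence
dim_member = []


def apply_type3_prime(dimension, source_row, load_date):
    versions = [r for r in dimension if r["id"] == source_row["id"]]
    current = versions[-1] if versions else None  # newest version is appended last
    if current is None or any(current[f] != source_row[f] for f in TRIGGER_FIELDS):
        current = {"key": next(next_key), **source_row, "date": load_date}
        dimension.append(current)
    elif any(current[f] != source_row[f] for f in NON_TRIGGER_FIELDS):
        # Assumption: only the current version is overwritten.
        current.update(source_row)
        current["date"] = load_date
    return current


base = {"id": 1, "name": "Sandy Rubble"}
apply_type3_prime(dim_member, {**base, "address": "23 Boulder Rd", "city": "Bedrock",
                               "salutation": "Ms."}, "01-Jan-2001")
apply_type3_prime(dim_member, {**base, "address": "42 Slate Ave", "city": "GravelPit",
                               "salutation": "Ms."}, "15-Mar-2001")   # non-trigger change
apply_type3_prime(dim_member, {**base, "address": "42 Slate Ave", "city": "GravelPit",
                               "salutation": "Mrs."}, "15-Mar-2001")  # trigger change
print([(r["key"], r["city"], r["salutation"]) for r in dim_member])
# [(100, 'GravelPit', 'Ms.'), (101, 'GravelPit', 'Mrs.')]
```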

Page 52: Expect To Track Everything

• Users want to view the data as it was when the transaction or event occurred

AND…

• Users want to view the data in the context of today’s realities

THUS, model for both!

Page 53: Add ‘Current’ Columns

• To provide these two views, consider adding ‘current’ columns to tables. This is a special Type 6.
• These fields are updated in the historical records whenever a trigger field changes value in the current record.
• This simplifies the use of the DW for users
• It’s easier to understand than having to write complex SQL
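
Here is a minimal Python sketch of the ‘current’ column idea, with hypothetical names. Salutation is the trigger field. When it changes, a new version is added, and the new value is pushed into current_salutation on every version of that business Id. Each row then serves both the as-it-was view (salutation) and the as-of-today view (current_salutation) without complex SQL.

```python
from itertools import count

next_key = count(100)  # hypothetical stand-in for an Oracle sequence
dim_member = []


def apply_type6(dimension, source_row, load_date):
    versions = [r for r in dimension if r["id"] == source_row["id"]]
    current = versions[-1] if versions else None
    if current is None or current["salutation"] != source_row["salutation"]:
        # Trigger field changed (or first load): add a new version, as in Type 2.
        current = {"key": next(next_key), **source_row, "date": load_date}
        dimension.append(current)
        versions.append(current)
    else:
        # Non-trigger change: overwrite the current version, as in Type 1.
        current.update(source_row)
    # Push today's value into the 'current' column of every version.
    for r in versions:
        r["current_salutation"] = source_row["salutation"]
    return current


base = {"id": 1, "name": "Sandy Rubble"}
apply_type6(dim_member, {**base, "address": "23 Boulder Rd", "city": "Bedrock",
                         "salutation": "Ms."}, "01-Jan-2001")
apply_type6(dim_member, {**base, "address": "42 Slate Ave", "city": "GravelPit",
                         "salutation": "Mrs."}, "15-Mar-2001")
for r in dim_member:
    print(r["key"], r["salutation"], r["current_salutation"])
# 100 Ms. Mrs.
# 101 Mrs. Mrs.
```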

Page 54: Type 6 – All, Some, and No History, Ex#1

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms. | Current Salutation Mrs. | Date 01-Jan-2001

Source Transaction #2: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs.

Warehouse Record #2: Key 101 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs. | Current Salutation Mrs. | Date 15-Mar-2001

Page 55: Type 6 – All, Some, and No History, Ex#2

Source Transaction #1: Id 1 | Name Sandy Rubble | Address 23 Boulder Rd | City Bedrock | Salutation Ms.

Source Transaction #2: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Ms.

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Ms. | Current Salutation Ms. | Date 01-Jan-2001

Page 56: Type 6 – All, Some, and No History, Ex#2

Warehouse Record #1: Key 100 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Ms. | Current Salutation Mrs. | Date 01-Jan-2001

Source Transaction #3: Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs.

Warehouse Record #2: Key 101 | Id 1 | Name Sandy Rubble | Address 42 Slate Ave | City GravelPit | Salutation Mrs. | Current Salutation Mrs. | Date 15-Mar-2001

Page 57: History Tracking – Some Closing Thoughts

• Double keying for Slowly Changing Dimensions (SCDs)
  – Consider adding a second surrogate key for the business keys
  – Only if you know you have volatile, multiple source systems
• Rapidly Changing Dimensions (RCDs) need to be partitioned
  – Use Oracle partitioning
  – Include the native partition key in the dimension
  – Or split into several tables

Page 58: Row Level Security

Page 59: Row Level Security

• Three key pieces of data are required:
  – Who are the users?
  – What is their relationship to the system?
  – How are they related to each other?
• Combine these 3 pieces of information and you have the key to row level security

Page 60: An Example … The Users Table

• Identifies who the users are and their relationship to the DW

User Id | Broker # | Broker/MGA Name
Fred | 1 | Broker #1
Barney | 1 | Broker #1
Wilma | 2 | Broker #2
Betty | 2 | Broker #2
Joe | 3 | Broker #3
Slate | 3 | Broker #3
Dino | 4 | Broker #4
Gazoo | 4 | Broker #4
Pebbles | 5 | Best Insurance
Bambam | 5 | Best Insurance
Arnold | 6 | Life R Us
Elmo | 6 | Life R Us

Page 61: The Hierarchy Table

CREATE TABLE BROKER_HIERARCHIES (
  BBH_KEY               NUMBER(12) DEFAULT -99 NOT NULL,
  PARENT_BROKER_KEY     NUMBER(12) DEFAULT -99 NOT NULL,
  CHILD_BROKER_KEY      NUMBER(12) DEFAULT -99 NOT NULL,
  PARENT_ROLE_CD        VARCHAR2(10),
  PARENT_ROLE_DSC       VARCHAR2(240),
  CHILD_ROLE_CD         VARCHAR2(10),
  CHILD_ROLE_DSC        VARCHAR2(240),
  TOP_MOST_FLG          NUMBER(1),
  BOTTOM_MOST_FLG       NUMBER(1),
  BBH_REPORTING_ORDER   NUMBER(10),
  BBH_DEPTH_FROM_PARENT NUMBER(10),
  BBH_EFFECTIVE_DT      DATE,
  BBH_END_DT            DATE,
  BBH_CONTRACT_TYPE_CD  VARCHAR2(10),
  BBH_CONTRACT_TYPE_DSC VARCHAR2(240),
  BBH_ACTIVE_FLG        NUMBER(1),
  BBH_CREATE_DT         DATE,
  BBH_CREATE_SOURCE     VARCHAR2(25),
  BBH_UPDATE_DT         DATE,
  BBH_UPDATE_SOURCE     VARCHAR2(25),
  BBH_BK                VARCHAR2(50),
  CONSTRAINT PK_BRG_BROKER_HIERARCHIES PRIMARY KEY (BBH_KEY)
);

• Identifies how the brokers relate to each other

Page 62: Which Looks Like

Parent Broker Key | Child Broker Key | Parent Name | Child Name
1 | 1 | Broker #1 | Broker #1
2 | 2 | Broker #2 | Broker #2
3 | 3 | Broker #3 | Broker #3
4 | 4 | Broker #4 | Broker #4
5 | 1 | Best Insurance | Broker #1
5 | 2 | Best Insurance | Broker #2
6 | 3 | Life R Us | Broker #3
6 | 4 | Life R Us | Broker #4
5 | 5 | Best Insurance | Best Insurance
6 | 6 | Life R Us | Life R Us

Page 63: Combining Users & Hierarchies

• Every time we query, we need to pass through these two tables
• Hierarchies will result in self-joins or the dreaded CONNECT BY
• We are data warehousers, so let’s build a helper or bridge table: the User Broker Security (UBS) table

Page 64: User Broker Security (UBS) Table

• Every possible combination of who reports to whom, denormalized, by user
• Only has 2 columns: a skinny table

CREATE TABLE USER_BROKER_SECURITY (
  USER_ID    VARCHAR2(15),
  BROKER_KEY NUMBER(12,0)
);

Page 65: Results in…

• We’ve added the ‘From’ and ‘Broker Name’ columns for simplicity

User Id | From | Broker Key | Broker Name
Pebbles | Best Insurance | 5 | Best Insurance
Pebbles | Best Insurance | 1 | Broker #1
Pebbles | Best Insurance | 2 | Broker #2
Bambam | Life R Us | 6 | Life R Us
Bambam | Life R Us | 3 | Broker #3
Bambam | Life R Us | 4 | Broker #4
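
The UBS rows can be generated by walking the hierarchy downward from each user’s broker. This is the work CONNECT BY would otherwise do at query time. Below is a minimal Python sketch that uses the Users table and the parent/child pairs shown earlier. The function name is hypothetical, and a real load would write the result into USER_BROKER_SECURITY rather than a list.

```python
from collections import defaultdict

# Parent/child broker pairs from the hierarchy table (self-pairs included).
hierarchy = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 1), (5, 2),
             (6, 3), (6, 4), (5, 5), (6, 6)]
# User -> broker key, from the Users table.
users = {"Fred": 1, "Barney": 1, "Wilma": 2, "Betty": 2, "Joe": 3, "Slate": 3,
         "Dino": 4, "Gazoo": 4, "Pebbles": 5, "Bambam": 5, "Arnold": 6, "Elmo": 6}


def build_ubs(users, hierarchy):
    """Return (user_id, broker_key) rows: the user's broker plus every broker below it."""
    children = defaultdict(set)
    for parent, child in hierarchy:
        children[parent].add(child)
    ubs = []
    for user_id, broker_key in users.items():
        seen, stack = {broker_key}, [broker_key]
        while stack:                   # depth-first walk down all levels
            for child in children[stack.pop()]:
                if child not in seen:  # skips self-pairs and revisits
                    seen.add(child)
                    stack.append(child)
        ubs.extend((user_id, key) for key in sorted(seen))
    return ubs


ubs = build_ubs(users, hierarchy)
print([row for row in ubs if row[0] in ("Pebbles", "Arnold")])
# [('Pebbles', 1), ('Pebbles', 2), ('Pebbles', 5), ('Arnold', 3), ('Arnold', 4), ('Arnold', 6)]
```

Because the walk handles any depth, the same routine still works if a deeper level of MGAs is added above these brokers.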

Page 66: The UBS Table Is The Key

• The User Broker Security table is the key to all solutions
• Row level security can be built using:
  – Materialized Views
  – Virtual Private Database (VPD)
• No matter which option is used to implement row level security, you will need a table like this in some shape or form
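
Whichever mechanism is chosen, the security predicate itself is a join to the UBS table. Below is a minimal sqlite3 sketch with hypothetical policy rows. In Oracle, a VPD policy function registered with DBMS_RLS.ADD_POLICY would return an equivalent predicate, typically keyed on SYS_CONTEXT('USERENV', 'SESSION_USER'). Here the user is simply passed as a bind parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_broker_security (user_id TEXT, broker_key INTEGER);
CREATE TABLE fct_policies (policy_no TEXT, broker_key INTEGER, premium REAL);
INSERT INTO user_broker_security VALUES
    ('Fred', 1), ('Pebbles', 5), ('Pebbles', 1), ('Pebbles', 2);
INSERT INTO fct_policies VALUES
    ('P-100', 1, 500.0), ('P-200', 2, 750.0), ('P-300', 3, 900.0), ('P-500', 5, 250.0);
""")


def policies_for(conn, user_id):
    # Security predicate: only fact rows whose broker appears in the user's
    # UBS rows survive. A VPD policy or a materialized view applies the same join.
    return conn.execute("""
        SELECT f.policy_no
        FROM fct_policies f
        WHERE f.broker_key IN (SELECT ubs.broker_key
                               FROM user_broker_security ubs
                               WHERE ubs.user_id = ?)
        ORDER BY f.policy_no
    """, (user_id,)).fetchall()


print(policies_for(conn, "Fred"))     # [('P-100',)]
print(policies_for(conn, "Pebbles"))  # [('P-100',), ('P-200',), ('P-500',)]
```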

Page 67: What It All Looks Like

[Diagram components: Users, Hierarchies, UBS, Fact Policies (Full Table), Fact Policies (Row Security), Oracle Users, Refresh Script]

Page 68: Materialized Views

[Diagram components: Hierarchies, UBS, Fact Policies (Full Table), Fact Policies (Row Security), Oracle Users, Refresh Script]

Page 69: Virtual Private Database

[Diagram: Fact Policies (Row Security) enforced through the VPD components (Oracle Policy, Context, DB Trigger), which draw on Hierarchies, UBS, and Oracle Users]

Page 70: BAM Rules, Audit & Administrative Fields

Page 71: Bad & Missing Data

• Bad and/or missing data will always be an issue
• The source data is never completely clean
• There are always exceptions
• Recall that you need to tie back to the source systems for your audit, so you must load this ‘incorrect’ data
• Put the decisions into the hands of your users; don’t decide for them whether the data is good enough
• You need to develop Bad & Missing (BAM) Rules

Page 72: BAM Rules

• Used in the ETL process when loading data that references other tables (e.g., loading a fact table and looking up the dimension record)
• You need a series of rules to follow if the lookup fails
• Create a set of ‘dummy’ records in each referenced table (for referential integrity purposes)
• In snapshots, you may need a set of dummy records per snapshot period

Page 73: BAM Rules – Dummy Records

-99 Error/Missing
-88 Not Available
-77 Acceptable Error
-66 Temporarily Not Available
-1 Not Applicable

A great hockey team! Gretzky, Lindros, Coffey, Lemieux, Bunny Larocque

Page 74: Dummy Record Meanings

-99 A data element is missing, or a lookup into another table cannot find a matching value (e.g., a missing foreign key). The source record is still loaded, and the column value is set to -99.
-88 ‘Not Applicable’. This data element is not required in the context of the record.
-77 ‘Acceptable Errors’ that will not be corrected. This data element was invalid (set to -99) during the initial load and will not be corrected or reloaded.
-66 Data is temporarily not available. Usually used in a multiple-pass loading process.
-1 ‘Data not available’. This data element is not available from the source record.
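
Here is a minimal Python sketch of a BAM-aware lookup during a fact load, assuming hypothetical diagnosis codes and keys. Only the -99 rule is modeled. A missing code or a failed lookup maps to the dummy record instead of rejecting the row.

```python
ERROR_MISSING = -99  # dummy record key: data element missing or lookup failed

# Hypothetical lookup built from the diagnosis dimension: natural code -> surrogate key.
diagnosis_keys = {"DX-A": 501, "DX-B": 502}


def resolve(natural_code, lookup):
    """BAM rule: a missing code or a failed lookup maps to the -99 dummy record."""
    if not natural_code:
        return ERROR_MISSING
    return lookup.get(natural_code, ERROR_MISSING)


source_claims = [
    {"claim_line": 1, "diagnosis": "DX-A", "amount": 123.34},
    {"claim_line": 2, "diagnosis": "DX-Z", "amount": 16239.00},  # not in the dimension
    {"claim_line": 3, "diagnosis": None, "amount": 42.00},       # missing in the source
]

# Every source row is loaded, so fact totals still tie back to the source system.
fact_rows = [{"claim_line": c["claim_line"],
              "primary_diagnosis_key": resolve(c["diagnosis"], diagnosis_keys),
              "amount": c["amount"]}
             for c in source_claims]
print([r["primary_diagnosis_key"] for r in fact_rows])  # [501, -99, -99]
```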

Page 75: Error Correction Process

• An area that you can report from and reload from
• Holds or points to the original source record and can recreate it (once a value has been tagged with a BAM rule, the DW has lost the original value)
• Can be one summary table with standard error types
• For more detail, create one error table for each target table
• Create a series of error flag columns in the error table indicating what went wrong
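
Here is a minimal Python sketch of a per-target error table, with hypothetical column and function names. After a failed lookup, the fact row holds only -99. The error row therefore keeps the original source value and flags describing what went wrong, which is what makes a later reload possible.

```python
ERROR_MISSING = -99  # BAM dummy key

# Hypothetical per-target error table: one row per failed lookup.
error_table = []


def load_fact(source_row, diagnosis_keys, fact_table):
    """Load one claim line; on a failed lookup, store -99 in the fact and log an error row."""
    code = source_row["diagnosis"]
    key = diagnosis_keys.get(code, ERROR_MISSING) if code else ERROR_MISSING
    fact_table[source_row["claim_line"]] = {"primary_diagnosis_key": key,
                                            "amount": source_row["amount"]}
    if key == ERROR_MISSING:
        error_table.append({
            "claim_line": source_row["claim_line"],
            "original_diagnosis": code,            # the value the fact row no longer holds
            "missing_value_flg": int(not code),    # error flag: nothing in the source
            "lookup_failed_flg": int(bool(code)),  # error flag: code not in the dimension
            "corrected_flg": 0,
        })


def reload_errors(diagnosis_keys, fact_table):
    """Retry the lookup for uncorrected errors, e.g. once the missing dimension row exists."""
    for err in error_table:
        if err["corrected_flg"] or not err["original_diagnosis"]:
            continue
        key = diagnosis_keys.get(err["original_diagnosis"], ERROR_MISSING)
        if key != ERROR_MISSING:
            fact_table[err["claim_line"]]["primary_diagnosis_key"] = key
            err["corrected_flg"] = 1


diagnosis_keys = {"DX-A": 501}
facts = {}
load_fact({"claim_line": 1, "diagnosis": "DX-A", "amount": 123.34}, diagnosis_keys, facts)
load_fact({"claim_line": 2, "diagnosis": "DX-B", "amount": 16239.00}, diagnosis_keys, facts)
diagnosis_keys["DX-B"] = 502  # the missing dimension record arrives in a later load
reload_errors(diagnosis_keys, facts)
print(facts[2], error_table[0]["corrected_flg"])
```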

Page 76: Error Correction Model – Summary Mode

Error_type
  error_type_cd: VARCHAR(2) NOT NULL
  error_type_desc: VARCHAR(255)
  last_update_ts: TIMESTAMP NOT NULL
  record_expiry_ts: TIMESTAMP

Severity_Level
  severity_cd: VARCHAR(3) NOT NULL
  severity_desc: VARCHAR(255)
  last_update_ts: TIMESTAMP NOT NULL
  record_expiry_ts: TIMESTAMP

ETL_ERROR
  etl_load_key: INTEGER NOT NULL (FK)
  sys_load_col_name: VARCHAR(30) NOT NULL
  source_name: VARCHAR(80) NOT NULL (FK)
  error_type_cd: VARCHAR(2) NOT NULL (FK)
  source_row_id: INTEGER
  severity_cd: VARCHAR(3) NOT NULL (FK)

Page 77: Error Correction Model – Detail Mode

[Diagram components: Source, Stage, Load Process, Target, Error Exists, Reload]

Page 78: Audit Considerations

• A key area that is quite often ignored
• You must match to the source systems or be able to explain the differences
• Auditing data loads (when did we start a load, and what is its status?)
• Without proof, you will not get the credibility!
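
One simple form of load audit reconciles row counts and a measure total between the source extract and the loaded table. Below is a minimal Python sketch that assumes hypothetical row dictionaries and uses Decimal so money totals compare exactly.

```python
from decimal import Decimal


def reconcile(source_rows, target_rows, measure):
    """Compare row counts and one measure total between a source extract and the target."""
    src_total = sum((r[measure] for r in source_rows), Decimal("0"))
    tgt_total = sum((r[measure] for r in target_rows), Decimal("0"))
    count_diff = len(source_rows) - len(target_rows)
    amount_diff = src_total - tgt_total
    status = "BALANCED" if count_diff == 0 and amount_diff == 0 else "OUT_OF_BALANCE"
    return {"count_diff": count_diff, "amount_diff": amount_diff, "status": status}


source = [{"amount": Decimal("123.34")}, {"amount": Decimal("16239.00")}]
target_ok = [{"amount": Decimal("123.34")}, {"amount": Decimal("16239.00")}]  # -99 rows loaded too
target_short = [{"amount": Decimal("123.34")}]                                # a bad row was dropped
print(reconcile(source, target_ok, "amount")["status"])  # BALANCED
print(reconcile(source, target_short, "amount"))
```

Any OUT_OF_BALANCE result is a difference that must be explained. Loading every row, including those tagged with BAM dummy keys, is what keeps the first case balanced.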

Page 79: Audit Model

ETL_AUDIT
  etl_load_key: INTEGER NOT NULL
  academic_yr: CHAR(9)
  prev_etl_load_key: INTEGER
  most_rcnt_fy_ind: CHAR NOT NULL
  system_cd: VARCHAR(5) NOT NULL (FK)
  load_status_flg: VARCHAR(12)
  load_type_flg: CHAR
  stage_archvd_date: DATE
  wh_archvd_date: DATE
  stage_start_ts: TIMESTAMP
  warehouse_start_ts: TIMESTAMP
  num_rows_read: INTEGER
  fct_cleanup_ind: CHAR
  acad_yr_transt_ind: CHAR

ETL_Source_System
  system_cd: VARCHAR(5) NOT NULL
  system_name: VARCHAR(20)
  system_desc: VARCHAR(255)
  sys_req_file_cnt: INTEGER

ETL_AUDIT_TABLE_LOADS
  etl_load_key: INTEGER NOT NULL (FK)
  source_name: VARCHAR(80) NOT NULL
  num_rows_read: INTEGER
  num_records_reqd: INTEGER
  load_status_flg: VARCHAR(12)
  extract_num: INTEGER
  extract_ts: TIMESTAMP
  stop_source_row_id: INTEGER
  load_session_name: VARCHAR(80)
  load_start_ts: TIMESTAMP
  load_stop_ts: TIMESTAMP

Page 80: Pulling Audit & Error Correction Together

[Diagram: the audit model (ETL_AUDIT, ETL_Source_System, ETL_AUDIT_TABLE_LOADS) and the summary error correction model (Error_type, Severity_Level, ETL_ERROR) shown together, with the same columns as on the previous two model slides. ETL_ERROR’s etl_load_key and source_name foreign keys tie each error back to the load that produced it.]

Page 81: Administrative Fields

• Support the ‘behind the scenes’ aspects:
  – Loading
  – Querying
• Different requirements for dimensions and facts
• But try to standardize across all tables, even if the fields aren’t used today

Page 82: Dimension Tables

• Record Type – indicates New, Modify, Delete, Correction
• Active Flg – indicates whether a business key is active
• Most Recent Flg – indicates the most recent row loaded within a business key
• Effective Date – for the instance of that row
• End Date – for the instance of that row
• Create Date
• Update Date
• Create User
• Update User

Page 83: Fact Tables

• Record Type
• Active Flg
• Most Recent Flg
• Row Cnt
• Partition Date – store the actual date value
• Create Date
• Update Date
• Create User
• Update User

Page 84: Other Tidbits Of Advice

Page 85: Random Thoughts

• Ensure you secure…
  – Budget
  – Top management commitment
• Have focus (scope definition)
• Develop incrementally
• Have a business-driven solution
• Use experienced designers and implementers
• Use industry tools for development

Page 86: More Random Thoughts

• Generally, make all of your column names unique across tables
• Conform fact table measures (same name)
• Don’t normalize too much; jump right into a dimensional design
• Avoid retroactive changes
• Don’t be afraid of many dimensions

Page 87: Don’t Be Afraid Of Too Many Dimensions

• 1 fact, 41 CONFORMED dimensions

[Diagram: FCT_CLAIMS at the center, surrounded by DIM_PRODUCTS, DIM_ICD9_ADMITTING_DIAGNOSES, DIM_AUTHORIZING_PROVIDERS, DIM_PROVIDER_ROLES, DIM_HCP_CODES, DIM_CONTRACTS, DIM_PCP_PANELS, DIM_AGES, DIM_MR_CLASSIFICATIONS, DIM_PLACES_OF_SERVICE, DIM_SEXES, DIM_MARITAL_STATUSES, DIM_SERVICE_PROVIDERS, DIM_MEMBERS, DIM_BENEFIT_PACKAGES, DIM_TIER_PLAN_TYPES, DIM_CORPORATIONS, DIM_EMPLOYER_GROUPS, DIM_MODIFIERS, DIM_ICD9_PRIMARY_DIAGNOSES, DIM_ICD9_SECONDARY_DIAGNOSES, DIM_MEMBER_LOCATIONS, DIM_PROVIDER_LOCATIONS, DIM_DATES_OF_FIRST_SERVICE, DIM_DATES_OF_LAST_SERVICE, DIM_PAID_DATES, DIM_CLAIM_RECEIVED_DATES, DIM_ADMISSION_DATES, DIM_CLAIM_INVOICE_DATES, DIM_DISCHARGE_DATES, DIM_REFERRING_PROVIDERS, DIM_PCPS, DIM_CPT4_CODES, DIM_HCPCS_CODES, DIM_REVENUE_CODES, DIM_ICD9_PROCEDURE_CODES, DIM_PCP_NETWORKS, DIM_MEDICAL_PCP_NETWORKS, DIM_SERV_PROV_NETWORKS, CLAIMS_DETAIL, DIM_DRG_CODES]

Page 88: 12 Common DW Design Mistakes (Intelligent Enterprise: Ralph Kimball, Oct 2001)

1. Place text attributes in a fact table when you want to use them as constraints and groupings
2. Limit the use of verbose descriptions in your dimensions to save space
3. Split hierarchy and hierarchy levels into multiple dimension tables
4. Delay dealing with slowly changing dimensions
5. Use smart keys to join dimension and fact tables
6. Add dimensions to fact tables before declaring the grain

Page 89: 12 Common DW Design Mistakes (cont’d)

7. Declare that the dimensional model is based on a specific report
8. Mix different grains in one fact table
9. Leave lowest-level atomic data in non-dimensional format
10. Avoid building aggregates and use hardware for performance improvements
11. Fail to conform fact data
12. Fail to conform dimension data

Page 90: Dave’s Top 10 Gotchas

1. Failing to model for both a) the view of the data when the event occurred and b) the view of the data as of today’s reality
2. Limiting the number of dimensions
3. Failing to model and populate a meta data repository
4. Failing to provide sufficient audit capabilities to verify loads against source systems
5. Not using surrogate keys for everything

Page 91: Dave’s Top 10 Gotchas (cont’d)

6. Failing to design an error correction process
7. Normalizing too much
8. Not using a staging area
9. Failing to load ALL of the fact data
10. Failing to classify incorrect data

Page 92: In Summary

• Be careful in your design
• Meet business requirements
• Address the ‘behind the scenes’ issues
• Remember: DW design is not a science, it is an art
• Thus, be an artist and create

Page 93: Q&A

QUESTIONS, ANSWERS & DRAW

David Stanford, [email protected]

Thank You!