About Rafael
DW/BI Professional – 12 years
SQL Server MVP – 4 years
Architect/Consultant @ Quaero, CSG Systems
Live in Charlotte, NC
Quaero is Hiring! DB Engineer
5+ years of database support
Expertise in SQL Server 2005 and 2008 database environments is a must
Expertise in ETL, including SSIS packages, stored procedures and T-SQL
Ability to work directly and effectively with clients
Experience working in complex production database environments
Experience implementing data hygiene and customer matching routines is a plus
Excellent written and verbal communication skills
Experience with scripting languages and XML is a plus
Agenda
The Stage: Kimball’s Data Warehouse Lifecycle overview
Dimensional Modeling Basics
Dimensional Design Process: 4 steps
More About Dimension tables
More About Fact Tables
What To Expect?
Introduction to dimensional modeling concepts, terminology and design guidelines
Not an advanced dimensional modeling class
No demos, but lots of slides
Questions welcome at any time
The Stage: Kimball’s DW Lifecycle
Kimball DW Lifecycle is one of the most popular data warehousing methodologies
First Lifecycle book published in 1996, latest in 2010
Dimensional model or “star schema” is today’s dominant “theme” in the BI field
Kimball DW Lifecycle Fundamentals
Enterprise data warehouse framework
Business Driven
Iterative approach
Dimensional Model for data delivery
Intuitive DB model to end users
Fast query performance
Dimensional Modeling in the DW Lifecycle
Dimensional Modeling
Logical model design technique
Intuitive DB structures to end users
Fast query performance
Divides the world into
Facts
Dimensions
Also known as “Star Schema”
Reviewing Star Schema Benefits
Transforms normalized data into a simpler model
Delivers high-performance queries
SQL Server offers Star Join Query Optimization
Uses mature modeling techniques that are widely supported by many BI tools
Requires low maintenance as the data warehouse design evolves
Introducing the Star Schema
Facts
A measurement of a business event
Numeric values
Additive, semi-additive, non-additive
Normalized data structures
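The additive / semi-additive distinction can be sketched in a few lines of Python. The accounts, dates and amounts below are made up for illustration; they are not from the deck.

```python
from collections import defaultdict

# Hypothetical daily snapshot rows:
# (date, account, deposit_amount, end_of_day_balance)
rows = [
    ("2008-02-01", "A", 100.0, 1100.0),
    ("2008-02-02", "A", 50.0, 1150.0),
    ("2008-02-01", "B", 200.0, 2200.0),
    ("2008-02-02", "B", 25.0, 2225.0),
]

# Additive fact: deposits can be summed across every dimension, date included.
total_deposits = sum(deposit for _, _, deposit, _ in rows)

# Semi-additive fact: balances can be summed across accounts for a single
# date, but summing them across dates gives a meaningless number.
balance_by_date = defaultdict(float)
for day, _account, _deposit, balance in rows:
    balance_by_date[day] += balance

# Across time, use the last snapshot instead of a sum.
closing_balance = balance_by_date[max(balance_by_date)]
```

A non-additive fact, such as a ratio or a unit price, cannot be summed across any dimension; the usual approach is to store its additive components and compute the ratio at query time.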
Fact Table Anatomy
Dimension keys (FKs)
Facts
Dimensions
Context of the facts
Descriptive attributes
Who, what, where, when, how…
Query constraining and result set labeling
Denormalized data structures
e.g., Geography, Customer, Time, Product
Dimension Denormalization
Denormalization of Customer
Before You Start Modeling
DW Bus Matrix
DW High level architecture
Dimensional Design Process: 4 steps
Business Requirements
• Bus Matrix
Data Reality
• Initial Data Profiling
Step 1: Choose the business process
Step 2: Declare the grain
Step 3: Identify Dimensions
Step 4: Identify Facts
High Level Dimensional Model
[Diagram legend: fact / dimension]
GL Transaction Detail. Grain = one row per General Ledger Journal line
Diagram labels: Applied Date, P and L Unit, Vendor, Client, GL Account Number, Recorded Date, GL Journal Line
GL Balance. Grain = one row per GL Account per budget period
Diagram labels: GL Main Account, Period Ending Date, P and L Unit, Vendor, Client, GL Account Number
Detailed Dimensional Model
More About Dimensions
Surrogate Keys
Conformed Dimensions
Slowly Changing Dimensions (SCD)
Role-Playing Dimensions
Date Dimension
Surrogate keys
“A meaningless key, ideally integer number, to be used as the primary key of dimensions”
Better query performance
Creating row versioning is easier
No risk of key collision for multi-source DW
Avoid overhead of using transactional keys
Flexibility when inserting pre-defined rows
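A minimal sketch of surrogate key assignment in Python follows. The class name, the source system names and the use of -1 for a pre-defined "Unknown" row are assumptions made for illustration.

```python
import itertools


class SurrogateKeyMap:
    """Hands out meaningless integer keys for (source_system, natural_key) pairs."""

    UNKNOWN_KEY = -1  # assumed convention for a pre-defined 'Unknown' row

    def __init__(self):
        self._next_key = itertools.count(1)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        # A missing natural key maps to the pre-defined row instead of NULL.
        if natural_key is None:
            return self.UNKNOWN_KEY
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


keys = SurrogateKeyMap()
crm_key = keys.key_for("CRM", "1001")
erp_key = keys.key_for("ERP", "1001")  # same natural key, different source
```

Because the key is derived from the pair rather than from the natural key alone, two source systems that both use "1001" do not collide.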
Conformed Dimensions
Shared dimensions across the enterprise
Deliver a consistent interpretation for all business processes involved
Allow for drill across fact tables
ETL work is done only once
[Diagram: the GL Transaction Detail and GL Balance fact tables share the dimensions P and L Unit, Vendor, Client and GL Account Number]
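Drilling across can be sketched as: aggregate each fact table separately to the level of a conformed dimension, then line the results up on that dimension. The rows, amounts and measure names below are hypothetical; only the table and dimension names come from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows, both keyed by the conformed GL Account Number.
gl_transaction_detail = [("4000", 120.0), ("4000", 30.0), ("5000", 75.0)]
gl_balance = [("4000", 150.0), ("5000", 80.0)]


def total_by_account(rows):
    """Aggregate one fact table to the GL Account Number level."""
    totals = defaultdict(float)
    for account, amount in rows:
        totals[account] += amount
    return dict(totals)


posted = total_by_account(gl_transaction_detail)
balance = total_by_account(gl_balance)

# Merge the two result sets on the shared dimension value.
drill_across = {
    account: (posted.get(account, 0.0), balance.get(account, 0.0))
    for account in sorted(set(posted) | set(balance))
}
```

This only works because both fact tables use the same dimension with the same keys and attribute values; the fact tables are never joined to each other directly.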
Slowly Changing Dimensions (SCD)
How do the dimensions have to respond to data changes?
Common types:
SCD Type 1
SCD Type 2
SCD Type 3
SCD Type 6
Slowly Changing Dimensions (SCD) Type 1
Overwrite previous value
Best when tracking history is not required
1 row per natural key
Simplest approach for handling data changes
Insert…else…update
SQL Server 2008 T-SQL MERGE
SSIS SCD Transformation
Slowly Changing Dimensions (SCD) Type 1
Customer Dimension
Customer Key  Customer Code  Customer First Name  Customer Last Name  ETL Insert Date  ETL Update Date
12345         YFG-FDS        Jane                 Ross                02/24/2008
Last name changes
Customer Key  Customer Code  Customer First Name  Customer Last Name  ETL Insert Date  ETL Update Date
12345         YFG-FDS        Jane                 Smith               02/24/2008       09/09/2008
Existing row is updated!
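The Type 1 "insert, else update" logic can be sketched with Python's built-in sqlite3 module. This is an illustration only, not the T-SQL MERGE or SSIS approach; the table and column names are assumptions, and dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key    INTEGER PRIMARY KEY,   -- surrogate key
        customer_code   TEXT UNIQUE NOT NULL,  -- natural key: 1 row per code
        first_name      TEXT,
        last_name       TEXT,
        etl_insert_date TEXT,
        etl_update_date TEXT
    )""")


def scd1_upsert(conn, code, first, last, load_date):
    """Insert a new customer, or overwrite the attributes of an existing one."""
    row = conn.execute(
        "SELECT first_name, last_name FROM dim_customer WHERE customer_code = ?",
        (code,),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_code, first_name, last_name, etl_insert_date) "
            "VALUES (?, ?, ?, ?)",
            (code, first, last, load_date),
        )
    elif row != (first, last):
        # Type 1: the old value is overwritten, so no history is kept.
        conn.execute(
            "UPDATE dim_customer SET first_name = ?, last_name = ?, "
            "etl_update_date = ? WHERE customer_code = ?",
            (first, last, load_date, code),
        )


scd1_upsert(conn, "YFG-FDS", "Jane", "Ross", "2008-02-24")
scd1_upsert(conn, "YFG-FDS", "Jane", "Smith", "2008-09-09")
```

After both calls the table still holds a single row for YFG-FDS, now with last name Smith and an update date of 2008-09-09.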
Slowly Changing Dimensions (SCD) Type 2
Insert a new row
Best for tracking changes in attribute values
Use effective dates to represent row lifespan
If the row does not exist, insert it; else expire the current version and insert a new one
Slowly Changing Dimensions (SCD) Type 2
Customer Dimension
Customer Key  Customer Code  Customer First Name  Customer Last Name  Start Date  End Date    Current row
12345         YFG-FDS        Jane                 Ross                02/24/2008  12/31/2099  Y
Last name change
Customer Key  Customer Code  Customer First Name  Customer Last Name  Start Date  End Date    Current row
12345         YFG-FDS        Jane                 Ross                02/24/2008  09/08/2008  N
67843         YFG-FDS        Jane                 Smith               09/09/2008  12/31/2099  Y
Existing row is expired!
A new row is inserted!
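A minimal Type 2 sketch using Python's built-in sqlite3 module follows. The table and column names are assumptions, dates are stored as ISO strings, and the surrogate keys are simply generated in sequence rather than matching the values in the example.

```python
import sqlite3
from datetime import date, timedelta

HIGH_DATE = "2099-12-31"  # open-ended end date for the current row

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,  -- surrogate key, one per version
        customer_code TEXT NOT NULL,        -- natural key, repeats per version
        first_name    TEXT,
        last_name     TEXT,
        start_date    TEXT,
        end_date      TEXT,
        current_row   TEXT
    )""")


def scd2_apply(conn, code, first, last, change_date):
    """Insert a first version, or expire the current one and add a new version."""
    current = conn.execute(
        "SELECT customer_key, first_name, last_name FROM dim_customer "
        "WHERE customer_code = ? AND current_row = 'Y'",
        (code,),
    ).fetchone()
    if current is not None:
        if current[1:] == (first, last):
            return  # nothing changed, keep the current version
        # Expire the current version the day before the change takes effect.
        conn.execute(
            "UPDATE dim_customer SET end_date = ?, current_row = 'N' "
            "WHERE customer_key = ?",
            ((change_date - timedelta(days=1)).isoformat(), current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_code, first_name, last_name, "
        "start_date, end_date, current_row) VALUES (?, ?, ?, ?, ?, 'Y')",
        (code, first, last, change_date.isoformat(), HIGH_DATE),
    )


scd2_apply(conn, "YFG-FDS", "Jane", "Ross", date(2008, 2, 24))
scd2_apply(conn, "YFG-FDS", "Jane", "Smith", date(2008, 9, 9))
```

Each version gets its own surrogate key, which is why fact rows loaded before the change keep pointing at the Ross version.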
Role-Playing Dimensions
Same physical dimension plays distinct logical roles in a fact table
Implemented through views or query aliases
Date Dimension playing 4 roles
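The view-based approach can be sketched with Python's built-in sqlite3 module. The two roles shown (order date and ship date), the fact table and all column names are hypothetical; the slide does not name its four roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One physical date dimension.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
    INSERT INTO dim_date VALUES (20080215, '2008-02-15'), (20080217, '2008-02-17');

    -- Two logical roles, each exposed as a view with role-specific column names.
    CREATE VIEW dim_order_date AS
        SELECT date_key AS order_date_key, full_date AS order_date FROM dim_date;
    CREATE VIEW dim_ship_date AS
        SELECT date_key AS ship_date_key, full_date AS ship_date FROM dim_date;

    -- The fact table carries one foreign key per role.
    CREATE TABLE fact_orders (order_date_key INTEGER, ship_date_key INTEGER, amount REAL);
    INSERT INTO fact_orders VALUES (20080215, 20080217, 99.5);
""")

result = conn.execute("""
    SELECT o.order_date, s.ship_date, f.amount
    FROM fact_orders AS f
    JOIN dim_order_date AS o ON o.order_date_key = f.order_date_key
    JOIN dim_ship_date  AS s ON s.ship_date_key  = f.ship_date_key
""").fetchall()
```

Query aliases achieve the same thing without views: join dim_date twice under different aliases. Views have the advantage of giving each role its own unambiguous column names.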
Date Dimension
Grain should not be lower than daily
Hour: 8,760 rows per year
Minute: 525,600 rows per year
Second: way too many…
Surrogate key rule exception: an intelligent key is recommended (integer value: 20081011)
Time of day, if required, in the fact table (most cases)
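A daily-grain date dimension with an intelligent YYYYMMDD key can be generated along these lines. The attribute names are assumptions; a real date dimension usually carries many more attributes (fiscal periods, holidays and so on).

```python
from datetime import date, timedelta


def date_key(d):
    """Intelligent integer key in YYYYMMDD form, e.g. 20081011."""
    return d.year * 10000 + d.month * 100 + d.day


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": date_key(day),
            "full_date": day.isoformat(),
            "year": day.year,
            "month": day.month,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2008, 1, 1), date(2008, 12, 31))
```

Because the key is derived from the date itself, fact rows can be loaded without a lookup against the dimension, and the key stays readable when partitioning or debugging.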
More about Facts
3 types of fact tables:
Transaction
Periodic snapshot
Accumulating snapshot
Transaction Fact Tables
Records events at a point in time
Represents transaction activity
The most common type of fact table
Only inserts (most cases)
Store facts at the most atomic level possible
Periodic Snapshot Fact Tables
‘Snapshots’ taken on a regular basis, regardless of activity
Stores 1 row per time period
Complements transaction fact tables
Only inserts (most cases)
Accumulating Snapshot Fact Tables
Captures activity for processes with defined beginning and end
1 row per event lifetime
Fact row is updated at each milestone
Least frequently used Fact table type
Accumulating Snapshot Fact Tables
Step  Operation  Appl. Key  Start Date  Complete Date  Transm. Date  Process Date
T1    Insert     1          20080215    -1             -1            -1
T2    Update     1          20080215    20080217       20080217      -1
T3    Update     1          20080215    20080217       20080217      20080219
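The insert-then-update lifecycle of an accumulating snapshot row can be sketched with Python's built-in sqlite3 module. The table name, the expanded column names and the milestone labels are assumptions; the -1 placeholder for a milestone that has not happened yet follows the example.

```python
import sqlite3

NOT_YET = -1  # placeholder date key for a milestone that has not happened

# Whitelist mapping milestone names to columns, so that column names are
# never built from arbitrary input.
MILESTONE_COLUMNS = {
    "complete": "complete_date_key",
    "transmit": "transmit_date_key",
    "process": "process_date_key",
}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_application (
        application_key   INTEGER PRIMARY KEY,
        start_date_key    INTEGER,
        complete_date_key INTEGER,
        transmit_date_key INTEGER,
        process_date_key  INTEGER
    )""")


def start_application(conn, key, start_date_key):
    """T1: insert the row with all later milestones set to the placeholder."""
    conn.execute(
        "INSERT INTO fact_application VALUES (?, ?, ?, ?, ?)",
        (key, start_date_key, NOT_YET, NOT_YET, NOT_YET),
    )


def record_milestone(conn, key, milestone, date_key):
    """T2, T3, ...: update the same row as each milestone is reached."""
    column = MILESTONE_COLUMNS[milestone]
    conn.execute(
        f"UPDATE fact_application SET {column} = ? WHERE application_key = ?",
        (date_key, key),
    )


start_application(conn, 1, 20080215)             # T1: insert
record_milestone(conn, 1, "complete", 20080217)  # T2: update
record_milestone(conn, 1, "transmit", 20080217)
record_milestone(conn, 1, "process", 20080219)   # T3: update
```

Unlike the other two fact table types, the row count stays at one per application; only the milestone columns change over time.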
Dimensional Modeling Myths
It fits only as departmental solution
Limited extensibility potential
It only provides aggregated data
It only supports many-to-one relationships
It is a waste of disk space
Risks
High Profile: Success (and failure!) is visible to Management
Business Driven: Hard for technologists
Technology Focus: “Let’s build it and users will come”
Dashboards not a good starting point
Data Quality and integration
Complexity: Tackling too much at once
SQL Server and Dimensional Modeling
SSAS
SSIS
SCD transformation (ETL)
Relational Engine
T-SQL MERGE (ETL)
Star join optimization (query performance)
Want to learn more?
Kimball Method:
The Data Warehouse Lifecycle Toolkit. 2nd edition. 2008
Dimensional Modeling advanced techniques:
The Data Warehouse Toolkit. 2nd edition. 2002
SQL Server 2008 BI/DW:
www.microsoft.com/bi/