DW Concepts Dimension Modeling Techniques
Post on 07-Apr-2018
8/3/2019 DW Concepts Dimension Modeling Techniques
www.technologica.com
DW Concepts
Dimension Modeling Techniques
TechnoLogica DW Projects
Business Management System, National Health Insurance Fund (10.2004 - current)
Customer Data Integration, Allianz Bulgaria Holding (10.2004 - current)
Regulatory Reporting System, BULBANK (2002 - 2003)
Information System Monetary Statistics, Bulgarian National Bank (April 2003 - August 2004)
Management Information System, BULBANK (January 2001 - June 2002)
Agenda
DW Terminology Overview
Dimensional Modeling
Dimension Types
History and Dimensions
Hierarchy in Dimensions
The data warehouse must
Make an organization's information easily accessible.
Present the organization's information consistently.
Be adaptive and resilient to change.
Be a secure bastion that protects our information assets.
Serve as the foundation for improved decision making.
The business community must accept the data warehouse if it is to be deemed successful.
Components of a Data Warehouse
Dimensional Modeling
Dimensional modeling is a new name for an old technique for making databases simple and understandable
Dimensional modeling is quite different from third-normal-form (3NF) modeling
ERM -> The transaction processing model
o One table per entity
o Minimize data redundancy
o Optimize update
DM -> The data warehousing model
o One fact table for a process in the organization
o Maximize understandability
o Optimized for retrieval
o Resilient to change
Star Dimensional Modeling
[Figure: star schema. The Order fact table (Item_nbr, Item_desc, Quantity, Discnt_price, Unit_price, Order_amount) sits at the center, joined to the History, Customer, Product, and Channel dimension tables.]
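The star layout can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the figure: the table names (customer_dim, product_dim, channel_dim, order_fact), the key columns, and the sample rows are all assumptions, and the History dimension is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE channel_dim  (channel_key  INTEGER PRIMARY KEY, channel_name  TEXT);
-- The fact table holds one foreign key per dimension plus the numeric facts.
CREATE TABLE order_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    channel_key  INTEGER REFERENCES channel_dim(channel_key),
    quantity     INTEGER,
    order_amount REAL
);
INSERT INTO customer_dim VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO product_dim  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO channel_dim  VALUES (1, 'Web'), (2, 'Store');
INSERT INTO order_fact VALUES
    (1, 1, 1, 2, 20.0),
    (2, 1, 2, 1, 10.0),
    (1, 2, 1, 3, 45.0);
""")

# Slice the facts by one dimension attribute: one join per dimension used.
amount_by_channel = dict(conn.execute("""
    SELECT c.channel_name, SUM(f.order_amount)
    FROM order_fact f
    JOIN channel_dim c ON c.channel_key = f.channel_key
    GROUP BY c.channel_name
"""))
```

Every query against the star has the same shape: join the fact table to the dimensions of interest, constrain and group on dimension attributes, and aggregate the facts.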
Four-Step Dimensional Design Process
1. Select the business process to model.
2. Declare the grain of the business process.
3. Choose the dimensions that apply to each fact table row.
4. Identify the numeric facts that will populate each fact table row.
Dimensions
Determine these by the ways you want to slice and dice the data
Small number of rows compared to facts
Usually 5-10 dimensions surrounding a fact table
Time is almost always a dimension used by every fact
Track history
Uses Surrogate Keys
Hierarchies are usually built into them if possible
Date Dimension
The date dimension is the one dimension nearly guaranteed to be in every data mart
The date dimension was previously known as the time dimension
We can build the date dimension table in advance (10 years -> only about 3,650 rows)
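Building the table in advance can be sketched as below. The column names and the choice of attributes are assumptions for illustration; fiscal periods, seasons, and holidays would need business-specific rules that are not shown here.

```python
from datetime import date, timedelta


def build_date_dim(start, end):
    """Return one dict per calendar day from start to end, inclusive."""
    rows = []
    day = start
    key = 1
    while day <= end:
        rows.append({
            "date_key": key,                    # sequential surrogate key
            "full_date": day.isoformat(),
            "calendar_month": day.strftime("%B"),
            "calendar_year": day.year,
            "day_of_week": day.strftime("%A"),
            "is_weekend": day.weekday() >= 5,   # Saturday is 5, Sunday is 6
        })
        day += timedelta(days=1)
        key += 1
    return rows


date_dim = build_date_dim(date(2005, 1, 1), date(2014, 12, 31))
```

Ten calendar years starting in 2005 give 3,652 rows (two of the years are leap years), so the whole table can be generated once and loaded before any fact data arrives.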
Data warehouses always need an explicit date dimension table. There are many date attributes not supported by the SQL date function, including fiscal periods, seasons, holidays, and weekends.
Rather than attempting to determine these nonstandard calendar calculations in a query, we should look them up in a date dimension table.
select sum(f.amount_sold)
from DATE_DIM d, FACT f
where d.Calendar_Month = 'January'
and d.id = f.date_dim_id;
Dimension Normalization (Denormalized dimension)
Dimension Normalization (Snowflaking)
The dimension tables should remain as flat tables physically.
Normalized, snowflaked dimension tables penalize cross-attribute browsing and prohibit the use of bit-mapped indexes.
Disk space savings gained by normalizing the dimension tables typically are less than 1 percent of the total disk space needed for the overall schema.
Too Many Dimensions
A very large number of dimensions typically is a sign that several dimensions are not completely independent and should be combined into a single dimension.
If our design has 25 or more dimensions, we should look for ways to combine correlated dimensions into a single dimension.
It is a dimensional modeling mistake to represent elements of a hierarchy as separate dimensions in the fact table.
Surrogate Keys
Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys.
You should avoid using the natural operational production codes. None of the data warehouse keys should be smart, where you can tell something about the row just by looking at the key.
Surrogate keys are like an immunization for the data warehouse
Buffer the data warehouse environment from operational changes
Performance advantages
o The smaller surrogate key translates into smaller fact tables, smaller fact table indices, and more fact table rows per block input-output operation
Surrogate keys are used to record dimension conditions that may not have an operational code
o "No Promotion in Effect", "Date Not Applicable"
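A minimal sketch of surrogate key assignment, assuming a hypothetical SurrogateKeyMap helper and made-up promotion codes. It reserves one key for the "no operational code" condition so that fact rows never carry a null foreign key.

```python
from itertools import count


class SurrogateKeyMap:
    """Map natural (operational) codes to meaningless integer keys."""

    def __init__(self):
        self._next = count(1)           # keys carry no business meaning
        self._by_natural = {}
        # Reserve a row for facts that have no operational code at all,
        # for example "No Promotion in Effect".
        self.not_applicable = next(self._next)

    def key_for(self, natural_code):
        if natural_code is None:
            return self.not_applicable
        if natural_code not in self._by_natural:
            self._by_natural[natural_code] = next(self._next)
        return self._by_natural[natural_code]


promo_keys = SurrogateKeyMap()
# Hypothetical promotion codes arriving with four fact rows.
fact_keys = [promo_keys.key_for(code)
             for code in ["SUMMER05", None, "SUMMER05", "XMAS05"]]
```

If the operational system later recodes or reuses "SUMMER05", only the mapping changes; the integer keys already stored in the fact table stay valid.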
The date dimension is the one dimension where surrogate keys should be assigned in a meaningful, sequential order
Surrogate keys are needed to support one of the primary techniques for handling changes to dimension table attributes
Don't use concatenated or compound keys for dimension tables
Data Warehouse Bus Architecture
Data Warehouse Bus Matrix
Conformed Dimensions
Most dimensions are defined naturally at the most granular level possible
Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension
They have consistent dimension keys, consistent attribute column names, consistent attribute definitions, and consistent attribute values
The conformed dimension may be the same physical table within the database or may be duplicated synchronously in each data mart
Roll-up dimensions conform to the base-level atomic dimension if they are a strict subset of that atomic dimension.
They should be built once in the staging area
They must be published prior to staging of the fact data
The dimension authority has responsibility for defining, maintaining, and publishing a particular dimension or its subsets to all the data mart clients who need it
Tracking History in Dimensions
Unchanging Dimensions
Changing, but Original Values are Irrelevant
o A phone number in a customer record
Slowly Changing Dimensions (SCD)
o A customer address, manager
Rapidly Changing Dimensions
o Income range of a customer
Continuously Changing Dimensions
o Customer age
Type 1: Overwrite the Value
The type 1 response is easy to implement, but:
o it does not maintain any history of prior attribute values
o any preexisting aggregations based on the department value will need to be rebuilt
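A type 1 change can be sketched as a plain in-place update. The apply_type1 helper and the dictionary layout are assumptions for illustration; the product values are the ones used in the type 2 example of this deck.

```python
product_dim = {
    12345: {"sku": "ABC922-Z", "description": "IntelliKidz 1.0",
            "department": "Education"},
}


def apply_type1(dim, product_key, attribute, new_value):
    """Overwrite the attribute in place; the prior value is lost."""
    dim[product_key][attribute] = new_value


apply_type1(product_dim, 12345, "department", "Strategy")
```

The surrogate key and the row count do not change, which is why the response is cheap, and also why every fact ever recorded for the product now appears under the new department.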
Type 2: Add a Dimension Row
The type 2 response is the primary technique for accurately tracking slowly changing dimension attributes. It is extremely powerful because the new dimension row automatically partitions history in the fact table.
It's not suitable for dimension tables that already exceed a million rows
Product Key  Product Description  Department  SKU Number  Effective Date  Expiration Date
12345        IntelliKidz 1.0      Education   ABC922-Z    01.1.1900       22.4.2005
25984        IntelliKidz 1.0      Strategy    ABC922-Z    23.4.2005       01.1.2500

Product Key  Product Description  Department  SKU Number  Effective Date  Most Recent Flag
12345        IntelliKidz 1.0      Education   ABC922-Z    01.1.1900       N
25984        IntelliKidz 1.0      Strategy    ABC922-Z    23.4.2005       Y

Product Key  Date Key  Amount Sold
12345        200       100
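The row versioning in the tables above can be sketched as follows. The apply_type2 and key_on helpers are hypothetical, and the new surrogate key is passed in explicitly only so the result matches the table; a real load would draw it from a key generator.

```python
from datetime import date, timedelta

FAR_FUTURE = date(2500, 1, 1)   # open-ended expiration date, as in the table

product_dim = [
    {"product_key": 12345, "sku": "ABC922-Z", "department": "Education",
     "effective": date(1900, 1, 1), "expiration": FAR_FUTURE},
]


def apply_type2(dim, sku, attribute, new_value, change_date, new_key):
    """Expire the current row for the SKU and append a new row with a new key."""
    current = next(r for r in dim
                   if r["sku"] == sku and r["expiration"] == FAR_FUTURE)
    current["expiration"] = change_date - timedelta(days=1)
    new_row = dict(current, product_key=new_key,
                   effective=change_date, expiration=FAR_FUTURE)
    new_row[attribute] = new_value
    dim.append(new_row)


def key_on(dim, sku, day):
    """Surrogate key that a fact dated `day` should carry."""
    return next(r["product_key"] for r in dim
                if r["sku"] == sku and r["effective"] <= day <= r["expiration"])


apply_type2(product_dim, "ABC922-Z", "department", "Strategy",
            date(2005, 4, 23), 25984)
```

Facts loaded before the change keep key 12345 and facts loaded afterwards get 25984, so history is partitioned without touching any existing fact row.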
Type 3: Add a Dimension Column
The type 3 slowly changing dimension technique allows us to see new and historical fact data by either the new or prior attribute values.
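A type 3 change can be sketched as adding a "prior" column next to the current one. The apply_type3 helper, the prior_ column prefix, and the sample fact row are assumptions for illustration.

```python
def apply_type3(row, attribute, new_value):
    """Keep the prior value in a 'prior_<attribute>' column, then overwrite."""
    row["prior_" + attribute] = row[attribute]
    row[attribute] = new_value


product = {"product_key": 12345, "description": "IntelliKidz 1.0",
           "department": "Education"}
apply_type3(product, "department", "Strategy")

facts = [{"product_key": 12345, "amount_sold": 100}]

# Every fact row, old or new, can now be grouped by either value.
by_new = {product["department"]: sum(f["amount_sold"] for f in facts)}
by_prior = {product["prior_department"]: sum(f["amount_sold"] for f in facts)}
```

Unlike type 2, the dimension keeps a single row and a single key, so only one prior value per added column is available.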
Hybrid SCD Techniques
Series of Type 3 Attributes
Predictable Changes with Multiple Version Overlays
o Report each year's sales using the district map for that year.
o Report each year's sales using a district map from an arbitrary different year.
o Report an arbitrary span of years' sales using a single district map from any chosen year. The most common version of this requirement would be to report the complete span of fact data using the current district map.
Hybrid SCD Techniques
Type 2 with "Current" Overwrite
Unpredictable Changes with Single-Version Overlay
o Preserves historical accuracy while supporting the ability to report historical data according to the current values
Dimension Table Staging
Junk Dimensions
What to do with flags and indicators
o Leave the flags and indicators unchanged in the fact table row.
o Make each flag and indicator into its own separate dimension.
o Strip out all the flags and indicators from the design.
A junk dimension is a convenient grouping of typically low-cardinality flags and indicators
Whether to use a junk dimension
o 5 indicators, each has 3 values -> 243 (3^5) rows
o 5 indicators, each has 100 values -> 10 billion (100^5) rows
When to insert rows in the dimension
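The row counts behind this decision can be checked with a short sketch. The five indicator names and their values are hypothetical; only the arithmetic (five indicators with three values each) comes from the slide.

```python
from itertools import product

# Hypothetical low-cardinality flags; names and values are illustrative.
indicators = {
    "payment_type": ["cash", "card", "invoice"],
    "order_mode": ["web", "phone", "store"],
    "gift_wrap": ["yes", "no", "unknown"],
    "priority": ["low", "normal", "high"],
    "new_customer": ["yes", "no", "unknown"],
}

# Pre-build every combination up front; each one gets a surrogate key.
junk_dim = [
    dict(zip(indicators, combo), junk_key=key)
    for key, combo in enumerate(product(*indicators.values()), start=1)
]

# Five indicators with 100 values each: far too many rows to pre-build.
max_rows_wide = 100 ** 5
```

With 243 rows the full cross product can be loaded once. With 100 values per indicator the theoretical maximum is 10,000,000,000 rows, so a reasonable approach is to insert a junk dimension row only when a new combination is actually encountered during the fact load.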
Multiple Currencies
Customer Dimension
Critical element for effective CRM
The most challenging dimension for any data warehouse
o extremely deep (with millions of rows)
o extremely wide (with dozens or even hundreds of attributes)
o sometimes subject to rather rapid change
Customer Dimension
Name and Address Parsing
Customer Dimension
Other Common Customer Attributes
Gender
Ethnicity
Age or other life-stage classifications
Income or other lifestyle classifications
Status (for example, new, active, inactive, closed)
Referring source
Business-specific market segment
Scores characterizing the customer, such as purchase behavior, payment behavior, product preferences
Customer Dimension
Aggregated Facts as Attributes
These attributes are to be used for constraining and labeling; they are not to be used in numeric calculations
Focus on those which will be used frequently
Minimize the frequency with which these attributes need to be updated
Replace metrics with more meaningful descriptive values, such as "High Spender"
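Replacing a metric with a descriptive value can be sketched as a simple banding function. The thresholds, the "Medium Spender" and "Low Spender" labels, and the lifetime_spend field are assumptions; only "High Spender" appears in the slide.

```python
def spend_band(total_spend):
    """Translate a numeric aggregate into a descriptive label.

    The thresholds are illustrative assumptions, not business rules.
    """
    if total_spend >= 10_000:
        return "High Spender"
    if total_spend >= 1_000:
        return "Medium Spender"
    return "Low Spender"


customers = [
    {"customer_key": 1, "lifetime_spend": 15_250.0},
    {"customer_key": 2, "lifetime_spend": 480.0},
]
for customer in customers:
    customer["spend_band"] = spend_band(customer["lifetime_spend"])
```

A band changes far less often than the underlying total, which keeps the update frequency of the dimension row low.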
Dimension Outriggers for a Low-Cardinality Attribute Set
Rapidly Changing Customer Dimensions
Challenges
o It generally takes too long to constrain or browse among the relationships in such a big table
o It is difficult to use previously described techniques for tracking changes in these large dimensions
One solution is to break off frequently analyzed or frequently changing attributes into a separate dimension, referred to as a minidimension
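A minidimension can be sketched as a small table of banded attribute combinations whose key is stored on the fact row. The band boundaries, attribute choice, and helper names below are assumptions for illustration.

```python
from itertools import product

# Hypothetical banded demographic attributes split out of the customer dimension.
age_bands = ["<25", "25-44", "45-64", "65+"]
income_bands = ["<20k", "20k-50k", "50k-100k", ">100k"]

# One row per band combination: small and static, however many customers exist.
demographics_dim = {
    combo: key
    for key, combo in enumerate(product(age_bands, income_bands), start=1)
}


def age_band(age):
    return "<25" if age < 25 else "25-44" if age < 45 else "45-64" if age < 65 else "65+"


def income_band(income):
    return ("<20k" if income < 20_000 else "20k-50k" if income < 50_000
            else "50k-100k" if income < 100_000 else ">100k")


def demographics_key(age, income):
    """Key stored on the fact row alongside the customer key."""
    return demographics_dim[(age_band(age), income_band(income))]


fact_row = {"customer_key": 42,
            "demographics_key": demographics_key(37, 62_000),
            "amount": 19.9}
```

When a customer moves to another band, new fact rows simply carry a different demographics key; no row is added to the large customer dimension.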
The Mini Dimension with "Current" Overwrite
The minidimension terminology refers to when the demographics key is part of the fact table composite key
If the demographics key is a foreign key in the customer dimension, we refer to it as an outrigger
Type 2 with Natural Keys in Fact Table

Fact Table
o Customer Key (FK)
o Customer Demographics Key (FK)
o More Foreign Keys
o Facts

Customer Dimension - Current Attributes (SCD1)
o Customer ID (Natural Key)
o Customer Name
o Customer Address
o Customer Date of Birth
o Customer Date of 1st Order
o Age
o Gender
o Annual Income
o Number of Children
o Marital Status

Customer Dimension - "As was" Attributes (SCD2)
o Customer Key (PK)
o Customer ID (Natural Key)
o Customer Name
o Customer Address
o Customer Date of Birth
o Customer Date of 1st Order
o Age
o Gender
o Annual Income
o Number of Children
o Marital Status
Implications of Type 2 Customer Dimension Changes
Be careful to avoid overcounting because we may have multiple rows in the customer dimension for the same individual
o COUNT DISTINCT
o A most recent row indicator
The comparison operators depend on the business rules used to set our effective/expiration dates.
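The overcounting risk and the two remedies can be sketched on a tiny type 2 customer dimension. The rows, the customer IDs, and the most_recent field name are made up for illustration.

```python
customer_dim = [
    # Customer C-100 moved, so a type 2 change produced two rows.
    {"customer_key": 1, "customer_id": "C-100", "city": "Sofia", "most_recent": False},
    {"customer_key": 7, "customer_id": "C-100", "city": "Varna", "most_recent": True},
    {"customer_key": 2, "customer_id": "C-200", "city": "Sofia", "most_recent": True},
]

# Counting rows counts versions, not individuals.
naive_count = len(customer_dim)

# Remedy 1: the equivalent of COUNT(DISTINCT customer_id) on the natural key.
distinct_count = len({row["customer_id"] for row in customer_dim})

# Remedy 2: constrain on the most recent row indicator.
current_count = sum(1 for row in customer_dim if row["most_recent"])
```

Both remedies give the number of individuals; the row indicator has the added benefit of returning each customer's current attribute values.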
Customer Behavior Study Groups
Capture the keys of the customers or products whose behavior you are tracking
Commercial Customer Hierarchies
Bridge tables
Be aware of the risk of double counting
SELECT 'San Francisco', SUM(F.REVENUE)
FROM FACT F, DATE D
WHERE F.CUSTOMER_KEY IN
  (SELECT B.SUBSIDIARY_KEY
   FROM CUSTOMER C, BRIDGE B
   WHERE C.CUSTOMER_KEY = B.PARENT_KEY
   AND C.CUSTOMER_CITY = 'San Francisco')  -- to sum all SF parents
AND F.DATE_KEY = D.DATE_KEY
AND D.MONTH = 'January 2002'
GROUP BY 'San Francisco'
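The double counting risk can be reproduced with a small sqlite3 sketch. The three-level hierarchy, the revenue figures, and the lowercase table names are assumptions; the bridge is assumed to hold one row per ancestor/descendant pair, including a self pair for every customer, and the date constraint is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, customer_city TEXT);
-- One bridge row per (ancestor, descendant) pair, including self pairs.
CREATE TABLE bridge (parent_key INTEGER, subsidiary_key INTEGER);
CREATE TABLE fact (customer_key INTEGER, revenue REAL);
-- Customer 1 (SF) owns customer 2 (SF), which owns customer 3 (Oakland).
INSERT INTO customer VALUES (1, 'San Francisco'), (2, 'San Francisco'), (3, 'Oakland');
INSERT INTO bridge VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3);
INSERT INTO fact VALUES (1, 100.0), (2, 50.0), (3, 25.0);
""")

# Joining through the bridge repeats customers 2 and 3,
# because they sit under two San Francisco parents.
joined = conn.execute("""
    SELECT SUM(f.revenue)
    FROM fact f
    JOIN bridge b ON b.subsidiary_key = f.customer_key
    JOIN customer c ON c.customer_key = b.parent_key
    WHERE c.customer_city = 'San Francisco'
""").fetchone()[0]

# The IN subquery treats the subsidiaries as a set,
# so each fact row is counted once.
deduplicated = conn.execute("""
    SELECT SUM(f.revenue)
    FROM fact f
    WHERE f.customer_key IN (
        SELECT b.subsidiary_key
        FROM customer c
        JOIN bridge b ON c.customer_key = b.parent_key
        WHERE c.customer_city = 'San Francisco')
""").fetchone()[0]
```

The plain join reports 250.0 while the true total is 175.0, which is why the query on the slide collects the subsidiary keys in a subquery instead of joining the bridge directly to the fact table.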
Heterogeneous Product Schemas
Common Dimensional Modeling Mistakes to Avoid
Mistake 10: Place text attributes used for constraining and grouping in a fact table
Mistake 9: Limit verbose descriptive attributes in dimensions to save space
Mistake 8: Split hierarchies and hierarchy levels into multiple dimensions
Mistake 7: Ignore the need to track dimension attribute changes
Mistake 6: Solve all query performance problems by adding more hardware
Mistake 5: Use operational or smart keys to join dimension tables to a fact table
Mistake 4: Neglect to declare and then comply with the fact table's grain
Mistake 3: Design the dimensional model based on a specific report
Mistake 2: Expect users to query the lowest-level atomic data in a normalized format
Mistake 1: Fail to conform facts and dimensions across separate fact tables
Questions and Answers