Book of Lars Frank, Chapter 10, SCD (Slowly Changing Dimensions): The hidden slides of this slideshow may be important. However, I will focus on learning by exercises, and therefore rattling off new concepts is often done in hidden slides.

Dec 17, 2015


Sabina Baker

Introduction to Slowly Changing Dimensions (SCD)

Bank accounts (Fact table)
- Account#
- Interest-last-year
- Cost-last-year
- Branch#

Branch-offices (Dimension)
- Branch#
- Branch-name
- Branch-size

If the attributes of a dimension are dynamic (i.e. they may be updated), we say that they are slowly changing.

May the Branch-size of a Branch-office change, e.g. after a renovation?

May the Branch-name of a Branch-office change?


Exercise in SCD:

Suppose the attribute Branch-size is dynamic and aggregations are made to the level (Branch-size, Year) or (Branch-size, Month).

Does this aggregation make sense and how would you solve possible problems?

Bank accounts (Fact table)
- Account#
- Interest-last-year
- Cost-last-year
- Branch#

Branch-offices (Dimension)
- Branch#
- Branch-name
- Branch-size


Exercise in SCD:

Suppose the attribute Branch-name is dynamic and aggregations are made to the level (Branch-name, Year).

Does this aggregation make sense and how would you solve possible problems?

Bank accounts (Fact table)
- Account#
- Interest-last-year
- Cost-last-year
- Branch#

Branch-offices (Dimension)
- Branch#
- Branch-name
- Branch-size


Problems with slowly changing dimensions:

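Fact table
- TimeID
- Branch Office
- ProductID
- Amount
- Price

Product dimension
- ProductID
- Product name
- Product group
- Price category

Branch office dimension
- Branch Office
- Address
- City
- District
- Size group
- Value group

Time dimension
- TimeID
- Dayname
- Week
- Month
- Quarter
- Year
- Day no
- Working day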

• If you do not update a dynamic attribute, the data warehouse is stale.
• If you update a dynamic attribute, the old measures may be aggregated to a wrong attribute level value, e.g. the wrong Branch office size!

Which dimension attributes and relationships may be slowly changing and which of these give aggregation problems?


Response types and evaluation criteria (Is historical information preserved; Aggregation performance; Storage consumption):

Response 1 where dimension records are overwritten
- Is historical information preserved: No
- Aggregation performance: In the evaluation, we define this solution to have average performance
- Storage consumption: Only the current dimension record version is stored. No redundant data is stored

Response 2 where new versions are created
- Is historical information preserved: Yes
- Aggregation performance: Version records make performance slower in proportion to the number of changes
- Storage consumption: All old versions of dimension records are stored, often with redundant attributes

Response 3 where only one historical version is saved
- Is historical information preserved: The current version and a single history-destroying version are saved
- Aggregation performance: No performance degradation occurs if either the current or the historical version is used in a query
- Storage consumption: Normally, only a single extra attribute version is stored

Response 4 that uses the top of a dynamic dimension hierarchy as a new static dimension
- Is historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether both dimension tables are used in a query
- Storage consumption: The relatively large fact table must have an extra foreign key attribute

Response 5 with dimension data as fact data
- Is historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether the new fact data are used in a query
- Storage consumption: The relatively large fact table must have an extra attribute for each dynamic dimension attribute

Response 6 that uses fine granularity in combination with response 1 or 3
- Is historical information preserved: The finer the granularity, the more historical state information is preserved
- Aggregation performance: The finer the granularity, the slower the performance
- Storage consumption: The finer the granularity, the more storage consumption

Response 7 that stores dynamic dimension data as static facts in another data mart
- Is historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether both fact tables are used in a drill-across query
- Storage consumption: This is the most storage-consuming solution, as at least a new fact and foreign key are stored in the new fact table


Kimball’s type 1 response:

Overwrite the old value:

Bank account Fact
- Account-ID
- Time-ID
- Branch-ID
- Interest-last-month
- Cost-last-month

Branch-office Dimension
- Branch-ID
- Branchname

Time Dimension
- Time-ID
- Monthname

Figure 3.2


Response 1 used with dimension attribute change:

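Step 1 (initial state)

Sales fact table:
Bran-ID … Quantity …
001 2000

Branch office dimension:
Bran-ID … Br-Name …
001 Centre

Step 2 (the Br-Name is overwritten)

Sales fact table:
Bran-ID … Quantity …
001 2000

Branch office dimension:
Bran-ID … Br-Name …
001 West

Step 3 (a new sale is recorded)

Sales fact table:
Bran-ID … Quantity …
001 2000
001 3500

Branch office dimension:
Bran-ID … Br-Name …
001 West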
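The overwrite on this slide can be traced in a few lines of Python. This is only a sketch: the quantities and names come from the slide, while the dictionary layout and the helper name `quantity_by_name` are assumptions made for illustration.

```python
from collections import defaultdict

def quantity_by_name(facts, dim):
    """Sum Quantity per Br-Name, looking each name up in the current dimension."""
    totals = defaultdict(int)
    for bran_id, quantity in facts:
        totals[dim[bran_id]["Br-Name"]] += quantity
    return dict(totals)

# Values taken from the slide: one branch office, two sales.
sales_facts = [("001", 2000)]
branch_dim = {"001": {"Br-Name": "Centre"}}
before = quantity_by_name(sales_facts, branch_dim)

# Response 1: overwrite the old name in place, then record a new sale.
branch_dim["001"]["Br-Name"] = "West"
sales_facts.append(("001", 3500))
after = quantity_by_name(sales_facts, branch_dim)

print(before)  # {'Centre': 2000}
print(after)   # {'West': 5500}
```

After the overwrite, the 2000 units sold under the name Centre are reported under West, which is the loss of history that response 1 accepts.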


In response 2 you create a new version of the changed record:

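Step 1 (initial state)

Sales fact table:
Bran-ID … Quantity …
001 2000

Branch office dimension:
Bran-ID … Bran-Size …
001 250

Step 2 (a new version of the dimension record is created)

Sales fact table:
Bran-ID … Quantity …
001 2000

Branch office dimension:
Bran-ID … Bran-Size …
001 250
002 450

Step 3 (a new sale is recorded)

Sales fact table:
Bran-ID … Quantity …
001 2000
002 3500

Branch office dimension:
Bran-ID … Bran-Size …
001 250
002 450

How is it possible to aggregate to the physical Branch office level?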
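One possible answer to the question of the physical level, sketched in Python under an assumption that is not on the slide: every version record carries a hypothetical extra attribute, here called `Physical-ID`, that is shared by all versions of the same branch office. The sizes and quantities are the ones from the slide.

```python
from collections import defaultdict

# Each Bran-ID identifies one version of a branch office record.
# "Physical-ID" is a hypothetical attribute shared by all versions.
branch_dim = {
    "001": {"Physical-ID": "B1", "Bran-Size": 250},
    "002": {"Physical-ID": "B1", "Bran-Size": 450},
}
sales_facts = [("001", 2000), ("002", 3500)]

def aggregate(facts, dim, attribute):
    """Sum Quantity per value of the chosen dimension attribute."""
    totals = defaultdict(int)
    for bran_id, quantity in facts:
        totals[dim[bran_id][attribute]] += quantity
    return dict(totals)

by_size = aggregate(sales_facts, branch_dim, "Bran-Size")
by_branch = aggregate(sales_facts, branch_dim, "Physical-ID")

print(by_size)    # {250: 2000, 450: 3500}
print(by_branch)  # {'B1': 5500}
```

The size aggregation keeps each sale under the size that was valid when it was made, while the shared attribute lets the two versions be summed as one physical office.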


Exercise in SCD:

Suppose the attributes Branch-name and Branch-size use response types 1 and 2, respectively, and are changed at the same time.

In this situation, how is it possible to avoid preserving the historic Branch-name information, given that preserving it gives wrong name level aggregations?

Bank accounts (Fact table)
- Account#
- Interest-last-year
- Cost-last-year
- Branch#

Branch-offices (Dimension)
- Branch#
- Branch-name
- Branch-size


Exercise: Which SCD responses would you recommend for the data warehouses designed in the car rental case of slideshow 1?

Customers

Car types

Reservations

Orders

Branch offices

Cars

Garages

Garage services

Pick up

Contracts

Car return


Kimball’s 3 responses to slowly changing dimensions:

1. Overwrite the old value.

2. Create a new dimension record with the new value.

3. Create an extra attribute for the changed dimension value.


Kimball’s type 3 response: Create an extra attribute for the changed dimension relationship.

Suppose the product group of a product may be changed.

Does this solution make meaningful aggregations to the two group levels?


In response 3, you create a new version attribute:

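Step 1 (initial state)

Order-line fact table:
Bran-ID … Quantity …
001 2000

Branch office dimension:
Bran-ID … Old-Size New-Size …
001 250 250

Step 2 (the size is changed)

Order-line fact table:
Bran-ID … Quantity …
001 2000

Branch office dimension:
Bran-ID … Old-Size New-Size …
001 250 450

Step 3 (a new order line is recorded)

Order-line fact table:
Bran-ID … Quantity …
001 2000
001 3500

Branch office dimension:
Bran-ID … Old-Size New-Size …
001 250 450

Does this solution make meaningful aggregations to the two Size levels?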
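As a hint for the question, the final state of this slide can be aggregated over each of the two size attributes. The sketch below uses the slide's values; the helper name `quantity_by` and the dictionary layout are assumptions for illustration only.

```python
from collections import defaultdict

# Final state of the slide: one dimension record with two size attributes.
branch_dim = {"001": {"Old-Size": 250, "New-Size": 450}}
# The first order line was recorded before the size change, the second after it.
order_lines = [("001", 2000), ("001", 3500)]

def quantity_by(facts, dim, attribute):
    """Sum Quantity per value of the chosen size attribute."""
    totals = defaultdict(int)
    for bran_id, quantity in facts:
        totals[dim[bran_id][attribute]] += quantity
    return dict(totals)

by_old_size = quantity_by(order_lines, branch_dim, "Old-Size")
by_new_size = quantity_by(order_lines, branch_dim, "New-Size")

print(by_old_size)  # {250: 5500}
print(by_new_size)  # {450: 5500}
```

Both attributes receive all 5500 units, so neither aggregation separates the 2000 units sold at size 250 from the 3500 units sold at size 450.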


Response 3 should only be used for a new grouping criterion:

Step 1 (initial state)

Order-line fact table:
Prod-ID … Quantity …
001 2000

Product dimension:
Prod-ID … Old-group New-group …
001 A

Step 2 (a new group is added)

Order-line fact table:
Prod-ID … Quantity …
001 2000

Product dimension:
Prod-ID … Old-group New-group …
001 A B

Step 3 (a new order line is recorded)

Order-line fact table:
Prod-ID … Quantity …
001 2000
001 3500

Product dimension:
Prod-ID … Old-group New-group …
001 A B

What is the difference between the Grouping update and the previous Branch-size update, given that the Grouping aggregations function well while the Branch-size aggregations do not give any meaning?


Suppose the product group of a product may be changed.

Product dimension
- Product-ID
- Group-ID
- Product-name

Orderdetail fact
- Order-ID
- Product-ID
- Qty
- Price

Productgroup dimension
- Group-ID
- Group-name

How would you implement SCD response 2 in this example?

Will SCD response 2 make meaningful aggregations if you want to compare product group sales over time?

Will SCD response 3 make meaningful aggregations?


Exercise in when to preserve historic information.

Product dimension
- Product-ID
- Group-ID
- Product-name

Orderdetail fact
- Order-ID
- Product-ID
- Qty
- Price

Productgroup dimension
- Group-ID
- Group-name

Replace the Product dimension with a Branch office dimension and the Productgroup dimension with a Branch-Size dimension in this example!

Will SCD response 2 make meaningful aggregations if you want to compare the sales of the Branch-Size over time?

Will SCD response 3 make meaningful aggregations?

Notice! It may be both attribute and business dependent whether you want to preserve historic information or not.


Suppose the product group of a product may be updated.

Product dimension
- Product-ID
- Product-name
- Group-ID
- Group-name

Orderdetail fact
- Order-ID
- Product-ID
- Qty
- Price

Will response type 1 give correct aggregations to the group level if you want to compare product group sales over time?


Suppose the product group of a product may be changed.

Product dimension
- Product-ID
- Product-name

Orderdetail fact
- Order-ID
- Product-ID
- Group-ID
- Qty
- Price

Productgroup dimension
- Group-ID
- Group-name

Will this solution give correct aggregations to the group level if you want to compare product group sales over time?


SCD Type 4 may be used in dynamic dimension hierarchies:

Order Dimension
- Order-ID
- Ordertype
. . .

Orderdetails Fact
- Product-ID
- Order-ID
- Date-ID
- Salesman-ID
- Qty
- Price

Time Dimension
- Date-ID
- Date
- Month
- Year
- Holiday indication

Product Dimension
- Product-ID
- Product-name
- Product-group-name

Dimension Hierarchy:

Salesman
- Salesman-ID
- Salesman-name
- Salary-group-ID

Salary-Group
- Salary-group-ID
- Salary-name
- Salary
. . .

Figure 2.1

Suppose both salary group and product group are dynamic. Does this cause SCD problems?


The Type 4 Response: Dynamic relationships in a dimension hierarchy may be related directly to the fact table

Order Dimension
- Order-ID
- Ordertype
. . .

Orderdetails Fact
- Product-ID
- Order-ID
- Date-ID
- Salesman-ID
- Salary-group-ID
- Product-group-ID
- Qty
- Price

Time Dimension
- Date-ID
- Date
- Month
- Year
- Holiday indication

Product Dimension
- Product-ID
- Product-name

Salesman Dimension
- Salesman-ID
- Salesman-name
. . .

Salary-Group Dimension
- Salary-group-ID
- Salary-name
- Salary

Product-group Dimension
- Product-group-ID
- Product-group-name

Figure 3.1
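A small Python sketch of why the extra foreign key helps. The salesman, the salary group names and the quantities are hypothetical; only the idea of storing Salary-group-ID in each Orderdetails fact row comes from the slide.

```python
from collections import defaultdict

# Salary-Group dimension (hypothetical content).
salary_group_dim = {"G1": "Junior", "G2": "Senior"}

# Orderdetails fact rows: (Salesman-ID, Salary-group-ID, Qty).
# The Salary-group-ID is the group the salesman belonged to at the time of sale.
orderdetails = [
    ("S1", "G1", 10),  # sold while S1 was in group G1
    ("S1", "G2", 4),   # sold after S1 moved to group G2
]

def qty_by_salary_group(facts, group_dim):
    """Sum Qty per Salary-name using the foreign key stored in the fact row."""
    totals = defaultdict(int)
    for _salesman_id, group_id, qty in facts:
        totals[group_dim[group_id]] += qty
    return dict(totals)

totals = qty_by_salary_group(orderdetails, salary_group_dim)
print(totals)  # {'Junior': 10, 'Senior': 4}
```

Because the relationship is read from the fact row and not from the Salesman dimension, a later change of salary group does not move old sales to the new group.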


SCD Type 5 stores dynamic attributes in the fact table:

Orderdetails (Fact table)
- Product#
- Order#
- Qty
- Date#
- Salesman#

Orders (Dimension)
- Order#
- Ordertype

Time (Dimension)
- Date#
- Date-Name

Products (Dimension)
- Product#
- Product-name
- Price

Salesmen (Dimension)
- Salesman#
- Salesman-name
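To illustrate the idea, the sketch below assumes that Price is the dynamic attribute and copies it into each Orderdetails row at the time of sale. That choice, the product name and all numbers are hypothetical and are not stated on the slide.

```python
# Products dimension after a price update from 10.0 to 12.0 (hypothetical values).
products = {"P1": {"Product-name": "Widget", "Price": 12.0}}

# Orderdetails fact rows with the price copied in at the time of sale.
orderdetails = [
    {"Product#": "P1", "Qty": 3, "Price": 10.0},  # sold before the update
    {"Product#": "P1", "Qty": 2, "Price": 12.0},  # sold after the update
]

# Revenue from the price stored as fact data (history preserved).
revenue_from_facts = sum(row["Qty"] * row["Price"] for row in orderdetails)

# Revenue from the overwritten dimension price (history lost).
revenue_from_dimension = sum(
    row["Qty"] * products[row["Product#"]]["Price"] for row in orderdetails
)

print(revenue_from_facts)      # 54.0
print(revenue_from_dimension)  # 60.0
```

The difference between the two totals is the error introduced when the old sale is valued at the new price.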


SCD Type 6 Response:

Use fine granularity:

Bank account Fact
- Account-ID
- Time-ID
- Branch-ID
- Interest-last-month
- Cost-last-month

Branch-office Dimension
- Branch-ID
- Branchname

Time Dimension
- Time-ID
- Monthname

Figure 3.2
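A rough Python sketch of the effect of granularity, under the assumption that an account changes its Branch-ID in the middle of a month. The account, the dates and the interest amounts are hypothetical.

```python
from collections import defaultdict

def interest_by_branch(facts):
    """Sum interest per Branch-ID; each row is (Account-ID, period, Branch-ID, interest)."""
    totals = defaultdict(float)
    for _account_id, _period, branch_id, interest in facts:
        totals[branch_id] += interest
    return dict(totals)

# Fine granularity: one fact row per day, each with the branch valid on that day.
daily_facts = [
    ("A1", "2015-01-10", "B1", 5.0),
    ("A1", "2015-01-20", "B2", 7.0),
]

# Coarse granularity: one fact row per month, carrying only the latest Branch-ID.
monthly_facts = [("A1", "2015-01", "B2", 12.0)]

fine = interest_by_branch(daily_facts)
coarse = interest_by_branch(monthly_facts)

print(fine)    # {'B1': 5.0, 'B2': 7.0}
print(coarse)  # {'B2': 12.0}
```

With daily rows the interest earned before the move stays with branch B1, while the monthly row assigns the whole month to B2. The price is more fact rows, which matches the performance and storage remarks for response 6.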


The Type 7 Response: Store the Dynamic Dimension Data as Static Facts in another Mart.

Example: Let us suppose a fact table stores the sale of products in a department store. In this example, the department records may have an attribute with the number of salesmen as well as an attribute with the monthly costs of the departments. These attributes are dynamic!

Which response type would you recommend?

Time sheets per day per salesman per department

Orderdetails (Fact table)
- Product#
- Order#
- Qty

Salesmen
- Salesman#
- Salesman-name

Products
- Product#
- Product-name
- Price
- Group#

Product groups
- Group#
- Group-name
- Department#

Departments
- Department#
- Department name
- No. of employees
- Department costs
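A minimal sketch of a drill-across query between two marts, assuming the dynamic department attributes are stored once per month as facts in a second fact table. The month keys, the amounts and the variable names are hypothetical.

```python
# Sales mart, already aggregated to (Department#, month). Hypothetical amounts.
sales_by_dept_month = {
    ("D1", "2015-01"): 1000.0,
    ("D1", "2015-02"): 1200.0,
}

# Second mart: the dynamic department attributes stored as static monthly facts.
department_facts = {
    ("D1", "2015-01"): {"employees": 4, "costs": 500.0},
    ("D1", "2015-02"): {"employees": 6, "costs": 650.0},
}

# Drill across: combine the two fact tables on their shared (Department#, month) key.
sales_per_employee = {
    key: sales / department_facts[key]["employees"]
    for key, sales in sales_by_dept_month.items()
    if key in department_facts
}

print(sales_per_employee)
# {('D1', '2015-01'): 250.0, ('D1', '2015-02'): 200.0}
```

Each month is compared with the number of employees that was valid in that month, so a later change in staffing does not rewrite earlier months.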


Exercise: Select responses to SCD for the Airline DW.

Flight routes

Subroutes

Departures

Airports

Tickets

Travel arrangement

Customers

Airline companies


Exercise: Select responses to SCD for the Hotel DW.

Hotels

Rooms

Room reservations

Services/tours/car rentals

Check-in periods

Customers

Customer groups

Hotel chains


Exercise. Select responses to SCD for the travel agency.

Customers

Reservations

Orders

Departures/Hotel rooms/Car rentals/etc.

Flight routes/Room types/Car types/service types

Buyer

Bookings

Traveler

Product owners


Exercise: Design a data warehouse for a promotion company.

Customers

Presentation blocks/types

Order lines

Orders

Logical promotions

Physical promotions

Promotion media

How is it possible to measure the results of promotions and where should these measures be stored in the data warehouse?


Exercise:

Design a DW for a commercial TV channel


HRM exercise:

Write some requirements for an HRM system and try to group them into OLTP and OLAP requirements.

Make an ER diagram for an OLTP database and one or more OLAP data marts that can fulfill the requirements.


Design a data warehouse for a bank:

It should be possible to analyze both costs and revenue for customers, households, branch offices, regions, account managers etc.


Exercise:

Design a data warehouse for a housing association that lets out flats, shops and office areas.

It is possible to sign up on waiting lists for these.


Exercise:

Design a data warehouse for DSB in order to reduce train delays.


Exercise:

Design a data warehouse for stock exchange dealers in a bank.


Kimball’s type 2 response:

Suppose an account shifts Branch relationship in the middle of the month. Will the aggregations be correct, and how will you solve possible problems?

Bank account Fact
- Account-ID
- Date-ID
- Branch-ID
- Interest-last-month
- Cost-last-month

Branch-office Dimension
- Branch-ID
- Branchname

Time Dimension
- Date-ID
- Monthname

Figure 3.2

Can you find more solutions?


Kimball’s type 2 response:

Suppose both the Branch relationship and the Branch-size are dynamic.

How can aggregations be correct?

Bank account Fact
- Account-ID
- Date-ID
- Branch-ID
- Interest-last-month
- Cost-last-month

Branch-office Dimension
- Branch-ID
- Branchname

Time Dimension
- Date-ID
- Monthname


End of session

Thank you!!!