Agenda
• Common terms used in data warehousing software and what they mean
• Difference between a database and a data warehouse, and how each is optimised
• What is a cube and what are dimensions?
• High level overview of Performance Point
• Difference between a score card and a dashboard
• How do the data warehouse, cube and Performance Point relate to one another?
• At which point, and how, should calculated fields be added?
• The purpose and definition of Fact Tables, Dimension Tables etc.
• Quantifiable benefits organisations achieve through data warehousing
• This structure can hold a certain number of data elements.
• The number of elements is the number of separate labels on each axis multiplied together, i.e. this structure can hold 4 x 3 x 4 = 48 data elements.
• Which makes it look a lot like a cube…
• That’s as far as the cube analogy can go, because a real data warehouse will have many different sets of independent labels. They are called Dimensions.
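The 4 x 3 x 4 arithmetic can be checked with a short sketch. The slide gives only the counts, so which axis carries which count, and every label below, is an assumption made for illustration.

```python
from itertools import product

# Hypothetical labels: the slide states only the counts 4, 3 and 4.
quarters = ["Q1", "Q2", "Q3", "Q4"]
products = ["Left Handed Widget", "Right Handed Widget", "Ambidextrous Widget"]
regions = ["North", "South", "East", "West"]

# Every cell of the "cube" is one combination of one label per axis.
cells = list(product(quarters, products, regions))
cell_count = len(cells)

print(cell_count)  # 4 * 3 * 4
```

Adding a fourth set of labels would multiply the count again, which is why the picture stops being a literal cube once there are more than three dimensions.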
Dimension Tables
• Dimension Tables contain the names of each member of the dimension:

Product_ID  Product_Name         Category
101         Left Handed Widget   Retail
102         Right Handed Widget  Retail
103         Ambidextrous Widget  Specialist

Primary Key: Product_ID
Fact Table

Region_ID  Product_ID  Quarter  Units  Price
1          101         1        300    45.20
1          101         2        330    45.20
1          101         3        355    45.20
1          101         4        461    44.00
1          102         1        200    39.00
1          102         2        235    39.00
1          102         3        260    38.50
1          102         4        261    38.50
Fact Table & Dimension Table Relationship

Fact Table:

Region_ID  Product_ID  Quarter  Units  Price
1          101         1        300    45.20
1          101         2        330    45.20
1          101         3        355    45.20
1          101         4        461    44.00
1          102         1        200    39.00
1          102         2        235    39.00
1          102         3        260    38.50
1          102         4        261    38.50

Dimension Table:

Product_ID  Product_Name
101         Left Handed Widget
102         Right Handed Widget
103         Ambidextrous Widget

One-to-Many Relationship: each Product_ID appears once in the Dimension Table and many times in the Fact Table.
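As a rough illustration of the one-to-many relationship, the sketch below loads the rows from the slides into an in-memory SQLite database and joins them on Product_ID. The table names `dim_product` and `fact_sales` are assumptions; the slides do not name the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table names are hypothetical; the columns follow the slides.
conn.execute(
    "CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT)"
)
conn.execute(
    "CREATE TABLE fact_sales ("
    " region_id INTEGER,"
    " product_id INTEGER REFERENCES dim_product(product_id),"
    " quarter INTEGER, units INTEGER, price REAL)"
)

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [
        (101, "Left Handed Widget"),
        (102, "Right Handed Widget"),
        (103, "Ambidextrous Widget"),
    ],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, 1, 300, 45.20), (1, 101, 2, 330, 45.20),
        (1, 101, 3, 355, 45.20), (1, 101, 4, 461, 44.00),
        (1, 102, 1, 200, 39.00), (1, 102, 2, 235, 39.00),
        (1, 102, 3, 260, 38.50), (1, 102, 4, 261, 38.50),
    ],
)

# One dimension row matches many fact rows, so the join lets the
# facts be grouped under the product's name.
units_by_product = conn.execute(
    """
    SELECT d.product_name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.product_name
    ORDER BY d.product_name
    """
).fetchall()

print(units_by_product)
```

Product 103 has no rows in the sample fact table, so the inner join leaves it out of the result.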
Common terms used in data warehousing and what they mean - 1
• Normalised Data Structure
  – Structure designed for handling live transactions
• Dimensional Data Structure
  – AKA Denormalised Data Structure
  – Structure designed for querying
• Operational Data Store
  – Often a copy of a transactional database
  – Updated regularly from transactional systems
  – May be used for reporting
Common terms used in data warehousing and what they mean - 2
• Dimensional Modelling
  – Fact Table or Measure Table
    • Holds historical records of events that occurred in a transactional system
  – Conformed Facts
    • Facts from multiple fact tables are conformed when the technical definitions of the facts are equivalent. Conformed facts can have the same name in different tables and can be combined and compared mathematically
  – Dimension Table
    • Has a number of Attributes, e.g. Product Name, Category, Colour, etc.
    • Used to slice and dice the data in the Fact Table
  – Attribute
    • Property of a Dimension
  – Conformed Dimension
    • Dimensions are conformed when they are exactly the same (including the keys) or one is a perfect subset of the other
    • The row headers produced in answer sets from two different conformed dimensions must be able to be matched perfectly
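The "exactly the same or a perfect subset" rule for conformed dimensions can be sketched as a simple check. This is a minimal illustration that only looks at row subsets; the function name `is_conformed` and the dictionary layout (key mapped to attribute values) are assumptions, not part of the slides.

```python
def is_conformed(dim_a, dim_b):
    """Return True if the two dimensions are identical, or if every row of
    the smaller one appears in the larger one with the same key and the
    same attribute values."""
    smaller, larger = sorted((dim_a, dim_b), key=len)
    return all(
        key in larger and larger[key] == attributes
        for key, attributes in smaller.items()
    )


# Hypothetical layout: key -> (Product_Name, Category).
product_dim = {
    101: ("Left Handed Widget", "Retail"),
    102: ("Right Handed Widget", "Retail"),
    103: ("Ambidextrous Widget", "Specialist"),
}

# A retail-only copy with matching keys and attributes is a perfect subset.
retail_dim = {key: row for key, row in product_dim.items() if row[1] == "Retail"}

# The same key with a different name breaks conformance.
mismatched_dim = {101: ("LH Widget", "Retail")}

print(is_conformed(product_dim, retail_dim))
print(is_conformed(product_dim, mismatched_dim))
```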
Conformed Dimensions - Example

                      Common Dimensions
Business Processes    Date  Product  Store  Promotion  Warehouse  Vendor  Contract  Shipper
Retail Sales          x     x        x      x
Retail Inventory      x     x        x
Retail Deliveries     x     x        x
Warehouse Inventory   x     x                          x          x
Warehouse Deliveries  x     x                          x          x
Purchase Orders       x     x                          x          x       x         x
Facts and Dimensions - Example
Common terms used in data warehousing and what they mean - 3
• Slowly Changing Dimension (SCD)
  – A Dimension where the rows change slowly over time. An example would be a product Dimension where the Price attribute changes from year to year as a result of marketing/profitability issues.
• Type 1 SCD
  – Values are overwritten when they change
• Type 2 SCD
  – A new row is written when the value of an attribute changes
• Type 3 SCD
  – The previous value is put into an “Old Value” column
• Data Mart
  – A logical and physical subset of the data warehouse’s presentation area
  – Data Marts can be tied together using Drill-Across queries when their dimensions are conformed
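The three SCD types can be compared side by side with a small sketch that applies the same price change (45.20 to 47.90 for product 101) in each style. The column names `row_key`, `is_current` and `old_price` are assumptions chosen for illustration; real warehouses often use effective-date columns as well.

```python
def base_rows():
    """Fresh copy of a one-row product dimension (hypothetical layout)."""
    return [{"row_key": 1, "product_id": 101,
             "name": "Left Handed Widget", "price": 45.20}]


def apply_type1(rows, product_id, new_price):
    """Type 1: overwrite the value, so the old price is lost."""
    for row in rows:
        if row["product_id"] == product_id:
            row["price"] = new_price


def apply_type2(rows, product_id, new_price):
    """Type 2: keep the old row and write a new row with the new value."""
    current = next(
        row for row in rows
        if row["product_id"] == product_id and row.get("is_current", True)
    )
    current["is_current"] = False
    rows.append({
        **current,
        "row_key": max(row["row_key"] for row in rows) + 1,
        "price": new_price,
        "is_current": True,
    })


def apply_type3(rows, product_id, new_price):
    """Type 3: move the previous value into an "old value" column."""
    for row in rows:
        if row["product_id"] == product_id:
            row["old_price"] = row["price"]
            row["price"] = new_price


type1_rows, type2_rows, type3_rows = base_rows(), base_rows(), base_rows()
apply_type1(type1_rows, 101, 47.90)
apply_type2(type2_rows, 101, 47.90)
apply_type3(type3_rows, 101, 47.90)

for label, rows in (("Type 1", type1_rows), ("Type 2", type2_rows),
                    ("Type 3", type3_rows)):
    print(label, rows)
```

Only Type 2 keeps the full history, at the cost of more than one row per product; that is one reason a key other than the product ID is needed on the dimension.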
Common terms used in data warehousing and what they mean - 4
• Primary Key
  – Unique Identifier for a record
• Foreign Key
  – A value in a record that refers to a Primary Key in another table
• Surrogate Key
  – AKA Meaningless key, integer key, nonnatural key, artificial key, synthetic key
  – A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
    • E.g. Two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys.
• Grain
  – The meaning of a single row in a table. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. In this case you would not be able to look at sales by the hour, nor could you look at individual sales.
• Granularity
  – The level of detail captured in a data warehouse.
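To make the grain example concrete, the sketch below rolls a few individual sales up to a "by Day by Product by Store" fact table. The transactions, dates and IDs are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical source transactions: (timestamp, product_id, store_id, units).
transactions = [
    ("2024-01-05 09:15", 101, 1, 2),
    ("2024-01-05 14:40", 101, 1, 3),
    ("2024-01-05 10:05", 102, 1, 1),
    ("2024-01-06 11:30", 101, 1, 4),
]

# Grain: one row per day, product and store.
sales_fact = defaultdict(int)
for timestamp, product_id, store_id, units in transactions:
    day = timestamp[:10]  # drop the time of day; it is below the grain
    sales_fact[(day, product_id, store_id)] += units

for key, units in sorted(sales_fact.items()):
    print(key, units)
```

The two sales of product 101 on the first day collapse into a single row, so neither the hour nor the individual sale can be recovered from the fact table afterwards.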
Surrogate Key
• Surrogate Key (AKA Meaningless key, integer key, nonnatural key, artificial key, synthetic key)
  – Data Warehouses integrate data from multiple sources, so they can’t rely upon an application key in one table being different from an application key in another table in another database.
  – A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
  – Surrogate keys can be integers even if the application key isn’t
    • This saves space
    • e.g. Two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys.
    • e.g. Data changes over time. As an example, if the price of Left Handed Widgets is increased from 45.20 to 47.90, we need to keep the old data and add new data. Therefore we need a key that doesn’t depend solely upon the product ID.
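The two-source Customer example can be sketched as follows. The source names, customer names and column names are all hypothetical; the point is only that each incoming record receives a new integer key that is independent of its original CustomerID.

```python
from itertools import count

# Two hypothetical source systems that reuse the same CustomerID values.
source_a = [(1, "Alice Smith"), (2, "Bob Jones")]
source_b = [(1, "Carol White"), (2, "Dave Brown")]

next_key = count(1)   # simple integer surrogate key generator
customer_dim = []     # rows of the warehouse's customer dimension
key_lookup = {}       # (source, CustomerID) -> surrogate key

for source_name, rows in (("A", source_a), ("B", source_b)):
    for customer_id, name in rows:
        surrogate = next(next_key)
        key_lookup[(source_name, customer_id)] = surrogate
        customer_dim.append({
            "customer_key": surrogate,        # new primary key
            "source": source_name,
            "source_customer_id": customer_id,
            "name": name,
        })

for row in customer_dim:
    print(row)
```

The lookup from (source, CustomerID) to surrogate key is what a load process would use to translate source keys into fact table foreign keys.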