
ISQS 6339, Business Intelligence Dimensional Modeling

Feb 09, 2016


Ricky Lien

ISQS 6339, Business Intelligence: Dimensional Modeling. Zhangxi Lin, Texas Tech University. Outline: Principles of Dimensional Modeling; Data Warehousing Methodology; Three Phases of Dimensional Modeling.
Page 1: ISQS 6339, Business Intelligence Dimensional Modeling

ISQS 6339, Business Intelligence
Dimensional Modeling

Zhangxi Lin
Texas Tech University

Page 2: ISQS 6339, Business Intelligence Dimensional Modeling

Outline

Principles of Dimensional Modeling
Data Warehousing Methodology
Three Phases of Dimensional Modeling

Page 3: ISQS 6339, Business Intelligence Dimensional Modeling

PRINCIPLES OF DIMENSIONAL MODELING

Page 4: ISQS 6339, Business Intelligence Dimensional Modeling

Dimensional Model

Also called a star schema (a snowflake schema is an acceptable variant)
◦ The fact table sits in the middle, with the dimensions serving as the points of the star.
◦ A normalized fact table plus denormalized dimension tables

Reference: database normalization
◦ Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970. Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.
◦ Informally, a relational database table is often described as "normalized" if it is in the Third Normal Form. Most 3NF tables are free of insertion, update, and deletion anomalies.

Page 5: ISQS 6339, Business Intelligence Dimensional Modeling

Star Schema Model

Central fact table
◦ Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...

Denormalized dimensions
◦ Product Table: Product_id, Product_desc, ...
◦ Time Table: Day_id, Month_id, Year_id, ...
◦ Item Table: Item_id, Item_desc, ...
◦ Store Table: Store_id, District_id, ...
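
A minimal sketch of the star schema on this slide, using Python's built-in sqlite3 module. The column names follow the slide; the `_dim` table-name suffix, the omission of the Item table, and all sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimension tables (one table per dimension)
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, district_id INTEGER);
    CREATE TABLE time_dim    (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

    -- Central fact table: one foreign key per dimension plus numeric measures
    CREATE TABLE sales_fact (
        product_id   INTEGER REFERENCES product_dim(product_id),
        store_id     INTEGER REFERENCES store_dim(store_id),
        day_id       INTEGER REFERENCES time_dim(day_id),
        sales_amount REAL,
        sales_units  INTEGER
    );

    -- Hypothetical sample data
    INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO store_dim   VALUES (10, 100), (11, 100), (12, 200);
    INSERT INTO time_dim    VALUES (20160101, 1, 2016), (20160102, 1, 2016), (20160201, 2, 2016);
    INSERT INTO sales_fact  VALUES
        (1, 10, 20160101, 50.0, 5),
        (2, 11, 20160102, 30.0, 3),
        (1, 12, 20160101, 20.0, 2),
        (2, 10, 20160201, 40.0, 4);
""")

# A typical star query: join the fact table to the dimensions it needs,
# then group by dimension attributes and sum an additive measure.
rows = conn.execute("""
    SELECT s.district_id, t.month_id, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_id = f.store_id
    JOIN time_dim  AS t ON t.day_id = f.day_id
    GROUP BY s.district_id, t.month_id
    ORDER BY s.district_id, t.month_id
""").fetchall()
print(rows)
```

Each dimension is exactly one join away from the fact table, which is what gives the schema its star shape.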

Page 6: ISQS 6339, Business Intelligence Dimensional Modeling

Snowflake Schema Model

◦ Sales Fact Table: Item_id, Store_id, Product_id, Week_id, Sales_amount, Sales_units
◦ Time Table: Week_id, Period_id, Year_id
◦ Product Table: Product_id, Product_desc
◦ Item Table: Item_id, Item_desc, Dept_id
◦ Dept Table: Dept_id, Dept_desc, Mgr_id
◦ Mgr Table: Dept_id, Mgr_id, Mgr_name
◦ Store Table: Store_id, Store_desc, District_id
◦ District Table: District_id, District_desc

Page 7: ISQS 6339, Business Intelligence Dimensional Modeling

Snowflake Schema Model

◦ Direct use by some tools
◦ More flexible to change
◦ Provides for speedier data loading
◦ Can become large and unmanageable
◦ Degrades query performance
◦ More complex metadata

Example of a normalized hierarchy: Country, State, County, City
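
To make the query cost of snowflaking concrete, here is a small sketch with Python's sqlite3 in which the Item dimension is normalized into Item, Dept, and Mgr tables, loosely following the snowflake diagram. The simplified columns and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (snowflaked) dimension: item -> dept -> mgr
    CREATE TABLE mgr  (mgr_id INTEGER PRIMARY KEY, mgr_name TEXT);
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_desc TEXT,
                       mgr_id INTEGER REFERENCES mgr(mgr_id));
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_desc TEXT,
                       dept_id INTEGER REFERENCES dept(dept_id));
    CREATE TABLE sales_fact (item_id INTEGER REFERENCES item(item_id),
                             sales_amount REAL);

    -- Hypothetical sample data
    INSERT INTO mgr  VALUES (1, 'Ana'), (2, 'Bo');
    INSERT INTO dept VALUES (10, 'Toys', 1), (20, 'Tools', 2);
    INSERT INTO item VALUES (100, 'Ball', 10), (101, 'Kite', 10), (200, 'Saw', 20);
    INSERT INTO sales_fact VALUES (100, 5.0), (101, 7.0), (200, 11.0), (100, 3.0);
""")

# Sales by manager needs three joins here. In a star schema the manager
# name would be a column of a single denormalized item dimension (one join).
sales_by_manager = conn.execute("""
    SELECT m.mgr_name, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN item AS i ON i.item_id = f.item_id
    JOIN dept AS d ON d.dept_id = i.dept_id
    JOIN mgr  AS m ON m.mgr_id = d.mgr_id
    GROUP BY m.mgr_name
    ORDER BY m.mgr_name
""").fetchall()
print(sales_by_manager)
```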

Page 8: ISQS 6339, Business Intelligence Dimensional Modeling

Facts

Definition
◦ Measure: a numeric quantity expressing some aspect of the organization’s performance
◦ Aggregate: formed by combining values from a given dimension or set of dimensions to create a single value

Facts are measurements associated with a specific business process.

Most facts are additive (calculative); others are semi-additive, non-additive, or descriptive (e.g. factless fact table).

Many facts can be derived from other facts, so a non-additive fact can often be avoided by calculating it from additive facts.
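
A short sketch of the last point, with made-up numbers: unit price is a non-additive fact (prices cannot be summed or simply averaged across rows), but it can be derived correctly from two additive facts, sales amount and sales units.

```python
# Hypothetical fact rows holding only additive measures
rows = [
    {"store": "A", "sales_amount": 100.0, "sales_units": 10},  # unit price 10.0
    {"store": "B", "sales_amount": 300.0, "sales_units": 60},  # unit price 5.0
]

# Additive facts can be summed across any dimension
total_amount = sum(r["sales_amount"] for r in rows)
total_units = sum(r["sales_units"] for r in rows)

# Derive the non-additive fact from the additive totals
overall_unit_price = total_amount / total_units

# Averaging the per-row prices ignores volume and gives a different answer
naive_average_price = sum(r["sales_amount"] / r["sales_units"] for r in rows) / len(rows)

print(round(overall_unit_price, 2), naive_average_price)
```

Storing the additive components in the fact table and computing the ratio at query time keeps the result correct at every level of aggregation.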

Page 9: ISQS 6339, Business Intelligence Dimensional Modeling

Fact Table Characteristics

◦ Contain numerical metrics of the business
◦ Can hold large volumes of data
◦ Can grow quickly
◦ Can contain base, derived, and summarized data
◦ Are typically additive
◦ Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Example: Sales Fact Table (Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...)

Page 10: ISQS 6339, Business Intelligence Dimensional Modeling

The Three Fact Table Types

Transaction fact table
◦ The most basic and fundamental
◦ "One row per line in a transaction", e.g., every line on a receipt
◦ Holds data at the most detailed level
◦ Has a great number of dimensions associated with it

Periodic snapshot fact table
◦ Takes a "picture of the moment"
◦ Cumulative performance over specific time intervals
◦ Dependent on the transactional table
◦ Valuable to combine data across several business processes in the value chain

Accumulating snapshot fact table
◦ Used to show the activity of a process that has a well-defined beginning and end
◦ Constantly updated over time

Page 11: ISQS 6339, Business Intelligence Dimensional Modeling

Types of facts

Week  Date  Trans#  Change  OldBal  NewBal
1     1     A1-1       100    1000    1100
1     2     A1-2       -50    1100    1050
1     4     A1-3       200    1050    1250
2     2     A2-1      -120    1250    1130
2     2     A2-2       200    1130    1330
3     1     A3-1      -300    1330    1030
4     2     A4-1       -20    1030    1010
4     3     A4-2       100    1010    1110
4     3     A4-3       250    1110    1360
4     5     A4-4      -220    1360    1140

Transaction fact: each row

Periodic snapshot fact: (OldBal, NewBal) on each transaction

Accumulating snapshot fact: The average numbers in a week, such as average balance, number of transactions, average amount of transactions, the total amount of trading in a given period.
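
As a rough illustration, the weekly figures mentioned above can be computed from the transaction rows like this. The data is the table from this slide; the choice of summary fields (`count`, `net_change`, `closing_balance`) is an assumption, not something the slide prescribes.

```python
# (week, date, trans_no, change, old_bal, new_bal), in chronological order
transactions = [
    (1, 1, "A1-1", 100, 1000, 1100),
    (1, 2, "A1-2", -50, 1100, 1050),
    (1, 4, "A1-3", 200, 1050, 1250),
    (2, 2, "A2-1", -120, 1250, 1130),
    (2, 2, "A2-2", 200, 1130, 1330),
    (3, 1, "A3-1", -300, 1330, 1030),
    (4, 2, "A4-1", -20, 1030, 1010),
    (4, 3, "A4-2", 100, 1010, 1110),
    (4, 3, "A4-3", 250, 1110, 1360),
    (4, 5, "A4-4", -220, 1360, 1140),
]

weekly = {}
for week, _date, _trans_no, change, _old_bal, new_bal in transactions:
    summary = weekly.setdefault(
        week, {"count": 0, "net_change": 0, "closing_balance": None}
    )
    summary["count"] += 1            # number of transactions in the week
    summary["net_change"] += change  # Change is additive: it can be summed
    # A balance is semi-additive: it must not be summed over time,
    # so keep the last balance seen in the week instead.
    summary["closing_balance"] = new_bal

for week in sorted(weekly):
    print(week, weekly[week])
```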

Page 12: ISQS 6339, Business Intelligence Dimensional Modeling

Dimensions

Definition: a categorization used to spread out an aggregate measure to reveal its constituent parts

The foundation of the dimensional model, describing the objects of the business

The nouns of the DW/BI system
◦ Business processes (facts) are the verbs of the business

Dimension tables link to all the business processes.

A dimension shared across all processes is called a conformed dimension.

Analysis involving data from more than one business process is called drill-across.
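
A minimal sketch of drill-across, assuming two hypothetical fact tables (sales and returns) that share a conformed product dimension. Each fact table is aggregated separately to the same dimension key, and only then are the results merged.

```python
from collections import Counter

# Hypothetical fact rows: (product_id, quantity). Both fact tables use the
# same product keys, which is what makes the product dimension conformed.
sales_fact = [("P1", 5), ("P2", 3), ("P1", 2)]
returns_fact = [("P1", 1), ("P3", 4)]


def total_by_product(fact_rows):
    """Aggregate one fact table to the grain of the shared dimension."""
    totals = Counter()
    for product_id, quantity in fact_rows:
        totals[product_id] += quantity
    return totals


sales = total_by_product(sales_fact)
returns = total_by_product(returns_fact)

# Drill-across: merge the two aggregated result sets on the conformed key
report = {
    product_id: {
        "units_sold": sales.get(product_id, 0),
        "units_returned": returns.get(product_id, 0),
    }
    for product_id in sorted(set(sales) | set(returns))
}
print(report)
```

Aggregating before merging avoids the row duplication that a direct join between two fact tables would cause.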

Page 13: ISQS 6339, Business Intelligence Dimensional Modeling

Attributes

An additional piece of information pertaining to a dimension member that is not the unique identifier or the description of the member.

Attributes can be used to more fully describe dimension members.

Page 14: ISQS 6339, Business Intelligence Dimensional Modeling

Dimension Table Characteristics

Dimension tables have the following characteristics:
◦ Contain textual information that represents the attributes of the business
◦ Contain relatively static data
◦ Are joined to a fact table through a foreign key reference

Page 15: ISQS 6339, Business Intelligence Dimensional Modeling

Star Dimensional Model Characteristics

◦ The model is easy for users to understand.
◦ Primary keys represent a dimension.
◦ Nonforeign key columns are values.
◦ Facts are usually highly normalized.
◦ Dimensions are completely denormalized.
◦ Fast response to queries is provided.
◦ Performance is improved by reducing table joins.
◦ End users can express complex queries.
◦ Support is provided by many front-end tools.

Page 16: ISQS 6339, Business Intelligence Dimensional Modeling

The Time Dimension

Time is critical to the data warehouse. A consistent representation of time is required for extensibility.

Where should the element of time be stored?

(Figure: Time dimension, Sales fact)

Page 17: ISQS 6339, Business Intelligence Dimensional Modeling

Hierarchies

Meaningful, standard ways to group the data within a dimension
◦ Variable-depth hierarchies
◦ Frequently changing hierarchies

Examples of hierarchy in a dimension
◦ Address: street, city, state, country
◦ Organization: section, division, branch, region
◦ Time: year, quarter, month, date
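
The time hierarchy above (year, quarter, month, date) can be sketched as a path of increasingly fine levels, with roll-up done by truncating that path. The sample sales figures and the helper names `time_path` and `roll_up` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: (date, amount)
daily_sales = [
    (date(2016, 1, 15), 100.0),
    (date(2016, 2, 9), 50.0),
    (date(2016, 4, 1), 25.0),
    (date(2015, 12, 31), 10.0),
]


def time_path(d):
    """Return the hierarchy path (year, quarter, month, day) for a date."""
    quarter = (d.month - 1) // 3 + 1
    return (d.year, quarter, d.month, d.day)


def roll_up(rows, depth):
    """Sum amounts at a hierarchy level: 1=year, 2=quarter, 3=month, 4=date."""
    totals = defaultdict(float)
    for d, amount in rows:
        totals[time_path(d)[:depth]] += amount
    return dict(totals)


by_year = roll_up(daily_sales, 1)
by_quarter = roll_up(daily_sales, 2)
print(by_year)
print(by_quarter)
```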

Page 18: ISQS 6339, Business Intelligence Dimensional Modeling

Data Cube

Data cubes are multidimensional extensions of 2-D tables, just as in geometry a cube is a three-dimensional extension of a square. The word cube brings to mind a 3-D object, and we can think of a 3-D data cube as being a set of similarly structured 2-D tables stacked on top of one another.

Data cubes aren't restricted to just three dimensions. Most OLAP systems can build data cubes with many more dimensions; some allow up to 64.

In practice, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
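
One simple way to picture a data cube in code is a mapping from coordinates (one member per dimension) to a measure. The sketch below assumes three dimensions (time, region, product) with made-up cell values; `roll_up` and `slice_cube` are hypothetical helper names, not the API of any OLAP product.

```python
from collections import defaultdict

DIMENSIONS = ("time", "region", "product")

# Each cell is addressed by one member from every dimension
cells = {
    ("Q1", "East", "PC"): 10,
    ("Q1", "West", "PC"): 4,
    ("Q2", "East", "PC"): 7,
    ("Q2", "East", "Server"): 2,
}


def roll_up(cube, keep):
    """Aggregate the cube down to the dimensions named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[tuple(coords[p] for p in positions)] += value
    return dict(totals)


def slice_cube(cube, dimension, member):
    """Keep only the cells whose coordinate on `dimension` equals `member`."""
    position = DIMENSIONS.index(dimension)
    return {c: v for c, v in cube.items() if c[position] == member}


by_region = roll_up(cells, ["region"])
by_time_product = roll_up(cells, ["time", "product"])
q1_only = slice_cube(cells, "time", "Q1")
print(by_region, by_time_product, q1_only)
```

Adding a fourth dimension only lengthens the coordinate tuples, which is why cubes are not limited to three dimensions.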

Page 19: ISQS 6339, Business Intelligence Dimensional Modeling

Data Cube

(Figure: a data cube with axes Time, Region, and Product)

Page 20: ISQS 6339, Business Intelligence Dimensional Modeling

OLAP system

OLAP: allows users to retrieve information from data quickly for analysis purposes

Features
◦ Multidimensional database
◦ Easily understood

◦ What is OLAP? (5'04")
◦ SQL OLAP Tutorial - Data Warehouse Schema Design (9'45")

Page 21: ISQS 6339, Business Intelligence Dimensional Modeling

Dimensional Modeling Process

High level dimensional model design
◦ Choosing business model in accordance with the analytic theme
◦ Declaring the grain
◦ Choosing dimensions
◦ Identifying the facts

Detailed dimensional model development

Dimensional model review and validation
◦ IS
◦ Core users
◦ Business community

Final design iteration

ISQS 6339, Data Mgmt & BI, Zhangxi Lin 21

Page 22: ISQS 6339, Business Intelligence Dimensional Modeling

DATA WAREHOUSING METHODOLOGY

Page 23: ISQS 6339, Business Intelligence Dimensional Modeling

Data Warehouse Development Approaches

Data warehouse development approaches
◦ Kimball Model: data mart approach (data marts first, then the EDW)
◦ Inmon Model: EDW approach (the EDW first, then data marts)

Which model is better?
◦ There is no one-size-fits-all strategy to data warehousing
◦ One alternative is the hosted warehouse

Page 24: ISQS 6339, Business Intelligence Dimensional Modeling

Comparison

Kimball Model
◦ Kimball’s model follows a bottom-up approach. The Data Warehouse (DW) is provisioned from Datamarts (DM) as and when they are available or required.
◦ The Datamarts are sourced from OLTP systems, which are usually relational databases in Third Normal Form (3NF).
◦ The Data Warehouse, which is central to the model, is a de-normalized star schema. The OLAP cubes are built on this DW.

Inmon Model
◦ Inmon’s model follows a top-down approach. The Data Warehouse (DW) is sourced from OLTP systems and is the central repository of data.
◦ The Data Warehouse in Inmon’s model is in Third Normal Form (3NF).
◦ The Datamarts (DM) are provisioned out of the Data Warehouse as and when required. Datamarts in Inmon’s model are in 3NF, and the OLAP cubes are built from them.

Page 25: ISQS 6339, Business Intelligence Dimensional Modeling
Page 26: ISQS 6339, Business Intelligence Dimensional Modeling

Strengths and Weaknesses

Scalable vs. structural
◦ Kimball’s model is more scalable because of its bottom-up approach: you can start small and scale up eventually. The ROI is usually faster with Kimball’s model. Because of this approach, however, it is difficult to create reusable structures and ETL for different data marts.
◦ Inmon’s model, on the other hand, is more structured and easier to maintain, but it is rigid and takes more time to build. Its significant advantage is that, because the DW is in 3NF, it is easier to build data mining models.

Both the Kimball and Inmon models agree on, and emphasize, that the DW is the central repository of data and that OLAP cubes are built from de-normalized star schemas.

In conclusion, when it comes to data modeling, it is irrelevant which camp you belong to as long as you understand why you are adopting a specific model. Sometimes it makes sense to take a hybrid approach.

Page 27: ISQS 6339, Business Intelligence Dimensional Modeling

General Data Warehouse Development Approaches

"Big bang" approach

Incremental approach:
◦ Top-down incremental approach
◦ Bottom-up incremental approach

Page 28: ISQS 6339, Business Intelligence Dimensional Modeling

"Big Bang" Approach

◦ Analyze enterprise requirements
◦ Build enterprise data warehouse
◦ Report in subsets or store in data marts

Page 29: ISQS 6339, Business Intelligence Dimensional Modeling

Incremental Approach to Warehouse Development

Multiple iterations
Shorter implementations
Validation of each phase

(Figure: iterative cycle for Increment 1: Strategy, Definition, Analysis, Design, Build, Production)

Page 30: ISQS 6339, Business Intelligence Dimensional Modeling

Top-Down Approach

Analyze requirements at the enterprise level
Develop conceptual information model
Identify and prioritize subject areas

Complete a model of selected subject area
Map to available data
Perform a source system analysis

Implement base technical architecture
Establish metadata, extraction, and load processes for the initial subject area

Create and populate the initial subject area data mart within the overall warehouse framework

Page 31: ISQS 6339, Business Intelligence Dimensional Modeling

Bottom-Up Approach

Define the scope and coverage of the data warehouse and analyze the source systems within this scope

Define the initial increment based on the political pressure, assumed business benefit and data volume

Implement base technical architecture and establish metadata, extraction, and load processes as required by increment

Create and populate the initial subject areas within the overall warehouse framework

Page 32: ISQS 6339, Business Intelligence Dimensional Modeling

THREE PHASES OF DATA WAREHOUSE DESIGN

Note: Data warehouse design involves many details that take a lot of effort to learn. Because of the limited time available for this part, only some of them are covered here.

Page 33: ISQS 6339, Business Intelligence Dimensional Modeling

Data Warehouse Database Design Phases

Phase 1: Defining the business model
Phase 2: Defining the dimensional model
Phase 3: Defining the physical model

Page 34: ISQS 6339, Business Intelligence Dimensional Modeling

Phase 1: Defining the Business Model

◦ Performing strategic analysis
◦ Defining the business analytic theme
◦ Creating the business model
◦ Documenting metadata

Page 35: ISQS 6339, Business Intelligence Dimensional Modeling

Performing Strategic Analysis

Identify crucial business processes
Understand business processes
Prioritize and select the business processes to implement

(Figure: prioritization grid with axes Business Benefit and Feasibility, each ranging from Low to High)

Page 36: ISQS 6339, Business Intelligence Dimensional Modeling

Creating the Business Model

Defining business requirements:
◦ Identifying the business measures
◦ Identifying the dimensions
◦ Identifying the grain
◦ Identifying the business definitions and rules

Verifying data sources

Page 37: ISQS 6339, Business Intelligence Dimensional Modeling

Business Requirements Drive the Design Process

◦ Primary input
◦ Secondary input

(Figure labels: Business Requirements, Existing Metadata, Production ERD Model, Research)

Page 38: ISQS 6339, Business Intelligence Dimensional Modeling

Identifying Measures and Dimensions

Measures: the attribute varies continuously
◦ Balance
◦ Units Sold
◦ Cost
◦ Sales

Dimensions: the attribute is perceived as constant or discrete
◦ Product
◦ Location
◦ Time
◦ Size

Page 39: ISQS 6339, Business Intelligence Dimensional Modeling

Using a Business Process Matrix

(Figure: sample business process matrix. Rows are business dimensions: Customer, Date, Product, Channel, Promotion. Columns are business processes: Sales, Returns, Inventory.)

Page 40: ISQS 6339, Business Intelligence Dimensional Modeling

Determining Granularity

YEAR? QUARTER? MONTH? WEEK? DAY?

Page 41: ISQS 6339, Business Intelligence Dimensional Modeling

Identifying Business Rules

Location: geographic proximity
◦ 0 - 1 miles
◦ 1 - 5 miles
◦ > 5 miles

Store: Store > District > Region

Product
◦ Type: PC, Server
◦ Monitor: 15 inch, 17 inch, 19 inch, None
◦ Status: New, Rebuilt, Custom

Time: Month > Quarter > Year
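
A business rule such as the geographic proximity bands can be captured as a small function applied while loading the Location dimension. This is only a sketch: the slide's bands overlap at exactly 1 and 5 miles, so assigning those boundary values to the lower band is an assumption, as is the function name.

```python
def proximity_band(miles):
    """Map a distance in miles to one of the proximity bands from the slide.

    Assumption: boundary values (exactly 1 or 5 miles) fall in the lower band.
    """
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= 1:
        return "0 - 1 miles"
    if miles <= 5:
        return "1 - 5 miles"
    return "> 5 miles"


for distance in (0.5, 1, 3, 5, 12.4):
    print(distance, proximity_band(distance))
```

Writing the rule down this explicitly forces the boundary question to be settled with the business users before the data is loaded.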

Page 42: ISQS 6339, Business Intelligence Dimensional Modeling

Documenting Metadata

Documenting metadata should include:
◦ Documenting the design process
◦ Documenting the development process
◦ Providing a record of changes
◦ Recording enhancements over time

Page 43: ISQS 6339, Business Intelligence Dimensional Modeling

Metadata Documentation Approaches

◦ Automated: data modeling tools, ETL tools, end-user tools
◦ Manual

Page 44: ISQS 6339, Business Intelligence Dimensional Modeling

Phase 2: Defining the Dimensional Model

◦ Identify fact tables: translate business measures into fact tables, and analyze source system information for additional measures
◦ Identify dimension tables
◦ Link fact tables to the dimension tables
◦ Model the time dimension

Page 45: ISQS 6339, Business Intelligence Dimensional Modeling

Illustrative case: IMW Data

Transaction fact: the transaction table
Periodic snapshot fact table: current records in Land & Office facts
Accumulating snapshot fact table: N/A in this case

Page 46: ISQS 6339, Business Intelligence Dimensional Modeling

Steps in designing a fact table

Identify a business process for analysis (like sales).

Identify measures or facts (sales dollar), by asking questions like 'What number of XX are relevant for the business process?', replacing the XX with various options that make sense within the context of the business.

Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension), by asking questions that make sense within the context of the business, like 'Analyse by XX', where XX is replaced with the subject to test.

List the columns that describe each dimension (region name, branch name, business unit name).

Determine the lowest level (granularity) of summary in a fact table (e.g. sales dollars).

An alternative approach is the four-step design process described in Kimball (exercise: check what it is).

Page 47: ISQS 6339, Business Intelligence Dimensional Modeling

Using Time in the Data Warehouse

◦ Defining standards for time is critical.
◦ Aggregation based on time is complex.
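
One common way to standardize time is to generate the time dimension once, with every calendar attribute precomputed, so that all fact tables share the same definitions. The sketch below uses only Python's standard library; the column names and the `YYYYMMDD` style of `day_id` are assumptions for illustration.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),  # e.g. 20160209
            "date": day.isoformat(),
            # ISO week number; near January 1 the ISO year can differ
            # from the calendar year, one reason time aggregation is tricky.
            "week": day.isocalendar()[1],
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(date(2016, 1, 1), date(2016, 12, 31))
print(len(time_dim), time_dim[0], time_dim[-1])
```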

Page 48: ISQS 6339, Business Intelligence Dimensional Modeling

Using Data Modeling Tools

◦ Tools with a GUI enable definition, modeling, and reporting.
◦ Avoid a mix of modeling techniques caused by: development pressures, developers with lack of knowledge, no strategy
◦ Determine a strategy.
◦ Write and publish formally.
◦ Make available electronically.

Page 49: ISQS 6339, Business Intelligence Dimensional Modeling

Phase 3: Defining the Physical Model

Why
◦ A huge amount of data must be effectively processed and retrieved in real time.

How
◦ Translate the dimensional design to a physical model for implementation.
◦ Define storage strategy for tables and indexes.
◦ Perform database sizing.
◦ Define initial indexing strategy.
◦ Define partitioning strategy.
◦ Update metadata document with physical information.

Page 50: ISQS 6339, Business Intelligence Dimensional Modeling

Storage and Performance Considerations

Database sizing
Data partitioning
Indexing
Star query optimization

Page 51: ISQS 6339, Business Intelligence Dimensional Modeling

Database Sizing - Test Load Sampling

Analyze a representative sample of the data chosen using proven statistical methods.

Ensure that the sample reflects:
◦ Test loads for different periods
◦ Day-to-day operations
◦ Seasonal data and worst-case scenarios
◦ Indexes and summaries

Page 52: ISQS 6339, Business Intelligence Dimensional Modeling

Data Partitioning

Breaking up of data into separate physical units that can be handled independently

Types of data partitioning
◦ Horizontal partitioning
◦ Vertical partitioning
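
The two partitioning types can be illustrated on a small in-memory table. This is a conceptual sketch only: real databases partition at the storage level, and the sample rows, the choice of year as the partition key, and the column grouping are all assumptions.

```python
from collections import defaultdict

# Hypothetical fact rows
sales = [
    {"sale_id": 1, "year": 2015, "store_id": 10, "sales_amount": 50.0, "note": "promo"},
    {"sale_id": 2, "year": 2016, "store_id": 11, "sales_amount": 30.0, "note": ""},
    {"sale_id": 3, "year": 2016, "store_id": 10, "sales_amount": 20.0, "note": "return"},
]

# Horizontal partitioning: every partition has all columns but only some rows
by_year = defaultdict(list)
for row in sales:
    by_year[row["year"]].append(row)

# Vertical partitioning: every partition has all rows but only some columns;
# the key (sale_id) is repeated so the pieces can be joined back together.
frequent_columns = ("sale_id", "year", "store_id", "sales_amount")
rare_columns = ("sale_id", "note")
frequent = [{c: row[c] for c in frequent_columns} for row in sales]
rare = [{c: row[c] for c in rare_columns} for row in sales]

print(dict(by_year))
print(frequent)
print(rare)
```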

Page 53: ISQS 6339, Business Intelligence Dimensional Modeling

Indexing

Indexing is used for the following reasons:
◦ It can save substantial cost by greatly improving performance and scalability.
◦ It can replace a full table scan by a quick read of the index followed by a read of only those disk blocks that contain the rows needed.
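
The effect of an index on the access path can be observed with SQLite's `EXPLAIN QUERY PLAN`, available through Python's sqlite3 module. The table, index name, and generated data below are hypothetical, and the exact plan wording varies between SQLite versions, so the sketch only prints the plans rather than relying on specific text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (store_id INTEGER, day_id INTEGER, sales_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(i % 50, i % 365, float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(sales_amount) FROM sales_fact WHERE store_id = 3"


def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_sales_store ON sales_fact (store_id)")
plan_after = plan(QUERY)   # with the index: a search that names the index

total = conn.execute(QUERY).fetchone()[0]
print(plan_before)
print(plan_after)
print(total)
```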

Page 54: ISQS 6339, Business Intelligence Dimensional Modeling

Parallelism

(Figure: parallel execution servers P1, P2, and P3 working on the Sales table and the Customers table)

Page 55: ISQS 6339, Business Intelligence Dimensional Modeling

Using Summary Data

Designing summary tables offers the following benefits:
◦ Provides fast access to precomputed data
◦ Reduces use of I/O, CPU, and memory
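
A summary table can be sketched with Python's sqlite3 as a table that is computed once from the detailed fact table and then queried instead of it. The table names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (day_id INTEGER, month_id INTEGER, sales_amount REAL);
    INSERT INTO sales_fact VALUES (1, 1, 10.0), (2, 1, 15.0), (32, 2, 7.5);

    -- Precompute monthly totals once, typically at load time
    CREATE TABLE sales_by_month AS
        SELECT month_id,
               SUM(sales_amount) AS sales_amount,
               COUNT(*)          AS row_count
        FROM sales_fact
        GROUP BY month_id;
""")

# Reports read the small summary table instead of re-aggregating the facts
monthly = conn.execute(
    "SELECT month_id, sales_amount FROM sales_by_month ORDER BY month_id"
).fetchall()
print(monthly)
```

The trade-off is that the summary table must be refreshed whenever the underlying fact table is loaded.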