
Dimensional Modeling Introduction By Sathish Yellanki

Nov 07, 2014


Dimensional Modeling For Data Warehouse
Transcript
Page 1: Dimensional Modeling Introduction By Sathish Yellanki

Saturday, April 8, 2023 Introduction To DWH by Satish Kumar Yellanki Slide No 1

Let us Get Through Data Modeling

Page 2: Dimensional Modeling Introduction By Sathish Yellanki


• A Data Model is A Conceptual Representation of Data Structures, That is Tables Required For A Database.

• A Data Model is Very Powerful Means of Expressing And Communicating The Business Requirements.

• A Data Model Visually Represents The Nature of Data, Business Rules Governing The Data, And How it Will Be Organized in The Database.

• A Data Model is Comprised of Two Parts
  • Logical Design.
  • Physical Design.

• Data Model Helps Functional And Technical Team in Designing The Database.

• Functional Team Normally Consists of One OR More Business Analysts, Business Managers, Smart Management Experts, End Users.

• Technical Team Consists of One OR More Programmers, DBAs, Project Architects, Program Analysts.

• Data Modelers Are Responsible For Designing The Data Model And They Communicate With Functional Team To Get The Business Requirements And Technical Teams To Implement The Database.

Page 3: Dimensional Modeling Introduction By Sathish Yellanki


When is A Data Model Developed?
• When A New Application For OLTP (Online Transaction Processing), ODS (Operational Data Store), Data Warehouse And Data Marts is Planned.
• When We Need To Rewrite Data Models From Existing Systems That May Need To Change Reports.
• When An Incorrect Data Model Exists in The Current System.
• When A Current Database Has No Previous Data Model.

Advantages And Importance of Data Model
• To Make Sure That All Data Objects Provided By The Functional Team Are Completely And Accurately Represented.
• Data Model is Detailed Enough To Be Used By The Technical Team For Building The Physical Database.
• The Information Contained in The Data Model Will Be Used To Define The Significance of Business, Relational Tables, Primary And Foreign Keys, Stored Procedures, And Triggers.
• Data Model Can Be Used To Communicate With The Business Within And Across Businesses.

Page 4: Dimensional Modeling Introduction By Sathish Yellanki


Data Modeler Role
• Business Requirement Analysis, Interact With Business Analysts To Get The Functional Requirements.
• Interact With End Users And Find Out The Reporting Needs.
• Conduct Interviews, Brain Storming Discussions With Project Team To Get Additional Requirements.
• Gather Accurate Data By Data Analysis And Functional Analysis.

Popular Data Modeling Tools

Tool Name            Company Name
Erwin                Computer Associates
Embarcadero          Embarcadero Technologies
Rational Rose        IBM Corporation
Power Designer       Sybase Corporation
Oracle Designer      Oracle Corporation
Xcase                RESolution LTD.
TOAD Data Modeler    TOAD OR Quest Software

Page 5: Dimensional Modeling Introduction By Sathish Yellanki


Steps To Create A Data Model
• Get Business Requirements.
• Create High Level Conceptual Data Model.
• Create Logical Data Model.
• Select Target DBMS Where Data Modeling Tool Creates The Physical Schema.
• Create Domain.
• Create Entities And Add Definitions.
• Create Attributes And Add Definitions.
• Assign Data Type To Attribute. If A Domain is Already Present Then The Attribute Should Be Attached To The Domain.
• Create Primary OR Unique Keys To Attribute.
• Create Check Constraint OR Default To Attribute.
• Create Unique Index OR Bitmap Index To Attribute.
• Create Foreign Key Relationship Between Entities.
• Create Physical Data Model.
• Add Database Properties To Physical Data Model.
• Create SQL Scripts From Physical Data Model And Forward That To DBA.
• Maintain Logical And Physical Data Model.

Page 6: Dimensional Modeling Introduction By Sathish Yellanki


Data Modeling Development Cycle

First Phase: Gathering Business Requirements
• Data Modelers Have To Interact With Business Analysts To Get The Functional Requirements And With End Users To Find Out The Reporting Needs.

Second Phase: Conceptual Data Modeling (CDM)
• This Data Model Includes All Major Entities And Relationships. It Will Not Contain Much Detail About Attributes And is Often Used in The Initial Planning Phase.

[Diagram: Conceptual Data Model of A Bank. Entities Shown: Bank; Investment Type, Credit Card Type, Loan Type; Investment, Credit Card, Loan; Certificate Deposit, Checking Account, Saving Account; General Credit Card, Premier Credit Card; Auto Loan, Personal Loan, Mortgage Loan.]

Page 7: Dimensional Modeling Introduction By Sathish Yellanki


Third Phase: Logical Data Modeling (LDM)
• A Logical Data Model is The Version of The Model That Represents All of The Business Requirements of An Organization.

State Lookup: State Code (PK), State Name, Date Time Stamp
City Lookup: City Code (PK), City Name, Date Time Stamp
Village Lookup: Village Code (PK), Village Name, Date Time Stamp
Employer Lookup: Employer Identifier (PK), Employer Name, Date Time Stamp
Employee: Employee Identifier (PK), State Code (FK), Village Code (FK), City Code (FK), Manager Identifier (FK), Employee First Name, Employee Last Name, Employee Full Name, Date Time Stamp
Employee Employer Reference: Employee Identifier (FK), Employer Identifier (FK), Date Time Stamp

Page 8: Dimensional Modeling Introduction By Sathish Yellanki


Fourth Phase: Physical Data Modeling (PDM)
• This is A Complete Model That Includes All Required Tables, Columns, Relationships, And Database Properties For The Physical Implementation of The Database.

State_LKP
  STATE_CD: VARCHAR2(5) NOT NULL (PK)
  STATE_NM: VARCHAR2(50) NOT NULL
  DT_TM_STMP: DATE NOT NULL

City_LKP
  CITY_CD: VARCHAR2(5) NOT NULL (PK)
  CITY_NM: VARCHAR2(50) NOT NULL
  DT_TM_STMP: DATE NOT NULL

Village_LKP
  VLLGE_CD: VARCHAR2(5) NOT NULL (PK)
  VLLGE_NM: VARCHAR2(50) NOT NULL
  DT_TM_STMP: DATE NOT NULL

Employer_LKP
  EMPLYER_CD: VARCHAR2(5) NOT NULL (PK)
  EMPLYER_NM: VARCHAR2(50) NOT NULL
  DT_TM_STMP: DATE NOT NULL

EMPLOYEE
  EMPLYE_ID: NUMBER(10, 0) NOT NULL (PK)
  STATE_CD: VARCHAR2(5) NOT NULL (FK)
  CITY_CD: VARCHAR2(5) NOT NULL (FK)
  VLLGE_CD: VARCHAR2(5) NOT NULL (FK)
  MNGR_ID: NUMBER(10, 0) NULL (FK)
  EMPLYE_FIRST_NAME: VARCHAR2(15) NOT NULL
  EMPLYE_LAST_NAME: VARCHAR2(15) NOT NULL
  EMPLYE_FULL_NAME: VARCHAR2(30) NOT NULL
  DT_TM_STMP: DATE NOT NULL

EMPLOYEE_EMPLOYER_REF
  EMPLYE_ID: NUMBER(10, 0) NOT NULL (FK)
  EMPLYER_ID: NUMBER(10, 0) NOT NULL (FK)
  DT_TM_STMP: DATE NOT NULL
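The physical model above can be turned into a script that a DBA could run. The following is only a minimal sketch: it assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the target DBMS, keeps the Oracle-style type names from the slide only as declared types, covers just the three lookup tables and EMPLOYEE, and uses made-up sample rows.

```python
import sqlite3

# Table and column names follow the slide; SQLite is an assumed stand-in DBMS.
DDL = """
CREATE TABLE STATE_LKP (
    STATE_CD   VARCHAR2(5)  NOT NULL PRIMARY KEY,
    STATE_NM   VARCHAR2(50) NOT NULL,
    DT_TM_STMP DATE         NOT NULL
);
CREATE TABLE CITY_LKP (
    CITY_CD    VARCHAR2(5)  NOT NULL PRIMARY KEY,
    CITY_NM    VARCHAR2(50) NOT NULL,
    DT_TM_STMP DATE         NOT NULL
);
CREATE TABLE VILLAGE_LKP (
    VLLGE_CD   VARCHAR2(5)  NOT NULL PRIMARY KEY,
    VLLGE_NM   VARCHAR2(50) NOT NULL,
    DT_TM_STMP DATE         NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMPLYE_ID         NUMBER(10, 0) NOT NULL PRIMARY KEY,
    STATE_CD          VARCHAR2(5)   NOT NULL REFERENCES STATE_LKP (STATE_CD),
    CITY_CD           VARCHAR2(5)   NOT NULL REFERENCES CITY_LKP (CITY_CD),
    VLLGE_CD          VARCHAR2(5)   NOT NULL REFERENCES VILLAGE_LKP (VLLGE_CD),
    MNGR_ID           NUMBER(10, 0) REFERENCES EMPLOYEE (EMPLYE_ID),
    EMPLYE_FIRST_NAME VARCHAR2(15)  NOT NULL,
    EMPLYE_LAST_NAME  VARCHAR2(15)  NOT NULL,
    EMPLYE_FULL_NAME  VARCHAR2(30)  NOT NULL,
    DT_TM_STMP        DATE          NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)

# Hypothetical sample rows, used only to exercise the constraints.
conn.execute("INSERT INTO STATE_LKP VALUES ('AP', 'Andhra Pradesh', '2023-04-08')")
conn.execute("INSERT INTO CITY_LKP VALUES ('HYD', 'Hyderabad', '2023-04-08')")
conn.execute("INSERT INTO VILLAGE_LKP VALUES ('V001', 'Sample Village', '2023-04-08')")
conn.execute(
    "INSERT INTO EMPLOYEE VALUES (1, 'AP', 'HYD', 'V001', NULL, "
    "'Rama', 'Rao', 'Rama Rao', '2023-04-08')"
)

# A row that points at a missing lookup code is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO EMPLOYEE VALUES (2, 'XX', 'HYD', 'V001', 1, "
        "'Sita', 'Devi', 'Sita Devi', '2023-04-08')"
    )
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(fk_rejected, employee_count)
```

The foreign keys are what let the lookup tables guard the codes stored in EMPLOYEE; the nullable MNGR_ID models the self-reference for employees without a manager.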

Page 9: Dimensional Modeling Introduction By Sathish Yellanki


Entity Relational Modeling Versus Dimensional Modeling

Page 10: Dimensional Modeling Introduction By Sathish Yellanki


Relational (OLTP) Data Modeling
• Relational Data Model is A Data Model That Views The Real World As Entities And Relationships.
• Entities Are Associated With Each Other By Relationship And Attributes Are Properties of Entities.
• Business Rules Would Determine The Relationship Between Each of The Entities in A Data Model.
• Helps To Automate The Normalization of Physical Data Structures And Seeks To Control Data Redundancy.

Goal of Relational Data Model
• To Normalize, That is To Avoid Redundancy in The Data And To Present it in A Good Normal Form For Transactional Standards.

Qualifications Required
• The Data Modeler Has To Understand 1st Normal Form Through 5th Normal Form To Design A Good Data Model For OLTP Systems.
• Look For One-To-Many OR Many-To-Many Relationships Among Data Elements, And Separate The Data Elements into Distinct Tables Joined By Keys.
• Concentrate on Microscopic Relationships Among Data Elements in The Systems.

Page 11: Dimensional Modeling Introduction By Sathish Yellanki


Dimensional Modeling
• Dimensional Data Modeling Comprises One OR More Dimension Tables And Fact Tables.
• Dimension Table Stores Records Related To That Particular Dimension And They Cannot Store Facts, That is Values of Measures.
• Dimensional Modeling Adheres To A Discipline of Relational Model, With High Performance Access.
• In Dimensional Modeling Dimensions Integrate Through Concept of Star Join With Fact Table.
• In Dimensional Modeling Fact Table Has A Multi Part Primary Key Associated With Many To Many Relation.
• Dimensional Modeling Encourages The Top-Down Design Process of The System.
  • First We Identify The Main Business Processes That Act As The Sources of The Fact Tables.
  • Then We Populate The Fact Tables With Numeric, Additive Facts.
  • We Describe Each Fact Record By As Many Business Dimensions As We Can Identify.

Page 12: Dimensional Modeling Introduction By Sathish Yellanki


• The Fact Table Records Consist of Key Values That Have Many-To-Many Relationships With One Another, Together With Numeric Data Representing Measurements of Each Dimension.

• It is Important That The Dimension Tables Remain As Flat Structures, Single-Level Tables Without Being Further Normalized.

Product Dimension: Product Dimension Identifier (PK), Product Category Name, Product Sub-Category Name, Product Name, Product Feature Description, Product Unit of Measure Description, Date Time Stamp

Organization Dimension: Organization Dimension Identifier (PK), Corporate Office Name, Region Name, Branch Name, Employee Name, Date Time Stamp

Location Dimension: Location Dimension Identifier (PK), Country Name, State Name, District Name, Mandal Name, Village OR City Name, Date Time Stamp

Time Dimension: Time Dimension Identifier (PK), Year Number, Day of Year, Year Week Number, Quarter Number, Month Number, Month Name, Month Day Number, Month Week Number, Day of Week, Calendar Date, Date Time Stamp

Sales Information Fact: Product Dimension Identifier (FK), Organization Dimension Identifier (FK), Location Dimension Identifier (FK), Time Dimension Identifier (FK), Sales in Dollars, Date Time Stamp
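To make the star join concrete, here is a small sketch using Python's standard `sqlite3` module. It is a simplified version of the model above, not the author's implementation: the Organization Dimension is left out, only a few attributes are kept per dimension, and the table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_dim_id        INTEGER PRIMARY KEY,
    product_category_name TEXT,
    product_name          TEXT
);
CREATE TABLE location_dim (
    location_dim_id INTEGER PRIMARY KEY,
    country_name    TEXT,
    state_name      TEXT
);
CREATE TABLE time_dim (
    time_dim_id    INTEGER PRIMARY KEY,
    year_number    INTEGER,
    quarter_number INTEGER
);
-- The fact table's primary key is the combination of its dimension keys.
CREATE TABLE sales_fact (
    product_dim_id   INTEGER REFERENCES product_dim (product_dim_id),
    location_dim_id  INTEGER REFERENCES location_dim (location_dim_id),
    time_dim_id      INTEGER REFERENCES time_dim (time_dim_id),
    sales_in_dollars REAL,
    PRIMARY KEY (product_dim_id, location_dim_id, time_dim_id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Beverages", "Tea"), (2, "Beverages", "Coffee")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(1, "India", "Andhra Pradesh"), (2, "India", "Tamil Nadu")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, 2009, 1), (2, 2009, 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 100.0), (2, 1, 1, 150.0),
                  (1, 2, 2, 200.0), (2, 2, 2, 75.0)])

# Star join: the fact table joins directly to each dimension, one level deep.
rows = conn.execute("""
    SELECT l.state_name, t.quarter_number, SUM(f.sales_in_dollars)
    FROM sales_fact f
    JOIN product_dim  p ON p.product_dim_id  = f.product_dim_id
    JOIN location_dim l ON l.location_dim_id = f.location_dim_id
    JOIN time_dim     t ON t.time_dim_id     = f.time_dim_id
    WHERE p.product_category_name = 'Beverages'
    GROUP BY l.state_name, t.quarter_number
    ORDER BY l.state_name, t.quarter_number
""").fetchall()
print(rows)
```

Because each dimension is a flat, single-level table, every descriptive attribute used for filtering or grouping is exactly one join away from the fact table.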

Page 13: Dimensional Modeling Introduction By Sathish Yellanki


Detailing The Dimensions

Page 14: Dimensional Modeling Introduction By Sathish Yellanki


Location Dimension

Location Dimension: Location Dimension Identifier (PK), Country Name, State Name, District Name, Mandal Name, Village OR City Name, Date Time Stamp

Country Lookup: Country ISO Code (PK), Country Name, Date Time Stamp
State Lookup: State ISO Code (PK), State Name, Date Time Stamp
District Lookup: District Code (PK), District Name, Date Time Stamp
Mandal Lookup: Mandal Code (PK), Mandal Name, Date Time Stamp
Village OR City Lookup: Village OR City Code (PK), Village OR City Name, Date Time Stamp

Page 15: Dimensional Modeling Introduction By Sathish Yellanki


Product Dimension

Product Dimension: Product Dimension Identifier (PK), Product Category Name, Product Sub Category Name, Product Name, Product Feature Description, Product Unit of Measure Description, Date Time Stamp

Product Category Lookup: Product Category Code (PK), Product Category Name, Date Time Stamp
Product Sub Category Lookup: Product Sub Category Code (PK), Product Sub Category Name, Date Time Stamp
Product Lookup: Product Code (PK), Product Name, Date Time Stamp
Product Feature Lookup: Product Feature Code (PK), Product Feature Description, Date Time Stamp
Product Unit of Measure Lookup: Product Unit of Measure Code (PK), Product Unit of Measure Description, Date Time Stamp

Page 16: Dimensional Modeling Introduction By Sathish Yellanki


Organizational Dimension

Organization Dimension: Organization Dimension Identifier (PK), Corporate Office Name, Region Name, Branch Name, Employee Name, Date Time Stamp

Corporate Office Lookup: Corporate Office Code (PK), Corporate Office Name, Date Time Stamp
Region Lookup: Region Code (PK), Region Name, Date Time Stamp
Branch Lookup: Branch Code (PK), Branch Name, Date Time Stamp
Employee Lookup: Employee Code (PK), Employee Name, Date Time Stamp

Page 17: Dimensional Modeling Introduction By Sathish Yellanki


Time Dimension

Time Dimension: Time Dimension Identifier (PK), Year Number, Day of The Year, Quarter Number, Month Number, Month Name, Month Day Number, Week Number, Day of The Week, Calendar Date, Date Time Stamp

Year Lookup: Year Code (PK), Year Number, Date Time Stamp
Quarter Lookup: Quarter Code (PK), Quarter Name, Date Time Stamp
Month Lookup: Month Code (PK), Month Name, Date Time Stamp
Week Lookup: Week Code (PK), Day of The Week, Date Time Stamp

Page 18: Dimensional Modeling Introduction By Sathish Yellanki


Star Schema
• Star Schema is A Relational Database Schema For Representing Multi-Dimensional Data.
• It is The Simplest Form of Data Warehouse Schema That Contains One OR More Dimensions And Fact Tables.
• Star Schema is So Called Because The Entity-Relationship Diagram Between Dimensions And Fact Tables Resembles A Star Where One Fact Table is Connected To Multiple Dimensions.
• The Center of The Star Schema Consists of A Large Fact Table And it Points Towards The Dimension Tables.
• The Advantage of Star Schema is That it Enables Slicing of The Data, Improves Performance, And Makes The Data Easy To Understand.

Star Join
• In Star Schema We Have Few Dimensions Existing Independently Merging into A Fact Table.

[Diagram: Star Join. Retail Sales Fact (Retail Sales Attributes) At The Center, Joined To Time Dimension (Time Attributes), Store Dimension (Store Attributes), Product Dimension (Product Attributes) And Promotion Dimension (Promotion Attributes).]

Page 19: Dimensional Modeling Introduction By Sathish Yellanki


• The Facts That The Data Warehouse Helps To Analyze Are Classified Along Different Dimensions.

• The Fact Tables Hold The Main Data, While The Usually Smaller Dimension Tables Describe Each Value of A Dimension And Can Be Joined To Fact Tables As Needed.

• Dimension Tables Have A Simple PRIMARY KEY, While Fact Tables Have A Set of FOREIGN KEYS Which Make Up A COMPOUND PRIMARY KEY With A Combination of Relevant Dimension Keys.

• It is Common For Dimension Tables To Consolidate Redundant Data in The Most Granular Column, And Are Thus Rendered in Second Normal Form.

• Fact Tables Are Usually in Third Normal Form Because All Data Depends on Either One Dimension OR All Dimensions But Not on Combinations of A Few Dimensions.

• The Star Schema is A Way To Implement Multi-Dimensional Database (MDDB) Functionality Using A Mainstream Relational Database.

• In Star Schema Queries Are Never Complex Because The Only Joins And Conditions Involve A Fact Table And A Single Level of Dimension Tables.

Page 20: Dimensional Modeling Introduction By Sathish Yellanki


Snowflake Schema
• A Snowflake Schema is A Logical Arrangement of Tables in A Multidimensional Database Such That The Entity Relationship Diagram Resembles A Snowflake in Shape.
• The Snowflake Schema is Represented By Centralized Fact Tables Which Are Connected To Multiple Dimensions.
• In The Snowflake Schema, Dimensions Are Normalized into Multiple Related Tables Whereas The Star Schema's Dimensions Are De-Normalized With Each Dimension Being Represented By A Single Table.
• The "Snow Flaking" Effect Only Affects The Dimension Tables And Not The Fact Tables.

Points To Consider in Star Schema & Snow Flake Schema
• In Star Schema Every Dimension Will Have A Primary Key And A Dimension Table Will Not Have Any Parent Table.
• In A Snow Flake Schema, A Dimension Table Will Have One OR More Parent Tables.
• In Star Schema Hierarchies For The Dimensions Are Stored in The Dimensional Table Itself.
• In Snowflake Schema Hierarchies Are Broken into Separate Tables.
• Hierarchies Help To Drill Down The Data From Topmost Hierarchies To The Lowermost Hierarchies.

Page 21: Dimensional Modeling Introduction By Sathish Yellanki


Year Dimension: Year Key (PK), Year Value, Date Time Stamp
Month Dimension: Month Key (PK), Year Key (FK), Month Value, Date Time Stamp
Day Dimension: Day Key (PK), Month Key (FK), Day Value, Date Time Stamp
Time Dimension: Time Key (PK), Day Key (FK), Time Value, Date Time Stamp

Sales Fact: Time Key (FK), Product Key (FK), Customer Key (FK), Store Key (FK), Unit Sold, Value in Rupees

Product Dimension: Product Key (PK), Company Key (FK), Product Name, Abbreviation, Description, Date Time Stamp
Company Dimension: Company Key (PK), Company Name, Description, Date Time Stamp
Store Dimension: Store Key (PK), Store Name, Description, Date Time Stamp
Customer Dimension: Customer Key (PK), Customer Name, Address, Date Time Stamp
Customer Type Dimension: Customer Type Key (PK), Customer Type Name, Description, Date Time Stamp

Page 22: Dimensional Modeling Introduction By Sathish Yellanki


Time_DIM: Time_Key (PK), Day, Day_Of_The_Week, Month, Quarter, Year
Item_DIM: Item_Key (PK), Item_Name, Item_Brand, Item_Type, Supplier_Type
Branch_DIM: Branch_Key (PK), Branch_Name, Branch_Type, Branch_Phone, Branch_Fax
Location_DIM: Location_Key (PK), Street_Name, City_Name, Province_OR_Street, Country_Name

Sales_Fact: Time_Key (FK), Branch_Key (FK), Item_Key (FK), Location_Key (FK), Total_Units_Sold, Total_Dollars_Sold, Avg_Sold

Page 23: Dimensional Modeling Introduction By Sathish Yellanki


Time_DIM: Time_Key (PK), Day, Day_Of_The_Week, Month, Quarter, Year
Item_DIM: Item_Key (PK), Item_Name, Item_Brand, Item_Type, Supplier_Key (FK)
Supplier_DIM: Supplier_Key (PK), Supplier_Name, Supplier_Type
Branch_DIM: Branch_Key (PK), Branch_Name, Branch_Type, Branch_Phone, Branch_Fax
Location_DIM: Location_Key (PK), Street_Name, City_Key (FK)
City_DIM: City_Key (PK), City_Name, Province_OR_Street, Country_Name

Sales_Fact: Time_Key (FK), Branch_Key (FK), Item_Key (FK), Location_Key (FK), Total_Units_Sold, Total_Dollars_Sold, Avg_Sold
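The practical effect of snowflaking can be seen in a query. The sketch below is an illustration under stated assumptions: it uses Python's standard `sqlite3` module, borrows the Location_DIM, City_DIM and Sales_Fact names from the model above, trims Sales_Fact to a single key and a single measure, and fills the tables with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City_DIM (
    City_Key           INTEGER PRIMARY KEY,
    City_Name          TEXT,
    Province_OR_Street TEXT,
    Country_Name       TEXT
);
-- In the snowflake variant Location_DIM keeps only a key to its parent table.
CREATE TABLE Location_DIM (
    Location_Key INTEGER PRIMARY KEY,
    Street_Name  TEXT,
    City_Key     INTEGER REFERENCES City_DIM (City_Key)
);
-- Trimmed fact table: one dimension key and one measure are enough here.
CREATE TABLE Sales_Fact (
    Location_Key     INTEGER REFERENCES Location_DIM (Location_Key),
    Total_Units_Sold INTEGER
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO City_DIM VALUES (?, ?, ?, ?)",
                 [(1, "Hyderabad", "Andhra Pradesh", "India"),
                  (2, "Chennai", "Tamil Nadu", "India")])
conn.executemany("INSERT INTO Location_DIM VALUES (?, ?, ?)",
                 [(10, "Street A", 1), (11, "Street B", 1), (12, "Street C", 2)])
conn.executemany("INSERT INTO Sales_Fact VALUES (?, ?)",
                 [(10, 5), (11, 7), (12, 3)])

# Reaching City_Name now takes two joins: fact -> Location_DIM -> City_DIM.
# In the star version City_Name sits in Location_DIM and one join is enough.
units_by_city = conn.execute("""
    SELECT c.City_Name, SUM(f.Total_Units_Sold)
    FROM Sales_Fact f
    JOIN Location_DIM l ON l.Location_Key = f.Location_Key
    JOIN City_DIM     c ON c.City_Key     = l.City_Key
    GROUP BY c.City_Name
    ORDER BY c.City_Name
""").fetchall()
print(units_by_city)
```

The trade-off is the usual one: the snowflake stores each city's attributes once, at the cost of an extra join for every query that needs them.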

Page 24: Dimensional Modeling Introduction By Sathish Yellanki


Components of Dimensional Modeling
• Dimension Tables.
• Fact Tables.

Dimension Tables
• Dimension Tables Contain Textual Descriptors of The Business.
• Dimension Tables Are Integral Companions OR Supporters For Operations on The Fact Tables.
• Dimension Tables in Real Time Contain A Very Large Number of Columns OR Attributes.
• The Number of Records OR Rows Managed By The Dimensional Table is Comparatively Small.
• The Strength of A Dimensional Table is Directly Proportional To The Quality And Depth With Which The Dimensional Attributes Are Selected.
• Dimension Tables Store Records Related To That Particular Dimension And No Measures Are Part of it.
• Dimension Tables Describe The Business Entities of An Enterprise, Represented As Hierarchical, Categorical Information.
• Dimension Tables Are Also Called Lookup Tables OR Reference Tables.

Page 25: Dimensional Modeling Introduction By Sathish Yellanki


Fact Tables
• Fact Tables Are The Primary Tables in Dimensional Modeling And Data Warehouse Implementation.
• Fact Tables Are The Resources of Online Analytical Processing of Information in DWH.
• Fact Tables Contain Numerical Performance Measurements of The Business Data Stored in Database.
• Fact in A Data Warehouse Represents A Business Measure, At The Same Level of Granularity.
• All Registered Facts Must Always Be Numeric And Additive in Nature.
• Fact Tables Normally Contain A Small Number of Columns But A Large Number of Rows OR Records.
• Fact Table is The Centralized Table in A Star Schema.
• A Fact Table Consists of Two Types of Columns…
  • Columns Containing Facts.
  • Columns That Are Foreign Keys To Dimension Tables.
• Fact Table Contains A Primary Key Which is Usually A Composite Key That is Made Up of All of its Foreign Keys.

Page 26: Dimensional Modeling Introduction By Sathish Yellanki


Types of Measures Stored in Fact Table

Additive Measures
• Measures That Can Be Added Across All Dimensions.
Illustration
• Adding Sales Across All Quarters To Obtain The Yearly Sales of The Organization.

Non Additive Measures
• Measures That Cannot Be Added Across All Dimensions.
Illustration
• Aggregation of Percentage OR Dates.

Semi Additive Measures
• Measures That Can Be Added Across Few Dimensions And Not With Others.
Illustration
• Stock Levels on Monday 1000
• Sales on Tuesday 200
• Sales on Wednesday 300
• Current Stock on Thursday 500
Note
• In The Above Scenario To Obtain Current Stock Level We Cannot Aggregate The Stock And Sales Across Time Dimension Hierarchy, If Done We Will Have Inappropriate Outcomes.
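The stock illustration can be worked through in a few lines of Python. This is a sketch under an assumed layout: one hypothetical snapshot row per day holding the units sold that day and the closing stock, with the intermediate closing stock figures derived from the numbers in the illustration.

```python
# Hypothetical daily snapshot rows: (day, units_sold, closing_stock).
daily = [
    ("Monday", 0, 1000),
    ("Tuesday", 200, 800),
    ("Wednesday", 300, 500),
    ("Thursday", 0, 500),
]

# Sales is additive across time: summing the days gives the units sold so far.
total_sales = sum(sold for _, sold, _ in daily)

# Stock is semi additive: summing it across days produces a meaningless number.
wrong_stock = sum(stock for _, _, stock in daily)

# Across the time dimension the current stock is the latest snapshot.
current_stock = daily[-1][2]

print(total_sales, wrong_stock, current_stock)
```

Summing the closing stock gives 2800, which matches no real quantity, while the latest snapshot gives the 500 shown for Thursday. The same stock measure could still be summed across other dimensions, such as stores or products, for a single day.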

Page 27: Dimensional Modeling Introduction By Sathish Yellanki


Points To Consider
• A Fact Table Might Contain Either Detail Level Facts OR Facts That Have Been Aggregated.
• Fact Tables Containing Aggregated Facts Are Often Called Summary Tables.
• A Fact Table That Contains No Measures OR Facts At All is Called A Factless Fact Table.

Column Types in Data Warehouse
• The Fact Tables Are Practically Converted To Cubes in Real Time Data Warehouse Implementation.
• A Cube Contains Two Components
  • Dimensions.
  • Measures.

Dimension
• A Dimension is A Component of A Cube, Which Groups Related Business Data.
• Dimensions Become The Axis Labels For Columns And Rows of A Report.
• A Dimension Can Have Levels, Details And Members.
• A Level Specifies The Amount of Detail For The Business Data in The Cube.
• All Members in A Dimension Are Organized By Levels.

Page 28: Dimensional Modeling Introduction By Sathish Yellanki


Measures
• Measures Are Also Dimensions, Depending on Which We Make Our Comparisons.
• Measures Are The Measurable Quantities Upon Which The Business Process is Measured.
• Measures Are Always Numeric Quantities.
• All Aggregations Are Performed Using The Measures Only.

[Diagram: Location Dimension (Location Dimension Identifier (PK), Country Name, State Name, District Name, Mandal Name, Village OR City Name, Date Time Stamp) Annotated With "Levels" And "Detail"; Sales Fact (Time Key (FK), Product Key (FK), Customer Key (FK), Store Key (FK), Unit Sold, Value in Rupees) Annotated With "Measure".]

What Are Slowly Changing Dimensions?
• Dimensions That Change Over Time Are Called Slowly Changing Dimensions.
Examples
• A Product Price Changes Over Time.
• People Change Their Names For Some Reason.
• Country And State Names May Change Over Time.

Page 29: Dimensional Modeling Introduction By Sathish Yellanki


How To Handle Slowly Changing Dimensions?
• When The Data Warehouse Administrator Encounters A Changed Description in A Dimension Record, The DBA Should Issue A New Dimension Record.
• To Issue A Correct New Dimension Record, The Data Warehouse Must Have A More General Key Structure, Which Needs A Surrogate Key.
• The Slowly Changing Dimension Problem Can Be Solved in Three Ways…
  • Type 1 Overwriting The Old Values.
  • Type 2 Create Another Additional Record.
  • Type 3 A New Field is Added With Bit Status.

Let Us Detail The Slowly Changing Dimensions

Case Study For Analysis
• Mr. Rama Subramanyam is A Registered Cardiac Patient With CARE Hospitals, India.
• He First Lived in Andhra Pradesh, Hyderabad.
• The Original Entry in The Patients Lookup Table Looks As Follows…

Patient Key    Patient Name        City
1000           Rama Subramanyam    Hyderabad

Page 30: Dimensional Modeling Introduction By Sathish Yellanki


Change in The Scenario
• At A Later Date, Mr. Rama Subramanyam Moved To Tamil Nadu, Chennai in September 2009.

Problem To Resolve
• How Should CARE Hospitals, India, Now Identify its Patient's Status in The Table To Reflect This Change of Address in The Database?

Solution Approaches

Slowly Changing Dimension Type 1
• In This Approach We Simply Overwrite The Original Information With New Information.
• In This Approach The History is Not Kept OR Maintained.

Original State

Patient Key    Patient Name        City
1000           Rama Subramanyam    Hyderabad

Updated State

Patient Key    Patient Name        City
1000           Rama Subramanyam    Chennai

Page 31: Dimensional Modeling Introduction By Sathish Yellanki


Advantages
• This is The Easiest Way To Handle The Slowly Changing Dimension Problem.
• There is No Need To Keep Track of The Old Information.

Disadvantages
• All The History of The Previous Data is Lost.
• By Applying This Methodology, it is Not Possible To Trace Back in History.

Final Conclusion
• In This Case, CARE Hospitals, India, Would Not Be Able To Know That Mr. Rama Subramanyam Lived in Hyderabad Before.

Priority of Usage
• Mostly Used in About 50% of The Situations.

When To Use Type 1
• Type 1 Slowly Changing Dimension Should Be Used When it is Not Necessary For The Data Warehouse To Keep Track of Historical Changes.
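A Type 1 change amounts to a plain in-place update. The sketch below uses Python's standard `sqlite3` module; the table name `patient_dim` and its column names are assumptions chosen to mirror the Patient Key, Patient Name and City columns of the case study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_dim ("
    "patient_key INTEGER PRIMARY KEY, patient_name TEXT, city TEXT)"
)
conn.execute(
    "INSERT INTO patient_dim VALUES (1000, 'Rama Subramanyam', 'Hyderabad')"
)

# Type 1: overwrite the old value in place; the previous city is not kept.
conn.execute(
    "UPDATE patient_dim SET city = ? WHERE patient_key = ?", ("Chennai", 1000)
)

type1_rows = conn.execute("SELECT * FROM patient_dim").fetchall()
print(type1_rows)
```

After the update the table still holds a single row for key 1000, and nothing in it records that the patient once lived in Hyderabad.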

Page 32: Dimensional Modeling Introduction By Sathish Yellanki


Slowly Changing Dimension Type 2
• In This Approach A New Record is Added To The Table To Represent The New Information.
• In This Approach Both The Original And The New Record Will Be Available in The Database.
• The New Record is Maintained With its Own Primary Key.

Original State

Patient Key    Patient Name        City
1000           Rama Subramanyam    Hyderabad

Updated State

Patient Key    Patient Name        City
1000           Rama Subramanyam    Hyderabad
1002           Rama Subramanyam    Chennai

Advantages
• Allows in Keeping Accurately All Historical Information.

Disadvantages
• This Will Cause The Size of The Table To Grow Fast.
• If The Number of Records is Very High, Then Storage And Performance Can Become A Concern.
• This Approach Necessarily Complicates The ETL Process.

Page 33: Dimensional Modeling Introduction By Sathish Yellanki


Priority of Usage
• Mostly Used in About 50% of The Situations.

When To Use Type 2
• Type 2 Slowly Changing Dimension Should Be Used When it is Necessary For The Data Warehouse To Track Historical Changes.
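A Type 2 change can be sketched as an insert of a new row with its own surrogate key. The example uses Python's standard `sqlite3` module. The `patient_number` column is a hypothetical natural key that is not part of the case study tables; it is added here only so that the two versions of the same patient can be tied together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_dim ("
    "patient_key INTEGER PRIMARY KEY, "   # surrogate key, one per version
    "patient_number TEXT, "               # hypothetical natural key
    "patient_name TEXT, city TEXT)"
)
conn.execute(
    "INSERT INTO patient_dim VALUES "
    "(1000, 'P-1', 'Rama Subramanyam', 'Hyderabad')"
)

# Type 2: leave the original row untouched and add a new row for the change.
conn.execute(
    "INSERT INTO patient_dim VALUES "
    "(1002, 'P-1', 'Rama Subramanyam', 'Chennai')"
)

history = conn.execute(
    "SELECT patient_key, city FROM patient_dim "
    "WHERE patient_number = ? ORDER BY patient_key",
    ("P-1",),
).fetchall()
print(history)
```

Fact rows loaded before the move keep pointing at key 1000 and later ones at key 1002, which is how the history is preserved.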

Slowly Changing Dimension Type 3
• In This Approach There Will Be Two Columns To Indicate The Particular Attribute of Interest.
• One Column Indicates The Original Value, And The Other Column Indicates The Current Value.
• There Will Also Be A Column That Indicates When The Current Value Became Active.

Original State

Patient Key    Patient Name        City
1000           Rama Subramanyam    Hyderabad

Updated State

Patient Key    Patient Name        Original City    Current City    Effective Date
1000           Rama Subramanyam    Hyderabad        Chennai         10-SEP-2009

Page 34: Dimensional Modeling Introduction By Sathish Yellanki


Advantages
• This Approach Does Not Increase The Size of The Table, Since New Information is Updated.
• This Approach Allows Us To Keep Some Part of History.

Disadvantages
• Type 3 Will Not Be Able To Keep All History Where An Attribute is Changed More Than Once.

Priority of Usage
• Type 3 is Rarely Used in Actual Real Time Practice.

When To Use Type 3
• Type 3 Slowly Changing Dimension Should Only Be Used When it is Necessary For The Data Warehouse To Track Historical Changes, And When Such Changes Will Only Occur A Finite Number of Times.
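The Type 3 layout and its limitation can be sketched with Python's standard `sqlite3` module. The table and column names are assumptions that mirror the Original City, Current City and Effective Date columns of the case study, the `move` helper is hypothetical, and the second move (to Bengaluru in 2015) is invented purely to show what happens when the attribute changes more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_dim ("
    "patient_key INTEGER PRIMARY KEY, patient_name TEXT, "
    "original_city TEXT, current_city TEXT, effective_date TEXT)"
)
conn.execute(
    "INSERT INTO patient_dim VALUES "
    "(1000, 'Rama Subramanyam', 'Hyderabad', 'Hyderabad', NULL)"
)


def move(conn, patient_key, new_city, effective_date):
    """Type 3: update the current value and its date; the original stays."""
    conn.execute(
        "UPDATE patient_dim SET current_city = ?, effective_date = ? "
        "WHERE patient_key = ?",
        (new_city, effective_date, patient_key),
    )


move(conn, 1000, "Chennai", "2009-09-10")    # the move from the case study
move(conn, 1000, "Bengaluru", "2015-01-01")  # hypothetical second move

row = conn.execute(
    "SELECT * FROM patient_dim WHERE patient_key = 1000"
).fetchone()
print(row)
```

After the second move the row still shows Hyderabad as the original city, but Chennai is gone, which is the disadvantage listed above.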

Relational Modeling Versus Dimensional Modeling

Relational Data Modeling: Data is Stored in RDBMS.
Dimensional Data Modeling: Data is Stored in RDBMS OR Multi-Dimensional Databases.

Relational Data Modeling: Tables Are Units of Storage.
Dimensional Data Modeling: Cubes Are Units of Storage.

Relational Data Modeling: Data is Normalized And Used For OLTP. Optimized For OLTP Processing.
Dimensional Data Modeling: Data is Denormalized And Used in Data Warehouse And Data Mart. Optimized For OLAP.

Page 35: Dimensional Modeling Introduction By Sathish Yellanki


Relational Data Modeling: Contains Several Tables And Chains of Relationships Among Them.
Dimensional Data Modeling: Contains Few Tables And Fact Tables Are Connected To Dimensional Tables.

Relational Data Modeling: Volatile (Several Updates) And Time Variant.
Dimensional Data Modeling: Non Volatile And Time Invariant.

Relational Data Modeling: SQL is Used To Manipulate Data.
Dimensional Data Modeling: MDX is Used To Manipulate Data.

Relational Data Modeling: Detailed Level of Transactional Data.
Dimensional Data Modeling: Summary of Bulky Transactional Data (Aggregates And Measures) Used in Business Decisions.

Relational Data Modeling: Normal Reports.
Dimensional Data Modeling: User Friendly, Interactive, Drag And Drop Multidimensional OLAP Reports.

Relational Data Modeling: Typical Data Design Used For Business Transaction Systems.
Dimensional Data Modeling: Data Design Used For Analysis Systems.

Relational Data Modeling: Goal – Reduce Every Piece of Information To Its Simplest Form.
Dimensional Data Modeling: Goal – Break Up Information into ‘Facts’

Relational Data Modeling: Suited For Concurrent Handling of Many Small Transactions By Many Users. Only A Limited Amount of Data History is Normally Kept.
Dimensional Data Modeling: Suited For Reading OR Analyzing Large Amounts of Data By A Modest Number of Users. Many Years of Data History May Be Kept.

Relational Data Modeling: User is Usually Constrained By An Application That Understands The Data Design. Users Are Typically Operations Staff.
Dimensional Data Modeling: This Simpler Data Design Makes it Easier For Users To Analyze Data in Any Way They Choose. Users Are Typically Analysts, Company Strategists, OR Even Executives.

Page 36: Dimensional Modeling Introduction By Sathish Yellanki


Fact Constellation
• Multiple Fact Tables Share Dimension Tables.
• This Schema is Viewed As A Collection of Stars, Hence Called Galaxy Schema OR Fact Constellation.
• Sophisticated Applications Require Such Schema.

Product Dimension: Product Identifier (PK), Product Name, Product Description, Date Time Stamp
Store Dimension: Store Identifier (PK), Store Name, City Name, State Name, Region Name, Date Time Stamp
Shipper Dimension: Shipper Identifier (PK), Shipper Name, Shipper Address Description, Date Time Stamp

Sales Fact: Product Identifier (FK), Store Identifier (FK), Units Sold, Price, Date Time Stamp
Shipping Fact: Product Identifier (FK), Store Identifier (FK), Shipper Identifier (FK), Units Shipped, Price, Date Time Stamp

Page 37: Dimensional Modeling Introduction By Sathish Yellanki


Representation of OLAP CUBE

• It is A Multi Dimensional Data Representation.

• A CUBE Can Be A Representation of Hypercube Which Can Be Multi-Dimensional in Nature To Make Data Analysis Easier.

• A Real Time OLAP CUBE Can Have More Than Three Dimensions.

• Basic OLAP Operations on OLAP CUBE Are Drill Down, Roll Up, Slice And Dice, Pivot.

[Figure: OLAP CUBE With Axes Customer, Product And Time. The Base Cell At (C1, P1, T1) Holds The Measures Revenue (R), Cost (C) And Profit (P).]
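A base cell can be pictured as one entry in a mapping keyed by its coordinates. The sketch below is a minimal illustration with made-up numbers; deriving profit as revenue minus cost is an assumption of this example, since the slide only names the three measures.

```python
# Hypothetical base cells keyed by (customer, product, time); values are the stored measures.
cube = {
    ("C1", "P1", "T1"): {"revenue": 100.0, "cost": 60.0},
    ("C1", "P2", "T1"): {"revenue": 80.0, "cost": 50.0},
    ("C2", "P1", "T2"): {"revenue": 120.0, "cost": 90.0},
}


def base_cell(customer, product, time):
    """Return the measures at one coordinate, with profit derived from revenue and cost."""
    measures = dict(cube[(customer, product, time)])
    # Assumption for this sketch: profit = revenue - cost.
    measures["profit"] = measures["revenue"] - measures["cost"]
    return measures


cell = base_cell("C1", "P1", "T1")
print(cell)
```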


• CUBE Operations Are Conducted Using The Hierarchical Summarization Paths.

• Base Table Stores The Measure, Which is A Function of The Different Dimensions Integrated To The Fact Table.

• Measures in The Fact Table Are Generally Precomputed Aggregations And Patterns Derived From The Atomic Values Residing in The OLTP System.

• The Aggregated Patterns Are Calculated Using Aggregate Functions Like SUM, COUNT, AVG, MAX, MIN, STDDEV, Etc.
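As a rough illustration of these aggregate functions, the sketch below summarizes a hypothetical `oltp_sales` table of atomic rows in SQLite. The table name and data are invented. SQLite has no built-in STDDEV, so that one measure is computed in Python instead.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oltp_sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO oltp_sales VALUES (?, ?)",
    [("TV", 100.0), ("TV", 300.0), ("DVD", 50.0), ("DVD", 70.0), ("DVD", 90.0)],
)

# Precompute one summary row per product from the atomic rows.
summary = conn.execute("""
    SELECT product, SUM(amount), COUNT(*), AVG(amount), MAX(amount), MIN(amount)
    FROM oltp_sales
    GROUP BY product
    ORDER BY product
""").fetchall()

# Population standard deviation for one product, computed outside SQLite.
tv_amounts = [row[0] for row in conn.execute(
    "SELECT amount FROM oltp_sales WHERE product = 'TV'")]
tv_stddev = statistics.pstdev(tv_amounts)

print(summary, tv_stddev)
```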

Hierarchical Summarization Paths Illustration

[Figure: Hierarchical Summarization Paths]
Industry → Category → Product
Region → Country → City → Office
Year → Quarter → Month → Day (Week → Day is A Separate Path)

• Hierarchical Summarization Paths Are Identified From The Dimensions That Are Connected To The Fact Table.

• The Hierarchical Paths Make The Loading of The Precomputed Aggregations into The CUBE Easier.


Drilling Up

• Drill Up Presents The Data At A Higher Level on The Hierarchy.

• Drill Up is Generally An Aggregation of The Data Analysis From Detail To Highest Summary.

Drilling Down

• Drill Down Presents The Data At A Lower Level on The Hierarchy.

• Drill Down is Generally Detailing of The Data Analysis From Highest Summary To Detail.

[Figure: Drill Up / Drill Down Hierarchy, From Highest Summary (Top) To Detail (Bottom)]
Reliance Retail
North Zone, South Zone, East Zone, West Zone
Andhra Pradesh, Tamil Nadu, Karnataka, Kerala
Hyderabad, Vizag, Vijayawada, Tirupathi
Store1, Store2, Store3
Drill Up Moves Toward Reliance Retail; Drill Down Moves Toward The Stores.
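Drilling up and down can be sketched as grouping the same facts at different levels of the hierarchy. In the sketch below, the `store_sales` table, the assignment of states to South Zone, the city Bengaluru, Store4, and all unit counts are hypothetical sample data chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_sales (zone TEXT, state TEXT, city TEXT, store TEXT, units INTEGER)"
)
conn.executemany("INSERT INTO store_sales VALUES (?, ?, ?, ?, ?)", [
    ("South Zone", "Andhra Pradesh", "Hyderabad", "Store1", 10),
    ("South Zone", "Andhra Pradesh", "Hyderabad", "Store2", 20),
    ("South Zone", "Andhra Pradesh", "Vizag", "Store3", 5),
    ("South Zone", "Karnataka", "Bengaluru", "Store4", 7),
])

LEVELS = ["zone", "state", "city", "store"]  # highest summary first


def units_at(level):
    """Total units grouped by every hierarchy level from the top down to `level`."""
    cols = ", ".join(LEVELS[: LEVELS.index(level) + 1])
    sql = f"SELECT {cols}, SUM(units) FROM store_sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


by_city = units_at("city")    # drill down: more detail
by_state = units_at("state")  # drill up: city totals aggregated into states
print(by_city)
print(by_state)
```

Moving one position left or right in `LEVELS` corresponds to one drill up or drill down step.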


Slice

• Slicing is The Process of Retrieving A Block of Data From A CUBE By Filtering on One Dimension.

• Slicing Focuses on Particular Partitions Along One OR More Dimensions.

• Slicing is Implemented in SQL Using The WHERE Clause.

[Figure: Original CUBE With Axes Customer (C1), Product (P1) And Time (T1), Shown Next To The Sliced Piece.]
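A slice can be sketched with a WHERE clause that fixes one dimension. The `cube` table, its column names, and its rows below are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cube (customer TEXT, product TEXT, time TEXT, revenue REAL)")
conn.executemany("INSERT INTO cube VALUES (?, ?, ?, ?)", [
    ("C1", "P1", "T1", 100.0),
    ("C1", "P2", "T1", 80.0),
    ("C2", "P1", "T1", 60.0),
    ("C1", "P1", "T2", 90.0),
])

# Slice: fix one dimension (time = 'T1') and keep the other two dimensions free.
sliced = conn.execute(
    "SELECT customer, product, revenue FROM cube WHERE time = ? "
    "ORDER BY customer, product",
    ("T1",),
).fetchall()
print(sliced)
```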


Dice

• Dicing is The Process of Retrieving A Block of Data From A Cube By Filtering on All Dimensions.

• Dicing Partitions The Cube into Smaller Sub-Cubes And The Points in Each Sub-Cube Are Aggregated.

• Dicing is Implemented in SQL Using The GROUP BY Clause.

[Figure: Original CUBE With Axes Customer (C1), Product (P1) And Time (T1), Shown Next To The Diced Piece.]
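Dicing in the sense used here (partition the cube, then aggregate each part) can be sketched with GROUP BY. The `cube` table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cube (customer TEXT, product TEXT, time TEXT, revenue REAL)")
conn.executemany("INSERT INTO cube VALUES (?, ?, ?, ?)", [
    ("C1", "P1", "T1", 100.0),
    ("C1", "P2", "T1", 80.0),
    ("C2", "P1", "T1", 60.0),
    ("C1", "P1", "T2", 90.0),
])

# Dice: partition the cube by (customer, time) and aggregate the points in each sub-cube.
diced = conn.execute("""
    SELECT customer, time, SUM(revenue)
    FROM cube
    GROUP BY customer, time
    ORDER BY customer, time
""").fetchall()
print(diced)
```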


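Final Visualization of CUBE

[Figure: CUBE With Three Axes, Each Ending in A SUM Total. Time: Qtr1, Qtr2, Qtr3, Qtr4, SUM. Products: TV, DVD, PC, SUM. Location: India, USA, Germany, SUM. The Corner Cell Holds The SUM of All SUMs.]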
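The SUM margins of such a cube can be sketched by adding, for every base cell, its value to each combination of kept and summed-out dimensions. The function name `full_cube`, the `"SUM"` marker, and the unit counts below are assumptions made for this illustration.

```python
from itertools import product

# Hypothetical base cells: (product, quarter, location) -> units sold.
sales = {
    ("TV", "Qtr1", "India"): 10,
    ("TV", "Qtr2", "USA"): 20,
    ("DVD", "Qtr1", "India"): 5,
    ("PC", "Qtr4", "Germany"): 8,
}

ALL = "SUM"  # marker for a dimension that has been summed out


def full_cube(cells):
    """Return the base cells plus every SUM margin, keyed like the base cells."""
    totals = {}
    for key, value in cells.items():
        # Each base cell feeds 2**3 aggregates: keep or sum out each dimension.
        for mask in product((False, True), repeat=len(key)):
            agg_key = tuple(ALL if summed else k for k, summed in zip(key, mask))
            totals[agg_key] = totals.get(agg_key, 0) + value
    return totals


cube = full_cube(sales)
print(cube[(ALL, ALL, ALL)])  # the corner cell: the sum over all dimensions
```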


Slicing And Dicing in SQL

Generic Syntax

SQL> SELECT Grouping-Attributes And Aggregations
     FROM Fact Table JOINED With (Zero OR More) Dimension Tables
     WHERE Attributes Compared To Constants    (Slicing)
     GROUP BY Grouping-Attributes              (Dicing)

Slicing And Dicing Illustration

Scenario

• Suppose A Particular Television Model, Say ‘SONY Bravio’, is Not Selling As Well As Anticipated. How To Analyze?

Assumption Parameter

• Maybe The Surround Sound is Not Good.

Solution Approach Level 1

• Slice For ‘SONY Bravio’.

• Dice For ‘Surround Sound’.

SQL> SELECT Surround_Sound, SUM(Price)
     FROM Sales NATURAL JOIN Televisions
     WHERE Model = 'SONY_BRAVIO'
     GROUP BY Surround_Sound;
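The Level 1 query can be tried against a small SQLite database. The slide does not give the table layouts, so the columns below (including `Serial_No` as the shared join column) and all rows are assumptions made for this sketch. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layouts: one row per television unit and one row per sale.
CREATE TABLE Televisions (Serial_No INTEGER PRIMARY KEY, Model TEXT,
                          Surround_Sound TEXT, Color TEXT);
CREATE TABLE Sales (Serial_No INTEGER, Date TEXT, Dealer TEXT, Price REAL);

INSERT INTO Televisions VALUES
    (1, 'SONY_BRAVIO', 'Dolby', 'RED'),
    (2, 'SONY_BRAVIO', 'Stereo', 'RED'),
    (3, 'SONY_BRAVIO', 'Dolby', 'BLACK'),
    (4, 'OTHER_MODEL', 'Dolby', 'RED');
INSERT INTO Sales VALUES
    (1, '2022-01-10', 'D1', 500),
    (2, '2022-01-15', 'D2', 450),
    (3, '2022-02-01', 'D1', 520),
    (4, '2022-02-03', 'D1', 300);
""")

# Slice on the model (WHERE), dice by surround sound (GROUP BY).
level1 = conn.execute("""
    SELECT Surround_Sound, SUM(Price)
    FROM Sales NATURAL JOIN Televisions
    WHERE Model = 'SONY_BRAVIO'
    GROUP BY Surround_Sound
    ORDER BY Surround_Sound
""").fetchall()
print(level1)
```

NATURAL JOIN matches on every column name the two tables share, so in this sketch the only shared name is `Serial_No`.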


• Assume That The Previous Query Doesn't Give Much Information Because Each ‘Surround Sound’ Value Generates About The Same Revenue.

• Since The Query Does Not Dice For Time, We Only See The Total Over All Time For Each ‘Surround Sound’.

Solution Approach Level 2

• Issue A Revised Query That Also Partitions Time By Month.

SQL> SELECT Surround_Sound, Month, SUM(Price)
     FROM (Sales NATURAL JOIN Televisions) JOIN Days ON Date = Day
     WHERE Model = 'SONY_BRAVIO'
     GROUP BY Surround_Sound, Month;

Solution Approach Level 3

• Issue A Revised Query That Also Partitions Each Month By Dealer.

SQL> SELECT Dealer, Month, SUM(Price)
     FROM (Sales NATURAL JOIN Televisions) JOIN Days ON Date = Day
     WHERE Model = 'SONY_BRAVIO' AND Color = 'RED'
     GROUP BY Month, Dealer;
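The Level 2 and Level 3 queries can be sketched the same way. Again the table layouts are assumptions: `Serial_No` as the shared join column, `Color` as a further attribute of `Televisions`, and a `Days` table mapping each `Day` to a `Month`. All rows are invented. The joins are written without parentheses, which is equivalent here because joins associate left to right, and ORDER BY is added only for a predictable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Televisions (Serial_No INTEGER PRIMARY KEY, Model TEXT,
                          Surround_Sound TEXT, Color TEXT);
CREATE TABLE Sales (Serial_No INTEGER, Date TEXT, Dealer TEXT, Price REAL);
CREATE TABLE Days (Day TEXT PRIMARY KEY, Month TEXT);

INSERT INTO Televisions VALUES
    (1, 'SONY_BRAVIO', 'Dolby', 'RED'),
    (2, 'SONY_BRAVIO', 'Stereo', 'RED'),
    (3, 'SONY_BRAVIO', 'Dolby', 'BLACK'),
    (4, 'OTHER_MODEL', 'Dolby', 'RED');
INSERT INTO Sales VALUES
    (1, '2022-01-10', 'D1', 500),
    (2, '2022-01-15', 'D2', 450),
    (3, '2022-02-01', 'D1', 520),
    (4, '2022-02-03', 'D1', 300);
INSERT INTO Days VALUES
    ('2022-01-10', '2022-01'), ('2022-01-15', '2022-01'),
    ('2022-02-01', '2022-02'), ('2022-02-03', '2022-02');
""")

# Level 2: same slice, but dice by surround sound and month.
level2 = conn.execute("""
    SELECT Surround_Sound, Month, SUM(Price)
    FROM Sales NATURAL JOIN Televisions JOIN Days ON Date = Day
    WHERE Model = 'SONY_BRAVIO'
    GROUP BY Surround_Sound, Month
    ORDER BY Surround_Sound, Month
""").fetchall()

# Level 3: narrow the slice with a second constant and dice by month and dealer.
level3 = conn.execute("""
    SELECT Dealer, Month, SUM(Price)
    FROM Sales NATURAL JOIN Televisions JOIN Days ON Date = Day
    WHERE Model = 'SONY_BRAVIO' AND Color = 'RED'
    GROUP BY Month, Dealer
    ORDER BY Month, Dealer
""").fetchall()

print(level2)
print(level3)
```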