Data Warehouse Concepts
Page 1: Data Warehouse Concepts_Final.pptx

Data Warehouse Concepts

Page 2: Data Warehouse Concepts_Final.pptx

DWH-Training Material 2

Chapter 1

• Data, Information, Knowledge, Decision
• Analysis
• Report

Chapter 2

• Normalization
• OLTP Systems
• Characteristics of OLTP

Chapter 3

• Data Warehouse
• Advantages of Data Warehouse
• Goals of Data Warehouse

Page 3: Data Warehouse Concepts_Final.pptx

Chapter 4

• Characteristics of Data Warehouse
• Difference between OLTP/DW
• OLAP
• Data Warehouse/Data Mart
• Data Warehouse Strategies

Chapter 5

• Dimension Modeling
• Star Schema
• Snowflake Schema
• Dimension Table
• Conformed Dimension
• Degenerated Dimension

Chapter 6

• Fact Table
• Types of Fact
• Metadata Management

Page 4: Data Warehouse Concepts_Final.pptx

Chapter 7

• Grain Level
• Surrogate Key
• Time Dimension
• Staging Area
• Slowly Changing Dimensions

Chapter 8

• Project Overview
• Phases of Project

Page 5: Data Warehouse Concepts_Final.pptx

Data >> Decision

• Data - Raw observations, no meaning

• Information - Meaning by relational connection

• Knowledge - Appropriate collection of information; the intent is to be useful and to change the business process

Page 6: Data Warehouse Concepts_Final.pptx

What is Knowledge?

Data >> Information >> Knowledge >> Action

• Data - Raw facts, numbers, readily captured
• Information - Data in context
• Knowledge - Information + experience
• Action - Knowledge applied to decision making

[Diagram axis label: Strategic Value]

Page 7: Data Warehouse Concepts_Final.pptx

Analysis

• Compare the sales (fact) of a product (dimension) over years (dimension) in the same region (dimension).

• What is the total sales value (fact) of a particular product (dimension) in a store (dimension) over three months (dimension)?

• What is the amount spent (fact) on a particular product promotion (dimension) in a particular branch (dimension), in a particular city (dimension), in a year (dimension)?
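
The questions above share one shape: pick a fact, fix some dimensions, and aggregate. The following is a minimal Python sketch of that idea. The rows, product names, store names, and amounts are made up for illustration and do not come from the training material.

```python
from collections import defaultdict

# Hypothetical sales rows: (product, store, month, sales_value).
# product, store and month are dimensions; sales_value is the fact.
SALES = [
    ("Soap", "Store A", "2023-01", 120.0),
    ("Soap", "Store A", "2023-02", 80.0),
    ("Soap", "Store A", "2023-03", 100.0),
    ("Soap", "Store B", "2023-01", 50.0),
    ("Tea", "Store A", "2023-01", 70.0),
]


def total_sales(rows, product, store, months):
    """Sum the fact for one product in one store over the given months."""
    return sum(
        value
        for p, s, m, value in rows
        if p == product and s == store and m in months
    )


def sales_by(rows, dim_index):
    """Total the fact grouped by one dimension column (0=product, 1=store, 2=month)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim_index]] += row[3]
    return dict(totals)


quarter = {"2023-01", "2023-02", "2023-03"}
print(total_sales(SALES, "Soap", "Store A", quarter))  # one product, one store, 3 months
print(sales_by(SALES, 0))                              # total sales per product
```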

Page 8: Data Warehouse Concepts_Final.pptx

• Report: Collection of Data

• Purpose: Analysis - comparative study of data, including historical data

• Final: Improve Decision

Page 9: Data Warehouse Concepts_Final.pptx

Chapter 2

Page 10: Data Warehouse Concepts_Final.pptx

Normalization

• Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process:
– Eliminating redundant data
– Ensuring data dependencies make sense

First Normal Form

• First Normal Form (1NF) sets the very basic rules for an organized database:
– Eliminate duplicate columns from the same table.
– Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Page 11: Data Warehouse Concepts_Final.pptx

Second Normal Form

• Second Normal Form (2NF) further addresses the concept of removing duplicative data:
– Meet all the requirements of the first normal form.
– Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
– Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form

• Third Normal Form (3NF) has two requirements:
– Meet all the requirements of the second normal form.
– Remove columns that are not dependent upon the primary key.
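
As an illustration of these rules, the sketch below splits a flat table, in which customer details repeat on every order, into a customers table and an orders table linked by a foreign key. The table layout, the field names, and the assumption that customer names are unique are all hypothetical and chosen only to keep the example short.

```python
# Hypothetical un-normalized rows: customer details repeat on every order.
FLAT_ORDERS = [
    {"order_id": 1, "customer_name": "Asha", "customer_city": "Pune", "amount": 250},
    {"order_id": 2, "customer_name": "Asha", "customer_city": "Pune", "amount": 100},
    {"order_id": 3, "customer_name": "Ravi", "customer_city": "Delhi", "amount": 75},
]


def normalize(flat_rows):
    """Split flat rows into a customers table and an orders table.

    customer_city depends on the customer, not on order_id, so it moves
    to the customers table and each order keeps only a foreign key.
    """
    customers = {}    # customer_id (primary key) -> attributes, stored once
    ids_by_name = {}  # assumption: the customer name identifies a customer
    orders = []
    for row in flat_rows:
        name = row["customer_name"]
        if name not in ids_by_name:
            ids_by_name[name] = len(ids_by_name) + 1
            customers[ids_by_name[name]] = {
                "name": name,
                "city": row["customer_city"],
            }
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": ids_by_name[name],  # foreign key to customers
                "amount": row["amount"],
            }
        )
    return customers, orders


customers, orders = normalize(FLAT_ORDERS)
print(customers)
print(orders)
```

After the split, a change of city for a customer is a single update in the customers table instead of one update per order.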

Page 12: Data Warehouse Concepts_Final.pptx

Information System/OLTP Systems

• OLTP systems use highly normalized databases.
• The purpose of OLTP systems is to capture data.
• They perform DML activities.
• By contrast, the purpose of a Data Warehouse is multidimensional analysis.
• Examples of OLTP applications: Equity Plans, Shares, Insurance, Loans, Savings.

Page 13: Data Warehouse Concepts_Final.pptx

Characteristics of OLTP

Characteristic            OLTP
Operation                 Insert/Update
Analytical Requirements   Low
Data per Transaction      Small
Data Level                Detailed
Orientation               Records

Page 14: Data Warehouse Concepts_Final.pptx

Business Intelligence

• From an information systems standpoint, BI provides users with online analytical processing or data analysis capabilities to predict trends, evaluate business questions and so on.

• From a business analyst viewpoint, it is the process of gathering high quality, meaningful information about a subject, which enables the analyst to draw conclusions.

Page 15: Data Warehouse Concepts_Final.pptx

Chapter 3

Page 16: Data Warehouse Concepts_Final.pptx

Data Warehouse

• Data warehousing is the entire process of data extraction, transformation and loading of data to the warehouse and the access of the data by end users and applications.

Page 17: Data Warehouse Concepts_Final.pptx

Data Warehouse Architecture

Page 18: Data Warehouse Concepts_Final.pptx

Advantages through DW

• Acquire new customers
• Retain existing customers
• Improve customer satisfaction
• Sell more products

Page 19: Data Warehouse Concepts_Final.pptx

Goals of Data Warehouse

• Easy access to organization information
• Data Warehouse must be adaptive and resilient to change
• Secure environment to protect information assets
• Foundation for improved decision making

Page 20: Data Warehouse Concepts_Final.pptx

Chapter 4

Page 21: Data Warehouse Concepts_Final.pptx

Data Warehouse Characteristics

• Subject-Oriented
• Integrated
• Non-Volatile
• Time-Variant

Page 22: Data Warehouse Concepts_Final.pptx

Difference - OLTP and DW

• They are both databases.
• They both hold data.
• But they have been designed for different purposes: running the business (OLTP systems) v/s managing the business (DWH):
– Operational systems focus on present data. DWHs focus on historical data (present and past).
– OLTP systems are optimized to insert/update and store data. DWHs are optimized to select/analyze data.

Page 23: Data Warehouse Concepts_Final.pptx

OLTP v/s Data Warehouse

                OLTP                               OLAP (DW)
Access          Read/Write                         Read - lots of scans
Unit of Work    Short, simple transaction          Query
# Users         Thousands                          Hundreds
DB Size         100 MB - GB                        100 GB - Terabytes
Function        Day to Day Operations              Decision Support
DB Design       Application Oriented               Subject Oriented
Data            Current, up to date, detailed      Historical, summarized

Page 24: Data Warehouse Concepts_Final.pptx

OLAP

• OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations and trend analysis. OLAP enables end-users to perform ad hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making.

• OLAP operations
– Roll-up
– Drill-down
– Slice and dice
– Pivot (rotate)
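
The OLAP operations listed here can be sketched on a tiny in-memory cube. This is an illustrative assumption of how the operations behave, not the API of any OLAP product; the cell values, the dimension names, and the helper functions `aggregate`, `slice_`, and `pivot` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, quarter, region, product, sales)
CELLS = [
    (2023, "Q1", "North", "Soap", 10),
    (2023, "Q2", "North", "Soap", 20),
    (2023, "Q1", "South", "Soap", 5),
    (2023, "Q1", "North", "Tea", 7),
    (2024, "Q1", "North", "Soap", 12),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(cells, dims):
    """Sum sales grouped by the named dimensions."""
    totals = defaultdict(int)
    for cell in cells:
        key = tuple(cell[DIMS[d]] for d in dims)
        totals[key] += cell[4]
    return dict(totals)


def slice_(cells, dim, value):
    """Slice: keep only the cells where one dimension equals a value.
    Applying it on several dimensions in turn gives a dice."""
    return [c for c in cells if c[DIMS[dim]] == value]


def pivot(table):
    """Pivot (rotate): swap the two axes of a two-dimension result."""
    return {(b, a): v for (a, b), v in table.items()}


by_quarter = aggregate(CELLS, ["year", "quarter"])  # detailed view (drill-down target)
by_year = aggregate(CELLS, ["year"])                # roll-up: quarter -> year
north = aggregate(slice_(CELLS, "region", "North"), ["product"])
rotated = pivot(by_quarter)                         # keys become (quarter, year)
print(by_year, by_quarter, north, rotated, sep="\n")
```

Going from `by_year` to `by_quarter` is a drill-down; the reverse direction is the roll-up.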

Page 25: Data Warehouse Concepts_Final.pptx

Data Mart – Data Warehouse

• A Data Mart stores data for a limited number of subject areas, such as marketing or sales data.

• A Data Warehouse deals with multiple subject areas and is typically implemented and controlled by a central organizational unit such as the corporate information factory. It is often called a central or enterprise data warehouse.

Page 26: Data Warehouse Concepts_Final.pptx

Data Warehouse / Data Marts

Property              Data Warehouse     Data Mart
Scope                 Enterprise         Department
Subjects              Multiple           Single
Data Source           Many               Few
Implementation time   Months to Years    Months

Page 27: Data Warehouse Concepts_Final.pptx

Data Warehousing Strategies

• Enterprise-wide warehouse, top down: the Inmon methodology

• Data mart, bottom up: the Kimball methodology

• When properly executed, both result in an enterprise-wide data warehouse, but with different architectures

Page 28: Data Warehouse Concepts_Final.pptx

Top Down Approach

[Diagram: Operational Systems and External Data feed the Data Warehouse, which in turn feeds the Data Marts (Marketing, Sales, Finance).]

Page 29: Data Warehouse Concepts_Final.pptx

Bottom Up Approach

[Diagram: Legacy Data, Operations Data and External data sources feed the Data Marts (Marketing, Sales, Finance), which in turn feed the Data Warehouse.]

Page 30: Data Warehouse Concepts_Final.pptx

Chapter 5 and Chapter 6

Page 31: Data Warehouse Concepts_Final.pptx

Data Warehouse Architecture

Page 32: Data Warehouse Concepts_Final.pptx

Dimensional Modeling

• Dimensional Modeling provides users the ability to view data based on the organization of the business and the important characteristics of the data.

• There are two major components of dimensional analysis: Dimensions, which determine how data will be presented; and Facts, which determine what data will be presented.

Page 33: Data Warehouse Concepts_Final.pptx

Dimension Table Examples

• Retail – store name, zip code, product name, product category, day of the week
• Telecommunication – call origin, call destination
• Banking – customer name, account number, branch, account officer
• Insurance – policy type, insured party

Page 34: Data Warehouse Concepts_Final.pptx

Dimension Table Characteristics

Dimension tables have the following characteristics:
• Contain textual information that represents the attributes of the business
• Contain relatively static data
• Are joined to a fact table through a foreign key reference
• Are hierarchical in nature and provide the ability to view data at varying levels of detail

Page 35: Data Warehouse Concepts_Final.pptx

Fact Table Examples

• Retail – number of units sold, sales amount

• Telecommunications – length of the call in minutes, average number of calls

• Banking – average monthly balance

• Insurance – claims amount

Page 36: Data Warehouse Concepts_Final.pptx

Fact Table Characteristics

• Fact tables have the following characteristics:
– Contain numerical metrics of the business
– Can hold large volumes of data
– Can grow quickly
– Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Page 37: Data Warehouse Concepts_Final.pptx

Star Schema
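
The slide's diagram is not part of this transcript. As a stand-in, here is a minimal sketch of a star schema built with Python's standard `sqlite3` module: one fact table holding numeric measures and foreign keys, surrounded by dimension tables holding descriptive attributes. All table names, column names, and rows are illustrative assumptions based on the retail examples on the earlier slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive, relatively static attributes.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name TEXT, city TEXT);
    CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: measures plus one foreign key per dimension.
    -- REFERENCES documents the link; SQLite enforces it only when
    -- PRAGMA foreign_keys is switched on.
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        units_sold   INTEGER,
        sales_amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'Soap', 'Toiletries'), (2, 'Tea', 'Grocery');
    INSERT INTO dim_store   VALUES (1, 'Store A', 'Pune'), (2, 'Store B', 'Delhi');
    INSERT INTO dim_date    VALUES (202301, 2023, 1), (202302, 2023, 2);
    INSERT INTO fact_sales  VALUES
        (1, 1, 202301, 3, 30.0),
        (1, 1, 202302, 2, 20.0),
        (1, 2, 202301, 1, 10.0),
        (2, 1, 202301, 4, 40.0);
    """
)

# Join the fact table to two dimensions and aggregate the measure.
rows = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category
    """
).fetchall()
print(rows)
```

In a snowflake schema, an attribute such as `category` would move out of `dim_product` into its own table, which `dim_product` would then reference.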

Page 38: Data Warehouse Concepts_Final.pptx

Snowflake Schema

Page 39: Data Warehouse Concepts_Final.pptx

Conformed Dimensions

• A dimension table that is shared across data marts or by more than one fact table
• Examples:
– Calendar/Date/Time Dimension
– Customer Dimension
– Product Dimension

Page 40: Data Warehouse Concepts_Final.pptx

Degenerated Dimension

• A degenerate dimension is dimensional in nature but exists only in the fact table, without a dimension table of its own (for example, an invoice or order number).

Page 41: Data Warehouse Concepts_Final.pptx

Fact Tables

• Types of Measures
– Additive facts
– Non-additive facts
– Semi-additive facts

Page 42: Data Warehouse Concepts_Final.pptx

Fact Tables

• Additive Facts
– Additive facts are facts that can be summed up through all of the dimensions in the fact table.

Example: Dollar value is an additive fact. To find the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with the total amount.

Page 43: Data Warehouse Concepts_Final.pptx

• Non-Additive Facts
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

Example: Height, measured for ‘citizens by geographical location’. When we roll up ‘city’ data to ‘state’ level, we should not add the heights of the citizens; rather, we may want to use the rows to derive a ‘count’.

Example: Percentage (%)

Page 44: Data Warehouse Concepts_Final.pptx

• Semi-Additive Facts
Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
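
The slide gives no example for this case. A commonly used one is an account balance: balances can be added across accounts for a single day, but adding one account's balance across days is meaningless. The sketch below assumes hypothetical end-of-day balance snapshots; the account names and amounts are made up.

```python
# Hypothetical end-of-day balance snapshots: (account, day, balance)
BALANCES = [
    ("ACC-1", 1, 100.0),
    ("ACC-1", 2, 150.0),
    ("ACC-2", 1, 200.0),
    ("ACC-2", 2, 180.0),
]


def total_on_day(rows, day):
    """Valid: balances add up across the account dimension for one day."""
    return sum(balance for _, d, balance in rows if d == day)


def closing_balance(rows, account):
    """Across the time dimension, take the latest snapshot instead of summing."""
    _, balance = max((d, b) for a, d, b in rows if a == account)
    return balance


def average_balance(rows, account):
    """Another time-safe aggregate: the average over the snapshots."""
    values = [b for a, _, b in rows if a == account]
    return sum(values) / len(values)


print(total_on_day(BALANCES, 2))
print(closing_balance(BALANCES, "ACC-1"))
print(average_balance(BALANCES, "ACC-1"))

# Summing ACC-1 over both days gives 250.0, which is not a balance
# the account ever held.
naive_sum_over_time = sum(b for a, _, b in BALANCES if a == "ACC-1")
```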

Page 45: Data Warehouse Concepts_Final.pptx

Factless Fact Table

• A factless fact table is a fact table that does not have any measures.

Fact table (keys only, no measures): Teacher_FK, Course_FK, Student_FK, Location_FK

Dimension tables:
• Student_Dimension (Student_PK)
• Course_Dimension (Course_PK)
• Location Dimension (Location_PK)
• Teacher Dimension (Teacher_PK)
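
Although a factless fact table stores no measure, counting its rows still answers questions. A minimal sketch, assuming each row records one student attending one course (the key values below are made up):

```python
from collections import Counter

# Hypothetical attendance events. Each row holds only dimension keys:
# (Teacher_FK, Course_FK, Student_FK, Location_FK). There is no measure column.
ATTENDANCE = [
    (1, 10, 100, 5),
    (1, 10, 101, 5),
    (2, 11, 100, 6),
]


def students_per_course(rows):
    """Count rows per Course_FK; the count is derived, not stored."""
    return Counter(course for _, course, _, _ in rows)


def courses_for_student(rows, student_key):
    """List the distinct courses one student attended."""
    return sorted({course for _, course, student, _ in rows if student == student_key})


print(students_per_course(ATTENDANCE))
print(courses_for_student(ATTENDANCE, 100))
```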

Page 46: Data Warehouse Concepts_Final.pptx

Metadata

• It is data about data
• Vital to the warehouse
• Used by everyone
• The key to understanding warehouse information

Page 47: Data Warehouse Concepts_Final.pptx

Chapter 7

Page 48: Data Warehouse Concepts_Final.pptx

Grain Level

• The level of detail at which data is captured in the fact table

Examples:
• Each sales transaction
• Each insurance claim transaction
• Monthly account

Page 49: Data Warehouse Concepts_Final.pptx

Surrogate Keys

• A surrogate key has no business meaning; it only states uniqueness for each record. It is used to implement the primary keys of almost all dimension tables, and the fact table references it as a foreign key.
• It is just a sequence number.
• Advantages of surrogate keys include:
– Control over data
– Avoid using the OLTP keys as data warehouse keys
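
One way to picture a surrogate key is as a lookup that hands out the next sequence number the first time an OLTP (natural) key is seen. The class below is a hypothetical sketch of that idea, not a feature of any particular ETL tool; the key formats are invented.

```python
import itertools


class SurrogateKeyMap:
    """Assigns a sequence number to each natural (OLTP) key it sees."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self._keys = {}  # natural key -> surrogate key

    def key_for(self, natural_key):
        """Return the existing surrogate key, or allocate the next number."""
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]


customer_keys = SurrogateKeyMap()
first = customer_keys.key_for("CRM-00042")   # new natural key: gets 1
second = customer_keys.key_for("ERP-9")      # a key from another source system: gets 2
repeat = customer_keys.key_for("CRM-00042")  # already known: same key as before
print(first, second, repeat)
```

Because the warehouse owns the numbering, keys from different source systems cannot collide, and a source system can change its own key format without touching the fact table.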

Page 50: Data Warehouse Concepts_Final.pptx

Data Staging

• Often used as an interim step between data extraction and later steps
• No end user access to staging

Source >> Staging >> Target

Page 51: Data Warehouse Concepts_Final.pptx

Slowly Changing Dimensions(SCD)

• Slowly changing dimensions change gradually and occasionally over time.

Example: Employees change their address, name, or marital status.

Page 52: Data Warehouse Concepts_Final.pptx

SCD Type 1
– Approach: Overwrite the old values in the dimension record
– Results: Only current. Loses the ability to track the old history.

SCD Type 2
– Approach: Create an additional dimension record (with a time stamp) at the time of the change, holding the new attribute values
– Results: History + current. Segments history very accurately between the old description and the new description.

SCD Type 3
– Approach: Create new ‘current’ fields and move the old attribute value into a ‘previous’ field
– Results: Previous + current. Describes both a historical and a current view.
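
The three approaches can be compared side by side on a single dimension row. The sketch below is illustrative only: the column names (`surrogate_key`, `natural_key`, `valid_from`, `valid_to`, `is_current`, and the `previous_` prefix) are assumptions, and real implementations differ in how they time-stamp rows.

```python
def scd_type1(dim_rows, natural_key, attr, new_value):
    """Type 1: overwrite in place; the old value is lost."""
    for row in dim_rows:
        if row["natural_key"] == natural_key:
            row[attr] = new_value


def scd_type2(dim_rows, natural_key, attr, new_value, change_date):
    """Type 2: expire the current row and append a new, time-stamped row."""
    current = next(
        r for r in dim_rows if r["natural_key"] == natural_key and r["is_current"]
    )
    current["is_current"] = False
    current["valid_to"] = change_date
    new_row = dict(current)  # copy, then override what changed
    new_row.update(
        {
            attr: new_value,
            "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
            "valid_from": change_date,
            "valid_to": None,
            "is_current": True,
        }
    )
    dim_rows.append(new_row)


def scd_type3(dim_rows, natural_key, attr, new_value):
    """Type 3: move the old value to a 'previous_<attr>' field, then overwrite."""
    for row in dim_rows:
        if row["natural_key"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value


def employee_dim():
    """Fresh copy of a hypothetical one-row employee dimension."""
    return [
        {
            "surrogate_key": 1,
            "natural_key": "E7",
            "city": "Pune",
            "valid_from": "2020-01-01",
            "valid_to": None,
            "is_current": True,
        }
    ]


t1 = employee_dim()
scd_type1(t1, "E7", "city", "Delhi")
t2 = employee_dim()
scd_type2(t2, "E7", "city", "Delhi", "2024-06-01")
t3 = employee_dim()
scd_type3(t3, "E7", "city", "Delhi")
print(t1, t2, t3, sep="\n")
```

Only Type 2 adds a row, which is why it needs a surrogate key: the same employee now appears under two keys, one per version.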

Page 53: Data Warehouse Concepts_Final.pptx

[Diagram: project team structure. Labels: Project Manager, Business Analyst, Architect, Source System Study, Data Modeler, ETL Lead, ETL Devs/Cons, OLAP Lead, OLAP Devs/Cons, DBA, Test Lead, Tester.]

Page 54: Data Warehouse Concepts_Final.pptx

Phases of Project

Phase 1 - Define

Phase 2 - Analysis

Phase 3 - Design

Phase 4 - Build

Phase 5 - Test

Phase 6 - Production

Page 55: Data Warehouse Concepts_Final.pptx

The Define Phase

Sol ID Hand off

Revisit Effort Estimation

Business Vision/Goal

Project Plan

Resource Plan

Analyze Risk

Communication Plan

Escalation Plan

CTS’s or CTQ’s

Sample Weekly Report

Page 56: Data Warehouse Concepts_Final.pptx

The Analysis Phase

Sample Report Requirement

Source System Study

Business Requirement

Gap Analysis

Fact Dimension Matrix

Initiate Capacity Planning

Evaluate ETL Tools

Evaluate OLAP Tools

Loading Strategy-ETL

Availability of reusable components

Technical Architecture Strategy

Page 57: Data Warehouse Concepts_Final.pptx

The Design Phase

Design Technical Architecture

Logical Model

Design Alternate Solution

Physical Model

Set up Dev/Test Environment

Design ETL Architecture

ETL Specification

ETL Test Plan

Design OLAP Architecture

Reporting Specifications

Reporting Test Plan

Page 58: Data Warehouse Concepts_Final.pptx

The Build Phase

Create Database

Build ETL Mappings

Test ETL Mappings

Build OLAP Reports

Test OLAP Reports

Page 59: Data Warehouse Concepts_Final.pptx

The Test Phase

Train End users

Report Validation

Data Load Testing

UAT-User Acceptance Testing

Production Readiness Checklist

Page 60: Data Warehouse Concepts_Final.pptx

Transition to Production Phase

ETL-First time loading

Post Implementation Support

OLAP-Report Validation

Project Closure Report

Monitor System Performance