Introduction to Data Warehousing

December 20, 2012

Tameem Ahmad, M.Tech. (F), ZHCET, AMU, Aligarh

04/10/2023 Tameem Ahmad 2

References:

• “Building the Data Warehouse” by Inmon (Third Edition), New York: John Wiley & Sons, 2002

• “Data Mining: Concepts and Techniques” by Han and Kamber, 2000

• http://www.data-warehouse-online.com/ [Accessed: November 4, 2012]

• “Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models” by Mary Breslin, http://www.bibestpractices.com/view-articles/4768

Plan for the Presentation

• Necessity of Data Warehousing (Why is it needed?)
• What is Data Warehousing?
• Architecture
• Schema
• How to build a Data Warehouse (components)
• Data Warehousing Tools

Necessity is the mother of invention…

Why Data Warehouse?

Scenario

• ABC Pvt. Ltd. is a company with branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

Scenario: ABC Pvt. Ltd.

Branches: Mumbai, Delhi, Chennai, Bangalore

Sales Manager: needs sales per item type per branch for the first quarter.

Solution: ABC Pvt. Ltd.

Extract sales information from each database.

Store the information in a common repository at a single site.
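
The two steps above can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions: the in-memory databases stand in for the four branch systems, and the table layout, sample rows and function names are hypothetical, not taken from the presentation.

```python
import sqlite3

# Hypothetical sample rows per branch: (item_type, sale_date, amount).
BRANCH_SALES = {
    "Mumbai": [("Laptop", "2012-01-15", 900.0), ("Phone", "2012-02-10", 300.0)],
    "Delhi": [("Laptop", "2012-03-05", 950.0), ("Phone", "2012-04-02", 280.0)],
    "Chennai": [("Phone", "2012-01-20", 310.0)],
    "Bangalore": [("Laptop", "2012-02-28", 920.0), ("Laptop", "2012-03-30", 880.0)],
}


def make_branch_db(rows):
    """Stand-in for one branch's separate operational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def build_warehouse(branch_dbs):
    """Extract sales from every branch and store them in one common repository."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, conn in branch_dbs.items():
        rows = conn.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
        wh.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)", [(branch, *row) for row in rows]
        )
    return wh


def first_quarter_report(wh, year="2012"):
    """Sales per item type per branch for January through March."""
    return wh.execute(
        "SELECT branch, item_type, SUM(amount) FROM sales "
        "WHERE sale_date >= ? AND sale_date < ? "
        "GROUP BY branch, item_type ORDER BY branch, item_type",
        (f"{year}-01-01", f"{year}-04-01"),
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
warehouse = build_warehouse(branch_dbs)
report = first_quarter_report(warehouse)
for row in report:
    print(row)
```

Once the data sits in one place, the manager's question becomes a single query instead of four separate extractions.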

Solution: ABC Pvt. Ltd.

Mumbai, Delhi, Chennai, Bangalore → Data Warehouse → Query & Analysis tools → Report → Sales Manager

Data Warehousing…

• Definition: A data warehouse is a
– subject-oriented,
– integrated,
– time-variant,
– nonvolatile
collection of data in support of management’s decision-making process.

Subject-oriented

• A data warehouse is organized around subjects such as sales, product and customer.

• It focuses on modeling and analysis of data for decision makers.

• It excludes data that is not useful in the decision support process.

Integration

• A data warehouse is constructed by integrating multiple heterogeneous sources.

• Data preprocessing is applied to ensure consistency.

RDBMS, Legacy System, Flat File → Data Processing, Data Transformation → Data Warehouse
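
As a rough illustration of integrating heterogeneous sources, the sketch below maps records from three differently shaped sources onto one consistent format. The field names, gender codes and date formats are assumptions made up for this example; real sources will differ.

```python
from datetime import datetime

# Hypothetical records as three different kinds of source might deliver them.
rdbms_rows = [{"cust_id": 1, "gender": "M", "joined": "2011-05-17"}]
legacy_rows = [{"CUSTNO": "0002", "SEX": "1", "JOIN_DT": "17/06/2010"}]
flat_file_lines = ["3,female,2012.01.09"]

# One agreed encoding for a value that each source spells differently.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def to_iso(date_text, fmt):
    """Parse a source-specific date format and return an ISO 8601 date string."""
    return datetime.strptime(date_text, fmt).date().isoformat()


def integrate():
    """Return (customer_id, gender, join_date) tuples in one consistent format."""
    unified = []
    for r in rdbms_rows:
        unified.append(
            (int(r["cust_id"]), GENDER_CODES[r["gender"].lower()], to_iso(r["joined"], "%Y-%m-%d"))
        )
    for r in legacy_rows:
        unified.append(
            (int(r["CUSTNO"]), GENDER_CODES[r["SEX"]], to_iso(r["JOIN_DT"], "%d/%m/%Y"))
        )
    for line in flat_file_lines:
        cust_id, gender, joined = line.split(",")
        unified.append((int(cust_id), GENDER_CODES[gender.lower()], to_iso(joined, "%Y.%m.%d")))
    return unified


customers = integrate()
print(customers)
```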

Time-variant

• Provides information from a historical perspective, e.g. the past 5-10 years.

Nonvolatile

• Data once recorded cannot be updated.
• A data warehouse requires only two operations in data accessing:
– Initial loading of data
– Access of data
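
The load/access idea can be sketched as a small append-only store. The class and method names here are hypothetical and the store is held in memory purely for illustration; the point is only that there is no update or delete operation.

```python
class NonvolatileStore:
    """Append-only store: rows can be loaded and read, never changed in place."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Loading: append a batch of rows as immutable tuples."""
        self._rows.extend(tuple(row) for row in rows)

    def access(self, predicate=None):
        """Access: return a copy of the rows matching an optional filter."""
        if predicate is None:
            return list(self._rows)
        return [row for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([("Mumbai", "2012-Q1", 1200.0), ("Delhi", "2012-Q1", 950.0)])
store.load([("Mumbai", "2012-Q2", 1300.0)])
mumbai_rows = store.access(lambda row: row[0] == "Mumbai")
print(mumbai_rows)
```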

Data Warehousing Architecture

Data Warehousing Architecture (Contd.)

• Data warehouse server
– almost always a relational DBMS, rarely flat files
• OLAP servers
– to support and operate on multi-dimensional data structures
• Clients
– Query and reporting tools
– Analysis tools
– Data mining tools

Data Warehousing Schema

• Star Schema
• Snowflake Schema

Measures & Dimensions

• Measures – Units sold, Amount

• Dimensions – Product, Time, Region
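
To make the distinction concrete: measures are the numbers being summed, and dimensions are the attributes the sums are grouped by. The sketch below uses made-up sample rows and a hypothetical summarize helper.

```python
from collections import defaultdict

# Hypothetical rows: (product, quarter, region, units_sold, amount).
sales = [
    ("Pen", "Q1", "North", 10, 50.0),
    ("Pen", "Q1", "South", 4, 20.0),
    ("Book", "Q1", "North", 2, 30.0),
    ("Pen", "Q2", "North", 6, 30.0),
]

# Column position of each dimension; the last two columns are the measures.
DIMENSIONS = {"product": 0, "time": 1, "region": 2}


def summarize(rows, *dims):
    """Total the measures (units sold, amount) for each combination of dimensions."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key][0] += row[3]
        totals[key][1] += row[4]
    return {key: tuple(value) for key, value in totals.items()}


by_product = summarize(sales, "product")
by_region_time = summarize(sales, "region", "time")
print(by_product)
print(by_region_time)
```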

Star Schema

• A single, large and central fact table and one table for each dimension.

• Every fact points to one tuple in each of the dimensions and has additional attributes.

• Does not capture hierarchies directly.

Star Schema (Contd.)

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
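
The star schema on this slide could be declared roughly as follows, using Python's sqlite3 module. The table names, column types and sample rows are assumptions for illustration; only the column lists come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT, state TEXT, region TEXT
    );
    CREATE TABLE time_dim (
        period_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT
    );
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY, product_desc TEXT
    );
    -- Each fact row points to exactly one row in every dimension table.
    CREATE TABLE sales_fact (
        store_key INTEGER REFERENCES store_dim(store_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        period_key INTEGER REFERENCES time_dim(period_key),
        units INTEGER,
        price REAL
    );
    """
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)",
    [(1, "Store A", "Mumbai", "Maharashtra", "West"), (2, "Store B", "Delhi", "Delhi", "North")],
)
conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?, ?, ?)",
    [(1, 2012, "Q1", "Jan"), (2, 2012, "Q2", "Apr")],
)
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Pen"), (2, "Book")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 10, 5.0), (1, 2, 1, 3, 15.0), (2, 1, 1, 7, 5.0), (2, 1, 2, 4, 5.0)],
)

# Units per region per quarter: one join per dimension used.
units_by_region_quarter = conn.execute(
    """
    SELECT s.region, t.quarter, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN time_dim AS t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
    """
).fetchall()
print(units_by_region_quarter)
```

Region is a plain column of the store dimension here, so grouping by region needs only the single join to store_dim.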

Snowflake Schema

• Variant of the star schema model.
• A single, large and central fact table and one or more tables for each dimension.
• Dimension tables are normalized, i.e. dimension table data is split into additional tables.

Snowflake Schema (Contd.)

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City Key

City Dimension: City Key, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc

Drawbacks: Time-consuming joins, slow report generation.
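
The extra join that normalization introduces can be seen in a small sqlite3 sketch. Only the store, city and fact tables are created here (the time and product dimensions are left out for brevity); table names, types and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- City attributes are split out of the store dimension into their own table.
    CREATE TABLE city_dim (
        city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT
    );
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY,
        store_name TEXT,
        city_key INTEGER REFERENCES city_dim(city_key)
    );
    CREATE TABLE sales_fact (
        store_key INTEGER REFERENCES store_dim(store_key),
        product_key INTEGER,
        period_key INTEGER,
        units INTEGER,
        price REAL
    );
    """
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO city_dim VALUES (?, ?, ?, ?)",
    [(1, "Mumbai", "Maharashtra", "West"), (2, "Delhi", "Delhi", "North")],
)
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?, ?)",
    [(1, "Store A", 1), (2, "Store B", 2), (3, "Store C", 1)],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 10, 5.0), (2, 1, 1, 7, 5.0), (3, 2, 1, 3, 15.0)],
)

# Grouping by region now takes two joins: fact -> store -> city.
units_by_region = conn.execute(
    """
    SELECT c.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN city_dim AS c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(units_by_region)
```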

Building the Data Warehouse

• Data Selection

• Data Pre-processing

– Fill missing values

– Remove inconsistency

• Data Transformation & Integration

• Data Loading

Data in the warehouse is stored in the form of fact tables and dimension tables.
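
The four steps listed above can be strung together as a small pipeline. This is a sketch under assumptions: the raw records, the choice to fill missing units with the integer mean of the known values, and all function names are hypothetical.

```python
# Hypothetical raw records from a source system.
raw = [
    {"store": "Store A", "product": "pen", "units": 10, "price": 5.0, "note": "x"},
    {"store": "Store A", "product": "Pen ", "units": None, "price": 5.0, "note": ""},
    {"store": "store b", "product": "Book", "units": 4, "price": 15.0, "note": ""},
]


def select(rows, fields=("store", "product", "units", "price")):
    """Data selection: keep only the fields relevant to analysis."""
    return [{f: r[f] for f in fields} for r in rows]


def preprocess(rows):
    """Pre-processing: fill missing units and remove spelling inconsistencies."""
    known = [r["units"] for r in rows if r["units"] is not None]
    default_units = sum(known) // len(known) if known else 0
    return [
        {
            "store": r["store"].strip().title(),
            "product": r["product"].strip().title(),
            "units": default_units if r["units"] is None else r["units"],
            "price": r["price"],
        }
        for r in rows
    ]


def transform(rows):
    """Transformation: derive the amount measure from units and price."""
    return [{**r, "amount": r["units"] * r["price"]} for r in rows]


def load(rows):
    """Loading: split rows into dimension tables and a fact table."""
    store_dim, product_dim, fact = {}, {}, []
    for r in rows:
        store_key = store_dim.setdefault(r["store"], len(store_dim) + 1)
        product_key = product_dim.setdefault(r["product"], len(product_dim) + 1)
        fact.append((store_key, product_key, r["units"], r["amount"]))
    return store_dim, product_dim, fact


store_dim, product_dim, fact_table = load(transform(preprocess(select(raw))))
print(store_dim, product_dim, fact_table)
```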

Data Warehousing Tools

• Data Warehouse
– SQL Server 2000 DTS
– Oracle 8i Warehouse Builder
• ETL tools
– Ab Initio
– Informatica
• OLAP tools
– SQL Server Analysis Services
– Oracle Express Server
• Reporting tools
– MS Excel Pivot Chart
– VB Applications
– Cognos
– MicroStrategy
– Hyperion

Thank You
