Data Warehousing - Chetan Gadodia
Page 1: Introduction to Datawarehousing.

DataWarehousing

- Chetan Gadodia

Page 2: Introduction to Datawarehousing.

What’s Warehousing?

• Large volume of data (GB, TB)
• Non-volatile
• Historical
• Time attributes are important
• Updates infrequent
• May be append-only

Page 3: Introduction to Datawarehousing.

What’s Data Warehousing?

Data warehousing is the process of:
• Extracting data
• Integrating it
• Filtering it
• Standardizing it
• Transforming it
• Cleaning & quality checking it
• Storing it in a consolidated database
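The steps on this slide can be sketched as a tiny extract, transform, load pipeline. This is only an illustration under stated assumptions: the rows in `RAW_ROWS`, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the consolidated database.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from different source systems.
RAW_ROWS = [
    {"customer": " alice ", "amount": "10.50", "source": "crm"},
    {"customer": "BOB", "amount": "n/a", "source": "weblog"},  # fails the quality check
    {"customer": "Carol", "amount": "7", "source": "crm"},
    {"customer": "dave", "amount": "3.25", "source": "test"},  # filtered out
]


def extract():
    """Extract: return the raw rows (a stand-in for reading source systems)."""
    return list(RAW_ROWS)


def transform(rows):
    """Filter, standardize, transform, and quality-check the rows."""
    clean = []
    for row in rows:
        if row["source"] == "test":  # filtering: ignore test traffic
            continue
        name = row["customer"].strip().title()  # standardizing names
        try:
            amount = float(row["amount"])  # transforming text into a number
        except ValueError:
            continue  # quality check: drop rows with unparseable amounts
        clean.append((name, amount, row["source"]))
    return clean


def load(conn, rows):
    """Store the cleaned rows in one consolidated table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)  # [('Alice', 10.5), ('Carol', 7.0)]
```

Integration across real source systems is reduced here to a single list; in practice each source would likely need its own extract step.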

Page 4: Introduction to Datawarehousing.

Need

• Huge amount of operational data
• Knowledge workers want to turn this data into useful information
• Support strategic decision making
• From a business perspective:
  – Marketing weapon
  – Valuable tool in today’s world
  – Learning more about customer needs

Page 5: Introduction to Datawarehousing.

Benefits

• Potentially high returns on investment.

• Substantial competitive advantage.

• Increased productivity of corporate decision-makers.

Page 6: Introduction to Datawarehousing.

Definition

Subject Oriented
• Finance
• Marketing
• Inventory

Integrated
• SAP
• Weblog
• Legacy

Time Variant
• Daily
• Monthly
• Quarterly

Non-Volatile
• Same data for different periods

Page 7: Introduction to Datawarehousing.

Basic Architecture

Page 8: Introduction to Datawarehousing.

Architecture with Staging Area

Page 9: Introduction to Datawarehousing.

Operational Database (OLTP) vs Data Warehouse (OLAP)

Operational Database (OLTP)
• Perform on-line transaction & query processing
• Day-to-day operations of an organization

Data Warehouse (OLAP)
• Data analysis & decision making
• Systems can organize & present data in various formats
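To make the contrast concrete, here is a minimal sketch that runs both kinds of work against one in-memory SQLite table. The `orders` table and its values are hypothetical, and a real deployment would normally keep the two workloads in separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: short transactions that insert or update individual rows.
with conn:  # the connection context manager commits on success
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("East", 100.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("West", 250.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("East", 50.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (120.0, 1))

# OLAP-style work: a read-only query that summarizes many rows for analysis.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('East', 170.0), ('West', 250.0)]
```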

Page 10: Introduction to Datawarehousing.

Data Marts: Overview

• A data mart is a decentralized subset of data

• Data Marts have specific business-related purposes
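One way to picture a data mart is as a separate, smaller store that holds only the rows one business area needs. The sketch below assumes a hypothetical warehouse table `facts` with a `subject` column and a hypothetical helper `build_mart`; it is an illustration, not a description of any particular product.

```python
import sqlite3

# Hypothetical central warehouse holding data for several subject areas.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE facts (subject TEXT, metric TEXT, value REAL)")
warehouse.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("marketing", "campaign_leads", 120.0),
        ("finance", "quarterly_revenue", 90000.0),
        ("marketing", "email_opens", 450.0),
        ("inventory", "units_on_hand", 300.0),
    ],
)


def build_mart(source, subject):
    """Copy one subject area's rows into its own, separate database."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE facts (metric TEXT, value REAL)")
    rows = source.execute(
        "SELECT metric, value FROM facts WHERE subject = ?", (subject,)
    ).fetchall()
    mart.executemany("INSERT INTO facts VALUES (?, ?)", rows)
    mart.commit()
    return mart


marketing_mart = build_mart(warehouse, "marketing")
mart_rows = marketing_mart.execute(
    "SELECT metric, value FROM facts ORDER BY metric"
).fetchall()
print(mart_rows)  # [('campaign_leads', 120.0), ('email_opens', 450.0)]
```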

Page 11: Introduction to Datawarehousing.

Data Marts: Needs

• Queries perform much better against a data mart than against a data warehouse

• Data marts are much easier to navigate

Page 12: Introduction to Datawarehousing.

Data Marts: Features

• Low cost
• Controlled locally rather than centrally, conferring power on the user group
• Contain less information
• Rapid response
• More easily understood and navigated than an enterprise Data Warehouse
• Within the range of divisional or departmental budgets

Page 13: Introduction to Datawarehousing.

Dimensional Data Modeling

E-R model
• Symmetric
• Divides data into many entities
• Describes entities and relationships
• Seeks to eliminate data redundancy
• Good for high transaction performance

Dimensional model
• Asymmetric
• Divides data into dimensions and facts
• Describes dimensions and measures
• Encourages data redundancy
• Good for high query performance

Page 14: Introduction to Datawarehousing.

What is a Dimension?

• Single join to the fact table (single primary key)

• Stores business attributes

• Attributes are textual in nature

• Organized into hierarchies

• More or less constant data

• E.g. Time, Product, Customer, Store, etc.

Page 15: Introduction to Datawarehousing.

What is a Fact?

• Central, dominant table

• Multi-part primary key

• Links directly to dimensions

• Stores business measures

• Constantly varying data

Page 16: Introduction to Datawarehousing.

Star Schema

• A single, large and central fact table and one table for each dimension.

• For example, a fact table surrounded by 4 to 15 dimensions

• Dimensions are de-normalized

Page 17: Introduction to Datawarehousing.

Star Schema Example…

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc
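The example schema can be sketched in SQLite to show how a query joins the fact table once to each dimension it needs. The table names (`sales_fact`, `store_dim`, `time_dim`, `product_dim`) and all sample rows are assumptions made for illustration; only the column lists come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY, store_name TEXT,
        city TEXT, state TEXT, region TEXT
    );
    CREATE TABLE time_dim (
        period_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT
    );
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY, product_desc TEXT
    );
    -- The fact table has a multi-part primary key built from the dimension keys.
    CREATE TABLE sales_fact (
        store_key INTEGER REFERENCES store_dim(store_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        period_key INTEGER REFERENCES time_dim(period_key),
        units INTEGER, price REAL,
        PRIMARY KEY (store_key, product_key, period_key)
    );
    """
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)",
    [(1, "Downtown", "Pune", "Maharashtra", "West"),
     (2, "Central", "Chennai", "Tamil Nadu", "South")],
)
conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?, ?, ?)",
    [(1, 2023, "Q1", "Jan"), (2, 2023, "Q2", "Apr")],
)
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")]
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 10, 2.5), (1, 2, 1, 4, 10.0), (2, 1, 2, 7, 2.5), (1, 1, 2, 3, 2.5)],
)

# Each dimension is reached with a single join from the fact table.
units_by_region_quarter = conn.execute(
    """
    SELECT s.region, t.quarter, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN time_dim AS t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
    """
).fetchall()
print(units_by_region_quarter)
# [('South', 'Q2', 7), ('West', 'Q1', 14), ('West', 'Q2', 3)]
```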

Page 18: Introduction to Datawarehousing.

Snowflake Schema

• Variant of star schema model.

• A single, large and central fact table and one or more tables for each dimension.

• Dimension tables are normalized, i.e. dimension table data is split into additional tables

Page 19: Introduction to Datawarehousing.

Snowflake Schema Example

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City Key

City table (split out of the Store Dimension): City Key, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc
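A minimal sketch of the snowflaked store dimension shows the extra join it introduces: reaching Region from the fact table now takes two joins instead of one. The table names (`sales_fact`, `store_dim`, `city_dim`) and the sample rows are hypothetical, and the time and product dimensions are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- City attributes are normalized out of the store dimension.
    CREATE TABLE city_dim (
        city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT
    );
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY, store_name TEXT,
        city_key INTEGER REFERENCES city_dim(city_key)
    );
    CREATE TABLE sales_fact (
        store_key INTEGER REFERENCES store_dim(store_key),
        product_key INTEGER, period_key INTEGER,
        units INTEGER, price REAL,
        PRIMARY KEY (store_key, product_key, period_key)
    );
    """
)

# Hypothetical sample data: two stores share the same city row.
conn.executemany(
    "INSERT INTO city_dim VALUES (?, ?, ?, ?)",
    [(1, "Pune", "Maharashtra", "West"), (2, "Chennai", "Tamil Nadu", "South")],
)
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?, ?)",
    [(1, "Downtown", 1), (2, "Central", 2), (3, "Riverside", 1)],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 10, 2.5), (2, 1, 1, 7, 2.5), (3, 1, 1, 5, 2.5)],
)

# Fact -> store -> city: one more join than the star schema needs for Region.
units_by_region = conn.execute(
    """
    SELECT c.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN city_dim AS c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(units_by_region)  # [('South', 7), ('West', 15)]
```

The city, state, and region values are stored once per city rather than once per store, which is the space saving that normalization buys.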

Page 20: Introduction to Datawarehousing.

Avoid Snowflakes
• Avoid the natural desire to normalize the model:
  – Complicates end-user query construction
  – Adds an additional level of “JOIN” complexity
  – Database optimizers do not handle it very well
  – Saves some space at the cost of longer queries

So,
• Don’t snowflake to save space
• Snowflake if secondary dimensions have many attributes

Page 21: Introduction to Datawarehousing.

Star vs Snowflake

Page 22: Introduction to Datawarehousing.

Widely used ETL Tools

• IBM Information Server (DataStage)
• Informatica PowerCenter
• Ab Initio
• SAS Data Integration Studio
• Oracle Warehouse Builder (OWB)
• SQL Server Integration Services (SSIS)

Page 23: Introduction to Datawarehousing.

END