Introduction to Data Warehousing
By G. Kiran Kumar, HT.NO: 001-09-06-002

Sep 08, 2014

Page 1: introduction to datawarehouse

Introduction to Data Warehousing

BY

G.KIRAN KUMAR

HT.NO:001-09-06-002

Page 2: introduction to datawarehouse

Why Data Warehouse?

Necessity is the mother of invention

Page 3: introduction to datawarehouse

Scenario 1

GK Pvt Ltd is a company with branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report, but each branch has a separate operational system.

Page 4: introduction to datawarehouse

Scenario 1: GK Pvt Ltd.

[Diagram: the Sales Manager needs sales per item type per branch for the first quarter, drawn from four separate branch systems: Mumbai, Delhi, Chennai and Bangalore.]

Page 5: introduction to datawarehouse

Solution 1: GK Pvt Ltd.

Extract sales information from each database.

Store the information in a common repository at a single site.
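The two steps above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the branch extracts, the item types, and the helper names `consolidate` and `sales_per_item_per_branch` are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracts from two branch systems: (item type, sale date, amount).
branch_sales = {
    "Mumbai": [("Electronics", date(2014, 1, 15), 1200.0),
               ("Clothing", date(2014, 2, 3), 300.0)],
    "Delhi": [("Electronics", date(2014, 3, 20), 800.0),
              ("Electronics", date(2014, 4, 2), 500.0)],
}

def quarter_of(d):
    """Return the calendar quarter (1-4) that a date falls in."""
    return (d.month - 1) // 3 + 1

def consolidate(sources):
    """Merge all branch extracts into one repository, tagging rows by branch."""
    repository = []
    for branch, rows in sources.items():
        for item_type, sale_date, amount in rows:
            repository.append((branch, item_type, sale_date, amount))
    return repository

def sales_per_item_per_branch(repository, year, quarter):
    """Total sales per (branch, item type) for one quarter of one year."""
    totals = defaultdict(float)
    for branch, item_type, sale_date, amount in repository:
        if sale_date.year == year and quarter_of(sale_date) == quarter:
            totals[(branch, item_type)] += amount
    return dict(totals)

warehouse = consolidate(branch_sales)
q1_report = sales_per_item_per_branch(warehouse, 2014, 1)
```

Once the rows sit in one repository, the first-quarter report becomes a single aggregation instead of a separate query per branch system.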

Page 6: introduction to datawarehouse

Solution 1: GK Pvt Ltd.

[Diagram: data from the Mumbai, Delhi, Chennai and Bangalore branches flows into the Data Warehouse; the Sales Manager uses query and analysis tools on the warehouse to produce the report.]

Page 7: introduction to datawarehouse

Scenario 2

A software company has a huge operational database. Whenever executives want a report, the OLTP system slows down and the data entry operators have to wait.

Page 8: introduction to datawarehouse

Scenario 2: One Software Company

[Diagram: two Data Entry Operators and Management share one Operational Database; the operators wait while a management report runs.]

Page 9: introduction to datawarehouse

Solution 2

Extract the data needed for analysis from the operational database.

Store it in the warehouse.

Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.

The warehouse will contain data with a historical perspective.
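One possible way to implement the periodic refresh is an incremental extract that copies only rows recorded since the previous refresh. The sketch below assumes every operational row carries a `recorded_at` timestamp; that field, the sample rows, and the `refresh` helper are hypothetical.

```python
from datetime import datetime

def refresh(warehouse, operational_rows, last_refresh):
    """Append rows recorded after last_refresh; return the new high-water mark."""
    new_rows = [r for r in operational_rows if r["recorded_at"] > last_refresh]
    warehouse.extend(new_rows)
    # If nothing new arrived, the mark stays where it was.
    return max((r["recorded_at"] for r in new_rows), default=last_refresh)

# Hypothetical operational data.
operational = [
    {"id": 1, "amount": 50.0, "recorded_at": datetime(2014, 9, 1, 9, 0)},
    {"id": 2, "amount": 75.0, "recorded_at": datetime(2014, 9, 1, 17, 30)},
]

warehouse = []
mark = refresh(warehouse, operational, datetime.min)   # first refresh

operational.append({"id": 3, "amount": 20.0, "recorded_at": datetime(2014, 9, 2, 10, 0)})
mark = refresh(warehouse, operational, mark)           # next refresh picks up only row 3
```

Because reports read the warehouse copy, a long-running analysis no longer competes with the data entry operators for the operational database.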

Page 10: introduction to datawarehouse

Solution 2

[Diagram: Data Entry Operators run transactions against the Operational Database; data is extracted from it into the Data Warehouse, from which the Manager gets reports.]

Page 11: introduction to datawarehouse

Scenario 3

Cakes & Cookies is a small, new company. The President wants the company to grow and needs information to make correct decisions.

Page 12: introduction to datawarehouse

Solution 3

Improve the quality of data before loading it into the warehouse.

Perform data cleaning and transformation before loading the data.

Use query and analysis tools to support ad hoc queries.
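As a rough illustration of the cleaning step, the sketch below normalizes city names, fills missing unit counts and drops duplicate orders. The record layout, the alias table and the choice to fill missing units with 0 are all assumptions made for this example.

```python
def clean_records(records):
    """Normalize city names, fill missing units with 0, drop duplicate order ids."""
    city_aliases = {"banglore": "Bangalore", "bombay": "Mumbai"}  # hypothetical mapping
    seen_orders = set()
    cleaned = []
    for rec in records:
        if rec["order_id"] in seen_orders:
            continue  # same order already loaded
        seen_orders.add(rec["order_id"])
        city = rec["city"].strip()
        city = city_aliases.get(city.lower(), city.title())
        units = rec["units"] if rec["units"] is not None else 0
        cleaned.append({"order_id": rec["order_id"], "city": city,
                        "item": rec["item"], "units": units})
    return cleaned

# Hypothetical raw records with a misspelling, a duplicate and a missing value.
raw = [
    {"order_id": 1, "city": " banglore ", "item": "cake", "units": 2},
    {"order_id": 1, "city": "Bangalore", "item": "cake", "units": 2},
    {"order_id": 2, "city": "delhi", "item": "cookie", "units": None},
]
cleaned = clean_records(raw)
```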

Page 13: introduction to datawarehouse

Solution 3

[Diagram: the President uses a query and analysis tool on the Data Warehouse, viewing sales over time to plan expansion and improvement.]

Page 14: introduction to datawarehouse

What is a Data Warehouse?

Page 15: introduction to datawarehouse

Inmon’s definition

A data warehouse is a
– subject-oriented,
– integrated,
– time-variant,
– nonvolatile
collection of data in support of management’s decision-making process.

Page 16: introduction to datawarehouse

Subject-oriented

A data warehouse is organized around subjects such as sales, product and customer.

It focuses on modeling and analysis of data for decision makers.

It excludes data that is not useful in the decision support process.

Page 17: introduction to datawarehouse

Integration

A data warehouse is constructed by integrating multiple heterogeneous sources.

Data preprocessing is applied to ensure consistency.

[Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]

Page 18: introduction to datawarehouse

Time-variant

Provides information from a historical perspective, e.g. the past 5-10 years.

Every key structure contains, either implicitly or explicitly, an element of time.

Page 19: introduction to datawarehouse

Nonvolatile

Data once recorded cannot be updated.

A data warehouse requires only two operations in data accessing:
– Initial loading of data
– Access of data
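The load-and-access idea can be illustrated with a small append-only container. This is only a sketch; the class name `NonvolatileStore` and its methods are hypothetical and do not correspond to any particular warehouse product.

```python
class NonvolatileStore:
    """Append-only store: rows can be loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Freeze each row as a tuple so callers cannot mutate stored data.
        self._rows.extend(tuple(r) for r in rows)

    def access(self, predicate=lambda row: True):
        # Return a new list, so changes to the result never touch the store.
        return [row for row in self._rows if predicate(row)]

store = NonvolatileStore()
store.load([["Mumbai", 100], ["Delhi", 80]])
big_sales = store.access(lambda row: row[1] > 90)
```

The class deliberately offers no update or delete method, mirroring the two operations listed above.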

Page 20: introduction to datawarehouse

Data Warehousing Architecture

Page 21: introduction to datawarehouse

Data Warehouse Architecture

Data Warehouse server
– almost always a relational DBMS, rarely flat files

OLAP servers
– to support and operate on multi-dimensional data structures

Clients
– Query and reporting tools
– Analysis tools
– Data mining tools

Page 22: introduction to datawarehouse

Data Warehouse Schema

Star Schema

Fact Constellation Schema

Snowflake Schema

Page 23: introduction to datawarehouse

Star Schema

A single, large, central fact table and one table for each dimension.

Every fact points to one tuple in each of the dimensions and has additional attributes.

Does not capture hierarchies directly.

Page 24: introduction to datawarehouse

Star Schema (contd.)

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
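The star schema on this slide could be set up along the following lines, here using Python's built-in `sqlite3` module. The table and column names follow the slide, but the sample rows and the reading of Price as a per-unit price are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
-- Each fact row points to exactly one row in every dimension.
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL
);
-- Hypothetical sample data.
INSERT INTO store_dim VALUES (1, 'Store A', 'Mumbai', 'Maharashtra', 'West');
INSERT INTO store_dim VALUES (2, 'Store B', 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO product_dim VALUES (10, 'Widget');
INSERT INTO time_dim VALUES (100, 2014, 1, 2);
INSERT INTO time_dim VALUES (101, 2014, 2, 5);
INSERT INTO sales_fact VALUES (1, 10, 100, 5, 20.0);
INSERT INTO sales_fact VALUES (2, 10, 100, 3, 20.0);
INSERT INTO sales_fact VALUES (1, 10, 101, 4, 20.0);
""")

# Revenue per region per quarter: one join per dimension used.
rows = conn.execute("""
    SELECT s.region, t.quarter, SUM(f.units * f.price) AS revenue
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN time_dim  t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
""").fetchall()
```

Because city, state and region all live in the single store dimension table, rolling up from store to region needs no joins beyond the one to `store_dim`.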

Page 25: introduction to datawarehouse

Snowflake Schema

A variant of the star schema model.

A single, large, central fact table and one or more tables for each dimension.

Dimension tables are normalized, i.e. dimension table data is split into additional tables.

Page 26: introduction to datawarehouse

Snowflake Schema (contd.)

Fact Table: Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City Key

City Dimension: City Key, City, State, Region

Time Dimension: Period Key, Year, Quarter, Month

Product Dimension: Product Key, Product Desc

Drawbacks: Time-consuming joins, slow report generation.
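To make the extra join concrete, the sketch below builds the normalized store and city dimensions from this slide with Python's built-in `sqlite3` module. The sample rows are hypothetical, and the time and product dimensions are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The store dimension is normalized: city details move to their own table.
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
-- Hypothetical sample data.
INSERT INTO city_dim VALUES (1, 'Mumbai', 'Maharashtra', 'West');
INSERT INTO city_dim VALUES (2, 'Delhi', 'Delhi', 'North');
INSERT INTO store_dim VALUES (1, 'Store A', 1);
INSERT INTO store_dim VALUES (2, 'Store B', 1);
INSERT INTO store_dim VALUES (3, 'Store C', 2);
INSERT INTO sales_fact VALUES (1, 10, 100, 5, 20.0);
INSERT INTO sales_fact VALUES (2, 10, 100, 3, 20.0);
INSERT INTO sales_fact VALUES (3, 10, 100, 7, 20.0);
""")

# Units per region now needs two joins: fact -> store -> city.
region_units = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim  c ON c.city_key  = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
```

The same region rollup in a star schema would stop at the store dimension; the additional hop to `city_dim` is the kind of join the drawback above refers to.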

Page 27: introduction to datawarehouse

Fact Constellation

Multiple fact tables share dimension tables.

This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.

Sophisticated applications require such a schema.

Page 28: introduction to datawarehouse

Fact Constellation (contd.)

Sales Fact Table: Store Key, Product Key, Period Key, Units, Price

Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price

Store Dimension: Store Key, Store Name, City, State, Region

Product Dimension: Product Key, Product Desc
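A small sketch of two fact tables sharing one dimension, again with Python's built-in `sqlite3` module. Only the shared product dimension is modeled, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
-- Both fact tables reference the same product dimension.
CREATE TABLE sales_fact (
    store_key INTEGER, product_key INTEGER REFERENCES product_dim(product_key),
    period_key INTEGER, units INTEGER, price REAL);
CREATE TABLE shipping_fact (
    shipper_key INTEGER, store_key INTEGER,
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key INTEGER, units INTEGER, price REAL);
-- Hypothetical sample data.
INSERT INTO product_dim VALUES (10, 'Widget');
INSERT INTO product_dim VALUES (11, 'Gadget');
INSERT INTO sales_fact VALUES (1, 10, 100, 5, 20.0);
INSERT INTO sales_fact VALUES (1, 11, 100, 2, 50.0);
INSERT INTO sales_fact VALUES (2, 10, 100, 3, 20.0);
INSERT INTO shipping_fact VALUES (7, 1, 10, 100, 6, 2.5);
INSERT INTO shipping_fact VALUES (7, 2, 10, 100, 4, 2.5);
""")

# Aggregate each fact table separately, then line the totals up per product.
sold_vs_shipped = conn.execute("""
    SELECT p.product_desc,
           (SELECT COALESCE(SUM(units), 0) FROM sales_fact s
             WHERE s.product_key = p.product_key) AS sold,
           (SELECT COALESCE(SUM(units), 0) FROM shipping_fact sh
             WHERE sh.product_key = p.product_key) AS shipped
    FROM product_dim p
    ORDER BY p.product_desc
""").fetchall()
```

Each fact table is summed in its own subquery rather than joined to the other directly, which avoids multiplying rows when a product has several sales and several shipments.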

Page 29: introduction to datawarehouse

Building Data Warehouse

Data Selection

Data Preprocessing
– Fill missing values
– Remove inconsistency

Data Transformation & Integration

Data Loading

Data in the warehouse is stored in the form of fact tables and dimension tables.
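The loading step can be pictured as splitting flat records into a dimension table and a fact table. The sketch below assumes a single product dimension and assigns simple sequential keys; the record layout and the `load_star` helper are hypothetical.

```python
def load_star(records):
    """Split flat sales records into a product dimension and a fact table."""
    product_dim = {}   # product description -> generated key
    fact_table = []
    for rec in records:
        # Reuse the key if the product is already known, else assign the next one.
        key = product_dim.setdefault(rec["product"], len(product_dim) + 1)
        fact_table.append({"product_key": key,
                           "units": rec["units"],
                           "price": rec["price"]})
    return product_dim, fact_table

# Hypothetical preprocessed records.
flat = [
    {"product": "Widget", "units": 5, "price": 20.0},
    {"product": "Gadget", "units": 2, "price": 50.0},
    {"product": "Widget", "units": 3, "price": 20.0},
]
product_dim, fact_table = load_star(flat)
```

Descriptive text is stored once in the dimension, while the fact rows carry only keys and measures.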

Page 30: introduction to datawarehouse

Data Warehousing includes

Build Data Warehouse

Online analytical processing (OLAP)

Presentation

[Diagram: RDBMS and flat file sources pass through cleaning, selection and integration into the warehouse and OLAP server, which serves the client for presentation.]

Page 31: introduction to datawarehouse

Need for Data Warehousing

Industry has a huge amount of operational data.

Knowledge workers want to turn this data into useful information.

This information is used by them to support strategic decision making.

It is a platform for consolidated historical data for analysis.

It stores data of good quality so that knowledge workers can make correct decisions.

Page 32: introduction to datawarehouse

Advantages of Data Warehouse

There are many advantages to using a data warehouse. Some of them are:
• Enhances end-user access to a wide variety of data
• Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area or country for the last two years
• Increased data consistency
• Provides a place to combine related data from separate sources

Page 33: introduction to datawarehouse

Disadvantages of Data Warehouse

Data owners lose control over their data, raising ownership (responsibility and accountability), security and privacy issues.

Adding new data sources takes time and carries a high cost.

Limited flexibility of use and types of users: multiple separate data marts are required for multiple uses and types of users.

Page 34: introduction to datawarehouse

CONCLUSION

A parallel was drawn between operational systems and data warehouse systems to show their differences, mainly in their objectives and the type of data each one deals with.

Data warehousing is the technology for the future.

A data warehouse enables knowledge workers to make faster and better decisions.

Page 35: introduction to datawarehouse

References

Building the Data Warehouse by Inmon
Data Mining: Concepts and Techniques by Han, Kamber
www.dwinfocenter.org
www.datawarehousingonline.com
www.billinmon.com

Page 36: introduction to datawarehouse

Thank You