Intro. to Data Warehouse
Worapoj Kreesuradej, Ph.D., Associate Professor
Data Mining & Data Exploration Laboratory (DME Lab),
Faculty of Information Technology,
King Mongkut's Institute of Technology Ladkrabang
Web: www.it.kmitl.ac.th/dme
Email: [email protected]
Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley & Sons, 2001.
Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, John Wiley & Sons, 2002.
Definition of DW
“A collection of integrated, subject-oriented databases designed to supply the information required for decision-making.” - W. Inmon
A decision support database that is maintained separately from the organization’s operational databases.
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format. - E. Turban et al.
R. Kimball’s definition of a DW
A data warehouse is a copy of transactional data specifically structured for querying and analysis.
Problem: Data Management in Large Enterprises
Vertical fragmentation of informational systems
Result of application (user)-driven development of operational systems
Most business analysis has a time component
Trend analysis (historical data is required)
[Figure: Sales by year, 2001 to 2004]
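Trend analysis depends on keeping past values rather than overwriting them. A minimal sketch, assuming hypothetical yearly sales totals (the figures are illustrative and are not read from the chart; the function name `year_over_year_change` is also an assumption):

```python
def year_over_year_change(sales_by_year):
    """Return {year: fractional change versus the previous year in the data}."""
    years = sorted(sales_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        changes[curr] = (sales_by_year[curr] - sales_by_year[prev]) / sales_by_year[prev]
    return changes

# Hypothetical yearly sales totals; illustrative values only.
history = {2001: 100.0, 2002: 110.0, 2003: 99.0, 2004: 118.8}
trend = year_over_year_change(history)
```

Without the 2001 to 2003 rows, the 2004 figure could not be placed in any trend, which is why the warehouse retains history that an operational system would normally discard.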
Data Warehousing Process Overview
The major components of a data warehousing process
• Data sources
• Data extraction
• Data loading
• Comprehensive database / data store
• Data mart
• Metadata
• Middleware tools / information delivery tools
ETL
• Data Extraction
• Data Cleaning and Transformation
Convert from legacy/host format to
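The ETL steps can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed implementation: the pipe-delimited legacy layout, the `sales` table, and the function names `extract`, `clean_and_transform`, and `load` are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical legacy export: pipe-delimited text with inconsistent formatting.
LEGACY_ROWS = [
    "1001| alice SMITH |2003-01-15|  250.00",
    "1002|Bob Jones|2003-01-16|99.5",
    "1003||2003-01-17|n/a",  # dirty row: missing name, non-numeric amount
]

def extract(lines):
    """Extraction: split each raw source line into trimmed fields."""
    for line in lines:
        yield [field.strip() for field in line.split("|")]

def clean_and_transform(records):
    """Cleaning and transformation: reject unusable rows, standardize the rest."""
    for order_id, name, order_date, amount in records:
        try:
            value = float(amount)
        except ValueError:
            continue  # reject rows whose amount is not numeric
        if not name:
            continue  # reject rows with no customer name
        yield int(order_id), name.title(), order_date, value

def load(conn, rows):
    """Loading: write the cleaned rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, clean_and_transform(extract(LEGACY_ROWS)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM sales ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions means the cleaning rules can change without touching how data is read or written.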
OLTP Systems are Data Capture Systems
“DATA IN” systems
DW are “DATA OUT” systems
[Comparison table: OLTP vs. DW]
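The “data in” versus “data out” contrast can be illustrated with two access patterns. This is a sketch only: the `orders` table, its columns, and both function names are hypothetical, and a single in-memory SQLite database is used for brevity even though a warehouse is normally kept separate from the operational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

def record_order(order_id, region, year, amount):
    """OLTP-style 'data in': capture one transaction at a time."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)",
            (order_id, region, year, amount),
        )

def sales_by_region_and_year():
    """DW-style 'data out': read many rows and return a summary."""
    return conn.execute(
        "SELECT region, year, SUM(amount) FROM orders "
        "GROUP BY region, year ORDER BY region, year"
    ).fetchall()

# Hypothetical transactions; illustrative values only.
record_order(1, "North", 2003, 100.0)
record_order(2, "North", 2003, 50.0)
record_order(3, "South", 2004, 75.0)
summary = sales_by_region_and_year()
```

The write path touches one row per call, while the read path scans and aggregates the whole table, which is the workload a warehouse is structured for.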
Dimensional Modeling
Facts are stored in FACT Tables
Dimensions are stored in DIMENSION Tables
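A small star schema shows how facts and dimensions fit together. This is a minimal sketch using SQLite; the table names (`fact_sales`, `dim_product`, `dim_date`), their columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to filter and group.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);

-- Hypothetical sample rows.
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_date VALUES (200301, 2003, 1), (200401, 2004, 1);
INSERT INTO fact_sales VALUES (1, 200301, 2, 20.0), (2, 200301, 1, 15.0), (1, 200401, 3, 30.0);
""")

# Typical analytical query: join facts to dimensions, then aggregate.
rows = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
```

The fact table stays narrow and numeric, while the dimension tables supply the labels (year, category) that the query groups by.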
Increasingly common mode of delivery: Web-enabled
Data Warehouse Architecture
• Data Flow Architecture
• System Architecture
Data Flow Architecture
[Figures: data flow architecture diagrams]
Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse, especially for customer information files
Three parts of the data warehouse
• The data warehouse that contains the data and associated software
• Data acquisition (back-end) software that extracts data from legacy systems and external sources, consolidates and summarizes them, and loads them into the data warehouse
• Client (front-end) software that allows users to access and analyze data from the warehouse
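The three parts can be sketched as three small pieces of code. This is an illustration only: the `Warehouse` class, the `acquire` and `total_for` functions, and the sample source records are all hypothetical, and a plain dictionary stands in for the warehouse storage.

```python
from collections import defaultdict

class Warehouse:
    """Part 1: the store that holds the warehouse data."""
    def __init__(self):
        self.totals = {}  # (customer, year) -> summarized amount

def acquire(warehouse, *sources):
    """Part 2 (back end): consolidate several sources, summarize, and load."""
    summary = defaultdict(float)
    for source in sources:  # consolidate legacy and external records
        for customer, year, amount in source:
            summary[(customer, year)] += amount  # summarize per customer and year
    warehouse.totals.update(summary)  # load the summarized rows

def total_for(warehouse, customer):
    """Part 3 (front end): let a user analyze data held in the warehouse."""
    return sum(v for (c, _year), v in warehouse.totals.items() if c == customer)

# Hypothetical source data; illustrative values only.
legacy_system = [("acme", 2003, 100.0), ("acme", 2003, 40.0)]
external_feed = [("acme", 2004, 60.0), ("globex", 2004, 10.0)]

dw = Warehouse()
acquire(dw, legacy_system, external_feed)
acme_total = total_for(dw, "acme")
```

The front-end function never touches the source systems; it reads only what the back end has already consolidated and loaded.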
System Architectures
[Figures: system architecture diagrams]
Data Warehouse Development
Some best practices for implementing a data warehouse (Weir, 2002):
• Project must fit with corporate strategy and business objectives
• There must be complete buy-in to the project by executives, managers, and users
• It is important to manage user expectations about the completed project
• The data warehouse must be built incrementally
• Build in adaptability
• The project must be managed by both IT and business professionals
• Develop a business/supplier relationship
• Only load data that have been cleansed and are of a quality understood by the organization
• Do not overlook training requirements
• Be politically aware