Chapter 2: Data Warehousing. Business Intelligence: A Managerial Approach (2nd Edition)
Oct 28, 2015
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Learning Objectives
- Understand the basic definitions and concepts of data warehouses
- Learn different types of data warehousing architectures; their comparative advantages and disadvantages
- Describe the processes used in developing and managing data warehouses
- Explain data warehousing operations
- Explain the role of data warehouses in decision support
Learning Objectives
- Explain data integration and the extraction, transformation, and load (ETL) processes
- Describe real-time (a.k.a. right-time and/or active) data warehousing
- Understand data warehouse administration and security issues
Opening Vignette…
“DirecTV Thrives with Active Data Warehousing”
- Company background
- Problem description
- Proposed solution
- Results
- Answer & discuss the case questions.
Main Data Warehousing Topics
- DW definition
- Characteristics of DW
- Data Marts
- ODS, EDW, Metadata
- DW Framework
- DW Architecture & ETL Process
- DW Development
- DW Issues
What is a Data Warehouse?
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
Characteristics of DW
- Subject oriented
- Integrated
- Time-variant (time series)
- Nonvolatile
- Summarized
- Not normalized
- Metadata
- Web based, relational/multi-dimensional
- Client/server
- Real-time and/or right-time (active)
Data Mart
A departmental data warehouse that stores only relevant data
- Dependent data mart: A subset that is created directly from a data warehouse
- Independent data mart: A small data warehouse designed for a strategic business unit or a department
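To make the dependent data mart idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (warehouse_sales, marketing_mart), columns, and rows are hypothetical and chosen only for illustration; a real warehouse would be far larger and would usually run on a dedicated DBMS.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "campaign-a", 120.0),
        ("finance", "audit", 300.0),
        ("marketing", "campaign-b", 80.0),
    ],
)

# A dependent data mart: a departmental subset created directly from the warehouse.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

An independent data mart, by contrast, would be loaded from the source systems directly rather than selected out of warehouse_sales.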
Data Warehousing Definitions
- Operational data store (ODS): A type of database often used as an interim area for a data warehouse
- Oper marts: An operational data mart
- Enterprise data warehouse (EDW): A data warehouse for the enterprise
- Metadata: Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use
DW Framework
[Figure: DW framework. Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed the ETL process (select, extract, transform, integrate, load), which populates the enterprise data warehouse and its metadata. Data are replicated to data marts (engineering, marketing, finance, ...); a "no data marts" option is also shown. Users access the data through API/middleware from applications (visualization): routine business reporting; OLAP, dashboard, Web; data/text mining; custom-built applications.]
DW Architecture
- Three-tier architecture
  1. Data acquisition software (back-end)
  2. The data warehouse that contains the data & software
  3. Client (front-end) software that allows users to access and analyze data from the warehouse
- Two-tier architecture: The first two tiers of the three-tier architecture are combined into one
- Sometimes there is only one tier
DW Architectures
[Figure: Three-tier architecture: Tier 1: Client workstation; Tier 2: Application server; Tier 3: Database server. Two-tier architecture: Tier 1: Client workstation; Tier 2: Application & database server.]
A Web-based DW Architecture
[Figure: components shown are Client (Web browser), Internet/Intranet/Extranet, Web server, Web pages, Application server, and Data warehouse.]
Data Warehousing Architectures
Issues to consider when deciding which architecture to use:
- Which database management system (DBMS) should be used?
- Will parallel processing and/or partitioning be used?
- Will data migration tools be used to load the data warehouse?
- What tools will be used to support data retrieval and analysis?
Alternative DW Architectures
[Figure, panels (a) to (c). Each panel shows source systems, ETL, a staging area, and end-user access and applications.]
(a) Independent Data Marts Architecture: independent data marts (atomic/summarized data)
(b) Data Mart Bus Architecture with Linked Dimensional Data Marts: dimensionalized data marts linked by conformed dimensions (atomic/summarized data)
(c) Hub-and-Spoke Architecture (Corporate Information Factory): normalized relational warehouse (atomic data) with dependent data marts (summarized/some atomic data)
Alternative DW Architectures
[Figure, panels (d) and (e).]
(d) Centralized Data Warehouse Architecture: source systems, ETL, staging area, normalized relational warehouse (atomic/some summarized data), end-user access and applications
(e) Federated Architecture: existing data warehouses, data marts, and legacy systems; data mapping/metadata; logical/physical integration of common data elements; end-user access and applications
Alternative DW Architectures
1. Independent Data Marts
2. Data Mart Bus Architecture
3. Hub-and-Spoke Architecture
4. Centralized Data Warehouse
5. Federated Data Warehouse
Each has pros and cons!
Teradata Corp. DW Architecture
Data Warehousing Architectures
Ten factors that potentially affect the architecture selection decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
Data Integration and the Extraction, Transformation, and Load (ETL) Process
- Data integration: Integration that comprises three major processes: data access, data federation, and change capture
- Enterprise application integration (EAI): A technology that provides a vehicle for pushing data from source systems into a data warehouse
- Enterprise information integration (EII): An evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases
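Of the three data integration processes, change capture is the easiest to illustrate in a few lines. The sketch below assumes the simplest possible approach, comparing two snapshots of a source table keyed by record id; the function name capture_changes and the sample records are hypothetical, and production tools may instead read database logs or use triggers.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by record id and report what changed."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical customer snapshots taken on two consecutive days.
yesterday = {1: "Alice, Boston", 2: "Bob, Denver", 3: "Cara, Austin"}
today = {1: "Alice, Boston", 2: "Bob, Seattle", 4: "Dan, Miami"}

changes = capture_changes(yesterday, today)
print(changes)
```

Only the changed records would then need to be pushed to the warehouse, rather than the full source table.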
Data Integration and the Extraction, Transformation, and Load (ETL) Process
Extraction, transformation, and load (ETL)
[Figure: Data from packaged applications, legacy systems, other internal applications, and transient data sources pass through the Extract, Transform, Cleanse, and Load steps into the data warehouse or a data mart.]
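The extract, transform, cleanse, and load steps can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the two source lists (legacy_rows, packaged_rows), their field names, the cleansing rule, and the sales table are all hypothetical, and sqlite3 stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows from two source systems; fields and values are made up.
legacy_rows = [{"cust": " ann ", "amt": "100.50"}, {"cust": "BOB", "amt": "n/a"}]
packaged_rows = [{"customer": "Cid", "amount": 42}]


def extract():
    """Extract: pull raw records from every source under common field names."""
    for row in legacy_rows:
        yield {"customer": row["cust"], "amount": row["amt"]}
    for row in packaged_rows:
        yield dict(row)


def transform(record):
    """Transform: bring names and amounts into one standard format."""
    name = str(record["customer"]).strip().title()
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        amount = None  # left for the cleanse step to handle
    return {"customer": name, "amount": amount}


def cleanse(records):
    """Cleanse: drop records whose amount could not be parsed."""
    return [r for r in records if r["amount"] is not None]


def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, cleanse([transform(r) for r in extract()]))
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each step as a separate function mirrors the figure and makes it easy to swap in a different cleansing rule, such as routing rejected rows to a review table instead of dropping them.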
ETL
- Issues affecting the purchase of an ETL tool
  - Data transformation tools are expensive
  - Data transformation tools may have a long learning curve
- Important criteria in selecting an ETL tool
  - Ability to read from and write to an unlimited number of data sources/architectures
  - Automatic capturing and delivery of metadata
  - A history of conforming to open standards
  - An easy-to-use interface for the developer and the functional user
Data Warehouse Development
- Data warehouse development approaches
  - Inmon Model: EDW approach (top-down)
  - Kimball Model: Data mart approach (bottom-up)
- Which model is best?
  - There is no one-size-fits-all strategy to DW
  - One alternative is the hosted warehouse
- Data warehouse structure:
  - The Star Schema vs. Relational
- Real-time data warehousing?
Hosted Data Warehouses
Benefits:
- Requires minimal investment in infrastructure
- Frees up capacity on in-house systems
- Frees up cash flow
- Makes powerful solutions affordable
- Enables powerful solutions that provide for growth
- Offers better quality equipment and software
- Provides faster connections
- Enables users to access data remotely
- Allows a company to focus on core business
- Meets storage needs for large volumes of data
Representation of Data in DW
- Dimensional Modeling – a retrieval-based system that supports high-volume query access
- Star schema – the most commonly used and the simplest style of dimensional modeling
  - Contains a fact table surrounded by and connected to several dimension tables
  - Fact table contains the measures (numerical values) needed to perform decision analysis and query reporting, plus foreign keys to the dimension tables
  - Dimension tables contain classification and aggregation information about the values in the fact table
- Snowflake schema – an extension of star schema in which dimension tables are normalized into further tables, so the diagram resembles a snowflake in shape
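A star schema can be sketched with Python's built-in sqlite3 module: one fact table holding the numeric measure and foreign keys, joined to dimension tables that classify it. The table and column names (fact_sales, dim_time, dim_product) and all rows are hypothetical, loosely following the SALES example used in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: classification and aggregation information.
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);

-- Fact table: one numeric measure plus a foreign key per dimension.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);

INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO dim_product VALUES (10, 'Acme'), (20, 'Zenith');
INSERT INTO fact_sales VALUES (1, 10, 5), (1, 20, 3), (2, 10, 7), (2, 10, 1);
""")

# A typical query: aggregate the fact measure, grouped by dimension attributes.
star_result = conn.execute("""
    SELECT t.quarter, p.brand, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY t.quarter, p.brand
    ORDER BY t.quarter, p.brand
""").fetchall()
print(star_result)
```

Every dimension is exactly one join away from the fact table, which is what keeps star-schema queries simple.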
Multidimensionality
- Multidimensionality
  - The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
- Multidimensional presentation
  - Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
  - Measures: money, sales volume, head count, inventory profit, actual versus forecast
  - Time: daily, weekly, monthly, quarterly, or yearly
Star vs Snowflake Schema
[Figure: Star schema: fact table SALES (UnitsSold, ...) surrounded by the dimensions TIME (Quarter, ...), PEOPLE (Division, ...), PRODUCT (Brand, ...), and GEOGRAPHY (Country, ...).]
[Figure: Snowflake schema: fact table SALES (UnitsSold, ...) with the dimensions DATE (Date, ...), PEOPLE (Division, ...), PRODUCT (LineItem, ...), and STORE (LocID, ...), plus the further dimension tables BRAND (Brand, ...), CATEGORY (Category, ...), LOCATION (State, ...), MONTH (M_Name, ...), and QUARTER (Q_Name, ...).]
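The practical difference between the two diagrams shows up in the joins. The sketch below assumes that BRAND hangs off PRODUCT in the snowflake schema; the links between tables did not survive in this transcript, so that relationship, along with the table names and rows, is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outer dimension: brand has been normalized out of the product dimension.
CREATE TABLE dim_brand (brand_id INTEGER PRIMARY KEY, brand TEXT);

-- Inner dimension: points to the brand table instead of storing brand text.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    line_item TEXT,
    brand_id INTEGER REFERENCES dim_brand(brand_id)
);

CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);

INSERT INTO dim_brand VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO dim_product VALUES (10, 'widget', 1), (11, 'gadget', 1), (20, 'gizmo', 2);
INSERT INTO fact_sales VALUES (10, 4), (11, 6), (20, 9);
""")

# Reaching the brand now takes two joins (fact -> product -> brand), not one.
units_by_brand = conn.execute("""
    SELECT b.brand, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_brand AS b ON b.brand_id = p.brand_id
    GROUP BY b.brand
    ORDER BY b.brand
""").fetchall()
print(units_by_brand)
```

The normalized layout stores each brand name once, at the cost of a longer join path for every query that groups by brand.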
Analysis of Data in DW
- Online analytical processing (OLAP)
  - Data-driven activities performed by end users to query the online system and to conduct analyses
  - Data cubes, drill-down / rollup, slice & dice, …
- OLAP Activities
  - Generating queries (query tools)
  - Requesting ad hoc reports
  - Conducting statistical and other analyses
  - Developing multimedia-based applications
Analysis of Data Stored in DW: OLTP vs. OLAP
- OLTP (online transaction processing)
  - A system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and POS
  - The main focus is on efficiency of routine tasks
- OLAP (online analytical processing)
  - A system designed to address the need for information extraction by providing effective and efficient ad hoc analysis of organizational data
  - The main focus is on effectiveness
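The contrast between the two workloads can be sketched in a few lines. For brevity this example runs both against one in-memory sqlite3 database, which is a simplification: in practice the analytical query would normally run against the warehouse, not the operational system. The orders table, the record_order helper, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)


def record_order(region, amount):
    """OLTP-style work: capture one business event in a short transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )


for region, amount in [("east", 10.0), ("west", 25.0), ("east", 5.0)]:
    record_order(region, amount)

# OLAP-style work: an ad hoc question answered by aggregating over many rows.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals_by_region)
```

The insert touches a single row and must be fast and reliable; the aggregate reads the whole table and is judged by how useful its answer is.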
OLAP Operations
- Slice – a subset of a multidimensional array
- Dice – a slice on more than two dimensions
- Drill Down/Up – navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
- Roll Up – computing all of the data relationships for one or more dimensions
- Pivot – used to change the dimensional orientation of a report or an ad hoc query-page display
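The operations listed above can be sketched on a tiny in-memory cube. This is a simplified illustration, not how an OLAP server is implemented: the cube is a plain dictionary keyed by (product, region, quarter), and the helper names (dice, slice_, roll_up, pivot) and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, quarter) -> sales volume.
cube = {
    ("tv", "east", "Q1"): 10, ("tv", "east", "Q2"): 12,
    ("tv", "west", "Q1"): 7, ("tv", "west", "Q2"): 9,
    ("radio", "east", "Q1"): 3, ("radio", "east", "Q2"): 4,
    ("radio", "west", "Q1"): 5, ("radio", "west", "Q2"): 6,
}
DIMS = ("product", "region", "quarter")


def dice(cells, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed values per dimension."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cells.items() if keep(k)}


def slice_(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    return dice(cells, **{dim: {value}})


def roll_up(cells, keep_dims):
    """Roll up: sum the measure over every dimension not listed in keep_dims."""
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[DIMS.index(d)] for d in keep_dims)] += value
    return dict(totals)


def pivot(table):
    """Pivot a two-dimensional result: swap the row and column dimensions."""
    return {(col, row): v for (row, col), v in table.items()}


q1_slice = slice_(cube, "quarter", "Q1")
east_tv = dice(cube, product={"tv"}, region={"east"}, quarter={"Q1", "Q2"})
by_product = roll_up(cube, ["product"])                      # most summarized
by_product_region = roll_up(cube, ["product", "region"])     # drill down one level
by_region_product = pivot(by_product_region)
print(by_product, by_region_product)
```

Moving from by_product to by_product_region is a drill down; going back the other way is a drill up.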
OLAP
Slicing Operations on a Simple Three-Dimensional Data Cube
[Figure: A 3-dimensional OLAP cube with slicing operations. Axes: Product, Time, Geography. Cells are filled with numbers representing sales volumes. Slices shown: sales volumes of a specific Product on variable Time and Region; sales volumes of a specific Region on variable Time and Products; sales volumes of a specific Time on variable Region and Products.]
Variations of OLAP
- Multidimensional OLAP (MOLAP): OLAP implemented via a specialized multidimensional database (or data store) that summarizes transactions into multidimensional views ahead of time
- Relational OLAP (ROLAP): The implementation of an OLAP database on top of an existing relational database
- Database OLAP and Web OLAP (DOLAP and WOLAP); Desktop OLAP, …
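The MOLAP and ROLAP definitions differ mainly in when the aggregation happens, which a small sketch can show. Everything here is a toy under stated assumptions: a dictionary stands in for the multidimensional store, sqlite3 stands in for the relational database, and the transactions list and the lookup helper names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical transactions: (product, quarter, units).
transactions = [("tv", "Q1", 10), ("tv", "Q2", 12), ("radio", "Q1", 3), ("radio", "Q1", 2)]

# MOLAP-style: summarize transactions into multidimensional views ahead of time.
molap_cube = defaultdict(int)
for product, quarter, units in transactions:
    molap_cube[(product, quarter)] += units
    molap_cube[(product, "ALL")] += units
    molap_cube[("ALL", quarter)] += units
    molap_cube[("ALL", "ALL")] += units


def molap_lookup(product="ALL", quarter="ALL"):
    """Answer a query by reading a precomputed cell."""
    return molap_cube.get((product, quarter), 0)


# ROLAP-style: keep the data in relational tables and aggregate at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transactions)


def rolap_lookup(product=None, quarter=None):
    """Answer the same query by aggregating relational rows on demand."""
    row = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales "
        "WHERE (:p IS NULL OR product = :p) AND (:q IS NULL OR quarter = :q)",
        {"p": product, "q": quarter},
    ).fetchone()
    return row[0]


print(molap_lookup(product="radio"), rolap_lookup(product="radio"))
```

Both lookups return the same totals; the trade-off is that the MOLAP-style store pays the aggregation cost once at load time, while the ROLAP-style query pays it on every request but needs no separate store.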