Chapter 13 – Data Warehousing
Page 1: Chapter 13   data warehousing



Databases

Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age

Information, which is created from data, becomes the basis for decision making


Decision Support Systems

Created to facilitate the decision-making process

There is so much information that it is difficult to extract it all from a traditional database

Need for a more comprehensive data storage facility
– Data Warehouse


Decision Support Systems

Extract information from data to use as the basis for decision making

Used at all levels of the organization
Tailored to specific business areas
Interactive
Ad hoc queries to retrieve and display information
Combines historical operational data with business activities


4 Components of DSS

Data Store – The DSS Database
– Business data
– Business model data
– Internal and external data

Data Extraction and Filtering
– Extract and validate data from the operational database and the external data sources


4 Components of DSS

End-User Query Tool
– Create queries that access either the operational or the DSS database

End-User Presentation Tools
– Organize and present the data


Differences with DSS

Operational
– Stored in a normalized relational database
– Supports transactions that represent daily operations (not query friendly)

3 Main Differences
– Time span
– Granularity
– Dimensionality


Time Span

Operational
– Real time
– Current transactions
– Short time frame
– Specific data facts

DSS
– Historic
– Long time frame (months / quarters / years)
– Patterns


Granularity

Operational
– Specific transactions that occur at a given time

DSS
– Shown at different levels of aggregation
– Different summary levels
– Decompose (drill down)
– Summarize (roll up)
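To make roll-up and drill-down concrete, here is a minimal Python sketch over a few made-up daily sales facts; the data and the roll_up / drill_down helper names are hypothetical. Rolling up summarizes the same facts at a coarser time grain (day to month to year); drilling down decomposes one summary back into its finer components.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales facts at the finest grain: (sale date, amount).
daily_sales = [
    (date(2023, 1, 5), 120.0),
    (date(2023, 1, 20), 80.0),
    (date(2023, 2, 3), 200.0),
    (date(2024, 1, 15), 50.0),
]

def roll_up(facts, key):
    """Summarize facts to a coarser grain defined by key(day)."""
    totals = defaultdict(float)
    for day, amount in facts:
        totals[key(day)] += amount
    return dict(totals)

# Roll up: day -> month -> year.
by_month = roll_up(daily_sales, lambda d: (d.year, d.month))
by_year = roll_up(daily_sales, lambda d: d.year)

def drill_down(facts, year):
    """Decompose one yearly total into its monthly components."""
    in_year = [f for f in facts if f[0].year == year]
    return roll_up(in_year, lambda d: (d.year, d.month))

print(by_year)                        # {2023: 400.0, 2024: 50.0}
print(drill_down(daily_sales, 2023))  # {(2023, 1): 200.0, (2023, 2): 200.0}
```

Every summary level is derived from the same atomic facts, so the monthly and yearly totals always agree.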


Dimensionality

Most distinguishing characteristic of DSS data

Operational
– Represents atomic transactions

DSS
– Data is related in many ways
– Develop the larger picture
– Multidimensional view of data
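A multidimensional view can be pictured as a cube whose cells are addressed by one member of each dimension. The Python sketch below assumes three hypothetical dimensions (time, region, product) and made-up sales values; slice_cube fixes one dimension, and total_by aggregates the measure along another.

```python
# Hypothetical sales cube: each cell is identified by one member of each
# dimension (time, region, product) and holds a sales measure.
cube = {
    ("Q1", "East", "Widgets"): 100,
    ("Q1", "West", "Widgets"): 150,
    ("Q1", "East", "Gadgets"): 70,
    ("Q2", "East", "Widgets"): 120,
    ("Q2", "West", "Gadgets"): 90,
}
DIMENSIONS = ("time", "region", "product")

def slice_cube(cube, dimension, member):
    """Keep only the cells where one dimension equals a single member."""
    i = DIMENSIONS.index(dimension)
    return {cell: v for cell, v in cube.items() if cell[i] == member}

def total_by(cube, dimension):
    """Aggregate the measure along one dimension."""
    i = DIMENSIONS.index(dimension)
    totals = {}
    for cell, v in cube.items():
        totals[cell[i]] = totals.get(cell[i], 0) + v
    return totals

east = slice_cube(cube, "region", "East")
print(total_by(east, "product"))  # {'Widgets': 220, 'Gadgets': 70}
print(total_by(cube, "time"))     # {'Q1': 320, 'Q2': 210}
```

The same cells answer different questions depending on which dimension is fixed and which is summarized, which is the "larger picture" an operational transaction table does not directly provide.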


DSS Database Requirements

DSS Database Schema
– Support complex and non-normalized data
  – Summarized and aggregate data
  – Multiple relationships
  – Queries must extract multidimensional time slices
  – Redundant data


DSS Database Requirements

Data Extraction and Filtering
– DSS databases are created mainly by extracting data from operational databases and combining it with data imported from external sources
  – Need for advanced data extraction & filtering tools
  – Allow batch / scheduled data extraction
  – Support different types of data sources
  – Check for inconsistent data / apply data validation rules
  – Support advanced data integration / resolve data formatting conflicts
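The filtering step can be sketched as a set of validation rules applied to rows combined from several sources. In this minimal Python example, the two sources, their rows, and the rules in is_valid (positive amount, parseable ISO date) are all assumptions chosen for illustration; real rule sets are defined per organization.

```python
from datetime import date

# Hypothetical rows pulled from an operational database and an external feed.
operational_rows = [
    {"order_id": 1, "amount": 250.0, "order_date": "2023-03-01"},
    {"order_id": 2, "amount": -40.0, "order_date": "2023-03-02"},  # bad amount
]
external_rows = [
    {"order_id": 3, "amount": 99.5, "order_date": "2023-13-01"},   # bad date
    {"order_id": 4, "amount": 10.0, "order_date": "2023-03-04"},
]

def is_valid(row):
    """Assumed validation rules: positive amount and a parseable ISO date."""
    if row["amount"] <= 0:
        return False
    try:
        date.fromisoformat(row["order_date"])
    except ValueError:
        return False
    return True

def extract_and_filter(*sources):
    """Combine several sources, keeping valid rows and collecting rejects."""
    accepted, rejected = [], []
    for source in sources:
        for row in source:
            (accepted if is_valid(row) else rejected).append(row)
    return accepted, rejected

accepted, rejected = extract_and_filter(operational_rows, external_rows)
print([r["order_id"] for r in accepted])  # [1, 4]
print([r["order_id"] for r in rejected])  # [2, 3]
```

Keeping the rejected rows, rather than silently dropping them, lets someone review inconsistent source data; a scheduler could run the same function as a batch job.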


DSS Database Requirements

End-User Analytical Interface
– Must support advanced data modeling and data presentation tools
– Data analysis tools
– Query generation
– Must allow the user to navigate through the DSS

Size Requirements
– VERY large (terabytes)
– Advanced hardware (multiple processors, multiple disk arrays, etc.)


Data Warehouse

The DSS-friendly data repository for the DSS is the DATA WAREHOUSE

Definition: an integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making


Integrated

The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization
– Multiple sources
– Diverse sources
– Diverse formats
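Integrating diverse formats usually means mapping each source onto one agreed layout. The Python sketch below assumes two hypothetical store systems that record the same kind of sale with different field names, date formats, and currency units; from_a and from_b translate each into a single consolidated schema (ISO dates, amounts in cents).

```python
from datetime import datetime

# Hypothetical sources describing the same kind of record in different formats.
store_a = [{"cust": "C1", "sale_date": "03/15/2023", "total_usd": 12.5}]     # MM/DD/YYYY, dollars
store_b = [{"customer_id": "C2", "date": "2023-03-16", "total_cents": 899}]  # ISO date, cents

def from_a(r):
    """Translate a store A row into the consolidated schema."""
    return {
        "customer_id": r["cust"],
        "sale_date": datetime.strptime(r["sale_date"], "%m/%d/%Y").date().isoformat(),
        "total_cents": round(r["total_usd"] * 100),
    }

def from_b(r):
    """Store B already uses ISO dates and cents; only the field names differ."""
    return {"customer_id": r["customer_id"], "sale_date": r["date"],
            "total_cents": r["total_cents"]}

# Consolidated rows share one schema, one date format, and one currency unit.
integrated = [from_a(r) for r in store_a] + [from_b(r) for r in store_b]
print(integrated)
```

Storing money as integer cents in the consolidated form is one common choice that avoids floating-point rounding when totals are later aggregated.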


Subject-Oriented

Data is arranged and optimized to answer questions from diverse functional areas
– Data is organized and summarized by topic
  – Sales / Marketing / Finance / Distribution / etc.


Time-Variant

The data warehouse represents the flow of data through time

Can contain projected data from statistical models

Data is periodically uploaded, and then time-dependent data is recomputed
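A minimal Python sketch of a periodic upload followed by recomputation of time-dependent data, using year-to-date totals as the assumed example. The warehouse dictionary, its "YYYY-MM" keys, and the helper names are hypothetical; the overlap check reflects the idea that a load adds new periods rather than rewriting old ones.

```python
# Hypothetical warehouse contents: monthly sales keyed "YYYY-MM".
warehouse = {"2023-11": 100, "2023-12": 150}

def recompute_ytd(data):
    """Recompute year-to-date totals; the running total restarts each year."""
    ytd, running, year = {}, 0, None
    for month in sorted(data):
        if month[:4] != year:  # first month of a new year
            year, running = month[:4], 0
        running += data[month]
        ytd[month] = running
    return ytd

def periodic_load(data, new_batch):
    """Append a new period's data, then refresh the time-dependent figures."""
    overlap = set(new_batch) & set(data)
    if overlap:
        raise ValueError(f"periods already loaded: {sorted(overlap)}")
    data.update(new_batch)
    return recompute_ytd(data)

ytd = periodic_load(warehouse, {"2024-01": 120})
print(ytd)  # {'2023-11': 100, '2023-12': 250, '2024-01': 120}
```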


Nonvolatile

Once data is entered, it is NEVER removed
Represents the company’s entire history
– Near-term history is continually added to it
– Always growing
– Must support terabyte databases and multiprocessors

Read-only database for data analysis and query processing
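One way to approximate nonvolatile storage in a relational DBMS is to let new rows in while rejecting changes to rows already loaded. The sketch below uses Python's built-in sqlite3 module and SQLite triggers with RAISE(ABORT, ...); the table and trigger names are hypothetical, and this is an illustration of the idea rather than how any particular warehouse product enforces it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_history (sale_date TEXT, amount REAL);

-- Block changes to rows that are already loaded.
CREATE TRIGGER sales_history_no_update BEFORE UPDATE ON sales_history
BEGIN
    SELECT RAISE(ABORT, 'sales_history is append-only');
END;

-- Block removal of rows that are already loaded.
CREATE TRIGGER sales_history_no_delete BEFORE DELETE ON sales_history
BEGIN
    SELECT RAISE(ABORT, 'sales_history is append-only');
END;
""")

# Periodic loads are still allowed: history is only ever added.
conn.execute("INSERT INTO sales_history VALUES ('2023-01-31', 500.0)")

def attempt(sql):
    """Return True if the statement succeeded, False if a trigger blocked it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False

print(attempt("UPDATE sales_history SET amount = 0"))  # False
print(attempt("DELETE FROM sales_history"))            # False
```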


Data Marts

Small data stores
More manageable data sets
Targeted to meet the needs of small groups within the organization

Small, single-subject data warehouse subset that provides decision support to a small group of people
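As a rough illustration of "single-subject subset", the sketch below defines a data mart as a SQLite view over a hypothetical enterprise-wide table, using Python's sqlite3 module. A view is only the simplest stand-in; many data marts are physically separate stores loaded from the warehouse. Table, view, and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise-wide warehouse table covering several subjects.
CREATE TABLE warehouse_facts (subject TEXT, region TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
  ('sales',     'East', 'revenue',       1000),
  ('sales',     'West', 'revenue',        800),
  ('marketing', 'East', 'campaign_cost',  200),
  ('finance',   'East', 'expenses',       600);

-- A data mart: a small, single-subject subset for the sales group.
CREATE VIEW sales_mart AS
  SELECT region, metric, value FROM warehouse_facts WHERE subject = 'sales';
""")

rows = conn.execute("SELECT region, value FROM sales_mart ORDER BY region").fetchall()
print(rows)  # [('East', 1000.0), ('West', 800.0)]
```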


OLAP

Online Analytical Processing tools
DSS tools that use multidimensional data analysis techniques
– Support for a DSS data store
– Data extraction and integration filter
– Specialized presentation interface


12 Rules of a Data Warehouse

Data warehouse and operational environments are separated

Data is integrated
Contains historical data over a long period of time
Data is snapshot data captured at a given point in time
Data is subject-oriented


12 Rules of a Data Warehouse

Mainly read-only with periodic batch updates
Development life cycle has a data-driven approach versus the traditional process-driven approach

Data contains several levels of detail
– Current, old, lightly summarized, highly summarized


12 Rules of a Data Warehouse

Environment is characterized by read-only transactions to very large data sets

System that traces data sources, transformations, and storage

Metadata is a critical component
– Source, transformation, integration, storage, relationships, history, etc.
Contains a chargeback mechanism for resource usage that enforces optimal use of data by end users


OLAP

Need for more intensive decision support
4 main characteristics
– Multidimensional data analysis
– Advanced database support
– Easy-to-use end-user interfaces
– Support for client/server architecture


Multidimensional Data Analysis Techniques

Advanced data presentation functions
– 3-D graphics, pivot tables, crosstabs, etc.
– Compatible with spreadsheets & statistical packages
Advanced data aggregation, consolidation, and classification across time dimensions
Advanced computational functions
Advanced data modeling functions
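A crosstab (pivot table) rearranges flat fact rows so that one dimension runs down the side and another across the top. This small Python sketch assumes hypothetical (quarter, region, sales) rows; crosstab sums sales into a region-by-quarter grid and render prints it with a row-total column.

```python
from collections import defaultdict

# Hypothetical fact rows at transaction grain: (quarter, region, sales).
facts = [
    ("Q1", "East", 100), ("Q1", "West", 150), ("Q1", "East", 30),
    ("Q2", "East", 120), ("Q2", "West", 90),
]

def crosstab(rows):
    """Pivot rows into region (rows) by quarter (columns), summing sales."""
    table = defaultdict(lambda: defaultdict(int))
    for quarter, region, sales in rows:
        table[region][quarter] += sales
    return table

def render(table, columns):
    """Format the crosstab as text with a row-total column."""
    header = "region".ljust(8) + "".join(c.rjust(6) for c in columns) + "total".rjust(7)
    lines = [header]
    for region in sorted(table):
        cells = [table[region][c] for c in columns]
        lines.append(region.ljust(8) + "".join(str(v).rjust(6) for v in cells)
                     + str(sum(cells)).rjust(7))
    return "\n".join(lines)

ct = crosstab(facts)
print(render(ct, ["Q1", "Q2"]))
```

Spreadsheet pivot tables perform the same regrouping interactively; the code only shows what the tool computes behind the display.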


Advanced Database Support

Advanced data access features
– Access to many kinds of DBMSs, flat files, and internal and external data sources
– Access to aggregated data warehouse data
– Advanced data navigation (drill-downs and roll-ups)
– Ability to map end-user requests to the appropriate data source
– Support for very large databases


Easy-to-Use End-User Interface

Graphical user interfaces
Much more useful if access is kept simple


Client/Server Architecture

Framework within which new systems can be designed, developed, and implemented

Divides the OLAP system into several components that define its architecture
– On the same computer
– Distributed among several computers


OLAP Architecture

3 main modules
– GUI
– Analytical processing logic
– Data-processing logic


OLAP Client/Server Architecture
[Figure: OLAP client/server architecture]


Relational OLAP

Relational Online Analytical Processing (ROLAP)
– OLAP functionality that uses relational databases and familiar query tools to store and analyze multidimensional data
Multidimensional data schema support
Data access language & query performance for multidimensional data
Support for very large databases


Multidimensional Data Schema Support

Decision support data tends to be
– Nonnormalized
– Duplicated
– Preaggregated

Star schema
– Special design technique for multidimensional data representations
– Optimizes data query operations instead of data update operations


Star Schemas

Data modeling technique to map multidimensional decision support data into a relational database

Current relational modeling techniques do not serve the needs of advanced data requirements


Star Schema

4 components
– Facts
– Dimensions
– Attributes
– Attribute hierarchies


Facts

Numeric measurements (values) that represent a specific business aspect or activity

Stored in a fact table at the center of the star schema

The fact table contains facts that are linked through their dimensions

Can be computed or derived at run time
Updated periodically with data from operational databases


Dimensions

Qualifying characteristics that provide additional perspectives to a given fact
– DSS data is almost always viewed in relation to other data

Dimensions are normally stored in dimension tables


Attributes

Dimension tables contain attributes
Attributes are used to search, filter, or classify facts
Dimensions provide descriptive characteristics about the facts through their attributes
Must define common business attributes that will be used to narrow a search, group information, or describe dimensions (e.g., time / location / product)

No mathematical limit to the number of dimensions (3-D makes it easy to model)
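To show attributes narrowing and grouping facts, here is a minimal pure-Python sketch with two hypothetical dimension tables (product, location) keyed by surrogate keys and a small fact list that references them. units_by filters facts through a location attribute and classifies them by a product attribute; all names and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
product_dim = {1: {"name": "Hammer", "category": "Tools"},
               2: {"name": "Drill", "category": "Tools"},
               3: {"name": "Paint", "category": "Supplies"}}
location_dim = {10: {"city": "Miami", "state": "FL"},
                20: {"city": "Austin", "state": "TX"}}

# Hypothetical fact rows: (product_key, location_key, units_sold).
sales_facts = [(1, 10, 5), (2, 10, 2), (3, 10, 7), (1, 20, 4)]

def units_by(attribute, *, state=None):
    """Group units sold by a product attribute, optionally filtered by state."""
    totals = defaultdict(int)
    for product_key, location_key, units in sales_facts:
        if state is not None and location_dim[location_key]["state"] != state:
            continue  # narrow the search through a location attribute
        totals[product_dim[product_key][attribute]] += units  # classify by a product attribute
    return dict(totals)

print(units_by("category", state="FL"))  # {'Tools': 7, 'Supplies': 7}
```

The facts themselves carry only keys and numbers; every descriptive question ("which category?", "which state?") is answered through dimension attributes.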


Attribute Hierarchies

Provides a top-down data organization
– Aggregation
– Drill-down / roll-up data analysis

Attributes from different dimensions can be grouped to form a hierarchy


Star Schema for Sales

[Figure: star schema for sales, with a central fact table linked to the dimension tables]


Star Schema Representation

Facts and dimensions are represented by physical tables in the data warehouse database

Fact tables are related to each dimension table in a many-to-one relationship (primary/foreign key relationships)

The fact table is related to many dimension tables
– The primary key of the fact table is a composite key made up of the primary keys of the dimension tables
Each fact table is designed to answer a specific DSS question
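The representation rules above map directly onto table definitions. This sketch uses Python's sqlite3 module with a hypothetical sales star schema: each dimension has its own primary key, the fact table holds one foreign key per dimension (many facts to one dimension row), and the fact table's composite primary key is built from those keys. The query at the end is one DSS question this fact table can answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, region TEXT);

-- Fact table: one foreign key per dimension, composite primary key from those keys.
CREATE TABLE sales_fact (
    time_id     INTEGER REFERENCES time_dim(time_id),
    product_id  INTEGER REFERENCES product_dim(product_id),
    location_id INTEGER REFERENCES location_dim(location_id),
    units       INTEGER,
    revenue     REAL,
    PRIMARY KEY (time_id, product_id, location_id)
);

INSERT INTO time_dim VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO product_dim VALUES (1, 'Tools'), (2, 'Supplies');
INSERT INTO location_dim VALUES (1, 'East'), (2, 'West');
INSERT INTO sales_fact VALUES
  (1, 1, 1, 10, 250.0), (1, 2, 1, 4, 40.0), (2, 1, 2, 6, 150.0);
""")

# DSS question: revenue by quarter and region.
query = """
SELECT t.quarter, l.region, SUM(f.revenue)
FROM sales_fact f
JOIN time_dim t     ON f.time_id = t.time_id
JOIN location_dim l ON f.location_id = l.location_id
GROUP BY t.quarter, l.region
ORDER BY t.quarter, l.region
"""
print(conn.execute(query).fetchall())  # [('Q1', 'East', 290.0), ('Q2', 'West', 150.0)]
```

The composite key means the same (time, product, location) combination cannot appear twice, and the foreign keys reject facts that point at a dimension row that does not exist.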


Star Schema

The fact table is always the largest table in the star schema

Each dimension record is related to thousands of fact records

The star schema facilitates data retrieval functions

The DBMS searches the dimension tables first, before the larger fact table


Data Warehouse Implementation

An active decision support framework
– Not a static database
– Always a work in progress
– Complete infrastructure for company-wide decision support
– Hardware / software / people / procedures / data
– The data warehouse is a critical component of the modern DSS, but not the only critical component


Data Mining

Discovers previously unknown data characteristics, relationships, dependencies, or trends

Typical data analysis relies on end users, who
– Define the problem
– Select the data
– Initiate the data analysis
– React to an external stimulus


Data Mining

Proactive
Automatically searches for
– Anomalies
– Possible relationships
Identifies problems before the end user does

Data mining tools analyze the data, uncover problems or opportunities hidden in data relationships, form computer models based on their findings, and then use the models to predict business behavior, all with minimal end-user intervention


Data Mining

A methodology designed to perform knowledge-discovery expeditions over the database data with minimal end-user intervention

3 stages of data
– Data
– Information
– Knowledge


Extraction of Knowledge from Data
[Figure: extraction of knowledge from data]


4 Phases of Data Mining

Data Preparation
– Identify the main data sets to be used by the data mining operation (usually the data warehouse)

Data Analysis and Classification
– Study the data to identify common data characteristics or patterns
  – Data groupings, classifications, clusters, sequences
  – Data dependencies, links, or relationships
  – Data patterns, trends, deviations
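Finding "data dependencies, links, or relationships" is often illustrated with association rules. The Python sketch below is a deliberately tiny version of that technique (not a full algorithm such as Apriori): it counts how often item pairs co-occur in hypothetical market-basket transactions, keeps pairs whose support meets a threshold, and reports the confidence of the rule a -> b.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def pair_rules(transactions, min_support=0.5):
    """Find frequently co-occurring item pairs with support and confidence."""
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in t)
    pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support >= min_support:
            confidence = count / item_counts[a]  # P(b | a) for the rule a -> b
            rules.append((a, b, support, confidence))
    return rules

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support {support:.2f}, confidence {confidence:.2f}")
```

With these baskets, bread appears with milk and with butter in half of all transactions, and in two thirds of the baskets that contain bread.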


4 Phases of Data Mining

Knowledge Acquisition
– Uses the results of the data analysis and classification phase
– The data mining tool selects the appropriate modeling or knowledge-acquisition algorithms
  – Neural networks
  – Decision trees
  – Rules induction
  – Genetic algorithms
  – Memory-based reasoning

Prognosis
– Predict future behavior
– Forecast business outcomes

Example: 65% of customers who did not use a particular credit card in the last 6 months are 88% likely to cancel the account.
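A statement like the credit card example combines two figures: the share of customers in a segment, and the conditional likelihood of an outcome within that segment. This Python sketch estimates both from a handful of invented customer records; the records, the 6-month threshold parameter, and the cancel_rate name are hypothetical, and the numbers it produces are not the figures quoted above.

```python
# Hypothetical customer history: (months since last card use, cancelled?).
customers = [
    (7, True), (9, True), (12, False), (8, True),
    (1, False), (2, False), (3, True), (0, False),
]

def cancel_rate(records, inactive_months=6):
    """Estimate the inactive share and P(cancel | unused for >= inactive_months)."""
    inactive = [cancelled for months, cancelled in records if months >= inactive_months]
    share_inactive = len(inactive) / len(records)
    likelihood = sum(inactive) / len(inactive) if inactive else 0.0
    return share_inactive, likelihood

share, likelihood = cancel_rate(customers)
print(f"{share:.0%} inactive; {likelihood:.0%} of those cancelled")  # 50% inactive; 75% of those cancelled
```

A real prognosis model would be trained on far more history and validated before its estimates were used to forecast business outcomes.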


Data Mining

Still a new technique
May find many meaningless relationships
Good at finding practical relationships
– Define customer buying patterns
– Improve product development and acceptance
– Etc.

Has the potential to become the next frontier in database development