Database Systems – Data Warehousing
INTRODUCTION
There exists an information gap amongst organizations.
Organizations have plenty of data, but little information.
Most data is gathered in a fragmented manner from many sources (both internal and external).
Additionally, most systems are designed for transactional purposes, which does not lend itself to information gathering. Think about how limited the operations of an ATM are.
PROBLEMS WITH TRANSACTIONAL SYSTEMS
Organizations had only scattered transactional systems, with data spread among them
Transactional systems were not designed for decision support analysis
Data constantly changes on transactional systems
Transactional systems lack historical data
Resources were often strained when transactional and analytical needs were served by the same systems
Operational databases are designed to record transactions from daily operations. They are optimized to efficiently update or create individual records
A database for analysis, on the other hand, needs to be geared toward flexible requests or queries (ad hoc queries, statistical analysis)
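The contrast between the two workloads can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the table `account_txn`, its columns, and the sample rows are assumptions made up for the example, not part of any real system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_txn (txn_id INTEGER PRIMARY KEY, account_id INTEGER, "
    "txn_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO account_txn (account_id, txn_date, amount) VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", -40.0),
        (1, "2024-02-10", -60.0),
        (2, "2024-01-20", -20.0),
        (2, "2024-02-15", -100.0),
    ],
)

# Transactional style: touch one record, identified by its key.
conn.execute("UPDATE account_txn SET amount = -45.0 WHERE txn_id = 1")

# Analytical style: scan many records and summarize them by month.
monthly_totals = conn.execute(
    "SELECT substr(txn_date, 1, 7) AS month, SUM(amount) "
    "FROM account_txn GROUP BY month ORDER BY month"
).fetchall()
print(monthly_totals)
```

The update reads and writes a single row, while the summary has to read every row. Running both kinds of work on one system is what strains its resources.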
DEFINITION – DATA WAREHOUSE
A data warehouse is a subject-oriented, integrated, time-variant, nonupdatable collection of data used in support of management decision-making processes and business intelligence.
The term was introduced in 1990 by William Inmon.
Subject-oriented: A data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects may include: customers, patients, students, products, and time.
Integrated: The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization. This means that the data warehouse holds one version of the truth.
Often data must be converted (standardized) as it’s loaded.
Time-variant: Data in the data warehouse contain a time dimension so that they may be used to study trends and changes.
Nonupdatable: Data in the data warehouse are loaded and refreshed from operational systems; they are not updated by end users.
In practice, data are more often appended than refreshed.
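A minimal sketch of how the time-variant and nonupdatable properties work together, assuming a warehouse table modeled as a plain Python list. The names `warehouse_balances`, `load_snapshot`, and `balance_trend`, and the sample balances, are hypothetical and exist only for this illustration.

```python
from datetime import date

# Hypothetical warehouse table kept as an append-only list of snapshot rows.
warehouse_balances = []

def load_snapshot(snapshot_date, balances):
    """Append one dated row per customer; earlier rows are never modified."""
    for customer_id, balance in balances.items():
        warehouse_balances.append(
            {"snapshot_date": snapshot_date, "customer_id": customer_id, "balance": balance}
        )

# Each periodic load adds rows instead of overwriting the previous values.
load_snapshot(date(2024, 1, 31), {"C1": 100, "C2": 250})
load_snapshot(date(2024, 2, 29), {"C1": 140, "C2": 200})

def balance_trend(customer_id):
    """Return (date, balance) pairs in date order for one customer."""
    rows = [r for r in warehouse_balances if r["customer_id"] == customer_id]
    rows.sort(key=lambda r: r["snapshot_date"])
    return [(r["snapshot_date"], r["balance"]) for r in rows]

c1_trend = balance_trend("C1")
print(c1_trend)
```

Because every row carries its snapshot date and old rows are kept, the history needed for trend analysis builds up with each load.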
WHAT IS A DATA WAREHOUSE
Not a product, it is a process
Combination of hardware and software
Concept of a Data Warehouse is not new, but the technology that allows it is
Can often be set up as one VLDB (Very Large Database) or a collection of subject areas called Data Marts.
There are now tools which “unify” these Data Marts and make it appear as a single database to the end user.
[Figure: Transformation of Data to Information. Labels recoverable from the diagram: Internal Data Sources and External Data Sources; Extract / Transform / Load / Refresh; Data Warehouse; Data Marts; Metadata; Monitoring & Administration; OLAP Servers; Serve; Analysis, Query, Reporting, and Data Mining. Stages from Data to Information: Transaction Processing, Cleansing & Normalization, Relational Warehouse, SQL Reporting, Exploration / Analysis.]
• Data will come from multiple databases and files within the organization (internal data sources)
• Data can also come from outside sources (external data sources)
• Examples:
• Weather reports
• Demographic information by ZIP code
GETTING DATA IN
1. Extraction Phase
2. Transformation Phase
3. Loading Phase
Extraction Phase
• Source systems export data via files, or populate the warehouse directly when the databases can “talk” to each other
• The exported data is transferred to the Data Warehouse server and placed in a staging area
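The extraction step might look roughly like the sketch below, assuming the source system exports a CSV file. The file contents, the staging table `stg_customer`, and the `"billing"` source label are all made-up placeholders, and an in-memory SQLite database stands in for the warehouse server.

```python
import csv
import io
import sqlite3

# Stand-in for a file exported by a source system (contents are made up).
exported_file = io.StringIO(
    "customer_id,name,state\n"
    "101,Acme Corp,PA\n"
    "102,Bolt Inc,Penna\n"
)

conn = sqlite3.connect(":memory:")
# Staging table: values stay as raw text, plus a note of where they came from.
conn.execute(
    "CREATE TABLE stg_customer (customer_id TEXT, name TEXT, state TEXT, source_system TEXT)"
)

# Copy the rows exactly as exported; no cleaning happens in this phase.
rows = [
    (r["customer_id"], r["name"], r["state"], "billing")
    for r in csv.DictReader(exported_file)
]
conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?, ?)", rows)

staged_count = conn.execute("SELECT COUNT(*) FROM stg_customer").fetchone()[0]
print(staged_count)
```

Keeping the staged data unmodified (note the uncorrected "Penna") leaves all cleanup to the transformation phase and makes it easy to compare against the source later.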
Transformation Phase
• Takes data and turns it into a form that is suitable for insertion into the warehouse
• Combines related data
• Removes redundancies
• Common Codes (Commercial Customer)
• Spelling Mistakes (Lozenges)
• Consistency (PA,Pa,Penna,Pennsylvania)
• Formatting (Addresses)
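The consistency and redundancy steps above can be sketched as follows. This is an illustration only: the `STATE_CODES` lookup and the functions `standardize_state` and `transform` are hypothetical, and a real lookup would need to cover every variant found in the source data.

```python
# Hypothetical lookup table; a real one would cover every variant seen in the sources.
STATE_CODES = {"pa": "PA", "penna": "PA", "pennsylvania": "PA"}

def standardize_state(raw):
    """Map spelling variants of a state to one code; leave unknown values unchanged."""
    key = raw.strip().rstrip(".").lower()
    return STATE_CODES.get(key, raw.strip())

def transform(records):
    """Standardize each record, then drop records that become duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        row = (
            " ".join(rec["name"].split()).title(),  # collapse whitespace, consistent casing
            standardize_state(rec["state"]),
        )
        if row not in seen:
            seen.add(row)
            cleaned.append({"name": row[0], "state": row[1]})
    return cleaned

staged = [
    {"name": "acme  corp", "state": "Pa"},
    {"name": "Acme Corp", "state": "Pennsylvania"},
    {"name": "Bolt Inc", "state": "Penna"},
]
cleaned = transform(staged)
print(cleaned)
```

The first two staged records describe the same customer, but that only becomes visible after standardization, which is why redundancies are removed after the values are made consistent.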
Loading Phase
• Places the cleaned data into the DBMS in its final, usable form
• Compares the data in the source systems with the data in the Data Warehouse
• Documents the load information for the users
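One possible shape for the loading phase is sketched below: insert the cleaned rows, reconcile the row count against what the source reported, and record the outcome for users. The tables `dw_customer` and `load_log`, the `load` function, and the `"OK"`/`"MISMATCH"` status values are assumptions for illustration. A real reconciliation would typically compare more than row counts.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_customer (name TEXT, state TEXT)")
conn.execute(
    "CREATE TABLE load_log (loaded_at TEXT, target TEXT, source_rows INTEGER, "
    "loaded_rows INTEGER, status TEXT)"
)

def load(cleaned_rows, source_row_count):
    """Insert cleaned rows, reconcile counts with the source, and record the result."""
    before = conn.execute("SELECT COUNT(*) FROM dw_customer").fetchone()[0]
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?)", cleaned_rows)
    after = conn.execute("SELECT COUNT(*) FROM dw_customer").fetchone()[0]
    loaded = after - before
    # Compare what arrived in the warehouse with what the source reported.
    status = "OK" if loaded == source_row_count else "MISMATCH"
    # Document the load so users can see when and how the data was refreshed.
    conn.execute(
        "INSERT INTO load_log VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), "dw_customer", source_row_count, loaded, status),
    )
    conn.commit()
    return status

status = load([("Acme Corp", "PA"), ("Bolt Inc", "PA")], source_row_count=2)
log_entry = conn.execute(
    "SELECT target, source_rows, loaded_rows, status FROM load_log"
).fetchone()
print(status, log_entry)
```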
COMPONENTS OF A DATA WAREHOUSE
Four General Components:
1. Hardware
2. DBMS - Database Management System
3. Front End Access Tools
4. Other Tools & Extensions
In all components, scalability is vital. Scalability is the ability to grow as your data and processing needs increase.
HARDWARE
• Power - Number of processors, memory, I/O bandwidth, and bus speed
• Availability - Redundant equipment
• Disk Storage - Fast disks with enough capacity for the loaded data set
• Backup Solution - Automated, with support for incremental backups and for archiving older data
DBMS
• Physical storage capacity of the DBMS
• Loading, indexing, and processing speed
• Availability
• Handle your data needs
• Operational integrity, reliability, and manageability